MuHAVi: Multicamera Human Action Video Data

including selected action sequences with

MAS: Manually Annotated Silhouette Data

MuHAVi-uncut: Full videos with realistic Silhouette Data

for the evaluation of human action recognition methods

Last updated September 2017

Originally part of the REASON project funded by the UK's Engineering and Physical Sciences Research Council (EPSRC)

Then part of the OBSERVE project, funded by the Fondecyt Regular Program of the Chilean Research Council for Science and Technology (Conicyt), grant number 1140209.

Currently part of the UC3M-Conex project CloseVU (Close View Visual Understanding), funded by the Marie Curie EU Programme.

This dataset was originally put together by a team based at Kingston University's then Digital Imaging Research Centre and continued by a team at the Department of Informatic Engineering at the University of Santiago de Chile.

The updates in Feb-July 2014 and from Sept 2015 have been done as part of a Chair of Excellence/Marie Curie Professorship stay by Prof. Sergio A Velastin at the Applied Artificial Intelligence Group of the Universidad Carlos III de Madrid

The experiments and data collection were organized and performed by the project's team based at Kingston University together with the partners based at Reading University and UCL's Pamela Laboratory.

NOTE: We have also made our virtual human action silhouette data available online (visit our ViHASi page).

If you publish work that uses this dataset, please use the following references:

For MuHAVi-uncut:

@article{murtaza2016multi,
title={Multi-view human action recognition using 2D motion templates based on MHIs and their HOG description},
author={Murtaza, Fiza and Yousaf, Muhammad Haroon and Velastin, Sergio A},
journal={IET Computer Vision},
volume={10},
number={7},
pages={758--767},
year={2016},
publisher={IET Digital Library}
}

DOI 10.1049/iet-cvi.2015.0416

For MuHAVi-MAS:

@inproceedings{singh2010muhavi,
title={Muhavi: A multicamera human action video dataset for the evaluation of action recognition methods},
author={Singh, Sanchit and Velastin, Sergio A and Ragheb, Hossein},
booktitle={Advanced Video and Signal Based Surveillance (AVSS), 2010 Seventh IEEE International Conference on},
pages={48--55},
year={2010},
organization={IEEE}
}

DOI: 10.1109/AVSS.2010.63

You can also find here a list of publications that use the MuHAVi dataset.

New (12.09.2017):

A brand new set of temporal ground truths for MuHAVi-uncut has been prepared that defines the start and end of each "sub-action" in an action.
Silhouettes for each of the sub-actions in MuHAVi-uncut are now available.

Introduction

We have collected a large body of human action video (MuHAVi) data using 8 cameras. There are 17 action classes, as listed in Table 2, performed by 14 actors. We initially processed the videos corresponding to 7 actors in order to split the actions and provide the JPG image frames. These include some image frames before and after the actual action, for the purpose of background subtraction, tracking, etc. The longest pre-action frames correspond to the actor called Person1. Note that what we provide is therefore temporally pre-segmented actions, as this was typical when the dataset was first released. We now (see below) also provide long unsegmented sequences for people to work on temporal segmentation.

Each actor performs each action several times in the action zone highlighted using white tapes on the scene floor. As actors were amateurs, the leader had to interrupt the actors in some cases and ask them to redo the action for consistency. As shown in Fig. 1 and Table 1, we have used 8 CCTV Schwan cameras located at 4 sides and 4 corners of a rectangular platform. Note that these cameras are not synchronised. Camera calibration information may be included here in the future. Meanwhile, one can use the patterns on the scene floor to calibrate the cameras of interest.

Note that to prepare training data for action recognition methods, each of our action classes may be broken into at least two primitive actions. For instance, the action "WalkTurnBack" consists of the walk and turn back primitive actions. Further, although it is not quite natural to have a collapse action due to a shotgun followed by a standing up action, one can simply split them into two separate action classes.

We make the data available to researchers in the computer vision community through a password-protected server at the University Carlos III de Madrid, Spain. The data may be accessed by sending an email (with the subject "MuHAVi-MAS Data") to Prof Sergio A Velastin at sergio.velastin@ieee.org giving the names, email addresses and institution(s) of the researchers who wish to use the data and their main purposes. We request this only to build a list of people using this dataset to form a "MuHAVi community" with whom to communicate. The only requirement for using the MuHAVi data is to refer to this site and to our publication(s) in the corresponding publications.

Figure 1. The top view of the configuration of 8 cameras used to capture the actions in the blue action zone (which is marked with white tapes on the scene floor).

camera symbol

camera name

Camera_1

Camera_2

Camera_3

Camera_4

Camera_5

Camera_6

Camera_7

Camera_8

Table 1. Camera view names appearing in the MuHAVi data folders and the corresponding symbols used in Fig. 1.

*** This section is mainly of historical interest. It is better to download the data in the MuHAVi-uncut set ****

In the table below, you can click on the links to download the data (JPG images) for the corresponding action.

Important: We noted that some earlier versions of MS Internet Explorer could not download files over 2GB in size, so we recommend using alternative browsers such as Firefox or Chrome.

Each tar file contains 7 folders corresponding to 7 actors (Person1 to Person7) each of which contains 8 folders corresponding to 8 cameras (Camera_1 to Camera_8). Image frames corresponding to every combination of action/actor/camera are named with image frame numbers starting from 00000001.jpg for simplicity. The video frame rate is 25 frames per second and the resolution of image frames (except for Camera_8) is 720 x 576 Pixels (columns x rows). The image resolution is 704 x 576 for Camera_8.
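As a rough illustration of the layout just described, the following sketch builds the path of a frame and converts a frame number into a time offset. The nesting order (action root, then PersonN, then Camera_N) follows the description above, but the root folder name and the helper names are hypothetical and not part of the dataset.

```python
from pathlib import PurePosixPath

FPS = 25  # frame rate stated in the text above


def frame_path(root, actor, camera, frame_number):
    """Path of one JPG frame; 'root' is a hypothetical extraction folder."""
    if not 1 <= actor <= 7 or not 1 <= camera <= 8:
        raise ValueError("actor must be 1-7 and camera must be 1-8")
    if frame_number < 1:
        raise ValueError("frame numbers start at 1")
    # Frames are named with 8 zero-padded digits, e.g. 00000001.jpg
    name = f"{frame_number:08d}.jpg"
    return PurePosixPath(root) / f"Person{actor}" / f"Camera_{camera}" / name


def frame_time_seconds(frame_number):
    """Offset of a frame from the first frame of its sequence, in seconds."""
    return (frame_number - 1) / FPS


def frame_size(camera):
    """(columns, rows) of the image frames for a given camera."""
    return (704, 576) if camera == 8 else (720, 576)


print(frame_path("WalkTurnBack", 1, 3, 1))
print(frame_time_seconds(26), frame_size(8))
```

Because the cameras are not synchronised, the time offsets are only meaningful within a single sequence, not across camera views.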

action class	action name	size
		2.6GB
		2.5GB
		3.0GB
		3.4GB
		4.3GB
		4.5GB
		3.0GB
		3.9GB
		4.6GB
C10	CrawlOnKnees	3.4GB
C11	WaveArms	2.2GB
C12	DrawGraffiti	2.7GB
C13	JumpOverFence	4.4GB
C14	DrunkWalk	4.0GB
C15	ClimbLadder	2.1GB
C16	SmashObject	3.3GB
C17	JumpOverGap	2.6GB

Table 2. Action class names appearing in the MuHAVi data folders and the corresponding symbols used in Fig. 3.

actor symbol

actor name

Person1

Person2

Person3

Person4

Person5

Person6

Person7

Table 3. Actor names appearing in the MuHAVi data folders and the corresponding symbols used in Fig. 3.

*** end of historical note

NEW MuHAVi "uncut" (November 2014)

So far, MuHAVi has consisted of

(Temporally) manually pre-segmented action sequences (in JPEG files)
Manually annotated silhouettes for a small sub-set of actors/actions/cameras (MuHAVi-MAS dataset).

Thanks to work done by Dr Zezhi Chen of Kingston University, Jorge Sepúlveda of the University of Santiago de Chile (USACH) and Prof. Sergio A Velastin from the Universidad Carlos III de Madrid, we are now able to provide:

Un-cut original video sequences (mainly in MPEG2) for each camera (the recordings are continuous and contain the acted actions but also the gaps and breaks in between).
Ground truth describing times of start and completion (frame numbers) of each sub-action in each video file by each actor (Note: the community's views are welcome to agree on a set of metrics to evaluate temporal segmentation methods)
Silhouettes computed by Z. Chen's algorithm (the rationale is that these are realistic silhouettes typical of the state of the art and people are invited to test the robustness of their human action recognition and temporal segmentation algorithms based on such realistic, and "imperfect", segmentation)

Here are a couple of samples that do not need a user name and password to download:
Camera2A video sample
Camera2A sample silhouettes (parameter = 6.0)

You can now download the full-length videos from here:
Because of the length of these videos, use RightClick/"Save Link As" and use a high speed network:

Camera	Video file	Size
Camera_1	20080213AM-Cam1.mpg	6.8G
Camera_2A	dvcam3-1.mpg	3.3G
Camera_2B	dvcam3-2.mpg	3.8G
Camera_3A	dvcam2-1.mpg	3.3G
Camera_3B	dvcam2-2.mpg	3.8G
Camera_4	20080213AM-Cam4.mpg	7.0G
Camera_5	20080213AM-Cam5.mpg	7.6G
Camera_6A	dvcam1-1.mpg	3.3G
Camera_6B	dvcam1-2.mpg	3.8G
Camera_7	20080213AM-Cam7.mpg	5.7G
Camera_8A	Reason20080213-am1.m2p	2.1G
Camera_8B	Reason20080213-am2.m2p	0.9G

(A and B occur because in the original recordings, the recorder had to be stopped to change media!)

We provide these sets of silhouettes, generated by varying one parameter in the foreground detection algorithm (this is explained after the table):

Sequence	File	File size	Parameter
Camera_1	20080213AM-Cam1-3.0	6.4G	3.0
Camera_1	20080213AM-Cam1-4.0	3.6G	4.0
Camera_1	20080213AM-Cam1-6.0	1.6G	6.0
Camera_1	20080213AM-Cam1-9.0	0.85G	9.0
Camera_2A	dvcam3-1-3.0	1.7G	3.0
Camera_2A	dvcam3-1-4.0	1.1G	4.0
Camera_2A	dvcam3-1-6.0	0.64G	6.0
Camera_2A	dvcam3-1-9.0	0.39G	9.0
Camera_2B	dvcam3-2-3.0	1.9G	3.0
Camera_2B	dvcam3-2-4.0	1.2G	4.0
Camera_2B	dvcam3-2-6.0	0.74G	6.0
Camera_2B	dvcam3-2-9.0	0.44G	9.0
Camera_3A	dvcam2-1-3.0	1.8G	3.0
Camera_3A	dvcam2-1-4.0	1.2G	4.0
Camera_3A	dvcam2-1-6.0	0.73G	6.0
Camera_3A	dvcam2-1-9.0	0.46G	9.0
Camera_3B	dvcam2-2-3.0	2.2G	3.0
Camera_3B	dvcam2-2-4.0	1.4G	4.0
Camera_3B	dvcam2-2-6.0	0.86G	6.0
Camera_3B	dvcam2-2-9.0	0.55G	9.0
Camera_4	20080213AM-Cam4-3.0	5.2G	3.0
Camera_4	20080213AM-Cam4-4.0	3.4G	4.0
Camera_4	20080213AM-Cam4-6.0	2.1G	6.0
Camera_4	20080213AM-Cam4-9.0	1.3G	9.0
Camera_5	20080213AM-Cam5-3.0	7.1G	3.0
Camera_5	20080213AM-Cam5-4.0	3.4G	4.0
Camera_5	20080213AM-Cam5-6.0	2.8G	6.0
Camera_5	20080213AM-Cam5-9.0	1.8G	9.0
Camera_6A	dvcam1-1-3.0	1.4G	3.0
Camera_6A	dvcam1-1-4.0	0.9G	4.0
Camera_6A	dvcam1-1-6.0	0.68G	6.0
Camera_6A	dvcam1-1-9.0	0.42G	9.0
Camera_6B	dvcam1-2-3.0	1.7G	3.0
Camera_6B	dvcam1-2-4.0	1.7G	4.0
Camera_6B	dvcam1-2-6.0	0.72G	6.0
Camera_6B	dvcam1-2-9.0	0.45G	9.0
Camera_7	20080213AM-Cam7-3.0	6.8G	3.0
Camera_7	20080213AM-Cam7-4.0	4.4G	4.0
Camera_7	20080213AM-Cam7-6.0	2.5G	6.0
Camera_7	20080213AM-Cam7-9.0		9.0
Camera_8A	Reason20080213-am1-3.0	0.9G	3.0
Camera_8A	Reason20080213-am1-4.0	0.6G	4.0
Camera_8A	Reason20080213-am1-6.0	0.31G	6.0
Camera_8A	Reason20080213-am1-9.0	0.21G	9.0
Camera_8B	Reason20080213-am2-3.0	0.35G	3.0
Camera_8B	Reason20080213-am2-4.0	0.25G	4.0
Camera_8B	Reason20080213-am2-6.0	0.10G	6.0
Camera_8B	Reason20080213-am2-9.0	0.05G	9.0

Each compressed archive contains files named %d.png where the number is the frame number. In each file, black (0) represents the background, white (255) the foreground (i.e. the silhouette) and grey (127) a detected shadow (normally to be considered as background).
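The label convention above can be turned into a binary mask along the following lines. This is only a sketch: it assumes the PNG has already been decoded into rows of grey levels by an image library of your choice, and the function names are illustrative rather than part of any dataset tooling.

```python
BACKGROUND, SHADOW, FOREGROUND = 0, 127, 255  # grey levels described above


def to_binary_mask(rows, shadow_is_foreground=False):
    """Map rows of grey levels to rows of 0 (background) / 1 (silhouette)."""
    mask = []
    for row in rows:
        out = []
        for value in row:
            if value == FOREGROUND:
                out.append(1)
            elif value == SHADOW:
                # Shadows are normally treated as background
                out.append(1 if shadow_is_foreground else 0)
            elif value == BACKGROUND:
                out.append(0)
            else:
                raise ValueError(f"unexpected grey level {value}")
        mask.append(out)
    return mask


def frame_number_from_name(name):
    """Recover the frame number from a file named like '377.png'."""
    stem, ext = name.rsplit(".", 1)
    if ext.lower() != "png":
        raise ValueError(f"not a PNG file name: {name}")
    return int(stem)


print(to_binary_mask([[0, 127, 255]]))
print(frame_number_from_name("377.png"))
```

Raising an error on unexpected grey levels is a design choice made here so that accidental rescaling or lossy re-encoding of the silhouettes is noticed early.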

The "Parameter" is one of the factors that affect foreground detection in terms of true positives vs false positives. When tested against the manually annotated silhouettes, a value of 3.0 produces a TPR (true positive rate) of around 0.78 and a FPR (false positive rate) of around 0.027, while 4.0 gives around 0.71 and 0.013, and 5.0 gives 0.625 and less than 0.01 (i.e. less noise but less foreground). As many of the false positives tend to be noise outside the main silhouette, we expect that most people will use the set with higher TPR and reduce the false positives, e.g. with morphological filtering. When publishing results, please ensure that you give full details of any pre-processing of this kind.
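For readers who want to reproduce this kind of comparison, here is a minimal sketch of the usual pixel-wise definitions, TPR = TP / (TP + FN) and FPR = FP / (FP + TN). How the figures quoted above were aggregated over frames is not stated here, so this is an assumption about the standard definition, not a description of the original evaluation.

```python
def tpr_fpr(predicted, truth):
    """Pixel-wise true/false positive rates for two binary masks (rows of 0/1)."""
    if len(predicted) != len(truth):
        raise ValueError("masks must have the same number of rows")
    tp = fp = tn = fn = 0
    for prow, trow in zip(predicted, truth):
        if len(prow) != len(trow):
            raise ValueError("mask rows must have the same length")
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    # Guard against masks with no foreground or no background pixels
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


detected = [[1, 1, 0, 0]]
annotated = [[1, 0, 1, 0]]
print(tpr_fpr(detected, annotated))
```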

foreground ROC curves

**** Historical note
When we first published MuHAVi we provided a spreadsheet with the times (frame numbers) of when an action started and when it finished. Incidentally, it also described how the MuHAVi JPEG sequences were obtained from AVI files extracted from manually obtained temporal markers. Please also note that we discovered that there was a bug in mplayer (which was used to convert from AVI to JPEGs) that resulted in some skipped frames in the JPEG sequences. In any case, we have found this to be of less use than we expected because:

Each action (e.g. "walk and turn back") was conducted by each actor a number of times (typically 3), but the annotation only contained the start and end of the (3) actions as a whole and not of each one separately.
Actions such as "walk and turn back" could really be regarded as two or three sub-actions: walk (toward one end of the stage), turn, walk (back to the other end of the stage), and it would be nice to annotate them separately.
Finally, we found that there were errors in the annotation!

**** end of historical note

The ground truth file can be obtained here in spreadsheet format (we are grateful to Erwann Nguyen-van-Sang, intern MSc student from the U. of Strasbourg, who spent many hours to produce this annotation).

Below is an extract from the spreadsheet.

The first column refers to the camera and actor numbers.
The second column header gives the action (e.g. "WalkTurnBack", "RunStop").
The numbers in the second column for each person give the frame number, in the video sequence, where the action starts, and those in the third column give the frame where it ends (this is somewhat subjective, of course, and the community needs to agree on a metric that would not unjustly penalise algorithms).
If the action was repeated (which is almost always the case), the start and end frames of the repetitions are given in the fourth and fifth columns, and so on.

Camera 2 from dvcam3-1-6.0 and dvcam3-2-6.0
	WalkTurnBack				RunStop
	Start S1	End S1	Start S2	End S2	Start S1	End S1	Start S2	End S2	Start S3	End S3	Start S4	End S4
Actor 1	377	607	627	867	7387	7526	7527	7666	7667	7776	7777	7837
Actor 2	1297	1517	1537	1777	8267	8366	8367	8446	8447	8546	8547	8597
Actor 3	2087	2327	2347	2597	8867	8946	8947	9046	9047	9156	9157	9227
Actor 4	2967	3187	3217	3457	9717	9796	9797	9886	9887	9976	9977	10037
Actor 5	3847	4077	4097	4367	10347	10446	10447	10546	10547	10626	10627	10687
Actor 6	4747	5017	5047	5337	11187	11296	11297	11396	11397	11496	11497	11577
Actor 7	5677	5947	5967	6207	11927	12036	12037	12126	12127	12226	12227	12307
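The extract above can be read into (start, end) frame pairs as sketched below, using the Actor 1 row as sample input. The column split (four values for WalkTurnBack, eight for RunStop) follows the header of the extract; treating the end frame as inclusive is an assumption, suggested by consecutive sub-actions such as 7526/7527, and the function names are illustrative.

```python
def pair_segments(values):
    """Group a flat list [start1, end1, start2, end2, ...] into (start, end) pairs."""
    if len(values) % 2:
        raise ValueError("expected an even number of frame values")
    segments = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]
    for start, end in segments:
        if end < start:
            raise ValueError(f"segment ends before it starts: {start}-{end}")
    return segments


def segment_lengths(segments):
    """Length in frames of each segment, assuming inclusive end frames."""
    return [end - start + 1 for start, end in segments]


# Actor 1 row from the extract above (tab-separated in the spreadsheet export)
row = "Actor 1\t377\t607\t627\t867\t7387\t7526\t7527\t7666\t7667\t7776\t7777\t7837"
fields = row.split("\t")
numbers = [int(x) for x in fields[1:]]

walk_turn_back = pair_segments(numbers[:4])  # S1, S2
run_stop = pair_segments(numbers[4:])        # S1 to S4

print(walk_turn_back, segment_lengths(walk_turn_back))
print(run_stop, segment_lengths(run_stop))
```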

To help those working with this data, we have extracted (from the long silhouette sequences) each sub-action described in the above spreadsheet into separate sub-sequences (divided up by actor, action and camera). As there are many of those, it is best to download the whole set from here (2.7GB).

**** Historical note

Note: this material is only historical (as we do not have well documented sources of these results)

Masks obtained by applying two different Tracking/Background Subtraction Methods to some of our Composite Actions

Each zip file contains masks (in their bounding boxes) corresponding to several sequences of composite actions performed by the actor A1 and captured from two camera views (V3 and V4) for the purpose of testing silhouette-based action recognition methods against more realistic input data (in conjunction with our MAS training data provided below), where the need for a temporal segmentation method is also clear.

More data and information to be added ...

Method1

Method2

Camera4 (3.8 MB)

Camera4 (3.7 MB)

Camera3

Camera3 (3.9 MB)

*** end of historical section

MuHAVi-MAS: Manually Annotated Silhouette Data

We recommend using this subset of MuHAVi to test Human Action Recognition (HAR) algorithms independently of the quality of silhouettes. For a fuller evaluation of a HAR algorithm, please consider using MuHAVi "uncut" instead.

We have selected 5 action classes and manually annotated the corresponding image frames to generate the corresponding silhouettes of the actors. These actions are listed in Table 4. It can be seen that we have only selected 2 actors and 2 camera views for these 5 actions. The silhouette images are in PNG format and each action combination can be downloaded as a small zip file (between 1 and 3 MB). We have also added 3 constant characters "GT-" to the beginning of every original image name to label them as ground truth images.
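The naming rule can be sketched as a pair of helpers that map between an original frame name and its ground truth counterpart. That the extension changes from .jpg to .png while the frame number is kept unchanged is an assumption based on the description above; the helper names are hypothetical.

```python
GT_PREFIX = "GT-"  # the 3 constant characters mentioned above


def ground_truth_name(jpeg_name):
    """'00000216.jpg' -> 'GT-00000216.png' (extension swap is an assumption)."""
    stem, _ext = jpeg_name.rsplit(".", 1)
    return f"{GT_PREFIX}{stem}.png"


def original_name(gt_name):
    """Inverse mapping: 'GT-00000216.png' -> '00000216.jpg'."""
    if not gt_name.startswith(GT_PREFIX):
        raise ValueError(f"not a ground truth file name: {gt_name}")
    stem, _ext = gt_name[len(GT_PREFIX):].rsplit(".", 1)
    return f"{stem}.jpg"


print(ground_truth_name("00000216.jpg"))
print(original_name("GT-00000216.png"))
```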

In the table below, you can click on the links to download the silhouette data for the corresponding action combinations.

action class

action name

combinations for silhouette annotation

WalkTurnBack

RunStop

Punch

Kick

ShotgunCollapse

C10

C11

C12

C13

C14

C15

C16

C17

Table 4. Action combinations corresponding to the MAS data for which ground truth silhouettes have been generated.

NEW! The table below contains links to the corresponding AVI video files (in MPEG2) from which the JPEG file sequences were extracted; these JPEG sequences were then used by the manual annotators to get the silhouettes. (Note that due to a software bug the JPEG sequences had a couple of frames missing towards the end of the sequence, therefore the AVI files would not exactly correspond to the silhouette frames. As this happens toward the end, it should not significantly affect work that evaluates automatic silhouette segmentation and that uses performance metrics based on aggregating and averaging results over the whole sequence.)

action class

action name

combinations for silhouette annotation

WalkTurnBack

RunStop

Punch

Kick

ShotgunCollapse

C10

C11

C12

C13

C14

C15

C16

C17

Table 5. Action combinations corresponding to the MAS data for which ground truth silhouettes have been generated.

Finally, the following table documents the frames that were manually segmented so that you can test foreground segmentation algorithms (i.e. this table tells you the correspondence between JPEG, AVI and PNG frames in the dataset). Please note that the human annotators worked on the JPEG files and hence there is a one-to-one correspondence between JPEG and PNG files. Because of a bug we later discovered in the version of mplayer that was used to generate the JPEG frames, there is a small difference in the number of frames in the AVI files, but we still suggest you use the AVI files, as the JPEG files were effectively transcoded from the original MPEG2 videos.

ActionActorCamera	GT InitFrame	GT EndFrame	GT NFrames	JPG Nframes	AVI Nframes
KickPerson1Camera3	2370	2911	542	3001	3003
KickPerson1Camera4	2370	2911	542	2997	2999
KickPerson4Camera3	200	628	429	731	733
KickPerson4Camera4	200	628	429	721	723
PunchPerson1Camera3	2140	2607	468	2746	2748
PunchPerson1Camera4	2140	2607	468	2750	2752
PunchPerson4Camera3	92	536	445	642	643
PunchPerson4Camera4	92	536	445	645	647
RunStopPerson1Camera3	980	1418	439	1572	1574
RunStopPerson1Camera4	980	1418	439	1572	1574
RunStopPerson4Camera3	293	618	326	751	753
RunStopPerson4Camera4	293	618	326	749	751
ShotGunCollapsePerson1Camera3	267	1104	838	1444	1446
ShotGunCollapsePerson1Camera4	267	1104	838	1443	1445
ShotGunCollapsePerson4Camera3	319	1208	890	1424	1426
ShotGunCollapsePerson4Camera4	319	1208	890	1424	1426
WalkTurnBackPerson1Camera3	216	682	467	866	868
WalkTurnBackPerson1Camera4	216	682	467	860	862
WalkTurnBackPerson4Camera3	207	672	466	836	838
WalkTurnBackPerson4Camera4	207	672	466	839	841

GTInitFrame: Frame number for the start of the manual annotation

GTEndFrame: Frame number for the end of the manual annotation

GTNFrames: Number of manually annotated frames = (GTEndFrame-GTInitFrame+1)

JPGNFrames: Total number of frames in the JPEG sequence (slightly fewer than AVINFrames)

AVINFrames: Total number of frames in the AVI sequence
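The relation GTNFrames = GTEndFrame - GTInitFrame + 1 and the small JPEG/AVI discrepancy can be checked directly on rows of the table. The sketch below uses three rows copied from the table above; the function and field names are illustrative only.

```python
# Three rows copied from the table above (whitespace-separated)
ROWS = """\
KickPerson1Camera3 2370 2911 542 3001 3003
PunchPerson4Camera3 92 536 445 642 643
WalkTurnBackPerson4Camera4 207 672 466 839 841
"""


def parse_row(line):
    """Split one table row into a name and its five integer columns."""
    name, *numbers = line.split()
    init, end, gt_n, jpg_n, avi_n = map(int, numbers)
    return {"name": name, "init": init, "end": end,
            "gt_n": gt_n, "jpg_n": jpg_n, "avi_n": avi_n}


def check_row(row):
    """Verify GTNFrames and return how many frames the JPEG sequence lacks."""
    expected = row["end"] - row["init"] + 1
    if row["gt_n"] != expected:
        raise ValueError(f"{row['name']}: GTNFrames should be {expected}")
    return row["avi_n"] - row["jpg_n"]


missing = {}
for line in ROWS.splitlines():
    row = parse_row(line)
    missing[row["name"]] = check_row(row)

print(missing)
```

For these three rows the JPEG sequences are one or two frames shorter than the AVI files, which matches the note about the mplayer bug.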

We have reorganized these 5 composite action classes as 14 primitive action classes as shown in the table below.

You may download the data by clicking here (32MB).

primitive action class	primitive action name	no. of samples
	CollapseRight	4 * 2 = 8
	CollapseLeft	4 * 2 = 8
	StandupRight	4 * 2 = 8
	StandupLeft	4 * 1 = 4
	KickRight	4 * 4 = 16
	GuardToKick	4 * 4 = 16
	PunchRight	4 * 4 = 16
	GuardToPunch	4 * 4 = 16
	RunRightToLeft	4 * 2 = 8
C10	RunLeftToRight	4 * 2 = 8
C11	WalkRightToLeft	4 * 2 = 8
C12	WalkLeftToRight	4 * 2 = 8
C13	TurnBackRight	4 * 2 = 8
C14	TurnBackLeft	4 * 1 = 4

These 14 primitive action classes may also be reorganized in 8 classes where similar actions make a single class as shown in the table below.

primitive action class	primitive action name	no. of samples
	Collapse (Right/Left)	4 * 4 = 16
	Standup (Right/Left)	4 * 3 = 12
	KickRight	4 * 4 = 16
	PunchRight	4 * 4 = 16
	Guard (ToKick/Punch)	4 * 8 = 32
	Run (Right/Left)	4 * 4 = 16
	Walk (Right/Left)	4 * 4 = 16
	TurnBack (Right/Left)	4 * 3 = 12
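The regrouping from 14 to 8 classes can be expressed as a simple label mapping, sketched below with the sample counts taken from the two tables above. The short merged names (e.g. "Collapse") are abbreviations chosen here for illustration.

```python
# Sample counts of the 14 primitive classes, copied from the first table
PRIMITIVE_COUNTS = {
    "CollapseRight": 8, "CollapseLeft": 8,
    "StandupRight": 8, "StandupLeft": 4,
    "KickRight": 16, "GuardToKick": 16,
    "PunchRight": 16, "GuardToPunch": 16,
    "RunRightToLeft": 8, "RunLeftToRight": 8,
    "WalkRightToLeft": 8, "WalkLeftToRight": 8,
    "TurnBackRight": 8, "TurnBackLeft": 4,
}

# Classes not listed here (KickRight, PunchRight) keep their own label
MERGE_MAP = {
    "CollapseRight": "Collapse", "CollapseLeft": "Collapse",
    "StandupRight": "Standup", "StandupLeft": "Standup",
    "GuardToKick": "Guard", "GuardToPunch": "Guard",
    "RunRightToLeft": "Run", "RunLeftToRight": "Run",
    "WalkRightToLeft": "Walk", "WalkLeftToRight": "Walk",
    "TurnBackRight": "TurnBack", "TurnBackLeft": "TurnBack",
}


def merge_counts(counts, mapping):
    """Sum sample counts after relabelling each class through the mapping."""
    merged = {}
    for name, n in counts.items():
        label = mapping.get(name, name)
        merged[label] = merged.get(label, 0) + n
    return merged


merged_counts = merge_counts(PRIMITIVE_COUNTS, MERGE_MAP)
print(merged_counts)
print(sum(PRIMITIVE_COUNTS.values()), sum(merged_counts.values()))
```

Both groupings cover the same 136 samples, so results on the 8-class and 14-class versions differ only in label granularity.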

Figure 2. Sample images of annotated silhouettes from the MAS data (for actor A1) corresponding to 20 selected action sequences (5 action classes, 2 actors and 2 cameras) from the MuHAVi data (as listed in Table 4).

Figure 3. Sample image frames from the MuHAVi data for 17 action classes, 7 actors and 8 camera views (as listed in Tables 1, 2 and 3, and shown in Fig. 1).

Publications that use the MuHAVi dataset (if you have any not listed here please let me know):

Singh, Sanchit, Sergio A. Velastin, and Hossein Ragheb. "Muhavi: A multicamera human action video dataset for the evaluation of action recognition methods." In Advanced Video and Signal Based Surveillance (AVSS), 2010 Seventh IEEE International Conference on, pp. 48-55. IEEE, 2010.

Marlon Alcântara, Thierry Moreira, and Hélio Pedrini, “Real-time action recognition based on cumulative motion shapes,” in Acoustics, Speech and Signal Processing (ICASSP), 2014.

@inproceedings{alcantara2014,
author = {Alc\^antara, Marlon and Moreira, Thierry and Pedrini, H\'elio},
title = {Real-Time Action Recognition Based On Cumulative Motion Shapes},
booktitle = {Acoustics, Speech and Signal Processing (ICASSP)},
year = {2014},
}

A. A. Chaaraoui, P. Climent-Pérez, and F. Flórez-Revuelta, “A review on vision techniques applied to Human Behaviour Analysis for Ambient-Assisted Living,” Expert Systems with Applications, vol. 39, no. 12, pp. 10873–10888, 2012.
Available at ScienceDirect: http://www.sciencedirect.com/science/article/pii/S0957417412004757

Climent-Pérez, Pau, Alexandros Andre Chaaraoui, and Francisco Flórez-Revuelta. "Useful Research Tools for Human Behaviour Understanding in the Context of Ambient Assisted Living." In Ambient Intelligence-Software and Applications, pp. 201-205. Springer Berlin Heidelberg, 2012.
Available at SpringerLink: http://link.springer.com/chapter/10.1007%2F978-3-642-28783-1_25?LI=true
Also uploaded at ResearchGate: http://www.researchgate.net/publication/224960636_Useful_Research_Tools_for_Human_Behaviour_Understanding_in_the_Context_of_Ambient_Assisted_Living

Chaaraoui, Alexandros Andre, Pau Climent-Pérez, and Francisco Flórez-Revuelta. "An efficient approach for multi-view human action recognition based on bag-of-key-poses." In Human Behavior Understanding, pp. 29-40. Springer Berlin Heidelberg, 2012.
Available at SpringerLink: http://link.springer.com/chapter/10.1007%2F978-3-642-34014-7_3?LI=true
Also uploaded at ResearchGate: http://www.researchgate.net/publication/232297472_An_Efficient_Approach_for_Multi-view_Human_Action_Recognition_Based_on_Bag-of-Key-Poses

A. A. Chaaraoui, P. Climent-Pérez, and F. Flórez-Revuelta, “Silhouette-based Human Action Recognition using Sequences of Key Poses,” Pattern Recognition Letters, vol. 34, no. 15, pp. 1799-1807, 2013.
Available at ScienceDirect: http://www.sciencedirect.com/science/article/pii/S0167865513000342
Also uploaded at ResearchGate: http://www.researchgate.net/publication/236306638_Silhouette-based_Human_Action_Recognition_using_Sequences_of_Key_Poses

Chaaraoui, Alexandros Andre, and Francisco Flórez-Revuelta. "Human action recognition optimization based on evolutionary feature subset selection." In Proceeding of the fifteenth annual conference on Genetic and evolutionary computation conference, pp. 1229-1236. ACM, 2013. Available at: http://hdl.handle.net/10045/33675

A. A. Chaaraoui, and F. Flórez-Revuelta, “Optimizing human action recognition based on a cooperative coevolutionary algorithm,” Engineering Applications of Artificial Intelligence, Available online 30 October 2013, ISSN 0952-1976, http://dx.doi.org/10.1016/j.engappai.2013.10.003. Available at ScienceDirect: http://www.sciencedirect.com/science/article/pii/S0952197613002066

A. A. Chaaraoui, and F. Flórez-Revuelta, “Vision-based Recognition of Human Behaviour for Intelligent Environments”, PhD Thesis, University of Alicante, 2014.
Available at: http://hdl.handle.net/10045/36395

Chaaraoui, Alexandros Andre, José Ramón Padilla-López, Francisco Javier Ferrández-Pastor, Mario Nieto-Hidalgo, and Francisco Flórez-Revuelta. "A Vision-Based System for Intelligent Monitoring: Human Behaviour Analysis and Privacy by Context." Sensors 14, no. 5 (2014): 8895-8925.

Cheema, Shahzad, Abdalrahman Eweiwi, Christian Thurau, and Christian Bauckhage. "Action recognition by learning discriminative key poses." In Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, pp. 1302-1309. IEEE, 2011.

Moghaddam, Zia, and Massimo Piccardi. "Robust density modelling using the student's t-distribution for human action recognition." In Image Processing (ICIP), 2011 18th IEEE International Conference on, pp. 3261-3264. IEEE, 2011.

Martinez-Contreras, Francisco, Carlos Orrite-Urunuela, Elias Herrero-Jaraba, Hossein Ragheb, and Sergio A. Velastin. "Recognizing human actions using silhouette-based HMM." In Advanced Video and Signal Based Surveillance, 2009. AVSS'09. Sixth IEEE International Conference on, pp. 43-48. IEEE, 2009.

Eweiwi, Abdalrahman, Shahzad Cheema, Christian Thurau, and Christian Bauckhage. "Temporal key poses for human action recognition." In Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, pp. 1310-1317. IEEE, 2011.

Kumari, Sonal, and Suman K. Mitra. "Human Action Recognition Using DFT." In Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), 2011 Third National Conference on, pp. 239-242. IEEE, 2011.

López, Dennis Romero, Anselmo Frizera Neto, and Teodiano Freire Bastos. "Reconocimiento en-línea de acciones humanas basado en patrones de RWE aplicado en ventanas dinámicas de momentos invariantes." Revista Iberoamericana de Automática e Informática Industrial RIAI 11, no. 2 (2014): 202-211.

Karthikeyan, Shanmugavadivel, Utkarsh Gaur, Bangalore S. Manjunath, and Scott Grafton. "Probabilistic subspace-based learning of shape dynamics modes for multi-view action recognition." In Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, pp. 1282-1286. IEEE, 2011.

Martínez-Usó, Adolfo, G. Salgues, and Sergio A. Velastin. "Evaluation of unsupervised segmentation algorithms for silhouette extraction in human action video sequences." In Visual Informatics: Sustaining Research and Innovations, pp. 13-22. Springer Berlin Heidelberg, 2011.

Piccardi, Massimo, and Zia Moghaddam. "Robust Density Modelling Using the Student's t-distribution for Human Action Recognition." (2011).

Wu, Xinxiao, and Yunde Jia. "View-invariant action recognition using latent kernelized structural SVM." In Computer Vision–ECCV 2012, pp. 411-424. Springer Berlin Heidelberg, 2012.

Moghaddam, Zia, and Massimo Piccardi. "Histogram-based training initialisation of hidden markov models for human action recognition." In Advanced Video and Signal Based Surveillance (AVSS), 2010 Seventh IEEE International Conference on, pp. 256-261. IEEE, 2010.

Gallego, Jaime, and Montse Pardas. "Enhanced bayesian foreground segmentation using brightness and color distortion region-based model for shadow removal." In Image Processing (ICIP), 2010 17th IEEE International Conference on, pp. 3449-3452. IEEE, 2010.

Rahman, Md Junaedur, J. Martínez del Rincón, Jean-Christophe Nebel, and Dimitrios Makris. "Body Pose based Pedestrian Tracking in a Particle Filtering Framework." (2013).

El-Sallam, Amar A., and Ajmal S. Mian. "Human body pose estimation from still images and video frames." In Image Analysis and Recognition, pp. 176-188. Springer Berlin Heidelberg, 2010.

Htike, Zaw Zaw, Simon Egerton, and Kuang Ye Chow. "Monocular viewpoint invariant human activity recognition." In Robotics, Automation and Mechatronics (RAM), 2011 IEEE Conference on, pp. 18-23. IEEE, 2011.

Holte, Michael B., Cuong Tran, Mohan M. Trivedi, and Thomas B. Moeslund. "Human action recognition using multiple views: a comparative perspective on recent developments." In Proceedings of the 2011 joint ACM workshop on Human gesture and behavior understanding, pp. 47-52. ACM, 2011.

Adeli Mosabbeb, Ehsan, Kaamran Raahemifar, and Mahmood Fathy. "Multi-View Human Activity Recognition in Distributed Camera Sensor Networks." Sensors 13, no. 7 (2013): 8750-8770.

Cheng, Zhongwei, Lei Qin, Yituo Ye, Qingming Huang, and Qi Tian. "Human daily action analysis with multi-view and color-depth data." In Computer Vision–ECCV 2012. Workshops and Demonstrations, pp. 52-61. Springer Berlin Heidelberg, 2012.

Abdul Rahman, Farah Yasmin, Aini Hussain, Wan Mimi Diyana Wan Zaki, Halimah Badioze Zaman, and Nooritawati Md Tahir. "Enhancement of Background Subtraction Techniques Using a Second Derivative in Gradient Direction Filter." Journal of Electrical and Computer Engineering 2013 (2013).

Concha, Oscar Perez, Richard Yi Da Xu, and Massimo Piccardi. "Robust Dimensionality Reduction for Human Action Recognition." In Digital Image Computing: Techniques and Applications (DICTA), 2010 International Conference on, pp. 349-356. IEEE, 2010.

Moghaddam, Zia, and Massimo Piccardi. "Training Initialization of Hidden Markov Models in Human Action Recognition." 1-15.

Templates, Motion. "Independent Viewpoint Silhouette-Based Human Action Modeling and Recognition." Handbook on Soft Computing for Video Surveillance (2012): 185.

Borzeshi, Ehsan Zare, Massimo Piccardi, and R. Y. D. Xu. "A discriminative prototype selection approach for graph embedding in human action recognition." In Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, pp. 1295-1301. IEEE, 2011.

Gallego, Jaime, Montse Pardàs, and Gloria Haro. "Enhanced foreground segmentation and tracking combining Bayesian background, shadow and foreground modeling." Pattern Recognition Letters 33, no. 12 (2012): 1558-1568.

Piccardi, Massimo, Yi Da Xu, and Ehsan Zare Borzeshi. "A discriminative prototype selection approach for graph embedding in human action recognition." (2011).

Borzeshi, Ehsan Zare, Oscar Perez Concha, and Massimo Piccardi. "Human action recognition in video by fusion of structural and spatio-temporal features." In Structural, Syntactic, and Statistical Pattern Recognition, pp. 474-482. Springer Berlin Heidelberg, 2012.

Tweed, David S., and James M. Ferryman. "Enhancing change detection in low-quality surveillance footage using markov random fields." In Proceedings of the 1st ACM workshop on Vision networks for behavior analysis, pp. 23-30. ACM, 2008.

Chen, Fan, and Christophe De Vleeschouwer. "Robust volumetric reconstruction from noisy multi-view foreground occupancy masks." In Asia-Pacific Signal and Information Processing Association Annual Summit and Conference. 2011.

Määttä, Tommi, Aki Härmä, and Hamid Aghajan. "On efficient use of multi-view data for activity recognition." In Proceedings of the Fourth ACM/IEEE International Conference on Distributed Smart Cameras, pp. 158-165. ACM, 2010.

Nebel, Jean-Christophe, Paul Kuo, and Dimitrios Makris. "2D and 3D Pose Recovery from a Single Uncalibrated Video." In Multimedia Analysis, Processing and Communications, pp. 391-411. Springer Berlin Heidelberg, 2011.