MuHAVi: Multicamera Human Action Video Data

including selected action sequences with

MAS: Manually Annotated Silhouette Data

for the evaluation of human action recognition methods

Last updated on the 31st August 2014




Part of the REASON project funded by the UK's Engineering and Physical Sciences Research Council (EPSRC)

This dataset has been put together by the project's team based at Kingston University's Digital Imaging Research Centre

The 2014 updates were done as part of a Chair of Excellence stay at the Applied Artificial Intelligence Group of the Universidad Carlos III de Madrid


The experiments and data collection were organized and carried out by the project's team based at Kingston University together with partners based at Reading University and UCL's Pamela Laboratory.

NOTE: We have also made our virtual human action silhouette data available online (visit our ViHASi page).

If you publish work that uses this dataset, please use the following reference:

S Singh, S.A. Velastin and H Ragheb - "MuHAVi: A Multicamera Human Action Video Dataset for the Evaluation of Action Recognition Methods" in 2nd Workshop on Activity monitoring by multi-camera surveillance systems (AMMCSS), pp. 48--55, August 29, Boston, USA, (2010), DOI: 10.1109/AVSS.2010.63


You can also find here a list of publications that use the MuHAVi dataset.


New (28.08.2014):

 


We have collected a large body of human action video (MuHAVi) data using 8 cameras. There are 17 action classes, listed in Table 2, performed by 14 actors. So far we have processed the videos corresponding to 7 actors in order to split the actions and provide the JPG image frames. However, we have included some image frames before and after the actual action, for the purposes of background subtraction, tracking, etc. The longest pre-action segments correspond to the actor called Person1. Note that what we provide is therefore temporally segmented actions, as was typical when the dataset was first released. We now also provide, on an experimental basis (see below), long unsegmented sequences for those who wish to work on temporal segmentation.

Each actor performs each action several times in the action zone highlighted with white tapes on the scene floor. As the actors were amateurs, the session leader occasionally had to interrupt them and ask them to redo an action for consistency. As shown in Fig. 1 and Table 1, we used 8 CCTV Schwan cameras located at the 4 sides and 4 corners of a rectangular platform. Note that these cameras are not necessarily synchronised. Camera calibration information may be included here in the future; meanwhile, one can use the patterns on the scene floor to calibrate the cameras of interest.
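As a rough sketch of how the floor patterns might be used: if you measure the ground-plane coordinates of at least four floor markings and click their pixel positions in a given view, a plane-to-image homography can be estimated with the standard DLT algorithm. All point values below are made up for illustration; this is not part of the dataset itself.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate the 3x3 planar homography H mapping src points to dst points
    (both sequences of (x, y) pairs, at least 4 non-degenerate ones) using
    the standard DLT algorithm."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of A (the smallest singular vector).
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pt):
    """Map a single (x, y) point through H."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return x / w, y / w

# Toy check: the unit square mapped to a square scaled by 2.
src = [(0, 0), (1, 0), (0, 1), (1, 1)]
dst = [(0, 0), (2, 0), (0, 2), (2, 2)]
H = estimate_homography(src, dst)
```

With real data, `src` would hold ground-plane coordinates of the floor markings (in metres, say) and `dst` their pixel positions in the chosen camera view.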

Note that, to prepare training data for action recognition methods, each of our action classes may be broken into at least two primitive actions. For instance, the action "WalkTurnBack" consists of the primitive actions walk and turn back. Further, although it is not quite natural for a collapse action (due to a shotgun) to be followed by a standing-up action, one can simply split them into two separate action classes.

We make the data available to researchers in the computer vision community through a password-protected server at the Digital Imaging Research Centre of Kingston University London. The data may be accessed by sending an email (with subject "MuHAVi-MAS Data") to Prof Sergio A Velastin at sergio.velastin@ieee.org, giving the names and email addresses of the researchers who wish to use the data and their main purposes. We request this so as to build a list of people using this dataset and form a "MuHAVi community" with whom to communicate. The only requirement for using the MuHAVi data is to refer to this site in the corresponding publications.

 

Figure 1. The top view of the configuration of 8 cameras used to capture the actions in the blue action zone (which is marked with white tapes on the scene floor).

camera symbol   camera name
V1              Camera_1
V2              Camera_2
V3              Camera_3
V4              Camera_4
V5              Camera_5
V6              Camera_6
V7              Camera_7
V8              Camera_8

Table 1. Camera view names appearing in the MuHAVi data folders and the corresponding symbols used in Fig. 1.


In the table below, you can click on the links to download the data (JPG images) for the corresponding action.

Important: we noted that some earlier versions of MS Internet Explorer could not download files larger than 2 GB, so we recommend using an alternative browser such as Firefox or Chrome.

Each tar file contains 7 folders corresponding to the 7 actors (Person1 to Person7), each of which contains 8 folders corresponding to the 8 cameras (Camera_1 to Camera_8). The image frames of every action/actor/camera combination are numbered from 00000001.jpg for simplicity. The video frame rate is 25 frames per second and the resolution of the image frames is 720 x 576 pixels (columns x rows), except for Camera_8, whose resolution is 704 x 576.
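The folder layout above can be iterated with a few lines of code. The sketch below assumes each action tar file has been unpacked to `<root>/<Action>/<PersonN>/<Camera_M>/00000001.jpg ...`; the directory names are as described, but the root path is yours.

```python
import os

FPS = 25  # video frame rate stated above

def frame_paths(root, action, person, camera):
    """Yield the JPG frame paths of one action/actor/camera combination
    in temporal order (frames are named 00000001.jpg, 00000002.jpg, ...)."""
    folder = os.path.join(root, action, person, camera)
    for name in sorted(os.listdir(folder)):
        if name.lower().endswith(".jpg"):
            yield os.path.join(folder, name)

def frame_to_seconds(frame_number):
    """Timestamp (in seconds) of a 1-based frame number at 25 fps."""
    return (frame_number - 1) / FPS
```

For example, `frame_paths(root, "Kick", "Person1", "Camera_3")` would enumerate one of the sequences listed in Table 2.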

action class   action name         size
C1             WalkTurnBack        2.6 GB
C2             RunStop             2.5 GB
C3             Punch               3.0 GB
C4             Kick                3.4 GB
C5             ShotGunCollapse     4.3 GB
C6             PullHeavyObject     4.5 GB
C7             PickupThrowObject   3.0 GB
C8             WalkFall            3.9 GB
C9             LookInCar           4.6 GB
C10            CrawlOnKnees        3.4 GB
C11            WaveArms            2.2 GB
C12            DrawGraffiti        2.7 GB
C13            JumpOverFence       4.4 GB
C14            DrunkWalk           4.0 GB
C15            ClimbLadder         2.1 GB
C16            SmashObject         3.3 GB
C17            JumpOverGap         2.6 GB

Table 2. Action class names appearing in the MuHAVi data folders and the corresponding symbols used in Fig. 3.

 

actor symbol   actor name
A1             Person1
A2             Person2
A3             Person3
A4             Person4
A5             Person5
A6             Person6
A7             Person7

Table 3. Actor names appearing in the MuHAVi data folders and the corresponding symbols used in Fig. 3.



NEW MuHAVi "uncut" (June 2014)

So far, MuHAVi has consisted of temporally segmented action clips, as described above.

Thanks to work done by Dr Zezhi Chen of Kingston University, Jorge Sepúlveda of the University of Santiago de Chile (USACH) and Prof. Sergio A Velastin, now at USACH and while on a "Chair of Excellence" stay at the Universidad Carlos III de Madrid, we are now able to provide the full-length ("uncut") videos and automatically generated silhouettes for them.

You can now download the full-length videos from here. Because of the size of these videos, use right-click / "Save Link As" and a high-speed network:

Camera      Video file               Size
Camera_1    20080213AM-Cam1.mpg      6.8 GB
Camera_2A   dvcam3-1.mpg             3.3 GB
Camera_2B   dvcam3-2.mpg             3.8 GB
Camera_3A   dvcam2-1.mpg             3.3 GB
Camera_3B   dvcam2-2.mpg             3.8 GB
Camera_4    20080213AM-Cam4.mpg      7.0 GB
Camera_5    20080213AM-Cam5.mpg      7.6 GB
Camera_6A   dvcam1-1.mpg             3.3 GB
Camera_6B   dvcam1-2.mpg             3.8 GB
Camera_7    20080213AM-Cam7.mpg      5.7 GB
Camera_8A   Reason20080213-am1.m2p   2.1 GB
Camera_8B   Reason20080213-am2.m2p   0.9 GB

(The A and B parts occur because, in the original recordings, the recorder had to be stopped to change media.)

Currently we have these sets of silhouettes, generated by varying one parameter in the foreground detection algorithm (this is explained after the table):

[Apologies: currently only the data for Camera_1, Camera_4 and Camera_5 are available]

Sequence    File                     File size   Parameter
Camera_1    20080213AM-Cam1-3.0      6.4 GB      3.0
Camera_1    20080213AM-Cam1-4.0      3.6 GB      4.0
Camera_1    20080213AM-Cam1-6.0      1.6 GB      6.0 *
Camera_1    20080213AM-Cam1-9.0      0.8 GB      9.0 *
Camera_2A   dvcam3-1-3.0             1.7 GB      3.0
Camera_2A   dvcam3-1-4.0             1.1 GB      4.0
Camera_2B   dvcam3-2-3.0             1.9 GB      3.0
Camera_2B   dvcam3-2-4.0             1.2 GB      4.0
Camera_3A   dvcam2-1-3.0             1.8 GB      3.0
Camera_3A   dvcam2-1-4.0             1.2 GB      4.0
Camera_3B   dvcam2-2-3.0             2.2 GB      3.0
Camera_3B   dvcam2-2-4.0             1.4 GB      4.0
Camera_4    20080213AM-Cam4-3.0      5.2 GB      3.0
Camera_4    20080213AM-Cam4-4.0      3.4 GB      4.0
Camera_5    20080213AM-Cam5-3.0      7.1 GB      3.0
Camera_5    20080213AM-Cam5-4.0      3.4 GB      4.0
Camera_6A   dvcam1-1-3.0             1.4 GB      3.0
Camera_6A   dvcam1-1-4.0             0.9 GB      4.0
Camera_6B   dvcam1-2-3.0             1.7 GB      3.0
Camera_6B   dvcam1-2-4.0             1.7 GB      4.0
Camera_7    20080213AM-Cam7-3.0      6.8 GB      3.0
Camera_7    20080213AM-Cam7-4.0      4.4 GB      4.0
Camera_8A   Reason20080213-am1-3.0   0.9 GB      3.0
Camera_8A   Reason20080213-am1-4.0   0.6 GB      4.0
Camera_8B   Reason20080213-am2-3.0   0.35 GB     3.0
Camera_8B   Reason20080213-am2-4.0   0.25 GB     4.0

* Provided to show silhouettes with less noise (fewer false positives) but also smaller silhouettes (fewer true positives). If there is a strong feeling in the community that we should provide results with either of these settings for all video files, we can do so.

Each compressed archive contains files named %d.png where the number is the frame number. In each file, black (0) represents the background, white (255) the foreground (i.e. the silhouette), and grey (127) a detected shadow (normally to be treated as background).
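Under this convention, reducing a silhouette PNG (once loaded into a `uint8` array with your image library of choice) to a binary foreground mask is a one-liner; a minimal sketch:

```python
import numpy as np

BACKGROUND, SHADOW, FOREGROUND = 0, 127, 255  # grey levels in the silhouette PNGs

def to_binary_silhouette(mask, shadow_is_background=True):
    """Reduce a silhouette image (uint8 array) to a boolean foreground mask.
    Shadow pixels (127) are normally treated as background."""
    fg = mask == FOREGROUND
    if not shadow_is_background:
        fg = fg | (mask == SHADOW)
    return fg

# Toy 1x4 "image": background, shadow, foreground, background.
m = np.array([[BACKGROUND, SHADOW, FOREGROUND, BACKGROUND]], dtype=np.uint8)
```

Here `to_binary_silhouette(m)` marks only the 255 pixel as foreground; passing `shadow_is_background=False` also includes the shadow pixel.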

The "Parameter" is one of the factors that affect foreground detection in terms of true positives vs false positives. When tested against the manually annotated silhouettes, a value of 3.0 produces a TPR (true positive rate) of around 0.78 and an FPR (false positive rate) of around 0.027, while 4.0 gives around 0.71 and 0.013, and 5.0 gives 0.625 and less than 0.01 (i.e. less noise but less foreground). As many of the false positives tend to be noise outside the main silhouette, we expect that most people will use the set with the higher TPR and reduce the false positives themselves, e.g. with morphological filtering. When publishing results, please ensure that you give full details of any post-processing of this kind.
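For reference, the TPR and FPR figures quoted above follow the usual per-pixel definitions; a small sketch of the computation on boolean masks:

```python
import numpy as np

def tpr_fpr(pred, gt):
    """Per-pixel true/false positive rates of a predicted boolean foreground
    mask against a ground-truth boolean mask of the same shape."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    tpr = (pred & gt).sum() / gt.sum()        # foreground correctly detected
    fpr = (pred & ~gt).sum() / (~gt).sum()    # background wrongly detected
    return float(tpr), float(fpr)

# Toy example: one of two foreground pixels hit, one of two background missed.
gt   = np.array([True, True, False, False])
pred = np.array([True, False, True, False])
```

With these toy arrays `tpr_fpr(pred, gt)` gives (0.5, 0.5); with real data, `pred` would come from your detector and `gt` from the MAS annotations.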

foreground ROC curves

The ground truth file can be obtained here in spreadsheet format. (Incidentally, it also describes how the MuHAVi JPEG sequences were obtained from AVI files extracted using manually obtained temporal markers. Please also note that we discovered a bug in mplayer, used to convert from AVI to JPEGs, that resulted in some skipped frames in the JPEG sequences.)

Below is an extract from the spreadsheet.
  1. The first column refers to the camera and actor numbers.
  2. The second column header gives the action.
  3. The numbers in the second column for each person give the frame number, in the video sequence, where the action starts, and the third column where it ends (this is somewhat subjective, of course, and the community needs to agree on a metric that would not unjustly penalise algorithms).
  4. The other numbers are there to check that the same action, captured by another camera, has the same number of frames.
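The relationships between the columns can be sketched in a few lines: the relative markers are offsets from the synchronisation-event frame, and the duration is simply End minus Start (the values below are taken from the Camera_1 extract that follows):

```python
TIME_REF_CAM1 = 76561  # synchronisation-event frame for Camera_1

def relative_markers(start, end, time_ref):
    """Start-Rel, End-Rel and Duration columns of the spreadsheet:
    frame markers relative to the synchronisation event, and End - Start."""
    return start - time_ref, end - time_ref, end - start

# Person1, WalkTurnBack, Camera_1: absolute frames 76434..77302.
print(relative_markers(76434, 77302, TIME_REF_CAM1))  # (-127, 741, 868)
```

These match the -127 / 741 / 868 values in the first row of the extract, and the same 868-frame duration reappears for Person1 in the Camera_3 block.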

Camera_1 WalkTurnBack
20080213AM-Cam1.mpg

Time ref 76561 (00:51:02): where the synchronisation event takes place (see picture).

         Start  End    Start-Rel  End-Rel  Duration
Person1  76434  77302  -127       741      868
Person2  77302  78157  741        1596     855
Person3  78157  79017  1596       2456     860
Person4  79017  79865  2456       3304     848
Person5  79865  80800  3304       4239     935
Person6  80800  81775  4239       5214     975
Person7  81775  82669  5214       6108     894







Camera_3 WalkTurnBack  (dvcam2-1.mpg, offset -127)

Time ref 144

         Start  End   Duration
Person1  17     885   868
Person2  885    1740  855
Person3  1740   2600  860
Person4  2600   3448  848
Person5  3448   4383  935
Person6  4383   5358  975
Person7  5358   6252  894







Note: this material is of historical interest only (as we do not have well-documented sources for these results)

Masks obtained by applying two different Tracking/Background Subtraction Methods to some of our Composite Actions

Each zip file contains masks (within their bounding boxes) corresponding to several sequences of composite actions performed by actor A1 and captured from two camera views (V3 and V4), for the purpose of testing silhouette-based action recognition methods against more realistic input data (in conjunction with our MAS training data provided below), where the need for a temporal segmentation method is also clear.

More data and information to be added ...

Method1: Camera4 (3.8 MB), Camera3
Method2: Camera4 (3.7 MB), Camera3 (3.9 MB)

*** end of historical section



MuHAVi-MAS: Manually Annotated Silhouette Data


We recommend using this subset of MuHAVi to test Human Action Recognition (HAR) algorithms independently of the quality of the silhouettes. For a fuller evaluation of a HAR algorithm, please consider using MuHAVi "uncut" instead.


We have selected 5 action classes and manually annotated the corresponding image frames to generate the silhouettes of the actors. These actions are listed in Table 4; note that we have selected only 2 actors and 2 camera views for these 5 actions. The silhouette images are in PNG format, and each action combination can be downloaded as a small zip file (between 1 and 3 MB). We have also added the 3-character prefix "GT-" to the beginning of every original image name to label them as ground truth images.

In the table below, you can click on the links to download the silhouette data for the corresponding action combinations.

action class   action name       combinations for silhouette annotation
C1             WalkTurnBack      Person1Camera3  Person1Camera4  Person4Camera3  Person4Camera4
C2             RunStop           Person1Camera3  Person1Camera4  Person4Camera3  Person4Camera4
C3             Punch             Person1Camera3  Person1Camera4  Person4Camera3  Person4Camera4
C4             Kick              Person1Camera3  Person1Camera4  Person4Camera3  Person4Camera4
C5             ShotgunCollapse   Person1Camera3  Person1Camera4  Person4Camera3  Person4Camera4
C6 to C17      (no annotated combinations)

Table 4. Action combinations corresponding to the MAS data for which ground truth silhouettes have been generated.

NEW! The table below contains links to the corresponding AVI video files (in MPEG2) from which the JPEG file sequences were extracted, which were then used by the manual annotators to produce the silhouettes. (Note that due to a software bug the JPEG sequences have a couple of frames missing towards the end of each sequence, so the AVI files do not correspond exactly to the silhouette frames. As this happens towards the end, it should not significantly affect work that evaluates automatic silhouette segmentation using performance metrics that aggregate and average results over the whole sequence.)

action class   action name       combinations for silhouette annotation
C1             WalkTurnBack      Person1Camera3  Person1Camera4  Person4Camera3  Person4Camera4
C2             RunStop           Person1Camera3  Person1Camera4  Person4Camera3  Person4Camera4
C3             Punch             Person1Camera3  Person1Camera4  Person4Camera3  Person4Camera4
C4             Kick              Person1Camera3  Person1Camera4  Person4Camera3  Person4Camera4
C5             ShotgunCollapse   Person1Camera3  Person1Camera4  Person4Camera3  Person4Camera4
C6 to C17      (no annotated combinations)

Table 5. AVI video files corresponding to the MAS action combinations for which ground truth silhouettes have been generated.

Finally, the following table documents the frames that were manually segmented, so that you can test foreground segmentation algorithms (i.e. this table gives the correspondence between the JPEG, AVI and PNG frames in the dataset). Please note that the human annotators worked on the JPEG files, so there is a one-to-one correspondence between JPEG and PNG files. Because of a bug we later discovered in the version of mplayer used to generate the JPEG frames, there is a small difference in the number of frames in the AVI files; we nevertheless suggest you use the AVI files, as the JPEGs were effectively transcoded from the original MPEG2 videos.

Action/Actor/Camera  GTInitFrame  GTEndFrame  GTNFrames  JPGNFrames  AVINFrames
KickPerson1Camera3 2370 2911 542 3001 3003
KickPerson1Camera4 2370 2911 542 2997 2999
KickPerson4Camera3 200 628 429 731 733
KickPerson4Camera4 200 628 429 721 723
PunchPerson1Camera3 2140 2607 468 2746 2748
PunchPerson1Camera4 2140 2607 468 2750 2752
PunchPerson4Camera3 92 536 445 642 643
PunchPerson4Camera4 92 536 445 645 647
RunStopPerson1Camera3 980 1418 439 1572 1574
RunStopPerson1Camera4 980 1418 439 1572 1574
RunStopPerson4Camera3 293 618 326 751 753
RunStopPerson4Camera4 293 618 326 749 751
ShotGunCollapsePerson1Camera3 267 1104 838 1444 1446
ShotGunCollapsePerson1Camera4 267 1104 838 1443 1445
ShotGunCollapsePerson4Camera3 319 1208 890 1424 1426
ShotGunCollapsePerson4Camera4 319 1208 890 1424 1426
WalkTurnBackPerson1Camera3 216 682 467 866 868
WalkTurnBackPerson1Camera4 216 682 467 860 862
WalkTurnBackPerson4Camera3 207 672 466 836 838
WalkTurnBackPerson4Camera4 207 672 466 839 841

GTInitFrame:    Frame number for the start of the manual annotation
GTEndFrame:   Frame number for the end of the manual annotation
GTNFrames:  Number of manually annotated frames = (GTEndFrame-GTInitFrame+1)
JPGNFrames: Total number of frames in the JPEG sequence (slightly less than AVINFrames)
AVINFrames: Total number of frames in the AVI sequence
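The GTNFrames definition can be sanity-checked directly against the table; for example, with a few rows transcribed from it:

```python
# Rows transcribed from the table above: (sequence, GTInitFrame, GTEndFrame,
# GTNFrames). GTNFrames should equal GTEndFrame - GTInitFrame + 1 in every case.
rows = [
    ("KickPerson1Camera3",            2370, 2911, 542),
    ("PunchPerson4Camera3",             92,  536, 445),
    ("ShotGunCollapsePerson4Camera4",  319, 1208, 890),
    ("WalkTurnBackPerson4Camera4",     207,  672, 466),
]
for name, init, end, n_frames in rows:
    assert n_frames == end - init + 1, name
print("all transcribed rows satisfy GTNFrames = GTEndFrame - GTInitFrame + 1")
```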



We have reorganized these 5 composite action classes as 14 primitive action classes as shown in the table below.

You may download the data by clicking here (32MB).

primitive action class   primitive action name   no. of samples
C1                       CollapseRight           4 * 2 = 8
C2                       CollapseLeft            4 * 2 = 8
C3                       StandupRight            4 * 2 = 8
C4                       StandupLeft             4 * 1 = 4
C5                       KickRight               4 * 4 = 16
C6                       GuardToKick             4 * 4 = 16
C7                       PunchRight              4 * 4 = 16
C8                       GuardToPunch            4 * 4 = 16
C9                       RunRightToLeft          4 * 2 = 8
C10                      RunLeftToRight          4 * 2 = 8
C11                      WalkRightToLeft         4 * 2 = 8
C12                      WalkLeftToRight         4 * 2 = 8
C13                      TurnBackRight           4 * 2 = 8
C14                      TurnBackLeft            4 * 1 = 4

 

These 14 primitive action classes may also be reorganized into 8 classes, where similar actions form a single class, as shown in the table below.

primitive action class   primitive action name    no. of samples
C1                       Collapse (Right/Left)    4 * 4 = 16
C2                       Standup (Right/Left)     4 * 3 = 12
C3                       KickRight                4 * 4 = 16
C4                       PunchRight               4 * 4 = 16
C5                       Guard (ToKick/Punch)     4 * 8 = 32
C6                       Run (Right/Left)         4 * 4 = 16
C7                       Walk (Right/Left)        4 * 4 = 16
C8                       TurnBack (Right/Left)    4 * 3 = 12

 


Figure 2. Sample images of annotated silhouettes from the MAS data (for actor A1) corresponding to 20 selected action sequences (5 action classes, 2 actors and 2 cameras) from the MuHAVi data (as listed in Table 4).

 



Figure 3. Sample image frames from the MuHAVi data for 17 action classes, 7 actors and 8 camera views (as listed in Table 1, 2 and 3, and, shown in Fig. 1).


Publications that use the MuHAVi dataset (if you know of any not listed here, please let us know):

Singh, Sanchit, Sergio A. Velastin, and Hossein Ragheb. "Muhavi: A multicamera human action video dataset for the evaluation of action recognition methods." In Advanced Video and Signal Based Surveillance (AVSS), 2010 Seventh IEEE International Conference on, pp. 48-55. IEEE, 2010.

Marlon Alcântara, Thierry Moreira, and Hélio Pedrini, “Real-time action recognition based on cumulative motion shapes,” in Acoustics, Speech and Signal Processing (ICASSP), 2014.

@inproceedings{alcantara2014,
  author = {Alc\^antara, Marlon and Moreira, Thierry and Pedrini, H\'elio},
  title = {Real-Time Action Recognition Based On Cumulative Motion Shapes},
  booktitle = {Acoustics, Speech and Signal Processing (ICASSP)},
  year = {2014},
}

A. A. Chaaraoui, P. Climent-Pérez, and F. Flórez-Revuelta, “A review on vision techniques applied to Human Behaviour Analysis for Ambient-Assisted Living,” Expert Systems with Applications, vol. 39, no. 12, pp. 10873–10888, 2012.
Available at ScienceDirect:
http://www.sciencedirect.com/science/article/pii/S0957417412004757

Climent-Pérez, Pau, Alexandros Andre Chaaraoui, and Francisco Flórez-Revuelta. "Useful Research Tools for Human Behaviour Understanding in the Context of Ambient Assisted Living." In Ambient Intelligence-Software and Applications, pp. 201-205. Springer Berlin Heidelberg, 2012.
Available at SpringerLink:
http://link.springer.com/chapter/10.1007%2F978-3-642-28783-1_25?LI=true
Also uploaded at ResearchGate:
http://www.researchgate.net/publication/224960636_Useful_Research_Tools_for_Human_Behaviour_Understanding_in_the_Context_of_Ambient_Assisted_Living

Chaaraoui, Alexandros Andre, Pau Climent-Pérez, and Francisco Flórez-Revuelta. "An efficient approach for multi-view human action recognition based on bag-of-key-poses." In Human Behavior Understanding, pp. 29-40. Springer Berlin Heidelberg, 2012.
Available at SpringerLink:
http://link.springer.com/chapter/10.1007%2F978-3-642-34014-7_3?LI=true
Also uploaded at ResearchGate:
http://www.researchgate.net/publication/232297472_An_Efficient_Approach_for_Multi-view_Human_Action_Recognition_Based_on_Bag-of-Key-Poses

A. A. Chaaraoui, P. Climent-Pérez, and F. Flórez-Revuelta, “Silhouette-based Human Action Recognition using Sequences of Key Poses,” Pattern Recognition Letters, vol. 34, no. 15, pp. 1799-1807, 2013.
Available at ScienceDirect:
http://www.sciencedirect.com/science/article/pii/S0167865513000342
Also uploaded at ResearchGate:
http://www.researchgate.net/publication/236306638_Silhouette-based_Human_Action_Recognition_using_Sequences_of_Key_Poses

Chaaraoui, Alexandros Andre, and Francisco Flórez-Revuelta. "Human action recognition optimization based on evolutionary feature subset selection." In Proceeding of the fifteenth annual conference on Genetic and evolutionary computation conference, pp. 1229-1236. ACM, 2013. Available at:
http://hdl.handle.net/10045/33675

A. A. Chaaraoui, and F. Flórez-Revuelta, “Optimizing human action recognition based on a cooperative coevolutionary algorithm,” Engineering Applications of Artificial Intelligence, Available online 30 October 2013, ISSN 0952-1976, http://dx.doi.org/10.1016/j.engappai.2013.10.003. Available at ScienceDirect: http://www.sciencedirect.com/science/article/pii/S0952197613002066

A. A. Chaaraoui, and F. Flórez-Revuelta, “Vision-based Recognition of Human Behaviour for Intelligent Environments”, PhD Thesis, University of Alicante, 2014.
Available at:
http://hdl.handle.net/10045/36395

Chaaraoui, Alexandros Andre, José Ramón Padilla-López, Francisco Javier Ferrández-Pastor, Mario Nieto-Hidalgo, and Francisco Flórez-Revuelta. "A Vision-Based System for Intelligent Monitoring: Human Behaviour Analysis and Privacy by Context." Sensors 14, no. 5 (2014): 8895-8925.

Cheema, Shahzad, Abdalrahman Eweiwi, Christian Thurau, and Christian Bauckhage. "Action recognition by learning discriminative key poses." In Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, pp. 1302-1309. IEEE, 2011.

Moghaddam, Zia, and Massimo Piccardi. "Robust density modelling using the student's t-distribution for human action recognition." In Image Processing (ICIP), 2011 18th IEEE International Conference on, pp. 3261-3264. IEEE, 2011.

Martinez-Contreras, Francisco, Carlos Orrite-Urunuela, Elias Herrero-Jaraba, Hossein Ragheb, and Sergio A. Velastin. "Recognizing human actions using silhouette-based HMM." In Advanced Video and Signal Based Surveillance, 2009. AVSS'09. Sixth IEEE International Conference on, pp. 43-48. IEEE, 2009.

Eweiwi, Abdalrahman, Shahzad Cheema, Christian Thurau, and Christian Bauckhage. "Temporal key poses for human action recognition." In Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, pp. 1310-1317. IEEE, 2011.

Kumari, Sonal, and Suman K. Mitra. "Human Action Recognition Using DFT." In Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), 2011 Third National Conference on, pp. 239-242. IEEE, 2011.

López, Dennis Romero, Anselmo Frizera Neto, and Teodiano Freire Bastos. "Reconocimiento en-línea de acciones humanas basado en patrones de RWE aplicado en ventanas dinámicas de momentos invariantes." Revista Iberoamericana de Automática e Informática Industrial RIAI 11, no. 2 (2014): 202-211.

Karthikeyan, Shanmugavadivel, Utkarsh Gaur, Bangalore S. Manjunath, and Scott Grafton. "Probabilistic subspace-based learning of shape dynamics modes for multi-view action recognition." In Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, pp. 1282-1286. IEEE, 2011.

Martínez-Usó, Adolfo, G. Salgues, and Sergio A. Velastin. "Evaluation of unsupervised segmentation algorithms for silhouette extraction in human action video sequences." In Visual Informatics: Sustaining Research and Innovations, pp. 13-22. Springer Berlin Heidelberg, 2011.

Piccardi, Massimo, and Zia Moghaddam. "Robust Density Modelling Using the Student's t-distribution for Human Action Recognition." (2011).

Wu, Xinxiao, and Yunde Jia. "View-invariant action recognition using latent kernelized structural SVM." In Computer Vision–ECCV 2012, pp. 411-424. Springer Berlin Heidelberg, 2012.

Moghaddam, Zia, and Massimo Piccardi. "Histogram-based training initialisation of hidden markov models for human action recognition." In Advanced Video and Signal Based Surveillance (AVSS), 2010 Seventh IEEE International Conference on, pp. 256-261. IEEE, 2010.

Gallego, Jaime, and Montse Pardas. "Enhanced bayesian foreground segmentation using brightness and color distortion region-based model for shadow removal." In Image Processing (ICIP), 2010 17th IEEE International Conference on, pp. 3449-3452. IEEE, 2010.

Rahman, Md Junaedur, J. Martínez del Rincón, Jean-Christophe Nebel, and Dimitrios Makris. "Body Pose based Pedestrian Tracking in a Particle Filtering Framework." (2013).

El-Sallam, Amar A., and Ajmal S. Mian. "Human body pose estimation from still images and video frames." In Image Analysis and Recognition, pp. 176-188. Springer Berlin Heidelberg, 2010.

Htike, Zaw Zaw, Simon Egerton, and Kuang Ye Chow. "Monocular viewpoint invariant human activity recognition." In Robotics, Automation and Mechatronics (RAM), 2011 IEEE Conference on, pp. 18-23. IEEE, 2011.

Holte, Michael B., Cuong Tran, Mohan M. Trivedi, and Thomas B. Moeslund. "Human action recognition using multiple views: a comparative perspective on recent developments." In Proceedings of the 2011 joint ACM workshop on Human gesture and behavior understanding, pp. 47-52. ACM, 2011.

Adeli Mosabbeb, Ehsan, Kaamran Raahemifar, and Mahmood Fathy. "Multi-View Human Activity Recognition in Distributed Camera Sensor Networks." Sensors 13, no. 7 (2013): 8750-8770.

Cheng, Zhongwei, Lei Qin, Yituo Ye, Qingming Huang, and Qi Tian. "Human daily action analysis with multi-view and color-depth data." In Computer Vision–ECCV 2012. Workshops and Demonstrations, pp. 52-61. Springer Berlin Heidelberg, 2012.

Abdul Rahman, Farah Yasmin, Aini Hussain, Wan Mimi Diyana Wan Zaki, Halimah Badioze Zaman, and Nooritawati Md Tahir. "Enhancement of Background Subtraction Techniques Using a Second Derivative in Gradient Direction Filter." Journal of Electrical and Computer Engineering 2013 (2013).

Concha, Oscar Perez, Richard Yi Da Xu, and Massimo Piccardi. "Robust Dimensionality Reduction for Human Action Recognition." In Digital Image Computing: Techniques and Applications (DICTA), 2010 International Conference on, pp. 349-356. IEEE, 2010.

Moghaddam, Zia, and Massimo Piccardi. "Training Initialization of Hidden Markov Models in Human Action Recognition." 1-15.

"Independent Viewpoint Silhouette-Based Human Action Modeling and Recognition." In Handbook on Soft Computing for Video Surveillance, p. 185. 2012.

Borzeshi, Ehsan Zare, Massimo Piccardi, and R. Y. D. Xu. "A discriminative prototype selection approach for graph embedding in human action recognition." In Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, pp. 1295-1301. IEEE, 2011.

Gallego, Jaime, Montse Pardàs, and Gloria Haro. "Enhanced foreground segmentation and tracking combining Bayesian background, shadow and foreground modeling." Pattern Recognition Letters 33, no. 12 (2012): 1558-1568.

Piccardi, Massimo, Yi Da Xu, and Ehsan Zare Borzeshi. "A discriminative prototype selection approach for graph embedding in human action recognition." (2011).

Borzeshi, Ehsan Zare, Oscar Perez Concha, and Massimo Piccardi. "Human action recognition in video by fusion of structural and spatio-temporal features." In Structural, Syntactic, and Statistical Pattern Recognition, pp. 474-482. Springer Berlin Heidelberg, 2012.

Tweed, David S., and James M. Ferryman. "Enhancing change detection in low-quality surveillance footage using markov random fields." In Proceedings of the 1st ACM workshop on Vision networks for behavior analysis, pp. 23-30. ACM, 2008.

Chen, Fan, and Christophe De Vleeschouwer. "Robust volumetric reconstruction from noisy multi-view foreground occupancy masks." In Asia-Pacific Signal and Information Processing Association Annual Summit and Conference. 2011.

Määttä, Tommi, Aki Härmä, and Hamid Aghajan. "On efficient use of multi-view data for activity recognition." In Proceedings of the Fourth ACM/IEEE International Conference on Distributed Smart Cameras, pp. 158-165. ACM, 2010.

Nebel, Jean-Christophe, Paul Kuo, and Dimitrios Makris. "2D and 3D Pose Recovery from a Single Uncalibrated Video." In Multimedia Analysis, Processing and Communications, pp. 391-411. Springer Berlin Heidelberg, 2011.