MuHAVi: Multicamera Human Action Video Data

including selected action sequences with

MAS: Manually Annotated Silhouette Data

for the evaluation of human action recognition methods

Last updated on 13 April 2014


Part of the REASON project funded by the UK's Engineering and Physical Sciences Research Council (EPSRC)

This dataset has been put together by the project's team based at Kingston University's Digital Imaging Research Centre

The experiments and data collection were organised by the project's team based at Kingston University together with the partners based at Reading University and UCL's Pamela Laboratory.

NOTE: We have also made our virtual human action silhouette data available online (visit our ViHASi page).

If you publish work that uses this dataset, please use the following reference:

S. Singh, S.A. Velastin and H. Ragheb, "MuHAVi: A Multicamera Human Action Video Dataset for the Evaluation of Action Recognition Methods", 2nd Workshop on Activity Monitoring by Multi-camera Surveillance Systems (AMMCSS), Boston, USA, 29 August 2010, pp. 48-55. DOI: 10.1109/AVSS.2010.63

 


We have collected a large body of human action video data (MuHAVi) using 8 cameras. There are 17 action classes, as listed in Table 2, performed by 14 actors. So far we have processed the videos corresponding to 7 actors in order to split the actions and provide the JPG image frames. However, we have included some image frames before and after each actual action, for the purpose of background subtraction, tracking, etc. The longest pre-action segments correspond to the actor called Person1. Note that what we provide is therefore temporally segmented actions, as was typical when the dataset was first released. We now also provide, experimentally (see below), long unsegmented sequences for people to work on temporal segmentation.

Each actor performs each action several times within the action zone highlighted using white tapes on the scene floor. As the actors were amateurs, the leader had to interrupt them in some cases and ask them to redo the action for consistency. As shown in Fig. 1 and Table 1, we have used 8 CCTV Schwan cameras located at the 4 sides and 4 corners of a rectangular platform. Note that these cameras are not necessarily synchronised. Camera calibration information may be included here in the future. Meanwhile, one can use the patterns on the scene floor to calibrate the cameras of interest.
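For example, a floor-plane homography for a camera of interest can be estimated from the corners of the taped action zone; in the Python/OpenCV sketch below, both the pixel coordinates and the metric dimensions of the zone are hypothetical placeholders rather than measured values:

    import cv2
    import numpy as np

    # Pixel coordinates of the four action-zone corners in one camera view
    # (hypothetical values; click them off a frame of the camera of interest).
    image_pts = np.array([[212, 401], [545, 396], [610, 498], [150, 505]],
                         dtype=np.float32)

    # Corresponding floor-plane coordinates in metres (hypothetical zone size).
    world_pts = np.array([[0, 0], [4, 0], [4, 3], [0, 3]], dtype=np.float32)

    # Homography mapping image points onto the floor plane.
    H, _ = cv2.findHomography(image_pts, world_pts)

    # Project an arbitrary image point (e.g. an actor's feet) onto the floor.
    foot = np.array([[[380.0, 450.0]]], dtype=np.float32)
    print(cv2.perspectiveTransform(foot, H))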

Note that, to prepare training data for action recognition methods, each of our action classes may be broken into at least two primitive actions. For instance, the action "WalkTurnBack" consists of the walk and turn-back primitive actions. Further, although it is not quite natural to have a collapse action due to a shotgun followed by a standing-up action, one can simply split them into two separate action classes.

We make the data available to researchers in the computer vision community through a password-protected server at the Digital Imaging Research Centre of Kingston University London. The data may be accessed by sending an email (with subject "MuHAVi-MAS Data") to Prof Sergio A Velastin at sergio.velastin@ieee.org, giving the names and email addresses of the researchers who wish to use the data and their main purposes. We request this so as to build a list of people using this dataset and form a "MuHAVi community" with whom to communicate. The only requirement for using the MuHAVi data is to refer to this site in the corresponding publications.

 

Figure 1. The top view of the configuration of 8 cameras used to capture the actions in the blue action zone (which is marked with white tapes on the scene floor).

camera symbol   camera name

V1 Camera_1
V2 Camera_2
V3 Camera_3
V4 Camera_4
V5 Camera_5
V6 Camera_6
V7 Camera_7
V8 Camera_8

Table 1. Camera view names appearing in the MuHAVi data folders and the corresponding symbols used in Fig. 1.


In the table below, you can click on the links to download the data (JPG images) for the corresponding action.

Important: we noted that some earlier versions of MS Internet Explorer could not download files over 2 GB in size, so we recommend using an alternative browser such as Firefox or Chrome.

Each tar file contains 7 folders corresponding to the 7 actors (Person1 to Person7), each of which contains 8 folders corresponding to the 8 cameras (Camera_1 to Camera_8). Image frames corresponding to every combination of action/actor/camera are named with image frame numbers starting from 00000001.jpg for simplicity. The video frame rate is 25 frames per second and the resolution of the image frames is 720 x 576 pixels (columns x rows), except for Camera_8, whose resolution is 704 x 576 pixels.
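For instance, once a tar file has been extracted, the folder layout can be traversed with a short Python sketch like the one below (the top-level folder name "Kick" is an assumption; adjust it to wherever you extracted the archive):

    from pathlib import Path

    FPS = 25  # video frame rate stated above

    root = Path("Kick")  # one extracted action archive (assumed folder name)
    for actor_dir in sorted(root.glob("Person*")):
        for cam_dir in sorted(actor_dir.glob("Camera_*")):
            frames = sorted(cam_dir.glob("*.jpg"))
            print(f"{actor_dir.name}/{cam_dir.name}: {len(frames)} frames "
                  f"({len(frames) / FPS:.1f} s)")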

action class   action name   size
C1 WalkTurnBack 2.6GB 
C2 RunStop 2.5GB 
C3 Punch 3.0GB 
C4 Kick 3.4GB 
C5 ShotGunCollapse 4.3GB 
C6 PullHeavyObject 4.5GB 
C7 PickupThrowObject 3.0GB 
C8 WalkFall 3.9GB 
C9 LookInCar 4.6GB 
C10 CrawlOnKnees 3.4GB 
C11 WaveArms 2.2GB 
C12 DrawGraffiti 2.7GB 
C13 JumpOverFence 4.4GB 
C14 DrunkWalk 4.0GB 
C15 ClimbLadder 2.1GB 
C16 SmashObject 3.3GB 
C17 JumpOverGap 2.6GB 

Table 2. Action class names appearing in the MuHAVi data folders and the corresponding symbols used in Fig. 3.

 

actor symbol   actor name

A1 Person1
A2 Person2
A3 Person3
A4 Person4
A5 Person5
A6 Person6
A7 Person7

Table 3. Actor names appearing in the MuHAVi data folders and the corresponding symbols used in Fig. 3.



NEW AND EXPERIMENTAL

So far, MuHAVi has consisted of the temporally segmented action clips described above.

Thanks to work done by Dr Zezhi Chen of Kingston University, Jorge Sepúlveda of the University of Santiago de Chile (USACH) and Prof. Sergio A Velastin, now at USACH and on a "Chair of Excellence" stay at the Universidad Carlos III de Madrid, we are now able to provide the following.

You can now download the full-length videos from here (only one camera is available at the moment; more will be made available after consultation with the community). Because of the length of these videos, use right-click / "Save Link As" and a high-speed network:

Camera     Video file            Size
Camera_1   20080213AM-Cam1.mpg   6.8 GB
Camera_2   (not yet available)
Camera_3   (not yet available)
Camera_4   (not yet available)
Camera_5   (not yet available)
Camera_6   (not yet available)
Camera_7   (not yet available)
Camera_8   (not yet available)

Currently we have two sets of silhouettes corresponding to camera 1, generated by varying one parameter in the foreground detection algorithm (this is explained after the table and is also a matter for consultation with the community):

Sequence   File                       File size   Parameter
Camera_1   20080213AMCam13.0.tar.gz   6.4 GB      3.0
Camera_1   20080213AMCam14.0.tar.gz   2.2 GB      4.0

Each compressed archive contains files named %d.png, where the number is the frame number. In each file, black (0) represents the background, white (255) the foreground, and grey (127) a detected shadow (normally to be ignored by an algorithm).
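As an illustration, a minimal Python sketch of decoding these three label values (the file name "1.png" is a placeholder):

    import numpy as np
    from PIL import Image

    mask = np.array(Image.open("1.png").convert("L"))  # placeholder file name

    foreground = mask == 255
    shadow = mask == 127   # normally ignored, i.e. treated as background

    print("foreground pixels:", int(foreground.sum()),
          "shadow pixels:", int(shadow.sum()))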

The parameter is one of the factors that affect foreground detection in terms of true positives vs false positives. When tested against the manually annotated silhouettes, a value of 3.0 produces a TPR of around 0.78 and an FPR of around 0.027, while 4.0 gives 0.71 and 0.013 (i.e. less noise but also less foreground). We will seek the opinion of the community as to which of these to use.
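For reference, the TPR and FPR for a single frame can be computed along the following lines; this is only a sketch, the file names are placeholders, and shadow pixels are treated as background:

    import numpy as np
    from PIL import Image

    def load_fg(path):
        # Foreground is white (255); background (0) and shadow (127) are not.
        return np.array(Image.open(path).convert("L")) == 255

    auto = load_fg("auto/1.png")        # automatically generated silhouette
    gt = load_fg("GT-00000001.png")     # manually annotated silhouette

    tp = np.logical_and(auto, gt).sum()
    fp = np.logical_and(auto, ~gt).sum()
    print(f"TPR={tp / gt.sum():.3f}  FPR={fp / (~gt).sum():.3f}")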

Figure: ROC curves for the foreground detection parameter.

The ground truth file can be obtained here in spreadsheet format. (Incidentally, it also describes how the MuHAVi JPEG sequences were obtained from AVI files extracted using manually obtained temporal markers. Please also note that we discovered a bug in mplayer, which converted the AVIs to JPEGs, that resulted in some skipped frames in the JPEG sequences.)

Below is an extract from the spreadsheet.
  1. The first column gives the camera and actor numbers.
  2. The second column header gives the action.
  3. For each person, the number in the second column gives the frame number, in the video sequence, where the action starts, and the third column where it ends (this is somewhat subjective, of course, and the community needs to agree on a metric that would not unjustly penalise algorithms).
  4. The other numbers are there to check that the same action, captured by another camera, has the same number of frames (see the sketch after the extract below).

Camera_1, WalkTurnBack (20080213AM-Cam1.mpg)
Time ref: frame 76561 (00:51:02), where the synchronisation event takes place. See picture.

          Start   End     Start-Rel   End-Rel   Duration
Person1   76434   77302   -127        741       868
Person2   77302   78157   741         1596      855
Person3   78157   79017   1596        2456      860
Person4   79017   79865   2456        3304      848
Person5   79865   80800   3304        4239      935
Person6   80800   81775   4239        5214      975
Person7   81775   82669   5214        6108      894







Camera_3, WalkTurnBack (dvcam2-1.mpg, offset -127)
Time ref: 144

          Start   End    Duration
Person1   17      885    868
Person2   885     1740   855
Person3   1740    2600   860
Person4   2600    3448   848
Person5   3448    4383   935
Person6   4383    5358   975
Person7   5358    6252   894
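As the Camera_1 extract shows, Start-Rel and End-Rel are the absolute frame numbers minus the camera's synchronisation time reference, and Duration = End - Start. A small Python sketch of this arithmetic, using the Camera_1 figures above:

    TIME_REF_CAM1 = 76561  # synchronisation frame for Camera_1 (see extract)

    segments = {  # absolute (start, end) frames for WalkTurnBack on Camera_1
        "Person1": (76434, 77302),
        "Person2": (77302, 78157),
    }

    for actor, (start, end) in segments.items():
        # Prints the Start-Rel, End-Rel and Duration columns of the extract.
        print(actor, start - TIME_REF_CAM1, end - TIME_REF_CAM1, end - start)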




Note: the material below is only historical.

Masks obtained by applying two different Tracking/Background Subtraction Methods to some of our Composite Actions

Each zip file contains masks (in their bounding boxes) corresponding to several sequences of composite actions performed by actor A1 and captured from two camera views (V3 and V4), for the purpose of testing silhouette-based action recognition methods against more realistic input data (in conjunction with our MAS training data provided below), where the need for a temporal segmentation method is also clear.

More data and information to be added ...

Method1            Method2
Camera4 (3.8 MB)   Camera4 (3.7 MB)
Camera3            Camera3 (3.9 MB)

*** end of historical section



MuHAVi-MAS: Manually Annotated Silhouette Data


We have selected 5 action classes and manually annotated the corresponding image frames to generate the corresponding silhouettes of the actors. These actions are listed in Table 4. It can be seen that we have only selected 2 actors and 2 camera views for these 5 actions. The silhouette images are in PNG format and each action combination can be downloaded as a small zip file (between 1 and 3 MB). We have also prepended the 3 constant characters "GT-" to the beginning of every original image name to label them as ground-truth images.
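This naming makes it straightforward to pair each ground-truth silhouette with its original frame, for example as in the Python sketch below (the directory names, and the assumption that GT files keep the original frame number with a .png extension, are ours):

    from pathlib import Path

    gt_dir = Path("Kick/Person1Camera3")        # an extracted MAS zip (assumed)
    frames_dir = Path("Kick/Person1/Camera_3")  # the matching MuHAVi frames

    for gt_path in sorted(gt_dir.glob("GT-*.png")):
        # Strip the "GT-" prefix and swap the extension back to .jpg.
        frame_name = gt_path.name[len("GT-"):].replace(".png", ".jpg")
        print(gt_path.name, "->", frames_dir / frame_name)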

In the table below, you can click on the links to download the silhouette data for the corresponding action combinations.

action class   action name   combinations for silhouette annotation
C1 WalkTurnBack Person1Camera3 Person1Camera4 Person4Camera3 Person4Camera4
C2 RunStop Person1Camera3 Person1Camera4 Person4Camera3 Person4Camera4
C3 Punch Person1Camera3 Person1Camera4 Person4Camera3 Person4Camera4
C4 Kick Person1Camera3 Person1Camera4 Person4Camera3 Person4Camera4
C5 ShotGunCollapse Person1Camera3 Person1Camera4 Person4Camera3 Person4Camera4
C6 to C17 (no silhouette annotations provided)

Table 4. Action combinations corresponding to the MAS data for which ground truth silhouettes have been generated.

 

We have reorganized these 5 composite action classes into 14 primitive action classes, as shown in the table below.

You may download the data by clicking here (32MB).

primitive action class   primitive action name   no. of samples
C1    CollapseRight     4 * 2 = 8
C2    CollapseLeft      4 * 2 = 8
C3    StandupRight      4 * 2 = 8
C4    StandupLeft       4 * 1 = 4
C5    KickRight         4 * 4 = 16
C6    GuardToKick       4 * 4 = 16
C7    PunchRight        4 * 4 = 16
C8    GuardToPunch      4 * 4 = 16
C9    RunRightToLeft    4 * 2 = 8
C10   RunLeftToRight    4 * 2 = 8
C11   WalkRightToLeft   4 * 2 = 8
C12   WalkLeftToRight   4 * 2 = 8
C13   TurnBackRight     4 * 2 = 8
C14   TurnBackLeft      4 * 1 = 4

 

These 14 primitive action classes may also be reorganized into 8 classes, where similar actions form a single class, as shown in the table below.

primitive action class   primitive action name   no. of samples
C1   Collapse (Right/Left)   4 * 4 = 16
C2   Standup (Right/Left)    4 * 3 = 12
C3   KickRight               4 * 4 = 16
C4   PunchRight              4 * 4 = 16
C5   Guard (ToKick/Punch)    4 * 8 = 32
C6   Run (Right/Left)        4 * 4 = 16
C7   Walk (Right/Left)       4 * 4 = 16
C8   TurnBack (Right/Left)   4 * 3 = 12
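Since this merge is just a relabelling of the 14 primitive classes, it can be expressed, for example, as a simple lookup table (a Python sketch based directly on the two tables above):

    # Lookup from the 14 primitive class names to the 8 merged class names.
    MERGE = {
        "CollapseRight": "Collapse", "CollapseLeft": "Collapse",
        "StandupRight": "Standup", "StandupLeft": "Standup",
        "KickRight": "KickRight",
        "GuardToKick": "Guard", "GuardToPunch": "Guard",
        "PunchRight": "PunchRight",
        "RunRightToLeft": "Run", "RunLeftToRight": "Run",
        "WalkRightToLeft": "Walk", "WalkLeftToRight": "Walk",
        "TurnBackRight": "TurnBack", "TurnBackLeft": "TurnBack",
    }

    print(MERGE["GuardToKick"])  # -> Guard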

 



Figure 2. Sample images of annotated silhouettes from the MAS data (for actor A1) corresponding to 20 selected action sequences (5 action classes, 2 actors and 2 cameras) from the MuHAVi data (as listed in Table 4).

 



Figure 3. Sample image frames from the MuHAVi data for 17 action classes, 7 actors and 8 camera views (as listed in Tables 1, 2 and 3, and shown in Fig. 1).