MuHAVi: Multicamera Human Action Video Data

including selected action sequences with

MAS: Manually Annotated Silhouette Data

for the evaluation of action recognition methods

last updated on the 21st of January 2009


Part of the REASON project funded by the UK's Engineering and Physical Sciences Research Council (EPSRC)

This dataset has been put together by the project's team based at Kingston University's Digital Imaging Research Centre.

The experiments and data collection were organized by the project's team based at Kingston University, together with the partners based at the University of Reading and UCL's Pamela.

NOTE: We have also made our virtual human action silhouette data available online (visit our ViHASi page).

 


We have collected a large body of human action video (MuHAVi) data using 8 cameras. There are 17 action classes, listed in Table 2, performed by 14 actors. So far we have processed the videos of 7 actors, splitting them into individual actions and providing the frames as JPG images. We have deliberately included some image frames before and after each action for purposes such as background subtraction and tracking. The longest pre-action sequences belong to the actor called Person1. Each actor performs each action several times in the action zone, which is marked with white tape on the scene floor. As the actors were amateurs, the leader had to interrupt them in some cases and ask them to redo the action for consistency.

As shown in Fig. 1 and Table 1, we used 8 Schwan CCTV cameras located at the 4 sides and 4 corners of a rectangular platform. Note that these cameras are not necessarily synchronised. We are working on improving the synchronisation between the images from different cameras, and the current data will then be replaced by the improved version. Calibration information may be included here in the future. Meanwhile, the patterns on the scene floor can be used to calibrate the cameras of interest.
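The pre-action frames mentioned above can serve as background samples. The following is a minimal sketch of one common approach (a per-pixel median background followed by thresholding); it is not the method used by the dataset authors. The function names and the threshold value are illustrative assumptions, and frames are represented as plain nested lists of grayscale values to keep the example dependency-free.

```python
from statistics import median


def median_background(frames):
    """Per-pixel median over equally sized grayscale frames (lists of rows)."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[r][c] for frame in frames) for c in range(cols)]
        for r in range(rows)
    ]


def foreground_mask(frame, background, threshold=30):
    """Mark a pixel as foreground (1) when it differs from the background
    by more than `threshold` (an assumed value, to be tuned per camera)."""
    return [
        [1 if abs(pixel - bg) > threshold else 0
         for pixel, bg in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# Tiny 2x2 example: three "pre-action" frames and one frame with an actor pixel.
pre_action = [
    [[10, 10], [10, 10]],
    [[12, 10], [10, 11]],
    [[11, 200], [10, 10]],  # a single outlier does not disturb the median
]
background = median_background(pre_action)
mask = foreground_mask([[11, 10], [90, 10]], background)
```

In practice the same idea would be applied to the decoded JPG frames of a single camera, since the cameras are not guaranteed to be synchronised.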

Note that, to prepare training data for action recognition methods, each of our action classes may be broken into at least two primitive actions. For instance, the action "WalkTurnBack" consists of the primitive actions walk and turn back. Further, although it is not quite natural for a collapse caused by a shotgun to be followed by a standing-up action, the two can simply be split into separate action classes.

 

We make the data available to researchers in the computer vision community through a password-protected server at the Digital Imaging Research Centre of Kingston University London. Access can be requested by sending an email (with the subject "MuHAVi-MAS Data") to Dr Sergio Velastin at Sergio.Velastin@kingston.ac.uk, giving the names of the researchers who wish to use the data and their main purposes. The only requirement for using the MuHAVi-MAS data is to refer to this site in the corresponding publications.

 

Figure 1. Top view of the configuration of the 8 cameras used to capture the actions in the blue action zone (which is marked with white tape on the scene floor).

 

camera symbol   camera name
V1              Camera_1
V2              Camera_2
V3              Camera_3
V4              Camera_4
V5              Camera_5
V6              Camera_6
V7              Camera_7
V8              Camera_8

Table 1. Camera view names appearing in the MuHAVi data folders and the corresponding symbols used in Fig. 1.

 

In the table below, you can click on the links to download the data (JPG images) for the corresponding action.

Important: MS Internet Explorer cannot download files larger than 2GB; use an alternative browser such as Mozilla Firefox.

Each tar file contains 7 folders, one per actor (Person1 to Person7), each of which contains 8 folders, one per camera (Camera_1 to Camera_8). For simplicity, the image frames for every action/actor/camera combination are numbered starting from 00000001.jpg. The video frame rate is 25 frames per second. The resolution of the image frames is 720 x 576 pixels (columns x rows), except for Camera_8, where it is 704 x 576.
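The layout described above can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, the `root` argument stands for wherever a tar file has been extracted (the name of any top-level folder inside the archive is not stated here and is an assumption), and the time offset assumes that frame 00000001.jpg corresponds to t = 0 for that camera.

```python
from pathlib import Path

FPS = 25  # frame rate stated for the dataset
ACTORS = [f"Person{i}" for i in range(1, 8)]     # Person1 .. Person7
CAMERAS = [f"Camera_{i}" for i in range(1, 9)]   # Camera_1 .. Camera_8


def frame_size(camera):
    """Return (columns, rows) for a camera folder name."""
    return (704, 576) if camera == "Camera_8" else (720, 576)


def frame_path(root, actor, camera, frame_number):
    """Build the path of one JPG frame below an extracted action folder."""
    if actor not in ACTORS or camera not in CAMERAS:
        raise ValueError(f"unknown actor/camera: {actor}/{camera}")
    if frame_number < 1:
        raise ValueError("frame numbers start at 1")
    return Path(root) / actor / camera / f"{frame_number:08d}.jpg"


def frame_time_seconds(frame_number):
    """Time offset of a frame, assuming frame 1 is at t = 0."""
    return (frame_number - 1) / FPS


example = frame_path("WalkTurnBack", "Person1", "Camera_3", 1)
```

Because the cameras are not necessarily synchronised, a time offset computed this way is only meaningful within a single camera view.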

 

action class   action name         size
C1             WalkTurnBack        2.6GB
C2             RunStop             2.5GB
C3             Punch               3.0GB
C4             Kick                3.4GB
C5             ShotGunCollapse     4.3GB
C6             PullHeavyObject     4.5GB
C7             PickupThrowObject   3.0GB
C8             WalkFall            3.9GB
C9             LookInCar           4.6GB
C10            CrawlOnKnees        3.4GB
C11            WaveArms            2.2GB
C12            DrawGraffiti        2.7GB
C13            JumpOverFence       4.4GB
C14            DrunkWalk           4.0GB
C15            ClimbLadder         2.1GB
C16            SmashObject         3.3GB
C17            JumpOverGap         2.6GB

Table 2. Action class names appearing in the MuHAVi data folders and the corresponding symbols used in Fig. 3.

 

actor symbol   actor name
A1             Person1
A2             Person2
A3             Person3
A4             Person4
A5             Person5
A6             Person6
A7             Person7

Table 3. Actor names appearing in the MuHAVi data folders and the corresponding symbols used in Fig. 3.

 


 

NEW: Masks obtained by applying two different Tracking/Background Subtraction Methods to some of our Composite Actions

Each zip file contains masks (in their bounding boxes) for several sequences of composite actions performed by actor A1 and captured from two camera views (V3 and V4). They are intended for testing silhouette-based action recognition methods against more realistic input data, in conjunction with our MAS training data provided below. With such input, the need for a temporal segmentation method is also clear.

More data and information to be added ...

Method1            Method2
Camera4 (3.8 MB)   Camera4 (3.7 MB)
Camera3            Camera3 (3.9 MB)

 


MAS: Manually Annotated Silhouette Data


We have selected 5 action classes and manually annotated the corresponding image frames to generate silhouettes of the actors. These actions are listed in Table 4. Note that only 2 actors and 2 camera views have been selected for these 5 actions. The silhouette images are in PNG format, and each action combination can be downloaded as a small zip file (between 1 and 3 MB). We have also prefixed every original image name with the 3 constant characters "GT-" to label the images as ground truth.

In the table below, you can click on the links to download the silhouette data for the corresponding action combinations.
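The naming rule for the ground truth images can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch assumes that, apart from the "GT-" prefix, only the file extension changes (from .jpg for the original frames to .png for the silhouettes).

```python
GT_PREFIX = "GT-"


def ground_truth_name(original_name):
    """Map an original frame name to its assumed silhouette file name."""
    stem = original_name.rsplit(".", 1)[0]
    return f"{GT_PREFIX}{stem}.png"


def original_frame_name(gt_name):
    """Map a silhouette file name back to the assumed original JPG name."""
    if not gt_name.startswith(GT_PREFIX):
        raise ValueError(f"not a ground truth image name: {gt_name}")
    stem = gt_name[len(GT_PREFIX):].rsplit(".", 1)[0]
    return f"{stem}.jpg"


gt = ground_truth_name("00000001.jpg")
frame = original_frame_name("GT-00000123.png")
```

A mapping like this makes it straightforward to pair each annotated silhouette with the corresponding frame from the same actor and camera folder.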

action class   action name       combinations for silhouette annotation
C1             WalkTurnBack      Person1Camera3 Person1Camera4 Person4Camera3 Person4Camera4
C2             RunStop           Person1Camera3 Person1Camera4 Person4Camera3 Person4Camera4
C3             Punch             Person1Camera3 Person1Camera4 Person4Camera3 Person4Camera4
C4             Kick              Person1Camera3 Person1Camera4 Person4Camera3 Person4Camera4
C5             ShotgunCollapse   Person1Camera3 Person1Camera4 Person4Camera3 Person4Camera4
C6 to C17      (none)

Table 4. Action combinations corresponding to the MAS data for which ground truth silhouettes have been generated.

 

NEW: We have reorganized these 5 composite action classes as 14 primitive action classes as shown in the table below.

You may download the data by clicking here (32MB).

primitive action class   primitive action name   no. of samples
C1                       CollapseRight           4 * 2 = 8
C2                       CollapseLeft            4 * 2 = 8
C3                       StandupRight            4 * 2 = 8
C4                       StandupLeft             4 * 1 = 4
C5                       KickRight               4 * 4 = 16
C6                       GuardToKick             4 * 4 = 16
C7                       PunchRight              4 * 4 = 16
C8                       GuardToPunch            4 * 4 = 16
C9                       RunRightToLeft          4 * 2 = 8
C10                      RunLeftToRight          4 * 2 = 8
C11                      WalkRightToLeft         4 * 2 = 8
C12                      WalkLeftToRight         4 * 2 = 8
C13                      TurnBackRight           4 * 2 = 8
C14                      TurnBackLeft            4 * 1 = 4

 

These 14 primitive action classes may also be reorganized into 8 classes, where similar actions are merged into a single class, as shown in the table below.

primitive action class   primitive action name   no. of samples
C1                       Collapse (Right/Left)   4 * 4 = 16
C2                       Standup (Right/Left)    4 * 3 = 12
C3                       KickRight               4 * 4 = 16
C4                       PunchRight              4 * 4 = 16
C5                       Guard (ToKick/Punch)    4 * 8 = 32
C6                       Run (Right/Left)        4 * 4 = 16
C7                       Walk (Right/Left)       4 * 4 = 16
C8                       TurnBack (Right/Left)   4 * 3 = 12
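The regrouping of the 14 primitive classes into 8 can be expressed as a simple label mapping. The sketch below uses the sample counts from the two tables above; the merged label names (for example "Collapse" and "Guard") are shortened forms chosen for illustration, and the function name is hypothetical.

```python
from collections import Counter

# Sample counts of the 14 primitive action classes, as listed in the table.
SAMPLES_14 = {
    "CollapseRight": 8, "CollapseLeft": 8,
    "StandupRight": 8, "StandupLeft": 4,
    "KickRight": 16, "GuardToKick": 16,
    "PunchRight": 16, "GuardToPunch": 16,
    "RunRightToLeft": 8, "RunLeftToRight": 8,
    "WalkRightToLeft": 8, "WalkLeftToRight": 8,
    "TurnBackRight": 8, "TurnBackLeft": 4,
}

# Mapping from each primitive class to its merged class (8 in total).
MERGED_LABEL = {
    "CollapseRight": "Collapse", "CollapseLeft": "Collapse",
    "StandupRight": "Standup", "StandupLeft": "Standup",
    "KickRight": "KickRight",
    "PunchRight": "PunchRight",
    "GuardToKick": "Guard", "GuardToPunch": "Guard",
    "RunRightToLeft": "Run", "RunLeftToRight": "Run",
    "WalkRightToLeft": "Walk", "WalkLeftToRight": "Walk",
    "TurnBackRight": "TurnBack", "TurnBackLeft": "TurnBack",
}


def merge_counts(samples, label_map):
    """Sum the sample counts of all primitive classes sharing a merged label."""
    merged = Counter()
    for name, count in samples.items():
        merged[label_map[name]] += count
    return dict(merged)


merged_counts = merge_counts(SAMPLES_14, MERGED_LABEL)
total_samples = sum(SAMPLES_14.values())
```

Both groupings cover the same 136 samples, so a classifier evaluated under one labelling can be re-scored under the other by applying the mapping to its predictions.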

 

Help needed!

Having generated silhouettes for many of these sequences, we would like to ask the community to help generate silhouettes for more actions, cameras and actors. The priority is more actions first, then more cameras and finally more actors. We have produced a guide document describing the process using the GIMP software. In our experience, generating each silhouette can take 2-3 minutes, depending on the annotator. Please contact Dr Sergio Velastin by email at Sergio.Velastin@kingston.ac.uk to indicate your willingness to assist. We will coordinate the work by allocating annotation tasks, so as to avoid duplication of effort (you can also let us know which particular part of the dataset you would prefer to annotate).


Figure 2. Sample images of annotated silhouettes from the MAS data (for actor A1) corresponding to 20 selected action sequences (5 action classes, 2 actors and 2 cameras) from the MuHAVi data (as listed in Table 4).

 



Figure 3. Sample image frames from the MuHAVi data for 17 action classes, 7 actors and 8 camera views (as listed in Tables 1, 2 and 3 and shown in Fig. 1).