MuHAVi: Multicamera Human Action Video Data
including selected action sequences with
MAS: Manually Annotated Silhouette Data
for the evaluation of action recognition methods
Last updated on the 21st of January 2009
Part of the REASON project funded by the UK's Engineering and Physical Sciences Research Council (EPSRC)
This dataset has been put together by the project's team based at Kingston University's Digital Imaging Research Centre
The experiments and data collection were organized by the
project's team based at Kingston University together with the partners
based at Reading University and UCL's Pamela.
NOTE: We have also made our virtual human action silhouette data available online
(visit our ViHASi page).
We have collected a large body of human
action video (MuHAVi) data using 8 cameras. There are 17 action classes as listed
in Table 2 performed by 14 actors. So far we have processed videos corresponding
to 7 actors in order to split the actions and provide the JPG image frames.
However, we have included some image frames before and after the actual action,
for the purpose of background subtraction, tracking, etc. The longest pre-action
frames correspond to the actor called Person1. Each actor performs each action
several times in the action zone, which is marked with white tape on the scene
floor. As the actors were amateurs, the leader had to interrupt them in some
cases and ask them to redo the action for consistency. As shown in Fig. 1 and
Table 1, we used 8 CCTV Schwan cameras located at the 4 sides and 4 corners
of a rectangular platform. Note that
these cameras are not necessarily synchronised. We are working on improving the
synchronisation between the images corresponding to different cameras. The
current data will then be replaced by the improved version. Calibration
information may be included here in the future. Meanwhile, one can use the patterns on the scene floor
to calibrate the cameras of interest.
Note that to prepare training data for
action recognition methods, each of our action classes may be broken into at
least two primitive actions. For instance, the action "WalkTurnBack" consists of
the walk and turn back primitive actions. Further, although it is not quite natural
to have a collapse action (due to a shotgun) followed by a standing-up action, one can
simply split them into two separate action classes.
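The splitting of a composite action into primitive actions can be sketched in Python as follows. This is only an illustrative sketch: the function name `split_sequence` and the boundary frame used in the example are hypothetical, and the frame at which one primitive action ends and the next begins has to be chosen by the user for each sequence.

```python
def split_sequence(frames, starts, labels):
    """Split an ordered list of frame numbers into labelled segments.

    frames: ordered frame numbers of one composite action sequence
    starts: frame numbers at which the 2nd, 3rd, ... primitive action begins
            (hand-picked by the user; not provided with the dataset)
    labels: one primitive action name per segment
    """
    if len(labels) != len(starts) + 1:
        raise ValueError("need exactly one more label than start frames")
    # Convert start frame numbers to list positions and close the last segment.
    cuts = [frames.index(s) for s in starts] + [len(frames)]
    segments = []
    begin = 0
    for label, end in zip(labels, cuts):
        segments.append((label, frames[begin:end]))
        begin = end
    return segments


# Hypothetical example: a 10-frame "WalkTurnBack" sequence where the
# turn back is assumed to start at frame 7.
parts = split_sequence(list(range(1, 11)), [7], ["Walk", "TurnBack"])
```

Each returned pair holds a primitive action name and the frame numbers assigned to it, so the segments can be stored as separate training classes.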
We make the data available to researchers in the computer vision community
through a password-protected server at the Digital Imaging Research Centre of
Kingston University London. The data may be accessed by sending an Email
(with the subject "MuHAVi-MAS Data") to Dr Sergio Velastin at
Sergio.Velastin@kingston.ac.uk
giving the names of the researchers who wish to use the data and their main
purposes. The only requirement for using the MuHAVi-MAS data is to refer to this
site in the corresponding publications.

Figure 1. The top view of the configuration of 8
cameras used to capture the actions in the blue action zone (which is marked
with white tapes on the scene floor).
| camera symbol | camera name |
Table 1. Camera view names appearing in the MuHAVi
data folders and the corresponding symbols used in Fig. 1.
In the table below, you can click on the links to download the data (JPG images) for the
corresponding action.
Important: note that MS
Internet Explorer cannot download files larger than 2 GB; use an alternative
browser such as Mozilla Firefox.
Each tar file contains 7 folders corresponding to 7
actors (Person1 to Person7) each of which contains 8 folders corresponding to 8
cameras (Camera_1 to Camera_8). Image frames corresponding to every combination
of action/actor/camera are named with image frame numbers starting from
00000001.jpg for simplicity. The video frame rate is 25 frames per second and
the resolution of image frames (except for Camera_8) is 720 x 576 Pixels (columns x rows).
The image resolution is 704 x 576 for Camera_8.
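The folder layout and frame naming described above can be sketched in Python as follows. This is a minimal sketch rather than an official tool: the helper names (`frame_path`, `frame_size`, `frame_time_seconds`) and the extraction folder passed as `root` are assumptions made for illustration; only the Person/Camera folder names, the 8-digit frame names, the frame rate and the resolutions come from the description above.

```python
from pathlib import PurePosixPath

FRAME_RATE = 25  # frames per second, as stated above


def frame_path(root, actor, camera, frame):
    """Build the path of one JPG frame.

    root:   folder where the tar file of one action was extracted (assumed)
    actor:  1..7 (Person1 to Person7)
    camera: 1..8 (Camera_1 to Camera_8)
    frame:  frame number, starting at 1 (00000001.jpg)
    """
    if not 1 <= actor <= 7:
        raise ValueError("actor must be in 1..7")
    if not 1 <= camera <= 8:
        raise ValueError("camera must be in 1..8")
    if frame < 1:
        raise ValueError("frame numbers start at 1")
    return (PurePosixPath(root) / f"Person{actor}"
            / f"Camera_{camera}" / f"{frame:08d}.jpg")


def frame_size(camera):
    """Return (columns, rows) in pixels for the given camera."""
    return (704, 576) if camera == 8 else (720, 576)


def frame_time_seconds(frame):
    """Time offset of a frame from the first frame of its own sequence."""
    return (frame - 1) / FRAME_RATE
```

Because the cameras are not necessarily synchronised, the time offset is only meaningful within a single action/actor/camera sequence and should not be used to align frames across cameras.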
| action class | action name | size |
Table 2. Action class names appearing in the MuHAVi
data folders and the corresponding symbols used in Fig. 3.
Table 3. Actor names appearing in the MuHAVi
data folders and the corresponding symbols used in Fig. 3.
NEW: Masks obtained by applying two different Tracking/Background Subtraction Methods to some of our Composite Actions
Each zip file contains masks (in their bounding boxes)
corresponding to several sequences of composite actions performed by the actor
A1 and captured from two camera views (V3 and V4) for the purpose of testing
silhouette-based action recognition methods against more realistic input data
(in conjunction with our MAS training data provided below), where the need for a
temporal segmentation method is also clear.
More data and information to be added ...
MAS: Manually Annotated Silhouette Data
We have selected 5 action classes and
manually annotated the corresponding image frames to generate the
silhouettes of the actors. These actions are listed in Table 4. It can be seen
that we have only selected 2 actors and 2 camera views for these 5 actions. The
silhouette images are in PNG format and each action combination can be
downloaded as a small zip file (between 1 and 3 MB). We have also added the 3
constant characters "GT-" to the beginning of every original image name to label
them as ground truth images.
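The "GT-" naming convention can be used to pair each silhouette with its original frame, as in the Python sketch below. The helper names are hypothetical, and the sketch assumes that the numeric part of the name is unchanged and that only the extension differs (JPG for the frames, PNG for the silhouettes); this should be checked against the downloaded files.

```python
from pathlib import PurePosixPath

GT_PREFIX = "GT-"  # the 3 constant characters described above


def silhouette_name(frame_name, gt_ext=".png"):
    """Map an original frame name to the assumed silhouette file name."""
    stem = PurePosixPath(frame_name).stem
    return f"{GT_PREFIX}{stem}{gt_ext}"


def original_frame_name(gt_name, frame_ext=".jpg"):
    """Map a silhouette file name back to the assumed original frame name."""
    path = PurePosixPath(gt_name)
    if not path.name.startswith(GT_PREFIX):
        raise ValueError(f"not a ground truth file name: {gt_name}")
    return path.stem[len(GT_PREFIX):] + frame_ext
```

For example, under these assumptions `silhouette_name("00000123.jpg")` gives the name of the silhouette to look for in the corresponding MAS zip file.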
In the table below, you can click on the links to download the
silhouette data for the corresponding action combinations.
| action class | action name | combinations for silhouette annotation |
Table 4. Action combinations corresponding to the MAS
data for which ground truth silhouettes have been generated.
NEW: We have reorganized these 5 composite action classes as 14 primitive action
classes as shown in the table below.
You may download the data by clicking here (32MB).
| primitive action class | primitive action name | no. of samples |
| C1 | CollapseRight | 4 * 2 = 8 |
| C2 | CollapseLeft | 4 * 2 = 8 |
| C3 | StandupRight | 4 * 2 = 8 |
| C8 | GuardToPunch | 4 * 4 = 16 |
| C9 | RunRightToLeft | 4 * 2 = 8 |
| C10 | RunLeftToRight | 4 * 2 = 8 |
| C11 | WalkRightToLeft | 4 * 2 = 8 |
| C12 | WalkLeftToRight | 4 * 2 = 8 |
| C13 | TurnBackRight | 4 * 2 = 8 |
| C14 | TurnBackLeft | 4 * 1 = 4 |
These 14 primitive action classes may also be reorganized
into 8 classes, where similar actions are merged into a single class, as
shown in the table below.
| primitive action class | primitive action name | no. of samples |
| C1 | Collapse (Right/Left) | 4 * 4 = 16 |
| C2 | Standup (Right/Left) | 4 * 3 = 12 |
| C5 | Guard (ToKick/Punch) | 4 * 8 = 32 |
| C6 | Run (Right/Left) | 4 * 4 = 16 |
| C7 | Walk (Right/Left) | 4 * 4 = 16 |
| C8 | TurnBack (Right/Left) | 4 * 3 = 12 |
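The merging of direction-specific primitive classes into broader classes can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the merge rule (matching on the leading part of the action name) is our reading of the two tables, and only the primitive classes whose rows are listed in the 14-class table above are included, so the totals for classes with unlisted rows are lower than in the merged table.

```python
from collections import Counter

# Primitive action name -> number of samples, for the rows listed above.
PRIMITIVE_SAMPLES = {
    "CollapseRight": 8,
    "CollapseLeft": 8,
    "StandupRight": 8,
    "GuardToPunch": 16,
    "RunRightToLeft": 8,
    "RunLeftToRight": 8,
    "WalkRightToLeft": 8,
    "WalkLeftToRight": 8,
    "TurnBackRight": 8,
    "TurnBackLeft": 4,
}

# Merged class names that appear in the merged table.
MERGED_NAMES = ("Collapse", "Standup", "Guard", "Run", "Walk", "TurnBack")


def merged_class(primitive_name):
    """Return the merged class whose name the primitive name starts with."""
    for base in MERGED_NAMES:
        if primitive_name.startswith(base):
            return base
    raise ValueError(f"no merged class for {primitive_name}")


def merged_counts(samples):
    """Sum the sample counts of all primitive classes per merged class."""
    totals = Counter()
    for name, count in samples.items():
        totals[merged_class(name)] += count
    return dict(totals)


counts = merged_counts(PRIMITIVE_SAMPLES)
```

For the classes whose primitive rows are all listed (Collapse, Run, Walk and TurnBack), the summed counts can be compared directly against the merged table.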
Help needed!
Having generated silhouettes for many of these sequences, we would like
to ask the community to help generate silhouettes for more actions/cameras/actors.
The priority would be more actions first, then more cameras and finally more
actors. We have produced a guide document describing this process using the GIMP
software. In our experience, generating each silhouette can take 2-3 minutes
depending on the annotator. Please contact
Dr Sergio Velastin by Email at
Sergio.Velastin@kingston.ac.uk
to indicate your willingness to assist. We will coordinate this by allocating annotation tasks (you can also let us know which particular part of the dataset you would prefer to annotate) to avoid duplication of effort.





Figure 2. Sample images of annotated silhouettes from
the MAS data (for actor A1) corresponding to 20 selected action sequences (5
action classes, 2 actors and 2 cameras) from the MuHAVi data (as listed in Table
4).






Figure 3. Sample image frames from the MuHAVi data for
17 action classes, 7 actors and 8 camera views (as listed in Tables 1, 2 and 3,
and shown in Fig. 1).