The G3D dataset contains a range of gaming actions captured with a Microsoft Kinect, which enabled us to record synchronised video, depth and skeleton data. The dataset contains 10 subjects performing 20 gaming actions: punch right, punch left, kick right, kick left, defend, golf swing, tennis swing forehand, tennis swing backhand, tennis serve, throw bowling ball, aim and fire gun, walk, run, jump, climb, crouch, steer a car, wave, flap and clap.
The 20 gaming actions are recorded in 7 action sequences, as shown in Table 1. Most sequences contain multiple actions and were captured in a controlled indoor environment with a fixed camera, a typical setup for gesture-based gaming. Each sequence is repeated three times by each subject, as shown in Table 2.
Due to the formats selected, it is possible to view all the recorded data and metadata without any special software tools. The three streams were recorded at 30fps in a mirrored view. The depth and colour images were stored as 640x480 PNG files and the skeleton data in XML files.
For each sequence we recorded each frame as colour, raw depth and depth transformed to colour co-ordinates, all in PNG format (see Figure 1). The raw depth information contains the depth of each pixel in millimetres and was stored as 16-bit greyscale; the raw colour was stored as 24-bit RGB. The depth information was also mapped to the colour co-ordinate space and stored as 16-bit greyscale. Each 16-bit depth value contains 13 bits of depth data and 3 bits identifying the player. The player index can be used to segment the depth maps by user (see Figure 2).
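The packed depth values can be unpacked with two bitwise operations. The sketch below assumes the standard Kinect SDK packing (low 3 bits = player index, upper 13 bits = depth in millimetres) and operates on a NumPy array such as one loaded from a 16-bit greyscale PNG:

```python
import numpy as np

def split_depth_player(raw16):
    """Split packed 16-bit Kinect depth into millimetre depth and player index.

    Assumes the Kinect SDK convention: the low 3 bits hold the player
    index (0 = no player) and the upper 13 bits hold depth in mm.
    """
    raw16 = np.asarray(raw16, dtype=np.uint16)
    player = raw16 & 0x7      # 3-bit player index
    depth_mm = raw16 >> 3     # 13-bit depth in millimetres
    return depth_mm, player

# Example: a pixel 1.5 m away belonging to player 1
packed = np.uint16((1500 << 3) | 1)
d, p = split_depth_player(packed)
```

The same call works unchanged on a full 640x480 frame, so masking out everything except one player is a single comparison on the returned player-index array.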
In addition, we recorded skeleton data for each frame in XML format. The root node in the XML file is an array of skeletons, to allow future versions of the dataset to include multiple subjects. Each skeleton contains the player’s position and pose. The pose comprises the 20 joints defined by Microsoft. The player and joint positions are given in X, Y and Z co-ordinates in metres. These positions are also mapped into the depth and colour co-ordinate spaces. The skeleton data includes a joint tracking state, displayed in Figure 3 as tracked (green), inferred (yellow) and not tracked (red).
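A skeleton file of this shape can be read with Python's standard XML parser. The element and attribute names in this sketch are illustrative assumptions, not the dataset's actual schema; the structure (an array of skeletons, each with named joints, a 3-D position and a tracking state) follows the description above:

```python
import xml.etree.ElementTree as ET

# A minimal frame in the style described above; tag and attribute
# names here are assumptions -- check them against a real file.
xml_text = """
<Skeletons>
  <Skeleton>
    <Joint name="Head" trackingState="Tracked">
      <Position x="0.01" y="0.65" z="2.10"/>
    </Joint>
  </Skeleton>
</Skeletons>
"""

root = ET.fromstring(xml_text)
for joint in root.iter("Joint"):
    pos = joint.find("Position")
    print(joint.get("name"), joint.get("trackingState"),
          float(pos.get("x")), float(pos.get("y")), float(pos.get("z")))
```

Filtering on the tracking state before using a joint position (e.g. skipping "not tracked" joints) is usually worthwhile, since inferred joints are less reliable.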
We make the data available to researchers in the computer vision community; the only requirement for using G3D is to cite our paper:
V. Bloom, V. Argyriou and D. Makris, "Hierarchical transfer learning for online recognition of compound actions", Computer Vision and Image Understanding, vol. 144, pp. 62-72, 2016.
In the table below, you can click on the links to download the data for the corresponding action sequence. Each zip file contains 30 folders, corresponding to 10 actors each repeating the sequence 3 times. Each of these folders contains 4 subfolders holding the colour, raw depth, transformed depth and skeleton files. The action point annotations can be downloaded from here.
|Sequence|Size|Actions|
|Bowling|2.9GB|Throw bowling ball|
|First Person Shooter|14GB|Aim & shoot gun centre, Aim & shoot gun right, Aim & shoot gun left|
|Driving a car|6.6GB|Hold steering wheel|
*Please note sequence 33 was corrupted and is therefore not available for download at this time.
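Once a zip is extracted, the actor/repetition and stream folders described above can be walked with a short script. The stream folder names used here ("Colour", "Depth", "DepthColour", "Skeleton") are assumptions for illustration; check them against the actual zip contents:

```python
from pathlib import Path

# Assumed stream folder names -- verify against an extracted sequence.
STREAMS = ["Colour", "Depth", "DepthColour", "Skeleton"]

def list_frames(sequence_root):
    """Yield (take_folder, stream, filename) for every frame file
    under one extracted action sequence (30 actor/repetition folders,
    each with 4 stream subfolders)."""
    for take in sorted(Path(sequence_root).iterdir()):
        if not take.is_dir():
            continue
        for stream in STREAMS:
            stream_dir = take / stream
            if stream_dir.is_dir():
                for frame in sorted(stream_dir.iterdir()):
                    yield take.name, stream, frame.name
```

Because the three streams were recorded synchronously at 30 fps, frames with the same index across the Colour, Depth and Skeleton folders correspond to the same instant.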
For any queries related to the dataset please email Dr Victoria Bloom