MPII Human Pose Dataset

Team

Mykhaylo Andriluka
Leonid Pishchulin
Peter Gehler
Bernt Schiele

If you have any questions, please contact leonid at mpi-inf.mpg.de directly.


Introduction

The MPII Human Pose dataset is a state-of-the-art benchmark for evaluation of articulated human pose estimation. The dataset includes around 25K images containing over 40K people with annotated body joints. The images were systematically collected using an established taxonomy of everyday human activities. Overall the dataset covers 410 human activities, and each image is provided with an activity label. Each image was extracted from a YouTube video and is provided with preceding and following unannotated frames. In addition, for the test set we obtained richer annotations including body part occlusions and 3D torso and head orientations.

Following best practices for performance evaluation benchmarks in the literature, we withhold the test annotations to prevent overfitting and tuning on the test set. We are working on an automatic evaluation server and performance analysis tools based on the rich test set annotations.

Citing the dataset

@inproceedings{andriluka14cvpr,
               author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Bernt Schiele},
               title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
               booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
               year = {2014},
               month = {June}
}

Publications

CVPR'14 paper   CVPR'14 poster   GCPR'14 paper   GCPR'14 poster

Dataset sample images; red boxes denote training samples.

Download

You can download the images and annotations from the MPII Human Pose benchmark here:

Images (12.9 GB)

Annotations (12.5 MB)

Below you can download short videos including the preceding and following frames for each image. The videos are split for download into 25 batches of ~17 GB each:

Batch 1  Batch 2  Batch 3  Batch 4  Batch 5 

Batch 6  Batch 7  Batch 8  Batch 9  Batch 10 

Batch 11  Batch 12  Batch 13  Batch 14  Batch 15 

Batch 16  Batch 17  Batch 18  Batch 19  Batch 20 

Batch 21  Batch 22  Batch 23  Batch 24  Batch 25

Image - video mapping (239 KB)

Annotation description

Annotations are stored in a MATLAB structure RELEASE with the following fields:

  • .annolist(imgidx) - annotations for image imgidx

    • .image.name - image filename
    • .annorect(ridx) - body annotations for a person ridx
      • .x1, .y1, .x2, .y2 - coordinates of the head rectangle
      • .scale - person scale w.r.t. 200 px height
      • .objpos - rough human position in the image
      • .annopoints.point - person-centric body joint annotations
        • .x, .y - coordinates of a joint
        • id - joint id (0 - r ankle, 1 - r knee, 2 - r hip, 3 - l hip, 4 - l knee, 5 - l ankle, 6 - pelvis, 7 - thorax, 8 - upper neck, 9 - head top, 10 - r wrist, 11 - r elbow, 12 - r shoulder, 13 - l shoulder, 14 - l elbow, 15 - l wrist)
        • is_visible - joint visibility
    • .vidx - video index in video_list
    • .frame_sec - image position in video, in seconds
  • img_train(imgidx) - training/testing image assignment
  • single_person(imgidx) - contains rectangle id ridx of sufficiently separated individuals
  • act(imgidx) - activity/category label for image imgidx
    • act_name - activity name
    • cat_name - category name
    • act_id - activity id
  • video_list(videoidx) - specifies the video id as provided by YouTube. To watch a video on YouTube, go to https://www.youtube.com/watch?v=video_list(videoidx)
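
For use outside MATLAB, the RELEASE structure can also be read with SciPy. A minimal Python sketch (the annotation file name is an assumption; the joint-id table mirrors the list above):

```python
# Joint ids as listed in the annotation description above.
JOINT_NAMES = {
    0: "r ankle", 1: "r knee", 2: "r hip",
    3: "l hip", 4: "l knee", 5: "l ankle",
    6: "pelvis", 7: "thorax", 8: "upper neck", 9: "head top",
    10: "r wrist", 11: "r elbow", 12: "r shoulder",
    13: "l shoulder", 14: "l elbow", 15: "l wrist",
}

def load_release(path="mpii_human_pose_v1_u12_1.mat"):
    """Load the RELEASE annotation struct (file name assumed).

    struct_as_record=False exposes MATLAB struct fields as Python
    attributes, e.g. release.annolist[imgidx].image.name.
    """
    import scipy.io as sio  # requires SciPy
    mat = sio.loadmat(path, struct_as_record=False, squeeze_me=True)
    return mat["RELEASE"]
```

Note that fields such as annopoints are absent for unannotated (test) people, so per-field existence checks are needed when iterating.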

Evaluation on MPII Human Pose Dataset

This README provides instructions on how to prepare your predictions using MATLAB for evaluation on the MPII Human Pose Dataset. Predictions should be emailed to [leonid at mpi-inf.mpg.de] and [eldar at mpi-inf.mpg.de].

Preliminaries

Download the evaluation toolkit.

Multi-Person Pose Estimation

Evaluation protocol

Preparing predictions

  1. Extract testing annotation list structure from the entire annotation list
    annolist_test = RELEASE.annolist(RELEASE.img_train == 0);
  2. Extract groups of people using the getMultiPersonGroups.m function from the evaluation toolkit
    load('groups_v12.mat','groups');
    [imgidxs_multi_test,rectidxs_multi_test] = getMultiPersonGroups(groups,RELEASE,false);

    where imgidxs_multi_test are image IDs containing groups and rectidxs_multi_test are rectangle IDs of people in each group.

  3. Split testing images into groups
    pred = annolist_test(imgidxs_multi_test);
  4. For each body joint, set the predicted x_pred, y_pred coordinates in the original image, the 0-based joint id (see 'Annotation description'), and the prediction score
    pred(imgidx).annorect(ridx).annopoints.point(pidx).x = x_pred;
    pred(imgidx).annorect(ridx).annopoints.point(pidx).y = y_pred;
    pred(imgidx).annorect(ridx).annopoints.point(pidx).id = id;
    pred(imgidx).annorect(ridx).annopoints.point(pidx).score = score;
  5. Save predictions into pred_keypoints_mpii_multi.mat
    save('pred_keypoints_mpii_multi.mat','pred');
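
The struct layout produced by the MATLAB steps above can also be assembled in Python with SciPy. A sketch with hypothetical helper names; whether the toolkit accepts SciPy-written structs verbatim has not been verified here, so the MATLAB route above remains the documented one:

```python
def make_point(x, y, joint_id, score):
    """One predicted body joint in the field layout expected above.

    joint_id is the 0-based id from the annotation description;
    score is the prediction confidence (needed for mAP evaluation).
    """
    return {"x": float(x), "y": float(y),
            "id": int(joint_id), "score": float(score)}

def save_predictions(pred, path="pred_keypoints_mpii_multi.mat"):
    """Write predictions as a MATLAB struct array named 'pred'."""
    import scipy.io as sio  # nested dicts are saved as MATLAB structs
    sio.savemat(path, {"pred": pred})
```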

Evaluation Script

Evaluation is performed using the evaluateAP.m function.

Single Person Pose Estimation

Evaluation protocol

Preparing predictions

  1. Extract testing annotation list structure from the entire annotation list:
    annolist_test = RELEASE.annolist(RELEASE.img_train == 0);
  2. Extract image IDs and rectangle IDs of single persons
    rectidxs = RELEASE.single_person(RELEASE.img_train == 0);
  3. For each body joint of single persons, set the predicted x_pred, y_pred coordinates in the original image and the 0-based joint id (see 'Annotation description')
    pred = annolist_test;
    pred(imgidx).annorect(ridx).annopoints.point(pidx).x = x_pred;
    pred(imgidx).annorect(ridx).annopoints.point(pidx).y = y_pred;
    pred(imgidx).annorect(ridx).annopoints.point(pidx).id = id;
  4. Save predictions into pred_keypoints_mpii.mat
    save('pred_keypoints_mpii.mat','pred');

Evaluation Script

Evaluation is performed using the evaluatePCKh.m function.

Evaluation toolkit

Single Person

Evaluation is performed on sufficiently separated people only ("Single Person" subset).

Overall performance

PCKh evaluation measure

PCKh: a PCK measure that uses a matching threshold of 50% of the head segment length.
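
As a reference for the measure (not the official implementation), PCKh for one joint type can be sketched in Python, assuming the per-person head segment lengths are already computed; the official evaluatePCKh.m derives them from the annotated head rectangles:

```python
import numpy as np

def pckh(pred, gt, head_len, threshold=0.5):
    """PCKh for one joint type across N people.

    pred, gt: (N, 2) predicted and ground-truth joint coordinates
    head_len: (N,) head segment length per person
    Returns the fraction of predictions that lie within
    threshold * head_len of the ground truth.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    dist = np.linalg.norm(pred - gt, axis=1)
    return float(np.mean(dist <= threshold * np.asarray(head_len, dtype=float)))
```

For example, a prediction 10 px from the ground truth with a 10 px head segment length fails at PCKh@0.5 (5 px threshold) but passes at PCKh@1.0.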

PCKh @ 0.5
Method                            Head   Shoulder   Elbow   Wrist   Hip    Knee   Ankle   PCKh
Pishchulin et al., ICCV'13        74.3   49.0       40.8    34.1    36.5   34.4   35.2    44.1
Tompson et al., NIPS'14           95.8   90.3       80.5    74.3    77.6   69.7   62.8    79.6
Carreira et al., CVPR'16          95.7   91.7       81.7    72.4    82.8   73.2   66.4    81.3
Tompson et al., CVPR'15           96.1   91.9       83.9    77.8    80.9   72.3   64.8    82.0
Hu&Ramanan, CVPR'16               95.0   91.6       83.0    76.6    81.9   74.5   69.5    82.4
Pishchulin et al., CVPR'16*       94.1   90.2       83.4    77.3    82.6   75.7   68.6    82.4
Belagiannis&Zisserman, arXiv'16   97.2   92.6       84.6    78.4    83.7   75.7   70.0    83.9
Lifshitz et al., arXiv'16         97.8   93.3       85.7    80.4    85.3   76.6   70.2    85.0
Gkioxary et al., ECCV'16          96.2   93.1       86.7    82.1    85.2   81.4   74.1    86.1
Rafi et al., BMVC'16              97.2   93.9       86.4    81.3    86.8   80.6   73.4    86.3
Insafutdinov et al., ECCV'16      96.8   95.2       89.3    84.4    88.4   83.4   78.0    88.5
Wei et al., CVPR'16*              97.8   95.0       88.7    84.0    88.4   82.8   79.4    88.5
Bulat&Tzimiropoulos, ECCV'16      97.9   95.1       89.9    85.3    89.4   85.7   81.7    89.7
Newell et al., ECCV'16            98.2   96.3       91.2    87.1    90.1   87.4   83.6    90.9

* methods trained with the LSP training and LSP extended sets added to the MPII training set

Table as TEX



Performance vs. complexity measures

Performance by pose


Shown are medoids of body pose clusters arranged according to pose complexity.


Performance by viewpoint and activity


Shown are medoids of 3D torso orientation clusters arranged according to the cluster size:

cluster 1 represents upright frontal torso,

cluster 2 represents a slightly rotated, backward-facing upright torso,

cluster 6 represents torso bending towards the camera, etc.


Multi-Person

Evaluation is performed on groups of multiple people ("Multi-Person" subset).

mAP evaluation measure

Mean Average Precision (mAP) based evaluation of body joint predictions that form consistent body pose configurations.
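
The AP part of this measure can be illustrated with a minimal sketch. The official evaluateAP.m additionally assigns predicted poses to ground-truth poses before counting matches; this sketch takes that assignment as given (is_match is assumed precomputed per prediction):

```python
import numpy as np

def average_precision(scores, is_match, n_gt):
    """AP for one joint type.

    scores: (N,) prediction confidences
    is_match: (N,) booleans, True if the prediction was matched
              to an unclaimed ground-truth joint
    n_gt: total number of ground-truth joints of this type
    Returns the area under the precision-recall curve.
    """
    order = np.argsort(-np.asarray(scores, dtype=float))  # high scores first
    tp = np.asarray(is_match, dtype=float)[order]
    fp = 1.0 - tp
    tp_cum, fp_cum = np.cumsum(tp), np.cumsum(fp)
    recall = tp_cum / n_gt
    precision = tp_cum / (tp_cum + fp_cum)
    # sum precision over the recall increments at each ranked prediction
    return float(np.sum(np.diff(np.concatenate(([0.0], recall))) * precision))
```

mAP is then the mean of this value over all joint types.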

Performance on full set

mAP @ 0.5
Method                         Head   Shoulder   Elbow   Wrist   Hip    Knee   Ankle   mAP
Iqbal&Gall, ECCVw'16           58.4   53.9       44.5    35.0    42.2   36.7   31.1    43.1
Insafutdinov et al., ECCV'16   78.4   72.5       60.2    51.0    57.2   52.0   45.4    59.5

Table as TEX


Performance on subset of 288 testing images

mAP @ 0.5
Method                         Head   Shoulder   Elbow   Wrist   Hip    Knee   Ankle   mAP
Pishchulin et al., CVPR'16     73.1   71.7       58.0    39.9    56.1   43.5   31.9    53.5
Iqbal&Gall, ECCVw'16           70.0   65.2       56.4    46.1    52.7   47.9   44.5    54.7
Insafutdinov et al., ECCV'16   87.9   84.0       71.9    63.9    68.8   63.8   58.1    71.2

Table as TEX