TY - GEN
T1 - Iterated Second-Order Label Sensitive Pooling for 3D Human Pose Estimation
AU - Ionescu, Catalin
AU - Carreira, Joao
AU - Sminchisescu, Cristian
PY - 2014
Y1 - 2014
N2 - Recently, the emergence of Kinect systems has demonstrated the benefits of predicting an intermediate body part labeling for 3D human pose estimation, in conjunction with RGB-D imagery. The availability of depth information plays a critical role, so an important question is whether a similar representation can be developed with sufficient robustness to estimate 3D pose from RGB images. This paper provides evidence for a positive answer by leveraging (a) 2D human body part labeling in images, (b) second-order label-sensitive pooling over dynamically computed regions resulting from a hierarchical decomposition of the body, and (c) iterative structured-output modeling to contextualize the process based on 3D pose estimates. For robustness and generalization, we take advantage of a recent large-scale 3D human motion capture dataset, Human3.6M [18], which also provides human body part labeling annotations for its images. We provide extensive experimental studies comparing alternative intermediate representations and report a substantial 33% error reduction over competitive discriminative baselines that regress 3D human pose against global HOG features.
AB - Recently, the emergence of Kinect systems has demonstrated the benefits of predicting an intermediate body part labeling for 3D human pose estimation, in conjunction with RGB-D imagery. The availability of depth information plays a critical role, so an important question is whether a similar representation can be developed with sufficient robustness to estimate 3D pose from RGB images. This paper provides evidence for a positive answer by leveraging (a) 2D human body part labeling in images, (b) second-order label-sensitive pooling over dynamically computed regions resulting from a hierarchical decomposition of the body, and (c) iterative structured-output modeling to contextualize the process based on 3D pose estimates. For robustness and generalization, we take advantage of a recent large-scale 3D human motion capture dataset, Human3.6M [18], which also provides human body part labeling annotations for its images. We provide extensive experimental studies comparing alternative intermediate representations and report a substantial 33% error reduction over competitive discriminative baselines that regress 3D human pose against global HOG features.
U2 - 10.1109/CVPR.2014.215
DO - 10.1109/CVPR.2014.215
M3 - Paper in conference proceeding
SP - 1661
EP - 1668
BT - 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
PB - IEEE - Institute of Electrical and Electronics Engineers Inc.
T2 - 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014
Y2 - 23 June 2014 through 28 June 2014
ER -