TY - THES
T1 - Active and Physics-Based Human Pose Reconstruction
AU - Gärtner, Erik
N1 - Defence details
Date: 2023-01-13
Time: 10:15
Place: Lecture Hall MH:Hörmander, Centre for Mathematical Sciences, Sölvegatan 18, Faculty of Engineering LTH, Lund University, Lund. The dissertation will be live streamed, but part of the premises is to be excludes from the live stream.
External reviewer(s)
Name: Khan, Fahad
Title: Docent
Affiliation: Linköping University, Sweden.
---
PY - 2023
Y1 - 2023
N2 - Perceiving humans is an important and complex problem within computervision. Its significance is derived from its numerous applications, suchas human-robot interaction, virtual reality, markerless motion capture,and human tracking for autonomous driving. The difficulty lies in thevariability in human appearance, physique, and plausible body poses. Inreal-world scenes, this is further exacerbated by difficult lightingconditions, partial occlusions, and the depth ambiguity stemming fromthe loss of information during the 3d to 2d projection. Despite thesechallenges, significant progress has been made in recent years,primarily due to the expressive power of deep neural networks trained onlarge datasets. However, creating large-scale datasets with 3dannotations is expensive, and capturing the vast diversity of the realworld is demanding. Traditionally, 3d ground truth is captured usingmotion capture laboratories that require large investments. Furthermore,many laboratories cannot easily accommodate athletic and dynamicmotions. This thesis studies three approaches to improving visualperception, with emphasis on human pose estimation, that can complementimprovements to the underlying predictor or training data.The first two papers present active human pose estimation, where areinforcement learning agent is tasked with selecting informativeviewpoints to reconstruct subjects efficiently. The papers discard thecommon assumption that the input is given and instead allow the agent tomove to observe subjects from desirable viewpoints, e.g., those whichavoid occlusions and for which the underlying pose estimator has a lowprediction error.The third paper introduces the task of embodied visual active learning,which goes further and assumes that the perceptual model is notpre-trained. Instead, the agent is tasked with exploring its environmentand requesting annotations to refine its visual model. Learning toexplore novel scenarios and efficiently request annotation for new datais a step towards life-long learning, where models can evolve beyondwhat they learned during the initial training phase. We study theproblem for segmentation, though the idea is applicable to otherperception tasks.Lastly, the final two papers propose improving human pose estimation byintegrating physical constraints. These regularize the reconstructedmotions to be physically plausible and serve as a complement to currentkinematic approaches. Whether a motion has been observed in the trainingdata or not, the predictions should obey the laws of physics. Throughintegration with a physical simulator, we demonstrate that we can reducereconstruction artifacts and enforce, e.g., contact constraints.
AB - Perceiving humans is an important and complex problem within computervision. Its significance is derived from its numerous applications, suchas human-robot interaction, virtual reality, markerless motion capture,and human tracking for autonomous driving. The difficulty lies in thevariability in human appearance, physique, and plausible body poses. Inreal-world scenes, this is further exacerbated by difficult lightingconditions, partial occlusions, and the depth ambiguity stemming fromthe loss of information during the 3d to 2d projection. Despite thesechallenges, significant progress has been made in recent years,primarily due to the expressive power of deep neural networks trained onlarge datasets. However, creating large-scale datasets with 3dannotations is expensive, and capturing the vast diversity of the realworld is demanding. Traditionally, 3d ground truth is captured usingmotion capture laboratories that require large investments. Furthermore,many laboratories cannot easily accommodate athletic and dynamicmotions. This thesis studies three approaches to improving visualperception, with emphasis on human pose estimation, that can complementimprovements to the underlying predictor or training data.The first two papers present active human pose estimation, where areinforcement learning agent is tasked with selecting informativeviewpoints to reconstruct subjects efficiently. The papers discard thecommon assumption that the input is given and instead allow the agent tomove to observe subjects from desirable viewpoints, e.g., those whichavoid occlusions and for which the underlying pose estimator has a lowprediction error.The third paper introduces the task of embodied visual active learning,which goes further and assumes that the perceptual model is notpre-trained. Instead, the agent is tasked with exploring its environmentand requesting annotations to refine its visual model. Learning toexplore novel scenarios and efficiently request annotation for new datais a step towards life-long learning, where models can evolve beyondwhat they learned during the initial training phase. We study theproblem for segmentation, though the idea is applicable to otherperception tasks.Lastly, the final two papers propose improving human pose estimation byintegrating physical constraints. These regularize the reconstructedmotions to be physically plausible and serve as a complement to currentkinematic approaches. Whether a motion has been observed in the trainingdata or not, the predictions should obey the laws of physics. Throughintegration with a physical simulator, we demonstrate that we can reducereconstruction artifacts and enforce, e.g., contact constraints.
KW - computer vision
KW - human pose estimation
KW - reinforcement learning
KW - physics-based human pose estimation
KW - active learning
M3 - Doctoral Thesis (compilation)
SN - 978-91-8039-471-0
T3 - Dissertation
PB - Department of Computer Science, Lund University
CY - Lund
ER -