Reconstructing Hands in 3D with Transformers

Georgios Pavlakos1     Dandan Shan2     Ilija Radosavovic1     Angjoo Kanazawa1     David Fouhey3     Jitendra Malik1
1University of California, Berkeley   2University of Michigan   3New York University
CVPR 2024

Paper
Data
Code
Hugging Face Demo
Colab Demo



More Results

For these videos, we detect hands using a whole-body keypoint detector. We have also experimented with a hand bounding-box detector. The whole-body detector is more robust for smaller hands when part of the rest of the body is visible; because it uses body context, it is also more accurate at estimating handedness (left vs. right). However, it requires running a person detector first. The hand bounding-box detector, on the other hand, is more accurate for larger hands and needs no additional preprocessing (i.e., no human detector), but it can miss smaller hands and is more prone to left/right classification errors. We will provide both alternatives with our demo code.
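In the whole-body pipeline described above, one intermediate step is turning detected hand keypoints into a crop box for the reconstruction model. As a minimal sketch (not the authors' actual code; the function name, confidence threshold, and padding factor are illustrative assumptions), one can fit a padded square box around the confidently detected hand keypoints:

```python
import numpy as np

def hand_bbox_from_keypoints(keypoints, conf, conf_thresh=0.5, scale=1.2):
    """Fit a square box around confident hand keypoints (illustrative sketch).

    keypoints: (N, 2) array of (x, y) pixel coordinates.
    conf:      (N,) array of per-keypoint confidence scores.
    Returns [x0, y0, x1, y1], or None if too few keypoints are confident
    (e.g., the hand is too small or occluded, so the detector misses it).
    """
    valid = keypoints[conf > conf_thresh]
    if len(valid) < 3:  # not enough evidence for a reliable box
        return None
    x0, y0 = valid.min(axis=0)
    x1, y1 = valid.max(axis=0)
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    # Square box, padded by `scale` so the full hand fits in the crop.
    size = max(x1 - x0, y1 - y0) * scale
    return np.array([cx - size / 2, cy - size / 2,
                     cx + size / 2, cy + size / 2])
```

A dedicated hand bounding-box detector would return such boxes directly, skipping this step, which is why it needs no person detector but also cannot use body context to resolve left/right.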



Citation



Acknowledgements

This research was supported by the DARPA Machine Common Sense program, ONR MURI, as well as BAIR/BDD sponsors. We thank members of the BAIR community for helpful discussions. We also thank StabilityAI for supporting us through a compute grant. DF and DS were supported by the National Science Foundation under Grant No. 2006619. This webpage template was borrowed from some colorful folks. Music credits: SLAHMR. Icons: Flaticon.