DemoHLM: From One Demonstration to Generalizable Humanoid Loco-Manipulation

Yuhui Fu*1,2   Feiyang Xie*1,2   Chaoyi Xu2   Jing Xiong1   Haoqi Yuan1,2   Zongqing Lu§1,2


1PKU     2BeingBeyond


*Equal Contribution   §Corresponding Author

Abstract

    Loco-manipulation is a fundamental challenge for humanoid robots aiming at versatile interaction in human environments. Although recent studies have made significant progress in humanoid whole-body control, loco-manipulation remains underexplored and often relies on hard-coded task definitions or costly real-world data collection, which limits autonomy and generalization. We present DemoHLM, a framework that enables generalizable loco-manipulation on a real humanoid robot from a single demonstration collected in simulation. DemoHLM adopts a hierarchy that integrates a low-level universal whole-body controller with high-level manipulation policies for multiple tasks. The whole-body controller maps whole-body motion commands to joint torques and provides omnidirectional mobility. The manipulation policies, learned in simulation via our data generation and imitation learning pipeline, command the whole-body controller with closed-loop visual feedback to execute challenging loco-manipulation tasks. Experiments show a positive correlation between the amount of synthetic data and policy performance, underscoring the effectiveness of our data generation pipeline and the data efficiency of our approach. Real-world experiments on a Unitree G1 robot equipped with an RGB-D camera validate the sim-to-real transferability of DemoHLM, demonstrating robust performance under spatial variations across ten loco-manipulation tasks.
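
    To make the hierarchy concrete, below is a minimal, runnable Python sketch of the closed control loop described above. All class names, command dimensions, and control rates are illustrative assumptions rather than the actual DemoHLM interfaces; in DemoHLM both levels are learned networks.

# A minimal sketch of the two-level hierarchy: a high-level manipulation
# policy replans at a low rate from visual feedback, while the whole-body
# controller tracks its latest command at a higher rate. All names,
# dimensions, and rates are illustrative assumptions.
import numpy as np

class WholeBodyController:
    """Low level: maps a whole-body motion command plus proprioception to
    joint torques (stand-in linear map; the real controller is learned)."""
    def __init__(self, num_joints=29, command_dim=10):
        self.num_joints = num_joints
        rng = np.random.default_rng(0)
        self.W = rng.normal(scale=0.1, size=(num_joints, command_dim + num_joints))

    def compute_torques(self, command, proprio):
        return self.W @ np.concatenate([command, proprio])

class ManipulationPolicy:
    """High level: consumes proprioception and the object pose in the camera
    frame, and emits a command for the whole-body controller. Stand-in
    heuristic; the real policy is trained by imitation in simulation."""
    def act(self, proprio, object_pose_cam, command_dim=10):
        command = np.zeros(command_dim)
        command[:3] = 0.5 * object_pose_cam[:3]  # crude "move toward object"
        return command

def run_episode(policy, wbc, high_level_hz=10, low_level_hz=100, seconds=2):
    """Closed-loop execution: policy at 10 Hz, controller at 100 Hz
    (illustrative rates)."""
    proprio = np.zeros(wbc.num_joints)
    object_pose_cam = np.array([0.6, 0.1, 0.9, 0.0, 0.0, 0.0])  # xyz + rpy
    for _ in range(seconds * high_level_hz):
        command = policy.act(proprio, object_pose_cam)
        for _ in range(low_level_hz // high_level_hz):
            torques = wbc.compute_torques(command, proprio)
            proprio = 0.99 * proprio + 0.01 * torques  # toy surrogate dynamics

run_episode(ManipulationPolicy(), WholeBodyController())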

Method Overview

[Figure: overview of the DemoHLM framework]

    Overview of DemoHLM. For each task, we collect a single demonstration via VR teleoperation in simulation and record the robot trajectory in the object frame. Our data generation pipeline then uses this trajectory to synthesize data for both the pre-manipulation and manipulation phases. The generated transitions include robot proprioception, object poses in the camera frame, and actions expressed as high-level commands sent to the whole-body controller. A manipulation policy is trained on this dataset with imitation learning and deployed on a real robot to perform loco-manipulation.
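
    The object-frame recording is what makes one demonstration reusable: replaying the same relative trajectory under randomized object poses yields many distinct world-frame episodes. The Python sketch below illustrates this idea in a simplified planar setting; the frame conventions, randomization ranges, and action encoding are assumptions for illustration, and the actual pipeline runs in simulation and also synthesizes the pre-manipulation (approach) phase.

# Simplified planar sketch of object-centric data generation: one demonstrated
# trajectory, recorded in the object frame, is replayed under randomized object
# poses to synthesize many training transitions.
import numpy as np

def pose_to_mat(xy, yaw):
    """2-D rigid transform (x, y, yaw) as a 3x3 homogeneous matrix."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, xy[0]],
                     [s,  c, xy[1]],
                     [0.0, 0.0, 1.0]])

def generate_transitions(demo_obj_frame, num_episodes=100, seed=0):
    """demo_obj_frame: (T, 3) robot waypoints (x, y, yaw) in the object frame.
    Returns (robot pose, object pose relative to robot, command) tuples; the
    relative pose stands in for the camera-frame object observation."""
    rng = np.random.default_rng(seed)
    dataset = []
    for _ in range(num_episodes):
        # Randomize the object pose in the world (spatial variation).
        T_world_obj = pose_to_mat(rng.uniform(-0.3, 0.3, size=2),
                                  rng.uniform(-0.5, 0.5))
        for t in range(len(demo_obj_frame) - 1):
            x, y, yaw = demo_obj_frame[t]
            # Demonstrated waypoint, re-expressed in the world frame.
            T_world_robot = T_world_obj @ pose_to_mat((x, y), yaw)
            # Observation: object pose relative to the current robot pose.
            T_robot_obj = np.linalg.inv(T_world_robot) @ T_world_obj
            # Action: the next demonstrated waypoint, as a high-level command.
            action = demo_obj_frame[t + 1]
            dataset.append((T_world_robot, T_robot_obj, action))
    return dataset

# A straight-line approach toward the object as a stand-in demonstration.
demo = np.linspace([-1.0, 0.3, 0.0], [0.0, 0.0, 0.0], num=20)
data = generate_transitions(demo)  # 100 episodes x 19 transitions each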

Real-World Experiments

[Figure: real-world policy rollouts on the Unitree G1]

    Real-world policy rollouts. Each pair of rows shows time-aligned first-person and third-person views. Frames progress from left to right over time.

BibTeX

@article{demohlm,
  title={DemoHLM: From One Demonstration to Generalizable Humanoid Loco-Manipulation},
  author={Fu, Yuhui and Xie, Feiyang and Xu, Chaoyi and Xiong, Jing and Yuan, Haoqi and Lu, Zongqing},
  journal={arXiv preprint arXiv:2510.11258},
  year={2025}
}