CoorDex lets humanoids move while manipulating
CoorDex turns humanoid body and hand control into latent priors so dexterous manipulation can happen while the robot is moving.

- Research org: Unspecified in arXiv abstract
- Platform: Unitree G1 humanoid with a 20-DoF WUJI hand
- Breakthrough: Frozen body-hand latent priors with coordinated residual control
Humanoid manipulation has a familiar failure mode: the robot walks to the object, stops, does the task, then starts walking again. That stop-and-go pattern is easier to train, but it leaves a lot of capability on the table when the real goal is continuous movement through cluttered, dynamic, human-like environments.
This paper argues that the bottleneck is not just locomotion or grasping alone. It is the interface between them. CoorDex is built to make body motion and dexterous hand motion cooperate instead of compete, so a humanoid can keep moving while it reaches, grasps, carries, and opens things.
What problem CoorDex is trying to fix
Most humanoid loco-manipulation systems simplify the task by breaking it into phases. Walk to the target. Stop. Manipulate. Resume walking. That decomposition makes learning easier, but it also creates a gap between “I can do the task” and “I can do the task while staying in motion.”

The abstract also points out another common simplification: low-DoF end effectors that behave like an open-close grasp primitive. That works for basic pick-and-place scenarios, but it does not capture the richer contact patterns needed for dexterous manipulation, especially when the robot is moving at the same time.
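To make that contrast concrete, here is a toy sketch (not from the paper) of why an open-close primitive underuses a 20-DoF hand. A single scalar interpolates between two poses, so it can only reach hand configurations on one line in joint space, while direct control can set every joint independently. The pose values, joint groupings, and function names are hypothetical and chosen only for illustration.

```python
from typing import List

NUM_HAND_JOINTS = 20  # the 20-DoF hand named in the article

# Hypothetical poses in radians; real hands have per-joint limits and shapes.
OPEN_POSE: List[float] = [0.0] * NUM_HAND_JOINTS
CLOSED_POSE: List[float] = [1.2] * NUM_HAND_JOINTS


def grasp_primitive(g: float) -> List[float]:
    """Open-close primitive: one scalar g in [0, 1] sets all 20 joints at once."""
    g = min(max(g, 0.0), 1.0)
    return [o + g * (c - o) for o, c in zip(OPEN_POSE, CLOSED_POSE)]


def direct_hand_command(targets: List[float]) -> List[float]:
    """Full dexterous control: one independent target per joint."""
    if len(targets) != NUM_HAND_JOINTS:
        raise ValueError(f"expected {NUM_HAND_JOINTS} targets, got {len(targets)}")
    return list(targets)


def reachable_by_primitive(pose: List[float], tol: float = 1e-9) -> bool:
    """True if the pose lies on the open-close line, i.e. some g reproduces it."""
    g = (pose[0] - OPEN_POSE[0]) / (CLOSED_POSE[0] - OPEN_POSE[0])
    if not -tol <= g <= 1.0 + tol:
        return False
    return all(abs(p - q) <= tol for p, q in zip(pose, grasp_primitive(g)))


if __name__ == "__main__":
    # Hypothetical pinch: the first 8 joints (say, thumb and index) flexed, the rest open.
    pinch = [1.0] * 8 + [0.0] * 12
    print(reachable_by_primitive(grasp_primitive(0.5)))  # True
    print(reachable_by_primitive(pinch))                 # False
    print(len(direct_hand_command(pinch)))               # 20
```

A pinch-like pose that flexes only some fingers is out of reach for the primitive no matter what scalar it receives. That is the kind of contact pattern the richer hand is meant to unlock.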
CoorDex is designed around those two limitations. It targets high-DoF body and hand control together, and it aims to preserve natural whole-body motion while improving finger-level contact reliability. For developers, that matters because the interface between locomotion and manipulation is often where real-world policies become brittle.
How the method works in plain English
The key idea is to stop asking one policy to directly output every joint command from scratch. Instead, CoorDex converts high-dimensional body and hand control into coordinated latent residual control. In practical terms, that means the system learns compact priors for how the body and hand should move, then trains a downstream policy to make small corrections on top of those priors.
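Here is a minimal sketch of the residual idea, assuming the simplest possible form. A frozen decoder maps a latent vector to joint targets, the prior supplies a latent, and the learned policy only adds a small scaled correction in latent space. The linear decoder, the `scale` value, and all names here are hypothetical stand-ins. The paper's priors are learned networks whose exact form the abstract does not describe.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def make_frozen_linear_decoder(weights: Sequence[Sequence[float]]) -> Callable[[Sequence[float]], Vector]:
    """Toy stand-in for a learned prior decoder: joint_targets = W @ z.

    The weights are copied into tuples so nothing downstream can modify them,
    mirroring a prior that stays fixed after it has been trained.
    """
    frozen = tuple(tuple(row) for row in weights)

    def decode(z: Sequence[float]) -> Vector:
        return [sum(w * zi for w, zi in zip(row, z)) for row in frozen]

    return decode


def residual_latent_action(
    prior_latent: Sequence[float],
    residual: Sequence[float],
    decoder: Callable[[Sequence[float]], Vector],
    scale: float = 0.1,
) -> Vector:
    """Decode z = prior_latent + scale * residual into joint targets."""
    if len(prior_latent) != len(residual):
        raise ValueError("residual must match the latent dimension")
    z = [p + scale * r for p, r in zip(prior_latent, residual)]
    return decoder(z)


if __name__ == "__main__":
    # Hypothetical 2-D latent driving 4 joints.
    decoder = make_frozen_linear_decoder([[1, 0], [0, 1], [1, 1], [1, -1]])
    prior_latent = [0.5, 0.2]
    print(residual_latent_action(prior_latent, [0.0, 0.0], decoder))  # pure prior behavior
    print(residual_latent_action(prior_latent, [1.0, 0.0], decoder))  # small learned correction
```

With a zero residual the output is exactly the prior's behavior. The policy therefore starts from motion that already looks reasonable and only has to learn corrections.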
The pipeline starts with simulated whole-body and hand demonstrations. From those demonstrations, CoorDex trains privileged motion tracking teachers: one for the humanoid body and one for the dexterous hand. These teachers are then distilled into proprioception-conditioned latent priors.
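The abstract does not spell out the distillation objective. One recipe common in latent motion-prior work is assumed here purely for illustration. An encoder may see teacher-side information, a prior sees only proprioception, and a decoder reconstructs the teacher's action. The loss is a reconstruction term plus a KL term that pulls the encoder's latent distribution toward the proprioception-conditioned prior. The sketch computes that loss for diagonal Gaussians. `beta` and all function names are hypothetical.

```python
import math
from typing import Sequence


def diagonal_gaussian_kl(
    mu_q: Sequence[float], sigma_q: Sequence[float],
    mu_p: Sequence[float], sigma_p: Sequence[float],
) -> float:
    """KL(q || p) between diagonal Gaussians, summed over latent dimensions.

    q: encoder posterior (may use privileged, teacher-side inputs)
    p: proprioception-conditioned prior (the part usable at deployment)
    """
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2) - 0.5
    return total


def distillation_loss(
    decoded_action: Sequence[float],
    teacher_action: Sequence[float],
    mu_q: Sequence[float], sigma_q: Sequence[float],
    mu_p: Sequence[float], sigma_p: Sequence[float],
    beta: float = 0.01,
) -> float:
    """Mean-squared action reconstruction error plus a beta-weighted KL regularizer."""
    if len(decoded_action) != len(teacher_action):
        raise ValueError("action dimensions must match")
    recon = sum((a - t) ** 2 for a, t in zip(decoded_action, teacher_action)) / len(teacher_action)
    return recon + beta * diagonal_gaussian_kl(mu_q, sigma_q, mu_p, sigma_p)
```

In this assumed setup, only the prior and decoder are needed once training ends. That is what would let the resulting interface run from proprioception alone, without the privileged inputs the teachers relied on.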
Those priors are frozen and reused as the action space for downstream residual reinforcement learning. That choice is important. Instead of letting reinforcement learning discover everything from a huge action space, the policy works inside a structured latent interface that already encodes useful motion behavior.
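To show what "frozen prior as the action space" means inside an RL loop, here is a toy wrapper. It assumes a simple environment that accepts joint targets. The agent only ever emits a short, clipped latent residual, and the wrapper turns it into full joint targets through a prior that is never updated. `ToyJointEnv`, `LatentActionWrapper`, the block-wise decoder, and the constant prior mean are all hypothetical. A real setup would wrap a physics simulator, often through the action-wrapper pattern found in Gym-style RL libraries.

```python
from typing import Callable, List, Sequence

Vector = List[float]


class ToyJointEnv:
    """Hypothetical stand-in simulator: joints move halfway toward their targets each step."""

    def __init__(self, num_joints: int) -> None:
        self.q: Vector = [0.0] * num_joints

    def step(self, joint_targets: Sequence[float]) -> Vector:
        self.q = [q + 0.5 * (t - q) for q, t in zip(self.q, joint_targets)]
        return list(self.q)  # proprioception returned to the caller


class LatentActionWrapper:
    """Exposes a low-dimensional latent-residual action space over a frozen prior."""

    def __init__(
        self,
        env: ToyJointEnv,
        prior_mean: Callable[[Sequence[float]], Vector],  # frozen: proprio -> latent
        decoder: Callable[[Sequence[float]], Vector],     # frozen: latent -> joint targets
        latent_dim: int,
        residual_limit: float = 1.0,
        scale: float = 0.1,
    ) -> None:
        self.env = env
        self.prior_mean = prior_mean
        self.decoder = decoder
        self.action_dim = latent_dim  # the only action size the RL agent sees
        self.residual_limit = residual_limit
        self.scale = scale
        self.proprio: Vector = list(env.q)

    def step(self, residual: Sequence[float]) -> Vector:
        if len(residual) != self.action_dim:
            raise ValueError(f"expected a {self.action_dim}-D residual")
        lim = self.residual_limit
        clipped = [max(-lim, min(lim, r)) for r in residual]  # keep corrections small
        z_prior = self.prior_mean(self.proprio)
        z = [zp + self.scale * r for zp, r in zip(z_prior, clipped)]
        self.proprio = self.env.step(self.decoder(z))
        return self.proprio


def group_decoder(z: Sequence[float], joints_per_latent: int = 10) -> Vector:
    """Hypothetical decoder: each latent dimension drives one block of joints."""
    return [zi for zi in z for _ in range(joints_per_latent)]


if __name__ == "__main__":
    env = ToyJointEnv(num_joints=20)
    wrapped = LatentActionWrapper(env, prior_mean=lambda proprio: [0.0, 0.0],
                                  decoder=group_decoder, latent_dim=2)
    print(wrapped.action_dim, len(env.q))  # 2 20
    print(wrapped.step([10.0, 0.0])[:3])   # residual is clipped to 1.0 before decoding
```

The agent explores a 2-D space instead of a 20-D one, and clipping bounds how far it can push the prior on any step. Both properties are the kind of structure the article credits for making training tractable.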
The final policy is coordinated rather than monolithic. It composes the body and hand priors through shared task context, but keeps separate residual heads for body and hand. In other words, the model can stay aligned on the task while still letting the locomotion and manipulation sides adapt independently.
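The structural difference between a coordinated policy and a monolithic one fits in a few lines. The following choices are assumed here and not taken from the paper. A shared trunk embeds the task context, each head also reads its own proprioception, and the body and hand heads have disjoint weights, so changing one head does not directly change the other's output. Layer sizes, the tanh nonlinearity, random initialization, and all names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class CoordinatedResidualPolicy:
    """Shared task-context trunk feeding separate body and hand residual heads."""

    def __init__(self, context_dim: int, body_obs_dim: int, hand_obs_dim: int,
                 hidden_dim: int, body_latent_dim: int, hand_latent_dim: int,
                 seed: int = 0) -> None:
        rng = random.Random(seed)
        self.trunk = random_matrix(hidden_dim, context_dim, rng)
        # Each head reads the shared embedding plus its own proprioception.
        self.body_head = random_matrix(body_latent_dim, hidden_dim + body_obs_dim, rng)
        self.hand_head = random_matrix(hand_latent_dim, hidden_dim + hand_obs_dim, rng)

    def __call__(self, context: Sequence[float], body_obs: Sequence[float],
                 hand_obs: Sequence[float]) -> Tuple[Vector, Vector]:
        shared = [math.tanh(x) for x in matvec(self.trunk, context)]  # shared task embedding
        body_residual = matvec(self.body_head, shared + list(body_obs))
        hand_residual = matvec(self.hand_head, shared + list(hand_obs))
        return body_residual, hand_residual


if __name__ == "__main__":
    policy = CoordinatedResidualPolicy(context_dim=4, body_obs_dim=6, hand_obs_dim=5,
                                       hidden_dim=8, body_latent_dim=3, hand_latent_dim=2)
    body_res, hand_res = policy([0.1, -0.2, 0.0, 0.3], [0.0] * 6, [0.0] * 5)
    print(len(body_res), len(hand_res))  # 3 2
```

A monolithic variant would instead feed everything into one head that predicts a single combined latent. The abstract reports that this monolithic latent prediction fails on the walk-grasp-carry task.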
What the paper actually shows
The abstract does not report benchmark percentages or success rates, so there are no numeric performance claims to quote here. What it does provide is a concrete capability demonstration: CoorDex enables a Unitree G1 humanoid with a 20-DoF WUJI hand to execute dexterous manipulation while in motion.

The examples called out in the abstract are specific and useful. The robot can do non-stop bottle grasping and carrying, open a fridge door while moving, and perform cube pick-and-turn. Those are not just isolated arm motions; they combine locomotion, contact-rich hand use, and whole-body coordination.
The ablation results are the strongest evidence in the abstract. On the walk-grasp-carry task, joint-space PPO fails under the same reward budget. Joint-space hand control also fails. Monolithic latent prediction fails too. The paper says the latent-prior interface and coordinated residual structure are what make high-dimensional contact-rich loco-manipulation trainable.
That is the main engineering message: the win is not simply “better RL” or “more compute.” It is the structure of the action space. The paper suggests that if you want continuous dexterous humanoid behavior, the policy needs a better control interface before it needs a bigger optimizer.
Why this matters for developers
If you work on humanoid robots, this paper is a reminder that task decomposition can be a trap. Splitting locomotion and manipulation into separate phases may get a demo working, but it can also hide the harder integration problem: maintaining stable contact while the base is still moving.
The latent-prior approach is especially relevant for robotics stacks that already rely on imitation learning, motion priors, or hierarchical policies. CoorDex is basically saying that a frozen learned prior can be a better action interface than raw joint-space control when the task is contact-rich and high-dimensional.
There is also a practical systems lesson here. The paper uses privileged teachers, then distills them into proprioception-conditioned priors, then adds residual learning on top. That is a familiar pattern in modern robotics: use stronger supervision or simulation-only information early, then hand a cleaner interface to the downstream policy.
What is still unclear
The abstract leaves several important questions open. It does not give benchmark numbers, so we cannot tell how large the gains are. It also does not describe hardware runtime details, training cost, or whether the method generalizes beyond the listed tasks and robot setup.
Another open question is transferability. The abstract names a Unitree G1 with a 20-DoF WUJI hand, but it does not say how much of the pipeline depends on that specific platform, the simulator, or the demonstration data. For practitioners, that matters because latent-prior systems can be powerful but sometimes sensitive to the exact embodiment they were trained on.
Even so, the contribution is clear: CoorDex frames continuous humanoid loco-manipulation as a control-interface problem, not just a policy-size problem. If that framing holds up beyond the reported tasks, it is a useful template for anyone trying to make humanoids do more than walk, stop, and grab.
Bottom line
CoorDex shows a structured way to make body and hand control cooperate through frozen latent priors and residual learning. The paper’s concrete demos suggest that this is a promising path for moving humanoids that need to manipulate objects without coming to a halt.
- It replaces raw joint-space control with learned latent priors for body and hand.
- It uses separate residual heads to coordinate locomotion and dexterous manipulation.
- It demonstrates continuous tasks like bottle grasping, fridge opening, and cube pick-and-turn on a Unitree G1.
For robotics engineers, the takeaway is straightforward: if you want dexterous humanoid behavior in motion, the action interface may matter as much as the reward function.