LocoFormer: Generalist Locomotion via
Long-Context Adaptation

CoRL 2025 (Best Paper Award Finalist)

Abstract

Modern locomotion controllers are manually tuned for specific embodiments. We present LocoFormer, a generalist omni-bodied locomotion model that can control previously unseen legged and wheeled robots, even without precise knowledge of their kinematics. LocoFormer is able to adapt to changes in morphology and dynamics at test time. We find that two key choices enable adaptation. First, we train with massive-scale RL on procedurally generated robots under aggressive domain randomization. Second, in contrast to previous policies, which are myopic with short context lengths, we extend context by orders of magnitude so that it spans episode boundaries. We deploy the same LocoFormer to varied robots and show robust control even under large disturbances such as weight changes and motor failures. In extreme scenarios, we see emergent adaptation across episodes: LocoFormer learns from falls in early episodes to improve its control strategies in later ones. We believe this simple yet general recipe can be used to train foundation models for other robotic skills in the future.

We control all the robots with the same LocoFormer policy.

Emergent Adaptation Behaviors

Adaptation to Large Disturbances

LocoFormer begins without precise knowledge of the target embodiment and dynamically builds stable, embodiment-specific representations online. When facing large disturbances, such as morphology changes, motor failures, or weight changes, it rebuilds these representations to adapt on the fly.






Adaptation Across Trials

LocoFormer learns continuously through online experience. The policy can learn from falls in early trials to improve control strategies in later ones. Once an effective strategy is discovered, LocoFormer retains and leverages it in future trials.



Robust Humanoid Walking

LocoFormer effectively adapts to diverse humanoid morphologies, enabling robust and stable walking performance.



Specialist Policy Fails to Adapt

LocoFormer's adaptability stems from long-term memory and large-scale reinforcement learning on diverse morphologies with aggressive domain randomization. In contrast, policies specialized for a single embodiment typically fail to adapt effectively to significant morphological changes.


LocoFormer Overview

To enable long-context adaptation, we allow LocoFormer to attend to states from prior trials. We use a Transformer-XL backbone that divides the context into fixed-length segments. When processing the current segment, keys and values are also computed from states in the previous (cached) segment, but gradients are not propagated through them (blue lines). As the number of layers increases, this allows the Transformer-XL to use information from segments even earlier than the cached one (green lines). During training, the policy runs multiple trials within an episode; memory persists across trials, and the objective is to maximize the expected cumulative reward over the entire episode.
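The segment-level recurrence described above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the actual LocoFormer implementation: the `TXLLayer` class, dimensions, and layer count are hypothetical, and relative positional encodings and feed-forward sublayers are omitted for brevity.

```python
import torch
import torch.nn as nn


class TXLLayer(nn.Module):
    """One self-attention layer with Transformer-XL-style segment memory."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x, mem):
        # Keys/values span [cached previous segment, current segment].
        # The cache is detached, so gradients never flow into prior segments.
        kv = x if mem is None else torch.cat([mem.detach(), x], dim=1)
        out, _ = self.attn(x, kv, kv, need_weights=False)
        # The current segment's input becomes the cache for the next segment.
        return out, x


# Roll a 2-layer stack over consecutive segments. Each layer keeps its own
# memory, so with depth L information can propagate from up to L segments
# back, even though only one segment is explicitly cached per layer.
d_model, n_heads, seg_len = 32, 4, 16
layers = [TXLLayer(d_model, n_heads) for _ in range(2)]
mems = [None] * len(layers)
for _ in range(3):  # three consecutive segments; memory persists across them
    h = torch.randn(1, seg_len, d_model)  # placeholder proprioceptive features
    for i, layer in enumerate(layers):
        h, mems[i] = layer(h, mems[i])
```

Because the memory tensors are ordinary hidden states, carrying them across trial boundaries (rather than resetting them) is what lets the policy reuse experience from earlier trials in the same episode.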



We train LocoFormer with large-scale RL on a variety of procedurally generated robots with aggressive domain randomization.
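As one illustration of this training setup, morphology and dynamics parameters can be drawn afresh for each episode. The `RobotSpec` fields and numeric ranges below are hypothetical placeholders, not the paper's actual randomization scheme:

```python
import random
from dataclasses import dataclass


@dataclass
class RobotSpec:
    """Illustrative procedurally generated robot (fields are hypothetical)."""

    num_legs: int          # e.g. biped, quadruped, hexapod
    leg_length: float      # meters
    base_mass: float       # kg
    motor_strength: float  # per-episode scale factor on actuator torque


def sample_robot(rng: random.Random) -> RobotSpec:
    # Aggressive domain randomization: every episode draws a fresh morphology
    # and dynamics, so the policy cannot memorize any single embodiment.
    return RobotSpec(
        num_legs=rng.choice([2, 4, 6]),
        leg_length=rng.uniform(0.15, 0.6),
        base_mass=rng.uniform(3.0, 40.0),
        motor_strength=rng.uniform(0.6, 1.4),
    )


rng = random.Random(0)
batch = [sample_robot(rng) for _ in range(4)]
```

In practice such a spec would parameterize the simulator; the point of the sketch is that diversity is generated per episode, forcing adaptation from observations rather than from a fixed robot description.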

LocoFormer can be deployed to a wide variety of previously unseen robots, requiring just minutes for setup and seconds to adapt.


Failure Cases

On certain challenging morphologies, LocoFormer may not adapt quickly enough to sudden online disturbances. This occurs partly because, during training, the robot's morphology and specific physical properties (such as the center of mass) remain fixed throughout each episode. Nevertheless, after such failures, LocoFormer can initiate a new trial and quickly adapt by leveraging memory from previous unsuccessful trials.


BibTeX

@inproceedings{liulocoformer,
  title={LocoFormer: Generalist Locomotion via Long-context Adaptation},
  author={Liu, Min and Pathak, Deepak and Agarwal, Ananye},
  booktitle={9th Annual Conference on Robot Learning},
  year={2025}
}