Dynamic Legged Ball Manipulation on Rugged Terrains with Hierarchical Reinforcement Learning

Abstract

Advancing the dynamic loco-manipulation capabilities of quadruped robots in complex terrains is crucial for performing diverse tasks. Dynamic ball manipulation in rugged environments presents two key challenges. The first is coordinating distinct motion modalities so that terrain traversal and ball control integrate seamlessly. The second is overcoming the sparse rewards of end-to-end deep reinforcement learning, which impede efficient policy convergence. To address these challenges, we propose a hierarchical reinforcement learning framework. A high-level policy, informed by proprioceptive data and ball position, adaptively switches between pre-trained low-level skills such as ball dribbling and rough terrain navigation. We further propose Dynamic Skill-Focused Policy Optimization (DSF-PO) to suppress gradients from inactive skills and enhance critical skill learning. Both simulation and real-world experiments validate that our method outperforms baseline approaches in dynamic ball manipulation across rugged terrains, highlighting its effectiveness in challenging environments.

Methodology

We propose a hierarchical RL framework that trains a high-level policy to coordinate dribbling and locomotion skills for dynamic ball manipulation on rugged terrains. We also introduce a dynamic skill-focused loss formulation to improve learning efficiency and convergence in mixed discrete-continuous action spaces.
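The idea of suppressing gradients from inactive skills in a mixed discrete-continuous action space can be illustrated with a masked log-probability. The sketch below is not the loss used in this work; it is a minimal illustration that assumes a categorical head over skills, one scalar Gaussian command head per skill, and the standard PPO clipped surrogate. All function and parameter names are hypothetical.

```python
import math


def gaussian_log_prob(x, mean, std):
    """Log-density of a scalar Gaussian N(mean, std^2) evaluated at x."""
    var = std * std
    return -0.5 * ((x - mean) ** 2 / var + math.log(2.0 * math.pi * var))


def masked_hybrid_log_prob(skill_probs, skill_idx, means, stds, commands):
    """Log-probability of a hybrid action, counting only the active skill.

    skill_probs: categorical probabilities over skills (discrete head).
    skill_idx:   index of the skill that was actually executed.
    means, stds: per-skill Gaussian parameters (continuous heads, assumed).
    commands:    per-skill sampled continuous commands.
    """
    logp = math.log(skill_probs[skill_idx])
    for k, (m, s, c) in enumerate(zip(means, stds, commands)):
        # The mask zeroes the term of every inactive head, so those heads
        # would receive no gradient from this sample.
        mask = 1.0 if k == skill_idx else 0.0
        logp += mask * gaussian_log_prob(c, m, s)
    return logp


def clipped_surrogate(logp_new, logp_old, advantage, clip_eps=0.2):
    """Standard PPO clipped objective for one sample (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped * advantage)
```

Because the inactive heads are multiplied by zero, changing their means or standard deviations leaves the log-probability of the sample unchanged. In an automatic differentiation framework, the same mask would keep those parameters out of the policy gradient for that sample.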

Proposed Hierarchical Framework
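
The control flow of such a hierarchy can be sketched as follows: at each step the high-level policy reads proprioception and ball position, picks one skill, and only that skill produces joint targets. This is a simplified illustration under stated assumptions, not the trained system. The two skill functions, the distance rule in `toy_high_level`, the scalar command, and the 12-joint output are hypothetical stand-ins for the learned networks.

```python
from typing import Callable, List, Sequence, Tuple

Skill = Callable[[Sequence[float], float], List[float]]
HighLevel = Callable[[Sequence[float]], Tuple[int, float]]

NUM_JOINTS = 12  # assumption: one target per actuated joint of a quadruped


def dribble_skill(proprio: Sequence[float], command: float) -> List[float]:
    """Hypothetical stand-in for a frozen, pre-trained dribbling policy."""
    return [0.1 * command] * NUM_JOINTS


def locomotion_skill(proprio: Sequence[float], command: float) -> List[float]:
    """Hypothetical stand-in for a frozen rough-terrain locomotion policy."""
    return [-0.1 * command] * NUM_JOINTS


def toy_high_level(obs: Sequence[float]) -> Tuple[int, float]:
    """Placeholder for the learned high-level policy.

    Invented rule for illustration only: dribble (skill 0) when the ball is
    within 1.0 unit of the robot, otherwise use locomotion (skill 1).
    The last two observation entries are assumed to be the ball x, y position.
    """
    ball_x, ball_y = obs[-2], obs[-1]
    near = (ball_x ** 2 + ball_y ** 2) ** 0.5 < 1.0
    return (0 if near else 1), 0.5


def hierarchical_step(
    high_level: HighLevel,
    skills: Sequence[Skill],
    proprio: Sequence[float],
    ball_pos: Sequence[float],
) -> Tuple[int, List[float]]:
    """Run one control step: choose a skill, then execute only that skill."""
    obs = list(proprio) + list(ball_pos)
    skill_idx, command = high_level(obs)
    joint_targets = skills[skill_idx](proprio, command)
    return skill_idx, joint_targets
```

A call such as `hierarchical_step(toy_high_level, [dribble_skill, locomotion_skill], proprio, ball_pos)` returns the chosen skill index together with that skill's joint targets; the unselected skill is never evaluated.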

Simulation Performance

We evaluate the robot's dribbling ability across five terrains: stair descent, ramp-down, rough terrain, ramp-up, and flat ground. With a fixed command, the robot successfully completes the task in 121 seconds. It maintains a stable direction except on rough terrain, where external disturbances cause deviations.

Cross-terrain dribbling performance evaluation.

Physical Deployment

We use a Unitree Go2 quadruped robot equipped with a head-mounted, downward-facing fisheye camera with a 240° field of view. All policy inference runs onboard on an NVIDIA Jetson Orin NX. We evaluate our policy on four terrains: ramp-up, ramp-down, stair descent, and a cross-terrain scenario. The ramp-up, ramp-down, and stair descent terrains are constructed indoors. The cross-terrain scenario consists of irregular ground surfaces, raised curbs, smooth stone pavement, and soft gravel terrain.

Real-world deployment experiments on different terrains.

Simulation Performance

Training curves of PPO with DSF-PO compared to standard PPO.

Comparison of ball dribbling success rates across different terrains.

Usage frequency of low-level skills across different terrains.
