RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning

Department of EECS, University of California, Berkeley
Method Overview
RLDG is a framework for distilling specialist RL policies into a generalist robot policy. Generalists trained this way achieve higher performance than conventional fine-tuning on human demonstrations, and generalize better than the RL policies they are distilled from. It works in three steps:
  1. Train specialist policies on narrowly scoped tasks using online reinforcement learning. This could mean training a separate policy for each connector type in the insertion task, or training only on the "bottleneck" portion of a long-horizon task while leaving the rest to human demonstrations.
  2. Generate a dataset of expert trajectories by rolling out the specialist policies. The dataset can contain episodes with multiple variants of the task generated by different RL policies. It may also include expert human demonstrations for the "easy" portion of a long-horizon task.
  3. Use the high-quality dataset to fine-tune any generalist robot policy and see improved performance!
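The three steps above can be sketched as a simple data pipeline. This is a minimal illustrative sketch, not the authors' actual code: every function, task name, and interface here is hypothetical, and the RL training and generalist fine-tuning are stubbed out.

```python
# Hypothetical sketch of the RLDG pipeline: train specialists with RL,
# roll them out to build a distillation dataset, then fine-tune a generalist.
import random


def train_rl_specialist(task):
    """Stand-in for online RL training of one narrow specialist policy
    (in practice this would be e.g. a sample-efficient RL algorithm)."""
    return lambda obs: (task, obs)  # maps an observation to an action


def rollout(policy, n_episodes=10, horizon=5):
    """Roll out a specialist to collect expert trajectories."""
    episodes = []
    for _ in range(n_episodes):
        traj = []
        for _ in range(horizon):
            obs = random.random()        # placeholder observation
            traj.append((obs, policy(obs)))
        episodes.append(traj)
    return episodes


# 1. Train one specialist per narrow task variant (task names are made up).
tasks = ["vga_insertion", "usb_insertion"]
specialists = {task: train_rl_specialist(task) for task in tasks}

# 2. Build the distillation dataset from specialist rollouts; expert human
#    demonstrations for the "easy" segments could be appended here as well.
dataset = []
for task, policy in specialists.items():
    dataset.extend(rollout(policy))

# 3. The dataset would then be used to fine-tune any generalist policy
#    (e.g. OpenVLA or Octo) with standard supervised fine-tuning.
print(len(dataset))  # 2 tasks x 10 episodes = 20
```

The key design point is that the generalist never interacts with the RL training loop; it only consumes the high-quality trajectories, so any generalist architecture can be dropped into step 3 unchanged.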

Result Videos

FMB Multi-Stage Assembly

[Videos: OpenVLA + RLDG, OpenVLA, Octo + RLDG, and Octo performing the multi-stage assembly task]

Connector Insertion

[Videos: OpenVLA + RLDG, OpenVLA, Octo + RLDG, and Octo inserting the VGA connector (seen) and the Type-C, DisplayPort, and XLR connectors (unseen)]

Pick and Place

[Videos: OpenVLA + RLDG, OpenVLA, Octo + RLDG, and Octo in the seen configuration and with an unseen object and background]