RLDG is a framework for distilling specialist RL policies into a generalist robot policy. Generalists trained this way outperform the same models fine-tuned conventionally on human demonstrations, and generalize better than the RL specialists they are distilled from. It works in three steps:
- Train specialist policies on narrowly scoped tasks using online reinforcement learning. This could mean training a separate policy for each connector type in an insertion task, or training on just the "bottleneck" portion of a long-horizon task while leaving the rest to human demonstrations (see the training-loop sketch after this list).
- Generate a dataset of expert trajectories by rolling out the specialist policies. The dataset can contain episodes covering multiple task variants, each produced by a different RL specialist, and may also include expert human demonstrations for the "easy" portions of a long-horizon task (see the rollout sketch below).
- Use this high-quality dataset to fine-tune any generalist robot policy and see improved performance! (See the fine-tuning sketch below.)
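
The specialist-training step is algorithm-agnostic: any sample-efficient online RL method will do. Below is a minimal sketch of that loop, assuming a gymnasium-style environment and a hypothetical `SACAgent`/`ReplayBuffer` interface; the task id, module name, and method names are illustrative, not from RLDG itself.

```python
import gymnasium as gym

from my_rl_lib import ReplayBuffer, SACAgent  # hypothetical off-policy agent + buffer

# One narrowly scoped environment per specialist, e.g. a single connector type.
env = gym.make("ConnectorInsertion-USBC-v0")  # hypothetical task id
agent = SACAgent(env.observation_space, env.action_space)
buffer = ReplayBuffer(capacity=100_000)

obs, _ = env.reset()
for step in range(50_000):
    action = agent.select_action(obs)
    next_obs, reward, terminated, truncated, _ = env.step(action)
    buffer.add(obs, action, reward, next_obs, terminated)
    agent.update(buffer)  # one or more gradient steps on replayed transitions
    obs = next_obs
    if terminated or truncated:
        obs, _ = env.reset()  # start a fresh episode
```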
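
Once the specialists converge, the distillation data comes from plain rollouts. A sketch of that step, again with assumed names: roll out each specialist deterministically, keep only successful episodes, and pool the result with any human demonstrations covering the easy segments. The `deterministic` flag, the `info["success"]` signal, and the `specialists`/`human_demos` variables are all assumptions about your setup.

```python
def collect_rollouts(env, policy, num_episodes: int):
    """Roll out a trained specialist and keep only successful episodes."""
    episodes = []
    for _ in range(num_episodes):
        obs, _ = env.reset()
        traj, done, success = [], False, False
        while not done:
            action = policy.select_action(obs, deterministic=True)
            next_obs, reward, terminated, truncated, info = env.step(action)
            traj.append((obs, action))
            success = success or info.get("success", False)  # env-specific flag
            done = terminated or truncated
            obs = next_obs
        if success:
            episodes.append(traj)  # discard failed attempts
    return episodes

# Pool rollouts from every specialist, plus human demos for the easy segments.
dataset = []
for env, policy in specialists:  # e.g. one (env, policy) pair per connector type
    dataset += collect_rollouts(env, policy, num_episodes=200)
dataset += human_demos
```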
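
The final step is ordinary supervised fine-tuning (behavior cloning) on the pooled dataset. Real generalists such as vision-language-action models ship their own fine-tuning recipes; the PyTorch sketch below just shows the shape of the step, treating the generalist as a plain `nn.Module` that maps observations to continuous actions.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader


def finetune(generalist: torch.nn.Module, dataset, epochs: int = 10, lr: float = 1e-4):
    """Behavior-clone the generalist on (obs, action) pairs from the specialists."""
    opt = torch.optim.AdamW(generalist.parameters(), lr=lr)
    pairs = [pair for episode in dataset for pair in episode]  # flatten episodes
    loader = DataLoader(pairs, batch_size=256, shuffle=True)
    for _ in range(epochs):
        for obs, action in loader:
            obs, action = obs.float(), action.float()
            loss = F.mse_loss(generalist(obs), action)  # continuous-action BC loss
            opt.zero_grad()
            loss.backward()
            opt.step()
    return generalist
```

Note that filtering rollouts for success in the previous step is what makes a plain behavior-cloning loss sufficient here: by construction, the dataset contains only expert-quality behavior.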