Differentiable Quality Diversity for
Reinforcement Learning by
Approximating Gradients

ALOE 2022 Workshop, 29 April 2022

Bryon Tjanaka
University of Southern California
tjanaka@usc.edu

Matthew C. Fontaine
University of Southern California
mfontain@usc.edu

Julian Togelius
New York University
julian@togelius.com

Stefanos Nikolaidis
University of Southern California
nikolaid@usc.edu

"Normal"

Front

Back

  1. Quality Diversity (QD)
  2. Differentiable Quality Diversity (DQD)
  3. Quality Diversity for Reinforcement Learning (QD-RL)
  4. Approximating Gradients for DQD in RL
  5. Experiments
  6. Results

1. Quality Diversity (QD)

Example policy: Performance: 2,300; Front: 40%; Back: 50% (one objective value and two measure values).

QD Objective

For every output $x$ of the measure function $\bm{m}$, find a solution $\bm{\phi}$ such that $\bm{m}(\bm{\phi}) = x$ and $f(\bm{\phi})$ is maximized.
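
In practice, QD algorithms such as MAP-Elites relax this objective by discretizing the measure space into an archive of cells and keeping the best solution found in each cell. A minimal sketch of that insertion rule, assuming a dict-based archive and a uniform grid (all names here are illustrative, not from the talk):

```python
import numpy as np

def archive_insert(archive, phi, f, m, lower, upper, cells_per_dim):
    """Insert solution phi into a MAP-Elites-style archive.

    archive: dict mapping a cell index (tuple) to an elite (f, phi).
    f: objective value of phi; m: its measure values, shape (k,).
    lower, upper: per-measure bounds of the archive, shape (k,).
    """
    # Map the measure values to a discrete cell index.
    m = np.clip(m, lower, upper)
    idx = tuple(((m - lower) / (upper - lower) * cells_per_dim)
                .astype(int).clip(0, cells_per_dim - 1))
    # Keep phi only if its cell is empty or it beats the current elite.
    if idx not in archive or f > archive[idx][0]:
        archive[idx] = (f, phi)
    return archive
```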

2. Differentiable Quality Diversity (DQD)

CMA-MEGA

Key Insight: Search by following objective and measure gradients.

Fontaine and Nikolaidis 2021, "Differentiable Quality Diversity." NeurIPS 2021 Oral.

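Concretely, CMA-MEGA maintains a search point $\bm{\theta}$, branches candidate solutions by mixing the objective and measure gradients with sampled coefficients, and (via CMA-ES) adapts the coefficient distribution toward branches that most improve the archive. A minimal sketch of the branching step, with plain Gaussian coefficients standing in for the adapted CMA-ES distribution (all names are illustrative):

```python
import numpy as np

def branch_solutions(theta, grad_f, grad_m, n_branches, rng):
    """Branch candidate solutions around the search point theta.

    grad_f: objective gradient, shape (d,).
    grad_m: measure gradients, shape (k, d).
    The full algorithm samples the coefficients from a CMA-ES
    distribution and adapts it based on archive improvement.
    """
    grads = np.vstack([grad_f, grad_m])        # shape (k + 1, d)
    coeffs = rng.normal(size=(n_branches, grads.shape[0]))
    return theta + coeffs @ grads              # shape (n_branches, d)

rng = np.random.default_rng(0)
candidates = branch_solutions(np.zeros(10), np.ones(10),
                              np.ones((2, 10)), n_branches=36, rng=rng)
```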

3. Quality Diversity for Reinforcement Learning (QD-RL)

Policy Gradient Assisted MAP-Elites (PGA-MAP-Elites)

O. Nilsson and A. Cully 2021. "Policy Gradient Assisted MAP-Elites." GECCO 2021.

Inspired by
PGA-MAP-Elites!

4. Approximating Gradients for DQD in RL

Hypothesis:

Since CMA-MEGA performs well in DQD domains, it will outperform existing QD-RL algorithms (e.g., PGA-MAP-Elites and MAP-Elites).

Problem: Environments are non-differentiable!

Solution: Approximate $\bm{\nabla} f$ and $\bm{\nabla} \bm{m}$.

             DQD         QD-RL
Gradients    Exact       Approximate
Algorithm    CMA-MEGA    CMA-MEGA with gradient approximations

Approximating $\bm{\nabla} f$

The objective $f$ is the expected discounted return, and its gradient can be approximated in two ways:

  • Off-policy actor-critic method (TD3): S. Fujimoto et al. 2018, "Addressing Function Approximation Error in Actor-Critic Methods." ICML 2018.
  • Evolution strategy (OpenAI-ES): Salimans et al. 2017, "Evolution Strategies as a Scalable Alternative to Reinforcement Learning." https://arxiv.org/abs/1703.03864
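
OpenAI-ES needs only return evaluations, no backpropagation through the environment. A minimal sketch of the estimator with antithetic (mirrored) sampling, where evaluate() and the hyperparameters are illustrative stand-ins for rollout-based return estimation:

```python
import numpy as np

def es_gradient(phi, evaluate, sigma=0.02, n_pairs=50, rng=None):
    """Estimate the gradient of evaluate() at phi with OpenAI-ES.

    evaluate: maps a parameter vector to a scalar, e.g. the mean
    return over a few rollouts of the policy it parameterizes.
    """
    rng = rng or np.random.default_rng()
    eps = rng.normal(size=(n_pairs, phi.size))  # perturbation directions
    grad = np.zeros_like(phi)
    for e in eps:
        # The mirrored difference isolates the slope along direction e.
        delta = evaluate(phi + sigma * e) - evaluate(phi - sigma * e)
        grad += delta * e
    return grad / (2 * sigma * n_pairs)

# Sanity check on a toy objective whose true gradient is -2 * phi:
phi = np.array([1.0, -2.0])
print(es_gradient(phi, lambda p: -np.sum(p ** 2)))
```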

Approximating $\bm{\nabla} \bm{m}$

The measures are treated as a black box, so both variants approximate their gradients with OpenAI-ES:

                       CMA-MEGA (ES)    CMA-MEGA (TD3, ES)
$\bm{\nabla} f$        ES               TD3
$\bm{\nabla} \bm{m}$   ES               ES
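
Because each rollout yields the measure values along with the return, the measure gradients can be estimated with the same machinery; a sketch reusing the hypothetical es_gradient above, with measures() an illustrative stand-in for rollout-based measure evaluation:

```python
import numpy as np

def measure_gradients(phi, measures, k, **es_kwargs):
    """Approximate the gradient of each of the k measures with OpenAI-ES."""
    return np.array([
        # i=i pins the loop index inside the lambda.
        es_gradient(phi, lambda p, i=i: measures(p)[i], **es_kwargs)
        for i in range(k)
    ])  # shape (k, d): one gradient per measure
```

In practice, a single batch of perturbed rollouts can be shared across the objective and all measures rather than re-sampling per call.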


5. Experiments

(Environment figures: QD Ant, QD Half-Cheetah, QD Hopper, QD Walker.)

Independent Variables

  • Algorithm:
    CMA-MEGA (ES), CMA-MEGA (TD3, ES),
    PGA-MAP-Elites, MAP-Elites, ME-ES
  • Environment:
    QD Ant, QD Half-Cheetah, QD Hopper, QD Walker

Dependent Variable

  • QD Score
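
QD score, the standard QD metric, sums the objective values of all elites in the archive, so it rewards both quality and coverage; scores are typically offset so that every occupied cell contributes a nonnegative amount. A minimal sketch using the dict-based archive from earlier:

```python
def qd_score(archive, offset=0.0):
    """Sum of offset-shifted objective values over all archive elites.

    offset: subtracted from each objective so contributions are
    nonnegative, e.g. the worst return observed in the environment.
    """
    return sum(f - offset for f, _ in archive.values())
```
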
6. Results

                 CMA-MEGA (ES)       CMA-MEGA (TD3, ES)
PGA-MAP-Elites   Comparable on 2/4   Comparable on 4/4
MAP-Elites       Outperforms on 4/4  Outperforms on 4/4
ME-ES            Outperforms on 3/4  Outperforms on 4/4


CMA-MEGA (DQD benchmark domains): easy objective, difficult measures.
CMA-MEGA (ES) (QD Half-Cheetah): difficult objective, easy measures.

                            PGA-MAP-Elites    CMA-MEGA (ES), CMA-MEGA (TD3, ES)
Objective gradient steps    5,000,000         5,000

Future Directions

Differentiable Quality Diversity for
Reinforcement Learning by
Approximating Gradients

ALOE 2022 Workshop, 29 April 2022

Bryon Tjanaka, Matthew C. Fontaine,
Julian Togelius, Stefanos Nikolaidis
