Approximating Gradients for
Differentiable Quality Diversity in
Reinforcement Learning

GECCO 2022 — 11 July 2022

Bryon Tjanaka
University of Southern California
tjanaka@usc.edu

Matthew C. Fontaine
University of Southern California
mfontain@usc.edu

Julian Togelius
New York University
julian@togelius.com

Stefanos Nikolaidis
University of Southern California
nikolaid@usc.edu

"Normal"

Front

Back

  1. Quality Diversity (QD)
  2. Differentiable Quality Diversity (DQD)
  3. Quality Diversity for Reinforcement Learning (QD-RL)
  4. Approximating Gradients for DQD in RL
  5. Experiments
  6. Results

1. Quality Diversity (QD)

[Slide visuals: example measures, Front: 40%, Back: 50%]

$$\text{QD score} = \sum_{i=1}^M f(\bm{\phi}_i)$$
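For intuition, here is this metric as a few lines of Python. The dict-based archive is a hypothetical stand-in for the actual archive data structure (in practice the objective is often offset so every elite contributes a non-negative amount):

```python
def qd_score(archive):
    """QD score: sum of the objective value f of every elite in the archive.

    `archive` maps an archive cell index to the objective value of the
    elite occupying that cell. Empty cells contribute nothing, so the
    score rewards both high-quality solutions and broad coverage.
    """
    return sum(archive.values())

# Example: two occupied cells with f = 12.0 and f = 7.5.
assert qd_score({(0, 1): 12.0, (3, 2): 7.5}) == 19.5
```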

2. Differentiable Quality Diversity (DQD)

CMA-MEGA

Key Insight: Search by following objective and measure gradients.

Fontaine and Nikolaidis 2021, "Differentiable Quality Diversity." NeurIPS 2021 Oral.

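To make the key insight concrete, here is a minimal numpy sketch of CMA-MEGA's branching step, assuming maximization. The full algorithm (Fontaine and Nikolaidis 2021) adapts the coefficient distribution with CMA-ES and steps the search point along the improvement-ranked mean direction; this sketch elides that machinery, and `n_branch` and `sigma` are illustrative parameters:

```python
import numpy as np

def cma_mega_branch(phi, grad_f, grad_m, n_branch=36, sigma=0.05, rng=None):
    """Branch candidates from search point phi via random linear
    combinations of the objective gradient grad_f (shape (n,)) and
    the measure gradients grad_m (shape (k, n)).

    CMA-MEGA proper samples the coefficients from a Gaussian that
    CMA-ES adapts using archive improvement; this sketch uses a
    fixed isotropic Gaussian instead.
    """
    rng = np.random.default_rng() if rng is None else rng
    grads = np.vstack([grad_f, grad_m])                          # (k + 1, n)
    grads = grads / (np.linalg.norm(grads, axis=1, keepdims=True) + 1e-12)
    coeffs = sigma * rng.standard_normal((n_branch, len(grads)))
    coeffs[:, 0] = np.abs(coeffs[:, 0])  # objective coefficient stays >= 0
    return phi + coeffs @ grads          # (n_branch, n) candidate solutions
```

Each candidate is then evaluated and ranked by how much it improves the archive; that ranking drives both the CMA-ES update and the next step of the search point.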

3. Quality Diversity for Reinforcement Learning (QD-RL)

Policy Gradient Assisted MAP-Elites
(PGA-MAP-Elites)

Nilsson and Cully 2021, "Policy Gradient Assisted MAP-Elites." GECCO 2021.

Inspired by PGA-MAP-Elites!
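For context, one generation of PGA-MAP-Elites can be sketched as follows. This is a simplified, hypothetical rendering (the callables, batch split, and variation constants are illustrative): half the offspring are elites improved by TD3 policy-gradient steps, and half come from Iso+LineDD genetic variation:

```python
import numpy as np

def pga_me_generation(elites, td3_step, evaluate, insert, batch=64, rng=None):
    """One simplified PGA-MAP-Elites generation.

    elites:   array of elite solutions, shape (num_elites, n).
    td3_step: applies TD3 policy-gradient update(s) to one solution.
    evaluate: returns (objective, measures) for one solution.
    insert:   attempts to place a solution into the archive.
    """
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(batch // 2):          # gradient-based variation
        parent = elites[rng.integers(len(elites))]
        child = td3_step(parent)
        insert(child, *evaluate(child))
    for _ in range(batch // 2):          # Iso+LineDD genetic variation
        x, y = elites[rng.integers(len(elites), size=2)]
        child = (x + 0.005 * rng.standard_normal(x.shape)
                 + 0.05 * rng.standard_normal() * (y - x))
        insert(child, *evaluate(child))
```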

4. Approximating Gradients for DQD in RL

Problem: Exact gradients are unavailable!

Solution: Approximate $\bm{\nabla} f$ and $\bm{\nabla m}$.

             DQD         QD-RL
Gradients    Exact       Approximate
Algorithm    CMA-MEGA    CMA-MEGA with gradient approximations

Approximating $\bm{\nabla} f$


The objective $f$ is the expected discounted return of the policy $\pi_{\bm{\phi}}$.

Off-Policy Actor-Critic Method (TD3)

Fujimoto et al. 2018, "Addressing Function Approximation Error in Actor-Critic Methods." ICML 2018.
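With a learned critic $Q$ and deterministic policy $\pi_{\bm{\phi}}$, the objective gradient can be approximated by backpropagating the critic through the actor. This is the standard deterministic-policy-gradient estimator, stated here for intuition, with states $s$ drawn from TD3's replay buffer:

$$\bm{\nabla} f(\bm{\phi}) \approx \mathbb{E}_{s}\left[ \bm{\nabla}_{\bm{\phi}}\, Q\big(s, \pi_{\bm{\phi}}(s)\big) \right]$$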

Evolution Strategy (OpenAI-ES)

Salimans et al. 2017, "Evolution Strategies as a Scalable Alternative to Reinforcement Learning." https://arxiv.org/abs/1703.03864

Approximating $\bm{\nabla} \bm{m}$


Measures are treated as black boxes, so $\bm{\nabla} \bm{m}$ is always approximated with ES.

                       CMA-MEGA (ES)   CMA-MEGA (TD3, ES)
$\bm{\nabla} f$        ES              TD3
$\bm{\nabla} \bm{m}$   ES              ES

CMA-MEGA (ES) & CMA-MEGA (TD3, ES)

CMA-MEGA (ES) & CMA-MEGA (TD3, ES)

CMA-MEGA (ES) & CMA-MEGA (TD3, ES)

CMA-MEGA (ES) & CMA-MEGA (TD3, ES)

CMA-MEGA (ES) & CMA-MEGA (TD3, ES)

CMA-MEGA (ES) & CMA-MEGA (TD3, ES)

5. Experiments

[Slide visuals: QD Ant, QD Half-Cheetah, QD Hopper, QD Walker environments]

Independent Variables

  • Algorithm:
    CMA-MEGA (ES), CMA-MEGA (TD3, ES),
    PGA-MAP-Elites, MAP-Elites, ME-ES
  • Environment:
    QD Ant, QD Half-Cheetah, QD Hopper, QD Walker

Dependent Variable

  • QD Score

6. Results

                     CMA-MEGA (ES)        CMA-MEGA (TD3, ES)
vs. PGA-MAP-Elites   Comparable on 2/4    Comparable on 4/4
vs. MAP-Elites       Outperforms on 4/4   Outperforms on 4/4
vs. ME-ES            Outperforms on 3/4   Outperforms on 4/4


OG-MAP-Elites (DQD benchmark domain): easy objective, difficult measures

PGA-MAP-Elites (QD Walker): difficult objective, easy measures

                           PGA-MAP-Elites   CMA-MEGA (ES), CMA-MEGA (TD3, ES)
Objective gradient steps   5,000,000        5,000

The CMA-MEGA variants reach these results with three orders of magnitude fewer objective gradient steps.

Approximating Gradients for
Differentiable Quality Diversity in
Reinforcement Learning

GECCO 2022 — 11 July 2022

Bryon Tjanaka, Matthew C. Fontaine,
Julian Togelius, Stefanos Nikolaidis


Supplemental

DQD Benchmark

$$sphere(\bm{\phi}) = \sum_{i=1}^n (\bm{\phi}_i - 2.048)^2$$

$$clip(\bm{\phi}_i) = \begin{cases} \bm{\phi}_i & \text{if } -5.12 \le \bm{\phi}_i \le 5.12 \\ 5.12/\bm{\phi}_i & \text{otherwise} \end{cases}$$

$$\bm{m}(\bm{\phi}) = \left( \sum_{i=1}^{\lfloor n/2 \rfloor} clip(\bm{\phi}_i), \sum_{i=\lfloor n/2 \rfloor + 1}^{n} clip(\bm{\phi}_i) \right)$$

Fontaine et al. 2020, "Covariance Matrix Adaptation for the Rapid Illumination of Behavior Space." GECCO 2020.
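A direct numpy transcription of the three definitions above, as a reference sketch (any normalization or sign conventions from the benchmark papers are omitted):

```python
import numpy as np

def sphere(phi):
    """Sphere objective; optimum where every component equals 2.048."""
    return np.sum((phi - 2.048) ** 2)

def clip(phi):
    """Elementwise: identity on [-5.12, 5.12], otherwise 5.12 / phi_i."""
    return np.where(np.abs(phi) <= 5.12, phi, 5.12 / phi)

def measures(phi):
    """Sum the clipped components of the first and second halves of phi."""
    c = clip(phi)
    half = len(phi) // 2
    return np.array([np.sum(c[:half]), np.sum(c[half:])])
```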

DQD StyleGAN+CLIP

Fontaine and Nikolaidis 2021, "Differentiable Quality Diversity." NeurIPS 2021 Oral.

OpenAI-ES

$$\bm{\nabla} f(\bm{\phi}) \approx \frac{1}{\lambda_{es}\sigma} \sum_{i=1}^{\lambda_{es}} f(\bm{\phi} + \sigma \bm{\epsilon}_i) \bm{\epsilon}_i$$
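In code, the estimator averages return-weighted perturbation directions. A minimal numpy sketch (OpenAI-ES as published adds refinements such as mirrored sampling and rank normalization):

```python
import numpy as np

def es_gradient(f, phi, lam=100, sigma=0.02, rng=None):
    """OpenAI-ES estimate of the gradient of f at phi.

    Samples lam perturbations epsilon_i ~ N(0, I), evaluates f at the
    perturbed points, and averages the f-weighted directions.
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal((lam, len(phi)))
    fs = np.array([f(phi + sigma * e) for e in eps])
    return (fs @ eps) / (lam * sigma)

# Example: prints an estimate close to the exact gradient 2 * phi.
phi = np.arange(3.0)
print(es_gradient(lambda x: np.sum(x ** 2), phi, lam=50_000, sigma=0.1))
```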

TD3 Overview
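The piece that CMA-MEGA (TD3, ES) takes from TD3 is the trained critic: it turns estimating $\bm{\nabla} f$ into a backpropagation through the actor. A hedged PyTorch sketch with toy networks (sizes and names are illustrative, not the paper's exact setup):

```python
import torch
import torch.nn as nn

state_dim, action_dim = 8, 2  # illustrative dimensions

# Toy stand-ins for the TD3 actor (policy) and first critic.
actor = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(),
                      nn.Linear(64, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.Tanh(),
                       nn.Linear(64, 1))

# Stand-in for a batch of states sampled from the replay buffer.
states = torch.randn(256, state_dim)

# Deterministic policy gradient: ascend Q(s, pi(s)) w.r.t. actor parameters.
q_values = critic(torch.cat([states, actor(states)], dim=1))
q_values.mean().backward()

# Flatten per-parameter gradients into one vector, approximating grad f(phi).
grad_f = torch.cat([p.grad.flatten() for p in actor.parameters()])
```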

CVT Archive

Vassiliades et al. 2018, "Using Centroidal Voronoi Tessellations to Scale Up the Multidimensional Archive of Phenotypic Elites Algorithm." IEEE Transactions on Evolutionary Computation 2018.

Sliding Boundaries Archive

Fontaine et al. 2019, "Mapping Hearthstone Deck Spaces through MAP-Elites with Sliding Boundaries." GECCO 2019.
