Approximating Gradients for Differentiable Quality Diversity in Reinforcement Learning
ALOE 2022 Workshop, 29 April 2022
Bryon Tjanaka
University of Southern California
tjanaka@usc.edu
Matthew C. Fontaine
University of Southern California
mfontain@usc.edu
Julian Togelius
New York University
julian@togelius.com
Stefanos Nikolaidis
University of Southern California
nikolaid@usc.edu
Website: https://dqd-rl.github.io
"Normal"
Front
Back
Outline:
Quality Diversity (QD)
Differentiable Quality Diversity (DQD)
Quality Diversity for Reinforcement Learning (QD-RL)
Approximating Gradients for DQD in RL
Experiments
Results
Quality Diversity (QD)
Example solution: Performance 2,300; Front 40%; Back 50%.
The QD objective: for every output x of the measure function m, find a solution φ such that m(φ) = x and f(φ) is maximized.
Differentiable Quality Diversity (DQD)
Key Insight: Search by following objective and measure gradients.
M. C. Fontaine and S. Nikolaidis, "Differentiable Quality Diversity." NeurIPS 2021 (Oral).
Quality Diversity for Reinforcement Learning (QD-RL)
Inspired by PGA-MAP-Elites!
O. Nilsson and A. Cully, "Policy Gradient Assisted MAP-Elites." GECCO 2021.
Approximating Gradients for DQD in RL
Hypothesis: Since CMA-MEGA performs well in DQD domains, it will outperform existing QD-RL algorithms (e.g., PGA-MAP-Elites and MAP-Elites).
Problem: Environments are non-differentiable!
Solution: Approximate ∇f and ∇m.
In RL, the objective f is the expected discounted return, and the measures m are black-box functions of the policy.

| | DQD | QD-RL |
|---|---|---|
| Gradients | Exact | Approximate |
| Algorithm | CMA-MEGA | CMA-MEGA with approximated ∇f and ∇m |

∇f is approximated with TD3 (S. Fujimoto et al., "Addressing Function Approximation Error in Actor-Critic Methods." ICML 2018) or ES (T. Salimans et al., "Evolution Strategies as a Scalable Alternative to Reinforcement Learning." https://arxiv.org/abs/1703.03864). ∇m is approximated with ES, since the measures are black box.
| | CMA-MEGA (ES) | CMA-MEGA (TD3, ES) |
|---|---|---|
| ∇f | ES | TD3 |
| ∇m | ES | ES |
Experiments
QD Ant
QD Half-Cheetah
QD Hopper
QD Walker
Results
| | CMA-MEGA (ES) | CMA-MEGA (TD3, ES) |
|---|---|---|
| PGA-MAP-Elites | Comparable on 2/4 | Comparable on 4/4 |
| MAP-Elites | Outperforms on 4/4 | Outperforms on 4/4 |
| ME-ES | Outperforms on 3/4 | Outperforms on 4/4 |
Recall that our approach was inspired by PGA-MAP-Elites.
CMA-MEGA (DQD benchmark domain): easy objective, difficult measures.
CMA-MEGA (ES) (QD Half-Cheetah): difficult objective, easy measures.
| | PGA-MAP-Elites | CMA-MEGA (ES) and CMA-MEGA (TD3, ES) |
|---|---|---|
| Objective Gradient Steps | 5,000,000 | 5,000 |
ALOE 2022 Workshop, 29 April 2022
Bryon Tjanaka, Matthew C. Fontaine,
Julian Togelius, Stefanos Nikolaidis