ALOE 2022 Workshop, 29 April 2022
Bryon Tjanaka
University of Southern California
tjanaka@usc.edu
Matthew C. Fontaine
University of Southern California
mfontain@usc.edu
Julian Togelius
New York University
julian@togelius.com
Stefanos Nikolaidis
University of Southern California
nikolaid@usc.edu
Website: https://dqd-rl.github.io
"Normal"
Front
Back
Outline:
1. Quality Diversity (QD)
2. Differentiable Quality Diversity (DQD)
3. Quality Diversity for Reinforcement Learning (QD-RL)
4. Approximating Gradients for DQD in RL
5. Experiments
6. Results
Quality Diversity (QD)
Example policy: performance 2,300, with measures Front = 40% and Back = 50% (the proportion of time each leg touches the ground).
For every output $x$ of the measure function $\bm{m}$, find $\bm{\phi}$ such that $\bm{m}(\bm{\phi}) = x$, and $f(\bm{\phi})$ is maximized.
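To make this problem concrete, here is a minimal MAP-Elites-style sketch in Python; the objective, measures, and grid bounds are illustrative toys, not the talk's RL setup:

```python
import numpy as np

# Minimal MAP-Elites-style sketch of the QD problem: for each discretized
# output of the measure function m, keep the best-performing solution found.
# f, m, and the measure bounds below are toy stand-ins (hypothetical).

def f(phi):                      # objective to maximize
    return -float(np.sum(phi ** 2))

def m(phi):                      # measure function
    return phi[:2]

rng = np.random.default_rng(0)
dim, cells = 10, 20              # solution dimension; grid cells per measure
archive = {}                     # cell (i, j) -> (objective value, solution)

def cell(measures):
    # Discretize measures, assumed to lie in [-1, 1], onto a cells x cells grid.
    idx = np.clip(((measures + 1) / 2 * cells).astype(int), 0, cells - 1)
    return tuple(idx)

for _ in range(10_000):
    if archive:                  # mutate a randomly chosen elite
        keys = list(archive)
        _, parent = archive[keys[rng.integers(len(keys))]]
        phi = parent + 0.1 * rng.standard_normal(dim)
    else:                        # bootstrap with a random solution
        phi = rng.uniform(-1, 1, dim)
    key = cell(m(phi))
    if key not in archive or f(phi) > archive[key][0]:
        archive[key] = (f(phi), phi)

print(f"{len(archive)} of {cells * cells} cells filled")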
Differentiable Quality Diversity (DQD)
Key Insight: Search by following objective and measure gradients.
Fontaine and Nikolaidis 2021, "Differentiable Quality Diversity." NeurIPS 2021 Oral.
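A sketch of the branching step at the heart of CMA-MEGA, under simplifying assumptions: `f` and `m` are hypothetical differentiable toys with analytic gradients, and a fixed Gaussian over gradient coefficients stands in for the CMA-ES distribution that the real algorithm adapts:

```python
import numpy as np

# One CMA-MEGA-style branching step (simplified sketch). The real algorithm
# samples coefficients from an adapted CMA-ES distribution, ranks the
# branched solutions by archive improvement, and updates both the
# distribution and the search point; here we only show the branching.

rng = np.random.default_rng(0)
dim, n_measures, batch = 10, 2, 8

def f(phi):                       # toy objective (hypothetical)
    return -float(np.sum(phi ** 2))

def m(phi):                       # toy measures (hypothetical)
    return phi[:n_measures]

def grad_f(phi):                  # exact objective gradient
    return -2 * phi

def grad_m(phi):                  # exact measure Jacobian, (n_measures, dim)
    return np.eye(n_measures, dim)

phi = rng.uniform(-1, 1, dim)                    # current search point
grads = np.vstack([grad_f(phi), grad_m(phi)])    # (1 + k, dim)

# Branch: phi' = phi + c_0 * grad f + sum_j c_j * grad m_j, with random c.
coeffs = rng.normal(0.0, 0.1, size=(batch, 1 + n_measures))
branched = phi + coeffs @ grads                  # (batch, dim) candidates

for sol in branched:
    print(f"f = {f(sol):.3f}, m = {m(sol)}")
```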
Quality Diversity for Reinforcement Learning (QD-RL)
Inspired by PGA-MAP-Elites! (O. Nilsson and A. Cully 2021, "Policy Gradient Assisted MAP-Elites." GECCO 2021.)
Approximating Gradients for DQD in RL
Hypothesis: Since CMA-MEGA performs well in DQD domains, it will outperform existing QD-RL algorithms (e.g., PGA-MAP-Elites and MAP-Elites).
Problem: RL environments are non-differentiable!
Solution: Approximate $\bm{\nabla} f$ and $\bm{\nabla} \bm{m}$.
| | DQD | QD-RL |
|---|---|---|
| Gradients | Exact | Approximate |
| Algorithm | CMA-MEGA | CMA-MEGA with approximate gradients |
The objective $f$ is the expected discounted return, so its gradient $\bm{\nabla} f$ can be approximated with the actor-critic method TD3 (S. Fujimoto et al. 2018, "Addressing Function Approximation Error in Actor-Critic Methods." ICML 2018). The measures $\bm{m}$ are black box, so their gradients $\bm{\nabla} \bm{m}$ are approximated with evolution strategies (ES) (Salimans et al. 2017, "Evolution Strategies as a Scalable Alternative to Reinforcement Learning." https://arxiv.org/abs/1703.03864).
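For concreteness, the standard definitions behind these approximations (supplied here, not shown on the extracted slide; the estimator follows Salimans et al. 2017):

$f(\bm{\phi}) = \mathbb{E}\left[\sum_{t=0}^{T} \gamma^t r_t\right]$, the expected discounted return of policy parameters $\bm{\phi}$.

$\bm{\nabla} f(\bm{\phi}) \approx \frac{1}{\lambda \sigma} \sum_{i=1}^{\lambda} f(\bm{\phi} + \sigma \bm{\epsilon}_i) \, \bm{\epsilon}_i$ with $\bm{\epsilon}_i \sim \mathcal{N}(\bm{0}, \bm{I})$; the same estimator applies to each measure $m_j$.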
| | CMA-MEGA (ES) | CMA-MEGA (TD3, ES) |
|---|---|---|
| $\bm{\nabla} f$ | ES | TD3 |
| $\bm{\nabla} \bm{m}$ | ES | ES |
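A minimal sketch of the ES gradient estimate that both variants rely on; the quadratic `f` is a toy stand-in, since in QD-RL every evaluation of $f$ or $\bm{m}$ requires a policy rollout:

```python
import numpy as np

# ES gradient estimate (after Salimans et al. 2017) for a function we can
# only evaluate, not differentiate. A toy quadratic stands in for f.

rng = np.random.default_rng(0)

def f(phi):
    return -float(np.sum((phi - 1.0) ** 2))   # toy objective, optimum at ones

def es_gradient(func, phi, sigma=0.1, lam=100):
    eps = rng.standard_normal((lam, phi.size))       # Gaussian perturbations
    values = np.array([func(phi + sigma * e) for e in eps])
    values -= values.mean()                          # baseline reduces variance
    # (Salimans et al. additionally rank-normalize the returns.)
    return eps.T @ values / (lam * sigma)            # estimate of grad f

phi = np.zeros(5)
for _ in range(200):                                 # gradient ascent with ES
    phi = phi + 0.05 * es_gradient(f, phi)
print(phi)                                           # close to all-ones
```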
Experiments
Domains: QD Ant, QD Half-Cheetah, QD Hopper, QD Walker.
Results
| vs. | CMA-MEGA (ES) | CMA-MEGA (TD3, ES) |
|---|---|---|
| PGA-MAP-Elites | Comparable on 2/4 domains | Comparable on 4/4 domains |
| MAP-Elites | Outperforms on 4/4 domains | Outperforms on 4/4 domains |
| ME-ES | Outperforms on 3/4 domains | Outperforms on 4/4 domains |
CMA-MEGA (DQD benchmark domain): easy objective, difficult measures.
CMA-MEGA (ES) (QD Half-Cheetah): difficult objective, easy measures.
| | PGA-MAP-Elites | CMA-MEGA (ES), CMA-MEGA (TD3, ES) |
|---|---|---|
| Objective gradient steps | 5,000,000 | 5,000 |