GECCO 2022 — 11 July 2022
Bryon Tjanaka
University of Southern California
tjanaka@usc.edu
Matthew C. Fontaine
University of Southern California
mfontain@usc.edu
Julian Togelius
New York University
julian@togelius.com
Stefanos Nikolaidis
University of Southern California
nikolaid@usc.edu
Website
https://dqd-rl.github.io
"Normal"
Front
Back
Quality Diversity (QD)
Differentiable Quality Diversity (DQD)
Quality Diversity for Reinforcement Learning (QD-RL)
Approximating Gradients for DQD in RL
Experiments
Results
Quality Diversity (QD)
Front: 40%
Back: 50%
$$\text{QD score} = \sum_{i=1}^M f(\bm{\phi}_i)$$
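The QD score above can be computed directly from the objective values stored in an archive. The following is a minimal sketch, assuming a hypothetical archive stored as a NumPy array in which empty cells hold NaN; the function name `qd_score` and the array layout are illustrative, not taken from the talk.

```python
import numpy as np


def qd_score(archive_objectives):
    """Sum the objective values f(phi_i) over all occupied archive cells.

    `archive_objectives` is a hypothetical array with one entry per archive
    cell; cells that hold no solution yet are marked with NaN (assumption).
    """
    objectives = np.asarray(archive_objectives, dtype=float)
    # nansum skips the empty (NaN) cells, so only the M stored solutions count.
    return float(np.nansum(objectives))


# A 2x3 archive in which 4 of the 6 cells are occupied (M = 4).
archive = np.array([[1.0, np.nan, 2.5],
                    [np.nan, 0.5, 3.0]])
score = qd_score(archive)
```

Filling a new cell or improving an existing solution both raise the score, which is why it summarizes quality and diversity in one number.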
Differentiable Quality Diversity (DQD)
Key Insight: Search by following objective and measure gradients.
Fontaine and Nikolaidis 2021, "Differentiable Quality Diversity." NeurIPS 2021 Oral.
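One way to picture "following objective and measure gradients" is a step along a weighted sum of those gradients. The sketch below assumes the coefficients are drawn from a fixed standard normal distribution; in CMA-MEGA the coefficient distribution is adapted during search, which is not shown here. The function name `branch` and the toy gradients are hypothetical.

```python
import numpy as np


def branch(phi, grad_f, jac_m, coeffs):
    """Move phi along a weighted sum of objective and measure gradients.

    phi:    (n,) current solution
    grad_f: (n,) objective gradient
    jac_m:  (k, n) measure gradients, one per row
    coeffs: (k + 1,) weights; coeffs[0] scales grad_f, the rest scale jac_m
    """
    grads = np.vstack([grad_f, jac_m])  # shape (k + 1, n)
    return phi + coeffs @ grads


rng = np.random.default_rng(0)
phi = np.zeros(4)
# Toy gradients chosen for illustration only.
grad_f = np.array([1.0, 0.0, 0.0, 0.0])
jac_m = np.array([[0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0]])

# Assumption: coefficients sampled from a fixed standard normal.
coeffs = rng.normal(size=(5, 3))
candidates = np.array([branch(phi, grad_f, jac_m, c) for c in coeffs])
```

Each candidate trades off improving the objective against moving in measure space, so different coefficient samples land in different parts of the archive.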
Quality Diversity for Reinforcement Learning (QD-RL)
O. Nilsson and A. Cully 2021. "Policy Gradient Assisted MAP-Elites." GECCO 2021.
Inspired by PGA-MAP-Elites!
Approximating Gradients for DQD in RL
Problem: Exact gradients are unavailable!
Solution: Approximate $\bm{\nabla} f$ and $\bm{\nabla} \bm{m}$.
DQD | QD-RL |
---|---|
Exact Gradients | Approximate Gradients |
CMA-MEGA | CMA-MEGA with |
$f$: Expected discounted return
S. Fujimoto et al. 2018, "Addressing Function Approximation Error in Actor-Critic Methods." ICML 2018.
Salimans et al. 2017, "Evolution Strategies as a Scalable Alternative to Reinforcement Learning." https://arxiv.org/abs/1703.03864
$\bm{m}$: Black box
Gradient | CMA-MEGA (ES) | CMA-MEGA (TD3, ES) |
---|---|---|
$\bm{\nabla} f$ | ES | TD3 |
$\bm{\nabla} \bm{m}$ | ES | ES |
Experiments
QD Ant
QD Half-Cheetah
QD Hopper
QD Walker
Results
Baseline | CMA-MEGA (ES) | CMA-MEGA (TD3, ES) |
---|---|---|
PGA-MAP-Elites | Comparable on 2/4 | Comparable on 4/4 |
MAP-Elites | Outperforms on 4/4 | Outperforms on 4/4 |
ME-ES | Outperforms on 3/4 | Outperforms on 4/4 |
Inspired by PGA-MAP-Elites!
OG-MAP-Elites (DQD Benchmark Domain): Easy objective, difficult measures
PGA-MAP-Elites (QD Walker): Difficult objective, easy measures
Algorithm | PGA-MAP-Elites | CMA-MEGA (ES), CMA-MEGA (TD3, ES) |
---|---|---|
Objective Gradient Steps | 5,000,000 | 5,000 |
$$\text{sphere}(\bm{\phi}) = \sum_{i=1}^n (\bm{\phi}_i-2.048)^2$$
$$\text{clip}(\bm{\phi}_i) = \begin{cases} \bm{\phi}_i & \text{if } -5.12 \le \bm{\phi}_i \le 5.12 \\ 5.12/\bm{\phi}_i & \text{otherwise} \end{cases}$$
$$\bm{m}(\bm{\phi}) = \left(\sum_{i=1}^{\lfloor\frac{n}{2}\rfloor} \text{clip}(\bm{\phi}_i), \sum_{i=\lfloor\frac{n}{2}\rfloor+1}^n \text{clip}(\bm{\phi}_i) \right)$$
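The sphere objective and the clipped-sum measures can be written in a few lines of NumPy. This is a minimal sketch that follows the formulas as stated; the function names and the example vector are illustrative, and any normalization or sign convention the benchmark applies to the objective is not shown.

```python
import numpy as np

BOUND = 5.12   # clip threshold from the formula above
SHIFT = 2.048  # offset of the sphere optimum


def sphere(phi):
    """Sum of squared distances from the shifted optimum."""
    phi = np.asarray(phi, dtype=float)
    return float(np.sum((phi - SHIFT) ** 2))


def clip(phi):
    """Keep in-range components; map out-of-range ones to 5.12 / phi_i."""
    phi = np.asarray(phi, dtype=float)
    out = phi.copy()
    outside = np.abs(phi) > BOUND
    # Only out-of-range entries are divided, so phi_i == 0 is never a divisor.
    out[outside] = BOUND / phi[outside]
    return out


def measures(phi):
    """Sum the clipped components over each half of the solution vector."""
    clipped = clip(phi)
    half = len(clipped) // 2
    return np.array([clipped[:half].sum(), clipped[half:].sum()])


phi = np.array([1.0, 2.0, 10.24, -10.24])
m = measures(phi)                  # clip gives [1, 2, 0.5, -0.5]
f_at_optimum = sphere(np.full(4, SHIFT))
```

Because out-of-range components shrink toward zero, pushing a measure to its extremes requires keeping many components near the bound at once, which is what makes the measures hard to cover.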
Fontaine et al. 2020, "Covariance Matrix Adaptation for the Rapid Illumination of Behavior Space." GECCO 2020.
Fontaine and Nikolaidis 2021, "Differentiable Quality Diversity." NeurIPS 2021 Oral.
$$\bm{\nabla} f(\bm{\phi}) \approx \frac{1}{\lambda_{es}\sigma} \sum_{i=1}^{\lambda_{es}} f(\bm{\phi} + \sigma \bm{\epsilon}_i) \bm{\epsilon}_i$$
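The estimator above can be sketched as follows, assuming Gaussian perturbations $\bm{\epsilon}_i \sim \mathcal{N}(0, I)$ and a black-box function `f` that returns a scalar. The name `es_gradient` and its default parameters are hypothetical, and refinements commonly paired with ES in practice (such as mirrored sampling or rank normalization) are left out.

```python
import numpy as np


def es_gradient(f, phi, sigma=0.1, num_samples=100, rng=None):
    """Estimate the gradient of f at phi from perturbed evaluations.

    Implements (1 / (lambda_es * sigma)) * sum_i f(phi + sigma * eps_i) * eps_i
    with eps_i drawn from a standard normal distribution (assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    phi = np.asarray(phi, dtype=float)
    eps = rng.standard_normal((num_samples, phi.size))
    # Evaluate the black box once per perturbed solution.
    values = np.array([f(phi + sigma * e) for e in eps])
    return (values @ eps) / (num_samples * sigma)


# Sanity check on a linear function, whose true gradient is the vector a.
a = np.array([1.0, -2.0, 3.0])
grad = es_gradient(lambda x: float(a @ x), np.zeros(3),
                   sigma=0.1, num_samples=20000,
                   rng=np.random.default_rng(0))
```

The estimate is noisy and only needs function evaluations, which is why it applies to black-box measures where no analytic gradient exists.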
Vassiliades et al. 2018. "Using Centroidal Voronoi Tessellations to Scale Up the Multidimensional Archive of Phenotypic Elites Algorithm." IEEE Transactions on Evolutionary Computation 2018.
Fontaine et al. 2019. "Mapping Hearthstone Deck Spaces through MAP-Elites with Sliding Boundaries." GECCO 2019.