Bryon Tjanaka
Oral Qualification Exam, 14 April 2022
Committee
Website: https://dqd-rl.github.io
Outline:
Quality Diversity (QD)
Differentiable Quality Diversity (DQD)
Quality Diversity for Reinforcement Learning (QD-RL)
Approximating Gradients for DQD in RL
Experiments
Results
Quality Diversity (QD)
Example solution: performance 2,300; front foot contact 40%; back foot contact 50%.
For every output $x$ of the measure function $\bm{m}$, find a solution $\bm{\phi}$ such that $\bm{m}(\bm{\phi}) = x$ and $f(\bm{\phi})$ is maximized.
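To make this concrete, here is a minimal MAP-Elites-style sketch in NumPy (the objective and 1-D measure are hypothetical toys, not a domain from this talk): the archive keeps the best solution found for each discretized measure value, and new candidates are mutated from existing elites.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(phi):       # toy objective (hypothetical): negated sphere
    return -np.sum(phi ** 2)

def m(phi):       # toy 1-D measure (hypothetical): first coordinate
    return phi[0]

n_cells, lo, hi = 20, -2.0, 2.0
archive = {}      # measure cell index -> (objective, solution)

for _ in range(5000):
    if archive:   # stepping stones: mutate a random existing elite
        _, parent = archive[rng.choice(list(archive))]
        phi = parent + 0.1 * rng.standard_normal(parent.size)
    else:
        phi = rng.uniform(lo, hi, size=10)
    cell = min(max(int((m(phi) - lo) / (hi - lo) * n_cells), 0), n_cells - 1)
    if cell not in archive or f(phi) > archive[cell][0]:
        archive[cell] = (f(phi), phi)   # keep the best solution per cell
```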
Key Insight: Stepping stones
A. Cully et al. 2015, "Robots that can adapt like animals." Nature 2015.
J.-B. Mouret and J. Clune 2015, "Illuminating search spaces by mapping elites." https://arxiv.org/abs/1504.04909
Differentiable Quality Diversity (DQD)
Key Insight: Search by following objective and measure gradients.
Fontaine and Nikolaidis 2021, "Differentiable Quality Diversity." NeurIPS 2021 Oral.
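A rough sketch of that insight, assuming exact gradients are available (toy objective and measures; the full CMA-MEGA additionally adapts the coefficient distribution with CMA-ES and ranks candidates by archive improvement):

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_f(phi):   # exact gradient of a toy objective f = -||phi||^2
    return -2.0 * phi

def grad_m(phi):   # exact gradients of toy measures m1 = phi[0], m2 = phi[1]
    g = np.zeros((2, phi.size))
    g[0, 0] = 1.0
    g[1, 1] = 1.0
    return g

phi = rng.standard_normal(10)                    # current search point
grads = np.vstack([grad_f(phi), grad_m(phi)])    # (1 + k) x n gradient stack

# Branch candidates along random combinations c0*grad_f + sum_j cj*grad_mj.
# CMA-MEGA adapts the distribution of c with CMA-ES; plain Gaussian here.
sigma = 0.5
coeffs = sigma * rng.standard_normal((36, grads.shape[0]))  # one c per candidate
candidates = phi + coeffs @ grads                            # 36 branched solutions
```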
Quality Diversity for Reinforcement Learning (QD-RL)
Policy: $$\pi_{\bm{\phi}}(a | s)$$
Expected discounted return: $$f(\bm{\phi}) = \mathbb{E}_{\xi\sim p_{\bm{\phi}}}\left[\sum_{t=0}^T\gamma^t r(s_t,a_t) \right]$$
O. Nilsson and A. Cully 2021. "Policy Gradient Assisted MAP-Elites." GECCO 2021.
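In practice, $f(\bm{\phi})$ is estimated by Monte Carlo: roll out the policy and sum discounted rewards. A minimal sketch, assuming a Gym-style environment (`env` and `policy` are hypothetical placeholders):

```python
import numpy as np

def discounted_return(env, policy, gamma=0.99, horizon=1000):
    """Single-rollout Monte Carlo estimate of f(phi) for policy pi_phi."""
    s, total, discount = env.reset(), 0.0, 1.0
    for _ in range(horizon):
        a = policy(s)                 # a ~ pi_phi(a | s)
        s, r, done, _ = env.step(a)   # Gym-style transition
        total += discount * r
        discount *= gamma
        if done:
            break
    return total

# f(phi) is then approximated by averaging several rollouts:
# f_hat = np.mean([discounted_return(env, policy) for _ in range(5)])
```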
Inspired by PGA-MAP-Elites!
Approximating Gradients for DQD in RL
Hypothesis: Since CMA-MEGA performs well in DQD domains, it will outperform existing QD-RL algorithms (e.g., PGA-MAP-Elites and MAP-Elites).
| DQD | QD-RL |
|---|---|
| Exact Gradients | |
| CMA-MEGA | CMA-MEGA? |
Problem: Environments are non-differentiable!
Solution: Approximate $\bm{\nabla} f$ and $\bm{\nabla m}$.
| DQD | QD-RL |
|---|---|
| Exact Gradients | Approximate Gradients |
| CMA-MEGA | CMA-MEGA with TD3 and ES |
TD3 approximates $\bm{\nabla} f$, the gradient of the expected discounted return; ES treats the objective and measures as black boxes.
S. Fujimoto et al. 2018, "Addressing Function Approximation Error in Actor-Critic Methods." ICML 2018.
Salimans et al. 2017, "Evolution Strategies as a Scalable Alternative to Reinforcement Learning." https://arxiv.org/abs/1703.03864
| | CMA-MEGA (ES) | CMA-MEGA (TD3, ES) |
|---|---|---|
| $\bm{\nabla} f$ | ES | TD3 |
| $\bm{\nabla} \bm{m}$ | ES | ES |
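A hedged sketch of the ES column above, implementing the estimator of Salimans et al. (the formula is written out in the backup slides): perturb the parameters with Gaussian noise, evaluate, and average, with the same samples estimating $\bm{\nabla} f$ and every $\bm{\nabla} m_j$. Here `evaluate` is a hypothetical placeholder returning a rollout's objective and measures; in CMA-MEGA (TD3, ES), a TD3 critic would supply $\bm{\nabla} f$ instead.

```python
import numpy as np

def es_gradients(phi, evaluate, sigma=0.02, lam=100, rng=None):
    """Estimate grad f and grad m with the OpenAI-ES estimator.

    evaluate(phi) -> (f, m), where f is a scalar and m is a k-vector
    (hypothetical interface for one rollout of the policy).
    """
    rng = rng or np.random.default_rng()
    eps = rng.standard_normal((lam, phi.size))   # Gaussian perturbations
    f_vals, m_vals = [], []
    for e in eps:
        f_i, m_i = evaluate(phi + sigma * e)
        f_vals.append(f_i)
        m_vals.append(m_i)
    f_vals = np.asarray(f_vals)                  # (lam,)
    m_vals = np.asarray(m_vals)                  # (lam, k)
    grad_f = (f_vals @ eps) / (lam * sigma)      # sum of f_i * eps_i
    grad_m = (m_vals.T @ eps) / (lam * sigma)    # (k, n): one row per m_j
    return grad_f, grad_m
```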
Experiments
QD Ant
QD Half-Cheetah
QD Hopper
QD Walker
Results
[Archive heatmaps of the best and behaviorally diverse policies per domain. QD Ant: best, 2 legs, 3 legs. QD Half-Cheetah: best, back foot, front foot. QD Hopper: best, high contact, low contact. QD Walker: best, favoring one foot.]
| | CMA-MEGA (ES) | CMA-MEGA (TD3, ES) |
|---|---|---|
| PGA-MAP-Elites | Comparable on 2/4 | Comparable on 4/4 |
| MAP-Elites | Outperforms on 4/4 | Outperforms on 4/4 |
| ME-ES | Outperforms on 3/4 | Outperforms on 4/4 |
Inspired by PGA-MAP-Elites!
CMA-MEGA (DQD benchmark domain): easy objective, difficult measures.
CMA-MEGA (ES) (QD Half-Cheetah): difficult objective, easy measures.
| | PGA-MAP-Elites | CMA-MEGA (ES), CMA-MEGA (TD3, ES) |
|---|---|---|
| Objective Gradient Steps | 5,000,000 | 5,000 |
pyribs: a bare-bones Python library for quality diversity optimization.
B. Tjanaka et al. 2021, "pyribs: A bare-bones Python library for quality diversity optimization." https://github.com/icaros-usc/pyribs.
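For reference, a minimal ask/tell loop with pyribs (sketched against the 0.4-era API with `GridArchive`, `ImprovementEmitter`, and `Optimizer`; later releases renamed some of these classes):

```python
import numpy as np
from ribs.archives import GridArchive
from ribs.emitters import ImprovementEmitter
from ribs.optimizers import Optimizer

# 2-D measure space discretized into 20x20 cells over [-1, 1]^2.
archive = GridArchive([20, 20], [(-1, 1), (-1, 1)])
emitter = ImprovementEmitter(archive, x0=np.zeros(10), sigma0=0.1)
optimizer = Optimizer(archive, [emitter])

for _ in range(100):
    solutions = optimizer.ask()                # batch of candidate solutions
    objs = -np.sum(solutions ** 2, axis=1)     # toy objective (hypothetical)
    meas = solutions[:, :2]                    # toy measures (hypothetical)
    optimizer.tell(objs, meas)                 # insert results into the archive
```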
M. C. Fontaine*, Y.-C. Hsu*, Y. Zhang*, B. Tjanaka, S. Nikolaidis. "On the Importance of Environments in Human-Robot Collaboration." RSS 2021.
Learning collaborative strategies.
QD-RL for real-world robots.
Enhance and publish pyribs.
$$sphere(\bm{\phi}) = \sum_{i=1}^n (\bm{\phi}_i-2.048)^2$$

$$clip(\bm{\phi}_i) = \begin{cases} \bm{\phi}_i & \text{if } -5.12 \le \bm{\phi}_i \le 5.12 \\ 5.12/\bm{\phi}_i & \text{otherwise} \end{cases}$$

$$\bm{m}(\bm{\phi}) = \left(\sum_{i=1}^{\lfloor\frac{n}{2}\rfloor} clip(\bm{\phi}_i), \sum_{i=\lfloor\frac{n}{2}\rfloor+1}^n clip(\bm{\phi}_i) \right)$$
Fontaine et al. 2020, "Covariance Matrix Adaptation for the Rapid Illumination of Behavior Space." GECCO 2020.
Fontaine and Nikolaidis 2021, "Differentiable Quality Diversity." NeurIPS 2021 Oral.
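A direct NumPy transcription of the definitions above (a sketch; the original benchmark also rescales the sphere value so that the QD objective is maximized, which is omitted here):

```python
import numpy as np

def sphere(phi):
    """Sphere objective with optimum at 2.048 in each coordinate."""
    return np.sum((phi - 2.048) ** 2)

def clip(phi):
    """Elementwise clip from the benchmark definition above."""
    out = phi.copy()
    mask = np.abs(phi) > 5.12   # outside [-5.12, 5.12] -> 5.12 / phi_i
    out[mask] = 5.12 / phi[mask]
    return out

def measures(phi):
    """m(phi): sums of the clipped first and second halves of phi."""
    half = len(phi) // 2
    c = clip(phi)
    return np.array([np.sum(c[:half]), np.sum(c[half:])])
```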
$$\bm{\nabla} f(\bm{\phi}) \approx \frac{1}{\lambda_{es}\sigma} \sum_{i=1}^{\lambda_{es}} f(\bm{\phi} + \sigma \bm{\epsilon}_i) \bm{\epsilon}_i$$
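As a quick sanity check of this estimator (a toy demo; the $\lambda_{es}$ and $\sigma$ values are arbitrary): for the shifted sphere above, the analytic gradient is $2(\bm{\phi} - 2.048)$, and subtracting $f(\bm{\phi})$ as a baseline leaves the estimate unbiased while sharply reducing variance.

```python
import numpy as np

rng = np.random.default_rng(0)

def sphere(phi):
    return np.sum((phi - 2.048) ** 2)

phi = rng.standard_normal(10)
lam, sigma = 10000, 0.05
eps = rng.standard_normal((lam, phi.size))
# Baseline f(phi) subtracted for variance reduction; E[eps] = 0 keeps it unbiased.
f_vals = np.array([sphere(phi + sigma * e) - sphere(phi) for e in eps])
grad_est = (f_vals @ eps) / (lam * sigma)    # ES estimate of grad f
grad_true = 2 * (phi - 2.048)                # analytic gradient
print(np.max(np.abs(grad_est - grad_true)))  # small relative to the gradient scale
```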
Vassiliades et al. 2018. "Using Centroidal Voronoi Tessellations to Scale Up the Multidimensional Archive of Phenotypic Elites Algorithm." IEEE Transactions on Evolutionary Computation 2018.
Fontaine et al. 2019. "Mapping Hearthstone Deck Spaces through MAP-Elites with Sliding Boundaries." GECCO 2019.