Running Distributed Quality-Diversity Algorithms on HPC

Bryon Tjanaka, ICAROS Lab | 16 February 2021

Article


Slides


Overview

  1. Distributed QD Algorithms
  2. Hardware
  3. Distributing Computation with Dask
  4. Additional Considerations
  5. Further Resources
  6. Demo

Distributed QD Algorithms

Regular QD


archive = init_archive()

for n in range(generations):
    solutions = create_new_solutions(archive)
    evaluate(solutions)
    insert_into_archive(solutions)
            

Expensive!


Distributed QD


archive = init_archive()

for n in range(generations):
    solutions = create_new_solutions(archive)
    distribute_evaluations(solutions)   # send evaluations out to workers
    retrieve_evaluation_results()       # collect results once the workers finish
    insert_into_archive(solutions)
            

Hardware

Single-Machine


ssh USCNETID@discovery.usc.edu
            

1 node

20+ cores

100+ GB RAM


sbatch job.slurm
            

job.slurm

Slurm config

(Bash) commands
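
A minimal job.slurm might look like the following sketch; the resource requests follow the figures above, and the experiment script name is a placeholder:


#!/bin/bash
# Slurm config: resources requested for the job.
#SBATCH --job-name=qd-experiment
#SBATCH --nodes=1
#SBATCH --cpus-per-task=20
#SBATCH --mem=100GB
#SBATCH --time=24:00:00

# (Bash) commands: what actually runs on the allocated node.
python experiment.py
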

Distributing Computation with Dask

  • Python Library
  • Workers + Scheduler
  • TCP Connections
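
A minimal sketch of the model: Client() starts a local scheduler and workers and talks to them over TCP, while map and gather send work out and collect the results.


from dask.distributed import Client

client = Client()  # local scheduler + workers, connected over TCP

futures = client.map(lambda x: x ** 2, range(10))  # send tasks to the workers
print(client.gather(futures))                      # collect the results
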

Running an Experiment on Dask

  1. Start scheduler job
  2. Start worker jobs
  3. Connect workers to scheduler
  4. Connect experiment to scheduler
  5. Repeatedly request evaluations
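
Steps 1-3 can be wired up with the dask-scheduler and dask-worker commands that ship with Dask; the shared scheduler file and job script names below are assumptions, not a fixed convention. Steps 4-5 appear in the Python sketch under "Integrating Dask into an Experiment" below.


# scheduler.slurm (step 1): the scheduler writes its TCP address to a shared file.
dask-scheduler --scheduler-file scheduler.json

# worker.slurm (steps 2-3): each worker reads that file and connects to the scheduler.
dask-worker --scheduler-file scheduler.json
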

Additional Considerations

Integrating Dask into an Experiment
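
Concretely, the distribute/retrieve steps in the earlier loop can become a map and gather on a Dask client. A sketch, reusing the stand-in functions from the pseudocode above:


from dask.distributed import Client

client = Client(scheduler_file="scheduler.json")  # step 4: connect to the scheduler

archive = init_archive()
for n in range(generations):                   # step 5: request evaluations each generation
    solutions = create_new_solutions(archive)
    futures = client.map(evaluate, solutions)  # distribute_evaluations
    results = client.gather(futures)           # retrieve_evaluation_results
    insert_into_archive(solutions, results)
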

Environments / Containers
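
Every worker job needs the same software environment as the experiment script. For example (environment and image names are placeholders):


# In worker.slurm, before launching the worker:
conda activate qd-env
dask-worker --scheduler-file scheduler.json

# Or run everything inside a container image:
# singularity exec qd.sif dask-worker --scheduler-file scheduler.json
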

Configuration

Cluster

Algorithm
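
Keeping cluster and algorithm settings in one config makes a run reproducible from a single file; all names and values below are hypothetical:


config = {
    # Cluster: where and how the evaluations run.
    "cluster": {
        "scheduler_file": "scheduler.json",
        "num_workers": 20,
    },
    # Algorithm: the QD algorithm itself.
    "algorithm": {
        "generations": 1000,
        "batch_size": 36,
        "archive_dims": [100, 100],
    },
}
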

Running Robust Experiments


Logging and Monitoring


import logging

def evaluation():
    logging.info("...")  # each worker logs its progress
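
A sketch of one setup: each worker process writes to its own log file so runs can be monitored while they are in progress (the naming scheme is an assumption).


import logging
import os

os.makedirs("logs", exist_ok=True)
logging.basicConfig(
    filename=f"logs/worker_{os.getpid()}.log",  # one log file per worker process
    level=logging.INFO,
)
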

Reloading


save(reload_data, "tmp.pkl")      # write the checkpoint to a temporary file first
rename("tmp.pkl", "reload.pkl")   # atomic rename: reload.pkl is never half-written
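
A runnable version of this pattern, assuming the reload data (e.g., the archive and the current generation) can be pickled:


import os
import pickle

def save_checkpoint(reload_data):
    # Write to a temporary file first, then rename. The rename is atomic on the
    # same filesystem, so reload.pkl is never left half-written if the job dies.
    with open("tmp.pkl", "wb") as f:
        pickle.dump(reload_data, f)
    os.replace("tmp.pkl", "reload.pkl")

def load_checkpoint():
    with open("reload.pkl", "rb") as f:
        return pickle.load(f)
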
              

Further Resources

Lunar Lander Example (pyribs)

DQD-RL

Getting Started with HPC

Demo

View Notebook | Download Notebook