Configuring tasks & curricula

Intro

In the academia package, there are two ways to initialize tasks and curricula. The first is to call the LearningTask and Curriculum constructors directly. The second uses configuration files in the YAML format together with the load_task_config() or load_curriculum_config() functions.

Here is an example of initializing a simple curriculum directly in a script:

from academia.curriculum import LearningTask, Curriculum
from academia.environments import LavaCrossing

# define tasks
task1 = LearningTask(
    env_type=LavaCrossing,
    env_args={'difficulty': 0, 'render_mode': 'human', 'append_step_count': True},
    stop_conditions={'max_episodes': 500},
)
task2 = LearningTask(
    env_type=LavaCrossing,
    env_args={'difficulty': 1, 'render_mode': 'human', 'append_step_count': True},
    stop_conditions={'max_episodes': 1000},
)

# define a curriculum
curriculum = Curriculum(
    tasks=[task1, task2],
    output_dir='./my_curriculum/',
)

This code creates a curriculum which comprises the first two levels of the Lava Crossing environment. An identical curriculum can be defined with the following configuration file:

my_config.curriculum.yml
output_dir: './my_curriculum/'
order:
- 0
- 1
tasks:
  0:
    env_args:
      difficulty: 0
      render_mode: human
      append_step_count: True
    env_type: academia.environments.LavaCrossing
    evaluation_interval: 100
    stop_conditions:
      max_episodes: 500
  1:
    env_args:
      difficulty: 1
      render_mode: human
      append_step_count: True
    env_type: academia.environments.LavaCrossing
    evaluation_interval: 100
    stop_conditions:
      max_episodes: 1000

This can then be loaded with a single line of code:

from academia.curriculum import load_curriculum_config

curriculum = load_curriculum_config('my_config.curriculum.yml')

Neither method is strictly better than the other; it is up to the user to choose the one they prefer. Initializing through code gives more flexibility and can be easier for users not familiar with academia’s API. Configuration files, on the other hand, let you move most of the configuration logic out of the source code. They can also make large configurations more concise and readable, which may make them the better option for complex experiments.

To learn about the specific parameters for environments, tasks and curricula, explore the rest of the documentation for academia’s functions and classes. The rest of this guide focuses on YAML configuration files. More specifically, it covers the special features that make this method flexible and let users avoid duplication in their configurations.

Note

While the configuration has to be in the YAML format, academia does not enforce any particular file extension. However, it is often good practice to distinguish task and curriculum configuration files by using extensions such as .task.yml or .curriculum.yml.
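One practical benefit of such a naming convention is that scripts can tell the two kinds of file apart from the name alone. The snippet below is a minimal sketch of that idea; the helper config_kind is hypothetical and not part of academia, and it assumes only the two suffixes suggested above.

```python
from pathlib import Path


def config_kind(path: str) -> str:
    """Classify a configuration file by the suffix convention suggested above.

    Hypothetical helper for illustration; academia itself does not
    inspect file extensions.
    """
    name = Path(path).name
    if name.endswith('.task.yml'):
        return 'task'
    if name.endswith('.curriculum.yml'):
        return 'curriculum'
    return 'unknown'


for example in ('configs/easy.task.yml', 'my_config.curriculum.yml', 'settings.yml'):
    print(example, '->', config_kind(example))
```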

Default task parameters inside a curriculum

Tasks inside a single curriculum often share similar sets of parameter values. For example, all of them could use the same environment, but with different difficulty levels. A curriculum configuration file lets you define a set of default parameters for the tasks inside that curriculum.

In the example configuration above, both tasks share most of their configuration, which leads to a lot of duplication. In the listing below, the only values that differ between the two tasks are difficulty and max_episodes:

output_dir: './my_curriculum/'
order:
- 0
- 1
tasks:
  0:
    env_args:
      difficulty: 0  # unique to this task
      render_mode: human
      append_step_count: True
    env_type: academia.environments.LavaCrossing
    evaluation_interval: 100
    stop_conditions:
      max_episodes: 500  # unique to this task
  1:
    env_args:
      difficulty: 1  # unique to this task
      render_mode: human
      append_step_count: True
    env_type: academia.environments.LavaCrossing
    evaluation_interval: 100
    stop_conditions:
      max_episodes: 1000  # unique to this task

To address this issue, a special _default task can be defined for the curriculum. It provides default parameter values for all tasks defined or loaded in this curriculum (more on loading later). The configuration listed above can be simplified as follows:

output_dir: './my_curriculum/'
order:
- 0
- 1
tasks:
  _default:
    env_args:
      render_mode: human
      append_step_count: True
    env_type: academia.environments.LavaCrossing
    evaluation_interval: 100
  0:
    env_args:
      difficulty: 0
    stop_conditions:
      max_episodes: 500
  1:
    env_args:
      difficulty: 1
    stop_conditions:
      max_episodes: 1000

Now, all common configuration has been moved to the _default task, and the tasks define only their unique arguments. Note that the _default task can also be used in the curriculum, just as any other task. All we need to do is to supply all required parameters to it. Consider the following configuration, which again is equivalent to the ones listed before:

output_dir: './my_curriculum/'
order:
- easier
- _default
tasks:
  _default:
    env_args:
      difficulty: 1
      render_mode: human
      append_step_count: True
    env_type: academia.environments.LavaCrossing
    evaluation_interval: 100
    stop_conditions:
      max_episodes: 1000
  easier:
    env_args:
      difficulty: 0
    stop_conditions:
      max_episodes: 500

In curriculum learning, the final environment is treated as the most important one, and all other tasks are only there to speed up the training. It therefore makes sense to mark the target environment as _default in the configuration, and to define only the unique pieces of configuration for the easier tasks. This is exactly what we do in the above example. Notice that both the _default and easier tasks define the environment difficulty, as well as a max_episodes stop condition. Each task can override the default configuration, and this is what happens here. For instance, the easier task now ends after 500 episodes. Had we not specified this stop condition for it, it would end after 1000 episodes, as declared in the _default task.
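The override behaviour described above can be pictured as a recursive merge of two mappings. The following is only a sketch under an assumption: that nested mappings such as env_args are merged key by key, which is what the equivalence of the listings above suggests. The function merge_defaults is hypothetical and is not academia's actual implementation.

```python
from copy import deepcopy


def merge_defaults(defaults: dict, task: dict) -> dict:
    """Return a copy of `defaults` with `task` layered on top.

    Assumption: nested mappings (such as env_args) are merged key by key,
    while any other value in `task` replaces the default.
    """
    merged = deepcopy(defaults)
    for key, value in task.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_defaults(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# The _default and easier tasks from the configuration above, as plain dicts.
default_task = {
    'env_type': 'academia.environments.LavaCrossing',
    'env_args': {'difficulty': 1, 'render_mode': 'human', 'append_step_count': True},
    'evaluation_interval': 100,
    'stop_conditions': {'max_episodes': 1000},
}
easier = {
    'env_args': {'difficulty': 0},
    'stop_conditions': {'max_episodes': 500},
}

resolved = merge_defaults(default_task, easier)
print(resolved['env_args'])         # difficulty comes from the easier task
print(resolved['stop_conditions'])  # max_episodes comes from the easier task
```

Under this assumption the easier task keeps render_mode, append_step_count, env_type and evaluation_interval from _default, and replaces only difficulty and max_episodes.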

Loading configurations from external files

It is not uncommon for multiple curricula to share common tasks. Suppose we want to design two curricula for the Door Key environment, with difficulty level 2 as the target difficulty. In the first curriculum, we want an agent to go through all the difficulty levels from level 0 up to level 2. In the other curriculum, we want it to skip level 1 and go straight from level 0 to level 2. Below are example configurations for both scenarios:

full.curriculum.yml
order:
  - 0
  - 1
  - 2
tasks:
  _default:
    env_type: academia.environments.DoorKey
    stop_conditions:
      min_evaluation_score: 0.9
    evaluation_interval: 100
    evaluation_count: 25
    include_init_eval: True
  0:
    name: 'Easy task'
    env_args:
      difficulty: 0
  1:
    name: 'Intermediate task'
    env_args:
      difficulty: 1
  2:
    name: 'Hard task'
    env_args:
      difficulty: 2
    stop_conditions:
      max_episodes: 1000

task-skip.curriculum.yml
order:
  - 0
  - 2
tasks:
  _default:
    env_type: academia.environments.DoorKey
    stop_conditions:
      min_evaluation_score: 0.9
    evaluation_interval: 100
    evaluation_count: 25
    include_init_eval: True
  0:
    name: 'Easy task'
    env_args:
      difficulty: 0
  2:
    name: 'Hard task'
    env_args:
      difficulty: 2
    stop_conditions:
      max_episodes: 1000

We use the _default task to avoid configuration duplication within each of the files. Still, the configurations for the tasks named “Easy task” and “Hard task” are identical in both files. It would be convenient to extract them into separate files and load them in both of the configurations above. We can do this with the special attribute named _load, which tells the configuration loaders to load YAML attributes from another file. This way, we can split the above configurations into multiple files to create an equivalent configuration:

easy.task.yml
name: 'Easy task'
env_args:
  difficulty: 0

intermediate.task.yml
name: 'Intermediate task'
env_args:
  difficulty: 1

hard.task.yml
name: 'Hard task'
env_args:
  difficulty: 2
stop_conditions:
  max_episodes: 1000

full.curriculum.yml
order:
  - 0
  - 1
  - 2
tasks:
  _default:
    env_type: academia.environments.DoorKey
    stop_conditions:
      min_evaluation_score: 0.9
    evaluation_interval: 100
    evaluation_count: 25
    include_init_eval: True
  0:
    _load: ./easy.task.yml
  1:
    _load: ./intermediate.task.yml
  2:
    _load: ./hard.task.yml

task-skip.curriculum.yml
order:
  - 0
  - 2
tasks:
  _default:
    env_type: academia.environments.DoorKey
    stop_conditions:
      min_evaluation_score: 0.9
    evaluation_interval: 100
    evaluation_count: 25
    include_init_eval: True
  0:
    _load: ./easy.task.yml
  2:
    _load: ./hard.task.yml

Note that the path provided for the _load attribute must be relative to the current configuration file.

The _load special attribute is not limited to loading tasks. It is designed to load attributes from any YAML file, which makes it very versatile. For example, in the above configurations, since the _default task is also shared across both curricula, we could extract its parameters into a separate file. For the full curriculum it could look as follows (the task-skip curriculum is analogous):

task-defaults.yml
env_type: academia.environments.DoorKey
stop_conditions:
  min_evaluation_score: 0.9
evaluation_interval: 100
evaluation_count: 25
include_init_eval: True

full.curriculum.yml
order:
  - 0
  - 1
  - 2
tasks:
  _default:
    _load: ./task-defaults.yml
  0:
    _load: ./easy.task.yml
  1:
    _load: ./intermediate.task.yml
  2:
    _load: ./hard.task.yml

The _load special attribute can also be chained: if a loaded file itself contains _load, that attribute is handled as well. Also, just like with the _default task, attributes loaded with _load can be overridden if you specify them alongside the _load attribute:

full.curriculum.yml
order:
  - 0
  - 1
  - 2
tasks:
  _default:
    _load: ./task-defaults.yml
    # this will override the evaluation_count of 25 from ./task-defaults.yml:
    evaluation_count: 10
  0:
    _load: ./easy.task.yml
  1:
    _load: ./intermediate.task.yml
  2:
    _load: ./hard.task.yml

This is just one way to restructure these configurations, and there may well be better ones. Remember that the _load special attribute can be used in both task and curriculum configurations.
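To make the three properties of _load concrete (paths relative to the current file, chaining, and local overrides), here is a minimal sketch that works on in-memory mappings instead of real files. Everything in it is hypothetical: the FILES dictionary stands in for already parsed YAML files, the directory layout is invented for the example, and resolve_loads is not academia's loader. It also assumes that a key written next to _load replaces the loaded key wholesale.

```python
import posixpath

# Stand-in for parsed YAML files on disk (hypothetical layout): path -> mapping.
FILES = {
    'configs/full.curriculum.yml': {
        'order': [0],
        'tasks': {
            '_default': {'_load': './task-defaults.yml', 'evaluation_count': 10},
            0: {'_load': './easy.task.yml'},
        },
    },
    # This file chains to another one through its own _load attribute.
    'configs/task-defaults.yml': {
        '_load': './common/eval.yml',
        'env_type': 'academia.environments.DoorKey',
    },
    'configs/common/eval.yml': {'evaluation_interval': 100, 'evaluation_count': 25},
    'configs/easy.task.yml': {'name': 'Easy task', 'env_args': {'difficulty': 0}},
}


def resolve_loads(node, current_file):
    """Recursively replace every _load attribute with the loaded attributes."""
    if isinstance(node, list):
        return [resolve_loads(item, current_file) for item in node]
    if not isinstance(node, dict):
        return node
    resolved = {}
    if '_load' in node:
        # The path is interpreted relative to the file that contains _load.
        target = posixpath.normpath(
            posixpath.join(posixpath.dirname(current_file), node['_load']))
        # The loaded file may contain _load too, so resolve it recursively.
        resolved.update(resolve_loads(FILES[target], target))
    for key, value in node.items():
        if key != '_load':
            # Keys written next to _load override the loaded ones.
            resolved[key] = resolve_loads(value, current_file)
    return resolved


root = 'configs/full.curriculum.yml'
config = resolve_loads(FILES[root], root)
print(config['tasks']['_default'])
print(config['tasks'][0])
```

In this sketch the _default task ends up with evaluation_interval taken from the chained file, env_type from task-defaults.yml, and evaluation_count set to the local value of 10 rather than the loaded 25.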

Variables in configuration files

So far, all the configuration files we looked at had every parameter value hardcoded. There are cases, however, where we want to supply some of the parameters dynamically. For example, suppose we want to run a task 10 times so that we can average the results of our experiment across independent runs. Consider the following configuration file and script:

doorkey.task.yml
env_type: academia.environments.DoorKey
env_args:
  difficulty: 0
  append_step_count: True
  random_state: 123
stop_conditions:
  min_evaluation_score: 0.9
evaluation_interval: 100
evaluation_count: 25
include_init_eval: True

run.py
from academia.curriculum import load_task_config

stats = []

for run_no in range(10):
    agent = ...  # initialise some agent here
    task = load_task_config('./doorkey.task.yml')
    task.run(agent)
    stats.append(task.stats)

Note that we pass a random state to the Door Key environment to ensure reproducibility of our experiments. With a hardcoded value, however, every run uses the same seed, and it would be better to pass a different random seed to the environment for each individual run. We can achieve this using variables inside our configuration. Variables are marked by a dollar sign $ in the configuration files and can be used as follows:

doorkey.task.yml
env_type: academia.environments.DoorKey
env_args:
  difficulty: 0
  append_step_count: True
  random_state: $env_random_state
stop_conditions:
  min_evaluation_score: 0.9
evaluation_interval: 100
evaluation_count: 25
include_init_eval: True

run.py
from academia.curriculum import load_task_config

stats = []

for run_no in range(10):
    agent = ...  # initialise some agent here
    # use the run number as the random seed for this run
    task = load_task_config('./doorkey.task.yml', variables={
        'env_random_state': run_no,
    })
    task.run(agent)
    stats.append(task.stats)

The same syntax applies to the load_curriculum_config() function. Variables can also be used in external files loaded via the _load attribute; the same variables dictionary is used to resolve variables in any loaded files.
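Conceptually, resolving variables amounts to walking the parsed configuration and replacing each $name value with the matching entry from the variables dictionary. The sketch below illustrates this on a plain dict. The function substitute_variables is hypothetical, and it assumes that a variable always stands for a whole value rather than part of a longer string; academia's own resolution logic may differ.

```python
def substitute_variables(node, variables):
    """Return a copy of `node` with every '$name' string replaced.

    Assumption: a variable occupies a whole value, e.g. `random_state: $seed`.
    """
    if isinstance(node, dict):
        return {key: substitute_variables(value, variables)
                for key, value in node.items()}
    if isinstance(node, list):
        return [substitute_variables(item, variables) for item in node]
    if isinstance(node, str) and node.startswith('$'):
        return variables[node[1:]]  # raises KeyError if the variable is missing
    return node


# Part of doorkey.task.yml from above, as it might look after YAML parsing.
task_config = {
    'env_type': 'academia.environments.DoorKey',
    'env_args': {'difficulty': 0, 'random_state': '$env_random_state'},
}

for run_no in range(3):
    resolved = substitute_variables(task_config, {'env_random_state': run_no})
    print(run_no, resolved['env_args'])
```

Because the replacement is an arbitrary Python object rather than text, the same mechanism can carry values that YAML cannot express, such as functions.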

Variables are also useful for setting parameters that cannot be expressed directly in the configuration files. Good examples of such parameters are task_callback for Curriculum and episode_callback for LearningTask. In the following example, we use a variable to configure the former:

my_curriculum.yml
output_dir: './my_curriculum/'
task_callback: $task_callback
order:
- 0
- 1
tasks:
  _default:
    env_args:
      render_mode: human
      append_step_count: True
    env_type: academia.environments.LavaCrossing
    evaluation_interval: 100
  0:
    env_args:
      difficulty: 0
    stop_conditions:
      max_episodes: 500
  1:
    env_args:
      difficulty: 1
    stop_conditions:
      max_episodes: 1000

run.py
from academia.agents.base import Agent
from academia.curriculum import LearningStats, load_curriculum_config


def my_task_callback(agent: Agent, stats: LearningStats, task_id: str) -> None:
    agent.reset_exploration(0.8)


curriculum = load_curriculum_config('my_curriculum.yml', variables={
    'task_callback': my_task_callback,
})

These examples cover only the most common use cases. Variables have been designed with versatility in mind, and can also be used to specify full tasks inside a curriculum, or to set the order of tasks in a curriculum. Basically, any attribute in the configuration (except for _load) can have a variable assigned to it, with the value provided at runtime upon loading.

Note

Variables cannot be used to dynamically provide paths for the _load attribute. This is because by design all loads are handled before variables are resolved.
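The consequence of this ordering can be shown with a deliberately simplified two-stage sketch. It handles flat mappings only, uses an in-memory FILES dictionary in place of real files, and all function names are hypothetical; the error academia actually raises in this situation is not specified here and will likely differ from the KeyError used below.

```python
# Stand-in for a parsed YAML file (hypothetical content).
FILES = {'easy.task.yml': {'name': 'Easy task', 'random_state': '$seed'}}


def load_stage(node):
    """Stage 1: inline the file named by _load; the path is taken literally."""
    if '_load' not in node:
        return dict(node)
    loaded = dict(FILES[node['_load']])  # '$...' is not a file name: KeyError
    loaded.update({k: v for k, v in node.items() if k != '_load'})
    return loaded


def variable_stage(node, variables):
    """Stage 2: replace '$name' values; runs only after all loads are done."""
    return {k: variables[v[1:]] if isinstance(v, str) and v.startswith('$') else v
            for k, v in node.items()}


def load_config(node, variables):
    return variable_stage(load_stage(node), variables)


# A variable inside a loaded file is resolved, because loading happens first.
print(load_config({'_load': 'easy.task.yml'}, {'seed': 42}))

# A variable used as the path is still the literal text '$task_file' in stage 1.
try:
    load_config({'_load': '$task_file'}, {'task_file': 'easy.task.yml'})
except KeyError as error:
    print('cannot load', error)
```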