academia.curriculum module

Module contents

This module contains utilities for agent training. It controls interactions between agents and environments.

Exported classes:

  • Curriculum

  • LearningStats

  • LearningStatsAggregator

  • LearningTask

Logging in this module is handled using Python’s built-in logging library. In addition to the standard logging configuration, certain methods such as LearningTask.run() and Curriculum.run() accept a verbosity level which can be used to filter out some of the logs. These verbosity levels are shared across the entire module and are as follows:

Verbosity level   What is logged
0                 No logging (except for errors)
1                 Task finished/task interrupted + warnings
2                 Mean evaluation rewards
3                 Each evaluation
4                 Each episode
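
For example, assuming task and agent objects created as in the examples below, the following enables evaluation-level logs while keeping per-episode logs suppressed (the logging setup shown is a minimal sketch):

>>> import logging
>>> logging.basicConfig(level=logging.INFO)
>>> task.run(agent, verbose=3)  # logs each evaluation, but not each episode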

class academia.curriculum.Curriculum(tasks: list[LearningTask], output_dir: str | None = None)

Bases: SavableLoadable

Groups and executes instances of LearningTask in the specified order.

Parameters:
  • tasks – Tasks to be run. Tasks are run one by one so their order matters.

  • output_dir – A path to a directory where agent states and training stats will be saved upon each task’s completion or interruption. If set to None, an agent’s state or training stats will not be saved at any point, unless relevant paths are specified for any of the tasks directly.

tasks

Tasks to be run. Tasks are run one by one so their order matters.

Type:

list[LearningTask]

output_dir

A path to a directory where agent states and training stats will be saved upon each task’s completion or interruption. If set to None, an agent’s state or training stats will not be saved at any point, unless relevant paths are specified for any of the tasks directly.

Type:

str, optional

Examples

Initialization using the class constructor:

>>> from academia.curriculum import LearningTask, Curriculum
>>> from academia.environments import LavaCrossing
>>> task1 = LearningTask(
>>>     env_type=LavaCrossing,
>>>     env_args={'difficulty': 0, 'render_mode': 'human', 'append_step_count': True},
>>>     stop_conditions={'max_episodes': 500},
>>> )
>>> task2 = LearningTask(
>>>     env_type=LavaCrossing,
>>>     env_args={'difficulty': 1, 'render_mode': 'human', 'append_step_count': True},
>>>     stop_conditions={'max_episodes': 1000},
>>> )
>>> curriculum = Curriculum(
>>>     tasks=[task1, task2],
>>>     output_dir='./my_curriculum/',
>>> )

Initialization using a config file:

>>> from academia.curriculum import Curriculum
>>> curriculum = Curriculum.load('./my_config.curriculum.yml')

./my_config.curriculum.yml:

output_dir: './my_curriculum/'
order:
- 0
- 1
tasks:
  0:
    env_args:
      difficulty: 0
      render_mode: human
      append_step_count: True
    env_type: academia.environments.LavaCrossing
    evaluation_interval: 100
    stop_conditions:
      max_episodes: 500
  1:
    env_args:
      difficulty: 1
      render_mode: human
      append_step_count: True
    env_type: academia.environments.LavaCrossing
    evaluation_interval: 100
    stop_conditions:
      max_episodes: 1000

Running a curriculum:

>>> from academia.agents import DQNAgent
>>> from academia.environments import LavaCrossing
>>> from academia.utils.models import lava_crossing
>>> agent = DQNAgent(
>>>     n_actions=LavaCrossing.N_ACTIONS,
>>>     nn_architecture=lava_crossing.MLPStepDQN,
>>>     random_state=123,
>>> )
>>> curriculum.run(agent, verbose=4)
classmethod load(path: str) → Curriculum

Loads a task configuration from the specified file.

A configuration file should be in YAML format. The task list should be stored using two properties: tasks and order - the former maps task identifiers to their configurations, and the latter is a list of task identifiers in the order of their execution. Each task’s configuration can either be specified directly, or a path to a separate task configuration file can be provided. Other property names should be identical to the arguments of the Curriculum constructor.

An example curriculum configuration file:

# my_config.curriculum.yml
output_dir: './my_curriculum/'
order:
- 0
- 1
tasks:
  0:
    # this task's config is specified here directly:
    env_args:
      difficulty: 0
      render_mode: human
    env_type: academia.environments.LavaCrossing
    evaluation_interval: 100
    stop_conditions:
      max_episodes: 500
  1:
    # this task's config lies in a separate file
    # path is relative to the location of my_config.curriculum.yml
    path: ./lava_crossing_hard.task.yml
Parameters:

path – Path to a configuration file. If the specified file does not end with the ‘.yml’ extension, ‘.curriculum.yml’ will be appended to the specified path (for consistency with the save() method).

Returns:

A Curriculum instance based on the configuration in the specified file.
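
As described above, the extension may be omitted, e.g.:

>>> curriculum = Curriculum.load('./my_config')  # loads './my_config.curriculum.yml'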

run(agent: Agent, verbose=0)

Runs all tasks for the given agent. The agent’s state and training statistics will be saved upon each task’s completion or interruption if save paths are specified either for a specific task, or for the whole curriculum through the output_dir attribute.

Parameters:
  • agent – An agent to train

  • verbose – Verbosity level. These are common for the entire module - for information on different levels see academia.curriculum.

save(path: str) → str

Saves this curriculum’s configuration to a file. The configuration is stored in YAML format.

Parameters:

path – Path where a configuration file will be created. If the extension is not provided, ‘.curriculum.yml’ will be appended to the specified path automatically.

Returns:

A final (i.e. with an extension), absolute path where the configuration was saved.
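
A minimal usage sketch (the target path is illustrative):

>>> final_path = curriculum.save('./configs/my_curriculum')
>>> print(final_path)  # absolute path ending with 'my_curriculum.curriculum.yml'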

property stats: dict[str, LearningStats]

A dictionary that maps task name/index to task statistics for every task in this curriculum.
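
For example, after a curriculum has been run, per-task statistics can be inspected like so (a sketch; attribute access follows LearningStats documented below):

>>> for task_name, task_stats in curriculum.stats.items():
>>>     print(task_name, task_stats.episode_rewards.mean())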

class academia.curriculum.LearningStats(evaluation_interval: int)

Bases: SavableLoadable

Container for training statistics from LearningTask

episode_rewards

An array of floats which stores total rewards for each episode (excluding evaluations).

Type:

numpy.ndarray

agent_evaluations

An array of floats which stores total rewards for each evaluation.

Type:

numpy.ndarray

step_counts

An array of integers which stores step counts for each episode (excluding evaluations).

Type:

numpy.ndarray

episode_rewards_moving_avg

An array of floats which stores moving averages of total rewards for each episode (excluding evaluations). Each average is calculated from 5 observations.

Type:

numpy.ndarray

step_counts_moving_avg

An array of floats which stores moving averages of step counts for each episode (excluding evaluations). Each average is calculated from 5 observations.

Type:

numpy.ndarray

episode_wall_times

An array of floats which stores elapsed wall times for each episode (excluding evaluations).

Type:

numpy.ndarray

episode_cpu_times

An array of floats which stores elapsed CPU times for each episode (excluding evaluations).

Type:

numpy.ndarray

evaluation_interval

How often evaluations were conducted.

Type:

int

classmethod load(path: str)

Loads learning statistics from the specified file.

The specified file should be in JSON format. Example file:

{
    "episode_rewards": [1, 0, 0, 1],
    "step_counts": [250, 250, 250, 250],
    "episode_rewards_moving_avg": [1, 0.5, 0.33, 0.5],
    "step_counts_moving_avg": [250, 250, 250, 250],
    "agent_evaluations": [0, 0],
    "episode_wall_times": [
        0.5392518779990496,
        0.5948321321360364,
        0.6083159360059653,
        0.5948852870060364
    ],
    "episode_cpu_times": [
        2.1462997890000004,
        2.3829500180000007,
        2.4324373569999995,
        2.3217381230000001
    ],
    "evaluation_interval": 100
}
Parameters:

path – Path to a stats file. If the specified file does not end with the ‘.json’ extension, ‘.stats.json’ will be appended to the specified path.

Returns:

A LearningStats instance with statistics from the specified file.
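
A usage sketch matching the example file above (the path is illustrative):

>>> from academia.curriculum import LearningStats
>>> stats = LearningStats.load('./my_stats')  # loads './my_stats.stats.json'
>>> stats.evaluation_interval
100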

save(path: str) → str

Saves the contents of this LearningStats object to a file. Stats are stored in JSON format.

Parameters:

path – Path where a statistics file will be created. If the extension is not provided, ‘.stats.json’ will be appended to the specified path automatically.

Returns:

A final (i.e. with an extension), absolute path where the statistics were saved.

update(episode_no: int, episode_reward: float, steps_count: int, wall_time: float, cpu_time: float, verbose: int = 0) → None

Updates and logs training statistics for a given episode

Parameters:
  • episode_no – Episode number (only for logging)

  • episode_reward – Total reward after the episode

  • steps_count – Steps count of the episode

  • wall_time – Actual time it took for the episode to finish

  • cpu_time – CPU time it took for the episode to finish

  • verbose – Verbosity level. See LearningTask.run() for information on different verbosity levels.
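
update() is normally called by LearningTask during training, but it can also be invoked directly; a sketch with illustrative values:

>>> stats = LearningStats(evaluation_interval=100)
>>> stats.update(
>>>     episode_no=0,
>>>     episode_reward=1.0,
>>>     steps_count=250,
>>>     wall_time=0.54,
>>>     cpu_time=2.15,
>>> )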

class academia.curriculum.LearningStatsAggregator(stats: list[LearningStats] | list[dict[str, LearningStats]])

Bases: object

Aggregator of LearningStats objects. Accepts both a list of task stats and a list of curriculum stats stored as dictionaries (task-to-stats mappings).

Parameters:

stats – Statistics to be aggregated. These statistics can either come from different runs of a single task or different runs of a single curriculum.

stats

Statistics to be aggregated

Type:

Union[list[LearningStats], list[dict[str, LearningStats]]]

Examples

Aggregating multiple single-task trajectories.

We assume that agent is defined by the user (see academia.agents for examples), and that tasks is a list of tasks with identical configurations, each already run using the agent:

>>> stats = [task.stats for task in tasks]
>>> aggregator = LearningStatsAggregator(stats)
>>> task_aggregate, timestamps = aggregator.get_aggregate(
>>>     time_domain = 'steps',
>>>     value_domain = 'agent_evaluations',
>>>     agg_func_name = 'mean',
>>> )

Aggregating multiple curriculum trajectories.

We assume that agent is defined by the user (see academia.agents for examples), and that curricula is a list of curricula with identical configurations, each already run using the agent:

>>> stats = [curriculum.stats for curriculum in curricula]
>>> aggregator = LearningStatsAggregator(stats)
>>> curriculum_aggregate = aggregator.get_aggregate(
>>>     time_domain = 'steps',
>>>     value_domain = 'agent_evaluations',
>>>     agg_func_name = 'mean',
>>> )
>>> # `curriculum_aggregate` is a dictionary with the same keys as
>>> # all `curriculum.stats`
>>> print(curriculum_aggregate['task_1'])  # assuming 'task_1' is the name of one of the tasks
Raises:
  • ValueError – If provided stats is not list-like.

  • ValueError – If provided stats is a list of dictionaries with mismatching keys.

  • ValueError – If provided stats is not composed of LearningStats.

get_aggregate(time_domain: Literal['steps', 'episodes', 'cpu_time', 'wall_time'] = 'steps', value_domain: Literal['agent_evaluations', 'episode_rewards', 'episode_rewards_moving_avg', 'step_counts', 'step_counts_moving_avg'] = 'agent_evaluations', agg_func_name: Literal['mean', 'min', 'max', 'std'] = 'mean') → tuple[ndarray[Any, dtype[float32]], ndarray[Any, dtype[int32 | float32]]] | dict[str, tuple[ndarray[Any, dtype[float32]], ndarray[Any, dtype[int32 | float32]]]]

Creates an aggregate trajectory from a list of trajectories

Parameters:
  • time_domain – The time domain along which to aggregate the data. Defaults to "steps".

  • value_domain – The value domain across which to aggregate the data. Defaults to "agent_evaluations".

  • agg_func_name – Name of the aggregate function used to aggregate the data. Defaults to "mean".

Returns:

Either a tuple of aggregated values and their timestamps, or a dictionary whose values are (aggregate, timestamps) tuples, one per task.

Raises:

ValueError – If an incorrect time_domain, value_domain or agg_func_name is passed.
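
In the single-task case, the returned trajectory can be plotted directly, e.g. with matplotlib (a sketch; matplotlib is not a requirement of this module):

>>> import matplotlib.pyplot as plt
>>> values, timestamps = aggregator.get_aggregate(time_domain='steps')
>>> plt.plot(timestamps, values)
>>> plt.xlabel('steps')
>>> plt.ylabel('agent evaluations (mean)')
>>> plt.show()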

class academia.curriculum.LearningTask(env_type: Type[ScalableEnvironment], env_args: dict, stop_conditions: dict, evaluation_interval: int = 100, evaluation_count: int = 25, include_init_eval: bool = True, greedy_evaluation: bool = True, exploration_reset_value: float | None = None, name: str | None = None, agent_save_path: str | None = None, stats_save_path: str | None = None)

Bases: SavableLoadable

Controls an agent’s training.

Parameters:
  • env_type – A subclass of academia.environments.base.ScalableEnvironment that the agent will be trained on. This should be a class, not an instantiated object.

  • env_args – Arguments passed to the constructor of the environment class (the class passed as the env_type argument).

  • stop_conditions – Conditions deciding when to end the training process. For details see stop_predicates.

  • evaluation_interval – Controls how often evaluations are conducted. Defaults to 100.

  • evaluation_count – Controls how many evaluation episodes are run during a single evaluation. Final agent evaluation will be the mean of these individual evaluations. Defaults to 25.

  • include_init_eval – Whether or not to evaluate an agent before the training starts (i.e. right at the start of the run() method). Defaults to True.

  • greedy_evaluation – Whether or not the evaluation should be performed in greedy mode. Defaults to True.

  • exploration_reset_value – If specified, agent’s exploration parameter will get updated to that value after the task is finished. Unspecified by default.

  • name – Name of the task. This is unused when running a single task on its own. However, if specified, it will appear in the logs and (optionally) in some file names if the task is run through a Curriculum object.

  • agent_save_path – A path to a file where agent’s state will be saved after the training is completed or if it is interrupted. If not set, an agent’s state will not be saved at any point.

  • stats_save_path – A path to a file where statistics gathered during training process will be saved after the training is completed or if it is interrupted. If not set, they will not be saved at any point.

Raises:

ValueError – If no valid stop conditions were passed.

env

An environment that an agent can interact with. It is of a type env_type, initialized with parameters from env_args.

Type:

ScalableEnvironment

stats

Learning statistics. For more detailed description of their contents see LearningStats.

Type:

LearningStats

name

Name of the task. This is unused when running a single task on its own. However, if specified, it will appear in the logs and (optionally) in some file names if the task is run through a Curriculum object.

Type:

str, optional

agent_save_path

A path to a file where agent’s state will be saved after the training is completed or if it is interrupted. If set to None, an agent’s state will not be saved at any point.

Type:

str, optional

stats_save_path

A path to a file where statistics gathered during training process will be saved after the training is completed or if it is interrupted. If set to None, they will not be saved at any point.

Type:

str, optional

Examples

Initialization using the class constructor:

>>> from academia.curriculum import LearningTask
>>> from academia.environments import LavaCrossing
>>> task = LearningTask(
>>>     env_type=LavaCrossing,
>>>     env_args={'difficulty': 2, 'render_mode': 'human', 'append_step_count': True},
>>>     stop_conditions={'max_episodes': 1000},
>>>     stats_save_path='./my_task_stats.json',
>>> )

Initialization using a config file:

>>> from academia.curriculum import LearningTask
>>> task = LearningTask.load('./my_config.task.yml')

./my_config.task.yml:

env_type: academia.environments.LavaCrossing
env_args:
    difficulty: 2
    render_mode: human
    append_step_count: True
stop_conditions:
    max_episodes: 1000
stats_save_path: ./my_task_stats.json

Running a task:

>>> from academia.agents import DQNAgent
>>> from academia.environments import LavaCrossing
>>> from academia.utils.models import lava_crossing
>>> agent = DQNAgent(
>>>     n_actions=LavaCrossing.N_ACTIONS,
>>>     nn_architecture=lava_crossing.MLPStepDQN,
>>>     random_state=123,
>>> )
>>> task.run(agent, verbose=4)
classmethod from_dict(task_data: dict) → LearningTask

Creates a task based on a configuration stored in a dictionary. This is a helper method used by the Curriculum class and is typically not useful to the end user.

Parameters:

task_data – dictionary that contains raw contents from the configuration file

Returns:

A LearningTask instance based on the provided configuration.

classmethod load(path: str) → LearningTask

Loads a task configuration from the specified file.

A configuration file should be in YAML format. Property names should be identical to the arguments of the LearningTask constructor.

An example task configuration file:

# my_config.task.yml
env_type: academia.environments.LavaCrossing
env_args:
    difficulty: 2
    render_mode: human
stop_conditions:
    max_episodes: 1000
stats_save_path: ./my_task_stats.json
Parameters:

path – Path to a configuration file. If the specified file does not end with the ‘.yml’ extension, ‘.task.yml’ will be appended to the specified path (for consistency with the save() method).

Returns:

A LearningTask instance based on the configuration in the specified file.

run(agent: Agent, verbose=0) → None

Runs the training loop for the given agent on an environment specified during this task’s initialization. Training statistics will be saved to a JSON file if stats_save_path is not None.

Parameters:
  • agent – An agent to train

  • verbose – Verbosity level. These are common for the entire module - for information on different levels see academia.curriculum.

save(path: str) → str

Saves this task’s configuration to a file. The configuration is stored in YAML format.

Parameters:

path – Path where a configuration file will be created. If the extension is not provided, ‘.task.yml’ will be appended to the specified path automatically.

Returns:

A final (i.e. with an extension), absolute path where the configuration was saved.

stop_predicates: dict[str, Callable[[Any, LearningStats], bool]] = {'max_episodes': <function _max_episodes_predicate>, 'max_reward_std_dev': <function _max_reward_std_dev_predicate>, 'max_steps': <function _max_steps_predicate>, 'max_wall_time': <function _max_wall_time_predicate>, 'min_avg_reward': <function _min_avg_reward_predicate>, 'min_evaluation_score': <function _min_evaluation_score_predicate>}

A class attribute that stores a global (i.e. shared by every task) mapping of available learning stop conditions. These are stored as functions with the following signature:

>>> def my_stop_predicate(value, stats: LearningStats) -> bool:
>>>     pass

where value can be of any type and is the value passed in the stop_conditions dictionary through LearningTask’s constructor. The return value indicates whether learning should be stopped.

There are a few default stop predicates:

  • 'max_episodes' - maximum number of episodes,

  • 'max_steps' - maximum number of total steps,

  • 'min_avg_reward' - minimum moving average of rewards (after at least five episodes),

  • 'max_reward_std_dev' - maximum standard deviation of the last 10 rewards,

  • 'min_evaluation_score' - minimum mean evaluation score,

  • 'max_wall_time' - maximum elapsed wall time.

Example

Given that:

LearningTask.stop_predicates = {'predicate': my_stop_predicate}

and that a task was initialized with:

stop_conditions={'predicate': 500}

When checking whether the task should be stopped, a predicate would be called as follows:

my_stop_predicate(500, self.stats)
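
Custom predicates can be added to this dictionary. As a sketch, a hypothetical 'min_eval_mean' predicate (not part of the library) that stops training once the mean of all evaluations so far reaches a given value could be registered like this:

>>> import numpy as np
>>> from academia.curriculum import LearningTask
>>> def min_eval_mean_predicate(value, stats) -> bool:
>>>     # stop once the mean evaluation score reaches `value`
>>>     if len(stats.agent_evaluations) == 0:
>>>         return False
>>>     return float(np.mean(stats.agent_evaluations)) >= value
>>> LearningTask.stop_predicates['min_eval_mean'] = min_eval_mean_predicate
>>> # tasks may now use stop_conditions={'min_eval_mean': 0.9}
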
to_dict() → dict

Exports this LearningTask’s configuration to a dictionary. This is a helper method used by the Curriculum class and is typically not useful to the end user.

Returns:

A dictionary with the task configuration, ready to be written to a text file.