academia.curriculum module
Module contents
This module contains utilities for agent training. It controls the interactions between agents and environments.
Exported classes: Curriculum, LearningStats, LearningStatsAggregator, LearningTask.
Logging in this module is handled using the built-in logging library. Besides the standard logging configuration, certain methods such as LearningTask.run() and Curriculum.run() let the user specify a verbosity level which can be used to filter out some of the logs. These verbosity levels are common throughout the entire module and are as follows:
| Verbosity level | What is logged |
|---|---|
| 0 | No logging (except for errors) |
| 1 | Task finished/Task interrupted + warnings |
| 2 | Mean evaluation rewards |
| 3 | Each evaluation |
| 4 | Each episode |
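For example, to log only up to mean evaluation rewards, pass verbose=2 (a minimal sketch; task, curriculum and agent are assumed to be constructed as in the examples further below):
>>> task.run(agent, verbose=2)        # task start/end, warnings and mean evaluation rewards
>>> curriculum.run(agent, verbose=1)  # only task finished/interrupted messages and warnings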
- class academia.curriculum.Curriculum(tasks: list[LearningTask], output_dir: str | None = None)
Bases: SavableLoadable
Groups and executes instances of LearningTask in the specified order.
- Parameters:
tasks – Tasks to be run. Tasks are run one by one, so their order matters.
output_dir – A path to a directory where agent states and training stats will be saved upon each task's completion or interruption. If set to None, an agent's state or training stats will not be saved at any point, unless relevant paths are specified for any of the tasks directly.
- tasks
Tasks to be run. Tasks are run one by one so their order matters.
- Type:
list[LearningTask]
- output_dir
A path to a directory where agent states and training stats will be saved upon each task's completion or interruption. If set to None, an agent's state or training stats will not be saved at any point, unless relevant paths are specified for any of the tasks directly.
- Type:
str, optional
Examples
Initialization using class constructor:
>>> from academia.curriculum import LearningTask, Curriculum
>>> from academia.environments import LavaCrossing
>>> task1 = LearningTask(
>>>     env_type=LavaCrossing,
>>>     env_args={'difficulty': 0, 'render_mode': 'human', 'append_step_count': True},
>>>     stop_conditions={'max_episodes': 500},
>>> )
>>> task2 = LearningTask(
>>>     env_type=LavaCrossing,
>>>     env_args={'difficulty': 1, 'render_mode': 'human', 'append_step_count': True},
>>>     stop_conditions={'max_episodes': 1000},
>>> )
>>> curriculum = Curriculum(
>>>     tasks=[task1, task2],
>>>     output_dir='./my_curriculum/',
>>> )
Initialization using a config file:
>>> from academia.curriculum import Curriculum
>>> curriculum = Curriculum.load('./my_config.curriculum.yml')
./my_config.curriculum.yml:
output_dir: './my_curriculum/'
order:
- 0
- 1
tasks:
  0:
    env_args:
      difficulty: 0
      render_mode: human
      append_step_count: True
    env_type: academia.environments.LavaCrossing
    evaluation_interval: 100
    stop_conditions:
      max_episodes: 500
  1:
    env_args:
      difficulty: 1
      render_mode: human
      append_step_count: True
    env_type: academia.environments.LavaCrossing
    evaluation_interval: 100
    stop_conditions:
      max_episodes: 1000
Running a curriculum:
>>> from academia.agents import DQNAgent
>>> from academia.utils.models import lava_crossing
>>> agent = DQNAgent(
>>>     n_actions=LavaCrossing.N_ACTIONS,
>>>     nn_architecture=lava_crossing.MLPStepDQN,
>>>     random_state=123,
>>> )
>>> curriculum.run(agent, verbose=4)
- classmethod load(path: str) Curriculum
Loads a curriculum configuration from the specified file.
A configuration file should be in YAML format. The task list should be stored using two properties: tasks and order - the former maps task identifiers to their configurations and the latter is a list of task identifiers in the order of their execution. An individual task's configuration can either be specified directly, or a path to the task's configuration file can be provided. Other property names should be identical to the arguments of the Curriculum constructor.
An example curriculum configuration file:
# my_config.curriculum.yml
output_dir: './my_curriculum/'
order:
- 0
- 1
tasks:
  0:
    # this task's config is specified here directly:
    env_args:
      difficulty: 0
      render_mode: human
    env_type: academia.environments.LavaCrossing
    evaluation_interval: 100
    stop_conditions:
      max_episodes: 500
  1:
    # this task's config lies in a separate file
    # path is relative to the location of my_config.curriculum.yml
    path: ./lava_crossing_hard.task.yml
- Parameters:
path – Path to a configuration file. If the specified file does not end with the '.yml' extension, '.curriculum.yml' will be appended to the specified path (for consistency with the save() method).
- Returns:
A Curriculum instance based on the configuration in the specified file.
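For instance, the following two calls are expected to resolve to the same configuration file (a sketch illustrating the extension handling described above; the variable names are only illustrative):
>>> c1 = Curriculum.load('./my_config.curriculum.yml')
>>> c2 = Curriculum.load('./my_config')  # '.curriculum.yml' is appended automatically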
- run(agent: Agent, verbose=0)
Runs all tasks for the given agent. Agent's states and training statistics will be saved upon each task's completion or interruption if save paths are specified either for a specific task, or for the whole curriculum through the output_dir attribute.
- Parameters:
agent – An agent to train.
verbose – Verbosity level. Verbosity levels are common for the entire module - for information on different levels see academia.curriculum.
- save(path: str) str
Saves this curriculum's configuration to a file. The configuration is stored in YAML format.
- Parameters:
path – Path where a configuration file will be created. If the extension is not provided, '.curriculum.yml' will be automatically appended to the specified path.
- Returns:
A final (i.e. with an extension), absolute path where the configuration was saved.
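A short sketch of the behaviour described above (the path './configs/my_curriculum' is only an illustration):
>>> final_path = curriculum.save('./configs/my_curriculum')
>>> print(final_path)  # an absolute path ending with 'my_curriculum.curriculum.yml'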
- property stats: dict[str, LearningStats]
A dictionary that maps task name/index to task statistics for every task in this curriculum.
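A sketch of how this mapping might be inspected after curriculum.run() has completed; the call to mean() relies on agent_evaluations being a numpy.ndarray, as documented in LearningStats:
>>> for task_name, task_stats in curriculum.stats.items():
>>>     print(task_name, task_stats.agent_evaluations.mean())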
- class academia.curriculum.LearningStats(evaluation_interval: int)
Bases: SavableLoadable
A container for training statistics from a LearningTask.
- episode_rewards
An array of floats which stores total rewards for each episode (excluding evaluations).
- Type:
numpy.ndarray
- agent_evaluations
An array of floats which stores total rewards for each evaluation.
- Type:
numpy.ndarray
- step_counts
An array of integers which stores step counts for each episode (excluding evaluations).
- Type:
numpy.ndarray
- episode_rewards_moving_avg
An array of floats which stores moving averages of total rewards for each episode (excluding evaluations). Each average is calculated from 5 observations.
- Type:
numpy.ndarray
- step_counts_moving_avg
An array of floats which stores moving averages of step counts for each episode (excluding evaluations). Each average is calculated from 5 observations.
- Type:
numpy.ndarray
- episode_wall_times
An array of floats which stores elapsed wall times for each episode (excluding evaluations).
- Type:
numpy.ndarray
- episode_cpu_times
An array of floats which stores elapsed CPU times for each episode (excluding evaluations).
- Type:
numpy.ndarray
- evaluation_interval
How often evaluations were conducted.
- Type:
int
- classmethod load(path: str)
Loads learning statistics from the specified file.
The specified file should be in JSON format. Example file:
{ "episode_rewards": [1, 0, 0, 1], "step_counts": [250, 250, 250, 250], "episode_rewards_moving_avg": [1, 0.5, 0.33, 0.5], "step_counts_moving_avg": [250, 250, 250, 250], "agent_evaluations": [0, 0], "episode_wall_times": [ 0.5392518779990496, 0.5948321321360364, 0.6083159360059653, 0.5948852870060364 ], "episode_cpu_times": [ 2.1462997890000004, 2.3829500180000007, 2.4324373569999995, 2.3217381230000001 ], "evaluation_interval": 100 }
- Parameters:
path – Path to a stats file. If the specified file does not end with the '.json' extension, '.stats.json' will be appended to the specified path.
- Returns:
A LearningStats instance with statistics from the specified file.
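For example, loading a stats file and inspecting one of its fields (a sketch; './my_task_stats' resolves to './my_task_stats.stats.json' as described above):
>>> stats = LearningStats.load('./my_task_stats')
>>> print(stats.evaluation_interval)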
- save(path: str) str
Saves this LearningStats object's contents to a file. Stats are stored in JSON format.
- Parameters:
path – Path where a statistics file will be created. If the extension is not provided, '.stats.json' will be automatically appended to the specified path.
- Returns:
A final (i.e. with an extension), absolute path where the statistics were saved.
- update(episode_no: int, episode_reward: float, steps_count: int, wall_time: float, cpu_time: float, verbose: int = 0) None
Updates and logs training statistics for a given episode.
- Parameters:
episode_no – Episode number (only used for logging).
episode_reward – Total reward after the episode.
steps_count – Step count of the episode.
wall_time – Actual time it took for the episode to finish.
cpu_time – CPU time it took for the episode to finish.
verbose – Verbosity level. See LearningTask.run() for information on the different verbosity levels.
- class academia.curriculum.LearningStatsAggregator(stats: list[LearningStats] | list[dict[str, LearningStats]])
Bases: object
Aggregator of LearningStats objects. Accepts both a list of task stats and a list of curriculum stats stored as dictionaries (task-stats mapping).
- Parameters:
stats – Statistics to be aggregated. These statistics can either come from different runs of a single task or different runs of a single curriculum.
- stats
Statistics to be aggregated
- Type:
Union[list[LearningStats], list[dict[str, LearningStats]]]
Examples
Aggregating multiple single-task trajectories.
We are assuming that agent is defined by the user (see academia.agents for examples), and that a list of tasks with the same configuration has been defined and run using the agent:
>>> stats = [task.stats for task in tasks]
>>> aggregator = LearningStatsAggregator(stats)
>>> task_aggregate, timestamps = aggregator.get_aggregate(
>>>     time_domain='steps',
>>>     value_domain='agent_evaluations',
>>>     agg_func_name='mean',
>>> )
Aggregating multiple curriculum trajectories.
We are assuming that agent is defined by the user (see academia.agents for examples), and that a list of curricula with the same configuration has been defined and run using the agent:
>>> stats = [curriculum.stats for curriculum in curricula]
>>> aggregator = LearningStatsAggregator(stats)
>>> curriculum_aggregate = aggregator.get_aggregate(
>>>     time_domain='steps',
>>>     value_domain='agent_evaluations',
>>>     agg_func_name='mean',
>>> )
>>> # `curriculum_aggregate` is a dictionary with the same keys as
>>> # all `curriculum.stats`
>>> print(curriculum_aggregate['task_1'])  # assuming "task_1" is the name of one of the tasks
- Raises:
ValueError – If provided stats is not list-like.
ValueError – If provided stats is a list of dictionaries with mismatching keys.
ValueError – If provided stats is not composed of LearningStats.
- get_aggregate(time_domain: Literal['steps', 'episodes', 'cpu_time', 'wall_time'] = 'steps', value_domain: Literal['agent_evaluations', 'episode_rewards', 'episode_rewards_moving_avg', 'step_counts', 'step_counts_moving_avg'] = 'agent_evaluations', agg_func_name: Literal['mean', 'min', 'max', 'std'] = 'mean') tuple[ndarray[Any, dtype[float32]], ndarray[Any, dtype[int32 | float32]]] | dict[str, tuple[ndarray[Any, dtype[float32]], ndarray[Any, dtype[int32 | float32]]]]
Creates an aggregate trajectory from a list of trajectories.
- Parameters:
time_domain – The time domain along which to aggregate the data. Defaults to "steps".
value_domain – The value domain across which to aggregate the data. Defaults to "agent_evaluations".
agg_func_name – Name of the aggregate function used to aggregate the data. Defaults to "mean".
- Returns:
Either a tuple of aggregated values and their timestamps, or a dictionary whose values are (aggregate, timestamps) tuples.
- Raises:
ValueError – If an incorrect time_domain, value_domain or agg_func_name is passed.
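As an illustration, an aggregate obtained from a list of task stats could be plotted directly, following the (aggregate, timestamps) order shown in the Examples above (a sketch; matplotlib is not part of this module and is used here only as an assumption):
>>> import matplotlib.pyplot as plt
>>> values, timestamps = aggregator.get_aggregate(
>>>     time_domain='steps',
>>>     value_domain='agent_evaluations',
>>>     agg_func_name='mean',
>>> )
>>> plt.plot(timestamps, values)
>>> plt.xlabel('steps')
>>> plt.ylabel('mean agent evaluation')
>>> plt.show()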
- class academia.curriculum.LearningTask(env_type: Type[ScalableEnvironment], env_args: dict, stop_conditions: dict, evaluation_interval: int = 100, evaluation_count: int = 25, include_init_eval: bool = True, greedy_evaluation: bool = True, exploration_reset_value: float | None = None, name: str | None = None, agent_save_path: str | None = None, stats_save_path: str | None = None)
Bases: SavableLoadable
Controls an agent's training.
- Parameters:
env_type – A subclass of academia.environments.base.ScalableEnvironment that the agent will be trained on. This should be a class, not an instantiated object.
env_args – Arguments passed to the constructor of the environment class (the class passed as the env_type argument).
stop_conditions – Conditions deciding when to end the training process. For details see stop_predicates.
evaluation_interval – Controls how often evaluations are conducted. Defaults to 100.
evaluation_count – Controls how many evaluation episodes are run during a single evaluation. The final agent evaluation will be the mean of these individual evaluations. Defaults to 25.
include_init_eval – Whether or not to evaluate an agent before the training starts (i.e. right at the start of the run() method). Defaults to True.
greedy_evaluation – Whether or not the evaluation should be performed in greedy mode. Defaults to True.
exploration_reset_value – If specified, the agent's exploration parameter will be updated to this value after the task is finished. Unspecified by default.
name – Name of the task. This is unused when running a single task on its own. However, if specified it will appear in the logs and (optionally) in some file names if the task is run through a Curriculum object.
agent_save_path – A path to a file where the agent's state will be saved after the training is completed or if it is interrupted. If not set, the agent's state will not be saved at any point.
stats_save_path – A path to a file where statistics gathered during the training process will be saved after the training is completed or if it is interrupted. If not set, they will not be saved at any point.
- Raises:
ValueError – If no valid stop conditions were passed.
- env
An environment that an agent can interact with. It is of type env_type, initialized with parameters from env_args.
- Type:
ScalableEnvironment
- stats
Learning statistics. For a more detailed description of their contents see LearningStats.
- Type:
LearningStats
- name
Name of the task. This is unused when running a single task on its own. However, if specified it will appear in the logs and (optionally) in some file names if the task is run through a Curriculum object.
- Type:
str, optional
- agent_save_path
A path to a file where the agent's state will be saved after the training is completed or if it is interrupted. If set to None, the agent's state will not be saved at any point.
- Type:
str, optional
- stats_save_path
A path to a file where statistics gathered during the training process will be saved after the training is completed or if it is interrupted. If set to None, they will not be saved at any point.
- Type:
str, optional
Examples
Initialization using class constructor:
>>> from academia.curriculum import LearningTask
>>> from academia.environments import LavaCrossing
>>> task = LearningTask(
>>>     env_type=LavaCrossing,
>>>     env_args={'difficulty': 2, 'render_mode': 'human', 'append_step_count': True},
>>>     stop_conditions={'max_episodes': 1000},
>>>     stats_save_path='./my_task_stats.json',
>>> )
Initialization using a config file:
>>> from academia.curriculum import LearningTask
>>> task = LearningTask.load('./my_config.task.yml')
./my_config.task.yml:
env_type: academia.environments.LavaCrossing
env_args:
  difficulty: 2
  render_mode: human
  append_step_count: True
stop_conditions:
  max_episodes: 1000
stats_save_path: ./my_task_stats.json
Running a task:
>>> from academia.agents import DQNAgent
>>> from academia.utils.models import lava_crossing
>>> agent = DQNAgent(
>>>     n_actions=LavaCrossing.N_ACTIONS,
>>>     nn_architecture=lava_crossing.MLPStepDQN,
>>>     random_state=123,
>>> )
>>> task.run(agent, verbose=4)
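After run() finishes, the gathered statistics can be inspected through the task's stats attribute (a sketch based on the LearningStats attributes documented above):
>>> print(task.stats.agent_evaluations)
>>> print(task.stats.episode_rewards_moving_avg[-1])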
- classmethod from_dict(task_data: dict) LearningTask
Creates a task based on a configuration stored in a dictionary. This is a helper method used by the Curriculum class and it is not useful for the end user.
- Parameters:
task_data – A dictionary that contains the raw contents of the configuration file.
- Returns:
A LearningTask instance based on the provided configuration.
- classmethod load(path: str) LearningTask
Loads a task configuration from the specified file.
A configuration file should be in YAML format. Property names should be identical to the arguments of the LearningTask constructor.
An example task configuration file:
# my_config.task.yml
env_type: academia.environments.LavaCrossing
env_args:
  difficulty: 2
  render_mode: human
stop_conditions:
  max_episodes: 1000
stats_save_path: ./my_task_stats.json
- Parameters:
path – Path to a configuration file. If the specified file does not end with the '.yml' extension, '.task.yml' will be appended to the specified path (for consistency with the save() method).
- Returns:
A LearningTask instance based on the configuration in the specified file.
- run(agent: Agent, verbose=0) None
Runs the training loop for the given agent on an environment specified during this task's initialization. Training statistics will be saved to a JSON file if stats_save_path is not None.
- Parameters:
agent – An agent to train.
verbose – Verbosity level. Verbosity levels are common for the entire module - for information on different levels see academia.curriculum.
- save(path: str) str
Saves this task's configuration to a file. The configuration is stored in YAML format.
- Parameters:
path – Path where a configuration file will be created. If the extension is not provided, '.task.yml' will be automatically appended to the specified path.
- Returns:
A final (i.e. with an extension), absolute path where the configuration was saved.
- stop_predicates: dict[str, Callable[[Any, LearningStats], bool]] = {'max_episodes': <function _max_episodes_predicate>, 'max_reward_std_dev': <function _max_reward_std_dev_predicate>, 'max_steps': <function _max_steps_predicate>, 'max_wall_time': <function _max_wall_time_predicate>, 'min_avg_reward': <function _min_avg_reward_predicate>, 'min_evaluation_score': <function _min_evaluation_score_predicate>}
A class attribute that stores a global (i.e. shared by every task) list of available learning stop conditions. These are stored as functions with the following signature:
>>> def my_stop_predicate(value, stats: LearningStats) -> bool:
>>>     pass
where value can be of any type and is passed in the stop_conditions dictionary through the LearningTask constructor. The return value indicates whether learning should be stopped.
There are a few default stop predicates:
- 'max_episodes' - maximum number of episodes,
- 'max_steps' - maximum number of total steps,
- 'min_avg_reward' - minimum moving average of rewards (after at least five episodes),
- 'max_reward_std_dev' - maximum standard deviation of the last 10 rewards,
- 'min_evaluation_score' - minimum mean evaluation score,
- 'max_wall_time' - maximum elapsed wall time.
Example
Given that:
LearningTask.stop_predicates = {'predicate': my_stop_predicate}
and that a task was initialized with:
stop_conditions={'predicate': 500}
When checking whether the task should be stopped, a predicate would be called as follows:
my_stop_predicate(500, self.stats)
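A sketch of how a custom predicate could be registered; the 'min_episode_reward' key and the predicate body are hypothetical and only follow the signature shown above:
>>> from academia.curriculum import LearningTask, LearningStats
>>> from academia.environments import LavaCrossing
>>> def _min_episode_reward_predicate(value, stats: LearningStats) -> bool:
>>>     # hypothetical predicate: stop once any single episode reaches the given reward
>>>     return len(stats.episode_rewards) > 0 and stats.episode_rewards.max() >= value
>>> LearningTask.stop_predicates['min_episode_reward'] = _min_episode_reward_predicate
>>> task = LearningTask(
>>>     env_type=LavaCrossing,
>>>     env_args={'difficulty': 0},
>>>     stop_conditions={'min_episode_reward': 0.9},
>>> )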
- to_dict() dict
Exports this LearningTask's configuration to a dictionary. This is a helper method used by the Curriculum class and it is not useful for the end user.
- Returns:
A dictionary with the task configuration, ready to be written to a text file.
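A sketch of the round trip between to_dict() and from_dict(), assuming task is an existing LearningTask:
>>> config = task.to_dict()
>>> rebuilt_task = LearningTask.from_dict(config)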