academia.environments module
Module contents
This module contains environments that agents can be trained on.
Most of these environments are wrappers for environments from the MiniGrid or Gymnasium packages (BridgeBuilding is the exception). The main purpose of these wrappers is to make it easy to scale an environment's difficulty by adjusting a single difficulty parameter. This scalability makes the environments suitable for Curriculum Learning.
Exported classes: BridgeBuilding, DoorKey, LavaCrossing, LunarLander, MsPacman
Note
If you wish to use your own environment, please refer to Using your own environments.
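As a quick orientation, the sketch below shows the pattern these wrappers share: the same class is instantiated at increasing values of the single difficulty parameter, which is what makes these environments convenient for Curriculum Learning. This is a minimal sketch assuming the step() return signature documented for the classes below (state, reward, done):

    from academia.environments import LavaCrossing

    # The same task gets progressively harder by changing only `difficulty`.
    # LavaCrossing supports difficulty levels 0-3 (see its documentation below).
    for difficulty in range(4):
        env = LavaCrossing(difficulty=difficulty, random_state=42)
        state = env.reset()
        state, reward, done = env.step(2)  # 2 = "forward"
        print(difficulty, reward, done)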
- class academia.environments.BridgeBuilding(difficulty: int, river_width: int = 2, max_steps: int = 100, render_mode: Literal['human'] | None = None, obs_type: Literal['string', 'array'] = 'array', reward_density: Literal['sparse', 'dense'] = 'sparse', n_frames_stacked: int = 1, append_step_count: bool = False, random_state: int | None = None)
Bases:
ScalableEnvironment
A grid environment where an agent has to use boulders scattered around the map to build a bridge across a river and get from one side to the other. The higher the difficulty, the more work the agent has to put into building the bridge. At lower difficulties the bridge is partly (or even fully) built and the agent only has to learn how to finish it and/or navigate across it.
The reward system is presented in the table below. Note that the last two rewards can only be obtained if reward_density is set to "dense":

Event                              Reward
Running out of time                0
Drowning in the river              0
Reaching the goal                  1 - step_count/max_steps + bridge_length/river_width
(Dense) Constructing the bridge    0.5 * (length_after_step - length_before_step)
(Dense) Deconstructing the bridge  0.5 * (length_after_step - length_before_step)

The main reward function (reaching the goal) is meant to mimic MiniGrid's reward function, but its last component also forces the agent to fully build the bridge.
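For concreteness, the goal reward from the table above could be computed as in the sketch below. This is an illustrative reconstruction of the formula, not the package's internal code; the parameter names mirror the attributes documented for this class:

    def goal_reward(step_count: int, max_steps: int,
                    bridge_length: int, river_width: int) -> float:
        # 1 - step_count/max_steps mimics MiniGrid's time-discounted reward;
        # bridge_length/river_width additionally rewards a fully built bridge.
        return 1 - step_count / max_steps + bridge_length / river_width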
Possible actions:

Num  Name         Action
0    left         Turn left
1    right        Turn right
2    forward      Move forward
3    pickup/drop  Pick up/drop a boulder
Difficulty levels:

Difficulty  Description
n           The bridge is missing n boulders
- Parameters:
  - difficulty – Difficulty level from 0 to river_width, where 0 is the easiest and river_width is the hardest.
  - river_width – The width of the river.
  - max_steps – The maximum number of steps an agent can spend in the environment. If the agent doesn't reach the goal in that time, the episode terminates. Defaults to 100.
  - render_mode – How the environment should be rendered. If set to "human", the environment will be rendered in a way interpretable by a human. Defaults to None.
  - obs_type – How the state should be observed. If "string", a string representing the state will be returned. If "array", an array representing the state will be returned. Defaults to "array".
  - reward_density – The density of the reward function. Possible values are "sparse" and "dense". If "sparse" is passed, the agent will only get the reward at the end of the episode. If "dense" is passed, the agent will additionally obtain rewards (and penalties) for constructing (or deconstructing) parts of the bridge. Defaults to "sparse".
  - n_frames_stacked – How many of the most recent states should be stacked together to form the final state representation. Defaults to 1.
  - append_step_count – Whether or not to append the current step count to each state. Defaults to False.
  - random_state – Optional seed that controls the randomness of the environment. Defaults to None.
- Raises:
ValueError – If the specified river width is invalid.
ValueError – If the specified difficulty level is invalid.
- step_count
Current step count since the last reset.
- Type:
int
- difficulty
Difficulty level. Higher values indicate more difficult environments.
- Type:
int
- river_width
The width of the river.
- Type:
int
- n_frames_stacked
How many of the most recent states are stacked together to form the final state representation.
- Type:
int
- append_step_count
Whether or not to append the current step count to each state.
- Type:
bool
- max_steps
The maximum number of steps an agent can spend in the environment.
- Type:
int
- render_mode
How the environment should be rendered.
- Type:
Optional[Literal["human"]]
- obs_type
How the state is observed.
- Type:
Literal["string", "array"]
- reward_density
The density of the reward function.
- Type:
Literal["sparse", "dense"]
- N_ACTIONS: int = 4
Number of available actions.
- STATE_SHAPE: tuple[int, ...]
Shape of the state representation. Can vary for each instance.
- get_legal_mask() ndarray[Any, dtype[int32]]
Returns a binary legal action mask.
- Returns:
A binary mask with 0s in place for illegal actions (actions that have no effect) and 1s for legal actions.
- observe() str | ndarray[Any, dtype[float32]]
Returns the current state of the environment. Performs state stacking if n_frames_stacked is greater than 1.
- Returns:
The current state of the environment.
- render() None
Renders the environment in the current render mode.
- reset() str | ndarray[Any, dtype[float32]]
Resets the environment to its initial state.
- Returns:
The new state after resetting the environment.
- step(action) tuple[str | ndarray[Any, dtype[float32]], float, bool]
Advances the environment by one step given the specified action.
- Parameters:
action – The action to take.
- Returns:
A tuple containing the new state, reward, and a flag indicating episode termination.
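As a usage illustration, the sketch below runs one random-action episode while respecting the legal action mask. It assumes only the constructor, get_legal_mask(), and step() signatures documented above:

    import numpy as np
    from academia.environments import BridgeBuilding

    # Difficulty 1: the bridge is missing one boulder.
    env = BridgeBuilding(difficulty=1, reward_density='dense', random_state=0)
    state = env.reset()
    done = False
    while not done:
        # Sample uniformly among the currently legal actions (mask value 1).
        legal_mask = env.get_legal_mask()
        action = np.random.choice(np.flatnonzero(legal_mask))
        state, reward, done = env.step(action)
    print(f'Episode ended after {env.step_count} steps')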
- class academia.environments.DoorKey(difficulty: int, n_frames_stacked: int = 1, append_step_count: bool = False, random_state: int | None = None, **kwargs)
Bases:
GenericMiniGridWrapper
This class is a wrapper for MiniGrid’s Door Key environments.
DoorKey is a grid environment where an agent has to find a key and then open a door to reach the destination. The higher the difficulty, the bigger the grid, making it harder to find the key, then the door, and then the destination.
Possible actions:

Num  Name     Action
0    left     Turn left
1    right    Turn right
2    forward  Move forward
3    pickup   Pick up an object
4    toggle   Toggle/activate an object
Difficulty levels:

Difficulty  Description
0           5x5 grid size with 1 key and 1 door
1           6x6 grid size with 1 key and 1 door
2           8x8 grid size with 1 key and 1 door
3           16x16 grid size with 1 key and 1 door
See also
MiniGrid’s Door Key environments: https://minigrid.farama.org/environments/minigrid/DoorKeyEnv/
- Parameters:
  - difficulty – Difficulty level from 0 to 3, where 0 is the easiest and 3 is the hardest.
  - n_frames_stacked – How many of the most recent states should be stacked together to form the final state representation. Defaults to 1.
  - append_step_count – Whether or not to append the current step count to each state. Defaults to False.
  - random_state – Optional seed that controls the randomness of the environment. Defaults to None.
  - kwargs – Arguments passed down to gymnasium.make.
- Raises:
ValueError – If the specified difficulty level is invalid.
- step_count
Current step count since the last reset.
- Type:
int
- difficulty
Difficulty level. Higher values indicate more difficult environments.
- Type:
int
- n_frames_stacked
How many of the most recent states are stacked together to form the final state representation.
- Type:
int
- append_step_count
Whether or not to append the current step count to each state.
- Type:
bool
- N_ACTIONS: int = 5
Number of available actions.
- STATE_SHAPE: tuple[int, ...]
Shape of the state representation. Can vary for each instance.
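Since extra keyword arguments are passed down to gymnasium.make, standard Gymnasium options such as render_mode can be forwarded through the wrapper. A minimal sketch (the step() return signature is assumed to match the one documented for BridgeBuilding above):

    from academia.environments import DoorKey

    # `render_mode` is not a named parameter of DoorKey; it is forwarded
    # to gymnasium.make via **kwargs.
    env = DoorKey(difficulty=2, render_mode='human')
    state = env.reset()
    state, reward, done = env.step(2)  # 2 = "forward"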
- class academia.environments.LavaCrossing(difficulty: int, n_frames_stacked: int = 1, append_step_count: bool = False, random_state: int | None = None, **kwargs)
Bases:
GenericMiniGridWrapper
This class is a wrapper for MiniGrid’s Lava Crossing environments.
A grid environment where an agent has to avoid patches of lava in order to reach the destination. The higher the difficulty, the more lava patches are generated on the grid.
Possible actions:

Num  Name     Action
0    left     Turn left
1    right    Turn right
2    forward  Move forward
Difficulty levels:

Difficulty  Description
0           9x9 grid size with 1 lava patch
1           9x9 grid size with 2 lava patches
2           9x9 grid size with 3 lava patches
3           11x11 grid size with 5 lava patches
See also
MiniGrid’s Lava Crossing environments: https://minigrid.farama.org/environments/minigrid/CrossingEnv/
- Parameters:
  - difficulty – Difficulty level from 0 to 3, where 0 is the easiest and 3 is the hardest.
  - n_frames_stacked – How many of the most recent states should be stacked together to form the final state representation. Defaults to 1.
  - append_step_count – Whether or not to append the current step count to each state. Defaults to False.
  - random_state – Optional seed that controls the randomness of the environment. Defaults to None.
  - kwargs – Arguments passed down to gymnasium.make.
- Raises:
ValueError – If the specified difficulty level is invalid.
- step_count
Current step count since the last reset.
- Type:
int
- difficulty
Difficulty level. Higher values indicate more difficult environments.
- Type:
int
- n_frames_stacked
How many of the most recent states are stacked together to form the final state representation.
- Type:
int
- append_step_count
Whether or not to append the current step count to each state.
- Type:
bool
- N_ACTIONS: int = 3
Number of available actions.
- STATE_SHAPE: tuple[int, ...]
Shape of the state representation. Can vary for each instance.
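The sketch below illustrates how n_frames_stacked and append_step_count affect the observation. STATE_SHAPE varies per instance, as noted above, so it is printed rather than assumed:

    from academia.environments import LavaCrossing

    # Stack the 2 most recent frames and append the step count; both options
    # change the final state representation, which STATE_SHAPE reflects.
    env = LavaCrossing(difficulty=1, n_frames_stacked=2, append_step_count=True)
    state = env.reset()
    print(env.STATE_SHAPE)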
- class academia.environments.LunarLander(difficulty: int, n_frames_stacked: int = 1, append_step_count: bool = False, random_state: int | None = None, **kwargs)
Bases:
GenericGymnasiumWrapper
This class is a wrapper for Gymnasium’s Lunar Lander environment, which itself is a variant of the classic Lunar Lander game.
The goal is to land a spacecraft on the moon’s surface by controlling its thrusters. The environment has a state size of 8 and 4 possible actions. The difficulty ranges from 0 to 5, with higher values indicating more challenging conditions. The environment can be rendered in different modes.
Possible actions:

Num  Action
0    Do nothing
1    Fire left engine
2    Fire down engine
3    Fire right engine
Difficulty levels:

Difficulty  Description
0           No wind, no turbulence
1           Weak wind, no turbulence
2           Moderate wind, weak turbulence
3           Medium-strong wind, moderate turbulence
4           Strong wind, medium-strong turbulence
5           Very strong wind, strong turbulence
See also
Gymnasium’s Lunar Lander environment: https://gymnasium.farama.org/environments/box2d/lunar_lander/
- Parameters:
  - difficulty – The difficulty level of the environment (0 to 5).
  - n_frames_stacked – How many of the most recent states should be stacked together to form the final state representation. Defaults to 1.
  - append_step_count – Whether or not to append the current step count to each state. Defaults to False.
  - random_state – Optional seed that controls the randomness of the environment. Defaults to None.
  - kwargs – Arguments passed down to gymnasium.make.
- Raises:
ValueError – If the specified difficulty level is invalid.
- step_count
Current step count since the last reset.
- Type:
int
- difficulty
Difficulty level. Higher values indicate more difficult environments.
- Type:
int
- n_frames_stacked
How many of the most recent states are stacked together to form the final state representation.
- Type:
int
- append_step_count
Whether or not to append the current step count to each state.
- Type:
bool
- N_ACTIONS: int = 4
Number of available actions.
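A minimal usage sketch; the documented state size of 8 suggests reset() returns an 8-dimensional array, though this is inferred from the description above rather than verified:

    from academia.environments import LunarLander

    # Difficulty 3: medium-strong wind with moderate turbulence.
    env = LunarLander(difficulty=3, random_state=7)
    state = env.reset()
    print(state.shape)  # expected: (8,), per the documented state size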
- class academia.environments.MsPacman(difficulty: int, n_frames_stacked: int = 1, append_step_count: bool = False, flatten_state: bool = False, skip_game_start: bool = True, random_state: int | None = None, **kwargs)
Bases:
GenericAtariWrapper
This class is a wrapper for Gymnasium’s Ms Pacman environment.
MsPacman is an Atari 2600 environment where the agent has to navigate a maze, eat pellets and avoid ghosts. The higher the difficulty, the more ghosts to avoid.
Possible actions:

Num  Name       Action
0    NOOP       Do nothing
1    UP         Move up
2    RIGHT      Move right
3    DOWN       Move down
4    LEFT       Move left
5    UPRIGHT    Move up-right
6    UPLEFT     Move up-left
7    DOWNRIGHT  Move down-right
8    DOWNLEFT   Move down-left
Difficulty levels:

Difficulty  Description
0           1 ghost is chasing the player
1           2 ghosts are chasing the player
2           3 ghosts are chasing the player
3           4 ghosts are chasing the player
Note
For this environment the keyword argument mode is not used. This is because Ms Pacman did not use the difficulty settings available in Atari, but did use mode settings to control the number of ghosts on the map. Because of this, the difficulty parameter is mapped to mode.
See also
Gymnasium’s Ms Pacman environment: https://www.gymlibrary.dev/environments/atari/ms_pacman/
- Parameters:
  - difficulty – Difficulty level from 0 to 3, where 0 is the easiest and 3 is the hardest.
  - n_frames_stacked – How many of the most recent states should be stacked together to form the final state representation. Defaults to 1.
  - append_step_count – Whether or not to append the current step count to each state. Defaults to False.
  - flatten_state – Whether or not to flatten the state if it is represented by an RGB or grayscale image. If obs_type is set to "ram" this parameter does nothing. Defaults to False.
  - skip_game_start – Whether or not to skip the game start. After every reset the game is in a "noop" state for 65 frames, which can hinder the training process. If true, the game skips this stage by applying 65 NOOP actions before returning the first observed state. Defaults to True.
  - random_state – Optional seed that controls the randomness of the environment. Defaults to None.
  - kwargs – Arguments passed down to gymnasium.make.
- Raises:
ValueError – If the specified difficulty level is invalid.
- step_count
Current step count since the last reset.
- Type:
int
- difficulty
Difficulty level. Higher values indicate more difficult environments.
- Type:
int
- n_frames_stacked
How many of the most recent states are stacked together to form the final state representation.
- Type:
int
- append_step_count
Whether or not to append the current step count to each state.
- Type:
bool
- flatten_state
Whether or not to flatten the state if it is represented by an RGB or grayscale image.
- Type:
bool
- skip_game_start
Whether or not to skip the game start.
- Type:
bool
- N_ACTIONS: int = 9
Number of available actions.
- STATE_SHAPE: tuple[int, ...]
Shape of the state representation. Can vary for each instance.
- reset() ndarray[Any, dtype[float32]]
Resets the environment to its initial state.
- Returns:
The new state after resetting the environment.
Note
If skip_game_start is set to True, this method also performs 65 NOOP actions before returning the first observed state.
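A minimal usage sketch of the Atari-specific options documented above:

    from academia.environments import MsPacman

    # flatten_state flattens image observations into a 1-D array;
    # skip_game_start applies 65 NOOP actions after each reset.
    env = MsPacman(difficulty=0, flatten_state=True, skip_game_start=True)
    state = env.reset()
    state, reward, done = env.step(0)  # 0 = NOOP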