academia.environments module
Module contents
This module contains environments that agents can be trained on.
Most of these environments are wrappers for environments from the MiniGrid or Gymnasium packages (BridgeBuilding is the exception). The main purpose of these wrappers is to make it easy to scale an environment's difficulty by adjusting a single difficulty parameter. This scalability makes the environments suitable for Curriculum Learning.
Exported classes: BridgeBuilding, DoorKey, LavaCrossing, LunarLander, MsPacman
Note
If you wish to use your own environment, please refer to Using your own environments.
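As a quick orientation, the sketch below shows the pattern these wrappers share: the same class is instantiated at increasing values of the single difficulty parameter, which is what makes these environments convenient for Curriculum Learning. This is a minimal sketch assuming the step() return signature documented for the classes below (state, reward, done):

    from academia.environments import LavaCrossing

    # The same task gets progressively harder by changing only `difficulty`.
    # LavaCrossing supports difficulty levels 0-3 (see its documentation below).
    for difficulty in range(4):
        env = LavaCrossing(difficulty=difficulty, random_state=42)
        state = env.reset()
        state, reward, done = env.step(2)  # 2 = "forward"
        print(difficulty, reward, done)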
- class academia.environments.BridgeBuilding(difficulty: int, river_width: int = 2, max_steps: int = 100, render_mode: Literal['human'] | None = None, obs_type: Literal['string', 'array'] = 'array', reward_density: Literal['sparse', 'dense'] = 'sparse', n_frames_stacked: int = 1, append_step_count: bool = False, random_state: int | None = None)
Bases:
ScalableEnvironment
A grid environment where an agent has to use boulders scattered around the map to build a bridge across a river and get from one side to the other. The higher the difficulty, the more work the agent has to put into building the bridge. At lower difficulties the bridge is partly (or even fully) built and the agent only has to learn how to finish it and/or navigate across it.
The reward system is presented in the table below. Note that the last two rewards can only be obtained if reward_density is set to "dense":

Event                              Reward
Running out of time                0
Drowning in the river              0
Reaching the goal                  1 - step_count/max_steps + bridge_length/river_width
(Dense) Constructing the bridge    0.5 * (length_after_step - length_before_step)
(Dense) Deconstructing the bridge  0.5 * (length_after_step - length_before_step)

The main reward function (reaching the goal) is meant to mimic MiniGrid's reward function, but its last component also forces the agent to fully build the bridge.
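For concreteness, the goal reward from the table above could be computed as in the sketch below. This is an illustrative reconstruction of the formula, not the package's internal code; the parameter names mirror the attributes documented for this class:

    def goal_reward(step_count: int, max_steps: int,
                    bridge_length: int, river_width: int) -> float:
        # 1 - step_count/max_steps mimics MiniGrid's time-discounted reward;
        # bridge_length/river_width additionally rewards a fully built bridge.
        return 1 - step_count / max_steps + bridge_length / river_width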
Possible actions:

Num  Name         Action
0    left         Turn left
1    right        Turn right
2    forward      Move forward
3    pickup/drop  Pick up/drop a boulder
Difficulty levels:

Difficulty  Description
n           The bridge is missing n boulders
- Parameters:
  - difficulty – Difficulty level from 0 to river_width, where 0 is the easiest and river_width is the hardest.
  - river_width – The width of the river.
  - max_steps – The maximum number of steps an agent can spend in the environment. If the agent doesn't reach the goal in that time, the episode terminates. Defaults to 100.
  - render_mode – How the environment should be rendered. If set to "human", the environment will be rendered in a way interpretable by a human. Defaults to None.
  - obs_type – How the state should be observed. If "string", a string representing the state will be returned. If "array", an array representing the state will be returned. Defaults to "array".
  - reward_density – The density of the reward function. Possible values are "sparse" and "dense". If "sparse" is passed, the agent will only get the reward at the end of the episode. If "dense" is passed, the agent will additionally obtain rewards (and penalties) for constructing (or deconstructing) parts of the bridge. Defaults to "sparse".
  - n_frames_stacked – How many of the most recent states should be stacked together to form the final state representation. Defaults to 1.
  - append_step_count – Whether or not to append the current step count to each state. Defaults to False.
  - random_state – Optional seed that controls the randomness of the environment. Defaults to None.
- Raises:
ValueError – If the specified river width is invalid.
ValueError – If the specified difficulty level is invalid.
- step_count
Current step count since the last reset.
- Type:
int
- difficulty
Difficulty level. Higher values indicate more difficult environments.
- Type:
int
- river_width
The width of the river.
- Type:
int
- n_frames_stacked
How many of the most recent states are stacked together to form the final state representation.
- Type:
int
- append_step_count
Whether or not to append the current step count to each state.
- Type:
bool
- max_steps
The maximum number of steps an agent can spend in the environment.
- Type:
int
- render_mode
How the environment should be rendered.
- Type:
Optional[Literal["human"]]
- obs_type
How the state is observed.
- Type:
Literal["string", "array"]
- reward_density
The density of the reward function.
- Type:
Literal["sparse", "dense"]
- N_ACTIONS: int = 4
Number of available actions.
- STATE_SHAPE: tuple[int, ...]
Shape of the state representation. Can vary for each instance.
- get_legal_mask() ndarray[Any, dtype[int32]]
Returns a binary legal action mask.
- Returns:
A binary mask with 0s in place for illegal actions (actions that have no effect) and 1s for legal actions.
- observe() str | ndarray[Any, dtype[float32]]
Returns the current state of the environment. Performs state stacking if n_frames_stacked is greater than 1.
- Returns:
The current state of the environment.
- render() None
Renders the environment in the current render mode.
- reset() str | ndarray[Any, dtype[float32]]
Resets the environment to its initial state.
- Returns:
The new state after resetting the environment.
- step(action) tuple[str | ndarray[Any, dtype[float32]], float, bool]
Advances the environment by one step given the specified action.
- Parameters:
action – The action to take.
- Returns:
A tuple containing the new state, reward, and a flag indicating episode termination.
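As a usage illustration, the sketch below runs one random-action episode while respecting the legal action mask. It assumes only the constructor, get_legal_mask(), and step() signatures documented above:

    import numpy as np
    from academia.environments import BridgeBuilding

    # Difficulty 1: the bridge is missing one boulder.
    env = BridgeBuilding(difficulty=1, reward_density='dense', random_state=0)
    state = env.reset()
    done = False
    while not done:
        # Sample uniformly among the currently legal actions (mask value 1).
        legal_mask = env.get_legal_mask()
        action = np.random.choice(np.flatnonzero(legal_mask))
        state, reward, done = env.step(action)
    print(f'Episode ended after {env.step_count} steps')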
- class academia.environments.DoorKey(difficulty: int, n_frames_stacked: int = 1, append_step_count: bool = False, random_state: int | None = None, **kwargs)
Bases:
GenericMiniGridWrapper
This class is a wrapper for MiniGrid’s Door Key environments.
DoorKey is a grid environment where an agent has to find a key and then open a door to reach the destination. The higher the difficulty, the bigger the grid, making it harder to find the key, then the door, and then the destination.
Possible actions:

Num  Name     Action
0    left     Turn left
1    right    Turn right
2    forward  Move forward
3    pickup   Pick up an object
4    toggle   Toggle/activate an object
Difficulty levels:

Difficulty  Description
0           5x5 grid size with 1 key and 1 door
1           6x6 grid size with 1 key and 1 door
2           8x8 grid size with 1 key and 1 door
3           16x16 grid size with 1 key and 1 door
See also
MiniGrid’s Door Key environments: https://minigrid.farama.org/environments/minigrid/DoorKeyEnv/
- Parameters:
  - difficulty – Difficulty level from 0 to 3, where 0 is the easiest and 3 is the hardest.
  - n_frames_stacked – How many of the most recent states should be stacked together to form the final state representation. Defaults to 1.
  - append_step_count – Whether or not to append the current step count to each state. Defaults to False.
  - random_state – Optional seed that controls the randomness of the environment. Defaults to None.
  - kwargs – Arguments passed down to gymnasium.make.
- Raises:
ValueError – If the specified difficulty level is invalid.
- step_count
Current step count since the last reset.
- Type:
int
- difficulty
Difficulty level. Higher values indicate more difficult environments.
- Type:
int
- n_frames_stacked
How many of the most recent states are stacked together to form the final state representation.
- Type:
int
- append_step_count
Whether or not to append the current step count to each state.
- Type:
bool
- N_ACTIONS: int = 5
Number of available actions.
- STATE_SHAPE: tuple[int, ...]
Shape of the state representation. Can vary for each instance.
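Since extra keyword arguments are passed down to gymnasium.make, standard Gymnasium options such as render_mode can be forwarded through the wrapper. A minimal sketch (the step() return signature is assumed to match the one documented for BridgeBuilding above):

    from academia.environments import DoorKey

    # `render_mode` is not a named parameter of DoorKey; it is forwarded
    # to gymnasium.make via **kwargs.
    env = DoorKey(difficulty=2, render_mode='human')
    state = env.reset()
    state, reward, done = env.step(2)  # 2 = "forward"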
- class academia.environments.LavaCrossing(difficulty: int, n_frames_stacked: int = 1, append_step_count: bool = False, random_state: int | None = None, **kwargs)
Bases:
GenericMiniGridWrapper
This class is a wrapper for MiniGrid’s Lava Crossing environments.
A grid environment where an agent has to avoid patches of lava in order to reach the destination. The higher the difficulty, the more lava patches are generated on the grid.
Possible actions:

Num  Name     Action
0    left     Turn left
1    right    Turn right
2    forward  Move forward
Difficulty levels:

Difficulty  Description
0           9x9 grid size with 1 lava patch
1           9x9 grid size with 2 lava patches
2           9x9 grid size with 3 lava patches
3           11x11 grid size with 5 lava patches
See also
MiniGrid’s Lava Crossing environments: https://minigrid.farama.org/environments/minigrid/CrossingEnv/
- Parameters:
  - difficulty – Difficulty level from 0 to 3, where 0 is the easiest and 3 is the hardest.
  - n_frames_stacked – How many of the most recent states should be stacked together to form the final state representation. Defaults to 1.
  - append_step_count – Whether or not to append the current step count to each state. Defaults to False.
  - random_state – Optional seed that controls the randomness of the environment. Defaults to None.
  - kwargs – Arguments passed down to gymnasium.make.
- Raises:
ValueError – If the specified difficulty level is invalid.
- step_count
Current step count since the last reset.
- Type:
int
- difficulty
Difficulty level. Higher values indicate more difficult environments.
- Type:
int
- n_frames_stacked
How many of the most recent states are stacked together to form the final state representation.
- Type:
int
- append_step_count
Whether or not to append the current step count to each state.
- Type:
bool
- N_ACTIONS: int = 3
Number of available actions.
- STATE_SHAPE: tuple[int, ...]
Shape of the state representation. Can vary for each instance.
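The sketch below illustrates how n_frames_stacked and append_step_count affect the observation. STATE_SHAPE varies per instance, as noted above, so it is printed rather than assumed:

    from academia.environments import LavaCrossing

    # Stack the 2 most recent frames and append the step count; both options
    # change the final state representation, which STATE_SHAPE reflects.
    env = LavaCrossing(difficulty=1, n_frames_stacked=2, append_step_count=True)
    state = env.reset()
    print(env.STATE_SHAPE)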
- class academia.environments.LunarLander(difficulty: int, n_frames_stacked: int = 1, append_step_count: bool = False, random_state: int | None = None, **kwargs)
Bases:
GenericGymnasiumWrapper
This class is a wrapper for Gymnasium’s Lunar Lander environment, which itself is a variant of the classic Lunar Lander game.
The goal is to land a spacecraft on the moon’s surface by controlling its thrusters. The environment has a state size of 8 and 4 possible actions. The difficulty ranges from 0 to 5, with higher values indicating more challenging conditions. The environment can be rendered in different modes.
Possible actions:

Num  Action
0    Do nothing
1    Fire left engine
2    Fire down engine
3    Fire right engine
Difficulty levels:

Difficulty  Description
0           No wind, no turbulence
1           Weak wind, no turbulence
2           Moderate wind, weak turbulence
3           Medium-strong wind, moderate turbulence
4           Strong wind, medium-strong turbulence
5           Very strong wind, strong turbulence
See also
Gymnasium’s Lunar Lander environment: https://gymnasium.farama.org/environments/box2d/lunar_lander/
- Parameters:
  - difficulty – The difficulty level of the environment (0 to 5).
  - n_frames_stacked – How many of the most recent states should be stacked together to form the final state representation. Defaults to 1.
  - append_step_count – Whether or not to append the current step count to each state. Defaults to False.
  - random_state – Optional seed that controls the randomness of the environment. Defaults to None.
  - kwargs – Arguments passed down to gymnasium.make.
- Raises:
ValueError – If the specified difficulty level is invalid.
- step_count
Current step count since the last reset.
- Type:
int
- difficulty
Difficulty level. Higher values indicate more difficult environments.
- Type:
int
- n_frames_stacked
How many of the most recent states are stacked together to form the final state representation.
- Type:
int
- append_step_count
Whether or not to append the current step count to each state.
- Type:
bool
- N_ACTIONS: int = 4
Number of available actions.
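A minimal usage sketch; the documented state size of 8 suggests reset() returns an 8-dimensional array, though this is inferred from the description above rather than verified:

    from academia.environments import LunarLander

    # Difficulty 3: medium-strong wind with moderate turbulence.
    env = LunarLander(difficulty=3, random_state=7)
    state = env.reset()
    print(state.shape)  # expected: (8,), per the documented state size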
- class academia.environments.MsPacman(difficulty: int, n_frames_stacked: int = 1, append_step_count: bool = False, flatten_state: bool = False, skip_game_start: bool = True, random_state: int | None = None, **kwargs)
Bases:
GenericAtariWrapper
This class is a wrapper for Gymnasium’s Ms Pacman environment.
MsPacman is an Atari 2600 environment where the agent has to navigate a maze, eat pellets and avoid ghosts. The higher the difficulty, the more ghosts to avoid.
Possible actions:

Num  Name       Action
0    NOOP       Do nothing
1    UP         Move up
2    RIGHT      Move right
3    DOWN       Move down
4    LEFT       Move left
5    UPRIGHT    Move up-right
6    UPLEFT     Move up-left
7    DOWNRIGHT  Move down-right
8    DOWNLEFT   Move down-left
Difficulty levels:

Difficulty  Description
0           1 ghost is chasing the player
1           2 ghosts are chasing the player
2           3 ghosts are chasing the player
3           4 ghosts are chasing the player
Note
For this environment the keyword argument mode is not used. This is because Ms Pacman did not use the difficulty settings available in Atari, but did use mode settings to control the number of ghosts on the map. Because of this, the difficulty parameter is mapped to mode.
See also
Gymnasium’s Ms Pacman environment: https://www.gymlibrary.dev/environments/atari/ms_pacman/
- Parameters:
  - difficulty – Difficulty level from 0 to 3, where 0 is the easiest and 3 is the hardest.
  - n_frames_stacked – How many of the most recent states should be stacked together to form the final state representation. Defaults to 1.
  - append_step_count – Whether or not to append the current step count to each state. Defaults to False.
  - flatten_state – Whether or not to flatten the state if it is represented by an RGB or grayscale image. If obs_type is set to "ram" this parameter does nothing. Defaults to False.
  - skip_game_start – Whether or not to skip the game start. After every reset the game is in a "noop" state for 65 frames, which can hinder the training process. If true, the game skips this stage by applying 65 NOOP actions before returning the first observed state. Defaults to True.
  - random_state – Optional seed that controls the randomness of the environment. Defaults to None.
  - kwargs – Arguments passed down to gymnasium.make.
- Raises:
ValueError – If the specified difficulty level is invalid.
- step_count
Current step count since the last reset.
- Type:
int
- difficulty
Difficulty level. Higher values indicate more difficult environments.
- Type:
int
- n_frames_stacked
How many of the most recent states are stacked together to form the final state representation.
- Type:
int
- append_step_count
Whether or not to append the current step count to each state.
- Type:
bool
- flatten_state
Whether or not to flatten the state if it is represented by an RGB or grayscale image.
- Type:
bool
- skip_game_start
Whether or not to skip the game start.
- Type:
bool
- N_ACTIONS: int = 9
Number of available actions.
- STATE_SHAPE: tuple[int, ...]
Shape of the state representation. Can vary for each instance.
- reset() ndarray[Any, dtype[float32]]
Resets the environment to its initial state.
- Returns:
The new state after resetting the environment.
Note
If skip_game_start is set to True, this method also performs 65 NOOP actions before returning the first observed state.
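A minimal usage sketch of the Atari-specific options documented above:

    from academia.environments import MsPacman

    # flatten_state flattens image observations into a 1-D array;
    # skip_game_start applies 65 NOOP actions after each reset.
    env = MsPacman(difficulty=0, flatten_state=True, skip_game_start=True)
    state = env.reset()
    state, reward, done = env.step(0)  # 0 = NOOP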