academia.agents.base module

Module contents

Base classes for all reinforcement learning algorithms available in this package. All user-defined algorithms should inherit from one of these classes.

Exported classes:
  • Agent

  • EpsilonGreedyAgent

  • TabularAgent

class academia.agents.base.Agent(n_actions: int, gamma: float = 0.99, random_state: int | None = None)

Bases: SavableLoadable

Agent class represents a generic reinforcement learning agent.

This class serves as the base class for various reinforcement learning agents. It defines common attributes and methods necessary for interacting with environments.

Parameters:
  • n_actions – Number of possible actions in the environment.

  • gamma – Discount factor. Defaults to 0.99.

  • random_state – Seed for the random number generator. Defaults to None.

n_actions

Number of possible actions in the environment.

Type:

int

gamma

Discount factor.

Type:

float

abstract get_action(state: Any, legal_mask: ndarray[Any, dtype[int32]] | None = None, greedy: bool = False) → int

Gets an action for the given state.

Parameters:
  • state – Current state in the environment.

  • legal_mask – A mask representing legal actions in the current state.

  • greedy – Whether to choose the greedy action. Defaults to False.

Returns:

Action to be taken in the given state.

abstract reset_exploration(value)

Resets the exploration parameter to the specified value.

Parameters:

value – Value to reset the parameter to.

abstract update(state: Any, action: int, reward: float, new_state: Any, is_terminal: bool)

Updates the agent’s knowledge based on the observed reward and new state.

Parameters:
  • state – Current state in the environment.

  • action – Action taken in the current state.

  • reward – Reward received after taking the action.

  • new_state – New state observed after taking the action.

  • is_terminal – Whether the new state is a terminal state or not.

abstract update_exploration()

Updates the exploration parameter.
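
The four abstract methods above form the contract a subclass has to fulfil. A minimal sketch of such a subclass follows. So that it runs without the package installed, it defines a local stand-in for Agent that mirrors the documented signatures; the stand-in, the RandomAgent class, and the _rng attribute are hypothetical and not part of academia. It also assumes that a nonzero entry in legal_mask marks a legal action, which the documentation does not state explicitly.

```python
import random
from abc import ABC, abstractmethod
from typing import Any, Optional, Sequence


class Agent(ABC):
    """Local stand-in mirroring the documented interface (not the real class)."""

    def __init__(self, n_actions: int, gamma: float = 0.99,
                 random_state: Optional[int] = None):
        self.n_actions = n_actions
        self.gamma = gamma
        # Hypothetical attribute: a seeded generator for reproducibility.
        self._rng = random.Random(random_state)

    @abstractmethod
    def get_action(self, state: Any,
                   legal_mask: Optional[Sequence[int]] = None,
                   greedy: bool = False) -> int: ...

    @abstractmethod
    def update(self, state: Any, action: int, reward: float,
               new_state: Any, is_terminal: bool) -> None: ...

    @abstractmethod
    def reset_exploration(self, value) -> None: ...

    @abstractmethod
    def update_exploration(self) -> None: ...


class RandomAgent(Agent):
    """Hypothetical agent that picks uniformly among the legal actions."""

    def get_action(self, state, legal_mask=None, greedy=False):
        if legal_mask is None:
            candidates = list(range(self.n_actions))
        else:
            # Assumption: nonzero mask entries mark legal actions.
            candidates = [a for a, ok in enumerate(legal_mask) if ok]
        return self._rng.choice(candidates)

    def update(self, state, action, reward, new_state, is_terminal):
        pass  # a random policy has nothing to learn

    def reset_exploration(self, value):
        pass  # no exploration parameter to reset

    def update_exploration(self):
        pass  # no exploration parameter to decay


agent = RandomAgent(n_actions=4, random_state=0)
action = agent.get_action(state=None, legal_mask=[0, 1, 0, 1])
```

With the real package, the subclass would inherit from academia.agents.base.Agent instead of the stand-in; a subclass that leaves any abstract method unimplemented cannot be instantiated.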

class academia.agents.base.EpsilonGreedyAgent(n_actions: int, epsilon: float = 1.0, epsilon_decay: float = 0.999, min_epsilon: float = 0.01, gamma: float = 0.99, random_state: int | None = None)

Bases: Agent

A base class for all epsilon-greedy reinforcement learning algorithms.

Parameters:
  • n_actions – Number of possible actions in the environment.

  • gamma – Discount factor. Defaults to 0.99.

  • random_state – Seed for the random number generator. Defaults to None.

  • epsilon – Exploration-exploitation trade-off parameter. Defaults to 1.0.

  • min_epsilon – Minimum value for epsilon during exploration. Defaults to 0.01.

  • epsilon_decay – Decay rate for epsilon. Defaults to 0.999.

n_actions

Number of possible actions in the environment.

Type:

int

gamma

Discount factor.

Type:

float

epsilon

Exploration-exploitation trade-off parameter.

Type:

float

min_epsilon

Minimum value for epsilon during exploration.

Type:

float

epsilon_decay

Decay rate for epsilon.

Type:

float

reset_exploration(value=1)

Resets the exploration parameter epsilon to the specified value.

Parameters:

value – Value to reset epsilon to. Defaults to 1.

update_exploration()

Decays the exploration parameter epsilon based on epsilon_decay.
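
The exact decay rule is not spelled out here. A common choice, shown below as an assumption rather than the package's confirmed behaviour, is multiplicative decay clamped at min_epsilon. The function name decay_epsilon is hypothetical.

```python
def decay_epsilon(epsilon: float, epsilon_decay: float = 0.999,
                  min_epsilon: float = 0.01) -> float:
    """Return the next epsilon: multiply by the decay rate, never below the floor."""
    return max(min_epsilon, epsilon * epsilon_decay)


# Count how many updates it takes to fall from 1.0 to the floor
# under the assumed rule and the documented default values.
epsilon = 1.0
steps = 0
while epsilon > 0.01:
    epsilon = decay_epsilon(epsilon)
    steps += 1
```

Under this assumed rule and the default values, epsilon reaches its floor after roughly 4,600 updates, so exploration persists for several thousand calls to update_exploration().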

class academia.agents.base.TabularAgent(n_actions, alpha=0.1, gamma=0.99, epsilon=1, epsilon_decay=0.999, min_epsilon=0.01, random_state: int | None = None)

Bases: EpsilonGreedyAgent

TabularAgent class implements a reinforcement learning agent for simple environments where a Q-table can be effectively used.

This class serves as the base class for tabular agents such as academia.agents.QLAgent and academia.agents.SarsaAgent. The agent learns to make decisions in an environment with discrete states and actions by maintaining a Q-table, which stores the estimated quality of taking a given action in a given state.

Parameters:
  • n_actions – Number of possible actions in the environment.

  • alpha – Learning rate. Defaults to 0.1.

  • gamma – Discount factor. Defaults to 0.99.

  • epsilon – Exploration-exploitation trade-off parameter. Defaults to 1.

  • epsilon_decay – Decay rate for epsilon. Defaults to 0.999.

  • min_epsilon – Minimum value for epsilon during exploration. Defaults to 0.01.

  • random_state – Seed for the random number generator. Defaults to None.

Raises:

ValueError – If the given state is not supported.

epsilon

Exploration-exploitation trade-off parameter.

Type:

float

min_epsilon

Minimum value for epsilon during exploration.

Type:

float

epsilon_decay

Decay rate for epsilon.

Type:

float

n_actions

Number of possible actions in the environment.

Type:

int

gamma

Discount factor.

Type:

float

alpha

Learning rate.

Type:

float

q_table

Q-table for the agent.

Type:

dict
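
To illustrate how a dict-based Q-table can be combined with alpha and gamma, here is a sketch of the standard Q-learning update. The base class does not prescribe an update rule, so this shows one possible subclass behaviour, not the package's implementation. The function name, the string state keys, and the list-of-floats row layout are all assumptions.

```python
from collections import defaultdict


def q_learning_update(q_table, state, action, reward, new_state,
                      is_terminal, alpha=0.1, gamma=0.99):
    """Apply one Q-learning step to q_table in place."""
    # A terminal state has no future value to bootstrap from.
    future = 0.0 if is_terminal else max(q_table[new_state])
    td_target = reward + gamma * future
    q_table[state][action] += alpha * (td_target - q_table[state][action])


n_actions = 2
# Unseen states start with a row of zeros (assumed initialisation).
q_table = defaultdict(lambda: [0.0] * n_actions)
q_learning_update(q_table, "s0", action=1, reward=1.0,
                  new_state="s1", is_terminal=False)
```

After this single step, the entry for action 1 in state "s0" has moved a fraction alpha of the way from 0 towards the target reward of 1.0.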

get_action(state: Any, legal_mask: ndarray[Any, dtype[int32]] | None = None, greedy: bool = False) → int

Gets an action for the given state using epsilon-greedy policy.

Parameters:
  • state – Current state in the environment.

  • legal_mask – A mask representing legal actions in the current state.

  • greedy – Whether to choose the greedy action. Defaults to False.

Returns:

Action to be taken in the given state.

Return type:

int
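
A minimal sketch of epsilon-greedy selection with a legality mask, written in plain Python so it runs without the package. The function epsilon_greedy and its arguments are hypothetical; it assumes nonzero mask entries mark legal actions and that greedy=True disables exploration entirely.

```python
import random
from typing import Optional, Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float,
                   rng: random.Random,
                   legal_mask: Optional[Sequence[int]] = None,
                   greedy: bool = False) -> int:
    """Pick an action index from q_values, restricted to legal actions."""
    legal = [a for a in range(len(q_values))
             if legal_mask is None or legal_mask[a]]
    # Explore with probability epsilon unless a greedy choice was requested.
    if not greedy and rng.random() < epsilon:
        return rng.choice(legal)
    # Exploit: highest Q-value among the legal actions only.
    return max(legal, key=lambda a: q_values[a])


rng = random.Random(0)
best_legal = epsilon_greedy([0.1, 0.9, 0.5], epsilon=1.0, rng=rng,
                            legal_mask=[1, 0, 1], greedy=True)
```

Here action 1 has the highest Q-value but is masked out, so the greedy choice falls to action 2.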

classmethod load(path: str) → TabularAgent

Loads the agent’s state from a JSON file.

Parameters:

path – Path to the JSON file.

Returns:

A loaded agent with the saved state.

save(path: str) → str

Saves the agent’s state to a JSON file.

Parameters:

path – Path to save the JSON file.

Returns:

An absolute path to the saved file.
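
The layout of the saved file is not documented here. The sketch below only illustrates the general idea of a JSON round trip for a dict-based Q-table, including returning an absolute path as save() does. The helper names and the file layout are assumptions. Because JSON object keys are always strings, states must be encoded as strings before saving.

```python
import json
import os
import tempfile


def save_q_table(q_table: dict, path: str) -> str:
    """Write the Q-table as JSON and return the absolute path of the file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(q_table, f)
    return os.path.abspath(path)


def load_q_table(path: str) -> dict:
    """Read a Q-table previously written by save_q_table."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Round trip through a temporary directory (hypothetical file name).
with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_q_table({"s0": [0.0, 0.1]},
                              os.path.join(tmp, "agent.json"))
    restored = load_q_table(saved_path)
```

With the real class, the equivalent calls would be agent.save(path) followed by TabularAgent.load(path).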