academia.agents.base module
Module contents
Base classes for all reinforcement learning algorithms available in this package. All user-defined algorithms should inherit from one of these classes.
Exported classes:
- class academia.agents.base.Agent(n_actions: int, gamma: float = 0.99, random_state: int | None = None)
Bases:
SavableLoadable
Agent class represents a generic reinforcement learning agent.
This class serves as the base class for various reinforcement learning agents. It defines common attributes and methods necessary for interacting with environments.
- Parameters:
n_actions – Number of possible actions in the environment.
gamma – Discount factor. Defaults to 0.99.
random_state – Seed for the random number generator. Defaults to None.
- n_actions
Number of possible actions in the environment.
- Type:
int
- gamma
Discount factor.
- Type:
float
- abstract get_action(state: Any, legal_mask: ndarray[Any, dtype[int32]] | None = None, greedy: bool = False) → int
Gets an action for the given state.
- Parameters:
state – Current state in the environment.
legal_mask – A mask representing legal actions in the current state.
greedy – Whether to choose the greedy action. Defaults to False.
- Returns:
Action to be taken in the given state.
- abstract reset_exploration(value)
Resets the exploration parameter to the specified value.
- Parameters:
value – Value to reset the parameter to.
- abstract update(state: Any, action: int, reward: float, new_state: Any, is_terminal: bool)
Updates the agent’s knowledge based on the observed reward and new state.
- Parameters:
state – Current state in the environment.
action – Action taken in the current state.
reward – Reward received after taking the action.
new_state – New state observed after taking the action.
is_terminal – Whether the new state is a terminal state or not.
- abstract update_exploration()
Updates the exploration parameter.
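As an illustration of what inheriting from this interface involves, the following minimal sketch implements the four abstract methods on a trivial random agent. It does not import the package: AgentSketch is a hypothetical stand-in that only mirrors the signatures documented above, the _rng attribute is an assumption, and the legal mask is a plain list rather than an ndarray.

```python
import random
from abc import ABC, abstractmethod
from typing import Any, Optional, Sequence


class AgentSketch(ABC):
    """Hypothetical stand-in mirroring the documented interface; NOT the library class."""

    def __init__(self, n_actions: int, gamma: float = 0.99,
                 random_state: Optional[int] = None):
        self.n_actions = n_actions
        self.gamma = gamma
        self._rng = random.Random(random_state)  # assumed helper attribute

    @abstractmethod
    def get_action(self, state: Any, legal_mask: Optional[Sequence[int]] = None,
                   greedy: bool = False) -> int: ...

    @abstractmethod
    def update(self, state: Any, action: int, reward: float,
               new_state: Any, is_terminal: bool) -> None: ...

    @abstractmethod
    def reset_exploration(self, value) -> None: ...

    @abstractmethod
    def update_exploration(self) -> None: ...


class RandomAgent(AgentSketch):
    """Picks uniformly among legal actions and learns nothing."""

    def get_action(self, state, legal_mask=None, greedy=False):
        if legal_mask is None:
            candidates = list(range(self.n_actions))
        else:
            # Keep only the actions whose mask entry is non-zero.
            candidates = [a for a in range(self.n_actions) if legal_mask[a]]
        return self._rng.choice(candidates)

    def update(self, state, action, reward, new_state, is_terminal):
        pass  # a random policy has nothing to learn

    def reset_exploration(self, value):
        pass  # no exploration parameter to reset

    def update_exploration(self):
        pass  # no exploration parameter to decay


agent = RandomAgent(n_actions=4, random_state=0)
action = agent.get_action(state=None, legal_mask=[0, 1, 0, 1])
```

Because every abstract method is overridden, RandomAgent can be instantiated, whereas instantiating the stand-in base directly raises TypeError.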
- class academia.agents.base.EpsilonGreedyAgent(n_actions: int, epsilon: float = 1.0, epsilon_decay: float = 0.999, min_epsilon: float = 0.01, gamma: float = 0.99, random_state: int | None = None)
Bases:
Agent
A base class for all epsilon-greedy reinforcement learning algorithms.
- Parameters:
n_actions – Number of possible actions in the environment.
gamma – Discount factor. Defaults to 0.99.
random_state – Seed for the random number generator. Defaults to None.
epsilon – Exploration-exploitation trade-off parameter. Defaults to 1.
min_epsilon – Minimum value for epsilon during exploration. Defaults to 0.01.
epsilon_decay – Decay rate for epsilon. Defaults to 0.999.
- n_actions
Number of possible actions in the environment.
- Type:
int
- gamma
Discount factor.
- Type:
float
- epsilon
Exploration-exploitation trade-off parameter.
- Type:
float
- min_epsilon
Minimum value for epsilon during exploration.
- Type:
float
- epsilon_decay
Decay rate for epsilon.
- Type:
float
- reset_exploration(value=1)
Resets the exploration parameter epsilon to the specified value.
- Parameters:
value – Value to reset epsilon to. Defaults to 1.
- update_exploration()
Decays the exploration parameter epsilon based on epsilon_decay.
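The exact decay rule is not stated here. A common choice, assumed in the sketch below, is multiplicative decay clamped at min_epsilon. The function decay_epsilon is a hypothetical helper written for illustration and is not part of the package.

```python
def decay_epsilon(epsilon: float, epsilon_decay: float = 0.999,
                  min_epsilon: float = 0.01) -> float:
    """One assumed decay step: multiply by the decay rate, then clamp at the floor."""
    return max(min_epsilon, epsilon * epsilon_decay)


epsilon = 1.0
for _ in range(10_000):
    epsilon = decay_epsilon(epsilon)
# epsilon now sits at the min_epsilon floor
```

Under this assumption and the default values, epsilon would reach the 0.01 floor after roughly 4,600 updates, since 0.999 raised to that power is about 0.01.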
- class academia.agents.base.TabularAgent(n_actions, alpha=0.1, gamma=0.99, epsilon=1, epsilon_decay=0.999, min_epsilon=0.01, random_state: int | None = None)
Bases:
EpsilonGreedyAgent
TabularAgent class implements a reinforcement learning agent for simple environments where a Q-table can be effectively used.
This class serves as the base class for tabular agents such as academia.agents.QLAgent and academia.agents.SarsaAgent. This agent learns to make decisions in an environment with discrete states and actions by maintaining a Q-table, which represents the quality of taking a certain action in a specific state.
- Parameters:
n_actions – Number of possible actions in the environment.
alpha – Learning rate. Defaults to 0.1.
gamma – Discount factor. Defaults to 0.99.
epsilon – Exploration-exploitation trade-off parameter. Defaults to 1.
epsilon_decay – Decay rate for epsilon. Defaults to 0.999.
min_epsilon – Minimum value for epsilon during exploration. Defaults to 0.01.
random_state – Seed for the random number generator. Defaults to None.
- Raises:
ValueError – If the given state is not supported.
- epsilon
Exploration-exploitation trade-off parameter.
- Type:
float
- min_epsilon
Minimum value for epsilon during exploration.
- Type:
float
- epsilon_decay
Decay rate for epsilon.
- Type:
float
- n_actions
Number of possible actions in the environment.
- Type:
int
- gamma
Discount factor.
- Type:
float
- alpha
Learning rate.
- Type:
float
- q_table
Q-table for the agent.
- Type:
dict
- get_action(state: Any, legal_mask: ndarray[Any, dtype[int32]] | None = None, greedy: bool = False) → int
Gets an action for the given state using epsilon-greedy policy.
- Parameters:
state – Current state in the environment.
legal_mask – A mask representing legal actions in the current state.
greedy – Whether to choose the greedy action. Defaults to False.
- Returns:
Action to be taken in the given state.
- Return type:
int
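One way an epsilon-greedy choice with a legal-action mask could work over a dict-based Q-table is sketched below. This is not the package's implementation: the function epsilon_greedy_action, the list-of-floats row layout, the zero-valued default for unseen states, and the plain-list mask are all assumptions made for illustration.

```python
import random
from typing import Dict, Hashable, List, Optional, Sequence


def epsilon_greedy_action(
    q_table: Dict[Hashable, List[float]],
    state: Hashable,
    n_actions: int,
    epsilon: float,
    rng: random.Random,
    legal_mask: Optional[Sequence[int]] = None,
    greedy: bool = False,
) -> int:
    # Actions allowed in this state; with no mask, every action is legal.
    if legal_mask is None:
        legal = list(range(n_actions))
    else:
        legal = [a for a in range(n_actions) if legal_mask[a]]

    # Explore with probability epsilon unless a greedy choice was requested.
    if not greedy and rng.random() < epsilon:
        return rng.choice(legal)

    # Exploit: unseen states are treated as all-zero rows (an assumption).
    q_values = q_table.get(state, [0.0] * n_actions)
    return max(legal, key=lambda a: q_values[a])


q_table = {"s0": [0.2, 0.9, 0.5]}
rng = random.Random(0)
# Action 1 has the highest value but is masked out, so the greedy pick is 2.
best_legal = epsilon_greedy_action(q_table, "s0", 3, epsilon=1.0, rng=rng,
                                   legal_mask=[1, 0, 1], greedy=True)
```

Filtering by the mask before both the exploratory and the greedy branch keeps illegal actions from being returned in either case.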
- classmethod load(path: str) → TabularAgent
Loads the agent’s state from a JSON file.
- Parameters:
path – Path to the JSON file.
- Returns:
A loaded agent with the saved state.
- save(path: str) → str
Saves the agent’s state to a JSON file.
- Parameters:
path – Path to save the JSON file.
- Returns:
An absolute path to the saved file.
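The layout of the saved JSON file is not described here. The sketch below only illustrates the general idea of a save/load round trip for a dict-based Q-table. The helpers save_q_table and load_q_table and the "params"/"q_table" keys are hypothetical and may differ from what the package writes.

```python
import json
import os
import tempfile


def save_q_table(path: str, q_table: dict, params: dict) -> str:
    """Write a hypothetical payload to JSON and return the absolute path."""
    payload = {"params": params, "q_table": q_table}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f)
    return os.path.abspath(path)


def load_q_table(path: str):
    """Read the hypothetical payload back into a Q-table and its parameters."""
    with open(path, "r", encoding="utf-8") as f:
        payload = json.load(f)
    return payload["q_table"], payload["params"]


with tempfile.TemporaryDirectory() as tmp:
    saved = save_q_table(os.path.join(tmp, "agent.json"),
                         {"s0": [0.2, 0.9]}, {"alpha": 0.1, "gamma": 0.99})
    q_table, params = load_q_table(saved)
```

JSON object keys are always strings, so a Q-table keyed by non-string states would need its keys encoded to strings on save and decoded on load.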