academia.tools module

Submodules

Module contents

Miscellaneous classes and functions that don’t belong anywhere else.

The idea behind academia.tools is that no other module depends on it, while it itself depends on other modules. That is different from academia.utils, which works the other way around.

Exported classes:

class academia.tools.AgentDebugger(agent: Agent, env: ScalableEnvironment, start_greedy: bool = False, start_paused: bool = False, key_action_map: dict = {}, run: bool = False, run_verbose: int = 1)

Bases: object

Class allowing for easy agent debugging. Using this class the user can investigate the agent’s behavior step by step, with the ability to check what the agent thinks about the current state. The user can also toggle between greedy and non-greedy behavior mid-episode.

Additionally, the user can take over from the agent at any moment by overriding the actions it takes. This allows the user to put the agent in new, difficult, or otherwise interesting situations and check how it behaves.

The user can interact with the debugger using the following keys:

  • 't' - terminate the current episode (and start a new one)

  • 'p' - pause the environment

  • 'g' - toggle between greedy and non-greedy behavior

  • ' ' (space) - perform one step (only works when paused is set to True)

  • esc ('\x1b') - quit the debugger

The user can also interact with the environment using a custom key_action_map.

Parameters:
  • agent – Agent object to be debugged.

  • env – Environment object with which the agent will interact. The environment should be instantiated with render_mode set to "human" for the user to see it.

  • start_greedy – Whether the agent should start with greedy behavior. Defaults to False.

  • start_paused – Whether the environment should start in a paused state. Defaults to False.

  • key_action_map – Dictionary mapping keyboard keys to environment actions. It accepts one character per action. If a digit character is not present in the dictionary, it is automatically converted to the corresponding action. Any other character not present in the dictionary is converted to None and ignored. The dictionary does not accept reserved_keys as its keys. Defaults to an empty dictionary.

  • run – Whether to run the debugger after initialization. Defaults to False.

  • run_verbose – Verbosity level with which to automatically run the debugger if run is True. Defaults to 1.

agent

Agent that is being debugged.

Type:

Agent

env

Environment with which the agent interacts.

Type:

ScalableEnvironment

key_action_map

Dictionary between keyboard keys and environment actions.

Type:

dict

greedy

Whether the agent behaves in a greedy manner.

Type:

bool

paused

Whether the environment is paused (allows for step-by-step execution).

Type:

bool

input_timeout

Time (in seconds) to wait for user input. If the user does not press any key within that time frame, execution continues (unless paused is True).

Type:

float

episodes

Number of episodes run in the environment.

Type:

int

steps

Number of steps in the current episode.

Type:

int

running

Whether the debugger is currently running.

Type:

bool

Examples

Initialization:

>>> from academia.tools import AgentDebugger
>>> from academia.environments import LavaCrossing
>>> from academia.agents import DQNAgent
>>> from academia.utils.models import lava_crossing
>>>
>>> agent = DQNAgent(lava_crossing.MLPDQN, 3)
>>> env = LavaCrossing(difficulty=0, render_mode='human')
>>>
>>> # auto running with keymap example
>>> AgentDebugger(agent, env, run=True, key_action_map={
...     'w': 2,
...     'a': 0,
...     'd': 1,
... })
>>>
>>> # manual running
>>> ad = AgentDebugger(agent, env)
>>> ad.run(verbose=5)
reserved_keys = ['t', '\x1b', ' ', 'p', 'g']

A list of reserved keys that cannot be used in key_action_map.
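A minimal sketch of how a mapping might be checked against these reserved keys before use; validate_key_action_map is a hypothetical helper written for illustration, not part of the AgentDebugger API.

```python
# Hypothetical validation helper; AgentDebugger's actual check may differ.
RESERVED_KEYS = ['t', '\x1b', ' ', 'p', 'g']

def validate_key_action_map(key_action_map: dict) -> None:
    clashes = sorted(k for k in key_action_map if k in RESERVED_KEYS)
    if clashes:
        raise ValueError(f"reserved keys used in key_action_map: {clashes}")

validate_key_action_map({'w': 2, 'a': 0, 'd': 1})  # passes silently
```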

run(verbose: int = 0) → None

Runs the agent debugger with the specified verbosity level.

Verbosity levels and what is logged at each:

  • 0 - no logging (except for errors)

  • 1 - episode rewards

  • 2 - step rewards

  • 3 - agent thoughts

Parameters:

verbose – Verbosity level.

thoughts_handlers = {'DQNAgent': <function _dqnagent_thoughts_handler>, 'PPOAgent': <function _ppoagent_thoughts_handler>, 'QLAgent': <function _qlagent_thoughts_handler>, 'SarsaAgent': <function _sarsa_thoughts_handler>}

A class attribute that stores a global mapping of available agent thought handlers. A thought handler is a function that accepts an agent object and an observed state and returns a user-defined “thought”, e.g. the q-values predicted by the agent.

These functions have the following signature:

>>> def my_thought_handler(agent: Agent, state: Any) -> str:
...     pass

where agent is the agent object to handle and state is the observed state of the environment for which we want the agent’s thoughts.

There are a few default thought handlers corresponding to the implemented agents:

  • 'PPOAgent' - returns the predicted probabilities of actions (in discrete mode) or the mean action (in continuous mode), as well as the state value predicted by the critic.

  • 'DQNAgent' - returns the predicted q-values of each action.

  • 'QLAgent' - returns the predicted q-values of each action.

  • 'SarsaAgent' - returns the predicted q-values of each action.

Example

>>> from typing import Any
>>> from academia.agents.base import Agent
>>>
>>> # custom agent class
>>> class MyAgent(Agent):
...     pass
>>>
>>> def my_agent_handler(agent: Agent, state: Any) -> str:
...     pass
>>> # adds a new handler to the dictionary
>>> # the key should be a string containing the name of the class
>>> AgentDebugger.thoughts_handlers['MyAgent'] = my_agent_handler
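For a more concrete illustration, a handler for a toy value-based agent might format per-action q-values into a string. Both ToyAgent and its q_values method are invented for this sketch and do not exist in academia; they only stand in for whatever prediction method your agent exposes.

```python
# Toy stand-in for an agent; `q_values` is an invented method, not part
# of the academia Agent interface.
class ToyAgent:
    def q_values(self, state):
        return [0.1 * (state + a) for a in range(3)]

def toy_thoughts_handler(agent, state) -> str:
    # format one "action: value" entry per action
    qs = agent.q_values(state)
    return ", ".join(f"action {a}: {q:.2f}" for a, q in enumerate(qs))

# registering it would look like:
# AgentDebugger.thoughts_handlers['ToyAgent'] = toy_thoughts_handler
```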