queens.models.reinforcement_learning package#

Module for reinforcement learning capabilities.

Note

If you have no prior experience with RL, a good starting point might be the introduction to RL in Spinning Up in Deep RL by OpenAI: https://spinningup.openai.com/en/latest/spinningup/rl_intro.html.

In the following, we provide a brief overview of RL concepts and terminology and their relation to QUEENS.

In essence, Reinforcement Learning (RL) is a type of machine learning that tries to mimic the way humans learn to accomplish a new task, namely by performing trial-and-error interactions with their environment and learning from the gathered experience.

In RL, this interaction happens between a so-called agent (i.e., a learning algorithm) and an environment (i.e., the task or problem to be solved). The agent can perform actions in the environment in order to modify its state, and in return receives observations (i.e., of the new state of the environment after applying the action) and rewards (i.e., a numerical reward signal quantifying how good the undertaken action was with respect to solving the problem encoded in the environment). One interaction step between the agent and the environment is called a timestep. The goal of the agent is to learn a policy (i.e., a mapping from observations to actions) that allows it to solve the task encoded in the environment by maximizing the cumulative reward collected over its interactions with the environment.
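The following minimal sketch illustrates this interaction loop using the gymnasium API (the environment interface that stable-baselines3 builds on); the chosen environment and the random action selection are purely illustrative and not part of QUEENS.

    import gymnasium as gym

    # Illustrative only: any gymnasium-compatible environment could be used here.
    env = gym.make("CartPole-v1")

    observation, info = env.reset(seed=0)  # initial state of the environment
    cumulative_reward = 0.0

    for timestep in range(100):
        # A trained agent would map the observation to an action via its policy;
        # here, a random action is sampled for illustration.
        action = env.action_space.sample()
        observation, reward, terminated, truncated, info = env.step(action)
        cumulative_reward += reward
        if terminated or truncated:
            observation, info = env.reset()

    env.close()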

In QUEENS terminology, the environment in its most general form can be thought of as a model which encodes the problem at hand, e.g., in the form of a physical simulation, and can be evaluated in a forward fashion. The RL agent is trained by letting the algorithm repeatedly interact with the environment and learning a suitable policy from the collected experience. Once the agent is trained, it can be used to predict the next action to perform based on a given observation. As such, the agent can be considered a surrogate model, since it first needs to be trained before it can make predictions. Following the QUEENS terminology for models, a sample corresponds to an observation and the response of the RL model corresponds to the action to be taken.

This interpretation of RL in the context of QUEENS is reflected in the design of the queens.models.reinforcement_learning.reinforcement_learning.ReinforcementLearning class.

Subpackages#

Submodules#

queens.models.reinforcement_learning.reinforcement_learning module#

Functionality for constructing an RL model with QUEENS.

For an introduction to RL in the context of QUEENS, we refer to the documentation of the queens.models.reinforcement_learning module.

class ReinforcementLearning(agent, deterministic_actions=False, render_mode=None, total_timesteps=10000)[source]#

Bases: Model

Main class for constructing an RL model with QUEENS.

The training or evaluation of a ReinforcementLearning model instance can be performed by using an instance of type queens.iterators.reinforcement_learning.ReinforcementLearning.
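The following is a minimal usage sketch, assuming that the agent is a stable-baselines3 agent set up on a gymnasium environment; the concrete algorithm and environment are illustrative assumptions, and in a full QUEENS analysis the training and evaluation would typically be driven by the corresponding iterator rather than by the direct calls shown here.

    import gymnasium as gym
    from stable_baselines3 import PPO

    from queens.models.reinforcement_learning.reinforcement_learning import (
        ReinforcementLearning,
    )

    # Assumption: the agent is set up directly with stable-baselines3 on a
    # gymnasium environment; algorithm and environment are arbitrary choices.
    env = gym.make("CartPole-v1")
    agent = PPO("MlpPolicy", env, verbose=0)

    # Wrap the agent in a QUEENS model and train it for a fixed number of timesteps.
    model = ReinforcementLearning(agent=agent, total_timesteps=10_000)
    model.train()

    # After training, the model acts like a surrogate: observations in, actions out.
    observation = model.reset(seed=0)
    response = model.evaluate(observation)  # dict containing the predicted action(s)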

_agent#

An instance of a stable-baselines3 agent.

Type:

object

_deterministic_actions#

Flag indicating whether to use a deterministic policy.

Type:

bool

_render_mode#

String indicating whether (and how) the state of the environment should be rendered during evaluation.

  • If None, the state of the environment won’t be rendered.

  • If "human", the state of the environment will be visualized in a new pop-up window.

  • If "rgb_array", an rgb-image will be generated which will be stored in the member frames for further processing (no immediate screen output will be generated).

  • If "ansi", a string representaton of the environment will be generated which can be used for text-based rendering.

Note

Not all render modes can be used with all environments.

Type:

str, optional

_total_timesteps#

Total number of timesteps to train the agent.

Type:

int

_vectorized_environment#

A vectorized environment for evaluation.

Type:

object

frames#

A list of frames depicting the states of the environment, generated while performing an evaluation interaction loop.

Type:

list

is_trained#

Flag indicating whether the agent has been trained.

Type:

bool

response#

The response of the last model evaluation.

Type:

dict

evaluate(samples)[source]#

Evaluate the model (agent) on the provided samples (observations).

Delegates the call to predict() internally and stores the result of the model evaluation in the internal storage variable response.

Parameters:

samples (np.ndarray) – Input samples, i.e., multiple observations.

Returns:

dict – Results (actions) corresponding to the current set of input samples.
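Assuming model is a trained ReinforcementLearning instance (see the sketch in the class description above), a call might look as follows:

    samples = model.reset(seed=0)       # e.g., use an initial observation as input
    response = model.evaluate(samples)  # dict containing the predicted actions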

grad(samples, upstream_gradient)[source]#

Evaluate the gradient of the model with respect to the provided input samples.

Warning

This method is currently not implemented for RL models.

Raises:

NotImplementedError – If the method is called.

interact(observation)[source]#

Perform one interaction step of an RL agent with an environment.

One interaction consists of the following steps:
  1. Predict the next action based on the current observation, see predict(). Whether or not a deterministic prediction will be made is determined by the value of _deterministic_actions.

  2. Apply the predicted action to the environment, see step().

  3. Optionally render the environment depending on the value of _render_mode, see render().

  4. Return the new observation of the environment.

Parameters:

observation (np.ndarray) – The observation of the current state of the environment.

Returns:

result (dict) – A dictionary containing all the results generated during this interaction step, such as the undertaken action, the new observation, and the reward obtained from the environment.
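A sketch of an evaluation rollout built from repeated interaction steps, assuming model is a trained instance; note that the dictionary key used below to extract the new observation is an assumption and should be checked against the actual result dictionary.

    observation = model.reset(seed=0)         # initial state of the environment

    for _ in range(50):
        result = model.interact(observation)  # predict, step, and optionally render
        # Key name "observation" is an assumption; inspect the returned dictionary
        # for the exact naming used by the implementation.
        observation = result["observation"]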

predict(observations, deterministic=False)[source]#

Predict the actions to be undertaken for given observations.

Parameters:
  • observations (np.ndarray) – Either a single observation or a batch of observations.

  • deterministic (bool) – Flag indicating whether to use a deterministic policy.

Note

The deterministic flag is generally only relevant for testing purposes, i.e., to ensure that the same observation always results in the same action.

Returns:

result (dict) – Actions corresponding to the provided observations. The predicted actions are stored as the main result of this model.
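For example, assuming model is a trained instance, a deterministic prediction (e.g., for testing) could be requested as follows:

    observations = model.reset(seed=0)
    # With deterministic=True, repeated calls with the same observation return
    # the same action, which is mainly useful for testing.
    result = model.predict(observations, deterministic=True)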

render()[source]#

Render the current state of the environment.

Depending on the value of _render_mode, the state of the environment will either be visualized in a pop-up window (self._render_mode=="human"), rendered as an rgb-image (self._render_mode=="rgb_array"), or rendered as a string representation (self._render_mode=="ansi"). If the scene is rendered but no pop-up window is generated, a representation of the scene will be appended to the member frames.

Note

Internally delegates the call to the render() method of the vectorized environment. Render settings can be controlled via the constructor of the environment and the value of member _render_mode.
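A sketch of collecting frames for post-processing, assuming the model is constructed with render_mode="rgb_array" and an already created stable-baselines3 agent:

    # Assumption: agent is an already constructed stable-baselines3 agent.
    model = ReinforcementLearning(agent=agent, render_mode="rgb_array")
    model.train()

    observation = model.reset(seed=0)
    model.render()              # appends an rgb image of the current state

    # Rendered frames accumulate in model.frames for further processing,
    # e.g., for assembling them into a video.
    print(len(model.frames))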

reset(seed=None)[source]#

Reset the environment and return its initial state.

Note

This method can also be used to generate an initial observation of the environment as the starting point for the evaluation of a trained agent.

Parameters:

seed (int, optional) – Seed for making the observation generation reproducible.

Returns:

np.ndarray – (Random) Initial observation of the environment.
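For instance, a reproducible starting point for an evaluation rollout could be obtained as follows:

    initial_observation = model.reset(seed=42)  # same seed, same initial observation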

save(gs)[source]#

Save the trained agent to a file.

Delegates the call to queens.models.reinforcement_learning.utils.stable_baselines3.save_model().

Parameters:

gs (queens.utils.global_settings.GlobalSettings) – Global settings object.

step(action)[source]#

Perform a single step in the environment.

Applies the provided action to the environment.

Parameters:

action (np.ndarray) – Action to be executed.

Returns:
  • observation (np.ndarray) – Observation of the new state of the environment.

  • reward (float) – Reward obtained from the environment.

  • done (bool) – Flag indicating whether the episode has finished.

  • info (dict) – Additional information.
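A minimal sketch, assuming model is an existing instance and action was obtained, e.g., from predict():

    observation, reward, done, info = model.step(action)
    if done:
        observation = model.reset()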

train()[source]#

Train the RL agent.