queens.models.reinforcement_learning package#
Module for reinforcement learning capabilities.
Note
If you have no prior experience with RL, a good starting point might be the introduction of Spinning Up in Deep RL by OpenAI: https://spinningup.openai.com/en/latest/spinningup/rl_intro.html.
In the following, we provide a brief overview of RL concepts and terminology and their relation to QUEENS.
In essence, Reinforcement Learning (RL) is a type of machine learning that mimics the way humans learn to accomplish a new task, namely by performing trial-and-error interactions with their environment and learning from the gathered experience.
In RL, this interaction happens between a so-called agent (i.e., a learning algorithm) and an environment (i.e., the task or problem to be solved). The agent can perform actions in the environment in order to modify its state and receives observations (i.e., of the new state of the environment after applying the action) and rewards (i.e., a numerical reward signal quantifying how beneficial the undertaken action was with respect to solving the problem encoded in the environment) in return. One interaction step between the agent and the environment is called a timestep. The goal of the agent is to learn a policy (i.e., a mapping from observations to actions) that allows it to solve the task encoded in the environment by maximizing the cumulative reward collected over the interaction.
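As an illustration, the following sketch implements this interaction loop with the gymnasium API (not part of QUEENS; the CartPole-v1 task and the random policy are placeholder choices):

```python
import gymnasium as gym

# Environment encoding the task to be solved.
env = gym.make("CartPole-v1")

# Reset the environment to obtain an initial observation.
observation, info = env.reset(seed=42)

for timestep in range(100):
    # A trained agent would map the observation to an action via its
    # policy; here, a random action stands in for that mapping.
    action = env.action_space.sample()

    # Apply the action and receive the new observation and a reward.
    observation, reward, terminated, truncated, info = env.step(action)

    # Start a new episode once the current one has ended.
    if terminated or truncated:
        observation, info = env.reset()

env.close()
```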
In QUEENS terminology, the environment in its most general form can be thought of as a model which encodes the problem at hand, e.g., in the form of a physical simulation, and can be evaluated in a forward fashion. The RL agent is trained by letting the algorithm repeatedly interact with the environment and learning a suitable policy from the collected experience. Once the agent is trained, it can be used to predict the next action to perform based on a given observation. As such, the agent can be considered a surrogate model, since it first needs to be trained before it can make predictions. Following the QUEENS terminology for models, a sample corresponds to an observation and the response of the RL model corresponds to the action to be taken.
This interpretation of RL in the context of QUEENS is reflected in the design of the
queens.models.reinforcement_learning.reinforcement_learning.ReinforcementLearning
class.
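A minimal construction sketch might look as follows, assuming a stable-baselines3 agent; the PPO algorithm and the CartPole-v1 environment are illustrative choices only:

```python
import gymnasium as gym
from stable_baselines3 import PPO

from queens.models.reinforcement_learning.reinforcement_learning import (
    ReinforcementLearning,
)

# A stable-baselines3 agent wrapping the environment to be solved.
agent = PPO("MlpPolicy", gym.make("CartPole-v1"))

# Wrap the agent in a QUEENS RL model (constructor arguments as
# documented below).
model = ReinforcementLearning(
    agent,
    deterministic_actions=True,
    render_mode=None,
    total_timesteps=10000,
)
```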
Subpackages#
Submodules#
queens.models.reinforcement_learning.reinforcement_learning module#
Functionality for constructing an RL model with QUEENS.
For an introduction to RL in the context of QUEENS, we refer to the documentation of the
queens.models.reinforcement_learning
module.
- class ReinforcementLearning(agent, deterministic_actions=False, render_mode=None, total_timesteps=10000)[source]#
Bases:
Model
Main class for constructing an RL model with QUEENS.
The training or evaluation of a
ReinforcementLearning
model instance can be performed by using an instance of type
queens.iterators.reinforcement_learning.ReinforcementLearning.
- _agent#
An instance of a stable-baselines3 agent.
- Type:
object
- _deterministic_actions#
Flag indicating whether to use a deterministic policy.
- Type:
bool
- _render_mode#
String indicating whether (and how) the state of the environment should be rendered during evaluation.
If None, the state of the environment won't be rendered.
If "human", the state of the environment will be visualized in a new pop-up window.
If "rgb_array", an rgb-image will be generated and stored in the member frames for further processing (no immediate screen output will be generated).
If "ansi", a string representation of the environment will be generated which can be used for text-based rendering.
Note
Not all render modes can be used with all environments.
- Type:
str, optional
- _total_timesteps#
Total number of timesteps to train the agent.
- Type:
int
- _vectorized_environment#
A vectorized environment for evaluation.
- Type:
object
- frames#
A list with frames depicting the states of the environment generated from performing an evaluation interaction loop.
- Type:
list
- is_trained#
Flag indicating whether the agent has been trained.
- Type:
bool
- response#
The response of the last model evaluation.
- Type:
dict
- evaluate(samples)[source]#
Evaluate the model (agent) on the provided samples (observations).
Delegates the call to
predict()
internally and stores the result of the model evaluation in the internal storage variable
response.
- Parameters:
samples (np.ndarray) – Input samples, i.e., multiple observations.
- Returns:
dict – Results (actions) corresponding to current set of input samples.
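A usage sketch, assuming a trained model instance as constructed above; the observation shape is an assumption and must match the observation space of the wrapped environment:

```python
import numpy as np

# A batch of five observations (samples); the shape (5, 4) assumes the
# four-dimensional CartPole observation space.
samples = np.zeros((5, 4))

# The returned dictionary contains the actions corresponding to the
# samples and is also stored in the model's `response` attribute.
results = model.evaluate(samples)
```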
- grad(samples, upstream_gradient)[source]#
Evaluate the gradient of the model wrt. the provided input samples.
Warning
This method is currently not implemented for RL models.
- Raises:
NotImplementedError – If the method is called.
- interact(observation)[source]#
Perform one interaction step of an RL agent with an environment.
- One interaction consists of the following steps:
1. Predict the next action based on the current observation, see
predict()
. Whether or not a deterministic prediction will be made is determined by the value of
_deterministic_actions.
2. Apply the predicted action to the environment, see
step().
3. Optionally render the environment depending on the value of
_render_mode, see
render().
4. Return the new observation of the environment.
- Parameters:
observation (np.ndarray) – The observation of the current state of the environment.
- Returns:
result (dict) – A dictionary containing all the results generated during this interaction step, such as the undertaken action, the new observation, and the reward obtained from the environment.
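A sketch of a single interaction step, assuming a trained model instance as constructed above (the exact keys of the result dictionary are not specified here and depend on the implementation):

```python
# Obtain an initial observation (see reset() below) ...
observation = model.reset(seed=0)

# ... and perform one interaction step with the environment. The result
# dictionary bundles the undertaken action, the new observation, and the
# obtained reward, as documented above.
result = model.interact(observation)
```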
- predict(observations, deterministic=False)[source]#
Predict the actions to be undertaken for given observations.
- Parameters:
observations (np.ndarray) – Either a single observation or a batch of observations.
deterministic (bool) – Flag indicating whether to use a deterministic policy.
Note
The
deterministic
flag is generally only relevant for testing purposes, i.e., to ensure that the same observation always results in the same action.
- Returns:
result (dict) – Actions corresponding to the provided observations. The predicted actions are stored as the main result of this model.
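For example, a deterministic prediction for a batch of observations might look like this (the observation shape is an assumption, as above):

```python
import numpy as np

# Deterministic predictions ensure that identical observations always
# map to identical actions, e.g., for reproducible tests.
observations = np.zeros((5, 4))
result = model.predict(observations, deterministic=True)
```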
- render()[source]#
Render the current state of the environment.
Depending on the value of
_render_mode
, the state of the environment will either be visualized in a pop-up window (self._render_mode=="human"), as an rgb-image (self._render_mode=="rgb_array"), or as a string representation (self._render_mode=="ansi"). If the scene is rendered but no pop-up window is generated, a representation of the scene will be appended to the member
frames.
.Note
Internally delegates the call to the
render()
method of the vectorized environment. Render settings can be controlled via the constructor of the environment and the value of the member
_render_mode.
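A sketch of off-screen rendering, assuming the model was constructed with render_mode="rgb_array":

```python
# Each render() call appends an rgb-image of the current scene to the
# member `frames` instead of opening a pop-up window.
model.render()
print(len(model.frames))  # number of frames collected so far
```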
- reset(seed=None)[source]#
Resets the environment and returns its initial state.
Note
This method can also be used to generate an initial observation of the environment as the starting point for the evaluation of a trained agent.
- Parameters:
seed (int, optional) – Seed for making the observation generation reproducible.
- Returns:
np.ndarray – (Random) Initial observation of the environment.
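For example, to obtain a reproducible starting point for evaluating a trained agent:

```python
# Seeding makes the generated initial observation reproducible.
observation = model.reset(seed=42)
```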
- save(gs)[source]#
Save the trained agent to a file.
Delegates the call to
queens.models.reinforcement_learning.utils.stable_baselines3.save_model()
.
- Parameters:
gs (queens.utils.global_settings.GlobalSettings) – Global settings object
- step(action)[source]#
Perform a single step in the environment.
Applies the provided action to the environment.
- Parameters:
action (np.ndarray) – Action to be executed.
- Returns:
observation (np.ndarray) – Observation of the new state of the environment.
reward (float) – Reward obtained from the environment.
done (bool) – Flag indicating whether the episode has finished.
info (dict) – Additional information.
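A usage sketch; that the four documented return values arrive together as a tuple, and the shape of the action array, are assumptions that must be checked against the wrapped environment's action space:

```python
import numpy as np

# An action matching the environment's action space (shape illustrative).
action = np.array([0])

# Apply the action; the four documented return values are assumed to be
# returned together as a tuple here.
observation, reward, done, info = model.step(action)
```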