Policies API
- class layeredrl.policies.Policy(action_space: Box, device=device(type='cpu'))[source]
Policy that maps the environment observation, the level input, and the level state to actions.
- __init__(action_space: Box, device=device(type='cpu'))[source]
Initialize the policy.
- Parameters:
action_space – The action space the policy output has to lie in.
device – The device to use.
- get_action(mapped_env_obs: Tensor, level_input: Tensor | None, level_state: Dict | None, deterministic: bool) Tensor[source]
Get an action for the given observation.
- Parameters:
mapped_env_obs – The observation from the environment after the env_obs_map has been applied.
level_input – The input to this level, i.e., the action from the level above.
level_state – The state of the level.
deterministic – Whether to return a deterministic action (as opposed to a stochastic one).
- Returns:
The action, and a Batch with info about the action (logits etc.)
- get_log_prob(mapped_env_obs: Tensor, level_input: Tensor | None, level_state: Dict | None, action: Tensor) Tensor[source]
Get the log probability of the given action under the policy.
- Parameters:
mapped_env_obs – The observation from the environment after the env_obs_map has been applied.
level_input – The input to this level, i.e., the action from the level above.
level_state – The state of the level.
action – The action.
- Returns:
The log probability of the action under the policy.
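As a rough illustration of the calling convention described above, the following sketch mirrors the documented get_action signature with a toy class. It is not the layeredrl implementation: ToyPolicy, its steering rule, and the noise_std parameter are hypothetical, and plain Python lists stand in for Tensor and Box so the example runs without extra dependencies.

```python
import random
from typing import Dict, List, Optional


class ToyPolicy:
    """Hypothetical stand-in that mirrors the documented get_action signature."""

    def __init__(self, low: List[float], high: List[float], noise_std: float = 0.1):
        # low/high play the role of the Box bounds (assumption: one bound per dimension).
        self.low, self.high, self.noise_std = low, high, noise_std

    def get_action(
        self,
        mapped_env_obs: List[float],
        level_input: Optional[List[float]],
        level_state: Optional[Dict],
        deterministic: bool,
    ) -> List[float]:
        # Toy rule: treat the input from the level above as a goal and steer toward it.
        if level_input is None:
            level_input = [0.0] * len(mapped_env_obs)
        action = [g - o for o, g in zip(mapped_env_obs, level_input)]
        if not deterministic:
            # Stochastic mode: perturb the action with Gaussian noise.
            action = [a + random.gauss(0.0, self.noise_std) for a in action]
        # Clip so that the output lies in the action space.
        return [min(max(a, lo), hi) for a, lo, hi in zip(action, self.low, self.high)]


policy = ToyPolicy(low=[-1.0, -1.0], high=[1.0, 1.0])
print(policy.get_action([0.5, 0.0], [2.0, 0.25], None, deterministic=True))  # [1.0, 0.25]
```

With deterministic=True the same inputs always yield the same action; with deterministic=False repeated calls differ but stay inside the bounds.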
- class layeredrl.policies.UniformPolicy(action_space: Box, device=device(type='cpu'))[source]
Bases: Policy
A policy that randomly samples actions from the action space.
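To make the idea of sampling from a box-shaped action space concrete, here is a minimal sketch of uniform sampling and the matching log probability. The source does not show how UniformPolicy is implemented, so the helper names sample_uniform and uniform_log_prob are hypothetical, the bounds are assumed to be finite, and plain lists replace Tensor and Box.

```python
import math
import random
from typing import List


def sample_uniform(low: List[float], high: List[float]) -> List[float]:
    """Draw each action dimension independently and uniformly from [low, high]."""
    return [random.uniform(lo, hi) for lo, hi in zip(low, high)]


def uniform_log_prob(low: List[float], high: List[float], action: List[float]) -> float:
    """Log density of a uniform distribution over the box; -inf outside the box."""
    if any(a < lo or a > hi for a, lo, hi in zip(action, low, high)):
        return -math.inf
    # The density is 1 / volume, so the log density is minus the sum of log side lengths.
    return -sum(math.log(hi - lo) for lo, hi in zip(low, high))


low, high = [-1.0, 0.0], [1.0, 4.0]
action = sample_uniform(low, high)
print(action, uniform_log_prob(low, high, action))
```

Inside the box the log probability is the same for every action, which is what makes such a policy useful as an exploration baseline.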
- class layeredrl.policies.TianshouPolicy(action_space: Space, ts_policy: BasePolicy, device=device(type='cpu'))[source]
Bases: Policy
Wrapper around a tianshou policy.
Note: Only tested with SACPolicy and DQNPolicy at the moment.
- __init__(action_space: Space, ts_policy: BasePolicy, device=device(type='cpu'))[source]
Initialize the policy.
- Parameters:
action_space – The action space of the environment the policy acts in.
ts_policy – The tianshou policy.
device – The device to use.
- get_log_prob(mapped_env_obs: Tensor, level_input: Tensor | None, level_state: Dict | None, action: Tensor, std: float | None = None) Tensor[source]
Get the log probability of the given action under the policy.
- Parameters:
mapped_env_obs – The observation from the environment after the env_obs_map has been applied.
level_input – The input to this level, i.e., the action from the level above.
level_state – The state of the level.
action – The action.
std – The standard deviation of the action distribution. If given, it overrides the standard deviation of the policy.
- Returns:
The log probability of the action under the policy.
- get_value(mapped_env_obs: Tensor, level_input: Tensor | None, level_state: Dict | None, action: Tensor) Tensor[source]
Get the value of the given obs and action as predicted by the critic.
- Parameters:
mapped_env_obs – The observation from the environment after the env_obs_map has been applied.
level_input – The input to this level, i.e., the action from the level above.
level_state – The state of the level.
action – The action.
- Returns:
The value of the given obs and action as predicted by the critic.
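The effect of the std argument of get_log_prob can be pictured with a small sketch. It assumes a diagonal Gaussian action distribution, which the source does not confirm, and it ignores any squashing correction (such as tanh) that a wrapped policy may apply. The function gaussian_log_prob and its mean and policy_std arguments are hypothetical and use plain lists instead of Tensor.

```python
import math
from typing import List, Optional


def gaussian_log_prob(
    mean: List[float],
    policy_std: List[float],
    action: List[float],
    std: Optional[float] = None,
) -> float:
    """Log density of a diagonal Gaussian, optionally with an overriding std."""
    # If std is given, it replaces the per-dimension std of the policy.
    stds = policy_std if std is None else [std] * len(mean)
    total = 0.0
    for m, s, a in zip(mean, stds, action):
        z = (a - m) / s
        total += -0.5 * z * z - math.log(s) - 0.5 * math.log(2.0 * math.pi)
    return total


mean, policy_std, action = [0.0, 0.0], [1.0, 1.0], [0.5, -0.5]
print(gaussian_log_prob(mean, policy_std, action))           # uses the policy's std
print(gaussian_log_prob(mean, policy_std, action, std=2.0))  # std overridden
```

An override of this kind is useful when the log probability should be evaluated under a fixed noise level rather than the one the policy currently predicts.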