Hierarchies API

class layeredrl.hierarchies.Hierarchy(levels: List[Level], env: Env, env_obs_maps: List[Callable[[Tensor], Tensor]] | None = None, mapped_env_obs_shapes: List[Tuple] | None = None, keep_params: bool = False, device: device = device(type='cpu'), writer: SummaryWriter | None = None)

Bases: object

__init__(levels: List[Level], env: Env, env_obs_maps: List[Callable[[Tensor], Tensor]] | None = None, mapped_env_obs_shapes: List[Tuple] | None = None, keep_params: bool = False, device: device = device(type='cpu'), writer: SummaryWriter | None = None)

Initialize the hierarchy. Makes sure the action a level emits fits the input expected by the level below it.

Parameters:
  • levels – The levels of the hierarchy.

  • env – The environment.

  • env_obs_maps – A list of functions, one per level, each mapping the environment observation to the vector that is provided to the corresponding level. This can be used to implement information hiding and to move a trained level from one environment to another with a different observation space. If None, the identity map is used.

  • mapped_env_obs_shapes – The output shapes of the env_obs_map of each level. If None, the dimension of the environment observation space is used. If a dimension is negative, it is added to the dimension of the environment observation space. This is useful if the map from the environment observation to the level observation drops some components.

  • keep_params – Whether to keep the parameters of the levels instead of resetting them. Setting this to True is only valid if the levels were already initialized before.

  • device – The device to use.

  • writer – The TensorBoard writer to use for logging. If None, no logging is done.
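To make the interplay of env_obs_maps and mapped_env_obs_shapes more concrete, here is a minimal sketch. It assumes a two-level hierarchy and a 10-dimensional environment observation. The names drop_last_two and resolve_mapped_shape are hypothetical, and the helper encodes only one reading of the rule for negative dimensions described above; it is not the library's own code.

```python
from typing import Optional, Tuple


def drop_last_two(obs):
    """Hypothetical observation map that hides the last two components.

    Works on array types that support ellipsis slicing, such as a
    torch.Tensor with a leading batch dimension.
    """
    return obs[..., :-2]


def resolve_mapped_shape(
    env_shape: Tuple[int, ...], mapped_shape: Optional[Tuple[int, ...]]
) -> Tuple[int, ...]:
    """One interpretation of the documented rule (an assumption)."""
    if mapped_shape is None:
        # No shape given: fall back to the environment observation shape.
        return env_shape
    # A negative entry is read relative to the environment dimension.
    return tuple(e + m if m < 0 else m for e, m in zip(env_shape, mapped_shape))


env_shape = (10,)  # assumed environment observation shape
env_obs_maps = [drop_last_two, lambda obs: obs]  # one map per level
mapped_env_obs_shapes = [(-2,), None]  # one shape per level
resolved = [resolve_mapped_shape(env_shape, s) for s in mapped_env_obs_shapes]
```

Under this reading, the first level would see 8 components and the second level the full 10.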

eval() → None

Set all levels of the hierarchy to evaluation mode.

get_action(obs: Tensor) → Tensor

Get an action for the given observation.

Note that obs and the returned action have a batch dimension corresponding to environment instances.

The method descends the hierarchy from top to bottom, starting with the active level. From there on, each level produces an action that is passed to the level below. The action of the lowest level is returned, to be executed in the environment.

Parameters:

obs – The environment observation.

Returns:

The action for the environment.
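The top-down pass can be pictured with a toy example for a single environment instance. ToyLevel, descend and the convention that index 0 is the lowest level are assumptions made for illustration; the real levels are objects with their own interface and operate on batched tensors.

```python
from typing import Callable, List, Optional


class ToyLevel:
    """Stand-in for a level; the real Level interface is not shown here."""

    def __init__(self, policy: Callable[[float, Optional[float]], float]):
        self.policy = policy
        self.last_input: Optional[float] = None  # last action received from above

    def act(self, obs: float, from_above: Optional[float] = None) -> float:
        if from_above is not None:
            self.last_input = from_above
        return self.policy(obs, self.last_input)


def descend(levels: List[ToyLevel], active: int, obs: float) -> float:
    """Query levels from the active one down to the lowest (index 0)."""
    action: Optional[float] = None
    for idx in range(active, -1, -1):
        action = levels[idx].act(obs, action)
    return action


low = ToyLevel(lambda obs, goal: goal - obs)  # move towards the goal from above
high = ToyLevel(lambda obs, _: obs + 1.0)     # propose a subgoal
levels = [low, high]

first = descend(levels, active=1, obs=2.0)   # both levels are queried
second = descend(levels, active=0, obs=2.5)  # only the lowest level is queried
```

In the second call the lowest level is the active one, so it reuses the subgoal it received earlier instead of asking the level above again.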

get_copy() → Hierarchy

Return a copy of the hierarchy.

No models are copied, only the structure and state of the hierarchy.

The copy can be used, for example, to test rollouts without influencing the state of the original hierarchy. Learning with the copy will still influence the original hierarchy, however, and is not recommended.
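The distinction between copied state and shared models resembles a shallow copy in plain Python. The following analogy uses the hypothetical classes ToyModel and ToyHierarchy; it does not show how get_copy is implemented in the library.

```python
import copy


class ToyModel:
    def __init__(self):
        self.weights = [0.0]


class ToyHierarchy:
    def __init__(self, model: ToyModel):
        self.model = model     # shared between the original and its copy
        self.active_level = 1  # rollout state owned by each hierarchy

    def get_copy(self) -> "ToyHierarchy":
        # Shallow copy: attributes are rebound, the model object is reused.
        return copy.copy(self)


original = ToyHierarchy(ToyModel())
clone = original.get_copy()

clone.active_level = 0        # a state change stays local to the copy
clone.model.weights[0] = 1.0  # "learning" with the copy reaches the original
```

Changing the rollout state of the copy leaves the original untouched, while any update to the shared model is visible from both.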

learn() → None

Learn from the collected transitions.

load(path: Path) → None

Load the hierarchy from the given path.

load_buffers(path: Path) → None

Load the replay buffers of all levels from the given path.

process_transition(obs_next: Tensor, rew: Tensor, terminated: Tensor, truncated: Tensor) → None

Process the environment transition and return control to the higher levels where appropriate.

Starting from the lowest level, the active level can pass control back to the level above. This can continue until a level stays active or the highest level is reached.

While ascending the hierarchy, the (semi-MDP) transitions are registered with the levels.

Parameters:
  • obs_next – The next environment observation.

  • rew – The reward of the environment transition.

  • terminated – Whether the episode terminated. Tensor with one entry per environment instance.

  • truncated – Whether the episode was truncated. Tensor with one entry per environment instance.
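The bottom-up hand-over of control can be sketched for a single environment instance as follows. The function ascend and the per-level "done" checks are hypothetical; how a real level decides to return control, and how the transitions are registered, is up to the level implementation.

```python
from typing import Callable, List


def ascend(done_checks: List[Callable[[float], bool]], obs_next: float) -> int:
    """Return the index of the level that stays active after the transition."""
    active = 0  # start from the lowest level
    top = len(done_checks) - 1
    while active < top and done_checks[active](obs_next):
        active += 1  # hand control back to the level above
    return active


done_checks = [
    lambda obs: obs >= 1.0,  # lowest level is done once obs reaches 1.0
    lambda obs: obs >= 5.0,  # middle level is done once obs reaches 5.0
    lambda obs: False,       # the highest level has no level above it
]

stays_low = ascend(done_checks, 0.5)
goes_to_middle = ascend(done_checks, 2.0)
goes_to_top = ascend(done_checks, 6.0)
```

The loop stops as soon as a level keeps control or the highest level is reached, which mirrors the description above.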

reset() → None

Reset the hierarchy.

Call this at the beginning of the session. The highest level is active at the beginning for all environment instances.

Do not call at the end of episodes.
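A possible interaction loop built from the documented methods is sketched below. It assumes a Gymnasium-style vector environment that resets finished instances on its own and that already exchanges tensors with the hierarchy. The function name run and the choice to call learn() after every step are assumptions, not recommendations from the library.

```python
def run(hierarchy, env, n_steps: int) -> None:
    """Drive a hierarchy in a vector environment for n_steps steps."""
    hierarchy.reset()  # once, at the beginning of the session
    obs, _info = env.reset()
    for _ in range(n_steps):
        action = hierarchy.get_action(obs)
        obs, rew, terminated, truncated, _info = env.step(action)
        hierarchy.process_transition(obs, rew, terminated, truncated)
        hierarchy.learn()
        # No hierarchy.reset() here, even if some episodes just ended.
```

Episode ends are communicated through the terminated and truncated arguments of process_transition, so the loop never calls reset() again.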

save(path: Path) → None

Save the hierarchy to the given path.

save_buffers(path: Path) → None

Save the replay buffers of all levels to the given path.

set_n_env_instances(n_env_instances: int, propagate_to_levels: bool = True) → None

Set the number of environment instances.

Parameters:

n_env_instances – The number of environment instances.

soft_reset() → None

Soft reset the hierarchy.

Call this when manually resetting the (vector) environment. This does not affect things such as warm-up periods and can therefore be called without influencing the learning process.

train() → None

Set all levels of the hierarchy to training mode.
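When evaluation runs are interleaved with training, eval() and train() can be paired in a small context manager. The helper evaluation_mode below is hypothetical and assumes the hierarchy was in training mode beforehand; it only relies on the two methods documented here.

```python
from contextlib import contextmanager


@contextmanager
def evaluation_mode(hierarchy):
    """Switch to evaluation mode, then back to training mode on exit."""
    hierarchy.eval()
    try:
        yield hierarchy
    finally:
        # Runs even if the evaluation code raises an exception.
        hierarchy.train()
```

Evaluation rollouts would then go inside a `with evaluation_mode(hierarchy):` block.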

class layeredrl.hierarchies.RandomHierarchy(env, device)

Bases: Hierarchy

A hierarchy consisting of a single level returning random actions.

The actions are sampled uniformly from the action space of the environment if the action space is finite or a finite interval.
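For a bounded, box-shaped continuous action space, uniform sampling amounts to drawing each component independently between its lower and upper bound. The sketch below uses only the standard library and a hypothetical function name; the class itself presumably works on tensors and on the environment's own action space object.

```python
import random
from typing import List, Sequence


def sample_uniform_action(
    low: Sequence[float], high: Sequence[float], rng: random.Random
) -> List[float]:
    """Draw one action with every component uniform between its bounds."""
    return [rng.uniform(lo, hi) for lo, hi in zip(low, high)]


rng = random.Random(0)  # seeded for reproducibility
action = sample_uniform_action([-1.0, 0.0], [1.0, 2.0], rng)
```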

__init__(env, device)

Initialize the hierarchy.

Parameters:
  • env – The environment.

  • device – The device to use.

class layeredrl.hierarchies.FlatTianshouHierarchy(env, tianshou_config, device, **kwargs)

Bases: Hierarchy

A hierarchy consisting of a single level with a Tianshou policy.

__init__(env, tianshou_config, device, **kwargs)

Initialize the hierarchy.

Parameters:
  • env – The environment.

  • tianshou_config – The configuration of the Tianshou level.

  • device – The device to use.