Neural Networks API

class layeredrl.nets.ConcatNet(mapped_env_obs_shape: int, level_input_dim: int, level_state_dims: Dict[str, int], scale_remaining_steps: float = 1.0, *args, **kwargs)[source]

Bases: Net

A network that concatenates mapped_env_obs, level_input, action, n_remaining_steps, and level_state along the last dimension.

For use with Tianshou level.

__init__(mapped_env_obs_shape: int, level_input_dim: int, level_state_dims: Dict[str, int], scale_remaining_steps: float = 1.0, *args, **kwargs)[source]: Initialize the network.

forward(obs: Dict, state: Any = None, **kwargs: Any) → Tuple[Tensor, Any][source]: Concatenate the inputs along the last dimension.
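
As a dependency-free sketch of what concatenating along the last dimension means for a batched observation dictionary, consider the example below. Plain nested lists stand in for tensors. The dictionary keys, their order, and the helper name concat_last_dim are assumptions made for illustration, not the library's API, and scale_remaining_steps is not modeled.

```python
from itertools import chain
from typing import Dict, List, Sequence

Batch = List[List[float]]  # shape (batch, features); stands in for a tensor


def concat_last_dim(parts: Sequence[Batch]) -> Batch:
    """Join several (batch, features_i) inputs into one (batch, sum(features_i))."""
    batch_size = len(parts[0])
    if any(len(p) != batch_size for p in parts):
        raise ValueError("all inputs must share the same batch size")
    # For each sample, chain the feature lists of all inputs in order.
    return [list(chain.from_iterable(p[i] for p in parts)) for i in range(batch_size)]


# Hypothetical observation dictionary holding a batch of two samples.
obs: Dict[str, Batch] = {
    "mapped_env_obs": [[1.0, 2.0], [3.0, 4.0]],
    "level_input": [[0.5], [0.6]],
    "action": [[0.0], [1.0]],
}
order = ("mapped_env_obs", "level_input", "action")
net_input = concat_last_dim([obs[key] for key in order])
```

Each sample keeps its position in the batch; only the feature axis grows.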

class layeredrl.nets.Critic(*args, **kwargs)[source]

Bases: Critic

Critic network for Tianshou level.

__init__(*args, **kwargs)[source]: Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(obs: ndarray | Tensor, act: ndarray | Tensor | None = None, info: Dict[str, Any] = {}) → Tensor[source]: Same as the Tianshou critic, but does not flatten obs and act before passing them to preprocess_net.

class layeredrl.nets.ProbFCDynamics(state_space: ~gymnasium.spaces.box.Box, context_space: ~gymnasium.spaces.box.Box, action_space: ~gymnasium.spaces.box.Box | ~gymnasium.spaces.discrete.Discrete, n_modes: int, hidden_sizes: int = [128, 128], nonlinearity: ~typing.Any = <class 'torch.nn.modules.activation.ReLU'>, one_hot_action: bool = False, input_batch_norm: bool = False, ignore_context: bool = True, device: ~torch.device = device(type='cpu'))[source]

Bases: Module

Dynamics prediction network using a fully connected NN and predicting mean and standard deviation of the next state.

__init__(state_space: ~gymnasium.spaces.box.Box, context_space: ~gymnasium.spaces.box.Box, action_space: ~gymnasium.spaces.box.Box | ~gymnasium.spaces.discrete.Discrete, n_modes: int, hidden_sizes: int = [128, 128], nonlinearity: ~typing.Any = <class 'torch.nn.modules.activation.ReLU'>, one_hot_action: bool = False, input_batch_norm: bool = False, ignore_context: bool = True, device: ~torch.device = device(type='cpu'))[source]

Initialize the model.

Parameters:

state_space – The state space.
context_space – The context space (containing static information).
action_space – The action space.
n_modes – The number of modes to predict (in a mixture model).
hidden_sizes – A list of hidden sizes for each layer.
nonlinearity – The nonlinearity to use.
one_hot_action – Whether the action comes already one-hot encoded.
input_batch_norm – Whether to use batch normalization on the input.
ignore_context – Whether to ignore the context and not concatenate it to the state.
device – The device to use.
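
The one_hot_action flag concerns discrete actions. As a minimal illustration of the encoding itself (not the library's code), the helper below turns an integer action index into a one-hot vector. The name one_hot and the use of plain lists instead of tensors are assumptions.

```python
from typing import List


def one_hot(action: int, n_actions: int) -> List[float]:
    """Return a vector of length n_actions with 1.0 at index `action`."""
    if not 0 <= action < n_actions:
        raise ValueError(f"action {action} outside [0, {n_actions})")
    return [1.0 if i == action else 0.0 for i in range(n_actions)]


encoded = one_hot(2, n_actions=4)
```

With one_hot_action=True, the caller is presumably expected to supply actions in this form already.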

forward(state: Tensor, context: Tensor, action: Tensor) → Tensor[source]

Predict the next state given the current state, context, and action.

Parameters:

state – The current state.
context – The context, i.e., information that is constant over timesteps.
action – The action.

Returns:

Mean and standard deviation of next state, and termination probability.

class layeredrl.nets.RandomDynamics(state_space: Box, action_space: Box | Discrete, std: float = 0.3, device: device = device(type='cpu'), n_modes: int = 1)[source]

Bases: Module

Random but fixed dynamics from random linear transformation.

Note: Assumes only state deltas are to be predicted.

__init__(state_space: Box, action_space: Box | Discrete, std: float = 0.3, device: device = device(type='cpu'), n_modes: int = 1)[source]

Initialize the model.

Parameters:

state_space – The state space.
action_space – The action space.
device – The device to use.

forward(state: Tensor, action: Tensor) → Tensor[source]

Predict the next state given the current state and action.

Parameters:

state – The current state.
action – The action.

Returns:

Mean and standard deviation of next state and expected reward.
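
The idea of "random but fixed" dynamics that predict state deltas can be sketched without any dependencies as follows. The weights are drawn once from a seeded generator and then reused for every call. The class name RandomLinearDelta, the seed argument, and the interpretation of the scale as a weight standard deviation are assumptions, and the sketch returns only a next-state mean rather than the full output described above.

```python
import random
from typing import List


class RandomLinearDelta:
    """Fixed random linear map from (state, action) to a state delta."""

    def __init__(self, state_dim: int, action_dim: int,
                 weight_std: float = 0.3, seed: int = 0) -> None:
        rng = random.Random(seed)  # seeded so the "random" map is reproducible
        in_dim = state_dim + action_dim
        # One row of weights per state dimension, drawn once and never changed.
        self.weights = [
            [rng.gauss(0.0, weight_std) for _ in range(in_dim)]
            for _ in range(state_dim)
        ]

    def predict(self, state: List[float], action: List[float]) -> List[float]:
        """Return state + W @ concat(state, action)."""
        x = state + action  # list concatenation, not element-wise addition
        delta = [sum(w * v for w, v in zip(row, x)) for row in self.weights]
        return [s + d for s, d in zip(state, delta)]


model = RandomLinearDelta(state_dim=2, action_dim=1)
first = model.predict([1.0, -1.0], [0.5])
second = model.predict([1.0, -1.0], [0.5])  # same inputs, same prediction
```

Because the model predicts a delta, an all-zero input leaves the state unchanged.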

class layeredrl.nets.FixedEncoderNet(mapped_env_obs_shape: Tuple[int, ...], latent_state_dims: List[int], context_dims: List[int], device: device = device(type='cpu'))[source]

Bases: Encoder

A fixed map to a latent space picking out some dimensions of the observation.

__init__(mapped_env_obs_shape: Tuple[int, ...], latent_state_dims: List[int], context_dims: List[int], device: device = device(type='cpu'))[source]

Initialize the network.

Parameters:

mapped_env_obs_shape – The shape of the mapped environment observation space.
latent_state_dims – Which dimensions of the mapped observation to use as the latent state.
context_dims – Which dimensions of the mapped observation to use as the context.
device – The device to use.

property context_dim: int: Dimension of the context space.

decode(latent_state: Tensor, context: Tensor) → Tensor[source]: Decode the latent state.

forward(mapped_env_obs: Tensor) → Tuple[Tensor, Tensor][source]: Compute the latent state and context.

property latent_state_dim: int: Dimension of the latent state space.
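
A fixed encoder of this kind amounts to index selection. The sketch below shows the selection and its inverse on a single observation, using plain lists in place of tensors. The function names and the assumption that latent_state_dims and context_dims together cover every observation index exactly once are illustrative, not taken from the library.

```python
from typing import List, Tuple


def encode(obs: List[float], latent_state_dims: List[int],
           context_dims: List[int]) -> Tuple[List[float], List[float]]:
    """Pick the configured indices out of the observation."""
    latent_state = [obs[i] for i in latent_state_dims]
    context = [obs[i] for i in context_dims]
    return latent_state, context


def decode(latent_state: List[float], context: List[float],
           latent_state_dims: List[int], context_dims: List[int]) -> List[float]:
    """Scatter latent state and context back to their original positions."""
    obs = [0.0] * (len(latent_state_dims) + len(context_dims))
    for index, value in zip(latent_state_dims, latent_state):
        obs[index] = value
    for index, value in zip(context_dims, context):
        obs[index] = value
    return obs


observation = [10.0, 20.0, 30.0, 40.0]
latent, ctx = encode(observation, latent_state_dims=[0, 2], context_dims=[1, 3])
restored = decode(latent, ctx, latent_state_dims=[0, 2], context_dims=[1, 3])
```

Under the stated assumption, decoding recovers the original observation exactly.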

class layeredrl.nets.Encoder[source]

Bases: ABC, Module

__init__()[source]

abstract property context_dim: int: Dimension of the context space.

abstract property latent_state_dim: int: Dimension of the latent state space.
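
To illustrate how the two abstract properties constrain subclasses, here is a stand-in written with the standard abc module only. EncoderInterface and ConstantDimsEncoder are hypothetical names; the real Encoder additionally derives from Module, which this sketch leaves out.

```python
from abc import ABC, abstractmethod


class EncoderInterface(ABC):
    """Stand-in for the abstract encoder interface (without Module)."""

    @property
    @abstractmethod
    def latent_state_dim(self) -> int:
        """Dimension of the latent state space."""

    @property
    @abstractmethod
    def context_dim(self) -> int:
        """Dimension of the context space."""


class ConstantDimsEncoder(EncoderInterface):
    """Concrete subclass that reports fixed dimensions."""

    def __init__(self, latent_state_dim: int, context_dim: int) -> None:
        self._latent_state_dim = latent_state_dim
        self._context_dim = context_dim

    @property
    def latent_state_dim(self) -> int:
        return self._latent_state_dim

    @property
    def context_dim(self) -> int:
        return self._context_dim


encoder = ConstantDimsEncoder(latent_state_dim=3, context_dim=1)
```

A subclass that omits either property cannot be instantiated; Python raises TypeError at construction time.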

class layeredrl.nets.EncoderNet(mapped_env_obs_shape: Tuple[int, ...], latent_state_dim: int, context_dim: int, standardize: bool = False, bn_momentum: float = 0.1, freeze_after: int = None, device: device = device(type='cpu'))[source]

Bases: Encoder

A linear encoder mapping env obs to a latent state and a context variable.

__init__(mapped_env_obs_shape: Tuple[int, ...], latent_state_dim: int, context_dim: int, standardize: bool = False, bn_momentum: float = 0.1, freeze_after: int = None, device: device = device(type='cpu'))[source]

Initialize the network.

Parameters:

mapped_env_obs_shape – The shape of the mapped environment observation space.
latent_state_dim – The dimension of the latent state space.
context_dim – The dimension of the context space.
standardize – Whether to standardize the input.
bn_momentum – The momentum for the batch norm.
freeze_after – The number of batches after which to freeze the normalization.
device – The device to use.
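
How standardize, bn_momentum, and freeze_after might interact can be sketched with a scalar running standardizer. This is an assumption-laden illustration, not the library's implementation: the class name RunningStandardizer, the eps term, and the choice to normalize with the running statistics are all hypothetical. The update rule is new = (1 - momentum) * old + momentum * batch statistic.

```python
from typing import List, Optional


class RunningStandardizer:
    """Standardize scalar inputs with running statistics that can be frozen."""

    def __init__(self, momentum: float = 0.1,
                 freeze_after: Optional[int] = None, eps: float = 1e-5) -> None:
        self.momentum = momentum
        self.freeze_after = freeze_after
        self.eps = eps
        self.mean = 0.0
        self.var = 1.0
        self.batches_seen = 0

    @property
    def frozen(self) -> bool:
        """True once freeze_after batches have updated the statistics."""
        return self.freeze_after is not None and self.batches_seen >= self.freeze_after

    def __call__(self, batch: List[float]) -> List[float]:
        if not self.frozen:
            batch_mean = sum(batch) / len(batch)
            batch_var = sum((x - batch_mean) ** 2 for x in batch) / len(batch)
            # Exponential moving average of the batch statistics.
            self.mean = (1 - self.momentum) * self.mean + self.momentum * batch_mean
            self.var = (1 - self.momentum) * self.var + self.momentum * batch_var
            self.batches_seen += 1
        return [(x - self.mean) / (self.var + self.eps) ** 0.5 for x in batch]


norm = RunningStandardizer(momentum=0.5, freeze_after=2)
norm([0.0, 2.0])
norm([4.0, 4.0])
stats_before = (norm.mean, norm.var)
norm([100.0, 300.0])  # statistics are frozen, so this batch does not change them
stats_after = (norm.mean, norm.var)
```

Freezing keeps the input scaling stable for downstream networks once enough batches have been observed.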

property context_dim: int: Dimension of the context space.

forward(mapped_env_obs: Tensor) → Tuple[Tensor, Tensor][source]

Compute the latent state and context from the mapped environment observation.

property latent_state_dim: int: Dimension of the latent state space.

class layeredrl.nets.ValueNet(state_dim: int, context_dim: int, hidden_sizes: int = [128, 128], nonlinearity: ~typing.Any = <class 'torch.nn.modules.activation.ReLU'>, device: ~torch.device = device(type='cpu'))[source]

Bases: Module

A value network that takes in state, context, and action (which is ignored).

__init__(state_dim: int, context_dim: int, hidden_sizes: int = [128, 128], nonlinearity: ~typing.Any = <class 'torch.nn.modules.activation.ReLU'>, device: ~torch.device = device(type='cpu'))[source]

Initialize the network.

Parameters:

state_dim – The dimension of the state space.
context_dim – The dimension of the context space.
hidden_sizes – A list of hidden sizes for each layer.
nonlinearity – The nonlinearity to use.

forward(state: Tensor, context: Tensor, action: Tensor) → Tensor[source]

Concatenate the state and context and feed it through the network.

Ignores action.

class layeredrl.nets.RewardNet(state_dim: int, context_dim: int, action_dim: int, hidden_sizes: int = [128, 128], nonlinearity: ~typing.Any = <class 'torch.nn.modules.activation.ReLU'>, device: ~torch.device = device(type='cpu'))[source]

Bases: Module

A reward function network that takes in state, context, and action.

__init__(state_dim: int, context_dim: int, action_dim: int, hidden_sizes: int = [128, 128], nonlinearity: ~typing.Any = <class 'torch.nn.modules.activation.ReLU'>, device: ~torch.device = device(type='cpu'))[source]

Initialize the network.

Parameters:

state_dim – The dimension of the state space.
context_dim – The dimension of the context space.
action_dim – The dimension of the action space.
hidden_sizes – A list of hidden sizes for each layer.
nonlinearity – The nonlinearity to use.

forward(state: Tensor, context: Tensor, action: Tensor) → Tensor[source]: Concatenate state, context and action and feed them to the network.

class layeredrl.nets.IdentityEncoder(mapped_env_obs_shape: Tuple[int, ...], device: device = device(type='cpu'))[source]

Bases: Encoder

Uses mapped environment observation directly as latent state.

__init__(mapped_env_obs_shape: Tuple[int, ...], device: device = device(type='cpu'))[source]

Initialize the network.

Parameters:

mapped_env_obs_shape – The shape of the mapped environment observation space.
device – The device to use.

property context_dim: int: Dimension of the context space.

decode(latent_state: Tensor, context: Tensor) → Tensor[source]: Decode the latent state.

forward(mapped_env_obs: Tensor) → Tuple[Tensor, Tensor][source]: Compute the latent state and context.

property latent_state_dim: int: Dimension of the latent state space.