Neural Networks API
- class layeredrl.nets.ConcatNet(mapped_env_obs_shape: int, level_input_dim: int, level_state_dims: Dict[str, int], scale_remaining_steps: float = 1.0, *args, **kwargs)[source]
Bases: Net
A network that concatenates mapped_env_obs, level_input, action, n_remaining_states, and level_state along the last dimension.
For use with Tianshou level.
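The concatenation described above can be sketched in plain PyTorch. This is only an illustration, not the layeredrl implementation: the helper name `concat_level_inputs`, the multiplicative use of `scale_remaining_steps`, and the sorted key order for the level state are all assumptions.

```python
import torch


def concat_level_inputs(mapped_env_obs, level_input, action, n_remaining,
                        level_state, scale_remaining_steps=1.0):
    """Hypothetical helper: join all inputs along the last dimension."""
    # Assumption: the remaining-step count is scaled multiplicatively.
    n_remaining = n_remaining * scale_remaining_steps
    # Assumption: level_state is a dict of tensors, joined in sorted key order.
    state_parts = [level_state[key] for key in sorted(level_state)]
    return torch.cat(
        [mapped_env_obs, level_input, action, n_remaining, *state_parts],
        dim=-1,
    )


batch = 4
out = concat_level_inputs(
    mapped_env_obs=torch.zeros(batch, 5),
    level_input=torch.zeros(batch, 3),
    action=torch.zeros(batch, 2),
    n_remaining=torch.full((batch, 1), 10.0),
    level_state={"a": torch.zeros(batch, 2), "b": torch.zeros(batch, 1)},
    scale_remaining_steps=0.1,
)
print(out.shape)  # torch.Size([4, 14]), i.e. 5 + 3 + 2 + 1 + 2 + 1 features
```

Concatenating along `dim=-1` keeps any leading batch dimensions intact, so the same helper works for batched and unbatched inputs.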
- class layeredrl.nets.Critic(*args, **kwargs)[source]
Bases: Critic
Critic network for Tianshou level.
- class layeredrl.nets.ProbFCDynamics(state_space: ~gymnasium.spaces.box.Box, context_space: ~gymnasium.spaces.box.Box, action_space: ~gymnasium.spaces.box.Box | ~gymnasium.spaces.discrete.Discrete, n_modes: int, hidden_sizes: int = [128, 128], nonlinearity: ~typing.Any = <class 'torch.nn.modules.activation.ReLU'>, one_hot_action: bool = False, input_batch_norm: bool = False, ignore_context: bool = True, device: ~torch.device = device(type='cpu'))[source]
Bases: Module
Dynamics prediction network using a fully connected NN and predicting mean and standard deviation of the next state.
- __init__(state_space: ~gymnasium.spaces.box.Box, context_space: ~gymnasium.spaces.box.Box, action_space: ~gymnasium.spaces.box.Box | ~gymnasium.spaces.discrete.Discrete, n_modes: int, hidden_sizes: int = [128, 128], nonlinearity: ~typing.Any = <class 'torch.nn.modules.activation.ReLU'>, one_hot_action: bool = False, input_batch_norm: bool = False, ignore_context: bool = True, device: ~torch.device = device(type='cpu'))[source]
Initialize the model.
- Parameters:
state_space – The state space.
context_space – The context space (containing static information).
action_space – The action space.
n_modes – The number of modes to predict (in a mixture model).
hidden_sizes – A list of hidden sizes for each layer.
nonlinearity – The nonlinearity to use.
one_hot_action – Whether the action comes already one-hot encoded.
input_batch_norm – Whether to use batch normalization on the input.
ignore_context – Whether to ignore the context and not concatenate it to the state.
device – The device to use.
- forward(state: Tensor, context: Tensor, action: Tensor) Tensor[source]
Predict the next state given the current state and action.
- Parameters:
state – The current state.
context – The context, i.e., information that is constant over timesteps.
action – The action.
- Returns:
Mean and standard deviation of next state, and termination probability.
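To make the shape of such a prediction concrete, here is a minimal sketch of a fully connected network with separate heads for the mean, the standard deviation, and the termination probability. The class `TinyProbDynamics`, its head layout, and the softplus and sigmoid output transforms are assumptions for illustration; the actual `ProbFCDynamics` may be structured differently. The context is left out, mirroring the default `ignore_context=True`.

```python
import torch
from torch import nn


class TinyProbDynamics(nn.Module):
    """Hypothetical stand-in, not the layeredrl implementation."""

    def __init__(self, state_dim, action_dim, n_modes, hidden_sizes=(128, 128)):
        super().__init__()
        layers, in_dim = [], state_dim + action_dim
        for hidden in hidden_sizes:
            layers += [nn.Linear(in_dim, hidden), nn.ReLU()]
            in_dim = hidden
        self.body = nn.Sequential(*layers)
        self.n_modes, self.state_dim = n_modes, state_dim
        # One mean and one standard deviation per mode and state dimension.
        self.mean_head = nn.Linear(in_dim, n_modes * state_dim)
        self.std_head = nn.Linear(in_dim, n_modes * state_dim)
        self.term_head = nn.Linear(in_dim, 1)

    def forward(self, state, action):
        features = self.body(torch.cat([state, action], dim=-1))
        shape = (*state.shape[:-1], self.n_modes, self.state_dim)
        mean = self.mean_head(features).reshape(shape)
        # Softplus keeps the standard deviation strictly positive.
        std = nn.functional.softplus(self.std_head(features)).reshape(shape) + 1e-6
        # Sigmoid maps the termination logit to a probability.
        term_prob = torch.sigmoid(self.term_head(features))
        return mean, std, term_prob


model = TinyProbDynamics(state_dim=3, action_dim=2, n_modes=2, hidden_sizes=(16, 16))
mean, std, term_prob = model(torch.randn(5, 3), torch.randn(5, 2))
print(mean.shape, std.shape, term_prob.shape)
```

Each of the `n_modes` components gets its own mean and standard deviation, which is what a mixture model over next states needs.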
- class layeredrl.nets.RandomDynamics(state_space: Box, action_space: Box | Discrete, std: float = 0.3, device: device = device(type='cpu'), n_modes: int = 1)[source]
Bases: Module
Random but fixed dynamics obtained from a random linear transformation.
Note: Assumes only state deltas are to be predicted.
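A minimal sketch of "random but fixed" dynamics follows: a weight matrix is drawn once and then never trained, and the output is interpreted as a state delta. The class `TinyRandomDynamics`, the `seed` argument, and the use of `std` as the scale of the random weights are assumptions, not the documented behavior of `RandomDynamics`.

```python
import torch


class TinyRandomDynamics(torch.nn.Module):
    """Hypothetical stand-in, not the layeredrl implementation."""

    def __init__(self, state_dim, action_dim, std=0.3, seed=0):
        super().__init__()
        gen = torch.Generator().manual_seed(seed)
        # Assumption: std scales the entries of the random linear map.
        weight = std * torch.randn(state_dim, state_dim + action_dim, generator=gen)
        # A buffer is saved with the module but is not a trainable parameter.
        self.register_buffer("weight", weight)

    def forward(self, state, action):
        x = torch.cat([state, action], dim=-1)
        return x @ self.weight.T  # predicted state delta


dyn = TinyRandomDynamics(state_dim=3, action_dim=2)
state, action = torch.ones(4, 3), torch.zeros(4, 2)
delta = dyn(state, action)
next_state = state + delta  # the delta is added to the current state
```

Registering the matrix as a buffer rather than a parameter means an optimizer never updates it, so the dynamics stay fixed after construction.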
- class layeredrl.nets.FixedEncoderNet(mapped_env_obs_shape: Tuple[int, ...], latent_state_dims: List[int], context_dims: List[int], device: device = device(type='cpu'))[source]
Bases: Encoder
A fixed map to a latent space picking out some dimensions of the observation.
- __init__(mapped_env_obs_shape: Tuple[int, ...], latent_state_dims: List[int], context_dims: List[int], device: device = device(type='cpu'))[source]
Initialize the network.
- Parameters:
mapped_env_obs_shape – The shape of the mapped environment observation space.
latent_state_dims – Which dimensions of the mapped observation to use as the latent state.
context_dims – Which dimensions of the mapped observation to use as the context.
device – The device to use.
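Picking out dimensions of the observation amounts to indexing along the last axis. The function `fixed_encode` below is a hypothetical illustration of that idea, assuming a flat observation vector per batch entry; it is not the `FixedEncoderNet` code.

```python
import torch


def fixed_encode(mapped_env_obs, latent_state_dims, context_dims):
    """Hypothetical helper: select observation entries by index."""
    latent_state = mapped_env_obs[..., latent_state_dims]
    context = mapped_env_obs[..., context_dims]
    return latent_state, context


obs = torch.tensor([[0.0, 1.0, 2.0, 3.0],
                    [4.0, 5.0, 6.0, 7.0]])
latent_state, context = fixed_encode(obs, latent_state_dims=[0, 2], context_dims=[3])
print(latent_state.tolist())  # [[0.0, 2.0], [4.0, 6.0]]
print(context.tolist())       # [[3.0], [7.0]]
```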
- class layeredrl.nets.EncoderNet(mapped_env_obs_shape: Tuple[int, ...], latent_state_dim: int, context_dim: int, standardize: bool = False, bn_momentum: float = 0.1, freeze_after: int = None, device: device = device(type='cpu'))[source]
Bases: Encoder
A linear encoder mapping env obs to a latent state and a context variable.
- __init__(mapped_env_obs_shape: Tuple[int, ...], latent_state_dim: int, context_dim: int, standardize: bool = False, bn_momentum: float = 0.1, freeze_after: int = None, device: device = device(type='cpu'))[source]
Initialize the network.
- Parameters:
mapped_env_obs_shape – The shape of the mapped environment observation space.
latent_state_dim – The dimension of the latent state space.
context_dim – The dimension of the context space.
standardize – Whether to standardize the input.
bn_momentum – The momentum for the batch norm.
freeze_after – The number of batches after which to freeze the normalization.
device – The device to use.
- forward(mapped_env_obs: Tensor) Tuple[Tensor, Tensor][source]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
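The difference between calling the module instance and calling `forward` directly can be seen with a forward hook. The example uses a plain `nn.Linear` as a stand-in for an encoder; the variable names are illustrative only.

```python
import torch
from torch import nn

encoder = nn.Linear(4, 3)  # stand-in module for illustration
calls = []

# The hook records the output shape every time it runs.
handle = encoder.register_forward_hook(
    lambda module, inputs, output: calls.append(tuple(output.shape))
)

x = torch.zeros(2, 4)
encoder(x)          # goes through __call__, so the hook runs
encoder.forward(x)  # bypasses __call__, so the hook is skipped
handle.remove()

print(calls)  # [(2, 3)]
```

Only one entry is recorded even though the forward pass ran twice, which is why calling the instance is the recommended form.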
- class layeredrl.nets.ValueNet(state_dim: int, context_dim: int, hidden_sizes: int = [128, 128], nonlinearity: ~typing.Any = <class 'torch.nn.modules.activation.ReLU'>, device: ~torch.device = device(type='cpu'))[source]
Bases: Module
A value network that takes in state, context, and action (which is ignored).
- __init__(state_dim: int, context_dim: int, hidden_sizes: int = [128, 128], nonlinearity: ~typing.Any = <class 'torch.nn.modules.activation.ReLU'>, device: ~torch.device = device(type='cpu'))[source]
Initialize the network.
- Parameters:
state_dim – The dimension of the state space.
context_dim – The dimension of the context space.
hidden_sizes – A list of hidden sizes for each layer.
nonlinearity – The nonlinearity to use.
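A value network that accepts an action but ignores it can be sketched as below. `TinyValueNet` is a hypothetical stand-in built from the documented parameters; the layer layout and the scalar output are assumptions.

```python
import torch
from torch import nn


class TinyValueNet(nn.Module):
    """Hypothetical stand-in, not the layeredrl implementation."""

    def __init__(self, state_dim, context_dim, hidden_sizes=(128, 128),
                 nonlinearity=nn.ReLU):
        super().__init__()
        layers, in_dim = [], state_dim + context_dim
        for hidden in hidden_sizes:
            layers += [nn.Linear(in_dim, hidden), nonlinearity()]
            in_dim = hidden
        layers.append(nn.Linear(in_dim, 1))  # scalar value estimate
        self.net = nn.Sequential(*layers)

    def forward(self, state, context, action=None):
        # The action is accepted for interface compatibility but not used.
        return self.net(torch.cat([state, context], dim=-1))


net = TinyValueNet(state_dim=3, context_dim=2, hidden_sizes=(16,))
state, context = torch.randn(5, 3), torch.randn(5, 2)
v_zero = net(state, context, torch.zeros(5, 4))
v_one = net(state, context, torch.ones(5, 4))
print(torch.equal(v_zero, v_one))  # True: the action does not affect the value
```

Keeping the action in the signature lets the value network share a call convention with networks that do depend on the action, such as a reward model.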
- class layeredrl.nets.RewardNet(state_dim: int, context_dim: int, action_dim: int, hidden_sizes: int = [128, 128], nonlinearity: ~typing.Any = <class 'torch.nn.modules.activation.ReLU'>, device: ~torch.device = device(type='cpu'))[source]
Bases: Module
A reward function network that takes in state, context, and action.
- __init__(state_dim: int, context_dim: int, action_dim: int, hidden_sizes: int = [128, 128], nonlinearity: ~typing.Any = <class 'torch.nn.modules.activation.ReLU'>, device: ~torch.device = device(type='cpu'))[source]
Initialize the network.
- Parameters:
state_dim – The dimension of the state space.
context_dim – The dimension of the context space.
action_dim – The dimension of the action space.
hidden_sizes – A list of hidden sizes for each layer.
nonlinearity – The nonlinearity to use.
- class layeredrl.nets.IdentityEncoder(mapped_env_obs_shape: Tuple[int, ...], device: device = device(type='cpu'))[source]
Bases: Encoder
Uses mapped environment observation directly as latent state.
- __init__(mapped_env_obs_shape: Tuple[int, ...], device: device = device(type='cpu'))[source]
Initialize the network.
- Parameters:
mapped_env_obs_shape – The shape of the mapped environment observation space.
device – The device to use.