Models API

class layeredrl.models.Model(n_models: int = 1, n_particles_per_model: int = 1, device: device = device(type='cpu'))

Bases: ABC, Module

Abstract base class for dynamics models.

__init__(n_models: int = 1, n_particles_per_model: int = 1, device: device = device(type='cpu'))

Initialize the model.

Parameters:
  • n_models – The number of models in the ensemble.

  • n_particles_per_model – The number of particles per model.

  • device – The device to use.

abstractmethod get_log_prob(state: Tensor, context: Tensor, action: Tensor, next_state: Tensor) -> Tuple

Get the log probability (density) of the next state, the termination probability, and an info dict.

Note that everything is assumed to have a ‘batch’ dimension, useful for parallelization.

Parameters:
  • state – The current state.

  • context – The context, i.e., information that is constant over timesteps.

  • action – The action.

  • next_state – The next state.

Returns:

A tuple containing:

  • The log probability (density) of the next state given the current state and action under the model.

  • The termination probability given the current state and action under the model.

  • An info dict with additional information.
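To make the batched log-density concrete, here is a minimal sketch for a diagonal Gaussian. It uses NumPy arrays as stand-ins for tensors, and the helper name gaussian_log_prob is hypothetical; a concrete Model subclass may use a different distribution.

```python
import numpy as np

def gaussian_log_prob(next_state, mean, std):
    """Log density of next_state under a diagonal Gaussian, one value per batch row."""
    var = std ** 2
    per_dim = -0.5 * (np.log(2.0 * np.pi * var) + (next_state - mean) ** 2 / var)
    return per_dim.sum(axis=-1)  # sum over state_dim, keep the batch dimension

batch_size, state_dim = 4, 3
rng = np.random.default_rng(0)
mean = rng.normal(size=(batch_size, state_dim))
std = np.full((batch_size, state_dim), 0.5)
next_state = mean.copy()  # evaluate the density at its mode

log_prob = gaussian_log_prob(next_state, mean, std)
print(log_prob.shape)
```

Every input carries the leading batch dimension, and the result has one log-density per batch entry.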

abstractmethod get_parameters() -> Tensor

Get the parameters of the model.

Returns:

An iterator over the parameters of the model.

get_prob(state: Tensor, context: Tensor, action: Tensor, next_state: Tensor) -> Tuple

Get probability (density) of next state, the termination probability, and an info dict.

Note that everything is assumed to have a ‘batch’ dimension, useful for parallelization.

Parameters:
  • state – The current state.

  • context – The context, i.e., information that is constant over timesteps.

  • action – The action.

  • next_state – The next state.

Returns:

A tuple containing:

  • The probability (density) of the next state given the current state and action under the model.

  • The termination probability given the current state and action under the model.

  • An info dict with additional information.
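The value returned here is a density, not a probability mass, so it can exceed 1 for narrow distributions. The sketch below illustrates this with a one-dimensional Gaussian and recovers the density by exponentiating the log-density. Whether get_prob is implemented this way internally is an assumption; normal_log_pdf is a hypothetical helper.

```python
import math

def normal_log_pdf(x, mean, std):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * math.log(2.0 * math.pi * std * std) - 0.5 * ((x - mean) / std) ** 2

log_p = normal_log_pdf(0.0, mean=0.0, std=0.1)
p = math.exp(log_p)  # density at the mean of a narrow Gaussian
print(p > 1.0)
```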

learn(batch_lst: List[Batch]) -> None

Learn from the given list of batches.

Parameters:

batch_lst – A list of training batches. The first dimension corresponds to the transitions.

Returns:

The loss after the updates.

abstractmethod loss(batch: Batch) -> Tensor

Compute the loss for the given batch.

Parameters:
  • batch – The batch. The first dimension corresponds to the batch dimension (e.g. environments), for example the first entry of batch.state.shape.

Returns:

The loss.

abstractmethod predict(state: Tensor, context: Tensor, action: Tensor, std: Tensor | None = None) -> Tuple[Tensor, ...]

Predict the next state given the current state and action.

Note that everything is assumed to have a ‘batch’ dimension, useful for parallelizing, e.g. for vectorized environments. This uses the mean of the predicted distribution and does not sample.

Parameters:
  • state – The current state.

  • context – The context, i.e., information that is constant over timesteps.

  • action – The action.

  • std – Overwrites the standard deviation that the model predicts if provided.

Returns:

Mean, weights and standard deviations of the modes of the mixture of Gaussians that make up the ensemble, plus the averaged termination probability.

  • Shape for state and std: (batch_size, n_models, n_modes, state_dim)

  • Shape for weights: (batch_size, n_models, n_modes)

  • Shape for term_prob: (batch_size)
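The shapes above can be reduced to a single point prediction per batch entry. The sketch below uses NumPy arrays in place of tensors and random data in place of a real predict call. It assumes that the weights sum to 1 over the modes of each model and that ensemble members are weighted equally; neither is stated in this reference.

```python
import numpy as np

batch_size, n_models, n_modes, state_dim = 2, 3, 4, 5
rng = np.random.default_rng(0)

# Stand-ins for the mean and weights returned by predict.
mean = rng.normal(size=(batch_size, n_models, n_modes, state_dim))
logits = rng.normal(size=(batch_size, n_models, n_modes))
weights = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# Mixture mean of each ensemble member: weighted sum over the mode axis.
per_model_mean = (weights[..., None] * mean).sum(axis=2)  # (batch_size, n_models, state_dim)

# Ensemble mean: uniform average over members (assumption).
ensemble_mean = per_model_mean.mean(axis=1)  # (batch_size, state_dim)
print(per_model_mean.shape, ensemble_mean.shape)
```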

abstractmethod rollout(initial_state: Tensor, context: Tensor, actions: Tensor, deterministic: bool = False) -> Tensor

Rollout with the given actions from the given initial state (open loop).

Note that everything is assumed to have a ‘batch’ dimension, useful for parallelizing, e.g. for vectorized environments.

Parameters:
  • initial_state – The initial state. Shape: (batch_size, state_dim)

  • context – The context, i.e., information that is constant over the whole rollout.

  • actions – The actions. Shape: (batch_size, horizon, action_dim)

  • deterministic – Whether to use the mean of the predicted distribution or to sample from it.

Returns:

Batch containing the resulting states, termination probabilities, and aleatoric and epistemic uncertainties.

  • state shape: (batch_size, n_models, n_particles_per_model, horizon + 1, state_dim)

  • state_mean shape: (batch_size, n_models, horizon + 1, state_dim)

  • term_prob shape: (batch_size, n_models, n_particles_per_model, horizon)
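To illustrate where the horizon + 1 and particle axes come from, here is a toy open-loop rollout in NumPy. The dynamics (state plus a per-model gain times the action) and the function toy_rollout are invented for illustration and require action_dim == state_dim; they are not the library's model.

```python
import numpy as np

def toy_rollout(initial_state, actions, n_models=2, n_particles_per_model=3,
                noise_std=0.1, deterministic=False, seed=0):
    """Open-loop rollout of a toy additive model; returns the stacked states."""
    rng = np.random.default_rng(seed)
    batch_size, state_dim = initial_state.shape
    horizon = actions.shape[1]
    # A different gain per ensemble member stands in for model disagreement.
    gains = np.linspace(0.9, 1.1, n_models).reshape(1, n_models, 1, 1)
    shape = (batch_size, n_models, n_particles_per_model, state_dim)
    state = np.broadcast_to(initial_state[:, None, None, :], shape).copy()
    states = [state]  # the initial state is the first of horizon + 1 entries
    for t in range(horizon):
        delta = gains * actions[:, t][:, None, None, :]
        if not deterministic:
            delta = delta + noise_std * rng.normal(size=shape)  # per-particle noise
        state = state + delta
        states.append(state)
    return np.stack(states, axis=3)

initial_state = np.zeros((2, 3))
actions = np.ones((2, 5, 3))  # (batch_size, horizon, action_dim)
states = toy_rollout(initial_state, actions, deterministic=True)
print(states.shape)
```

With deterministic=True all particles of one model follow the same trajectory; sampling makes them diverge.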

abstractmethod sample(state: Tensor, context: Tensor, action: Tensor) -> Tensor

Sample the next state given the current state and action.

Note that everything is assumed to have a ‘batch’ dimension, useful for parallelizing, e.g. for vectorized environments.

Parameters:
  • state – The current state.

  • context – The context, i.e., information that is constant over timesteps.

  • action – The action.

Returns:

The sampled next state.

set_n_total_env_steps(n_total_env_steps: int) -> None

Set the total number of environment steps.

Parameters:

n_total_env_steps – The total number of environment steps.
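One plausible use of the step counter is to gate step-dependent behaviour, such as the sb_start_duration option of ProbabilisticEnsemble. The class below is a hypothetical stand-in written for illustration; it is not the library's implementation, and the method use_symmetry_breaking_model does not exist in layeredrl.

```python
class StepGate:
    """Hypothetical stand-in that switches behaviour based on total env steps."""

    def __init__(self, sb_start_duration: int = 100_000):
        self.sb_start_duration = sb_start_duration
        self.n_total_env_steps = 0

    def set_n_total_env_steps(self, n_total_env_steps: int) -> None:
        self.n_total_env_steps = n_total_env_steps

    def use_symmetry_breaking_model(self, loss_mode: bool = False) -> bool:
        # In loss mode the true model is always used (assumption based on
        # the loss_mode parameter description).
        return (not loss_mode) and self.n_total_env_steps < self.sb_start_duration

gate = StepGate(sb_start_duration=100_000)
gate.set_n_total_env_steps(50_000)
early = gate.use_symmetry_breaking_model()
gate.set_n_total_env_steps(150_000)
late = gate.use_symmetry_breaking_model()
print(early, late)
```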

class layeredrl.models.ProbabilisticEnsemble(state_space: Box, context_space: Box, action_space: Box | Discrete, partial_net: Callable[[Box, Box, Box | Discrete], Module], n_models: int = 1, n_modes: int = 1, n_particles_per_model: int = 1, learning_rate: float = 0.001, create_optimizer: bool = False, device: device = device(type='cpu'), predict_delta: bool = True, normalize_targets: bool = True, target_bn_momentum: float = 0.01, weighted_loss: bool = False, symmetry_breaking_start: bool = False, sb_start_duration: int = 100000, sb_start_factor: float = 1.5, partition_batch: bool = False, **kwargs)

Bases: Model

Implementation of a Probabilistic Ensemble of dynamics models.

See https://arxiv.org/abs/1805.12114.

__init__(state_space: Box, context_space: Box, action_space: Box | Discrete, partial_net: Callable[[Box, Box, Box | Discrete], Module], n_models: int = 1, n_modes: int = 1, n_particles_per_model: int = 1, learning_rate: float = 0.001, create_optimizer: bool = False, device: device = device(type='cpu'), predict_delta: bool = True, normalize_targets: bool = True, target_bn_momentum: float = 0.01, weighted_loss: bool = False, symmetry_breaking_start: bool = False, sb_start_duration: int = 100000, sb_start_factor: float = 1.5, partition_batch: bool = False, **kwargs)

Initialize the model.

Parameters:
  • state_space – The state space.

  • context_space – The context space (containing static information).

  • action_space – The action space.

  • partial_net – A function that takes in the state, context and action space and the number of modes and returns a randomly initialized neural network. This neural network should take in a state and an action and return: (state_mean, state_std), termination probability

  • n_models – The number of models in the ensemble.

  • n_modes – The number of modes to use for the Gaussian mixture model.

  • n_particles_per_model – The number of particles per model.

  • learning_rate – The learning rate for the supervised learning of the model.

  • create_optimizer – Whether to create an optimizer for the model.

  • device – The device to use.

  • predict_delta – Whether to internally predict the change in state instead of the next state.

  • normalize_targets – Whether to normalize the targets of the model during learning. If predict_delta is True, the delta is normalized.

  • target_bn_momentum – The momentum for the batch normalization of the targets.

  • weighted_loss – Whether to weight the contribution of each transition to the loss. Requires the batch to have a ‘weight’ key.

  • symmetry_breaking_start – Whether to start with a fixed but random model for a specified number of calls to the learn method. This is useful to break the symmetry between skills when using the model with a skill learning method like SPlaTES or DADS.

  • sb_start_duration – The number of total env steps during which to use the fixed but random model.

  • sb_start_factor – The factor by which to multiply the deltas predicted by the random model during the symmetry breaking start.

  • partition_batch – Whether to partition the batch such that each network is trained on a different batch.

  • **kwargs – Additional keyword arguments for Model class.
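The interplay of predict_delta, normalize_targets and target_bn_momentum can be sketched as follows. The class RunningTargetNormalizer is hypothetical, and the exponential-moving-average update is an assumption about what "momentum" means here; the library's internals may differ.

```python
import numpy as np

class RunningTargetNormalizer:
    """Hypothetical helper tracking running target statistics with momentum."""

    def __init__(self, dim, momentum=0.01, eps=1e-5):
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)
        self.momentum = momentum
        self.eps = eps

    def update(self, targets):
        m = self.momentum
        self.mean = (1.0 - m) * self.mean + m * targets.mean(axis=0)
        self.var = (1.0 - m) * self.var + m * targets.var(axis=0)

    def normalize(self, targets):
        return (targets - self.mean) / np.sqrt(self.var + self.eps)

    def denormalize(self, normalized):
        return normalized * np.sqrt(self.var + self.eps) + self.mean

rng = np.random.default_rng(0)
state = rng.normal(size=(256, 3))
next_state = state + 0.2 + 0.05 * rng.normal(size=(256, 3))

delta = next_state - state  # with predict_delta, the target is the state change
normalizer = RunningTargetNormalizer(dim=3, momentum=0.01)
for _ in range(500):
    normalizer.update(delta)  # running statistics move slowly towards the batch statistics

normalized_delta = normalizer.normalize(delta)  # what the network would regress on
recovered_next_state = state + normalizer.denormalize(normalized_delta)
print(np.allclose(recovered_next_state, next_state))
```

Predicting a normalized delta keeps the regression targets on a similar scale across state dimensions; the prediction is mapped back by denormalizing and adding the current state.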

get_log_prob(state: Tensor, context: Tensor, action: Tensor, next_state: Tensor, std: Tensor | None = None, loss_mode: bool = False, transform: Module | None = None) -> Tuple

Get log probability (density) of next state and the termination probability for full ensemble.

Note that everything is assumed to have a ‘batch’ dimension, useful for parallelization.

Parameters:
  • state – The current state.

  • context – The context, i.e., information that is constant over timesteps.

  • action – The action.

  • next_state – The next state.

  • std – Overwrites the standard deviation that the model predicts if provided.

  • loss_mode – Whether to always use the true model instead of the symmetry breaking model.

  • transform – A transformation to apply to the state before computing the log probability. Note that this only makes sense with a given standard deviation, not with the learned one.

Returns:

A tuple containing:

  • The log probability (density) of the next state given the current state and action under the model.

  • The termination probability given the current state and action under the model.

  • An info dict with additional information.
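A log-density for the full ensemble can be computed stably with log-sum-exp. The sketch below treats each member as a Gaussian mixture over its modes and the ensemble as a uniform mixture over members. The uniform weighting and the helper names are assumptions, and NumPy arrays stand in for tensors.

```python
import numpy as np

def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis."""
    x_max = x.max(axis=axis, keepdims=True)
    return np.squeeze(x_max, axis=axis) + np.log(np.exp(x - x_max).sum(axis=axis))

def ensemble_log_prob(next_state, mean, std, weights):
    """mean, std: (B, M, K, D); weights: (B, M, K); next_state: (B, D)."""
    x = next_state[:, None, None, :]
    var = std ** 2
    # Log density of every Gaussian component, summed over state_dim.
    comp = (-0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)).sum(axis=-1)
    per_model = logsumexp(np.log(weights) + comp, axis=-1)  # mix over modes
    n_models = mean.shape[1]
    return logsumexp(per_model, axis=-1) - np.log(n_models)  # uniform mix over members

B, M, K, D = 2, 3, 2, 2
mean = np.zeros((B, M, K, D))
std = np.ones((B, M, K, D))
weights = np.full((B, M, K), 0.5)
log_prob = ensemble_log_prob(np.zeros((B, D)), mean, std, weights)
print(log_prob.shape)
```

When all components are identical, the result equals the log-density of a single Gaussian, which is a quick sanity check.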

get_member_log_prob(id: int, state: Tensor, context: Tensor, action: Tensor, next_state: Tensor, std: Tensor | None = None, loss_mode: bool = False, transform: Module | None = None) -> Tuple[Tensor, ...]

Get the log probability (density) of the next state and the termination probability for a single ensemble member.

If batch normalization is used, it matters whether the Module is in eval or train mode. Note that everything is assumed to have a ‘batch’ dimension, useful for parallelization.

Parameters:
  • id – The index of the ensemble member.

  • state – The current state.

  • context – The context, i.e., information that is constant over timesteps.

  • action – The action.

  • next_state – The next state.

  • std – Overwrites the standard deviation that the model predicts if provided.

  • loss_mode – Whether to always use the true model instead of the symmetry breaking model.

  • transform – A transformation to apply to the state before computing the log probability. Note that this only makes sense with a given standard deviation, not with the learned one.

Returns:

A tuple containing:

  • The log probability (density) of the next state given the current state and action under the model.

  • The termination probability given the current state and action under the model.

  • The mean of the predicted distribution.

get_parameters() -> Generator[Tensor, None, None]

Get the parameters of the model.

Returns:

An iterator over the parameters of the model.

learn(batch_lst: List[Batch]) -> None

Learn from the given list of batches.

Parameters:

batch_lst – A list of training batches. The first dimension corresponds to the transitions.

Returns:

The loss after the updates.

loss(batch: Batch) -> Tensor

Compute the loss for the given batch.

Parameters:
  • batch – The batch. The first dimension corresponds to the batch dimension (e.g. environments), for example the first entry of batch.state.shape.

Returns:

The loss.

predict(state: Tensor, context: Tensor, action: Tensor, std: Tensor | None = None) -> Tuple[Tensor, ...]

Predict the next state given the current state and action.

Note that everything is assumed to have a ‘batch’ dimension, useful for parallelizing, e.g. for vectorized environments. This uses the mean of the predicted distribution and does not sample.

Parameters:
  • state – The current state.

  • context – The context, i.e., information that is constant over timesteps.

  • action – The action.

  • std – Overwrites the standard deviation that the model predicts if provided.

Returns:

Mean, weights and standard deviations of the modes of the mixture of Gaussians that make up the ensemble, plus the averaged termination probability.

  • Shape for state and std: (batch_size, n_models, n_modes, state_dim)

  • Shape for weights: (batch_size, n_models, n_modes)

  • Shape for term_prob: (batch_size)

rollout(initial_state: Tensor, context: Tensor, actions: Tensor, deterministic: bool = False, loss_mode: bool = False) -> Tensor

Rollout with the given actions from the given initial state (open loop).

Note that everything is assumed to have a ‘batch’ dimension, useful for parallelizing, e.g. for vectorized environments.

Parameters:
  • initial_state – The initial state. Shape: (batch_size, state_dim)

  • context – The context, i.e., information that is constant over the whole rollout.

  • actions – The actions. Shape: (batch_size, horizon, action_dim)

  • deterministic – Whether to use the mean of the predicted distribution or to sample from it.

  • loss_mode – Whether to always use the true model instead of the symmetry breaking model.

Returns:

Batch containing the resulting states, termination probabilities, and aleatoric and epistemic uncertainties.

  • state shape: (batch_size, n_models, n_particles_per_model, horizon + 1, state_dim)

  • state_mean shape: (batch_size, n_models, horizon + 1, state_dim)

  • term_prob shape: (batch_size, n_models, n_particles_per_model, horizon)
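This reference does not say how the aleatoric and epistemic uncertainties are computed. A common decomposition for ensembles, shown here purely as an assumption, follows the law of total variance: the average predicted variance across members is the aleatoric part, and the variance of the member means is the epistemic part. The array state_std is a hypothetical stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, n_models, horizon, state_dim = 2, 5, 4, 3

# Stand-ins for per-member predicted means and standard deviations.
state_mean = rng.normal(size=(batch_size, n_models, horizon + 1, state_dim))
state_std = np.full_like(state_mean, 0.2)  # hypothetical per-member noise level

aleatoric = (state_std ** 2).mean(axis=1)  # expected variance within members
epistemic = state_mean.var(axis=1)         # disagreement between member means
total = aleatoric + epistemic              # law of total variance
print(aleatoric.shape, epistemic.shape)
```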

sample(state: Tensor, context: Tensor, action: Tensor) -> Tensor

Sample the next state given the current state and action.

Note that everything is assumed to have a ‘batch’ dimension, useful for parallelizing, e.g. for vectorized environments.

Parameters:
  • state – The current state.

  • context – The context, i.e., information that is constant over timesteps.

  • action – The action.

Returns:

The sampled next state.
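Sampling from an ensemble of Gaussian mixtures can be sketched as ancestral sampling: pick a member, pick one of its modes according to the weights, then draw from that Gaussian. The function sample_next_state and the uniform choice of member are assumptions for illustration, with NumPy arrays standing in for the outputs of predict.

```python
import numpy as np

def sample_next_state(mean, std, weights, rng):
    """mean, std: (B, M, K, D); weights: (B, M, K). Returns samples of shape (B, D)."""
    batch_size, n_models, n_modes, _ = mean.shape
    rows = np.arange(batch_size)
    model_idx = rng.integers(n_models, size=batch_size)  # uniform over members
    cdf = np.cumsum(weights[rows, model_idx], axis=-1)    # (B, K)
    u = rng.uniform(size=(batch_size, 1))
    mode_idx = (u > cdf).sum(axis=-1)                     # inverse-CDF mode choice
    mode_idx = np.minimum(mode_idx, n_modes - 1)          # guard against rounding
    mu = mean[rows, model_idx, mode_idx]
    sigma = std[rows, model_idx, mode_idx]
    return mu + sigma * rng.normal(size=mu.shape)

rng = np.random.default_rng(0)
mean = np.full((4, 2, 3, 5), 7.0)
std = np.zeros_like(mean)  # zero noise makes the outcome easy to check
weights = np.full((4, 2, 3), 1.0 / 3.0)
samples = sample_next_state(mean, std, weights, rng)
print(samples.shape)
```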