Models and Predictors

Each level can have a Predictor object, which contains all models that predict certain aspects of the (Semi-)MDP the level interacts with, such as the

  • dynamics model,

  • reward model,

  • value function, and

  • encoder.

The predictor is responsible for implementing and training these components, and its main purpose is to enable planning. The encoder maps observations to a state and a context, where the context is assumed to be constant within each episode.
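
To make the role of these components in planning concrete, here is a minimal random-shooting sketch in plain Python. It is not LayeredRL code: dynamics, reward, value, score, and plan are hypothetical stand-ins for a 1-D toy problem, and they only illustrate how a dynamics model, a reward model, and a value function could be combined to score candidate action sequences.

```python
import random

# Hypothetical stand-ins for the learned components (not the LayeredRL API).
def dynamics(state, action):
    """Toy 1-D dynamics: the action shifts the state."""
    return state + action

def reward(state, action):
    """Toy reward: being close to the origin is good."""
    return -abs(state)

def value(state):
    """Toy value function, used to bootstrap at the end of the horizon."""
    return -abs(state)

def score(state, actions, gamma=0.99):
    """Discounted return of an action sequence under the models."""
    total, discount = 0.0, 1.0
    for action in actions:
        total += discount * reward(state, action)
        state = dynamics(state, action)
        discount *= gamma
    return total + discount * value(state)

def plan(state, horizon=5, n_candidates=200, seed=0):
    """Random shooting: sample action sequences, return the best first action."""
    rng = random.Random(seed)
    candidates = [
        [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        for _ in range(n_candidates)
    ]
    best = max(candidates, key=lambda actions: score(state, actions))
    return best[0]

first_action = plan(3.0)
```

The value function lets the planner use a short horizon, because it summarizes the return expected after the last simulated step.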

Furthermore, each level has access to the predictor of its parent level (if it exists) via the parent_predictor attribute. This is useful for skill learning methods that estimate a mutual-information-based reward from the dynamics model of that predictor.
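
As an illustration of such a reward, the sketch below follows the estimator popularized by DADS (Dynamics-Aware Unsupervised Discovery of Skills): the log-likelihood of the observed transition under the skill actually executed, minus the log of the average likelihood under skills drawn from the prior. Whether a given LayeredRL skill learner uses exactly this form is an assumption. skill_dynamics_mean is a hypothetical stand-in for a skill-conditioned dynamics model, and everything is 1-D for brevity.

```python
import math

def gaussian_log_prob(x, mean, std):
    """Log density of a 1-D Gaussian."""
    return (
        -0.5 * ((x - mean) / std) ** 2
        - math.log(std)
        - 0.5 * math.log(2 * math.pi)
    )

def skill_dynamics_mean(state, skill):
    """Hypothetical skill-conditioned dynamics: the skill shifts the state."""
    return state + skill

def mi_reward(state, skill, next_state, prior_skills, std=0.1):
    """log q(s'|s,z) - log mean_i q(s'|s,z_i), with z_i drawn from the prior."""
    log_q = gaussian_log_prob(next_state, skill_dynamics_mean(state, skill), std)
    log_qs = [
        gaussian_log_prob(next_state, skill_dynamics_mean(state, z), std)
        for z in prior_skills
    ]
    # log-mean-exp for numerical stability
    m = max(log_qs)
    log_marginal = m + math.log(
        sum(math.exp(lq - m) for lq in log_qs) / len(log_qs)
    )
    return log_q - log_marginal

prior_skills = [-1.0, 0.0, 1.0]
r_distinct = mi_reward(0.0, 1.0, 1.0, prior_skills)   # transition matches the skill
r_confused = mi_reward(0.0, 1.0, -1.0, prior_skills)  # transition matches another skill
```

The reward is high when the transition is predictable from the executed skill but unlikely under other skills, and it is bounded above by the log of the number of prior samples.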

Models

In LayeredRL, a model is a dynamics model: it predicts the next state given the current state, the context, and an action.

s_next_mean, weights, s_next_std, term_prob = model.predict(state, context, action)

Here, weights contains the mixture weights of a mixture of Gaussians whose component means are given in s_next_mean.
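
The following sketch shows how such a mixture prediction can be consumed. It uses plain Python lists and assumes a layout of one row per mixture component (means[k][d] for component k and state dimension d). The actual tensor layout returned by model.predict may differ, and mixture_mean and sample_next_state are illustrative helpers, not part of LayeredRL.

```python
import random

def mixture_mean(means, weights):
    """Expected next state: weighted average of the component means."""
    dim = len(means[0])
    return [sum(w * m[d] for w, m in zip(weights, means)) for d in range(dim)]

def sample_next_state(means, stds, weights, rng):
    """Pick a component according to its weight, then sample from its Gaussian."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return [rng.gauss(m, s) for m, s in zip(means[k], stds[k])]

means = [[0.0, 2.0], [4.0, 6.0]]   # two components, two state dimensions
stds = [[0.1, 0.1], [0.2, 0.2]]
weights = [0.25, 0.75]

expected = mixture_mean(means, weights)  # [3.0, 5.0]
sample = sample_next_state(means, stds, weights, random.Random(0))
```

Note that the mixture mean can lie in a region where no component has much probability mass, so sampling is usually the better choice when the transition is multimodal.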

Predicting repeatedly yields a rollout:

traj = model.rollout(initial_state, context, actions)

where traj is a Batch object with keys for states, termination probabilities, etc.
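
Conceptually, a rollout feeds each predicted state back into the model together with the next action. The sketch below shows that loop with a hypothetical scalar one-step model. The predict function, the plain dict used in place of a Batch, and the key names are assumptions for illustration only.

```python
def predict(state, context, action):
    """Hypothetical one-step model: returns (next_state, termination_prob)."""
    next_state = state + context * action
    term_prob = 0.5 if abs(next_state) > 2.0 else 0.0
    return next_state, term_prob

def rollout(initial_state, context, actions):
    """Apply predict repeatedly, feeding each predicted state back in."""
    states, term_probs = [initial_state], []
    state = initial_state
    for action in actions:
        # The context stays fixed; only the state changes from step to step.
        state, term_prob = predict(state, context, action)
        states.append(state)
        term_probs.append(term_prob)
    return {"states": states, "term_probs": term_probs}

traj = rollout(0.0, 1.0, [1.0, 1.0, 1.0])
# traj["states"]     -> [0.0, 1.0, 2.0, 3.0]
# traj["term_probs"] -> [0.0, 0.0, 0.5]
```

Because each step consumes the previous prediction, model errors compound over the horizon, which is one reason uncertainty estimates are useful for planning.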

Probabilistic Ensemble

ProbabilisticEnsemble is a model that enables uncertainty estimation by learning an ensemble of dynamics models. It furthermore supports several particles per ensemble member and several Gaussian modes per network particle.

from layeredrl.models import ProbabilisticEnsemble

ensemble = ProbabilisticEnsemble(
    state_space=...,
    context_space=...,
    action_space=...,
    partial_net=...,  # factory function for ensemble member networks
    n_models=...,
    n_modes=...,
    n_particles_per_model=...,
)

Benefits:

  • Estimates uncertainty via ensemble disagreement

  • Can be used for exploration bonuses

  • Can model transitions via a mixture of Gaussians
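
One common way to turn ensemble disagreement into a number is to take the variance of the members' predictions. The sketch below does this in plain Python. It is an illustration of the idea rather than the computation ProbabilisticEnsemble performs internally, and ensemble_disagreement and exploration_bonus are hypothetical helper names.

```python
from statistics import mean, pvariance

def ensemble_disagreement(member_predictions):
    """Variance across ensemble members, averaged over state dimensions."""
    per_dim = zip(*member_predictions)  # group predictions by state dimension
    return mean(pvariance(values) for values in per_dim)

def exploration_bonus(member_predictions, scale=1.0):
    """Intrinsic reward that grows with the members' disagreement."""
    return scale * ensemble_disagreement(member_predictions)

# Each inner list is one member's predicted next state.
agree = [[1.0, 2.0], [1.0, 2.0]]
disagree = [[0.0, 0.0], [2.0, 0.0]]

low = ensemble_disagreement(agree)      # 0.0
high = ensemble_disagreement(disagree)  # 0.5
```

Members tend to agree in regions that are well covered by the training data, so a large value indicates transitions the model has rarely seen.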

Predictors

The following predictors are currently implemented in LayeredRL:

Static Predictor

Does not train any of its components. It can be used with known or manually defined dynamics and reward models, for example:

from layeredrl.predictors import StaticPredictor
from layeredrl.nets import IdentityEncoder

encoder = IdentityEncoder(obs.shape)
predictor = StaticPredictor(
    model=model,
    val_func=val_func,
    rew_func=rew_func,
    encoder=encoder,
    latent_state_dim=...,
    context_dim=0,
)

Reward Predictor

Trains the dynamics model, reward function, and value function. Optionally, it also trains the encoder by backpropagating the reward loss through it.

from layeredrl.predictors import RewardPredictor
from layeredrl.nets import EncoderNet

encoder = EncoderNet(
    mapped_env_obs_shape=obs.shape,
    latent_state_dim=...,
    context_dim=...,
)
predictor = RewardPredictor(
    model=model,
    val_func=val_func,
    rew_func=rew_func,
    encoder=encoder,
    latent_state_dim=...,
    context_dim=...,
    learn_encoder=True,
)

Encoders

Encoders map raw observations to learned representations of states and contexts.

Identity Encoder

Passes observations through unchanged as the state:

from layeredrl.nets import IdentityEncoder

encoder = IdentityEncoder(obs.shape)
state, context = encoder(obs)

# state is equal to obs
# context.numel() is 0

Fixed Encoder

Selects the specified observation dimensions as state and context:

from layeredrl.nets import FixedEncoderNet

encoder = FixedEncoderNet(
    mapped_env_obs_shape=obs.shape,
    latent_state_dims=...,  # List of indices making up state
    context_dims=...,  # List of indices making up context
)
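
The effect of such an index-based split can be sketched in a few lines of plain Python. fixed_encode is a hypothetical helper operating on a flat list, not the FixedEncoderNet implementation.

```python
def fixed_encode(obs, state_indices, context_indices):
    """Split a flat observation into state and context by index."""
    state = [obs[i] for i in state_indices]
    context = [obs[i] for i in context_indices]
    return state, context

# Example: the last entry is a per-episode goal and is treated as context.
obs = [0.1, 0.2, 0.3, 0.4]
state, context = fixed_encode(obs, state_indices=[0, 1], context_indices=[3])
# state   -> [0.1, 0.2]
# context -> [0.4]
```

Dimensions that appear in neither index list (index 2 here) are simply dropped.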

Learned Encoder

Trainable linear encoder:

from layeredrl.nets import EncoderNet

encoder = EncoderNet(
    mapped_env_obs_shape=obs.shape,
    latent_state_dim=...,
    context_dim=...,
)
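
For intuition, a linear encoder amounts to an affine map of the observation whose output is split into a state part and a context part. The sketch below uses plain Python lists. The helper linear_encode, the fixed weights, and the convention that the state comes first in the output are assumptions, and a real encoder would learn weight and bias by gradient descent.

```python
def linear_encode(obs, weight, bias, latent_state_dim):
    """Affine map of obs; the first outputs form the state, the rest the context."""
    out = [
        sum(w * o for w, o in zip(row, obs)) + b
        for row, b in zip(weight, bias)
    ]
    return out[:latent_state_dim], out[latent_state_dim:]

# 2-D observation mapped to a 2-D state and a 1-D context.
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bias = [0.0, 0.0, 0.5]
state, context = linear_encode([2.0, 3.0], weight, bias, latent_state_dim=2)
# state   -> [2.0, 3.0]
# context -> [5.5]
```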

Predictor training

The most convenient way to train a predictor is to use it as part of a PlannerLevel. However, a predictor can also be trained manually:

loss, loss_info = predictor.learn(
    buffer,
    n_updates,
    batch_size,
    model_batch_size,
    n_total_env_steps,
)

The arguments are:

  • buffer: A replay buffer containing transitions

  • n_updates: Desired number of updates

  • model_batch_size and batch_size: Batch sizes for the dynamics model and the other models, respectively

  • n_total_env_steps: Total number of environment steps taken across all environments. Relevant for learning rate schedules.
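
To illustrate why the total step count matters, the sketch below shows a linearly decaying learning rate driven by the number of environment steps. The function linear_lr and its default values are hypothetical. The schedules LayeredRL actually supports are not described here.

```python
def linear_lr(n_total_env_steps, lr_start=1e-3, lr_end=1e-4, decay_steps=100_000):
    """Linearly interpolate from lr_start to lr_end over decay_steps env steps."""
    frac = min(n_total_env_steps / decay_steps, 1.0)
    return lr_start + frac * (lr_end - lr_start)

# Steps are counted across all environments collectively.
n_envs, steps_per_env = 4, 12_500
n_total_env_steps = n_envs * steps_per_env  # 50_000

lr_now = linear_lr(n_total_env_steps)  # halfway between lr_start and lr_end
```

Driving the schedule by environment steps rather than by update count keeps it comparable across runs that use different numbers of parallel environments or update frequencies.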

API Reference

For detailed API documentation, see: