Models and Predictors

Each level can have a Predictor object, which contains all models that predict certain aspects of the (Semi-)MDP the level interacts with, such as the

dynamics model,
reward model,
value function, and
encoder.

The predictor is responsible for implementing and training these components, and its main purpose is to enable planning. The encoder maps observations to a state and a context; the context is assumed to be constant within each episode.
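
To illustrate how a dynamics model, a reward model, and a value function combine for planning, here is a minimal random-shooting sketch in plain Python. It does not use LayeredRL: the functions dynamics, reward, value, and plan are hypothetical stand-ins, and the planner is a generic technique rather than the one implemented in the library.

```python
import random


def dynamics(state, action):
    # Hypothetical known dynamics: the action shifts a 1-D state.
    return state + action


def reward(state, action):
    # Hypothetical reward: stay close to the goal at 1.0, with a small action cost.
    return -abs(state - 1.0) - 0.01 * abs(action)


def value(state):
    # Hypothetical value estimate for the state reached at the end of the horizon.
    return -abs(state - 1.0)


def plan(state, horizon=5, n_candidates=200, seed=0):
    """Return the first action of the best randomly sampled action sequence."""
    rng = random.Random(seed)
    best_return, best_action = float("-inf"), 0.0
    for _ in range(n_candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, total = state, 0.0
        for a in actions:
            s = dynamics(s, a)      # predict the next state
            total += reward(s, a)   # accumulate the predicted reward
        total += value(s)           # bootstrap beyond the planning horizon
        if total > best_return:
            best_return, best_action = total, actions[0]
    return best_action


first_action = plan(state=0.0)
```

Only the first action of the best sequence is returned, which corresponds to the usual receding-horizon pattern of replanning at every step.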

Furthermore, each level can access the predictor of its parent level (if one exists) via the parent_predictor attribute. This is useful for skill learning methods that estimate a mutual-information-based reward from the dynamics model of that predictor.
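
As a rough illustration of such a reward, the sketch below uses one common form from the skill discovery literature (DADS-style): the log-likelihood of the observed next state under the chosen skill, minus the log of the average likelihood over all skills. Whether LayeredRL uses exactly this form is an assumption; skill_dynamics and mi_reward are hypothetical names, and the 1-D Gaussian model is a toy stand-in for a parent-level dynamics model.

```python
import math


def gaussian_log_prob(x, mean, std):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))


def skill_dynamics(state, skill):
    # Hypothetical parent-level dynamics model: the skill shifts the state.
    # Returns (mean, std) of the predicted next state.
    return state + skill, 0.1


def mi_reward(state, skill, next_state, all_skills):
    """log q(s'|s,z) - log( mean_i q(s'|s,z_i) )."""
    log_q = gaussian_log_prob(next_state, *skill_dynamics(state, skill))
    probs = [
        math.exp(gaussian_log_prob(next_state, *skill_dynamics(state, z)))
        for z in all_skills
    ]
    log_marginal = math.log(sum(probs) / len(probs))
    return log_q - log_marginal


skills = [-1.0, 0.0, 1.0]
# The transition matches what skill 1.0 predicts, so the reward is positive.
r_good = mi_reward(state=0.0, skill=1.0, next_state=1.0, all_skills=skills)
# The transition looks like skill -1.0 instead, so the reward is negative.
r_bad = mi_reward(state=0.0, skill=1.0, next_state=-1.0, all_skills=skills)
```

The reward is high when the next state is predictable from the chosen skill but hard to explain by the other skills, which encourages distinguishable skills.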

Models

In LayeredRL, the term model refers to a dynamics model, which predicts the next state given the current state, a context, and an action:

s_next_mean, weights, s_next_std, term_prob = model.predict(state, context, action)

Here, weights holds the mixing weights of a mixture of Gaussians whose component means are given in s_next_mean.
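
To make the mixture output concrete, the following sketch collapses a 1-D Gaussian mixture into a single mean and standard deviation. It assumes that s_next_std holds per-mode standard deviations matching s_next_mean, which the source does not state explicitly; mixture_mean_and_std is a hypothetical helper, not part of LayeredRL.

```python
def mixture_mean_and_std(weights, means, stds):
    """Overall mean and standard deviation of a 1-D Gaussian mixture."""
    total = sum(weights)
    w = [x / total for x in weights]  # normalise defensively
    mean = sum(wi * mi for wi, mi in zip(w, means))
    # Law of total variance: within-mode variance plus between-mode variance.
    var = sum(wi * (si ** 2 + (mi - mean) ** 2) for wi, mi, si in zip(w, means, stds))
    return mean, var ** 0.5


# Two equally weighted, narrow modes at -1 and +1.
mean, std = mixture_mean_and_std(weights=[0.5, 0.5], means=[-1.0, 1.0], stds=[0.1, 0.1])
```

The example shows why the modes matter: the collapsed mean of 0 lies between the two modes, in a region the mixture itself considers unlikely.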

Predicting repeatedly yields a rollout:

traj = model.rollout(initial_state, context, actions)

where traj is a Batch object that contains keys for the states, the termination probabilities, and other quantities along the rollout.
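
The relationship between single-step prediction and a rollout can be sketched in plain Python as follows. The toy predict function, the dictionary return type, and the key names "states" and "term_probs" are assumptions for illustration only; the actual Batch keys may differ.

```python
def predict(state, context, action):
    """Hypothetical one-step model: returns (next_state, termination_probability)."""
    next_state = state + context * action  # the context scales the effect of the action
    term_prob = 1.0 if abs(next_state) > 2.0 else 0.0
    return next_state, term_prob


def rollout(initial_state, context, actions):
    """Apply predict() repeatedly, feeding each predicted state back in."""
    states, term_probs = [], []
    state = initial_state
    for action in actions:
        state, term_prob = predict(state, context, action)
        states.append(state)
        term_probs.append(term_prob)
    return {"states": states, "term_probs": term_probs}


traj = rollout(initial_state=0.0, context=0.5, actions=[1.0, 1.0, 4.0])
```

Note that the context is passed unchanged to every step, in line with the assumption that it stays constant within an episode.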

Probabilistic Ensemble

ProbabilisticEnsemble is a model that enables uncertainty estimation by learning an ensemble of dynamics models. It furthermore supports several particles per ensemble member and several Gaussian modes per network particle.

from layeredrl.models import ProbabilisticEnsemble

ensemble = ProbabilisticEnsemble(
    state_space=...,
    context_space=...,
    action_space=...,
    partial_net=...,  # factory function for ensemble member networks
    n_models=...,
    n_modes=...,
    n_particles_per_model=...,
)

Benefits:

Estimates uncertainty via ensemble disagreement
Can be used for exploration bonuses
Can model transitions via mixture of Gaussians
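
The first two points can be illustrated with a small sketch: the variance of the ensemble members' predictions serves as an uncertainty estimate, which can be scaled into an exploration bonus. This is a generic technique shown under simplifying assumptions (1-D states, one prediction per member); disagreement_bonus is a hypothetical helper and not the LayeredRL API.

```python
from statistics import pvariance


def disagreement_bonus(member_predictions, scale=1.0):
    """Scaled variance of the ensemble members' predicted next states (1-D)."""
    return scale * pvariance(member_predictions)


# Members agree closely in a well-explored region ...
familiar = disagreement_bonus([1.00, 1.01, 0.99])
# ... and disagree strongly where little data has been seen.
novel = disagreement_bonus([0.2, 1.5, -0.7])
```

Adding such a bonus to the reward steers the agent toward transitions on which the ensemble members have not yet converged.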

Predictors

The following predictors are currently implemented in LayeredRL:

Static Predictor

Does not train any of its components. It can be used with known or manually defined dynamics and reward models, for example:

from layeredrl.predictors import StaticPredictor
from layeredrl.nets import IdentityEncoder

encoder = IdentityEncoder(obs.shape)
predictor = StaticPredictor(
    model=model,
    val_func=val_func,
    rew_func=rew_func,
    encoder=encoder,
    latent_state_dim=...,
    context_dim=0,
)

Reward Predictor

Trains the dynamics model, reward function, and value function. Optionally, it also trains the encoder by backpropagating the reward loss through it.

from layeredrl.predictors import RewardPredictor
from layeredrl.nets import EncoderNet

encoder = EncoderNet(
    mapped_env_obs_shape=obs.shape,
    latent_state_dim=...,
    context_dim=...,
)
predictor = RewardPredictor(
    model=model,
    val_func=val_func,
    rew_func=rew_func,
    encoder=encoder,
    latent_state_dim=...,
    context_dim=...,
    learn_encoder=True,
)

Encoders

Encoders map raw observations to learned representations of states and contexts.

Identity Encoder

Passes observations through unchanged as state:

from layeredrl.nets import IdentityEncoder

encoder = IdentityEncoder(obs.shape)
state, context = encoder(obs)

# state is equal to obs
# context.numel() is 0

Fixed Encoder

Picks out specified dimensions as state and context:

from layeredrl.nets import FixedEncoderNet

encoder = FixedEncoderNet(
    mapped_env_obs_shape=obs.shape,
    latent_state_dims=...,  # List of indices making up state
    context_dims=...,  # List of indices making up context
)
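
Conceptually, this kind of encoder amounts to index selection on the observation. The following plain-Python sketch shows the idea on a flat observation; fixed_encode is a hypothetical helper and does not reproduce the actual FixedEncoderNet implementation, which presumably operates on batched tensors.

```python
def fixed_encode(obs, latent_state_dims, context_dims):
    """Split a flat observation into state and context by index lists."""
    state = [obs[i] for i in latent_state_dims]
    context = [obs[i] for i in context_dims]
    return state, context


obs = [0.3, -1.2, 5.0, 7.0]
# Dimensions 0 and 1 form the state, dimension 3 the context; dimension 2 is dropped.
state, context = fixed_encode(obs, latent_state_dims=[0, 1], context_dims=[3])
```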

Learned Encoder

Trainable linear encoder:

from layeredrl.nets import EncoderNet

encoder = EncoderNet(
    mapped_env_obs_shape=obs.shape,
    latent_state_dim=...,
    context_dim=...,
)

Predictor training

The most convenient way of training a predictor is to use it as part of a PlannerLevel. However, it is possible to train a predictor manually:

loss, loss_info = predictor.learn(
    buffer,
    n_updates,
    batch_size,
    model_batch_size,
    n_total_env_steps,
)

The arguments are:

buffer: A replay buffer containing transitions
n_updates: Desired number of updates
batch_size: Batch size for all models other than the dynamics model
model_batch_size: Batch size for the dynamics model
n_total_env_steps: Total number of environment steps taken across all environments. Relevant for learning rate schedules.
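
To show why the total number of environment steps matters, here is a sketch of a learning rate schedule driven by it. The linear decay, the function name linear_lr, and all default values are assumptions for illustration; the schedules actually used in LayeredRL may differ.

```python
def linear_lr(n_total_env_steps, initial_lr=3e-4, final_lr=3e-5, decay_steps=1_000_000):
    """Linearly interpolate the learning rate over the first decay_steps steps."""
    # Fraction of the decay period that has elapsed, clamped to [0, 1].
    frac = min(max(n_total_env_steps / decay_steps, 0.0), 1.0)
    return initial_lr + frac * (final_lr - initial_lr)


lr_start = linear_lr(0)
lr_mid = linear_lr(500_000)
lr_end = linear_lr(2_000_000)  # clamped to final_lr after decay_steps
```

Tying the schedule to environment steps rather than update counts keeps it comparable across runs that use different numbers of parallel environments or updates per step.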

API Reference

For detailed API documentation, see: