Models and Predictors
Each level can have a Predictor object, which contains all models that predict certain
aspects of the (Semi-)MDP the level interacts with, like
dynamics model,
reward model,
value function, and
encoder.
The predictor is responsible for implementing and training these components, and its main purpose is to enable planning. The encoder maps observations to a state and a context (the context is assumed to be constant within each episode).
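As an illustration of how planning can use these components, the sketch below scores candidate action sequences by rolling out a dynamics function, summing predicted rewards, and bootstrapping with a value function at the final state. The function plan and the toy dynamics, reward, and value functions are hypothetical stand-ins, not LayeredRL API; planners in the library may work differently.

```python
def plan(state, candidates, dynamics, reward, value, gamma=0.99):
    """Return the candidate action sequence with the highest predicted return."""

    def predicted_return(actions):
        s, total, discount = state, 0.0, 1.0
        for a in actions:
            total += discount * reward(s, a)  # predicted one-step reward
            s = dynamics(s, a)                # predicted next state
            discount *= gamma
        # Bootstrap with the value function at the end of the horizon.
        return total + discount * value(s)

    return max(candidates, key=predicted_return)


# Hypothetical 1-D toy problem: reach position 3.0 with small actions.
def toy_dynamics(s, a):
    return s + a


def toy_reward(s, a):
    return -0.1 * abs(a)


def toy_value(s):
    return -abs(s - 3.0)


candidates = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [-1.0, -1.0, -1.0]]
best = plan(0.0, candidates, toy_dynamics, toy_reward, toy_value)
```

Here the first candidate wins because it is the only one that ends at the goal, so its small action cost is outweighed by the terminal value.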
Each level can also access the predictor of its parent level (if one exists) via the parent_predictor attribute. This is useful for skill learning
methods that estimate a mutual-information-based reward from that predictor's dynamics model.
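One common form of such a reward (used by DADS-style skill discovery) compares how well the dynamics model explains a transition under the executed skill with how well it explains it under other skills. The sketch below assumes this form, a 1-D Gaussian dynamics model, and hypothetical names (mi_reward, toy_predict_mean); it is not taken from LayeredRL and the library's skill learning methods may use a different estimator.

```python
import math


def gaussian_log_prob(x, mean, std):
    """Log density of a 1-D Gaussian."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)


def mi_reward(next_state, state, skill, prior_skills, predict_mean, std=1.0):
    """Estimate log q(s'|s,z) - log mean_i q(s'|s,z_i) for skills z_i from the prior."""
    log_q = gaussian_log_prob(next_state, predict_mean(state, skill), std)
    log_qs = [
        gaussian_log_prob(next_state, predict_mean(state, z), std)
        for z in prior_skills
    ]
    # Log-mean-exp, shifted by the maximum for numerical stability.
    m = max(log_qs)
    log_marginal = m + math.log(sum(math.exp(lq - m) for lq in log_qs) / len(log_qs))
    return log_q - log_marginal


def toy_predict_mean(state, skill):
    # Hypothetical dynamics model: the skill is the expected displacement.
    return state + skill


skills = [-1.0, 0.0, 1.0]
r_match = mi_reward(1.0, 0.0, 1.0, skills, toy_predict_mean)
r_mismatch = mi_reward(1.0, 0.0, -1.0, skills, toy_predict_mean)
```

The reward is positive when the observed transition is better explained by the executed skill than by an average skill, and negative otherwise.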
Models
In LayeredRL, a model is a dynamics model: it predicts the next state given the current state, the context, and an action:
s_next_mean, weights, s_next_std, term_prob = model.predict(state, context, action)
Here, weights holds the mixture weights of a mixture of Gaussians whose component means are given in s_next_mean.
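To make the mixture interpretation concrete, the following sketch samples a next state from a diagonal Gaussian mixture and computes its expected value. It uses plain Python lists instead of the tensors returned by model.predict; the layout (one mean vector and one standard deviation vector per mode) and the function names are assumptions for illustration only.

```python
import random


def sample_next_state(means, weights, stds, rng):
    """Draw one sample from a diagonal Gaussian mixture.

    means, stds: one vector (list of floats) per mode.
    weights: non-negative mixture weights, one per mode.
    """
    # Pick a mode according to the mixture weights.
    mode = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    # Sample each state dimension independently from the chosen mode.
    return [rng.gauss(m, s) for m, s in zip(means[mode], stds[mode])]


def mixture_mean(means, weights):
    """Weighted average of the mode means, i.e. the expected next state."""
    dim = len(means[0])
    return [sum(w * m[d] for w, m in zip(weights, means)) for d in range(dim)]


rng = random.Random(0)
means = [[0.0, 1.0], [2.0, 3.0]]
stds = [[0.1, 0.1], [0.2, 0.2]]
weights = [0.25, 0.75]

sample = sample_next_state(means, weights, stds, rng)
expected = mixture_mean(means, weights)
```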
Predicting repeatedly yields a rollout:
traj = model.rollout(initial_state, context, actions)
where traj is a Batch object with keys for states, termination probabilities, etc.
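Conceptually, a rollout feeds each predicted state back into the model together with the next action. The sketch below shows this loop with a hypothetical deterministic one-step function; it omits the context, returns a plain dict instead of a Batch, and does not reflect how model.rollout handles particles or mixture modes.

```python
def rollout(predict, initial_state, actions):
    """Apply a one-step predictor repeatedly along an action sequence."""
    states = [initial_state]
    term_probs = []
    for action in actions:
        # Feed the most recent predicted state back into the predictor.
        next_state, term_prob = predict(states[-1], action)
        states.append(next_state)
        term_probs.append(term_prob)
    return {"states": states, "term_probs": term_probs}


def toy_predict(state, action):
    # Hypothetical dynamics: the action is added to the state, never terminates.
    return state + action, 0.0


traj = rollout(toy_predict, 0.0, [1.0, 2.0, -0.5])
```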
Probabilistic Ensemble
ProbabilisticEnsemble is a model that enables uncertainty estimation by learning an ensemble of
dynamics models. It furthermore supports several particles per ensemble member and several Gaussian modes per network particle.
from layeredrl.models import ProbabilisticEnsemble
ensemble = ProbabilisticEnsemble(
    state_space=...,
    context_space=...,
    action_space=...,
    partial_net=...,  # factory function for ensemble member networks
    n_models=...,
    n_modes=...,
    n_particles_per_model=...,
)
Benefits:
Estimates uncertainty via ensemble disagreement
Can be used for exploration bonuses
Can model transitions via mixture of Gaussians
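A simple way to turn ensemble disagreement into an uncertainty score is to measure the variance of the members' predictions. The sketch below does this for mean predictions stored as plain lists; the function name and the choice of averaging the per-dimension variance are assumptions, and ProbabilisticEnsemble may compute its uncertainty differently.

```python
import statistics


def ensemble_disagreement(member_predictions):
    """Mean per-dimension variance of the predictions across ensemble members."""
    # Transpose so that each tuple holds one state dimension across all members.
    per_dimension = zip(*member_predictions)
    variances = [statistics.pvariance(values) for values in per_dimension]
    return sum(variances) / len(variances)


# Three members that agree, and three that disagree in the first dimension.
agree = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
disagree = [[0.0, 2.0], [1.0, 2.0], [2.0, 2.0]]

low = ensemble_disagreement(agree)
high = ensemble_disagreement(disagree)
```

A score like this could be added to the reward as an exploration bonus, so that regions where the members disagree become more attractive.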
Predictors
The following predictors are currently implemented in LayeredRL:
Static Predictor
Does not train any of its components. Can be used with known/manually defined dynamics and reward models, for example:
from layeredrl.predictors import StaticPredictor
from layeredrl.nets import IdentityEncoder
encoder = IdentityEncoder(obs.shape)
predictor = StaticPredictor(
    model=model,
    val_func=val_func,
    rew_func=rew_func,
    encoder=encoder,
    latent_state_dim=...,
    context_dim=0,
)
Reward Predictor
Trains dynamics model, reward function, and value function. It optionally also trains the encoder by backpropagating through it from the reward loss.
from layeredrl.predictors import RewardPredictor
from layeredrl.nets import EncoderNet
encoder = EncoderNet(
    mapped_env_obs_shape=obs.shape,
    latent_state_dim=...,
    context_dim=...,
)
predictor = RewardPredictor(
    model=model,
    val_func=val_func,
    rew_func=rew_func,
    encoder=encoder,
    latent_state_dim=...,
    context_dim=...,
    learn_encoder=True,
)
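To illustrate what backpropagating through the encoder from the reward loss means, the sketch below performs gradient steps on a scalar linear encoder and a scalar linear reward head with hand-derived gradients. All names are hypothetical and the setup is deliberately minimal; RewardPredictor itself presumably relies on automatic differentiation and its actual losses are not shown here.

```python
def reward_loss_step(w_enc, w_rew, obs, reward, lr=0.1, learn_encoder=True):
    """One gradient step on the squared reward prediction error."""
    state = w_enc * obs    # encoder: observation -> latent state
    pred = w_rew * state   # reward head: latent state -> predicted reward
    err = pred - reward
    loss = err ** 2        # loss before the update
    grad_rew = 2 * err * state
    # Chain rule: the reward loss reaches the encoder through the reward head.
    grad_enc = 2 * err * w_rew * obs
    w_rew -= lr * grad_rew
    if learn_encoder:
        w_enc -= lr * grad_enc
    return w_enc, w_rew, loss


w_enc, w_rew = 1.0, 0.5
w_enc_1, w_rew_1, loss_1 = reward_loss_step(w_enc, w_rew, obs=1.0, reward=2.0)
_, _, loss_2 = reward_loss_step(w_enc_1, w_rew_1, obs=1.0, reward=2.0)

# With learn_encoder=False only the reward head is updated.
w_enc_frozen, _, _ = reward_loss_step(
    w_enc, w_rew, obs=1.0, reward=2.0, learn_encoder=False
)
```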
Encoders
Encoders map raw observations to learned representations of states and contexts.
Identity Encoder
Passes observations through unchanged as state:
from layeredrl.nets import IdentityEncoder
encoder = IdentityEncoder(obs.shape)
state, context = encoder(obs)
# state is equal to obs
# context.numel() is 0
Fixed Encoder
Picks out specified dimensions as state and context:
from layeredrl.nets import FixedEncoderNet
encoder = FixedEncoderNet(
    mapped_env_obs_shape=obs.shape,
    latent_state_dims=...,  # List of indices making up state
    context_dims=...,  # List of indices making up context
)
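The index selection performed by a fixed encoder can be sketched in a few lines. The function fixed_encode below is a hypothetical stand-in operating on plain lists, not the FixedEncoderNet implementation; selecting all indices as state and none as context reproduces the behaviour described for the identity encoder.

```python
def fixed_encode(obs, state_dims, context_dims):
    """Split an observation into state and context by picking fixed indices."""
    state = [obs[i] for i in state_dims]
    context = [obs[i] for i in context_dims]
    return state, context


obs = [0.5, -1.0, 3.0, 7.0]

# Dimensions 0 and 1 form the state, dimension 3 forms the context.
state, context = fixed_encode(obs, state_dims=[0, 1], context_dims=[3])

# Identity-like behaviour: everything is state, the context is empty.
id_state, id_context = fixed_encode(obs, state_dims=range(len(obs)), context_dims=[])
```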
Learned Encoder
Trainable linear encoder:
from layeredrl.nets import EncoderNet
encoder = EncoderNet(
    mapped_env_obs_shape=obs.shape,
    latent_state_dim=...,
    context_dim=...,
)
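A linear encoder can be pictured as an affine map whose output is split into a state part and a context part. The sketch below assumes exactly that (a weight matrix, a bias, and a split after latent_state_dim entries); whether EncoderNet uses a bias or this output layout is not stated in this document, so treat the details as assumptions.

```python
def linear_encode(obs, weight, bias, latent_state_dim):
    """Affine map of the observation, split into state and context."""
    out = [
        sum(w * o for w, o in zip(row, obs)) + b
        for row, b in zip(weight, bias)
    ]
    # The first latent_state_dim outputs form the state, the rest the context.
    return out[:latent_state_dim], out[latent_state_dim:]


# Hypothetical parameters; in a trainable encoder these would be learned.
weight = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.5, 0.5, 0.0],
]
bias = [0.0, 0.0, 1.0]

state, context = linear_encode([2.0, 4.0, 6.0], weight, bias, latent_state_dim=2)
```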
Predictor training
The most convenient way of training a predictor is to use it as part of a PlannerLevel. However,
it is also possible to train a predictor manually:
loss, loss_info = predictor.learn(
    buffer,
    n_updates,
    batch_size,
    model_batch_size,
    n_total_env_steps,
)
With the arguments:
buffer: A replay buffer containing transitions
n_updates: Desired number of updates
model_batch_size and batch_size: Batch sizes for dynamics model and other models respectively
n_total_env_steps: Number of environment steps passed in all environments collectively. Relevant for learning rate schedules.
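As an example of why the total number of environment steps matters, a learning rate schedule can be written as a function of it. The linear decay below is a hypothetical schedule with made-up parameter names, shown only to illustrate the idea; the schedules actually used by the predictors are not described here.

```python
def linear_lr(n_total_env_steps, initial_lr, final_lr, decay_steps):
    """Linearly interpolate from initial_lr to final_lr over decay_steps."""
    # Fraction of the decay completed, clipped at 1 once decay_steps is reached.
    frac = min(n_total_env_steps / decay_steps, 1.0)
    return initial_lr + frac * (final_lr - initial_lr)


lr_start = linear_lr(0, initial_lr=1e-3, final_lr=1e-4, decay_steps=1000)
lr_mid = linear_lr(500, initial_lr=1e-3, final_lr=1e-4, decay_steps=1000)
lr_end = linear_lr(2000, initial_lr=1e-3, final_lr=1e-4, decay_steps=1000)
```

Because the schedule depends on steps collected across all environments, it behaves the same regardless of how many environments run in parallel.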
API Reference
For detailed API documentation, see: