Models and Predictors ===================== Each level can have a :class:`~layeredrl.predictors.Predictor` object, which contains all models that predict certain aspects of the (Semi-)MDP the level interacts with, like * dynamics model, * reward model, * value function, and * encoder. The predictor is responsible for implementing and training these components. The main purpose of the predictor is to enable planning. The encoder additionally predicts a state and a context (assumed to be constant in each episode) from observations. Furthermore, each level has access to the predictor of its parent level (if it exists) via the ``parent_predictor`` attribute. This is useful for skill learning methods that compute an estimate of a mutual-information-based reward based on the dynamics model of the predictor. Models ------ In LayeredRL, models refer to dynamics models, which predict the next state given the current state and an action. .. code-block:: python s_next_mean, weights, s_next_std, term_prob = model.predict(state, context, action) Here, `weights` refers to the weights of a mixture of Gaussians with the corresponding means in `s_next_mean`. Predicting repeatedly yields a rollout: .. code-block:: python traj = model.rollout(initial_state, context, actions) where ``traj`` is a Batch object which contains keys for states, termination probabilities etc. Probabilistic Ensemble ^^^^^^^^^^^^^^^^^^^^^^ :class:`~layeredrl.models.ProbabilisticEnsemble` is a model that enables uncertainty estimation by learning an ensemble of dynamics models. It furthermore supports several particles per ensemble member and several Gaussian modes per network particle. .. code-block:: python from layeredrl.models import ProbabilisticEnsemble ensemble = ProbabilisticEnsemble( state_space=..., context_space=..., action_space=..., partial_net=..., # factory function for ensemble member networks n_models=..., n_modes=..., n_particles_per_model=..., ) **Benefits:** * Estimates uncertainty via ensemble disagreement * Can be used for exploration bonuses * Can model transitions via mixture of Gaussians Predictors ---------- The following predictors are currently implemented in LayeredRL: Static Predictor ^^^^^^^^^^^^^^^^ Does not train any of its components. Can be used with known/manually defined dynamics and reward models, for example: .. code-block:: python from layeredrl.predictors import StaticPredictor from layeredrl.nets import IdentityEncoder encoder = IdentityEncoder(obs.shape) predictor = StaticPredictor( model=model, val_func=val_func, rew_func=rew_func, encoder=encoder, latent_state_dim=..., context_dim=0, ) Reward Predictor ^^^^^^^^^^^^^^^^ Trains dynamics model, reward function, and value function. It optionally also trains the encoder by backpropagating through it from the reward loss. .. code-block:: python from layeredrl.predictors import RewardPredictor from layeredrl.nets import EncoderNet encoder = EncoderNet( mapped_env_obs_shape=obs.shape, latent_state_dim=..., context_dim=..., ) predictor = RewardPredictor( model=model, val_func=val_func, rew_func=rew_func, encoder=encoder, latent_state_dim=..., context_dim=..., learn_encoder=True, ) Encoders -------- Encoders map raw observations to learned representations of states and contexts. Identity Encoder ^^^^^^^^^^^^^^^^ Pass observations through unchanged as state: .. code-block:: python from layeredrl.nets import IdentityEncoder encoder = IdentityEncoder(obs.shape) state, context = encoder(obs) # state is equal to obs # context.numel() is 0 Fixed Encoder ^^^^^^^^^^^^^ Picks out specified dimensions as state and context: .. code-block:: python from layeredrl.nets import FixedEncoderNet encoder = FixedEncoderNet( mapped_env_obs_shape=obs.shape, latent_state_dims=..., # List of indices making up state context_dims=..., # List of indices making up context ) Learned Encoder ^^^^^^^^^^^^^^^ Trainable linear encoder: .. code-block:: python from layeredrl.nets import EncoderNet encoder = EncoderNet( mapped_env_obs_shape=obs.shape, latent_state_dim=..., context_dim=..., ) Predictor training ------------------ The most convenient way of training a predictor is to use it as part of a :class:`~layeredrl.levels.PlannerLevel`. However, it is possible to train a predictor manually: .. code-block:: python loss, loss_info = predictor.learn( buffer, n_updates, batch_size, model_batch_size, n_total_env_steps, ) With the arguments * ``buffer``: A replay buffer containing transitions * ``n_updates``: Desired number of updates * ``model_batch_size`` and ``batch_size``: Batch sizes for dynamics model and other models respectively * ``n_total_env_steps``: Number of environment steps passed in all environments collectively. Relevant for learning rate schedules. API Reference ------------- For detailed API documentation, see: * :doc:`../api/models` * :doc:`../api/predictors` * :doc:`../api/nets`