Quickstart
==========

This guide will help you get started with LayeredRL by creating a simple hierarchical RL setup.

Basic Hierarchy
---------------

This is a minimal example of creating and training a hierarchy. The lower level learns
SPlaTES skills and the higher level generates plans that chain these skills to achieve
a high return. Setting the autoreset mode of gymnasium to ``SAME_STEP`` is necessary to make
sure LayeredRL can process environment resets correctly.

.. code-block:: python

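    from functools import partial

    import gymnasium as gym
    import torch
    from layeredrl.collectors import Collector
    from layeredrl.hierarchies import Hierarchy
    from layeredrl.levels import PlannerLevel, SPlaTESLevel
    from layeredrl.predictors import get_default_predictor_factory

    # CEMPlanner also needs to be imported from LayeredRL
    # (its module path is not shown in this guide).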

    skill_space_dim = ...  # dimensionality of skill vector space

    env = gym.make_vec(
        id="...",
        num_envs=...,
        vector_kwargs={"autoreset_mode": gym.vector.AutoresetMode.SAME_STEP},
    )

    predictor_factory = get_default_predictor_factory(env)
    planner_factory = partial(CEMPlanner)
    planner_level = PlannerLevel(
        partial_planner=planner_factory,
        predictor_factory=predictor_factory,
        initial_guess=torch.zeros(skill_space_dim),
        horizon=...,
    )

    splates_level = SPlaTESLevel(
        skill_space_dim=skill_space_dim,
        control_interval=...,
    )

    # Create a simple two-level hierarchy
    hierarchy = Hierarchy(
        levels=[
            planner_level,  # Higher level
            splates_level,  # Lower level
        ]
    )

    # Train the hierarchy
    hierarchy.train()
    collector = Collector(hierarchy=hierarchy, env=env)
    collector.reset()
    stats = collector.collect(n_steps=..., learn=True)
    print(f"Training stats: {stats}")

:func:`~layeredrl.predictors.get_default_predictor_factory` returns a predictor factory that creates a ``Predictor`` object
that models high-level transitions. It assumes that the environment is goal-based and interprets the
desired goal as the context and the achieved goal as the state for the planner level.
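
As a rough illustration of this convention (not LayeredRL's internal code), the sketch below
splits a goal-based observation dictionary into the state and context seen by the planner level.
The dictionary keys ``achieved_goal`` and ``desired_goal`` follow the common layout of goal-based
environments and are an assumption here, as is the hypothetical helper ``split_goal_observation``.

.. code-block:: python

    def split_goal_observation(obs):
        """Return (state, context) for the planner level from a goal-based observation."""
        state = obs["achieved_goal"]  # where the agent currently is in goal space
        context = obs["desired_goal"]  # what the agent is asked to reach
        return state, context


    # Hand-written example observation (values are made up for illustration).
    obs = {
        "observation": [0.1, 0.2, 0.0, 0.0],
        "achieved_goal": [0.1, 0.2],
        "desired_goal": [1.0, 1.5],
    }
    state, context = split_goal_observation(obs)
    print(f"state={state}, context={context}")

If your environment is not goal-based, this mapping does not apply and a custom predictor factory
is likely needed.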

For full code with reasonable hyperparameters for the Maze2D-Medium-v0 environment, see the 
``splates_hierarchy.py`` example.

Achieving good performance on a specific environment generally requires choosing appropriate
hyperparameters and potentially choosing or learning a custom encoder for the planner level.
For an example of SPlaTES running on more challenging MuJoCo environments, see the SPlaTES repository (TODO).
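
The planner level in the example above is built around ``CEMPlanner``. For intuition only, here is
a minimal, standard-library sketch of the cross-entropy method on a toy problem. It is not
LayeredRL's implementation: the function ``cem_plan``, its parameters, and the quadratic toy cost
are all assumptions made for illustration, and the sketch minimizes a cost whereas a planner in
practice would maximize predicted return.

.. code-block:: python

    import random
    import statistics


    def cem_plan(cost_fn, initial_guess, n_iterations=20, population=200,
                 n_elites=20, init_std=1.0, seed=0):
        """Minimize cost_fn over a flat plan vector with the cross-entropy method."""
        rng = random.Random(seed)
        mean = list(initial_guess)
        std = [init_std] * len(mean)
        for _ in range(n_iterations):
            # Sample candidate plans from the current diagonal Gaussian.
            samples = [
                [rng.gauss(m, s) for m, s in zip(mean, std)]
                for _ in range(population)
            ]
            # Keep the lowest-cost candidates and refit the Gaussian to them.
            elites = sorted(samples, key=cost_fn)[:n_elites]
            columns = list(zip(*elites))
            mean = [statistics.fmean(c) for c in columns]
            std = [statistics.pstdev(c) + 1e-6 for c in columns]
        return mean


    # Toy problem: find the plan closest to a fixed target vector.
    target = [0.5, -0.5, 0.25]


    def cost(plan):
        return sum((p - t) ** 2 for p, t in zip(plan, target))


    plan = cem_plan(cost, initial_guess=[0.0, 0.0, 0.0])
    print(f"plan found by CEM: {plan}")

The ``initial_guess`` argument plays a role similar to the one passed to ``PlannerLevel`` above: it
is the starting mean of the sampling distribution.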

Logging with TensorBoard and Weights & Biases
---------------------------------------------

To log with TensorBoard, pass a ``SummaryWriter`` object to each level that should
participate in logging, and to the collector to monitor return and success rate during training:

.. code-block:: python

    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter("path/to/logdir")
    planner_level = PlannerLevel(..., writer=writer)
    splates_level = SPlaTESLevel(..., writer=writer)
    ...
    collector = Collector(..., writer=writer)

To additionally log with Weights & Biases, pass ``sync_tensorboard=True`` to ``wandb.init``:

.. code-block:: python

    import wandb

    wandb.init(
        project="project_name",
        sync_tensorboard=True,
        name="run_name",
        dir="/log/dir",
    )

Testing periodically during training
------------------------------------

:meth:`~layeredrl.collectors.Collector.collect` monitors the training return and success rate by default.
It can still make sense to test periodically in evaluation mode
(:meth:`~layeredrl.hierarchies.Hierarchy.eval`), as this may disable exploration noise, depending on the
level type. To do so, instantiate a second vector environment for testing and pass it to
:meth:`~layeredrl.collectors.Collector.collect`:

.. code-block:: python

    test_env = gym.make_vec(
        id="...",
        num_envs=...,
        vector_kwargs={"autoreset_mode": gym.vector.AutoresetMode.SAME_STEP},
    )

    stats = collector.collect(
        n_steps=...,
        learn=True,
        test_env=test_env,  # second environment from above (keyword name assumed here)
        test_interval=...,  # how often to test
        n_test_steps=...,  # for how many vec env steps to test
    )
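
To make the interplay of training and testing more concrete, the following sketch alternates
training steps with short test phases and switches between train and eval mode around each phase.
It is only a toy model of the idea: ``StubHierarchy`` and ``collect_with_tests`` are hypothetical
names, and the exact scheduling used by the collector may differ.

.. code-block:: python

    class StubHierarchy:
        """Hypothetical stand-in that only tracks the train/eval mode."""

        def __init__(self):
            self.training = True

        def train(self):
            self.training = True

        def eval(self):
            self.training = False


    def collect_with_tests(hierarchy, n_steps, test_interval, n_test_steps):
        """Alternate training steps with periodic test phases and log each step."""
        log = []
        hierarchy.train()
        for step in range(1, n_steps + 1):
            log.append(("train", step, hierarchy.training))
            if step % test_interval == 0:
                hierarchy.eval()  # test mode, e.g. without exploration noise
                for _ in range(n_test_steps):
                    log.append(("test", step, hierarchy.training))
                hierarchy.train()  # switch back before training continues
        return log


    log = collect_with_tests(StubHierarchy(), n_steps=6, test_interval=3, n_test_steps=2)
    for phase, step, training in log:
        print(f"{phase:5s} after train step {step}, training mode: {training}")

Keeping the test phase in a separate environment means that test episodes do not interrupt the
episodes running in the training environment.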


Saving and loading hierarchies
-------------------------------

To save the parameters of a hierarchy, run:

.. code-block:: python

    hierarchy.save("path/to/model/dir")

This will create a directory in which each level will save its parameters. To load a set of
saved parameters, run:

.. code-block:: python

    hierarchy.load("path/to/model/dir")

The same pattern works for saving and loading buffers:

.. code-block:: python

    hierarchy.save_buffers("path/to/buffer/dir")
    # and
    hierarchy.load_buffers("path/to/buffer/dir")

If you want to save hierarchy checkpoints periodically during training, specify a checkpoint interval and directory when instantiating :class:`~layeredrl.collectors.Collector`:

.. code-block:: python

    from pathlib import Path

    collector = Collector(
        hierarchy=hierarchy,
        env=env,
        # ... further arguments as before
        checkpoint_dir=Path("/path/to/checkpoint/dir"),
        checkpoint_interval=...,  # checkpoint every ... vec env steps
    )

Next Steps
----------

* Learn more about :doc:`user_guide/hierarchies`
* Explore available :doc:`user_guide/levels`
* Check out the :doc:`examples/gallery`