Collectors API

class layeredrl.collectors.Collector(hierarchy: Hierarchy, env: Env, test_env: Env | None = None, device=device(type='cpu'), writer: SummaryWriter | None = None, checkpoint_dir: Path | None = None, checkpoint_interval: int | None = None)

Bases: object

__init__(hierarchy: Hierarchy, env: Env, test_env: Env | None = None, device=device(type='cpu'), writer: SummaryWriter | None = None, checkpoint_dir: Path | None = None, checkpoint_interval: int | None = None)

Initialize the collector.

Parameters:
  • hierarchy – The hierarchy to collect data with.

  • env – The environment to collect data/train in.

  • test_env – The environment to test in.

  • device – The device to use.

  • writer – The TensorBoard writer to use for logging. If None, no logging is done.

  • checkpoint_dir – The directory to save checkpoints to. If None, no checkpoints are saved.

  • checkpoint_interval – The interval in steps between checkpoints. If None, only the final checkpoint is saved.
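To make the interaction between checkpoint_interval and the total step count concrete, here is a minimal sketch of a checkpoint schedule. It is not layeredrl code: the helper name checkpoint_steps is hypothetical, and it assumes that checkpoints fall on exact multiples of checkpoint_interval and that a final checkpoint is written at the last step in either case. Only the None case (final checkpoint only) is stated in the parameter description.

```python
from typing import List, Optional


def checkpoint_steps(n_steps: int, checkpoint_interval: Optional[int]) -> List[int]:
    """Return the steps at which a checkpoint would be written.

    Hypothetical helper, not part of layeredrl. Assumes checkpoints land on
    exact multiples of the interval, plus one final checkpoint at n_steps.
    """
    if n_steps <= 0:
        return []
    if checkpoint_interval is None:
        # Documented behaviour: only the final checkpoint is saved.
        return [n_steps]
    if checkpoint_interval <= 0:
        raise ValueError("checkpoint_interval must be positive")
    steps = list(range(checkpoint_interval, n_steps + 1, checkpoint_interval))
    if not steps or steps[-1] != n_steps:
        # Assumption: the run always ends with a final checkpoint.
        steps.append(n_steps)
    return steps


print(checkpoint_steps(1000, None))
print(checkpoint_steps(1000, 400))
```

A schedule like this can help when estimating how many checkpoint files a run of a given length is likely to produce in checkpoint_dir.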

collect(n_steps: int, env_expects_numpy: bool = True, record_transitions: bool = False, learn: bool = False, n_steps_start: int = 0, log_interval: int = 100, test_interval: int | None = None, n_test_steps: int = 1000, verbose: bool = False, post_step_callback: Callable | None = None, video_logger: VideoLogger | None = None) → Tuple | Batch

Collect transitions from the environment with the hierarchical policy.

Each level of the hierarchy collects different transitions, because the higher levels see semi-MDPs.

Parameters:
  • n_steps – The number of steps to collect in each environment instance. The total number of collected steps is therefore n_steps * n_envs.

  • env_expects_numpy – Whether the environment expects numpy arrays as input.

  • record_transitions – Whether to record the environment transitions and return them in a batch. Note that the first dimension of the batch indexes the step, not the environment instance.

  • learn – Whether to learn after each step.

  • n_steps_start – The number of steps that have already been collected. This is useful for resuming an experiment.

  • log_interval – The interval in steps between logging.

  • test_interval – The interval in vector environment steps between testing the hierarchy. If None, no testing is done.

  • n_test_steps – The number of vector environment steps to test the hierarchy for at each test interval.

  • verbose – Whether to print progress.

  • post_step_callback – A callback function that is called after each step. The callback function should take the current step and the next observation as arguments.

  • video_logger – A VideoLogger object to log videos of the rollouts.
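As an illustration of the post_step_callback parameter, the sketch below defines a callable that takes the current step and the next observation. The class name ObservationRecorder and its every option are hypothetical, and passing the step first and the observation second is an assumption based on the order in which the description lists them. A small loop stands in for the collector so the example runs without layeredrl.

```python
from typing import Any, List, Tuple


class ObservationRecorder:
    """Store (step, next observation) pairs at a fixed stride.

    Hypothetical example callback, not part of layeredrl.
    """

    def __init__(self, every: int = 1) -> None:
        if every < 1:
            raise ValueError("every must be at least 1")
        self.every = every
        self.records: List[Tuple[int, Any]] = []

    def __call__(self, step: int, next_obs: Any) -> None:
        # Assumption: the collector passes the step first, then the observation.
        if step % self.every == 0:
            self.records.append((step, next_obs))


# Stand-in for the collector's loop, so the sketch runs without layeredrl.
recorder = ObservationRecorder(every=2)
for step in range(5):
    recorder(step, {"obs": [float(step)]})

print([s for s, _ in recorder.records])
```

With a real collector, such an object would be passed as post_step_callback=recorder in the call to collect. Keeping the state on the callable makes it easy to inspect what was recorded after collection finishes.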

Returns:

The statistics of the rollouts and (if record_transitions is True) the collected transitions in a Batch object.
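The two shape statements in the parameter list (n_steps * n_envs total steps, and a batch whose first dimension is the step) can be illustrated with plain Python lists. This is only a sketch: nested lists stand in for the Batch object, whose actual fields are not described here, and the names rewards_step_major and per_env are made up for illustration.

```python
n_steps, n_envs = 3, 2  # illustrative values

# Stand-in for one recorded field, indexed as rewards_step_major[step][env].
rewards_step_major = [
    [10 * step + env for env in range(n_envs)] for step in range(n_steps)
]

# The outer length is the number of steps, not the number of environments.
assert len(rewards_step_major) == n_steps

# Total number of collected transitions across all environment instances.
total_steps = sum(len(row) for row in rewards_step_major)

# Transpose to get one trajectory per environment instance.
per_env = [list(traj) for traj in zip(*rewards_step_major)]

print(total_steps)
print(per_env)
```

Transposing in this way is one option when downstream code expects one trajectory per environment instance rather than step-major data.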

reset(seed=None)

Reset the collector.

Call this at the beginning of the session.

save_checkpoint(t: int, n_steps: int)

Save a checkpoint of the hierarchy.

Parameters:

t – The current step.

test(t: int, test_hierarchy: Hierarchy, n_steps: int, env_expects_numpy: bool = True, video_logger: VideoLogger | None = None) → dict

Test the hierarchy in the environment.

Note: This resets the test environment and the test hierarchy.

Parameters:
  • t – The current training step.

  • n_steps – The number of steps to test the hierarchy for.

  • env_expects_numpy – Whether the environment expects numpy arrays as input.

  • video_logger – A VideoLogger object to log videos of the rollouts.

Returns:

The statistics of the test run.