Utilities API

class layeredrl.utils.ToDeviceReplayBuffer(target_device=device(type='cpu'), *args, **kwargs)[source]

Bases: VectorReplayBuffer

Replay buffer that moves batch to target device after sampling.

__getitem__(index: slice | int | List[int] | ndarray) → Batch[source]: Return a data batch: self[index].

__init__(target_device=device(type='cpu'), *args, **kwargs)[source]

Initialize the wrapper.

Parameters:: target_device – The target device to move the batch to.

sample(batch_size: int) → Tuple[Batch, ndarray][source]

Sample from replay buffer and move batch to target device.

If batch_size is 0, return all the data in the buffer.

Parameters:: batch_size – The batch size.
Returns:: Sample data and its corresponding indices inside the buffer.

layeredrl.utils.cdf_normal(value, loc, scale)[source]: Compute the CDF of a normal distribution.
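
As a reminder of what the normal CDF computes, here is a minimal scalar sketch in plain Python. It is not the library's implementation (which presumably operates on tensors); the name normal_cdf is hypothetical and the error-function formula is simply the textbook definition.

```python
import math


def normal_cdf(value: float, loc: float, scale: float) -> float:
    """CDF of a normal distribution with mean `loc` and standard deviation `scale`."""
    # Standardize, then use Phi(z) = 0.5 * (1 + erf(z / sqrt(2))).
    z = (value - loc) / (scale * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


# Half of the probability mass lies below the mean.
p_at_mean = normal_cdf(0.0, 0.0, 1.0)
# Roughly 84% of the mass lies below one standard deviation above the mean.
p_one_sigma = normal_cdf(1.0, 0.0, 1.0)
```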

layeredrl.utils.sample_truncated_normal(loc, scale, lower_bound, upper_bound, n_samples, device)[source]

Sample from a truncated normal distribution.

Note that everything is assumed to have a batch dimension. Sampling is done via the inverse CDF method rather than rejection sampling.
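
The inverse CDF method mentioned above can be sketched for scalars with the standard library. This is an illustration under stated assumptions, not the library's code: the function name, the rng argument, and the numerical guard are hypothetical, and the real function works on batched tensors on a given device.

```python
import random
from statistics import NormalDist


def sample_truncated_normal_scalar(loc, scale, lower_bound, upper_bound, n_samples, rng):
    """Draw n_samples from N(loc, scale^2) restricted to [lower_bound, upper_bound]."""
    std_normal = NormalDist()
    # CDF values of the standardized bounds.
    cdf_low = std_normal.cdf((lower_bound - loc) / scale)
    cdf_high = std_normal.cdf((upper_bound - loc) / scale)
    samples = []
    for _ in range(n_samples):
        # Uniform draw restricted to the probability mass between the bounds.
        u = cdf_low + (cdf_high - cdf_low) * rng.random()
        # inv_cdf is undefined at exactly 0 and 1, so keep u strictly inside (0, 1).
        u = min(max(u, 1e-12), 1.0 - 1e-12)
        x = loc + scale * std_normal.inv_cdf(u)
        # Clamp to absorb floating-point round-off at the bounds.
        samples.append(min(max(x, lower_bound), upper_bound))
    return samples


rng = random.Random(0)
draws = sample_truncated_normal_scalar(0.0, 1.0, -0.5, 2.0, 1000, rng)
```

Unlike rejection sampling, every uniform draw yields exactly one sample, so the cost does not grow when the interval covers little probability mass.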

layeredrl.utils.get_normal_prob(x: Tensor, mean: Tensor, std: Tensor) → Tensor[source]

Get the probability (density) of x under a Normal distribution with the given mean and standard deviation.

Parameters:

x – The value.
mean – The mean of the Gaussian.
std – The standard deviation of the Normal distribution.

Returns:

The probability (density) of x under the Normal distribution.

layeredrl.utils.get_normal_log_prob(x: Tensor, mean: Tensor, std: Tensor, beta: float = 0.0) → Tensor[source]

Get the log probability of x under a 1D Normal distribution with the given mean and standard deviation.

Optionally multiplies the second term of the log probability by a scale-invariance factor, which removes the gradient of the log probability with respect to the scale of x and mean.

No sum over the last dimension is performed.

Parameters:

x – The value.
mean – The mean of the Gaussian.
std – The standard deviation of the Normal distribution.
beta – The beta parameter for beta-NLL.

Returns:

The log probability of x under the Normal distribution.
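
For reference, the per-element Gaussian log density can be written out as below. This is a plain-Python sketch covering only the default beta = 0.0 case; the scale-invariance factor is not modeled because its exact form is not specified here. The function names are hypothetical.

```python
import math


def normal_log_prob(x: float, mean: float, std: float) -> float:
    """Log density of a 1D normal distribution (no beta weighting)."""
    return (
        -0.5 * math.log(2.0 * math.pi)
        - math.log(std)
        - 0.5 * ((x - mean) / std) ** 2
    )


def normal_prob(x: float, mean: float, std: float) -> float:
    """Density of a 1D normal distribution, obtained by exponentiating the log density."""
    return math.exp(normal_log_prob(x, mean, std))


# Applied per element, without summing over the last dimension.
xs, means, stds = [0.0, 1.0], [0.0, 0.0], [1.0, 2.0]
log_probs = [normal_log_prob(x, m, s) for x, m, s in zip(xs, means, stds)]
```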

layeredrl.utils.get_writer(logdir: Path | str, wandb_project: str | None = None, wandb_name: str | None = None, id: str | None = None)[source]

Get a TensorBoard writer for logging.

If a project name is given, also log to Weights & Biases.

Parameters:

logdir – The path to the directory in which to store the Weights & Biases and TensorBoard logs. If None, no logs are written.
wandb_project – The name of the Weights & Biases project to log to. If None, no logs are written to Weights & Biases.
wandb_name – The name of the run in Weights & Biases. If None, a two-word name is generated by wandb.
id – The id of the run in Weights & Biases (that will be continued). If None, a new run is created.

Returns:

A TensorBoard writer or None if logdir is None.

class layeredrl.utils.VideoLogger(log_dir: str, fps: int = 10, max_episodes: int = 10, camera_name: str | None = None)[source]

Bases: object

Class for logging video clips of rollouts.

__init__(log_dir: str, fps: int = 10, max_episodes: int = 10, camera_name: str | None = None)[source]

Initialize the logger.

Parameters:

log_dir – The directory to save the videos.
fps – The frames per second of the video.

log_frame(env: Env, terminated: bool, truncated: bool)[source]

Log a frame from the environment.

Requires the environment to be in render mode ‘rgb_array’.

Parameters:

env – The environment.
terminated – Whether the episode is terminated.
truncated – Whether the episode is truncated.

save_video()[source]: Save video to disk.

class layeredrl.utils.RangesMap(ranges: List[Tuple[int | None, int | None]])[source]

Bases: object

Keeps only specified index ranges of the input tensor.

__call__(x: Tensor) → Tensor[source]

Parameters:: x – Input tensor.
Returns:: Output tensor with only the specified ranges.

__init__(ranges: List[Tuple[int | None, int | None]])[source]

Parameters:: ranges – List of index ranges to keep. Each range is a tuple of the form (start, end) with start and end being integers or None (indicating no bound of the range).
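
The (start, end) convention can be illustrated with ordinary Python slicing on a list. This sketch assumes the ranges follow slice semantics (end exclusive, None meaning an open bound) and that the kept ranges are concatenated in order; the class name ListRangesMap is hypothetical, and the actual class operates on tensors.

```python
from typing import List, Optional, Tuple


class ListRangesMap:
    """Keep only the given index ranges of a list and concatenate them."""

    def __init__(self, ranges: List[Tuple[Optional[int], Optional[int]]]):
        self.ranges = ranges

    def __call__(self, x: list) -> list:
        out = []
        for start, end in self.ranges:
            # None as a slice bound means "no bound on this side".
            out.extend(x[start:end])
        return out


keep = ListRangesMap([(None, 2), (4, None)])
kept = keep([10, 11, 12, 13, 14, 15])  # keeps the first two and the last two entries
```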

class layeredrl.utils.ConcatMap(maps: List[Callable])[source]

Bases: object

Concatenates multiple maps.

__call__(*args) → Tensor[source]

Parameters:: *args – Input tensors.
Returns:: Output tensor with the maps concatenated.

__init__(maps: List[Callable])[source]

Parameters:: maps – List of maps to concatenate.

layeredrl.utils.to_torch(array: ndarray | Tensor, device: device = device(type='cpu')) → Tensor[source]

Convert an array to a torch tensor.

Parameters:: array – The array.
Returns:: The torch tensor.

layeredrl.utils.to_numpy(array: ndarray | Tensor, device: device = device(type='cpu')) → ndarray[source]

Convert an array to a numpy array.

Parameters:: array – The array.
Returns:: The numpy array.

layeredrl.utils.copy_torch_or_numpy(array: ndarray | Tensor) → ndarray[source]

Copy a torch tensor or numpy array.

Parameters:: array – The array.
Returns:: The copy.

layeredrl.utils.temp_eval_mode(net)[source]: Temporarily set the net to eval mode.
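
One common way to implement this pattern is a context manager that restores the previous mode on exit. The sketch below assumes that interface, which is not confirmed here; TinyNet is a hypothetical stand-in that mimics the training / train() / eval() interface of torch.nn.Module so that the example runs without PyTorch.

```python
from contextlib import contextmanager


class TinyNet:
    """Hypothetical stand-in with the train/eval interface of torch.nn.Module."""

    def __init__(self):
        self.training = True

    def train(self, mode: bool = True):
        self.training = mode
        return self

    def eval(self):
        return self.train(False)


@contextmanager
def temp_eval_mode_sketch(net):
    """Switch net to eval mode inside the with-block, then restore the old mode."""
    was_training = net.training
    net.eval()
    try:
        yield net
    finally:
        # Restore the previous mode even if the block raised an exception.
        net.train(was_training)


net = TinyNet()
with temp_eval_mode_sketch(net):
    inside = net.training
after = net.training
```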

layeredrl.utils.get_key_indices(input_space: Dict) → Tuple[Dict, Dict][source]

Get the index ranges that correspond to the keys of a Dict space in the flattened vector.

Also returns a dictionary containing the shapes of these observation components.

Returns:

A dictionary with keys corresponding to the observation components and values being tuples of the form (start, end), where start and end are the indices at which the observation component starts and ends. The nested dictionary structure of the observation is preserved.
A dictionary of the same structure but with values being the shapes of the observation components.
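
To make the shape of the first return value concrete, the sketch below computes (start, end) ranges from a nested dictionary of shapes. It is an illustration only: the function name is hypothetical, it takes plain shape tuples instead of a Dict space, and it assumes that components are laid out in iteration order with an exclusive end index.

```python
import math


def key_indices_from_shapes(shapes: dict, offset: int = 0):
    """Map each leaf of a nested dict of shapes to its (start, end) range in the flat vector."""
    indices = {}
    for key, value in shapes.items():
        if isinstance(value, dict):
            # Recurse so that the nested structure is preserved.
            indices[key], offset = key_indices_from_shapes(value, offset)
        else:
            size = math.prod(value)  # number of entries of this component
            indices[key] = (offset, offset + size)
            offset += size
    return indices, offset


shapes = {"robot": {"pos": (3,), "vel": (3,)}, "goal": (2,)}
indices, total = key_indices_from_shapes(shapes)
```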

layeredrl.utils.get_obs_indices(env: Env) → Tuple[Dict, Dict][source]

Get the index ranges that correspond to the keys of the observation Dict space in the flattened vector.

Also returns a dictionary containing the shapes of these observation components.

Parameters:

env – The environment with Dict observation space.

Returns:

A dictionary with keys corresponding to the observation components and values being tuples of the form (start, end), where start and end are the indices at which the observation component starts and ends. The nested dictionary structure of the observation is preserved.
A dictionary of the same structure but with values being the shapes of the observation components.

class layeredrl.utils.RunningBatchNorm(num_features: int, eps=1e-05, momentum=0.1, device=None, dtype=None, track_mean=True, freeze_after=None)[source]

Bases: Module

Batch normalization using running estimates of mean and variance.

__init__(num_features: int, eps=1e-05, momentum=0.1, device=None, dtype=None, track_mean=True, freeze_after=None)[source]: Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
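
The idea of normalizing with running estimates can be sketched for a single scalar feature as follows. This is not the RunningBatchNorm implementation: the class name is hypothetical, the momentum update follows the convention used by PyTorch's batch norm layers (an assumption here), and the track_mean and freeze_after options are not modeled.

```python
import math


class RunningNormSketch:
    """Normalize scalars with exponentially averaged mean and variance estimates."""

    def __init__(self, eps: float = 1e-5, momentum: float = 0.1):
        self.eps = eps
        self.momentum = momentum
        self.running_mean = 0.0
        self.running_var = 1.0

    def update(self, batch):
        # Batch statistics (biased variance, for simplicity).
        batch_mean = sum(batch) / len(batch)
        batch_var = sum((v - batch_mean) ** 2 for v in batch) / len(batch)
        m = self.momentum
        # Exponential moving average: new = (1 - m) * old + m * batch.
        self.running_mean = (1.0 - m) * self.running_mean + m * batch_mean
        self.running_var = (1.0 - m) * self.running_var + m * batch_var

    def normalize(self, batch):
        # Use the running estimates rather than the statistics of the current batch.
        scale = math.sqrt(self.running_var + self.eps)
        return [(v - self.running_mean) / scale for v in batch]


norm = RunningNormSketch(momentum=0.5)
norm.update([2.0, 4.0])  # batch mean 3.0, batch variance 1.0
normalized = norm.normalize([1.5])
```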

class layeredrl.utils.Standardizer(latent_state_dim: int, device: device)[source]

Bases: Module

Standardize a random vector (trained externally).

__init__(latent_state_dim: int, device: device)[source]: Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_symm_mat()[source]

class layeredrl.utils.Schedule[source]

Bases: ABC

Base class for (learning rate) schedules.

class layeredrl.utils.RampSchedule(start: float, end: float, warm_up: int, duration: int)[source]

Bases: Schedule

__init__(start: float, end: float, warm_up: int, duration: int)[source]

class layeredrl.utils.ZeroThenLinearSchedule(zero_duration: int, duration: int, start_value: float, end_value: float)[source]

Bases: Schedule

__init__(zero_duration: int, duration: int, start_value: float, end_value: float)[source]

class layeredrl.utils.PiecewiseLinearSchedule(points: List[Tuple[int, float]])[source]

Bases: Schedule

__init__(points: List[Tuple[int, float]])[source]
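
The behavior of this schedule is not documented here. A natural reading of points as (step, value) pairs is linear interpolation between neighboring points, sketched below as a standalone function. The function name, the clamping outside the covered range, and the requirement of strictly increasing steps are all assumptions.

```python
from bisect import bisect_right
from typing import List, Tuple


def piecewise_linear(points: List[Tuple[int, float]], step: int) -> float:
    """Linearly interpolate between (step, value) points sorted by step."""
    steps = [s for s, _ in points]
    # Assumed behavior: hold the boundary values outside the covered range.
    if step <= steps[0]:
        return points[0][1]
    if step >= steps[-1]:
        return points[-1][1]
    # Index of the first point whose step is strictly greater than `step`.
    i = bisect_right(steps, step)
    (s0, v0), (s1, v1) = points[i - 1], points[i]
    t = (step - s0) / (s1 - s0)
    return v0 + t * (v1 - v0)


points = [(0, 1.0), (100, 0.5), (200, 0.0)]
value_at_50 = piecewise_linear(points, 50)  # halfway between 1.0 and 0.5
```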

class layeredrl.utils.PiecewiseLogLinearSchedule(points: List[Tuple[int, float]])[source]

Bases: Schedule

__init__(points: List[Tuple[int, float]])[source]
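
By analogy with the linear variant, a log-linear schedule presumably interpolates the logarithm of the value linearly between points, which is a common choice for learning rates spanning several orders of magnitude. The sketch below assumes exactly that, plus positive values, points sorted by step, and clamping outside the covered range; none of this is confirmed here and the function name is hypothetical.

```python
import math
from typing import List, Tuple


def piecewise_log_linear(points: List[Tuple[int, float]], step: int) -> float:
    """Interpolate linearly in log space between (step, value) points sorted by step."""
    # Assumed behavior: hold the boundary values outside the covered range.
    if step <= points[0][0]:
        return points[0][1]
    if step >= points[-1][0]:
        return points[-1][1]
    for (s0, v0), (s1, v1) in zip(points, points[1:]):
        if s0 <= step <= s1:
            t = (step - s0) / (s1 - s0)
            # Linear interpolation of log(value), mapped back with exp.
            return math.exp(math.log(v0) + t * (math.log(v1) - math.log(v0)))
    raise ValueError("points must be sorted by step")


# Halfway between 1e-3 and 1e-5 in log space is their geometric mean.
lr_mid = piecewise_log_linear([(0, 1e-3), (100, 1e-5)], 50)
```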

class layeredrl.utils.ConstantSchedule(value: float)[source]

Bases: Schedule

__init__(value: float)[source]