Environments and Wrappers API

class layeredrl.envs.LogRewWrapper(env, r_offset: float = 30.0, r_scale: float = 0.02)[source]

Bases: RewardWrapper

Applies a logarithmic transformation to the reward, parameterized by r_offset and r_scale.

__init__(env, r_offset: float = 30.0, r_scale: float = 0.02)[source]

Constructor for the Reward wrapper.

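Parameters:

env – Environment to be wrapped.
r_offset – Offset used in the log transformation (default 30.0).
r_scale – Scale used in the log transformation (default 0.02).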

reward(r: SupportsFloat) → SupportsFloat[source]

Returns a modified environment reward.

Parameters: r – The reward returned by the environment's step().
Returns: The modified reward.
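The exact formula used by LogRewWrapper is not stated in this reference. The sketch below shows one plausible form, log(r_scale * (r + r_offset)), purely as an illustration of how an offset and a scale can keep the logarithm well defined. The function name log_reward and the formula itself are assumptions, not the library's implementation.

```python
import math


def log_reward(r: float, r_offset: float = 30.0, r_scale: float = 0.02) -> float:
    """Hypothetical log transform: log(r_scale * (r + r_offset)).

    Assumption: the offset shifts the reward into the positive range
    before the logarithm is taken, and the scale is positive.
    """
    shifted = float(r) + r_offset
    if shifted <= 0.0 or r_scale <= 0.0:
        raise ValueError("r + r_offset and r_scale must be positive")
    return math.log(r_scale * shifted)


# With the default parameters, a raw reward of 20 maps to log(1.0).
example = log_reward(20.0)
```

Under this assumed form the transform is monotonic, so the ordering of rewards is preserved while large values are compressed.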

class layeredrl.envs.AntFlippedWrapper(env: Env[ObsType, ActType])[source]

Bases: Wrapper

Wrapper around the Ant Maze environment that terminates the episode when the Ant has flipped over.

reset(**kwargs)[source]: Calls the reset() of the wrapped env; can be overridden to change the returned data.

step(action)[source]: Calls the step() of the wrapped env; can be overridden to change the returned data.
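How the flip is detected is not described here. As a rough sketch of the wrapper pattern only, the example below forwards reset() and step() to a wrapped environment and forces terminated to True when a user-supplied predicate holds. TerminateOnConditionWrapper, StubEnv, and the height-based predicate are all hypothetical stand-ins, not part of layeredrl.

```python
from typing import Any, Callable


class TerminateOnConditionWrapper:
    """Hypothetical wrapper: sets terminated=True when condition(obs) holds."""

    def __init__(self, env: Any, condition: Callable[[Any], bool]) -> None:
        self.env = env
        self.condition = condition

    def reset(self, **kwargs):
        # Delegate unchanged to the wrapped environment.
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        if self.condition(obs):
            terminated = True
            info = dict(info, flipped=True)
        return obs, reward, terminated, truncated, info


class StubEnv:
    """Toy stand-in whose observation is a single 'torso height' number."""

    def __init__(self) -> None:
        self.height = 0.5

    def reset(self, **kwargs):
        self.height = 0.5
        return self.height, {}

    def step(self, action):
        self.height += action
        return self.height, 0.0, False, False, {}


# Assumed flip criterion for the toy example: height below 0.2.
env = TerminateOnConditionWrapper(StubEnv(), condition=lambda h: h < 0.2)
env.reset()
_, _, done_upright, _, _ = env.step(-0.1)  # height is now about 0.4
_, _, done_flipped, _, info = env.step(-0.3)  # height is now about 0.1
```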

class layeredrl.envs.AntNoWallFlippedWrapper(env: Env[ObsType, ActType])[source]

Bases: Wrapper

Wrapper around the Ant environment (without walls) that terminates the episode when the Ant has flipped over.

reset(**kwargs)[source]: Calls the reset() of the wrapped env; can be overridden to change the returned data.

step(action)[source]: Calls the step() of the wrapped env; can be overridden to change the returned data.

class layeredrl.envs.AffineRewWrapper(env, r_offset: SupportsFloat = 0.0, r_scale: SupportsFloat = 1.0)[source]

Bases: RewardWrapper

Applies an affine transformation (scale and offset) to the reward signal.

__init__(env, r_offset: SupportsFloat = 0.0, r_scale: SupportsFloat = 1.0)[source]

Constructor for the Reward wrapper.

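Parameters:

env – Environment to be wrapped.
r_offset – Offset of the affine transformation (default 0.0).
r_scale – Scale factor of the affine transformation (default 1.0).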

reward(r: SupportsFloat) → SupportsFloat[source]

Returns a modified environment reward.

Parameters: r – The reward returned by the environment's step().
Returns: The modified reward.
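As a small illustration of an affine reward transformation, the sketch below computes r_scale * r + r_offset. Whether AffineRewWrapper applies the scale before or after the offset is not stated in this reference, so the ordering here is an assumption, and affine_reward is a hypothetical helper name.

```python
def affine_reward(r: float, r_offset: float = 0.0, r_scale: float = 1.0) -> float:
    """Hypothetical affine transform: r_scale * r + r_offset.

    Assumption: the reward is scaled first and the offset is added afterwards.
    """
    return float(r_scale) * float(r) + float(r_offset)


# With the default parameters the reward passes through unchanged.
unchanged = affine_reward(5.0)
# Scale by 3, then shift by 1.
shifted = affine_reward(2.0, r_offset=1.0, r_scale=3.0)
```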

class layeredrl.envs.Maze2DEnv(maze_layout: ndarray | None = None, maze_size: Tuple[int, int] = (10, 10), cell_size: float = 1.0, start_pos: List[Tuple[float, float]] | None = None, goal_pos: List[Tuple[float, float]] | None = None, goal_radius: float = 0.3, max_velocity: float = 1.0, dt: float = 0.1, max_episode_steps: int = 400, dense_reward: bool = True, render_mode: str | None = None, pixel_size: int = 600)[source]

Bases: Env

A simple 2D maze environment with a velocity-controlled point mass.

The agent controls velocity directly. The environment includes:

- Collision detection with walls
- Configurable maze layouts
- Pygame-based visualization for rendering and planning overlays

Observation:

Type: Dict with keys:

'observation': Box(2) - current position [x, y]
'achieved_goal': Box(2) - current position [x, y]
'desired_goal': Box(2) - goal position [x, y]

Each Box has:

Min: [0, 0]
Max: [maze_width, maze_height]

Action:

Type: Box(2)

Num   Action       Min             Max
0     x velocity   -max_velocity   max_velocity
1     y velocity   -max_velocity   max_velocity

Reward:

Sparse reward of 1.0 when reaching the goal, 0.0 otherwise. With dense_reward=True (the default), the reward is dense and based on the distance to the goal. Can be customized with a reward function.

Episode Termination:

Agent reaches within goal_radius of the goal position
Episode length is greater than max_episode_steps
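The dynamics described above can be sketched as a single update for a velocity-controlled point mass. This is a simplified illustration under stated assumptions: wall collisions are omitted, the position is clamped to the box bounds, the goal test uses <=, and only the sparse reward is shown. point_mass_step and its bounds argument are hypothetical, not the Maze2DEnv implementation.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]


def point_mass_step(
    pos: Vec2,
    action: Vec2,
    goal: Vec2,
    max_velocity: float = 1.0,
    dt: float = 0.1,
    goal_radius: float = 0.3,
    bounds: Vec2 = (10.0, 10.0),
) -> Tuple[Vec2, float, bool]:
    """Advance the point mass by one step; returns (new_pos, reward, reached)."""
    # Clip each velocity component to [-max_velocity, max_velocity].
    vx = max(-max_velocity, min(max_velocity, action[0]))
    vy = max(-max_velocity, min(max_velocity, action[1]))
    # Integrate the velocity over dt and keep the position inside the maze box.
    x = min(max(pos[0] + vx * dt, 0.0), bounds[0])
    y = min(max(pos[1] + vy * dt, 0.0), bounds[1])
    # Sparse reward: 1.0 once the agent is within goal_radius of the goal.
    reached = math.hypot(goal[0] - x, goal[1] - y) <= goal_radius
    return (x, y), (1.0 if reached else 0.0), reached


# An oversized action is clipped, so the agent moves max_velocity * dt = 0.1.
far_pos, far_reward, far_reached = point_mass_step((0.0, 0.0), (5.0, 0.0), (3.0, 0.0))
near_pos, near_reward, near_reached = point_mass_step((2.8, 0.0), (1.0, 0.0), (3.0, 0.0))
```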

__init__(maze_layout: ndarray | None = None, maze_size: Tuple[int, int] = (10, 10), cell_size: float = 1.0, start_pos: List[Tuple[float, float]] | None = None, goal_pos: List[Tuple[float, float]] | None = None, goal_radius: float = 0.3, max_velocity: float = 1.0, dt: float = 0.1, max_episode_steps: int = 400, dense_reward: bool = True, render_mode: str | None = None, pixel_size: int = 600)[source]

Initialize the Maze2D environment.

Parameters:

maze_layout – Binary array where 1 = wall, 0 = free space. If None, creates empty maze.
maze_size – Size of the maze in cells (height, width) if maze_layout is None
cell_size – Size of each cell in world coordinates
start_pos – List of starting positions (x, y). If None, uses all empty cells.
goal_pos – List of goal positions (x, y). If None, uses all empty cells.
goal_radius – Distance threshold for reaching the goal
max_velocity – Maximum velocity magnitude in each dimension
dt – Time step for integration
max_episode_steps – Maximum steps per episode
dense_reward – If True, provide dense reward based on distance to goal
render_mode – “human” or “rgb_array”
pixel_size – Size of the rendering window in pixels
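To illustrate how a binary layout and cell_size relate to world coordinates, the sketch below lists the centers of all free cells, which is one way the "all empty cells" default for start_pos and goal_pos could be realized. The mapping of cell (row, col) to the center ((col + 0.5) * cell_size, (row + 0.5) * cell_size) is an assumption, and free_cell_centers is a hypothetical helper that uses nested lists rather than an ndarray.

```python
from typing import List, Sequence, Tuple


def free_cell_centers(
    layout: Sequence[Sequence[int]], cell_size: float = 1.0
) -> List[Tuple[float, float]]:
    """Return (x, y) centers of all free cells (0 = free, 1 = wall)."""
    centers = []
    for row, cells in enumerate(layout):
        for col, is_wall in enumerate(cells):
            if not is_wall:
                # Assumed convention: x follows columns, y follows rows.
                centers.append(((col + 0.5) * cell_size, (row + 0.5) * cell_size))
    return centers


# A 3x3 layout with a short free corridor in the middle row.
layout = [
    [1, 1, 1],
    [1, 0, 0],
    [1, 1, 1],
]
centers = free_cell_centers(layout, cell_size=2.0)
```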

close()[source]: Clean up resources.

metadata: dict[str, Any] = {'render_fps': 30, 'render_modes': ['human', 'rgb_array']}

render()[source]: Render the environment using Pygame.

reset(seed: int | None = None, options: Dict[str, Any] | None = None) → Tuple[ndarray, Dict[str, Any]][source]: Reset the environment to initial state.

set_plans(plans: list)[source]

Set plans to visualize in the render.

Parameters: plans – List of plans, where each plan is a dict with:

- 'trajectory': np.ndarray of shape (T, 2) with positions
- 'color': tuple (r, g, b) for rendering
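A plan with the keys described above could be assembled as in the following sketch. straight_line_plan is a hypothetical helper, and it stores the trajectory as a nested list to stay dependency-free; since the documented type is np.ndarray of shape (T, 2), the list would need to be converted (for example with np.asarray) before being passed to set_plans.

```python
from typing import Dict, List, Tuple


def straight_line_plan(
    start: Tuple[float, float],
    goal: Tuple[float, float],
    num_points: int,
    color: Tuple[int, int, int],
) -> Dict[str, object]:
    """Build a plan dict whose trajectory interpolates linearly from start to goal."""
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    trajectory: List[List[float]] = []
    for i in range(num_points):
        t = i / (num_points - 1)  # runs from 0.0 to 1.0 inclusive
        trajectory.append([
            start[0] + t * (goal[0] - start[0]),
            start[1] + t * (goal[1] - start[1]),
        ])
    return {"trajectory": trajectory, "color": color}


# Three waypoints from (0, 0) to (2, 4), drawn in red.
plan = straight_line_plan((0.0, 0.0), (2.0, 4.0), num_points=3, color=(255, 0, 0))
```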

step(action: ndarray) → Tuple[ndarray, float, bool, bool, Dict[str, Any]][source]: Execute one time step within the environment.

layeredrl.envs.create_simple_maze(size: int = 10) → ndarray[source]: Create a simple maze with some walls.

layeredrl.envs.create_corridor_maze(width: int = 20, height: int = 5) → ndarray[source]: Create a corridor maze.

layeredrl.envs.create_medium_maze() → ndarray[source]: Create a medium complexity maze.
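The layouts produced by these helpers are not specified here beyond their return type. As an illustration of the binary convention (1 = wall, 0 = free space), the sketch below builds a walled corridor as a nested list. corridor_layout is a hypothetical function and does not reproduce the output of create_corridor_maze.

```python
from typing import List


def corridor_layout(width: int = 20, height: int = 5) -> List[List[int]]:
    """Hypothetical corridor: walls on the border, free space inside."""
    return [
        [
            1 if row in (0, height - 1) or col in (0, width - 1) else 0
            for col in range(width)
        ]
        for row in range(height)
    ]


# A small 3-row corridor that is 6 cells wide.
small = corridor_layout(width=6, height=3)
```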