Environments and Wrappers API

class layeredrl.envs.LogRewWrapper(env, r_offset: float = 30.0, r_scale: float = 0.02)[source]

Bases: RewardWrapper

Applies a logarithmic transformation to the reward, parameterized by r_offset and r_scale.

__init__(env, r_offset: float = 30.0, r_scale: float = 0.02)[source]

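Constructs the reward wrapper.

Parameters:
  • env – Environment to be wrapped.

  • r_offset – Offset used in the log transformation (default 30.0).

  • r_scale – Scale factor used in the log transformation (default 0.02).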

reward(r: SupportsFloat) → SupportsFloat[source]

Returns a modified environment reward.

Parameters:

r – The reward returned by the wrapped env's step().

Returns:

The modified reward

class layeredrl.envs.AntFlippedWrapper(env: Env[ObsType, ActType])[source]

Bases: Wrapper

Wrapper around Ant Maze environment that terminates when the Ant has flipped over.

reset(**kwargs)[source]

Calls the reset() of the wrapped env; can be overridden to change the returned data.

step(action)[source]

Calls the step() of the wrapped env; overridden here so that the episode terminates when the Ant has flipped over.

class layeredrl.envs.AntNoWallFlippedWrapper(env: Env[ObsType, ActType])[source]

Bases: Wrapper

Wrapper around Ant environment that terminates when the Ant has flipped over.

reset(**kwargs)[source]

Calls the reset() of the wrapped env; can be overridden to change the returned data.

step(action)[source]

Calls the step() of the wrapped env; overridden here so that the episode terminates when the Ant has flipped over.
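The two flip-termination wrappers above share one pattern: forward step() to the wrapped env, then force termination once the Ant is upside down. The reference does not say how a flip is detected, so the sketch below uses a hypothetical is_flipped predicate and a toy stand-in environment. It only illustrates the pattern and is not the layeredrl implementation.

```python
from typing import Any, Callable, Dict, Tuple

StepResult = Tuple[Any, float, bool, bool, Dict[str, Any]]


class FlipTerminationSketch:
    """Duck-typed stand-in for a termination wrapper (not the layeredrl class)."""

    def __init__(self, env: Any, is_flipped: Callable[[Any], bool]) -> None:
        self.env = env
        # Hypothetical predicate: the real wrappers' flip test is not documented.
        self.is_flipped = is_flipped

    def reset(self, **kwargs: Any) -> Any:
        # reset() is passed through unchanged.
        return self.env.reset(**kwargs)

    def step(self, action: Any) -> StepResult:
        obs, reward, terminated, truncated, info = self.env.step(action)
        # End the episode as soon as the predicate reports a flip.
        if self.is_flipped(obs):
            terminated = True
        return obs, reward, terminated, truncated, info


class ToyEnv:
    """Toy env whose observation is a single 'up' component that decreases."""

    def __init__(self) -> None:
        self.up = 1.0

    def reset(self, **kwargs: Any) -> Tuple[float, Dict[str, Any]]:
        self.up = 1.0
        return self.up, {}

    def step(self, action: float) -> StepResult:
        self.up -= action
        return self.up, 0.0, False, False, {}


env = FlipTerminationSketch(ToyEnv(), is_flipped=lambda up: up < 0.0)
env.reset()
first = env.step(0.5)   # 'up' is 0.5, still upright
second = env.step(1.0)  # 'up' is -0.5, treated as flipped
```

In this toy run, only the second step sets the terminated flag; reward, truncated, and info pass through untouched.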

class layeredrl.envs.AffineRewWrapper(env, r_offset: SupportsFloat = 0.0, r_scale: SupportsFloat = 1.0)[source]

Bases: RewardWrapper

Applies an affine transformation to the reward signal.

__init__(env, r_offset: SupportsFloat = 0.0, r_scale: SupportsFloat = 1.0)[source]

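Constructs the reward wrapper.

Parameters:
  • env – Environment to be wrapped.

  • r_offset – Offset of the affine transformation (default 0.0).

  • r_scale – Scale factor of the affine transformation (default 1.0).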

reward(r: SupportsFloat) → SupportsFloat[source]

Returns a modified environment reward.

Parameters:

r – The reward returned by the wrapped env's step().

Returns:

The modified reward
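The reference does not state whether the offset is applied before or after scaling. As a minimal sketch, assuming the form r_scale * r + r_offset (an assumption, not confirmed by the source), the transformation looks like this. With the default arguments either ordering reduces to the identity.

```python
def affine_reward(r: float, r_offset: float = 0.0, r_scale: float = 1.0) -> float:
    """Assumed form of the transform: scale first, then shift."""
    return float(r_scale) * float(r) + float(r_offset)


# Defaults leave the reward unchanged.
identity = affine_reward(2.5)

# Non-default values: 0.5 * 2.0 + 1.0
shifted = affine_reward(2.0, r_offset=1.0, r_scale=0.5)
```

affine_reward is a hypothetical helper name; in the library the same logic would live in AffineRewWrapper.reward().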

class layeredrl.envs.Maze2DEnv(maze_layout: ndarray | None = None, maze_size: Tuple[int, int] = (10, 10), cell_size: float = 1.0, start_pos: List[Tuple[float, float]] | None = None, goal_pos: List[Tuple[float, float]] | None = None, goal_radius: float = 0.3, max_velocity: float = 1.0, dt: float = 0.1, max_episode_steps: int = 400, dense_reward: bool = True, render_mode: str | None = None, pixel_size: int = 600)[source]

Bases: Env

A simple 2D maze environment with a velocity-controlled point mass.

The agent controls velocity directly. The environment includes:
  • Collision detection with walls

  • Configurable maze layouts

  • Pygame-based visualization for rendering and planning overlays

Observation:
Type: Dict with keys:
  • ‘observation’: Box(2) - current position [x, y]

  • ‘achieved_goal’: Box(2) - current position [x, y]

  • ‘desired_goal’: Box(2) - goal position [x, y]

Each Box has:

Min: [0, 0] Max: [maze_width, maze_height]

Action:

Type: Box(2)

Num   Action       Min             Max
0     x velocity   -max_velocity   max_velocity
1     y velocity   -max_velocity   max_velocity
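Since the action is a velocity bounded per dimension by max_velocity, one step of the point mass can be sketched as a clip followed by an explicit Euler update with time step dt. Both the clipping and the Euler scheme are assumptions (the reference only gives the bounds and mentions "integration"), and wall collisions are deliberately left out.

```python
from typing import Tuple

Vec2 = Tuple[float, float]


def integrate_step(pos: Vec2, action: Vec2,
                   max_velocity: float = 1.0, dt: float = 0.1) -> Vec2:
    """Clip the commanded velocity per dimension, then take one Euler step."""
    vx = max(-max_velocity, min(max_velocity, action[0]))
    vy = max(-max_velocity, min(max_velocity, action[1]))
    # Collision handling with walls is omitted in this sketch.
    return (pos[0] + vx * dt, pos[1] + vy * dt)


# The x command (5.0) exceeds max_velocity and is clipped to 1.0.
new_pos = integrate_step((1.0, 1.0), (5.0, -0.5))
```

With the defaults, the agent can move at most max_velocity * dt = 0.1 world units per dimension per step.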

Reward:

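With dense_reward=True (the default), a dense reward based on the distance to the goal. With dense_reward=False, a sparse reward of 1.0 when reaching the goal and 0.0 otherwise. Can be customized with a reward function.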
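The sparse reward and the dense_reward constructor flag can be illustrated with a small helper. The exact dense formula is not given in the reference, so the negative Euclidean distance used here is an assumption, as is the inclusive comparison against goal_radius.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]


def goal_reward(pos: Vec2, goal: Vec2,
                goal_radius: float = 0.3, dense_reward: bool = True) -> float:
    """Hypothetical reward helper mirroring the documented sparse/dense modes."""
    dist = math.hypot(pos[0] - goal[0], pos[1] - goal[1])
    if dense_reward:
        # Assumption: dense reward is the negative distance to the goal.
        return -dist
    # Sparse mode as documented: 1.0 at the goal, 0.0 elsewhere.
    return 1.0 if dist <= goal_radius else 0.0


dense = goal_reward((0.0, 0.0), (3.0, 4.0))                       # distance is 5.0
sparse_hit = goal_reward((0.0, 0.0), (0.1, 0.0), dense_reward=False)
sparse_miss = goal_reward((0.0, 0.0), (3.0, 4.0), dense_reward=False)
```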

Episode Termination:
  • Agent reaches within goal_radius of the goal position

  • Episode length is greater than max_episode_steps
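The two end-of-episode conditions can be sketched as a pair of flags. Mapping them to the two booleans returned by step() as (terminated, truncated) follows the usual Gymnasium convention and is an assumption here; the strict "greater than" comparison follows the wording above.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]


def episode_flags(pos: Vec2, goal: Vec2, step_count: int,
                  goal_radius: float = 0.3,
                  max_episode_steps: int = 400) -> Tuple[bool, bool]:
    """Return (terminated, truncated) for the current state (hypothetical helper)."""
    dist = math.hypot(pos[0] - goal[0], pos[1] - goal[1])
    terminated = dist <= goal_radius                 # goal reached
    truncated = step_count > max_episode_steps       # step budget exceeded
    return terminated, truncated


at_goal = episode_flags((1.0, 1.0), (1.2, 1.0), step_count=10)
timed_out = episode_flags((0.0, 0.0), (5.0, 5.0), step_count=401)
```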

__init__(maze_layout: ndarray | None = None, maze_size: Tuple[int, int] = (10, 10), cell_size: float = 1.0, start_pos: List[Tuple[float, float]] | None = None, goal_pos: List[Tuple[float, float]] | None = None, goal_radius: float = 0.3, max_velocity: float = 1.0, dt: float = 0.1, max_episode_steps: int = 400, dense_reward: bool = True, render_mode: str | None = None, pixel_size: int = 600)[source]

Initialize the Maze2D environment.

Parameters:
  • maze_layout – Binary array where 1 = wall, 0 = free space. If None, creates an empty maze.

  • maze_size – Size of the maze in cells (height, width). Used only if maze_layout is None.

  • cell_size – Size of each cell in world coordinates

  • start_pos – List of starting positions (x, y). If None, uses all empty cells.

  • goal_pos – List of goal positions (x, y). If None, uses all empty cells.

  • goal_radius – Distance threshold for reaching the goal

  • max_velocity – Maximum velocity magnitude in each dimension

  • dt – Time step for integration

  • max_episode_steps – Maximum steps per episode

  • dense_reward – If True, provide dense reward based on distance to goal

  • render_mode – Rendering mode: “human”, “rgb_array”, or None (the default)

  • pixel_size – Size of the rendering window in pixels
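When start_pos or goal_pos is None, the environment draws from all empty cells. The sketch below shows one way such candidate positions could be derived from a layout. It assumes that column indices map to x, row indices map to y, and that positions sit at cell centers; none of this is confirmed by the reference. Nested lists stand in for the ndarray layout.

```python
from typing import List, Sequence, Tuple


def free_cell_centers(maze_layout: Sequence[Sequence[int]],
                      cell_size: float = 1.0) -> List[Tuple[float, float]]:
    """Collect (x, y) centers of all free cells (0 = free, 1 = wall)."""
    centers = []
    for row, cells in enumerate(maze_layout):
        for col, cell in enumerate(cells):
            if cell == 0:
                # Assumption: x follows the column index, y the row index.
                centers.append(((col + 0.5) * cell_size, (row + 0.5) * cell_size))
    return centers


# A 3 x 4 layout with a two-cell corridor enclosed by walls.
layout = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
centers = free_cell_centers(layout)
```

free_cell_centers is a hypothetical helper, useful mainly for checking that explicit start_pos and goal_pos values fall inside free space.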

close()[source]

Clean up resources.

metadata: dict[str, Any] = {'render_fps': 30, 'render_modes': ['human', 'rgb_array']}

render()[source]

Render the environment using Pygame.

reset(seed: int | None = None, options: Dict[str, Any] | None = None) → Tuple[ndarray, Dict[str, Any]][source]

Reset the environment to initial state.

set_plans(plans: list)[source]

Set plans to visualize in the render.

Parameters:

plans – List of plans, where each plan is a dict with:
  • ‘trajectory’: np.ndarray of shape (T, 2) with positions

  • ‘color’: tuple (r, g, b) for rendering
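A plan dict with the two documented keys can be assembled as follows. To keep the sketch free of dependencies, the trajectory is built as a list of (x, y) pairs; since the reference asks for an np.ndarray of shape (T, 2), it should be converted (for example with np.asarray) before being passed to set_plans(). The helper name and the 0 to 255 color range are assumptions.

```python
from typing import Dict, List, Tuple

Vec2 = Tuple[float, float]


def straight_line_plan(start: Vec2, goal: Vec2, steps: int,
                       color: Tuple[int, int, int]) -> Dict[str, object]:
    """Build a plan whose trajectory interpolates linearly from start to goal."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    trajectory: List[Vec2] = [
        (start[0] + (goal[0] - start[0]) * t / (steps - 1),
         start[1] + (goal[1] - start[1]) * t / (steps - 1))
        for t in range(steps)
    ]
    # Convert 'trajectory' to an array of shape (T, 2) before calling set_plans().
    return {"trajectory": trajectory, "color": color}


plans = [straight_line_plan((0.5, 0.5), (4.5, 0.5), steps=5, color=(255, 0, 0))]
```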

step(action: ndarray) → Tuple[ndarray, float, bool, bool, Dict[str, Any]][source]

Execute one time step within the environment.

layeredrl.envs.create_simple_maze(size: int = 10) → ndarray[source]

Create a simple maze with some walls.

layeredrl.envs.create_corridor_maze(width: int = 20, height: int = 5) → ndarray[source]

Create a corridor maze.

layeredrl.envs.create_medium_maze() → ndarray[source]

Create a medium complexity maze.