Rover on an 8×8 Gridworld: randomized goal, rocks & pit

A randomized 8×8 board with one goal (green), two rocks (impassable), and one pit (a terminal state with negative reward). Every cell \(s\) has a reward \(R(s)\); at the goal and the pit this reward is overridden with \(+G\) and \(-P\), respectively.
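As a rough illustration of the board description above, the following Python sketch builds a seeded random board. The function name `make_board`, the reward range, and the default values of `G` and `P` are assumptions made for the example; the page exposes these as controls but does not state how the board is actually generated.

```python
import random

SIZE = 8  # 8×8 board, as in the description

def make_board(seed, r_min=-1.0, r_max=0.0, G=10.0, P=10.0):
    """Return (rewards, goal, pit, rocks) for a randomized board.

    r_min, r_max, G and P are hypothetical defaults chosen for illustration.
    """
    rng = random.Random(seed)  # same seed -> same board
    cells = [(r, c) for r in range(SIZE) for c in range(SIZE)]
    # Draw four distinct cells: one goal, one pit, two rocks.
    goal, pit, rock1, rock2 = rng.sample(cells, 4)
    # Every cell gets a reward R(s) drawn from [r_min, r_max] ...
    rewards = {cell: rng.uniform(r_min, r_max) for cell in cells}
    # ... and the goal and pit override it with +G and -P.
    rewards[goal] = G
    rewards[pit] = -P
    return rewards, goal, pit, {rock1, rock2}

rewards, goal, pit, rocks = make_board(seed=0)
```

Using a dedicated `random.Random(seed)` instance keeps the board reproducible without touching the global random state.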

Arrows: ●Green=Optimal, ●Orange=Learned, ●Purple=Path

Manual Control: Use the ↑↓←→ arrow keys to move the rover

Values are initialized to zero and updated either by Bellman sweeps or by TD(0) simulation. The path heatmap intensifies as the rover traverses cells. Update rule (Bellman): \(V(s)\leftarrow \sum_a \pi(a\mid s)\,[\,R(s') + \gamma V(s')\,]\), where \(s'\) is the neighbor of \(s\) in direction \(a\) (if the move is blocked by a rock or the board edge, \(s'=s\) and the rover stays in place). Terminals: \(V(\text{goal})=G,\;V(\text{pit})=-P\).
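The Bellman update rule can be sketched in Python as follows. This is a minimal illustration, not the page's own implementation: the names `step` and `bellman_sweep`, the action labels, the synchronous (copy-then-update) sweep order, and the values of `G` and `P` are all assumptions.

```python
SIZE = 8  # 8×8 board
# Action name -> (row delta, column delta); the names are illustrative.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(s, a, rocks):
    """Neighbor s' of s under action a; blocked or off-board moves stay put."""
    r, c = s[0] + MOVES[a][0], s[1] + MOVES[a][1]
    if not (0 <= r < SIZE and 0 <= c < SIZE) or (r, c) in rocks:
        return s
    return (r, c)

def bellman_sweep(V, R, policy, goal, pit, rocks, gamma=0.90, G=10.0, P=10.0):
    """One synchronous sweep of V(s) <- sum_a pi(a|s) [R(s') + gamma V(s')]."""
    new_V = dict(V)
    for s in V:
        if s in rocks:
            continue              # rocks are never occupied
        if s == goal:
            new_V[s] = G          # terminal: V(goal) = G
        elif s == pit:
            new_V[s] = -P         # terminal: V(pit) = -P
        else:
            new_V[s] = sum(
                p * (R[step(s, a, rocks)] + gamma * V[step(s, a, rocks)])
                for a, p in policy.items()
            )
    return new_V

# Usage on a hand-made board (hypothetical layout and rewards).
cells = [(r, c) for r in range(SIZE) for c in range(SIZE)]
goal, pit, rocks = (0, 0), (7, 7), {(3, 3), (4, 4)}
R = {s: -1.0 for s in cells}
R[goal], R[pit] = 10.0, -10.0
policy = {a: 0.25 for a in MOVES}   # uniform random policy
V = bellman_sweep({s: 0.0 for s in cells}, R, policy, goal, pit, rocks)
```

Repeating `V = bellman_sweep(V, ...)` until the values stop changing gives the value function of the fixed policy. An in-place sweep would also converge, but the intermediate values would depend on the order in which cells are visited.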

Reachability & Progress

Live readouts: probability of reaching the goal within N steps, rover position, and episodes finished.
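One way to obtain the probability of reaching the goal within N steps is a forward recursion over the step budget. The sketch below assumes the rover follows the fixed policy and that the pit ends the episode; the page may instead estimate this quantity by simulation, and the function name `reach_probability` is hypothetical.

```python
SIZE = 8
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(s, a, rocks):
    """Neighbor of s under action a; blocked or off-board moves stay put."""
    r, c = s[0] + MOVES[a][0], s[1] + MOVES[a][1]
    if not (0 <= r < SIZE and 0 <= c < SIZE) or (r, c) in rocks:
        return s
    return (r, c)

def reach_probability(start, n_steps, policy, goal, pit, rocks):
    """P(rover started at `start` reaches the goal in at most n_steps moves)."""
    cells = [(r, c) for r in range(SIZE) for c in range(SIZE)]
    # With zero moves left, only the goal itself counts as reached.
    p = {s: 1.0 if s == goal else 0.0 for s in cells}
    for _ in range(n_steps):
        # One more move allowed: average the previous table over the policy.
        p = {
            s: 1.0 if s == goal
            else 0.0 if s == pit          # falling into the pit ends the episode
            else sum(w * p[step(s, a, rocks)] for a, w in policy.items())
            for s in cells
        }
    return p[start]

uniform = {a: 0.25 for a in MOVES}
prob = reach_probability((0, 1), 2, uniform, goal=(0, 0), pit=(7, 7), rocks=set())
```

The result is non-decreasing in the step budget, since a larger N only adds more ways to reach the goal.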

Parameters

Discount \(\gamma\) = 0.90, TD learning rate \(\alpha\) = 0.30

Policy: \(p_\uparrow\) = 0.25, \(p_\downarrow\) = 0.25, \(p_\leftarrow\) = 0.25, \(p_\rightarrow\) = 0.25

Other controls: Seed, Auto‑sweeps, Reward min, Reward max, Goal reward \(G\), Pit penalty \(P\)

Heatmap: blue intensity ∝ visit count. Goal=green, Pit=red, Rocks=grey. Values and rewards are shown in small text inside each cell.
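The page mentions TD(0) simulation and a visit-count heatmap without spelling out either. A minimal Python sketch of one simulated episode is given below, assuming the standard TD(0) update \(V(s)\leftarrow V(s)+\alpha\,[\,R(s')+\gamma V(s')-V(s)\,]\) with the same reward convention as the Bellman rule. The function name `td0_episode`, the `max_steps` cap, the choice to count a blocked move as another visit to the same cell, and the values of `G` and `P` are assumptions.

```python
import random
from collections import Counter

SIZE = 8
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(s, a, rocks):
    """Neighbor of s under action a; blocked or off-board moves stay put."""
    r, c = s[0] + MOVES[a][0], s[1] + MOVES[a][1]
    if not (0 <= r < SIZE and 0 <= c < SIZE) or (r, c) in rocks:
        return s
    return (r, c)

def td0_episode(V, R, policy, start, goal, pit, rocks, visits, rng,
                gamma=0.90, alpha=0.30, max_steps=1000):
    """Run one episode, updating V in place; return the final cell."""
    actions, weights = zip(*policy.items())
    s = start
    visits[s] += 1                      # heatmap: count the starting cell
    for _ in range(max_steps):
        if s == goal or s == pit:       # terminal cells end the episode
            break
        a = rng.choices(actions, weights=weights)[0]
        s2 = step(s, a, rocks)
        # TD(0): move V(s) toward the one-step target R(s') + gamma V(s').
        V[s] += alpha * (R[s2] + gamma * V[s2] - V[s])
        s = s2
        visits[s] += 1                  # heatmap: count each cell occupied
    return s

# Usage on a hand-made board (hypothetical layout and rewards).
cells = [(r, c) for r in range(SIZE) for c in range(SIZE)]
goal, pit, rocks = (0, 0), (7, 7), {(3, 3), (4, 4)}
R = {s: -1.0 for s in cells}
R[goal], R[pit] = 10.0, -10.0
V = {s: 0.0 for s in cells}
V[goal], V[pit] = 10.0, -10.0           # terminal values are set up front
visits = Counter()
rng = random.Random(0)
for _ in range(20):
    td0_episode(V, R, {a: 0.25 for a in MOVES}, (4, 1), goal, pit, rocks, visits, rng)
```

After the loop, `visits` holds the per-cell counts that a heatmap could map to blue intensity, and `V` holds the learned estimates.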