Rover on an 8×8 Gridworld: randomized goal, rocks & pit

Randomized 8×8 board with one goal (green), two rocks (impassable), and one pit (terminal negative). Every cell has a reward \(R(s)\); goal and pit override with \(+G\) and \(-P\).

Arrows: ●Green=Optimal, ●Orange=Learned, ●Purple=Path
Manual Control: Use ↑↓←→ arrow keys to move rover
Value estimates start at zero and are updated by Bellman sweeps or by TD(0) simulation. The path heatmap intensifies as the rover traverses cells.

Update rule (Bellman): \(V(s)\leftarrow \sum_a \pi(a|s)\,[\,R(s') + \gamma V(s')\,]\), where \(s'\) is the neighbor reached by action \(a\) (a move into a rock or a wall leaves the rover in place, so \(s'=s\)). Terminal values: \(V(\text{goal})=G,\;V(\text{pit})=-P\).
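The update rule can be sketched in a few lines of Python. This is a minimal illustration, not the demo's actual code: the layout (`GOAL`, `PIT`, `ROCKS`), the constants `G`, `P`, `GAMMA`, `STEP_REWARD`, the uniform random policy \(\pi(a|s)=1/4\), and the TD(0) step size `alpha` are all assumptions chosen for the example. The TD(0) function uses the standard form \(V(s)\leftarrow V(s)+\alpha\,[\,R(s')+\gamma V(s')-V(s)\,]\).

```python
# Hypothetical fixed layout; the demo randomizes these positions.
N = 8
GOAL, PIT = (7, 7), (3, 4)
ROCKS = {(2, 2), (5, 1)}
G, P, GAMMA = 10.0, 10.0, 0.9      # assumed terminal magnitudes and discount
STEP_REWARD = -0.1                 # assumed R(s) for every ordinary cell
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def reward(s):
    """R(s), with the goal and pit overriding the per-cell reward."""
    if s == GOAL:
        return G
    if s == PIT:
        return -P
    return STEP_REWARD


def step(s, a):
    """Return the neighbor s'; a move into a wall or rock stays in place."""
    r, c = s[0] + a[0], s[1] + a[1]
    if not (0 <= r < N and 0 <= c < N) or (r, c) in ROCKS:
        return s
    return (r, c)


def bellman_sweep(V):
    """One synchronous sweep under a uniform random policy (an assumption)."""
    new_V = dict(V)
    for s in V:
        if s == GOAL:
            new_V[s] = G           # terminal value is pinned to +G
        elif s == PIT:
            new_V[s] = -P          # terminal value is pinned to -P
        elif s not in ROCKS:       # rocks are never occupied; leave them at 0
            new_V[s] = sum(
                0.25 * (reward(step(s, a)) + GAMMA * V[step(s, a)])
                for a in ACTIONS
            )
    return new_V


def td0_update(V, s, s_next, alpha=0.1):
    """In-place TD(0) update for one observed transition s -> s_next."""
    V[s] += alpha * (reward(s_next) + GAMMA * V[s_next] - V[s])
```

Starting from `V = {(r, c): 0.0 for r in range(N) for c in range(N)}`, repeated calls to `bellman_sweep(V)` propagate the goal and pit values outward one cell per sweep, while `td0_update` would be called once per simulated rover move.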

Reachability & Progress

Probability of reaching the goal within N steps
Rover position
Episodes finished
0
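
One way to obtain the probability of reaching the goal within N steps is a short dynamic program over the step budget. The sketch below is an assumption about how such a number could be computed, not a description of the demo, which may instead estimate it from simulated episodes. It assumes a uniform random policy, a hypothetical fixed layout, and that the pit ends the episode without reaching the goal.

```python
# Hypothetical fixed layout; the demo randomizes these positions.
N = 8
GOAL, PIT = (7, 7), (3, 4)
ROCKS = {(2, 2), (5, 1)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(s, a):
    """Return the neighbor cell; a move into a wall or rock stays in place."""
    r, c = s[0] + a[0], s[1] + a[1]
    if not (0 <= r < N and 0 <= c < N) or (r, c) in ROCKS:
        return s
    return (r, c)


def reach_probability(start, max_steps):
    """P(goal reached within max_steps moves) under a uniform random policy."""
    cells = [(r, c) for r in range(N) for c in range(N) if (r, c) not in ROCKS]
    # With zero moves left, only the goal itself counts as "reached".
    p = {s: 1.0 if s == GOAL else 0.0 for s in cells}
    for _ in range(max_steps):
        nxt = {}
        for s in cells:
            if s == GOAL:
                nxt[s] = 1.0       # already at the goal
            elif s == PIT:
                nxt[s] = 0.0       # episode ended in the pit
            else:
                # Average over the four equally likely moves.
                nxt[s] = sum(0.25 * p[step(s, a)] for a in ACTIONS)
        p = nxt
    return p[start]
```

For example, `reach_probability((7, 6), 1)` is 0.25 in this layout, because exactly one of the four equally likely moves enters the goal.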

Parameters

Heatmap: blue intensity ∝ visit count. Goal=green, Pit=red, Rocks=grey. Values & rewards are shown inside each cell (small).