Rover on an 8×8 Gridworld: randomized goal, rocks & pit
Randomized 8×8 board with one goal (green), two rocks (impassable), and one pit (terminal negative).
Every cell has a reward \(R(s)\); at the goal and the pit it is overridden by \(+G\) and \(-P\), respectively.
Value estimates start at zero and are updated either by Bellman sweeps or by TD(0) simulation. The path heatmap intensifies as the rover traverses cells.
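The TD(0) variant can be sketched as follows. This is a minimal illustration rather than the page's actual code: it assumes a uniformly random policy, the standard update \(V(s)\leftarrow V(s)+\alpha\,[\,R(s')+\gamma V(s')-V(s)\,]\), and hypothetical values for the board layout, the step reward, \(\alpha\), \(\gamma\), \(G\) and \(P\). The names `step` and `td0_episode` are invented for this sketch. The per-cell visit counts it returns are the kind of data a path heatmap could be drawn from.

```python
import random

SIZE = 8
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(s, a, rocks):
    """Move from s by action a; a wall or rock leaves the rover in place."""
    r, c = s[0] + a[0], s[1] + a[1]
    if not (0 <= r < SIZE and 0 <= c < SIZE) or (r, c) in rocks:
        return s
    return (r, c)


def td0_episode(V, R, start, goal, pit, rocks, G, P,
                alpha=0.1, gamma=0.9, rng=random, max_steps=500):
    """Run one episode under a uniform random policy, updating V in place.

    Returns the final cell and a dict of visit counts per cell.
    """
    V[goal], V[pit] = G, -P          # terminal values are pinned
    visits = {}
    s = start
    for _ in range(max_steps):
        visits[s] = visits.get(s, 0) + 1
        if s in (goal, pit):         # terminal cell: episode ends
            break
        nxt = step(s, rng.choice(ACTIONS), rocks)
        # TD(0): move V(s) toward the one-step target R(s') + gamma * V(s')
        V[s] += alpha * (R[nxt] + gamma * V[nxt] - V[s])
        s = nxt
    return s, visits


# Hypothetical layout and rewards (the page randomizes these).
goal, pit, rocks = (0, 7), (7, 7), {(3, 3), (4, 4)}
G, P = 1.0, 1.0
cells = [(r, c) for r in range(SIZE) for c in range(SIZE)]
R = {s: -0.04 for s in cells}        # assumed per-cell reward
R[goal], R[pit] = G, -P              # goal and pit override R(s)
V = {s: 0.0 for s in cells}          # values start at zero

rng = random.Random(0)
for _ in range(50):
    td0_episode(V, R, (7, 0), goal, pit, rocks, G, P, rng=rng)
```

Because each update uses only the single transition the rover actually took, the estimates are noisy early on and depend on which cells the random walk happens to visit.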
Update rule (Bellman): \(V(s)\leftarrow \sum_a \pi(a|s)\,[\,R(s') + \gamma V(s')\,]\), where \(s'\) is the neighbor reached from \(s\) by action \(a\); if the move is blocked by a rock or a wall, the rover stays in place and \(s'=s\). Terminals: \(V(\text{goal})=G,\;V(\text{pit})=-P\).
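One synchronous sweep of this rule might look like the sketch below. It assumes a uniform random policy, \(\pi(a|s)=1/4\), and uses a hypothetical fixed layout, step reward and \(\gamma\) in place of the page's randomized board; `step` and `bellman_sweep` are names invented for this example.

```python
SIZE = 8
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(s, a, rocks):
    """Neighbor s' of s under action a; blocked moves stay in place."""
    r, c = s[0] + a[0], s[1] + a[1]
    if not (0 <= r < SIZE and 0 <= c < SIZE) or (r, c) in rocks:
        return s
    return (r, c)


def bellman_sweep(V, R, goal, pit, rocks, G, P, gamma):
    """Return new values after one synchronous sweep over all cells."""
    new_V = dict(V)
    for s in V:
        if s == goal:
            new_V[s] = G                     # V(goal) = G
        elif s == pit:
            new_V[s] = -P                    # V(pit) = -P
        elif s in rocks:
            continue                         # rocks are never occupied
        else:
            # Uniform policy: each of the four actions has weight 1/4.
            new_V[s] = sum(
                0.25 * (R[nxt] + gamma * V[nxt])
                for nxt in (step(s, a, rocks) for a in ACTIONS)
            )
    return new_V


# Hypothetical layout and rewards (the page randomizes these).
goal, pit, rocks = (0, 7), (7, 7), {(3, 3), (4, 4)}
G, P, gamma = 1.0, 1.0, 0.9
cells = [(r, c) for r in range(SIZE) for c in range(SIZE)]
R = {s: -0.04 for s in cells}                # assumed per-cell reward
R[goal], R[pit] = G, -P                      # goal and pit override R(s)

V = {s: 0.0 for s in cells}                  # values start at zero
for _ in range(100):
    V = bellman_sweep(V, R, goal, pit, rocks, G, P, gamma)
```

The sweep reads only the old values and writes into a copy, so the result does not depend on the order in which cells are visited. An in-place update would also converge, but intermediate values would differ.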
Reachability & Progress
Probability of reaching the goal within N steps: —
Rover position: —
Episodes finished: 0
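The reach probability can be computed exactly by dynamic programming over the step budget. The sketch below assumes a uniform random policy and a hypothetical fixed layout; the page may instead estimate this quantity empirically from finished episodes. With \(p_k(s)\) the probability of reaching the goal from \(s\) within \(k\) steps, \(p_k(\text{goal})=1\), \(p_k(\text{pit})=0\), and \(p_k(s)=\sum_a \pi(a|s)\,p_{k-1}(s')\) otherwise. The function name `reach_prob` is invented for this example.

```python
SIZE = 8
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(s, a, rocks):
    """Neighbor s' of s under action a; blocked moves stay in place."""
    r, c = s[0] + a[0], s[1] + a[1]
    if not (0 <= r < SIZE and 0 <= c < SIZE) or (r, c) in rocks:
        return s
    return (r, c)


def reach_prob(start, goal, pit, rocks, n_steps):
    """Probability that a uniform random walk from start hits the goal
    within n_steps moves, where the pit ends the walk without success."""
    p = {(r, c): 0.0 for r in range(SIZE) for c in range(SIZE)}
    p[goal] = 1.0                            # already at the goal
    for _ in range(n_steps):
        new_p = dict(p)
        for s in p:
            if s in (goal, pit) or s in rocks:
                continue                     # absorbing or unreachable cells
            new_p[s] = sum(0.25 * p[step(s, a, rocks)] for a in ACTIONS)
        p = new_p
    return p[start]


# Hypothetical layout (the page randomizes this).
goal, pit, rocks = (0, 7), (7, 7), {(3, 3), (4, 4)}
print(reach_prob((7, 0), goal, pit, rocks, 50))
```

The value is non-decreasing in the step budget, since any walk that reaches the goal within \(k\) steps also does so within \(k+1\).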
Parameters
Heatmap: blue intensity ∝ visit count. Goal = green, pit = red, rocks = grey. Values and rewards are shown in small text inside each cell.