Lowest advantage episodes (unexpected failures):
trajectory 2, frame 443
trajectory 3, frame 229
trajectory 5, frame 264
trajectory 6, frame 106
trajectory 1, frame 156
trajectory 8, frame 211
trajectory 4, frame 150
trajectory 7, frame 155
Highest advantage episodes (unexpected successes):
trajectory 7, frame 102
trajectory 2, frame 261
trajectory 3, frame 504
trajectory 8, frame 283
trajectory 6, frame 477
trajectory 4, frame 112
trajectory 1, frame 327
trajectory 5, frame 96
[Interactive figure: for each frame of the selected trajectory, the viewer shows the observation alongside its positive and negative attribution, the policy over the actions (↙ ← ↖ ↓ no-op ↑ ↘ → ↗ D A W S Q E), the next action taken, and the policy logits with their sums.]