Lowest advantage episodes (unexpected failures):
trajectory 1, frame 421
trajectory 2, frame 212
trajectory 4, frame 484
trajectory 4, frame 95
trajectory 6, frame 163
trajectory 3, frame 92
trajectory 8, frame 415
trajectory 5, frame 366
trajectory 7, frame 220
trajectory 5, frame 63
trajectory 2, frame 188
trajectory 1, frame 6
trajectory 1, frame 196
trajectory 7, frame 293
trajectory 1, frame 76
trajectory 4, frame 191
Highest advantage episodes (unexpected successes):
trajectory 4, frame 440
trajectory 2, frame 190
trajectory 1, frame 285
trajectory 2, frame 217
trajectory 7, frame 123
trajectory 1, frame 67
trajectory 3, frame 99
trajectory 6, frame 236
trajectory 8, frame 149
trajectory 8, frame 448
trajectory 4, frame 82
trajectory 7, frame 21
trajectory 3, frame 230
trajectory 8, frame 216
trajectory 4, frame 325
trajectory 3, frame 303
[Interactive viewer: per-frame display of the observation alongside positive and negative attribution, with the policy, its logits, and the next action chosen from ↙ ← ↖ ↓ no-op ↑ ↘ → ↗ D A W S Q E.]