Lowest advantage episodes (unexpected failures):
trajectory 3, frame 36
trajectory 7, frame 135
trajectory 3, frame 328
trajectory 4, frame 125
trajectory 5, frame 250
trajectory 8, frame 177
trajectory 2, frame 271
trajectory 1, frame 333
trajectory 6, frame 234
trajectory 5, frame 493
trajectory 5, frame 30
trajectory 6, frame 435
trajectory 3, frame 510
Highest advantage episodes (unexpected successes):
trajectory 5, frame 376
trajectory 6, frame 394
trajectory 3, frame 483
trajectory 5, frame 132
trajectory 8, frame 501
trajectory 4, frame 460
trajectory 2, frame 57
trajectory 7, frame 104
trajectory 6, frame 499
trajectory 5, frame 467
trajectory 1, frame 477
trajectory 3, frame 505
[Interactive trajectory viewer. For each frame it shows the observation, the positive attribution, and the negative attribution, together with the policy's next action and the policy logits over the actions ↙ ← ↖ ↓ no-op ↑ ↘ → ↗ D A W S Q E.]