Lowest advantage episodes (unexpected failures):
trajectory 6, frame 395
trajectory 2, frame 440
trajectory 2, frame 22
trajectory 8, frame 265
trajectory 4, frame 397
trajectory 7, frame 380
trajectory 3, frame 322
trajectory 4, frame 328
trajectory 8, frame 506
trajectory 5, frame 496
trajectory 5, frame 437
trajectory 5, frame 16
trajectory 5, frame 47
trajectory 7, frame 113
trajectory 2, frame 136
trajectory 4, frame 482
Highest advantage episodes (unexpected successes):
trajectory 5, frame 19
trajectory 7, frame 159
trajectory 8, frame 425
trajectory 4, frame 427
trajectory 8, frame 353
trajectory 2, frame 50
trajectory 5, frame 487
trajectory 2, frame 240
trajectory 7, frame 360
trajectory 2, frame 488
trajectory 4, frame 109
trajectory 6, frame 59
trajectory 6, frame 434
trajectory 4, frame 315
trajectory 5, frame 297
trajectory 4, frame 235
[Interactive figure: for each frame, shows the observation, positive attribution, and negative attribution, together with the policy, the next action, the policy logits, and the sums of policy logits over the actions no-op, →, ←, ↑, ↗, ↖, ↓, A, B.]