Lowest advantage episodes (unexpected failures):
trajectory 4, frame 133
trajectory 4, frame 228
trajectory 5, frame 393
trajectory 6, frame 16
trajectory 6, frame 147
trajectory 1, frame 23
trajectory 8, frame 88
trajectory 3, frame 442
trajectory 3, frame 314
trajectory 1, frame 320
trajectory 2, frame 381
trajectory 7, frame 32
trajectory 5, frame 129
trajectory 1, frame 373
trajectory 7, frame 307
trajectory 7, frame 487
Highest advantage episodes (unexpected successes):
trajectory 7, frame 35
trajectory 3, frame 324
trajectory 2, frame 430
trajectory 7, frame 493
trajectory 5, frame 408
trajectory 5, frame 506
trajectory 3, frame 457
trajectory 4, frame 166
trajectory 6, frame 53
trajectory 6, frame 434
trajectory 1, frame 147
trajectory 8, frame 418
trajectory 2, frame 43
trajectory 4, frame 297
trajectory 8, frame 312
trajectory 8, frame 42
[Interactive figure: for the selected frame, the observation shown alongside its positive and negative attribution, together with the policy logits and sums of policy logits for each action (no-op, →, ←, ↑, ↗, ↖, ↓, A, B) and the next action taken.]