Lowest advantage episodes (unexpected failures):
trajectory 5, frame 443
trajectory 1, frame 6
trajectory 1, frame 304
trajectory 3, frame 345
trajectory 5, frame 32
trajectory 6, frame 461
trajectory 1, frame 438
trajectory 3, frame 212
trajectory 8, frame 464
trajectory 3, frame 494
trajectory 8, frame 63
trajectory 2, frame 438
trajectory 2, frame 132
trajectory 6, frame 350
trajectory 4, frame 421
trajectory 1, frame 104
Highest advantage episodes (unexpected successes):
trajectory 7, frame 483
trajectory 1, frame 444
trajectory 3, frame 254
trajectory 1, frame 368
trajectory 3, frame 361
trajectory 7, frame 42
trajectory 1, frame 122
trajectory 2, frame 66
trajectory 2, frame 220
trajectory 7, frame 216
trajectory 2, frame 374
trajectory 1, frame 277
trajectory 3, frame 149
trajectory 4, frame 399
trajectory 5, frame 80
trajectory 6, frame 370
Trajectory viewer (interactive figure): for each frame, shows the observation alongside its positive and negative attribution, the policy logits and their sums, and the next action. Actions: no-op, →, ←, ↑, ↗, ↖, ↓, A, B.