Lowest advantage episodes (unexpected failures):
trajectory 3, frame 440
trajectory 5, frame 255
trajectory 8, frame 442
trajectory 1, frame 217
trajectory 6, frame 373
trajectory 8, frame 166
trajectory 3, frame 158
trajectory 5, frame 412
trajectory 6, frame 495
trajectory 3, frame 55
trajectory 3, frame 271
trajectory 6, frame 224
trajectory 8, frame 36
trajectory 1, frame 87
trajectory 7, frame 123
trajectory 4, frame 40
Highest advantage episodes (unexpected successes):
trajectory 5, frame 275
trajectory 5, frame 59
trajectory 6, frame 509
trajectory 6, frame 50
trajectory 3, frame 137
trajectory 4, frame 227
trajectory 8, frame 71
trajectory 1, frame 355
trajectory 2, frame 221
trajectory 7, frame 418
trajectory 2, frame 451
trajectory 1, frame 417
trajectory 4, frame 90
trajectory 6, frame 97
trajectory 1, frame 274
trajectory 2, frame 308
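The two lists above rank frames by advantage: a low advantage means the outcome was worse than the value function predicted (an unexpected failure), and a high one means it was better (an unexpected success). Below is a minimal sketch of how such a ranking could be produced. It assumes a Monte Carlo advantage estimate A_t = G_t - V(s_t), where G_t is the discounted return-to-go and V(s_t) is the value prediction. The source does not say which estimator or discount factor it used, so `discounted_returns`, `rank_frames_by_advantage`, `gamma` and `k` are hypothetical illustrations. Indices in the sketch are 0-based.

```python
from typing import List, Sequence, Tuple


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Return-to-go G_t = r_t + gamma * G_{t+1}, computed backwards from the last frame."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def rank_frames_by_advantage(
    trajectories: Sequence[Tuple[Sequence[float], Sequence[float]]],
    gamma: float = 0.99,
    k: int = 3,
) -> Tuple[List[Tuple[int, int, float]], List[Tuple[int, int, float]]]:
    """Hypothetical helper: trajectories is a list of (rewards, values) pairs.

    Returns (lowest, highest), each a list of (trajectory_index, frame_index, advantage),
    using the assumed Monte Carlo advantage A_t = G_t - V(s_t).
    """
    scored: List[Tuple[int, int, float]] = []
    for traj_idx, (rewards, values) in enumerate(trajectories):
        returns = discounted_returns(rewards, gamma)
        for frame_idx, (g, v) in enumerate(zip(returns, values)):
            scored.append((traj_idx, frame_idx, g - v))
    scored.sort(key=lambda item: item[2])  # ascending by advantage
    lowest = scored[:k]           # unexpected failures
    highest = scored[::-1][:k]    # unexpected successes
    return lowest, highest


# Toy data: the first trajectory ends in a reward the value function underestimated;
# the second never receives the reward the value function expected.
toy = [([0.0, 0.0, 1.0], [0.9, 0.9, 0.9]), ([0.0, 0.0, 0.0], [0.5, 0.5, 0.5])]
lowest, highest = rank_frames_by_advantage(toy, gamma=1.0, k=3)
print("lowest:", lowest)
print("highest:", highest)
```

With `gamma=1.0`, every frame of the second toy trajectory has advantage -0.5 and lands in the "unexpected failures" list, while the first trajectory's frames (advantage about 0.1) fill the "unexpected successes" list.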
[Interactive figure: for the selected frame, the observation is shown alongside its positive and negative attribution, together with the policy logits and sums of policy logits over the actions no-op, →, ←, ↑, ↗, ↖, ↓, A, B.]