Lowest advantage episodes (unexpected failures):
trajectory 3, frame 355
trajectory 6, frame 451
trajectory 6, frame 70
trajectory 1, frame 424
trajectory 7, frame 38
trajectory 7, frame 318
trajectory 2, frame 404
trajectory 5, frame 226
trajectory 1, frame 493
trajectory 1, frame 26
trajectory 5, frame 123
trajectory 3, frame 97
trajectory 6, frame 285
trajectory 8, frame 496
trajectory 7, frame 471
trajectory 3, frame 45
Highest advantage episodes (unexpected successes):
trajectory 2, frame 415
trajectory 3, frame 386
trajectory 5, frame 223
trajectory 8, frame 312
trajectory 5, frame 166
trajectory 3, frame 454
trajectory 8, frame 82
trajectory 6, frame 483
trajectory 5, frame 23
trajectory 2, frame 207
trajectory 1, frame 437
trajectory 1, frame 209
trajectory 7, frame 366
trajectory 3, frame 274
trajectory 8, frame 215
trajectory 7, frame 126
[Interactive episode viewer. Per frame: policy and next action over the actions no-op, →, ←, ↑, ↗, ↖, ↓, A, B. Columns: Observation, Positive attribution, Negative attribution. Rows: policy logits; sums of policy logits.]