Lowest advantage episodes (unexpected failures):
trajectory 1, frame 358
trajectory 2, frame 204
trajectory 6, frame 392
trajectory 4, frame 197
trajectory 5, frame 307
trajectory 6, frame 1
trajectory 7, frame 63
trajectory 5, frame 394
trajectory 6, frame 230
trajectory 5, frame 500
trajectory 3, frame 330
trajectory 5, frame 185
trajectory 3, frame 421
trajectory 4, frame 1
trajectory 3, frame 96
trajectory 8, frame 428
Highest advantage episodes (unexpected successes):
trajectory 1, frame 435
trajectory 4, frame 131
trajectory 7, frame 164
trajectory 2, frame 445
trajectory 6, frame 330
trajectory 1, frame 53
trajectory 4, frame 406
trajectory 4, frame 36
trajectory 5, frame 61
trajectory 8, frame 163
trajectory 5, frame 9
trajectory 3, frame 489
trajectory 3, frame 229
trajectory 1, frame 131
trajectory 6, frame 462
trajectory 8, frame 473
[Interactive figure: for the selected frame, the panels show the observation, positive attribution, and negative attribution, together with the policy logits over the actions no-op, →, ←, ↑, ↗, ↖, ↓, A, B.]