Lowest advantage episodes (unexpected failures):
trajectory 3, frame 409
trajectory 8, frame 293
trajectory 5, frame 384
trajectory 7, frame 94
trajectory 6, frame 263
trajectory 6, frame 475
trajectory 2, frame 305
trajectory 8, frame 490
trajectory 3, frame 30
trajectory 4, frame 149
trajectory 1, frame 237
trajectory 1, frame 105
trajectory 4, frame 17
trajectory 8, frame 47
trajectory 7, frame 385
trajectory 1, frame 396
Highest advantage episodes (unexpected successes):
trajectory 8, frame 296
trajectory 4, frame 345
trajectory 4, frame 426
trajectory 3, frame 489
trajectory 4, frame 474
trajectory 5, frame 397
trajectory 1, frame 303
trajectory 8, frame 142
trajectory 2, frame 49
trajectory 6, frame 387
trajectory 6, frame 354
trajectory 8, frame 5
trajectory 2, frame 162
trajectory 4, frame 13
trajectory 2, frame 466
trajectory 5, frame 200
[Interactive viewer: for the selected frame, shows the observation alongside its positive and negative attribution, the next action, and the policy logits and sums of policy logits over the actions no-op, →, ←, ↑, ↗, ↖, ↓, A, B.]