Lowest advantage episodes (unexpected failures):
trajectory 4, frame 164
trajectory 4, frame 449
trajectory 1, frame 289
trajectory 8, frame 220
trajectory 3, frame 390
trajectory 8, frame 99
trajectory 1, frame 204
trajectory 7, frame 308
trajectory 6, frame 402
trajectory 8, frame 292
trajectory 2, frame 466
trajectory 5, frame 434
trajectory 8, frame 411
trajectory 6, frame 344
trajectory 2, frame 198
trajectory 3, frame 184
Highest advantage episodes (unexpected successes):
trajectory 8, frame 466
trajectory 4, frame 333
trajectory 5, frame 470
trajectory 1, frame 250
trajectory 1, frame 65
trajectory 7, frame 412
trajectory 7, frame 323
trajectory 2, frame 499
trajectory 4, frame 379
trajectory 1, frame 38
trajectory 6, frame 373
trajectory 2, frame 247
trajectory 1, frame 136
trajectory 8, frame 355
trajectory 8, frame 266
trajectory 6, frame 289
[Interactive figure: for each frame, the observation is shown alongside its positive and negative attribution, together with the policy logits and the sums of policy logits over the actions no-op, →, ←, ↑, ↗, ↖, ↓, A, B.]