Lowest advantage episodes (unexpected failures):
trajectory 2, frame 276
trajectory 6, frame 316
trajectory 4, frame 435
trajectory 2, frame 59
trajectory 8, frame 231
trajectory 5, frame 19
trajectory 5, frame 196
trajectory 8, frame 443
trajectory 3, frame 66
trajectory 6, frame 213
trajectory 4, frame 96
trajectory 7, frame 191
trajectory 6, frame 60
trajectory 1, frame 452
trajectory 2, frame 422
trajectory 7, frame 423
Highest advantage episodes (unexpected successes):
trajectory 1, frame 497
trajectory 6, frame 331
trajectory 5, frame 336
trajectory 5, frame 477
trajectory 8, frame 391
trajectory 3, frame 42
trajectory 7, frame 43
trajectory 5, frame 366
trajectory 7, frame 219
trajectory 1, frame 49
trajectory 3, frame 262
trajectory 4, frame 33
trajectory 8, frame 6
trajectory 3, frame 461
trajectory 1, frame 418
trajectory 5, frame 289
[Interactive figure: per-frame view showing the frame number, the policy over the actions (no-op, →, ←, ↑, ↗, ↖, ↓, A, B), and the next action. Panels: Observation, Positive attribution, Negative attribution. Also shown: policy logits and sums of policy logits.]