Lowest advantage episodes (unexpected failures):
trajectory 6, frame 408
trajectory 7, frame 84
trajectory 8, frame 500
trajectory 5, frame 487
trajectory 2, frame 264
trajectory 5, frame 130
trajectory 1, frame 405
trajectory 6, frame 78
trajectory 5, frame 393
trajectory 3, frame 80
trajectory 7, frame 232
trajectory 1, frame 317
trajectory 6, frame 183
trajectory 1, frame 68
trajectory 4, frame 119
trajectory 5, frame 56
Highest advantage episodes (unexpected successes):
trajectory 7, frame 96
trajectory 5, frame 511
trajectory 2, frame 313
trajectory 7, frame 472
trajectory 2, frame 237
trajectory 3, frame 378
trajectory 6, frame 479
trajectory 8, frame 10
trajectory 8, frame 511
trajectory 5, frame 249
trajectory 4, frame 216
trajectory 7, frame 498
trajectory 6, frame 358
trajectory 2, frame 12
trajectory 1, frame 140
trajectory 1, frame 258
[Interactive frame viewer: for each frame, shows the observation with its positive and negative attribution, the policy over the actions (no-op, →, ←, ↑, ↗, ↖, ↓, A, B), the next action, the policy logits, and sums of policy logits.]