Lowest advantage episodes (unexpected failures):
trajectory 2, frame 98
trajectory 3, frame 400
trajectory 5, frame 127
trajectory 6, frame 49
trajectory 6, frame 314
trajectory 1, frame 467
trajectory 7, frame 406
trajectory 1, frame 393
trajectory 8, frame 211
trajectory 5, frame 1
trajectory 4, frame 19
trajectory 3, frame 182
trajectory 1, frame 39
trajectory 4, frame 318
trajectory 7, frame 497
trajectory 1, frame 139
Highest advantage episodes (unexpected successes):
trajectory 8, frame 319
trajectory 1, frame 97
trajectory 3, frame 411
trajectory 6, frame 9
trajectory 7, frame 48
trajectory 8, frame 55
trajectory 4, frame 107
trajectory 3, frame 68
trajectory 4, frame 350
trajectory 5, frame 252
trajectory 7, frame 454
trajectory 5, frame 179
trajectory 3, frame 296
trajectory 7, frame 326
trajectory 6, frame 392
trajectory 1, frame 441
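The two lists above rank frames by advantage. A low advantage means the outcome was worse than the value function predicted (an unexpected failure), and a high advantage means it was better (an unexpected success). The source does not say how the advantages were computed. The sketch below assumes generalized advantage estimation (GAE) over each trajectory, with assumed hyperparameters γ = 0.999 and λ = 0.95. The helper names `gae_advantages` and `rank_frames` are hypothetical, and frame indices here are zero-based.

```python
from typing import Dict, List, Sequence, Tuple


def gae_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    gamma: float = 0.999,  # assumed discount factor
    lam: float = 0.95,     # assumed GAE smoothing parameter
) -> List[float]:
    """Generalized advantage estimation over one rollout.

    values has one more entry than rewards: values[t] is V(s_t), and
    values[-1] bootstraps the state after the last step.
    dones[t] is True if the episode ended after step t.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error: how much better step t went than V predicted.
        delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
        # Exponentially weighted sum of future TD errors (reset at episode ends).
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    return advantages


def rank_frames(
    advantages_by_traj: Dict[int, List[float]], k: int
) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Return the k lowest- and k highest-advantage (trajectory, frame) pairs."""
    scored = sorted(
        (adv, traj, frame)
        for traj, advs in advantages_by_traj.items()
        for frame, adv in enumerate(advs)
    )
    lowest = [(traj, frame) for _, traj, frame in scored[:k]]
    # Reverse first so that k == 0 yields an empty list rather than everything.
    highest = [(traj, frame) for _, traj, frame in scored[::-1][:k]]
    return lowest, highest


if __name__ == "__main__":
    # Toy rollouts: the value function predicts 5 everywhere, but trajectory 1
    # ends with reward 10 (better than expected) and trajectory 2 with 0 (worse).
    adv = {
        1: gae_advantages([0.0, 0.0, 10.0], [5.0, 5.0, 5.0, 0.0],
                          [False, False, True], gamma=1.0, lam=1.0),
        2: gae_advantages([0.0, 0.0, 0.0], [5.0, 5.0, 5.0, 0.0],
                          [False, False, True], gamma=1.0, lam=1.0),
    }
    lowest, highest = rank_frames(adv, k=1)
    print("lowest:", lowest, "highest:", highest)
```

With γ = λ = 1 the toy example gives advantages of 5 for every frame of trajectory 1 and -5 for every frame of trajectory 2, so the lowest-advantage frame comes from trajectory 2 and the highest from trajectory 1. Ties are broken by trajectory and frame index, which is an arbitrary choice.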
Interactive figure: for the selected frame, the observation with its positive and negative attribution, and the policy logits over the actions no-op, →, ←, ↑, ↗, ↖, ↓, A, B.