Lowest advantage episodes (unexpected failures):
trajectory 7, frame 491
trajectory 7, frame 102
trajectory 1, frame 407
trajectory 8, frame 54
trajectory 4, frame 183
trajectory 7, frame 2
trajectory 4, frame 343
trajectory 5, frame 7
trajectory 1, frame 52
trajectory 3, frame 285
trajectory 3, frame 452
trajectory 7, frame 344
trajectory 6, frame 504
trajectory 2, frame 246
trajectory 7, frame 249
trajectory 8, frame 469
Highest advantage episodes (unexpected successes):
trajectory 5, frame 71
trajectory 7, frame 130
trajectory 1, frame 68
trajectory 4, frame 88
trajectory 4, frame 438
trajectory 6, frame 316
trajectory 8, frame 138
trajectory 7, frame 453
trajectory 5, frame 379
trajectory 8, frame 450
trajectory 8, frame 168
trajectory 5, frame 225
trajectory 3, frame 39
trajectory 4, frame 394
trajectory 2, frame 83
trajectory 8, frame 279
[Interactive figure: observation, positive attribution, and negative attribution for each frame, with policy logits for the actions no-op, →, ←, ↑, ↗, ↖, ↓, A, B.]