Lowest advantage episodes (unexpected failures):
trajectory 4, frame 353
trajectory 4, frame 64
trajectory 7, frame 87
trajectory 3, frame 168
trajectory 2, frame 117
trajectory 1, frame 29
trajectory 6, frame 151
trajectory 5, frame 355
trajectory 8, frame 31
trajectory 4, frame 66
trajectory 4, frame 247
trajectory 4, frame 490
trajectory 5, frame 217
trajectory 7, frame 377
trajectory 3, frame 377
trajectory 3, frame 73
Highest advantage episodes (unexpected successes):
trajectory 4, frame 290
trajectory 8, frame 54
trajectory 2, frame 161
trajectory 6, frame 142
trajectory 3, frame 461
trajectory 8, frame 415
trajectory 6, frame 465
trajectory 3, frame 25
trajectory 5, frame 406
trajectory 5, frame 12
trajectory 1, frame 445
trajectory 8, frame 222
trajectory 7, frame 180
trajectory 4, frame 394
trajectory 7, frame 272
trajectory 3, frame 331
[Interactive figure: per-frame viewer with panels for Observation, Positive attribution, and Negative attribution. It shows the current frame, the policy, the policy logits and their sums, and the next action, over the action set ↙ ← ↖ ↓ no-op ↑ ↘ → ↗ D A W S Q E.]