Lowest advantage episodes (unexpected failures):
trajectory 8, frame 413
trajectory 4, frame 410
trajectory 7, frame 90
trajectory 8, frame 121
trajectory 6, frame 79
trajectory 1, frame 33
trajectory 3, frame 301
trajectory 5, frame 281
trajectory 4, frame 247
trajectory 1, frame 483
trajectory 6, frame 251
trajectory 6, frame 171
trajectory 5, frame 465
trajectory 2, frame 62
trajectory 1, frame 173
trajectory 3, frame 75
Highest advantage episodes (unexpected successes):
trajectory 7, frame 96
trajectory 4, frame 503
trajectory 6, frame 106
trajectory 8, frame 260
trajectory 3, frame 105
trajectory 1, frame 450
trajectory 6, frame 470
trajectory 7, frame 139
trajectory 2, frame 408
trajectory 4, frame 161
trajectory 3, frame 471
trajectory 4, frame 23
trajectory 8, frame 288
trajectory 7, frame 344
trajectory 6, frame 254
trajectory 3, frame 17
[Interactive figure. Panels: Observation, Positive attribution, Negative attribution. Readouts: frame, policy, next action, policy logits, sums of policy logits. Actions: ↙ ← ↖ ↓ no-op ↑ ↘ → ↗ D A W S Q E]