Lowest advantage episodes (unexpected failures):
trajectory 6, frame 354
trajectory 3, frame 311
trajectory 4, frame 214
trajectory 7, frame 481
trajectory 7, frame 169
trajectory 2, frame 372
trajectory 4, frame 415
trajectory 1, frame 366
trajectory 5, frame 179
trajectory 3, frame 40
trajectory 8, frame 16
trajectory 6, frame 152
trajectory 2, frame 88
trajectory 8, frame 468
trajectory 8, frame 200
trajectory 4, frame 269
Highest advantage episodes (unexpected successes):
trajectory 1, frame 466
trajectory 3, frame 363
trajectory 4, frame 233
trajectory 8, frame 261
trajectory 7, frame 111
trajectory 4, frame 484
trajectory 7, frame 221
trajectory 1, frame 35
trajectory 3, frame 95
trajectory 1, frame 124
trajectory 7, frame 505
trajectory 2, frame 324
trajectory 4, frame 496
trajectory 3, frame 400
trajectory 5, frame 485
trajectory 7, frame 384
Interactive figure: for the selected frame, the viewer shows the observation alongside its positive and negative attribution, the policy and its next action over the available actions (↙ ← ↖ ↓ no-op ↑ ↘ → ↗ D A W S Q E), and the policy logits with their sums.