Lowest advantage episodes (unexpected failures):
trajectory 6, frame 203
trajectory 7, frame 155
trajectory 8, frame 452
trajectory 1, frame 113
trajectory 2, frame 199
trajectory 4, frame 304
trajectory 5, frame 444
trajectory 8, frame 114
trajectory 3, frame 469
trajectory 6, frame 412
trajectory 2, frame 466
trajectory 7, frame 497
trajectory 5, frame 50
trajectory 1, frame 440
trajectory 3, frame 7
trajectory 4, frame 511
Highest advantage episodes (unexpected successes):
trajectory 8, frame 485
trajectory 7, frame 297
trajectory 2, frame 257
trajectory 5, frame 463
trajectory 4, frame 342
trajectory 1, frame 275
trajectory 8, frame 134
trajectory 6, frame 102
trajectory 5, frame 228
trajectory 6, frame 482
trajectory 3, frame 174
trajectory 1, frame 501
trajectory 4, frame 472
trajectory 2, frame 450
trajectory 3, frame 434
trajectory 7, frame 510
[Interactive viewer: frame index, policy, and next action over the actions ↙ ← ↖ ↓ no-op ↑ ↘ → ↗ D A W S Q E; panels: Observation, Positive attribution, Negative attribution; policy logits and sums of policy logits.]