Lowest advantage episodes (unexpected failures):
trajectory 1, frame 48
trajectory 5, frame 496
trajectory 2, frame 294
trajectory 8, frame 151
trajectory 2, frame 24
trajectory 5, frame 339
trajectory 3, frame 14
trajectory 3, frame 63
trajectory 6, frame 78
trajectory 1, frame 314
trajectory 4, frame 478
trajectory 7, frame 156
trajectory 1, frame 293
trajectory 6, frame 440
trajectory 8, frame 51
trajectory 3, frame 436
Highest advantage episodes (unexpected successes):
trajectory 1, frame 318
trajectory 4, frame 165
trajectory 7, frame 48
trajectory 5, frame 468
trajectory 6, frame 379
trajectory 6, frame 324
trajectory 3, frame 391
trajectory 1, frame 229
trajectory 3, frame 140
trajectory 7, frame 127
trajectory 3, frame 229
trajectory 8, frame 99
trajectory 6, frame 252
trajectory 1, frame 85
trajectory 3, frame 478
trajectory 2, frame 323
[Interactive figure: for the selected frame, shows the observation, positive attribution, and negative attribution, along with the policy logits and next action over the action set ↙ ← ↖ ↓ no-op ↑ ↘ → ↗ D A W S Q E.]