Lowest advantage episodes (unexpected failures):
trajectory 7, frame 61
trajectory 6, frame 421
trajectory 2, frame 418
trajectory 2, frame 18
trajectory 3, frame 236
trajectory 4, frame 301
trajectory 2, frame 324
trajectory 1, frame 98
trajectory 1, frame 295
trajectory 8, frame 248
trajectory 5, frame 325
trajectory 8, frame 438
trajectory 2, frame 509
trajectory 6, frame 192
trajectory 6, frame 118
trajectory 1, frame 407
Highest advantage episodes (unexpected successes):
trajectory 2, frame 73
trajectory 3, frame 85
trajectory 8, frame 119
trajectory 8, frame 361
trajectory 8, frame 14
trajectory 4, frame 426
trajectory 2, frame 491
trajectory 8, frame 449
trajectory 5, frame 346
trajectory 2, frame 364
trajectory 7, frame 138
trajectory 1, frame 382
trajectory 4, frame 25
trajectory 8, frame 263
trajectory 2, frame 141
trajectory 1, frame 130
[Interactive figure: frame-by-frame viewer showing, for each frame, the observation with its positive and negative attribution, the policy logits and sums of policy logits, and the next action, chosen from ↙ ← ↖ ↓ no-op ↑ ↘ → ↗ D A W S Q E.]