Lowest advantage episodes (unexpected failures):
trajectory 7, frame 108
trajectory 1, frame 138
trajectory 5, frame 293
trajectory 6, frame 318
trajectory 8, frame 28
trajectory 1, frame 452
trajectory 3, frame 378
trajectory 1, frame 370
trajectory 3, frame 246
trajectory 6, frame 51
trajectory 2, frame 193
trajectory 5, frame 438
trajectory 7, frame 334
trajectory 7, frame 6
trajectory 4, frame 273
trajectory 5, frame 92
Highest advantage episodes (unexpected successes):
trajectory 7, frame 418
trajectory 3, frame 419
trajectory 5, frame 263
trajectory 4, frame 487
trajectory 5, frame 395
trajectory 2, frame 25
trajectory 6, frame 361
trajectory 7, frame 67
trajectory 1, frame 35
trajectory 3, frame 337
trajectory 4, frame 393
trajectory 7, frame 234
trajectory 6, frame 432
trajectory 5, frame 112
trajectory 6, frame 231
trajectory 6, frame 210
[Interactive figure: frame viewer. Columns: Observation, Positive attribution, Negative attribution. Per-frame readouts: policy and next action over no-op, →, ←, ↑, ↗, ↖, ↓, A, B; policy logits and sums of policy logits.]