Lowest advantage episodes (unexpected failures):
trajectory 2, frame 83
trajectory 5, frame 362
trajectory 7, frame 443
trajectory 8, frame 199
trajectory 8, frame 357
trajectory 1, frame 236
trajectory 3, frame 422
trajectory 1, frame 391
trajectory 4, frame 482
trajectory 3, frame 345
trajectory 2, frame 421
trajectory 6, frame 35
trajectory 7, frame 509
trajectory 5, frame 485
trajectory 8, frame 32
trajectory 6, frame 491
Highest advantage episodes (unexpected successes):
trajectory 8, frame 219
trajectory 2, frame 501
trajectory 1, frame 406
trajectory 4, frame 237
trajectory 5, frame 505
trajectory 6, frame 463
trajectory 8, frame 325
trajectory 2, frame 118
trajectory 5, frame 422
trajectory 2, frame 389
trajectory 3, frame 448
trajectory 3, frame 92
trajectory 2, frame 322
trajectory 4, frame 282
trajectory 1, frame 171
trajectory 5, frame 237
[Interactive figure: for each frame, the viewer shows the observation, positive attribution, and negative attribution, together with the policy logits (and sums of policy logits) and the next action. Available actions: ↙ ← ↖ ↓ no-op ↑ ↘ → ↗ D A W S Q E]