Lowest advantage episodes (unexpected failures):
trajectory 8, frame 508
trajectory 7, frame 47
trajectory 8, frame 232
trajectory 4, frame 193
trajectory 2, frame 441
trajectory 6, frame 232
trajectory 3, frame 202
trajectory 7, frame 460
trajectory 1, frame 7
trajectory 4, frame 427
trajectory 3, frame 475
trajectory 1, frame 504
trajectory 5, frame 135
trajectory 2, frame 115
trajectory 5, frame 436
trajectory 6, frame 465
Highest advantage episodes (unexpected successes):
trajectory 6, frame 221
trajectory 8, frame 242
trajectory 2, frame 270
trajectory 7, frame 230
trajectory 2, frame 492
trajectory 3, frame 425
trajectory 4, frame 24
trajectory 4, frame 458
trajectory 1, frame 338
trajectory 5, frame 443
trajectory 7, frame 468
trajectory 6, frame 500
trajectory 3, frame 272
trajectory 5, frame 186
trajectory 1, frame 465
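A low advantage means the outcome was worse than the value function predicted, and a high advantage means it was better. The source does not say which advantage estimator produced the lists above. The sketch below assumes the simplest one: the discounted return minus the value estimate at each frame. The real lists may have come from a different estimator, such as GAE. The function names and the `(rewards, values)` input layout are hypothetical and used only for illustration. Trajectory ids are 1-based to match the lists. Frame indices here are 0-based, which may not match the lists.

```python
import numpy as np


def discounted_returns(rewards, gamma):
    """Discounted return G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def extreme_advantage_frames(trajectories, gamma=0.99, k=3):
    """Hypothetical helper: rank every frame by advantage.

    trajectories: list of (rewards, values) pairs, one per trajectory.
    Assumes advantage = discounted return - value estimate (not GAE).
    Returns (lowest, highest), each a list of (trajectory_id, frame).
    """
    scored = []
    for traj_id, (rewards, values) in enumerate(trajectories, start=1):
        adv = discounted_returns(rewards, gamma) - np.asarray(values, dtype=float)
        for frame, a in enumerate(adv):  # frame index is 0-based here
            scored.append((float(a), traj_id, frame))
    scored.sort()  # ascending advantage; ties broken by trajectory, then frame
    lowest = [(t, f) for _, t, f in scored[:k]]          # unexpected failures
    highest = [(t, f) for _, t, f in scored[::-1][:k]]   # unexpected successes
    return lowest, highest
```

A trajectory that ends in an unforeseen reward produces positive advantages. A trajectory whose value estimate was optimistic produces negative advantages at the frames where the optimism was highest.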
Figure: for the selected frame, the observation shown alongside its positive and negative attribution, together with the policy logits over the actions ↙ ← ↖ ↓ no-op ↑ ↘ → ↗ D A W S Q E.