Lowest advantage episodes (unexpected failures):
trajectory 6, frame 284
trajectory 1, frame 126
trajectory 4, frame 174
trajectory 4, frame 322
trajectory 7, frame 168
trajectory 1, frame 240
trajectory 7, frame 20
trajectory 2, frame 331
trajectory 5, frame 467
trajectory 7, frame 480
trajectory 1, frame 439
trajectory 6, frame 82
trajectory 4, frame 150
trajectory 5, frame 366
trajectory 4, frame 474
trajectory 3, frame 365
Highest advantage episodes (unexpected successes):
trajectory 3, frame 78
trajectory 1, frame 85
trajectory 5, frame 317
trajectory 7, frame 82
trajectory 2, frame 188
trajectory 5, frame 376
trajectory 4, frame 42
trajectory 1, frame 488
trajectory 1, frame 409
trajectory 6, frame 386
trajectory 8, frame 237
trajectory 7, frame 176
trajectory 3, frame 446
trajectory 7, frame 332
trajectory 3, frame 122
trajectory 3, frame 330
[Interactive frame viewer. For each frame it shows the observation, its positive and negative attribution, the policy over the actions (no-op, →, ←, ↑, ↗, ↖, ↓, A, B), the next action, the policy logits, and the sums of policy logits.]