Lowest advantage episodes (unexpected failures):
trajectory 6, frame 474
trajectory 5, frame 427
trajectory 8, frame 108
trajectory 4, frame 2
trajectory 6, frame 497
trajectory 4, frame 202
trajectory 1, frame 245
trajectory 7, frame 55
trajectory 5, frame 163
trajectory 3, frame 343
trajectory 7, frame 172
trajectory 8, frame 28
trajectory 1, frame 33
trajectory 1, frame 104
trajectory 7, frame 328
trajectory 7, frame 4
Highest advantage episodes (unexpected successes):
trajectory 6, frame 117
trajectory 6, frame 315
trajectory 5, frame 333
trajectory 7, frame 448
trajectory 1, frame 399
trajectory 1, frame 494
trajectory 7, frame 507
trajectory 3, frame 169
trajectory 2, frame 28
trajectory 8, frame 92
trajectory 4, frame 118
trajectory 4, frame 274
trajectory 2, frame 160
trajectory 1, frame 195
trajectory 2, frame 204
trajectory 8, frame 57
[Interactive figure: frame-by-frame viewer showing the observation, positive attribution, and negative attribution for the selected frame, together with the policy logits and the next action over the actions no-op, →, ←, ↑, ↗, ↖, ↓, A, B.]