Lowest-advantage episodes (unexpected failures):
trajectory 7, frame 215
trajectory 7, frame 91
trajectory 3, frame 231
trajectory 1, frame 361
trajectory 6, frame 257
trajectory 2, frame 81
trajectory 7, frame 452
trajectory 8, frame 420
trajectory 1, frame 221
trajectory 5, frame 205
trajectory 4, frame 296
trajectory 8, frame 489
trajectory 4, frame 156
trajectory 5, frame 443
trajectory 5, frame 63
trajectory 4, frame 30
Highest-advantage episodes (unexpected successes):
trajectory 8, frame 413
trajectory 7, frame 83
trajectory 3, frame 100
trajectory 7, frame 333
trajectory 6, frame 437
trajectory 4, frame 471
trajectory 1, frame 325
trajectory 1, frame 84
trajectory 4, frame 128
trajectory 4, frame 191
trajectory 4, frame 54
trajectory 6, frame 288
trajectory 5, frame 495
trajectory 1, frame 498
trajectory 6, frame 345
trajectory 2, frame 113
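The two lists rank frames by advantage. Roughly, advantage measures how much better or worse things turned out from a frame than the agent's value function predicted, so the lowest-advantage frames are unexpected failures and the highest-advantage frames are unexpected successes. The following is a minimal sketch that assumes advantages come from generalized advantage estimation (GAE) over a rollout and are then sorted. The estimator and hyperparameters (`gamma`, `lam`) actually used to build these lists are not stated here, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = Tuple[int, int]  # hypothetical (trajectory id, frame index) pair


def gae_advantages(rewards: Sequence[float], values: Sequence[float],
                   dones: Sequence[bool], last_value: float,
                   gamma: float = 0.999, lam: float = 0.95) -> List[float]:
    """Generalized advantage estimation over one rollout (assumed estimator).

    values[t] is the value estimate V(s_t); last_value bootstraps V(s_T).
    dones[t] is True if the episode ended right after step t.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        next_value = last_value if t == len(rewards) - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0  # stop bootstrapping at episode end
        # One-step TD error: reward plus discounted next value minus current value.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of future TD errors within the episode.
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    return advantages


def extreme_frames(frames: Sequence[Frame], advantages: Sequence[float],
                   k: int) -> Tuple[List[Frame], List[Frame]]:
    """Return the k lowest-advantage frames (ascending) and the k highest (descending)."""
    if k < 1:
        return [], []
    order = sorted(range(len(frames)), key=lambda i: advantages[i])
    lowest = [frames[i] for i in order[:k]]
    highest = [frames[i] for i in reversed(order[-k:])]
    return lowest, highest
```

With `gamma = lam = 1`, GAE reduces to the episode return minus the value estimate, which makes the "better or worse than expected" reading concrete: a frame valued at 1 that leads to a reward of 10 gets advantage 9, while a frame valued at 5 that leads to nothing gets advantage -5.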
[Interactive viewer: per-frame observation shown with positive and negative attribution panels, the policy's next action, and policy logits over the actions no-op, →, ←, ↑, ↗, ↖, ↓, A, B]