Lowest advantage episodes (unexpected failures):
trajectory 7, frame 431
trajectory 7, frame 101
trajectory 4, frame 223
trajectory 7, frame 506
trajectory 8, frame 271
trajectory 1, frame 261
trajectory 5, frame 266
trajectory 6, frame 290
trajectory 6, frame 346
trajectory 1, frame 456
trajectory 8, frame 403
trajectory 1, frame 171
trajectory 5, frame 21
trajectory 6, frame 109
trajectory 1, frame 58
trajectory 4, frame 313
Highest advantage episodes (unexpected successes):
trajectory 7, frame 435
trajectory 7, frame 389
trajectory 5, frame 455
trajectory 3, frame 237
trajectory 1, frame 98
trajectory 7, frame 46
trajectory 6, frame 300
trajectory 5, frame 407
trajectory 4, frame 70
trajectory 6, frame 100
trajectory 8, frame 402
trajectory 7, frame 334
trajectory 5, frame 163
trajectory 5, frame 44
trajectory 7, frame 137
trajectory 2, frame 32
[Interactive frame viewer. Fields: frame, policy, next action. Actions: no-op, →, ←, ↑, ↗, ↖, ↓, A, B. Panels: Observation, Positive attribution, Negative attribution. Also shown: policy logits and sums of policy logits.]