Lowest advantage episodes (unexpected failures):
trajectory 8, frame 206
trajectory 1, frame 300
trajectory 8, frame 432
trajectory 5, frame 369
trajectory 6, frame 501
trajectory 1, frame 38
trajectory 7, frame 184
trajectory 8, frame 125
trajectory 2, frame 345
trajectory 1, frame 180
trajectory 7, frame 130
trajectory 4, frame 214
trajectory 5, frame 49
trajectory 3, frame 180
trajectory 5, frame 178
trajectory 3, frame 165
Highest advantage episodes (unexpected successes):
trajectory 8, frame 477
trajectory 7, frame 128
trajectory 3, frame 445
trajectory 1, frame 323
trajectory 3, frame 17
trajectory 7, frame 481
trajectory 3, frame 500
trajectory 6, frame 312
trajectory 1, frame 26
trajectory 2, frame 34
trajectory 5, frame 450
trajectory 8, frame 136
trajectory 2, frame 370
trajectory 1, frame 495
trajectory 8, frame 349
trajectory 5, frame 266
[Interactive frame viewer: for the selected frame, shows the observation with its positive and negative attribution, the policy logits and the sums of policy logits, and the next action, chosen from: no-op, →, ←, ↑, ↗, ↖, ↓, A, B.]