Lowest advantage episodes (unexpected failures):
trajectory 8, frame 280
trajectory 1, frame 157
trajectory 1, frame 11
trajectory 7, frame 165
trajectory 8, frame 454
trajectory 7, frame 399
trajectory 5, frame 110
trajectory 6, frame 158
trajectory 8, frame 73
trajectory 7, frame 3
trajectory 4, frame 7
trajectory 3, frame 320
trajectory 2, frame 223
trajectory 4, frame 170
trajectory 3, frame 163
trajectory 8, frame 179
Highest advantage episodes (unexpected successes):
trajectory 1, frame 183
trajectory 3, frame 458
trajectory 5, frame 175
trajectory 8, frame 289
trajectory 1, frame 21
trajectory 8, frame 240
trajectory 2, frame 354
trajectory 2, frame 138
trajectory 7, frame 238
trajectory 4, frame 356
trajectory 8, frame 380
trajectory 1, frame 326
trajectory 3, frame 265
trajectory 5, frame 484
trajectory 3, frame 504
trajectory 8, frame 100
[Interactive figure: for the selected frame, panels show the observation, positive attribution, and negative attribution, together with the policy logits and the sums of policy logits over the actions no-op, →, ←, ↑, ↗, ↖, ↓, A, B.]