Lowest advantage episodes (unexpected failures):
trajectory 3, frame 478
trajectory 1, frame 80
trajectory 6, frame 110
trajectory 8, frame 168
trajectory 2, frame 60
trajectory 7, frame 291
trajectory 5, frame 54
trajectory 4, frame 349
trajectory 3, frame 504
Highest advantage episodes (unexpected successes):
trajectory 8, frame 238
trajectory 2, frame 189
trajectory 6, frame 195
trajectory 5, frame 383
trajectory 7, frame 466
trajectory 1, frame 43
trajectory 4, frame 167
trajectory 3, frame 122
trajectory 3, frame 495
[Interactive trajectory viewer. Frame: 1. Next action: D. Actions: ↙ ← ↖ ↓ no-op ↑ ↘ → ↗ D A W S Q E. Panels: Observation, Positive attribution, Negative attribution. Readouts: policy logits, sums of policy logits.]