Lowest advantage episodes (unexpected failures):
trajectory 2, frame 234
trajectory 7, frame 463
trajectory 4, frame 500
trajectory 1, frame 498
trajectory 5, frame 499
trajectory 3, frame 500
trajectory 6, frame 495
trajectory 8, frame 491
trajectory 2, frame 395
trajectory 7, frame 511
Highest advantage episodes (unexpected successes):
trajectory 1, frame 260
trajectory 8, frame 260
trajectory 6, frame 257
trajectory 5, frame 256
trajectory 2, frame 496
trajectory 4, frame 253
trajectory 7, frame 260
trajectory 3, frame 256
trajectory 2, frame 133
trajectory 7, frame 500
[Interactive interface: per-frame view showing the observation, positive attribution, and negative attribution, together with the policy logits and the next action. Available actions: ↙ ← ↖ ↓ no-op ↑ ↘ → ↗ D A W S Q E]