Lowest advantage episodes (unexpected failures):
trajectory 7, frame 200
trajectory 5, frame 5
trajectory 8, frame 132
trajectory 6, frame 376
trajectory 3, frame 160
trajectory 5, frame 460
trajectory 3, frame 37
trajectory 4, frame 44
trajectory 6, frame 98
trajectory 4, frame 395
trajectory 2, frame 120
trajectory 3, frame 268
trajectory 8, frame 322
trajectory 2, frame 27
trajectory 7, frame 352
trajectory 2, frame 370
Highest advantage episodes (unexpected successes):
trajectory 8, frame 445
trajectory 7, frame 446
trajectory 1, frame 334
trajectory 7, frame 211
trajectory 7, frame 76
trajectory 5, frame 69
trajectory 7, frame 49
trajectory 6, frame 293
trajectory 3, frame 366
trajectory 2, frame 148
trajectory 4, frame 91
trajectory 6, frame 463
trajectory 6, frame 174
trajectory 2, frame 404
trajectory 2, frame 292
trajectory 7, frame 483
[Interactive figure: per-frame trajectory viewer. Panels: Observation, Positive attribution, Negative attribution. Readouts: frame, policy, next action, policy logits, sums of policy logits. Actions: no-op, →, ←, ↑, ↗, ↖, ↓, A, B.]