Lowest advantage episodes (unexpected failures):
trajectory 3, frame 108
trajectory 8, frame 200
trajectory 1, frame 79
trajectory 6, frame 376
trajectory 6, frame 136
trajectory 5, frame 303
trajectory 7, frame 344
trajectory 1, frame 350
trajectory 4, frame 137
trajectory 7, frame 495
trajectory 3, frame 408
trajectory 1, frame 190
trajectory 5, frame 123
trajectory 3, frame 261
trajectory 6, frame 222
trajectory 2, frame 339
Highest advantage episodes (unexpected successes):
trajectory 8, frame 264
trajectory 3, frame 112
trajectory 4, frame 375
trajectory 3, frame 433
trajectory 7, frame 241
trajectory 6, frame 253
trajectory 4, frame 496
trajectory 8, frame 49
trajectory 2, frame 92
trajectory 7, frame 100
trajectory 3, frame 42
trajectory 1, frame 430
trajectory 2, frame 232
trajectory 5, frame 57
trajectory 7, frame 409
trajectory 1, frame 319
[Interactive viewer, per frame: policy and next action over the actions no-op, →, ←, ↑, ↗, ↖, ↓, A, B. Panels: Observation, Positive attribution, Negative attribution. Also shown: policy logits and sums of policy logits.]