Lowest advantage episodes (unexpected failures):
trajectory 8, frame 406
trajectory 5, frame 19
trajectory 6, frame 430
trajectory 2, frame 439
trajectory 8, frame 234
trajectory 4, frame 255
trajectory 2, frame 6
trajectory 7, frame 87
trajectory 7, frame 163
trajectory 8, frame 77
trajectory 6, frame 362
trajectory 1, frame 126
trajectory 7, frame 351
trajectory 7, frame 242
trajectory 4, frame 407
trajectory 6, frame 217
Highest advantage episodes (unexpected successes):
trajectory 6, frame 198
trajectory 8, frame 257
trajectory 4, frame 249
trajectory 4, frame 73
trajectory 8, frame 453
trajectory 6, frame 276
trajectory 6, frame 460
trajectory 7, frame 420
trajectory 4, frame 40
trajectory 1, frame 170
trajectory 7, frame 268
trajectory 7, frame 102
trajectory 6, frame 375
trajectory 2, frame 273
trajectory 3, frame 176
trajectory 7, frame 388
[Interactive figure: for each frame, the observation alongside its positive and negative attribution, the policy logits and sums of policy logits over the actions (no-op, →, ←, ↑, ↗, ↖, ↓, A, B), and the next action.]