Lowest advantage episodes (unexpected failures):
trajectory 7, frame 151
trajectory 6, frame 448
trajectory 12, frame 242
trajectory 15, frame 4
trajectory 11, frame 411
trajectory 9, frame 97
trajectory 10, frame 258
trajectory 8, frame 7
trajectory 9, frame 301
trajectory 1, frame 402
trajectory 8, frame 325
trajectory 3, frame 44
trajectory 5, frame 192
trajectory 6, frame 68
trajectory 13, frame 429
trajectory 3, frame 366
Highest advantage episodes (unexpected successes):
trajectory 12, frame 249
trajectory 7, frame 509
trajectory 13, frame 462
trajectory 9, frame 322
trajectory 7, frame 152
trajectory 4, frame 251
trajectory 16, frame 273
trajectory 7, frame 65
trajectory 10, frame 439
trajectory 1, frame 369
trajectory 3, frame 304
trajectory 3, frame 13
trajectory 11, frame 6
trajectory 6, frame 453
trajectory 1, frame 32
trajectory 8, frame 377
[Interactive figure: frame-by-frame trajectory viewer. Panels: Observation, Positive attribution, Negative attribution. For each frame it shows the policy over the actions (no-op, →, ←, ↑, ↗, ↖, ↓, A, B), the next action, the policy logits, and the sums of policy logits.]