Lowest advantage episodes (unexpected failures):
trajectory 2, frame 221
trajectory 1, frame 158
trajectory 5, frame 381
trajectory 7, frame 115
trajectory 3, frame 2
trajectory 6, frame 509
trajectory 1, frame 50
trajectory 3, frame 394
trajectory 3, frame 332
trajectory 8, frame 119
trajectory 6, frame 142
trajectory 1, frame 362
trajectory 8, frame 355
trajectory 5, frame 341
trajectory 5, frame 1
trajectory 4, frame 213
Highest advantage episodes (unexpected successes):
trajectory 5, frame 461
trajectory 1, frame 331
trajectory 2, frame 357
trajectory 8, frame 74
trajectory 7, frame 197
trajectory 7, frame 128
trajectory 6, frame 453
trajectory 6, frame 43
trajectory 8, frame 257
trajectory 5, frame 85
trajectory 8, frame 222
trajectory 3, frame 232
trajectory 1, frame 275
trajectory 6, frame 286
trajectory 4, frame 43
trajectory 6, frame 335
[Interactive figure. Panels: Observation, Positive attribution, Negative attribution. Per-frame readouts: frame, policy, next action, policy logits, sums of policy logits. Actions: ↙ ← ↖ ↓ no-op ↑ ↘ → ↗ D A W S Q E.]