Welcome! This is a viewer for the sparse autoencoder features trained in this paper.
Pick a feature:
Note: by clicking this button, you acknowledge that the contents of the documents are taken randomly from the internet and may include offensive or inappropriate material.
Interesting features:
GPT-4
humans have flaws
police reports, especially child safety
price changes
ratification (multilingual)
would [...]
identification documents (multilingual)
lightly incremented timestamps
Technical knowledge
machine learning training logs
onclick/onchange = function(this)
edges (graph theory) and related concepts
algebraic rings
adenosine/dopamine receptors
blockchain vibes
GPT-2 small
rhetorical questions
counting human casualties
X and Y phrases
Patrick/Patty surname predictor
things that are unknown
words in quotes
these/those responsible things
2018 natural disasters
addition in code
function application
unclear/hidden things
what the ...
Safety-relevant features (found via attribution methods)
profanity (1)
profanity (2)
profanity (3)
erotic content
[content warning] sexual abuse