This website hosts samples from the models trained in the “Learning to Summarize from Human Feedback” paper. There are 5 categories of samples:
- TL;DR samples: posts from the TL;DR dataset, along with summaries from several of our models and baselines.
- TL;DR human comparisons: posts from the TL;DR dataset, along with two summaries from our models and which of the two one of our labelers preferred. Also included are interpretation notes written by the labeler after only reading each summary (not the post).
- TL;DR evaluations: posts from the TL;DR dataset, along with summaries from several of our models and baselines. Each summary is evaluated on a 1-7 scale along four axes of quality by one of our labelers. Also included are interpretation notes written by the labeler after only reading each summary (not the post).
- CNN/DM samples: articles from the CNN/DailyMail dataset, along with summaries from several of our models and baselines.
- CNN/DM evaluations: articles from the CNN/DailyMail dataset, along with summaries from several of our models and baselines. Each summary is evaluated on a 1-7 scale along four axes of quality by one of our labelers. Also included are interpretation notes written by the labeler after only reading each summary (not the article).
Content warning
As noted in our blog post and paper, the TL;DR dataset consists of user-submitted posts to the website reddit.com, and can contain content that is offensive or reflects harmful social biases. The data on this site is not filtered for content.
The CNN/DM dataset consists of news articles from CNN and DailyMail. Some articles may contain graphic content.