The Curious Case of Neural Text Degeneration

Holtzman, Ari; Buys, Jan; Du, Li; Forbes, Maxwell; Choi, Yejin (2020). The Curious Case of Neural Text Degeneration. In: Proceedings of the International Conference on Learning Representations (ICLR), 26 April - 1 May 2020, Online.

the_curious_case_of_neural_text_degeneration.pdf - Published Version


Abstract

Despite considerable advances in neural language modeling, it remains an open question what the best decoding strategy is for text generation from a language model (e.g. to generate a story). The counter-intuitive empirical observation is that even though the use of likelihood as a training objective leads to high-quality models for a broad range of language understanding tasks, maximization-based decoding methods such as beam search lead to degeneration — output text that is bland, incoherent, or gets stuck in repetitive loops. To address this, we propose Nucleus Sampling, a simple but effective method to draw considerably higher-quality text out of neural language models than previous decoding strategies. Our approach avoids text degeneration by truncating the unreliable tail of the probability distribution, sampling from the dynamic nucleus of tokens containing the vast majority of the probability mass. To properly examine current maximization-based and stochastic decoding methods, we compare generations from each of these methods to the distribution of human text along several axes such as likelihood, diversity, and repetition. Our results show that (1) maximization is an inappropriate decoding objective for open-ended text generation, (2) the probability distributions of the best current language models have an unreliable tail which needs to be truncated during generation, and (3) Nucleus Sampling is currently the best available decoding strategy for generating long-form text that is both high-quality — as measured by human evaluation — and as diverse as human-written text.
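The core idea described in the abstract — keeping only the smallest set of tokens whose cumulative probability exceeds a threshold p, then renormalizing and sampling — can be sketched in a few lines. This is an illustrative NumPy implementation, not the authors' released code; the function name `nucleus_sample` and the default p=0.9 are assumptions for the example.

```python
import numpy as np

def nucleus_sample(probs, p=0.9, rng=None):
    """Sample a token index via Nucleus (top-p) Sampling.

    Keeps the smallest set of tokens (the 'nucleus') whose cumulative
    probability mass is at least p, renormalizes over that set, and
    samples from it. `probs` is a 1-D array of token probabilities.
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=float)

    # Sort token indices by probability, highest first.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])

    # Smallest prefix whose cumulative mass reaches p.
    cutoff = int(np.searchsorted(cumulative, p)) + 1
    nucleus = order[:cutoff]

    # Renormalize within the nucleus and sample; the unreliable
    # low-probability tail is discarded entirely.
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return int(rng.choice(nucleus, p=nucleus_probs))
```

For example, with a distribution of [0.5, 0.3, 0.15, 0.05] and p=0.7, the nucleus is the top two tokens (cumulative mass 0.8), so only indices 0 and 1 can ever be sampled. Unlike top-k sampling, the number of candidate tokens adapts to the shape of each distribution: a peaked distribution yields a small nucleus, a flat one a large nucleus.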

Item Type: Conference paper
Subjects: Computing methodologies > Artificial intelligence > Natural language processing
Computing methodologies > Artificial intelligence > Natural language processing > Natural language generation
Date Deposited: 23 Dec 2020 06:51
Last Modified: 23 Dec 2020 06:51
URI: http://pubs.cs.uct.ac.za/id/eprint/1407
