Machine Learning | Natural Language Processing | Data Science
Exploring the drop-in strategy that's speeding up language models by 3x
In this article we'll discuss "Speculative Sampling", a strategy that makes text generation faster and more affordable without compromising on performance.
First we'll discuss a major problem that's slowing down modern language models, then we'll build an intuitive understanding of how speculative sampling elegantly speeds them up, and finally we'll implement speculative sampling from scratch in Python.
Who is this useful for? Anyone interested in natural language processing (NLP), or cutting-edge AI advancements.

How advanced is this post? The concepts in this article are accessible to machine learning enthusiasts, yet cutting-edge enough to interest seasoned data scientists. The code at the end may be useful to developers.
Prerequisites: It might be useful to have a cursory understanding of Transformers, OpenAI's GPT models, or both. If you find yourself confused, you can refer to either of these articles:
Over the last five years OpenAI's GPT models have grown from 117 million parameters in 2018 to an estimated 1.8 trillion parameters in 2023. This rapid growth can largely be attributed to the fact that, in language modeling, bigger is better.