Posted by Rajat Sen and Yichen Zhou, Google Research
Time-series forecasting is widely used in various domains, including retail, finance, manufacturing, healthcare, and natural sciences. Improving the accuracy of demand forecasting in retail, for example, can significantly reduce inventory costs and increase revenue. Deep learning (DL) models have become popular for forecasting rich, multivariate time-series data because they have shown excellent performance in different settings (e.g., DL models dominated the M5 competition leaderboard). Additionally, there have been significant advancements in large foundation language models used for natural language processing (NLP) tasks like translation, retrieval-augmented generation, and code completion. These models are trained on massive amounts of textual data from various sources, enabling them to identify language patterns. This makes them powerful zero-shot tools, capable of answering questions and summarizing current events when combined with retrieval.
However, DL-based forecasters still face challenges. Most DL architectures require extensive training and validation cycles before customers can test them on new time-series data. In contrast, a foundation model for time-series forecasting can provide accurate forecasts on unseen time-series data without additional training, allowing users to focus on refining forecasts for specific tasks, such as retail demand planning. In our paper, “A decoder-only foundation model for time-series forecasting,” we introduce TimesFM, a single forecasting model pre-trained on a large time-series corpus of 100 billion real-world time-points. Despite being much smaller (200M parameters) than the latest large language models (LLMs), TimesFM demonstrates impressive zero-shot performance on a variety of unseen datasets from different domains and temporal granularities, rivaling state-of-the-art supervised approaches trained explicitly on these datasets. We plan to make this model available for external customers in Google Cloud Vertex AI later this year.
TimesFM follows a decoder-only training approach similar to that of LLMs. In an LLM, this involves three steps. First, text is divided into subwords called tokens. Then, these tokens are fed into stacked causal transformer layers that generate an output corresponding to each input token. Finally, the output corresponding to a given token summarizes all the information from the preceding tokens and is used to predict the next token. During inference, the model generates output tokens one at a time. For example, when given the prompt “What is the capital of France?” the model might generate the token “The,” then condition on “What is the capital of France? The” to generate the next token “capital,” and so on until it produces the complete answer: “The capital of France is Paris.”
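To make the token-by-token decoding loop concrete, here is a minimal sketch of decoder-only generation; `model`, `tokenizer`, and their attributes are hypothetical stand-ins for illustration, not the actual LLM or TimesFM interfaces.

```python
# Minimal sketch of decoder-only autoregressive generation (illustrative only;
# `model` and `tokenizer` are hypothetical stand-ins, not a real library API).

def generate(model, tokenizer, prompt: str, max_new_tokens: int = 16) -> str:
    # Step 1: break the prompt into tokens.
    tokens = tokenizer.encode(prompt)
    for _ in range(max_new_tokens):
        # Step 2: stacked causal transformer layers produce one output per input token.
        outputs = model(tokens)
        # Step 3: the output for the last token summarizes everything seen so far
        # and is used to predict the next token, which is appended and fed back in.
        next_token = outputs[-1].argmax()
        tokens.append(next_token)
        if next_token == tokenizer.eos_id:
            break
    return tokenizer.decode(tokens)

# For the prompt "What is the capital of France?", successive iterations would append
# "The", "capital", ... until the full answer "The capital of France is Paris." emerges.
```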
In the context of time-series forecasting, we treat a patch of time-points as a token, similar to recent long-horizon forecasting work. The task is then to forecast the next patch of time-points given all the preceding patches. Unlike in language models, however, a multilayer perceptron block with residual connections is needed to convert a patch of time-series values into a token that can be fed to the transformer layers. At the output end, the model can decode each output token into a patch of subsequent time-points that is longer than the input patch, so fewer autoregressive steps are needed to cover a long horizon. This flexibility leads to better performance in long-horizon forecasting.
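As a rough illustration of this patching scheme, the sketch below uses made-up sizes, a single residual MLP for the input embedding, and an identity map in place of the causal transformer stack; it shows the shapes involved, not TimesFM’s actual architecture or code.

```python
import numpy as np

# Assumed, illustrative sizes (not TimesFM's actual hyperparameters).
input_patch_len = 32    # each input token covers 32 time-points
output_patch_len = 128  # each output token decodes into 128 future time-points
d_model = 256           # width of the transformer token embeddings

rng = np.random.default_rng(0)
series = rng.normal(size=512)                    # a univariate context window
patches = series.reshape(-1, input_patch_len)    # (16, 32): one row per token

# Residual MLP block: project each patch of time-points into a d_model-dim token.
W1 = rng.normal(size=(input_patch_len, d_model)) * 0.02
W2 = rng.normal(size=(d_model, d_model)) * 0.02
W_skip = rng.normal(size=(input_patch_len, d_model)) * 0.02
hidden = np.maximum(patches @ W1, 0.0)           # ReLU
tokens = hidden @ W2 + patches @ W_skip          # residual connection, (16, d_model)

# Stacked causal transformer layers would map tokens -> per-token outputs;
# an identity stands in here to keep the sketch self-contained.
outputs = tokens

# Output head: each output token is decoded into a patch of future time-points that
# is longer than the input patch, so long horizons need fewer autoregressive steps.
W_out = rng.normal(size=(d_model, output_patch_len)) * 0.02
forecast_patches = outputs @ W_out               # (16, 128)
print(forecast_patches.shape)
```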
To train TimesFM, we use a large pretraining corpus of both synthetic and real-world time-series data. Synthetic data helps establish fundamental temporal patterns, while real-world data from public time-series datasets, including Google Trends and Wikipedia Pageviews, provides domain-specific context that enhances generalization. We evaluate TimesFM’s zero-shot performance on unseen data using popular time-series benchmarks and find that it outperforms most statistical methods and even powerful DL models trained specifically on the target time-series. We also compare TimesFM to GPT-3.5 for forecasting and demonstrate that TimesFM performs better despite being significantly smaller.
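The sketch below shows one way such a zero-shot comparison could be set up; `forecast_fn` is a hypothetical stand-in for a pretrained forecaster, and the MAE metric and seasonal-naive baseline are assumptions chosen for illustration rather than the exact evaluation protocol of the paper.

```python
import numpy as np

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean absolute error between target and forecast."""
    return float(np.mean(np.abs(y_true - y_pred)))

def evaluate_zero_shot(series: np.ndarray, forecast_fn, context_len: int,
                       horizon: int, season: int = 24) -> dict:
    """Compare a pretrained forecaster against a seasonal-naive baseline on one series."""
    context = series[:context_len]
    target = series[context_len:context_len + horizon]

    # Zero-shot forecast: the model is applied as-is, with no training on this series.
    pred = forecast_fn(context, horizon)

    # Simple statistical baseline: repeat the last observed seasonal cycle
    # (season=24 assumes, e.g., hourly data with daily seasonality).
    naive = np.tile(context[-season:], horizon // season + 1)[:horizon]

    return {"model_mae": mae(target, pred), "seasonal_naive_mae": mae(target, naive)}
```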
In conclusion, we present TimesFM, a decoder-only foundation model for time-series forecasting trained on a large pretraining corpus. Despite its smaller size, TimesFM exhibits impressive zero-shot performance on various public benchmarks. We would like to acknowledge the contributions of our research team to this work.
Acknowledgements:
This work is…