Google Research announced TimesFM, a 200M-parameter Transformer-based foundation model for time-series forecasting. TimesFM is pre-trained on nearly 100B time-points and achieves zero-shot forecasting performance comparable to, or better than, that of supervised-learning models.
TimesFM uses a decoder-only Transformer architecture similar to that of large language models (LLMs) such as ChatGPT. In this scheme, short patches of time-series data are treated as tokens for both the model's input and output. The researchers pre-trained the model on real-world data drawn from Wikipedia and Google search trends, as well as on synthetic data. When the team evaluated the model's zero-shot performance on several forecasting benchmarks, TimesFM outperformed traditional statistical methods such as ARIMA and EMA, as well as deep-learning models that had been trained on each benchmark's training dataset. According to Google:
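As a rough illustration of the patching idea, the snippet below splits a univariate series into fixed-length, non-overlapping windows that play the role of tokens. The patch length of 32 and the handling of leftover points are illustrative assumptions, not details confirmed by the announcement.

```python
import numpy as np

def make_patches(series: np.ndarray, patch_len: int = 32) -> np.ndarray:
    """Split a 1-D series into consecutive, non-overlapping patches.

    Each patch plays the role of one "token" for the decoder-only model.
    Trailing points that do not fill a whole patch are dropped here for
    simplicity; a real implementation would pad and mask instead.
    """
    n_patches = len(series) // patch_len
    return series[: n_patches * patch_len].reshape(n_patches, patch_len)

# Example: a 512-point context becomes 16 patch-tokens of length 32.
context = np.sin(np.linspace(0, 20, 512))
tokens = make_patches(context)
print(tokens.shape)  # (16, 32)
```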
Compared to the latest large language models, TimesFM is much smaller...yet we show that even at such scales, its zero-shot performance on a variety of unseen datasets of different domains and temporal granularities come close to the state-of-the-art supervised approaches trained explicitly on these datasets.
Time-series forecasting is an important tool in many domains, including retail sales, meteorology, and energy production. Recent advances in deep learning have produced models such as DeepAR, which can outperform traditional techniques; however, these models typically must be trained on a task-specific dataset. The success of LLMs as foundation models, able to perform many tasks, including time-series forecasting, in zero-shot settings, inspired the Google researchers to build TimesFM on the Transformer architecture that underlies most LLMs.
TimesFM Neural Architecture (Source: Google Research)
Because Transformers operate on discrete tokens, the first layer in the TimesFM model maps a short sequence of input data, or patch, into a token vector; as with LLMs, a positional encoding vector is added to this token vector. The result is passed through a stack of self-attention layers to produce an output token. Finally, output tokens are converted back into time-series patches; the output patch length can be longer than the input patch length, which lets the model predict longer horizons with fewer auto-regressive invocations.
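A short, unofficial PyTorch sketch of that pipeline may make the data flow concrete: patches are embedded into tokens, a positional encoding is added, a stack of causally masked self-attention layers runs over the tokens, and a final projection emits an output patch longer than the input patch. The layer sizes, the learned positional embedding, and the plain linear patch embedding here are placeholder assumptions, not the published configuration.

```python
import torch
import torch.nn as nn

class PatchedDecoderSketch(nn.Module):
    """Illustrative sketch of a TimesFM-style stack:
    input patch -> token -> causal self-attention -> longer output patch."""

    def __init__(self, input_patch_len=32, output_patch_len=128,
                 d_model=256, n_layers=4, n_heads=4, max_patches=64):
        super().__init__()
        self.embed = nn.Linear(input_patch_len, d_model)   # patch -> token vector
        self.pos = nn.Embedding(max_patches, d_model)      # positional encoding
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads,
            dim_feedforward=4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, output_patch_len)   # token -> longer output patch

    def forward(self, patches):
        # patches: (batch, n_patches, input_patch_len)
        n = patches.size(1)
        pos_ids = torch.arange(n, device=patches.device)
        x = self.embed(patches) + self.pos(pos_ids)
        # Causal mask so each token only attends to earlier patches (decoder-only behavior).
        causal = nn.Transformer.generate_square_subsequent_mask(n).to(patches.device)
        x = self.blocks(x, mask=causal)
        return self.head(x)                                 # (batch, n_patches, output_patch_len)

# Each input patch yields a 128-point prediction of the points that follow it;
# at inference, the last patch's output is taken and decoding repeats auto-regressively
# if a longer horizon is needed.
model = PatchedDecoderSketch()
out = model(torch.randn(1, 16, 32))
print(out.shape)  # torch.Size([1, 16, 128])
```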
The Google team evaluated TimesFM in zero-shot mode on several public datasets: Monash, Darts, and Informer. The team measured mean absolute error (MAE) and compared TimesFM against several baseline models as well as against GPT-3. On Monash, TimesFM was "among the top 3" models; on Darts, it was "within statistical significance of the best model"; and on Informer, it outperformed all other models.
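For reference, MAE is simply the average absolute difference between forecast and observed values; the published evaluation aggregates results across many datasets, which this minimal sketch omits.

```python
import numpy as np

def mae(forecast: np.ndarray, actual: np.ndarray) -> float:
    """Mean absolute error: average magnitude of the forecast errors."""
    return float(np.mean(np.abs(forecast - actual)))

# Toy example with hypothetical values.
print(mae(np.array([10.0, 12.0, 14.0]), np.array([11.0, 12.0, 13.0])))  # ~0.667
```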
In a discussion on Hacker News, one user lamented that Google had not made the model openly available:
Sounds like a cool model. I'd love to try it but it seems like they're not releasing it (yet?). I've really gotten spoiled with the recent language models where I can download and run any new model or fine-tune I hear about. It's gotten to the point where I don't feel a model is very relevant unless I can run it locally. Hopefully a local version of this becomes available because I have plenty of time series data I'd like to run through it!
Google did note plans to make the model available on its Vertex AI platform "later this year." The researchers also hope to "delve into a more theoretical understanding" of the model in the future.