1. Language models
By now, we all know that large language models (LLMs) are very capable at qualitative and language-based tasks. The jury is still out, however, concerning their **reasoning and numerical** skills.
Researchers at the University of Chicago’s Booth School of Business (my alma mater) used **financial statement analysis (FSA)** to test LLMs’ ability to analyze and synthesize purely financial numbers (paper here). The task was to predict whether earnings would grow or decline in the following period (various timeframes were tested). The LLM (GPT-4 Turbo) was given no textual information, just numbers, as shown in Fig. 1.
After being told to assume the role of a financial analyst, the LLM was guided towards its answers with **chain-of-thought (CoT)** techniques. It was asked to:
- Identify notable changes in the financial statements.
- Compute financial ratios, by first stating the formulae, then computing the ratios.
- Provide economic interpretations of the computed ratios.
- Predict the directional change of future earnings and provide the rationale for that prediction.
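As a toy illustration of the “state the formula, then compute” step of the CoT prompt, the ratio computation could look like the snippet below. The figures and the choice of ratios are invented for illustration; they are not taken from the paper, which uses standardized, anonymized real statements.

```python
# Hypothetical, simplified statement figures (invented for illustration).
statement = {
    "current_assets": 420.0,
    "current_liabilities": 300.0,
    "total_liabilities": 800.0,
    "total_equity": 650.0,
    "net_income": 95.0,
    "revenue": 1200.0,
}

# Step 1: state the formulae (as the CoT prompt requires).
#   current ratio  = current assets / current liabilities
#   debt-to-equity = total liabilities / total equity
#   net margin     = net income / revenue

# Step 2: compute the ratios.
ratios = {
    "current_ratio": statement["current_assets"] / statement["current_liabilities"],
    "debt_to_equity": statement["total_liabilities"] / statement["total_equity"],
    "net_margin": statement["net_income"] / statement["revenue"],
}

for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

The LLM would then be asked to interpret these values economically (e.g., liquidity, leverage, profitability) before making its directional prediction.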
The authors found that the LLM, with CoT, easily outperformed the median financial analyst. Even though the LLM was only given quantitative material, it benefited from its general ‘understanding’ of the world, including business and investment know-how, combined with an emerging form of intuitive reasoning and a capacity to formulate hypotheses. Moreover, human financial analysts suffer from statistical bias, in all likelihood more so than LLMs in this specific, quantitative use case.
The authors also trained a three-layer artificial neural network (ANN) on a vast body of data. This **task-specific ANN** merely matched the general-purpose LLM’s accuracy, a remarkable result considering that the **off-the-shelf, general-purpose LLM** was used without any further fine-tuning.
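To make the ANN baseline concrete, here is a minimal, self-contained sketch of the idea: a tiny three-layer network (input, one hidden layer, output) trained to predict earnings direction from ratio changes. The synthetic data, features, and architecture details are my own assumptions for illustration, not the paper’s setup.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    # Clip to avoid overflow in exp() for extreme inputs.
    return 1.0 / (1.0 + math.exp(-max(min(x, 30.0), -30.0)))

# Synthetic toy data (an assumption, not the paper's dataset): two hypothetical
# ratio changes per firm; label = 1 if earnings grew in the next period.
data = []
for _ in range(200):
    margin_chg = random.uniform(-1, 1)
    turnover_chg = random.uniform(-1, 1)
    label = 1 if margin_chg + 0.5 * turnover_chg > 0 else 0
    data.append((margin_chg, turnover_chg, label))

# Network: 2 inputs -> 4 hidden sigmoid units -> 1 sigmoid output.
H = 4
w1 = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(H)]
b1 = [0.0] * H
w2 = [random.uniform(-0.5, 0.5) for _ in range(H)]
b2 = 0.0
lr = 0.5

def forward(x):
    h = [sigmoid(sum(w1[j][i] * x[i] for i in range(2)) + b1[j]) for j in range(H)]
    y = sigmoid(sum(w2[j] * h[j] for j in range(H)) + b2)
    return h, y

# Plain stochastic gradient descent on binary cross-entropy loss.
for epoch in range(300):
    for mx, tx, label in data:
        x = (mx, tx)
        h, y = forward(x)
        dy = y - label  # gradient of BCE w.r.t. the output pre-activation
        for j in range(H):
            dh = dy * w2[j] * h[j] * (1 - h[j])
            w2[j] -= lr * dy * h[j]
            for i in range(2):
                w1[j][i] -= lr * dh * x[i]
            b1[j] -= lr * dh
        b2 -= lr * dy

accuracy = sum(
    (forward((mx, tx))[1] > 0.5) == bool(label) for mx, tx, label in data
) / len(data)
print(f"training accuracy: {accuracy:.2f}")
```

The paper’s actual ANN was trained on a vastly larger corpus of real financial statements; this sketch only shows the shape of such a task-specific baseline.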
Overall, FSA is an interesting use case demonstrating the numerical skills and emerging reasoning capabilities of general-purpose LLMs. I’d like to see the results of this study if the LLM were fine-tuned on the data fed into the ANN…
2. Specialized foundation models
Above, I showed research demonstrating how a language model, basically pre-trained to perform next-word prediction, was capable of accomplishing **numerical tasks** and some related reasoning.
Recently, a new breed of specialized foundation models has emerged. **TimeGPT** is such a model, **specialized in time series**: it is pre-trained on over 100 billion rows of financial, weather, Internet of Things (IoT), energy, and web data.
In their latest paper, my LIRIS colleagues tested TimeGPT for soil water potential prediction in orchards. As data gathering in agriculture is often expensive, the relative **shortage of data** often precludes data-hungry deep learning methods such as LSTMs.
They find that, with minor fine-tuning using only the target variable’s (soil water potential) history, TimeGPT delivers respectable results, losing out only against the state-of-the-art Temporal Fusion Transformer (TFT) model. Note that the TFT model also included exogenous variables such as weather data in its dataset. Considering its **superior ease of use in terms of effort and data**, TimeGPT can therefore be considered a serious alternative for use cases plagued by data scarcity: specialized foundation models like it can leverage learned skills, such as time series forecasting, to address new problems where training data are insufficient for deep learning methods that must be trained from scratch.
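To make “respectable results, losing out only against TFT” concrete, this is how one would typically score two competing point forecasts against observations, using mean absolute error (MAE). All numbers below are made up for illustration; they are not the values reported in the paper.

```python
# Observed soil water potential (kPa) and two hypothetical model forecasts.
# These figures are invented for illustration only.
observed         = [-12.0, -15.5, -20.1, -24.8, -30.2]
forecast_timegpt = [-11.2, -16.4, -19.0, -26.1, -28.9]  # hypothetical
forecast_tft     = [-11.8, -15.9, -20.5, -24.1, -29.8]  # hypothetical

def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the forecast errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

mae_timegpt = mae(observed, forecast_timegpt)
mae_tft = mae(observed, forecast_tft)
print(f"TimeGPT MAE: {mae_timegpt:.2f} kPa, TFT MAE: {mae_tft:.2f} kPa")
```

In this made-up example the TFT forecast tracks the observations more closely (lower MAE), mirroring the paper’s qualitative finding, while TimeGPT remains competitive without the exogenous weather inputs.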
Try this out yourself! MultAI.eu offers safe and easy access to OpenAI’s, Google’s, Anthropic’s, and others’ models.