11 May 2018
Big data and machine learning are increasing in prominence in economic analysis – but new research from ANZ suggests the robots won’t be replacing traditional economist elbow grease just yet.
Machine learning is being increasingly used in economic research as an alternative to traditional forecasting methods and to help identify relationships traditional theory doesn’t (yet) recognise.
Many policymakers already acknowledge the potential for these techniques to aid their understanding of the economy.
ANZ Research has already contributed to this area - most notably in the development of the RBA Bias Index, which uses natural language processing to decipher the relative hawkishness or dovishness of the Reserve Bank of Australia’s post-meeting statements.
In a recent analysis, the research team looked to use other machine-learning techniques to forecast retail sales growth with a goal of significantly improving on consensus forecasts by using a ‘random forest’ algorithm.
The technique produced a better forecast, but only slightly – suggesting forecasters are already using all available data in their formulations.
The result also highlights the challenge Australia faces (compared with other countries) when using data-dependent techniques for economic forecasting, given its relatively sparse set of economic indicators.
Random
ANZ Research’s random forest algorithm provided a marginal improvement on Bloomberg’s consensus forecast for retail trade growth.
One way of interpreting this is that the consensus forecast already provides much of the wisdom of the crowd: collectively, economists are making use of the available information, just as ANZ Research’s model did.
It is worth noting the data-driven approach of machine learning has a drawback. The absence of underlying theory makes interpreting the results more difficult as the causal links between variables are not clear.
Undeterred, ANZ Research will continue to investigate machine learning for new insights into the Australian economy, considering alternative approaches, such as recursive neural networks, as well as alternative data sources.
Forests
The monthly retail sales figure is one of Australia’s most important economic indicators. It is also quite hard to predict.
The mean absolute error for the Bloomberg consensus forecast of monthly retail sales growth has been 0.36 percentage points since the beginning of 2010. Over that period, the mean absolute change in retail sales has been 0.43 per cent.
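To make the two figures comparable, it helps to note that the mean absolute change in sales is exactly the mean absolute error of a forecaster who predicts zero growth every month. The sketch below illustrates the calculation with made-up growth rates; the numbers and variable names are hypothetical and are not ANZ or Bloomberg data.

```python
def mean_absolute_error(forecasts, actuals):
    """Average size of the forecast miss, ignoring its sign."""
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)

# Hypothetical monthly growth rates in per cent (illustrative only).
actual_growth = [0.4, -0.2, 0.6, 0.1, -0.5, 0.3]
consensus = [0.3, 0.2, 0.3, 0.3, 0.1, 0.2]

# A naive forecaster who always predicts zero growth.
no_change = [0.0] * len(actual_growth)

consensus_mae = mean_absolute_error(consensus, actual_growth)
naive_mae = mean_absolute_error(no_change, actual_growth)

print(f"consensus MAE: {consensus_mae:.2f}")
print(f"zero-growth MAE (mean absolute change): {naive_mae:.2f}")
```

Read this way, a consensus error of 0.36 percentage points against a mean absolute change of 0.43 per cent suggests the consensus improves only modestly on assuming no growth at all.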
ANZ Research’s goal was to beat the accuracy of the Bloomberg consensus forecast, which served as the benchmark. The team chose the random forest algorithm because it is intuitive and connected to linear regression techniques.
For clarity, let’s suppose we’re interested in whether retail sales rose or fell in a month – predicting a specific growth rate is essentially the same process.
The starting point for a random forest is a decision tree. The tree consists of a series of questions which help determine whether sales rose or not. The first question might be: did consumer confidence rise in the month? The two possible answers (yes/no) provide the first two branches of the tree.
The next question might be: did petrol prices rise? The answers provide two branches off each of the existing branches. And this process repeats for each subsequent question.
The idea is that, by using historical data, one can determine the relationship between retail sales and the answers to each question. By following the tree down, one can then assess the probability that retail sales rose given the other information.
This process also allows one to determine the informational content of each question in predicting retail sales. Suppose the question were “is the first day of the month a weekday?” If both possible answers show the same proportion of times retail sales rose, then this question doesn’t provide any additional information.
It says there is no apparent link between retail sales and the start day of the month. So one can determine which questions provide the most information.
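The comparison described above can be sketched in a few lines. The eight months of history below are invented for illustration, and the function name is hypothetical; the point is only that a question is informative when the share of months in which sales rose differs between its "yes" and "no" answers.

```python
def share_rose_by_answer(answers, rose):
    """Return (share of rises when the answer is yes, share when it is no)."""
    yes = [r for a, r in zip(answers, rose) if a]
    no = [r for a, r in zip(answers, rose) if not a]
    return sum(yes) / len(yes), sum(no) / len(no)

# Hypothetical history of eight months (1 = retail sales rose).
sales_rose = [1, 1, 1, 0, 1, 0, 0, 0]
confidence_rose = [True, True, True, True, False, False, False, False]
starts_weekday = [True, True, False, True, False, True, False, False]

conf_yes, conf_no = share_rose_by_answer(confidence_rose, sales_rose)
day_yes, day_no = share_rose_by_answer(starts_weekday, sales_rose)

print("consumer confidence:", conf_yes, conf_no)  # shares differ: informative
print("starts on a weekday:", day_yes, day_no)    # shares equal: uninformative
```

In this toy history the confidence question separates rising and falling months, while the weekday question leaves the proportions unchanged and so adds nothing.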
The leap from a single decision tree to a random forest is similar to the concept of the wisdom of crowds. Each person brings their own knowledge to answering a question, so aggregating the answers increases the total amount of input information.
To construct a random forest, the results from many decision trees are calculated and averaged. Each of the trees is constructed from a random subset of the possible explanatory variables and is estimated using a random subset of the historical data.
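The construction can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not ANZ Research's model: the three yes/no questions, the toy history, the use of Gini impurity to choose splits, and all function names and parameter values are assumptions made for the example.

```python
import random
from itertools import product

def gini(rows):
    """Gini impurity of the rose/fell labels in a list of (answers, rose) rows."""
    p = sum(rose for _, rose in rows) / len(rows)
    return 2 * p * (1 - p)

def build_tree(rows, questions, max_depth=3):
    """Grow a tree of yes/no questions; each leaf stores the share of rises."""
    share = sum(rose for _, rose in rows) / len(rows)
    if max_depth == 0 or not questions or share in (0.0, 1.0):
        return share
    best = None
    for q in questions:
        yes = [r for r in rows if r[0][q]]
        no = [r for r in rows if not r[0][q]]
        if not yes or not no:
            continue  # this question does not split the sample
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(rows)
        if best is None or score < best[0]:
            best = (score, q, yes, no)
    if best is None:
        return share
    _, q, yes, no = best
    rest = [x for x in questions if x != q]
    return (q, build_tree(yes, rest, max_depth - 1),
            build_tree(no, rest, max_depth - 1))

def tree_predict(tree, answers):
    """Follow the branches down to a leaf and return its share of rises."""
    while isinstance(tree, tuple):
        q, yes_branch, no_branch = tree
        tree = yes_branch if answers[q] else no_branch
    return tree

def build_forest(rows, questions, n_trees=50, n_questions=2, seed=0):
    """Each tree sees a random resample of months and a random subset of questions."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]    # random subset of history
        subset = rng.sample(questions, n_questions)  # random subset of variables
        forest.append(build_tree(sample, subset))
    return forest

def forest_predict(forest, answers):
    """Average the individual trees' probabilities."""
    return sum(tree_predict(t, answers) for t in forest) / len(forest)

# Toy history: sales rise only when confidence rose and petrol prices did not.
QUESTIONS = ["confidence_up", "petrol_up", "starts_weekday"]
history = []
for conf, petrol, weekday in product([True, False], repeat=3):
    answers = {"confidence_up": conf, "petrol_up": petrol, "starts_weekday": weekday}
    history.extend([(answers, 1 if conf and not petrol else 0)] * 5)

forest = build_forest(history, QUESTIONS)
p_good = forest_predict(
    forest, {"confidence_up": True, "petrol_up": False, "starts_weekday": True})
p_bad = forest_predict(
    forest, {"confidence_up": False, "petrol_up": True, "starts_weekday": True})
print(p_good, p_bad)
```

No single tree sees every question, yet the averaged forecast still ranks the favourable month above the unfavourable one, which is the wisdom-of-crowds effect in miniature.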
Practice
ANZ Research’s model considered 49 variables, using the current value and up to 12 monthly lags of each, plus up to 12 monthly lags of retail sales itself. In other words, 49 variables across 13 months (637 series) plus 12 months of lagged retail sales data. This gave a total of 649 potential variables.
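The count can be checked by enumerating the candidate series. The indicator names below are placeholders, since the article does not list the 49 variables, and the helper function is hypothetical.

```python
def lagged_feature_names(indicators, target, max_lag=12):
    """List every candidate series: indicators at lags 0..max_lag, target at lags 1..max_lag."""
    names = [f"{name}_lag{lag}"
             for name in indicators
             for lag in range(0, max_lag + 1)]
    # The current month's retail sales is what is being forecast, so it is excluded.
    names += [f"{target}_lag{lag}" for lag in range(1, max_lag + 1)]
    return names

indicators = [f"indicator_{i:02d}" for i in range(1, 50)]  # 49 placeholder names
feature_names = lagged_feature_names(indicators, "retail_sales")

print(len(feature_names))  # 49 * 13 + 12 = 649
```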
Each decision tree consisted of a random subset of those 649 variables. The capability of this technique to deal with this large number of explanatory variables is precisely the advantage it has over regular linear regression methods, which are more limited in the number of variables which can be considered for a given sample size.
To estimate and evaluate the model, ANZ Research used data from 2010 to May 2018. Of that data, 75 per cent was used to estimate the model while the remainder was used to evaluate it. This means the model has not already ‘seen’ the data upon which it is evaluated.
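A hold-out split of this kind might look like the sketch below. The article does not say whether the 75 per cent was drawn at random or taken as the earliest months; the example assumes a chronological split, and the function name and placeholder observations are hypothetical.

```python
def chronological_split(observations, train_share=0.75):
    """Use the earliest share of observations for estimation, the rest for evaluation."""
    cutoff = int(len(observations) * train_share)
    return observations[:cutoff], observations[cutoff:]

# Placeholder monthly observations in date order (illustrative only).
months = [f"month_{i:03d}" for i in range(100)]
estimation_sample, evaluation_sample = chronological_split(months)

print(len(estimation_sample), len(evaluation_sample))
```

Keeping the evaluation months out of estimation is what makes the reported error an out-of-sample measure.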
The mean absolute error for the model’s forecast of monthly retail sales growth was 0.31 percentage points – slightly less than the Bloomberg consensus of 0.36 percentage points.
As noted, random forests also show which variables are the most important in predicting retail sales. ANZ Research found two of the most-important variables were employment-related - the change in total employment four months ago and the change in the NAB business indicator for employment nine months ago.
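The article does not say how importance was measured. One common approach, shown here purely as an assumption, is permutation importance: shuffle one variable at a time and see how much the forecast error worsens. The toy model, data and function names below are hypothetical.

```python
import random

def mean_abs_error(predict, rows, targets):
    """Mean absolute error of a prediction function over the given rows."""
    return sum(abs(predict(r) - t) for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(predict, rows, targets, n_features, seed=0):
    """Increase in error when each variable's values are shuffled across rows."""
    rng = random.Random(seed)
    baseline = mean_abs_error(predict, rows, targets)
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mean_abs_error(predict, shuffled, targets) - baseline)
    return importances

# Toy setup: the target depends only on the first variable.
rows = [[0.1 * i, float(i % 3)] for i in range(12)]
targets = [2.0 * r[0] for r in rows]

def toy_model(row):
    """Stand-in for a fitted forecaster; it ignores the second variable."""
    return 2.0 * row[0]

scores = permutation_importance(toy_model, rows, targets, n_features=2)
print(scores)
```

Shuffling the variable the model relies on raises the error, while shuffling the ignored variable leaves it unchanged, so the first variable ranks as more important.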
Jack Chambers is an Economist and David Plank is Head of Australian Economics at ANZ.
The views and opinions expressed in this communication are those of the author and may not necessarily state or reflect those of ANZ.