Using Google Search Trends to Predict Initial Jobless Claims

Figure 1: Example of Google Trends Website
As the screenshot above shows, search trends are breaking out. It would be convenient if that series could be turned directly into a predictor of Initial Claims, but Google does a lot of processing on the data it releases (it is far from "raw" data), so we need an algorithm to relate it to real metrics of the economy.
[caption id="attachment_19044" align="aligncenter" width="1732"]
Figure 2: Key: Black = Google Trends data for keyword “unemployment insurance”; Blue = Weekly Initial Claims data (NSA)[/caption]
The simplest “algorithm” is linear regression, but in my experience that rarely does a good job on macroeconomic data because the relationships between variables tend to shift through time (they are non-stationary). For example, running ordinary least squares on the above data gives a prediction of 407,350 for the most recently released data (for the week ending 03/14/2020), when the released datum was 250,892. This is a massive error!
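To make the baseline concrete, here is a minimal OLS sketch. The series below are synthetic stand-ins (the real Google Trends and Initial Claims data are not reproduced here); only the fitting mechanics match the text.

```python
import numpy as np

# Synthetic stand-ins for the search-interest index (x) and weekly
# Initial Claims (y); the true relationship here is linear by construction.
rng = np.random.default_rng(0)
x = rng.uniform(20, 100, size=200)                  # search-interest index
y = 3_000.0 * x + 50_000.0 + rng.normal(0, 20_000, size=200)

# Ordinary least squares fit of y on x with an intercept term.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Point prediction of claims for a new search-interest reading.
x_new = 85.0
y_hat = beta[0] + beta[1] * x_new
```

When the relationship drifts through time, a single static `beta` fitted over the whole history is exactly what produces the large errors described above.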
Instead of linear regression, I tend to implement predictor-corrector machine learning algorithms. These are models that assume that there is a useful relationship between variables but that relationship tends to evolve through time. The algorithm starts off making naive predictions (it usually starts off by assuming the initial relationship is that this period’s data should equal the prior period’s data — technically this is known as a “Martingale” model) but observes its own errors through time and learns how to correct them. This is very similar to the way a child might learn how to predict the trajectory of a ball in flight so that they can catch it (without having to learn calculus and Newton’s laws of motion).
[caption id="attachment_19046" align="aligncenter" width="1732"]
Figure 3: Key: blue curve = machine learning model; pink curve = official released data; dark grey = 68% confidence region about prediction; light grey = 95% confidence region about prediction; vertical band indicates periods with unreleased official data.[/caption]
Using this kind of model gives a much better prediction: for last week's release it produced 241,589. I always accompany my predictions with a confidence region, because forecasting always carries uncertainty and the right approach is to quantify it. Over the history presented, the released data lie well within the confidence region.
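One common way to attach confidence regions like those in Figure 3 (assumed here for illustration, not necessarily the author's method) is to estimate the dispersion of recent one-step forecast errors and quote Gaussian bands around the point prediction.

```python
import numpy as np

def confidence_bands(prediction, errors, z68=1.0, z95=1.96):
    """68% and 95% bands from the sample std. dev. of past forecast errors."""
    s = np.std(errors, ddof=1)
    return ((prediction - z68 * s, prediction + z68 * s),
            (prediction - z95 * s, prediction + z95 * s))

# Hypothetical recent one-step forecast errors (illustrative values only).
errors = np.array([-12_000.0, 8_000.0, -5_000.0, 15_000.0, -9_000.0, 4_000.0])
band68, band95 = confidence_bands(241_589.0, errors)
```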
I have set up automated procedures to run this model and hope to be able to publish its results on an ongoing basis on AWS Data Exchange for a small fee.
Graham is Chief Executive Officer of Giller Investments (New Jersey), LLC, a financial research services firm he founded. Prior to July 2019, he was Head of Primary Research at Deutsche Bank AG and before that Head of Data Science Research at JP Morgan. He has held Chief Data Scientist roles at Bloomberg LP and JP Morgan. Graham is an Experimental Elementary Particle Physicist by training, with a Doctorate from Oxford University, and has been living and working in the US since 1996, when he became an early member of Peter Muller's Process Driven Trading unit (PDT) at Morgan Stanley, where he was head of futures modeling and trading. At Morgan Stanley he was the first proprietary trader risk-managed under the "value-at-risk" framework, and he developed mathematical models of optimal trading algorithms. At Bloomberg he participated in many senior-level meetings, including giving Mike Bloomberg a tutorial on causality analysis in time-series data, and his team did the work to bring social media data like #nfpguesses onto the terminal. From 2000 to 2008 he ran a "friends and family" private investment fund.