Quality Data, Quality Decisions: Why Web Scraping is Essential for Advanced Analytics
Gediminas Rickevičius·9 min
Knowing that we would like to detect a precious metals mispricing, we can look into some fundamental driving factors behind different metals and their pricing.
While gold is commonly used to hedge against both cash inflation and equity markets, Ben Bernanke is an outspoken critic of its viability as an investment. With that in mind, we will compare gold with related precious metals and look at the spreads between them, a common statistical arbitrage technique.
Searching for Data
Now that we know we’d like to search for data related to gold and other precious metals, we can begin looking for datasets. There are several commercial alternative data providers who use high-powered telescopes, satellite imagery, and proprietary sources to collect and sell some of the alternative datasets we might be interested in, and they can be fantastic sources for ideas. However, these solutions are incredibly expensive, and for the purposes of our gold spreads thesis, we can leverage free datasets by building our own tools and scouring through free sources.
Quandl is one of these alternative data providers, but it offers a free subscription for retail users and individuals, with access to a portion of the datasets hosted on its website. Among the free datasets, one contains gold price observations in GBP at daily resolution, and another contains platinum prices in GBP at daily resolution. This would be a great place to start looking for trends and anomalies.
To extract this data into an Excel-friendly format that we can incorporate into our research process, we can take advantage of what the free version of Quandl allows us to do with the dataset. Once we sign up for a free account, we are given an “API Key” that we can use to build a script that will pull this precious metals price data and export it to a file Excel can open. This will require some programming, but only a few lines, and I will provide the full script and a link to an interactive version of the tool we will build to make it easier for those with non-tech backgrounds. The script starts by importing the Quandl library:
import quandl
This is followed by a line giving us access to Quandl’s datasets. You can think of this as a “login” of sorts, similar to entering a username and password to get access.
quandl.ApiConfig.api_key = 'YOURAPIKEYHERE'
Next, we define a new variable “df”, short for data frame, and assign the price data we are interested in to it.
df = quandl.get('DATASETLINKHERE')
Here, the dataset links for the datasets we want are ‘LPPM/PLAT’ for platinum and ‘LBMA/GOLD’ for gold.
Finally, we want to be able to work with this data in a more user-friendly environment and collaborate with members of our investment team who may not want to program while conducting their research. We will export the data to a CSV file that Excel can open, ready to be shared with our team. We can do this with a method built into the data frame, using just one more line of code.
df.to_csv('df.csv')
This will create a CSV file titled “df” in the folder the script is run from, which we can open in Excel to conduct our analysis and develop the investment thesis.
The full script can be copy-pasted from below (I’ve added a “print” command to show our dataset in the Python console):
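import quandl
# Replace the placeholders with your own API key and dataset link
quandl.ApiConfig.api_key = 'YOURAPIKEYHERE'
df = quandl.get('DATASETLINKHERE')
print(df)
# Writes df.csv to the current working directory
df.to_csv('df.csv')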
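Because the script above always writes to df.csv, running it once per metal would overwrite the first export. The following is a minimal sketch of one way to pull both datasets in a single pass. The function name `export_datasets` and the output file names are illustrative assumptions, not part of Quandl's library, and `get` is assumed to behave like `quandl.get` (it takes a dataset link and returns an object with a `to_csv` method).

```python
from pathlib import Path

# Dataset links named in the article; the keys become the output file names.
DATASETS = {"gold": "LBMA/GOLD", "platinum": "LPPM/PLAT"}


def export_datasets(get, out_dir="."):
    """Fetch each dataset with `get` and write one CSV per metal.

    `get` is any callable that accepts a dataset link and returns an
    object with a to_csv(path) method (for example, quandl.get).
    """
    paths = {}
    for name, code in DATASETS.items():
        df = get(code)
        path = Path(out_dir) / f"{name}.csv"
        df.to_csv(path)
        paths[name] = path
    return paths


# Usage (requires the quandl package and a valid API key):
#   import quandl
#   quandl.ApiConfig.api_key = 'YOURAPIKEYHERE'
#   export_datasets(quandl.get)
```

Passing the fetch function in as an argument keeps the export logic separate from the API login, so the same loop could be reused with another data source that returns data frames.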
Here’s our dataset opened in Excel, after I added a column to calculate the spread, ready to be cleaned and visualized.
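For readers who would rather compute the spread column in Python than in Excel, here is a rough sketch using only the standard library. It assumes each metal has been exported to its own CSV with a `Date` column in ISO format (YYYY-MM-DD) and one price column per fixing. The function names and the column names in the usage comment are hypothetical and should be checked against the headers in your own export.

```python
import csv


def read_prices(path, column, date_column="Date"):
    """Read one price column from an exported CSV into {date: price}.

    Rows where the price is blank are skipped.
    """
    prices = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            value = row.get(column, "")
            if value not in ("", None):
                prices[row[date_column]] = float(value)
    return prices


def gold_platinum_spread(gold, platinum):
    """Gold minus platinum for every date present in both series.

    ISO-formatted date strings sort chronologically, so the result
    is ordered from oldest to newest.
    """
    common_dates = sorted(gold.keys() & platinum.keys())
    return {d: gold[d] - platinum[d] for d in common_dates}


# Usage (file and column names are assumptions; check your CSV headers):
#   gold = read_prices("gold.csv", "GBP (PM)")
#   platinum = read_prices("platinum.csv", "GBP PM")
#   spread = gold_platinum_spread(gold, platinum)
```

Restricting the calculation to dates that appear in both files avoids computing a spread from prices observed on different days.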
After removing some unnecessary columns and creating simple charts, we can see that the observations collected from Quandl show the Gold-Platinum spread at an all-time high for this dataset. If we hypothesize that the spread will converge, this represents a possible arbitrage opportunity.
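The "all-time high" reading can also be checked numerically instead of by eye. The sketch below assumes the spread is available as a dictionary of date to value in chronological order. The function name `spread_summary` is illustrative, and the z-score is a common way of expressing how far the latest spread sits from its historical mean, not necessarily the measure behind the chart.

```python
from statistics import mean, pstdev


def spread_summary(spread):
    """Summarize where the latest spread sits relative to its history.

    `spread` maps dates to spread values and is assumed to be in
    chronological order (oldest first).
    """
    values = list(spread.values())
    latest = values[-1]
    sigma = pstdev(values)
    return {
        "latest": latest,
        # True when no earlier observation exceeds the latest one
        "is_all_time_high": latest >= max(values),
        # Distance from the historical mean, in standard deviations
        "z_score": (latest - mean(values)) / sigma if sigma else 0.0,
    }
```

A large positive z-score is consistent with the convergence hypothesis but does not confirm it on its own, since a spread can stay wide for a long time.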
A few observations that are interesting to note that we can conclude from the above visualization: