A Lazy Man’s Guide to Data Driven Investing

Recently, I read Dr Justin Chan’s piece on data-driven investing.

I found it quite interesting and eye-opening:

Big Data intelligence firm for businesses and investors, used satellite imagery of JCPenney parking lots during the quarter to confirm that traffic into its stores across the country was in fact increasing.

The firm’s clients (mostly hedge funds) who paid to obtain this satellite imagery could thus deduce, virtually in real-time, that JC Penney’s performance was on the up. And many of them ultimately capitalized on this information by buying JCPenney stock well before the release of the company’s Q2 report in August — and well before the 10% price jump.

Chan J, 2019, The Rise of Data-Driven Investing, Data Driven Investing

Frankly, before I read this, I wouldn’t have thought to use satellite imagery to measure the busyness of stores.


Even though I’m trained in data analysis and machine learning, I still find using alternative data sources to make decisions too hard to do.

By hard, I mean you need to do the following:

If you’re a one-man show like myself, this is just far too time-consuming, especially if you have to juggle a career and family.


This guide is broken up into 3 sections.

Section 1: Why you shouldn’t be too much into data driven decision making.
Section 2: How to think like a data driven decision maker.
Section 3: 5 Tips on how to be lazy.

Why you shouldn’t be too much into data driven decision making

Photo by Mika Baumeister on Unsplash

As mentioned above, you shouldn’t get too deep into data-driven decision making, because it is hard.

Reason 1 — Most of the time spent on data goes into cleaning it.

This difficulty is compounded if you don’t have any programming skills, particularly in R, SQL or Python.

In my humble opinion, how the data came about is what matters most in data-driven investing.

For instance, I wouldn’t pay anyone to do data analysis for me, because what you’re really paying for is someone to clean your data.

A quote from Trifacta neatly summarises what happens in data analysis.

80% of the time spent on data analytics is allocated to data munging, where IT manually cleans the data to pass over to business users who perform analytics. Data munging is time-consuming and disjointed process gets in the way of extracting true value and potential from data.

This is something I can vouch for, as I have spent endless hours cleaning data rather than actually finding value, and when I do find something of value, I realise that my results are poor and I have to clean the data again.

To be frank, if your data is in a table, you can simply clean it using Excel and use machine learning software (e.g. Weka) to do the machine learning for you. This can also be done in one go in Power BI if you’re familiar with the software.
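Since Python was mentioned above, here is roughly what the same kind of table cleaning can look like as a script. This is a minimal sketch using only the Python standard library; the column names (store, date, visits) and the sample rows are hypothetical, chosen to show common munging problems: stray whitespace, inconsistent capitalisation, thousands separators, blank cells and duplicate rows.

```python
import csv
import io

# Hypothetical raw export with typical problems: padded headers and cells,
# inconsistent capitalisation, "1,204"-style numbers, a blank value and a duplicate.
RAW = """store, date ,visits
 Store A,2021-03-01,"1,204"
Store B,2021-03-01,
Store A,2021-03-01,"1,204"
store b ,2021-03-02,987
"""


def clean_rows(text):
    """Return de-duplicated rows with normalised store names and integer visits."""
    reader = csv.DictReader(io.StringIO(text))
    seen = set()
    cleaned = []
    for row in reader:
        # Trim whitespace from every header key and cell value.
        row = {k.strip(): (v or "").strip() for k, v in row.items()}
        store = row["store"].title()              # "store b" -> "Store B"
        visits = row["visits"].replace(",", "")   # "1,204" -> "1204"
        if not visits:                            # drop rows with a missing value
            continue
        key = (store, row["date"], visits)
        if key in seen:                           # drop exact duplicates after normalising
            continue
        seen.add(key)
        cleaned.append({"store": store, "date": row["date"], "visits": int(visits)})
    return cleaned


if __name__ == "__main__":
    for row in clean_rows(RAW):
        print(row)
```

Each rule here (title-casing names, dropping blanks rather than filling them) is a judgement call; in real data you would decide them case by case, which is exactly why cleaning eats so much time.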

If you are using imagery data, you can use GCP Vision AI to analyse the images for you (for example, to detect objects or read text). All you need to do is feed the images into the API.

Reason 2 — The point of data isn’t to find some sort of hidden trend.

Most people don’t understand data in the first place. By this I mean that the purpose of data isn’t to find a hidden trend. The purpose of data is to prove or disprove your hypothesis (‘The Scientific Method’).

For example, the use of satellite imagery to analyse JCPenney parking lots to monitor store traffic isn’t some secret. Rather, someone was smart enough to be creative in their search for data to either support or disprove their idea.

What I think really happened was this: someone visited JCPenney and found that it was quite busy. The financial market believed that profits were falling and had probably attributed this to online shopping. This person thought otherwise, so they used satellite imagery to prove or disprove their hypothesis.

As a side note, if you didn’t have the big bucks to pay for real-time satellite imagery, you could also drive to a few of your closest JCPenney stores to check. Better yet, you could use Google Maps to check the surrounding traffic of a few JCPenney stores as a source of ‘data’, and do this every day at various times for a few weeks.
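If you do collect your own counts this way, one simple check on whether "busier than before" is more than noise is a permutation test. This is a minimal sketch under my own assumptions: the daily car counts are made-up placeholders (not real JCPenney data), and the permutation test is a technique I have chosen for illustration, not necessarily what the original analysts used.

```python
import random
from statistics import mean

# Hypothetical daily car counts jotted down over two periods. Placeholders only.
before = [42, 38, 45, 40, 37, 44, 41, 39, 43, 40]
after = [47, 51, 44, 49, 52, 46, 50, 48, 45, 53]


def permutation_p_value(a, b, n_iter=10_000, seed=0):
    """One-sided p-value for the hypothesis mean(b) > mean(a).

    Shuffles the pooled counts many times and asks how often a random split
    produces a lift at least as large as the one actually observed.
    """
    rng = random.Random(seed)
    observed = mean(b) - mean(a)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        # Treat the first len(a) shuffled values as "before", the rest as "after".
        diff = mean(pooled[len(a):]) - mean(pooled[:len(a)])
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_iter + 1)


if __name__ == "__main__":
    print(f"observed lift: {mean(after) - mean(before):.1f} cars/day")
    print(f"p-value: {permutation_p_value(before, after):.4f}")
```

A small p-value says the lift is unlikely to be a fluke of which days you happened to check; it does not say why traffic rose, so the hypothesis still needs the business reasoning around it.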


How to think like a data driven decision maker

Photo by Rob Schreckhise on Unsplash

You don’t need to be particularly well-versed in data foundations to be a data-driven investor. The reason is that people in general make data seem more complex than it is.

The simplest way to think of data is this: I’ve got $2. The shop sells a small cheeseburger for $2 and a large cheeseburger for $3. My data, the $2, tells me that I can only afford the small cheeseburger. This is data-driven investing in a nutshell.

Thinking Skill 1 — Data can be aggregated and disaggregated. Data can be correlated and causal.

Firstly, you need to realise that data points can be combined and separated. In other words, data can be added, subtracted, multiplied and divided. The same applies to image, video and sound data once it has been pre-processed.

*As a side note, even simple satellite images need to be processed before they become viewable.

Perhaps what many people miss is that they take the highest level of aggregation as the truth and do not drill down further.
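To make "drilling down" concrete, here is a minimal sketch in plain Python. The records are hypothetical (two made-up segments with made-up revenue in $m, not Aristocrat figures); the point is only that the same rows can be summed to one headline number or broken out by segment, and the two views can tell different stories.

```python
from collections import defaultdict

# Hypothetical (year, segment, revenue in $m) records. Not real figures.
records = [
    (2015, "Segment A", 500.0), (2015, "Segment B", 300.0),
    (2016, "Segment A", 700.0), (2016, "Segment B", 250.0),
]


def total_by_year(rows):
    """Highest level of aggregation: one revenue number per year."""
    totals = defaultdict(float)
    for year, _segment, revenue in rows:
        totals[year] += revenue
    return dict(totals)


def by_segment_and_year(rows):
    """Drilled-down view: revenue per segment, per year."""
    table = defaultdict(dict)
    for year, segment, revenue in rows:
        table[segment][year] = table[segment].get(year, 0.0) + revenue
    return dict(table)


if __name__ == "__main__":
    print("Total:", total_by_year(records))
    for segment, years in by_segment_and_year(records).items():
        print(segment, years)
```

Here the total rises from 800 to 950 while Segment B actually shrinks from 300 to 250, something you would only notice after disaggregating.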

Here’s an example of what I mean:

Fig 1. ASX:ALL — 2016 Report: P&L Statement
Fig 2. ASX:ALL — 2016 Report: Notes 1.1
Fig 3. ASX:ALL — 2016 Report: Cashflow Statement

You can find more of Aristocrat’s Annual Reports here.

Revenue jumped by 546.3 million from 2015 to 2016 (fig 1). If you didn’t know any better, you’d think the company had a good run that year. If you look at the footnotes (fig 2), you’ll see that revenue rose across the board. However, if you dig a little deeper, the cash flow statement (fig 3) shows that the company actually made a substantial acquisition in 2015. Acquisitions usually add to revenue in the following year.

What we could guess from our data points above:

1. We could say that 2016 was a better year for the company than 2015 (a correlation), since revenue from every segment went up.

2. We could say that 2016’s revenue improved because of acquisitions made in 2015 (a cause).

This is more annual report reading than alternative data decision making, but the concept is the same: data is data. As you can see, data can be broken down and built up, and data points can also be influenced by other factors.
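One way to test the second guess is to separate headline growth from acquired growth. The sketch below uses hypothetical round numbers in $m (not taken from Aristocrat's reports) and assumes the acquired business contributed nothing to the prior year's revenue; if it was bought partway through that year, the prior-year base would need adjusting too.

```python
def growth_split(prior_revenue, current_revenue, acquired_revenue):
    """Split headline growth into reported and organic (ex-acquisition) growth.

    Assumption: the acquired business contributed nothing to prior_revenue,
    so removing acquired_revenue from the current year gives like-for-like growth.
    """
    reported = current_revenue / prior_revenue - 1
    organic = (current_revenue - acquired_revenue) / prior_revenue - 1
    return reported, organic


# Hypothetical figures in $m.
reported, organic = growth_split(prior_revenue=1000.0, current_revenue=1500.0,
                                 acquired_revenue=400.0)
print(f"reported growth: {reported:.0%}, organic growth: {organic:.0%}")
```

With these made-up numbers the headline 50% growth shrinks to 10% once the acquisition is stripped out, which is the kind of gap that separates "a good year" from "a bigger company".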

Even if you had the full data set in a big data format (i.e. all transactional data), finding the relevant trends isn’t any easier. You can find many correlated trends in a big data set but very few truly causal ones. All you can really do is aggregate the data, look at trends for a specific variable, and not think too hard about finding causal trends.

*As a side note, supervised machine learning finds trends that correlate with your target variable and predicts the future based on those correlations. Sometimes it succeeds, but often it fails miserably.
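A toy simulation shows how this happens. It is my own illustration, not a method from the sources above: generate hundreds of random "features" with no relationship at all to a random target, keep whichever correlates best on a training sample, then check that same feature on fresh test data. The sizes (500 features, 30 observations) and the seed are arbitrary assumptions.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def best_spurious_feature(n_features=500, n_obs=30, seed=1):
    """Pick the noise feature most correlated with a noise target in-sample.

    Returns (feature index, in-sample r, out-of-sample r). Every series is
    independent Gaussian noise, so any correlation found is pure chance.
    """
    rng = random.Random(seed)
    target_train = [rng.gauss(0, 1) for _ in range(n_obs)]
    target_test = [rng.gauss(0, 1) for _ in range(n_obs)]
    best = None
    for i in range(n_features):
        f_train = [rng.gauss(0, 1) for _ in range(n_obs)]
        f_test = [rng.gauss(0, 1) for _ in range(n_obs)]
        r_train = pearson(f_train, target_train)
        if best is None or abs(r_train) > abs(best[1]):
            best = (i, r_train, pearson(f_test, target_test))
    return best


if __name__ == "__main__":
    idx, r_train, r_test = best_spurious_feature()
    print(f"feature {idx}: in-sample r = {r_train:.2f}, out-of-sample r = {r_test:.2f}")
```

Typically the chosen feature looks impressive in-sample (an |r| of roughly 0.5 or more with these settings) and close to useless on the test data, which is the "fails miserably" case in miniature.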

Thinking Skill 2 — Understand that you’re a biased thinker and that you’re always better off proving yourself wrong than proving yourself right.

Chapter 15 of ‘HBR Guide to Data Analytics Basics for Managers’ provides a good list of the biases we tend to suffer from when looking at data. The list includes confirmation bias, overconfidence and overfitting.

Confirmation bias occurs when you read data in a way that supports your view, or twist the logic until it does.

For instance, company A’s profit has increased 15% year on year for the last three years, and you believe this will continue. Two reports come out. One suggests profits will continue to rise and the other suggests a sluggish year is coming. You ignore the negative report and invest based only on the positive report’s suggestions.

Overconfidence occurs when you have such faith in the data that you believe it can’t be wrong.

For example, company A’s revenue has grown 15% year on year for the last five years. You are confident that it will keep growing 15% a year in the future. Someone tells you that the company’s leverage has also increased 20% year on year. You confidently reply that the company will surely pay off its growing debt in one swoop one day. The following year, the company goes bankrupt.
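The arithmetic behind that warning is simple compounding. This sketch assumes "leverage" means total debt and starts from a hypothetical debt-to-revenue ratio of 1.0; only the 15% and 20% growth rates come from the example above.

```python
def debt_to_revenue_path(ratio0=1.0, revenue_growth=0.15, debt_growth=0.20, years=5):
    """Debt-to-revenue ratio over time when debt and revenue compound at fixed rates.

    Each year the ratio is multiplied by (1 + debt_growth) / (1 + revenue_growth).
    ratio0 is a hypothetical starting point, not a figure from the text.
    """
    path = [ratio0]
    for _ in range(years):
        path.append(path[-1] * (1 + debt_growth) / (1 + revenue_growth))
    return path


if __name__ == "__main__":
    for year, ratio in enumerate(debt_to_revenue_path()):
        print(f"year {year}: debt/revenue = {ratio:.3f}")
```

Under these assumptions the ratio compounds by 1.20 / 1.15 (about 1.043) each year and ends roughly 24% higher after five years, so strong revenue growth alone does not mean the debt is getting easier to carry.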

Overfitting occurs when you find a trend and believe it holds true in all situations.

As an example, you found that 20 years ago the Christmas period always increased the profits of large department stores. So, you invest heavily in a particular department store just before Christmas. However, the following quarter’s financial statements show that revenue has fallen. Reports explain that online shopping boomed over that Christmas period.

You could come up with strategies to tackle all three biases individually, or instead you can use one general strategy that tackles as many of your biases as possible.

That strategy is to prove yourself wrong.

For example, you think Domino’s Pizza Enterprises’ (ASX:DMP) investment in the German market will be successful, so you actively look for data that could disprove your hypothesis. You can go to the Eurostat website to conduct your research. There you can see that pizza consumption in Germany fell from July to December 2020 but rose again slightly from January to March 2021. An educated guess might lead us to believe that Germans eat less pizza in warmer months and more in cooler months, or perhaps that the German market didn’t like the existing options and only recently have better options attracted more consumers.

This data neither supports nor negates your hypothesis, but it shows why hunting for data that might prove you wrong is important for data-driven investing.
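One way to start separating the two explanations above is to compare each month with the same month a year earlier, which cancels out a purely seasonal pattern. This is a minimal sketch; the index values keyed by (year, month) are hypothetical placeholders, not Eurostat data, and the year-on-year approach is my suggestion rather than part of the original analysis.

```python
# Hypothetical monthly consumption index keyed by (year, month). Placeholders only.
index = {
    (2019, 7): 110.0, (2019, 12): 104.0, (2020, 3): 101.0,
    (2020, 7): 106.0, (2020, 12): 100.0, (2021, 3): 103.0,
}


def year_on_year(series, year, month):
    """Change versus the same month one year earlier, or None if either value is missing."""
    prev = series.get((year - 1, month))
    cur = series.get((year, month))
    if prev is None or cur is None:
        return None
    return cur / prev - 1


if __name__ == "__main__":
    for year, month in [(2020, 7), (2020, 12), (2021, 3)]:
        change = year_on_year(index, year, month)
        print(f"{year}-{month:02d}: {change:+.1%} year on year")
```

If the same months also dipped in earlier years, the seasonal explanation gains weight; if recent months are down or up year on year regardless of season, something else (such as a change in the options on offer) may be going on.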

5 Tips on how to be lazy

Photo by Priscilla Du Preez on Unsplash

The purpose of these tips is mostly to move you away from analysis paralysis and towards understanding how data flows around us. Analysing data is hard work, and you don’t need to waste your time on too much analysis. Furthermore, you don’t need an academic background in data to understand how data flows. All you need to do is engage with data, and it will become intuitive.

As you can read, the whole point of this guide is to dissuade you from believing that data will give you some sort of large competitive advantage in investing. If you think it will, you’ll be at a terrible disadvantage. There are many big data firms out there already mining data sources to gain that advantage. For example, Two Sigma uses machine learning to find how the news affects the share market in real time.

However, your advantage is that really smart people tend to focus on things so intensely that they suffer from diminishing returns and often don’t know when to give up (the sunk cost fallacy).

So, while data scientists focus intently on using machine learning algorithms to find hidden patterns and trends in data sets, all you really need to do is understand how a business works, hypothesise whether a revenue stream is successful, and go out into the world to experience it yourself, or find a data set that can act as a proxy for your ideal one.

If you want to see how I evaluate companies using financial and business data, you can see here: https://sites.google.com/view/focusanalysis/insights-and-analysis

Jason Huynh

I'm a data analyst who enjoys reading annual reports. My hobbies include exercise, cooking and being a well-rounded dad. I work as an analyst in the higher education sector in Australia, but my passion is investing. I used to believe that data could solve everything, but it wasn't until I read Charlie Munger's "Poor Charlie's Almanack" that I realised I had been thinking in silos all this time and really needed to broaden my experiences and reading. What concerns me in life is making silly choices and following trends aimlessly. I believe in critical thinking and serving others as I would like to be served.

4 Replies to “A Lazy Man’s Guide to Data Driven Investing”

  1. Good article, very useful for me. I have a related article with the same discussion. Please visit this website, let’s discuss together with me: http://news.unair.ac.id/2021/02/19/tips-investasi-saham-bagi-pemula/

  2. I believe things are a bit simpler than that. DMR, my company, provides alternative data for trading from social media and other online posts. We call the dataset social intelligence. The following steps hopefully illustrate what I mean when I say simple:
    1) take a test dataset with 2-3 years of back data and run correlations or regressions of the daily metrics in the dataset against the next day's stock price fluctuation
    2) choose the metrics that have high enough correlation (which in this case implies causation), e.g. press releases in the news about firing or hiring a new CEO cause funds to sell or buy depending on their judgement
    3) use the chosen metrics to predict whether a stock will go up or down the next day

  3. As a passive investor, two common options are to buy index funds or ETFs.

    Because index funds and ETFs let you invest in holdings from various industries, passive investing can help you diversify, so even if one asset in your basket has a downturn, it shouldn’t affect your entire portfolio.

    All your points are valid for those who choose to be lazy investors. Thanks for the article.
