I would be a big fat liar if I said I had never wasted time. I’ve been unproductive and procrastinated plenty of times. I have spent hours watching funny dog videos on Facebook (but they are DOGS!!).
Wasting time is one of the many hot-button topics in analytics. Of course, anything that wastes time also wastes money. The idea is gaining a bit of notoriety, as analytics practitioners seemingly spend more time on so-called time-wasters. So, people say things like:
Analytics practitioners waste time spending 80% of the time data wrangling, instead of doing high-value tasks like feature engineering and selection.
I heard it again recently almost verbatim in a public webinar aimed at an audience of business leaders. You know they all would love to make the most out of their analytics investment to get more value.
I agree that trivial activities are indeed a waste of time, not to mention highly frustrating. They include things like tracking down or even reverse-engineering data documentation, discovering and correcting technical errors in data, waiting for access, etc. I see them too often with my own two physical and virtual eyes.
I do, however, have an issue with calling data wrangling a waste of time. A valuable analytics professional learns so much about the business context of the data from wrangling with it. You gain a greater understanding of the business from this 80% of your activities that supposedly have little value.
In contrast, many of these “high-value tasks” like feature engineering and selection, technique selection, and other specialized tasks are, in fact, prime candidates for automation. It is worth noting that automation is a business objective, not an analytical one.
But the value to whom?
Who determines the value of an analytical activity or task?
Analytics practitioners need to understand what activities truly contribute toward producing business value. The value of a product is determined by the consumer, and in turn, the value of any activity that goes into creating the product depends on how much that activity contributes to it. Newsflash: it’s the consumers of the analytics who determine the value of analytics. Consequently, the value of what an analytics practitioner does to build the analytic is not based on what each activity means to him or her.
I’ve been a big advocate of Lean and other similar approaches to process improvement. They start with the value to the customer and then work backward to identify what adds value along the way. It is relatively intuitive to understand that the consumers determine the value of a product in today’s consumer market. We also intuitively understand that products that have little value to consumers do not sell. We are all consumers ourselves, after all.
The trivial activities mentioned earlier are non-value-add to the consumer of analytics. Actually, they don’t add any value for any stakeholder in the process. But the problem with the statement above is that it is based on the value system of the analytics practitioner. It does not reflect the perspective of the consumer, the business. It erroneously implies that the value of an analytical activity is determined by what it means to the person carrying it out.
Analytical elitism?
Too often, we are too eager to put into practice all the fancy techniques and other specialized skills. We want to do all the cool stuff and only the cool stuff. Sure, data wrangling generally does not require all the fancy math and algorithms. But some treat data wrangling as if it is beneath them; in a way, they are analytical elitists or snobs.
Data wrangling may be technically uninteresting to the analytics practitioner. However, the understanding you get from it is highly valuable to your ability to provide value to the business. Personally, I find this the most interesting part of the entire analysis process. If analytics practitioners want to provide value with what they do rather than get value out of what they do, it starts with recognizing that it is not about the value to themselves but rather to the consumer of their product.
Unfortunately, a lot of what people call analytics, statistics, data science, machine learning, etc., is just analytical mechanics: the carrying out of the tasks associated with the techniques. Contrary to popular perception, applying analytical mechanics is relatively straightforward. You can automate straightforward things with logic.
The whole of analytics is much more than that. It’s the things that are less than straightforward that make one valuable, and very few are successful at those. The statement above also reflects the troubling notion that there is a higher value in doing than in understanding. This could not be more wrong! I have had to talk too many people out of it. Data wrangling is not straightforward, and it is high-value. In just about any field, many of the activities that are not straightforward are high-value.
Analytical egocentricity?
The challenge is that we analytics practitioners are pretty much hard-wired to do problem-solving. We jump right into solving what we perceive, or worse, arrogantly decide, to be the problem. We don’t take the time and effort to analyze the problem itself. Many expect the problem to come already well-articulated by the consumer, ready for them to apply their deep technical expertise. Some even consider the ambiguity to be beneath them. I’ve managed, coached, and otherwise worked with hundreds of analytical professionals, and I can say from experience that it is very challenging to get them to understand the problem before starting to solve it.
Much of that problem understanding happens during data wrangling. Sure, certain repetitive tasks should be standardized, but not automated; automation takes the human out of the process. From the perspective of the consumer (the researcher or the business), this understanding is not low-value by any means.
In analytics, one of the big issues we have as a profession is egocentricity. We often treat what we want to do with the data as the more important thing. We don’t pay much attention to the fact that we don’t own the business or research problems to be solved with the said data.
To be a valuable analytical professional, you need to be not only willing but thirsting to understand the problem itself. Inadequate effort in data wrangling speaks volumes about where a practitioner’s true interest lies.
Let’s eliminate our own jobs!
As a hiring manager and an advisor to hiring managers in analytics, I have always considered it a red flag when a candidate says something along the lines of “I just want to build models.” Analytics professionals are hired to solve problems and add business value, not to build models as an end objective. If we’re not interested in understanding the problem completely, we might as well apply automated mechanics and automated data wrangling. We’re not learning anything about the problem itself from it anyway! That would be a path to almost completely automated analytics, with which we can drive ourselves out of our jobs. Analytics may even be the first profession replaced by artificial intelligence. Wouldn’t that be ironic?
We need to reevaluate the value of different activities in data so that they align with what the consumer of the analytic values and not what the practitioner values. Business leaders and analytics practitioners alike need to mind the consumer at the end of the value chain.
Automate “low-value” analytical activities like data wrangling? As they say, be careful what you ask for, because you just might get it.
Hi Michiko,
Thanks for the great article.
As a data analyst, what you have written does resonate with me. I do find data wrangling an integral part of the job and do wish some days that it could just be completely automated.
I think that data wrangling does help with understanding the problem better. This is something that automated processes can’t do. For instance, a computer can’t really map between two data sets, especially when you are mapping between data sets where one is new, and the older requires legacy knowledge and has too many gaps in it.
Hope to see more great articles from you.
Thanks
Jason
Great advice! I work at a company called Evo, and one of the biggest things we try to do for our clients is to eliminate this kind of repetitive work to free them up to make an impact. It’s something every business should make a priority.
Paolo