In today’s data-science-hyped world, organizations are increasingly looking for answers in the data they collect. The good news is that more business leaders are turning toward data-driven decision making. Unfortunately, this often turns into you, as a Data Analyst or Scientist, going on a wild goose chase for answers that may or may not be in the data.
"Often, one looks at available data and tries to discern patterns. But is the pattern for real, or just the product of data snooping, that is, extensive hunting through the data until something interesting emerges? There is a saying among statisticians: “If you torture the data long enough, sooner or later it will confess.”"
 Peter Bruce & Andrew Bruce, Practical Statistics for Data Scientists: 50 Essential Concepts (2017), O’Reilly Media, Inc.
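To make the data-snooping point concrete, here is a minimal sketch in Python. It is not from the book; the function names, sample sizes, and seed are my own illustrative assumptions. It generates a target and 200 candidate features that are all pure random noise, then counts how many features correlate with the target strongly enough to look "significant". The 0.361 cutoff is roughly the critical correlation for 30 rows at the conventional 5% level.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious_hits(n_rows=30, n_features=200, threshold=0.361, seed=0):
    """Count noise features whose |r| with a noise target exceeds threshold.

    All values are drawn independently, so every "hit" is a false pattern.
    The defaults are illustrative assumptions, not recommendations.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    hits = 0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        if abs(pearson_r(feature, target)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_spurious_hits()
    print(f"{hits} of 200 pure-noise features look 'significant'")
```

With a 5% false-positive rate you would expect around 10 of the 200 features to clear the bar by chance alone, even though there is nothing real to find. Hunt long enough and the data will "confess" to something.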
As Data Analysts and Scientists, our job is not only to write the code and tell the story of the data but also to set the expectations of our leadership and business stakeholders. We are the gatekeepers between the two worlds of data and business, and we must speak both languages fluently.
Before doing any analysis, we must first understand the problems our clients (internal or external) are facing. Specifically, what business decisions are they trying to make, and what value will our analysis bring? Too often the thought process is “if we get the data to the right people and have them look through it long enough, all our problems will be solved”. This is a false hope, and it often leads to time spent writing code and building visualizations that will eventually be thrown away. That is the best-case scenario. Worse, and more costly, is having your client use these analyses to make decisions for their business.
The term “GIGO” (Garbage In, Garbage Out) is often used within the data community. We know that if data was collected without a sound experimental design, or if the data is incomplete, the results will be, well, garbage. The same mantra applies to the Data Analysis Lifecycle: unclear, unfocused, or nonexistent business questions will result in bad analyses and therefore bad business decisions.
It is our job as gatekeepers to slow down the rush to get results as fast as possible. It is our job to sit down with our clients and help them pinpoint exactly which questions they want answered. Only then should we look through the data available to us to determine whether each question can be answered fully, partially, or not at all. This should all be done without writing a single line of code. Without this important step in the Data Analysis Lifecycle, we are leading our clients down a road where they believe they are informed, but in reality, we are just spreading misinformation. If this dynamic between the business and the analyst goes unchecked, the effect will only be amplified.
To lead our clients down the right path, we need to take a step back from the technology and start to understand their world. The Data Analysis Lifecycle can help us do that. It’s a framework that I use to make sure I’m not spending countless hours building something that won’t be used, or that will be used for the wrong reason. Whenever I have a new client or a new ask from my current client, I step through this process.
You’ll notice that only one of the steps in the Data Analysis Lifecycle involves the actual analysis. I am not minimizing the effort it takes to do an analysis; I am showing that the analysis itself — no matter how difficult or long the process is — is just one piece that fits into the bigger puzzle. Continue to hone your technical skills and keep up to date on the latest technology — but also understand that our job is more than just writing code and building visualizations.
If you follow these steps, you will begin to see a new relationship develop with your client. They will start to see that you’re capable of more than just analysis, and the process of being thrown down the rabbit hole with the mantra “just look through the data until you find something interesting” will eventually come to an end.