The Data Analysis Lifecycle
How to avoid “Analysis for the sake of Analysis”
Apr 30th, 2020 | 6 min read

In today’s Data Science-hyped world, organizations are increasingly looking for answers in the data they collect. The good news is that more business leaders are turning towards data-driven decision making. Unfortunately, this often sends you, the Data Analyst or Scientist, on a wild goose chase for answers that may or may not be in the data.

“Often, one looks at available data and tries to discern patterns. But is the pattern for real, or just the product of data snooping, that is, extensive hunting through the data until something interesting emerges? There is a saying among statisticians: ‘If you torture the data long enough, sooner or later it will confess.’” [1]

[1] Peter Bruce & Andrew Bruce, Practical Statistics for Data Scientists: 50 Essential Concepts (2017), O’Reilly Media, Inc.
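
To make the data snooping warning concrete, here is a minimal sketch in Python. It is not from the book or from any real project: the function names, the sample sizes, and the seed are all hypothetical. It correlates 500 columns of pure noise against a noise target and counts how many clear a cutoff of |r| > 0.361, which is approximately the two-sided 5% significance threshold for 30 observations (an assumed, rounded value).

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def snoop(n_rows=30, n_features=500, r_cutoff=0.361, seed=0):
    """Hunt through pure-noise features for 'interesting' correlations.

    r_cutoff is an assumed, approximate 5% two-sided threshold for n_rows=30.
    Returns a list of (feature_index, r) pairs that exceed the cutoff.
    """
    rng = random.Random(seed)
    # The target is random noise, so no feature can truly explain it.
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    hits = []
    for j in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        r = pearson_r(feature, target)
        if abs(r) > r_cutoff:
            hits.append((j, r))
    return hits


hits = snoop()
print(f"{len(hits)} of 500 pure-noise features look 'significant'")
```

At a 5% threshold, roughly one in twenty noise features should pass by chance alone, so a search over hundreds of columns will almost always "confess" something. None of those hits would answer a business question.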

More than just Analysis

As Data Analysts and Scientists, our job is not only to write the code and tell the story of the data but also to set the expectations of our leadership and business stakeholders. We are the gatekeepers between the two worlds of data and business, and we must speak both languages fluently.

Before doing any analysis, we must first understand the problems our clients (internal or external) are facing. Specifically, what business decisions are they trying to make, and what value will our analysis bring? Too often the thought process is “if we get the data to the right people and have them look through it long enough, all our problems will be solved”. This is a false hope, and it often leads to time spent writing code and building visualizations that will eventually be thrown away. That is the best-case scenario. Worse, and more costly, is having your client use these analyses to make decisions for their business.

The term “GIGO” (Garbage In, Garbage Out) is often used within the data community. We know that if data was collected without sound experimental design, or if the data is incomplete, the results will be, well, garbage. The same mantra applies to the Analysis Lifecycle: unclear, unfocused, or nonexistent business questions will result in bad analyses and therefore bad business decisions.

It is our job as the gatekeeper to slow down the desire to get to the results as fast as possible. It is our job to sit down with our clients and help them pinpoint exactly which questions they want answered. Only then should we look through the data available to us to determine whether each question can be partially or fully answered at all. This should all be done without writing a single line of code. Without this important step in the Data Analysis Lifecycle, we are leading our clients down a road where they believe they are informed, but in reality, we are just spreading misinformation. The longer this dynamic between the business and the analyst goes unchecked, the more the effect is amplified.

The Data Analysis Lifecycle

To lead our clients down the right path, we need to take a step back from the technology and start to understand their world. The Data Analysis Lifecycle can help us do that. It’s a framework that I use to make sure I’m not spending countless hours building something that won’t be used, or that will be used for the wrong reason. Whenever I have a new client or a new ask from a current client, I step through this process.

  1. Understand the Problem: What pain points does your client have? How do they fit into the overall strategy for the business? This is where the “Domain Knowledge” aspect of Data Science comes into play.
  2. Identify the Decisions and Pinpoint the Questions: Sit down with your client and ask which decisions they need to make. These may be the single decision that resolves the problem itself or a set of smaller decisions that will ultimately enable the client to solve it. Then ask them to pinpoint which questions must be answered to make those decisions. If they are not sure, or the questions are unclear, help them formulate the questions. Document the decisions and questions, and have the client verify them.
  3. Check the Data: Make sure the data you have is sufficient, and was collected in the right manner, to answer the questions your client agreed upon. If not, it’s back to the drawing board. Go back to the client, explain that the data you have is not sufficient, and either (1) ask for the data you need if you know it exists or (2) present a few alternative questions you can answer with the data you have.
  4. Do the Analysis: Now, the fun part. Here is where you use all of those skills and technologies we get excited about. I won’t go into detail on this, as there are already many articles covering the topic. I will say this, however: think carefully about which models or algorithms will sufficiently answer the client’s questions before diving in.
  5. Interpret the Results Beforehand: Before presenting anything, review the results of your analysis with an unbiased eye. The business may want one particular result, and oftentimes we want to please them by giving them what they want. We need to be objective when looking at the results; we are not there to manipulate outputs, we are there to translate the results as they are.
  6. Present the Results and Get Feedback: Translate the outputs of your analysis into terms that the business cares about. To the business, terms like “MSE” and “Recall” are just jargon. Here is where your fluency in both technology and business is essential. Present the findings to your client as actionable insights; they will appreciate your understanding of what is important to them. Then, if they ask technical questions, feel free to deep-dive into some of those aspects. Finally, make sure the client leaves the meeting ready to make those business decisions. If they aren’t comfortable doing so just yet, transition into Step 2 before the meeting ends.
  7. Repeat as Needed: If your analysis leads to more questions during Step 6, go back and repeat these steps. This is often the rhythm of an ongoing relationship between you and your client. Feel free to skip Steps 1–3 at this point, as long as you (1) still fully understand the client’s business problem, (2) have discussed the new decisions and questions during the presentation of your results, and (3) have a good grasp of what’s already in the data.
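
Most of this process happens in conversation, not in code, but the sufficiency check in Step 3 can be backed up with a quick script once the questions are agreed. The following is a minimal sketch under stated assumptions: `check_data`, the field names, the sample rows, and the 20% missing-value tolerance are all hypothetical, chosen only for illustration. It does not assess whether the data was collected in the right manner; that still requires judgment.

```python
def check_data(rows, required_fields, max_missing_share=0.2):
    """Return a dict mapping each problematic field to a short reason.

    rows: list of dicts, one per record (None marks a missing value).
    required_fields: fields needed to answer the agreed questions.
    max_missing_share: assumed tolerance for missing values per field.
    """
    problems = {}
    for field in required_fields:
        # A field that appears in no record was never collected.
        if not any(field in row for row in rows):
            problems[field] = "not collected"
            continue
        missing = sum(1 for row in rows if row.get(field) is None)
        share = missing / len(rows)
        if share > max_missing_share:
            problems[field] = f"{share:.0%} missing"
    return problems


# Hypothetical records and fields for a question about churn by region.
rows = [
    {"region": "north", "revenue": 120.0, "churned": False},
    {"region": "south", "revenue": None, "churned": True},
    {"region": "east", "revenue": None, "churned": False},
    {"region": "west", "revenue": 95.5, "churned": None},
]
problems = check_data(rows, ["region", "revenue", "churned", "discount"])
print(problems)
```

An empty result suggests the fields needed for the question are at least present. A non-empty one is the cue to go back to the client and either ask for the missing data or propose alternative questions.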

Conclusion

You’ll notice that only one of the steps in the Data Analysis Lifecycle involves the actual analysis. I am not minimizing the effort it takes to do an analysis; I am showing that the analysis itself — no matter how difficult or long the process is — is just one piece that fits into the bigger puzzle. Continue to hone your technical skills and keep up to date on the latest technology — but also understand that our job is more than just writing code and building visualizations.

If you follow these steps, you will begin to see a new relationship develop with your client. They will start to see that you’re capable of more than just analysis, and the experience of being thrown down the rabbit hole with the mantra “just look through the data until you find something interesting” will eventually come to an end.

This article was written by Tom Sharp.