Andy’s Neural Works

A wise folly can lead to great things!


Approaching Data Mining Projects

Introduction

For the uninitiated, having a data mining project land in your lap can be disorienting. How do you start? What steps are you going to follow? What conclusion needs to be reached?

For millennia (maybe not that long), data have been collected and stored for the one day they are needed. It is true that collection can come from many different sources in different formats. Surveys, transactions, wearables, sensors, articles, manual entry, scraping, cave wall paintings, and so on are all simple examples of how data are sourced. Having so many sources might benefit you in your project if they are relevant. It is also wonderful that there are so many tools and languages built for data analysis.

Storytelling

Data projects happen because there is a need for storytelling. I do not mean Grimms' Fairy Tales but stories about the business or group you are working in. The analyst dives deep into the data and determines how everything fits together. They take their knowledge of the business and see how the sources can show where things were, where things are, where things are going, or even where things could be going.

As an example, think about a series of products being built in a factory and sold to big box stores. The analyst could very well have access to sales, inventory, delivery, and production data. Along comes the CEO, who wants to know how the company is performing. Some might go purely on sales data and start off with “your daily sales are x dollars.”

Boring…

That is not good enough. I believe that we can be more creative storytellers in this example. How about not only talking about current sales values but also about how they compare to previous sales? Then, compare them to a prediction from a regression model.
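To make that idea concrete, here is a minimal sketch in Python. It assumes a simple straight-line trend fitted by least squares, which is only one of many possible regression models. The sales figures and the function names (`fit_line`, `compare_to_prediction`) are made up for illustration and do not come from any real data set.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def compare_to_prediction(history, actual):
    """Predict the next value from the trend in `history`.

    Returns (predicted, actual - predicted).
    """
    xs = list(range(len(history)))
    slope, intercept = fit_line(xs, history)
    predicted = slope * len(history) + intercept
    return predicted, actual - predicted


# Hypothetical daily sales in dollars; the last entry is "today".
daily_sales = [1000.0, 1040.0, 1010.0, 1080.0, 1100.0,
               1090.0, 1150.0, 1170.0, 1160.0, 1210.0]

today = daily_sales[-1]
yesterday = daily_sales[-2]

# Fit the trend on everything before today, then compare today against it.
predicted, gap = compare_to_prediction(daily_sales[:-1], today)

print(f"Today: {today:.0f}, change from yesterday: {today - yesterday:+.0f}")
print(f"Trend prediction: {predicted:.0f}, gap to prediction: {gap:+.0f}")
```

The story now has three parts instead of one: where sales are, how they moved, and whether that movement is in line with the trend.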

Still further, bring in the production, inventory, and delivery data. That would show the supply chain and possible gaps. Can you think of how to use data to relay how the product lines travel from build to sale? If so, you are now a storyteller. If not, do not dread. It takes time and practice.

How can you do this effectively? It is not simple, but the pathway has been built already!

The Process

What I want to introduce are the steps that a data analyst typically goes through. Fortunately, there are tested processes that one can learn about to help achieve victory. To keep things simple, let us talk about the Cross-Industry Standard Process for Data Mining, also known as CRISP-DM.

This process, or methodology, can be listed in the following steps:

  1. Business Understanding
  2. Data Understanding
  3. Data Preparation
  4. Modeling
  5. Evaluation
  6. Deployment

Even though the process can be approached linearly, there are points where it loops back to previous steps. This is meant to avoid the scenario where you go all the way to the end only to discover an issue several steps back. That could overwhelm you with rework. You will see what I mean by following along with this process diagram [1].

CRISP-DM Process Flow [1]

As you can tell, there is more to this than a step-by-step process. If you discover mid-cycle that something is not quite right, then you can take a step back and think it out again. This process is meant to help you. Mastery does require knowledge as well as experience gathered over years (often of trial and failure).
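One way to picture the step-back idea is as a small table of allowed moves between phases. The sketch below is an assumption based on my reading of the arrows in the commonly published CRISP-DM diagram, not an official specification, and the names `ALLOWED_MOVES`, `can_move`, and `is_step_back` are invented for this example.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Assumed transitions, following the arrows in the usual CRISP-DM diagram.
ALLOWED_MOVES = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING,
                               Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.MODELING},
    Phase.MODELING: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    Phase.EVALUATION: {Phase.BUSINESS_UNDERSTANDING, Phase.DEPLOYMENT},
    Phase.DEPLOYMENT: set(),
}


def can_move(current, target):
    """True if the assumed diagram has an arrow from current to target."""
    return target in ALLOWED_MOVES[current]


def is_step_back(current, target):
    """True if the move is allowed and returns to an earlier phase."""
    return can_move(current, target) and target.value < current.value


# List, for each phase, the earlier phases it can return to.
for phase in Phase:
    backs = [t.name for t in Phase if is_step_back(phase, t)]
    print(phase.name, "can step back to:", backs or "nothing")
```

In practice nothing stops you from revisiting any earlier phase. The table only mirrors the loops that the diagram draws explicitly.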

Why?

The question you do need to ask is why a process like CRISP-DM is needed. Could you just “wing it” and figure it out as you go? The answer to the “why” comes the day your boss hands you the work along with deadlines. Remember that trial and failure comment?

Think about how to reduce the failure part so that you do not have to redo steps. If you do not have any direction, then you will make more mistakes and take more wrong turns in the maze than you would like. Once you learn the process and see the benefits, it is up to you and your experience to come up with your own pathways.

I am merely introducing the topic to you. My hope is that you go out and do your own searching from here. There are some links below that should get you started on your journey. Do not forget, you are the storyteller, your mind is the stage, and the data are your actors.

References

[1] IBM. CRISP-DM Help Overview. March 8, 2021. Retrieved from: https://www.ibm.com/docs/en/spss-modeler/18.2.0?topic=dm-crisp-help-overview

[2] Rodrigues, Israel. Towards Data Science. CRISP-DM methodology leader in data mining and big data. Feb 17, 2020. Retrieved from: https://towardsdatascience.com/crisp-dm-methodology-leader-in-data-mining-and-big-data-467efd3d3781

[3] Wikipedia. Cross-industry standard process for data mining. Retrieved from: https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining

[4] Luna, Zipporah. Analytics Vidhya. Understanding CRISP-DM and its importance in Data Science projects. July 21, 2021. Retrieved from: https://medium.com/analytics-vidhya/understanding-crisp-dm-and-its-importance-in-data-science-projects-91c8742c9f9b
