CRISP-DM and other approaches

Another popular framework for predictive analytics is the Cross-Industry Standard Process for Data Mining, most commonly known by its acronym, CRISP-DM, which is very similar to the process we just described. This methodology is described in Wirth, R. & Hipp, J. (2000), where the process is broken into six major phases, shown in the following diagram. The authors clarify that the sequence of the phases is not strict: the arrows indicate the most frequent dependencies between phases, but the actual path taken depends on the particularities of the project or the problem being solved. These are the phases of a predictive analytics project in this methodology:

  1. Business understanding
  2. Data understanding
  3. Data preparation
  4. Modeling
  5. Evaluation
  6. Deployment
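
To make the idea of a non-strict sequence concrete, the six phases can be sketched as a small directed graph in Python. This is only an illustration: the names Phase, ALLOWED, and is_valid_path are hypothetical, and the transitions listed are an assumption based on the commonly reproduced CRISP-DM diagram, not something prescribed by the methodology.

```python
from enum import Enum


class Phase(Enum):
    """The six CRISP-DM phases, numbered as in the list above."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Assumed transitions (illustrative only): most phases move forward,
# but some loop back, which is why the sequence is not strict.
ALLOWED = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING,
                               Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.MODELING},
    Phase.MODELING: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    Phase.EVALUATION: {Phase.BUSINESS_UNDERSTANDING, Phase.DEPLOYMENT},
    Phase.DEPLOYMENT: set(),
}


def is_valid_path(path):
    """Return True if every consecutive pair of phases is an allowed move."""
    return all(nxt in ALLOWED[cur] for cur, nxt in zip(path, path[1:]))


# A project that returns from modeling to data preparation before evaluating.
project = [
    Phase.BUSINESS_UNDERSTANDING,
    Phase.DATA_UNDERSTANDING,
    Phase.DATA_PREPARATION,
    Phase.MODELING,
    Phase.DATA_PREPARATION,
    Phase.MODELING,
    Phase.EVALUATION,
    Phase.DEPLOYMENT,
]
print(is_valid_path(project))
```

Representing the process as a graph rather than a fixed list makes it easy to express loops, such as returning to data preparation after a first modeling attempt; a real project may well follow paths this sketch does not allow.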

There are other ways to look at this process; for example, R. Peng (2016) describes it using the concept of Epicycles of Data Analysis. In his account, the epicycles are the following:

  1. Develop expectations
  2. Collect data
  3. Match expectations with the data
  4. State a question
  5. Exploratory data analysis
  6. Model building
  7. Interpretation
  8. Communication

The word epicycle conveys that these stages are interconnected and that they form part of a bigger wheel: the data analysis process as a whole.