Data Mining - Part 2
This law was first stated by David Watkins. We might expect that a proportion of data mining projects would fail because the patterns needed to solve the business problem are not present in the data, but this does not accord with the experience of practising data miners.
Previous explanations have suggested that this is because:
There is always something interesting to be found in a business-relevant dataset, so that even if the expected patterns were not found, something else useful would be found (this does accord with data miners’ experience), and
A data mining project would not be undertaken unless business experts expected that patterns would be present, and it should not be surprising that the experts are usually right.
However, Watkins formulated this in a simpler and more direct way: “There are always patterns.”, and this accords more accurately with the experience of data miners than either of the previous explanations. Watkins later amended this to mean that in data mining projects about customer relationships, there are always patterns connecting customers’ previous behaviour with their future behaviour, and that these patterns can be used profitably (“Watkins’ CRM Law”). However, data miners’ experience is that this is not limited to CRM problems – there are always patterns in any data mining problem (“Watkins’ General Law”).
The explanation of Watkins’ General Law is as follows:
· The business objective of a data mining project defines the domain of interest, and this is reflected in the data mining goal.
· Data relevant to the business objective and consequent data mining goal is generated by processes within the domain.
· These processes are governed by rules, and the data that is generated by the processes reflects those rules.
· In these terms, the purpose of the data mining process is to reveal the domain rules by combining pattern-discovery technology (data mining algorithms) with the business knowledge required to interpret the results of the algorithms in terms of the domain.
· Data mining requires relevant data, that is data generated by the domain processes in question, which inevitably holds patterns from the rules which govern these processes.
To summarise this argument: there are always patterns because they are an inevitable by-product of the processes which produce the data. To find the patterns, start from the process or what you know of it – the business knowledge.
Discovery of these patterns also forms an iterative process with business knowledge; the patterns contribute to business knowledge, and business knowledge is the key component required to interpret the patterns. In this iterative process, data mining algorithms simply link business knowledge to patterns which cannot be observed with the naked eye.
If this explanation is correct, then Watkins’ law is entirely general. There will always be patterns for every data mining problem in every domain unless there is no relevant data; this is guaranteed by the definition of relevance.
Data mining amplifies perception in the business domain
How does data mining produce insight? This law approaches the heart of data mining – why it must be a business process and not a technical one. Business problems are solved by people, not by algorithms. The data miner and the business expert “see” the solution to a problem, that is the patterns in the domain that allow the business objective to be achieved. Thus data mining is, or assists as part of, a perceptual process. Data mining algorithms reveal patterns that are not normally visible to human perception. The data mining process integrates these algorithms with the normal human perceptual process, which is active in nature. Within the data mining process, the human problem solver interprets the results of data mining algorithms and integrates them into their business understanding, and thence into a business process.
This is similar to the concept of an “intelligence amplifier”. Early in the field of Artificial Intelligence, it was suggested that the first practical outcomes from AI would be not intelligent machines, but rather tools which acted as “intelligence amplifiers”, assisting human users by boosting their mental capacities and therefore their effective intelligence. Data mining provides a kind of intelligence amplifier, helping business experts to solve business problems in a way which they could not achieve unaided.
In summary: Data mining algorithms provide a capability to detect patterns beyond normal human capabilities. The data mining process allows data miners and business experts to integrate this capability into their own problem solving and into business processes.
Prediction increases information locally by generalisation
The term “prediction” has become the accepted description of what data mining models do – we talk about “predictive models” and “predictive analytics”. This is because some of the most popular data mining models are often used to “predict the most likely outcome” (as well as indicating how likely the outcome may be). This is the typical use of classification and regression models in data mining solutions.
However, other kinds of data mining models, such as clustering and association models, are also characterised as “predictive”; this is a much looser sense of the term. A clustering model might be described as “predicting” the group into which an individual falls, and an association model might be described as “predicting” one or more attributes on the basis of those that are known.
Similarly we might analyse the use of the term “predict” in different domains: a classification model might be said to predict customer behaviour – more properly we might say that it predicts which customers should be targeted in a certain way, even though not all the targeted individuals will behave in the “predicted” manner. A fraud detection model might be said to predict whether individual transactions should be treated as high-risk, even though not all those so treated are in fact cases of fraud.
These broad uses of the term “prediction” have led to the term “predictive analytics” as an umbrella term for data mining and the application of its results in business solutions. But we should remain aware that this is not the ordinary everyday meaning of “prediction” – we cannot expect to predict the behaviour of a specific individual, or the outcome of a specific fraud investigation.
What, then, is “prediction” in this sense? What do classification, regression, clustering and association algorithms and their resultant models have in common? The answer lies in “scoring”, that is the application of a predictive model to a new example. The model produces a prediction, or score, which is a new piece of information about the example. The available information about the example in question has been increased, locally, on the basis of the patterns found by the algorithm and embodied in the model, that is on the basis of generalisation or induction. It is important to remember that this new information is not “data”, in the sense of a “given”; it is information only in the statistical sense.
The value of data mining results is not determined by the accuracy or stability
of predictive models
Accuracy and stability are useful measures of how well a predictive model makes its predictions. Accuracy means how often the predictions are correct (where they are truly predictions) and stability means how much (or rather how little) the predictions would change if the data used to create the model were a different sample from the same population. Given the central role of the concept of prediction in data mining, the accuracy and stability of a predictive model might be expected to determine its value, but this is not the case.
The value of a predictive model arises in two ways:
The model’s predictions drive improved (more effective) action, and
The model delivers insight (new knowledge) which leads to improved strategy.
In the case of insight, accuracy is connected only loosely to the value of any new knowledge delivered. Some predictive capability may be necessary to convince us that the discovered patterns are real. However, a model which is incomprehensibly complex or totally opaque may be highly accurate in its predictions, yet deliver no useful insight, whereas a simpler and less accurate model may be much more useful for delivering insight.
The disconnect between accuracy and value in the case of improved action is less obvious, but still present, and can be highlighted by the question “Is the model predicting the right thing, and for the right reasons?” In other words, the value of a model derives as much from of its fit to the business problem as it does from its predictive accuracy. For example, a customer attrition model might make highly accurate predictions, yet make its predictions too late for the business to act on them effectively. Alternatively an accurate customer attrition model might drive effective action to retain customers, but only for the least profitable subset of customers. A high degree of accuracy does not enhance the value of these models when they have a poor fit to the business problem.
The same is true of model stability; although an interesting measure for predictive models, stability cannot be substituted for the ability of a model to provide business insight, or for its fit to the business problem. Neither can any other technical measure.
In summary, the value of a predictive model is not determined by any technical measure. Data miners should not focus on predictive accuracy, model stability, or any other technical metric for predictive models at the expense of business insight and business fit.
The patterns discovered by data mining do not last forever. This is well-known in many applications of data mining, but the universality of this property and the reasons for it are less widely appreciated.
In marketing and CRM applications of data mining, it is well-understood that patterns of customer behaviour are subject to change over time. Fashions change, markets and competition change, and the economy changes as a whole; for all these reasons, predictive models become out-of-date and should be refreshed regularly or when they cease to predict accurately.
The same is true in risk and fraud-related applications of data mining. Patterns of fraud change with a changing environment and because criminals change their behaviour in order to stay ahead of crime prevention efforts. Fraud detection applications must therefore be designed to detect new, unknown types of fraud, just as they must deal with old and familiar ones.
Some kinds of data mining might be thought to find patterns which will not change over time – for example in scientific applications of data mining, do we not discover unchanging universal laws? Perhaps surprisingly, the answer is that even these patterns should be expected to change.
The reason is that patterns are not simply regularities which exist in the world and are reflected in the data – these regularities may indeed be static in some domains. Rather, the patterns discovered by data mining are part of a perceptual process, an active process in which data mining mediates between the world as described by the data and the understanding of the observer or business expert. Because our understanding continually develops and grows, so we should expect the patterns also to change. Tomorrow’s data may look superficially similar, but it will have been collected by different means, for (perhaps subtly) different purposes, and have different semantics; the analysis process, because it is driven by business knowledge, will change as that knowledge changes. For all these reasons, the patterns will be different.
To express this briefly, all patterns are subject to change because they reflect not only a changing world but also our changing understanding.