For example, consider NFL football data. Focusing on large football game data sets is usually not helpful and is often misleading; it creates little, if any, value and is unlikely to improve decision-making. While you can find patterns and correlations in large data sets (which may provide some limited value), you will not understand why certain things appear to happen. Understanding the "why," that is, causality, is of high value and helps make better decisions for the future.
Narrowing the focus to a single data point (e.g., turnover takeaway/giveaway ratio, passing completion percentage, touchdown-to-interception ratio, average yards per pass, dropped passes, red zone scoring percentage, or total points scored and yielded) can improve meaning and understanding. Yet considering only a single data point can also create illusions and still not answer the question of "why" a certain thing is happening.
Focusing on a diversity of data points can lead to even greater understanding and meaning, and it often prevents an illusion of reality (single data points are often incomplete and do not tell the entire story).
Suppose, for example, that your quarterback is completing a low percentage of passes over a number of games. The goal is to improve the percentage of passes completed. To improve quarterback play, you need to know why the completion percentage is low, or at least what the possible causes are. Here, the potential causes are numerous (e.g., injury, receivers dropping good passes, bad offensive line play leading to time pressure, inability to read defensive coverages, or poor throwing mechanics). Failure to understand "why" likely leads to wrong or sub-optimal decisions.
Causality matters when choosing a remedy. If the receivers are dropping well-placed passes, then solutions aimed at the receivers are appropriate. If the quarterback fails to read the defense correctly, another remedy makes sense. If offensive line play is poor and disrupts timing, yet another fix should be attempted. Perhaps several causes are working in concert, which calls for a broader strategic solution.
Simply looking at big data (e.g., total offensive or defensive yards) will not provide the right information, and focusing only on the single data point of pass completion percentage will not provide the intelligence needed to improve that percentage. Only integrating and analyzing a variety of smaller, smart data points will provide the actionable knowledge to make the best possible decisions.
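To make the quarterback example concrete, here is a minimal sketch in Python of what integrating a few small data points might look like. Everything in it is an assumption made for illustration: the GameLog fields, the two sample games, and the thresholds are hypothetical, not real NFL data or established benchmarks. It shows how the same raw completion percentage can be read differently once dropped passes and pass-rush pressure are placed next to it.

```python
from dataclasses import dataclass


@dataclass
class GameLog:
    """One game's passing data (hypothetical fields, for illustration only)."""
    attempts: int
    completions: int
    dropped_catchable: int    # catchable passes the receivers dropped
    dropbacks: int
    pressured_dropbacks: int  # dropbacks on which the quarterback was pressured


def summarize(games):
    """Combine several small data points across games into one summary."""
    attempts = sum(g.attempts for g in games)
    completions = sum(g.completions for g in games)
    drops = sum(g.dropped_catchable for g in games)
    dropbacks = sum(g.dropbacks for g in games)
    pressured = sum(g.pressured_dropbacks for g in games)
    return {
        "completion_pct": completions / attempts,
        # What the completion rate would be if catchable balls were caught.
        "drop_adjusted_pct": (completions + drops) / attempts,
        "drop_rate": drops / attempts,
        "pressure_rate": pressured / dropbacks,
    }


def candidate_causes(summary, drop_threshold=0.08, pressure_threshold=0.35):
    """Flag possible causes to investigate. Thresholds are arbitrary assumptions."""
    causes = []
    if summary["drop_rate"] > drop_threshold:
        causes.append("receiver drops")
    if summary["pressure_rate"] > pressure_threshold:
        causes.append("pass protection")
    if not causes:
        causes.append("look elsewhere (reads, mechanics, injury)")
    return causes


# Made-up sample games, not real statistics.
games = [
    GameLog(attempts=30, completions=15, dropped_catchable=4,
            dropbacks=34, pressured_dropbacks=8),
    GameLog(attempts=40, completions=21, dropped_catchable=5,
            dropbacks=44, pressured_dropbacks=10),
]

summary = summarize(games)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
print("candidate causes:", candidate_causes(summary))
```

A flag from a sketch like this is only a candidate cause to investigate (for instance, through film review), not proof of causality. The point is that several small, related data points narrow the search in a way that completion percentage alone cannot.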
The goal of data science is to consider multiple scenarios, create meaning from data, and provide decision-makers with high-value information to make the best possible decisions. Data diversity and integration trump big data in creating valuable, actionable intelligence.