The curse of big data is described by Vincent Granville here. Put simply: the larger the data set, the more "statistically significant" relationships you will find in it. "Statistically significant" refers to a statistical assessment of whether observations reflect a genuine pattern rather than mere chance; a statistically significant result may or may not be meaningful. As data sets grow, an increasing share of these "statistically significant" relationships carry no meaning at all - creating ever greater opportunity to mistake noise for signal. Here "signal" means a meaningful, scientifically grounded interpretation of data that can be transformed into scientific evidence and knowledge. "Noise" means a competing interpretation of data not grounded in science, which cannot be considered scientific evidence - though noise may sometimes be manipulated into a form of knowledge (what does not work).
So big data produces more correlations and patterns - yet far more of them are noise than signal, and the number of false positives rises sharply. In other words, more correlation without causation, leading to an illusion of reality.
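This effect is easy to demonstrate on pure noise. The sketch below (my illustrative simulation, not from the original article) generates independent random series - so every correlation among them is spurious by construction - and counts how many pairs clear a rough |r| > 0.2 cutoff (approximately p < 0.05 at 100 observations). The count of false positives grows with the number of variables:

```python
import random
from itertools import combinations

def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

random.seed(42)
n_obs = 100
threshold = 0.2  # roughly the |r| cutoff for p < 0.05 at n = 100

counts = {}
for n_vars in (10, 50, 100):
    # n_vars independent noise series: any "significant" pair is a false positive
    series = [[random.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    counts[n_vars] = sum(
        1 for a, b in combinations(series, 2) if abs(pearson(a, b)) > threshold
    )
    print(f"{n_vars} variables -> {counts[n_vars]} 'significant' correlations")
```

Because the number of pairwise comparisons grows quadratically with the number of variables, even a fixed ~5% false-positive rate produces an exploding absolute number of spurious "discoveries" as the data set widens.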
"Correlation" means any of a broad class of statistical relationships involving dependence. "Spurious correlation" means a correlation between two variables that results not from any direct relation between them but from their mutual relation to other variables. "Causation" means the relationship between cause and effect backed by scientific evidence: a relationship between one event (the cause) and a second event (the effect), where the second event is understood as a consequence of the first. "Correlation does not imply causation" is a phrase used in science and statistics to emphasize that a correlation between two variables does not necessarily mean that one causes the other.
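To make "spurious correlation" concrete, here is a minimal synthetic example (my illustration, not Granville's): two variables that never influence each other, but are both driven by a hidden third variable, end up strongly correlated anyway.

```python
import random

def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

random.seed(1)
n = 1000
z = [random.gauss(0, 1) for _ in range(n)]     # hidden common cause (confounder)
x = [zi + random.gauss(0, 0.5) for zi in z]    # x depends only on z, never on y
y = [zi + random.gauss(0, 0.5) for zi in z]    # y depends only on z, never on x

# x and y never interact, yet they correlate strongly through z
r = pearson(x, y)
print(f"corr(x, y) = {r:.2f}")
```

Finding the high correlation between x and y is easy; without knowing about z, concluding that x causes y (or vice versa) would be exactly the mistake the phrase warns against.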
Yet evolution has hardwired humans to see patterns. Pattern-seeking was a necessary quality for survival in the jungle, but it ill serves us in many forms of abstract thinking - especially when we read meaning into randomness in data. Put another way: we mistake noise for signal.
Big data makes it ever harder to find the needle (actionable, valuable insights) in an ever larger haystack. The danger is that we will increasingly be tricked by the randomness in big data and, believing noise to be signal, make bad decisions as a result.
I suggest that one good strategy for coping with the "curse of big data" - in many (but not all) cases - is to intentionally and purposefully break large data sets down into smaller ones. Creating smaller data sets from big data should be done strategically, not randomly: it is easier to analyze and test small data sets, differentiate signal from noise, and extract meaning.
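One way to sketch this idea of testing smaller data sets (a hypothetical illustration, not a procedure from the article) is split-and-replicate: hunt for "significant" correlations in one half of the data, then check whether they survive in the held-out half. On pure noise, almost none do.

```python
import random
from itertools import combinations

def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

random.seed(7)
n_obs, n_vars, threshold = 200, 60, 0.2  # |r| > 0.2 ~ p < 0.05 at 100 observations
series = [[random.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
half = n_obs // 2

# Step 1: "discover" significant pairs in the first half of the data
discovered = [(i, j) for i, j in combinations(range(n_vars), 2)
              if abs(pearson(series[i][:half], series[j][:half])) > threshold]

# Step 2: test whether each discovery replicates in the held-out half
replicated = [(i, j) for i, j in discovered
              if abs(pearson(series[i][half:], series[j][half:])) > threshold]

print(f"discovered: {len(discovered)}, replicated: {len(replicated)}")
```

Since the series here are independent noise, every "discovery" is spurious, and the held-out half filters nearly all of them out - a small, deliberately chosen data set used as a test bed does the work that the big data set alone cannot.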
Beware of the curse of big data and avoid mistaking noise for signal. Small data is indeed very beautiful.