Choosing a Big Data Technology Stack for Digital Marketing
Different techniques for evaluating, planning or designing a technology stack for digital marketing.
Recent surveys suggest the number one investment area for both private and public organizations is the design and building of a modern data warehouse (DW) / business intelligence (BI) / data analytics architecture that provides a flexible, multi-faceted analytical ecosystem. The goal is to leverage both internal and external data to obtain valuable, actionable insights that allow the organization to make better decisions.
Unfortunately, the volume of recent DW / BI / data analytics innovation, themes and paths is causing confusion. The "Big Data" and "Hadoop" hype is leading many organizations to roll out Hadoop / MapReduce systems and dump data into them without a big-picture information management strategic plan, or an understanding of how all the pieces of a data analytics ecosystem fit together to optimize decision-making capabilities.
This has resulted in the creation of a new word: Hadump - meaning data dumped into Hadoop with no plan. There are two schools of thought about data collection and storage strategy:
1) Start a big data analytics project with a specific use case or problem to solve
2) Start dumping data to store and analyze later
We strongly suggest using both strategies. The first is short term, for quick results; the second is for long-term value.
Consider that only about 30% of all collected data will be valuable. The problem is that you do not know in advance which 30% that will be. Thus, it is prudent to collect and store all data: structured and unstructured as well as internal and external.
The cost of collecting and storing the data - and of data analytics technology - has fallen significantly and will continue to fall.
The cost of analyzing the data for valuable, actionable insights is very high. While machine learning and automation will reduce that cost in the future, the formula of cheap, abundant data and expensive data science and business analytics will likely remain for some time.
Thus, start a data analytics project to solve a specific problem or to take advantage of an opportunity to demonstrate value. Yet understand the long term value of saving any and all data for future analysis - as the specific use case arises.
More importantly, it is crucial to spend time and resources to develop both an information management strategic plan and decision-optimizing processes. Data science knowledge and business processes detailing the collection, storage, analysis and distribution of data are the magic sauce that orchestrates the data tech ingredients.
A traditional BI architecture has analytical processing first pass through a data warehouse.
In the new, modern BI architecture, data reaches users through a multiplicity of organization data structures, each tailored to the type of content it contains and the type of user who wants to consume it.
The data revolution (big and small data sets) provides significant improvements. New tools like Hadoop allow organizations to cost-effectively consume and analyze large volumes of semi-structured data. In addition, it complements traditional top-down data delivery methods with more flexible, bottom-up approaches that promote predictive or exploratory analytics and rapid application development.
In the above diagram, the objects in blue represent traditional data architecture. Objects in pink represent the new modern BI architecture, which includes Hadoop, NoSQL databases, high-performance analytical engines (e.g. analytical appliances, MPP databases, in-memory databases), and interactive, in-memory visualization tools.
Most source data now flows through Hadoop, which primarily acts as a staging area and online archive. This is especially true for semi-structured data, such as log files and machine-generated data, but also for some structured data that cannot be cost-effectively stored and processed in SQL engines (e.g. call center records).
From Hadoop, data is fed into a data warehousing hub, which often distributes data to downstream systems, such as data marts, operational data stores, and analytical sandboxes of various types, where users can query the data using familiar SQL-based reporting and analysis tools.
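The flow described above (raw data lands in a staging area, is loaded into a warehouse, and is then queried with SQL) can be sketched in a few lines. This is a minimal illustration only: an in-memory list stands in for Hadoop staging, SQLite stands in for the data warehousing hub, and the event fields (`ts`, `channel`, `user`, `action`) and function names are hypothetical, not taken from any particular product.

```python
import json
import sqlite3

# Hypothetical raw events, standing in for files landed in a staging area.
RAW_EVENTS = [
    '{"ts": "2013-01-05T10:00:00", "channel": "email", "user": "u1", "action": "click"}',
    '{"ts": "2013-01-05T10:01:00", "channel": "social", "user": "u2", "action": "view"}',
    '{"ts": "2013-01-05T10:02:00", "channel": "email", "user": "u3", "action": "click"}',
    "corrupted line that stays in staging but is skipped on load",
]


def parse_events(lines):
    """Yield (ts, channel, user, action) tuples; skip lines that fail to parse."""
    for line in lines:
        try:
            rec = json.loads(line)
            yield rec["ts"], rec["channel"], rec["user"], rec["action"]
        except (json.JSONDecodeError, KeyError, TypeError):
            continue  # the raw line is still available in staging for later analysis


def load_warehouse(lines):
    """Load parsed events into an in-memory SQLite table (the 'warehouse' stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (ts TEXT, channel TEXT, user_id TEXT, action TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", parse_events(lines))
    conn.commit()
    return conn


def clicks_by_channel(conn):
    """Downstream users query the structured copy with ordinary SQL."""
    rows = conn.execute(
        "SELECT channel, COUNT(*) FROM events WHERE action = 'click' "
        "GROUP BY channel ORDER BY channel"
    )
    return rows.fetchall()


if __name__ == "__main__":
    warehouse = load_warehouse(RAW_EVENTS)
    print(clicks_by_channel(warehouse))
```

The unparseable line is not discarded from staging, which mirrors the idea of keeping raw data for future analysis while only the cleaned, structured subset reaches SQL users.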
Today, data scientists analyze raw data inside Hadoop by writing MapReduce programs in Java and other languages. In the future, users will be able to query and process Hadoop data using familiar SQL-based data integration and query tools.
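To make the MapReduce style concrete, here is a minimal single-process sketch of the map, shuffle and reduce phases in plain Python. It does not use Hadoop or any Hadoop API; the log format (`<ip> <path> <status>`) and all function names are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical web log lines in the form "<ip> <path> <status>".
LOG_LINES = [
    "10.0.0.1 /landing 200",
    "10.0.0.2 /landing 200",
    "10.0.0.1 /pricing 200",
    "10.0.0.3 /landing 404",
]


def map_phase(line):
    """Emit (path, 1) for each successful request."""
    ip, path, status = line.split()
    if status == "200":
        yield path, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence and return {path: hits}."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(LOG_LINES))
```

In a real cluster the map and reduce functions run in parallel on many machines; the equivalent SQL would be a single `GROUP BY` query, which is why SQL-based access to the same data is attractive to analysts.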
The modern BI architecture can analyze large volumes and new sources of data and is a significantly better platform for data alignment, consistency and flexible predictive analytics.
Thus, the new BI architecture provides a modern analytical ecosystem featuring both top-down and bottom-up data flows that meet all requirements for reporting and analysis.
Data science and business analytics work with both structured and unstructured data. Yet the future belongs to unstructured or semi-structured data from both internal and external sources.
Figure: Total Enterprise Data Growth 2005-2015
IDC estimates the volume of digital data will grow 40% to 50% per year. By 2020, IDC predicts the number will have reached 40,000 EB, or 40 Zettabytes (ZB). The world’s information is doubling every two years. By 2020 the world will generate 50 times the amount of information and 75 times the number of information containers.
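The growth rate and the doubling claim can be checked against each other with compound-growth arithmetic. The sketch below assumes steady annual compounding, which is a simplification; the function names are illustrative.

```python
import math


def doubling_time_years(annual_growth):
    """Years needed to double at a constant annual growth rate (0.40 means 40%)."""
    return math.log(2) / math.log(1 + annual_growth)


def growth_factor(annual_growth, years):
    """Total multiplication of volume after compounding for the given number of years."""
    return (1 + annual_growth) ** years


if __name__ == "__main__":
    for rate in (0.40, 0.50):
        print(f"{rate:.0%} per year doubles in {doubling_time_years(rate):.2f} years")
```

Under this assumption, 40% annual growth doubles the volume in about 2.06 years and 50% in about 1.71 years, so the 40% to 50% range is consistent with the statement that information doubles roughly every two years.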
The massive growth of unstructured or semi-structured data has implications for data warehouse / business intelligence / data analytics architecture and database design. The way we capture, store, analyze, and distribute data is transforming. Technologies such as deduplication, compression, and new analysis tools are lowering costs.
Structured data gives names to each field in a database and defines the relationships between the fields. Unstructured data has no such predefined data model and is usually not stored in a relational database (as traditionally defined), where the data model carries the meaning of the data.
The Internet of Things (equipping all objects in the world with identifying devices), blogs, videos, social media, emails, notes from call centers, and all forms of human and computer-to-computer communication will soon start to produce massive amounts of unstructured or semi-structured data.
The trick is to create value by extracting the right information from both internal and external data sources. That is what data science and the art of business analytics need to learn to do with larger and larger sets of unstructured data.
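A small contrast may help. In the sketch below, the structured side uses named fields and a declared relationship between two tables, so a question is answered with a query; the unstructured side is a free-text call center note, where the same kind of information has to be extracted with ad hoc parsing. The table names, the sample note and the regular expression are all hypothetical.

```python
import re
import sqlite3


def build_structured_store():
    """Create two related tables: named fields plus an explicit relationship."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, "
        "customer_id INTEGER REFERENCES customers(id), "
        "total REAL)"
    )
    conn.execute("INSERT INTO customers VALUES (1, 'Ana')")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)", [(17, 1, 40.0), (18, 1, 60.0)]
    )
    return conn


def total_spend(conn, name):
    """The schema tells us exactly how to join orders to customers."""
    row = conn.execute(
        "SELECT SUM(o.total) FROM orders o "
        "JOIN customers c ON c.id = o.customer_id WHERE c.name = ?",
        (name,),
    ).fetchone()
    return row[0]


# Hypothetical free-text note: no field names, no declared relationships.
CALL_NOTE = "Ana called about order 17 and order 18; second one arrived late."


def mentioned_order_ids(note):
    """Pull order numbers out of free text with a simple pattern (easily fooled)."""
    return [int(m) for m in re.findall(r"order\s+(\d+)", note, flags=re.IGNORECASE)]


if __name__ == "__main__":
    store = build_structured_store()
    print(total_spend(store, "Ana"))
    print(mentioned_order_ids(CALL_NOTE))
```

The pattern-matching step is the expensive part in practice: it encodes guesses about how people write, which is one reason analysis costs stay high even as storage gets cheaper.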
See also: http://bit.ly/Sp0IWW
It has always been the dream of mankind to have the power to predict the future. Today, Big Data analytics are proving that dream is within our grasp. By parsing massive quantities of data on economics, climate, social or natural phenomena, we are now able to model how those forces are likely to behave in the future, creating efficiencies in business... and often saving lives.
Technology and big data are disruptive game-changers. In the era of big data, new markets, new services and unprecedented access are emerging.
“Process power and data storage are becoming almost free; networks and the cloud will provide global access and pervasive services; social media and cybersecurity will be large new markets,” states the “Global Trends 2030: Alternative Worlds” report, which was released this week by the Office of the Director of National Intelligence. “This growth and diffusion will present significant challenges for governments and societies, which must find ways to capture the benefits of new IT technologies while dealing with the new threats that those technologies present.”
“Because social networking technologies are becoming the fabric of online existence, they could become an important tool for providing corporations and governments with valuable information about individuals and groups, facilitating development of robust human social predictive models that can have applications ranging from targeted advertising to counterterrorism,” the report states. “Social networks could also displace services that existing corporations and government agencies now provide, substituting instead new classes of services that are inherently resistant to centralized oversight and control. For example, social networks could help drive the use of alternative and virtual monetary currencies.”