HOW MUCH MONEY HAS BEEN INVESTED IN BIG DATA IN PARTICULAR
Intel CIO Kim Stevenson discusses how Intel IT is looking to leverage predictive analytics to deal with the sea of data out there, and how this is already creating new opportunities for the organization.
In one example of a successful big data implementation at Intel, Stevenson described a pilot program that used the heaps of information generated at Intel to identify which customers were more likely to purchase than others.
“We looked at that and we examined how our sales coverage model was against those customers, and we took our inside sales force and made calls based on what the predictive analytics said who the customers more likely to purchase were,” explained Stevenson. “In a short amount of time, we were able to cover customers that weren’t previously covered and generate millions of dollars in incremental revenue.”
Another example Stevenson gave was a retrospective analysis of a failed program that she said cost the silicon giant $700 million when the dust settled. Using the massive amounts of manufacturing data generated during the die process, Stevenson said Intel can now spot problems sooner by applying big data analytics to the debugging process.
When asked what advice she would give to others, Stevenson said she advises fellow CIOs to partner with their business units to identify where the hidden potential lies, let that become the guiding light for which problems to tackle – and then stay focused.
“There’s a lot of questions you can answer about any given business, but if you stay focused on a small set of business problems, then you’ll create some early wins and you’re able to grow based on your successful track record.”
Stevenson says Intel has set rules about how to operate predictive analytics in the company, which include small teams of roughly five people and problems that can be solved within six months. Ultimately, says Stevenson, these projects are tied to ROI, for which Intel set a target of $10 million for the first deployment. “That helps us narrow and prioritize the problems to higher value problems for the company.”
Stevenson also advises that enterprises start to build the skills that they need. “You need data scientists, visualization experts, data curators, and those types of skills – they’re rare today,” she comments. “It’s harder to learn the business knowledge that is needed to make the data into valuable information than to learn the IT technical skills,” she says, advising people to grow the skills internally. “It will be a focused, diligent progression of taking the people that understand your business process today and complementing them with the technical skills required in building big data management systems or predictive analytic models.”
Predictive analytics is now sexy in the business world. While predictive analytics has many benefits and can help organizations gain competitive advantage, the hype may be creating false expectations. There is a mistaken belief that all you need is new data-crunching technology, big data and some business analysts to find meaning in the data - and voilà - you can make predictions. This is a recipe for disaster.
An organization needs an information management strategy (including both internal and external data as well as both structured and unstructured data), a technology strategy and a data science strategy. The organization must invest in a team of data scientists to use sophisticated analytical techniques, machine learning and statistical algorithms for finding, accessing and crunching relevant data. The data science and business analytics team works with business leaders to design a strategy for using predictive information.
Organizations can hire data scientists in-house (difficult considering a lack of skilled business data science practitioners) or professional data scientists can be engaged on a time or fixed fee basis and be responsible for deploying, managing and scaling the data science and predictive analytics projects. A mixture of both internal and external data scientists may be optimal for ensuring objectivity and creativity. Hiring external data scientists offers the ability to quickly form a data science team and scale-up big data projects without the upfront CapEx of hiring data scientists in-house. Organizations can also scale down equally quickly and pay only for the data science services they use.
There are three types of data analysis: descriptive, predictive, and prescriptive.
While descriptive analytics (modern data warehouse / business intelligence systems) looks at data and analyzes past events for insight as to how to approach the future, predictive analytics (new data analytical platforms) uses data to determine the probable future outcome of an event or a likelihood of a situation occurring. Predictive analytics encompasses a variety of statistical techniques from modeling, machine learning, data mining and game theory that analyze current and historical facts to make predictions about future events.
In business, predictive models exploit patterns found in historical and transactional data to identify risks and opportunities. Models capture relationships among many factors to allow assessment of risk or potential associated with a particular set of conditions, guiding decision making for candidate transactions.
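To make the idea concrete, here is a minimal Python sketch (Python is discussed later in this piece) of a predictive model fitted on historical data and used to score new candidates. The feature names, numbers and the choice of scikit-learn's logistic regression are illustrative assumptions, not a description of any particular company's model.

```python
# Minimal sketch: fit a predictive model on invented historical records,
# then score new candidates. All names and numbers are illustrative.
import pandas as pd
from sklearn.linear_model import LogisticRegression

history = pd.DataFrame({
    "annual_spend_k":    [12, 0.3, 4.5, 80, 0.2, 22, 0.9, 61],  # $ thousands
    "support_tickets":   [1, 9, 3, 0, 12, 2, 7, 1],
    "years_as_customer": [5, 1, 3, 8, 1, 4, 2, 7],
    "purchased":         [1, 0, 0, 1, 0, 1, 0, 1],              # known outcome
})
features = ["annual_spend_k", "support_tickets", "years_as_customer"]

# Capture the relationship between a set of conditions and the outcome.
model = LogisticRegression().fit(history[features], history["purchased"])

# Score candidate customers: the predicted probability guides who to contact first.
candidates = pd.DataFrame({
    "annual_spend_k":    [7, 0.5],
    "support_tickets":   [2, 8],
    "years_as_customer": [4, 1],
})
print(model.predict_proba(candidates[features])[:, 1])
```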
Three basic cornerstones of predictive analytics are:
Predictive analytics can help across a wide range of business functions, as the following examples illustrate.
An example of using predictive analytics is optimizing customer relationship management systems: it can help an organization analyze all of its customer data, exposing patterns that predict customer behavior. Another example is an organization that offers multiple products; predictive analytics can analyze customers’ spending, usage and other behavior, enabling efficient cross-selling of additional products to current customers. This directly leads to higher profitability per customer and stronger customer relationships. Credit scoring uses predictive analytics to process a customer's credit history, loan application and customer data to rank-order individuals by their likelihood of making future credit payments on time. An example is the FICO score.
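The rank-ordering behind a credit score or a cross-sell campaign can be sketched in a few lines of pandas; the customer IDs and probabilities below are invented and would normally come from a fitted model such as the one above.

```python
# Minimal sketch: rank customers by a model's predicted probability and
# bucket them into score bands. The scores here are invented.
import pandas as pd

scored = pd.DataFrame({
    "customer_id":  ["C001", "C002", "C003", "C004", "C005"],
    "p_cross_sell": [0.82, 0.15, 0.64, 0.91, 0.37],  # from an upstream model
})

# Best prospects first, then bands analogous to score tiers.
scored = scored.sort_values("p_cross_sell", ascending=False)
scored["band"] = pd.qcut(scored["p_cross_sell"], q=[0, 0.5, 0.8, 1.0],
                         labels=["low", "medium", "high"])
print(scored)
```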
Other applications of predictive analytics include: decision support systems; collection analytics; cross-selling; customer retention; direct marketing; fraud detection; portfolio design and management; product design; economic forecasts; risk management; underwriting and others.
Analytical techniques include: regression techniques; linear regression models; discrete choice models; logistic regressions; multinomial logistic regressions; probit regressions; time series models; survival or duration analysis; classification and regression trees; and multivariate adaptive regression splines.
Machine learning techniques include: neural networks; radial basis functions; support vector machines; naïve Bayes models; k-nearest neighbour algorithms; and geospatial predictive modeling.
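As a hedged illustration of two of the techniques listed above, the sketch below cross-validates a naïve Bayes model and a k-nearest neighbours classifier on a synthetic dataset with scikit-learn; nothing here reflects real business data.

```python
# Minimal sketch: try two of the listed techniques on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

for name, clf in [("naive Bayes", GaussianNB()),
                  ("k-nearest neighbours", KNeighborsClassifier(n_neighbors=5))]:
    scores = cross_val_score(clf, X, y, cv=5)   # 5-fold cross-validation
    print(name, round(scores.mean(), 3))
```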
Most garden variety business analysts do not have the training or experience to apply these analytical and machine learning techniques or design and execute customized algorithms to find the valuable, actionable insights from the raw data. But most data scientists do have the training and experience to apply all or some of these sophisticated techniques. As a result, it is prudent to distinguish between data scientists and business analysts and create a team assigning proper roles to each to optimize the predictive analytics strategy.
In addition, for an organization to become data-driven and optimize predictive analytics for better decision making, a cultural "mind-set" shift needs to occur: from using descriptive analytics with current business intelligence systems to thinking prescriptively with data science and prescriptive analytics. Prescriptive analytics automatically synthesizes big data, mathematical sciences, business rules, and machine learning to make predictions and then suggests decision options to take advantage of the predictions.
Prescriptive analytics goes beyond predicting future outcomes by also suggesting actions to benefit from the predictions and showing the decision maker the implications of each decision option. Prescriptive analytics not only anticipates what will happen and when it will happen, but also why it will happen. Further, prescriptive analytics can suggest decision options on how to take advantage of a future opportunity or mitigate a future risk and illustrate the implication of each decision option. In practice, prescriptive analytics can continually and automatically process new data to improve prediction accuracy and provide better decision options.
Prescriptive analytics synergistically combines data, business rules, and mathematical models. The data inputs to prescriptive analytics may come from multiple sources, internal (inside the organization) and external (social media and other data sets). The data may be structured (transactional, numerical and categorical) as well as unstructured (text, images, audio and video data). Business rules define the business process and include constraints, preferences, policies, best practices, and boundaries. Mathematical models are techniques derived from mathematical sciences and related disciplines including applied statistics, machine learning, operations research, and natural language processing.
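A minimal sketch of that combination, assuming a churn model upstream and invented thresholds, values and actions: the predictions arrive as data, and the business rules decide which decision option to suggest.

```python
# Minimal sketch: turn predictions into suggested decision options by
# applying business rules. Thresholds, values and actions are invented.
import pandas as pd

customers = pd.DataFrame({
    "customer_id":  ["C001", "C002", "C003"],
    "p_churn":      [0.85, 0.40, 0.05],   # predicted by an upstream model
    "annual_value": [50000, 8000, 1200],
})

def recommend(row):
    # Business rules: only spend retention budget where the value justifies it.
    if row.p_churn > 0.7 and row.annual_value > 10000:
        return "assign account manager"
    if row.p_churn > 0.3:
        return "send retention offer"
    return "no action"

customers["recommendation"] = customers.apply(recommend, axis=1)
print(customers)
```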
Python is an increasingly popular object-oriented, interpreted and interactive programming language used for heavy-duty data analysis. Python is designed for ease of use, speed and readability, and is well suited to data-intensive applications. Python supports multiple programming paradigms, including object-oriented, imperative and functional programming styles. It features a fully dynamic type system and automatic memory management, similar to that of Scheme, Ruby, Perl and Tcl.
You can create customized data tools using Python that can handle large data sets efficiently - it lets you work more quickly and integrate your systems more effectively. You can get more done in less time using Python for manipulating, processing, cleaning, and crunching data.
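A small sketch of that manipulating-and-crunching workflow with pandas; the file name and column names are placeholders rather than a real dataset.

```python
# Minimal sketch: clean a transaction file and summarize it with pandas.
# "orders.csv" and its columns are placeholders.
import pandas as pd

orders = pd.read_csv("orders.csv", parse_dates=["order_date"])

# Clean: drop duplicates, fill missing amounts, normalize a text column.
orders = orders.drop_duplicates()
orders["amount"] = orders["amount"].fillna(0)
orders["region"] = orders["region"].str.strip().str.upper()

# Crunch: monthly revenue per region.
monthly = (orders
           .groupby([orders["order_date"].dt.to_period("M"), "region"])["amount"]
           .sum()
           .reset_index(name="revenue"))
print(monthly.head())
```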
Python allows an organization to build a framework that makes it easy to collect data from a myriad of data sources and model them. Instead of spending time writing database connector code, you can use a simple configuration and quickly get off the ground. Python also allows an organization to move code from development to production more quickly: the same code written as a prototype can often be promoted into production.
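One way such a configuration-driven framework might look, with hypothetical source names, paths and connection details:

```python
# Minimal sketch: describe data sources in a configuration, then load them
# uniformly. Source names, paths and the connection string are hypothetical.
import pandas as pd

SOURCES = {
    "sales":     {"kind": "csv",   "path": "data/sales.csv"},
    "customers": {"kind": "excel", "path": "data/customers.xlsx"},
    # "orders":  {"kind": "sql",   "query": "SELECT * FROM orders",
    #             "conn": "postgresql://user:password@host/dbname"},
}

def load(name):
    cfg = SOURCES[name]
    if cfg["kind"] == "csv":
        return pd.read_csv(cfg["path"])
    if cfg["kind"] == "excel":
        return pd.read_excel(cfg["path"])
    if cfg["kind"] == "sql":
        return pd.read_sql(cfg["query"], cfg["conn"])
    raise ValueError("unknown source kind: " + cfg["kind"])

frames = {name: load(name) for name in SOURCES}
```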
If you like the R language, Python libraries such as SciPy, IPython and Pandas provide much of the mathematical functionality typically found in R. While R offers more packages and visualization capabilities at this time, Python is catching up.
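For instance, a two-sample t-test and a grouped summary, roughly what t.test() and aggregate() give you in R, take only a few lines with SciPy and pandas; the numbers below are invented.

```python
# Minimal sketch: R-style statistics from SciPy and pandas on invented data.
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "group": ["A"] * 5 + ["B"] * 5,
    "value": [4.1, 3.9, 4.3, 4.0, 4.2, 5.1, 4.8, 5.3, 5.0, 4.9],
})

# Roughly t.test(value ~ group) in R.
a = df.loc[df.group == "A", "value"]
b = df.loc[df.group == "B", "value"]
print(stats.ttest_ind(a, b))

# Roughly a grouped summary() in R.
print(df.groupby("group")["value"].describe())
```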
Simply put, Python is easy to learn, platform neutral and cheap. Python is a tool to build other tools with, including data analysis tools. It draws on a broad mix of programming paradigms, styles and languages. Python runs on Windows, Linux/Unix, Mac OS X, and has been ported to the Java and .NET virtual machines.
Python is free to use, even for commercial products, because of its OSI-approved open source license. See: http://www.python.org/psf/license/
Pandas is a Python package for doing data transformation and statistical analysis. Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools. See: http://pandas.pydata.org/
While R is the most widely-used open source environment for statistical modeling and graphics, Pandas adopts some of the best concepts of R, like the foundational data.frame. Pandas has been described as "R data.frame on steroids". Pandas seeks to remedy some frustrations common to R users:
1. R has simple data alignment and indexing functionality, leaving much work to the user. Pandas makes it easy and intuitive to work with messy, irregularly indexed data - like time series data. Pandas also provides rich tools, like hierarchical indexing, not found in R (see the sketch after this list);
2. R is not well-suited to general purpose programming and system development. Pandas enables you to do large-scale data processing seamlessly when developing your production applications;
3. Hybrid systems connecting R to a low-productivity systems language like Java, C++, or C# suffer from significantly reduced agility and maintainability, and you’re still stuck developing the system components in a low-productivity language;
4. The "copyleft" GPL license of R can create concerns for commercial software vendors who want to distribute R with their software under another license. Python and Pandas use more permissive licenses.
Top Python Advantages
- Instant feedback from the interactive interpreter.
- Non-intrusive: You think about the problem, not the tool you are working with. After you learn Python, it gets out of the way.
- Libraries: Whatever you want to do, somebody has written code to help you get there.
- Community: The community is a great source of examples and ideas.
- The philosophy of one-best-way means that Python programmers all tend to do things in sort of the same way. This is a big advantage because it makes it easy to read other people's code - a great way to learn.
Top Python Disadvantages
- No single source of truth for best practices: the large number of packages relevant to a particular task can make it hard to identify the library best suited to your exact needs.
- Documentation is substandard: The Python official documentation is seldom the best way to learn a new library. The informal Python community provides the most useful examples. Yet sorting out the wheat from the chaff can be hit-or-miss.
- Concurrency: Python was designed without concurrency in mind and it shows; the global interpreter lock (GIL) limits CPU-bound multithreading (a common workaround is sketched below).
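A minimal sketch of the usual workaround for CPU-bound work: side-step the GIL by using processes instead of threads. The task here is a stand-in, not a real workload.

```python
# Minimal sketch: spread a CPU-bound stand-in task across processes,
# since the GIL prevents CPU-bound threads from running in parallel.
from multiprocessing import Pool

def crunch(n):
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        print(pool.map(crunch, [10_000, 20_000, 30_000, 40_000]))
```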
Increasing data volumes and data's high utility have led to an explosion of capabilities and possibilities in the past few years. While stalwart structures of our information, like the enterprise data warehouse, remain highly supported, it is widely acknowledged that no single solution will satisfy all enterprise data management needs.
Many are confused by the value of Hadoop, data warehouse appliances, and stream processing, with their seemingly conflicting value propositions for current information management infrastructure. Although storage is inexpensive by historical standards, the cost of keeping "all data for all time" is still escalating.
The key to making the correct data storage selection is understanding your workloads—current, projected, and envisioned. This section will organize and explore the major categories of information stores available and help you make the best choices to keep information as an unparalleled corporate asset.
No doubt the amount of data your company collects is growing. But what's the point of amassing all that information if you can't use it to drive your business forward? Smart businesses are giving people throughout their organizations access to deeper intelligence by marrying their big data and business intelligence efforts into a big data solution. The result is better decisions based on meaningful insights company wide. What's your strategy for big data analytics?