IBM is working to offer Watson as a smartphone-based assistant for businesses, similar to Apple's Siri. It would be a voice-activated smartphone app that answers questions. For example, it could help a farmer in a field decide the optimal time to plant.
The goal is to provide businesses ready access to a powerful question-answering engine, backed by a broad knowledge base, at a reasonable price.
Watson is an artificial intelligence computer system capable of answering questions posed in natural language. IBM describes it as "an application of advanced Natural Language Processing, Information Retrieval, Knowledge Representation and Reasoning, and Machine Learning technologies to the field of open domain question answering" which is "built on IBM's DeepQA technology for hypothesis generation, massive evidence gathering, analysis, and scoring."
Each IBM Watson installation is a 10-rack supercomputer with a total of 2,880 processor threads (90 Power7 CPUs clocked at 3.5GHz, each with eight cores, and each core with four threads). There is 16TB of RAM, and the system is massively parallel: it can process 500 gigabytes of data per second. Watson runs IBM's DeepQA software, which pores through millions of books and documents (dictionaries, encyclopedias, research papers, news articles) and then uses that data to answer questions with remarkable speed and accuracy.
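The DeepQA approach can be pictured as a pipeline: generate candidate answers (hypotheses), gather evidence for each, score that evidence, and rank the candidates by confidence. The toy sketch below illustrates only the shape of that pipeline; the corpus, the keyword-overlap scorer, and the sample question are hypothetical stand-ins, not IBM's actual algorithms.

```python
# A minimal, illustrative sketch of a DeepQA-style pipeline: generate candidate
# answers (hypotheses), gather supporting evidence, score it, and rank results.
# The corpus and scoring function are hypothetical stand-ins; the real DeepQA
# system uses hundreds of NLP and machine-learning components.

from collections import Counter

CORPUS = {
    "Toronto": "Toronto is the largest city in Canada and the capital of Ontario.",
    "Chicago": "Chicago sits on Lake Michigan; O'Hare and Midway are its airports.",
}

def generate_hypotheses(question):
    # Hypothesis generation: every corpus entry is a candidate answer.
    return list(CORPUS)

def score_evidence(question, candidate):
    # Evidence scoring: crude keyword overlap between question and passage.
    q_terms = Counter(question.lower().split())
    passage_terms = Counter(CORPUS[candidate].lower().split())
    overlap = sum((q_terms & passage_terms).values())
    return overlap / max(len(q_terms), 1)

def answer(question):
    # Rank candidates by confidence, highest first.
    candidates = generate_hypotheses(question)
    return sorted(((c, score_evidence(question, c)) for c in candidates),
                  key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    print(answer("Which US city has two airports named O'Hare and Midway?"))
```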
Watson High-level Architecture
IBM first has to turn Watson into an energy-efficient service whose answers can be delivered to a smartphone or tablet. The greatest challenge is to figure out how to price and deliver Watson as a handheld product.
In September 2011, IBM and WellPoint, a major health insurer, announced a partnership to use Watson's data-crunching capability to help suggest treatment options and diagnoses to doctors. Just as Watson analyzed massive amounts of data on Jeopardy! to reach a set of hypotheses and list the most likely answers, it could help doctors diagnose patients. Watson could analyze a patient's specific symptoms, medical history, and hereditary history, and synthesize that data with available structured and unstructured medical information, including published medical books and articles. IBM has made it clear that Watson is not intended to replace doctors, but to help them avoid medical errors and sharpen diagnoses with the help of its advanced analytics technology.
IBM intends to use Watson in other information-intensive fields as well, such as telecommunications, financial services, and government.
IBM Watson: The Science Behind an Answer
A data model is a plan for building a database. To use a common analogy, the data model is equivalent to an architect's building plans.
Data modeling is a process used to define and analyze the data requirements needed to support the business processes within the scope of corresponding information systems in organizations. To be effective, a data model must be simple enough to communicate the required data structure to the end user, yet detailed enough for the database designer to use in creating the physical structure.
A data model is a conceptual representation of the data structures that are required by a database. The data structures include the data objects, the associations between data objects, and the rules which govern operations on the objects. As the name implies, the data model focuses on what data is required and how it should be organized rather than what operations will be performed on the data.
Data modeling is the formalization and documentation of existing processes and events that occur during application software design and development. Data modeling techniques and tools capture and translate complex system designs into easily understood representations of the data flows and processes, creating a blueprint for construction and/or re-engineering.
A data model can be thought of as a diagram or flowchart that illustrates the relationships between data. Although capturing all the possible relationships in a data model can be very time-intensive, it is an important step and shouldn't be rushed. Well-documented models allow stakeholders to identify errors and make changes before any programming code has been written.
Data modeling is also used as a technique for detailing business requirements for specific databases. It is sometimes called database modeling because a data model is eventually implemented in a database.
There are three different types of data models produced while progressing from requirements to the actual database to be used for the information system:
1) Conceptual data models. These models, sometimes called domain models, are typically used to explore domain concepts with project stakeholders. On Agile teams, high-level conceptual models are often created as part of the initial requirements-envisioning effort, since they are used to explore the high-level static business structures and concepts. On traditional teams, conceptual data models are often created as a precursor to LDMs or as an alternative to LDMs.
2) Logical data models (LDMs). LDMs are used to explore the concepts of your problem domain and the relationships between them. This could be done for the scope of a single project or for your entire enterprise. LDMs depict the logical entity types (typically referred to simply as entity types), the data attributes describing those entities, and the relationships between the entities. LDMs are rarely used on Agile projects, although they often are on traditional projects (where they rarely seem to add much value in practice).
3) Physical data models (PDMs). PDMs are used to design the internal schema of a database, depicting the data tables, the data columns of those tables, and the relationships between the tables. PDMs often prove to be useful on both Agile and traditional projects and as a result the focus of this article is on physical modeling.
Although LDMs and PDMs sound very similar, and in fact they are, the level of detail that they model can be significantly different. This is because the goals of the two diagrams are different: you use an LDM to explore domain concepts with your stakeholders and a PDM to define your database design.
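To make the logical-versus-physical distinction concrete, the sketch below shows how a logical entity type such as "Customer places Orders" might be carried down into a physical data model. The table names, columns, and relationship are hypothetical, and the example assumes SQLAlchemy 1.4+ is installed; it illustrates the idea rather than prescribing a design.

```python
# A hypothetical physical data model (tables, columns, keys) derived from the
# logical entities Customer and Order. Assumes SQLAlchemy 1.4+ (pip install sqlalchemy).

from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class Customer(Base):
    __tablename__ = "customer"            # physical table for the Customer entity
    customer_id = Column(Integer, primary_key=True)
    name = Column(String(100), nullable=False)
    orders = relationship("Order", back_populates="customer")

class Order(Base):
    __tablename__ = "order_header"        # avoids the reserved word "order"
    order_id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey("customer.customer_id"))
    status = Column(String(20), default="OPEN")
    customer = relationship("Customer", back_populates="orders")

if __name__ == "__main__":
    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)      # emits the physical DDL
```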
Data Modeling in the Context of Business Process Integration
Data Modeling in the Context of Database Design
Database design is defined as: "design the logical and physical structure of one or more databases to accommodate the information needs of the users in an organization for a defined set of applications". The design process roughly follows five steps:
1. planning and analysis
2. conceptual design
3. logical design
4. physical design
5. implementation
The data model is one part of the conceptual design process. The other part, typically, is the functional model. The data model focuses on what data should be stored in the database, while the functional model deals with how the data is processed. To put this in the context of the relational database, the data model is used to design the relational tables; the functional model is used to design the queries that will access and perform operations on those tables.
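As a rough illustration of that split, the snippet below uses Python's built-in sqlite3 module: the CREATE TABLE statements play the role of the data model, and the closing query plays the role of the functional model. The patient/visit schema is invented purely for the example.

```python
# Data model vs. functional model, sketched with the standard-library sqlite3 module.

import sqlite3

conn = sqlite3.connect(":memory:")

# Data model: what is stored and how it is organized.
conn.executescript("""
    CREATE TABLE patient (
        patient_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE visit (
        visit_id   INTEGER PRIMARY KEY,
        patient_id INTEGER REFERENCES patient(patient_id),
        diagnosis  TEXT
    );
""")

conn.execute("INSERT INTO patient VALUES (1, 'Alice')")
conn.execute("INSERT INTO visit VALUES (10, 1, 'migraine')")

# Functional model: the operations (queries) performed on that data.
rows = conn.execute("""
    SELECT p.name, v.diagnosis
    FROM patient p JOIN visit v ON v.patient_id = p.patient_id
""").fetchall()
print(rows)   # [('Alice', 'migraine')]
```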
Data Model Components
The data model gets its inputs from the planning and analysis stage. Here the modeler, along with analysts, collects information about the requirements of the database by reviewing existing documentation and interviewing end-users.
The data model has two outputs. The first is an entity-relationship diagram, which represents the data structures in pictorial form. Because the diagram is easily learned, it is a valuable tool for communicating the model to the end user.
The second output is a data document, often called a data dictionary. It describes in detail the data objects, relationships, and rules required by the database, and provides the detail the database developer needs to construct the physical database.
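A data dictionary entry can be as simple as a structured record per attribute. The fragment below is a hypothetical illustration of the level of detail such an entry might capture; in practice, dictionaries are usually maintained in modeling tools or spreadsheets rather than in code.

```python
# A hypothetical data dictionary entry for one attribute of the example schema.

data_dictionary = {
    "patient.name": {
        "type": "TEXT",
        "nullable": False,
        "description": "Full legal name of the patient",
        "rules": ["must not be empty", "maximum 100 characters"],
    }
}
```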
LinkedIn is the world’s largest professional networking site, with over 100 million members. The members’ activities generate terabytes of data, which are used actively by LinkedIn to drive data-driven products that provide maximum value for its members. The infrastructure supporting the analytics is based on multiple and diverse platforms—from traditional RDBMSes to Hadoop to Voldemort (a distributed key value store).
The last few years have brought a wealth of new data technologies organized around horizontal scalability. LinkedIn has built out an ecosystem of infrastructure to support products that use data in innovative ways and create significant infrastructure demands.
LinkedIn uses a mixture of Apache projects such as Hadoop, ZooKeeper, Pig, and Avro, as well as a set of open-source projects of its own creation, such as Voldemort, Kafka, and Azkaban.
Hadoop is the key ingredient for offline computation, but creating an agile system for offline computing requires a lot more than just a Hadoop cluster.
Stream processing is an under-utilized model that enables real-time data processing. Kafka is LinkedIn's open-source messaging framework, which enables MapReduce-like processing without the high-latency turnaround of Hadoop jobs.
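A minimal sketch of that idea is shown below using the third-party kafka-python client (pip install kafka-python): events are published to a topic and consumed continuously, rather than waiting on a periodic batch job. The broker address, topic name, and message format are assumptions made for the example, and LinkedIn's own pipelines are of course far more elaborate.

```python
# Publish-and-consume sketch with kafka-python; assumes a broker at localhost:9092
# and a hypothetical "page-views" topic.

from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("page-views", b"member=42,page=/jobs")
producer.flush()

consumer = KafkaConsumer("page-views",
                         bootstrap_servers="localhost:9092",
                         auto_offset_reset="earliest",
                         consumer_timeout_ms=5000)

# Process each event as it arrives instead of waiting for a batch job.
for message in consumer:
    member, page = (kv.split("=")[1] for kv in message.value.decode().split(","))
    print(f"member {member} viewed {page}")
```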
Finally, live serving and data deployment are the last mile of analytical data processing: getting terabytes of data delivered and available for serving with low latency is what actually gets your data in front of your users.
See Video: http://bit.ly/uAel5A
Optimization techniques for wide-area networks (WANs) can improve most organizations' application response times, particularly where network latency is high, which is often due to the centralization of servers and IT resources. Typically, WAN optimization controllers (WOCs) serve to prevent network latency from having a severe impact on the performance of applications and underlying protocols. Through data reduction and prioritization techniques, WOCs can also help organizations avoid costly bandwidth upgrades.
WAN optimization is about improving the performance of business applications over WAN connections. Most networks carry a variety of traffic types of differing characteristics and importance. Many organizations are striving to manage this traffic to optimize the response times of critical applications and reduce costs, given that bandwidth continues to represent a significant proportion of operating expenditure for wide-area data networks. But the cost of bandwidth isn't the only consideration — matching the allocation of WAN resources to business needs is also
important. And as resources are increasingly centralized, minimizing the effect of latency on application response times is becoming a critical requirement. In addition, virtualization and new application environments such as cloud computing and Web services can put an unexpected strain on the network.
WAN optimization controllers (WOCs) are deployed symmetrically, in data centers and remote locations, and improve the performance of applications that are accessed across a WAN. The WOCs are typically connected to the LAN side of WAN routers, or are software integrated with client devices. They address application performance problems caused by bandwidth constraints and latency or protocol limitations. The primary function of WOCs is to improve the response times of business-critical applications over WAN links, but they can also help to maximize return on investment in WAN bandwidth, and sometimes avoid the need for costly bandwidth upgrades. To achieve these objectives, WOCs use a combination of techniques, including data reduction (compression and deduplication), protocol optimization, and traffic prioritization.
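The data-reduction side of this is easy to demonstrate: repetitive traffic compresses very well before it crosses the WAN. The snippet below uses Python's standard zlib module on an invented, repetitive payload; real WOCs combine dictionary-based deduplication, protocol optimization, and QoS rather than simple byte-level compression alone.

```python
# Illustrates data reduction on repetitive traffic with the standard zlib module.
# The payload and hostname are invented for the example.

import zlib

payload = (b"GET /app/report?week=42 HTTP/1.1\r\nHost: hq.example.com\r\n\r\n") * 50
compressed = zlib.compress(payload)

print(f"original:   {len(payload)} bytes")
print(f"compressed: {len(compressed)} bytes "
      f"({100 * len(compressed) / len(payload):.1f}% of original)")
```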
The WAN optimization controller market is maturing, but it still sees a high level of innovation around cloud, video, multipoint quality of service, and security.
Before purchasing, ensure the vendors being considered offer the product capabilities and support required by your application mix.
Transforming into a data-driven organization, turning information into actionable insights, is a three-part strategy:
• Technology – build a modern BI architecture & analytics ecosystem with the right tools
• Processes – streamline and standardize BI processes, measurements, and reports wherever possible
• People – train staff to use BI tools and become data-driven decision makers who meet the needs of the organization
The goal of a modern BI system is to allow the organization to:
• Make confident decisions based on data and evidence
• Access the timely, relevant information needed to meet the requirements of all types of users
• Link strategy to execution, leveraging data from all data sources
• Get answers where they are needed, on any device, at any time
• Transform data into actionable insight for everyone
• Uncover new or hidden opportunities to increase competitiveness
• Explore data in an intuitive way, for immediate answers to questions
Top Benefits of Analytics
Analytics is about having the right information and insight to create better business outcomes. Business analytics means leaders know where to find new revenue opportunities and which product or service offerings are most likely to address market requirements. It means the ability to quickly access the right data points to evaluate key performance and revenue indicators when building successful growth strategies. And it means recognizing regulatory, reputational, and operational risks before they become realities.
1) Having the knowledge you need: Analytics delivers insightful information in context so decision makers have the right information where, when, and how they need it.
2) Making better, faster decisions: Analytics provides decision makers throughout the organization with the interactive, self-service environment needed for exploration and analysis.
3) Optimizing business performance: Analytics enables decision makers to easily measure and monitor financial and operational business performance, analyze results, predict outcomes and plan for better business results.
4) Uncovering new business opportunities: Analytics delivers new insights that help the organization maximize customer and product profitability, minimize customer churn, detect fraud, and increase campaign effectiveness.
Eight Levels of Analytics