Natural language processing (NLP) draws on machine learning, artificial intelligence, algorithms, and linguistics to study interactions between computers and human languages. One important goal of NLP is to design and build software that understands and analyzes human languages in order to simplify and optimize human-computer communication.
NLP algorithms are usually based on probability theory and machine learning grounded in statistical inference, which lets them learn rules automatically by analyzing real-world usage. The field includes tasks such as word and sentence tokenization, text classification and sentiment analysis, spelling correction, information extraction, parsing, meaning extraction, and question answering, and it requires both syntactic and semantic analysis at various levels.
NLP applications today include spelling and grammar correction in word processors, machine translation, sentiment analysis, and email spam detection. Combined with data science, NLP now makes it possible to design and implement better automatic question-answering systems and to detect and predict human opinions about products or services.
Examples of NLP algorithms include n-gram language modeling, Naive Bayes and maximum entropy (MaxEnt) classifiers, sequence models such as Hidden Markov Models, probabilistic dependency and constituency parsing, and vector-space models of meaning.
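To make the first item concrete, here is a minimal sketch of a bigram (2-gram) language model in Python. The toy corpus, the function names, and the choice of add-one (Laplace) smoothing are assumptions made for illustration; real systems use much larger corpora and more careful smoothing.

```python
from collections import Counter

def train_bigram_model(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        # <s> and </s> mark sentence boundaries (an illustrative convention).
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_probability(prev, word, unigrams, bigrams):
    """Estimate P(word | prev) with add-one (Laplace) smoothing."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

# Toy corpus, invented for this example.
corpus = ["the cat sat", "the cat ran", "the dog sat"]
unigrams, bigrams = train_bigram_model(corpus)

p_cat = bigram_probability("the", "cat", unigrams, bigrams)
p_dog = bigram_probability("the", "dog", unigrams, bigrams)
print(p_cat)  # 0.3
print(p_dog)  # 0.2
```

Because "the cat" occurs twice in the toy corpus and "the dog" only once, the model assigns "cat" a higher probability after "the". Smoothing keeps unseen word pairs from receiving a probability of zero.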
Google has open-sourced word2vec, a tool that provides an efficient implementation of the continuous bag-of-words and skip-gram architectures for computing continuous distributed (vector) representations of words. These representations can subsequently be used in many natural language processing applications and for further research.
Download the code: svn checkout http://word2vec.googlecode.com/svn/trunk/
Run 'make' to compile the word2vec tool
Run the demo scripts: ./demo-word.sh and ./demo-phrases.sh
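For readers who want a feel for what the two architectures train on, the following Python sketch builds training examples from a token list. It is not the word2vec code itself: the function names and the fixed window size are assumptions for illustration, and the real tool adds details (such as subsampling of frequent words and negative sampling or hierarchical softmax) that are omitted here.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: each center word predicts every word in its context window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """Continuous bag-of-words: the surrounding context predicts the center word."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        if context:
            examples.append((context, target))
    return examples

# Toy sentence, invented for this example.
tokens = "the quick brown fox".split()

pairs = skipgram_pairs(tokens, window=1)
print(pairs)
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'),
#  ('brown', 'quick'), ('brown', 'fox'), ('fox', 'brown')]

examples = cbow_examples(tokens, window=1)
print(examples[1])  # (['the', 'brown'], 'quick')
```

The two architectures use the same windows in opposite directions: skip-gram goes from one word to each neighbor, while continuous bag-of-words goes from all neighbors to the one word in the middle.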
IBM is working to offer Watson as a smartphone-sized attendant, like Apple's Siri, for businesses. It would be a voice-activated smartphone app that answers questions. For example, it could help a farmer in a field decide the optimal time to plant.
The goal is to provide businesses ready access to an incredible engine with a world knowledge base at a reasonable price.
Watson is an artificial intelligence computer system capable of answering questions posed in natural language. IBM describes it as "an application of advanced Natural Language Processing, Information Retrieval, Knowledge Representation and Reasoning, and Machine Learning technologies to the field of open domain question answering" which is "built on IBM's DeepQA technology for hypothesis generation, massive evidence gathering, analysis, and scoring."
Each IBM Watson installation is a 10-rack supercomputer with a total of 2880 processor threads (90 Power7 CPUs clocked at 3.5 GHz, each with eight cores, and each core with four threads). It has 16 TB of RAM, and the entire system is massively parallel: it can process 500 gigabytes of data per second. Watson runs IBM’s DeepQA software, which pores through millions of books and documents (dictionaries, encyclopedias, research papers, news documents) and then uses that data to answer questions with remarkable speed and accuracy.
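The thread count follows directly from the hardware figures quoted above, as this short Python check shows. The RAM-per-thread figure is a back-of-the-envelope derivation that is not in the original text, and it assumes 1 TB = 1024 GB.

```python
# Hardware figures as quoted in the text.
cpus = 90
cores_per_cpu = 8
threads_per_core = 4
ram_tb = 16

total_cores = cpus * cores_per_cpu
total_threads = total_cores * threads_per_core
print(total_cores)    # 720
print(total_threads)  # 2880

# Assumption: 1 TB = 1024 GB. Average RAM available per hardware thread.
ram_per_thread_gb = ram_tb * 1024 / total_threads
print(round(ram_per_thread_gb, 2))  # 5.69
```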
Watson High-level Architecture
IBM first has to turn Watson into an energy-efficient service that can run on a smartphone or tablet. The greatest challenge is to figure out how to price and deliver Watson as a handheld product.
In September 2011, IBM and WellPoint, a major healthcare provider, announced a partnership to use Watson's data-crunching capability to help suggest treatment options and diagnoses to doctors. Just as Watson analyzed massive amounts of data on Jeopardy! to reach a set of hypotheses and list several of the most likely outcomes, it could help doctors diagnose patients. Watson could analyze a patient's specific symptoms, medical history, and hereditary history, and combine that data with available unstructured and structured medical information, including published medical books and articles. IBM has made it clear that Watson is not intended to replace doctors, but to help them avoid medical errors and sharpen their diagnoses with its advanced analytics technology.
IBM intends to use Watson in other information-intensive fields as well, such as telecommunications, financial services, and government.
IBM Watson: The Science Behind an Answer