From Science Fiction to Reality
The other day, I was looking at a VBScript that needed just a small modification to be useful to me. Being a scripting illiterate, I was overwhelmed by the complexity of the code. I couldn’t understand anything in the script, let alone modify it to suit my needs. My failure drove me to think: why can’t computers understand human language? Why do we have to tell them things in their language and not ours? Why couldn’t I just type what I wanted the script to do in plain English and have the computer execute it? I didn’t ponder it for long, as it was already very late and I needed to catch up on some sleep.
The next day at the office was much like any other. Sometime in the middle of the day, an email from our chairman popped up in my inbox. I was in the middle of something, so I didn’t read it immediately. When I opened it later that day, I found Sam announcing the success of Watson on the game show Jeopardy! As I read further, I learned that Watson is the latest supercomputer built by IBM and that it can understand human language. What? A computer that can understand human language? Oh my God, wasn’t that exactly what I had been thinking about the night before? Nothing like that had ever happened to me: what was science fiction for me one night was reality the very next morning. I was overwhelmed once again, this time by the enormity of the achievement. I had always wanted to witness something like this, the next BIG stride in the world of technology, something that will change the way people use computers and change the world itself. It had to be IBM that came up with it. Thirty years after building the IBM PC, we have now given the world a computer that can understand plain English.
The name Watson made me think of Dr. Watson, the Windows program error debugger that gathers information about your computer when a program fails. That tool was named after the Sherlock Holmes character of the same name, and its original name was Sherlock. But it was more than obvious that IBM’s Watson was named after our founder, Thomas J. Watson.
Sam’s mail was about Watson winning Jeopardy!, an American quiz show featuring trivia in history, literature, the arts, pop culture, science, sports, geography, wordplay, and more. The show has a unique answer-and-question format in which contestants are presented with clues in the form of answers and must phrase their responses as questions. Created by Merv Griffin, it has a decades-long broadcast history in the United States. It first ran in the daytime on NBC from March 30, 1964 until January 3, 1975; ran concurrently in a weekly syndicated version from September 9, 1974 to September 5, 1975; and later ran in a revival from October 2, 1978 to March 2, 1979. All of these versions were hosted by Art Fleming. Its most successful incarnation is the Alex Trebek-hosted syndicated version, which has aired continuously since September 10, 1984 and has been adapted internationally. Although I have never watched Jeopardy!, I immediately understood that having Watson compete on the show was just a demonstration of its ability to understand human language and provide the desired results. Sam himself said in his mail that we have not spent the last four years and millions of dollars just to win a game show.
The project under which Watson was built is known as the DeepQA project. Watson is designed according to the Unstructured Information Management Architecture, or UIMA for short, a software architecture standard for developing programs that analyze unstructured information such as text, audio and images. Watson doesn’t answer questions by looking them up in a database, for the simple reason that it is impossible to build a database that can answer each and every question in the world. Instead, it has to rely on text, which in other words means unstructured data. Until now, computers could only understand data presented to them in a structured format. That is where Watson is different. The biggest challenge that Watson overcomes is simply understanding written text, and to do that it has to understand the language very well. For example, when Watson searches for the answer to the Jeopardy! clue “This Greek King was Born in Pella in 356 BC”, it might stumble upon a sentence like “Pella is regarded as the birthplace of prince Alexander”. Watson has to understand that “birthplace” and “being born in” mean the same thing. It also has to understand that although the sentence makes no reference to any king, a prince will grow up to be a king. You might say that a simple keyword search could produce similar results, but this goes well beyond keyword searches. To demonstrate, let me carry my Alexander example forward. While searching for the answer, Watson might also stumble upon a sentence like “Sreeraj, the king of laziness, was born in Pella”. This sentence would be a better match if we were to rely on keyword searches, since it has direct matches for the words “king” and “born”, which the previous sentence did not. But since Watson has been designed by people much smarter than me, it uses smart algorithms that enable it to ignore this sentence and arrive at the correct answer: Alexander.
In short, it has to analyze the sentence just the way a human would. Watson searches through its text data and generates hundreds of possible answers, evaluates them simultaneously, and narrows them down to its top choice in about the same time it takes human champions to come up with their answers. Speed counts in a system doing this much processing in so little time. Humans get better with experience, and so does Watson: it has been designed to learn from its mistakes and adapt dynamically to achieve higher accuracy.
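To make the Alexander example concrete, here is a toy sketch in Python. It is emphatically not how Watson or DeepQA works. The synonym table, the list of known places and the "king of <something that is not a place>" rule are all hand-made assumptions of mine, invented only to show why counting shared keywords picks the wrong sentence while even a crude notion of meaning picks the right one.

```python
import re

CLUE = "This Greek King was born in Pella in 356 BC"
PASSAGES = {
    "Alexander": "Pella is regarded as the birthplace of prince Alexander",
    "Sreeraj": "Sreeraj, the king of laziness, was born in Pella",
}

STOPWORDS = {"this", "was", "in", "is", "as", "the", "of"}
# Hypothetical hand-made knowledge, far simpler than anything a real system uses.
CONCEPTS = {"birthplace": "born", "prince": "royal", "king": "royal"}
KNOWN_PLACES = {"pella", "macedon", "greece"}


def words(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_score(clue, passage):
    """Naive score: number of non-stopword words shared by clue and passage."""
    clue_words = set(words(clue)) - STOPWORDS
    passage_words = set(words(passage)) - STOPWORDS
    return len(clue_words & passage_words)


def concepts(text):
    """Map words to coarse concepts, skipping figurative uses of 'king'."""
    tokens = words(text)
    found = set()
    for i, word in enumerate(tokens):
        if word in STOPWORDS:
            continue
        following = tokens[i + 1 : i + 3]
        # Toy rule: "king of <not a place>" is treated as a figure of speech.
        if (word == "king" and len(following) == 2
                and following[0] == "of" and following[1] not in KNOWN_PLACES):
            continue
        found.add(CONCEPTS.get(word, word))
    return found


def concept_score(clue, passage):
    """Score by shared concepts instead of shared surface words."""
    return len(concepts(clue) & concepts(passage))


def best_answer(scorer):
    """Return the candidate whose passage scores highest under the scorer."""
    return max(PASSAGES, key=lambda name: scorer(CLUE, PASSAGES[name]))


print("Keyword search picks:", best_answer(keyword_score))
print("Concept matching picks:", best_answer(concept_score))
```

With these two sentences, the keyword scorer prefers Sreeraj (it shares “king”, “born” and “Pella” with the clue), while the concept scorer prefers Alexander, because it treats “birthplace” as “born”, “prince” as royalty, and “king of laziness” as a figure of speech. Watson has to do this kind of thing across hundreds of candidate answers without anyone hand-writing the rules for each clue.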
Being a systems guy, I HAVE to tell you the system details about Watson. The system powering Watson consists of 90 IBM Power 750 servers, based on the POWER7 processor, housed in 10 server racks. Its computing power can be compared to 2,880 single-core computers linked together in a super-high-speed network. A computer with a single processor core takes more than 2 hours to perform the deep analytics needed to answer a single Jeopardy! clue; Watson does this in less than three seconds. Up to now, Watson has been using about 75% of its total processing resources. All of the information Watson needs to compete on Jeopardy! fits on 500 gigabytes of disk. That might not seem like enough knowledge for a quiz show, but consider this: Watson mainly stores natural language documents, which require far less storage than the image, video and audio files on a personal computer. The information is derived from about 200 million printed pages of text. The Watson server room is cooled by two industrial-grade, 20-ton air conditioning units, enough to cool a room about one-third the size of a football field. The hardware that powers Watson is one hundred times more powerful than Deep Blue, the IBM supercomputer that defeated the world’s greatest chess player in 1997. The POWER7 processor inside the Power 750 is designed to handle both computation-intensive and transactional processing applications, from weather simulations, to banking systems, to competing against humans on Jeopardy!
Watson is optimized to answer each question as fast as possible. The same system could also be optimized to answer thousands of questions in the shortest time possible. This scalability is what makes Watson so appealing for business applications. In the past, the way to speed up processing was to speed up the processor, which consumed more energy and generated more heat. Watson instead spreads its computations over 90 servers, each with 32 POWER7 cores running at 3.55 GHz, which provides greater performance while consuming less power. Watson is not connected to the Internet. However, the system’s servers are wired together by a 10 Gigabit Ethernet network.
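For fellow systems people who like to check the numbers, here is a quick back-of-the-envelope calculation in Python using only the figures quoted above. The variable names are mine, and I am assuming decimal gigabytes and treating “more than 2 hours” and “less than three seconds” as exactly 2 hours and 3 seconds, so the speedup is a lower bound rather than a measured value.

```python
# Figures quoted in the text; rounding choices are noted per line.
SERVERS = 90
CORES_PER_SERVER = 32
SINGLE_CORE_SECONDS = 2 * 60 * 60  # "more than 2 hours", taken as exactly 2 hours
WATSON_SECONDS = 3                 # "less than three seconds", taken as exactly 3
DISK_BYTES = 500 * 10**9           # 500 GB, assuming decimal gigabytes
PRINTED_PAGES = 200 * 10**6        # about 200 million printed pages

total_cores = SERVERS * CORES_PER_SERVER
min_speedup = SINGLE_CORE_SECONDS / WATSON_SECONDS
bytes_per_page = DISK_BYTES / PRINTED_PAGES

print(f"Total POWER7 cores: {total_cores:,}")
print(f"Speedup over one core: at least {min_speedup:,.0f}x")
print(f"Average storage per page: {bytes_per_page:,.0f} bytes")
```

The 90 servers with 32 cores each account for the 2,880 cores. Going from two hours to three seconds is a speedup of at least 2,400 times, which is in the same ballpark as the core count. And 500 GB spread over 200 million pages works out to about 2,500 bytes per page, a believable size for a page of plain text.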
Now, what is so great about Watson from a layman’s point of view? Having a computer understand human language is a BIG deal for us, but what does it change for the average computer user? Imagine a person who could read all the information ever printed on planet Earth. Think about the amount of knowledge that person would have. Now for the key part: imagine you could ask that person a question in your everyday language. Many would say that Google does the same job and answers all our questions. But what if you knew that the person sitting next to you had the answer to your question? Wouldn’t you rather ask him than sit and think about how to phrase your question for a Google search so that you get the most relevant results? In any case, Google doesn’t provide direct answers to your queries. It throws up a lot of relevant information, which you need to sit and read to find the answer. Google can’t answer questions like “How much will I gain if I sell my stocks now?” or “What are the chances of my patient recovering from the disease without surgery?” With Watson you get straight answers. You don’t have to wade through tons of information, interpret it correctly and arrive at the answer; Watson does all that for you. It’s just like speaking to a knowledgeable person. My comparing Watson with Google doesn’t mean that they are peers or competitors. Google is just a search engine, nothing more, whereas Watson is peerless. The difference is that of chalk and cheese or, should I say, God and Man.
At the moment, Watson has been tuned to win Jeopardy!, but it can be tuned to be useful in some of the world’s biggest industries. IBM plans to put Watson to work in key areas such as finance and medicine. The underlying hardware is already used this way: Rice University uses a workload-optimized system based on POWER7 to analyze the root causes of cancer and other diseases. To the researchers, it’s not simply a more versatile server; it’s a giant leap towards understanding cancer.
I don’t deny the possibility that we may not hear about Watson again once the current euphoria fades. It could very well turn out to be the next big thing that couldn’t live up to expectations. But even then, it marks one of the greatest achievements in computing history, one that will eventually help us build a smarter planet.