Watson


From Science Fiction to Reality

The other day, I was looking at a VBScript that needed just a small modification to be useful to me. Being a scripting illiterate, I was overwhelmed by the complexity of the code. I couldn’t understand anything in the script, let alone modify it to suit my needs. My failure to get the desired result drove me to think: why can’t computers understand human language? Why do we have to tell them things in their language and not ours? Why couldn’t I just type what I wanted the script to do in plain English and have the computer execute it? I didn’t ponder over it for long, as it was already very late and I needed to catch up on some sleep.

The next day at the office was much like any other. Sometime in the middle of the day, an email from our chairman popped up in my inbox. I was in the middle of something, so I didn’t read it immediately. When I opened it later that day, I found that Sam was announcing the success of Watson on the game show Jeopardy!. As I read further, I learned that Watson is the latest supercomputer built by IBM and that it can understand human language. What? A computer that can understand human language? Oh my God, wasn’t that exactly what I had been thinking about the night before? Nothing like that had ever happened to me: what was science fiction for me one night was reality the very next morning.

I was overwhelmed once again, this time by the enormity of the achievement. I had always wanted to witness something like this, the next BIG stride in the world of technology, something that will change the way people use computers and change the world itself. It had to be IBM that came up with something like this: thirty years after building the IBM PC, we have now given the world the first computer that can understand plain English.

The name Watson made me think of Dr. Watson, the Windows program error debugger that gathers information about your computer when a program fails. That tool was named after the character from the Sherlock Holmes stories; its original name was Sherlock. It was more than obvious, however, that IBM’s Watson was named after our founder, Thomas J. Watson.

Sam’s mail was about Watson winning Jeopardy!. Jeopardy! is an American quiz show featuring trivia in history, literature, the arts, pop culture, science, sports, geography, wordplay, and more. The show has a unique answer-and-question format in which contestants are presented with clues in the form of answers, and must phrase their responses in question form. The show has a decades-long broadcast history in the United States since its creation by Merv Griffin in 1964. It first ran in the daytime on NBC from March 30, 1964 until January 3, 1975; concurrently ran in a weekly syndicated version from September 9, 1974 to September 5, 1975; and later ran in a revival from October 2, 1978 to March 2, 1979. All of these versions were hosted by Art Fleming. Its most successful incarnation is the Alex Trebek-hosted syndicated version, which has aired continuously since September 10, 1984, and has been adapted internationally. Although I have never watched Jeopardy!, I immediately understood that having Watson compete on the show was just a demonstration of its ability to understand human language and deliver the desired results. Sam himself said in his mail that we have not spent the last four years and millions of dollars just to win a game show.

The project under which Watson was built is known as the DeepQA project. Watson is designed according to the Unstructured Information Management Architecture, UIMA for short. This software architecture is the standard for developing programs that analyze unstructured information such as text, audio and images. Watson doesn’t use any kind of database to answer questions, for the simple reason that it’s impossible to build a database that can answer each and every question in the world. It has to rely on text, which in other words means unstructured data. Computers until now could only understand data presented to them in a structured format, and that’s where Watson is different.

The biggest challenge that Watson overcomes is simply to understand written text, and to understand text it has to understand the language very well. For example, when Watson searches for the answer to the Jeopardy! clue “This Greek King was Born in Pella in 356 BC”, it might stumble upon a sentence like “Pella is regarded as the birthplace of prince Alexander”. Watson has to understand that “birthplace” and “being born in” mean the same thing. It also has to understand that although the sentence has no reference to any king, a prince will grow up to be a king. You might say that a simple keyword search can produce similar results, but this goes much beyond keyword searches. To demonstrate, let me carry my Alexander example forward. While searching for the answer, Watson might also stumble upon another sentence like “Sreeraj, the king of laziness, was born in Pella”. This sentence would be a better match if we were to rely on keyword searches: it has a direct match for the words ‘King’ and ‘born’, which the previous sentence did not have. But since Watson has been designed by much smarter people than me, it uses some smart algorithms which enable it to ignore this sentence and arrive at the correct answer, Alexander.

In short, it has to analyze the sentence just the way a human would. Watson searches through its text data and generates hundreds of possible answers, evaluates each simultaneously and narrows its responses down to its top choice in about the same time it takes human champions to come up with their answers. Speed counts in a system doing this amount of processing in so little time. Humans get better with experience and so does Watson: it has been designed to learn from its mistakes and adapt dynamically to achieve a higher percentage of accuracy.
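To see why plain keyword matching falls short on the Alexander example, here is a minimal Python sketch. It is only my illustration of a naive word-overlap score, not anything taken from Watson or DeepQA; the function name keyword_overlap and the tokenizing rule are my own assumptions.

```python
import re

def keyword_overlap(clue: str, passage: str) -> int:
    """Count distinct words shared by clue and passage (naive keyword match)."""
    def words(text: str) -> set:
        return set(re.findall(r"[a-z0-9]+", text.lower()))
    return len(words(clue) & words(passage))

CLUE = "This Greek King was Born in Pella in 356 BC"
CORRECT = "Pella is regarded as the birthplace of prince Alexander"
DECOY = "Sreeraj, the king of laziness, was born in Pella"

# The decoy shares more words with the clue than the sentence that
# actually contains the answer, so a keyword ranking picks the wrong one.
print("correct passage:", keyword_overlap(CLUE, CORRECT))
print("decoy passage:  ", keyword_overlap(CLUE, DECOY))
```

The sentence that really answers the clue shares only “Pella” with it, while the decoy shares “king”, “was”, “born”, “in” and “Pella”. Ranking the right sentence higher means knowing that “birthplace” and “born in” mean the same thing, which counting words cannot capture.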

Being a systems guy, I HAVE to tell you the system details of Watson. The system powering Watson consists of 10 server racks holding 90 IBM Power 750 servers based on the POWER7 processor. Its computing power is comparable to that of 2,880 single-core computers linked together by a super-high-speed network. A computer with a single processor core takes more than 2 hours to perform the deep analytics needed to answer a single Jeopardy! clue; Watson does this in less than three seconds. Up to now, Watson has been using about 75% of its total processing resources.

500 gigabytes of disk hold all of the information Watson needs to compete on Jeopardy!. That might not seem like enough knowledge for a quiz show, but consider this: Watson mainly stores natural-language documents, which require far less storage than the image, video and audio files on a personal computer. The information is derived from about 200 million printed pages of text.

The Watson server room is cooled by two industrial-grade, 20-ton air conditioning units, enough to cool a room about one-third the size of a football field. The hardware that powers Watson is one hundred times more powerful than Deep Blue, the IBM supercomputer that defeated the world’s greatest chess player in 1997. The POWER7 processor inside the Power 750 is designed to handle both computation-intensive and transactional processing applications, from weather simulations, to banking systems, to competing against humans on Jeopardy!.

Watson is optimized to answer each question as fast as possible. The same system could also be optimized to answer thousands of questions in the shortest time possible. This scalability is what makes Watson so appealing for business applications. In the past, the way to speed up processing was to speed up the processor, which consumed more energy and generated more heat. Watson instead scales its computations over 90 servers, each with 32 POWER7 cores running at 3.55 GHz. This provides greater performance and consumes less power. Watson is not connected to the Internet; its servers are, however, wired together by a 10 Gigabit Ethernet network.
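For the curious, the numbers above hang together with a little arithmetic. This is just a back-of-the-envelope Python sketch using the figures quoted in this post; the variable names are mine, and the speedup is only a lower bound because the post gives “more than 2 hours” and “less than three seconds”.

```python
# Figures quoted in the post
SERVERS = 90
CORES_PER_SERVER = 32

total_cores = SERVERS * CORES_PER_SERVER

# "More than 2 hours" on one core versus "less than three seconds" on Watson,
# so the real ratio is at least this large.
single_core_seconds = 2 * 60 * 60
watson_seconds = 3
min_speedup = single_core_seconds / watson_seconds

print(f"total POWER7 cores: {total_cores}")
print(f"speedup over one core: at least {min_speedup:.0f}x")
```

So the 2,880 figure is simply 90 servers times 32 cores, and the quoted timings work out to a speedup of at least 2,400 times, in the same ballpark as the core count.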

Now, what is so great about Watson from a layman’s point of view? Having a computer understand human language is a BIG deal for us, but what does it change for the average computer user? Imagine a person who could read all the information ever printed on planet Earth. Think about the amount of knowledge that person would have, and now the key part: imagine you could ask that person a question in your everyday language. Many would say that Google does the same job and answers all our questions. But if you knew that the person sitting next to you had the answer, wouldn’t you rather ask him than sit and think about how to phrase your question for a Google search so that you get the most relevant results? In any case, Google doesn’t give direct answers to your queries; it throws up a lot of relevant information which you have to read to find the answer. Google can’t answer questions like “How much will I gain if I sell my stocks now?” or “What are the chances of my patient recovering without surgery?” With Watson you get straight answers: you don’t have to wade through tons of information, interpret it correctly and arrive at the answer yourself, because Watson does all that for you. It’s just like speaking to a knowledgeable person. My comparing Watson with Google doesn’t mean that they are peers or competitors. Google is just a search engine, nothing more, whereas Watson is peerless. The difference is that of chalk and cheese, or should I say God and Man.

At the moment, Watson has been tuned to win Jeopardy!, but it can be tuned to be useful in some of the world’s biggest industries. IBM plans to make use of Watson in key areas such as finance and medicine. For example, Rice University already uses a workload-optimized system based on POWER7, the same processor that powers Watson, to analyze the root causes of cancer and other diseases. To the researchers, it’s not simply a more versatile server; it’s a giant leap towards understanding cancer.

I don’t deny the possibility that we may not hear about Watson again once the current euphoria disappears. It could very well be the next big thing that couldn’t live up to expectations. But even then, it marks one of the greatest achievements in computing history, one which will eventually help us build a smarter planet.

Posted in IT Infrastructure | Tagged , , | 2 Comments

How Kerberos Works


What happens at the Windows logon screen between the time you press Enter after entering your credentials and the moment the ‘Loading Your Personal Settings’ message appears

It hardly takes a second for your password to be accepted at the logon screen, but what goes on behind the scenes to log you on to your workstation in a domain environment takes much more than a second to explain.

My primary area of expertise is Active Directory, but if you have read my previous blogs you will know that I haven’t written anything on my favourite subject until now. That’s because so much has been written on AD that there is little new I can add. Moreover, I don’t like to write a blog just to update my blog site. To write something, I need a subject that is not discussed on the web as much as other stuff is. I want to contribute to the IT infrastructure community in general, and the Windows folks in particular, through my blog, and writing on subjects that great authors have covered zillions of times would not contribute anything. One example is ABE (Access Based Enumeration): there is not much about it on the internet apart from the Microsoft website, and that motivated me to write about it. But I always wanted to write about AD, and now I have found something related to AD that does not have too much written about it. It’s Kerberos, the preferred authentication protocol in Active Directory environments. This also gives me the opportunity to explain the behind-the-scenes action during the logon process.

Kerberos replaces LM, NTLM and NTLMv2, which were used in the pre-Windows 2000 era (and are still used in some cases; we will come to that later). The Massachusetts Institute of Technology (MIT) developed Kerberos to protect network services provided by Project Athena. Project Athena was a joint project of MIT, Digital Equipment Corporation, and IBM (my current employer) to produce a campus-wide distributed computing environment for educational use. It was launched in 1983, and research and development ran until June 30, 1991, eight years after it began. As of 2010, Athena is still in production use at MIT. Project Athena was important in the early history of desktop and distributed computing: it created the X Window System, Kerberos, and the Zephyr Notification Service, and it influenced the development of thin computing, LDAP, Active Directory, and instant messaging.

The name Kerberos comes from Greek mythology: it is the three-headed dog that guarded the entrance to Hades (hell). In Hindu mythology, Lord Yama (the God of Death) has a dog named ‘sarvara’, which sounds similar to Kerberos. Why this name was chosen remains a mystery to me.

Now, let’s get to know this dog better. Kerberos sees users (which are usually the clients) as UPNs (User Principal Names) and services as SPNs (Service Principal Names). Your AD logon name, the one that looks like an email address (e.g., username@bigfirm.com), is your UPN. Kerberos “introduces” UPNs to SPNs by giving a UPN a “ticket” to the SPN’s service.

Let’s try to understand the user logon process through an example. Let’s call our user OM. OM comes to the office in the morning and starts his workstation. At the logon screen he enters his username and password. At this point, his workstation sends a pre-authenticator to the Authentication Service (AS) of his local KDC (Key Distribution Centre). The KDC is better known as the domain controller, or rather, KDC is one of the roles of a domain controller. The KDC has two components: the Authentication Service (AS) and the Ticket Granting Service (TGS). The pre-authenticator contains the current date and time in YYYYMMDDHHMMSSZ format; the Z at the end denotes that the date and time are in universal (Zulu) time. This information is encrypted with a key derived from OM’s password. Upon receiving the pre-authenticator, the AS tries to decrypt it using the same password-derived key, which it already holds. If the AS is unable to decrypt the pre-authenticator, the password the user typed doesn’t match the one stored on the domain controller, and the user gets a message saying the password is wrong. If the AS is able to decrypt it, it compares the date and time inside with the DC’s own date and time. If the difference is not more than 5 minutes (the default value, which can be changed), the AS sends the user a TGT (Ticket Granting Ticket). This is why the clock on every domain-joined machine must not differ from the DC’s clock by more than five minutes in either direction.

The TGT is valid for 10 hours (again a default that can be changed). It contains a temporary key for the user (a session key, which you can think of as a temporary password) and is encrypted with the password of the krbtgt user account. The krbtgt account is created by default when the first DC of a domain is installed. The user does not need to decrypt the TGT and hence doesn’t need to know the krbtgt account’s password. As you might have noticed, the user has still not logged on to his workstation; that’s because the authentication process is yet to be completed. The next step starts when the user sends the newly acquired TGT to the TGS, which, as discussed earlier, is the second component of the KDC. The TGS decrypts the TGT, which confirms that OM is indeed OM and not an impersonator, and then issues OM a Service Ticket (ST) for his workstation. A service ticket generally contains the start time, total lifetime, security token, etc. This service ticket is encrypted with the password of the workstation’s computer account in AD. OM presents the ST to his workstation, which can decrypt it because it knows its own password, and the workstation then allows OM to log on. At this point OM sees the “Loading Your Personal Settings” message, his profile gets loaded or created, and he is ready to work.

Now, to better understand Kerberos, let’s take this example a bit further and see what happens when OM tries to access a file server. When a user tries to access a particular service, such as a file server or print server, he needs to authenticate to that server or service. The process starts with the user sending his TGT to the TGS. After verifying it, the TGS issues OM a service ticket for the file server, again encrypted with the password of the file server’s computer account. OM presents this ST to the file server, the file server decrypts it and then lets OM view the list of shared folders on that server. Whether OM can enter any or all of those shares depends on whether he has the required NTFS and share permissions. This emphasises the fact that Kerberos is used for authentication and not for authorization.
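The logon steps above can be sketched as a toy model in Python. Please treat it as a simplified illustration of the flow as I described it, not as real Kerberos: the seal and unseal helpers only pretend to encrypt (they check a key fingerprint), and the ToyKDC class, the account names and the secrets are all made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
import hashlib

MAX_SKEW = timedelta(minutes=5)     # default clock tolerance (configurable)
TGT_LIFETIME = timedelta(hours=10)  # default TGT lifetime (configurable)

@dataclass(frozen=True)
class Sealed:
    """Toy stand-in for ciphertext: readable only with the matching key."""
    key_fingerprint: str
    payload: dict

def _fingerprint(key: str) -> str:
    return hashlib.sha256(key.encode()).hexdigest()

def seal(key: str, payload: dict) -> Sealed:
    return Sealed(_fingerprint(key), dict(payload))

def unseal(key: str, box: Sealed) -> dict:
    if box.key_fingerprint != _fingerprint(key):
        raise PermissionError("wrong key: cannot decrypt")
    return dict(box.payload)

def preauth_timestamp(moment: datetime) -> str:
    """Render a time as YYYYMMDDHHMMSSZ in universal (Zulu) time."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%d%H%M%SZ")

class ToyKDC:
    """Hypothetical KDC with the two components: AS and TGS."""

    def __init__(self, secrets: dict, krbtgt_secret: str):
        self.secrets = secrets            # account name -> password-derived secret
        self.krbtgt_secret = krbtgt_secret

    def authentication_service(self, user: str, preauth: Sealed, now: datetime) -> Sealed:
        # Decryption fails if the user typed the wrong password.
        stamp = unseal(self.secrets[user], preauth)["timestamp"]
        sent = datetime.strptime(stamp, "%Y%m%d%H%M%SZ").replace(tzinfo=timezone.utc)
        if abs(now - sent) > MAX_SKEW:
            raise PermissionError("clock skew too great")
        # The TGT is sealed with the krbtgt secret, so the user cannot read it.
        return seal(self.krbtgt_secret, {"user": user, "expires": now + TGT_LIFETIME})

    def ticket_granting_service(self, tgt: Sealed, service: str, now: datetime) -> Sealed:
        claims = unseal(self.krbtgt_secret, tgt)
        if now > claims["expires"]:
            raise PermissionError("TGT expired")
        # The service ticket is sealed with the target computer account's secret.
        return seal(self.secrets[service], {"user": claims["user"], "service": service})

# OM logs on to his workstation (all names and secrets are illustrative).
now = datetime(2011, 3, 1, 9, 30, tzinfo=timezone.utc)
kdc = ToyKDC({"OM": "om-secret", "WORKSTATION$": "ws-secret"}, "krbtgt-secret")
preauth = seal("om-secret", {"timestamp": preauth_timestamp(now)})
tgt = kdc.authentication_service("OM", preauth, now)
ticket = kdc.ticket_granting_service(tgt, "WORKSTATION$", now)
print("workstation accepts:", unseal("ws-secret", ticket)["user"])
```

Notice that the workstation never sees OM’s password and OM never sees the krbtgt secret. Each party can open only what was sealed with its own key, which is the whole point of the ticket exchange.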

Now that we have understood how Kerberos works, let’s see why it is considered so cool and better than NTLM. There are several reasons; we will discuss the two most important ones. The first is that with Kerberos the user’s actual password never travels over the network. The key derived from it is used just once a day (or once every 10 hours, to be exact) to obtain the Ticket Granting Ticket (TGT), and for the rest of the day the user presents the TGT to get access to the various services he might need. The second is the advanced encryption available for Kerberos. If you are using Windows Server 2008 or above, you can opt to use AES (Advanced Encryption Standard), which is one of the best generally available encryption techniques and has no publicly known practical break. For older versions of Windows you can use RC4 HMAC, which isn’t a bad encryption technology either. Microsoft had to opt for comparatively weaker encryption in older versions of Windows because, until the late nineties, it was unlawful in the United States to export software which used encryption beyond a certain strength. LM and NTLM were ridiculously easy to hack through replay/mirror attacks, NTLMv2 was much better, and Kerberos is considerably harder still to attack.

It’s time to look at the scenarios where Kerberos is NOT used for authentication. The first is when you use an IP address in a UNC path; Kerberos is not used here because Kerberos needs SPNs and SPNs need DNS names. The second is connecting to a computer which is in a workgroup. The third is connecting to a pre-Windows 2000 computer. The most interesting scenario is when the domain controller is inundated with logon requests: it starts to log users on with the older authentication protocols to avoid the extra work that Kerberos requires. I think this is one of the reasons why Microsoft recommends keeping hardware resource usage on domain controllers at no more than 30%.

But how would you know whether you have been logged on with Kerberos or something else? There are several indicators. If you are not logged on with Kerberos, you won’t be able to add machines to the domain, you won’t get any group policies, and so on. The simplest way to find out, though, is to run the klist command, which shows the tickets you currently hold. It ships by default with Windows 7, Windows Server 2008 and later, and can be installed on earlier versions as part of the Windows Server 2003 Resource Kit. If you are not logged on with Kerberos, there won’t be any tickets.
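The first scenario is easy to check for in a script. Here is a small Python sketch that tells whether the host part of a UNC path is an IP address rather than a DNS name. The helper names and the sample paths are my own inventions, and the check says nothing about the other scenarios listed above.

```python
import ipaddress

def unc_host(path: str) -> str:
    """Return the host part of a UNC path such as \\\\server\\share."""
    if not path.startswith("\\\\"):
        raise ValueError(f"not a UNC path: {path!r}")
    return path[2:].split("\\", 1)[0]

def host_is_ip_literal(path: str) -> bool:
    """True when the UNC host is an IP address, the case where Kerberos is skipped."""
    host = unc_host(path)
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True

# Hypothetical paths, for illustration only
for sample in (r"\\192.168.1.10\reports", r"\\fileserver.bigfirm.com\reports"):
    verdict = "IP address, expect a non-Kerberos logon" if host_is_ip_literal(sample) else "DNS name"
    print(sample, "->", verdict)
```

A check like this could be run over mapped drives or shortcuts to find the places where users are quietly falling back to the older protocols.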

This was the story behind everything that happens in the blink of an eye during the logon process. I hope you liked what I had to share with you in my first blog on my favourite subject, Active Directory.


IT in Retail


How it’s different from other Sectors

It’s been two months since my last blog. In these two months I celebrated my birthday, got hospitalized for the first time in my life and also switched jobs (quite an eventful period, huh?). Now I work for BIG BLUE, IBM. Obviously, it’s one of the largest organizations anyone can ever work for, and I hope to enjoy my time here.

Since I have just switched jobs, let me talk about my experience with my previous organization. It’s also going to be my first non-technical blog. My previous employer was a retail organization, in fact the second largest retail company in the country. During my tenure of just under 3 years, I experienced how IT in retail is very different from IT in most other sectors.

In retail, a server-down incident impacts cash flow directly, unlike in most other sectors, where such an incident would have only an indirect impact on the business. As a result, escalations are fast, mounting more pressure on the IT professional. It demands quick and correct decision making: you can’t take your time to arrive at the right conclusion, you need to be fast and you need to be accurate all the time. There is no room for error. You might say that pressure exists in every sector and not just retail. Well, yes, that’s true, but most of that pressure is artificial, and usually the source is an over-excited delivery manager. Such man-made pressure is normally created for vested interests rather than out of any real urgency to solve the problem at hand. In retail the pressure is REAL, as real as it can get, because the company is actually losing money by the minute. You cannot compare that with anything else.

The number of stores is normally very high in any retail organization, usually in the hundreds, and you can’t have an IT superhero at every store. Thus arises the need to keep things as central as possible, and whatever you have to keep at the store level should be as simple and tamper-proof as possible. Architecture design and implementation are the two key phases for achieving that. Since critical IT equipment like servers and routers is placed at the store, IT security and monitoring requirements go beyond just meeting compliance norms. I could go on and on like this, but in short, what I have learned is that the IT infrastructure requirements of a retail organization are very special in terms of design, operations management, incident management, and hardware and software. You need a sturdy and stable IT infra setup to run a retail organization. This may be true for any sector, but more so in retail.


The “SERVER” and “WORKSTATION” services


Why are they named so?

The file and printer sharing service in Windows is called the “SERVER” service, but every other server service has a name that explains what it serves, like the DNS service or the DHCP service. So why does the file and printer sharing service call itself just SERVER, as if it were an all-encompassing service on which all the other server-based services depend? Other services are not dependent on it, but its name gives that illusion. Here’s the actual reason...

In the olden days, when Microsoft started writing network software, there was just one thing that the OS served: file and printer sharing. So any server meant a file and printer sharing server, and the name was given to the service that provided it. Over the years nobody bothered to change the name, and it just stayed that way. That also explains the name “WORKSTATION” given to the client service for file and print servers.


Hiding Folders with Access Based Enumeration (ABE)


Invisible Folders

We have a heavily used Windows file server which is the only way to share files within a department and with other departments in the corporate head office. The folders are all secured with NTFS permissions: every user is authorized to access the folders he or she needs, and is greeted with an “Access Denied” message on trying to open any other folder. Pretty good security by any standards, but we started facing a peculiar problem which was not related to technology at all. Some users started requesting rights to folders which they had no business accessing. We have a proper authorization policy in place, which says that the HOD of a department needs to authorize anyone who wants to access a folder belonging to that department. Still, such requests led to sticky situations when very senior people in the organization made them, as it was difficult to tell them to follow the proper process when it was more than clear that they didn’t want to go the proper way. This led to a thought: if users couldn’t see the folders they had no access to, they wouldn’t ask for them, and the problem could be minimized if not eliminated completely. It sounded very cool, but I had never heard of any way to do it in Windows, although I knew that the Novell guys could do it with their OS.

That’s when ABE, which was released with Microsoft Windows Server 2003 SP1, came to the rescue. ABE is a feature of the SERVER service, the Windows service that provides file and printer sharing. It works by modifying a function of the server service called “enumeration”, which basically means how the server service answers the question, “What files and folders exist in a given share?” Windows Explorer triggers the same thing when you open a folder. That flashlight which sometimes shines for a second when you open a share is the way Explorer entertains you while the folder contents are enumerated.

ABE only works on Windows Server 2003 SP1 and higher. It’s not enabled by default and needs another small program to turn it ON or OFF. To see a file or folder when ABE is turned ON, a user needs at least READ permission on it. ABE works on both files and folders, but only when they are accessed through a share; when someone logs on directly to the computer, ABE has no effect on what that user is able to see. Even through a share, all files and folders, irrespective of their NTFS permissions, are visible to anybody who is a member of the Administrators group on the server hosting the share. This means that if you are an administrator, everything will be visible to you even if you don’t have READ rights on some or any of the files and folders. This can be particularly tricky when you test the effects of ABE after enabling it: you might believe that ABE is not working because you can still see everything.

The tool used to enable or disable ABE is called abeui.msi and is downloadable from microsoft.com. Its installation is pretty simple, and once it is installed you can turn ABE ON or OFF for all shares or only for particular shares. You can also make the same choice during the installation, when a dialog box offers it to you.

A new tab named “Access-Based Enumeration” is added to the properties dialog box of every share on the server on which ABEUI is installed.

Checking the first check box enables ABE for that particular share and checking the next one enables ABE for all the shares on that particular server. There is also a CLI for doing the same but since I’m not a big CLI fan, I have never tried that out myself. The installation of ABEUI is necessary even to get the CLI for ABE.

In Windows Server 2008, ABE comes pre-installed and pre-enabled. You can, however, disable it for some or all shares if you need to.
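The visibility rules described earlier boil down to a simple filter. Here is a rough Python sketch of that logic, assuming a made-up Entry type and made-up user and folder names. It only models the rules as I have explained them and has nothing to do with how the SERVER service is actually implemented.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    """A file or folder in a share, with the accounts that hold at least READ."""
    name: str
    readers: set = field(default_factory=set)

def visible_entries(entries, user, *, abe_enabled, is_admin=False, via_share=True):
    """Return the names a user would see when listing the folder."""
    # ABE filters only when it is ON, the access comes through a share,
    # and the user is not a local administrator on the file server.
    if not abe_enabled or not via_share or is_admin:
        return [e.name for e in entries]
    return [e.name for e in entries if user in e.readers]

# Hypothetical share contents
share = [
    Entry("Finance", {"asha"}),
    Entry("HR", {"ravi"}),
    Entry("Public", {"asha", "ravi"}),
]

print(visible_entries(share, "asha", abe_enabled=True))
print(visible_entries(share, "asha", abe_enabled=True, is_admin=True))
```

The second call shows the testing trap mentioned above: run the test as an administrator and every folder still shows up, so always test ABE with an ordinary user account.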

If nothing else, enabling ABE on your shares will at least reduce your users’ temptation to play internal hacker. And yes, I enabled ABE on my file server, and it did reduce the problem it was meant to reduce. But then, as luck would have it, the management decided to give every user in the office at least READ access to all the folders on the file server 😉


Windows 7 Supports RAID1


Disk Mirroring for Desktops, umm interesting!!

I just found out that Windows 7 supports RAID 1, or disk mirroring as it’s commonly known. It’s the first MS client OS to support RAID 1. For me it’s interesting to see RAID on desktops, as until recently it was something so niche that it was only available for servers. It’s amusing how fast niche tech becomes commonplace.


Notebook Entry: Leftover of Conficker/Downadup/Kido Worm


Another Leftover Story

Talking about virus leftovers: I recently noticed a leftover of the Conficker worm that the virus uses to reinfect systems. These leftovers are scheduled tasks on Windows systems. I have only noticed them on server OSs. They are named ‘At’ followed by a number, usually counting up from 1, such as At1, At2 ... At15, At4098. As files they are named At1.job ... At4753.job, since .job is the extension for scheduled tasks.

These tasks are used to run the infected DLLs at a particular hour of the day, so if a server has 10 such jobs, an attempt to resurrect the Kido virus is made 10 times a day. Each job runs a randomly named DLL followed by some random characters; one example is “rundll32.exe ecaosp.hwo,cpmjb”.
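If you have many servers to check, the naming pattern makes these jobs easy to pick out of a file listing. Below is a minimal Python sketch that filters a list of task file names. The function name and the sample list are mine. Keep in mind that jobs created legitimately with the at command use the same AtN.job naming, so a match is only a candidate for review, not proof of infection.

```python
import re

# Matches names like At1.job or At4098.job
AT_JOB = re.compile(r"^At\d+\.job$", re.IGNORECASE)

def candidate_at_jobs(filenames):
    """Return the file names that follow the AtN.job pattern, sorted."""
    return sorted(name for name in filenames if AT_JOB.match(name))

# Illustrative listing of a scheduled-tasks folder
listing = ["At1.job", "Backup.job", "At4098.job", "Atlas.job", "desktop.ini"]
print(candidate_at_jobs(listing))
```

For each candidate, the thing to look at next is the command it runs: a rundll32.exe call on a randomly named file, like the example above, is the giveaway.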

Our antivirus (Kaspersky) doesn’t detect them, mainly because they are not viruses themselves but just a tool to re-enable the virus. McAfee does detect them and cleans them up. The manual cleanup process for these jobs is well documented at http://support.microsoft.com/kb/962007 . If a system is still infected with Conficker, the jobs are said to reappear a few hours after manual deletion. I didn’t see this behaviour on my servers, maybe because the servers had all been cleaned up and only the jobs remained as the leftovers.
