A new window on HPC clusters

The skill of building scalable commodity clusters was once limited to technology vanguards who typically faced formidable computational challenges with sparse budgets. In recent years, scalable clusters have themselves become commodities, often listed as single items in vendor catalogs and sold as part of a larger data center solution. Cluster technology has been bundled -- both in software and hardware.

Microsoft is not new to the data center. It surprised me to learn that Windows Server dominates enterprise server deployments on midrange hardware, alongside Sun Solaris, Hewlett-Packard HP-UX, IBM AIX, and Linux. However, as a developer and consumer of scalable high-performance computing (HPC), I have until now seen only scant evidence that Microsoft is a player in the HPC arena.

The factors that determine the appropriate desktop operating system (OS) are different from those that matter for a server OS. Reverent allegiance to one desktop solution over another is mostly irrelevant here. When it comes down to it, a good server OS is one that does not get in the way when I am trying to do my work.

Core Components

I have had the opportunity to test and deploy the Microsoft Windows Compute Cluster Server 2003 (CCS) beta 1. The CCS package contains everything needed to quickly deploy a Windows-based cluster -- except, of course, for the actual HPC applications. As mentioned, the software release is in beta (not even a release candidate) and therefore cannot be reasonably benchmarked or even criticized on features. Regardless, what I have seen is compelling. Some of the core cluster software components in the CCS bundle include:

• Compute Node OS Installation - Remote Installation Services (RIS) is a headless node OS installation tool that is roughly equivalent in function to Red Hat's Kickstart. With RIS, the portal node is configured as an OS image server. Imaging a new compute node is merely a matter of rebooting that node while RIS is running on the portal node.

• Interprocess Communications - Microsoft's MPICH2-compatible message passing interface (MPI) and its services interoperate with the server OS security and service management layers. I applaud Microsoft's choice of MPI (as opposed to a proprietary .NET solution) as strategic; it makes the cluster platform potentially more accessible to legacy Unix parallel applications.

• Distributed Resource Management - The Microsoft Scheduler provides basic, fault-tolerant job execution management. The interface to the scheduler is flexible, but the command line interface sadly lacks single-transaction job arrays, which are important for large-scale, embarrassingly parallel tasks.

• Centralized Monitor and Management - The Compute Cluster Manager (CCM) interface is a single point of contact for the cluster administrator. The distributed job life cycle can be controlled along with node access and administration.
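To make the job-array point in the scheduler item concrete, here is a minimal Python sketch of the idea: one submission expands into many independent tasks that differ only by an index. This is not the CCS scheduler's interface. The names `Task` and `make_job_array`, the `{i}` placeholder, and the chunk file names are all hypothetical, chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    index: int    # position of this task within the array
    command: str  # fully expanded command line for this task


def make_job_array(template: str, first: int, last: int, step: int = 1) -> list[Task]:
    """Expand one command template into independent, indexed tasks.

    `template` uses the placeholder {i} for the task index. This mimics the
    job-array concept found in many schedulers; it is an illustration only
    and does not talk to any real scheduler.
    """
    if step < 1:
        raise ValueError("step must be a positive integer")
    if last < first:
        raise ValueError("last must not be smaller than first")
    return [Task(i, template.format(i=i)) for i in range(first, last + 1, step)]


if __name__ == "__main__":
    # Hypothetical workload: one BLAST run per pre-split query chunk.
    template = "blastall -p blastn -d nt -i chunk_{i}.fa -o chunk_{i}.out"
    for task in make_job_array(template, 1, 4):
        print(task.index, task.command)
```

Without job arrays, each of those commands has to be submitted as its own transaction, which is what makes large embarrassingly parallel runs tedious to launch and track.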

The steps required to deploy CCS are managed by a ToDoList application that helps the cluster administrator make choices about infrastructure configuration and ensures that everything gets done in the proper order. The ToDoList application helps the administrator choose from a menu of cluster network topologies, automatically deploy the appropriate system services, define a RIS deployment, and manage users. The CCM also includes a basic console to monitor and control various aspects of the entire cluster environment.

From the bioinformaticist's perspective, the Microsoft cluster platform will make a compelling splash when it is released. This is not to say that there will not be adoption challenges for Microsoft. One particular challenge is that Microsoft Windows is not Unix. This tautology merely reflects the established momentum of Unix as a cluster OS and the comfort of HPC users and developers with Unix methodology. A major factor that may pique interest in CCS is the number, quality, and relevance of cluster-ready applications -- both open source and commercial.

Sun v40z Shines On Windows

The v40z is a quad-socket powerhouse based on 64-bit dual-core AMD Opteron processors -- a total of eight cores. The 3U enclosure is equipped with five SCSI drive bays and an internal RAID-1 mirroring controller, along with a RAM capacity of up to 32 gigabytes. Published performance benchmarks under Solaris 10 knock your socks off. I was personally curious how this hurricane of power performs on Windows.

I have deployed many production HPC servers and clusters but had never deployed a Windows server. To my pleasure, the MS Windows Enterprise 2003 x64 OS installed onto the Sun v40z smoothly and without issue. Once it was installed, it was a matter of minutes before the machine was functioning as a Web/file/terminal server.

NCBI BLAST performance is a reasonable metric for gauging the capability of a system. I ported blastall to 64-bit Windows using the MS Visual C++ 2005 beta. The port was technically straightforward but challenging in practice, since I had to learn the Windows development paradigm. While blastall's per-processor performance was comparable to what I see on Linux Opteron servers, I was most pleased to see that performance scaled nicely with the number of processors.
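One way to put a number on "scaled nicely" is to compute parallel speedup and efficiency from wall-clock times of the same workload at different processor counts. The Python sketch below assumes such timings are available; the function name `scaling_report` is hypothetical, and the timing values in the example are placeholders, not measurements from the v40z.

```python
from __future__ import annotations


def scaling_report(timings: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Return {processors: (speedup, efficiency)} for a fixed workload.

    `timings` maps a processor count to the wall-clock seconds measured at
    that count. Speedup is relative to the run with the fewest processors;
    efficiency is speedup divided by the relative increase in processors,
    so 1.0 means perfect linear scaling.
    """
    if not timings:
        raise ValueError("at least one timing is required")
    if any(p < 1 or t <= 0 for p, t in timings.items()):
        raise ValueError("processor counts and times must be positive")

    base_procs = min(timings)
    base_time = timings[base_procs]

    report = {}
    for procs, seconds in sorted(timings.items()):
        speedup = base_time / seconds
        efficiency = speedup * base_procs / procs
        report[procs] = (speedup, efficiency)
    return report


if __name__ == "__main__":
    # Placeholder timings in seconds, for illustration only.
    example = {1: 800.0, 2: 410.0, 4: 215.0, 8: 120.0}
    for procs, (speedup, efficiency) in scaling_report(example).items():
        print(f"{procs} processors: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

An efficiency that stays close to 1.0 as processors are added is what "scales nicely" means in practice; a steady decline usually points to contention for memory, disk, or the database being searched.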

As a further test of the capability of the OS, I deployed the Microsoft Virtual Server Beta 1 software, which creates virtual machines that run independently but under the supervision of the host OS. Virtual Server guided me through configuring the memory and disk parameters of a new virtual machine. When I turned the new virtual machine on, it immediately found a PXE/TFTP/DHCP boot image server, then loaded and installed the SUSE Professional 9.3 OS. I repeated this several times until I had my own virtual cluster running, with each node accessible at its own IP address. Although this exercise of running Linux under Windows has little value beyond gee whiz, it could make an efficient test platform for cluster software development.

Michael Athanas, Ph.D., is a life sciences informatics consultant and a founding partner of The BioTeam, a scalable informatics solutions provider. E-mail:
