This is a crosspost from Blogs on Technical Computing Goulash. See the original post here.

Armed and ready with IBM Platform LSF

These days it’s not uncommon to hear about CPUs based on ARM cores. They can be found in mobile phones, embedded systems, laptops and even servers. Indeed, there have recently been a number of major announcements from vendors building processors based on ARM cores. These include the AMD Opteron A1100, NVIDIA Tegra K1 and even the Apple A7, which is used in the iPhone 5s. What these all have in common is that they are 64-bit and based on the ARM v8 ISA. At the same time, the ARM-server chip startup Calxeda announced it was shutting down. Surging power requirements, as well as the announcement of 64-bit chips, have led to renewed interest in energy-efficient ARM-based processors for high performance computing.

When building out an infrastructure for Technical Computing, a workload manager is typically used to control access to the computing resources. As it turns out, the leading workload manager IBM Platform LSF (formerly Platform Computing) has supported Linux on ARM for about 10 years. In fact, today there are IBM clients using Platform LSF on Linux ARM-based clusters as part of mobile device design and testing.

The current release, IBM Platform LSF 9.1.2, supports Linux on ARM v7, with support for ARM v8 upcoming. Given that Platform LSF provides the ability to build out heterogeneous clusters, creating a compute cluster containing ARM, Power and x86 based nodes is a snap. Jobs may be targeted to a specific processor type, and the optional portal IBM Platform Application Center provides an easy-to-use, highly configurable, application-centric web-based interface for job management.
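
To make the idea of targeting a job at a processor type concrete, here is a minimal Python sketch that assembles a bsub command line with a resource requirement string. The `-n` and `-R "select[...]"` options are standard bsub options, and `LINUX_ARM` is the host type that LSF reports for the ARM nodes in this cluster. The helper name `build_bsub_command` and its parameters are hypothetical, made up purely for illustration.

```python
import shlex


def build_bsub_command(command, slots=1, host_type=None, extra_select=None):
    """Return an argv list for bsub (hypothetical helper, not part of LSF).

    host_type    -- LSF host type to select, e.g. "LINUX_ARM"
    extra_select -- optional extra select[] condition, passed through as-is
    """
    argv = ["bsub", "-n", str(slots)]
    conditions = []
    if host_type:
        conditions.append(f"type=={host_type}")
    if extra_select:
        conditions.append(extra_select)
    if conditions:
        # All conditions are combined into a single select[] section.
        argv += ["-R", "select[" + " && ".join(conditions) + "]"]
    argv += list(command)
    return argv


cmd = build_bsub_command(
    ["mpiexec", "-n", "4", "/usr/bin/hpcc"], slots=4, host_type="LINUX_ARM"
)
# Print a shell-quoted form that could be pasted into a terminal.
print(shlex.join(cmd))
```

The list form can be handed directly to `subprocess.run` on a host where bsub is on the PATH, which avoids shell quoting issues around the square brackets.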

Hello. How do you “doo”?

I’ve recently had the opportunity to test IBM Platform LSF on a two-node, ARM-based cluster. The IBM Platform LSF master node was a Udoo Quad system running Debian Wheezy ARMv7 EABI hard-float. The second node was running Fedora on an ARM v8 simulator. Installation and operation of the software was identical to other platforms. Using the Platform LSF ELIM (External LIM) facility for adding external load indices, I was able to quickly create a script to report the processor temperature on the Udoo Quad system as a load index.
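
The original ELIM script is not shown in this post, so what follows is only a sketch of what such a script might look like, written in Python. An ELIM periodically writes a line to standard output containing the number of indices followed by name and value pairs, for example `1 cputemp 45.0`. The sysfs path, the 15 second interval and the function names are assumptions for illustration; the actual temperature source on a Udoo Quad may differ. The `cputemp` resource would also need to be defined in the LSF configuration before the LIM would accept it.

```python
import sys
import time

# Assumption: the board exposes its temperature here, in millidegrees Celsius.
THERMAL_PATH = "/sys/class/thermal/thermal_zone0/temp"


def read_cpu_temp_celsius(path=THERMAL_PATH):
    """Read a millidegree value from sysfs and convert it to degrees Celsius."""
    with open(path) as f:
        return int(f.read().strip()) / 1000.0


def format_elim_report(indices):
    """Format a report line: '<count> <name1> <value1> <name2> <value2> ...'."""
    parts = [str(len(indices))]
    for name, value in indices.items():
        parts += [name, f"{value:.1f}"]
    return " ".join(parts)


def run_elim(interval_seconds=15, max_reports=None, path=THERMAL_PATH, out=sys.stdout):
    """Emit cputemp reports until max_reports is reached (forever if None)."""
    sent = 0
    while max_reports is None or sent < max_reports:
        report = format_elim_report({"cputemp": read_cpu_temp_celsius(path)})
        out.write(report + "\n")
        out.flush()  # the LIM reads from a pipe, so flush after every report
        sent += 1
        if max_reports is None or sent < max_reports:
            time.sleep(interval_seconds)


# A deployed ELIM would end with a call to run_elim(); it is left out here
# so that the sketch can be imported and tested without looping forever.
```

Flushing after each line matters because output written to a pipe is otherwise buffered, and the load index would appear to update late or not at all.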

Now, putting Platform LSF through its paces, we see that the type, model and other physical characteristics of the nodes are detected.

$ lshosts -w
HOST_NAME  type       model  cpuf  ncpus  maxmem  maxswp  server  RESOURCES
udoo       LINUX_ARM  ARM7l  60.0  4      875M    -       Yes     (mg)
ma1arms4   LINUX_ARM  ARM8   60.0  1      1.8G    1.9G    Yes     ()

Looking at the load information on the system, we see the built-in load indices, in addition to the cputemp metric which I introduced to report the CPU temperature (Celsius). At this point the system is essentially idle.

$ lsload -l
HOST_NAME  status  r15s  r1m   r15m  ut   pg   io   ls  it  tmp    swp   mem   cputemp
udoo       ok      0.5   0.6   1.5   4%   0.0  311  1   0   1297M  0M    701M  45.0
ma1arms4   busy    3.6   *7.7  6.2   52%  0.0  50   3   0   954M   1.9G  1.6G  0.0
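
For anyone wanting to feed these load indices into a script, the following Python sketch parses `lsload -l` style output into one dictionary per host. It assumes whitespace-separated columns with a single header row, as in the output above, and it strips the `*` that lsload places in front of some values. The function names are hypothetical.

```python
def parse_lsload(text):
    """Parse lsload-style output into a list of {column: value} dictionaries."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip rows that do not line up with the header
        rows.append(dict(zip(header, fields)))
    return rows


def load_value(field):
    """Convert a plain numeric field to float, dropping a leading '*'."""
    return float(field.lstrip("*"))


sample = """\
HOST_NAME status r15s r1m r15m ut pg io ls it tmp swp mem cputemp
udoo ok 0.5 0.6 1.5 4% 0.0 311 1 0 1297M 0M 701M 45.0
ma1arms4   busy   3.6  *7.7   6.2  52%   0.0   50 3   0  954M  1.9G  1.6G 0.0
"""

for host in parse_lsload(sample):
    print(host["HOST_NAME"], host["status"], load_value(host["cputemp"]))
```

Fields with unit suffixes such as `1297M` or `52%` are left as strings here; converting them would need extra handling that this sketch does not attempt.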

Next, we submit a job for execution to Platform LSF. Rather than the requisite sleep job, we submit something a bit more interesting: the HPC Challenge Benchmark (HPCC). Debian Wheezy happens to include a pre-compiled hpcc binary built against OpenMPI.

As the Udoo Quad is a 4 core system (as the name implies), hpcc is submitted requesting 4 cores.

$ bsub -n 4 mpiexec -n 4 /usr/bin/hpcc
Job <2> is submitted to default queue <normal>.
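
When submissions are scripted, the job ID in this confirmation line is usually what you want to capture so that it can be passed to later commands such as `bjobs -l`. A small Python sketch follows; the regular expression is based only on the message format shown above, and the helper name is hypothetical.

```python
import re

# Matches "Job <2> is submitted to default queue <normal>." and also the
# same message without the word "default".
SUBMIT_RE = re.compile(r"Job <(\d+)> is submitted to (?:default )?queue <([^>]+)>")


def parse_bsub_output(line):
    """Return (job_id, queue) extracted from a bsub confirmation line."""
    match = SUBMIT_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized bsub output: {line!r}")
    return int(match.group(1)), match.group(2)


job_id, queue = parse_bsub_output("Job <2> is submitted to default queue <normal>.")
print(job_id, queue)  # prints: 2 normal
```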

With HPCC running, we quickly see utilization on the Udoo climb to 94% and the CPU temperature rise to 60 °C.

$ lsload -l
HOST_NAME  status  r15s  r1m   r15m  ut   pg   io   ls  it  tmp    swp   mem   cputemp
udoo       ok      5.1   5.1   2.4   94%  0.0  49   1   0   1376M  0M    497M  60.0
ma1arms4   ok      0.5   1.1   1.2   40%  0.0  50   3   0   954M   1.9G  1.6G  0.0

During the life of the job, its resource utilization can be viewed easily using the Platform LSF user commands. This includes details such as the PIDs that make up the job.

$ bjobs -l
 
Job <2>, User <debian>, Project <default>, Status <RUN>, Queue <normal>, 
                    Command <mpiexec -n 4 /usr/bin/hpcc>, Share group charged </debian>
Sun Feb 2 23:49:48: Submitted from host <udoo>, CWD </opt/ibm/lsf/conf>, 
                    4 Processors Requested;
Sun Feb 2 23:49:48: Started on 4 Hosts/Processors <udoo> <udoo> <udoo> <udoo>,
Execution Home </home/debian>, Execution CWD </opt/ibm/lsf/conf>;
Sun Feb 2 23:51:05: Resource usage collected.
The CPU time used is 227 seconds.
MEM: 140 Mbytes; SWAP: 455 Mbytes; NTHREAD: 8
PGID: 15678; PIDs: 15678 15679 15681 15682 15683 15684
15685
....
....

New Roads?

Here we could speak of GFlops and other such measures of performance, but that was not my objective. The key point is that there is growing interest in non-x86 solutions for Technical Computing. IBM Platform LSF software has supported, and continues to support, a wide variety of operating systems and processor architectures, from ARM to IBM Power to IBM System z.

As for ARM-based development boards such as the Udoo Quad and the Parallella board, they are inexpensive as well as energy efficient. This makes them of interest to HPC scientists exploring energy-efficient approaches to HPC workloads. Let us know your thoughts about the suitability of ARM for HPC workloads.