This is a crosspost from Blogs on Technical Computing Goulash. See the original post here.

Udoo Quad test drive

Here is a brief update regarding my experiences so far with the Udoo Quad board. I call this kicking the tires, but it simply amounts to tinkering with the board and getting a better understanding of its capabilities.

My choice of OS for this round of testing is Ubuntu Studio 12.04 armHF, which I obtained from the Udoo Community site downloads page.

As the Udoo Quad includes an on-board SATA connector, I followed the necessary steps to install the OS to an external disk, and to boot from it by selecting the appropriate device from the U-Boot environment. I used the following page as a high-level guide.

The disk in this case was an older ~80GB Hitachi disk that I had among my spares and that was suitable for the intended purpose. With the system booted up, here is what we see:

 root@udoo-studio-hfp:~# uname -a

Linux udoo-studio-hfp 3.0.35 #1 SMP PREEMPT Mon Dec 16 14:46:12 CET 2013 armv7l armv7l armv7l GNU/Linux

root@udoo-studio-hfp:~# cat /proc/cpuinfo

Processor : ARMv7 Processor rev 10 (v7l)
processor : 0
BogoMIPS : 1988.28

processor : 1
BogoMIPS : 1988.28

processor : 2
BogoMIPS : 1988.28

processor : 3
BogoMIPS : 1988.28

Features : swp half thumb fastmult vfp edsp neon vfpv3 

CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x2
CPU part : 0xc09
CPU revision : 10

Hardware : SECO i.Mx6 UDOO Board
Revision : 63012
Serial : 0000000000000000

root@udoo-studio-hfp:~# lsscsi

[0:0:0:0]    disk    ATA      Hitachi HTS54128 HP3O  /dev/sda

Using the trusty gnome-disk-utility, the read benchmark returns the following results. If this all looks a bit Mac OS X-ish, don’t be alarmed. I’m connecting to my Udoo from my MacBook and tunneling X over ssh. Again, keep in mind that this is an old disk.

I was surprised to find that the cpufreq utilities all worked as expected on the system. By default, the system booted with the conservative governor active (~396 MHz), and with cpufreq-set I successfully enabled the performance governor.

 root@udoo-studio-hfp:/usr/bin# ./cpufreq-info
cpufrequtils 007: cpufreq-info (C) Dominik Brodowski 2004-2009
Report errors and bugs to cpufreq@vger.kernel.org, please.
analyzing CPU 0:
  driver: imx
  CPUs which run at the same hardware frequency: 0 1 2 3
  CPUs which need to have their frequency coordinated by software: 0 1 2 3
  maximum transition latency: 61.0 us.
  hardware limits: 396 MHz - 996 MHz
  available frequency steps: 996 MHz, 792 MHz, 396 MHz
  available cpufreq governors: interactive, conservative, ondemand, userspace, powersave, performance
  current policy: frequency should be within 396 MHz and 996 MHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency is 996 MHz (asserted by call to hardware).
  cpufreq stats: 996 MHz:8.10%, 792 MHz:0.63%, 396 MHz:91.27%  (172036)
....
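
For readers who would rather script this than call cpufreq-info and cpufreq-set by hand, here is a minimal Python sketch that reads and changes the governor through the standard Linux cpufreq sysfs files. It is not what I ran on the board. The sysfs root and the helper names `read_cpufreq` and `set_governor` are assumptions for illustration, and I have not confirmed that the 3.0.35 kernel on the Udoo exposes exactly these files.

```python
from pathlib import Path

# Usual Linux cpufreq sysfs root; assumed (not verified) for the Udoo kernel.
CPUFREQ_ROOT = Path("/sys/devices/system/cpu")


def read_cpufreq(cpu=0, root=CPUFREQ_ROOT):
    """Return (governor name, current frequency in MHz) for one CPU."""
    base = Path(root) / f"cpu{cpu}" / "cpufreq"
    governor = (base / "scaling_governor").read_text().strip()
    # scaling_cur_freq is reported in kHz.
    khz = int((base / "scaling_cur_freq").read_text().strip())
    return governor, khz / 1000.0


def set_governor(name, cpu=0, root=CPUFREQ_ROOT):
    """Select a governor by writing to sysfs (needs root on a real system)."""
    base = Path(root) / f"cpu{cpu}" / "cpufreq"
    available = (base / "scaling_available_governors").read_text().split()
    if name not in available:
        raise ValueError(f"governor {name!r} not offered: {available}")
    (base / "scaling_governor").write_text(name)
```

On a live system, `set_governor("performance")` followed by `read_cpufreq()` would be the rough equivalent of the cpufreq-set and cpufreq-info calls above. Since all four cores share one hardware frequency on this board, changing cpu0 should be enough, but that is an inference from the cpufreq-info output rather than something I tested this way.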

As I indicated at the outset, the system has been installed with an ARM HF prepared Linux distribution. This implies that the distro has been compiled with the appropriate flags to enable hardware Floating Point Unit support, which should help us to attain better performance for applications that make use of floating point arithmetic.

The system readelf tool can be used to interrogate a binary for architecture information. In this case, I’ve installed the OS supplied HPC Challenge package to give the board its baptism into the world of Technical Computing.

 root@udoo-studio-hfp:/etc/apt# dpkg --get-selections |grep hpcc
hpcc install

root@udoo-studio-hfp:/etc/apt# readelf -A /usr/bin/hpcc
Attribute Section: aeabi
File Attributes
  Tag_CPU_name: "7-A"
  Tag_CPU_arch: v7
  Tag_CPU_arch_profile: Application
  Tag_ARM_ISA_use: Yes
  Tag_THUMB_ISA_use: Thumb-2
  Tag_FP_arch: VFPv3-D16
  Tag_ABI_PCS_wchar_t: 4
  Tag_ABI_FP_denormal: Needed
  Tag_ABI_FP_exceptions: Needed
  Tag_ABI_FP_number_model: IEEE 754
  Tag_ABI_align_needed: 8-byte
  Tag_ABI_align_preserved: 8-byte, except leaf SP
  Tag_ABI_enum_size: int
  Tag_ABI_HardFP_use: SP and DP
  Tag_ABI_VFP_args: VFP registers
  Tag_CPU_unaligned_access: v6
  Tag_DIV_use: Not allowed
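
The line to look for in that output is `Tag_ABI_VFP_args: VFP registers`, which indicates that floating point arguments are passed in VFP registers, i.e. the hard-float calling convention. As a rough sketch, the check could be automated in Python as follows. The function names are hypothetical, and `inspect_binary` assumes that binutils' readelf is on the PATH.

```python
import subprocess


def parse_attributes(readelf_output):
    """Turn the text printed by `readelf -A` into a dict of Tag_* -> value."""
    tags = {}
    for line in readelf_output.splitlines():
        line = line.strip()
        if line.startswith("Tag_") and ":" in line:
            key, value = line.split(":", 1)
            tags[key.strip()] = value.strip().strip('"')
    return tags


def is_hard_float(tags):
    """True when FP arguments are passed in VFP registers (armhf ABI)."""
    return tags.get("Tag_ABI_VFP_args") == "VFP registers"


def inspect_binary(path):
    """Run readelf on a binary and parse its ARM build attributes."""
    result = subprocess.run(["readelf", "-A", path], check=True,
                            stdout=subprocess.PIPE, universal_newlines=True)
    return parse_attributes(result.stdout)
```

For example, `is_hard_float(inspect_binary("/usr/bin/hpcc"))` would be expected to return `True` for the binary shown above.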

Now that we’re done kicking the tires, let’s take it for a drive!

The intent here was not for a Top 500 run. Rather, just to stress the Udoo Quad with a more intensive workload. For this purpose, I wrote a small Qt program to display the CPU temperature. I was curious to understand how the system would heat up given that it’s passively cooled (with a nice heatsink).

The output from my Linpack run is below:

 ================================================================================
HPLinpack 2.0  --  High-Performance Linpack benchmark  --   September 10, 2008
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================
 

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:
 
N      :    7000 
NB     :      90      192      110 
PMAP   : Row-major process mapping
P      :       2 
Q      :       2 
PFACT  :   Right 
NBMIN  :       4 
NDIV   :       2 
RFACT  :   Crout 
BCAST  :  1ringM 
DEPTH  :       1 
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0
 
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4        7000    90     2     2             133.23              1.717e+00
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0033466 ...... PASSED
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4        7000   192     2     2             130.95              1.747e+00
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0034782 ...... PASSED
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4        7000   110     2     2             137.24              1.667e+00
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0034961 ...... PASSED
================================================================================

Finished      3 tests with the following results:
              3 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------
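
As a sanity check on the Gflops column, the figures can be recomputed from N and the wall time. HPL counts (2/3)N^3 + (3/2)N^2 floating point operations for the solve and divides by the elapsed time. The short Python sketch below applies that formula to the three runs above; the function names are mine, for illustration only.

```python
def hpl_flops(n):
    """Operation count HPL uses for solving an order-n system."""
    return (2.0 / 3.0) * n ** 3 + (3.0 / 2.0) * n ** 2


def hpl_gflops(n, seconds):
    """Rate of execution in Gflops for a given problem size and wall time."""
    return hpl_flops(n) / seconds / 1e9


# (NB, wall time in seconds) taken from the three runs reported above.
RUNS = [(90, 133.23), (192, 130.95), (110, 137.24)]

for nb, seconds in RUNS:
    print(f"NB={nb:3d}  {hpl_gflops(7000, seconds):.3f} Gflops")
```

This yields roughly 1.717, 1.747 and 1.667 Gflops, matching the reported values, with NB=192 being the fastest of the three blocking factors tried.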

During the runs of HPCC (in particular the HPLinpack portion), I observed the CPU temperature climb to ~60 degrees Celsius.

I produced a short video showing a run of HPCC along with the Qt CPU temperature app that I created.
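
For anyone wanting to log the temperature without a GUI, something along these lines should work. To be clear, this is not the Qt app shown in the video, just a minimal Python sketch of the same idea. The sysfs path is an assumption (the thermal node differs between kernels and boards), as is the unit: mainline kernels report millidegrees Celsius, but a vendor kernel may not, hence the `millidegrees` switch.

```python
import time
from pathlib import Path

# Assumed sensor location; check which thermal node your kernel exposes.
DEFAULT_SENSOR = "/sys/class/thermal/thermal_zone0/temp"


def read_cpu_temp_c(sensor=DEFAULT_SENSOR, millidegrees=True):
    """Read one temperature sample and return it in degrees Celsius."""
    raw = int(Path(sensor).read_text().strip())
    return raw / 1000.0 if millidegrees else float(raw)


def sample_temps(count, interval_s=1.0, sensor=DEFAULT_SENSOR,
                 millidegrees=True):
    """Collect `count` samples, sleeping `interval_s` seconds between reads."""
    samples = []
    for i in range(count):
        samples.append(read_cpu_temp_c(sensor, millidegrees))
        if i < count - 1:
            time.sleep(interval_s)
    return samples
```

Running `max(sample_temps(600))` in a second terminal during an HPCC run would give the peak temperature over ten minutes, which is the number of interest for a passively cooled board.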

That wraps up a successful first test drive. What’s next? OpenCL seems like the next logical step.