This is a crosspost from Blogs on Technical Computing Goulash. See the original post here.

Turning up the heat...on my Armada 8040

Although I took delivery of a shiny new SolidRun Marvell macchiatoBIN a few months back (at the end of May), I haven’t really had a chance to put it through its paces until now. For those of you who are not familiar with the board, it’s a high-performance 64-bit Arm (v8) board designed primarily for networking, and it’s based on the Marvell ARMADA 8040 processor for those who like to keep track. For those looking for more information about the board, there is a community page here.

What struck me about the board when I originally unpacked it were the shiny heatsinks. Definitely looks cool on my workbench (desk)! They did seem up to the task of keeping the mighty ARMADA 8040 cool as a cucumber - or so I thought.

After following (and struggling with) the procedure to install Ubuntu described on the macchiatoBIN wiki, which ironically required me to use an x86 box to compile some necessary bits, I was off to the races with Ubuntu 16.04.3 LTS (Xenial Xerus). Note that this whole procedure left much to be desired, as it was my understanding that this board was to be ARM SBSA compliant, which would allow any compliant OS distro to be used. At the time of writing this is not the case; I hope that an update will address this.

Being a high-performance computing kind of guy, my first challenge was to run High-Performance Linpack (HPL) on the system. HPL, you say? Yes, I know we can debate the merits of HPL all day long, but it’s still a measure of some specific dimensions of system performance, and it’s used to rank systems on the TOP500 list of supercomputers. Because I was looking to run more than just HPL on the system, I opted to install the Phoronix Test Suite, which includes HPCC (HPC Challenge) as an available benchmark.

To get warmed up, I decided to first run the well-known STREAM memory benchmark. Via Phoronix, I installed the stream benchmark and executed it.

root@flotta:~# phoronix-test-suite install-test pts/stream

 

Phoronix Test Suite v5.2.1
 

    To Install: pts/stream-1.3.1
 

    Determining File Requirements ...........................................
    Searching Download Caches ...............................................
 

    1 Test To Install

        1 File To Download [0.01MB]
        1MB Of Disk Space Is Needed
 

    pts/stream-1.3.1:

        Test Installation 1 of 1
        1 File Needed [0.01 MB / 1 Minute]
        Downloading: stream-2013-01-17.tar.bz2                       [0.01MB]
        Estimated Download Time: 1m .........................................
        Installation Size: 0.1 MB
        Installing Test @ 19:45:06

[NOTICE] Undefined: 0 in phodevi_cpu:267

[NOTICE] Undefined: 0 in phodevi_cpu:272

Next, we execute the stream benchmark:

root@flotta:~# phoronix-test-suite benchmark pts/stream

 

Phoronix Test Suite v5.2.1

 

    Installed: pts/stream-1.3.1

 

 

Stream 2013-01-17:

    pts/stream-1.3.1

    Memory Test Configuration

        1: Copy

        2: Scale

        3: Add

        4: Triad

        5: Test All Options

        Type: 5

 

 

System Information

 

Hardware:

Processor: Unknown @ 1.30GHz (4 Cores), Memory: 4096MB, Disk: 8GB 8GME4R

 

Software:

OS: Ubuntu 16.04, Kernel: 4.4.8-armada-17.02.2-g4126e30 (aarch64), Compiler: GCC 5.4.0 20160609, File-System: ext4

 

    Would you like to save these test results (Y/n): n

 

 

Stream 2013-01-17:

    pts/stream-1.3.1 [Type: Copy]

    Test 1 of 4

    Estimated Trial Run Count:    5

    Estimated Test Run-Time:      7 Minutes

    Estimated Time To Completion: 25 Minutes

        Started Run 1 @ 19:46:03

        Started Run 2 @ 19:48:09

        Started Run 3 @ 19:50:14

        Started Run 4 @ 19:52:19

        Started Run 5 @ 19:54:24  [Std. Dev: 0.35%]

 

    Test Results:

        6701.1

        6669.1

        6655.9

        6637.4

        6657.1

 

    Average: 6664.12 MB/s

 

 

Stream 2013-01-17:

    pts/stream-1.3.1 [Type: Scale]

    Test 2 of 4

    Estimated Trial Run Count:    5

    Estimated Test Run-Time:      7 Minutes

    Estimated Time To Completion: 19 Minutes

        Started Run 1 @ 19:56:27

        Started Run 2 @ 19:56:27

        Started Run 3 @ 19:56:27

        Started Run 4 @ 19:56:27

        Started Run 5 @ 19:56:27  [Std. Dev: 0.12%]

 

    Test Results:

        7248.8

        7261.8

        7252.8

        7245.6

        7266.1

 

    Average: 7255.02 MB/s

 

 

Stream 2013-01-17:

    pts/stream-1.3.1 [Type: Triad]

    Test 3 of 4

    Estimated Trial Run Count:    5

    Estimated Test Run-Time:      7 Minutes

    Estimated Time To Completion: 13 Minutes

        Started Run 1 @ 19:56:27

        Started Run 2 @ 19:56:27

        Started Run 3 @ 19:56:27

        Started Run 4 @ 19:56:27

        Started Run 5 @ 19:56:27  [Std. Dev: 0.47%]

 

    Test Results:

        6872.3

        6895.9

        6934.9

        6847.9

        6889.5

 

    Average: 6888.10 MB/s

 

 

Stream 2013-01-17:

    pts/stream-1.3.1 [Type: Add]

    Test 4 of 4

    Estimated Trial Run Count:    5

    Estimated Time To Completion: 7 Minutes

        Started Run 1 @ 19:56:27

        Started Run 2 @ 19:56:27

        The test run ended prematurely.

        Started Run 3 @ 19:56:27

        Started Run 4 @ 19:56:27

        Started Run 5 @ 19:56:27

        The test run ended prematurely.  [Std. Dev: 0.09%]

 

    Test Results:

        6559.5

        6549.8

        6560.8

 

    Average: 6556.70 MB/s
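As a quick sanity check on the figures above, the sketch below recomputes the mean and the relative standard deviation for the five Copy results. `stats` is a hypothetical helper name, and the choice of the sample (n-1) standard deviation is an assumption; it reproduces the 0.35% printed for Copy, but it has not been checked against how Phoronix actually computes the figure.

```sh
#!/bin/sh
# stats: hypothetical helper, not part of Phoronix Test Suite.
# Prints the mean and the relative sample standard deviation of its arguments.
stats() {
    printf '%s\n' "$@" | awk '
        { x[NR] = $1; sum += $1 }
        END {
            mean = sum / NR
            for (i = 1; i <= NR; i++) ss += (x[i] - mean) * (x[i] - mean)
            sd = (NR > 1) ? sqrt(ss / (NR - 1)) : 0   # sample (n-1) form: an assumption
            printf "mean=%.2f MB/s  rel_sd=%.2f%%\n", mean, 100 * sd / mean
        }'
}

# The five Copy results from the run above.
stats 6701.1 6669.1 6655.9 6637.4 6657.1
# prints: mean=6664.12 MB/s  rel_sd=0.35%
```

The same helper can be pointed at the Add results, where only three of the five runs produced a value.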

We see during execution that stream definitely puts the system through its paces:

top - 19:47:20 up  3:07,  2 users,  load average: 3.10, 1.21, 0.70

Tasks: 119 total,   2 running, 117 sleeping,   0 stopped,   0 zombie
%Cpu(s): 96.3 us,  0.4 sy,  0.0 ni,  1.7 id,  0.0 wa,  0.0 hi,  1.7 si,  0.0 st
KiB Mem :  3779668 total,   986744 free,  2410052 used,   382872 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  1287376 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND    
18920 root      20   0 2370508 2.236g   1336 R 389.1 62.0   4:54.12 stream-bin
6854 root      20   0       0      0      0 S   2.3  0.0   0:07.30 kworker/u8+
18924 root     20   0    9288   3208   2632 R   0.7  0.1   0:00.37 top        
    3 root      20   0       0      0      0 S   0.3  0.0   0:00.50 ksoftirqd/0
    1 root      20   0    6868   5144   3476 S   0.0  0.1   0:03.90 systemd    
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd   
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:+
    7 root      20   0       0      0      0 S   0.0  0.0   0:01.82 rcu_preempt
    8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_sched
    9 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh     
   10 root      rt   0       0      0      0 S   0.0  0.0   0:00.08 migration/0
   11 root      rt   0       0      0      0 S   0.0  0.0   0:00.14 watchdog/0
   12 root      rt   0       0      0      0 S   0.0  0.0   0:00.11 watchdog/1
   13 root      rt   0       0      0      0 S   0.0  0.0   0:00.07 migration/1
   14 root      20   0       0      0      0 S   0.0  0.0   0:00.03 ksoftirqd/1
   16 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/1:+
   17 root      rt   0       0      0      0 S   0.0  0.0   0:00.13 watchdog/2

Onward and upward, as they say. Next up is the HPCC benchmark, which contains HPL. We install the pts/hpcc test for Phoronix.

root@flotta:~# phoronix-test-suite install-test pts/hpcc

 

Phoronix Test Suite v5.2.1

 

    To Install: pts/hpcc-1.2.0

 

    Determining File Requirements ...........................................

    Searching Download Caches ...............................................

 

    1 Test To Install

        1 File To Download [0.63MB]

        9MB Of Disk Space Is Needed

 

    pts/hpcc-1.2.0:

        Test Installation 1 of 1

        1 File Needed [0.63 MB / 1 Minute]

        Downloading: hpcc-1.5.0.tar.gz                               [0.63MB]

        Estimated Download Time: 1m .........................................

        Installation Size: 9 MB

        Installing Test @ 20:00:53

 

        [NOTICE] Supported install-time optional variables include $MPI_PATH,

        $MPI_INCLUDE, $MPI_CC, $MPI_LIBS, $LA_PATH, $LA_INCLUDE, $LA_LIBS,

        $CFLAGS, $LD_FLAGS, and $MPI_LD

 

        [NOTICE] Supported run-time optional environment variables include

        $N, $NB, $MPI_NUM_THREADS, $HOSTFILE
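The notice above lists $N and $NB among the run-time variables, which for HPL are the problem size and the block size. The post does not say what values were used, so the following is only a sketch of a common rule of thumb: pick N so that an N x N matrix of 8-byte doubles fills some fraction of memory, rounded down to a multiple of NB. The helper name `hpl_n`, the 0.8 fraction and NB=128 are all illustrative assumptions.

```sh
#!/bin/sh
# hpl_n MEM_KIB FRACTION NB
# Hypothetical helper: largest multiple of NB such that an N x N matrix of
# 8-byte doubles fits in FRACTION of MEM_KIB kibibytes of memory.
hpl_n() {
    awk -v m="$1" -v f="$2" -v nb="$3" 'BEGIN {
        n = sqrt(m * 1024 * f / 8)        # bytes available / 8 bytes per double
        printf "%d\n", int(n / nb) * nb   # round down to a multiple of NB
    }'
}

# 3779668 KiB is the "KiB Mem total" figure from the top output earlier.
hpl_n 3779668 0.8 128
```

Going by the notice, the result could presumably be passed in the environment, for example `N=$(hpl_n 3779668 0.8 128) NB=128 phoronix-test-suite benchmark pts/hpcc`. Keep in mind that a larger N means a longer run, and on a passively cooled board that also means more time spent heating up.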

Before starting the HPL run, I put together a quick script to monitor the temperatures during the run. The script simply prints out the values of /sys/class/thermal/thermal_zone[X]/temp, which are reported in millidegrees Celsius, in human-readable format.

#!/bin/sh
# Print the temperature of thermal zones 0-2 every 10 seconds.
# sysfs reports millidegrees Celsius, hence the division by 1000.

while true
do
    echo "$(date) @ $(hostname)"
    echo "-----------------------"
    cpu0=$(cat /sys/class/thermal/thermal_zone0/temp)
    cpu1=$(cat /sys/class/thermal/thermal_zone1/temp)
    cpu2=$(cat /sys/class/thermal/thermal_zone2/temp)
    echo "thermal_zone0 = $((cpu0/1000)) 'C"
    echo "thermal_zone1 = $((cpu1/1000)) 'C"
    echo "thermal_zone2 = $((cpu2/1000)) 'C"
    /bin/sleep 10
done
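The script above hard-codes three zones. A slightly more general variant, sketched here under the assumption that every zone of interest exposes a readable `temp` file, loops over whatever zones exist. `print_zones` is a hypothetical helper name, and the optional directory argument exists only so the function can be pointed at a test directory.

```sh
#!/bin/sh
# print_zones [DIR]: hypothetical helper that prints every readable
# thermal zone under DIR (default: the sysfs thermal directory).
print_zones() {
    dir=${1:-/sys/class/thermal}
    for zone in "$dir"/thermal_zone*; do
        # Skip zones that cannot be read (or an unmatched glob).
        millideg=$(cat "$zone/temp" 2>/dev/null) || continue
        echo "$(basename "$zone") = $((millideg / 1000)) 'C"
    done
}

print_zones
```

Wrapping the call in the same `while true ... sleep 10` loop gives the periodic output used in the rest of this post.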

After starting HPL and monitoring the temperatures, I found that they rapidly climbed to some rather uncomfortable levels, especially given that my ARMADA 8040 board currently has no cooling fan. So after kicking off the initial run (see below), I decided to err on the side of caution and get a fan set up before doing any serious damage to the board.

gsamu@flotta:~$ phoronix-test-suite benchmark pts/hpcc

 

Phoronix Test Suite v5.2.1

 

    Installed: pts/hpcc-1.2.0

 

 

HPC Challenge 1.5.0:

    pts/hpcc-1.2.0

    Processor Test Configuration

        1:  G-HPL

        2:  G-Ptrans

        3:  G-Random Access

        4:  G-Ffte

        5:  EP-STREAM Triad

        6:  EP-DGEMM

        7:  Random Ring Latency

        8:  Random Ring Bandwidth

        9:  Max Ping Pong Bandwidth

        10: Test All Options

        Test / Class: 1

 

 

System Information

 

 

[NOTICE] Undefined: 0 in phodevi_cpu:267

 

[NOTICE] Undefined: 0 in phodevi_cpu:272

Hardware:

Processor: Unknown @ 1.30GHz (4 Cores), Memory: 4096MB, Disk: 8GB 8GME4R

 

Software:

OS: Ubuntu 16.04, Kernel: 4.4.8-armada-17.02.2-g4126e30 (aarch64), Compiler: GCC 5.4.0 20160609, File-System: ext4

 

    Would you like to save these test results (Y/n): n

 

 

HPC Challenge 1.5.0:

    pts/hpcc-1.2.0 [Test / Class: G-HPL]

    Test 1 of 1

    Estimated Trial Run Count:    3

    Estimated Time To Completion: 1 Hour, 28 Minutes

        Started Run 1 @ 20:12:13^C

The above run was aborted when the temperature shown by my monitoring script climbed past 100 °C.

Tue Aug 29 20:23:12 EDT 2017 @ flotta.localdomain

-----------------------

thermal_zone0 = 99 'C

thermal_zone1 = 91 'C

thermal_zone2 = 92 'C

Tue Aug 29 20:23:22 EDT 2017 @ flotta.localdomain

-----------------------

thermal_zone0 = 100 'C

thermal_zone1 = 92 'C

thermal_zone2 = 92 'C

Tue Aug 29 20:23:32 EDT 2017 @ flotta.localdomain

-----------------------

thermal_zone0 = 101 'C

thermal_zone1 = 93 'C

thermal_zone2 = 93 'C

Tue Aug 29 20:23:42 EDT 2017 @ flotta.localdomain

-----------------------

thermal_zone0 = 101 'C

thermal_zone1 = 94 'C

thermal_zone2 = 93 'C

Tue Aug 29 20:23:52 EDT 2017 @ flotta.localdomain

-----------------------

thermal_zone0 = 101 'C

thermal_zone1 = 94 'C

thermal_zone2 = 94 'C
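The run above was stopped by hand once the readings passed 100 °C. That check could also be automated; the sketch below is one way to do it, not what was used for this post. `max_temp_c` and `guard` are hypothetical helper names, and the limit and the 10 second interval are arbitrary choices.

```sh
#!/bin/sh
# max_temp_c [DIR]: print the hottest thermal zone in whole degrees C.
max_temp_c() {
    dir=${1:-/sys/class/thermal}
    max=
    for zone in "$dir"/thermal_zone*; do
        t=$(cat "$zone/temp" 2>/dev/null) || continue
        if [ -z "$max" ] || [ "$t" -gt "$max" ]; then max=$t; fi
    done
    [ -n "$max" ] && echo $((max / 1000))
}

# guard PID LIMIT_C [DIR]: send SIGTERM to PID once any zone reaches LIMIT_C.
# Returns when the limit is hit or when PID has exited on its own.
guard() {
    pid=$1 limit=$2 dir=${3:-/sys/class/thermal}
    while kill -0 "$pid" 2>/dev/null; do
        hot=$(max_temp_c "$dir") || return 1      # no readable zones
        if [ "$hot" -ge "$limit" ]; then
            echo "$(date): $hot 'C >= $limit 'C, terminating $pid"
            kill -TERM "$pid"
            return 0
        fi
        sleep 10
    done
}
```

A typical use would be to start the benchmark in the background and then run `guard $! 95`. One caveat: terminating the parent process does not necessarily stop the child processes it has launched, so a real setup may need to signal the whole process group instead.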

So as I wait for my new cooling fan and Open Benchtable to arrive, I’ll get back to thrashing some good old Intel hardware… Hey, for some real fun, I can disconnect the CPU fans on those ones :)

Gábor out!

UPDATE!!!

Well I decided to press ahead tonight with a run of HPL on my macchiatoBIN board. To monitor the temperature (recall that my current configuration is with passive cooling) I put together a small script to dump the values of the following to a text file during the HPL run:

The run started ok, but I lost contact with the macchiatoBIN after about 55 minutes…when it was on run 2 of HPL:

gsamu@flotta:~$ phoronix-test-suite benchmark pts/hpcc

 

Phoronix Test Suite v5.2.1

 

    Installed: pts/hpcc-1.2.0

 

 

HPC Challenge 1.5.0:

    pts/hpcc-1.2.0

    Processor Test Configuration

        1:  G-HPL

        2:  G-Ptrans

        3:  G-Random Access

        4:  G-Ffte

        5:  EP-STREAM Triad

        6:  EP-DGEMM

        7:  Random Ring Latency

        8:  Random Ring Bandwidth

        9:  Max Ping Pong Bandwidth

        10: Test All Options

        Test / Class: 1

 

 

System Information

 

 

[NOTICE] Undefined: 0 in phodevi_cpu:267

 

[NOTICE] Undefined: 0 in phodevi_cpu:272

Hardware:

Processor: Unknown @ 1.30GHz (4 Cores), Memory: 4096MB, Disk: 8GB 8GME4R

 

Software:

OS: Ubuntu 16.04, Kernel: 4.4.8-armada-17.02.2-g4126e30 (aarch64), Compiler: GCC 5.4.0 20160609, File-System: ext4

 

    Would you like to save these test results (Y/n): n

 

 

HPC Challenge 1.5.0:

    pts/hpcc-1.2.0 [Test / Class: G-HPL]

    Test 1 of 1

    Estimated Trial Run Count:    3

    Estimated Time To Completion: 1 Hour, 28 Minutes

        Started Run 1 @ 19:20:10

        Started Run 2 @ 19:50:34

And I guess this gives the reason (from /var/log/messages)…

Feb  6 20:18:35 flotta kernel: armada_thermal f06f808c.thermal: Overheat critical high threshold temperature reached
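The kernel message refers to a critical threshold. On kernels whose thermal driver registers trip points, the generic thermal sysfs interface exposes them as `trip_point_N_temp` and `trip_point_N_type` files next to `temp`. Whether the armada_thermal driver on this particular kernel does so is an assumption that has not been verified here, so the sketch below simply prints whatever it finds, and prints nothing if there are no trip points. `print_trips` is a hypothetical helper name.

```sh
#!/bin/sh
# print_trips [DIR]: hypothetical helper that lists any trip points exposed
# under DIR (default: the sysfs thermal directory), in whole degrees C.
print_trips() {
    dir=${1:-/sys/class/thermal}
    for f in "$dir"/thermal_zone*/trip_point_*_temp; do
        t=$(cat "$f" 2>/dev/null) || continue            # unmatched glob or unreadable
        kind=$(cat "${f%_temp}_type" 2>/dev/null) || kind=unknown
        zone=$(basename "$(dirname "$f")")
        echo "$zone $kind = $((t / 1000)) 'C"
    done
}

print_trips
```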

Plotting the temperature metrics with gnuplot, we see that we were well into the triple digits. Oh my! At this stage, I should probably stop abusing this poor board and wait until my Noctua industrial fan arrives :)
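The logging script used for the update is not shown in this post. As a rough sketch of what such a logger could look like, the function below writes one line per sample: a Unix timestamp followed by one column per thermal zone, in whole degrees Celsius. `log_zones` and the `temps.log` file name are hypothetical, and `date +%s` is assumed to be available (it is on GNU/Linux).

```sh
#!/bin/sh
# log_zones [DIR]: hypothetical helper that prints one sample line:
# "<unix time> <zone0 C> <zone1 C> ..." for every readable thermal zone.
log_zones() {
    dir=${1:-/sys/class/thermal}
    line=$(date +%s)
    for zone in "$dir"/thermal_zone*; do
        t=$(cat "$zone/temp" 2>/dev/null) || continue
        line="$line $((t / 1000))"
    done
    echo "$line"
}

# Example loop, left commented out so the file can be sourced safely:
# while true; do log_zones >> temps.log; sleep 10; done
```

A whitespace-separated file like this can be plotted in gnuplot with something along the lines of `plot 'temps.log' using 1:2 with lines`, adding further `using 1:N` clauses for the other zones.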