hpc.social


This is a crosspost from Blogs on Technical Computing Goulash. See the original post here.

LSF hookin' up with the CRIU

With the unpredictable spring weather here in Southern Ontario, weekend projects are the order of the day. Whether it’s fixing my bike for spring, repairing things in the home which I’ve neglected for far too long, or tackling IT topics which have been percolating in my head, I am a textbook busybody.

A few decades back, when I was a support engineer at Platform Computing, I had my first experience working with clients using both kernel-level and user-level checkpoint and restart through the HPC workload scheduler Platform LSF (now IBM Spectrum LSF). I distinctly recall that the user-level library was a bit tricky, as you had to link your homegrown code against it, and it had numerous limitations which I can’t recall off the top of my head. Then, as now, LSF provided a number of ways for administrators to extend its capabilities using plug-ins. Checkpoint and restart is one example where plug-ins can be used. More about this later.

I’ve been keeping an eye on the project known as CRIU for some time. CRIU, which stands for Checkpoint/Restore In Userspace, provides checkpoint and restart functionality on Linux. I thought it might be an interesting weekend project to integrate CRIU with LSF. As it turns out, I was not blazing any trails here, as I found that others are already using CRIU with LSF today. Nevertheless, I decided to give it a try.

My system of choice for this tinkering was a dual-socket POWER9-based system running CentOS Stream 8 and IBM Spectrum LSF Suite for HPC v10.2.0.12. The LSF online documentation contains the specifications of the LSF plugins for checkpoint and restart. The plugins are known as echkpnt and erestart, where the “e” denotes external.

Here is a quick rundown on the steps to integrate CRIU with LSF.

# uname -a
Linux kilenc 4.18.0-373.el8.ppc64le #1 SMP Tue Mar 22 15:28:39 UTC 2022 ppc64le ppc64le ppc64le GNU/Linux

# criu

Usage:
  criu dump|pre-dump -t PID [<options>]
  criu restore [<options>]
  criu check [--feature FEAT]
  criu page-server
  criu service [<options>]
  criu dedup
  criu lazy-pages -D DIR [<options>]

Commands:
  dump           checkpoint a process/tree identified by pid
  pre-dump       pre-dump task(s) minimizing their frozen time
  restore        restore a process/tree
  check          checks whether the kernel support is up-to-date
  page-server    launch page server
  service        launch service
  dedup          remove duplicates in memory dump
  cpuinfo dump   writes cpu information into image file
  cpuinfo check  validates cpu information read from image file

Try -h|--help for more info

The sudoers entry below lets the user gsamu run criu without a password:

gsamu   ALL=NOPASSWD:/usr/sbin/criu

The key for the echkpnt.criu script is to build out the list of PIDs for the job in question. For this I used an inelegant approach - simply scraping the output of the LSF bjobs -l command. This list of PIDs is then used as arguments to the criu dump command. The example echkpnt.criu script is included below.
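The PID-scraping step could look something like the following. This is a minimal sketch, not the original echkpnt.criu script. It assumes that the resource usage section of the bjobs -l output contains a line of the form "PGID: <pgid>;  PIDs: <pid> <pid> ...", and the function name extract_pids is hypothetical.

```shell
#!/bin/sh
# Sketch only: pull the process IDs out of "bjobs -l" output read from stdin.
# Assumption: the output contains a line such as
#    PGID: 24466;  PIDs: 24466 24467 24469
extract_pids() {
    # Print every purely numeric field that follows a "PIDs:" label,
    # one PID per line, stopping at the first non-numeric field.
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i == "PIDs:") {
                for (j = i + 1; j <= NF && $j ~ /^[0-9]+$/; j++) print $j
            }
        }
    }'
}

# Possible use inside a checkpoint script (not executed here):
#   pids=$(bjobs -l "$LSB_JOBID" | extract_pids)
#   The resulting PIDs would then be handed to "criu dump -t PID",
#   run through sudo as permitted by the sudoers entry.
```

Scraping command output in this way is fragile: a change in the bjobs -l layout, or a PID list long enough to wrap onto a second line, would need extra handling.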

I used a simple approach as well for erestart.criu. As per the specification for erestart, the key is to create a new LSF jobfile which contains the appropriate criu restore invocation, pointing to the checkpoint data. The example erestart.criu script is included below.
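To illustrate the idea of generating a file that carries the restore command, here is a minimal sketch. It is not the original erestart.criu script and it does not implement the full erestart specification. The function name write_restore_script and its two arguments are hypothetical, and the use of the criu options -D (image directory) and --shell-job is an assumption about what a job started from a shell would need.

```shell
#!/bin/sh
# Sketch only: write a small script that restores a checkpointed job.
#   $1 = directory holding the CRIU image files (hypothetical argument)
#   $2 = path of the script to generate (hypothetical argument)
write_restore_script() {
    images_dir=$1
    out_file=$2
    # The unquoted heredoc delimiter lets $images_dir expand now, so the
    # generated file contains the literal image directory path.
    cat > "$out_file" <<EOF
#!/bin/sh
# Generated sketch: restore the process tree from the CRIU image directory.
exec sudo /usr/sbin/criu restore -D "$images_dir" --shell-job
EOF
    chmod +x "$out_file"
}
```

Calling write_restore_script with the checkpoint directory of a job and a target path produces an executable file whose only task is to hand control to criu restore.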

$ bsub -k "/home/gsamu/checkpoint_data method=criu" ./criu_test
Job <12995> is submitted to default queue <normal>.
$ bpeek 12995
<< output from stdout >>
0: Sleeping for three seconds ...
1: Sleeping for three seconds ...
2: Sleeping for three seconds ...
3: Sleeping for three seconds ...
4: Sleeping for three seconds ...
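With the job running, a checkpoint and a later restart would typically be driven with the LSF commands bchkpnt and brestart. The wrapper below is a hedged sketch of that cycle: the function names and the RUN dry-run variable are hypothetical, and the job ID and checkpoint directory are simply the ones from the submission above.

```shell
#!/bin/sh
# Sketch only. Set RUN=echo to print the commands instead of running them.

# Checkpoint a job and kill it once the checkpoint has been taken (-k).
checkpoint_job() {
    job_id=$1
    ${RUN:-} bchkpnt -k "$job_id"
}

# Restart a job from its checkpoint directory. LSF submits the restarted
# job under a new job ID.
restart_job() {
    chkpnt_dir=$1
    job_id=$2
    ${RUN:-} brestart "$chkpnt_dir" "$job_id"
}

# Example (not executed here):
#   checkpoint_job 12995
#   ... wait until bjobs shows that job 12995 has exited ...
#   restart_job /home/gsamu/checkpoint_data 12995
```

The two steps are kept as separate functions because bchkpnt returns before the checkpoint has finished, so the restart should only be issued once the original job has exited. If the restart works, bpeek on the new job should show the counter continuing from where the checkpoint was taken rather than starting again at 0.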

We’ve demonstrated how one can integrate CRIU checkpoint and restart with IBM Spectrum LSF using the echkpnt and erestart interfaces. As highlighted earlier, LSF provides a number of plugin interfaces which give flexibility to organizations looking to do site-specific customizations.