
Summarising job performance

[Image: energy-efficiency-style rating label – providing a rating to jobs]

Recently we implemented an automatic job summary, printed at the end of every user job that runs on our supercomputer at Cardiff University. These job summaries help users understand what impact a job has had and how to improve future jobs – just like energy efficiency ratings. This post will dissect the job summary and see what information is available to us.

What is a “job summary”?

A job summary is an attempt to report to the user how their job ran on the system. It uses the PBS Pro epilogue feature, which allows commands to be run at the end of every job. For example, the following is a job summary for an MPI job.

--------------------------------------------------------------------------------
-------------------------- JOB SUMMARY INFORMATION -----------------------------
--------------------------------------------------------------------------------

        ---------------
        PBS Information
        ---------------
        Job id         = 81078.raven0
        User           = sistg1
        Group          = PMITS-All
        Job name       = DLPOLY4
        Session id     = 35103
        Req. resources = mem=126gb,ncpus=64,place=scatter:excl,walltime=00:10:00
        Used resources = cpupercent=1551,cput=01:03:23,mem=2124768kb,ncpus=64,vmem=6695236kb,walltime=00:04:07
        Queue          = workq_haswell
        Account        = null
        Exit status    = 0

        -----------------
        Usage Information
        -----------------
        Number of CPUS       = 64
        Average CPU usage    = 1551.00%
        Mem usage (kb)       = 2124768kb
        CPU time  (hh:mm:ss) = 01:03:23
        Wallclock (hh:mm:ss) = 00:04:07
        Est. efficiency      = 24.06%

        ------------------
        Lustre Information
        ------------------
        + Metadata information +
mdt.scratch-MDT0000.job_stats:
-       close: {samples: 294, unit: reqs}
        getattr: {samples: 58, unit: reqs}
        getxattr: {samples: 2, unit: reqs}
        mkdir: {samples: 1, unit: reqs}
        open: {samples: 294, unit: reqs}
        setattr: {samples: 22, unit: reqs}
        setxattr: {samples: 4, unit: reqs}
        statfs: {samples: 98, unit: reqs}
        unlink: {samples: 8, unit: reqs}

        + Storage information + 
obdfilter.scratch-OST0000.job_stats:
-       punch: {samples: 2, unit: reqs}
        setattr: {samples: 1, unit: reqs}
        write: {max: 332, min: 332, samples: 1, sum: 332, unit: bytes}
obdfilter.scratch-OST0001.job_stats:
-       punch: {samples: 5, unit: reqs}
        read: {max: 1048576, min: 4096, samples: 1298, sum: 401936384,
                unit: bytes}
        setattr: {samples: 2, unit: reqs}
        write: {max: 1048576, min: 188, samples: 873, sum: 725154346,
                unit: bytes}
obdfilter.scratch-OST0002.job_stats:
-       punch: {samples: 2, unit: reqs}
        write: {max: 20839, min: 5176, samples: 2, sum: 26015,
                unit: bytes}
obdfilter.scratch-OST0003.job_stats:
-       punch: {samples: 7, unit: reqs}
obdfilter.scratch-OST0004.job_stats:
-       punch: {samples: 2, unit: reqs}
        setattr: {samples: 1, unit: reqs}
        write: {max: 1045064, min: 129992, samples: 124, sum: 127946776,
                unit: bytes}
obdfilter.scratch-OST0005.job_stats:
-       read: {max: 4096, min: 4096, samples: 446, sum: 1826816,
                unit: bytes}
        write: {max: 1048576, min: 12288, samples: 738, sum: 583733248,
                unit: bytes}
obdfilter.scratch-OST0006.job_stats:
-       punch: {samples: 7, unit: reqs}
        read: {max: 4096, min: 4096, samples: 452, sum: 1851392,
                unit: bytes}
        setattr: {samples: 1, unit: reqs}
        write: {max: 1048576, min: 24576, samples: 865, sum: 713719621,
                unit: bytes}
obdfilter.scratch-OST0007.job_stats:
-       read: {max: 1048576, min: 4096, samples: 650, sum: 373231616,
                unit: bytes}
        setattr: {samples: 2, unit: reqs}
        write: {max: 1048576, min: 1048576, samples: 134, sum: 140509184,
                unit: bytes}
obdfilter.scratch-OST0008.job_stats:
-       punch: {samples: 2, unit: reqs}
        write: {max: 1045064, min: 207688, samples: 123, sum: 127111312,
                unit: bytes}
obdfilter.scratch-OST0009.job_stats:
-       punch: {samples: 7, unit: reqs}
        write: {max: 20839, min: 1783, samples: 2, sum: 22622,
                unit: bytes}
obdfilter.scratch-OST000b.job_stats:
-       read: {max: 4096, min: 4096, samples: 450, sum: 1843200,
                unit: bytes}
        write: {max: 1048576, min: 49152, samples: 734, sum: 584508781,
                unit: bytes}

--------------------------------------------------------------------------------
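
The summary itself is simply whatever the epilogue script prints. A minimal sketch of how such an epilogue could be written is below – note this is hypothetical: the ten positional arguments follow the TORQUE-style epilogue convention and may differ on other PBS variants, so check your scheduler's documentation before relying on the order.

```shell
#!/bin/sh
# Sketch of a PBS epilogue that prints a job summary block.
# Hypothetical argument order (TORQUE-style convention):
#   $1 job id, $2 user, $3 group, $4 job name, $5 session id,
#   $6 requested resources, $7 used resources, $8 queue,
#   $9 account, $10 exit status.

print_summary() {
    jobid=$1; user=$2; group=$3; jobname=$4
    req=$6; used=$7; queue=$8; exitstatus=${10}
    cat <<EOF
--------------------------------------------------------------------------------
-------------------------- JOB SUMMARY INFORMATION -----------------------------
--------------------------------------------------------------------------------
        Job id         = $jobid
        User           = $user
        Group          = $group
        Job name       = $jobname
        Req. resources = $req
        Used resources = $used
        Queue          = $queue
        Exit status    = $exitstatus
EOF
}

# PBS runs the epilogue with the job's details as positional arguments;
# anything printed here is appended to the job's output file.
print_summary "$@"
```

Because the epilogue's standard output is appended to the job's output file, the block above lands right at the end of the job output, where the user will see it.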

The job summary is quite verbose but does contain some useful information. Let's look at each section.

  • PBS Information – contains the raw output reported by the PBS job scheduler. This should all be fairly familiar to anyone who has written a PBS Pro job script. The most useful fields are things such as the “Job id”, which can be quoted if you need to report a problem with a job.
  • Usage Information – the PBS information is used to calculate some useful statistics. These include the number of CPUs used, the average CPU usage (a running average using a weighted sum of CPU usage within an interval), memory usage (using a special PBS Pro method of measuring memory usage across nodes), and the estimated efficiency (CPU time divided by the number of CPUs and then by the wallclock time). The estimated efficiency and memory usage are the most useful statistics for monitoring job performance.
  • Lustre Information – Lustre has an option to monitor IO requests on behalf of a job scheduler, so PBS Pro jobs can see which Lustre resources were used. This is the most difficult information to summarise, and at the moment it is a bit too verbose, but we are working on improving this (things such as caching can make the numbers look even more confusing). A good guide on efficient Lustre usage can be found here. The key statistics are:
    • Metadata Information – shows how efficiently your job handles files. When a file is opened or closed, these metadata operations tend to act as a barrier in any parallel filesystem, since all Lustre nodes need to know the latest information about the file's state. Try to keep these operations to a minimum; if there are many opens and closes (in the 1000s), think about how the job handles its files – perhaps read or write the data in chunks rather than opening and closing a file every time an operation is performed.
    • Storage Information – shows where and how much data has been transferred across all the Lustre storage devices (this is why it is a parallel filesystem). The main things to check are the read and write values. The “sum” is the total amount of data transferred, with “min” and “max” showing the range of chunk sizes transferred within an IO request. On a Lustre system the “min” and “max” should be fairly high, since writing small chunks of data (less than 1 MB) is inefficient. However, too high a “max” value can also be detrimental, so it should be limited to around 32 MB.
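
The estimated efficiency can be reproduced from the summary's own figures. A quick sketch using the example job above (cput=01:03:23, ncpus=64, walltime=00:04:07):

```shell
#!/bin/sh
# Estimated efficiency = CPU time / (number of CPUs * wallclock time),
# using the figures from the example job summary above.

# Convert a hh:mm:ss value to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ print $1*3600 + $2*60 + $3 }'
}

cput=$(to_seconds 01:03:23)   # 3803 s of CPU time across all ranks
wall=$(to_seconds 00:04:07)   # 247 s of elapsed time
ncpus=64

awk -v c="$cput" -v n="$ncpus" -v w="$wall" \
    'BEGIN { printf "Est. efficiency = %.2f%%\n", 100 * c / (n * w) }'
# prints: Est. efficiency = 24.06%
```

This matches the 24.06% reported in the summary: the job kept its 64 CPUs busy for only about a quarter of the elapsed time.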

“Job summary” summary

This is a new feature in our PBS jobs and we welcome feedback. We plan to add more information as the feature evolves. For example, we ask users to set a walltime in their jobs; if the difference between the requested walltime and the actual walltime is too great, we could add an advice section offering some quick wins for running the job in future. This advice section could be expanded to include information about Lustre performance (which is too detailed at the moment). As mentioned at the beginning, electrical goods are rated on energy efficiency – maybe PBS jobs should be graded on meeting certain recommended practices!
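
As one idea for condensing the Lustre storage section, the per-OST “sum” fields could be totalled into a single read and write figure. A sketch that does this over the job summary text (the parsing assumes the job_stats layout shown above, where each read/write line carries a "sum:" field):

```shell
#!/bin/sh
# Total the bytes read or written across all OSTs from job summary text.
# A sketch: assumes the YAML-ish job_stats layout shown in the summary
# above, where the "sum:" field sits on the same line as "read:"/"write:".

total_bytes() {   # $1 = operation ("read" or "write"); summary text on stdin
    awk -v op="$1" '
        $1 == op":" || $2 == op":" {   # stat line, possibly "-" prefixed
            for (i = 1; i <= NF; i++)
                if ($i == "sum:")
                    total += $(i + 1)  # awk ignores the trailing comma
        }
        END { printf "%d\n", total + 0 }'
}

# Example with two of the OST write entries from the summary above:
sample='        write: {max: 332, min: 332, samples: 1, sum: 332, unit: bytes}
        write: {max: 1048576, min: 188, samples: 873, sum: 725154346,
                unit: bytes}'

echo "$sample" | total_bytes write    # prints 725154678
```

Across the whole example summary this collapses eleven OST stanzas into two numbers – roughly 3.0 GB written and 0.8 GB read – which is far easier for a user to act on.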
