
title: sacct: SLURM user guide

Most Commonly Used sacct Options

The sacct command is a powerful tool for monitoring job status and resource usage on the SLURM-based computing cluster at Institut Pasteur. Below are the most frequently used options, along with explanations and practical examples.


πŸ› οΈ Common sacct Options#

| Option | Description |
|---|---|
| -j <jobid1,jobid2> | Display only the specified jobs (comma-separated job IDs). |
| -D | For requeued jobs, also show the records of earlier runs (by default only the most recent run is shown). |
| -S <starttime> | Select jobs in any state after the given date/time. Accepted formats include mm/dd, mm/dd/yy, hh:mm, and YYYY-MM-DD[THH:MM[:SS]]. Without -S, sacct starts at midnight of the current day (unless -j or --state is used). |
| -E <endtime> | Select jobs in any state before the given date/time. Same formats as -S. |
| --state=<STATE1,STATE2> | Filter jobs by state. Common states: CA (CANCELLED), CD (COMPLETED), F (FAILED), PD (PENDING), R (RUNNING), TO (TIMEOUT). |
| --partition <partition name> | Filter jobs submitted to a specific partition (e.g., dedicated, common). |
| --qos <qos name> | Filter jobs by Quality of Service (e.g., fast, normal). |
| --format=<field1,field2> | Display only specific fields (comma-separated). Common fields: jobid, jobname, user, partition, qos, state, start, end, elapsed, exitcode, ncpus, nodelist, reqmem, maxrss. |
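
The options above can be combined in a single call. As a minimal sketch, the helper below looks up a few specific jobs and prints a compact set of fields. The function name job_summary and the chosen field list are illustrative assumptions, not part of sacct or of the cluster setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name and field list are assumptions):
# summarize one or more jobs given their IDs.
# Usage: job_summary 17131680 17131682
job_summary() {
  if [ "$#" -eq 0 ]; then
    echo "usage: job_summary <jobid> [<jobid> ...]" >&2
    return 1
  fi
  # Join the arguments with commas, the form -j expects (id1,id2,...).
  local ids
  ids=$(IFS=,; echo "$*")
  sacct -j "$ids" \
    --format=jobid,jobname,state,elapsed,exitcode,reqmem,maxrss
}
```

Calling job_summary 17131680 17131682 runs sacct -j 17131680,17131682 with the field list shown.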

📌 Example 1: Today’s Jobs in Dedicated Partition with Fast QoS

Display today's jobs submitted to the dedicated partition with the fast QoS, showing only the relevant fields. With no -S option, sacct reports jobs from midnight of the current day onward, so no date filter is needed. No --state filter is applied, so jobs in every state are listed.

login@maestro-submit ~ $ sacct \
  --partition=dedicated \
  --qos=fast \
  --format=jobid,jobname,user,partition,qos,state,start,end,elapsed,exitcode,ncpus,nodelist

Output:

       JobID    JobName      User  Partition        QOS     State                Start                End    Elapsed  ExitCode      NCPUS        NodeList 
------------ ---------- --------- ---------- ---------- ---------- ------------------- ------------------- ---------- -------- ---------- --------------- 
17131680          myjob     user1  dedicated       fast  COMPLETED 2017-08-10T15:06:59 2017-08-10T15:22:49   00:15:50      0:0         12        maestro-1008 
17131682          myjob     user1  dedicated       fast  COMPLETED 2017-08-10T15:06:59 2017-08-10T15:29:19   00:22:20      0:0         12        maestro-1049 
17131683          myjob     user1  dedicated       fast  COMPLETED 2017-08-10T15:06:59 2017-08-10T15:20:28   00:13:29      0:0         12        maestro-1057 
17131684       otherjob     user1  dedicated       fast     FAILED 2017-08-10T15:06:59 2017-08-10T15:12:38   00:05:39      1:0          2        maestro-1058 
17131685       otherjob     user1  dedicated       fast     FAILED 2017-08-10T15:06:59 2017-08-10T15:21:51   00:14:52      1:0          2        maestro-1059 

📌 Example 2: Failed Jobs Between August 8th and 10th

Find all jobs that failed and ran between August 8th and 10th.

login@maestro-submit ~ $ sacct --state=FAILED -S 08/08 -E 08/10

Output:

       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
16394545         mummer     common       fast          1     FAILED      2:0 
16394545.ba+      batch                  fast          1     FAILED      2:0 
16401727     prokka_14+     common       fast          4     FAILED      2:0 
16401727.ba+      batch                  fast          4     FAILED      2:0 
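
The output above lists each job together with its batch step and truncates long job names. For scripting, sacct can print one line per job allocation (-X), without a header (-n), as pipe-delimited text (-P). The sketch below counts failed jobs per job name from such output; the function name count_failed_by_name is a hypothetical helper, and it assumes the four columns named in the comment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count failed jobs per job name.
# Input (stdin): pipe-delimited lines with columns
#   JobID|JobName|State|ExitCode
count_failed_by_name() {
  awk -F'|' '
    $3 == "FAILED" { n[$2]++ }
    END { for (name in n) printf "%s %d\n", name, n[name] }
  ' | sort
}

# Intended use on the cluster (dates as in Example 2):
#   sacct --state=FAILED -S 08/08 -E 08/10 -X -n -P \
#     --format=jobid,jobname,state,exitcode | count_failed_by_name
```

Keeping the parsing in its own function means it can be checked on saved output before being pointed at a live query.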

📌 Example 3: Failed or Cancelled Jobs (Nov 15–17) – Memory Usage Analysis

Identify jobs that failed or were cancelled between November 15th and 17th, and analyze memory usage.

This helps detect whether jobs exceeded allocated memory:

  • If ExitCode = 15 (shown as 15:0 in the output): SLURM sent SIGTERM (graceful termination).
  • If ExitCode = 9: the OS sent SIGKILL (immediate termination due to memory exhaustion).

login@maestro-submit ~ $ sacct \
  --state=F,CA \
  -S 11/15 \
  -E 11/17 \
  --format=jobid,jobname,partition,qos,state,start,elapsed,end,exitcode,ncpus,nnodes,nodelist,reqmem,maxrss

Output:

      JobID    JobName   Partition        QOS      State               Start    Elapsed                 End ExitCode      NCPUS   NNodes        NodeList     ReqMem     MaxRSS 
------------ ---------- ---------- ---------- ---------- ------------------- ---------- ------------------- -------- ---------- -------- --------------- ---------- ---------- 
38697769     nf-Clean_+     common     normal CANCELLED+ 2019-11-15T10:29:04   00:14:01 2019-11-15T10:43:05      0:0          1        1        maestro-1021        4Gc            
38697769.ba+      batch                           FAILED 2019-11-15T10:29:04   00:14:03 2019-11-15T10:43:07     15:0          1        1        maestro-1021        4Gc   6392132K 
38697769.ex+     extern                        COMPLETED 2019-11-15T10:29:04   00:14:01 2019-11-15T10:43:05      0:0          1        1        maestro-1021        4Gc        84K 
38697782     nf-Clean_+     common     normal CANCELLED+ 2019-11-15T10:29:26   00:13:39 2019-11-15T10:43:05      0:0          1        1        maestro-1020        4Gc            
38697782.ba+      batch                           FAILED 2019-11-15T10:29:26   00:13:40 2019-11-15T10:43:06     15:0          1        1        maestro-1020        4Gc   5257816K 
...

πŸ” Key Observations:#

  • MaxRSS > ReqMem: Indicates memory overrun (e.g., 5.2G > 4G).
  • ExitCode = 15: SLURM killed the job after sending SIGTERM.
  • ExitCode = 9: OS killed the job immediately due to lack of memory.
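
The MaxRSS versus ReqMem comparison can be automated. The sketch below reads pipe-delimited sacct output and prints the job steps whose MaxRSS is above the request. It rests on several assumptions: the function name flag_memory_overruns is hypothetical, the K/M/G/T suffixes are treated as powers of 1024, and a trailing c or n on ReqMem (as in the 4Gc shown above) is read as per-CPU or per-node. MaxRSS is a sampled peak, so treat the result as an indicator rather than an exact measurement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print job steps whose MaxRSS exceeds the request.
# Input (stdin): pipe-delimited lines with columns
#   JobID|NCPUS|NNodes|ReqMem|MaxRSS
flag_memory_overruns() {
  awk -F'|' '
    # Convert a size such as "5257816K" or "4G" to KiB.
    # Returns -1 when the unit suffix is not recognized.
    function to_kib(s,    unit, num) {
      unit = substr(s, length(s), 1)
      num = s + 0                      # numeric prefix of the string
      if (unit == "K") return num
      if (unit == "M") return num * 1024
      if (unit == "G") return num * 1024 * 1024
      if (unit == "T") return num * 1024 * 1024 * 1024
      return -1
    }
    $5 != "" {
      req = $4; mult = 1
      # Assumption: a trailing c means per CPU, a trailing n means per node.
      if (req ~ /c$/)      { mult = $2; sub(/c$/, "", req) }
      else if (req ~ /n$/) { mult = $3; sub(/n$/, "", req) }
      limit = to_kib(req) * mult
      used  = to_kib($5)
      if (limit > 0 && used > limit)
        printf "%s used %.1f GiB, requested %.1f GiB\n", $1, used / 1048576, limit / 1048576
    }
  '
}

# Intended use on the cluster (dates as in Example 3):
#   sacct --state=F,CA -S 11/15 -E 11/17 -n -P \
#     --format=jobid,ncpus,nnodes,reqmem,maxrss | flag_memory_overruns
```

Lines with an empty MaxRSS (the job-level records in the output above) are skipped, so only steps that actually report memory usage are compared.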

✅ Recommendation: Increase --mem or --mem-per-cpu in your job script if memory is frequently exceeded.


πŸ“ Tips#

  • Use --format to reduce clutter and focus on essential data.
  • Combine --state, --partition, --qos, and date filters for precise queries.
  • Always check MaxRSS vs ReqMem when diagnosing job failures.

📌 Pro Tip: Save frequently used sacct commands in a shell alias or script for faster access!

💡 Example alias:

alias sacct_failed='sacct --state=FAILED --format=jobid,jobname,state,exitcode,reqmem,maxrss'
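
An alias cannot take arguments in the middle of the command, so a shell function is a better fit when the date range changes from one call to the next. The sketch below is one possible variant; the function name sacct_failed_between is a hypothetical choice, and the field list simply mirrors the alias above.

```shell
#!/usr/bin/env bash
# Hypothetical function: list failed jobs within a date range.
# Usage: sacct_failed_between 08/08 08/10
sacct_failed_between() {
  if [ "$#" -ne 2 ]; then
    echo "usage: sacct_failed_between <start> <end>" >&2
    return 1
  fi
  sacct --state=FAILED -S "$1" -E "$2" \
    --format=jobid,jobname,state,exitcode,reqmem,maxrss
}
```

Placed in ~/.bashrc, it is available in every new login shell on the submit host.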


🚀 Mastering sacct helps you debug, optimize, and manage your jobs efficiently on the cluster!