title: sacct: SLURM user guide#

Most Commonly Used `sacct` Options#

The `sacct` command reports accounting data (state, timing, and resource usage) for jobs on the SLURM-based computing cluster at Institut Pasteur. The most frequently used options are listed below, followed by practical examples.

🛠️ Common `sacct` Options#

Option	Description
`-j <jobid1,jobid2>`	Display the state of specific jobs only (comma-separated job IDs).
`-D`	For jobs that have been requeued, show information from the first run as well.
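`-S <starttime>`	Select jobs that were in any state after the given date/time (default: midnight of the current day). Accepted formats include `MM/DD[/YY]`, `HH:MM[:SS]`, and `YYYY-MM-DD[THH:MM[:SS]]`.
`-E <endtime>`	Select jobs that were in any state before the given date/time (default: now). Same formats as `-S`.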
`--state=<STATE1,STATE2>`	Filter jobs by state. Common states: • `CA` or `CANCELLED` • `CD` or `COMPLETED` • `F` or `FAILED` • `PD` or `PENDING` • `R` or `RUNNING` • `TO` or `TIMEOUT`
`--partition <partition name>`	Filter jobs submitted to a specific partition (e.g., `dedicated`, `common`).
`--qos <qos name>`	Filter jobs by Quality of Service (e.g., `fast`, `normal`).
`--format=<field1,field2>`	Display only specific fields (comma-separated, case-insensitive). Common fields: `jobid`, `jobname`, `user`, `partition`, `qos`, `state`, `start`, `end`, `elapsed`, `exitcode`, `ncpus`, `nodelist`, `reqmem`, `maxrss`.
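As an illustration of combining `-j` with `--format`, the sketch below wraps the two options in a small shell function. The function name `sacct_jobs` and the `SACCT` override variable are hypothetical conveniences, not part of SLURM; the field list is only one reasonable choice.

```shell
# Hypothetical helper (not part of SLURM): query several jobs in one call.
# Set SACCT=echo to preview the command line instead of running sacct.
sacct_jobs() (
    IFS=,   # inside this subshell, "$*" joins the job IDs with commas
    "${SACCT:-sacct}" -j "$*" --format=jobid,jobname,state,elapsed,exitcode
)

# Example: sacct_jobs 17131680 17131682 17131684
```

The function body runs in a subshell, so changing `IFS` there does not affect the calling shell.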

📌 Example 1: Today’s Jobs in Dedicated Partition with Fast QoS#

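Display today's jobs submitted to the `dedicated` partition with the `fast` QoS, showing only the relevant fields. With no `-S` option, `sacct` reports jobs since midnight of the current day. No `--state` filter is applied here, so jobs in every state (including `FAILED`) are listed.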

login@maestro-submit ~ $ sacct \
  --partition=dedicated \
  --qos=fast \
  --format=jobid,jobname,user,partition,qos,state,start,end,elapsed,exitcode,ncpus,nodelist

Output:#

       JobID    JobName      User  Partition        QOS     State                Start                End    Elapsed  ExitCode      NCPUS        NodeList 
------------ ---------- --------- ---------- ---------- ---------- ------------------- ------------------- ---------- -------- ---------- --------------- 
17131680          myjob     user1  dedicated       fast  COMPLETED 2017-08-10T15:06:59 2017-08-10T15:22:49   00:15:50      0:0         12        maestro-1008 
17131682          myjob     user1  dedicated       fast  COMPLETED 2017-08-10T15:06:59 2017-08-10T15:29:19   00:22:20      0:0         12        maestro-1049 
17131683          myjob     user1  dedicated       fast  COMPLETED 2017-08-10T15:06:59 2017-08-10T15:20:28   00:13:29      0:0         12        maestro-1057 
17131684       otherjob     user1  dedicated       fast     FAILED 2017-08-10T15:06:59 2017-08-10T15:12:38   00:05:39      1:0          2        maestro-1058 
17131685       otherjob     user1  dedicated       fast     FAILED 2017-08-10T15:06:59 2017-08-10T15:21:51   00:14:52      1:0          2        maestro-1059

📌 Example 2: Failed Jobs Between August 8th and 10th#

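List jobs in the `FAILED` state within the window from August 8th to August 10th. Without `--format`, `sacct` prints its default columns, and each job is followed by its steps (for example `.batch`, truncated to `.ba+` in the output).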

login@maestro-submit ~ $ sacct --state=FAILED -S 08/08 -E 08/10

Output:#

       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
16394545         mummer     common       fast          1     FAILED      2:0 
16394545.ba+      batch                  fast          1     FAILED      2:0 
16401727     prokka_14+     common       fast          4     FAILED      2:0 
16401727.ba+      batch                  fast          4     FAILED      2:0
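When a query like this returns many rows, a per-name summary can be easier to read. The sketch below assumes the output was produced with `--noheader --parsable2` (pipe-delimited, no header line) and with the fields `jobid,jobname,state`; the function name `count_failed_by_name` is hypothetical.

```shell
# count_failed_by_name: read "JobID|JobName|State" lines on stdin and print
# "count name" per job name, highest count first.
# Job steps (IDs containing a dot, such as 16394545.batch) are skipped.
count_failed_by_name() {
    awk -F'|' '$1 !~ /\./ { n[$2]++ }
               END { for (k in n) print n[k], k }' |
        sort -k1,1nr -k2
}

# Intended use (requires sacct):
#   sacct --state=FAILED -S 08/08 -E 08/10 --noheader --parsable2 \
#         --format=jobid,jobname,state | count_failed_by_name
```

With `--parsable2`, job names are not truncated, so the counts group on the full name rather than on an abbreviated form such as `prokka_14+`.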

📌 Example 3: Failed or Cancelled Jobs (Nov 15–17) – Memory Usage Analysis#

Identify jobs that failed or were cancelled between November 15th and 17th, and analyze memory usage.

The `ExitCode` column (shown as `exit code:signal`) helps tell whether a job was stopped for exceeding its allocated memory:

ExitCode = 15: SLURM sent SIGTERM (signal 15, graceful termination).
ExitCode = 9: the OS sent SIGKILL (signal 9, immediate termination due to memory exhaustion).

login@maestro-submit ~ $ sacct \
  --state=F,CA \
  -S 11/15 \
  -E 11/17 \
  --format=jobid,jobname,partition,qos,state,start,elapsed,end,exitcode,ncpus,nnodes,nodelist,reqmem,maxrss

Output:#

      JobID    JobName   Partition        QOS      State               Start    Elapsed                 End ExitCode      NCPUS   NNodes        NodeList     ReqMem     MaxRSS 
------------ ---------- ---------- ---------- ---------- ------------------- ---------- ------------------- -------- ---------- -------- --------------- ---------- ---------- 
38697769     nf-Clean_+     common     normal CANCELLED+ 2019-11-15T10:29:04   00:14:01 2019-11-15T10:43:05      0:0          1        1        maestro-1021        4Gc            
38697769.ba+      batch                           FAILED 2019-11-15T10:29:04   00:14:03 2019-11-15T10:43:07     15:0          1        1        maestro-1021        4Gc   6392132K 
38697769.ex+     extern                        COMPLETED 2019-11-15T10:29:04   00:14:01 2019-11-15T10:43:05      0:0          1        1        maestro-1021        4Gc        84K 
38697782     nf-Clean_+     common     normal CANCELLED+ 2019-11-15T10:29:26   00:13:39 2019-11-15T10:43:05      0:0          1        1        maestro-1020        4Gc            
38697782.ba+      batch                           FAILED 2019-11-15T10:29:26   00:13:40 2019-11-15T10:43:06     15:0          1        1        maestro-1020        4Gc   5257816K 
...

🔍 Key Observations:#

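MaxRSS > ReqMem: the step used more memory than was requested (e.g., 5257816K ≈ 5.0G used against `4Gc`, i.e. 4G per CPU with 1 CPU).
ExitCode = 15: SLURM terminated the job with SIGTERM, as in the `.batch` steps above.
ExitCode = 9: the OS killed the job immediately with SIGKILL because memory ran out.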

✅ Recommendation: Increase --mem or --mem-per-cpu in your job script if memory is frequently exceeded.
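The MaxRSS versus ReqMem comparison can also be scripted. The following is a minimal sketch, assuming values carry a `K`, `M`, `G`, or `T` suffix with binary multiples, and that a trailing `c` (per CPU) or `n` (per node) on ReqMem can be ignored, which only holds for single-CPU, single-node jobs like the ones shown above. The function names `to_kib` and `exceeds_request` are hypothetical.

```shell
# to_kib VALUE: convert a memory value such as 6392132K, 4G or 4Gc to KiB.
# Fails (non-zero status) when the value is empty or has no known suffix.
to_kib() {
    printf '%s\n' "$1" | awk '{
        v = $1
        sub(/[cn]$/, "", v)              # drop per-CPU / per-node marker
        unit = substr(v, length(v))      # last character is the unit
        if      (unit == "K") f = 1
        else if (unit == "M") f = 1024
        else if (unit == "G") f = 1024 * 1024
        else if (unit == "T") f = 1024 * 1024 * 1024
        else exit 1
        printf "%.0f\n", (v + 0) * f     # v + 0 keeps the numeric prefix
    }'
}

# exceeds_request MAXRSS REQMEM: succeed if MaxRSS is larger than ReqMem.
# Returns 2 when either value cannot be parsed (e.g. an empty MaxRSS).
exceeds_request() {
    local used req
    used=$(to_kib "$1") || return 2
    req=$(to_kib "$2") || return 2
    [ "$used" -gt "$req" ]
}

# Example with values from the output above:
#   exceeds_request 5257816K 4Gc && echo "memory overrun"
```

For multi-CPU jobs with a per-CPU request, the ReqMem value would first need to be multiplied by the CPU count, which this sketch does not do.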

📝 Tips#

Use --format to reduce clutter and focus on essential data.
Combine --state, --partition, --qos, and date filters for precise queries.
Always check MaxRSS vs ReqMem when diagnosing job failures.

📌 Pro Tip: Save frequently used sacct commands in a shell alias or script for faster access!

💡 Example alias:

alias sacct_failed='sacct --state=FAILED --format=jobid,jobname,state,exitcode,reqmem,maxrss'
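An alias cannot validate its arguments, so a shell function may be more convenient when a start date is always required. The sketch below is one possible variant; the name `sacct_failed_since` and the `SACCT` override variable are hypothetical and exist only for illustration and dry runs.

```shell
# sacct_failed_since START: list failed jobs since START.
# START uses any format accepted by -S, e.g. 08/08.
# Set SACCT=echo to preview the command line instead of running sacct.
sacct_failed_since() {
    if [ "$#" -ne 1 ]; then
        echo "usage: sacct_failed_since START" >&2
        return 2
    fi
    "${SACCT:-sacct}" --state=FAILED -S "$1" \
        --format=jobid,jobname,state,exitcode,reqmem,maxrss
}

# Example: sacct_failed_since 08/08
```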

🚀 Mastering sacct helps you debug, optimize, and manage your jobs efficiently on the cluster!
