title: sacct: SLURM user guide#
Most Commonly Used sacct Options#
The sacct command is a powerful tool for monitoring job status and resource usage on the SLURM-based computing cluster at Institut Pasteur. Below are the most frequently used options, along with explanations and practical examples.
Common sacct Options#
Option | Description |
---|---|
-j <jobid1,jobid2> | Display the state of specific jobs only (comma-separated job IDs). |
-D | For jobs that have been requeued, show information from the first run as well. |
-S <starttime> | Show jobs started after the given date/time. Format: mm/dd, mm/dd/yy, or hh:mm. |
-E <endtime> | Show jobs ended before the given date/time. Same format as -S. |
--state=<STATE1,STATE2> | Filter jobs by state. Common states: CA or CANCELLED, CD or COMPLETED, F or FAILED, PD or PENDING, R or RUNNING, TO or TIMEOUT. |
--partition <partition name> | Filter jobs submitted to a specific partition (e.g., dedicated, common). |
--qos <qos name> | Filter jobs by Quality of Service (e.g., fast, normal). |
--format=<field1,field2> | Display only specific fields (comma-separated). Common fields: jobid, jobname, user, partition, qos, state, start, end, elapsed, exitcode, ncpus, nodelist, reqmem, maxrss. |
Example 1: Today's Jobs in Dedicated Partition with Fast QoS#
Display all jobs from today submitted to the dedicated partition with fast QoS, showing only relevant fields. No --state filter is applied, so jobs in every state are listed.
login@maestro-submit ~ $ sacct \
--partition=dedicated \
--qos=fast \
--format=jobid,jobname,user,partition,qos,state,start,end,elapsed,exitcode,ncpus,nodelist
Output:#
       JobID    JobName      User  Partition        QOS      State               Start                 End    Elapsed ExitCode      NCPUS        NodeList
------------ ---------- --------- ---------- ---------- ---------- ------------------- ------------------- ---------- -------- ---------- ---------------
17131680          myjob     user1  dedicated       fast  COMPLETED 2017-08-10T15:06:59 2017-08-10T15:22:49   00:15:50      0:0         12    maestro-1008
17131682          myjob     user1  dedicated       fast  COMPLETED 2017-08-10T15:06:59 2017-08-10T15:29:19   00:22:20      0:0         12    maestro-1049
17131683          myjob     user1  dedicated       fast  COMPLETED 2017-08-10T15:06:59 2017-08-10T15:20:28   00:13:29      0:0         12    maestro-1057
17131684       otherjob     user1  dedicated       fast     FAILED 2017-08-10T15:06:59 2017-08-10T15:12:38   00:05:39      1:0          2    maestro-1058
17131685       otherjob     user1  dedicated       fast     FAILED 2017-08-10T15:06:59 2017-08-10T15:21:51   00:14:52      1:0          2    maestro-1059
Example 2: Failed Jobs Between August 8th and 10th#
Find all jobs that failed and ran between August 8th and 10th.
login@maestro-submit ~ $ sacct --state=FAILED -S 08/08 -E 08/10
Output:#
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
16394545         mummer     common       fast          1     FAILED      2:0
16394545.ba+      batch                  fast          1     FAILED      2:0
16401727     prokka_14+     common       fast          4     FAILED      2:0
16401727.ba+      batch                  fast          4     FAILED      2:0
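The listing above mixes each job record with its batch step. sacct's -X (--allocations) flag restricts the output to the job allocation itself. When post-processing saved output instead, a filter along these lines could do the same. This is a minimal sketch, assuming pipe-separated input such as that produced by `sacct -n -P --format=jobid,state,exitcode`; `drop_job_steps` is a hypothetical helper name, and the sample rows are adapted from the example above.

```shell
# Hypothetical helper: keep only top-level job records.
# Step IDs contain a dot (e.g. 16394545.batch), plain job IDs do not.
drop_job_steps() {
    awk -F'|' 'index($1, ".") == 0'
}

# Sample rows adapted from the output above (JobID|State|ExitCode).
printf '%s\n' \
    '16394545|FAILED|2:0' \
    '16394545.batch|FAILED|2:0' \
    '16401727|FAILED|2:0' \
    '16401727.batch|FAILED|2:0' | drop_job_steps
```

Only the two lines without a step suffix are printed.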
Example 3: Failed or Cancelled Jobs (Nov 15–17): Memory Usage Analysis#
Identify jobs that failed or were cancelled between November 15th and 17th, and analyze memory usage.
This helps detect whether jobs exceeded allocated memory:
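- If ExitCode = 15: SLURM sent SIGTERM (graceful termination).
- If ExitCode = 9: the OS sent SIGKILL (immediate termination due to memory exhaustion).
sacct prints the ExitCode field as <exit code>:<signal>; the values above refer to the first number, as in the 15:0 entries of the output below.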
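To apply this reading to many jobs at once, the ExitCode field can be split on its colon and matched against the two values discussed above. The following is only a sketch: `describe_exitcode` is a hypothetical helper, and the messages simply restate this guide's interpretation of codes 15 and 9.

```shell
# Hypothetical helper: interpret a sacct ExitCode value ("code:signal")
# using the reading given in this guide for codes 15 and 9.
describe_exitcode() {
    field="$1"
    code="${field%%:*}"     # part before the colon
    signal="${field##*:}"   # part after the colon
    case "$code" in
        0)  echo "exit code 0, signal $signal: no error reported" ;;
        9)  echo "exit code 9, signal $signal: SIGKILL, likely out of memory" ;;
        15) echo "exit code 15, signal $signal: SIGTERM sent by SLURM" ;;
        *)  echo "exit code $code, signal $signal: application-specific error" ;;
    esac
}

describe_exitcode "15:0"
describe_exitcode "2:0"
```

Any other non-zero code is reported as application-specific, since its meaning depends on the program that was run.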
login@maestro-submit ~ $ sacct \
--state=F,CA \
-S 11/15 \
-E 11/17 \
--format=jobid,jobname,partition,qos,state,start,elapsed,end,exitcode,ncpus,nnodes,nodelist,reqmem,maxrss
Output:#
       JobID    JobName  Partition        QOS      State               Start    Elapsed                 End ExitCode      NCPUS   NNodes        NodeList     ReqMem     MaxRSS
------------ ---------- ---------- ---------- ---------- ------------------- ---------- ------------------- -------- ---------- -------- --------------- ---------- ----------
38697769     nf-Clean_+     common     normal CANCELLED+ 2019-11-15T10:29:04   00:14:01 2019-11-15T10:43:05      0:0          1        1    maestro-1021        4Gc
38697769.ba+      batch                           FAILED 2019-11-15T10:29:04   00:14:03 2019-11-15T10:43:07     15:0          1        1    maestro-1021        4Gc   6392132K
38697769.ex+     extern                        COMPLETED 2019-11-15T10:29:04   00:14:01 2019-11-15T10:43:05      0:0          1        1    maestro-1021        4Gc        84K
38697782     nf-Clean_+     common     normal CANCELLED+ 2019-11-15T10:29:26   00:13:39 2019-11-15T10:43:05      0:0          1        1    maestro-1020        4Gc
38697782.ba+      batch                           FAILED 2019-11-15T10:29:26   00:13:40 2019-11-15T10:43:06     15:0          1        1    maestro-1020        4Gc   5257816K
...
Key Observations:#
- MaxRSS > ReqMem: indicates memory overrun (e.g., 5257816K, roughly 5G, against the 4G requested).
- ExitCode = 15: SLURM killed the job after sending SIGTERM.
- ExitCode = 9: the OS killed the job immediately due to lack of memory.
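The MaxRSS versus ReqMem comparison can be automated rather than read off by eye. The sketch below assumes pipe-separated input in the order JobID|NCPUS|ReqMem|MaxRSS, as `sacct -n -P --format=jobid,ncpus,reqmem,maxrss` would produce, and assumes that a trailing `c` on ReqMem means "per CPU" and a trailing `n` means "per node", as in the 4Gc values above. `flag_memory_overruns` is a hypothetical helper, and the sample rows are adapted from the output of Example 3.

```shell
# Hypothetical helper: print job steps whose MaxRSS exceeds the requested memory.
# Expected input columns: JobID|NCPUS|ReqMem|MaxRSS
flag_memory_overruns() {
    awk -F'|' '
        # Convert a size such as "4G" or "6392132K" to KiB.
        function to_kib(value,    unit, number) {
            unit = substr(value, length(value), 1)
            number = substr(value, 1, length(value) - 1) + 0
            if (unit == "K") return number
            if (unit == "M") return number * 1024
            if (unit == "G") return number * 1024 * 1024
            if (unit == "T") return number * 1024 * 1024 * 1024
            return (value + 0) / 1024   # assumption: no suffix means bytes
        }
        $4 != "" {                      # skip records without a MaxRSS value
            req = $3
            scope = substr(req, length(req), 1)
            multiplier = 1
            if (scope == "c") {         # per-CPU request: scale by NCPUS
                multiplier = $2 + 0
                req = substr(req, 1, length(req) - 1)
            } else if (scope == "n") {  # per-node request: strip the suffix
                req = substr(req, 1, length(req) - 1)
            }
            limit = to_kib(req) * multiplier
            used = to_kib($4)
            if (used > limit)
                printf "%s: MaxRSS %.1f GiB > ReqMem %.1f GiB\n", $1, used / 1048576, limit / 1048576
        }
    '
}

# Sample rows adapted from the output of Example 3.
printf '%s\n' \
    '38697769|1|4Gc|' \
    '38697769.batch|1|4Gc|6392132K' \
    '38697769.extern|1|4Gc|84K' \
    '38697782.batch|1|4Gc|5257816K' | flag_memory_overruns
```

With the sample rows, only the two batch steps are reported; the extern step (84K) and the parent record without a MaxRSS value are skipped.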
Recommendation: Increase --mem or --mem-per-cpu in your job script if memory is frequently exceeded.
Tips#
- Use --format to reduce clutter and focus on essential data.
- Combine --state, --partition, --qos, and date filters for precise queries.
- Always check MaxRSS vs ReqMem when diagnosing job failures.
Pro Tip: Save frequently used sacct commands in a shell alias or script for faster access!
Example alias:
alias sacct_failed='sacct --state=FAILED --format=jobid,jobname,state,exitcode,reqmem,maxrss'
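An alias cannot take a start date in the middle of the command, and bash does not expand aliases in non-interactive scripts by default, so a small shell function may be a more flexible variant. The sketch below is one possible shape: `sacct_failed_since` is a hypothetical name, and `SACCT_CMD` is an assumed override that makes a dry run possible without access to the cluster.

```shell
# Hypothetical helper: list failed jobs, optionally from a given start date.
# SACCT_CMD is an assumed override (e.g. SACCT_CMD=echo for a dry run).
sacct_failed_since() {
    if [ -n "${1:-}" ]; then
        # A start date was given: pass it to -S (same formats as -S accepts).
        "${SACCT_CMD:-sacct}" --state=FAILED -S "$1" \
            --format=jobid,jobname,state,exitcode,reqmem,maxrss
    else
        # No argument: rely on sacct's default time window.
        "${SACCT_CMD:-sacct}" --state=FAILED \
            --format=jobid,jobname,state,exitcode,reqmem,maxrss
    fi
}

# Dry run: print the arguments that would be passed to sacct.
SACCT_CMD=echo sacct_failed_since 11/15
```

On the cluster, calling `sacct_failed_since 11/15` without the override would run the real command.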
Mastering sacct helps you debug, optimize, and manage your jobs efficiently on the cluster!