Slurm Commands

Display queue/partition names, time limits and available nodes

[user1@iitac01 ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up    3:00:00      6   idle iitac-n[142,144,167,197,227,259]
serial       up 1-00:00:00      1  alloc iitac-n306
serial       up 1-00:00:00      4   idle iitac-n[086-087,305,328]
compute      up 4-00:00:00      2  down* iitac-n[206,341]
compute      up 4-00:00:00      1  drain iitac-n088
compute      up 4-00:00:00    220  alloc iitac-n[001-004,006-007,009-012,014-016,020-021,023-027,031-032,034-036,038-040,042-044,046-059,061,063-064,067-069,071-075,077-085,089-092,094-096,098-104,106-121,123-124,128-130,181-184,186-189,191-196,198-200,202-204,208-210,217-221,224-226,228-232,234,236-238,240-243,245-246,249-258,260-261,263,265-271,273,275,279,281-284,286-302,304,306,308-312,315-316,318,321-327,329-340,342]
compute      up 4-00:00:00     37   idle iitac-n[131-132,134-141,143,145-148,150-151,153-157,159-160,162-165,171-179]
compute      up 4-00:00:00      2   down iitac-n[233,307]

Display time limits and available nodes for a particular queue/partition

[user1@iitac01 ~]$ sinfo -p debug
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up    3:00:00      6   idle iitac-n[142,144,167,197,227,259]
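
sinfo can also produce a node-oriented long listing, or print custom columns via a format string. For example (both options are documented in the sinfo man page; the available format fields vary by Slurm version):

[user1@iitac01 ~]$ sinfo -N -l
[user1@iitac01 ~]$ sinfo -p debug -o "%P %a %l %D %t %N"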

Display information about a specific job

[user1@iitac01 ~]$ scontrol show jobid 108
JobId=108 Name=test
   UserId=user1(1351) GroupId=trhpc(3114)
   Priority=1996 Account=root QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   TimeLimit=00:10:00 Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
   SubmitTime=2010-07-27T15:57:18 EligibleTime=2010-07-27T15:57:18
   StartTime=2010-07-27T15:57:18 EndTime=2010-07-27T16:07:18
   SuspendTime=None SecsPreSuspend=0
   Partition=debug AllocNode:Sid=iitac01:8389
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=iitac-n[197,227]
   NumNodes=2 NumCPUs=4 CPUs/Task=1 ReqS:C:T=65534:65534:65534
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) Reservation=(null)
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/home/trhpc/user1/job.sh
   WorkDir=/home/trhpc/user1
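
scontrol can also modify jobs that you own. For example, to lower the time limit of job 108 (illustrative values; raising a limit typically requires administrator privileges):

[user1@iitac01 ~]$ scontrol update JobId=108 TimeLimit=00:05:00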

Display only my jobs in the queue

[user1@iitac01 ~]$ squeue -u user1
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    109     debug test-4-c    user1  R       0:01      2 iitac-n[197,227]

Display long output about my jobs in the queue

[user1@iitac01 ~]$ squeue -u user1 -l
Tue Jul 27 16:00:07 2010
  JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)
    109     debug test-4-c    user1  RUNNING       0:43     10:00      2 iitac-n[197,227]
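
For jobs that are still pending, squeue can also report the scheduler's expected start times:

[user1@iitac01 ~]$ squeue -u user1 --start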

Display historical information about completed jobs

[user1@iitac01 ~]$ sacct --format=jobid,jobname,account,partition,ntasks,alloccpus,elapsed,state,exitcode -j 66808
       JobID    JobName    Account  Partition   NTasks  AllocCPUS    Elapsed      State ExitCode 
------------ --------- ----------- ---------- -------- ---------- ---------- ---------- -------- 
66808        my_test_j+      acc01    compute                   8   00:02:34  COMPLETED      0:0 
66808.batch       batch      acc01                   1          1   00:02:34  COMPLETED      0:0 
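
By default sacct reports jobs from the current day only. To query a date range for your own jobs rather than a single job ID (the dates here are illustrative):

[user1@iitac01 ~]$ sacct -u user1 --starttime=2010-07-01 --endtime=2010-07-28 --format=jobid,jobname,partition,elapsed,state,exitcode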

Display 'graphical' view of SLURM jobs and partitions

Show the info, updating every 2 seconds:

[user1@iitac01 ~]$ smap -i 2

Note: press 'q' to quit out of the smap view.

Full list of SLURM commands

Man pages exist for all SLURM daemons, commands, and API functions. The command option --help also provides a brief summary of options. Note that the command options are all case sensitive.

  • sacct is used to report job or job step accounting information about active or completed jobs.
  • salloc is used to allocate resources for a job in real time. Typically this is used to allocate resources and spawn a shell. The shell is then used to execute srun commands to launch parallel tasks.
  • sattach is used to attach standard input, output, and error plus signal capabilities to a currently running job or job step. One can attach to and detach from jobs multiple times.
  • sbatch is used to submit a job script for later execution. The script will typically contain one or more srun commands to launch parallel tasks (a sample script is shown after this list).
  • sbcast is used to transfer a file from local disk to local disk on the nodes allocated to a job. This can be used to effectively use diskless compute nodes or provide improved performance relative to a shared file system.
  • scancel is used to cancel a pending or running job or job step. It can also be used to send an arbitrary signal to all processes associated with a running job or job step.
  • scontrol is the administrative tool used to view and/or modify SLURM state. Note that many scontrol commands can only be executed as user root.
  • sinfo reports the state of partitions and nodes managed by SLURM. It has a wide variety of filtering, sorting, and formatting options.
  • smap reports state information for jobs, partitions, and nodes managed by SLURM, but graphically displays the information to reflect network topology.
  • squeue reports the state of jobs or job steps. It has a wide variety of filtering, sorting, and formatting options. By default, it reports the running jobs in priority order and then the pending jobs in priority order.
  • srun is used to submit a job for execution or initiate job steps in real time. srun has a wide variety of options to specify resource requirements, including: minimum and maximum node count, processor count, specific nodes to use or not to use, and specific node characteristics (minimum memory, disk space, required features, etc.). A job can contain multiple job steps executing sequentially or in parallel on independent or shared nodes within the job's node allocation.
  • strigger is used to set, get or view event triggers. Event triggers include things such as nodes going down or jobs approaching their time limit.
  • sview is a graphical user interface to get and update state information for jobs, partitions, and nodes managed by SLURM.
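
As an illustration of the sbatch workflow described above, here is a minimal batch script. The partition, resource counts and program name are placeholders to adapt for your own jobs:

#!/bin/bash
#SBATCH --job-name=test        # job name shown by squeue
#SBATCH --partition=debug      # queue/partition to run in
#SBATCH --nodes=2              # number of nodes to allocate
#SBATCH --ntasks=4             # total number of parallel tasks
#SBATCH --time=00:10:00        # wall-clock limit (HH:MM:SS)

# launch the parallel tasks on the allocated nodes
srun ./my_program

Submit the script with 'sbatch job.sh', monitor it with 'squeue', and cancel it if necessary with 'scancel <jobid>'.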

Last updated 24 May 2011.