Licenses Guide
Licenses Overview
Slurm can help with software license management by assigning available licenses to jobs at scheduling time. If the requested licenses are not available, the job is kept pending until they become available. Licenses in Slurm are essentially shared resources: configured resources that are not tied to a specific host but are associated with the entire cluster.
Licenses in Slurm can be configured in two ways:
- Local Licenses: Local licenses apply only to the cluster whose slurm.conf defines them.
- Remote Licenses: Remote licenses are served by the database and are configured using the sacctmgr command. Remote licenses are dynamic: whenever they are changed with sacctmgr, slurmdbd pushes the update to every cluster the licenses are assigned to.
Local Licenses
Local licenses are defined in the slurm.conf using the Licenses option.
slurm.conf:
Licenses=fluent:30,ansys:100
Configured licenses can be viewed using the scontrol command.
$ scontrol show lic
LicenseName=ansys
Total=100 Used=0 Free=100 Remote=no
LicenseName=fluent
Total=30 Used=0 Free=30 Remote=no
Licenses are requested with the -L (or --licenses) submission option.
$ sbatch -L ansys:2 script.sh
Submitted batch job 5212
$ scontrol show lic
LicenseName=ansys
Total=100 Used=2 Free=98 Remote=no
LicenseName=fluent
Total=30 Used=0 Free=30 Remote=no
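Scripts sometimes need the free count of a single license, for example to decide whether to submit now or wait. The following is a minimal sketch that assumes the multi-line `scontrol show lic` layout shown above; the function name license_free is hypothetical and is not part of Slurm.

```shell
#!/bin/bash
# license_free NAME: read "scontrol show lic" output on stdin and print the
# Free= value for license NAME. Hypothetical helper, not a Slurm command.
license_free() {
    awk -v want="$1" '
        # A "LicenseName=..." line starts a new license record.
        $1 ~ /^LicenseName=/ { cur = substr($1, length("LicenseName=") + 1) }
        # Inside the wanted record, find the Free= field and print its value.
        cur == want {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^Free=/) { print substr($i, 6); exit }
            }
        }
    '
}

# On a live system:
#   scontrol show lic | license_free ansys
```

If the license does not exist, the function prints nothing, so a caller can treat empty output as "unknown license".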
Licenses may also be requested using the --tres-per-task option for job submission. If this approach is used, the license must also be defined in the AccountingStorageTRES option of the slurm.conf.
slurm.conf:
Licenses=fluent:30
AccountingStorageTRES=license/fluent
Requesting licenses with the --tres-per-task submission option.
$ sbatch --tres-per-task=license/fluent:4 script.sh
Submitted batch job 6482
$ scontrol show lic
LicenseName=fluent
Total=30 Used=4 Free=26 Reserved=0 Remote=no
Remote Licenses
Use Case
A site has two license servers: one serves 100 Nastran licenses provided by FlexNet, and the other serves 50 Matlab licenses from Reprise License Management. The site has two clusters, "fluid" and "pdf", dedicated to running simulation jobs with both products. The managers want to split the Nastran licenses equally between the clusters, but assign 70% of the Matlab licenses to cluster "pdf" and the remaining 30% to cluster "fluid".
Configuring Slurm for the use case
Here we assume that both clusters have been configured correctly in the slurmdbd using the sacctmgr command.
$ sacctmgr show clusters format=cluster,controlhost
Cluster ControlHost
---------- ---------------
fluid 143.11.1.3
pdf 144.12.3.2
The licenses are added using the sacctmgr command, specifying the total count of licenses and the percentage that should be allocated to each cluster. This can be done either in one step or through a multi-step process.
One step:
$ sacctmgr add resource name=nastran cluster=fluid,pdf \
count=100 allowed=50 server=flex_host servertype=flexlm type=license
Adding Resource(s)
nastran@flex_host
Cluster - fluid 50
Cluster - pdf 50
Settings
Name = nastran
Server = flex_host
Description = nastran
ServerType = flexlm
Count = 100
Type = License
Multi-step:
$ sacctmgr add resource name=matlab count=50 server=rlm_host \
servertype=rlm type=license
Adding Resource(s)
matlab@rlm_host
Settings
Name = matlab
Server = rlm_host
Description = matlab
ServerType = rlm
Count = 50
Type = License
$ sacctmgr add resource name=matlab server=rlm_host \
cluster=pdf allowed=70
Adding Resource(s)
matlab@rlm_host
Cluster - pdf 70
Settings
Name = matlab
Server = rlm_host
Count = 50
LastConsumed = 0
Flags = (null)
Type = License
$ sacctmgr add resource name=matlab server=rlm_host \
cluster=fluid allowed=30
Adding Resource(s)
matlab@rlm_host
Cluster - fluid 30
Settings
Name = matlab
Server = rlm_host
Count = 50
LastConsumed = 0
Flags = (null)
Type = License
sacctmgr now shows the total count for each license; adding withclusters also lists the per-cluster allocations.
$ sacctmgr show resource
Name Server Type Count LastConsumed Allocated ServerType Flags
---------- ---------- -------- ------ ------------ --------- ---------- --------------------
nastran flex_host License 100 0 100 flexlm
matlab rlm_host License 50 0 100 rlm
$ sacctmgr show resource withclusters
Name Server Type Count LastConsumed Allocated ServerType Cluster Allowed Flags
---------- ---------- -------- ------ ------------ --------- ---------- ---------- -------- --------------------
nastran flex_host License 100 0 100 flexlm fluid 50
nastran flex_host License 100 0 100 flexlm pdf 50
matlab rlm_host License 50 0 100 rlm fluid 30
matlab rlm_host License 50 0 100 rlm pdf 70
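In these listings the Allocated column equals the sum of the per-cluster Allowed values (100 = 50 + 50 for nastran, 100 = 30 + 70 for matlab), so with percentage allocations it is a quick check that nothing exceeds 100. The sketch below recomputes that sum from the whitespace-aligned table shown above; the function name allocated_per_resource is hypothetical, and for production scripts sacctmgr's -n/-P (no header, parsable) output may be easier to parse.

```shell
#!/bin/bash
# allocated_per_resource: read "sacctmgr show resource withclusters" output
# (whitespace-aligned table) on stdin and print "name@server SUM", where SUM
# is the total of the per-cluster Allowed column. Hypothetical helper.
allocated_per_resource() {
    awk '
        # Data rows have a numeric 9th field (Allowed); this skips the
        # header line and the dashed separator line.
        $9 ~ /^[0-9]+$/ { sum[$1 "@" $2] += $9 }
        END { for (r in sum) print r, sum[r] }
    ' | sort
}

# On a live system:
#   sacctmgr show resource withclusters | allocated_per_resource
```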
The configured licenses are now visible on both clusters using the scontrol command.
# On cluster "pdf":
$ scontrol show lic
LicenseName=matlab@rlm_host
Total=35 Used=0 Free=35 Reserved=0 Remote=yes
LastConsumed=0 LastDeficit=0 LastUpdate=2023-02-28T17:01:44
LicenseName=nastran@flex_host
Total=50 Used=0 Free=50 Reserved=0 Remote=yes
LastConsumed=0 LastDeficit=0 LastUpdate=2023-02-28T17:01:44
# On cluster "fluid":
$ scontrol show lic
LicenseName=matlab@rlm_host
Total=15 Used=0 Free=15 Reserved=0 Remote=yes
LastConsumed=0 LastDeficit=0 LastUpdate=2023-02-28T17:01:44
LicenseName=nastran@flex_host
Total=50 Used=0 Free=50 Reserved=0 Remote=yes
LastConsumed=0 LastDeficit=0 LastUpdate=2023-02-28T17:01:44
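The Total= values above follow from Count multiplied by each cluster's Allowed percentage: 50 × 70% = 35 Matlab licenses on "pdf", 50 × 30% = 15 on "fluid", and 100 × 50% = 50 Nastran licenses on each. A small sketch of that arithmetic; the function name cluster_share is hypothetical, and rounding down with integer division is an assumption about how fractional results are handled.

```shell
#!/bin/bash
# cluster_share COUNT PERCENT: licenses a cluster sees when Allowed is a
# percentage. Integer division rounds down (an assumption for odd cases).
cluster_share() {
    local count=$1 percent=$2
    echo $(( count * percent / 100 ))
}

# Reproduce the Total= values reported by scontrol above.
echo "matlab on pdf:   $(cluster_share 50 70)"   # 35
echo "matlab on fluid: $(cluster_share 50 30)"   # 15
echo "nastran on each: $(cluster_share 100 50)"  # 50
```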
When requesting remote licenses, specify both the license name and the server (name@server).
$ sbatch -L nastran@flex_host script.sh
Submitted batch job 5172
License percentages and counts can be modified as shown below:
$ sacctmgr modify resource name=matlab server=rlm_host set \
count=200
Modified server resource ...
matlab@rlm_host
Cluster - fluid - matlab@rlm_host
Cluster - pdf - matlab@rlm_host
$ sacctmgr modify resource name=matlab server=rlm_host \
cluster=pdf set allowed=60
Modified server resource ...
Cluster - pdf - matlab@rlm_host
$ sacctmgr show resource withclusters
Name Server Type Count LastConsumed Allocated ServerType Cluster Allowed Flags
---------- ---------- -------- ------ ------------ --------- ---------- ---------- -------- --------------------
nastran flex_host License 100 0 100 flexlm fluid 50
nastran flex_host License 100 0 100 flexlm pdf 50
matlab rlm_host License 200 0 90 rlm fluid 30
matlab rlm_host License 200 0 90 rlm pdf 60
Licenses can be deleted from a single cluster or removed altogether, as shown:
$ sacctmgr delete resource where name=matlab server=rlm_host cluster=fluid
Deleting resource(s)...
Deleting resource(s)...
Cluster - fluid - matlab@rlm_host
$ sacctmgr delete resource where name=nastran server=flex_host
Deleting resource(s)...
nastran@flex_host
Cluster - fluid - nastran@flex_host
Cluster - pdf - nastran@flex_host
$ sacctmgr show resource withclusters
Name Server Type Count LastConsumed Allocated ServerType Cluster Allowed Flags
---------- ---------- -------- ------ ------------ --------- ---------- ---------- -------- --------------------
matlab rlm_host License 200 0 60 rlm pdf 60
Starting with Slurm 23.02, the Absolute flag indicates that each cluster's allowed value is to be treated as an absolute license count rather than a percentage.
Some brief examples of license management using this flag:
$ sacctmgr -i add resource name=deluxe cluster=fluid,pdf count=150 allowed=70 \
server=flex_host servertype=flexlm flags=absolute
Adding Resource(s)
deluxe@flex_host
Cluster - fluid 70
Cluster - pdf 70
Settings
Name = deluxe
Server = flex_host
Description = deluxe
ServerType = flexlm
Count = 150
Flags = Absolute
Type = Unknown
$ sacctmgr show resource withclusters
Name Server Type Count LastConsumed Allocated ServerType Cluster Allowed Flags
---------- ---------- -------- ------ ------------ --------- ---------- ---------- -------- --------------------
deluxe flex_host License 150 0 140 flexlm fluid 70 Absolute
deluxe flex_host License 150 0 140 flexlm pdf 70 Absolute
$ sacctmgr -i update resource deluxe set allowed=25 where cluster=fluid
Modified server resource ...
Cluster - fluid - deluxe@flex_host
$ sacctmgr show resource withclusters
Name Server Type Count LastConsumed Allocated ServerType Cluster Allowed Flags
---------- ---------- -------- ------ ------------ --------- ---------- ---------- -------- --------------------
deluxe flex_host License 150 0 95 flexlm fluid 25 Absolute
deluxe flex_host License 150 0 95 flexlm pdf 70 Absolute
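With the Absolute flag, Allocated is simply the sum of the Allowed counts, and whatever remains of Count is left for other clusters or external use: 150 - (70 + 70) = 10 before the update and 150 - (25 + 70) = 55 after it. A sketch of that subtraction; the function name unassigned is hypothetical, and a negative result would indicate the clusters were granted more licenses than Count.

```shell
#!/bin/bash
# unassigned COUNT ALLOWED...: with the Absolute flag each Allowed value is a
# license count, so the licenses not assigned to any listed cluster are
# COUNT minus the sum of the Allowed values. Hypothetical helper.
unassigned() {
    local count=$1 total=0 a
    shift
    for a in "$@"; do
        total=$(( total + a ))
    done
    echo $(( count - total ))
}

echo "before update: $(unassigned 150 70 70)"  # 10
echo "after update:  $(unassigned 150 25 70)"  # 55
```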
To make this the default for all newly created licenses, add AllResourcesAbsolute=yes to slurmdbd.conf and restart slurmdbd for the change to take effect.
Dynamic licenses
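Starting with Slurm 23.02, the LastConsumed field for remote licenses is designed to be periodically updated with the active use count from a license server. An example script for FlexLM's lmstat command is provided below; similar scripts can be written for other license management stacks.
#!/bin/bash
set -euxo pipefail
# Site-specific: path to lmstat and the license feature to track.
# Depending on the setup, lmstat may need -a or -f <feature> to list usage.
LMSTAT=/opt/foobar/bin/lmstat
LICENSE=foobar
# lmstat reports usage on a line such as:
#   Users of foobar:  (Total of 100 licenses issued;  Total of 30 licenses in use)
# Keep only that line and extract the "in use" count.
consumed=$("${LMSTAT}" | grep "Users of ${LICENSE}:" | sed "s/.*Total of \([0-9]\+\) licenses in use)/\1/")
# Record the current usage in slurmdbd, which pushes it to the controllers.
sacctmgr -i update resource "${LICENSE}" set lastconsumed="${consumed}"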
When the LastConsumed value is changed through sacctmgr, an update is automatically pushed to the Slurm controllers. They use this value to calculate LastDeficit, the number of licenses that have "gone missing" from the cluster's perspective and must be set aside temporarily.
For example, 100 "foobar" licenses are available in total, and access to 80 of them is allocated to the "blackhole" cluster:
$ sacctmgr add resource foobar count=100 flags=absolute cluster=blackhole allowed=80
Adding Resource(s)
foobar@slurmdb
Cluster - blackhole 80
Settings
Name = foobar
Server = slurmdb
Description = foobar
Count = 100
Flags = Absolute
Type = Unknown
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y
$ scontrol show license
LicenseName=foobar@slurmdb
Total=80 Used=0 Free=80 Reserved=0 Remote=yes
LastConsumed=0 LastDeficit=0 LastUpdate=2023-02-28T16:36:55
Now the cron job updates LastConsumed to 30, before the cluster has allocated any licenses to jobs:
$ sacctmgr -i update resource foobar set lastconsumed=30
Modified server resource ...
foobar@slurmdb
Cluster - blackhole - foobar@slurmdb
$ scontrol show license
LicenseName=foobar@slurmdb
Total=80 Used=0 Free=70 Reserved=0 Remote=yes
LastConsumed=30 LastDeficit=10 LastUpdate=2023-02-28T16:39:27
Note that the cluster has now calculated a deficit of 10 licenses and will schedule at most 70 licenses for the moment. The cluster knows that up to 20 licenses are reserved for other clusters or external use. However, since LastConsumed was set to 30, an additional 10 licenses have "gone rogue" and their usage cannot be accounted for. The cluster therefore does not assign those licenses to pending jobs, since such jobs would likely fail to acquire them.
If a further update (likely driven through cron) reduces LastConsumed to 20, the deficit disappears and the cluster makes all 80 assigned licenses available again:
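The numbers in this example can be reproduced with a simple model: the licenses outside this cluster's share (Count - Total = 20) are expected to be in use elsewhere, and any consumption beyond that plus the cluster's own Used count is a deficit. The sketch below encodes that model; it is an assumption consistent with the output shown here, not Slurm's actual implementation, and the function name license_deficit is hypothetical.

```shell
#!/bin/bash
# license_deficit COUNT TOTAL USED CONSUMED
# Simplified model (an assumption, not Slurm's source) matching this example:
#   external = COUNT - TOTAL                   licenses set aside for others
#   deficit  = CONSUMED - USED - external, floored at 0
#   free     = TOTAL - USED - deficit
license_deficit() {
    local count=$1 total=$2 used=$3 consumed=$4
    local external=$(( count - total ))
    local deficit=$(( consumed - used - external ))
    if (( deficit < 0 )); then
        deficit=0
    fi
    echo "LastDeficit=${deficit} Free=$(( total - used - deficit ))"
}

license_deficit 100 80 0 30   # LastDeficit=10 Free=70
license_deficit 100 80 0 20   # LastDeficit=0 Free=80
```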
$ sacctmgr -i update resource foobar set lastconsumed=20
Modified server resource ...
foobar@slurmdb
Cluster - blackhole - foobar@slurmdb
$ scontrol show license
LicenseName=foobar@slurmdb
Total=80 Used=0 Free=80 Reserved=0 Remote=yes
LastConsumed=20 LastDeficit=0 LastUpdate=2023-02-28T16:44:26
Last modified 25 April 2024