The lsb.queues file defines the batch queues in an LSF cluster. Numerous controls are available at the queue level to allow cluster administrators to customize site policies.
This file is optional; if no queues are configured, LSF creates a queue named default, with all parameters set to default values.
This file is installed by default in LSB_CONFDIR/cluster_name/configdir.
lsb.queues Structure
Each queue definition begins with the line Begin Queue and ends with the line End Queue. The queue name must be specified; all other parameters are optional.
ADMINISTRATORS
ADMINISTRATORS = user_name | user_group ...
List of queue administrators.
Queue administrators can perform operations on any user's job in the queue, as well as on the queue itself.
Default: Undefined (you must be a cluster administrator to operate on this queue)
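For illustration, a minimal sketch (the user name alice and group name opgroup are hypothetical placeholders, not from this document):

```text
Begin Queue
QUEUE_NAME     = normal
# alice (a user) and opgroup (a user group) can manage jobs in this queue
ADMINISTRATORS = alice opgroup
End Queue
```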
BACKFILL
BACKFILL = Y | N
If Y, enables backfill scheduling for the queue.
A possible conflict exists if BACKFILL and PREEMPTION are specified together. A backfill queue cannot be preemptable. Therefore, if BACKFILL is enabled, do not also specify PREEMPTION = PREEMPTABLE.
Default: Undefined (no backfilling)
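A sketch of enabling backfill (the queue name and priority are illustrative; per the note above, such a queue must not also set PREEMPTION = PREEMPTABLE):

```text
Begin Queue
QUEUE_NAME = backfill
PRIORITY   = 40
BACKFILL   = Y
End Queue
```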
CHKPNT
CHKPNT = chkpnt_dir [chkpnt_period]
Enables automatic checkpointing.
The checkpoint directory is the directory where the checkpoint files are created. Specify an absolute path or a path relative to CWD; do not use environment variables.
Specify the checkpoint period in minutes.
Job-level checkpoint parameters override queue-level checkpoint parameters.
Only running members of a chunk job can be checkpointed.
To make a MultiCluster job checkpointable, both submission and execution queues must enable checkpointing, and the execution queue setting determines the checkpoint directory. Checkpointing is not supported if a job runs on a leased host.
Default: Undefined
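A hedged example of queue-level checkpointing (the queue name, directory path, and 60-minute period are placeholders):

```text
Begin Queue
QUEUE_NAME = ckptq
# Checkpoint files are created under /scratch/ckpt; checkpoint every
# 60 minutes. Use an absolute path, with no environment variables.
CHKPNT     = /scratch/ckpt 60
End Queue
```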
CHUNK_JOB_SIZE
CHUNK_JOB_SIZE = integer
Chunk jobs only. Enables job chunking and specifies the maximum number of jobs allowed to be dispatched together in a chunk. Specify a positive integer greater than 1.
The ideal candidates for job chunking are jobs that have the same host and resource requirements and typically take 1 to 2 minutes to run.
Job chunking can have the following advantages:
- Reduces communication between sbatchd and mbatchd, and reduces scheduling overhead in mbschd.
- Increases job throughput in mbatchd and CPU utilization on the execution hosts.
However, throughput can deteriorate if the chunk job size is too big. Performance may decrease on queues with CHUNK_JOB_SIZE greater than 30. You should evaluate the chunk job size on your own systems for best performance.
With the MultiCluster job forwarding model, this parameter does not affect MultiCluster jobs that are forwarded to a remote cluster.
This parameter is ignored in the following kinds of queues:
- Interactive (INTERACTIVE = ONLY parameter)
- CPU limit greater than 30 minutes (CPULIMIT parameter)
- Run limit greater than 30 minutes (RUNLIMIT parameter)
If CHUNK_JOB_DURATION is set in lsb.params, chunk jobs are accepted regardless of the value of CPULIMIT or RUNLIMIT.
The following configures a queue named chunk, which dispatches up to 4 jobs in a chunk:
Begin Queue
QUEUE_NAME     = chunk
PRIORITY       = 50
CHUNK_JOB_SIZE = 4
End Queue
Default: Undefined
CORELIMIT
CORELIMIT = integer
The per-process (hard) core file size limit (in KB) for all of the processes belonging to a job from this queue (see getrlimit(2)).
Default: Unlimited
CPULIMIT
CPULIMIT = [default_limit] maximum_limit
where default_limit and maximum_limit are:
[hours:]minutes[/host_name | /host_model]
Maximum normalized CPU time and optionally, the default normalized CPU time allowed for all processes of a job running in this queue. The name of a host or host model specifies the CPU time normalization host to use.
Limits the total CPU time the job can use. This parameter is useful for preventing runaway jobs or jobs that use up too many resources.
When the total CPU time for the whole job has reached the limit, a SIGXCPU signal is sent to all processes belonging to the job. If the job has no signal handler for SIGXCPU, the job is killed immediately. If the SIGXCPU signal is handled, blocked, or ignored by the application, then after the grace period expires, LSF sends SIGINT, SIGTERM, and SIGKILL to the job to kill it.
If a job dynamically spawns processes, the CPU time used by these processes is accumulated over the life of the job.
Processes that exist for fewer than 30 seconds may be ignored.
By default, if a default CPU limit is specified, jobs submitted to the queue without a job-level CPU limit are killed when the default CPU limit is reached.
If you specify only one limit, it is the maximum, or hard, CPU limit. If you specify two limits, the first one is the default, or soft, CPU limit, and the second one is the maximum CPU limit. The number of minutes may be greater than 59. Therefore, three and a half hours can be specified either as 3:30 or 210.
If no host or host model is given with the CPU time, LSF uses the default CPU time normalization host defined at the queue level (DEFAULT_HOST_SPEC in lsb.queues) if it has been configured; otherwise it uses the default CPU time normalization host defined at the cluster level (DEFAULT_HOST_SPEC in lsb.params) if it has been configured; otherwise it uses the host with the largest CPU factor (the fastest host in the cluster).
On Windows, a job that runs under a CPU time limit may exceed that limit by up to SBD_SLEEP_TIME, because sbatchd checks only periodically whether the limit has been exceeded.
On UNIX systems, the CPU limit can be enforced by the operating system at the process level.
You can define whether the CPU limit is a per-process limit enforced by the OS or a per-job limit enforced by LSF with LSB_JOB_CPULIMIT in lsf.conf.
Jobs submitted to a chunk job queue are not chunked if CPULIMIT is greater than 30 minutes.
Default: Unlimited
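A sketch showing the two-limit form (the values are illustrative): a default (soft) limit of 3.5 hours and a maximum (hard) limit of 10 hours, with no normalization host given, so the rules above select one.

```text
# Soft (default) limit 3:30 = 210 minutes; hard (maximum) limit 10:00 = 600 minutes
CPULIMIT = 3:30 10:00
```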
DATALIMIT
DATALIMIT = [default_limit] maximum_limit
The per-process data segment size limit (in KB) for all of the processes belonging to a job from this queue (see getrlimit(2)).
By default, if a default data limit is specified, jobs submitted to the queue without a job-level data limit are killed when the default data limit is reached.
If you specify only one limit, it is the maximum, or hard, data limit. If you specify two limits, the first one is the default, or soft, data limit, and the second one is the maximum data limit.
Default: Unlimited
DEFAULT_EXTSCHED
DEFAULT_EXTSCHED = external_scheduler_options
Specifies default external scheduling options for the queue.
-extsched options on the bsub command are merged with DEFAULT_EXTSCHED options, and -extsched options override any conflicting queue-level options set by DEFAULT_EXTSCHED.
Default: Undefined
DEFAULT_HOST_SPEC
DEFAULT_HOST_SPEC = host_name | host_model
The default CPU time normalization host for the queue.
The CPU factor of the specified host or host model is used to normalize the CPU time limit of all jobs in the queue, unless the CPU time normalization host is specified at the job level.
Default: Undefined
DESCRIPTION
DESCRIPTION = text
Description of the job queue displayed by bqueues -l.
This description should clearly describe the service features of this queue, to help users select the proper queue for each job.
The text can include any characters, including white space. The text can be extended to multiple lines by ending the preceding line with a backslash (\). The maximum length for the text is 512 characters.
DISPATCH_ORDER
DISPATCH_ORDER = QUEUE
Defines an ordered cross-queue fairshare set. DISPATCH_ORDER indicates that jobs are dispatched according to the order of queue priorities first, then user fairshare priority.
By default, a user has the same priority across the master and slave queues. If the same user submits several jobs to these queues, user priority is calculated by taking into account all the jobs the user has submitted across the master-slave set.
If DISPATCH_ORDER=QUEUE is set in the master queue, jobs are dispatched according to queue priorities first, then user priority. Jobs from users with lower fairshare priorities who have pending jobs in higher priority queues are dispatched before jobs in lower priority queues. This avoids having users with higher fairshare priority getting jobs dispatched from low-priority queues.
Jobs in queues having the same priority are dispatched according to user priority.
Queues that are not part of the cross-queue fairshare can have any priority; they are not limited to fall outside of the priority range of the cross-queue fairshare queues.
Default: Undefined
DISPATCH_WINDOW
DISPATCH_WINDOW = time_window ...
The time windows in which jobs from this queue are dispatched. Once dispatched, jobs are no longer affected by the dispatch window.
Default: Undefined (always open)
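A hedged example, assuming the standard LSF time-window syntax ([day:]hour[:minute]-[day:]hour[:minute]):

```text
# Dispatch jobs from this queue only between 19:30 and 07:30
# (for example, an overnight batch window).
DISPATCH_WINDOW = 19:30-7:30
```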
EXCLUSIVE
EXCLUSIVE = Y | N
If Y, specifies an exclusive queue.
Jobs submitted to an exclusive queue with bsub -x are dispatched only to a host that has no other LSF jobs running.
For hosts shared under the MultiCluster resource leasing model, jobs are not dispatched to a host that has LSF jobs running, even if the jobs are from another cluster.
FAIRSHARE
Enables queue-level fairshare and specifies share assignments. Only users with share assignments can submit jobs to the queue.
FAIRSHARE = USER_SHARES[[user, number_shares] ...]
- Specify at least one user share assignment.
- Enclose the list in square brackets, as shown.
- Enclose each user share assignment in square brackets, as shown.
- user
Specify users who are also configured to use the queue. You can assign the shares to:
- A single user (specify user_name)
- Users in a group, individually (specify group_name@) or collectively (specify group_name)
- Users not included in any other share assignment, individually (specify the keyword default) or collectively (specify the keyword others)
By default, when resources are assigned collectively to a group, the group members compete for the resources on a first-come, first-served (FCFS) basis. You can use hierarchical fairshare to further divide the shares among the group members.
When resources are assigned to members of a group individually, the share assignment is recursive. Members of the group and of all subgroups always compete for the resources according to FCFS scheduling, regardless of hierarchical fairshare policies.
- number_shares
Specify a positive integer representing the number of shares of the cluster resources assigned to the user.
The number of shares assigned to each user is only meaningful when you compare it to the shares assigned to other users or to the total number of shares. The total number of shares is just the sum of all the shares assigned in each share assignment.
Do not configure hosts in a cluster to use fairshare at both queue and host levels.
Default: Undefined (no fairshare)
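A hedged sketch of share assignments (the user name alice and group name devgrp are placeholders):

```text
# alice gets 10 shares; members of devgrp collectively get 30 shares;
# all users not otherwise listed collectively get 1 share.
FAIRSHARE = USER_SHARES[[alice, 10] [devgrp, 30] [others, 1]]
```

Remember that only users with share assignments can submit jobs to the queue, so the others entry here keeps the queue open to everyone.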
FAIRSHARE_QUEUES
FAIRSHARE_QUEUES = queue_name queue_name ...
Defines cross-queue fairshare.
When this parameter is defined:
- The queue in which this parameter is defined becomes the "master queue".
- Queues listed with this parameter are "slave queues" and inherit the fairshare policy of the master queue.
- A user has the same priority across the master and slave queues.
If the same user submits several jobs to these queues, user priority is calculated by taking into account all the jobs the user has submitted across the master-slave set.
- By default, the PRIORITY range defined for queues in cross-queue fairshare cannot be used with any other queues. For example, you have 4 queues: queue1, queue2, queue3, and queue4. You configure cross-queue fairshare for queue1, queue2, and queue3, and assign priorities of 30, 40, and 50 respectively.
- By default, the priority of queue4 (which is not part of the cross-queue fairshare) cannot fall within the priority range of the cross-queue fairshare queues (30-50). It can be any number up to 29 or higher than 50. It does not matter whether queue4 is a fairshare queue or an FCFS queue.
- If DISPATCH_ORDER=QUEUE is set in the master queue, the priority of queue4 (which is not part of the cross-queue fairshare) can be any number, including a priority falling within the priority range of the cross-queue fairshare queues (30-50).
- FAIRSHARE must be defined in the master queue. If it is also defined in the queues listed in FAIRSHARE_QUEUES, it is ignored.
- Cross-queue fairshare can be defined more than once within lsb.queues. You can define several sets of master-slave queues. However, a queue cannot belong to more than one master-slave set. For example, you can define the queues short and license as slaves of the master queue normal, and the queues night and owners as slaves of the master queue priority. You cannot, however, define night, owners, or priority as slaves in the queue normal; or normal, short, and license as slaves in the priority queue; or short, license, night, and owners as master queues of their own.
- Cross-queue fairshare cannot be used with host partition fairshare. It is part of queue-level fairshare.
Default: Undefined
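A minimal sketch of a master queue, using queue names that appear in the surrounding discussion (the priority and share values are illustrative):

```text
# "normal" is the master queue of a cross-queue fairshare set;
# "short" and "license" are its slave queues and inherit its policy.
Begin Queue
QUEUE_NAME       = normal
PRIORITY         = 30
FAIRSHARE        = USER_SHARES[[default, 1]]
FAIRSHARE_QUEUES = short license
End Queue
```

FAIRSHARE must appear in the master queue; any FAIRSHARE defined in the slave queues is ignored.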
FILELIMIT
FILELIMIT = integer
The per-process (hard) file size limit (in KB) for all of the processes belonging to a job from this queue (see getrlimit(2)).
Default: Unlimited
HJOB_LIMIT
HJOB_LIMIT = integer
Per-host job slot limit.
Maximum number of job slots that this queue can use on any host. This limit is configured per host, regardless of the number of processors it may have.
This may be useful if the queue dispatches jobs that require a node-locked license. If there is only one node-locked license per host then the system should not dispatch more than one job to the host even if it is a multiprocessor host.
The following runs a maximum of one job on each of hostA, hostB, and hostC:
Begin Queue
...
HJOB_LIMIT = 1
HOSTS = hostA hostB hostC
...
End Queue
Default: Unlimited
HOSTS
HOSTS = host_list | none
host_list is a space-separated list of the following items:
- host_name[@cluster_name][+pref_level]
- host_partition[+pref_level]
- host_group[+pref_level]
- [~]host_name
- [~]host_group
- all@cluster_name
The list can include the following items only once:
The none keyword is only used with the MultiCluster job forwarding model, to specify a remote-only queue.
A space-separated list of hosts on which jobs from this queue can be run.
If host groups and host partitions are included in the list, the job can run on any host in the group or partition. All the members of the host list should either belong to a single host partition or not belong to any host partition. Otherwise, job scheduling may be affected.
Some items can be followed by a plus sign (+) and a positive number to indicate the preference for dispatching a job to that host. A higher number indicates a higher preference. If a host preference is not given, it is assumed to be 0. If there are multiple candidate hosts, LSF dispatches the job to the host with the highest preference; hosts at the same level of preference are ordered by load.
If host groups and host partitions are assigned a preference, each host in the group or partition has the same preference.
Use the keyword others to indicate all hosts not explicitly listed.
Use the keyword all to indicate all hosts not explicitly excluded.
Use the not operator (~) to exclude hosts from the all specification in the queue. This is useful if you have a large cluster but only want to exclude a few hosts from the queue definition.
The not operator can only be used with the all keyword. It is not valid with the keywords others and none.
The not operator (~) can be used to exclude host groups.
With the MultiCluster resource leasing model, use the format host_name@cluster_name to specify a borrowed host. LSF does not validate the names of remote hosts. The keyword others indicates all local hosts not explicitly listed. The keyword all indicates all local hosts not explicitly excluded. Use the keyword allremote to specify all hosts borrowed from all remote clusters. Use all@cluster_name to specify the group of all hosts borrowed from one remote cluster. You cannot specify a host group or partition that includes remote resources, unless it uses the keyword allremote to include all remote hosts.
With the MultiCluster resource leasing model, the not operator (~) can be used to exclude local hosts or host groups. You cannot use the not operator (~) with remote hosts.
Hosts that participate in queue-based fairshare cannot be in a host partition.
Host preferences specified by bsub -m override the queue specification.
HOSTS = hostA+1 hostB hostC+1 hostD+3
This example defines three levels of preference: run jobs on hostD as much as possible, otherwise run on either hostA or hostC if possible, otherwise run on hostB. Jobs should not run on hostB unless all other hosts are too busy to accept more jobs.
HOSTS = hostD+1 others
Run jobs on hostD as much as possible, otherwise run jobs on the least-loaded host available. With the MultiCluster resource leasing model, this queue does not use borrowed hosts.
HOSTS = all ~hostA
Run jobs on all hosts in the cluster, except for hostA. With the MultiCluster resource leasing model, this queue does not use borrowed hosts.
HOSTS = Group1 ~hostA hostB hostC
Run jobs on hostB, hostC, and all hosts in Group1 except for hostA. With the MultiCluster resource leasing model, this queue uses borrowed hosts if Group1 uses the keyword allremote.
Default: all (the queue can use all hosts in the cluster, and every host has equal preference). With the MultiCluster resource leasing model, this queue can use all local hosts, but no borrowed hosts.
IGNORE_DEADLINE
IGNORE_DEADLINE = Y
If Y, disables deadline constraint scheduling (starts all jobs regardless of deadline constraints).
IMPT_JOBBKLG
IMPT_JOBBKLG = integer | infinit
MultiCluster job forwarding model only. Specifies the MultiCluster pending job limit for a receive-jobs queue. This represents the maximum number of MultiCluster jobs that can be pending in the queue; once the limit has been reached, the queue stops accepting jobs from remote clusters.
Use the keyword infinit to make the queue accept an infinite number of pending MultiCluster jobs.
Default: 50
INTERACTIVE
INTERACTIVE = NO | ONLY
Causes the queue to reject interactive batch jobs (NO) or accept nothing but interactive batch jobs (ONLY).
Interactive batch jobs are submitted via bsub -I.
Default: Undefined (the queue accepts both interactive and non-interactive jobs)
JOB_ACCEPT_INTERVAL
JOB_ACCEPT_INTERVAL = integer
The number you specify is multiplied by the value of MBD_SLEEP_TIME in lsb.params (60 seconds by default). The result of the calculation is the number of seconds to wait after dispatching a job to a host before dispatching a second job to the same host.
If 0 (zero), a host may accept more than one job in each dispatch turn. By default, there is no limit to the total number of jobs that can run on a host, so if this parameter is set to 0, a very large number of jobs might be dispatched to a host all at once. This can overload your system to the point that it is unable to create any more processes. Setting this parameter to 0 is not recommended.
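As a worked example, assuming the default MBD_SLEEP_TIME of 60 seconds in lsb.params:

```text
JOB_ACCEPT_INTERVAL = 2
# 2 * MBD_SLEEP_TIME (60 seconds) = 120 seconds between dispatching
# one job to a host and dispatching the next job to the same host
```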
JOB_ACCEPT_INTERVAL set at the queue level (lsb.queues) overrides JOB_ACCEPT_INTERVAL set at the cluster level (lsb.params).
Default: Undefined (the queue uses JOB_ACCEPT_INTERVAL defined in lsb.params, which has a default value of 1)
JOB_ACTION_WARNING_TIME
JOB_ACTION_WARNING_TIME = [hours:]minutes
Specifies how long before a job control action occurs that a job warning action is taken. For example, 2 minutes before the job reaches its run time limit or termination deadline, or before the queue's run window is closed, an URG signal is sent to the job.
Job action warning time is not normalized.
A job action warning time must be specified with a job warning action in order for job warning to take effect.
The warning time specified by the bsub -wt option overrides JOB_ACTION_WARNING_TIME in the queue. JOB_ACTION_WARNING_TIME is used as the default when no command line option is specified.
JOB_ACTION_WARNING_TIME = 2
Default: Undefined
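Because a warning time takes effect only together with a warning action, the two parameters are typically set as a pair; a combined sketch:

```text
# Send an URG signal to the job 2 minutes before the job control
# action (run limit, termination deadline, or run window close).
JOB_ACTION_WARNING_TIME = 2
JOB_WARNING_ACTION      = URG
```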
JOB_CONTROLS
JOB_CONTROLS = SUSPEND[signal | command | CHKPNT] RESUME[signal | command] TERMINATE[signal | command | CHKPNT]
- CHKPNT is a special action, which causes the system to checkpoint the job. If the SUSPEND action is CHKPNT, the job is checkpointed and then stopped by sending the SIGSTOP signal to the job automatically.
- signal is a UNIX signal name (such as SIGSTOP or SIGTSTP).
- command specifies a /bin/sh command line to be invoked. Do not specify a signal followed by an action that triggers the same signal (for example, do not specify JOB_CONTROLS=TERMINATE[bkill] or JOB_CONTROLS=TERMINATE[brequeue]). This causes a deadlock between the signal and the action.
Changes the behavior of the SUSPEND, RESUME, and TERMINATE actions in LSF.
For SUSPEND and RESUME, if the action is a command, the following points should be considered:
- The contents of the configuration line for the action are run with /bin/sh -c, so you can use shell features in the command.
- The standard input, output, and error of the command are redirected to the NULL device.
- The command is run as the user of the job.
- All environment variables set for the job are also set for the command action. The following additional environment variables are set:
- LSB_JOBPGIDS -- a list of current process group IDs of the job
- LSB_JOBPIDS -- a list of current process IDs of the job
For the SUSPEND action command, the following environment variable is also set:
On UNIX, by default, SUSPEND sends SIGTSTP for parallel or interactive jobs and SIGSTOP for other jobs. RESUME sends SIGCONT. TERMINATE sends SIGINT, SIGTERM and SIGKILL in that order.
On Windows, actions equivalent to the UNIX signals have been implemented to do the default job control actions. Job control messages replace the SIGINT and SIGTERM signals, but only customized applications will be able to process them. Termination is implemented by the TerminateProcess() system call.
JOB_IDLE
JOB_IDLE = number
Specifies a threshold for idle job exception handling. The value should be a number between 0.0 and 1.0 representing CPU time/runtime. If the job idle factor is less than the specified threshold, LSF invokes LSF_SERVERDIR/eadmin to trigger the action for a job idle exception.
Valid values: any positive number between 0.0 and 1.0
JOB_IDLE = 0.10
A job idle exception is triggered for jobs with an idle value (CPU time/runtime) less than 0.10.
Default: Undefined (no job idle exceptions are detected)
JOB_OVERRUN
JOB_OVERRUN = run_time
Specifies a threshold for job overrun exception handling. If a job runs longer than the specified run time, LSF invokes LSF_SERVERDIR/eadmin to trigger the action for a job overrun exception.
JOB_OVERRUN = 5
A job overrun exception is triggered for jobs running longer than 5 minutes.
Default: Undefined (no job overrun exceptions are detected)
JOB_STARTER
JOB_STARTER = starter [starter] ["%USRCMD"] [starter]
Creates a specific environment for submitted jobs prior to execution.
starter is any executable that can be used to start the job (that is, it can accept the job as an input argument). Optionally, additional strings can be specified.
By default, the user commands run after the job starter. A special string, %USRCMD, can be used to represent the position of the user's job in the job starter command line. The %USRCMD string may be enclosed with quotes or followed by additional commands.
JOB_STARTER = csh -c "%USRCMD;sleep 10"
In this case, if a user submits a job:
% bsub myjob arguments
the command that actually runs is:
% csh -c "myjob arguments;sleep 10"
Default: Undefined (no job starter)
JOB_UNDERRUN
JOB_UNDERRUN = run_time
Specifies a threshold for job underrun exception handling. If a job exits before the specified number of minutes, LSF invokes LSF_SERVERDIR/eadmin to trigger the action for a job underrun exception.
JOB_UNDERRUN = 2
A job underrun exception is triggered for jobs running less than 2 minutes.
Default: Undefined (no job underrun exceptions are detected)
JOB_WARNING_ACTION
JOB_WARNING_ACTION = signal | command | CHKPNT
Specifies the job action to be taken before a job control action occurs. For example, 2 minutes before the job reaches its run time limit or termination deadline, or before the queue's run window is closed, an URG signal is sent to the job.
A job warning action must be specified with a job action warning time in order for job warning to take effect.
If JOB_WARNING_ACTION is specified, LSF sends the warning action to the job before the actual control action is taken. This allows the job time to save its result before being terminated by the job control action.
You can specify actions similar to the JOB_CONTROLS queue level parameter: send a signal, invoke a command, or checkpoint the job.
The warning action specified by the bsub -wa option overrides JOB_WARNING_ACTION in the queue. JOB_WARNING_ACTION is used as the default when no command line option is specified.
JOB_WARNING_ACTION = URG
Default: Undefined
load_index
load_index = loadSched[/loadStop]
Specify io, it, ls, mem, pg, r15s, r1m, r15m, swp, tmp, ut, or a non-shared custom external load index. Specify multiple lines to configure thresholds for multiple load indices.
Specify io, it, ls, mem, pg, r15s, r1m, r15m, swp, tmp, ut, or a non-shared custom external load index as a column. Specify multiple columns to configure thresholds for multiple load indices.
Scheduling and suspending thresholds for the specified dynamic load index.
The loadSched condition must be satisfied before a job is dispatched to the host. If a RESUME_COND is not specified, the loadSched condition must also be satisfied before a suspended job can be resumed.
If the loadStop condition is satisfied, a job on the host is suspended.
The loadSched and loadStop thresholds permit the specification of conditions using simple AND/OR logic. Any load index that does not have a configured threshold has no effect on job scheduling.
LSF does not suspend a job if the job is the only batch job running on the host and the machine is interactively idle (it>0).
The r15s, r1m, and r15m CPU run queue length conditions are compared to the effective queue length as reported by lsload -E, which is normalized for multiprocessor hosts. Thresholds for these parameters should be set at levels appropriate for single-processor hosts.
MEM=100/10
SWAP=200/30
These two lines translate into a loadSched condition of
mem >= 100 && swap >= 200
and a loadStop condition of
mem < 10 || swap < 30
Default: Undefined
MANDATORY_EXTSCHED
MANDATORY_EXTSCHED = external_scheduler_options
Specifies mandatory external scheduling options for the queue.
-extsched options on the bsub command are merged with MANDATORY_EXTSCHED options, and MANDATORY_EXTSCHED options override any conflicting job-level options set by -extsched.
Default: Undefined
MAX_RSCHED_TIME
MAX_RSCHED_TIME = integer | infinit
MultiCluster job forwarding model only. Determines how long a MultiCluster job stays pending in the execution cluster before returning to the submission cluster. The remote timeout limit in seconds is:
MAX_RSCHED_TIME * MBD_SLEEP_TIME = timeout
Specify infinit to disable remote timeout (jobs always get dispatched in the correct FCFS order because MultiCluster jobs never get rescheduled, but MultiCluster jobs can be pending in the receive-jobs queue forever instead of being rescheduled to a better queue).
The remote timeout limit never affects advance reservation jobs: jobs that use an advance reservation always behave as if remote timeout is disabled.
Default: 20 (20 minutes by default)
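As a worked example with the defaults (MAX_RSCHED_TIME = 20, MBD_SLEEP_TIME = 60 seconds):

```text
MAX_RSCHED_TIME = 20
# timeout = 20 * 60 seconds = 1200 seconds (20 minutes) before a
# pending MultiCluster job returns to the submission cluster
```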
MC_FAST_SCHEDULE
MC_FAST_SCHEDULE = y | n
MultiCluster only.
Specify y to enable fast MultiCluster scheduling.
By default, jobs forwarded from a remote cluster are treated just like jobs submitted to the local queue, and wait in the pending job list for the next dispatch turn.
If resource requirements are not important, you can give preference to remote jobs. If you enable fast MultiCluster scheduling, jobs from a remote queue are dispatched to the execution host immediately, without evaluating the resource requirement, and without waiting for the next dispatch turn. However, jobs might fail because their resource requirement is ignored and multiple jobs are dispatched against the same resource.
Do not use this parameter with preemptive scheduling, because LSF could overcommit job slots.
Default: Undefined (fast MultiCluster scheduling is disabled).
MEMLIMIT
MEMLIMIT = [default_limit] maximum_limit
The per-process (hard) process resident set size limit (in KB) for all of the processes belonging to a job from this queue (see getrlimit(2)).
Sets the maximum amount of physical memory (resident set size, RSS) that may be allocated to a process.
By default, if a default memory limit is specified, jobs submitted to the queue without a job-level memory limit are killed when the default memory limit is reached.
If you specify only one limit, it is the maximum, or hard, memory limit. If you specify two limits, the first one is the default, or soft, memory limit, and the second one is the maximum memory limit.
LSF has two methods of enforcing memory usage:
OS memory limit enforcement is the default MEMLIMIT behavior and does not require further configuration. OS enforcement usually allows the process to eventually run to completion. LSF passes MEMLIMIT to the OS, which uses it as a guide for the system scheduler and memory allocator. The system may allocate more memory to a process if there is a surplus. When memory is low, the system takes memory from, and lowers the scheduling priority (re-nice) of, a process that has exceeded its declared MEMLIMIT. OS enforcement is only available on systems that support RLIMIT_RSS for setrlimit().
Not supported on:
To enable LSF memory limit enforcement, set LSB_MEMLIMIT_ENFORCE in lsf.conf to y. LSF memory limit enforcement explicitly sends a signal to kill a running process once it has allocated memory past MEMLIMIT.
You can also enable LSF memory limit enforcement by setting LSB_JOB_MEMLIMIT in lsf.conf to y. The difference between LSB_JOB_MEMLIMIT set to y and LSB_MEMLIMIT_ENFORCE set to y is that with LSB_JOB_MEMLIMIT, only the per-job memory limit enforced by LSF is enabled; the per-process memory limit enforced by the OS is disabled. With LSB_MEMLIMIT_ENFORCE set to y, both the per-job memory limit enforced by LSF and the per-process memory limit enforced by the OS are enabled. LSF memory limit enforcement is available for all systems on which LSF collects total memory usage.
The following configuration defines a queue with a memory limit of 5000 KB:
Begin Queue
QUEUE_NAME  = default
DESCRIPTION = Queue with memory limit of 5000 kbytes
MEMLIMIT    = 5000
End Queue
Default: Unlimited
MIG
MIG = minutes
Enables automatic job migration and specifies the migration threshold, in minutes.
Does not affect MultiCluster jobs that are forwarded to a remote cluster.
If a checkpointable or rerunnable job dispatched to the host is suspended (SSUSP state) for longer than the specified number of minutes, the job is migrated (unless another job on the same host is being migrated). A value of 0 (zero) specifies that a suspended job should be migrated immediately.
If a migration threshold is defined at both host and queue levels, the lower threshold is used.
Members of a chunk job can be migrated. Chunk jobs in WAIT state are removed from the job chunk and put into PEND state.
Default: Undefined (no automatic job migration)
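A hedged sketch (the 10-minute threshold is illustrative):

```text
# Migrate checkpointable or rerunnable jobs that have been suspended
# (SSUSP state) for longer than 10 minutes.
MIG = 10
```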
NEW_JOB_SCHED_DELAY
NEW_JOB_SCHED_DELAY = seconds
The number of seconds that a new job waits before being scheduled. A value of zero (0) means the job is scheduled without any delay.
Default: 2 seconds
NICE
NICE = integer
Adjusts the UNIX scheduling priority at which jobs from this queue execute.
The default value of 0 (zero) maintains the default scheduling priority for UNIX interactive jobs. This value adjusts the run-time priorities for batch jobs on a queue-by-queue basis, to control their effect on other batch or interactive jobs. See the nice(1) manual page for more details.
On Windows, this value is mapped to Windows process priority classes as follows:
Platform LSF on Windows does not support
HIGH
orREAL-TIME
priority classes.0 (zero)
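As an illustrative sketch (the queue name and nice value are hypothetical), a queue whose jobs should yield CPU to interactive work could lower its run-time priority:

```
Begin Queue
QUEUE_NAME = background
NICE       = 10
End Queue
```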
NQS_QUEUES
NQS_QUEUES = NQS_queue_name@NQS_host_name ...
Makes the queue an NQS forward queue.
NQS_host_name is an NQS host name that can be the official host name or an alias name known to the LSF master host through gethostbyname(3).
NQS_queue_name is the name of an NQS destination queue on this host. NQS destination queues are considered for job routing in the order in which they are listed here. If a queue accepts the job, it is routed to that queue. If no queue accepts the job, it remains pending in the NQS forward queue.
lsb.nqsmaps must be present for the LSF system to route jobs in this queue to NQS systems.
You must configure LSB_MAX_NQS_QUEUES in lsf.conf to specify the maximum number of NQS queues allowed in the LSF cluster. This is required for LSF to work with NQS.
Because many features of LSF are not supported by NQS, the following queue configuration parameters are ignored for NQS forward queues: PJOB_LIMIT, POLICIES, RUN_WINDOW, DISPATCH_WINDOW, RUNLIMIT, HOSTS, MIG. In addition, scheduling load threshold parameters are ignored because NQS does not provide load information about hosts.
Undefined
PJOB_LIMIT
PJOB_LIMIT = float
Per-processor job slot limit for the queue.
Maximum number of job slots that this queue can use on any processor. This limit is configured per processor, so that multiprocessor hosts automatically run more jobs.
Unlimited
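For example (values are illustrative), setting a per-processor limit of 2 allows this queue to use up to 8 job slots on a 4-processor host:

```
Begin Queue
QUEUE_NAME = normal
PJOB_LIMIT = 2
End Queue
```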
POST_EXEC
POST_EXEC = command
A command run on the execution host after the job finishes.
The entire contents of the configuration line of the pre- and post-execution commands are run under /bin/sh -c, so shell features can be used in the command.
The pre- and post-execution commands are run in /tmp.
Standard input, standard output, and standard error are set to /dev/null. The output from the pre- and post-execution commands can be explicitly redirected to a file for debugging purposes.
The PATH environment variable is set to:
/bin /usr/bin /sbin /usr/sbin
On Windows, the pre- and post-execution commands are run under cmd.exe /c.
To run these commands under a different user account (such as root, to do privileged operations, if necessary), configure the parameter LSB_PRE_POST_EXEC_USER in lsf.sudoers.
On Windows, standard input, standard output, and standard error are set to NUL. The output from the pre- and post-execution commands can be explicitly redirected to a file for debugging purposes.
On Windows, the PATH is determined by the setup of the LSF Service.
- Other environment variables set for the job are also set for the pre- and post-execution commands.
- When a job is dispatched from a queue that has a post-execution command, the system remembers the post-execution command defined for the queue from which the job is dispatched. If the job is later switched to another queue or the post-execution command of the queue is changed, the original post-execution command is run.
- When the post-execution command is run, the environment variable LSB_JOBEXIT_STAT is set to the exit status of the job. Refer to the manual page for wait(2) for the format of this exit status.
- The post-execution command is also run if a job is requeued because the job's execution environment fails to be set up, or if the job exits with one of the queue's REQUEUE_EXIT_VALUES. The environment variable LSB_JOBPEND is set if the job is requeued. If the job's execution environment could not be set up, LSB_JOBEXIT_STAT is set to 0 (zero).
No post-execution commands
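As an illustrative sketch (the script path is hypothetical), a queue could run a cleanup script after each job, with output redirected to a file for debugging:

```
Begin Queue
QUEUE_NAME = batch
POST_EXEC  = /usr/local/lsf/scripts/cleanup_scratch >> /tmp/postexec.log 2>&1
End Queue
```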
PRE_EXEC
PRE_EXEC = command
A command run on the execution host before the job.
To specify a pre-execution command at the job level, use bsub -E. If both queue- and job-level pre-execution commands are specified, the job-level pre-execution command is run after the queue-level pre-execution command.
If the pre-execution command exits with a non-zero exit code, it is considered to have failed, and the job is requeued to the head of the queue. This feature can be used to implement customized scheduling by having the pre-execution command fail if conditions for dispatching the job are not met.
Other environment variables set for the job are also set for the pre- and post-execution commands.
The entire contents of the configuration line of the pre- and post-execution commands are run under /bin/sh -c, so shell features can be used in the command.
The pre- and post-execution commands are run in /tmp.
Standard input, standard output, and standard error are set to /dev/null. The output from the pre- and post-execution commands can be explicitly redirected to a file for debugging purposes.
The PATH environment variable is set to:
/bin /usr/bin /sbin /usr/sbin
On Windows, the pre- and post-execution commands are run under cmd.exe /c.
To run these commands under a different user account (such as root, to do privileged operations, if necessary), configure the parameter LSB_PRE_POST_EXEC_USER in lsf.sudoers.
On Windows, standard input, standard output, and standard error are set to NUL. The output from the pre- and post-execution commands can be explicitly redirected to a file for debugging purposes.
On Windows, the PATH is determined by the setup of the LSF Service.
No pre-execution commands
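For example (the script path is hypothetical), a queue could verify that a required file system is mounted before each job runs; if the check script exits non-zero, the job is requeued to the head of the queue:

```
Begin Queue
QUEUE_NAME = data
PRE_EXEC   = /usr/local/lsf/scripts/check_mounts
End Queue
```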
PREEMPTION
PREEMPTION = PREEMPTIVE[[queue_name[+pref_level] ...]] | PREEMPTABLE[[queue_name ...]]
Enables preemptive scheduling and defines a preemption policy for the queue.
You can specify PREEMPTIVE or PREEMPTABLE, or both. When you specify a list of queues, you must enclose the list in one set of square brackets.
- PREEMPTIVE defines a preemptive queue. Jobs in this queue preempt jobs from the specified lower-priority queues or from all lower-priority queues by default (if the parameter is specified with no queue names).
If you specify a list of lower-priority queues, you must enclose the list in one set of square brackets. To indicate an order of preference for the lower-priority queues, put a plus sign (+) after the names of queues and a preference level as a positive integer.
- PREEMPTABLE defines a preemptable queue. Jobs in this queue can be preempted by jobs from specified higher-priority queues, or from all higher-priority queues by default, even if the higher-priority queues are not preemptive. If you specify a list of higher-priority queues, you must enclose the list in one set of square brackets.
PREEMPTIVE and PREEMPTABLE can be used together, to specify that jobs in this queue can always preempt jobs in lower-priority queues and can always be preempted by jobs from higher-priority queues.
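For example (queue names and values are illustrative), a high-priority queue can preempt two lower-priority queues, preferring to preempt jobs in the short queue first:

```
Begin Queue
QUEUE_NAME = urgent
PRIORITY   = 70
PREEMPTION = PREEMPTIVE[short+2 normal+1]
End Queue
```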
PRIORITY
PRIORITY = integer
The queue priority. A higher value indicates a higher LSF dispatching priority, relative to other queues.
LSF schedules jobs from one queue at a time, starting with the highest-priority queue. If multiple queues have the same priority, LSF schedules all the jobs from these queues in first-come, first-served order.
However, only jobs from FCFS (first-come, first-served) queues are scheduled together. If fairshare queues have the same priority, the jobs are always scheduled queue by queue, in the order in which the queues are listed in lsb.queues. If a cluster has both FCFS and fairshare queues with the same priority, the lsb.queues order is considered, but all the FCFS jobs are scheduled at once, when the first FCFS queue has its turn.
Queue priority in LSF is completely independent of the UNIX scheduler's priority system for time-sharing processes. In LSF, the NICE parameter is used to set the UNIX time-sharing priority for batch jobs.
1 (lowest possible priority)
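As an illustrative sketch (queue names and values are hypothetical), two queues with different dispatching priorities; jobs in the owners queue are considered before jobs in the normal queue:

```
Begin Queue
QUEUE_NAME = owners
PRIORITY   = 70
End Queue

Begin Queue
QUEUE_NAME = normal
PRIORITY   = 30
End Queue
```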
PROCESSLIMIT
PROCESSLIMIT = [default_limit] maximum_limit
Limits the number of concurrent processes that can be part of a job.
By default, if a default process limit is specified, jobs submitted to the queue without a job-level process limit are killed when the default process limit is reached.
If you specify only one limit, it is the maximum, or hard, process limit. If you specify two limits, the first one is the default, or soft, process limit, and the second one is the maximum process limit.
Unlimited
PROCLIMIT
PROCLIMIT = [minimum_limit [default_limit]] maximum_limit
Maximum number of slots that can be allocated to a job. For parallel jobs, the maximum number of processors that can be allocated to the job.
Optionally specifies the minimum and default number of job slots.
All limits must be positive numbers greater than or equal to 1 that satisfy the following relationship:
1 <= minimum <= default <= maximum
You can specify up to three limits in the PROCLIMIT parameter:
Jobs that request fewer slots than the minimum PROCLIMIT or more slots than the maximum PROCLIMIT cannot use the queue and are rejected. If the job requests minimum and maximum job slots, the maximum slots requested cannot be less than the minimum PROCLIMIT, and the minimum slots requested cannot be more than the maximum PROCLIMIT.
Unlimited, the default number of slots is 1
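For example (values are illustrative), the following accepts parallel jobs requesting between 4 and 16 slots, and allocates 8 slots to jobs that do not request a specific number of processors:

```
PROCLIMIT = 4 8 16
```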
QJOB_LIMIT
QJOB_LIMIT = integer
Job slot limit for the queue. Total number of job slots that this queue can use.
Unlimited
QUEUE_NAME
QUEUE_NAME = string
Required. Name of the queue.
Specify any ASCII string up to 60 characters long. You can use letters, digits, underscores (_), or dashes (-). You cannot use blank spaces. You cannot specify the reserved name default.
You must specify this parameter to define a queue. The default queue automatically created by LSF is named default.
RCVJOBS_FROM
RCVJOBS_FROM = cluster_name ... | allclusters
MultiCluster only. Defines a MultiCluster receive-jobs queue.
Specify cluster names, separated by a space. The administrator of each remote cluster determines which queues in that cluster will forward jobs to the local cluster.
Use the keyword allclusters to specify any remote cluster.
RCVJOBS_FROM = cluster2 cluster4 cluster6
This queue accepts remote jobs from clusters 2, 4, and 6.
REQUEUE_EXIT_VALUES
REQUEUE_EXIT_VALUES = [exit_code ...] [EXCLUDE(exit_code ...)]
Enables automatic job requeue and sets the LSB_EXIT_REQUEUE environment variable. Separate multiple exit codes with spaces.
Jobs are requeued to the head of the queue. The output from the failed run is not saved, and the user is not notified by LSF.
Define an exit code as EXCLUDE(exit_code) to enable exclusive job requeue. Exclusive job requeue does not work for parallel jobs.
For MultiCluster jobs forwarded to a remote execution cluster, the exit values specified in the submission cluster with the EXCLUDE keyword are treated as if they were non-exclusive.
If mbatchd is restarted, it does not remember the previous hosts from which the job exited with an exclusive requeue exit code. In this situation, it is possible for a job to be dispatched to hosts on which the job has previously exited with an exclusive exit code.
REQUEUE_EXIT_VALUES = 30 EXCLUDE(20)
means that jobs with exit code 30 are requeued, jobs with exit code 20 are requeued exclusively, and jobs with any other exit code are not requeued.
Undefined (jobs in this queue are not requeued)
RERUNNABLE
RERUNNABLE = yes | no
If yes, enables automatic job rerun (restart).
Members of a chunk job can be rerunnable. If the execution host becomes unavailable, rerunnable chunk job members are removed from the queue and dispatched to a different execution host.
no
RESOURCE_RESERVE
RESOURCE_RESERVE = MAX_RESERVE_TIME[integer]
Enables processor reservation and memory reservation for pending jobs for the queue. Specifies the number of dispatch turns (MAX_RESERVE_TIME) over which a job can reserve job slots and memory.
Overrides the SLOT_RESERVE parameter. If both RESOURCE_RESERVE and SLOT_RESERVE are defined in the same queue, an error is displayed when the cluster is reconfigured, and SLOT_RESERVE is ignored.
Job slot reservation for parallel jobs is enabled by RESOURCE_RESERVE if the LSF scheduler plugin module names for both resource reservation and parallel batch jobs (schmod_parallel and schmod_reserve) are configured in the lsb.modules file. The schmod_parallel name must come before schmod_reserve in lsb.modules.
If a job has not accumulated enough memory or job slots to start by the time MAX_RESERVE_TIME expires, it releases all its reserved job slots or memory so that other pending jobs can run. After the reservation time expires, the job cannot reserve memory or slots for one scheduling session, so other jobs have a chance to be dispatched. After one scheduling session, the job can reserve available memory and job slots again for another period specified by MAX_RESERVE_TIME.
If BACKFILL is configured in a queue, and a run limit is specified with -W on bsub or with RUNLIMIT in the queue, backfill jobs can use the accumulated memory reserved by the other jobs in the queue, as long as the backfill job can finish before the predicted start time of the jobs with the reservation.
Unlike slot reservation, which applies only to parallel jobs, memory reservation and backfill on memory apply to both sequential and parallel jobs.
RESOURCE_RESERVE = MAX_RESERVE_TIME[5]
This example specifies that jobs have up to 5 dispatch turns to reserve sufficient job slots or memory (equal to 5 minutes, by default).
Undefined (no job slots or memory reserved)
RES_REQ
RES_REQ = res_req
Resource requirements used to determine eligible hosts. Specify a resource requirement string as usual. The resource requirement string lets you specify conditions in a more flexible manner than using the load thresholds.
The select section defined at the queue level must be satisfied in addition to any job-level requirements or load thresholds.
When both job-level and queue-level rusage sections are defined, the two rusage definitions are merged, with the job-level rusage taking precedence. For example:
- Given a RES_REQ definition in a queue:
RES_REQ = rusage[mem=200:lic=1] ...
and job submission:
bsub -R 'rusage[mem=100]' ...
the resulting requirement for the job is
rusage[mem=100:lic=1]
where mem=100 specified by the job overrides mem=200 specified by the queue. However, lic=1 from the queue is kept, since the job does not specify it.
- For the following queue-level RES_REQ (decay and duration defined):
RES_REQ = rusage[mem=200:duration=20:decay=1] ...
and job submission (no decay or duration):
bsub -R 'rusage[mem=100]' ...
the resulting requirement for the job is:
rusage[mem=100:duration=20:decay=1]
Queue-level duration and decay are merged with the job-level specification, and mem=100 for the job overrides mem=200 specified by the queue. However, duration=20 and decay=1 from the queue are kept, since the job does not specify them.
The order section defined at the queue level is ignored if any resource requirements are specified at the job level (if the job-level resource requirements do not include the order section, the default order, r15s:pg, is used instead of the queue-level resource requirement).
The span section defined at the queue level is ignored if the span section is also defined at the job level.
If RES_REQ is defined at the queue level and there are no load thresholds defined, the pending reasons for each individual load index are not displayed by bjobs.
select[type==local] order[r15s:pg]. If this parameter is defined and a host model or Boolean resource is specified, the default type is any.
RESUME_COND
RESUME_COND = res_req
Use the select section of the resource requirement string to specify load thresholds. All other sections are ignored.
LSF automatically resumes a suspended (SSUSP) job in this queue if the load on the host satisfies the specified conditions.
If RESUME_COND is not defined, the loadSched thresholds are used to control resuming of jobs. If RESUME_COND is defined, the loadSched thresholds are ignored when resuming jobs.
RUN_WINDOW
RUN_WINDOW = time_window ...
Time periods during which jobs in the queue are allowed to run.
When the window closes, LSF suspends jobs running in the queue and stops dispatching jobs from the queue. When the window reopens, LSF resumes the suspended jobs and begins dispatching additional jobs.
Undefined (queue is always active)
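As an illustrative sketch (queue name and window are hypothetical), a night queue that dispatches and runs jobs only between 8 p.m. and 8 a.m.; jobs still running when the window closes are suspended until it reopens:

```
Begin Queue
QUEUE_NAME = night
RUN_WINDOW = 20:00-8:00
End Queue
```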
RUNLIMIT
RUNLIMIT = [default_limit] maximum_limit
where default_limit and maximum_limit are:
[hours:]minutes[/host_name | /host_model]
The maximum run limit and optionally the default run limit. The name of a host or host model specifies the run time normalization host to use.
By default, jobs that are in the RUN state for longer than the specified maximum run limit are killed by LSF. You can optionally provide your own termination job action to override this default.
Jobs submitted with a job-level run limit (bsub -W) that is less than the maximum run limit are killed when their job-level run limit is reached. Jobs submitted with a run limit greater than the maximum run limit are rejected by the queue.
If a default run limit is specified, jobs submitted to the queue without a job-level run limit are killed when the default run limit is reached. The default run limit is used with backfill scheduling of parallel jobs.
If you specify only one limit, it is the maximum, or hard, run limit. If you specify two limits, the first one is the default, or soft, run limit, and the second one is the maximum run limit.
The run limit is in the form [hours:]minutes. The number of minutes may be greater than 59; for example, three and a half hours can be specified either as 3:30 or as 210.
The run limit you specify is the normalized run time. This is done so that the job does approximately the same amount of processing, even if it is sent to a host with a faster or slower CPU. Whenever a normalized run time is given, the actual time on the execution host is the specified time multiplied by the CPU factor of the normalization host, then divided by the CPU factor of the execution host.
If ABS_RUNLIMIT=Y is defined in lsb.params, the run time limit is not normalized by the host CPU factor. Absolute wall-clock run time is used for all jobs submitted to a queue with a run limit configured.
Optionally, you can supply a host name or a host model name defined in LSF. You must insert `/' between the run limit and the host name or model name. (See lsinfo(1) to get host model information.)
If no host or host model is given, LSF uses the default run time normalization host defined at the queue level (DEFAULT_HOST_SPEC in lsb.queues) if it has been configured; otherwise, LSF uses the default CPU time normalization host defined at the cluster level (DEFAULT_HOST_SPEC in lsb.params) if it has been configured; otherwise, the host with the largest CPU factor (the fastest host in the cluster).
For MultiCluster jobs, if no other CPU time normalization host is defined and information about the submission host is not available, LSF uses the host with the largest CPU factor (the fastest host in the cluster).
Jobs submitted to a chunk job queue are not chunked if RUNLIMIT is greater than 30 minutes.
Unlimited
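For example (values are illustrative), the following sets a default run limit of 3.5 hours and a maximum of 5 hours; jobs submitted with bsub -W requesting more than 5:00 are rejected by the queue:

```
RUNLIMIT = 3:30 5:00
```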
SLOT_RESERVE
SLOT_RESERVE = MAX_RESERVE_TIME[integer]
Enables processor reservation for the queue and specifies the reservation time. Specify the keyword MAX_RESERVE_TIME and, in square brackets, the number of MBD_SLEEP_TIME cycles over which a job can reserve job slots. MBD_SLEEP_TIME is defined in lsb.params; the default value is 60 seconds.
If a job has not accumulated enough job slots to start before the reservation expires, it releases all its reserved job slots so that other jobs can run. Then, the job cannot reserve slots for one scheduling session, so other jobs have a chance to be dispatched. After one scheduling session, the job can reserve job slots again for another period specified by SLOT_RESERVE.
SLOT_RESERVE is overridden by the RESOURCE_RESERVE parameter.
If both RESOURCE_RESERVE and SLOT_RESERVE are defined in the same queue, job slot reservation and memory reservation are enabled and an error is displayed when the cluster is reconfigured. SLOT_RESERVE is ignored.
Job slot reservation for parallel jobs is enabled by RESOURCE_RESERVE if the LSF scheduler plugin module names for both resource reservation and parallel batch jobs (schmod_parallel and schmod_reserve) are configured in the lsb.modules file. The schmod_parallel name must come before schmod_reserve in lsb.modules.
If BACKFILL is configured in a queue, and a run limit is specified with -W on bsub or with RUNLIMIT in the queue, backfill parallel jobs can use job slots reserved by the other jobs, as long as the backfill job can finish before the predicted start time of the jobs with the reservation.
Unlike memory reservation, which applies to both sequential and parallel jobs, slot reservation applies only to parallel jobs.
SLOT_RESERVE = MAX_RESERVE_TIME[5]
This example specifies that parallel jobs have up to 5 cycles of MBD_SLEEP_TIME (5 minutes, by default) to reserve sufficient job slots to start.
Undefined (no job slots reserved)
SLOT_SHARE
SLOT_SHARE = integer
Share of job slots for queue-based fairshare. Represents the percentage of running jobs (job slots) in use from the queue. SLOT_SHARE must be greater than zero (0) and less than or equal to 100.
The sum of SLOT_SHARE for all queues in the pool does not need to be 100%. It can be more or less, depending on your needs.
Undefined
SLOT_POOL
SLOT_POOL = pool_name
Name of the pool of job slots the queue belongs to for queue-based fairshare. A queue can only belong to one pool. All queues in the pool must share the same set of hosts.
Specify any ASCII string up to 60 characters long. You can use letters, digits, underscores (_), or dashes (-). You cannot use blank spaces.
Undefined
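As an illustrative sketch (pool and queue names are hypothetical), two queues sharing one slot pool for queue-based fairshare, with a 60/40 split of the slots in the pool:

```
Begin Queue
QUEUE_NAME = short
SLOT_POOL  = poolA
SLOT_SHARE = 60
End Queue

Begin Queue
QUEUE_NAME = long
SLOT_POOL  = poolA
SLOT_SHARE = 40
End Queue
```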
SNDJOBS_TO
SNDJOBS_TO = queue_name@cluster_name ...
Defines a MultiCluster send-jobs queue.
Specify remote queue names, in the form queue_name@cluster_name, separated by a space.
This parameter is ignored if lsb.queues HOSTS specifies remote (borrowed) resources.
SNDJOBS_TO = queue2@cluster2 queue3@cluster2 queue3@cluster3
STACKLIMIT
STACKLIMIT = integer
The per-process (hard) stack segment size limit (in KB) for all of the processes belonging to a job from this queue (see getrlimit(2)).
Unlimited
STOP_COND
STOP_COND = res_req
Use the select section of the resource requirement string to specify load thresholds. All other sections are ignored.
LSF automatically suspends a running job in this queue if the load on the host satisfies the specified conditions.
- LSF does not suspend the only job running on the host if the machine is interactively idle (it > 0).
- LSF does not suspend a forced job (brun -f).
- LSF does not suspend a job because of paging rate if the machine is interactively idle.
If STOP_COND is specified in the queue and there are no load thresholds, the suspending reasons for each individual load index are not displayed by bjobs.
STOP_COND = select[((!cs && it < 5) || (cs && mem < 15 && swap < 50))]
In this example, assume "cs" is a Boolean resource indicating that the host is a computer server. The stop condition for jobs running on computer servers is based on the availability of swap memory. The stop condition for jobs running on other kinds of hosts is based on the idle time.
SWAPLIMIT
SWAPLIMIT = integer
The total virtual memory limit (in KB) for a job from this queue.
This limit applies to the whole job, no matter how many processes the job may contain.
The action taken when a job exceeds its SWAPLIMIT or PROCESSLIMIT is to send SIGQUIT, SIGINT, SIGTERM, and SIGKILL in sequence. For CPULIMIT, SIGXCPU is sent before SIGINT, SIGTERM, and SIGKILL.
Unlimited
TERMINATE_WHEN
TERMINATE_WHEN = [LOAD] [PREEMPT] [WINDOW]
Configures the queue to invoke the TERMINATE action instead of the SUSPEND action in the specified circumstance.
- LOAD -- kills jobs when the load exceeds the suspending thresholds.
- PREEMPT -- kills jobs that are being preempted.
- WINDOW -- kills jobs if the run window closes.
If the TERMINATE_WHEN job control action is applied to a chunk job, sbatchd kills the chunk job element that is running and puts the rest of the waiting elements into pending state to be rescheduled later.
Set TERMINATE_WHEN to WINDOW to define a night queue that kills jobs if the run window closes:
Begin Queue
QUEUE_NAME     = night
RUN_WINDOW     = 20:00-08:00
TERMINATE_WHEN = WINDOW
JOB_CONTROLS   = TERMINATE[kill -KILL $LS_JOBPGIDS; mail -s "job $LSB_JOBID killed by queue run window" $USER < /dev/null]
End Queue
THREADLIMIT
THREADLIMIT = [default_limit] maximum_limit
Limits the number of concurrent threads that can be part of a job. Exceeding the limit causes the job to terminate. The system sends the following signals in sequence to all processes belonging to the job: SIGINT, SIGTERM, and SIGKILL.
By default, if a default thread limit is specified, jobs submitted to the queue without a job-level thread limit are killed when the default thread limit is reached.
If you specify only one limit, it is the maximum, or hard, thread limit. If you specify two limits, the first one is the default, or soft, thread limit, and the second one is the maximum thread limit.
Both the default and the maximum limits must be positive integers. The default limit must be less than the maximum limit. The default limit is ignored if it is greater than the maximum limit.
THREADLIMIT = 6
No default thread limit is specified. The value 6 is the default and maximum thread limit.
THREADLIMIT = 6 8
The first value (6) is the default thread limit. The second value (8) is the maximum thread limit.
Unlimited
UJOB_LIMIT
UJOB_LIMIT = integer
Per-user job slot limit for the queue. Maximum number of job slots that each user can use in this queue.
Unlimited
USERS
USERS = all | [user_name ...] [user_group ...]
A list of users that can submit jobs to this queue. Use the reserved word all to specify all LSF users. LSF cluster administrators are automatically included in the list of users, so LSF cluster administrators can submit jobs to this queue, or switch any user's jobs into this queue, even if they are not listed.
If user groups are specified, each user in the group can submit jobs to this queue. If FAIRSHARE is also defined in this queue, only users defined by both parameters can submit jobs, so LSF administrators cannot use the queue if they are not included in the share assignments.
all
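For example (user and group names are hypothetical), to restrict a queue to two named users plus the members of one user group defined in lsb.users:

```
Begin Queue
QUEUE_NAME = restricted
USERS      = user1 user2 ugroup1
End Queue
```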
[ Top ]
SEE ALSO
lsf.cluster(5), lsf.conf(5), lsb.params(5), lsb.hosts(5), lsb.users(5), lsf.sudoers(5), bhpart(1), busers(1), bchkpnt(1), bugroup(1), bmgroup(1), nice(1), getgrnam(3), getrlimit(2), bqueues(1), bhosts(1), bsub(1), lsid(1), mbatchd(8), badmin(8)
[ Top ]
Date Modified: February 24, 2004
Copyright © 1994-2004 Platform Computing Corporation. All rights reserved.