The lsb.params file defines general parameters used by the LSF system. This file contains only one section, named Parameters. mbatchd uses lsb.params for initialization. The file is optional. If it is not present, the LSF-defined defaults are assumed.
Some of the parameters that can be defined in lsb.params control timing within the system. The default settings provide good throughput for long-running batch jobs while adding a minimum of processing overhead in the batch daemons.
This file is installed by default in LSB_CONFDIR/cluster_name/configdir.
Parameters Section
This section and all the keywords in this section are optional. If keywords are not present, the default values are assumed. The valid keywords for this section are:
ABS_RUNLIMIT
ABS_RUNLIMIT = y | Y
If set, the run time limit specified by the -W option of bsub, or the RUNLIMIT queue parameter in lsb.queues, is not normalized by the host CPU factor. Absolute wall-clock run time is used for all jobs submitted with a run limit.
Undefined. Run limit is normalized.
ACCT_ARCHIVE_AGE
ACCT_ARCHIVE_AGE = days
Enables automatic archiving of LSF accounting log files, and specifies the archive interval. LSF archives the current log file if the length of time from its creation date exceeds the specified number of days.
- ACCT_ARCHIVE_SIZE also enables automatic archiving.
- ACCT_ARCHIVE_TIME also enables automatic archiving.
- MAX_ACCT_ARCHIVE_FILE enables automatic deletion of the archives.
Undefined (no limit to the age of lsb.acct).
ACCT_ARCHIVE_SIZE
ACCT_ARCHIVE_SIZE = kilobytes
Enables automatic archiving of LSF accounting log files, and specifies the archive threshold. LSF archives the current log file if its size exceeds the specified number of kilobytes.
- ACCT_ARCHIVE_AGE also enables automatic archiving.
- ACCT_ARCHIVE_TIME also enables automatic archiving.
- MAX_ACCT_ARCHIVE_FILE enables automatic deletion of the archives.
Undefined (no limit to the size of lsb.acct).
ACCT_ARCHIVE_TIME
ACCT_ARCHIVE_TIME = hh:mm
Enables automatic archiving of the LSF accounting log file lsb.acct, and specifies the time of day to archive the current log file.
- ACCT_ARCHIVE_AGE also enables automatic archiving.
- ACCT_ARCHIVE_SIZE also enables automatic archiving.
- MAX_ACCT_ARCHIVE_FILE enables automatic deletion of the archives.
Undefined (no time set for archiving lsb.acct).
CHUNK_JOB_DURATION
CHUNK_JOB_DURATION = minutes
Specifies a CPU limit or run limit for jobs submitted to a chunk job queue to be chunked.
When CHUNK_JOB_DURATION is set, the CPU limit or run limit set in the queue (CPULIMIT or RUNLIMIT) or specified at job submission (-c or -W bsub options) must be less than or equal to CHUNK_JOB_DURATION for jobs to be chunked.
If CHUNK_JOB_DURATION is set, jobs are not chunked if:
- No CPU limit and no run limit are specified in the queue (CPULIMIT and RUNLIMIT) or at job submission (-c or -W bsub options), or
- CPU limit or a run limit is greater than the value of CHUNK_JOB_DURATION.
If CHUNK_JOB_DURATION is set, chunk jobs are accepted regardless of the value of CPULIMIT or RUNLIMIT.
The value of CHUNK_JOB_DURATION is displayed by bparams -l.
Undefined
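The chunking conditions above can be sketched as a small predicate. This is only an illustration of the stated rules, not LSF code: the function name `is_chunkable` is hypothetical, all values are assumed to be in minutes, and a limit of `None` is assumed to mean "not specified at the queue or at submission".

```python
def is_chunkable(chunk_job_duration, cpu_limit=None, run_limit=None):
    """Return True if a job qualifies for chunking when CHUNK_JOB_DURATION is set.

    Hypothetical helper: None stands for a limit that was not specified.
    """
    limits = [limit for limit in (cpu_limit, run_limit) if limit is not None]
    if not limits:
        # No CPU limit and no run limit: the job is not chunked.
        return False
    # Any specified limit above CHUNK_JOB_DURATION prevents chunking.
    return all(limit <= chunk_job_duration for limit in limits)
```

For example, with CHUNK_JOB_DURATION=30, a job with a 20-minute CPU limit qualifies, while a job with no limits, or with a 45-minute run limit, does not.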
CLEAN_PERIOD
CLEAN_PERIOD = seconds
For non-repetitive jobs, the amount of time that records for jobs that have finished or have been killed are kept in mbatchd core memory.
While the records are kept in memory, users can still see finished jobs with the bjobs command.
For jobs that finished more than CLEAN_PERIOD seconds ago, use the bhist command.
3600 (1 hour)
CPU_TIME_FACTOR
CPU_TIME_FACTOR = number
Used only with fairshare scheduling. CPU time weighting factor.
In the calculation of a user's dynamic share priority, this factor determines the relative importance of the cumulative CPU time used by a user's jobs.
0.7
COMMITTED_RUN_TIME_FACTOR
COMMITTED_RUN_TIME_FACTOR = number
Used only with fairshare scheduling. Committed run time weighting factor.
In the calculation of a user's dynamic priority, this factor determines the relative importance of the committed run time. If the -W option of bsub is not specified at job submission and a RUNLIMIT has not been set for the queue, the committed run time is not considered.
Any positive number between 0.0 and 1.0
0.0
DEFAULT_HOST_SPEC
DEFAULT_HOST_SPEC = host_name | host_model
The default CPU time normalization host for the cluster.
The CPU factor of the specified host or host model will be used to normalize the CPU time limit of all jobs in the cluster, unless the CPU time normalization host is specified at the queue or job level.
Undefined
DEFAULT_PROJECT
DEFAULT_PROJECT = project_name
The name of the default project. Specify any string.
When you submit a job without specifying any project name, and the environment variable LSB_DEFAULTPROJECT is not set, LSF automatically assigns the job to this project.
default
DEFAULT_QUEUE
DEFAULT_QUEUE = queue_name ...
Space-separated list of candidate default queues (candidates must already be defined in lsb.queues).
When you submit a job to LSF without explicitly specifying a queue, and the environment variable LSB_DEFAULTQUEUE is not set, LSF puts the job in the first queue in this list that satisfies the job's specifications subject to other restrictions, such as requested hosts, queue status, etc.
Undefined. When a user submits a job to LSF without explicitly specifying a queue, and there are no candidate default queues defined (by this parameter or by the user's environment variable LSB_DEFAULTQUEUE), LSF automatically creates a new queue named default, using the default configuration, and submits the job to that queue.
DISABLE_UACCT_MAP
DISABLE_UACCT_MAP = y | Y
Specify y or Y to disable user-level account mapping.
Undefined
EADMIN_TRIGGER_DURATION
Defines how often LSF_SERVERDIR/eadmin is invoked once a job exception is detected. Used in conjunction with job exception handling parameters JOB_OVERRUN and JOB_UNDERRUN in lsb.queues.
EADMIN_TRIGGER_DURATION=20
5 minutes
ENABLE_HIST_RUN_TIME
ENABLE_HIST_RUN_TIME = y | Y
Used only with fairshare scheduling. If set, enables the use of historical run time in the calculation of fairshare scheduling priority.
Undefined
ENABLE_USER_RESUME
ENABLE_USER_RESUME = Y | N
Defines job resume permissions.
When this parameter is defined:
- If the value is Y, users can resume their own jobs that have been suspended by the administrator.
- If the value is N, jobs that are suspended by the administrator can only be resumed by the administrator or root; users do not have permission to resume a job suspended by another user or the administrator. Administrators can resume jobs suspended by users or administrators.
Undefined (users cannot resume jobs suspended by administrator)
EVENT_UPDATE_INTERVAL
EVENT_UPDATE_INTERVAL = seconds
Used with duplicate logging of event and accounting log files. LSB_LOCALDIR in lsf.conf must also be specified. Specifies how often to back up the data and synchronize the directories (LSB_SHAREDIR and LSB_LOCALDIR).
The directories are always synchronized when data is logged to the files, or when mbatchd is started on the first LSF master host.
Use this parameter if NFS traffic is too high and you want to reduce network traffic.
1 to INFINIT_INT
INFINIT_INT is defined in lsf.h
Undefined
See lsf.conf under LSB_LOCALDIR.
HIST_HOURS
HIST_HOURS = hours
Used only with fairshare scheduling. Determines a rate of decay for cumulative CPU time and historical run time.
To calculate dynamic user priority, LSF scales the actual CPU time using a decay factor, so that 1 hour of recently-used time is equivalent to 0.1 hours after the specified number of hours has elapsed.
To calculate dynamic user priority with historical run time, LSF scales the accumulated run time of finished jobs using the same decay factor, so that 1 hour of recently-used time is equivalent to 0.1 hours after the specified number of hours has elapsed.
When HIST_HOURS=0, CPU time accumulated by running jobs is not decayed.
5
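The decay described above can be illustrated with a short calculation. The text only fixes one point of the curve (1 hour counts as 0.1 hours once HIST_HOURS have elapsed); the exponential shape used here is an assumption made for illustration, and `decayed_hours` is a hypothetical name, not an LSF function.

```python
def decayed_hours(hours_used, elapsed_hours, hist_hours=5.0):
    """Scale usage so that 1 hour counts as 0.1 hours after hist_hours have elapsed."""
    if hist_hours == 0:
        # HIST_HOURS=0: accumulated time is not decayed.
        return hours_used
    # Assumed exponential decay; only the 0.1 endpoint comes from the text.
    return hours_used * 0.1 ** (elapsed_hours / hist_hours)
```

With the default of 5 hours, `decayed_hours(1.0, 5.0)` is approximately 0.1 under this assumption, and usage that is zero hours old is counted in full.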
JOB_ACCEPT_INTERVAL
JOB_ACCEPT_INTERVAL = integer
The number you specify is multiplied by the value of lsb.params MBD_SLEEP_TIME (60 seconds by default). The result of the calculation is the number of seconds to wait after dispatching a job to a host, before dispatching a second job to the same host.
If 0 (zero), a host may accept more than one job. By default, there is no limit to the total number of jobs that can run on a host, so if this parameter is set to 0, a very large number of jobs might be dispatched to a host all at once. This can overload your system to the point that it will be unable to create any more processes. It is not recommended to set this parameter to 0.
JOB_ACCEPT_INTERVAL set at the queue level (lsb.queues) overrides JOB_ACCEPT_INTERVAL set at the cluster level (lsb.params).
1
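As a worked example of the calculation above, the following sketch computes the wait between two dispatches to the same host and applies the queue-over-cluster precedence. The function names are hypothetical and the code only restates the arithmetic in the description.

```python
def accept_wait_seconds(job_accept_interval=1, mbd_sleep_time=60):
    """Seconds to wait after dispatching a job to a host before dispatching another."""
    return job_accept_interval * mbd_sleep_time


def effective_accept_interval(cluster_value, queue_value=None):
    """A queue-level setting (lsb.queues) overrides the cluster-level one (lsb.params)."""
    return cluster_value if queue_value is None else queue_value
```

With the defaults (JOB_ACCEPT_INTERVAL=1, MBD_SLEEP_TIME=60) the wait is 60 seconds; a value of 0 yields no wait at all, which is why the text advises against it.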
JOB_ATTA_DIR
JOB_ATTA_DIR = directory
The shared directory in which mbatchd saves the attached data of messages posted with the bpost command.
Use JOB_ATTA_DIR if you use bpost(1) and bread(1) to transfer large data files between jobs and want to avoid using space in LSB_SHAREDIR. By default, the bread(1) command reads attachment data from the JOB_ATTA_DIR directory.
JOB_ATTA_DIR should be shared by all hosts in the cluster, so that any potential LSF master host can reach it. Like LSB_SHAREDIR, the directory should be owned and writable by the primary LSF administrator. The directory must have at least 1 MB of free space.
The attached data will be stored under the directory in the format:
JOB_ATTA_DIR/timestamp.jobid.msgs/msg$msgindex
On UNIX, specify an absolute path. For example:
JOB_ATTA_DIR=/opt/share/lsf_work
On Windows, specify a UNC path or a path with a drive letter. For example:
JOB_ATTA_DIR=\\HostA\temp\lsf_work
or
JOB_ATTA_DIR=D:\temp\lsf_work
After adding JOB_ATTA_DIR to lsb.params, use badmin reconfig to reconfigure your cluster.
JOB_ATTA_DIR can be any valid UNIX or Windows path up to a maximum length of 256 characters.
Undefined
If JOB_ATTA_DIR is not specified, job message attachments are saved in LSB_SHAREDIR/info/.
JOB_DEP_LAST_SUB
Used only with job dependency scheduling.
If set to 1, whenever dependency conditions use a job name that belongs to multiple jobs, LSF evaluates only the most recently submitted job.
Otherwise, all the jobs with the specified name must satisfy the dependency condition.
Undefined
JOB_EXIT_RATE_DURATION
Defines how long LSF waits before checking the job exit rate for a host. Used in conjunction with EXIT_RATE in lsb.hosts for LSF host exception handling.
If the job exit rate is exceeded for the period specified by JOB_EXIT_RATE_DURATION, LSF invokes LSF_SERVERDIR/eadmin to trigger a host exception.
JOB_EXIT_RATE_DURATION=5
10 minutes
JOB_PRIORITY_OVER_TIME
JOB_PRIORITY_OVER_TIME = increment/interval
JOB_PRIORITY_OVER_TIME enables automatic job priority escalation when MAX_USER_PRIORITY is also defined.
increment
Specifies the value used to increase job priority every interval minutes. Valid values are positive integers.
interval
Specifies the frequency, in minutes, to increment job priority. Valid values are positive integers.
Undefined
JOB_PRIORITY_OVER_TIME=3/20
Specifies that the priority of pending jobs is incremented by 3 every 20 minutes.
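The escalation in the example can be sketched as follows. The helper names are hypothetical, and the sketch assumes that the increment is applied once per completed interval; any upper bound on the resulting priority is not described here and is therefore not modeled.

```python
def parse_priority_over_time(value):
    """Split an 'increment/interval' setting into two positive integers."""
    increment_text, interval_text = value.split("/")
    increment, interval = int(increment_text), int(interval_text)
    if increment <= 0 or interval <= 0:
        raise ValueError("increment and interval must be positive integers")
    return increment, interval


def escalated_priority(initial_priority, pending_minutes, setting):
    """Priority of a pending job after pending_minutes, one increment per full interval."""
    increment, interval = parse_priority_over_time(setting)
    return initial_priority + increment * (pending_minutes // interval)
```

Under these assumptions, a job submitted with priority 50 that has been pending for 65 minutes with JOB_PRIORITY_OVER_TIME=3/20 has completed three intervals and reaches priority 59.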
JOB_SCHEDULING_INTERVAL
JOB_SCHEDULING_INTERVAL = seconds
Time interval at which mbatchd sends jobs for scheduling to the scheduling daemon mbschd along with any collected load information.
5 seconds
JOB_SPOOL_DIR
JOB_SPOOL_DIR = dir
Specifies the directory for buffering batch standard output and standard error for a job.
When JOB_SPOOL_DIR is defined, the standard output and standard error for the job are buffered in the specified directory.
Files are copied from the submission host to a temporary file in the directory specified by the JOB_SPOOL_DIR on the execution host. LSF removes these files when the job completes.
If JOB_SPOOL_DIR is not accessible or does not exist, files are spooled to the default job output directory $HOME/.lsbatch.
For bsub -is and bsub -Zs, JOB_SPOOL_DIR must be readable and writable by the job submission user, and it must be shared by the master host and the submission host. If the specified directory is not accessible or does not exist, and JOB_SPOOL_DIR is specified, bsub -is cannot write to the default directory LSB_SHAREDIR/cluster_name/lsf_indir, and bsub -Zs cannot write to the default directory LSB_SHAREDIR/cluster_name/lsf_cmddir, and the job will fail.
As LSF runs jobs, it creates temporary directories and files under JOB_SPOOL_DIR. By default, LSF removes these directories and files after the job is finished. See bsub(1) for information about job submission options that specify the disposition of these files.
On UNIX, specify an absolute path. For example:
JOB_SPOOL_DIR=/home/share/lsf_spool
On Windows, specify a UNC path or a path with a drive letter. For example:
JOB_SPOOL_DIR=\\HostA\share\spooldir
or
JOB_SPOOL_DIR=D:\share\spooldir
In a mixed UNIX/Windows cluster, specify one path for the UNIX platform and one for the Windows platform. Separate the two paths by a pipe character (|):
JOB_SPOOL_DIR=/usr/share/lsf_spool | \\HostA\share\spooldir
JOB_SPOOL_DIR can be any valid path up to a maximum length of 256 characters. This maximum path length includes the temporary directories and files that the LSF system creates as jobs run. The path you specify for JOB_SPOOL_DIR should be as short as possible to avoid exceeding this limit.
Undefined
Batch job output (standard output and standard error) is sent to the .lsbatch directory on the execution host:
- On UNIX: $HOME/.lsbatch
- On Windows: %windir%\lsbtmpuser_id\.lsbatch
If %HOME% is specified in the user environment, LSF uses that directory instead of %windir% for spooled output.
JOB_TERMINATE_INTERVAL
JOB_TERMINATE_INTERVAL = seconds
UNIX only.
Specifies the time interval in seconds between sending SIGINT, SIGTERM, and SIGKILL when terminating a job. When a job is terminated, the job is sent SIGINT, SIGTERM, and SIGKILL in sequence with a sleep time of JOB_TERMINATE_INTERVAL between sending the signals. This allows the job to clean up if necessary.
10
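The termination sequence for JOB_TERMINATE_INTERVAL can be pictured as a simple schedule of signals. This sketch only lists when each signal would be sent relative to the start of termination; `termination_schedule` is a hypothetical name, and signal names are given as strings so that no process is actually signaled.

```python
def termination_schedule(job_terminate_interval=10):
    """Return (seconds_from_start, signal_name) pairs for terminating a job."""
    signals = ["SIGINT", "SIGTERM", "SIGKILL"]
    # Each signal follows the previous one after job_terminate_interval seconds.
    return [(index * job_terminate_interval, name) for index, name in enumerate(signals)]
```

With the default of 10 seconds, SIGINT is sent immediately, SIGTERM after 10 seconds, and SIGKILL after 20 seconds, which gives the job two intervals to clean up.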
MAX_ACCT_ARCHIVE_FILE
MAX_ACCT_ARCHIVE_FILE = integer
Enables automatic deletion of archived LSF accounting log files and specifies the archive limit.
ACCT_ARCHIVE_SIZE or ACCT_ARCHIVE_AGE should also be defined.
MAX_ACCT_ARCHIVE_FILE=10
LSF maintains the current lsb.acct and up to 10 archives. Every time the old lsb.acct.9 becomes lsb.acct.10, the old lsb.acct.10 gets deleted.
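The rotation in this example can be sketched as a plan of renames and deletions. The sketch is illustrative only: `rotation_plan` is a hypothetical name, and it assumes that the current lsb.acct becomes lsb.acct.1 when it is archived, which the example implies but does not state.

```python
def rotation_plan(existing_suffixes, max_archives):
    """Return (renames, deleted) for one archive rotation.

    existing_suffixes holds every n for which lsb.acct.n currently exists.
    """
    renames, deleted = [], []
    # Work from the oldest archive down so no file is overwritten.
    for n in sorted(existing_suffixes, reverse=True):
        if n + 1 > max_archives:
            deleted.append(f"lsb.acct.{n}")
        else:
            renames.append((f"lsb.acct.{n}", f"lsb.acct.{n + 1}"))
    # Assumption: the current log becomes the newest archive.
    renames.append(("lsb.acct", "lsb.acct.1"))
    return renames, deleted
```

With MAX_ACCT_ARCHIVE_FILE=10 and archives 1 through 10 present, the plan deletes lsb.acct.10, shifts lsb.acct.9 to lsb.acct.10, and so on down to the current file.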
- ACCT_ARCHIVE_AGE also enables automatic archiving.
- ACCT_ARCHIVE_SIZE also enables automatic archiving.
- ACCT_ARCHIVE_TIME also enables automatic archiving.
- MAX_ACCT_ARCHIVE_FILE enables automatic deletion of the archives.
Undefined (no deletion of lsb.acct.n files).
MAX_JOB_ARRAY_SIZE
MAX_JOB_ARRAY_SIZE = integer
Specifies the maximum number of jobs in a job array that can be created by a user for a single job submission. The maximum number of jobs in a job array cannot exceed this value.
A large job array allows a user to submit a large number of jobs to the system with a single job submission.
Specify an integer value from 1 to 65534.
1000
MAX_JOB_ATTA_SIZE
MAX_JOB_ATTA_SIZE = integer | 0
Specify any number less than 20000.
Maximum attached data size, in KB, that can be transferred to a job.
Maximum size for data attached to a job with the bpost(1) command. Useful if you use bpost(1) and bread(1) to transfer large data files between jobs and you want to limit the usage in the current working directory.
0 indicates that jobs cannot accept attached data files.
Undefined. LSF does not set a maximum size of job attachments.
MAX_JOBID
MAX_JOBID = integer
The job ID limit. The job ID limit is the highest job ID that LSF will ever assign, and also the maximum number of jobs in the system.
By default, LSF assigns job IDs up to 6 digits. This means that no more than 999999 jobs can be in the system at once.
Specify any integer from 999999 to 9999999 (for practical purposes, any seven-digit integer).
You cannot lower the job ID limit, but you can raise it to seven digits. This means you can have more jobs in the system, and the job ID numbers will roll over less often.
LSF assigns job IDs in sequence. When the job ID limit is reached, the count rolls over, so the next job submitted gets job ID "1". If the original job 1 remains in the system, LSF skips that number and assigns job ID "2", or the next available job ID. If you have so many jobs in the system that the low job IDs are still in use when the maximum job ID is assigned, jobs with sequential numbers could have totally different submission times.
By raising the job ID limit, you allow more time for old jobs to leave the system, and make it more likely that numbers can be assigned in sequence without conflicting with existing jobs.
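The rollover behavior described above can be sketched in a few lines. This is a simplified model rather than LSF's implementation: `next_job_id` is a hypothetical name, and the set `in_use` stands in for the job IDs still held by jobs in the system.

```python
def next_job_id(last_id, in_use, max_jobid=999999):
    """Return the next job ID, rolling over at max_jobid and skipping IDs in use."""
    candidate = last_id
    for _ in range(max_jobid):
        # Count up in sequence; after the limit, start again at 1.
        candidate = candidate + 1 if candidate < max_jobid else 1
        if candidate not in in_use:
            return candidate
    raise RuntimeError("every job ID up to the limit is in use")
```

For example, if the last assigned ID was 999999 and job 1 is still in the system, the next job receives ID 2.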
MAX_JOBID = 1234567
999999
MAX_JOBINFO_QUERY_PERIOD
MAX_JOBINFO_QUERY_PERIOD = integer
Maximum time for job information query commands (e.g., bjobs) to wait.
When the time arrives, the query command processes exit, and all associated threads are terminated.
If the parameter is not defined, query command processes will wait for all threads to finish.
Specify a multiple of MBD_REFRESH_TIME.
Any positive integer greater than or equal to one (1)
Undefined
See lsf.conf under LSB_BLOCK_JOBINFO_TIMEOUT.
MAX_JOB_MSG_NUM
MAX_JOB_MSG_NUM = integer | 0
Maximum number of message slots for each job. Maximum number of messages that can be posted to a job with the bpost(1) command.
0 indicates that jobs cannot accept external messages.
128
MAX_JOB_NUM
MAX_JOB_NUM = integer
The maximum number of finished jobs whose events are to be stored in the lsb.events log file.
Once the limit is reached, mbatchd starts a new event log file. The old event log file is saved as lsb.events.n, with subsequent sequence number suffixes incremented by 1 each time a new log file is started. Event logging continues in the new lsb.events file.
1000
MAX_PREEXEC_RETRY
MAX_PREEXEC_RETRY = integer
MultiCluster job forwarding model only. The maximum number of times to attempt the pre-execution command of a job from a remote cluster.
If the job's pre-execution command fails all attempts, the job is returned to the submission cluster.
MAX_SBD_CONNS
MAX_SBD_CONNS = integer
The maximum number of file descriptors mbatchd can have open and connected concurrently to sbatchd.
Controls the maximum number of connections that can be maintained to sbatchds in the system. Many sites require more than 32 connections.
The value should not exceed the file descriptor limit of root (the usual limit is 1024). Setting it equal to or larger than this limit can cause mbatchd to constantly die because mbatchd allocates all file descriptors to sbatchd connections. This could cause mbatchd to run out of descriptors, which results in an mbatchd fatal error, such as failure to open lsb.events.
Reasonable settings are:
32
MAX_SBD_FAIL
MAX_SBD_FAIL = integer
The maximum number of retries for reaching a non-responding slave batch daemon, sbatchd.
The interval between retries is defined by MBD_SLEEP_TIME. If mbatchd fails to reach a host and has retried MAX_SBD_FAIL times, the host is considered unavailable. When a host becomes unavailable, mbatchd assumes that all jobs running on that host have exited and that all rerunnable jobs (jobs submitted with the bsub -r option) are scheduled to be rerun on another host.
3
MAX_SCHED_STAY
MAX_SCHED_STAY = integer
The time, in seconds, that mbatchd has for a scheduling pass.
3
MAX_USER_PRIORITY
MAX_USER_PRIORITY = integer
Enables user-assigned job priority and specifies the maximum job priority a user can assign to a job.
LSF administrators can assign a job priority higher than the specified value.
User-assigned job priority changes the behavior of btop and bbot.
MAX_USER_PRIORITY=100
Specifies that 100 is the maximum job priority that can be specified by a user.
Undefined
MBD_REFRESH_TIME
MBD_REFRESH_TIME = seconds
Time interval, in seconds, at which mbatchd will fork a new child mbatchd to service query requests to keep information sent back to clients updated. A child mbatchd processes query requests by creating threads.
MBD_REFRESH_TIME applies only to UNIX platforms that support thread programming.
MBD_REFRESH_TIME works in conjunction with LSB_QUERY_PORT in lsf.conf. The child mbatchd continues to listen to the port number specified by LSB_QUERY_PORT and creates threads to service requests until the job changes status, a new job is submitted, or MBD_REFRESH_TIME has expired.
- If MBD_REFRESH_TIME is < 10 seconds, the child mbatchd exits at MBD_REFRESH_TIME even if the job changes status or a new job is submitted before MBD_REFRESH_TIME expires
- If MBD_REFRESH_TIME > 10 seconds, the child mbatchd exits at 10 seconds even if the job changes status or a new job is submitted before the 10 seconds
- If MBD_REFRESH_TIME > 10 seconds and no job changes status or a new job is submitted, the child mbatchd exits at MBD_REFRESH_TIME
The value of this parameter must be between 5 and 300. Any values specified out of this range are ignored, and the system default value is applied.
The bjobs command may not display up-to-date information if two consecutive query commands are issued before a child mbatchd expires because child mbatchd job information is not updated. If you use the bjobs command and do not get up-to-date information, you may need to decrease the value of this parameter. Note, however, that the lower the value of this parameter, the more you negatively affect performance.
The number of concurrent requests is limited by the number of concurrent threads that a process can have. This number varies by platform:
- Sun Solaris, 2500 threads per process
- AIX, 512 threads per process
- Digital, 256 threads per process
- HP-UX, 64 threads per process
5 seconds if not defined or if defined value is less than 5; 300 seconds if defined value is more than 300
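The default rule above and the child mbatchd exit conditions can be sketched as two small functions. Both names are hypothetical, and the behavior at exactly 10 seconds, which the list does not spell out, is assumed here to follow the "exits at MBD_REFRESH_TIME" case.

```python
def effective_refresh_time(configured=None):
    """Apply the default rule: 5 if undefined or below 5, 300 if above 300."""
    if configured is None or configured < 5:
        return 5
    if configured > 300:
        return 300
    return configured


def child_exit_after(refresh_time, job_activity):
    """Seconds after which the child mbatchd exits, per the three listed cases.

    job_activity is True if a job changed status or a new job was submitted.
    """
    if refresh_time > 10 and job_activity:
        return 10
    return refresh_time
```

For instance, a setting of 60 seconds means the child mbatchd lives for 60 seconds on a quiet cluster but only 10 seconds once job activity occurs.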
MBD_SLEEP_TIME
MBD_SLEEP_TIME = seconds
Used in conjunction with the parameters SLOT_RESERVE and MAX_SBD_FAIL.
Amount of time in seconds used for calculating parameter values.
60
MC_RECLAIM_DELAY
MC_RECLAIM_DELAY = minutes
MultiCluster resource leasing model only. The reclaim interval (how often to reconfigure shared leases) in minutes.
Shared leases are defined by Type=shared in the lsb.resources HostExport section.
10
MC_PENDING_REASON_PKG_SIZE
MC_PENDING_REASON_PKG_SIZE = kilobytes | 0
MultiCluster job forwarding model only. Pending reason update package size, in KB. Defines the maximum amount of pending reason data this cluster will send to submission clusters in one cycle.
Specify the keyword 0 (zero) to disable the limit and allow any amount of data in one package.
512
MC_PENDING_REASON_UPDATE_INTERVAL
MC_PENDING_REASON_UPDATE_INTERVAL = seconds | 0
MultiCluster job forwarding model only. Pending reason update interval, in seconds. Defines how often this cluster will update submission clusters about the status of pending MultiCluster jobs.
Specify the keyword 0 (zero) to disable pending reason updating between clusters.
300
MC_RUSAGE_UPDATE_INTERVAL
MC_RUSAGE_UPDATE_INTERVAL = seconds
MultiCluster only. Enables resource use updating for MultiCluster jobs running on hosts in the cluster and specifies how often to send updated information to the submission or consumer cluster.
300
NO_PREEMPT_RUN_TIME
NO_PREEMPT_RUN_TIME = run_time
If set, jobs that have been running for the specified number of minutes or longer will not be preempted. Run time is wall-clock time, not normalized run time.
You must define a run limit for the job, either at job level with the bsub -W option or in the queue by configuring RUNLIMIT in lsb.queues.
NO_PREEMPT_FINISH_TIME
NO_PREEMPT_FINISH_TIME = finish_time
If set, jobs that will finish within the specified number of minutes will not be preempted. Run time is wall-clock time, not normalized run time.
You must define a run limit for the job, either at job level with the bsub -W option or in the queue by configuring RUNLIMIT in lsb.queues.
NQS_QUEUES_FLAGS
NQS_QUEUES_FLAGS = integer
For Cray NQS compatibility only. Used by LSF to get the NQS queue information.
If the NQS version on a Cray is NQS 1.1, 80.42 or NQS 71.3, this parameter does not need to be defined.
For other versions of NQS on Cray, define both NQS_QUEUES_FLAGS and NQS_REQUESTS_FLAGS.
To determine the value of this parameter, run the NQS qstat command. The value of Npk_int[1] in the output is the value you need for this parameter. Refer to the NQS chapter in Administering Platform LSF for more details.
Undefined
NQS_REQUESTS_FLAGS
NQS_REQUESTS_FLAGS = integer
For Cray NQS compatibility only.
If the NQS version on a Cray is NQS 80.42 or NQS 71.3, this parameter does not need to be defined.
If the version is NQS 1.1 on a Cray, set this parameter to 251918848. This is the qstat flag that LSF uses to retrieve requests on Cray in long format.
For other versions of NQS on a Cray, run the NQS qstat command. The value of Npk_int[1] in the output is the value you need for this parameter. Refer to the NQS chapter in Administering Platform LSF for more details.
Undefined
PEND_REASON_UPDATE_INTERVAL
PEND_REASON_UPDATE_INTERVAL = seconds
Time interval that defines how often pending reasons are calculated by the scheduling daemon mbschd.
30 seconds
PEND_REASON_MAX_JOBS
PEND_REASON_MAX_JOBS = integer
Number of jobs for each user per queue for which pending reasons are calculated by the scheduling daemon mbschd. Pending reasons are calculated at a time period set by PEND_REASON_UPDATE_INTERVAL.
20 jobs
PG_SUSP_IT
PG_SUSP_IT = seconds
The time interval that a host should be interactively idle (it > 0) before jobs suspended because of a threshold on the pg load index can be resumed.
This parameter is used to prevent the case in which a batch job is suspended and resumed too often as it raises the paging rate while running and lowers it while suspended. If you are not concerned with the interference with interactive jobs caused by paging, the value of this parameter may be set to 0.
180 (seconds)
PREEMPTABLE_RESOURCES
PREEMPTABLE_RESOURCES = resource_name ...
LicenseMaximizer only. Enables license preemption when preemptive scheduling is enabled (has no effect if PREEMPTIVE is not also specified) and specifies the licenses that will be preemption resources. Specify shared numeric resources, static or decreasing, that LSF is configured to release (RELEASE=Y in lsf.shared, which is the default).
You must also configure LSF's preemption action to make the preempted application release its licenses. To kill preempted jobs instead of suspending them, set TERMINATE_WHEN=PREEMPT in lsb.queues, or set JOB_CONTROLS in lsb.queues and specify brequeue as the SUSPEND action.
Undefined (if preemptive scheduling is configured, LSF preempts on job slots only)
PREEMPT_FOR
PREEMPT_FOR = [HOST_JLU | USER_JLP | GROUP_MAX | GROUP_JLP | MINI_JOB | LEAST_RUN_TIME] ...
If preemptive scheduling is enabled, this parameter can change the behavior of job slot limits and can also enable the optimized preemption mechanism for parallel jobs.
Specify a space-separated list of the following keywords:
- GROUP_MAX--LSF does not count suspended jobs against the total job slot limit for user groups, specified at the user level (MAX_JOBS in lsb.users); if preemptive scheduling is enabled, suspended jobs never count against the limit for individual users
- HOST_JLU--LSF does not count suspended jobs against the total number of jobs for users and user groups, specified at the host level (JL/U in lsb.hosts)
- USER_JLP--LSF does not count suspended jobs against the user-processor job slot limit for individual users, specified at the user level (JL/P in lsb.users)
- GROUP_JLP--LSF does not count suspended jobs against the per-processor job slot limit for user groups, specified at the user level (JL/P in lsb.users)
- MINI_JOB--LSF uses the optimized preemption mechanism for preemption between parallel jobs
- LEAST_RUN_TIME--LSF preempts the job with the least run time. Run time is wall-clock time, not normalized run time.
Job slot limits specified at the queue level always count suspended jobs.
Undefined. If preemptive scheduling is configured, the default preemption mechanism is used to preempt parallel jobs, and suspended jobs are ignored for the following limits only:
- Total job slot limit for hosts, specified at the host level (MXJ in lsb.hosts)
- Total job slot limit for individual users, specified at the user level (MAX_JOBS in lsb.users); by default, suspended jobs still count against the limit for user groups
PREEMPTION_WAIT_TIME
PREEMPTION_WAIT_TIME = seconds
LicenseMaximizer only. You must also specify PREEMPTABLE_RESOURCES in lsb.params.
The amount of time LSF waits, after preempting jobs, for preemption resources to become available. Specify at least 300 seconds.
If LSF does not get the resources after this time, LSF might preempt more jobs.
300 (5 minutes)
RESOURCE_RESERVE_PER_SLOT
RESOURCE_RESERVE_PER_SLOT = y | Y
If Y, mbatchd reserves resources based on job slots instead of per-host.
By default, mbatchd only reserves static resources for parallel jobs on a per-host basis. For example, by default, the command:
% bsub -n 4 -R "rusage[mem=500]" -q reservation my_job
requires the job to reserve 500 MB on each host where the job runs.
Some parallel jobs need to reserve resources based on job slots, rather than by host. In this example, if per-slot reservation is enabled by RESOURCE_RESERVE_PER_SLOT, the job my_job must reserve 500 MB of memory for each job slot (4 * 500 = 2 GB) on the host in order to run.
If RESOURCE_RESERVE_PER_SLOT is set, the following command reserves the resource static_resource on all 4 job slots instead of only 1 on the host where the job runs:
bsub -n 4 -R "static_resource > 0 rusage[static_resource=1]" myjob
Undefined (reserve resources per-host)
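The difference between per-host and per-slot reservation in the memory example can be checked with a short calculation. The function name `reserved_on_host` is hypothetical, and the sketch covers only the arithmetic for a single host.

```python
def reserved_on_host(rusage_amount, slots_on_host, per_slot=False):
    """Amount of a resource reserved on one host for a parallel job.

    per_slot=True models RESOURCE_RESERVE_PER_SLOT=Y; otherwise the
    rusage amount is reserved once per host.
    """
    return rusage_amount * slots_on_host if per_slot else rusage_amount
```

For the 4-slot job with rusage[mem=500], the default reserves 500 MB on the host, while per-slot reservation reserves 4 * 500 = 2000 MB.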
RUN_JOB_FACTOR
RUN_JOB_FACTOR = number
Used only with fairshare scheduling. Job slots weighting factor.
In the calculation of a user's dynamic share priority, this factor determines the relative importance of the number of job slots reserved and in use by a user.
3.0
RUN_TIME_FACTOR
RUN_TIME_FACTOR = number
Used only with fairshare scheduling. Run time weighting factor.
In the calculation of a user's dynamic share priority, this factor determines the relative importance of the total run time of a user's running jobs.
0.7
SBD_SLEEP_TIME
SBD_SLEEP_TIME = seconds
The interval at which LSF checks the load conditions of each host, to decide whether jobs on the host must be suspended or resumed.
The job-level resource usage information is updated at a maximum frequency of every SBD_SLEEP_TIME seconds.
The update is done only if the value for the CPU time, resident memory usage, or virtual memory usage has changed by more than 10 percent from the previous update or if a new process or process group has been created.
30
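The update condition for job-level resource usage can be sketched as a predicate. This is an illustration of the stated 10 percent rule, not LSF code: the function name and dictionary keys are hypothetical, and treating any change from a previous value of zero as significant is an assumption.

```python
def should_update_usage(previous, current, new_process=False, threshold=0.10):
    """Return True if job resource usage should be reported again.

    previous and current map 'cpu_time', 'resident_memory' and
    'virtual_memory' to numeric values (hypothetical key names).
    """
    if new_process:
        # A new process or process group always triggers an update.
        return True
    for key in ("cpu_time", "resident_memory", "virtual_memory"):
        old, new = previous[key], current[key]
        if old == 0:
            # Assumption: any change from zero counts as significant.
            if new != 0:
                return True
            continue
        if abs(new - old) / old > threshold:
            return True
    return False
```

Such a check would run at most once every SBD_SLEEP_TIME seconds, so small fluctuations in usage do not generate updates.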
SYSTEM_MAPPING_ACCOUNT
SYSTEM_MAPPING_ACCOUNT = user_account
LSF Windows Workgroup installations only. User account to which all Windows workgroup user accounts are mapped.
Undefined
USER_ADVANCE_RESERVATION
USER_ADVANCE_RESERVATION in lsb.params is obsolete. Use the ResourceReservation section configuration in lsb.resources to configure advance reservation policies for your cluster.
SEE ALSO
lsf.conf(5), lsb.params(5), lsb.hosts(5), lsb.users(5), bsub(1)
Date Modified: February 24, 2004
Platform Computing: www.platform.com
Platform Support: support@platform.com
Platform Information Development: doc@platform.com
Copyright © 1994-2004 Platform Computing Corporation. All rights reserved.