


bqueues


displays information about queues

SYNOPSIS

bqueues [-w | -l | -r] [-m host_name | -m host_group | -m cluster_name | -m all] [-u user_name | -u user_group | -u all] [queue_name ...]

bqueues [-h | -V]

DESCRIPTION

Displays information about queues.

By default, returns the following information about all queues: queue name, queue priority, queue status, job slot statistics, and job state statistics.

In MultiCluster, returns the information about all queues in the local cluster.

Batch queue names and characteristics are set up by the LSF administrator (see lsb.queues(5) and mbatchd(8)).

CPU time is normalized with respect to the CPU factors of hosts (see DEFAULT HOST SPECIFICATION in the OUTPUT section).

OPTIONS

-w

Displays queue information in a wide format. Fields are displayed without truncation.

-l

Displays queue information in a long multiline format. The -l option displays the following additional information: queue description, queue characteristics and statistics, scheduling parameters, resource usage limits, scheduling policies, users, hosts, associated commands, dispatch and run windows, and job controls.

Also displays user shares.

If you specified an administrator comment with the -C option of the queue control commands qclose, qopen, qact, and qinact, the -l option also displays the comment text.

-r

Displays the same information as the -l option. In addition, if fairshare is defined for the queue, displays recursively the share account tree of the fairshare queue.

-m host_name | -m host_group | -m cluster_name | -m all

Displays the queues that can run jobs on the specified host. If the keyword all is specified, displays the queues that can run jobs on all hosts.

If a host group is specified, displays the queues that include that group in their configuration. For a list of host groups, see bmgroup(1).

In MultiCluster, if the all keyword is specified, displays the queues that can run jobs on all hosts in the local cluster. If a cluster name is specified, displays all queues in the specified cluster.

-u user_name | -u user_group | -u all

Displays the queues that can accept jobs from the specified user. If the keyword all is specified, displays the queues that can accept jobs from all users.

If a user group is specified, displays the queues that include that group in their configuration. For a list of user groups, see bugroup(1).

queue_name ...

Displays information about the specified queues.

-h

Prints command usage to stderr and exits.

-V

Prints LSF release version to stderr and exits.
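For example, the following invocations combine the options above (the queue, host, and user names are placeholders; substitute names from your own cluster):

% bqueues
% bqueues -l normal
% bqueues -r normal
% bqueues -w -m hostA
% bqueues -u userA short normal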

OUTPUT

Default Output

Displays the following fields:

QUEUE_NAME

The name of the queue. Queues are named to correspond to the type of jobs usually submitted to them, or to the type of services they provide.

lost_and_found

If the LSF administrator removes queues from the system, LSF creates a queue called lost_and_found and places the jobs from the removed queues into the lost_and_found queue. Jobs in the lost_and_found queue will not be started unless they are switched to other queues (see bswitch).

PRIO

The priority of the queue. The larger the value, the higher the priority. If job priority is not configured, queue priority determines the queue search order at job dispatch, suspension, and resumption time. Jobs from higher priority queues are dispatched first (contrary to UNIX process priority ordering), and jobs from lower priority queues are suspended first when hosts are overloaded.

STATUS

The current status of the queue. The possible values are:

Open

The queue is able to accept jobs.

Closed

The queue is not able to accept jobs.

Active

Jobs in the queue may be started.

Inactive

Jobs in the queue cannot be started for the time being.

At any moment, each queue is either Open or Closed, and is either Active or Inactive. The queue can be opened, closed, inactivated and re-activated by the LSF administrator using badmin (see badmin(8)).

Jobs submitted to a queue that is later closed are still dispatched as long as the queue is active. The queue can also become inactive when either its dispatch window or its run window is closed (see DISPATCH_WINDOWS in the "Output for the -l Option" section). In this case, the queue cannot be activated using badmin. The queue is re-activated by LSF when one of its dispatch windows and one of its run windows are open again. The initial state of a queue at LSF boot time is set to open, and either active or inactive depending on its windows.

MAX

The maximum number of job slots that can be used by the jobs from the queue. These job slots are used by dispatched jobs which have not yet finished, and by pending jobs which have slots reserved for them.

A sequential job uses one job slot when it is dispatched to a host, while a parallel job uses as many job slots as requested by bsub -n when it is dispatched. See bsub(1) for details. If `-' is displayed, there is no limit.

JL/U

The maximum number of job slots each user can use for jobs in the queue. These job slots are used by that user's dispatched jobs which have not yet finished, and by pending jobs which have slots reserved for them. If `-' is displayed, there is no limit.

JL/P

The maximum number of job slots a processor can process from the queue. This includes job slots of dispatched jobs that have not yet finished, and job slots reserved for some pending jobs. The job slot limit per processor (JL/P) controls the number of jobs sent to each host. This limit is configured per processor so that multiprocessor hosts are automatically allowed to run more jobs. If `-' is displayed, there is no limit.

JL/H

The maximum number of job slots a host can allocate from this queue. This includes the job slots of dispatched jobs that have not yet finished, and those reserved for some pending jobs. The job slot limit per host (JL/H) controls the number of jobs sent to each host, regardless of whether a host is a uniprocessor host or a multiprocessor host. If `-' is displayed, there is no limit.

NJOBS

The total number of job slots held currently by jobs in the queue. This includes pending, running, suspended and reserved job slots. A parallel job that is running on n processors is counted as n job slots, since it takes n job slots in the queue. See bjobs(1) for an explanation of batch job states.

PEND

The number of job slots used by pending jobs in the queue.

RUN

The number of job slots used by running jobs in the queue.

SUSP

The number of job slots used by suspended jobs in the queue.
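As an illustration (the queue names and counts shown are hypothetical), the default output resembles:

QUEUE_NAME      PRIO STATUS       MAX JL/U JL/P JL/H NJOBS  PEND   RUN  SUSP
owners            43 Open:Active    -    -    -    -     4     0     4     0
priority          43 Open:Active    -    -    -    -    12     5     7     0
normal            30 Open:Active    -    -    -    -     0     0     0     0

Here the `-' entries indicate that no slot limits are configured, and STATUS combines the Open/Closed and Active/Inactive states described above.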

Output for the -l Option

In addition to the above fields, the -l option displays the following:

Description

A description of the typical use of the queue.

Default queue indication

Indicates that this is the default queue.

PARAMETERS/STATISTICS
NICE

The nice value at which jobs in the queue will be run. This is the UNIX nice value for reducing the process priority (see nice(1)).

STATUS
Inactive

The long format for the -l option gives the possible reasons for a queue to be inactive:

Inact_Win

The queue is out of its dispatch window or its run window.

Inact_Adm

The queue has been inactivated by the LSF administrator.

SSUSP

The number of job slots in the queue allocated to jobs that are suspended by LSF because of load levels or run windows.

USUSP

The number of job slots in the queue allocated to jobs that are suspended by the job submitter or by the LSF administrator.

RSV

The number of job slots in the queue that are reserved by LSF for pending jobs.

Migration threshold

The length of time in seconds that a job dispatched from the queue will remain suspended by the system before LSF attempts to migrate the job to another host. See the MIG parameter in lsb.queues and lsb.hosts.

Schedule delay for a new job

The delay time in seconds for scheduling after a new job is submitted. If the schedule delay time is zero, a new scheduling session is started as soon as the job is submitted to the queue. See the NEW_JOB_SCHED_DELAY parameter in lsb.queues.

Interval for a host to accept two jobs

The length of time in seconds to wait after dispatching a job to a host before dispatching a second job to the same host. If the job accept interval is zero, a host may accept more than one job in each dispatching interval. See the JOB_ACCEPT_INTERVAL parameter in lsb.queues and lsb.params.

RESOURCE LIMITS

The hard resource usage limits that are imposed on the jobs in the queue (see getrlimit(2) and lsb.queues(5)). These limits are imposed on a per-job and a per-process basis.

The possible per-job limits are:

CPULIMIT

The maximum CPU time a job can use, in minutes, relative to the CPU factor of the named host. CPULIMIT is scaled by the CPU factor of the execution host so that jobs are allowed more time on slower hosts.

When the job-level CPULIMIT is reached, a SIGXCPU signal is sent to all processes belonging to the job. If the job has no signal handler for SIGXCPU, the job is killed immediately. If the SIGXCPU signal is handled, blocked, or ignored by the application, then after the grace period expires, LSF sends SIGINT, SIGTERM, and SIGKILL to the job to kill it.
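For example (the CPU factors here are illustrative), if CPULIMIT is 60 minutes normalized to a host model with CPU factor 2.0, a job dispatched to an execution host with CPU factor 1.0 may use up to 60 * 2.0 / 1.0 = 120 minutes of CPU time; the slower host is allowed proportionally more time to do the same work.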

PROCLIMIT

The maximum number of processors allocated to a job. Jobs that request fewer slots than the minimum PROCLIMIT or more slots than the maximum PROCLIMIT are rejected. If the job requests minimum and maximum job slots, the maximum slots requested cannot be less than the minimum PROCLIMIT, and the minimum slots requested cannot be more than the maximum PROCLIMIT.
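For example, if the queue defines a minimum PROCLIMIT of 2 and a maximum PROCLIMIT of 8 (illustrative values), bsub -n 1 and bsub -n 16 are both rejected, while bsub -n 4 is accepted. A range request such as bsub -n 4,6 is also accepted, because the requested maximum (6) is not less than the minimum PROCLIMIT (2) and the requested minimum (4) is not more than the maximum PROCLIMIT (8).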

MEMLIMIT

The maximum resident set size (RSS) of a process, in KB. If a process uses more than MEMLIMIT kilobytes of memory, its priority is reduced so that other processes are more likely to be paged in to available memory. This limit is enforced by the setrlimit system call if it supports the RLIMIT_RSS option.

SWAPLIMIT

The swap space limit that a job may use. If SWAPLIMIT is reached, the system sends the following signals in sequence to all processes in the job: SIGINT, SIGTERM, and SIGKILL.

PROCESSLIMIT

The maximum number of concurrent processes allocated to a job. If PROCESSLIMIT is reached, the system sends the following signals in sequence to all processes belonging to the job: SIGINT, SIGTERM, and SIGKILL.

THREADLIMIT

The maximum number of concurrent threads allocated to a job. If THREADLIMIT is reached, the system sends the following signals in sequence to all processes belonging to the job: SIGINT, SIGTERM, and SIGKILL.

The possible UNIX per-process resource limits are:

RUNLIMIT

The maximum wall clock time a process can use, in minutes. RUNLIMIT is scaled by the CPU factor of the execution host. When a job has been in the RUN state for a total of RUNLIMIT minutes, LSF sends a SIGUSR2 signal to the job. If the job does not exit within 10 minutes, LSF sends a SIGKILL signal to kill the job.

FILELIMIT

The maximum file size a process can create, in kilobytes. This limit is enforced by the UNIX setrlimit system call if it supports the RLIMIT_FSIZE option, or the ulimit system call if it supports the UL_SETFSIZE option.

DATALIMIT

The maximum size of the data segment of a process, in kilobytes. This restricts the amount of memory a process can allocate. DATALIMIT is enforced by the setrlimit system call if it supports the RLIMIT_DATA option, and is not enforced otherwise.

STACKLIMIT

The maximum size of the stack segment of a process, in kilobytes. This restricts the amount of memory a process can use for local variables or recursive function calls. STACKLIMIT is enforced by the setrlimit system call if it supports the RLIMIT_STACK option.

CORELIMIT

The maximum size of a core file, in KB. This limit is enforced by the setrlimit system call if it supports the RLIMIT_CORE option.

If a job submitted to the queue specifies any of these limits (see bsub(1)), then the lower of the corresponding job limit and queue limit is used for the job.

If no resource limit is specified, the resource is assumed to be unlimited.

SCHEDULING PARAMETERS

The scheduling and suspending thresholds for the queue.

The scheduling threshold loadSched and the suspending threshold loadStop are used to control batch job dispatch, suspension, and resumption. The queue thresholds are used in combination with the thresholds defined for hosts (see bhosts(1) and lsb.hosts(5)). If both queue level and host level thresholds are configured, the most restrictive thresholds are applied.

The loadSched and loadStop thresholds have the following fields:

r15s

The 15-second exponentially averaged effective CPU run queue length.

r1m

The 1-minute exponentially averaged effective CPU run queue length.

r15m

The 15-minute exponentially averaged effective CPU run queue length.

ut

The CPU utilization exponentially averaged over the last minute, expressed as a value between 0 and 1.

pg

The memory paging rate exponentially averaged over the last minute, in pages per second.

io

The disk I/O rate exponentially averaged over the last minute, in kilobytes per second.

ls

The number of current login users.

it

On UNIX, the idle time of the host (keyboard not touched on all logged in sessions), in minutes.

On Windows, the it index is based on the time a screen saver has been active on a particular host.

tmp

The amount of free space in /tmp, in megabytes.

swp

The amount of currently available swap space, in megabytes.

mem

The amount of currently available memory, in megabytes.

In addition to these internal indices, external indices are also displayed if they are defined in lsb.queues (see lsb.queues(5)).

The loadSched threshold values specify the job dispatching thresholds for the corresponding load indices. If `-' is displayed as the value, it means the threshold is not applicable. Jobs in the queue may be dispatched to a host if the values of all the load indices of the host are within (below or above, depending on the meaning of the load index) the corresponding thresholds of the queue and the host. The same conditions are used to resume jobs dispatched from the queue that have been suspended on this host.

Similarly, the loadStop threshold values specify the thresholds for job suspension. If any of the load index values on a host go beyond the corresponding threshold of the queue, jobs in the queue will be suspended.
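An illustrative threshold display (the values shown are hypothetical):

          r15s   r1m  r15m   ut    pg    io   ls   it   tmp   swp   mem
loadSched   -    0.7    -     -     -     -    -    -     -     -     -
loadStop    -    1.5    -     -     -     -    -    -     -     -     -

With these values, jobs are dispatched to (or resumed on) a host only while its 1-minute run queue length is below 0.7, and suspended when it rises above 1.5; the other indices are not considered.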

JOB EXCEPTION PARAMETERS

Configured job exception thresholds and number of jobs in each exception state for the queue.

Threshold and NumOfJobs have the following fields:

overrun

Configured threshold in minutes for overrun jobs, and the number of jobs in the queue that have triggered an overrun job exception by running longer than the overrun threshold.

underrun

Configured threshold in minutes for underrun jobs, and the number of jobs in the queue that have triggered an underrun job exception by finishing sooner than the underrun threshold.

idle

Configured threshold (CPU time/runtime) for idle jobs, and the number of jobs in the queue that have triggered an idle job exception by having a job idle factor less than the threshold.

SCHEDULING POLICIES

Scheduling policies of the queue. Optionally, one or more of the following policies may be configured:

FAIRSHARE

Queue-level fairshare scheduling is enabled. Jobs in this queue are scheduled based on a fairshare policy instead of the first-come, first-served (FCFS) policy.

BACKFILL

A job in a backfill queue can use the slots reserved by other jobs if the job can run to completion before the slot-reserving jobs start.

Backfilling occurs only on host-based limits, not on queue-level or user-level limits. That is, backfilling is supported when the MXJ, JL/U, JL/P, PJOB_LIMIT, and HJOB_LIMIT limits are reached, but not when the MAX_JOBS, QJOB_LIMIT, and UJOB_LIMIT limits are reached.

IGNORE_DEADLINE

If IGNORE_DEADLINE is set to Y, starts all jobs regardless of the run limit.

EXCLUSIVE

Jobs dispatched from an exclusive queue can run exclusively on a host if the user so specifies at job submission time (see bsub(1)). Exclusive execution means that the job is sent to a host with no other batch job running there, and no further job, batch or interactive, will be dispatched to that host while the job is running. The default is not to allow exclusive jobs.

NO_INTERACTIVE

This queue does not accept batch interactive jobs (see the -I, -Is, and -Ip options of bsub(1)). The default is to accept both interactive and non-interactive jobs.

ONLY_INTERACTIVE

This queue only accepts batch interactive jobs. Jobs must be submitted using the -I, -Is, or -Ip option of bsub(1). The default is to accept both interactive and non-interactive jobs.

FAIRSHARE_QUEUES

Lists the queues participating in cross-queue fairshare. The first queue listed is the master queue, the queue in which fairshare is configured; all other queues listed inherit the fairshare policy from the master queue. Fairshare information applies to all the jobs running in all the queues in the master-slave set.

DISPATCH_ORDER

DISPATCH_ORDER=QUEUE is set in the master queue. Jobs from this queue are dispatched according to the order of queue priorities first, then user fairshare priority. Within the queue, dispatch order is based on user share quota. This avoids having users with higher fairshare priority getting jobs dispatched from low-priority queues.

USER_SHARES

A list of [user_name, share] pairs. user_name is either a user name or a user group name. share is the number of shares of resources assigned to the user or user group. A party will get a portion of the resources proportional to that party's share divided by the sum of the shares of all parties specified in this queue.
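For example, with the hypothetical share assignment [userA, 10] [groupB, 30], userA is entitled to 10 / (10 + 30) = 25% of the queue's resources and groupB to the remaining 75%.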

DEFAULT HOST SPECIFICATION

The default host or host model that will be used to normalize the CPU time limit of all jobs.

If you want to view a list of the CPU factors defined for the hosts in your cluster, see lsinfo(1). The CPU factors are configured in lsf.shared(5).

The appropriate CPU scaling factor of the host or host model is used to adjust the actual CPU time limit at the execution host (see CPULIMIT in lsb.queues(5)). The DEFAULT_HOST_SPEC parameter in lsb.queues overrides the system DEFAULT_HOST_SPEC parameter in lsb.params (see lsb.params(5)). If a user explicitly gives a host specification when submitting a job using bsub -c cpu_limit[/host_name | /host_model], the user specification overrides the values defined in both lsb.params and lsb.queues.

RUN_WINDOWS

The time windows in a week during which jobs in the queue may run.

When a queue is out of its window or windows, no job in this queue will be dispatched. In addition, when the end of a run window is reached, any running jobs from this queue are suspended until the beginning of the next run window, when they are resumed. The default is no restriction, or always open.

DISPATCH_WINDOWS

Dispatch windows are the time windows in a week during which jobs in the queue may be dispatched.

When a queue is out of its dispatch window or windows, no job in this queue will be dispatched. Jobs already dispatched are not affected by the dispatch windows. The default is no restriction, or always open (that is, twenty-four hours a day, seven days a week). Note that such windows are only applicable to batch jobs. Interactive jobs scheduled by LIM are controlled by another set of dispatch windows (see lshosts(1)). Similar dispatch windows may be configured for individual hosts (see bhosts(1)).

A window is displayed in the format begin_time-end_time. Time is specified in the format [day:]hour[:minute], where all fields are numbers in their respective legal ranges: 0(Sunday)-6 for day, 0-23 for hour, and 0-59 for minute. The default value for minute is 0 (on the hour). The default value for day is every day of the week. The begin_time and end_time of a window are separated by `-', with no blank characters (SPACE and TAB) in between. Both begin_time and end_time must be present for a window. Windows are separated by blank characters.
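For example, the window 8:30-18:00 is open from 8:30 a.m. to 6:00 p.m. every day, while 5:18:00-1:8:30 is open from Friday (day 5) at 6:00 p.m. until Monday (day 1) at 8:30 a.m.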

USERS

A list of users allowed to submit jobs to this queue. LSF cluster administrators can submit jobs to the queue even if they are not listed here.

User group names have a slash (/) added at the end of the group name. See bugroup(1).

If the fairshare scheduling policy is enabled, users cannot submit jobs to the queue unless they also have a share assignment. This also applies to LSF administrators.

HOSTS

A list of hosts where jobs in the queue can be dispatched.

Host group names have a slash (/) added at the end of the group name. See bmgroup(1).

NQS DESTINATION QUEUES

A list of NQS destination queues to which this queue can dispatch jobs.

When you submit a job using bsub -q queue_name, and the specified queue is configured to forward jobs to the NQS system, LSF routes your job to one of the NQS destination queues. The job runs on an NQS batch server host, which is not a member of the LSF cluster. Although running on an NQS system outside the LSF cluster, the job is still managed by LSF in almost the same way as jobs running inside the LSF cluster. Thus, you may have your batch jobs transparently sent to an NQS system to run and then get the results of your jobs back. You may use any supported user interface, including LSF commands and NQS commands (see lsnqs(1)) to submit, monitor, signal and delete your batch jobs that are running in an NQS system. See lsb.queues(5) and bsub(1) for more information.

ADMINISTRATORS

A list of queue administrators. The users whose names are specified here are allowed to operate on the jobs in the queue and on the queue itself. See lsb.queues(5) for more information.

PRE_EXEC

The queue's pre-execution command. The pre-execution command is executed before each job in the queue is run on the execution host (or on the first host selected for a parallel batch job). See lsb.queues(5) for more information.

POST_EXEC

The queue's post-execution command. The post-execution command is run on the execution host when a job terminates. See lsb.queues(5) for more information.

REQUEUE_EXIT_VALUES

Jobs that exit with these values are automatically requeued. See lsb.queues(5) for more information.

RES_REQ

Resource requirements of the queue. Only the hosts that satisfy these resource requirements can be used by the queue.

Maximum slot reservation time

The maximum time in seconds a slot is reserved for a pending job in the queue. See the SLOT_RESERVE=MAX_RESERVE_TIME[n] parameter in lsb.queues.

RESUME_COND

The conditions that must be satisfied to resume a suspended job on a host. See lsb.queues(5) for more information.

STOP_COND

The conditions which determine whether a job running on a host should be suspended. See lsb.queues(5) for more information.

JOB_STARTER

An executable file that runs immediately prior to the batch job, taking the batch job file as an input argument. All jobs submitted to the queue are run via the job starter, which is generally used to create a specific execution environment before processing the jobs themselves. See lsb.queues(5) for more information.

CHUNK_JOB_SIZE

Chunk jobs only. Specifies the maximum number of jobs allowed to be dispatched together in a chunk job. All of the jobs in the chunk are scheduled and dispatched as a unit rather than individually. The ideal candidates for job chunking are jobs that typically take 1 to 2 minutes to run.

SEND_JOBS_TO

MultiCluster. List of remote queue names to which the queue forwards jobs.

RECEIVE_JOBS_FROM

MultiCluster. List of remote cluster names from which the queue receives jobs.

PREEMPTION
PREEMPTIVE

The queue is preemptive. Jobs in a preemptive queue may preempt running jobs from lower-priority queues, even if the lower-priority queues are not specified as preemptive.

PREEMPTABLE

The queue is preemptable. Running jobs in a preemptable queue may be preempted by jobs in higher-priority queues, even if the higher-priority queues are not specified as preemptive.

RERUNNABLE

If the RERUNNABLE field displays yes, jobs in the queue are rerunnable. That is, jobs in the queue are automatically restarted or rerun if the execution host becomes unavailable. However, a job in the queue will not be restarted if you have removed the rerunnable option from the job. See lsb.queues(5) for more information.

CHECKPOINT

If the CHKPNTDIR field is displayed, jobs in the queue are checkpointable. Jobs will use the default checkpoint directory and period unless you specify other values. Note that a job in the queue will not be checkpointed if you have removed the checkpoint option from the job. See lsb.queues(5) for more information.

CHKPNTDIR

Specifies the checkpoint directory using an absolute or relative path name.

CHKPNTPERIOD

Specifies the checkpoint period in seconds.

Although the output of bqueues reports the checkpoint period in seconds, the checkpoint period is defined in minutes (the checkpoint period is defined through the bsub -k "checkpoint_dir [checkpoint_period]" option, or in lsb.queues).
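For example, a job submitted with bsub -k "ckptdir 10" (the directory name is a placeholder) has a checkpoint period of 10 minutes, which bqueues -l reports as a CHKPNTPERIOD of 600.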

JOB CONTROLS

The configured actions for job control. See the JOB_CONTROLS parameter in lsb.queues.

The configured actions are displayed in the format [action_type, command] where action_type is either SUSPEND, RESUME, or TERMINATE.
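For example, a display of [SUSPEND, /local/bin/mysuspend] (a hypothetical command path) indicates that the queue's suspend action runs that command rather than the default job control action.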

ADMIN ACTION COMMENT

If the LSF administrator specified an administrator comment with the -C option of the queue control commands qclose, qopen, qact, and qinact, the comment text is displayed.

SLOT_SHARE

Share of job slots for queue-based fairshare. Represents the percentage of running jobs (job slots) in use from the queue. SLOT_SHARE must be greater than zero.

The sum of SLOT_SHARE for all queues in the pool does not need to be 100%. It can be more or less, depending on your needs.
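For example (the values are illustrative), if two queues in the same pool have SLOT_SHARE values of 60 and 40, the first is entitled to 60% of the job slots in the pool and the second to 40%.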

SLOT_POOL

Name of the pool of job slots the queue belongs to for queue-based fairshare. A queue can only belong to one pool. All queues in the pool must share the same set of hosts.

Output for the -r Option

In addition to the fields displayed for the -l option, the -r option displays the following:

SCHEDULING POLICIES
FAIRSHARE

The -r option causes bqueues to recursively display the entire share information tree associated with the queue.

SEE ALSO

badmin(8), bhosts(1), bjobs(1), bmgroup(1), bsub(1), bswitch(1), bugroup(1), lshosts(1), lsinfo(1), lsnqs(1), nice(1), getrlimit(2), lsb.hosts(5), lsb.params(5), lsb.queues(5), lsf.shared(5), mbatchd(8)



Date Modified: February 24, 2004

Copyright © 1994-2004 Platform Computing Corporation. All rights reserved.