[ Platform Documentation ] [ Title ] [ Contents ] [ Previous ] [ Next ] [ Index ]
The lsb.hosts file contains host-related configuration information for the server hosts in the cluster. It is also used to define host groups and host partitions. This file is optional, and all of its sections are optional.
By default, this file is installed in LSB_CONFDIR/cluster_name/configdir.
Host Section
Description
Optional. Defines the hosts, host types, and host models used as server hosts, and contains per-host configuration information. If this section is not configured, LSF uses all hosts in the cluster (the hosts listed in lsf.cluster.cluster_name) as server hosts.
Each host, host model, or host type can be configured to:
- Limit the maximum number of jobs run in total
- Limit the maximum number of jobs run by each user
- Run jobs only under specific load conditions
- Run jobs only during specific time windows
The entries in a line for a host override the entries in a line for its model or type.
When you modify the cluster by adding or removing hosts, no changes are made to lsb.hosts. This does not affect the default configuration, but if hosts, host models, or host types are specified in this file, you should check this file whenever you make changes to the cluster and update it manually if necessary.
Host Section Structure
The first line consists of keywords identifying the load indices that you wish to configure on a per-host basis. The keyword HOST_NAME must be used; the others are optional. Load indices not listed on the keyword line do not affect scheduling decisions.
Each subsequent line describes the configuration information for one host, host model or host type. Each line must contain one entry for each keyword. Use empty parentheses ( ) or a dash (-) to specify the default value for an entry.
HOST_NAME
Required. Specify one of the following, or the keyword default:
- The name of a host defined in lsf.cluster.cluster_name (the official host name returned by gethostbyname(3))
- A host model defined in lsf.shared
- A host type defined in lsf.shared
The reserved host name default indicates all hosts in the cluster not otherwise referenced in the section (by name, or by listing their model or type).
CHKPNT
If C, checkpoint copy is enabled. With checkpoint copy, all opened files are automatically copied to the checkpoint directory by the operating system when a process is checkpointed. Checkpoint copy is only supported on Cray systems.
Example:
HOST_NAME   CHKPNT
hostA       C
Default: no checkpoint copy.
DISPATCH_WINDOW
The time windows in which jobs from this host, host model, or host type are dispatched. Once dispatched, jobs are no longer affected by the dispatch window.
Default: undefined (always open).
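For example, a Host section restricting dispatch to a nightly window could look like the following (the host name hostB is hypothetical):
Begin Host
HOST_NAME   DISPATCH_WINDOW
hostB       (20:00-8:00)
End Host
Jobs are dispatched to hostB only between 20:00 and 8:00; jobs already running when the window closes are not affected.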
EXIT_RATE
Specifies a threshold for exited jobs, in jobs per minute. If the job exit rate is exceeded for 10 minutes, or for the period specified by JOB_EXIT_RATE_DURATION, LSF invokes LSF_SERVERDIR/eadmin to trigger a host exception.
The following Host section defines a job exit rate of 20 jobs per minute for all hosts:
Begin Host
HOST_NAME   MXJ   EXIT_RATE
Default     !     20
End Host
Default: undefined.
JL/U
Per-user job slot limit for the host: the maximum number of job slots that each user can use on this host.
Example:
HOST_NAME   JL/U
hostA       2
Default: unlimited.
MIG
Enables job migration and specifies the migration threshold, in minutes.
If a checkpointable or rerunnable job dispatched to the host is suspended (SSUSP state) for longer than the specified number of minutes, the job is migrated. A value of 0 specifies that a suspended job should be migrated immediately.
If a migration threshold is defined at both host and queue levels, the lower threshold is used.
If you do not want migrating jobs to be run or restarted immediately, set LSB_MBD_MIG2PEND in lsf.conf so that migrating jobs are treated as pending jobs and inserted in the pending jobs queue. If you want migrating jobs to be treated as pending jobs but placed at the bottom of the queue without considering submission time, define both LSB_MBD_MIG2PEND and LSB_REQUEUE_TO_BOTTOM in lsf.conf.
Example:
HOST_NAME   MIG
hostA       10
In this example, the migration threshold is 10 minutes.
Default: undefined (no migration).
MXJ
The number of job slots on the host.
With MultiCluster resource leasing model, this is the number of job slots on the host that are available to the local cluster.
Use "!" to make the number of job slots equal to the number of CPUs on a host.
Use "!" with the reserved host name default to make the number of job slots equal to the number of CPUs on all hosts in the cluster that are not otherwise defined in the Host section of the lsb.hosts file.
By default, the number of running and suspended jobs on a host cannot exceed the number of job slots. If preemptive scheduling is used, suspended jobs are not counted as using a job slot.
On multiprocessor hosts, to fully use the CPU resource, make the number of job slots equal to or greater than the number of processors.
Default: unlimited.
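For example, the following Host section (the host name hostA is hypothetical) sets the job slot count of every host not listed in the section equal to its CPU count, while limiting hostA to 8 slots:
Begin Host
HOST_NAME   MXJ
default     !
hostA       8
End Host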
load_index
load_index loadSched[/loadStop]
Specify io, it, ls, mem, pg, r15s, r1m, r15m, swp, tmp, ut, or a non-shared custom external load index as a column heading. Specify multiple columns to configure thresholds for multiple load indices.
These are scheduling and suspending thresholds for the dynamic load indices supported by LIM, including external load indices.
Each load index column must contain either the default entry or two numbers separated by a slash (/), with no white space. The first number is the scheduling threshold for the load index; the second number is the suspending threshold.
Queue-level scheduling and suspending thresholds are defined in lsb.queues. If both files specify thresholds for an index, the most restrictive thresholds apply.
Example:
HOST_NAME   mem       swp
hostA       100/10    200/30
This example translates into a loadSched condition of
mem >= 100 && swp >= 200
and a loadStop condition of
mem < 10 || swp < 30
Default: undefined.
Example of a Host Section
Begin Host
HOST_NAME   MXJ   JL/U   r1m       pg      DISPATCH_WINDOW
hostA       1     -      0.6/1.6   10/20   (5:19:00-1:8:30 20:00-8:30)
SUNSOL      1     -      0.5/2.5   -       23:00-8:00
default     2     1      0.6/1.6   20/40   ()
End Host
SUNSOL is a host type defined in lsf.shared. This example Host section configures one host and one host type explicitly, and configures default values for all other load-sharing hosts.
hostA runs one batch job at a time. A job is only started on hostA if the r1m index is below 0.6 and the pg index is below 10; the running job is stopped if the r1m index goes above 1.6 or the pg index goes above 20. hostA only accepts batch jobs from 19:00 Friday evening until 8:30 Monday morning, and overnight from 20:00 to 8:30 on all other days.
For hosts of type SUNSOL, the pg index does not have host-specific thresholds, and such hosts are only available overnight from 23:00 to 8:00.
The entry with host name default applies to each of the other hosts in the cluster. Each host can run up to two jobs at the same time, with at most one job from each user. These hosts are available to run jobs at all times. Jobs may be started if the r1m index is below 0.6 and the pg index is below 20, and a job from the lowest-priority queue is suspended if r1m goes above 1.6 or pg goes above 40.
HostGroup Section
Description
Optional. Defines host groups.
The name of the host group can then be used in other host group, host partition, and queue definitions, as well as on the command line. Specifying the name of a host group has exactly the same effect as listing the names of all the hosts in the group.
Structure
Host groups are specified in the same format as user groups in lsb.users.
The first line consists of two mandatory keywords, GROUP_NAME and GROUP_MEMBER. Subsequent lines name a group and list its membership.
The sum of host groups and host partitions cannot be more than MAX_GROUPS (see lsbatch.h for details).
GROUP_NAME
An alphanumeric string representing the name of the host group.
You cannot use the reserved name all, and group names must not conflict with host names.
GROUP_MEMBER
A space-separated list of host names or previously defined host group names, enclosed in parentheses.
The names of hosts and host groups can appear on multiple lines because hosts can belong to multiple groups. The reserved name all specifies all hosts in the cluster. Use an exclamation mark (!) to specify that the group membership should be retrieved via egroup. Use a tilde (~) to exclude specified hosts or host groups from the list.
Examples of HostGroup Sections
Begin HostGroup
GROUP_NAME   GROUP_MEMBER
groupA       (hostA hostD)
groupB       (hostF groupA hostK)
groupC       (!)
End HostGroup
This example defines three host groups:
- groupA includes hostA and hostD.
- groupB includes hostF and hostK, along with all hosts in groupA.
- The group membership of groupC is retrieved via egroup.
Begin HostGroup
GROUP_NAME   GROUP_MEMBER
groupA       (all)
groupB       (groupA ~hostA ~hostB)
groupC       (hostX hostY hostZ)
groupD       (groupC ~hostX)
groupE       (all ~groupC ~hostB)
groupF       (hostF groupC hostK)
End HostGroup
This example defines the following host groups:
- groupA contains all hosts in the cluster.
- groupB contains all the hosts in the cluster except for hostA and hostB.
- groupC contains only hostX, hostY, and hostZ.
- groupD contains the hosts in groupC except for hostX. Note that hostX must be a member of host group groupC to be excluded from groupD.
- groupE contains all hosts in the cluster excluding the hosts in groupC and hostB.
- groupF contains hostF, hostK, and the three hosts in groupC.
HostPartition Section
Description
Optional; used with host partition fairshare scheduling. Defines a host partition, which defines a fairshare policy at the host level.
Configure multiple sections to define multiple partitions.
The members of a host partition form a host group with the same name as the host partition.
Limitations on Queue Configuration
- If you configure a host partition, you cannot configure fairshare at the queue level.
- If a queue uses a host that belongs to a host partition, it should not use any hosts that don't belong to that partition. All the hosts in the queue should belong to the same partition. Otherwise, you might notice unpredictable scheduling behavior:
- Jobs in the queue may sometimes be dispatched to the host partition even though hosts not belonging to any host partition have a lighter load.
- If some hosts belong to one host partition and some hosts belong to another, only the priorities of one host partition are used when dispatching a parallel job to hosts from more than one host partition.
Shared Resources and Host Partitions
- If a resource is shared among hosts included in host partitions and hosts that are not included in any host partition, jobs in queues that use the host partitions will always get the shared resource first, regardless of queue priority.
- If a resource is shared among host partitions, jobs in queues that use the host partitions listed first in the HostPartition section of lsb.hosts always have priority to get the shared resource first. To allocate shared resources among host partitions, LSF considers host partitions in the order they are listed in lsb.hosts.
Structure
Each host partition always consists of three lines, defining the name of the partition, the hosts included in the partition, and the user share assignments.
HPART_NAME
HPART_NAME = partition_name
Specifies the name of the host partition.
HOSTS
HOSTS = [[~]host_name | [~]host_group | all] ...
Specifies the hosts in the partition, in a space-separated list.
A host cannot belong to multiple partitions.
Hosts that are not included in any host partition are controlled by the FCFS scheduling policy instead of the fairshare scheduling policy.
Optionally, use the reserved host name all to configure a single partition that applies to all hosts in a cluster.
Optionally, use the not operator (~) to exclude hosts or host groups from the list of hosts in the host partition. For example:
HOSTS = all ~hostK ~hostM
This partition includes all the hosts in the cluster except for hostK and hostM.
USER_SHARES
USER_SHARES = [user, number_shares] ...
Specifies the user share assignments.
- Specify at least one user share assignment.
- Enclose each user share assignment in square brackets, as shown.
- Separate a list of multiple share assignments with a space between the square brackets.
- user
Specify users who are also configured to use the host partition. You can assign the shares:
- To a single user (specify user_name)
- To users in a group, individually (specify group_name@) or collectively (specify group_name)
- To users not included in any other share assignment, individually (specify the keyword default) or collectively (specify the keyword others)
By default, when resources are assigned collectively to a group, the group members compete for the resources according to FCFS scheduling. You can use hierarchical fairshare to further divide the shares among the group members.
When resources are assigned to members of a group individually, the share assignment is recursive. Members of the group and of all subgroups always compete for the resources according to FCFS scheduling, regardless of hierarchical fairshare policies.
- number_shares
Specify a positive integer representing the number of shares of the cluster resources assigned to the user.
The number of shares assigned to each user is only meaningful when you compare it to the shares assigned to other users or to the total number of shares. The total number of shares is just the sum of all the shares assigned in each share assignment.
Example of a HostPartition Section
Begin HostPartition
HPART_NAME = Partition1
HOSTS = hostA hostB
USER_SHARES = [groupA@, 3] [groupB, 7] [default, 1]
End HostPartition
Date Modified: February 24, 2004
Platform Computing: www.platform.com
Platform Support: support@platform.com
Platform Information Development: doc@platform.com
Copyright © 1994-2004 Platform Computing Corporation. All rights reserved.