Learn more about Platform products at http://www.platform.com

[ Platform Documentation ] [ Title ] [ Contents ] [ Previous ] [ Next ] [ Index ]



lsb.hosts


The lsb.hosts file contains host-related configuration information for the server hosts in the cluster. It is also used to define host groups and host partitions.

This file is optional. All sections are optional.

By default, this file is installed in LSB_CONFDIR/cluster_name/configdir.

Contents

[ Top ]


Host Section

Description

Optional. Defines the hosts, host types, and host models used as server hosts, and contains per-host configuration information. If this section is not configured, LSF uses all hosts in the cluster (the hosts listed in lsf.cluster.cluster_name) as server hosts.

Each host, host model or host type can be configured to:

The entries in a line for a host override the entries in a line for its model or type.

When you modify the cluster by adding or removing hosts, no changes are made to lsb.hosts. This does not affect the default configuration, but if hosts, host models, or host types are specified in this file, you should check this file whenever you make changes to the cluster and update it manually if necessary.

Host Section Structure

The first line consists of keywords identifying the load indices that you wish to configure on a per-host basis. The keyword HOST_NAME must be used; the others are optional. Load indices not listed on the keyword line do not affect scheduling decisions.

Each subsequent line describes the configuration information for one host, host model or host type. Each line must contain one entry for each keyword. Use empty parentheses ( ) or a dash (-) to specify the default value for an entry.

HOST_NAME

Required. Specify the name, model, or type of a host, or the keyword default.

host name

The name of a host defined in lsf.cluster.cluster_name. The official host name returned by gethostbyname(3).

host model

A host model defined in lsf.shared.

host type

A host type defined in lsf.shared.

default

The reserved host name default indicates all hosts in the cluster not otherwise referenced in the section (by name or by listing its model or type).

CHKPNT

Description

If C, checkpoint copy is enabled. With checkpoint copy, all opened files are automatically copied to the checkpoint directory by the operating system when a process is checkpointed.

Example

HOST_NAME  CHKPNT 
hostA         C

Compatibility

Checkpoint copy is only supported on Cray systems.

Default

No checkpoint copy.

DISPATCH_WINDOW

Description

The time windows in which jobs from this host, host model, or host type are dispatched. Once dispatched, jobs are no longer affected by the dispatch window.

Default

Undefined (always open).

EXIT_RATE

Description

Specifies a threshold in minutes for exited jobs. If the job exit rate is exceeded for 10 minutes or the period specified by JOB_EXIT_RATE_DURATION, LSF invokes LSF_SERVERDIR/eadmin to trigger a host exception.

Example

The following Host section defines a job exit rate of 20 jobs per minute for all hosts:

Begin Host
HOST_NAME    MXJ      EXIT_RATE
Default      !        20
End Host

Default

Undefined

JL/U

Description

Per-user job slot limit for the host. Maximum number of job slots that each user can use on this host.

Example

HOST_NAME  JL/U 
hostA         2

Default

Unlimited

MIG

Description

Enables job migration and specifies the migration threshold, in minutes.

If a checkpointable or rerunnable job dispatched to the host is suspended (SSUSP state) for longer than the specified number of minutes, the job is migrated. A value of 0 specifies that a suspended job should be migrated immediately.

If a migration threshold is defined at both host and queue levels, the lower threshold is used.

If you do not want migrating jobs to be run or restarted immediately, set LSB_MBD_MIG2PEND in lsf.conf so that migrating jobs are considered as pending jobs and inserted in the pending jobs queue.

If you want migrating jobs to be considered as pending jobs but you want them to be placed at the bottom of the queue without considering submission time, define both LSB_MBD_MIG2PEND and LSB_REQUEUE_TO_BOTTOM in lsf.conf.

Example

HOST_NAME   MIG 
hostA        10

In this example, the migration threshold is 10 minutes.

Default

Undefined (no migration)

MXJ

Description

The number of job slots on the host.

With MultiCluster resource leasing model, this is the number of job slots on the host that are available to the local cluster.

Use "!" to make the number of job slots equal to the number of CPUs on a host.

Use "!" for the reserved host name default to make the number of job slots equal to the number of CPUs on all hosts in a cluster not defined in the host section of the lsb.hosts file.

By default, the number of running and suspended jobs on a host cannot exceed the number of job slots. If preemptive scheduling is used, the suspended jobs are not counted as using a job slot.

On multiprocessor hosts, to fully use the CPU resource, make the number of job slots equal to or greater than the number of processors.

Default

Unlimited

load_index

Syntax

load_index
loadSched[/loadStop]

Specify io, it, ls, mem, pg, r15s, r1m, r15m, swp, tmp, ut, or a non-shared custom external load index as a column. Specify multiple columns to configure thresholds for multiple load indices.

Description

Scheduling and suspending thresholds for dynamic load indices supported by LIM, including external load indices.

Each load index column must contain either the default entry or two numbers separated by a slash `/', with no white space. The first number is the scheduling threshold for the load index; the second number is the suspending threshold.

Queue-level scheduling and suspending thresholds are defined in lsb.queues. If both files specify thresholds for an index, those that apply are the most restrictive ones.

Example

HOST_NAME    mem     swp
hostA        100/10  200/30

This example translates into a loadSched condition of

mem>=100 && swp>=200 

and a loadStop condition of

mem < 10 || swp < 30

Default

Undefined

Example of a Host Section

Begin Host
HOST_NAME MXJ JL/U r1m     pg    DISPATCH_WINDOW
hostA     1   -    0.6/1.6 10/20 (5:19:00-1:8:30 20:00-8:30)
SUNSOL    1   -    0.5/2.5 -     23:00-8:00
default   2   1    0.6/1.6 20/40 ()
End Host

SUNSOL is a host type defined in lsf.shared. This example Host section configures one host and one host type explicitly and configures default values for all other load-sharing hosts.

HostA runs one batch job at a time. A job will only be started on hostA if the r1m index is below 0.6 and the pg index is below 10; the running job is stopped if the r1m index goes above 1.6 or the pg index goes above 20. HostA only accepts batch jobs from 19:00 on Friday evening until 8:30 Monday morning and overnight from 20:00 to 8:30 on all other days.

For hosts of type SUNSOL, the pg index does not have host-specific thresholds and such hosts are only available overnight from 23:00 to 8:00.

The entry with host name default applies to each of the other hosts in the cluster. Each host can run up to two jobs at the same time, with at most one job from each user. These hosts are available to run jobs at all times. Jobs may be started if the r1m index is below 0.6 and the pg index is below 20, and a job from the lowest priority queue is suspended if r1m goes above 1.6 or pg goes above 40.

[ Top ]


HostGroup Section

Description

Optional. Defines host groups.

The name of the host group can then be used in other host group, host partition, and queue definitions, as well as on the command line. Specifying the name of a host group has exactly the same effect as listing the names of all the hosts in the group.

Structure

Host groups are specified in the same format as user groups in lsb.users.

The first line consists of two mandatory keywords, GROUP_NAME and GROUP_MEMBER. Subsequent lines name a group and list its membership.

The sum of host groups and host partitions cannot be more than MAX_GROUPS (see lsbatch.h for details).

GROUP_NAME

Description

An alphanumeric string representing the name of the host group.

You cannot use the reserved name all, and group names must not conflict with host names.

GROUP_MEMBER

Description

A space-separated list of host names or previously defined host group names, enclosed in parentheses.

The names of hosts and host groups can appear on multiple lines because hosts can belong to multiple groups. The reserved name all specifies all hosts in the cluster. Use an exclamation mark (!) to specify that the group membership should be retrieved via egroup. Use a tilde (~) to exclude specified hosts or host groups from the list.

Examples of HostGroup Sections

Example 1

Begin HostGroup
GROUP_NAME  GROUP_MEMBER
groupA      (hostA hostD)
groupB      (hostF groupA hostK)
groupC      (!)
End HostGroup

This example defines three host groups:

Example 2

Begin HostGroup
GROUP_NAME   GROUP_MEMBER
groupA       (all)
groupB       (groupA ~hostA ~hostB)
groupC       (hostX hostY hostZ)
groupD       (groupC ~hostX)
groupE       (all ~groupC ~hostB)
groupF       (hostF groupC hostK)
End HostGroup

This example defines the following host groups:

[ Top ]


HostPartition Section

Description

Optional; used with host partition fairshare scheduling. Defines a host partition, which defines a fairshare policy at the host level.

Configure multiple sections to define multiple partitions.

The members of a host partition form a host group with the same name as the host partition.

Limitations on Queue Configuration

Shared Resources and Host Partitions

Structure

Each host partition always consists of 3 lines, defining the name of the partition, the hosts included in the partition, and the user share assignments.

HPART_NAME

Syntax

HPART_NAME = partition_name

Description

Specifies the name of the partition.

HOSTS

Syntax

HOSTS = [[~]host_name | [~]host_group | all]...

Description

Specifies the hosts in the partition, in a space-separated list.

A host cannot belong to multiple partitions.

Hosts that are not included in any host partition are controlled by the FCFS scheduling policy instead of the fairshare scheduling policy.

Optionally, use the reserved host name all to configure a single partition that applies to all hosts in a cluster.

Optionally, use the not operator (~) to exclude hosts or host groups from the list of hosts in the host partition.

Example

HOSTS = all ~hostK ~hostM

The partition includes all the hosts in the cluster, except for hosts K and M.

USER_SHARES

Syntax

USER_SHARES = [user, number_shares]...

Description

Specifies user share assignments

Example of a HostPartition Section

Begin HostPartition
HPART_NAME = Partition1
HOSTS = hostA hostB
USER_SHARES = [groupA@, 3] [groupB, 7] [default, 1]
End HostPartition

[ Top ]


[ Platform Documentation ] [ Title ] [ Contents ] [ Previous ] [ Next ] [ Index ]


      Date Modified: February 24, 2004
Platform Computing: www.platform.com

Platform Support: support@platform.com
Platform Information Development: doc@platform.com

Copyright © 1994-2004 Platform Computing Corporation. All rights reserved.