The lsf.shared file contains common definitions that are shared by all load sharing clusters defined by lsf.cluster.cluster_name files. This includes lists of cluster names, host types, host models, the special resources available, and external load indices.
This file is installed by default in the directory defined by LSF_CONFDIR.
Cluster Section
(Required) Lists the cluster names recognized by the LSF system.
Cluster section structure
The first line must contain the mandatory keyword ClusterName. The keyword Servers is optional in general, but must also appear on the first line in a MultiCluster environment.
Each subsequent line defines one cluster.
Example Cluster section
    Begin Cluster
    ClusterName   Servers
    cluster1      hostA
    cluster2      hostB
    End Cluster
ClusterName
Defines all cluster names recognized by the LSF system.
All cluster names referenced anywhere in the LSF system must be defined here. The file names of cluster-specific configuration files must end with the associated cluster name.
By default, if MultiCluster is installed, all clusters listed in this section participate in the same MultiCluster environment. However, individual clusters can restrict their MultiCluster participation by specifying a subset of clusters at the cluster level (lsf.cluster.cluster_name RemoteClusters section).
Servers
MultiCluster only. List of hosts in this cluster that LIMs in remote clusters can connect to and obtain information from.
For other clusters to work with this cluster, one of these hosts must be running mbatchd.
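The section layout described above (a Begin line, a keyword line, one definition per line, an End line) can be read with a few lines of Python. This is only a minimal sketch, not LSF's own parser: the function names parse_sections and parse_cluster_section are hypothetical, it assumes that '#' starts a comment, and it assumes each Servers entry is a single host name with no spaces.

```python
def parse_sections(text):
    """Return {section_name: [rows]}; each row is a list of whitespace-separated tokens."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        words = line.split()
        if words[0] == "Begin" and len(words) == 2:
            current = words[1]
            sections[current] = []
        elif words[0] == "End" and len(words) == 2:
            current = None
        elif current is not None:
            sections[current].append(words)
    return sections


def parse_cluster_section(text):
    """Return one dict per cluster, keyed by the keywords on the first line."""
    rows = parse_sections(text).get("Cluster", [])
    if not rows or rows[0][0] != "ClusterName":
        raise ValueError("Cluster section must start with the ClusterName keyword")
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


SAMPLE = """\
Begin Cluster
ClusterName  Servers
cluster1     hostA
cluster2     hostB
End Cluster
"""

clusters = parse_cluster_section(SAMPLE)
for cluster in clusters:
    print(cluster["ClusterName"], cluster.get("Servers"))
```

The same parse_sections helper would also return the rows of the HostType section, since that section follows the same Begin/End layout with single-token rows.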
HostType Section
(Required) Lists the valid host types in the cluster. All hosts that can run the same binary executable are in the same host type.
HostType section structure
The first line consists of the mandatory keyword TYPENAME.
Subsequent lines name valid host types.
Example HostType section
    Begin HostType
    TYPENAME
    SUN41
    SOLSPARC
    ALPHA
    HPPA
    NTX86
    End HostType
TYPENAME
Host type names are usually based on a combination of the hardware name and operating system. If your site already has a system for naming host types, you can use the same names for LSF.
HostModel Section
(Required) Lists models of machines and gives the relative CPU scaling factor for each model. All hosts of the same relative speed are assigned the same host model.
LSF uses the relative CPU scaling factor to normalize the CPU load indices so that jobs are more likely to be sent to faster hosts. The CPU factor affects the calculation of job execution time limits and accounting. Using large or inaccurate values for the CPU factor can cause confusing results when CPU time limits or accounting are used.
HostModel section structure
The first line consists of the mandatory keywords MODELNAME, CPUFACTOR, and ARCHITECTURE.
Subsequent lines define a model and its CPU factor.
Example HostModel section
    Begin HostModel
    MODELNAME  CPUFACTOR  ARCHITECTURE
    PC400      13.0       (i86pc_400 i686_400)
    PC450      13.2       (i86pc_450 i686_450)
    Sparc5F    3.0        (SUNWSPARCstation5_170_sparc)
    Sparc20    4.7        (SUNWSPARCstation20_151_sparc)
    Ultra5S    10.3       (SUNWUltra5_270_sparcv9 SUNWUltra510_270_sparcv9)
    End HostModel
ARCHITECTURE
(Reserved for system use only) Indicates automatically detected host models that correspond to the model names.
CPUFACTOR
Though it is not required, you would typically assign a CPU factor of 1.0 to the slowest machine model in your system and higher numbers for the others. For example, for a machine model that executes at twice the speed of your slowest model, a factor of 2.0 should be assigned.
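One way to arrive at such factors is to time the same benchmark on each machine model and divide the slowest time by each model's time. The sketch below illustrates that arithmetic only; the function name cpu_factors, the model names, and the timings are hypothetical, and LSF does not prescribe this procedure.

```python
def cpu_factors(benchmark_seconds):
    """Map model name -> CPU factor, with the slowest model fixed at 1.0."""
    slowest = max(benchmark_seconds.values())
    return {
        model: round(slowest / seconds, 1)
        for model, seconds in benchmark_seconds.items()
    }


# Hypothetical wall-clock times (seconds) for the same benchmark on three models.
timings = {"ModelA": 120.0, "ModelB": 60.0, "ModelC": 40.0}
factors = cpu_factors(timings)
for model, factor in factors.items():
    print(model, factor)
```

Here ModelB finishes in half the time of ModelA, the slowest model, so it receives a factor of 2.0.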
MODELNAME
Generally, you first identify the distinct host types in your system, such as MIPS and SPARC, and then the machine models within each type, such as SparcIPC, Sparc1, Sparc2, and Sparc10.
About automatically detected host models and types
When you first install LSF, you do not necessarily need to assign models and types to hosts in lsf.cluster.cluster_name. If you do not assign models and types to hosts in lsf.cluster.cluster_name, LIM automatically detects the model and type for the host.
If you have versions earlier than LSF 4.0, you may have host models and types already assigned to hosts. You can take advantage of automatic detection of host model and type also.
Automatic detection of host model and type is useful because you no longer need to make changes in the configuration files when you upgrade the operating system or hardware of a host and reconfigure the cluster. LSF will automatically detect the change.
Automatically detected models are mapped to the short model names in lsf.shared in the ARCHITECTURE column. Model strings in the ARCHITECTURE column are only used for mapping to the short model names.
Example lsf.shared file:
    Begin HostModel
    MODELNAME  CPUFACTOR  ARCHITECTURE
    SparcU5    5.0        (SUNWUltra510_270_sparcv9)
    PC486      2.0        (i486_33 i486_66)
    PowerPC    3.0        (PowerPC12 PowerPC16 PowerPC31)
    End HostModel
If an automatically detected host model cannot be matched with the short model name, it is matched to the best partial match and a warning message is generated.
If a host model cannot be detected or is not supported, it is assigned the DEFAULT model name and an error message is generated.
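The three outcomes described above (exact match, best partial match with a warning, DEFAULT with an error) can be illustrated as follows. This is a sketch under stated assumptions, not LIM's actual algorithm: the text does not define "best partial match", so the longest shared prefix is used as a stand-in, and the min_common threshold and the function name match_model are hypothetical.

```python
import os

# ARCHITECTURE strings per short model name, taken from the example file above.
MODELS = {
    "SparcU5": ["SUNWUltra510_270_sparcv9"],
    "PC486": ["i486_33", "i486_66"],
    "PowerPC": ["PowerPC12", "PowerPC16", "PowerPC31"],
}


def match_model(detected, models=MODELS, min_common=3):
    """Return (model_name, status); status is 'exact', 'partial', or 'default'."""
    best_name, best_len = None, 0
    for name, arch_strings in models.items():
        for arch in arch_strings:
            if arch == detected:
                return name, "exact"
            # Assumption: "best partial match" means the longest common prefix.
            common = len(os.path.commonprefix([arch, detected]))
            if common >= min_common and common > best_len:
                best_name, best_len = name, common
    if best_name is not None:
        return best_name, "partial"  # corresponds to the warning case
    return "DEFAULT", "default"      # corresponds to the error case


print(match_model("i486_66"))
print(match_model("PowerPC20"))
print(match_model("VAX_11"))
```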
Naming convention
Models that are automatically detected are named according to the following convention:
    hardware_platform[_processor_speed[_processor_type]]
where:
- hardware_platform is the only mandatory component
- processor_speed is the optional clock speed and is used to differentiate computers within a single platform
- processor_type is the optional processor manufacturer used to differentiate processors with the same speed
- Underscores (_) between hardware_platform, processor_speed, and processor_type are mandatory.
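A detected model string can be split into these components as sketched below. The sketch assumes that hardware_platform itself never contains an underscore, which the convention above does not state; the names DetectedModel and split_model_string are hypothetical.

```python
from typing import NamedTuple, Optional


class DetectedModel(NamedTuple):
    hardware_platform: str
    processor_speed: Optional[str]
    processor_type: Optional[str]


def split_model_string(model):
    """Split hardware_platform[_processor_speed[_processor_type]] into its parts."""
    # Assumption: the first underscore always ends hardware_platform.
    parts = model.split("_", 2)
    if not parts[0]:
        raise ValueError("hardware_platform is mandatory")
    parts += [None] * (3 - len(parts))  # pad the optional components
    return DetectedModel(*parts)


print(split_model_string("SUNWUltra5_270_sparcv9"))
print(split_model_string("i486_33"))
print(split_model_string("PowerPC12"))
```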
Resource Section
Optional. Defines resources (must be done by the LSF administrator).
Resource section structure
The first line consists of the keywords. RESOURCENAME and DESCRIPTION are mandatory. The other keywords are optional. Subsequent lines define resources.
Example Resource section
    Begin Resource
    RESOURCENAME  TYPE     INTERVAL  INCREASING  RELEASE  DESCRIPTION
    mips          Boolean  ()        ()          ()       (MIPS architecture)
    dec           Boolean  ()        ()          ()       (DECStation system)
    sparc         Boolean  ()        ()          ()       (SUN SPARC)
    bsd           Boolean  ()        ()          ()       (BSD unix)
    hpux          Boolean  ()        ()          ()       (HP-UX UNIX)
    aix           Boolean  ()        ()          ()       (AIX UNIX)
    solaris       Boolean  ()        ()          ()       (SUN SOLARIS)
    myResource    String   ()        ()          ()       (MIPS architecture)
    static_sh1    Numeric  ()        N           ()       (static)
    external_1    Numeric  15        Y           ()       (external)
    End Resource
RESOURCENAME
The name you assign to the new resource. An arbitrary character string.
- A resource name cannot begin with a number.
- A resource name cannot contain any of the following characters:
    : . ( ) [ + - * / ! & | < > @ =
- A resource name cannot be any of the following reserved names:
    cpu cpuf io logins ls idle maxmem maxswp maxtmp type model status it mem ncpus ndisks pg r15m r15s r1m swap swp tmp ut
- Resource names are case sensitive
- Resource names can be up to 29 characters in length
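The naming rules above can be checked before a name is added to the Resource section. The following is a minimal sketch, not an LSF tool: the function name resource_name_errors and the wording of the messages are hypothetical, and the reserved-name comparison is assumed to be case sensitive because resource names are.

```python
FORBIDDEN_CHARS = set(":.()[+-*/!&|<>@=")
RESERVED_NAMES = set(
    "cpu cpuf io logins ls idle maxmem maxswp maxtmp type model status it "
    "mem ncpus ndisks pg r15m r15s r1m swap swp tmp ut".split()
)
MAX_LENGTH = 29


def resource_name_errors(name):
    """Return a list of rule violations; an empty list means the name passes."""
    if not name:
        return ["name is empty"]
    errors = []
    if name[0].isdigit():
        errors.append("begins with a number")
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        errors.append("contains forbidden characters: " + " ".join(bad))
    if name in RESERVED_NAMES:  # assumption: compared case sensitively
        errors.append("is a reserved name")
    if len(name) > MAX_LENGTH:
        errors.append("is longer than 29 characters")
    return errors


for candidate in ["static_sh1", "2fast", "mem", "a.b"]:
    print(candidate, resource_name_errors(candidate))
```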
TYPE
The type of resource:
- Boolean--Resources that have a value of 1 on hosts that have the resource and 0 otherwise.
- Numeric--Resources that take numerical values, such as all the load indices, number of processors on a host, or host CPU factor.
- String--Resources that take string values, such as host type, host model, host status.
If TYPE is not given, the default type is Boolean.
DESCRIPTION
Brief description of the resource.
The information defined here will be returned by the ls_info() API call or printed out by the lsinfo command as an explanation of the meaning of the resource.
INCREASING
Applies to numeric resources only.
If a larger value means greater load, INCREASING should be defined as Y. If a smaller value means greater load, INCREASING should be defined as N.
INTERVAL
Optional. Applies to dynamic resources only.
Defines the time interval (in seconds) at which the resource is sampled by the ELIM.
If INTERVAL is defined for a numeric resource, it becomes an external load index.
If INTERVAL is not given, the resource is considered static.
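The TYPE default and the INTERVAL rules above can be applied to a single line of the Resource section as sketched below. This is an illustration only: the names parse_resource_row and classify are hypothetical, the column order is assumed to match the example section, descriptions are assumed to contain no nested parentheses, and the label "dynamic" for a non-numeric resource with an INTERVAL is this sketch's own wording.

```python
import re

# Assumption: columns appear in the same order as in the example Resource section.
COLUMNS = ["RESOURCENAME", "TYPE", "INTERVAL", "INCREASING", "RELEASE", "DESCRIPTION"]


def parse_resource_row(line, columns=COLUMNS):
    """Split one resource line into a dict; empty '()' fields become None."""
    fields = re.findall(r"\([^)]*\)|\S+", line)
    values = []
    for field in fields:
        if field.startswith("(") and field.endswith(")"):
            field = field[1:-1].strip()
        values.append(field or None)
    row = dict(zip(columns, values))
    row["TYPE"] = row.get("TYPE") or "Boolean"  # default type is Boolean
    return row


def classify(row):
    """Apply the INTERVAL rules: no interval means static."""
    if row.get("INTERVAL") is None:
        return "static"
    if row["TYPE"] == "Numeric":
        return "external load index"
    return "dynamic"  # hypothetical label for non-numeric resources with an interval


for line in [
    "mips Boolean () () () (MIPS architecture)",
    "static_sh1 Numeric () N () (static)",
    "external_1 Numeric 15 Y () (external)",
]:
    row = parse_resource_row(line)
    print(row["RESOURCENAME"], classify(row))
```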
RELEASE
Applies to numeric shared resources only, such as floating licenses.
Controls whether LSF releases the resource when a job using the resource is suspended. When a job using a shared resource is suspended, the resource is held or released by the job depending on the configuration of this parameter.
Specify N to hold the resource, or specify Y to release the resource.
Default: Y
Date Modified: February 24, 2004
Platform Computing: www.platform.com
Platform Support: support@platform.com
Platform Information Development: doc@platform.com
Copyright © 1994-2004 Platform Computing Corporation. All rights reserved.