The LSF batch event log file lsb.events is used to display LSF batch event history and for mbatchd failure recovery.
Whenever a host, job, or queue changes status, a record is appended to the event log file. The file is located in LSB_SHAREDIR/<cluster_name>/logdir, where LSB_SHAREDIR must be defined in lsf.conf(5) and <cluster_name> is the name of the LSF cluster, as returned by lsid(1). See mbatchd(8) for the description of LSB_SHAREDIR.
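As an illustration of the location rule above, the following Python sketch assembles the path to the event log. The function name event_log_path and the example directory and cluster name are hypothetical; in practice LSB_SHAREDIR is read from lsf.conf and the cluster name is the one reported by lsid.

```python
import posixpath
from typing import Optional


def event_log_path(lsb_sharedir: str, cluster_name: str,
                   n: Optional[int] = None) -> str:
    """Return LSB_SHAREDIR/<cluster_name>/logdir/lsb.events[.n].

    Pass n to name a switched (older) log file such as lsb.events.1.
    """
    name = "lsb.events" if n is None else f"lsb.events.{n}"
    return posixpath.join(lsb_sharedir, cluster_name, "logdir", name)


# Hypothetical values, for illustration only.
print(event_log_path("/usr/share/lsf/work", "cluster1"))
print(event_log_path("/usr/share/lsf/work", "cluster1", 2))
```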
lsb.events Structure
The event log file is an ASCII file with one record per line. For the lsb.events file, the first line has the format "# <history seek position>", which indicates the file position of the first history event after log switch. For the lsb.events.# file, the first line has the format "# <timestamp of most recent event>", which gives the timestamp of the most recent event in the file.
Records and fields
The fields of a record are separated by blanks. The first string of an event record indicates its type. The following types of events are recorded:
- JOB_NEW
- JOB_FORWARD
- JOB_ACCEPT
- JOB_START
- JOB_START_ACCEPT
- JOB_STATUS
- JOB_SWITCH
- JOB_MOVE
- QUEUE_CTRL
- HOST_CTRL
- MBD_START
- MBD_DIE
- UNFULFILL
- LOAD_INDEX
- JOB_SIGACT
- MIG
- JOB_MODIFY2
- JOB_SIGNAL
- CAL_NEW
- CAL_MODIFY
- CAL_DELETE
- JOB_EXECUTE
- JOB_REQUEUE
- JOB_CLEAN
- JOB_EXCEPTION
- JOB_EXT_MSG
- JOB_ATTA_DATA
- JOB_CHUNK
- SBD_UNREPORTED_STATUS
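The layout described above (a "#" header line, then one blank-separated record per line whose first string is the event type) can be read with a few lines of Python. This is a minimal sketch under stated assumptions, not a reference parser: it assumes that string fields are enclosed in double quotes, that the event time is a UNIX timestamp in seconds, and it does not handle any escaping LSF may apply to quotes embedded in strings. The function names and the sample record are hypothetical.

```python
import shlex
from datetime import datetime, timezone


def parse_header(line: str) -> int:
    """Return the integer that follows '#' on the first line of the file."""
    if not line.startswith("#"):
        raise ValueError("not a header line")
    return int(line[1:].split()[0])


def split_record(line: str) -> list:
    """Split one record into fields (assumes double-quoted strings)."""
    return shlex.split(line)


def common_fields(fields: list) -> dict:
    """Name the three leading fields shared by every event type."""
    return {
        "type": fields[0],
        "version": fields[1],
        # Assumption: event time is seconds since the UNIX epoch.
        "event_time": datetime.fromtimestamp(int(fields[2]), tz=timezone.utc),
    }


# Hypothetical sample lines, for illustration only.
print(parse_header("# 1077580800"))
print(common_fields(split_record('"JOB_CLEAN" "6.0" 1077580800 101 0')))
```

The remaining fields of each record depend on the event type and follow the per-type lists in this document.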
JOB_NEW
A new job has been submitted. The fields in order of occurrence are:
Version number (%s)
The version number
Event time (%d)
The time of the event
jobId (%d)
Job ID
userId (%d)
UNIX user ID of the submitter
options (%d)
Bit flags for job processing
numProcessors (%d)
Number of processors requested for execution
submitTime (%d)
Job submission time
beginTime (%d)
Start time - the job should be started on or after this time
termTime (%d)
Termination deadline - the job should be terminated by this time
sigValue (%d)
Signal value
chkpntPeriod (%d)
Checkpointing period
restartPid (%d)
Restart process ID
userName (%s)
User name
rLimits
Soft CPU time limit (%d), see getrlimit(2)
rLimits
Soft file size limit (%d), see getrlimit(2)
rLimits
Soft data segment size limit (%d), see getrlimit(2)
rLimits
Soft stack segment size limit (%d), see getrlimit(2)
rLimits
Soft core file size limit (%d), see getrlimit(2)
rLimits
Soft memory size limit (%d), see getrlimit(2)
rLimits
Reserved (%d)
rLimits
Reserved (%d)
rLimits
Reserved (%d)
rLimits
Soft run time limit (%d), see getrlimit(2)
rLimits
Reserved (%d)
hostSpec (%s)
Model or host name for normalizing CPU time and run time
hostFactor (%f)
CPU factor of the above host
umask (%d)
File creation mask for this job
queue (%s)
Name of job queue to which the job was submitted
resReq (%s)
Resource requirements
fromHost (%s)
Submission host name
cwd (%s)
Current working directory
chkpntDir (%s)
Checkpoint directory
inFile (%s)
Input file name
outFile (%s)
Output file name
errFile (%s)
Error output file name
subHomeDir (%s)
Submitter's home directory
jobFile (%s)
Job file name
numAskedHosts (%d)
Number of candidate host names
askedHosts (%s)
List of names of candidate hosts for job dispatching
dependCond (%s)
Job dependency condition
preExecCmd (%s)
Job pre-execution command
timeEvent (%d)
Time Event, for job dependency condition; specifies when time event ended
jobName (%s)
Job name
command (%s)
Job command
nxf (%d)
Number of files to transfer
xf (%s)
List of file transfer specifications
mailUser (%s)
Mail user name
projectName (%s)
Project name
niosPort (%d)
Callback port if batch interactive job
maxNumProcessors (%d)
Maximum number of processors
schedHostType (%s)
Execution host type
loginShell (%s)
Login shell
userGroup (%s)
User group
exceptList (%s)
Exception handlers for the job
options2 (%d)
Bit flags for job processing
idx (%d)
Job array index
inFileSpool (%s)
Spool input file
commandSpool (%s)
Spool command file
jobSpoolDir (%s)
Job spool directory
userPriority (%d)
User priority
rsvId (%s)
Advance reservation ID; for example, "user2#0"
jobGroup (%s)
The job group under which the job runs
extsched (%s)
External scheduling options
warningAction (%s)
Job warning action
warningTimePeriod (%d)
Job warning time period in seconds
sla (%s)
SLA service class name under which the job runs
SLArunLimit (%d)
Absolute run time limit of the job for SLA service classes
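To show how the ordered field list maps onto a record, the sketch below names the leading JOB_NEW fields and the eleven rLimits slots that follow them. It is an illustration only: it assumes double-quoted string fields, stops after the rLimits block, and uses hypothetical names (parse_job_new_prefix, the short limit labels) and a made-up sample line.

```python
import shlex

# Leading JOB_NEW fields after the event type, in documented order.
LEADING = [
    "version", "eventTime", "jobId", "userId", "options", "numProcessors",
    "submitTime", "beginTime", "termTime", "sigValue", "chkpntPeriod",
    "restartPid", "userName",
]

# The eleven rLimits slots in documented order; None marks a reserved slot.
# The short labels are hypothetical, chosen for this example.
RLIMIT_SLOTS = ["cpu", "fileSize", "data", "stack", "core", "memory",
                None, None, None, "run", None]


def parse_job_new_prefix(line: str) -> dict:
    """Parse a JOB_NEW record up to and including its rLimits block."""
    fields = shlex.split(line)  # assumes double-quoted string fields
    if fields[0] != "JOB_NEW":
        raise ValueError("not a JOB_NEW record")
    values = fields[1:]
    record = dict(zip(LEADING, values))
    for name in LEADING:
        if name not in ("version", "userName"):  # the two %s fields
            record[name] = int(record[name])
    start = len(LEADING)
    limits = values[start:start + len(RLIMIT_SLOTS)]
    record["rLimits"] = {
        slot: int(value)
        for slot, value in zip(RLIMIT_SLOTS, limits)
        if slot is not None
    }
    return record


# Hypothetical, truncated sample record.
sample = ('"JOB_NEW" "6.0" 1077580800 101 500 0 1 1077580800 0 0 0 0 0 '
          '"user1" -1 -1 -1 -1 -1 -1 -1 -1 -1 3600 -1')
print(parse_job_new_prefix(sample))
```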
JOB_FORWARD
A job has been forwarded to a remote cluster (Platform MultiCluster only).
If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, older daemons and commands (pre-LSF Version 6.0) cannot recognize the lsb.events file format.
The fields in order of occurrence are:
Version number (%s)
The version number
Event time (%d)
The time of the event
jobId (%d)
Job ID
numReserHosts (%d)
Number of reserved hosts in the remote cluster
cluster (%s)
Remote cluster name
reserHosts (%s)
List of names of the reserved hosts in the remote cluster
idx (%d)
Job array index
JOB_ACCEPT
A job from a remote cluster has been accepted by this cluster. The fields in order of occurrence are:
Version number (%s)
The version number
Event time (%d)
The time of the event
jobId (%d)
Job ID at the accepting cluster
remoteJid (%d)
Job ID at the submission cluster
cluster (%s)
Job submission cluster name
idx (%d)
Job array index
JOB_START
A job has been dispatched.
If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, older daemons and commands (pre-LSF Version 6.0) cannot recognize the lsb.events file format.
The fields in order of occurrence are:
Version number (%s)
The version number
Event time (%d)
The time of the event
jobId (%d)
Job ID
jStatus (%d)
Job status (4, indicating the RUN status of the job)
jobPid (%d)
Job process ID
jobPGid (%d)
Job process group ID
hostFactor (%f)
CPU factor of the first execution host
numExHosts (%d)
Number of processors used for execution
execHosts (%s)
List of execution host names
queuePreCmd (%s)
Pre-execution command
queuePostCmd (%s)
Post-execution command
jFlags (%d)
Job processing flags
userGroup (%s)
User group name
idx (%d)
Job array index
additionalInfo (%s)
Placement information of HPC jobs
JOB_START_ACCEPT
A job has started on the execution host(s). The fields in order of occurrence are:
Version number (%s)
The version number
Event time (%d)
The time of the event
jobId (%d)
Job ID
jobPid (%d)
Job process ID
jobPGid (%d)
Job process group ID
idx (%d)
Job array index
JOB_STATUS
The status of a job changed after dispatch. The fields in order of occurrence are:
Version number (%s)
The version number
Event time (%d)
The time of the event
jobId (%d)
Job ID
jStatus (%d)
New status, see <lsbatch/lsbatch.h>
reason (%d)
Pending or suspended reason code, see <lsbatch/lsbatch.h>
subreasons (%d)
Pending or suspended subreason code, see <lsbatch/lsbatch.h>
cpuTime (%f)
CPU time consumed so far
endTime (%d)
Job completion time
ru (%d)
Resource usage flag
lsfRusage (%s)
Resource usage statistics, see <lsf/lsf.h>
exitStatus (%d)
Exit status of the job, see <lsbatch/lsbatch.h>
idx (%d)
Job array index
exitInfo (%d)
Job termination reason, see <lsbatch/lsbatch.h>
JOB_SWITCH
A job switched from one queue to another. The fields in order of occurrence are:
Version number (%s)
The version number
Event time (%d)
The time of the event
userId (%d)
UNIX user ID of the user invoking the command
jobId (%d)
Job ID
queue (%s)
Target queue name
idx (%d)
Job array index
userName (%s)
Name of the job submitter
JOB_MOVE
A job moved toward the top or bottom of its queue. The fields in order of occurrence are:
Version number (%s)
The version number
Event time (%d)
The time of the event
userId (%d)
UNIX user ID of the user invoking the command
jobId (%d)
Job ID
position (%d)
Position number
base (%d)
Operation code (TO_TOP or TO_BOTTOM), see <lsbatch/lsbatch.h>
idx (%d)
Job array index
userName (%s)
Name of the job submitter
QUEUE_CTRL
A job queue has been altered. The fields in order of occurrence are:
Version number (%s)
The version number
Event time (%d)
The time of the event
opCode (%d)
Operation code, see <lsbatch/lsbatch.h>
queue (%s)
Queue name
userId (%d)
UNIX user ID of the user invoking the command
userName (%s)
Name of the user
ctrlComments (%s)
Administrator comment text from the -C option of badmin queue control commands qclose, qopen, qact, and qinact
HOST_CTRL
A batch server host changed status. The fields in order of occurrence are:
Version number (%s)
The version number
Event time (%d)
The time of the event
opCode (%d)
Operation code, see <lsbatch/lsbatch.h>
host (%s)
Host name
userId (%d)
UNIX user ID of the user invoking the command
userName (%s)
Name of the user
ctrlComments (%s)
Administrator comment text from the -C option of badmin host control commands hclose and hopen
MBD_START
The mbatchd has started. The fields in order of occurrence are:
Version number (%s)
The version number
Event time (%d)
The time of the event
master (%s)
Master host name
cluster (%s)
Cluster name
numHosts (%d)
Number of hosts in the cluster
numQueues (%d)
Number of queues in the cluster
MBD_DIE
The mbatchd died. The fields in order of occurrence are:
Version number (%s)
The version number
Event time (%d)
The time of the event
master (%s)
Master host name
numRemoveJobs (%d)
Number of finished jobs that have been removed from the system and logged in the current event file
exitCode (%d)
Exit code from mbatchd
ctrlComments (%s)
Administrator comment text from the -C option of badmin mbdrestart
UNFULFILL
Actions that were not taken because the mbatchd was unable to contact the sbatchd on the job execution host. The fields in order of occurrence are:
Version number (%s)
The version number
Event time (%d)
The time of the event
jobId (%d)
Job ID
notSwitched (%d)
Not switched: the mbatchd has switched the job to a new queue, but the sbatchd has not been informed of the switch
sig (%d)
Signal: this signal has not been sent to the job
sig1 (%d)
Checkpoint signal: the job has not been sent this signal to checkpoint itself
sig1Flags (%d)
Checkpoint flags, see <lsbatch/lsbatch.h>
chkPeriod (%d)
New checkpoint period for job
notModified (%s)
If set to true, then parameters for the job cannot be modified.
idx (%d)
Job array index
LOAD_INDEX
mbatchd restarted with these load index names (see lsf.cluster(5)). The fields in order of occurrence are:
Version number (%s)
The version number
Event time (%d)
The time of the event
nIdx (%d)
Number of index names
name (%s)
List of index names
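Several event types store a list as a count followed by that many items; LOAD_INDEX is the simplest case (nIdx, then the index names). The sketch below reads such a counted list. It assumes double-quoted string fields and that each list item occupies its own field; the function names and the sample line are hypothetical.

```python
import shlex


def take_counted_list(fields, pos):
    """Read a count at fields[pos], then that many items.

    Returns (items, next_pos), where next_pos indexes the field that
    follows the list.
    """
    count = int(fields[pos])
    items = fields[pos + 1:pos + 1 + count]
    if len(items) != count:
        raise ValueError("record is shorter than its count field")
    return items, pos + 1 + count


def parse_load_index(line):
    """Parse a LOAD_INDEX record into version, event time, and names."""
    fields = shlex.split(line)  # assumes double-quoted string fields
    if fields[0] != "LOAD_INDEX":
        raise ValueError("not a LOAD_INDEX record")
    names, _ = take_counted_list(fields, 3)
    return {"version": fields[1], "eventTime": int(fields[2]), "names": names}


# Hypothetical sample record.
print(parse_load_index('"LOAD_INDEX" "6.0" 1077580800 3 "r15s" "r1m" "r15m"'))
```

The same helper could be reused for other count-then-list pairs such as numAskedHosts/askedHosts, continuing from the returned position.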
JOB_SIGACT
An action on a job has been taken. The fields in order of occurrence are:
Version number (%s)
The version number
Event time (%d)
The time of the event
jobId (%d)
Job ID
period (%d)
Action period
pid (%d)
Process ID of the child sbatchd that initiated the action
jstatus (%d)
Job status
reasons (%d)
Job pending reasons
flags (%d)
Action flags, see <lsbatch/lsbatch.h>
actStatus (%d)
Action status:
1: Action started
2: One action preempted other actions
3: Action succeeded
4: Action failed
signalSymbol (%s)
Action name, accompanied by actFlags
idx (%d)
Job array index
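The actStatus codes listed for JOB_SIGACT can be turned into readable text with a small lookup table. This is a sketch only; the function name describe_act_status and the fallback text for unlisted codes are hypothetical.

```python
# actStatus codes as listed for JOB_SIGACT.
ACT_STATUS = {
    1: "Action started",
    2: "One action preempted other actions",
    3: "Action succeeded",
    4: "Action failed",
}


def describe_act_status(code: int) -> str:
    """Return the documented meaning of an actStatus code."""
    return ACT_STATUS.get(code, f"Unknown action status {code}")


print(describe_act_status(3))
```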
MIG
A job has been migrated. The fields in order of occurrence are:
Version number (%s)
The version number
Event time (%d)
The time of the event
jobId (%d)
Job ID
numAskedHosts (%d)
Number of candidate hosts for migration
askedHosts (%s)
List of names of candidate hosts
userId (%d)
UNIX user ID of the user invoking the command
idx (%d)
Job array index
userName (%s)
Name of the job submitter
JOB_MODIFY2
This is created when the mbatchd modifies a previously submitted job via bmod(1).
Version number (%s)
The version number
Event time (%d)
The time of the event
jobIdStr (%s)
Job ID
options (%d)
Bit flags for job processing
options2 (%d)
Bit flags for job processing
delOptions (%d)
Delete options for the options field
delOptions2 (%d)
Delete options for the options2 field
userId (%d)
UNIX user ID of the submitter
userName (%s)
User name
submitTime (%d)
Job submission time
umask (%d)
File creation mask for this job
numProcessors (%d)
Number of processors requested for execution
beginTime (%d)
Start time - the job should be started on or after this time
termTime (%d)
Termination deadline - the job should be terminated by this time
sigValue (%d)
Signal value
restartPid (%d)
Restart process ID for the original job
jobName (%s)
Job name
queue (%s)
Name of job queue to which the job was submitted
numAskedHosts (%d)
Number of candidate host names
askedHosts (%s)
List of names of candidate hosts for job dispatching; blank if the last field value is 0. If there is more than one host name, then each additional host name will be returned in its own field
resReq (%s)
Resource requirements
rLimits
Soft CPU time limit (%d), see getrlimit(2)
rLimits
Soft file size limit (%d), see getrlimit(2)
rLimits
Soft data segment size limit (%d), see getrlimit(2)
rLimits
Soft stack segment size limit (%d), see getrlimit(2)
rLimits
Soft core file size limit (%d), see getrlimit(2)
rLimits
Soft memory size limit (%d), see getrlimit(2)
rLimits
Reserved (%d)
rLimits
Reserved (%d)
rLimits
Reserved (%d)
rLimits
Soft run time limit (%d), see getrlimit(2)
rLimits
Reserved (%d)
hostSpec (%s)
Model or host name for normalizing CPU time and run time
dependCond (%s)
Job dependency condition
timeEvent (%d)
Time Event, for job dependency condition; specifies when time event ended
subHomeDir (%s)
Submitter's home directory
inFile (%s)
Input file name
outFile (%s)
Output file name
errFile (%s)
Error output file name
command (%s)
Job command
inFileSpool (%s)
Spool input file
commandSpool (%s)
Spool command file
chkpntPeriod (%d)
Checkpointing period
chkpntDir (%s)
Checkpoint directory
nxf (%d)
Number of files to transfer
xf (%s)
List of file transfer specifications
jobFile (%s)
Job file name
fromHost (%s)
Submission host name
cwd (%s)
Current working directory
preExecCmd (%s)
Job pre-execution command
mailUser (%s)
Mail user name
projectName (%s)
Project name
niosPort (%d)
Callback port if batch interactive job
maxNumProcessors (%d)
Maximum number of processors
loginShell (%s)
Login shell
schedHostType (%s)
Execution host type
userGroup (%s)
User group
exceptList (%s)
Exception handlers for the job
userPriority (%d)
User priority
rsvId (%s)
Advance reservation ID; for example, "user2#0"
extsched (%s)
External scheduling options
warningAction (%s)
Job warning action
warningTimePeriod (%d)
Job warning time period in seconds
jobGroup (%s)
The job group to which the job is attached
sla (%s)
SLA service class name that the job is to be attached to
JOB_SIGNAL
This is created when a job is signaled via bkill(1) or deleted via bdel(1). The fields are in the order in which they are appended:
Version number (%s)
The version number
Event time (%d)
The time of the event
jobId (%d)
Job ID
userId (%d)
UNIX user ID of the user invoking the command
runCount (%d)
Number of runs
signalSymbol (%s)
Signal name
idx (%d)
Job array index
userName (%s)
Name of the job submitter
CAL_NEW
This is created when a new calendar is added to the system via bcaladd(1). The fields in order of occurrence are:
Version number (%s)
The version number
Event time (%d)
The time of the event
userId (%d)
UNIX user ID of the calendar owner or the invoker
options (%d)
Options, see <lsbatch/lsbatch.h>
name (%s)
Calendar name
desc (%s)
Calendar description
calExpr (%s)
Time expression list associated with the calendar
CAL_MODIFY
This is created when a calendar is modified via bcalmod(1). The fields are the same as for CAL_NEW.
CAL_DELETE
This is created when a calendar is deleted via bcaldel(1). The fields are the same as for CAL_NEW.
JOB_EXECUTE
This is created when a job is actually running on an execution host. The fields in order of occurrence are:
Version number (%s)
The version number
Event time (%d)
The time of the event
jobId (%d)
Job ID
execUid (%d)
Mapped UNIX user ID on execution host
jobPGid (%d)
Job process group ID
execCwd (%s)
Current working directory job used on execution host
execHome (%s)
Home directory job used on execution host
execUsername (%s)
Mapped user name on execution host
jobPid (%d)
Job process ID
idx (%d)
Job array index
additionalInfo (%s)
Placement information of HPC jobs
SLAscaledRunLimit (%d)
Run time limit for the job scaled by the execution host
JOB_REQUEUE
This is created when a job has ended and been requeued by mbatchd. The fields in order of occurrence are:
Version number (%s)
The version number
Event time (%d)
The time of the event
jobId (%d)
Job ID
idx (%d)
Job array index
JOB_CLEAN
This is created when a job is removed from the mbatchd memory. The fields in order of occurrence are:
Version number (%s)
The version number
Event time (%d)
The time of the event
jobId (%d)
Job ID
idx (%d)
Job array index
JOB_EXCEPTION
This is created when an exception condition is detected for a job. The fields in order of occurrence are:
Version number (%s)
The version number
Event time (%d)
The time of the event
jobId (%d)
Job ID
exceptMask (%d)
Exception Id
0x01: missched
0x02: overrun
0x04: underrun
0x08: abend
0x10: cantrun
0x20: hostfail
0x40: startfail
actMask (%d)
Action Id
0x01: kill
0x02: alarm
0x04: rerun
0x08: setexcept
timeEvent (%d)
Time Event; for the missched exception, specifies when the time event ended.
exceptInfo (%d)
Except Info: the pending reason for the missched or cantrun exception, the exit code of the job for the abend exception, otherwise 0.
idx (%d)
Job array index
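The exceptMask and actMask values listed for JOB_EXCEPTION are bit flags, so a logged value can be decoded by testing each bit. The sketch below does this with the hexadecimal values from the lists above; it assumes that more than one bit may be set in a single mask, and the helper name decode_mask is hypothetical.

```python
# Bit values as listed for exceptMask and actMask.
EXCEPT_BITS = {
    0x01: "missched",
    0x02: "overrun",
    0x04: "underrun",
    0x08: "abend",
    0x10: "cantrun",
    0x20: "hostfail",
    0x40: "startfail",
}

ACT_BITS = {
    0x01: "kill",
    0x02: "alarm",
    0x04: "rerun",
    0x08: "setexcept",
}


def decode_mask(mask: int, table: dict) -> list:
    """Return the names of all bits set in mask, lowest bit first."""
    return [name for bit, name in table.items() if mask & bit]


print(decode_mask(0x0A, EXCEPT_BITS))
print(decode_mask(0x01, ACT_BITS))
```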
JOB_EXT_MSG
An external message has been sent to a job. The fields in order of occurrence are:
Version number (%s)
The version number
Event time (%d)
The time of the event
jobId (%d)
Job ID
idx (%d)
Job array index
msgIdx (%d)
Index in the list
userId (%d)
Unique user ID of the user invoking the command
dataSize (%ld)
Size of the data if it has any, otherwise 0
postTime (%ld)
Message sending time
dataStatus (%d)
Status of the attached data
desc (%s)
Text description of the message
userName (%s)
Name of the author of the message
JOB_ATTA_DATA
An update on the data status of a message for a job has been sent. The fields in order of occurrence are:
Version number (%s)
The version number
Event time (%d)
The time of the event
jobId (%d)
Job ID
idx (%d)
Job array index
msgIdx (%d)
Index in the list
dataSize (%ld)
Size of the data if it has any, otherwise 0
dataStatus (%d)
Status of the attached data
fileName (%s)
File name of the attached data
JOB_CHUNK
This is created when a job is inserted into a chunk.
If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, older daemons and commands (pre-LSF Version 6.0) cannot recognize the lsb.events file format.
The fields in order of occurrence are:
Version number (%s)
The version number
Event time (%d)
The time of the event
membSize (%ld)
Size of array membJobId
membJobId (%ld)
Job IDs of jobs in the chunk
numExHosts (%ld)
Number of execution hosts
execHosts (%s)
Execution host name array
SBD_UNREPORTED_STATUS
This is created when an unreported status change occurs. The fields in order of occurrence are:
Version number (%s)
The version number
Event time (%d)
The time of the event
jobId (%d)
Job ID
actPid (%d)
Acting processing ID
jobPid (%d)
Job process ID
jobPGid (%d)
Job process group ID
newStatus (%d)
New status of the job
reason (%d)
Pending or suspending reason code, see <lsbatch/lsbatch.h>
suspreason (%d)
Pending or suspending subreason code, see <lsbatch/lsbatch.h>
lsfRusage
The following fields contain resource usage information for the job. If the value of some field is unavailable (due to job abortion or the difference among the operating systems), -1 will be logged. Times are measured in seconds, and sizes are measured in KB.
ru_utime (%f)
User time used
ru_stime (%f)
System time used
ru_maxrss (%d)
Maximum shared text size
ru_ixrss (%d)
Integral of the shared text size over time (in kilobyte seconds)
ru_ismrss (%d)
Integral of the shared memory size over time (valid only on Ultrix)
ru_idrss (%d)
Integral of the unshared data size over time
ru_isrss (%d)
Integral of the unshared stack size over time
ru_minflt (%d)
Number of page reclaims
ru_majflt (%d)
Number of page faults
ru_nswap (%d)
Number of times the process was swapped out
ru_inblock (%d)
Number of block input operations
ru_oublock (%d)
Number of block output operations
ru_ioch (%d)
Number of characters read and written (valid only on HP-UX)
ru_msgsnd (%d)
Number of System V IPC messages sent
ru_msgrcv (%d)
Number of messages received
ru_nsignals (%d)
Number of signals received
ru_nvcsw (%d)
Number of voluntary context switches
ru_nivcsw (%d)
Number of involuntary context switches
ru_exutime (%d)
Exact user time used (valid only on ConvexOS)
exitStatus (%d)
Exit status of the job, see <lsbatch/lsbatch.h>
execCwd (%s)
Current working directory job used on execution host
execHome (%s)
Home directory job used on execution host
execUsername (%s)
Mapped user name on execution host
msgId (%d)
ID of the message
actStatus (%d)
Action status
1: Action started
2: One action preempted other actions
3: Action succeeded
4: Action failed
sigValue (%d)
Signal value
seq (%d)
Sequence status of the job
idx (%d)
Job array index
jRusage (run usage)
The following fields contain resource usage information for the job. If the value of some field is unavailable (due to job abortion or the difference among the operating systems), -1 will be logged. Times are measured in seconds, and sizes are measured in KB.
mem (%d)
Total resident memory usage in KB of all currently running processes in a given process group
swap (%d)
Total virtual memory usage in KB of all currently running processes in given process groups
utime (%d)
Cumulative total user time in seconds
stime (%d)
Cumulative total system time in seconds
npids (%d)
Number of currently active processes in given process groups. This entry has four sub-fields:
pid (%d)
Process ID of the child sbatchd that initiated the action
ppid (%d)
Parent process ID
pgid (%d)
Process group ID
jobId (%d)
Process Job ID
npgids (%d)
Number of currently active process groups
exitInfo (%d)
Job termination reason, see <lsbatch/lsbatch.h>
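The lsfRusage block above is a fixed run of nineteen values in which -1 marks an unavailable value. The sketch below names the values in listed order and maps -1 to None. It is an illustration under assumptions: the caller has already sliced the nineteen raw fields out of the record, all values are parsed as floating point for uniformity, and the page fault counter is spelled ru_majflt as in getrusage(2). The function name is hypothetical.

```python
# lsfRusage field names in the order they are listed above.
LSF_RUSAGE_FIELDS = [
    "ru_utime", "ru_stime", "ru_maxrss", "ru_ixrss", "ru_ismrss",
    "ru_idrss", "ru_isrss", "ru_minflt",
    "ru_majflt",  # spelled as in getrusage(2)
    "ru_nswap", "ru_inblock", "ru_oublock", "ru_ioch", "ru_msgsnd",
    "ru_msgrcv", "ru_nsignals", "ru_nvcsw", "ru_nivcsw", "ru_exutime",
]


def parse_lsf_rusage(values):
    """Map nineteen raw field strings to names; -1 becomes None."""
    if len(values) != len(LSF_RUSAGE_FIELDS):
        raise ValueError(f"expected {len(LSF_RUSAGE_FIELDS)} values")
    usage = {}
    for name, raw in zip(LSF_RUSAGE_FIELDS, values):
        number = float(raw)
        # -1 is logged when a value is unavailable.
        usage[name] = None if number == -1 else number
    return usage


# Hypothetical values: only user and system time are available.
print(parse_lsf_rusage(["0.25", "0.10"] + ["-1"] * 17))
```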
SEE ALSO
lsid(1), getrlimit(2), lsb_geteventrec(3), lsb.acct(5), lsb.queues(5), lsb.hosts(5), lsb.users(5), lsb.params(5), lsf.conf(5), lsf.cluster(5), badmin(8), mbatchd(8)
LSB_SHAREDIR/<clustername>/logdir/lsb.events[.n]
Date Modified: February 24, 2004
Platform Computing: www.platform.com
Platform Support: support@platform.com
Platform Information Development: doc@platform.com
Copyright © 1994-2004 Platform Computing Corporation. All rights reserved.