Configuration File¶
Location¶
All GC3Apps and GC3Utils commands read two configuration files at startup:
- a system-wide one located at /etc/gc3/gc3pie.conf, and
- a user-private one at ~/.gc3/gc3pie.conf.
Both files use the same format. The system-wide one is read first, so that users can override the system-level configuration in their private file. Configuration data from corresponding sections in the two configuration files is merged; the value in the user-private file overrides the one from the system-wide configuration.
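The merge behaviour described above can be illustrated with Python's standard `configparser` module. This is only a sketch of the semantics, not GC3Pie's actual loading code: the resource name `cluster1`, the sample values, and the temporary files (standing in for the two real configuration paths) are assumptions made for the example.

```python
import configparser
import os
import tempfile

# Hypothetical contents standing in for /etc/gc3/gc3pie.conf
SYSTEM_WIDE = """\
[resource/cluster1]
type = sge
max_walltime = 2
max_cores = 80
"""

# Hypothetical contents standing in for ~/.gc3/gc3pie.conf
USER_PRIVATE = """\
[resource/cluster1]
max_walltime = 12
"""


def load_config(paths):
    """Read the files in the given order; later files override earlier ones."""
    parser = configparser.ConfigParser()
    parser.read(paths)  # files that do not exist are silently skipped
    return parser


with tempfile.TemporaryDirectory() as tmp:
    system_path = os.path.join(tmp, "system.conf")
    user_path = os.path.join(tmp, "user.conf")
    with open(system_path, "w") as stream:
        stream.write(SYSTEM_WIDE)
    with open(user_path, "w") as stream:
        stream.write(USER_PRIVATE)
    # System-wide file first, user-private file second.
    merged = load_config([system_path, user_path])

section = merged["resource/cluster1"]
# "type" and "max_cores" come from the system-wide file,
# "max_walltime" is overridden by the user-private one.
print(section["type"], section["max_walltime"], section["max_cores"])
```

Options that appear only in the system-wide file survive the merge; only the options repeated in the user-private file are replaced.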
If you try to start any GC3Utils command without having a configuration file, a sample one will be copied to the user-private location ~/.gc3/gc3pie.conf and an error message will be displayed, directing you to edit the sample file before retrying.
Configuration file format¶
The GC3Pie configuration file follows the format understood by Python ConfigParser, which is very close to the syntax used in MS-Windows .INI files. See http://docs.python.org/library/configparser.html for reference.
The GC3Libs configuration file consists of several configuration blocks. Each configuration block (section) starts with a keyword in square brackets and contains the configuration options for a specific part.
The following sections are used by the GC3Apps/GC3Utils programs:
- [DEFAULT]: this is for global settings.
- [auth/name]: these are for settings related to identity/authentication (identifying yourself to clusters & grids).
- [resource/name]: these are for settings related to a specific computing resource (cluster, grid, etc.)
Sections with other names are allowed but will be ignored.
The DEFAULT section¶
The [DEFAULT] section is optional.
Values defined in the [DEFAULT] section can be used to insert values in other sections, using the %(name)s syntax. See documentation of the Python SafeConfigParser object at http://docs.python.org/library/configparser.html for an example.
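As a small illustration of the %(name)s syntax, the sketch below parses a sample configuration with Python 3's `configparser.ConfigParser` (the Python 3 name for the interpolating parser). The `home_cluster` key is a hypothetical name chosen for this example; it is not a setting that GC3Pie itself defines.

```python
import configparser

SAMPLE = """\
[DEFAULT]
# "home_cluster" is a made-up helper value, used only for substitution
home_cluster = ocikbpra.uzh.ch

[resource/ocikbpra]
type = sge
frontend = %(home_cluster)s
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

# The value from [DEFAULT] is substituted when the option is read.
frontend = parser.get("resource/ocikbpra", "frontend")
print(frontend)  # prints: ocikbpra.uzh.ch
```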
auth sections¶
There can be more than one [auth] section.
Each authentication section must begin with a line of the form:
[auth/name]
where the name portion is any alphanumeric string.
You can have as many [auth/name] sections as you want; any name is allowed provided it’s composed only of letters, numbers and the underscore character _.
This allows you to define different auth methods for different resources. Each [resource/name] section will reference one (and one only) authentication section.
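The naming rule can be checked with a short regular expression. This is a sketch, not GC3Pie's own validation code: the function name `auth_name` is hypothetical, and "letters" is assumed here to mean ASCII letters.

```python
import re

# Assumption: "letters" means ASCII letters; digits and "_" are also allowed.
AUTH_SECTION = re.compile(r"auth/([A-Za-z0-9_]+)")


def auth_name(section):
    """Return the name part of an [auth/name] section title, or None if invalid."""
    match = AUTH_SECTION.fullmatch(section)
    return match.group(1) if match else None


print(auth_name("auth/ssh1"))     # a valid name
print(auth_name("auth/my-auth"))  # "-" is not allowed, so this is rejected
```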
Authentication types¶
Each auth section must specify a type setting.
type defines the authentication type that will be used to access a resource. There are three supported authentication types:
- ssh: use this for resources that will be accessed by opening an SSH connection to the front-end node of a cluster.
- voms-proxy: uses voms-proxy-init to generate a proxy; use for resources that require a VOMS-enabled Grid proxy.
- grid-proxy: uses grid-proxy-init to generate a proxy; use for resources that require a Grid proxy (but no VOMS extensions).
For the ssh-type auth, the following keys must be provided:
- type: must be ssh
- username: must be the username to log in as on the remote machine
Any other key/value pair will be ignored.
For the voms-proxy type auth, the following keys must be provided:
- type: must be voms-proxy
- vo: the VO to authenticate with (passed directly to voms-proxy-init as argument to the --vo command-line switch)
- cert_renewal_method: see below.
- remember_password: see below.
Any other key/value pair will be ignored.
For the grid-proxy type auth, the following keys must be provided:
- type: must be grid-proxy
- cert_renewal_method: see below.
- remember_password: see below.
Any other key/value pair will be ignored.
For the voms-proxy and grid-proxy authentication types, the cert_renewal_method setting specifies whether GC3Libs should attempt to get a certificate if the current one is expired or otherwise invalid. Currently there are two supported cert_renewal_method types:
- slcs: the user certificate is generated through an invocation of the slcs-init program.
- manual: the user certificate is generated/renewed through an external process, which the user has to perform outside of GC3Pie. In this case, if the user certificate is expired, invalid or non-existent, GC3Pie will fail to authenticate.
For the slcs certificate renewal method, the following keys must be provided:
- aai_username: passed directly to slcs-init as argument to the --user command-line switch.
- idp: passed directly to slcs-init as argument to the --idp command-line switch.
For the manual certificate renewal method, no additional keys are required.
The remember_password entry (optional) must be set to a boolean value (the strings 1, yes, true and on are interpreted as boolean “true”; any other value counts as “false”). If set to a true value, the remember_password entry instructs GC3Pie to keep the password used for this authentication in the program’s main memory; this implies that you will be asked for the password at most once per program invocation. This setting is optional, and defaults to “false”. Keeping passwords in memory is bad security practice; do not set this option to “true” unless you understand the implications.
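The boolean rule for remember_password can be sketched as a small function. This is not GC3Pie's actual parsing code: the function name is hypothetical, and ignoring letter case and surrounding whitespace is an assumption that the text above does not state.

```python
# The four strings that the documentation lists as boolean "true".
TRUE_STRINGS = ("1", "yes", "true", "on")


def parse_remember_password(value=None):
    """Interpret a remember_password setting; a missing value means "false"."""
    if value is None:
        return False  # the setting is optional and defaults to "false"
    # Assumption: comparison ignores case and surrounding whitespace.
    return value.strip().lower() in TRUE_STRINGS


print(parse_remember_password("yes"))    # one of the four "true" strings
print(parse_remember_password("maybe"))  # any other value counts as "false"
```

Note that this differs from the standard `ConfigParser.getboolean` method, which raises `ValueError` for strings it does not recognise instead of treating them as false.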
Example 1. The following example auth section shows how to configure GC3Pie for using SWITCHaai SLCS services to generate a certificate and a VOMS proxy to access the Swiss National Distributed Computing Infrastructure SMSCG:
[auth/smscg]
type = voms-proxy
cert_renewal_method = slcs
aai_username = <aai_user_name> # SWITCHaai/Shibboleth user name
idp = uzh.ch
vo = smscg
Example 2. The following configuration sections set up two different accounts that GC3Pie programs can use. Which account is used on which computational resource is defined in the resource sections (see below).
[auth/ssh1]
type = ssh
username = murri # your username here
[auth/ssh2] # I use a different account name on some resources
type = ssh
username = rmurri
resource sections¶
Each resource section must begin with a line of the form:
[resource/name]
You can have as many [resource/name] sections as you want; this allows you to define many different resources. Each [resource/name] section must reference one (and one only) [auth/name] section (by its auth key).
Resources currently come in several flavours, distinguished by the value of the type key:
- If type is arc1, then the resource is accessed using the ARC grid middleware (version 1.1.x/1.0.x);
- If type is arc0, then the resource is accessed using the ARC grid middleware (version 0.8.x);
- If type is sge, then the resource is a Grid Engine batch system, to be accessed by an SSH connection to its front-end node.
- If type is pbs, then the resource is a Torque/PBS batch system, to be accessed by an SSH connection to its front-end node.
- If type is lsf, then the resource is an LSF batch system, to be accessed by an SSH connection to its front-end node.
- If type is slurm, then the resource is a SLURM batch system, to be accessed by an SSH connection to its front-end node.
- If type is shellcmd, then the resource is the computer where the GC3Pie script is running and applications are executed by just spawning a local UNIX process.
All [resource/name] sections (except those of shellcmd type) must reference a valid [auth/***] section. Resources of sge, pbs, lsf and slurm type can only reference ssh type sections; resources of type arc0 or arc1 can only reference [auth/***] sections whose type is voms-proxy or grid-proxy.
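These pairing rules can be expressed as a small consistency check. The sketch below is not part of GC3Pie: the function `check_resource_auth`, the error messages and the `broken` resource are all hypothetical, and the check covers only the rules stated above.

```python
import configparser

SSH_ONLY_TYPES = {"sge", "pbs", "lsf", "slurm"}
PROXY_ONLY_TYPES = {"arc0", "arc1"}
PROXY_AUTH_TYPES = {"voms-proxy", "grid-proxy"}


def check_resource_auth(parser, resource_name):
    """Return an error message, or None if the auth reference is acceptable."""
    resource = parser["resource/" + resource_name]
    resource_type = resource["type"]
    if resource_type == "shellcmd":
        return None  # shellcmd resources need no auth section
    auth_section = "auth/" + resource.get("auth", fallback="")
    if not parser.has_section(auth_section):
        return "no [%s] section defined" % auth_section
    auth_type = parser[auth_section].get("type", fallback="")
    if resource_type in SSH_ONLY_TYPES and auth_type != "ssh":
        return "%s resources need an ssh auth, got %r" % (resource_type, auth_type)
    if resource_type in PROXY_ONLY_TYPES and auth_type not in PROXY_AUTH_TYPES:
        return "%s resources need a proxy auth, got %r" % (resource_type, auth_type)
    return None


SAMPLE = """\
[auth/ssh1]
type = ssh
username = murri

[resource/ocikbpra]
type = sge
auth = ssh1

[resource/broken]
# deliberately wrong: an ARC resource pointing at an ssh auth
type = arc0
auth = ssh1
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)
ok = check_resource_auth(parser, "ocikbpra")
problem = check_resource_auth(parser, "broken")
print(ok)
print(problem)
```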
Some configuration keys are common to all resource types:
- type: Resource type, see above.
- auth: the name of a valid [auth/name] section; only the authentication section name (after the /) must be specified.
- max_cores_per_job: Maximum number of CPU cores that a job can request; a resource will be dropped during the brokering process if a job requests more cores than this.
- max_memory_per_core: Maximum amount of memory (expressed in GBs) that a job can request.
- max_walltime: Maximum job running time (in hours).
- max_cores: Total number of cores provided by the resource.
- architecture: Processor architecture. Should be one of the strings x86_64 (for 64-bit Intel/AMD/VIA processors), i686 (for 32-bit Intel/AMD/VIA x86 processors), or x86_64,i686 if both architectures are available on the resource.
- time_cmd: Used only when type is shellcmd. The time program is used as a wrapper for the application in order to collect information about the execution when running without a real LRMS.
- prologue: Used only when type is pbs, lsf, slurm or sge. The content of the prologue script is inserted into the submission script and executed before the real application. It is intended for shell commands needed to set up the execution environment before running the application (e.g. running a module load ... command). The script must be a valid, plain /bin/sh script.
- epilogue: Used only when type is pbs, lsf, slurm or sge. The content of the epilogue script is inserted into the submission script and executed after the real application. The script must be a valid, plain /bin/sh script.
- <application_name>_prologue: Same as prologue, but used only when <application_name> matches the name of the application. Valid application names are: zods, gamess, turbomole, codeml, rosetta, rosetta_docking, geotop.
- <application_name>_epilogue: Same as epilogue, but used only when <application_name> matches the name of the application. Valid application names are: zods, gamess, turbomole, codeml, rosetta, rosetta_docking, geotop.
arc0 resources¶
The arc_ldap key should be set to the LDAP URL of an ARC GIIS or GRIS. If the frontend key is also defined, then only queues belonging to the specified frontend will be considered for brokering.
When a job has just been submitted, the ARC information system does not immediately report about it: the job will appear at the next cache update. This creates a time window during which no information is reported about the job by ARC, as if it never existed. In order not to mistake this for a “job lost” error, GC3Libs allows a “grace time”: job information lookups are allowed to fail for a certain time span after submission. The duration of this time span is set with the optional lost_job_timeout parameter, whose default is 4 times the ARC default cache time; this parameter should not be lower than twice the information system update frequency.
- lost_job_timeout: Time (in seconds) during which a failure in job lookup in the information system will not be considered critical.
arc1 resources¶
The arc_ldap key should be set to a valid ARC1 information system URL.
When a job has just been submitted, the ARC information system does not immediately report about it: the job will appear at the next cache update. This creates a time window during which no information is reported about the job by ARC, as if it never existed. In order not to mistake this for a “job lost” error, GC3Libs allows a “grace time”: job information lookups are allowed to fail for a certain time span after submission. The duration of this time span is set with the optional lost_job_timeout parameter, whose default is 4 times the ARC default cache time; this parameter should not be lower than twice the information system update frequency.
- lost_job_timeout: Time (in seconds) during which a failure in job lookup in the information system will not be considered critical.
sge resources¶
The following configuration keys are required in a sge-type resource section:
- frontend: should contain the FQDN of the SGE front-end node. An SSH connection will be attempted to this node, in order to submit jobs and retrieve status info.
- transport: Possible values are: ssh or local. If ssh, we try to connect to the host specified in frontend via SSH in order to execute SGE commands. If local, the SGE commands are run directly on the machine where GC3Pie is installed.
To submit parallel jobs to SGE, a “parallel environment” name must be specified. You can specify the PE to be used with a specific application using a configuration parameter named after the application, with the suffix _pe (e.g., gamess_pe, zods_pe); the default_pe parameter dictates the parallel environment to use if no application-specific one is defined. If neither the application-specific nor the default_pe parallel environment is defined, then it will not be possible to submit parallel jobs.
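The lookup order for the parallel environment can be sketched as follows. The function name and the PE names `mpi` and `smp` are hypothetical; the snippet only mirrors the fallback rule described above and is not GC3Pie's implementation.

```python
def parallel_environment(resource_options, application_name):
    """Pick the SGE parallel environment for an application.

    Returns the application-specific PE if defined, else the value of
    default_pe, else None (parallel jobs cannot then be submitted).
    """
    specific_key = application_name + "_pe"
    if specific_key in resource_options:
        return resource_options[specific_key]
    return resource_options.get("default_pe")


# Options as they might appear in a [resource/name] section of type sge;
# the PE names "mpi" and "smp" are made up for this example.
options = {"type": "sge", "gamess_pe": "mpi", "default_pe": "smp"}

print(parallel_environment(options, "gamess"))  # application-specific PE
print(parallel_environment(options, "zods"))    # falls back to default_pe
print(parallel_environment({"type": "sge"}, "zods"))  # nothing defined
```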
When a job has finished, the SGE batch system does not (by default) immediately write its information into the accounting database. This creates a time window during which no information is reported about the job by SGE, as if it never existed. In order not to mistake this for a “job lost” error, GC3Libs allows a “grace time”: qacct job information lookups are allowed to fail for a certain time span after the first time qstat failed. The duration of this time span is set with the sge_accounting_delay parameter, whose default is 15 seconds (matches the default in SGE, as of release 6.2):
- sge_accounting_delay: Time (in seconds) during which a failure in qacct will not be considered critical.
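The grace-time rule amounts to a simple comparison, sketched below. The function name is hypothetical, and treating a failure exactly at the end of the delay as still tolerated is an assumption; only the 15-second default comes from the text above.

```python
DEFAULT_SGE_ACCOUNTING_DELAY = 15  # seconds, the default stated above


def qacct_failure_is_critical(seconds_since_qstat_failed,
                              sge_accounting_delay=DEFAULT_SGE_ACCOUNTING_DELAY):
    """Tell whether a failed qacct lookup should count as a "job lost" error.

    Failures are tolerated until the grace time, measured from the first
    qstat failure, has elapsed.
    """
    return seconds_since_qstat_failed > sge_accounting_delay


print(qacct_failure_is_critical(5))   # still inside the grace time
print(qacct_failure_is_critical(20))  # grace time is over
```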
pbs resources¶
The following configuration keys are required in a pbs-type resource section:
- transport: Possible values are: ssh or local. If ssh, we try to connect to the host specified in frontend via SSH in order to execute Torque/PBS commands. If local, the Torque/PBS commands are run directly on the machine where GC3Pie is installed.
- frontend: should contain the FQDN of the Torque/PBS front-end node. This configuration item is only relevant if transport is ssh. An SSH connection will be attempted to this node, in order to submit jobs and retrieve status info.
lsf resources¶
The following configuration keys are required in a lsf-type resource section:
- transport: Possible values are: ssh or local. If ssh, we try to connect to the host specified in frontend via SSH in order to execute LSF commands. If local, the LSF commands are run directly on the machine where GC3Pie is installed.
- frontend: should contain the FQDN of the LSF front-end node. This configuration item is only relevant if transport is ssh. An SSH connection will be attempted to this node, in order to submit jobs and retrieve status info.
slurm resources¶
The following configuration keys are required in a slurm-type resource section:
- transport: Possible values are: ssh or local. If ssh, we try to connect to the host specified in frontend via SSH in order to execute SLURM commands. If local, the SLURM commands are run directly on the machine where GC3Pie is installed.
- frontend: should contain the FQDN of the SLURM front-end node. This configuration item is only relevant if transport is ssh. An SSH connection will be attempted to this node, in order to submit jobs and retrieve status info.
Example resource sections¶
Example 1. This configuration stanza defines a resource smscg representing the whole SMSCG infrastructure, accessed through the ARC (version 0.8.x) middleware:
[resource/smscg]
# A whole ARC-based Grid
type = arc0
auth = <voms_auth_name>
arc_ldap = ldap://giis.smscg.ch:2135/o=grid/mds-vo-name=Switzerland
# These values are correct as of 2011-02-28; please
# ask on the SMSCG mailing list if unsure.
max_cores_per_job = 256
max_memory_per_core = 3
max_walltime = 9999
max_cores = 1200
architecture = x86_64, i686
Example 2. This configuration stanza shows how to access a single cluster through the ARC middleware (version 1.x) using the name idgc3grid01 (which is also the internet host name of the cluster front-end):
[resource/idgc3grid01]
# A single cluster, accessed through the ARC middleware
type = arc1
auth = <auth_name> # pick a ``voms`` type auth
frontend = idgc3grid01.uzh.ch
name = gc3
arc_ldap = ldap://idgc3grid01.uzh.ch:2135/mds-vo-name=local,o=grid
max_cores_per_job = 32
max_memory_per_core = 2
max_walltime = 12
max_cores = 80
Example 3. This configuration stanza defines a resource to submit jobs to the Grid Engine cluster whose front-end host is ocikbpra.uzh.ch:
[resource/ocikbpra]
# A single SGE cluster, accessed by SSH'ing to the front-end node
type = sge
auth = <auth_name> # pick an ``ssh`` type auth, e.g., "ssh1"
transport = ssh
frontend = ocikbpra.uzh.ch
gamess_location = /share/apps/gamess
max_cores_per_job = 80
max_memory_per_core = 2
max_walltime = 2
max_cores = 80
Enabling/disabling selected resources¶
Any resource can be disabled by adding a line enabled = false to its configuration stanza. Conversely, a line enabled = true will undo the effect of an enabled = false line (possibly found in a different configuration file).
This way, resources can be temporarily disabled (e.g., the cluster is down for maintenance) without having to remove them from the configuration file.
You can selectively disable or enable resources that are defined in the system-wide configuration file. Two main use cases are supported: the system-wide configuration file /etc/gc3/gc3pie.conf lists and enables all available resources, and users can turn them off in their private configuration file ~/.gc3/gc3pie.conf; or the system-wide configuration can list all available resources but keep them disabled, and users can enable those they prefer in the private configuration file.
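The second use case (everything disabled system-wide, selected resources enabled by the user) can be sketched with Python's standard `configparser`. The resource names `cluster_a` and `cluster_b` are hypothetical, and treating a missing enabled key as "enabled" is an assumption inferred from the text above; this is not GC3Pie's actual code.

```python
import configparser

# Hypothetical system-wide file: lists all resources but keeps them disabled.
SYSTEM_WIDE = """\
[resource/cluster_a]
type = sge
enabled = false

[resource/cluster_b]
type = pbs
enabled = false
"""

# Hypothetical user-private file: turns on only the preferred resource.
USER_PRIVATE = """\
[resource/cluster_a]
enabled = true
"""

parser = configparser.ConfigParser()
parser.read_string(SYSTEM_WIDE)   # read first
parser.read_string(USER_PRIVATE)  # read second, overrides matching options

enabled = [
    name.split("/", 1)[1]
    for name in parser.sections()
    # Assumption: a resource without an "enabled" line counts as enabled.
    if name.startswith("resource/")
    and parser.getboolean(name, "enabled", fallback=True)
]
print(enabled)  # prints: ['cluster_a']
```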