The ggeotop script¶
GC3Apps provides a script to drive the execution of multiple GEOtop
jobs. It uses the generic gc3libs.cmdline.SessionBasedScript
framework.
From GEOtop’s “read me” file:
#
# RUNNING
# Run this simulation by calling the executable (GEOtop_1.223_static)
# and giving the simulation directory as an argument.
#
# EXAMPLE
# ls2:/group/geotop/sim/tmp/000001>./GEOtop_1.223_static ./
#
# TERMINATION OF SIMULATION BY GEOTOP
# When GEOtop terminates due to an internal error, it mostly reports this
# by writing a corresponding file (_FAILED_RUN or _FAILED_RUN.old) in the
# simulation directory. When is terminates sucessfully this file is
# named (_SUCCESSFUL_RUN or _SUCCESSFUL_RUN.old).
#
# RESTARTING SIMULATIONS THAT WERE TERMINATED BY THE SERVER
# When a simulation is started again with the same arguments as described
# above (RUNNING), then it continues from the last saving point. If
# GEOtop finds a file indicating a successful/failed run, it terminates.
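The termination convention quoted above can be checked with a few lines of Python. This is only a minimal sketch based on the marker file names in the read me; the function name `run_status` and the choice to let a success marker take precedence over a failure marker are assumptions made for illustration, not part of GEOtop or ggeotop.

```python
from pathlib import Path

# Marker file names taken from the read me excerpt above.
SUCCESS_MARKERS = ("_SUCCESSFUL_RUN", "_SUCCESSFUL_RUN.old")
FAILURE_MARKERS = ("_FAILED_RUN", "_FAILED_RUN.old")


def run_status(sim_dir):
    """Classify a simulation directory by its marker files.

    Returns "successful", "failed", or "unfinished". "unfinished" means
    no marker was found, so a restart would continue from the last
    saving point.
    """
    sim_dir = Path(sim_dir)
    # Assumption: if both kinds of marker exist, success wins.
    if any((sim_dir / name).exists() for name in SUCCESS_MARKERS):
        return "successful"
    if any((sim_dir / name).exists() for name in FAILURE_MARKERS):
        return "failed"
    return "unfinished"
```

Calling `run_status("./")` from inside a simulation directory tells you whether starting GEOtop again would continue the run or terminate immediately.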
Introduction¶
The ggeotop driver script scans the specified INPUT directories recursively for simulation directories and submits a job for each one found; job progress is monitored and, when a job is done, its output files are retrieved back into the simulation directory itself.
A simulation directory is defined as a directory containing a geotop.inpts file, an in folder, and an out folder.
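The definition above translates directly into a recursive directory scan. The following is a minimal sketch of such a scan, not the code ggeotop itself uses; the function names are hypothetical, and the decision not to descend into a directory once it has been recognised as a simulation directory is an assumption.

```python
import os


def is_simulation_dir(path):
    """True if `path` has a geotop.inpts file plus `in` and `out` folders."""
    return (
        os.path.isfile(os.path.join(path, "geotop.inpts"))
        and os.path.isdir(os.path.join(path, "in"))
        and os.path.isdir(os.path.join(path, "out"))
    )


def find_simulation_dirs(root):
    """Walk `root` recursively and return all simulation directories, sorted."""
    found = []
    for dirpath, dirnames, _filenames in os.walk(root):
        if is_simulation_dir(dirpath):
            found.append(dirpath)
            # Assumption: do not look for simulations nested inside one.
            dirnames[:] = []
    return sorted(found)
```

Running `find_simulation_dirs("input_folder")` before calling ggeotop gives a quick preview of how many jobs to expect.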
The ggeotop command keeps a record of jobs (submitted, executed
and pending) in a session file (set name with the -s option); at
each invocation of the command, the status of all recorded jobs is
updated, output from finished jobs is collected, and a summary table
of all known jobs is printed. New jobs are added to the session if
new input folders are added to the command line.
Options can specify a maximum number of jobs that should be in ‘SUBMITTED’ or ‘RUNNING’ state; ggeotop will delay submission of newly-created jobs so that this limit is never exceeded.
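The effect of such a limit can be illustrated with a small calculation. This is a sketch of the idea only, not GC3Pie's actual scheduling code; the function name `jobs_to_submit` is hypothetical, and the state names are those that appear in the status reports shown in this document.

```python
def jobs_to_submit(states, max_running):
    """Return how many NEW jobs may be submitted right now.

    `states` is a list of job state names; `max_running` is the cap on
    jobs that are SUBMITTED or RUNNING at the same time.
    """
    in_flight = sum(1 for s in states if s in ("SUBMITTED", "RUNNING"))
    waiting = sum(1 for s in states if s == "NEW")
    # Never submit more than the free slots, and never a negative number.
    return max(0, min(waiting, max_running - in_flight))
```

With five NEW jobs, two jobs already in flight and a limit of four, the function returns 2; the remaining three jobs wait for a later update cycle.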
In more detail, ggeotop does the following:
1. Reads the session (specified on the command line with the --session option) and loads all stored jobs into memory. If the session directory does not exist, one will be created with empty contents.
2. Recursively scans through the input folder searching for any valid folder. ggeotop will generate a collection of jobs, one for each valid input folder. Each job will transfer the input folder to the remote execution node and run GEOtop. GEOtop reads the geotop.inpts file to get instructions on how to find the input data, what to process and how, and where to place the generated output. Extracted from a generic geotop.inpts file:
DemFile = "in/dem"
MeteoFile = "in/meteo"
SkyViewFactorMapFile = "in/svf"
SlopeMapFile = "in/slp"
AspectMapFile = "in/asp"
!==============================================
! DIST OUTPUT
!==============================================
SoilAveragedTempTensorFile = "out/maps/T"
NetShortwaveRadiationMapFile="out/maps/SWnet"
InShortwaveRadiationMapFile="out/maps/SWin"
InLongwaveRadiationMapFile="out/maps/LWin"
SWEMapFile= "out/maps/SWE"
AirTempMapFile = "out/maps/Ta"
3. Updates the state of all existing jobs, collects output from finished jobs, and submits new jobs generated in step 2.
4. For each of the terminated jobs, a post-processing routine is executed to check and validate the consistency of the generated output. If no _SUCCESSFUL_RUN or _FAILED_RUN file is found, the related job will be resubmitted together with the current input and output folders. GEOtop is capable of restarting an interrupted calculation by inspecting the intermediate results generated in the out folder.
5. Finally, a summary table of all known jobs is printed. (To control the amount of printed information, see the -l command-line option in the Introduction to session-based scripts section.)
6. If the -C command-line option was given (see below), waits the specified number of seconds, and then goes back to step 3.
The program ggeotop exits when all jobs have run to completion, i.e., when all valid input folders have been computed.
Execution can be interrupted at any time by pressing Ctrl+C.
If the execution has been interrupted, it can be resumed at a later
stage by calling ggeotop with exactly the same
command-line options.
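The geotop.inpts excerpt shown earlier suggests a simple `Key = "value"` syntax with `!` starting a comment line. Assuming that is all there is to it (the real file format may allow more than this excerpt shows), a few lines of Python are enough to list which entries point into the in folder and which into the out folder. The function name `parse_inpts` is hypothetical.

```python
def parse_inpts(text):
    """Parse `Key = "value"` lines; skip blanks and `!` comment lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("!"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key/value line
        settings[key.strip()] = value.strip().strip('"')
    return settings


# A few lines copied from the excerpt in this document.
excerpt = '''
DemFile = "in/dem"
!==============================================
! DIST OUTPUT
!==============================================
SWEMapFile= "out/maps/SWE"
NetShortwaveRadiationMapFile="out/maps/SWnet"
'''
settings = parse_inpts(excerpt)
inputs = sorted(k for k, v in settings.items() if v.startswith("in/"))
outputs = sorted(k for k, v in settings.items() if v.startswith("out/"))
print(inputs)
print(outputs)
```

A check like this can help confirm, before submitting, that every input path stays inside the simulation directory that is transferred to the execution node.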
Command-line invocation of ggeotop¶
The ggeotop script is based on GC3Pie’s session-based script model; please read also the Introduction to session-based scripts section for an introduction to sessions and generic command-line options.
A ggeotop command-line is constructed as follows:
- Each argument (at least one should be specified) is considered as a folder reference.
- The -x option is used to specify the path to the GEOtop executable file.
Example 1. The following command-line invocation uses ggeotop to run GEOtop on all valid input folders found in the recursive scan of input_folder:
$ ggeotop -x /apps/geotop/bin/geotop_1_224_20120227_static ./input_folder
Example 2.
$ ggeotop --session SAMPLE_SESSION -w 24 -x /apps/geotop/bin/geotop_1_224_20120227_static ./input_folder
In this example, job information is stored into session SAMPLE_SESSION
(see the documentation of the --session option in Introduction to
session-based scripts). The command above creates the jobs,
submits them, and finally prints the following status report:
Status of jobs in the 'SAMPLE_SESSION' session: (at 10:53:46, 02/28/12)
NEW 0/50 (0.0%)
RUNNING 0/50 (0.0%)
STOPPED 0/50 (0.0%)
SUBMITTED 50/50 (100.0%)
TERMINATED 0/50 (0.0%)
TERMINATING 0/50 (0.0%)
total 50/50 (100.0%)
Calling ggeotop over and over again will result in the same jobs being monitored.
The -C option tells ggeotop to continue running until
all jobs have finished running and the output files have been
correctly retrieved. On successful completion, the command given in
Example 2 above would print:
Status of jobs in the 'SAMPLE_SESSION' session: (at 11:05:50, 02/28/12)
NEW 0/50 (0.0%)
RUNNING 0/50 (0.0%)
STOPPED 0/50 (0.0%)
SUBMITTED 0/50 (0.0%)
TERMINATED 50/50 (100.0%)
TERMINATING 0/50 (0.0%)
ok 50/50 (100.0%)
total 50/50 (100.0%)
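If you want to react to these reports from another script, the count lines are regular enough to parse. The sketch below relies only on the layout of the sample reports in this document; it is not a GC3Pie API, the printed format may differ between versions, and the function names are hypothetical.

```python
import re

# Matches lines such as "SUBMITTED 50/50 (100.0%)".
STATUS_LINE = re.compile(r"^\s*(\w+)\s+(\d+)/(\d+)\s+\(([\d.]+)%\)\s*$")


def parse_status_report(text):
    """Return {state: (count, total)} for every count line in a report."""
    counts = {}
    for line in text.splitlines():
        match = STATUS_LINE.match(line)
        if match:
            state, count, total, _percent = match.groups()
            counts[state] = (int(count), int(total))
    return counts


def all_done(counts):
    """True when every job in the session has reached TERMINATED."""
    count, total = counts.get("TERMINATED", (0, 0))
    return total > 0 and count == total
```

The header line ("Status of jobs in the ... session") does not match the pattern and is skipped, so a whole report can be passed in unchanged.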
Each job will be named after its folder name (e.g. 000002); you can
see this by passing the -l option to ggeotop. Each of
these jobs will fill the related input folder with the produced
outputs.
For each job, the set of output files is automatically retrieved and placed in the locations described below.
Output files for ggeotop¶
Upon successful completion, the output directory of each ggeotop job contains:
- the out folder, which contains everything produced during the computation of the related job.
Example usage¶
This section contains commented example sessions with ggeotop.
Manage a set of jobs from start to end¶
In typical operation, one calls ggeotop with the -C
option and lets it manage a set of jobs until completion.
So, to analyse all valid folders under input_folder, submitting
200 jobs simultaneously, each of them requesting 2GB of memory and 8
hours of wall-clock time, one can use the following
command-line invocation:
$ ggeotop -s example -C 120 -x /apps/geotop/bin/geotop_1_224_20120227_static -w 8 input_folder
The -s example option tells ggeotop to store
information about the computational jobs in the example.jobs
directory.
The -C 120 option tells ggeotop to update job state
every 120 seconds; output from finished jobs is retrieved and new jobs
are submitted at the same interval.
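When ggeotop is launched from another program, it can be convenient to assemble the command line from its parts. The helper below is a hypothetical sketch that uses only the options appearing in this document (-s, -C, -x, -w); its name and parameters are made up for illustration.

```python
import shlex


def ggeotop_command(executable, input_dirs, session=None,
                    poll_seconds=None, walltime_hours=None):
    """Assemble a ggeotop argument list from the options shown above."""
    argv = ["ggeotop"]
    if session is not None:
        argv += ["-s", session]
    if poll_seconds is not None:
        argv += ["-C", str(poll_seconds)]
    argv += ["-x", executable]
    if walltime_hours is not None:
        argv += ["-w", str(walltime_hours)]
    argv += list(input_dirs)
    return argv


argv = ggeotop_command(
    "/apps/geotop/bin/geotop_1_224_20120227_static",
    ["input_folder"],
    session="example",
    poll_seconds=120,
    walltime_hours=8,
)
# Print the command as it would be typed in a shell.
print(shlex.join(argv))
```

The resulting list can be handed to `subprocess.run(argv)` as is; keeping the arguments in a list avoids quoting problems with folder names that contain spaces.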
The above command will start by printing a status report like the following:
Status of jobs in the 'example.csv' session:
SUBMITTED 1/1 (100.0%)
It will continue printing an updated status report every 120 seconds until all valid input folders have been computed.
In GC3Pie terminology, when a job is finished and its output has been
successfully retrieved, the job is marked as TERMINATED:
Status of jobs in the 'example.csv' session:
TERMINATED 1/1 (100.0%)
Using GC3Pie utilities¶
GC3Pie comes with a set of generic utilities that can be used as a complement to the ggeotop command to better manage an entire session execution.
gkill: cancel a running job¶
To cancel a running job, you can use the command gkill. For instance, to cancel job.16, you would type the following command into the terminal:
gkill job.16
or:
gkill -s example job.16
gkill can also be used to cancel jobs in a given state:
gkill -s example -l UNKNOWN
Warning
There’s no way to undo a cancel operation! Once you have issued a gkill command, the job is deleted and it cannot be resumed. (You can still re-submit it with gresub, though.)
ginfo: accessing low-level details of a job¶
It is sometimes necessary, for debugging purposes, to print out all the details about a job; the ginfo command does just that: it prints all the details that GC3Utils knows about a single job.
For instance, to print out detailed information about job.13 in session example, you would type
ginfo -s example job.13
For a job in RUNNING or SUBMITTED state, only a little
information is known: basically, where the job is running, and when it
was started:
$ ginfo -s example job.13
job.13
cores: 2
execution_targets: hera.wsl.ch
log:
SUBMITTED at Tue May 15 09:52:05 2012
Submitted to 'wsl' at Tue May 15 09:52:05 2012
RUNNING at Tue May 15 10:07:39 2012
lrms_jobid: gsiftp://hera.wsl.ch:2811/jobs/116613370683251353308673
lrms_jobname: GC3Pie_00002
original_exitcode: -1
queue: smscg.q
resource_name: wsl
state_last_changed: 1337069259.18
stderr_filename: ggeotop.log
stdout_filename: ggeotop.log
timestamp:
RUNNING: 1337069259.18
SUBMITTED: 1337068325.26
unknown_iteration: 0
used_cputime: 1380
used_memory: 3382706
If you omit the job number, information about all jobs in the session will be printed.
Most of the output is only useful if you are familiar with the inner workings of GC3Utils. Nonetheless, ginfo output is definitely something you should include in any report about a misbehaving job!
For a finished job, the information is more complete and can include error messages in case the job has failed:
$ ginfo -c -s example job.13
job.13
_arc0_state_last_checked: 1337069259.18
_exitcode: 0
_signal: None
_state: TERMINATED
cores: 2
download_dir: /data/geotop/results/00002
execution_targets: hera.wsl.ch
log:
SUBMITTED at Tue May 15 09:52:04 2012
Submitted to 'wsl' at Tue May 15 09:52:04 2012
TERMINATING at Tue May 15 10:07:39 2012
Final output downloaded to '/data/geotop/results/00002'
TERMINATED at Tue May 15 10:07:43 2012
lrms_jobid: gsiftp://hera.wsl.ch:2811/jobs/11441337068324584585032
lrms_jobname: GC3Pie_00002
original_exitcode: 0
queue: smscg.q
resource_name: wsl
state_last_changed: 1337069263.13
stderr_filename: ggeotop.log
stdout_filename: ggeotop.log
timestamp:
SUBMITTED: 1337068324.87
TERMINATED: 1337069263.13
TERMINATING: 1337069259.18
unknown_iteration: 0
used_cputime: 360
used_memory: 3366977
used_walltime: 300
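The numeric values under timestamp: look like seconds since the Unix epoch (an assumption, although it agrees with the human-readable times in the log: entry). Under that assumption, the time a job spent between two states is a simple subtraction, as this short sketch shows; the helper names are hypothetical.

```python
from datetime import datetime, timezone

# Values copied from the `timestamp:` entries in the ginfo output above.
timestamps = {
    "SUBMITTED": 1337068324.87,
    "TERMINATING": 1337069259.18,
    "TERMINATED": 1337069263.13,
}


def elapsed_seconds(stamps, start, end):
    """Seconds spent between two recorded state changes."""
    return stamps[end] - stamps[start]


def as_utc(stamp):
    """Interpret a timestamp as seconds since the Unix epoch (assumption)."""
    return datetime.fromtimestamp(stamp, tz=timezone.utc)


total = elapsed_seconds(timestamps, "SUBMITTED", "TERMINATED")
print(f"submission to completion: {total:.2f} s")
print(as_utc(timestamps["SUBMITTED"]).isoformat(timespec="seconds"))
```

For this job the difference is about 938 seconds, which matches the roughly 15 minutes and 39 seconds between the SUBMITTED and TERMINATED lines of the log. The log times are local times, so they are offset from the UTC value printed here.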
With option -v, ginfo output is even more verbose and complete,
and includes information about the application itself, the input and
output files, plus some backend-specific information:
$ ginfo -v -c -s example job.13
job.13
arguments: 00002
changed: False
environment:
executable: geotop_static
executables: geotop_static
execution:
_arc0_state_last_checked: 1337069259.18
_exitcode: 0
_signal: None
_state: TERMINATED
cores: 2
download_dir: /data/geotop/results/00002
execution_targets: hera.wsl.ch
log:
SUBMITTED at Tue May 15 09:52:04 2012
Submitted to 'wsl' at Tue May 15 09:52:04 2012
TERMINATING at Tue May 15 10:07:39 2012
Final output downloaded to '/data/geotop/results/00002'
TERMINATED at Tue May 15 10:07:43 2012
lrms_jobid: gsiftp://hera.wsl.ch:2811/jobs/11441337068324584585032
lrms_jobname: GC3Pie_00002
original_exitcode: 0
queue: smscg.q
resource_name: wsl
state_last_changed: 1337069263.13
stderr_filename: ggeotop.log
stdout_filename: ggeotop.log
timestamp:
SUBMITTED: 1337068324.87
TERMINATED: 1337069263.13
TERMINATING: 1337069259.18
unknown_iteration: 0
used_cputime: 360
used_memory: 3366977
used_walltime: 300
jobname: GC3Pie_00002
join: True
output_base_url: None
output_dir: /data/geotop/results/00002
outputs:
@output.list: file, , @output.list, None, None, None, None
ggeotop.log: file, , ggeotop.log, None, None, None, None
persistent_id: job.1698503
requested_architecture: x86_64
requested_cores: 2
requested_memory: 4
requested_walltime: 4
stderr: None
stdin: None
stdout: ggeotop.log
tags: APPS/EARTH/GEOTOP