The gcrypto script¶
GC3Apps provide a script drive execution of multiple gnfs-cmd
jobs
each of them with a different parameter set. Allotogehter they form a
single crypto simulation of a large parameter space.
It uses the generic gc3libs.cmdline.SessionBasedScript framework.
The purpose of gcrypto is to execute several concurrent
runs of gnfs-cmd
on a parameter set. These runs are performed in
parallel using every available GC3Pie resource; you can of
course control how many runs should be executed and select what output
files you want from each one.
Introduction¶
Like in a for-loop, the gcrypto driver script takes as input three mandatory arguments:
- RANGE_START: initial value of the range (e.g., 800000000)
- RANGE_END: final value of the range (e.g., 1200000000)
- SLICE: extent of the range that will be examined by a single job (e.g., 1000)
For example:
# gcrypto 800000000 1200000000 1000
will produce 400000 jobs; the first job will perform calculations on the range 800000000 to 800000000+1000, the 2nd one will do the range 800001000 to 800002000, and so on.
Inputfile archive location (e.g. lfc://lfc.smscg.ch/crypto/lacal/input.tgz) can be specified with the ‘-i’ option. Otherwise a default filename ‘input.tgz’ will be searched in current directory.
Job progress is monitored and, when a job is done,
output is retrieved back to submitting host in folders named:
RANGE_START + (SLICE * ACTUAL_STEP)
Where ACTUAL_STEP
correspond to the position of the job in the
overall execution.
The gcrypto command keeps a record of jobs (submitted, executed and pending) in a session file (set name with the ‘-s’ option); at each invocation of the command, the status of all recorded jobs is updated, output from finished jobs is collected, and a summary table of all known jobs is printed. New jobs are added to the session if new input files are added to the command line.
Options can specify a maximum number of jobs that should be in ‘SUBMITTED’ or ‘RUNNING’ state; gcrypto will delay submission of newly-created jobs so that this limit is never exceeded.
The gcrypto execute several runs of gnfs-cmd
on a
parameter set, and collect the generated output. These runs are
performed in parallel, up to a limit that can be configured with the
-J
command-line option. You can of course control how
many runs should be executed and select what output files you want
from each one.
In more detail, gcrypto does the following:
Reads the session (specified on the command line with the
--session
option) and loads all stored jobs into memory. If the session directory does not exist, one will be created with empty contents.Divide the initial parameter range, given in the command-line, into chunks taking the
-J
value as a reference. So from a coomand line argument like the following:$ gcrypto 800000000 1200000000 1000 -J 200
gcrypto will generate an initial chunks of 200 jobs starting from the initial range 800000000 incrementing of 1000. All jobs will run
gnfs-cmd
on a specific parameter set (e.g. 800000000, 800001000, 800002000, …). gcrypto will keep constant the number of simulatenous jobs running retrieving those terminated and submitting new ones untill the whole parameter range has been computed.Updates the state of all existing jobs, collects output from finished jobs, and submits new jobs generated in step 2.
Finally, a summary table of all known jobs is printed. (To control the amount of printed information, see the
-l
command-line option in the Introduction to session-based scripts section.)If the
-C
command-line option was given (see below), waits the specified amount of seconds, and then goes back to step 3.The program gcrypto exits when all jobs have run to completion, i.e., when the whole paramenter range has been computed.
Execution can be interrupted at any time by pressing Ctrl+C. If the execution has been interrupted, it can be resumed at a later stage by calling gcrypto with exactly the same command-line options.
gcrypto requires a number of default input files common to every submited job. This list of input files is automatically fetched by gcrypto from a default storage repository. Those files are:
gnfs-lasieve6
M1019
M1019.st
M1037
M1037.st
M1051
M1051.st
M1067
M1067.st
M1069
M1069.st
M1093
M1093.st
M1109
M1109.st
M1117
M1117.st
M1123
M1123.st
M1147
M1147.st
M1171
M1171.st
M8e_1200
M8e_1500
M8e_200
M8e_2000
M8e_2500
M8e_300
M8e_3000
M8e_400
M8e_4200
M8e_600
M8e_800
tdsievemt
When gcrypto has to be executed with a different set of input
files, an additional command line argument --input-files
could be
used to specify the locatin of a tar.gz
archive containing the
input files that gnfs-cmd
will expect. Similarly, when a different
version of gnfs-cmd command needs to be used, the command line
argument --gnfs-cmd
could be used to specify the location of the
gnfs-cmd
to be used.
Command-line invocation of gcrypto¶
The gcrypto script is based on GC3Pie’s session-based script model; please read also the Introduction to session-based scripts section for an introduction to sessions and generic command-line options.
A gcrypto command-line is constructed as follows: Like a for-loop, the gcrypto driver script takes as input three mandatory arguments:
- RANGE_START: initial value of the range (e.g., 800000000)
- RANGE_END: final value of the range (e.g., 1200000000)
- SLICE: extent of the range that will be examined by a single job (e.g., 1000)
Example 1. The following command-line invocation uses
gcrypto to run gnfs-cmd
on the parameter set ranging from
800000000 to 1200000000 with an increment of 1000.
$ gcrypto 800000000 1200000000 1000
In this case gcrypto will use the default values for
determine the chunks size from the default value of the -J
option (default value is 50 simulatenous jobs).
Example 2.
$ gcrypto --session SAMPLE_SESSION -c 4 -w 4 -m 8 800000000 1200000000 1000
In this example, job information is stored into session
SAMPLE_SESSION
(see the documentation of the --session
option
in Introduction to session-based scripts). The command above creates the jobs,
submits them, and finally prints the following status report:
Status of jobs in the 'SAMPLE_SESSION' session: (at 10:53:46, 02/28/12)
NEW 0/50 (0.0%)
RUNNING 0/50 (0.0%)
STOPPED 0/50 (0.0%)
SUBMITTED 50/50 (100.0%)
TERMINATED 0/50 (0.0%)
TERMINATING 0/50 (0.0%)
total 50/50 (100.0%)
Note that the status report counts the number of jobs in the session, not the total number of jobs that would correspond to the whole parameter range. (Feel free to report this as a bug.)
Calling gcrypto over and over again will result in the same jobs being monitored;
The -C
option tells gcrypto to continue running until
all jobs have finished running and the output files have been
correctly retrieved. On successful completion, the command given in
example 2. above, would print:
Status of jobs in the 'SAMPLE_SESSION' session: (at 11:05:50, 02/28/12)
NEW 0/400k (0.0%)
RUNNING 0/400k (0.0%)
STOPPED 0/400k (0.0%)
SUBMITTED 0/400k (0.0%)
TERMINATED 50/400k (100.0%)
TERMINATING 0/400k (0.0%)
ok 400k/400k (100.0%)
total 400k/400k (100.0%)
Each job will be named after the parameter range it has computed (e.g.
800001000, 800002000, … ) (you could see this by passing
the -l
option to gcrypto); each of these jobs will
create an output directory named after the job.
For each job, the set of output files is automatically retrieved and placed in the locations described below.
Output files for gcrypto¶
Upon successful completion, the output directory of each gcrypto job contains:
- a number of
.tgz
files each of them correspondin to a step within the execution of thegnfs-cmd
command. - A log file named
gcrypto.log
containing both the stdout and the stderr of thegnfs-cmd
execution.
Note
The number of .tgz
files may depend on whether the
execution of the gnfs-cmd
command has completed or not
(e.g. jobs may be killed by the batch system when exausting
requested resources)
Example usage¶
This section contains commented example sessions with gcrypto.
Manage a set of jobs from start to end¶
In typical operation, one calls gcrypto with the -C
option and lets it manage a set of jobs until completion.
So, to compute a whole parameter range from 800000000 to 1200000000 with an increment of 1000, submitting 200 jobs simultaneously each of them requesting 4 computing cores, 8GB of memory and 4 hours of wall-clock time, one can use the following command-line invocation:
$ gcrypto -s example -C 120 -J 200 -c 4 -w 4 -m 8 800000000 1200000000 1000
The -s example
option tells gcrypto to store
information about the computational jobs in the example.jobs
directory.
The -C 120
option tells gcrypto to update job state
every 120 seconds; output from finished jobs is retrieved and new jobs
are submitted at the same interval.
The above command will start by printing a status report like the following:
Status of jobs in the 'example.csv' session:
SUBMITTED 1/1 (100.0%)
It will continue printing an updated status report every 120 seconds until the requested parameter range has been computed.
In GC3Pie terminology when a job is finished and its output has been
successfully retrieved, the job is marked as TERMINATED
:
Status of jobs in the 'example.csv' session:
TERMINATED 1/1 (100.0%)
Using GC3Pie utilities¶
GC3Pie comes with a set of generic utilities that could be used as a complemet to the gcrypto command to better manage a entire session execution.
gkill: cancel a running job¶
To cancel a running job, you can use the command gkill. For instance, to cancel job.16, you would type the following command into the terminal:
gkill job.16
or:
gkill -s example job.16
gkill could also be used to cancel jobs in a given state
gkill -s example -l UNKNOWN
Warning
There’s no way to undo a cancel operation! Once you have issued a gkill command, the job is deleted and it cannot be resumed. (You can still re-submit it with gresub, though.)
ginfo: accessing low-level details of a job¶
It is sometimes necessary, for debugging purposes, to print out all the details about a job; the ginfo command does just that: prints all the details that GC3Utils know about a single job.
For instance, to print out detailed information about job.13 in session example, you would type
ginfo -s example job.13
For a job in RUNNING
or SUBMITTED
state, only little
information is known: basically, where the job is running, and when it
was started:
$ ginfo -s example job.13
job.13
cores: 2
execution_targets: hera.wsl.ch
log:
SUBMITTED at Tue May 15 09:52:05 2012
Submitted to 'wsl' at Tue May 15 09:52:05 2012
RUNNING at Tue May 15 10:07:39 2012
lrms_jobid: gsiftp://hera.wsl.ch:2811/jobs/116613370683251353308673
lrms_jobname: LACAL_800001000
original_exitcode: -1
queue: smscg.q
resource_name: wsl
state_last_changed: 1337069259.18
stderr_filename: gcrypto.log
stdout_filename: gcrypto.log
timestamp:
RUNNING: 1337069259.18
SUBMITTED: 1337068325.26
unknown_iteration: 0
used_cputime: 1380
used_memory: 3382706
If you omit the job number, information about all jobs in the session will be printed.
Most of the output is only useful if you are familiar with GC3Utils inner working. Nonetheless, ginfo output is definitely something you should include in any report about a misbehaving job!
For a finished job, the information is more complete and can include error messages in case the job has failed:
$ ginfo -c -s example job.13
job.13
_arc0_state_last_checked: 1337069259.18
_exitcode: 0
_signal: None
_state: TERMINATED
cores: 2
download_dir: /data/crypto/results/example.out/8000001000
execution_targets: hera.wsl.ch
log:
SUBMITTED at Tue May 15 09:52:04 2012
Submitted to 'wsl' at Tue May 15 09:52:04 2012
TERMINATING at Tue May 15 10:07:39 2012
Final output downloaded to '/data/crypto/results/example.out/8000001000'
TERMINATED at Tue May 15 10:07:43 2012
lrms_jobid: gsiftp://hera.wsl.ch:2811/jobs/11441337068324584585032
lrms_jobname: LACAL_800001000
original_exitcode: 0
queue: smscg.q
resource_name: wsl
state_last_changed: 1337069263.13
stderr_filename: gcrypto.log
stdout_filename: gcrypto.log
timestamp:
SUBMITTED: 1337068324.87
TERMINATED: 1337069263.13
TERMINATING: 1337069259.18
unknown_iteration: 0
used_cputime: 360
used_memory: 3366977
used_walltime: 300
With option -v
, ginfo output is even more verbose and complete,
and includes information about the application itself, the input and
output files, plus some backend-specific information:
$ ginfo -c -s example job.13
job.13
arguments: 800000800, 100, 2, input.tgz
changed: False
environment:
executable: gnfs-cmd
executables: gnfs-cmd
execution:
_arc0_state_last_checked: 1337069259.18
_exitcode: 0
_signal: None
_state: TERMINATED
cores: 2
download_dir: /data/crypto/results/example.out/8000001000
execution_targets: hera.wsl.ch
log:
SUBMITTED at Tue May 15 09:52:04 2012
Submitted to 'wsl' at Tue May 15 09:52:04 2012
TERMINATING at Tue May 15 10:07:39 2012
Final output downloaded to '/data/crypto/results/example.out/8000001000'
TERMINATED at Tue May 15 10:07:43 2012
lrms_jobid: gsiftp://hera.wsl.ch:2811/jobs/11441337068324584585032
lrms_jobname: LACAL_800001000
original_exitcode: 0
queue: smscg.q
resource_name: wsl
state_last_changed: 1337069263.13
stderr_filename: gcrypto.log
stdout_filename: gcrypto.log
timestamp:
SUBMITTED: 1337068324.87
TERMINATED: 1337069263.13
TERMINATING: 1337069259.18
unknown_iteration: 0
used_cputime: 360
used_memory: 3366977
used_walltime: 300
inputs:
srm://dpm.lhep.unibe.ch/dpm/lhep.unibe.ch/home/crypto/gnfs-cmd_20120406: gnfs-cmd
srm://dpm.lhep.unibe.ch/dpm/lhep.unibe.ch/home/crypto/lacal_input_files.tgz: input.tgz
jobname: LACAL_800000900
join: True
output_base_url: None
output_dir: /data/crypto/results/example.out/8000001000
outputs:
@output.list: file, , @output.list, None, None, None, None
gcrypto.log: file, , gcrypto.log, None, None, None, None
persistent_id: job.1698503
requested_architecture: x86_64
requested_cores: 2
requested_memory: 4
requested_walltime: 4
stderr: None
stdin: None
stdout: gcrypto.log
tags: APPS/CRYPTO/LACAL-1.0