Introduction to session-based scripts

All GC3Apps scripts derive their core functionality from a common blueprint, named a session-based script. The purpose of this section is to describe this common functionality; script-specific sections detail the scope and options that are unique to a given script. Readers interested in Python programming can find the complete documentation about the session-based script API in the SessionBasedScript section.

The functioning of GC3Apps scripts revolves around a so-called session. A session is just a named collection of jobs. For instance, you could group into a single session jobs that analyze a set of related files.

Each time it is run, a GC3Apps script performs the following steps:

  1. Reads the session directory and loads all stored jobs into memory. If the session directory does not exist, one will be created with empty contents.

  2. Scans the command-line input arguments: if existing jobs do not suffice to analyze the input data, new jobs are added to the session.

  3. The status of all existing jobs is updated, output from finished jobs is collected, and new jobs are submitted.

    Finally, a summary table of all known jobs is printed. (To control the amount of printed information, see the -l command-line option below.)

  4. If the -C command-line option was given (see below), waits the specified amount of seconds, and then goes back to step 3.

    Execution can be interrupted at any time by pressing Ctrl+C.

Basic command-line usage and options

The exact command-line usage of session-based scripts varies from one script to the other, so please consult the documentation page for your application. There are quite a number of common options and behaviors, however, which are described here.

Continuous execution

While single-pass execution of a GC3Apps script is possible (and sometimes used), it is much more common to keep the script running and let it manage jobs until all are finished. This is accomplished with the following command-line option:

-C NUM, --continuous NUM
 

Keep running, monitoring jobs and possibly submitting new ones or fetching results every NUM seconds.

When all jobs are finished, a GC3Apps script exits even if the -C option is given.

Verbose listing of jobs

Only a summary of job states is printed by default at the end of step 3., together with the count of jobs that are in the specified state. Use the -l option (see below) to get a detailed listing of all jobs.

-l STATE, --state STATE
 

Print a table of jobs including their status.

The STATE argument restricts output to jobs in that particular state. It can be a single state word (e.g., RUNNING) or a comma-separated list thereof (e.g., NEW,SUBMITTED,RUNNING).

The pseudo-states ok and failed are also allowed for selecting jobs in TERMINATED state with exit code (respectively) 0 or nonzero.

If STATE is omitted, no restriction is placed on job states, and a table of all jobs is printed.

Maximum number of concurrent jobs

There is a maximum number of jobs that can be in SUBMITTED or RUNNING state at a given time. GC3Apps scripts will delay submission of newly-created jobs so that this limit is never exceeded. The default limit is 50, but it can be changed with the following command-line option:

-J NUM, --max-running NUM
 Set the maximum NUMber of jobs (default: 50) in SUBMITTED or RUNNING state.

Location of output files

By default, output files are placed in the same directory where the corresponding input file resides. This can be changed with the following option; it is also possible to specify output locations that vary depending on certain job features.

-o DIRECTORY, --output DIRECTORY
 Output files from all jobs will be collected in the specified DIRECTORY path. If the destination directory does not exist, it is created.

Job control options

These command-line options control the requirements and constraints of new jobs. Indeed, note that changing the arguments to these options does not change the corresponding requirements on jobs that already exist in the session.

-c NUM, --cpu-cores NUM
 Set the number of CPU cores required for each job (default: 1). NUM must be a whole number.
-m GIGABYTES, --memory-per-core GIGABYTES
 Set the amount of memory required per execution core; (Default: 2GB). Specify this as an integral number followed by a unit, e.g. ‘512MB’ or ‘4GB’. Valid unit names are: ‘B’, ‘GB’, ‘GiB’, ‘KiB’, ‘MB’, ‘MiB’, ‘PB’, ‘PiB’, ‘TB’, ‘TiB’, ‘kB’.
-r NAME, --resource NAME
 Submit jobs to a specific resource. NAME is a reource name or comma-separated list of resource names. Use the command gservers to list available resources.
-w DURATION, --wall-clock-time DURATION
 Set the time limit for each job; default is ‘8 hours’. Jobs exceeding this limit will be stopped and considered as ‘failed’. The duration can be expressed as a whole number followed by a time unit, e.g., ‘3600 s’, ‘60 minutes’, ‘8 hours’, or a combination thereof, e.g., ‘2hours 30minutes’. Valid unit names are: ‘d’, ‘day’, ‘days’, ‘h’, ‘hour’, ‘hours’, ‘hr’, ‘hrs’, ‘m’, ‘microsec’, ‘microseconds’, ‘min’, ‘mins’, ‘minute’, ‘minutes’, ‘ms’, ‘nanosec’, ‘nanoseconds’, ‘ns’, ‘s’, ‘sec’, ‘second’, ‘seconds’, ‘secs’.

Session control options

This set of options control the placement and contents of the session.

-s PATH, --session PATH
 

Store the session information in the directory at PATH. (By default, this is a subdirectory of the current directory, named after the script you are executing.)

If PATH is an existing directory, it will be used for storing job information, and an index file (with suffix .csv) will be created in it. Otherwise, the job information will be stored in a directory named after PATH with a suffix .jobs appended, and the index file will be named after PATH with a suffix .csv added.

-N, --new-session
 Discard any information saved in the session directory (see the --session option) and start a new session afresh. Any information about jobs previously recorded in the session is lost.
-u, --store-url URL
 

Store GC3Pie job information at the persistent storage specified by URL. The URL can be any form that is understood by the gc3libs.persistence.make_store() function (which see for details). A few examples:

  • sqlite – the jobs are stored in a SQLite3 database named jobs.db and contained in the session directory.
  • /path/to/a/directory – the jobs are stored in the given directory, one file per job (this is the default format used by GC3Pie)
  • sqlite:////path/to/a/file.db – the jobs are stored in the given SQLite3 database file.
  • mysql://user,passwd@server/dbname – jobs are stored in table store of the specified MySQL database. The DB server and connection credentials (username, password) are also part of the URL.

If this option is omitted, GC3Pie’s SessionBasedScript defaults to storing jobs in the subdirectory jobs of the session directory; each job is saved in a separate file.

Exit code

A GC3Apps script exits when all jobs are finished, when some error occurred that prevented the script from completing, or when a user interrupts it with Ctrl+C

In any case, the exit code of GC3Apps scripts tracks job status (in the following sense). The exitcode is a bitfield; the 4 least-significant bits are assigned a meaning according to the following table:

Bit Meaning
0 Set if a fatal error occurred: the script could not complete
1 Set if there are jobs in FAILED state
2 Set if there are jobs in RUNNING or SUBMITTED state
3 Set if there are jobs in NEW state

This boils down to the following rules:

  • exitcode is 0: all jobs are DONE, no further action will be taken by the script (which exists immediately if called again on the same session).
  • exitcode is 1: an error interrupted the script execution.
  • exitcode is 2: all jobs finished, but some are in FAILED state.
  • exitcode > 3: run the script again to make jobs progress.