bcftbx.JobRunner
Classes for starting, stopping and managing jobs.
Class BaseJobRunner is a template with methods that need to be implemented by subclasses. The subclasses implemented here are:
- SimpleJobRunner: run jobs (e.g. scripts) on a local file system.
- GEJobRunner : run jobs using Grid Engine (GE) i.e. qsub, qdel etc
- DRMAAJobRunner : run jobs using the DRMAA interface to Grid Engine
A single JobRunner instance can be used to start and manage multiple processes.
Each job is started by invoking the ‘run’ method of the runner. This returns an id string which is then used in calls to the ‘isRunning’, ‘terminate’ etc methods to check on and control the job.
The runner’s ‘list’ method returns a list of running job ids.
Simple usage example:
>>> # Create a JobRunner instance
>>> runner = SimpleJobRunner()
>>> # Start a job using the runner and collect its id
>>> job_id = runner.run('Example',None,'myscript.sh',[])
>>> # Wait for job to complete
>>> import time
>>> while runner.isRunning(job_id):
...     time.sleep(10)
>>> # Get the names of the output files
>>> log,err = (runner.logFile(job_id),runner.errFile(job_id))
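The polling loop above waits indefinitely. A minimal sketch of a helper that gives up after a timeout, assuming only the documented `isRunning` method; the name `wait_for_job` and the parameters `poll_interval` and `timeout` are hypothetical and not part of bcftbx:

```python
import time


def wait_for_job(runner, job_id, poll_interval=10.0, timeout=None):
    """Poll runner.isRunning() until the job stops or the timeout expires.

    Hypothetical helper, not part of bcftbx. Returns True if the job
    finished, False if the timeout (in seconds) was reached first.
    """
    start = time.monotonic()
    while runner.isRunning(job_id):
        if timeout is not None and time.monotonic() - start > timeout:
            return False
        time.sleep(poll_interval)
    return True
```

Once it returns True, the runner's `exit_status`, `logFile` and `errFile` methods can be queried as in the example above.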
-
class
bcftbx.JobRunner.
BaseJobRunner
¶ Base class for implementing job runners
This class can be used as a template for implementing custom job runners. The idea is that the runners wrap the specifics of interacting with an underlying job control system and thus provide a generic interface to be used by higher level classes.
A job runner needs to implement the following methods:
- run : starts a job running
- terminate : kills a running job
- list : lists the running job ids
- logFile : returns the name of the log file for a job
- errFile : returns the name of the error file for a job
- exit_status: returns the exit status for the command (or None if the job is still running)
Optionally it can also implement the following methods, if the default implementations are not sufficient:
- errorState: indicates if running job is in an “error state”
- isRunning : checks if a specific job is running
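As an illustration of the methods listed above, here is a minimal sketch of a custom runner built on `subprocess.Popen`. The class name `PopenJobRunner` and the `.log`/`.err` file naming are assumptions made for this sketch. In real use the class would derive from `BaseJobRunner`; it is written standalone here so that it runs without bcftbx installed.

```python
import os
import subprocess


class PopenJobRunner:
    """Hypothetical runner that starts each job as a local child process."""

    def __init__(self):
        # job id -> (Popen object, log file name, error file name)
        self._jobs = {}

    def run(self, name, working_dir, script, args):
        """Start a job; return its id, or None if it failed to start."""
        cwd = working_dir if working_dir is not None else os.getcwd()
        # File naming is an assumption made for this sketch
        log, err = "%s.log" % name, "%s.err" % name
        try:
            with open(os.path.join(cwd, log), "w") as out, \
                 open(os.path.join(cwd, err), "w") as errout:
                proc = subprocess.Popen([script] + [str(a) for a in args],
                                        cwd=cwd, stdout=out, stderr=errout)
        except OSError:
            return None
        job_id = str(proc.pid)
        self._jobs[job_id] = (proc, log, err)
        return job_id

    def isRunning(self, job_id):
        return self._jobs[job_id][0].poll() is None

    def list(self):
        return [job_id for job_id in self._jobs if self.isRunning(job_id)]

    def terminate(self, job_id):
        proc = self._jobs[job_id][0]
        if proc.poll() is None:
            proc.kill()
            proc.wait()
        return True

    def exit_status(self, job_id):
        # poll() returns None while the process is still running
        return self._jobs[job_id][0].poll()

    def logFile(self, job_id):
        return self._jobs[job_id][1]

    def errFile(self, job_id):
        return self._jobs[job_id][2]
```

Higher level code only calls the generic methods, so a runner like this could in principle be swapped for a cluster-based one without changes to the caller.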
-
errFile
(job_id)¶ Return name of error file relative to working directory
-
errorState
(job_id)¶ Check if the job is in an error state
Return True if the job is deemed to be in an ‘error state’, False otherwise.
-
exit_status
(job_id)¶ Return the exit status code for the command
Return the exit status code from the command that was run by the specified job, or None if the job hasn’t exited yet.
-
isRunning
(job_id)¶ Check if a job is running
Returns True if job is still running, False if not
-
list
()¶ Return a list of running job_ids
-
logFile
(job_id)¶ Return name of log file relative to working directory
-
log_dir
¶ Return the current log directory setting
-
run
(name, working_dir, script, args)¶ Start a job running
- Arguments:
- name: Name to give the job
- working_dir: Directory to run the job in
- script: Script file to run
- args: List of arguments to supply to the script
- Returns:
- Returns a job id, or None if the job failed to start
-
set_log_dir
(log_dir)¶ (Re)set the directory to write log files to
-
terminate
(job_id)¶ Terminate a job
Returns True if termination was successful, False otherwise
-
-
class
bcftbx.JobRunner.
DRMAAJobRunner
(queue=None)¶ Class implementing job runner using DRMAA
DRMAAJobRunner submits jobs to a Grid Engine cluster using the Python interface to Distributed Resource Management Application API (DRMAA), as an alternative to the GEJobRunner.
The DRMAAJobRunner requires:
- the drmaa libraries (e.g. libdrmaa.so), pointed to by the environment variable DRMAA_LIBRARY_PATH
- the Python drmaa library, see http://code.google.com/p/drmaa-python/
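Before creating a DRMAAJobRunner it can be useful to confirm that the first requirement is met. A minimal sketch, assuming DRMAA_LIBRARY_PATH holds the path of the shared library file itself; the function name `drmaa_library_configured` is hypothetical and not part of bcftbx:

```python
import os


def drmaa_library_configured(environ=None):
    """Return True if DRMAA_LIBRARY_PATH is set and names an existing file.

    Hypothetical helper: 'environ' defaults to os.environ and exists
    only to make the check easy to exercise in isolation.
    """
    env = os.environ if environ is None else environ
    path = env.get("DRMAA_LIBRARY_PATH")
    return bool(path) and os.path.isfile(path)
```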
-
errFile
(job_id)¶ Return the error file name for a job
The name should be ‘<name>.e<job_id>’
-
errorState
(job_id)¶ Check if the job is in an error state
Return True if the job is deemed to be in an ‘error state’, False otherwise.
-
list
()¶ Get list of job ids in the queue.
-
logFile
(job_id)¶ Return the log file name for a job
The name should be ‘<name>.o<job_id>’
-
queue
(job_id)¶ Fetch the job queue name
Returns the queue as reported by qstat, or None if not found.
-
run
(name, working_dir, script, args)¶ Submit a script or command to the cluster via DRMAA
- Arguments:
- name: Name to give the job
- working_dir: Directory to run the job in
- script: Script file to run
- args: List of arguments to supply to the script
- Returns:
- Job id for submitted job, or ‘None’ if job failed to start.
-
terminate
(job_id)¶ Remove a job from the GE queue
-
class
bcftbx.JobRunner.
GEJobRunner
(queue=None, log_dir=None, ge_extra_args=None, poll_interval=5.0, timeout=30.0)¶ Class implementing job runner for Grid Engine
GEJobRunner submits jobs to a Grid Engine cluster using the ‘qsub’ command, determines the status of jobs using ‘qstat’ and terminates them using ‘qdel’.
Additionally the runner can be configured for a specific GE queue on initialisation.
Each GEJobRunner instance creates a temporary directory which it uses for internal admin; this will be removed at program exit via ‘atexit’.
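The cleanup pattern described here can be sketched with the standard library alone. This is an illustration of the general technique, not the GEJobRunner implementation; the function name `make_admin_dir` and the directory prefix are assumptions:

```python
import atexit
import shutil
import tempfile


def make_admin_dir(prefix="runner_admin_"):
    """Create a temporary directory that is removed at interpreter exit."""
    admin_dir = tempfile.mkdtemp(prefix=prefix)
    # ignore_errors=True: the directory may already have been removed by hand
    atexit.register(shutil.rmtree, admin_dir, ignore_errors=True)
    return admin_dir
```

Note that handlers registered with `atexit` run on normal interpreter exit, but not if the process is killed by an unhandled signal.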
-
errFile
(job_id)¶ Return the error file name for a job
The name should be ‘<name>.e<job_id>’
-
errorState
(job_id)¶ Check if the job is in an error state
Return True if the job is deemed to be in an ‘error state’ (i.e. qstat returns the state as ‘E..’), False otherwise.
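The rule above amounts to a prefix test on the state reported by qstat. A minimal sketch, assuming the state has already been extracted as a string such as 'Eqw' (how the runner obtains it from qstat is not shown here); the function name `is_error_state` is hypothetical:

```python
def is_error_state(state):
    """Return True if a qstat state string begins with 'E'.

    'state' is None when the job was not found in the qstat output.
    """
    return state is not None and state.startswith("E")
```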
-
exit_status
(job_id)¶ Return exit status from command run by a job
If the job is still running then returns ‘None’.
-
ge_extra_args
¶ Return the extra GE arguments
-
list
()¶ Get list of job ids which are queued or running
-
logFile
(job_id)¶ Return the log file name for a job
The name should be ‘<name>.o<job_id>’
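The naming pattern used for the log and error files can be written out as a small helper. This is a sketch of the ‘<name>.o<job_id>’ and ‘<name>.e<job_id>’ patterns only; the function name `ge_output_names` is hypothetical and not part of bcftbx:

```python
def ge_output_names(name, job_id):
    """Return the (log, error) file names for a job name and job id."""
    return ("%s.o%s" % (name, job_id), "%s.e%s" % (name, job_id))
```

For example, a job named 'align' with id 1234 would give 'align.o1234' and 'align.e1234'.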
-
name
(job_id)¶ Return the name for a job
-
queue
(job_id)¶ Fetch the job queue name
Returns the queue as reported by qstat, or None if not found.
-
run
(name, working_dir, script, args)¶ Submit a script or command to the cluster via ‘qsub’
- Arguments:
- name: Name to give the job
- working_dir: Directory to run the job in
- script: Script file to run
- args: List of arguments to supply to the script
- Returns:
- Job id for submitted job, or ‘None’ if job failed to start.
-
terminate
(job_id)¶ Remove a job from the GE queue using ‘qdel’
-
-
class
bcftbx.JobRunner.
SimpleJobRunner
(log_dir=None, join_logs=False)¶ Class implementing job runner for local system
SimpleJobRunner starts jobs as processes on a local system; the status of jobs is determined using the Linux ‘ps eu’ command, and jobs are terminated using ‘kill -9’.
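The start-and-kill cycle described above can be illustrated with the standard library. This is a sketch of the underlying technique on a POSIX system, not the SimpleJobRunner code itself; the child command is an arbitrary placeholder:

```python
import os
import signal
import subprocess
import sys

# Start a long-running child process; its PID plays the role of the job id
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
job_id = proc.pid

# Equivalent of 'kill -9 <pid>' (POSIX only)
os.kill(job_id, signal.SIGKILL)

# Reap the child; on POSIX a negative return code -N means "killed by signal N"
exit_status = proc.wait()
```

Because SIGKILL cannot be caught, the job gets no chance to clean up after itself, so any partial output files are left in place.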
-
errFile
(job_id)¶ Return the error file name for a job
-
exit_status
(job_id)¶ Return exit status from command run by a job
-
list
()¶ Return a list of running job_ids
-
logFile
(job_id)¶ Return the log file name for a job
-
name
(job_id)¶ Return the name for a job
-
run
(name, working_dir, script, args)¶ Run a command and return the PID (=job id)
- Arguments:
- name: Name to give the job
- working_dir: Directory to run the job in
- script: Script file to run
- args: List of arguments to supply to the script
- Returns:
- Job id for submitted job, or ‘None’ if job failed to start.
-
terminate
(job_id)¶ Kill a running job using ‘kill -9’
-
-
bcftbx.JobRunner.
fetch_runner
(definition)¶ Return job runner instance based on a definition string
Given a definition string, returns an appropriate runner instance.
Definitions are of the form:
RunnerName[(args)]
RunnerName can be ‘SimpleJobRunner’ or ‘GEJobRunner’. If ‘(args)’ are also supplied then these are passed to the job runner on instantiation (only works for GE runners).
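To make the definition format concrete, here is a minimal sketch of how such a string could be split into its two parts. It is not the fetch_runner implementation; the function name `parse_runner_definition` is hypothetical, and how the runner interprets the text inside the parentheses is not specified here.

```python
import re

# A runner name, optionally followed by arguments in parentheses
_DEFINITION = re.compile(r"^(?P<name>\w+)(?:\((?P<args>.*)\))?$")


def parse_runner_definition(definition):
    """Split 'RunnerName[(args)]' into (name, args); args may be None."""
    match = _DEFINITION.match(definition.strip())
    if match is None:
        raise ValueError("Unrecognised runner definition: %r" % definition)
    return match.group("name"), match.group("args")
```

With this sketch, 'SimpleJobRunner' yields the name with no arguments, while a string such as 'GEJobRunner(-j y)' (an illustrative value) yields the name together with the text between the parentheses.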