Queueing Pt. 2: Linking Jobs
Often, we'll want to run Tasks one after the other, where each depends on the previous. As in Let's Get Pipelining, this allows us to form complex protein design pipelines. In this example, we'll schedule a job to generate protein sequences using LigandMPNN, then schedule a subsequent job to fold those sequences using RaptorX-Single. (Code: /examples/example4)
We'll start as before, defining a LigandMPNN Task:
```python
import ribbon

# First, we create 5 new sequences for this structure:
lmpnn_task = ribbon.LigandMPNN(
    structure_list = ['my_structure.pdb'],
    output_dir = './out/lmpnn',
    num_designs = 5
)
```
Then, we'll queue this task. The queue() function returns the job ID, a unique identifier for the job we've submitted to the scheduler.
```python
lmpnn_job_id = lmpnn_task.queue(scheduler='SLURM')
```
Next, we create and submit a RaptorX-Single Task, using the depends_on parameter to pass in a list of job IDs that the new job should wait for.
```python
raptorx_task = ribbon.RaptorXSingle(
    fasta_file_or_dir = './out/lmpnn',
    output_dir = './out/raptorx',
)
raptorx_task.queue(
    scheduler='SLURM',
    depends_on = [lmpnn_job_id]
)
```
Dependency Types
By default, a job will only run after all of its dependencies complete successfully. If using SLURM, more complex behavior can be set using the dependency_type parameter, which accepts standard SLURM dependency types (default: afterok).
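Under the hood, a dependency like this maps onto SLURM's `--dependency` flag. The helper below is purely illustrative (it is not part of ribbon, which handles this internally); it sketches the flag that a call such as `queue(depends_on=[...], dependency_type=...)` would plausibly translate into:

```python
def slurm_dependency_flag(job_ids, dependency_type='afterok'):
    """Build an sbatch-style dependency flag from a list of job IDs.

    Hypothetical helper for illustration only; ribbon constructs the
    real flag for you when you pass depends_on / dependency_type.
    """
    joined = ':'.join(str(job_id) for job_id in job_ids)
    return '--dependency={}:{}'.format(dependency_type, joined)

print(slurm_dependency_flag([12345]))
# --dependency=afterok:12345

print(slurm_dependency_flag([12345, 67890], 'afterany'))
# --dependency=afterany:12345:67890
```

With afterok, the dependent job runs only if every listed job exits successfully; afterany runs it once they finish regardless of exit status, and afternotok runs it only on failure.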
Here's our script completed:
```python
import ribbon

# First, we create 5 new sequences for this structure:
lmpnn_task = ribbon.LigandMPNN(
    structure_list = ['my_structure.pdb'],
    output_dir = './out/lmpnn',
    num_designs = 5
)

# We'll queue the job, and get the job ID:
lmpnn_job_id = lmpnn_task.queue(scheduler='SLURM')

# Then, we create and queue a RaptorX Task:
raptorx_task = ribbon.RaptorXSingle(
    fasta_file_or_dir = './out/lmpnn',
    output_dir = './out/raptorx',
)
raptorx_task.queue(
    scheduler='SLURM',
    depends_on = [lmpnn_job_id]
)
```