@@ -4,62 +4,69 @@ Glossary
 .. glossary::

     Cache-root
-        The directory where cache directories for tasks to be executed are created.
-        Task cache directories are named within the cache root directory using a hash
-        of the task's parameters, so that the same task with the same parameters can be
-        reused.
+        The root directory in which separate cache directories for each job are created.
+        Job cache directories are named within the cache-root directory using a unique
+        checksum for the job based on the task's parameters and software environment,
+        so that if the same job is run again the outputs from the previous run can be
+        reused.
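The checksum-based directory naming described here can be sketched with the standard library (a minimal illustration of the idea, not Pydra's actual hashing scheme; `job_cache_dir`, `task_params` and `environment` are hypothetical names):

```python
import hashlib
import json
from pathlib import Path

def job_cache_dir(cache_root: Path, task_params: dict, environment: str) -> Path:
    """Derive a job cache directory from a checksum of the task's parameters
    and software environment (illustrative only)."""
    # Serialise deterministically so identical jobs yield identical checksums
    payload = json.dumps({"params": task_params, "env": environment}, sort_keys=True)
    checksum = hashlib.sha256(payload.encode()).hexdigest()[:32]
    return cache_root / checksum

# The same parameters + environment always map to the same directory, so a
# rerun of the same job can find and reuse the previous outputs.
same = job_cache_dir(Path("cache"), {"x": 1}, "docker://ubuntu:22.04")
```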

     Combiner
         A combiner is used to combine :ref:`State-array` values created by a split operation
         defined by a :ref:`Splitter` on the current node, upstream workflow nodes or
         stand-alone tasks.

     Container-ndim
-        The number of dimensions of the container object to be iterated over when using
-        a :ref:`Splitter` to split over an iterable value. For example, a list-of-lists
-        or a 2D array with ``container_ndim=2`` would be split over the elements of the
-        inner lists into a single 1-D state array. However, if ``container_ndim=1``,
-        the outer list/2D would be split into a 1-D state array of lists/1D arrays.
+        The number of dimensions of the container object to be flattened into a single
+        state array when splitting over nested containers/multi-dimensional arrays.
+        For example, with ``container_ndim=1``, a list-of-list-of-floats or a 2D numpy
+        array would be split over its outer dimension into a 1-D state array of
+        list-of-floats or 1D numpy arrays, respectively. Whereas with
+        ``container_ndim=2`` they would be split into a state array of floats
+        consisting of all the elements of the inner lists/array.
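The effect of ``container_ndim`` on a nested container can be sketched with a small helper (a conceptual stand-in written for this entry, not Pydra's actual splitting code):

```python
def flatten_to_depth(container, container_ndim):
    """Flatten the outer `container_ndim` dimensions of a nested container
    into a single flat state array (illustrative only)."""
    if container_ndim == 1:
        # Split only over the outermost dimension
        return list(container)
    # Recurse one level down and concatenate the results
    return [
        item
        for sub in container
        for item in flatten_to_depth(sub, container_ndim - 1)
    ]

nested = [[1.0, 2.0], [3.0, 4.0]]  # a list-of-list-of-floats
# container_ndim=1 -> state array of list-of-floats
outer = flatten_to_depth(nested, 1)
# container_ndim=2 -> state array of floats (inner elements flattened)
inner = flatten_to_depth(nested, 2)
```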

     Environment
         An environment refers to a specific software encapsulation, such as a Docker
-        or Singularity image, that is used to run a task.
+        or Singularity image, in which shell tasks are run. The environment to use
+        is specified in the Submitter object when executing a task.

     Field
-        A field is a parameter of a task, or a task outputs object, that can be set to
-        a specific value. Fields are specified to be of any types, including objects
-        and file-system objects.
+        A field is a parameter of a task, or an output in a task outputs class.
+        Fields define the expected datatype of the parameter, along with other
+        metadata that controls how the field is validated and passed through to
+        the execution of the task.

     Hook
-        A hook is a user-defined function that is executed at a specific point in the task
-        execution process. Hooks can be used to prepare/finalise the task cache directory
+        A hook is a user-defined function that is executed at a specific point either before
+        or after a task is run. Hooks can be used to prepare/finalise the task cache directory
         or send notifications

     Job
-        A job is a discrete unit of work, a :ref:`Task`, with all inputs resolved
-        (i.e. not lazy-values or state-arrays) that has been assigned to a worker.
-        A task describes "what" is to be done and a submitter object describes
-        "how" it is to be done, a job combines both objects to describe a concrete unit
-        of processing.
+        A job consists of a :ref:`Task` with all inputs resolved
+        (i.e. not lazy-values or state-arrays) and a Submitter object. It therefore
+        represents a concrete unit of work to be executed, by combining "what" is to
+        be done (Task) with "how" it is to be done (Submitter).

     Lazy-fields
         A lazy-field is a field that is not immediately resolved to a value. Instead,
-        it is a placeholder that will be resolved at runtime, allowing for dynamic
-        parameterisation of tasks.
+        it is a placeholder that will be resolved at runtime when a workflow is
+        executed, allowing for dynamic parameterisation of tasks.

     Node
-        A single task within the context of a workflow, which is assigned a name and
-        references a state. Note this task can be nested workflow task.
+        A single task within the context of a workflow. It is assigned a unique name
+        within the workflow and references a state object that, if present, determines
+        the state-array of jobs to be run (if the state is None then a single job is
+        run for the node).

     Read-only-caches
         A read-only cache is a cache root directory that was created by a previous
-        pydra runs, which is checked for matching task caches to be reused if present
-        but not written not modified during the execution of a task.
+        pydra run. The read-only caches are checked for matching job checksums, which
+        are reused if present. However, new job cache directories are written to the
+        cache root, so the read-only caches are not modified during execution.

     State
         The combination of all upstream splits and combines with any splitters and
-        combiners for a given node, it is used to track how many jobs, and their
-        parameterisations, need to be run for a given workflow node.
+        combiners for a given node. It is used to track how many jobs, and their
+        parameterisations, need to be run for a given workflow node.

     State-array
         A state array is a collection of parameterised tasks or values that were generated
@@ -84,8 +91,9 @@ Glossary

     Worker
         Encapsulation of a task execution environment. It is responsible for executing
-        tasks and managing their lifecycle. Workers can be local (e.g., a thread or
-        process) or remote (e.g., high-performance cluster).
+        tasks and managing their lifecycle. Workers can be local (e.g., debug and
+        concurrent-futures multiprocess) or orchestrated through a remote scheduler
+        (e.g., SLURM, SGE).
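The "local" flavour of worker mentioned here can be sketched with the standard library's `concurrent.futures` (a conceptual illustration of dispatching resolved jobs to a pool, not Pydra's actual Worker classes; `run_job` is a hypothetical stand-in):

```python
from concurrent.futures import ThreadPoolExecutor

def run_job(x: int) -> int:
    # Stand-in for executing one fully resolved job
    return x * 2

# A local worker dispatches each job in the state array to the pool and
# collects the results in order.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(run_job, [1, 2, 3]))
```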

     Workflow
         A Directed-Acyclic-Graph (DAG) of parameterised tasks, to be executed in order.