Workflow
The Workflow class provides a framework for building stateful workflows with selective recomputation and intelligent caching. This is particularly useful for tasks like data loading, cleaning, and analysis, where intermediate results can be cached to avoid unnecessary re-execution.
Key Features
- Stateful Workflow Object: Maintains the state of computations, allowing efficient and reusable workflows.
- Caching: Intelligent caching avoids redundant computations for tasks that haven’t changed.
- Retry Policy: Optional retry policies for handling failures, with a default policy provided.
- Selective Recomputation: Rerun only affected parts of the workflow when changes occur.
Workflow Structure
Atoms
Atoms are individual, reusable units of computation. These are functions decorated with @workflow.atom(). Atoms can:
- Have dependencies (other atoms they rely on).
- Be cached to prevent redundant execution.
Functions
@workflow.atom(): A decorator used to define an atom (a node in the workflow).
Parameters:
- dependencies (list, optional): Names of other atoms this atom depends on.
- RetryPolicy (optional): A retry policy for this atom. If not provided, the default retry policy is used.
- force_recompute (bool, optional): Whether to force recomputation of this atom even if it is unchanged.
Running the workflow executes all atoms, respecting dependencies and caching.
Parameters:
- recompute_atoms (set, optional): Names of atoms to recompute regardless of cached results.
Returns:
- results (dict): A dictionary mapping atom names to their results.
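The retry policy above is described only by name, so the following is a generic sketch of what retrying a failing atom can look like. SimpleRetryPolicy, its fields, and run_with_retry are hypothetical names invented for this illustration. They are not Preswald's RetryPolicy, whose actual options are not listed here.

```python
import time
from dataclasses import dataclass


@dataclass
class SimpleRetryPolicy:
    """Hypothetical stand-in; the field names are assumptions, not Preswald's."""
    max_attempts: int = 3
    delay_seconds: float = 0.0


def run_with_retry(func, policy):
    """Call func() until it succeeds or the attempts are used up."""
    last_error = None
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return func()
        except Exception as error:  # a real policy would likely filter exception types
            last_error = error
            if attempt < policy.max_attempts:
                time.sleep(policy.delay_seconds)
    raise last_error


calls = {"count": 0}


def flaky_load():
    """Fails twice, then succeeds, to exercise the retry loop."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "data"


result = run_with_retry(flaky_load, SimpleRetryPolicy(max_attempts=3))
```

With three attempts allowed, the third call succeeds and its value is returned. With fewer attempts, the last exception would propagate to the caller.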
Common Use Cases
- Data Loading and Cleaning: Load and preprocess data in stages, caching results to avoid reloading.
- Selective Execution: Rerun only the parts of the workflow affected by changes to interactive elements or inputs.
- Intermediate Results: Inspect cached intermediate results for debugging or analysis.
Example Workflow
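As a rough illustration of the pieces described above (atoms, declared dependencies, caching, and forced recomputation), here is a small self-contained sketch. ToyWorkflow is a hypothetical stand-in written for this page, not Preswald's Workflow class. In particular, passing dependency results as positional arguments is an assumption of the sketch.

```python
class ToyWorkflow:
    """Illustrative stand-in for the concepts above; not Preswald's Workflow."""

    def __init__(self):
        self.atoms = {}   # atom name -> (function, dependency names)
        self.cache = {}   # atom name -> cached result

    def atom(self, dependencies=None):
        def register(func):
            self.atoms[func.__name__] = (func, list(dependencies or []))
            return func
        return register

    def execute(self, recompute_atoms=None):
        forced = set(recompute_atoms or ())
        results = {}
        recomputed = set()

        def run(name):
            if name in results:
                return results[name]
            func, deps = self.atoms[name]
            inputs = [run(dep) for dep in deps]   # dependencies run first
            stale = (name in forced
                     or name not in self.cache
                     or any(dep in recomputed for dep in deps))
            if stale:
                self.cache[name] = func(*inputs)
                recomputed.add(name)
            results[name] = self.cache[name]
            return results[name]

        for name in self.atoms:
            run(name)
        return results


workflow = ToyWorkflow()
run_log = []   # records which atoms actually executed


@workflow.atom()
def load_data():
    run_log.append("load_data")
    return [3, None, 1, 2]


@workflow.atom(dependencies=["load_data"])
def clean_data(raw):
    run_log.append("clean_data")
    return [value for value in raw if value is not None]


@workflow.atom(dependencies=["clean_data"])
def analyze_data(cleaned):
    run_log.append("analyze_data")
    return sum(cleaned) / len(cleaned)


first = workflow.execute()                                 # every atom runs once
second = workflow.execute()                                # served from the cache
third = workflow.execute(recompute_atoms={"clean_data"})   # clean_data and its dependent rerun
```

In this sketch the second call executes nothing, and the third reruns only clean_data and analyze_data, leaving load_data cached.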
Output Behavior
- Results: A dictionary mapping atom names to their outputs.
- Selective Execution: If the script is rerun without changes, cached results are reused. After a change, only the affected atoms are recomputed.
Reactive Runtime: Dependency-Aware Selective Execution
Preswald’s Workflow engine uses a reactive runtime that selectively reruns only the atoms affected by state changes, skipping any computation whose inputs haven’t changed. This improves performance and keeps apps responsive.
The dependency graph that powers this system can be built in two ways:
1. Manual dependency declaration
You can explicitly declare dependencies using the dependencies argument. For example, if task_b lists task_a in its dependencies:
- task_b depends on task_a even though it doesn’t use its output.
- This ensures that task_a always runs before task_b.
Manual dependencies are useful when:
- You need explicit control of execution order
- Atoms produce side effects or state changes rather than return values
- Dependencies cannot be inferred from function parameters
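The ordering guarantee for task_a and task_b can be sketched with the standard library's graphlib module. This is a hedged illustration of dependency-ordered execution in general, not Preswald's scheduler. The registry dictionaries and the side effect on state are assumptions made up for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical registry: atom name -> names it was declared to depend on.
declared_dependencies = {
    "task_a": [],
    "task_b": ["task_a"],   # declared manually; task_b never reads task_a's output
}

state = {"initialized": False}
execution_order = []


def task_a():
    state["initialized"] = True        # side effect only, no return value
    execution_order.append("task_a")


def task_b():
    assert state["initialized"], "task_a must have run first"
    execution_order.append("task_b")


atoms = {"task_a": task_a, "task_b": task_b}

# TopologicalSorter takes {node: predecessors}, so dependencies come out first.
for name in TopologicalSorter(declared_dependencies).static_order():
    atoms[name]()
```

Because task_b only relies on a side effect, nothing in its signature reveals the relationship. The explicit declaration is what puts task_a first.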
2. Automatic dependency inference
If an atom accepts another atom’s name as a parameter, the dependency is inferred automatically. For example, an atom show_value that takes a parameter named select_value automatically depends on the atom select_value.
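One plausible way to implement this kind of inference is to compare a function's parameter names with the names of registered atoms, using Python's inspect module. The atom decorator and registry below are hypothetical helpers for this sketch only. How Preswald performs the inference internally is not shown here.

```python
import inspect

registry = {}   # atom name -> function


def atom(func):
    """Hypothetical registration decorator used only for this sketch."""
    registry[func.__name__] = func
    return func


def infer_dependencies(func):
    """Treat every parameter whose name matches a registered atom as a dependency."""
    parameters = inspect.signature(func).parameters
    return [name for name in parameters if name in registry]


@atom
def select_value():
    return 42


@atom
def show_value(select_value):
    return f"Selected: {select_value}"


inferred = {name: infer_dependencies(func) for name, func in registry.items()}
```

Here show_value is linked to select_value purely through its parameter name, so no dependencies argument is needed.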
No matter how you define dependencies, the reactive runtime ensures that only atoms whose inputs have changed and their downstream dependents are recomputed.
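To make "changed atoms and their downstream dependents" concrete, the sketch below walks a dependency graph from the changed atoms to everything that depends on them. The graph contents and the affected_atoms helper are hypothetical. Only select_value and show_value come from the text above.

```python
from collections import deque


def affected_atoms(dependencies, changed):
    """Return the changed atoms plus everything downstream of them."""
    # Invert "atom -> what it depends on" into "atom -> what depends on it".
    dependents = {name: set() for name in dependencies}
    for name, deps in dependencies.items():
        for dep in deps:
            dependents[dep].add(name)

    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for downstream in dependents[current]:
            if downstream not in affected:
                affected.add(downstream)
                queue.append(downstream)
    return affected


graph = {
    "load_data": [],
    "select_value": [],
    "filter_data": ["load_data", "select_value"],
    "show_value": ["select_value"],
    "plot": ["filter_data"],
}

to_rerun = affected_atoms(graph, {"select_value"})
```

Changing select_value marks filter_data, show_value, and plot for recomputation, while load_data keeps its cached result.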
3. Automatic Atom Lifting of Top-Level Components
Preswald automatically wraps top-level UI component calls (e.g., slider(), text()) in workflow atoms, even if you don’t explicitly use @workflow.atom().
This allows you to write concise, interactive apps without boilerplate.
These calls are automatically transformed into atoms during script execution. This allows Preswald to:
- Assign stable identifiers and track component state
- Include them in the dependency graph (DAG)
- Selectively rerun only the affected parts of the script when state changes
This automatic lifting is how Preswald maintains reactivity even in simple scripts without manual atom definitions, keeping them clean and minimal.
If you run the app with debug logging enabled, you’ll see the transformed source in your logs.
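The first step of such a transformation, finding the top-level component calls, can be sketched with Python's ast module. This is an assumption-laden illustration: find_liftable_calls and COMPONENT_NAMES are hypothetical. The sketch only detects candidate calls and does not reproduce the rewritten source Preswald generates.

```python
import ast

COMPONENT_NAMES = {"slider", "text"}   # the two components named in the text above


def find_liftable_calls(source):
    """List top-level statements that call a known component directly."""
    found = []
    for node in ast.parse(source).body:   # top-level statements only
        if not isinstance(node, (ast.Assign, ast.Expr)):
            continue
        call = node.value
        if (isinstance(call, ast.Call)
                and isinstance(call.func, ast.Name)
                and call.func.id in COMPONENT_NAMES):
            found.append((node.lineno, call.func.id))
    return found


# The script is only parsed, never executed, so the calls need not exist here.
script = '''
value = slider("Threshold")
text("Current threshold: " + str(value))

def helper():
    text("not top-level, so not reported")
'''

liftable = find_liftable_calls(script)
```

Each reported call is a candidate for wrapping in an atom. The call inside helper is skipped because it is not a top-level statement.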
Why Use Workflow?
- Efficiency: Avoids redundant computation with caching.
- Modularity: Break down workflows into reusable, testable atoms.
- Flexibility: Supports retry policies and selective execution.
Streamline your data workflows with Workflow! 🚀