Graceful fallback when array is used on unsupported executors #5924

Open
itrujnara opened this issue Mar 26, 2025 · 5 comments

Comments

@itrujnara

New feature

Currently, config parsing always fails if the array process directive is used with an executor that does not support job arrays (most importantly the local executor). It would be reasonable to have Nextflow fall back to submitting tasks individually instead and print a warning to stderr. If needed, this behavior could be controlled with an environment variable.
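For reference, a minimal config that triggers this failure on the local executor. The array directive and its meaning are from the Nextflow docs; the array size of 100 is an arbitrary illustrative value:

```groovy
// nextflow.config
process {
    array = 100  // batch task submissions into job arrays of up to 100 tasks
}
```

Running this pipeline with an executor that lacks job array support (e.g. local) aborts during config validation instead of falling back to per-task submission.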

Use case

Some pipelines have processes that take very little time to run (< 5 minutes). In that case, using job arrays is a reasonable default behavior. However, it cannot simply be set in the pipeline config, as config parsing will fail on executors without job array support.

Suggested implementation

As described above.

@bentsherman
Member

I would just label the processes for which you want to use job arrays and use withLabel to set the array directive for those processes. It's generally better to be specific in that way rather than enabling job arrays across the board.

@itrujnara
Author

I'm not sure this fully addresses the issue. If I set the following in the config

process {
    withLabel: 'very_short' {
        array = 10
    }
}

it will still fail if I run the pipeline with the local executor. Is there an idiomatic way to have such a pipeline ignore the array directive on a local machine but follow it on a Slurm cluster?

@bentsherman
Member

I assume you have two profiles for local and slurm? In that case you can just put the array config in the slurm profile.
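A sketch of that arrangement, assuming the two-profile setup described above (profile and label names are illustrative):

```groovy
// nextflow.config
profiles {
    local {
        process {
            executor = 'local'
            // no array directive here: tasks run as individual local processes
        }
    }
    slurm {
        process {
            executor = 'slurm'
            withLabel: 'very_short' {
                array = 10  // batch short tasks into Slurm job arrays of 10
            }
        }
    }
}
```

The array directive is then only seen when the pipeline is launched with `-profile slurm`, so the local profile never hits the unsupported-executor error.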

@itrujnara
Author

That's the solution I have implemented for the time being. The pipeline is nf-core, so it needs to be as portable as possible. I hoped I could somehow make job arrays the default behavior on grid executors, but it seems that is not currently possible.

@bentsherman
Member

I would caution against making job arrays the default across the entire pipeline, because there are cases where it can actually hurt you.

For example, the first step in the pipeline will probably receive a deluge of tasks all at once because it's just loading the inputs. But a later step might receive tasks at a slower rate as some upstream tasks take longer than others. In the latter case, it might not be so important to batch the job submissions if they are already slow, and enabling job arrays might just needlessly delay job submissions while waiting for a full array to fill up.

That's just my intuition, but in reality it would depend on the actual submit rates vs the capacity of the scheduler. I do wonder whether this issue would come up in practice or not.
