Graceful fallback when array is used on unsupported executors #5924

Open
itrujnara opened this issue Mar 26, 2025 · 5 comments

Comments

@itrujnara

New feature

Currently, config parsing always fails if the array process directive is used with an executor that does not support job arrays (most importantly the local executor). It would be reasonable to have Nextflow fall back to submitting tasks individually instead and print a warning to stderr. If needed, this behavior could be controlled with an environment variable.
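For reference, a minimal config that triggers this failure on the local executor. The array directive and its meaning are from the Nextflow docs; the array size of 100 is an arbitrary illustrative value:

```groovy
// nextflow.config
process {
    array = 100  // batch task submissions into job arrays of up to 100 tasks
}
```

Running this pipeline with an executor that lacks job array support (e.g. local) aborts during config validation instead of falling back to per-task submission.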

Use case

Some pipelines have processes that take very little time to run (< 5 minutes). In that case, using job arrays is a reasonable default behavior. However, it cannot simply be set in the pipeline config, as config parsing will fail on executors without job array support.

Suggested implementation

As described above.

@bentsherman
Member

I would just label the processes for which you want to use job arrays and use withLabel to set the array directive for those processes. It's generally better to be specific in that way rather than enabling job arrays across the board.

@itrujnara
Author

I'm not sure this fully addresses the issue. If I set the following in the config

process {
    withLabel: 'very_short' {
        array = 10
    }
}

it will still fail if I run the pipeline with the local executor. Is there an idiomatic way to have such a pipeline ignore the array directive on a local machine but follow it on a Slurm cluster?

@bentsherman
Member

I assume you have two profiles for local and slurm? In that case you can just put the array config in the slurm profile.
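A sketch of that arrangement, assuming the two-profile setup described above (profile and label names are illustrative):

```groovy
// nextflow.config
profiles {
    local {
        process {
            executor = 'local'
            // no array directive here: tasks run as individual local processes
        }
    }
    slurm {
        process {
            executor = 'slurm'
            withLabel: 'very_short' {
                array = 10  // batch short tasks into Slurm job arrays of 10
            }
        }
    }
}
```

The array directive is then only seen when the pipeline is launched with `-profile slurm`, so the local profile never hits the unsupported-executor error.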

@itrujnara
Author

That's the solution I have implemented for the time being. The pipeline is nf-core, so it needs to be as portable as possible. I hoped I could somehow make job arrays the default behavior on grid executors, but it seems that is not currently possible.

@bentsherman
Member

I would caution against making job arrays the default across the entire pipeline, because there are cases where it can actually hurt you.

For example, the first step in the pipeline will probably receive a deluge of tasks all at once because it's just loading the inputs. But a later step might receive tasks at a slower rate as some upstream tasks take longer than others. In the latter case, it might not be so important to batch the job submissions if they are already slow, and enabling job arrays might just needlessly delay job submissions while waiting for a full array to fill up.

That's just my intuition, but in reality it would depend on the actual submit rates vs the capacity of the scheduler. I do wonder whether this issue would come up in practice or not.
