-
Notifications
You must be signed in to change notification settings - Fork 685
Support closure operand for pipe operator #3423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: jemunro <[email protected]>
Wow, this is great. Should |
Thanks!
No, it's meant to be equivalent the following: workflow {
multi_ch = Channel.of('foo', 'bar', 'baz') \
| branch { foo: it == 'foo'; other: true }
TO_UPPER(multi_ch.other)
} |
Understand now. Not sure we should allow |
The given example of the However, consider the process modules offered by nf-core. In general these emit two named output channels, one for the actual process outputs and one for the versions. In this case this syntax becomes very useful for piping the process outputs. For example: output:
tuple val(meta), path("*.gz"), emit: vcf
path "versions.yml" , emit: versions Using the nf-core processes BCFTOOLS_SORT, BCFTOOLS_FILTER and BCFTOOLS_INDEX:
BCFTOOLS_SORT(input_vcf_ch)
BCFTOOLS_FILTER(BCFTOOLS_SORT.out.vcf)
BCFTOOLS_INDEX(BCFTOOLS_FILTER.out.vcf)
input_vcf_ch \
| BCFTOOLS_SORT \
| BCFTOOLS_FILTER(_.vcf) \
| BCFTOOLS_INDEX(_.vcf) |
I think it's better to be more explicit in the pipeline composition. The last snippet is elegant but it hides too much information. |
It is a bit esoteric, but I don't think it hides any more information than piping already does. e.g.: process TO_UPPER {
executor 'local'
input: val x
output: stdout emit: result
script: "echo -n $x | tr a-z A-Z"
}
process OTHER {...}
workflow WITH_PIPE {
Channel.of('foo', 'bar', 'baz') \
| TO_UPPER \
| OTHER
}
workflow NO_PIPE {
input_ch = Channel.of('foo', 'bar', 'baz')
TO_UPPER(input_ch)
OTHER(TO_UPPER.out.result)
} Piping in example I think it would be really useful to have a way to use processes that output multiple channels with the pipe operator. It wouldn't necessarily have to be the proposed I'd be interested to hear the opinion of some of the nf-core folks. @ewels, @drpatelh, @mribeirodantas - I've noticed that nf-core pipelines in general don't use the pipe operator. Is that because the pipe operator doesn't really work with processes that have multiple output channels? |
2c461f8
to
23432f3
Compare
Ooh I like this!
I don't write a lot of Nextflow code these days but yes, from what I have done I've basically never had a situation where I could get the pipe to work except for dummy minimal examples. Most real life processes I've written always have more than one output channel. |
Another option, instead of the proposed input_vcf_ch \
| BCFTOOLS_SORT \
| extract { vcf } \
| BCFTOOLS_FILTER \
| extract { vcf } \
| BCFTOOLS_INDEX This ends up being more verbose though. |
e2b4a93
to
f32ea0b
Compare
cefb067
to
e523afd
Compare
What is the current thought on this? I'm happy to make some tweaks or discuss further. |
0d59b4c
to
b93634e
Compare
81f7cb7
to
8a43489
Compare
I stumbled across a Rust RFC for a pipeline operator and particular this comment about de Bruijn indices, which is a notation for specifying multiple anonymous parameters in the lambda calculus. Your notation seems to be doing a similar thing and makes a lot of sense. I think the key enhancement here is the ability to define "port mappings" between two components other than the implicit one-to-one mapping. I especially like the ability to access channel elements by index or emit name. I think one way we could make the syntax clearer is to wrap non-default port mappings in a closure. Here are some sketches: // (1) overload <component> | <closure>
input_vcf_ch
| BCFTOOLS_SORT
| { out -> BCFTOOLS_FILTER(out.vcf) } // or use implicit it variable
| { out -> BCFTOOLS_INDEX(out.vcf) }
// (2) overload process call with single closure arg
input_vcf_ch
| BCFTOOLS_SORT
| BCFTOOLS_FILTER { out -> out.vcf }
| BCFTOOLS_INDEX { out -> out.vcf } I don't really like (2) because many operators expect a single closure argument which would create some ambiguity. But I like (1) because it is a distinct syntax and it allows you to map outputs to inputs however you like. The closure arg is the |
Signed-off-by: Ben Sherman <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
✅ Deploy Preview for nextflow-docs-staging ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
The closure syntax was stupid simple to implement! Piping between closures actually feels very intuitive, it's basically the same paradigm as pipes in Bash. Instead of a command, you have a closure that defines the invocation of a component, and the closure argument and return value are like stdin and stdout. Added some docs and a testable doc snippet based on @jemunro 's example. I really want this feature now 😄 |
Signed-off-by: Ben Sherman <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
I extended the and operator to also work on channels, so you can do something like: // before
foo(ch1, ch2, ch3)
// after
(ch1 & ch2 & ch3) | foo I figure you can do this with one channel so why not multiple 😄 |
Thinking more about this, especially in the context of our efforts to develop a formal grammar #4613 , I think we can do much better than extending the existing pipe operator which is just overloading the bitwise-or operator for a few specific types. The script parser will allow us to define new syntax outside of the Groovy grammar. I would like to use this opportunity to add a new pipe operator // x can be any value
// foo can be function, process, workflow, callable
x |> foo = foo(x)
// even a closure
x |> { x -> foo(x) } = foo(x) Languages that support pipes tend to use This fits into some broader ideas I have about making the dataflow syntax easier to read and write (but just as powerful if not more-so), likely as a proposal for DSL3. I hope to share some sketches soon but might be a while before it becomes real. Until then, we'll just have to live with the current syntax Closing this PR for now. Thanks @jemunro for your initial draft. It will not be forgotten |
Based on #3356
This PR enables a more flexible approach to process inputs when piping by using
_
as a placeholder.For example:
can be simplified to:
In addition, when the input is a multichannel, component channels can be extracted by name or index:
For example:
or
One concern is there is that users are not prevented from overriding the placeholder variable.
This is the intend of this section:
https://github.com/jemunro/nextflow/blob/flexible-process-piping/modules/nextflow/src/main/groovy/nextflow/script/ScriptBinding.groovy#L191-L204
But on testing it doesn't seem to work.