Support closure operand for pipe operator #3423

jemunro · 2022-11-22T11:44:11Z

Based on #3356

This PR enables a more flexible approach to process inputs when piping by using _ as a placeholder.
For example:

workflow {
    append_ch = Channel.value('!')
    upper_ch = Channel.of('foo', 'bar', 'baz') \
        | TO_UPPER
    result_ch = CONCATENATE(upper_ch, append_ch)
}

can be simplified to:

workflow {
    append_ch = Channel.value('!')
    result_ch = Channel.of('foo', 'bar', 'baz') \
        | TO_UPPER \
        | CONCATENATE(_, append_ch)
}

In addition, when the input is a multichannel, component channels can be extracted by name or index:
For example:

workflow {
  Channel.of('foo', 'bar', 'baz') \
          | branch { foo: it == 'foo'; other: true } \
          | TO_UPPER(_.other)
}

or

workflow {
  Channel.of('foo', 'bar', 'baz') \
          | branch { foo: it == 'foo'; other: true } \
          | TO_UPPER(_[0])
}

One concern is there is that users are not prevented from overriding the placeholder variable.
This is the intend of this section:
https://github.com/jemunro/nextflow/blob/flexible-process-piping/modules/nextflow/src/main/groovy/nextflow/script/ScriptBinding.groovy#L191-L204
But on testing it doesn't seem to work.

Signed-off-by: jemunro <[email protected]>

pditommaso · 2022-11-22T11:48:19Z

Wow, this is great. Should TO_UPPER(_.other) be read TO_UPPER(_,other) ?

jemunro · 2022-11-22T12:51:15Z

Thanks!

Should TO_UPPER(_.other) be read TO_UPPER(_,other) ?

No, it's meant to be equivalent the following:

workflow {
  multi_ch = Channel.of('foo', 'bar', 'baz') \
          | branch { foo: it == 'foo'; other: true }
  TO_UPPER(multi_ch.other)
}

pditommaso · 2022-11-22T13:43:51Z

Understand now. Not sure we should allow _.something. I fear it will make the syntax too convoluted.

jemunro · 2022-11-22T21:43:11Z

The given example of the _.name syntax above is not very useful.

However, consider the process modules offered by nf-core. In general these emit two named output channels, one for the actual process outputs and one for the versions. In this case this syntax becomes very useful for piping the process outputs. For example:
https://github.com/nf-core/modules/blob/master/modules/nf-core/bcftools/sort/main.nf#L13-L15

    output:
    tuple val(meta), path("*.gz"), emit: vcf
    path "versions.yml"           , emit: versions

Using the nf-core processes BCFTOOLS_SORT, BCFTOOLS_FILTER and BCFTOOLS_INDEX:

without _.name syntax:

BCFTOOLS_SORT(input_vcf_ch)
BCFTOOLS_FILTER(BCFTOOLS_SORT.out.vcf)
BCFTOOLS_INDEX(BCFTOOLS_FILTER.out.vcf)

with _.name:

input_vcf_ch \
  | BCFTOOLS_SORT \
  | BCFTOOLS_FILTER(_.vcf) \
  | BCFTOOLS_INDEX(_.vcf)

pditommaso · 2022-11-23T09:06:48Z

I think it's better to be more explicit in the pipeline composition. The last snippet is elegant but it hides too much information.

jemunro · 2022-11-24T00:00:01Z

It is a bit esoteric, but I don't think it hides any more information than piping already does. e.g.:

process TO_UPPER {
    executor 'local'
    input: val x
    output: stdout emit: result
    script: "echo -n $x | tr a-z A-Z"
}

process OTHER {...}

workflow WITH_PIPE {
  Channel.of('foo', 'bar', 'baz') \
    | TO_UPPER \
    | OTHER
}

workflow NO_PIPE {
  input_ch = Channel.of('foo', 'bar', 'baz')
  TO_UPPER(input_ch)
  OTHER(TO_UPPER.out.result)
}

Piping in example WITH_PIPE already hides TO_UPPER.out.result

I think it would be really useful to have a way to use processes that output multiple channels with the pipe operator. It wouldn't necessarily have to be the proposed _.name construct, but I can't think of anything better.

I'd be interested to hear the opinion of some of the nf-core folks. @ewels, @drpatelh, @mribeirodantas - I've noticed that nf-core pipelines in general don't use the pipe operator. Is that because the pipe operator doesn't really work with processes that have multiple output channels?

ewels · 2022-11-24T22:05:09Z

Ooh I like this!

Is that because the pipe operator doesn't really work with processes that have multiple output channels?

I don't write a lot of Nextflow code these days but yes, from what I have done I've basically never had a situation where I could get the pipe to work except for dummy minimal examples. Most real life processes I've written always have more than one output channel.

jemunro · 2022-12-06T07:18:18Z

Another option, instead of the proposed _.name syntax, would be to add an operator that extracts channels from multi-channel outputs, say extract. E.g.:

input_vcf_ch \
  | BCFTOOLS_SORT \
  | extract { vcf } \
  | BCFTOOLS_FILTER \
  | extract { vcf } \
  | BCFTOOLS_INDEX

This ends up being more verbose though.

jemunro · 2023-02-03T04:27:30Z

What is the current thought on this? I'm happy to make some tweaks or discuss further.

bentsherman · 2023-11-16T16:08:27Z

I stumbled across a Rust RFC for a pipeline operator and particular this comment about de Bruijn indices, which is a notation for specifying multiple anonymous parameters in the lambda calculus. Your notation seems to be doing a similar thing and makes a lot of sense.

I think the key enhancement here is the ability to define "port mappings" between two components other than the implicit one-to-one mapping. I especially like the ability to access channel elements by index or emit name.

I think one way we could make the syntax clearer is to wrap non-default port mappings in a closure. Here are some sketches:

// (1) overload <component> | <closure>
input_vcf_ch
  | BCFTOOLS_SORT
  | { out -> BCFTOOLS_FILTER(out.vcf) } // or use implicit it variable
  | { out -> BCFTOOLS_INDEX(out.vcf) }

// (2) overload process call with single closure arg
input_vcf_ch
  | BCFTOOLS_SORT
  | BCFTOOLS_FILTER { out -> out.vcf }
  | BCFTOOLS_INDEX { out -> out.vcf }

I don't really like (2) because many operators expect a single closure argument which would create some ambiguity. But I like (1) because it is a distinct syntax and it allows you to map outputs to inputs however you like.

The closure arg is the ChannelOut of the previous component, and the closure should return the output of the component it invokes. The pipe operator should evaluate the closure immediately with the output of the previous component and the workflow context, so that other variables in the workflow can be used.

Signed-off-by: Ben Sherman <[email protected]>

netlify · 2023-11-16T18:09:22Z

✅ Deploy Preview for nextflow-docs-staging ready!

Name	Link
🔨 Latest commit	`61f539f`
🔍 Latest deploy log	https://app.netlify.com/sites/nextflow-docs-staging/deploys/65567ef1fd52a40007e8fa8d
😎 Deploy Preview	https://deploy-preview-3423--nextflow-docs-staging.netlify.app/workflow
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

bentsherman · 2023-11-16T18:17:48Z

The closure syntax was stupid simple to implement! Piping between closures actually feels very intuitive, it's basically the same paradigm as pipes in Bash. Instead of a command, you have a closure that defines the invocation of a component, and the closure argument and return value are like stdin and stdout.

Added some docs and a testable doc snippet based on @jemunro 's example.

I really want this feature now 😄

Signed-off-by: Ben Sherman <[email protected]>

bentsherman · 2023-11-16T21:39:14Z

I extended the and operator to also work on channels, so you can do something like:

// before
foo(ch1, ch2, ch3)

// after
(ch1 & ch2 & ch3) | foo

I figure you can do this with one channel so why not multiple 😄

bentsherman · 2024-04-29T20:05:07Z

Thinking more about this, especially in the context of our efforts to develop a formal grammar #4613 , I think we can do much better than extending the existing pipe operator which is just overloading the bitwise-or operator for a few specific types.

The script parser will allow us to define new syntax outside of the Groovy grammar. I would like to use this opportunity to add a new pipe operator |> that works with any function call, not just processes and channels:

// x can be any value
// foo can be function, process, workflow, callable
x |> foo = foo(x)

// even a closure
x |> { x -> foo(x) } = foo(x)

Languages that support pipes tend to use |> instead of | (see OCaml, F#, Elm, Elixir, Gleam). It should replace the | operator which can go back to just being bitwise-or (not that you need it very often in Nextflow).

This fits into some broader ideas I have about making the dataflow syntax easier to read and write (but just as powerful if not more-so), likely as a proposal for DSL3. I hope to share some sketches soon but might be a while before it becomes real. Until then, we'll just have to live with the current syntax

Closing this PR for now. Thanks @jemunro for your initial draft. It will not be forgotten

enable flexible process piping with InputPlaceholder

d92f7a9

Signed-off-by: jemunro <[email protected]>

pditommaso force-pushed the master branch 2 times, most recently from 2c461f8 to 23432f3 Compare November 24, 2022 19:15

bentsherman added the lang/operators label Nov 28, 2022

pditommaso force-pushed the master branch 2 times, most recently from e2b4a93 to f32ea0b Compare December 8, 2022 15:16

pditommaso force-pushed the master branch 2 times, most recently from cefb067 to e523afd Compare December 22, 2022 20:43

pditommaso force-pushed the master branch 2 times, most recently from 0d59b4c to b93634e Compare March 11, 2023 11:20

pditommaso force-pushed the master branch from 68c35f1 to 36b9e22 Compare June 4, 2023 20:18

pditommaso force-pushed the master branch from 38c9931 to 295bc1f Compare June 13, 2023 08:44

pditommaso force-pushed the master branch from e377812 to fcdeec0 Compare July 29, 2023 11:16

pditommaso force-pushed the master branch 2 times, most recently from 81f7cb7 to 8a43489 Compare August 20, 2023 20:13

bentsherman changed the title ~~enable flexible process piping~~ Extend pipe operator to support multiple input channels Nov 16, 2023

bentsherman changed the title ~~Extend pipe operator to support multiple input channels~~ Support arbitrary port mappings for pipe operator Nov 16, 2023

bentsherman linked an issue Nov 16, 2023 that may be closed by this pull request

DSL2 pipe syntax enhancements #3243

Open

bentsherman self-requested a review November 16, 2023 16:25

Refactor using closure syntax

8867ddd

Signed-off-by: Ben Sherman <[email protected]>

bentsherman added 2 commits November 16, 2023 11:30

Merge branch 'master' into pr/jemunro/3423

7d08adb

Update docs

8cf8f51

Signed-off-by: Ben Sherman <[email protected]>

bentsherman approved these changes Nov 16, 2023

View reviewed changes

bentsherman requested a review from pditommaso November 16, 2023 18:18

bentsherman added the lang/processes label Nov 16, 2023

bentsherman changed the title ~~Support arbitrary port mappings for pipe operator~~ Support closure operand for pipe operator Nov 16, 2023

bentsherman added 2 commits November 16, 2023 14:29

Compose multi-channels with and operator

d8efa4e

Signed-off-by: Ben Sherman <[email protected]>

Add then operator (equivalent to pipe with closure)

61f539f

Signed-off-by: Ben Sherman <[email protected]>

bentsherman mentioned this pull request Nov 16, 2023

Use pipe with closure nf-core/rnaseq#1119

Closed

6 tasks

pditommaso force-pushed the master branch from fd99141 to 19d2ccb Compare November 24, 2023 20:42

pditommaso force-pushed the master branch from 4e27468 to dfd7d09 Compare December 20, 2023 09:55

jemunro requested a review from a team as a code owner April 3, 2024 14:39

bentsherman closed this Apr 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support closure operand for pipe operator #3423

Support closure operand for pipe operator #3423

jemunro commented Nov 22, 2022

pditommaso commented Nov 22, 2022

jemunro commented Nov 22, 2022

pditommaso commented Nov 22, 2022

jemunro commented Nov 22, 2022

pditommaso commented Nov 23, 2022

jemunro commented Nov 24, 2022

ewels commented Nov 24, 2022

jemunro commented Dec 6, 2022

jemunro commented Feb 3, 2023

bentsherman commented Nov 16, 2023 •

edited

Loading

netlify bot commented Nov 16, 2023 •

edited

Loading

bentsherman commented Nov 16, 2023

bentsherman commented Nov 16, 2023

bentsherman commented Apr 29, 2024

Support closure operand for pipe operator #3423

Support closure operand for pipe operator #3423

Conversation

jemunro commented Nov 22, 2022

pditommaso commented Nov 22, 2022

jemunro commented Nov 22, 2022

pditommaso commented Nov 22, 2022

jemunro commented Nov 22, 2022

pditommaso commented Nov 23, 2022

jemunro commented Nov 24, 2022

ewels commented Nov 24, 2022

jemunro commented Dec 6, 2022

jemunro commented Feb 3, 2023

bentsherman commented Nov 16, 2023 • edited Loading

netlify bot commented Nov 16, 2023 • edited Loading

✅ Deploy Preview for nextflow-docs-staging ready!

bentsherman commented Nov 16, 2023

bentsherman commented Nov 16, 2023

bentsherman commented Apr 29, 2024

bentsherman commented Nov 16, 2023 •

edited

Loading

netlify bot commented Nov 16, 2023 •

edited

Loading