Skip to content

DSL2 pipe syntax enhancements #3243

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
jemunro opened this issue Sep 25, 2022 · 10 comments
Open

DSL2 pipe syntax enhancements #3243

jemunro opened this issue Sep 25, 2022 · 10 comments

Comments

@jemunro
Copy link

jemunro commented Sep 25, 2022

New feature

Enhancing the pipe operator to allow processes and workflows with multiple inputs to be piped together.

Usage scenario

Given the following processes:

process toUpper {
    input: val x
    output: stdout
    script: "echo -n $x | tr a-z A-Z"
}
process strConcat {
    input: val x; val y
    output: stdout
    script: "echo -n $x; echo -n $y"
}

If we want to run out data through toUpper and into strConcat, the best we can do with piping is the following:

workflow {
    append = channel.value('!')
    strConcat(
        channel.of('foo', 'bar') | toUpper,
        append) |
        view()
}

which gives output:

N E X T F L O W  ~  version 22.04.5
Launching `works.nf` [zen_mclean] DSL2 - revision: 2f1ccf68d9
executor >  local (4)
[68/eb8b8d] process > toUpper (2)   [100%] 2 of 2 ✔
[e9/7fb402] process > strConcat (2) [100%] 2 of 2 ✔
FOO!
BAR!

Suggest implementation

The following would be more intuitive and readable:

workflow {
    append = channel.value('!')
    channel.of('foo', 'bar') |
        toUpper() | 
        strConcat(append) |
        view()
}

Attempting to run the above currently gives the following error:

Process `toUpper` declares 1 input channel but 0 were specified
 -- Check script 'main.nf' at line: 17 or see '.nextflow.log' file for more details

if we replace toUpper() | with toUpper |, the error becomes:

Process `strConcat` declares 2 input channels but 1 were specified

This could be achieved by making the first argument or a process be the piped value (regardless of whether parenthesis are present). This already seems to be the case for operators that accept multiple arguments, e.g.:

channel.of('foo', 'bar') |
        mix(channel.of('baz')) |
        view
@bentsherman
Copy link
Member

Basically you want to be able to curry channel arguments. I think it would work for processes but not for operators, because operators use an object-oriented syntax, e.g. a | op(b) is equivalent to a.op(b). It's an interesting idea.

Also, the example with mix works because mix can take any number of arguments.

@jemunro
Copy link
Author

jemunro commented Sep 27, 2022

As far as I can tell this already works for most operators, just not for processes.

For example the following works fine:

workflow {
    left  = channel.of(['A', 1], ['B', 2], ['C', 3], ['D', 7])
    right = channel.of(['B', 6], ['C', 5], ['D', 2], ['A', 8])

    left | 
        join(right) | 
        map { s, x, y -> [s, x, y, x + y] } |
        groupTuple(by: 3) |
        map { ss, xx, yy, z -> [ss.join('-'), xx.sum() + yy.sum() ]} |
        combine(left) |
        view()
}
[B-C, 16, A, 1]
[B-C, 16, B, 2]
[B-C, 16, C, 3]
[B-C, 16, D, 7]
[A-D, 18, A, 1]
[A-D, 18, B, 2]
[A-D, 18, C, 3]
[A-D, 18, D, 7]

@jemunro
Copy link
Author

jemunro commented Sep 27, 2022

This piping paradigm is widely used in R: see https://r4ds.had.co.nz/pipes.html

@bentsherman
Copy link
Member

Yes, most operators already work this way because they use an object-oriented syntax, so it is clear which argument is being piped. With processes I only worry that it's not clear which input to pipe the argument into.

@jemunro
Copy link
Author

jemunro commented Sep 28, 2022

I think that it would be fairly intuitive that it was the first process input that was being piped into.

To illustrate further, let's consider some operators that could be implemented as processes:
1. unary operator:

process SPLIT_CSV {
    input: val(x)
    output: val(split)
    exec: split = x.split(',')
}
workflow {
    csv = channel.of('a,b,c', 'd,e,f')
    // --------------- operator ---------------
    csv.splitCsv().view()        // (1) - valid
    csv | splitCsv | view()      // (2) - valid
    splitCsv(csv) | view()       // (3) - not valid
    // --------------- process ----------------
    csv.SPLIT_CSV().view()       // (1) - not valid
    csv | SPLIT_CSV | view()     // (2) - valid
    SPLIT_CSV(csv) | view()      // (3) - valid
}

2. binary operator:

process MERGE {
    input: val(x); val(y)
    output: val(xy)
    exec: xy = (x instanceof List ? x : [x]) + (y instanceof List ? y : [y])
}
workflow {
    left  = channel.of(['A', 1], ['B', 2], ['C', 3])
    right = channel.of(['X', 4], ['Y', 5], ['Z', 6])
    // --------------- operator ---------------
    left.merge(right).view()      // (1) - valid
    left | merge(right) | view()  // (2) - valid
    merge(left, right) | view()   // (3) - not valid
    // --------------- process ----------------
    left.MERGE(right).view()      // (1) - not valid
    left | MERGE(right) | view()  // (2) - not valid
    MERGE(left, right) | view()   // (3) - valid
}

From this we can see that option (2) is valid for both the process and operator in unary case, but only for the operator in the binary case.

@bentsherman
Copy link
Member

Thank you for the detailed example. It is interesting, Nextflow's operator syntax suggests the first input should be piped, whereas currying in functional programming would suggest that the last input should be piped. I foresee future tribal divisions over this question 😄

I also wanted to mention, there is a & operator for invoking multiple processes with the same inputs (docs). What if we could use this syntax on channels as well?

// and operator with processes
workflow {
    channel.from('Hello') | (foo & bar) | mix | view
}

// and operator with channels
workflow {
    (left & right) | merge | view
}

@jemunro
Copy link
Author

jemunro commented Sep 29, 2022

The & operator is an interesting idea. Using the first example, are you suggesting using the & something like this?

workflow {
    append = channel.value('!')
    channel.of('foo', 'bar') |
        toUpper &
        append |
        strConcat |
        view()
}

@bentsherman
Copy link
Member

Well, it wouldn't work in that case, because you want channel.of('foo', 'bar') to be piped into toUpper and not append. So it looks like your argument currying idea is more versatile.

@jemunro jemunro mentioned this issue Sep 30, 2022
@jemunro
Copy link
Author

jemunro commented Sep 30, 2022

Another thought, what it there was an operator that worked similarly to Groovy's with?

Using OO syntax, with already mostly works:

append = channel.value('!')
channel.of('foo', 'bar')
    .with { toUpper (it) }
    .with { strConcat(it, append) }
    .view()

We could define an operator, say withOp, that works with pipes:

append = channel.value('!')
channel.of('foo', 'bar') |
    toUpper |
    withOp { strConcat(it, append) } |
    view

The nice thing about this approach is that it is quite flexible, you could do any arbitrary thing with channels, operators and processes in the closure and then pipe the result.

I made a draft PR that enables the above to work: #3254

@stale
Copy link

stale bot commented Mar 18, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Mar 18, 2023
@stale stale bot removed the stale label Mar 18, 2023
@bentsherman bentsherman linked a pull request Nov 16, 2023 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants