exec and eval operators #3356

Closed
wants to merge 7 commits into from

@jemunro jemunro commented Nov 7, 2022

New operators eval and exec

This PR is intended to start a discussion about the potential usefulness of the proposed operators, but it also provides an implementation that could be built upon.

These operators are non-standard in the sense that they don't transform the contents of a channel but provide a tool for programming with channels, similar to set. They were inspired by the discussion at #3243, and are somewhat relevant to #3272.

The main advantage is that they facilitate piping/chaining and result in fewer intermediate channel variables being assigned.

eval

Evaluate a closure with the source channel or multi-channel (as output by branch, multiMap, or processes with multiple outputs) as delegate and return the result. For named multi-channel sources, the output names can be used directly. This works similarly to Groovy's with() method.
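The delegate mechanism that eval relies on can be illustrated in plain Groovy, outside Nextflow. This is only a rough sketch: `NamedOutputs` and `evalWith` are hypothetical names made up for illustration (they are not part of this PR or of Nextflow), and plain lists stand in for channels.

```groovy
// Hypothetical stand-in for a named multi-channel (e.g. the output of branch).
// Plain lists are used in place of real channels.
class NamedOutputs {
    List foo
    List other
}

// Illustrative helper, not a Nextflow API: run the closure with `source`
// as its delegate, which is roughly what Groovy's with() does.
def evalWith(Object source, Closure body) {
    def c = (Closure) body.clone()
    c.delegate = source
    c.resolveStrategy = Closure.DELEGATE_FIRST
    return c.call(source)
}

def outputs = new NamedOutputs(foo: ['FOO!'], other: ['BAR!', 'BAZ!'])

// `foo` and `other` resolve against the delegate, so no intermediate variable is needed.
def combined = evalWith(outputs) {
    foo.collectMany { f -> other.collect { o -> [f, o] } }
}
assert combined == [['FOO!', 'BAR!'], ['FOO!', 'BAZ!']]
```

Because the source is also passed as the closure argument, `it` remains available for unnamed sources, as in `eval { it[0] }` in the examples.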

exec

Execute a process using the source channel as the first input. Multi-channel inputs are unpacked into subsequent inputs, followed by any arguments given to the operator.
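The argument ordering described here can be sketched in plain Groovy. This is a simplified model under stated assumptions, not the PR's implementation: `MultiOut` and `execLike` are hypothetical names, a closure stands in for a process, and plain values stand in for channels.

```groovy
// Hypothetical wrapper standing in for a multi-channel output.
class MultiOut {
    List channels
}

// Illustrative helper, not a Nextflow API: build the process input list as
// [source (unpacked if it is a multi-channel)] + [extra operator arguments].
def execLike(Object source, Closure process, Object... extra) {
    List inputs = (source instanceof MultiOut) ? new ArrayList(source.channels) : [source]
    inputs.addAll(extra as List)
    return process.call(*inputs)
}

// A two-input "process", modelled as a closure.
def concatenate = { x, y -> "${x}${y}".toString() }

// Single source plus one operator argument.
assert execLike('FOO', concatenate, '!') == 'FOO!'

// Multi-channel source: both inputs come from the unpacked source.
assert execLike(new MultiOut(channels: ['FOO', '!']), concatenate) == 'FOO!'
```

Note that in this model the source always fills the leading inputs; it cannot be routed to a later position.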

Examples

All examples use the following processes:

process TO_UPPER {
    input: val x
    output: stdout
    script: "echo -n $x | tr a-z A-Z"
}

process CONCATENATE {
    input: val x; val y
    output: stdout; val(x)
    script: "echo -n $x; echo -n $y"
}

All examples output the same result_ch:

[FOO!, BAR!]
[FOO!, BAZ!]

Object Orientated Style (no piping)

Without eval and exec

workflow {

    append_ch = Channel.value('!')

    upper_ch = TO_UPPER(channel.of('foo', 'bar', 'baz'))
    
    concat_ch = CONCATENATE(upper_ch, append_ch)
    
    branched_ch = concat_ch[0]
        .branch { 
            foo: it == 'FOO!'
            other: true }
    
    result_ch = branched_ch
        .foo
        .combine(branched_ch.other)
    
    result_ch.view()
}

With eval and exec

workflow {

    append_ch = Channel.value('!')

    result_ch = channel.of('foo', 'bar', 'baz')
        .exec(TO_UPPER)
        .exec(CONCATENATE, append_ch) 
        .eval { it[0] }
        .branch { 
            foo: it == 'FOO!'
            other: true }
        .eval { foo.combine(other) }

    result_ch.view()
}

Piping Style

Without eval and exec

workflow {

    append_ch = Channel.value('!')

    upper_ch = channel.of('foo', 'bar', 'baz') \
        | TO_UPPER
        
    concat_ch = CONCATENATE(upper_ch, append_ch)

    branched_ch = concat_ch[0] \
        | branch { 
            foo: it == 'FOO!'
            other: true }

    result_ch = branched_ch
        .foo \
        | combine(branched_ch.other)
    
    result_ch | view
}

With eval and exec

workflow {

    append_ch = Channel.value('!')

    result_ch = channel.of('foo', 'bar', 'baz') \
        | TO_UPPER \
        | exec(CONCATENATE, append_ch) \
        | eval { it[0] } \
        | branch { 
            foo: it == 'FOO!'
            other: true } \
        | eval { foo | combine(other) }

    result_ch | view()
}

@pditommaso
Member

This sounds interesting.

Regarding eval, if I'm understanding correctly, it takes a multi-channel and maps it to a channel. That being so, it looks like the reverse of multiMap. Is that correct?

In relation to exec, it looks like the main use case is to allow a more fluent composition of processes when there's an output/input cardinality mismatch.

In this case, it would be even nicer to allow a special input keyword. Following your example, something like:

channel.of('foo', 'bar', 'baz') \
        | TO_UPPER \
        | CONCATENATE(-, append_ch) \

where - represents the channel produced by the preceding chain. This would be more readable than the exec proposal. Above all, it would give more control over which process input receives the piped channel; for example, it could also be used as CONCATENATE(append_ch, -).

I'm not sure about the feasibility of using - in practice. It may have some special meaning in the underlying Groovy parser.

@jemunro
Author

jemunro commented Nov 10, 2022

> This sounds interesting.
>
> Regarding eval, if I'm understanding correctly, it takes a multi-channel and maps it to a channel. That being so, it looks like the reverse of multiMap. Is that correct?

The primary use case of eval would be, as you say, to take multi-channels and return a single channel after applying some functions/operators/processes, in a way that works nicely with the pipe operator. For example:

  1. Using named multi-channel

    Channel.of(['A', 1], ['B', 3], ['C', 4]) \
        | branch { a: it[0] == 'A'; other: true } \
        | eval { a.combine(other) } \
        | view
    ['A', 1, 'B', 3]
    ['A', 1, 'C', 4]
    

    Equivalent to:

    ch = Channel.of(['A', 1], ['B', 3], ['C', 4]) \
        | branch { a: it[0] == 'A'; other: true }
    ch.a | combine(ch.other) | view

  2. Using unnamed multi-channel

    process IN_1_OUT_2 {
        input:
        tuple val(x), val(y)
    
        output: 
        val(x)
        val(y)
    
        exec: null
    }
    workflow {
        Channel.of(['A', 1]) \
            | IN_1_OUT_2 \
            | eval { first() } \
            | view
    }
    'A'
    

    Equivalent to:

    workflow {
        Channel.of(['A', 1]) \
            | IN_1_OUT_2
        IN_1_OUT_2.out.first().view()
    }

  3. Flexible process call

    append_ch = Channel.value('!')
    channel.of('foo', 'bar') \
        | eval { CONCATENATE(it, append_ch) } \
        | view
    'foo!'
    'bar!'
    

    Equivalent to:

    append_ch = Channel.value('!')
    input_ch = channel.of('foo', 'bar')
    CONCATENATE(input_ch, append_ch) \
        | view

@jemunro
Author

jemunro commented Nov 10, 2022

> In relation to exec, it looks like the main use case is to allow a more fluent composition of processes when there's an output/input cardinality mismatch.
>
> In this case, it would be even nicer to allow a special input keyword. Following your example, something like:
>
>     channel.of('foo', 'bar', 'baz') \
>         | TO_UPPER \
>         | CONCATENATE(-, append_ch) \
>
> where - represents the channel produced by the preceding chain. This would be more readable than the exec proposal. Above all, it would give more control over which process input receives the piped channel; for example, it could also be used as CONCATENATE(append_ch, -).
>
> I'm not sure about the feasibility of using - in practice. It may have some special meaning in the underlying Groovy parser.

This alternative proposal is very neat, but as you say, '-' won't work due to the Groovy parser. Alternatives could be:

  1. Closure
        | CONCATENATE { it; append } \
  2. Special reserved variable, e.g. '_'
        | CONCATENATE(_, append) \

I think both would be harder to implement than an operator such as exec, because it would need a rewrite of how Nextflow handles process calls.

@pditommaso
Member

I think eval is useful, though the name is too generic. Something more specific should be found.

> I think both would be harder to implement than an operator such as exec, because it would need a rewrite of how Nextflow handles process calls.

Using _ instead of - shouldn't be that hard. I would expect something similar to what is currently done for operation chaining, in which the left channel is added as a source, see here

I think it's worth giving it a try.

@jemunro
Author

jemunro commented Nov 11, 2022

> I think eval is useful, though the name is too generic. Something more specific should be found.

I agree it is generic, but it is hard to think of a less generic name. Maybe chain? Since it helps to chain more operations together? E.g.:

Channel.of(['A', 1], ['B', 3], ['C', 4]) \
    | branch { a: it[0] == 'A'; other: true } \
    | chain { a.combine(other) } \
    | view

> Using _ instead of - shouldn't be that hard. I would expect something similar to what is currently done for operation chaining, in which the left channel is added as a source, see here
>
> I think it's worth giving it a try.

How does this sound for an implementation approach:

  1. Create a class InputPlaceholder which will be assigned to _ in the script binding
  2. Add a new boolean field to ProcessDef isPartial
  3. Modify ProcessDef.run() at the following section, adding a check for instances of InputPlaceholder in the params
    Object run(Object[] args) {
        // initialise process config
        initialize()
        // get params
        final params = ChannelOut.spread(args)
        // sanity check
        if( params.size() != declaredInputs.size() )
            throw new ScriptRuntimeException(missMatchErrMessage(processName, declaredInputs.size(), params.size()))
        // set input channels
        for( int i=0; i<params.size(); i++ ) {
            (declaredInputs[i] as BaseInParam).setFrom(params[i])
        }
  4. If an InputPlaceholder is present, instead set isPartial = true and return this. Then, the next time ProcessDef.run() is called, replace the InputPlaceholder with the incoming args
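The four steps above could be prototyped in plain Groovy roughly as follows. This is a minimal sketch under stated assumptions, not the actual Nextflow code: `ProcessSketch` is a hypothetical, heavily simplified stand-in for ProcessDef, a closure stands in for the process body, plain values stand in for channels, and the variable `placeholder` plays the role that `_` would have in the script binding.

```groovy
// Marker type; in the plan above an instance would be bound to `_`.
class InputPlaceholder {}

// Hypothetical, simplified stand-in for ProcessDef. Only the placeholder logic is modelled.
class ProcessSketch {
    final int arity          // number of declared inputs
    final Closure body       // stands in for the real process execution
    boolean isPartial = false
    private List pending = []

    ProcessSketch(int arity, Closure body) {
        this.arity = arity
        this.body = body
    }

    Object run(Object... args) {
        List params = args as List
        if (isPartial) {
            // Second call: fill each placeholder, left to right, with the incoming args.
            Iterator fill = params.iterator()
            params = pending.collect { it instanceof InputPlaceholder ? fill.next() : it }
            isPartial = false
        }
        else if (params.any { it instanceof InputPlaceholder }) {
            // First call with a placeholder: remember the args and defer execution.
            pending = params
            isPartial = true
            return this
        }
        // Sanity check, mirroring the declaredInputs size check in ProcessDef.run()
        if (params.size() != arity)
            throw new IllegalArgumentException("expected ${arity} inputs, got ${params.size()}".toString())
        return body.call(*params)
    }
}

def placeholder = new InputPlaceholder()
def CONCATENATE = new ProcessSketch(2, { x, y -> "${x}${y}".toString() })

// Equivalent in spirit to `... | CONCATENATE(append_ch, _)`
def partial = CONCATENATE.run('!', placeholder)
assert partial.isPartial
assert partial.run('FOO') == '!FOO'
```

One design question this sketch leaves open is whether the partial state should live on the process definition itself, as proposed, or on a separate wrapper object, since mutating a shared definition could be surprising if the same process were referenced twice.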

@pditommaso
Member

it sounds like a plan!
