I'm not really sure this can be considered a bug, but since this "issue" puzzled me, I'm sharing it with everyone to see if it should be acted upon.
The issue
When the list of process names is printed in the `.nextflow.log` file, two distinct processes declared in two distinct `.nf` files but sharing exactly the same name may end up as a single entry in the list. The relevant log line looks like this:
`Apr-03 09:55:00.694 [main] DEBUG nextflow.Session - Workflow process names [dsl2]: process_a, process_b`
This is what puzzled me: when you add a new process to a file, you expect to see it appear in this list.
Steps to reproduce
Declare a process named `a` in each of two files:

```groovy
// file: main.nf
process a {
    output:
    stdout

    script:
    """
    echo "I'm process main:a"
    """
}
```

```groovy
// file: sub.nf
process a {
    output:
    stdout

    script:
    """
    echo "I'm process sub:a"
    """
}
```
Execute the workflow; the log then contains the following line:
`[main] DEBUG nextflow.Session - Workflow process names [dsl2]: a`
If the processes are used in nested workflows, their "resolved" names are printed as well: `a` for the use of `main.nf:a` in the top-level workflow of `main.nf`, and `sub:a` for the use of `sub.nf:a` in the `sub` workflow.
The cause
This happens because the static method `ScriptMeta.allProcessNames()` returns a `Set<String>`, which cannot contain duplicates. In a scenario where `a` is used both in the main workflow and in a sub-workflow, the following elements are actually added to the set:

- `a`: definition of process `main.nf:a`
- `a`: definition of process `sub.nf:a`
- `a`: use of process `main.nf:a` in the top-level workflow
- `sub:a`: use of process `sub.nf:a` in a sub-workflow

The resulting set is `a, sub:a`.
If `sub.nf:a` is not used directly in a workflow but is instead imported under an alias in another workflow (e.g. `include { a as x } from './sub.nf'`), the list becomes `a, x`, with no clear trace of the multiple definitions of `a`.
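The collapse can be reproduced with a few lines of plain Java. This is a sketch, not Nextflow's actual code: the class `CollapsedNames` and its `collect` method are hypothetical, and a `LinkedHashSet` is used only so that the printed order is predictable (the order in the real log may differ). The inputs are the entries described above.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class CollapsedNames {

    // Hypothetical stand-in for collecting declared and invoked process names
    // into a Set<String>: repeated names are silently dropped.
    public static Set<String> collect(List<String> names) {
        return new LinkedHashSet<>(names); // keeps first-insertion order
    }

    public static void main(String[] args) {
        // Nested use: main.nf:a at top level, sub.nf:a inside the `sub` workflow.
        List<String> nested = List.of(
                "a",      // definition of main.nf:a
                "a",      // definition of sub.nf:a
                "a",      // use of main.nf:a in the top-level workflow
                "sub:a"); // use of sub.nf:a in the `sub` workflow
        System.out.println(String.join(", ", collect(nested))); // a, sub:a

        // Aliased use: sub.nf:a is only invoked through include { a as x } from './sub.nf'.
        List<String> aliased = List.of(
                "a",  // definition of main.nf:a
                "a",  // definition of sub.nf:a
                "x"); // use of sub.nf:a under the alias x
        System.out.println(String.join(", ", collect(aliased))); // a, x
    }
}
```

In both cases the information that two files define `a` is gone before the log line is even formatted.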
What can be done?
To answer what can be done, I should first explain why this bothers me :) My objective* is to analyze the traces of pipelines run with Nextflow in order to build a model that predicts the performance of future executions of these pipelines. To do that, I need to associate each process execution in a report, identified by its "resolved" and possibly aliased name, with the corresponding `file:process_name`. This is currently impossible, notably because of the issue described above.
Potential "solution" number 1:
A "simple" solution would be to replace the `Set<String>` returned by `ScriptMeta.allProcessNames()` with a `List<String>`. That way, the printed list of process names would be "complete". In our example: `a, a, a, sub:a`
I'm not a big fan of this solution, though. Although the multiple definitions and uses of `a` now appear as expected, the three `a` entries in the list are not very informative, and the definition of `a` cannot be told apart from its use in the top-level workflow.
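A matching sketch of solution 1 (again hypothetical plain Java, not a patch to `ScriptMeta`) shows both sides of the trade-off: a `List<String>` keeps every entry, but the strings themselves carry no file or definition/use information.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DuplicatedNames {

    // Solution 1: collect into a List<String>, so repeated names are kept.
    public static List<String> collect(List<String> names) {
        return new ArrayList<>(names);
    }

    public static void main(String[] args) {
        List<String> names = collect(List.of("a", "a", "a", "sub:a"));
        System.out.println(String.join(", ", names));          // a, a, a, sub:a
        System.out.println(Collections.frequency(names, "a")); // 3

        // The three "a" entries are equal strings: nothing tells which one is
        // main.nf:a, which one is sub.nf:a, or which one is a use rather than a definition.
        System.out.println(names.get(0).equals(names.get(2))); // true
    }
}
```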
Potential "solution" number 2:
I think it would be more informative to print the list of process definitions separately from the list of "resolved" process names. These two lists could then act as a dictionary associating each resolved process name with its corresponding file and process definition. The result could look something like this in our example:
I can probably provide code for this solution in the next few days if you think it would be worth including in a future version (which would also spare me from maintaining it in my own fork :D)
Cheers,
Karol
*: (I'm a scientist in embedded high-performance system design and optimization)
The only difference from the previously described behavior is that the resolved process names are grouped by the process they refer to, as follows:
`[main] DEBUG nextflow.Session - Workflow resolved process names: main.nf=a=[a], sub.nf=a=[sub:a, x]`
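The grouping in this log line amounts to a map from each definition (`file=process`) to the resolved names that invoke it. The Java sketch below illustrates that data structure only; it is not the code of the PR, the `Use` class and the `group`/`format` methods are hypothetical, and the three uses are hard-coded from the running example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupedNames {

    // Hypothetical record of one process invocation.
    public static final class Use {
        final String file;     // file declaring the process, e.g. "sub.nf"
        final String process;  // declared process name, e.g. "a"
        final String resolved; // name at the call site, e.g. "sub:a" or "x"

        public Use(String file, String process, String resolved) {
            this.file = file;
            this.process = process;
            this.resolved = resolved;
        }
    }

    // Groups resolved names under their definition, keyed as "file=process".
    // LinkedHashMap keeps definitions in first-seen order.
    public static Map<String, List<String>> group(List<Use> uses) {
        Map<String, List<String>> byDefinition = new LinkedHashMap<>();
        for (Use u : uses) {
            byDefinition.computeIfAbsent(u.file + "=" + u.process, k -> new ArrayList<>())
                        .add(u.resolved);
        }
        return byDefinition;
    }

    // Renders "file=process=[resolved, ...]" entries separated by ", ".
    public static String format(Map<String, List<String>> byDefinition) {
        return byDefinition.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue()) // List.toString() gives "[x, y]"
                .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        List<Use> uses = List.of(
                new Use("main.nf", "a", "a"),
                new Use("sub.nf", "a", "sub:a"),
                new Use("sub.nf", "a", "x"));
        System.out.println(format(group(uses)));
        // main.nf=a=[a], sub.nf=a=[sub:a, x]
    }
}
```

Keying on `file=process` rather than on the bare process name is what keeps `main.nf:a` and `sub.nf:a` apart.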
I created a PR in case the proposed change is deemed worthy of production. Beyond my own needs, I believe this change makes it easier to identify what an aliased process name corresponds to.
Importantly, I verified that none of the suggested information was printed when raising the log level to TRACE.
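For the trace-analysis use case described earlier, the grouped log line can be turned into a lookup from resolved (possibly aliased) name to `file:process`. A hedged sketch, assuming the log prefix has already been stripped and that no file, process or resolved name contains `=`, `[`, `]` or a comma, which the regular expression relies on; `ResolvedNameIndex` and `parse` are hypothetical names:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResolvedNameIndex {

    // One "file=process=[resolved, ...]" group; assumes no part contains '=', '[', ']' or ','.
    private static final Pattern GROUP =
            Pattern.compile("([^=,\\s]+)=([^=\\s]+)=\\[([^\\]]*)\\]");

    // Builds resolved name -> "file:process" from the grouped part of the log line.
    public static Map<String, String> parse(String groups) {
        Map<String, String> index = new LinkedHashMap<>();
        Matcher m = GROUP.matcher(groups);
        while (m.find()) {
            String definition = m.group(1) + ":" + m.group(2);
            for (String resolved : m.group(3).split(",\\s*")) {
                if (resolved.isEmpty()) continue; // definition declared but never invoked
                String previous = index.putIfAbsent(resolved, definition);
                if (previous != null && !previous.equals(definition)) {
                    throw new IllegalStateException("resolved name " + resolved
                            + " maps to both " + previous + " and " + definition);
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> index = parse("main.nf=a=[a], sub.nf=a=[sub:a, x]");
        System.out.println(index.get("a"));     // main.nf:a
        System.out.println(index.get("sub:a")); // sub.nf:a
        System.out.println(index.get("x"));     // sub.nf:a
    }
}
```

The explicit conflict check makes the sketch fail loudly if one resolved name ever maps to two definitions, instead of silently keeping one of them, which is exactly the kind of silent loss this issue is about.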