The task hash is a critical part of Nextflow infrastructure, used to determine whether a cached result can be used or not when re-running Nextflow with `-resume` (amongst other things).
Currently, when resuming a pipeline doesn't work as expected, some extensive detective work is required to figure out why. This usually involves doing two or more new runs with the `-dump-hashes` flag, which can be time-consuming and costly. This is especially difficult if the failed cache hit is not always reproducible.
The key information that is helpful in these cases is what values were used to calculate the hash. If Nextflow could always print this, debugging resume failures would be much simpler.
We don't want to print this to the Nextflow log, as it could represent a lot of data for large runs with millions of tasks. We also don't want to add another file to the task work directory, as Nextflow already creates several and this can put pressure on the file system.
The suggested approach is to add this information as a bash comment at the top of the `.command.begin` file already present in the task work directory.
Ideally this data can be written as YAML and surrounded by fixed strings so that it can easily be pulled out programmatically, for example:

    # # Start of task hash info
    # container: quay.io/nextflow/rnaseq-nf:v1.1
    # inputs:
    #   '*':
    #     - sourceObj: /home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut
    #       storePath: /home/abhinav/rnaseq-nf/work/03/23372f156e80deb4d7183c5f509274/ggal_gut
    #       stageName: ggal_gut
    #     - sourceObj: /home/abhinav/rnaseq-nf/work/55/15b60995682daf79ecb64bcbb8e44e/fastqc_ggal_gut_logs
    #       storePath: /home/abhinav/rnaseq-nf/work/55/15b60995682daf79ecb64bcbb8e44e/fastqc_ggal_gut_logs
    #       stageName: fastqc_ggal_gut_logs
    #   config:
    #     - sourceObj: /home/abhinav/rnaseq-nf/multiqc
    #       storePath: /home/abhinav/rnaseq-nf/multiqc
    #       stageName: multiqc
    # # End of task hash info

This could then easily be pulled out and parsed as YAML:
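As a rough illustration of that extraction step, here is a minimal Python sketch. It assumes the marker strings from the example above and that every line of the block is prefixed with `# `. The function name `extract_task_hash_info` and the sample script contents are hypothetical, not part of Nextflow.

```python
START_MARKER = "# Start of task hash info"  # assumed marker, taken from the example above
END_MARKER = "# End of task hash info"      # assumed marker, taken from the example above


def extract_task_hash_info(script_text: str) -> str:
    """Return the YAML text found between the two marker comments."""
    yaml_lines = []
    inside = False
    for line in script_text.splitlines():
        if not line.startswith("#"):
            continue  # only comment lines can belong to the block
        # Remove one level of bash commenting: "# " or a bare "#".
        content = line[2:] if line.startswith("# ") else line[1:]
        if content.strip() == START_MARKER:
            inside = True
        elif content.strip() == END_MARKER:
            break
        elif inside:
            yaml_lines.append(content)
    return "\n".join(yaml_lines)


# Hypothetical file contents, for illustration only.
sample_script = """\
# # Start of task hash info
# container: quay.io/nextflow/rnaseq-nf:v1.1
# inputs:
#   config:
#     - stageName: multiqc
# # End of task hash info
echo "task body"
"""

hash_info_yaml = extract_task_hash_info(sample_script)
print(hash_info_yaml)
```

The returned string is plain YAML, so if PyYAML is installed it could be handed to `yaml.safe_load` to get a dictionary. Keeping the extraction separate from the parsing means the sketch itself only needs the standard library.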
Note: the above is pseudo-code based on the blog post only. The exact structure of the YAML can follow whatever makes most sense given how Nextflow holds this data in memory.
If possible, however, it should be structured and labelled so that it is as human-readable as possible.
Having this info would make debugging failed resumes as simple as a `diff` command:
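The same comparison can be sketched in Python with the standard-library `difflib` module, for cases where a scripted check is more convenient than the command line. This is only an illustration under the assumptions above: the marker strings come from the example, while the helper names and the two sample scripts (including the `v1.2` tag) are hypothetical.

```python
import difflib


def hash_info_lines(script_text):
    """Collect the uncommented lines between the assumed start and end markers."""
    lines, inside = [], False
    for line in script_text.splitlines():
        if not line.startswith("#"):
            continue
        content = line[2:] if line.startswith("# ") else line[1:]
        if content.strip() == "# Start of task hash info":
            inside = True
        elif content.strip() == "# End of task hash info":
            break
        elif inside:
            lines.append(content)
    return lines


def diff_hash_info(first_script, second_script):
    """Return a unified diff of the hash info blocks of two task scripts."""
    return list(difflib.unified_diff(
        hash_info_lines(first_script),
        hash_info_lines(second_script),
        fromfile="first run",
        tofile="resumed run",
        lineterm="",
    ))


# Hypothetical contents of the same task's file from two runs.
first_run = """\
# # Start of task hash info
# container: quay.io/nextflow/rnaseq-nf:v1.1
# stageName: multiqc
# # End of task hash info
"""
resumed_run = first_run.replace("v1.1", "v1.2")

for diff_line in diff_hash_info(first_run, resumed_run):
    print(diff_line)
```

Any line prefixed with `-` or `+` in the output points at a value that changed between the two runs and therefore changed the hash. An empty result means the recorded hash inputs were identical.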