Nextflow unable to check for existence of task outputs under fusion symlinks #5948

Open
robsyme opened this issue Apr 5, 2025 · 0 comments

## Bug report

With the Fusion-specific path resolution moved out of Nextflow, it becomes difficult for Nextflow to check for the presence of task outputs when those outputs are nested inside a symlink.

### Expected behavior and actual behavior

Given a process that expects output:

input: path("example")
output: path("example/*.dat", includeInputs: true)

and a task work directory:

/fusion/s3/my-bucket/work/aa/aaaaaaaa
└── example <actually a fusion symlink>
    └── my-task-output.dat

When Nextflow finalizes the task, it needs to check that the dat file exists. However, Nextflow can see only the directory symlink and, being unable to recurse into it, cannot observe the task output dat file:

$ aws s3 ls s3://my-bucket/work/aa/aaaaaaaa/
.fusion.symlinks
example

This leads to a Nextflow error:

```
Missing output file(s) `example/*.dat` expected by process
```

Nextflow needs to be able to identify and recurse into Fusion symlinks when checking for task outputs.
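
As a rough local-filesystem analogy (not Nextflow's or Fusion's actual implementation), the Python sketch below shows how an output scan that does not descend into symlinked directories misses a file that a link-following scan finds. It uses an ordinary POSIX symlink as a stand-in; how Fusion records links on object storage is not modelled. The names `find_outputs` and `build_demo` are hypothetical, and `fnmatch` is looser than Nextflow's glob matching because its `*` also matches `/`.

```python
import fnmatch
import os
import tempfile


def find_outputs(workdir, pattern, follow_links):
    """Return '/'-separated paths relative to workdir that match pattern.

    Hypothetical helper: illustrates the scan, not Nextflow's own logic.
    """
    matches = []
    for root, _dirs, files in os.walk(workdir, followlinks=follow_links):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), workdir)
            rel = rel.replace(os.sep, "/")
            if fnmatch.fnmatch(rel, pattern):
                matches.append(rel)
    return sorted(matches)


def build_demo():
    """Create a staged input directory and a work dir that symlinks to it."""
    base = tempfile.mkdtemp()
    staged = os.path.join(base, "staged", "example")
    workdir = os.path.join(base, "work")
    os.makedirs(staged)
    os.makedirs(workdir)
    with open(os.path.join(staged, "my-task-output.dat"), "w") as fh:
        fh.write("data\n")
    # The work dir contains only a link named "example", as in the report.
    os.symlink(staged, os.path.join(workdir, "example"), target_is_directory=True)
    return workdir


workdir = build_demo()

# Without following links the walk never enters "example", so nothing matches.
without_links = find_outputs(workdir, "example/*.dat", follow_links=False)

# Following links lets the walk reach the file behind the symlink.
with_links = find_outputs(workdir, "example/*.dat", follow_links=True)

print(without_links)
print(with_links)
```

The first scan corresponds to the failure described above, where only the link itself is visible. The second corresponds to the requested behaviour of recursing into the link before matching the output pattern.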

### Steps to reproduce the problem

Example workflow:

```nextflow
workflow {
    Create() 
    | Consume
}

process Create {
    output: path("one")
    script: "mkdir -p one/two && echo 'Hello world!' > one/two/hello.txt"
}

process Consume {
    input: path("one")
    output: path("**/two/*.txt", includeInputs: true)
    script: ":"
}
```

When running in a non-Fusion environment:

```
 N E X T F L O W   ~  version 25.01.0-edge

Launching `./main.nf` [infallible_goldberg] DSL2 - revision: 32c3e38b0d

executor >  local (2)
[98/2b0c5e] process > Create  [100%] 1 of 1 ✔
[a5/13372d] process > Consume [100%] 1 of 1 ✔
```

When running on Fusion:

```
N E X T F L O W  ~  version 24.10.4
Pulling robsyme/nf-test ...
downloaded from https://github.com/robsyme/nf-test.git
Launching `https://github.com/robsyme/nf-test` [friendly_hawking] DSL2 - revision: 166c64d74d [includeInputs]
Monitor the execution with Seqera Platform using this URL: https://cloud.stage-seqera.io/orgs/seqeralabs/workspaces/scidev-aws/watch/104f9XF0jLNlzY
[27/acba93] Submitted process > Create
[10/fa7c4a] Submitted process > Consume
ERROR ~ Error executing process > 'Consume'
Caused by:
  Missing output file(s) `**/two/*.txt` expected by process `Consume`
Command executed:
  :
Command exit status:
  0
Command output:
  (empty)
Work dir:
  s3://scidev-playground-eu-west-2/scratch/104f9XF0jLNlzY/10/fa7c4ad4038fe112ab197272eef95d
Container:
  wave.stage-seqera.io/wt/8b72dbf392d2/nextflow/bash:latest
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
-- Check 'nf-104f9XF0jLNlzY.log' file for details
```
