Include metagenomics mini-course #577

Open: wants to merge 142 commits into `master`
142 commits by jeffe107 (Mar 19-23, 2025), creating and iterating on `index.md`, `00_orientation.md`, `01_pipeline.md`, `02_multi-sample.md`, the `modules/` files, `nextflow.config`, `workflow.nf` and the test data, followed by merge commit e6ad134 ("Merge branch 'master' into master") by adamrtalbot on May 2, 2025.
Binary file added docs/assets/img/workflow_kraken.png
92 changes: 92 additions & 0 deletions docs/nf4_science/metagenomics/00_orientation.md
@@ -0,0 +1,92 @@
# Orientation

The training environment contains all the software, code and data necessary to work through this training course, so you don't need to install anything yourself.
However, you do need a (free) account to log in, and you should take a few minutes to familiarize yourself with the interface.

If you have not yet done so, please complete the [Environment Setup](../../envsetup/) mini-course before going any further.

## Materials provided

For this course we'll be working in the `nf4-science/metagenomics/` directory, so let's move into it; it contains all the code files, test data and accessory files you will need. Run the command:

```bash
cd nf4-science/metagenomics/
```

There are some files we need to download, as they are too large to be stored permanently in the GitHub repository: the database required by both Kraken2 and Bracken. Run the following commands in this exact order and wait until all of them have finished:

```bash
mkdir -p data/viral_db && cd "$_"
wget --no-check-certificate --no-proxy 'https://genome-idx.s3.amazonaws.com/kraken/k2_viral_20241228.tar.gz'
tar -xvzf k2_viral_20241228.tar.gz
rm -r k2_viral_20241228.tar.gz
cd -
```
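
Before moving on, it's worth checking that the extraction worked. The exact contents vary by release, but a Kraken2 database directory typically contains the files `hash.k2d`, `opts.k2d` and `taxo.k2d`; the small helper below (a hypothetical convenience function, not part of the course materials) simply verifies that they are present:

```bash
# Hypothetical sanity check: a Kraken2 database directory normally ships
# with hash.k2d (the k-mer/LCA map), opts.k2d and taxo.k2d (the taxonomy).
check_k2_db() {
  db="$1"
  for f in hash.k2d opts.k2d taxo.k2d; do
    [ -e "$db/$f" ] || { echo "missing $f"; return 1; }
  done
  echo "ok"
}

check_k2_db data/viral_db || echo "database incomplete - re-run the download"
```

If the check prints `ok`, you are good to go.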

Now, let's take a look at the files contained in the directory with the command:

```bash
tree . -L 3
```

Here you should see the following directory structure:

```console title="Directory contents"
.
├── bin
│ └── report.Rmd
├── data
│ ├── samples
│ │ ├── ERR2143768
│ │ │ ├── ERR2143768_1.fastq
│ │ │ └── ERR2143768_2.fastq
│ │ ├── ERR2143769
│ │ │ ├── ERR2143769_1.fastq
│ │ │ └── ERR2143769_2.fastq
│ │ ├── ERR2143770
│ │ │ ├── ERR2143770_1.fastq
│ │ │ └── ERR2143770_2.fastq
│ │ └── ERR2143774
│ │ ├── ERR2143774_1.fastq
│ │ └── ERR2143774_2.fastq
│ ├── samplesheet.csv
│ └── yeast
│ ├── yeast.1.bt2
│ ├── yeast.2.bt2
│ ├── yeast.3.bt2
│ ├── yeast.4.bt2
│ ├── yeast.rev.1.bt2
│ └── yeast.rev.2.bt2
├── main.nf
├── modules
│ ├── bowtie2.nf
│ ├── bracken.nf
│ ├── kReport2Krona.nf
│ ├── knit_phyloseq.nf
│ ├── kraken2.nf
│ ├── kraken_biom.nf
│ └── ktImportText.nf
├── nextflow.config
└── workflow.nf
```

You should be back at the `nf4-science/metagenomics/` directory.

!!!note

Don't panic. This is just a glimpse of the material, and we are going to dig into each necessary file for the analysis.

**This is a summarized description of the files and directories found:**

- **`main.nf`** is the file we invoke with the world-famous `nextflow run` command.
- **`workflow.nf`** is where all the magic happens: it defines the order in which tasks are executed and how data is handled.
- **`nextflow.config`**: you already know what this file does, right? Just in case: it lets us manage different directives for workflow execution.
- **`modules`** is a really important folder, since it contains one dedicated file per process of the pipeline.
- **`bin`** is the directory where we store customized scripts that can be run within a given process.
- **`data`** contains input data and related resources:
- An indexed genome within the `yeast` folder representing the host genome to which we want to map the reads for contamination removal.
- _viral_db_ is a directory that contains the Kraken2 database necessary for both taxonomic annotation and species abundance re-estimation.
- _samplesheet.csv_ lists the IDs and paths of the example data files, for processing in batches.
- The _samples_ directory is where the raw sequences are stored. The names correspond to accession numbers that you can look up in the [Sequence Read Archive](https://www.ncbi.nlm.nih.gov/sra).

Now, to begin the course, click on the arrow in the bottom right corner of this page.
369 changes: 369 additions & 0 deletions docs/nf4_science/metagenomics/01_pipeline.md

Large diffs are not rendered by default.

204 changes: 204 additions & 0 deletions docs/nf4_science/metagenomics/02_multi-sample.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,204 @@
# Part 2: Process parallelization and multi-sample analysis

In this part, we will build on the pipeline structure from Part 1 and extend it to:

1. Analyze multiple samples in a single run
2. Use a Nextflow operator
3. Control the execution of the workflow according to the input
4. Include a process that runs a customized script

---

## 1. Multi-sample input

With our shiny, brand-new pipeline we can currently analyze each sample individually by running the workflow multiple times. Nonetheless, one of the most powerful capabilities of Nextflow is its native parallel execution, scaled according to the resources the executor finds. You can think of this as a sort of "integrated _for_ loop" that processes all the samples in parallel in a single run, without re-running the pipeline.

To achieve this, there are two possibilities:

* Use wildcards in the input (this can be tricky and requires taking particular folder structures into account).
* Create a file that points to the sample files regardless of their location in the file system.

In this course we will target the second option, although you are welcome to explore the first by checking the [Nextflow documentation](https://www.nextflow.io/docs/latest/working-with-files.html).

To move forward, let's create the file `samplesheet.csv` inside the folder **data**:

```csv title="data/samplesheet.csv" linenums="1"
sample_id,fastq_1,fastq_2
ERR2143768,/workspaces/training/nf4-science/metagenomics/data/samples/ERR2143768/ERR2143768_1.fastq,/workspaces/training/nf4-science/metagenomics/data/samples/ERR2143768/ERR2143768_2.fastq
ERR2143769,/workspaces/training/nf4-science/metagenomics/data/samples/ERR2143769/ERR2143769_1.fastq,/workspaces/training/nf4-science/metagenomics/data/samples/ERR2143769/ERR2143769_2.fastq
ERR2143770,/workspaces/training/nf4-science/metagenomics/data/samples/ERR2143770/ERR2143770_1.fastq,/workspaces/training/nf4-science/metagenomics/data/samples/ERR2143770/ERR2143770_2.fastq
ERR2143774,/workspaces/training/nf4-science/metagenomics/data/samples/ERR2143774/ERR2143774_1.fastq,/workspaces/training/nf4-science/metagenomics/data/samples/ERR2143774/ERR2143774_2.fastq
```

Here, we have provided the `sample_id` and the absolute paths to the forward and reverse reads of each sample. Please notice that the files are not required to be stored in this particular directory; however, it is recommended to maintain a consistent folder structure.
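
Typing those long absolute paths by hand is error-prone. Since the layout is regular, you can also generate the samplesheet content with a small shell loop (a sketch assuming the directory structure shown in the Orientation section):

```bash
# Build the samplesheet rows from the sample IDs and a common base path.
base="/workspaces/training/nf4-science/metagenomics/data/samples"
sheet="sample_id,fastq_1,fastq_2"
for id in ERR2143768 ERR2143769 ERR2143770 ERR2143774; do
  sheet="${sheet}
${id},${base}/${id}/${id}_1.fastq,${base}/${id}/${id}_2.fastq"
done
echo "$sheet"   # redirect to data/samplesheet.csv to write the file
```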

Now, we cannot use this file as input in the current state of the pipeline, given that it expects only a path from which to create a paired-end channel. Let's then include an additional parameter (we can use any name, be creative!) in the `nextflow.config` file (notice that it goes inside the parameters block, keeping the same structure):

```groovy title="nextflow.config" linenums="10"
sheet_csv = null
```

We initialize this parameter as `null` since it may or may not be used. Now, we need to modify the `main.nf` file to state how the input should be handled depending on its type:

```groovy title="main.nf" linenums="22"
if (params.reads) {
    reads_ch = Channel.fromFilePairs( params.reads, checkIfExists: true )
} else {
    reads_ch = Channel.fromPath( params.sheet_csv )
        .splitCsv( header: true )
        .map { row -> tuple(row.sample_id, [file(row.fastq_1), file(row.fastq_2)]) }
}
```

This modified declaration states that if we use the `--reads` parameter when invoking `nextflow run main.nf`, the _reads_ channel will be created using only the path to the paired-end files. Otherwise, we must include the `--sheet_csv` parameter with the corresponding file containing the sample information. One of the two forms of input is therefore required: if both are used at the same time, `--reads` takes precedence, and if neither is provided, the pipeline will fail. Do not worry for now about the way the channel is created from the `.csv` file; this declaration is quite standard and you can copy and paste it into other pipelines where you would like to use it. You can learn more about it [here](https://nextflow-io.github.io/patterns/process-per-csv-record/).
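
To see what the samplesheet branch actually produces, here is the same logic sketched in plain shell with made-up sample names: `splitCsv(header:true)` reads each data row into named fields, and `map` turns each row into a tuple of the sample ID plus its two read files:

```bash
# Plain-shell sketch of: splitCsv(header:true) | map { row -> tuple(id, [r1, r2]) }
csv='sample_id,fastq_1,fastq_2
s1,/tmp/s1_1.fastq,/tmp/s1_2.fastq
s2,/tmp/s2_1.fastq,/tmp/s2_2.fastq'

tuples=$(echo "$csv" | tail -n +2 | while IFS=, read -r id r1 r2; do
  echo "$id [$r1, $r2]"   # one channel element per sample
done)
echo "$tuples"
```

Each printed line mimics one element flowing through `reads_ch`.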

Now we are ready to re-run the pipeline and process all the samples in a single call. Furthermore, including additional samples has the advantage that we can expand the analysis to estimate β-diversity and compare the samples to extract important insights.

---

## 2. Additional processes

### 2.1 Kraken-biom

Let's create a new module that handles the Bracken output to produce a Biological Observation Matrix (BIOM) file concatenating the species abundances of each sample. The `kraken_biom.nf` file will be located in the **modules** directory:

```groovy title="modules/kraken_biom.nf" linenums="1"
process KRAKEN_BIOM {
    tag "merge_samples"
    publishDir "$params.outdir", mode: 'copy'
    container "community.wave.seqera.io/library/kraken-biom:1.2.0--f040ab91c9691136"

    input:
    val files

    output:
    path "merged.biom"

    script:
    """
    list=(${files.join(' ')})
    extracted=\$(echo "\${list[@]}" | tr ' ' '\n' | awk 'NR % 3 == 2')
    kraken-biom \${extracted} --fmt json -o merged.biom
    """
}
```
```

This process will _collect_ the per-sample Bracken outputs to build a single `*.biom` file containing the species abundance data of all the samples. In the `script` block there are three tasks to execute: the first two lines perform the variable manipulation required to handle the type of input this process receives (more about this when modifying `workflow.nf` below), and the third executes the kraken-biom command, available thanks to the specified container.

#### 2.1.1 Operator _collect()_ and conditional execution

Nextflow provides a large number of operators that smooth data handling and orchestrate the workflow to do exactly what we want. In this case, the process `KRAKEN_BIOM` requires the files produced by Bracken for every sample, which means that `KRAKEN_BIOM` cannot be triggered until all Bracken processes have finished. For this task, the operator _collect()_ comes in really handy, so let's include it in our `workflow.nf`... but wait! Recall that `KRAKEN_BIOM` and the following `KNIT_PHYLOSEQ` should only be triggered when the execution aims at processing more than one sample. We will therefore include these processes and add a conditional statement to the workflow execution in `workflow.nf`:

```groovy title="workflow.nf" linenums="9"
include { KRAKEN_BIOM } from './modules/kraken_biom.nf'
```

```groovy title="workflow.nf" linenums="29"
if (params.sheet_csv) {
    KRAKEN_BIOM( BRACKEN.out.collect() )
}
```

Here, you can see that we have added the operator _collect()_ to capture all the output files from `BRACKEN`, and that this happens only when the input is given as `--sheet_csv`. This operator returns a list of the elements specified in the output of the process (`BRACKEN`); in this case, we are interested in every "second" element of the list (indices 1, 4, 7, ...) to run the _kraken-biom_ command, which is why the `script` block in `kraken_biom.nf` includes two lines of code to obtain the paths to these files. If this is not entirely clear, please check the [Nextflow documentation](https://www.nextflow.io/docs/latest/reference/operator.html#collect).
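
The index arithmetic is easier to see on a toy list. Suppose, hypothetically, that each Bracken task emits three files per sample; after `collect()` they arrive as one flattened list, and `NR % 3 == 2` keeps the second file of every triple (the file names below are made up for illustration):

```bash
# Toy version of the extraction inside KRAKEN_BIOM's script block.
# Six collected outputs = three per sample; keep the 2nd of each triple.
list="s1.report s1.bracken s1.krona s2.report s2.bracken s2.krona"
extracted=$(echo "$list" | tr ' ' '\n' | awk 'NR % 3 == 2')
echo $extracted
```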

### 2.2 Phyloseq

#### 2.2.1 Including a customized script

We are at the last step of the pipeline, where we need to process the `*.biom` file by transforming it into a Phyloseq object, which is easier to use, more intuitive to understand, and equipped with multiple tools and methods for plotting. Another amazing feature of Nextflow is the possibility to run so-called _scripts à la carte_: a process does not necessarily require an external tool to execute, so you can develop your own analysis with customized scripts, e.g., in R or Python. Here, we will run an R script inside the module `knit_phyloseq.nf` to create and process the Phyloseq object, taking as input the output from `kraken_biom.nf`:

```groovy title="modules/knit_phyloseq.nf" linenums="1"
process KNIT_PHYLOSEQ {
    tag "knit_phyloseq"
    publishDir "$params.outdir", mode: 'copy'
    container "community.wave.seqera.io/library/bioconductor-phyloseq_knit_r-base_r-ggplot2_r-rmdformats:6efceb52eb05eb44"

    input:
    path merged

    output:
    stdout

    script:
    def report = params.report
    def outdir = params.outdir
    """
    biom_path=\$(realpath ${merged})
    outreport=\$(realpath ${outdir})
    Rscript -e "rmarkdown::render('${report}', params=list(args='\${biom_path}'), output_file='\${outreport}/report.html')"
    """
}
```
```

As you can see, we are declaring some variables, both in Nextflow and in bash, to be able to call the script. This is a special case: scripts of this type can be stored in the **bin** directory for Nextflow to find them directly. Nevertheless, since we are not running the script directly but calling `Rscript` to render a final `*.html` report, Nextflow can neither find the customized script automatically nor detect when the report has been rendered. As a result, the output of this process is just the standard command-line output, and we have to include an additional parameter in the `nextflow.config` file:

```groovy title="nextflow.config" linenums="11"
report = "/workspaces/training/nf4-science/metagenomics/bin/report.Rmd"
```

In addition, please notice the `container` used for `KNIT_PHYLOSEQ`, which is a combination of the multiple packages required to render the `*.html` report. This is possible thanks to an awesome tool called [Seqera Containers](https://seqera.io/containers/), which can build almost any container (for Docker or Singularity!) by just "merging" different PyPI or Conda packages; please give it a try and be amazed by Seqera Containers.

Also, we have to include this new process within `workflow.nf`:

```groovy title="workflow.nf" linenums="10"
include { KNIT_PHYLOSEQ } from './modules/knit_phyloseq.nf'
```

We also need to call it inside the conditional block that handles multi-sample execution:

```groovy title="workflow.nf" linenums="31"
KNIT_PHYLOSEQ(KRAKEN_BIOM.out)
```

---

## 3. Execution

Now we are completely set to run the analysis for as many samples as we would like, and we will obtain a final report depicting different metrics of taxonomic abundance, network analysis, and α- and β-diversity. Let's execute:

```bash
nextflow run main.nf --sheet_csv 'data/samplesheet.csv'
```

On the command line, you will see output like this:

```console title="Output"
N E X T F L O W ~ version 24.10.4

Launching `main.nf` [stoic_miescher] DSL2 - revision: 8f65b983e6

[ASCII art banner]

executor > local (22)
[4e/914152] kraken2Flow:BOWTIE2 (ERR2143774) [100%] 4 of 4 ✔
[bf/7fcac7] kraken2Flow:KRAKEN2 (ERR2143774) [100%] 4 of 4 ✔
[f5/aa12aa] kraken2Flow:BRACKEN (ERR2143774) [100%] 4 of 4 ✔
[e9/84eb9d] kraken2Flow:K_REPORT_TO_KRONA (ERR2143774) [100%] 4 of 4 ✔
[59/456551] kraken2Flow:KT_IMPORT_TEXT (ERR2143768) [100%] 4 of 4 ✔
[da/7b9f45] kraken2Flow:KRAKEN_BIOM (merge_samples) [100%] 1 of 1 ✔
[d0/deccc9] kraken2Flow:KNIT_PHYLOSEQ (knit_phyloseq) [100%] 1 of 1 ✔
```

Keep in mind that since the execution is parallel, the order in which the samples are processed is random, and the order in which the `sample_id`s appear will differ between executions. Also, while the pipeline is running you will see that `KRAKEN_BIOM`, and hence `KNIT_PHYLOSEQ`, is not triggered until all the samples have been processed by the preceding steps.

Finally, inside the **output** directory you will see one folder per `sample_id`, each containing all the output files for that sample, including the files needed to visualize the Krona plots. Likewise, in the **output** folder you will find the file `report.html`, ready to be opened and explored. It's your turn to analyze it!

---

### Takeaway

You just learnt how to control workflow execution with conditionals and operators, process multiple samples simultaneously, and run a customized script to perform a metagenomics data analysis at the read level.

### What's next?

Great! You are now well equipped to start developing your own pipelines.