| 1 | +Sample CWL Command Line Tools. |
| 2 | + |
| 3 | +# Testing CWLs |
| 4 | + |
| 5 | +The ```test-files``` directory includes: |
| 6 | +* dm3_chr4.fa - Chromosome 4 of the Drosophila (dm3) genome |
| 7 | +* dm3_chr4.gtf - RefSeq annotation file for chromosome 4 |
| 8 | +* SRR1031972.fastq - A reduced raw reads file (RNA-Seq data, reads from chromosome 4 only) |
| 9 | + |
| 10 | +To test the tools, run them in the order given below, because each step produces the input for the next one. Make ```./workflows/tools``` your current working directory |
| 11 | +and run each command in this form: ```cwltool --basedir ./ ./TOOL.cwl ./jobs/TOOL-job.json```. |
| 12 | + |
| 13 | +## Tools |
| 14 | + |
| 15 | +Indexing genome |
| 16 | +--------------- |
| 17 | + |
| 18 | +The first step is to index the genome with STAR in ```genomeGenerate``` mode. To do that, |
| 19 | +run ```cwltool --basedir ./ ./STAR.cwl ./jobs/STAR-job-index.json```. The dm3 genome index will be placed in the ```./test-files/dm3``` directory. |
| 20 | + |
| 21 | +Output from cwltool: |
| 22 | + |
| 23 | +``` |
| 24 | +/usr/local/bin/cwltool 1.0.20151125221324 |
| 25 | +[job 4532697104] ./workflows/tools$ docker run -i --volume=./workflows/tools/test-files/dm3_chr4.gtf:/tmp/job257401672_test-files/dm3_chr4.gtf:ro --volume=./workflows/tools/test-files/dm3_chr4.fa:/tmp/job257401672_test-files/dm3_chr4.fa:ro --volume=./workflows/tools:/tmp/job_output:rw --volume=/var/folders/hx/3qsmpl9s50zdmn49jb03l_tw0000gn/T/tmpWo01Wd:/tmp/job_tmp:rw --workdir=/tmp/job_output --read-only=true --user=1000 --rm --env=TMPDIR=/tmp/job_tmp --env=PATH=/usr/local/bin/:/usr/bin:/bin scidap/star:v2.5.0a STAR --genomeDir ./test-files/dm3/ --genomeFastaFiles /tmp/job257401672_test-files/dm3_chr4.fa --outBAMcompression 10 --outSAMmode Full --outSAMtype BAM SortedByCoordinate --outStd Log --runMode genomeGenerate --runThreadN 4 --sjdbGTFfile /tmp/job257401672_test-files/dm3_chr4.gtf --sjdbOverhang 100 |
| 26 | +Nov 26 17:57:46 ..... Started STAR run |
| 27 | +Nov 26 17:57:46 ... Starting to generate Genome files |
| 28 | +Nov 26 17:57:46 ... starting to sort Suffix Array. This may take a long time... |
| 29 | +Nov 26 17:57:46 ... sorting Suffix Array chunks and saving them to disk... |
| 30 | +Nov 26 17:57:47 ... loading chunks from disk, packing SA... |
| 31 | +Nov 26 17:57:47 ... Finished generating suffix array |
| 32 | +Nov 26 17:57:47 ... Generating Suffix Array index |
| 33 | +Nov 26 17:57:50 ... Completed Suffix Array index |
| 34 | +Nov 26 17:57:50 ..... Processing annotations GTF |
| 35 | +Nov 26 17:57:50 ..... Inserting junctions into the genome indices |
| 36 | +Nov 26 17:57:58 ... writing Genome to disk ... |
| 37 | +Nov 26 17:57:58 ... writing Suffix Array to disk ... |
| 38 | +Nov 26 17:57:58 ... writing SAindex to disk |
| 39 | +Nov 26 17:58:35 ..... Finished successfully |
| 40 | +Final process status is success |
| 41 | +{ |
| 42 | + "indices": { |
| 43 | + "path": "./test-files/dm3//Genome", |
| 44 | + "size": 1753563, |
| 45 | + "secondaryFiles": [ |
| 46 | + { |
| 47 | + "path": "./test-files/dm3//SA", |
| 48 | + "class": "File" |
| 49 | + }, |
| 50 | + { |
| 51 | + "path": "./test-files/dm3//SAindex", |
| 52 | + "class": "File" |
| 53 | + }, |
| 54 | + { |
| 55 | + "path": "./test-files/dm3//chrNameLength.txt", |
| 56 | + "class": "File" |
| 57 | + }, |
| 58 | + { |
| 59 | + "path": "./test-files/dm3//chrLength.txt", |
| 60 | + "class": "File" |
| 61 | + }, |
| 62 | + { |
| 63 | + "path": "./test-files/dm3//chrStart.txt", |
| 64 | + "class": "File" |
| 65 | + }, |
| 66 | + { |
| 67 | + "path": "./test-files/dm3//geneInfo.tab", |
| 68 | + "class": "File" |
| 69 | + }, |
| 70 | + { |
| 71 | + "path": "./test-files/dm3//sjdbList.fromGTF.out.tab", |
| 72 | + "class": "File" |
| 73 | + }, |
| 74 | + { |
| 75 | + "path": "./test-files/dm3//chrName.txt", |
| 76 | + "class": "File" |
| 77 | + }, |
| 78 | + { |
| 79 | + "path": "./test-files/dm3//exonGeTrInfo.tab", |
| 80 | + "class": "File" |
| 81 | + }, |
| 82 | + { |
| 83 | + "path": "./test-files/dm3//genomeParameters.txt", |
| 84 | + "class": "File" |
| 85 | + }, |
| 86 | + { |
| 87 | + "path": "./test-files/dm3//sjdbList.out.tab", |
| 88 | + "class": "File" |
| 89 | + }, |
| 90 | + { |
| 91 | + "path": "./test-files/dm3//exonInfo.tab", |
| 92 | + "class": "File" |
| 93 | + }, |
| 94 | + { |
| 95 | + "path": "./test-files/dm3//sjdbInfo.txt", |
| 96 | + "class": "File" |
| 97 | + }, |
| 98 | + { |
| 99 | + "path": "./test-files/dm3//transcriptInfo.tab", |
| 100 | + "class": "File" |
| 101 | + } |
| 102 | + ], |
| 103 | + "class": "File", |
| 104 | + "checksum": "sha1$761906d19ceb10a0e2677afdfb756c4f1ca925a1" |
| 105 | + }, |
| 106 | + "aligned": null, |
| 107 | + "mappingstats": null |
| 108 | +} |
| 109 | +``` |
| 110 | + |
| 111 | +Reads alignment |
| 112 | +--------------- |
| 113 | + |
| 114 | +To align the reads, run ```cwltool --basedir ./ ./STAR.cwl ./jobs/STAR-job-rna.json```. |
| 115 | + |
| 116 | +``` |
| 117 | +/usr/local/bin/cwltool 1.0.20151126171959 |
| 118 | +[job 4518994960] ./workflows/tools$ docker run -i --volume=./workflows/tools/test-files/dm3/:/tmp/job958208261_test-files/dm3/:ro --volume=./workflows/tools/test-files/SRR1031972.fastq:/tmp/job958208261_test-files/SRR1031972.fastq:ro --volume=./workflows/tools:/tmp/job_output:rw --volume=/var/folders/hx/3qsmpl9s50zdmn49jb03l_tw0000gn/T/tmptuCn_d:/tmp/job_tmp:rw --workdir=/tmp/job_output --read-only=true --user=1000 --rm --env=TMPDIR=/tmp/job_tmp --env=PATH=/usr/local/bin/:/usr/bin:/bin scidap/star:v2.5.0a STAR --genomeDir /tmp/job958208261_test-files/dm3/ --outBAMcompression 10 --outFileNamePrefix ./test-files/SRR1031972. --outSAMmode Full --outSAMtype BAM SortedByCoordinate --outStd Log --readFilesIn /tmp/job958208261_test-files/SRR1031972.fastq --runMode alignReads --runThreadN 4 |
| 119 | +Nov 26 19:27:58 ..... Started STAR run |
| 120 | +Nov 26 19:27:58 ..... Loading genome |
| 121 | +Nov 26 19:28:08 ..... Started mapping |
| 122 | +Nov 26 19:29:34 ..... Started sorting BAM |
| 123 | +Nov 26 19:29:36 ..... Finished successfully |
| 124 | +Final process status is success |
| 125 | +{ |
| 126 | + "indices": null, |
| 127 | + "aligned": { |
| 128 | + "path": "./workflows/tools/./test-files/SRR1031972.Aligned.sortedByCoord.out.bam", |
| 129 | + "size": 22139153, |
| 130 | + "secondaryFiles": [ |
| 131 | + { |
| 132 | + "path": "./test-files/SRR1031972.Log.final.out", |
| 133 | + "class": "File" |
| 134 | + }, |
| 135 | + { |
| 136 | + "path": "./test-files/SRR1031972.SJ.out.tab", |
| 137 | + "class": "File" |
| 138 | + }, |
| 139 | + { |
| 140 | + "path": "./test-files/SRR1031972.Log.out", |
| 141 | + "class": "File" |
| 142 | + } |
| 143 | + ], |
| 144 | + "class": "File", |
| 145 | + "checksum": "sha1$ba38fcd1f238553d244f339d1147cd591324e207" |
| 146 | + }, |
| 147 | + "mappingstats": "[{\"Started job on \":\"Nov 26 19:27:58\"},{\"Started mapping on \":\"Nov 26 19:28:08\"},{\"Finished on \":\"Nov 26 19:29:36\"},{\"Mapping speed, Million of reads per hour \":\"8.55\"},{\"Number of input reads \":\"209081\"},{\"Average input read length \":\"40\"},{\"Uniquely mapped reads number \":\"64313\"},{\"Uniquely mapped reads % \":\"30.76%\"},{\"Average mapped length \":\"38.19\"},{\"Number of splices: Total \":\"14213\"},{\"Number of splices: Annotated (sjdb) \":\"1640\"},{\"Number of splices: GT/AG \":\"12072\"},{\"Number of splices: GC/AG \":\"299\"},{\"Number of splices: AT/AC \":\"3\"},{\"Number of splices: Non-canonical \":\"1839\"},{\"Mismatch rate per base, % \":\"1.86%\"},{\"Deletion rate per base \":\"0.00%\"},{\"Deletion average length \":\"1.25\"},{\"Insertion rate per base \":\"0.00%\"},{\"Insertion average length \":\"1.03\"},{\"Number of reads mapped to multiple loci \":\"144768\"},{\"% of reads mapped to multiple loci \":\"69.24%\"},{\"Number of reads mapped to too many loci \":\"0\"},{\"% of reads mapped to too many loci \":\"0.00%\"},{\"% of reads unmapped: too many mismatches \":\"0.00%\"},{\"% of reads unmapped: too short \":\"0.00%\"},{\"% of reads unmapped: other \":\"0.00%\"}]" |
| 148 | +} |
| 149 | +``` |
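
As a quick sanity check, the percentages in ```mappingstats``` can be recomputed from the read counts in the same output. The sketch below only redoes that arithmetic; the helper name ```pct``` is introduced here for illustration and is not part of any tool in this repository.

```shell
#!/bin/sh
# Counts copied from the mappingstats output above.
total=209081      # Number of input reads
unique=64313      # Uniquely mapped reads number
multi=144768      # Number of reads mapped to multiple loci

# pct PART WHOLE: print PART as a percentage of WHOLE with two decimals.
# (Hypothetical helper, defined only for this example.)
pct() {
    awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.2f%%\n", 100 * part / whole }'
}

echo "unique: $(pct "$unique" "$total")"   # STAR reports 30.76%
echo "multi:  $(pct "$multi" "$total")"    # STAR reports 69.24%
echo "sum:    $((unique + multi))"         # should equal the number of input reads
```

In this run the two categories add up to the full set of input reads, which matches the 0.00% reported for every unmapped category.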
| 150 | + |
| 151 | +Indexing .bam file |
| 152 | +------------------ |
| 153 | + |
| 154 | +To index the .bam file, run ```cwltool --basedir ./ --outdir ./test-files ./samtools-index.cwl ./jobs/samtools-index-job.json```. |
| 155 | + |
| 156 | +Result: |
| 157 | +```json |
| 158 | +{ |
| 159 | + "sorted": { |
| 160 | + "path": "././test-files/SRR1031972.Aligned.sortedByCoord.out.bam.bai", |
| 161 | + "size": 40528, |
| 162 | + "class": "File", |
| 163 | + "checksum": "sha1$83738ffada23f654ba1f471973c7dccceb14cffc" |
| 164 | + } |
| 165 | +} |
| 166 | +``` |
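
Each result object reports a ```checksum``` of the form ```sha1$<hex digest>```, so an output file can be compared against the value cwltool printed. The following is a minimal sketch, assuming either ```sha1sum``` or ```shasum``` is installed; the function names ```sha1_of``` and ```verify_checksum``` are made up for this example.

```shell
#!/bin/sh
# sha1_of FILE: print the hex SHA-1 digest of FILE.
# Uses sha1sum where available (most Linux systems), otherwise shasum (macOS).
sha1_of() {
    if command -v sha1sum >/dev/null 2>&1; then
        sha1sum "$1" | cut -d ' ' -f 1
    else
        shasum -a 1 "$1" | cut -d ' ' -f 1
    fi
}

# verify_checksum FILE 'sha1$<hex>': succeed if FILE matches the reported checksum.
verify_checksum() {
    expected=${2#sha1\$}          # drop the leading "sha1$" prefix
    [ "$(sha1_of "$1")" = "$expected" ]
}

# Example with the values reported above (only meaningful after the step has run):
# verify_checksum ./test-files/SRR1031972.Aligned.sortedByCoord.out.bam.bai \
#     'sha1$83738ffada23f654ba1f471973c7dccceb14cffc' && echo "checksum OK"
```

Keep the checksum argument in single quotes; inside double quotes the shell would try to expand the text after ```$``` as a variable.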
| 167 | + |
| 168 | +Genome coverage |
| 169 | +--------------- |
| 170 | + |
| 171 | +To create a genome coverage .bedGraph file, run ```cwltool --basedir ./ ./bedtools-genomecov.cwl ./jobs/bedtools-genomecov-job.json```. |
| 172 | + |
| 173 | +Result: |
| 174 | +```json |
| 175 | +{ |
| 176 | + "genomecoverage": { |
| 177 | + "path": "./workflows/tools/./test-files/SRR1031972.bedGraph", |
| 178 | + "size": 1423143, |
| 179 | + "class": "File", |
| 180 | + "checksum": "sha1$dd87be96fc201734c2e5017f86e056b4bb0b2b3f" |
| 181 | + } |
| 182 | +} |
| 183 | +``` |
| 184 | + |
| 185 | +Sort bedGraph |
| 186 | +------------- |
| 187 | + |
| 188 | +To sort the .bedGraph file by its first and second columns, run ```cwltool --basedir ./ --outdir ./test-files ./linux-sort.cwl ./jobs/linux-sort-job.json```. |
| 189 | + |
| 190 | +Result: |
| 191 | +```json |
| 192 | +{ |
| 193 | + "sorted": { |
| 194 | + "path": "././test-files/SRR1031972.bedGraph.sorted", |
| 195 | + "size": 1423143, |
| 196 | + "class": "File", |
| 197 | + "checksum": "sha1$dd87be96fc201734c2e5017f86e056b4bb0b2b3f" |
| 198 | + } |
| 199 | +} |
| 200 | +``` |
| 201 | + |
| 202 | +bedGraph to bigWig |
| 203 | +------------------ |
| 204 | + |
| 205 | +To produce the final .bigWig file, run ```cwltool --basedir ./ ./ucsc-bedGraphToBigWig.cwl ./jobs/ucsc-bedGraphToBigWig-job.json```. |
| 206 | + |
| 207 | +Result: |
| 208 | + |
| 209 | +```json |
| 210 | +{ |
| 211 | + "bigWigOut": { |
| 212 | + "path": "./workflows/tools/./test-files/SRR1031972.bigWig", |
| 213 | + "size": 500098, |
| 214 | + "class": "File", |
| 215 | + "checksum": "sha1$5a0332a2fce2303f135439d377f8b7420878a7b5" |
| 216 | + } |
| 217 | +} |
| 218 | +``` |
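
The six steps above can be chained in a small script. This is a sketch, not part of the repository: it assumes the current directory is ```./workflows/tools``` and that ```cwltool``` is on the ```PATH```. The names ```run_step```, ```run_pipeline``` and the ```DRY_RUN``` variable are introduced here for illustration. By default it only prints the commands; set ```DRY_RUN=0``` to execute them.

```shell
#!/bin/sh
# Default to a dry run so nothing is executed by accident.
DRY_RUN=${DRY_RUN:-1}

# run_step CMD [ARGS...]: print the command in dry-run mode, otherwise run it.
run_step() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# run_pipeline: the documented commands in order; && stops at the first failure,
# since every step needs the output of the previous one.
run_pipeline() {
    run_step cwltool --basedir ./ ./STAR.cwl ./jobs/STAR-job-index.json &&
    run_step cwltool --basedir ./ ./STAR.cwl ./jobs/STAR-job-rna.json &&
    run_step cwltool --basedir ./ --outdir ./test-files ./samtools-index.cwl ./jobs/samtools-index-job.json &&
    run_step cwltool --basedir ./ ./bedtools-genomecov.cwl ./jobs/bedtools-genomecov-job.json &&
    run_step cwltool --basedir ./ --outdir ./test-files ./linux-sort.cwl ./jobs/linux-sort-job.json &&
    run_step cwltool --basedir ./ ./ucsc-bedGraphToBigWig.cwl ./jobs/ucsc-bedGraphToBigWig-job.json
}

run_pipeline
```

Printing the commands first makes it easy to confirm the order and the job files before starting the longer STAR steps.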
| 219 | + |