Commit 42c4360

Merge pull request #2 from NSWPH-Genomics/develop
#1 - Fix GTF 2 BED conv + URLs + h38v41
2 parents 9b43fce + 24e39c8 commit 42c4360

2 files changed: 13 additions, 3 deletions

Makefile

10 additions, 2 deletions
@@ -12,6 +12,8 @@ gencode-hg19: gencode.v19.annotation.genes.id4.bed
 
 gencode-hg38: gencode.v27.annotation.genes.bed
 
+gencode-hg38v41: gencode.v41.annotation.genes.bed
+
 ensembl-hg19: Homo_sapiens.GRCh37.82.chr.bed
 
 ensembl-hg38: Homo_sapiens.GRCh38.91.chr.bed
@@ -35,11 +37,16 @@ gencode.v19.annotation.genes.id4.bed: gencode.v19.annotation.genes.bed
 # ~~~~~ GENCODE hg38 ~~~~~ #
 # generate the Gencode hg38 annotations .bed file
 gencode.v27.annotation.gtf.gz:
-	wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_27/gencode.v27.annotation.gtf.gz
+	wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_27/gencode.v27.annotation.gtf.gz
 
 gencode.v27.annotation.genes.bed: gencode.v27.annotation.gtf.gz
-	zcat gencode.v27.annotation.gtf.gz | grep -w gene | convert2bed --input=gtf - > gencode.v27.annotation.genes.bed
+	zcat gencode.v27.annotation.gtf.gz | grep -w gene | awk '{ if ($$0 ~ "transcript_id") print $$0; else print $$0" transcript_id \"\";"; }' | convert2bed --input=gtf - > gencode.v27.annotation.genes.bed
+
+gencode.v41.annotation.gtf.gz:
+	wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/gencode.v41.annotation.gtf.gz
 
+gencode.v41.annotation.genes.bed: gencode.v41.annotation.gtf.gz
+	zcat gencode.v41.annotation.gtf.gz | grep -w gene | awk '{ if ($$0 ~ "transcript_id") print $$0; else print $$0" transcript_id \"\";"; }' | convert2bed --input=gtf - > gencode.v41.annotation.genes.bed
 
 
 # ~~~~~ ENSEMBL hg19 ~~~~~ #
@@ -102,6 +109,7 @@ Mus_musculus.GRCm38.91.chr.bed: Mus_musculus.GRCm38.91.chr.gtf
 Homo_sapiens.GRCh37.82.noGLMT.chr.bed \
 Homo_sapiens.GRCh37.82.noGLMT.chr.gtf \
 gencode.v27.annotation.gtf.gz \
+gencode.v41.annotation.gtf.gz \
 Homo_sapiens.GRCh38.91.chr.gtf \
 Homo_sapiens.GRCh38.91.chr.gtf.gz \
 Homo_sapiens.GRCh37.82.chr.gtf \
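
Note on the `awk` step added to the GENCODE recipes: it passes through any record that already mentions `transcript_id` and appends an empty `transcript_id "";` attribute to the rest, presumably because `convert2bed --input=gtf` rejects gene records that lack that attribute. The sketch below runs the same filter outside of `make` on made-up sample records (not real GENCODE lines). The helper name `add_empty_transcript_id` is hypothetical, and `convert2bed` itself is left out so the example needs only `grep` and `awk`.

```shell
#!/bin/sh
# Sketch only: the three records below are invented sample data.

# Hypothetical helper wrapping the awk filter from the Makefile recipe.
# In a shell script awk's $0 takes a single "$"; the Makefile writes "$$0"
# only to escape the dollar sign from make.
add_empty_transcript_id() {
    awk '{ if ($0 ~ "transcript_id") print $0; else print $0" transcript_id \"\";"; }'
}

# Two gene records (one without transcript_id, one with) and one transcript record.
sample_gtf() {
    printf 'chr1\tHAVANA\tgene\t100\t200\t.\t+\t.\tgene_id "G1"; gene_name "A";\n'
    printf 'chr1\tHAVANA\tgene\t300\t400\t.\t-\t.\tgene_id "G2"; transcript_id "G2"; gene_name "B";\n'
    printf 'chr1\tHAVANA\ttranscript\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
}

# Same pipeline shape as the recipe, minus zcat and convert2bed:
# keep lines containing the whole word "gene", then pad missing attributes.
result=$(sample_gtf | grep -w gene | add_empty_transcript_id)
printf '%s\n' "$result"
```

Only the two gene records survive `grep -w gene`; the first gains a trailing `transcript_id "";` and the second is printed unchanged.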

README.md

3 additions, 1 deletion
@@ -12,7 +12,7 @@ cd reference-annotations
 
 Generate the desired annotation files from the available entries:
 
-- `all`, `gencode-hg19`, `gencode-hg38`, `ensembl-hg19`, `ensembl-hg38`, `ensembl-mm10`
+- `all`, `gencode-hg19`, `gencode-hg38`, `gencode-hg38v41`, `ensembl-hg19`, `ensembl-hg38`, `ensembl-mm10`
 
 ```
 make all
@@ -36,6 +36,8 @@ The following files are created:
 
 - `gencode-hg38`: `gencode.v27.annotation.genes.bed`; Gencode hg38 human gene annotations & genomic regions
 
+- `gencode-hg38v41`: `gencode.v41.annotation.genes.bed`; Gencode hg38 human gene annotations version 41 & genomic regions
+
 - `ensembl-hg19`: `Homo_sapiens.GRCh37.82.chr.bed`; Ensembl hg19 human gene annotations & genomic regions
 
 - `ensembl-hg38`: `Homo_sapiens.GRCh38.91.chr.bed`; Ensembl hg38 human gene annotations & genomic regions
