Develop #2462

Closed
wants to merge 61 commits
Commits (61), showing changes from all commits
cbfa3f1
TrimGdel folder is added to ~/src/design/
MetNetComp Feb 10, 2025
7b35ad2
TrimGdel folder is added to ~/src/design/
MetNetComp Feb 10, 2025
7f21015
TrimGdel folder is added to ~/src/design/
MetNetComp Feb 10, 2025
6aa81b9
update Function Docs (Automatic Workflow)
rmtfleming Mar 5, 2025
9415d7c
local changes post test
rmtfleming Mar 5, 2025
61026c5
Merge branch 'develop' of github.com:rmtfleming/cobratoolbox into dev…
rmtfleming Mar 5, 2025
d0f82d4
COBRAmodels:fire:code headers:art: add tutorials
MetNetComp Mar 7, 2025
ea515a7
Merge pull request #2455 from opencobra/master
farid-zare Mar 11, 2025
a5bf6a3
update Function Docs (Automatic Workflow)
farid-zare Mar 11, 2025
90de3c8
Minor bugfixes for Persephone
bramnap Mar 12, 2025
b967fd3
Merge pull request #2456 from bramnap/develop
farid-zare Mar 13, 2025
1f5cfa1
PSCM toolbox loadPSCMfile function bug fix and merge conflict resolve
trjhensen Mar 19, 2025
eccd899
Merge pull request #2459 from trjhensen/develop
farid-zare Mar 20, 2025
8446ab9
Update metanetxMapper.m
farid-zare Mar 24, 2025
c32c694
Merge pull request #2460 from farid-zare/fix_metanetxmapper
farid-zare Mar 24, 2025
c6f9826
Merge branch 'opencobra:develop' into develop
bramnap Mar 27, 2025
a99c313
minor bugfix persephone
bramnap Mar 27, 2025
4bf1ceb
add gitkeep files to have default folders in seqc
bramnap Mar 27, 2025
f3d496a
bugfix
bramnap Mar 27, 2025
1b2bebe
Merge branch 'develop' of https://github.com/bramnap/cobratoolbox int…
bramnap Mar 27, 2025
bc622d1
minor fix to analyseWBMsol for gf iWBM scaling
bramnap Mar 28, 2025
5082c58
Merge pull request #2461 from bramnap/develop
farid-zare Mar 31, 2025
bb37ba4
Update the tutorials folder
farid-zare Mar 31, 2025
9456446
Merge pull request #2458 from Yanjun2021/debug_createDummyModel
rmtfleming Apr 1, 2025
aa9bbd0
bugFix initPersephone
bramnap Apr 4, 2025
65441d3
generateRules
farid-zare Apr 4, 2025
57ba95a
Update generateRules
farid-zare Apr 4, 2025
f34b112
efba improvements
rmtfleming Apr 5, 2025
322c50a
Resolved merge conflict by accepting remote changes for conflicting f…
rmtfleming Apr 5, 2025
2b1100b
add removeGeneVersions.m
farid-zare Apr 7, 2025
a69acf7
Add test for removeGeneVersions
farid-zare Apr 7, 2025
f94bafb
Add testBuildGrRules
farid-zare Apr 7, 2025
a041fa2
Merge pull request #2466 from farid-zare/master
farid-zare Apr 7, 2025
9c073ae
Update COBRAModelFields.md
farid-zare Apr 7, 2025
680f8f7
Update COBRAModelFields.md
farid-zare Apr 7, 2025
71c5e9d
Merge pull request #2468 from farid-zare/develop
farid-zare Apr 7, 2025
bf1abad
Bug fix optimizeWBModel
trjhensen Apr 8, 2025
ecd7638
Merge pull request #2469 from trjhensen/develop
farid-zare Apr 8, 2025
7d2ce97
Merge pull request #2464 from bramnap/develop
farid-zare Apr 8, 2025
c04bdbc
testTrimGdel.m is removed.
MetNetComp Apr 9, 2025
28b1d61
Feature update Persephone pipeline for ensureWBMfeasibility function
trjhensen Apr 9, 2025
4b179b0
Accept xlsx and csv
Apr 9, 2025
e218895
Merge pull request #2470 from trjhensen/develop
farid-zare Apr 9, 2025
94801c3
Merge pull request #2471 from annasheehy/develop
farid-zare Apr 10, 2025
96e7677
Overhaul of nutrition toolbox
bramnap May 7, 2025
7340dad
Merge pull request #2427 from MetNetComp/develop
rmtfleming May 8, 2025
0f52913
Merge pull request #2474 from bramnap/develop
farid-zare May 9, 2025
64a090d
Bugfixes for nutrition toolbox
bramnap May 13, 2025
6580c2a
critical fixes
TheWileyB May 13, 2025
cf3babb
Added support for mixed effect regressions in Persephone and fixed sp…
trjhensen May 14, 2025
8c18e79
Merge pull request #2476 from bramnap/develop
farid-zare May 15, 2025
17bc40c
Merge pull request #2478 from trjhensen/develop
farid-zare May 15, 2025
1d87927
Merge pull request #2477 from TheWileyB/develop
farid-zare May 15, 2025
6487c8c
Update saved filenames in vmhFoodFinder.m
bramnap May 19, 2025
177580d
Merge pull request #2479 from bramnap/develop
farid-zare May 20, 2025
67ff29e
Delete docs directory
farid-zare May 20, 2025
b65bac0
Merge pull request #2480 from farid-zare/develop
farid-zare May 20, 2025
48fc036
Delete docs directory
farid-zare May 20, 2025
74ddb0d
Merge pull request #2465 from rmtfleming/develop
farid-zare May 21, 2025
498d3ba
Bug Fixes for Nutrition Toolbox
CCThinnes May 21, 2025
7d02aa4
Merge pull request #2481 from CCThinnes/develop
farid-zare May 22, 2025
166 changes: 83 additions & 83 deletions documentation/source/notes/COBRAModelFields.md

Large diffs are not rendered by default.

8 changes: 4 additions & 4 deletions src/analysis/persephone/SeqC_pipeline/.dockerignore
@@ -1,5 +1,5 @@
-**/*.log
-Dockerfile
-.git
-.gitignore
+**/*.log
+Dockerfile
+.git
+.gitignore
 .env
10 changes: 5 additions & 5 deletions src/analysis/persephone/SeqC_pipeline/.gitignore
@@ -1,8 +1,8 @@
-/seqc_proc/{DEPO_demo,DEPO_proc,REPO_host,REPO_tool}/
-/seqc_input/
-/seqc_output/
+/seqc_proc/{DEPO_demo,DEPO_proc,REPO_host,REPO_tool}/*
+/seqc_input/*
+/seqc_output/*
 .DS_Store
-#/DB/REPO_tool/kraken/taxonomy/nucl_{gb,wgs}.accession2taxid.gz
-#/DB/REPO_host
+# Retain core directories
+!/seqc_{input,output,proc}/.gitkeep
 #echo .DS_Store >> ~/.gitignore_global
 #git config --global core.excludesfile ~/.gitignore_global
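The `.gitignore` change above uses a common idiom: ignore a directory's contents with `/dir/*` while whitelisting a `.gitkeep` placeholder, so the otherwise-empty directories survive a fresh clone. A minimal sketch of creating the placeholders (demo paths assumed):

```shell
# create the placeholder files that the negated .gitignore rule keeps tracked
mkdir -p seqc_demo/seqc_input seqc_demo/seqc_output seqc_demo/seqc_proc
touch seqc_demo/seqc_input/.gitkeep
touch seqc_demo/seqc_output/.gitkeep
touch seqc_demo/seqc_proc/.gitkeep
ls seqc_demo/seqc_input
```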
100 changes: 54 additions & 46 deletions src/analysis/persephone/SeqC_pipeline/BASH_seqc_makedb.sh
@@ -3,7 +3,7 @@
# Title: make pipeline DBs
# Program by: Wiley Barton - 2022.02.07
# Modified for conda/docker pipeline - 2024.02.22
# last update - 2025.02.10
# last update - 2025.05.05
# Modified code sources:
# check volume size: https://stackoverflow.com/questions/8110530/check-free-disk-space-for-current-partition-in-bash
# semi-array in env var: https://unix.stackexchange.com/questions/393091/unable-to-use-an-array-as-environment-variable
@@ -22,6 +22,8 @@
# link dled genomes between kraken
# clear reduntancy with db build via drop of accession from get list
# better compression: tar pigball: tar cvf - /data/share/kdb_a2a/ | pigz --best - > /data/share/kdb_a2a.tar.gz
# evaluate and implement zenodo_get: pip3 install zenodo_get, zenodo_get -r 14888918
# update hsap genome to: https://huttenhower.sph.harvard.edu/kneadData_databases/Homo_sapiens_hg39_T2T_Bowtie2_v0.1.tar.gz
#======================================================================================================#
# Set vars
v_debug=0
@@ -37,6 +39,8 @@ v_vol_free=$(( $(stat --file-system --format="%a*%S" /) ))
v_vol_mb=1048576
# size of 1 gigabyte GB
v_vol_gb=1073741824
# size of 1 GB in MB
v_vol_gm=1000
# fixed used size
#v_vol_used=$(printf %.0f $(echo "${v_vol_gb} * 161.4" | bc -l))
v_vol_used=0
@@ -65,36 +69,40 @@ varr_db_path[$vn]=${varr_db_path[0]}'/REPO_host/hsap_contam/bowtie2'
varr_db_gets[$vn]='wget --no-check-certificate http://huttenhower.sph.harvard.edu/kneadData_databases/Homo_sapiens_hg37_and_human_contamination_Bowtie2_v0.1.tar.gz -O '${varr_db_path[$vn]}'/hsap_hg37_contam.tar.gz'
varr_db_pack[$vn]=${varr_db_pack[0]}${varr_db_path[$vn]}'/hsap_hg37_contam.tar.gz'' --directory '${varr_db_path[$vn]}
# check - if size > 1 then assume preexisting
if [[ "$(du ${varr_db_path[$vn]} 2> /dev/null | cut --fields 1 | tail -1)" -gt 1 ]];then
varr_db_check[$vn]=1
else
varr_db_check[$vn]=0
fi
# TODO - set to compression size as min threshold
varr_db_size[$vn]=7.4
# Pull current size
vt_L=$(du -BM ${varr_db_path[$vn]} 2> /dev/null | cut --fields 1 | tail -1 | sed 's|M||g')
# Pull expected size
vt_R=$(printf %.0f $(echo "${v_vol_gm} * ${varr_db_size[$vn]}" | bc -l))
# test and set accordingly
[[ ${vt_L} -gt ${vt_R} ]] && varr_db_check[$vn]=1 || varr_db_check[$vn]=0
# OoD ifelse approach
#if [[ "$(du -BM ${varr_db_path[$vn]} 2> /dev/null | cut --fields 1 | tail -1)" -gt "${varr_db_size[$vn]}" ]];then
#varr_db_check[$vn]=1
#else
#varr_db_check[$vn]=0
#fi
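The replacement pattern above swaps the old `du … -gt 1` existence check for a real size comparison: actual on-disk size in MB (`du -BM`) against the expected size, given in GB and scaled by `v_vol_gm=1000`. A standalone sketch with assumed demo paths and sizes (`awk` stands in here for the script's `bc -l`):

```shell
# sketch of the preexisting-database check: compare actual size (MB)
# against an expected size given in GB
v_vol_gm=1000                       # MB per GB (decimal, as in the script)
varr_db_path=/tmp/seqc_size_demo    # assumed demo path
varr_db_size=0.001                  # expected size: 0.001 GB = 1 MB
mkdir -p "${varr_db_path}"
dd if=/dev/zero of="${varr_db_path}/blob" bs=1024 count=2048 2> /dev/null  # ~2 MB
vt_L=$(du -BM "${varr_db_path}" 2> /dev/null | cut --fields 1 | tail -1 | sed 's|M||g')
vt_R=$(awk -v g="${v_vol_gm}" -v s="${varr_db_size}" 'BEGIN{printf "%.0f", g*s}')
# actual (2+ MB) exceeds expected (1 MB), so the DB counts as preexisting
[[ ${vt_L} -gt ${vt_R} ]] && varr_db_check=1 || varr_db_check=0
echo "${varr_db_check}"
```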
# host - ncbi-bt2 - btau - 3.7G ~ 30min w/ 6 threads
((vn++))
varr_db_name[$vn]='host_kd_btau'
varr_db_path[$vn]=${varr_db_path[0]}'/REPO_host/btau'
varr_db_gets[$vn]='datasets download genome accession GCF_002263795.3 --include genome --filename '${varr_db_path[$vn]}'/btau_ARS-UCD2.0.zip && unzip -qq '${varr_db_path[$vn]}'/btau_ARS-UCD2.0.zip -d '${varr_db_path[$vn]}' && micromamba run -n env_s1_kneaddata bowtie2-build --threads '${venv_cpu_max}' '${varr_db_path[$vn]}'/ncbi_dataset/data/GCF_002263795.3/GCF_002263795.3_ARS-UCD2.0_genomic.fna '${varr_db_path[$vn]}'/bowtie2/btau_ucd2'
if [[ "$(du ${varr_db_path[$vn]} 2> /dev/null | cut --fields 1 | tail -1)" -gt 1 ]];then
varr_db_check[$vn]=1
else
varr_db_check[$vn]=0
fi
varr_db_size[$vn]=3.7
vt_L=$(du -BM ${varr_db_path[$vn]} 2> /dev/null | cut --fields 1 | tail -1 | sed 's|M||g')
vt_R=$(printf %.0f $(echo "${v_vol_gm} * ${varr_db_size[$vn]}" | bc -l))
[[ ${vt_L} -gt ${vt_R} ]] && varr_db_check[$vn]=1 || varr_db_check[$vn]=0
# host - bowtie2 - mmus 3.5G
((vn++))
varr_db_name[$vn]='host_kd_mmus'
varr_db_path[$vn]=${varr_db_path[0]}'/REPO_host/mmus/bowtie2'
#varr_db_gets[3]='micromamba run -n env_s1 kneaddata_database --download mouse_C57BL bowtie2 '${varr_db_path[3]}'/C57BL.tar.gz'
varr_db_gets[$vn]='wget --no-check-certificate http://huttenhower.sph.harvard.edu/kneadData_databases/mouse_C57BL_6NJ_Bowtie2_v0.1.tar.gz -O '${varr_db_path[$vn]}'/C57BL.tar.gz'
varr_db_pack[$vn]=${varr_db_pack[0]}${varr_db_path[$vn]}'/C57BL.tar.gz'' --directory '${varr_db_path[$vn]}
if [[ "$(du ${varr_db_path[$vn]} 2> /dev/null | cut --fields 1 | tail -1)" -gt 1 ]];then
varr_db_check[$vn]=1
else
varr_db_check[$vn]=0
fi
varr_db_size[$vn]=3.5
vt_L=$(du -BM ${varr_db_path[$vn]} 2> /dev/null | cut --fields 1 | tail -1 | sed 's|M||g')
vt_R=$(printf %.0f $(echo "${v_vol_gm} * ${varr_db_size[$vn]}" | bc -l))
[[ ${vt_L} -gt ${vt_R} ]] && varr_db_check[$vn]=1 || varr_db_check[$vn]=0
# Tool repo
#checkm2 - 2.9G
((vn++))
@@ -105,12 +113,10 @@ varr_db_gets[$vn]='micromamba run -n env_s3_checkm2 checkm2 database --download
varr_db_pack[$vn]=''
#varr_db_gets[4]='wget https://zenodo.org/records/5571251/files/checkm2_database.tar.gz -O '${varr_db_path[4]}'/CheckM2_database.tar.gz'
#varr_db_pack[4]=${varr_db_pack[0]}${varr_db_path[4]}'/CheckM2_database.tar.gz'' --directory '${varr_db_path[4]}
if [[ "$(du ${varr_db_path[$vn]} 2> /dev/null | cut --fields 1 | tail -1)" -gt 1 ]];then
varr_db_check[$vn]=1
else
varr_db_check[$vn]=0
fi
varr_db_size[$vn]=2.9
vt_L=$(du -BM ${varr_db_path[$vn]} 2> /dev/null | cut --fields 1 | tail -1 | sed 's|M||g')
vt_R=$(printf %.0f $(echo "${v_vol_gm} * ${varr_db_size[$vn]}" | bc -l))
[[ ${vt_L} -gt ${vt_R} ]] && varr_db_check[$vn]=1 || varr_db_check[$vn]=0
#NCBI
# taxdump - 0.5G (493M)
((vn++))
@@ -120,62 +126,52 @@ varr_db_path[$vn]=${varr_db_path[0]}'/REPO_tool/ncbi_NR'
varr_db_gets[$vn]='wget --no-check-certificate https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz -O '${varr_db_path[$vn]}'/taxdump.tar.gz'
varr_db_pack[$vn]=${varr_db_pack[0]}${varr_db_path[$vn]}'/taxdump.tar.gz --directory '${varr_db_path[$vn]}
#mkdir taxonomy && tar -xxvf taxdump.tar.gz -C taxonomy && mv ./taxonomy/ /DB/DEPO_demo/REPO_tool/ncbi_NR
if [[ "$(du ${varr_db_path[$vn]} 2> /dev/null | cut --fields 1 | tail -1)" -gt 1 ]];then
varr_db_check[$vn]=1
else
varr_db_check[$vn]=0
fi
varr_db_size[$vn]=0.5
vt_L=$(du -BM ${varr_db_path[$vn]} 2> /dev/null | cut --fields 1 | tail -1 | sed 's|M||g')
vt_R=$(printf %.0f $(echo "${v_vol_gm} * ${varr_db_size[$vn]}" | bc -l))
[[ ${vt_L} -gt ${vt_R} ]] && varr_db_check[$vn]=1 || varr_db_check[$vn]=0
# kraken/bracken
## premade - pluspf smol - 8G
((vn++))
varr_db_name[$vn]='tool_k2_std8'
varr_db_path[$vn]=${varr_db_path[0]}'/REPO_tool/kraken'
varr_db_gets[$vn]='wget --no-check-certificate https://genome-idx.s3.amazonaws.com/kraken/k2_pluspf_08gb_20240605.tar.gz -O '${varr_db_path[$vn]}'/kdb_std8.tar.gz'
varr_db_pack[$vn]=${varr_db_pack[0]}${varr_db_path[$vn]}'/kdb_std8.tar.gz --directory '${varr_db_path[$vn]}'/kdb_std8'
if [[ "$(du ${varr_db_path[$vn]}/kdb_std8 2> /dev/null | cut --fields 1 | tail -1)" -gt 1 ]];then
varr_db_check[$vn]=1
else
varr_db_check[$vn]=0
fi
varr_db_size[$vn]=8
vt_L=$(du -BM ${varr_db_path[$vn]} 2> /dev/null | cut --fields 1 | tail -1 | sed 's|M||g')
vt_R=$(printf %.0f $(echo "${v_vol_gm} * ${varr_db_size[$vn]}" | bc -l))
[[ ${vt_L} -gt ${vt_R} ]] && varr_db_check[$vn]=1 || varr_db_check[$vn]=0
# custom agora/apollo - apollo @ 4.5G
## apollo
((vn++))
varr_db_name[$vn]='tool_k2_apollo'
varr_db_path[$vn]=${varr_db_path[0]}'/REPO_tool/kraken/kdb_apollo'
varr_db_gets[$vn]='wget --no-check-certificate https://zenodo.org/records/14884732/files/kdb_apollo.tar.gz -O '${varr_db_path[$vn]}'.tar.gz'
varr_db_pack[$vn]='tar -I pigz -xvf '${varr_db_path[$vn]}'.tar.gz --directory '${varr_db_path[0]}'/REPO_tool/kraken/'
if [[ "$(du ${varr_db_path[$vn]} 2> /dev/null | cut --fields 1 | tail -1)" -gt 1 ]];then
varr_db_check[$vn]=1
else
varr_db_check[$vn]=0
fi
varr_db_size[$vn]=4.5
vt_L=$(du -BM ${varr_db_path[$vn]} 2> /dev/null | cut --fields 1 | tail -1 | sed 's|M||g')
vt_R=$(printf %.0f $(echo "${v_vol_gm} * ${varr_db_size[$vn]}" | bc -l))
[[ ${vt_L} -gt ${vt_R} ]] && varr_db_check[$vn]=1 || varr_db_check[$vn]=0
## agora2 - 50G (clean?) TODO confirm
((vn++))
varr_db_name[$vn]='tool_k2_agora'
varr_db_path[$vn]=${varr_db_path[0]}'/REPO_tool/kraken/kdb_agora'
varr_db_gets[$vn]='wget --no-check-certificate https://zenodo.org/records/14884741/files/kdb_agora.tar.gz -O '${varr_db_path[$vn]}'.tar.gz'
varr_db_pack[$vn]='tar -I pigz -xvf '${varr_db_path[$vn]}'.tar.gz --directory '${varr_db_path[0]}'/REPO_tool/kraken/'
if [[ "$(du ${varr_db_path[$vn]} 2> /dev/null | cut --fields 1 | tail -1)" -gt 1 ]];then
varr_db_check[$vn]=1
else
varr_db_check[$vn]=0
fi
varr_db_size[$vn]=50
vt_L=$(du -BM ${varr_db_path[$vn]} 2> /dev/null | cut --fields 1 | tail -1 | sed 's|M||g')
vt_R=$(printf %.0f $(echo "${v_vol_gm} * ${varr_db_size[$vn]}" | bc -l))
[[ ${vt_L} -gt ${vt_R} ]] && varr_db_check[$vn]=1 || varr_db_check[$vn]=0
## agora2 and apollo - 154G pre 69 post - 84G genomes - TODO ADJUST FOR FINAL BUILD 238
((vn++))
varr_db_name[$vn]='tool_k2_agora2apollo'
varr_db_path[$vn]=${varr_db_path[0]}'/REPO_tool/kraken/kdb_a2a'
varr_db_gets[$vn]='wget --no-check-certificate https://zenodo.org/records/14888918/files/kdb_a2a.tar.gz -O '${varr_db_path[$vn]}'.tar.gz'
varr_db_pack[$vn]='tar -I pigz -xvf '${varr_db_path[$vn]}'.tar.gz --directory '${varr_db_path[0]}'/REPO_tool/kraken/'
if [[ "$(du ${varr_db_path[$vn]} 2> /dev/null | cut --fields 1 | tail -1)" -gt 1 ]];then
varr_db_check[$vn]=1
else
varr_db_check[$vn]=0
fi
varr_db_size[$vn]=69
vt_L=$(du -BM ${varr_db_path[$vn]} 2> /dev/null | cut --fields 1 | tail -1 | sed 's|M||g')
vt_R=$(printf %.0f $(echo "${v_vol_gm} * ${varr_db_size[$vn]}" | bc -l))
[[ ${vt_L} -gt ${vt_R} ]] && varr_db_check[$vn]=1 || varr_db_check[$vn]=0
##smol demo - actual size needed
((vn++))
varr_db_name[$vn]='tool_k2_demo'
@@ -186,6 +182,9 @@ else
varr_db_check[$vn]=0
fi
varr_db_size[$vn]=8
vt_L=$(du -BM ${varr_db_path[$vn]} 2> /dev/null | cut --fields 1 | tail -1 | sed 's|M||g')
vt_R=$(printf %.0f $(echo "${v_vol_gm} * ${varr_db_size[$vn]}" | bc -l))
[[ ${vt_L} -gt ${vt_R} ]] && varr_db_check[$vn]=1 || varr_db_check[$vn]=0
# humann
func_help () {
# Help content
@@ -523,6 +522,15 @@ func_makedb_krak () {
#create fail log if missing
printf 'NCBI_ID\tno_attempts\n' > ${v_dir_k2_genm}/log_fail.txt
fi
# remove elements from array eg uncultured taxa
varr_drop=(77133)
for v_grab in "${varr_drop[@]}";do
for vi in "${!varr_taxid_uniq[@]}";do
if [[ ${varr_taxid_uniq[vi]} = $v_grab ]];then
unset 'varr_taxid_uniq[vi]'
fi
done
done
#remove whitespace
varr_taxid_uniq=( $(printf '%s\n' "${varr_taxid_uniq[@]}") )
for v_i in "${!varr_taxid_uniq[@]}";do
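The nested loop added in the hunk above removes blacklisted taxon IDs (77133, flagged in the comment as an uncultured taxon) from the working array and then compacts the sparse indices that `unset` leaves behind. In isolation, with assumed demo taxids:

```shell
# drop listed elements from a bash array, then re-pack the indices
varr_taxid_uniq=(562 77133 1280)    # assumed demo taxids
varr_drop=(77133)
for v_grab in "${varr_drop[@]}";do
    for vi in "${!varr_taxid_uniq[@]}";do
        if [[ ${varr_taxid_uniq[vi]} = $v_grab ]];then
            unset 'varr_taxid_uniq[vi]'   # leaves a hole at index vi
        fi
    done
done
# re-assignment renumbers the surviving elements 0..n-1
varr_taxid_uniq=( $(printf '%s\n' "${varr_taxid_uniq[@]}") )
echo "${varr_taxid_uniq[@]}"   # 562 1280
```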
50 changes: 38 additions & 12 deletions src/analysis/persephone/SeqC_pipeline/BASH_seqc_mama.sh
@@ -3,7 +3,8 @@
# Title: mama script: take run commands, est db link, run stepgen
# Program by: Wiley Barton - 2022.02.27
# Modified for conda/docker pipeline - 2024.02.22
# last update - 2025.01.30
# Version for Persephone
# last update - 2025.05.13
# Modified code sources:
# https://stackoverflow.com/questions/2043453/executing-multi-line-statements-in-the-one-line-command-line
# Notes: generate bash files according to user input for the completion of pipeline
@@ -36,6 +37,7 @@
# flesh out func_demo to build complete demo run within /DB/DEPO_demo
# auto compile file list from input dir in absence of provided list
# expand splash to include system params: cpu, mem, du of key directories
# implement pv for progress bar... tar -I pigz -xvf stuff.tar.gz | pv
#refs
# prodigal:https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-11-119#citeas
# https://www.cyberciti.biz/tips/bash-shell-parameter-substitution-2.html
@@ -53,6 +55,8 @@ var_rep_len=${#var_rep_ary[@]}
#size check -db
v_vol_real=$(du -bs $v_dir_db | cut -f1 )
v_vol_none=60000
#system max cpu
v_sys_mem=$(nproc)
v_scrp_check=0
#var_rep_ary[var_rep_len]='x/y/z/cat'*'.kitty'
#var_rep_out=${var_rep_ary[@]}
@@ -136,7 +140,7 @@ func_help () {
printf " -o OUTPUT: Directory to contain final output (/path/to/output)\n"
printf " -r BrANCH: Branch of pipeline to use, one of SR, MAG, ALL\n\t(default: SR)\n"
printf " -c COMMS : Commands applied to step\n"
printf " -s STEPS : Steps of pipeline to run with '0' complete run \n\t(steps=( $( eval echo {1..9} ) ))\n"
printf " -s STEPS : Steps of pipeline to run with '0' complete run \n\t(alt. -s 2 -3)\n"
printf " -e HeAD : Head of files, from first char to start of variable region\n"
printf "\tsample_01.fastq\n\t...\n\tsample_10.fastq\n\t^^^^^^^\n"
printf " -t TAIL : File tail, ~extension, constant across all input\n"
@@ -1309,7 +1313,7 @@ if [[ ${v_scrp_check} -eq 1 ]];then
vopt_step=( $( eval echo {1..6} ) )
fi
else
vopt_step=${OPTARG}
vopt_step+=("$OPTARG")
vopt_part=1
fi
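The change above from `vopt_step=${OPTARG}` to `vopt_step+=("$OPTARG")` lets a repeated `-s` flag accumulate multiple step numbers instead of each occurrence overwriting the last. A sketch with a hypothetical `parse_steps` wrapper around the script's `getopts` handling:

```shell
# hypothetical wrapper: each -s occurrence is appended to the array
# rather than replacing it (the bug fixed in the hunk above)
parse_steps () {
    local OPTIND opt
    vopt_step=()
    while getopts 's:' opt "$@";do
        case ${opt} in
            s) vopt_step+=("$OPTARG");;
        esac
    done
}
parse_steps -s 2 -s 3
echo "${vopt_step[@]}"   # 2 3
```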
if ((vopt_dbug));then
@@ -1518,6 +1522,7 @@ if (( ${vopt_log} ));then
fi
printf 'Input file type: %s\n%s\n' "${vopt_file_type}" "${v_logblock1}" >> ${v_logfile}
# Check zip status - CRIT - extend to .tar files in dir
# pot. just exclude .tar
vt_L=$( printf '%s\n' "${v_file_in_good[@]}" | grep -E "\.gz|gzip$" | wc -l )
vt_R='0'
if [[ ${vt_L} -gt ${vt_R} ]];then
@@ -1724,12 +1729,21 @@ for istep in "${vopt_step[@]}";do
vin_head=''
vin_tail=''
fi
# proc check - ensure that too many processes are not called
if [[ -z ${venv_proc_max} ]];then
venv_proc_max=$(( ${venv_cpu_max} / 2 ))
fi
if [[ $(printf %.0f $(echo "${venv_cpu_max} * ${venv_proc_max}" | bc -l)) -gt "${v_sys_mem}" ]];then
vt="${venv_proc_max}"
while [[ $(printf %.0f $(echo "${venv_cpu_max} * ${vt}" | bc -l)) -gt "${v_sys_mem}" ]];do
((vt--))
#printf 'in%s\n' "${vt}"
done
#printf 'out%s\n' "${vt}"
venv_proc_max="${vt}"
fi
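The guard added above defaults `venv_proc_max` to half the thread count, then decrements it until threads-per-process times process count no longer exceeds the machine total (which the script samples with `nproc` into a variable named `v_sys_mem`). A standalone sketch with assumed demo values:

```shell
# cap worker count so threads-per-worker * workers <= available CPUs
v_sys_cpu=8            # assumed machine total; the script uses $(nproc)
venv_cpu_max=4         # threads per process (assumed demo value)
venv_proc_max=16       # deliberately oversubscribed
vt="${venv_proc_max}"
while [[ $(( venv_cpu_max * vt )) -gt ${v_sys_cpu} ]];do
    ((vt--))
done
venv_proc_max="${vt}"
echo "${venv_proc_max}"   # 2, since 4 threads x 2 workers = 8 CPUs
```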
if [[ ${vopt_com_log} -eq 0 ]];then
#insert as awk print block - ex: vopt_com='{ printf "--output-format tsv --output-basename %s",va_nID }'
#improve with adjustment to :--processes <1>
if [[ -z ${venv_proc_max} ]];then
venv_proc_max=$(( ${venv_cpu_max} / 2 ))
fi
vin_com='{ printf " --remove-intermediate-output --reference-db %s --threads %s --processes %s --max-memory %sg --trimmomatic /opt/conda/envs/env_s1_kneaddata/share/trimmomatic --reorder","'${vin_host_rm}'",'${venv_cpu_max}','${venv_proc_max}','${venv_mem_max}' }'
fi
#WORKING EX
@@ -1757,6 +1771,9 @@ for istep in "${vopt_step[@]}";do
v_drop_exit=${v_drop_exit}' '${vout_sX}/${v_drop_catch}
fi
# exit logging
# TODO more elaborate permissions transfer approach
# pot. grab user ID at start and set ownership directly
chmod -R +777 ${vout_sX}/*
v_print_size=$( du -sh ${vout_sX} | cut -f 1 )
v_print_count=$( find ${vout_sX} | wc -l )
printf 'Step product location:\t%s\nStep run script location:\t%s\nStep output size(disk use):\t%s\nStep output count(files):\t%s\n%s\n' \
@@ -1986,7 +2003,7 @@ for istep in "${vopt_step[@]}";do
v_k2mpa_out=${v_kjoin_out/out/mpa_out_RA}
cat ${v_kjoin_out/out/mpa_out} | cut --fields 2${v_bk_col_frac} > "${v_k2mpa_out}"
#drop col type for match with meta data
sed -i "1 s/_$v_match//" "${v_k2mpa_out}"
sed -i "1,1s/_$v_match//g" "${v_k2mpa_out}"
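The `sed` change above (`1 s/_$v_match//` to `1,1s/_$v_match//g`) makes the suffix strip apply to every matching field on the header line rather than only the first match, while still leaving data rows untouched. In isolation, with a hypothetical demo file:

```shell
# strip a "_num" suffix from every field of line 1 only
v_match='num'
printf 'sample_num\tother_num\nrow_num\tval_num\n' > /tmp/seqc_sed_demo.txt
sed -i "1,1s/_$v_match//g" /tmp/seqc_sed_demo.txt
cat /tmp/seqc_sed_demo.txt   # header fields cleaned, data row unchanged
```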
#counts
v_match='num'
v_bk_cols=( $( head ${v_kjoin_out/out/mpa_out} -n 1 ) )
@@ -2006,7 +2023,7 @@ for istep in "${vopt_step[@]}";do
v_k2mpa_out=${v_kjoin_out/out/mpa_out_RC}
cat ${v_kjoin_out/out/mpa_out} | cut --fields 2${v_bk_col_num} > "${v_k2mpa_out}"
#drop col type for match with meta data
sed -i "1 s/_$v_match//" "${v_k2mpa_out}"
sed -i "1,1s/_$v_match//g" "${v_k2mpa_out}"
#default mars input is now "${v_k2mpa_out}" = /DB/DEPO_proc/step2_kraken/KB_S_mpa_out_RC.txt
vout_sX=${vout_s2}
# step keep-clean
@@ -2016,10 +2033,13 @@ for istep in "${vopt_step[@]}";do
v_drop_catch='*_{k2,bracken}_*{.txt,.fastq}'
eval "rm ${vout_sX}/${v_drop_catch}" 2> /dev/null
#statment for final drop
v_drop_catch='KB_*.txt'
v_drop_catch='KB_S_{taxid,out,mpa_out}.txt'
v_drop_exit=${v_drop_exit}' '${vout_sX}/${v_drop_catch}
fi
# exit logging
# TODO more elaborate permissions transfer approach
# pot. grab user ID at start and set ownership directly
chmod -R +777 ${vout_sX}/*
v_print_size=$( du -sh ${vout_sX} | cut -f 1 )
v_print_count=$( find ${vout_sX} | wc -l )
printf 'Step product location:\t%s\nStep run script location:\t%s\nStep output size(disk use):\t%s\nStep output count(files):\t%s\n%s\n' \
@@ -2204,6 +2224,9 @@ for istep in "${vopt_step[@]}";do
vout_s3=${vin_O_dir}/${v_exit_dir}
# exit logging
vout_sX=${vout_s3}
# TODO more elaborate permissions transfer approach
# pot. grab user ID at start and set ownership directly
chmod -R +777 ${vout_sX}/*
v_print_size=$( du -sh ${vout_sX} | cut -f 1 )
v_print_count=$( find ${vout_sX} | wc -l )
printf 'Step product location:\t%s\nStep run script location:\t%s\nStep output size(disk use):\t%s\nStep output count(files):\t%s\n%s\n' \
@@ -2582,6 +2605,9 @@ if (( ${vopt_log} ));then
# Relocate taxonomy output
v_drop_catch='KB_S_mpa_out_{RC,RA}.txt'
eval "mv ${vout_s2}/${v_drop_catch} /home/seqc_user/seqc_project/final_reports/" 2> /dev/null
# TODO more elaborate permissions transfer approach
# pot. grab user ID at start and set ownership directly
chmod -R +777 /home/seqc_user/seqc_project/final_reports
fi
if (( ${vopt_log} ));then
printf 'SeqC Stuff ENDS @: %s\n' "$(date +"%Y.%m.%d %H.%M.%S (%Z)")"
@@ -2665,7 +2691,7 @@ if [[ ${v_scrp_check} -eq 0 ]];then
#startup splash - ascii gen from: https://patorjk.com/software/taag, standard,slant,alpha,isometric1,impossible
#https://medium.com/@Frozenashes/making-a-custom-startup-message-for-a-linux-shell-using-bashrc-and-bash-scripting-280268fdaa17
func_splash() {
#if null create
#if null create
if [[ -z "${VEN_SPLASH}" ]]; then
echo "VEN_SPLASH="\"1\" >> /etc/environment
echo "export VEN_SPLASH=1" >> /root/.bashrc
@@ -2701,7 +2727,7 @@ if [[ ${v_scrp_check} -eq 0 ]];then
___\:::\ \:::\ \/::::::\ \:::\ \/:::/ / \:::\ \/:::/ / \:::\ \
/\ \:::\ \:::\____\::/\:::\ \:::\____\::/____/ \:::\____\::/ / \:::\____\
/::\ \:::\ \::/ /:/__\:::\ \::/ /:| | |:::| |::____/ \::/ /
\:::\ \:::\ \/____/::\ \:::\ \/____/::|____| |:::|____/::\ \ \/____/
\:::\ \:::\ \/____/::\ \:::\ \/____/::|____| |:::|____|::\ \ \/____/
\:::\ \:::\____\ \:::\ \:::\____\ |:::\ _\___/:::/ /:::\ \
\:::\ /:::/ / \:::\ \::/ / \:::\ |::| /:::/ / \:::\ \
\:::\/:::/ /______\:::\ \/____/______\:::\|::|/:::/ /___\:::\ \ _____ _____ ______