Wrong entry order when overriding config with JSON-format params-file #5902

Open
zhuxr11 opened this issue Mar 20, 2025 · 1 comment
zhuxr11 commented Mar 20, 2025

Bug report

When running nextflow run -c $CONFIG_FILE -params-file $PARAMS_FILE, parameters that are defined in PARAMS_FILE but not in the same scope in CONFIG_FILE may be loaded in a different order from the one in which they appear in the file.

Expected behavior and actual behavior

Expected: parameters that are defined only in PARAMS_FILE keep the order in which they appear in the file. Actual: with a JSON params file, the entries are loaded in a different order.

Steps to reproduce the problem

Create the following config file: test_load_json.nf.config

// ========================= Nextflow Configuration File ==============================
// This file contains parameters and process configurations for the Nextflow pipeline.
// ===================================================================================

// %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Prameter Configuration %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
params {
    // ========================= Project Work Directory =================================
    project_dir = 'my_test_dir'
    main {
        // ========================= Parameters for model train process =========================
        model {
            pipeline=[
                na_strategy: [
                    from: "mypkg.preprocessing",
                    import: "NaStrategy"
                ],
                feature_perf: [
                    from: "mypkg.feature_selection",
                    import: "FeatureSelector"
                ],
                select_top: [
                    from: "mypkg.feature_selection",
                    import: "SelectTopN"
                ],
                xgbc: [
                    from: "xgboost",
                    import: "XGBClassifier"
                ]
            ]
            output_dir = "${params.project_dir}"
        }
    }
}

Create the following params file: test_load_json.params.json

{
    "model": {
        "output_dir": "my_test_dir",
        "sub_folder": "my_test_subdir",
        "pipeline": {
            "na_strategy": {
                "from": "mkpkg.preprocessing",
                "import": "NaStrategy"
            },
            "feature_perf": {
                "from": "mkpkg.feature_selection",
                "import": "FeatureSelector"
            },
            "select_top": {
                "from": "mkpkg.feature_selection",
                "import": "SelectTopN"
            },
            "xgbc": {
                "from": "xgboost",
                "import": "XGBClassifier"
            }
        }
    }
}

Then set up the following workflow: test_load_json.nf

nextflow.enable.dsl=2

workflow SUBWORKFLOW {
    println "params.model.pipeline = ${params.model.pipeline}"
}

process RUN_SUBWORKFLOW {
    input:
        path params_file
        path config_file

    output:
        path "console.log", emit: console_log

    script:
    """
    echo "* Params_file: ${params_file}" > "console.log"
    cat "${params_file}" >> "console.log"
    echo "* Config_file: ${config_file}" >> "console.log"
    cat "${config_file}" >> "console.log"
    nextflow run test_load_json.nf \
        -entry SUBWORKFLOW \
        -params-file "${params_file}" \
        -c "${config_file}" \
        >> "console.log"
    """
}

workflow {
    res = RUN_SUBWORKFLOW(
        Channel.fromPath("test_load_json*.json"),
        Channel.fromPath("test_load_json*.config")
    )
    res.console_log
        .map { it -> println "Console log:\n" + it.text }
}

Program output

Nextflow 24.10.5 is available - Please consider updating your version to it

 N E X T F L O W   ~  version 24.10.4

Launching `test_load_json.nf` [stoic_lovelace] DSL2 - revision: 581ec83ea4

executor >  local (1)
[22/4c5019] RUN_SUBWORKFLOW (1) [100%] 1 of 1 ✔
Console log:
* Params_file: test_load_json.params.json
{
    "model": {
        "output_dir": "my_test_dir",
        "sub_folder": "my_test_subdir",
        "pipeline": {
            "na_strategy": {
                "from": "mkpkg.preprocessing",
                "import": "NaStrategy"
            },
            "feature_perf": {
                "from": "mkpkg.feature_selection",
                "import": "FeatureSelector"
            },
            "select_top": {
                "from": "mkpkg.feature_selection",
                "import": "SelectTopN"
            },
            "xgbc": {
                "from": "xgboost",
                "import": "XGBClassifier"
            }
        }
    }
}
* Config_file: test_load_json.nf.config
// ========================= Nextflow Configuration File ==============================
// This file contains parameters and process configurations for the Nextflow pipeline.
// ===================================================================================

// %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Prameter Configuration %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
params {
    // ========================= Project Work Directory =================================
    project_dir = 'my_test_dir'
    main {
        // ========================= Parameters for model train process =========================
        model {
            pipeline=[
                na_strategy: [
                    from: "mypkg.preprocessing",
                    import: "NaStrategy"
                ],
                feature_perf: [
                    from: "mypkg.feature_selection",
                    import: "FeatureSelector"
                ],
                select_top: [
                    from: "mypkg.feature_selection",
                    import: "SelectTopN"
                ],
                xgbc: [
                    from: "xgboost",
                    import: "XGBClassifier"
                ]
            ]
            output_dir = "${params.project_dir}"
        }
    }
}

 N E X T F L O W   ~  version 24.10.4

Launching `test_load_json.nf` [desperate_kirch] DSL2 - revision: 581ec83ea4

params.model.pipeline = [xgbc:[import:XGBClassifier, from:xgboost], na_strategy:[import:NaStrategy, from:mkpkg.preprocessing], feature_perf:[import:FeatureSelector, from:mkpkg.feature_selection], select_top:[import:SelectTopN, from:mkpkg.feature_selection]]

Completed at: 20-Mar-2025 13:30:45
Duration    : 3m 43s
CPU hours   : 0.1
Succeeded   : 1

As can be seen, params.model.pipeline is defined in test_load_json.params.json in the order na_strategy, feature_perf, select_top, xgbc. When the file is loaded by a process calling nextflow run, the order changes (xgbc comes first), which builds a wrong modelling pipeline and results in errors. The order of from and import within each entry has also changed, although this does not affect downstream usage.
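
A possible workaround, sketched here under assumptions rather than as a confirmed fix: state the intended step order in a separate list and rebuild the map from it before use. The names loaded, stepOrder and ordered are hypothetical. Groovy's collectEntries returns a LinkedHashMap, which iterates in insertion order.

```groovy
// The map as it appears in the program output above (order already changed).
def loaded = [
    xgbc        : ['import': 'XGBClassifier',   'from': 'xgboost'],
    na_strategy : ['import': 'NaStrategy',      'from': 'mkpkg.preprocessing'],
    feature_perf: ['import': 'FeatureSelector', 'from': 'mkpkg.feature_selection'],
    select_top  : ['import': 'SelectTopN',      'from': 'mkpkg.feature_selection']
]

// Hypothetical: the intended order, stated explicitly instead of relying on the map.
def stepOrder = ['na_strategy', 'feature_perf', 'select_top', 'xgbc']

// Rebuild the map by walking the list; entries are inserted in list order.
def ordered = stepOrder.collectEntries { name -> [(name): loaded[name]] }

println ordered.keySet().toList()
```

This keeps the JSON file unchanged, at the cost of maintaining the step names in two places.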

By the way, using YAML files as -params-file input does not mess up the order of the entries.

.nextflow.log

Environment

  • Nextflow version: 24.10.4
  • Java version: openjdk version "11.0.13" 2021-10-19
  • Operating system: Debian GNU/Linux 10 (buster)
  • Bash version: zsh 5.7.1 (x86_64-debian-linux-gnu)

bentsherman (Member) commented

You are using a map, which is unordered by nature. If you want to preserve the order, you should use a list of maps:

            pipeline = [
                [
                    name: "na_strategy",
                    from: "mypkg.preprocessing",
                    import: "NaStrategy"
                ],
                // ...
            ]
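
To illustrate how the suggested layout might be consumed, here is a minimal sketch. The variable names and the full set of entries are assumptions filled in from the config file earlier in the issue, not part of the reply. A list is iterated in index order, so the steps come out in the order they were written.

```groovy
// Hypothetical values mirroring the list-of-maps layout suggested above.
def pipeline = [
    [name: 'na_strategy',  'from': 'mypkg.preprocessing',     'import': 'NaStrategy'],
    [name: 'feature_perf', 'from': 'mypkg.feature_selection', 'import': 'FeatureSelector'],
    [name: 'select_top',   'from': 'mypkg.feature_selection', 'import': 'SelectTopN'],
    [name: 'xgbc',         'from': 'xgboost',                 'import': 'XGBClassifier']
]

// Step names in declaration order.
def stepNames = pipeline.collect { step -> step.name }

// One import statement per step, built in the same order.
def importLines = pipeline.collect { step ->
    "from ${step['from']} import ${step['import']}".toString()
}

println stepNames
importLines.each { println it }
```

In a JSON params file the same layout would be an array of objects, and JSON arrays are ordered.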
