TrainingOutput with OutputFileDatasetConfig (how to retrieve best HyperDriveStep run) #1532

Open
@JackCaster

I am using HyperDrive in my pipeline via HyperDriveStep. How can I export the model from the best HyperDrive run, register it, and use it in a subsequent step?

The docs refer to TrainingOutput, which seems to work only with PipelineData. My pipeline instead uses OutputFileDatasetConfig to move data between steps, which is also the recommended approach (see https://github.com/MicrosoftDocs/azure-docs/issues/76169).

Currently, the training step stores the pickled model in blob storage after each child run completes. Each model goes into a folder named after its {run_id}, for example HD_7bf73de6-90dd-44dc-a8a4-8e8fd3f481f6_xxx, where xxx is the HyperDrive iteration (0, 1, 2, ...). In a subsequent pipeline step I would like to retrieve the best model, but to do that I need the run ID of the HyperDrive step's best run.
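To make the folder convention above concrete, here is a minimal sketch that splits a child run ID into the parent HyperDrive run ID and the iteration index, and builds the path of the pickled model inside the per-run folder. The helper names and the file name model.pkl are hypothetical and only illustrate the layout; they are not part of the Azure ML SDK.

```python
import re
from typing import NamedTuple


class ChildRunId(NamedTuple):
    parent_id: str   # e.g. "HD_7bf73de6-90dd-44dc-a8a4-8e8fd3f481f6"
    iteration: int   # HyperDrive iteration: 0, 1, 2, ...


# "HD_" + identifier of the parent run, then "_" + the iteration number.
_CHILD_RUN_RE = re.compile(r"^(HD_[0-9a-fA-F-]+)_(\d+)$")


def parse_child_run_id(run_id: str) -> ChildRunId:
    """Split a child run ID of the form HD_<id>_<iteration>."""
    match = _CHILD_RUN_RE.match(run_id)
    if match is None:
        raise ValueError(f"not a HyperDrive child run ID: {run_id!r}")
    return ChildRunId(match.group(1), int(match.group(2)))


def model_blob_path(run_id: str, filename: str = "model.pkl") -> str:
    """Path of the model inside the folder named after the run ID.

    The default file name is an assumption made for illustration.
    """
    return f"{run_id}/{filename}"
```

With this layout, knowing the best child run's ID is enough to locate its model, which is why the rest of this question is about obtaining that ID.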

In a step dedicated to registering and using the best model, I tried:

# step to retrieve the best HyperDrive model
from azureml.core import Run

run = Run.get_context()
pipeline_run = run.parent
hyperdrive_step = pipeline_run.find_step_run("hyperdrive_step")[0]
# raises AttributeError: StepRun has no get_best_run_by_primary_metric
best_run = hyperdrive_step.get_best_run_by_primary_metric()

This fails because hyperdrive_step is a StepRun object, which does not have a get_best_run_by_primary_metric method.

Would you be able to help?
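One direction that might work, offered as an untested sketch: the SDK ships a HyperDriveStepRun class in azureml.pipeline.steps that wraps a StepRun and exposes get_best_run_by_primary_metric. The function name best_run_id_for_step and its wrap parameter are hypothetical; wrap exists only so the logic can be exercised without a workspace. Whether run.parent supports find_step_run inside a pipeline step is taken from the snippet above rather than verified.

```python
def best_run_id_for_step(pipeline_run, step_name="hyperdrive_step", wrap=None):
    """Return the run ID of the best child run of a HyperDriveStep.

    pipeline_run: the parent pipeline run (run.parent in the snippet above).
    wrap: callable turning a StepRun into an object that has
          get_best_run_by_primary_metric(); defaults to HyperDriveStepRun.
    """
    if wrap is None:
        # Imported lazily so this helper loads without the Azure ML SDK.
        from azureml.pipeline.steps import HyperDriveStepRun
        wrap = HyperDriveStepRun

    step_runs = pipeline_run.find_step_run(step_name)
    if not step_runs:
        raise LookupError(f"no step run named {step_name!r}")

    # Wrap the plain StepRun so the HyperDrive-specific methods are available.
    hyperdrive_step_run = wrap(step_run=step_runs[0])
    best_run = hyperdrive_step_run.get_best_run_by_primary_metric()
    return best_run.id
```

Inside the registration step this would be called as best_run_id_for_step(Run.get_context().parent), and the returned ID would then select the matching {run_id} folder in blob storage.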


Labels: ADO (issue is documented on MSFT ADO for internal tracking), MLOps, bug (something isn't working)
