Support for bounded dynamic model #701
Force-pushed from 624ba6f to 010200c: Add support for output copy post ov infer to ORT for NPU with bounded dims
```cpp
ov_tensor_data.tensor_ptr = std::make_shared<ov::Tensor>(input_info.type, input_info.ov_shape.get_shape(),
                                                         const_cast<void*>(tensor.GetTensorRawData()));
if (is_cpu) {
  tensor_ptr = std::make_shared<ov::Tensor>(input_info.type, input_tensor_shape, (void*)tensor_data);
```
Fix this using a C++-style cast.
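The change the reviewer asks for can be sketched with a hypothetical helper (`AsMutableRaw` is illustrative, not an EP function). A `const_cast` states the intent of removing const-ness explicitly and is greppable, where the C-style `(void*)` cast hides it:

```cpp
// Hypothetical helper mirroring the reviewed line: wrap an ORT tensor's raw
// data for OpenVINO without a C-style cast. const_cast is still required
// because the source pointer is const, but the C++-style cast makes that
// explicit at the call site.
inline void* AsMutableRaw(const void* ort_raw_data) {
  return const_cast<void*>(ort_raw_data);
}
```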
Force-pushed from 010200c to 032d6db.
Pull Request Overview
This PR adds support for bounded dynamic models by introducing a new user option (reshape_input) to convert fully dynamic models into bounded dynamic ones for better efficiency. Key changes include:
- Enhancements to the OpenVINO provider factory and parser utilities to parse and validate the new reshape_input configuration.
- Updates to backends and session tests to handle dynamic input shape validation and reshaping.
- Minor formatting and logging improvements in various OpenVINO-related components.
Reviewed Changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.
| File | Description |
| --- | --- |
| onnxruntime/test/providers/openvino/openvino_ep_context_test.cc | Validates error handling when a folder path is provided for ep_context_file_path. |
| onnxruntime/test/perftest/ort_test_session.cc | Adds support for the new "reshape_input" configuration in performance tests. |
| onnxruntime/python/tools/quantization/matmul_nbits_quantizer.py | Reformats conditions for readability without functional changes. |
| onnxruntime/core/providers/openvino/ov_interface.h | Adds necessary includes for dynamic shapes. |
| onnxruntime/core/providers/openvino/openvino_provider_factory.cc | Incorporates parsing logic for the new reshape_input option. |
| onnxruntime/core/providers/openvino/openvino_parser_utils.{h,cc} | Implements parsing of reshape_input using regular expressions and helper functions. |
| onnxruntime/core/providers/openvino/contexts.h | Introduces the reshape_t type to represent input reshaping settings. |
| onnxruntime/core/providers/openvino/backends/basic_backend.{h,cc} | Uses dynamic_flags to track static, fully dynamic, and bounded dynamic states and introduces dimension validation. |
| onnxruntime/core/providers/openvino/backend_utils.cc | Applies the reshape configuration to the OV model. |
| onnxruntime/core/providers/openvino/backend_manager.{h,cc} | Implements dynamic input validation and ensures reshape_input covers all dynamic inputs. |
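As a rough illustration of the regex-based parsing mentioned for openvino_parser_utils, here is a self-contained sketch. The bracketed `name[lo..hi,dim,...]` syntax, the `DimRange` type, and `ParseReshapeInput` are all assumptions for illustration, not the EP's actual grammar or helpers:

```cpp
#include <cstdint>
#include <map>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// {min, max} bounds for one dimension; equal values denote a static dim.
using DimRange = std::pair<int64_t, int64_t>;

// Parse a spec like "data[1..8,3,224,224]" into per-input dimension bounds.
std::map<std::string, std::vector<DimRange>> ParseReshapeInput(const std::string& spec) {
  std::map<std::string, std::vector<DimRange>> result;
  static const std::regex entry_re(R"((\w+)\[([^\]]+)\])");   // name[dims]
  static const std::regex dim_re(R"((\d+)(?:\.\.(\d+))?)");   // "lo..hi" or "n"
  for (std::sregex_iterator it(spec.begin(), spec.end(), entry_re), end; it != end; ++it) {
    const std::string dims_str = (*it)[2].str();
    std::vector<DimRange> dims;
    for (std::sregex_iterator d(dims_str.begin(), dims_str.end(), dim_re), dend; d != dend; ++d) {
      int64_t lo = std::stoll((*d)[1].str());
      int64_t hi = (*d)[2].matched ? std::stoll((*d)[2].str()) : lo;
      dims.emplace_back(lo, hi);
    }
    result[(*it)[1].str()] = std::move(dims);
  }
  return result;
}
```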
Comments suppressed due to low confidence (1):

onnxruntime/test/perftest/ort_test_session.cc:826 — Consider adding test cases to verify that the new 'reshape_input' key is correctly handled and its effects are appropriately validated in performance tests.

```cpp
ov_options[key] = value;
```
```cpp
int64_t min_dim = ov_dim.get_min_length();
int64_t max_dim = ov_dim.get_max_length();
if (ort_dim < min_dim || ort_dim > max_dim) {
  ORT_THROW(" ORT Dimension is out of range");
```
Enhance the error message in ValidateOrtDimsAgainstPartialShape to include additional context (such as the dimension index and expected range) to aid debugging.
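A minimal sketch of the richer message the suggestion describes, assuming the caller can pass the dimension index. `CheckDimInRange` and the use of `std::runtime_error` are illustrative stand-ins; the real code throws via `ORT_THROW`:

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>

// Validate one ORT dimension against the OV partial-shape bounds, reporting
// the dimension index, the offending value, and the expected range.
void CheckDimInRange(size_t dim_index, int64_t ort_dim, int64_t min_dim, int64_t max_dim) {
  if (ort_dim < min_dim || ort_dim > max_dim) {
    std::ostringstream oss;
    oss << "ORT dimension " << dim_index << " (value " << ort_dim
        << ") is out of range [" << min_dim << ", " << max_dim << "]";
    throw std::runtime_error(oss.str());
  }
}
```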
```diff
@@ -146,6 +146,10 @@ CreateOVModel(std::string&& model,
   try {
     auto ov_model = OVCore::Get()->ReadModel(std::move(model), session_context.onnx_model_path_name.string());

+    if (!session_context.reshape.empty()) {
+      LOGS_DEFAULT(INFO) << log_tag << "Reshaping the ov tensor to specified shape";
```
[nitpick] Consider logging the specific reshape details (e.g., target shape values) to provide clearer insight during debugging and validation of the reshaping process.
```diff
- LOGS_DEFAULT(INFO) << log_tag << "Reshaping the ov tensor to specified shape";
+ LOGS_DEFAULT(INFO) << log_tag << "Reshaping the ov tensor to specified shape: " << session_context.reshape;
```
Approved
@preetha-intel @jatinwadhwa921 please fix the Internal CI.
@vthaniel please approve from your end.
@preetha-intel @sfatimar @jatinwadhwa921 All unit tests pass. Internal CI building fine.
* Refactored the code for the reshape feature
* Refactor the inference logic accommodating bounded dimensions
* Fix lint issues
* Refactor OV shapes classification to be a part of the bindings struct
* Refactor the provider options key verification for the python interface
* Restrict removal of model proto when CPU fallback is enabled and fix unit test failures

Co-authored-by: jatinwadhwa921 <[email protected]>
```diff
  // Unified OV compile_model is efficient when ov model caching is enabled
  // Unified OV compile_model API is supported with AUTO from version 2024.3 and above
  // Inputs with static dimensions
  // Not enabled for models with external weights and when ep context is set.
  const std::string model = model_proto->SerializeAsString();
- // we have the serialized string, so we can release model proto to lower the peak memory consumption
- model_proto.reset();
+ if (disable_cpu_fallback) model_proto.reset();
```
@preetha-intel Wouldn't this introduce a compile-time memory regression when CPU fallback is not disabled and the model fully compiles on NPU?
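The trade-off behind this question can be sketched as follows; `FakeModelProto` and `SerializeAndMaybeRelease` are simplified stand-ins for illustration, not the EP's actual types. Because `SerializeAsString()` copies the model bytes, keeping the proto alive (fallback enabled) means both copies coexist and peak memory roughly doubles:

```cpp
#include <memory>
#include <string>

// Simplified stand-in for the protobuf ModelProto used in the diff above.
struct FakeModelProto {
  std::string bytes;
  std::string SerializeAsString() const { return bytes; }  // copies the model
};

// Serialize the model; release the proto only when CPU fallback cannot need
// it again. When fallback stays enabled, the proto and the serialized string
// coexist, which is the peak-memory concern raised in the review.
std::string SerializeAndMaybeRelease(std::unique_ptr<FakeModelProto>& model_proto,
                                     bool disable_cpu_fallback) {
  std::string model = model_proto->SerializeAsString();
  if (disable_cpu_fallback) model_proto.reset();
  return model;
}
```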
Description
Introduce a user option to convert a fully dynamic model into a bounded dynamic model for efficiency.