feat: NVIDIA allow non-llama model registration #1859
base: main
Conversation
Force-pushed from 4032efa to 27a1657
thanks for adding this. few comments inline for you.
if _is_nvidia_hosted(self._config) and provider_model_id in special_model_urls:
    base_url = special_model_urls[provider_model_id]

# add /v1 in case of hosted models
this is a behavior change.
current behavior: always add /v1
new behavior: add /v1 for hosted and don't for non-hosted
the behavior should be consistent between hosted and non-hosted. a user should not need to know that because they're talking to https://integrate.api.nv.c they can omit the /v1, or that because they're talking to http://localhost they have to supply it.
is there an issue w/ /v1 and customizer?
Adding /v1 may produce errors, especially when the user is specifying NVIDIA_BASE_URL.
Can we remove the /v1 entirely and add it to the default base_url instead? What would we miss in that case?
Some model endpoints on the API catalog follow /chat/completions.
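For reference, one way to keep hosted and self-hosted endpoints consistent would be to normalize the suffix in a single place. This is a minimal sketch with a hypothetical helper, not the code in this PR, and the URLs are placeholders:

def _ensure_v1(base_url: str) -> str:
    """Append /v1 exactly once, so callers never need to know whether to supply it."""
    stripped = base_url.rstrip("/")
    return stripped if stripped.endswith("/v1") else f"{stripped}/v1"

# placeholder URLs for illustration:
#   _ensure_v1("https://hosted.example.com")   -> "https://hosted.example.com/v1"
#   _ensure_v1("http://localhost:8000/v1")     -> "http://localhost:8000/v1"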
NOTE: Only supports model endpoints compatible with the AsyncOpenAI base_url format.
"""
if model.model_type == ModelType.embedding:
    # embedding models are always registered by their provider model id and do not need to be mapped to a llama model
this should be handled within the provider model id function
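A rough sketch of that suggestion, using a hypothetical helper name and assuming the Model/ModelType types from llama_stack.apis.models plus the base class's get_provider_model_id lookup:

def _resolve_provider_model_id(self, model) -> str:
    # hypothetical helper: keep the embedding special-case next to the id lookup
    if model.model_type == ModelType.embedding:
        # embedding models are registered by their provider model id directly;
        # no mapping to a llama model is needed
        return model.provider_resource_id
    # for llm models, try the known alias table first, otherwise trust the supplied id
    return self.get_provider_model_id(model.provider_resource_id) or model.provider_resource_id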
if provider_resource_id:
    model.provider_resource_id = provider_resource_id
else:
    llama_model = model.metadata.get("llama_model")
i believe model is a https://github.com/meta-llama/llama-stack/blob/main/llama_stack/apis/models/models.py#L31, which does not have a metadata field.
https://github.com/meta-llama/llama-stack/blob/main/llama_stack/models/llama/datatypes.py#L346 has a metadata field, and confusingly the same class name.
suggestion: trust the input config. it should fail at inference if it's incorrect.
This is the base register_model() logic: https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/utils/inference/model_registry.py#L76
I modified only the parts needed to allow non-llama models.
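For comparison, a minimal sketch of the "trust the input config" approach suggested above (not the code in this PR; registry bookkeeping and most validation omitted):

async def register_model(self, model: Model) -> Model:
    # trust the provider_resource_id supplied in the config and skip the
    # llama-model mapping entirely. an incorrect id will fail at inference
    # time against the NIM endpoint, which is the failure mode suggested above.
    if model.model_type != ModelType.embedding and not model.provider_resource_id:
        raise ValueError(f"provider_resource_id is required to register model {model.identifier}")
    return model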
What does this PR do?
Adds custom model registration functionality to NVIDIAInferenceAdapter, which lets inference happen on:
Example Usage:
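A hedged sketch of what registering a non-llama model could look like through the llama-stack client; the model identifier, provider id string, and port are placeholders rather than values taken from this PR:

from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# register a non-llama NIM model with the NVIDIA inference provider
client.models.register(
    model_id="nvidia/nemotron-example",        # placeholder identifier
    provider_model_id="nvidia/nemotron-example",
    provider_id="nvidia",
    model_type="llm",
)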
Test Plan
Updated Readme.md
cc: @dglogo, @sumitb, @mattf