feat: NVIDIA allow non-llama model registration #1859
base: main
Conversation
Force-pushed from 4032efa to 27a1657
thanks for adding this. few comments inline for you.
if _is_nvidia_hosted(self._config) and provider_model_id in special_model_urls:
    base_url = special_model_urls[provider_model_id]

# add /v1 in case of hosted models
this is a behavior change.
current behavior: always add /v1
new behavior: add /v1 for hosted and don't for non-hosted
the behavior should be consistent between hosted and non-hosted. a user should not need to know that because they're talking to https://integrate.api.nv.c they can omit the /v1, or that because they're talking to http://localhost they have to supply it.
is there an issue w/ /v1 and customizer?
Adding /v1 may produce errors, especially when the user is specifying NVIDIA_BASE_URL.
Can we remove the /v1 entirely and add it to the default base_url instead? What would we miss in that case?
Some model endpoints on the API catalog follow /chat/completions.
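For reference, one way to keep hosted and self-hosted endpoints consistent would be to normalize the suffix in a single place. This is a minimal sketch with a hypothetical helper, not the code in this PR, and the URLs are placeholders:

def _ensure_v1(base_url: str) -> str:
    """Append /v1 exactly once, so callers never need to know whether to supply it."""
    stripped = base_url.rstrip("/")
    return stripped if stripped.endswith("/v1") else f"{stripped}/v1"

# placeholder URLs for illustration:
#   _ensure_v1("https://hosted.example.com")   -> "https://hosted.example.com/v1"
#   _ensure_v1("http://localhost:8000/v1")     -> "http://localhost:8000/v1"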
NOTE: Only supports model endpoints compatible with the AsyncOpenAI base_url format.
"""
if model.model_type == ModelType.embedding:
    # embedding models are always registered by their provider model id and do not need to be mapped to a llama model
this should be handled within the provider model id function
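A rough sketch of that suggestion, using a hypothetical helper name and assuming the Model/ModelType types from llama_stack.apis.models plus the base class's get_provider_model_id lookup:

def _resolve_provider_model_id(self, model) -> str:
    # hypothetical helper: keep the embedding special-case next to the id lookup
    if model.model_type == ModelType.embedding:
        # embedding models are registered by their provider model id directly;
        # no mapping to a llama model is needed
        return model.provider_resource_id
    # for llm models, try the known alias table first, otherwise trust the supplied id
    return self.get_provider_model_id(model.provider_resource_id) or model.provider_resource_id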
if provider_resource_id:
    model.provider_resource_id = provider_resource_id
else:
    llama_model = model.metadata.get("llama_model")
i believe model is a https://github.com/meta-llama/llama-stack/blob/main/llama_stack/apis/models/models.py#L31, which does not have a metadata field.
https://github.com/meta-llama/llama-stack/blob/main/llama_stack/models/llama/datatypes.py#L346 has a metadata field, and confusingly the same class name.
suggestion: trust the input config. it should fail at inference if it's incorrect.
This is the base register_model() logic: https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/utils/inference/model_registry.py#L76
I modified only the parts needed to allow non-llama models.
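For comparison, a minimal sketch of the "trust the input config" approach suggested above (not the code in this PR; registry bookkeeping and most validation omitted):

async def register_model(self, model: Model) -> Model:
    # trust the provider_resource_id supplied in the config and skip the
    # llama-model mapping entirely. an incorrect id will fail at inference
    # time against the NIM endpoint, which is the failure mode suggested above.
    if model.model_type != ModelType.embedding and not model.provider_resource_id:
        raise ValueError(f"provider_resource_id is required to register model {model.identifier}")
    return model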
What does this PR do?
Adds custom model registration functionality to NVIDIAInferenceAdapter, which lets inference happen on:
Example Usage:
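A hedged sketch of what registering a non-llama model could look like through the llama-stack client; the model identifier, provider id string, and port are placeholders rather than values taken from this PR:

from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# register a non-llama NIM model with the NVIDIA inference provider
client.models.register(
    model_id="nvidia/nemotron-example",        # placeholder identifier
    provider_model_id="nvidia/nemotron-example",
    provider_id="nvidia",
    model_type="llm",
)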
Test Plan
Updated Readme.md
cc: @dglogo, @sumitb, @mattf