
Can't use Huggingface Inference API (serverless) due to hardcoded /generate path #849

Open
Description

@eschnou

Spring AI has support for Hugging Face Inference endpoints. However, this doesn't work with the serverless version of the Inference API, because a hardcoded '/generate' subpath is used.

Bug description
Configure Hugging Face and use a serverless inference endpoint such as:
spring.ai.huggingface.chat.url=https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct

This will result in an exception with the following error:
```
404 Not Found: "{"error":"Model meta-llama/Meta-Llama-3-8B-Instruct/generate does not exist"}"
```

This is because the chat model calls the generate method (which leads the generated OpenAPI client to call /generate), whereas it seems it should call the compatGenerate method, which invokes the endpoint at the / path.

https://github.com/spring-projects/spring-ai/blob/v1.0.0-M1/models/spring-ai-huggingface/src/main/java/org/springframework/ai/huggingface/HuggingfaceChatModel.java#L97
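To make the path difference concrete, here is a minimal sketch of the two URL shapes involved. `EndpointPaths` and its methods are hypothetical illustration helpers, not part of Spring AI; only the base URL comes from the report above.

```java
// Hypothetical helper contrasting the URL the generated client builds
// with the URL the serverless Inference API expects (illustration only).
public class EndpointPaths {
    static final String BASE =
        "https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct";

    // What the client builds today: the TGI-style /generate subpath,
    // which the serverless API rejects with the 404 shown above.
    static String generateUrl(String base) {
        return base + "/generate";
    }

    // What the serverless Inference API expects: a request to the model root path.
    static String serverlessUrl(String base) {
        return base;
    }

    public static void main(String[] args) {
        System.out.println(generateUrl(BASE));
        System.out.println(serverlessUrl(BASE));
    }
}
```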

Environment
Spring AI 1.0.0-M1

Expected behavior
I would expect this library to work with the serverless version of the inference endpoints (which are much cheaper 😅). If the path has to differ between versions, it should be configurable.
