Description
Spring AI has support for Hugging Face Inference Endpoints. However, this does not work with the 'serverless' version of the Inference API, because a hardcoded '/generate' subpath is used.
Bug description
Configure Hugging Face with a serverless inference endpoint, such as:
```
spring.ai.huggingface.chat.url=https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct
```
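For reference, a minimal reproduction sketch, assuming the auto-configured HuggingfaceChatModel bean (class name as of 1.0.0-M1); any prompt triggers the failure:

```java
import org.springframework.ai.huggingface.HuggingfaceChatModel;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class HuggingfaceRepro {

    public static void main(String[] args) {
        SpringApplication.run(HuggingfaceRepro.class, args);
    }

    @Bean
    CommandLineRunner repro(HuggingfaceChatModel chatModel) {
        // Fails: the underlying client POSTs to <configured url>/generate,
        // a path the serverless API does not expose.
        return args -> System.out.println(chatModel.call("Hello"));
    }
}
```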
This will result in an exception with the following error:
```
404 Not Found: "{"error":"Model meta-llama/Meta-Llama-3-8B-Instruct/generate does not exist"}"
```
This is because the chat model calls the generate method (which leads the underlying client to call /generate), while it seems it should call the 'compatGenerate' method instead, which invokes the endpoint at the root '/' path.
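For comparison, a raw request against the root path does succeed on the serverless endpoint. A sketch using Spring's RestClient with the Inference API's plain `inputs` payload (the token lookup is a placeholder):

```java
import org.springframework.http.MediaType;
import org.springframework.web.client.RestClient;

public class ServerlessProbe {
    public static void main(String[] args) {
        String apiKey = System.getenv("HF_API_KEY"); // placeholder for your token
        // POST to the model URL itself (the "/" path of the configured base URL):
        // this succeeds, while the "/generate" subpath returns the 404 above.
        String response = RestClient.create()
                .post()
                .uri("https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct")
                .header("Authorization", "Bearer " + apiKey)
                .contentType(MediaType.APPLICATION_JSON)
                .body("{\"inputs\": \"Hello\"}")
                .retrieve()
                .body(String.class);
        System.out.println(response);
    }
}
```

This suggests the serverless API only serves the root path that compatGenerate targets.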
Environment
Spring AI 1.0.0-M1
Expected behavior
I would expect this library to work with the serverless version of the inference endpoints (which is much cheaper 😅). If the path has to differ between endpoint versions, it should be configurable.
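For illustration only, a hypothetical property (the name is invented here and not part of any release) that would make the path selectable:

```
# Hypothetical property for illustration, not part of 1.0.0-M1:
# serverless endpoints would use the root path instead of /generate.
spring.ai.huggingface.chat.generate-path=/
```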