
Gemini embeddings apparently not sending the api key #1478

Closed
josefresna opened this issue Mar 27, 2025 · 7 comments
Labels
bug Something isn't working

Comments

@josefresna

josefresna commented Mar 27, 2025

First, thank you very much for this amazing tool.

Describe the bug

I would like to use Gemini embeddings.

To do that, I specified the following configuration:

type: embedder
provider: litellm_embedder
models:
  - model: gemini/text-embedding-004 # gemini/<gemini_model_name>
    alias: default
    api_base: https://generativelanguage.googleapis.com/v1beta/models/text-embedding-004 
    timeout: 120

Also in the .env I have specified:

GEMINI_API_KEY=AIzaSyB4FcSN.....

This configuration was working in WrenAI 0.16.0 (more specifically, with wren-ai-service 0.16.5), but not in 0.17.0 or 0.18.0.

As far as I can tell, wren-ai-service is not sending the API key to Gemini.

This is the error I am getting:

> embedding [src.pipelines.retrieval.retrieval.embedding()] encountered an error<
2025-03-27 17:15:31 > Node inputs:
2025-03-27 17:15:31 {'embedder': '<src.providers.embedder.litellm.AsyncTextEmbedder ...',
2025-03-27 17:15:31  'histories': None,
2025-03-27 17:15:31  'query': "'What are the most common job step kinds (jstkind)..."}
2025-03-27 17:15:31 ********************************************************************************
2025-03-27 17:15:31 Traceback (most recent call last):
2025-03-27 17:15:31   File "/app/.venv/lib/python3.12/site-packages/litellm/main.py", line 3258, in aembedding
2025-03-27 17:15:31     response = await init_response  # type: ignore
2025-03-27 17:15:31                ^^^^^^^^^^^^^^^^^^^
2025-03-27 17:15:31   File "/app/.venv/lib/python3.12/site-packages/litellm/llms/vertex_ai/gemini_embeddings/batch_embed_content_handler.py", line 165, in async_batch_embeddings
2025-03-27 17:15:31     response = await async_handler.post(
2025-03-27 17:15:31                ^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-27 17:15:31   File "/app/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/logging_utils.py", line 135, in async_wrapper
2025-03-27 17:15:31     result = await func(*args, **kwargs)
2025-03-27 17:15:31              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-27 17:15:31   File "/app/.venv/lib/python3.12/site-packages/litellm/llms/custom_httpx/http_handler.py", line 257, in post
2025-03-27 17:15:31     raise e
2025-03-27 17:15:31   File "/app/.venv/lib/python3.12/site-packages/litellm/llms/custom_httpx/http_handler.py", line 213, in post
2025-03-27 17:15:31     response.raise_for_status()
2025-03-27 17:15:31   File "/app/.venv/lib/python3.12/site-packages/httpx/_models.py", line 763, in raise_for_status
2025-03-27 17:15:31     raise HTTPStatusError(message, request=request, response=self)
2025-03-27 17:15:31 httpx.HTTPStatusError: Client error '403 Forbidden' for url 'https://generativelanguage.googleapis.com/v1beta/models/text-embedding-004:batchEmbedContents'
2025-03-27 17:15:31 For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403
2025-03-27 17:15:31 
2025-03-27 17:15:31 During handling of the above exception, another exception occurred:
2025-03-27 17:15:31 
2025-03-27 17:15:31 Traceback (most recent call last):
2025-03-27 17:15:31   File "/app/.venv/lib/python3.12/site-packages/hamilton/async_driver.py", line 122, in new_fn
2025-03-27 17:15:31     await fn(**fn_kwargs) if asyncio.iscoroutinefunction(fn) else fn(**fn_kwargs)
2025-03-27 17:15:31     ^^^^^^^^^^^^^^^^^^^^^
2025-03-27 17:15:31   File "/app/.venv/lib/python3.12/site-packages/langfuse/decorators/langfuse_decorator.py", line 219, in async_wrapper
2025-03-27 17:15:31     self._handle_exception(observation, e)
2025-03-27 17:15:31   File "/app/.venv/lib/python3.12/site-packages/langfuse/decorators/langfuse_decorator.py", line 520, in _handle_exception
2025-03-27 17:15:31     raise e
2025-03-27 17:15:31   File "/app/.venv/lib/python3.12/site-packages/langfuse/decorators/langfuse_decorator.py", line 217, in async_wrapper
2025-03-27 17:15:31     result = await func(*args, **kwargs)
2025-03-27 17:15:31              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-27 17:15:31   File "/src/pipelines/retrieval/retrieval.py", line 129, in embedding
2025-03-27 17:15:31     return await embedder.run(query)
2025-03-27 17:15:31            ^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-27 17:15:31   File "/app/.venv/lib/python3.12/site-packages/backoff/_async.py", line 151, in retry
2025-03-27 17:15:31     ret = await target(*args, **kwargs)
2025-03-27 17:15:31           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-27 17:15:31   File "/src/providers/embedder/litellm.py", line 62, in run
2025-03-27 17:15:31     response = await aembedding(
2025-03-27 17:15:31                ^^^^^^^^^^^^^^^^^
2025-03-27 17:15:31   File "/app/.venv/lib/python3.12/site-packages/litellm/utils.py", line 1441, in wrapper_async
2025-03-27 17:15:31     raise e
2025-03-27 17:15:31   File "/app/.venv/lib/python3.12/site-packages/litellm/utils.py", line 1300, in wrapper_async
2025-03-27 17:15:31     result = await original_function(*args, **kwargs)
2025-03-27 17:15:31              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-27 17:15:31   File "/app/.venv/lib/python3.12/site-packages/litellm/main.py", line 3274, in aembedding
2025-03-27 17:15:31     raise exception_type(
2025-03-27 17:15:31           ^^^^^^^^^^^^^^^
2025-03-27 17:15:31   File "/app/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2214, in exception_type
2025-03-27 17:15:31     raise e
2025-03-27 17:15:31   File "/app/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 1150, in exception_type
2025-03-27 17:15:31     raise BadRequestError(
2025-03-27 17:15:31 litellm.exceptions.BadRequestError: litellm.BadRequestError: VertexAIException BadRequestError - {
2025-03-27 17:15:31   "error": {
2025-03-27 17:15:31     "code": 403,
2025-03-27 17:15:31     "message": "Method doesn't allow unregistered callers (callers without established identity). Please use API Key or other form of API consumer identity to call this API.",
2025-03-27 17:15:31     "status": "PERMISSION_DENIED"
2025-03-27 17:15:31   }
2025-03-27 17:15:31 }
2025-03-27 17:15:31 
2025-03-27 17:15:31 -------------------------------------------------------------------
2025-03-27 17:15:31 Oh no an error! Need help with Hamilton?
2025-03-27 17:15:31 Join our slack and ask for help! https://join.slack.com/t/hamilton-opensource/shared_invite/zt-2niepkra8-DGKGf_tTYhXuJWBTXtIs4g
2025-03-27 17:15:31 -------------------------------------------------------------------
2025-03-27 17:15:31 
2025-03-27 17:15:31 E0327 16:15:31.790 8 wren-ai-service:151] Request 74e96005-777e-4cd4-a0d8-209217178b3c: Error validating question: litellm.BadRequestError: VertexAIException BadRequestError - {
2025-03-27 17:15:31   "error": {
2025-03-27 17:15:31     "code": 403,
2025-03-27 17:15:31     "message": "Method doesn't allow unregistered callers (callers without established identity). Please use API Key or other form of API consumer identity to call this API.",
2025-03-27 17:15:31     "status": "PERMISSION_DENIED"
2025-03-27 17:15:31   }
2025-03-27 17:15:31 }
2025-03-27 17:15:31 
2025-03-27 17:15:32 INFO:     172.18.0.6:43574 - "GET /v1/question-recommendations/74e96005-777e-4cd4-a0d8-209217178b3c HTTP/1.1" 200 OK

This is the request that seems to fail:

https://generativelanguage.googleapis.com/v1beta/models/text-embedding-004:batchEmbedContents

This should be the request (with the API key as a query string):

https://generativelanguage.googleapis.com/v1beta/models/text-embedding-004:batchEmbedContents?key=AIzaSyB4b...

So it is apparently a matter of sending the API key.
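To make the difference concrete, here is a small sketch (not WrenAI code; the helper name and the placeholder key are made up) of how the failing URL differs from the working one. The Generative Language API also accepts the key via the x-goog-api-key header, but the query-string form is what the older version appeared to use:

```python
from urllib.parse import urlencode, urlparse

# The URL wren-ai-service actually calls (no credentials attached):
BASE = ("https://generativelanguage.googleapis.com/v1beta/"
        "models/text-embedding-004:batchEmbedContents")

def with_api_key(url: str, api_key: str) -> str:
    """Append the API key as a ?key= query parameter, which the
    Generative Language API accepts for API-key auth."""
    sep = "&" if urlparse(url).query else "?"
    return f"{url}{sep}{urlencode({'key': api_key})}"

# Placeholder key, for illustration only:
print(with_api_key(BASE, "AIzaSyEXAMPLE"))
# → ...models/text-embedding-004:batchEmbedContents?key=AIzaSyEXAMPLE
```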

To Reproduce
Steps to reproduce the behavior:

  1. Configure wrenAI with gemini
  2. Try to deploy a schema
  3. See the errors in the wren-ai-service

Expected behavior
I expected it to work as it did in previous versions.


Desktop (please complete the following information):

  • OS: macOS
  • Browser: Safari

Wren AI Information

  • Version: 0.17.0

Additional context
Everything seems to work fine if I specify version 0.16.5 of wren-ai-service.

Relevant log output

Config.yaml

type: llm
provider: litellm_llm
models:
  - model: gemini/gemini-2.0-flash # gemini/<gemini_model_name>
    alias: default
    timeout: 120
    kwargs:
      n: 1
      temperature: 0
  - model: gemini/gemini-2.0-flash # gemini/<gemini_model_name>
    alias: gemini-llm-for-chart
    timeout: 120
    kwargs:
      n: 1
      temperature: 0
      response_format:
        type: json_object

---
type: embedder
provider: litellm_embedder
models:
  - model: gemini/text-embedding-004 
    alias: default
    api_base: https://generativelanguage.googleapis.com/v1beta/models/text-embedding-004 
    timeout: 120

---
type: engine
provider: wren_ui
endpoint: http://wren-ui:3000

---
type: engine
provider: wren_ibis
endpoint: http://wren-ibis:8000

---
type: document_store
provider: qdrant
location: http://qdrant:6333
embedding_model_dim: 768 # put your embedding model dimension here
timeout: 120
recreate_index: true

---
type: pipeline
pipes:
  - name: db_schema_indexing
    embedder: litellm_embedder.default
    document_store: qdrant
  - name: historical_question_indexing
    embedder: litellm_embedder.default
    document_store: qdrant
  - name: table_description_indexing
    embedder: litellm_embedder.default
    document_store: qdrant
  - name: db_schema_retrieval
    llm: litellm_llm.default
    embedder: litellm_embedder.default
    document_store: qdrant
  - name: historical_question_retrieval
    embedder: litellm_embedder.default
    document_store: qdrant
  - name: sql_generation
    llm: litellm_llm.default
    engine: wren_ui
  - name: sql_correction
    llm: litellm_llm.default
    engine: wren_ui
  - name: followup_sql_generation
    llm: litellm_llm.default
    engine: wren_ui
  - name: sql_summary
    llm: litellm_llm.default
  - name: sql_answer
    llm: litellm_llm.default
    engine: wren_ui
  - name: sql_breakdown
    llm: litellm_llm.default
    engine: wren_ui
  - name: sql_expansion
    llm: litellm_llm.default
    engine: wren_ui
  - name: semantics_description
    llm: litellm_llm.default
  - name: relationship_recommendation
    llm: litellm_llm.default
    engine: wren_ui
  - name: question_recommendation
    llm: litellm_llm.default
  - name: question_recommendation_db_schema_retrieval
    llm: litellm_llm.default
    embedder: litellm_embedder.default
    document_store: qdrant
  - name: question_recommendation_sql_generation
    llm: litellm_llm.default
    engine: wren_ui
  - name: chart_generation
    llm: litellm_llm.gemini-llm-for-chart
  - name: chart_adjustment
    llm: litellm_llm.gemini-llm-for-chart
  - name: intent_classification
    llm: litellm_llm.default
    embedder: litellm_embedder.default
    document_store: qdrant
  - name: data_assistance
    llm: litellm_llm.default
  - name: sql_pairs_indexing
    document_store: qdrant
    embedder: litellm_embedder.default
  - name: sql_pairs_retrieval
    document_store: qdrant
    embedder: litellm_embedder.default
    llm: litellm_llm.default
  - name: preprocess_sql_data
    llm: litellm_llm.default
  - name: sql_executor
    engine: wren_ui
  - name: sql_question_generation
    llm: litellm_llm.default
  - name: sql_generation_reasoning
    llm: litellm_llm.default
  - name: followup_sql_generation_reasoning
    llm: litellm_llm.default
  - name: sql_regeneration
    llm: litellm_llm.default
    engine: wren_ui
  - name: instructions_indexing
    embedder: litellm_embedder.default
    document_store: qdrant
  - name: instructions_retrieval
    embedder: litellm_embedder.default
    document_store: qdrant
  - name: sql_functions_retrieval
    engine: wren_ibis
    document_store: qdrant
  - name: project_meta_indexing
    document_store: qdrant

---
settings:
  column_indexing_batch_size: 50
  table_retrieval_size: 10
  table_column_retrieval_size: 100
  allow_using_db_schemas_without_pruning: false 
  allow_intent_classification: true
  allow_sql_generation_reasoning: true
  query_cache_maxsize: 1000
  query_cache_ttl: 3600
  langfuse_host: https://cloud.langfuse.com
  langfuse_enable: true
  logging_level: DEBUG
  development: true
  historical_question_retrieval_similarity_threshold: 0.9
  sql_pairs_similarity_threshold: 0.7
  sql_pairs_retrieval_max_size: 10
  instructions_similarity_threshold: 0.7
  instructions_top_k: 10

Attached logs:

wrenai-ibis-server.log

wrenai-wren-ai-service.log

wrenai-wren-engine.log

wrenai-wren-ui.log

Thanks!

My Workaround:

1. Add an nginx proxy to the docker compose file:

nginx:
    image: nginx:1-alpine
    ports:
      - 8089:80
    volumes:
      - ./nginx/default.conf:/etc/nginx/conf.d/default.conf
    networks:
      - wren

2. Configure a proxy rule to add the API key:

server {
    location /v1beta/ {
      rewrite ^(.*)$ $1?key=AIzaSyB4Fc.....  break;
      proxy_pass https://generativelanguage.googleapis.com;
    }
}

3. Specify nginx as the embedding host:

type: embedder
provider: litellm_embedder
models:
  - model: gemini/text-embedding-004
    alias: default
    api_base: http://nginx/v1beta/models/text-embedding-004
    timeout: 120

4. Start WrenAI with docker compose up instead of the launcher.

This workaround suggests that the configuration I had specified was OK (Gemini URLs, API key, ...).

So it is a matter of sending the API key when WrenAI asks for the embeddings.

Thanks!

@josefresna josefresna added the bug Something isn't working label Mar 27, 2025
@agupta7

agupta7 commented Apr 3, 2025

I'm facing the same thing with the latest release 0.19.0. I added some log lines to the code:

diff --git a/wren-ai-service/src/providers/embedder/litellm.py b/wren-ai-service/src/providers/embedder/litellm.py
index f23025b8..5922bb6d 100644
--- a/wren-ai-service/src/providers/embedder/litellm.py
+++ b/wren-ai-service/src/providers/embedder/litellm.py
@@ -95,6 +95,7 @@ class AsyncDocumentEmbedder:
     async def _embed_batch(
         self, texts_to_embed: List[str], batch_size: int, progress_bar: bool = True
     ) -> Tuple[List[List[float]], Dict[str, Any]]:
+        logger.info(f"Will create batch embeddings with key {self._api_key} and URL {self._api_base_url} and model {self._model}")
         all_embeddings = []
         meta: Dict[str, Any] = {}
         for i in tqdm(
@@ -177,6 +178,7 @@ class LitellmEmbedderProvider(EmbedderProvider):
             f"Initializing LitellmEmbedder provider with API base: {self._api_base}"
         )
         logger.info(f"Using Embedding Model: {self._embedding_model}")
+        logger.info(f"Using {api_key_name}={self._api_key}")
 
     def get_text_embedder(self):
         return AsyncTextEmbedder(

I added an api_key_name to the embedder section of config.google_ai_studio.yaml:

type: embedder
provider: litellm_embedder
models:
  # put GEMINI_API_KEY=<your_api_key> in ~/.wrenai/.env
  - model: gemini/text-embedding-004 # gemini/<gemini_model_name>
    alias: default
    api_base: https://generativelanguage.googleapis.com/v1beta/models/text-embedding-004 # change this according to your embedding model
    api_key_name: GEMINI_API_KEY
    timeout: 120

It makes it to the LitellmEmbedderProvider in wren-ai-service just fine, but litellm doesn't seem to use it, just as @josefresna mentioned above.

I0403 04:32:10.745 7 wren-ai-service:180] Using Embedding Model: gemini/text-embedding-004
I0403 04:32:10.745 7 wren-ai-service:181] Using GEMINI_API_KEY=AIzaS...
...
I0403 04:24:02.131 8 wren-ai-service:98] Will create batch embeddings with key AIzaS... and URL https://generativelanguage.googleapis.com/v1beta/models/text-embedding-004 and model gemini/text-embedding-004
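So the key is loaded and logged, but never reaches the request. Presumably the fix just needs to forward that already-loaded key into the litellm call; litellm does accept api_key and api_base as per-call keyword arguments. A minimal sketch of that idea (build_embedding_kwargs is a hypothetical helper for illustration, not WrenAI code):

```python
from typing import Any, Dict, List, Optional

def build_embedding_kwargs(
    model: str,
    texts: List[str],
    api_key: Optional[str] = None,
    api_base: Optional[str] = None,
    timeout: int = 120,
) -> Dict[str, Any]:
    """Assemble kwargs for litellm.aembedding(), only including
    options that were actually configured."""
    kwargs: Dict[str, Any] = {"model": model, "input": texts, "timeout": timeout}
    if api_key:
        kwargs["api_key"] = api_key    # forward the configured key per call
    if api_base:
        kwargs["api_base"] = api_base  # only override the endpoint if set
    return kwargs

kwargs = build_embedding_kwargs(
    "gemini/text-embedding-004", ["hello world"], api_key="AIzaSyEXAMPLE"
)
# then: response = await litellm.aembedding(**kwargs)
```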

I followed @josefresna's excellent idea for the workaround and it worked.

@cyyeh
Member

cyyeh commented Apr 3, 2025

Thanks for raising the issue; I will look into it this weekend.

@josefresna
Author

Great! Thank you very much

@cyyeh
Copy link
Member

cyyeh commented Apr 7, 2025

@josefresna @agupta7 we've fixed the issue.

Please use WREN_AI_SERVICE_VERSION=0.19.2 in ~/.wrenai/.env and use this config.google_ai_studio.yaml as an example:

https://github.com/Canner/WrenAI/blob/main/wren-ai-service/docs/config_examples/config.google_ai_studio.yaml
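For reference, the working embedder section ends up without an explicit api_base — as I read the fix, litellm resolves the Gemini endpoint (and attaches the key) itself when api_base is omitted. A sketch of the embedder block; see the linked file for the authoritative version:

```yaml
type: embedder
provider: litellm_embedder
models:
  # put GEMINI_API_KEY=<your_api_key> in ~/.wrenai/.env
  - model: gemini/text-embedding-004
    alias: default
    timeout: 120  # no api_base: litellm builds the Gemini URL itself
```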

@agupta7
Copy link

agupta7 commented Apr 8, 2025

It works now, thanks! I was looking at how you fixed it in #1524 and I'm a little confused. How did making api_base optional in LitellmEmbedderProvider fix it? Does Litellm know its own default api_base URLs for a given model (gemini in my case)?

@cyyeh
Member

cyyeh commented Apr 8, 2025

@cyyeh
Member

cyyeh commented Apr 8, 2025

I'll close the issue now.

@cyyeh cyyeh closed this as completed Apr 8, 2025