feat: pgvector - make error messages more informative #1684

Merged on May 6, 2025 (3 commits)

superkelvint
Contributor

@superkelvint superkelvint commented May 1, 2025

The pgvector document store does not currently report SQL exceptions, which makes debugging difficult.

Using hayhooks, for example, with this patch, the error goes from this:

Server error: Pipeline execution failed: The following component failed to run:
Component name: 'indexing:writer'
Component type: 'DocumentWriter'
Error: Could not write documents to PgvectorDocumentStore.
You can find the SQL query and the parameters in the debug logs.

to this:

Server error: Pipeline execution failed: The following component failed to run:
Component name: 'indexing:writer'
Component type: 'DocumentWriter'
Error: Could not write documents to PgvectorDocumentStore:
expected 768 dimensions, not 384
You can find the SQL query and the parameters in the debug logs.

Proposed Changes:

Include the SQL exception in the reporting message.

@superkelvint superkelvint requested a review from a team as a code owner May 1, 2025 15:51
@superkelvint superkelvint requested review from vblagoje and removed request for a team May 1, 2025 15:51
@anakin87 anakin87 self-requested a review May 2, 2025 08:50
@anakin87
Member

anakin87 commented May 5, 2025

Ok. I see the issue but I would not overcrowd the error message if possible.

With Haystack

import os
from haystack_integrations.document_stores.pgvector import PgvectorDocumentStore
from haystack import Document

os.environ["PG_CONN_STR"] = "postgresql://postgres:postgres@localhost:5432/postgres"

document_store = PgvectorDocumentStore(
    embedding_dimension=5,
    vector_function="cosine_similarity",
    recreate_table=True,
    search_strategy="hnsw",
)

document_store.write_documents([
    Document(content="This is first", embedding=[0.1]*2),
    Document(content="This is second", embedding=[0.3]*2),
])
print(document_store.count_documents())

Error:

Traceback (most recent call last):
  File "/Users/stefano.fiorucci/dev/haystack-core-integrations/integrations/pgvector/src/haystack_integrations/document_stores/pgvector/document_store.py", line 791, in write_documents
    self._cursor.executemany(sql_insert, db_documents, returning=True)
  File "/Users/stefano.fiorucci/Library/Application Support/hatch/env/virtual/pgvector-haystack/tK4vRGuL/pgvector-haystack/lib/python3.12/site-packages/psycopg/cursor.py", line 128, in executemany
    raise ex.with_traceback(None)
psycopg.errors.DataException: expected 5 dimensions, not 2

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/stefano.fiorucci/dev/haystack-core-integrations/integrations/pgvector/try.py", line 14, in <module>
    document_store.write_documents([
  File "/Users/stefano.fiorucci/dev/haystack-core-integrations/integrations/pgvector/src/haystack_integrations/document_stores/pgvector/document_store.py", line 801, in write_documents
    raise DocumentStoreError(error_msg) from e
haystack.document_stores.errors.errors.DocumentStoreError: Could not write documents to PgvectorDocumentStore. 
You can find the SQL query and the parameters in the debug logs.

In this case, the cause of the error is easy to see by inspecting the stack trace.
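
As a rough illustration of why the stack trace helps here: `raise ... from e` stores the original exception on `__cause__`, so the SQL error can be recovered programmatically. The sketch below uses two hypothetical stand-in exception classes and a hypothetical `root_cause` helper so that it runs without a database; it is not the document store's actual code.

```python
class DataException(Exception):
    """Hypothetical stand-in for psycopg.errors.DataException."""


class DocumentStoreError(Exception):
    """Hypothetical stand-in for haystack's DocumentStoreError."""


def write_documents():
    # Simulates the failing insert and the generic wrapper seen in the traceback.
    try:
        raise DataException("expected 5 dimensions, not 2")
    except DataException as e:
        error_msg = (
            "Could not write documents to PgvectorDocumentStore. \n"
            "You can find the SQL query and the parameters in the debug logs."
        )
        raise DocumentStoreError(error_msg) from e


def root_cause(exc):
    # Hypothetical helper: follow the __cause__ chain set by "raise ... from e".
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc


try:
    write_documents()
except DocumentStoreError as err:
    cause = root_cause(err)
    print(f"{type(cause).__name__}: {cause}")
    # prints "DataException: expected 5 dimensions, not 2"
```

This only works when the caller has access to the exception object, which is not the case for a client that only receives the server's error message.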

In Hayhooks
Similar setting

from haystack import Pipeline
from hayhooks import BasePipelineWrapper
import os
from haystack_integrations.document_stores.pgvector import PgvectorDocumentStore
from haystack import Document
from haystack.components.writers import DocumentWriter

os.environ["PG_CONN_STR"] = "postgresql://postgres:postgres@localhost:5432/postgres"

class PipelineWrapper(BasePipelineWrapper):
    def setup(self) -> None:
        document_store = PgvectorDocumentStore(
            embedding_dimension=5,
            vector_function="cosine_similarity",
            recreate_table=True,
            search_strategy="hnsw",
        )

        pipe = Pipeline()
        document_writer = DocumentWriter(document_store)
        pipe.add_component("document_writer", document_writer)
        self.pipeline = pipe

    def run_api(self, text: str) -> dict:
        # The embedding has 2 dimensions while the store expects 5,
        # so this write fails on purpose.
        document = Document(content=text, embedding=[0.1]*2)
        result = self.pipeline.run({"document_writer": {"documents": [document]}})
        return result

Error

Server error
Pipeline execution failed: The following component failed to run:
Component name: 'document_writer'
Component type: 'DocumentWriter'
Error: Could not write documents to PgvectorDocumentStore. 
You can find the SQL query and the parameters in the debug logs.

@mpangrazzi WDYT? Would it be possible/appropriate to show the entire stacktrace in Hayhooks?

@mpangrazzi
Contributor

@anakin87 @superkelvint yes, it's definitely possible to show full stack traces in Hayhooks! You simply need to set the HAYHOOKS_SHOW_TRACEBACKS env variable to a truthy value (e.g. 1 or true) and (re)launch the server. That variable is documented in the Configuration section.

@superkelvint
Contributor Author

Thanks for the feedback! I understand the concern about overcrowding error messages.

That said, I'd like to emphasize how including the SQL exception (even in a truncated form) could actually improve the developer experience. In environments like Hayhooks, where the full stack trace is hidden by default unless HAYHOOKS_SHOW_TRACEBACKS is set, the exception is unhelpful and the user is left with a generic error like:

Could not write documents to PgvectorDocumentStore.

Even outside of Hayhooks, just using plain Haystack, the raised exception adds an extra layer that can get in the way. It hides the real error behind a generic message, which makes debugging slower and less streamlined.

Would you be open to conditionally including the SQL error? For example:

raise DocumentStoreError(f"{error_msg}: {type(e).__name__}: {e}") from e

or even

raise DocumentStoreError(f"{error_msg}: {str(e)[:200]}") from e

This would keep the message compact while still providing valuable context that can immediately point to the issue. It helps streamline the debugging process by providing developers with actionable information right at the point of failure.
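
To make the trade-off concrete, here is a minimal sketch that combines both suggestions: it prefixes the exception type and caps the appended detail at a fixed length. The helper name `format_store_error` and its `max_len` parameter are hypothetical and only for illustration; they are not part of the integration.

```python
def format_store_error(error_msg: str, exc: Exception, max_len: int = 200) -> str:
    """Hypothetical helper: append a compact description of exc to error_msg."""
    detail = f"{type(exc).__name__}: {exc}"
    if len(detail) > max_len:
        # Truncate long SQL errors so the message stays readable.
        detail = detail[:max_len] + "..."
    return f"{error_msg}: {detail}"


base = "Could not write documents to PgvectorDocumentStore"

# A short error is included in full.
short = format_store_error(base, ValueError("expected 768 dimensions, not 384"))
print(short)

# A very long error is cut off after max_len characters.
long_msg = format_store_error(base, ValueError("x" * 500))
print(len(long_msg))
```

With a cap like this, even a verbose database error adds at most a couple of lines to the message.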

Let me know if you'd prefer this be gated behind a flag or if there's another compromise you'd suggest—happy to adapt!

@CLAassistant

CLAassistant commented May 6, 2025

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added the type:documentation Improvements or additions to documentation label May 6, 2025
@anakin87 anakin87 changed the title from "pgvector: Include the actual SQL exception when reporting the error." to "feat: pgvector - make error messages more informative" May 6, 2025
@anakin87
Member

anakin87 commented May 6, 2025

Hey @superkelvint, I understand your point.
I'm OK with the idea and I pushed some small improvements.

Could you please sign the CLA?

@superkelvint
Contributor Author

Thanks @anakin87! I have just signed the CLA.

@anakin87 anakin87 reopened this May 6, 2025
@anakin87 anakin87 merged commit c82f00f into deepset-ai:main May 6, 2025
11 checks passed
@anakin87
Member

anakin87 commented May 6, 2025

the improvement is available in the new package release: https://pypi.org/project/pgvector-haystack/3.3.0/

Labels
integration:pgvector, type:documentation