feat: llm e2e tests #48


Merged
merged 1 commit into from
Apr 11, 2025

Conversation

@gregnr (Collaborator) commented on Apr 11, 2025

Adds a new test suite for end-to-end, LLM-based tests. It simulates real-world MCP usage by:

  1. mocking orgs, projects, and databases via the mock management API (as we already do for unit tests)
  2. using ai-sdk to orchestrate real text generation via an LLM
  3. connecting to the MCP server via ai-sdk's MCP client
  4. testing typical chat questions and prompts users send when interacting with the Supabase MCP server
  5. evaluating which tools were called as a result of the user prompts
  6. evaluating the text response using LLM-as-a-judge concepts
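Step 5 boils down to inspecting the tool calls recorded on the generation result. A minimal sketch, where `GenerationResult` is a simplified stand-in for the richer result object ai-sdk's `generateText` actually returns:

```typescript
// Simplified stand-in for the shape of an ai-sdk generation result;
// the real type carries more fields, but toolCalls is what step 5 inspects.
type ToolCall = { toolName: string };
type GenerationResult = { text: string; toolCalls: ToolCall[] };

// Collect the distinct tool names invoked during a generation so a test
// can assert that the expected MCP tools were called for a given prompt.
function calledTools(result: GenerationResult): string[] {
  return [...new Set(result.toolCalls.map((call) => call.toolName))];
}

// Hypothetical example: a prompt about a project's tables might
// trigger these MCP tool calls (tool names are illustrative).
const result: GenerationResult = {
  text: 'The "todos-app" project has one table: "todos".',
  toolCalls: [{ toolName: 'list_projects' }, { toolName: 'list_tables' }],
};

const tools = calledTools(result);
```

In the real suite the result would come from `generateText` with the MCP client's tools attached, and the assertion would compare `tools` against the set expected for that prompt.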

To facilitate LLM-as-a-judge evaluation, we create a custom vitest matcher called toMatchCriteria() that accepts a natural-language criteria string and asks an LLM to evaluate the generated output against it. For example:

await expect(text).toMatchCriteria(
  'Describes a single table in the "todos-app" project called "todos"'
);
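Under the hood, a matcher like this would be registered with vitest's `expect.extend`. A minimal sketch of the matcher core, with a synchronous keyword check standing in for the real LLM judge (the actual judge calls a model asynchronously; its implementation is not shown here):

```typescript
// The judge decides whether output satisfies a natural-language criteria
// string. In the real suite this is an LLM call; a keyword check stands
// in here so the sketch stays self-contained.
type Judge = (output: string, criteria: string) => boolean;

// Shape vitest expects a custom matcher to return.
type MatcherResult = { pass: boolean; message: () => string };

// Core of a toMatchCriteria-style matcher. In vitest it would be wired
// up via expect.extend({ toMatchCriteria(received, criteria) { ... } }).
function toMatchCriteria(
  received: string,
  criteria: string,
  judge: Judge
): MatcherResult {
  const pass = judge(received, criteria);
  return {
    pass,
    message: () =>
      pass
        ? `expected output not to match criteria: ${criteria}`
        : `expected output to match criteria: ${criteria}`,
  };
}

// Stand-in judge: passes if every quoted term in the criteria appears
// verbatim in the output (a real judge would ask an LLM instead).
const keywordJudge: Judge = (output, criteria) => {
  const quoted = criteria.match(/"([^"]+)"/g) ?? [];
  return quoted.every((term) => output.includes(term));
};

const verdict = toMatchCriteria(
  'The "todos-app" project contains a single table named "todos".',
  'Describes a single table in the "todos-app" project called "todos"',
  keywordJudge
);
```

Making the judge injectable keeps the matcher itself deterministic and unit-testable, while the real suite plugs in the LLM-backed judge.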

@gregnr merged commit e4c6f7a into main on Apr 11, 2025
1 check passed