feat: llm e2e tests #48


Merged
merged 1 commit into from
Apr 11, 2025

Conversation

@gregnr (Collaborator) commented on Apr 11, 2025

Adds a new test suite for end-to-end, LLM-based tests. It simulates real-world MCP usage by:

  1. mocking orgs, projects, and databases via the mock management API (as we already do for unit tests)
  2. using ai-sdk to orchestrate real text generation via an LLM
  3. connecting to the MCP server via ai-sdk's MCP client
  4. testing typical chat questions and prompts users send when interacting with the Supabase MCP server
  5. evaluating which tools were called as a result of the user prompts
  6. evaluating the text response using LLM-as-a-judge concepts
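Step 5 boils down to inspecting the tool calls recorded on the generation result. A minimal sketch, where `GenerationResult` is a simplified stand-in for the richer result object ai-sdk's `generateText` actually returns:

```typescript
// Simplified stand-in for the shape of an ai-sdk generation result;
// the real type carries more fields, but toolCalls is what step 5 inspects.
type ToolCall = { toolName: string };
type GenerationResult = { text: string; toolCalls: ToolCall[] };

// Collect the distinct tool names invoked during a generation so a test
// can assert that the expected MCP tools were called for a given prompt.
function calledTools(result: GenerationResult): string[] {
  return [...new Set(result.toolCalls.map((call) => call.toolName))];
}

// Hypothetical example: a prompt about a project's tables might
// trigger these MCP tool calls (tool names are illustrative).
const result: GenerationResult = {
  text: 'The "todos-app" project has one table: "todos".',
  toolCalls: [{ toolName: 'list_projects' }, { toolName: 'list_tables' }],
};

const tools = calledTools(result);
```

In the real suite the result would come from `generateText` with the MCP client's tools attached, and the assertion would compare `tools` against the set expected for that prompt.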

To facilitate LLM-as-a-judge evaluation, we create a custom vitest matcher called toMatchCriteria() that accepts a natural-language criteria string and asks an LLM to evaluate the generated output against it. For example:

await expect(text).toMatchCriteria(
  'Describes a single table in the "todos-app" project called "todos"'
);
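Under the hood, a matcher like this would be registered with vitest's `expect.extend`. A minimal sketch of the matcher core, with a synchronous keyword check standing in for the real LLM judge (the actual judge calls a model asynchronously; its implementation is not shown here):

```typescript
// The judge decides whether output satisfies a natural-language criteria
// string. In the real suite this is an LLM call; a keyword check stands
// in here so the sketch stays self-contained.
type Judge = (output: string, criteria: string) => boolean;

// Shape vitest expects a custom matcher to return.
type MatcherResult = { pass: boolean; message: () => string };

// Core of a toMatchCriteria-style matcher. In vitest it would be wired
// up via expect.extend({ toMatchCriteria(received, criteria) { ... } }).
function toMatchCriteria(
  received: string,
  criteria: string,
  judge: Judge
): MatcherResult {
  const pass = judge(received, criteria);
  return {
    pass,
    message: () =>
      pass
        ? `expected output not to match criteria: ${criteria}`
        : `expected output to match criteria: ${criteria}`,
  };
}

// Stand-in judge: passes if every quoted term in the criteria appears
// verbatim in the output (a real judge would ask an LLM instead).
const keywordJudge: Judge = (output, criteria) => {
  const quoted = criteria.match(/"([^"]+)"/g) ?? [];
  return quoted.every((term) => output.includes(term));
};

const verdict = toMatchCriteria(
  'The "todos-app" project contains a single table named "todos".',
  'Describes a single table in the "todos-app" project called "todos"',
  keywordJudge
);
```

Making the judge injectable keeps the matcher itself deterministic and unit-testable, while the real suite plugs in the LLM-backed judge.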

@gregnr merged commit e4c6f7a into main on Apr 11, 2025
1 check passed