EvalDog
Live at evaldog.com · early access

Ship LLM features that don't silently break.

EvalDog grades your prompt & RAG outputs against real assertions, scores every case, and barks the moment a model update breaks something. Hosted dashboard + a zero-token CLI for CI and AI agents.

Open the dashboard
$ npx evaldog run cases.csv
evaldog — terminal

$ npx evaldog run shopbot.csv --min 80

✓ Greeting & intent

✓ Product search

✓ Add to cart

✗ Order status contains "delivered"

✓ Refund & escalation

80% 4/5 passed (gate 80%) exit 1

Runs in CI · Built for AI agents · Zero LLM tokens · CSV / JSON / YAML · promptfoo-compatible

HOW IT WORKS

From test cases to a graded report in 60 seconds.

01

Upload your cases

Drop in a CSV, JSON, or YAML file of test cases: the output you already have, plus what to assert about it.
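Here is what a case file could look like, with a Python sketch that reads it. The column names (`name`, `output`, `assert`, `expected`) are hypothetical and chosen only for illustration. This page does not document EvalDog's actual case schema.

```python
import csv
import io

# Hypothetical column names, for illustration only: EvalDog's real
# case schema is not documented on this page.
SAMPLE_CSV = """name,output,assert,expected
Order status,Your order was delivered on Monday.,contains,delivered
Greeting,Hi! How can I help?,not-empty,
"""

def load_cases(text: str) -> list[dict[str, str]]:
    """Parse CSV text into one dict per case, keyed by column name."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

cases = load_cases(SAMPLE_CSV)
for case in cases:
    print(case["name"], "->", case["assert"], repr(case["expected"]))
# Order status -> contains 'delivered'
# Greeting -> not-empty ''
```

An output that contains commas or newlines needs standard CSV quoting, and `csv.DictReader` handles that quoting automatically.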

02

Get a graded report

Every case is checked against its assertions (contains, equals, regex, valid-JSON, not-empty) and scored pass/fail.
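The sketch below shows how each of the five check types can decide pass or fail on a single output. It is written in Python as an illustration, not EvalDog's engine. Some details are assumptions: `contains` is treated as a case-sensitive substring match, `equals` as an exact string match, and `regex` as a match anywhere in the output.

```python
import json
import re

def check(output: str, kind: str, expected: str = "") -> bool:
    """Return True if `output` passes one assertion.

    Illustrative only: the kind names come from the checks listed above,
    but these semantics are assumptions, not EvalDog's implementation.
    """
    if kind == "contains":      # assumed case-sensitive substring match
        return expected in output
    if kind == "equals":        # assumed exact match, no trimming
        return output == expected
    if kind == "regex":         # assumed match anywhere in the output
        return re.search(expected, output) is not None
    if kind == "valid-json":
        try:
            json.loads(output)
        except json.JSONDecodeError:
            return False
        return True
    if kind == "not-empty":     # whitespace-only output counts as empty
        return output.strip() != ""
    raise ValueError(f"unknown assertion kind: {kind}")

cases = [
    ("Your order was delivered.", "contains", "delivered"),
    ("Your order is on its way.", "contains", "delivered"),
    ('{"sku": "A1", "qty": 2}', "valid-json", ""),
    ("Order #1234", "regex", r"#\d+"),
    ("   ", "not-empty", ""),
]
results = [check(out, kind, exp) for out, kind, exp in cases]
print(results)  # [True, False, True, True, False]
```

Because no check calls a model, the same inputs always produce the same pass/fail results.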

03

Watch for drift

Re-run on every model update, and EvalDog flags the moment your score drops. Drift alerts are still rolling out.
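Drift alerts are still rolling out, so this page does not state the exact rule. The sketch below uses one plausible rule: compare a run's pass rate with the previous run's, flag a drop larger than a tolerance, and list the cases that went from pass to fail. The `tolerance` parameter and both helper functions are hypothetical.

```python
def pass_rate(results: list[bool]) -> float:
    """Percentage of passing cases, 0.0 to 100.0; an empty run scores 0."""
    return 100.0 * sum(results) / len(results) if results else 0.0

def drifted(previous: list[bool], current: list[bool], tolerance: float = 0.0) -> bool:
    """Hypothetical rule: flag drift when the pass rate drops by more than `tolerance` points."""
    return pass_rate(current) < pass_rate(previous) - tolerance

def regressions(previous: list[bool], current: list[bool]) -> list[int]:
    """Indexes of cases that passed before and fail now."""
    return [i for i, (was, now) in enumerate(zip(previous, current)) if was and not now]

before = [True, True, True, True, True]   # 100% on the old model
after = [True, True, True, False, True]   # 80% after a model update
print(drifted(before, after))                  # True
print(drifted(before, after, tolerance=25.0))  # False
print(regressions(before, after))              # [3]
```

Listing the regressed cases tells you which behavior broke, not just that the score moved.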

FOR CI & AGENTS

One command. A score. An exit code.

The evaldog CLI grades locally with no model calls — so an agent can check 200 outputs with a single shell command instead of streaming every case through the LLM.

  • Deterministic — no tokens, no API keys
  • Exit 1 on regression — drop it straight into CI
  • --json output your agent can parse
  • Same engine as the hosted dashboard
Read the quick start
ci.yml

# fail the build if quality drops

$ npx evaldog run evals/*.csv --min 90 --json

✓ 47 passed

✗ 3 failed

94% 47/50 passed (gate 90% → exit 0)
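A `--min` gate reduces a run to a score and an exit code. The Python sketch below shows that reduction under one assumption: the run fails when its score falls below the gate, and a score exactly at the gate passes. This page does not say how EvalDog treats that boundary case. The function names are hypothetical.

```python
def score_percent(passed: int, total: int) -> float:
    """Share of passing cases as a percentage; an empty run scores 0."""
    return 100.0 * passed / total if total else 0.0

def exit_code(passed: int, total: int, min_score: float) -> int:
    """Return 0 if the run clears the gate, 1 otherwise.

    Assumption: a score equal to min_score passes.
    """
    return 0 if score_percent(passed, total) >= min_score else 1

# Hypothetical runs against a 90% gate.
print(exit_code(46, 50, 90))  # 0  (92% clears the gate)
print(exit_code(44, 50, 90))  # 1  (88% falls below it)
```

In a real wrapper script, the result would be passed to `sys.exit(...)`. CI runners treat any nonzero exit status as a failed step, so exit 1 is enough to block a merge.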

TRY IT NOW

A full ShopBot journey, pre-loaded.

Five ready-made evals — greeting, search, cart, order status, refund. One click each in the dashboard.

Open the dashboard
STEP 1

Greeting & Intent

STEP 2

Product Search

STEP 3

Add to Cart

STEP 4

Order Status

STEP 5

Refund & Escalation

Stop finding out from your users.

Grade your prompts before they ship. Free to try — no card, no setup.