Tutorials

These hands-on Jupyter notebooks walk through the main checkllm workflows. Each notebook runs end-to-end without a real API key, because judged metrics are powered by an in-process FakeJudge. A clearly marked cell in every notebook shows how to swap in a real provider (OpenAIJudge, AnthropicJudge, etc.) once you have credentials.

All notebooks live in docs/notebooks/ with their outputs cleared, so review diffs stay clean.

Notebooks

| # | Notebook | What you'll learn |
|----|----------|-------------------|
| 01 | Quickstart | Install checkllm, run your first deterministic check, call a judge offline, interpret results. |
| 02 | RAG evaluation | Faithfulness, context precision/recall, NDCG, and MRR on a toy retrieval pipeline. |
| 03 | Conversational eval | Multi-turn chat evaluation with `ConversationalTestCase` and per-turn scoring. |
| 04 | Agent trajectory | Tool-use validation, step efficiency, loop detection, and final-answer judging. |
| 05 | Red teaming | Attack generation, OWASP scorecard, vulnerability rollups. |
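
For readers who have not met the ranking metrics named for notebook 02, here is a rough, library-independent sketch of MRR and NDCG in plain Python. It does not use checkllm, and the function names are illustrative only; the notebook's own implementation may differ (for example, in how the ideal ranking for NDCG is defined). Here NDCG is normalised against the best possible ordering of the retrieved list itself.

```python
import math


def mrr(relevance_per_query):
    """Mean reciprocal rank.

    Each element is a list of 0/1 relevance flags in ranked order for one query.
    A query with no relevant result contributes 0.
    """
    total = 0.0
    for flags in relevance_per_query:
        for rank, relevant in enumerate(flags, start=1):
            if relevant:
                total += 1.0 / rank
                break
    return total / len(relevance_per_query)


def dcg(gains):
    """Discounted cumulative gain with the common log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG divided by the DCG of the same gains sorted best-first."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Two toy queries: first relevant hit at rank 2, then at rank 1.
    print(mrr([[0, 1, 0], [1, 0, 0]]))
    # Graded relevance already in ideal order.
    print(ndcg([3, 2, 1]))
```

Both functions take relevance judgments in ranked order, so they can be applied to the output of any retriever once each retrieved item has been labelled.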

Running locally

pip install checkllm jupyterlab
jupyter lab docs/notebooks/

Every notebook starts from a stubbed judge so the first run requires no secrets. Once you're comfortable, replace FakeJudge with a real backend from checkllm.judge and re-run.
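
The stub-then-swap pattern can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the checkllm API: `Verdict`, `Judge`, `StubJudge`, and `make_judge` are hypothetical names defined locally, and the real FakeJudge and provider judges may expose a different interface. Refer to the marked cell in each notebook for the actual swap.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Verdict:
    """Hypothetical result type: a score plus a short explanation."""
    score: float
    reason: str


class Judge(Protocol):
    """Hypothetical interface shared by the stub and any real backend."""
    def evaluate(self, prompt: str, output: str) -> Verdict: ...


class StubJudge:
    """Stand-in for an offline judge: returns a canned verdict, makes no API call."""

    def __init__(self, score: float = 1.0) -> None:
        self.score = score

    def evaluate(self, prompt: str, output: str) -> Verdict:
        return Verdict(self.score, "stubbed verdict, no API call made")


def make_judge(api_key: Optional[str] = None) -> Judge:
    """Return the stub when no key is given; a real backend would go in the other branch."""
    if api_key:
        # Assumption: this is where a real provider judge would be constructed.
        raise NotImplementedError("construct a real judge here")
    return StubJudge()


if __name__ == "__main__":
    judge = make_judge()  # no secrets needed on the first run
    print(judge.evaluate("What is 2 + 2?", "4"))
```

Keeping the choice of judge behind one factory function means the rest of a notebook does not change when the stub is replaced.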