Tutorials

These hands-on Jupyter notebooks walk through the main checkllm workflows. Each notebook runs end-to-end without a real API key: judged metrics are powered by an in-process FakeJudge. A clearly marked cell in every notebook shows how to swap in a real provider (OpenAIJudge, AnthropicJudge, etc.) once you have credentials.

All notebooks live in docs/notebooks/ with their outputs cleared so reviews stay clean.

Notebooks

| #  | Notebook            | What you'll learn |
|----|---------------------|-------------------|
| 01 | Quickstart          | Install checkllm, run your first deterministic check, call a judge offline, interpret results. |
| 02 | RAG evaluation      | Faithfulness, context precision/recall, NDCG, and MRR on a toy retrieval pipeline. |
| 03 | Conversational eval | Multi-turn chat evaluation with ConversationalTestCase and per-turn scoring. |
| 04 | Agent trajectory    | Tool-use validation, step efficiency, loop detection, and final-answer judging. |
| 05 | Red teaming         | Attack generation, OWASP scorecard, vulnerability rollups. |
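
For readers who have not met the ranking metrics named in notebook 02, here is a minimal plain-Python sketch of MRR and NDCG. It does not use checkllm, and the function names are illustrative assumptions rather than the library's API. It assumes each query is a list of relevance grades in ranked order, uses the linear-gain form of DCG, and computes the ideal ranking from the retrieved list only.

```python
import math


def reciprocal_rank(relevances):
    """Return 1/rank of the first relevant item, or 0.0 if none is relevant."""
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries):
    """Average the reciprocal rank over a non-empty list of queries."""
    return sum(reciprocal_rank(q) for q in queries) / len(queries)


def dcg(relevances, k=None):
    """Discounted cumulative gain with linear gain: sum of rel_i / log2(i + 1)."""
    rels = relevances[:k] if k is not None else relevances
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(rels, start=1))


def ndcg(relevances, k=None):
    """DCG normalized by the DCG of the same grades sorted best-first."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Two toy queries: the first hit is at rank 2, then at rank 1.
print(mean_reciprocal_rank([[0, 1, 0], [1, 0, 0]]))  # 0.75
print(ndcg([1, 0, 0]))  # 1.0, the only relevant item is already ranked first
```

MRR only looks at the first relevant result, while NDCG rewards placing every relevant result early, so the two can disagree on the same ranking.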

Running locally

pip install checkllm jupyterlab
jupyter lab docs/notebooks/

Every notebook starts from a stubbed judge so the first run requires no secrets. Once you're comfortable, replace FakeJudge with a real backend from checkllm.judge and re-run.
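
The stub-then-swap pattern described above can be sketched in plain Python. Everything in this block (`Verdict`, `Judge`, `StubJudge`, `evaluate`) is a hypothetical name invented for illustration, not checkllm's FakeJudge or its real interface. The idea is only that evaluation code depends on a small judge interface, so an offline stand-in and a credentialed backend are interchangeable.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Verdict:
    """Hypothetical result type: a score in [0, 1] plus a short rationale."""
    score: float
    reason: str


class Judge(Protocol):
    """Hypothetical interface that any judge backend would satisfy."""
    def judge(self, prompt: str, output: str) -> Verdict: ...


class StubJudge:
    """Deterministic offline stand-in: no network calls, no credentials."""

    def judge(self, prompt: str, output: str) -> Verdict:
        passed = bool(output.strip())
        reason = "non-empty output" if passed else "empty output"
        return Verdict(score=1.0 if passed else 0.0, reason=reason)


def evaluate(judge: Judge, cases):
    """Score (prompt, output) pairs with whichever judge is passed in."""
    return [judge.judge(prompt, output) for prompt, output in cases]


# Swapping backends means changing only the object passed to evaluate().
results = evaluate(StubJudge(), [("What is 2 + 2?", "4"), ("Name a color.", "")])
print([r.score for r in results])  # [1.0, 0.0]
```

Because `evaluate` never names a concrete class, moving to a real provider would change one constructor call and leave the rest of the cell untouched.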