Deployment Guide¶
This guide covers deploying checkllm in production environments, including Docker containers, multi-GPU judge scaling, and operational hardening.
Prerequisites¶
- Python 3.10+ installed
- At least one API key (OpenAI, Anthropic, or Gemini) or a local model server (Ollama, vLLM)
- Docker 24.0+ (for containerised deployments)
- 2 GB RAM minimum; 8 GB+ recommended when running local judge models
Docker: Single Container¶
Dockerfile¶
Create a Dockerfile in your project root:
FROM python:3.12-slim
WORKDIR /app
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential curl \
&& rm -rf /var/lib/apt/lists/*
COPY pyproject.toml requirements.lock ./
RUN pip install --no-cache-dir -r requirements.lock && \
pip install --no-cache-dir "checkllm[all]"
COPY . .
CMD ["pytest", "tests/", "-v"]
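Because `COPY . .` pulls in the entire build context, a `.dockerignore` keeps VCS history, virtualenvs, and local secrets out of the image. A minimal sketch (adjust entries to your project layout):

```
.git
.venv
__pycache__/
*.pyc
.pytest_cache/
.env
```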
Build and run¶
docker build -t myapp-llm-tests .
docker run \
-e OPENAI_API_KEY="$OPENAI_API_KEY" \
-e CHECKLLM_BUDGET=10.0 \
myapp-llm-tests
Environment variables¶
| Variable | Required | Default | Description |
|---|---|---|---|
| `OPENAI_API_KEY` | one required | — | OpenAI judge key |
| `ANTHROPIC_API_KEY` | one required | — | Anthropic judge key |
| `GEMINI_API_KEY` | one required | — | Gemini judge key |
| `CHECKLLM_BUDGET` | No | unlimited | Max spend per run (USD) |
| `CHECKLLM_JUDGE_MODEL` | No | `gpt-4o` | Default judge model |
| `CHECKLLM_CACHE_ENABLED` | No | `true` | Cache judge responses |
| `CHECKLLM_LOG_LEVEL` | No | `WARNING` | Log verbosity |
| `CHECKLLM_PROFILE` | No | — | Activate a named profile |
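Since exactly one judge key (or a local server URL) is required, it pays to fail fast with a clear message rather than partway through a paid run. A small preflight helper you could drop into `conftest.py` — the function itself is ours, not part of checkllm:

```python
import os

# Any one of these unlocks a hosted judge backend.
JUDGE_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY")

def preflight() -> dict:
    """Validate judge configuration from the environment before tests run."""
    keys_present = [k for k in JUDGE_KEYS if os.environ.get(k)]
    base_url = os.environ.get("OLLAMA_BASE_URL")
    if not keys_present and not base_url:
        raise RuntimeError(
            "No judge configured: set one of "
            + ", ".join(JUDGE_KEYS)
            + " or OLLAMA_BASE_URL"
        )
    budget = os.environ.get("CHECKLLM_BUDGET")
    return {
        "keys": keys_present,
        "budget": float(budget) if budget else None,  # None means unlimited
    }
```

Calling this from a session-scoped fixture surfaces misconfiguration in one line of output instead of a wall of per-test API errors.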
Docker Compose: Local Dev with Ollama¶
Run checkllm alongside a local Ollama instance at zero API cost.
docker-compose.yml¶
services:
ollama:
image: ollama/ollama:latest
ports:
- "11434:11434"
volumes:
- ollama_data:/root/.ollama
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
interval: 10s
timeout: 5s
retries: 5
checkllm:
build: .
depends_on:
ollama:
condition: service_healthy
environment:
CHECKLLM_JUDGE_MODEL: "ollama/llama3.2"
OLLAMA_BASE_URL: "http://ollama:11434"
command: >
sh -c "
curl -s http://ollama:11434/api/pull -d '{"model":"llama3.2"}' &&
pytest tests/ -v
"
volumes:
ollama_data:
Run¶
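Assuming the compose file above, a single command builds the test image, waits for Ollama's healthcheck, and runs the suite; `--exit-code-from` propagates the pytest exit status, which is what CI needs:

```shell
docker compose up --build --exit-code-from checkllm
```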
Multi-GPU Judge Scaling with vLLM¶
For high-throughput pipelines, run a vLLM server as a local OpenAI-compatible judge backend across multiple GPUs.
Start the vLLM server¶
docker run --gpus all \
-p 8000:8000 \
vllm/vllm-openai:latest \
--model meta-llama/Llama-3.1-70B-Instruct \
--tensor-parallel-size 4 \
--max-model-len 8192
Set --tensor-parallel-size to the number of available GPUs; an inline comment after a line-continuation backslash would break the command.
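Before pointing checkllm at the server, a quick smoke test against the OpenAI-compatible endpoint confirms the model finished loading (a 70B model can take several minutes):

```shell
curl -s http://localhost:8000/v1/models
```

The response should list `meta-llama/Llama-3.1-70B-Instruct`; an empty or refused connection means the server is still initializing.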
Configure checkllm for vLLM¶
# pyproject.toml
[tool.checkllm]
judge_model = "meta-llama/Llama-3.1-70B-Instruct"
[tool.checkllm.judge_config]
base_url = "http://vllm-server:8000/v1"
api_key = "not-required"
max_concurrency = 50
Or in code:
from checkllm import OpenAICompatibleJudge
judge = OpenAICompatibleJudge(
model="meta-llama/Llama-3.1-70B-Instruct",
base_url="http://vllm-server:8000/v1",
api_key="not-required",
)
Concurrency tuning¶
| GPU memory | Model size | Recommended max_concurrency |
|---|---|---|
| 24 GB (RTX 4090) | 7 B | 32 |
| 2x40 GB (A100) | 13 B | 48 |
| 4x80 GB (H100) | 70 B | 64 |
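These values are tuning starting points, not hard limits. As a rough rule of thumb, a hypothetical helper encoding the table above by total GPU memory:

```python
def suggest_max_concurrency(total_gpu_gb: float) -> int:
    """Return a starting max_concurrency for a given total GPU memory.

    Thresholds mirror the table above; treat the result as a baseline
    to tune, not a guarantee.
    """
    if total_gpu_gb >= 320:   # e.g. 4x80 GB H100
        return 64
    if total_gpu_gb >= 80:    # e.g. 2x40 GB A100
        return 48
    if total_gpu_gb >= 24:    # e.g. single RTX 4090
        return 32
    return 8                  # conservative floor below the tabulated range
```

Watch queue depth and time-to-first-token on the vLLM server while raising the value; back off when latency climbs faster than throughput.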
Production Hardening Checklist¶
- [ ] Set `CHECKLLM_BUDGET` to prevent runaway costs
- [ ] Enable caching (`CHECKLLM_CACHE_ENABLED=true`) to deduplicate evaluations
- [ ] Pin the judge model version: `judge_model = "gpt-4o-2024-11-20"` (not `gpt-4o`)
- [ ] Tune `max_concurrency` to stay within API rate limits
- [ ] Store API keys in a secrets manager (AWS Secrets Manager, HashiCorp Vault)
- [ ] Add `--fail-on-regression` in CI to block score drops
- [ ] Run `checkllm estimate tests/` before any new CI job to sanity-check spend
- [ ] Use `requirements.lock` for reproducible builds (see Lockfile docs)
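The budget cap, pinned judge, cost estimate, and regression gate from this checklist combine naturally in CI. A GitHub Actions sketch — step layout and the exact command the `--fail-on-regression` flag attaches to are assumptions; only the env vars and the `checkllm estimate` invocation come from this guide:

```yaml
# .github/workflows/llm-tests.yml (sketch)
name: llm-tests
on: [pull_request]
jobs:
  evaluate:
    runs-on: ubuntu-latest
    env:
      OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}  # sourced from a secrets store
      CHECKLLM_BUDGET: "10.0"                        # hard spend cap per run
      CHECKLLM_JUDGE_MODEL: "gpt-4o-2024-11-20"      # pinned judge version
    steps:
      - uses: actions/checkout@v4
      - run: pip install -r requirements.lock "checkllm[all]"
      - run: checkllm estimate tests/                # sanity-check spend first
      - run: pytest tests/ -v --fail-on-regression   # block score drops
```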