LLM features (summarize, draft assist)
What’s in scope
Section titled “What’s in scope”This page covers how to configure the LLM backend that powers every synthesis surface in mxr. The features themselves each have their own guide:
| Feature | Guide | What the LLM does |
|---|---|---|
mxr summarize | this page | thread → Markdown summary |
mxr draft-assist | this page | thread + instruction → draft body |
mxr send --check answer coverage | Pre-send safety | extract asks from thread, judge whether the draft addresses each |
mxr send --check commitment candidates | Forgotten work | extract “I’ll send the deck Friday” promises from drafts |
mxr ask | Archive intelligence | retrieval-grounded answer over local mail, every claim cited |
mxr decisions rebuild | Archive intelligence | extract explicit decisions from threads |
mxr briefing thread / recipient | Briefings and loop-in | dormant-thread / long-gap recap from the local thread transcript or relationship baseline |
| delivery extraction | Deliveries | confirm a shortlisted email is a real shipment, extract merchant / carrier / items / ETA |
Every feature has an explicit disabled path when [llm] enabled = false.
Pure LLM commands such as mxr summarize and mxr draft-assist return
LLM is disabled; features with deterministic substrate, such as briefings
or safety checks, return the local fallback instead of pretending synthesis
succeeded.
Enable LLM features by setting [llm] enabled = true in your config and
pointing at any backend that speaks the OpenAI Chat Completions
schema.
Backends supported
Section titled “Backends supported”Because the wire format is OpenAI-compatible, the same client covers every major option:
| Backend | base_url | api_key_env | Notes |
|---|---|---|---|
| Ollama (local) | http://localhost:11434/v1 | (empty) | No auth header. Default in mxr’s config. |
| LM Studio (local) | http://localhost:1234/v1 | (empty) | No auth header. |
| OpenAI | https://api.openai.com/v1 | OPENAI_API_KEY | |
| Groq | https://api.groq.com/openai/v1 | GROQ_API_KEY | Very fast, tight context windows. |
| OpenRouter | https://openrouter.ai/api/v1 | OPENROUTER_API_KEY | Single key, many models. |
| Together AI | https://api.together.xyz/v1 | TOGETHER_API_KEY | |
| Mistral La Plateforme | https://api.mistral.ai/v1 | MISTRAL_API_KEY | |
| Anthropic via OpenAI-compatible proxy | depends | depends |
mxr’s local-first stance: the recommended config uses Ollama or LM Studio so completions never leave your machine. Cloud endpoints are opt-in via the same single config block.
Configuration
Section titled “Configuration”In the file printed by mxr config path:
[llm]enabled = true
# Ollama (recommended local default):base_url = "http://localhost:11434/v1"model = "qwen2.5:3b-instruct"api_key_env = ""
# Common alternatives — uncomment and adjust:# base_url = "http://localhost:1234/v1" # LM Studio# base_url = "https://api.openai.com/v1" # OpenAI# base_url = "https://api.groq.com/openai/v1" # Groq
context_window = 8192request_timeout_secs = 120The API key is read from the env var named in api_key_env at runtime
(empty = no Authorization header sent). Keeping the secret out of
the config file is intentional — the config is checked into dotfiles;
the env var lives in your shell init.
Check what the running daemon is using:
mxr llm statusmxr llm status --format jsonConfig reloads rebuild the runtime provider, so changing [llm] and
reloading the daemon account/config runtime switches the model without
restarting the process.
Recommended local models
Section titled “Recommended local models”For Ollama (ollama pull <model>):
qwen2.5:3b-instruct— ~2GB, very fast on a laptop, good summary quality. Default in mxr’s example config.qwen2.5:7b-instruct— ~4.4GB, noticeably better at draft generation for longer threads.llama3.2:3b— comparable to Qwen 3B, slightly different tone.llama3.1:8b— larger but stronger. Good if you have the RAM.
For LM Studio: any GGUF model loaded via the LM Studio UI works. Use
the model identifier shown in LM Studio’s “Local Server” tab as model
in the config.
# Summarize a long thread:mxr summarize THREAD_ID
# Generate a reply draft:mxr draft-assist THREAD_ID "decline politely, suggest next month"mxr draft-assist THREAD_ID "ack and ask for the deadline"mxr draft-assist writes the body to stdout. Pipe it into your editor:
mxr draft-assist THREAD_ID "decline politely, suggest next month" \ | $EDITOR -Or use --format json for structured output:
mxr summarize THREAD_ID --format jsonmxr draft-assist THREAD_ID "..." --format jsonDraft JSON includes the generated body, model id, humanizer score summary, voice-match metadata when a relationship profile exists, and rewrite iteration count.
In the TUI, press y or Ctrl-p → Summarize Thread. The summary runs in
the background and renders above the message body; opening a long uncached
thread can also start a debounced background summary. In the web reader, the
Summary button or y calls the same daemon request and renders the result
in the AI overview collapsible above the thread.
What the prompts look like
Section titled “What the prompts look like”Both features use a tuned system prompt followed by the thread context. The summarizer asks for concise Markdown that names who said what, preserves concrete dates/deadlines/asks, and ends with next steps. The draft assistant asks for just the reply body, no greeting line if the thread is mid-conversation, no signature, plain prose, matching the formality and length of the thread.
When semantic search is enabled and indexed, draft assist first looks for similar prior outbound messages, filters out inbound mail and the current thread, and includes up to three examples as voice grounding. If semantic search is disabled or unavailable, draft assist still works with only the current thread and instruction.
When relationship data exists for a contact, mxr injects it as weak background guidance. The current thread and your explicit instruction override it, and the prompt tells the model not to invent familiarity outside stored known topics, commitments, or summaries.
Relationship/profile context is guarded separately for cloud providers.
Keep llm.allow_cloud_relationship_data = false to block that context from
non-local endpoints; set it to true only when you want relationship-aware
summaries, briefings, or drafts to use a cloud LLM.
Every generated draft also runs through a deterministic local humanizer detector. It flags common AI-writing patterns such as stock vocabulary, em-dash overuse, sycophantic openers, filler phrases, and rule-of-three formatting. Detection does not require an LLM.
You’ll get the best results with models that follow instructions well and stay close to the source — Qwen 2.5 instruct, Llama 3 instruct, and the GPT-5 family all do this reliably.
Limits and what’s deferred
Section titled “Limits and what’s deferred”- Single-shot completions — no streaming yet. The features are short-form (≤2KB outputs); a single round-trip beats streaming for this use case.
- 24KB prompt budget — long threads truncate oldest-first.
- Semantic grounding is opportunistic — prior sent examples are included only when semantic search is ready and has indexed matching sent messages.
- Summary cache — unchanged threads reuse the cached summary. The cache hash includes weak relationship context, so changed relationship summaries or style data invalidate stale summaries. Opening a thread returns a valid cached summary with the thread payload when one exists.
- Humanizer auto-rewrite is not the core contract — deterministic scoring is available locally; automatic rewrite loops are a separate pipeline layer.
Disabling
Section titled “Disabling”Set [llm] enabled = false in your config (or remove the section
entirely). Pure LLM commands return LLM is disabled; mixed features use
their deterministic fallback when one exists.
Demo mode: canned offline responses
Section titled “Demo mode: canned offline responses”When mxr demo is active, every LLM-backed feature is answered by an
in-process canned provider instead of the real backend. The provider
inspects each request’s system prompt to classify it (summarize, briefing,
draft-assist, ask, voice, commitments, decisions, …) and returns a realistic
template — so recordings of the demo show real-looking output without
spending tokens or needing an OPENAI_API_KEY.
The swap happens inside build_llm_provider based on MXR_INSTANCE == mxr-demo. It supersedes whatever [llm] is configured for your real
profile, so even if you have a paid OpenAI key wired up, mxr demo will
never call it. Exit demo mode with mxr demo stop to return to your
configured backend.
In real life
Section titled “In real life”- Catching up after vacation:
mxr search 'is:unread newer_than:7d' --format ids | xargs -n1 mxr summarize | less— turn 200 unread threads into 200 short summaries. - Replying to legalese:
mxr summarize THREAD_IDfirst, thenmxr draft-assist THREAD_ID "ack the request, ask for a 2-week extension"to generate a draft you can polish. - Triage rule of thumb: if a thread has 4+ messages and you’re about to reply, summarise it first. The cost is 2 seconds with local Ollama; the saving is reading the whole chain again.
Agent prompts that work
Section titled “Agent prompts that work”"Summarise every thread in my reply-later queue with more than 3messages. Use `mxr replies --format jsonl | jq -r .id | xargs -I{}mxr summarize {}`. Group by sender so I can batch responses.""Draft a polite decline to the latest message from acme@example.com.Use `mxr search 'from:acme@example.com' --format ids | head -1` toget the thread id, then `mxr draft-assist`. Show me the draft —don't send."See also
Section titled “See also”- Pre-send safety — the safety pipeline’s LLM-backed answer-coverage check
- Forgotten work — LLM-confirmed commitment extraction from drafts
- Archive intelligence —
mxr askand the decision log, citations required - Briefings and loop-in — dormant-thread briefings, deterministic expert lookup, and whois
- Recipes — talking to your agent
- For agents
- Config —
[llm] - CLI — LLM features