Prompts

Every prompt the eval sends to a model, shown rendered with a real example so the templated parts are visible. Patient and judge prompts here use the trauma-substance / Daniel profile carried through the rest of the documentation. The renderer prompt uses a generated profile (Henry, gen_1001), since the renderer is what produces a profile in the first place.

1 · Patient 2 · Clinician 3 · Judge 4 · Renderer

01 Patient prompt

role

system message for the patient model in every conversation

model

GPT-4o, temperature 0.7

templated in

profile name, age, presenting concern, presenting domain label, each hidden domain (name + phenomenology), life context

followed by

user-message kicker: "You are now in the clinician's office. Begin the conversation."

patient_msgs = [{role: system, content: this prompt}, {role: user, content: "You are now in the clinician's office..."}]

rendered for: Daniel, 38, p04 trauma_substance_lead_drinking

02 Clinician prompt — minimal condition

role

system message for the clinician model under test

model

DeepSeek-V3, Llama-3.3-70B, Llama-3.1-8B, Qwen 3 8B, Kimi K2.6 — temperature 0.7 (output cap 350 tokens for the three standard chat models; 2500 tokens for the reasoning models, Qwen 3 8B and Kimi K2.6)

templated in

nothing — this is the entire prompt, applied verbatim across every clinician model

design intent

deliberately minimal. The eval measures whether breadth emerges without instruction. A "structured intake" version exists as a planned ceiling condition but has not been run.

verbatim — applied to all five clinician models

03 Judge prompt

role

system message for each judge model when scoring a transcript

models

Gemini 3 Flash Preview (primary), Claude Sonnet 4.6 (cross-check) — temperature 0

templated in

each hidden domain's name, type, phenomenology, and DSM-5 / operational symptom list (the answer key); presenting domain label

user message

the rendered transcript ("[Patient — opening]", "[Clinician — turn 1]", ...) is sent as the user message after this system prompt

output

JSON array, one entry per clinician turn — question_type, asked_about/disclosed per domain, patient_faithful, brief reasoning

judge_msgs = [{role: system, content: this prompt}, {role: user, content: "TRANSCRIPT:\n\n" + rendered}]

rendered for: Daniel, p04 — answer key has 4 hidden domains

04 Profile renderer prompt

role

single-turn user prompt for the profile generator

model

Qwen 3 235B (default) — temperature 0.7

templated in

demographics, life context seeds, presenting domain block (name + symptoms + severity), each hidden domain block, phenotype/tier, JSON schema with profile_id and demographics already filled

design intent

explicit verbatim-copy instructions on the symptom list and domain name; the renderer handles composition and lived-experience prose, with clinical content decided upstream

render_msgs = [{role: user, content: this prompt}] — single-turn, no system message

rendered for: Henry, gen_1001_trauma_substance_ptsd (PTSD presenting + 5 hidden)