Prompts

Every prompt the eval sends to a model, shown rendered with a real example so the templated parts are visible. Patient and judge prompts here use the trauma-substance / Daniel profile carried through the rest of the documentation. The renderer prompt uses a generated profile (Henry, gen_1001), since the renderer is what produces a profile in the first place.

01 Patient prompt

role
system message for the patient model in every conversation
model
GPT-4o, temperature 0.7
templated in
profile name, age, presenting concern, presenting domain label, each hidden domain (name + phenomenology), life context
followed by
user-message kicker: "You are now in the clinician's office. Begin the conversation."
patient_msgs = [{role: system, content: this prompt}, {role: user, content: "You are now in the clinician's office..."}]
rendered for: Daniel, 38, p04 trauma_substance_lead_drinking

  

02 Clinician prompt — minimal condition

role
system message for the clinician model under test
model
DeepSeek-V3, Llama-3.3-70B, Llama-3.1-8B, Qwen 3 8B, Kimi K2.6 — temperature 0.7 (output cap 350 tokens for the three standard chat models; 2500 tokens for the reasoning models, Qwen 3 8B and Kimi K2.6)
templated in
nothing — this is the entire prompt, applied verbatim across every clinician model
design intent
deliberately minimal. The eval measures whether breadth emerges without instruction. A "structured intake" version exists as a planned ceiling condition but has not been run.
verbatim — applied to all five clinician models

  

03 Judge prompt

role
system message for each judge model when scoring a transcript
models
Gemini 3 Flash Preview (primary), Claude Sonnet 4.6 (cross-check) — temperature 0
templated in
each hidden domain's name, type, phenomenology, and DSM-5 / operational symptom list (the answer key); presenting domain label
user message
the rendered transcript ("[Patient — opening]", "[Clinician — turn 1]", ...) is sent as the user message after this system prompt
output
JSON array, one entry per clinician turn — question_type, asked_about/disclosed per domain, patient_faithful, brief reasoning
judge_msgs = [{role: system, content: this prompt}, {role: user, content: "TRANSCRIPT:\n\n" + rendered}]
rendered for: Daniel, p04 — answer key has 4 hidden domains

  

04 Profile renderer prompt

role
single-turn user prompt for the profile generator
model
Qwen 3 235B (default) — temperature 0.7
templated in
demographics, life context seeds, presenting domain block (name + symptoms + severity), each hidden domain block, phenotype/tier, JSON schema with profile_id and demographics already filled
design intent
explicit verbatim-copy instructions on the symptom list and domain name; the renderer handles composition and lived-experience prose, with clinical content decided upstream
render_msgs = [{role: user, content: this prompt}] — single-turn, no system message
rendered for: Henry, gen_1001_trauma_substance_ptsd (PTSD presenting + 5 hidden)