Sources used to anchor the comorbidity probabilities, severity bands, demographic priors, and methodology behind the eval pipeline.
1 · Foundational classification
The DSM-5 is the source of truth for symptom lists, diagnostic categories, severity descriptors, and onset-age conventions throughout the pipeline.
DSM-5 manual
American Psychiatric Association. (2013). Diagnostic and Statistical Manual of Mental Disorders (5th ed.). American Psychiatric Publishing.
Used for: all diagnostic-category symptom lists, criterion thresholds (e.g., MDD requires ≥5 of 9 symptoms), prerequisites such as postpartum onset and late-life cognitive impairment, severity descriptors (mild/moderate/severe), and onset-age ranges. Specific behaviors and cognitions (NSSI, passive SI, intrusive thoughts) follow DSM-5 conditions-for-further-study definitions where applicable.
DSM-5-TR revision
American Psychiatric Association. (2022). Diagnostic and Statistical Manual of Mental Disorders, Text Revision (5th ed., text rev.). American Psychiatric Publishing.
Used for: minor refinements to PTSD criteria (Criterion E hypervigilance/anger language) and prolonged grief disorder framing where it bears on late-life depression presentation.
2 · Lifetime prevalence and comorbidity (general epidemiology)
Phenotype weight values and the comorbidity probabilities in phenotypes.py are loosely anchored in the National Comorbidity Survey Replication (NCS-R) and related survey work.
NCS-R lifetime prevalence
Kessler, R. C., Berglund, P., Demler, O., Jin, R., Merikangas, K. R., & Walters, E. E. (2005). Lifetime prevalence and age-of-onset distributions of DSM-IV disorders in the National Comorbidity Survey Replication. Archives of General Psychiatry, 62(6), 593–602.
Used for: base-rate lifetime_prevalence values per disorder in disorders.py; rough phenotype weight ordering reflecting clinical-population frequency; severity-band priors.
NCS-R 12-month prevalence and severity
Kessler, R. C., Chiu, W. T., Demler, O., & Walters, E. E. (2005). Prevalence, severity, and comorbidity of 12-month DSM-IV disorders in the National Comorbidity Survey Replication. Archives of General Psychiatry, 62(6), 617–627.
Used for: the (mild, moderate, severe) severity-band distributions used as defaults in disorders.py. NCS-R reports approximately 40% mild / 37% moderate / 22% severe across major disorders, which is the basis for the (0.40, 0.35, 0.25) default tuple.
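A minimal sketch of how the default tuple relates to the quoted NCS-R shares, and how a severity band could be drawn from it. The constant and function names here are hypothetical and are not taken from disorders.py; only the percentages and the (0.40, 0.35, 0.25) tuple come from the text above.

```python
import random

BANDS = ("mild", "moderate", "severe")

# NCS-R 12-month severity shares as quoted above (percent, rounded).
NCSR_SHARES = {"mild": 40.0, "moderate": 37.0, "severe": 22.0}

# Default tuple quoted above; the constant name is an assumption.
DEFAULT_SEVERITY_PRIOR = (0.40, 0.35, 0.25)


def normalize(shares):
    """Rescale raw shares so that they sum to 1."""
    total = sum(shares.values())
    return {band: value / total for band, value in shares.items()}


def sample_severity(rng, prior=DEFAULT_SEVERITY_PRIOR):
    """Draw one severity band according to the prior."""
    return rng.choices(BANDS, weights=prior, k=1)[0]


ncsr = normalize(NCSR_SHARES)

# Largest per-band gap between the rounded default and the survey shares.
max_gap = max(abs(ncsr[b] - p) for b, p in zip(BANDS, DEFAULT_SEVERITY_PRIOR))

# Empirical check that sampling reproduces the prior.
rng = random.Random(0)
draws = [sample_severity(rng) for _ in range(10_000)]
observed = {b: draws.count(b) / len(draws) for b in BANDS}
```

The gap stays under three percentage points per band, which is consistent with the tuple being a rounded default rather than a direct copy of the survey figures.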
NESARC substance + psychiatric
Grant, B. F., Saha, T. D., Ruan, W. J., Goldstein, R. B., Chou, S. P., Jung, J., et al. (2016). Epidemiology of DSM-5 drug use disorder: results from the National Epidemiologic Survey on Alcohol and Related Conditions–III. JAMA Psychiatry, 73(1), 39–47.
Used for: opioid, cannabis, and stimulant use prevalence and comorbidity rates with mood/anxiety/PTSD; weight calibration on the opioid_with_internalizing and trauma_substance phenotypes.
NESARC PTSD-AUD
Pietrzak, R. H., Goldstein, R. B., Southwick, S. M., & Grant, B. F. (2011). Prevalence and Axis I comorbidity of full and partial posttraumatic stress disorder in the United States: results from Wave 2 of the NESARC. Journal of Anxiety Disorders, 25(3), 456–465.
Used for: the PTSD-AUD lifetime comorbidity odds ratio (OR, roughly 3 to 5) cited in the trauma_substance phenotype notes.
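An odds ratio is not itself a probability, so it needs a base rate before it can inform a co-occurrence probability. The sketch below shows the standard conversion. The base rate is a placeholder chosen for illustration, not a figure from the cited paper, and the function names are hypothetical.

```python
def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)


def prob_given_exposure(base_prob, odds_ratio):
    """P(outcome | exposed) from P(outcome | unexposed) and an odds ratio."""
    exposed_odds = odds(base_prob) * odds_ratio
    return exposed_odds / (1.0 + exposed_odds)


# Placeholder base rate of AUD among people without PTSD (illustration only).
BASE_AUD_PROB = 0.25

for odds_ratio in (3.0, 5.0):
    p = prob_given_exposure(BASE_AUD_PROB, odds_ratio)
    print(f"OR={odds_ratio:.0f}: P(AUD | PTSD) = {p:.3f}")
```

Because the conversion is nonlinear, the same odds ratio implies a different probability for each base rate, so the odds ratio should not be multiplied directly into a prevalence.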
WHO world mental health
Kessler, R. C., & Üstün, T. B. (Eds.). (2008). The WHO World Mental Health Surveys: Global Perspectives on the Epidemiology of Mental Disorders. Cambridge University Press.
Used for: cross-country sanity check on the comorbidity ratios used; primarily as a consistency reference rather than a direct numeric source.
3 · Disorder-specific comorbidity
Sources for phenotype optional probabilities that reflect specific clinical literature rather than general epidemiology.
OCD-BDD Pinto et al.
Pinto, A., Mancebo, M. C., Eisen, J. L., Pagano, M. E., & Rasmussen, S. A. (2006). The Brown Longitudinal Obsessive Compulsive Study: clinical features and symptoms of the sample at intake. Journal of Clinical Psychiatry, 67(5), 703–711.
Used for: OCD-BDD comorbidity rate (~30-40%) and OCD-MDD comorbidity rate (~50-60%) used in the ocd_spectrum phenotype optional probabilities.
BDD Phillips longitudinal
Phillips, K. A., & Stout, R. L. (2006). Associations in the longitudinal course of body dysmorphic disorder with major depression, obsessive-compulsive disorder, and social phobia. Journal of Psychiatric Research, 40(4), 360–369.
Used for: BDD-social anxiety co-occurrence and BDD-MDD comorbidity calibration.
ADHD meta-analytic comorbidity
Cortese, S., Moreira-Maia, C. R., St. Fleur, D., Morcillo-Peñalver, C., Rohde, L. A., & Faraone, S. V. (2016). Association between ADHD and obesity: a systematic review and meta-analysis. American Journal of Psychiatry, 173(1), 34–43.
Used for: ADHD-binge-eating comorbidity rate (~30%) used in the adhd_internalizing phenotype; ADHD-mood/anxiety comorbidity context.
Postpartum intrusive thoughts
Fairbrother, N., & Woody, S. R. (2008). New mothers' thoughts of harm related to the newborn. Archives of Women's Mental Health, 11(3), 221–229.
Used for: base-rate of intrusive harm thoughts in postpartum women (~50%), informing the postpartum_intrusive_thoughts 0.55 inclusion probability in the postpartum_complex phenotype.
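A minimal sketch of what an inclusion probability means in practice, assuming each optional domain is included by an independent Bernoulli draw. That sampling rule and the dictionary layout are assumptions, not a description of phenotypes.py. Only the domain name and the 0.55 value come from the text above.

```python
import random

# Hypothetical layout: optional domain name -> inclusion probability.
POSTPARTUM_COMPLEX_OPTIONAL = {"postpartum_intrusive_thoughts": 0.55}


def sample_optional(optional, rng):
    """Include each optional domain independently with its probability."""
    return [name for name, p in optional.items() if rng.random() < p]


# Empirical check: the inclusion rate should settle near 0.55.
rng = random.Random(42)
n_patients = 10_000
hits = sum(
    "postpartum_intrusive_thoughts"
    in sample_optional(POSTPARTUM_COMPLEX_OPTIONAL, rng)
    for _ in range(n_patients)
)
inclusion_rate = hits / n_patients
```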
BPD NESARC
Tomko, R. L., Trull, T. J., Wood, P. K., & Sher, K. J. (2014). Characteristics of borderline personality disorder in a community sample: comorbidity, treatment utilization, and general functioning. Journal of Personality Disorders, 28(5), 734–750.
Used for: BPD prevalence (~1.6% lifetime) and comorbidity rates with PTSD, NSSI, and AUD informing the bpd_complex phenotype optional probabilities.
Bipolar II diagnostic delay
Hirschfeld, R. M. A., Lewis, L., & Vornik, L. A. (2003). Perceptions and impact of bipolar disorder: how far have we really come? Results of the National Depressive and Manic-Depressive Association 2000 survey of individuals with bipolar disorder. Journal of Clinical Psychiatry, 64(2), 161–174.
Used for: the "up to 40% of bipolar II misdiagnosed as MDD" figure in the bipolar_masked phenotype notes.
First-episode psychosis prodrome
Yung, A. R., Yuen, H. P., McGorry, P. D., Phillips, L. J., Kelly, D., Dell'Olio, M., et al. (2005). Mapping the onset of psychosis: the Comprehensive Assessment of At-Risk Mental States. Australian and New Zealand Journal of Psychiatry, 39(11–12), 964–971.
Used for: the attenuated psychotic symptom criteria backing the attenuated_psychotic_syndrome domain (DSM-5 Section III condition for further study); prodromal-to-conversion rates.
Late-life depression-MCI
Steffens, D. C., McQuoid, D. R., & Potter, G. G. (2014). Outcomes of older cognitively impaired individuals with current and past depression in the NCODE study. Journal of Geriatric Psychiatry and Neurology, 27(1), 4–11.
Used for: MCI-depression overlap (~50%) informing the late_life_complex phenotype.
Eating disorders NSSI
Cucchi, A., Ryan, D., Konstantakopoulos, G., Stroumpa, S., Kaçar, A. Ş., Renshaw, S., et al. (2016). Lifetime prevalence of non-suicidal self-injury in patients with eating disorders: a systematic review and meta-analysis. Psychological Medicine, 46(7), 1345–1358.
Used for: NSSI rate in anorexia/bulimia (~40-50%) informing the eating_disorder_complex phenotype optional probability.
4 · Methodology
References for the eval and synthetic-patient methodology.
LLM-judge Zheng et al.
Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., et al. (2023). Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. Advances in Neural Information Processing Systems, 36.
Used for: general framing for using a stronger LLM as judge over a weaker LLM's outputs; calibration concerns; the inter-judge agreement methodology.
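The text does not say which agreement statistic the pipeline uses. As one common option, the sketch below computes Cohen's kappa between two judges' categorical verdicts, which corrects raw agreement for the agreement expected by chance. The verdict labels and variable names are made up for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of items on which the two judges gave the same verdict.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each judge labelled at random with their own marginals.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both judges used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Made-up verdicts from two hypothetical judge models on six transcripts.
judge_a = ["pass", "pass", "fail", "pass", "fail", "fail"]
judge_b = ["pass", "fail", "fail", "pass", "fail", "pass"]
kappa = cohens_kappa(judge_a, judge_b)
```

Here the judges agree on four of six items, but half of that agreement is expected by chance, so kappa is well below the raw agreement rate.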
Synthetic patients review
Tu, T., Palepu, A., Schaekermann, M., Saab, K., Freyberg, J., Tanno, R., et al. (2024). Towards conversational diagnostic AI. arXiv preprint.
Used for: design pattern of using LLMs as both simulated patient and clinician; the constraint-rendering pattern (structured spec → LLM → conversation).
Clinical interview standardized intake
Sheehan, D. V., Lecrubier, Y., Sheehan, K. H., Amorim, P., Janavs, J., Weiller, E., et al. (1998). The Mini-International Neuropsychiatric Interview (M.I.N.I.): the development and validation of a structured diagnostic psychiatric interview for DSM-IV and ICD-10. Journal of Clinical Psychiatry, 59(Suppl 20), 22–33.
Used for: reference for what a "broad" psychiatric intake covers. It anchors the structured-prompt ceiling condition (not yet run), in which the clinician model is told to do a thorough psychiatric review of systems.