Sean W. Kelley
Human–AI interaction · behavioral science · mental health.
I measure how AI systems behave, and build tools that act on what I find.
Postdoctoral researcher at Northeastern's Network Science Institute and co-founder of Myndgard. My research measures how language models behave — things like sycophancy, epistemic independence, and causal reasoning. At Myndgard, I build mental-health tools that clinicians use in their work.
What I do
01 / about
I work across human–AI interaction and behavioral science. My PhD is in psychology, focused on computational psychiatry: building machine-learning and network models of mental health from real longitudinal data. That work led to first-author papers in Nature Communications, PNAS, and npj Digital Medicine.
Now my research turns vague claims about how language models behave into numbers you can compare across models. I also co-founded Myndgard, where I build AI safety and mental-health tools that universities use with their counseling services.
My work splits in two: research on how AI behaves, and products that apply it where it matters most, which for me is mostly mental health.
Research & evaluation
02 / measuring model behavior
GoalPref-Bench
A benchmark, run on nine language models, that tests whether a model sticks to what the user actually wants or just agrees with them. It turns a common worry about sycophancy into one number you can compare across models.
Causal-Coherence Probing
Asks a model to forecast something and explain its reasoning as a causal diagram. It then ranks the factors by importance and challenges them one at a time: does the model change its mind more when you push on an important factor than on a minor one? Run across seven models and hundreds of questions, with reliability and placebo checks.
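The probing logic can be illustrated in a few lines. This is a minimal sketch under stated assumptions, not the study's code: it assumes each factor's importance is a single weight taken from the model's causal diagram, that the forecast is a probability recorded before and after each challenge, and it uses a simple "top-k shift minus bottom-k shift" gap as the coherence score. `Probe`, `coherence_gap`, `placebo_ok`, and the 0.05 tolerance are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Probe:
    """One challenge to one factor. Hypothetical structure, not the study's schema."""
    factor: str
    importance: float  # assumed: weight the model gave this factor in its causal diagram
    before: float      # forecast probability before the challenge
    after: float       # forecast probability after the challenge


def shift(p: Probe) -> float:
    """How far the forecast moved when this factor was challenged."""
    return abs(p.after - p.before)


def coherence_gap(probes: list[Probe], k: int = 1) -> float:
    """Mean shift on the k most important factors minus the k least important.

    A positive gap means the model moved more when an important factor was
    challenged than when a minor one was. This scoring rule is an assumption.
    """
    if len(probes) < 2 * k:
        raise ValueError("need at least 2*k probes to compare top and bottom")
    ranked = sorted(probes, key=lambda p: p.importance, reverse=True)
    top = mean(shift(p) for p in ranked[:k])
    bottom = mean(shift(p) for p in ranked[-k:])
    return top - bottom


def placebo_ok(placebo: Probe, tol: float = 0.05) -> bool:
    """Placebo check: challenging a factor absent from the diagram should barely
    move the forecast. The 0.05 tolerance is an arbitrary illustrative choice."""
    return shift(placebo) <= tol


# Made-up numbers for illustration only; these are not results from the study.
probes = [
    Probe("interest rates", 0.6, 0.70, 0.45),
    Probe("consumer sentiment", 0.3, 0.70, 0.62),
    Probe("weather", 0.1, 0.70, 0.69),
]
gap = coherence_gap(probes)  # about 0.24
passed = placebo_ok(Probe("unrelated factor", 0.0, 0.70, 0.71))
```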
Personalization & Epistemic Independence
Experiments on what happens when you personalize a model to a user. Personalizing makes it warmer and more emotionally in tune, but whether it also makes the model more agreeable depends on the role it's playing.
Personalized AI for Creative Work
A study with hundreds of people testing whether a personalized AI helps with creative work. It did: people and the AI built on each other's ideas across a back-and-forth, instead of the AI just doing the work. I ran it end to end, from the question to the analysis.
Building
03 / shipped & in flight
Discover
A 12-day, 38-module mental-health course. An AI pipeline writes the content for a range of student backgrounds and checks its own quality. 90% of students improved on mental-health measures.
Share
Monitors each student's mental health while they wait for care and flags those who are deteriorating to the clinic manager, so students who are getting worse are seen sooner instead of in the order they booked. It also gives the therapist a pre-session summary. In production with university counseling services, with 15–20% shorter initial sessions and fewer total sessions per student.
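The page doesn't say how Share measures deterioration, so the following is only a minimal sketch of the general idea of triaging by trajectory instead of booking order. It assumes one repeated symptom score per student (higher = worse), defines deterioration as the latest score minus the first, and uses a hypothetical flag threshold. `WaitingStudent`, `triage`, and `flag_threshold` are illustrative names, not Share's code.

```python
from dataclasses import dataclass, field


@dataclass
class WaitingStudent:
    """A student on the waitlist. Hypothetical structure, not Share's data model."""
    name: str
    booked_order: int  # position in a first-come, first-served queue
    scores: list[float] = field(default_factory=list)  # repeated check-ins; higher = worse

    def deterioration(self) -> float:
        """Change from first to latest check-in; positive means getting worse."""
        if len(self.scores) < 2:
            return 0.0
        return self.scores[-1] - self.scores[0]


def triage(students: list[WaitingStudent], flag_threshold: float = 5.0):
    """Order the waitlist by deterioration, using booking order only to break ties.

    Returns (names in priority order, names to flag to the clinic manager).
    The threshold of 5.0 is an arbitrary illustrative value.
    """
    order = sorted(students, key=lambda s: (-s.deterioration(), s.booked_order))
    flagged = [s.name for s in order if s.deterioration() >= flag_threshold]
    return [s.name for s in order], flagged


# Made-up example: B is worsening fastest, so B moves ahead of A and C.
waitlist = [
    WaitingStudent("A", 1, [10.0, 9.0]),
    WaitingStudent("B", 2, [8.0, 15.0]),
    WaitingStudent("C", 3, [12.0, 14.0]),
]
order, flagged = triage(waitlist)  # order == ["B", "C", "A"], flagged == ["B"]
```

Keeping booking order as the secondary sort key means first-come still decides between students whose trajectories look the same.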
Guardian
Watches a teenager's conversations with AI companions for signs of harm, and runs on the device itself so the conversations never leave it.
Discover Family
A 10-module course that teaches teens (13+) to spot the ways AI companions can go wrong — flattery, growing attachment, blurred personas, being pulled away from real relationships, and bad responses in a crisis. Each module has scenarios to work through and an AI coach to talk them over with.