by Guillermo Cecchi
Patterns are everywhere. Benoit Mandelbrot found them in nature, and gave us fractals. And now computer systems and algorithms find them in data, like how Watson teases out relevant information in just about anything. Machines can even find patterns in speech to accurately predict psychosis onset in high-risk youths, as colleagues and I explain in a recent Nature Publishing Journals – Schizophrenia article, Automated Analysis of Free Speech Predicts Psychosis Onset in High-Risk Youths.
About 1 percent of the population between the ages of 14 and 27 is at clinically high risk, or CHR, for experiencing a psychotic episode at some point in their lives. One percent might not sound like much, but a substantial 30 percent of those known CHR individuals will go on to have an episode. This led me to work with academic and clinical psychiatrists to apply machine learning to the data, in the form of transcribed interviews, to find patterns that would accurately predict that 30 percent.
CHR symptoms include delusions, hallucinations, and disorganized thoughts. But to be considered at risk for an episode, the symptoms must show direct adverse effects on an individual’s life for at least one month. This standard is difficult for a patient’s family, let alone the patient himself or herself, to track manually. And while psychiatrists, when they apply current classification techniques, are about 80 percent accurate at identifying CHR patients, we show that data analysis of a transcribed interview with a patient can help close that remaining gap. This improved recognition, in turn, can mean preventing the dramatic, debilitating extremes of these symptoms that manifest during psychotic episodes.
Beginning in 2011, we interviewed 34 CHR patients, following up with them every three months over the course of the next two-and-a-half years. Using only the transcript from each subject’s initial interview for analysis, our system accurately predicted the five patients who experienced a “psychosis development.”
Our interviews differed from scripted interviews typically used in therapy sessions. The psychiatrists at Columbia University who partnered with us instead used fewer, open-ended questions, letting the patients talk about themselves in a more conversational, natural way. We didn’t worry about what they said; anything was useful.
This “free speech” approach established each patient’s semantic coherence (how well he or she stayed on topic), and syntactic structure, such as phrase length and use of determiner words that link the phrases. A clinical psychiatrist may intuitively recognize signs of disorganized thoughts in a traditional interview, but a machine can augment what they’re hearing and writing down by precisely measuring these variables.
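To make these variables concrete, here is a minimal sketch of how they might be computed from a transcript. It is an illustration, not the system described in this article. It assumes word-overlap cosine similarity between consecutive sentences as a stand-in for semantic coherence, treats each sentence as a "phrase," and uses a hypothetical determiner list. The function names and feature names are invented for the example.

```python
import math
import re
from collections import Counter

# Hypothetical determiner list for illustration; the article names only
# "which" and "this" as examples.
DETERMINERS = {"which", "this", "that", "what", "whatever"}


def split_sentences(transcript):
    """Split a transcript into lowercase word lists, one list per sentence."""
    parts = re.split(r"[.!?]+", transcript)
    word_lists = [re.findall(r"[a-z']+", part.lower()) for part in parts]
    return [words for words in word_lists if words]


def cosine(words_a, words_b):
    """Cosine similarity between the word-count vectors of two sentences."""
    counts_a, counts_b = Counter(words_a), Counter(words_b)
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def speech_features(transcript):
    """Return three illustrative features for one transcript."""
    sents = split_sentences(transcript)
    # Similarity of each sentence to the next; a low minimum suggests
    # an abrupt change of topic somewhere in the interview.
    sims = [cosine(a, b) for a, b in zip(sents, sents[1:])]
    words = [w for s in sents for w in s]
    det_count = sum(1 for w in words if w in DETERMINERS)
    return {
        "min_coherence": min(sims) if sims else 1.0,
        "max_phrase_length": max((len(s) for s in sents), default=0),
        "determiner_rate": det_count / len(words) if words else 0.0,
    }


features = speech_features(
    "I like this game. This game is fun. Bees make honey."
)
```

In this toy transcript the last sentence shares no words with the one before it, so the minimum coherence drops to zero. A real system would need a measure of meaning rather than raw word overlap, since two sentences can stay on topic without repeating any words.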
The system first determines whether patients under- or over-use words or sequences of words compared to normal speech. A patient can’t really attempt to outsmart the questions because, as mentioned, the system can use anything they say. The system then applies a machine learning convex hull algorithm to measure semantic coherence patterns. Think of the “hull” as the boundary around a cluster of normally used, meaningful words and phrases, placed on a graph. The farther outside that cluster a patient’s words and phrases fall, statistically speaking (or the more normal words and phrases are absent), the higher the risk of psychosis. For example, a patient’s change of subject might not be as abrupt as switching from sports to honey bees. But he or she may go off on a tangent, leave out determiners like “which” or “this,” and ultimately change subjects significantly.
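The hull idea can be sketched in a few lines. The example below is a simplified illustration, not our actual classifier. It assumes each speaker is reduced to a two-dimensional feature point, builds the convex hull of a set of reference points with Andrew's monotone chain algorithm, and checks whether a new point falls inside it. The function names and sample coordinates are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop the last vertex while it would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def inside_hull(point, hull):
    """True if point lies inside or on the edge of a counter-clockwise hull."""
    if len(hull) < 3:
        raise ValueError("need at least three non-collinear reference points")
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], point) >= 0 for i in range(n))


# Hypothetical reference points (for example, two speech features per speaker).
reference = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]
hull = convex_hull(reference)

typical_speaker = inside_hull((1, 1), hull)   # falls within the reference cluster
atypical_speaker = inside_hull((5, 1), hull)  # falls outside the hull
```

A yes-or-no inside test is the simplest possible use of the hull. Measuring how far outside a point lies, and working in more than two dimensions, would take more machinery than this sketch shows.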
If clinicians could use our system to examine any of their patients’ written communication, including social media, as part of their clinical assessment, they could quickly and more accurately reach those most likely to have a psychotic episode – well before the episode. And using a machine could also mean consistent, frequent patient monitoring by the clinician and even family members. Our research has also identified immediate word and phrase equivalencies in languages other than English, including Spanish and Portuguese.
With more data, we hope to establish a hull threshold, and ultimately diminish the rate of CHR-to-psychosis conversion. Our lightweight textual data analysis could make a valuable addition to the growing number of health apps already used in clinical settings.