By Dr. Guillermo Cecchi

More than 63 million psychiatric interviews are conducted every year. Yet none of them is analyzed in a quantitative, codified way. Surprising? Not really. Doctors don’t have time to find patterns in the pages of notes they keep on each patient. Those pages, though, hold “big data” on psychiatric conditions, which analytics could help unlock, potentially predicting episodes before they occur.

Now, after a multi-year study and the accompanying development of text analysis algorithms, we may finally be able to quantify patterns in these interviews and help doctors treat patients suffering from post-traumatic stress disorder and other conditions.

The most recent effort to apply machine learning to clinical text started with work by my colleague Cheryl Corcoran at Columbia University, with whom I have worked on speech graphs. She had been studying speech patterns to predict psychotic episodes. The patients had known pre-psychotic symptoms but had never had a psychotic episode. Each took part in one interview and was then followed for two and a half years. The hypothesis was that speech patterns could identify who was pre-psychotic, however apparent or subtle the symptoms were.

The unstructured interview data was simply too large to sort and codify, and no patterns were emerging. But maybe a smart machine, and data from a past ecstasy study, could help.

Our current collaboration on pre-psychotic speech analysis needed baseline data. We found it in an ecstasy study by another Columbia University colleague, Gillinder Bedi. While at the University of Chicago, she compared interviews of people under the influence of ecstasy with interviews of people given a placebo. Although ecstasy is a drug of abuse, it also has well-established pro-social effects and is being studied for potential psychotherapeutic use. Her study administered ecstasy to regular users across four interview sessions under strict monitoring protocols. The drug’s effect on a person’s emotional state, such as increased empathy, made for an effective comparison with people not under its influence, and the algorithms we wrote with the help of my colleagues Facundo Carrillo and Diego Slezak at the University of Buenos Aires uncovered even more.

We found, for the first time in the literature as far as we know, that ecstasy users’ speech becomes more fluid and that they use fewer catch phrases. This knowledge helped establish a baseline against which to compare patients at risk of a psychotic episode, because reduced coherence in their discourse (coherence being how semantically similar consecutive phrases are) is a key symptom. Our initial results were presented at the 2013 annual meeting of the American College of Neuropsychopharmacology.
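To make the notion of coherence concrete, here is a minimal Python sketch that scores a transcript as the mean cosine similarity between each phrase and the next. It is an illustration under stated assumptions, not the algorithm used in our study: phrases are represented as plain word-count vectors, whereas a real system would use a semantic representation (such as latent semantic analysis or word embeddings) so that synonyms count as similar. The function names vectorize, cosine, and first_order_coherence are hypothetical.

```python
import math
from collections import Counter


def vectorize(phrase):
    """Word-count vector for a phrase (a crude stand-in for a semantic embedding)."""
    return Counter(phrase.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    # Counter returns 0 for missing words without inserting them
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def first_order_coherence(phrases):
    """Mean similarity between each phrase and the one that follows it."""
    vectors = [vectorize(p) for p in phrases]
    sims = [cosine(u, v) for u, v in zip(vectors, vectors[1:])]
    return sum(sims) / len(sims) if sims else 0.0
```

For the phrases "the dog ran home", "the dog was tired", and "stock prices fell today", the first pair shares two words (similarity 0.5) and the second pair shares none (similarity 0), so the score is 0.25. A speaker who jumps between unrelated topics drives the score down.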

With real-time machine learning finding word and phrase patterns during interviews, a psychiatrist could get a much clearer view of a patient’s true state of mind.

Based on findings from the University of Chicago ecstasy study, the points on the graph represent words in the studied lexicon for the four conditions analyzed.

Combining the qualitative with the quantitative

Psychiatry has a rich historical literature that characterizes patient conditions. Doctors must also fill out interview scales with questions such as “how anxious is the patient, on a scale from 1 to 5?” These resources are effective; they are simply qualitative. For example, we found no study that attempted to predict schizophrenia, partly because practitioners could not simply and quickly compare notes. Until now, computers could not match vast amounts of this unstructured data, so there were no objective criteria that practitioners and institutions could agree upon.

Our machine learning algorithms can accurately read and analyze these interviews and find those patterns. The next step is to give doctors a way to run this analysis in real time, “hearing,” transcribing, and analyzing an interview as it happens, all on a mobile device.

The prototype developed by our software lab in India works through a mobile cloud platform via a smartphone interface designed for health care workers. The device acquires (“hears”) the speech of the patient being interviewed and sends it to a server managed by the healthcare facility for transcription and de-identification (for patient confidentiality). The output is then analyzed by a separate IBM cloud application, which returns two results to the device: a comparison of the patient’s speech with that of previously diagnosed patients, and the density of “loops” in the patient’s speech compared with the normal population.
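The post does not spell out how “loops” are counted. The following minimal Python sketch assumes a common speech-graph convention: each distinct word is a node, each pair of consecutive words is a directed edge, and a loop is a cycle through one, two, or three distinct words. The normalization by transcript length and all function names are hypothetical choices for illustration; comparing against the normal population would additionally require reference data not shown here.

```python
from collections import defaultdict


def speech_graph(words):
    """Directed graph: one node per distinct word, one edge per consecutive pair."""
    edges = defaultdict(set)
    for current, following in zip(words, words[1:]):
        edges[current].add(following)
    return edges


def count_short_loops(edges):
    """Count distinct directed cycles through 1, 2 and 3 distinct words."""

    def successors(node):
        # .get avoids inserting empty entries into the defaultdict
        return edges.get(node, set())

    nodes = set(edges) | {n for targets in edges.values() for n in targets}
    # One-word loop: a word immediately repeated ("yes yes")
    l1 = sum(1 for a in nodes if a in successors(a))
    # Two-word loop a -> b -> a; requiring a < b counts each pair once
    l2 = sum(1 for a in nodes for b in successors(a)
             if a < b and a in successors(b))
    # Three-word loop a -> b -> c -> a; anchoring at the smallest word counts each once
    l3 = sum(1 for a in nodes for b in successors(a) for c in successors(b)
             if a < b and a < c and b != c and a in successors(c))
    return l1, l2, l3


def loop_density(transcript):
    """Short loops per word of transcript (hypothetical normalization)."""
    words = transcript.lower().split()
    if not words:
        return 0.0
    return sum(count_short_loops(speech_graph(words))) / len(words)
```

For example, loop_density("yes yes no yes") returns 0.5: the repeated "yes" forms a one-word loop and "yes no yes" forms a two-word loop, giving two loops over four words.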

We think of the app not so much as a diagnostic tool but as something akin to a blood test. It is, and will always be, the clinical psychiatrist who makes the diagnosis. Our measures can inform that diagnosis by capturing speech patterns that are not readily identifiable, feeding them back to the psychiatrist in real time, and keeping a record of them over time. In the current research phase, participating doctors must enter their diagnosis before receiving the assessment. This aggregation and annotation of the data allows us to fine-tune the analyses.

Perhaps in the future, those 63 million annual interviews will be codified and will contribute to diagnoses that help people suffering from PTSD, depression, and other conditions, all before any psychotic episodes actually occur.

Read our findings in the paper “A Window into the Intoxicated Mind? Speech as an Index of Psychoactive Drug Effects” in the latest issue of Neuropsychopharmacology.

This work was also done in collaboration with I. Rish and J. Kozloski at IBM’s Thomas J. Watson Research Center, and S. Allam at IBM India.

7:59 am

You might be interested in similar research exploring the psychotherapeutic process on the basis of verbatim transcripts of therapy sessions. This research, conducted since the 1980s (!), has led to remarkable discoveries both in computational linguistics/text analysis and in the effectiveness of psychotherapy. Please refer to for more details.

Posted by: Sebastian Goeser
July 8, 2014
11:51 am

>This is very cool. Any thoughts about offloading the data crunching to Watson? Or is the data not “big” enough? [Kapil]

Eventually it will be possible, but at the moment our goal is to work very closely with the psychiatrists who perform the experiments and collect the data. We need a more solid understanding of how to relate machine learning features to psychiatric knowledge before tackling larger, less structured data.

> Now my question: can the same baseline that you have inferred from the work with ecstasy be used for other psychopharmacological drugs? If yes, will this do justice to patients using other drugs, since the baseline inferred from them may not be the same as that of ecstasy? [ Bikash Mishra]

Not necessarily as a numerical value, but as a trend that may be detected under the influence of different psychoactive drugs, or in different mental states. What is important about our findings in this respect is that as we codify well-established features (e.g., increased empathy with ecstasy), we also stumble upon novel features that cut across conditions.

Posted by: Guillermo Cecchi
July 8, 2014
12:34 am

This is very cool. Any thoughts about offloading the data crunching to Watson? Or is the data not “big” enough?

Posted by: Kapil
July 7, 2014
6:37 pm

I have been waiting for this article of yours since July 6. Your baseline was built on the work of Dr. Bedi, who used interviews of patients under the influence of ecstasy or a placebo to generate speech patterns in which ecstasy users use fewer catch phrases.

Now my question: can the same baseline that you have inferred from the work with ecstasy be used for other psychopharmacological drugs? If yes, will this do justice to patients using other drugs, since the baseline inferred from them may not be the same as that of ecstasy?

Posted by: Bikash Mishra
July 7, 2014
2:58 pm

This is great.

Posted by: Leonardo Shikida