Using Informatics to Help Identify Patients for Research

Tiasha Letostak, PhD

Researchers use Informatics for Integrating Biology and the Bedside (i2b2), a system that can rapidly convert a large amount of clinical data into searchable information, to dramatically reduce the time needed for patient cohort identification.

Patient cohort identification, or the process of finding patients with shared characteristics, is an important first step in the early identification of disease risks for patients and participant recruitment for clinical trials. But cohort identification methods such as searching a large clinical database for prospective subjects or conducting manual chart reviews can be time consuming and cost prohibitive.

Now, a multidisciplinary team at Nationwide Children’s Hospital has found a way to expedite this process by leveraging the existing functionalities of Informatics for Integrating Biology and the Bedside (i2b2), an interactive medical informatics system that is widely used for patient cohort identification. Although this application was developed for sleep disorder research, researchers note that the natural language processing functionality of i2b2, which enables quick analysis of large amounts of text, could be used for cohort identification in other clinical studies as well.

Yungui Huang, PhD, MBA, is director of Research Information Solutions and Innovations (RISI) at The Research Institute at Nationwide Children’s and senior author of the study, which was published in Applied Clinical Informatics.

“Searching for patients that met requirements of a specific research study involved a lengthy and personnel-driven process of sifting through 16,000 sleep disorder medical summary documents,” says Dr. Huang, whose multidisciplinary team also included Wei Chen, PhD, senior systems programmer from the big data team of RISI; Robert Kowatch, MD, PhD, principal investigator in the Center for Innovation in Pediatric Practice; Simon Lin, MD, MBA, chief research information officer at Nationwide Children’s; and Mark Splaingard, MD, director of the Sleep Disorders Center at Nationwide Children’s.

According to Dr. Huang, the manual process used to take an average of about two weeks. A primary motivation for this study was determining ways to speed up this research patient identification process.

From January 2004 to September 2014, 15,683 sleep study reports were collected at Nationwide Children’s. The researchers note that with this many documents, physicians urgently needed an informatics system to automate cohort identification and overcome the poor performance of traditional manual methods.

Dr. Huang and her team developed an i2b2 application to speed the process of cohort identification by converting textual information from sleep study documents into data that is more organized and useful for researchers.

“We extracted essential information from the sleep disorder summary documents and stored them in a database,” explains Dr. Huang. “Using the i2b2 platform, which comes with a user-friendly drag-and-drop interface, combined with a customized ontology or structure of searchable terms, we enabled cohort identification in real time – and that’s an important first step for clinical research.”
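The extract-and-store step Dr. Huang describes can be illustrated with a minimal Python sketch. This is not the team's i2b2 application or the i2b2 API: the report wording, the field names (age, apnea-hypopnea index, diagnosis), the patient identifiers and the use of regular expressions with an in-memory SQLite database are all assumptions made for illustration.

```python
import re
import sqlite3

# Hypothetical report snippets; real sleep study summaries will look different.
REPORTS = {
    "P001": "Age: 7 years. Apnea-hypopnea index (AHI): 12.4 events/hr. "
            "Diagnosis: obstructive sleep apnea.",
    "P002": "Age: 10 years. Apnea-hypopnea index (AHI): 0.8 events/hr. "
            "Diagnosis: primary snoring.",
    "P003": "Age: 4 years. Apnea-hypopnea index (AHI): 6.1 events/hr. "
            "Diagnosis: obstructive sleep apnea.",
}

# Assumed patterns for the illustrative report format above.
AGE_RE = re.compile(r"Age:\s*(\d+)\s*years")
AHI_RE = re.compile(r"\(AHI\):\s*([\d.]+)")
DX_RE = re.compile(r"Diagnosis:\s*([^.]+)\.")


def extract(text):
    """Pull (age, ahi, diagnosis) out of one free-text report."""
    age = AGE_RE.search(text)
    ahi = AHI_RE.search(text)
    dx = DX_RE.search(text)
    return (
        int(age.group(1)) if age else None,
        float(ahi.group(1)) if ahi else None,
        dx.group(1).strip() if dx else None,
    )


def build_db(reports):
    """Store the extracted fields in a searchable table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sleep_study ("
        "patient_id TEXT PRIMARY KEY, age INTEGER, ahi REAL, diagnosis TEXT)"
    )
    for patient_id, text in reports.items():
        conn.execute(
            "INSERT INTO sleep_study VALUES (?, ?, ?, ?)",
            (patient_id, *extract(text)),
        )
    conn.commit()
    return conn


def find_cohort(conn, min_ahi, max_age):
    """Return patient IDs meeting simple study criteria."""
    rows = conn.execute(
        "SELECT patient_id FROM sleep_study "
        "WHERE ahi >= ? AND age <= ? ORDER BY patient_id",
        (min_ahi, max_age),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_db(REPORTS)
    print(find_cohort(conn, min_ahi=5.0, max_age=8))
```

Once the text has been converted into rows like these, changing the criteria is a matter of rerunning a query rather than rereading the documents, which is the kind of speedup the i2b2 interface gives researchers without requiring them to write code.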

In addition to reducing the labor cost of cohort identification, the i2b2 system made data more accessible than before. Rather than relying on traditional keyword-based document searches, researchers could ask “what if” questions directly in real time instead of waiting weeks for a manual chart review.

“Our study shortened the research patient identification process from two weeks to about 15 minutes,” says Dr. Huang. “This paves the road for future clinicians and investigators to take advantage of this simple yet powerful platform for their patient care and research needs.”

As for next steps, Dr. Huang explains that the team plans to roll out this institutional i2b2 platform organization-wide, in the hope that many other researchers will learn about it and use it to prepare for their clinical studies.

“We are also in the process of linking our i2b2 platform with other institutions,” says Dr. Huang. “This will further enhance researchers’ capabilities to search for eligible research patients not only within Nationwide Children’s but across institutional boundaries.”



Chen W, Kowatch R, Lin S, Splaingard M, Huang Y. Interactive cohort identification of sleep disorder patients using natural language processing and i2b2. Applied Clinical Informatics. 2015 May 27;6:345-363.

About the author

Tiasha is the senior strategist for Clinical & Research Communications at Nationwide Children's Hospital. She provides assistance to investigators in The Research Institute and clinician-scientists at Nationwide Children’s for internal and external communication of clinical studies, peer-reviewed journal articles, grant awards and research news. She is also the editor-in-chief for Research Now, Nationwide Children's monthly, all-employee e-newsletter for research, as well as a writer for Pediatrics Nationwide.