New Frontiers for Data Science in Pediatric Research

Peter White, PhD

From discovery science to population health, researchers generate masses of data that hold immense potential to transform pediatric research, diagnostics and treatments, and even to guide disease prevention strategies. To make sense of this data deluge, we’ve been harnessing the power of cloud computing and cutting-edge data science techniques, such as artificial intelligence (AI), machine learning (ML) and big data analytics.

And that’s why the Abigail Wexner Research Institute at Nationwide Children’s Hospital has taken the bold step of creating the Office of Data Sciences. As the first chief data sciences officer, I’m proud to be leading this important initiative that will enable our research community to learn more from our data than ever before. Throughout my career here, I’ve had the opportunity to empower our research community with cutting-edge capabilities in genomics, data analysis and interpretation.

Organizational Support for the Data Sciences

In our quest to empower our researchers with the tools for innovation and excellence in their fields, we dedicated the past six months to a comprehensive analysis of our extensive clinical and research data ecosystem. This deep dive revealed critical insights: as our capacity for data generation and storage has expanded rapidly, we’ve encountered challenges with siloed datasets that hinder seamless integration. Furthermore, while our research community has a palpable enthusiasm to leverage data science methodologies, several obstacles have emerged. High data transaction costs, the intricate nature of big data analytics, and the scarcity of skilled personnel have significantly constrained our ability to harness the full potential of our data resources.

Our planning led to the establishment of the Office of Data Sciences, marking a new chapter of excellence within our research institute. This initiative embodies our strategic vision to fully unlock the capabilities of biomedical data sciences throughout our research endeavors. By building on our robust foundation in Epic (our electronic health record system), data sciences, genomics and cloud technologies, along with a collective ambition to enhance data literacy and embrace cutting-edge technologies, our office is well positioned to use AI and sophisticated data science techniques. Our aim is to propel research forward by making the most of our extensive clinical data ecosystem.

To guide our efforts, we have pinpointed four critical areas of focus for the new office: data lake, data intelligence, data translation and data mastery.

Four Critical Areas of Focus

  • Data Lake: We aim to revolutionize how diverse data sources are integrated, employing AI-enhanced technologies within a centralized data lake. This initiative will facilitate access to advanced data analytics and ML platforms via cloud computing technologies, enabling groundbreaking research possibilities.
  • Data Intelligence: By spearheading innovative AI projects, we intend to lead advancements in child health. This involves formulating a strategic approach to AI application in research and fostering a robust community of data science expertise.
  • Data Translation: Our objective is to accelerate the transformation of data science research into tangible insights and practical applications. We will achieve this through strategic translational initiatives and the deployment of advanced data science technologies, including AI and ML. This approach is designed to bridge the gap between theoretical research and its real-world impact, ensuring that scientific discoveries are promptly translated into better patient outcomes.
  • Data Mastery: Our commitment extends to broadening the data science talent pool and making data science knowledge universally accessible. To realize this, we are launching education initiatives specifically designed to cultivate data science expertise among groups presently underrepresented in the field. Coupled with the creation of innovative software and the integration of AI technologies, these efforts are set to significantly elevate our research capabilities.
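At its simplest, the Data Lake goal of integrating siloed sources means bringing records from separate systems together under a shared patient identifier so they can be queried as one. The sketch below is a minimal illustration under loud assumptions: the source names (emr_records, genomic_results), the mrn key and every field and value are hypothetical, and a real data lake would also need identity resolution, access controls and many more sources.

```python
from collections import defaultdict

# Hypothetical extracts from two siloed systems, keyed by a shared
# medical record number (mrn). All values are made up for illustration.
emr_records = [
    {"mrn": "001", "unit": "NICU", "diagnosis": "hypotonia"},
    {"mrn": "002", "unit": "NICU", "diagnosis": "seizures"},
]
genomic_results = [
    {"mrn": "001", "test": "exome", "result": "negative"},
]


def integrate(*sources):
    """Merge records from several (name, records) sources into one dict per patient.

    Each record is filed under its source name, so fields with the same
    name in different systems do not overwrite each other.
    """
    unified = defaultdict(dict)
    for source_name, records in sources:
        for record in records:
            fields = {k: v for k, v in record.items() if k != "mrn"}
            unified[record["mrn"]].setdefault(source_name, []).append(fields)
    return dict(unified)


view = integrate(("emr", emr_records), ("genomics", genomic_results))

# Once the sources share one view, cross-source questions become simple:
# here, patients with EMR data but no genomic test on record.
untested = [mrn for mrn, data in view.items() if "genomics" not in data]
print(untested)  # ['002']
```

Keeping each source's fields separate, rather than flattening them into one record, is a deliberate choice in this sketch: it preserves provenance, which matters when two systems disagree.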

All four strategic initiatives share a common thread: the application of AI, which holds remarkable promise for transforming the health care sector and offers innovative solutions to complex challenges. Technologies such as ChatGPT exemplify the potential of AI to enhance our understanding and management of health conditions, promising a future where data-driven insights lead to better patient outcomes. As we stand on the brink of this technological revolution, it’s crucial to invest in research and development to fully realize AI’s capabilities in clinical settings.

However, integrating AI into health care is not without its challenges. Current AI technologies, including advanced large language models like ChatGPT, exhibit limitations that necessitate cautious optimism. One notable concern is their lack of determinism: the same query can yield different responses, which makes them harder to rely on in clinical decision-making. Moreover, these systems can “hallucinate,” generating responses that seem plausible yet lack factual grounding. Such issues underscore the importance of ongoing research and validation to ensure AI’s safe and effective application in health care environments.
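Much of the non-determinism described above comes from how large language models generate text: at each step the model scores the possible next tokens, and the decoder typically samples from the resulting probability distribution instead of always taking the top choice. The toy sketch below uses made-up scores for three candidate tokens (not output from any real model) to show how sampling at a positive temperature can give different answers to the same input, while always picking the top-scoring token is repeatable in this simplified setting.

```python
import math
import random

# Made-up scores (logits) for three candidate next tokens; illustrative only.
LOGITS = {"benign": 2.0, "pathogenic": 1.6, "uncertain": 1.2}


def softmax(logits, temperature):
    """Convert logits to probabilities; a lower temperature sharpens the distribution."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def greedy(logits):
    """Always pick the highest-scoring token: the same input gives the same output."""
    return max(logits, key=logits.get)


def sample(logits, temperature, rng):
    """Draw one token at random in proportion to its probability."""
    probs = softmax(logits, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


rng = random.Random()  # unseeded, so sampled results vary between runs
print({greedy(LOGITS) for _ in range(10)})            # always {'benign'}
print({sample(LOGITS, 1.0, rng) for _ in range(10)})  # usually more than one token
```

Lowering the temperature concentrates probability on the top token and reduces, but does not by itself eliminate, run-to-run variation in real deployed systems.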

Applying Data Science: The First Two Projects

Two pivotal translational initiatives will help us bring together data within the Office of Data Sciences data lake and assess emerging data science technologies.

A GENiUS Idea for Newborns

The first is a collaboration named GENiUS: GENomic analysis with enhanced AI for Understanding and Swift diagnosis. This project unites the Office of Data Sciences with the Steve and Cindy Rasmussen Institute for Genomic Medicine, the Division of Clinical Genetics and Genomics, and the Division of Neonatology. With GENiUS, we hope to integrate electronic medical record (EMR) data with our comprehensive genomic sequencing data to make genetic testing more effective for our most vulnerable patients in the neonatal intensive care unit.

GENiUS is poised to tackle two significant challenges. The first focuses on using large language models (LLMs) and ML to identify neonates who need genomic testing by detecting signs of genomic disease that might elude clinicians. The second is dedicated to automating genomic data analysis, so that data from patients who initially test negative are continuously re-analyzed. Such re-analysis is crucial as new disease mechanisms are uncovered and as the patient’s clinical condition evolves, helping to ensure the most accurate and timely diagnosis for ongoing care.
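The automated re-analysis described above can be framed as a trigger rule: revisit a negative case whenever the knowledge used to interpret it, or the patient’s clinical picture, has changed since the last analysis. The sketch below is an assumption, not the GENiUS design. It uses hypothetical inputs (the genes carrying candidate variants for each patient, two versions of a gene-disease table, and phenotype terms then and now) and only decides which cases to queue; it does not model variant interpretation itself.

```python
from dataclasses import dataclass


@dataclass
class NegativeCase:
    patient_id: str
    variant_genes: set           # genes with candidate variants in the patient's data
    phenotypes_at_analysis: set  # clinical features recorded at the last analysis
    phenotypes_now: set          # clinical features recorded today


def cases_to_reanalyze(cases, old_gene_disease, new_gene_disease):
    """Return (patient_id, reasons) for negative cases worth another look.

    old_gene_disease and new_gene_disease map gene -> set of associated
    diseases in the previous and current versions of a hypothetical
    knowledge base.
    """
    queue = []
    for case in cases:
        reasons = []
        # New disease mechanisms: a gene the patient carries variants in gained associations.
        for gene in sorted(case.variant_genes):
            gained = new_gene_disease.get(gene, set()) - old_gene_disease.get(gene, set())
            if gained:
                reasons.append(f"{gene}: new association(s) {sorted(gained)}")
        # Evolving clinical condition: phenotype terms added since the last analysis.
        new_features = case.phenotypes_now - case.phenotypes_at_analysis
        if new_features:
            reasons.append(f"new phenotypes {sorted(new_features)}")
        if reasons:
            queue.append((case.patient_id, reasons))
    return queue


# Hypothetical knowledge-base versions and cases, for illustration only.
old_kb = {"GENE_A": set(), "GENE_B": {"disorder X"}}
new_kb = {"GENE_A": {"disorder Y"}, "GENE_B": {"disorder X"}}
cases = [
    NegativeCase("p1", {"GENE_A"}, {"hypotonia"}, {"hypotonia"}),
    NegativeCase("p2", {"GENE_B"}, {"jaundice"}, {"jaundice", "seizures"}),
    NegativeCase("p3", {"GENE_B"}, {"jaundice"}, {"jaundice"}),
]
for patient_id, reasons in cases_to_reanalyze(cases, old_kb, new_kb):
    print(patient_id, reasons)
```

Recording the reason for each trigger, rather than just a yes/no flag, lets reviewers see whether a case resurfaced because of new science or because the patient changed.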

Data Science for Suicide Prevention

The second translational initiative is a pioneering partnership with the Center for Suicide Prevention and Research, aptly named DREAM: Data Review and Evaluation Assistant for Medicaid Data Chatbot. This initiative leverages a comprehensive Medicaid claims database to develop a chatbot powered by advanced LLMs. This innovative tool is designed to empower researchers at the Center for Suicide Prevention and Research by enabling rapid evaluation of new ideas and hypotheses. It also helps them construct the complex queries required to distill meaningful insights from this extensive dataset.
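One common pattern for a chatbot like this is text-to-SQL: an LLM drafts a database query from a researcher’s question, and the system checks that query before running it. The article does not describe DREAM’s architecture, so everything below is an assumption: draft_sql_with_llm is a hypothetical stand-in that returns a fixed query instead of calling a model, the claims table is a toy schema, and the rows are synthetic. The point of the sketch is the guardrail, which lets only a single read-only SELECT reach the database.

```python
import re
import sqlite3

# Toy schema for illustration only; DREAM's actual data model is not described.
SCHEMA = "CREATE TABLE claims (member_id TEXT, claim_date TEXT, service_code TEXT, age INTEGER)"

# Keywords that could change data or database state; any match is rejected.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|attach|detach|pragma)\b",
    re.IGNORECASE,
)


def draft_sql_with_llm(question: str) -> str:
    """Hypothetical stand-in for an LLM call that turns a question into SQL.

    A real system would send the question and schema to a model; this stub
    returns a fixed query so the example runs offline.
    """
    return "SELECT COUNT(DISTINCT member_id) FROM claims WHERE age BETWEEN 10 AND 24"


def is_safe_select(sql: str) -> bool:
    """Allow only a single SELECT statement with no state-changing keywords."""
    body = sql.strip().rstrip(";").strip()
    return (
        body.lower().startswith("select")
        and ";" not in body
        and FORBIDDEN.search(body) is None
    )


def answer(question: str, conn: sqlite3.Connection):
    """Draft SQL for the question, check it, then run it."""
    sql = draft_sql_with_llm(question)
    if not is_safe_select(sql):
        raise ValueError(f"Rejected generated SQL: {sql!r}")
    return sql, conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# Synthetic rows invented for this example.
conn.executemany(
    "INSERT INTO claims VALUES (?, ?, ?, ?)",
    [("m1", "2023-01-05", "S1", 16), ("m1", "2023-02-10", "S2", 16),
     ("m2", "2023-03-01", "S1", 30), ("m3", "2023-04-12", "S3", 19)],
)
sql, rows = answer("How many members aged 10 to 24 had a claim?", conn)
print(rows)  # [(2,)]
```

A keyword blocklist is a blunt, conservative check that may reject some harmless queries; opening the database connection in read-only mode would be a stronger safeguard to use alongside it.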

By incorporating behavioral health data into the data lake and developing sophisticated AI tools for data extraction and analysis, DREAM aims to foster the creation of groundbreaking biobehavioral health promotion strategies. This initiative exemplifies our commitment to using AI to enhance research and underscores our dedication to critical areas of public health, such as suicide prevention.

We believe LLMs have significant potential to extract valuable insights from unstructured text, such as clinical notes, and to assist in interpreting genomic data. Our approach is to rigorously assess LLM technology with research data to ensure it is applied responsibly and to minimize potential harm. We are dedicated to understanding its biases and are making careful recommendations to ensure ethical AI use, aiming to prevent health inequities.
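Understanding a model’s biases usually starts with measuring whether it performs equally well across patient groups. As a minimal sketch, assuming a hypothetical evaluation set in which each clinical note has a reviewer-assigned gold label, the model’s extracted label and a demographic group, one can compare sensitivity (the share of true positives the model catches) between groups; a large gap is a signal to investigate before deployment. The records below are toy values, not results.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """records: iterable of (group, gold_label, predicted_label) with boolean labels.

    Returns {group: sensitivity}, where sensitivity = TP / (TP + FN), computed
    only over notes whose gold label is positive. Groups with no positive gold
    labels are omitted because sensitivity is undefined for them.
    """
    true_positives = defaultdict(int)
    positives = defaultdict(int)
    for group, gold, predicted in records:
        if gold:
            positives[group] += 1
            if predicted:
                true_positives[group] += 1
    return {g: true_positives[g] / positives[g] for g in positives}


def max_gap(metric_by_group):
    """Largest difference in the metric between any two groups."""
    values = list(metric_by_group.values())
    return max(values) - min(values)


# Toy evaluation records: (group, gold label, model label). Not real data.
records = [
    ("group_a", True, True), ("group_a", True, True),
    ("group_a", True, False), ("group_a", False, False),
    ("group_b", True, True), ("group_b", True, False),
    ("group_b", True, False), ("group_b", False, True),
]
scores = sensitivity_by_group(records)
print({g: round(s, 2) for g, s in scores.items()})  # {'group_a': 0.67, 'group_b': 0.33}
print(round(max_gap(scores), 2))                    # 0.33
```

Sensitivity is only one lens; a fuller audit would also compare false-positive rates and calibration across the same groups.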

Bright Future Ahead

Reflecting on my journey, I am reminded of the palpable excitement I felt 15 years ago when I started working with genomics and next-generation sequencing. Just as genomics has proven to be a revolutionary force in medicine and biomedical research over the past decade, I envision data science as the next catalyst for transformative discovery and advancements.

I am mindful of the challenges that lie ahead. The path to integrating data science deeply and effectively within biomedical research is fraught with complexity. Ensuring data privacy and interoperability, navigating the intricacies of AI and ML applications in health care, and fostering a culture of continuous learning and adaptation among our team are but a few of the obstacles we must overcome.

Our mission is to unleash the full potential of the biomedical data sciences and transformative AI to drive innovative health research, fuel development of groundbreaking diagnostics and treatments, inform disease prevention and health promotion strategies, and ensure equitable health outcomes for all people.

This is not going to be easy, but we have the right ingredients to be successful — an innovative spirit, a collaborative research community and a relentless pursuit of excellence. More importantly, we are guided by a shared vision and an unwavering commitment to improving patient outcomes through the power of data science.

This article appeared in the Spring/Summer 2024 issue.


Image credits: Peter White, Ashley Kubatko, AdobeStock, Shutterstock (illustrations)

About the author

Dr. White champions data-driven innovation as the inaugural Chief Data Sciences Officer (CDSO) of the Abigail Wexner Research Institute (AWRI) at Nationwide Children's Hospital and a Professor of Pediatrics at The Ohio State University College of Medicine.

As a member of AWRI's senior leadership team, Dr. White is responsible for developing and implementing a robust data science strategy. This involves integrating and analyzing diverse big data sources, such as genomic data and electronic health records, to extract critical insights for diagnosing and treating pediatric diseases.