Varhouse and AVA: Revolutionizing Genomic Data Management and Interpretation
The groundbreaking software system is used for both research and clinical purposes and is transforming how genomic data is organized, interpreted and applied to improve patient care.
When it comes to working with big data, it doesn’t get much bigger than the human genome. Each of our genomes has approximately 6 billion data points, half from one’s mother and half from one’s father. Compared to the human reference genome, an individual typically harbors 4-5 million genetic variants, a large number of which may be unique to that person. Understanding the impact of these variants and their multitude of interactions is crucial to improving our health and management of disease, particularly in pediatric cancer and rare diseases.
Clinicians and researchers use various genome sequencing approaches to identify genetic variants that may be causing a child’s cancer or disease. Discerning which variants are potentially causative in each case can be a daunting task.
“Genomics is an example of truly ‘big data’,” says Peter White, PhD, chief data sciences officer of the Abigail Wexner Research Institute (AWRI) at Nationwide Children’s Hospital and professor of Pediatrics at The Ohio State University College of Medicine.
Upon joining Nationwide Children’s 16 years ago, Dr. White started a new genomics initiative that eventually evolved into The Steve and Cindy Rasmussen Institute for Genomic Medicine (IGM). During that time, he and his IGM cofounders began applying genomics to various clinical areas. With the development of new technologies that allowed rapid genome sequencing, the team was creating a vast amount of data that needed to be organized and interpreted.
“When we first started, we were sequencing around 10 patients a year. Now, we’re sequencing thousands annually,” Dr. White explains. “As the volume grew, we began asking ourselves important questions: How do we manage this vast amount of data? How do we organize all these data points for each patient and connect them to their clinical characteristics?”
Dr. White and his team of software engineers began developing a system to house and interpret these data. Over the years, they refined it into its current, fully functional form: the Variant Analysis Warehouse, or Varhouse for short.
Functionality of Varhouse and AVA
Their solution, Varhouse, represents the culmination of a decade of research and development. Varhouse is a state-of-the-art genomic data warehouse for large-scale analysis and interpretation of genomic data. It offers cloud-enabled, scalable storage and real-time genomic data analysis.
“The system is designed to scale with an increasing number of sequenced patients, allowing for long-term data organization and discovery,” says Dr. White. “However, Varhouse is not just a data warehouse; it is a tool that empowers researchers and clinicians to harness the full potential of genomic data.”
In addition to genomic data warehousing, Varhouse supports data visualization, filtering, variant prioritization, interpretation, clinical report generation and advanced analytics. The analysis is completed by a cutting-edge web-based tool concurrently developed by the Nationwide Children’s team called Assisted Variant Assessment (AVA). The AVA component has two main functions: annotation and interpretation. Annotation involves identifying genetic variants and their impacts on genes. Varhouse continuously retrieves the latest genomic annotation data, including publicly available data, and AVA integrates these annotation sources to provide real-time updates, ensuring the latest genomic variant information is always available.
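The idea of integrating several annotation sources while always serving the newest record can be sketched in a few lines of Python. This is a minimal illustration only, not Varhouse or AVA code: the record fields (`variant`, `source`, `released`, `value`), the source names and the variant identifiers are all hypothetical.

```python
from datetime import date

def merge_annotations(records):
    """For each variant, keep the most recently released value from every source."""
    latest = {}
    for rec in records:
        key = (rec["variant"], rec["source"])
        # Replace a stored record only when a newer release arrives.
        if key not in latest or rec["released"] > latest[key]["released"]:
            latest[key] = rec

    # Group the surviving records by variant: {variant: {source: value}}.
    merged = {}
    for (variant, source), rec in latest.items():
        merged.setdefault(variant, {})[source] = rec["value"]
    return merged

# Hypothetical records; real annotation data would be far richer.
records = [
    {"variant": "variant_1", "source": "source_a",
     "released": date(2023, 1, 10), "value": "uncertain significance"},
    {"variant": "variant_1", "source": "source_a",
     "released": date(2024, 6, 2), "value": "likely pathogenic"},
    {"variant": "variant_1", "source": "source_b",
     "released": date(2024, 3, 15), "value": "rare in population"},
    {"variant": "variant_2", "source": "source_a",
     "released": date(2024, 6, 2), "value": "benign"},
]

merged = merge_annotations(records)
```

Here the older `source_a` entry for `variant_1` is superseded by the newer release, while the entry from `source_b` is kept alongside it.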
“The field of genomics is rapidly evolving. For example, about 8,000 rare diseases have been described, and every month, an additional 40 to 50 new rare disease genes are discovered,” explains Dr. White. “Clinicians can’t stay on top of that volume of literature. So, we’re using AI technology to bring in all of that information to identify the relevant genetic changes for each patient’s condition.”

Clinical and Research Applications of Varhouse
Varhouse is integrated into clinical and translational research workflows at Nationwide Children’s. At the hospital, it is primarily applied in pediatric cancer, rare disease diagnostics and translational research. It enables the identification of novel genetic variants, guides targeted treatments and personalized therapeutic strategies, and links patients to relevant clinical trials based on their genetic variants.
Of particular note, Varhouse is being used to store and analyze data from the Molecular Characterization Initiative (MCI), part of the National Cancer Institute (NCI) Childhood Cancer Data Initiative (CCDI). The MCI is a national collaboration among members of the childhood cancer community that provides state-of-the-art molecular characterization at the time of diagnosis, helping participants and doctors select the most appropriate treatment. The data are also made accessible to researchers for future studies.
“Our clinical team uses the Varhouse interface to highlight key variants, such as a TP53 deletion or MYC amplification, to guide treatment decisions,” explains Dr. White. “Varhouse is integrated with our clinical reporting software and, after review by one of our clinical directors, automatically de-identifies all data and shares it in real-time with the NCI for global research access.”
Integration of Advanced AI in Genomic Data Interpretation
The team at Nationwide Children’s has also made it a priority to integrate artificial intelligence (AI) and natural language processing algorithms into the interpretation process to rank relevant genetic variants. These AI-driven tools automate parts of data interpretation, easing the bottleneck in genomic data analysis, reducing manual workloads, improving diagnostic accuracy and accelerating discoveries.
Natural language processing and large language models are used to read clinical notes and identify relevant phenotypes linked to genes. Machine learning algorithms consider the impact of genetic variants, the match between a patient’s clinical characteristics and known disease genes, and inheritance patterns to rank variants. These AI-driven tools are new to Varhouse and are currently being clinically validated.
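To make the ranking idea concrete, the sketch below combines the three kinds of evidence named above into a single score and sorts variants by it. It is a toy illustration under stated assumptions, not the CAVaLRi algorithm or any part of Varhouse: the weighted sum, the weights, the 0-to-1 feature scales and the gene names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    gene: str
    impact: float           # predicted functional impact, 0 to 1 (hypothetical scale)
    phenotype_match: float  # fit between patient phenotypes and the gene's known disease, 0 to 1
    inheritance_fit: float  # consistency with the expected inheritance pattern, 0 to 1

def score(variant, weights=(0.4, 0.4, 0.2)):
    """Weighted sum of the three evidence types; the weights are illustrative only."""
    w_impact, w_pheno, w_inherit = weights
    return (w_impact * variant.impact
            + w_pheno * variant.phenotype_match
            + w_inherit * variant.inheritance_fit)

def rank_variants(variants, top_n=5):
    """Return the top_n variants, highest score first, for human review."""
    return sorted(variants, key=score, reverse=True)[:top_n]

# Hypothetical candidates for one patient.
candidates = [
    Variant("GENE_B", impact=0.9, phenotype_match=0.1, inheritance_fit=0.5),
    Variant("GENE_A", impact=0.9, phenotype_match=0.8, inheritance_fit=1.0),
    Variant("GENE_C", impact=0.2, phenotype_match=0.9, inheritance_fit=0.5),
]

shortlist = rank_variants(candidates, top_n=2)
for v in shortlist:
    print(v.gene, round(score(v), 2))
```

In this example a damaging variant in a gene that does not match the patient's phenotype (`GENE_B`) falls below the shortlist, which mirrors the goal of reducing a long candidate list to a handful of variants for a human to assess.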
“Using AI, we’ve gone from looking at hundreds of genetic variants to looking at a handful of genetic variants,” says Dr. White. “We can use AI to streamline these processes. A human still makes the final assessment, but AI brings the most likely answers to the top of the list.”
The team recently published a proof-of-concept analysis using their machine learning algorithm, CAVaLRi. The platform identifies likely diagnostic variants for review using clinical characteristics from patient clinical notes. In the study, CAVaLRi identified diagnostic findings in 18 previously non-diagnostic cases.
Impact and Future Directions
Varhouse is routinely used for all patients undergoing diagnostic testing and tumor molecular characterization at Nationwide Children’s, with over 4,000 patients sequenced for the MCI so far. The system has been instrumental in helping researchers and clinicians identify the causes of cancers and rare diseases in thousands of patients.
“Our software engineers at Nationwide Children’s came from many different backgrounds, and they all love the fact that what they’re doing here is having such an impact. They are helping find answers for families who have sometimes gone years without a diagnosis, and it is just so rewarding to create something that is helping kids,” says Dr. White. “I’m proud to see how our software has evolved, growing from something used initially for research to something now used routinely for patient care.”
References
Schuetz RJ, Antoniou AA, Lammi GE, Gordon DM, Kuck HC, Chaudhari BP, White P. CAVaLRi: An algorithm for rapid identification of diagnostic germline variation. Human Mutation. 2024;2024(1):1-15.
About the author
Lauren Dembeck, PhD, is a freelance science and medical writer based in New York City. She completed her BS in biology and BA in foreign languages at West Virginia University. Dr. Dembeck studied the genetic basis of natural variation in complex traits for her doctorate in genetics at North Carolina State University. She then conducted postdoctoral research on the formation and regulation of neuronal circuits at the Okinawa Institute of Science and Technology in Japan.