Accessing Genomic Data for Research and Clinical Diagnostics – Meeting the Challenge

April 2, 2018

What good is genomic data if it can’t be shared efficiently?

The genesis of DNA sequencing technology in the 1970s was a turning point in science, giving birth to a modern era in biology. As the relationship between nucleic acid sequence and the uniqueness of each species unfolded, strategies for sequencing the molecule continued to improve at a rapid pace.

In 1976, a bacteriophage became the first organism sequenced. Since then, consistent improvements in technology within the sequencing space have ultimately led to an avalanche of data, spawning the field of bioinformatics. The Human Genome Project began in 1990, and its completion in 2003 is a remarkable milestone in the history of medicine, contributing to our understanding of the molecular mechanisms underlying a multitude of human diseases. Through advanced next-generation sequencing technologies, particularly whole exome and whole genome sequencing, genomic data is entering the clinical space.

Genomic data can be extremely informative, and the remarkable improvements in sequencing technologies have helped create a substantial pool of it. But what good is data that cannot be shared across applications and institutions?


Currently, data generated within research laboratories are siloed in many small, geographically dispersed pockets of information. One of the main reasons for this limited accessibility is the privacy concern associated with genomic data. Although it is not considered protected health information according to HIPAA guidelines, it is possible to identify an individual using their genomic information. The Global Alliance for Genomics and Health (GA4GH) is trying to enable seamless data integration and sharing by designing interoperable application programming interfaces (APIs) for genomic data. Google has adopted the framework proposed by GA4GH and created a functional product named Google Genomics. Although the first version of the API is in production and actively used by the research community, work is ongoing to add features and enhancements to the existing version.

The possibility of using a blockchain framework for secure storage and sharing of genomic data is also frequently discussed. Blockchain is the underlying technology behind cryptocurrencies, such as bitcoin, and is disrupting the financial industry with its demonstrated strengths in interoperability, transparency and auditability. Blockchain uses a decentralized, distributed ledger that eliminates the need for an intermediary to verify transactions, thus empowering users who deposit data to control how it is shared. Every transaction is added to the chain as part of a block, creating a historical record that cannot be modified or deleted without detection, which protects the data against malicious tampering. Although health care organizations are still in the early stages of exploring this technology, blockchain has the potential to become a foundation for transforming health information technology.
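To make the tamper-evidence idea concrete, here is a minimal sketch of a hash-linked ledger in Python. It is only an illustration under simplifying assumptions: there is a single party, no network and no consensus step, and the function names, event names and dataset identifiers are all hypothetical. Each block stores the hash of the block before it, so changing an earlier record breaks every later link.

```python
import hashlib
import json


def block_hash(index, prev_hash, payload):
    """Hash a block's contents together with the previous block's hash."""
    body = json.dumps(
        {"index": index, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a new block that commits to the hash of the block before it."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    block = {
        "index": index,
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": block_hash(index, prev_hash, payload),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    expected_prev = "0" * 64
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != expected_prev:
            return False
        if block["hash"] != block_hash(i, block["prev_hash"], block["payload"]):
            return False
        expected_prev = block["hash"]
    return True


# Hypothetical events: a data deposit followed by an access grant.
ledger = []
append_block(ledger, {"event": "deposit", "dataset": "sample-001"})
append_block(ledger, {"event": "grant_access", "dataset": "sample-001", "to": "lab-B"})
print(is_valid(ledger))  # True

# Editing an earlier record no longer matches its stored hash.
ledger[0]["payload"]["dataset"] = "sample-999"
print(is_valid(ledger))  # False
```

In a real blockchain, many independent participants hold copies of the ledger and agree on new blocks, which is what removes the need for an intermediary. The sketch shows only why an altered record is detectable.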


Genomic data is gaining visibility in the clinical space, especially in the diagnosis of genetic disorders. Currently, once a health care provider orders a genetic test, the samples are sent to a certified genetic laboratory that returns an analyzed, concise clinical summary report in the form of a PDF document. This report is usually stored on paper, but in some cases it may be linked to the patient record in the electronic health record (EHR) as an attachment.

The emerging practice of personalized medicine requires the integration of genomic data with clinical information in the EHR for the overall benefit of health care and improved patient outcomes. One of the benefits of this integration is the possibility of using the combined information for automated downstream clinical decision support (CDS). A major challenge to this integration is the inadequate use of standards to represent genomic data. Genomic data comes in a variety of formats and sizes that would require a major reconstruction of the workflow within the EHR. Another challenge is the insufficient education of the clinical workforce on how best to use genomic information for the benefit of the patient.

Despite these challenges, one area in which clinical integration of genomic data in the EHR for CDS has been successful is the emerging field of pharmacogenomics. Here, a patient’s genomic information can predict his or her response to specific drugs. Adverse drug reactions are a leading cause of morbidity and mortality during routine medical care. Identifying a genetic variant and linking it to the patient record allows the activation of downstream processes that can guide physicians toward prescribing the appropriate medication for a patient. At Nationwide Children’s, we’ve rolled out a CDS system for the thiopurine-TPMT drug-gene pair. Although the effect of the TPMT gene on the metabolism of thiopurines is well documented for the pediatric population, prescribing physicians’ limited awareness of genomic test results prompted the development of an alert system within the EHR.
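As a rough illustration of how a drug-gene alert of this kind might work, the Python sketch below checks a new medication order against a stored TPMT result. It is not the system described above: the function names, the alert wording and the short allele table are assumptions made for the example, and the table is deliberately incomplete. The idea is simply that a structured genotype on the patient record lets a rule fire at the moment of prescribing.

```python
# Illustrative, incomplete allele table (an assumption for this sketch).
NORMAL_FUNCTION_ALLELES = {"*1"}
NO_FUNCTION_ALLELES = {"*2", "*3A", "*3B", "*3C"}
THIOPURINES = {"azathioprine", "mercaptopurine", "thioguanine"}

PHENOTYPES = ["normal metabolizer", "intermediate metabolizer", "poor metabolizer"]


def tpmt_phenotype(diplotype):
    """Map a diplotype such as '*1/*3A' to a metabolizer phenotype."""
    alleles = [a.strip() for a in diplotype.split("/")]
    if len(alleles) != 2:
        raise ValueError(f"expected two alleles, got {diplotype!r}")
    known = NORMAL_FUNCTION_ALLELES | NO_FUNCTION_ALLELES
    if any(a not in known for a in alleles):
        # Alleles outside the table are not guessed at.
        return "indeterminate"
    # Count no-function alleles: 0, 1 or 2 selects the phenotype.
    no_function = sum(a in NO_FUNCTION_ALLELES for a in alleles)
    return PHENOTYPES[no_function]


def thiopurine_alert(drug, diplotype):
    """Return alert text for a new order, or None when no alert should fire."""
    if drug.lower() not in THIOPURINES:
        return None
    phenotype = tpmt_phenotype(diplotype)
    if phenotype == "normal metabolizer":
        return None
    return (
        f"TPMT result on file ({diplotype}, {phenotype}): "
        f"review before prescribing {drug}."
    )


# Hypothetical orders for two patients with different stored results.
print(thiopurine_alert("mercaptopurine", "*1/*3A"))
print(thiopurine_alert("mercaptopurine", "*1/*1"))
```

A rule like this depends on the genotype being stored as structured data rather than as a PDF attachment, which is why the standards problem described earlier matters for CDS.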

Genomic data is here to stay. Innovative solutions to overcome some of the challenges outlined here are required to enable the utilization of genomic information for the benefit of both clinical and research communities.