Every day, NASA spacecraft beam down hundreds of petabytes of data, all of which has to be codified, stored and distributed to scientists across the globe. Increasingly, artificial intelligence is helping to "read" this data as well, highlighting similarities between datasets that scientists might miss. For the past 15 years, the big data techniques pioneered by NASA’s Jet Propulsion Laboratory in Pasadena, California, have been revolutionising biomedical research.
On Sept. 6, 2016, JPL and the National Cancer Institute (NCI), part of the National Institutes of Health, renewed a research partnership through 2021, extending the development of data science that originated in space exploration and is now supporting new cancer discoveries.
The NCI-supported Early Detection Research Network (EDRN) is a consortium of biomedical investigators who share anonymised data on cancer biomarkers -- chemical or genetic signatures related to specific cancers.
They aim to pool all their research data into a single, searchable network and to translate their collective work into techniques for early diagnosis of cancer or cancer risk.
A lung specimen that was analysed using the same machine learning algorithms that were originally developed for space research.
Credits: Early Detection Research Network/University of Colorado
In the time they’ve worked together, JPL and EDRN’s efforts have led to the discovery of six new Food and Drug Administration-approved cancer biomarkers and nine biomarkers approved for use in Clinical Laboratory Improvement Amendments labs.
The FDA has approved each of these biomarkers for use in cancer research and diagnosis. These agency-approved biomarkers have been used in more than 1 million patient diagnostic tests worldwide.
“After the founding of EDRN in 2000, the network needed expertise to take data from multiple studies on cancer biomarkers and create a single, searchable network of research findings for scientists,” said Sudhir Srivastava, chief of NCI’s Cancer Biomarkers Research Group and head of EDRN.
JPL had decades of experience doing similar work for NASA, where spacecraft transmit hundreds of petabytes of data to be coded, stored and distributed to scientists across the globe.
Dan Crichton, the head of JPL’s Center for Data Science and Technology, a joint initiative with Caltech in Pasadena, California, helped establish a JPL-based informatics center dedicated to supporting EDRN’s big data efforts.
In the renewed partnership, JPL is expanding its data science efforts to researching and applying technologies for additional NCI-funded programs. Those programs include EDRN, the Consortium for Molecular and Cellular Characterisation of Screen-Detected Lesions, and the Informatics Technology for Cancer Research initiative.
“From a NASA standpoint, there are significant opportunities to develop new data science capabilities that can support both the mission of exploring space and cancer research using common methodological approaches,” Crichton said. “We have a great opportunity to perfect those techniques and grow JPL's data science technologies, while serving our nation.”
Crichton said JPL has led the way when it comes to taking data from raw observations to scientific conclusions. One example: JPL often deals with measurements from a variety of sensors -- say, cameras and mass spectrometers.
Both can be used to study a star, planet or similar target object. But it takes special software to recognise that readings from very different instruments relate to one another.
There’s a similar problem in cancer research, where readings from different biomedical tests or instruments require correlation with one another. For that to happen, data have to be standardised, and algorithms must be “taught” to know what they’re looking for.
Since its founding, EDRN’s major challenge has been access. Research centers all over the United States had large numbers of biomarker specimens, but each had its own way of labeling, storing and sharing its datasets.
Ten sites may have high-quality specimens for study, but if their common data elements -- age of patient, cancer type and other characteristics -- aren’t listed uniformly, they can't be studied as a whole.
“We didn’t know if they were early-stage or late-stage specimens, or if any level of treatment had been tried,” Srivastava said. “And JPL told us, ‘We do this type of thing all the time! That’s how we manage our Planetary Data System.’”
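To make the common-data-element problem concrete, here is a minimal Python sketch of how records from two sites with different field names might be mapped onto one shared vocabulary. It is an illustration only, not EDRN's or JPL's actual software: the field names, the `FIELD_ALIASES` table, the `REQUIRED` list and the `harmonise` function are all hypothetical.

```python
# Hypothetical mapping from each site's local field names to shared
# common data elements (CDEs). Every name here is illustrative.
FIELD_ALIASES = {
    "patient_age": "age",
    "AgeAtCollection": "age",
    "cancer": "cancer_type",
    "dx": "cancer_type",
    "tumor_stage": "stage",
}

# CDEs assumed to be needed before specimens can be studied together.
REQUIRED = ("age", "cancer_type", "stage")


def harmonise(record):
    """Rename known aliases to CDE names and list any missing required CDEs."""
    unified = {}
    for key, value in record.items():
        # Unknown keys are kept as-is; known aliases are renamed.
        unified[FIELD_ALIASES.get(key, key)] = value
    missing = [name for name in REQUIRED if name not in unified]
    return unified, missing


# Two made-up records describing similar specimens in different vocabularies.
site_a = {"patient_age": 61, "cancer": "lung", "tumor_stage": "early"}
site_b = {"AgeAtCollection": 58, "dx": "lung"}

record_a, missing_a = harmonise(site_a)
record_b, missing_b = harmonise(site_b)
```

In this sketch the second record comes back with `stage` flagged as missing, which mirrors the situation Srivastava describes: without a uniform listing, nobody can tell whether a specimen is early-stage or late-stage.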
As the network has developed, it has added members from dozens of institutions, including Dartmouth College’s Geisel School of Medicine; Harvard Medical School’s Massachusetts General Hospital; Stanford’s NIST Genome-Scale Measurements Group; University of Texas’ MD Anderson Cancer Center; and numerous others.
Christos Patriotis, program director at NCI’s Cancer Biomarkers Research Group, said the network’s members now include international researchers from the U.K., China, Japan, Australia, Israel and Chile.
“The more we expand, the more data we integrate,” Patriotis said. “Instead of being silos, now our partners can integrate their findings. Each system can speak to the others.”
As JPL and NCI’s collaboration advances, next steps include applying image recognition technology, for example to help EDRN archive images of cancer specimens. Those images could be analysed by computer vision, which is currently used to spot similarities in star clusters and in other astrophysics research.
In the near future, Crichton said, machine learning algorithms could compare a CT scan with an archive of similar images, searching for early signs of cancer based on a patient’s age, ethnic background and other demographics.
“As we develop more automated methods for detecting and classifying features in images, we see great opportunities for enhancing data discovery,” Crichton said. “We have examples where algorithms for detection of features in astronomy images have been transferred to biology and vice-versa.”
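The idea of comparing one scan against an archive of similar images can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the method JPL or EDRN uses: it assumes each image has already been reduced to a numeric feature vector, ranks archive entries by cosine similarity, and narrows the archive by patient age first. The archive entries, the `age_window` parameter and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def most_similar(query, archive, age, age_window=10, top_k=3):
    """Return ids of the archive images closest to the query vector,
    restricted to patients within age_window years of the given age."""
    candidates = [e for e in archive if abs(e["age"] - age) <= age_window]
    ranked = sorted(
        candidates,
        key=lambda e: cosine_similarity(query, e["features"]),
        reverse=True,
    )
    return [e["id"] for e in ranked[:top_k]]


# Made-up archive: two-number feature vectors stand in for real image features.
archive = [
    {"id": "scan-a", "age": 60, "features": [1.0, 0.0]},
    {"id": "scan-b", "age": 62, "features": [0.0, 1.0]},
    {"id": "scan-c", "age": 30, "features": [1.0, 0.0]},
]

matches = most_similar([1.0, 0.1], archive, age=61)
```

Here `scan-c` is excluded by the age filter even though its features match well, and the remaining scans are ordered by similarity to the query.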