Why Exposomics Needs Spatial Standards and “Spectrumomics”

“Human biology should be primarily concerned with the responses that the body and the mind make to the surroundings and ways of life…little effort has been made to develop methods for investigating scientifically the interrelatedness of things.” - René Dubos, as quoted in “An exposome perspective: Early-life events and immune development in a changing world”

Fifty years after Dubos wrote those words, major science funding organizations have only just begun to take this challenge seriously. Scientists now have powerful new methods for investigating how environments affect health and disease. In this article, we look at the new exposure science and discuss why standard ways of representing time and space will be important as scientists investigate the environmental determinants of disease with the help of big data and cloud computing.

Exposomics

Health and disease are products of both nature (genetics) and nurture (environment). While genomics is about the genetic factors that predict health, exposomics is about the environmental factors that predict health over the course of a lifetime, including nutrition, social factors, chemicals, and the physical environment.

Figure 1: The exposome and exposomics

In 2015, the American Society for Mass Spectrometry pointed out “that the genetic heritability of respiratory disease, cardiovascular disease and many cancers is very low: estimated to be 10% - 20%. … In the exposome paradigm, 80% - 90% of chronic human diseases are determined by E and GxE.” (E is the environment; GxE is the interaction between genes and environment.)
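One way to read these percentages is through the textbook variance-partitioning model of quantitative genetics. This is our assumption, not something the quoted source states, and the sketch below ignores gene-environment covariance and measurement error. Broad-sense heritability H² is the share of phenotypic variance attributable to genetic variance, so a heritability of 10% to 20% leaves the remaining 80% to 90% to E and GxE:

```latex
\sigma^2_P = \sigma^2_G + \sigma^2_E + \sigma^2_{G \times E},
\qquad
H^2 = \frac{\sigma^2_G}{\sigma^2_P} \in [0.1,\, 0.2]
\;\Longrightarrow\;
\frac{\sigma^2_E + \sigma^2_{G \times E}}{\sigma^2_P} = 1 - H^2 \in [0.8,\, 0.9]
```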

Add to that the fact that most chronic human diseases are non-communicable, and the importance of studying exposomics becomes clear. But given the number of genetic and environmental factors involved, such study requires gathering, integrating and analyzing very large quantities of data. And location matters.

Pioneering geographer Waldo Tobler, who was among the first to use computers in cartographic and geographic research, is famous for his “First Law of Geography”: “Everything is related to everything else, but near things are more related than distant things.” In the new era of environmental exposure research, researchers will depend on computers to analyze the co-location of “hot spots” of different kinds. For example, where do people who have particular habits have a higher incidence of a particular cancer? Where are high-incidence disease hot spots co-located with particular kinds of infrastructure installations or industrial sites?
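The simplest form of the second question can be sketched in a few lines of Python. This is only a naive proximity count, not a statistical co-location test; a real analysis would compare observed co-location against a null distribution. The `Site` record, the `colocated` function, the 5 km radius, the coordinates and the spherical-Earth haversine distance are all illustrative assumptions, not part of any existing exposomics system.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; treating Earth as a sphere is an approximation


@dataclass(frozen=True)
class Site:
    """A named point in decimal degrees (hypothetical record type)."""
    name: str
    lat: float
    lon: float


def haversine_km(a: Site, b: Site) -> float:
    """Great-circle distance between two sites on a spherical Earth, in km."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to guard against floating-point overshoot for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def colocated(hot_spots, facilities, radius_km):
    """For each facility, list the hot spots within radius_km (naive proximity count)."""
    return {
        f.name: [s.name for s in hot_spots if haversine_km(f, s) <= radius_km]
        for f in facilities
    }


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    hot_spots = [Site("hot spot A", 0.0, 0.01), Site("hot spot B", 1.0, 0.0)]
    facilities = [Site("plant", 0.0, 0.0)]
    print(colocated(hot_spots, facilities, radius_km=5.0))
    # → {'plant': ['hot spot A']}
```

Hot spot A lies about 1.1 km from the plant and hot spot B about 111 km away, which is why only A is reported. The point of the sketch is that none of this works unless every record carries a comparable, precise location.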

In the old paradigm of genetics, before the genomics era, researchers identified single gene variants associated with particular hereditary cancers and other illnesses. It was a slow process. The paradigm changed as advances in gene sequencing technology enabled a whole-genome approach to examining how differences between genomes relate to differences in biological form and function (phenotypes). The whole-genome approach involved not only high-throughput sequencing technology but also international collaboration and data sharing among different initiatives, as advanced by the Human Genome Project. Standard experimental protocols and data models, coupled with advances in computing and bioinformatics, were essential elements of the Human Genome Project’s program of sharing and integrating data.

This successful approach involves creating large datasets that are “Findable, Accessible, Interoperable and Re-usable (FAIR).” It was the case for genomics, and it is the case for the proteome (proteins), transcriptome (RNA), epigenome (epigenetics), microbiome (gut flora), ionome (electrolytes and elements), etc.

Exposomics builds on the “multiomics” systems that are being assembled to discover how these “omes” work together. High-performance multiomics systems aggregate research literature and data from multiple omics subdomains of the life sciences, and sometimes from chemistry and physics as well. Researchers apply big data approaches to these aggregated databases to find correspondences, congruences, dependencies and apparent causal relationships that suggest hypotheses for further study and experimentation. When exposomics datasets are added to this array of biomolecular omics datasets, researchers gain a powerful new way to explore how environment affects health.

Yet from my review of the websites of the world’s current exposomics programs and from conversations with various participants in these programs, it appears that location is not adequately modeled and there is no current effort to reach consensus on standard spatial and temporal ontologies.

When there are no standards, data must be curated. Curation involves restructuring data to conform to standards that may or may not have a basis in consensus. (This is analogous to the state of geospatial data integration before the OGC began working with industry, users and other standards development organizations to normalize abstract data models and encodings for geospatial data and services.)

The Comparative Toxicogenomics Database (CTD) is perhaps the most widely used international resource for finding and using exposure ontologies. CTD pioneered the biocuration of exposure science data by manually curating content from the scientific literature and implementing controlled vocabularies for chemicals, genes, phenotypes and diseases, which allows for seamless data integration. By centralizing and codifying these data, CTD provides a web platform that enables scientists to discover potential molecular mechanisms and design testable hypotheses for environmental diseases. In 2012, NIH-funded researchers expanded core CTD toxicology content to include exposure data (specifically within the context of chemical stressors and human receptors). They captured these data using “the [draft] Exposure Ontology (ExO), a framework that structures key exposure concepts: ‘exposure stressor’ (an agent, stimulus, activity or event that causes stress on an organism), ‘exposure receptor’ (an entity that interacts with an exposure stressor), ‘exposure event’ (an interaction between an exposure stressor and an exposure receptor), and ‘exposure outcome’.… ExO depth was expanded by using existing third-party vocabularies where applicable. For geographic location, country codes published by the International Organization for Standardization (ISO 3166) and U.S. state abbreviations were used.” (Grondin et al., 2016)
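To make the location issue concrete, here is a minimal Python sketch of the four ExO concepts as plain records, with location modeled the way the quoted passage describes: an ISO 3166 country code plus an optional U.S. state abbreviation. The class and field names, the optional latitude/longitude fields and the validation rules are our own illustrative assumptions, not ExO’s actual schema or CTD’s data model. The country check only tests the two-letter ISO 3166-1 alpha-2 format, not membership in the official code list.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Format check only: two uppercase letters, as in ISO 3166-1 alpha-2 codes.
_ALPHA2 = re.compile(r"[A-Z]{2}")


@dataclass(frozen=True)
class Location:
    country: str                       # ISO 3166-1 alpha-2 code, e.g. "US"
    subdivision: Optional[str] = None  # e.g. a U.S. state abbreviation
    lat: Optional[float] = None        # hypothetical finer-grained fields,
    lon: Optional[float] = None        # rarely available from the literature

    def __post_init__(self) -> None:
        if not _ALPHA2.fullmatch(self.country):
            raise ValueError(f"not an alpha-2 country code: {self.country!r}")
        if (self.lat is None) != (self.lon is None):
            raise ValueError("lat and lon must be given together")

    @property
    def has_coordinates(self) -> bool:
        return self.lat is not None


@dataclass(frozen=True)
class ExposureStressor:
    name: str  # agent, stimulus, activity or event that causes stress


@dataclass(frozen=True)
class ExposureReceptor:
    name: str  # entity that interacts with an exposure stressor


@dataclass(frozen=True)
class ExposureEvent:
    stressor: ExposureStressor  # an interaction between a stressor
    receptor: ExposureReceptor  # and a receptor,
    location: Location          # located only as precisely as the source allows


@dataclass(frozen=True)
class ExposureOutcome:
    event: ExposureEvent
    description: str
```

A record curated at the country-and-state granularity described above would have `has_coordinates` equal to `False`; the optional coordinate fields mark where a more precise spatial ontology would have to plug in.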

ExO is one of the ontologies of the Open Biological and Biomedical Ontology (OBO) Foundry. I have explained to ExO’s developers why it would be useful to implement some of the Spatial Data on the Web Best Practices that resulted from the joint OGC/W3C Spatial Data on the Web Working Group. This implementation effort would probably first involve documenting a number of exposure use cases that require precise location determination.

A CTD manager explained to me that a comprehensive location ontology is not yet part of ExO. He said that they are aware of the geo.owl ontology, which is part of the OBO Foundry, but it isn’t aligned with their curation of countries, states, regions and cities, and they would need to investigate other geospatial ontologies for a better fit. (geo.owl predates the important modifications that were worked out by the joint OGC/W3C Spatial Data on the Web Working Group.) CTD doesn’t capture data at the level of latitude and longitude, because it doesn’t extract information at that level from the primary literature: it curates what authors publish, so such specificity would ideally need to come from the authors. Also, the CTD team is not aware of any international collaboration on exposure ontologies or any US/Europe exposomics collaboration.

So, with CTD, a process is in place that could accept a more comprehensive spatial ontology, but the requirements need to come from the researchers. If an exposomics location ontology and best practices for its use were developed, the next step would be outreach and education. The existence and importance of the ontology would need to be communicated to researchers and their funders, professional associations and journal editors. This information would also need to be communicated to data curators, exposome software developers, data librarians and data managers. Ideally, these communities would work together in a consensus process.

Spectrumomics?

The requirements for a precise location ontology will probably come from the bioelectromagnetics research community, because electromagnetic field (EMF) stressors and receptors usually have precise locations relative to each other and to the physical features and phenomena that affect radiation transmission at wireless communication frequencies. Yet in none of the exposome initiatives I’ve looked at are researchers studying the biological effects of EMF. The initiative that comes closest to being a bioelectromagnetics exposome initiative is GERoNiMO (“Generalised EMF Research using Novel Methods – an integrated approach: from research to risk assessment and support to risk management”). It is an ongoing project, funded under the FP7-ENVIRONMENT theme of the European Commission’s Seventh Framework Programme and running from 1 January 2014 to 31 December 2018. It consists of an array of coordinated studies, most of which have not yet been published. Presumably, the studies will be published next year, and it will be interesting to see the results.

One of the key assumptions of exposomics is that we are all exposed to multiple environmental stressors, and when potentially toxic stimuli are combined, their adverse effects can be increased. Unfortunately, few people outside of the bioelectromagnetics research community are aware that this applies not only to chemicals but also to EMF. To change that, I believe that exposomics needs what might be called “spectrumomics.”

Like the other exposomics domains, spectrumomics would involve:

1) Institutional collaboration, with coordinated research agendas and resource sharing.

A few leaders in the world of bioelectromagnetics would need to develop a plan based on something like the Bermuda Principles, the 1996 agreement under which Human Genome Project sequence data were released publicly within 24 hours. The spectrumomics initiative would leverage the many open resources available to exposomics researchers.

2) Digital technologies for high-throughput exposure data collection.

A tremendous amount of epidemiological data could be collected through citizen science projects that use cell phones’ ability to record incoming and outgoing emissions and the device’s position relative to the body and to the earth. Fitness trackers could provide physiological data. For laboratory studies, special test equipment could be designed using the latest miniature signal generators and electromagnetic shielding materials.

3) Open standard protocols and data models.

Along with the special laboratory equipment, standard experimental methodologies and ontologies would need to be agreed upon in a consensus process. The ontologies would draw from those available through CTD, OGC and other authorities.

4) Major efforts to publish, curate and share data.

The academic journals that most frequently publish bioelectromagnetics research papers would need to be enlisted, and their open access and open data options would need to be assessed for alignment with spectrumomics goals.

I’m exploring ways to implement this vision with others. “Further research is urgently needed,” according to the 256-page “Biomarkers of exposure ...” report from the EU’s FP7 project HEALS (search the report for “EMF”). It will take time to materialize, but I think a spectrumomics addition to the exposome is inevitable.