Mapping Maine Cancers

By Dr. Janet M. Hock, Dr. Chris Farah and Dr. James H. Page

Maine's predominantly rural population of 1.3 million has the distinction of having the oldest median age in the United States, as well as the more dubious distinction of having one of the highest age-adjusted incidence rates for all cancers in the country. Indeed, since 2000, Maine has consistently ranked among the top three states in cancer incidence for all anatomical sites and for persons of all ages, races and genders (National Program of Cancer Registries, 2009). Figure 1 maps the incidence rates for 15 common cancers against Maine's 16 counties and shows where county age-adjusted rates significantly exceed the 95% confidence interval of the corresponding U.S. national rate.

Figure 1. Regional variation of age-adjusted incidence rates (2001-2005) for major cancer types across the 16 Maine counties, as reported by the National Program of Cancer Registries (NPCR) (National Program of Cancer Registries, 2009). Rates describe persons of all ages and all races. With the exceptions of breast, breast in situ, and ovarian cancers (female) and prostate cancer (male), age-adjusted incidence rates were computed for males and females combined. Red cells indicate incidence rates significantly (p < 0.05) exceeding the 95% confidence interval of the corresponding U.S. national rate. Empty cells indicate no significant difference between county and U.S. age-adjusted rates. Gray cells denote rates suppressed by NPCR to protect patient privacy. X-axis: counties ordered by increasing population density. Y-axis: cancer types ordered with the highest incidence rate at the top.

The obvious question is what combination of genetic, cultural and environmental factors is behind this unacceptably high incidence rate. The Maine Institute for Human Genetics and Health (MIHGH), a not-for-profit research subsidiary of Eastern Maine Healthcare Systems, has teamed with James W. Sewall Company (Sewall), a private Maine consultancy with specializations in mapping and GIS, to develop the Maine BioGeoBank, a research resource that links a repository of annotated human cancer biospecimens and cancer registry data with a GIS repository of cultural and environmental data. The BioGeoBank will enable researchers to undertake complex queries and analyses that explore relations between cancer genomics and the rural environment, yielding a better understanding about human susceptibility to cancer.

Why Maine?
Maine is a particularly useful test ground for this research. Our population is remarkably stable, and Maine's rural families tend to be large, extended and multi-generational, often living in close proximity to one another with similar lifestyles and environmental experiences. In Aroostook County, for example, 16-20% of householders have lived in the same residence for at least 30 years, compared to 10% in the U.S. overall. To cite another statistic, more than 70% of European families trace their genealogy to settlers of the 1750s. These sorts of factors enable better mapping of family genomics and histories than is often available elsewhere.

Maine's environmental history includes substantial use of toxins in ship building, forestry and the pulp and paper industries, as well as herbicides and pesticides used in Maine's blueberry and potato industries. Furthermore, an unusual geologic feature in Maine is the geographic juxtaposition of radon with endocrine-disruptive chemicals (EDCs) such as arsenic and dioxin. Arsenic occurs naturally in soil and bedrock and in particular is present at high levels throughout the area, as well as being used in certain wood products and as an ingredient in pesticides.

The BioGeoBank
The BioGeoBank's schematic architecture is presented in Figure 2.
Pathological and clinical data and information from patient questionnaires and other related documentation are collected and managed in a third-party clinical research management system (CRMS). So, for example, a cancer patient requiring surgery gives consent, provides health and family histories, and donates blood and surgically resected tissue specimens. Individual and family information includes the individual's dwelling and occupational history, known environmental exposures, and lifestyle behaviors such as tobacco and alcohol use. To protect privacy, patient identifiers are delinked and replaced by a barcode not available to BioGeoBank users.
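
The delinking step can be illustrated with a minimal Python sketch. It is not the CRMS's actual mechanism, which is not described here; the field names, the `delink` and `user_view` helpers, and the use of a random token as the barcode are all assumptions made for illustration.

```python
import secrets

# Hypothetical identifier fields; a real CRMS would define its own list.
IDENTIFIER_FIELDS = {"name", "date_of_birth", "street_address"}


def delink(records):
    """Separate identifiers from research data, joined only by a random barcode."""
    linkage = {}  # barcode -> identifiers, kept away from BioGeoBank users
    store = {}    # barcode -> de-identified research record
    for record in records:
        barcode = secrets.token_hex(8)
        while barcode in linkage:  # extremely unlikely collision; draw again
            barcode = secrets.token_hex(8)
        linkage[barcode] = {k: v for k, v in record.items() if k in IDENTIFIER_FIELDS}
        store[barcode] = {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}
    return store, linkage


def user_view(store):
    """What a researcher sees: research fields only, with no barcode attached."""
    return [dict(fields) for fields in store.values()]


# Illustrative (invented) record.
patients = [
    {
        "name": "Jane Doe",
        "date_of_birth": "1950-01-01",
        "street_address": "1 Main St",
        "cancer_type": "lung",
        "tobacco_use": True,
    }
]
store, linkage = delink(patients)
visible = user_view(store)
```

Keeping the linkage table in a separate structure reflects the idea that re-identification requires access that ordinary users do not have.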

A user gateway enables complex queries that combine data from the CRMS with the GIS-enabled environmental data. The GIS component is a relatively straightforward geospatial data management and delivery system. Sewall hosts the data in a PostgreSQL/PostGIS environment delivered using its GeoPower Hosted Portal Solution. Data layers for the pilot areas currently under investigation include: streams, rivers and lakes for arsenic concentrations; geologic formations for radon emissions; soils for mercury; industrial use patterns for pesticide and herbicide applications; air quality; and wastewater outflows, brownfields and other EPA-regulated sites. Each dataset is layered on an appropriate land base. More data layers will be added as they become available. The process is, of course, iterative: as more data become available, better correlations will emerge, calling in turn for new and better data. Security firewalls provide tiered levels of access to safeguard protected information.
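
To make the kind of query the gateway supports more concrete, the sketch below overlays de-identified dwelling points on an environmental polygon layer in plain Python. It is an illustration under stated assumptions rather than the BioGeoBank's implementation: the coordinates, zone names and helper functions are hypothetical, and in a PostGIS environment the point-in-polygon test would normally be delegated to a spatial predicate such as `ST_Contains`.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count how many polygon edges a rightward ray crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def tag_dwellings(dwellings, zones):
    """Attach to each dwelling the names of all zones that contain it."""
    return {
        record_id: [name for name, poly in zones.items() if point_in_polygon(x, y, poly)]
        for record_id, (x, y) in dwellings.items()
    }


# Hypothetical layers in arbitrary planar coordinates.
zones = {
    "high_arsenic_bedrock": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "radon_formation": [(3, 3), (8, 3), (8, 8), (3, 8)],
}
# Hypothetical de-identified dwelling locations.
dwellings = {"rec-1": (2, 2), "rec-2": (3.5, 3.5), "rec-3": (9, 1)}

tagged = tag_dwellings(dwellings, zones)
```

Here `rec-2` falls where the two hypothetical zones overlap, the sort of juxtaposed exposure that a layered query is meant to surface.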

Geospatial Challenges
There are significant practical challenges in using GIS data as envisioned here. One challenge, for example, is what is known in the GIS industry as the "quilt" problem. Many GIS datasets, especially those covering large, disparate regions, are assembled, or quilted together, from multiple sources, each with its own purpose, scale, accuracy and reference points. In such cases there are likely to be missing data whose absence will skew or bias queries. If an analysis matching cancer incidence with certain environmental data shows a high correlation of events in locations A and C, but not in a similar or intervening location B, possible explanations include that the data genuinely do not support the correlation at B, or that representative environmental or geospatial data at B, equivalent to those at locations A and C, were never captured. A related problem is that of trying to align and layer data captured at different scales and accuracies. What first appears to be a strong correlation between events and location types may turn out to be quite different, or even absent, when the events are layered on a more accurate land base.
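
One way to keep the location B ambiguity visible in query results is to report "not surveyed" separately from "surveyed and not elevated". The Python sketch below shows the idea; the location labels, measurement values and threshold are hypothetical and are not drawn from any BioGeoBank dataset.

```python
from enum import Enum


class Finding(Enum):
    ELEVATED = "elevated"
    NOT_ELEVATED = "not elevated"
    NOT_SURVEYED = "not surveyed"


def classify(location, measurements, threshold):
    """Return a three-state finding so missing data is never read as a negative."""
    value = measurements.get(location)
    if value is None:
        return Finding.NOT_SURVEYED
    return Finding.ELEVATED if value >= threshold else Finding.NOT_ELEVATED


# Hypothetical quilted dataset: location B was never measured.
measurements = {"A": 14.0, "C": 22.5, "D": 3.1}
THRESHOLD = 10.0  # assumed cutoff, for illustration only

findings = {loc: classify(loc, measurements, THRESHOLD) for loc in ["A", "B", "C", "D"]}
```

A two-state result would have lumped B together with D; the third state records that the quilt simply has a hole there.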

Another key geospatial challenge concerns the spatiotemporal requirements central to these studies. If, for example, the incubation period of a certain kind of cancer is 10-15 years and the research is querying correlations between that cancer and a certain type of chemical exposure, the subjects must be linked to particular locations where exposures were likely during the exposure period. Effective methods to incorporate the temporal dimension into GIS are only now emerging from conceptual and pilot studies.
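
The temporal linkage described above can be sketched in a few lines of Python. This is a simplified illustration, not an emerging GIS method: it assumes residence histories are available as whole-year intervals, treats years as inclusive, and uses hypothetical place names and dates.

```python
def exposure_window(diagnosis_year, min_latency, max_latency):
    """Years in which a causative exposure would plausibly have occurred."""
    return diagnosis_year - max_latency, diagnosis_year - min_latency


def residences_in_window(history, window):
    """Return (place, years_of_overlap) for residences overlapping the window."""
    w_start, w_end = window
    overlaps = []
    for place, start, end in history:
        # Inclusive overlap of [start, end] with [w_start, w_end], in years.
        years = min(end, w_end) - max(start, w_start) + 1
        if years > 0:
            overlaps.append((place, years))
    return overlaps


# Hypothetical residence history: (place, first_year, last_year).
history = [
    ("Town A", 1950, 1969),
    ("Town B", 1970, 1992),
    ("Town C", 1993, 2005),
]

# A diagnosis in 2005 with a 10-15 year latency gives the window 1990-1995.
window = exposure_window(2005, min_latency=10, max_latency=15)
candidates = residences_in_window(history, window)
```

Only the locations returned in `candidates` would then be matched against environmental layers, and ideally against layers describing conditions during the window rather than today.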

Finally, few large GIS data acquisition projects have been driven by medical research questions. Projects that cover large geographic areas at high resolution are often commissioned by public entities whose drivers are determined by political uses and boundaries, not medical or even environmental needs. Cancer registries, for example, capture patient data by political boundaries, with the county often the smallest unit of aggregation when population density is low; yet exposure patterns linked to rivers, for example, routinely cross political boundaries. This problem speaks to the need for increased coordination of data acquisition based on environmental and demographic patterns rather than political ones.

Given the complexity of issues surrounding the growth in cancer rates in Maine and throughout the nation, we believe the BioGeoBank's research potential is significant. The opportunities for GIS to contribute to this kind of research are both exciting and challenging, and the opportunities for cross-fertilization between disciplines are particularly exciting. We believe, for example, that clinicians' research needs will help drive the development of effective temporal mapping. Likewise, mappers' understanding of issues like the quilt problem should assist researchers in making better correlations and judgments. We expect the partnership of biomedical research and GIS to make meaningful contributions to advances in our understanding of human health.

Acknowledgments: MIHGH is grateful for the financial support of U.S. Dept. of Defense grant number W81XWH-07-2-0116 (PI: JM Hock) and Eastern Maine Healthcare Systems (EMHS). We would like to acknowledge Paul Laub, PhD, MIHGH BioGeoBank, for his work on some of the descriptive epidemiology mentioned in the text.

Ed. note: A version of this material was originally presented at the URISA GIS in Public Health Conference, June 6-8, 2009, Providence, RI.

Published Thursday, November 5th, 2009

© 2016 Directions Media. All Rights Reserved.