Exploring GIS and Molecular Biology as a High School Student
Abstract
Geographic Information Systems (GIS) continue to expand beyond traditional mapping into diverse scientific fields, including biology and public health. The increasing accessibility of tools such as QGIS allows students and researchers with limited formal GIS training to investigate complex, real-world questions. Over the course of three years in high school, I integrated my interests in molecular biology, health, and social equity into a spatial research project using GIS.
This study applies Ordinary Least Squares (OLS) regression within QGIS to examine rhizospheric conditions associated with rice blast, a severe fungal disease affecting global rice production. Spatial regression results suggest that water flow patterns and accumulation may influence a location’s vulnerability to root-level fungal infection. However, these findings should be considered exploratory. Additional validation is required, particularly to assess the influence of confounding factors such as crop density and planting practices.
Introduction
This GIS investigation originated from a molecular biology experiment conducted during a research internship. My focus was on Magnaporthe oryzae, a fungal pathogen responsible for rice blast—a disease that accounts for approximately 30 percent of global rice yield losses each year.
While much existing research emphasizes leaf infection, this project sought to better understand fungal penetration at the root level. Two questions guided the work: Which rhizospheric conditions promote fungal entry into root tissue? How does susceptibility to root pathogenesis vary spatially across agricultural landscapes?
From the outset, it was clear that laboratory experiments alone could not capture the complexity of real-world rice farms: even the most tightly controlled lab setting cannot replicate the dynamic interactions of an agricultural ecosystem. GIS offered a complementary approach by making it possible to analyze environmental conditions as they occur across actual landscapes.
Spatial Regression Framework
GIS platforms such as ArcGIS and QGIS provide regression tools for analyzing spatial relationships among variables. This project used Ordinary Least Squares (OLS) regression, which models a dependent variable as a linear function of one or more explanatory variables; here, every input was a raster dataset.
Each raster cell within the study area is treated as an individual observation. The OLS model follows a standard linear regression form, where coefficients represent the influence of explanatory variables, residuals capture unexplained variation, and the dependent variable reflects the phenomenon being modeled.
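For reference, that standard form can be written as below, with raster cell i as the observation, y_i the dependent variable, x_ij the j-th explanatory variable, β_j its coefficient, and ε_i the residual; in this study k = 2. The notation is the generic textbook form introduced here for illustration, not symbols taken from the study's model output.

$$
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i, \qquad i = 1, \dots, n \\
\hat{\boldsymbol{\beta}} &= (X^{\top} X)^{-1} X^{\top} \mathbf{y}, \qquad \hat{\varepsilon}_i = y_i - \hat{y}_i
\end{aligned}
$$

Here X is the n × (k + 1) matrix whose first column is all ones (for the intercept β_0) and whose remaining columns hold the explanatory values of each cell.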
The study area was Karnataka, a state in southwestern India where rice blast is a recurring agricultural problem. Root pathogenesis intensity was defined as the dependent variable, and two explanatory variables were examined: soil nitrogen availability and water flow accumulation.
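A minimal sketch of this setup in Python with NumPy follows. It is not the QGIS tool used in the study: the function name ols_on_rasters and the convention of marking missing cells with NaN are assumptions for illustration, and the arrays in the example are synthetic, not Karnataka data. Each raster is flattened so that every valid cell becomes one row of the design matrix.

```python
import numpy as np

def ols_on_rasters(y_raster, x_rasters):
    """Fit OLS treating every valid raster cell as one observation (hypothetical helper).

    y_raster: 2-D array of the dependent variable (e.g. root pathogenesis intensity).
    x_rasters: list of 2-D arrays of explanatory variables on the same grid.
    Missing cells are assumed to be NaN. Returns (coefficients, r_squared),
    where coefficients[0] is the intercept.
    """
    # Flatten each raster into one column; each row is one cell.
    y = np.asarray(y_raster, dtype=float).ravel()
    X = np.column_stack([np.asarray(r, dtype=float).ravel() for r in x_rasters])
    # Drop cells where any layer is missing.
    valid = ~np.isnan(y) & ~np.isnan(X).any(axis=1)
    y, X = y[valid], X[valid]
    # Prepend a column of ones for the intercept term.
    design = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    # R^2 = 1 - (residual sum of squares / total sum of squares).
    residuals = y - design @ coefs
    r_squared = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    return coefs, r_squared

# Synthetic 3x3 example (not study data): y = 2 - 0.5*nitrogen + 1.5*accumulation.
nitrogen = np.arange(1.0, 10.0).reshape(3, 3)
accumulation = np.array([[0.0, 1.0, 0.0], [2.0, 0.0, 3.0], [1.0, 1.0, 4.0]])
pathogenesis = 2.0 - 0.5 * nitrogen + 1.5 * accumulation
pathogenesis[0, 0] = np.nan  # a missing cell, excluded from the fit
coefs, r2 = ols_on_rasters(pathogenesis, [nitrogen, accumulation])
# coefs is close to [2.0, -0.5, 1.5] and r2 is close to 1.0
```

Least squares via np.linalg.lstsq is used instead of explicitly inverting XᵀX, which is numerically safer when explanatory variables are strongly correlated.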
Environmental Variables Considered
Soil Nitrogen Availability
Nitrogen levels vary considerably across Karnataka. Preliminary overlay analysis and supporting literature suggested that areas with lower nitrogen availability often coincide with greater vulnerability to rice blast. This relationship was tested spatially using OLS regression.
Water Flow and Accumulation
Wet rice cultivation is the dominant practice in Karnataka's rice farming, with paddies typically submerged during growth.
Given this environment, water movement may play a crucial role in transporting spores and concentrating them within the rhizosphere. Flow direction and accumulation were therefore hypothesized to influence fungal spread and infection likelihood.
Methods
To prepare inputs for the regression, more than 100 elevation measurements across Karnataka were collected and plotted on an X–Y coordinate system. These points were interpolated with the Inverse Distance Weighted (IDW) method, which estimates each raster cell's value as a weighted average of nearby measurements, with closer measurements receiving greater weight.
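The IDW step can be sketched as follows. The study does not report its IDW settings, so the distance power of 2, the use of every sample point (no search radius), and the function name idw_interpolate are assumptions; GIS implementations usually expose these as options.

```python
import numpy as np

def idw_interpolate(xs, ys, values, grid_x, grid_y, power=2.0):
    """Inverse Distance Weighted interpolation onto a regular grid (hypothetical helper).

    xs, ys, values: 1-D arrays of sample coordinates and measured elevations.
    grid_x, grid_y: 1-D arrays of cell-center coordinates along each axis.
    power: distance exponent; 2 is an assumed, commonly used value.
    Returns an array of shape (len(grid_y), len(grid_x)).
    """
    xs, ys, values = (np.asarray(a, dtype=float) for a in (xs, ys, values))
    gx, gy = np.meshgrid(grid_x, grid_y)  # cell centers, shape (rows, cols)
    # Distance from every cell center to every sample: shape (rows, cols, n_samples).
    dist = np.hypot(gx[..., None] - xs, gy[..., None] - ys)
    out = np.empty(gx.shape)
    exact = dist == 0
    hit = exact.any(axis=-1)
    # A cell that coincides with a sample takes that sample's value directly.
    out[hit] = values[np.argmax(exact[hit], axis=-1)]
    # Elsewhere, weight each sample by 1 / d**power and normalize the weights.
    w = 1.0 / dist[~hit] ** power
    out[~hit] = (w * values).sum(axis=-1) / w.sum(axis=-1)
    return out

# Two samples on a line; the middle cell is equidistant from both.
grid = idw_interpolate([0.0, 2.0], [0.0, 0.0], [10.0, 20.0], [0.0, 1.0, 2.0], [0.0])
# grid is [[10.0, 15.0, 20.0]]
```

Using every sample keeps the sketch short; with hundreds of points over a state-sized area, a search radius or nearest-neighbor limit would usually be preferred for speed and locality.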
The resulting Digital Elevation Model (DEM) was used to calculate a flow direction for each cell by identifying the steepest descent among its eight neighboring cells.
Distances between cell centers were normalized to 1 for orthogonal neighbors and √2 for diagonal neighbors, so that each elevation drop could be converted to a slope (drop divided by distance) before neighbors were compared. Cells with no lower neighboring cell were assigned no flow direction.
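This steepest-descent rule over eight neighbors is essentially the D8 approach. A minimal sketch follows; the direction codes 0 to 7 (indices into a hypothetical OFFSETS list) and the value -1 for "no flow direction" are encoding choices made here, not those of QGIS or ArcGIS, and ties between equally steep neighbors simply go to the first one checked.

```python
import math
import numpy as np

# Neighbor offsets (row, col); the list index 0-7 serves as the direction code.
OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

def d8_flow_direction(dem):
    """Return an int array of direction codes (index into OFFSETS), or -1 for no flow."""
    dem = np.asarray(dem, dtype=float)
    rows, cols = dem.shape
    direction = np.full((rows, cols), -1, dtype=int)
    for r in range(rows):
        for c in range(cols):
            best_slope = 0.0
            for code, (dr, dc) in enumerate(OFFSETS):
                nr, nc = r + dr, c + dc
                if not (0 <= nr < rows and 0 <= nc < cols):
                    continue  # neighbor lies outside the raster
                # Orthogonal neighbors are 1 apart, diagonal neighbors sqrt(2) apart.
                distance = math.sqrt(2) if dr and dc else 1.0
                slope = (dem[r, c] - dem[nr, nc]) / distance  # positive means downhill
                if slope > best_slope:
                    best_slope, direction[r, c] = slope, code
            # If no neighbor is lower, the cell keeps -1 (no flow direction).
    return direction

dem = np.array([[5.0, 4.0, 3.0], [4.0, 3.0, 2.0], [3.0, 2.0, 1.0]])
direction = d8_flow_direction(dem)
# The center cell drains diagonally toward the lowest corner (code 3);
# that corner has no lower neighbor, so it is assigned -1.
```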
Using the flow direction raster, weighted flow accumulation was computed. Each cell was assigned a weight based on average rainfall levels within its district. Accumulation values were calculated by summing the weighted contributions of all upstream cells.
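Weighted accumulation can be computed by visiting cells in upstream-to-downstream order. The sketch below takes a direction raster encoded as codes 0 to 7 (indices into a hypothetical OFFSETS list of neighbor offsets, with -1 for no flow) and a weight raster, such as district-average rainfall assigned to each cell. Following the description above, each cell receives only the weights of cells upstream of it, not its own weight; the function name and encoding are assumptions for illustration.

```python
from collections import deque
import numpy as np

# Hypothetical direction encoding: index into OFFSETS, -1 means no flow direction.
OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

def weighted_flow_accumulation(direction, weights):
    """Sum the weights of all upstream cells draining into each cell."""
    direction = np.asarray(direction, dtype=int)
    weights = np.asarray(weights, dtype=float)
    rows, cols = direction.shape

    def downstream(r, c):
        # Return the cell (r, c) drains into, or None if it has no in-raster target.
        code = direction[r, c]
        if code < 0:
            return None
        nr, nc = r + OFFSETS[code][0], c + OFFSETS[code][1]
        return (nr, nc) if 0 <= nr < rows and 0 <= nc < cols else None

    # Count how many cells drain directly into each cell.
    inflow = np.zeros((rows, cols), dtype=int)
    for r in range(rows):
        for c in range(cols):
            target = downstream(r, c)
            if target is not None:
                inflow[target] += 1
    # Topological order: start from cells nothing drains into.
    acc = np.zeros((rows, cols))
    queue = deque((r, c) for r in range(rows) for c in range(cols) if inflow[r, c] == 0)
    while queue:
        r, c = queue.popleft()
        target = downstream(r, c)
        if target is None:
            continue
        # Pass on this cell's own weight plus everything accumulated upstream of it.
        acc[target] += acc[r, c] + weights[r, c]
        inflow[target] -= 1
        if inflow[target] == 0:
            queue.append(target)
    return acc

# Three cells in a row, all draining east; weights stand in for rainfall.
acc = weighted_flow_accumulation(np.array([[2, 2, -1]]), np.array([[1.0, 2.0, 3.0]]))
# acc is [[0.0, 1.0, 3.0]]
```

The queue guarantees that a cell's total is final before it is passed downstream. Because steepest-descent directions always point strictly downhill, the direction graph has no cycles, so every cell is eventually processed.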
Rice blast occurrence data were obtained from the International Rice Research Institute, while soil nitrogen data were sourced from Raitimitra Agriprofile. The regression assumed a stationary relationship between root pathogenesis intensity and the selected environmental variables.
Results and Interpretation
The OLS model produced an R² of approximately 0.73, indicating that the two explanatory variables, soil nitrogen and water accumulation, accounted for nearly three-quarters of the variability in root pathogenesis intensity. The signs of both coefficients matched the initial hypotheses.
Nitrogen availability had a negative coefficient: lower nitrogen levels were associated with greater root pathogenesis intensity. Water accumulation had a positive coefficient with the largest magnitude, suggesting it may be the strongest predictor among the variables considered. Coefficient magnitudes are directly comparable, however, only when the variables share a common scale, for example after standardization.
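Raw OLS coefficients carry the units of their variables (change in pathogenesis per unit of nitrogen versus per unit of weighted flow), so comparing their sizes directly can mislead. One common remedy, sketched here as an assumption rather than as the study's procedure, is to z-score the dependent and explanatory variables before fitting so that each slope is expressed in standard deviations. The function name standardized_coefficients and the example data are hypothetical.

```python
import numpy as np

def standardized_coefficients(y, X):
    """OLS slopes after z-scoring y and each column of X.

    Centering removes the intercept, and each slope is in standard-deviation units,
    so magnitudes can be compared across variables measured in different units.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    zy = (y - y.mean()) / y.std()
    zX = (X - X.mean(axis=0)) / X.std(axis=0)
    slopes, *_ = np.linalg.lstsq(zX, zy, rcond=None)
    return slopes

# Synthetic example: x2 is on a much larger scale than x1.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([100.0, 300.0, 200.0, 500.0, 400.0])
y = 1.0 + 2.0 * x1 + 0.01 * x2
slopes = standardized_coefficients(y, np.column_stack([x1, x2]))
# Raw slopes are 2.0 and 0.01 (a 200-fold gap), but standardized
# they are about 0.70 and 0.35: each equals raw slope * sd(x) / sd(y).
```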
However, the regression also returned a statistically significant Koenker (BP) statistic, which indicates that the modeled relationships are not consistent, because of either nonstationarity or heteroscedasticity. This suggests that relationships between variables vary across space and that important explanatory factors may be missing.
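The Koenker (BP) statistic is the studentized Breusch-Pagan test: it regresses the squared OLS residuals on the explanatory variables and multiplies that auxiliary regression's R² by the number of observations. Under the null hypothesis of constant variance, the result is approximately chi-square distributed with degrees of freedom equal to the number of explanatory variables. The sketch below is a generic NumPy version with hypothetical function names, not the QGIS or ArcGIS implementation; for a two-variable model, the chi-square survival function has the closed form exp(-x/2), which avoids needing a statistics library.

```python
import math
import numpy as np

def koenker_bp(residuals, X):
    """Koenker's studentized Breusch-Pagan statistic: n * R^2 of e^2 regressed on X."""
    e2 = np.asarray(residuals, dtype=float) ** 2
    X = np.asarray(X, dtype=float)
    n = len(e2)
    design = np.column_stack([np.ones(n), X])  # intercept plus explanatory variables
    coefs, *_ = np.linalg.lstsq(design, e2, rcond=None)
    fitted = design @ coefs
    r2 = 1.0 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)
    return n * r2

def chi2_sf_df2(stat):
    """P(chi-square with 2 degrees of freedom > stat), which equals exp(-stat / 2)."""
    return math.exp(-stat / 2.0)

# Synthetic residuals whose squares rise exactly with both variables,
# so the auxiliary R^2 is 1 and the statistic equals n = 5.
X = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0], [5.0, 0.0]])
residuals = np.sqrt(1.0 + X[:, 0] + 2.0 * X[:, 1])
stat = koenker_bp(residuals, X)
p = chi2_sf_df2(stat)  # p is about 0.082
```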
A notable cluster of positive residuals appeared in southwestern Karnataka, including areas such as Mysore, Mangalore, and southern Chikmagalur. One plausible missing variable is soil acidity, as soils in this region are particularly acidic and may influence fungal infection dynamics.
Conclusion
The findings suggest that water flow direction and accumulation may help identify locations susceptible to root-level rice blast infection. However, the results remain preliminary. Additional variables, such as crop density, phosphorus and potassium levels, and soil acidity, should be incorporated to improve the model's robustness.
With further refinement, this GIS-based approach has potential applications for farmers, plant pathologists, and agricultural planners seeking to anticipate disease risk and optimize management strategies.
Acknowledgements
I would like to thank Mr. Desjardins of the University of North Carolina at Charlotte for his guidance in model development and support in navigating GIS tools.
Works Cited
ArcGIS Pro. Ordinary Least Squares (OLS).
Rice Knowledge Bank. Rice Blast (Leaf and Collar).
Ishiguro, K. Simulation Models of Rice Blast Epidemics.
GIS Geography. How to Build Spatial Regression Models in ArcGIS.
Sesma, A., & Osbourn, A. The Rice Leaf Blast Pathogen.
Howard, R. J., & Valent, B. Host Penetration by Rice Blast Pathogen.