# A High-Schooler’s Explorations in GIS and Molecular Biology

**Abstract**

As in every scientific discipline, new applications of spatial science, specifically GIS, are being discovered every day. The user-friendliness of GIS software such as QGIS enables individuals with little to no experience in information systems to explore their own research questions. Over the past three years of my high school career, I’ve channeled my own passions for biology, social equity, and health into applications of GIS. In this paper, I discuss a project that used the OLS (Ordinary Least Squares) regression tool in ArcGIS to understand, from a spatial standpoint, rhizospheric factors contributing to rice blast, a devastating cereal disease caused by fungal pathogenesis. Results from the regression analysis indicate that the degree of water flow and accumulation affects a location’s susceptibility to root pathogenesis. However, at this point the results should be considered exploratory. Further validation is needed to show that higher blast occurrence is a direct result of higher flow accumulation; variability in the number of rice crops planted could be a major lurking variable.

## **Introduction**

This GIS project had its foundations in a molecular biology experiment I performed during an internship. I chose to investigate *Magnaporthe oryzae*, a fungal pathogen (see Figure 1) that penetrates rice plant tissue and causes rice blast, a devastating cereal disease accounting for nearly 30% of global rice harvest loss each year (see Figure 2).

Figure 1: *Magnaporthe oryzae*’s three-celled spores

Figure 2: Blast lesions formed on aerial parts of a rice plant

While most scientific efforts have focused on fungal penetration by way of the leaf, mine sought to understand root pathogenicity on a deeper level. My research questions were as follows: What rhizospheric factors or conditions are conducive to fungal penetration of root tissue? How does a plant’s susceptibility to root pathogenesis vary with location?

From the start, I was convinced such a complex research question could not be answered in a lab setting, where no matter what a scientist does, he or she can never recreate a complete representation of the rice farm rhizosphere. I figured that for such a question, I ought to observe real-world data that would be more representative than a lab sample. GIS provided an effective and efficient way to do just this.

ArcGIS provides several regression tools for intervariable analysis. One such tool is the OLS (Ordinary Least Squares) regression tool. Given raster data for every considered variable (both independent and dependent), OLS assigns corresponding values to each pixel, or cell, in the designated spatial extent. Treating each pixel as a separate data point, OLS develops a linear regression model in the following form:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \varepsilon$$

where $\beta_0$ is the intercept, $\beta_i$ represents the coefficient corresponding to explanatory variable $x_i$, $\varepsilon$ represents residual error, and $y$ represents the dependent variable. Thus, using OLS’s linear model, the value of a dependent variable can be predicted given explanatory variables.
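To make the linear model described above concrete, here is a minimal sketch in pure Python that fits coefficients by least squares, treating each pixel as one data point. It is not the ArcGIS tool itself; the function names (`fit_ols`, `predict`, `r_squared`) and the pixel values are made up for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations update both sides together.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [b0, b1, ..., bn] minimising the sum of squared residuals."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(beta, row):
    """Evaluate b0 + b1*x1 + ... + bn*xn for one pixel."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


def r_squared(beta, rows, y):
    """Share of the variability in y explained by the fitted model."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - predict(beta, r)) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Made-up pixel values: (nitrogen, flow accumulation) -> blast intensity.
pixels = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
blast = [2.0 - 0.5 * n + 1.5 * f for n, f in pixels]  # exact linear relation

beta = fit_ols(pixels, blast)
print([round(b, 6) for b in beta])
print(round(r_squared(beta, pixels, blast), 6))
```

Because the made-up response is an exact linear function of the two inputs, the fitted coefficients recover the values used to generate it. Real raster data would include residual error, and a library routine would normally replace the hand-written solver.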

In this project, the region considered for regression modeling was Karnataka, a state in southwestern India where rice blast is a recurring problem. The assigned dependent variable was intensity of root pathogenesis. Two key explanatory variables were explored.

The first was soil nitrogen availability. Throughout Karnataka, average nitrogen concentration varies greatly. Furthermore, rough overlay analysis and a review of relevant literature (see “The Significance of Nitrogen Regulation, Source and Availability on the Interaction Between Rice and Rice Blast” by Donofrio, Mitchell, and Dean) suggest that low nitrogen levels tend to correspond with locations vulnerable to root pathogenesis. OLS was pursued to explore these initial findings.

The second explanatory variable was degree of water accumulation. As shown in Figure 3, farms in Karnataka generally practice wet rice cultivation: rice is grown in flooded fields such that the roots are completely immersed in water.

From this, it logically follows that direction and degree of water flow play key roles in spore dispersion and accumulation in the rhizosphere.

Figure 3: Wet rice cultivation

**Methods**

In preparation for OLS analysis, over 100 altitude measurements from across Karnataka were retrieved and plotted on an X-Y coordinate system. These data were interpolated through the Inverse Distance Weighted (IDW) method, which estimates each cell value as a weighted average of the values of surrounding points, with weights based on their respective distances. The closer a point is to the center of the cell being estimated, the more weight it is given during estimation.
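The IDW idea can be sketched in a few lines of pure Python. This is only an illustration: the power of 2 on the distance, the cell-center convention, and the three sample altitudes are assumptions of mine, not settings or data from the project.

```python
import math


def idw_estimate(x, y, samples, power=2.0):
    """Estimate a value at (x, y) from (sx, sy, value) samples using IDW."""
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value  # the location coincides with a measured point
        weight = 1.0 / distance ** power  # closer points get larger weights
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def idw_grid(samples, n_rows, n_cols, power=2.0):
    """Build a raster by estimating at each cell center (col + 0.5, row + 0.5)."""
    return [
        [idw_estimate(c + 0.5, r + 0.5, samples, power) for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Hypothetical altitude measurements: (x, y, altitude in meters).
altitudes = [(0.0, 0.0, 900.0), (4.0, 0.0, 700.0), (2.0, 3.0, 500.0)]
dem = idw_grid(altitudes, n_rows=3, n_cols=4)
for row in dem:
    print([round(v, 1) for v in row])
```

Every estimate is a weighted average, so it always falls between the lowest and highest measured values; IDW cannot produce peaks or valleys beyond the sampled range.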

IDW generated a raster layer of altitude, or Digital Elevation Model (DEM). Flow direction was subsequently determined by calculating the direction of steepest descent, or maximum drop, from each cell:

$$\text{maximum drop} = \frac{\text{change in z-value}}{\text{distance}} \times 100$$

The z-value here corresponds to the cell’s assigned altitude. Cell (or pixel) size was set to 1, meaning the distance between two orthogonally adjacent cells is 1, while the distance between two diagonally adjacent cells is 1.414, or the square root of 2. Once the direction of steepest descent was found, the output cell was coded with the color representing that direction (see Figure 4). If all neighboring cells had higher altitudes, the cell was not assigned a flow direction.

Figure 4: Calculation of flow direction from DEM
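The maximum-drop rule can be sketched as follows. This is a simplified illustration rather than the ArcGIS implementation: the compass labels stand in for the color codes, neighbors outside the raster are simply ignored, ties go to the first direction checked, and the small DEM is made up.

```python
import math

# Eight neighbor offsets (d_row, d_col); the compass labels are illustrative.
NEIGHBORS = {
    "N": (-1, 0), "NE": (-1, 1), "E": (0, 1), "SE": (1, 1),
    "S": (1, 0), "SW": (1, -1), "W": (0, -1), "NW": (-1, -1),
}


def flow_direction(dem, cell_size=1.0):
    """Return a grid of direction labels; None where no neighbor is lower."""
    n_rows, n_cols = len(dem), len(dem[0])
    directions = [[None] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            best_drop = 0.0  # only strictly positive drops count as downhill
            for label, (dr, dc) in NEIGHBORS.items():
                nr, nc = r + dr, c + dc
                if not (0 <= nr < n_rows and 0 <= nc < n_cols):
                    continue  # assumption: skip neighbors outside the raster
                # 1 for orthogonal neighbors, sqrt(2) for diagonal ones
                distance = cell_size * math.hypot(dr, dc)
                drop = (dem[r][c] - dem[nr][nc]) / distance * 100.0
                if drop > best_drop:
                    best_drop = drop
                    directions[r][c] = label
    return directions


# Made-up DEM sloping toward the bottom-right corner.
dem = [
    [9.0, 8.0, 7.0],
    [8.0, 6.0, 5.0],
    [7.0, 5.0, 1.0],
]
directions = flow_direction(dem)
for row in directions:
    print(row)
```

In this toy DEM the bottom-right cell is lower than all of its neighbors, so it receives no flow direction, which matches the rule described for cells surrounded by higher altitudes.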

The generated flow direction raster was next used to calculate the weighted flow accumulation of each raster cell (see Figure 5). Weights were assigned to each cell according to average rainfall levels per state district, and the accumulation level of each cell was then determined by adding the weights of all cells flowing into that cell. Collectively, the flow accumulation cells outlined a stream network depicting how and where water tends to accumulate (see Figure 6).

Figure 5: Calculation of flow accumulation given flow direction raster

Figure 6: Water accumulation raster
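Weighted flow accumulation can be sketched by tracing each cell’s path downstream and adding its weight to every cell it passes through. The direction grid, the compass labels, and the rainfall weights below are hypothetical; the sketch also assumes a cell’s own weight is not counted in its own total, in line with “all cells flowing into” it.

```python
# Direction label -> (d_row, d_col); labels are illustrative, not ArcGIS codes.
OFFSETS = {
    "N": (-1, 0), "NE": (-1, 1), "E": (0, 1), "SE": (1, 1),
    "S": (1, 0), "SW": (1, -1), "W": (0, -1), "NW": (-1, -1),
}


def flow_accumulation(directions, weights):
    """Sum, for each cell, the weights of all cells that drain through it."""
    n_rows, n_cols = len(directions), len(directions[0])
    accumulation = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            weight = weights[r][c]
            visited = {(r, c)}  # guards against loops in malformed input
            cr, cc = r, c
            while directions[cr][cc] is not None:
                dr, dc = OFFSETS[directions[cr][cc]]
                cr, cc = cr + dr, cc + dc
                outside = not (0 <= cr < n_rows and 0 <= cc < n_cols)
                if outside or (cr, cc) in visited:
                    break
                visited.add((cr, cc))
                accumulation[cr][cc] += weight
    return accumulation


# Hypothetical flow directions (None marks a cell with no downhill neighbor).
directions = [
    ["SE", "SE", "S"],
    ["SE", "SE", "S"],
    ["E", "E", None],
]
# Hypothetical rainfall weights: the top row lies in a wetter district.
rainfall = [
    [2.0, 2.0, 2.0],
    [1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
]
accumulation = flow_accumulation(directions, rainfall)
for row in accumulation:
    print(row)
```

Tracing every cell separately is easy to follow but slow on large rasters; production tools typically process cells in upstream-to-downstream order instead.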

Rice blast data for Karnataka were obtained from the International Rice Research Institute (IRRI), and soil nitrogen content data were obtained from Raitimitra Agriprofile. Regression analysis was performed assuming a stationary relationship between the dependent variable (rice blast prevalence) and the independent variables (flow accumulation and nitrogen availability).

**Interpreting Results**

The regression produced a relatively high $R^2$ value (73%), indicating that 73% of the variability in the dependent variable could be explained by the two independent variables. Additionally, the resulting variable coefficients agree with the hypothesized relationships between the independent and dependent variables. For example, as expected, nitrogen availability showed an inverse relationship with rice blast prevalence, as indicated by the negative coefficient (-0.006). In other words, as nitrogen availability decreases, the prevalence of rice blast is expected to increase on average. Likewise, rice blast prevalence tended to be greater on average in areas with higher flow accumulation levels, as indicated by the positive coefficient (0.042). The latter coefficient has the greater magnitude, suggesting that, of the two variables, flow accumulation has the greater impact on rice blast prevalence, although raw coefficients are only directly comparable when the variables are measured on similar scales.

However, the results also included a statistically significant Koenker (BP) statistic. This test determines whether the variables in a model have a constant or dynamic relationship in geographic space; a significant result is an indication of heteroscedasticity. The null hypothesis of this test ($H_0$) is that all relationships are stationary, but the statistically significant p-value of .003 (below .005) indicated the existence of a nonstationary relationship within the model. A cluster of positive residuals was apparent in southwestern Karnataka (Mysore, Mangalore, and southern Chikmagalur), indicating that one or more variables may be missing from this model. A potential missing variable is acidity, as southwestern soils in Karnataka are especially acidic and may thus affect *M. oryzae* infection and spread (see Figure 7).

Figure 7: Soil acidity levels in India
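For readers curious about what lies behind the Koenker (BP) statistic, below is a minimal sketch of the textbook studentized Breusch-Pagan test: squared residuals from the main regression are regressed on the same explanatory variables, and the statistic is the sample size times the $R^2$ of that auxiliary regression. The ArcGIS implementation may differ in detail. The closed-form p-value used here holds only for two explanatory variables (a chi-squared distribution with two degrees of freedom), and the data are synthetic.

```python
import math


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small system."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols_fitted(rows, y):
    """Fit y on rows (with intercept) and return the fitted values."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * v for d, v in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return [sum(b * x for b, x in zip(beta, d)) for d in design]


def koenker_bp(rows, y):
    """Return (statistic, p_value) for exactly two explanatory variables."""
    if any(len(r) != 2 for r in rows):
        raise ValueError("closed-form p-value assumes two explanatory variables")
    n = len(y)
    fitted = ols_fitted(rows, y)
    sq_resid = [(v - f) ** 2 for v, f in zip(y, fitted)]
    # Auxiliary regression: do the squared residuals depend on the inputs?
    aux_fitted = ols_fitted(rows, sq_resid)
    mean_sq = sum(sq_resid) / n
    ss_tot = sum((s - mean_sq) ** 2 for s in sq_resid)
    ss_res = sum((s - f) ** 2 for s, f in zip(sq_resid, aux_fitted))
    statistic = n * (1.0 - ss_res / ss_tot)
    # Chi-squared survival function with 2 degrees of freedom is exp(-x / 2).
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Synthetic data whose noise grows with the first variable (heteroscedastic).
rows = [(float(i), float((i * i) % 7)) for i in range(1, 21)]
y = [1.0 + 2.0 * a + 0.5 * b + 0.3 * a * (-1) ** i
     for i, (a, b) in enumerate(rows, start=1)]

statistic, p_value = koenker_bp(rows, y)
print(round(statistic, 3), p_value)
```

A small p-value means the size of the residuals changes systematically with the explanatory variables, which is the kind of signal that would prompt a search for a missing variable.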

**Conclusion**

OLS analysis suggests that flow direction and accumulation are significant in computationally pinpointing locations susceptible to root pathogenesis. However, as previously stressed, further refinement is needed to account for lurking variables, such as the number of rice crops planted, which may be partially responsible for the correlation observed. In the future, the OLS model will also be expanded with the addition of variables such as phosphorus and potassium levels, as well as soil acidity. With such refinement, this model could offer numerous applications for farmers, epidemiologists, and more.

As a final acknowledgement, I’d like to thank Mr. Desjardins of the University of North Carolina at Charlotte (UNCC) for assisting in model development and offering guidance in navigating ArcGIS.

**Works Cited**

“ArcGIS Pro.” Ordinary Least Squares (OLS)—ArcGIS Pro | ArcGIS Desktop, pro.arcgis.com/en/pro-app/tool-reference/spatial-statistics/ordinary-least-squares.htm.

Ishiguro, Kiyoshi. "Simulation Models of Rice Blast Epidemics." Rice Blast: Interaction with Rice and Control (2004): 297-302. Web.

“How to Build Spatial Regression Models in ArcGIS.” GIS Geography, 1 Feb. 2017, gisgeography.com/spatial-regression-models-arcgis/.

Sesma, Ane, and Anne E. Osbourn. "The Rice Leaf Blast Pathogen Undergoes Developmental Processes Typical of Root-infecting Fungi." Nature 431.7008 (2004): 582-86. Web.

Howard, R. J., and B. Valent. "Breaking and Entering: Host Penetration by the Fungal Rice Blast Pathogen." Annual Review of Microbiology 50 (1996): 491-512.