Befriending the Data Scientist in the Cubicle Next to You
The preponderance of voluminous and varied data arriving at a high velocity virtually every second has been the primary driver of the emergent field of data science. It’s been about a decade since this phenomenon became widespread, when “big data” started showing up on every list of emerging technical issues. Every business, organization, government, and institution has mounds of the stuff and the primary role of their data scientists is to make sense out of it, often empowered by algorithms that have been designed to interpret patterns.
Data science jobs are in high demand, but they’re not easy to do well or for a long time. When data scientists describe the reasons that they are experiencing burnout or leaving their jobs, they often cite the lack of direction (so much you could do, where to start?), mentoring (you’re a team of one and few others understand your methods), or metrics for success (so you found one trend or pattern, is it a valid one? a worthwhile one? now what?).
It shouldn’t be a surprise that such situations arise. Frequently, management has created a position for a role that is impossibly large and poorly defined. Another source of frustration and anxiety that data scientists cite is that they can quickly become the go-to person for all types of data wrangling, analysis, and interpretation, despite having little or no familiarity with those types of data or with the questions that are appropriate for them. Fortunately, smart people are realizing that the cart was put before the horse, which, in turn, was put before the big and rapidly growing mounds of horse poop, err, I mean data. Standards and definitions for data science are now being organized as we speak. Going forward, written policies and guidelines such as these will help inform job descriptions, curricula, and contracts, improving the alignment between possibilities, expectations, and outcomes.
If you are already a geospatial data scientist in your organization, you are expected to fulfill specific roles in terms of spatial data and its uses in the workflow of decision-making. Ideally you are already knowledgeable about spatial data: understanding the role of spatial autocorrelation in geographic patterns, the challenging dilemma that merging data across inevitably inconsistent spatial boundaries presents (i.e., the modifiable areal unit problem, or MAUP), and the implications of having latitude and longitude location data that someone long ago rounded up to two decimal places. These are things that a newly minted “data scientist” is unlikely to know, but they can make the difference between a keen and valuable data perception and an epiphanot (an idea that seems like an amazing insight to the conceiver but is in fact pointless, mundane, stupid, or incorrect).
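To make the two-decimal-place point concrete, here is a rough sketch of how much ground distance such rounding can hide. It assumes a spherical Earth with a mean radius of 6,371 km and rounding to the nearest value (both are simplifying assumptions, and the function name is made up for illustration), so treat the numbers as ballpark figures rather than survey-grade results.

```python
import math

# Assumption: spherical Earth with a mean radius of 6,371 km.
EARTH_RADIUS_KM = 6371.0
KM_PER_DEGREE = math.pi * EARTH_RADIUS_KM / 180.0  # roughly 111 km per degree


def rounding_error_km(decimals, latitude_deg):
    """Largest displacement introduced by rounding coordinates to the
    nearest value with `decimals` decimal places.

    Returns (north_south_km, east_west_km). The east-west figure shrinks
    with the cosine of the latitude because meridians converge at the poles.
    """
    half_step_deg = 0.5 * 10 ** (-decimals)  # worst case is half a rounding step
    north_south_km = half_step_deg * KM_PER_DEGREE
    east_west_km = north_south_km * math.cos(math.radians(latitude_deg))
    return north_south_km, east_west_km


ns, ew = rounding_error_km(2, 45.0)
print(f"{ns:.2f} km north-south, {ew:.2f} km east-west")
```

Under these assumptions, a point stored with two decimal places could sit more than half a kilometer from where it was actually recorded, which is plenty to land it in the wrong census block or on the wrong side of a district boundary.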
In fact, it’s pretty likely that significant gaps exist between the geospatial work and the data science work taking place within a company, when connections would instead be mutually beneficial. Standards that the geospatial community has developed and employed, such as those published by the Open Geospatial Consortium, can inform and support the data wrangling taking place. “Data visualization” is a frequent component of data science work, and maps are a commonly selected format for representation, given the preponderance of geographically referenced data. You and the data scientist might both be using Tableau or R to produce your maps, and could likely learn a thing or two from each other. Become the team member that each of you may be lacking.
Data scientist may be a role that you are abruptly asked to play in your company or organization, or one into which you gradually transition. Your own academic geospatial background may well provide you with training in this direction, as new programs in spatial data science are starting, such as the Spatial Data Science & Technology program at the University of Oregon, or Penn State’s Geospatial Big Data Analytics Certificate. Perhaps University of Colorado’s Earth Data Analytics – Foundations is the right professional certificate for you. If you’re looking to enhance your existing geospatial credentials, sites like Data Science Graduate Programs and Masters in Data Science offer extensive collections of programs, and a background in GIS will be a huge differentiator as you develop further expertise.
Regardless of whether you engage with data science and data scientists formally or not, continued intersections are inevitable. The connections are too numerous and too probable to avoid, and the potential synergies are too enticing. As Esri’s David DiBiase noted in his recent commentary on GIScience and Data Science, data scientist is more of a role than a singular profession, and there are numerous ways that role can develop and be deployed. What does matter is that location-savvy expertise not get lost in the shuffle or completely ignored in favor of the whiz-bang, new-kid-on-the-block algorithm-of-the-day. Geospatial thinking and skills are vital to the major data science topics we encounter every day, including two now being confronted on the national stage: gerrymandering and the privacy of location-based cell phone data.