Ed. Note: This is the first in a series of “Business Geographics 101” articles by Dr. Ela Dramowicz that will cover the basic techniques required for spatial business analytics.
In order to geocode data, it must contain information about location, such as a street address, a postal code (or at least part of one), or the name of an area (e.g., a county or census subdivision). Geocoding assigns x, y coordinates to the point locations described by this information. Three main methods of geocoding are available:
- by street address;
- by postal code; and
- by boundary.
Geocoding by street address
Address geocoding involves matching the addresses in the table to be geocoded to the street names and address ranges in a street network file. The geocoding software reads the table one record at a time. For each record, it first matches the street name against a reference table that is linked to a map layer. Once the street name is matched, all address ranges for that street are examined to determine the street segment on which the address falls, and whether it is on the odd- or even-numbered side of the street. This part of the geocoding process is known as address matching. Because the reference layer stores both the coordinates of each street segment's endpoints and the range of street numbers for that segment, the software can interpolate the coordinates of the address.
In the example below, the table to be geocoded (Table 1) lists a few restaurants and their addresses. Table 2 is an example of a reference attribute table, with fields specific to a particular address style (U.S. streets in this case). The address of the Harvey's restaurant (marked with a red arrow) is found in the second record of the reference table, on the even-numbered side of the street (FromLeft, ToLeft). Interpolated addresses are distributed evenly along a street segment, as illustrated in Figure 1. The 5650 Spring Garden Rd address is the sixth point from the east (shown on the map as a red dot). This is most likely not the true location of the restaurant: street address geocoding produces only an approximation of it.
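The two steps just described, address matching and interpolation, can be sketched in Python as follows. This is a minimal illustration, not how any particular GIS package implements them: the `StreetSegment` record and the `geocode_address` function are hypothetical, the field names only mirror the FromLeft/ToLeft style of Table 2, and the sketch assumes that each address range holds numbers of a single parity and that each range begins at the segment's `start` endpoint.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class StreetSegment:
    """One record of a hypothetical street-network reference table."""
    street: str
    from_left: int   # first address on the left side (assumed at the `start` endpoint)
    to_left: int     # last address on the left side (assumed at the `end` endpoint)
    from_right: int
    to_right: int
    start: Point     # (x, y) of the segment's first endpoint
    end: Point       # (x, y) of the segment's last endpoint


def _in_range(number: int, first: int, last: int) -> bool:
    """True if `number` lies between `first` and `last` and shares their parity."""
    low, high = min(first, last), max(first, last)
    return low <= number <= high and number % 2 == first % 2


def geocode_address(number: int, street: str,
                    reference: List[StreetSegment]) -> Optional[Point]:
    """Match a street address to a segment, then interpolate its coordinates."""
    for seg in reference:
        # Address matching, step 1: the street name must match.
        if seg.street.strip().lower() != street.strip().lower():
            continue
        # Address matching, step 2: find the side whose range contains the number.
        for first, last in ((seg.from_left, seg.to_left),
                            (seg.from_right, seg.to_right)):
            if _in_range(number, first, last):
                # Interpolation: position proportional to the number within the range,
                # so addresses are spread evenly along the segment.
                span = last - first
                t = 0.5 if span == 0 else (number - first) / span
                x = seg.start[0] + t * (seg.end[0] - seg.start[0])
                y = seg.start[1] + t * (seg.end[1] - seg.start[1])
                return (x, y)
    return None  # no segment matched both the street name and the number
```

For instance, with a hypothetical segment whose even (left) side runs from 100 to 200 between x = 0 and x = 100, number 150 lands at the midpoint (50.0, 0.0). Production geocoders often also offset the point from the street centerline toward the matched side and tolerate spelling variations in street names; both are omitted here.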
Address geocoding achieves comparable accuracy in urban and rural areas, as illustrated in Figures 2 and 3. In rural areas of Canada, however, address geocoding often cannot be completed, because address ranges are not available for small towns and other small communities. Coverage improves every year as street network data providers release new versions of their street network files.
Geocoding by postal code
Any database that includes postal codes can be geocoded by postal code, using a reference file of postal code centroid points with geographic coordinates attached. In Canada, the Postal Code Conversion File (PCCF) also provides a correspondence between postal codes and the standard geographic areas for which Statistics Canada produces census data. Through this link between postal codes and standard geographic areas, the PCCF permits the integration of data from various sources. According to Canada Post Corporation, a postal code consists of six characters (e.g., B0S 1M0), the first three of which identify the Forward Sortation Area (FSA). Forward Sortation Areas are large polygons with distinct boundaries, whereas postal codes have no boundaries even though they represent areas. Figures 4 and 5 show the spatial relationship between postal codes and Forward Sortation Areas, as well as the density of postal codes in urban and rural areas.
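The six-character structure described above is easy to check and split in code. The following Python sketch validates only the general letter-digit-letter, digit-letter-digit pattern (it does not confirm that a code is actually in use) and returns the FSA together with the remaining three characters, the Local Delivery Unit (LDU) that completes the FSALDU code. The function name `split_postal_code` is hypothetical.

```python
import re
from typing import Tuple

# Letter-digit-letter, optional space, digit-letter-digit (e.g. "B0S 1M0").
# Checks the shape only; it does not confirm that the code is in use.
POSTAL_CODE_PATTERN = re.compile(r"^([A-Z][0-9][A-Z]) ?([0-9][A-Z][0-9])$")


def split_postal_code(code: str) -> Tuple[str, str]:
    """Return (FSA, LDU): the first three and the last three characters."""
    match = POSTAL_CODE_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a Canadian postal code: {code!r}")
    return match.group(1), match.group(2)


print(split_postal_code("B0S 1M0"))  # ('B0S', '1M0')
```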
Geocoding by postal code involves matching the postal code in each record of the table to be geocoded to the postal code field (FSALDU, for Forward Sortation Area Local Delivery Unit) of the PCCF, which acts as the reference layer. Tables 3 and 4 are examples of data sets used in this process. Postal codes in the table to be geocoded must not contain spaces; otherwise they will not match the FSALDU values.
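Because the match is an exact comparison with FSALDU values, postal codes usually need cleaning before the join. A minimal Python sketch, assuming records are plain dicts with a hypothetical `postal_code` field, keeps only letters and digits, upper-cases the result, and sets aside anything that does not yield six characters:

```python
from typing import Dict, List, Tuple


def to_fsaldu(code: str) -> str:
    """Keep letters and digits only and upper-case them: 'b0s 1m0' -> 'B0S1M0'."""
    return "".join(ch for ch in code if ch.isalnum()).upper()


def prepare_for_matching(records: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Add an FSALDU key to each record; set aside records that cannot be matched."""
    ready, rejected = [], []
    for rec in records:
        key = to_fsaldu(rec.get("postal_code", ""))
        if len(key) == 6:
            ready.append({**rec, "FSALDU": key})
        else:
            # Blank or partial codes will not match FSALDU; a partial code
            # could still be geocoded by boundary (FSA) instead.
            rejected.append(rec)
    return ready, rejected
```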
Table 4. Sample records and fields from the Postal Code Conversion File.
Based on the match (a relational join), the table to be geocoded receives coordinates from the PCCF, and points can then be created on a map. Each geocoded record takes the location of the centroid of the postal code to which it was matched. In the example above, only one record from Table 3 (indicated with a red arrow) matches a record in Table 4.
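The relational join described above can be sketched as a dictionary lookup. This Python sketch assumes, hypothetically, that a PCCF extract has already been loaded into a dict mapping each FSALDU to a single (x, y) centroid, and that each record already carries a normalized `FSALDU` key; real PCCF records carry many more fields.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def geocode_by_postal_code(records: List[Dict],
                           reference: Dict[str, Point]) -> Tuple[List[Dict], List[Dict]]:
    """Join records to postal code centroids on the FSALDU key.

    `reference` is assumed to map each FSALDU to a single (x, y) centroid.
    Returns (matched, unmatched); matched records gain 'x' and 'y' fields.
    """
    matched, unmatched = [], []
    for rec in records:
        centroid: Optional[Point] = reference.get(rec.get("FSALDU", ""))
        if centroid is None:
            unmatched.append(rec)
        else:
            # All records sharing a postal code get the identical centroid location.
            matched.append({**rec, "x": centroid[0], "y": centroid[1]})
    return matched, unmatched
```

Keeping the unmatched records separate makes it easy to report the match rate or to route those records to a less precise method such as boundary geocoding.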
Geocoding by postal code produces radically different results in urban and rural areas. Urban postal codes represent very small areas, approximating a block face: one side of a street between two intersections. The geocoding results are therefore fairly accurate. The red circles in Figure 2, representing restaurants geocoded by postal code, lie within a reasonable distance of the green dots, representing restaurants geocoded by street address. Rural postal codes cover large areas that can include many communities, so the results of geocoding are less accurate. In Figure 3, all restaurants are geocoded to a single postal code location; no other postal codes fall within the area shown.
Who, then, would use data sets geocoded this way? For some businesses, the postal code is the only available information about where customers are located, most likely gathered from customer satisfaction surveys that ask for it. Knowing who their customers are (through the link between postal codes and census standard geographic areas) and where they are located (what part of the city, town, or county) helps businesses plan marketing campaigns and target new customers. In some research projects, customer data is aggregated by postal code (in both urban and rural areas) for confidentiality reasons, and only then geocoded and analyzed.
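Aggregating customer records by postal code before geocoding, as described for confidential research data, amounts to counting records per normalized postal code. In this Python sketch the `postal_code` field name and the `min_count` suppression threshold are hypothetical; the resulting counts could then be joined to postal code centroids like any other table.

```python
from collections import Counter
from typing import Dict, List


def aggregate_by_postal_code(responses: List[Dict], min_count: int = 1) -> Dict[str, int]:
    """Count responses per postal code (normalized to FSALDU form).

    Postal codes with fewer than `min_count` responses are dropped, a hypothetical
    suppression rule for confidentiality.
    """
    counts = Counter(
        "".join(r["postal_code"].split()).upper()
        for r in responses
        if r.get("postal_code")  # skip blank or missing postal codes
    )
    return {code: n for code, n in counts.items() if n >= min_count}
```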
Geocoding by boundary
Geocoding by boundary is the least accurate of the three methods for mapping point locations. Any boundary file, such as counties or Forward Sortation Areas, can be used, as long as the boundary names in the table to be geocoded can be matched with the boundary names in the reference table. For example, GIS software reads the FSA name in the first record of the table to be geocoded (Table 5), matches it with the FSA name in the reference table (Table 6, record marked with a red arrow), and assigns the FSA centroid coordinates to that record. The geocoded point is then displayed at the FSA centroid. If the restaurants in Figure 2 were geocoded by FSA, all of them would appear at the red dot that represents the centroid of this FSA. The FSA centroid for the area shown in Figure 3 lies outside the map extent, but again, all of the restaurants would be placed there.
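In the example above, the reference table supplies FSA centroid coordinates directly. If a boundary file provides only polygon outlines, one option is the standard area-weighted (shoelace) centroid formula; a GIS package may compute or store centroids differently. The Python sketch below assumes, hypothetically, that boundaries are given as a dict from boundary name to a list of (x, y) vertices, and it matches names case-insensitively:

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def polygon_centroid(vertices: List[Point]) -> Point:
    """Area-weighted centroid of a simple (non-self-intersecting) polygon."""
    twice_area = cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("polygon has zero area")
    return cx / (3 * twice_area), cy / (3 * twice_area)


def geocode_by_boundary(records: List[Dict], boundaries: Dict[str, List[Point]],
                        name_field: str = "FSA") -> List[Dict]:
    """Assign each record the centroid of the boundary whose name matches its name field."""
    centroids = {name.strip().upper(): polygon_centroid(poly)
                 for name, poly in boundaries.items()}
    result = []
    for rec in records:
        centroid = centroids.get(str(rec.get(name_field, "")).strip().upper())
        x, y = centroid if centroid is not None else (None, None)
        result.append({**rec, "x": x, "y": y})  # unmatched records keep x = y = None
    return result
```

The planar formula treats coordinates as flat x, y values, which is a reasonable approximation for small areas; note also that the centroid of a strongly concave boundary can fall outside the polygon itself.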
Table 5. Table to be geocoded.
Table 6. Sample records and fields from the Forward Sortation Area reference table.
None of the methods discussed here guarantees high positional accuracy. Where such accuracy matters, a GPS device can be used to collect coordinates, and GIS software can easily create points from the x, y coordinate pairs it records.
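One way to quantify the positional error of a geocoded point is to measure its distance to a GPS fix for the same location. A minimal Python sketch, assuming both points are longitude/latitude pairs in decimal degrees and treating the earth as a sphere (the haversine formula), is shown below; `positional_errors` and its dict-of-points inputs are hypothetical.

```python
import math
from typing import Dict, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean earth radius; the earth is treated as a sphere


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    # Clamp guards against rounding pushing the argument slightly above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def positional_errors(geocoded: Dict[str, Tuple[float, float]],
                      gps: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Distance in meters from each geocoded (x=lon, y=lat) point to its GPS fix."""
    return {key: haversine_m(*geocoded[key], *gps[key])
            for key in geocoded.keys() & gps.keys()}
```

Comparing such distances for the same locations geocoded by address, postal code, and boundary would show the accuracy differences discussed above in numbers rather than on a map.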
With the advent of GPS technology, more people are starting to use precise location information in place of street addresses. According to NAC Geographic Products Inc., geographic coordinates have too many digits for consumers to deal with. The company developed the Natural Area Coding (NAC) System, which may revolutionize addresses all over the world. Natural Area Codes are defined by a series of grids applied to the earth's surface. An unlimited number of NAC grid levels can be defined, with cell sizes ranging from about a thousand kilometers down to one meter, a few centimeters, or even smaller. An eight- or 10-character NAC (also called a Universal Address) identifies a cell on the fourth or fifth level of the NAC grid, with a width and length of about 30 meters or one meter, respectively. Universal Addresses offer many advantages, and the technology has already been adopted by a number of companies. NAC's intention is not to replace existing street addresses but to complement them, so that people who know how to use Universal Addresses can benefit from them.
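The idea of a hierarchical grid code can be sketched as follows. This is not the official NAC algorithm: it assumes, for illustration only, that each level divides longitude and latitude into 30 parts, that each part is written with one symbol from a hypothetical 30-character alphabet of digits and consonants, and that the longitude characters are followed by the latitude characters. Under those assumptions, four levels give an eight-character code and five levels a 10-character code.

```python
from typing import Tuple

# Hypothetical 30-symbol alphabet: ten digits followed by twenty consonants.
ALPHABET = "0123456789BCDFGHJKLMNPQRSTVWXZ"
BASE = len(ALPHABET)  # 30


def _encode_fraction(frac: float, levels: int) -> str:
    """Write a value in [0, 1] as `levels` base-30 symbols, one per grid level."""
    symbols = []
    for _ in range(levels):
        frac *= BASE
        digit = min(int(frac), BASE - 1)  # keep the upper edge (1.0) in the last cell
        symbols.append(ALPHABET[digit])
        frac -= digit  # position within the chosen cell, passed to the next level
    return "".join(symbols)


def grid_code(lon: float, lat: float, levels: int = 4) -> str:
    """Return '<longitude part> <latitude part>', `levels` symbols each."""
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError("longitude or latitude out of range")
    x_part = _encode_fraction((lon + 180.0) / 360.0, levels)
    y_part = _encode_fraction((lat + 90.0) / 180.0, levels)
    return f"{x_part} {y_part}"


def cell_size_degrees(levels: int) -> Tuple[float, float]:
    """(width in degrees of longitude, height in degrees of latitude) of one cell."""
    return 360.0 / BASE ** levels, 180.0 / BASE ** levels
```

Under these assumptions, a level-4 cell spans 360/30^4, about 0.00044 degrees of longitude, and 180/30^4, about 0.00022 degrees of latitude, roughly 49 m by 25 m at the equator; a level-5 cell is roughly 1.6 m by 0.8 m. The exact cell sizes and encoding rules of real Natural Area Codes should be taken from NAC's own documentation.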