Retail Trade Area Analysis Using the Huff Model
The concept of a retail trade area has been used by analysts and practitioners in retail site evaluation and other market studies for a very long time. In fact, retail trade area analysis and site evaluation are complementary procedures. Retail trade area analysis focuses on locating and describing the target market. This knowledge is critical for both marketing and merchandising purposes, as well as for choosing new retail locations. In site evaluation, trade area analysis is combined with many operational requirements of the retail chain (Jones, Simmons 1993).
It is much easier to analyze trade areas and produce market profiles using GIS. The majority of GIS software includes functionality for extracting and aggregating data at various levels of geography. As a result, trade area analysis became one of the most popular areas of GIS applications in analyzing business problems. The most common definition of a retail trade area is used for the purpose of this article. According to this definition, a retail trade area is “that area, typically around the store, from which the store derives most of its patronage” (Lea 1998b, p.140).
Retail trade area analysis was a very popular theme during the time Business Geographics magazine was published (1993-2001). A couple dozen articles were written on this subject by researchers and GIS consultants specializing in the retail sector. A variety of techniques for delimiting and analyzing trade areas were discussed, along with their advantages and disadvantages. These techniques range from simple ones, such as an application of rings, to more sophisticated ones, such as probabilistic trade area surfaces (Gross 1997, Hooper 1997, Lea 1998a, Peterson 1997, Simmons 1998).
All techniques represent either the spatial monopoly or the market penetration approach to analyzing trade areas (Jones, Simmons 1993). The concentric rings method, drive time/distance polygons and Thiessen (Voronoi) polygons are examples of the spatial monopoly approach. These methods are easy to conceptualize and use. However, they assume that a store has a monopoly over the area: that all households in the trade area relate to the store and no households outside the trade area visit the store. Once the trade area is delimited geographically as a ring, Thiessen or other type of polygon, it is easy to prepare a market profile by extracting and aggregating data using GIS software. Although the methods representing the spatial monopoly approach are commonly used, they involve a lot of simplification because they do not account for the existence of competing stores. Therefore, they should be utilized only if no better alternatives exist (Lea 1998c).
The market penetration approach assumes that the proportion of households served by a store varies spatially because of competition. The best example of this type of approach is the Huff trade area model. The trade area is conceptualized as a probability surface, which represents the likelihood of customer patronage. This model provides an answer to a basic question: What is the probability that a customer will decide to shop at a particular store, given the presence of competing stores? The probability surface is created with a spatial interaction model that takes into account such variables as distance, attractiveness and competition. The probability surface can be contoured to produce regions of patronage probability, which can then be used as weights in the preparation of a market profile.
The intent of this article is to raise the awareness of the Huff model within the GIS community.
Introduction to the Huff Model
The Huff model was introduced by David Huff in 1963 (Huff 1963). Its popularity and longevity can be attributed to its conceptual appeal, relative ease of use, and applicability to a wide range of problems, of which predicting consumer spatial behavior is the most commonly known. The probability (Pij) that a consumer located at i will choose to shop at store j is calculated according to the following formula (Huff 2003):

$$P_{ij} = \frac{A_j^{\alpha} D_{ij}^{-\beta}}{\sum_{k=1}^{n} A_k^{\alpha} D_{ik}^{-\beta}}$$

where
- Aj is a measure of attractiveness of store j, such as square footage
- Dij is the distance from i to j
- α is an attractiveness parameter estimated from empirical observations
- β is the distance decay parameter estimated from empirical observations
- n is the total number of stores including store j.
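The formula can be sketched in a few lines of Python. This is only an illustration, not the code of any package mentioned in this article: the function name, the sample attractiveness values and distances, and the parameter values (an attractiveness exponent of 1 and a distance decay exponent of 2) are all assumptions chosen to make the arithmetic easy to follow.

```python
import numpy as np

def huff_probabilities(attractiveness, distances, alpha=1.0, beta=2.0):
    """Return the patronage probability for every origin (row) and store (column).

    Each store's utility is attractiveness**alpha * distance**(-beta);
    the probability is that utility divided by the sum over all stores.
    """
    A = np.asarray(attractiveness, dtype=float)   # shape (n_stores,)
    D = np.asarray(distances, dtype=float)        # shape (n_origins, n_stores)
    utility = A ** alpha * D ** (-beta)
    return utility / utility.sum(axis=1, keepdims=True)

# Hypothetical data: three stores and two customer locations.
gla = [100.0, 200.0, 100.0]
dist = [[1.0, 2.0, 1.0],
        [2.0, 1.0, 4.0]]

P = huff_probabilities(gla, dist, alpha=1.0, beta=2.0)
print(P.round(3))
```

Every row of the result sums to one, because a customer is assumed to shop at exactly one of the n stores.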
Four examples included in this article use the Huff model for the following purposes.
- Trade area analysis for a single site using a single variable for site attractiveness
- Trade area analysis for a single site using multiple variables for site attractiveness
- Comparison of potential revenue for two sites
- Modeling a market scenario – more complex trade area analysis involving the use of customer spotting data, information on shopping trips and model calibration.
Trade area analysis for a single site using a single variable for site attractiveness
Two data sets were used in this example: shopping centers with their characteristics, and small census units with some related data. The purpose of this example was to create a market profile for a single mall. The Gross Leasable Area (GLA) was used as an attractiveness variable. The patronage probability surface (a grid) was created for a selected mall (Figure 1). A potential customer is assumed to be located at every grid cell. The probability of a customer patronizing a selected mall is positively related (directly proportional) to the attractiveness of the mall and negatively related (inversely proportional) to the distance between the mall and the customer, given the presence of all competing malls.
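A patronage probability grid of this kind can be approximated as follows. The sketch assumes planar coordinates, straight-line distances, an attractiveness exponent of 1 and a distance decay exponent of 2; the mall locations, GLA values and cell size are hypothetical and are not taken from the study data.

```python
import numpy as np

# Hypothetical malls: x, y (planar units) and Gross Leasable Area.
malls = np.array([[2.0, 2.0, 500.0],
                  [8.0, 3.0, 300.0],
                  [5.0, 8.0, 400.0]])
beta = 2.0  # assumed distance decay exponent

# Cell centres of a 20 x 20 grid; a customer is assumed at every cell centre.
xs = np.arange(0.25, 10.0, 0.5)
ys = np.arange(0.25, 10.0, 0.5)
gx, gy = np.meshgrid(xs, ys)

# Distance from every cell to every mall: shape (rows, cols, n_malls).
d = np.hypot(gx[..., None] - malls[:, 0], gy[..., None] - malls[:, 1])

# Huff utilities and probabilities for all malls at once.
utility = malls[:, 2] / d ** beta
surface = utility / utility.sum(axis=-1, keepdims=True)

# Probability surface for the selected mall (the first one here).
selected = surface[..., 0]
```

The cell centres are offset from the mall coordinates so that no distance is exactly zero, which would otherwise cause a division by zero.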
The patronage probability surface was converted to regions of probability. It is possible to select any number of regions. For this example, ten regions of probability were chosen. The regions are delineated using contours shown as white lines in Figure 1. The data was then extracted for each region from underlying census polygons (Table 1). The numbers in Table 1 include all households in the study area.
Table 1 was then summarized to show the totals for each variable (Table 2).
Now the probability values came into play. They were used as weights for scaling down the numbers from Table 1 to make them more realistic. Each of the first four columns in Table 1 (Population, Dwellings, Families and Households), representing absolute values, was multiplied (weighted) by the midpoint value of every probability region. For example, the midpoint value for the 0 – 0.1 region equals 0.05. Table 3 presents a more realistic market area profile that is based on weighted data. The last two columns in Table 1 were not weighted because they represent relative values and therefore are not included in Table 3.
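The weighting step amounts to multiplying each region's count by the midpoint of its probability range and summing. In the sketch below the household counts per region are invented for illustration; they are not the values from Table 1.

```python
import numpy as np

# Ten probability regions: 0-0.1, 0.1-0.2, ..., 0.9-1.0.
edges = np.linspace(0.0, 1.0, 11)
midpoints = (edges[:-1] + edges[1:]) / 2   # 0.05, 0.15, ..., 0.95

# Hypothetical household counts extracted for each region.
households = np.array([40000, 25000, 15000, 10000, 8000,
                       5000, 3000, 2000, 1500, 500])

weighted = households * midpoints           # scaled-down counts per region
share = weighted.sum() / households.sum()   # weighted share of all households

print(weighted.sum(), round(share, 3))
```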
Table 3 was then summarized to show the totals for each variable (Table 4).
A comparison of Tables 2 and 4 suggests that the number of households patronizing this mall will be less than 10% of all households located in the study area. This figure was determined by dividing the weighted number of households (10,203) by the total number of households (110,810), which gives about 9.2%.
Trade area analysis for a single site using multiple variables for site attractiveness
The difference between Examples 1 and 2 is that in Example 2 more than one variable was used to build an attractiveness index. “The value of the model depends on the ability to incorporate a number of different measures of store attractiveness” (Jones, Simmons 1993, p.345). If more variables are included, it is easier to understand variation in patronage patterns. In addition to the GLA, in this example the number of stores in each mall and the number of parking spaces were also considered as attractiveness attributes. These three variables were converted to z-values and added together to make up an attractiveness index. Some of the attractiveness index values were negative and the software could not calculate a surface. To alleviate this problem, the absolute value of the most negative index value was added to all attractiveness values to shift them out of the negative range, so that the software would be able to handle the task. Figure 2 shows the probability surface created for the same mall using an attractiveness index composed of three characteristics instead of one. These three variables are correlated to some degree.
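The construction of the attractiveness index can be sketched as follows. The mall attributes are hypothetical, and the use of the population standard deviation for the z-values is an assumption, since the article does not say which form was used.

```python
import numpy as np

# Hypothetical malls (rows) with GLA, number of stores, parking spaces (columns).
X = np.array([[600000.0, 150.0, 3000.0],
              [400000.0, 100.0, 2500.0],
              [150000.0,  40.0,  800.0],
              [ 90000.0,  25.0,  500.0]])

# Convert each variable to z-values, then add them into one index per mall.
z = (X - X.mean(axis=0)) / X.std(axis=0)
index = z.sum(axis=1)

# Shift the index so that no value is negative.
shifted = index + abs(index.min())

print(index.round(2), shifted.round(2))
```

With this shift the least attractive mall ends up with an index of exactly zero, and therefore a zero patronage probability everywhere. Adding a slightly larger constant would avoid that; whether this was done in the original analysis is not stated.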
Comparison of maps in Figures 1 and 2 indicates that considering more attractiveness variables increases the drawing power of the analyzed mall. The extent of isolines has changed (for an example see isoline labeled with the value of 0.3).
The new surface was converted to regions of probability and data was then extracted for each region from underlying census polygons (Table 5). The numbers in Table 5 include all households in the study area.
Table 5 was then summarized to show the totals for each variable (Table 6).
Each of the first four columns in Table 5 (Population, Dwellings, Families, Households), representing absolute values, was weighted by the midpoint value for every probability region. The result is shown in Table 7. Again, the last two columns in Table 5 were not weighted because they represent relative values and therefore are not included in Table 7.
Table 7 was then summarized to show the totals for each variable (Table 8).
The numbers in Table 8 are significantly larger than in Table 4. The number of households patronizing the studied mall increased by 32.2% and this mall would now capture 12.3% of households in the study area. The latter number was obtained by dividing the weighted number of households (13,487; Table 8) by the total number of households (110,030; Table 6).
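The percentages quoted in Examples 1 and 2 can be reproduced from the household totals reported in the text:

$$\frac{10{,}203}{110{,}810} \approx 0.092, \qquad \frac{13{,}487}{110{,}030} \approx 0.123, \qquad \frac{13{,}487 - 10{,}203}{10{,}203} = \frac{3{,}284}{10{,}203} \approx 0.322$$

The first two ratios are the shares of households captured with one and with three attractiveness variables, and the third is the relative increase in weighted households between the two examples.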
Comparison of potential revenue for two shopping centers
This example involves the calculation of potential revenue for two shopping centers using the same data sets and the same study area as in the previous examples. The first site used in this comparison is Micmac Mall for which the patronage probability grid was previously created (Figure 2). The Halifax Shopping Centre was selected as the second site. The patronage probability grid for the Halifax Shopping Centre was also calculated using z values (Figure 3).
Additional data included potential spending on food, furniture and clothing (Table 9). The area of each polygon was then calculated. Knowing the grid cell size, it was easy to calculate the number of cells in each polygon. The potential spending in each category was then divided by the number of cells in order to get spending per single cell. These steps were necessary to prepare data suitable for rasterization. Table 9 shows a portion of a larger table with data prepared in this manner.
Three grids were produced for three spending categories (Figure 4). The dollar values shown in the legend of each grid represent the spending potential per single grid cell.
Values of each of these grids shown in Figure 4 were then multiplied by the values of the patronage probability grid for each shopping centre. This produced the total of six grids with weighted dollar values: three for each mall (Figures 5 and 6). This is the most interesting part of this example, allowing for an estimation of potential revenue each mall may anticipate from selling food, furniture and clothing.
The last step was to aggregate weighted grid data shown in Figures 5 and 6 and calculate totals ($) for each shopping centre and each category. The aggregation of data took place within a region that was created by combining polygons within the study area (Figure 7).
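The sequence of steps, from spending per cell to an aggregated revenue estimate, can be illustrated with a toy grid. All numbers are hypothetical: a single census polygon exactly covered by a 5 x 5 block of cells, one spending category, and an invented probability grid.

```python
import numpy as np

# One hypothetical census polygon and one spending category.
cell_size = 100.0            # metres
polygon_area = 250000.0      # square metres
polygon_spending = 500000.0  # dollars of potential spending in the polygon

# Number of cells in the polygon and spending per single cell.
n_cells = polygon_area / cell_size ** 2
spending_per_cell = polygon_spending / n_cells

# Rasterized spending grid for the polygon.
spending_grid = np.full((5, 5), spending_per_cell)

# Hypothetical patronage probability grid for one shopping centre.
prob_grid = np.linspace(0.1, 0.5, 25).reshape(5, 5)

# Weight the spending by probability, then aggregate over the region.
weighted_grid = spending_grid * prob_grid
potential_revenue = weighted_grid.sum()

print(n_cells, spending_per_cell, potential_revenue)
```

In a real study the aggregation would be limited to the cells that fall inside the combined study-area region, for example with a boolean mask of the same shape as the grids.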
The final results of data aggregation are shown in Table 10. They indicate that Micmac Mall may slightly ‘outperform’ the Halifax Shopping Centre.
The characteristics of both shopping centers that served as attractiveness variables in the Huff model may help in understanding this situation (Table 11).
These numbers indicate that considering only attractiveness characteristics, both malls would indeed demonstrate a similar drawing power.
Modeling a market scenario – more complex trade area analysis involving the use of customer spotting data, information on shopping trips and model calibration
This exercise was completed using different software that allows for more complex trade area analysis. The software uses a new version of the Huff model that accepts multiple predictor variables and estimates the parameters using the ordinary least squares method. The predictor variables include store (destination) attributes, census polygon (origin) characteristics and the distance from origins to destinations. With this new version, it is possible to model a variety of market scenarios, such as the addition or closure of a store, and determine the impact this will have on existing stores. A different study area was used for this example (Figure 8). There are three stores in this study area: A, B, and C, representing the same retail chain. The study area was determined based on customer spotting data and included polygons that make up the primary (60% of customers) and secondary (next 25% of customers) trade area for all three stores.
Three data sets are required.
- Destinations (stores) and their attributes. Only two attributes (retail square footage and warehouse square footage) were used as predictor (formerly attractiveness) variables, but many more would be desired (Table 12).
- Origins (small census units) and their attributes. Attributes such as the number of households and average household income could be included, but to keep this example simple only product expenditures were considered as an origin characteristic (Table 13).
- Scenario data. Scenario data indicate the patronage between a given origin and destination and the distance between them. Distance can be calculated as road mileage or travel time. If this information is not available, straight line distances can easily be calculated from coordinates for the store locations and origin centroids included in Tables 12 and 13. Table 14 illustrates sample scenario data. Because the table does not include the column Distance, straight line distances will be calculated when the model is first run.
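Straight-line distances of the kind mentioned in the last item can be computed directly from coordinates. The sketch below assumes projected (planar) coordinates in consistent units; for latitude and longitude a great-circle formula would be needed instead. The coordinates are hypothetical.

```python
import numpy as np

# Hypothetical planar coordinates (x, y) of three stores and two origin centroids.
stores = np.array([[0.0, 0.0],
                   [3.0, 4.0],
                   [6.0, 8.0]])
origins = np.array([[0.0, 3.0],
                    [3.0, 0.0]])

# Pairwise Euclidean distances: rows are origins, columns are stores.
diff = origins[:, None, :] - stores[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

print(dist.round(2))
```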
The scenario data is the most important of the three data sets because it is required for calibration of the Huff model. It includes actual patronage information from a sample of households in each origin within the study area. Such data is usually compiled based on a customer survey. The values of proportions in Table 14 indicate the proportion of shopping trips that were made to a particular store from a specific origin. Each origin is listed three times in Table 14 because there are three stores. The sum of the proportion values for each origin should be one.
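One commonly described way to calibrate the model with ordinary least squares is to take logarithms and centre each variable on its mean within an origin, which makes the model linear in its parameters. The sketch below follows that idea for a single attractiveness variable. It is an assumption that the software works along these lines; the function name and the synthetic data, generated from known parameter values so the estimates can be checked, are hypothetical.

```python
import numpy as np

def calibrate_huff(proportions, attractiveness, distances):
    """Estimate the attractiveness and distance decay exponents by OLS."""
    P = np.asarray(proportions, dtype=float)      # (n_origins, n_stores), all > 0
    A = np.asarray(attractiveness, dtype=float)   # (n_stores,)
    D = np.asarray(distances, dtype=float)        # (n_origins, n_stores)

    # The proportions for each origin should sum to one.
    assert np.allclose(P.sum(axis=1), 1.0)

    log_p = np.log(P)
    log_a = np.broadcast_to(np.log(A), P.shape)
    log_d = np.log(D)

    # Centre each variable on its mean across stores, origin by origin.
    y = (log_p - log_p.mean(axis=1, keepdims=True)).ravel()
    x_a = (log_a - log_a.mean(axis=1, keepdims=True)).ravel()
    x_d = (log_d - log_d.mean(axis=1, keepdims=True)).ravel()

    coef, *_ = np.linalg.lstsq(np.column_stack([x_a, x_d]), y, rcond=None)
    return coef[0], -coef[1]   # (alpha, beta)

# Synthetic scenario data generated with alpha = 1.2 and beta = 1.8.
sqft = np.array([3000.0, 4500.0, 2500.0])
dist = np.array([[1.0, 2.0, 3.0],
                 [2.5, 1.0, 2.0],
                 [3.0, 2.0, 1.0],
                 [1.5, 3.5, 2.5]])
utility = sqft ** 1.2 * dist ** -1.8
observed = utility / utility.sum(axis=1, keepdims=True)

alpha_hat, beta_hat = calibrate_huff(observed, sqft, dist)
print(round(alpha_hat, 3), round(beta_hat, 3))
```

Because the synthetic proportions contain no noise, the estimates match the generating values. With survey data the fit would be approximate, and origins with a zero proportion for some store would need special handling before taking logarithms.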
The calibration of the model was completed with the use of scenario data. Several reports, graphs and maps are produced for evaluation of results. The statistical report summarizes the regression results and statistics for predictor variables. Table 15 presents one of the calibration reports. The third column shows the actual proportions entered as input data, the fourth column is populated with the expected proportions calculated from the model, and the last column shows the differences. A positive difference indicates that there were actually fewer shopping trips to a particular store from a given origin than the model predicts. This means that the three origins listed in Table 15 are ‘underperforming’ in terms of the patronage of store B. Negative differences indicate that the origins are ‘outperforming’ compared with what the model indicates. Store C represents this situation.
Another calibration report shows market share and how the total sales are shared between stores. The Huff model calculates market shares by applying probabilities to the expenditure data for all origins. The highest market share belongs to store B, followed by store C.
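Applying probabilities to expenditure data can be sketched as a weighted sum over origins. The probabilities and expenditures below are hypothetical and are not the values from the calibration reports.

```python
import numpy as np

# Hypothetical patronage probabilities: rows are origins, columns are stores A, B, C.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])

# Hypothetical product expenditure for each origin, in dollars.
expenditure = np.array([100000.0, 200000.0, 100000.0])

# Expected sales per store: each origin's expenditure split by its probabilities.
sales = P.T @ expenditure
market_share = sales / expenditure.sum()

print(sales, market_share)
```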
These and other results, also for individual stores, can be analyzed using maps and graphs resulting from calibration.
Once the model calibration is complete, it is possible to simulate various market scenarios. Table 17 demonstrates the changes in proportions when a new store is added to the market. The retail and warehouse square footage of the new store are equal to 4,000. All stores lost some portion of patronage in favor of the new store.
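The effect of adding a store can be illustrated by recomputing the probabilities with one more destination. The sketch uses a single attractiveness variable (retail square footage) and assumed parameter values. The existing store sizes and all distances are hypothetical; only the 4,000 square feet of the new store comes from the text.

```python
import numpy as np

def huff(attractiveness, distances, alpha, beta):
    """Huff probabilities: rows are origins, columns are stores."""
    u = np.asarray(attractiveness, float) ** alpha * np.asarray(distances, float) ** (-beta)
    return u / u.sum(axis=1, keepdims=True)

alpha, beta = 1.0, 2.0   # assumed values standing in for calibrated parameters

# Existing stores A, B, C (hypothetical square footage and distances).
sqft = np.array([3000.0, 5000.0, 3500.0])
dist = np.array([[1.0, 2.5, 4.0],
                 [3.0, 1.5, 2.0]])
before = huff(sqft, dist, alpha, beta)

# Scenario: add a new store of 4,000 square feet.
sqft_new = np.append(sqft, 4000.0)
dist_new = np.column_stack([dist, [2.0, 2.5]])
after = huff(sqft_new, dist_new, alpha, beta)

# Change in proportions for the existing stores.
change = after[:, :3] - before
print(change.round(3))
```

Every entry of the change is negative: the new store adds a positive term to the denominator of the formula, so each existing store gives up some patronage, which is consistent with the pattern described for Table 17.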
Table 18 illustrates changes in market share and total sales.
Again, there is much more to explore. An impressive number of possible ways to further analyze the results of this new market scenario using graphs, maps and other reports is available to the user.
There are a few commercially available GIS software packages that incorporate the Huff model, and even fewer that allow for parameter estimation and model calibration. The software used to complete the examples presented in this article should therefore be named. Examples 1-3 were completed with MapInfo’s Vertical Mapper, and for the last exercise MPSI Systems’ Huff’s Market Area Planner was used. Unfortunately, MPSI Systems does not have plans to update or enhance the software.
David Huff explains why most GIS packages do not provide a statistical capability for calibrating the model (Huff 2004, p.2). The first reason is that in order to calibrate the model, scenario data is required, and this data can be expensive to collect. However, this looks more like a user’s problem than a vendor’s problem. Huff also states that “the statistical analysis required to calibrate the model may be perceived as being too difficult” and that additional functionality would have to be included in GIS software to perform various statistical analyses. Huff also says that the model “has not always been employed correctly and its full potential has not been realized” (Huff 2003). This is a matter of time. The inclusion of the Huff model in Business Analyst from ESRI is a major step forward. There are many applications for this model not only in the retail environment but also in other business areas.
Finally, it is interesting to note that the Huff model is the only one discussed in the “Trade Area Analysis” chapter in the “Geographic Information Systems in Business” book, published in 2005 (Pick 2005), most likely because of its superiority to other methods.
References
An Update from MPSI, 2005. An interview with Jim Auten, MPSI’s president and CEO. Directions Magazine, March 22
Gross, B., 1997. Dynamic Trade Areas. Business Geographics, September, 30-33 (not available online)
Hernandez, T., Lea, A.C., and Bermingham, P., 2004. What’s in a Trade Area? Publications of the Centre for the Study of Commercial Activity, Ryerson University, Toronto
Hooper, H., 1997. Who’s Really Shopping My Store? Business Geographics, September, 34-36 (not available online)
Huff, D.L., 1963. A Probabilistic Analysis of Shopping Center Trade Areas. Land Economics 39: 81-90
Huff, D.L., 2003. Parameter Estimation in the Huff Model. ArcUser, October-December, 34-36
Huff, D.L., 2004. A Note on the Misuse of the Huff Model in GIS. Retrieved from www.mpsisys.com on January 16, 2004
Huff, D.L., 2005. The Use of Geographic Information Systems and Spatial Models in Market Area Analysis. ESRI GeoInfo Summit, April 18-19, Chicago
Lea, A.C., 1998a. An Arsenal of Trade Areas. Business Geographics, August, 34-35
Lea, A.C., 1998b. Misuses and Abuses of the Trade Area Concept, GIS’98/RT’98 Conference Proceedings, 140-143, Toronto
Lea, A.C., 1998c. Trade Areas: Concepts, not Polygons. Business Geographics, February, p.18
Lea, A.C., 2005. Site Evaluation and Sales Estimation Modelling for Retailers and Banks. ESRI GeoInfo Summit, April 18-19, Chicago
Jones, K., Simmons, J., 1993. Location, Location, Location. Nelson Canada
Peterson, K., 1997. A Trade Area Primer. Business Geographics, September, 18-21
Pick, J. B., 2005. Geographic Information Systems in Business. Idea Group Publishing
Simmons, W., 1998. Defining Trade Areas. Business Geographics, September, 28-30