The Search for the Silver Bullet: Building the Ultimate Sales Forecasting Model for Retailers

During the past 40 years I have had the pleasure of developing, using and/or critiquing several dozen different approaches to location-based sales forecasting tools for retailers. While quite simple tools have worked well for some retailers (e.g., Quiznos opened several hundred locations with a checklist scoring model), other retailers have insisted on the most sophisticated analytics available and have invested in the staff, data and equipment needed to make this possible (e.g., Whole Foods, McDonald's and Walgreens).

To put all of these different approaches into perspective I thought it might be interesting to rank them based on the percentage of a total solution each approach could provide. As you will quickly see, my ranking cannot be taken too literally because in most solutions today more than one approach is used. For example, most models would combine some type of GIS analysis with non-GIS data and other analytical procedures. With this qualification in mind, here is my ranking.

BASIC STATISTICAL OR LOGICAL METHODS

5% Solution: Find a Location That Is … Near My Home, Cheap and Available, or In Florida

Rationale: These are actually real strategies for many small retailers. I've done lots of work on locations that were near the owner's home or represented an outpost location in Florida.

10% Solution: The Site Checklist: A simple tool that can be very strong and helpful or a rigid roadblock depending on how it was developed and applied.

Rationale: The user simply fills out the checklist and scores or weights are added to each answer, then summed for a final total.
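
As a minimal sketch of the checklist idea, assuming invented questions, answers and point values (none of them come from a real client checklist), the scoring amounts to a lookup and a sum:

```python
# Hypothetical checklist: each question maps every allowed answer to points.
CHECKLIST = {
    "visibility": {"poor": 0, "fair": 5, "good": 10},
    "parking": {"none": 0, "shared": 3, "dedicated": 8},
    "corner_lot": {"no": 0, "yes": 6},
}

def score_site(answers, checklist=CHECKLIST):
    """Add up the points attached to each answer on the checklist."""
    total = 0
    for question, options in checklist.items():
        total += options[answers[question]]
    return total

site_a = {"visibility": "good", "parking": "shared", "corner_lot": "yes"}
site_b = {"visibility": "fair", "parking": "dedicated", "corner_lot": "no"}

print(score_site(site_a))  # 19
print(score_site(site_b))  # 13
```

Whether such a tool helps or becomes a roadblock depends on where the point values come from, which this sketch does not address.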

20% Solution: Multiple Regression Analysis and other Statistical Models: These powerful, accessible tools have lots to offer for understanding real-world patterns but are also the source of tens of thousands of poor retail locations or "dogs."

Rationale: Typically the user sets up a prediction equation with known variables such as demographics or site features on one side of the equation and store sales as the criterion; then, poof, like magic an "optimal" weighting of the predictor variables appears. It's very unusual for such a solution to be a useful, reliable prediction model.
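
To make the mechanics concrete, here is a small sketch of ordinary least squares written in plain Python. The six stores, their predictors (population and a sign flag) and their sales figures are invented, and the sales were constructed to be exactly linear, so the perfect fit says nothing about how a real model would forecast:

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(rows, y):
    """Least-squares weights from the normal equations (X'X) w = X'y."""
    x = [[1.0] + list(r) for r in rows]  # prepend a constant for the intercept
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(weights, row):
    """Apply the fitted weights to one site's predictors."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))

# Invented stores: (trade-area population in 000s, has interstate sign 0/1).
stores = [(10, 0), (20, 1), (30, 0), (40, 1), (50, 0), (25, 1)]
sales = [120, 170, 160, 210, 200, 180]  # annual sales in $000, made up

weights = fit_ols(stores, sales)
print([round(w, 3) for w in weights])      # intercept, population, sign
print(round(predict(weights, (35, 1)), 3)) # forecast for a new site
```

The weights that come back are "optimal" only for the stores used to fit them; checking them against stores held out of the fit is what separates a back prediction from a forecast.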

25% Solution: Lifestyle Models: Find locations that have a large number of people living in the trade area who represent the best customer segments identified in a lifestyle analysis for the client.

Rationale: A popular approach used by many major companies for location analysis, but one that is guaranteed to produce a very weak model unless it is heavily qualified.

ADVANCED MATHEMATICAL OR SPATIAL METHODS

30% Solution: Neural Networks: Great tools for stock market analysis and other kinds of forecasts where the quantity and quality of data lead to manageable noise but likely to be disastrous, unless constrained, in a site selection context.

35% Solution: Similar Attribute Models: The idea of finding sites with attributes similar to your good sites is not a bad idea, but when used as the primary modeling methodology, this approach has many pitfalls, especially without a clear understanding of market interactions and scenarios.

40% Solution: Gravity and Spatial Interaction Models: A large independent class of modeling techniques that for many kinds of retail such as grocery or C-store/fuel are often the only approach used.
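
One widely known member of this family is the Huff model, in which a store's pull on a shopper rises with store size and falls with distance. The sketch below is an illustration of that general idea, not a description of any particular vendor's model; the store sizes, distances and the distance exponent `beta` are all assumed values:

```python
def huff_shares(sizes, distances, beta=2.0):
    """Probability that a shopper picks each store, Huff-style.

    Attraction of a store = size / distance ** beta; each store's share is
    its attraction divided by the total attraction of all competing stores.
    """
    attraction = [s / d ** beta for s, d in zip(sizes, distances)]
    total = sum(attraction)
    return [a / total for a in attraction]

# Hypothetical neighborhood choosing between two grocery stores.
sizes = [40_000, 20_000]   # selling area in square feet (assumed)
distances = [2.0, 1.0]     # miles from the neighborhood (assumed)

shares = huff_shares(sizes, distances)
print([round(p, 3) for p in shares])  # [0.333, 0.667]
```

Here the smaller store wins two thirds of the trips because it is half as far away; in practice the exponent has to be calibrated from observed shopping behavior.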

45% Solution: Hierarchical Models: Another large category of models ranging from standard analytical procedures such as Chi-squared Automatic Interaction Detection (CHAID) to a variety of expert systems.

Rationale: These models are often used as components in other modeling systems such as genetic algorithms.

50% Solution: Genetic Algorithms: One of my favorite tools because it is flexible enough to integrate almost every other modeling approach as a component to solve specific sub-problems.

Rationale: The process here works much like a statistical approach: you feed the genetic algorithm predictors and a criterion, and evolve a solution over many generations.
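
A bare-bones sketch of that loop is shown below. It evolves the weights of a simple linear scoring rule; the store data, population size, mutation scale and number of generations are all assumptions chosen for illustration, and a production system would use far richer components:

```python
import random

# Invented data: each row is [constant, population in 000s, has sign 0/1].
ROWS = [[1, 10, 0], [1, 20, 1], [1, 30, 0], [1, 40, 1], [1, 50, 0], [1, 25, 1]]
SALES = [120, 170, 160, 210, 200, 180]  # the criterion, in $000 (made up)

def error(weights, rows=ROWS, sales=SALES):
    """Sum of squared errors of a linear scoring rule (lower is fitter)."""
    return sum((sum(w * x for w, x in zip(weights, r)) - s) ** 2
               for r, s in zip(rows, sales))

def evolve(pop_size=30, generations=200, seed=0):
    rng = random.Random(seed)
    n = len(ROWS[0])
    population = [[rng.uniform(-10, 10) for _ in range(n)]
                  for _ in range(pop_size)]
    history = []  # best error seen in each generation
    for _ in range(generations):
        population.sort(key=error)
        history.append(error(population[0]))
        parents = population[: pop_size // 2]  # selection: keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [g + rng.gauss(0, 0.5) for g in child]    # mutation
            children.append(child)
        population = parents + children  # parents survive unchanged (elitism)
    population.sort(key=error)
    history.append(error(population[0]))
    return population[0], history

best, history = evolve()
print("best error, first generation:", round(history[0], 1))
print("best error, last generation:", round(history[-1], 1))
```

Because the fitter half of each generation is carried forward unchanged, the best error can never get worse from one generation to the next. The flexibility comes from the fitness function: it can wrap almost any other model, not just a linear rule.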

SPECIAL MODELS OR APPROACHES TO INCREASE PREDICTION ACCURACY

60% Solution: Regional Models: Adding regional component models that reflect demographic, lifestyle, behavioral and marketing biases will almost always improve the overall prediction accuracy.

65% Solution: Day-part Models: The factors that drive breakfast, lunch and dinner sales are always different to some degree; yet, we seldom look at these separately in our forecasts.

70% Solution: Market Interaction Models: Most retail models confound within-market and between-market variance, which can lead to artificially high levels of back prediction and lousy forecasts. Market interaction models fix this problem.
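
One simple way to see the confounding (a sketch of the general statistical issue, not of any specific market interaction model) is to compare a pooled slope with the slope that remains after each market's own average is subtracted out. The markets, traffic counts and sales below are invented:

```python
from statistics import mean

def slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def demean_by_market(markets, values):
    """Subtract each market's mean so only within-market variation remains."""
    groups = {}
    for m, v in zip(markets, values):
        groups.setdefault(m, []).append(v)
    means = {m: mean(vs) for m, vs in groups.items()}
    return [v - means[m] for m, v in zip(markets, values)]

# Invented stores in two markets; market B is simply a bigger market overall.
markets = ["A", "A", "A", "B", "B", "B"]
traffic = [10, 12, 14, 30, 32, 34]        # daily traffic count in 000s
sales = [500, 510, 520, 900, 910, 920]    # annual sales in $000

pooled = slope(traffic, sales)
within = slope(demean_by_market(markets, traffic),
               demean_by_market(markets, sales))

print(round(pooled, 2))  # 19.61
print(round(within, 2))  # 5.0
```

The pooled slope is nearly four times the within-market slope because most of it reflects the difference between markets, which is of no help when choosing among sites inside one market.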

75% Solution: Transparent Models: Models where the weights directly reflect the importance of that component to sales. For example, learning that "a logo sign on the interstate is worth $40,000 in annual sales" is a simple version of this concept. Complex variations are, well, much more complex!

Rationale: In this approach you attempt to translate the model weights into sales numbers so that you can see how important each variable is to the final sales prediction.
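
In its simplest additive form this looks like the sketch below. Only the $40,000 interstate sign figure comes from the example above; the base sales and the other weights are hypothetical placeholders:

```python
# Dollar value of each component. Only the sign figure is from the text;
# the base and the other two weights are made up for illustration.
BASE_SALES = 600_000
WEIGHTS = {
    "interstate_logo_sign": 40_000,  # per sign
    "drive_thru": 120_000,           # hypothetical
    "parking_spaces": 1_500,         # hypothetical, per space
}

def explain(site):
    """Return the forecast and each variable's dollar contribution to it."""
    contributions = {name: WEIGHTS[name] * site.get(name, 0) for name in WEIGHTS}
    forecast = BASE_SALES + sum(contributions.values())
    return forecast, contributions

site = {"interstate_logo_sign": 1, "drive_thru": 1, "parking_spaces": 20}
forecast, contributions = explain(site)

print(forecast)       # 790000
print(contributions)  # dollars attributed to each variable
```

Every line of the forecast can be read as dollars, which is what makes the model easy to challenge and to explain to a real estate committee.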

80% Solution: Prototype Models: Specific models for different retail prototypes definitely will improve the final modeling solution.

Rationale: This approach has been common for some time with the best modeling vendors and is implemented by building different models for each prototype.

85% Solution: Customer Source Models: Ultimately, predicting retail sales is about predicting customer volumes, and all customer sources (HOME, WORK, SHOPPING, etc.) are driven by different factors.

Rationale: In this method customer data is collected from each store representing the number of customers for each source. As you study sales models with multiple sources you quickly see that BALANCE is the key to high performing stores.

90% Solution: CHARMS Categorical + Continuous Models: Sales forecasts depend on using the best combination of continuous and categorical information in your equations. In most modeling systems this problem is ignored. CHARMS is an automatic, non-linear process I developed for Experian to help solve this problem.

Rationale: Using this process involves separating categorical and continuous prediction variables before you start, then testing the value of each predictor variable with the statistical noise eliminated.

100% Solution: Scenario Models: Small towns, commercial settings, residential settings and urban settings all require different approaches for predicting sales, yet typically they are lumped together in one model. Use a scenario approach to get a huge improvement in your model's forward prediction accuracy.

Rationale: This approach involves defining the relevant scenarios for the retailer and then building a model for each scenario. It is seldom used but always needed because no current modeling approach is robust enough to work well across scenarios without fudging.
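
A minimal sketch of the idea, assuming just two scenarios and a single predictor (both invented, as are the numbers), is to fit one small model per scenario and route each new site to the model for its scenario:

```python
from statistics import mean

def fit_line(x, y):
    """Simple least-squares line; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

# Invented history: households in the trade area (000s) and sales ($000).
history = {
    "small_town": {"households": [5, 10, 15], "sales": [300, 500, 700]},
    "urban": {"households": [50, 100, 150], "sales": [600, 700, 800]},
}

# One model per scenario instead of one model for everything.
models = {name: fit_line(d["households"], d["sales"])
          for name, d in history.items()}

def forecast(scenario, households):
    """Route the site to the model built for its scenario."""
    intercept, slope = models[scenario]
    return intercept + slope * households

print(round(forecast("small_town", 12), 1))  # 580.0
print(round(forecast("urban", 120), 1))      # 740.0
```

In this toy data an extra thousand households is worth $40,000 in a small town but only $2,000 in an urban setting, a difference that a single lumped model would average away.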

And, yes, there is the 110% solution that many of you are already using. It includes data from social media, real-time POS systems, cell phones, Google ads and customer counting video systems all integrated into a sophisticated GIS framework. This discussion will have to wait for another time.

I hope my list gives you some idea of the options that are available for retail sales forecasting. It is pretty obvious, looking at these options, that the best models will need to combine aggregated or specific kinds of spatial data with many other kinds of information that can classify stores, locations or markets into useful categories or scenarios. If you are currently using only one of these approaches you have lots of forecasting power to gain by expanding your thinking. Good luck with your prediction models!