Does it really matter which data vendor you use for demographic data? The answer may surprise you. Now, I'm not talking about the cost of data or the service you receive from one company over another. No, what I'm talking about is the actual data you get.
For those of you who can't wait, skip down to "And the Results Are..." toward the end of this article to see the results. Hint: the article's title tells you the results.
A Demographic Dilemma
Just like you, in conducting a comparison of demographic suppliers, I was faced with a number of vendors. I also had to decide which geographic areas to examine and which demographic variables to compare.
For data suppliers, I tried to choose five suppliers that most of you would encounter: Claritas, ESRI, MapInfo, Applied Geographic Solutions, and the US Census. Actually, I threw in the US Census as a control for comparison against actual figures. The exclusion of other vendors simply means I had to draw a line in the sand somewhere. (For more information about the chosen vendors, please see the addendum "About the Vendors" at the end of this article.)
The geographic areas posed an even greater challenge. I elected to use standard census geographies (census tracts, census tract aggregations, and counties) for the comparison. I did this to make things as comparable as possible without introducing extraneous factors. Although radii are most commonly used, I wanted to avoid potential differences in data retrieval methods that might come into play using radii. I did, however, try to make the aggregated census tracts roughly the size of a three-mile radius.
The other challenge was identifying common variables across all of the vendors. As simple as it might sound, it actually took some work to get comparable variables. For example, on age categories, I ended up adding together many of the younger age groups to form a single Age 5-24 years group, as each vendor had its own breaks that created multiple overlaps across these years.
A Question of Geography
As I mentioned, I used census tracts, aggregated census tracts, and counties for my comparison. The final geographic selection was somewhat arbitrary but covers a range of positive and declining growth areas around the United States.
I used growth patterns from 1990 to 2000, as defined by the US Census Bureau, to identify 32 possible candidates. Sixteen of these were cities and 16 were counties. The geographies had to have a population of at least 40,000 and less than 2 million.
Each group of 16 was broken down into four categories: high-growth, medium-growth, slow- to no-growth, and declining-growth. The rough growth ranges were, respectively, 20% or more, 5% to 15%, 3% to -3%, and a decline of more than 5%.
Most interesting were the geographies that showed moderate to high growth or large declines. If an area's growth was static from 1990 to 2000, it's not that hard to project out what things will look like in 2001 or even 2006 for that matter.
I elected to use five areas representing two declining areas and three moderate to high growth areas. I chose one county and one area of aggregated census tracts for each. I then threw in another aggregated census tract area in a moderate to high growth market since these would tend to be the areas most focused on by companies. I also looked at the individual census tracts used to create the census tract aggregations.
The final cut, shown with their 1990-2000 population growth in parentheses, is: Aroostook County, Maine (-15%); Will County, Illinois (41%); western Fort Wayne, Indiana (6%); Scottsdale, Arizona (237%); and central Birmingham, Alabama (-10%). (For a precise definition of each geographic area, please see "The Geographies Defined" section at the end of this article.)
I have the vendors and the geographies; now it is time to see how they compare. I chose a handful of variables that measured both quantitative characteristics, such as population or median household income, and qualitative characteristics, such as percent white population or percent population aged 0-4 years. (You can see the complete variable list at the end of this article.)
Going into the analysis, I assumed that there would be no difference in the data supplied by the various vendors. (For those of you with some statistical background, this was my null hypothesis.) Therefore, unless one vendor's data showed a marked difference from the others, it was deemed equivalent.
Marked differences were defined in a straightforward manner. The values for each individual variable being analyzed were averaged across the four vendors. For example, the four current year total population values for Aroostook County, ME are 73131, 73303, 73526, and 73507. The average of these four values is 73367. Each individual value was then statistically compared to this average value. (For more information about the statistical test used, please see the addendum "The Tau1 Test" at the end of this article.)
This same test was used to see if the values supplied differed significantly from the values provided through the US Census Bureau. This test would demonstrate whether current year estimates and five-year projections differed significantly from the year 2000 census values or the 1990 census values in the case of income.
And the Results Are...
Overall, the data supplied was remarkably consistent across the vendors. Compared with Census data, however, the current year estimates showed some differences and the five-year projections showed many.
Except for Aroostook County, Maine, differences between five-year projections on percent age groups and 2000 census data were significant. All five areas showed significant differences in income values with the 1990 census data. (At this writing, the 2000 census income figures were not available.) Race differences from the US census were evident for Scottsdale and Birmingham.
All areas showed significant differences from the 2000 census data in population or households except for Birmingham, Alabama, which showed no differences.
The implication is that even though current year estimates are only one year out from the 2000 census data, they are already showing significant differences from the census data when it comes to population counts. Unless you are working in fairly static growth areas, as represented by changes from 1990 to 2000, it is wise to consider getting current year estimates for the area.
As for variations with individual vendors, ESRI showed itself to be significantly different from the other vendors for population estimates and projections in the two high growth markets. In one instance they provided the lowest values, in the other the highest. (Please note that the test conducted makes no assertion that certain values are correct or wrong, only that they are different.)
The only difference in income values was found for Birmingham, Alabama. In this market, MapInfo's 2001 per capita income figure differed from the rest. For median household income and average household income, MapInfo's data was consistent with the other vendors.
Race and ethnicity differences were found in Will County, Illinois, where ESRI's five-year projection of % Hispanics showed a significant difference.
Some age categories also showed statistical differences. However, since the actual values for the age categories tended to be small, the actual difference in practical terms would not matter. (How much difference does it make that one vendor might have a value of 1.6% when the average was 1.2%?)
The results of the individual census tract analysis were consistent with the aggregated tract data results. Only four of the thirty-four tracts showed differences, which were the drivers for the results cited at the aggregated tract level.
As you can see, very few significant differences exist between the vendors. For all intents and purposes, you should be able to freely substitute one vendor's data for another's and still draw the same conclusions about the market.
When buying demographic data, you have many options and things to consider. However, one thing you don't need to worry about is the data itself. Although only a small number of geographies were used in this analysis, it appears that the data quality between vendors is very consistent.
Knowing this can simplify your decision as to which vendor to use. You don't have to understand or worry about the data you're buying. Rather, you can focus on the things that will make a difference to you: customer service, ease of access and deployment of data, and price.
About the Author
Mike Pugliese has worked with demographic data since the mid-1970s, when he built his first site selection model for a home center chain. He worked at National Decision Systems (now Claritas) for ten years and was the co-developer of the Circular Field Retrieval System. He also worked for Strategic Mapping, Inc. (now part of ESRI) for several years.
During his career he has helped many leading retail and restaurant chains in the U.S. to use demographic data more effectively. He is an expert in site selection modeling and customer segmentation analysis.
Mr. Pugliese is President of StickEApps.com, a firm that specializes in web site initiatives for small to mid-size restaurant and retail chains.
Demographic Variable List
All variables were calculated for the year 2000, year 2001 (current year estimate), and year 2006 (five-year projection) with the exception of the income variables which were calculated for the year 1990, year 2001, and year 2006.
Age variables (total population)
Percent age 0-4 years
Percent age 5-24 years
Percent age 25-34 years
Percent age 35-44 years
Percent age 45-64 years
Percent age 65-74 years
Percent age 75-84 years
Percent age 85+ years
Race and ethnicity variables
Percent white population
Percent black population
Percent Hispanic population
Income variables
Average household income
Median household income
Per capita income
Population and households
The Geographies Defined
Will County, Illinois: The entire county.
Aroostook County, Maine: The entire county.
Scottsdale, Maricopa County, Arizona: 1990 census tracts 303.33, 303.43 and 2168.17 or 2000 census tracts 303.33, 303.43, 303.58, 303.63, 303.64, 303.65, 2168.22, 2168.23, 2168.24, 2168.28, and 2168.29.
Fort Wayne, Allen County, Indiana: 1990 census tracts 6, 9, 10, 11, 12, 19, 20, 21, 22, 24, 25, 26, 32, 37, 38, 39.01, 39.02, 115.01, 115.02, 116.01, 116.03, 116.04, and 116.05 or 2000 census tracts 6, 9, 10, 11, 12, 20, 21, 22, 25, 26, 32, 37, 38, 39.01, 39.02, 115.01, 115.02, 116.01, 116.03, 116.04, and 116.05.
Birmingham, Jefferson County, Alabama: 1990 and 2000 census tracts 14, 15, 16, 24, 27, 29, 42, 45, 47.01, 47.02, 48, and 49.
About the Vendors
Applied Geographic Solutions
Applied Geographic Solutions, Inc. (AGS) is a leading supplier of premium quality demographic and marketing databases and drive-time software.
Web site: http://www.appliedgeographic.com
Claritas
Claritas has provided state-of-the-art, targeted solutions for your consumer and business marketing issues.
Web site: http://www.claritas.com
ESRI Business Information Solutions
ESRI Business Information Solutions provides data, demographics, desktop software, segmentation, online reports, mapping and marketing analysis to many industries, non-profits and government agencies.
Web site: http://demographics.caci.com
MapInfo
MapInfo is a global company and software technology leader.
Web site: http://www.mapinfo.com
US Census Bureau
The US Census Bureau is part of the Department of Commerce.
Web site: http://www.census.gov
The Tau1 Test
This statistical test evaluates a mean value (average) calculated from 10 or fewer numbers. To calculate the tau1 value you need three things: the calculated mean (or average) of the values, the difference between the largest and smallest value you have (the range), and the value you expect the mean to be (the hypothesized value).
For this article, four values were used to calculate the mean. Using the current year population for Aroostook County, Maine as an example, the four values are 73131, 73303, 73526, and 73507. To see if the average current year population estimate differs from the 2000 census value, calculate the following:
Mean value = (73131 + 73303 + 73526 + 73507) / 4 = 73367
Range = 73526 - 73131 = 395
2000 population = 73938
The tau1 value is the difference between the mean and the 2000 census population, divided by the range:
tau1 = (73367 - 73938) / 395 = -1.45
This is a significant difference at a 95% confidence level. (In conducting this test, there is only a 5% chance that we would say the values are different when in fact they are not.)
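For readers who would rather see the arithmetic as code, here is a minimal Python sketch of the calculation above. The function name tau1 and its parameter names are illustrative only and do not come from any vendor's software; the numbers are the Aroostook County figures from the example. Note that the sketch uses the unrounded mean (73366.75), which gives the same result to two decimal places.

```python
def tau1(values, hypothesized):
    """Return (mean - hypothesized) / range for a small set of values.

    'values' are the estimates being averaged; 'hypothesized' is the
    value the mean is compared against (here, the 2000 census count).
    """
    if len(values) < 2:
        raise ValueError("need at least two values to compute a range")
    mean = sum(values) / len(values)
    value_range = max(values) - min(values)
    if value_range == 0:
        raise ValueError("all values are identical, so the range is zero")
    return (mean - hypothesized) / value_range


# Current year population values for Aroostook County, Maine
estimates = [73131, 73303, 73526, 73507]
census_2000 = 73938

print(round(tau1(estimates, census_2000), 2))  # -1.45
```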
The same formula was used for each vendor's value, substituting the individual vendor value for the 2000 population in the calculation. This test determined whether an individual value differed significantly from the average of all the values. Any tau1 value that was less than -0.717 or greater than 0.717 indicated a significant difference at the 95% confidence level.
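As a rough sketch, the vendor-by-vendor check described above might look like this in Python. The function name flag_different_values is hypothetical, and the 0.717 cutoff is the figure quoted above for four values at the 95% level; a different number of values would need a different cutoff. The sign convention (mean minus individual value) is one reading of the formula, and the flag depends only on the absolute value.

```python
CRITICAL_TAU1 = 0.717  # cutoff quoted in the article: four values, 95% level


def flag_different_values(values, critical=CRITICAL_TAU1):
    """Compare each value with the mean of all values.

    Returns a list of (value, tau1, differs) tuples, where tau1 is
    (mean - value) / range and differs is True when |tau1| > critical.
    """
    mean = sum(values) / len(values)
    value_range = max(values) - min(values)
    if value_range == 0:
        # Identical values cannot differ from their own mean.
        return [(v, 0.0, False) for v in values]
    results = []
    for v in values:
        t = (mean - v) / value_range
        results.append((v, t, abs(t) > critical))
    return results


# Current year population values for Aroostook County, Maine
for value, t, differs in flag_different_values([73131, 73303, 73526, 73507]):
    print(value, round(t, 3), differs)
```

For the Aroostook County values, none of the four tau1 values falls outside the -0.717 to 0.717 band, so no individual value is flagged as different from the average.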
Editor's Note: This article was funded by SRC, a distributor of demographic data. Although we find the article to be written fairly, we wanted to inform the reader that the author was paid by a vendor of software tools and demographics.