This article brought to you by Applied Geographic Solutions
By now, you’ve probably heard that the 2020 Census introduced intentional errors to address potential privacy issues. While this is needed, it doesn’t come without challenges. The introduced errors are more severe than anyone expected, often placing all the children in a block group in a single block or placing households over bodies of water. The data team at Applied Geographic Solutions has thoroughly reviewed the data and adjusted it to balance the 2020 Census into a usable state.
In years past, the census has — as required by law — made substantial efforts at protecting the privacy of individuals. As the genealogy world well knows, the physical records, which have names and addresses, are sealed for decades. When the census included both the short form and the long form, the sensitive personal data found in the long form was reasonably well protected — it was based on a sample, and techniques were employed to “borrow” characteristics between similar, nearby census blocks. With the demise of the long form — replaced by the American Community Survey — the census consists of only completely enumerated geographic areas (obviously with some error). As a result, the data for small areas can be used in conjunction with other databases (mailing lists, property records, etc.) to potentially identify individuals within them.
The unpleasant conclusion is that the data has been seriously corrupted, so much so that a significant number of census block groups have statistically impossible data. These include entire blocks of children with no adults, occupied dwellings with no people living in them, family sizes well above average and people living over bodies of water.
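The impossibilities listed above can be expressed as simple consistency checks on a block record. The following is a minimal sketch only: the `BlockCounts` layout and its field names are hypothetical, chosen for illustration, and do not correspond to actual Census table identifiers or to any AGS tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlockCounts:
    # Hypothetical record layout; field names are illustrative only.
    block_id: str
    population: int       # persons counted in the block
    adults: int           # persons aged 18 and over
    occupied_units: int   # occupied dwelling units
    land_area_sqm: float  # 0.0 means the block is entirely water


def impossibilities(b: BlockCounts) -> List[str]:
    """Return the logical inconsistencies found in one block record."""
    flags = []
    if b.population > 0 and b.adults == 0:
        flags.append("children with no adults")
    if b.occupied_units > 0 and b.population == 0:
        flags.append("occupied dwellings with no people")
    if b.population > 0 and b.land_area_sqm == 0.0:
        flags.append("people living over water")
    return flags
```

An empty list means the record passes these three checks; it says nothing about the far more numerous improbabilities, which need a comparison against neighboring blocks.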
For every identified impossibility, at least 10 improbabilities lurk beneath it, and these are just the baseline numbers. The real meat of the 2020 Census is found in the detailed tables, which address key population characteristics (age, sex, race, Hispanic origin, ancestry) and household characteristics (household size and structure).
The privacy “budget” was essentially exhausted at the block group level with the release of the general population counts, and the Census Bureau is considering releasing the detailed tables only down to the census tract level. It is not hard to understand why. Massive reallocation was required just to release top-level statistics. Imagine what will need to be done to publish a table of population by age and sex.
From an operational standpoint, the goal of maintaining privacy while maintaining the essence of each geographic unit is an almost impossible task. The published redistricting results clearly indicate that the problem was not solved by one-at-a-time characteristic trading between nearby areas, but instead relied upon bulk operations which radically change the essential character of each geographic unit. The presence of statistical impossibilities is clear evidence of this.
To help users understand how the privacy budget has affected small area data, we took a deep dive into a specific area to see how it looks on the ground. We chose a block group that we know has not changed over the past 10 years, located in a well-established part of Thousand Oaks, California, home to AGS’ headquarters. This is not an “outlier,” and it is important to note that we found similar patterns in nearly all block groups nationwide.
The block group 061110059092 (2010) was not redefined, although the unpopulated blocks along the freeway have been merged into block 2000. For convenience, we will label the blocks using only the 2020 numbers, as the block numbering has changed drastically. The 10 blocks appear below:
Block groups in Thousand Oaks, California. (Satellite Imagery/Google Maps).
It is largely a residential neighborhood, built in the 1970s, with open space along the freeway that includes an equestrian center. At a summary level, the block group has changed little over 10 years. The number of homes has grown slightly with infill development, and the average household size has decreased slightly over time (as it has generally).
At the block level, the results are much more dramatic. The number of vacant dwellings in the block group doubled from 5 to 10, and yet all 10 are located in a single block (2004) which does not appear to be materially different from the rest. Indeed, we believe that it has vacant dwellings “borrowed” from an adjacent block group!
Further, the population of block 2004 increased significantly, so its average household size jumped from 2.85 (average) to 6.40. The table at the end of the article contains the data for the 10 census blocks.
While a household size of 6+ persons does occur in the United States about 5% of the time, it is very unusual in an established, upper-middle-income neighborhood.
Indeed, if we map block group boundaries and display the average household size, a clear pattern emerges — almost all block groups have a single block which stands out as having a large household size (orange and red on the map below).
On closer examination, we find that in almost all cases the percentage of vacant dwelling units in these standout blocks is also substantially higher than in the adjacent blocks.
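The standout pattern is easy to screen for. The sketch below computes average household size as household population divided by occupied dwelling units, then flags any block that exceeds the block group average by a chosen multiple. The function names, the 1.5 threshold, and the counts are all assumptions made up for illustration; they are not the published figures for any real block.

```python
from typing import Dict, List, Tuple


def household_size(household_pop: int, occupied_units: int) -> float:
    """Average persons per occupied dwelling unit (0.0 if none are occupied)."""
    return household_pop / occupied_units if occupied_units else 0.0


def standout_blocks(blocks: Dict[str, Tuple[int, int]], ratio: float = 1.5) -> List[str]:
    """Return IDs of blocks whose average household size exceeds
    `ratio` times the average for the whole block group.

    `blocks` maps block ID -> (household population, occupied units).
    """
    total_pop = sum(pop for pop, _ in blocks.values())
    total_occ = sum(occ for _, occ in blocks.values())
    group_avg = household_size(total_pop, total_occ)
    return [
        block_id
        for block_id, (pop, occ) in blocks.items()
        if occ and household_size(pop, occ) > ratio * group_avg
    ]


# Made-up counts for illustration only.
example = {"A": (280, 100), "B": (145, 50), "C": (128, 20)}
print(standout_blocks(example))  # prints ['C']
```

A fixed multiple is a crude screen. A fuller review would also compare vacancy rates between neighboring blocks, as described above.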
Block Level Comparisons
The table shows the ten census blocks in detail, with the 2010 numbers, the 2020 published numbers, and our revised 2020 numbers.
Our 2020 Census Approach
The privacy budget concept adds further complexity: even the base population counts at the census block level have been modified, sometimes to the point of statistical impossibility. Because AGS does its demographic modeling at the census block level, this is a particular problem, since only total dwelling units and the group quarters population are stated to be correct at the block level. Everything else has been manipulated, and even at the block group level, there are significant anomalies.
Our approach to resolving this includes what we refer to as “balancing,” meaning that the entire geographic hierarchy is used. State totals (stated by the census as being correct) are used to balance the county numbers, which, in turn, balance census tracts, block groups, and, finally, blocks. The outcome is that the resulting block estimates are well constrained and do not generally include a single block which looks nothing like its neighbors.
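To make the idea of top-down balancing concrete, here is a minimal sketch of one common way to do it: proportional scaling with largest-remainder rounding, applied level by level. It illustrates the general technique only and is not the production methodology, which is not detailed in this article. The function names, the nested-dictionary layout, and the numbers are all hypothetical.

```python
def balance(parent_total, children):
    """Scale non-negative integer child counts proportionally so they
    sum exactly to parent_total (largest-remainder rounding)."""
    current = sum(children.values())
    if current == 0:
        # Assumption: with nothing to apportion by, leave the zeros alone.
        return dict(children)
    # Integer arithmetic keeps the quotients and remainders exact.
    parts = {k: divmod(v * parent_total, current) for k, v in children.items()}
    result = {k: q for k, (q, _) in parts.items()}
    shortfall = parent_total - sum(result.values())
    by_remainder = sorted(parts, key=lambda k: parts[k][1], reverse=True)
    for k in by_remainder[:shortfall]:
        result[k] += 1
    return result


def subtotal(node):
    """Sum of all leaf counts under a node (int leaf or dict of children)."""
    if isinstance(node, int):
        return node
    return sum(subtotal(child) for child in node.values())


def balance_down(total, node):
    """Push a trusted total down the hierarchy, balancing each level
    against the (already balanced) level above it."""
    if isinstance(node, int):
        return total
    targets = balance(total, {k: subtotal(c) for k, c in node.items()})
    return {k: balance_down(targets[k], c) for k, c in node.items()}


# Hypothetical two-level hierarchy whose published leaves sum to 1,010
# while the trusted top-level total is 1,000.
published = {
    "county_1": {"tract_a": 300, "tract_b": 310},
    "county_2": {"tract_c": 400},
}
balanced = balance_down(1000, published)
```

After the call, the leaves of `balanced` sum to exactly 1,000 and every intermediate level agrees with the level above it. Proportional scaling alone preserves each unit's share of its parent, so it would not by itself smooth out a single anomalous block; that requires additional constraints or neighbor information.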
From an internal modeling perspective, this will yield much better results moving forward and avoid trending on non-comparable datasets. While we can’t know what the actual census results were, we are convinced that the resulting database is likely a more accurate rendition of those results than the published figures.
If you are interested in learning more about how we cleaned up the census data, please drop us a line. We will be happy to talk in detail about the methodology and results, and provide you with the datasets for comparison purposes.