Researchers Wary of Census Bureau’s Plan to Use ‘Differential Privacy’ in 2020 Count

By Tammy Joyner

As a demographer, Alexis Santos relies heavily on census data to track public-health disparities, especially in communities of color.

But a proposed change by the U.S. Census Bureau designed to further safeguard the confidentiality of its data threatens to upend the work of Santos and other researchers.

The bureau wants to adopt a mathematical privacy framework called differential privacy, beginning with this year’s census.

“Differential privacy is more concrete,” Maria Filippelli, a public-interest technology fellow at the New America think tank in Washington, told Digital Privacy News. “It’s more technical.

“In the end, a set of mathematical equations or algorithms will process the data, so that it’s more secure,” she said.


But Santos and other opponents said it would alter the data so much that their research would become inaccurate. That, in turn, could distort reported population characteristics and possibly how and where federal money is disbursed.

“It will cloud our understanding of population dynamics in the United States,” said Santos, assistant professor of human development and family studies at Penn State University.

Protecting Data

Census officials, however, insisted the change was necessary in today’s highly technical world, where sensitive and private information can be breached.

“Our goal is to ensure the public trusts us with their data, and values the statistics that we produce,” John Abowd, the bureau’s chief scientist and associate director for research and methodology, told Digital Privacy News.

“Differential privacy is a standard for protecting confidentiality in published data,” he added. “We know that the nation needs timely and accurate information to make informed decisions.”

The bureau is testing the new mathematical method and will make a final decision next March, after all the data is collected. 


If implemented, differential privacy will be used for the first time to publish census tabulations or counts. The method traditionally is used in banking and other private-sector industries.

The decision is a critical one. The census not only provides a decennial snapshot of the American population; the tally also determines the number of seats each state has in the U.S. House of Representatives and guides the distribution of more than $675 billion in federal money to communities.

The Methods Explained

The new method would replace “swapping,” a privacy-protection method the Census Bureau has used since 1990.

Swapping involves exchanging a person who’s easily identifiable in one community with someone with similar characteristics in a different community.

That way, the individual’s privacy and identity are protected.

Here’s how the methods differ:

Say, for example, a small county in Pennsylvania only has three Latino residents. They’re easily identifiable.

The Census Bureau would “swap” the three Latinos with people in a nearby community who share similar traits, such as gender and age.

Under differential privacy, however, the three Latino residents could potentially be removed without being replaced. Essentially, that alters the characteristics of the community from which they’ve been removed, Santos told Digital Privacy News.

“For my line of research, that would result in inaccurate calculations of prevalence or incidence of health conditions within the community they were removed from,” he said.

“If there are inaccurate measures of health conditions at community levels, we may end up assigning funds to an area that doesn’t need it, or a community that needs funding may go without funding or get less money than it needs.”
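To make the concern concrete, here is a minimal sketch of one textbook form of differential privacy, the Laplace mechanism, applied to the hypothetical three-resident county described above. It is an illustration only and not the Census Bureau's actual procedure. The function names, the privacy parameter `epsilon=0.5`, the random seed and the rounding step are all assumptions made for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centered at 0."""
    # The difference of two independent exponential draws, each with
    # mean `scale`, follows a Laplace(0, scale) distribution.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(true_count, epsilon, rng):
    """Publish a count with Laplace noise (illustrative, hypothetical helper).

    Adding or removing one person changes a count by at most 1, so the
    noise scale is 1 / epsilon. A smaller epsilon means more privacy
    and more noise.
    """
    noisy = true_count + laplace_noise(1 / epsilon, rng)
    # Post-processing (an assumption of this sketch): round and clamp at
    # zero so the published figure looks like a count.
    return max(0, round(noisy))


rng = random.Random(2020)  # fixed seed so the sketch is repeatable
true_count = 3             # the three residents in the example county
releases = [noisy_count(true_count, epsilon=0.5, rng=rng) for _ in range(10)]
print(releases)
```

Each published value is the true count plus random noise, so a small group of three can be reported as zero, as three, or as a larger number. That is the kind of small-area distortion researchers such as Santos are worried about. In large populations the same amount of noise is proportionally far smaller.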

Unless officials can fine-tune the new method, Santos said he preferred swapping.

“They should keep the current method,” he told Digital Privacy News. “At least the data is useful for researchers.

“If differential privacy is implemented without taking necessary precautions, we may end up with an inaccurate understanding of the nation.”

Growing Threats

By law, individual census records are not released to the public for 72 years. But that protection is easier to defeat now because of faster, more powerful computers and growing privacy threats from hackers.

Public-records companies, for instance, could mix census data with other public information — enabling a domestic abuser or a stalker to find victims. 


Consequently, the Census Bureau decided to revamp its procedures to further safeguard confidentiality, Michael Hawes, the agency’s senior advisor for data access and privacy, said during a March webinar on differential privacy.

“Computers can easily perform the complex matching algorithms necessary to leverage that external data in order to re-identify individuals,” Hawes said. “These trends are not abstract concerns; they represent real concrete threats to protecting confidentiality.”

For that reason, New America’s Filippelli favors differential privacy.

“In the world of big data, what this is doing is not only protecting individual census responses, it’s also protecting census data from being compared to other data sets.”

But, Filippelli told Digital Privacy News, the bureau needs to “find a balance in the privacy-accuracy tradeoff.”

Tammy Joyner is an Atlanta writer.
