Q&A: Harvard’s Latanya Sweeney

‘Privacy Protections Are Not Working’

By Gaspard Le Dem 

Second of three parts. 

When Latanya Sweeney co-published her now-famous research paper on “k-anonymity” in 1998, the concept of data privacy was still in its infancy.

In the second of three interviews, Sweeney, an MIT Ph.D., told Digital Privacy News that the federal HIPAA law had fallen short, with overwhelming evidence that it was not working.

This interview was edited for length and clarity.

Nearly a quarter-century ago, you introduced k-anonymity. The concept is still being applied today: Google used it in 2019 for its Password Checkup extension, and it underpins the Pwned Passwords search on Troy Hunt’s “Have I Been Pwned” site. What is k-anonymity?

The idea is: How much privacy can I get if you can’t distinguish me from others? 

If there are “k” indistinguishable people, and I’m one of those people, you can’t do better than guessing I’m one out of the “k.” 

There are situations where that can be quite helpful, in terms of offering you some privacy protection. 
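
To make that concrete, here is a minimal Python sketch (added for illustration, not from the interview; the records and field names are hypothetical) that checks whether a table is k-anonymous over a chosen set of quasi-identifiers: every combination of those attribute values must be shared by at least k rows.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values is shared by at
    least k rows, so no one can be narrowed down to fewer than k candidates
    using those attributes alone."""
    groups = Counter(tuple(row[a] for a in quasi_identifiers) for row in rows)
    return all(size >= k for size in groups.values())

# Hypothetical records; date of birth, sex and ZIP code act as quasi-identifiers.
records = [
    {"dob": "1945-07-31", "sex": "M", "zip": "02138", "diagnosis": "flu"},
    {"dob": "1945-07-31", "sex": "M", "zip": "02138", "diagnosis": "asthma"},
    {"dob": "1950-02-14", "sex": "F", "zip": "02139", "diagnosis": "flu"},
]

print(is_k_anonymous(records, ["dob", "sex", "zip"], k=2))  # False: the third row is unique
```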

How was k-anonymity received back in 1998? 

Total confusion. The computer community had a hard time wrapping its head around data privacy. 

At the time, computer security was big: the idea of breaking into a system.

But if you said you wanted to offer privacy protection, people often asked: “How does this stop somebody from breaking into a computer?”

With k-anonymity, we weren’t talking about protection from a break-in, but from someone going through the front door to get data that is given away.

So, there was a lot of confusion in the tech world about it.  

In the policy world, it upended a lot of things.

What led me to k-anonymity was a re-identification of William Weld’s data when he was the governor of Massachusetts.

It showed that the combination of demographics like date of birth, gender and ZIP code tends to be unique for most individuals.

Until then, removing names and other explicit identifiers while leaving those demographics in place had been believed to be sufficient for privacy.

That meant that the way data was being shared around the world was not protective enough, given how much data was being shared on individuals.

So, laws and regulations around the world began changing because of that — and k-anonymity offered a resolution.

We can offer a k-anonymity solution so that we can still share information with some privacy guaranteed. 

Did the policy changes come before k-anonymization was widely used? 

In some ways, the policy was already aligned to using it. 

How?

Even before k-anonymity, the Social Security Administration had tried to offer guidance on what they thought were good practices for sharing data. 

You had to remove a person’s name, address and phone number — and use only a sample of the data.

It was an early version of what we now call k-anonymity. But because k-anonymity could be expressed in simple words, it was much easier to implement.
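
By way of contrast, here is a rough sketch of that earlier style of release (an illustration with hypothetical field names, not the agency’s actual procedure): strip the direct identifiers and publish only a random sample of the rows.

```python
import random

DIRECT_IDENTIFIERS = {"name", "address", "phone"}

def early_style_release(rows, sample_rate=0.10, seed=0):
    """Pre-k-anonymity practice: drop direct identifiers and release a sample.
    Quasi-identifiers (date of birth, sex, ZIP code) pass through untouched."""
    rng = random.Random(seed)
    released = []
    for row in rows:
        if rng.random() < sample_rate:
            released.append({k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS})
    return released
```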

Did you expect k-anonymity to become as popular as it did? 

No! I mean, in some ways, we can now look back and see why that happened.

Why? 

K-anonymity became the cornerstone of what is now known as data privacy, which is now a field all unto itself, right?

“There was a lot of confusion in the tech world about it.”

But, really, the arc of my work is around technology — and how it has changed the social contracts that we operate in. 

Privacy was just the first wave. Today, all of our democratic values are being challenged by what technology design does or doesn’t allow.

How so?

Thousands of people have now worked in the data-privacy space — and it’s had a huge impact.

But in some ways, we’ve lost a lot of the battle.

We’ve lost a lot of privacy — and the challenges from technology design have continued to come. 

What’s important to realize about data privacy is that it exists in the context of technology changing society in ways that strip us of our historical protections.

So, it raises the question: How do we shore up our historical protections faster than we have so far through data privacy?

Have techniques to re-identify data evolved faster than those to protect it? Are we playing catch-up on privacy?

I don’t know that the world is still a world of re-identification versus privacy, in a kind of cat-and-mouse cycle.

There’s some of that, but that ship has sailed. 

The question now is: How do we set up infrastructures that can make certain guarantees and set limits on re-identification?

Approaches like “differential privacy” can make some guarantees, but they don’t solve all the problems. 

So again, we find ourselves forced into a tiered-access model — and this notion of tiered access gets us a little bit away from the see-saw game. 
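
The interview doesn’t go into mechanics, but as one concrete example of the kind of guarantee meant here, the sketch below shows the Laplace mechanism, the textbook building block of differential privacy (the count and the epsilon value are hypothetical): noise scaled to the query’s sensitivity bounds how much any one person’s record can shift a published statistic.

```python
import numpy as np

def noisy_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Laplace mechanism: for a query whose answer changes by at most
    `sensitivity` when one person's record is added or removed, adding
    Laplace noise with scale sensitivity/epsilon gives epsilon-differential
    privacy for that release."""
    rng = rng or np.random.default_rng()
    return true_count + rng.laplace(0.0, sensitivity / epsilon)

# Hypothetical release: a count of 412 patients with a privacy budget of epsilon = 0.5.
print(noisy_count(412, epsilon=0.5))
```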

The Weld experiment raised awareness around privacy and led to an overhaul of HIPAA standards. But, now, there are serious vulnerabilities in public-health records. So, how much progress have we really made in anonymizing health data?

We’re still really bad at protecting health data.

Even if I follow the HIPAA Safe Harbor, I could re-identify almost half the people in a single data set. 

“Today, all of our democratic values are being challenged by what technology design does or doesn’t allow.”

In the Data Map project (2010) we documented the sharing of health data and found that more than half of data-sharing arrangements aren’t covered by HIPAA.  

Privacy protections are totally not working — and one of the reasons is because there’s just a lot of money in data.

There’s a huge incentive to know more about you. Many data-analytics companies have translated that into an economic gain. 

The result is that there’s now a much bigger political cost to making privacy requirements more firm — like changing HIPAA, for example.

But is people’s information safer than it was 20 years ago? 

Absolutely not! Are you kidding me? 

Just think about it: This is 2021.

In 2000, before 9/11, how much did anybody know about you? I couldn’t just type your name into Google or Facebook and see photos of you.

The data brokers, which came en masse after 9/11, weren’t there.

Now, I can get your name, your Social Security number, your address, your driver’s license number — all from a data broker.

How much information comes off your phone? Every so many seconds, some app is capturing your GPS location.

You don’t have any control over that information, but somebody could know where you are all the time. 

There’s no way you have anywhere near the privacy that you had 20 years ago. 

My great-grandfather used to say, “Go West, young man.”

What he meant was that if you messed up on the East Coast, you could go to the West Coast and start over.

You always had an opportunity to start over.

“Privacy protections are totally not working — and one of the reasons is because there’s just a lot of money in data. There’s a huge incentive to know more about you.”

But in today’s data setting, there is no starting over.

You don’t get a clean slate — and that’s because of the sheer volume of data that’s collected on individuals.

It’s not that in the last 20 years our policies changed, as much as it is that the ability to collect data about an individual has skyrocketed. 

Speaking of the West, your 2015 re-identification experiment on patient health data prompted California and Washington state to update their privacy laws. But other states didn’t budge. Why?

This is a great question. We worked with Bloomberg, submitting FOIA requests across the country to find out who gets state-level hospital data. 

We expected it to primarily be researchers — the (Johns) Hopkins and the Harvards — but we found that analytics companies and lots of other people were also getting this data. 

How does that happen?

When data is related to a revenue stream, it becomes really difficult to garner the political will to cut that off.

That’s part of the issue. 

The other part is how non-transparent data-sharing is.

People don’t see it. It’s not explicit.

As a result, it’s easier for regulators to ignore it. 

“When data is related to a revenue stream, it becomes really difficult to garner the political will to cut that off.”

So, instead of saying: “Latanya just re-identified this data. We’ve got to do something,” regulators say: “Our data’s just a little bit different — and Latanya didn’t really re-identify our data, so our data is fine.”

These kinds of pacifier clauses are another reason that things don’t get better. 

Friday: Big Tech as the new policymakers of the privacy world.

Gaspard Le Dem is a Washington writer.
