Q&A: HIBP’s Troy Hunt

Huge Facebook Leak Brings the ‘Ability to Send More Targeted Phishing Emails’

By Rachel Looker 

Facebook made headlines this month after news that a data leak exposed the personal information of more than 533 million users.  

First reported April 3 by Business Insider, the leak included cellphone numbers, names, locations, birthdates and some email addresses for users in over 100 countries.  

But Facebook said hackers obtained the data before September 2019 by “scraping” it from the platform through misuse of its contact importer tool. 

“This feature was designed to help people easily find their friends to connect with on our services using their contact lists,” Facebook said in an April 6 blog post.

The platform said the contact importer had been updated to prevent software from imitating the app and uploading large sets of phone numbers to see if any matched a Facebook user.  

The leaked data did not include financial information, health information or passwords, Facebook said.  

Troy Hunt, an Australian and a Microsoft regional director, is the founder of the “Have I Been Pwned” (HIBP) website. It aggregates data breaches to help individuals find out whether they have been affected by malicious activity. 

Following the leak, Hunt added cellphone numbers, which made up most of the leaked data, to his site’s search capabilities.

He told Digital Privacy News that HIBP’s traffic had “dramatically increased” since the leak because Facebook directed users to the site to check if they had been affected.

Hunt, however, declined to say how much money HIBP was making from the new traffic sent by Facebook.

In 2019, Hunt sought potential buyers for HIBP but wrote last year that he would continue operating it independently.  

He told Digital Privacy News that the biggest risk from the recent Facebook leak was an increase in phishing attacks and spam. 

This interview was edited for length and clarity. 

If the leak occurred in 2019, why is it making news now? 

The data has been around for a while. 

My suspicion is that what tends to happen with data breaches or scrapes is that someone gets it — it’s a whole bunch of data — and they’ll often begin by monetizing it.

They don’t want it to spread too far, because they want people who are going to pay for it. 

If it’s spreading too far, then you don’t have to pay for it. 

So, inevitably, they sold it in relatively close circles — and then, over time, perhaps they advertised it a bit more broadly to try to milk a bit more out of it. 

Then, sooner or later, someone who’s managed to acquire it ends up just dumping it publicly — and it spreads extensively. 

That appears to have been the tipping point here: where the data has gone public — and because it’s in so many different people’s hands, it starts getting a lot of chatter. 

Sooner or later, the press picks up on it — and you have news headlines. 

Why didn’t Facebook come forward with this information if they had it? 

I believe they did make a statement about shutting down a tool or abuse of a tool. 

Obviously, there were various endpoints, either publicly and freely available or part of commercial services, that led to much of this information. 

One of the challenges is, did they even know that it was being abused at the time?

Can you tell the difference between someone scraping this data against the terms of services versus many people just using the service legitimately? 

It could have been very difficult. They may well not have known.

We could speculate that maybe it wouldn’t have done them any favors to disclose at the time, but we could equally speculate that it might not have been anything noteworthy to disclose at all. 

That’s not as far as I knew. 

It seems Facebook was almost forced to acknowledge the leak, since it was reported by Business Insider. What are your thoughts on this? 

Look, I can see it both ways. 

I’m not sure that I’ll necessarily be a Facebook sympathizer here, but this is something that happened quite some time ago. 

I see their position on that. 

But the other thing is that, for the Facebook folks, there are so many alleged data breaches, so much data. 

Particularly, when we get to scraping, that data can be pulled from different locations, or we could get data from somewhere completely different from Facebook. 

Then, I’ve got to look at this and go: “Well, is it from Facebook? Do we have to do anything? Do we not?” 

What’s probably different in this case is that it just got such broad headlines very quickly that they’ve eventually said, “All right, I’ve got to acknowledge it in some way.” 

But then, the way they’ve done it — most of us feel has been a little bit dismissive. 

Did Facebook try to minimize the impact of the leak by saying it had been reported before? 

You can always tell when data-breach notices or anything similar have been written by lawyers. 

It’s really obvious. 

We’ve had a couple of cases like that just this year, where clearly there’s a lot that is not being said — and the words that have been chosen had been exceptionally carefully picked. 

It’s a lawyer that writes that. 

Inevitably, they’re trying to minimize damage — and that’s fine. 

My view, in general, for data breaches is that transparency and openness and good explanations about what actually happened are really key. 

You can take a data breach and do a good job of the disclosure — and the organization comes out looking pretty good. 

In some cases, where they’re dismissive or where you feel that there’s deception — or there’s a prioritization of shareholder value over customers — that’s when everyone gets less comfortable. 

It does feel that way with Facebook. 

Why? What’s different?

The thing that makes it different, in terms of the validity of their disclosure notice, is that it really wasn’t a breach in the traditional sense of the word. 

It was a feature they had. 

They did discuss it at the time — and in many ways, this is just recycled news. 

Facebook referred to the leaked information as “old data.” What does that mean? 

In this context, they mean it was data that was acquired some time ago. 

I don’t know how “old” we consider August 2019, by the sound of it, or some period in 2019. 

Part of the interesting thing about the spreading around (of leaked data) is that we tend to look at it and judge it by the lens of today, as opposed to the lens of when it might have happened. 

Very often, there’s a period of years that go by before this information circulates. 

The term could be a bit misleading because, in the Facebook situation, “old,” “not old” — whatever you call it — it’s still legitimate phone numbers and legitimate personal details. 

I verified a bunch of the info with friends that are in there. My fiancée was in there, with a Norwegian phone number. 

I managed to not get caught up in it — but that data to them isn’t old. 

That data is accurate and current as of today. 

You wrote in a blog that you’ve seen unprecedented traffic to HIBP after the news of this incident. Beyond the number of users affected, what makes this situation different? 

What makes this particularly interesting is Facebook alone. 

As soon as there’s a story related to Facebook, even if it’s relatively benign, it’s going to make headlines. 

They are a favorite whipping boy for various reasons — and they’ve certainly done things in the past where they’ve probably deserved it. 

What makes it interesting, partly, is Facebook. The volume: 533 million is a big headline — and even though that’s only about 20% of Facebook’s subscriber base, it’s still 533 million, which is a very large number. 

That makes it interesting, too. 

Then, it’s the presence of these mobile numbers. Phone numbers are becoming particularly interesting because they’re used so often as forms of identity verification. 

We will use phone numbers, for example, to allow people to do two-factor authentication: Enter and use my password? “OK, we’ll send you a text message with the code. Please enter it.” 

A lot of people are very worried about the risks associated with SMS. 

I’m more worried about things like the ability to send more targeted phishing emails — because now you not only have a name, but you have a number and a nice mail-mergeable format. 

In some cases, those Facebook records also had a date of birth or the person’s location. 

The more information you have to craft very targeted messages, the more successful things like phishing are. 

What should people do if they suspect they were part of the Facebook leak?

The thing is Facebook doesn’t really change any of the guidance — or this incident doesn’t change any of the guidance. 

The biggest risk we’ve got from this is phishing and spam, more so than we had before, because all of us had phishing and spam before. 

This is not going to suddenly start giving us phishing and spam when we didn’t have any before. 

The guidance that we’d always give, particularly in the phishing side of things, is to always verify the identity of who is asking for information. 

So often, the messages we get that are phishing attacks are indistinguishable from the messages we get that are legitimate communications. 

Phishing always emulates legitimate communication. 

Part of the problem is that we have these organizations looking like phishing attacks themselves, which makes it very hard for normal, everyday people to try and figure out which ones are attacks and which ones are legitimate. 

This is where businesses and government have a role to play in not looking like phishing attacks. 

I still find it unbelievable that we have to go through this process of “identity verification.”

It’s an interesting time. 

The takeaway from this is that it doesn’t particularly change anything other than being a reminder that we’ve just got to be conscious of things like phishing. 

Rachel Looker is a Washington writer.