The Public Needs to Know Where Their Data Has Been
By Emilie Rodriguez
The Adobe data breach of October 2013 was the largest known at the time. Hackers exposed user account information, leaked source code, and stole nearly 3 million encrypted customer credit card records.
An estimated 38 million users were affected.
After the incident, Troy Hunt, an Australian internet security professional, started the website “Have I Been Pwned” (HIBP).
Hunt told Digital Privacy News that the site sought to combat the growing threats posed by data breaches by helping users discover if their email addresses and passwords had been breached.
Why did you start the site?
I was writing blog posts and analyzing a bunch of data breaches around 2013.
It’s interesting, you see two different data breaches — and the people in both breaches have the same passwords.
We knew anecdotally that people reuse passwords, but to have empirical evidence was quite a different thing.
The final catalyst was after the Adobe data breach.
I found my own data there twice — my work address and my personal address — and I thought it would be interesting if people could see where their data had been exposed.
Was HIBP meant for users who aren’t digitally savvy?
It wasn’t just nonsavvy users.
I’ve had notifications from HIBP to let me know that I was in a breach from an account I never knew I had.
I think of myself as a savvy user, but it was useful for me.
HIBP also offers domain searches. A lot of people are responsible for an organization, and they want a good overview of everyone on their domain who has appeared in a breach.
The service is useful for them, too.
How has the mission evolved?
It’s still pretty much the same. Data breaches happen, and you load them into the website.
What’s really evolved is the growth in audience numbers and data breaches. I started HIBP with about 155 million records. Now, it’s over 10.1 billion, which is kind of unfathomable.
It began with a few people checking the public website.
Now, it’s government, law enforcement and cybersecurity companies who want to see if people are reusing bad passwords.
How do you define this website?
I would define it as a pet project. That’s where it started. It’s evolved in different ways; everything is very organic.
I’d love to say I had the foresight to plan all of this. But, it’s a pet project.
I’ve used that term a lot to reiterate that the point of this is it’s a community-centric service.
Does HIBP keep a record of the emails and passwords put into the search engine?
No. Nothing gets explicitly logged, but some things get logged in transit by webservices — say, Cloudflare — which protects everything.
I can look at the last 24 hours and see the logs from Cloudflare and see the email addresses that were searched. However, there’s nothing in the code that loads anything into a database.
There’s never any intent to log any of the searches. If I did, that’s a big privacy issue.
Upon entering information, a user can see how many times their data’s been breached, but they can’t access it unless they subscribe. Why do users need to subscribe?
Subscribing allows you to be notified if you appear in a breach in the future. There are 3.3 million people subscribed.
When breaches are flagged as “sensitive,” subscribers also get a link sent to their email, which will direct them to see what those sensitive websites are.
If someone enters a password that hasn’t been breached, and so goes on to use it to safeguard data, would that password still be recorded?
No — and passwords are different from email addresses.
The password search uses a k-anonymity model. You type the password into the browser, the password gets turned into a cryptographic hash, and a small portion of the hash gets sent to a server.
What’s searched for never gives me enough information to fully know your password.
HIBP then comes back and gives you possible matches.
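The exchange Hunt describes can be sketched in a few lines of Python. This is a minimal illustration and not HIBP's actual code: it assumes a SHA-1 hash and a five-character prefix, details that go beyond what the interview states, and the function names and the simulated server reply are hypothetical. Only the prefix would ever leave the user's machine; the comparison against the returned candidates happens locally.

```python
import hashlib


def split_hash(password: str, prefix_len: int = 5) -> tuple[str, str]:
    """Hash the password locally and split the hex digest in two.

    Assumption: SHA-1 and a 5-character prefix. Only the prefix
    would be sent to the server; the suffix never leaves the client.
    """
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:prefix_len], digest[prefix_len:]


def count_matches(suffix: str, range_response: str) -> int:
    """Look for our suffix among the candidates the server returned.

    Assumption: each response line has the form SUFFIX:COUNT.
    Returns 0 when the suffix is not among the candidates.
    """
    for line in range_response.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix and count.strip().isdigit():
            return int(count)
    return 0


# Simulated server reply (made-up data, no network call): every entry
# shares the same prefix, so the server cannot tell which one we wanted.
prefix, suffix = split_hash("password")
fake_response = "\n".join([
    "0018A45C4D1DEF81644B54AB7F969B88D65:3",
    f"{suffix}:42",
    "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:1",
])

print("sent to server:", prefix)
print("times seen:", count_matches(suffix, fake_response))
```

Because many different passwords share any given prefix, the server learns only that the user's password is one of the candidates it returned, which is the sense in which the search is k-anonymous.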
There is a donate button on the site, but for users it’s free. How do you pay for this service?
I’ve never sold any data or paid for any data. The donations came after I started, and they cover my more tangible costs.
My main costs are for data storage, running the services and log ingestion.
In a recent blog, you said you decided to open-source the site. Why? You mentioned a “failed M&A process.” Are you trying to sell HIBP?
The realization I came to in January last year was that HIBP was just getting extremely big without any real organization.
I was approaching burnout — and there was also no succession plan. I needed to find a way to solve those problems.
At the time, the best path forward was to go down a (mergers and acquisitions) process, where someone would literally pick HIBP up, pick me up — and we’d all transition.
But what came through the M&A process was that people didn’t want to buy HIBP — or the data.
They wanted to buy me for years, which felt super weird. Every single deal was uninterested in HIBP and interested in Troy running HIBP, locking me in for several years.
It narrowed down the candidates — and the only candidate that was left fundamentally changed their business structure over Christmas, which absolutely killed it.
After that, the open-source model kept floating to the surface. It allows me to give others responsibility for the code, giving HIBP some survivorship.
By open-sourcing the site, are you not giving away patents for free? Does this ultimately devalue the site?
If I thought I was sitting on all this valuable code, and I wanted to maximize the monetary value, then I wouldn’t do this. The code itself has no real value.
Are data breaches inevitable? Can people safeguard their information?
The answer is yes to both.
They are going to happen — and no matter how switched on you are, you cannot stop a breach.
The only thing I can do is reduce the impact it has on me individually. Doing that means having a unique password.
That’s why I have a password manager.
Can “Have I Been Pwned” be “pwned”?
That would be really meta, wouldn’t it?
I have a hard time taking seriously any organization that says it is “hack proof.” A breach is always possible.
A very pertinent question is what do I do to mitigate the risk of that happening and reduce the risk of impact if it does?
I have secure coding practices — and I’m cautious around firewall configurations.
In terms of addressing potential breaches, the only data that goes into HIBP is the email addresses. This data sits in a very secure place that’s not on a cloud, so it’s hard for someone to get to.
If someone did get it, they could not get it online. They would only have the email addresses that are public knowledge and the email addresses of my 3.3 million subscribers.
I obviously don’t want that to happen — but in terms of a data breach, that’s extremely low.
Emilie Rodriguez is a Digital Privacy News intern. She is a journalism major at the University of Nevada.