The people who design hardware and software for smartphones, internet browsers, high-tech cars and many other internet-enabled devices need to know how people use their products in order to make them better.
But when faced with the request to send information about a computer error back to the developers, many of us are inclined to say “No,” just in case that information is too personal.
Adding to the techniques for bridging this divide, researchers at Stanford University have developed a new system for aggregating these kinds of usage reports that emphasizes maintaining personal privacy.
“We have an increasing number of devices — in our lightbulbs, in our cars, in our toasters — that are collecting personal data and sending it back to the device’s manufacturer. More of these devices means more sensitive data floating around, so the problem of privacy becomes more important,” said Henry Corrigan-Gibbs, a graduate student in computer science who co-developed this system. “This type of system is a way to collect aggregate usage statistics without collecting individual user data in the clear.”
Their system, called Prio, works by breaking up and obscuring individual information through a technique known as “secret sharing” and only allowing for the collection of aggregate reports. So, an individual’s information is never reported in any decipherable form.
Prio is currently being tested by Mozilla in a version of Firefox called Nightly, which includes features Mozilla is still testing. On Nightly, Prio ran in parallel to the current remote data collection (telemetry) system for six weeks, gathering over 3 million data values. There was one glitch, but once that was fixed, Prio’s results exactly matched the results from the current system.
“This is a rare example of a new privacy technology that is getting deployed in the real world,” said Dan Boneh, the Rajeev Motwani Professor in the School of Engineering and head of the Applied Cryptography Group at Stanford, who developed Prio with Corrigan-Gibbs. “It is really exciting to see this put to use.”
Two servers keeping secrets
Secret sharing is a method for maintaining the security of data that involves breaking up a piece of information into specially formulated parts, so that if someone gets hold of only one part, they learn nothing about the original piece of information. Prio uses secret sharing to break individual data points — such as whether you chose to change your browser homepage from the default setting — into secret shares and then sends those to two different servers. Even if an attacker is able to take over one of the two servers, the attacker still cannot recover any individual’s data point.
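The splitting step described above can be illustrated with a minimal sketch of two-party additive secret sharing. This is an illustration under stated assumptions, not Prio's actual code: the modulus and the function names are hypothetical choices made for this example.

```python
import secrets

# Assumption for this sketch: arithmetic is done modulo a large prime.
# 2**61 - 1 is a Mersenne prime chosen here for convenience only.
MODULUS = 2**61 - 1


def split_into_shares(value):
    """Split an integer into two shares that sum to `value` modulo MODULUS."""
    # The first share is uniformly random, so on its own it says nothing
    # about the value being shared.
    share_a = secrets.randbelow(MODULUS)
    # The second share is whatever is needed to make the two add up.
    share_b = (value - share_a) % MODULUS
    return share_a, share_b


def reconstruct(share_a, share_b):
    """Recombine both shares; only someone holding both can do this."""
    return (share_a + share_b) % MODULUS


# Example: 1 means "changed the homepage", 0 means "kept the default".
share_for_server_a, share_for_server_b = split_into_shares(1)
recovered = reconstruct(share_for_server_a, share_for_server_b)
```

Because the first share is drawn uniformly at random, the second share is also uniformly distributed when viewed alone, which is why an attacker holding a single server's share learns nothing about the data point.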
To produce the aggregate value of interest, the servers each sum up their shares and then exchange these sums. By combining the sums, the servers can learn the final aggregate statistic — what percent of people changed their browser homepage from the default — without leaking any other information about the individual pieces of information involved.
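The aggregation step above can be sketched in the same spirit. The `Server` class, the `submit` helper and the modulus are hypothetical names and values invented for this illustration; a real deployment would run the two servers on separate machines under separate operators.

```python
import secrets

# Assumed prime modulus for this sketch (2**61 - 1 is a Mersenne prime).
MODULUS = 2**61 - 1


class Server:
    """Keeps only a running sum of the shares it has received."""

    def __init__(self):
        self.running_sum = 0

    def receive(self, share):
        self.running_sum = (self.running_sum + share) % MODULUS


def submit(bit, server_a, server_b):
    """Split one user's 0/1 answer and send one share to each server."""
    share_a = secrets.randbelow(MODULUS)
    share_b = (bit - share_a) % MODULUS
    server_a.receive(share_a)
    server_b.receive(share_b)


def count_changed_homepage(bits):
    """Return how many users reported 1, using only the two servers' sums."""
    server_a, server_b = Server(), Server()
    for bit in bits:
        submit(bit, server_a, server_b)
    # The servers exchange their sums; the random parts cancel out.
    return (server_a.running_sum + server_b.running_sum) % MODULUS


reports = [1, 0, 1, 1, 0]  # made-up answers from five users
total = count_changed_homepage(reports)
percent = 100 * total / len(reports)  # 3 of 5 users, i.e. 60.0
```

Each server's running sum looks random by itself. Only when the two sums are combined do the random parts cancel, leaving the count of users who changed their homepage.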
Prio can handle large amounts of data and, so long as the two servers never collude, the system reveals nothing other than aggregate statistics. The system can further enhance privacy by slightly perturbing the final result. To keep bad input out of the totals, Corrigan-Gibbs and Boneh developed a method whereby the device sending the data proves to the servers that a set of secret shares is well formed, without revealing any information about the data that the shares encode. Without a proof of this sort, a single faulty or malicious participant could send a garbled set of shares to the servers, which would completely corrupt the final reports.
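To see why such a proof matters, consider the following sketch. It does not implement the researchers' proof system; it only shows the failure that the proof is meant to prevent, using hypothetical helper names and an assumed modulus. Each honest user is supposed to submit shares of 0 or 1, but nothing in plain secret sharing stops a participant from submitting shares of some other number.

```python
import secrets

# Assumed prime modulus for this sketch (2**61 - 1 is a Mersenne prime).
MODULUS = 2**61 - 1


def split_into_shares(value):
    """Split an integer into two additive shares modulo MODULUS."""
    share_a = secrets.randbelow(MODULUS)
    return share_a, (value - share_a) % MODULUS


def aggregate(submissions):
    """Sum each server's shares separately, then combine the two sums."""
    sum_a = sum(a for a, _ in submissions) % MODULUS
    sum_b = sum(b for _, b in submissions) % MODULUS
    return (sum_a + sum_b) % MODULUS


# Four honest users each submit shares of a 0 or a 1.
honest = [split_into_shares(bit) for bit in [1, 0, 1, 1]]

# One malicious user submits shares of 1,000,000 instead of 0 or 1.
# Each share still looks like a random number, so neither server can
# spot the problem by inspecting its own share.
malicious = split_into_shares(1_000_000)

honest_total = aggregate(honest)                   # 3
corrupted_total = aggregate(honest + [malicious])  # 1000003
```

A single bad submission turns a count of 3 into 1,000,003, and the servers cannot detect it from the shares alone. That is the gap a proof of well-formedness is designed to close.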
A real-world test
Currently Mozilla is testing Prio using non-sensitive data it already collects and is running both servers. In order to fulfill the privacy-preserving potential of Prio, Mozilla would have to find a trustworthy third party to run the second server. It is also continuing its tests of Prio and will be providing updates about progress via its blog.
For their part, the researchers are excited about the potential of Prio for many different kinds of devices and data sharing. They also appreciate seeing their work in action.
“To me, this is the best example of why research is exciting. You get to study these things and you get to launch them into the real world and see them have impact,” said Corrigan-Gibbs. “This began as a fascinating theoretical problem about proof systems and zero knowledge. And then 18 months later, there are 100,000 people using it.”