By David Chaum, August 14, 2019.
During a recent meetup in Los Angeles, I was asked why metadata had stood out to me as a privacy issue before it did to anyone else. How had I come across a problem that would become such a pivotal issue 40 years later, and why had I decided to build technical solutions for it while everyone else in cryptography was focused elsewhere?
By the time I arrived at Berkeley, I had obviously realized that cryptography was going to be a critical tool for structuring our future informational world. But the issue of metadata, specifically, came to me by chance while I was exploring the UC Berkeley library.
Back in the '70s (and, I think, still today), the Moffitt Library at UCB was one of the repositories for the Federal Record. This meant that, as a Berkeley student, you had access to the full record of congressional testimony if you were interested, and I was. While combing through those records, I came across a story about the CIA and Chile.
The story was actually the CIA's Senate testimony, one of the few times the CIA has ever had to testify publicly and discuss its operations on the record. The testimony focused on how the CIA helped overthrow the democratically elected Popular Unity government in Chile in 1973. It was particularly interesting to me because it was the first time I had really been exposed to the revealing nature of traffic analysis: the way the data surrounding a message can be more revealing than the content of the message itself. Essentially, the testimony said that by tapping into the central phone exchange at the Presidential Palace, the CIA was able to record which numbers called each other and how long they talked. They did this over the course of several months, and from that information they were able to understand who was really running the country, identifying the 60 or so people around the President who would be key to tactically moving in and seizing power.
For me, this was an eye-opening moment that revealed the power of traffic analysis (which today is euphemistically called "metadata"). What stood out to me then, and still does today, is that the information about who talks to whom and when, how that correlates with events, and how long people talk is the most revealing information available, because it doesn't lie. It can't be made to tell lies or to speak in code to trick an outside party. It is also very simple information (timestamps, ID numbers, data sizes) that can be easily aggregated for analysis and can very directly reveal the social graph of those surveilled.
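To make the point about simple, aggregable metadata concrete, here is a minimal sketch of how call records alone can expose who sits at the center of a network. It assumes hypothetical call-detail records of the form (caller, callee, duration in seconds); the party names and numbers are invented, and ranking by total talk time ("weighted degree") is just one simple centrality measure chosen for illustration, not a claim about the method the CIA actually used.

```python
from collections import defaultdict

# Hypothetical call-detail records: (caller, callee, duration in seconds).
# Only metadata is present here; no message content is available at all.
CALLS = [
    ("palace", "A", 600),
    ("A", "B", 300),
    ("A", "C", 420),
    ("B", "C", 60),
    ("palace", "A", 900),
    ("D", "E", 30),
    ("A", "D", 120),
]


def build_graph(calls):
    """Aggregate records into an undirected weighted graph: (u, v) -> total seconds."""
    edges = defaultdict(int)
    for caller, callee, seconds in calls:
        # Sort the pair so A->B and B->A accumulate on the same edge.
        pair = tuple(sorted((caller, callee)))
        edges[pair] += seconds
    return dict(edges)


def rank_by_weighted_degree(edges):
    """Rank parties by total talk time across all contacts (weighted degree)."""
    score = defaultdict(int)
    contacts = defaultdict(set)
    for (u, v), seconds in edges.items():
        score[u] += seconds
        score[v] += seconds
        contacts[u].add(v)
        contacts[v].add(u)
    # Highest total seconds first; number of distinct contacts breaks ties.
    return sorted(score, key=lambda p: (score[p], len(contacts[p])), reverse=True)


if __name__ == "__main__":
    print(rank_by_weighted_degree(build_graph(CALLS)))
```

Running the script prints `['A', 'palace', 'C', 'B', 'D', 'E']`: party "A" surfaces as the hub purely from who called whom and for how long, without a single word of any conversation being inspected.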
I was immediately convinced that this was going to be a key issue in information security, both at the time and even more critically once computers became powerful enough to analyze massive volumes of that metadata. So I set to work coming up with ideas on how to prevent its collection and analysis in the future. It was that testimony which led me to discuss metadata privacy and voting with Bob Fabry, which ultimately led to mix networks and more.
Unlike back then, the term "metadata" is now part of our daily vocabulary. In recent years it has moved from a topic discussed among advertising professionals, engineers, and privacy experts to a punchline between friends joking about Instagram ads so highly targeted it's as though Facebook read our minds or bugged our conversations.
But even though metadata has been a part of our everyday lives for decades, only recently has it been so top of mind. Thanks to a combination of the Snowden revelations, the Facebook and Cambridge Analytica scandals, and a generally increased understanding of the internet's advertising-driven business model, we've now grown acutely aware of its constant presence and impact. It is unfortunate that it has gotten this far and become the widespread issue we all know, but it's a welcome change to have the public aware of metadata's privacy implications. That awareness is creating the necessary momentum to go back and fix the problems we've neglected. It's our second bite at the apple.