The Anonymization Fallacy

I was asked by the local medical school to give an afternoon session on patient data security. The idea was to tell them how to properly anonymize their data so that the relevant patient data security laws can be followed.

I was planning on talking about the laws and then going through selection, generalization, perturbation, k-anonymity, etc. Maybe throw in a bit of cryptographic magic dust. Then I found a paper that shocked me.

There is no such thing as anonymous patient data.

The so-called patient identifying information, obvious stuff like name and address, is not the only thing that can be used to identify people. You can be identified on the basis of birthdate, sex, and zip code alone. You can be identified on the basis of the search terms you typed into a search engine. You can be identified on the basis of the movies you reviewed.

Forget even things like semi-anonymous blogs.

I rooted around and found four disturbing papers:

  • Latanya Sweeney, Uniqueness of Simple Demographics in the U.S. Population, Laboratory for International Data Privacy Working Paper, LIDAP-WP4 (2000)

    She analyzed US census data for 1990 and asserts that 87% of U.S. citizens are identifiable (1-anonymous) with just birthdate, sex, and zip code. She also irritated the hell out of Massachusetts Governor William Weld, who had released "anonymized" health data on state employees: she handily picked his complete medical history out of the data, using another set of data she purchased for $20.

    The working paper does not seem to be available online, but a preprint of her dissertation with the results is findable.

  • Arvind Narayanan and Vitaly Shmatikov, De-anonymizing Social Networks, IEEE Security & Privacy '09.

    Narayanan (who authors the blog 33bits on privacy questions) and Shmatikov took the Netflix challenge in a rather different way than intended: they compared this massive graph of unknown people rating known films with the IMDb database of known people rating known films. It turns out that even though subgraph isomorphism is NP-complete (really hard to calculate, for non-theoretical computer scientists), if you can identify certain nodes (in this case the film names on the nodes in one half of the bipartite graph), you can quickly find overlap. Unique overlap.

  • Paul Ohm, Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization, Preprint, University of Colorado Law Legal Studies Research Paper No. 2009-12

    This lawyer has given the whole privacy question such a thorough shakedown that it is left standing nakeder than under a full-body scanner. There is no such thing as privacy any more. He insists that the ancient, creaking legal system get its act together and deal with technology. I'm not holding my breath, but I am shaken to the core. This paper is an excellent read, copiously footnoted.

  • Philippe Golle, Revisiting the Uniqueness of Simple Demographics in the US Population. In Proceedings of the 5th ACM Workshop on Privacy in Electronic Society (WPES '06), Alexandria, Virginia, USA, October 30, 2006. ACM, New York, NY, 77-80. DOI: http://doi.acm.org/10.1145/1179601.1179615 (in the ACM Digital Library)

    Golle tries to revalidate Sweeney's results. He "only" gets 63% instead of 87% for the 1990 census, but he gives his methods and tests both the 1990 and 2000 census data. A fascinating - and scary - read.

So I decided to give the students a different lecture and let them understand what privacy means and why it is a problem.

Right at the start some computer science types questioned my premise that there is no anonymity. They had had lectures on this before. I discussed these papers, and then we did an experiment.

I passed out papers and had them put down their age, sex, country of birth, bachelor's degree program and the city they graduated in. They were also to make up a horrible disease they had. I didn't trust my own instincts, but it turned out I could have: the last two attributes were completely unnecessary in such a small group.

I collected the papers and had one student choose one at random. Then I had everyone stand up. A 27-year-old female from Germany was chosen. Even though there were 8 students from Germany in this population of 23 students, when I asked for all the non-27-year-olds to sit down, only 5 were left standing. And there was only one woman in this group. I had the men sit down and asked her if she had written the deadly disease X on her paper. She had, and was rather shaken.

If you can isolate equivalence classes and then rule out some of them, you very quickly can narrow down even an extremely large population.
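The stand-up experiment can be sketched in a few lines of Python. This is only an illustration: the records are invented (they are not the real class data), and the names `students`, `equivalence_classes`, `k_anonymity` and `unique_fraction` are made up for this example. Records that share the same values for the chosen attributes form one equivalence class; the smallest class size is the k of k-anonymity.

```python
from collections import Counter

# Invented example records; NOT the real data from the lecture.
students = [
    {"age": 27, "sex": "F", "country": "Germany"},
    {"age": 27, "sex": "M", "country": "Germany"},
    {"age": 27, "sex": "M", "country": "Germany"},
    {"age": 25, "sex": "F", "country": "Germany"},
    {"age": 25, "sex": "F", "country": "Germany"},
    {"age": 31, "sex": "M", "country": "France"},
    {"age": 31, "sex": "F", "country": "France"},
]


def equivalence_classes(records, quasi_identifiers):
    """Count how many records share each combination of attribute values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest equivalence class (k = 1 means someone is unique)."""
    return min(equivalence_classes(records, quasi_identifiers).values())


def unique_fraction(records, quasi_identifiers):
    """Fraction of records that are alone in their equivalence class."""
    classes = equivalence_classes(records, quasi_identifiers)
    singles = sum(1 for size in classes.values() if size == 1)
    return singles / len(records)


if __name__ == "__main__":
    # Add one attribute at a time and watch the classes shrink.
    for attrs in (["country"], ["country", "age"], ["country", "age", "sex"]):
        print(attrs, "k =", k_anonymity(students, attrs),
              "unique fraction =", unique_fraction(students, attrs))
```

With these made-up records, country alone and country plus age still leave everyone in a group of at least two; adding sex singles out three of the seven.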

We had a break, and after the break did some case studies in medical ethics. They had a good time with that, two students from different countries getting into a wonderful row about private companies obtaining data to deny people health insurance.

After class many came up to speak to me. One wanted assurance that Tor would keep him anonymous when surfing. Sorry, I said. Did you know that the versions of the plugins you use for Firefox, the installed fonts and your time zone pretty much identify you? The EFF has a site, Panopticlick, that will help you see this. And even the history links in your browser are readable and can identify you.

Of course, sometimes they are wrong. According to this site:
Likelihood of you being FEMALE is 9%
Likelihood of you being MALE is 91%
I guess I surf like a guy.

The EFF has a few tips on staying anonymous. They rather boil down to only surfing with an iPhone that is not registered in your name.

I suppose we will have to realize that the whole world is watching what we do online. Literally.


Google doesn't get privacy

The blogosphere is "abuzz" (if I may use that term) about the new Google service that is supposed to rival Twitter. Google is so helpful in setting it up for you from your Google account that it automatically adds the people you write a lot of emails to as people you follow and who follow you.

Slashdot reports on many discussions, including this one (blog is now blocked, so I am quoting from the thread on /.)

I use my private Gmail account to email my boyfriend and my mother.
There’s a BIG drop-off between them and my other “most frequent” contacts.
You know who my third most frequent contact is?
My abusive ex-husband.
Which is why it’s SO EXCITING, Google, that you AUTOMATICALLY allowed all my most frequent contacts access to my Reader, including all the comments I’ve made on Reader items, usually shared with my boyfriend, who I had NO REASON to hide my current location or workplace from, and never did.

Ah, yes. If you write emails to each other, you must be friends. It would be wonderful if the world were such a nice place!

Many of the commenters berate the woman for being "stupid". Well, after my recent bout with Google, all I can say is that it is easy to be "stupid".

When I started this blog in 2005 I called it semi-anonymous. People in Real Life knew who I was, but strangers were welcome to stop by. I didn't show my real name on the profile. Then I started a second blog, a professional one. I wanted my real name there.

Then I realized that Google was publishing my name on my private blog! No, Google, I don't want that! So I removed author information, got down into the template, commented out the place where my name was, and continued my merry way. For years.

Just a few weeks ago, an anonymous commenter remarked that I wasn't all that anonymous, I should have a look at the RSS feed. Gasp! There was my full name, in the clear.

I made up a new email, and split the private blog off - added the new email as an author, then removed myself (I had been using another, different throwaway email for Blogspot, but somehow it knew that this was me). I scoured the options - only WiseWoman, no one else.

But my real name was still on the RSS feed.

I went through the forums. Many people, pleading for help - how to set up the XML so that it DIDN'T send out author information? There is just no place to set this. Google decides what to put in the XML, you either like it or go elsewhere. I published pleas for information, but Google doesn't appear to read its own help forums.

I kept checking the forums every now and then, no answer. Today I got mad, and went ballistic really looking for help in other forums. I found an intriguing solution: export the blog to XML on my computer (which would give me a backup, too); use a code editor to search and replace the author info; delete all the posts; and then import them again. This would bug my readers, giving them 500 new posts, but it would fix the identity problem.
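The search-and-replace step could also be scripted rather than done by hand in an editor. What follows is a minimal sketch, not the procedure actually used here. It assumes the export is an Atom feed in which each post carries an `author` element with a `name` child; the real export may be structured differently. The function name `scrub_author` and the sample names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the exported blog is an Atom feed using this namespace.
ATOM = "http://www.w3.org/2005/Atom"


def scrub_author(xml_text, real_name, pen_name):
    """Replace real_name with pen_name in every Atom <author> element.

    Returns the rewritten XML as a string and the number of authors changed.
    """
    # Keep Atom as the default namespace so the output is not full of prefixes.
    ET.register_namespace("", ATOM)
    root = ET.fromstring(xml_text)
    changed = 0
    for author in root.iter(f"{{{ATOM}}}author"):
        name = author.find(f"{{{ATOM}}}name")
        if name is not None and name.text == real_name:
            name.text = pen_name
            # Drop the other children (email, uri): they identify just as well.
            for child in list(author):
                if child is not name:
                    author.remove(child)
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


if __name__ == "__main__":
    # Tiny hand-made feed standing in for the real export file.
    sample = (
        '<feed xmlns="http://www.w3.org/2005/Atom">'
        "<entry><title>One</title><author><name>Jane Doe</name>"
        "<email>jane@example.org</email></author></entry>"
        "</feed>"
    )
    cleaned, count = scrub_author(sample, "Jane Doe", "WiseWoman")
    print(count, "author entries rewritten")
    print(cleaned)
```

Parsing the XML instead of doing a blind text replace leaves the real name alone wherever it appears in post bodies, so those would still need a separate look.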

I tried one first. Opened the post in a new tab (I'm paranoid), deleted it. Imported the massive file. Yup. One new post, and now written by WiseWoman. So then I deleted all the ones written in my real name, which is no easy task. I reformatted the edit posts to have 100 posts per page, selected all, and deleted, starting at the back. Then on to 25 per page for the latest posts, so that I wouldn't disturb my readers currently enjoying the Metropolis post.

Then I re-imported the mess. It took a while, but now all of my posts are written by WiseWoman. And the RSS feed is clean. But really, Google, you don't get privacy. It's not about not doing things I shouldn't, as Eric Schmidt, CEO of Google, Inc. said. Privacy is about me determining who gets to see or know which details about me. It's my life. You only get to know what I want you to know, nothing more.


Since there was so much interest in the premiere of the restored version of Fritz Lang's "Metropolis", the French/German TV station arte broadcast it in parallel with the showing at the 60th Berlin Film Festival.

In 2008 an almost complete version of the film was found in an Argentine film museum. The copy was well-worn, but they have been able to piece it together and resequence it, as the music score has been preserved in its original length.

The Berlin Symphony orchestra is playing live in the Friedrichstadtpalast for the showing, and there is some sort of public viewing at the Brandenburg Gate - in the snow. We are sitting in a warm living room, full after a nice dinner and with a nice cold glass of prosecco, and are enjoying the film.

Having seen the film many times before, one can pay attention to some details: the editing, the shapes on the doors, the flow of traffic, the traffic lights that only go on while a car is driving under them, the heavy Christian symbolism, the video telephone. Many of the title boards with dialogue are well-known sayings to modern-day Germans: "Es muss ein Mensch an der Maschine sein" ("There must be a person at the machine"), "Vater, Vater, nehmen 10 Stunden nie ein Ende?" ("Father, Father, will 10 hours never end?"), "Der Mittler zwischen Hirn und Händen muss das Herz sein" ("The mediator between brain and hands must be the heart").

Some parts actually now make sense - the missing scenes are still horribly scratched, but they give reasons for Freder's actions and reactions. Of course, many of the cut scenes are what passed for sex scenes in 1927, or are extremely critical of Germany at the time. And the bit about the woman between Rotwang and Fredersen was cut for the US version, because her name, Hel, sounds like, well, a four-letter word....

And in a way it's like watching the source for many a film quotation (plagiarism?). Blade Runner springs to mind. It was also hard work - being a silent film, you had to concentrate on the screen. No surfing for me during the film! The music was just wonderful - the musicians played for 2 1/2 hours without a break!

A DVD is expected out in December.


A Shot in the Back

Friday morning. 8.30 am.

I called my GP. They said, "We have a packed waiting room, and we'll just send you to the orthopedist if it looks bad."

Praise be for private insurance: I don't have to get a referral. So I managed to get my boots on, and crept over the icy sidewalks to our local bones and ligament specialists.

There were already people waiting, and a big sign: We are an appointment office now. Oops. Oh well. When the receptionist showed up my card was taken, and I was assured that I could be seen.

The waiting room filled - 14 people, and the docs aren't here yet! I wonder how many are ice injuries? It is a scandal and a disgrace that Berlin still doesn't have clean sidewalks after 6 weeks of snow!

But I didn't have to wait too long, and the doctor had a look and poked me in different places. Then he gave me a shot of something great between the third and fourth lumbar vertebrae, and I soon could move without excruciating pain.

I also got told, nicely, to cut down on stress and come back next week for a physiotherapy plan. He prescribed pain pills and promised another shot, if I came back next week. I'll do the physiotherapy bit, I hear one gets nice back massages prescribed....


Oh my aching back!

I felt this coming on last evening while attending a formal dinner and sitting on fancy but uncomfortable chairs: lumbago attack.

This morning I could barely get out of bed. But it was the last lecture and the faculty board meeting, and I was to chair the meeting, so I dragged myself out of bed after a session with the heat lamp.

I packed a heating pad, heating cream and ibuprofen and headed out.

I had to have students set up my computer: the electrical outlet is taped to the bottom of the table leg, and there was no way I could bend down to do that today. I assumed my seat and didn't move for 90 minutes. I found it quite problematic to sit still and lecture. I normally stride all over the place and like to bang around on the projected image to call attention to this point or that.

None of that today.

Then when I went to play the video, I realized I would have to get up and go over to the wall to do so - and asked for student help again, which was readily given.

But we made it, and I applied heat in my office for an hour before heading out to the faculty meeting, which even in a healthy state can be very trying. Again I got myself settled and had someone go get me a tea (coffee supposedly doesn't mix well with ibuprofen).

Getting back to the car was hard - sloshing through the snow, afraid at every step of slipping. Getting into the car was pure pain. Planned braking I did by using my shifting right hand to pick up my leg from the gas and plant it on the brake. I hoped there would not be any unplanned braking. I got lucky.

Getting out at home was painful. I managed to turn the left blinkers on while getting out and had a hard time bending back down to turn them off. Then a neighbor from next door whom I do not know hollered something down at me. I had to ask her to repeat it. She wanted to tell me that my right back lights were out. Well, if I have the left blinker on, yes they are. Sigh. I suppose she meant well.

But I'm feeling like I'll be needing a walker soon. If so, I want it to be purple and have a basket to put all my stuff in.