The Erosion of Online Anonymity (and How to Restore It)

One of the most important principles of individual privacy is the ability to act anonymously.  When people are driving to a store or reading a book at home, they have a reasonable assumption that nobody is monitoring their behavior and attaching it to their name and address.

The same should be true on the internet: when you are online, there should be a presumption of anonymity. Nobody (including websites, ad networks, ad exchanges, widgets, and outside analytics services) should know who you are and what you do unless you sign up or log in.

In a better world with sufficient anonymity online, your search history and the sites you visit should not be matched back to personally-identifiable information (like your name, address, email, etc.) so it cannot be stolen, used to discriminate against you, or subpoenaed by the government. 

In online advertising, there are various standards for what constitutes sufficient online anonymity. But unless companies adhere to the highest standard and make consumers aware of it, internet users may assume their browsing behavior is being tied to their identity. They may then sharply reduce their internet use and be less likely to experiment with new online services. In short, the lack of available anonymity could stifle the online economy and all the innovation happening on the web.

What Anonymity Means

The key to protecting anonymity is to make it technically impossible, not just contractually prohibited or difficult, to tie an internet user to their name and address when they are not explicitly logged in.

This doesn’t mean that websites and third-party services can’t know something interesting about you. They might know that you are a woman who lives in the New York area, plays tennis, enjoys Settlers of Catan, is in the market for a trip to Italy, and drives a hybrid. This is good because they can use this data to give you a more personalized experience: content you like, better customer service, more targeted ads, and less spam. But they should not know that it is you.

Of course, once you log in or when you link to your name or Twitter profile, people might know it is you.   But it is important that when you first arrive at a site, nobody knows exactly who you are unless you explicitly log in. 

Prescriptions to improve online privacy and anonymity

Here are some prescriptions that online services should use to raise the bar on online privacy:

1. Eliminate the collection and analysis of “Machine ID”

A Machine ID is like your computer’s fingerprint: it can usually uniquely identify you. It is made up of the information that your computer may send to the sites you visit, like your IP address, browser configuration, “clock skew” (the millisecond difference between the clock on your machine and the one on the server), and more.

One reason sites and third parties collect Machine ID is to customize the online experience for people based on past machine behavior. But a Machine ID can be uniquely identifying because people largely use the same computers, meaning that Machine IDs can potentially be looked up and traced back to an individual (there is a marketplace for address-to-IP matches today). In fact, IP address alone can be traced back to 30% of households today.
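To make the risk concrete, here is a minimal sketch of how the attributes named above could be folded into a single Machine ID. The function name, the sample values, and the choice of SHA-256 are all assumptions made for illustration; this is not how any particular company computes its fingerprints.

```python
import hashlib


def machine_id(ip_address, browser_config, clock_skew_ms):
    """Fold passively observed attributes into one stable identifier."""
    # The unit-separator character keeps adjacent fields from running together.
    raw = "\x1f".join([ip_address, browser_config, str(clock_skew_ms)])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]


# Hypothetical values; the IP comes from a range reserved for documentation.
monday = machine_id("203.0.113.7", "ExampleBrowser/1.0; fonts=42; plugins=3", 137)
friday = machine_id("203.0.113.7", "ExampleBrowser/1.0; fonts=42; plugins=3", 137)
neighbor = machine_id("203.0.113.7", "ExampleBrowser/1.0; fonts=42; plugins=3", 912)

print(monday == friday)    # True: the same machine is recognized without any cookie
print(monday == neighbor)  # False: a different clock skew yields a different ID
```

Nothing here is stored on the visitor’s machine, so clearing cookies does not change the result. That is why the prescription is to stop collecting and analyzing these attributes in the first place.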

2. Store audience data in browser cookies

While browser cookies have recently received a lot of attention, they are one of the most privacy-centric ways to help personalize services for consumers.  Today, pretty much every site that you go to uses cookies every time you visit.  This is generally a good thing for consumer privacy, since – using browser settings – a person browsing the Internet can entirely control their cookies.   However, many companies don’t do a good job of anonymizing cookies.

Many firms store a unique ID in a user’s browser cookie and ping cloud servers with this unique ID to “see” data associated with the cookie. This system of storing unique cookie IDs has a lot of benefits since it enables the information associated with cookies to be quickly updated and more easily analyzed.

But using unique IDs also means people may no longer be anonymous.  A more privacy-centric solution is to store all the segments of a person directly on a cookie.  The data can be encrypted and secured so that only the cookie-placer can access it. 
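The difference between the two designs can be sketched in a few lines. Everything named here (the key, the function names, the segment labels) is hypothetical. One important caveat: Python’s standard library has no symmetric cipher, so this sketch only signs the cookie to make tampering detectable. The payload stays readable, and a real deployment would also encrypt it, as described above.

```python
import base64
import hashlib
import hmac
import json

# Assumption: a secret held only by the cookie-placer's servers.
SECRET_KEY = b"hypothetical-server-side-secret"


def _sign(payload):
    return hmac.new(SECRET_KEY, payload.encode("ascii"), hashlib.sha256).hexdigest()


def unique_id_cookie(user_id):
    """ID-centric: the cookie is a key into a server-side profile of one person."""
    return "uid=" + user_id


def segment_cookie(segments):
    """Segment-centric: the cookie carries the coarse segments and no identifier."""
    body = json.dumps(sorted(segments)).encode("utf-8")
    payload = base64.urlsafe_b64encode(body).decode("ascii")
    return "seg=" + payload + "." + _sign(payload)


def read_segment_cookie(cookie):
    """Return the segments, rejecting any cookie whose signature does not match."""
    payload, signature = cookie[len("seg="):].rsplit(".", 1)
    if not hmac.compare_digest(signature, _sign(payload)):
        raise ValueError("cookie was modified")
    return json.loads(base64.urlsafe_b64decode(payload))


visitor_a = segment_cookie(["tennis", "new-york-area", "hybrid-driver"])
visitor_b = segment_cookie(["hybrid-driver", "tennis", "new-york-area"])
print(visitor_a == visitor_b)  # True: same segments, same cookie, nothing unique
print(read_segment_cookie(visitor_a))
```

Two visitors in the same segments end up with byte-identical cookies, so the cookie itself cannot tell them apart. A unique-ID cookie, by contrast, is different for every visitor by design.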

Changing the cookie system from unique ID-centric to segment-centric is a large technical challenge and might take some sites, ad networks, and widgets many months to complete. But it would be great if, by this time next year, all companies were more pro-consumer in the way they store data within cookies.

3. Make it impossible to identify an individual using anonymous data segments

But storing the data directly on the cookie is only part of the challenge. Data also needs to be anonymized appropriately. Simply stripping personally identifiable information out of a cookie is not enough to make it anonymous. Recently, Netflix had to shut down its million-dollar recommendation contest as a result of an FTC inquiry about its anonymization practices.

If there is data on me that says my company is “Rapleaf” and my title is “CEO,” it is not anonymous because I am the only person who fits the join of both of those attributes. A more appropriate description would be company “technology start-up” and title “executive.” That leaves room to add other criteria like lives in “SF Bay Area,” plays “soccer,” and reads lots of books on “foreign policy” without anyone knowing it is me. Many people fit all those characteristics.
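One way to test whether a set of segments is coarse enough is to count how many people share each exact combination of attributes. The privacy literature calls this idea k-anonymity; the term does not come from this post. The sketch below uses invented records and hypothetical helper names, and it assumes every company in the sample counts as a technology start-up.

```python
from collections import Counter

# Hypothetical audience records, invented for illustration.
people = [
    {"company": "Rapleaf", "title": "CEO"},
    {"company": "Rapleaf", "title": "Engineer"},
    {"company": "ExampleSoft", "title": "CTO"},
    {"company": "ExampleSoft", "title": "Engineer"},
    {"company": "SampleWorks", "title": "CEO"},
    {"company": "SampleWorks", "title": "Engineer"},
]

# Assumption: a hand-written mapping from exact titles to coarse ones.
COARSE_TITLE = {"CEO": "executive", "CTO": "executive", "Engineer": "engineer"}


def generalize(person):
    """Replace exact attributes with coarse segments."""
    return {"company": "technology start-up", "title": COARSE_TITLE[person["title"]]}


def smallest_group(records):
    """Size of the rarest attribute combination; 1 means someone is unique."""
    counts = Counter(tuple(sorted(r.items())) for r in records)
    return min(counts.values())


coarse = [generalize(p) for p in people]
print(smallest_group(people))  # 1: "Rapleaf" + "CEO" points at exactly one person
print(smallest_group(coarse))  # 3: every combination is shared by three people
```

Each extra criterion (region, sport, reading habits) splits the groups further, so the count has to be rechecked whenever a segment is added.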

These are just three of many prescriptions that companies should implement to help ensure the presumption of anonymity.   Adopting these changes will require a short-term sacrifice for web sites and third-parties, but long-term these are the right decisions for companies to make.

Giving technologists a better appreciation of why privacy – and in particular anonymity – is really important is not an easy task.  Most Silicon Valley companies come from the perspective that their technology is sacrosanct.  As an engineer, I admit that we started my company Rapleaf with that approach. However, years of engagement with our web users, customers, partners, privacy experts and advocates (including our own privacy advisory board), have made it clear that investing in a safe infrastructure where users have the presumption of anonymity will ensure that the Internet will continue to grow and stay vibrant.

Special thanks to Michael Hsu, Joel Jewitt, Jeremy Lizt, Travis May, and others for their help and edits….

24 thoughts on “The Erosion of Online Anonymity (and How to Restore It)”

  1. Mike Lee

    You hit the nail on the head. Many people don’t realize that IP addresses can track most American households. You cited the statistic of 30% of U.S. households, but my estimate is that it is closer to 50%, and roughly 80% of high-income households (people who have broadband). Third-party services (and likely first-party services) should not be collecting IP addresses.

  2. Brad Acker

    Just encountered your blog and love it. You helped me clarify what I have been struggling with in my mind: my desire to have product makers understand my needs without having to click off a bunch of boxes in surveys all the time, yet my concomitant desire to be protected against “enemies” finding data that could be used against me in some public forum to achieve ends that I oppose. I appreciate your thinking!

  3. Hank Stringer

    I appreciate today’s blog a great deal. The company I helped found in the mid-90s was based on the value of candidate anonymity. As a longtime recruiter I understood the desire of working professionals to have control over their personal data and how it is used and monetized. I was afraid then, and have seen now, that most organizations in the recruitment technology space monetize traffic flow with little focus on matching the right talent with the right opportunity. This is frustrating and has caused the pipes to be too full; in many cases we’ve become inefficient as a result. I am not through trying to solve this problem –

  4. Hunter Johnson

    I love this topic. If you haven’t read all of these, they cover some good ground. This is basically my order of enjoyment:
    Total Recall: How the E-Memory Revolution Will Change Everything, by Gordon Bell & Jim Gemmell
    Ambient Findability, by Peter Morville
    Delete: The Virtue of Forgetting in the Digital Age, by Viktor Mayer-Schonberger.
    The last book in particular ends with a fizzle, since I don’t think much of the plausibility of his solutions. But it was worth reading for the investigation of the problem.

  5. Dennis Kneale

    i do, of course, entirely disagree with you. anonymity is ruining the net. it’s one thing to grant it to an anti-govt activist in china who would otherwise be imprisoned. it’s entirely another thing to grant anonymity to some guy trashing his neighbor.

  6. Jeff Hawkins

    A very interesting article and it raises many interesting questions.
    First, having tons of information about someone from NY may not be enough to identify the person, but what about someone from a more rural setting? For example, a farmer from a small town in Wyoming can probably be identified as the person who made a trip to Italy, or recently bought a certain vehicle.
    As you point out: if people know they can be identified by their web behaviour, does this affect their browsing? I think it does. Should people be able to control what others see? Are people willing to knowingly share purchase and association information (think of a Facebook profile page) if they know it’s for their own benefit? If people had the ability to maintain different online identities, would this free them to see and purchase more freely?
    And given that many people browse and purchase from work, would a central profiling tool better pool the information together?

  7. steve e

    Online privacy is an issue that is close to my heart, as you know– I sent you a book on the matter three years ago, before it was a popular topic of conversation. For that reason, I consider this your most important blog post ever. The general perception, I think, is that you and Rapleaf tend towards the other extreme, so I am very pleased to hear that these are your views.
    Privacy on the level of the individual helps foster innovation, dissent, and deviation from the norm, while governmental privacy breeds corruption and waste. Yet as the last vestiges of privacy are taken away from ordinary citizens by the government (“we must know about everything you do or say”) an increasingly greater amount of privacy is demanded by the government from its citizenry (“everything we do must remain secret”).
    Because knowledge is power, the corporate world seems to regard the erosion of privacy as a foregone conclusion, and perhaps they’re correct– but that doesn’t mean there aren’t things we can do to slow the process, minimize the impact, and educate people about the future.

  8. Spot Draves

    Hey Auren you might be interested to know that IPv6 actually has some serious privacy features built-in:
    Not many people use it yet but with the end of v4 address space approaching rapidly, adoption’s gotta pick up soon.

  9. George L. Lenard

    This is good stuff, and to the extent I comprehend it technically (maybe 70%), I agree.
    But there is another side of Internet anonymity, the dark side, that I find very troubling. That is the acceptance of anonymity or screen names in expressions of opinion. Most newspapers have some requirements for ID verification before publishing letters to the editor. Yet blogs and even official news sites of those same newspapers, TV stations, etc. allow pseudonyms. I have seen some godawful stuff expressed that people would never say if their name were attached. For example, a TV news posting about a fight at my local high school quickly degenerated into racism of the worst sort, including my being called “n___r-lover,” students called filthy animals, and much more.
    Perhaps this is the price of free speech. Perhaps it is just as well we know people’s true thoughts and the attitudes they express only under cover of anonymity. But perhaps such public online expression is very destructive and even partly responsible for our current, very troubling, socio-political divisiveness and legislative and policy stalemate.
    I’d like to see many sites adopt no-anonymity/pseudonym policies, with some means of automated ID verification. Personally, I always use my real name. Although at times I say things on Facebook etc. (like posting clips from NYT columnists) that might turn some people off, my choice of lack of anonymity keeps me aware of the importance of exercising some moderation.

  10. Jennifer Duxtra

    Third parties, like ad networks, don’t have a strong user relationship. Unlike first parties that people go to directly and engage with, third parties are behind the scenes and often the user does not even know they exist. While many third parties (including Rapleaf) perform incredibly valuable services, the bar for these companies should be much higher.

  11. I'm staying anonymous

    Great stuff. Unfortunately, these prescriptions are not well abided by today. Government and industry regulators should provide guidance on protecting anonymity, but we need to see more companies invest their technical resources to protecting consumers. This is really important. Today, the vast majority of companies in the NAI and the IAB (the two leading self-regulating industry groups) do not adhere to these principles – and the impact is a less anonymous web that is beginning to erode user trust.

  12. teknozen

    Nice article but it leaves out a few important details.
    1. You talk about the need to “Eliminate the collection and analysis of ‘Machine ID’” but you don’t provide any information about how to do that.
    2. You completely leave unaddressed the issue of Adobe Flash cookies. Even if you block all cookies on your computer, and diligently delete any that you do require for viewing certain sites, chances are there are Flash cookies being stored on your drive that act just like regular cookies and more: they can contain keystrokes, AND they can restore the regular cookies that you have already deleted.

  13. David Gorodyansky

    This is a great blog post. As a matter of fact I think it summarizes perfectly the tradeoff between personalization online and the need for privacy and anonymity.
    AnchorFree is actually one of the leading drivers behind making everyone online private and anonymous. 8.5 million unique users each month use our Hotspot Shield free VPN service to surf more than 2 billion monthly page views anonymously. We provide an anonymous IP address to the user and do not collect any personally identifiable information. With that said, we’re actually ad-supported. Just like you write in your blog, we enable advertisers to see that their BMW ad is targeted to car sites that a given user visits, and yet we prevent all third parties and ourselves from seeing who the user actually is.
    The side effect of this is also that millions of people that live in regions that censor the web are able to bypass censorship and gain complete freedom online. There is a BusinessWeek article in the latest BusinessWeek magazine about how what we’re doing is enabling users in China to have an uncensored Internet experience:

  14. Jennifer Fonstine

    This is one of the better privacy discussions. Three cheers for Rapleaf!

  15. La Guera

    At first thought, I disagree with you – believing that the internet is a public venue and should be viewed and accepted as such. I feel that only material that is legal in a user’s locale should be accessible on the internet. If a teenager walks into a grocery store and attempts to purchase liquor, the cashier is not going to accept that the teen simply recite his date of birth in order to “pass” the age test. Proof is required. The same should be true online. The Government should approve all web sites individually before they are made live and also monitor for improper/illegal content.

  16. Danny C

    “This is good because they can use this data to give you a more personalized experience: content you like, better customer service, more targeted ads, and less spam.”
    Why do you assume that it is a benefit to the individual to receive a “more personalized experience?” Can we start with debating that convenient assumption in rationalizing your business? Is there a law written in stone that a “more personalized experience” is better?

  17. cris2per

    Government and industry regulators should provide guidance on protecting anonymity, but we need to see more companies invest their technical resources to protecting consumers. This is really important. Today, the vast majority of companies in the NAI and the IAB (the two leading self-regulating industry groups) do not adhere to these principles.

  18. Miten Sampat

    Auren, thanks for writing this article and prescribing remedies.
    I recently logged into Rapleaf to “see my data” on your systems, and saw that I was accurately identified as an officer at the company I work for. Doesn’t that conflict with principle 3 in your prescription above?

  19. thesis writing

    Online privacy really is a serious issue nowadays, but there’s nothing we can actually do without using some special software or proxies to keep our real IP and other data private

  20. heat pumps prices

    Great content, and I hope to add your blog to my RSS reader, but I can’t get your RSS URL. Could you help me? It’s pretty time-saving to use RSS to read your blog.

  21. Online Surveys

    This unique blog is definitely awesome. I have discovered a bunch of interesting tips out of this source.
    I’d love to return again and again. Thanks a lots!

