What can our search queries tell us about ourselves?


Is privacy just a facade? In the world of web searching, the data is in: there is no such thing as confidentiality. Recently, AOL released a list of 20 million search queries collected over a three-month period. The data was published by the AOL Research division as an offering for academic research. According to the New York Times, the release so angered privacy advocates that AOL did an about-face, withdrew the data set and issued a public apology.

What’s the big deal, you say? Why should we be worried about search queries? Well… let’s take a look and see.

AOL was kind enough to remove any blatant personal identifiers from this data set. In their place, it substituted a unique number tied to each individual AOL account. While this may make you say, “Whew, at least there’s nothing personal attached to this data,” you’re mistaken. As the New York Times points out, a little sleuthing is all that’s required to identify some searchers.

While the NY Times article shared one fairly tame user’s search history, other users’ histories might lead to more troubling “outings”. Consider one example highlighted in an article in Slate:

The searches of AOL user No. 672368, for example, morphed over several weeks from “you’re pregnant he doesn’t want the baby” to “foods to eat when pregnant” to “abortion clinics charlotte nc” to “can christians be forgiven for abortion.”

It quickly becomes evident that our search queries tell a story about our lives. Like our email, our web usage says a lot about our interests, our desires and who we are as people. By sifting through our internet usage patterns, one could come to understand us almost as well as we know ourselves, warts and all.
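To make the point concrete, here is a minimal sketch of how little work it takes to turn an "anonymized" log into per-person stories. The rows are invented for illustration and the two-field (ID, query) layout is my assumption, not the actual format of the AOL files; `SAMPLE_LOG` and `build_profiles` are hypothetical names.

```python
from collections import defaultdict

# Invented sample rows in an assumed (anon_id, query) layout.
# These are NOT real AOL records; they only illustrate the idea.
SAMPLE_LOG = [
    (1001, "best pizza near springfield"),
    (2002, "how to change a tire"),
    (1001, "springfield high school reunion 1986"),
    (1001, "knee pain after running"),
    (2002, "used cars under 5000"),
]


def build_profiles(rows):
    """Group queries by pseudonymous ID, keeping the order they appear in."""
    profiles = defaultdict(list)
    for anon_id, query in rows:
        profiles[anon_id].append(query)
    return dict(profiles)


profiles = build_profiles(SAMPLE_LOG)
for anon_id, queries in profiles.items():
    print(anon_id, "->", "; ".join(queries))
```

Swapping a name for a number does nothing to stop this grouping. Every query still hangs off the same key, so a town, a graduating class and a health worry end up side by side in one profile.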

The Slate article goes on to identify seven types of web searchers. From “The Pornhound” to “The Newbie” to “The Basket Case”, there are numerous labels that can be both descriptive and dangerous.

While I do find these search logs quite interesting, I also see danger in how that data could be used. It’s a slippery slope from academic study of search logs to censorship and even to persecution. As crazy as this sounds, it is already happening in the world. Look at the media control in some communist countries. And if you think we’re immune here in the Western world, well… think again. It wasn’t long ago that freedom of speech was curtailed by the church. Even the United States is experiencing a resurgence in censorship.

How long until this powerful information is abused and distorted for unethical ends? I’d argue that it is already happening. What do you think?

Todd

For further information:

TechCrunch (blog archive), with great info on sources and further reading:
http://www.techcrunch.com/2006/08/06/aol-proudly-releases-massive-amounts-of-user-search-data/

AOL Search data mirrors:
http://www.gregsadetsky.com/aol-data/

Working mirror (as of Tues Aug 15):
http://aolsearchlogs.cloudsites.com/AOL-data.tgz

