Usage Data from Web Search Users: Benefits and Limitations

The Internet has grown a lot over the last few years, and this can be disturbing because of the large amount of information it holds about us (the websites we visit and the searches we run). This is useful because it makes our online life easier but, on the other hand, it’s a big privacy issue!

Earlier this year the Barcelona search congress took place, where Ricardo Baeza-Yates, VP of Research for Europe and Latin America, explained to us how search engines work and how they will work taking this aspect into account:

The main idea is that there is more and more information about what we search for on the Internet. This leaves a footprint of who we are and what we are interested in, which can be very good but also quite bad.

The biggest advantage is that search engines can anticipate our information requests. This helps us rewrite our searches with the famous “Did you mean?” question (AKA dynamic queries these days) (page 5 of the presentation).
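
To make the idea concrete, here is a minimal sketch of a “Did you mean?” suggestion driven by usage data. It is only an illustration under my own assumptions, not how any real search engine does it: the `QUERY_COUNTS` log, the `did_you_mean` function and the `cutoff` value are all made up for this example, and real engines use far more elaborate models.

```python
from difflib import get_close_matches

# Hypothetical usage data: past query -> how many times users searched it.
QUERY_COUNTS = {
    "schwarzenegger": 9500,
    "schwarzenegger movies": 1200,
    "barcelona search congress": 40,
}


def did_you_mean(query, counts=QUERY_COUNTS, cutoff=0.8):
    """Return a similar past query that is more popular, or None."""
    query = query.lower().strip()
    # Past queries whose spelling is close to what the user typed.
    candidates = get_close_matches(query, list(counts), n=3, cutoff=cutoff)
    candidates = [c for c in candidates if c != query]
    if not candidates:
        return None
    # Among the similar queries, pick the one the crowd searched most.
    best = max(candidates, key=counts.get)
    # Suggest it only if it is more popular than the typed query.
    if counts[best] > counts.get(query, 0):
        return best
    return None
```

With this toy data, `did_you_mean("shwarzneger")` suggests `"schwarzenegger"`, simply because that spelling is close and far more people searched for it. The suggestion is only as good as the crowd's past searches.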


But this has its risks and limitations:

  • Size issue: We can get trapped by the wisdom of crowds. For example, you look for info about a German friend of yours named Shwarzneger, but search engines keep showing you info about the famous Schwarzenegger (page 10 of the presentation).
  • Personalization issue: Currently the results of our searches are customized by language, location and previous searches. That gets us stuck inside our own preferences and tastes, like the protagonist of The Truman Show (page 12 of the presentation).
  • Privacy issue: Today, just by analyzing our search logs it’s possible to guess our gender (with 84% accuracy), age (with 79% accuracy, within a 10-year margin of error), postal code (with 35% accuracy) or our name (full name with 1.25% accuracy, or just the name with 8.9% accuracy). It actually happened once to an old woman, who one day found a journalist knocking at her door asking whether she was user number 4417749.


    This is what happened: in 2006 AOL released a list of 20 million web search queries for social research (no user identification, only an ID). At first no problem was apparent, but user number 4417749 had made hundreds of searches over a three-month period (e.g. “landscapers in Lilburn, Ga”, “60 single men” or “dog that urinates on everything”). It was easy to identify that user: a 62-year-old widow living in Lilburn (Georgia, USA), who was visited by a reporter (more info in The New York Times: A Face Is Exposed for AOL Searcher No. 4417749).
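
Why was an ID not enough to hide that user? A small sketch, under my own assumptions, shows the problem: once every query carries the same ID, grouping by that ID rebuilds a personal profile. The `LOG` rows and the `profile_by_user` function are made up for illustration; the three queries are the ones quoted above, and the second user is invented.

```python
from collections import defaultdict

# Hypothetical log rows: (user ID with no name attached, query text).
LOG = [
    (4417749, "landscapers in Lilburn, Ga"),
    (1000001, "weather tomorrow"),  # invented second user
    (4417749, "60 single men"),
    (4417749, "dog that urinates on everything"),
]


def profile_by_user(log):
    """Group queries by user ID, keeping the order they appear in."""
    profiles = defaultdict(list)
    for user_id, query in log:
        profiles[user_id].append(query)
    return dict(profiles)


if __name__ == "__main__":
    for user_id, queries in profile_by_user(LOG).items():
        print(user_id, queries)
```

Each query alone says little, but read together they point to a town, an age range and a pet. The ID removes the name, not the pattern.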

Besides this, it is possible to take advantage of all this information by means of context (page 15 of the presentation).

Ricardo showed us a very interesting piece of work combining maps and geolocation, in particular a set of photos from Twitter users walking the Way of St. James as pilgrims, which forms a cloud across the north of the map of Spain.

You may find something similar in Eric Fischer’s Flickr gallery:

See something or say something: Barcelona
Pics from Barcelona

Let me finish by mentioning Fernando Macià and his team at HumanLevel for their fantastic job sharing their notes about this presentation. I’d also like to recommend these books suggested by the speaker:

And you? Did you attend BCN SearchCongress 2013? Did I miss something?

Comment, please O:)
