Search Analytics

Conversations with your Customers

Title In Progress

Needed: policies on retention and use of search logs

The premise of our book is that search analytics, or search log analysis, yields tremendous benefits. We think just about everyone ought to analyze the searches their customers perform. If we succeed in our mission, a lot more organizations will start analyzing their search logs. But the recent AOL disclosure of log information vividly demonstrated the risk to privacy if you let your logs out into the wild. So Lou and I are mulling over what an ideal search log retention and use policy would look like.

I've scoured the Net, and have found that while lots of sites have privacy policies, virtually no one discusses search logs in particular. A happy exception is the ACM, whose policy states:

ACM does not log the identity of visitors. However, we may keep access logs, for example containing a visitor's IP address and search queries. We may analyze log files periodically to help maintain and improve our Web site and enforce our online service policies. Cookies are only set when users visit restricted portions of our website. Raw log files are treated as confidential and retained for no longer than two years.

Ahh. Clear and concise. So if we were going to write a manifesto to persuade everyone who retains search logs that they need a policy, what might the major bullet points be? Here's a stab:

Bullet points:

  • The AOL fiasco showed poor judgment on the part of well-meaning researchers, who lost their jobs after trying to contribute to the academic field of information retrieval.
  • The AOL researchers failed to realize that identifying users by a randomly assigned number was not sufficient to protect identity. It is trivially easy to load the dataset into a database and reconstruct an individual's search activity over time -- and that's what techies did within days.
  • Search log analysis is an incredibly useful tool that can help improve the way we match end users with the information they seek. It would be a shame to see this incident impede that research in any way.
  • Many organizations publish their privacy policies, but very few sites that offer a search function specify their policies on search log analysis. A notable exception is the ACM. A search log analysis policy should include:
    • Scope: what is recorded in the log? (E.g. IP address, user name, etc.)
    • Access: who in the organization has access to the logs? Does the organization share the logs with outsiders?
    • Retention: how long is raw log information retained?
    • Reduction: does the organization routinely aggregate the raw log data so that no personally identifiable information remains?
  • The long tail depreciates: We need to gather search logs in order to improve the information services we offer. We do not need to retain them forever. We should all understand that raw data depreciates over time. Most organizations have little reason to keep raw logs longer than, say, 60 days.
  • The public and the media should vigorously press Google and its competitors on this issue. At the very least, Google et al. should disclose very specifically what their policies are on retention and use of search logs.
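
To make the Reduction and Retention points a bit more concrete, here is a rough Python sketch of what a routine clean-up job might do. Everything in it is an assumption for illustration: the `LogEntry` layout, the function names `reduce_log` and `purge_expired`, and the minimum-count threshold are hypothetical, and the 60-day window is just the "say, 60 days" figure from above. The threshold is there because queries themselves can contain names or addresses, so dropping the IP address alone may not be enough.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


# Hypothetical record layout; real search log formats vary.
@dataclass(frozen=True)
class LogEntry:
    timestamp: datetime
    ip_address: str
    query: str


RETENTION = timedelta(days=60)  # assumed window, taken from "say, 60 days"
MIN_COUNT = 2                   # assumed threshold for keeping a query


def reduce_log(entries, min_count=MIN_COUNT):
    """Aggregate raw entries into query counts.

    IP addresses and timestamps are discarded, and queries seen fewer
    than min_count times are dropped, since a rare query may itself
    identify a person.
    """
    counts = Counter(e.query.strip().lower() for e in entries)
    return {q: n for q, n in counts.items() if n >= min_count}


def purge_expired(entries, now, retention=RETENTION):
    """Return only the raw entries still inside the retention window."""
    cutoff = now - retention
    return [e for e in entries if e.timestamp >= cutoff]


# Made-up sample data, for illustration only.
now = datetime(2006, 9, 1)
log = [
    LogEntry(datetime(2006, 8, 30), "10.0.0.1", "Privacy Policy"),
    LogEntry(datetime(2006, 8, 31), "10.0.0.2", "privacy policy"),
    LogEntry(datetime(2006, 8, 31), "10.0.0.1", "jane doe 12 elm street"),
    LogEntry(datetime(2006, 5, 1), "10.0.0.3", "old query"),
]

aggregated = reduce_log(log)        # {'privacy policy': 2}
retained = purge_expired(log, now)  # the three August entries
print(aggregated)
print(len(retained))
```

The aggregated counts still support most of the analysis we care about (what people search for, and how often), while the raw entries that tie a query to a visitor age out on a schedule.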

/rich

Comments

You might find the EFF's white paper on Best Practices for Online Service Providers a useful reference, at the very least with regard to information that can explicitly identify individuals.


Netflix's privacy policy covers search logs as part of "site activity": http://www.netflix.com/PrivacyPolicy

Netflix does not disclose personal information, but may disclose analysis of aggregates of that information. Even internally, we need a specific reason to look at one subscriber's data, like a customer support request.

There is specific legislation about privacy and video rentals, the "Video Privacy Protection Act of 1988", passed as a reaction to the publication of Judge Robert Bork's video rental history. Perhaps that is one model for search record privacy.

Hmm, looking at the law as a non-lawyer, it might apply to a wide variety of internet video delivery, like YouTube, internet courses, ...

Thanks guys! These look quite interesting.
