Sample Chapter
(Note: This is an early draft of our book's first chapter.)
Chapter 1: Why Search Analytics? An Empirical Case1
Necessity, they say, is the mother of invention. This is the story of how one major university adapted when its enterprise search engine couldn't grow as fast as the university's Web presence. First we improvised a "best bets" solution-and we discovered that it provided more benefit than we ever dreamed possible. Then, turning the scientific method on its head, we did some research to learn why search analysis and the "best bets" approach yielded such great results.
An Enterprise Search Experience: Bliss, Then Pain
Think back to the mid 90s: Digital Equipment Corporation had launched a new search engine, AltaVista. The new search kid on the block raised the bar for search performance. For a while, AltaVista enjoyed the buzz and admiration that Google enjoys today. Michigan State University was the first institution of higher learning to beta test the new AltaVista enterprise search product in 1996. We acquired the product as our campus-wide Web search engine, installed AltaVista on a Digital Equipment Corporation server that was powerful for its time, and then let the spider loose crawling our Webspace. Soon we launched our comprehensive local index of the msu.edu domain.
Our local AltaVista engine hummed along merrily. We provided users with a Web search service that was efficient and comprehensive. Both our end users and our content providers were pleased with the search experience. We found that our local AltaVista covered at least twice as many pages in the msu.edu Webspace as did the global service. And we found that our local index was far more up-to-date.
We were happy.
Bliss Turns to Frustration
But something terrible slowly unfolded as our Webspace grew. Over the course of just a few years, we grew from 100,000 Web pages2 online to over a million. Alas, AltaVista just couldn't keep up. Even though we were running our own server and managing our own index, we couldn't seem to keep the search service relevant and current.
We began to hear complaints from end users and content providers alike:
- Why can't I find the Web site for the Human Resources department?
- Why can't I find how to check my grades?
- Why is this horribly out-of-date page showing instead of the current one?
- Why does the search engine point to pages that are no longer online?
We implored the vendor to address our performance concerns, but to no avail. We compared notes with other institutions and found they had similar complaints. Some had devised partial workarounds: One university rebuilt its entire index weekly-a time-consuming and resource-intensive process. Another university simply limited the scope of the spider's grasp to 100,000 or so official pages. That solved the problem of scale, but it meant that a million other pages disappeared from their index.
Eventually the AltaVista product and its parent company went through several corporate marriages and splits. Not surprisingly, given our experience, the AltaVista enterprise search was decommissioned as a product. Today, all that's left is the altavista.com search site, now maintained on life support by Yahoo.
We had a problem. We'd invested a lot of money to obtain the AltaVista enterprise search product, and it was hard to make a case for buying something else. Meanwhile, the users and content providers were still complaining. We knew we had to do something.
Stumbling into "Best Bets"
It just stands to reason: If you go to a university Web site and search for, say, "registrar," then the home page for the Registrar's office should be at the top of the hit list, right? But the official page might appear as the 10th item-or worse, the Registrar link might not show in the first page of results at all. Customers complained more and more often about AltaVista's growing inability to put the "right" page at the top of the hit list.
So we asked the vendor to provide a simple solution: Give us a knob to turn to affect the ranking algorithm. All search engines use an algorithm, or mathematical formula, to decide which pages rank higher than others. Ranking algorithms vary from tool to tool; in fact, Google stormed the search industry by devising an algorithm that favors pages that have the good fortune of being popularly linked.
Our request was simple: For a given URL, let me specify a "fudge factor" for the ranking formula, telling it in effect "If this URL appears in the result set, artificially inflate its relevance by this amount."
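The knob we had in mind might look something like the following minimal Python sketch. AltaVista never offered anything like it; the function name `apply_boosts`, the score scale, and every sample URL except www.reg.msu.edu are assumptions made up for illustration.

```python
def apply_boosts(results, boosts):
    """Re-rank (url, score) pairs after adding a per-URL fudge factor.

    results: list of (url, relevance_score) pairs from the engine.
    boosts:  dict mapping a URL to the amount added to its score.
    URLs without an entry in boosts keep their original score.
    """
    adjusted = [(url, score + boosts.get(url, 0.0)) for url, score in results]
    # Highest adjusted score first; sorted() is stable, so ties keep engine order.
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)


# Hypothetical engine output: the official page is buried in third place.
results = [
    ("http://example.msu.edu/old-registration-memo", 0.9),
    ("http://example.msu.edu/student-blog", 0.8),
    ("http://www.reg.msu.edu", 0.5),
]
boosts = {"http://www.reg.msu.edu": 1.0}
ranked = apply_boosts(results, boosts)
```

With a boost large enough, the official Registrar page moves to the top of `ranked` while everything else keeps its relative order.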
When the vendor didn't deliver, we decided to take matters into our own hands. Why not intercept each search request before feeding it to AltaVista and, for those sites that lots of people complained were hard to find, simply display the link ourselves?
So we set out to implement this workaround. I hired a bright computer science student, a rare breed who not only was an outstanding programmer, but also possessed database and Web services understanding.
Development began.
We chose Microsoft SQL Server as our back-end database. In a nutshell, our Best Bets tool, which we dubbed "MSU Keywords," works like this: For any given URL, the database has a table entry that lists one or more keywords. Thus, for the Registrar's site, http://www.reg.msu.edu would have an entry such as:
- Registrar
- Registrar's Office
- Office of the Registrar
- register
Every time someone does a search from the university home page, a simple script interrogates the database. If matching keywords are found, we display the URL-or URLs in the case of multiple matches-in the form of a hit list. If we don't find a match in our database, we just let the search engine do its best.
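The lookup just described can be sketched in a few lines of Python. This is not the production MSU Keywords code: it uses an in-memory SQLite database as a stand-in for our SQL back end, and the table name `best_bets`, the exact-match rule after normalization, and the `fallback_engine` function are all assumptions for illustration.

```python
import sqlite3


def normalize(text):
    """Lowercase and collapse whitespace so ' REGISTRAR ' matches 'registrar'."""
    return " ".join(text.lower().split())


def build_db(entries):
    """Create an in-memory keyword table from (keyword, url) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE best_bets (keyword TEXT, url TEXT)")
    conn.executemany(
        "INSERT INTO best_bets VALUES (?, ?)",
        [(normalize(keyword), url) for keyword, url in entries],
    )
    return conn


def search(conn, query, engine):
    """Return Best Bets URLs for the query, or fall back to the engine."""
    rows = conn.execute(
        "SELECT DISTINCT url FROM best_bets WHERE keyword = ? ORDER BY url",
        (normalize(query),),
    ).fetchall()
    if rows:
        return [url for (url,) in rows]
    return engine(query)


def fallback_engine(query):
    # Stand-in for the call to the real search engine.
    return ["engine results for " + query]


conn = build_db([
    ("Registrar", "http://www.reg.msu.edu"),
    ("Registrar's Office", "http://www.reg.msu.edu"),
    ("Office of the Registrar", "http://www.reg.msu.edu"),
    ("register", "http://www.reg.msu.edu"),
])
hits = search(conn, "  REGISTRAR ", fallback_engine)
misses = search(conn, "parking permits", fallback_engine)
```

Here `hits` holds the Registrar URL from the keyword table, while `misses` falls through to whatever the engine returns.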
Once "MSU Keywords" was in production, an amazing thing happened: Complaints about not being able to find something in the MSU Webspace declined. Some people thought AltaVista had been upgraded. (What irony: The software that had failed us got credit for our good work!)
Even content providers began to notice that their pages were at or near the top of the hit list. A few even sent kudos.
Slouching into Search Analysis
As we built MSU Keywords, we needed an intelligent way to choose which keywords to put into our database. We had a list of complaints from customers and content providers alike, so we tackled those items first. For instance, we had a lot of complaints about finding the Registrar's site, so that was a high priority to include in our Best Bets database.
We could just follow up on people's complaints as they came in, but we wanted to be more proactive. One idea was to go through the university organizational chart and try to exhaustively include any Web starting points for university departments into our Best Bets database. But that would take a long time, and it would be driven by the university's view of itself, not by what customers look for the most.
Surely there was a better way. Since we were trying to respond to users' needs, it made sense to make it a priority to add the things people searched for the most into our Best Bets database. We wrote a simple Perl script that combed through our search server's Web logs and totaled how often each keyword was sought. This gave us a report detailing what people search for, with the most popular searches at the top of the list.3

Figure 1-1: Marked-up search report.
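A tally of this kind takes very little code. Our original script was written in Perl; the sketch below is a rough Python equivalent, not a port. It assumes each log line contains a request URL whose query string carries the search phrase in a parameter named `q`; the log format, the parameter name, and the sample lines are all hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def count_queries(log_lines, param="q"):
    """Tally normalized search phrases found in request URLs."""
    counts = Counter()
    for line in log_lines:
        for token in line.split():
            if "?" not in token:
                continue
            # parse_qs decodes '+' and '%20' into spaces for us.
            values = parse_qs(urlsplit(token).query).get(param)
            if values:
                phrase = " ".join(values[0].lower().split())
                if phrase:
                    counts[phrase] += 1
                break
    return counts


sample_log = [
    '192.0.2.1 - - [01/Mar/2006:10:00:00] "GET /search?q=campus+map HTTP/1.0" 200',
    '192.0.2.2 - - [01/Mar/2006:10:00:05] "GET /search?q=Campus%20Map HTTP/1.0" 200',
    '192.0.2.3 - - [01/Mar/2006:10:00:09] "GET /search?q=study+abroad HTTP/1.0" 200',
    '192.0.2.4 - - [01/Mar/2006:10:00:12] "GET /index.html HTTP/1.0" 200',
]
# most_common() lists phrases from most to least popular.
report = count_queries(sample_log).most_common()
```

Lowercasing and collapsing whitespace means "campus map" and "Campus Map" land in the same bucket, which matters when you are counting popularity.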
Eyeballing the rank-ordered list, it appeared we could add a relatively short list of keywords into our database, and do a lot of good. Here's a sample report:

Figure 1-2: Common search queries from Michigan State University's search logs.
We started at the top of the list and added entries to our "Best Bets" database from the top down. After we had added only a few hundred entries in our database, customer satisfaction seemed to improve greatly. We received far fewer complaints from customers and from campus content providers as well. But why? We'll explore that question later in the chapter. Without fully understanding all of the reasons, we realized that we should get serious about search analytics.
For our initial log analysis, we used a simple Perl script to aggregate log information and produce reports. We ran these reports from time to time, watching for new popular searches. Eventually we decided we needed something more dynamic-that is, something closer to real-time analysis. So we built a search logging and reporting database tool. For every search performed from the university home page, we increment a counter in the database for the keyword or phrase sought. Ergo, now we could run a report anytime, covering whatever time interval we wished to explore.
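One way to build such a counter is sketched below in Python with an in-memory SQLite database. It is an illustration under stated assumptions, not our actual tool: the table name `search_counts`, the choice of one counter per phrase per day, and the ISO date strings are all invented for the example. Keeping a counter per day is what lets a report cover an arbitrary interval.

```python
import sqlite3


def open_log_db():
    """Create the counter table: one row per (day, phrase)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE search_counts ("
        " day TEXT NOT NULL, phrase TEXT NOT NULL, hits INTEGER NOT NULL,"
        " PRIMARY KEY (day, phrase))"
    )
    return conn


def record_search(conn, day, phrase):
    """Increment the counter for this phrase on this day (ISO date string)."""
    phrase = " ".join(phrase.lower().split())
    cur = conn.execute(
        "UPDATE search_counts SET hits = hits + 1 WHERE day = ? AND phrase = ?",
        (day, phrase),
    )
    if cur.rowcount == 0:
        # First time this phrase was seen on this day.
        conn.execute(
            "INSERT INTO search_counts (day, phrase, hits) VALUES (?, ?, 1)",
            (day, phrase),
        )


def report(conn, start_day, end_day, limit=10):
    """Most popular phrases between two ISO dates, inclusive."""
    return conn.execute(
        "SELECT phrase, SUM(hits) AS total FROM search_counts"
        " WHERE day BETWEEN ? AND ?"
        " GROUP BY phrase ORDER BY total DESC, phrase LIMIT ?",
        (start_day, end_day, limit),
    ).fetchall()


conn = open_log_db()
for day, phrase in [
    ("2006-03-01", "campus map"),
    ("2006-03-01", "Campus Map"),
    ("2006-03-02", "study abroad"),
    ("2006-03-09", "campus map"),
]:
    record_search(conn, day, phrase)
first_week = report(conn, "2006-03-01", "2006-03-07")
whole_month = report(conn, "2006-03-01", "2006-03-31")
```

The same table answers both questions: `first_week` counts only the first seven days, while `whole_month` picks up the later search as well.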
Armed with a tool we could run daily over a simple Web interface, we got into the habit of looking at the logs frequently. If you spend enough time in the search logs, you can learn a lot.
What We Learned
Search analytics reveals how your customers really behave-and, sometimes, it's a surprise.
Take a look at MSU's sample report in Figure 1-2 above. Searches appear on the list in the order of popularity. This report reflects a 31-day period in the middle of a semester. The most popular searches vary throughout the year, but some searches remain popular all year long. For instance, "campus map" is perennially popular. This is a surprise because the university home page at www.msu.edu not only has a direct link to "Maps," it even offers an iconographic image of a map. (Or maybe it's not a surprise, as we'll discuss later.)
Notice that "campus map" is the first entry on the list, and "map" is the second. In most cases, people entering either query seek the same thing - though "map" is more ambiguous.
The phrase "study abroad" is high on the list because MSU is a leading American university that offers overseas study opportunities. Searches for "study abroad" may come from prospective students, current students, parents, or high school guidance counselors.4 "Computer Store" ranks highly because the university requires all students to bring a computer to campus; "bookstore" is there for obvious reasons (as is the plural "bookstores").
Some of the popular terms are local to the institution: "Spartan Trak" is the campus-branded link to the MonsterTrak job searching database; "CATA" is the campus and area bus system; "Olin" is the name of the student health center.
But let's return to our little mystery. The home page has a very obvious link to the campus map. So why on Earth would people head for the search box when there's a clearly-labeled link for maps right there on the home page?
One answer is simple human nature: It's a common convention for Webmasters to place the search box in the upper right corner of the home page, where the customer's eye is quickly drawn. Some site visitors won't even skim the home page looking for the desired link.
And then there's the "Google Effect." Google has trained the masses that it's often more efficient to search than to browse. You want the home page for the Jay Leno show, or the campus map for Oxford University, or Lyle Lovett's discography? Just type in what you're looking for and it'll be very high on the hit list. Google has set expectations very high for those who use the humble search box.
In other cases, people may have landed somewhere else on our university Web site-somewhere other than the home page. Maybe they followed a "deep link" into the site. Or maybe they used Google or Yahoo or MSN to do a search and landed on a page deep within the site. Maybe then they said "This looks like a cool place-I want to see a campus map." Since the search box appears on every page on the site, you can search on any topic from any page, whether the page you're looking at relates to the topic or not. Thus any deeper page won't offer an obvious link to the campus map-but every page on the site offers the visitor a search box5, always in the upper right corner-a window to all the content on the site.
People Search for Starting Points
As the Web revolution unfolded, and search engines such as AltaVista came on the scene, along with browsing catalogs such as Yahoo, some of us6 believed that people would use browsing catalogs like Yahoo to find major starting points: the home page for CBS, or Berkeley, or ESPN. We believed that people would browse to find the starting points for well-known enterprises or brand names. At the same time, we thought if people sought something specific-or even something obscure-people would use search engines.
We were wrong. It turns out that a lot of people use search engines to find starting points.
Thanks again largely to Google, people often search for starting points rather than browsing through complicated classification schemes devised by others. If I want the home page for the Post Hotel in Lake Louise, then I'm going to search for (you guessed it) post hotel lake louise
And guess what? Google delivers what you're looking for as the very first hit. In recent years, Yahoo and MSN re-engineered their Web search technology in order to confront this reality. Don't give me articles about the Post Hotel or personal opinions about it; give me the home page for the hotel (and give me the other stuff later in the hit list).
So it goes when people come to your Web site and do a search. When people search for "human resources" from the Michigan State home page, they want the home page for the Human Resources department7. If your search engine doesn't give "the" right matches for popular searches, you may need to use Best Bets or other re-engineering based on what you learn from search analytics.
People Don't Employ Complicated Searches
Note that customers use one- or two-word searches most of the time. This doesn't give a search engine much to chew on. Nonetheless, a one-word search can be effective-for instance, when a specific, unique term such as "AOP" is sought. On the other hand, a one-word search for a common English word may be too ambiguous. For instance, when someone searches for just the word "maps," we don't know whether they want the campus map, or the university's map library.
The Popular Searches Are Really Popular
Note that the most popular search, for "campus map," occurs over 5,800 times during our sample interval. People sought the twentieth most popular item, "human resources," about 1,800 times. If you assume that the top two searches, for "campus map" and for "map," usually pertain to the same thing, that adds up to over 10,000 searches. This implies that customers seek the most popular item over five times as often as the item ranked twentieth.
If you were to graph the distribution of all searches over a sample interval, you would discover a simple fact that seems to apply to all Web sites, almost as a law of nature:
A small number of search phrases accounts for most of the searches at your Web site
It turns out that science backs up this observation. Analyze the search patterns at any university, or any Fortune 500 company, or any other enterprise-and you'll find that a few searches account for most of the searching. We'll explore the science that explains this pattern at the end of this chapter.
Some Searches Are Seasonal
A few weeks before the college football season begins, it's not surprising to see "football tickets" as a popular search. Towards the end of a semester, you'll find a lot of searches for "final exams." Any organization will notice seasonal patterns in searching.
Some Searches Are Mysterious
One search term that often appeared in our search logs mystified us: a search for the word "address." For a long time we assumed that a lot of customers, such as prospective students, wanted to find the postal address for the university. Alas, there is no single postal address; if you want the Office of Admissions, then you need to mail your letter to a specific room and building. (And in fact, the Office of Admissions doesn't particularly want to receive paper mail: it's a great expense to respond to thousands of postal inquiries annually, and they'd much prefer that prospective students visit their Web site.)
One day we learned that there was at least one other popular use of "address." MSU provides campus telephone operator service 24 x 7. These operators field a large number of information requests on a wide range of topics. One day an operator told us that a lot of people call them asking for the address of the university, because they want to type that address into Mapquest so they can get detailed driving directions to the university.
Duh! We should have thought of that long ago. Once we understood this customer need, we added entries to MSU Keywords that would help people get the driving directions they need. We also embarked upon a project to provide driving directions and regional maps and featured the maps prominently. (See "Tune Your Content" section below.)
What We Did-and What We Do
We regularly analyze search patterns by asking the following questions:
- Does every popular keyword have an entry?
- Are people searching for new or different things now?
- Does the database still point to the "right" URL for the most popular searches? (We also built an automated link checker. Every day, it checks every entry in the MSU Keywords database, and produces a report for those URLs that are not accessible.)
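The first question on that list is easy to automate. The Python sketch below compares a popularity report against the keywords already in the database and returns the gaps. The function name `missing_best_bets` is hypothetical, and the counts in the example are illustrative figures rather than numbers from our logs.

```python
def missing_best_bets(popular, keywords, top_n=500):
    """Return popular phrases (most popular first) that have no Best Bets entry.

    popular:  list of (phrase, count) pairs sorted by descending count.
    keywords: iterable of phrases already in the Best Bets database.
    """
    known = {" ".join(keyword.lower().split()) for keyword in keywords}
    return [
        (phrase, count)
        for phrase, count in popular[:top_n]
        if " ".join(phrase.lower().split()) not in known
    ]


# Illustrative counts only.
popular = [("campus map", 5800), ("map", 4300), ("football tickets", 900)]
gaps = missing_best_bets(popular, ["Campus Map", "map"])
```

In this example `gaps` contains only "football tickets", the one popular phrase with no entry yet. Running the same check after each new report also surfaces searches that have only recently become popular.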
Now search log analysis is part of our regular routine. It's as if everyone who enters a search is participating in a real-time survey.
For our purposes, we're mostly interested in the top 500 to 1,000 unique searches for our "best bets" database. That gives our users a great deal of benefit without an undue amount of manual labor. A few years ago, we switched to the free university search from Google, which provides good coverage of many of the millions of URLs in MSU Webspace.
There's a lot of buzz lately about the "Long Tail"-how to mine your logs for the seldom-sought, in order to achieve mass customization. For the purposes of our Best Bets effort, we're most concerned with the "Short Head." We'll explain the math of the Zipf curve in detail in a later chapter, and explore how best to exploit the entire curve.
You could say that that's the long and the short of it.
Analyze Often
We regularly analyze search logs-at least once a week, if not once a day. We look for:
- Searches that are popular but return no results.
- Popular searches that don't yield the best results.
- Changes in what people search for.
Armed with this information, we take action as necessary.
Our Best Bets service also checks every link in the database daily to flag entries that point to "bad" URLs. If we discover a broken link, we repair it by finding the URL of the new home of the content. Sometimes the content is no longer online-perhaps a site has been retired.
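A daily link check along these lines can be sketched with Python's standard library. This is an illustration, not our production checker: it assumes a HEAD request is enough to test a URL and that any status of 400 or above, or no response at all, counts as "bad." The function names and the example.msu.edu URLs are hypothetical. The status lookup is passed in as a parameter so the report logic can be tried without touching the network.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it cannot be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return None


def broken_links(entries, fetch_status=http_status):
    """Return (keyword, url, status) for entries whose URL looks bad."""
    report = []
    for keyword, url in entries:
        status = fetch_status(url)
        if status is None or status >= 400:
            report.append((keyword, url, status))
    return report


# Canned statuses stand in for real HTTP requests in this example.
fake_statuses = {
    "http://www.reg.msu.edu": 200,
    "http://example.msu.edu/retired": 404,
}
entries = [
    ("registrar", "http://www.reg.msu.edu"),
    ("old site", "http://example.msu.edu/retired"),
    ("gone", "http://example.msu.edu/unreachable"),
]
bad = broken_links(entries, fetch_status=fake_statuses.get)
```

Calling `broken_links(entries)` without the second argument would issue real requests through `http_status`. Either way, a person still has to decide whether to repoint a flagged entry or retire it.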
Keep Your "Best Bets" Database Current
When we find that a popular search isn't yielding results, or yields the wrong results, we add or update entries in the Best Bets database. We learn of the need to update them from the following analytical tools:
- Search analytics
- Our daily link checker, which lets us know of any items in our database that are linked to non-functioning URLs.
- Comments from customers and content providers
A search engine like Google or Verity has a robot that constantly crawls your site, looking for new content and dead links. If you use a Best Bets database, you'll need to keep it up to date. Fortunately, if you keep the database small-that is, limited to the popular destinations-this burden is relatively minor.
Tune Your Site Navigation
A large number of searches for a given topic may be a sign that it's hard for customers to find the item by browsing. You may want to see if these popular items need more prominent placement in the site's browsing view.
But don't take search popularity alone as a sign of failure; if a lot of people search for "campus map" and you deliver what they need, pat yourself on the back.
Tune Your Content
Sometimes we find that a lot of people search for something, but there simply isn't any content online that matches what they seek. Then it's time to talk to the Webmaster or the Web content team about preparing and posting the missing content. In a large enterprise, such as a university or a corporation, this may take a while: you've got to find the person who "owns" the content area, and you have to persuade them to add the content that your log analysis shows people are seeking.
Or maybe you have content, but perhaps it isn't in the right form. If a lot of people remotely are searching for a campus map, maybe you should be sure to offer a readable map as a PDF that will print on a single sheet of 8.5 x 11 paper. (You'd be surprised how many universities offer online maps that are cut-up copies of large fold-out maps-totally useless to the remote would-be visitor.) If a lot of people locally want to know where they can pick up a paper map, maybe you need a page giving just that information.
Tune Your Print Materials
Search log analysis can reveal opportunities to improve printed materials. Analysis of our Web search logs revealed that "campus map" was a popular search. We realized that in many cases people wanted to find out how to order a printed campus map, so we added a link to our Best Bets.
We took a look at the Web-based maps at other universities, and discovered that major universities make major mistakes when it comes to campus maps. Here are some of those mistakes:
- Offering only a detailed map that shows every building. A large percentage of visitors want to find how to get to a handful of places, such as the performing arts center or athletic venues. They need regional and local area maps.
- Providing artistic maps, such as 3-D renderings replete with trees. These work for people who already know the campus, but not for the masses who are new to the place.
- Failing to offer both online-friendly maps and printable PDFs.
Search log analysis helped us make the case to the people who maintain the official printed campus maps, and partner with them to improve the online offerings. We were even able to integrate the look of the printed map with the online offerings. With search log analysis, it was easy to show that an information need as basic as showing major travel routes to the university was as important as helping people find the Old Horticulture building.
Enterprises still publish an abundance of print materials-brochures, directories, order forms, catalogs, etc. Printed materials may drive people to your home page, where they'll search for the topic of interest. If your search analysis reveals that a lot of people are searching for directions to the clinical center or information on the Emerald Ash Borer, you can respond using Best Bets links or other methods.
Listen to Your Customers through Other Channels
Search analysis is a powerful tool, but there are other ways to communicate with your customers. A very simple approach that we use is to offer a Contact Us link in the footer of every page on our Web site, including search results pages. We soon learned that people would click on the Contact Us link to ask basic questions such as "How do I apply to your school?" so we added an FAQ-a real Frequently-Asked Questions document, based on the most-frequently asked questions. This lets us off the hook for questions we've covered, so we can spend time on unique questions, comments, or suggestions.
Not only can end users tell us about problem searches, so can campus content providers. There is nothing more satisfying than receiving notice of a bad search, doing a quick fix in the Best Bets database, and telling them the problem was resolved within minutes.
And there is nothing more satisfying than watching the customer comments shift from complaints to kudos.
Why Search Analytics Works So Well
A good scientist forms a hypothesis, and then performs experiments to confirm or reject its validity. We had performed the experiment without the hypothesis. First we built our Best Bets database, and we kept seeing evidence that it was yielding real benefits. We'd stumbled onto something good. So then I decided to go back and figure out why.
Some searching for relevant literature led me to a discussion of the Zipf Distribution, named after a Harvard professor who counted the distribution of words in James Joyce's Ulysses back in the 1940s8. Not surprisingly, he found that common words such as "the" appear in the text far more frequently than less-common words such as "neatsleather." In fact a small number of unique words accounts for a large percentage of the total word count for the entire text (which happens to be 260,430).
Zipf found that when he plotted the frequency distribution of the unique words in Ulysses, a very steep curve applied. The curve looks something like this:

Figure 1-3: The Zipf Curve.
Actually, being a scientist, Zipf plotted the data on a logarithmic scale, and found that a straight line applied. Zipf studied other phenomena and discovered many parallel cases. For instance, a small number of cities within a nation accounts for a large fraction of the population of the entire country. Think about it: if you add up the population of New York, Los Angeles, Chicago, Atlanta, Las Vegas, and Dallas, you've got a pretty good fraction of the population of the entire United States.
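You can test the straight-line claim on your own data. In its simplest form, Zipf's law says that an item's frequency is roughly proportional to 1 divided by its rank, which on log-log axes is a line with a slope near -1. The Python sketch below fits that slope by ordinary least squares. The function name `loglog_slope` is ours, and the sample data is synthetic rather than drawn from Ulysses or from our logs.

```python
import math


def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    frequencies must hold at least two positive counts sorted from most
    to least common; rank 1 is the most common item.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Synthetic counts that follow frequency = 6000 / rank exactly.
ideal = [6000 / rank for rank in range(1, 101)]
slope = loglog_slope(ideal)
```

For this idealized data the fitted slope comes out at -1, up to floating-point error. Real word counts or search logs will be noisier, but a slope in that neighborhood suggests the same pattern.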
What Zipf teaches is that it makes a whole lot of sense to start at the top of the list of popular items, because that gives you the greatest bang for the buck. Conversely, it would make little sense to start with the less popular terms. You've heard this concept expressed as "the law of diminishing returns." Or perhaps you speak of the 80/20 rule - that 20% of the effort expended provides 80% of the benefit. (This rule was named after Italian economist Vilfredo Pareto who sought to explain income distributions in his country.)
In the 1940s, U.S. scholar Dr. Joseph Juran described the phenomenon as the "vital few and trivial many."
Having learned that a scientific underpinning might apply to what we were observing, I plotted the frequency distribution of searches at MSU. My eyes grew wide when I saw this curve:

Figure 1-4: The Zipf Curve applied to search queries.
When I saw this curve, I felt as if I were a scientist who discovered penicillin or some other breakthrough. Of course, I hadn't discovered anything; I'd re-discovered something that others had understood for years.
To really understand the power of the Zipf Curve, consider this: in one sample of 200,000 searches at Michigan State University, only 500 unique searches account for 40% of all searches performed. Only 1000 unique searches account for 50% of all searches people execute.
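Coverage figures like these are simple to compute once you have a count for each unique search. The Python sketch below shows the calculation. The function name `coverage` is hypothetical, and the data is a synthetic Zipf-like distribution rather than MSU's logs, so its percentages will not match the figures quoted above.

```python
def coverage(counts, top_n):
    """Fraction of all searches accounted for by the top_n most popular phrases."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    if total == 0:
        return 0.0
    return sum(ordered[:top_n]) / total


# Synthetic data: 10,000 unique phrases, count roughly proportional to 1/rank.
synthetic = [100000 // rank for rank in range(1, 10001)]
share_500 = coverage(synthetic, 500)
share_1000 = coverage(synthetic, 1000)
```

Running `coverage` over your own counts for a range of `top_n` values shows where each additional Best Bets entry stops paying for itself.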
What does this tell us? A small Best Bets database can cover a large share of the searches your customers perform. This means you can expend a small amount of effort building your Best Bets database, and derive a huge amount of benefit.
Conversely - and this is where "diminishing returns" comes into play - you wouldn't want to build a Best Bets database of more than 1000-3000 entries. At the worst extreme, you'd include thousands of searches that were only tried once. At that point, you'd be replacing your search engine with a human being - yourself!
The happy moral of this story is that search analytics can do much more than help you understand what your customers seek. You can use what you learn to take action that yields important benefits for your enterprise and your customers. And the best news: a small amount of effort will yield huge benefits.
Conclusion
Thus ends our tale of one institution's happy foray into "search analytics." We've shown how you can understand your customers' needs better through this process - and how you can take steps to improve the customer experience.
Now it's time for you to make the business case for employing search analytics to improve your own Web presence. This means "selling" the concept to your colleagues and to your management. Chapter 2 will tell you how to make the business case for making search analytics a key part of your Web development process.
Notes
1: Note from Rich Wiggins: This chapter is the culmination of my experience as a Senior Information Technologist at Michigan State University.
2: By "Web pages" we mean unique URLs offering unique content. Even back in 1996 AltaVista indexed a number of "pages" that were dynamically generated out of a database.
3: You can run reports yourself using Michigan State University's search analytics database. Visit http://searchlogger.acns.msu.edu/info/
4: Or newspaper reporters, or universities wanting to compare their "Study Abroad" section with that of a leading competitor.
5: At least, every page on your site should offer a highly-visible search box. Usability guru Jakob Nielsen and others argue that, since up to half of visitors would rather search than browse, you might as well enable their lust for searching from every page on your site.
6: Including one of your authors, whose initials are RW.
7: Well, truth be told, they may be looking for academic programs for human resources professionals. Thus the Human Resources department, and the academic programs, both need to appear high on the hit list.
8: Well, of course, Professor Zipf didn’t count the words. His graduate students did.
Comments
This chapter isn't convincing to me. First, being better than AltaVista leaves a vast range of badness. AltaVista was never especially competitive and only had one or two hundred customers left at the end. Falling over at a million docs is embarrassing. Ultraseek was handling over four million docs in 1998. Any search engine that couldn't give good results for such unique terms as "CATA", "spartantrak", and "Olin" was seriously broken.
The quote about "the vital few and the trivial many" is a completely wrong point of view for search. If I'm doing a rare search, is it trivial? Of course not. It is my only search and it is vitally important to me. The whole point of full-text search is to gracefully handle the queries that are so rare that you can never test them.
Manually hacking search results is expensive to maintain. How do you know when the URLs change? How much effort (hours) did it take to write the software and enter those first 1000 entries and what does it cost (full-time equivalent employees) to maintain it? How does that compare to buying a better search engine?
Finally, I can't quite find the "center" of this chapter. Is it about MSU, about Zipf distributions, about some unstated problem, about this (manual) solution, or about what you do when you don't have money to buy good software?
I'm most worried about the "unstated problem" part. The chapter says it will explain what to do when your search engine can't scale. The answer to that is to get a non-broken search engine. The chapter really seems to be about a manual replacement for bad relevance (again, get a non-broken engine), but I think the book is about something else.
Who is going to buy this book? What problem do they have? How is this going to help them with that problem?
wunder
Posted by: Walter Underwood | June 12, 2006 06:34 PM
Hi Walter, thanks for the great feedback. I think our primary disagreement centers around programming a search engine to gracefully handle all queries great and small (i.e., frequent and infrequent) versus investing manual effort into improving performance for the really frequent queries. First, these two philosophies are by no means mutually exclusive. But let's face it, the performance of even strong tools like Ultraseek still can be enhanced by editorially-driven improvements like best bets, stuff that can't be handled automatically (even by a non-broken search engine). And if you're going to make those kinds of enhancements, clearly you'll apply them to the most frequently-occurring queries.
Rich has more direct experience managing best bet results, and I believe MSU has automated aspects of its maintenance (e.g., link checking). I'm quite certain that the effort costs less than purchasing, configuring, and implementing a new enterprise-class search engine. Perhaps more importantly, the project Rich described in the chapter reverse-engineered some important organizational change at MSU. Web and content managers in various departments around MSU are now directly involved in the creation and maintenance of best bets, thereby distributing much of the work. And perhaps most importantly, their literacy regarding information management is far ahead of what it was just a few years ago.
I don't want to over-interpret your critique, but let's avoid getting into an "automation versus manual effort" pickle; they're both important and necessary, and search analytics can help you with both.
As far as the message of the chapter, well... This is an early version, and I'm sure it will improve before publication. And it will be part of a broader context (i.e., between a preface, a second chapter, and described by a table of contents). So I hope those concerns will go away once we send you a copy of the final book. :-)
Thanks again Walter.
Posted by: Lou Rosenfeld | June 16, 2006 09:04 AM
Best Bets can be very effective, especially on slowly changing content, but I would never recommend them as the first step to take in improving search.
I'd like to see the chapter close with a summary of what steps they took and what they would have done now that they know better.
I know what I would suggest, but that is turning into a post which is self-contained, so it makes more sense to put that on my own blog. You can read that post at Good to Great Search.
Posted by: Walter Underwood | June 18, 2006 05:18 PM
So, I checked out the post on your blog and tried to comment, but it was rejected, because I wasn't registered. Odd. Even more odd, there wasn't a way to register - at least not one I could find. So, I'm posting the response here.
"Best bets are effective, but usually a last resort, not the first."
Last resort based on what? We're currently doing research on redesigning the search and search results interfaces for a very large pharma company. The search tool is going to run against a couple billion documents. Yes, that's Billion with a B.
One of the things we've found is that from the customer's perspective, Best Bets have very high value. They're not after a few good answers, they are after the "right" answer. And Best Bets help solve that issue for them. If the customer enters a particular address, they want to see a thumbnail map and address, not a "typical search result" for the address. That's their interpretation of a Best Bet.
If you mean to say "Don't put all your eggs in the Best Bets basket," then I'd agree. But to say they're a last resort, well, that's simply not good advice either.
"Get a better search engine."
Well, that's one way to do it. Or improve the implementation of your search engine. But if you can't plunk down another $1.5-$5M for a new search tool, and a 3-6 month implementation, then what? Don't throw the baby out with the bathwater.
Additionally, you commented on the book review's site that Best Bets were also bad because they require manual work. Isn't what you're suggesting on your blog also manual work? Or is there a way to automate that?
While I won't claim to be a search analytics expert, I have done my share of large implementations with Google, Verity, Autonomy, and a few others. The reality is that I've found great search requires both manual and automated efforts.
Posted by: Todd Warfel | August 4, 2006 03:07 PM
I thought this was pretty useful in that it did a good job of emphasizing the fundamental point about Zipf distributions.
(If you mention that Zipf did his chart logarithmically, you should include a logarithmic chart in the chapter. Also, is there any reason why readers might want to consider using logarithmic charts themselves? or did Zipf just do it because "he's a scientist" and they're wacky that way?)
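One practical answer to the logarithmic-chart question: on linear axes a Zipf curve hugs both axes and the long tail is unreadable, whereas on log-log axes an ideal Zipf distribution becomes a straight line, so departures from it are easy to spot. The sketch below illustrates this with synthetic data. The helper names and the figure of 10,000 searches for the top query are assumptions for the example, not data from the chapter.

```python
import math


def zipf_frequencies(top_count, n_ranks, exponent=1.0):
    """Ideal Zipf frequencies: the query at rank r occurs top_count / r**exponent times."""
    return [top_count / (rank ** exponent) for rank in range(1, n_ranks + 1)]


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) plotted against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Synthetic example: the top query is searched 10,000 times; 1,000 distinct queries.
freqs = zipf_frequencies(10_000, 1_000)

# On a linear scale the drop is steep: rank 10 gets a tenth of rank 1's volume.
ratio_rank1_to_rank10 = freqs[0] / freqs[9]

# On a log-log scale the same data fall on a line whose slope is minus the exponent.
slope = loglog_slope(freqs)
print(ratio_rank1_to_rank10, slope)
```

With a real query log, one would feed observed counts (sorted from most to least frequent) into `loglog_slope`; a slope near -1 with little scatter suggests the log follows the classic Zipf pattern.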
One negative is that this may not be the best choice for the first chapter -- maybe it should be somewhere else in the book. When I realized that the first chapter was about Altavista (snooze) at a university (snooze) beginning in the mid 90s (snooze) I did not feel extremely motivated to read on ... Maybe you should preface this story with a series of shorter, more current vignettes with some zip, like the AOL fiasco.
Posted by: Fred Zimmerman | September 1, 2006 11:42 AM
One of the problems with implementing best bets as the first option is that some initial analysis of the content needs to be done. Maybe it will work in an environment like a university campus, but would it work in a specialist library or research centre? I work in search in the UK financial services sector, which is very specialised. We have specific areas dealing in tax, assurance (a very large and profitable area since Sarbanes-Oxley) and consulting. In the tax area, a very specialist research subject where content can to a certain extent be always relevant (at least in UK law), you are dealing with the very leanest part of the long tail when it comes to search engines.
Traditional librarian/IA skills come into play in building a taxonomy that deals with the specialist language that comes with the subject, and that taxonomy then gets incorporated within the search itself. In this instance best bets come after the search algorithm and the taxonomy in classifying relevant results for any given search query. It is almost as if the users are treating their search engine as a specialist researcher who understands their terms of reference. The most effective way of doing this is not precooking queries by best bets in the first instance, but modelling that language in a form that a computer can understand, whether that be XML or just a hierarchical list in Excel.
On the other hand, for less specialist search queries, best bets will work well. This came out in the recent AOL fiasco, where they released the search data for a three-month period. When that data was analysed it did completely correspond with the Zipf distribution chart. It showed that there was a massive drop-off in clickthroughs after the first three results. Best bets can help a lot with general searches, and the effort is relatively low after analysing the metrics. It's essentially all about what search service you are trying to deliver from the requirements.
Best bets, taxonomies, thesauri, clustering etc are all just different tools that you can bring to the work.
I guess the caveat that I am trying to express is that the "best" in best bets may be misleading in terms of the real search for relevancy in any given environment.
Posted by: John Robey | October 3, 2006 05:12 PM
I think that there is a lot of good material here that would be quite useful to journeyman IAs.
After reading the chapter and the comments, I think that readers are interacting with the MSU example as a design recommendation rather than a case study--despite the title.
I think that the point is that search analytics for the MSU site led to a best bets solution, not that best bets will be the best solution for every site. For many sites, a best bets kind of system is absolutely the right approach; for other content and users, it would be less effective. The key to knowing what is best for your situation is understanding your audience, your content, and your site goals. Search analytics is an important tool in understanding your audience. *If* best bets is an appropriate tool, search analytics also gives you the information you need to build the tool.
As a draft chapter, I would strengthen the "this is a case study" aspect. I think that what you are trying to establish for your first chapter is that analytics have value in obtaining customer satisfaction, achieving site goals, etc. and, "See, here is an example of that."
If appropriate to the scope of your book, you could easily spend a chapter on best bets themselves as an IA design response, including when they are appropriate and when they are not.
Posted by: Dale Mead | November 2, 2006 01:15 PM
I agree with Fred Zimmerman's comment above, not necessarily that it's a negative, but I agree this may not be the best choice for an introductory chapter. If the book were more about Best Bets methods, this may be more appropriate for an introductory example.
I believe this would be more appropriate in a "Case Studies" section with other such examples (not necessarily just Best Bets) of how SA has been used to educate content owners, enhance the customer experience, shorten delivery of enhancements, strengthen what is now a cross-functional Web team (especially with Intranets), etc.
Perhaps the book could contain an appendix on Case Studies, and chapters within the meat of the book could refer to these cases for supporting examples of the text. I find it more effective to be offered a short paragraph or two of a case study, supporting a point, with a reference to the full study should I care to read more later.
Anyway, I enjoyed the reading as I am in the initial research process of identifying the most effective Search tools for a new Web site. The last book I read on Web Analytics was Jim Sterne's Web Metrics: Proven Methods for Measuring Web Site Success. Although published in 2002, I still found some good information and strong ideas for measuring customer behavior on our Web sites.
Please let me know when the book is published, and I will be one of your first guinea pigs ... :)
Posted by: Wynne Hunkler | February 13, 2007 10:33 AM
Sachi, it's a growing field, and I think demand will only go up. Welcome!
Posted by: Lou Rosenfeld | August 14, 2007 07:09 AM
Onsite search is, more often than not, overlooked by site owners and, to a larger extent, by web analytics in general. I am a true believer that this is an area that can make a difference for a site.
Is this book published?
Posted by: Peter He | October 22, 2007 03:31 PM
Not yet, but now that Google Analytics finally supports internal SA, we'd better hurry up!
Posted by: Louis Rosenfeld | October 22, 2007 03:47 PM
Just a quick note on this -- if you're not doing a bibliography at the end, I think you should give links on the scientific sources in the endnotes. The best I could do was this, so I'm curious if you can find better sources:
Zipf's Law, described in a paper by Viswanath Poosala (PostScript: www.math.tau.ac.il/~matias/courses/papers/zipf.ps, also at citeseer.ist.psu.edu/116813.html)
"Pareto principle" as defined and named for Vilfredo Pareto by Dr. J.M. Juran: www.juran.com/who_sub_our_founder.asp.
Posted by: Avi Rappoport | November 29, 2007 08:16 PM
When will the book be published?
Posted by: tang | February 10, 2008 06:59 PM
Gosh, a year ago would have been nice! :-) But we're optimistic about this spring.
Posted by: Lou Rosenfeld | February 10, 2008 11:24 PM