One of the links from the blog entry "20 Ways Search Engines May Rerank Results," just below this post, points to a paper Dr. E. Garcia presented at the Search Engine Strategies conference in 2005. In fairly accessible language, he describes a study of how Google and other search engines select and rank web pages. He approaches the matter from the point of view of consultants who want to increase click-throughs to a client's web page, but the methods of comparing and selecting web pages are fascinating in their own right. Somewhat like the natural language search engines in Lexis and Westlaw, the algorithms snip text into short passages, filter out stop words, and account for repeated words. The search engines create a window of about 100 characters, or about 15 terms, to compare, and work through the document building these little windows of terms to judge relevance. This is the Snippet Optimization Process (SOP).
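To make the windowing idea a little more concrete, here is a minimal Python sketch of the general technique as described above: drop stop words, slide a window of about 15 terms through the document, and keep the window that best matches the query. This is only an illustration under stated assumptions, not Dr. Garcia's SOP or any search engine's actual algorithm. The stop-word list, the function names (tokenize, best_snippet), the one-term step, and the scoring rule (distinct query terms matched first, total hits second, so that repeating one word cannot dominate) are all hypothetical choices made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "or", "with"}


def tokenize(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def best_snippet(text, query, size=15, step=1):
    """Slide a window of `size` terms through `text` and return the best one.

    Assumed scoring: (number of distinct query terms in the window,
    total query-term hits). Counting distinct terms first keeps a window
    that merely repeats one word from outscoring a window that covers
    more of the query. Returns (snippet_text, score); ("", (0, 0)) if
    no window contains any query term.
    """
    terms = tokenize(text)
    query_terms = set(tokenize(query))
    best, best_score = [], (0, 0)
    # A document shorter than one window is treated as a single window.
    last_start = max(len(terms) - size, 0)
    for start in range(0, last_start + 1, step):
        window = terms[start:start + size]
        hits = Counter(w for w in window if w in query_terms)
        score = (len(hits), sum(hits.values()))
        if score > best_score:  # strict ">" keeps the earliest best window
            best, best_score = window, score
    return " ".join(best), best_score


if __name__ == "__main__":
    doc = ("The quick brown fox jumps over the lazy dog. "
           "Search engines rank web pages by relevance to a query.")
    print(best_snippet(doc, "search engines relevance", size=5))
```

Run on the sample text with a 5-term window, the sketch picks the first window containing both "search" and "engines", since no 5-term window can also reach "relevance". With the default 15-term window, the whole (short) sample fits in one window and all three query terms match.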
Dr. Garcia notes:
Understanding SOP permits the optimizer to conduct some tests just in case he or she suspects that a given search engine is using snippet-based filtering techniques. The optimizer or copywriter not only may be able to avoid unexpected surprises but can actually identify specific portions of text and optimize these according to the local contextual information or surrounding text.
From the development side, understanding SOP enables developers and marketers to design hierarchical clustering interfaces for content categorization.
Of course, librarians may want search engines to select their web pages or blogs more often, but more likely our aim is to understand how the search engine works so we can write better searches. The information applies just as well to those library goals. Read the link!