Finding Related Web Pages

Google is the only major search engine that offers a "similar pages" feature, but few people use it. Launched in September 1999 as GoogleScout (to scout means to explore or investigate), the feature shows around 30 web pages related to a search result.

For example, to find sites related to Google Reader, click the "similar pages" link placed after the snippet and you'll discover other feed readers, Google Reader's blog, information about feeds, and blog platforms.


The related pages are generated by analyzing the link structure of the web. A patent from 2000 explains how this feature works: "a first set of hyperlinked documents that have a forward link to the selected hyperlinked document is provided. Additionally, a second set of hyperlinked documents that are pointed to by the forward links in the hyperlinked documents in the first set is provided. A value is assigned to each forward link in each of the hyperlinked documents in the first set, with the value being reduced for a forward link if there are multiple hyperlinked documents from the same host as the hyperlinked document that includes the forward link. A score is generated for each hyperlinked document in the second set according to the values of the forward links pointing to the hyperlinked document. Accordingly, a list of related hyperlinked documents is generated from the second set according to the score of the hyperlinked documents."

The underlying assumption is that many sites that link to Google Reader also link to its competitors and to related information. This is very similar to Amazon's recommendations: "customers who bought this item also bought".
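The steps quoted from the patent can be sketched in Python as below. This is a minimal sketch under stated assumptions, not Google's implementation: the link graph is a hypothetical in-memory dict mapping each page URL to its outgoing links, the host is taken from the URL's network location, duplicate links on one page count once, and the "reduced" value is assumed to be 1 divided by the number of first-set documents on the same host (the patent excerpt doesn't give a formula). The default of 30 results simply mirrors the feature's output size.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def host(url: str) -> str:
    # Host part of a URL, e.g. "a.com" for "http://a.com/page"
    return urlparse(url).netloc


def related_pages(links: dict[str, list[str]], target: str,
                  top_n: int = 30) -> list[tuple[str, float]]:
    # First set: documents with a forward link to the selected document
    first_set = [page for page, outs in links.items() if target in outs]

    # How many first-set documents live on each host
    docs_per_host = Counter(host(page) for page in first_set)

    scores: dict[str, float] = defaultdict(float)
    for page in first_set:
        # Assumed weighting: a link's value shrinks when its host contributes
        # several first-set documents, so one site can't dominate the result
        value = 1.0 / docs_per_host[host(page)]
        # Second set: documents pointed to by the first set's forward links
        # (duplicate links on the same page are counted once; an assumption)
        for out in set(links[page]):
            if out != target:
                scores[out] += value

    # Rank the second set by score; ties broken alphabetically for stable output
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical link graph: page URL -> outgoing links
links = {
    "http://a.com/1": ["http://reader.google.com", "http://bloglines.com",
                       "http://feedburner.com"],
    "http://a.com/2": ["http://reader.google.com", "http://bloglines.com"],
    "http://b.org/x": ["http://reader.google.com", "http://newsgator.com",
                       "http://bloglines.com"],
    "http://c.net/": ["http://newsgator.com"],
}
print(related_pages(links, "http://reader.google.com"))
# [('http://bloglines.com', 2.0), ('http://newsgator.com', 1.0), ('http://feedburner.com', 0.5)]
```

The two a.com pages each contribute half a vote, so together they weigh as much as the single b.org page, which is the point of reducing the value for links from the same host.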

How to Use This Feature

Unfortunately, Google's implementation has a major flaw: because many pages link to popular sites such as Blogger, Flickr and StatCounter, these sites sometimes show up in the list of related pages even when they're completely unrelated. Greg Linden calls this "the Harry Potter problem" when describing Amazon's recommendation system: "The first version of similarities was quite popular. But it had a problem, the Harry Potter problem. Oh, yes, Harry Potter. Harry Potter is a runaway bestseller. Kids buy it. Adults buy it. Everyone buys it. So, take a book, any book. If you look at all the customers who bought that book, then look at what other books they bought, rest assured, most of them have bought Harry Potter."
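The effect is easy to reproduce with plain co-citation counts. The sketch below uses hypothetical page names, including twenty unrelated pages that only embed a StatCounter badge. It also shows one common mitigation from item-to-item recommenders, cosine normalization (dividing the co-citation count by the square root of the product of both pages' in-link counts); this is a swapped-in technique for illustration, not something Google is known to use here.

```python
import math
from collections import Counter


def rank_related(links: dict[str, list[str]], target: str,
                 cosine: bool = False) -> list[str]:
    # Overall popularity: how many distinct pages link to each URL
    inlinks = Counter(url for outs in links.values() for url in set(outs))

    # Co-citation counts: pages that link to both the target and a candidate
    co = Counter()
    for outs in links.values():
        outs = set(outs)
        if target in outs:
            co.update(outs - {target})

    def score(url: str) -> float:
        if not cosine:
            return co[url]
        # Cosine normalization penalizes candidates linked from everywhere
        return co[url] / math.sqrt(inlinks[target] * inlinks[url])

    # Highest score first; ties broken alphabetically
    return sorted(co, key=lambda url: (-score(url), url))


# Hypothetical graph: three pages link to "reader"
links = {
    "p1": ["reader", "bloglines", "statcounter"],
    "p2": ["reader", "newsgator", "statcounter"],
    "p3": ["reader", "bloglines", "statcounter"],
}
# Twenty unrelated pages that only embed a StatCounter badge
links.update({f"other{i}": ["statcounter"] for i in range(20)})

print(rank_related(links, "reader"))               # ['statcounter', 'bloglines', 'newsgator']
print(rank_related(links, "reader", cosine=True))  # ['bloglines', 'newsgator', 'statcounter']
```

With raw counts the ubiquitous StatCounter wins, just like Harry Potter at Amazon; once each candidate's overall popularity is factored in, it drops to the bottom.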

Even if GoogleScout doesn't always work well, it's a great tool for research and serendipitous discoveries (add a bookmarklet to your browser to use this feature for any site you visit). Another way to find related pages is to search for a site in Google Directory and click on its category. Similicio.us uses the bookmarks from del.icio.us to complete the sentence "people who bookmarked this site also bookmarked...", while the untrustworthy Alexa fills in the blanks for "people who visit this site also visit...". Google also uses similar ideas to provide recommendations based on your search history.
