Mining the Web

When you have a lot of indexed web pages and information about user queries, you can extract a lot of meaningful data. One simple way to do that is to exploit the markup structure of the documents.

Reworking the navigation links for a site
Google shows four internal links for the top result when they are available and relevant. They are usually the site's most popular links, so this is a great way to compress a long list of navigation links. Google detects navigation links by looking at groups of links that belong to a phrase.
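
Here is a minimal sketch of the selection step, assuming the navigation links have already been detected and that you have some popularity signal for each URL (for example, click counts). The function name `sitelinks`, the `popularity` dictionary and the cutoff of four are illustrative assumptions, not Google's actual implementation.

```python
from urllib.parse import urljoin, urlparse


def sitelinks(site_url, nav_links, popularity, k=4):
    """Pick the k most popular internal links from a site's navigation links.

    Hypothetical helper: `popularity` maps absolute URLs to a score such as
    click counts; links without a score count as 0.
    """
    host = urlparse(site_url).netloc
    internal = []
    for href in nav_links:
        url = urljoin(site_url, href)  # resolve relative links like "/about"
        # keep only links on the same host, skip the page itself and duplicates
        if urlparse(url).netloc == host and url != site_url and url not in internal:
            internal.append(url)
    # stable sort: equally popular links keep their order in the navigation
    internal.sort(key=lambda u: popularity.get(u, 0), reverse=True)
    return internal[:k]
```

External links are dropped before ranking, so a hugely popular outbound link can never crowd out the site's own sections.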


Finding definitions
Google mines the web to find glossaries. Most of them use the DD tag, so it's pretty easy to detect them. The result: you find definitions that aren't available in traditional dictionaries or encyclopedias. You can find definitions by adding define: in front of your query.
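
A rough sketch of this kind of glossary mining, using only Python's standard `html.parser`. It assumes glossaries are marked up as `<dl>` lists with `<dt>` terms followed by `<dd>` definitions; `GlossaryParser`, `extract_definitions` and `define` are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class GlossaryParser(HTMLParser):
    """Collects (term, definition) pairs from <dt>/<dd> definition-list markup."""

    def __init__(self):
        super().__init__()
        self.pairs = []    # (term, definition) tuples found so far
        self._tag = None   # "dt" or "dd" while we are inside one
        self._buf = []     # text fragments of the current element
        self._term = None  # text of the most recent <dt>

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._flush()  # </dt> and </dd> are optional in HTML
            self._tag = tag

    def handle_endtag(self, tag):
        if tag in ("dt", "dd", "dl"):
            self._flush()
        if tag == "dl":
            self._term = None  # a new list starts with no pending term

    def handle_data(self, data):
        if self._tag is not None:
            self._buf.append(data)

    def _flush(self):
        if self._tag is None:
            return
        text = " ".join("".join(self._buf).split())  # collapse whitespace
        if self._tag == "dt":
            self._term = text
        elif self._term:
            self.pairs.append((self._term, text))
        self._tag, self._buf = None, []


def extract_definitions(html):
    parser = GlossaryParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


def define(term, pages):
    """Hypothetical define: lookup over a collection of crawled pages."""
    term = term.lower()
    return [d for page in pages for t, d in extract_definitions(page) if t.lower() == term]
```

Because definitions are gathered from many pages, `define` naturally returns several candidate definitions for the same term, much like the list of definitions Google shows.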


Spell checking
"Google's spell checking software automatically looks at your query and checks to see if you are using the most common version of a word's spelling. If it calculates that you're likely to generate more relevant search results with an alternative spelling, it will ask 'Did you mean: (more common spelling)?'. Clicking on the suggested spelling will launch a Google search for that term." Google's spell checker recognizes not only frequent typos and common misspellings, but also terms that are often confused with one another. Because it relies on what people actually search for rather than on a fixed word list, it is also good at catching misspellings of words that aren't included in dictionaries.
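
Google hasn't published the details, but the idea of preferring the most common spelling can be illustrated with a small frequency-based corrector in the style of Peter Norvig's well-known spelling corrector. This sketch assumes a hypothetical `counts` table of how often each single-word query appears in the logs, and only considers spellings one edit away; the `ratio` threshold is an arbitrary assumption.

```python
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def did_you_mean(query, counts, ratio=10):
    """Suggest a much more common spelling seen in the query log, or None."""
    query = query.lower()
    # candidates are nearby spellings that people have actually searched for
    candidates = [w for w in edits1(query) | {query} if counts[w] > 0]
    if not candidates:
        return None
    best = max(candidates, key=lambda w: counts[w])
    # only suggest when the alternative is clearly more popular
    if best != query and counts[best] >= ratio * max(counts[query], 1):
        return best
    return None
```

For example, with `counts = Counter({"britney": 5000, "britny": 20})`, `did_you_mean("britny", counts)` suggests `"britney"`, even though no dictionary contains either word. The `ratio` check keeps the suggestion from firing when both spellings are common.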


Lists of related terms
Google Sets lets you enter a list of terms and generates related terms. It's a good way to find a list of US presidents, similar illnesses, competitors or movie recommendations. The Google Sets site doesn't describe the algorithm, but Google could look at phrases that frequently appear together on web pages, for example as items of the same list.
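
One way to sketch that guess is a co-occurrence score: treat every HTML list mined from the web as a set of items and rank the terms that share lists with your seed terms. This is a hypothetical illustration, not the Google Sets algorithm; `expand_set` and the input format (a list of lists of item strings) are assumptions.

```python
from collections import Counter


def expand_set(seeds, lists, top_n=5):
    """Rank terms that appear in the same lists as the seed terms."""
    seeds = {s.lower() for s in seeds}
    scores = Counter()
    for items in lists:
        normalized = {item.lower() for item in items}
        overlap = len(seeds & normalized)
        if overlap == 0:
            continue  # this list has nothing to do with the seeds
        # lists containing more seed terms are stronger evidence
        for item in normalized - seeds:
            scores[item] += overlap
    return [term for term, _ in scores.most_common(top_n)]
```

Weighting by the number of seeds in each list helps with ambiguous seeds: a list that contains both "Adams" and "Jefferson" is more likely to be about presidents than a list that only mentions "Jefferson".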

Universal autocomplete
By looking at popular queries, Google Suggest autocompletes your query, so you type less and also use better queries. This could be extended to a general autocomplete for input boxes, optionally restricted to a domain (for example, music artists).
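
A minimal sketch of query-log autocomplete, assuming a hypothetical `query_counts` table of popular queries. The optional `vocabulary` argument shows how the same idea could be restricted to a domain such as music artists. A production system would use a trie or a sorted index instead of scanning every query.

```python
from collections import Counter


def suggest(prefix, query_counts, limit=4, vocabulary=None):
    """Return the most popular logged queries that start with `prefix`."""
    prefix = prefix.lower()
    matches = [
        (q, c)
        for q, c in query_counts.items()
        # optional domain restriction, e.g. a set of known artist names
        if q.startswith(prefix) and (vocabulary is None or q in vocabulary)
    ]
    # most popular first; ties broken alphabetically for stable output
    matches.sort(key=lambda qc: (-qc[1], qc[0]))
    return [q for q, _ in matches[:limit]]
```

Usage: with `Counter({"the weather": 5000, "the beatles": 900, "the beach boys": 400, "the doors": 300})`, typing `"the b"` suggests `["the beatles", "the beach boys"]`, while the music-only box drops "the weather".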

Google could also mine FAQs (lists of frequently asked questions), create a search engine for files by listing different mirrors along with context from the web pages that link to the files, show related images by mining photo albums, show which sites embed a YouTube video or have frequently updated feeds, or create summaries for web pages by looking at the anchor text of the links pointing to them. When you have hundreds of terabytes of information, the possibilities are endless.
