Hosted by Google, but Not Open to Search Engines

Like many other sites, Google uses robots.txt files to prevent search engines from indexing some of the content from google.com. In most cases, Google excludes search results pages and other automatically generated pages, which would pollute indexes.
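To illustrate how a crawler reads these rules, here is a minimal sketch that uses Python's standard `urllib.robotparser` module. The rules and the `www.example.com` URLs are made up for the example and are not copied from Google's actual robots.txt file; `build_parser` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT Google's real robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Allow: /about
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# A well-behaved crawler checks each URL before fetching it.
for url in (
    "http://www.example.com/search?q=test",  # matches "Disallow: /search"
    "http://www.example.com/about",          # matches "Allow: /about"
):
    print(url, "->", parser.can_fetch("*", url))
```

Under these assumed rules, the search results page is off limits to every crawler, while the other page can be fetched. The rules are only a request: they work because well-behaved crawlers choose to honor them.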


But sometimes Google excludes useful content, either directly using robots.txt files or using addresses that are difficult to index:

* published documents, spreadsheets and presentations from Google Docs - I suspect that the main reason why search engines aren't allowed to index Google Docs pages is that many documents would become public if search engines indexed invitation URLs.

* public pages for Google Reader's shared items - most of the content on these pages is copied from other pages, but that doesn't fully explain the exclusion, since Google Notebooks can be indexed by search engines.

* the albums and the photos hosted by Picasa Web Albums (the photos are indexed by Google Image Search, while the albums are included in Google's main search results). Picasa Web's front-end uses AJAX, and URLs like http://picasaweb.google.com/guedin/AdriChezLesKiwisToutesLesPhotos12#5312778271091234418 can't be indexed by search engines, which usually drop the fragment (the part of the address after the # sign).

* the questions and answers from Google Moderator, another AJAX app that uses addresses like http://moderator.appspot.com/#15/e=cc&t=6. The application powers a new section of the White House's website called "Open for Questions", which also can't be indexed by search engines.

* the LIFE photo archive, which is only available in Google Image Search. "It's disappointing that Google gets exclusive access to index these images and every other search engine is out of luck. Exclusivity like this doesn't seem in line with Google's philosophy," says Andy Baio.

* the books scanned by Google that are available in Google Book Search (they're included in Google's main search results, as part of Universal Search)

* the patents from the United States Patent and Trademark Office that are available in Google Patent Search

* the charts generated using Google Chart API

* the captions from videos hosted by YouTube and Google Video (they're indexed by YouTube and Google Video)
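The fragment problem mentioned for Picasa Web Albums and Google Moderator can be sketched with Python's standard `urllib.parse.urldefrag`. This is only an approximation of what a crawler does, not any search engine's actual code: `crawler_view` is a hypothetical helper, and the second photo ID is invented for the comparison.

```python
from urllib.parse import urldefrag


def crawler_view(url: str) -> str:
    """Return the address a fragment-dropping crawler would actually request."""
    return urldefrag(url).url


# The two addresses quoted in the list above.
ALBUM_PHOTO = (
    "http://picasaweb.google.com/guedin/"
    "AdriChezLesKiwisToutesLesPhotos12#5312778271091234418"
)
MODERATOR_TOPIC = "http://moderator.appspot.com/#15/e=cc&t=6"

# Hypothetical second photo in the same album (the ID is made up).
OTHER_PHOTO = (
    "http://picasaweb.google.com/guedin/"
    "AdriChezLesKiwisToutesLesPhotos12#1111111111111111111"
)

print(crawler_view(ALBUM_PHOTO))
print(crawler_view(MODERATOR_TOPIC))

# Both photos collapse to the same album address once the fragment is gone.
print(crawler_view(ALBUM_PHOTO) == crawler_view(OTHER_PHOTO))
```

The fragment is never sent to the server, so every photo in an album and every Moderator topic looks like one and the same page to a crawler that doesn't run the site's JavaScript.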
