tutorial - How to Create and Configure robot.txt for Apache web server

"Robots.txt" is a regular text file that through its name, has special meaning to the majority of "honorable" robots on the web. By defining a few rules in this text file, you can instruct robots to not crawl and index certain files, directories within your site, or at all. For example, you may not want Google to crawl the /images directory of your site, as it's both meaningless to you and a waste of your site's bandwidth. "Robots.txt" lets you tell Google just that.

1) Here's a basic "robots.txt":
User-agent: *
Disallow: /
With the above declared, all robots (indicated by "*") are instructed to not index any of your pages (indicated by "/"). Most likely not what you want, but you get the idea.

2) you may not want Google's Image bot crawling your site's images and making them searchable online, if just to save bandwidth. The below declaration will do the trick:
User-agent: Googlebot-Image
Disallow: /
3) The following disallows all search engines and robots from crawling select directories and pages:
User-agent: *
Disallow: /cgi-bin/
Disallow: /privatedir/
Disallow: /tutorials/blank.htm
4) You can conditionally target multiple robots in "robots.txt." Take a look at the below:
User-agent: *
Disallow: /
User-agent: Googlebot
Disallow: /cgi-bin/
Disallow: /privatedir/
This is interesting- here we declare that crawlers in general should not crawl any parts of our site, EXCEPT for Google, which is allowed to crawl the entire site apart from /cgi-bin/ and /privatedir/. So the rules of specificity apply, not inheritance.

5) There is a way to use Disallow: to essentially turn it into "Allow all", and that is by not entering a value after the semicolon(:):
User-agent: *
Disallow: /
User-agent: ia_archiver
Disallow:
Here all crawlers should be prohibited from crawling our site, except for Alexa, which is allowed.

6) Finally, some crawlers now support an additional field called "Allow:", most notably, Google. As its name implies, "Allow:" lets you explicitly dictate what files/folders can be crawled. However, this field is currently not part of the "robots.txt" protocol, so use it only if absolutely needed, as it might confuse some less intelligent crawlers.

Per Google's FAQs for webmasters, the below is the preferred way to disallow all crawlers from your site EXCEPT Google:
User-agent: *
Disallow: /
User-agent: Googlebot
Allow: /
Finally this file (robot.txt) must be uploaded to the root accessible directory of your site, not a subdirectory (eg. www.mysite.com/robot.txt) it is only by following the above rules will search engines interpret the instructions contained in the file.


Free, facebook, tips, Links, blogging, Downloads, Google, facebookTips, money, news, apps, Social, Media, Website, Tricks, games, Android, software, PIctures, Internet, Security, Web, codes, Review, bloggers, SAMSUNG, Worldwide, Contest, Exitic, Phones, facebookTricks, hacking, London, Olympics, SEO, Youtube, iOS, Adsense, gadgets, iPHONE, widgets, Doodle, twitter, video, Deals, technology, Aircel, Airtel, iPAD, Angry, Birds, BSNL, TechLife, GMAIL, Idea, Microsoft, SmartPhones, Stress, Buster, Windows, Yahoo, Infolinks, Nokia, Scam, Uninor, browsers, Amazon, Euro, CUP, Chat, IDM, JOBS, Modem, Music, Reliance, Results, SSC, Tata, Docomo, bing, freebie, mobile, placements, AIEEE, AlertPay, Chrome, College, Competetive, Exam, Dehradun, Extension, FireFox, GPRS, HTC, IMPACT, Info, MTS, Mark, Zukerberg, Paypal, Promotional, Post, Torrent, UTU, Unlocking, VodaFone, Wall, Paper, apple, books, engineering, iCAR, iTunes, pinterest, rovio, AVG, Admit, Card, Adobe, Affiliate, Marketing, Akhilesh, Amul, Girl, BlackBerry, ChromeBook, Clixsense, Coupon, Digitallife, Discovery, Emoticons, Festival, GATE, GIMP, Income, Tax, International, JSS, JailBreaking, Kindle, Linux, Local, MAX, PAYNE, Mac, Mango, Memory, Speed, Nexus, Online, Shopping, Raakhi, Report, Rising, Stars, Sample, Science, Sony, Syllabus, TabletBooK, Teamviewer, Templates, Dark, Knight, Rises, USA, UPMT, Virgin, Xperia, ZTE, challan, counselling, course, btech, funny, iMOVE, registration

source:http://linuxpoison.blogspot.com/2009/02/13578175711797.html

Labels

Web Search Gmail Google Docs Mobile YouTube Google Maps Google Chrome User interface Tips iGoogle Social Google Reader Traffic Making Devices cpp programming Ads Image Search Google Calendar tips dan trik Google Video Google Translate web programming Picasa Web Albums Blogger Google News Google Earth Yahoo Android Google Talk Google Plus Greasemonkey Security software download info Firefox extensions Google Toolbar Software OneBox Google Apps Google Suggest SEO Traffic tips Book Search API Acquisitions InOut Visualization Web Design Method for Getting Ultimate Traffic Webmasters Google Desktop How to Blogging Music Nostalgia orkut Google Chrome OS Google Contacts Google Notebook SQL programming Google Local Make Money Windows Live GDrive Google Gears April Fools Day Google Analytics Google Co-op visual basic Knowledge java programming Google Checkout Google Instant Google Bookmarks Google Phone Google Trends Web History mp3 download Easter Egg Google Profiles Blog Search Google Buzz Google Services Site Map for Ur Site game download games trick Google Pack Spam cerita hidup Picasa Product's Marketing Universal Search FeedBurner Google Groups Month in review Twitter Traffic AJAX Search Google Dictionary Google Sites Google Update Page Creator Game Google Finance Google Goggles Google Music file download Annoyances Froogle Google Base Google Latitude Google Voice Google Wave Google Health Google Scholar PlusBox SearchMash teknologi unik video download windows Facebook Traffic Social Media Marketing Yahoo Pipes Google Play Google Promos Google TV SketchUp WEB Domain WWW World Wide Service chord Improve Adsence Earning jurnalistik sistem operasi AdWords Traffic App Designing Tips and Tricks WEB Hosting linux How to Get Hosting Linux Kernel WEB Errors Writing Content award business communication ubuntu unik