Googlebot Can Destroy Sites

Googlebot follows every links in the pages it indexes. But what if that links have disastrous effects? Here's a lesson about Googlebot's power to destroy a badly conceived site.

"Josh Breckman worked for a company that landed a contract to develop a content management system for a fairly large government website. Much of the project involved developing a content management system so that employees would be able to build and maintain the ever-changing content for their site.

Things went pretty well for a few days after going live. But, on day six, things went not-so-well: all of the content on the website had completely vanished and all pages led to the default "please enter content" page. Whoops.

Josh was called in to investigate and noticed that one particularly troublesome external IP had gone in and deleted *all* of the content on the system. The IP didn't belong to some overseas hacker bent on destroying helpful government information. It resolved to googlebot.com, Google's very own web crawling spider. Whoops.

As it turns out, Google's spider doesn't use cookies, which means that it can easily bypass a check for the "isLoggedOn" cookie to be "false". It also doesn't pay attention to Javascript, which would normally prompt and redirect users who are not logged on. It does, however, follow every hyperlink on every page it finds, including those with "Delete Page" in the title. Whoops."

So next time don't assume every visitor has JavaScript activated and validate the actions both on client side and on server side. If you want to validate your data for accuracy and security then you must use server side code to check your form inputs.

And something else: according to the HTTP 1.1 specification, the GET method is defined as a Safe Method which "SHOULD NOT have the significance of taking an action other than retrieval." If you to change a state (delete content, replace data), you should use POST.

Related posts about security:
Google Deleted... Google Blog
GMail vulnerability: GMail runs javascript in body
Get sensitive information using Google

Labels

Web Search Gmail Google Docs Mobile YouTube Google Maps Google Chrome User interface Tips iGoogle Social Google Reader Traffic Making Devices cpp programming Ads Image Search Google Calendar tips dan trik Google Video Google Translate web programming Picasa Web Albums Blogger Google News Google Earth Yahoo Android Google Talk Google Plus Greasemonkey Security software download info Firefox extensions Google Toolbar Software OneBox Google Apps Google Suggest SEO Traffic tips Book Search API Acquisitions InOut Visualization Web Design Method for Getting Ultimate Traffic Webmasters Google Desktop How to Blogging Music Nostalgia orkut Google Chrome OS Google Contacts Google Notebook SQL programming Google Local Make Money Windows Live GDrive Google Gears April Fools Day Google Analytics Google Co-op visual basic Knowledge java programming Google Checkout Google Instant Google Bookmarks Google Phone Google Trends Web History mp3 download Easter Egg Google Profiles Blog Search Google Buzz Google Services Site Map for Ur Site game download games trick Google Pack Spam cerita hidup Picasa Product's Marketing Universal Search FeedBurner Google Groups Month in review Twitter Traffic AJAX Search Google Dictionary Google Sites Google Update Page Creator Game Google Finance Google Goggles Google Music file download Annoyances Froogle Google Base Google Latitude Google Voice Google Wave Google Health Google Scholar PlusBox SearchMash teknologi unik video download windows Facebook Traffic Social Media Marketing Yahoo Pipes Google Play Google Promos Google TV SketchUp WEB Domain WWW World Wide Service chord Improve Adsence Earning jurnalistik sistem operasi AdWords Traffic App Designing Tips and Tricks WEB Hosting linux How to Get Hosting Linux Kernel WEB Errors Writing Content award business communication ubuntu unik