Quantity Over Quality at Google Book Search

Campus Technology has a well-documented article, "Google Book Search: The Good, the Bad & the Ugly," which suggests that Google's project is more about quantity than quality. For example, under its agreement the University of California has to deliver 3,000 books a day to Google. "All of the libraries are talking about that, in the sense of what might be the most interesting materials to scan. But I'll be very frank: There's a real balance point between volume and selection, especially when looking at these numbers. UC is trying to meet the needs of the contract it's signed," says Robin Chandler, former director of data acquisitions for UC's California Digital Library.

Because Google has to scan so many books, it needs a scanning technology that scales. "When it first started, the technical challenge was simply building a scanning device that worked. The next technical challenge was being able to run this scanning process at scale. We would have been quite happy to use commercial scanning technologies if they were adequate to scale to this. We only built our own scanning process because that was the way to make this project achievable for Google," says Dan Clancy from Google.

Surprisingly, the scanning process involves humans, as you can see in some books from Google's index (TechCrunch, Google Blogoscoped, George Hernandez and The Genealogue have all spotted fingers in scanned pages). "If you go into Google [Book Search] and look at any book, you'll be able to see by the number of body parts and fingerprints that [the pages] are being turned manually," says Linda Becker, VP at Kirtas, the company behind the APT BookScan 2400, which Kirtas bills as the fastest robotic book scanner in the world. "If you were to go to the Google site, you'd see that one out of every five pages is either missing, or has fingers in it, or is cut off, or is blurry."

Larry Page announced in October 2007 that the Book Search index includes "over a million books." A search for "now" returns 2,190,600 results, of which 1,740,600 are available in limited preview and 214,600 can be read in full and downloaded.

The conclusion of the article is optimistic:
When it comes down to it, then, this brave new world of book search probably needs to be understood as Book Search 1.0. And maybe participants should not get so hung up on quality that they obstruct the flow of an astounding amount of information. Right now, say many, the conveyor belt is running and the goal is to manage quantity, knowing that with time the rest of what's important will follow. Certainly, there's little doubt that in five years or so, Book Search as defined by Google will be very different. The lawsuits will have been resolved, the copyright issues sorted out, the standards settled, the technologies more broadly available, the integration more transparent.
