Improving Google Image Search Using Implicit PageRank

Image search engines are of limited use for two reasons: images are hard to describe accurately in words, and search engines ignore the image content entirely, indexing instead the anchor text, file names and text that surrounds each image. "Search for apples, and they haven't actually somehow scanned the images itself to see if they contain pictures of apples," as Danny Sullivan puts it.

Image analysis has yet to produce algorithms that can process billions of images at scale. "While progress has been made in automatic face detection in images, finding other objects such as mountains or tea pots, which are instantly recognizable to humans, has lagged," explains The New York Times.

An interesting paper written by Yushi Jing and Google's Shumeet Baluja describes an algorithm similar to PageRank that uses the similarity between images as implicit votes. "We cast the image-ranking problem into the task of identifying authority nodes on an inferred visual similarity graph and propose an algorithm to analyze the visual link structure that can be created among a group of images. Through an iterative procedure based on the PageRank computation, a numerical weight is assigned to each image; this measures its relative importance to the other images being considered."
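To make the idea of an iterative, PageRank-style computation over a similarity graph more concrete, here is a minimal sketch. It is not the authors' implementation: the function name `visual_rank`, the hand-made similarity values, the damping factor of 0.85 and the fixed iteration count are all assumptions chosen for illustration.

```python
def visual_rank(similarity, damping=0.85, iterations=50):
    """PageRank-style power iteration over an image similarity matrix.

    similarity[i][j] is a non-negative similarity score between images
    i and j (hypothetical values; the diagonal is 0).
    """
    n = len(similarity)
    # Total similarity "leaving" each image, used to normalize columns.
    col_sums = [sum(similarity[i][j] for i in range(n)) for j in range(n)]
    rank = [1.0 / n] * n  # start from a uniform distribution

    for _ in range(iterations):
        new_rank = []
        for i in range(n):
            incoming = 0.0
            for j in range(n):
                if col_sums[j] > 0:
                    # Image j passes a share of its rank to image i,
                    # proportional to how similar the two images are.
                    incoming += similarity[i][j] / col_sums[j] * rank[j]
                else:
                    # An image similar to nothing spreads its rank evenly.
                    incoming += rank[j] / n
            new_rank.append(damping * incoming + (1.0 - damping) / n)
        rank = new_rank
    return rank


# Images 0-2 resemble each other; image 3 is an outlier.
similarity = [
    [0.0, 0.9, 0.8, 0.1],
    [0.9, 0.0, 0.7, 0.1],
    [0.8, 0.7, 0.0, 0.2],
    [0.1, 0.1, 0.2, 0.0],
]
ranks = visual_rank(similarity)
for index, score in enumerate(ranks):
    print(f"image {index}: {score:.3f}")
```

In this toy graph the outlier ends up with the lowest score, because few "visual votes" flow to it from the other images.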

The paper, titled "PageRank for Product Image Search", assumes that people are more likely to go from an image to other similar images. "By treating images as web documents and their similarities as probabilistic visual hyperlinks, we estimate the likelihood of images visited by a user traversing through these visual-hyperlinks. Those with more estimated visits will be ranked higher than others." To determine the similarity between images, the paper suggests using different features depending on the type of images: local features, global features (color histogram, shape).
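As an illustration of how a global feature such as a color histogram could turn into a similarity score, here is a small sketch. The choice of histogram intersection as the measure, the 4 bins per channel, and the names `color_histogram` and `histogram_intersection` are assumptions made for this example; the post does not say which measure the paper uses.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Build a normalized RGB histogram from (r, g, b) tuples (0-255)."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    step = 256 / bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each channel value to a bin index in [0, bins_per_channel - 1].
        rb = min(int(r / step), bins_per_channel - 1)
        gb = min(int(g / step), bins_per_channel - 1)
        bb = min(int(b / step), bins_per_channel - 1)
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    return [count / len(pixels) for count in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: 1 for identical histograms, 0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Tiny made-up "images" given as pixel lists.
red_a = color_histogram([(250, 10, 10)] * 3 + [(240, 20, 20)])
red_b = color_histogram([(255, 0, 0)] * 4)
blue = color_histogram([(10, 10, 250)] * 4)

sim_red = histogram_intersection(red_a, red_b)
sim_mixed = histogram_intersection(red_a, blue)
print(sim_red, sim_mixed)
```

Scores like these, computed for every pair of images in a result set, would form the similarity graph that the ranking step works on. A color histogram ignores where colors appear in the image, which is one reason other features can suit some image types better.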

The system was tested on the 2000 most popular queries from Google Image Search on July 23rd, 2007, by applying the algorithm to the top 1000 results produced by Google's search engine. The results are promising: users found 83% fewer irrelevant images in the top 10 results, down from 2.83 irrelevant results with the current Google search engine to 0.47.
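The 83% figure follows directly from the two numbers quoted, as this quick check shows (the variable names are only illustrative):

```python
before = 2.83  # irrelevant images in the top 10, current ranking
after = 0.47   # irrelevant images in the top 10, similarity-based ranking

reduction = (before - after) / before
print(f"{reduction:.1%}")  # relative drop in irrelevant results
```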

For example, a search for [Monet paintings] returned some of his famous paintings, but also "Monet Painting in His Garden at Argenteuil" by Renoir.


It may seem that this algorithm lacks the human element used to compute PageRank (links are actually created by people), but the two authors disagree. "First, by making the approach query dependent (by selecting the initial set of images from search engine answers), human knowledge, in terms of linking relevant images to webpages, is directly introduced into the system, since the links on the pages are used by Google for their current ranking. Second, we implicitly rely on the intelligence of crowds: the image similarity graph is generated based on the common features between images. Those images that capture the common themes from many of the other images are those that will have higher rank."

For now, this is just a research paper, and it is unclear whether Google will actually use it to improve its search engine. Still, image search is certainly an area that will evolve dramatically and change the way we perceive search engines. Just imagine taking a picture of a dog with your mobile phone, uploading it to a search engine and instantly finding web pages that include similar pictures and information about the breed.

In 2006, Google acquired Neven Vision, a company specializing in image analysis, but the only new feature that could be connected to that acquisition is face detection in image search. Riya, another interesting company in this area, didn't manage to create a scalable system and decided to focus on a shopping search engine.
