Using Anchor Text to Translate Search Queries

Google was granted a new patent titled "Systems and methods for using anchor text as parallel corpora for cross-language information retrieval". Because queries are usually short and their terms can have multiple meanings, translating them word by word is unreliable. The patent describes how Google could use anchor text to pick the most likely translation of a query and so return better results for it.

The method includes receiving a search query that includes terms in a first language; determining possible translations of the terms of the search query into a second language; locating documents in the first language that match the terms of the search query; identifying documents in the second language that contain references to the first language documents; and disambiguating among the possible translations of the terms of the search query using the second language documents to identify one of the possible translations as a likely translation of the search query.

Here's an example:
Assume that a user provides a search query to the server in Spanish, but desires documents to be returned in English. Further, assume that the user desires documents relating to "banks interest." In this case, the query provided by the user may include the terms "bancos" and "interes." To facilitate English-language document retrieval, the server may translate the Spanish query to English.

The query translation engine may perform an initial translation of the terms of the query using, for example, the dictionary. In this case, the query translation engine finds that each of the terms of the query has more than one possible translation. For example, the Spanish word "bancos" could be translated as "banks" or "benches" (among other possibilities) in English. The Spanish word "interes" could be translated as "interest" or "concern" (among other possibilities) in English. The query translation engine disambiguates among the possible translations using documents identified by the search engine.
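The initial dictionary lookup can be sketched in a few lines of Python. This is only an illustration, not code from the patent: the `DICTIONARY` contents and the `candidate_translations` helper are hypothetical, and the entries are limited to the two words used in the example.

```python
from itertools import product

# Hypothetical toy dictionary; entries come from the example in the text.
DICTIONARY = {
    "bancos": ["banks", "benches"],
    "interes": ["interest", "concern"],
}

def candidate_translations(query, dictionary):
    """Return every combination of per-term translations for the query."""
    terms = query.split()
    # A term missing from the dictionary is kept untranslated (an assumption).
    options = [dictionary.get(term, [term]) for term in terms]
    return [tuple(combo) for combo in product(*options)]

candidates = candidate_translations("bancos interes", DICTIONARY)
for combo in candidates:
    print(" ".join(combo))
```

With two options per term, the query yields four candidate translations, which is why a disambiguation step is needed.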

The search engine performs a search using the original Spanish query (i.e., "bancos interes") to identify Spanish-language documents that include anchors that contain all of the query terms and point to English-language documents. The search engine provides the English-language documents that are pointed to by the anchors to the query translation engine.
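A rough sketch of that anchor lookup follows. The `Anchor` record, the language codes, and the example URLs are all assumptions made for illustration; the patent does not say how anchors or document languages are stored.

```python
from dataclasses import dataclass

@dataclass
class Anchor:
    text: str          # anchor text found in the linking document
    source_lang: str   # language of the document that contains the link
    target_url: str    # URL the anchor points to
    target_lang: str   # language of the linked document

def targets_for_query(query, anchors, source_lang="es", target_lang="en"):
    """Return target URLs whose anchors contain all of the query terms."""
    terms = set(query.lower().split())
    urls = []
    for anchor in anchors:
        if anchor.source_lang != source_lang or anchor.target_lang != target_lang:
            continue
        anchor_words = set(anchor.text.lower().split())
        # Keep the target only if every query term appears in the anchor text.
        if terms <= anchor_words and anchor.target_url not in urls:
            urls.append(anchor.target_url)
    return urls

# Made-up anchors for illustration only.
ANCHORS = [
    Anchor("tasas de interes de bancos", "es", "http://example.com/rates", "en"),
    Anchor("bancos del parque", "es", "http://example.com/park", "en"),
    Anchor("bancos interes", "es", "http://example.com/es", "es"),
]
result = targets_for_query("bancos interes", ANCHORS)
print(result)
```

Only the first anchor qualifies: the second lacks one of the query terms, and the third points to a Spanish-language document.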

The query translation engine analyzes the text of the English-language documents to, for example, compute the frequency of co-occurrence of the various translation possibilities. Specifically, the query translation engine determines how often the word "banks" occurs with "interest," "banks" occurs with "concern," "benches" occurs with "interest," and "benches" occurs with "concern." Presumably, the query translation engine would determine that "banks" and "interest" are the most frequent combination and use these terms as the correct translation for the Spanish query "bancos interes."
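The co-occurrence step might look something like the sketch below. It assumes the simplest possible measure, counting a combination once for every document in which all of its words appear; the patent speaks only of "frequency of co-occurrence" and does not fix a formula. The `pick_translation` function and the sample documents are hypothetical.

```python
import re
from collections import Counter
from itertools import product

def pick_translation(options_per_term, documents):
    """Pick the translation combination that co-occurs in the most documents."""
    counts = Counter()
    for doc in documents:
        words = set(re.findall(r"[a-z]+", doc.lower()))
        for combo in product(*options_per_term):
            # Count the combination if all of its words appear in this document.
            if all(word in words for word in combo):
                counts[combo] += 1
    if not counts:
        return None  # no combination co-occurs anywhere
    return counts.most_common(1)[0][0]

# Made-up English documents standing in for the pages the anchors point to.
DOCUMENTS = [
    "Banks raised the interest rate on savings accounts.",
    "Interest paid by banks is taxable.",
    "Customers voiced concern about banks closing branches.",
    "New benches were installed in the park.",
]
OPTIONS = [["banks", "benches"], ["interest", "concern"]]
best = pick_translation(OPTIONS, DOCUMENTS)
print(best)
```

In this toy data, "banks" and "interest" appear together in two documents, "banks" and "concern" in one, and "benches" with neither, so the pair "banks interest" is selected.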

Google hasn't implemented this method in its search engine yet. If the retrieved documents could also be translated into the first language (your native language), you would need only one language to search the web.
