Google Has the Largest Number of Dead and Old Pages

Ziv Bar-Yossef, from Google, wrote a paper about sampling random pages from a search engine's index using queries. He explains some of the technical details in this video, including what random samples are useful for: comparing search engines and estimating how much of an index is spam or fresh content.

He applied the results from his paper and compared Google, Yahoo and MSN Search. Here are three charts that compare the index size, the number of dead pages in each search engine and the freshness of the results. The charts are only estimates, with a bias of around 10%. As you can see, Google doesn't do very well.
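To make the idea concrete, here is a minimal Python sketch of how a random sample of pages could be turned into an estimate, for example of the share of dead pages. It is not the method from the paper, which deals with how to obtain the sample in the first place. The function name, the sample numbers and the normal-approximation interval are all assumptions used for illustration.

```python
import math

def estimate_fraction(hits, sample_size, z=1.96):
    """Estimate a proportion and a rough 95% interval from a random sample."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    p = hits / sample_size
    # Normal approximation to the binomial; reasonable only for large samples.
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - margin), min(1.0, p + margin)

# Hypothetical numbers: 30 dead pages found in a random sample of 1,000 pages.
fraction, low, high = estimate_fraction(30, 1000)
print(f"dead pages: {fraction:.1%} (roughly {low:.1%} to {high:.1%})")
```

The interval only reflects the sample size. It says nothing about the sampling bias mentioned above, which would have to be accounted for separately.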

To find out more, watch the video, which is fairly long (1 hour), or skip to the results. There's also the paper "Random Sampling from a Search Engine's Index" (PDF), which won the best paper award at WWW 2006.

The Hidden Purpose of Google Base

ComputerWorld reports that Google intends to extend Google Base integration into main search results.

"When users search for products on Google.com, the system will present them with another search box so that they can refine their query. After users refine their queries, Google takes them to a second page populated with product results from the Google Base listings service."

Google also plans to diminish Froogle's importance and to include ads in Google Base. Google recently redesigned Google Base, removed the search box from the homepage and added the tagline "Post it on Base. Find it on Google" to show you'll see more search results from Google Base on Google.com.

You can already see this integration if you search for "jobs" (it works in the US and in a few other countries).



In the future, you'll search for a product like "dress" and customize its characteristics before actually seeing the search results.



It's important to note that Google ranks the results from Google Base according to their relevancy and using the metadata attached to each item. Listing products in Google Base is free.

When Google Base was launched, Google said it was a service that collects information not yet in Google Search, but the real idea was to organize this information and use it to make search results more intelligent.

Google Belgium Homepage, Dreadfully Sad


Google has finally complied fully with the Belgian court's order. After removing several sites from Google.be and Google News, they now show the text of the court order on Google.be.

"Also order the defendant to publish, in a visible and clear manner and without any commentary from her part the entire intervening judgment on the home pages of 'google.be' and of 'news.google.be' for a continuous period of 5 days within 10 days of the notification of the intervening order, under penalty of a daily fine of 500,000,- € per day of delay."

The Google Belgium homepage now looks like a big wound on the face of the Internet.

Try Google's Updated Design Experiment

Google has updated the experimental design of its search results pages, which shows the services in a left sidebar.

If you want to try it, copy this code:

javascript:document.cookie="PREF= ID=ad93daafaa747f70:TM=1158373640:LM=1158374016:GM=1:S=wNuiLiKHrkRnMZtf; path=/; domain=.google.com"

Then go to google.com, paste it into the address bar, open the preferences page and click "Save preferences".

If you want to go back to the original design, just delete your Google cookie.
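If you are curious about what the cookie contains, the value is a list of KEY=VALUE pairs separated by colons. The Python sketch below splits it into its fields. The post does not document what the fields mean, so reading TM as a Unix timestamp is an assumption, and the function name is made up for this example.

```python
from datetime import datetime, timezone

def parse_pref_cookie(value):
    """Split a PREF cookie value of the form KEY=VALUE:KEY=VALUE into a dict."""
    fields = {}
    for part in value.strip().split(":"):
        key, _, val = part.partition("=")
        fields[key.strip()] = val
    return fields

pref = parse_pref_cookie(
    "ID=ad93daafaa747f70:TM=1158373640:LM=1158374016:GM=1:S=wNuiLiKHrkRnMZtf"
)

# Assumption: TM is a Unix timestamp in seconds (the post does not say so).
created = datetime.fromtimestamp(int(pref["TM"]), tz=timezone.utc)
print(pref["GM"], created.date())
```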




{ Via Googlified. }

Related:
Other design experiments
User experience at Google

Google Ajax Search, To Help JavaScript Worms

The Gnucitizen blog has an interesting post about the Google Ajax Search API, a tool that allows you to integrate Google Search into your site and let visitors search Google without leaving it. The post shows that this API could make life much easier for those who write malware and might help that malware spread.

"Web worms can use Google's infrastructure to propagate. If a malicious mind finds a vulnerability in WordPress for example and this vulnerability allows SQL Injection, a worm may be written to crawl blogs in search for this vulnerability and embed itself into everything that is vulnerable. Once a user visits an infected blog the worm starts another cycle.

Another worm might be able to crawl random sites and run generic Cross-site Scripting and SQL Injection checks and send the results to their master who will use them to release more advance worms.

Malicious minds can use Google technology and recently discovered vulnerabilities to create a BotNet that can be used for computational tasks, attacks, information gathering and pretty much everything else that the masters can come up with."


Unlike standard worms, JavaScript worms are not easy to detect and can spread rapidly. The author also thinks that in the future the web will be the new arena for malware, and we may need a web anti-virus that monitors visited web pages.

Related:
Cross-site scripting (Wikipedia)
Cross-site request forgery (Wikipedia)
Samy is my hero (MySpace worm)
More about Google Ajax Search API

Get Rich From ATMs Using Google

EWeek reports that you don't have to be clever to get rich.

"Using clues obtained from a YouTube video and a simple four-word Google search engine query, a criminal can find step-by-step instructions for how to hack into and take control of thousands of ATMs scattered around the United States. (...)

In the operator manual freely available on the Web site of a Canadian reseller, a section titled Programming provides the specific key sequence that will pop up a screen on the ATM that asks for the master password. It then lists three default passwords—master, service and operator—that could be used to hijack and possibly rig a machine."

And because most people are lazy, many ATMs still have the default passwords, which are freely available. A quote from the manual of an ATM:

"The default Master password is 123456 and the default Administrative password is 987654. To enter Management Functions as the Administrative user, enter 987654 and press ENTER (OK)."



The article concludes that "the episode underscores how easy it is to use the power of search engines to find sensitive security information. In the past, Google queries have been used to find security flaws in Web-facing applications, default passwords in Oracle databases and even live malware samples seeded on forums and other malicious sites." That's true, but keep in mind that publicly available information is... available to the public, so anyone can use it. Google and other search engines only make the process easier; the fault is not theirs.

Google Periodic Table

You must have studied chemistry. Then you must know the periodic table of elements, a list of chemical elements ordered according to their atomic numbers. It doesn't look very good, it's hard to remember the position of each element and it doesn't stir your imagination.

Joey deVilla has a friend who thought it would be nice to pick the top result for each element in Google Image Search and put it in the table. The result is extraordinary, although it could be improved a little bit.

Create More Than One Site in Google Pages


Google Page Creator has a new feature: you can create up to five sites using a single account.

First, enable "experimental features" in Site Settings. Then go to the page manager and click on "Create a site with a different address", choose an address and that's it. As the address has the format *.googlepages.com, you can't choose the name of an existing Gmail account.

This is a good way to get around the limit of 100 MB of space and 100 files without creating a new account.

The answer to the question: "How do I create a new site?" from help has changed from:

"During this initial testing period, we're only allowing each account to have a single site. However, this site can be comprised of up to 100 pages. We'll soon be offering support for multiple sites, but don't have any specific timeline to share at this point."

to

"Select the Create a site with a different address link that appears near the top of the Page Manager."

Related:
Add gadgets to Google Pages
Free website monitoring

Google Checkout Integrates with Froogle


If you search for a product on Froogle, you'll notice a new item in the list of stores: "Google Checkout Stores". Google didn't create a special store; it just lists all the products that can be bought using Google Checkout.

There's also a Google Checkout logo at the bottom of the page: "Google - Accepts Google Checkout". Google wants to make their new service more visible, hoping that more online stores will use it.

Compare Google Results from Around the World


Search Engine Watch found a site that lets you compare Google search results from different locations. Google uses geotargeting, so the results from different countries (and even from regions within the same country) might be different. For better results, select a data center instead of using google.com. My only complaints are that the site is pretty slow and that it doesn't have an extensive list of countries.

This geotargeted search comparison might be useful if you want to have a better look at your rankings, but you should know that there are other factors like personalization that can modify the order of the search results.

Behind corp.google.com

Tony Ruscoe found some internal Google subdomains. All of them have the form *.corp.google.com, and they include:

alien.corp.google.com
amd.corp.google.com
blackberry.corp.google.com
blueberry.corp.google.com
cluster.corp.google.com
cupid.corp.google.com
discovery.corp.google.com
gypsy.corp.google.com
ideas.corp.google.com
matrix.corp.google.com
paranoia.corp.google.com

While some of them have connections with existing or future Google services and Google employees, most of the subdomains have mysterious names.

Also see a list of public Google subdomains.

GFrost

In Google AdSense for Radio targets 'declining sector', she quotes her article Will Google ever build another billion dollar business? where she quotes her Google targets GPS-based in-car personalized advertising where she quotes Eric Schmidt.

"Eric Schmidt, Google CEO, believes that when he is listening to the radio in his car, radio ads should personally address him about his needs. For example, while driving past a clothing store, a radio ad should remind Eric that he needs a pair of pants and instruct him to turn left at the upcoming clothing store."

In Google Apps is risky business she quotes her Google's not so fine print: Google Apps TOS put Google first where she quotes from Google Apps for Your Domain:

"Google reserves final approval authority with respect to the means used by Customer to deploy each component of Google Apps, and in the event Google disapproves of such deployment, Google shall have the right, upon notice to Customer, to suspend any continued use of Google Apps until such time Customer implements adequate corrective modifications as reasonably required and determined by Google."

In Why Digg fraud, Google bombing, Wikipedia vandalism will not be stopped, she quotes Marissa Mayer, who said:

"We don't condone the practice of googlebombing, or any other action that seeks to affect the integrity of our search results, but we're also reluctant to alter our results by hand in order to prevent such items from showing up. Pranks like this may be distracting to some, but they don't affect the overall quality of our search service, whose objectivity, as always, remains the core of our mission."

In Google 'gift' to advertisers: 'Free' Google employee clicks, she quotes Eric Schmidt who said he clicks on Google ads "all the time", concluding this is click fraud.

In "Let click fraud happen"? Uh, no. Google Blog quotes her, who quotes Eric Schmidt who said: "Eventually, the price that the advertiser is willing to pay for the conversion will decline, because the advertiser will realize that these are bad clicks, in other words, the value of the ad declines, so over some amount of time, the system is in-fact, self-correcting. In fact, there is a perfect economic solution which is to let it happen." and she concludes that Google lets click fraud happen.

In fact, Google is an evil company that has deceptive terms of service and talks only to a certain part of the media.

"Google has been rightly called to task for its disingenuous "do no evil" formula. As we embark on this changing of the seasons perhaps it is also time to change our tune on Google's celebrated mission to make "universally accessible and useful" the world's information they have "organized."

As I put forth in "Google to Microsoft: Wolf in sheep's clothing," Google has an uncanny ability to make even its most calculated of competitive moves appear to be generous, friendly endeavors," she says.

Writely, Migrating to Google Accounts


Writely, Google's word processing tool, is finally moving to Google Accounts, after recently being opened to the public.

"In a few days, we will update your Writely account to use your @gmail.com Google Account registration settings," says Writely Team in a mail.

"We'll send email to Writely users a few days before making the change with instructions about migrating your account. Please note that you can sign up now for a Google Account using any email address - including your current Writely address."

So if you use a Gmail address to log in, the two accounts will be merged automatically. Otherwise, you can keep your Writely email address if you create a Google Account using it, or you can migrate to an existing Google Account.

Related:
Writely review
Blogger, moving to Google Accounts

Handy Calculator in Firefox 2.0

If you use Firefox 2, you've noticed that the search box shows suggestions for Google, Yahoo and Answers.com. Google's suggestions are useful if you want to type less, but also if you want to use Google Calculator. Type math expressions, unit conversions, constants and the answer appears in the list of suggestions. It's handy that you can just copy the result and get on with your work.


Firefox uses this URL to retrieve the suggestions: http://suggestqueries.google.com/complete/search?output=firefox&qu=[query]. Note that Google Toolbar for Internet Explorer doesn't show expression results.
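As a small illustration, this Python sketch builds that URL for a given query, using only the endpoint and the two parameters quoted above. The function name is made up, the sketch does not send any request, and the format of the response is not covered here.

```python
from urllib.parse import urlencode

# Endpoint and parameter names are taken from the URL quoted in the post.
SUGGEST_ENDPOINT = "http://suggestqueries.google.com/complete/search"

def build_suggest_url(query):
    """Return the suggestion URL for a query, with the query safely encoded."""
    return SUGGEST_ENDPOINT + "?" + urlencode({"output": "firefox", "qu": query})

url = build_suggest_url("2 miles in km")
print(url)
```

Encoding the query matters for calculator expressions, because characters like + and / have a special meaning in URLs.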

{ Via Blogoscoped. }

Related:
10 uses for Google Suggest
Yahoo instant calculator

Captions on Google Video


Google Video tries to promote captions and features a small list of videos that have captions. Although adding video captions was available in the video status section, I didn't see any video with captions until today.

It's really strange that Google supports only the SubViewer (*.SUB) and SubRip (*.SRT) formats, instead of focusing on the professional formats used in television, for example. Most people who upload their videos won't take the time to create subtitles, as this requires special software and is not very easy to do.
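For reference, a SubRip file is plain text: each caption has a sequence number, a line with the start and end times, the caption text and a blank line. The Python sketch below generates such text from a list of timed captions. The function names and the sample captions are made up for this example, and it does not cover how Google Video processes uploaded files.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm, the SubRip timestamp style."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

def to_srt(cues):
    """Build SubRip text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # Captions are separated by a blank line.
    return "\n".join(blocks)

# Hypothetical captions for a short clip.
captions = to_srt([
    (0.0, 2.5, "Hello and welcome."),
    (2.5, 6.0, "This is a captioned video."),
])
print(captions)
```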

Speech recognition combined with automated translation software would be really useful in this area. Or at least a collaborative captioning system, similar to the way volunteers translate Google's interface.

Picasa Web Albums - No Invitation Required



I've already posted that Picasa 2.5 is out of beta, but now it's official.

"I have 80,000 photos in Picasa, Google's free photo organizer, but most of my friends haven't had a chance to see them yet. That's why I'm so excited about the new version of Picasa that came out today. It has a feature called Picasa Web Albums that lets you post and share your photos online for free with just one click. You can show the world (or just your friends and family) what kinds of pictures you've been taking. And best of all, you can even download your friends' online photos right back to Picasa," says a Picasa engineer.

Since it was launched, Picasa Web Albums has added new features. Now you can add your friends, view their recently uploaded photos and link to your friends' albums on the homepage of your album. You can also embed photos and albums into blogs, even though this feature still needs some work (embedded photos are too small, embedded albums should display random photos). And, best of all, Picasa Web Albums no longer requires an invitation.

If you have a Gmail Google Account, you can try it at http://picasaweb.google.com. If you don't have an account or you just want to see a sample, visit this album.

Google Send to Phone - Free SMS in the US

Google has a web page that lets you send free SMS in the US: Google Send to Phone.

This page is used in Google Toolbar for IE and Google Send to Phone for Firefox to send short text messages of web page content.

This feature has been available since last year and it's a simple way to send free SMS, without using sites that require registration.

To find out more about this, including some of the privacy issues, read the FAQ.


Update: In July 2008, the service was retired and it's no longer available in Google Toolbar or as a separate extension. There are many other services that let you send free SMS, including Gizmo SMS, Text4Free and TxtDrop.

Choose How Often Google Crawls Your Site

Bigmouthmedia reports that Google Sitemaps has a new feature that allows you to choose how often Googlebot crawls your site. You can select from 5 values, from slowest to fastest, but you must know that a faster crawl uses more bandwidth. For the moment, this feature is still experimental, so you may not find it in your Google Sitemaps account.

"We are testing an alpha version of our new tool with a small percentage of webmasters who use Sitemaps. You should leave this control at the Normal setting unless you are having trouble with the speed at which Googlebot is crawling your server.

Simply select the rate at which you would like the Googlebot to crawl your server and click save. During this stage of testing, we will evaluate requests to determine the best way of using this data and providing this tool to everyone."


Google Sitemaps, recently rebranded as Google Webmaster Central, is a control panel for webmasters, where they can find statistics about searches, see crawling errors and submit sitemaps so that Google finds their pages faster.

{ Thank you, TomHTML. }

Belgian Press Out Of Google



A Belgian court has ordered Google to remove articles from newspapers represented by Copiepresse. From the court order:

"Find that the activities of Google News and the use of the Google cached violate in particular the laws on copyright and ancillary rights (1994) and the law on data bases (1998).

Order the defendant to withdraw the articles, photographs and graphic representations of Belgian publishers of the French - and German-speaking daily press, represented by the plaintiff, from all their sites (Google News and "cache" Google or any other name within 10 days of the notification of the intervening order, under penalty of a daily fine of 1,000,000.- € per day of delay.

Also order the defendant to publish, in a visible and clear manner and without any commentary from her part the entire intervening judgment on the home pages of 'google.be' and of 'news.google.be' for a continuous period of 5 days within 10 days of the notification of the intervening order, under penalty of a daily fine of 500,000,- € per day of delay."


Belgian publishers were upset that Google keeps the content of their articles in the cache and considered this a copyright infringement. They didn't understand there are other ways to be removed from Google's index and that Google's traffic is valuable.

"We are asking for Google to pay and seek our authorization to use our content. Google sells advertising and makes money on our content," said Copiepresse general secretary, Margaret Boribon. Her statement is, of course, false because Google News doesn't have ads, Google Search shows ads next to page snippets and Google drives traffic to their unworthy sites.

Google plans to appeal the court order, but Belgian law does not seem to be on their side. Until then, they have removed the sites from Google Belgium, as you can see from this search (Le Soir is one of the most popular newspapers in Belgium). Users can still find the pages at Google.com.

Developing Google Calendar

Carl Sjogreen, from Google, talked about the development of Google Calendar. Rakesh Agrawal took some notes, from which you can learn a lot of interesting things:

Why did Google choose to build a calendar tool? Because no solution was very good and there was little innovation in this space.

How should Google Calendar be? Fast, simple, easy to share, visually appealing.

What's the biggest competitor for Google Calendar? Paper, because it's easy to carry with you and doesn't need a computer or a password.

Apple's iTV Might Include Google Video

Newsweek reports that Apple's iTV, a wireless video streaming set-top box to be released next year, might include a link to Google Video.

"In addition to the photos, movies, TV shows and tunes on your hard drive, iTV, with the ridiculously minimal six-button Apple remote, lets you go to the Net to get stuff. Last week Jobs showed only a menu item that pulls in movie trailers, but when you open up your iTunes library, you can also listen to bits of new music recommended by the iTunes store. Is it possible that when iTV ships next year, you may also be able to choose a menu item called Google Video, and then zip through the best of the thousands of user-submitted videos on the search giant's service? Google's consumer product chief, Marissa Mayer, tells me that indeed, the two companies are engaged in talks."

Google also has a partnership with DivX to make Google Video easy to integrate with electronic devices, so Google might provide an option to view the videos at higher quality. But to make the integration really useful, Google needs to focus on personalizing Google Video.

"More quality videos, easier to find videos and a more personal experience. I think this is the key for a better Google Video," I concluded in What's next for Google Video.

{ Via Garett Rogers, who anticipated this. }

How to Test Google Web Accelerator

Google Web Accelerator is a program that speeds up page load times. It prefetches some of the pages and uses Google servers to retrieve data faster.

If you have the program, you must be wondering if it's really working well. Google displays the amount of time Google Web Accelerator saved, but this is an aggregated value.

But there's a way to see how fast Google's accelerator is: go to this page, called Google Web Accelerator IFrame Racing (it works only if you have the software) and enter a URL. The upper iframe shows the optimized loading, while the other iframe shows the standard loading.

If you want to load a page directly, without the accelerator, add .direct.google after the domain name: instead of http://www.cnn.com type http://www.cnn.com.direct.google.
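The rewrite is simple enough to script. Here is a minimal Python sketch that appends the suffix to the host name of a URL and leaves the rest of the address unchanged. The function name is made up, and applying the suffix to URLs that have a path or a port is an assumption, since the post only shows a bare domain.

```python
from urllib.parse import urlsplit, urlunsplit

def bypass_accelerator(url):
    """Append .direct.google to the host name, as the post describes."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host:
        raise ValueError("URL has no host name")
    # Note: any user name or password in the URL is dropped here.
    netloc = host + ".direct.google"
    if parts.port:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))

print(bypass_accelerator("http://www.cnn.com"))
print(bypass_accelerator("http://www.cnn.com/world/index.html"))
```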

Flickr Uses Google Accounts API to Access Blogger


Here's something you'll see more often from now on: Google Accounts API, a system that allows third party sites to talk to Google services.

In this case, Flickr wants permission to post photos to Blogger Beta. Until now, when you wanted to send photos to Blogger, you needed to enter your username and password. The new Blogger uses the same authentication as the rest of Google's services, so giving out your credentials means giving access to Gmail, Search History and Checkout, which store sensitive information. Of course you trust Flickr and you know they don't store your information or use it for dubious purposes, but the new authentication system sends you to Google when you need to enter your password and grants access to only one service (in this case, Flickr has access only to Blogger, not to other services). If you want to, you can disable access for a web site at any time.

Hopefully, other sites like Meebo or Netvibes will use the same system, so you don't have to enter your Google Account credentials if you don't see google.com in your address bar.
