Wednesday, February 02, 2005

Google Defies Dotcom Downturn

By Mitch Wagner, InternetWeek

It doesn't look like Google got the e-mail that the dotcom boom is over.

While other e-businesses are cutting back, Google is increasing its infrastructure as fast as it can, doubling the size of its server farm in the last 10 months, to 8,000 systems.

Google needs all that iron because demand for the site is booming. Google, which ranks in the top 25 Web pages worldwide, had 10.9 million unique visitors in March, compared with 3.2 million unique visitors in April 2000, according to Jupiter Media Metrix.

Despite the growing traffic, the site remains fast and accessible. Visitors to Google can access one of its pages in an average of 0.64 second, according to measurement firm Keynote Systems.

Google is one of the biggest enterprises using the increasingly popular server farm approach to scalability. As the prices and size of Intel-architecture servers shrink, enterprises scale by using large numbers of cheap, low-powered servers.

Google, like many other companies using this approach, runs Linux. "You don't need one, enormous 64-way system, as long as you have truckloads of small systems," said Rich Partridge, an analyst with D.H. Brown Associates. "Google is taking a trend that others are doing and taking it out to an extreme."

As part of the infrastructure expansion, Google is consolidating. The company is moving out of datacenters in the San Francisco Bay and Washington, D.C., areas and into a single new facility in the D.C. area. That means Google is moving from five to four datacenters--this, after adding three datacenters in the past year or so.

Google indexes 1.3 billion Web pages on over a petabyte of storage--that's more than a million gigabytes.

"That's not to say that the index takes up a petabyte. We have several hundred copies of the index," said Marc Felton, operations manager for Google. "Most of the servers are serving up some fraction of the index." The index is partitioned into individual segments, and queries are routed to the appropriate server based on which segment is likely to hold the answer.
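The partition-and-copy arrangement Felton describes can be illustrated with a toy model. The sketch below is not Google's method, which the article does not detail. It assumes a stable hash of the search term picks the segment and that requests rotate round-robin among a segment's copies. The class name, the server names and the segment counts are all hypothetical.

```python
import itertools
import zlib


class SegmentedIndex:
    """Toy model: an index split into segments, each held by several replica servers."""

    def __init__(self, num_segments, replicas_per_segment):
        self.num_segments = num_segments
        # Hypothetical server names such as "seg2-replica1".
        self.servers = {
            seg: [f"seg{seg}-replica{r}" for r in range(replicas_per_segment)]
            for seg in range(num_segments)
        }
        # One round-robin cursor per segment spreads load across its replicas.
        self._cursors = {
            seg: itertools.cycle(names) for seg, names in self.servers.items()
        }

    def segment_for(self, term):
        # Assumption: a stable hash of the lowercased term picks the segment.
        return zlib.crc32(term.lower().encode("utf-8")) % self.num_segments

    def route(self, term):
        """Return the server that should answer a lookup for this term."""
        return next(self._cursors[self.segment_for(term)])


if __name__ == "__main__":
    index = SegmentedIndex(num_segments=4, replicas_per_segment=3)
    for _ in range(3):
        # Same segment every time, a different replica on each request.
        print(index.route("linux"))
```

In this model, adding copies of a segment raises the number of queries it can absorb, while adding segments shrinks the fraction of the index each server has to hold.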

And while many big Web sites use RAID arrays or Storage Area Networks (SANs) to store data, Google simply uses massive amounts of conventional disk storage because it is faster, Felton said.

Many of Google's storage devices are outfitted with 80GB hard drives from Maxtor. They have a single controller per hard drive and two hard drives per PC. In some cases, the company uses PCs that are twice as big, with four controllers, four hard drives, two processors and twice the RAM of the smaller units.

"If we wrap more CPU and RAM around four hard drives, we get more efficiency than if we use two hard drives to a PC," he said. In most PCs, two hard drives share an integrated drive electronics (IDE) controller. Google, however, gives each hard drive its own, which increases throughput.
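The figures in the story roughly hang together, as a back-of-the-envelope check shows. The sketch below assumes every one of the 8,000 systems is a two-drive PC with 80GB drives and that a petabyte is a million gigabytes. The article says only that "many" machines use those drives and that some hold four, so the result is a rough estimate rather than a reported number.

```python
SYSTEMS = 8_000        # server count reported in the article
DRIVE_GB = 80          # drive size reported for many machines
DRIVES_PER_PC = 2      # assumption: every system is a two-drive PC
GB_PER_PB = 1_000_000  # decimal petabyte, matching "a million gigabytes"


def raw_capacity_pb(systems, drives_per_pc, drive_gb):
    """Total raw disk capacity in petabytes under the stated assumptions."""
    return systems * drives_per_pc * drive_gb / GB_PER_PB


print(raw_capacity_pb(SYSTEMS, DRIVES_PER_PC, DRIVE_GB))  # 1.28
```

Under those assumptions the farm holds about 1.28 petabytes of raw disk, in line with the "over a petabyte" figure.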

Google gets its Intel-architecture servers from Rackable Systems. Space in datacenters is expensive, so Google crams in as many PCs as possible. Packing them in with too much density leads to overheating, however.

"You stack those `pizza boxes' tightly, and side by side, and very soon you've got a pizza oven," D.H. Brown's Partridge said.

"The single biggest killer of PCs is heat," Felton said. "It's not just where the heat goes within the box, which is where everyone else is designing their computers. Everyone else thinks of computers as a standalone unit, but Rackable does some work on top of that. They direct the heat to a central chimney which is blown up to a high-powered fan."

The operating system that the servers run is Red Hat Linux.

Google serves as proof by example that Linux can be used to run a big business. "For a company considering making an investment in the operating system as a foundation for their applications, Google is a validation," said Stacey Quandt, an analyst with Giga Information Group.

She added, "The combination of a commodity operating system and a commodity hardware gives Google a competitive advantage, since it can invest its resources in development and support of a custom application."

But any IT manager thinking, "If Linux can run Google's applications, it can run ours," is in for a disappointment, she said. Google's applications are unique, requiring far more extensive load-balancing, computing, and input-output bandwidth than other enterprise applications.

Google downloads Red Hat for free, taking advantage of the company's open source distribution. And Linux's open source nature allowed Google to make extensive modifications to the OS to meet its own needs for remote management, security and performance.

Felton, who describes himself as "not a Linux evangelist by any means," but rather someone who uses whatever works best, said he likes Red Hat's performance, stability, packaging and installation routines. Also, because Red Hat is the most popular Linux version, IT staff skilled in its use are relatively easy to find.

Also, by choosing Linux, Google avoids locking itself into a single vendor for hardware or operating system, Quandt said.