IT Lessons from Amazon and Google
R. Marc Phillips
How much does it cost your company every time your network or your website goes down? Probably not as much as the millions per minute Amazon has on the table if it’s down for any real chunk of time. Global enterprises like Google and Amazon are the overclocked gaming PCs of the IT world—very few people need a machine on that scale, but we can all benefit from the lessons they learn in pushing technology so far. Here are a few things these web giants have taught us over the years:
Test Environments Are Your Friend
This goes back to the cost question above. Whatever change you’re about to make, you’re better off doing a trial run first. Massive websites take their test and staging environments to extremes, but even a small test setup can pay massive dividends each time it catches a bug or a mistake you would have pushed live. Invest in testing and watch it save your butt time and again.
Don’t Throw Away Data
Google has used its position as probably the biggest non-manufacturing buyer of hard drives and memory in the world to publish some astounding real-world reliability studies. (More on that in a minute.) But Google couldn’t do that if it weren’t constantly collecting data about failure rates, replacement times, manufacturers, and everything else. Make data and metrics a part of your organizational culture, because the next time you need to make a key decision, it will be great to have the data to back it up.
ECC and Memory Replacements Actually ARE Important
Here’s the most recent finding of Google’s long-running study of RAM: memory errors happen way more often than you think. After 2.5 years analyzing more RAM than most countries use, Google found that the rate of hardware errors was higher than it had expected, with machines likely to experience at least one per year under continuous operation. Plus, after about 20 months in service, the error rate jumps up drastically.
If you’re running a data center, get that memory on a regular replacement cycle. Or at least check out the Google study to learn more about the risks you’re running.
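A replacement cycle can start as a simple age check against your inventory. Here is a minimal Python sketch, assuming you keep a record of when each machine's memory was installed. The host names, the dates, and the `REPLACE_AFTER_MONTHS` threshold are illustrative assumptions (the threshold simply borrows the 20-month figure above), not anything taken from the Google study.

```python
from datetime import date

# Assumed threshold, borrowed from the 20-month figure in the article.
# Tune it for your own hardware and budget.
REPLACE_AFTER_MONTHS = 20


def months_in_service(installed: date, today: date) -> int:
    """Whole months elapsed between the install date and today."""
    months = (today.year - installed.year) * 12 + (today.month - installed.month)
    if today.day < installed.day:
        months -= 1  # the current month is not complete yet
    return months


def due_for_replacement(inventory: dict, today: date) -> list:
    """Return the hosts whose memory has reached the replacement threshold."""
    return [
        host
        for host, installed in inventory.items()
        if months_in_service(installed, today) >= REPLACE_AFTER_MONTHS
    ]


# Hypothetical inventory: host name -> date the memory went into service.
inventory = {
    "web-01": date(2008, 1, 15),
    "web-02": date(2009, 6, 1),
    "db-01": date(2007, 11, 30),
}

if __name__ == "__main__":
    print(due_for_replacement(inventory, date.today()))
```

In practice the inventory would come from whatever asset database you already keep. The point is only that the check is cheap once the install dates are recorded.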
Hard Drives Fail, Too. Starting at Two Years
Here’s some more info from Google’s vast data set: Hard drive failure rates jump from around two percent in the first year of operation up to around eight percent after two years. Google’s hard drive usage more closely resembles a torture test than the way most consumers normally use a hard drive, but its other conclusions were striking. According to the study, failures weren’t correctly predicted by SMART and drive temperature didn’t correlate with failure rates.
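If you log your own drive failures, you can compute the same kind of per-year failure rate for your environment. The Python sketch below assumes a hypothetical record format: one entry per drive per year of service, with a flag for whether the drive failed during that year. The sample numbers are invented to mirror the two and eight percent figures above and are not real measurements.

```python
from collections import defaultdict


def failure_rate_by_year(records):
    """Map each year of service to the fraction of drives that failed in it.

    `records` is an iterable of (year_of_service, failed) pairs, where
    `failed` is True if the drive failed during that year of service.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for year, failed in records:
        totals[year] += 1
        if failed:
            failures[year] += 1
    return {year: failures[year] / totals[year] for year in sorted(totals)}


# Invented sample data: 50 drives observed in year 1, 25 drives in year 3.
sample = (
    [(1, False)] * 49 + [(1, True)] * 1
    + [(3, False)] * 23 + [(3, True)] * 2
)

if __name__ == "__main__":
    for year, rate in failure_rate_by_year(sample).items():
        print(f"year {year}: {rate:.1%} failed")
```

This is a rough count, not the statistical method used in Google's study, but even a tally like this tells you when your own drives start to wear out.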

