Sync is (almost) the new backup

I've been wrestling with how to keep my photos and videos and code and email backed up for several years now. Haven't lost much yet, but I spend a lot of time working on it. Currently I'm using a combination of rsync to a machine I put in a colo, plus a sync tool I made for myself to schlep files from Windows to UNIX filesystems, plus ZFS (with its awesome snapshots) running on Solaris, plus Mozy, plus a little Time Machine on the Macs, and occasionally Crashplan.

My argument in this post is that the only system that works for a majority of people is p2p sync of actual files, because this gives you multiple "live" replicas of each file, and you can see at any time whether they still work. That means:
  1. Not backup to DVD/USB drive
  2. Not cloud backup over 128kbps DSL
  3. Not cloud sync.
Here's why.

Cloud Dreams!

First, we've got the "cloud" guys purporting to store all our stuff online! What a great solution, and it is the right solution, because the data doesn't get lost. But look at the actual cost:

Amazon S3: $1.80/GB/yr
MozyPro: $6.00/GB/yr
1TB Box in colo: $1.50/GB/yr
MozyHome: $60/yr (unlimited magic)
1TB local hard drive: $90 ($0.09/GB, maybe $0.03/GB/yr)

Now the photos-only guys (no raw, limited video, etc.):
Picasaweb: $1.25-2.00/GB/yr
Flickr: $24.95/yr
Smugmug: $39.95/yr

Loss-averagers

Let's first blow up the "unlimited" storage idea. We're in an era of stupidly slow upstream connections, and these guys get away with "surely nobody is going to upload stuff to us for 6 months".

I installed special traffic shaping on my firewall and bought a really fast line so I could upload for 6 months without really noticing it, and then I did it. But really, is this pricing model real? Only because people don't use it. So they have a theoretical backup, not a real one.
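To put a number on "upload for 6 months", here is a rough sketch of the arithmetic. It uses the 400GB I mention storing at Mozy and the 128kbps DSL line from the top of the post; the other line speeds, decimal gigabytes, and zero protocol overhead are simplifying assumptions, not measurements.

```python
def upload_days(size_gb: float, upstream_kbps: float) -> float:
    """Days needed to push size_gb through a link sustained at upstream_kbps.

    Assumes decimal units (1 GB = 1e9 bytes, 1 kbps = 1e3 bits/s) and a link
    that is saturated 24 hours a day with no protocol overhead.
    """
    bits = size_gb * 1e9 * 8
    seconds = bits / (upstream_kbps * 1e3)
    return seconds / 86_400


if __name__ == "__main__":
    # 128 kbps is the DSL figure from the intro; 256 and 1000 are illustrative.
    for kbps in (128, 256, 1000):
        print(f"{kbps:>5} kbps: {upload_days(400, kbps):6.0f} days")
```

At 128kbps the 400GB takes the better part of a year of nonstop uploading, which is why almost nobody actually fills an "unlimited" plan.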

For instance, Mozy clearly isn't faring so well when I store 400GB there. Looks like they are losing $2340 on me annually, when you compare to their "pro" pricing. Ouch.

Same with Smugmug, Flickr, etc. Having Smugmug store my stuff in S3 at $1.44/GB/yr (Amazon's bulk rate) is losing them about $536/yr. I imagine Flickr pays more than that, but I don't have any numbers.
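The loss figures above are simple arithmetic, sketched here in Python. The prices are the ones quoted in this post; applying the same 400GB to Smugmug is an assumption, though it is the one that makes the $536 figure work out. The function name is just for illustration.

```python
MY_DATA_GB = 400  # what I store at Mozy, per the post


def annual_loss(cost_per_gb_year: float, flat_fee_per_year: float,
                size_gb: float = MY_DATA_GB) -> float:
    """What the provider is out each year: per-GB cost minus my flat fee."""
    return size_gb * cost_per_gb_year - flat_fee_per_year


# Mozy: valued at their own "pro" price of $6.00/GB/yr, against $60/yr.
mozy_loss = annual_loss(6.00, 60.00)

# Smugmug: S3 bulk rate of $1.44/GB/yr, against $39.95/yr.
# Assumes the same 400GB lives there too.
smugmug_loss = annual_loss(1.44, 39.95)

if __name__ == "__main__":
    print(f"Mozy:    ${mozy_loss:,.2f}/yr")
    print(f"Smugmug: ${smugmug_loss:,.2f}/yr")
```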

So, PCs are 30x cheaper than the cloud

Storing an extra copy of this data on a new hard drive in my computer costs $12/year, plus maybe $8 for power if I leave it on all the time, or about $20/year.

I could even buy a netbook with a 160GB hard drive, and use that as a backup brick for $1/GB/year, cheaper than Amazon s3.
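Here is roughly where the "30x" in the heading comes from, as a small Python sketch. It assumes the same 400GB as above and uses the per-GB prices from the table earlier in the post; the variable names are made up for illustration.

```python
SIZE_GB = 400  # assumed: the same data set discussed above

# Local copy: $0.03/GB/yr for the drive, plus about $8/yr for power.
local_per_year = SIZE_GB * 0.03 + 8.00

# Cloud copies at the per-GB rates quoted in the price table.
colo_per_year = SIZE_GB * 1.50  # 1TB box in colo
s3_per_year = SIZE_GB * 1.80    # Amazon S3

colo_ratio = colo_per_year / local_per_year
s3_ratio = s3_per_year / local_per_year

if __name__ == "__main__":
    print(f"local: ${local_per_year:.0f}/yr")
    print(f"colo:  ${colo_per_year:.0f}/yr ({colo_ratio:.0f}x local)")
    print(f"S3:    ${s3_per_year:.0f}/yr ({s3_ratio:.0f}x local)")
```

The colo box comes out at 30x the local drive and S3 at 36x, so "30x" is the conservative end.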

Sync today...tied to the cloud

So today, we have serious R&D going into things like "Microsoft Live Mesh" and Dropbox, and Zumodrive, and SugarSync. They handle the hard problem of folders that stay in sync no matter what you do to them. Lots of people are working on this.

But all of these services (except Mesh) today are tied to costs in the $1+/GB/year world of cloud storage. And so I can't use them for storing my "important" data. Even though maybe they are cool, I'm not paying an extra $1000/year for the privilege, and most don't scale to the level I'd need anyway.

Backup today...sort of

On the backup front, Mac users can get a Time Capsule and store their data on a second hard drive. And there are dozens of backup programs that will copy data you might or might not be able to restore in the future. Microsoft has some sorts of backup built into Windows 7, with the Time Machine equivalent being relegated to "Ultimate".

Crashplan is probably the best p2p backup for regular users. It mostly works, even if its CPU and memory usage are a bit high.

But the problem with backup is twofold:
1. Restore has to work. You're not watching daily for failures in your external USB drive, unless you're an enterprise IT guy whose job it is to keep the near-line backup device functioning.
2. You are using hardware for no incremental benefit. Backup is complex and there's no benefit for you to buy more of it to store more backups. Why buy it or install it? Why verify it still works? When you run out of space, what happens?

People need a benefit for backing up today.

What's missing: p2p Sync

Microsoft bought a little company called FolderShare a few years ago, and hasn't improved it much. 20,000 files, max, not so many folders. But what FolderShare did is approximately, sort of, the right thing: a PC-to-PC replication feature where files are actually usable while they're replicated. Why is this so important? Because with ordinary backup, you don't find out that all your backups are corrupted, or that your backup USB drive isn't working, until you need them.

But if your files stopped working on your laptop but were still good on your desktop, that would be noticeable. You'd notice it, you could fix it, and you'd hopefully get a new laptop before it was too late.
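The "noticeable" part can even be automated. Below is a minimal sketch, assuming both replicas are reachable as ordinary directories (a local folder and a mounted share, say), that hashes every file on each side and reports where they disagree. The function names are hypothetical, and this is not how FolderShare or any other product works; it is just the idea of checking live replicas against each other.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so big videos don't eat RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_replicas(a: Path, b: Path) -> dict[str, list[str]]:
    """Report files missing from one replica or differing between them."""
    sa, sb = snapshot(a), snapshot(b)
    return {
        "only_in_a": sorted(sa.keys() - sb.keys()),
        "only_in_b": sorted(sb.keys() - sa.keys()),
        "differ": sorted(k for k in sa.keys() & sb.keys() if sa[k] != sb[k]),
    }
```

A mismatch only says the two copies disagree, not which one is right. That is where a third replica earns its keep: two matching copies outvote the odd one out.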

The only system today that appears to do p2p replication is Mesh, and it's a complex weirdo with a Silverlight UI in a browser (huh?!). My install on XP took an hour and asked to set up remote desktop more prominently than setting up stuff to sync, and I couldn't figure out exactly how to sync data to my wife's computer. Mesh appears to be a toolset to solve problems, but it doesn't really help you figure it out much.

Using Mesh is a letdown. When I clicked on a folder in the web UI, it took 20 seconds to show the 15 files inside it, and I'm apologizing in advance for not having really run it through its paces. I uninstalled it. I get the idea, but the implementation is just awful.

Mesh also lets you replicate folders from all over your hard drive, which is the other useful feature you really want. Dropbox doesn't do this.

I'm wondering if somehow Mesh will win despite its unreasonable bulk. I suppose the Microsoft 3.0 rule still applies. They are certainly thinking right, even if the execution is off.

Meanwhile, there are rumors of Dropbox doing useable p2p sync. I think this would be a great thing, and maybe people would understand it. Dropbox mostly just does one thing and does it really well.

But I need to be able to bring home a new computer, add it to my network, wait for all my files to show up, and turn off the old one. Three copies of everything I care about, files shared between my wife and me. Easy, right?

But somebody...really just needs to do it.

PC hardware is crap (99% uptime sucks)

Until now, I've mostly been on the 2 year "faster, better" upgrade path. New CPUs were 4x faster than what I had on my desk, so I upgraded. This happened often...and until the last couple of years, computers lasted longer than I kept them.

I realized the servers in my closet were going on about 6 years old and needed upgrades, and the laptops are all 3 years old now. Over the last year we've replaced power supplies and laptop keyboards and many hard drives and even LCDs. Just tonight my monitor stopped turning on, right after my Macbook hard drive started clicking and stopped booting.

Planned obsolescence.

And all I can think is that the quality of PC hardware seems to be crap right now. Quality of hardware still matters. My computer is less reliable than my car, and a few years ago, it seemed to be much more reliable. When my car fails, it doesn't drive off a cliff, it just smells bad or stops moving. When my computer fails, it loses all my data and leaves me unable to do work for a week.

(MTBF of hard drives appears to be going up exponentially in the marketing literature. It's not true.)

If anything will propel cloud computing to the mainstream, it will simply be the demise of the PC as a reliable store for data. The complexity of client OSes makes it nearly impossible to replace hardware and still be sure that your data and settings survive, or that the "upgrade experience" is anything like reasonable. The size of the data is huge, and moving it to a new machine is too difficult.

If I had to give my own IT operations an "N-nines" uptime percentage, I would only qualify for 99% uptime over the past year. It is common for each machine to have a full day outage once a year. We have 8 computers in the house, so generally the internet still works, but individual PCs aren't so great anymore.

99% uptime is awful, and it should be possible to fix the hardware AND the software to do better.

I keep great backups, so I don't usually lose data, but I lose a TON of time. Fixing things, restoring things, reinstalling things.

We're seeing cloud services with 99.9% and 99.99% uptime, and the PC is looking very dated in this model. Because the PC wastes your time, and the PC loses your data.
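For a sense of scale, here is what those percentages mean in hours of downtime per year, as a small Python sketch. It assumes a plain 365-day year; the function names are only for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def downtime_hours(uptime_percent: float) -> float:
    """Hours of downtime per year allowed at a given uptime percentage."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


def uptime_percent(days_down_per_year: float) -> float:
    """Uptime percentage for a machine that is down this many days a year."""
    return 100 * (1 - days_down_per_year / 365)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99):
        print(f"{pct}% uptime allows {downtime_hours(pct):.2f} hours down per year")
    print(f"one full day down per year is {uptime_percent(1):.2f}% uptime")
```

Two nines leaves room for more than three and a half days of outage a year. A single full-day outage works out to about 99.7%, which is still only two nines, while a 99.99% service gets under an hour.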

On the hardware end: why isn't every new PC shipping with RAID-1? OEMs seem to charge 5x retail prices for hard drives, so why aren't there free replacement parts for 3-5 years, like the drive manufacturers themselves offer? Why can I only buy redundant power on server configurations? Why can't the cabling bus be separated from the power supply, so I can buy a standard part and replace it? Fans? RAM tests? Is it still 1991?

On the software side: why do so few people back up their data? Why is it so hard to restore a full OS, or to upgrade to a new machine? Where is my distributed cloud filesystem? I used one in college in 1993.

I could build most of this redundancy into a PC for $500 extra, and I would pay that for certain to have a reliable computer.

And even the expensive computers from companies like Apple don't have it, not much of it anyway.

The PC needs to evolve or be obsoleted by much better ways to store data.

Today, I can get service contracts that promise to replace things that break, maybe within 24 hours or 4 hours or a week, but nobody in the hardware industry seems to consider it their business to preserve my use of the computer or to make sure my data stays around.

Hard drive manufacturers are working on their "data recovery" businesses (very profitable), instead of trying to improve the environment where so much data needs to be recovered.

Seagate, since you've lost so much data for me this year, why not ship a 2x2.5" hard drive in a 3.5" pack that does its own backups and beeps loudly when it fails and plugs into its replacement for migration? It's possible to build entirely new systems that work better.

I absolutely don't mind a fragmentation into "netbook" and "reliable home PC" markets, or whatever it takes to have a place where a person's photos and documents can live without getting lost entirely every 5 years. But the software and hardware that's getting built has to adapt to current needs, and 5 year "catastrophic" failures are just no good for most people's photos and videos and important documents.

All my techy friends buy NAS boxes and set up Linux servers with RAID-6 and bizarre filesystems, and we all spend too much time at it. And lose data too. Regular people just lose data, and that's the end of it.

This really needs to get fixed, in hardware or software.

The long-term arrow is pointing towards software, and the only place things are moving is software.

But today's software isn't ready for it, not entirely.

And today, I think hardware manufacturers are missing out on a boatload of revenue by not offering better hardware, and I think they should meet the demand for it with innovative products that focus on making a promise to consumers that's not solely about price.