I previously wrote about my ZFS NAS, and how I managed to get dedupe to write moderately fast with a cheap SSD drive.
I've decided to go back on the decision to use dedupe at all, and I have some pretty great fast and noisy hardware to talk about instead.
Today, the publically-available "build 134" of OpenSolaris has quite a few dedupe bugs that make simple operations like destroying a dataset or even rm'ing a bunch of files very slow, sometimes leaving a server unresponsive for a day at a time. These bugs are getting fixed (there's a lot of talk about "single threading" and things like that), but it has been 6 months with not a lot of forward progress. The Nexenta guys are patching their OS from the development versions, which apparently are buggy, but I didn't want to deal with any of that.
At the same time, I had a security camera logging to my NAS, and almost filled it up. I tried to delete an old 500GB snapshot, and it locked the machine trying to remove all the deduped blocks. Basically there was no way to free up space. Also, my dedupratio at the time was about 1.8...when you're using 5TB on a 4 x 1.5TB RAIDZ there's no easy way to go back.
So I decided to get more disks and not ever turn on dedupe again.
I considered building a totally new server in a new enormous case, so I could put more disks in it. But this is tricky: you can get the cooling wrong, and a lot of the off-the-shelf bigger cases sound like wind tunnels. Instead, I decided to do it like datacenter guys do it: get a "direct attached" drive enclosure.
I found this enclosure from Sans Digital: the TowerRAID TR8X.
The TR8X is not cheap in home NAS terms (it's $400, plus a $300 LSI SAS controller), but compared to a box from Dell (which costs thousands) it is very cheap. And it has the advantages of a "real" external drive enclosure: it has good cooling, it has hot-pluggable drives, and it's actually SAS, so every OS on earth will recognize the drives instantly, and you can use enterprise SAS drives if you wanted to spend the money.
It turns out the cheaper "port multiplier" SATA controllers have mostly buggy drivers, even on Linux sometimes. But this SAS one works everywhere.
So I plugged it in.
I don't usually believe theoretical performance numbers will ever really happen, but a ZFS stripe on 8 drives actually did give me 700MB/sec performance.
With my old 4-disk setup, I was also nervous about running "single parity" RAIDZ. With this upgrade I was able to go to RAIDZ2 (so 3 disks have to fail before there's data loss), and the TR8X is still pulling >320MB/sec read and write with all that turned on. And a scrub of 5TB takes less than 4 hours. It's way too fast for anything I do.
This enclosure isn't for everyone: it is quite a bit noisier than my old quiet NAS, a lot of that is due to the really noisy Hitachi A7k2000 disks I got.
But it is a really solid piece of hardware, performance is amazing, and I didn't have to do a whole lot of reconfiguration to upgrade.
I wish everyone luck with dedupe, but I'm going brute-force for now, instead.