Here are a few notes and updated recommendations:
Single-parity RAIDZ makes me nervous; dual-parity RAIDZ2 is better for integrity, and a group of mirrors is better for speed. Of the 5x1.5TB Caviar Green drives I started with, I've replaced 2 due to small failures that ZFS detected. (I can't easily upgrade to RAIDZ2.)
Weekly scrubs find errors, but you have to do a little work to optimize for scrub time. At one point, my scrubs took 80 hours, and now they take about 16 hours for a larger amount of data. What helped? A few things:
- Disable access time (atime). Otherwise, every read updates file metadata, so snapshots of datasets with lots of files that you verify daily will each end up with their own copy of that metadata, and scrubs will take dramatically longer. Disabling atime also gives you a general performance boost.
- Install a newer ZFS that has metadata prefetch during scrub (I think this was added in b129).
- One of my datasets had 4 million files, 20 snapshots, and compression turned on, and destroying this dataset reduced my scrub time a lot. (It was a Mac rsync backup, which now uses Time Machine instead.) I think compression can slow down scrub, but it might be unrelated.
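As a rough illustration of the atime tip above, here is a minimal Python sketch that builds the two commands involved: `zfs set atime=off` on a dataset and `zpool scrub` on a pool. The pool name `tank` and the helper names are hypothetical, and the script only prints the commands unless you turn off the dry-run flag.

```python
import subprocess


def atime_off_command(dataset):
    """Build the argv for `zfs set atime=off <dataset>`."""
    return ["zfs", "set", "atime=off", dataset]


def scrub_command(pool):
    """Build the argv for `zpool scrub <pool>`."""
    return ["zpool", "scrub", pool]


def run(argv, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # "tank" is a placeholder pool name, not one taken from the notes above.
    run(atime_off_command("tank"))
    run(scrub_command("tank"))
```

Child datasets inherit the atime property unless they override it, so setting it on the top-level dataset is usually enough.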
Dedup eats write performance, and you must use an SSD
When I first enabled dedup and replayed all my datasets (zfs send/recv), I was able to write to the dedup'd RAIDZ volume at only 3MB/sec! Previously I could write at 60MB/sec.
My best theory is that the DDT (the "dedup table") had grown larger than RAM, so even a small write required a very large number of reads from disk. Nothing much helped until I put an SSD into the pool. The DDT is now cached, and I can usually write at 25MB/sec.
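To see why the DDT can outgrow RAM, here is a back-of-the-envelope sketch in Python. It assumes the commonly quoted rule of thumb of roughly 320 bytes of memory per DDT entry, one entry per unique block, and an average block size of 64 KiB. All three numbers are assumptions, as is the 3 TiB of unique data in the example; none of them were measured on my pool.

```python
TIB = 1024 ** 4
GIB = 1024 ** 3


def ddt_size_bytes(unique_data_bytes,
                   avg_block_bytes=64 * 1024,
                   bytes_per_entry=320):
    """Estimate in-memory DDT size: one entry per unique block.

    bytes_per_entry=320 is a rule of thumb, not an exact figure.
    """
    # Ceiling division: a partial block still needs its own entry.
    blocks = -(-unique_data_bytes // avg_block_bytes)
    return blocks * bytes_per_entry


if __name__ == "__main__":
    estimate = ddt_size_bytes(3 * TIB)  # hypothetical 3 TiB of unique data
    print(f"Estimated DDT size: {estimate / GIB:.1f} GiB")  # 15.0 GiB
```

With smaller average blocks the table grows proportionally, which is how a home server with a few gigabytes of RAM ends up reading the DDT from disk on every write.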
Also, "upgrading" to dedup is somewhat difficult and time-consuming. There is no automatic way: existing data is only deduplicated when it is rewritten, which is why I had to replay my datasets.
However, my "dedupratio" is now 1.7 for the volume, so even though the write speed isn't as good as before, the results are amazing, and I can tolerate it for the storage efficiency. Read speed is as good as before, too.
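For a sense of what a dedupratio of 1.7 buys, here is a tiny Python sketch. It assumes the ratio is logical data divided by the physical space actually allocated, so the fraction saved is 1 - 1/ratio. The function name is made up for illustration.

```python
def space_saved_fraction(dedup_ratio):
    """Fraction of physical space saved, assuming ratio = logical / physical."""
    if dedup_ratio < 1:
        raise ValueError("dedup ratio should be at least 1.0")
    return 1 - 1 / dedup_ratio


if __name__ == "__main__":
    print(f"{space_saved_fraction(1.7):.0%}")  # 41%
```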
SSD+ZFS is magic. ZFS is the first system that makes a tiny $90 SSD super-useful. With normal filesystems, you have to manually move "hot" data (like your OS) to the SSD, and then you run out of space or spend $1000 to get more. ZFS does this automatically, using the SSD as a cache. I got the OCZ Vertex 30GB drive, and while I know there are faster Intel drives, this has made an enormous difference.
As I said above, the SSD has improved my dedup write speed by about 8x (from 3MB/sec to 25MB/sec). It also serves as a cache of hot data, so if you read a lot of filesystem metadata (as you would when compiling over NFS), the pool can perform 50x faster than it does without the SSD. (This 50x number is based on a benchmark of opening all the files in a folder, reading the first 30k of each, then closing it.)
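The benchmark described above can be approximated with a short Python script. This is a reconstruction under assumptions, not the original benchmark: it takes "30k" to mean 30 KiB, looks only at regular files directly inside one folder, and the function name is made up.

```python
import os
import sys
import time


def read_heads(folder, head_bytes=30 * 1024):
    """Open each regular file in folder, read its first head_bytes, close it.

    Returns (files_read, bytes_read, elapsed_seconds).
    """
    files_read = 0
    bytes_read = 0
    start = time.perf_counter()
    for entry in os.scandir(folder):
        if entry.is_file():
            with open(entry.path, "rb") as f:
                bytes_read += len(f.read(head_bytes))
            files_read += 1
    elapsed = time.perf_counter() - start
    return files_read, bytes_read, elapsed


if __name__ == "__main__":
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    n, total, secs = read_heads(folder)
    print(f"{n} files, {total} bytes, {secs:.3f} s")
```

To compare with and without the cache, run it against the same folder on a freshly booted system in both configurations; a second run right after the first mostly measures RAM caching rather than the SSD.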
Also, the SSD acts as a "log" device and can handle small writes much better than disks can. So when an NFS client wants to do a small write, the ZFS NAS can respond dramatically faster than a disk-based server can, but still can guarantee data integrity.
There is some debate about which devices are suitable for use as a ZFS log. Some devices may slow down a fast mirror by writing more slowly than the disks would. Also, the device must not report a write as complete while the data is still sitting only in its RAM buffer (and my OCZ Vertex might do this wrong). But in the meantime, NFS and CIFS are just quite a lot faster, so I will pretend that I'm not really in much danger of data loss. Currently, about 2GB of my SSD is devoted to log, and the rest to cache. Here's a solaris-discuss thread that says the OCZ is slower than it seems to me. My NFS compiles are incredibly fast right now.
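For reference, splitting one SSD between log and cache comes down to two `zpool add` commands, one per slice. The Python sketch below only builds and prints them. The pool name `tank` and the slice names `c2t0d0s0` and `c2t0d0s1` are placeholders, and it assumes the SSD has already been partitioned into a small slice (about 2GB here) and a large one.

```python
def add_log_command(pool, device):
    """Build the argv for `zpool add <pool> log <device>`."""
    return ["zpool", "add", pool, "log", device]


def add_cache_command(pool, device):
    """Build the argv for `zpool add <pool> cache <device>`."""
    return ["zpool", "add", pool, "cache", device]


if __name__ == "__main__":
    # Placeholder names: small slice for the log, large slice for the cache.
    for argv in (add_log_command("tank", "c2t0d0s0"),
                 add_cache_command("tank", "c2t0d0s1")):
        print(" ".join(argv))
```

A cache device is safe to experiment with because losing it loses no data. The log device is the part the debate above is about.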
You can spend more on an Intel SSD, and supposedly it's even faster.
Dedup is an amazing technology, but you have to give it the hardware it needs. If I could figure out how to make a quiet case that held 10 drives, I'd probably skip dedup. But for a 4-disk RAIDZ, it is a good match for me.
My advice is to add an SSD to your ZFS box, no matter what. For certain, you can use a $90 SSD to be a nice fast cache. If you want to use the device as a ZIL (ZFS intent log) and you're paranoid about data integrity, read the thread above and spend $500 on an SSD. Otherwise, I think my $90 one does pretty well too.