Mar 7, 2010

ZFS NAS followup: SSD is amazing

I've been running my ZFS NAS for about a year. By now, I've upgraded many times, currently at snv132 from 101b, and I've enabled dedup for the storage pool.

Here are a few notes and updated recommendations:

Single-parity RAIDZ makes me nervous; dual-parity RAIDZ2 is better for integrity, and a group of mirrors is better for speed. Of the 5x1.5TB Caviar Green drives I started with, I've replaced 2 due to small failures that ZFS detected. (I can't easily upgrade to RAIDZ2.)

Weekly scrubs find errors, but you have to do a little work to optimize for scrub time. At one point, my scrubs took 80 hours, and now they take about 16 hours for a larger amount of data. What helped? A few things:
  • Disable access time (atime). Otherwise, snapshots with lots of files that you verify daily will each have their own metadata, and scrubs will take dramatically longer. Also, disabling atime gives you a general performance boost.
  • Install a newer ZFS that has metadata prefetch during scrub (I think this was added in b129).
  • One of my datasets had 4 million files, 20 snapshots, and compression turned on, and destroying this dataset reduced my scrub time a lot. (It was a Mac rsync backup, which now uses Time Machine instead.) I think compression can slow down scrub, but it might be unrelated.

Dedup eats write performance, and you must use SSD

When I first enabled dedup and replayed all my datasets (zfs send/recv), I was able to write to the dedup'd RAIDZ volume at only 3MB/sec! Previously I could write at 60MB/sec.

The best theory I have is that the DDT ("dedup table") was using more space than I have RAM, so even a small write required a very large number of reads from disk. Nothing much helped until I added an SSD to the pool. The DDT is now cached there, and I can usually write at about 25MB/sec.
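The theory above can be put into rough numbers. This is only a back-of-the-envelope sketch: the figure of about 320 bytes of memory per DDT entry is a commonly quoted rule of thumb, not something measured in this post, and the pool size and average block size are made-up values for illustration.

```python
# Rough estimate of how much memory a ZFS dedup table (DDT) might need.
# ASSUMPTIONS (not from the post): ~320 bytes per DDT entry is a commonly
# quoted rule of thumb; the pool size and block size below are hypothetical.

BYTES_PER_DDT_ENTRY = 320  # assumed rule-of-thumb figure


def estimate_ddt_bytes(unique_data_bytes, avg_block_bytes,
                       bytes_per_entry=BYTES_PER_DDT_ENTRY):
    """Approximate DDT size, counting one table entry per unique block."""
    unique_blocks = unique_data_bytes // avg_block_bytes
    return unique_blocks * bytes_per_entry


TIB = 1024 ** 4
GIB = 1024 ** 3
KIB = 1024

# Hypothetical pool: 3 TiB of unique data in 64 KiB blocks.
ddt_bytes = estimate_ddt_bytes(3 * TIB, 64 * KIB)
print(f"Estimated DDT size: {ddt_bytes / GIB:.1f} GiB")  # Estimated DDT size: 15.0 GiB
```

With numbers in that range the table would be several times larger than the RAM in a small home NAS, which would be consistent with writes stalling on DDT lookups until the table has somewhere faster than spinning disk to live.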

Also, "upgrading" to dedup is somewhat difficult and time-consuming. There is no automatic way.

However, my "dedupratio" is 1.7 for the volume now, so even though the write speed isn't as good as before, the results are amazing, and I can tolerate it for the storage efficiency. The speed when reading is as good as before, also.
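For a sense of what a dedupratio of 1.7 means in disk space, here is a small sketch. It assumes the ratio is logical data divided by physically stored data, which is the usual reading of that pool property.

```python
def space_saved_fraction(dedup_ratio):
    """Fraction of the logical data that does not need physical storage.

    Assumes dedup_ratio = logical bytes / physical bytes.
    """
    if dedup_ratio <= 0:
        raise ValueError("dedup ratio must be positive")
    return 1.0 - 1.0 / dedup_ratio


print(f"{space_saved_fraction(1.7):.0%}")  # 41%
```

So at 1.7, roughly four out of every ten gigabytes written never hit the disks.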

SSD+ZFS is magic.
ZFS is the first system that makes a tiny $90 SSD drive super-useful. With normal filesystems, you have to manually move "hot" data (like your OS) to the drive, and then you run out of space or spend $1000 to get more. ZFS does this automatically, using the SSD as a cache. I got the OCZ Vertex 30GB drive, and while I know there are faster Intel drives, this has made an enormous difference.

As I said above, SSD has improved my dedupe write speed by 8x. And it also serves as a cache of hot data, so if you read a lot of filesystem metadata (like you would by compiling over NFS) it can perform 50x faster than leaving it out of the pool. (This 50x number is based on a benchmark of opening all the files in a folder, reading the first 30k, then closing.)
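The benchmark described above (open every file in a folder, read the first 30k, close) could look something like this. It is a sketch of the idea, not the original script, and the function name is made up.

```python
import os
import time


def read_heads(folder, head_bytes=30 * 1024):
    """Open each regular file in `folder`, read its first `head_bytes`, close it.

    Returns (files_read, elapsed_seconds). Subdirectories are skipped.
    """
    start = time.perf_counter()
    files_read = 0
    for entry in os.scandir(folder):
        if not entry.is_file():
            continue
        with open(entry.path, "rb") as fh:
            fh.read(head_bytes)  # only the head of the file is touched
        files_read += 1
    return files_read, time.perf_counter() - start
```

Call it as `read_heads("/path/to/nfs/folder")` with and without the SSD in the pool. Keep in mind that the client's own cache can hide the server entirely on a second run, so compare cold runs.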

Also, the SSD acts as a "log" device and can handle small writes much better than disks can. So when an NFS client wants to do a small write, the ZFS NAS can respond dramatically faster than a disk-based server can, but still can guarantee data integrity.

There is some debate about which devices are suitable for use as a ZFS log. Some devices may slow down a fast mirror, by writing slower than the disks would. Also, the device isn't allowed to do any RAM buffering to improve write speeds (and my OCZ Vertex might do this wrong). But in the meantime, NFS and CIFS are just quite a lot faster, so I will pretend that I'm not really in much danger of data loss. Currently, about 2GB of my SSD is devoted to log, and the rest to cache. Here's a solaris-discuss thread that says the OCZ is slower than it seems to me. My NFS compiles are incredibly fast right now.

You can spend more on an Intel SSD, and supposedly it's even faster.

Dedupe is an amazing technology, but you have to give it the hardware it needs. If I could figure out how to make a quiet case that held 10 drives, I'd probably avoid it. But for a 4-disk RAIDZ, it is a good match for me.

My advice is to add an SSD to your ZFS box, no matter what. For certain, you can use a $90 SSD to be a nice fast cache. If you want to use the device as a ZIL (ZFS intent log) and you're paranoid about data integrity, read the thread above and spend $500 on an SSD. Otherwise, I think my $90 one does pretty well too.

Jan 2, 2010

Tabula Rasa

We had a number of laptops and cameras stolen on New Year's Eve. We've replaced the laptops at least, so we can get stuff done. (And we're furiously changing passwords...)

Since Safari doesn't have bookmark sync, my new bookmarks bar is completely empty, and I set my homepage to regular Google, not iGoogle.

The amazing thing about this is that the computer isn't "pushing" activities to me right now. I open it to do something, and then do it. It's not like "Look at what iGoogle thinks I should read," and I have to actually type something to see what's going on in Facebook/Friendfeed/Twitter land, which is enough effort that I don't do it as much.

This dopamine addiction (the pursuit of novelty) has attracted a lot of research over the past few years. And activity addiction is almost the new "couch potato" mode...spending hours reading the "news" (which is really entertainment). For people who use computers as tools, it is important to find ways to segment entertainment from work.

What I'm finding with this latest "tabula rasa" is that my own goal to "look up this" or "write to this person" is much easier to achieve without the little novelty interruption. And I think that self-directed activity is very superior to reading on the 30-second "give me a hit of novelty" schedule. Here's hoping things stay so simple in 2010.

Dec 2, 2009

LED lighting

It's almost time for LED lights to be something you might want to buy more than CFLs. Here's a glimpse, at least.

The Pharox60 is a 6 watt dimmable LED light. Seems to make a small buzzing sound from like 60-90% brightness, but is silent from 20-60%, and it doesn't go lower than that.

It's bright enough to replace a 40W incandescent, but the color is only decent, at 86 CRI.

Here's a link to the current promos to buy one:
http://www.mypharox.com/rebates.html (in British Columbia it's $30 off)
http://www.mypharox.com/pharox6w.html (in the US you can get $10 off, $40 total)

I got the earlier 4W one, and it wasn't bright enough to use, but this one is actually a reasonable replacement bulb.

Why would you pay $40 for a light? Because it lasts forever, really. But you probably want to wait until the total lumen output is a bit higher before replacing everything in your house, because you won't get to do it more than once in the next ten years.

I'm also trying some 1927 fixtures that use incandescent tube lights. Not efficient in the least, but they look like ~2300K and gorgeous.

Nov 28, 2009

Paying for healthcare by taxing the rich

I was thinking a bit about the proposal to pay for the new healthcare bill by "taxing the rich". The biggest problem with this scheme is that the rich have volatile income.

When you have to pay for a thing every year, without fail, you usually don't depend on a volatile revenue source.

Recently, Schwarzenegger vowed to fix the "boom-bust" cycle in California by spreading state taxes more evenly across income brackets. This proposal is a very conservative way to prevent volatile tax revenues. Unemployment hasn't historically risen above 10% for very long, and incomes for most people are pretty flat, so a scheme that depends less on a Tech Bubble and a Housing Bubble to meet its budgets is definitely a good step.

But the Obama healthcare proposal is the opposite. We intend to pay for healthcare using tax revenues that are extremely volatile.

To fix this (rather than going as far as Schwarzenegger's plan), maybe we should develop a spending policy based on a 10-to-15 year moving average of tax revenues, with some hedging to avoid over-committing during good times, and falling short during bad ones.
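As a sketch of what such a policy might compute: each year's spending cap is a trailing average of past revenue, scaled down by a safety factor. The `hedge` factor and the function name are hypothetical, and the window is whatever you pick in the 10-to-15 year range.

```python
def moving_average_budget(revenues, window=10, hedge=0.95):
    """Spending cap per year from a trailing moving average of revenue.

    revenues: yearly revenue figures, oldest first.
    window:   number of prior years to average.
    hedge:    hypothetical safety factor (< 1) to avoid over-committing.
    Years without a full window of history get None.
    """
    budgets = []
    for year in range(len(revenues)):
        if year < window:
            budgets.append(None)
            continue
        past = revenues[year - window:year]
        budgets.append(hedge * sum(past) / window)
    return budgets
```

A single boom year moves the cap by only a tenth of its size with a 10-year window, which is the point: spending follows the trend rather than the bubble.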

Nov 11, 2009

New Google Storage Prices: Big News

Google announced 25c/GB/year storage for photos today. That means they reduced the price of cloud storage by a factor of TEN. (And free bandwidth is included, too.)


I posted a few months ago about cloud storage prices, and how the cost of keeping machines on, ensuring redundancy, etc., was keeping prices about 30x higher than the incremental cost of buying a new disk.

This is no longer true.

Now I can get:
  1. Redundant storage, offsite backup
  2. Free bandwidth to access and share my photos
  3. An image frontend that re-sizes images to any size I like
  4. Full two-way sync and download capability with Picasa
All this for about 3x the price of local storage.

When I can store raw photos and other files, I think the battle will be over. But today is definitely the day it started.

You can mail a hard drive to Amazon to get them to put your data on S3.

So I think I need a name for the coffee shop that peers with Google over a DS-3 and has gigabit plugs for your laptop.

Suggestions?

Nov 9, 2009

Windows 7 Upgrade: Network Backup painfully expensive

If you have a bunch of computers, you can now buy the "Windows 7 Home Premium" upgrade for $149, for three computers. I almost did that, but!

...if you want to back up all three computers to a network drive, you have to buy the "Professional" version separately for each machine for $199 x 3 = $597.

The upgrade to the network backup feature ($149) costs more than the entire OS ($49). This is if you don't really care about BitLocker, joining domains easily, or having an easy way to install XP in a virtual machine, which I mostly don't.
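The arithmetic behind those numbers, spelled out. This only uses the prices quoted above and assumes the three-machine pack is split evenly per machine.

```python
family_pack_price = 149        # Home Premium upgrade, covers three computers
machines = 3
professional_price_each = 199  # Professional upgrade, per computer

home_premium_each = family_pack_price / machines            # about $49.67
professional_total = professional_price_each * machines     # $597
backup_premium_each = professional_price_each - home_premium_each  # about $149.33

print(f"Home Premium per machine: ${home_premium_each:.2f}")
print(f"Professional for all three: ${professional_total}")
print(f"Extra per machine for network backup: ${backup_premium_each:.2f}")
```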

Why is keeping good backups such an expensive feature? People care about their data, and Microsoft should care too.

Single external USB drives are a huge point of failure, and people should be allowed to use their external RAID NAS device they bought for $300, without paying additional humongous sums of cash to do it.

Oct 20, 2009

The "It's Done" Fallacy

Today Apple released a new mouse, and it made me think about one of the major fallacies of developing technology products.

The fallacy goes something like this: "We're done with that, so let's do something new."

See, the mouse has been "done" for years. In fact, I tried to look for a replacement mouse for my old mouse a couple weeks ago, and was really unimpressed with the Microsoft & Logitech offerings. Nothing really new since 1999 or whenever cordless mice were "new". It seems like both of these companies mostly gave up. Longer battery life for wireless mice, I think? But they must have fired all their industrial designers, because now they have mice that push into a little well, so your fingers push into this sharp edge of the case, or these other mice that have back buttons exactly where your thumb goes (Microsoft's are smart about putting the buttons near where your thumb goes). Anyway, I digress.

Making software and hardware better is a bigger problem than that. Because it's about how you set priorities in your organization.

It's not all about "don't re-invent what we did 10 years ago."

Often it's even, "Don't work on something that we finished a week ago." And that probably doesn't make sense, but here's how it happens:

Management: Can we ship this feature early?
Engineering: Well, the UI's not really done, and we want to add all this other stuff to make it work better for actual users. I mean, it kind of works a little bit.
Management: Great, let's ship it. We can do all that other stuff in version 2.

And the big secret is that version 2 actually never happens. Ever.

Because, next month, compared to having a new feature, improving the old feature looks boring. It's not as easy to talk about. It's not as impressive.

But the best teams actually do the exact opposite, because they care about their product and their users. They go back to the thing they were working on last month, and finish the things they were intending to do. They go back to the thing they were doing 2 years ago, and make them better. They keep working until it's done.

And yes, this fallacy happens a lot in the Agile/Continuous Deployment models, where finishing a task means you should probably ship it to users. It was easier to hide "getting it right" in the old, slow way of making features. You have to schedule it in the newer models, and that's hard to do, but it's important.

The era of "more features is better" is ancient history in most products (it makes me think of those magic PC Magazine checklists comparing MS Word to WordPerfect), and companies that do a smaller number of features really, really well keep their users happier and win.

Oct 14, 2009

Best Commercial Ever: Our jokes aren't like your jokes

"Picasa-like" smooth scroll for Chrome

I modified some code I found on the web to make Chrome do smooth scroll like Picasa using the new Extensions system. You need dev-channel Chrome at the moment, and it doesn't work in all cases, so it's more of an experiment for now.

See this post on Stereopsis for more info.

I wish Chrome would compile on VC Express. Then I'd just hack this into the native code...ha!

Oct 6, 2009

Information Content of SEC filings: text compression for annual reports

One of the functions of the SEC is to make sure people have consistent "information" about public companies.

Of course, "information" means something else to computer scientists and people who study text compression. The more a piece of text repeats itself, the smaller a compressed file will be. So naturally I wanted to see how well SEC filings would compress.

Here's what I did:
  1. I went to the SEC's companysearch page and found the pages for SEC filings for some big tech companies.
  2. I used Firefox's "save as text" to convert the huge HTML to plaintext.
  3. Then I ran 7-zip to compress.
Of the four companies I tried, Google's 10-K reports are the least redundant, and Yahoo's the most.
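The same measurement can be sketched in a few lines. This uses Python's `lzma` module as a stand-in for 7-zip (both are LZMA-family compressors, but the container format and settings differ, so the numbers will not match exactly). It also assumes the percentages here are compressed size divided by original size, which fits the ranking given.

```python
import lzma


def compression_ratio(text):
    """Compressed size divided by original size.

    A smaller ratio means the text repeats itself more.
    """
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("cannot compute a ratio for empty text")
    return len(lzma.compress(raw)) / len(raw)
```

For a filing saved as plaintext, something like `compression_ratio(open("10k.txt", encoding="utf-8").read())` would give the figure (the filename is just an example).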

Here are the specific ratios I calculated (the "terseness" factor?)
  • Google: 22.34%
  • Microsoft: 21.26%
  • Apple: 20.92%
  • Yahoo: 20.37%
Probably someone is strange enough to be tracking metrics like this, but it's kind of fun to look at the 10,000-foot view. Maybe tracking compression over time would give useful insights, like who's covering up bad results with flowery language.

Of course, this "terseness" number is only make-believe information theory. It probably matters more if the information you're reading is any good.

Oct 2, 2009

Case-Shiller up or not? Extra Math Inside!

Case-Shiller is reporting that home prices in LA are up a little.
In fact, the LA index is up 1.8%.

Great news, except...

The median home price in LA right now is $395,000.
There's a first-time buyer tax credit for $8,000.
And $8000/395000 = 2.03%

1.8% - 2.03% = -.23%

Of course, not everybody's a first time buyer.

But, when you can use $8000 more for a down payment, you can buy $80,000 more house, if you put 10% down.

$80,000 / 395000 = 20.3%.
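Here is the same math in one place, using only the figures above.

```python
median_price = 395_000     # LA median home price
tax_credit = 8_000         # first-time buyer credit
index_gain_pct = 1.8       # Case-Shiller LA index change
down_payment_pct = 10      # assumed down payment, in percent

# The credit as a share of the median price.
credit_pct = tax_credit / median_price * 100

# Index gain with the credit backed out.
net_gain_pct = index_gain_pct - credit_pct

# Extra purchasing power if the credit goes entirely to the down payment.
extra_house = tax_credit * 100 / down_payment_pct
extra_house_pct = extra_house / median_price * 100

print(f"credit / median: {credit_pct:.2f}%")            # 2.03%
print(f"gain minus credit: {net_gain_pct:.2f}%")        # -0.23%
print(f"extra house: ${extra_house:,.0f}")              # $80,000
print(f"extra house / median: {extra_house_pct:.1f}%")  # 20.3%
```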

I'll stop there.

Sep 27, 2009

L-Prize (replacement for 60W bulb) gets first entry from Philips. Also CQS...

Very excited to see the L-Prize, and Philips with the first entry!

Finally we might have something better than non-dimmable CFLs with poor color rendition, and better energy efficiency as well. (The L-Prize demands 90 lumens/watt.)

However, I do wish the prize had incorporated one of the "better than CRI" methods of color quality estimation. The L-Prize specifies CRI>90, but CRI has some issues.

CRI is Outdated: Enter CQS...

NIST is working on a new "Color Quality Scale" to fix CRI. Still in the proposal phase, but it looks very good.

Today's CRI is an average response over 8 color swatches (and not quite the right 8 colors). One of the measured samples can be reasonably bad, and a light source still gets a decent CRI score. CRI attracts lots of critics: it uses an ancient color space, and there are a bunch of other issues you can read about.

To fix this, CQS tests 15 saturated samples instead of the 8 pastel-flavored samples that CRI uses (since the latter don't show distortions from modern LEDs very well).

From an article at NIST:
None of the eight reflective samples used in the computation of Ra are highly saturated. This is problematic, especially for the peaked spectra of white LEDs. Color rendering of saturated colors can be very poor even when the Ra value is good. Further, by optimization of lamps’ spectra to the CRI, Ra values can be made very high while actual color rendering is much poorer. This problem exists because too few samples are used in the calculation of Ra, and they are of too low chromatic saturation.
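To see how a plain average can hide one bad sample, here is a toy example. The general index Ra is the arithmetic mean of eight per-sample scores; the scores below are invented for illustration and are not measurements of any real lamp.

```python
def general_cri(sample_scores):
    """Ra: the arithmetic mean of the eight per-sample rendering scores."""
    if len(sample_scores) != 8:
        raise ValueError("Ra is defined over exactly eight samples")
    return sum(sample_scores) / 8


# Hypothetical lamp: seven samples rendered well, one rendered badly.
scores = [92, 90, 91, 93, 90, 92, 91, 45]

print(f"Ra = {general_cri(scores)}")      # Ra = 85.5
print(f"worst sample = {min(scores)}")    # worst sample = 45
```

The headline number still looks respectable even though one color is rendered very poorly.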

Illustration from http://colorqualityscale.com/ here. Illustrated is the color response from a fixture that scores CRI=80, but CQS=73, because it sucks at a few particular colors.

CIE has also announced their interest in doing a project like this, but you apparently have to pay to see their proposal, so I don't know much about it.

Sep 7, 2009

Spaniel Enrichment

About 5 years ago, we were lucky enough to get a behind-the-scenes tour at the LA Zoo. The highlight was a visit with JonBoy, who was a really fascinating character...his job was Animal Enrichment.

Apparently when you put animals in captivity, they get, well, "bored" is the best way to say it. Daily life in the wild has a lot of variation, and predators especially aren't so happy to get a bucket of food every day. They need more of a challenge.

So that day in 2004, we learned about all these wonderful devices that people make to keep animals engaged or "enriched". It's almost like a toy shop, little containers that must be shaken, bitten, or reached into for a treat.

Taking a cue from all that, we put some chicken treats in an empty paper towel roll and closed up the ends. Maggie spent nearly a half hour getting the treats out.

Thank you, JonBoy.

Aug 17, 2009

iSCSI is fast (update: actually, it's not)

[update: iSCSI isn't so fast -- my benchmarks below are "buffer cache is fast". I went back to virtualbox disks instead, which are now even faster.]

After I built a ZFS server that uses 79W, I started looking around at other places I was using a bunch of power. And my Linux server (a few years old, P4) was using 150W+. What if I could turn it off and virtualize it?

I moved the whole thing to VirtualBox 3.0 with OpenSolaris as the host. This has worked reasonably well, aside from some odd hangs and a bunch of tuning you do to make the guest not eat all the host's CPU. (Hint: pass "divider=10" to your kernel. You don't need a new kernel.)

The glitches there are annoying enough that I recommend you mostly don't go the guest/host route. If you want to consolidate a bunch of servers, maybe you should try Xen or VMWare ESXi (which is sort of free now) instead.

And VirtualBox's performance overall is quite good (CPU, network).

The only problem is that I put /home on NFS. Solaris has stricter rules about async writes than Linux NFS servers (which have allowed them for years), so compiling is very, very slow.

A small project of mine was taking about 2 minutes to compile over NFS. Same thing took 25 seconds on "local" Virtualbox make-believe disk. And yes I did "async" and "noatime" and all those things, and it didn't make a big difference.

So instead of adding another disk to Virtualbox, I decided to try out the iSCSI support in OpenSolaris. Would still get snapshots, would be exportable to a real physical box, and it's just cool to try.

After some help setting up the server (ahem, the "Target"), and setting up the client (the "Initiator"), things mostly worked, and I formatted with ext3.

23 seconds to do the compile.

Cool. I'm done with that stinky NFS volume.

Aug 4, 2009

When "search with Bing..." isn't

I was visiting MSDN today and saw "Search with Bing..." at the top. Like, maybe, they've fixed the terribly slow search on MSDN. Wouldn't that be nice?

I searched, and it was slow as usual. So I measured it.

MSDN search for "GetWindowThreadProcessId": 1126ms
Bing search for "GetWindowThreadProcessId site:msdn.microsoft.com": 234ms
Google search for "GetWindowThreadProcessId site:msdn.microsoft.com": 125ms

With Google, I can make a bookmarklet for "I'm feeling lucky" and get the first MSDN result in about 1/10 the time of MSDN's search.
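The post doesn't say how the timings were taken, so the helper below is just one way to do it: wrap whatever issues the request in a function and take the best of a few runs. The ratios underneath use the three numbers measured above.

```python
import time


def time_ms(fn, repeats=3):
    """Best-of-N wall-clock time for fn(), in milliseconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, (time.perf_counter() - start) * 1000.0)
    return best


# Timings from the measurements above, in milliseconds.
measured_ms = {"MSDN": 1126, "Bing": 234, "Google": 125}

# How many times slower MSDN's own search is than each alternative.
msdn_slowdown = {
    name: measured_ms["MSDN"] / ms
    for name, ms in measured_ms.items()
    if name != "MSDN"
}

for name, factor in msdn_slowdown.items():
    print(f"MSDN search is {factor:.1f}x slower than {name}")
# MSDN search is 4.8x slower than Bing
# MSDN search is 9.0x slower than Google
```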

Message to Microsoft Marketing: Search and Replace is not such a good re-branding strategy. Why rebrand something that's awful if you're trying to compete with something that's good? You are making an effort to tell people that your search is 10 times slower than your competition.

MSDN, if you need help, Google has a site search product that might be useful. Isn't any slower than "real" Google!

Aug 2, 2009

The Default Way

My uncle is a fine artist. He's almost 65, and has never really learned how to use a computer. He asked what computer to get, and I recommended a Mac. My dad asked me, why not a PC? My dad has two PCs, and everyone else in the family mostly uses PCs.

I even write a lot of software for the PC, and while I use both platforms (PC a bit more), somehow this one was pretty easy.

I told him, "The Mac has one way to do most things."

If he gets a Vista machine and asks someone who uses XP, "How do I connect to a wireless network?", he'll get the wrong answer. Because the answer to that question isn't the same, not even similar, on two versions of Windows.

For somebody who's never used email much, having an email program and a web browser that are quite good, that's a good enough answer. Photos, good enough answer.

My friend dropped off a midiman keyboard because he knew I broke my foot, to keep me entertained. Plugged it into the Mac, and Garageband just kinda worked. Plugged it into the PC and I don't know what to do next. I guess I could buy something, but what? Too many options. No default way.

Of course, the thing that clinched it with my uncle is Apple's "One to One" program, which provides a year of training for $99. They're going to lose a ton of money on my uncle, because he's dutifully going once a week, and doing the whole curriculum. But they'll have a Mac user for life.

Microsoft's ripping out their "Live" apps in the next OS, and I think this is a very bad move. Because it means that there's not a default way to do things. Yes there are good free apps that get included on new PCs (maybe Picasa is one of them) but they are rare. More often, their goal is to get you to upgrade or spy on you, not to make a good experience.


Photo from my doctor's office: 4 offers to upgrade trial software.

When you don't have a default way to do things, that means there are multiple ways to do things. You "go online" and try a bunch of them, and you get spyware with half of the free apps. You spend a bunch of money to get a "premium" tool, but nobody else you ask actually knows how to use it.

Apple's OS isn't more secure by design, but the funny thing is that it's more secure because you don't have to download tons of stuff to get it to work.

A default way, a usable way, a way that's not buggy and doesn't make you download stuff you don't understand.

Now when I talk to him on the phone, we talk about art and design, instead of which software to use to get something done.

Jul 27, 2009

Biodiesel can't be stored underground?

No wonder BMW is having so much trouble selling the x5 xdrive35d and 335d (both are diesel models, which Europeans love and Americans don't.)

Local gas stations are not selling biodiesel anymore because the State Water Resources Control Board asked them not to store it underground.

More here.

Jul 25, 2009

Electricity prices up 5% YoY, also home-energy monitoring!

Just because you wanted to know: electricity prices are up 5% year-over-year in California, and about the same amount across the U.S. See this chart for more data.

Depending on the state, it's a mixed bag. But it is fun to read how much prices vary: people in Idaho pay 7c/kWh, whereas Connecticut and Hawaii pay >20c!

My household spends >3x as much on electricity as gas (natural and automobile combined). I wonder if this is common, because the ratio seems high. The computers amount to maybe 15% of the total, so having too many computers isn't skewing the numbers too much.

It's a good thing utilities and other companies are paying attention to this. If you want something to help understand your energy usage before your utility company and Google fix it, this article has two good alternatives, called Blue Line and TED (which weirdly is branded very much like the TED conference.)

Geoff Fowler at the WSJ seems to like the Blue Line/Black & Decker monitors slightly more than the TED thing. Maybe it's worth trying one, since they sound pretty easy to install.

Jul 15, 2009

Dropbox to add "LAN sync"

From an email today:
In addition to the iPhone app, we're also finishing up a new version of the Dropbox desktop software that features numerous performance improvements and our new "LAN sync" feature. LAN sync knows when Dropboxes are on the same network and will automatically exchange files directly between computers instead of downloading them from our servers - this makes sharing large files in an office environment much faster than was previously possible.
I hope there's an option to keep some files off the server, as well. Would be nice to be able to use Dropbox for everything!

Jul 13, 2009

MoCA bridges rock: the Netgear MCAB1001

We've had a terrible time getting a wireless network to reach our detached garage, where Lorna paints and uses her Mac. It has kind of worked, kind of not, and the latency has been awful.

Amazingly there is coax wired between the structures, and so there's a new variety of "MoCA" bridges that will use coax like ethernet, without disrupting your TV or internet connection. It's like powerline networking, except using coax, and also except it's actually fast.

We got the Netgear MCAB1001, and it works better than you could believe. 5 minutes to set up, and then it's on.

Since the ethernet side of the Netgear is 100Mbps, it appears to bottleneck there, but I'm seeing 12MB/sec over HTTP, and latency under 3ms everywhere. It really just works.
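A quick check on why 12MB/sec points at the ethernet port rather than the coax. This assumes 1 MB = 1,000,000 bytes and ignores protocol overhead.

```python
link_mbps = 100          # ethernet port speed on the bridge, megabits/sec
observed_mb_per_s = 12   # measured HTTP throughput, megabytes/sec

ceiling_mb_per_s = link_mbps / 8                  # 8 bits per byte
utilization = observed_mb_per_s / ceiling_mb_per_s

print(f"ceiling: {ceiling_mb_per_s} MB/sec")      # ceiling: 12.5 MB/sec
print(f"utilization: {utilization:.0%}")          # utilization: 96%
```

At 96% of the port's theoretical maximum, the 100Mbps port is the limit, not the MoCA link.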