Hard drives don't multitask

Just so you have a rule of thumb:

If your app does disk-bandwidth-bound work in one thread (doing 1MB reads, say), and you brilliantly decide to split the same work across two threads, the whole job will run roughly 10x slower, because the drive head ends up seeking back and forth between the two streams instead of reading sequentially. This is one of the things I keep having to remember in Picasa.

People say Windows Vista (which allows bigger reads and some nicer scheduling), Linux 2.6 (which adjusts the thread quantum based on predicted disk activity), and hard drives with Native Command Queuing can all help with this a little bit, but overall I've not seen a situation where you retain even half the single-threaded disk bandwidth.
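If you want to check the rule of thumb on your own hardware, here is a rough sketch of a benchmark, not anything taken from Picasa. It reads a set of files in 1MB chunks, first one after another in a single thread and then with one thread per file, and reports the elapsed time for each. The function names (`read_file`, `time_serial`, `time_threaded`, `make_test_file`) are made up for this example. One big caveat: the OS page cache will hide the effect entirely unless the files are much larger than RAM or were not read or written recently, so treat the numbers with suspicion.

```python
import os
import threading
import time

CHUNK = 1024 * 1024  # 1MB reads, matching the example in the post


def read_file(path, chunk=CHUNK):
    """Read a file front to back in fixed-size chunks; return bytes read."""
    total = 0
    # buffering=0 avoids Python's own buffering; the OS cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            total += len(block)
    return total


def time_serial(paths):
    """Read the files one after another in the calling thread."""
    start = time.perf_counter()
    total = sum(read_file(p) for p in paths)
    return total, time.perf_counter() - start


def time_threaded(paths):
    """Read the files concurrently, one thread per file."""
    totals = [0] * len(paths)

    def worker(i, path):
        totals[i] = read_file(path)

    threads = [
        threading.Thread(target=worker, args=(i, p))
        for i, p in enumerate(paths)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(totals), time.perf_counter() - start


def make_test_file(directory, name, size_mb):
    """Hypothetical helper: write size_mb megabytes of random data."""
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        for _ in range(size_mb):
            f.write(os.urandom(CHUNK))
    return path
```

To use it, point `time_serial` and `time_threaded` at the same two large files on a spinning disk and compare the elapsed times. On an SSD, or with files already in cache, expect little or no difference.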

2 comments:

  1. I find this to be true to a certain extent. However, I was surprised when I recently had a chance to parallelize a script that iteratively processes a few thousand images (using multiple processes via fork). On a dual-processor machine, running two processes and dividing the workload evenly, the task finished in almost exactly half the time, even though the workload was to run an ImageMagick command on each image, producing another image as output. On a quad-processor machine running four processes, it cut the run to about a quarter of the original time. And, interestingly, running five processes still produced some small gains. The job originally took over 8 hours and now takes barely over 2 hours.

    Why is this, when we so often see apps bottlenecked by the hard drive, and this job clearly used the disk for both reading and writing? Well, my theory is that I got lucky and there was just enough processing per image to let queued reads and writes complete in the meantime. I was surprised that it still held up across four or five processes, though.

    Certainly, though, we still need much faster hard drives.

  2. This just shows that ImageMagick is slow and CPU-bound. Disk speed only matters when you are waiting on the disk.
