Nerdblog.com: Cloud DBs

I've been doing some research on "cloud databases" - the non-relational key/value storage systems that people are using to scale their web apps past MySQL and SQLite.

First are the Bigtable clones, where you actually get columns and higher-level features:

HBase: database used for Hadoop

Cassandra: database used by Facebook

These are "big" projects, and if you have a big application you might consider them. But there is a lot of code there, and they seem pretty new for their complexity.

Next are the low-level improvements on what people know as "DBM" or "Berekeley DB", the simple "put a value, get a value" interfaces. Typically these packages wrap a number of different backends: typically a fixed database (flat file), a hash table, and a B+Tree. Compared to Berekely DB, these guys are faster and usually LGPL:

http://luxio.sourceforge.net/

http://tokyocabinet.sourceforge.net/spex-en.html

http://tokyocabinet.sourceforge.net/benchmark.pdf

By some reports, using this kind of code is 10-100x faster than using MySQL or Sqlite to do the same task. And bindings are good, supporting C, Java, Ruby, and Python.

Of course there's the popular "memcached" which stores key/value pairs in RAM across multiple machines. Memcached is interesting because people are using the protocol as a standard for persistent key/value storage (as well as what everyone knows, an implementation for RAM-only caching):

http://www.danga.com/memcached/

A feature you might want is to be able to access your database over the network, rather than by touching disk. Interesting entrants here:

http://couchdb.apache.org/

http://tokyocabinet.sourceforge.net/tyrantdoc/

http://memcachedb.org/ - Danga's memcached + Berkeley DB

Tokyo Cabinet and MemCacheDB support the "memcached protocol" and most of the above do some kind of rest-ful storage. CouchDB does map/reduce for its indices, which sounds neat but proves to be 100x slower than MySQL in practice.

Finally, you should know which of these systems support horizontal scaling (i.e. linear scaling when you add more machines), and those include HBase, Cassandra, and some layers on top of the key/value guys. Most of the above systems (including CouchDB) do not scale horizontally, and you basically make full replicas of all of your data, or just use them on one disk.

LightCloud: scaling layer built on Tokyo Tyrant.

Project Voldemort: used by LinkedIn and others

At this point, I'm very impressed with the Tokyo stuff, and I especially like that I can break the key/value abstraction and do cursor ops on the btree directly. So if I have 1000 keys that appear sequentially, it is insanely fast to fetch them.

For smaller projects I think I'm going to test out Tokyo Cabinet, and for larger ones Lightcloud. Love to hear other suggestions.

2 comments:

James Burke9:47 AM
Have you looked at Persevere? It just got a new storage engine that reminds me a bit of the couchdb model, but I have not looked closely at it. It does not do replication yet, but it is listed as a future enhancement. It is very JavaScript/JSON-centric. This blog post describes the new storage engine.
Gavin4:16 PM
What James said. I was just going to point at that same URL. There's also AppEngine (bigtable), of course, just for completeness.