17 January 2013
Recently I decided to get myself somewhat up to speed on some of the many NoSQL databases out there. Redis is a tool I have heard people raving about for several years now, so I decided it would be a great place to start.
Redis is an in-memory key-value store, but with a difference. Many key-value stores allow a string value to be stored against a key, and that is it. Redis supports this, but it also supports other simple types too: lists, sets, sorted sets and hashes.
Plenty has been written about these types in other places, and I am not going to say much more about them. Read up in the Redis Manual and have a scan over all the available commands.
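To give a flavour of the data model, here is a sketch in plain Python that imitates the semantics of a few of these types. The command names are Redis's, but the code is just an in-process illustration using dictionaries - nothing here talks to a real server:

```python
# A toy, in-process imitation of three Redis types, just to show the semantics.
# A real client would send these commands over the network to redis-server.

store = {}

def lpush(key, value):
    # Lists: push onto the left (head) of the list
    store.setdefault(key, []).insert(0, value)

def sadd(key, member):
    # Sets: unordered, duplicates are ignored
    store.setdefault(key, set()).add(member)

def zadd(key, score, member):
    # Sorted sets: each member has a numeric score
    store.setdefault(key, {})[member] = score

def zrange(key, start, stop):
    # Members come back ordered by ascending score; stop is inclusive,
    # and -1 means "to the end", as in Redis
    ordered = sorted(store.get(key, {}), key=lambda m: store[key][m])
    return ordered[start:stop + 1 if stop != -1 else None]

lpush("recent", "a"); lpush("recent", "b")   # list is now ["b", "a"]
sadd("tags", "redis"); sadd("tags", "redis") # set holds a single "redis"
zadd("scores", 10, "alice"); zadd("scores", 5, "bob")
```

Calling `zrange("scores", 0, -1)` on the above returns `["bob", "alice"]` - members ordered by score, which is the property that makes sorted sets useful for leaderboards and the like.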
With Redis, all of your data needs to fit in memory. It has a couple of persistence options to ensure your data is not lost if Redis crashes, but this data is never read except at startup time.
In Redis 2.2 an experimental feature called Virtual Memory was added, but it was removed again in Redis 2.6. The Redis team decided that they wanted to do one thing well - serve data from memory - and not be concerned with reading data from disk.
One interesting thing about Redis is that it is very simple. It is a single-threaded server, and as of version 2.2 was only around 20k lines of code. Due to the single-threaded design, Redis will use at most one CPU core (except while background saving, when it may use another), and this helps keep the code simple - Redis doesn't need to worry about latches and locks to stop concurrent threads stomping over each other.
It can only run a single command at a time - in other words, each command is atomic and blocks the entire server while it is being processed. When you first hear this it sounds like a terrible, unscalable design - until you realize just how fast typical Redis commands are. 50K - 100K requests per second is typical on commodity hardware, on a single CPU.
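The payoff of every command being atomic is easiest to see with a counter. Here is a sketch in plain Python - not Redis itself - deterministically interleaving the steps of two "clients" to show the lost update that atomic commands like INCR rule out:

```python
# Non-atomic increment: each client GETs the value, then SETs value+1.
# If both read before either writes, one increment is lost.
counter = {"hits": 0}

a = counter["hits"]        # client A: GET hits -> 0
b = counter["hits"]        # client B: GET hits -> 0
counter["hits"] = a + 1    # client A: SET hits 1
counter["hits"] = b + 1    # client B: SET hits 1 (A's update is lost)
lost_update_result = counter["hits"]   # 1, not 2

# Redis-style atomic increment: the whole read-modify-write is one command
# (INCR), and the single-threaded server runs commands one at a time, so
# the interleaving above simply cannot happen.
counter["hits"] = 0

def incr(key):
    counter[key] += 1      # runs to completion before the next command starts

incr("hits")               # client A: INCR hits
incr("hits")               # client B: INCR hits
atomic_result = counter["hits"]        # 2
```

This is why single-threaded-but-atomic is a feature rather than a limitation: application code gets correct counters, queues and sets without any client-side locking.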
Right now, Redis is a single node database. If your data is too large to fit on a single machine, then sharding it across multiple machines is a job for the application. Apparently Redis Cluster is coming and should provide sharding capabilities.
One Redis node can be replicated to another very easily, so having a fail-over instance that is identical to the master is pretty easy right now.
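Setting up that replica amounts to a single directive in the replica's redis.conf (the same thing can be done at runtime with the SLAVEOF command). The host and port below are placeholders:

```
# In the replica's redis.conf - point it at the master
slaveof 192.168.1.10 6379
```

The replica connects to the master, pulls a snapshot of the dataset, and then streams subsequent writes.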
Getting started with Redis is so simple, it is literally as easy as:
$ make
$ cd src
$ ./redis-server
Unlike with many databases, you can be up and running in about 5 minutes. Setting up replication is just about as easy too.
You can read the entire Redis manual and understand it all in under a day, which is one of the reasons I decided to investigate it.
Redis is the first non-relational database I have ever experimented with, and I was sceptical about being able to hit the performance numbers that were being suggested. Luckily Redis comes with a handy benchmarking tool, which allows you to see how it performs on your hardware.
Simply running:
$ ./redis-benchmark
will run a bunch of tests you can use to compare your setup with others. The default tests are not terribly realistic in my opinion, as the value stored for each key is always just 2 bytes, but this can be changed with the -d switch.
Running the benchmark on my hardware, with a payload size of 200 bytes, gives the following results for set and get operations:
====== SET ======
10000 requests completed in 0.07 seconds
50 parallel clients
200 bytes payload
keep alive: 1
100.00% <= 0 milliseconds
151515.16 requests per second
====== GET ======
10000 requests completed in 0.08 seconds
50 parallel clients
200 bytes payload
keep alive: 1
99.51% <= 8 milliseconds
99.97% <= 9 milliseconds
100.00% <= 9 milliseconds
131578.95 requests per second
At 150K sets per second and 131K gets per second, the single-threaded nature of Redis doesn't seem so bad any more.
The benchmark tool also lets you run any Redis command you like. For instance, to test the cost of pushing 1 million items onto a sorted set (a more complex operation than a simple key-value set), try the following command:
$ ./redis-benchmark -r 1000000 -n 1000000 zadd sortedset 10 random_value_that_is_a_little_long_:rand:000000000000
====== zadd sortedset 10 random_value_that_is_a_little_long_:rand:000000000000 ======
1000000 requests completed in 10.70 seconds
50 parallel clients
3 bytes payload
keep alive: 1
99.96% <= 1 milliseconds
99.99% <= 2 milliseconds
100.00% <= 9 milliseconds
100.00% <= 10 milliseconds
100.00% <= 10 milliseconds
93466.68 requests per second
93K requests per second - not bad at all.
Redis has plenty of potential use cases - I cannot possibly think of them all. At its simplest it can be used as a cache for a web application, but it also has applications in queuing and distributed object stores, and with some thought the lists, sets and sorted sets have a lot of potential uses too.
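For example, the list type maps naturally onto a work queue: producers LPUSH jobs onto the head, workers RPOP (or the blocking BRPOP) them off the tail. A sketch of that pattern in plain Python, with a deque standing in for the Redis list - no server involved:

```python
from collections import deque

# A Redis list used as a FIFO queue: LPUSH at the head, RPOP from the tail.
queue = deque()

def lpush(job):
    queue.appendleft(job)   # producer adds work at the head

def rpop():
    return queue.pop()      # worker takes the oldest job from the tail

lpush("job-1")
lpush("job-2")
first = rpop()    # "job-1" - first in, first out
second = rpop()   # "job-2"
```

Because RPOP is atomic, many workers can pop from the same list concurrently and each job will be handed to exactly one of them.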