I’ve used CouchDB for a project at work and now I’m seeing opportunities to use it everywhere. At the moment I’m starting a new project (more on that later) and I think CouchDB fits the bill (lots of reads, high availability, etc.). So, how about I benchmark CouchDB first and see what we can make of it?
Much along the lines of Chris’ post on benchmarking CouchDB, I decided to have a stab at it.
DISCLAIMER 1: I’m still very new to CouchDB; there’s no guarantee that my setup is right, that my views are right or that I benchmarked the whole thing properly. You’ve been warned.
DISCLAIMER 2: I really like CouchDB so you will see a couple of “wow”, “awesome” and “yeah!” all throughout this article. Don’t expect me to be unbiased.
My setup
I ran a single instance of CouchDB on my desktop machine, which has an Intel Core Duo with both cores @2.2GHz. In terms of RAM I only have 2GB (2075688kB as reported by cat /proc/meminfo).
erl reports a version number 5.6.5:
ulises@mochos:~$ erl
Erlang (BEAM) emulator version 5.6.5 [source] [smp:2] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.6.5  (abort with ^G)
The test DB
This is a very small and simple DB with about 28,000 documents (28,446 as reported by CouchDB):
{"db_name":"all_lyrics", "doc_count":28446, ... }
and the average doc size is 573 bytes. Each document has the following structure:
{"_id":"0008ac909c7a176746eb5de7e185d305",
 "_rev":"774836400",
 "id":23787,
 "artist":"bruce dickinson",
 "song_name":"man of sorrows",
 "lyrics":"lyrics go here ... omitted for brevity"}
As you can see, this is a small and simple DB, but I just wanted to find out how CouchDB would perform on it. My expectations: it will rock.
GETting a document
I started simple: getting a single document many times with no concurrency. Maybe it doesn’t even make sense to test this, but I did it anyway. Then I increased the concurrency level and looked at the number of requests served per second and how long it took CouchDB to serve 99% of the requests.
The actual numbers
The command I ran was:
ulises@mochos:~$ ab -n 100000 -c N http://localhost:5984/all_lyrics/0012ed2f6d93d78f929e251af876cab7
for N = 1, 10, 100, 1000.
| # of concurrent reqs. | reqs/sec (mean) | ms to serve 99% of the reqs. |
|---|---|---|
| 1 | 789.96 | 3 |
| 10 | 1363.47 | 16 |
| 100 | 1435.43 | 256 |
| 1000 | 1345.04 | 3269 (still serving 90% in under 173ms though) |
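If you want to repeat the sweep, a small Node.js script can run ab once per concurrency level and pull out the two numbers in the table (mean reqs/sec and the 99% time). This is a sketch, not what I actually ran: it assumes `ab` is on the PATH and that its report contains the usual "Requests per second:" line and the "Percentage of the requests served within a certain time (ms)" block; `sweep`, `parseReqsPerSec` and `parsePercentileMs` are made-up names.

```javascript
// Sketch: run ApacheBench (ab) once per concurrency level and extract the
// mean requests/sec and the 99% latency from its text report.
// Assumes `ab` is installed and on the PATH (hypothetical helper names).
const { spawnSync } = require("child_process");

// "Requests per second:    789.96 [#/sec] (mean)" -> 789.96
function parseReqsPerSec(output) {
  const m = output.match(/Requests per second:\s+([\d.]+)/);
  return m ? Number(m[1]) : NaN;
}

// Percentile lines look like "  99%      3" (time in ms)
function parsePercentileMs(output, pct) {
  const m = output.match(new RegExp(`^\\s*${pct}%\\s+(\\d+)`, "m"));
  return m ? Number(m[1]) : NaN;
}

// Runs the levels sequentially; returns one result object per level.
function sweep(url, total = 100000, levels = [1, 10, 100, 1000]) {
  const results = [];
  for (const c of levels) {
    const run = spawnSync("ab", ["-n", String(total), "-c", String(c), url], {
      encoding: "utf8",
      maxBuffer: 10 * 1024 * 1024,
    });
    if (run.error || run.status !== 0) {
      // ab missing, connection refused, socket timeouts, etc.
      results.push({ concurrency: c, error: run.error ? run.error.message : run.stderr });
      continue;
    }
    results.push({
      concurrency: c,
      reqsPerSec: parseReqsPerSec(run.stdout),
      p99Ms: parsePercentileMs(run.stdout, 99),
    });
  }
  return results;
}
```

Calling `console.table(sweep("http://localhost:5984/all_lyrics/0012ed2f6d93d78f929e251af876cab7"))` runs the four levels one after another, so expect the 1000-concurrency run to dominate the total time.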
What does this all mean?
Off the top of my head: I’m not sure.
Keeping in mind the setup (a single CouchDB instance on an ordinary desktop PC), these numbers look good. I mean, would you even have considered serving 100 or more requests a second with this setup? I didn’t think so.
What I did notice was that CouchDB hit a ceiling at about 1400 reqs/sec almost regardless of the level of concurrency (there isn’t enough data to assert that, so take it with a grain of salt). That is: it managed the load quite well. Request serving times are a different story, however: the more requests are handled concurrently, the more the serving times degrade. Still, serving 90% of the requests in under 200ms is really, really good (and in some cases 99% of the requests were served in under 16ms). Remember that 90% of the requests in this scenario means 90,000 requests (see the -n argument on the shell command), each of them served in under 200ms! Wow! Awesome! Yeah! (see disclaimer 2)
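A note on reading the percentile column: "99% within X ms" is about individual requests, i.e. 99% of the requests each completed in X ms or less; it says nothing about how long the whole batch took. The sketch below (plain JavaScript; `percentile` and `runSeconds` are hypothetical helpers) computes a nearest-rank percentile, which may differ slightly from ab’s exact indexing, plus the wall-clock length of a run from -n and the mean reqs/sec.

```javascript
// Nearest-rank percentile: the smallest latency such that at least pct% of
// the individual requests finished within it.
function percentile(latenciesMs, pct) {
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const rank = Math.ceil((pct * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Wall-clock duration of a run: total requests / mean throughput.
function runSeconds(totalRequests, meanReqsPerSec) {
  return totalRequests / meanReqsPerSec;
}

// Ten made-up latencies: nine fast requests and one slow one.
const sample = [3, 5, 4, 6, 2, 7, 5, 4, 3, 900];
console.log(percentile(sample, 90)); // 7   (9 of 10 requests took <= 7 ms)
console.log(percentile(sample, 99)); // 900 (the single slow request is the tail)
console.log(runSeconds(100000, 1345.04).toFixed(1)); // "74.3" (the 1000-concurrency doc run)
```

So the 1000-concurrency document run lasted roughly 74 seconds in total, and what stayed under 173ms was each of 90% of the individual requests.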
Now let’s try some views.
GETting views
Now, for this dataset I have some views that don’t play nice: for every document they iterate over every word in the lyrics field and emit it (much like Chris’ word-counting example). Some have reduce functions, some don’t.
global_tf
Without going too much into detail, the view I tested is reproduced here:
{ ...
  "global_tf": {
    "map": "function(doc) { if (doc.lyrics) { var words = doc.lyrics.split(\" \"); for (var i = 0; i < words.length; i++) emit(words[i], 1); } }",
    "reduce": "function(key, values, rereduce) { return sum(values); }"
  },
  ... }
This is the standard word-counting view, producing entries of the form ['word', number_of_times_it_appeared]. It is fairly expensive to build: the first time you GET it, building takes a couple of minutes, but after that it spits out results rather quickly.
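The view logic can be tried out without CouchDB by stubbing the two things the view server provides, emit() and sum(). This toy JavaScript runner (hypothetical names `runView`, `globalTfMap`, `globalTfReduce`) maps each doc, groups the emitted rows by key and reduces each group, which corresponds to querying the view with group=true. It is only a sketch: emit is passed in as a parameter instead of being a global, keys are sorted with plain JavaScript string comparison (CouchDB uses its own Unicode collation), and reduce is never called with rereduce=true (CouchDB may do that; summing gives the same answer either way).

```javascript
// Stand-in for the sum() helper CouchDB's JavaScript view server provides.
const sum = (values) => values.reduce((a, b) => a + b, 0);

// global_tf's map, with emit passed in explicitly for this sketch.
function globalTfMap(doc, emit) {
  if (doc.lyrics) {
    const words = doc.lyrics.split(" ");
    for (const word of words) emit(word, 1);
  }
}

// global_tf's reduce: add up the 1s emitted for each word.
function globalTfReduce(keys, values, rereduce) {
  return sum(values);
}

// Map every doc, group rows by key, reduce each group (like group=true).
function runView(docs, mapFn, reduceFn) {
  const rows = [];
  const emit = (key, value) => rows.push([key, value]);
  for (const doc of docs) mapFn(doc, emit);

  const groups = new Map();
  for (const [key, value] of rows) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  const result = [];
  for (const [key, values] of groups) {
    result.push([key, reduceFn(null, values, false)]); // keys unused by global_tf
  }
  return result.sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

const docs = [{ lyrics: "la la land" }, { lyrics: "la di da" }];
console.log(JSON.stringify(runView(docs, globalTfMap, globalTfReduce)));
// [["da",1],["di",1],["la",3],["land",1]]
```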
I did my tests after the view was built (of course).
Chris commented on retrieving views and how at first he (together with @janl and @mattetti) thought that they should perform as fast as retrieving documents due to their incremental nature. However, they later suggested that this is not the case, as every time you GET a view CouchDB has to check that it’s up to date. I expect my views to perform much, much worse than getting documents, as they are fairly CPU-intensive and produce a lot of data (so checking whether they are up to date should add quite some overhead), but we shall see.
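To make the "check that it’s up to date" point concrete, here is a toy model in JavaScript of an incrementally updated view. It is not CouchDB’s implementation: `ToyDb`, `ToyView` and the sequence numbering are hypothetical, and only newly added documents are handled (no updates or deletions). What it illustrates is that every query pays for a freshness check, even when there is nothing new to map.

```javascript
// Toy database: an append-only list of changes; update seq = number of changes.
class ToyDb {
  constructor() { this.changes = []; }
  put(doc) { this.changes.push(doc); return this.changes.length; }
  get updateSeq() { return this.changes.length; }
}

// Toy view index that remembers the last seq it has processed.
class ToyView {
  constructor(db, mapFn) {
    this.db = db;
    this.mapFn = mapFn;
    this.indexedSeq = 0;
    this.rows = [];
    this.checks = 0;     // freshness checks (one per query)
    this.docsMapped = 0; // docs actually run through the map function
  }
  refresh() {
    this.checks += 1; // paid on every GET, even when nothing changed
    const emit = (k, v) => this.rows.push([k, v]);
    for (let seq = this.indexedSeq + 1; seq <= this.db.updateSeq; seq++) {
      this.mapFn(this.db.changes[seq - 1], emit);
      this.docsMapped += 1;
    }
    this.indexedSeq = this.db.updateSeq;
  }
  query(key) {
    this.refresh();
    return this.rows.filter(([k]) => k === key);
  }
}

const db = new ToyDb();
const view = new ToyView(db, (doc, emit) => emit(doc.artist, 1));
db.put({ artist: "bruce dickinson" });
view.query("bruce dickinson"); // maps the new doc
view.query("bruce dickinson"); // nothing new to map, but still checks
console.log(view.checks, view.docsMapped); // 2 1
```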
The actual numbers, part II
Again, the command I issued:
ulises@mochos:~$ ab -n 100000 -c N 'http://localhost:5984/all_lyrics/_view/index/global_tf?key=%22aa%22'
for N = 1, 10, 100 and 1000.
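CouchDB treats view parameters such as key as JSON, so the string aa has to travel with its double quotes, URL-encoded as %22aa%22 (as in the curl call further down). In a shell, an unquoted ...key="aa" loses its quotes before the request is even sent, so it is safest to single-quote the whole URL. A small JavaScript helper (the name `viewUrl` is made up) that builds such URLs with the /db/_view/design/view layout used in this post could look like this; newer CouchDB releases use /db/_design/design/_view/view instead.

```javascript
// Build a view query URL; the key is JSON-encoded, then URL-encoded.
function viewUrl(base, db, designDoc, view, key) {
  const k = encodeURIComponent(JSON.stringify(key));
  return `${base}/${db}/_view/${designDoc}/${view}?key=${k}`;
}

console.log(viewUrl("http://localhost:5984", "all_lyrics", "index", "global_tf", "aa"));
// http://localhost:5984/all_lyrics/_view/index/global_tf?key=%22aa%22
```

The same helper works for non-string keys, e.g. an array key ["a", 1] becomes key=%5B%22a%22%2C1%5D.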
A quick test on the view reports:
ulises@mochos:~$ curl http://localhost:5984/all_lyrics/_view/index/global_tf?key=%22aa%22
{"rows":[{"key":null,"value":66}]}
so the view works and returns a reasonable amount of data. Now on to the results:
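Why does the result say "key":null? Without grouping, a reduce view reduces every row in the requested range into a single value, and that value no longer belongs to one particular key. With key="aa" the range is just the rows emitted for "aa", so 66 is the number of times global_tf emitted "aa" across all the lyrics. A sketch of that logic, with `queryReduced` and the emitted rows made up for illustration (this sketch simply returns no rows when nothing matches):

```javascript
// Reduce all rows whose key matches into one value; no grouping, so key is null.
function queryReduced(rows, key, reduceFn) {
  const values = rows.filter(([k]) => k === key).map(([, v]) => v);
  if (values.length === 0) return { rows: [] };
  return { rows: [{ key: null, value: reduceFn(null, values, false) }] };
}

// Same reduce as global_tf: sum the emitted values.
const sumReduce = (keys, values) => values.reduce((a, b) => a + b, 0);

const emitted = [["aa", 1], ["ab", 1], ["aa", 1]]; // hypothetical map output
console.log(JSON.stringify(queryReduced(emitted, "aa", sumReduce)));
// {"rows":[{"key":null,"value":2}]}
```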
| # of concurrent reqs. | reqs/sec (mean) | ms to serve 99% of the reqs. |
|---|---|---|
| 1 | 178.05 | 9 |
| 10 | 283.99 | 54 |
| 100 | 343.16 | 3181 (with 90% under 212ms) |
| (*)1000 | 372.77 | 9551 (with 50% under 229ms) |
(*) I ran the benchmark with 1000 concurrent requests for a total of only 60,000 requests, as anything bigger than that would time out with a message like
apr_socket_recv: Connection timed out (110)
Total of 64153 requests completed
Running out of ports? Running out of processes (CouchDB launched a couchjs process for each request)? I don’t know; I still have to investigate this issue.
So what do these numbers mean?
Again: I’m not sure. As expected, GETting views is much slower than getting documents, but this is certainly related to the sort of views I deal with. Still, retrieving data from a view performs rather well: you could serve up to 100 concurrent requests with reasonable performance from a single CouchDB instance running on a rather modest desktop (at that level, 90% of the requests were served in under 212ms). Rock on!
Wrapping up
My poor attempt at benchmarking CouchDB showed me that it does indeed live up to my expectations. To handle more serious traffic and bigger databases you will need more than one CouchDB instance and certainly bigger hardware, but that’s the case with any other DB I am familiar with. Moreover, proxying and load balancing a couple of CouchDBs (would that qualify as a living room?) is an interesting option. One thing I haven’t benchmarked is getting data out of views (or getting documents) while the DB is being updated, although I suspect that unless the view is really nasty CouchDB will deliver. Either way, the options are plenty and I like what the future holds for us happy and relaxed CouchDB users!
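For the proxying and load balancing idea, a minimal round-robin reverse proxy in Node.js might look like the sketch below. Everything here is an assumption for illustration: `roundRobin` and `createBalancer` are made-up names, the backend URLs and ports are hypothetical, all backends are assumed to hold the same data (for example, kept in sync with CouchDB replication), and there are no health checks, retries or special handling of writes.

```javascript
const http = require("http");

// Returns a function that cycles through the backends in order.
function roundRobin(backends) {
  let next = 0;
  return () => {
    const target = backends[next];
    next = (next + 1) % backends.length;
    return target;
  };
}

// Forward each incoming request to the next backend and stream the reply back.
function createBalancer(backends) {
  const pick = roundRobin(backends);
  return http.createServer((req, res) => {
    const target = new URL(req.url, pick());
    const headers = { ...req.headers, host: target.host };
    const upstream = http.request(target, { method: req.method, headers }, (up) => {
      res.writeHead(up.statusCode, up.headers);
      up.pipe(res);
    });
    upstream.on("error", () => {
      // Backend unreachable: answer 502 Bad Gateway if nothing was sent yet.
      if (!res.headersSent) res.writeHead(502);
      res.end();
    });
    req.pipe(upstream);
  });
}
```

Starting it would be something like `createBalancer(["http://localhost:5984", "http://localhost:5985"]).listen(8080)`, after which ab could be pointed at port 8080 instead of 5984.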
Comments
Good work benchmarking. The first set (doc requests) seems to be about what I was getting, and the reduce view times seem about what I’d expect them to be. I’d be interested to see what the timing is like for a view without reduce. Reduce will be slower than Map, because it always uses 1 couchjs process. Map on the other hand should be nearly pure IO.
Posted 07 Dec 2008 at 1:59 pm
@Chris I haven’t tested GETting a view without the reduce part yet, but I will, and I will report the numbers here as soon as I do. One thing I did notice was that CouchDB was launching a new couchjs process per request, which seemed a tad excessive. Perhaps, as you say, that is because I was testing views with a reduce; I don’t really know.
Posted 07 Dec 2008 at 4:41 pm