Counting tags with CouchDB and map-reduce
My previous post on CouchDB covered adding a simple view, but what if we've got a problem that can't be solved simply by mapping the existing documents to new documents? What if we want to get a list of each tag used in articles along with a count of how many articles use that tag? Sure, we could emit doc.tags and then munge the resulting arrays in the client language, but wouldn't it be great if CouchDB would do that for us?
Well, yes, it would be great, and CouchDB can do it for us.
As a reminder, here's an example of the article documents that I'm using.
{
  "_id": "monkeys-are-awesome",
  "_rev": "1534115156",
  "type": "article",
  "title": "Monkeys are awesome",
  "posted_at": "2008-09-14T20:45:14Z",
  "tags": [
    "monkeys",
    "awesome"
  ],
  "status": "Live",
  "author_id": "craig@barkingiguana.com",
  "updated_at": "2008-09-14T21:23:59Z",
  "body": "The article body would go here..."
}
It's fairly easy to add a view that, for each article, emits one row per tag on that article: the tag as the key and a count of 1 as the value.
function(doc) {
  if(doc.type == 'article') {
    // Declare i with var so it doesn't leak out as a global.
    for(var i in doc.tags) {
      emit(doc.tags[i], 1);
    }
  }
}
For the example article document this would return ("awesome", 1) and ("monkeys", 1), but if there were several documents tagged "monkeys" then this would return ("monkeys", 1) several times.
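To see what the map step produces, here's a rough sketch that runs the map function outside CouchDB. The emit function and the sort are simple stand-ins for what CouchDB does internally, and every document except "monkeys-are-awesome" is made up for illustration.

```javascript
// Hypothetical sample documents; only "monkeys-are-awesome" comes from the post.
var docs = [
  { _id: "monkeys-are-awesome", type: "article", tags: ["monkeys", "awesome"] },
  { _id: "more-monkeys", type: "article", tags: ["monkeys"] },
  { _id: "a-comment", type: "comment", tags: ["monkeys"] }
];

// Stand-in for CouchDB's emit(): just collects the emitted rows.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// The map function from the post.
function map(doc) {
  if (doc.type == 'article') {
    for (var i in doc.tags) {
      emit(doc.tags[i], 1);
    }
  }
}

docs.forEach(function (doc) { map(doc); });

// CouchDB returns view rows ordered by key; a plain string sort is a
// rough approximation of that ordering, good enough for this example.
rows.sort(function (a, b) {
  return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
});

// Three rows: one for "awesome" and two for "monkeys".
// The comment document is skipped because its type isn't 'article'.
console.log(rows);
```

Notice that "monkeys" shows up once per article that carries it, which is exactly the duplication the reduce step is meant to collapse.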
What we want to do now is to reduce the result set to a list of unique tags and the number of times those tags were found in the results.
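The reduce takes the form of a function. With grouping enabled, it's called for each unique key that appears in the map output and is passed the values that the map emitted with that key. (Strictly speaking, CouchDB passes an array of [key, document id] pairs as the first argument and may also re-run the function over its own partial results, but since this reduce only sums its values, that makes no difference here.)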
Since our keys are tags and our values are numbers we simply need to add all the numbers for each key.
function(tag, counts) {
  var sum = 0;
  for(var i=0; i < counts.length; i++) {
    sum += counts[i];
  }
  return sum;
}
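As a rough illustration of what grouping plus this reduce amounts to, here's a sketch that runs outside CouchDB. The groupAndReduce helper is a made-up stand-in for CouchDB's grouping, and the input rows are hypothetical map output, not real data.

```javascript
// The reduce function from the post.
function reduce(tag, counts) {
  var sum = 0;
  for (var i = 0; i < counts.length; i++) {
    sum += counts[i];
  }
  return sum;
}

// Hypothetical map output: one row per (tag, article) pair.
var mapRows = [
  { key: "awesome", value: 1 },
  { key: "monkeys", value: 1 },
  { key: "monkeys", value: 1 },
  { key: "monkeys", value: 1 }
];

// Stand-in for grouping by key: gather the values for each key,
// then hand each group to the reduce function.
function groupAndReduce(rows) {
  var valuesByKey = {};
  rows.forEach(function (row) {
    if (!valuesByKey[row.key]) {
      valuesByKey[row.key] = [];
    }
    valuesByKey[row.key].push(row.value);
  });
  return Object.keys(valuesByKey).sort().map(function (key) {
    return { key: key, value: reduce(key, valuesByKey[key]) };
  });
}

var reduced = groupAndReduce(mapRows);
console.log(reduced); // one row per tag: awesome with 1, monkeys with 3

// Summing partial sums gives the same total as summing the raw values,
// so this reduce is safe even if it is re-run over its own output.
var partials = [reduce("monkeys", [1, 1]), reduce("monkeys", [1])];
var combined = reduce(null, partials);
console.log(combined); // 3
```

A reduce that did something other than summing (counting its inputs, say) would not have this property and would need to handle the re-run case separately.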
This can be installed in the same way as a map function, just with the key "reduce".
{
  "tags": {
    "map": "function(doc) { if(doc.type == 'article') { for(var i in doc.tags) { emit(doc.tags[i], 1); }}}",
    "reduce": "function(tag, counts) { var sum = 0; for(var i = 0; i < counts.length; i++) { sum += counts[i]; }; return sum; }"
  }
  // other views omitted for brevity
}
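Writing functions as one-line JSON strings by hand gets fiddly. One possible approach, sketched here as an assumption rather than anything CouchDB requires, is to write the functions as ordinary JavaScript and let toString() and JSON.stringify do the escaping. The variable names tagsMap, tagsReduce and views are invented for this example.

```javascript
// Write the view functions as normal anonymous function expressions.
// emit() is never called here; the functions are only turned into text.
var tagsMap = function (doc) {
  if (doc.type == 'article') {
    for (var i in doc.tags) {
      emit(doc.tags[i], 1);
    }
  }
};

var tagsReduce = function (tag, counts) {
  var sum = 0;
  for (var i = 0; i < counts.length; i++) {
    sum += counts[i];
  }
  return sum;
};

// toString() on a function gives back its source text, which is the
// form the "map" and "reduce" keys hold in the snippet above.
var views = {
  tags: {
    map: tagsMap.toString(),
    reduce: tagsReduce.toString()
  }
};

// JSON.stringify takes care of escaping quotes and newlines.
var viewsJson = JSON.stringify(views, null, 2);
console.log(viewsJson);
```

The output has the same shape as the hand-written snippet, just with the original line breaks preserved as escaped newlines inside the strings.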
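Looking at this view in Futon will get you a nicely formatted list of tags and a count of the number of documents that have each tag. To use the view directly you must ask CouchDB to group the results by key; without grouping, the reduce runs over every row and you get back a single overall total instead of one count per tag.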
// GET http://localhost:5984/blog/_view/articles/tags?group=true&group_level=1
{"rows":[
{"key":"awesome","value":1},
{"key":"agile","value":2},
{"key":"ajax","value":2},
{"key":"apache","value":2},
{"key":"api","value":1},
{"key":"caching","value":1},
{"key":"coding","value":7},
{"key":"conference","value":1},
// and so on ...
]}
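Once the response is parsed, the rows are easy to work with on the client. Here's a minimal sketch, assuming the response has already been fetched and parsed into an object; the rows are copied from the truncated sample above, and the names countsByTag and mostUsed are invented for the example.

```javascript
// Rows copied from the sample response in the post (the elided rows are left out).
var response = {
  rows: [
    { key: "awesome", value: 1 },
    { key: "agile", value: 2 },
    { key: "ajax", value: 2 },
    { key: "apache", value: 2 },
    { key: "api", value: 1 },
    { key: "caching", value: 1 },
    { key: "coding", value: 7 },
    { key: "conference", value: 1 }
  ]
};

// Build a tag -> count lookup for quick access.
var countsByTag = {};
response.rows.forEach(function (row) {
  countsByTag[row.key] = row.value;
});
console.log(countsByTag.coding); // 7

// Pick the three most used tags: highest count first, ties broken by tag name.
var mostUsed = response.rows
  .slice() // copy so the original row order is left alone
  .sort(function (a, b) {
    return (b.value - a.value) || (a.key < b.key ? -1 : 1);
  })
  .slice(0, 3)
  .map(function (row) { return row.key; });

console.log(mostUsed); // [ 'coding', 'agile', 'ajax' ]
```

The heavy lifting (counting across every article) has already happened inside CouchDB, so all that's left on the client is presentation.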
curl -LO http://barkingiguana.com/2009/01/28/counting-tags-with-couchdb-and-map-reduce.html.orig
curl -LO http://barkingiguana.com/2009/01/28/counting-tags-with-couchdb-and-map-reduce.html.orig.asc
gpg --verify counting-tags-with-couchdb-and-map-reduce.html.orig{.asc,}
If you'd like to have a conversation about this post, email craig@barkingiguana.com. I don't bite.