Couch Cloud Tags

This is a work in progress

Tutorial: CouchCloud: Creating a Tag Cloud View in CouchDB

The goal of this tutorial is to display a tag cloud from CouchDB. In so doing we'll learn about more complex key views. This means learning about both map and reduce functions.

This tutorial assumes you have CouchDB running and have used curl to work with it. If not, go through the Hello World CouchDB Tutorial first. In that tutorial you set up CouchDB and become familiar with using curl.

1. Set Up Tutorial Database. We're going to create a new database for this tutorial which you can delete when you complete the tutorial. We'll use the name: couchtagcloud.

curl -X PUT http://localhost:5984/couchtagcloud

with a reply of

{"ok":true}

You've now created a CouchDB database.

2. Test Data. It is always a good idea to have test data early in a project, and especially in a tutorial. So we're going to add some documents, each containing an array of tags. Our tags will be: couchdb, couchapp, tags, javascript, json, views, and map. Note that all the tags are lowercase; later we should deal with upper case tags. Our test data, like all documents in CouchDB, is in JSON.

{ "tags": ["couchdb", "tags"] }
{ "tags": ["couchdb", "javascript", "json"] }
{ "tags": ["couchdb", "views", "map", "tags"] }
{ "tags": ["couchdb", "couchapp", "tags", "json"] }
{ "tags": ["couchdb", "views"] }

In a real application the documents would probably contain blog posts, articles, comments or some other text that is being tagged.
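
Step 2 notes that upper case tags will need handling at some point. One possible approach, offered here as an assumption and not something this tutorial implements, is to normalize the tags array before a document is stored. The helper name normalizeTags is hypothetical.

```javascript
// Hypothetical helper (not part of CouchDB): trims and lowercases each tag
// and drops duplicates, so "CouchDB" and "couchdb" count as the same tag.
function normalizeTags(tags) {
  var seen = {};
  var result = [];
  for (var i = 0; i < tags.length; i++) {
    var tag = String(tags[i]).trim().toLowerCase();
    if (tag !== "" && !Object.prototype.hasOwnProperty.call(seen, tag)) {
      seen[tag] = true;
      result.push(tag);
    }
  }
  return result;
}

// Build a document body with normalized tags before storing it.
var doc = { tags: normalizeTags(["CouchDB", "Tags", "couchdb "]) };
console.log(JSON.stringify(doc));
```

The printed JSON could then be used as the body of a PUT request like the ones in the next step.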

3. Store the Test Data. When we save documents in a CouchDB database we need to assign each document a unique _id. Normally we'd use a UUID (also known as a GUID). But for this tutorial we are simply going to use the first five letters of the alphabet as the _id values. This should make your typing easier. So we will put the data in using the following command lines:

curl -X PUT http://127.0.0.1:5984/couchtagcloud/a -d '{"tags": ["couchdb", "tags"] }'
curl -X PUT http://127.0.0.1:5984/couchtagcloud/b -d '{"tags": ["couchdb", "javascript", "json"] }'
curl -X PUT http://127.0.0.1:5984/couchtagcloud/c -d '{"tags": ["couchdb", "views", "map", "tags"] }'
curl -X PUT http://127.0.0.1:5984/couchtagcloud/d -d '{"tags": ["couchdb", "couchapp", "tags", "json"] }'
curl -X PUT http://127.0.0.1:5984/couchtagcloud/e -d '{"tags": ["couchdb", "views"] }'

4. Is the Data There? Let's make sure that our data is really in CouchDB. Again use curl to look at the "a" document.

curl http://localhost:5984/couchtagcloud/a

and you should get back something that looks like this, with a different _rev string:

{"_id":"a","_rev":"1-3473798464","tags":["couchdb","tags"]}

On your own you can check documents "b", "c", "d" and "e" if you'd like.

5. Result. We have a result already? It is always good to know what you are trying to accomplish with your code. This is part of test-driven development. We're not going there in this tutorial, but look for a future tutorial on the subject.

The result that our code should produce is an alphabetical list of tags along with a count of how often that tag appears in the database. There are a few different ways to encode this in JSON. So the JSON that the view should return is something like this:

{ couchapp, 1}
{ couchdb, 5}
{ javascript, 1}
{ json, 2}
{ map, 1}
{ tags, 3}
{ views, 2}

The actual JSON, as you will see, will contain more information. But this at least gives us an idea of what we are looking for.

6. Time For A View To Sort Things Out. To pull multiple documents out of CouchDB you create a view. A view is two short JavaScript functions, map and reduce, where reduce is optional. All views return JSON. We'd like the JSON to contain the information we've described in the previous step.

Most views contain multiple "rows", which are sorted by key. In a simple view they are sorted by a single field like LastName. But you may need to sort on multiple fields, like LastName and FirstName. This is almost trivial. Hint: you put LastName and FirstName into an array and use that array as the key. For our tag cloud tutorial we already know we want alphabetical order, so our key will only be the tag.
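
The hint about sorting on multiple fields can be sketched as a map function that emits an array key. The sample documents and the emit stand-in below are assumptions for illustration. Inside CouchDB, emit is supplied by the view server and rows are ordered by CouchDB's own collation, which this sketch only approximates with a simple element-by-element comparison.

```javascript
// Stand-in for CouchDB's emit(); inside a real view the server provides it.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function with a compound key: LastName first, then FirstName.
var mapByName = function (doc) {
  if (doc.LastName && doc.FirstName) {
    emit([doc.LastName, doc.FirstName], null);
  }
};

// Hypothetical sample documents.
var people = [
  { _id: "1", LastName: "Smith", FirstName: "Zoe" },
  { _id: "2", LastName: "Jones", FirstName: "Mary" },
  { _id: "3", LastName: "Smith", FirstName: "Adam" }
];
people.forEach(mapByName);

// Approximate the view ordering: compare array keys element by element.
rows.sort(function (a, b) {
  for (var i = 0; i < a.key.length; i++) {
    if (a.key[i] < b.key[i]) return -1;
    if (a.key[i] > b.key[i]) return 1;
  }
  return 0;
});

rows.forEach(function (row) {
  console.log(row.key.join(", "));
});
```

Rows with the same LastName end up next to each other, ordered by FirstName within that group.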

7. Temporary View. CouchDB has a facility for developers to easily test views. You can experiment with a temporary view until you have refined it just right. So below you'll see a number of different views as we refine our work. This is a perfect time to use Futon because it supports temporary views.

From the Futon Overview screen you should see the couchtagcloud database. Clicking on it gets you to one of several "views" of the database. Select the "temporary view" and you should see this default map function:

function(doc) {
  emit(null, doc);
}

Run it, if you'd like, and the result will be all five documents. Since every key is null, the key does not order the rows; rows with equal keys are returned in _id order.

8. First We Map (Not Nap On The Couch). Again we want our tags in alphabetical order. So the code should read each document and produce one "line" for each tag. In other words, we are going to get multiple lines for each document. This shows how flexible a view can be.

So we need to pull from each document the value of "tags", which is an array. We then step through that array. For each tag we emit the tag string (e.g., "couchapp") as the key (what we want to sort on). Since we are working with key/value pairs, we need a value for each key (tag). We are going to use the number 1 to indicate one occurrence of the tag. This is sometimes called a count view.

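function(doc) {
  // Emit one row per tag: the tag is the key, 1 counts one occurrence.
  // Documents without a tags array produce no rows.
  for (var i in doc.tags) {
    emit(doc.tags[i], 1);
  }
}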

The result on our test data is

couchapp  | 1
couchdb   | 1
couchdb   | 1
couchdb   | 1
couchdb   | 1
couchdb   | 1
javascript| 1
json      | 1
json      | 1
map       | 1
tags      | 1
tags      | 1
tags      | 1
views     | 1
views     | 1
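
The map step can also be tried outside CouchDB. The sketch below assumes Node.js and replaces CouchDB's emit with a small stand-in that collects rows. The plain string sort only approximates CouchDB's collation, which is sufficient for these lowercase tags. Each tag occurrence in the test data yields one row.

```javascript
// The five test documents from step 3.
var docs = [
  { _id: "a", tags: ["couchdb", "tags"] },
  { _id: "b", tags: ["couchdb", "javascript", "json"] },
  { _id: "c", tags: ["couchdb", "views", "map", "tags"] },
  { _id: "d", tags: ["couchdb", "couchapp", "tags", "json"] },
  { _id: "e", tags: ["couchdb", "views"] }
];

// Stand-in for CouchDB's emit(): collects the emitted rows.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// The map function from this step.
var map = function (doc) {
  for (var i in doc.tags) {
    emit(doc.tags[i], 1);
  }
};

docs.forEach(map);

// Sort rows by key, as a view would.
rows.sort(function (a, b) {
  return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
});

rows.forEach(function (row) {
  console.log(row.key + " | " + row.value);
});
console.log(rows.length + " rows");
```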

9. You Call It Reduction, I Call It Aggregation. The above view looks close. We have the tags in alphabetical order. If you've worked with databases or even just spreadsheets, you have probably recognized that we need a way to group the like tags together and get a sum. This is where the Reduce part of Map/Reduce comes into play. We're going to use a sum function.

function(keys, values) {
   return sum(values);
}

That gives us exactly what we want:

couchapp  | 1
couchdb   | 5
javascript| 1
json      | 2
map       | 1
tags      | 3
views     | 2
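
How grouping and reduce fit together can be sketched locally as well. The code below is an approximation under stated assumptions, not CouchDB's implementation: sum is a local stand-in for the helper CouchDB provides to reduce functions, and groupReduce is a hypothetical name for a function that collects the values of equal keys and reduces each group.

```javascript
// Local stand-in for the sum() helper CouchDB provides to reduce functions.
function sum(values) {
  return values.reduce(function (total, v) { return total + v; }, 0);
}

// The reduce function from this step.
var reduce = function (keys, values) {
  return sum(values);
};

// Sorted map output: one row per tag occurrence in the test data.
var mapRows = [
  "couchapp", "couchdb", "couchdb", "couchdb", "couchdb", "couchdb",
  "javascript", "json", "json", "map", "tags", "tags", "tags",
  "views", "views"
].map(function (tag) { return { key: tag, value: 1 }; });

// Hypothetical helper: group adjacent rows with equal keys, then reduce
// each group. The reduce function ignores keys, so null is passed here.
function groupReduce(sortedRows) {
  var groups = [];
  sortedRows.forEach(function (row) {
    var last = groups[groups.length - 1];
    if (last && last.key === row.key) {
      last.values.push(row.value);
    } else {
      groups.push({ key: row.key, values: [row.value] });
    }
  });
  return groups.map(function (group) {
    return { key: group.key, value: reduce(null, group.values) };
  });
}

var grouped = groupReduce(mapRows);
grouped.forEach(function (row) {
  console.log(row.key + " | " + row.value);
});
```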

10. Design Documents. There is a special type of document that you store in CouchDB called a design document. Contained within these documents is most of the code you need to make your database work: validation routines, shows, lists and views. So we need to create a design document and store the above map and reduce functions. Because a design document is JSON, each function is stored as a string. In the design document a view will look like:

{
  "language": "javascript",
  "views": {
    "by_tagcloud": {
      "map": "function(doc) { for (var i in doc.tags) { emit(doc.tags[i], 1); } }",
      "reduce": "function(keys, values) { return sum(values); }"
    }
  }
}

11. Create Our Design Document. Now we can save our view functions, map and reduce, into a design document. All design documents have an _id which starts with "_design/". For this tutorial we will use the _id "_design/couchtagcloud". So the command line would be:

curl -X PUT http://localhost:5984/couchtagcloud/_design/couchtagcloud -d '{"views" : {"by_tagcloud" : {"map" : "function(doc){for (var i in doc.tags){emit(doc.tags[i], 1)}}","reduce": "function(keys, values) {return sum(values);}"}}}'

Note: It can be very frustrating using curl with JSON. It is not easy to debug incorrect brackets or quotes. You may want to test your JSON first before using curl.
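
One way to sidestep the quoting problems is to let JavaScript build the JSON. The sketch below assumes Node.js is available. It writes the view functions as ordinary JavaScript, converts them to source strings with toString, and lets JSON.stringify handle the quoting. The variable names are illustrative.

```javascript
// View functions written as ordinary JavaScript so syntax errors surface
// when this script is loaded. emit() and sum() are only looked up when
// CouchDB actually runs the functions, so they need not exist here.
var map = function (doc) {
  for (var i in doc.tags) {
    emit(doc.tags[i], 1);
  }
};
var reduce = function (keys, values) {
  return sum(values);
};

// Design documents store view functions as strings of source code.
var designDoc = {
  language: "javascript",
  views: {
    by_tagcloud: {
      map: map.toString(),
      reduce: reduce.toString()
    }
  }
};

// JSON.stringify takes care of the brackets, quotes and escaping.
var body = JSON.stringify(designDoc);
console.log(body);
```

The printed text could be saved to a file, for example designdoc.json (a hypothetical name), and sent with curl using -d @designdoc.json in place of the inline JSON.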

You can verify that your design document is in your database with:

curl -X GET http://localhost:5984/couchtagcloud/_design/couchtagcloud

12. We've Got JSON. Our database will now return the JSON for our tag cloud. The URL is quoted so that the shell does not interpret the "?":

curl -X GET 'http://127.0.0.1:5984/couchtagcloud/_design/couchtagcloud/_view/by_tagcloud?group=true'

Giving us:

{"rows":[
{"key":"couchapp","value":1},
{"key":"couchdb","value":5},
{"key":"javascript","value":1},
{"key":"json","value":2},
{"key":"map","value":1},
{"key":"tags","value":3},
{"key":"views","value":2}
]}
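
To turn these rows into a displayed tag cloud, one option is to scale a font size by each tag's count. The sketch below is an assumption about presentation, not part of CouchDB: the function name tagCloudHtml, the pixel range and the HTML shape are all illustrative, and the tags are assumed to be plain text that needs no HTML escaping.

```javascript
// Rows as returned by the by_tagcloud view with group=true for the test data.
var viewResult = {
  rows: [
    { key: "couchapp", value: 1 },
    { key: "couchdb", value: 5 },
    { key: "javascript", value: 1 },
    { key: "json", value: 2 },
    { key: "map", value: 1 },
    { key: "tags", value: 3 },
    { key: "views", value: 2 }
  ]
};

// Hypothetical helper: maps counts linearly onto a font size range (in px).
function tagCloudHtml(rows, minSize, maxSize) {
  var counts = rows.map(function (r) { return r.value; });
  var lo = Math.min.apply(null, counts);
  var hi = Math.max.apply(null, counts);
  return rows.map(function (r) {
    // When all counts are equal, every tag gets the minimum size.
    var scale = hi === lo ? 0 : (r.value - lo) / (hi - lo);
    var size = Math.round(minSize + scale * (maxSize - minSize));
    return '<span style="font-size:' + size + 'px">' + r.key + '</span>';
  }).join(" ");
}

console.log(tagCloudHtml(viewResult.rows, 12, 32));
```

With the test data, couchdb gets the largest size and the tags that occur once get the smallest.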