NoSQL

"So we hit 9 billion records, which is of course very exciting. Traffic to our public API keeps growing–MongoDB served 100M queries in the last week and didn’t break a sweat." - Wordnik

NoSQL is a technology that is gathering support, with a bandwagon forming behind it among some of the world’s biggest companies.

Facebook use it to power their messaging system, and Digg recently moved to it. Twitter have released tools for it, and AOL use it to power some of their advertising campaigns.

A standard database can be thought of as a natural extension of using Excel. Your data is held in “tables” (think Excel sheets), and a column in each table tells it which sheet to look in for a related piece of information.

For example, we might have two “tables”, one for users, and one for widgets. A user might have columns called “email” and “password”, and a widget might have a “name” and a “cost”. To tie a user to a widget, all you’d do is store a “user row” column in the widget.

This way, if I were looking through my Excel document and saw a widget with a user row of 5, I’d know that it belongs to the user in the 5th row of the users sheet.
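
As a rough sketch of that lookup, the two sheets can be modelled as plain Python lists. The sample emails, the widget name and the `owner_of` helper are made up for illustration; a real database would do this with a join rather than list indexing.

```python
# Each list stands in for one spreadsheet "sheet"; each dict is one row.
users = [
    {"email": "ann@example.com", "password": "secret1"},
    {"email": "ben@example.com", "password": "secret2"},
    {"email": "cat@example.com", "password": "secret3"},
    {"email": "dan@example.com", "password": "secret4"},
    {"email": "eve@example.com", "password": "secret5"},
]

# The "user_row" column ties each widget back to a row in the users sheet.
widgets = [
    {"name": "Sprocket", "cost": 4.50, "user_row": 5},
]


def owner_of(widget):
    """Follow the widget's "user row" column to the matching user."""
    # Spreadsheet rows are counted from 1, Python lists from 0.
    return users[widget["user_row"] - 1]


print(owner_of(widgets[0])["email"])  # eve@example.com
```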

There have been decades of research into the fastest way to retrieve information across tables. The difficulty with the standard solutions comes when you have to split the database. One machine can only serve so much traffic at once, so sooner or later you have to manage multiple databases on multiple machines. Now, what happens if you receive a sales enquiry and you don’t know which computer the customer’s file is on? Even better, you might have their last order information on one computer, but have to go to another computer to retrieve their address!

Methods have been developed over the years to stop this situation degrading into madness, but this is the problem the NoSQL solution arose to address. NoSQL essentially discards the traditional table model of databases.

Instead, it maintains lots of individual elements, which can hold other elements inside them. The closest analogy is the files on your computer. You might have a “document” (think folder) for our user, and an “embedded document” (think file) for each of their widgets. If I want to see how many widgets Bob has bought, I just open up his document and count the embedded documents inside it.
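
To make the folder-and-file picture concrete, here is a minimal sketch using a nested Python dictionary as the “document”. The field names, sample values and the `widget_count` helper are assumptions for illustration, not the API of any particular NoSQL product.

```python
# One "document" for Bob, with his widgets stored inside it
# as "embedded documents" rather than in a separate table.
bob = {
    "email": "bob@example.com",
    "password": "not-a-real-password",
    "widgets": [
        {"name": "Sprocket", "cost": 4.50},
        {"name": "Gear", "cost": 12.00},
    ],
}


def widget_count(user_document):
    """Count the embedded widget documents inside one user document."""
    return len(user_document["widgets"])


print(widget_count(bob))  # 2
```

Everything about Bob travels together, so answering the question needs no lookup in a second sheet.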

Pros:

  • Scalable: If I need to expand my database past one machine, it’s simple. Given the widget example, I could say that “customers with names starting a-n are on machine 1, o-z are on machine 2”. I’d then know exactly where to go when Bob called up about re-ordering something he’d bought from us before.
  • Customizable: If I want to record that Bob has an email, a mobile number and a website, while all my other clients only have mobiles, I don’t need to add 2 blank contact columns to every other contact. I just add all 3 to Bob, and leave the others with just the one. Wonderful when almost every one of your products has many custom fields!
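
Both points can be sketched in a few lines of Python. The `machine_for` function and the sample contacts below are hypothetical; real systems usually split data on a chosen key with more care than a first-letter rule, but the idea is the same.

```python
def machine_for(customer_name):
    """Return 1 for names starting a-n and 2 for names starting o-z."""
    first = customer_name.strip().lower()[:1]
    if "a" <= first <= "n":
        return 1
    if "o" <= first <= "z":
        return 2
    raise ValueError("name must start with a letter a-z: %r" % customer_name)


# Documents need not share the same fields: Bob has three ways to
# reach him, Olivia has one, and nobody carries blank columns.
contacts = [
    {"name": "Bob", "mobile": "555-0100",
     "email": "bob@example.com", "website": "bob.example.com"},
    {"name": "Olivia", "mobile": "555-0101"},
]

for contact in contacts:
    print(contact["name"], "-> machine", machine_for(contact["name"]))
```

When Bob calls, `machine_for("Bob")` says to look on machine 1 without asking machine 2 anything.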

Cons:

  • Developers: Pretty much every web development firm in the world is familiar with some dialect of SQL. If your lead database developer quits, you can easily replace them. NoSQL, though it’s getting more popular, has nowhere near the adoption of SQL, so a replacement is harder to find.
  • Size: NoSQL’s biggest strength shows when you are dealing with massive datasets. Very few companies ever get near the load that Facebook or AOL deal with on a daily basis.