13 January 2012

Don't Be a Dupe

The post on the NoSql dust up from Postgres led to me adding a comment on one of the posts, and now Chris Travers has commented on my post (ring around a rosy?). You can find his comment here(scroll to the bottom).

Rather than waste the opportunity in a comment, and because I truly believe it's worthwhile on it's own, herewith.

I do not believe that there is any use case for NoSql, on technical grounds. That is to say, there are no use cases where NoSql provides a superior solution to the RM/SQL (henceforth, just SQL). I do believe that NoSql is chosen 99.44% of the time just because it implements Programmers' Perpetual Employment Paradise. No database, all COBOL forever; well, java or C++/#.

The reason SQL is always superior is because:
- it implements the smallest real data footprint to logical data footprint
- it enforces logical data integrity

One of the hot new technologies, over the last few years, has been data deduplication. Dedupe is just an ad hoc approach to 5NF, with none of the benefits of SQL; get rid of the redundancies. But there's no structure or control. One of the objections to dedupe tech is the fragility of it. How much of a glitch will cause data corruption, and so on.

Now, for non-transactional datastores with no logical structure, then SQL databases probably don't matter much. If one considers that a use case, then OK by me. I wouldn't spend my time with such an application, that's for sure.

4 comments:

Chris Travers said...

Ok, to defend my viewpoint here. I have done a fair bit of programming on non-relational as well as relational databases. The non-relational ones I have invariably been backed by BDB, either because I was working with BDB directly or because another database system I was using (for example OpenLDAP) was using it on a lower level. In particular my experience with directory services platforms suggests that in fact there are cases where non-relational solutions are inherently better than relational ones.

Let's start by explaining what an RDBMS does which makes it so powerful. This is quite simple: An RDBMS is basically a math engine to provide a means of storing, retrieving, manipulating, and presenting data in a way grounded in set theory. You can take this quite far and in fact all of the great use cases of an RDBMS are based on this.

Indeed one reason I generally suggest putting a fair big of business logic in the database is that you can take this set-based approach very, very far when you are using stored procedures. Set-based approaches are also extremely useful when doing reporting, whether ad-hoc or within the system. So this level of flexibility and power when it comes to reporting is unequalled in any non-relatoinal system. It also means that for many things an RDBMS will beat the pants off any non-relational system performance-wise.

The problem is though that there are some sorts of applications where all of the following are true and LDAP directories, for example, hit all three of these on the nose:

1) There are no real reporting needs for the data within the database.

2) Interoperability takes place using a dedicated, well developed networked protocol that is widely supported.

3) While data access functions can be represented as set operations, these are unusually expensive. Access traversing a tree structure for example (as in LDAP) is slow in a relational db.

An LDAP directory (which is really a structured, hierarchical database) hits all three of these on the nose, and therefore even the most pro-RDBMS folks I know do not tend to recommend backing one with an RDBMS where any load is involved. That doesn't mean that an RDBMS has no place in the environment but it means the mainline servers are better off running a non-relational store. So you might run a master LDAP server backed by a relational db for reporting and push changes out to production servers which handle the requests and do not use an RDBMS on the back-end.

So saying no use cases for non-relational stores is definitely that says "hammer syndrome" to me. After all, if that was the case, we wouldn't have filesystems anymore (another document-oriented hierarchical database management system).

The problem exists on the other side: RDBMS's win out overall when any ONE of the following is true:

1) There are significant reporting needs in the application, whether built-in or ad-hoc

2) Integration may need to be handled by feeding data directly into the back-end, or

3) The data access operations can be reduced to relatively cheap set-based operations.

I think you will find that the vast majority of the time, NoSQL is being implemented when it shouldn't be, i.e. where at least one of the above is true. However, there are narrow use cases where such a decision makes sense. I would agree with you that of actual deployments of NoSQL technologies though, less than 1% are using it for the right reasons.

Anonymous said...

If NoSQL is simply key->value data store, then you may have plenty of them inside RDBMS just by means of 'CREATE TABLE nosql_1 (key ..., value ...)'. Won't that be pretty much equivalent to BDB? Or am I missing something?

Anonymous said...

> there are no use cases where NoSql provides a superior solution to the RM/SQL (henceforth, just SQL).

Have to seen Event Sourcing, coupled with DDD+CQRS (http://abdullin.com/cqrs/), is one valid case where use of SQL is optional.

Robert Young said...

The notion that there are no use-cases where NoSql is superior is based on the notion that the RM/SQL provides the smallest footprint, far smaller than flatfile structures. Yes, one (generally a coder who doesn't want centralized control of the data) can use flatfile structures, but you'll end up with more work and more code (including client and some sort of "server"). For coders, that's a good answer. Just not for the folks paying the freight.

For an absolute zero integrity case, e.g. search, one could devise a code/data solution. Then SQL still has the size advantage.

In my experience, the flatfile folks are universally coders who either don't know much about Codd/Date/RM/SQL or actively wish to subvert it. That's fine, but it's not intellectually honest.

As to key/value in SQL databases, sure, one could do that, but it would like putting a '79 Yugo engine in a Ferrari FXX: it looks like a Ferrari, but wastes the experience.

1% is a generous give-back.