Dr. Codd Was Right: Don't Be a Dupe

Kent State Leads by 2 - Let's Go MAGA!! Ya couda had #3 on 2/2

[T]hey've got to draw in their horns and stop their aggression, or we're going to bomb them back into the Stone Age. And we would shove them back into the Stone Age with Air power or Naval power -- not with ground forces.
-- Curtis LeMay/1965 [didn't work in Nam, won't work in Iran; these boots are made for walkin]

We'll smash down your doors, we don't bother to knock
We've done it before, so why all the shock?
We're the biggest and the toughest kids on the block
And we're the cops of the world, boys
We're the cops of the world
-- Phil Ochs/1966 [from back when Americans gave a shit]

There's not a single example of things working out for the appeaser.
-- Nicolle Wallace/2024 [like this? the next extortion is on the way]

I'm hiding in honduras, I'm a desperate man
Send lawyers, guns, and money
The shit has hit the fan
-- Warren Zevon/1978 [Bone Spur Samurai^©'s sphincter gets tighter by the second]

Effective 1 April 2026, by this Executive Order, only attorneys approved by U.S. Attorney General ~~Pam Bondi~~ will be allowed to practice law at the Federal, State, and Local levels. Bar credentials are hereby deleted. Bar associations are eliminated. [next step from EO of 7 December 2025]
-- Hate filled, Petty, Paranoid, Demented, Dictator Don/20 March 2026

I have had to explain and re-explain and re-explain and re-explain, you know, how relational databases work, what is an eigenvector, what is dimensionality reduction.

-- Christopher Wylie/2018

I believe quite strongly that, if you think about the issue at the appropriate level of abstraction, you're inexorably led to the position that databases must be relational.

-- Chris Date/2009

This Week's thought

It is going to make their quality go up and their productivity go up. I mean by every company I literally mean every company.
-- Jerk Bozos/2026 [productivity moolah goes to Capital. who'll have money to buy all that increased productivity?]

See you next week in a brand new show^{©Heckle and Jeckle}

Therefore:

In a time of SSD, multi-core/processor, two terabyte memory and Optane App Direct Mode (RIP) machines, there is no reason not to build from BCNF data. Time to do what Dr. Codd demonstrated. Technology has finally caught up with the maths.

13 January 2012

Don't Be a Dupe

The post on the NoSql dust up from Postgres led to me adding a comment on one of the posts, and now Chris Travers has commented on my post (ring around a rosy?). You can find his comment here(scroll to the bottom).

Rather than waste the opportunity in a comment, and because I truly believe it's worthwhile on it's own, herewith.

I do not believe that there is any use case for NoSql, on technical grounds. That is to say, there are no use cases where NoSql provides a superior solution to the RM/SQL (henceforth, just SQL). I do believe that NoSql is chosen 99.44% of the time just because it implements Programmers' Perpetual Employment Paradise. No database, all COBOL forever; well, java or C++/#.

The reason SQL is always superior is because:
- it implements the smallest real data footprint to logical data footprint
- it enforces logical data integrity

One of the hot new technologies, over the last few years, has been data deduplication. Dedupe is just an ad hoc approach to 5NF, with none of the benefits of SQL; get rid of the redundancies. But there's no structure or control. One of the objections to dedupe tech is the fragility of it. How much of a glitch will cause data corruption, and so on.

Now, for non-transactional datastores with no logical structure, then SQL databases probably don't matter much. If one considers that a use case, then OK by me. I wouldn't spend my time with such an application, that's for sure.

4 comments:

Chris Travers said...: Ok, to defend my viewpoint here. I have done a fair bit of programming on non-relational as well as relational databases. The non-relational ones I have invariably been backed by BDB, either because I was working with BDB directly or because another database system I was using (for example OpenLDAP) was using it on a lower level. In particular my experience with directory services platforms suggests that in fact there are cases where non-relational solutions are inherently better than relational ones.

Let's start by explaining what an RDBMS does which makes it so powerful. This is quite simple: An RDBMS is basically a math engine to provide a means of storing, retrieving, manipulating, and presenting data in a way grounded in set theory. You can take this quite far and in fact all of the great use cases of an RDBMS are based on this.

Indeed one reason I generally suggest putting a fair big of business logic in the database is that you can take this set-based approach very, very far when you are using stored procedures. Set-based approaches are also extremely useful when doing reporting, whether ad-hoc or within the system. So this level of flexibility and power when it comes to reporting is unequalled in any non-relatoinal system. It also means that for many things an RDBMS will beat the pants off any non-relational system performance-wise.

The problem is though that there are some sorts of applications where all of the following are true and LDAP directories, for example, hit all three of these on the nose:

1) There are no real reporting needs for the data within the database.

2) Interoperability takes place using a dedicated, well developed networked protocol that is widely supported.

3) While data access functions can be represented as set operations, these are unusually expensive. Access traversing a tree structure for example (as in LDAP) is slow in a relational db.

An LDAP directory (which is really a structured, hierarchical database) hits all three of these on the nose, and therefore even the most pro-RDBMS folks I know do not tend to recommend backing one with an RDBMS where any load is involved. That doesn't mean that an RDBMS has no place in the environment but it means the mainline servers are better off running a non-relational store. So you might run a master LDAP server backed by a relational db for reporting and push changes out to production servers which handle the requests and do not use an RDBMS on the back-end.

So saying no use cases for non-relational stores is definitely that says "hammer syndrome" to me. After all, if that was the case, we wouldn't have filesystems anymore (another document-oriented hierarchical database management system).

The problem exists on the other side: RDBMS's win out overall when any ONE of the following is true:

1) There are significant reporting needs in the application, whether built-in or ad-hoc

2) Integration may need to be handled by feeding data directly into the back-end, or

3) The data access operations can be reduced to relatively cheap set-based operations.

I think you will find that the vast majority of the time, NoSQL is being implemented when it shouldn't be, i.e. where at least one of the above is true. However, there are narrow use cases where such a decision makes sense. I would agree with you that of actual deployments of NoSQL technologies though, less than 1% are using it for the right reasons.; January 14, 2012 at 9:28 AM
Anonymous said...: If NoSQL is simply key->value data store, then you may have plenty of them inside RDBMS just by means of 'CREATE TABLE nosql_1 (key ..., value ...)'. Won't that be pretty much equivalent to BDB? Or am I missing something?; January 21, 2012 at 12:16 PM
Anonymous said...: > there are no use cases where NoSql provides a superior solution to the RM/SQL (henceforth, just SQL).

Have to seen Event Sourcing, coupled with DDD+CQRS (http://abdullin.com/cqrs/), is one valid case where use of SQL is optional.; January 22, 2012 at 3:04 PM
Robert Young said...: The notion that there are no use-cases where NoSql is superior is based on the notion that the RM/SQL provides the smallest footprint, far smaller than flatfile structures. Yes, one (generally a coder who doesn't want centralized control of the data) can use flatfile structures, but you'll end up with more work and more code (including client and some sort of "server"). For coders, that's a good answer. Just not for the folks paying the freight.

For an absolute zero integrity case, e.g. search, one could devise a code/data solution. Then SQL still has the size advantage.

In my experience, the flatfile folks are universally coders who either don't know much about Codd/Date/RM/SQL or actively wish to subvert it. That's fine, but it's not intellectually honest.

As to key/value in SQL databases, sure, one could do that, but it would like putting a '79 Yugo engine in a Ferrari FXX: it looks like a Ferrari, but wastes the experience.

1% is a generous give-back.; January 22, 2012 at 4:15 PM

Dr. Codd Was Right

Kent State Leads by 2 - Let's Go MAGA!! Ya couda had #3 on 2/2

About

Shameless Plug

Extended Pieces

Good Stuff

Followers

Blog Archive

13 January 2012

Don't Be a Dupe

4 comments: