Dr. Codd Was Right: July 2011

29 July 2011

STEC Crashes

What's it all mean? Beyond losing a bunch o' cash for those holding (not I)?

It might be a bad thing for BCNF on SSD, but maybe not. It kind of depends. According to reports from the conference call, STEC's clients (who are mostly storage vendors, not the user enterprises) are replacing its parts with cheaper SATA drives and protocol-morphing dongles. If true, then STEC's fall, while not good for STEC, is not relevant to the SSD Revolution.

On the other hand, if this means that SSD is being shifted aside from primary datastore to cache/Tier0/foo, then it bodes ill for my version of the Revolution. In Enterprise, at least. I could live with that. Enterprise has an absolutely reactionary tilt; they keep 40-year-old COBOL systems alive. Why isn't there a Do Not Resuscitate order for dying code?

New systems, from smaller builders, are where the "innovation" will come from. I can live with that. If I never see the inside of a Fortune 500 building (as an employee, that is), that is perfectly OK.

28 July 2011

Mongo Loves Candy

I recently chatted with some folks about using real databases, from RoR, to solve real problems. I'm not so sure they're interested in real databases, but they're interested in Rails. Along the way, they mentioned that they'd been using Fusion-io SSDs. Be still my heart! Turns out that they have a separate datastore, in MongoDB, which had become as slow as molasses uphill in winter. So they bought a 1TB Fusion-io card in hopes of speeding things up. Didn't work out.

What's not widely understood about PCIe SSDs is that they are, more or less, heavily dependent on the host CPU to get the work done. Or, as Zsolt puts it (on today's front page): "how much of the host CPU power is needed to make the SSDs work? - this is important if you're trying to fix an already overloaded production server - because you can't afford to lose performance while you tune the hot spots (even if the theoretical end point of the tuning process is faster)". I suspect they might decide MongoDB is the problem (document datastores make my teeth hurt). SSDs with BCNF databases will generate real performance improvements. PCIe cards are not indicated if the problem is CPU-bound.

One can find out, well enough anyway, whether the process is CPU-bound or I/O-bound with iostat and vmstat on *nix systems. That's the place to start.
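For the Linux case, the numbers vmstat reports in its us/sy/id/wa columns come from the aggregate "cpu" line of /proc/stat. The sketch below takes two snapshots of that line and gives a rough verdict. It assumes the Linux /proc/stat layout (user, nice, system, idle, iowait, irq, softirq, steal), and the 85% busy / 20% iowait cutoffs are hypothetical rules of thumb, not anything vmstat itself defines.

```python
from dataclasses import dataclass


@dataclass
class CpuSample:
    busy: int    # user + nice + system + irq + softirq + steal, in jiffies
    idle: int    # plain idle jiffies
    iowait: int  # idle jiffies spent with block I/O outstanding


def parse_cpu_line(line: str) -> CpuSample:
    """Parse the aggregate 'cpu' line of Linux /proc/stat (assumed layout)."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    user, nice, system, idle, iowait, irq, softirq = (int(x) for x in fields[1:8])
    steal = int(fields[8]) if len(fields) > 8 else 0
    # guest/guest_nice are already folded into user/nice, so they are not added again.
    return CpuSample(busy=user + nice + system + irq + softirq + steal,
                     idle=idle, iowait=iowait)


def read_snapshot(path: str = "/proc/stat") -> CpuSample:
    """The first line of /proc/stat is the aggregate over all CPUs."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


def classify(before: CpuSample, after: CpuSample,
             busy_limit: float = 0.85, wait_limit: float = 0.20) -> str:
    """Rough verdict from two snapshots; the default limits are hypothetical."""
    busy = after.busy - before.busy
    idle = after.idle - before.idle
    wait = after.iowait - before.iowait
    total = busy + idle + wait
    if total <= 0:
        raise ValueError("snapshots must be taken some time apart")
    if busy / total >= busy_limit:
        return "cpu-bound"   # a faster disk (PCIe or otherwise) won't help much
    if wait / total >= wait_limit:
        return "io-bound"    # storage is the likelier bottleneck
    return "neither"
```

On a live box one would call `read_snapshot()`, sleep a few seconds under load, call it again, and hand both samples to `classify`. It is only a first cut; iostat's per-device utilization is still the place to look once "io-bound" comes back.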

27 July 2011

Ohm's Law

I've come to enjoy Christophe Pettus' postings linked from the PostgreSQL site. He does a neat presentation. This is his latest. Note especially pages 50 and following. While he's a Python/Postgres kind of person, and I'm currently exploring RoR again (long story), he says things the way I do. I'm not quite as famous as he is, of course. In the database is truth. Note in particular his observations with regard to "cloud" I/O; they match what I've always suspected. It's your data; don't treat it like a red-haired stepchild. The SSD is the future of normalized, i.e., fast, data. The "data explosion" is largely the result of bad (non-existent?) data modeling. Cloud is all about minimalist/commodity parts which are easily reassignable. If anything kills off the RM, it will be public clouds. Coders get infinite employment, and the profession relives the 1960s. Sniff.

So far as that goes, what he says about coders abusing the database from Django is about what I've seen with coders abusing the database from RoR; maybe more so, given David's attitude toward data. The problem with ORMs is that they seek to solve a problem created by OO coders, one which doesn't exist in the Real World. Such coders call the problem Impedance Mismatch, which amounts to an assumption that objects can't be populated with data from the RM. But it's just an assumption. What they steadfastly (shades of Tea Baggers, what?) refuse to acknowledge is that BCNF databases allow for construction of arbitrarily complex data structures, unlike the hierarchic/IMS/XML approach, which is locked into a parent/child structure. Change that structure, and all the application code which manages it has to change. Well, unless you've written a bare-bones RM engine into your application. Don't laugh; I've lived through folks doing just that.
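To make the "just an assumption" point concrete: populating objects from normalized tables takes one join and a loop. The sketch below uses Python's stdlib sqlite3; the `customer`/`orders` tables and the `Customer`/`Order` classes are hypothetical, invented only for illustration, and no ORM is involved.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List


@dataclass
class Order:
    order_id: int
    total: float


@dataclass
class Customer:
    customer_id: int
    name: str
    orders: List[Order] = field(default_factory=list)


def demo_connection() -> sqlite3.Connection:
    """Hypothetical schema and rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
            total       REAL NOT NULL
        );
        INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO orders VALUES (10, 1, 99.5), (11, 1, 12.0);
    """)
    return conn


def load_customers(conn: sqlite3.Connection) -> List[Customer]:
    """Build the parent/child object graph from one relational query."""
    rows = conn.execute("""
        SELECT c.customer_id, c.name, o.order_id, o.total
          FROM customer AS c
          LEFT JOIN orders AS o ON o.customer_id = c.customer_id
         ORDER BY c.customer_id, o.order_id
    """)
    customers = {}
    for cust_id, name, order_id, total in rows:
        cust = customers.setdefault(cust_id, Customer(cust_id, name))
        if order_id is not None:          # LEFT JOIN yields NULLs for no orders
            cust.orders.append(Order(order_id, total))
    return list(customers.values())
```

The hierarchy lives only in the query and the loop, not in the storage; a different report could assemble the same tables into a different shape without touching the schema.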

The world isn't hierarchic, no matter what OO/XML folks want to assert. I've worked lots of places, small to huge, and the archetype for the hierarchic structure doesn't actually exist. That structure is the Org Chart. In the hypothetical world, each worker bee has one, and only one, supervisor. The real world is run on Matrix Management: one has a supervisor du jour, never the same one each day, varying by project/location/assignment/foobar. The real world is relational; connections come and go, in vivid multiplicity. The relational model stores such connections natively. From this structure can be built any set of connections which arise. By *not predefining* the connections, only the absolute identities of each type/rule, one can create new relationships simply by naming new foreign keys (cross-reference tables, by various names, for many-to-many relations).
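A minimal sketch of Matrix Management as a cross-reference table, using Python's stdlib sqlite3. The `employee`, `project` and `assignment` tables and their rows are hypothetical. Each assignment row names a worker, a project and that project's supervisor, so one worker can have as many supervisors as projects, with no parent pointer stored in `employee` itself.

```python
import sqlite3

SCHEMA = """
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE project (
    project_id  INTEGER PRIMARY KEY,
    title       TEXT NOT NULL
);
-- Cross-reference table: one row per (worker, project), naming that project's supervisor.
CREATE TABLE assignment (
    employee_id   INTEGER NOT NULL REFERENCES employee(employee_id),
    project_id    INTEGER NOT NULL REFERENCES project(project_id),
    supervisor_id INTEGER NOT NULL REFERENCES employee(employee_id),
    PRIMARY KEY (employee_id, project_id)
);
"""


def build_demo() -> sqlite3.Connection:
    """Hypothetical data: Carol reports to Alice on one project, Bob on another."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # have the engine enforce the REFERENCES clauses
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO employee VALUES (?, ?)",
                     [(1, "Alice"), (2, "Bob"), (3, "Carol")])
    conn.executemany("INSERT INTO project VALUES (?, ?)",
                     [(100, "Billing"), (200, "Warehouse")])
    conn.executemany("INSERT INTO assignment VALUES (?, ?, ?)",
                     [(3, 100, 1), (3, 200, 2)])
    conn.commit()
    return conn


def supervisors_of(conn: sqlite3.Connection, employee_id: int) -> list:
    """Every supervisor a worker currently answers to, across all projects."""
    rows = conn.execute("""
        SELECT DISTINCT s.name
          FROM assignment AS a
          JOIN employee AS s ON s.employee_id = a.supervisor_id
         WHERE a.employee_id = ?
         ORDER BY s.name
    """, (employee_id,))
    return [name for (name,) in rows]
```

A new kind of connection (say, a mentor) would be one more table with its own foreign keys; `employee` and the existing queries stay as they are.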

One can also add new data without (if one has been moderately smart with the DDL/SQL) clobbering any existing SQL or (heaven help us all) application code which directly queries the DB. Existing queries can ignore, if desired, new columns and new tables, so long as one avoids 'SELECT * FROM ...', of course. You would never do that, right?
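Here is a small sketch of why the explicit column list matters, using Python's stdlib sqlite3 and a hypothetical `part` table. After `ALTER TABLE ... ADD COLUMN`, the query that names its columns returns exactly what it did before, while `SELECT *` quietly grows a field.

```python
import sqlite3


def schema_change_demo():
    """Run the same two queries before and after adding a column (hypothetical table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE part (part_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute("INSERT INTO part VALUES (1, 'widget')")

    explicit = "SELECT part_id, name FROM part"
    star = "SELECT * FROM part"
    before = (conn.execute(explicit).fetchall(), conn.execute(star).fetchall())

    # The schema evolves: a new column with a default; existing rows are untouched.
    conn.execute("ALTER TABLE part ADD COLUMN weight_kg REAL DEFAULT 0.0")

    after = (conn.execute(explicit).fetchall(), conn.execute(star).fetchall())
    return before, after
```

Any application code doing `part_id, name = row` against the `SELECT *` result would now fail with a `ValueError` (too many values to unpack); the explicit query never notices the change.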

13 July 2011

M'mmm, Kool Aid

I'll include some of the text since, the way Zsolt's site is structured, entries tend to disappear down a rabbit hole. Today, this is still on the front page. Go there to finish it up: he's got quite a lot to chew on, and he does the site for a living.

Editor:- July 11, 2011 - I recently had a conversation with a very knowledgeable strategist at a leading enterprise storage software company. I won't say who the company is - but if I did - most of you would know the name.

The interesting thing for me was that he'd recognized that if the hardware architecture of the datacenter is going to change due to the widespread adoption of solid state storage - that will create new markets for traditional software companies too.

And I'm not talking here about new software which simply helps SSDs to work or interoperate with hard drives - but software which does useful things with your data - and which can take advantage of different assumptions about how quickly it can get to that data - and how much intensive manipulation it can do with it.

While he doesn't say BCNF-RDBMS in his text, he's saying it. I've been bugging him for some time to drink the Kool-Aid. It sounds as though both he and the unnamed "strategist" (no, not I, alas) have quaffed deeply. Face it, if all you do is keep appending flat-file "fields and records" to some file, not only do you never get ahead of the bull, but you get gored sooner or later. BCNF is the *only* hope. Yes, this requires designers/developers to actually *think* about the data. But isn't that why we get paid the *big bucks*?
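For those who want the "think about the data" step made mechanical: a relation is in BCNF when the left side of every nontrivial functional dependency is a superkey. The sketch below checks that with the textbook attribute-closure algorithm. The flat "order line" record and its dependencies are hypothetical, chosen to look like the usual appended-fields file.

```python
def fd(lhs: str, rhs: str):
    """Build a functional dependency lhs -> rhs from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_violations(schema, fds):
    """Nontrivial FDs whose left side does not determine the whole schema."""
    schema = frozenset(schema)
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not schema <= closure(lhs, fds)]


# Hypothetical flat-file record: every "field" appended to one place.
FLAT = {"order_id", "line_no", "customer_id", "customer_name", "product_id", "qty"}
FLAT_FDS = [
    fd("order_id line_no", "product_id qty"),
    fd("order_id", "customer_id"),
    fd("customer_id", "customer_name"),
]
```

`bcnf_violations(FLAT, FLAT_FDS)` flags `order_id -> customer_id` and `customer_id -> customer_name`: the customer's name is repeated on every line of every order, ready to drift. Decomposing into order lines (`order_id line_no product_id qty`), order headers (`order_id customer_id`) and customers (`customer_id customer_name`), each checked against its own dependencies, leaves nothing flagged.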

(OK, I went a bit asterisk nuts with this one. Finding validation does do that.)

12 July 2011

What's Up Doc??

Another in the occasional series of posts from elsewhere. This time it's simple-talk (no surprise there), with a thread on bugs. Herewith my contribution, because the issue of buggy software can't be divorced from the application architecture and data language.

Ultimately there are two categories of bugs:
A) those caused by stupidity, inattention, carelessness, etc.
B) those that are the result of extending the developer's/team's experience

The entire ecosystem around each is necessarily different, and there are multiple approaches.

The A variety will be dealt with as the ethos of the organization dictates; anywhere from fired on first mistake to employed forever out of harm's way. Detecting such bugs should be possible with known testing harnesses/practices.

The B variety is more interesting.

For those in the BCNF realm, much of what passes for "new technology" in data stores and processing is VSAM redux, which brings with it the COBOL RBAR (row by agonizing row) mentality, regardless of the source language. This POV is wrapped in whatever jargon is native: NoSQL, Hadoop, MapReduce/BigData/foobar. But the fact remains that coders are implementing ACID (if they care at all about their data) in some high-level language outside the storage engine.
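A small sketch of what is lost when the unit of work moves out of the engine, using Python's stdlib sqlite3 and a hypothetical `account` table with a `CHECK (balance >= 0)` rule. The RBAR version commits each row as it goes, so a failure halfway through leaves the table half-updated; the set-based version is one statement in one transaction, so the engine applies it to every row or to none.

```python
import sqlite3


def make_accounts() -> sqlite3.Connection:
    """Hypothetical accounts; the engine, not the client, owns the 'no overdraft' rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE account (
                        account_id INTEGER PRIMARY KEY,
                        balance    INTEGER NOT NULL CHECK (balance >= 0))""")
    conn.executemany("INSERT INTO account VALUES (?, ?)",
                     [(1, 500), (2, 50), (3, 900)])
    conn.commit()
    return conn


def fee_rbar(conn: sqlite3.Connection, fee: int) -> None:
    """Row by agonizing row: each row is its own unit of work."""
    rows = conn.execute("SELECT account_id FROM account ORDER BY account_id").fetchall()
    for (acct,) in rows:
        conn.execute("UPDATE account SET balance = balance - ? WHERE account_id = ?",
                     (fee, acct))
        conn.commit()  # earlier rows stay committed if a later row fails


def fee_set_based(conn: sqlite3.Connection, fee: int) -> None:
    """One declarative statement; `with conn` commits on success, rolls back on error."""
    with conn:
        conn.execute("UPDATE account SET balance = balance - ?", (fee,))


def balances(conn: sqlite3.Connection) -> list:
    return [b for (b,) in conn.execute("SELECT balance FROM account ORDER BY account_id")]
```

With a fee of 100, account 2 would go negative. `fee_rbar` raises `sqlite3.IntegrityError` after account 1 has already been charged, leaving balances of 400, 50, 900; `fee_set_based` raises the same error and leaves all three balances untouched.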

Whether the organization realizes its mistake and implements engine-side processing, à la Phil's current article, or undertakes to use the FOTM (flavor of the month) client-side framework, the coders are left in unexplored territory.

Left unexplored, generally, is an analysis of which architecture (engine-side vs. client-side vs. application language vs. database engine [not all do all things well]) is the least prone to both type A and type B errors for the application at hand.

Declarative languages (SQL, Prolog) just tend toward fewer errors. But SQL is dependent on schema quality, and coders tend to view schema specification as a low-value, unimportant task; certainly not one for which specific expertise and experience is required. Any coder can do it.

The bugs that matter, which mess up the datastore, are just less likely if processing stays in the engine. Bugs which consist of ugly fonts, not so much.
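One example of a datastore-damaging bug the engine can simply refuse: a careless (type A) cleanup routine deletes a customer and forgets its orders. The sketch below uses Python's stdlib sqlite3 with hypothetical `customer`/`orders` tables, run once with foreign-key enforcement off and once with it on (SQLite leaves it off unless `PRAGMA foreign_keys = ON` is issued per connection).

```python
import sqlite3


def make_orders_db(enforce: bool) -> sqlite3.Connection:
    """Hypothetical schema; `enforce` toggles SQLite's foreign-key checking."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce else 'OFF'}")
    conn.executescript("""
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
        );
        INSERT INTO customer VALUES (1, 'Acme');
        INSERT INTO orders VALUES (10, 1);
    """)
    return conn


def buggy_cleanup(conn: sqlite3.Connection) -> None:
    """The bug: delete the parent row, forget about the children."""
    with conn:
        conn.execute("DELETE FROM customer WHERE customer_id = 1")


def orphan_count(conn: sqlite3.Connection) -> int:
    """Orders whose customer no longer exists."""
    return conn.execute("""
        SELECT COUNT(*) FROM orders AS o
         WHERE NOT EXISTS (SELECT 1 FROM customer AS c
                            WHERE c.customer_id = o.customer_id)
    """).fetchone()[0]
```

Without enforcement the delete succeeds and one orphaned order is left behind for someone to discover later. With enforcement the delete raises `sqlite3.IntegrityError`, the transaction is rolled back, and the data stays consistent; the bug surfaces immediately instead of in the datastore.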

As to IT managers, again, two categories: those who were and are technically superior, and those who never were. The former, rare as they are, generally get more done. The latter do awesome PowerPoints.

09 July 2011

Workin' on the Chain Gang

LinkedIn, LinkedIn, whatever are we to do with you? I've not had anything to say, given how silly the whole mess is, but today's NY Times has an almost true article. I don't have meaningful disagreement with the problems raised in the article, but it avoids the underlying issue. (That it supports my thesis that advertising-based business is inherently unstable is another attaboy for me.)

No, the problem with LinkedIn is that the business model is foolish. The business model is based on the assertion that people without employment and income will rush out to buy stuff. How stupid is that? One can slather on all sorts of finery, but that's the business model. At least Google attaches ads to activities utilized by everybody.

There's a reason that employment agencies charge money for their services: they actually do some work. Most of it is negative, weeding out otherwise qualified folks for essentially arbitrary reasons. LinkedIn presumes that if an unemployed person is known to the employed, hiring agents will be emboldened to consider that person for a position. Factually false. Been there, done that. Employers, though it be illegal, are more than willing to admit to not interviewing the unemployed.

What, then, about those on LinkedIn who are currently employed? Will they be buying stuff? Maybe. Maybe not. Take the folks from my last employer that LinkedIn offers up each week or so. Are they looking? I don't know. I do know that 99.44% of them (both young and old) have never worked anywhere else or on any other software; in many cases, only the decades-old COBOL that constitutes the application. On a mainframe. Will such folks be buying stuff? Probably not.

Near as I can tell, LinkedIn, whether its progenitors say so or not, is attempting to implement what the high-end (or low-end, depending on your point of view) agencies promote: access to the hidden job market. Whether such a market actually exists has been a matter of controversy at least since the 1970s, lawsuits and all. For companies large enough to have an HR department, ain't nobody gettin' through without they go through them. It's job preservation, after all. For the SMB crowd, it might work. For startups (where the really interesting, and vastly stupid, activity is), even less so.

LinkedIn is a bottle rocket, soon enough to come crashing down. Google needn't worry that it is the advert server to fear. There will be such an advert server, as I have written. LinkedIn isn't it.
