17 January 2013

Don't Hate Me Because I'm Thin

The major corollary of this endeavor is that smart-server/dumb-client computing, with a high NF RDBMS calling the tune, is our Back to the Future moment. One of the touchstones of this re-conversion is the VT-220, a terminal made by DEC for years. It turned out to be so popular that many other terminal vendors emulated it; I saw lots of Wyse terminals running in VT-220 mode.

So, here's what Dell is up to. Dell bought Wyse some time back.

You will be assimilated. You are number 6. (Watch out for the bubbles.)

15 January 2013

Cleave Only to Thy Foxy Lady

It's been a while since a missive (well, rant) on the future of computing appeared here. To recap: the future, for those that prosper, will be multi-processor/core/thread/SSD servers running *nix and an industrial strength RDBMS. I will be kind to the MS folks, and admit that SQL Server (rather a nice engine) might save Windows; but it's a slim shady chance.

News today: this posting suggests that MS cleave to Mozilla in hopes of surviving. Whether co-incidence or not, Apple's share price is diving as I type. Could be transitory. Might not be. Ignoring 80% or more of consumers might catch up with you. It even caught up with Rolls-Royce; the Krauts, defeated by the RAF, ended up in control. Kind of like MS marrying Mozilla, in reverse.

Anyway, here's a tidbit from the piece:
Which is why Mozilla's approach is so intriguing. The company isn't going after high-end smartphones, but rather after low-end, emerging market phones. To accomplish this, Mozilla can't wait around for hardware to get better. Instead, it needs to make the web stack better - now - such that it can work on even barebones phones, including in areas of limited or no bandwidth. Mozilla has therefore developed its web apps to be offline from the start, and to use equal-or-less bandwidth than native apps.

If that doesn't sound like the resurrection of the VT-220, I don't know what would. Minimal weight on the client, minimal weight on the wire, and maximal weight on the server. Now, consider the paradigm. If the client is merely a pixelized VT-220, doesn't that benefit MS? If the paradigm of app development tilts towards the server, doesn't that benefit MS? Where is MS's strength? Yes, Office has historically brought in most of the moolah. But going forward, there's decreasing need for "the next" Office. Office work is about writing memos, after all. Windows Server and SQL Server, while not yet Enterprise weighty, could get there with a reasonable amount of effort. Moreover, by embracing the normalized database paradigm, one needs much less machine to get the job done.

May you live in interesting times, sleeping with strange bedfellows.

11 January 2013

Scuba Diving in Arcania

If I were interested in scuba (I'm not), the Caribbean island of Arcania is where I would go to look for unusual and brightly colored specimens. Since actually going there is not on the agenda, a thought visit is required. This journey was inspired by news out today that an obscure federal research group has looked into The Great Recession. The reporting says that new ways of modeling, not heretofore used in economics or policy, will come to our aid. Not surprisingly, the methods described, though differently named, sounded rather familiar.

So, off we go. That which is new ain't necessarily so. As stated here many times, Your Good Mother knows better, and quants, as often as not, are hired to obscure the truth by ignoring or suppressing basic metrics.

To reiterate: TGR was caused by the (willful?) ignorance of quants (financial engineers). As early as 2003, it was clear from available data that the house price / median income ratio had come seriously unstuck. Since housing is not a return (cash) generating allocation of capital (modulo "psychic" income; even if you believe in that sort of thing, its cash value is arbitrary and not real), the only support for increasing mortgage levels is increasing median income. The latter wasn't rising, and isn't; thus the inflation in house prices had to be corrupt. Quants don't generally have a corruption variable in their models.
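The arithmetic behind that claim is worth writing down (my notation, not anything from the data cited). For a fixed-rate mortgage of principal P at monthly rate r over n payments, the monthly payment is

    M = P \cdot \frac{r}{1 - (1 + r)^{-n}}

and a household can carry it only while M \le k \cdot I, where I is monthly income and k is the lender's debt-to-income cap. Hold r, n, and k steady and the largest supportable P is a linear function of I. A price/income ratio going parabolic while I sits flat can only mean that r, k, or plain honesty has given way.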

Which brings us to Norris' article. The gist of it is that "agent based modeling" (quotes in the original) offers a better class of quant model, one which will identify problems before they morph into a TGR.

But a new assessment from a little-known agency created by the Dodd-Frank law argues that the models used by regulators to assess risk need to be fundamentally changed, and that until they are, they are likely to be useful during normal times, but not when they matter the most.

Risk assessment by quants has been based on time series analysis for a very long time. The problem with time series analysis is the assumption that tomorrow looks mostly like today, and today looks mostly like yesterday. More so than other quant methods, time series analysis *assumes* that all determinants of the metric under study are embedded in that metric's historical data. As a result, when the price/income ratio went parabolic, the quants (and their Suit overseers) said, "goody, goody" when they should have said, "what the fuck is going on?" It was not in either the quants' or the Suits' direct, immediate monetary interest to question the parabola. They all, ignoring Your Good Mother's advice on group behaviour, went off the cliff in a lemming dive.
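To make that assumption concrete (standard textbook notation, nothing from Norris' piece), the workhorse autoregressive model is

    y_{t+1} = c + \sum_{i=1}^{p} \phi_i \, y_{t+1-i} + \varepsilon_{t+1}

Every regressor on the right-hand side is a lagged value of y itself; incomes, underwriting standards, and corruption get no vote. When the process generating y changes, the model notices only after the damage is in the history.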

Mr. Bookstaber argues that conventional ways to measure risk -- known as "value at risk" and stress models -- fail to take into account interactions and feedback effects that can magnify a crisis and turn it into something that has effects far beyond the original development.

And that part is correct. But the argument, and the logic which extends from it, doesn't deal with identifying the underlying cause of TGR; it attempts, rather, to find where the bread crumbs *will go*.

The working paper explains why the Office of Financial Research, which is part of the Treasury Department, has begun research into what is called "agent-based modeling," which tries to analyze what each agent -- in this case each bank or hedge fund -- will do as a situation develops and worsens. That effort is being run by Mr. Bookstaber, a former hedge fund manager and Wall Street risk manager and the author of an influential 2007 book, "A Demon of Our Own Design," that warned of the problems being created on Wall Street.

Agent based modeling? As we're about to see, it's old wine in new bottles. Kind of like NoSql being just VSAM.

"Agent-based modeling" has been used in a variety of nonfinancial areas, including traffic congestion and crowd dynamics (it turns out that putting a post in front of an emergency exit can actually improve the flow of people fleeing an emergency and thus save lives). But the modeling has received little attention from economists.

This is where it gets interesting. If you review ABM (why did they end up with the acronym for Anti-Ballistic Missile?) here in the wiki, you can walk a breadcrumb trail. ABM is fundamentally very old, and it came from economics, although it's more recently been associated with operations research.

The patient zero of ABM is Leontief's input-output analysis. Leontief built I/O analysis in 1936, well before computers and data were as available as they are today. My senior seminar somehow got Robert Solow to give us a talk on economic growth (that year's topic). In 1958, Solow co-authored "Linear Programming and Economic Analysis". Large, interaction-based models have been part and parcel of economics for decades.
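For readers who haven't seen it, the core of Leontief's model fits on one line (standard textbook notation, not the OFR paper's):

    x = Ax + d \quad\Longrightarrow\quad x = (I - A)^{-1} d

Here x is the vector of sector outputs, d is final demand, and A is the matrix of technical coefficients, a_{ij} being the amount of sector i's output needed to produce one unit of sector j's. Deterministic, interaction-driven, and agent-ish at the sector level rather than the firm level; which is rather the point.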

Here is where the article, and Bookstaber, stumble:
Mr. Bookstaber said that he hoped that information from such models, coupled with the additional detailed data the government is now collecting on markets and trading positions, could help regulators spot potential trouble before it happens, as *leverage builds up* in a particular part of the markets. [my emphasis]

The cause of TGR wasn't leverage; it was the corruption of historic norms. The result of the corruption was an increase in leverage by those who didn't even know they'd taken it on: hamburger flippers living in McMansions. It remains a fact: only increasing median income can propel house prices. With contracted resets not tied to prime, only those in growth-income employment (or generalized inflation, which amounts to the same thing) can finance the growing vig. ABM, as described here at least, won't detect such corruption of markets. It can't.

Perhaps regulators could then take steps to raise the cost of borrowing in that particular area, rather than use the blunt tool of raising rates throughout the market.

Here we find the anti-Krugman (and humble self, of course). It was the rising interest rates from contractual resets that finally blew up the housing market. Had regulators forced ARMs to reset higher and faster, TGR would have triggered earlier, and might not have been Great. It's the job of economists to know how the economy works. Leontief's I/O model is the basis of contemporary macro-economic modeling.

Here's the thing. In the relational model, the developer specifies a priori which tables relate and which columns in the tables create that relationship. These relations aren't probabilistic; they're deterministic. A similar distinction exists in macro analysis. A traditional I/O model, while derived from real world data, is deterministic in its input and output relations. On the other hand, traditional macro models are probabilistic; R-squared rules! Unless economists, and pundits, identify fundamental metrics, and build their models around them, they'll not have any luck predicting. Depressions and recessions have deterministic causes. Now, the loony monetarists tend to blame the victims, just as they have this time (AIG suing the American taxpayer?). Keynesians tend to blame the centers of economic influence, just as they have this time. Historically, the Keynesians have been right more often than not. Volcker be damned.
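A minimal sketch of what "deterministic" means on the database side, with made-up table and column names (any compliant engine's DDL would do):

    -- The relationship is declared up front, not estimated from the data:
    -- every customer_order row must point at an existing customer.
    CREATE TABLE customer (
        customer_id INTEGER      PRIMARY KEY,
        name        VARCHAR(100) NOT NULL
    );

    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
        ordered_on  DATE    NOT NULL
    );

The engine enforces that relation on every insert and update; there is no confidence interval on a foreign key.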

03 January 2013

Frenemies

There's that old saw: "the enemy of my enemy is my friend". I figured that this was due to Shakespeare, but the wiki says no, the adage originated either in Arabia or China. Makes sense: both cultures were way ahead of England by the time Shakespeare came around.

Each day I get an update from sqlservercentral. They're one part of the Goliath organization which published the Triage piece, so although I'm not currently doing much with SQL Server, it just seemed right. Today's feed included this link: "All Flash". Yowza!! My little heart goes pit-a-pat. Then I scan the first paragraph, go off to the link, and find these are NoSql kiddies!! Arrgh!

The piece starts off reasonably, making the case that the cost of the short-stroked HDD arrays necessary to match the IOPS of a Samsung 840 is orders of magnitude greater than the cost of the 840. A couple of problems with that, though. The 840 (not the 840 Pro) is a TLC, read-(almost)only part. AnandTech tore it up, and then did so again. While not an egregious part, it isn't by any stretch a server part. Consumer for sure; prosumer not so much.
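The IOPS arithmetic is easy enough to sketch; the figures below are round numbers of mine, not the article's:

    \text{spindles needed} \approx \frac{\text{SSD random read IOPS}}{\text{IOPS per short-stroked spindle}} \approx \frac{40{,}000}{200} = 200

Even at bargain prices, a couple hundred 15K drives, plus the enclosures, controllers, and power to run them, dwarfs the cost of one consumer SSD. The catch is the part the piece glosses over: the comparison only holds for as long as the SSD's endurance and consistency hold up.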

The piece does get contradictory, however:
Flash is 10x more expensive than rotational disk. However, you'll make up the few thousand dollars you're spending simply by saving the cost of the meetings to discuss the schema optimizations you'll need to try to keep your database together. Flash goes so fast that you'll spend less time agonizing about optimizations.

This is the classic mistake: assuming that flat-file access is scalable. Of course, it isn't, even with consumer flash drives, and that's why the NoSql crowd find themselves in niche applications. The advantage of the RM, and its subsequent synergy with SSD, is that the RM defines the minimal data footprint. Since random I/O is the norm on multi-user servers, normalization carries no additional penalty.

When a flash drive fails, you can still read the data.

I don't know where the author gets this. Since each SSD controller has its own method of controlling how data is written to the NAND, unlike HDDs, which follow standards and largely use Marvell parts (if CMU doesn't bankrupt them), data recovery is iffy. Most SSD failures to date have been in the controller's firmware, not the NAND giving up the ghost, and they frequently leave bricked parts. So, no, you can't remotely depend on simple recovery of data from an SSD. While I've not seen definitive proof, SSD failure should be more predictable than HDD failure, which by itself is an advantage: one simply swaps out a drive at some percentage of its write limit. Modulo those firmware burps, you should be good to go until then.

Importantly, new flash technology is available every year with higher durability, such as this year's Intel S3700 which claims each drive can be rewritten 10 times a day for 5 years before failure.

Well, sort of. The S3700's consistency isn't due to NAND durability, but to controller magic. It is well known that as geometry has shrunk, the inherent durability of NAND has dropped. And that will continue. As I've mused before, we will reach a point where the cost of the gymnastics controllers need to compensate for falling P/E cycles exceeds the savings from smaller geometries. This is particularly true of the Samsung 840, which is where the article began.
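For scale, the endurance claim works out as follows (capacity assumed at 400GB for illustration; the S3700 ships in several sizes):

    400\,\text{GB} \times 10\,\tfrac{\text{drive writes}}{\text{day}} \times 365\,\tfrac{\text{days}}{\text{yr}} \times 5\,\text{yr} \approx 7.3\,\text{PB written}

Swapping the drive at some agreed fraction of that budget, as mused above, becomes a calendar entry rather than a crisis.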

Over time, flash device firmware will improve, and small block writes will become more efficient and correct...

It's going the other way, alas. As geometries shrink, page size and erase block size have increased, not decreased. DRAM caching on the drive is the common way to reduce write amplification, i.e., to support writes smaller than a page. Firmware can only work around the increasing page and erase block sizes.
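The figure of merit here, for readers who haven't met it (the standard industry definition, not something from the article):

    \text{write amplification} = \frac{\text{bytes written to NAND}}{\text{bytes written by the host}}

A host write smaller than a page still costs at least a full page program, and possibly a read-modify-write of an erase block; buffering small writes in DRAM and coalescing them before they hit the NAND is how controllers keep that ratio down.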

What the author misses, of course, is that organic NF relational databases give you minimum-byte-footprint storage, a TPM, lots of DRI, and client agnosticism in the process. So, on the whole, it's a half-right article.

01 January 2013

Bootstraps

What makes the marriage of SSD and the RM so appealing is the (relative) ease of application building. If one specifies a schema, and active components, there is little if any logic that *must* be implemented exclusively on the client. Since the client is disconnected, in typical http-driven applications, it can't know what the rest of its global world is doing. It's flying blind. Why would one want to rely on disparate clients (after all, Codd's paper's title includes "large shared data banks") all correctly implementing such logic?
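A tiny, purely hypothetical illustration of logic a disconnected client cannot own: any cross-row rule, uniqueness being the simplest, requires knowing what every other client has already done.

    -- A client can check the shape of an email address on its own,
    -- but only the shared engine knows whether it's already taken.
    CREATE TABLE account (
        account_id INTEGER      PRIMARY KEY,
        email      VARCHAR(254) NOT NULL UNIQUE,
        CHECK (email LIKE '%_@_%')  -- crude format check, illustrative only
    );

Declare it once in the schema and every client, present and future, gets it right by construction.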

As a New Year's present, a SQL Server (not my current cup of tea) article on generating active stored procs. Note that he references INFORMATION_SCHEMA, which is a standard; the process should work for any compliant engine, modulo the stored proc syntax emitted. I consider this something of a compromise; one could, with more effort certainly, generate the HTML stream from the schema. There are applications which do that, too.
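The flavor of the approach, in miniature; the table name is hypothetical, the catalog views are the standard ones the article leans on:

    -- Everything a generator needs for an INSERT proc's parameter list
    -- comes straight out of the standard catalog.
    SELECT c.COLUMN_NAME,
           c.DATA_TYPE,
           c.CHARACTER_MAXIMUM_LENGTH,
           c.IS_NULLABLE
    FROM   INFORMATION_SCHEMA.COLUMNS AS c
    WHERE  c.TABLE_NAME = 'customer'
    ORDER  BY c.ORDINAL_POSITION;

String that result set together and you have the generated proc's signature and body; only the CREATE PROCEDURE wrapper around it is vendor-specific.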

Touching Me, Touching You (Third Chorus)

(An existing piece, with some new information.)

Diligent readers know that this endeavor began when I had some free time to make a public stand for the full relational model/database, prompted by the availability of much less expensive flash SSD (compared to DRAM SSD, which have been around for decades) for "normal" OLTP applications. The world has changed a bit from then to now. In particular, the iPad. I've mentioned the implications in earlier postings.

Now, as regular readers know, the iPad is not especially new, from a semantic point of view. Tablets have been in use in warehouse software applications (MRP/ERP/Distribution) for a very long time. (This is just a current version.) I programmed with them in the early '90s.

But the iPad does mean that mainstream software now has a new input semantic to deal with: touch me, touch me, not my type. So, it was with some amusement that I saw this story in today's NY Times. Small-ish touch screens mean small bytes of data, a bit at a time. The 100-field input screen that's been perpetuated now for what seems like forever (in no small measure as a result of the Fortune X00 penchant for "porting" 1970s COBOL screens to java or php) is headed the way of the dodo. It simply won't work. And the assumption that "well, we'll just 'break up' those flatfiles into 'sections'" will fail miserably. There'll be deadlocks, livelocks, and collisions till the cows come home.

BCNF schemas, doled out in "just the right size" servings to Goldilocks, are the way forward. Very cool.
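Purely as an illustrative sketch (names invented), here are the sort of narrow tables a touch client would nibble at, one small screen per query:

    -- One Goldilocks-sized serving per screen, instead of a 100-field flatfile.
    CREATE TABLE patient (
        patient_id  INTEGER     PRIMARY KEY,
        family_name VARCHAR(60) NOT NULL,
        given_name  VARCHAR(60) NOT NULL
    );

    CREATE TABLE patient_allergy (
        patient_id INTEGER     NOT NULL REFERENCES patient (patient_id),
        allergen   VARCHAR(60) NOT NULL,
        PRIMARY KEY (patient_id, allergen)
    );

    -- The allergy screen asks for exactly what it shows, nothing more.
    SELECT allergen
    FROM   patient_allergy
    WHERE  patient_id = 42;

Each screen's transaction touches one narrow table, which is what keeps the deadlocks and livelocks at bay.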


[update]

So, now we have Win8, and the move of touch to the PC. Here's the first 2013 story, pushing for push-button computing. As above, while one might argue that the difference in pure semantics between touch and mouse isn't large (after all, one is "pushing a button" in either case), the speed and fluency of touch is miles ahead (props to the horn player) of the rodent. Keeping up with this, by providing tidy morsels of data, is key to success.

The Way We Were

Well, it's New Year's Day, and here I sit in my drafty New England garret, ready to contemplate the most significant development of last year in the intersecting worlds of relational databases and quants. I have to give the laurels to the Obama quants, who ousted the political operatives and set out to do political quant better than their opposition. One might give a thorny crown to the Romney crew, who did things the crony way (letting the loyal political hacks make the decisions), but that would be cruel. Previously cited, here is the story. Given that the DNC came a cropper in state and local elections, ceding yet more control to the Republicans, one can't conclude that Democrats are naturally more adept at quant.

A few honorable mentions:

- SSD land turned into a minefield. OCZ nearly went belly up, and still might. Consumer/prosumer SSD has fallen to commodity status; we can expect the Big Boys to dominate going forward. STEC found trouble, and is still in it. Fusion-io has been largely static. Flash arrays, from the bigger companies, gained mindshare; the quote from Linus is closer to true now than it was last New Year's.

- NoSql and NOSql fought with each other and against SQL for mindshare. On the whole, NOSql appears to have bested NoSql, but I stand by Date's quote. If you care about your data, you have to have a central control, i.e. a TPM, and Kiddie Koders who think they can gin up a replacement (in client code, no less) for any of the industrial strength engines that have been developed over the last two decades are kidding themselves (and any Suit dumb enough to buy the story).

- R, and other analytics, got closer integration with databases; SAP/HANA/R notably. But not, so far as I can tell, SQL Server. That part is too bad.

- Violin was reported in October to be planning an IPO at a $2 billion valuation, but it hasn't happened yet. Enterprise SSD may yet live on as something more than just HDD cache.