03 February 2010

We Don't Need no Stinkin' Innovation

Artima runs a collection of message threads on subjects generally related to coding; occasionally databases directly, but not so much recently. A current thread, started by Bruce Eckel, discusses the proposition that software development has stalled. The thread then mixed in ideas of innovation and concurrent machines. Well, this was too much to ignore. Herewith my contribution; I find that it stands pretty much on its own, and makes another case for the point of this endeavor.

While it is true that development has stalled, the stall began with java. Java did not usher in a wave of innovation, which has somehow petered out. It merely allowed the ignorant to recreate the paradigms of the 1960's.

The web, so far, is semantically implemented exactly the same way as COBOL/3270 code, circa 1970. You have a dumb datastore, managed by bespoke code, talking to a semi-smart block mode terminal. The only difference is some syntax (COBOL on the mainframe vs. java on the server, 3270 edit language on the terminal vs. html/javascript on the browser), but the method is identical. Better ways were discovered between 1970 and 2000, but the folks, young-uns mostly, who stole the process were quite ignorant of these. They revel in "languages" rather than systems.

The principal reason this is so amusing is that there was far and away more innovation in systems through the 1980's. There were multiple mainframe machines, each with a specific instruction set (aka, architecture), as well as an emerging group of mini-computers, again, each with a specific architecture; among them explicitly parallel machines. Machines were designed to solve problems heretofore unsolved. We've devolved to a near mono-culture: the X86 instruction set and the z/ instruction set. The ARM processor is gaining some traction, and may pull us out of the weeds; but it is aimed at a different problem.

All this kerfuffle about concurrent languages is so misinformed, again, because those who consider themselves players weren't around when parallel architectures were first developed in the late 1970's. Those architectures basically went nowhere, and not because they were poor architectures, but because there aren't many problems (in the computer science sense) that benefit from parallelism. As to concurrency in linear problems, see Amdahl's law (it's been mentioned before).

The notion that ad hoc language creation will make it easy to use multi-core/processor machines to execute linear problems (virtually every application which is not a server) more efficiently is fantasy. Even without Amdahl's law, simple logic (which one may argue is all Amdahl was expressing) makes it clear.
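Amdahl's law is simple enough to state in a few lines. A minimal sketch (the function name and example numbers are mine, for illustration): if only a fraction p of a program's work can be parallelized, the speedup from n processors is bounded no matter how large n gets.

```python
# Amdahl's law: overall speedup from n processors when only a
# fraction p of the work can run in parallel.
def amdahl_speedup(p: float, n: int) -> float:
    """Speedup = 1 / ((1 - p) + p / n)."""
    return 1.0 / ((1.0 - p) + p / n)

# Even granting an optimistic 50% parallel fraction, piling on
# cores yields sharply diminishing returns: the speedup can
# never exceed 1 / (1 - p) = 2x, with infinitely many cores.
for n in (2, 4, 16, 1_000_000):
    print(n, amdahl_speedup(0.5, n))
```

The serial fraction dominates: for a linear problem (p near zero), the extra cores sit idle, which is the point being made above.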

The problem which is staring the industry in the face is the re-siloing of applications as a result of web kiddies' love affair with languages rather than systems, which leads to the "explosion of data". Dr. Codd, bless his heart, gave us the answer: replace code with disciplined data. But because these self-appointed players have a hammer called "languages", they seek to create "new" ones to solve a problem (nail) which is not language solvable.
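Codd's "replace code with disciplined data" can be shown in miniature. A hedged sketch (the table and rule are hypothetical, and sqlite3 stands in for a real shared datastore): a business rule written as bespoke application code must be re-implemented by every silo, while the same rule declared in the datastore is enforced once, for every application that shares the table.

```python
import sqlite3

# The bespoke-code way: the rule lives in this application's code,
# and every other application touching the data must repeat it.
def add_order_app_style(orders: list, qty: int) -> None:
    if qty <= 0:
        raise ValueError("qty must be positive")
    orders.append(qty)

# The disciplined-data way: declare the rule once, in the datastore,
# where every sharing application inherits it automatically.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (qty INTEGER NOT NULL CHECK (qty > 0))")
db.execute("INSERT INTO orders VALUES (3)")       # accepted
try:
    db.execute("INSERT INTO orders VALUES (-1)")  # rejected by the schema
except sqlite3.IntegrityError:
    print("rule enforced by the datastore, not by application code")
```

The declarative version is the "community" datastore in one line: no cabal of coders can quietly make up its own rules about what a valid order is.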

To accept that the solution is to implement shared, disciplined datastores is to accept that each application is not going to be its own little fiefdom. Further amusement here, in that Bruce (and many others) makes much of the "community" of coders in these discussions, but absolutely rejects the notion that "community" can be applied to datastores. NEVER do that. If they did that, then there'd be so much less code to build. There'd be so much less "freedom" to make up one's own rules about this and that. So we get a Tower of Babel; each cabal of coders pissing on its territorial boundaries. My code. My data. You don't understand what I'm trying to do here. And so forth. And so very mainframe 60's. Your grandfather said the same thing.

Note that the multi-core/processor machine is not a new architecture, but rather just the aggregation of existing Von Neumann machines (linear execution with an existing instruction set). These are not machines based on an architecture which accepts code and parallelizes it for execution by some new non-linear instruction set.

This is all wasteful wheel spinning. The linear instruction sets are (save for register size expansion) fundamentally unaltered for 30 years (nearly 50 for the z/). ARM takes RISC to a logical conclusion, and attacks an utterly different problem from that of the X86 machines.

This proliferation of languages is scribbling around the edges. They all have to (compile to and) execute on the same old X86 instruction set. You can't really push a square peg into a round hole. Today's multi-machines aren't designed for parallel algorithms (and how many of those are there?); they're just miniaturized versions of the IBM 360 MFT (that's 1965, for those keeping track) machines.

The answer to the density/speed/heat problem won't be found in such ad hoc language sand castles. It will require some really smart hardware engineers (are there any left?) figuring out the successor to the Von Neumann architecture. Have fun making new languages, but don't delude yourselves into thinking that you'll actually be solving the real problem. The problem isn't in high level application languages. The answer is either in databases (replacing code with data, leaving display to application languages) or in new hardware architecture.


TroyK said...

Nice post, including some good quotables :)

Are you familiar with Toon Koppelaars' thoughts on a data-centric architecture? Excellent points, and, with hope, the start of something better in software development.


Take Care,

Robert Young said...

Absolutely. One of my earlier posts is a discussion of the book:

Hopefully, the industry will see the light and not let the skies be Cloudy all day.