Sometimes, reading other sites leads to "Ah, hah!!" moments, sometimes a chuckle, and sometimes a WTF!!! Here we have them all rolled into one.
The "Ah, hah!!" comes from this quote:
A newly graduated developer might perform database operations at the application layer instead of the database layer. This will put the processing on the application and not server. Further, this would also put the database at risk of data integrity issue and receive bad data.
A coder (I surmise) who gets data. How unusual.
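To make that concrete, here is a minimal sketch (table and column names are mine, purely illustrative) of what doing the work at the database layer means in practice: declare the integrity rules once, in the schema, and the engine refuses bad data no matter which application sends it.

    -- Integrity declared once, at the database layer; no application-side
    -- checking required, and no way for a sloppy client to sneak past it.
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        VARCHAR(100) NOT NULL
    );

    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
        quantity    INTEGER NOT NULL CHECK (quantity > 0)
    );

    -- Rejected by the engine itself: there is no customer 999, so no orphan row.
    INSERT INTO orders (order_id, customer_id, quantity) VALUES (1, 999, 5);

Do the same check in application code instead, and it gets done as many times as there are applications, each slightly differently, and at least one of them wrongly.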
The chuckle comes from the subtitle of the site: Simple solutions for complex problems. For those who haven't just fallen off the turnip truck, simple solutions to complex problems are inherently wrong. H. L. Mencken is generally credited with the seminal quote, "For every complex problem there is an answer that is clear, simple, and wrong."
The WTF comes from an ad on the site (you might not see it, depending): Web programmers $10/hour. I wonder whether the author of the site even realizes how stupid this is. Pay bottom dollar, get bottom results. Mon dieu.
21 December 2009
18 December 2009
Fast Eddy Felson
When is fast not so fast? One of my pet sites, storagesearch, has a new article on SSD performance, with some historical background. Check it out.
Here is the money quote:
If repeat writes to the same small address range (same flash blocks) is interspersed with read operations (the play it again Sam scenario - which occurs in database tables) the performance outcome varies significantly between fat and skinny flash SSDs. A fat flash SSD may produce results which more closely follow the behavior seen in HDDs and RAM SSDs - as predicted by the write IOPS spec. But performance in most skinny flash SSD designs will collapse.
The meaning of skinny and fat is explained here. So, for the purposes of what this endeavor advocates, the fat should be on the fire.
10 December 2009
Coal in Your Stocking for Christmas?
Could we be witnessing the rapid burnout of a star? A supernova of IT? STEC could be in serious difficulty. The latest news from Fusion-io, whom STEC management has dissed in the past, is a design in with IBM. Here's the announcement and IBM's. As I type, the share has fallen still further from my previous posting: into the $11's.
Why this is important: Fusion-io set out on a separate path, leveraging the PCIe connector, where STEC and the storage rack vendors (EMC, for example) didn't. Early on in this endeavor, I pointed out that Fusion-io was adopting the Google paradigm rather than the mainframe/server paradigm of external mass storage. The mass-of-servers way, rather than the mass-of-drive-racks way, has other implications. In particular, what will database systems look like? The knee-jerk reaction is: distributed. On the other hand, maybe not in the way it has been done before. Fusion-io is still (they haven't changed their tune, so far as I know) on the path to terabytes on the slot. For my beloved BCNF databases, which are orders of magnitude more compact than flatfile/xml messes, that is perfectly fine.
Of note in the announcements is that Fusion-io mounts an additional NAND chip for data protection. Not only is STEC threatened, but so are all the storage rack vendors. To the extent that the PCIe vendors can mount that much SSD on the board, which is faster still, the notion of a room full of servers rather than a room full of drives takes over... Too bad Fusion-io is still private; a few shares might make for a more comfortable, and early, retirement. For those who crave retirement, anyway.
01 December 2009
More Testing
AnandTech has updated its SSD review. Have a wander over there, if you haven't lately (the piece is dated 17 November).
30 November 2009
He Who Rests, Rusts
Google's recently announced operating system to be, Chrome OS, is to support only SSD. Hallelujah?? Well, yes and no. The OS is designed to run on netbooks, with most data on the Google machine at the other end of the wire. Oh, and that's where the programs will reside, too. So, in such a limited venue, supporting only SSD makes perfect sense. The device is therefore fully solid state, and thus totally rugged; the most vulnerable part being the screen.
As to whether this decision has much impact on SSD adoption at the datastore: not much, I'm afraid. But it does get the attention of OEM's. And to the extent that storage production moves more to SSD by units shipped, thus lowering the price of SSD's globally, that's a Good Thing. We, and I count you Dear Reader in that plural, have to educate the Pointy Haired Bosses to the benefit of SSD as a refactoring opportunity, not just as another storage option. Now, let's get out there with the pickets: "Down with Rust!!! Up with Sand!!! Yeah, Yeah, Yeah".
25 November 2009
Fusion-io Ups the Ante, Gauntlet Tossed
Last week Fusion-io raised the bar considerably. While I don't normally quote much from sources, this time I can't resist:
Achieving a 1TB/s sustained bandwidth with existing state-of-the-art storage technologies requires close to 55,440 disk drives, 396 SAN controllers, 792 I/O servers and 132 racks of equipment. Fusion-io can achieve this same bandwidth with a mere 220 ioDrive Octal cards, housed in Infiniband-attached I/O servers running the Lustre parallel file system. This 1TB/s Fusion-io based solution requires only six racks or less than 1/20th the rack space of an equivalent, high-performance, hard disk drive-based storage system.
What's important about this quote is its specificity: about 0.4% of the number of drives, and correspondingly less ancillary gear. The other important point is that Fusion-io adapted their PCIe cards to an external rack. The rackable folk (STEC in particular) won't have as easy a task of taking their drives to the slot. Whether the controller does the job of wear leveling and the like, only time will tell.
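Checking the arithmetic, using nothing but the numbers in the quote:

    220 / 55,440 ~ 0.4% of the drive count
    6 / 132 ~ 4.5% of the racks (about 1/22nd, consistent with "less than 1/20th")

Specific, checkable claims; a refreshing change from the usual marketing fog.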
And then there's the air of mystery, "two presently undisclosed government organizations" are the buyers. Could it be CIA/NSA doing a real-time database??? Only The Shadow Knows.
14 November 2009
MySql is a Threat to Oracle
This past week, the Times, in one day, provided two really stupid stories. I wrote a screed, which they've not published (no surprise there), so here's the one for DrCodd. The background: they ran a story about why Oracle and Sun should or should not merge, and the reasons why the EU objections are wrong. Not directly about SSD, except that Oracle really needs Sun's hardware if it has a chance of stealing IBM's mainframe customer base; I suspect that SSD, either in real drive form or the faux form now on display, is integral to accomplishing that.
The objection by the EU rests on facts which neither today's story, nor any other I've read in The Times, deigns to tell the reader. First and foremost: MySql is *not*, repeat *not*, Open Source as that term is understood. From the beginning, MySql has carried a dual license, one of which is Open Source. As MySql stands today, this minimalist version is used by lots of web-site and amateur developers. This version can be, and is, used without being beholden to MySql/Sun/Oracle.
There is, however, the commercial license version. This version, as explicitly stated countless times by the original developers, was the source of funding for the Open Source version. Oracle, as I understand it now, is not compelled by the DoJ to continue this largesse. Oracle could simply abandon the Open Source version of MySql.
It gets better. The heart of database software is known as the engine. MySql's original (and still the Open Source version) engine is primitive in the extreme. No serious database developer would use it for anything serious. It is not a threat to Oracle, or any industrial strength database.
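If you want to see "primitive" in one screenful, here's a hedged sketch (toy tables, mine): with the default MyISAM engine, the integrity machinery a serious database lives by simply isn't there.

    -- MyISAM parses the FOREIGN KEY clause and then silently ignores it;
    -- orphan rows go in without complaint, and there are no transactions
    -- to roll back when something goes wrong.
    CREATE TABLE parent (id INT PRIMARY KEY) ENGINE=MyISAM;
    CREATE TABLE child (
        id        INT PRIMARY KEY,
        parent_id INT,
        FOREIGN KEY (parent_id) REFERENCES parent (id)
    ) ENGINE=MyISAM;

    INSERT INTO child VALUES (1, 999);  -- succeeds: no such parent, nobody cares

    -- The identical DDL with a transactional engine (ENGINE=InnoDB, of which
    -- more in a moment) rejects the orphan row and supports real transactions.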
But, then Oracle bought, and maintains, an engine which is syntactically equivalent to Oracle itself. It is called InnoDB. While InnoDB is Open Source in origin, it is maintained by a separate company (owned by Oracle), Innobase OY in Finland.
Here's where the potential conflict arises: before now, Oracle, through Innobase OY, had an arm's length contractual relationship with MySql in providing InnoDB for MySql. With the merger, Oracle *could* abandon InnoDB in MySql, and thus force users to take up Oracle database, since the SQL access to a MySql/InnoDB database matches Oracle, and not the SQL that would be used with Open Source MySql. That's why, in my estimation, the EU is concerned.
Oracle has reason to do so: MySql/InnoDB *is* a low cost threat to Oracle.
04 November 2009
Humpty Dumpty Had a Great Fall
Our bellwether company, STEC, reported last night; and boy howdy, was it bad. Not too surprisingly, this endeavor had been predicting less than wonderful news. The share dropped to below $16 in after hours yesterday, and this morning is a bit above that number as I type. The point of this endeavor isn't stock tips, of course, but the fortunes of Real World producers of SSD is of significance to what does matter: the transition to BCNF databases on SSD multi-core/processor machines.
The accounting for the 3rd quarter was what the analysts predicted. The news that has sent the share in the toilet is the revelation (which could have been discovered earlier) that EMC, which accounts for "90%" (according to one news report) of the ZeusIOPS demand, would stretch its current stock into 1st quarter 2010. This news pretty much contradicts what had been in the wind, that EMC was taking all the STEC SSD it could get.
Not surprisingly, the emerging news and analysis boils down to: SSD isn't replacing HDD in the enterprise at warp speed after all. I will gloat here. I had mentioned in earlier posts that the transition from HDD --> SSD had already morphed to HDD --> SSD+HDD. While none of the reports have explicitly stated this as the reason that EMC has enough SSD for the time being, such would be the rosier outlook.
The less rosy outlook is that the storage system business isn't going to expand at nearly the rate a growing economy warrants. Dum da dum dum. Dum.
Truth be told, my earlier surmise, that SSD multi-machines will be adopted more by the VAR networks where they control both the software and hardware as a package, remains my conviction. The enterprise is still populated by brontosaurs.
31 October 2009
Dahling, You Have to Read This
Shooting fish in a barrel, and criticizing xml, are congruent. The difference is that shooting fish is a one step process, while dealing with the zits of xml is a life long ordeal. Thus, I've only made a few random comments. The effort to build a new coherent jeremiad simply didn't feel worth the effort.
Then I ran across this. I think he gives xml too much credit, in that "documents" intended to be consumed into my beloved BCNF database really don't need the structure (metadata information); a csv file will do. Pascal wrote about that years ago. He was right then, and he's still right.
What I find mind boggling is this quote:
It's pretty popular these days to kick XML, all the cool kids are doing it and they don't seem to discriminate between its 'good' purposes and 'bad' uses.
I guess I'm not lucky enough to live in his world, 'cause where I've been, xml-ness is still au courant. XML Spy, ACORD, and the like remain in control. I need a new life!!
29 October 2009
Double, Double Toil and Trouble
There's been a hurricane of consternation on the Yahoo! (and I strongly expect, other) message board for STEC, our resident poster child of high performance SSD. The share has cratered from its high ~$40 to ~$20 yesterday. Today it's up a bit. Message boards have some interest, since a small fraction of the stream is intelligent. No, I don't hold any STEC, nor do I care whether they get rich. I do want them, or a company doing what they do, to continue.
One piece of intelligence is a reference to this blog, at IBM. It doesn't navigate very well, and I'm sure not going to regurgitate it here, except to say that IBM has, if this fellow speaks for the company, backed off somewhat from the PCIe (Fusion-io) approach and reverted to straight SSD; STEC, according to what he has written. The posting of interest talks about why PCIe went away, and SAS drives came back.
OK, one quote, from the 18 Sep entry:
Another interesting side note can be seen when you add the areal density of silicon, which to this day has tracked almost scarily to Moores Law. If for example, the GMR [giant magnetoresistive] head had not been invented by Stuart Parkin, then we'd probably have had mainstream solid state drives in the mid 90's. If nothing else comes along to push spinning rust back to the heady days of 65-100% CAGR[compound annual growth rate], then by 2015 solid state density will overtake magnetic density - in a bits per square inch term.
Now, this is important. The transition from HDD to SSD has changed, based on public pronouncements from the vendors, since January of this very year. Up until then, the notion was that HDD arrays would be replaced by (smaller unit count) SSD (mirrored?) arrays. The higher cost of SSD would be mitigated by the lower unit count resulting from not requiring striping (and possibly, mirroring) units in the array. (And, as an aside, from refactoring databases to BCNF, thus jettisoning at least an order of magnitude of those bytes.) Now, the notion being promoted is "disk caching", with some small number of SSD fronting the existing HDD array. I'm still not convinced this makes much sense, but there you are. The major impact of this approach is to simply not deal with data bloat, and thus forgo the maximum benefit from SSD, settling instead for "good enough" improvement.
If we are headed to a cross over in data density (but not necessarily cost/bit), then preparing for, and building now, pure SSD systems isn't implausible. While I don't relish the "disk caching" approach as the end game, if it serves to jam a size 12 brogan in the door, I'll take it.
27 October 2009
Swimming the Amazon Upstream
News from the jungle. Amazon is now offering an explicit relational database cloud service, RDS. It is MySql 5.1, so calling it "relational" is a bit of a stretch, but it is some good news.
I had the temerity to suggest that they offer SSD storage as an explicit option, so that real database wonks can build real BCNF applications. Got back a personalized auto reply, which said that "An AWS representative will reach out as soon as possible to address your questions about cloud computing and Amazon Web Services." I am on tenterhooks.
23 October 2009
Follow the Yellow Brick Road
An update on where I think this SSD/multi-machine/BCNF database journey is going. The independent storage vendors are adopting a "tiered storage" paradigm, with SSD/HDD machines and embedded controllers aimed at "enterprise" clients; large servers (quasi mainframes) and mainframes. My conclusion is that their clients don't want to refactor anything, just keep that 30- or 40-year-old code and make it run a bit faster. Such machines adopt a disk caching approach; I'm not convinced that there is much point to having live data on multiple varieties of persistent storage. (Aside: in the *nix world of databases, it is understood that the database machine should have *nix file buffering turned off so that the database can do its thing, faster. The same applies to persistent storage. They'll all find out, eventually.) So, who's going to adopt SSD/BCNF machines outright?
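(Before answering: a concrete footnote to that file-buffering aside, for the MySQL/InnoDB crowd, offered as illustration rather than tuning advice. On a Linux box, setting innodb_flush_method = O_DIRECT in my.cnf tells InnoDB to bypass the OS page cache and manage its own buffer pool; you can see what a given server is doing with

    SHOW GLOBAL VARIABLES LIKE 'innodb_flush_method';

Oracle, DB2, and the rest have their own spellings of the same idea.)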
Well, back in the late 1980's, the VAR (value added reseller) emerged as a conduit for *nix database companies. The archetypes were Progress, Uniface, and Informix (PowerBuilder in the M$ world). They provided a "relational" database and a 4GL to manage it. In the case of Informix, it was/is a real database with proper SQL access. The others stressed their own 4GL over SQL. They aimed at developers with business expertise to build replacement applications for hoary minicomputer (mostly; some mainframe) file-based ones. Since these folks weren't burdened with legacy "stuff", they had a reasonably clean sheet.
This venue is where I see the entrepreneurial force to accept the "new" idea of declarative data, minimal code. The VARs I've dealt with (or worked for) always supplied the machine, too. A source of profit, but it also met a strategic need: support of clients is *so* much easier if you already know more about the full system than the clients. Shipping out the full system with a couple of STEC drives (or Fusion-io, or...) is not a client decision. That, in fact, has always been true. I still keep getting calls to work on COBOL code that's decades old, when it is clear that such organizations are desperately trying to keep from sinking in the tar pits (like the metaphor?), without having to do any real thinking or engineering. The other area is a gawd awful lot of SQLServer. I don't get that at all. Not all change embodies progress, but all progress requires change. You can't make an omelet without breaking some eggs. Doing the same thing over and over again, and expecting a different result, is a definition of insanity. The first step in fixing a problem is to admit that the problem exists.
Now, I just have to find one. And, no, I don't have in mind any application that I just have to create; I care about the tech, not the application.
16 October 2009
Shift in Focus?
In a few short months, there has been a shift in focus with regard to Enterprise SSD. I don't yet have a conclusion whether this is a good thing or bad thing, but I suspect the latter from the point of view of this endeavor, the BCNF relational database. What has emerged is the "disk cache" notion, with a few SSD (STEC or similar) fronting the HDD array. The Sun/Oracle FlashFire is the most recent example.
The controllers (SSD and HDD) work in concert to move data from the HDD to the SSD, just in time. The downside, from my point of view, is that this bandaid measure is a sop to those existing bloated file based (read: xml) applications. From the desire for immediate gratification, I suppose this is OK. But doing so squanders the big value of SSD. The vendors would be just as well off, if not better off, using static RAM caches.
We'll just have to wait and see how SSD plays out. If the "disk cache" approach becomes the norm, then the enterprise SSD (and its effect on the relational database) shrinks into niche status. That would break my heart, tender young thing that it is.
12 October 2009
Can Somebody Please Make Up His Mind?
A double dip day.
This has gotten out of hand. First, I read that the Sun/Oracle FlashFire gizmo is a superDuper SSD. And I pass that on, with my Value Added commentary and glee. Then Larry says, "No" it isn't. It's just a bunch of NAND as cache. And I fess up to an error and slink back to my lair.
So, today The Register runs a story with the details... wait for it. FlashFire is a superDuper something like an SSD; although not called that exactly. But it is populated with STEC parts. My head is spinning, lights flash, getting sooooo dark.
Whatever. The salient fact is that SSD and databases are taking over the world, just as I've been predicting. I want to be First Chancellor of Normal Form. No database can leave the crib without a proper bris; all extraneous fluff removed.
Not A Cloud Was in The Sky
I've been saying all along that The Cloud is a Crock. Well, here's the latest in the saga. You should go and read the story; I won't cut-n-paste it here. I will gloat, however. Imagine what's going to happen when the BigMegaCorp leaves its data in the hands of MicroSoft? Same thing.
Well, maybe one quote:
Microsoft said in an emailed statement that the recovery process has been "incredibly complex" because it suffered a confluence of errors from a server failure that hurt its main and backup databases supporting Sidekick users.
Going to The Cloud is a dereliction of duty, pure and simple. Losing one's phone numbers is one small step for ineptitude, but it should not be one great fall of mankind. Your data is yours. Do not give it away to gain a penny here or a penny there.
04 October 2009
Da, Comrade, All You Need is Black
I was just over at the Yahoo! STEC message board, attempting to bring some sense to those folks. A heavy burden, but someone has to do it. There was a thread from August, which got restarted, which tried to find justification for SSD, STEC's in particular, in cloud computing.
I disabused the poster, pointing out that cloud is all about scads of plain vanilla resource; disc in this case. A cloud provider may not even tell the client what resource types are being provided, only CPU seconds, gigabytes of disc, and the like.
Then it hit me, the light went off, the Red Sea parted. The cloud is the implementation of the Soviet Model: "What, you want a suit that's not black? You don't need blue. Black will do". It's such delicious irony; the titans of Capitalism implementing Soviet era Communism. I'll sleep better tonight, secure in the belief that American corporations are content to be told what to do by their vendors. Ah, sweet justice. Can the pogroms be far behind?
29 September 2009
Alice, This isn't Kansas
"Alice in Wonderland" gives us a wonderful quote, from Humpty Dumpty: When I use a word it means just what I choose it to mean - neither more nor less.
What that has to do with today's musing is this: I still would rather do databases for money than trade biotech stocks. So, I keep a baleful eye on various listings, looking for anything interesting. I generally only bother with titles such as Database Architect, Database Designer, and DBA (this is the catchall title which sometimes means more than doing backups).
Here's a position with the title, Database Architect for Nokia (Boston area office). I'll leave the curious to venture there, at their peril. Now, I would expect (devilish imp that I am) that a company the size and sophistication of Nokia would have a clue about data and databases. I eagerly surfed to the listing, only to find (among other silliness) the following:
Domain expert in MySQL design and development
Familiarity with Hadoop/Pig/HBase and MapReduce/Sawzall/Bigtable desirable
I'll send along a resume just for shits and grins, but for pity's sake. This isn't a Database Architect. It's an application coder. Gawd darn it.
26 September 2009
We've Seen This Movie Before
Many of us have seen this movie before, if we're of a certain age (or had instructors who are). It's a cross between "The Return of Frankenstein" and "Groundhog Day". The theme has arisen with some frequency in the last few weeks, on Artima in particular.
The movie is scripted: it's all well and fine for you to talk about SSD changing the rules, but we've still got to write our applications for the normal HDD environment; not all clients will have gotten the SSD revelation. I knew this sounded familiar, but it took me a while to put my fingers on it.
In the 1960's, and more so the 1970's, the magnetic disk subsystem (IBM terminology then) began to proliferate. But it didn't instantly replace the 9 track tape drive. In the IBM world, COBOL was the language (along with some Assembler and that mutant PL/1) of choice, and had been in use for a decade by the time the 370 took over. The I/O calls had been written for sequential access (3 tape sort/merge update was the paradigm) from time immemorial.
The result was that COBOL continued to be written to a sequential access method, even though random access was the whole point of the disk drive. Files on disk were imaged as if they were on tape. The reason was simply convenience to COBOL maintenance coders. Even new applications tended to do the same; inertia is a powerful force.
Hardware paradigm shifts often are captive of software inertia. SSD is not the only one now. The multi-core/processor machine presents problems to coders, in greater magnitude than SSD. Here's Spolsky's recent rumination. The money quote:
Sure, there's nothing officially wrong with trying to write multithreaded code in C++ on Windows using COM. But it's prone to disastrous bugs, the kind of bugs that only happen under very specific timing scenarios, because our brains are not, honestly, good enough to write this kind of code.
Whether multi-core/processor machines will ever be useful to application programmers, by which I mean the guys writing the 1,023,484th General Ledger for Retail Sales, remains up in the air. The guys writing operating systems and database engines will have far fewer issues; they've been in the multi-threaded world for decades. We should let them go ahead and do their thing. Let the database engine do all that heavy lifting, and leave us mere mortals to decide which widget to use and what the schema should look like.
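Put concretely (toy table, names mine): the application states what it wants, and the engine is free to spread the work across however many cores it has; no COM, no threads, and no timing bugs in my code.

    -- One declarative statement; the engine may scan and aggregate in
    -- parallel across cores, and the application never knows or cares.
    SELECT region, SUM(amount) AS total
    FROM   sales
    GROUP  BY region;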
On the other hand, making the transition to SSD store and BCNF schemas will require slapping down hidebound application coders who wish to remain in the far past. I see a future where applications which have limped along, structurally unchanged since the 70's, are finally replaced with small, highly normalized databases. It will be just too cheap not to. A system based on a few dozen SSD will replace those geriatric pigs with thousands or more HDD. The ongoing cost difference (TCO, as they say) will easily be greater than the amortization of build costs.
Those geriatric pigs which were built somewhat more recently, around stored procedures rather than application SQL, will have a better chance of survival. All these codebases will need is the schema refactored and the stored procs updated. The client application code wouldn't change; the proc still returns the same bloated data, alas.
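Here's roughly what I mean, as a hedged sketch with invented names: the schema underneath gets normalized, and a view (or an equivalent stored proc) keeps handing the old flat shape back to client code that never has to know.

    -- The wide, repetitive table the old application queried directly gets
    -- decomposed properly...
    CREATE TABLE policies (
        policy_id   INTEGER PRIMARY KEY,
        holder_name VARCHAR(100) NOT NULL,
        state       CHAR(2)      NOT NULL
    );

    CREATE TABLE claims (
        claim_id  INTEGER PRIMARY KEY,
        policy_id INTEGER NOT NULL REFERENCES policies (policy_id),
        amount    DECIMAL(12,2) NOT NULL
    );

    -- ...and the old, bloated row shape is preserved for the unchanged clients.
    CREATE VIEW claim_report AS
    SELECT cl.claim_id, p.holder_name, p.state, cl.amount
    FROM   claims cl
    JOIN   policies p ON p.policy_id = cl.policy_id;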
If you go to the concession stand, I'd like a large popcorn and Dr. Pepper. Thanks.
21 September 2009
Ionesco: "Exit the King"
I will happily admit that the machinations going on around SSD are currently my main fascination; this is the enabling technology for true RDBMS designs. So, watching the money grubbers is certainly amusing.
Last week (and continuing, although to a lesser degree, today) STEC's share dropped like a rock. Today they released a White Paper defending their part. It's actually got some useful information.
But, you've got to wonder. The PR and the posting of its existence on the Yahoo (and probably all others; it's also on Schwab) message board are geared to the Joe SixPack plunger, not the enterprise CIO. Sometimes, one has to just wonder what goes on in some people's heads.
19 September 2009
Color Me Sad
Were I a real reporter, I might have been there for the announcement, but alas no. Turns out that FlashFire is not a SSD array. This is Larry (ComputerWorld is where I found this quote, for whatever reason not in their PR):
"We have a huge, fast flash cache built into our storage servers," Ellison said. "These are not flash disks -- make no mistake, these are not flash disks. This is a smart memory hierarchy made up of DRAM in our database servers and flash in our storage servers, with very sophisticated algorithms. This is a very smart memory hierarchy where the Oracle software manages that memory extremely efficiently, much faster than flash disk."
So, it seems that FlashFire is not a flash disk subsystem. Oh well.
Update:
I knew I had read that FlashFire was specific to SSD implementation; I know Professor, I should keep better notes. But Burleson is one of the veterans of SSD for databases; Oracle in his case. I'll be adding a few Good Stuff links, and his main site will be one of them. So, here's his take on FlashFire (which also conflicts with what Larry said).
"We have a huge, fast flash cache built into our storage servers," Ellison said. "These are not flash disks -- make no mistake, these are not flash disks. This is a smart memory hierarchy made up of DRAM in our database servers and flash in our storage servers, with very sophisticated algorithms. This is a very smart memory hierarchy where the Oracle software manages that memory extremely efficiently, much faster than flash disk."
So, it seems that FlashFire is not a flash disk subsystem. Oh well.
Update:
I knew I had read that FlashFire was specific to SSD implementation; I know Professor, I should keep better notes. But Burleson is one of the veterans of SSD for databases; Oracle in his case. I'll be adding a few Good Stuff links, and his main site will be one of them. So, here's his take on FlashFire (which also conflicts with what Larry said).
18 September 2009
I Can do a Full Gainer
I'm going to bite the bullet, so to speak, and get an SSD for this machine and do some personal experimenting. I've been putting it off for a while due to: the time devoted to getting work (which is way too time consuming), the time needed to be a stock tycoon, and my less than stellar view of what's been available. What I want is a device that's under a grand, has enough capacity to simulate a real-world (commercial) system, and is a no brainer to install on ubuntu. I've long since lost interest in doing hardware installs just for laughs.
I think I've found what I want. It's the Fusion-io ioXtreme. It was reported to be shipping in July, but looking at the Fusion-io site, they're in "sign up to be the first on your block to own one" mode. Sigh. But the price, $895 for 80gig in a PCIe card, is in the proper ballpark, and it won't be rattling around in the machine. More than the X-25M, about the same as the X-25E, $$/gig anyway.
While I was wandering around the Fusion-io site, I came across this from July. Now, it doesn't read as though they went ahead and normalized the data, just moved it to the SSD. (It's a TPC-H database and likely some form of star/snowflake.) I can live with that; folks are still taking baby steps along the Yellow Brick Road. It does demonstrate, if one accepts the notion of validity of TPC benchmarks, that SSD can save money while being faster and less filling.
17 September 2009
Blood in the Street
There be carnage out there. I've been keeping a periodic eye gazing at the SSD stocks (and the increasing number of privates), with STEC being the "acknowledged leader" in enterprise drives. So their PR always says. It is true that STEC was, if not the first, certainly early and often qualified. Their list includes EMC, IBM, Sun, Compellent, HP.
It's been a week since I looked at the stock (I spend much of my stock time tracking biotech; more money more faster), and to my wondering eyes do appear but a true crash. Last I looked the share was over $40. Today it closed at $31.53. Trust me, stock promotion is not a factor in this endeavor, but it is undeniable that the current state of SSD in the enterprise is because of STEC's efforts to make itself rich, which it has.
The management of the company has been singing the "ain't nobody can do what we do" song for the last couple years, and in the last 12 months has signed up with the aforementioned companies. All the while deflecting questions about the likelihood of other suppliers of SSD. I never bought it. This site has been tracking the SSD world for more than a decade; since before flash was even used. Spending some time there makes it clear that STEC isn't the only game in town, and never was.
So, what happened? Turns out that Pliant Technology released its version of enterprise SSD a few days ago, which prompted some of the analysts to reduce their opinions of STEC.
The reason all this matters is that having multiple credible sources of enterprise SSD (what that term means is still open to discussion) is better for real relational database implementations. Which is what this endeavor is really all about. The SSD aspect is merely the implementation detail that makes it all possible.
What's bad for the Wall Street casino players is actually good for folks who are working at building useful things, and not merely engaging in zero sum games with each other.
11 September 2009
Larry Finally Speaks, and.... I'm Right
For all of you out there who've been saying that Larry wants Sun for java or Solaris or MySql, here's what he said yesterday in the Wall Street Journal:
We're in it to win it. IBM, we're looking forward to competing with you in the hardware business.
-- Larry Ellison
How dare you all doubt me. I've been doing this for a long time. He wants to kill Armonk's mainframe business. He always has. (See 28 August for the most recent discussion.)
Update (15 September):
OK, so today Oracle/Sun announced the new Oracle Database Machine. Oracle had previously been building the Exadata machine on HP hardware. No longer. Of particular interest to the readers of this endeavor is this:
The Sun Oracle Database Machine also includes Sun's new FlashFire technology to cache 'hot' data for dramatically improved transaction response times and throughput.
So, what is FlashFire? According to Sun, it's their implementation of SSD flash cache and system software for same.
From the PR:
You get ten times faster I/O response time and use ten times fewer disks for business applications from Oracle as well as third-party providers.
Larry's not interested in hardware. Nope. Armonk, you've got a problem.
03 September 2009
Persistent Myth: Bandwidth is Infinite
There exists, still, the myth of infinite bandwidth. The myth exists in support of the notion that "web" applications can and should be just like desktop applications. But there is a problem: what is a desktop application? In the beginning, 1982, the IBM PC provided a standalone little computer, which was expected to be programmed just like the 370, only for smaller problems related to the work of the individual.
That fairy tale came to an end with Lotus 1-2-3, which turned the PC into a toaster: an appliance which did some computing (itself done by programs written by professional assembly language programmers) upon some data entered, or made available, by the individual. Then came typing programs, later renamed word processing. The toaster syndrome was in full swing.
Then came Netware, and its ilk, to lead us to a kind of client/server environment. This is what "desktop application" really means these days: a local PC connected to a semi-local big computer. The VT-100 connected to a *nix database machine is the precursor to that.
AJAX, and so on, are attempts to take the 3270 behaviour of the web and turn it into the VT-100, albeit with pixels and graphics. In order to do that, the link to the outside world has to behave like fast RS-232.
So, today The New York Times runs this story. Infinite bandwidth, my eye. A bloody phone brings the net to its knees. When will people learn what your Mama told you, "what kind of world would we have if everybody behaved like you?". Nothing is infinite, stupidity likely excepted.
Update II:
There is the olde canard about tapes in a station wagon. Here is a new and even more amusing example.
Update:
In response to some questions from readers elsewhere, I'm led to pontificate further.
I missed out the obvious point (to me, anyway). There are two expenses in getting an image on the screen: computation and transfer of the image. With a 1982 desktop, what could be computed was memory mapped to the screen, so transfer was instantaneous (mostly).
With local networks and VT-100 to RS-232 to database, the screen is still a memory map in the server; all that goes over the wire is the characters in the screen image.
With GUI-ed screens in a local network, it's still manageable with Ethernet on wire.
With GUI-ed screens in the cell tower, not so much. Given that HTTP is about lots of request/response between the client (iPhone) and the server (Google machine, or whatever), the "virtual wire" gets overloaded. And will always be.
It's the same with building highways or subways or ...; traffic overwhelms infrastructure.
With an HTTP based internet, it's not possible to have a (mostly) passive (memory mapped) screen with all the computation at the server. Fact is, increasing computational power is a couple of orders of magnitude cheaper than I/O. And the web is about the least efficient form of I/O ever invented. It's not being used the way Cerf had designed it.
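To put some rough numbers on the transfer side of that trade-off, here's a back-of-envelope sketch; the screen sizes, header overhead, and link speed are my assumptions, picked only to show the shape of the gap:

# Back-of-envelope comparison: character-cell screen vs. a GUI screen image.
# All figures are illustrative assumptions, not measurements.

VT100_SCREEN = 80 * 24            # one full text screen, ~1,920 bytes of characters
GUI_SCREEN = 1024 * 768 * 3       # one uncompressed 24-bit bitmap, ~2.4 MB
HTTP_OVERHEAD = 800               # rough per-request header overhead, in bytes

def seconds_to_send(payload_bytes, requests, link_bits_per_sec):
    """Time to move a screen's worth of data over the link."""
    total_bytes = payload_bytes + requests * HTTP_OVERHEAD
    return total_bytes * 8 / link_bits_per_sec

# Assume a cell link sharing out roughly 1 Mb/s effective per client.
link = 1_000_000
print(f"text screen, one round trip     : {seconds_to_send(VT100_SCREEN, 1, link):.3f} s")
print(f"bitmap plus 20 AJAX round trips : {seconds_to_send(GUI_SCREEN, 20, link):.1f} s")

Roughly three orders of magnitude apart, before counting any computation at either end.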
Abuse leads to breakdown, and the web is broke. The iPhone just makes it obvious, but not the reason.
28 August 2009
Larry, Darryl, or Darryl
Remember Larry, Darryl, and that other brother Darryl? The question, from our point of view, is: are we looking at Larry or one of the Darryls? The question comes up again in an article in Fortune yesterday. The article argues that this is Darryl (one of them, anyway) we're dealing with, the one who cares only about software.
What's interesting is that among the commenters (by my count, the heavy majority agreeing with my thesis that Sun's value is the hardware business) is this link. In sum, Larry is calling IBM/DB2 out into the street for a gunfight. Which is the sheriff and which the bad guy?? Depends on which coast you live, I guess.
Larry has known for a long time that Oracle is faster than DB2, in the arenas he cares about. And that it is better adapted to the web world.
As I've said a few times: Larry has always wanted to bury the 370. With Sun's hardware business, he now has the equivalent of IBM's infrastructure. They've had that infrastructure, in one form or another, since the mid 1950's when Univac took months to decide on a name for their machine. That infrastructure has always been based on CPU's, files, and COBOL (on the mainframe, DB2 is just a veil over VSAM, which is one reason Oracle is a dog there; the notion that the 360 could do the full circle of computing from scientific to business didn't last very long, ending with an early 370 one-off - it's a COBOL machine). Larry now has an infrastructure based on CPU's, a database, and java. IBM is facing the first threat to its mainframe cash cow, ever. Armonk, you have a problem.
24 August 2009
A Bird on the Plate is Worth Two in the Cloud
There is a report today which amounts to a small, tiny, fledgling crow. Yummy, lightly sautéed with a good chianti.
It is a discussion of Clouding with SSD. As I have talked about here, I never expected Clouding to go SSD, just because the allure of Cloud is cheap and dirty. SSD is neither of those. Although, the justification given in the article is not the nirvana I have discussed, BCNF databases, but simple brute force speed with existing bloated data.
The discussion is also not a vindication of STEC, the loudest (well in some parts of the world, anyway) proponent. In fact, they talk about PCIe form factor, and that is in the wheelhouse of Fusion-io. My suspicion that the distributed machine with attached SSD in the PCIe slot will be at least as important as massive arrays of the EMC/IBM/Sun style looks to be getting stronger. Of course, it too could end up being a bit feathery in a few months.
The nature of the discussion is in terms, not of open Clouds (Amazon, et al), but of MySpace and the like. In other words, providers of Cloudy stuff to their own users. Who happen to want to store their data off-site. To me, that isn't quite what Cloud means, but redefining words to fit reality is characteristic.
I guess you can't have everything, but it does amount to a foot in the door. In time, the smart database folks will see the opportunity to have their cake and eat it too: SSD serving BCNF data will always be faster than serving the un-normalized stuff.
Another couple of steps forward.
18 August 2009
It was a Cloudy day, not a Sun was in the sky
Well, mangled Paul Simon a bit there, but this tidbit (via O'Reilly) from one Carl Hewitt set off the "The Thought Leaders Have Finally Figured Out the Obvious" bell:
As Jim Gray noted in "Distributed Computing Economics" (MSR-TR-2003-24) there is a growing imbalance between the computation power of billions of cores in aggregator datacenters and the relatively feeble fiber optic communications coming out of aggregator datacenters. This problem has now become so severe that Amazon has been forced to introduce a commercial service that lets users of their cloud import and export data through the post--as in, put it on storage devices and ship it by land, sea, or air.
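For the arithmetic-inclined, here's a quick sketch of why the post office wins; the dataset size and link speed are assumptions, chosen only to show the shape of the problem:

# How long does it take to push a large dataset through the wire versus
# putting it on drives and shipping it? Illustrative numbers only.

dataset_tb = 50                          # assumed dataset size
link_gbps = 1                            # assumed effective outbound bandwidth

bytes_total = dataset_tb * 10**12
seconds_on_wire = bytes_total * 8 / (link_gbps * 10**9)
days_on_wire = seconds_on_wire / 86_400

print(f"{dataset_tb} TB at {link_gbps} Gb/s is about {days_on_wire:.1f} days on the wire")
print("Overnight courier with a box of drives: about 1 day, regardless of size")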
For those who haven't been following along, I am among those who've been calling bullshit on the whole "we'll put our data in the (Amazon/Google/Microsoft/Grace L. Ferguson) cloud, save lots of money, and not have to worry about the annoying data any more" crowd. One needs to consider the bait and switch tactic of hucksters. Just because Google's motto is "don't be evil" doesn't mean they aren't. They are young and naive, and quite greedy. Their corporate clients are just greedy, and generally stupid. Witness the meltdown they have caused.
It used to be that business schools taught from a prime directive: don't buy (or outsource, same difference) your core competence. The same is true of your data. It is the life blood of your business (or life, if you are just a person). The dumbest thing you can do is hand it over to "the others". There will be hell to pay for those that do.
Read the article, and the comments. Some are intelligent, others very much less so.
An Update:
Here is today's next step along the way to dissipating the Clouds so we have just the sky. The argument boils down to what those of us who were around when Service Bureaus (the real first one was created before my time by IBM) were all the rage learned: you can't have it both ways. You can't have interchangeable resources and anything like performance and security specific to each client. They all have to accept lunch as "Cheeseburger, Cheeseburger, no Coke, Pepsi", no hot dogs or quiche. And Cloud will fail for that reason. Well, it is failing in the sense that those who would be providers continue to backpedal. And they will keep doing it. Suits are such knuckleheads. How DID they get to make decisions?
Update II:
TheStreet.com has a story on 3 September about Boeing. Toward the end is this:
As far as the extensive 787 program outsourcing to suppliers, [CEO Jim] McNerney was asked whether he would do it the same way again. He said he would not.
13 August 2009
Two steps forward (one step back?)
In order to reach nirvana, fully normalized databases blazing through joined rows, it is necessary that application users, hardware vendors, and those vendors' suppliers all be enamoured of SSD. The issue has always been: which end of the string is the head and which the tail. In normal life, one can't push a string, only pull it. This argues for the notion that users must demand SSD systems rather than vendors offering them unbidden.
It seems to be working out the other way. May be.
We're most of the way through earnings season, and some clues are now evident. STEC is, by all lights, the pioneer in SSD at the enterprise level. They've been working on these devices since 2005, and have only now had significant shipments. They have been qualified to EMC, HP, IBM, Compellent, Hitachi, Fujitsu, Sun (that we know about). They've announced a $120 million deal with one of these, assumed to be EMC, to ship starting 3rd quarter.
STEC continues to claim that they are alone as an enterprise SSD vendor. Intel's X-25 parts are user PC oriented. Fusion-io's part runs through the PCIe slot, rather than in a separate storage array. Texas Memory Systems has been building SSD (with DRAM for most of the time; they now have NAND parts) for enterprise machines for years.
What's kind of interesting is that STEC has just announced MLC-based parts since "...several of our price-sensitive OEM customers are now looking for SSD alternatives which only a true MLC-based SSD can deliver...". What makes this interesting is that until this announcement, STEC was only talking about its SLC parts. Note that the NAND chips are sourced, not built by STEC. Samsung is the likely vendor; in any case, the MLC chips, which were heretofore said to be inferior for enterprise SSD, are now kosher. This sounds to me like a retrenchment; the "enterprise SSD" market wasn't quite as big as thought, or not so price inelastic as thought, or ...
The implication is that machine vendors are not flocking to SSD, which means that users are not either. SSD is the future, and the future is closer than it appears in the mirror, but for SSD to be widely (and wildly) successful, it has to be part of a refactoring of the supported datastore. SSD will never be price or data density competitive with spinning rust. Efficiency (an order of magnitude less data retrieved faster) is the big win.
EMC, IBM, and Compellent have reported, and gave guidance for 3rd quarter that isn't skyrocketing. Whether Sun will be serious in the hardware segment under Uncle Larry is still moot; I've bet that hardware is the reason for the purchase, but we'll see. If so, then STEC will benefit. All in all, for the near term, STEC, Intel, Fusion-io, et al are not looking like saloons in a mining town.
Get out there and push for 3NF. It's the only way to fly.
18 July 2009
Common Sense isn't always common
I have been meaning to write about common table expressions (CTE), since this syntax (sugar, some say) explicitly implements hierarchies, and the reactionaries out there pining for IMS and such won't be quiet. I first used them on DB2/LUW years ago. SQLServer 2005 added support, and now Postgres 8.4 has, too. Oracle has had its CONNECT BY syntax much longer. CTE syntax got added to SQL-2003.
I read various database sites and blogs; among my favorites is PostgreSQL.org. (Note to self; add to Good Stuff.) It has a rotating set of links, and today there is an article discussing CTE in postgres. Since I haven't got around to installing 8.4 yet, the work is done for me. Read it. The article compares using CTE versus functions, which is a neat idea in and of itself. They demonstrate that CTE can run faster than standard syntax on a HDD machine. Very neat.
CTE is germane to the main point of this endeavor, SSD multi-machines, since a CTE is just massive joining behind the scenes. You get the point. What would be somewhat inefficient on HDD becomes trivial on SSD. One more reason to put the 1960's datastore behind us. Paul McCartney played on the marquee of the Ed Sullivan Theater a few days ago, but he still looked as ancient as Ozzy Osbourne. Hierarchy is from the same time. Dump it. XML is IMS in drag; all sequins and rhinestones.
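For anyone who hasn't seen the syntax, here's a minimal recursive CTE walking a toy reporting chain. I've run it through SQLite's Python driver only to keep the example self-contained; the table and data are invented, and the shape is the SQL-2003 recursive CTE (DB2 and SQLServer spell it without the RECURSIVE keyword).

import sqlite3

# A toy org-chart hierarchy walked with a recursive common table expression.
# Table and data are invented for the illustration.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, boss INTEGER, name TEXT);
    INSERT INTO employee VALUES (1, NULL, 'CEO'), (2, 1, 'VP'),
                                (3, 2, 'Manager'), (4, 3, 'Coder');
""")

rows = db.execute("""
    WITH RECURSIVE chain(id, name, depth) AS (
        SELECT id, name, 0 FROM employee WHERE boss IS NULL
        UNION ALL
        SELECT e.id, e.name, c.depth + 1
        FROM employee e JOIN chain c ON e.boss = c.id
    )
    SELECT depth, name FROM chain ORDER BY depth
""").fetchall()

for depth, name in rows:
    print("  " * depth + name)

Behind the scenes that recursion is nothing but repeated self-joins, which is exactly the workload that hurts on rotating storage and doesn't on flash.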
16 July 2009
Sunrise, Sunset. Game, Set, Match. Part 2.
As (what now turns out to be) Part 1 began:
Well, the other shoe dropped. Oracle has bid for Sun. In my tracking of speculation, Oracle had more weight than IBM. And so it has turned out. This might end up being a problem for IBM.
The buyout/merger/lunch is official (so far as shareholders go; DoJ might vomit, but we'll have to wait on that) today, and since yesterday's post about file systems is still fresh in my brain, I thought I'd take a minute or two to expand on why I believe Oracle wants Sun.
It's not java.
It never was java, to my way of thinking. So, here's my way of thinking.
Larry has been buying up applications for a few years. What to do next? Or should he just keep doing that? What is the next Everest?
IBM has been diligent in transforming itself (as has GE, by the way) from a maker of things to a finance and body shop. The reason for doing so is apparent: services require little (real) capital investment. All you do is rent some office space, some computer time (or cloud it? maybe that, too), find some minimally competent bodies; and sell the hell out of the "organization". IBM was, is, and always will be, a sales organization. It needed mainframes in the beginning because the US economy had not yet made the transition to de-industrialization (2007: 40% of corporate profit is finance). And to that point, the company had a history of making bespoke machines, in no small part to lock in its customers. Sales staff's major duty was "client management"; squeeze the last dollar out of each of them, and make damn sure clients didn't go someplace else.
The one place Larry has not been able to take Oracle (the database) is the IBM mainframe. There are technical reasons (mostly due to how IBM designed those things decades ago) why Oracle has never run all that well on the IBM mainframe. Consider that rather than try to climb Everest, maybe you could nuke the sucker, stroll over the debris, and say that you've climbed Everest. Kind of true.
How, then, does Larry nuke IBM on the mainframe? That's where Sun comes in. Rather than trying to get Oracle (the database) to run as well on the mainframe doing COBOL support as DB2 does, why not build an alternative which is cheaper, faster, easier, and runs Oracle? Cool. Oracle and Sun have been together for years. Now Larry can get the hardware folks to do exactly what he wants. He has HP doing sort of what he wants; but he doesn't own them. And the Sun processors, some tech folk assert, are better than what IBM now uses.
Which brings us to btrfs. It was developed by Oracle, then open sourced. It is still under development, but if you read the site, you find that SSD support is what it's about. He'll have a machine with SSD and a file system tuned for them, running his database. He can make a compelling TCO argument that converting antique COBOL/VSAM applications to his super hot-rod database machine is a slam dunk. You heard it here first.
As I have been saying, SSD multi-core multi-processor high normal form database machines are the most cost effective way to handle data. The more I learn about what Larry is up to, the more convinced I am that this is where he's taking Oracle. I couldn't be happier.
File Systems for SSD are here
I have no idea how many folks really notice the quotes at the top of this endeavor, nevertheless, I have found a recent article which talks about Linux file systems. The interviewee (if that's a real word) has worked on writing file systems for Linux, and has the following (page 2) to say about SSD and file systems:
With regard to local file systems, I think btrfs is flexible enough to handle any projected hardware changes for the next decade, both in performance and capacity - in other words, SSDs and truly enormous quantities of storage. I also think we may see more flash-based devices exporting access to the hardware directly as the SSD market commoditizes, in which case LogFS and NILFS become even more interesting, perhaps as part of an open source wear-leveling layer that can integrate tightly with other Linux file systems.
So, the future is here. For information about btrfs, see Wikipedia for a start (SSD is explicitly supported).
tick, tock. what's up Doc?
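Since "wear-leveling layer" gets tossed around a lot, here's a toy sketch of the bookkeeping involved. It is nothing like a production flash translation layer; it only illustrates the idea that repeated logical writes get spread across physical blocks rather than hammering the same cells:

# Toy sketch of the wear-leveling idea behind a flash translation layer:
# logical block addresses are remapped so writes spread across physical
# blocks. Nothing here resembles a real FTL; it only shows the bookkeeping.

class ToyWearLeveler:
    def __init__(self, physical_blocks):
        self.erase_counts = [0] * physical_blocks
        self.mapping = {}                  # logical block -> physical block
        self.free = set(range(physical_blocks))

    def write(self, logical_block):
        old = self.mapping.get(logical_block)
        if old is not None:
            self.free.add(old)             # old copy gets erased and reused later
        # Place the new copy on the least-worn free block.
        target = min(self.free, key=lambda b: self.erase_counts[b])
        self.free.remove(target)
        self.erase_counts[target] += 1
        self.mapping[logical_block] = target

ftl = ToyWearLeveler(physical_blocks=8)
for _ in range(1000):
    ftl.write(logical_block=0)             # repeated writes to the same logical block
print(ftl.erase_counts)                    # wear is spread, not piled on one block

Run it and the erase counts come out roughly even, even though every write hit the same logical address.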
08 July 2009
Lots of Chrome, and Tail Fins
Phones will kill the Internet, but not databases
That was the way it was shown in the Musings in Process list, but times have changed. A couple of weeks on the calendar, but eons in system time. We now have ChromeOS, so my musing was almost right; might still be eventually, but for the near- to mid-term, it looks now as though the netbook may well rule the roost.
The netbook means it's back to the future (well, the 1970's anyway). The netbook means that host/terminal computing has won. Took a while, but won it has. Now, the world will look like the XTerm world of the 80's. M$'s worst nightmare. Odd thing is, as I type this, ARM's share price is going down and so is Texas Instruments. Both are mentioned in articles about ChromeOS and how it will run. That I can't explain.
In any case, high normal form databases serving, effectively, XTerms is as close to Nirvana as one could hope for.
Tick. Tick. Tick.
04 July 2009
Erik Naggum's thoughts on XML
Erik Naggum died recently. He was an SGML expert, and an opponent of XML used stupidly. This is from a 2002 usenet thread. Couldn't say it better myself. The point, of course, is that XML data requires specific code to extract the raw data, which is then processed by the remaining application code. As others have pointed out, one could use comma separated value files, skip the processing code, and go straight to the application code. Or use a database with SQL.
People who think object-orientation is so great, have generally failed to grasp the value of data-driven designs despite the serious attempt at making such design easier to model, and think solely in terms of code-driven designs where their class hierarchies are poor adaptations to their incompetent coding styles. This is extremely depressing, as the interminable "software crisis" is a result of code-driven design. SGML and XML were attempts at promoting data-driven design that would produce data that was _supposedly_ indepedent of any application. The result is that people who have so little clue they should have attracted one simply by the sucking power of vacuum do code-driven designs in XML, which is _really_ retarded, and then they need to store their moronically designed data in databases, which is, of course, too hard given their braindamaged designs, so the relational model does not "work" for them.
03 July 2009
Cloud: Lucy in the Sky with Razorblades ... Updated, twice
"'The time has come,' the walrus said, 'to talk of many things'"
Today I endeavor to check off one of the musings in process, dealing with silos in the sky. I am motivated by a bit of bloviating I ran across in my surfing, this nonsense. I will leave it to readers to endure it on their own. I will only pull out the occasional quote to gloat.
A bit of background is in order. With the arrival of Web 2.0, and to a lesser extent Web .00001, coders began to change the rules back to what existed in the 1960's: all code is smart and all data is dumb. Java became COBOL, and data became copybooks. The main problems are that the coders are dishonest about their desires and intentions, and the effect of what they are attempting will eventually lead to the problems which led Dr. Codd to devise the relational model and database; he is not responsible for SQL, Chamberlin is the guilty party there and he continues to sin with XQuery and the like.
In the 1960's there developed the industry segment known as Service Bureaus. IBM was a major player. A service bureau was a connected computer service, generally over leased lines, to which a company could off-load its applications. Often, applications were provided by the service bureau. The service bureau agreed to provide resources as needed.
SOA and cloud are just http versions of service bureaus. Service bureaus fell out of favor for the obvious reasons: they didn't actually manage to provide resources on demand, they weren't any more reliable than in-house, they weren't any more (often quite less) secure than in-house, and they didn't save their customers any money. After all, they had to make a profit doing what had been a simple cost for their clients. There didn't turn out (surprise, surprise) to be any economies of scale. There won't be for SOA or cloud, either. That is no surprise.
The notion of provisioning in the cloud being cheaper and more scalable is founded on a single, false, assumption. That is: demand spikes for resources among the clients are uncorrelated, or that gross demand for all clients is either constant or monotonically increasing. That the assumption is false is easy to fathom; there are easily identifiable real world generators of demand spikes, and few are unique to either specific companies or industries. Daily closing hour, Friday peaks, weekend peaks, seasonal peaks, month end peaks, and so on. The argument is made that the resource demands made by the release of the latest iPhone (as an example) are transitory to Apple, the retail sites, and AT&T. What is ignored is that all such organizations, unless they are failing, will absorb these resources in their continuing (growing) business in short order.
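A toy simulation makes the point; the client count, spike sizes, and probabilities are all invented, and only the shape of the result matters:

import random

# Toy simulation: pooled capacity only helps when clients' demand spikes are
# uncorrelated. All figures are invented; only the shape of the result matters.
random.seed(42)
CLIENTS, HOURS, BASE, SPIKE = 20, 24 * 30, 10, 100

def pooled_peak(correlated):
    peak = 0
    for _ in range(HOURS):
        shared = random.random() < 0.02          # month-end, Friday close, etc.
        hour_total = 0
        for _ in range(CLIENTS):
            spiking = shared if correlated else random.random() < 0.02
            hour_total += BASE + (SPIKE if spiking else 0)
        peak = max(peak, hour_total)
    return peak

print("sum of each client's own peak:   ", CLIENTS * (BASE + SPIKE))
print("pooled peak, uncorrelated spikes:", pooled_peak(correlated=False))
print("pooled peak, correlated spikes:  ", pooled_peak(correlated=True))

When the spikes are correlated, the pooled peak equals the sum of the individual peaks, and the provisioning argument evaporates.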
The real failing of SOA/cloud will be the imposition of lowest-common-denominator data structures. Which brings us back to that execrable post. Since these NoSQL folk have it in their heads that "all data be mine", just as COBOL programmers did in the 1960's, they will build silos in the sky, just as their grandpappies built silos in glass rooms (most likely they don't even know what a glass room is, alas). As the old saying goes, those who ignore history are doomed to repeat it.
So, some quotes.
"'Relational databases give you too much. They force you to twist your object data to fit a RDBMS [relational database management system],' said Jon Travis, principal engineer at Java toolmaker SpringSource..."
Mr. Travis fails to understand that object data are separate and apart from object methods. Always were, and always will be. OODBMS failed for just that reason. There is nothing to be gained, and a lot of pain to be endured, from storing all that method text more than once. The instance data, stored relationally, is the minimum cover (so to speak) of the data requirement and identifies each instance of the class. There is no "twisting" to be done. The relational model keeps all the data nice and neat and constrained to correctness; no code needed. The method text is needed only for transitory changes from one stable, correct state to the next. Each state is written back to the relational database. Simple as that. But that means far less code, which coders view as a job threat. Yes, yes it is.
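A small sketch of what I mean, with invented names: the row holds the state, the database constrains it, and the methods never go anywhere near the store.

import sqlite3
from dataclasses import dataclass

# The *data* of an object maps straight onto a row; the methods stay in code
# and are never persisted. Names and schema are invented for the sketch.
@dataclass
class Account:
    id: int
    balance: float

    def deposit(self, amount: float):          # transitory behaviour only
        self.balance += amount

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, "
           "balance REAL NOT NULL CHECK (balance >= 0))")

acct = Account(1, 100.0)
acct.deposit(25.0)                             # move to the next correct state
db.execute("INSERT OR REPLACE INTO account VALUES (?, ?)", (acct.id, acct.balance))
db.commit()

print(db.execute("SELECT id, balance FROM account").fetchall())

The CHECK constraint is the point: the store keeps the state correct on its own, with no method text anywhere in sight.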
"'SQL is an awkward fit for procedural code, and almost all code is procedural,' said Curt Monash, an independent database analyst and blogger."
Mr. Monash is infamous in the database world, and proves it again. The OO world, it claims anyway, is explicitly non-procedural. It is event driven, through message passing. So it says. In any case, how the object's data is manipulated after construction is of no concern to the data store, be it a RDBMS or shoe box full of 3x5 cards. All such data manipulation is transitory and irrelevant to state. State is what the data store cares about. Mr. Monash doesn't get that.
Siloing, or rather the riddance of same, was Dr. Codd's concern. These KiddieKoders are such ignorant reactionaries, and don't even know it. The title of his first published paper, "A Relational Model of Data for Large Shared Data Banks", puts it all together. No siloing. The cloud, with these primitive one-off data structures, creates the massive data problem rather than solving it.
The serious problem with the cloud, however, is that, if widely adopted, it may kill off the advantage of the point of this endeavor, the SSD multi-machine database. SSD storage has the advantage of supporting high normal form, and thus parsimonious, data structures. If cloud becomes ascendant, it will adopt lowest-common-denominator storage, i.e. cheap. And SSD multi-machines are not that. I believe it can be shown that SSD storage will be faster (with a smaller footprint) for high normal form databases than for the de-normalized spreadsheets so beloved of KiddieKoders. But cloud providers aren't going to be that smart. Back to the 60's.
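A crude sizing sketch of that claim; the row counts and byte sizes are my assumptions, picked only for illustration:

# Crude footprint arithmetic for the normalization claim. All row counts and
# byte sizes are assumptions chosen for illustration.

customers, orders_per_customer = 100_000, 50
customer_bytes, order_bytes = 200, 60           # per-row payloads

normalized = (customers * customer_bytes
              + customers * orders_per_customer * (order_bytes + 8))  # + foreign key
# De-normalized: every order row drags the full customer record along with it.
denormalized = customers * orders_per_customer * (order_bytes + customer_bytes)

print(f"normalized   : {normalized / 1e9:.2f} GB")
print(f"de-normalized: {denormalized / 1e9:.2f} GB")

And that is with a single parent table; pile on the usual repeated address, product, and description columns and the gap only widens.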
I'll close with a quote from those I keep around as sigs for e-mails:
This on-demand, SaaS phenomenon is something I've lived through three times in my career now. The first time, it was called service bureaux. The second time, it was application service providers, and now it's called SaaS. People will realise the hype about SaaS companies has been overblown within the next two years. ... People are stupid. History has shown it repeats itself, and people make the same mistakes.
-- Harry Debes/2008
Update:
I check up with Gartner PRs every now and again; I don't spend the coin to subscribe. Well, they still haven't released the database survey (always looking to see how much DB2 has fallen behind), but there is a survey concerning SaaS that, surprise surprise, reinforces my thesis. Gartner is no better at prognosticating than Madam Zonga, but they do surveys as well as anybody. Have a look.
Update 2:
Another Gartner-ism from a newer report on cloud specifically:
"As cloud computing evolves, combinations of cloud services will be too complex and untrustworthy for end consumers to handle their integration, according to Gartner, Inc. Gartner predicts that as cloud services are adopted, the ability to govern their use, performance and delivery will be provided by cloud service brokerages."
In other words, cloud will need yet a new infrastructure to make this really, really cheap and easy off-loading of responsibility possible. Have these folks already forgotten the clustered duck that was CORBA? God almighty.
29 June 2009
Oracle does Data
SeekingAlpha is the most fact-based of the myriad stock "analysis" sites available. Since I spend a good deal of my time these days getting rich day-trading (yeah, sure), I have fallen under its spell.
Today yields a post about Oracle and its "new" data modeler. I haven't tried it out yet (I'm still primarily interested in DB2, sigh), but from the write up, this sounds like another nail in the de-normalization coffin.
Tick. Tick. Tick.
22 June 2009
Zealous. Conviction.
Some might consider a blog dedicated to the relational model, relational databases, and the extinction of xml as datastore rather quixotic. Zealous, even. This month marks the 39th anniversary of Dr. Codd's public paper; the 40th of the first paper, internal at IBM, is in August. All along, Dr. Codd, and Chris Date most prominently of those who followed, asserted that the relational model and the database were logical constructs. In particular, physical implementation was a vendor detail; vendors were free to use any hardware and coding they wished to support the relational database.
Early on, the implementation of the join was seen to be a stumbling block. Hardware based on rotating storage had to be supplemented with ever more ingenious buffering to make joins less costly. Today, industrial strength RDBMSs such as Oracle and DB2, with sophisticated RAID disk subsystems, can support high normal form relational databases. But they are not only expensive to acquire, they are also expensive to maintain. They require lots of vendor-specific knowledge, since the file structures and buffer structures are not defined either in the relational model or in SQL (which is not strictly even a part of the relational model, but few understand that).
I, and others, made the connection a few years ago, that freeing the relational database from rotating storage meant that any logical database structure, in particular those of high normal form, would be hardware neutral with a machine built on multi-cores/multi-processors with solid state disk storage; even with current products. Nirvana had been reached.
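Some rough arithmetic, using ballpark latencies rather than benchmarks, shows why the join penalty evaporates once the seek time does:

# Rough arithmetic on why joins stop hurting when seek time disappears.
# Latency figures are ballpark assumptions, not measurements.

rows_touched = 100_000                  # random row fetches driven by a join
hdd_seek_s = 0.005                      # ~5 ms per random read on spinning rust
ssd_read_s = 0.0001                     # ~100 microseconds per random read on flash

print(f"HDD, uncached: {rows_touched * hdd_seek_s:.0f} s")
print(f"SSD, uncached: {rows_touched * ssd_read_s:.0f} s")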
Here in rural Connecticut, as other locations I will admit, the mainframe/COBOL/VSAM mindset (hidebound as it is) still holds sway. Yet I continue to preach. Some say, zealously. Though I haven't yet delved into the xml morass, I have acquired one knucklehead correspondent. Not "Database Debunking" territory, but there is always hope. I gather that some nerve has been pinched. That is a good thing. The TCO equation is becoming ever more difficult to ignore. The ever more widespread development of SSD, and more importantly, NAND controllers make it all inevitable. Any vendor with access to NAND and an OEM controller will be able to produce these drives.
Which brings me to the zealous. Not I, believe it or not. I first became aware of SSD with Texas Memory Systems. But they dealt, at that time, in hundreds of kilobuck DRAM based hardware. It was only in the last few months, with the Intel X-25 parts, that I became aware of flash SSD; turns out that flash SSD had been around for some time, just not very good until the Intel drives. From there I found STEC, which, by all accounts, is the numero uno enterprise vendor. They also are not shy about making their case. This past week, the principals were interviewed by a trade organ and the CEO had this to say:
"To say that they [new and existing vendors] are competing with STEC is really a misunderstanding. We don't have a direct competitor today. We've got the five customers worldwide that we went after. Basically, we have all of our target customers."
The COO, his brother:
"We see in the next three or four years flash drives from STEC and others wiping out the whole hard drive industry for high-end storage. The biggest guys in the industry are forced to follow in our footsteps instead of us following them."
Of course, they would say good things about their business. The thing is, the industry analysts agree. The fact that there are others in the industry *trying* to take the business is far more significant. If no other storage manufacturer cared, and if there were no startups and micro-cap privates working on flash SSD drives, flash SSD controllers, flash NAND replacements; then SSD, it could be argued, would just be the niche product it has been for decades with Texas Memory having it largely to itself. Texas Memory now ships flash SSD, in addition to its DRAM machines.
Repeat: "wiping out the whole hard drive industry for high-end storage". But, it looks to me that along with STEC dominating high end, Intel et al will wipe out HDD in consumer computers. Not that this is directly relevant to the RDBMS world in which I live. I left dBase II behind some time ago.
Tick. Tick. Tick.
19 June 2009
Real Time. No Bill Maher, But I'll be Funny. I promise.
The point of this endeavor is to find, and at times create, reasons to embrace not only the solid state disk multi-processor database machine, but also the relational database as envisioned by Dr. Codd. The synergy between the machine and the data model, to me anyway, is so obvious that the need to have this blog is occasionally disappointing. Imagine having to publicize and promote the Pythagorean Theorem. But it keeps me off the streets.
Whilst sipping beer and watching the Yankees and Red Sox (different games on different TVs) at a local tavern, I noticed for the umpteenth time that the staff were using Aloha order entry terminals. Aloha has been around for years, and I've seen it in many establishments. The sight dredged up a memory from years ago. I had spent some time attempting to be the next Ernst Haas, but then returned to database systems when it didn't work out. I was working as MIS director for a local contractor, and convinced them that it might be a good idea to replace their TI-990 minicomputer applications with something a tad more up to date. They took me up on the idea, so I had to find replacements for their applications. Eventually, we settled on two applications both written to the Progress database/4GL. They're still using them.
Progress was and is relational, but not especially SQL oriented. While talking with the developers of one of the applications (both were general ledger driven verticals), we talked about some of the configuration switches available. The ledger posting subsystems each had a switch for real time versus batch updating. The recommendation was to batch everything except the order entry; inventory needed to be up to date, but A/R, purchasing, and the like would put too much strain on the machine. And don't even think about doing payroll updates in real time.
Now, the schema for this application printed out on a stack of 11 by 14 greenbar about a foot thick. There were a lot of tables with a slew of columns. Looking back, not very relational. Looking ahead, do we need batch processing any longer?
The reason for doing batch processing goes back to the first computers, literally. Even after the transition from tape to disk, most code was still sequential (and my recent experience with a Fortune 100 dinosaur confirms this remains true) and left as is. Tape files on disk.
But now the SSD/multi machine makes it not only feasible, but preferable, to run such code with the switch set to real time. No more batch processing. No more worrying about the overnight "batch window". The amount of updating to tables is, at worst, exactly the same and, at best, less: less when the table structure is normalized, because less data exists to be modified. Each update takes a few microseconds, since the delay of disk based joins is removed. The I/O load was the reason to avoid real time updates in database applications, and that reason is going away. We're not talking about rocket science computations, just moving data about in the tables and perhaps an addition here and there.
New rule: we don't need no stinking batches.
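For the curious, a real time posting under the covers is nothing exotic. Here is a minimal sketch, in generic SQL with hypothetical table and column names (not the Progress 4GL the application actually used): one short transaction per event, touching only the normalized rows involved.

-- Post one order line as it happens, rather than queueing it for a batch run.
BEGIN TRANSACTION;

INSERT INTO order_line (order_id, item_id, qty, unit_price)
VALUES (1001, 'WIDGET-42', 5, 9.95);

UPDATE inventory
SET    qty_on_hand = qty_on_hand - 5
WHERE  item_id = 'WIDGET-42';

INSERT INTO gl_entry (account_id, order_id, amount, entered_at)
VALUES ('4000-SALES', 1001, 5 * 9.95, CURRENT_TIMESTAMP);

COMMIT;

On rotating rust, thousands of these per minute meant seek storms; on the SSD/multi machine, each one is over in microseconds.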
17 June 2009
And the Beat Goes On
Some more industry news. Compellent Technologies is a smallish (relative to EMC) storage subsystem supplier. They qualified the STEC SSD drives some time ago; among the earliest to do so.
They have confabs, as do most vendors. They put out a press release for their recent meeting. Compellent doesn't, so far as I can find out, care whether they ship HDD or SSD subsystems. But they did run a survey during the fest, and found that 91% of their business partners (78% of customers; i.e. the sheep need to be led) checked the boxes for "I really need to do SSD" and the like. The train keeps gathering speed. Now, we just need the CIO types to realize that SSD systems have much more innovative/disruptive implications for application design.
Tick. Tick. Tick.
Here is the PR.
13 June 2009
Let's Go Dutch
In keeping with the theme of this endeavor, every now and then I return to the basics, which is to say the relational model. Since I'm neither Date nor CELKO, I seek out published authors who have figured it out (agree with me, he he). Today, a Dutch Treat: Applied Mathematics for Database Professionals. The book discusses a formalized model of data, first developed by the authors' mentor, Bert de Brock, in 1995. As with much that isn't grounded in Flavor of the Month, the ideas took some time to develop, and still remain relevant. Date and Darwen provide a foreword that is, for some reason, rather tepid. I don't quite understand why they aren't explicitly supportive, but there you are.
What de Haan and Koppelaars talk about won't (unfortunately) likely get you that next job doing SQL Server for Wendy's. On the other hand, it will clarify why and how high normal form data structures do what this endeavor seeks: define the most parsimonious data structure, one which is self-regulating and self-defending. As I have been saying, the SSD/multi machine makes this fully doable in the face of the "joins are too slow" rabble. Following the encapsulation discussion earlier, an existing bloated flat file database can be refactored to proper normal form, with existing code (COBOL, C++, java, etc.) reading from views which replicate the old table structures and writing through stored procedures. No sane code writes directly anyway (modulo 1970 era COBOL, alas).
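To sketch what that refactoring looks like, in generic SQL with hypothetical names: the old flat ORDER_MASTER table is decomposed into normalized base tables, and a view carrying the old name keeps the legacy readers running unchanged.

-- The old flat ORDER_MASTER record, now a view over normalized base tables.
CREATE VIEW order_master AS
SELECT o.order_id,
       c.customer_name,
       a.street, a.city, a.postal_code,
       ol.item_id, ol.qty, ol.unit_price
FROM   orders     o
JOIN   customer   c  ON c.customer_id = o.customer_id
JOIN   address    a  ON a.address_id  = o.ship_address_id
JOIN   order_line ol ON ol.order_id   = o.order_id;
-- Writes go through stored procedures (see the Encapsulation post), so no
-- application code ever addresses the base tables directly.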
I won't attempt to rehash the text, but I will give an overview (the link will take you to Amazon, as the eager amongst you have already found). The first part, four chapters, deals with math; logic, set theory, and functions. This is a rather complete treatment, which is welcome. The second part, five chapters, is titled Application, and is the meat of the effort. Here the authors build both the vocabulary of their modeling approach, and the model. It is expressed in the language of the math they set out in part 1, not SQL or the DDL of some particular engine. (That is dealt with in the third part, one chapter, at the end and is Oracle syntax.) I found it quite alien on first reading. The book does demand re-reading, but rewards one with a very clear understanding of what it is possible to do in a data modeling vocabulary.
The model is based on the idea of a database state which is initially correct, and which will only be modified into another state that is also correct. The definition of correct is closed world, and the transition process is database centric, not table centric. Years and years ago, I worked with the Progress database/4GL, which was not a SQL implementation (its 4GL was the basis of programming 99.44% of the time) by intent, although it did support SQL. I talked with one of its principal developers, The Wizard, at a conference, who observed that a database, if it really is such, is whole. It is not a collection of files/tables; if you can replace any one table at will, what you have is not a database. It just isn't. That was one of the many epiphanies in life. de Haan and Koppelaars take this approach with no remorse. The object of interest is a database state. Refreshing.
The discussion of constraints is more explicit than usual. They describe constraints as tuple, table, and database level. With a high normal form view of data, there will be more tuple constraints than is common in legacy databases.
The last chapter presents the method using Oracle syntax. The most interesting aspect of the chapter is the evolution of what the authors refer to as the Execution Model. There are six, of increasing correctness, implemented both declaratively and with triggers. The trigger code is Oracle specific, in that other databases, SQL Server for example, define change of state coverage differently; some databases require separate triggers for each of insert, update, and delete, while others support multiple actions in one trigger. And Oracle users have to deal with the dreaded mutating table error; DB2 users will not, since DB2 provides support for before and after images of rows. But Oracle remains the mindshare leader.
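For illustration only, and in generic SQL rather than the book's Oracle-flavored execution models (the table and constraint names are made up), the three constraint levels look roughly like this:

-- Tuple level: checkable from the columns of a single row.
ALTER TABLE order_line
  ADD CONSTRAINT qty_positive CHECK (qty > 0);

-- Table level: involves other rows of the same table; keys are the usual case.
ALTER TABLE order_line
  ADD CONSTRAINT order_line_pk PRIMARY KEY (order_id, line_no);

-- Database level: spans tables. A foreign key covers the common case; anything
-- richer lands in triggers or procedural code, which is exactly where the
-- authors' execution models come in.
ALTER TABLE order_line
  ADD CONSTRAINT order_line_fk FOREIGN KEY (order_id) REFERENCES orders (order_id);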
So, 300 pages which will, unless you have a degree (or nearly) in undergraduate math, stretch your understanding of what math there really is in the Relational Model; and how that can be leveraged into far stronger database specifications. Well worth the effort.
07 June 2009
But, I can see Russia from Alaska
As Sarah said, you can see Russia from Alaska. With regard to SSD and relational databases, a view is going to motivate another major change to how we build database applications.
During this down time, I have been keeping track of the producer of SSD, STEC (both its name and stock symbol; it started out as Simple Technology), and they continue to announce agreements with large systems sellers, HP being the newest. A notion has begun to bubble in my brain. Said notion runs along these lines. The major sellers of hardware have gotten the SSD religion. Absolute capacity of SSD will not, in a reasonable timeframe, catch up with HDD. This puts the hardware folk in the following situation: they have these machines which can process data at unheard of speeds, but not the massive amounts of data that are the apple of the eye of flat file KiddieKoders. What to do, what to do?
The fact that STEC and Intel are making available SLC drives, with STEC targeting big server and near mainframe machines, and the machine makers taking them up on the deal, means that there really is a paradigm shift in progress. Aside: Fusion-io is targeting server quality drives on PCIe cards. That puzzled me for a bit, but the answer seems clear now. Such a drive is aimed squarely at Google, et al. The reason: the Google approach is one network, many machines. A disk farm is not their answer. Fusion-io can sell a ton of SSD to them. Alas, Fusion-io is private, so no money for the retail investor. Sigh.
The paradigm shift is motivated by the TCO equation, as much as blazing speed. SSD gulps a fraction of the power of HDD, and generates a fraction of the heat and thereby demands a fraction of the cooling. So, total power required per gigabyte is through the floor compared to HDD. There is further the reduced demand to do RAID-10, since the SSD takes care of data access speed in and of itself. So, one can replace dozens of HDD with a single SSD, which then sips just a bit of juice from the wall plug. The savings, for large data centres, could be so large that, rather than just wait to build "green fields" projects on SSD/multi machines as they arise in normal development, existing machines should be just scrapped and replaced. Ah bliss.
But now for the really significant paradigm shift. My suspicion is that Larry Ellison, not Armonk, has already figured this out. You read it here first, folks. There was a reason he wanted Sun; they have been doing business with STEC for a while.
The premise: SSD/multi machines excel at running high NF databases. The SSD speeds up flat file applications some, but that is not the Big Win. The Big Win is to excise terabytes from the data model. That's where the money is to be made. The SSD/multi machine with RDBMS is the answer. But to get there, fully, requires a paradigm shift in database software. The first database vendor to get there wins The Whole Market. You read that here first, too, folks.
There have been two historical knocks on 4/5NF data models. The first is that joins are too slow. SSD, by itself, slays that. With the SSD/multi machine, joins are trivial and fast. These models also allow the excision of terabytes of data. The second is that vendors have, as yet, not been able (or, more likely, willing) to implement Codd's 6th rule on view updating. Basically, only views defined as a projection of a single table are updatable in current products. This is material in the following way. In order to reap the maximum benefit of the SSD/multi machine database application, one needs the ability to both read and write the data in its logical structure, which includes the joins. In the near term, stored procedures suffice; send the joined row to the SP, which figures out the pieces and the logical order to update the base tables.
But what happens if a database engine can update joined views? Then it's all just SQL from the application code point of view. Less code, fewer bugs, faster execution. What's not to love? The pressure on database vendors to implement true view updating will increase as developers absorb the importance of the SSD/multi machines, and managers absorb the magnitude of cost saving available from using such machines to their maximum facility.
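Concretely, and purely as a sketch (hypothetical view and columns; no shipping engine accepts this today against a multi-table view), Codd's rule 6 honored in full would let application code say:

-- One statement against the joined Order view; the engine, not the coder,
-- decomposes it into updates on the underlying base tables.
UPDATE order_view
SET    ship_city = 'Hartford',
       qty       = 6
WHERE  order_id  = 1001
AND    item_id   = 'WIDGET-42';

Until then, the stored procedure does the decomposing, as described above.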
Oracle will be the first to get there just because IBM seems not to care about the relational aspects of its databases. The mainframe (z/OS) version is just a COBOL file handler, and the LUW version is drinking at the xml trough. Microsoft doesn't play with the big boys, despite protestations to the contrary. With the Sun acquisition, Oracle has the vertical to displace historical IBM mainframes. Oracle has never run well on IBM mainframe OSs. Oracle now has the opportunity to poach off z/OS applications at will. It will be interesting.
27 May 2009
Postgres has religion
Keeping up with new developments is turning into a full time job. I can live with that. Here is a specific test of Postgres on SSD; the author uses a newish Fusion-io PCIe card drive. This kind of drive is not architecturally like a STEC or Texas Memory Systems subsystem, or even a SATA drive like the Intels. Still, superb results.
22 May 2009
MySql (in part) gets the religion
I am no fan of MySql, if only because it has been driven by application coders who view databases not as relational algebra engines but as file dumps with a SQL interface; however, when I run across news that supports the thrust of this endeavor, I accept it. The following is a quote from Andy Oram on the O'Reilly site:
Although there's no simple way to characterize the many tweaks and additions, I detect two major sources of change:
* Taking advantage of the larger memory (including Solid State Drive (SSD)/Flash memory starting to appear on servers).
* Taking advantage of multicores, especially by making locks more granular.
He's talking about the fork, or not, of MySql post the Oracle gobble. The irony, I suspect, is that the SSD/multi-core machine is fully suited to 5NF datastores, while not so much to the flat-file paradigm beloved of MySql application KiddieKoders. Maybe they'll figure out that normalized datastores will get better advantage from such machines. There is hope.
12 May 2009
IBM does SSD
There is more news on the SSD front. STEC is a smallish (relative to Intel; Texas Memory Systems is private) flash SSD manufacturer. I just wish I'd known about them a few months ago; the stock has tripled in the last few months. Oh well.
More to the point: they manufacture flash based Tier 0 SSD subsystems, and have just announced deals with IBM and Hitachi (which just announced its worst year ever, eek) to run Fibre Channel storage.
So, take that all you silly doubting Thomases. The RDBMS is back, taking names, and kicking ass.
To review, IBM bought up solidDB a few years back, and now has validated SSD. Lead, follow, or get out of the way. It's going to be a glorious ride.
09 May 2009
Wither DB2?
Where is DB2 going? IBM spent time trying to get Sun/MySql, all the while dallying with PostgreSQL, with whom it now seems to have consummated marriage. At the same time, it is pushing MySql. MySql has been available on the iSeries for a few years, and I see a growing (although not a tidal wave) number of listings by IBM for MySql installers/maintenance. It, sort of, doesn't compute.
The enterpriseDB (PostgreSQL) mashup certainly sounds like the way to get MVCC support without having to actually write it, or admit that it was necessary. It is well known that MVCC has advantages for OLTP applications over locker databases. This is the technical reason Oracle (and PostgreSQL) hold such a lead; we'll see how the annual Gartner report goes. SQLServer got "snapshot" isolation in 2005 for just that reason. IBM pulled an IMS rabbit out of its hat instead. Gad.
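For reference, the SQLServer feature referred to amounts to turning on row versioning per database; in T-SQL (database name hypothetical), roughly:

-- Let readers see a versioned snapshot instead of blocking behind writers' locks.
ALTER DATABASE SalesDb SET ALLOW_SNAPSHOT_ISOLATION ON;
ALTER DATABASE SalesDb SET READ_COMMITTED_SNAPSHOT ON;

-- A session can then request snapshot isolation explicitly:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;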
Looking at the job boards, as I am these days, it is disheartening to see so little DB2/LUW in demand. Fact is, it really is the best locker out there. Its configuration is at least as good as Oracle. There are those in the DB2 community (I won't name names) who feel that IBM is at fault. I concur.
The "free" LUW DB2 is a pain to use. It needn't be; on the other hand, Oracle doesn't make the current version available in its free/community install.
It seems that IBM continues to push the mainframe DB2 installs, which act less like an RDBMS than like MySql, a simple sql parser fronting the filesystem, as the source of its market share and revenue. I would dearly love to see true figures on the adoption of DB2 on z/OS. How much is just VSAM file dumps for antique COBOL code versus new or refactored relational datastores for java/ruby/python/etc.?
The situation with LUW is such a pathetic waste. It is so much better than its rivals.
25 April 2009
What I told Bob
(Cringely had another post about Sun/Oracle, which needed a reply. Since I don't expect that readers there are readers here, I am providing it below.)
If you read Gartner, and I only see the PR condensed version, DB2 off-mainframe has been falling behind every year since 2000. I, and others, speculated that the reason IBM wanted MySql in the first place was to spruce it up, and call it DB2. What those who’ve never been in a DB2 shop don’t understand is that most DB2 installs are on z/OS, and most of those are running 1970 era COBOL code. In such cases, very little of Relational is ever utilized. MySql as simple sql parser in front of the file system is just all that a COBOL (or java) coder needs. MySql == DB2. That was the plan. Now IBM needs Plan B.
I suspect a bit of whistling past the graveyard in that leaked (horrors, how did that get out!!!!!!!!!!!) email. 750 Power customers? This is a big deal? Those are mainframe numbers. Oracle needed, and now has, a weapon to finally kill DB2 off-mainframe. Like it or not, the Intel multi-core/processor machine is ascendant. The off-mainframe database is where the future lies. IBM cannot possibly want a future of being just another Intel OEM, with customers running Open Source software on same. There is no future there. Oracle has built up a portfolio of software for which there is no easy Open Source alternative.
Remember: Oracle is a MVCC architecture database. DB2 is a locker. SQLServer was a pure locker, and has added MVCC (sorta, kinda) in 2008. Postgres is MVCC and Open Source. This architecture difference is not trivial. Why IBM chose to port IMS to DB2, calling it pureXML, is impossible to fathom. It was not a value add for customers in a web environment. The MVCC architecture is widely agreed to be superior there. IBM has to believe, why I cannot fathom again, that its mainframe machine will dominate the future. Its database off-mainframe is not going to.
Finally, IBM had a compliant bitch in Sun vis-a-vis java. That won’t be the case with Larry.
20 April 2009
Sunrise, Sunset. Game, set, match
Well, the other shoe dropped. Oracle has bid for Sun. In my tracking of speculation, Oracle had more weight than IBM. And so it has turned out. This might end up being a problem for IBM.
IBM has made MySql one of the databases on its iSeries (nee: AS/400). Interesting to see how that goes. The not often mentioned fly in the MySql ointment had been that the transactional engine, InnoDB, was owned by Oracle for some time, and that the putative replacement, Falcon (which wasn't very transactional by design), died aborning.
This assumes the deal gets past regulatory complaints, and there could be some, since MySql represents a measurable fraction of installed databases. Since it is "free" software, sort of, common measures such as license fees and similar will make the calculation fuzzy, but a case could be made (and I expect that IBM will make it) that Oracle would control too much of the relational database market. Time will tell.
This is not good for IBM. Following the Gartner reports for the last decade, carefully read, might lead one to conclude that DB2 depends on mainframe installs for its continued existence. With the iSeries moving to use MySql (and we'll see how that goes in future), the Linux/Unix/Windows version may become the red haired stepchild. IBM may conclude that LUW has no reason to exist; it hasn't made significant inroads against Oracle and SQLServer in the past decade, and IBM has never been shy about cutting its losses. We may have to wave goodbye to LUW. Sniff.
How would this affect the point of this endeavor? We would be left with just two industrial strength databases on *nix: Oracle and Postgres. Both are MVCC engines, not lockers; does this distinction matter with regard to SSD hosted databases? I think not. While the MVCC approach eats more memory, I don't see that base table storage should be affected. Oracle has TimesTen and IBM has solidDB as in-memory databases, so both are working that angle; adapting to SSD should only be a baby step away.
08 April 2009
Encapsulation
I have been looking into Python based web frameworks recently; TurboGears, Pylons, and Django. Each uses an ORM along the way, and the ORM of preference is SQLAlchemy. Of the three frameworks, only TurboGears is being developed with a "reverse" ORM in tow. That being Sprox (nee: DBSprockets).
Sprox doesn't call itself that; I made it up. But it does the reverse of what is found in the framework texts, tutorials, etc. The frameworks use SQLAlchemy to permit the coder to create Python files, which are then run to force DDL into the database. Not what I consider a Good Thing. But then, I've been a database geek for decades; I get database design and specification. While SQLAlchemy is, in my opinion, a better ORM than any of the others I've seen, it doesn't support (nor is it ever likely to) real database design. It's made for Pythonistas who know enough to make trouble.
Sprox sets out to generate a UI from the schema, constraints (some anyway) and all. Data and its constraints of record in one place. Ah, bliss.
But this excursion into the dark side led to a minor epiphany. The OO folk love to talk up encapsulation, isolation of concerns, and other such. They also love to complain that changing the database schema messes up their code, so let's not ever do that once a database gets defined. Of course, coders would never suggest that they, themselves, should never amend their code as needs change. Of course not.
The fact is, industrial strength RDBMS (DB2, Oracle, Postgres, SQLServer) all implement encapsulation and other OO niceties already. The mechanisms are views and stored procedures. The problem is that coders start from the view of odbc/jdbc and, likely, some ancient flat-file derived "relational database". So, they build or acquire some code which, more or less, simplifies the creation of DML in the coding language du jour. DML is just sql, and the odbc/jdbc/cli interfaces are client centric: here's a query, give me back result set(s) which I'll then iterate through. More often than not, the coders will (especially if they've been exposed to COBOL, ever) read the Master File, then read the Detail File. You get the gist.
With such a simplistic interface, schema/catalog changes cause all sorts of heartburn. But none of that need happen.
Views should be defined for the Object data; that is, the instance data needed to differentiate instances of each Class. This data can be of arbitrary complexity. The base tables in the database are irrelevant to application code; unless there is change to definition of the Class instance data, the client code(r) never knows (or cares) how said data is stored. So, an Order would consist of data from Order, Customer, Address, Order_Line, Inventory, etc. The view would still be called Order, but would be the appropriate join or not. Or not? If the current schema is just some flat-file dump into the RDBMS, then not. But, should there be a refactoring (and would be if smarter heads prevail), the view name is still Order and is still the reference so far as the application code(r) knows, but is now a more or less normalized retrieval. Which can be refactored incrementally, all the while leaving the interface name and data unchanged.
For those RDBMS which support stored procedures that return result sets, SP can be used to fully encapsulate data logic. The call into the database is an SP called, say, GetOrder. The return is the order data. How that data is accumulated is of no concern to the client code(r).
Stored procedures provide the write process encapsulation. The client code(r) calls with necessary parameters, WriteOrder. The stored procedure then figures out where to put the data; this may be some flat-file image in the database or a 5NF decomposition or something in between. The client code(r) neither knows nor cares.
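A minimal sketch of the pair, in generic SQL/PSM with hypothetical tables (each engine has its own procedure dialect, and a real WriteOrder would carry the full order, not two columns):

-- Read side: the caller gets the joined order data; how it is assembled
-- (flat image, 5NF, or in between) is the procedure's business.
CREATE PROCEDURE GetOrder (IN p_order_id INT)
BEGIN
  SELECT o.order_id, c.customer_name, ol.item_id, ol.qty, ol.unit_price
  FROM   orders     o
  JOIN   customer   c  ON c.customer_id = o.customer_id
  JOIN   order_line ol ON ol.order_id   = o.order_id
  WHERE  o.order_id = p_order_id;
END;

-- Write side: the caller hands over the order as it sees it; the procedure
-- decides which base tables receive the data, and in what order.
CREATE PROCEDURE WriteOrder (IN p_order_id INT, IN p_customer_id INT,
                             IN p_item_id VARCHAR(20), IN p_qty INT)
BEGIN
  INSERT INTO orders (order_id, customer_id)
  VALUES (p_order_id, p_customer_id);

  INSERT INTO order_line (order_id, item_id, qty)
  VALUES (p_order_id, p_item_id, p_qty);
END;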
The solid state disc multi-core/processor machine, which is the main subject of this endeavor, is ideally suited to support this approach. Conventional machines, duly buffered, can do much the same.
05 April 2009
Thank You Ted and Steve
There were a couple of interesting developments this past week which bear on the subject of this endeavor.
First, Tony Davis had posted an editorial at SimpleTalk about the Big Deal that had been made about multi-core cpu's, which has subsequently faded; the Big Deal, that is. He went on to observe that for the majority of application developers, and I infer that he means database connected developers since SimpleTalk is a SQLServer based site, parallel programming hasn't been and won't be an issue. Well. A few days into the Editorial's posting (comments are encouraged by the award of a prize for the one deemed Best), Ted Neward, one time servlet maven and now in the M$ camp from what I can see, took it upon himself to post a screed on his web/blog site saying, in general, that what Mr. Davis had written was crap. He asserted, still, that multi-core coding was in the future of coders generally, and that things database were not relevant. The usual client coder bilge.
There ensued a minor sortie on his site from posters of SimpleTalk. It seems to have ended in a draw, with no (as of today) further rebuttal from Mr. Neward. Why this scuffle is relevant here is that one of the postings referenced a New York Times article, which went to some lengths in discussing the nature of cpu's and their future. The driving force is the rise of non-PC devices which connect to some manner of centralized datastore, likely from the Web but not of necessity. These devices use much simpler cpu's, notably of ARM architecture, and are being run on linux increasingly.
The conclusion of Mr. Davis, and most of the comments both at SimpleTalk and Mr. Neward's site, is that we are returning to a world more like the early 1970's, with a centralized computer brain talking to relatively dumb terminal-like devices, than the actively intelligent network envisioned by Mr. McNealy, which was just an extension of the client/server architecture that the Stanford University Network was. While the network might be the computer, it's looking more like Multics every day. Look it up.
The other event of note is also a New York Times article, this time telling the tales of those who struck it rich with iPhone apps. We learn about a handful of winners. We also learn that there are already 25,000 such apps, more every day, and still few winners. On the other hand, it seems to be the venue of choice for those who like to engage in lipsticking. A place for them to go and leave the rest of us alone to do real work. Hopefully, the siren song of (low chance) riches will siphon off many thousands of knuckleheads so that the rest of us can get real work done. Ah bliss.
29 March 2009
The Next Revolution and Alternate Storage Propositions
I've spent the last few days reading Chris Date's latest book, "SQL and Relational Theory". One buys books as much to provide support to the author, kind of like alms, as to acquire the facts, thoughts, and opinions therein. Kind of like buying Monkees albums; one doesn't really expect to hear anything new. I may post a discussion of the text, particularly if I find information not in previous books.
What this post is about is the TransRelational Model [TRM] which this latest Date book resurrects, column stores such as Stonebraker's Vertica, and the impact of the Next Revolution on them. As always, this is a thought experiment, not a report on a Proof of Concept or pilot project about either. May be someday.
In Date's eighth edition of "Introduction...", there is the (in)famous Appendix A, wherein he explicates why Tarin's patented Tarin Transform Method, when applied to relational databases, will be "the most significant development in this field since Codd gave us the relational model, nearly 35 years ago" without referencing an implementation. In particular that, "the time it takes to join 20 relations is only twice the time to join 10 (loosely speaking)." When published in 2004, Appendix A led to a bit of kerfuffle over whether, given the reality of discs, slicing and dicing rows could logically lead to the claimed improvements. I found a paper, which says it is the first implementation of TRM. The paper is for sale from Springer, for those who may be interested. You will need to buy the book to see what they found.
At the end of "SQL and Relational Theory", in the About the author, is a list of some of Date's books, among them "Go Faster! The TransRelational Approach to DBMS Implementation, is due for publication in the near future." The same book is "To appear" in Appendix A of the eighth edition. And I had thought it had gone away. The url provided for Required Technologies, Inc. is now the home of an ultrasound firm.
The column database has been around for a while; Vertica is Michael Stonebraker's version. There is also a blog, The Database Column which discusses column stores. It makes for some interesting reading. Two of the listed posters are of Vertica.
My interest is this: given the Next Revolution, does either a TRM or a column store database have a purpose? Or any 'new and improved' physical storage proposition? My conclusion is, on the whole, no. The column store, when used to support existing petabyte OLAP systems, may be worth the grief, but for transactional systems, at which the TRM is aiming and from which column stores would extract, not so much. The claim in the eighth edition is that TRM datastores scale linearly with the number of tables referenced in a JOIN, but my thought is that the SSD table/row RDBMS cares not about the number of tables referenced in the JOIN, since access time is independent of access path. In such a scenario, a higher number of tables in the JOIN (assuming that the number of tables is determined by the degree of decomposition) should lead to faster access, since there is less data to be retrieved. As I said in part 2, there is a cost in cycles for the engine to synthesize the rows. The actual timing differences will be determined by the real data. In all, however, it seems to me that plain vanilla table/row 5NF RDBMS on SSD multi-processor machines will have better performance than either TRM or column store on any type of machine. Were I a TRM or column store vendor, inexpensive SSD multi-processor servers would be making my sphincter uncomfortable.
The sine qua non of RDBMS performance is the access path to storage. The fastest are in-memory databases, such as solidDB, now from IBM. For production databases at normal organizations, though, mainstream storage for mainstream databases will be where the action is. Both TRM and column datastores, so far as either has 'fessed up, are attempts to gain superior performance from standard disc storage machines. Remove that assumption, and there may not be any there there. Gertrude Stein again. Kind of like making the finest buggy whip in 1920.
Current mainstream databases can be run against heavily cached disc storage, with buffering both in the engine and in the storage subsystem. The cost of such systems will approach that of dedicated RAM-implemented SSD storage, since the hardware and firmware required to ensure data integrity are the same. As was discovered by the late 1990s, a single level of buffering, controlled by the engine, is the most efficient and secure way to design physical storage.
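A toy illustration of the double-buffering point, with invented numbers: stack an "engine" LRU cache on top of an equally sized "OS" LRU cache and count how often the second layer actually saves a disc read. The access skew, cache sizes, and page counts are all assumptions, not measurements of any product.

    import random
    from collections import OrderedDict

    class LRU:
        """Minimal LRU cache of page numbers."""
        def __init__(self, capacity):
            self.capacity, self.pages = capacity, OrderedDict()
        def access(self, page):
            hit = page in self.pages
            if hit:
                self.pages.move_to_end(page)
            else:
                self.pages[page] = True
                if len(self.pages) > self.capacity:
                    self.pages.popitem(last=False)
            return hit

    random.seed(42)
    N_PAGES, CACHE_PAGES, ACCESSES = 10_000, 1_000, 200_000
    engine, os_cache = LRU(CACHE_PAGES), LRU(CACHE_PAGES)
    engine_hits = os_hits = disc_reads = 0

    for _ in range(ACCESSES):
        # Hypothetical skew: 90% of accesses land on the hottest 5% of pages.
        if random.random() < 0.9:
            page = random.randrange(N_PAGES // 20)
        else:
            page = random.randrange(N_PAGES)
        if engine.access(page):           # engine buffer pool sees it first
            engine_hits += 1
        elif os_cache.access(page):       # only engine misses reach the OS cache
            os_hits += 1
        else:
            disc_reads += 1

    print(f"engine buffer hits: {engine_hits / ACCESSES:.1%}")
    print(f"OS cache hits:      {os_hits / ACCESSES:.1%}")
    print(f"disc reads:         {disc_reads / ACCESSES:.1%}")
    # The hot set already lives in the engine's buffer pool, so the second,
    # equally large cache underneath it rescues only a sliver of the misses:
    # RAM spent duplicating pages the engine already holds.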
And for what it's worth, back in the 1970s, before the RDBMS came into existence, there was the "fully inverted file" approach to 'databases'. In essence, one indexed the data in a file on each 'field', turning random requests into sequential requests. This appears to be the kernel behind both the TRM and the column-store approaches. Not new, then; but if one buys Jim Gray's assertion that density increases will continue to surpass seek/latency improvements, it makes some sense for rust-based storage. The overwhelming tsunami of data which results may be a problem. If we view a world where storage is on SSD rather than rust, then, as Torvalds says, the nature of file systems changes; and those changes have a material impact on RDBMS implementations.
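For the curious, a toy sketch of the inverted-file idea in Python. The fields and records are invented; real implementations kept the posting lists sorted on disc, so an AND query became a merge of sequential reads rather than a crawl over the base file.

    from collections import defaultdict

    # Invented sample records; think of each dict as a fixed-format record
    # in the base file, identified by its position.
    records = [
        {"state": "MA", "year": 2009, "product": "SSD"},
        {"state": "CT", "year": 2009, "product": "HDD"},
        {"state": "MA", "year": 2008, "product": "SSD"},
        {"state": "MA", "year": 2009, "product": "HDD"},
    ]

    # Fully invert: one index per field, mapping value -> ascending record ids.
    indexes = defaultdict(lambda: defaultdict(list))
    for rid, rec in enumerate(records):
        for field, value in rec.items():
            indexes[field][value].append(rid)

    def lookup(**criteria):
        """AND query answered by intersecting per-field posting lists."""
        postings = [indexes[field][value] for field, value in criteria.items()]
        postings.sort(key=len)            # start with the shortest list
        result = set(postings[0])
        for plist in postings[1:]:
            result &= set(plist)
        return sorted(result)

    print(lookup(state="MA", year=2009))  # -> [0, 3]
    # Column stores and, by Date's description, the TRM work from the same
    # shape of data: values grouped per field, with record ids (or their
    # moral equivalent) as the glue that reconstitutes a row.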