29 March 2009

The Next Revolution and Alternate Storage Propositions

I've spent the last few days reading Chris Date's latest book, "SQL and Relational Theory". One buys books as much to provide support to the author, kind of like alms, as to acquire the facts, thoughts, and opinions therein. Kind of like buying Monkees albums; one doesn't really expect to hear anything new. I may post a discussion of the text, particularly if I find information not in previous books.

What this post is about is the TransRelational Model [TRM], which this latest Date book resurrects; column stores, such as Stonebraker's Vertica; and the impact of the Next Revolution on them. As always, this is a thought experiment, not a report on a Proof of Concept or pilot project for either. Maybe someday.

In Date's eighth edition of "Introduction...", there is the (in)famous Appendix A, wherein he explicates why the patented Tarin Transform Method, when applied to relational databases, will be "the most significant development in this field since Codd gave us the relational model, nearly 35 years ago", without referencing an implementation. In particular, that "the time it takes to join 20 relations is only twice the time to join 10 (loosely speaking)." When published in 2004, Appendix A led to a bit of a kerfuffle over whether, given the reality of discs, slicing and dicing rows could logically lead to the claimed improvements. I found a paper which claims to be the first implementation of TRM. The paper is for sale from Springer, for those who may be interested; you will need to buy it to see what they found.

At the end of "SQL and Relational Theory", in the About the Author section, is a list of some of Date's books, among them "Go Faster! The TransRelational Approach to DBMS Implementation", which "is due for publication in the near future." The same book is listed as "To appear" in Appendix A of the eighth edition. And I had thought it had gone away. The URL provided for Required Technologies, Inc. is now the home of an ultrasound firm.

The column database has been around for a while; Vertica is Michael Stonebraker's version. There is also a blog, The Database Column, which discusses column stores. It makes for some interesting reading. Two of the listed posters are from Vertica.
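For readers who haven't met the layout: a column store keeps each attribute's values contiguously, rather than each row's, so a scan over one attribute touches only that attribute's data. A toy illustration (the records here are hypothetical, purely to show the two layouts):

```python
# Row store: each record's fields kept together, as in a conventional RDBMS.
row_store = [("Alice", 30, "Boston"), ("Bob", 25, "Chicago")]

# Column store: each attribute's values kept together.
col_store = {
    "name": ["Alice", "Bob"],
    "age":  [30, 25],
    "city": ["Boston", "Chicago"],
}

# An OLAP-style aggregate over one attribute reads only that column;
# the row store would have dragged every field of every record past the CPU.
avg_age = sum(col_store["age"]) / len(col_store["age"])
print(avg_age)  # 27.5
```

That single-column scan is the whole appeal for OLAP workloads; for transactional workloads, which touch most fields of a few rows, the advantage evaporates.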

My interest is this: given the Next Revolution, does either a TRM or a column store database have a purpose? Or any 'new and improved' physical storage proposition? My conclusion is, on the whole, no. The column store, when used to support existing petabyte OLAP systems, may be worth the grief; but for transactional systems, at which the TRM is aiming and from which column stores would extract, not so much. The claim in the eighth edition is that TRM datastores scale linearly with the number of tables referenced in a JOIN, but my thought is that the SSD table/row RDBMS cares not about the number of tables referenced in the JOIN, since access time is independent of access path. In such a scenario, a greater number of tables in the JOIN (assuming that the number of tables is determined by the degree of decomposition) should lead to faster access, since there is less data to be retrieved. As I said in part 2, there is a cost in cycles for the engine to synthesize the rows; the actual timing differences will be determined by the real data. In all, however, it seems to me that a plain vanilla table/row 5NF RDBMS on SSD multi-processor machines will have better performance than either TRM or column store on any type of machine. Were I a TRM or column store vendor, inexpensive SSD multi-processor servers would be making my sphincter uncomfortable.
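The decomposition argument can be made concrete with a toy schema (the tables and data here are hypothetical, and sqlite3 stands in for a real engine): a fully decomposed design means more tables in the JOIN, but each table carries only the columns the query needs, so there is less data to drag off storage, and on SSD the extra index lookups cost little.

```python
import sqlite3

# A decomposed design: one fact per table, joined back together by key.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE person       (person_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE person_phone (person_id INTEGER, phone TEXT);
CREATE TABLE person_city  (person_id INTEGER, city TEXT);
INSERT INTO person       VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO person_phone VALUES (1, '555-0100'), (2, '555-0101');
INSERT INTO person_city  VALUES (1, 'Boston'), (2, 'Chicago');
""")

# The engine synthesizes rows across three tables; on SSD the random
# key lookups this entails are no slower than sequential reads.
rows = cur.execute("""
    SELECT p.name, ph.phone, c.city
    FROM person p
    JOIN person_phone ph ON ph.person_id = p.person_id
    JOIN person_city  c  ON c.person_id  = p.person_id
    ORDER BY p.person_id
""").fetchall()
print(rows)
```

The cycles the engine spends stitching the three tables back together are the cost referred to above; whether they outweigh the reduced I/O is an empirical question for real data.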

The sine qua non of RDBMS performance is access path on storage. The fastest are in-memory databases, such as solidDB, now from IBM. For production databases at normal organizations, mainstream storage for mainstream databases will be where the action is. Both TRM and column datastores, so far as either has 'fessed up, are an attempt to gain superior performance from standard disc storage machines. Remove that assumption, and there may not be any there there. Gertrude Stein again. Kind of like making the finest buggy whip in 1920.

Current mainstream databases can be run against heavily cached disc storage, buffering both in the engine and in the storage subsystem. The cost of such systems will approach that of dedicated RAM-implemented SSD storage, since the hardware and firmware required to ensure data integrity are the same. As was discovered by the late 1990's, one level of buffering, controlled by the engine, is the most efficient and secure way to design physical storage.

And for what it's worth: back in the 1970's, before the RDBMS came into existence, there was the "fully inverted file" approach to 'databases'. In essence, one indexed the data in a file on each 'field', and so turned all random requests into sequential requests. This appears to be the kernel behind the TRM and column store approaches. Not new; but if one buys Jim Gray's assertion that density increases will continue to outpace seek/latency improvements, then it makes some sense for rust-based storage. The overwhelming tsunami of data which results may be a problem. If we view a world where storage is on SSD rather than rust, then, as Torvalds says, the nature of file systems changes. These changes have a material impact on RDBMS implementations.
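The fully inverted file idea fits in a few lines: one index per field, mapping each value to the record numbers that hold it, so that a random lookup on any field becomes a sequential read of a small posting list. A minimal sketch (field names and records are hypothetical):

```python
from collections import defaultdict

# Hypothetical records; in a real 1970's system these would live in a file.
records = [
    {"name": "Alice", "city": "Boston"},
    {"name": "Bob",   "city": "Chicago"},
    {"name": "Carol", "city": "Boston"},
]

# Fully inverted file: one index per field, value -> list of record ids.
indexes = defaultdict(lambda: defaultdict(list))
for rid, rec in enumerate(records):
    for field, value in rec.items():
        indexes[field][value].append(rid)

# A "random" query on any field is now a sequential scan of a posting list.
boston_ids = indexes["city"]["Boston"]
print([records[i]["name"] for i in boston_ids])
```

Note the tsunami: the indexes collectively restate every field of every record, which is roughly why the approach trades storage volume for seek avoidance.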

2 comments:

Anonymous said...

Robert,

I enjoy reading your posts, but had not read this one until recently when reviewing your earlier posts.

You may have already discovered the findings below since this post in 2009. I looked around for more information on the TransRelational Model and Date’s book “Go Faster! …” that you mentioned in this post. I found that the book is now published (2011) and even found a PDF version that can be downloaded for free at http://www.zums.ac.ir/files/research/site/ebooks/it-programming/go-faster.pdf. This PDF copy has advertising in it, but appears to be a complete copy at 287 pages. The book PDF download link is also referenced in Date’s news page http://www.justsql.co.uk/chris_date/chris_date.htm, so I suspect it is legit.

The book mentions that publication was held up due to non-disclosure agreements that expired in 2011 (likely why you hadn't seen the book at the time of your 2009 post). There is also further information about the Tarin Transform Method and a patent on that method (US PTO 6,009,432 – 12/28/1999 – Value-Instance-Connectivity Computer-Implemented Database). I am still in the process of reading the book, but have found some interesting info so far.

I will be interested to hear your comments on this book – perhaps in a future post.

Thanks again for your writings and thoughts.

Scott R.
