Dr. Codd Was Right: schizophRenia [update]

Lisa Murkowski, Swamp Critter

The world is not linear.
-- Dr. McElhone/1974

Power tends to corrupt; absolute power corrupts absolutely.
-- Lord Acton/1887

We have a golden share, which I control, or the president controls. Now I'm a little concerned whoever the president might be, but that gives you total control.
-- Mad Dictator Don/2025 [the march of dictatorship shambles on]

I think we are on the verge of losing vaccines for this country, from this country. And the reason is that Robert F. Kennedy Jr. will hold up a paper, in the next four or five months, that says it's aluminum in vaccines that are causing a whole swath of problems, including autism. I think he is about to destroy vaccines in this country. I do.
-- Dr. Paul Offit/2025 [may the MAGA and MAHA be with you]

There's not a single example of things working out for the appeaser.
-- Nicolle Wallace/2024 [like this? the next extortion is on the way]

I have had to explain and re-explain and re-explain and re-explain, you know, how relational databases work, what is an eigenvector, what is dimensionality reduction.

-- Christopher Wylie/2018

... but Flash-based storage has such a different performance profile from rotating media, that I suspect that it will end up having a large impact on filesystem design. Right now, most filesystems tend to be designed with the latencies of rotating media in mind.

-- Linus Torvalds/2007

I believe quite strongly that, if you think about the issue at the appropriate level of abstraction, you're inexorably led to the position that databases must be relational.

-- Chris Date/2009

This Week's thought

High levels is toxic, no doubt about it. Is the mercury that we're exposed to routinely toxic? The answer is no. If it was, we'd have to move to another planet.
-- Dr. Paul Offit/2025 [a Real vaccine expert]

See you next week in a brand new show (©Heckle and Jeckle)

Therefore:

In a time of SSD, multi-core/processor, two terabyte memory and Optane App Direct Mode (RIP) machines, there is no reason not to build from BCNF data. Time to do what Dr. Codd demonstrated. Technology has finally caught up with the maths.

27 July 2015

schizophRenia [update]

More than one post via R-bloggers (and, I suspect, more in days to come) references this IEEE post on computer language popularity. The R-bloggers posts are laudatory: "R is becoming the Next Big Thing" and such.

But the emperor has no clothes. I just checked CRAN, twice in a minute or so. The first time said 6911 packages, the second 6915. It's a cancer. OK, a bit strong. But the point is: R isn't a programming language. It's a statistical command language which is also programmable with a common syntax. In particular, one needn't (and likely shouldn't) view R through the lens of C++ or Java or even PHP. The value of R lies in the dirt-common stat routines it implements.

More and more, one reads that the Real R Programmers are dissatisfied with performance or capabilities, and grouse. A lot. Most often, they grouse about leaving the R world for Rcpp (well, maybe that's only a step) or Julia or Python. Let them go.

Much of the corpus of R packages comes from grad students who need to create new work in order to satisfy thesis/dissertation requirements. (The same reason we've seen Bayes take over the field; frequentist methods cover the world, and a thesis/dissertation has to cover "new ground", so Bayes was dug up from his grave to give grad students some way to be "new". Gad.) Writing code is the avenue. The fact that it's a wholly redundant exercise is not relevant to the grad student. For working data folk, using R to do mainstream analysis is where the best bang for the buck comes from.

Once again, before posting, new information arises. This time, Dirk takes another swipe at Hadley. Poor Dirk.

Hadley is a popular figure, and rightly so as he successfully introduced many newcomers to the wonders offered by R. His approach strikes some of us old greybeards as wrong---I particularly take exception with some of his writing which frequently portrays a particular approach as both the best and only one. Real programming, I think, is often a little more nuanced and aware of tradeoffs which need to be balanced. As a book on another language once popularized: "There is more than one way to do things."

Poor Dirk. "Nuance" is just a euphemism for "ambiguous". Languages, whether human or computer, that promote ambiguity generally fail. English is the archetypal human language which affords no known structure. Of all the alphabet-based languages, it is the most difficult to learn as a second language, and the worst first language from which to learn another alphabetic language. The mindset of English is chaos. And so it is with programming languages. The "more than one way to do things" language is Perl, the product of a right-wing Christian. It's a mess, and widely despised.

On the other side of the coin, one finds Python, built by a European mathematician, and Eiffel, ditto. Both seek to be as close as possible to fully orthogonal in syntax and semantics.

Python:

There should be one-- and preferably only one --obvious way to do it.

Eiffel:

Exactly one way to do anything: in stark contrast to Perl's philosophy of there is more than one way to do it, Eiffel follows Bertrand Meyer's Principle of Uniqueness: "The language design should provide one good way to express every operation of interest; it should avoid providing two."

R will, in time, fail. It is an amateur language built by amateurs. To the extent it is used as a stat command language (its original purpose), it will succeed, but if the "R is a programming language" crowd get control, it will fail, because as a programming language it has far more warts than rosy cheeks.

And, of course, the RM:

The principle of orthogonal design (abbreviated POOD) was developed by database researchers David McGoveran and Christopher J. Date in the early 1990s, and first published "A New Database Design Principle" in the July 1994 issue of Database Programming and Design and reprinted several times.

Which is not to say, sadly, that SQL engines enforce such. Or as Holub has said, you've got "Enough Rope to Shoot Yourself in the Foot".
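To make the "engines don't enforce it" point concrete, here is a minimal sketch in Python with the standard-library sqlite3 module. The table names (low_paid, high_paid) and the row are hypothetical, invented only for illustration. It assumes the usual reading of the orthogonal design principle, that no two relations should be able to hold the same fact, and shows an engine happily accepting a schema that violates it.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Two hypothetical tables with identical headings and overlapping meaning:
# nothing in the schema says which table a given employee belongs in.
con.execute("CREATE TABLE low_paid (emp TEXT PRIMARY KEY, salary INTEGER)")
con.execute("CREATE TABLE high_paid (emp TEXT PRIMARY KEY, salary INTEGER)")

row = ("carol", 50000)
con.execute("INSERT INTO low_paid VALUES (?, ?)", row)
con.execute("INSERT INTO high_paid VALUES (?, ?)", row)  # accepted without complaint

# The same fact now lives in two places; the engine never objected.
duplicated = con.execute(
    "SELECT emp, salary FROM low_paid "
    "INTERSECT "
    "SELECT emp, salary FROM high_paid"
).fetchall()
print(duplicated)
```

Keeping the two tables disjoint is left entirely to the designer, for instance with CHECK constraints on salary ranges. That is the rope.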

[update]
Well, turns out I'm not the only one.
Revolutionary Dave:

I couldn't agree with the sentiment more, and I too [wish] the field of Statistics had more respect for solving these "mundane" (i.e. non-mathematical), but important problems.

Here's what Dave's agreeing with:

"There are definitely some academic statisticians who just don't understand why what I do is statistics, but basically I think they are all wrong. What I do is fundamentally statistics. The fact that data science exists as a field is a colossal failure of statistics. To me, that is what statistics is all about. It is gaining insight from data using modelling and visualization. Data munging and manipulation is hard and statistics has just said that's not our domain."

And this is the revelatory bit, makes my heart skip more than a single beat:

During this first job, Wickham began to reflect on better ways to store and manipulate data. "I've always been very certain that I could come up with a good way of doing things," he explained, "and that that way would actually help people." Although he didn't know it at the time, he believes it was then that he "internalized" the concept of Third Normal Form, a database design concept that would become central to his future work. Third Normal Form is essentially a manner of structuring data in a way that reduces duplication of data and ensures consistency. Wickham refers to such data as "tidy," and his tools promote and rely on it.
[my emphasis]

And, of course, Third Normal Form or its logical extension Organic Normal Form™ is just an implementation of the orthogonal principle. Great minds gather together.
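For anyone who hasn't internalized it yet, a minimal sketch of what Third Normal Form buys, again in Python with sqlite3. The rows, table names and columns are all hypothetical, made up for illustration; none of this is Wickham's code or his tools' API. The flat data repeats a customer's city on every order (a transitive dependency); splitting it into two tables stores each fact once, so a single UPDATE cannot leave the data inconsistent.

```python
import sqlite3

# Hypothetical denormalized rows: (order_id, customer, customer_city, amount).
# customer_city depends on customer, not on order_id: a transitive dependency.
flat_rows = [
    (1, "ann", "Boston", 10.0),
    (2, "ann", "Boston", 25.0),
    (3, "bob", "Denver", 5.0),
]


def build_3nf(rows):
    """Load flat rows into two tables so each fact is stored exactly once."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")
    con.execute("CREATE TABLE customer (name TEXT PRIMARY KEY, city TEXT NOT NULL)")
    con.execute(
        "CREATE TABLE orders ("
        " order_id INTEGER PRIMARY KEY,"
        " customer TEXT NOT NULL REFERENCES customer(name),"
        " amount REAL NOT NULL)"
    )
    for order_id, name, city, amount in rows:
        # The customer row is written only the first time the name is seen.
        con.execute("INSERT OR IGNORE INTO customer VALUES (?, ?)", (name, city))
        con.execute("INSERT INTO orders VALUES (?, ?, ?)", (order_id, name, amount))
    return con


con = build_3nf(flat_rows)

# One UPDATE changes ann's city for every order, because it lives in one place.
con.execute("UPDATE customer SET city = 'Chicago' WHERE name = 'ann'")
joined = con.execute(
    "SELECT o.order_id, c.city FROM orders o "
    "JOIN customer c ON c.name = o.customer "
    "ORDER BY o.order_id"
).fetchall()
print(joined)
```

In the flat layout the same change would have to touch every one of ann's rows, and missing one leaves two cities for one customer.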
