Dr. Codd Was Right: The Tail Wags the Dog, 95% CI

Vaccinated ≠ Not Infectious

$536,800,000 MARF™ (party like it's 1829)
-- New York Slammers, so far/2024 [GA and The Feds still to come]

Covid-19 has killed at least 1,123,836 people (as of 20 March 2023, final update)

Scientists aren't vocal enough about science. There are large groups of people who think scientists are all frauds and who don't believe in science, and they're being cultured by some of our far right-wing politicians, religious leaders and community leaders.
-- Drew Weissman/2022

To date, Microsoft is stating that organizations testing In-Memory OLTP have seen transaction speeds improve by up to 30 times compared to past performance, with the best performance gains achieved when the business logic resides in the database and not in the applications.
-- Jonathan Watts/2015 [my emphasis]

I have had to explain and re-explain and re-explain and re-explain, you know, how relational databases work, what is an eigenvector, what is dimensionality reduction.
-- Christopher Wylie/2018

... but Flash-based storage has such a different performance profile from rotating media, that I suspect that it will end up having a large impact on filesystem design. Right now, most filesystems tend to be designed with the latencies of rotating media in mind.
-- Linus Torvalds/2007

I believe quite strongly that, if you think about the issue at the appropriate level of abstraction, you're inexorably led to the position that databases must be relational.
-- Chris Date/2009

This week's thought

Investigative reporters are an idiosyncratic breed of journalist. Typically fearless, they are often a source of angina to their editors. Mr. Walsh was no exception.
-- Michael S. Rosenwald/2024 [no surprise: I spent a bit of time on Jack Anderson's staff]

Therefore:

In a time of SSD, multi-core/multi-processor, two-terabyte memory and Optane App Direct Mode machines, there is no reason not to build from BCNF data. Time to do what Dr. Codd demonstrated. Technology has finally caught up with the maths.

30 April 2012

The Tail Wags the Dog, 95% CI

There's been a spate of R pieces recently dealing with R as a programming language and, in particular, its assumed deficiencies. Here, here, and here are examples.

It's a bunch of tails wagging the dog, and it doesn't address the real question: how to make R the de facto stat pack where SAS, SPSS, and Stata currently tread. As mentioned in the Triage piece, some reviews of R are concerned with how much is R, how much C, and how much Fortran. Various reviewers have been puzzled at both poles: more R than expected, and less R than expected. There are reported to be 3,800 packages in CRAN, and rather fewer (554) in Bioconductor. Call it 4,400 in round numbers. Assume that a package has, on average, 3 authors, which I think is generous, given how many grad students are involved (hell, Hadley Wickham does ggplot2 all by his lonesome). That's 13,200 folks.

So, we have 2,000,000 useRs. We have 13,200 "developers" (not counting the core team maintaining the language). Which group should the "language" serve? Clearly, the 2,000,000. In particular, an insurgency into the SAS/SPSS beachhead will be better supported by presenting R as an analysts' golden sword than by emphasizing it as a coders' paradise (it isn't one; too many warts). It seems to me, having used most stat packs and "real" programming languages over the years, that this divided-duties situation accounts for some of R's oddities, from a command writer's point of view as well as a coder's. The first link has all the gory details. I have used one bootstrapped 4GL, Progress, which was the best in high button shoes for databases in the early 1990s. But its audience was other coders (the report generator had its own syntax, a bit of RPG), not analysts, so having one syntax for two groups of coders wasn't a big deal. With R, the two constituencies are far more different.
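The back-of-envelope numbers above can be checked in a few lines of Python. Every input is the post's own figure (3,800 CRAN packages, 554 Bioconductor packages, an assumed 3 authors per package, 2,000,000 useRs); the sketch counts authorship slots, so anyone who maintains several packages is counted more than once, which makes 13,200 an upper bound under that assumption.

```python
# Back-of-envelope estimate of R "developers" vs. users.
# All inputs are the post's own figures; AUTHORS_PER_PACKAGE is its stated assumption.
CRAN_PACKAGES = 3_800
BIOCONDUCTOR_PACKAGES = 554
AUTHORS_PER_PACKAGE = 3       # "generous", per the post
USERS = 2_000_000             # the post's estimate of useRs


def round_to(n, step):
    """Round n to the nearest multiple of step."""
    return step * round(n / step)


total_packages = CRAN_PACKAGES + BIOCONDUCTOR_PACKAGES   # 4,354
rounded_packages = round_to(total_packages, 100)          # "call it 4,400"
# Counts authorship slots: authors of several packages are counted more than once.
developers = rounded_packages * AUTHORS_PER_PACKAGE      # 13,200
users_per_developer = USERS / developers

if __name__ == "__main__":
    print(f"packages: {total_packages:,} (~{rounded_packages:,})")
    print(f"developers: ~{developers:,}")
    print(f"useRs per developer: ~{users_per_developer:.0f}")
```

Run as a script, it reports roughly 152 useRs for every package author, which is the imbalance the argument turns on.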

One can build a language successfully without being a language designer (or a group of them) by profession: Perl and Ruby are the two best-known examples. Contrast them with Python (defined by a mathematician) and Java (by a language builder). Which of these one finds most comfortable as a working syntax says more about oneself than about the language. For what it's worth, mine is Python. I've read Chambers' book, and a good many others, and I'm still not clear about why or how R's syntactical oddities are supposed to uniquely serve the purposes of statistics. Clearly, the vector paradigm comes from Fortran and BMDP, and it does fit. The rest, not so much. And it is true that numerical programming has been drifting from Fortran to C for some time; one can argue that this represents a lowering of the semantic level, and thus isn't helpful.

As I commented on a post, Rcpp is the likely platform for development going forward; how soon, I can't say. Such a transition will mean that stat folks won't be the driving authors anymore, unless they choose to be real coders too (or first and foremost). This opinion rests on the received wisdom that what's holding R back from displacing SAS/SPSS is speed. I don't think it's the open source thing, really. After all, even IBM uses Linux. The file-based structure of SAS/SPSS does have advantages.
