25 January 2012

Don't Thread On Me

This thread was mentioned in a performance sub-group posting. Give it a read.

Back? It means, so far as I can see, that PG is toast. It will fall down to being the cheap and dirty alternative to MySql, which even has at least two multi-threaded engines. DB2 switched its *nix engine to threads from processes with release 9.5. Oracle claims it for releases going back to 7 (I haven't tried to determine which parts or applications; Larry has bought so many tchotchkes over the years...). SQL Server is threaded.

Given that CPUs are breeding threads faster than cores, PG will fall into irrelevance. Too bad, it's kind of a nice database.

24 January 2012

Can't Find My Way Home

I have a second fifteen seconds of fame over on Simple Talk, here. In the course of the commentary, I was asked why Google is a toy application, along with Facebook and Twitter. What follows is a much longer version of what I posted.


Look closely at Google, and all you've got left is a massive advert agency. Adverts are important to life? Advert engines advance civilization? Of course not. We Yanks are fascinated with "Mad Men" in a tee-hee sort of way, yet base more of our system building, and one might assert our economy, on selling adverts. Rather than making things for ourselves, we make insubstantial playthings. Psychologists call this behaviour cognitive dissonance.

Decisions, and the actions which follow from them, are manifestations of value judgments. To the extent that our systems building efforts concentrate on toys rather than meaningful (for some definition of that) systems, we've revealed what it is we value. Pixelated toys. I'd love to hear an argument that Facebook and Twitter etc. represent advancement of civilization. And, no, they're not improvements in communication.

Google's share price just got spanked because they've not kept up the advert machinery, consuming all advertising on the planet. If they manage that, then what?

Facebook just arbitrarily skunked its users.

These are toy applications since their only point is to support play. They encourage adolescent tribalism, which the planet doesn't need in excess. The vehicle is fancier than passing notes in class back when your grandfather was a kid, but it's the same behaviour only prettier to do.

For a more formal, and literate, take on the issue, read up on Nick Carr. He also has an infrequently posted blog.

From a purely systems point of view, Google is a re-hash of code/file systems from the 1960s. Size isn't distinguishing. Read the Wikipedia write up on Map/Reduce.

Wikipedia is a more important piece of software.

16 January 2012

What Language Do You Spook?

Drew Conway is my kind of guy: a spook. Now, before any of the uber-PC folk get their backs up, an alternate definition of the word: colloquial for intelligence operative, origin unknown. I don't always put it on my resume, but I spent some time in the 1980s in Jack Anderson's shop. Officially, an intern; I did get paid some money. That's likely because I brought in a two column story (a story getting more than one was highly unusual) on the CIA running guns from Argentina (if memory serves) to Ghana. Corky Johnson worked on it with me. This was the time of Iran-Contra and the like. The story falls into the "like" category. We could never figure out why the Post didn't follow up. I still have the galleys.

So, I was surfing through R-bloggers looking for background on using R with Powerpoint; I know, yucky mucky. But I'm supposed to chat up some folks about that today, if they bother to call. In the search results is a long piece by Mr. Conway on proper data visualization. Comes out against pie charts. Good for him. I highly recommend it; he manages a bit of humor when offering that a graph should have a 140 character footnote. Such a twit!!

13 January 2012

Don't Be a Dupe

The post on the NoSql dust up from Postgres led to me adding a comment on one of the posts, and now Chris Travers has commented on my post (ring around a rosy?). You can find his comment here (scroll to the bottom).

Rather than waste the opportunity in a comment, and because I truly believe it's worthwhile on its own, herewith.

I do not believe that there is any use case for NoSql, on technical grounds. That is to say, there are no use cases where NoSql provides a superior solution to the RM/SQL (henceforth, just SQL). I do believe that NoSql is chosen 99.44% of the time just because it implements Programmers' Perpetual Employment Paradise. No database, all COBOL forever; well, java or C++/#.

SQL is always superior because:
- it implements the smallest ratio of real data footprint to logical data footprint
- it enforces logical data integrity

One of the hot new technologies, over the last few years, has been data deduplication. Dedupe is just an ad hoc approach to 5NF, with none of the benefits of SQL; get rid of the redundancies. But there's no structure or control. One of the objections to dedupe tech is the fragility of it. How much of a glitch will cause data corruption, and so on.
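To make the footprint and integrity points concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customer and purchase tables, their columns, and the sample rows are all made up for illustration; nothing here comes from any product mentioned above. The repeated fact (the customer name) is stored once, and the foreign key gives the structure and control that ad hoc dedupe lacks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless asked for.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: the customer name lives in exactly one row,
# and every purchase points at it by key.
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL UNIQUE
    );
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      INTEGER NOT NULL
    );
""")

conn.execute("INSERT INTO customer (customer_id, name) VALUES (1, 'Acme Widgets')")
conn.executemany(
    "INSERT INTO purchase (customer_id, amount) VALUES (?, ?)",
    [(1, 100), (1, 250), (1, 75)],
)

# Three purchases, but the name is stored once, not three times.
purchase_count = conn.execute("SELECT COUNT(*) FROM purchase").fetchone()[0]
name_copies = conn.execute(
    "SELECT COUNT(*) FROM customer WHERE name = 'Acme Widgets'"
).fetchone()[0]

# A purchase for a customer that does not exist is refused by the database.
try:
    conn.execute("INSERT INTO purchase (customer_id, amount) VALUES (99, 10)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The redundancy is gone by design rather than by after-the-fact pattern matching, and the bad row never gets in.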

Now, for non-transactional datastores with no logical structure, SQL databases probably don't matter much. If one considers that a use case, then OK by me. I wouldn't spend my time with such an application, that's for sure.

07 January 2012

Color Me Happy

I posted the following comment on Offensive Politics' post on its graphic presentation of Iowa results:

This post got referenced on Revolution Analytics, and garnered a comment that the (default) HCL scale used by ggplot is faulty. I commented that the closeness of the vote is perfectly reflected in the color scale; the confusion is correct.

So, off I went to Wickham's ggplot2 book, chapter 6 on scales. scale_fill_gradient2() and library(vcd) provide ways to impose non-linear color scales. At first, a good thing. But I wonder. There's been a good deal of discussion, here and there, with regard to spinning data. It would seem that using a non-linear color scale is just as bad as fiddling with axis scales.

What do you think?


I mentioned this post earlier, which was then followed by its linking from Revolution Analytics. One of the prime directives of professional statistics, as distinct from polemic number spewing, is that form shouldn't affect perception. In other words, don't fiddle the picture to plant an idea not supported by the data. In this case, the original commenter complained that the color scale made it difficult to distinguish among the counties' vote winners. My view, so far at least, is that the defaults used by R/ggplot for this implementation rightly reflect the similarity between Romney and Santorum.

The default color scale is, essentially, linear. The alternatives mentioned are decidedly non-linear. Now, the notion of linearity with respect to color scales may be problematic, but the ggplot defaults do implement a notion of linearity, in that two opposite colors are blended along a linear mixing.
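A rough sketch of the distinction, in plain Python. This is not what ggplot2 does internally; it assumes simple mixing of RGB triples, and the warp() function, the two endpoint colors, and the positions 0.48 and 0.52 are all made up for illustration. The point is only that a non-linear scale can turn two nearly equal values into visibly different colors.

```python
import math

def mix_linear(low, high, t):
    """Blend two RGB triples in a straight line; t=0 gives low, t=1 gives high."""
    return tuple(round(a + (b - a) * t) for a, b in zip(low, high))

def warp(t, power=3.0):
    """Hypothetical non-linear remap of t that stretches values near the midpoint."""
    d = 2.0 * t - 1.0  # recentre on the midpoint, range -1..1
    return 0.5 + 0.5 * math.copysign(abs(d) ** (1.0 / power), d)

def mix_warped(low, high, t, power=3.0):
    """Same blend as mix_linear, but applied to the warped position."""
    return mix_linear(low, high, warp(t, power))

RED, BLUE = (255, 0, 0), (0, 0, 255)

# Two made-up positions on the scale that sit close together.
close_a, close_b = 0.48, 0.52

# Compare how far apart the red channel lands under each scale.
linear_gap = abs(mix_linear(RED, BLUE, close_a)[0] - mix_linear(RED, BLUE, close_b)[0])
warped_gap = abs(mix_warped(RED, BLUE, close_a)[0] - mix_warped(RED, BLUE, close_b)[0])

print("linear gap:", linear_gap, "warped gap:", warped_gap)
```

Under the straight-line blend, close values get close colors. Under the warped blend, the same two values are pushed apart, which is exactly the kind of fiddling in question.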

Here are examples from Wickham's ggplot2 text. I suppose it was uploaded legally. I hope.

These are the various color scalings:



As can be seen, the color gradient in the right two alternatives is far more extreme. I'm still not convinced that one should do such fiddling.

06 January 2012

You Rook Mahvelous

A couple of other pieces are in the works, but this was just announced by OCZ and Marvell. I didn't see any mention of exclusivity in the PRs, but AnandTech says so and mentions what caught my eye when reading the PR: it's PCIe on x1 channel. What's up with that? There's that line from "The King and I" (film version, at least), that goes "A bee flits from flower to flower...". OCZ is sure doing that with regard to controllers.

Last I knew, PCIe SSDs were controller-less (in the silicon sense), off-loading control to code run on the cpu. Depending on who one follows, this is either a good thing or a bad thing. Said code is written by the SSD vendor (or bought in).

03 January 2012

Be Kind to Idiots [updated]

In the olden days, when I was a child, good parents instilled good values in their children. One was to not make fun of idiots; being an idiot was not the idiot's fault. Or, so the thought was. We know different now. Many idiots are so because they've chosen the path of ignorance, devolving further into idiocy. We see this in politics and IT, specifically the nexus with RDBMS.

I've been following the LedgerSMB postings (those that appear on the Postgres site, anyway), because the structure is similar to that taken by xTuple. Both, to greater or lesser extent, espouse integrity in the database, as opposed to client application code. Neither prevents editing in client screens, but both assert that the constraints, of record, are in the database; and can be duplicated in the client.
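A minimal sketch of what "constraints of record in the database" looks like, using Python's built-in sqlite3 module. The invoice_line table, its columns, and the try_insert() helper are hypothetical and are not taken from LedgerSMB or xTuple; the assumption is only that the rule is declared once, in the schema, and every client gets the same answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the business rules live in the schema itself.
conn.execute("""
    CREATE TABLE invoice_line (
        line_id    INTEGER PRIMARY KEY,
        quantity   INTEGER NOT NULL CHECK (quantity > 0),
        unit_price INTEGER NOT NULL CHECK (unit_price >= 0)
    )
""")

def try_insert(quantity, unit_price):
    """Return True if the database accepts the row, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO invoice_line (quantity, unit_price) VALUES (?, ?)",
                (quantity, unit_price),
            )
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert(3, 1999)    # sane row
rejected = try_insert(-3, 1999)   # negative quantity violates the CHECK
row_count = conn.execute("SELECT COUNT(*) FROM invoice_line").fetchone()[0]
```

A client screen is free to repeat the same check for a friendlier message, but the database has the last word no matter which client, or which language, sent the row.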

One of the fascinating aspects of this approach will be, soon, the picture when the likes of WebSocket are routinely available. Data centralization is always the future. When the IBM PC was first released, it was as a low powered programming work station; the "user" would use it to write "non-professional" code to suit his (not many hers at the time) needs. Scientists and engineers were the market, and IBM estimated 2,500 per year would sell. Then, along came 1-2-3, and the computer appliance was born; only now have I finally been vindicated in making that description. Soon, Netware showed up, to connect all those independent PCs to printers and file servers. The kodder kiddies really are convinced that The Cloud is something new. Some of them assert that it's really different this time; that this time it's about the cheapest hardware possible appearing like manna from heaven on demand. Foolishness. They'll find out, and another few hundred billion dollars will be wasted by the Fortune X00 chasing this, as was wasted when they chased J2EE applications.

From the very beginning, Dr. Codd had to cram the RM down the throats of heel dragging coders, who willfully ignore the fact that logic in code is just data compares made complicated. Do the logic with data where it lives. But, NO, that threatens the rice bowl of all them coders.

So, Chris Travers (of, and perhaps all of, LedgerSMB) has been posting about the application structure.

He published this post recently.
This led to the following post.
And that led to a supportive post from Joe Abbate.

Chris was too kind. Just as Ron Paul shouldn't be taken seriously, neither should Tony Marston. Neither has a clue, only zealotry. Joe, to his credit, took up this quote: "The database has always been a dumb data store, with all the business logic held separately within the application." His response was also too kind.

Marston has clearly never read, or if he did hasn't a clue what he read, Codd or Date. His statement is only true of coders back to AutoCoder and COBOL, not databases. As with the COBOL/VSAM/IMS folks that gave Dr. Codd the finger, Marston is just another guy who wants to keep writing lots of code, no matter how dumb that is. Prior to Dr. Codd's RM/RDBMS, there were IDMS and IMS; network and hierarchical databases, respectively. Both implemented logic in the database structure. Moreover, in 1968/9, IBM released CICS, likely the longest lived TPM (transaction processing monitor). A TPM does the read/write control for external code, largely COBOL with the IBM mainframes of the time. So, Marston is wrong in toto. Marston embodies the latter day Goths, who wish to take us back to the Dark Ages of their crude ancestors. A lot like Ron Paul, come to think of it.

The entire purpose of TPMs, pre-relational databases (IDMS, IMS, PICK), and RDBMS is to put data control with the data where it belongs. That is a threat to the horde of coders. Too bad.

[update]
For the record, I went back to see how the thread had gone, and couldn't keep my mouth shut. You have been warned.

01 January 2012

Back to the Future

As those who've been following this adventure since the beginning must know, one part of the journey down The Yellow Brick Road involves the nature of application execution. I've been saying for many years, since long before I ever put fingers to blog, that the inevitable result of "progress" in application design is to replicate, albeit with pretty pixels, the world of *nix databases and the VT-100. The reason is simple: this paradigm provides the greatest control with the greatest ease of use.

The continuing problem with actually getting there is the disconnected nature of http. "Server push", as it's often called, hasn't been functional. So, imagine my surprise to see this Eckel post on Artima. Here's the Wikipedia entry, too. The upshot of WebSocket is a terminal on a long wire to the database. The winners in this world will be those who recognize the value of the high normal form database, with small bits of data per UI screen. This paradigm supports centrally edited data, to any screen. In other words, client agnostic.

In due time, perhaps with my consulting advice, organizations will realize what WebSocket means. There's that old joke: "Doctor, it hurts when I do this." "Don't do that." The easiest approach to dealing with Big Data is to not build stupid flatfile datastores. *nix databases, of high normal form, on multi-core SSD machines allow data to be sent to clients in whatever form needed. Smaller is better, of course, but massive joined rows can be sent. Not that one would want to, of course.