Water, Water Everywhere

Nor any drop to drink. Remember that sentence? "Rime of the Ancient Mariner" by Coleridge, opium fiend. Today, the overriding meme is Big Data, as if all analyses have to sift through the entire population of interest every time for every question. Yet, there isn't always the data one needs to answer a question.

For example, I'm told that all humans dream and that some remember these dreams clearly, while others (including humble self) remember them vaguely or not at all. Sometime before dawn, the dogs invaded the bed and woke me in dreamus interruptus. The narrative of the dream was clear for a few moments, I remember that I'd remembered if you get my drift, but now it's much less clear. The dream involved Joanie and an off-hand remark she made about minor surgery. In some way, the question I became concerned with involves Joanie, who left me for her husband lo those many years ago. Ah, to be told that Dante, her son, was looking forward to me being "Dad". Didn't work out that way.

But, here's the question. As she and I were preparing to do the deed for the first time (and not on our first date, either), she says, "I hope you don't need a tight [euphemism of choice, she used the one women abhor]". (To quote the late Lou Gottlieb, "Here comes the smut Martha! No indeed. Fortunately, it's a subject which can be handled delicately".) Never having spent time with a MILF before, or even considered it, I hadn't any thoughts on the subject. Turned out not to disappoint, since I was stark raving IN LOVE, although the experience is different.

Which leads to my data problem. How many women, whether by country, economic stratum, religion, whatever, choose C-section in order to preserve that mechanical superiority (I can now admit, sorry Joanie, that tighter is better)? How many? Is the documented rise in C-sections (this article doesn't list this motivation) due in some part to trying to Keep Daddy Home? A quick review (on Amazon; I don't own it) of the "Freakonomics" index offers no C-section section. How could they miss this topic?

There is a downside to each choice. Go for down the chute, and you give up any chance of comparisons to a vacuum cleaner. Go for the C-section, and you have a scar. The latter choice is less intrusive than in decades past, due to a smaller transverse incision which is below all but a Paris Hilton bikini. The Wikipedia article includes this aside: "The women in these studies have indicated that their preference for Caesarean section is more likely to be partly due to considerations of pain and vaginal tone." Hmmm. I may be on to something, after all.

Would Mom admit the motivation to her Ob/Gyn? Replaying some conversations with Joanie dredged up a memory of her talking about the birth of Dante (around the same time??), and mentioning "Father's Knot". Well, that gets some response from Google. A rather long discussion; and frank too. Alas, no data. I need data. I want to make graphs and regressions!!! Come on ladies, give it up.

Apples to Apples

Some of us in the RDBMS world take it for granted, or consider it obvious, that SSD is a better way to go than HDD. The logic is unimpeachable: normalized data uses a fraction of the space needed by flatfile schemas, thus the $$$/database (not necessarily $$$/GB) is a wash, while the response is better for SSD.
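For the skeptics, here's a toy sketch of the space argument. Everything in it is invented for illustration: three hypothetical customers, 3,000 hypothetical orders, and character counts standing in for bytes on disk. A real engine adds page, index and row overhead that this ignores, so take the ratio as a cartoon, not a benchmark.

```python
# Toy comparison of a flat-file layout vs. a normalized one.
# All data here is made up; character counts are a crude proxy for storage.

# Hypothetical customers: id -> (name, address)
customers = {
    cid: (f"Customer {cid}", f"{cid} Main Street, Springfield")
    for cid in range(3)
}

# Hypothetical orders: (order_id, customer_id, amount)
orders = [(order_id, order_id % 3, 9.99) for order_id in range(3000)]


def stored_chars(rows):
    """Total characters needed to write every field of every row."""
    return sum(len(str(field)) for row in rows for field in row)


# Flat file: customer name and address are repeated on every order row.
flat_rows = [(oid, *customers[cid], amount) for oid, cid, amount in orders]
flat_size = stored_chars(flat_rows)

# Normalized: orders carry only the customer id; customer details stored once.
customer_rows = [(cid, name, addr) for cid, (name, addr) in customers.items()]
normalized_size = stored_chars(orders) + stored_chars(customer_rows)

print(f"flat:       {flat_size} chars")
print(f"normalized: {normalized_size} chars")
print(f"normalized / flat = {normalized_size / flat_size:.2f}")
```

The more often the same customer details repeat, the wider the gap gets, which is the whole point of the $$$/database argument.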

Well, I've found an explorer who has had the opportunity to look at SSD and HDD with an existing schema. Even in this less than fair comparison, SSD wins. He hasn't, yet, compared what I'll assume is a generally [un|de]normalized schema to the normalized version. Praise Codd.

A Confidence Man

The new home sales report for February was just released. The "takeaway" number was a bit below January, a bit below expectations, at 313,000. You can find it here.

Now, here's the kicker.

The headline:
"Sales of new single-family houses in February 2012 were at a seasonally adjusted annual rate of 313,000, according to estimates released jointly today by the U.S. Census Bureau and the Department of Housing and Urban Development."

Which is followed by (and won't likely ever be printed in mainstream press reports):
"This is 1.6 percent (±23.9%)* below the revised January rate of 318,000, but is 11.4 percent (±17.8%)* above the February 2011 estimate of 281,000."

So, what do the asterisks mean? Here's the footnote they tie to:
"* 90% confidence interval includes zero. The Census Bureau does not have sufficient statistical evidence to conclude that the actual change is different from zero."

In other words, these "estimates" are wildly, I say WILDLY, unreliable. The CIs are huge, and they're only 90% intervals! Common practice is 95% intervals (by definition they're wider; by how much depends on the sample data). Too bad Census didn't post those numbers.

In other words smores, there's no statistical (these are sample derived, recall) evidence that new home sales have improved over the last month, AND YEAR. So take that.
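Since Census didn't post the 95% numbers, here's a back-of-the-envelope sketch of what they might look like. It leans on one loud assumption: that the published ± figures behave like half-widths of a normal-approximation interval, so a 90% interval can be stretched to 95% by the ratio of the two critical values (about 1.96/1.645). Census's actual variance method may differ, and the function names are mine, not theirs.

```python
from statistics import NormalDist


def includes_zero(estimate, half_width):
    """True if the interval estimate +/- half_width straddles zero."""
    return estimate - half_width <= 0 <= estimate + half_width


def widen_90_to_95(half_width_90):
    """Rescale a 90% half-width to 95%, ASSUMING a normal approximation."""
    z90 = NormalDist().inv_cdf(0.95)   # two-sided 90% critical value
    z95 = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value
    return half_width_90 * z95 / z90


# Percent changes and 90% half-widths quoted in the Census release above.
changes = {
    "vs. revised January 2012": (-1.6, 23.9),
    "vs. February 2011": (11.4, 17.8),
}

for label, (estimate, hw90) in changes.items():
    hw95 = widen_90_to_95(hw90)
    print(
        f"{label}: {estimate:+.1f}% "
        f"(90%: ±{hw90:.1f}, approx. 95%: ±{hw95:.1f}); "
        f"90% interval includes zero: {includes_zero(estimate, hw90)}"
    )
```

Both 90% intervals straddle zero, just as the footnote says, and the approximate 95% versions are wider still.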

Is The Soup Seasoned to Your Taste?

Turns out, there are professional quants (technically, I'm not one at the moment, since no one pays me to compose these missives) out there who question the impact of seasonal adjustment on the rosier than expected numbers for December to February (technically, meteorological winter).

Piggies, Again

It was just a few short days ago that I re-iterated my prediction that BI/DW would soon succumb to the multi-core/SSD/RDBMS machine. Little did I fathom that it would only be a few short days.

This PR from Fusion-io describes one such implementation. Now, SQL Server is one of my favorite databases, though, since it runs only on Windows, not one I use all that much right now.

Here's the money quote:
"The Fusion ioDrives allowed us to forego the data warehouse entirely, and implement real-time analysis on our primary online transaction processing system," said Thomas Pullen, BetOnSoft Database Administrator.

Boy howdy! It's been figured out. While the piece doesn't describe the normal-ness of the database, I'll see if they'll tell me. Look for an update in the next few days.

Praise Codd!

Death by Triage

It's been recently reported that the Obama money won't be finding its way to Congressional contests. Given that money is the lifeblood of politics, Obama is putting a fork in Democracy. It didn't have to be this way. By ignoring the 2010 election, he gave the Right Wingnuts all the opening they needed. I guess it's just pure ego.

I'm of two minds about how this affects a Triage system. On the one hand, with less money, and no co-ordination among the White House, DNC and congressional election committees, Triage might not be worth the effort. On the other hand, with less money, but with willing co-operation among DNC and all others but the WH, Triage may be more important than ever.

Sigh. I do wish they'd call.

Quant on Quant Violence

It's not been too frequent, even in the wake of quants taking down the world's economy, to read a quant who has the temerity (and, possibly, gonads) to say that which must be silent. But now one has.

One quote from the interview (then I'll expect you to go off and read it):

Q: What's the biggest change you feel the credit crisis has brought to the development of quantitative finance?

A: One change is that it has pushed quants away from the illusion that their models are true. That's a good thing, but unfortunately probably temporary. People will be lulled into complacency once their models have worked well for a while. The other major change I see is that it has prompted more thought on hard but important problems. I'm thinking of things like understanding herding risk, and the real dynamics of markets.

This is a closer understanding of what went wrong, but still doesn't deal with the core issue: the data needed to identify the disconnect was readily available, but went ignored by quants. Mea culpa is always painful.

Galbraith got it right, and no one has demonstrated anything smarter: "Financial genius is a rising market". Or, as I'll be musing about anon, all the swans are black.

We should all remember the story of the three little pigs: one built his house of straw, the second of sticks, while the third built with bricks and foiled the evil, hungry wolf who blew the others' houses down and feasted on a little pig each time. How should a data warehouse be built? To date, the dominant forms are Inmon (straw) and Kimball (sticks). Reality blows them down and eats the database engineer. Not fun.

Some time back, I scribbled a bit about BI, making the case that high NF on SSD would make the various un-normalized stars and snowflakes obsolete. I hadn't found anyone in the mainstream of BI/DW agreeing with me up to then, and let it go at that.

Now comes this piece, which does come from, if not mainstream at least in the main, BI/DW practitioners. Well, Martha, there appears to be intelligent life in the solar system.