27 August 2011

Don't Mess With Texas

When I was somewhat younger, I worked for a mathematical statistician who was born in Rhode Island, grew up in Las Vegas, and did his graduate work in Austin. That was when I first heard the phrase "Don't mess with Texas". Context is everything, and today the context is data storage. Here's one version of the news. Which wouldn't be all that interesting on its own, given that Texas Memory has been doing SSD for decades.

No, what makes this of interest is the following quote:
"TMS is targeting relational databases with its new storage device, just as Fibre Channel drives would be used as the primary storage."

Rad. BCNF support in the flesh. YeeHa.

24 August 2011

Epiphany

Whilst bloviating on the OCZ message board, I had an epiphany. It follows, including the snippet upon which I was commenting.


-- Maybe the strategy is to sell more consumer products at a low GM so they can increase the brand awareness of OCZ which will help sell more enterprise products that have huge GM's.

Not likely. The Enterprise SSD vendors, modulo Fusion (maybe), build parts which are far more expensive, and generally have bespoke controllers and SLC NAND (or eMLC, whatever that might really mean).

To the extent that Enterprise SSD goes the route of Enterprise HDD (buy 'em cheap and swap 'em when they crap out), OCZ could best the likes of STEC, Violin and Texas Memory. We're not there yet; whoever figures out how to make a cheap SSD which dies gracefully wins. That may not be possible, given the physics, of course.

19 August 2011

Viagra At Home

A bit of R. I've mentioned a few times that I "knew" we were headed into the ditch around 2003. I don't recall that I'd read Shiller at that point (or even that I was aware of him); it was just obvious that house prices were outstripping median income. The raw data are available (on Shiller's site, http://www.econ.yale.edu/~shiller/data.htm; no idea how long they've been there), so here's a picture worth a few words. The data run from 1890 to 2009.

[Figure: Shiller's U.S. home price data, 1890-2009]
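
For the curious, a minimal R sketch of how to reproduce such a picture. It assumes you've pulled the relevant columns out of Shiller's spreadsheet into a local CSV; the file name and column names here are my own invention.

    # Assumes a local CSV extracted from Shiller's spreadsheet,
    # with columns: year, real_home_price (hypothetical names).
    shiller <- read.csv("shiller.csv")

    plot(shiller$year, shiller$real_home_price, type = "l",
         xlab = "Year", ylab = "Real home price index",
         main = "Shiller real home prices, 1890-2009")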
Where's my little blue pill???

18 August 2011

Old Frankenstein

In "Young Frankenstein", The Doctor asks Eye-gor (Igor) whose brain he *really* retrieved. Igor replies, "Abby Normal?" I've spent the last hour or so wandering amongst some web sites, blogs, and whitepapers which seek to explain Normal Forms to normal folks; no math, just words.

This one says: "'Normalization' just means making something more normal, which usually means bringing it closer to conformity with a given standard." Alas, not even close.

Since I've been re-reading my probability, stats, and stat pack books and docs, this flipped a switch. That switch leads to a clearer, albeit slightly mathematical, definition.

I've done a quick search, and can't confirm that he explicitly said so, but given that Dr. Codd was trained as a mathematician, I'll surmise that he used the word in the following sense. In math, two terms are used as synonyms: orthogonal and normal. Remember from geometry class that the line at 90 degrees is the normal line? It's also orthogonal. Orthogonal as a concept means independence of influence (just as the X axis is independent of the Y axis; there's the math), and Codd uses that term liberally in his paper.
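
For the skeptical, a toy illustration in R: orthogonal data columns have a zero dot product, and zero correlation, which is exactly that independence of influence.

    x <- c(1, -1, 1, -1)   # two toy data columns, deliberately orthogonal
    y <- c(1,  1, -1, -1)
    sum(x * y)             # dot product: 0, so x and y are orthogonal (normal)
    cor(x, y)              # correlation: 0, i.e. independence of influence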

So, the normal forms have nothing to do with not insane or seeking standards, but with data independence. Which is normal.

16 August 2011

How To Mistreat Life

It is amazing, but so far as I can remember, we've gotten more than half-way through 2011 before an article appeared which takes the client-side code in web apps to task for being silly. Hard truth #1 is the worst. And the only way to avoid it: database-enforced integrity. There, I said it again. NO DATA GETS WRITTEN WITHOUT THE ENGINE'S SAY SO.
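
For those who want that concrete, here's a minimal sketch, assuming the DBI and RSQLite packages (the table and its rules are my own invention). Declare the integrity rules in the DDL, and the engine, not the client code, decides what gets written.

    library(DBI)  # with the RSQLite driver installed

    con <- dbConnect(RSQLite::SQLite(), ":memory:")
    dbExecute(con, "CREATE TABLE account (
                      id      INTEGER PRIMARY KEY,
                      balance REAL NOT NULL CHECK (balance >= 0))")

    dbExecute(con, "INSERT INTO account VALUES (1, 100.0)")       # the engine says yes
    try(dbExecute(con, "INSERT INTO account VALUES (2, -50.0)"))  # the engine says no: CHECK fails
    dbDisconnect(con)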

08 August 2011

The Know Nothing Party

I came upon this rant/essay via R-bloggers. Beyond the fact that Zed (love that name) has a background fairly close to mine, is an R aficionado, and is willing to call the Emperor naked, one could substitute "RDBMS" for "statistics" in his piece. It would then read like a few of those which have appeared in this endeavor.

I really should send the link along to some of those folks in Washington I've chatted with over the last few weeks. Nah. They wouldn't get the joke.

Of particular relevance:
"It's pretty simple: If you want to measure something, then don't measure other shit. Wow, what a revelation."

04 August 2011

And The Survey Says...

As my dive into stats, and possible departure from RDBMS as the site at the end of the Yellow Brick Road, continues, I came across a Ruby library called fechell. My initial thought: "Shouldn't that be fechall, as in Fetch All? Fetch Ell? What does that mean?" Well, D'oh! The normal name for the code is FECHell. Ah, much more to the point.

I found two posts, by way of R-bloggers, by the person who developed the library. Here's the post where he develops the use of the data and the library. He references a Part 1 post with the background.

This intrigues me not a little bit. Suppose, just for grins, that you're the campaign manager for a statewide (or larger) candidate. That is, one where monies are allocated to distinct locations. Further, suppose that you have this data in close to real-time, and you also have data measuring "outcome" for the use of these monies, say polling data. And let's say that the two maps, monies and outcomes, are congruent.
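
Congruent, in practice, just means the two data sets join on location. In R terms (the districts and numbers below are invented):

    spend <- data.frame(district = c("TX-01", "TX-02", "TX-03"),
                        dollars  = c(120, 340, 85))         # thousands spent
    polls <- data.frame(district = c("TX-01", "TX-02", "TX-03"),
                        approval = c(44.2, 51.7, 39.9))     # latest poll, percent
    merge(spend, polls, by = "district")  # one row per location: money and outcome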

Could one make predictive decisions about monies allocations? Well, it depends. The naïve answer is: abso-freakin-lutely!!!! The real answer: not so much. The naïve notion is that money well spent is indicated by winning the election (which is kind of too late for allocation decisions) or some upward movement in polling data. Ah. Let's spend where the spending works. Superficially, that makes a lot of sense.

The only problem: stat studies invariably show little correlation between money and winning. I know, Liberals in particular are worried about the Citizens United effect, where corporations have gobs more loot than anybody else. They'll just buy the elections. And they well might. This would not make me smile. But the studies of the data show that the effectiveness of campaign ads is grounded less in their expense than in their content. Sometimes, maybe often, attack ads work.

Here's an academic attempt to find out.

And yet another.

A quote from the second story (not, so far as I know, cited from the study):
"While we see an influence of the campaign ad in the short-run, in the long run the ad loses its effectiveness. This finding begs the question: how cost effective is it for politicians to spend millions of dollars on campaign ads which have little long-term effect on voter opinion?"

StatMan to the rescue!!! The problem is that it's now August 2011, and any application being written as I write (assuming that folks have started) needs to be up and running by January. In order to be worth the time and money expended, the application has to have *predictive* value. FECHell data passed through some software is only retrospective. Political ops should know enough about their candidates and opponents to design ads that work. Making a simplistic leap from $$$ to polling/winning is a waste of that time and money. The retrospective data needs to be run through some multi-variate hoops (multiple regression or ANOVA, most likely; PCA and MDS are less applicable here) to identify the attributes, besides money, which move the bar toward higher polling or winning.
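
A sketch of the sort of hoop I mean, in R; the data frame and every variable in it are hypothetical, stand-ins for whatever attributes a real campaign would code up:

    # Hypothetical data: poll movement as a function of dollars spent
    # and a couple of ad attributes, rather than dollars alone.
    campaigns <- data.frame(
      poll_shift = c(2.1, -0.5, 3.8, 0.2, 1.1, 4.0),  # polling points gained
      dollars    = c(500, 900, 450, 800, 600, 700),   # thousands spent
      attack_ads = c(1, 0, 1, 0, 0, 1),               # ran attack ads?
      local_buys = c(0.8, 0.2, 0.9, 0.3, 0.5, 0.7)    # share of local airtime
    )

    fit <- lm(poll_shift ~ dollars + attack_ads + local_buys, data = campaigns)
    summary(fit)  # which attributes, besides money, move the bar?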

The problem with the simplistic model is that the knee-jerk reaction to positive feedback in some campaign is to toss yet more money at that campaign. But that's likely a waste of money. The goal is to use the data to identify those trailing candidates today who'll win tomorrow if they get more $$$ and *spend it on what works*. Pouring money into a sure winner is a loser. Pouring money down a rat hole is, too. The latter case is more obvious, but the former is just as wasteful.

Economists refer to "opportunity costs": I can spend $1 on toothpaste or candy; I can't have both. In the short run, candy is dandy. In the long run, toothpaste wins. Campaigns don't, generally, last as long as the toothpaste's long run, but you get the point. Money is finite, and should be spent on those activities/goods/services which advance the goal. In the case of FECHell data, the goal is winning elections. Looking retrospectively only at $$$ and winners aims at the wrong goal.