30 September 2016

Thought For The Day - 30 September 2016

PROS has been using R for a while in development, but found running R within SQL Server 2016 to be 100 times (not 100%, 100x!) faster for price optimization.

here

What's that adage from Gandhi? "...then you win."

25 September 2016

Thought For The Day - 25 September 2016

It's advert time, but you should watch out for reruns of "Parts Unknown", since Bourdain somehow managed to get Obama to have dinner at a noodle shop in the middle of Hanoi. They had an adult conversation on various topics. Now, think for a moment whether that could have happened with King Donald of Orange.

22 September 2016

Dew Drop Inn, The Good News Cafe - part the second

It's kind of quick for a part the second, but Nate Cohn has let another cat out of another bag.
Well, well, well. Look at that. A net five-point difference between the five measures, including our own, even though all are based on identical data. Remember: There are no sampling differences in this exercise. Everyone is coming up with a number based on the same interviews.

With regard to the Census/BLS earnings surveys, here's how to decline. More bad data. Yum. Ayn Rand would be proud to turn back the clock to 1800.

20 September 2016

Dew Drop Inn, The Good News Cafe

The regular reader may recall the admonition in these endeavors that macro analysis is fraught with danger: nearly all the data is from sample surveys, of varying quality and coverage. This reader, who hasn't been living under a rock or as "The Martian", is aware that 2015 income has been widely reported as having risen in the last year. For the first time in many years. The reporting, in some places, does admit that the 2015 level is still below 2007 levels; but who's counting?

Here's the public release:
The increases of 5.3 percent and 5.4 percent for family and nonfamily households were not statistically different.
In fact, if you read through the various sections, that sentence repeats and repeats and repeats...

A question about stat sig makes it all a bit worse:
The Census Bureau uses 90 percent confidence intervals and 0.10 levels of significance to determine statistical validity. Consult standard statistical textbooks for alternative criteria.
-- here

So, right off the bat, we've got squishy differences. By the usually accepted stat sig level, .05, it wouldn't be even close.
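To see how much looser the .10 criterion is than .05, here is a minimal sketch. It assumes a two-sided test on a normally distributed statistic; the z value of 1.80 is a made-up illustration, not a number from the Census release.

```python
from statistics import NormalDist

def is_significant(z_stat, alpha):
    """Two-sided z test: True if |z| exceeds the critical value for alpha."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z_stat) > critical

# Critical values for the two criteria discussed above.
z_crit_10 = NormalDist().inv_cdf(1 - 0.10 / 2)  # about 1.645
z_crit_05 = NormalDist().inv_cdf(1 - 0.05 / 2)  # about 1.960

# Hypothetical test statistic, chosen to fall between the two cutoffs.
hypothetical_z = 1.80

print(f"critical z at .10: {z_crit_10:.3f}")
print(f"critical z at .05: {z_crit_05:.3f}")
print("significant at .10:", is_significant(hypothetical_z, 0.10))
print("significant at .05:", is_significant(hypothetical_z, 0.05))
```

Any estimate whose statistic lands between the two cutoffs gets reported as "significant" under the Census convention and as noise under the usual one.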

It gets better. There's a link to a spreadsheet with some underlying numbers. If you look at these numbers, the claim is that most are stat sig at .10!

If you follow the first quote link (page 6):
The effect of nonresponse cannot be measured directly, but one indication of its potential effect is the nonresponse rate. The basic CPS household-level nonresponse rate was 14.9 percent. The household-level CPS ASEC nonresponse rate was an additional 15.8 percent. These two nonresponse rates lead to a combined supplement nonresponse rate of 28.3 percent.
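The combined figure in that quote checks out if one assumes the supplement nonresponse rate applies only to households that answered the basic CPS, so that the response rates multiply. A quick sketch under that assumption (the variable names are mine):

```python
# Rates quoted in the Census document above.
basic_cps_nonresponse = 0.149  # did not respond to the basic CPS
asec_nonresponse = 0.158       # additional nonresponse to the ASEC supplement

# Assumption: the ASEC rate is conditional on having responded to the basic CPS,
# so the share responding to both is the product of the two response rates.
combined_response = (1 - basic_cps_nonresponse) * (1 - asec_nonresponse)
combined_nonresponse = 1 - combined_response

print(f"combined nonresponse: {combined_nonresponse:.1%}")
```

That is better than one household in four whose supplement answers had to be filled in some other way.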

Following on, one finds the description of imputing missing responses:
Multiple imputation is a general approach to analyzing data with missing values. We can treat the traditional sample as if the responses were missing for income sources targeted by the redesign and use multiple imputation to generate plausible responses. We use a flexible semiparametric imputation technique to place individuals into strata along two dimensions: 1) their probability of income recipiency and 2) their expected income conditional on recipiency for each income source.
Much of that document is devoted to describing how this is done.

All surveys must deal with non-responses, so those in the business wouldn't find such a process out of band. For civilians, not so much. As I said, most likely believe that all these numbers are full measures, probably from the IRS. Would that it were so. Without the raw data, it's not possible (well, for humble self) to parse out whether the wonderful increase was an artifact of imputation. But it could be.

So, should we guess that the Kenyan President ordered the minions at Census and BLS to put a heavy thumb on the scales? No. Having done data and stats for the government (not public facing, though), I can report a good deal of resistance to the corner office dudes telling us what to do. Case in point: Farkas at FDA resigned rather than be party to the data fuck up that was eteplirsen. There's been a good deal of fiddling with the sampling underlying the surveys (yes, more than one for these data), all described at length in the background docs. Is all of this fiddling enough to turn nearly a decade of the 1% getting richer and the 99% having kids the other way 'round? Could be. Certainly a question for those with the data to answer.

19 September 2016

Pandering Central

Well, being a biostat just got more difficult. FDA, specifically the Boss in Charge Dr. Janet Woodcock, decided by fiat to approve Sarepta's eteplirsen, branded Exondys. The MoA data presented by the sponsor was not statistically different from 0, in aggregate, and there was no evidence that what dystrophin (the target compound) was produced was clinically meaningful. Woodcock threw not only the outside panel of experts, but also her staff under the bus. Both sets of experts saw eteplirsen for what it is: an (estimated) $400,000 saline solution. You wonder why healthcare cost goes nuts? This is one of the main reasons. The DMD parents, who lobbied ceaselessly for approval, win. Or so they think. When the boys die on schedule, they'll be really angry. That's not a good thing. If Exondys made some level of difference for all DMD patients, there might be some sense to this. But Exondys only affects 13% of DMD patients.

Sarepta is on the hook to conduct a confirmatory trial. I wouldn't hold my breath; they'll continue to find excuses, just as they've done so far.

It's a sad day for data.

06 September 2016

Thought For The Day - 6 September 2016

Vegetarian, vegan, gluten free and such dieters can be tweaked by simply telling them a bit of history. Among the dinosaurs, the herbivores were the big, slow, fat, dumb ones; while the carnivores were the fast, lean, smart ones.

05 September 2016

NoSql? No Mas! No Mas!

First they ignore you, then they laugh at you, then they fight you, then you win.
-- Gandhi

It may be a tad early to gloat, but indications are that the NoSql zealots have waved the white flag and admitted that CAP is silly and doing ACID is way more fun. I suppose they deserve hemorrhoids, too. Couldn't happen to a nicer bunch of folks. A bit of interTubes searching confirms the occasional tidbit that drifts by: the thought leaders in the NoSql cabal finally admit that transactions and central control over data consistency ain't such an old fashioned idea after all. They've discovered it, and will be patenting it soon. Not that NoSql datastores were any kind of innovation, either. Just VSAM files in ASCII with a buzzy name.

Told ya so.

Here's the main wave from the perpetrator of CAP. At least the principal instigator admits the error. The silly part of the whole episode is that partitions really are rare occurrences. They are just extended latency when they occur. Federated RDBMS, which have been around since about 1990 (the general principles since the mid 80s), have handled the situation. Here's a DB2 tutorial from 2003. The semantics are about the same with other such RDBMS.
As the "CAP Confusion" sidebar explains, the "2 of 3" view is misleading on several fronts. First, because partitions are rare, there is little reason to forfeit C or A when the system is not partitioned.

Fact is, distributed RDBMS (both single and multi- vendor database) existed since at least the early 1990's. And it wasn't just casual; here's a paper on security from Mitre (just down the road from Progress, which supported federation) from 1994. While it's no secret that I'm not a big fan of The Zuck,
Facebook uses the opposite strategy: the master copy is always in one location, so a remote user typically has a closer but potentially stale copy. However, when users update their pages, the update goes to the master copy directly as do all the user's reads for a short time, despite higher latency. After 20 seconds, the user's traffic reverts to the closer copy, which by that time should reflect the update.
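The routing rule in that quote is simple enough to sketch. What follows is a toy illustration, not Facebook's code: the class name, method names, and the "master"/"replica" labels are hypothetical, and only the 20 second window comes from the quote.

```python
import time

STICKY_SECONDS = 20  # window quoted above; everything else here is an assumption

class ReadRouter:
    """Send a user's reads to the master copy for a short time after a write."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the rule can be tested without sleeping
        self._last_write = {}    # user_id -> time of that user's most recent update

    def record_write(self, user_id):
        # Updates always go to the master copy.
        self._last_write[user_id] = self._clock()
        return "master"

    def route_read(self, user_id):
        # Reads stay on the master until the window expires, then revert
        # to the closer (possibly stale) copy.
        last = self._last_write.get(user_id)
        if last is not None and self._clock() - last < STICKY_SECONDS:
            return "master"
        return "replica"

# Usage with a fake clock so the example runs instantly.
fake_now = [0.0]
router = ReadRouter(clock=lambda: fake_now[0])
router.record_write("alice")
fake_now[0] = 5.0
print(router.route_read("alice"))   # within the window
fake_now[0] = 25.0
print(router.route_read("alice"))   # window has expired
```

Note what this buys: the user who made the change sees it, at the cost of latency, while everyone else may read a stale copy for a while.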

So, what's the deal?
Another aspect of CAP confusion is the hidden cost of forfeiting consistency, which is the need to know the system's invariants. The subtle beauty of a consistent system is that the invariants tend to hold even when the designer does not know what they are.

Or, as many RM zealots tell us, high NF schemas reveal facts about data relationships we didn't know before. The schema specifies the invariants, but the data reveals the real world correlations.

Later in the piece, Brewer goes off the deep end:
The essential ATM operations are deposit, withdraw, and check balance. The key invariant is that the balance should be zero or higher. Because only withdraw can violate the invariant, it will need special treatment, but the other two operations can always execute.

This is the Pollyanna view of how banks run ATMs, and transactions generally. His description, and what is assumed by most civilians, is that Your Bank updates Your Account in real time, whether at an ATM or human teller. Not true. Accounts are reconciled (sometimes so as to generate overdrafts!) in batch at some time EOD. Much of the big money made on bank hacking happens because the perps know that they have hours to do the deed before the accounts used are reconciled. Sometimes the intermediate accounts never see the deed. COBOL cowboys much prefer batch. They've been doing things that way for six decades. BASE has been the default paradigm in banking since forever.
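A toy sketch of the batch point, with made-up numbers: since nothing is posted until EOD, the order in which the batch job applies the day's debits decides how many of them count as overdrafts. The function and the amounts are hypothetical; no real bank's posting rules are implied.

```python
def post_batch(opening_balance, debits):
    """Apply debits in the given order; return (closing balance, overdraft count)."""
    balance = opening_balance
    overdrafts = 0
    for amount in debits:
        balance -= amount
        if balance < 0:
            overdrafts += 1  # this debit posted against a negative balance
    return balance, overdrafts

# Hypothetical day's debits, in the order the customer made them.
days_debits = [20, 30, 40, 120]

chronological = post_batch(100, days_debits)
largest_first = post_batch(100, sorted(days_debits, reverse=True))

print("chronological:", chronological)
print("largest first:", largest_first)
```

Same account, same debits, same closing balance. Only the number of overdrafts changes with the posting order, which is the sort of thing a real-time system would never let you choose.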

So, with so many cpu cycles, SSD, XPoint, NVRAM, bandwidth, and the like available, why would anyone drop OLTP/ACID on purpose? Back in the thrilling days of yesteryear when the 360 and 2311 DASD ruled the world, maybe there was no other choice. Times, they are a changin.

[For those that keep track of such things, this musing and its title were started before I saw the adverts for the new Roberto Duran movie.]

04 September 2016

Physicists Aren't From Mars

Was toddling back from the grocery this morning, and the wind from Hermine made its way to South Butt Fuck. For reasons unknown, that reminded me of "The Martian", which reminded me that the most vocal complaint about the movie (and, I guess, the book which I've not read) was the initial premise, that a wind storm on Mars would be powerful enough to endanger the MAV.

But, to me anyway, that wasn't the dumbest McGuffin in the movie. That was the Rich Purnell Maneuver, whereby the Hermes mothercraft is slingshot back to Mars to collect Watney. The story takes place in 2035 and that's important.
The Mariner 10 probe was the first spacecraft to use the gravitational slingshot effect to reach another planet, passing by Venus on February 5, 1974, on its way to becoming the first spacecraft to explore Mercury.
-- Wikipedia

The movie shows Purnell sitting in a cold room with hundreds of servers connected to his laptop, ostensibly to do the harder than hard calculations. Give me a break. NASA had a 60 year Rip Van Winkle moment?

03 September 2016

Thought For The Day - 3 September 2016

Recently, I saw a comment on one of the many message boards that said, more or less, that the profit motive is the key to innovation.

Bullshit on that. Smart people will do smart things, if so allowed, irregardless. It's what they do; just like the scorpion, the frog, and the river. Paying stupid people more money doesn't make them any smarter. Or capable of doing smart things.

Money can't buy you brains. Not to put inside your skull, at least.

Watch Where You're Going

Perhaps the most useful service provided by Apple Watch is to provide some of us (humble self very much included) the motivation to prick yet another rosy balloon. Folks persist in asserting that the Watch is just a better battery away from being the awesome device worn by cartoon cop Dick Tracy. It's such silliness. Please read Gordon's book. Technological progress isn't a matter of human imagination, pulling rabbits out of our collective hat (brain) deus ex machina, but rather figuring out how Mother Nature has put the world together. The earth, and all the resources in it, is finite. There's a lot less of them than in 1800. Get over it.

We've arrived at the lithium-ion battery, and that's as far as we're going to get. At least for a watch form factor. Lithium is the smallest atom capable of supplying electrons, without the potential of blowing up (mostly, lithium batteries don't :) ). Until a neo-Edison invents a new atom, that is. For those wishing for more authority, MIT has a new piece on the problem. It fails to make the point that the periodic table has been fully populated, modulo accelerator generated nanosecond monster atoms, and thus the wall has been hit. Lithium is it. The hydrogen fuel cell has been around for a bit, but de-scaling down to laptop/phone/watch size is fantasy. To answer those banking on yet smaller chip nodes to make Dick Tracy's wrist thingee reality, I'll offer the following (admitting I'm not EE): at some node, Xnm (my guess for X is 10), control of, and compensation for, leakage and capacitance will drain more power than the Xnm node will save over X+nm. You read it here first.

It's really fun living in a time, the 19th century for example, when you know that you've only scratched the surface of that big ball of reality that is Mother Nature. Anything imaginable looks possible. Not so much when the periodic table is all there to see, thermodynamics have been codified, the Bohr atom described, and so on. Now we have productive corporations, even Apple, segueing to "services" to make moolah, because they see the end of growth from novel widgets.

We live in a Rand New World, battling with a Christ New World. In the Old World, recently ended, economic growth was driven by population growth with each new kid becoming a consumer, if only the necessities. The expansion of white folks into land lived on by others for thousands of years was a manifestation of that. Since contemporary Western economies no longer deal in necessities, population growth can't drive economic growth. We imported folks to do the heavy lifting, starting with black folks, but also Chinese and lower level European ethnics (from the WASP point of view, of course) all in service to the betterment of the few. With the industrial revolution came the problem of consuming output. In the Dark Age, when production was sorta, kinda one man makes one widget (and the widget went to the manor's Lord), that problem didn't exist. But with automated production, i.e. non-linear increase in productivity, the problem became manifest. If you invent a new automated, capital intensive, way to make widgets, how do you find consumers to soak up the deluge of widgets? The manor's Lord, and even including his spawn, won't be enough. In the last few decades, it's been an all consuming problem. Yes, that's a pun with a healthy dose of irony.

So, we have to figure out how to grow an economy with a relatively (with respect to recent history and living memory) fixed technology. If you look at FRED data, you'll see that USofA "output" has been increasingly non-widgets, not stuff, but "services". Unless the path down the road of income and wealth concentration is reversed, we'll be in a permanent Dark Age. The key to avoiding that is to solve the 1%'s refusal to admit that their wealth is mostly luck, so they shouldn't be allowed to keep it all. In order for an economy to grow, aggregate demand has to grow. The "recovery" from the Great Recession is the poster child for faux recovery. It's been driven by monetary manipulation, not demand growth. That 4.9% unemployment rate just reported is mostly the result of the denominator (the size of the labor force) shrinking. Consider that for a moment. The data increasingly confirm that most folks are still worse off than they were before the Great Recession. Hyper capitalism only works if the 99% consume the output, rather than just the 1%. They will only buy so many Ferraris. The Donald tries to hammer on that exposed nerve of angry white folks, but nothing in his history indicates that he would actually do anything good (increase aggregate demand) for the 99%. Certainly, his tax proposals are pure 1% welfare.
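A rough arithmetic sketch of the labor force point, with made-up numbers (in millions): people who give up looking for work drop out of both the unemployed count and the labor force, so the rate falls without a single new job.

```python
def unemployment_rate(unemployed, employed):
    """Unemployed as a share of the labor force (unemployed plus employed)."""
    labor_force = unemployed + employed
    return unemployed / labor_force

# Hypothetical figures, in millions. Employment is identical in both cases.
before = unemployment_rate(unemployed=10, employed=140)

# Three million of the unemployed stop looking, so they are no longer
# counted as unemployed or as part of the labor force.
after = unemployment_rate(unemployed=7, employed=140)

print(f"before: {before:.2%}")
print(f"after:  {after:.2%}")
```

The headline number improves by nearly two points while the number of people with paychecks doesn't move.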

So, in the end, the Quixotic Quest for the ultimate battery is fantasy, propelled by dreams of 19th century invention. "We did it back then when we didn't know anything, so it should be easy now when we know so much more." Dirty Harry famously said, "A man's got to know his limitations". So does a nation. So do all nations. There are only so many elements in the periodic table (and a fixed supply of them here on the Blue Marble, and, no, mining them on other planets won't happen with chemical rockets), and they can be assembled into only so many molecules (far more on the organic side, but still true). Thermodynamics proves that there's no such thing as a free lunch. And so on. The great discoveries of the 19th and 20th centuries have brought us to the walls of this box we call reality, and the structure of our society and economy will either adapt to this very different reality, or we'll end up like the rats in the experiment. We can have continuing, robust, economic growth (with or without a battery an order of magnitude more potent than lithium-ion) without burgeoning population where just 1% skim off the production. We just need to spread the wealth around. It will be spent on fancy services that all can use happily. Ayn or Jesus? Take your pick.

01 September 2016

They Did It Again

After all these years of ORMs being discredited, one would think that folks intent on fashioning themselves as "data folks" would bite the bullet and learn about the RM and SQL. They are, in sum, only a bit of set theory. What quant would shy away from some basic maths? Maths is what quants do, right?

I guess not. An ORM for R has been announced. The end of the world is nigh.

What's even more perplexing: R can now be run, more or less, in-engine with all of the serious RDBMS. Why go backasswards?