27 November 2015

Billions and Billions of Dollars

I've always suspected, just from acquired memory, that the real reason behind pharma's claim that it costs $X billion to bring a "new drug" to market is that drug companies simply don't pull the plug when the data say the drug has little chance of working and/or getting approved. For those who may not know, in the US, there are three sanctioned levels of trial, not surprisingly called Phase I/II/III. These are trials of the drug in humans. Prior to Phase I there are pre-clinical lab tests to, at least, demonstrate that the drug works chemically, biologically in glass, and biologically in non-humans.

Once FDA is convinced that the compound is non-harmful, or at least non-fatal, in non-humans, clinical trials can begin. In general:
Phase I -- safety
Phase II -- safety and dosing, and possibly efficacy, in small trials
Phase III -- efficacy in large trials

As a rule, at least two PIII trials demonstrating statistical efficacy and clinical benefit beyond current therapies are needed to ask FDA for approval. The key points in these trials:
1 -- sponsors (aka, drug companies, mostly) are not required to provide the public (or, for corporations, investors) with all data generated, or FDA correspondence, during development
2 -- FDA is not allowed to release much, if any, data until such time as it makes a marketing approval decision, and then the non-approval (aka, CRL) may be vague

The result of this is that drug companies often continue pouring money down a rat hole. It's what they do. That's my recovered memory of watching the drug business for the last decade or so. Finding clear data on how many drugs with failed/marginal Phase trials are then sent into the next Phase is difficult. Not the sort of information drug companies want publicized.

Part of the problem may just be a naïve view of stats, in particular of what a p-value means. And, no, I don't say that as an intro to pumping Bayes in clinical trials. Not even.

Then I found this piece. All is revealed.
And, of course, add to all that the entirely avoidable, but nonetheless remarkably prevalent, tendency to progress agents into Phase 3 that did not actually achieve positive Phase 2 findings (at least without the help of unjustifiable post hoc analyses).

So, here is where all that moolah goes:
If, for example, your primary end-point reaches statistical significance but every secondary end-point suggests no effect, it's time to suspect the False Discovery Rate. Put another way, don't let the data from one single experiment (however important) dominate the weight-of-evidence. The attitude "well, the trial was positive so it must work - so let's plough ahead" may well be tempting, but unless the broader picture supports such a move (or the p value was vanishingly small) you are running a high risk of marching on to grander failure.

Leading to his conclusion:
Failures in large, expensive Phase 3 trials are the principal cause of poor capital productivity in pharmaceutical R&D. Some of the reasons for failure are unavoidable (such as, for example, the generalization problem). But the False Discovery Rate is most definitely avoidable -- and avoiding it could halve the risk of late-stage trial failures for first-in-class candidates. That translates into savings of billions of dollars. Not bad for a revised understanding of the meaning of the humble p value.
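The arithmetic behind that conclusion is worth spelling out. Here's a minimal sketch, with hypothetical numbers (not from the linked piece): assume only 10% of candidates entering Phase 2 genuinely work, a p < 0.05 threshold counts as a "positive" trial, and a truly effective drug clears that bar 80% of the time. Bayes does the rest:

```python
# Hypothetical numbers, for illustration only.
prior = 0.10        # fraction of candidates that are truly effective
alpha = 0.05        # significance threshold: a "positive" trial
power = 0.80        # chance a truly effective drug yields p < alpha

true_positives  = prior * power            # works, and trial is positive
false_positives = (1 - prior) * alpha      # doesn't work, but p < 0.05 anyway

# Of all "positive" Phase 2 trials, what fraction are false discoveries?
fdr = false_positives / (false_positives + true_positives)
print(f"False discovery rate among positive trials: {fdr:.0%}")  # 36%
```

Which is the whole point: a p-value of 0.05 does not mean a 5% chance the drug is a dud. With a thin prior, more than a third of those "statistically significant" Phase 2 wins are noise, and each one marched into Phase 3 burns nine figures.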

But, of course, drug companies won't do that, since they get to keep their bloated bureaucracies only if they continue to do trials. Cutting off the losers in PI or PII does nothing to promote that. So, they won't.

19 November 2015

You Say M-eye-cro, I Say M-ah-cro

One of the hallmarks, if not the raison d'être, of microeconomics, aka The Corporate Perspective, is the claim that macroeconomics is just the sum of all those homines economici maximizing their use of land, labor, and capital. That argument has been used for centuries to justify all sorts of zero-sum gaming and short-term decision making. Keynes is the best known, though not the first, to recognize that the welfare of The Tribe amounts to more than the sum of each member's wealth. Arguing against the macro folks is the 1% argument that "only the little people pay taxes"; if you're of the 1%, you directly buy your own cops and schools and such.

Short-term decision making is exemplified by "you don't miss your water until your well runs dry". In California, we find the 1% squandering water on lawns just because, today, there still is some water, and today's price (which never seems to account for depletion) is affordable for them.

Zero-sum gaming is exemplified by the likes of Airbnb, which is the subject of Australian hearings. The same old story: we should be allowed to slough off social costs, both near term and long term, because we assert that we expand the larger economy. In the case of American sports teams being gifted with stadiums, often fully gratis, the knock-on effects of bars, restaurants, and memorabilia shops are asserted to bring in more commerce, and tax on same, than the cost of such stadiums and tax abatements. No unbiased study has ever agreed. What has been found, of course, is that such teams pull up stakes for some other jurisdiction which makes a bigger, dumber offer as soon as, or even before, the lease finishes. Such taxpayer/community gifts are profitable to the community only if customers are imported to the jurisdiction from external places; otherwise the community is simply transferring consumption from a loser (movies and bars and OTB) to an adorned winner (Your NFL Team). Plus, that winner gets extra profit from not paying substantial costs.

Airbnb has a more difficult case to make: it is a pure replacement for some other form of accommodation. What Airbnb gains, some other facility loses. Since Airbnb runs through Ireland, of course, all other countries see no tax benefit from the corporate cash flow. And, of course, all that happens locally is that Hilton loses a customer to Airbnb's rooming house. Such Airbnb facilities are often sub rosa, so any local accommodation tax goes unpaid. The argument by the likes of Airbnb amounts to, "we're cheaper than incumbent X, so our customers will spend the difference in the locality". I won't stay at Hilton, but rather in some stranger's back room, and I'll have dinner at the Hilton bar?? As if that actually made sense as a justification: the customer base spends the same, so let us avoid taxes because we help the larger, local economy? So, even if the Airbnb argument were true, the net gain to the community is less than $0; the Airbnb sleeper merely spends the difference between Hilton and its cot. There is no net increase in the local economy. The only justification for taxpayers subsidizing Airbnb (by letting them skate on regs and taxes and such) is if those who sleep at Airbnb wouldn't otherwise be in the locality spending money.
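The substitution arithmetic is simple enough to put in a few lines. A sketch, with made-up illustrative numbers (not real room rates or tax rates): a visitor with a fixed local budget either takes the taxed hotel room or the untaxed back room, and spends whatever's left over locally in both cases.

```python
# Made-up illustrative numbers, not real rates.
budget = 300.0          # what the visitor spends locally either way

# Option A: Hilton room, with local accommodation tax collected.
hotel_room = 200.0
occupancy_tax_rate = 0.10
hotel_tax = hotel_room * occupancy_tax_rate          # 20.0 to the city
hotel_other_spend = budget - hotel_room - hotel_tax  # dinner, bars, etc.

# Option B: untaxed back room; the saved difference is spent locally too.
airbnb_room = 100.0
airbnb_tax = 0.0                                     # sub rosa, no tax paid
airbnb_other_spend = budget - airbnb_room

# Private local spend is the same fixed budget in both cases; the only
# change is the community's tax take, which drops by the forgone tax.
print(hotel_tax - airbnb_tax)  # 20.0 lost by the community
```

Same budget in, less tax out: consumption merely shifts between merchants, and the community's net is negative by exactly the skipped tax.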

The bottom line, so to speak: when analyzing micro effects, especially quantitatively, be careful not to get sucked into ignoring macro effects, both immediate term and long term. Nearly always you'll find micro actor(s) seeking to slough off costs to the macro world. It's at best a zero-sum game for the macro economy, while the 99% lose 99.44% of the time.

17 November 2015

Codd Has Risen

The kiddie koders still worship at the feet of flat-files and client-centric transaction control. Largely, it appears, because no one in University bothers to show them the Yellow Brick Road to sanity. There is the recent missive on MongoDB's attempt to dip another toe into the SQL pond, while maintaining that it's really, really a NoSql, client-centric transaction "datastore". "You don't need no TPM; roll your own."

Now comes Intel's parallel memory implementation for Xeon Phi/2. An earlier note.

What we have in a relational engine is SIMD semantics, and thus embarrassingly parallel. Someone is going to see the light (Larry the boatman, perhaps?) and adapt this hardware to Organic Normal Form™ data, mostly if not wholly, in memory.
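Why "SIMD semantics"? Because relational operators apply one predicate to every row independently. A toy sketch in Python (illustrative data, no real engine) makes the shape visible:

```python
# A toy columnar "table": one list per column, same length.
# Assumed, illustrative data -- not from any real engine.
price    = [9.99, 24.50, 3.25, 18.00, 41.75]
quantity = [3,    1,     12,   2,     1]

# SELECT ... WHERE price > 10: the predicate is evaluated on every row
# independently -- no row's result depends on any other's.  That is
# exactly the SIMD shape: one instruction, many data elements, so an
# engine can hand whole columns to vector units or spread them across cores.
mask = [p > 10.0 for p in price]

# Project the surviving rows (here, spend per matching row).
result = [p * q for p, q, keep in zip(price, quantity, mask) if keep]
print(result)  # [24.5, 36.0, 41.75]
```

Embarrassingly parallel, in other words: the work partitions trivially, which is why columnar, in-memory layouts map so naturally onto wide-vector hardware like the Phi.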

15 November 2015

Commentariate, part the first

Every now and again I find a thread to which I feel compelled to comment. And, of those, every now and again, I feel the need to document it here.

This is all unfortunate. With Xeon, SSD, high bandwidth, and a fully connected web, there isn't much need to view the world as disconnected and client-centric. There just isn't. But adopting a centralized, data-centric world means we don't need gobs and gobs of code. We'd need a fraction of the number of coders. Just as their COBOL grandpappies cried, "I don't know nuthin bout birthin babies!!!", so too today's web coders discard the RM as irrelevant. We could have a development semantic which looks a lot like a *nix RDBMS talking RS-232 to VT-220s. No muss, no fuss. Coders would be needed only to write screens and input routines; the database takes care of the rest. Scary thought, is it not?

14 November 2015

The Price of Apostasy

No surprise: I've no respect for the NoSql crowd, being as how they're hell-bent on a reactionary 1960s client-code/file paradigm. Ugh. Before: COBOL. Now: java (and still, in the Fortune X00, COBOL). Before: VSAM. Now: MongoDB (et al).

Comes this bit of irony.
But one thing was missing from that enterprise messaging, perhaps because it went missing from MongoDB's Enterprise product: joins.

That wasn't the plan. The plan was originally to charge for joins (or $lookup, as MongoDB is calling the functionality). Yet MongoDB's ever-watchful (but not always paying) community resisted.

MongoDB's capitulation is, of course, a testament to the company's willingness to heed the voice of its community. However, it's also a testament to just how hard it is to make money on free software: "Here's a new thing but you can't have it!"

Of course, imitation is the sincerest form of flattery, but come on. Stick to your guns. It appears the 20-somethings who actually use MongoDB like the huge flat-file non-structure that enables their infinite employment. Good on them.
This means that MongoDB, like every other open source company, needs to figure out ways to sell something other than open source, and feature-level differentiation, for the reasons stated, won't do. Not for the stuff that really matters, anyway. Otherwise the community build, meant to be an on-ramp to enterprise payola, instead becomes a roadblock to adoption.

Well, PostgreSQL has been doing the Open Source thing for a couple of decades, and supports a number of customization shops. One might also argue that Microsoft has taken the open-core path with its purchase of (and debatable integration of) Revolution Analytics' R. Also, I'm among those who deny that java is truly open source. One might argue that applications written with it are, but Leisure Suit Larry controls the language lock, stock, and barrel.

13 November 2015

Dee Feat is in Dee Flation, part the thirtieth

Well, the shit has hit the fan; producer prices continue to tank. All those Austrians keep telling us INFLATION IS HERE!!!!!!! Yet, like Godot, it never appears.

Down a record 1.6% over the last 12 months. As said so many times, motive and incentive matter more than data to those who pull the puppet's strings.

12 November 2015

Linus, Guido, Larry

Whatever one may feel or think about linux, python, and perl, the fact that each has a Benevolent Dictator For Life goes a long way to explain why some really love them. A BDFL means that the project has stated boundaries and norms upon which one can depend.

R, alas, has no BDFL. R is anarchy. I'm certainly not the first to notice this fact, and my particular gripes with the language are not necessarily widely held. An interesting post from an R consultancy takes a direct swipe at the anarchy. It concludes with the following (bold in original):
Data should always be the first argument to a function.
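The rule is about R, but the design benefit shows up in any language: when every function takes the data as its first argument, steps chain mechanically into a pipeline. A sketch in Python, with hypothetical toy functions of my own invention (not from the cited post):

```python
from functools import reduce

# Hypothetical toy steps -- each takes the data as its FIRST argument,
# per the quoted rule, so they compose without glue code.
def drop_missing(rows):
    return [r for r in rows if r is not None]

def scale(rows, factor):
    return [r * factor for r in rows]

def total(rows):
    return sum(rows)

def pipeline(data, *steps):
    # Each step is (function, extra args); the data always slots in first.
    return reduce(lambda d, step: step[0](d, *step[1:]), steps, data)

print(pipeline([1, None, 2, 3], (drop_missing,), (scale, 10), (total,)))  # 60
```

That is the whole appeal of R's magrittr-style pipes: a data-first convention is what makes `data %>% f() %>% g()` possible at all. An anarchic mix of data-first, data-second, and data-last signatures kills it, which is precisely the consultancy's gripe.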