28 April 2013

Gilding A Lily

Chris Date wrote "What Not How: The Business Rules Approach to Application Development" nearly fifteen years ago, and folks still think generating lots of client-side code, against some hoard of flatfiles, is the epitome of productivity and efficiency. Today's NY Times offers up yet another paean to coders.

I'll start with the punch line(s):
... Rails and JavaScript, which were interesting to Gild.

The company made Mr. Dominguez a job offer right away, and he accepted a position that pays around $115,000 a year.

Yikes!!! Let's see how this company lost its way down the Yellow Brick Road.

The story goes like this. A company with the self-aggrandizing name of Gild (the story doesn't say whether management has heard of "gilding the lily" as a negative epithet) asserts that it's easier, and better, to find superior coders (note: coders, specifically) by dredging social media than by sorting through resumes and other traditional methods. They're building some application(s) to implement this approach.
In all, Gild's algorithm crunches thousands of bits of information in calculating around 300 larger variables about an individual: the sites where a person hangs out; the types of language, positive or negative, that he or she uses to describe technology of various kinds; self-reported skills on LinkedIn; the projects a person has worked on, and for how long; and, yes, where he or she went to school, in what major, and how that school was ranked that year by U.S. News & World Report.

Dominguez was found, and hired, using Gild's methods; they were eating their own dog food. The piece ends thus:
The algorithm did a good job measuring what it can measure. It nailed Mr. Dominguez's talent for working with computers. What is still unfolding is how he uses his talent over the long term, working with people.

Of course, the only way to really know whether Dr. Ming's algorithm works is to track those hired with Gild against those hired by traditional methods. And, of course, that data collection and analysis won't be done. Too expensive, having to wait for hires to cycle through, and so on.

I doubt that the author, or Gild's management, sees the irony of that closing sentence. Let's see. Gild uses "social media" as the basis of its search and hiring practice, emphasis on "social", and ends up with an anti-social troglodyte. We have a winner!!

The core problem with the 20-something development mindset is that it's fully embedded in the 1960s paradigm of dumb data run by smart client-side code. COBOL/VSAM all over again. And these folks wonder why they're always behind the curve. Whether the syntax is Ruby or Java or (cringe) PHP is irrelevant. I'd even wager that these sorts of folks don't know the meaning of "semantics". They spin around "syntax".

A high normal form RDBMS schema on multi-processor/multi-core/SSD machines reduces the hairball of code. Coders aren't interested in efficiency, but in doing what's comfortable. And what's comfortable for kids who've spent time with just a PC and a browser is JavaScript and flatfiles. Mo LoC, mo money. Any effort to reduce LoC is an assault on the fortress of gelt.
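To make the "smart code over dumb data" contrast concrete, here's a minimal sketch in Python using the standard library's sqlite3 module. SQLite is only a convenient stand-in here; the argument above is about industrial-strength engines. The `candidate` and `skill` tables, their columns, and the two function names are hypothetical, invented for illustration. Both functions compute the same answer; the first drags every row out and aggregates in application code, the second states what it wants and lets the engine do the work.

```python
import sqlite3
from collections import defaultdict

# Hypothetical normalized schema (illustrative only): one row per candidate,
# one row per (candidate, language) fact, with keys and a CHECK constraint.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE candidate (
    candidate_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL
);
CREATE TABLE skill (
    candidate_id INTEGER NOT NULL REFERENCES candidate(candidate_id),
    language     TEXT NOT NULL,
    years        INTEGER NOT NULL CHECK (years >= 0),
    PRIMARY KEY (candidate_id, language)
);
""")
conn.executemany("INSERT INTO candidate VALUES (?, ?)", [(1, "Ann"), (2, "Bo")])
conn.executemany(
    "INSERT INTO skill VALUES (?, ?, ?)",
    [(1, "Ruby", 3), (1, "SQL", 5), (2, "Ruby", 1), (2, "JavaScript", 4)],
)


def avg_years_client_side(conn):
    """Code-centric style: fetch every row, then aggregate in application code."""
    totals = defaultdict(lambda: [0, 0])  # language -> [sum of years, row count]
    for language, years in conn.execute("SELECT language, years FROM skill"):
        totals[language][0] += years
        totals[language][1] += 1
    return {lang: s / n for lang, (s, n) in sorted(totals.items())}


def avg_years_in_database(conn):
    """Database-centric style: one declarative query; the engine aggregates."""
    rows = conn.execute(
        "SELECT language, AVG(years) FROM skill GROUP BY language ORDER BY language"
    )
    return dict(rows)


print(avg_years_client_side(conn))
print(avg_years_in_database(conn))
```

With the toy rows above, both calls return `{'JavaScript': 4.0, 'Ruby': 2.0, 'SQL': 5.0}`. The point isn't the few lines saved in a toy; it's that the declarative version stays one statement as the schema and the questions grow, while the loop version grows with them.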

For that $115,000, Gild could have bought a mammoth machine with a real, industrial-strength RDBMS (DB2 on Linux is my preference; SQL Server second, and Postgres third). Such a machine can easily handle terabytes of data in organic normal form; not that organic normal form schemas house anything near the byte load of the flatfiles. Gild could also get SAS/SPSS, if it wished; R would be better at any price. But "entrepreneurs" tend to be the tail wagged by the dog. The dog, in this sort of case, is the lemming herd of 20-somethings who've never read up on (and certainly never taken a class in) the likes of set theory, the RM, real RDBMS, or even industrial-strength SQL. Lone coders gravitate, almost exclusively, to code-centric development for the simple reason that all the development platform needs is a compiler and a framework that fits on a laptop. Thus Rails/Ruby, or Java/Struts, or ...

The rejoinder, which I've had the pain of hearing so many times, goes like this: "well, if we just hire a guy, we can just fire him if it doesn't work out, but if we commit to some database, we're stuck forever". In the first place, once a Gild hires a body to implement its platform, there will always be some body soaking up that $115K forever. They've committed to their "platform", so they're committed to keeping bodies at the oars of the trireme. Moving between SQL databases isn't as difficult as the kiddie koders want management to believe; think of their objection as a front-loaded rearguard action. "We can't use a database!!! We have to code!!!" Baloney, of course. SwisSQL, among others, simplifies the transition. And, again, in any case, if you've built an organic normal form schema (which you can't toss off in an afternoon, unlike code), then you'll be very close to vanilla ANSI SQL, and thus not dependent on vendor "extensions". Much as Joe Celko is a burr under the saddle of the Chris Dates of the world, he does stand with ANSI SQL. Good on him.

Design, in the mind of "agile" 20-somethings, looks an awful lot like "it growed like Topsy" COBOL applications from, you guessed it, the 1960s.
- carpenters: measure twice, cut once
- doctors: an ounce of prevention is worth a pound of cure
- the 20-somethings: we ain't got no idea (and it's too complicated for us to know yet, if ever) and never really will, so let's code; we can always bend the bytes to look how we want tomorrow and tomorrow and ...

Clearly, thinking first isn't part of the ethos of lemming coders.

The core problem with companies like Gild (others are mentioned in the piece) is that "the data" doesn't really drive the decision:
Dr. Ming's answer to what she calls "so much wasted talent" is to build machines that try to eliminate human bias. It's not that traditional pedigrees should be ignored, just balanced with what she considers more sophisticated measures. In all, Gild's algorithm crunches thousands of bits of information in calculating around 300 larger variables about an individual: the sites where a person hangs out; the types of language, positive or negative, that he or she uses to describe technology of various kinds; self-reported skills on LinkedIn; the projects a person has worked on, and for how long; and, yes, where he or she went to school, in what major, and how that school was ranked that year by U.S. News & World Report.

"Let's put everything in and let the data speak for itself," Dr. Ming said of the algorithms she is now building for Gild.

Should the Dr. Mings of the world be deciding whether code (JavaScript, et al.) or an RDBMS should be the basis of the platform? I think not. The "software problem" has gotten worse, in lockstep over the last few decades, with the usurpation of technical decision-making from experienced techies by 20-something "entrepreneurs" and "developers". If you insist that you must have a Black Swan (because your self-image tells you that you, too, are Just So Special), then you'll not be satisfied until you find what you believe is a Black Swan.

At no point in the piece is any mention made of the data analysis. The implication (or, my inference) is that Dr. Ming is recreating SAS/SPSS/R functions in some Ruby code. Such a bloody waste of time and money, born (in the best case) of ignorance or (in the worst case) of arrogance. Postgres isn't my favorite RDBMS, but this is a best-case scenario for Postgres with R via PL/R. With the $115,000 of one coder, they could put together an inferential/database engine in short order. But doing so short-circuits the bureaucratic need of "mo bodies, mo money". I've worked at start-ups, and the need for organizational power ("I gots mo asses in seats than you do") starts with the third hire. Denying the reality doesn't make it go away. Failure starts early.

As usual these days, Gild is answering the wrong question.
