23 February 2012

Your Honor, the Prosecution Rests [UPDATE]

Recent essays here have been on the subject of a disruptive transformation of interTubes computing, specifically the notion that bandwidth will expand enough that "thin client" computing stages its deserved resurgence. Nothing is better than synchronous access. Entropy is bad, and asynchrony is high entropy.

Imagine my surprise, then, whilst sipping my Panera dark roast and reading my dead-trees Times, to see this. I surely don't condone using any tablet to run Office (wrong input paradigm, by far), but the anorexic nature of the implementation says it all.

Meaningful quotes:

"The secret is that OnLive isn't sending you all of the data from your Web browsing session. It's sending you only a video stream the size of your iPad screen."

That's precisely what happens with a VT-X00 on the human end of RS-232. Remember that.

"OnLive (free) and OnLive Plus ($5 a month) are both brilliantly executed steps forward into the long-promised world of 'thin client' computing, in which we can use cheap, low-powered computers to run programs that live online."

With database-driven applications centrally located, humans need only hold a "terminal" to get stuff done. All we need now is a plentiful supply of Mr. Fusion engines.


[UPDATE]
Well, seems like the kiddies in Redmond aren't entirely brain-dead. They've figured out that a touch-based device needs touch-based software. Here's the story.


15 February 2012

A Plan for All Seasons

Well, here we go again. The NY Fed's "Empire Manufacturing" survey results were released today, and were good. These are, of course, the seasonally adjusted numbers. Sound familiar? Well, I emailed the NY Fed (yes, one can do that) to ask how they do seasonal adjustment; specifically, whether the weights are re-calculated each period based on the seasonal factors experienced during that period: weather, events, period length, etc. The answer I got back was to review the methodology, which they publish in some level of detail. I had planned to do that anyway, but a simple "yes" or "no" would have sufficed to answer my question.

Although I didn't get a simple answer, I did get a couple of links to Financial Times discussions of why seasonal adjustment may be a bit wacky these days. For those who don't follow links, the conclusion is that the Fed has adjusted the adjustments (that's a job I'd like: data chiropractor). For those who might not know: seasonal adjustment isn't, so far as I can find, done by measuring the period's actual seasonal factors (weather, holidays, business days in the period, etc.), then calculating that period's seasonal weights. Rather, standard practice, since I was in school and long before, is to wave a statistical wand over recent past periods and coax out seasonality. The links deal with the possible bias this creates: the Great Recession happened during autumn and winter (Northern Hemisphere), and the standard algorithms may have bled some part of the Great Recession into seasonality. Using such weights now will boost the unadjusted data above where a longer-term seasonal adjustment would put it, or where one not using Great Recession data at all would. (As an aside: having just gotten a bit further into the second link, I find this: "The second approach is to excise the financial crisis period (specifically, omitting one year of data beginning just before the Lehman bankruptcy) and estimate seasonal factors using this series." Great minds run in the same gutter.) The point being: seasonal adjustment is still fixed weights derived from past data.
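For the code-minded, a toy sketch of that fixed-weights practice. This is a crude ratio-to-overall-mean calculation, emphatically not the Census Bureau's X-12-ARIMA, and every number is invented; it just shows the mechanics by which recession-depressed past data leaks into the seasonal factors and inflates today's adjusted number:

    // Toy fixed-weight seasonal adjustment (Java). NOT X-12-ARIMA;
    // all data invented. The weights come from past periods, not from
    // measuring this period's weather, holidays, or business days.
    public class ToySeasonal {
        public static void main(String[] args) {
            // three past years of a monthly index: rows = years, cols = Jan..Dec
            double[][] past = {
                {90, 92, 100, 105, 110, 112, 108, 107, 103, 101, 96, 88},
                {91, 93, 102, 106, 111, 114, 110, 108, 104, 102, 97, 89},
                {89, 91, 101, 104, 109, 113, 109, 106, 102, 100, 95, 87},
            };

            // grand mean over all 36 observations
            double total = 0;
            for (double[] year : past) {
                for (double v : year) {
                    total += v;
                }
            }
            double grandMean = total / 36.0;

            // fixed seasonal factor per month: month mean / grand mean
            double[] factor = new double[12];
            for (int m = 0; m < 12; m++) {
                double sum = 0;
                for (double[] year : past) {
                    sum += year[m];
                }
                factor[m] = (sum / past.length) / grandMean;
            }

            // "adjust" a fresh January reading by dividing out January's
            // factor. If a recession depressed the past Januaries, the
            // factor is too small, and the adjusted number pops out too
            // big. That's the upward bias in a nutshell.
            double rawJanuary = 95.0;
            System.out.printf("factor=%.3f raw=%.1f adjusted=%.1f%n",
                    factor[0], rawJanuary, rawJanuary / factor[0]);
        }
    }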

For those interested in what BLS is up to, here's the document.

Or, to quote from the second link, below:
"These biases exist because the computational techniques used to seasonally adjust economic data inappropriately interpreted some of the downturn in the fourth quarter of 2008 as a new seasonal trend."

So, we have two possible upward biases.

Here:
http://ftalphaville.ft.com/blog/2011/12/20/806221/tis-the-seasonality-hold-the-jolly/
http://ftalphaville.ft.com/blog/2012/01/04/817881/

Even if you're not an economist or other data life form, it makes for interesting reading. Interesting in the Chinese curse sense, that is.

09 February 2012

R We to Delphi Yet?

Well, thanks to R-bloggers, I found this Oracle page/blog.

The money quote:
"Oracle R Enterprise lifts this memory and computational constraint found in R today by executing requested R calculations on data in the database, using the database itself as the computational engine."

Très intéressant. Now, that sounds like they're saying this will be PL/R for Oracle. Not that I'm in the market for Oracle Enterprise, of course. If they make it available for the free-as-in-beer version of Oracle, that I'd be interested in. (Miss Morris would be so bent by my ending a sentence a preposition with.)

08 February 2012

Camel Passes Through Eye of Needle!

The Brave New World I've been talking about makes high normal form, database-centric applications all the more appropriate. I know of at least two extant applications which are implemented this way. That said, the issue remains: what will the interTubes infrastructure look like in two years and beyond, and what sort of RDBMS architecture best exploits it? It's worth noting that there is precedent: Bill Gates got very rich by targeting software to not-yet-available hardware. Now, some comments on some comments.


[Chris]: [Robert asserts] threads are more efficient, processor-power-wise to spin up than processes, and therefore if you insist on a single-threaded multi-process model, you can't have sufficient performance to make your database system relevant.

No, that's not exactly what I said. The issue is the cycles lost as one moves down to a single thread; it's more about the number of clients that can be supported "simultaneously". At one time I was also enamoured of using thread count to execute queries in parallel; not so much now. For reporting generally, and BI/DW specifically, sure. But for OLTP, where my interest lies, not so much. There, client count is what matters. Clients run relatively discrete transactions on few rows; parallelism isn't much help for that sort of client. Yes, to some extent I've changed my mind.

Think of the issue this way: if it were true that scaling down to one thread per processor/core yielded an equivalent scale-up in performance (kill one of two threads and the surviving thread runs twice as fast), then the question would be more difficult. But threads and cores don't scale that way. Such reverse scaling could only happen if the core's frequency increased as the inverse of active threads. That just doesn't happen; you get a bin or two, and that nasty frequency brick wall is the reason vendors turned to multi-core and threads in the first place.
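To hang hypothetical numbers on that (invented, but in the neighborhood of current Intel parts): take a 3.0 GHz core running two hardware threads, where the second thread adds, say, 25% throughput on a mixed workload (figures in the 15% to 30% range are commonly quoted for Hyper-Threading). Park one thread, and turbo hands the survivor a single bin, 133 MHz, about 4.4% more cycles. You've traded roughly 25 units of throughput for 4.4 back; true reverse scaling would require the core to jump to 3.75 GHz, and no bin scheme comes anywhere close.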

It's a matter of relative performance, not absolute. If the PG developers decide not to implement a threaded engine (some or all sub-systems), PG will inevitably be viewed as not worth the cost. I never said that this would happen tomorrow. Threading isn't a fad, but a hardware evolution; as the quote from Intel states, that's where they intend to go. Countering that is the massively parallel single-thread/core machine, such as the Transputer or Thinking Machines' Connection Machine, or possibly an armada of ARM cores on a chip (ARM has been reported to be adding threaded cpus to its menu). Whether a *nix could better manage such a machine (as opposed to a threaded one) is up in the air; experience from the 1980's, when parallel machines were all the rage, suggests not (Amdahl's Law put an end to them, and is still in effect). Some feel that the ARM armada will take down Intel. Way too early to bet on that.

Couple that with the rapidly evolving web infrastructure, and we have a Brave New World where the interTubes is virtual RS-232 passed over virtual Cat-5; what's on the client end of the wire is just a terminal, prettily pixelated but nevertheless only a terminal. That evolution matters to how a RDBMS (and its applications) is optimally architected.

Intel needed MicroSoft to bloat up its software in order to create demand for ever higher cycle count chips; it's the Wintel Monopoly for a reason. Now that frequency is near a brick wall and clients are devolving from PCs to wrist computers and such, the likes of Intel can't count on Office to justify new chips (unless Steve's boys and girls can parallelize Office; I don't see good odds on that). Servers are pathologically multi-user, and therefore the logical vein to mine. That's what's happening, and it will continue.


[Chris]: Instead things like network connection/teardown, command parsing, and, well, actual work dwarf process/thread startup cycles by several orders of magnitude.

As I've said before, the main issue isn't process startup cycles versus thread startup cycles, but all those cycles lost by running single threaded, which could be used for actual work. All that network stuff gets kicked to the curb in the Brave New World anyway; that's a large measure of the point. And, as above, the issue isn't just "startup cycles", but total processing cycles doing useful work. Throw away all but one thread, and you throw away nearly (threads - 1 - (bin bump))/threads of the useful work. See this testing. That's the issue. There Ain't No Such Thing As A Free Lunch, and the cpu is about the most significant example.

Which is precisely why client multi-plexing is superior; with a threaded engine, you get twice (or more) the active clients without the falderal. The comment on connection pooling raises the same issue: auditability and transaction integrity, if nothing else, since you end up having to re-implement all that manually, if at all. While connection pools were initially at the database end, they are now most commonly in the web server, motivated by the (un)connectedness of HTTP. Web server writers face the same issue as database server writers. When I was building servlet/applet thingees in the late 1990's and early 2000's, we had to build quick and dirty poolers ourselves. By about 2002, databases and web servers had integrated the support. Had to happen. Here's an example for Tomcat. Folks can, and do, argue about which piece of software should own the connection pool; I'll avoid that for now.
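For flavor, here's roughly what that integrated support looks like from the Java side in a current Tomcat; a minimal sketch, assuming a <Resource> named jdbc/mydb (that name, and the clients table, are hypothetical) has been declared in context.xml:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import javax.naming.InitialContext;
    import javax.sql.DataSource;

    public class PoolDemo {
        // runs inside the container; Tomcat exposes the pooled
        // DataSource through JNDI under java:comp/env
        static void lookupAndQuery() throws Exception {
            InitialContext ctx = new InitialContext();
            DataSource ds = (DataSource) ctx.lookup("java:comp/env/jdbc/mydb");

            // close() returns the connection to the pool rather than
            // tearing it down -- no hand-rolled pooler required
            try (Connection conn = ds.getConnection();
                 PreparedStatement ps = conn.prepareStatement(
                         "SELECT name FROM clients WHERE id = ?")) {
                ps.setInt(1, 42);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1));
                    }
                }
            }
        }
    }

Compare that to the quick and dirty poolers of 1999: the application code neither opens nor tears down a real connection; the container owns the plumbing.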

What's missing from the comments is discussion of my main point: with a threaded engine working on threaded cpu's (likely with higher thread counts in the future), high bandwidth connections, and client multi-plexing, one ends up with an RS-232-like environment. While I haven't explored it, there's an environment named Opa[1] (Wikipedia has a short article), which is said to include web server and database server all in one. That's just begging for a fully connected infrastructure. Ideally (and I haven't explored enough to know as I type), there would be a single "connection" from the client's perspective, with Opa taking care of managing the database/web server interaction in a way which preserves transaction semantics and client identity.

Unless you were building database applications before 1995 (or thereabouts), which is to say if you've built only webbie thingees, the efficiency and user friendliness of *nix/RDBMS/RS-232/terminal is a foreign concept; just as it still is for mainframe COBOL coders writing to 3270 terminals, whose semantics are the same as HTTP's (I'll grant that javascript is more capable than the 3270 edit language; and Wikipedia does it again, with this write-up). In the web age, AJAX was the first widely known (and possibly the absolute first) attempt to get back to that future; then there is Comet. Now we have WebSocket, discussed in earlier posts.

The Achilles heel of those *nix/RDBMS/RS-232/VT-X00 applications was the limit on the number of clients that could be accommodated, since each client's patch of server memory remained allocated for as long as the user kept a session. The emerging hardware/infrastructure that I'm discussing may, and should, remove that restriction. This also means that developers must heed the mantra: "the user doesn't determine transaction scope". With HTTP (un)connectedness, that isn't much of an issue.

With ever lighter clients, database-centric (as opposed to the coder's wet dream of client-centric) applications become far more appropriate. Databases which can support a higher client count on a given hardware platform (through threading, that is) will eventually clobber those that can't. That MySql, kind-of open source, has threaded engines will make it difficult for other open source databases that don't. The question will not be PG versus Oracle/DB2/SS but PG versus MySql. While that's true today, the performance and configuration differences will only get greater over time.

While I surely agree that writing a threaded server, of any kind, is far more difficult than letting the O/S take care of the details (by writing a process-based server), it's clear that the cpu makers are going to threads for performance. Think of this as a game of musical chairs.


[1] There is this internship listed on their website (typos theirs):

Adding support for relational databases in Opa. The Web developping platform Opa has so far concentrated on its own database engine as well as the connection to existing NoSQL databases. The goal of this ambitious internship is to create the fundations of a framework for connecting relational databases to Opa (starting with MySQL). Different levels of integration will be experimented and successively made available (roughly: an untyped low-level API, a dynamically typed API, and a statically typed API generated at compile time). This internship requires excellent skills in functional programming (Ocaml, F#, Haskell, Scala) and a very good knowledge of the SQL language and its concepts.

06 February 2012

Mr. Sandman, Bring Me a Dream

AnandTech has just published its review of the Intel 520, driven by a customized (their firmware) SandForce 2281. Lots of useful information. See, in particular, the very end of the review. The main point is whether this means the end of Intel SSD controllers. Busy day in SSD land.

I Do Thee Wed

In unity, there is strength; or something like that (some say it's from Aesop, others from The Bible). Well, Rambus just scarfed up Unity Semiconductor. This is an interesting development, viewed from at least two angles: 1) Rambus has a history of attempting to proprietarize memory technology, and 2) if CMOx works, NAND vendors are in trouble.

According to the various write-ups, CMOx has been in development at Unity for nine years. I first ran across it about five years ago, when the technology was supposedly just about ready to go. Guess that part didn't work out. Yes, this endeavor has mentioned Unity a few times over the years. With an aggressive new owner, we could see a major course correction in storage. An interesting point: the Unity website still has a page devoted to a JV with Micron to manufacture proof-of-concept (my inference) devices on Micron's fab. Wonder what's going to happen to that?

So far as point 1) goes, Rambus has been losing consistently over the last year or so in its attempts to enforce patents. Taking on the NAND vendors won't be a walk in the park. Micron, by the way, didn't settle and turn evidence against the other DRAM vendors; Samsung and others did. Since many of the patents in contention have just been ruled invalid, I wonder how they feel?

05 February 2012

I Hate You, Diana Ross!

Newt and Mitty have proved that Triage would be a waste of time and effort. I really wanted to build that thing, but SuperPAC money makes such an application irrelevant. SuperPACs are "prohibited" from co-ordinating with campaigns, and vice versa, thus rendering control impossible. Thanks a lot, Supremes.

03 February 2012

Damn You, Damocles!

The earlier thread-related post engendered some comments that pooh-poohed the importance of threads going forward. While the points were rational, in that not all of the engines mentioned are threaded in all subsystems on all platforms (and I never said they were), the fact remains that threading, in addition to coring (if that's a real word), is the architecture we'll live with so long as cpu's are silicon based. Lithography can only go so small, and power can only go so high, especially as feature size diminishes. I really don't think that notion is up for debate.

What can be debated is whether a single-threaded engine (all subsystems, that is) can keep up with threaded engines (some or all subsystems). The answer, in the limit, is no. Eventually, the fork in the road will split the performance paths just too widely to make even a "free" engine worth the money. Will that happen next year, or within five? Five seems more likely, except that we're working, whether it's recognized or not, in a different Moore world.

Moore's actual observation was that processors would double in productivity (measured in $$$) about every 24 months. He didn't predict anything about feature size, per se; he took into account that the Law derived from feature shrinkage, but from a financial point of view. Intel is leading the pack in implementing hardware threading across the board, no question, although not in thread count. It is equally unquestionable that clients, more and more phone-ish devices, are inherently limited in power, given their physical size limits. Given Amdahl's Law and that inherently limited power, the client isn't a solution to the database power situation. In other words, success going forward will rest with those who *do* go Back to the Future, but not by ignoring threading.

With bidirectional "small" clients (AJAX, WebSockets, et al) talking to servers a la *nix databases to VT-X00 terminals, it behooves us to look back at how those servers functioned, since they are the paradigm which leverages the current, and evolving, hardware. As interTubes communication behaves more and more like RS-232, there is a maximally efficient way to use it; jamming lots o' data on the line (and leaving logic hanging out there, too) isn't it. The network of tomorrow is much like the client/server-in-a-box environment of the first instance. One might argue, tee hee, that The Cloud is uber-centralized data and that clients are/will be relatively passive devices; very much like a VT-220 I fondly remember.

More to the point, with a client/server-in-a-box paradigm (which can now be achieved with WebSockets and such) we have a patch of memory for the server, and another patch for the client, with the screen/terminal/smartphone/whatever on the other end of a wire. The screen only has responsibility for display and input. With proper NF and DRI, there's very little active code in this server-resident client. Schweeeet. But, as the previous experience demonstrates (if you've actually been there), all edits are done, input box by input box, in real-time against the live datastore; the VT-X00s were referred to as "character mode" devices, as opposed to the "block mode" 3270 mainframe terminals and the (regular) HTTP clients of today, which re-connect to send a screen's worth of input. Schweeeeet. All the client does is paint the screen and ship the input back. Schweeeeeeeet. Coders?? We don't need no stinkin' (and do they ever) coders!!
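To make the character-mode point concrete, here's a minimal sketch in the style of the Java WebSocket standardization effort (javax.websocket annotations); the /entry path and the validate() helper are hypothetical stand-ins, not anyone's shipping code:

    import javax.websocket.OnMessage;
    import javax.websocket.OnOpen;
    import javax.websocket.Session;
    import javax.websocket.server.ServerEndpoint;

    @ServerEndpoint("/entry")
    public class EntryTerminal {

        @OnOpen
        public void open(Session session) {
            // the server-resident "client patch": one session per terminal,
            // serviced by the container on a pooled thread
            session.getUserProperties()
                   .put("form", new java.util.HashMap<String, String>());
        }

        @OnMessage
        public String onField(String message, Session session) {
            // one input box at a time arrives as "field=value",
            // character-mode style, over the persistent connection
            String[] kv = message.split("=", 2);
            if (kv.length != 2) return "ERR malformed";

            // validate() stands in for a real-time check against the
            // live datastore; on failure the client repaints one box
            if (!validate(kv[0], kv[1])) return "ERR " + kv[0];

            @SuppressWarnings("unchecked")
            java.util.Map<String, String> form =
                    (java.util.Map<String, String>)
                            session.getUserProperties().get("form");
            form.put(kv[0], kv[1]);
            return "OK " + kv[0];
        }

        private boolean validate(String field, String value) {
            // hypothetical hook to the database's edits/constraints
            return !value.isEmpty();
        }
    }

One persistent connection per client, every input box round-trips to the live datastore, and the browser just paints; no block-mode screenful submits.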

Each of those client patches can run on a thread. Give up on threads, and you give up on half your capacity (or more); it's not just about how much faster a thread switch is than a context switch. Here is a schematic (about half way down the page) of the i7; a core (cpu) is about 10% of the real estate, or transistor budget. In other words, while feature size has diminished over the years, pushing up the transistor budget, not much if any of that largess has gone to instruction set implementation. Current Intel chips don't even implement X86 instructions in the hardware at all (it appears that native X86 execution began to disappear with the P4; thanks to Andrew Binstock). So, we get more RISC cores/threads. Here's an Intel thought: "[T]he multithreading capability is more efficient than adding more processing cores to a microprocessor."

When asked in the 1970's why his machines were so much faster, Cray said, "shorter wires". That remains true; certainly cpu designers are quite aggressive in the pursuit, as they jam ever more features onto nanometer-long wires. Why software folks continue to think that longer wires are smarter is puzzling. Even if McNealy had been right that the network is the computer (for a local engineering net, perhaps; for the interTubes, not so much), there's still the problem of reconciling all those nodes.

While Intel cpu's are mostly two threads/core, others have up to eight.
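Those hardware threads are what the OS, and hence your runtime, sees as processors. A trivial Java check (availableProcessors() reports logical cpus, i.e. hardware threads, not physical cores):

    public class LogicalCpus {
        public static void main(String[] args) {
            // on a four-core, two-thread-per-core i7 this prints 8
            int logical = Runtime.getRuntime().availableProcessors();
            System.out.println("logical processors (hardware threads): " + logical);
        }
    }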

So, what margin of advantage is there to "turbo" mode at the thread level? Turns out, there is a bit of it, with the current Intel chips:
"When there are only two active threads, the Intel Core i7 will automatically ramp up two of its processing cores by one speed grade or bin (133 MHz). When only one thread is active, the Core i7 will ramp up the sole active processing core, not by one bin, but by two bins (266 MHz). However, if there are three or more active threads though, the Core i7 processor will not overclock any of its processing cores." Not a lot, compared to turbo at the core level. So, I still predict that ignoring thread support in the cpu is a losing proposition.

Here's a new posting testing threading in SQL Server. While not a runaway, in virtually all test cases threading improved performance. Now, SQL Server, in default mode, is a locker-based engine, while Postgres is MVCC, so the engine mechanics might have an effect. Given that MVCC databases promise no conflict between readers and writers, one should expect those sorts of databases to gain more from a threaded engine than a locker would; threads stay active.

While it doesn't explicitly compare Oracle on a process OS versus a thread OS, this paper does discuss the advantages, and methods of use, of parallelism and threads. Oracle and Postgres are both MVCC, although the implementations differ. And this is a Stanford paper discussing the Oracle Windows thread model. Making comparisons between Windows and linux (or any specific *nix) is difficult, in that what we want to know is whether a "good" thread implementation beats a "good" process implementation on an otherwise equivalent OS.

Cut to the chase. Since DB2 switched to a threaded engine from a process engine, there ought to be some evidence on the interTubes as to whether this was a Good Thing. This is a presentation by Serge (guru to DB2 weenies); see slides 6, 15, and 16. And this is the IBM justification. Another example: this is a paper by SAP for DB2/LUW on HP-UX (not the most popular *nix); note point 1 under "Operating System". IBM didn't move to a threaded model for yucks; they've been treating LUW as a red-headed stepchild for so long that any significant expense would require significant bang for the buck. Finally, at long last, a recent presentation on the new threaded DB2. Note slide 17.

We need two additional pieces to get the most out of the system: multiplexed clients, and fast swap so that those clients don't stall. Here's IBM's take on multiplexing; others can do likewise. The way to keep things running smoothly is Sacrificial Swap©, by which I mean using SSD as swap on the database and/or web server machine. Since SSDs have determinate lifetimes, more a cliff-like End Times than an HDD's decay slope, simply replace the swap drive periodically. Swap on SSD provides vastly better performance.
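The shape of client multiplexing, in a minimal Java sketch (pure illustration, not IBM's connection concentrator): many client sessions sharing a worker pool sized to the hardware threads:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class Multiplexer {
        public static void main(String[] args) {
            // engine side: a worker pool sized to the hardware threads
            int workers = Runtime.getRuntime().availableProcessors();
            ExecutorService engine = Executors.newFixedThreadPool(workers);

            // 10,000 "client sessions" multiplexed onto those workers;
            // each submits its short transaction and yields the thread
            for (int client = 0; client < 10_000; client++) {
                final int id = client;
                engine.submit(() -> {
                    // stand-in for a discrete, few-row OLTP transaction
                    System.out.println("client " + id + " committed");
                });
            }
            engine.shutdown();
        }
    }

The point of the shape: client count is bounded by memory patches, not by a thread (let alone a process) per connected user.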

On a related note, this is an IBM (mostly) paper on bufferpools (what DB2 calls them) and SSD. Of note, this isn't using SSD as primary datastore, but only in support of bufferpools.

My conclusion: threaded models have the lead, and they're not likely to lose it.

Lies, Damn Lies, and the BLS [updated]

That's the Bureau of Labor Statistics. The headline over the column in my dead-trees version of the NY Times reads: "Stagnant Job Growth is Expected in Report". Yet the number this morning was glorious. How does that happen? Why did the numbers come in so much better than expected? Were the numbers figured? Let's see whether we can find out.

First: these numbers are estimates from sample surveys. The only population number related to employment is the weekly UI filing number. Everything else is an estimate from some kind of survey.

Second: as every fourth grader knows, a percent is just a decimal fraction with the point shoved over two places to the right. And every fourth grader knows that a percent goes up whenever the numerator goes up or the denominator goes down. Or both.

This is real time, in the sense that I haven't yet looked at the numbers. I'm going to go out on a limb and say that, at the least, the denominator went down. That number is the estimate of the total labor force, and it has been declining through 2011 and into 2012 (here is the table, and this is the almost-raw data; the difference between the two being seasonal adjustment). Seasonal adjustment of economic data is still controversial.
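To illustrate with invented round numbers: 13.0 million unemployed out of a 154.0 million labor force is an 8.4% rate. Let half a million of the unemployed give up and drop out of the labor force: 12.5 million out of 153.5 million is 8.1%. The rate "improved" by three-tenths of a point without a single soul getting hired.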

Now, have a look at the next table. What we see is that the number not in the labor force is up, as is the number working multiple jobs.

Table A-15 is where the doubters gather. U-6 is the number cited by both Left Wing and Right Wing sympathizers as justification for either blaming all those lazy poor people who can't seem to stay put in Mitt's safety net, or blaming Mitt and Friends for slicing away at the safety net. The not-seasonally-adjusted number is the one which matters. And it's up a tad from December to January.

I rest my case.

[update]
Today's Times has a longer piece, complete with interviews, which I don't get to do from my drafty Frost Belt garret. As well, I see that I wasn't sufficiently explicit about seasonal adjustment. What I expect happened is that the adjusted numbers (the ones nearly always quoted) overstate the level of employment. One of the reasons to adjust the numbers is truly seasonal: weather is lousy here in the Frost Belt in January, and the purpose of adjustment is to level the playing field, month to month. Since the Frost Belt last month was more like October (the January "season" was largely absent), the quoted number got a double dose of boost: mild actual weather, plus an adjustment factor expecting the usual lousy January. I don't expect to see it continue, alas.