17 February 2014

I Have a Code in My Node

Humpty Dumpty sat on a wall,
Humpty Dumpty had a great fall.
All the king's horses and all the king's men
Couldn't put Humpty together again.

Now, consider what the results might be if Humpty's wall were set by the Yellow Brick Road that we've been traveling. You remember, the one to Oz where the RM (and its rococo manifestation, the SQL database) vanquishes all of its historic (and neo-) pretenders? If we anthropomorphize Intel (and, face it, all semi makers) as Humpty, what do we see? Well, the fall is the delay in getting 14nm parts out the door; and, a bit less so, the absence of any move to 450mm wafers (300mm production began as far back as 2001). The innterTubes blogosphere is awash in argument: it's an unplanned, unexpected, unadmitted failure to execute on promised delivery; or it's just a pause by design, much like the built-in holds in a Cape Canaveral launch countdown.

This growing discussion of where we are and where we're going in the realm of VVVVVVLSI (don't go to Wikipedia, I made that up) does matter to our trip down the Yellow Brick Road. If we're entering a period of technological stasis, perhaps prolonged if not permanent, how will computing be impacted? The meme that Moore's Law will hold true forever and ever, Amen, is false. In the near term, we can see that the line in the sand betwixt the Newton world and the Heisenberg world is a few short steps away. For myself, I don't see the cpu living in a quantum world. Can some other element be fabricated below that line and still yield a deterministic cpu? Carbon? May be. May be not.

Of equal import: is there any unmet demand for cycles? Is there (or will there soon be) an equivalent to the Wintel duopoly, where Intel needed a cycle hog to generate demand for the next chip and Microsoft needed a source of more cycles to keep its piggy software running?

The notion that Intel/AMD/ARM face unmet demand for cycles, satisfiable only with smaller feature size (node) and greater transistor budgets, may well now be a wishful myth. Clearly, the halcyon days of the Intel/Microsoft Pentium/Windoze&Office symbiotic duopoly are way in the rearview mirror. The iPhone hasn't turned out to be the interminable cycle hog that Office was, alas (bandwidth, well, yeah). M$ could graft ever more esoteric editing/publishing functions onto Word (99.44% of which users never used, of course) and add applications to the Office umbrella, all the while writing to the next Intel cpu; as Bill said (well, may be), "if Windows is slow, we'll let the hardware fix it". Smartphone vendors haven't had such a clear field.
Even so, running Windows on a PC with 512K of memory is akin to pouring molasses in the Arctic. And the more windows you activate, the more sluggishly it performs.

There's the camera (which has what to do with making phone calls?) and innterTube games (ditto) and such. But organic expansion of demand through the device itself? Not so much. Is there some other device/platform the semi makers can leverage in a similar way? That's the real elephant in the room. We may be running out of potable water, but we have a surfeit of transistors, and no thirst left to slake. As with economics generally, supply doesn't create its own demand; rather, it is subservient to it. American auto companies, for decades, ran on planned obsolescence. And it worked, until it didn't. That approach has stopped working in the PC world, and looks to be staggering in the smartphone world, too. Lots of content XP users. Smartphone users, too?

To date, transistor budgets have been put to use by cramming existing functions onto fewer chips (just one, in the SoC case). For a given wafer size, a node shrink is valuable for one of two reasons (both, in the best-case scenario): the increased budget per unit area supports a better implementation of the cpu/whatever, or, with the per-chip budget held static, the shrink yields more chips per wafer. In the former case, value redounds to the semi maker through improvements in end user devices. In the latter case, more devices are shipped to end users; i.e. there's unmet consumer demand for the devices. The former has been played as a zero-sum game for the semi makers; in truth nothing new, just M&A at nm scale. The latter looks to be shaky. PC sales continue to spiral down. Apple refuses to expand its target market. Samsung continues to ship more devices. Only those two appear to make any moolah shipping phones/tablets. Stasis and a zero-sum game can be considered two sides of the same coin.
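
A back-of-envelope sketch of that second case, with illustrative numbers rather than any vendor's actuals: a full node shrink scales linear features by roughly 0.7x, so a fixed design's die area A goes to

    A' \approx (0.7)^2 \cdot A \approx 0.5\,A

and, ignoring edge loss and defect density, candidate dice per wafer of radius R roughly double:

    N' \approx \frac{\pi R^2}{A'} \approx 2 \cdot \frac{\pi R^2}{A}

That doubling only turns into moolah, of course, if somebody out there wants twice as many chips.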

Even apart from the SoC fork in the road, the cpu core specifically has used a dramatically decreasing proportion of the transistor budget over the last decade. Makes one, me in particular, wonder whether Intel made the wrong decision lo those many years ago: 1) use the growing budget to decode the X86 instruction stream onto a real RISC core underneath, or 2) build out the X86 ISA in pure silicon. They went with 1, as we all know.

One of the common notions: as feature size gets smaller, less power is used and less heat is generated. Not quite true.
Shrinking transistors boosted speeds, but engineers found that as they did so, they couldn't reduce the voltage across the devices to improve power consumption. So much current was being lost when the transistor was off that a strong voltage--applied on the drain to pull charge carriers through the channel--was needed to make sure the device switched as quickly as possible to avoid losing power in the switching process.
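
A first-order way to see the bind (the standard CMOS power model, nothing specific to any one fab's process): total power is roughly

    P \approx \alpha C V^2 f + V \cdot I_{leak}

where the first term is switching power and the second is leakage. Classic Dennard scaling shrank C and V along with the transistors, so power density held steady while f climbed. Once V can no longer drop (pushing the threshold voltage down makes I_leak blow up, as the quote describes), the V^2 f term stops shrinking and the leakage term grows; each new node then delivers more transistors than there is power budget to switch them.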

One take on the issue.
With Dennard scaling gone and the benefits of new nodes shrinking every generation, the impetus to actually pay the huge costs required to build at the next node are just too small to justify the cost. It might be possible to build sub-5nm chips, but the expense and degree of duplication at key areas to ensure proper circuit functionality are going to nuke any potential benefits.

This related piece has the graph that tells all.
This expanded version of Moore's law held true into the mid-2000s, at which point the power consumption and clock speed improvements collapsed. The problem at 90nm was that transistor gates became too thin to prevent current from leaking out into the substrate.
...
The more cores per die, the lower the chip's overall clock speed. This leaves the CPU ever more reliant on parallelism to extract acceptable performance. AMD isn't the only company to run into this problem; Oracle's new T4 processor is the first Niagara-class chip to focus on improving single-thread performance rather than pushing up the total number of threads per CPU.
...
The fact that transistor density continues to scale while power consumption and clock speed do not has given rise to a new term: dark silicon. It refers to the percentage of silicon on a processor that can't be powered up simultaneously without breaching the chip's TDP.
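
The arithmetic behind the term is blunt (illustrative numbers, not any shipping part's spec sheet): if lighting up every transistor at full clock would dissipate, say, 300W on a die with a 100W TDP, then

    \text{dark fraction} \approx 1 - \frac{TDP}{P_{all\ on}} = 1 - \frac{100}{300} \approx 67\%

of the silicon has to sit dark, or be clocked way down, at any given instant.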

There just aren't that many embarrassingly parallel problems. Well, there is one....
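
And for the problems that aren't embarrassingly parallel, Amdahl's law (the standard formula; the 0.9 below is just an illustration) caps what a sea of cores can buy. If a fraction p of a job parallelizes and the rest is serial, the speedup on n cores is

    S(n) = \frac{1}{(1 - p) + p/n}, \qquad \lim_{n \to \infty} S(n) = \frac{1}{1 - p}

so even with p = 0.9, an infinite number of cores tops out at 10x.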

Let's continue with this very recent piece.
In 1987, 20 of the top 20 semiconductor companies owned their own leading-edge fabs. In 2010, just seven did. As the cost of moving to new processes skyrockets, and the benefits shrink, there's going to come a time when even Intel looks at the potential value and says, "Nah."

"the benefits shrink"? No wintel monopoly. Sniff.

So, in the end, if we can see both limits, on demand for computing power and on supply of computing power, is this a Good Thing or a Bad Thing? Are we slipping and sliding on Humpty's yellow guts, unable to progress? Or did we stroll past him before he knockered his noggin? Well, in 1969, when Codd wrote his first paper, computing was in a period of stasis. Relatively speaking. IBM was on the verge of knocking off the Seven Dwarfs, with the 360 machines victorious. That ISA is still around today, and only recently expanded beyond 31 bits (no, not a typo). Could be, the bird has now come back home to roost. The RM was invented to solve the maintenance and performance problems of a known present, and future, computing platform: the 360 mainframe.

If I, and these pundits, are right, then the future of computing rests on a static compute power base, much as it did in 1969 with the 360. Only more so now. How, then, to make the most of that cpu (or cpus)? Embarrassingly parallel problems become the preferred approach. Some problems are, some aren't. Here's where the power of the RM comes into play. With flatfile datastores, by whatever name, performance will be limited both by byte-bloat and by RBAR (row by agonizing row) throughput. Unlike when Bill was wrestling with Windows, we won't be able to code to the next generation of chip. The RM yields the minimum data footprint and inherently parallel processing; with SSDs and lots of DRAM (a simpler structure), it should become the new norm. So to speak.
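
To make the RBAR point concrete, here's a minimal T-SQL sketch, using a hypothetical orders(amount) table, of the same aggregate done row by agonizing row with a cursor versus as one set-based statement the engine is free to run in parallel:

-- RBAR: procedural code drags each row through the engine one at a time,
-- so throughput is bounded by the single-threaded loop, not the hardware.
DECLARE @amount decimal(12,2), @total decimal(18,2) = 0;
DECLARE order_cur CURSOR FOR
    SELECT amount FROM orders;
OPEN order_cur;
FETCH NEXT FROM order_cur INTO @amount;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @total = @total + @amount;   -- one row per trip around the loop
    FETCH NEXT FROM order_cur INTO @amount;
END
CLOSE order_cur;
DEALLOCATE order_cur;
SELECT @total AS total;

-- Set-based: one declarative statement; the optimizer can scan and
-- aggregate across however many cores the box has, no new chip required.
SELECT SUM(amount) AS total FROM orders;

The cursor spells out the how and so is stuck with it; the set-based version states the what and leaves the how, parallelism included, to the engine. That's the RM's bet in a world of static clocks and plentiful cores.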

1 comment:

Anonymous said...

So hey there Robert! With the primacy of the relational model and finally getting rid of all that COBOL-era procedural code (YUK), I'm delighted that SQL has finally taken over. However, I'm having a hard time composing an SQL statement to play this movie I downloaded; can you help out here? Much appreciated, here's looking for that movie player screen to pop up in SSMS where it rightly belongs!