Multi-core Mania: A Rebuttal

02 April 2009

The Simple-Talk newsletter is a monthly e-zine that the folks over at Red Gate Software (makers of some pretty cool toys, including their ANTS Profiler, and recent inheritors of the Reflector utility legacy) produce, usually to good effect.

But this month carried with it an interesting editorial piece, which I reproduce in its entirety here:

When the market is slack, nothing succeeds better at tightening it up than promoting serial group-panic within the community. As an example of this, a wave of multi-core panic spread across the Internet about 18 months ago. IT organizations, it was said, urgently had to improve application performance by an order of magnitude in order to cope with rising demand. We wouldn't be able to meet that need because we were at the "end of the road" with regard to step changes in processor power and clock speed. Multi-core technology was the only sure route to improving the speed of applications but, unfortunately, our current "serial" programming techniques, and the limited multithreading capabilities of our programming languages and programmers, left us ill-equipped to exploit it. Multi-core mania gripped the industry.

However, the fever was surprisingly short-lived. Intel's "largest open-source effort ever" to provide a standard tool for writing multi-threaded code, caused little more than a ripple of interest. Various books, rushed out while the temperature soared, advocated the urgent need for new "multi-core-friendly" programming models, involving such things as "software pipelines". Interesting as they undoubtedly are, they sit stolidly on bookshelves, unread.

The truth is that it's simply not a big issue for the majority of people. Writing truly "concurrent" applications in languages such as C# is difficult, as you get very little help from the language. It means getting involved with low-level concurrency primitives, such as lock statements and so on.

Many programmers lack the skills to do this, but more pertinently lack the need. Increasingly, programmers work in a web environment. As long as these web applications are deployed to a load-balanced web farm, then page requests can be handled in parallel so all available cores will be used efficiently without the need for the programmer to be concerned with fine-grained parallelism.

Furthermore, the SQL Server engine behind these web applications is intrinsically "parallel", and can handle and use effectively about as many cores as you care to throw at it. SQL itself is a declarative rather than procedural language, so it is fundamentally concurrent.

A minority of programmers, for example games programmers or those who deal with "embarrassingly parallel" desktop applications such as Photoshop, do need to start working with the current tools and 'low-level' coding techniques that will allow them to exploit multi-core technology. Although currently perceived to be more of "academic" interest, concurrent languages such as Erlang, and concurrency techniques such as "software transactional memory", may yet prove to be significant.

For most programmers and for most web applications, however, the multi-core furore is a storm in a teacup; it's just not relevant. The web and database platforms already cope with concurrency requirements. We are already doing it.

My hope is that this newsletter, sent on April 1st, was intended to be a joke. Having said that, I can’t find any verbage in the email that suggests that it is, in which case, I have to treat it as a legitimate editorial.

And frankly, I think it’s all crap.

It's dangerously ostrichian in nature—it encourages developers to simply bury their heads in the sand and ignore the freight train that's coming their way. Permit me, if you will, a few minutes of your time, that I may be allowed to go through and demonstrate the reasons why I say this.

To begin ...

When the market is slack, nothing succeeds better at tightening it up than promoting serial group-panic within the community. As an example of this, a wave of multi-core panic spread across the Internet about 18 months ago. IT organizations, it was said, urgently had to improve application performance by an order of magnitude in order to cope with rising demand. [...] Multi-core mania gripped the industry.

Point of fact: The “panic” cited here didn’t start about 18 months ago, it started with Herb Sutter’s most excellent (and not only highly recommended but highly required) article, “The Free Lunch is Over: A Fundamental Turn Toward Concurrency in Software”, appeared in the pages of Dr. Dobb’s Journal in March of 2005. (Herb’s website notes that “a much briefer version under the title “The Concurrency Revolution” appeared in C/C++ User’s Journal” the previous month.) And the panic itself wasn’t rooted in the idea that we weren’t going to be able to cope with rising demand, but that multi-core CPUs, back then a rarity and reserved only for hardware systems in highly-specialized roles, were in fact becoming commonplace in servers, and worse, as they migrated into desktops, they would quickly a fact of life that every developer would need to face. Herb demonstrated this by pointing out that CPU speeds had taken an interesting change of pace in early 2003:

Around the beginning of 2003, [looking at the website Figure 1 graph] you’ll note a disturbing sharp turn in the previous trend toward ever-faster CPU clock speeds. I’ve added lines to show the limit trends in maximum clock speed; instead of continuing on the previous path, as indicated by the thin dotted line, there is a sharp flattening. It has become harder and harder to exploit higher clock speeds due to not just one but several physical issues, notably heat (too much of it and too hard to dissipate), power consumption (too high), and current leakage problems.

Joe Armstrong, creator of Erlang, noted in a presentation at QCon London 2007 that another of those physical limitations was the speed of light—that for the first time, CPU signal couldn't get from one end of the chip to the other in a single clock cycle.

Quick: What’s the clock speed on the CPU(s) in your current workstation? Are you running at 10GHz? On Intel chips, we reached 2GHz a long time ago (August 2001), and according to CPU trends before 2003, now in early 2005 we should have the first 10GHz Pentium-family chips.

Just to (re-)emphasize the point, here, now, in early 2009, we should be seeing the first 20 or 40 GHz processors, and clearly we’re still plodding along in the 2 – 3 GHz range. The "Quake Rule" (when asked about perf problems, tell your boss you'll need eighteen months to get a 2X improvement, then bury yourselves in a closet for 18 months playing Quake until the next gen of Intel hardware comes out) no longer works.

For the near-term future, meaning for the next few years, the performance gains in new chips will be fueled by three main approaches, only one of which is the same as in the past. The near-term future performance growth drivers are:

hyperthreading

multicore

cache

Hyperthreading is about running two or more threads in parallel inside a single CPU. Hyperthreaded CPUs are already available today, and they do allow some instructions to run in parallel. A limiting factor, however, is that although a hyper-threaded CPU has some extra hardware including extra registers, it still has just one cache, one integer math unit, one FPU, and in general just one each of most basic CPU features. Hyperthreading is sometimes cited as offering a 5% to 15% performance boost for reasonably well-written multi-threaded applications, or even as much as 40% under ideal conditions for carefully written multi-threaded applications. That’s good, but it’s hardly double, and it doesn’t help single-threaded applications.

Multicore is about running two or more actual CPUs on one chip. Some chips, including Sparc and PowerPC, have multicore versions available already. The initial Intel and AMD designs, both due in 2005, vary in their level of integration but are functionally similar. AMD’s seems to have some initial performance design advantages, such as better integration of support functions on the same die, whereas Intel’s initial entry basically just glues together two Xeons on a single die. The performance gains should initially be about the same as having a true dual-CPU system (only the system will be cheaper because the motherboard doesn’t have to have two sockets and associated “glue” chippery), which means something less than double the speed even in the ideal case, and just like today it will boost reasonably well-written multi-threaded applications. Not single-threaded ones.

Finally, on-die cache sizes can be expected to continue to grow, at least in the near term. Of these three areas, only this one will broadly benefit most existing applications. The continuing growth in on-die cache sizes is an incredibly important and highly applicable benefit for many applications, simply because space is speed. Accessing main memory is expensive, and you really don’t want to touch RAM if you can help it. On today’s systems, a cache miss that goes out to main memory often costs 10 to 50 times as much getting the information from the cache; this, incidentally, continues to surprise people because we all think of memory as fast, and it is fast compared to disks and networks, but not compared to on-board cache which runs at faster speeds. If an application’s working set fits into cache, we’re golden, and if it doesn’t, we’re not. That is why increased cache sizes will save some existing applications and breathe life into them for a few more years without requiring significant redesign: As existing applications manipulate more and more data, and as they are incrementally updated to include more code for new features, performance-sensitive operations need to continue to fit into cache. As the Depression-era old-timers will be quick to remind you, “Cache is king.”

Herb’s article was a pretty serious wake-up call to programmers who hadn’t noticed the trend themselves. (Being one of those who hadn’t noticed, I remember reading his piece, looking at that graph, glancing at the open ad from Fry’s Electronics sitting on the dining room table next to me, and saying to myself, “Holy sh*t, he’s right!”.) Does that qualify it as a “mania”? Perhaps if you’re trying to pooh-pooh the concern, sure. But if you’re a developer who’s wondering where you’re going to get the processing power to address the ever-expanding list of features your users want, something Herb points out as a basic fact of life in the software development world ...

There’s an interesting phenomenon that’s known as “Andy giveth, and Bill taketh away.” No matter how fast processors get, software consistently finds new ways to eat up the extra speed. Make a CPU ten times as fast, and software will usually find ten times as much to do (or, in some cases, will feel at liberty to do it ten times less efficiently).

... then eking out the best performance from an application is going to remain at the top of the priority list. Users are classic consumers: they will always want more and more for the same money as before. Ignore this truth of software (actually, of basic microeconomics) at your peril.

To get back to the editorial, we next come to ...

However, the fever was surprisingly short-lived. Intel's "largest open-source effort ever" to provide a standard tool for writing multi-threaded code, caused little more than a ripple of interest. Various books, rushed out while the temperature soared, advocated the urgent need for new "multi-core-friendly" programming models, involving such things as "software pipelines". Interesting as they undoubtedly are, they sit stolidly on bookshelves, unread.

Wow. Talk about your pretty aggressive accusation without any supporting evidence or citation whatsoever.

Intel's not big into the open-source space, so it doesn't take much for an open-source project from them to be their "largest open-source effort ever". (What, they're going to open-source the schematics for the Intel chipline? Who could read them even if they did? Who would offer up a patch? What good would it do?) The fact that Intel made the software available in the first place meant that they knew the hurdle that had yet to be overcome, and wanted to aid developers in overcoming it. They're members of the OpenMP group for the same reason.

Rogue Wave's software pipelines programming model is another case where real benefits have accrued, backed by case studies. (Disclaimer: I know this because I ghost-wrote an article for them on their Software Pipelines implementation.) Let's not knock something that's actually delivered value. Pipelines aren't going to be the solution to every problem, granted, but they're a useful way of structuring a design, one that's curiously similar to what I see in functional programming languages.

But simply defending Intel's generosity or the validity of an alternative programming model doesn't support the idea that concurrency is still a hot topic. No, for that, I need real evidence, something with actual concrete numbers and verifiable fact to it.

Thus, I point to Brian Goetz’s Java Concurrency in Practice, one of those “books, rushed out while the temperature soared”, which also turned out to be the best-selling book at Java One 2007, and the second-best-selling book (behind only Joshua Bloch’s unbelievably good Effective Java (2nd Ed) ) at Java One 2008. Clearly, yes, bestselling concurrency books are just a myth, alongside the magical device that will receive messages from all over the world and play them into your brain (by way of your ears) on demand, or the magical silver bird that can wing its way through the air with no visible means of support as it does so. Myths, clearly, all of them.

To continue...

The truth is that it's simply not a big issue for the majority of people. Writing truly "concurrent" applications in languages such as C# is difficult, as you get very little help from the language. It means getting involved with low-level concurrency primitives, such as lock statements and so on.

Many programmers lack the skills to do this, but more pertinently lack the need. Increasingly, programmers work in a web environment. As long as these web applications are deployed to a load-balanced web farm, then page requests can be handled in parallel so all available cores will be used efficiently without the need for the programmer to be concerned with fine-grained parallelism.

He’s right when he says you get very little help from the language, be it C# or Java or C++. And getting involved with low-level concurrency primitives is clearly not in anybody’s best interests, particularly if you’re not a concurrency guru like Brian. (And let’s be honest, even low-level concurrency gurus like Brian, or Joe Duffy, who wrote Concurrent Programming on Windows, or Mike Woodring, who co-authored Win32 Multithreaded Programming, have better things to do.) But to say that they “pertinently lack the need” is a rather impertinent statement. “As long as these web applications are deployed to a load-balanced web farm", which is very likely to continue to happen, “then page requests can be handled in parallel so all available cores will be used …”

Um... excuse me?

Didn’t you just say that programmers didn’t need to learn concurrency constructs? It would strike me that if their page requests are being handled in parallel that they have to learn how to write code that won’t break when it’s accessed in parallel or lead to data-corruption problems or race conditions when their pages are accessed in parallel. If parallelism is a fundamental part of the Web, don’t you think it’s important for them to learn how to write programs that can behave correctly in parallel?

Look for just a moment at the average web application: if data is stored in a per-user collection, and two simultaneous requests come in from a given user (perhaps because the page has AJAX requests being generated by the user on the page, or perhaps because there’s a frameset that’s generating requests for each sub-frame, or ...), what happens if the code is written to read a value from the session, increment it, and store it back? ASP.NET can save you here, a little, in that it used to establish a per-user lock on the entirety of the page request (I don’t know if it still does this—I really have lost any desire to build web apps ever again), but that essentially puts an artificial throttle on the scalability of your system, and makes the end-users’ experience that much slower. Load-balancer going to spray the request all over the farm? So long as the user session state is stored on every machine in the farm, that’ll work... But of course if you store the user’s state in the SQL instance behind each of those machines on the farm, then you take the performance hit of an extra network round-trip (at which point we’re back to concurrency in the database) ...

... all because the programmer couldn’t figure out how to make “lock” work? This is progress?

The Java Servlet specification specifically backed away from this "lock on every request" approach because of the performance implications. I heard a fair amount of wailing and gnashing during the early ASP.NET days over this. I heard the ASP.NET dev team say they made their decision because the average developer can't figure out concurrency correctly anyway.

And, by the way folks, this editorial completely ignores XML services. I guess "real" applications don't write services much, either.

The next part is even better:

Furthermore, the SQL Server engine behind these web applications is intrinsically "parallel", and can handle and use effectively about as many cores as you care to throw at it. SQL itself is a declarative rather than procedural language, so it is fundamentally concurrent.

True… and false. SQL is fundamentally “parallel” (largely because SQL is a non-strict functional language, not just a “declarative” one), but T-SQL isn’t. And how many developers actually know where the line is drawn between SQL and T-SQL? More importantly, though, how many effective applications can be written with a complete ignorance of the underlying locking model? Why do DBAs spend hours tuning the database’s physical constructs, establishing where isolation levels can be turned down, establishing where the scope of a transaction is too large, putting in indexed columns where necessary, and figuring out where page, row, or table locking will be most efficient? Because despite the view that a relational database presents, these queries are being executed in parallel, and if a developer wants to avoid writing an application that requires a new server for each and every new user added to the system, they need to learn how to maximize their use of the database’s parallelism. So even if the language is "fundamentally concurrent" and can thus be relied upon to do the right thing on behalf of the developer, the implementation isn't, and needs to be understood in order to be implemented efficiently.

He finishes:

For most programmers and for most web applications, however, the multi-core furore is a storm in a teacup; it's just not relevant. The web and database platforms already cope with concurrency requirements. We are already doing it.

This is one of those times I wish I had a time machine handy—I'd love to step forward five years, have a look around, then come back and report the findings. I'm tempted to close with the challenge to just let’s come back in five years and see what the programming language landscape and hardware landscape looks like. But that's too easy an "out", and frankly, doesn't do much to really instill confidence, in my opinion.

To ignore the developers building "rich" applications (be they being done in Flex/Flash, Cocoa/iPhone, WinForms, Swing, WPF, or what-have-you) is to also ignore a relatively large segment of the market. Not every application is being built on the web and is backed by a relational database—to simply brush those off and not even consider them as part of the editorial reveals a dangerous bias on the editor's part. And those applications aren't hosted in an "intrinsically 'parallel'" container that developers can just bury their head inside.

Like it or not, folks, the path forward isn't one that you get to choose. Intel, AMD, and other chip manufacturers have already made that clear. They're not going to abandon the multicore approach now, not when doing so would mean trying to wrestle with so many problems (including trying to change the speed of light) that simply aren't there when using a multicore foundation. That isn't up for debate anymore. Multicore has won for the forseeable future. And, as a result, multicore is going to be a fact of the developer's life for the forseeable future. Concurrency is thus also a fact of the developer's life for the forseeable future.

The web and database platforms “cope” with concurrency requirements by either making "one-size-fits-all" decisions that almost always end up being the wrong decision for high-scale systems (but I'm sure your new startup-based idea, like a system that allows people to push "micro-entries" of no more than 140 characters in length to a publicly-trackable feed would never actually take off and start carrying millions and millions of messages every day, right?), or by punting entirely and forcing developers to dig deeper beneath the covers to see the concurrency there. So if you're happy with your applications running no faster than 2GHz for the rest of the forseeable future, then sure, you don't need to worry about learning concurrency-friendly kinds of programming techniques. Bear in mind, by the way, that this essentially locks you in to small-scale, web-plus-database systems for the forseeable future, and clearly nothing with any sort of CPU intensiveness to it whatsoever. Be happy in your niche, and wave to the other COBOL programmers who made the same decision.

This is a leaky abstraction, full stop, end of story. Anyone who tells you otherwise is either trolling for hits, trying to sell you something, or striving to persuade developers that ignorance isn't such a bad place to be.

All you ignorant developers, this is the phrase you will be forced to learn before you start your next job: "Would you like fries with that?"

Tags: languages scala clr csharp fsharp functional programming reading csharp fsharp ruby scala

Multi-core Mania: A Rebuttal

In which I rebut an editorial in the Simple-Talk newsletter.