23 January 2008

The recent "failure" of the Chandler PIM project generated the question, "Can Dynamic Languages Scale?" on TheServerSide, and, as is all too typical these days, it turned into a "You suck"/"No you suck" flamefest between a couple of posters to the site.

I now make the perhaps vain attempt to address the question meaningfully.

What do you mean by "scale"?

There's an implicit problem with using the word "scale" here, in that we can think of a language scaling in one of two very different (and largely orthogonal) directions:

1. Scaling to larger and more complex projects: can a team keep building and maintaining the codebase as it grows?
2. Scaling to heavier load: can systems built in the language keep up with growing numbers of users and requests at peak times?

Part of the problem on the TSS thread, I think, is that the posters never clearly delineate the difference between these two. Assembly language can scale(2), but it can't really scale(1) very well. Most people believe that C scales(2) well, but doesn't scale(1) well. C++ scores better on scale(1), and usually does well on scale(2), but you get into all that icky memory-management stuff. (Unless, of course, you're using the Boehm GC implementation, but that's another topic entirely.)

Scale(1) is a measurement of a language's ability to extend or enhance the complexity budget of a project. For those who've not heard the term, "complexity budget" is a phrase I first heard from Mike Clark, he of Pragmatic Project Automation fame (though I can't find a citation for it via Google--if anybody's got one, holler and I'll slip it in here), and it's essentially a statement that says, "Humans can only deal with a fixed amount of complexity in their heads. Therefore, every project has a fixed complexity budget, and the more you spend on infrastructure and tools, the less you have to spend on the actual business logic." In many ways, this is a reflection of the ability of a language or tool to raise the level of abstraction: when projects began to exceed the abstraction level of assembly, for example, we moved to higher-level languages like C to hide some of that complexity and let us spend more of the project's complexity budget on the program itself, rather than on figuring out which register had to hold which value before invoking an interrupt. The same argument shows up in the case against EJB in favor of Spring: too much of the complexity budget was spent getting the details of the EJB beans correct, and Spring reduced that amount and gave us more room to work with. Now this argument is at the core of the Ruby/Rails-vs-Java/JEE debate, and it's implicitly sitting in the middle of the room in the whole discussion over Chandler.

Scale(2) is an equally important measurement, since a project that cannot handle the expected user load during peak usage times will have effectively failed just as surely as if the project had never shipped in the first place. Part of this will be reflected in not just the language used but also the tools and libraries that are part of the overall software footprint, but choice of language can obviously have a major impact here: Erlang is being tossed about as a good choice for high-scale systems because of its intrinsic Actors-based model for concurrent processing, for example.
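To make the actor idea concrete, here's a minimal, hand-rolled sketch of it in Java--mine, not Erlang's and not anything from Chandler--in which all mutable state is confined to a single thread, and the only way to touch that state is to drop a message into the thread's mailbox. Erlang gives you this (plus distribution, supervision, and very cheap processes) as a language primitive.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// A toy "actor": its state is touched only by its own thread, and the only
// way to reach it is to drop a message into its mailbox.
public class CounterActor implements Runnable {
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<String>();
    private int count = 0;  // never shared; read and written only on the actor's thread

    public void send(String message) throws InterruptedException {
        mailbox.put(message);
    }

    public void run() {
        try {
            while (true) {
                String msg = mailbox.take();      // block until a message arrives
                if ("increment".equals(msg)) {
                    count++;
                } else if ("report".equals(msg)) {
                    System.out.println("count = " + count);
                } else if ("stop".equals(msg)) {
                    return;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CounterActor actor = new CounterActor();
        Thread t = new Thread(actor);
        t.start();
        for (int i = 0; i < 1000; i++) {
            actor.send("increment");              // any number of senders could do this safely
        }
        actor.send("report");
        actor.send("stop");
        t.join();
    }
}
```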

Both of these get tossed back and forth rather carelessly during this debate, usually along the following lines:

A couple of thoughts come to mind on this whole mess.

Just how refactorable are you?

First of all, the actual refactorability of dynamic languages is a widely debated point. On NFJS speaker panels, Dave Thomas (he of the PickAxe book) would routinely admit that not all of the refactorings currently supported in Eclipse are possible on a dynamic language platform, given that type information (such as it is in a language like Ruby) isn't present until runtime. He would also take great pains to point out that simple search-and-replace across files, something any non-trivial editor supports, will do many of the same refactorings that Eclipse or IntelliJ provide, since type is no longer an issue. Having said that, however, it's relatively easy to imagine an IDE that actively "runs" the code as it is being typed, in much the same way that Eclipse does constant compiles, tracking type information throughout the editing process. This is an area I personally expect the various IDE vendors will explore in depth as they look for ways to capture the dynamic language dynamic (if you'll pardon the pun) currently taking place.
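A small illustration of why that type information matters to the tools: even in Java, the moment a call site goes through a string--as the reflective call below does--the IDE's rename refactoring can no longer find it. In a dynamic language, effectively every call site has that property until the code actually runs. (The example is mine and purely illustrative.)

```java
import java.lang.reflect.Method;

public class RenameHazard {
    public static class Customer {
        // Rename this to fullName() with the IDE's refactoring...
        public String name() { return "Fred"; }
    }

    public static void main(String[] args) throws Exception {
        Customer c = new Customer();

        // ...and this statically typed call site is updated automatically.
        System.out.println(c.name());

        // But this call site is just a string. The refactoring tool can't see it,
        // so it breaks at runtime--which is the position a dynamic language's tools
        // are in for every call site until the code executes.
        Method m = Customer.class.getMethod("name");
        System.out.println(m.invoke(c));
    }
}
```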

Who exactly are you for?

What sometimes gets lost in this discussion is that not all dynamic languages need be for programmers; a tremendous amount of success has been achieved by creating a core engine and surrounding it with a scripting engine that non-programmers use to exercise the core in meaningful ways. Excel and Word do it, Quake and Unreal (along with other equally impressive and successful games) do it, UNIX shells do it, and various enterprise projects I've worked on have done it, all successfully. A model whereby core components are written in Java/C#/C++ and are manipulated from the UI (or other "top-of-the-stack" code, such as might be found in nightly batch execution) by these less rigorous languages is a powerful and effective architecture to keep in mind, particularly in combination with the next point...
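For the Java flavor of that architecture, the javax.script API (JSR-223, bundled with Java 6) is one straightforward way to put the scripting layer on top: bind a core object into a script engine and let the script drive it. The sketch below uses the JavaScript engine that ships with the JDK and a made-up OrderEngine class; hosting JRuby or Groovy the same way is, as far as I can tell, a matter of putting their JSR-223 engines on the classpath and asking for them by name.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class ScriptableEngine {
    // The "core engine": written in Java, kept small and rigorous.
    public static class OrderEngine {
        public double discountFor(int quantity) {
            return quantity >= 100 ? 0.15 : 0.0;
        }
    }

    public static void main(String[] args) throws ScriptException {
        ScriptEngine js = new ScriptEngineManager().getEngineByName("JavaScript");

        // Hand the core object to the scripting layer...
        js.put("orders", new OrderEngine());

        // ...and let the "top of the stack" be written in the scripting language.
        // A power user or a nightly batch job could change this without recompiling.
        Object discount = js.eval("orders.discountFor(250)");
        System.out.println("discount: " + discount);
    }
}
```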

Where do you run again?

With the release of JRuby, and the work on projects like IronRuby and Ruby.NET, it's entirely reasonable to assume that these dynamic languages can and will now run on top of modern virtual machines like the JVM and the CLR, completely negating arguments 2 and 4. While a dynamic language will usually take some kind of performance and memory hit when running on top of VMs that were designed for statically-typed languages, work on the DLR and the MLVM, as well as enhancements to the underlying platform that will be more beneficial to these dynamic language scenarios, will reduce that. Parrot may change that in time, but right now it sits at a 0.5 release and doesn't seem to be making huge inroads into reaching a 1.0 release that will be attractive to anyone outside of the "bleeding-edge" crowd.

So where does that leave us?

The allure of the dynamic language is strong on numerous levels. Without having to worry about type details, the dynamic language programmer can typically slam out more work-per-line-of-code than his statically-typed compatriot, given that both write the same set of unit tests to verify the code. However, I think this idea that the statically-typed developer must produce the same number of unit tests as his dynamically-minded coworker is a fallacy--a large part of the point of a compiler is to provide those same tests, so why duplicate its work? Plus we have the guarantee that the compiler will always execute these tests, regardless of whether the programmer using it remembers to write those tests or not.
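To put the "compiler as a suite of tests" point in concrete terms: every type in a Java signature is an assertion the compiler re-checks on every build. The sketch below uses a made-up Money type, and the signatures--not the logic--are the point; a dynamic language would need an explicit (and remembered) unit test to get the same check.

```java
import java.math.BigDecimal;

public class PayrollCheck {
    // Hypothetical domain type; it exists only to carry a type distinction.
    public static class Money {
        final BigDecimal amount;
        Money(BigDecimal amount) { this.amount = amount; }
    }

    // The compiler verifies, on every build, that callers pass a Money and an int--
    // a "test" nobody has to write, remember, or run.
    static Money computeBonus(Money salary, int rating) {
        return new Money(salary.amount.multiply(BigDecimal.valueOf(rating)));
    }

    public static void main(String[] args) {
        computeBonus(new Money(new BigDecimal("50000")), 4);   // fine

        // computeBonus("50000", 4);   // rejected at compile time--the "test" ran for free
    }
}
```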

Having said that, by the way, I think today's compilers (C++, Java, and C#) are pretty weak in the type expressions they require and verify. Type-inferencing languages like ML and Haskell, and their modern descendants F# and Scala, clearly don't require the degree of verbosity currently demanded by the traditional O-O compilers. I'm pretty certain this will get fixed over time, a la how C# has introduced implicitly typed variables.
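For a sense of the verbosity in question: the Java declaration below spells the type out in full on both sides of the assignment, where C#'s var (or the inference in F# and Scala) lets the compiler work it out while the variable stays statically typed. The commented lines are rough equivalents in those languages, shown only for comparison.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class VerbosityDemo {
    public static void main(String[] args) {
        // Java, circa 2008: the full type appears on both sides of the '='.
        Map<String, List<Integer>> scoresByPlayer = new HashMap<String, List<Integer>>();
        scoresByPlayer.put("fred", new ArrayList<Integer>());

        // C#:    var scoresByPlayer = new Dictionary<string, List<int>>();
        // Scala: val scoresByPlayer = Map("fred" -> List(1, 2, 3))
        // Both are still statically typed; the compiler simply infers the type.
    }
}
```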

Meanwhile, why the rancor between these two camps? It's eerily reminiscent of the ill will that flowed back and forth between the C++ and Java communities during Java's early days, leading me to believe that it's more a concern over the job market and employability than it is a real technical argument. In the end, there will continue to be a ton of Java work for the rest of this decade and well into the next, and JRuby (and Groovy) afford the Java developer lots of opportunities to learn those dynamic languages and still remain relevant to her employer.

It's as Marx said, lo these many years ago: "From each language, according to its abilities, to each project, according to its needs."

Oh, except Perl. Perl just sucks, period. :-)

PostScript

I find it deeply ironic that the news piece TSS cited at the top of the discussion claims that the Chandler project failed due to mismanagement, not its choice of implementation language. It doesn't even mention what language was used to build Chandler, leading me to wonder if anybody even read the piece before choosing up their sides and throwing dirt at one another.


Tags: industry, ruby, clr, jvm, development processes, languages
