The relational database needs no "defense"

11 June 2007

Anyone who is deeply enmeshed in a technology feels compelled to defend that technology when any sort of "threat" (or perception of threat) appears on the horizon, and apparently Gavin is no different. Sure enough, as people (apparently in this case, myself) start to talk about approaches to persistence that don't involve Hibernate, Gavin feels compelled to point to these other technologies using inflammatory terms and a certain amount of FUD. I felt a certain responsibility to respond, since it seems that he's taking a direct shot at the db4o articles I've written and discussed before.

(By the way, it's also entirely possible that he's taking aim at ActiveRecord and Rails, which I don't consider to be an "object database" at all; if that's the case, then I apologize ahead of time for misunderstanding the intent--and the points--of the piece. But the arguments he makes seem pretty relevant to the OODBMS-vs-RDBMS discussion as well, so much so that it was a db4o employee who pointed out the blog entry to me in the first place. In any event, though, Gavin's piece raises some issues that deserve to be discussed, regardless of the context of Rails or OODBMSs.)

First of all, let me state quite clearly, the relational database needs no defense. Take whatever comparative criteria you like: the RDBMS has been, and will, in the absence of a nearly catastrophic change, continue to be, the choice of businesses all over the world for storing data in a format that's easily accessed from a variety of different systems. The RDBMS clearly "owns" the corporate data center, from Fortune X's (meaning X can be just about any number you choose to put there) down through single-person shops. To shake that kind of (dare I say it?) monopoly would require a kind of technology shift on the scale of the move from the mini- and mainframe to the PC. Those kinds of shifts don't happen very often, and when they do, it's because of a huge competitive advantage.

Furthermore, I will go on the record and say it here: neither the OODBMS nor the HODBMS (hierarchically-oriented database system, a la the "XML database") makes that kind of case. Not right now, and probably not ever. They have compelling reasons for existence, but not so strong a case that they could displace the RDBMS from the "enterprise data" throne. That said, however, since when does one tool solve all problems? They have their own raisons d'etre, and to simply say that the OODBMS or HODBMS should be ignored just because "we've always used an RDBMS" is a crime just as great.

Now, having said that, let's take a look at Gavin's points:

"Object databases were a total failure and still are." Actually, he's right, from the perspective that the OODBMS clearly has not penetrated the corporate environment to the same degree that the RDBMS has. But, by that same token, the RDBMS, nearly a decade after its introduction, had about the same degree of success. Ask the folks who were around when Oracle 1 was released, and they'll tell you about the criticisms leveled at the RDBMS that are, in a startling replay of the past, now being applied to the OODBMS today. The first generation of anything is always crap... including O/R-Ms. Fortunately for both O/R-Ms and OODBMSs, neither is in their first generation stage anymore.

"the systems are often not called "object databases" in today's marketing literature, but we will call them that anyway, since that is what they are." Actually, all of the OODBMS vendors are pretty ready to call themselves OODBMSs, and I have to say, Gavin, you'd know that if you talked to them for more than, say, 30 seconds, or took the time to research the subject and listen to what they had to say. The folks who don't bother calling their systems "object databases" anymore are the very folks he's defending: Oracle, DB/2, and so on. (Anybody remember "Oracle Objects"? Table + sprocs == objects? Oy, what a mess.) But don't feel too bad, Gavin, you're in good company--Chris Date himself makes this same mistake (though he at least admits that true "object" support in the database model requires features that aren't present in todays RDBMS products, not that he's a big fan of those products anyway), so at least you're in good comapny. (Again, if you're talking Rails being an "object database", total agreement, it's not even close. But in all the years I've been hanging out with Dave Thomas, Bruce Tate, Stu Halloway, Justin Gehtland, and a bunch of the other Rails advocates/evangelists/lecturers/authors, I've never heard any of them make this assertion.)

Object-relational mapping isn't that hard, so there's no need to eliminate it. Sorry, Gavin, but the fact is, this remains, and always will remain, a point of difference between you and me, and between you and a fairly large number of developers I've spoken to over the years at conferences and consulting engagements and classes. For simple table-to-class mappings, you're right, it's a pretty simple thing. It is, however, still a "dual schema" problem, in that now you have two competing "sources of truth" that have to be reconciled to one another: the database schema and the object model. Now, perhaps if all the projects you've ever done are projects where the developer gets to define both, then the problem doesn't appear, but if you're in an "enterprise" world where the database schema is managed by a team of DBAs and is shared across projects, you don't have the flexibility to "refactor" the schema like you can your object model. (Anyone who's ever tried to build a CORBA or DCOM system that stretches across corporate or division or department boundaries understands the problems of trying to create a domain model--or schema--that serves all groups well without sacrificing performance, elegance, or normal form.) I particularly like this statement:

So, from this point of view, ORM is at least as good as an object database for all usecases, and handles other usecases (indeed, the common cases) which the object database approach does not.

... particularly since he doesn't bother to go on to describe those use cases that the ORM handles that the OODBMS does not. Examples? 'Tis very easy to make assertions, but without backing them up....

Oh, and the comment that "If you just want to "throw some objects in the database", you'll never need to write a single mapping annotation." really sort of proves the point I try to make in the ODMG.org paper: if you just want to "throw some objects in the database", why do you bother having an RDBMS in the first place? There are DBAs that are in open revolt at the idea, particularly since you've also just conveniently left out any sort of indexing or other tuning decisions that will make the database perform at all reasonably. But, I suppose, if you're willing to argue "development speed uber alles", then sure, go ahead. Never mind the fact that an OODBMS will handle this exact situation, because that's exactly what they were made for. I repeat the statements I made in the ODMG paper: if you want persistence to just be an implementation detail, then why bother with the RDBMS in the first place? (It's not like any self-respecting DBA is going to want to take your slapdash relational schema, anyway...)

Don't use the OODBMS because it creates a tight coupling between your code and your data storage, and the language you use today won't necessarily be around tomorrow. Um... exactly. This is, surprisingly enough, exactly the point I'm trying to make in the ODMG paper: that an OODBMS creates a tight coupling between code and data, and sometimes, that's not what you want. Nothing is a silver bullet; everything comes with a price and a consequence of using it. It's only the honest vendors that will tell you when not to use their stuff, and from experience, the db4o guys (the only ones I can concretely speak to) are the first to stand up and tell you that they aren't trying to replace the RDBMS. So why spread the FUD that they are?

OODBMSs are trying to pull the wool over your eyes with benchmarks. So, again, rather than display his own benchmark that directly contradicts the benchmark offered by the OODBMS folks, Gavin chooses to say, "Look at all the reasons why they run faster, and look, these reasons are all clearly bogus." Which is kind of astute of him: lawyers are taught in law school that if the law isn't on your side, argue the facts, and if the facts aren't on your side, argue the law, and if neither is on your side, argue really really loudly. Toss out a benchmark of your own, Gavin, and then we can discuss the decisions you make in your benchmark and see if they're reasonable decisions to make for my own projects, so I can make an informed decision, rather than one based on your assertions and loud arguments that amount to "Duh!".

OODBMSs are faster because they run in-process. Some do, yes. Most can run either in-proc or out-of-proc, which (gasp!) is something that RDBMSs can do, too. Or have you not noticed HSQL and Derby recently? And yes, running the RDBMS in-proc performs better than running the RDBMS out-of-proc. Running anything in-proc performs better than running out-of-proc. And yes, you're right, sometimes you don't have the option of in-proc. But in a situation where you're just "throwing objects into the database", and nobody else is connecting to this data (in other words, you can be tightly coupled to the data storage), why take that overhead if it's not necessary? Choosing an out-of-proc database because "somebody may want to get to this data someday" is YAGNI, pure and simple.
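
To make the in-proc versus out-of-proc point concrete, here is a minimal Java sketch. It assumes the JDBC URL shapes that HSQLDB and Derby document for their embedded and networked modes; the `Deployment` enum, the `JdbcUrls.jdbcUrl` helper, the host, the port, and the database name `calendar` are all hypothetical names chosen for illustration. It only builds URL strings, so it runs without either driver on the classpath.

```java
import java.util.Locale;

// Hypothetical switch between running the database inside the application's
// own JVM and talking to a separate database server process.
enum Deployment { IN_PROCESS, OUT_OF_PROCESS }

final class JdbcUrls {
    private JdbcUrls() {}

    // Returns a JDBC URL for the given engine and deployment mode.
    // The URL shapes follow the engines' documented conventions; host, port
    // and database name are illustrative assumptions.
    static String jdbcUrl(String engine, Deployment mode, String dbName) {
        switch (engine.toLowerCase(Locale.ROOT)) {
            case "hsqldb":
                return mode == Deployment.IN_PROCESS
                        ? "jdbc:hsqldb:mem:" + dbName                // in-memory, same JVM
                        : "jdbc:hsqldb:hsql://localhost/" + dbName;  // separate server
            case "derby":
                return mode == Deployment.IN_PROCESS
                        ? "jdbc:derby:" + dbName + ";create=true"    // embedded driver
                        : "jdbc:derby://localhost:1527/" + dbName;   // network server
            default:
                throw new IllegalArgumentException("unknown engine: " + engine);
        }
    }
}

class InProcDemo {
    public static void main(String[] args) {
        System.out.println(JdbcUrls.jdbcUrl("hsqldb", Deployment.IN_PROCESS, "calendar"));
        System.out.println(JdbcUrls.jdbcUrl("derby", Deployment.OUT_OF_PROCESS, "calendar"));
    }
}
```

Under these assumptions, the code that consumes the resulting connection would be the same in both modes, which is why the choice reads as a deployment decision rather than an RDBMS-versus-OODBMS one.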

"... the problem is that existing, mature RDBMS systems happen to not be written in Java (see Benefit #3)." Ouch. Don't let the Cloudscape developers hear you say that. Granted, HSQL is not what I'd call a "heavy-duty" RDBMS, but Gavin, not everything has to be stored in Oracle. Sometimes a lighter-weight database--MySQL, HSQL, Postgres, or even (gasp!) Access--is good enough. Or are you advocating that everybody should be using clustered J2EE servers to build their 5-user department calendar app? (Maybe it's a Seam thing, I dunno.)

OODBMSs don't scale because they share a lot of state across concurrent threads. Any architecture that shares state across concurrent threads will have a hard time scaling, but... aren't you the guy arguing that stateful session beans are better than stateless? And how is this different from an RDBMS sharing state across concurrent threads? The transaction model isn't any different between the OODBMS and the RDBMS...

OODBMS benchmarks suck because they measure ORM with caching turned off. As well they should, because not all ORM users can use caching. Particularly if they need to bypass the ORM for particularly sophisticated straight-up SQL queries. (Unless, of course, one subscribes to the belief that HQL or OQL is just as powerful as SQL itself, and therefore can do anything that SQL can do...) That said, it's still a fair argument, and benchmarks, if they're to be at all useful to the general community (as opposed to being just plain marketing fluff), should detail exactly how they were run so a technology investigator can re-run the benchmark on their own, see if the results match, and tune them as desired to better match their architectural constraints or opportunities.

"Things that do more stuff are slower". Agreed... but how is this refuting the point? If an O/R-M is doing more stuff than an OODBMS, but the end result is the same from the programmer's perspective, the fact tha the O/R-M has to do more stuff shouldn't be held against it? That's like suggesting that Tonya Harding should have gotten a do-over in the Olympics because she was kinda upset about all the bad publicity.

"Fetching hierarchical data.... there is no a priori reason why an object database should be any faster than an ORM solution for this." Absolutely! The problem is with the general approach of trying to manage the associations of the object model and the fact that the complete object graph (which doesn't have to be a hierarchy, by the way) frequently is larger than the programmer wants to pull across the wire. (Which is another great reason to look into an in-proc solution: no wires involved.) This will remain a problem--pending a perfect solution, which I believe does not exist, since the decision whether to eager- or lazy-fetch elements or associations will vary on a case-by-case basis--for both the OODBMS and the O/R-M world.

Gavin concludes with this:

If you think that relational technology is for persisting the state of your application, you've missed the point. The value of the relational model is that it's democratic. Anyone's favorite programming language can understand sets of tuples of primitive values. Relational databases are an integration technology, not just a persistence technology. And integration is important. That's why we are stuck with them.

Agreed! He makes my point for me: if you are in a situation where the data needs to be loosely coupled from the object model, then you need an RDBMS, and you cannot assume that the relational schema can closely mirror the object model--which essentially makes the point that the relational schema is the big winner in the dual schema decision (which is a perfectly fine decision to make, so long as you accept that your object model might suffer in its "purity" as a result). You have essentially acknowledged the dual schema problem, and chosen to let the relational schema be the core definition. (Arguably, this is the only reasonable decision to make if your relational schema is fixed ahead of time.)
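
For readers who haven't lived with the dual schema problem, here is a minimal Java sketch of what "two sources of truth" looks like when the relational schema wins. Everything in it is hypothetical: the `Person` and `Address` classes, the `PERSON(FIRST_NM, LAST_NM, CITY, ZIP)` table layout, and the hand-written `PersonMapper` that stands in for what an O/R-M would do. A row is modeled as a plain `Map` so the example runs without a database.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Object model: shaped the way the application code wants it.
final class Address {
    final String city;
    final String postalCode;

    Address(String city, String postalCode) {
        this.city = city;
        this.postalCode = postalCode;
    }
}

final class Person {
    final String fullName;
    final Address home;

    Person(String fullName, Address home) {
        this.fullName = fullName;
        this.home = home;
    }
}

// Relational schema (hypothetical, fixed ahead of time by the DBAs):
//   PERSON(FIRST_NM, LAST_NM, CITY, ZIP)
// The mapper is where the two schemas get reconciled. Any refactoring of
// Person or Address has to be absorbed here, because the columns can't move.
final class PersonMapper {
    private PersonMapper() {}

    static Map<String, Object> toRow(Person p) {
        // The object model has one name field; the table insists on two.
        int split = p.fullName.lastIndexOf(' ');
        String first = split < 0 ? p.fullName : p.fullName.substring(0, split);
        String last = split < 0 ? "" : p.fullName.substring(split + 1);

        Map<String, Object> row = new LinkedHashMap<>();
        row.put("FIRST_NM", first);
        row.put("LAST_NM", last);
        row.put("CITY", p.home.city);        // nested object flattened into columns
        row.put("ZIP", p.home.postalCode);
        return row;
    }

    static Person fromRow(Map<String, Object> row) {
        String fullName = (row.get("FIRST_NM") + " " + row.get("LAST_NM")).trim();
        Address home = new Address((String) row.get("CITY"), (String) row.get("ZIP"));
        return new Person(fullName, home);
    }
}

class DualSchemaDemo {
    public static void main(String[] args) {
        Person ada = new Person("Ada Lovelace", new Address("London", "NW1"));
        Map<String, Object> row = PersonMapper.toRow(ada);
        System.out.println(row);
        System.out.println(PersonMapper.fromRow(row).fullName);
    }
}
```

Even in this toy case the object model bends to the table: the name gets split and the address gets flattened, and that reconciliation code has to be maintained for as long as both schemas exist.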

Tags: clr c++ java j2ee ruby xml services rdbms
