01 February 2025

tl;dr Objects, despite being the most common form found in mainstream programming languages, are often not as well understood as a data concept as one might think. In an object data model, entities are defined as unions of state and behavior (though behavior is usually of much less concern to the data modeler), and they can in turn be related to other objects through a variety of mechanisms (type, ownership, association, and so on).

Refresher

Object systems are typically characterized by some basic components:

Shape analysis

The one-way nature of references in object systems often creates some design opportunities and restrictions. For example, in the following code:

import java.util.ArrayList;
import java.util.List;

class Person {
    private String name;
    private Person spouse;
    private List<Person> children = new ArrayList<>(); // start empty rather than null

    Person(String name) { this.name = name; }
}

Person ted = new Person("Ted");
Person charlotte = new Person("Charlotte");
ted.spouse = charlotte; // charlotte.spouse is still null
charlotte.children.add(new Person("Michael"));
charlotte.children.add(new Person("Matthew"));
ted.children = charlotte.children; // deliberately point to same List

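... not only is marriage not implicitly reciprocal (Charlotte's spouse reference is still unset), but neither is parentage (Michael and Matthew hold no reference back to either parent).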
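If both directions of a relationship matter, the code has to maintain them explicitly. The following is a minimal sketch of one way to do that, not something taken from any particular object database; the LinkedPerson class and its marry and addChild helpers are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical variant of Person that keeps both ends of each association in sync.
class LinkedPerson {
    final String name;
    LinkedPerson spouse;
    final List<LinkedPerson> parents = new ArrayList<>();
    final List<LinkedPerson> children = new ArrayList<>();

    LinkedPerson(String name) { this.name = name; }

    // Set the spouse reference on both objects, since neither side implies the other.
    void marry(LinkedPerson other) {
        this.spouse = other;
        other.spouse = this;
    }

    // Record the child on the parent and the parent on the child.
    void addChild(LinkedPerson child) {
        children.add(child);
        child.parents.add(this);
    }
}
```

Note that calling charlotte.addChild(michael) still says nothing about Ted: each association has to be stated for every object that should be able to navigate it.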

Inheritance is another form of object association. Like the other forms, this associative relationship is usually one-way; that is, children know their parents but not the other way around. In O-O programming, the parent can remain entirely ignorant of any derived classes through the use of dynamic dispatch (virtual methods), but this is rarely needed in an object database. (An object query language obviates much of the need here.)

While identity also appears in other storage systems (most notably relational databases, where it is captured via the primary key), in object systems it tends to be more subtle: the object identifier (OID) is often implicitly defined and stored. (This is similar to the "ROWID" often found in relational tables, which serves as the unique identifier for a row regardless of primary key definitions.) This implicit identity can be a little confusing when dealing with objects that are equivalent but not identical, in the same way that it can be confusing in object languages.
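The same distinction shows up in Java itself, which makes for a quick illustration. This is only a sketch by analogy: NamedPerson is a hypothetical class, and a Java reference is not an OID, though the two play a similar role.

```java
import java.util.Objects;

// Hypothetical class where equals() compares state and == compares identity.
class NamedPerson {
    final String name;

    NamedPerson(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        return o instanceof NamedPerson && Objects.equals(((NamedPerson) o).name, name);
    }

    @Override
    public int hashCode() { return Objects.hash(name); }
}

class IdentityDemo {
    public static void main(String[] args) {
        NamedPerson a = new NamedPerson("Ted");
        NamedPerson b = new NamedPerson("Ted");
        System.out.println(a.equals(b)); // equivalent: same state
        System.out.println(a == b);      // not identical: two distinct objects
    }
}
```

An object store that tracks identity implicitly would treat a and b as two separate stored objects, even though a query on name would match both.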

Most object databases supported some form of query language: a SQL derivative, queries written in the syntax of the programming language itself (db4o called these "native queries"), and/or a language-level API for building an exemplar object by which to search (called "query by example" or QBE). Queries also often included a subtle parameter called "fetch depth", which described how many object-association "links" to follow when retrieving the graph of objects. For example, a fetch depth of 0 meant fetch the object queried and nothing else (leaving any associations empty or lazily loaded), a fetch depth of 1 meant fetch the object queried as well as the objects directly referenced from it, a fetch depth of 2 meant also fetch the objects directly referenced from those, and so on.

In other words, working with a Person type that looks like:

import java.util.ArrayList;
import java.util.List;

class Person {
    private String name;
    private Person spouse;
    private List<Person> children = new ArrayList<>(); // start empty rather than null

    Person(String name) { this.name = name; }
}

Person ted = new Person("Ted");
Person charlotte = new Person("Charlotte");
ted.spouse = charlotte; charlotte.spouse = ted; // set both directions explicitly
charlotte.children.add(new Person("Michael"));
charlotte.children.add(new Person("Matthew"));
ted.children = charlotte.children; // deliberately point to same List

We would get a graph that looks like:

flowchart TB
    ted[Ted]-->charlotte[Charlotte]
    charlotte-->ted
    ted-->kids[List]
    charlotte-->kids
    kids-->michael[Michael]
    kids-->matthew[Matthew]

Note that if Persons know their parents, things get pretty messy pretty fast:

flowchart TB
    ted[Ted]-->charlotte[Charlotte]
    charlotte-->ted
    ted-->kids[List]
    charlotte-->kids
    kids-->michael[Michael]
    michael-->ted
    michael-->charlotte
    kids-->matthew[Matthew]
    matthew-->ted
    matthew-->charlotte

But this also highlights that in an object system, there is no single well-defined root or starting point. The collection of objects can be, and frequently is, a cyclic graph that can be "entered" (for query purposes) from anywhere--this is in contrast to associative systems (where we can only query by the key) or hierarchical systems (where we must work from the document's root node on down).
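To make the fetch depth described earlier concrete against a cyclic graph like this one, here is a minimal in-memory sketch. It is not how any particular object database implements fetching: GraphPerson and FetchDepth are hypothetical names, and the sketch assumes the children List counts as part of its owning object rather than as a separate hop.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class GraphPerson {
    final String name;
    GraphPerson spouse;
    List<GraphPerson> children = new ArrayList<>();

    GraphPerson(String name) { this.name = name; }
}

class FetchDepth {
    // Breadth-first walk: collect everything reachable from start within `depth` hops.
    static List<GraphPerson> fetch(GraphPerson start, int depth) {
        // Track visited objects by identity so cycles (Ted <-> Charlotte) terminate.
        Set<GraphPerson> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        List<GraphPerson> result = new ArrayList<>();
        List<GraphPerson> frontier = new ArrayList<>();
        seen.add(start);
        result.add(start);
        frontier.add(start);
        for (int level = 0; level < depth; level++) {
            List<GraphPerson> next = new ArrayList<>();
            for (GraphPerson p : frontier) {
                List<GraphPerson> refs = new ArrayList<>(p.children);
                if (p.spouse != null) refs.add(p.spouse);
                for (GraphPerson r : refs) {
                    if (seen.add(r)) { // add() returns false if already visited
                        next.add(r);
                        result.add(r);
                    }
                }
            }
            frontier = next;
        }
        return result;
    }
}
```

Starting from Ted, a depth of 0 yields only Ted, while a depth of 1 already pulls in Charlotte and both children. Starting from Michael yields only Michael at any depth, since in this model children hold no references back to their parents.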

Most object database query engines also understand inheritance implicitly, so in a simple hierarchy like this:

classDiagram
    class Vehicle
    Vehicle <|-- Car
    Vehicle <|-- Boat
    Vehicle <|-- Airplane

... the database understood that a query of "select all Vehicles" would retrieve Car instances, Boat instances, and Airplane instances, as well as any Vehicle instances. This could lead to some very fine-grained queries ("select all Vehicles that have a passenger compartment of 1 and a range of over 500 miles except for motorcycles and rocket ships"), which could also create some indexing nightmares.
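The "select all Vehicles" behavior can be approximated over an in-memory collection with a type test. This is only a sketch of the idea, not the API of any real object database; ExtentQuery and selectAll are hypothetical names, and the List of Object stands in for the store.

```java
import java.util.ArrayList;
import java.util.List;

class Vehicle { }
class Car extends Vehicle { }
class Boat extends Vehicle { }
class Airplane extends Vehicle { }

class ExtentQuery {
    // Return every stored object that is an instance of `type` or of any subclass.
    static <T> List<T> selectAll(List<Object> store, Class<T> type) {
        List<T> matches = new ArrayList<>();
        for (Object o : store) {
            if (type.isInstance(o)) {
                matches.add(type.cast(o));
            }
        }
        return matches;
    }
}
```

Calling selectAll(store, Vehicle.class) returns Cars, Boats, Airplanes, and plain Vehicles alike, while selectAll(store, Car.class) narrows to Cars only. A real engine would need indexes that span the whole hierarchy to avoid this linear scan, which is where the indexing trouble comes from.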

In many cases, an object database could only retrieve whole objects, even if only a subset of the object graph was needed to satisfy the request. This could lead to some performance issues, but from a design perspective it mostly meant that any activity had to be thought of in terms of "whole objects". For example, we often employ a "master-detail" approach, where we fetch a list of entities (such as students at a university) and display their identifying characteristics (first and last name, and student identifier) in a large list, prompting the user to select one for in-depth examination (displaying all of the selected student's information). In a system that supports "partial results", such as SQL, where we can "select id, fname, lname from students" and retrieve only three columns out of however many are defined on students, we can efficiently retrieve only the data we wish to display on that "master" UI. In a "whole results" system like a traditional object store, we have to fetch all of the student objects in full, display the names and identifiers from that collection, and then display the selected student's details from the object we already hold. This means fetching the entirety of every student object in the bulk query, but it also means no further round trips to the storage engine are required.
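The "whole results" side of that trade-off can be sketched in memory as follows. All names here (Student, StudentSummary, MasterDetail) are hypothetical, and the transcript list merely stands in for whatever bulky detail data a real student object would carry.

```java
import java.util.ArrayList;
import java.util.List;

// Whole object, as a traditional object store would hand it back.
class Student {
    final int id;
    final String firstName;
    final String lastName;
    final List<String> transcript = new ArrayList<>(); // bulky "detail" data

    Student(int id, String firstName, String lastName) {
        this.id = id;
        this.firstName = firstName;
        this.lastName = lastName;
    }
}

// What the "master" list actually needs to display.
class StudentSummary {
    final int id;
    final String displayName;

    StudentSummary(int id, String displayName) {
        this.id = id;
        this.displayName = displayName;
    }
}

class MasterDetail {
    // Build the master rows from whole objects that have already been fetched.
    static List<StudentSummary> masterRows(List<Student> loaded) {
        List<StudentSummary> rows = new ArrayList<>();
        for (Student s : loaded) {
            rows.add(new StudentSummary(s.id, s.lastName + ", " + s.firstName));
        }
        return rows;
    }

    // The detail view needs no second fetch: the whole object is already in memory.
    static Student detail(List<Student> loaded, int id) {
        for (Student s : loaded) {
            if (s.id == id) return s;
        }
        return null; // no student with that id was loaded
    }
}
```

The projection happens on the client after the expensive fetch, which is the opposite of the SQL case, where the database trims the columns before anything crosses the wire.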

Thought Experiment: We often saw the desire in database systems for local code execution within the same node or process as the database server--in the RDBMS world we call them "stored procedures" and manage them through the database SQL interface--but the same concept never really took off within the OODBMS, where it would actually make much more sense. Some object query languages could invoke an object method as part of a query expression, but in general that didn't allow for object modification. It's intriguing to imagine what the overall experience of an OODBMS would be like if it could be a full object execution node, and not just data storage. It might even start resembling more of the old "distributed objects" space, which is both a good thing and a bad thing....

Thought Experiment: Although unconventional, we've found in the hierarchical database world (namely, XML) that a query language (XPath) used against an in-memory collection of entities (an Infoset document, usually in a DOM) can be a powerful way to interact with only the dataset we're interested in. What would happen if we tried to apply a similar thought process to large in-memory object collections? Feels like it could be useful....

Implementations

A full list of object databases, all of which apparently remain in current use, can be found at https://db-engines.com/en/ranking/object+oriented+dbms . Note that this list mixes multi-model databases with object ones, which seems to be a common theme in a lot of lists like this.

Sadly, wandering through object-shaped datastore implementations is like walking through a graveyard in the middle of the night--so many reactions of the "Oh, man, I'm sad to see them gone" and "Oh, what might've been if only...." sort.


Tags: engineering   storage   database