24 August 2016
More than a decade ago, I published Effective Enterprise Java, and in the opening chapter I talked about the Ten Fallacies of Enterprise Computing, essentially an extension/add-on to Peter Deutsch's Fallacies of Distributed Computing. But in the ten-plus years since, I've had time to think about it, and now I'm convinced that the Enterprise Fallacies are a different list. Now, with the rise of cloud computing stepping in to complement, supplement, or entirely replace the on-premise enterprise data center, it seemed reasonable to get back to it.
I'll expand on the items in the list over future blog posts, I imagine, but without further ado, here are the Fallacies of Enterprise Computing.
As Deutsch said, "Essentially everyone, when they first build an [enterprise] system, makes the following [nine] assumptions. All prove to be false in the long run and all cause big trouble and painful learning experiences."
Naturally, I welcome discussion around these, and I may edit and/or append to this list as time goes by, but this is where the past decade has led me.
After building IT systems for more than sixty years, one would think we as an industry would have learned that "newer is not always better". Unfortunately, this is a highly youth-centric industry, and the young have this tendency to assume that anything new to them is also new to everybody else. And if it's new, it's exciting, and if it's exciting, it must be good, right? And therefore, we must throw away all the old, and replace it with the new.
This cannot be emphasized enough: This is fallacious, idiotic, stupid, and brain-dead.
This fallacy is an extension of the old economic "limited market" fallacy: The more gains one entity makes in a market, the more that other entities lose. (Essentially, it suggests that the market is intrinsically a zero-sum game, despite obvious evidence that markets have grown substantially even in just the last hundred years since we started tracking economics as a science.) Thus, for example, if the cloud is new, and it has some advantages over its "competitors", then every "win" for the cloud must mean an equal "loss" for the alternatives (such as on-prem computing). Never mind that the cloud solves different problems than on-prem computing, or that not everything can be solved using the cloud (such as computing when connections to the Internet are spotty, nonexistent, or worse, extremely slow).
Now, for those of you who have been engaged in the industry for more than just the past half-decade, here's the $65,535 question for you: How is "the cloud" any different from "the mainframe", albeit much, much faster and with much, much greater storage?
Those who cannot remember the past are condemned to repeat it. --George Santayana, Philosopher
I've seen this play out over and over again, starting with my own entry into the IT universe with C++ (which was the "new" over C), when I participated in a few system rewrites to C++ from other things (Visual Basic being one, C being another, sometimes some specific vertical stuff as well). Then I saw it again when Java came around, and companies immediately started rewriting some of their C++ systems into Java. This time around, I started to ask, "Why?", and more often than not, the answers were vague ones like "We don't want to fall too far behind" or "We need to modernize our software". (When pressed as to why "falling behind" was bad, or why software needed to be modernized, I was usually shushed and told not to worry about it.)
In the years since, I keep thinking that companies have started to get this message more thoroughly, but then something comes along and completely disrupts any and all lessons we might have learned. After Java, it was Ruby. Or, for those companies that didn't bite on the Java apple, it was .NET. Now NodeJS. Or NoSQL. Or "cloud". Or functional programming. Or take your pick of any number of others.
Unfortunately, as much as I wish I could believe that "it's different this time" and we as an industry have learned our way through this, I keep seeing signs that no, unfortunately, that's too much to hope for. The easy way to mitigate this fallacy is to force those advocating new technology to enumerate the benefits in concrete terms---monetary and/or temporal benefits, ideally, backed by examples and objective analysis of pros and cons.
By the way, for those who aren't sure if they can spot the fallacy, the easy way to tell if somebody is falling into this fallacious trap is to see whether their analysis contains both positive and negative consequences. No technology is without its negatives, and a practical and objective analysis will point them out. If it's you doing the analysis, then force yourself to ask the questions, "When would I not use this? What circumstances would lead me away from it? When is using this going to lead to more pain than it's worth?"
This means, simply, that any enterprise system is subject to the same fallacies as any other distributed system. Reliability, latency, bandwidth, security, the whole nine yards (or the whole eight fallacies, if you prefer) are all in play with any enterprise system.
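To make that concrete: every call one part of an enterprise system makes to another crosses the network, so the calling code has to assume the call can fail or stall. What follows is a minimal Python sketch of one common approach, a wrapper that retries only plausibly transient errors with exponential backoff. It is not meant as a prescription. The name `call_remote` is hypothetical. The sketch also assumes that the wrapped operation enforces its own timeout and is safe to repeat (idempotent).

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_remote(operation: Callable[[], T], attempts: int = 3,
                backoff_seconds: float = 0.1,
                retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError)) -> T:
    """Call a remote dependency on the assumption that the network *will* fail.

    Retries only errors that are plausibly transient, backing off exponentially
    between attempts, and re-raises the last error once attempts run out.
    Assumes `operation` enforces its own timeout and is safe to repeat.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the real failure
            time.sleep(backoff_seconds * 2 ** (attempt - 1))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

The retry policy is a design decision in its own right. Retrying a non-idempotent operation, such as "charge this credit card", can turn one network hiccup into two charges.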
If you're not familiar with the Eight Fallacies of Distributed Systems, take some time to make yourself familiar with them and some of the mitigation strategies.
(Note: I wrote this up a long time ago in a blog post as the "Eleventh Fallacy of Distributed Systems", but it feels vastly more relevant as an Enterprise Fallacy.)
This is a fallacy because the term "business logic" is way too nebulous to nail down correctly, and because business logic tends to stretch out across client-, middle- and server-tiers, as well as across the presentation and data access/storage layers.
This is a hard one to swallow, I’ll grant. Consider, for a moment, a simple business rule: a given person’s name can be no longer than 40 characters. It’s a fairly simple rule, and as such should have a fairly simple answer to the question: Where do we enforce this particular rule? Obviously we have a database schema behind the scenes where the data will be stored, and while we could use tables with every column set to be variable-length strings of up to 2000 characters or so (to allow for maximum flexibility in our storage), most developers choose not to. They’ll cite a whole host of reasons, but the most obvious one is also the most important: by using relational database constraints, the database can act as an automatic enforcer of business rules, such as the one that requires that names be no longer than 40 characters. Any violation of that rule will result in an error from the database.
Right here, right now, we have a violation of the "centralized business logic" rule. Even if the length of a person’s name isn’t what you consider a business rule, what about the rule stating that a person can have zero or one spouse as part of a family unit? That’s obviously a more complicated rule, and it usually results in a foreign key constraint in the database in turn: another business rule enforced within the database.
Perhaps the rules simply need to stay out of the presentation layer, then. But even here we run into problems: how many of you have used a website application where all validation of form data entry happens on the server (instead of in the browser using script), usually one field at a time? That is the main drawback of enforcing presentation-related business rules at the middle- or server-tiers: it requires round trips back and forth to carry out. This hurts both the performance and the scalability of the system over time, yielding a poorer system as a result.
So where, exactly, did we get this fallacy in the first place? We get it from the old-style client/server applications and systems, where all the rules were sort of jumbled together, typically in the code that ran on the client tier. Then, when business logic code needed to change, it required a complete redeploy of the client-side application that ended up costing a fortune in both time and energy, assuming the change could even be done at all. The worst part was when certain elements of code were replicated multiple times all over the system. Changing one meant having to hunt down every other place a particular rule was (or worse, wasn’t) being implemented.
This isn’t to say that trying to make business logic maintainable over time isn’t a good idea; far from it. But much of the driving force behind "centralize your business logic" was really a shrouded cry for "The Once and Only Once Rule" or the "Don’t Repeat Yourself" principle. In and of themselves, they're good rules of thumb. The problem is that we lost sight of the forest for the trees, and ended up trying to obey the letter of the law rather than its spirit and intentions. Where possible, centralize, but don't take on costs that outweigh the benefits of doing so.
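Here is a minimal sketch of the situation just described, using Python's built-in `sqlite3` module purely for illustration. The table and function names are hypothetical. The 40-character rule ends up written down twice, once in application code and once as a database constraint, and the spouse rule lives only in the schema. Two SQLite details matter here. SQLite does not enforce the length in `VARCHAR(n)`, so the limit has to be stated as an explicit `CHECK`. And foreign keys are only enforced after `PRAGMA foreign_keys = ON`.

```python
import sqlite3
from typing import Optional

MAX_NAME_LENGTH = 40  # the business rule from the text


def validate_name(name: str) -> None:
    """Application-tier copy of the name-length rule."""
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError(f"name must be at most {MAX_NAME_LENGTH} characters")


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
conn.execute(
    f"""CREATE TABLE person (
        id        INTEGER PRIMARY KEY,
        -- Database-tier copy of the same rule; SQLite ignores the n in
        -- VARCHAR(n), so the limit is spelled out as a CHECK constraint.
        name      VARCHAR({MAX_NAME_LENGTH}) NOT NULL
                  CHECK (length(name) <= {MAX_NAME_LENGTH}),
        -- Zero or one spouse: a single nullable column that must
        -- reference an existing person.
        spouse_id INTEGER REFERENCES person(id)
    )"""
)


def add_person(person_id: int, name: str, spouse_id: Optional[int] = None,
               check_in_app: bool = True) -> None:
    if check_in_app:
        validate_name(name)
    conn.execute(
        "INSERT INTO person (id, name, spouse_id) VALUES (?, ?, ?)",
        (person_id, name, spouse_id),
    )
```

Suppose the application check is switched off (`add_person(3, "x" * 41, check_in_app=False)`). The database still rejects the name with `sqlite3.IntegrityError`. The rule is enforced, just not in one central place.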
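One way to honor the spirit of "Once and Only Once" is to state a rule once, as data, and derive each tier's enforcement from it. This does not require pretending that every rule can live on a single tier. Below is a minimal Python sketch under that assumption. `MaxLengthRule` and its methods are hypothetical, and the JSON form assumes some client-side script that knows how to apply it.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MaxLengthRule:
    """A single statement of a rule, from which each tier's check is derived."""
    field: str
    max_length: int

    def check(self, record: dict) -> List[str]:
        """Middle-tier enforcement: return a list of error messages (empty if valid)."""
        value = record.get(self.field, "")
        if len(value) > self.max_length:
            return [f"{self.field} must be at most {self.max_length} characters"]
        return []

    def sql_check(self) -> str:
        """Data-tier enforcement: a CHECK clause for the table definition."""
        return f"CHECK (length({self.field}) <= {self.max_length})"

    def to_client_json(self) -> str:
        """Presentation-tier enforcement: shipped to the browser, avoiding round trips."""
        return json.dumps({"field": self.field, "maxLength": self.max_length})


PERSON_NAME = MaxLengthRule("name", 40)
```

If generating all three forms costs more than it saves, that is the cue to stop centralizing.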
By the way, one place where the "centralize only if it's convenient" rule has to be set aside is around validating inputs from foreign locations---in other words, any data which is passed across the wire or comes in from outside the local codebase. In order to avoid security vulnerabilities, data should always be verified as soon as it reaches your own shores, even if that means duplicating the verification in every foreign-accessible interface.
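Here is a minimal Python sketch of verifying data at the moment it crosses the boundary. The payload shape (a JSON object with a `name` field) is a hypothetical stand-in, and so are the names `parse_person_payload` and `InvalidInput`. Nothing past this function ever sees unchecked input. The check is repeated at every entry point, even though the same rule may also exist deeper in the system.

```python
import json
from typing import Any, Dict


class InvalidInput(ValueError):
    """Raised when data from outside the codebase fails verification."""


def parse_person_payload(raw: bytes) -> Dict[str, Any]:
    """Verify an incoming payload at the boundary, before any other code sees it."""
    try:
        data = json.loads(raw)
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise InvalidInput("payload is not valid JSON") from exc
    if not isinstance(data, dict):
        raise InvalidInput("payload must be a JSON object")
    name = data.get("name")
    if not isinstance(name, str) or not name.strip():
        raise InvalidInput("name must be a non-empty string")
    name = name.strip()
    if len(name) > 40:
        raise InvalidInput("name must be at most 40 characters")
    # Return only the fields we checked; anything else in the payload is dropped.
    return {"name": name}
```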
It is tempting to create "one domain model to rule them all", particularly given all the love for Domain-Driven Design in the past ten years or so. A corollary to the "one domain model" is the "one database model"---at some point in the enterprise IT manager's tenure, somebody (usually a data architect or consultant) will suggest that massive savings (of one form or another) can be had for the taking if the company takes the time to create a unified database. In other words, bring all the different scattered databases together under one roof, centralized in one model, and all the data-integration problems (data feeds into databases, ETL processes, and so on) will be a thing of the past as every single codebase now accesses the Grand Unified Data Model.
I have never seen one of these projects actually ship. Other architects have told me that they've had them ship, but when I follow up with people who've been at said companies, the universal story I hear is that once built, the resulting model was so complex and unwieldy that within a short period of time (usually measured in months) it was abandoned and/or fractured into smaller pieces so as to be usable.
The problem here is that different parts of the enterprise care about different aspects of a given "entity". Consider the ubiquitous "Person" type, which is almost always one of the first built in the unified model. Sales cares about the Person's sales history, Marketing cares about their demographic data (age, sex, location, etc.), HR cares about their company-related information (position, department, salary, benefits status, etc.), and Fulfillment (the department that ships your order once purchased) cares about address, credit card information, and the actual order placed.
Now, obviously, trying to keep all of this in one Person entity (the so-called "fat" entity, since it has everything that any possible department could want from it) is going to be problematic over time---if nothing else, fetching a list of all of the Persons from the system for a dropdown will result in downloading orders of magnitude more data than actually required. (This also runs afoul of the "Bandwidth is infinite" and "Latency is zero" and "Transport cost is zero" fallacies of Distributed Systems.) Clients will quickly start caching off only the parts they care about, and the centralized data model is essentially decentralized again.
The next reasonable step is to split Person up into "derived" models, usually (in the relational sense) by creating subsidiary tables for each of the specific parts. This is reasonable, assuming that the cost of doing joins (in the relational sense) across the tables is acceptable. Unfortunately, these sorts of centralized data models are usually supposed to hold the entirety of the enterprise's data in one database, so the costs of doing joins across millions of rows in multiple tables is often prohibitive. But let's leave that alone for a moment.
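Here is a minimal sketch of that split, using Python's built-in `sqlite3` for illustration. The table names and sample rows are invented. The dropdown reads only the core table. Each department pays for a join to reach its own slice of Person, and at enterprise scale those joins are the cost just described.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Core table: only what every department agrees on.
    CREATE TABLE person (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Subsidiary ("derived") tables: one per department's concerns.
    CREATE TABLE person_hr (
        person_id  INTEGER PRIMARY KEY REFERENCES person(id),
        department TEXT,
        salary     INTEGER
    );
    CREATE TABLE person_marketing (
        person_id INTEGER PRIMARY KEY REFERENCES person(id),
        age       INTEGER,
        location  TEXT
    );
    INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO person_hr VALUES (1, 'Engineering', 120000);
    INSERT INTO person_marketing VALUES (1, 34, 'Seattle'), (2, 41, 'Boston');
    """
)

# The dropdown needs only the core table: no joins, no salary data on the wire.
dropdown = conn.execute("SELECT id, name FROM person ORDER BY name").fetchall()

# HR pays for exactly one join to see its slice of Person.
hr_view = conn.execute(
    """SELECT p.name, h.department, h.salary
       FROM person p JOIN person_hr h ON h.person_id = p.id"""
).fetchall()
```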
Where things really start to go awry is that enterprise systems are never monolithic (see the next fallacy), and the code that accesses the centralized data model often needs to be modified in response to "local" concerns; for example, HR may suddenly require that "names" (which are common to the Person core table) be able to support internationalization, but Marketing is right in the middle of an important campaign, and any system downtime or changes to their codebase are totally unacceptable. Suddenly we have a political tug-of-war between two departments over who "owns" the schedule for updates, and at this point, the problem is no longer a technical problem whatsoever. (This is the same problem that sank most centralized distributed systems, too---any changes to the shared IDL or WSDL or Schema have to be ratified and "bought off" by all parties involved.)
Where this falls apart for domain models is right at the edge of the language barrier---a domain model in the traditional DDD sense simply cannot be shared across language boundaries, no matter how anemic. Classes written in C# are not accessible to Java except through tools that will do some form of language translation for local compilation, and these will almost always lose any behavior along the way---only the data types of the fields will be brought along. Which sort of defeats half the point of a Rich Domain Model.
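Here is a small Python illustration of what gets lost at that boundary. JSON stands in for whatever cross-language translation is in use. The `Employee` class and its benefits rule are hypothetical. The fields arrive on the other side, but `eligible_for_benefits` does not, so every consumer has to reimplement it.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Employee:
    name: str
    months_employed: int

    def eligible_for_benefits(self) -> bool:
        # Hypothetical business rule, purely for illustration.
        return self.months_employed >= 3


alice = Employee("Alice", 5)
wire = json.dumps(asdict(alice))  # what a consumer in another language receives
received = json.loads(wire)       # a plain dict: the data survives, the behavior doesn't
```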
While this may have been true in older systems (like, around the mainframe era), often the whole point of an enterprise system is to integrate with other systems in some way, even if just accessing the same database. Particularly today, with different parts of the system being revised at different times (presentation changes but business logic remains the same, or vice versa), it's more important than ever to recognize that the different parts of the system will need to be deployed, versioned, and in many cases developed independently of one another.
This fallacy is often what drives the logic behind building microservice-based systems, so that each microservice can be managed (deployed, versioned, developed, etc.) independently. However, despite the fact that many enterprise IT departments are building microservices, they then undo all that good work by implicitly creating dependencies between the microservices with no mitigating strategy to deal with one or more of those microservices being down or out. This means that instead of explicit dependencies (which might force the department or developers to deal with the problem explicitly), developers will lose track of this possibility until it actually happens in Production---which usually doesn't end well for anybody.
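One common mitigation is to make each inter-service dependency explicit at the call site and to decide in advance what happens when it is down. Below is a minimal Python sketch of a circuit breaker, a widely used pattern brought in here as an example rather than anything prescribed above. The thresholds are arbitrary. The injectable `clock` parameter exists only so the behavior can be tested without waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Stop calling a dependency that keeps failing, and use a fallback instead.

    After `max_failures` consecutive failures the circuit "opens" and calls go
    straight to the fallback for `reset_after` seconds; then a single trial call
    is allowed through ("half-open"), and one more failure reopens it.
    """

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, operation: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()  # open: don't even try the dependency
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = operation()
        except Exception:  # broad on purpose for the sketch; narrow it in real code
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0  # any success closes the circuit again
        return result
```

A call site might look like `breaker.call(lambda: inventory.lookup(sku), lambda: cached_stock(sku))`, where `inventory.lookup` and `cached_stock` are hypothetical. The fallback must be written, and therefore thought about, before Production forces the question.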
The enterprise is a constantly shifting, constantly changing environment. Just when you think you've finished something, the business experts come back with some new requirements or some changes to what you've done already. It's the driving reason behind a lot of the fallacies of both distributed systems and enterprise systems, but more importantly, it's the underlying impetus behind most, if not all, enterprise software development. Enterprise developers can either embrace this, and recognize that systems need to be able to evolve effectively over time---or look for work in other industries.
This means, then, that anything that gets built here should (dare I say "must") be built with an eye towards constant modification and incessant updates. This is partly why agile methodologies have taken the enterprise space with such gusto: as a construction approach, agile embraces the idea that everything is constantly in flux, so it deals far more easily with the idea that the system is never finished.
Alternatively, we can phrase this as "Vendors can make problem 'X' a vendor problem", where 'X' is one of scalability, security, maintainability, flexibility, and just about any other "ility" you care to name. As much as vendors have been trying to make this their problem, for the better part of two or three decades, they've never been able to do so except in some very narrow vertical circumstances. Even in today's cloud-crazed environment, companies that try to take their existing enterprise systems and move them to the cloud as-is (the classic "lift and shift" strategy) are finding that the cloud has nothing magical in it that makes things scale automagically, secures them, or even makes them vastly more manageable than they were before. You can derive great benefits from the cloud, but in most cases you have to meet the cloud halfway---which then means that the vendor didn't make the problem go away, they just re-cast the problem in terms that make it easier for them to sell you things. (And even then, they can only make a few of those problems go away, often at the expense of making other problems more difficult. For an example of how deployments and dependency management got burned, see "npm-Gate".)
Somehow, there seems to be this pervasive belief that if you've done enterprise architecture at company X, you can take those exact same lessons and apply them to company Y. This might be true if every company had exactly the same requirements, but ask any consultant who's been engaged with clients for more than a few years, and you'll find out that the Venn diagram of requirements between any two companies overlaps about 80% or so. But here's the ugly truth behind that number: if we look at the Venn diagram of all the companies, they aren't overlapping on the same 80%---it's always a different 80% between any one company and any other. Which means, collectively, that the requirements shared by all companies amount to maybe 5%. (All accounting systems agree on what credits and debits are, but from there, the business rules tend to diverge.)
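At the code level, one technique that supports this is the "tolerant reader". It is offered here as an example, not the only approach. A tolerant reader is a consumer that takes only the fields it needs, supplies defaults for fields that older senders don't know about, and ignores fields that newer senders add. Here is a minimal Python sketch. The message format and field names are hypothetical.

```python
import json
from typing import Any, Dict


def read_order(raw: str) -> Dict[str, Any]:
    """Tolerant reader: take what we need, default what's missing, ignore the rest."""
    data = json.loads(raw)
    return {
        "order_id": data["order_id"],              # required in every version
        "quantity": int(data.get("quantity", 1)),  # added later; default for old senders
        "currency": data.get("currency", "USD"),   # hypothetical newer field
    }
```

Old senders and new senders can then be upgraded on their own schedules, which is the point when the system is never finished.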
Given that enterprise architecture is highly context-sensitive to the enterprises for which it is being developed, it would stand to reason that enterprise architecture will differ from one company to the next. No matter what the vendor/influencer tries to tell you, no matter how desirable it is to believe, there is no such thing as a "universal enterprise architecture"; not MVC, not n-tier, not client-server, not microservices, not REST, not containers, and not whatever-comes-next.
Enterprise systems come with much higher criticality concerns than the average consumer software product. Consider, for a moment, the average iOS or Android application---if it crashes mid-use, the user is obviously annoyed, and if it happens too often, they might uninstall the application entirely, but no significant monetary loss is incurred to the company. If, on the other hand, the company's e-commerce system crashes, literally thousands of dollars are potentially being lost per minute (or second, if the scale is that of an Amazon or other large-scale e-tailer) until that system gets back on its feet and can start processing transactions again. And that's not counting potential customer service costs or even lawsuits if an order is lost because the system went down mid-transaction and put the data into a corrupted or unrecoverable state. Nor does that consider the intangible costs that come into play when Ars Technica or Forbes or---worst of all---the Wall Street Journal covers the outage in their latest report.
Enterprise systems, by definition, have much higher reliability and recoverability concerns. That means, practically speaking, that any enterprise system must pay much greater attention to how the system is administered, deployed, monitored, managed, and so on. Thanks to the emphasis on the whole "DevOps" thing, this is becoming less of an argument with most developers, but even within companies that don't subscribe to all of the "DevOps" philosophy, developers will need to spend time thinking (and coding) about how operations staff will do all of the things they need to do to the system after its deployment.
For example, one such concern is that of error management and handling. But first, please repeat after me: "It is never acceptable to find out about an enterprise system outage from your users." Why this is not an accepted truth is well beyond me, but countless enterprise systems seem to consider it perfectly acceptable to show their users stack traces when things go wrong, or to worry about restarting the system only when a user complaint informs Operations that it's down.
Yes, vendors can often provide certain kinds of management software to look at the system from the outside---keeping track of processes to make sure they're still running and such---but on the whole, it's going to be up to the developers building the enterprise system to ensure that Operations staff can peer inside the system, confirm that everything is running (and running smoothly), and make the changes necessary (such as adding users, changing users' authorized capabilities, adding new types of things into the system, and so on) without requiring a restart or editing cryptic text files. Management, monitoring, deployment, restarting the system after a failure: these, and more, are all developer responsibilities until the developers provide those capabilities to Operations staff to actually use.
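Here is a minimal Python sketch of the split this implies, using only the standard `logging` and `uuid` modules. The full stack trace goes to the operations log, where something in a real system should be watching and alerting. The user sees a generic message and an incident id that support can correlate. `handle_request` and the logger name are hypothetical.

```python
import logging
import uuid
from typing import Any, Callable, Tuple

log = logging.getLogger("orders")


def handle_request(handler: Callable[..., Any], *args: Any) -> Tuple[int, Any]:
    """Run `handler`; on failure, tell Operations everything and the user very little."""
    try:
        return 200, handler(*args)
    except Exception:
        incident = uuid.uuid4().hex[:8]
        # Full stack trace (log.exception attaches it) goes to the operations log;
        # in a real system something should be watching that log and alerting.
        log.exception("incident %s: unhandled error in %s", incident, handler.__name__)
        # The user gets a generic message plus an id that support can correlate.
        return 500, f"Something went wrong on our side (incident {incident})."
```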
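Here is what "peering inside" might look like at its simplest. `OpsConsole` is a hypothetical in-process object that aggregates health checks. It also lets Operations change a setting at runtime, with validation, instead of editing a text file and restarting. This is a minimal Python sketch. In practice it would sit behind an authenticated admin endpoint or a management interface such as JMX, which this sketch leaves out.

```python
from typing import Callable, Dict


class OpsConsole:
    """Hypothetical in-process admin surface for Operations staff."""

    def __init__(self) -> None:
        self._checks: Dict[str, Callable[[], bool]] = {}
        self.settings: Dict[str, int] = {"max_connections": 50}

    def register_check(self, name: str, check: Callable[[], bool]) -> None:
        self._checks[name] = check

    def health(self) -> dict:
        """Run every check; a check that crashes is reported, not propagated."""
        results: Dict[str, str] = {}
        for name, check in self._checks.items():
            try:
                results[name] = "up" if check() else "down"
            except Exception as exc:
                results[name] = f"error: {exc}"
        overall = "ok" if all(v == "up" for v in results.values()) else "degraded"
        return {"status": overall, "checks": results}

    def update_setting(self, key: str, value: int) -> None:
        """Change a setting at runtime, rejecting unknown keys and bad values."""
        if key not in self.settings:
            raise KeyError(f"unknown setting {key!r}")
        if value <= 0:
            raise ValueError(f"{key} must be positive")
        # Code that reads self.settings sees the new value without a restart.
        self.settings[key] = value
```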
Last modified 24 August 2016