Databases – persistence layer or primary entity?

A couple of weeks ago I was in San Francisco talking with Ken Orr and my old high school/college friend Stan Switzer. Stan was holding forth on an idea that I think is pretty self-evidently true.

The idea is that, when considering software systems, the database is the primary entity, and any applications, interfaces, etc., are secondary to it. This contrasts with a certain viewpoint in “software engineering” (more later on that) that a database acts as a sort of “persistence layer” for your software, keeping data across uses, systems, etc.

Though I think you can certainly treat a database as a persistence layer, I think that Stan’s argument is generally valid. In particular, I note that database schemas are notoriously difficult to update, because all the software that uses them must also be updated. For the same reason the data formats themselves tend to become rigid over time and harder to change.
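To make the schema point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `customer`, the column names, and the helper functions are all hypothetical, invented for illustration. The application code is written against one column name, and it stops working as soon as the schema calls that column something else.

```python
import sqlite3


def make_db(name_column):
    """Build an in-memory database whose name column is called `name_column`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE customer (id INTEGER PRIMARY KEY, {name_column} TEXT)"
    )
    conn.execute(f"INSERT INTO customer ({name_column}) VALUES ('Ada')")
    return conn


def report_names(conn):
    """Hypothetical application code, written against the original schema."""
    return [row[0] for row in conn.execute("SELECT name FROM customer")]


def try_report(conn):
    """Run the report and describe the failure instead of crashing."""
    try:
        return report_names(conn)
    except sqlite3.OperationalError as exc:
        return f"broken: {exc}"


old_schema = make_db("name")       # the schema the code was written for
new_schema = make_db("full_name")  # the same data after a column rename

old_result = try_report(old_schema)
new_result = try_report(new_schema)

print(old_result)
print(new_result)
```

Multiply that one query by every report, batch job, and interface that touches the table, and the cost of a schema change becomes clear.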

We saw this with the whole Y2K situation, where suddenly the fact that for decades our databases had only stored two digits for a year became important. The people who designed the old software were aware that this would be an issue at the turn of the century, but thought that the software would all be obsolete and replaced by then.

And most of the software was replaced, but the underlying data was not.
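A small sketch of why the stored format mattered more than the code around it. The function names and the pivot value of 50 are assumptions chosen for illustration. The second function shows "windowing", one common remediation that reinterprets the two stored digits rather than changing the data.

```python
def naive_age(birth_yy, current_yy):
    """Age computed directly from two-digit years, as stored."""
    return current_yy - birth_yy


def expand_year(yy, pivot=50):
    """Windowing: map a two-digit year onto a four-digit one.

    The pivot is an assumed cutoff: values at or above it are read as
    19xx, values below it as 20xx.
    """
    return 1900 + yy if yy >= pivot else 2000 + yy


def windowed_age(birth_yy, current_yy, pivot=50):
    """Age computed after expanding both two-digit years."""
    return expand_year(current_yy, pivot) - expand_year(birth_yy, pivot)


# Someone born in 1965, stored as 65.
age_in_1999 = naive_age(65, 99)             # fine while both years are 19xx
age_in_2000_naive = naive_age(65, 0)        # the rollover makes this negative
age_in_2000_windowed = windowed_age(65, 0)  # correct, given the assumed pivot

print(age_in_1999, age_in_2000_naive, age_in_2000_windowed)
```

Note that windowing only moves the problem: the data still holds two digits, and the pivot is one more assumption that every program reading it has to share.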

Stan pointed to Hibernate as an example of a “persistence layer” that didn’t really believe its own hype, but knew it had to talk the language of the persistence layer folks. Hibernate actually does what you want: it mediates between software and a database such that many crufty database details can be avoided, but without the inherent limitations of APIs and frameworks that are simply persistence layers.

I suppose I’ve always intuitively understood this. Data representation and database design are always done much earlier than any code considerations when I’m working on a project that has much in the way of data.

2 Comments

  1. Rick Campbell says:

    I second the motion. The database is IT, in the sense that computation in the absence of data is moot. I currently work in web development, and some of my projects are comprised of pages without data “behind” them, but I’ve been doing this long enough (since 1979) to know that EVERYTHING we programmers work on is… er… DATA. Source files are in and of themselves data, which is read and acted upon by a system (the compiler or interpreter). The system that reads and acts upon those sources is data. Data is it. It’s all we have, and to suppose that computation can occur in the absence of data is to suppose that a cup’s being can be fulfilled without ever knowing water.

  2. Yes, we’re contemporaries in programming.
