In 2006 I was on a rescue project. We were required to integrate with a legacy database for reference data, and at the heart of it sat a single table with over 300 columns. The only things that really knew what those columns meant were the applications that had been writing to it for years. We delivered the project. It was successful. It was also significantly painful — and, being honest, we did not leave the database in better shape than we found it. The boy-scout rule was not applied.
It is not an unusual story.
The shape of the database I keep meeting
Twenty years on, the same kind of database keeps turning up. Ten or fifteen years old. Two or three applications writing to it. Lightly normalised. A handful of different naming conventions, depending on which era of the codebase added the table. Few foreign key constraints. Almost every column nullable. Obvious functional dependencies between columns that the schema has nothing to say about.
None of that is necessarily a mistake. Frameworks like Rails actively encourage "let the application manage the database integrity", and that's a principled position. The trouble is what happens later: the application that was supposed to be managing the integrity gets rewritten twice, the people who knew the integrity rules move on, and you inherit a schema whose rules are now almost entirely implicit.
The questions you find yourself asking
When you're learning one of these databases, the same kinds of questions come up repeatedly, and they tend to cascade. Imagine you've just been handed a table with this lovely cluster of columns:
valor_typ, valor, valor_typ2, valor2, valor_typ3, valor3
You start with one nullable column and end up with five questions:
- Does
valorreally need to be nullable, or was it just declared that way because the framework defaults to nullable? valor_typlooks like an enumeration — what do the integer values mean?valoris avarchar, but it doesn't look like prose. What's its format? Does the format change depending on the value ofvalor_typ?- Do the numbered siblings —
valor_typ2/valor2,valor_typ3/valor3— follow the same pattern? Are they always populated together? Never populated together? Independently nullable? - What does
valormean anyway?
Answering any one of these is straightforward. You write a query, you read some application code, you ask the person on Slack who's been here longest. The problem isn't that the questions are hard. The problem is that the answers go nowhere. They live in a Slack thread, or in your head, until somebody else asks the same question six months later and starts over.
An idea
I started turning over a question: what if teams could simply document what they discovered about a database as they went? Not as a process. Not as a separate wiki that drifts out of date the moment it's written. Just a place to put a note about a column, attached to that column, that anybody else looking at it would find.
And then a second question: what if that documentation was validateable? If you write down "this column is never null", something somewhere should be able to check whether that's still true tomorrow. If you write down "these two tables have a 1:* relationship", the data ought to be able to confirm or contradict it.
That's where Quellery began.
A train project
It got built on commuter trains. An hour of train each way is a surprisingly productive amount of time, and over the years Quellery has gone through five rewrites, with multi-year gaps between them while work, family, and the occasional bout of boredom got in the way.
Most of those rewrites weren't because the previous attempt was wrong. The risk for a train project is that it becomes a trainING project — and for several years, that's exactly what it was. I wanted to learn a new programming language, or web framework, so I rewrote Quellery to learn whatever it was.
So what I have had for most of those years was almost a product, but never quite enough to release.
But eventually, the draw of actually delivering a product became too great, and here it is!
A note about quality
Before going further, a disclaimer. I don't really believe in objective quality. If a system has been maintained in production for a decade or more, it unquestionably has an admirable quality, no matter what anybody thinks of its code. The quality perspective I take in Quellery — the one that drives the validators and the migration generator — is specifically the perspective of the Relational Model of Data. It's a particular point of view, and not the only one worth having.
As Jack Handy more or less observed, it's easy to sit there and criticise. And I guess that's what I like about it. It's easy. Just sitting there, rocking back and forth, criticising.
What Quellery does
Quellery reads your live schema and lets you build a conceptual model on top of it. The model documents what you understand: nullability, keys, relationships, functional dependencies, null cohorts, naming conventions, and the rules for the values inside individual columns and between columns. Each piece is validated against the live data, continuously. When the model is ready, Quellery generates the SQL migration scripts to bring the schema in line — NOT NULL, foreign keys, unique indexes, check constraints, and table extractions where they make sense.
Crucially, the model and the schema are allowed to disagree. That's the whole point. A model isn't a migration; it's a hypothesis you keep testing. The disagreement is how you find out which assumptions are wrong and which constraints are safe to add.
It works on its own — one developer, an unfamiliar database, an afternoon of exploration — and it works better with a team, because the model becomes shared institutional memory rather than the contents of any one person's head.
That's the why. If you'd like to see the how, the Get Started page is the fastest way in.