Item 44077377

semiquaver • 7 days ago

I’m confused. Isn’t including the canonical state of the database schema in version control along with all the migrations that brought it to that point a completely standard part of every web framework?

xyzzy123 • 7 days ago

It often is, but the schema might be written against a language-specific ORM.

That code might in turn have plugins or feature flags that mean you don't know the concrete sql schema until runtime.

Same for seed data and migrations.

So it depends on the use-case how useful this format is for tooling and discovery vs an actual connection to the database.

1 reply

tough • 6 days ago

Sane ORMs will still use sql for migration files

2 replies

cortesoft • 6 days ago

There are a lot of advantages to using a DSL for migration files rather than pure sql. For one, many times you will get both forward and reverse migration ability automatically, to allow rollbacks (with the ability to mark certain migrations as unable to rollback). You also can have the DSL utilize different features depending on which database you are using without having to change your migration. You can also use special syntax to designate relations between tables easily without having to write the boilerplate SQL every time.

If you are using an ORM, using the same language and mental model to describe your database structure as you do to interact with the database makes a lot of sense.

OJFord • 6 days ago

I agree it's much better to, but the biggest ones are probably Django & Rails (?), and they don't.

1 reply

tough • 6 days ago

yeah tbh i was mostly thinking about the newer crop of ts/js ones (kysely/drizzle) vs the earlier mess that TypeORM or others where on its place, so at least its not so bad.

jen20 • 6 days ago

The _canonical_ state is what is in the production database(s). What's in version control is hopefully able to recreate it, with the obvious caveat of being highly unlikely to be able to repopulate the data.

1 reply

dotancohen • 6 days ago

For development, the actual data itself is not so important, but the features of the data are extremely important. Such as which fields have higher or lower cardinality, which fields are accessed often and which are barely touched.

Often times, the indexes will reflect this. Often times, not.

cerved • 7 days ago

I'm more confused why the version control of the thing using the database is including the entire schema of the database in it's repository

2 replies

schrodinger • 7 days ago

The database schema of an app is tightly coupled enough to essentially be code, and migrations are also probably checked in. This lets you see how the db schema changes over time -- likely along with the queries using the schema.

1 reply

int_19h • 6 days ago

This is only true when your app is the only thing that is using the database. In enterprise environments, databases are frequently shared between many different apps.

1 reply

mnahkies • 6 days ago

Sure, and in those cases you'd typically have a dedicated repository storing said schema and migrations.

It's important to manage your schema in code for various reasons from change control to standing up development databases, etc.

koolba • 7 days ago

It makes it trivial to have pretty diffs of the net result of migrations.

wredcoll • 7 days ago

That "works" for about as long as you have <10 employees and <3 customers or so. After that the railsapp doesn't get to be the sole owner of the db.

3 replies

bigfatkitten • 6 days ago

Providing access to other services is what APIs are for.

Jeff Bezos famously said[1] that anyone who does otherwise should be fired, and I agree.

[1] https://news.ycombinator.com/item?id=18916406

3 replies

bsaul • 6 days ago

the reason he did that was for business reasons. He wanted to be able to expose any part of the stack as a public service to external customers, and vice versa, to let his internal service compete against the publicly available ones.

But this is only valid when you're trying to build AWS. Not everyone does that.

Relational databases have extensive permission systems for a reason.

1 reply

cowsandmilk • 6 days ago

The distributed computing manifesto for Amazon is from 1998. Jeff was not thinking about those things then. And wouldn’t for 5 more years.

int_19h • 6 days ago

SQL is also an API.

dragonwriter • 6 days ago

> Providing access to other services is what APIs are for.

Yeah, and databases expose extensive APIs with extremely battle-tested security and permissions models for multiple consumers.

1 reply

bigfatkitten • 6 days ago

That’s why your bank lets you connect directly to their database to see your transaction history.

YorickPeterse • 7 days ago

It worked fine for GitLab when it had 2000+ employees and god knows how many customers. The same applies to many other large Rails shops.

sbarre • 7 days ago

Do you have multiple separate apps that can change a shared DB schema?

How do you keep that all in sync across your apps?

2 replies

scott_w • 6 days ago

So usually one is the “main” but, personally speaking, I’d just say “don’t do it,” for the obvious issues I suspect you’re aware of ;-)

By “don’t do it,” I mean having multiple apps talk to one DB schema.

1 reply

throwaway7783 • 6 days ago

"don't do it" is the right answer. Others have pointed it out as well, many large SaaS companies I worked with, have had apps owning their databases. Anyone else needs anything - use APIs (and ETL if you need everything)

1 reply

ecb_penguin • 6 days ago

It's definitely not the right answer. It's actually the completely wrong answer.

Services are slow, restrictive, and don't enjoy the benefits of an actual DBMS, like transactions. You also add additional dependencies and failure points.

1 reply

scott_w • 5 days ago

This isn’t a discussion of whether you should use services or not, only what to do in the case where you do have separate services interacting with a schema. And, in this case, trying to maintain consistency with multiple apps accessing the DB directly can quickly turn into a nightmare. Better to maintain separate services interacting over some contract to insulate multiple codebases from the internals of each other.

1 reply

ecb_penguin • 3 days ago

> Better to maintain separate services interacting over some contract to insulate multiple codebases from the internals of each other.

All you've done is shifted the dependency and made extra work when schemas change. Now we have schema updates, _and_ service updates.

As I said, you give up transactions, performance, and flexibility. You also increase the workload, and increase failures. There is almost no good reason for what you're proposing.

2 replies

throwaway7783 • 3 days ago

I am not sure I understand. The context was multiple applications managing/accessing the same database. No matter how you slice it, a schema update is accompanied by application updates. If anything, in a service oriented approach, you can potentially isolate the impact by versioning APIs and writing compatibility logic in a single place.

1 reply

ecb_penguin • 3 days ago

> a schema update is accompanied by application updates

1. DB -> App = Two updates, the schema and the app

2. DB -> Service -> App = Three updates, the schema, the service, and the app

In both cases, obviously the DB changes.

In both cases, obviously the App changes.

In the second case, you also get to update your services.

1 reply

scott_w • 2 days ago

No, it’s not obvious the DB changes because the team depending on that schema doesn’t necessarily talk to your team much, and you might not even know that team exists.

The same is true for API contracts but the culture of changing an API is much more understood as something that must be communicated and processes are more rehearsed.

1 reply

ecb_penguin • 2 days ago

> No, it’s not obvious the DB changes because the team depending on that schema doesn’t necessarily talk to your team much, and you might not even know that team exists.

Literally everything you just said applies to services and API contracts as well. You haven't solved anything.

> the culture of changing an API is much more understood as something that must be communicated and processes are more rehearsed

You're just making this up. There are thousands of incidents every day because someone changed an API without communicating to every stakeholder.

"Services are better than databases because service owners are better at communicating changes than database owners" isn't a very compelling argument.

I'd work to fix the communication problem before I adopted a terrible tech solution.

2 replies

scott_w • 2 days ago

>> the culture of changing an API is much more understood as something that must be communicated and processes are more rehearsed

> You're just making this up. There are thousands of incidents every day because someone changed an API without communicating to every stakeholder.

What you wrote isn't a contradiction of what I wrote. I wrote "more understood" not "perfectly solved problem."

> "Services are better than databases because service owners are better at communicating changes than database owners" isn't a very compelling argument.

Except they are because the tooling is better. I can check calls to an API endpoint using the same tooling that I would for checking end user API calls. I can push out comms to use the new API, I can confirm calls go to 0 before deleting the old endpoint.

For checking that a table or column is being accessed before changing / deleting it? Fuck, I don't actually know how I'd orchestrate that. And once I've made that change (vs when I make an API change), well, I hope I got it 100% correct because reverting that change is harder.

throwaway7783 • 2 days ago

It is more that services allow more backward compatibility than direct DB changes, because you can also code additional logic to maintain that.

Of course in a narrow case where all apps that need the same DB need to be upgraded/fixed in lockstep, maybe you can get away by directly updating the database. But more than 2 teams, it will quickly be a nightmare of conflicting priorities across teams, amount deployment coordination required.

"Oh you cannot change this schema right now because 3 other apps depend on it and they dont have time to make the necessary changes in their app to accommodate this right now, for the next 6 months"

scott_w • 2 days ago

Except if your schema changes in a way that forces your API contract to change, you have a big red alarm to tell you to investigate what’s using that contract and update it too.

When your apps connect to a DB schema, the change is silent. You just run ALTER TABLE and… whoops, you just brought down a service you didn’t know existed because the contract was implicit.

int_19h • 6 days ago

In situations like these, the database admins are the ones responsible for the schema; the apps are mere users.