> Cursor and OpenAI are powered by a single-box Postgres instance. You’ll be just fine.
Well no, not according to your own source:
> This setup consists of one primary database and dozens of replicas.

Are they just fine?

> There have been several instances in the past where issues related to PostgreSQL have led to outages of ChatGPT.

OK, but let's pretend it's acceptable to have outages. It's fine apart from that?

> However, "write requests" have become a major bottleneck. OpenAI has implemented numerous optimizations in this area, such as offloading write loads wherever possible and avoiding the addition of new services to the primary database.
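For what it's worth, "offload writes and keep everything else off the primary" usually shows up in application code as read/write routing. Here's a minimal sketch of that idea, assuming psycopg2 and made-up DSNs and table names (not OpenAI's actual setup):

```python
# Minimal sketch of primary/replica routing in application code.
# DSNs, table, and schema are hypothetical.
import random
import psycopg2

PRIMARY_DSN = "host=pg-primary dbname=app"        # hypothetical
REPLICA_DSNS = [
    "host=pg-replica-1 dbname=app",               # hypothetical
    "host=pg-replica-2 dbname=app",
]

def connect_for(write: bool):
    """Send writes to the primary, spread reads across replicas."""
    dsn = PRIMARY_DSN if write else random.choice(REPLICA_DSNS)
    return psycopg2.connect(dsn)

# Reads never touch the primary's write path...
with connect_for(write=False) as conn, conn.cursor() as cur:
    cur.execute("SELECT id, status FROM orders WHERE user_id = %s", (42,))
    rows = cur.fetchall()

# ...and only genuine writes contend for it.
with connect_for(write=True) as conn, conn.cursor() as cur:
    cur.execute("UPDATE orders SET status = 'shipped' WHERE id = %s", (7,))
```

The trade-off, of course, is replication lag: anything that must read its own write still has to go to the primary.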
I feel that! I've been part of projects where we finished building a feature but didn't let customers have it, because it affected the write path and broke other features. It's been less than a week since someone at my company posted in Slack, "we tried scaling up the db (Azure MSSQL) but it didn't fix the performance issues."
I don't understand why that's an acceptable answer when people don't understand the nature of the performance issue.
Network round trip? Scaling the instance ain't gonna help. Row by agonizing row? Maybe some linear speedups as you get more IO, but cloud storage is pretty fucking slow. Terrible plan/table/indexing/statistics? Still gonna be bad with more grunt. Blocking, locking, and deadlocking the problem? Speeding up might make it worse :)
If people have exponential problems, they don't think "let's just get more machines", they think "let's measure and fix the damn thing" — but for some reason it doesn't apply to most people's databases.
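To make the "measure and fix" point concrete: often the fix isn't a bigger box, it's removing per-row round trips and letting the database do the looping. A rough sketch of the contrast, assuming psycopg2 and a hypothetical `prices` table:

```python
# "Row by agonizing row" vs. one set-based statement.
# DSN, table, and columns are hypothetical.
import psycopg2

conn = psycopg2.connect("host=pg-primary dbname=app")  # hypothetical DSN

# RBAR: one network round trip per row. A bigger instance doesn't shrink
# per-round-trip latency; 100k stale rows is still 100k round trips.
with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT id FROM prices WHERE updated_at < now() - interval '1 day'"
    )
    for (row_id,) in cur.fetchall():
        cur.execute("UPDATE prices SET stale = true WHERE id = %s", (row_id,))

# Set-based: one round trip, one plan, and the looping happens server-side.
with conn, conn.cursor() as cur:
    cur.execute("""
        UPDATE prices
           SET stale = true
         WHERE updated_at < now() - interval '1 day'
    """)

conn.close()
```

And for the "terrible plan" case, running `EXPLAIN (ANALYZE, BUFFERS)` on the slow statement is usually the first measurement worth taking before anyone reaches for a bigger instance.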
> but for some reason it doesn't apply to most people's databases.
It’s because RDBMS effectively hasn’t changed in decades, and so requires fundamental knowledge of how computers work, and the ability to read dense technical docs. If those two clauses don’t seem related, go read the docs for HAProxy, or Linux man pages, or anything else ancient in the tech world. It used to be assumed that if you were operating complex software, you necessarily understood the concepts it was built on, and also that you could read dozens of pages of plaintext without flashy images and effects.
That’s not to say that all modern software assumes the user is an idiot, or has terrible docs – Django does neither, for example.
> Network round trip? Scaling the instance ain't gonna help. Row by agonizing row? Maybe some linear speedups as you get more IO, but cloud storage is pretty fucking slow.
See previous statement re: fundamentals. "I need more IOPS!" You have 1 msec of read latency; it doesn't matter how quickly the data comes off the disk (never mind that the query probably runs in a single thread), you still have the same bottleneck.
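A quick back-of-the-envelope for that last point, assuming the 1 ms figure is the latency a single-threaded query actually waits on per read:

```python
# A single-threaded workload that waits ~1 ms per read is capped near
# 1,000 reads/second, regardless of how many IOPS the storage can deliver.
read_latency_s = 0.001          # ~1 ms per read (the quoted figure)
threads = 1                     # the query runs in a single thread
max_reads_per_sec = threads / read_latency_s
print(max_reads_per_sec)        # 1000.0 -- "more IOPS" doesn't move this
```

The ceiling only moves if you cut the latency itself (better plan, fewer reads, caching) or add concurrency, not by provisioning faster storage underneath the same serial wait.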