catlifeonmars 2 days ago

> OpenAI routes API requests to servers that recently processed the same prompt,

My mind immediately goes to rowhammer for some reason.

At the very least, this opens up the possibility of a targeted denial of service.

xmprt 2 days ago

Later they mention that there's some rate limiting: if a server is processing more than ~15 requests per minute, new requests get routed to a different server. I guess you could deny cache usage that way, but I'm not sure what isolation they have between different callers, so maybe even that won't work.
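To make the overflow behavior concrete, here's a toy sketch of cache-affinity routing with a spill threshold. Everything here (server names, the hash-on-prefix choice, the spill policy) is an assumption for illustration, not OpenAI's actual implementation; only the ~15 req/min figure comes from the article.

```python
import hashlib

SERVERS = [f"server-{i}" for i in range(8)]  # hypothetical fleet
OVERFLOW_THRESHOLD = 15  # requests/min, per the article
load = {s: 0 for s in SERVERS}  # hypothetical per-server request counter

def route(prompt_prefix: str) -> str:
    # Hash the prompt prefix to pick a "sticky" server, so repeated
    # prompts land where the cache is warm (assumed policy).
    h = int(hashlib.sha256(prompt_prefix.encode()).hexdigest(), 16)
    preferred = SERVERS[h % len(SERVERS)]
    if load[preferred] < OVERFLOW_THRESHOLD:
        load[preferred] += 1
        return preferred
    # Over the threshold: spill to the least-loaded server,
    # which likely means a cache miss for this prompt.
    spill = min(SERVERS, key=load.get)
    load[spill] += 1
    return spill
```

Under this toy model, the first 15 requests with the same prefix stick to one server and the 16th spills elsewhere — which is the denial-of-cache angle being discussed.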

catlifeonmars 2 days ago

So the doc mentions you can influence the cache key by passing an optional user parameter. It’s unclear from the doc whether the user parameter is validated or if you can just provide an arbitrary string.

catlifeonmars 2 days ago

15 requests/min is pretty low. Depending on how large the fleet is, you might end up getting load-balanced to the same server anyway, and if it's round robin then the assignment would be deterministic.
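The determinism point is easy to see with a toy round-robin balancer (all names hypothetical; this is just the textbook scheme, with no claim about what OpenAI actually runs):

```python
from itertools import cycle

class RoundRobinBalancer:
    """Toy balancer: hands out servers in a fixed, repeating order."""

    def __init__(self, servers):
        self._it = cycle(servers)

    def next_server(self) -> str:
        return next(self._it)

# With a fleet of N servers, every N-th request lands on the same server,
# so anyone who can count requests can predict the assignment.
lb = RoundRobinBalancer([f"s{i}" for i in range(4)])
picks = [lb.next_server() for _ in range(8)]
```

Here `picks[0] == picks[4]`, `picks[1] == picks[5]`, and so on — the pattern an attacker could exploit to keep hitting (or avoiding) one box.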