koito17 7 days ago

> Without any back-pressure, the program kept stuffing memory with pending results.

The key phrase in the whole article.

This is the reason I am not fully confident in the "pervasive async" mindset overtaking web development. In JavaScript, pervasive async makes sense, because all functions eventually return control to a central event loop anyway. In environments with real threading, the move from OS threads to "green threads and async everywhere" implicitly gives up the tried-and-tested scheduler of your OS and its mechanisms to handle back-pressure. Developers must now take full responsibility for avoiding buffer bloat.

> they fundamentally change how we think about concurrency limits and resource management.

Virtual threads make concurrency cheaper, but nothing about them (or any other concurrency abstraction) eliminates the need for flow control. By switching to virtual threads, you traded off the safeguards once provided by your (bounded!) thread pool and OS scheduler. It's now your responsibility to put safeguards back in.

As an aside: in Clojure's core.async, channel limits (like 1024 pending puts) are intentionally there as a warning sign that your async pipeline isn't handling back-pressure correctly. This is exactly the "hyperactive downloader with no brakes" problem described in the article. With core.async, you would have eventually hit an exception related to excessive puts on a channel, rather than silently expanding buffers until resource exhaustion.

pron 7 days ago

> By switching to virtual threads, you traded off the safeguards once provided by your (bounded!) thread pool and OS scheduler.

The OS scheduler provides no safeguards here (you can use the thread-per-task executor with OS threads - you'll just get an OOME much sooner), and as the virtual thread adoption guide says, you should replace bounded thread pools with a semaphore when needed because it's just a better construct. It allows you to limit concurrency for different resources in an easy and efficient way even when different threads make use of different resources.

In this case, by switching to virtual threads you get to use more appropriate concurrency constructs without needing to manage threads directly, and without sharing them among tasks.
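
For example, something along these lines (just a sketch; the limit of 100 and the fetch call are placeholders, not a real API):

  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import java.util.concurrent.Semaphore;

  public class LimitedClient {
    // At most 100 threads talk to the downstream service at once; every
    // other virtual thread just blocks on acquire(), which is cheap.
    static final Semaphore PERMITS = new Semaphore(100);

    public static void main(String[] args) {
      try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
        for (int i = 0; i < 10_000; i++) {
          int id = i;
          executor.submit(() -> {
            PERMITS.acquire();            // the back-pressure point
            try {
              return fetch(id);           // the concurrency-limited operation
            } finally {
              PERMITS.release();
            }
          });
        }
      }                                   // close() waits for the tasks to finish
    }

    static String fetch(int id) {
      return "row-" + id;                 // stand-in for the real I/O call
    }
  }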

never_inline 7 days ago

Stupid question: Why not provide a threadpool-like construct which may not necessarily keep threads around but limits their number?

pron 7 days ago

What you usually want to limit isn't the number of threads but the number of threads doing a particular operation, such as accessing some service, so it's more flexible (and efficient) to just guard that particular operation with a semaphore.

The only place where you may want to limit the number of threads is at the entry to the system, but since the semaphore is the construct for limiting the number of anything, you might as well just use that there, too. For example, if the number of requests currently being processed is too high (above the semaphore's number of leases), you don't accept new connections.
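
Roughly, at the entry to the system that looks like this (a sketch; the port and the 10,000 limit are made up):

  import java.io.IOException;
  import java.net.ServerSocket;
  import java.net.Socket;
  import java.util.concurrent.Semaphore;

  public class Acceptor {
    public static void main(String[] args) throws IOException, InterruptedException {
      Semaphore inFlight = new Semaphore(10_000);  // max requests being processed at once
      try (ServerSocket server = new ServerSocket(8080)) {
        while (true) {
          inFlight.acquire();                      // stop accepting until a permit frees up
          Socket socket = server.accept();
          Thread.startVirtualThread(() -> {
            try (socket) {
              handle(socket);
            } catch (IOException e) {
              // log and drop
            } finally {
              inFlight.release();
            }
          });
        }
      }
    }

    static void handle(Socket socket) throws IOException {
      // application logic
    }
  }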

jeroenhd 7 days ago

The number of actual threads is limited, by default to the number of CPU cores the system has. The problem isn't the number of threads itself, but the fact that the threadpool will keep taking on more work, and that it's efficient enough to blow past the CPU+I/O bottleneck. You can achieve the same problem with a threadpool if your threadpool is efficient enough.

Standard concurrency tools like semaphores are capable of reducing the amount of work being done at the same time. You could also simulate classic thread pools by using concurrent lists/sets/etc to replicate traditional work queues, but you'd probably lose any advantage you'd gained from switching to green threads in the first place.

andersmurphy 7 days ago

Good question. The simple answer is that such a construct already exists on the JVM: the semaphore. People just forget it exists. I wrote an article about it a while back[1].

1. Managing throughput with virtual threads (it's in Clojure, but it's using the underlying Java APIs)

https://andersmurphy.com/2024/05/06/clojure-managing-through...

708145_ 7 days ago

Of course it is possible to limit the number of virtual threads. A web server can have a limit on the number of virtual threads too, and queue incoming requests before dispatching them to workers (virtual threads).

As others have said, this can be achieved with a semaphore and the bulkhead pattern. You can also limit the number of connections.

I would expect any "production ready" web server using virtual threads to have some sort of limiting. That must be the case, right?

1718627440 4 days ago

The OS won't schedule you when your write blocks because the pipe is full; the same goes for reading when the pipe is empty.

ndriscoll 7 days ago

Wouldn't there be significant overhead in waking tasks from IO just to put them back to sleep on a semaphore vs. there being fewer threads servicing the IO ring for a given type of task?

mdavidn 7 days ago

Presumably the semaphore would be used to limit the number of concurrent virtual threads.

Incoming tasks can be rejected proactively if a system would overflow its limit on concurrent tasks. This approach uses a semaphore’s non-waiting acquire and release operations. See the “bulkhead pattern.”

Alternatively, if a system has a fixed number of threads ingesting tasks, these would respond to backpressure by waiting for available permits before creating more virtual threads.
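
A minimal sketch of that bulkhead idea with a plain java.util.concurrent.Semaphore (the class name, limit, and exception choice are illustrative):

  import java.util.concurrent.Callable;
  import java.util.concurrent.Semaphore;

  // Reject excess work immediately instead of queueing it.
  class Bulkhead {
    private final Semaphore permits;

    Bulkhead(int maxConcurrent) {
      permits = new Semaphore(maxConcurrent);
    }

    <T> T call(Callable<T> task) throws Exception {
      if (!permits.tryAcquire()) {         // non-waiting acquire
        throw new IllegalStateException("bulkhead full, shedding load");
      }
      try {
        return task.call();
      } finally {
        permits.release();
      }
    }
  }

A request handler would then wrap its downstream call in bulkhead.call(...) and map the rejection to, say, a 503.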

pron 7 days ago

It should be the other way around. The concurrency-limited resource is usually accessed by IO (some service), so a thread would wait and then acquire the semaphore before doing the IO.

ndriscoll 7 days ago

I might just be taking the specific use-cases you're talking about for granted? I tend to think the reason you'd use task-specific thread pools in the first place is because the contended resource is CPU and you want different priorities for different types of tasks. E.g. if you have different types of long-lived connections (e.g. websockets) and want to make sure that even if you have 100 type A connections and 50,000 type B connections, you want ~even CPU to be split between A and B. You could use semaphores, but then you're usually waking a B (in response to some socket IO you were waiting on) just to put it to sleep. It seems like it'd make more sense to use two platform thread pools (with the different thread groups waiting on different sockets)?

Is your advice more about things like object pools or connection pools?

pron 7 days ago

The CPU is only contended once it's at 100%. I am not aware of any interactive server (as opposed to batch processing) that behaves well at 100% CPU (beyond short bursts), so I think that the hope you can actually balance the CPU using the OS scheduler (which isn't all that good) to produce good outcomes at 100% CPU is more myth than reality. So yes, I'm talking about resources accessed via IO, because there isn't all that much you can do about a CPU that's at 100% for long durations other than getting more CPU (horizontally or vertically).

However, when you have some background batch computational tasks of some limited concurrency (otherwise you'd be in trouble) then you should have some low-priority platform threads servicing that. Virtual threads work due to Little's law by allowing a high number of threads. For such low-priority background tasks you probably want a very low number of threads (often one), so the ability to have tens of thousands, or millions, of virtual threads won't help you for those background tasks anyway.
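
For example, a sketch of that background setup (the task is hypothetical; the point is one low-priority platform thread):

  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;

  public class BackgroundBatch {
    public static void main(String[] args) {
      // A single low-priority *platform* thread for background batch work.
      // (Priority is only a hint to the OS scheduler, but it points it in
      // the right direction.)
      ExecutorService background = Executors.newSingleThreadExecutor(runnable -> {
        Thread t = new Thread(runnable, "batch-worker");
        t.setPriority(Thread.MIN_PRIORITY);
        return t;
      });

      background.submit(BackgroundBatch::recomputeReports);
      background.shutdown();               // finish the queued work, then exit
    }

    static void recomputeReports() {
      // hypothetical long-running batch job
    }
  }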

tombert 7 days ago

At my last job I made a thing that took in tens of millions of records from Kafka, did some processing, plopped stuff into Postgres, and moved along. It wasn't hard, but it was difficult to achieve my throughput goals with regular blocking JDBC.

Eventually I moved to an async, non-blocking model, and for my initial tests it went great: stuff returned almost immediately and I could grab the next one. The problem is that, sort of by definition, non-blocking code doesn't know right away when it's going to complete, so it just kept enqueuing work until it ate all my memory.

I figured out pretty early on what was happening, but the other people on my team really seemed to struggle with the concept of backpressure, I think because they weren't really used to a non-blocking model. You kind of get backpressure for free with blocking code.

Eventually I was able to solve my problem using a BlockingQueue.
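
The shape of that fix, roughly (a sketch; the queue size and the record handling are made up):

  import java.util.concurrent.ArrayBlockingQueue;
  import java.util.concurrent.BlockingQueue;

  public class Pipeline {
    public static void main(String[] args) throws InterruptedException {
      // A bounded buffer between the consuming side and the writing side.
      // put() blocks when it is full, so the producer slows down instead
      // of piling pending work up in memory.
      BlockingQueue<String> queue = new ArrayBlockingQueue<>(1_000);

      Thread producer = Thread.startVirtualThread(() -> {
        try {
          for (int i = 0; i < 100_000; i++) {
            queue.put("record-" + i);      // blocks when the queue is full
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });

      Thread consumer = Thread.startVirtualThread(() -> {
        try {
          for (int i = 0; i < 100_000; i++) {
            writeToDatabase(queue.take()); // blocks when the queue is empty
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });

      producer.join();
      consumer.join();
    }

    static void writeToDatabase(String record) {
      // stand-in for the JDBC insert
    }
  }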

PaulKeeble 7 days ago

It's not really any different from the memory limitations we have always dealt with; it just consumes memory faster than we might expect. The techniques we use to ensure we don't run out of memory also apply to anything that can create threads: we have to put limits on how many can exist, especially since they consume quite a bit of memory even in their virtual-thread variant.

wahern 7 days ago

The OOM error is the back pressure. This is why some engineers working on highly concurrent frameworks insist on pervasive OOM recovery.

I haven't done any Java in years, but I always thought OOM errors were recoverable in the JVM. Or is this just a case where the author never thought to catch OOM exceptions? My instinct would be to catch OOM early (i.e. in the entry function) in a virtual thread and re-queue the task. In theory re-queueing might fail, too, I guess, but in practice probably not.

This is why I like to code in C (and Lua) for this kind of thing, as in C my core data structures like lists and trees don't require additional allocations for insert. My normal pattern here would be to have, say, three job lists--todo, pending, complete; a job would always exist in one of those lists (presuming it could be allocated and initialized, of course), and no allocations would be required to move a job back from pending to todo on an OOM error during processing.

koito17 7 days ago

In Java, there are two kinds of Throwable instances[1]: Error and Exception. As the name suggests, OutOfMemoryError is a subclass of Error. In contrast to Exception, an Error "indicates serious problems that a reasonable application should not try to catch"[2]. For this reason, it's considered bad practice in Java to catch all Throwable instances (or catch Error instances explicitly).

> My instinct would be to catch OOM early

OutOfMemoryError subclasses VirtualMachineError, and when a VirtualMachineError is thrown, your program seems to be firmly in undefined behavior territory. Quoting the JVM specification [3]:

  A Java Virtual Machine implementation throws an object that is an instance of a subclass of the class VirtualMachineError when an internal error or resource limitation prevents it from implementing the semantics described in this chapter. This specification cannot predict where internal errors or resource limitations may be encountered and does not mandate precisely when they can be reported.

For context: "this chapter" refers to the whole chapter describing the behavior of each JVM instruction. The specification seems to suggest that all bets are off on any guarantees the JVM makes by the time a VirtualMachineError is thrown.

[1] https://docs.oracle.com/en/java/javase/21/docs/api/java.base...

[2] https://docs.oracle.com/en/java/javase/21/docs/api/java.base...

[3] https://docs.oracle.com/javase/specs/jvms/se21/html/jvms-6.h...

fulafel 7 days ago

Is this a hole in the safety guarantees?

PhilipRoman 7 days ago

Safety can mean many things. I'm fairly sure JVM will keep the C-level memory safety guarantee after an OOM, but you can still end up with Java objects in inconsistent states that will start throwing exceptions when you poke them.

The reason why these exceptions are so "unsafe" is that they can occur at any moment. Maybe you were in the middle of modifying some data structure which will enter an infinite loop the next time you touch it. It's a bit like signals in C where you have to be extremely careful about the APIs you use, except worse because exceptions unwind the stack. At least in C you can still safely call kernel syscalls regardless of userspace state.

fulafel 7 days ago

Yep, it sounds ambiguous. If the semantics potentially allow unsafety, that alone would be interesting. It would matter even more if the specification's "all bets are off" stance on OOM can create conditions where an adversary gains control of execution in some way, like can happen with synchronisation bugs on some runtimes (e.g. Go).

unscaled 7 days ago

You can catch OutOfMemoryErrors in Java, but it wouldn't be as simple as you describe. In a real-world program, you may be doing other things besides downloading URLs and feeding the URLs to the download scheduler. Even though the URL downloader is what's causing the memory pressure, an OutOfMemoryError may very well be thrown by an allocation anywhere else. _You only have one shared heap_, so unless there is only one single thing being allocated on the heap, you cannot use an OutOfMemoryError as a back-pressure signal.

Coming from C, you might think you can just avoid heap allocations for anything that you don't manage with backpressure, but that doesn't work in Java, since Java can only store primitives and references on the stack (at least until project Valhalla is delivered[1]). Almost everything you do triggers a heap allocation.

Virtual threads make this even worse, since their "stack" is also stored on the heap and can grow dynamically, requiring more heap allocations even if you just push a primitive value onto the stack!

tl;dr: No, you cannot rely on catching OutOfMemoryError in Java in anything but the simplest scenarios; things break down when you have multiple threads, and especially virtual threads.

[1] https://openjdk.org/projects/valhalla/

1718627440 4 days ago

> You only have one shared heap

Can't you use mmap then?

> In a real-world program, you may be doing other things besides downloading URLs and feeding the URLs to the download scheduler. Even though the URL downloader is what's causing the memory pressure, an OutOfMemoryError may very well be thrown by an allocation anywhere else.

But upon that OOM, can't you still release the memory from the downloader so that you have memory for your other allocation?

3cats-in-a-coat 7 days ago

Everything, always, returns to a main event loop, even the actual hardware threads on your CPU. So JavaScript developers don't get a pass on this.

The problem is you keep creating async requests, they need to be stored somewhere, and that's in memory. Thread or no thread, it's in memory.

And backpressure is essential, yes. Except for basic tasks.

But there is another solution, which is coalescing requests. For example, in my current project I combine thousands of operations, each of which could have been its own async request, into a single "unit of work", which then results in a single async request for all operations combined.

If we treat IO seriously, we won't just fire off requests into the dark by the hundreds. We'd think about it, and batch things together. This can be done manually, but it's best done at a lower level, so you don't have to think about it at the business logic level.
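
A rough sketch of that coalescing idea (all names and sizes here are illustrative):

  import java.util.ArrayList;
  import java.util.List;
  import java.util.concurrent.ArrayBlockingQueue;
  import java.util.concurrent.BlockingQueue;

  // Collects individual operations and flushes them as one combined request.
  // flushBatch() stands in for whatever single IO call the batch maps to.
  public class Coalescer {
    private final BlockingQueue<String> pending = new ArrayBlockingQueue<>(10_000);

    public void submit(String op) throws InterruptedException {
      pending.put(op);                     // bounded, so callers feel back-pressure
    }

    public void runBatcher() throws InterruptedException {
      List<String> batch = new ArrayList<>();
      while (true) {
        batch.add(pending.take());         // wait for at least one operation
        pending.drainTo(batch, 999);       // then grab up to 1,000 in total
        flushBatch(batch);                 // one request for the whole unit of work
        batch.clear();
      }
    }

    private void flushBatch(List<String> batch) {
      // single combined IO request
    }
  }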

jandrewrogers 7 days ago

Proper async architecture requires designing a scheduler appropriate for your workload. Many (most?) async applications ignore or skip this part and try to delegate it to a runtime or the OS, usually with mediocre results. Designing a robust scheduler usually requires excellent visibility into the real-time distribution of available resources so that you can dynamically schedule load away from the resources under the most pressure at every instant in time. If you aren’t going to do that work, async may not be the best architecture choice.

pron 6 days ago

First, this is true in any architecture. Second, in Java you can do just that with the use of semaphores guarding different resources with a different number of leases.

The one resource you can't easily schedule well is CPU, but that's not a big problem because interactive servers don't work well when the CPU needs to be scheduled (i.e. when it's at 100%). Again, this is true regardless of architecture.

Back when we first released virtual threads, many asked us why we didn't make the default virtual thread scheduler offer time-sharing (with or without priorities). The answer was that we wanted to, but were unable to find workloads for the intended domain of the default scheduler (interactive servers) where any kind of CPU scheduling could help (we said that when somebody finds such real workloads, we'll add fairer CPU scheduling to the default scheduler, assuming there was a good algorithm that would address such workloads).

jandrewrogers 6 days ago

This is kind of what I am talking about though. The whole point of async scheduling is that semaphores, locking, etc are unnecessary because you strictly control the CPU schedule and can dynamically rewrite the schedule such that the need for such mechanisms largely doesn't exist. This provides a qualitative performance boost for quite a few analytics and other workloads, which is the only reason we do it (since it is considerably more complex to implement).

If you aren't bothering to write a real scheduler then it becomes less clear why you would bother with true async. Lots of hassle for not much payoff.

wyago 7 days ago

This feels analogous to saying that static memory management might be superior, because if dynamic memory is available developers can trivially produce an OOM.

It's possible we'll start borrowing more patterns from Erlang/OTP since it's been living in a post-greenthreads world for as long as many languages have existed, and has developed patterns around it.