thangalin 18 hours ago

Fun story.

The Ministry of Education was using MS Gothic for printed student transcripts. To help students send transcripts directly to post-secondary schools, the Ministry wanted to shift from paper to digital copies. This meant producing a PDF file that had like-for-like characteristics with the printed copy.

Legally, Microsoft requires licensing MS Gothic if the font is used in server-side generated documents. I raised this issue with the Ministry as part of my work recreating the transcripts. MS Gothic proved to be cost-prohibitive, so I suggested they use Raph Levien's unencumbered Inconsolata Zero instead, which is a near-perfect drop-in replacement for MS Gothic and drew inspiration from Letter Gothic.

Now, the stakeholders for the Ministry of Education are extremely protective of the transcript format, and there was a subtle but important difference: the Ministry wanted an undecorated zero, whereas Inconsolata Zero's is slashed. That would not fly with the Ministry.

I, a complete stranger, emailed Raph. The next day, he asked Alexei Vanyashin to set up a custom version of Inconsolata Zero. Alexei went above and beyond to fix all the issues I encountered, and about eight days later we had a free variant of Inconsolata Zero with an undecorated zero that passed Ministry scrutiny.

Hard to believe that was nine years ago.

Aside: my coworkers got a kick out of watching me walk down the hall from the printer back to my desk holding two overlapping pieces of paper up to the light: an official student transcript and my version. This was the technique I used to make sure that the PDF file produced a pixel-perfect replica on paper.

inemesitaffia 9 hours ago

Is this publicly available?

krona 1 day ago

As someone interested in but unfamiliar with the state of the art of GPU vector rasterization, I struggle to understand how the method described here isn't a step back from the Slug algorithm, or from the basic ~100 lines of GLSL of the vector-texture approach of https://wdobbie.com/post/gpu-text-rendering-with-vector-text... from nearly a decade ago (albeit with some numerical precision limitations).

Is the problem here that computing the vector texture in real time is too expensive, and perhaps that font contours are too much of a special case of a general-purpose vector rasterizer to be useful? The Slug algorithm also implements 'banding', which seems similar to the tiling described in the presentation.

pier25 14 hours ago

Raph, are you familiar with Rive[1]?

Their vector rendering implementation is much faster than Skia's [2].

How does Vello compare to Rive in terms of performance?

[1] https://rive.app/

[2] https://x.com/gordonphayes/status/1654107954268782595

leetrout 1 day ago

Raph - I know enough to be very dangerous with GPUs, so please forgive my ignorance. Two questions:

1. Do you have a favorite source for GPU terminology like draw calls? I optimized for them on an Unreal Engine project but never "grokked" what all the various GPU constructs are and how to understand their purpose, behavior, and constraints. (For this reason I was behind the curve for most of your talk :D) Maybe this is just my lack of understanding of what a common/modern pipeline consists of?

2. I replayed the video segment twice but it is still lost on me how you know which side of the path in a tile is the filled side. Is that easy to understand from the code if I go spelunking for it? I am particularly interested in the details on how that is known and how the merge itself is performed.

raphlinus 1 day ago

1. I like ryg's "A trip through the Graphics Pipeline" [1]. It's from 2011 but holds up pretty well, as the fundamentals haven't changed. The main new topic, perhaps, is the rise of tile-based deferred rendering, especially on mobile.

2. I skipped over this in the interest of time. Nevermark has the central insight, but the full story is more interesting. For each tile, detect whether the line segment crosses the top edge of the tile and, if so, in which direction. This gives you a delta of -1, 0, or +1. Then do a prefix sum of these deltas over the sorted tiles. That gives you the winding number at the top-left corner of each tile, which in turn lets you compute the sparse fills and also which side to fill within the tile (see the sketch after the footnote).

[1]: https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-...
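
A minimal sketch of that winding bookkeeping, assuming a hypothetical Tile type rather than Vello's actual data structures:

    /// One tile crossed by the path, already in sorted (row-major) order.
    struct Tile {
        /// -1, 0, or +1: whether a segment crosses the tile's top edge,
        /// and in which direction.
        delta: i32,
        /// Winding number at the tile's top-left corner, filled in below.
        winding: i32,
    }

    /// Exclusive prefix sum: each tile's winding number is the sum of the
    /// deltas of all earlier tiles in sorted order. A nonzero winding
    /// between consecutive tiles on a row implies a sparse (solid) fill.
    fn accumulate_winding(tiles: &mut [Tile]) {
        let mut winding = 0;
        for tile in tiles.iter_mut() {
            tile.winding = winding;
            winding += tile.delta;
        }
    }

On the GPU the prefix sum runs as a parallel scan rather than this sequential loop, but the arithmetic is the same.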

Nevermark 1 day ago

Given that the edges are represented by line segments, the representation needs to be robust to any angle, including nearly vertical and nearly horizontal:

I expect the line segments are represented by their two endpoints.

This makes it easy to encode which side is fill vs. alpha by ordering the two points: as you move from the first point to the second, fill is always on the right. (Or vice versa.)

Another benefit of ordering at both the point and segment levels is that, from one segment to the next, whether the path turns toward the fill side or the alpha side can inform clipping, convex or concave, across both segments.

No idea if any of this is what is actually happening here, but it is one way to do it. The segmentation animation did show ordered segments: clockwise for the outside border and counterclockwise for the cavity in the R, with fill to the right.
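
For what it's worth, here is a sketch of that convention (hypothetical code, not taken from the talk): the sign of a 2D cross product tells you which side of a directed segment a point falls on.

    /// A directed line segment; by convention, fill lies to the right of
    /// the direction of travel from `p0` to `p1`.
    #[derive(Clone, Copy)]
    struct Segment {
        p0: (f32, f32),
        p1: (f32, f32),
    }

    impl Segment {
        /// True if `p` is on the fill side. In a y-down raster coordinate
        /// system, a positive cross product puts `p` to the right.
        fn is_fill_side(&self, p: (f32, f32)) -> bool {
            let d = (self.p1.0 - self.p0.0, self.p1.1 - self.p0.1);
            let v = (p.0 - self.p0.0, p.1 - self.p0.1);
            d.0 * v.1 - d.1 * v.0 > 0.0
        }
    }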

leetrout 1 day ago

Makes sense! Thank you for the explainer and the call out to the detail of which way they walk the segments.

coffeeaddict1 1 day ago

Great presentation. However, I have mixed feelings about Vello. On one hand, it's awesome that someone is truly trying to push the GPU to do 2D rendering work. But my impression is that the project has hit some considerable limitations due to the lack of "inter-workgroup" level cooperation in current rendering APIs and difficulties with dynamic memory allocation on the GPU. I'm sure the hybrid rendering scheme they have will be great, as the people behind it are extremely capable, but is that really a meaningful step beyond what Pathfinder achieved years ago? Also, in terms of CPU rendering, Blend2D exists and it's blazing fast. Can Vello's CPU backend really do better?

virtualritz 1 day ago

TL;DR: there are reasons other than speed why someone would prefer Vello.

There are different applications for 2D rendering.

In our case we need the rendering to take place at f32/float precision, i.e. RGBA colors need 32 bits per channel.

We also do not care whether the renderer is real-time. The application we have is vector rendering for movie production.

That's where the multiple-backend approach of Vello, and especially the vello-cpu crate, becomes really interesting. We will either add the f32 support ourselves or hope it becomes part of the Vello roadmap at some stage.

Also, Blend2D is C++ (as is Skia, the best alternative, IMHO). Adding a C++ toolchain requirement to any Rust project is always a potential PITA.

For example, on the (Rust) software we work on, C++ toolchain breakage around a C++ image-processing lib that we Rust-wrapped cost us two man-weeks over the last 11 months. That's a lot for a startup where two devs work on the affected part.

Suffice it to say, there was zero Rust toolchain-related work or breakage in the same timeframe.

Asm2D 1 day ago

Blend2D has a C API and no dependencies - it doesn't even need the C++ standard library - so generally it's not an issue to build and use it anywhere.

There is a different problem, though. While many people working on Vello are paid full time, Blend2D lacks funding, and what you see today was developed independently. So development is super slow, and that's the reason Blend2D will most likely never have the features other libraries have.

xixixao 1 day ago

Is his tattoo an Euler spiral? I made a little explainer for those (feedback welcome)

https://xixixao.github.io/euler-spiral-explanation/

morio 1 day ago

You've got a long way to go. Writing a rasterizer from scratch is a huge undertaking.

What's the internal color space? I assume it is linear sRGB. It looks like you are going straight to RGBA FP32, which is good. Think about how you will deal with denormals, as the CPU handles those differently from the GPU. Rendering artifacts galore once you do real-world testing.

And of course Inf and NaN need to be handled everywhere. Just checking for F::ZERO is not enough in many cases; you will need epsilon values. In C++, writing if (value == 0.0f) {} or if (value == 1.0f) {} is considered a code smell.
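
For instance, a tolerance-based comparison along these lines (a generic sketch, not code from any of the libraries discussed):

    /// Approximate equality with both an absolute and a relative tolerance,
    /// so the test behaves sensibly near zero and at large magnitudes alike.
    /// The thresholds here are illustrative, not universal constants.
    fn nearly_equal(a: f32, b: f32) -> bool {
        let abs_eps = 1e-6_f32;
        let rel_eps = 1e-5_f32;
        let diff = (a - b).abs();
        diff <= abs_eps || diff <= rel_eps * a.abs().max(b.abs())
    }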

Just browsing the source, I see Porter-Duff blend modes. Really, in 2025? Have fun dealing with alpha compositing issues on this one. Also, most of the 'regular' blend modes are not alpha-compositing safe; you need special handling of alpha values in many cases if you do not want to get artifacts. The W3C spec is completely underspecified in this regard. I spent many months dealing with this myself.

If I were to redo a rasterizer from scratch, I would push the boundaries a little more. For instance, I would target full FP32 dynamic-range support and a better internal color space, maybe something like OKLab, to improve color blending and compositing quality, and come up with innovative ways to use the gained dynamic range.
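
To make the OKLab suggestion concrete: you convert linear-sRGB channels through a pair of matrices and a cube root, blend per channel in that space, and convert back. A sketch of the forward conversion, with coefficients from Björn Ottosson's published reference implementation:

    /// Linear sRGB -> OKLab, coefficients from Björn Ottosson's reference
    /// implementation. Blend (e.g. lerp per channel) in this space, then
    /// apply the inverse transform to get back to linear sRGB.
    fn linear_srgb_to_oklab(r: f32, g: f32, b: f32) -> (f32, f32, f32) {
        let l = 0.4122214708 * r + 0.5363325363 * g + 0.0514459929 * b;
        let m = 0.2119034982 * r + 0.6806995451 * g + 0.1073969566 * b;
        let s = 0.0883024619 * r + 0.2817188376 * g + 0.6299787005 * b;
        let (l_, m_, s_) = (l.cbrt(), m.cbrt(), s.cbrt());
        (
            0.2104542553 * l_ + 0.7936177850 * m_ - 0.0040720468 * s_,
            1.9779984951 * l_ - 2.4285922050 * m_ + 0.4505937099 * s_,
            0.0259040371 * l_ + 0.7827717662 * m_ - 0.8086757660 * s_,
        )
    }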

mfabbri77 1 day ago

You didn't mention one of the biggest sources of 2D vector graphics artifacts: mapping polygon coverage to the alpha channel, which is what virtually all engines do. It's the main reason why we at Mazatech are writing a new version of our engine, AmanithVG, based on a simple idea: draw all the paths (polygons) at once. Well, the idea is simple; the implementation... not so much ;)

bschwindHN 15 hours ago

> Just browsing the source I see Porter Duff blend modes. Really, in 2025? Have fun dealing with alpha compositing issues on this one.

What should we be using in 2025? I thought premultiplied alpha is essentially what you go for if you want a chance of alpha compositing ending up correct, but my knowledge is probably outdated.
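
For reference, Porter-Duff "source over" in premultiplied form, which composites with one multiply-add per channel and no division (a generic sketch, not any particular library's API):

    /// A premultiplied-alpha RGBA color: each color channel has already
    /// been multiplied by its alpha.
    #[derive(Clone, Copy)]
    struct Premul {
        r: f32,
        g: f32,
        b: f32,
        a: f32,
    }

    /// Porter-Duff "src over dst" in premultiplied alpha:
    /// out = src + (1 - src.a) * dst, applied uniformly to all channels.
    fn over(src: Premul, dst: Premul) -> Premul {
        let k = 1.0 - src.a;
        Premul {
            r: src.r + k * dst.r,
            g: src.g + k * dst.g,
            b: src.b + k * dst.b,
            a: src.a + k * dst.a,
        }
    }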

kookamamie 7 hours ago

You absolutely want premult alpha when dealing with multiple transparent layers in graphics.

bschwindHN 7 hours ago

Right - maybe I'm mistaken, but doesn't Porter-Duff compositing encompass premultiplied alpha?

raphlinus 1 day ago

It's device sRGB for the time being, but more color spaces are planned.

You are correct that conflation artifacts are a problem and that doing antialiasing in the right color space can improve quality. Long story short, that's future research. There are tradeoffs, one of which is that use of the system compositor is curtailed. Another is that font rendering tends to be weak and spindly compared with doing compositing in a device space.

morio 1 day ago

Yeah, there is an entire science to doing font rendering properly. Perceptually, you should even take into account whether you have white text on a black background or the other way around, as this changes the perceived thickness of the text. Slightly hinted SDFs kind of solve that issue and look really good, but of course making that work on CPUs is difficult.

elcritch 9 hours ago

What's difficult with font SDFs on the CPU? The Bézier paths?

I made myself a CPU SDF library last weekend, primarily for fast shadow textures. It was fun, and I was surprised how well most basic SDFs run with SIMD. Except, yeah, Béziers didn't fare well. Fonts seem much harder.

SIMD was easy: I just asked Claude to convert my scalar Nim code to a NEON SIMD version and then to an SSE2 version. Most SDFs and Gaussian shadowing got a 4x speedup on my MacBook M3. It's a bit surprising the author has so much trouble in Rust. Perhaps fp16 issues?
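
For the simple shapes that checks out: a rounded-box SDF, for example, is a handful of branch-free arithmetic ops per point, which is exactly what SIMD eats up. Here is a sketch of the classic Inigo Quilez formulation (assuming an axis-aligned box centered at the origin); Bézier outlines are harder because each point needs a closest-point solve on the curve:

    /// Signed distance from `p` to an axis-aligned box centered at the
    /// origin, with half-extents `half` and corner radius `radius`.
    /// Negative inside, positive outside.
    fn sd_rounded_box(p: (f32, f32), half: (f32, f32), radius: f32) -> f32 {
        let qx = p.0.abs() - half.0 + radius;
        let qy = p.1.abs() - half.1 + radius;
        let outside = (qx.max(0.0).powi(2) + qy.max(0.0).powi(2)).sqrt();
        let inside = qx.max(qy).min(0.0);
        outside + inside - radius
    }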

morio 4 hours ago

I haven't looked at this recently, but from what I remember, rendering from SDF textures instead of from simple alpha textures was 3-4 times slower, even including optimizations where fully outside and fully inside areas bypass the per-pixel square root. Of course SIMD is a must, or at least the use of _mm_rsqrt_ss.

pixelpoet 1 day ago

Isn't "linear sRGB" an oxymoron?

morio 1 day ago

Not really: it's the same color primaries, just without the non-linear transfer function.
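
Concretely, the piecewise sRGB transfer function from IEC 61966-2-1 looks like the sketch below; "linear sRGB" keeps the sRGB primaries and white point but skips this step:

    /// Decode one nonlinear sRGB channel value in [0, 1] to linear light,
    /// per the sRGB spec's piecewise transfer function.
    fn srgb_to_linear(c: f32) -> f32 {
        if c <= 0.04045 {
            c / 12.92
        } else {
            ((c + 0.055) / 1.055).powf(2.4)
        }
    }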

littlestymaar 1 day ago

What I love about Raph Levien is that he does exactly the opposite of what most people do in tech: given the current financial incentives, the only thing most people can afford is to do something "good enough" as fast as possible, and in the end we end up with lots of subpar, half-baked solutions that cannot be fixed properly, because many people rely on the tool as it is and fixing it in depth would break everyone's workflow. Until the next solution appears and, given the same structural constraints, ends up failing in exactly the same ways.

Instead, Raph has spent the past nine years, I believe, trying to create a sound foundation for the problem of performant UI rendering.

I don't know how it will go, or whether he'll end up shipping his grand vision at all eventually, but I really appreciate the effort of "doing something well" in a world that pretty much only rewards "doing something quickly".

pvg 1 day ago

There's a funny bit in that vein in Cringely's Accidental Empires:

“The first volume of Knuth's series (dedicated to the IBM 650 computer, "in remembrance of many pleasant evenings") was printed in the late 1960s using old-fashioned but beautiful hot-type printing technology, complete with Linotype machines and the sharp smell of molten lead. Volume 2, which appeared a few years later, used photo-offset printing to save money for the publisher (the publisher of this book, in fact). Knuth didn't like the change from hot type to cold, from Lino to photo, and so he took a few months off from his other work, rolled up his sleeves, and set to work computerizing the business of setting type and designing type fonts. Nine years later, he was done.”

unconed 21 hours ago

We already have plenty of techniques that are fast enough for classic UI rendering. There is no conceivable bottleneck for the kind of stuff that is on your screen right now. It's not a matter of "doing something quickly", imo; that's an issue specific to the games industry, largely caused by the need to make entirely custom, animated, textured UIs as a feature of a single product.

What projects like Slug and Vello rather show is that GPU coding remains so obtuse that you cannot tackle an isolated subproblem like 2D vector rendering; instead you have to make apple pie from scratch by first creating the universe. And the resulting solution is itself a whole beast that cannot simply be hooked up to APIs and languages other than the ones it was created for, unless that is specifically something you also architect for. As the first slide shows, v1 required modern GPUs, and the CPU side uses hand-optimized SIMD routines.

2D vector graphics is also just an awkward niche to optimize for today. GPUs are optimized for 3D, where z-buffers are used to draw things in an order-independent way. 2D graphics instead must be layered and clipped in the right order, which is much more difficult to 'embarrassingly' parallelize. Formats like SVG can have an endless number of points per path; e.g., a detailed polygon of the United States has to be processed as one shape, and you can't blindly subdivide it. You also can't rely on vanilla anti-aliasing, because complementary edges wouldn't be fully opaque.

Even if you do go all the way, you'll still have just a 2D rasterizer. Perhaps it can work under projective transform, that's usually pretty easy, but will it be significantly more powerful or extensible than something like Cairo is today? Or will it just do that exact same feature set in a technologically sexier way? E.g., can it be adapted to rendering of 3D globes and maps, or would that break everything? And note that rasterizing fonts as just unhinted glyphs (i.e. paths) is rarely what people want.

california-og 4 hours ago

My wish is for a fast SVG renderer in the browser. At the moment, basic vector drawing is fast, but almost any use of filters or effects lags the browser. There's a lot that SVG could do for (web) UI, but won't, because it's so slow. Here's a small thought experiment I made a while ago for using SVG for more unconventional web design. Sadly, it lags a lot.

https://hlnet.neocities.org/grid-drawings/grid-drawing

littlestymaar 11 hours ago

> We already have plenty of techniques that are fast enough for classic UI rendering. There is no conceivable bottleneck for the kind of stuff that is on your screen right now.

I disagree. You either have the old object-oriented toolkits, which are fast enough but very unpleasant to work with, or the new reactive frameworks, which offer much better developer ergonomics (that's why pretty much everybody uses them right now) but have pathological performance characteristics and require lots of additional work to stay fast once the number of items on screen gets high enough.

By the way, you are missing the forest (Xilem) for the tree (Vello) here: the foundational work Raph has been doing isn't just a 2D renderer (Vello); that's just a small piece of a bigger UI toolkit (Xilem) aimed at addressing the problem I mention above.

Xilem originally used the native libraries for 2D rendering (through a piet wrapper he discusses briefly in the video), but Raph ended up disappointed and switched to making his own; that's still just one piece of the puzzle. The end goal is a fast reactive UI framework.

elcritch 9 hours ago

Out of curiosity, what sort of numbers of elements onscreen do you consider to be high?

littlestymaar 2 hours ago

Depends on the use case, but a few hundred is enough to make a naive React implementation crawl on mediocre hardware.

jasonthorsness 1 day ago

It always amazes me that the cost of computing the optimized draw regions, as shown in the slides, is worth the savings even on the CPU.

s-mon 1 day ago

Great presentation, and thanks for sharing the slides. I'm wondering: can any of these methods be used for 3D too?

boxerab 1 day ago

Excellent presentation - I like that the design bakes in rendering efficiency from the get-go.

haniehz 23 hours ago

Very nice.

taneq 1 day ago

I was confused by the step where the tiles are generated by tracing the outline and then sorted afterwards. It seems like this could be done faster earlier (possibly even largely precomputed) using something analogous to old-school scan conversion or span buffers? I'm not super up to date on this stuff, so I'd love to know why this is faster.

Fraterkes 1 day ago

Is there any specific connection between Rust and the Netherlands? A friend of mine helped organize a big Rust conference in Delft a while ago; I think Raph spoke there too.

Oh, and a question for Raph: did the new spline you invented end up being integrated into any vector/font-creation tools? I remember being really impressed when I first tried your demo.

raphlinus 1 day ago

Yes, I was born in Enkhuizen.

The newest spline work (hyperbezier) is still on the back burner, as I'm refining it. This turns out to be quite difficult, but I'm hopeful it will turn out better than the previous prototype you saw.