No, an image is a well-ordered grid of pixels. The 3D variant would be voxels, and Nvidia recently released a project to do scene reconstruction with sparse voxels [0].
If you take these triangles, make them share vertices, and order them in a certain way, you have a mesh. You can then combine some of them into larger flat surfaces when that makes sense, draw thousands of them in one draw call, calculate intersections, volumes, physics, and LODs, use textures with image compression instead of millions of colored objects, etc. Splatting is one way of answering the question "how do we reproduce these images in a way that lets us generate novel views of the same scene", not "what is the best representation of this 3D scene".
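To illustrate the "make them share vertices" step, here is a minimal sketch in plain Python of turning a triangle soup, where every triangle stores its own copy of its three corners, into an indexed mesh: one shared vertex list plus index triples into it. The function name `weld_triangle_soup` is hypothetical, and real mesh tools also weld on normals and UVs and handle tolerances more carefully; this only merges positions that snap to the same small lattice cell.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]


def weld_triangle_soup(soup: List[Triangle], eps: float = 1e-6):
    """Merge corners that coincide (up to eps) into shared vertices.

    Returns (vertices, indices): a list of unique positions and, for each
    input triangle, a triple of indices into that list.
    """
    vertices: List[Vec3] = []
    indices: List[Tuple[int, int, int]] = []
    seen: Dict[Tuple[int, ...], int] = {}

    for tri in soup:
        corner_ids = []
        for corner in tri:
            # Snap to an eps-sized lattice so nearly identical corners share a key.
            # (Simplification: two points straddling a rounding boundary won't merge.)
            key = tuple(round(c / eps) for c in corner)
            if key not in seen:
                seen[key] = len(vertices)
                vertices.append(corner)
            corner_ids.append(seen[key])
        indices.append((corner_ids[0], corner_ids[1], corner_ids[2]))
    return vertices, indices


# A unit square stored as a soup: two triangles, six corners, two of them duplicated.
quad_soup = [
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)),
    ((0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)),
]
verts, tris = weld_triangle_soup(quad_soup)
print(len(verts), tris)  # 4 [(0, 1, 2), (0, 2, 3)]
```

Once triangles reference shared vertices, adjacency is explicit, which is what makes merging coplanar faces, indexed draw calls, and volume or intersection queries practical.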
The aim is to find the light field that describes the scene, and if you have solid objects, that function can be described on the surface of those objects. That seems like a much more elegant end result than a cloud of separate objects, no matter what shape they have, since it's much closer to how reality works. Obviously we need to handle volumetrics and translucency as well, but if we model the real surfaces as virtual surfaces, I think things like reflections and shadow removal will be easier. Gaussian splats, at least, have a hard time with reflections: they look good from some viewing angles, but the reflections are often handled as geometry [1].
I'm not arguing that it doesn't look good or that it doesn't serve a purpose; sometimes a photorealistic novel view of a real scene is all you want. But I still don't think it's the best representation of scenes.
I still love this older paper on Plenoxels: https://alexyu.net/plenoxels/
It made so much sense to me: voxels with view-dependent color, using e.g. spherical Gaussians.
I don't know how it compares to newer techniques; probably poorly, since nobody seems to be talking about it.
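As a rough illustration of "voxels with view-dependent color": the Plenoxels paper stores spherical harmonic (SH) coefficients per voxel (degree 2 there, 9 per color channel) and evaluates them for the viewing direction. This sketch stops at degree 1 (4 coefficients per channel) to stay short. The sign convention and the final clamp are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Real spherical-harmonic constants for bands 0 and 1.
SH_C0 = 0.28209479177387814  # 1 / (2 * sqrt(pi))
SH_C1 = 0.4886025119029199   # sqrt(3) / (2 * sqrt(pi))


def sh_basis_deg1(direction: Tuple[float, float, float]) -> List[float]:
    """Evaluate the 4 real SH basis functions (degree <= 1) for a view direction."""
    x, y, z = direction
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm
    # Assumed sign convention; any convention works as long as training uses the same one.
    return [SH_C0, -SH_C1 * y, SH_C1 * z, -SH_C1 * x]


def view_dependent_rgb(coeffs: Sequence[Sequence[float]],
                       direction: Tuple[float, float, float]) -> Tuple[float, ...]:
    """coeffs holds 3 rows (R, G, B) of 4 SH coefficients for one voxel."""
    basis = sh_basis_deg1(direction)
    rgb = []
    for channel in coeffs:
        value = sum(c * b for c, b in zip(channel, basis))
        rgb.append(min(1.0, max(0.0, value)))  # simple clamp to a displayable range
    return tuple(rgb)


# A voxel that is mid-gray from most angles but turns red when viewed along +x.
voxel = [
    [0.5 / SH_C0, 0.0, 0.0, -0.5 / SH_C1],  # red: constant term plus an x-dependent term
    [0.5 / SH_C0, 0.0, 0.0, 0.0],           # green: constant
    [0.5 / SH_C0, 0.0, 0.0, 0.0],           # blue: constant
]
print(view_dependent_rgb(voxel, (1.0, 0.0, 0.0)))   # red close to 1.0
print(view_dependent_rgb(voxel, (-1.0, 0.0, 0.0)))  # red close to 0.0
```

The same voxel returns a different color depending on where it is seen from, which is the whole trick: geometry stays on a grid while appearance varies with direction.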
They're mentioned in the SVRaster paper.
I'm unsure whether it's me, but one of us is confused about the representation of a 2D image versus a 3D scene. It's absolutely correct that a digital (2D) image is a grid of pixels. We can call it a soup if you want. An audio file or a text document is a soup too.
A 3D scene (the digital representation) is a structure that can't be reduced to a simple grid. At least it had better not be, or it wouldn't look great from most angles.
Back to the splats...
Gaussian Splatting is a technique designed to tackle what seemed impossible, or at least extremely challenging, in 3D scene reconstruction. The authors took a radically different approach and demonstrated its feasibility. I haven't met anybody who would look at a Gaussian splatting reconstruction of a scene and claim another method would look better, or even could look better. Maybe someday, but as of 2025 there isn't one.
On the voxel definition: I don't see "voxel" as encompassing every 3D structure representation. By that I mean "voxel" is a definition, and a good one, but not every representation fits into that definition, so we had better be careful about what we infer about anything that doesn't fit.
IMO Gaussian splats do not form a voxel. But let's say they do. So what? I have no idea how that's relevant to my point, which is that Gaussian splatting (voxel or not) is superior to any other technology for 3D reconstruction to date. I even caveated the cases where this method would be totally unhelpful. As of now, editing splats is barely a thing (SuperSplat is super, but all we can really do is remove splats). Some software can fuse splats or even adjust the properties of individual splats or areas of splats, but these editing solutions are in their infancy, so they just don't count.
Your point that triangles can be combined into larger flat surfaces and optimized for rendering is valid, but it doesn't change the fact that non-Gaussian-splatting methods, including the much slower NeRF approach, are inferior in the quality (let's say fidelity) they produce.
Your argument doesn't discuss, compare, or even mention limitations faced by all the traditional mesh-based approaches.
3DGS (I'm not selling it, just making it clear) can, at its current stage of development, rasterize so efficiently that a scene with millions of splats renders at 60 FPS on a mid-range GPU (at least it does on my laptop).
All that with the most accurate representation of lighting and reflections, of whatever the camera was able to capture, really. Novel inference is just an approximation; it doesn't invent anything unless some generative ML is plugged in, faked in, plastered all over so that the word AI gets mentioned.
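For readers wondering what that rasterization actually computes, here is a minimal per-pixel sketch following the blending rule described in the 3DGS paper: Gaussians already projected to 2D are sorted by depth and alpha-composited front to back, where each Gaussian's alpha is its opacity times its 2D Gaussian falloff at the pixel. The `Splat2D` type and the cut-off values (0.99 alpha cap, 1/255 skip threshold, 1e-4 early exit) are assumptions for illustration. The real rasterizer runs this loop in CUDA over screen tiles with a GPU sort, which is where the speed comes from.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Splat2D:
    depth: float                      # view-space depth, used for sorting
    mean: Tuple[float, float]         # projected center in pixel coordinates
    cov: Tuple[float, float, float]   # 2D covariance [[a, b], [b, c]] stored as (a, b, c)
    opacity: float
    color: Tuple[float, float, float]


def gaussian_alpha(s: Splat2D, px: float, py: float) -> float:
    """Opacity times the 2D Gaussian falloff evaluated at pixel (px, py)."""
    a, b, c = s.cov
    det = a * c - b * b
    ia, ib, ic = c / det, -b / det, a / det  # inverse of [[a, b], [b, c]]
    dx, dy = px - s.mean[0], py - s.mean[1]
    power = -0.5 * (ia * dx * dx + 2.0 * ib * dx * dy + ic * dy * dy)
    return min(0.99, s.opacity * math.exp(power))


def shade_pixel(splats: List[Splat2D], px: float, py: float):
    """Front-to-back compositing: C = sum_i c_i * alpha_i * prod_{j<i} (1 - alpha_j)."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for s in sorted(splats, key=lambda s: s.depth):  # nearest first
        alpha = gaussian_alpha(s, px, py)
        if alpha < 1.0 / 255.0:
            continue  # negligible contribution at this pixel
        for k in range(3):
            color[k] += s.color[k] * alpha * transmittance
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:
            break  # pixel is effectively opaque; stop early
    return tuple(color), transmittance


red_front = Splat2D(1.0, (0.0, 0.0), (1.0, 0.0, 1.0), 0.5, (1.0, 0.0, 0.0))
green_back = Splat2D(2.0, (0.0, 0.0), (1.0, 0.0, 1.0), 0.5, (0.0, 1.0, 0.0))
print(shade_pixel([green_back, red_front], 0.0, 0.0))  # ((0.5, 0.25, 0.0), 0.25)
```

Note that nothing here reasons about surfaces: each pixel is just a depth-ordered blend of whatever Gaussians cover it.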
I don't think that's me. I think you are confusing the method with parts of the method. Gaussian Splatting is not a technique for generating novel views from captured data.
Here is the situation: most clickbait articles, and even GitHub repos, will splash that aspect around as if Gaussian splatting were about generating novel views. I should read the paper again, but that isn't what I see in the discovery.
But still, let's say it is. On that front it may not outperform NeRF, I don't know, and NeRF may still be the state of the art, but it's very slow, almost impractical on most workstations, and it doesn't outperform 3DGS on virtually any other front.
Your argument that it isn't close to how reality works is, again, totally irrelevant, and even contrary to what CG in general has demonstrated many times. We don't need the concept, what is captured, how things are represented, or how things are displayed to match reality more closely. That's usually the best way to fail: attempting to compute reality as we believe it to be. Some would even argue with you: what reality are you talking about? We still don't have a clue what reality is.
All we know is that we seem to perceive things a certain way. Our brain may play a movie based on that. It doesn't matter what's actually there; perception, and then tricking the eyes or our neurons, is all we have to focus on to make a reconstruction valid.
The funny thing is, the Gaussians are actually based on optical functions. The blending of multiple layers of light waves is also a natural phenomenon, as far as we know.
Anyhow, there is a lot of confusion out there about Gaussian splats. I suspect not many people understand this tech, but many are talking loudly about it, confusing everyone else.
I hope you don't see my response as arguing for the sake of it. Your reply was a good read, elegant, with a tone of authority on the question, but I invite you to check out 3DGS again (yourself, not in the news).
Edit: I have not considered the voxel method you've shared, so I'm not claiming Gaussian splatting is superior to that. I will check the claims, though it wouldn't be the first.
> I'm unsure whether it's me, but one of us is confused about the representation of a 2D image versus a 3D scene. It's absolutely correct that a digital (2D) image is a grid of pixels. We can call it a soup if you want. An audio file or a text document is a soup too.
No, I don't agree that images or audio files or text documents are soups; they're ordered grids or lists of equidistant samples. To be clear, I didn't make up the description "triangle soup"; the authors did, and it's right there in the article.
> A 3D scene (the digital representation) is a structure that can't be reduced to a simple grid. At least it had better not be, or it wouldn't look great from most angles.
Yes, it can; it's just a matter of resolution. Gaussian splats, NeRFs, and other similar techniques aim to represent the radiance field that the input images are samples of. The radiance field, like the EM field or most other fields, can be quantized to grids, just as we represent 2D samples of scenes with grids of pixels.
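A small sketch of the "it's just a matter of resolution" point. A real radiance field also depends on direction (grid methods such as Plenoxels store per-cell direction coefficients for that); to keep the focus on quantization, this uses a hypothetical smooth scalar function of position only. It samples the function on an n x n x n grid, reconstructs values in between with trilinear interpolation, and measures the worst error at a fixed set of query points. All names are illustrative.

```python
import math
from typing import Callable, List

Grid = List[List[List[float]]]


def sample_grid(f: Callable[[float, float, float], float], n: int) -> Grid:
    """Sample f at the n*n*n lattice points covering [0, 1]^3 (n >= 2)."""
    step = 1.0 / (n - 1)
    return [[[f(i * step, j * step, k * step) for k in range(n)]
             for j in range(n)] for i in range(n)]


def trilinear(grid: Grid, x: float, y: float, z: float) -> float:
    """Reconstruct a value at (x, y, z) in [0, 1]^3 from the 8 surrounding samples."""
    n = len(grid)
    # Continuous grid coordinates, clamped so the 2x2x2 neighbourhood stays in bounds.
    gx, gy, gz = (min(max(c * (n - 1), 0.0), n - 1 - 1e-9) for c in (x, y, z))
    i, j, k = int(gx), int(gy), int(gz)
    fx, fy, fz = gx - i, gy - j, gz - k
    value = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                weight = ((fx if di else 1 - fx) *
                          (fy if dj else 1 - fy) *
                          (fz if dk else 1 - fz))
                value += weight * grid[i + di][j + dj][k + dk]
    return value


def max_error(f, n: int, samples: int = 200) -> float:
    """Worst reconstruction error over a fixed, deterministic set of query points."""
    grid = sample_grid(f, n)
    worst = 0.0
    for s in range(samples):
        x, y, z = (s * 0.618034) % 1.0, (s * 0.754878) % 1.0, (s * 0.569840) % 1.0
        worst = max(worst, abs(trilinear(grid, x, y, z) - f(x, y, z)))
    return worst


def smooth_field(x: float, y: float, z: float) -> float:
    # Hypothetical stand-in for one channel of a field; any smooth function works.
    return math.sin(2 * math.pi * x) * math.cos(2 * math.pi * y) * z


for n in (8, 32):
    print(n, max_error(smooth_field, n))  # the error shrinks as resolution grows
```

Just as with pixels, the grid is exact at its samples and approximate in between, and the approximation improves with resolution; the cost is memory, which is why practical grid methods use sparse grids.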
> Gaussian Splatting is a technique designed to tackle what seemed impossible, or at least extremely challenging, in 3D scene reconstruction.
Gaussian splatting is not used for scene reconstruction in the general sense; it's used for novel view synthesis. It doesn't claim to reconstruct the 3D scene, it only tries to estimate the function that takes a position and direction and returns a color. I think the original NeRF paper does a good job explaining what the radiance field is and how images of a scene are used to estimate it. 3DGS is a more efficient and intuitive way of doing the same thing.
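To make "a function that takes a position and direction and returns a color" concrete, here is a minimal sketch of how NeRF-style methods turn such a field into a pixel. In NeRF the field also returns a density sigma, and the pixel color is a quadrature sum along the camera ray: C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i, with T_i = prod_{j<i} exp(-sigma_j * delta_j). The `toy_field` below (a solid reddish ball) is a hypothetical stand-in for a trained network, and the sample count is arbitrary.

```python
import math
from typing import Callable, Tuple

Vec3 = Tuple[float, float, float]
# A field maps (position, direction) -> (density, rgb).
Field = Callable[[Vec3, Vec3], Tuple[float, Vec3]]


def render_ray(field: Field, origin: Vec3, direction: Vec3,
               near: float, far: float, n_samples: int = 128):
    """Numerically integrate color along one ray with evenly spaced samples."""
    delta = (far - near) / n_samples
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # T_i: fraction of light that survives to sample i
    for i in range(n_samples):
        t = near + (i + 0.5) * delta  # midpoint of each segment
        pos = tuple(o + t * d for o, d in zip(origin, direction))
        sigma, rgb = field(pos, direction)
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        for k in range(3):
            color[k] += transmittance * alpha * rgb[k]
        transmittance *= 1.0 - alpha  # equals exp(-sigma * delta)
    return tuple(color), 1.0 - transmittance  # rgb and accumulated opacity


def toy_field(pos: Vec3, direction: Vec3) -> Tuple[float, Vec3]:
    # Hypothetical scene: a dense ball of radius 0.5 at the origin, same color from every direction.
    inside = sum(p * p for p in pos) < 0.25
    return (50.0 if inside else 0.0), (1.0, 0.2, 0.2)


hit_rgb, hit_opacity = render_ray(toy_field, (0.0, 0.0, -2.0), (0.0, 0.0, 1.0), 0.0, 4.0)
miss_rgb, miss_opacity = render_ray(toy_field, (2.0, 0.0, -2.0), (0.0, 0.0, 1.0), 0.0, 4.0)
print(hit_rgb, hit_opacity)    # mostly red, opacity close to 1
print(miss_rgb, miss_opacity)  # black, opacity 0
```

Training adjusts the field so that rays cast from the input cameras reproduce the input pixels. 3DGS replaces the per-ray sampling with rasterized Gaussians, but its front-to-back compositing has the same form.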
> I haven't met anybody who would look at a Gaussian splatting reconstruction of a scene and claim another method would look better, or even could look better. Maybe someday, but as of 2025 there isn't one.
Like I mentioned previously, they look good and sometimes that's all you need, but as a representation of a 3D scene they're chaotic and not very elegant.
> IMO Gaussian splats do not form a voxel.
No, they don't.
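To make the difference concrete: a voxel is a cell at a fixed position on a regular lattice, whereas each 3DGS primitive is a free-floating anisotropic Gaussian with its own mean, opacity, color coefficients, and a covariance. The original paper builds that covariance from a rotation (stored as a quaternion) and per-axis scales as Sigma = R S S^T R^T, which keeps it a valid covariance while the optimizer updates the parameters. A minimal sketch of that construction (function names are hypothetical):

```python
import math
from typing import List, Sequence, Tuple

Mat3 = List[List[float]]


def quat_to_rotation(w: float, x: float, y: float, z: float) -> Mat3:
    """Rotation matrix of a (normalized) quaternion w + xi + yj + zk."""
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ]


def splat_covariance(quat: Tuple[float, float, float, float],
                     scale: Sequence[float]) -> Mat3:
    """Sigma = R S S^T R^T, computed as M M^T with M = R S and S = diag(scale)."""
    R = quat_to_rotation(*quat)
    M = [[R[i][j] * scale[j] for j in range(3)] for i in range(3)]
    return [[sum(M[i][k] * M[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]


# Unrotated splat stretched along z: covariance is diag(1, 4, 9).
print(splat_covariance((1.0, 0.0, 0.0, 0.0), (1.0, 2.0, 3.0)))
# Same splat rotated 90 degrees about z: the x and y extents swap.
half = math.pi / 4
print(splat_covariance((math.cos(half), 0.0, 0.0, math.sin(half)), (1.0, 2.0, 3.0)))
```

Nothing ties these primitives to a lattice: they can sit anywhere, at any orientation and size, which is why "voxel" is the wrong word for them.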
> Your argument doesn't discuss, compare, or even mention limitations faced by all the traditional mesh-based approaches.
I didn't argue that there are any mesh based methods that look better at the moment.
> All that with the most accurate representation of lighting and reflections, of whatever the camera was able to capture, really. Novel inference is just an approximation; it doesn't invent anything unless some generative ML is plugged in, faked in, plastered all over so that the word AI gets mentioned.
3DGS comes up with fake representations of reflections: it pretends that reflective surfaces aren't there and puts splats representing the reflections behind where the reflective surface should be. It does this because it has no concept of scene geometry; all it knows and cares about is optimizing the splats' positions and colors so they look like the input images when rendered from the input images' camera positions.
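A small geometric sketch of why that happens, assuming a planar mirror: a camera sees the reflection of a point p exactly as if p sat at its mirror image across the plane, p' = p - 2((p - q) . n) n for a plane through q with unit normal n. An optimizer that only matches input photos, with no notion of a mirror surface, can therefore explain those pixels by placing splats at p', behind the surface. The helper names are hypothetical, and curved or rough reflectors are not covered.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def mirror_point(p: Vec3, plane_point: Vec3, plane_normal: Vec3) -> Vec3:
    """Reflect p across the plane through plane_point with the given normal."""
    length = math.sqrt(dot(plane_normal, plane_normal))
    n = tuple(c / length for c in plane_normal)
    signed_dist = dot(tuple(pi - qi for pi, qi in zip(p, plane_point)), n)
    return tuple(pi - 2.0 * signed_dist * ni for pi, ni in zip(p, n))


# Mirror lying in the plane z = 0, facing +z; an object sits 2 units in front of it.
virtual = mirror_point((1.0, 0.0, 2.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
print(virtual)  # (1.0, 0.0, -2.0): behind the mirror, where splats can end up
```

From any camera in front of the mirror, the straight line to the virtual point has the same length and direction as the real bounced light path, so the input photos alone cannot tell the two explanations apart.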
> I don't think that's me. I think you are confusing the method with parts of the method. Gaussian Splatting is not a technique for generating novel views from captured data.
Yes, it literally is. From the original paper from 2023 [0]:
> We introduce three key elements that allow us to achieve state-of-the-art visual quality while maintaining competitive training times and importantly allow high-quality real-time (≥ 30 fps) novel-view synthesis at 1080p resolution.
> Additional Key Words and Phrases: novel view synthesis, radiance fields, 3D gaussians, real-time rendering
> Our goal is to optimize a scene representation that allows high quality novel view synthesis, starting from a sparse set of (SfM) points without normals.
Those SfM (Structure from Motion) points come from a previous step, often COLMAP, where the input images (the captured data) are used to find the camera poses and a sparse point cloud from features in those images. The images are samples of the scene's radiance field.
> Some would even argue with you: what reality are you talking about? We still don't have a clue what reality is.
It isn't made up of disjoint triangles; I think most people agree with that. And without getting philosophical or diving into atoms or quantum fields, most of reality can be represented as continuous volumes or surfaces at a macro scale.
> All we know is that we seem to perceive things a certain way. Our brain may play a movie based on that. It doesn't matter what's actually there; perception, and then tricking the eyes or our neurons, is all we have to focus on to make a reconstruction valid.
Yes, if all we care about is novel view synthesis, which is a valid use case.
> Anyhow, there is a lot of confusion out there about Gaussian splats. I suspect not many people understand this tech, but many are talking loudly about it, confusing everyone else.
It's really not that complicated; the original and follow-up papers are easy to follow.
[0] https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/
I admire your patience and calmness in addressing just about every point I made, which I articulated so poorly that I concede we could just say I was wrong. Or I was simply wrong, not due to poor wording at all.
And thank you for the extra notes surrounding the divergence. I had not read the original paper (again, that is) before firing off my personal interpretation and poor recollection of what the paper said. I should read it again; I may not recognize it.
I haven't read the paper you shared yet. Thanks again for your input; that plus the couple of refs gives me plenty to scratch my head over, as something must be wrong with my current understanding.