stonepresto 5 days ago

I know there were at least a few kernel devs who "validated" this bug, but did anyone actually build a PoC and test it? It's such a critical piece of the process yet a proof of concept is completely omitted? If you don't have a PoC, you don't know what sort of hiccups would come along the way and therefore can't determine exploitability or impact. At least the author avoided calling it an RCE without validation.

But what if there's a missing piece of the puzzle that the author and devs missed or assumed o3 covered, but in fact was out of o3's context, that would invalidate this vulnerability?

I'm not saying there is, nor am I going to take the time to do the author's work for them. Rather, I am saying this report is not fully validated, which feels like a dangerous precedent to set with what will likely be an influential blog post in the LLM VR space moving forward.

IMO the idea of PoC || GTFO should be applied more strictly than ever before to any vulnerability report generated by a model.

The underlying perspective that o3 is much better than previous or other current models still remains, and the methodology is still interesting. I understand the desire and need to get people to focus on something by wording it a specific way, it's the clickbait problem. But dammit, do better. Build a PoC and validate your claims, don't be lazy. If you're going to write a blog post that might influence how vulnerability researchers conduct their research, you should promote validation and not theoretical assumption. The alternative is the proliferation of ignorance through false-but-seemingly-true reporting, versus deepening the community's understanding of a system through vetted and provable reports.

seanheelan 5 days ago

Hi, author here. Yes, I built a PoC. Yes, it triggered a KASAN report/crash.

stonepresto 5 days ago

Thank you! I'm really happy to hear you did that. But why not mention that in your blog post? I understand not wanting to include a PoC for responsible disclosure reasons, but including it would have added a lot of credibility to your work for assholes like me lol

seanheelan 5 days ago

I honestly hadn’t anticipated someone would think I hadn’t bothered to verify the vulnerability is real ;)

Since you’re interested: the bug is real but it is, I think, hard to exploit in real world scenarios. I haven’t tried. The timing you need to achieve is quite precise and tight. There are better bugs in ksmbd from an exploitation point of view. All of that is a bit of a “luxury problem” from the PoV of assessing progress in LLM capabilities at finding vulnerabilities though. We can worry about ranking bugs based on convenience for RCE once we can reliably find them at all.

stonepresto 5 days ago

I'm too much of a skeptic to not do so lol. Great post though overall, don't let my assholery dissuade you! I was pleasantly surprised that it was actually a researcher behind the news story and there was some real evidence / scientific procedure. I thought you had a lot of good insights into how to use LLMs in the VR space specifically, and I'm glad you did benchmarking. It's interesting to see how they're improving.

Yeah race conditions like that are always tricky to make reliable. And yeah I do realize that the purpose of the writeup was more about the efficacy of using LLMs vs the bug itself, and I did get a lot out of that part, I just hyper-focused on the bug because it's what I tend to care the most about. In the end I agree with your conclusion, I believe LLMs are going to become a key part of the VR workflow as they improve and I'm grateful for folks like yourself documenting a way forward for their integration.

Anyways, solid writeup and really appreciate the follow-up!

lyu07282 5 days ago

Are you saying you want PoCs that trigger a crash from the use-after-free, or would you only be satisfied by full-on RCE PoCs?

stonepresto 5 days ago

PoCs should at least trigger a crash, overwrite a register, or have some other provable effect, the point being to determine:

1) If it is actually a UAF or if there is some other mechanism missing from the context that prevents UAF.
2) The category and severity of the vulnerability. Is it even a DoS, RCE, or is the only impact causing a thread to segfault?

This is all part of the standard vulnerability research process. I'm honestly surprised it got merged in without a PoC, although with high profile projects even the suggestion of a vulnerability in code that can clearly be improved will probably end up getting merged.

lyu07282 5 days ago

Even a rudimentary exploit can be a significant time investment; it is absolutely not common practice to develop, publish, or demand such exploits from researchers to demonstrate memory corruption vulnerabilities. Everyone thinks they are an expert in infosec, it's so funny.

stonepresto 5 days ago

Well, in another subthread the author said he did in fact make a crashing PoC. I guess it depends on the customer's standards, but I would say in the vast majority of cases (especially for nuanced memory corruptions in which the ability to make something exploitable depends on your ability to demonstrate control of the heap) a crashing PoC is the bare minimum. In most VDPs, BBPs, or red team engagements you are required to provide some sort of proof to back your claim, otherwise you'll be laughed out of the room.

I'm curious which sector of infosec you're referring to in which vulnerability researchers are not required to provide proofs of concept? Maybe internal product VR where there is already an established trust?