tshaddox 7 days ago

> an LLM has access to attacker-controlled data, sensitive information, and a data exfiltration capability

> THe "cardinal rule of agent design" should be that an LLM can have access to at most two of these during one session

I still don't really get it. Surely the older, simpler, and better cardinal rule is that you just don't expose any service to the Internet that you have given access to your private data, unless you directly control that service and have a very good understanding of its behavior.

2
motorest 7 days ago

> I still don't really get it. Surely the older, simpler, and better cardinal rule is that you just don't expose any service to the Internet that you have given access to your private data, unless you directly control that service and have a very good understanding of its behavior.

This scenario involves a system whose responsibility is to react to an event, analyse your private info in response to the event, and output something.

The exploit is that, much like a SQL injection, it turns out attackers can inject their own commands into the input event.

Also, it's worth keeping in mind that prompts do lead LLMs to update their context. Data ex filtration is a danger, but so is having an attacker silently manipulating the LLM's context.

miki123211 7 days ago

Private data + data exfiltration (with no attacker-controlled data) is fine, as there's no way to jailbreak the LLM. An attacker has no way to perform an attack, as no data they control can ever flow into the LLM, so they can't order it to behave in the way they want.

Private data + attacker controlled data (with no exfiltration capability) is also fine, as even if a jailbreak is performed, the LLM is physically incapable of leaking the results to the attacker.

So is attacker controlled data + exfiltration (with no private data access), as then there's nothing to exfiltrate.

This is just for the "data leakage attack." Other classes of LLM-powered attacks are possible, like asking the LLM to perform dangerous actions on your behalf, and they need their own security models.

IgorPartola 7 days ago

> Private data + attacker controlled data (with no exfiltration capability) is also fine, as even if a jailbreak is performed, the LLM is physically incapable of leaking the results to the attacker.

An attacker could modify your private data, delete it, inject prompts into it, etc.

rafaelmn 7 days ago

> Private data + data exfiltration (with no attacker-controlled data) is fine

Because LLMs are not at all known for their hallucinations and misuse of tools - not like it could leak all your data to random places just because it decided that was the best course of action.

Like I get the value proposition of LLMs but we're still benchmarking these things by counting Rs in strawberry - if you're ready to give it unfeathered access to your repos and PC - good luck I guess.

tshaddox 7 days ago

> Private data + data exfiltration (with no attacker-controlled data) is fine, as there's no way to jailbreak the LLM.

This is why I said *unless you...have a very good understanding of its behavior.*

If your public-facing service is, say, a typical RBAC implementation where the end user has a role and that role has read access to some resources and not others, then by all means go for it (obviously these system can still have bugs and still need hardening, but the intended behavior is relatively easy to understand and verify).

But if your service gives read access and exfiltration capabilities to a machine learning model that is deliberately designed to have complex, open-ended, non-deterministic behavior, I don't think "it's fine" even if there's no third-party attacker-controlled prompts in the system!

motorest 7 days ago

> This is why I said unless you...have a very good understanding of its behavior.

In this scenario the LLM's behavior per se is not a problem. The problem is that random third parties are able to sneak prompts to manipulate the LLM.