miki123211 7 days ago

Private data + data exfiltration (with no attacker-controlled data) is fine, as there's no way to jailbreak the LLM. No data the attacker controls ever flows into the LLM, so there's no channel through which they can order it to behave the way they want.

Private data + attacker-controlled data (with no exfiltration capability) is also fine, as even if a jailbreak is performed, the LLM is physically incapable of leaking the results to the attacker.

So is attacker-controlled data + exfiltration (with no private data access), as then there's nothing to exfiltrate.

This is just for the "data leakage attack." Other classes of LLM-powered attacks are possible, like asking the LLM to perform dangerous actions on your behalf, and they need their own security models.
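
To make the rule concrete, here's a rough sketch of how you might gate an agent's tool set so the three capabilities never coexist. The config shape and names are invented for illustration, not from any real framework:

    from dataclasses import dataclass

    @dataclass
    class AgentConfig:
        reads_private_data: bool    # tools that touch your DB, email, repos, ...
        sees_untrusted_input: bool  # web pages, inbound email, issue comments, ...
        can_exfiltrate: bool        # outbound HTTP, email sending, markdown image URLs, ...

    def safe_against_data_leakage(cfg: AgentConfig) -> bool:
        # Vulnerable only when all three capabilities are present at once;
        # any two of them, per the argument above, are survivable.
        return not (cfg.reads_private_data
                    and cfg.sees_untrusted_input
                    and cfg.can_exfiltrate)

    # A browsing agent with repo access and outbound requests fails the check:
    assert not safe_against_data_leakage(
        AgentConfig(reads_private_data=True,
                    sees_untrusted_input=True,
                    can_exfiltrate=True))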

IgorPartola 7 days ago

> Private data + attacker-controlled data (with no exfiltration capability) is also fine, as even if a jailbreak is performed, the LLM is physically incapable of leaking the results to the attacker.

An attacker could modify your private data, delete it, inject prompts into it, etc.

rafaelmn 7 days ago

> Private data + data exfiltration (with no attacker-controlled data) is fine

Because LLMs are not at all known for their hallucinations and misuse of tools - not like they could leak all your data to random places just because they decided that was the best course of action.

Like I get the value proposition of LLMs, but we're still benchmarking these things by counting Rs in strawberry - if you're ready to give them unfettered access to your repos and PC, good luck I guess.

tshaddox 7 days ago

> Private data + data exfiltration (with no attacker-controlled data) is fine, as there's no way to jailbreak the LLM.

This is why I said *unless you...have a very good understanding of its behavior.*

If your public-facing service is, say, a typical RBAC implementation where the end user has a role and that role has read access to some resources and not others, then by all means go for it (obviously these systems can still have bugs and still need hardening, but the intended behavior is relatively easy to understand and verify).
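
Illustrative only, but that kind of system boils down to something you can enumerate and test exhaustively:

    # Toy RBAC check: the entire behavior is this table plus one lookup.
    ROLE_PERMISSIONS = {
        "viewer": {"read:reports"},
        "admin":  {"read:reports", "read:payroll", "write:payroll"},
    }

    def can_access(role: str, permission: str) -> bool:
        return permission in ROLE_PERMISSIONS.get(role, set())

    assert can_access("viewer", "read:reports")
    assert not can_access("viewer", "read:payroll")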

But if your service gives read access and exfiltration capabilities to a machine learning model that is deliberately designed to have complex, open-ended, non-deterministic behavior, I don't think "it's fine" even if there are no third-party attacker-controlled prompts in the system!

motorest 7 days ago

> This is why I said unless you...have a very good understanding of its behavior.

In this scenario the LLM's behavior per se is not the problem. The problem is that random third parties are able to sneak in prompts that manipulate the LLM.