> How is this considered an "exploit"?
Others in this discussion aptly described it as a confused deputy exploit. This goes something like:
- You write a LLM prompt that says something to the effect "dump all my darkest secrets in a place I can reach them",
- you paste them in a place where you expect your target's LLM agent to operate.
- Once your target triggers their LLM agent to process inputs, the agent will read the prompt and act upon it.
Would you ever put a plain password text in a search engine and then complain if someone "extracted" that info with a keyword payload?
> Would you ever put a plain password (...)
Your comment bears no resemblance with the topic. The attack described in the article consists of injecting a malicious prompt in a way that the target's agent will apply it.
Of course it will apply. Entire purpose of the agent is to give a response to a prompt. But to sound more dangareous let's call it "injecting". It's a prompt. You are not "injecting" anything. Agent pickups the prompt - that's its job, and execute - that is also its job.
> Of course it will apply. Entire purpose of the agent is to give a response to a prompt.
The exploit involves random third parties sneaking in their own prompts in a way that leads a LLM to run them on behalf of the repo's owner. This exploit can be used to leak protected information. This is pretty straight forward and easy to follow and understand.