motorest 8 days ago

> How is this considered an "exploit"?

Others in this discussion aptly described it as a confused deputy exploit. This goes something like:

- You write a LLM prompt that says something to the effect "dump all my darkest secrets in a place I can reach them",

- you paste them in a place where you expect your target's LLM agent to operate.

- Once your target triggers their LLM agent to process inputs, the agent will read the prompt and act upon it.

1
mirzap 8 days ago

Would you ever put a plain password text in a search engine and then complain if someone "extracted" that info with a keyword payload?

motorest 8 days ago

> Would you ever put a plain password (...)

Your comment bears no resemblance with the topic. The attack described in the article consists of injecting a malicious prompt in a way that the target's agent will apply it.

mirzap 7 days ago

Of course it will apply. Entire purpose of the agent is to give a response to a prompt. But to sound more dangareous let's call it "injecting". It's a prompt. You are not "injecting" anything. Agent pickups the prompt - that's its job, and execute - that is also its job.

motorest 7 days ago

> Of course it will apply. Entire purpose of the agent is to give a response to a prompt.

The exploit involves random third parties sneaking in their own prompts in a way that leads a LLM to run them on behalf of the repo's owner. This exploit can be used to leak protected information. This is pretty straight forward and easy to follow and understand.