Merge pull request #30 from rot169/main
Create AndySmith_ExcessiveAgency
virtualsteve-star committed Jun 23, 2023
2 parents f8dcca9 + c7a0954 commit 8d8b0a6
Showing 1 changed file with 37 additions and 0 deletions.
37 changes: 37 additions & 0 deletions 0_5_vulns/AndySmith_ExcessiveAgency
@@ -0,0 +1,37 @@
## Excessive Agency

**Author(s):**

Andy Smith

**Description:**

An LLM may be granted a degree of agency - the ability to interface with other systems in order to undertake actions. Without restriction, any undesirable operation of the LLM (regardless of the root cause; e.g., hallucination, direct/indirect prompt injection, or simply a poorly-engineered benign prompt) may result in undesirable actions being taken. Just as we never trust client-side validation in web apps, LLMs should not be trusted to self-police or self-restrict; controls should instead be embedded in the APIs of the systems being interfaced with.
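
As one illustration of embedding controls outside the LLM, the following is a minimal Python sketch (all function and action names are hypothetical, not from any particular framework) of an integration layer that only executes actions on an explicit allow-list, regardless of what the LLM requests:

```python
# Hypothetical integration layer between an LLM and downstream systems.
# The allow-list lives here, not in the LLM's prompt, so a hallucinated
# or injected action request is rejected before anything executes.

def summarise_email(message_id: str) -> str:
    return f"(summary of message {message_id})"  # placeholder handler

HANDLERS = {
    "summarise_email": summarise_email,  # deliberately narrow surface
}

def dispatch(action: str, arguments: dict) -> str:
    """Execute an LLM-requested action only if it is explicitly permitted."""
    if action not in HANDLERS:
        raise PermissionError(f"Action '{action}' is not permitted")
    return HANDLERS[action](**arguments)
```

For example, `dispatch("summarise_email", {"message_id": "42"})` succeeds, while `dispatch("send_email", {...})` raises an error no matter how the LLM was manipulated into requesting it.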

**Labels/Tags:**

- Label: "Excessive Agency"
- Label: "Inadequate Sandboxing"

**Common Examples of Vulnerability:**

1. Undesirable Actions Performed: The LLM triggers actions outside of the LLM that are unintended or undesirable, leading to second-order consequences on downstream systems and processes.

**How to Prevent:**

1. Prevention Step 1: Reduce the permissions granted to the LLM to the minimum necessary, limiting the scope of any undesirable actions.
2. Prevention Step 2: Implement rate-limiting to reduce the number of undesirable actions that can occur in a given period.
3. Prevention Step 3: Utilise human-in-the-loop control to require a human to approve all actions before they are taken (a combined sketch of steps 2 and 3 follows this list).
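
A minimal Python sketch of steps 2 and 3, assuming a hypothetical `send_email` action; the limits, prompt wording, and function names are illustrative only:

```python
import time
from collections import deque

MAX_ACTIONS_PER_HOUR = 5          # illustrative rate limit
_recent_actions: deque = deque()  # timestamps of recent actions

def send_email(to: str, body: str) -> None:
    print(f"(email sent to {to})")  # placeholder for the real mail API

def approved_by_human(description: str) -> bool:
    """Blocking console prompt; a real system might queue for review."""
    return input(f"Allow action '{description}'? [y/N] ").strip().lower() == "y"

def guarded_send_email(to: str, body: str) -> None:
    now = time.time()
    while _recent_actions and now - _recent_actions[0] > 3600:
        _recent_actions.popleft()  # drop entries older than one hour
    if len(_recent_actions) >= MAX_ACTIONS_PER_HOUR:
        raise RuntimeError("Rate limit exceeded; action refused")
    if not approved_by_human(f"send email to {to}"):
        raise RuntimeError("Action rejected by human reviewer")
    _recent_actions.append(now)
    send_email(to, body)
```

Both checks sit in the integration code, so they hold even if the LLM's own instructions are subverted.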

**Example Attack Scenarios:**

Scenario #1: A personal assistant LLM is granted access to an individual's mailbox in order to summarise the content of incoming emails. The LLM is vulnerable to an indirect prompt injection attack, whereby a maliciously-crafted incoming email tricks the LLM into sending spam messages from the user's mailbox. This could be avoided by granting the LLM only read-only access to the mailbox (not the ability to send messages), or by requiring the user to manually review and hit 'send' on every mail drafted by the LLM. Alternatively, the damage caused could be reduced by implementing rate limiting on the mail-sending interface.
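
One hedged illustration of the read-only mitigation, assuming a hypothetical function-calling setup where the assistant can only use tools that are explicitly registered:

```python
# Hypothetical tool definitions exposed to the personal-assistant LLM.
# Only read operations are registered; no send tool exists, so an
# injected instruction to send spam has nothing to call.

TOOLS = [
    {
        "name": "list_unread_messages",
        "description": "Return IDs of unread messages in the user's inbox.",
    },
    {
        "name": "read_message",
        "description": "Return the body of a single message by ID.",
    },
    # Note: no 'send_message' tool is registered. Outbound mail, if ever
    # added, would go through a separate human-approved draft workflow.
]
```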

Scenario #2: A customer service LLM has an interface to a payments system in order to provide service credits or refunds to customers in the case of complaints. The system prompt instructs the LLM to limit refunds to no more than one month's subscription fee; however, a malicious customer engineers a direct prompt injection attack to convince the LLM to issue a refund of 100 years of fees. This could be avoided by implementing the 'one month max' limit within the refund API itself, rather than relying on the LLM to honour a limit stated in its system prompt.
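
A minimal sketch of enforcing the 'one month max' limit server-side; the endpoint, fee lookup, and names are hypothetical:

```python
# Hypothetical refund endpoint enforcing the cap in the payments API
# itself, rather than trusting the LLM's system prompt.

def get_monthly_fee(customer_id: str) -> float:
    return 9.99  # placeholder; a real system would query billing records

def issue_refund(customer_id: str, amount: float) -> None:
    cap = get_monthly_fee(customer_id)
    if amount > cap:
        # Enforced server-side: a prompt-injected request for 100 years
        # of fees fails here no matter what the LLM was convinced to ask.
        raise ValueError(f"Refund {amount:.2f} exceeds one-month cap {cap:.2f}")
    print(f"Refunded {amount:.2f} to {customer_id}")  # placeholder payout
```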

**Reference Links:**

1. [Link Title](URL): TBC

**Author Commentary (Optional):**
