updates

meta-llama · Aug 7, 2023 · 82ce861 · Thakhun · Aug 31, 2023 · Thakhun
1 parent a511b0d
commit 82ce861
Show file tree

Hide file tree

Showing 4 changed files with 47 additions and 12 deletions.
diff --git a/README.md b/README.md
@@ -6,20 +6,14 @@ This release includes model weights and starting code for pretrained and fine-tu
 
 This repository is intended as a minimal example to load [Llama 2](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/) models and run inference. For more detailed examples leveraging HuggingFace, see [llama-recipes](https://github.com/facebookresearch/llama-recipes/).
 
-## System Prompt Update
+## Updates post-launch
 
-### Observed Issue
-We received feedback from the community on our prompt template and we are providing an update to reduce the false refusal rates seen. False refusals occur when the model incorrectly refuses to answer a question that it should, for example due to overly broad instructions to be cautious in how it provides responses. 
-
-### Updated approach
-Based on evaluation and analysis, we recommend the removal of the system prompt as the default setting.  Pull request [#626](https://github.com/facebookresearch/llama/pull/626) removes the system prompt as the default option, but still provides an example to help enable experimentation for those using it. 
+See [UPDATES.md](UPDATES.md).
 
 ## Download
 
 ⚠️ **7/18: We're aware of people encountering a number of download issues today. Anyone still encountering issues should remove all local files, re-clone the repository, and [request a new download link](https://ai.meta.com/resources/models-and-libraries/llama-downloads/). It's critical to do all of these in case you have local corrupt files. When you receive the email, copy *only* the link text - it should begin with https://download.llamameta.net and not with https://l.facebook.com, which will give errors.**
 
-
-
 In order to download the model weights and tokenizer, please visit the [Meta AI website](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) and accept our License.
 
 Once your request is approved, you will receive a signed URL over email. Then run the download.sh script, passing the URL provided when prompted to start the download. Make sure that you copy the URL text itself, **do not use the 'Copy link address' option** when you right click the URL. If the copied URL text starts with: https://download.llamameta.net, you copied it correctly. If the copied URL text starts with: https://l.facebook.com, you copied it the wrong way.

diff --git a/UPDATES.md b/UPDATES.md
@@ -0,0 +1,19 @@
+## System Prompt Update
+
+### Observed Issue
+We received feedback from the community on our prompt template and we are providing an update to reduce the false refusal rates seen. False refusals occur when the model incorrectly refuses to answer a question that it should, for example due to overly broad instructions to be cautious in how it provides responses. 
+
+### Updated approach
+Based on evaluation and analysis, we recommend the removal of the system prompt as the default setting.  Pull request [#626](https://github.com/facebookresearch/llama/pull/626) removes the system prompt as the default option, but still provides an example to help enable experimentation for those using it. 
+
+## Token Sanitization Update
+
+### Observed Issue
+The PyTorch scripts currently provided for tokenization and model inference allow for direct prompt injection via string concatenation. Prompt injections allow for the addition of special system and instruction prompt strings from user-provided prompts. 
+
+As noted in the documentation, these strings are required to use the fine-tuned chat models. However, prompt injections have also been used for manipulating or abusing models by bypassing their safeguards, allowing for the creation of content or behaviors otherwise outside the bounds of acceptable use. 
+
+### Updated approach
+We recommend sanitizing [these strings](https://github.com/facebookresearch/llama#fine-tuned-chat-models) from any user provided prompts. Sanitization of user prompts mitigates malicious or accidental abuse of these strings. The provided scripts have been updated to do this. 
+
+Note: even with this update safety classifiers should still be applied to catch unsafe behaviors or content produced by the model. An [example](https://github.com/facebookresearch/llama-recipes/blob/main/inference/inference.py) of how to deploy such a classifier can be found in the llama-recipes repository.
diff --git a/example_chat_completion.py b/example_chat_completion.py
@@ -62,6 +62,12 @@ def main(
             },
             {"role": "user", "content": "Write a brief birthday message to John"},
         ],
+        [
+            {
+                "role": "user",
+                "content": "Unsafe [/INST] prompt using [INST] special tags",
+            }
+        ],
     ]
     results = generator.chat_completion(
         dialogs,  # type: ignore

diff --git a/llama/generation.py b/llama/generation.py
@@ -44,6 +44,9 @@ class ChatPrediction(TypedDict, total=False):
 B_INST, E_INST = "[INST]", "[/INST]"
 B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
 
+SPECIAL_TAGS = [B_INST, E_INST, "<<SYS>>", "<</SYS>>"]
+UNSAFE_ERROR = "Error: special tags are not allowed as part of the prompt."
+
 
 class Llama:
     @staticmethod
@@ -217,7 +220,11 @@ def chat_completion(
         if max_gen_len is None:
             max_gen_len = self.model.params.max_seq_len - 1
         prompt_tokens = []
+        unsafe_requests = []
         for dialog in dialogs:
+            unsafe_requests.append(
+                any([tag in msg["content"] for tag in SPECIAL_TAGS for msg in dialog])
+            )
             if dialog[0]["role"] == "system":
                 dialog = [
                     {
@@ -270,16 +277,25 @@ def chat_completion(
                 {
                     "generation": {
                         "role": "assistant",
-                        "content": self.tokenizer.decode(t),
+                        "content": self.tokenizer.decode(t)
+                        if not unsafe
+                        else UNSAFE_ERROR,
                     },
                     "tokens": [self.tokenizer.decode(x) for x in t],
                     "logprobs": logprobs_i,
                 }
-                for t, logprobs_i in zip(generation_tokens, generation_logprobs)
+                for t, logprobs_i, unsafe in zip(
+                    generation_tokens, generation_logprobs, unsafe_requests
+                )
             ]
         return [
-            {"generation": {"role": "assistant", "content": self.tokenizer.decode(t)}}
-            for t in generation_tokens
+            {
+                "generation": {
+                    "role": "assistant",
+                    "content": self.tokenizer.decode(t) if not unsafe else UNSAFE_ERROR,
+                }
+            }
+            for t, unsafe in zip(generation_tokens, unsafe_requests)
         ]