
Configuring Custom Models


Warning

Here is a good example of a bad model. (Something's wrong.)
We will now walk through the configuration of a "Downloaded" model; this is required for it to (possibly) work.
Models found on HuggingFace or anywhere else are "unsupported", so you should follow this guide before asking for help.

Whether you "Sideload" or "Download" a custom model, you must configure it to work properly.

  • We will refer to a "Download" as any model that you found using the "Add Models" feature.
  • A "Sideload" is any model you got somewhere else and then put in the models directory.

1. Finding the model

In this example, we use the "Search" feature of GPT4All.
image
Typing the name of a custom model will search HuggingFace and return results.

  • A custom model is one that is not provided in GPT4All's default models list.
  • Any time you use the "Search" feature, you will get a list of custom models.

image

2. Finding the remote repository where the model is hosted

Click "More info can be found HERE.", which in this example brings you to huggingface. image

3. Configuring the model

Finding the configuration - In the Model Card

Here you will find the information that you need to configure the model. (This model may be outdated, it may have been a failed experiment, it may not yet be compatible with GPT4All, it may be dangerous, or it may also be GREAT!)

  • You need to know the Prompt Template.
  • You need to know the maximum context (in this example, 128k).
  • You need to know if there are known problems. See the community tab and look.

image
Maybe this won't affect you, but it's a good place to find out.

So next, let's find that template... Hopefully the model authors were kind and included it.
image
This looks like a good, helpful template. Hopefully it works. Keep in mind:

  • The model authors may not have tested their own model.
  • The model authors may not have bothered to change their model's configuration files from finetuning to inferencing workflows.
  • Even if they show you a template, it may be wrong.
  • Each model has its own tokens and its own syntax.
    • The models are trained with these, and you must use them for the model to work.
    • The model uploader may not understand this either and may provide a bad model or a mismatched template.

(Optional) Finding the configuration - In the configuration files

Apart from the model card, there are three files that could hold relevant information for running the model.

  • config.json
  • tokenizer_config.json
  • generation_config.json

Check config.json to find the capabilities of the model, such as the maximum context length. Check generation_config.json to find out about the original chat template; this is especially useful if the model author failed to provide one. Check all three files if you want to quantize the model, and cross-check that the model uses the proper beginning-of-string (bos) and end-of-string (eos) tokens.
image
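
If you prefer to check these files programmatically, here is a minimal Python sketch. It assumes the three files have been downloaded into the current directory and that the model follows the common HuggingFace field names (max_position_embeddings, bos_token, eos_token, chat_template); individual models may use different names.

import json

# Print the fields this guide cares about from the three configuration files.
# The field names follow common HuggingFace conventions and may differ
# for some models.
for name in ("config.json", "tokenizer_config.json", "generation_config.json"):
    try:
        with open(name) as f:
            data = json.load(f)
    except FileNotFoundError:
        print(f"{name}: not present in this repository")
        continue
    print(f"--- {name} ---")
    for key in ("max_position_embeddings",  # maximum context length
                "bos_token", "eos_token",   # beginning/end-of-string tokens
                "bos_token_id", "eos_token_id",
                "chat_template"):           # template, if the author provided one
        if key in data:
            print(f"  {key}: {data[key]}")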

Drafting the System Prompt and Chat Template

Important

The chat templates must be followed on a per-model basis. Every model is different. You can imagine them to be like magic spells:
your magic won't work if you say the wrong word, and it won't work if you say it at the wrong place or time.

At this step, we need to combine the chat template that we found in the model card (or in tokenizer_config.json) with the special syntax that the GPT4All-Chat application understands. (The format shown in the above screenshot is only an example.)

Model specific syntax:

Special tokens like <|user|> tell the model that the user is about to talk. <|end|> tells the LLM that this part is done and it should continue on.

GPT4All syntax:

  • We use %1 as a placeholder for the content of the user's prompt.
  • We use %2 as a placeholder for the content of the model's response.

The example prompt that should (in theory) be compatible with GPT4All will look like this for you...

System Prompt:
<|system|>
You are a helpful AI assistant.<|end|>
Chat Prompt Template:
<|user|>
%1<|end|>
<|assistant|>
%2<|end|>

You can see how the template will inject the stuff you type where the %1 goes. You can stuff something fun in there if you want to... (Now the chat knows my name!)

<|user|>
3Simplex:%1<|end|>
<|assistant|>
%2<|end|>
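
To make the placeholder mechanics concrete, here is a minimal Python sketch of the substitution. It is illustrative only: GPT4All performs this replacement internally, and render below is a hypothetical helper, not part of any API.

# Illustrative only: GPT4All does this substitution internally.
chat_template = """<|user|>
%1<|end|>
<|assistant|>
%2<|end|>"""

def render(user_prompt: str, model_response: str) -> str:
    # %1 receives what you typed; %2 marks where the model's reply goes.
    return chat_template.replace("%1", user_prompt).replace("%2", model_response)

print(render("Hello, who are you?", "I am a helpful AI assistant."))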

The system prompt will define the behavior of the model when you chat. You can say "Talk like a pirate, and be sure to keep your bird quiet!"
The prompt template will tell the model what is happening and when.
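
For instance, using the same syntax as the example model above, that pirate system prompt would look like this:

<|system|>
Talk like a pirate, and be sure to keep your bird quiet!<|end|>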

image

4. Settings

The Defaults:

The default settings are a good, safe place to start and provide good output for most models. For instance, you can't blow up your RAM on only 2048 context, and you can always increase it to whatever the model supports.

Context Length

This is the maximum context that you will use with the model. Context is, roughly, the sum of the tokens in the system prompt + chat template + user prompts + model responses + tokens that were added to the model's context via retrieval-augmented generation (RAG), which in GPT4All is the LocalDocs feature. You need to keep context length within two safe margins:

    1. Your system can only use so much memory. Using more than you have will cause severe slowdowns or even crashes.
    2. Your model is only capable of what it was trained for. Using more than that will give trash answers and gibberish.

image
Since we are talking about computer terminology here, 1k = 1024, not 1000. So 128k, as advertised by the Phi-3 model, translates to 1024 × 128 = 131072.
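
As a rough illustration of staying within those two margins, here is a small Python budgeting sketch. All of the token counts are made-up assumptions; your real numbers depend on the tokenizer and on your actual conversation.

# Rough context budgeting. Every count below is an assumed example value.
context_length = 128 * 1024      # 131072 tokens, the advertised 128k

system_prompt   = 20             # tokens used by the system prompt
chat_template   = 15             # tokens of template overhead per exchange
user_prompts    = 500            # everything you have typed so far
model_responses = 4096           # everything the model has answered so far
localdocs_rag   = 2000           # snippets injected by LocalDocs, if enabled

used = system_prompt + chat_template + user_prompts + model_responses + localdocs_rag
print(f"{used} of {context_length} tokens used")
if used > context_length:
    print("The conversation no longer fits in the context window.")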

Max Length

I will use 4096, which is 4k of a response. I like allowing for a great response but want to stop the model at that point. (Maybe you want it longer? Try 8192.)

GPU Layers

This is one that you need to think about if you have a small GPU or a big model.
image
This will be set to load all layers on the GPU. You may need to offload fewer layers to get the model to work for you; a rough way to estimate this is sketched below.
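
Here is a back-of-the-envelope Python sketch for picking a layer count. It assumes the model's weights are spread roughly evenly across its layers; the file size, layer count ("num_hidden_layers" in config.json), and free VRAM are example assumptions, not measurements.

# Back-of-the-envelope layer estimate. All numbers are assumed examples.
model_size_gb = 7.0    # size of the .gguf file on disk
n_layers      = 32     # "num_hidden_layers" from config.json
free_vram_gb  = 4.0    # VRAM you can spare on your GPU

per_layer_gb = model_size_gb / n_layers                 # assumes an even spread
fits = min(n_layers, int(free_vram_gb / per_layer_gb))
print(f"Roughly {fits} of {n_layers} layers should fit in {free_vram_gb} GB")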

Chat Name Prompt and Suggested FollowUp Prompt

These settings are model-independent. They only affect the GPT4All environment. You can play with them all you like.
image

The other settings (ToDo)

The rest of these are special settings that take more training and experience to learn. Most of the time they don't need to be changed.

You should now have a fully configured model. I hope it works for you!

More Advanced Topics:

  • The model is now configured but still doesn't work.
  • Explain how the tokens work in the templates.