AMD thread #3759

Open
oobabooga opened this issue Aug 30, 2023 · 312 comments

@oobabooga
Owner

oobabooga commented Aug 30, 2023

This thread is dedicated to discussing the setup of the webui on AMD GPUs.

You are welcome to ask questions as well as share your experiences, tips, and insights to make the process easier for all AMD users.

@oobabooga oobabooga pinned this issue Aug 30, 2023
@MistakingManx

Why no AMD for Windows?

@BarfingLemurs

@MistakingManx there is, but you have to DIY a llama-cpp-python build. It will be harder to set up than on Linux.
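
Roughly, the DIY build looks something like this (a sketch only, assuming the HIP SDK for Windows and a working compiler toolchain are installed; the exact CMake flag can differ between llama.cpp versions):

  set CMAKE_ARGS=-DLLAMA_HIPBLAS=on
  set FORCE_CMAKE=1
  pip install llama-cpp-python --no-cache-dir --force-reinstall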

@lufixSch

Does anyone have a working AutoGPTQ setup?

Mine was really slow when I installed the wheel: https://github.com/PanQiWei/AutoGPTQ/releases/download/v0.4.2/auto_gptq-0.4.2+rocm5.4.2-cp310-cp310-linux_x86_64.whl

When building from source, the text generation is much faster but the output is just gibberish.

I am running on an RX 6750 XT, if that's important.

@MistakingManx

@MistakingManx there is, but you have to DIY a llama-cpp-python build. It will be harder to set up than on Linux.

Why exactly do models prefer a GPU over a CPU? Mine runs quickly on the CPU, but OBS kills it because OBS uses so much of it.

@BarfingLemurs

Why exactly do models prefer

It's users who prefer them, since an AMD GPU comparable with a 3090 may work at ~20 t/s for a 34B model.

@MistakingManx

MistakingManx commented Sep 3, 2023

I have an AMD Radeon RX 5500 XT, is that good?
My CPU spits out fully completed responses within 6 seconds when it isn't stressed by OBS.
Otherwise it takes around 35 seconds. If I could speed that up with my GPU, I'd say it's worth the setup.

@CNR0706

CNR0706 commented Sep 3, 2023

I'm having trouble getting the WebUI to even launch. I'm using ROCm 6.1 on openSuSE Tumbleweed Linux with a 6700XT.

I used the 1 click installer to set it up (and I selected ROCm support) but after the installation finished it just threw an error:

cnr07@opensuse-linux-gpc:~/oobabooga_linux> ./start_linux.sh
Traceback (most recent call last):
  File "/home/cnr07/oobabooga_linux/text-generation-webui/server.py", line 28, in <module>
    from modules import (
  File "/home/cnr07/oobabooga_linux/text-generation-webui/modules/training.py", line 21, in <module>
    from peft import (
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/peft/__init__.py", line 22, in <module>
    from .auto import (
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/peft/auto.py", line 31, in <module>
    from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/peft/mapping.py", line 23, in <module>
    from .peft_model import (
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/peft/peft_model.py", line 38, in <module>
    from .tuners import (
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/peft/tuners/__init__.py", line 21, in <module>
    from .lora import LoraConfig, LoraModel
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/peft/tuners/lora.py", line 45, in <module>
    import bitsandbytes as bnb
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/research/__init__.py", line 1, in <module>
    from . import nn
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/research/nn/__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/research/nn/modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/optim/__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 13, in <module>
    setup.run_cuda_setup()
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py", line 120, in run_cuda_setup
    binary_name, cudart_path, cc, cuda_version_string = evaluate_cuda_setup()
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py", line 341, in evaluate_cuda_setup
    cuda_version_string = get_cuda_version()
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py", line 311, in get_cuda_version
    major, minor = map(int, torch.version.cuda.split("."))
AttributeError: 'NoneType' object has no attribute 'split'
--- System ---
GPU: RX 6700XT
CPU: R5 3600
RAM: 16 GiB
OS: openSuSE Tumbleweed (up to date)
Kernel: Linux 6.4.11-1-default
GPU Driver: AMDGPU FOSS Kernel driver, full Mesa 23.1.6
ROCm: 6.1, from AMD's SuSE repo

@henrittp

henrittp commented Sep 3, 2023

I'm having trouble getting the WebUI to even launch. I'm using ROCm 6.1 on openSuSE Tumbleweed Linux with a 6700XT.

I used the 1 click installer to set it up (and I selected ROCm support) but after the installation finished it just threw an error:

(… same traceback and system info as in the previous comment, ending in: AttributeError: 'NoneType' object has no attribute 'split')

Same issue here, still no solution for me. Can anyone shed some light on this? Thanks in advance.

@CNR0706

CNR0706 commented Sep 3, 2023

Okay, so this is definitely not ideal, but I found that VERY carefully following the manual installation guide and then uninstalling bitsandbytes makes it work. I'm still figuring things out, but at least it works now.

@henrittp

henrittp commented Sep 3, 2023

then uninstalling bitsandbytes makes it work

Did you then install that modified version of bitsandbytes for ROCm? Or...? What exactly did you do? Thanks in advance.

@henrittp

henrittp commented Sep 3, 2023

@CNR0706 I managed to install a modified version of bitsandbytes for ROCm. Just follow this tutorial and you should be fine: YT Video. That way you can leverage everything this lib offers (or almost everything, but anyway...)

@lufixSch

lufixSch commented Sep 3, 2023

@CNR0706 I managed to install a modified version of bitsandbytes for ROCm. Just follow this tutorial and you should be fine: YT Video. That way you can leverage everything this lib offers (or almost everything, but anyway...)

I am not sure which version is newer, but I used https://github.com/agrocylo/bitsandbytes-rocm.
You need to build it from source with the following commands:

git clone git@github.com:agrocylo/bitsandbytes-rocm.git
cd bitsandbytes-rocm/
export PATH=/opt/rocm/bin:$PATH #Add ROCm to $PATH
export HSA_OVERRIDE_GFX_VERSION=10.3.0 HCC_AMDGPU_TARGET=gfx1030
make hip
python setup.py install

Make sure the environment variables are also set when you start the webui. Depending on your GPU, you might need to change the GPU target or GFX version.
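
For example (a minimal sketch, assuming the same gfx1030 target as above), before launching the webui:

  export HSA_OVERRIDE_GFX_VERSION=10.3.0
  export HCC_AMDGPU_TARGET=gfx1030
  export PATH=/opt/rocm/bin:$PATH
  python server.py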

@lufixSch

lufixSch commented Sep 3, 2023

I have an AMD Radeon RX 5500 XT, is that good? My CPU spits out fully completed responses within 6 seconds when it isn't stressed by OBS. Otherwise it takes around 35 seconds. If I could speed that up with my GPU, I'd say it's worth the setup.

Saying it takes 6 seconds is not that helpful for getting an idea of the performance, because that depends on the length of the output. Take a look at the console: after every generation it prints the generation speed in t/s. It also depends on what model you are using.

With my RX 6750 XT I got about 35 t/s with a 7B GPTQ model.

@lufixSch

lufixSch commented Sep 3, 2023

@henrittp, @CNR0706 Did you try setting up AutoGPTQ? Did it work for you?

@RBNXI

RBNXI commented Sep 7, 2023

I have the AttributeError: 'NoneType' object has no attribute 'split' error too...
Has ANYONE managed to run this with ROCm at all? I'm starting to think that AMD is just useless for this stuff.

@lufixSch

lufixSch commented Sep 7, 2023

I have the AttributeError: 'NoneType' object has no attribute 'split' error too...

@RBNXI This is caused by bitsandbytes. You need to install a specific version. Take a look at my comment above.

Has ANYONE managed to run this with ROCm at all? I'm starting to think that AMD is just useless for this stuff.

Yes, it worked really well on my PC until I broke my installation with an update of the repository.
I am also running Stable Diffusion on my PC with AUTOMATIC1111 and it works great. The AUTOMATIC1111 setup is much easier, because the install script takes care of everything.

I plan on improving the one click installer and/or the setup guide of the oobabooga webui for AMD to make the setup easier, if I ever get it running again :)

@RBNXI

RBNXI commented Sep 7, 2023

I plan on improving the one click installer and/or the setup guide of the oobabooga webui for AMD to make the setup easier, if I ever get it running again :)

Cool, I'll be waiting for that then.

@RBNXI This is caused by bitsandbytes. You need to install a specific version. Take a look at my comment above.

I saw it and tried to build it, but it gave an error and I got tired of trying stuff. I just thought "well, having to do so many steps and then having so many errors must mean it's just not ready yet...". But I could try again another day when I have more time, if I can fix that error. Thanks.

@lufixSch

lufixSch commented Sep 7, 2023

@RBNXI What error did you get?
Make sure the repo is located on a path without spaces. This seems to cause issues sometimes. And you need the rocm-hip-sdk package (at least on Arch Linux it is called that).

"well, having to do so many steps and then having so many errors must mean it's just not ready yet..."

Yes I can understand that. The setup with NVIDIA is definitely easier.

@RBNXI

RBNXI commented Sep 7, 2023

@RBNXI What error did you get? Make sure the repo is located on a path without spaces. This seems to cause issues sometimes. And you need the rocm-hip-sdk package (at least on Arch Linux it is called that).

I don't remember the error, I'm sorry. But I have a question for when I try again: the command you used to clone (git clone git@github.com:agrocylo/bitsandbytes-rocm.git) gave me an error, I remember. Is it OK to just clone with the default link to the repo? It said the link you used is private or something like that.

@lufixSch

lufixSch commented Sep 7, 2023

Yes you can of course use the link from the repo directly. You probably mean this one: https://github.com/agrocylo/bitsandbytes-rocm.git
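
That is, the clone step simply becomes:

  git clone https://github.com/agrocylo/bitsandbytes-rocm.git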

@RBNXI

RBNXI commented Sep 8, 2023

I tried again and got the same result. I followed the installation tutorial, everything works fine, then I run it and get the split error. Then I compiled bitsandbytes from that repo (this time it worked), tried to run again, and got the same split error again...
Edit: I managed to fix that error and now everything is apparently working, but when I try to load a model it says: assert self.model is not None
Errors are never ending...

@containerblaq1

Installing bitsandbytes-rocm is the only way I've been able to make this work. The new install doesn't seem to work for the 7900 XTX.

@lufixSch

lufixSch commented Sep 9, 2023

AMD Setup Step-by-Step Guide (WIP)

I finally got my setup working again (by reinstalling everything). Here is a step by step guide on how I got it running:

I tested all steps on Manjaro, but they should work on other Linux distros. I have no idea how the steps can be transferred to Windows. Please leave a comment if you have a solution for Windows.

NOTE: At the start of each step I assume you have the terminal opened at the root of the project and that you have ROCm installed (on Arch-based distros you need to install the rocm-hip-sdk package).
Furthermore, consider creating a virtual environment (for example with miniconda or venv) and activating it.

NOTE: If you have a 7xxx gen AMD GPU, please read the notes at the end of this guide.

Step 1: Install dependencies (should be similar to the one-click installer except for the last step)

  1. pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
  2. pip install -r requirements_nocuda.txt
  3. export HSA_OVERRIDE_GFX_VERSION=10.3.0, export HCC_AMDGPU_TARGET=gfx1030 and export PATH=/opt/rocm/bin:$PATH (consider adding those lines to your .bash_profile, .zprofile or .profile as you need to run them every time you start the webui) (the gfx version might change depending on your GPU -> https://www.llvm.org/docs/AMDGPUUsage.html#processors)

If you get an error installing torch, try running pip install -r requirements_nocuda.txt first. After this, run the torch install command with the --force-reinstall option.
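
To check that the ROCm build of torch actually ended up in your environment (and not a CUDA build), something like this should print a ROCm/HIP version and True:

  python -c "import torch; print(torch.__version__, torch.version.hip, torch.cuda.is_available())"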

Step 2: Fix bitsandbytes

This step did not work properly for me.
If you only want to get it working and don't want to use bitsandbytes on your GPU, just run pip install bitsandbytes==0.38.1. I mostly run GPTQ models and this was fine for me.
It seems like the official bitsandbytes project is working on supporting ROCm, but it will take a while until there is a working version.

  1. mkdir repositories && cd repositories
  2. git clone https://github.com/broncotc/bitsandbytes-rocm.git (or another fork listed below)
  3. make hip
  4. python setup.py install

I found the following forks, which should work for ROCm, but got none of them working. If you find a working version, please give some feedback.

Step 3: Install AutoGPTQ

This is only necessary if you want to run GPTQ models.

  1. mkdir repositories && cd repositories
  2. git clone https://github.com/PanQiWei/AutoGPTQ.git && cd AutoGPTQ
  3. ROCM_VERSION=5.4.2 pip install -v .

If the installation fails, try applying the patch provided by this article.
Run git apply with the patch provided below as an argument.

diff --git a/autogptq_extension/exllama/hip_compat.cuh b/autogptq_cuda/exllama/hip_compat.cuh
index 5cd2e85..79e0930 100644
--- a/autogptq_cuda/exllama/hip_compat.cuh
+++ b/autogptq_cuda/exllama/hip_compat.cuh
@@ -46,4 +46,6 @@ __host__ __forceinline__ hipblasStatus_t __compat_hipblasHgemm(hipblasHandle_t
 #define rocblas_set_stream hipblasSetStream
 #define rocblas_hgemm __compat_hipblasHgemm
 
+#define hipblasHgemm __compat_hipblasHgemm
+
 #endif
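
For example, assuming you saved the diff above as hip_compat.patch (the filename is just an example) inside the AutoGPTQ checkout:

  cd repositories/AutoGPTQ
  git apply hip_compat.patch
  ROCM_VERSION=5.4.2 pip install -v .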

Step 4: Exllama

This is only necessary if you want to use this model loader (faster for GPTQ models).

  1. mkdir repositories && cd repositories
  2. git clone https://github.com/turboderp/exllama && cd exllama
  3. pip install -r requirements.txt

Step 4.5: ExllamaV2

ExllamaV2 works out of the box and will be installed automatically when installing requirements_nocuda.txt

If you get an error running ExllamaV2 try installing the nightly version of torch for ROCm5.6 (Should be released as stable version soon)

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.6 --force-reinstall

Step 5: llama-cpp-python

Did not work for me today but it worked before (not sure what I did wrong today)

  1. CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DCMAKE_CXX_FLAGS='-fPIC'" FORCE_CMAKE=1 CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ pip install llama-cpp-python

You might need to add the --no-cache-dir and --force-reinstall options if you installed llama-cpp-python before.
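
Once it is installed, remember that nothing gets offloaded unless you pass GPU layers when loading, for example (a sketch; the model name is a placeholder and the layer count depends on your VRAM):

  python server.py --loader llama.cpp --n-gpu-layers 35 --model your-model.Q5_K_M.gguf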

I hope you can get it working with this guide :) I would appreciate some feedback on how this guide worked for you, so we can create a complete and robust setup guide for AMD devices (and maybe even update the one-click installer based on it).

Notes on 7xxx AMD GPUs

Remember that you have to change the GFX version in the environment variables: export HSA_OVERRIDE_GFX_VERSION=11.0.0, export HCC_AMDGPU_TARGET=gfx1100

As described in this article, you should make sure to install/set up ROCm without OpenCL, as this might cause problems with HIP.

You also need to install the nightly version of torch for ROCm 5.6 instead of ROCm 5.4.2 (should be released as a stable version soon):

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.6

@lufixSch

lufixSch commented Sep 9, 2023

when I try to load a model it says: assert self.model is not None. Errors are never ending...

@RBNXI What model are you using? Which loader are you using? Usually this error means the loader failed to load the model.

As explained in my guide above, you have to do extra steps for AutoGPTQ and Exllama/Exllama_HF.

Also note that with AutoGPTQ you often have to define wbits and groupsize, otherwise it will fail.
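
For example, loading a typical 4-bit GPTQ model from the command line would look roughly like this (the model name is a placeholder; the wbits/groupsize values depend on how the model was quantized):

  python server.py --loader autogptq --wbits 4 --groupsize 128 --model your-gptq-model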

@RBNXI

RBNXI commented Sep 9, 2023

Awesome guide, thanks, I'll try it when I can.
You mentioned that llama-cpp-python didn't work for you today and you don't know why. The model I was using was one of those; I think there's currently a known bug that doesn't let us load llama models, could that be the problem?
Also, I think my GPU doesn't appear here: https://www.llvm.org/docs/AMDGPUUsage.html#processors
I have an RX 6600, is that one also gfx1030?
Edit: I was able to load the model with llama.cpp, but it runs on the CPU. Do I have to do anything special for it to run on the GPU? I launch it with this: python server.py --chat --api --auto-devices --n-gpu-layers 1000000000 --n_ctx 4096 --mlock --verbose --model mythomax-l2-13b.Q5_K_M.gguf
Don't tell me my GPU doesn't support ROCm please...

I tried with different --n-gpu-layers values and got the same result.

Also, AutoGPTQ installation failed with

 Total number of replaced kernel launches: 4
  running clean
  removing 'build/temp.linux-x86_64-cpython-310' (and everything under it)
  removing 'build/lib.linux-x86_64-cpython-310' (and everything under it)
  'build/bdist.linux-x86_64' does not exist -- can't clean it
  'build/scripts-3.10' does not exist -- can't clean it
  removing 'build'
Failed to build auto-gptq
ERROR: Could not build wheels for auto-gptq, which is required to install pyproject.toml-based projects

Edit 2: I tried running a GPTQ model anyway, and it starts to load into VRAM so the GPU is detected, but it fails with:

Traceback (most recent call last):
  File "/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/modules/ui_model_menu.py", line 196, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/modules/models.py", line 79, in load_model
    output = load_func_map[loader](model_name)
  File "/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/modules/models.py", line 320, in AutoGPTQ_loader
    return modules.AutoGPTQ_loader.load_quantized(model_name)
  File "/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/modules/AutoGPTQ_loader.py", line 57, in load_quantized
    model = AutoGPTQForCausalLM.from_quantized(path_to_model, **params)
  File "/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/miniconda/envs/textgen/lib/python3.10/site-packages/auto_gptq/modeling/auto.py", line 108, in from_quantized
    return quant_func(
  File "/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/miniconda/envs/textgen/lib/python3.10/site-packages/auto_gptq/modeling/_base.py", line 875, in from_quantized
    accelerate.utils.modeling.load_checkpoint_in_model(
  File "/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/miniconda/envs/textgen/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 1392, in load_checkpoint_in_model
    set_module_tensor_to_device(
  File "/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/miniconda/envs/textgen/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 281, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([108, 640]) in "qzeros" (which has shape torch.Size([432, 640])), this look incorrect.

@lufixSch

@RBNXI I found this issue in the ROCm repo discussing the RX 6600. According to it, the RX 6600 should work. Usually gfx1030 works for all 6xxx cards. You could check whether your GPU is working by running rocminfo and clinfo; both commands should mention your GPU.
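
For example (assuming ROCm is installed under /opt/rocm; on Arch the rocminfo package provides the first tool):

  /opt/rocm/bin/rocminfo | grep -i gfx
  clinfo | grep -i "device name"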

llama.cpp probably runs on the CPU because the prebuilt Python package is only built with CPU support. This is why you need to install it with the command from my guide.

Regarding AutoGPTQ: I think you just copied the last lines, not the real error that broke the installation, so I am not sure what the problem is. Maybe check your ROCm version and change the ROCM_VERSION variable accordingly.
Did you install the rocm-hip-sdk package (or whatever it is called on your distro)? What Linux distro are you running, by the way?
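
To check which ROCm version you actually have installed (so ROCM_VERSION matches), something like this should work on most installs, though paths can differ by distro:

  cat /opt/rocm/.info/version
  hipconfig --version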

I usually run the webui with python server.py and load the models using the GUI. This way the GUI usually chooses the default parameters by itself and it is easier to get it working. I should also note that I run the newest version from the main branch. If you are using the one-click installer v1.5, you're using the old requirements.txt, which might explain why llama.cpp with CPU support is installed and why AutoGPTQ kind of works even though you did not install it.

@RBNXI

RBNXI commented Sep 10, 2023

I don't have rocminfo installed, should I? But clinfo does show my GPU.

I'll try to reinstall again and see if it works now.

I did install rocm-hip-sdk. And I'm using Arch.

Also I'm running it in a miniconda environment, is that a problem?

Also, the ROCm I have installed is from the Arch repository, I think it's 5.6.0; is that a problem? If I change the version in the command (pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2 -> pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6.0) it says

ERROR: Could not find a version that satisfies the requirement torchvision (from versions: none)
ERROR: No matching distribution found for torchvision

@RBNXI

RBNXI commented Sep 10, 2023

I'm trying to install and there are still errors everywhere. First of all the bitsandbytes installation fails, so I have to use the pip one.
Then I try to install AutoGPTQ and can't; it gives this error
(tried with both ROCm versions):


(textgen) [ruben@ruben AutoGPTQ]$ ROCM_VERSION=5.6.0 pip install -v .
Using pip 23.2.1 from /run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/miniconda/envs/textgen/lib/python3.10/site-packages/pip (python 3.10)
Processing /run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/repositories/AutoGPTQ
  Running command python setup.py egg_info
  Trying to compile auto-gptq for RoCm, but PyTorch 2.0.1+cu117 is installed without RoCm support.
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 255
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/miniconda/envs/textgen/bin/python -c '
  exec(compile('"'"''"'"''"'"'
  # This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
  #
  # - It imports setuptools before invoking setup.py, to enable projects that directly
  #   import from `distutils.core` to work with newer packaging standards.
  # - It provides a clear error message when setuptools is not installed.
  # - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
  #   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
  #     manifest_maker: standard file '"'"'-c'"'"' not found".
  # - It generates a shim setup.py, for handling setup.cfg-only projects.
  import os, sys, tokenize
  
  try:
      import setuptools
  except ImportError as error:
      print(
          "ERROR: Can not execute `setup.py` since setuptools is not available in "
          "the build environment.",
          file=sys.stderr,
      )
      sys.exit(1)
  
  __file__ = %r
  sys.argv[0] = __file__
  
  if os.path.exists(__file__):
      filename = __file__
      with tokenize.open(__file__) as f:
          setup_py_code = f.read()
  else:
      filename = "<auto-generated setuptools caller>"
      setup_py_code = "from setuptools import setup; setup()"
  
  exec(compile(setup_py_code, filename, "exec"))
  '"'"''"'"''"'"' % ('"'"'/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/repositories/AutoGPTQ/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' egg_info --egg-base /tmp/pip-pip-egg-info-spo0oczo
  cwd: /run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/repositories/AutoGPTQ/
  Preparing metadata (setup.py) ... error
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
(textgen) [ruben@ruben AutoGPTQ]$ ROCM_VERSION=5.4.2 pip install -v .
Using pip 23.2.1 from /run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/miniconda/envs/textgen/lib/python3.10/site-packages/pip (python 3.10)
Processing /run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/repositories/AutoGPTQ
  Running command python setup.py egg_info
  Trying to compile auto-gptq for RoCm, but PyTorch 2.0.1+cu117 is installed without RoCm support.
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 255
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/miniconda/envs/textgen/bin/python -c '
  exec(compile('"'"''"'"''"'"'
  # This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
  #
  # - It imports setuptools before invoking setup.py, to enable projects that directly
  #   import from `distutils.core` to work with newer packaging standards.
  # - It provides a clear error message when setuptools is not installed.
  # - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
  #   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
  #     manifest_maker: standard file '"'"'-c'"'"' not found".
  # - It generates a shim setup.py, for handling setup.cfg-only projects.
  import os, sys, tokenize
  
  try:
      import setuptools
  except ImportError as error:
      print(
          "ERROR: Can not execute `setup.py` since setuptools is not available in "
          "the build environment.",
          file=sys.stderr,
      )
      sys.exit(1)
  
  __file__ = %r
  sys.argv[0] = __file__
  
  if os.path.exists(__file__):
      filename = __file__
      with tokenize.open(__file__) as f:
          setup_py_code = f.read()
  else:
      filename = "<auto-generated setuptools caller>"
      setup_py_code = "from setuptools import setup; setup()"
  
  exec(compile(setup_py_code, filename, "exec"))
  '"'"''"'"''"'"' % ('"'"'/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/repositories/AutoGPTQ/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' egg_info --egg-base /tmp/pip-pip-egg-info-3151ooou
  cwd: /run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/repositories/AutoGPTQ/
  Preparing metadata (setup.py) ... error
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

What am I doing wrong? I'm following the guide... this is so frustrating... Could it be that I have to install ROCm 5.4.2 from some rare repository, or compile it myself, or something obscure like that? It says PyTorch is installed without ROCm support, even though I installed it with pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2

Edit: The order of steps 1 and 2 in the install dependencies section matters: if you run pip install -r requirements_nocuda.txt first, it will install PyTorch without ROCm support...

@CubeTheThird

Sup guys, still trying to get the best from my AMD cards. The best I could get was with Vulkan and llama.cpp, but now there is a CUDA-for-ROCm toolchain that claims to work with llama.cpp: https://docs.scale-lang.com/ If that's true, then with some work we could even run exl2, vLLM, Aphrodite, and other LLM inference engines that are way better and faster than llama.cpp.

It's already possible to use other model types if your card can run with ROCm. I have no issue using an exl2 model on a 6700 XT, for instance. If you're talking about older cards though, then it would be a possibility, sure.

@userbox020

userbox020 commented Jul 17, 2024

Sup guys, still trying to get the best from my AMD cards. The best I could get was with Vulkan and llama.cpp, but now there is a CUDA-for-ROCm toolchain that claims to work with llama.cpp: https://docs.scale-lang.com/ If that's true, then with some work we could even run exl2, vLLM, Aphrodite, and other LLM inference engines that are way better and faster than llama.cpp.

It's already possible to use other model types if your card can run with ROCm. I have no issue using an exl2 model on a 6700 XT, for instance. If you're talking about older cards though, then it would be a possibility, sure.

You're right, bro @CubeTheThird. I'm running a very particular setup: a PCIe x1 gen1 motherboard with 12 GPUs connected. The only inference engine I ever got to work with that setup is llama.cpp. Meanwhile, CUDA has no problem with a PCIe x1 gen1 setup and runs any inference engine.

@dgdguk

dgdguk commented Jul 18, 2024

@userbox020 SCALE isn't open source, and honestly, it's likely going to get sued into oblivion. It doesn't necessarily matter that such a lawsuit isn't well founded - Nvidia can "win" by just suing the company to scare off customers and starve Spectral Compute of funding.

As far as your issues go: the problem isn't with ROCm, it's with the runtime environment. So building the packages yourself with your own runtime environment is an option, although the use of a renamed package in text-generation-webui makes things rather difficult. Or you can use the older version of llama-cpp-cuda mentioned above.

@userbox020

@userbox020 SCALE isn't open source, and honestly, it's likely going to get sued into oblivion. It doesn't necessarily matter that such a lawsuit isn't well founded - Nvidia can "win" by just suing the company to scare off customers and starve Spectral Compute of funding.

As far as your issues go: the problem isn't with ROCm, it's with the runtime environment. So building the packages yourself with your own runtime environment is an option, although the use of a renamed package in text-generation-webui makes things rather difficult. Or you can use the older version of llama-cpp-cuda mentioned above.

So is it a scam? I haven't tried the repo yet, but there are a couple of repos out there that integrate CUDA with ROCm, like ZLUDA.

@userbox020

Also, I can see the example code for SCALE, and I can't see any code error other than CMAKE_BUILD_TYPE=RelWithDebInfo not showing the version.

https://github.com/spectral-compute/scale-examples

@dgdguk

dgdguk commented Jul 24, 2024

OK - how does any of this help with any of the issues here, especially with llama.cpp, which has first-class support for ROCm?

The problem is something in the build environment of text-generation-webui which I've been unable to pinpoint. It's easy enough to build the packages from source, just not within the text-generation-webui framework.

@oobabooga One thing that comes to mind: if it were possible to split text-generation-webui into the UI and language generation components (so the UI talks to a locally hosted language generation server) it would make it much simpler for other people to provide servers which support hardware that you don't have access to, like AMD GPUs or NPUs in general. It would also fix the issues surrounding CPU and GPU llama.cpp (because they would reside in separate environments, making it trivial to switch between them as well as not needing to patch a module into GPU mode), as well as opening up the possibility of using multiple models simultaneously (by hosting multiple servers).

@NonaSuomy

Did anyone figure out yet how to get AMD working with recent builds of textgen and stuff like Llama 3.1 GGUF?

@dgdguk

dgdguk commented Aug 10, 2024

The only way I've managed it is by building my own llama-cpp-python with ROCm support, and then using that in my own virtualenv. I still have not managed to figure out why ROCm builds of llama-cpp-python fail under the Oobabooga mandated Conda environment. But the state of AMD support for this project is basically "abandoned", and it certainly feels like @oobabooga isn't interested in improving AMD support - to the point where I'm reluctant to put in the effort to fix things and make a pull request.

I am slowly making some build scripts that can handle the weird requirements of webui, and may at some point publish these. But it's early days on that front.

@nktice

nktice commented Aug 10, 2024

Thought I'd posted this, but I don't see it when searching, so I'll post again...
I've composed this guide for using Oobabooga under Ubuntu:
https://github.com/nktice/AMD-AI
(see the other files there for older driver versions, and dev vs stable)
Basically I have simplified the requirements and that seems to work.

I wasn't ever able to use the built-in scripts to my satisfaction.
The requirements_amd.txt had been pretty good until some recent changes;
as it refers to older versions than I use, I've worked around it.
With those workarounds the software is wonderful... I'm thankful for it.

@CubeTheThird

I've been observing the same issue for some time too. As a workaround, you can uninstall both llama-cpp versions and install/build it manually.

CC='/opt/rocm/llvm/bin/clang' CXX='/opt/rocm/llvm/bin/clang++' CFLAGS='-fPIC' CXXFLAGS='-fPIC' CMAKE_PREFIX_PATH='/opt/rocm' ROCM_PATH="/opt/rocm" HIP_PATH="/opt/rocm" CMAKE_ARGS="-GNinja -DLLAMA_HIPBLAS=ON -DLLAMA_AVX2=on -DGPU_TARGETS=$GFX_VER" pip install --no-cache-dir llama-cpp-python

Make sure to replace $GFX_VER with your gpu target(s).

I have tried manually building this way, and while it builds successfully, the resulting llama-cpp-python package seems to only run on CPU regardless.

Not sure if related, but installing this way, I encounter an error when trying to load it (something to do with llama-cpp-python-cuda vs non-cuda package name). My fix has been to tweak the file llama_cpp_python_hijack.py, and swap the two lines:

(None, 'llama_cpp_cuda'),
(None, 'llama_cpp')

I don't know if this is the right way to do this, as I'm not 100% sure how the error is caused in the first place.

@codeswhite

codeswhite commented Aug 13, 2024

AMD support seems broken for me on Linux, with ROCm 6.0 in /opt/rocm on a desktop with dedicated AMD card.

At first the installation was successful with the AMD option.

When I tried to do inference with a model loaded onto the GPU, it was constantly crashing with the following error:

02:08:03-895387 INFO     Loaded "mistral-7b.Q8_0.gguf" in 1.60 seconds.                                      
02:08:03-896031 INFO     LOADER: "llama.cpp"                                                                               
02:08:03-896540 INFO     TRUNCATION LENGTH: 32768                                                                          
02:08:03-896930 INFO     INSTRUCTION TEMPLATE: "ChatML"

Prompt evaluation:   0%|                                                                             | 0/1 [00:00<?, ?it/s]ggml_cuda_compute_forward: RMS_NORM failed
CUDA error: invalid device function
  current device: 0, in function ggml_cuda_compute_forward at /home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml/src/ggml-cuda.cu:2299
  err
/home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml/src/ggml-cuda.cu:101: CUDA error
ptrace: Operation not permitted.
No stack.
The program is not being run.

Looks like llama-cpp-python (specifically llama_cpp_python_cuda, the custom wheel built for GPU loading) is trying to use CUDA instead of ROCm.

Although, running pip list in the env, I see that ROCm is used, not CUDA:

llama_cpp_python          0.2.85+cpuavx2
llama_cpp_python_cuda     0.2.85+rocm5.6.1

(Could it be that the wheel was compiled with CUDA parameters?)

Weird, and it loads into the GPU fine; it just crashes on inference.


After some digging, the solution for me was to build llama-cpp-python from source inside the conda environment, approximately like this:

cd text-generation-webui
source ./installer_files/conda/etc/profile.d/conda.sh
conda activate ./installer_files/env
git clone --recurse-submodules https://github.com/abetlen/llama-cpp-python

cd llama-cpp-python/vendor/llama.cpp
export GPUTARGETS=gfx1030 CMAKE_BUILD_PARALLEL_LEVEL=22 ## Set your own
export CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ CFLAGS='-fPIC' CXXFLAGS='-fPIC' CMAKE_PREFIX_PATH=/opt/rocm ROCM_PATH=/opt/rocm HIP_PATH=/opt/rocm VERBOSE=1 CMAKE_ARGS="-GNinja -DLLAVA_BUILD=off -DGGML_HIPBLAS=ON -DGPU_TARGETS=$GPUTARGETS -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=off"
cmake -S . -B build && cmake --build build --config release

cd ../..
pip install build && python -m build --wheel
pip install ./dist/llama_cpp_python*.whl ## Note that it will replace the CPU llama-cpp library, so you will have to use the CPU parameter.
pip uninstall llama_cpp_python_cuda ## Not really needed
conda install -c conda-forge libstdcxx-ng=14.1.0

Not sure about the -DGGML_NATIVE compilation param; I didn't find any info about it in my quick search.


I think we could improve this repo by giving the user an option to either pull a pre-built wheel or build llama.cpp and the wheel locally as part of the initial installation.
@oobabooga let me know if you're interested in a PR for this.

Also, I'm thinking we could install the ROCm wheel under a different Python package name: llama_cpp_python_rocm.
The code (llama_cpp_python_hijack.py) should then load it explicitly when running on an AMD setup (e.g. PyTorch installed with HIP). I can make a PR for that as well.

Hope it helps someone :)

@dgdguk

dgdguk commented Aug 14, 2024

Some additional info on building llama-cpp-python natively: The only argument you need is -DGGML_HIPBLAS=on, as per llama-cpp-python's documentation. You don't need to specify the target architecture or AVX types if building locally.

Here is a small script that might be useful to the people here. It pulls down llama-cpp-python and its dependencies, renames it to llama-cpp-python-cuda and then compiles it. The wheel that's produced can then be installed into an Oobabooga environment and will function correctly, although note that if you are not running Ubuntu 20.04 (or another distro of around that time) you will likely have to make your own Python environment due to different libc versions. This script does assume a sane ROCm build environment, as well as git being installed.
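
For reference, a plain local build with just that flag looks roughly like this (assuming the sane ROCm build environment mentioned above):

  CMAKE_ARGS="-DGGML_HIPBLAS=on" pip install --no-cache-dir llama-cpp-python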

@feffy380

feffy380 commented Sep 8, 2024

@dgdguk Your script worked fine on Arch. I had the conda environment enabled but I'm not sure if it's necessary. The conda env created by the start script uses Python 3.11, so my only suggestion is to replace Path.walk with os.walk in your script for compatibility with Python <3.12

@dgdguk

dgdguk commented Sep 9, 2024

@feffy380 Thanks for the feedback. I've incorporated that change.

The script builds for whatever version of Python is running it, so building under a Conda environment gets you compatibility with that Conda environment. That being said, I had weird issues getting it to build within a text-generation-webui Conda environment when developing it, so I'm not sure if that's a long-term reliable option - I've been using a standard virtualenv with Python 3.12 for a while now.

@CaseyLabs

@dgdguk @codeswhite - thanks for your scripts! For some reason on Arch OS (git version 2.45.2), the git command for me is --recurse-submodules instead of --recursive-submodules:

git clone --recurse-submodules https://github.com/abetlen/llama-cpp-python

@FoxMcloud5655

FoxMcloud5655 commented Sep 22, 2024

I was able to get this working in Windows by following these steps:

  1. Follow these guidelines (thanks to @codeswhite) to create a working environment. Use the CMake for Windows instructions.
  2. Grab the script from @dgdguk from here: https://gist.github.com/dgdguk/95ac9ef99ef17c44c7e44a3b150c92b4
  3. Modify the script at the very bottom to change the cmake arguments to read
    build_env['CMAKE_ARGS'] = '-DGGML_HIPBLAS=on -DGGML_OPENMP=OFF -DAMDGPU_TARGETS='
  4. At the end of the -DAMDGPU_TARGETS= variable, find the closest GPU and insert the code for it. I have a Framework 16 with an AMD Radeon RX 7700S, so it's gfx1102, and the closest base to that is gfx1100, so that's what I used.
  5. Run the script in python.

The reason for the -DGGML_OPENMP=OFF variable is because of this: ggerganov/llama.cpp#7743

@codeswhite

@CaseyLabs

git clone --recurse-submodules

You're right! My typo. I edited it to avoid future confusion.

@alexmazaltov

alexmazaltov commented Sep 26, 2024

I am trying to get it working on macOS with Docker. The installation passed successfully, but when the container is running I am getting this error:

 docker-compose up -d
WARN[0000] /Users/mac/Projects/AI-webui/docker-compose.yml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion 
[+] Running 0/1
 ⠼ Container ai-webui-text-generation-webui-1  Starti...                                              0.4s 
Error response from daemon: error gathering device information while adding custom device "/dev/kfd": no such file or directory

Any hints would be appreciated.

Unfortunately, macOS does not support the /dev/kfd device, and you cannot directly access AMD GPUs in Docker on macOS. The best approach is to use a Linux environment or a cloud service that supports GPU access. If you have further questions or need assistance with a specific setup, feel free to ask!

@dgdguk

dgdguk commented Sep 27, 2024

@FoxMcloud5655 Not sure if you're still around, but can you confirm that adding the -DAMDGPU_TARGETS=some_gpu and -DGGML_HIPBLAS=on are necessary steps on Windows? As far as I can tell, they shouldn't be with current tooling. Assuming they're not, I'll update my script so that it works on Windows. Obviously, that -DAMDGPU_TARGETS=some_gpu is a bit more difficult to automate.

One other alternative is for Windows users to use WSL.

@alexmazaltov

@FoxMcloud5655 Not sure if you're still around, but can you confirm that adding the -DAMDGPU_TARGETS=some_gpu and -DGGML_HIPBLAS=on are necessary steps on Windows? As far as I can tell, they shouldn't be with current tooling. Assuming they're not, I'll update my script so that it works on Windows. Obviously, that -DAMDGPU_TARGETS=some_gpu is a bit more difficult to automate.

One other alternative is for Windows users to use WSL.

I am trying to install text-gen-webui on Windows, but I do not understand which script you are trying to update, @dgdguk. Is it possible to share it with me so I can also try it on Windows 10?

Please guide me.

@oobabooga
Owner Author

The llama-cpp-python workflow for AMD is broken again, does anyone know what causes this error?

https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/actions/runs/11079365489/job/30788288861

fatal error: error in backend: Instruction Combining seems stuck in an infinite loop after 1000 iterations.

@FoxMcloud5655

The llama-cpp-python workflow for AMD is broken again, does anyone know what causes this error?

https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/actions/runs/11079365489/job/30788288861

fatal error: error in backend: Instruction Combining seems stuck in an infinite loop after 1000 iterations.

A quick search says that this issue was fixed in ROCm 6.0.2. I see that this build uses 5.7.1, which technically shouldn't be an issue... But I was personally using 6.1 with no issues once the environment was set up on Windows 11.

@FoxMcloud5655

@FoxMcloud5655 Not sure if you're still around, but can you confirm that adding the -DAMDGPU_TARGETS=some_gpu and -DGGML_HIPBLAS=on are necessary steps on Windows? As far as I can tell, they shouldn't be with current tooling. Assuming they're not, I'll update my script so that it works on Windows. Obviously, that -DAMDGPU_TARGETS=some_gpu is a bit more difficult to automate.

One other alternative is for Windows users to use WSL.

The target isn't required, as far as I know. I used it to make sure the wheel was targeted to my own system with as small a footprint as possible. Not even sure if it helped or not. However, I can confirm that the HIPBLAS argument is required. A seemingly unrelated error appears if you don't include it.

@oobabooga
Owner Author

Currently the project uses rocm 5.6.1, and the error also happened with 5.7.1. It seems like most Linux distributions ship 5.7.1 by default, so I'm not sure if upgrading would be reasonable.

@LeonardoSidney

LeonardoSidney commented Sep 28, 2024

The llama-cpp-python workflow for AMD is broken again, does anyone know what causes this error?

https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/actions/runs/11079365489/job/30788288861

fatal error: error in backend: Instruction Combining seems stuck in an infinite loop after 1000 iterations.

The error I found here was from splitting the batch.
To fix this, I downgraded to llama-cpp-python==0.2.90.
torch 2.4.1, ROCm 6.1 and the bitsandbytes multi-backend-refactor branch have no issues.
To build llama-cpp-python on my machine I use the command:

CMAKE_ARGS="/opt/rocm/llvm/bin/clang HIP_PATH=/opt/rocm -DLLAMA_HIPBLAS=ON -DAMDGPU_TARGETS=gfx1100 -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ -DGGML_HIPBLAS=on -DCMAKE_C_FLAGS='-march=native -mavx2' -DCMAKE_CXX_FLAGS='-march=native -mavx2'" FORCE_CMAKE=1 CMAKE_BUILD_PARALLEL_LEVEL=16 pip install llama-cpp-python==0.2.90


@dgdguk

dgdguk commented Sep 30, 2024

The llama-cpp-python workflow for AMD is broken again, does anyone know what causes this error?

@oobabooga As others have said, the issue is likely due to the outdated version of ROCm that you are using.

One thing I don't quite get: it's pretty clear that one of the "features" you have invested a lot of effort in is following upstream project builds to a frankly unreasonable level, building almost every new commit, but you don't seem to apply that zeal to the frameworks. The most recent frameworks you're building against are ROCm 5.6.1 (May 2023) and CUDA 12.2 (June 2023). That's more than a year's worth of features and bug fixes you're leaving on the table.

Realistically, I think you should probably cull some of the older frameworks that you're building against so that you can build against newer targets. There's seriously no reason to build for every minor version of CUDA, i.e. 12.0, 12.1 and 12.2, or 11.6, 11.7 and 11.8. Just build 12.2, and possibly 11.8 if there were hardware deprecations in CUDA 12. The same is true for ROCm (more so, really, as ROCm has been getting a lot more feature enablement than CUDA).

@oobabooga
Owner Author

If you see an improvement to the wheels, send a PR to the repositories where they are compiled.

https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels

https://github.com/oobabooga/exllamav2

https://github.com/oobabooga/flash-attention

@dgdguk

dgdguk commented Sep 30, 2024

My point was more along the lines that it's not just AMD, which you have repeatedly disavowed interest in supporting due to lack of hardware. That I can understand. I still think you would be best served not building any AMD wheels at all, however, given that they are frequently non-functional and you lack the resources to fix them.

What's not understandable is leaving CUDA on an old version while targeting bleeding edge code for your builds of binaries. That's your wheelhouse.

I've said this before: text-generation-webui is currently trying to be two separate things. It's both a user interface component, and a distribution of binaries of upstream stuff to run that user interface. That second component gets in the way of anyone else stepping up to provide binaries for hardware that you do not support, and given how you are maintaining the CUDA side, I'd argue it's not exactly well maintained anyway.

And just to be clear on something: you cannot go "someone submit a PR" and then merge the code without testing - which you have explicitly said you cannot do for ROCm. Anyone could submit a PR that builds a malicious package, which you would then release under your own name, completely in ignorance. This would not be a good thing. If you're accepting PRs, then you or the project needs a way of testing.

@FoxMcloud5655

I second what @dgdguk said above. Don't get me wrong; I'm extremely appreciative of the work you've done so far. But like they said, I would much rather you refuse to support something entirely, simply providing build instructions, than provide non-working binaries.

There are a number of users who might even be happy to donate an AMD card for testing, or even provide funds explicitly for purchasing an AMD card for you to test with, me included. But the last thing I want is for bad practices to be followed. Open source code is a great thing when these practices are followed; let's keep it that way.
