Ubuntu 24.04 LTS, ROCm 6.2
(base) lhl@rocm:~/llama.cpp$ CUDA_VISIBLE_DEVICES=0 ./llama-bench -m /models/gguf/llama-2-7b.Q4_K_M.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon PRO W7900, compute capability 11.0, VMM: no
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | ---------------: |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | ROCm | 99 | pp512 | 2845.90 ± 11.00 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | ROCm | 99 | tg128 | 78.92 ± 0.12 |
build: 96355290 (3141)
(base) lhl@rocm:~/llama.cpp$ CUDA_VISIBLE_DEVICES=0 ./llama-bench -m /models/gguf/llama-2-7b.Q4_0.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon PRO W7900, compute capability 11.0, VMM: no
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | ---------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm | 99 | pp512 | 2837.83 ± 136.68 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm | 99 | tg128 | 94.46 ± 0.07 |
build: 96355290 (3141)
With the latest build:
(base) lhl@rocm:~/llama.cpp$ CUDA_VISIBLE_DEVICES=0 ./llama-bench -m /models/gguf/llama-2-7b.Q4_K_M.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon PRO W7900, compute capability 11.0, VMM: no
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | ---------------: |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | ROCm | 99 | pp512 | 2877.31 ± 14.47 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | ROCm | 99 | tg128 | 79.44 ± 0.13 |
build: c02b0a8a (3512)
(base) lhl@rocm:~/llama.cpp$ CUDA_VISIBLE_DEVICES=0 ./llama-bench -m /models/gguf/llama-2-7b.Q4_0.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon PRO W7900, compute capability 11.0, VMM: no
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | ---------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm | 99 | pp512 | 2907.80 ± 22.70 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm | 99 | tg128 | 95.01 ± 0.05 |
build: c02b0a8a (3512)
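For a quick comparison of the two builds, the throughput deltas can be computed from the t/s means in the tables above (this is just arithmetic on the logged numbers, not part of the benchmark output):

```python
# Percent change in mean throughput from build 96355290 (3141)
# to build c02b0a8a (3512), values copied from the tables above.
old = {("Q4_K_M", "pp512"): 2845.90, ("Q4_K_M", "tg128"): 78.92,
       ("Q4_0",   "pp512"): 2837.83, ("Q4_0",   "tg128"): 94.46}
new = {("Q4_K_M", "pp512"): 2877.31, ("Q4_K_M", "tg128"): 79.44,
       ("Q4_0",   "pp512"): 2907.80, ("Q4_0",   "tg128"): 95.01}

for key in old:
    delta = (new[key] / old[key] - 1) * 100  # relative change in percent
    print(f"{key[0]:6s} {key[1]}: {delta:+.2f}%")
```

Both quants gain slightly, with the largest improvement in Q4_0 prompt processing (about +2.5%); token generation moves by well under 1% in both cases.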