Hello,
Could you please advise on how to correctly run llama.cpp with tensor parallelism set to 4 and full GPU support? I have four GPUs with 48 GB of VRAM each.
I am using the following command:
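(The original command was not captured in this issue. As an illustration only: llama.cpp has no `-tp`-style flag, so the closest to tensor parallelism across 4 GPUs is `--split-mode row` combined with `--tensor-split`; the model path below is a placeholder, not the reporter's actual setup.)

```shell
# Hypothetical sketch, assuming a recent llama.cpp build of llama-server.
./llama-server \
  -m /models/model-Q4_K_M.gguf \  # placeholder model path
  -ngl 99 \                       # offload all layers to GPU
  --split-mode row \              # split individual tensors across GPUs (tensor-parallel-like)
  --tensor-split 1,1,1,1 \        # distribute weights evenly over the 4 GPUs
  -c 8192                         # context size, adjust to taste
```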
For speculative decoding, I am using the following command:
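(Again, the reporter's command is not shown. A hedged sketch of how llama-server is typically launched with a draft model for speculative decoding; both model paths and the draft-token count are placeholders.)

```shell
# Hypothetical sketch: target model plus a smaller draft model.
./llama-server \
  -m /models/target-model.gguf \   # placeholder target model
  -md /models/draft-model.gguf \   # placeholder draft model (--model-draft)
  -ngl 99 \                        # fully offload the target model
  -ngld 99 \                       # fully offload the draft model
  --tensor-split 1,1,1,1           # spread across the 4 GPUs
```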