There is no memory leak until I offload the model to my GPU with n_gpu_layers > 0.
The leak comes from ggml_backend_cuda_reg(), which acts in a singleton-like fashion and never releases the static context allocated with new. I don't know whether the problem is specific to my build (GGML_STATIC), but I think it is potentially related to this comment:
ggml-backend-reg.cpp (line 196)
~ggml_backend_registry() {
// FIXME: backends cannot be safely unloaded without a function to destroy all the backend resources,
// since backend threads may still be running and accessing resources from the dynamic library
This is not a huge leak, but here are the details from the leak detector.
The classic CRT debug-heap output (from another run):
Detected memory leaks!
Dumping objects ->
{10179386} normal block at 0x0000018EBE7F3950, 16 bytes long.
Data: < r > 98 72 A9 BD 8E 01 00 00 00 00 00 00 00 00 00 00
{10179384} normal block at 0x0000018EBDA97290, 48 bytes long.
Data: < P9 > 00 00 00 00 CD CD CD CD 50 39 7F BE 8E 01 00 00
{10179376} normal block at 0x0000018EBE7F3CC0, 8 bytes long.
Data: < ) > 90 8D 29 C7 F6 7F 00 00
{7792} normal block at 0x0000018E9132B3F0, 8 bytes long.
Data: < > F0 07 F0 90 8E 01 00 00
{7791} normal block at 0x0000018E90F007F0, 136 bytes long.
Data: <@ > 40 14 E8 C5 F6 7F 00 00 80 14 E8 C5 F6 7F 00 00
{7790} normal block at 0x0000018E912A1FF0, 32 bytes long.
Data: <NVIDIA GeForce R> 4E 56 49 44 49 41 20 47 65 46 6F 72 63 65 20 52
{7787} normal block at 0x0000018E9132B490, 16 bytes long.
Data: < > D0 AD FC 90 8E 01 00 00 00 00 00 00 00 00 00 00
{7786} normal block at 0x0000018E9132B3A0, 16 bytes long.
Data: < > A8 AD FC 90 8E 01 00 00 00 00 00 00 00 00 00 00
{7785} normal block at 0x0000018E90FCADA0, 88 bytes long.
Data: < 2 > 00 00 00 00 CD CD CD CD A0 B3 32 91 8E 01 00 00
{7783} normal block at 0x0000018E90E56420, 16 bytes long.
Data: <P * > 50 1D 2A 91 8E 01 00 00 00 00 00 00 00 00 00 00
{7782} normal block at 0x0000018E912A1D50, 32 bytes long.
Data: < d 2 > 20 64 E5 90 8E 01 00 00 F0 B3 32 91 8E 01 00 00
Object dump complete.
I have not found a way to release the memory myself, because I cannot access the underlying ggml_backend_reg * behind ggml_backend_reg_t: it is hidden in ggml-backend-impl.h and no longer directly accessible after building the static libraries.
I don't know if it's enough, but code like that could eventually do the job:
Name and Version
build: 5124 (bc091a4) with MSVC 19.43.34810.0 for x64 (debug)
static build (MT/MTd) with VS2022 / LLAMA & GGML
GGML_STATIC / GGML_USE_CPU / GGML_USE_BLAS / GGML_USE_CUDA
Operating systems
Windows
Which llama.cpp modules do you know to be affected?
Other (Please specify in the next section)
Command line
No command line. I am using a personal (preliminary) C++ implementation that follows the steps of llama-cli.
Problem description & steps to reproduce
I'm currently only doing the standard generation steps described above.
Unfortunately:
E0833: pointer or reference to incomplete type is not allowed
I hope that helps.
Thanks a lot for ggml/llama.cpp! You rock!
First Bad Commit
No response
Relevant log output