`training/performance/README.md`: 2 additions & 2 deletions
@@ -293,7 +293,7 @@ In addition to the memory usage described in the previous section, there are oth
#### Preloaded CUDA kernels memory usage
- When PyTorch uses CUDA for the first time, it may use up 0.5-2GB of GPU memory, reducing the GPU's total available memory.
+ When PyTorch uses CUDA for the first time, it may use up 0.5-2GB of GPU memory, reducing the GPU's total available memory. This memory won't be accounted for by the torch memory profiler.
The size of the memory allocated for CUDA kernels varies between GPUs, and it can also differ between PyTorch versions. Let's allocate a 4-byte tensor on CUDA and check how much GPU memory is used up upfront.
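To make the check concrete, here is a minimal sketch (not the snippet from the README itself; it assumes `pynvml` is installed) that compares what PyTorch's allocator reports with what the GPU actually reports after the first CUDA op:

```python
import torch
import pynvml

# query the GPU directly via NVML, since the CUDA context and preloaded
# kernels don't show up in torch's own memory accounting
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

def gpu_used_mb():
    return pynvml.nvmlDeviceGetMemoryInfo(handle).used / 2**20

before = gpu_used_mb()
t = torch.ones(1, device="cuda")  # a single 4-byte tensor - the first CUDA op
torch.cuda.synchronize()
after = gpu_used_mb()

print(f"torch allocator reports:      {torch.cuda.memory_allocated() / 2**20:.2f} MB")
print(f"GPU memory actually consumed: {after - before:.2f} MB")
```

The gap between the two numbers is roughly the CUDA context plus the preloaded kernels.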
@@ -332,7 +332,7 @@ There is a 450MB difference, but here we only loaded kernels to do `torch.ones`
#### `torch.distributed` memory usage
- When using `torch.distributed` expect ~1-2GB of GPU memory taken away - the more GPUs the higher the memory used. Different backends are likely to use a different amount of memory.
+ When using `torch.distributed`, expect ~1-2GB of GPU memory to be taken away just to initialize things - the more GPUs, the higher the memory use. Different backends are likely to use different amounts of memory, and this memory won't be accounted for by the torch memory profiler.
Here is [torch-dist-mem-usage.py](distributed/torch-dist-mem-usage.py), which demonstrates the actual memory usage.
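As a rough sketch of what such a measurement can look like (this is not the linked script; it assumes `pynvml` and a launch like `torchrun --nproc_per_node=2 your_script.py`):

```python
import os
import torch
import torch.distributed as dist
import pynvml

# torchrun sets LOCAL_RANK for each process
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(local_rank)

def gpu_used_mb():
    return pynvml.nvmlDeviceGetMemoryInfo(handle).used / 2**20

before = gpu_used_mb()
dist.init_process_group(backend="nccl")
t = torch.ones(1, device="cuda")
dist.all_reduce(t)  # a tiny collective forces NCCL to set up its communicators and buffers
torch.cuda.synchronize()
after = gpu_used_mb()

if dist.get_rank() == 0:
    # note: this delta also includes the CUDA context created by the first CUDA op
    print(f"GPU memory consumed by CUDA context + distributed init: {after - before:.0f} MB")
    print(f"torch allocator reports: {torch.cuda.memory_allocated() / 2**20:.2f} MB")
dist.destroy_process_group()
```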