
Commit a4549bc

update

Signed-off-by: Stas Bekman <[email protected]>
1 parent 07ae2b8

File tree

1 file changed: +2 −2 lines changed

training/performance/README.md

Lines changed: 2 additions & 2 deletions
@@ -293,7 +293,7 @@ In addition to the memory usage described in the previous section, there are oth

 #### Preloaded CUDA kernels memory usage

-When PyTorch uses CUDA for the first time, it may use up 0.5-2GB of GPU memory, reducing the GPU's total available memory.
+When PyTorch uses CUDA for the first time, it may use up 0.5-2GB of GPU memory, reducing the GPU's total available memory. This memory won't be accounted for by the torch memory profiler.

 The amount of memory allocated for CUDA kernels varies between GPUs and can also differ between PyTorch versions. Let's allocate a 4-byte tensor on CUDA and check how much GPU memory is used up upfront.

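The measurement the changed paragraph describes can be sketched roughly as follows (a minimal sketch, not the repo's actual script; the function names are mine, and it assumes PyTorch plus `nvidia-smi` are available on a single-GPU machine):

```python
import subprocess

def gpu_mem_used_mb(gpu_index=0):
    # Ask the driver directly via nvidia-smi: memory taken by the CUDA
    # context and preloaded kernels is invisible to the torch profiler.
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits", f"--id={gpu_index}"],
        text=True,
    )
    return int(out.strip())

def cuda_init_overhead_mb():
    # Returns the GPU memory grabbed by the first CUDA touch,
    # or None when there is no PyTorch/CUDA GPU to measure on.
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    before = gpu_mem_used_mb()
    t = torch.ones(1, device="cuda")  # a 4-byte tensor triggers CUDA init
    torch.cuda.synchronize()
    after = gpu_mem_used_mb()
    # Nearly all of the difference is context + kernels, not the tensor:
    # torch itself only accounts for torch.cuda.memory_allocated() bytes.
    return after - before

if __name__ == "__main__":
    overhead = cuda_init_overhead_mb()
    print("n/a" if overhead is None else f"~{overhead} MB used on first CUDA touch")
```

Running this in a fresh process matters: once the CUDA context exists, repeating the allocation will show only a tiny delta.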
@@ -332,7 +332,7 @@ There is a 450MB difference, but here we only loaded kernels to do `torch.ones`

 #### `torch.distributed` memory usage

-When using `torch.distributed` expect ~1-2GB of GPU memory taken away - the more GPUs the higher the memory used. Different backends are likely to use a different amount of memory.
+When using `torch.distributed`, expect ~1-2GB of GPU memory to be taken up just to initialize things - the more GPUs, the higher the memory usage. Different backends are likely to use different amounts of memory, and this memory won't be accounted for by the torch memory profiler.

 Here is [torch-dist-mem-usage.py](distributed/torch-dist-mem-usage.py) that demonstrates the actual memory usage:

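A rough way to observe the initialization overhead described above (a hedged sketch of the idea, not the linked `torch-dist-mem-usage.py` itself; function names are mine, and it assumes a single CUDA GPU, NCCL, and `nvidia-smi`):

```python
import os
import subprocess

def gpu_mem_used_mb(gpu_index=0):
    # Query the driver, since the torch memory profiler doesn't see this usage.
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits", f"--id={gpu_index}"],
        text=True,
    )
    return int(out.strip())

def dist_init_overhead_mb():
    # Measures the GPU memory taken by initializing a 1-rank NCCL process
    # group. Returns None when there is no PyTorch/CUDA GPU to measure on.
    try:
        import torch
        import torch.distributed as dist
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29501")
    torch.cuda.set_device(0)
    before = gpu_mem_used_mb()
    dist.init_process_group(backend="nccl", rank=0, world_size=1)
    dist.barrier()  # NCCL is lazy: the first collective creates the communicator
    torch.cuda.synchronize()
    after = gpu_mem_used_mb()
    dist.destroy_process_group()
    return after - before

if __name__ == "__main__":
    overhead = dist_init_overhead_mb()
    print("n/a" if overhead is None else f"~{overhead} MB for torch.distributed init")
```

With more ranks (and hence more communicators per GPU) the per-GPU overhead grows, which is the "the more GPUs, the higher the memory usage" effect the paragraph mentions.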