PyTorch GPU Memory Profiling & Debugging

  • scripts/: Memory profiling scripts. These are minimal examples that use ResNet50 as the demonstration model.
    • memorysnapshot.py: Records a memory snapshot for visualising memory usage over time, with a stack trace for each allocation.
    • convert_snapshot.sh: Executable script that converts the snapshot to HTML with torch/cuda/_memory_viz.py (the same conversion as the manual command below, but without having to specify your torch install path).
    • memoryprofile.py: Profiles a run and visualises memory usage aggregated by category: optimizer state, activations, parameters and the backward pass (autograd-related).
  • get_pytorch_environment_info.py: Prints relevant information about the user's environment, including the PyTorch, Python and CUDA (Toolkit/Runtime) versions, the NVIDIA CUDA Deep Neural Network library (cuDNN) version and more.
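
For orientation, here is a minimal sketch of how a snapshot like the one memorysnapshot.py produces can be recorded. It is not the script itself: `record_snapshot` and `toy_workload` are hypothetical names, the toy matrix multiply stands in for the ResNet50 example, and the `torch.cuda.memory._record_memory_history` / `_dump_snapshot` calls are private PyTorch APIs that may change between releases.

```python
import os

try:
    import torch  # optional, so the sketch degrades gracefully without PyTorch
except ImportError:
    torch = None


def record_snapshot(workload, path="snapshot.pickle", max_entries=100_000):
    """Run `workload` while recording CUDA allocations and dump a snapshot.

    Returns the snapshot path, or None when PyTorch or a CUDA device
    is unavailable (nothing is recorded in that case).
    """
    if torch is None or not torch.cuda.is_available():
        return None
    # Private API (leading underscore): starts recording allocation events
    # together with their stack traces.
    torch.cuda.memory._record_memory_history(max_entries=max_entries)
    try:
        workload()
        torch.cuda.memory._dump_snapshot(path)
    finally:
        # Stop recording even if the workload raised.
        torch.cuda.memory._record_memory_history(enabled=None)
    return path


def toy_workload():
    # Hypothetical stand-in for the ResNet50 training loop in scripts/.
    x = torch.randn(1024, 1024, device="cuda")
    (x @ x).sum().item()


if __name__ == "__main__":
    print(record_snapshot(toy_workload))
```

On a machine without a CUDA device the function returns None instead of raising, which keeps the sketch safe to run anywhere.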

Manual conversion of memory snapshot:

python torch/cuda/_memory_viz.py trace_plot snapshot.pickle -o snapshot.html
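
The command above assumes you know where torch is installed. As a rough Python sketch of what convert_snapshot.sh is described as automating (the helper names `find_memory_viz` and `build_command` are hypothetical and not taken from the script), the path can be looked up from the installed package:

```python
import importlib.util
import os
import shlex
import sys


def find_memory_viz():
    """Return the path to torch/cuda/_memory_viz.py, or None if torch is not installed."""
    spec = importlib.util.find_spec("torch")  # locates the package without importing it
    if spec is None or not spec.submodule_search_locations:
        return None
    torch_dir = list(spec.submodule_search_locations)[0]
    return os.path.join(torch_dir, "cuda", "_memory_viz.py")


def build_command(script, snapshot="snapshot.pickle", output="snapshot.html"):
    """Build the same trace_plot invocation as the manual command."""
    return [sys.executable, script, "trace_plot", snapshot, "-o", output]


if __name__ == "__main__":
    script = find_memory_viz()
    if script is None:
        print("torch is not installed in this environment")
    else:
        # Print the command rather than running it; pass the list to
        # subprocess.run(..., check=True) to execute the conversion.
        print(shlex.join(build_command(script)))
```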

Code Source: Understanding GPU Memory 1: Visualizing All Allocations over Time (December 14, 2023) by Aaron Shi, Zachary DeVito