Two approaches are available for memory management. Technique 1 is the default approach.
Note: DiffSharp (which uses TorchSharp) relies on technique 1.
> Most of the examples included will use technique #1, doing frequent explicit calls to GC.Collect() in the training code -- if not after each batch in the training loop, at least after each epoch.
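As a sketch of that pattern -- `model`, `optimizer`, `lossFunc`, `batches`, and `epochs` below are placeholders for whatever your training code defines:

```C#
for (var epoch = 0; epoch < epochs; epoch++)
{
    foreach (var (data, target) in batches)
    {
        optimizer.zero_grad();

        var prediction = model.forward(data);
        var loss = lossFunc(prediction, target);

        loss.backward();
        optimizer.step();

        // Nothing is explicitly disposed. This call nudges the GC to run
        // the finalizers that release the native memory of dead tensors.
        GC.Collect();
    }
}
```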
## Technique 1. Implicit disposal using finalizers
In this technique all tensors (CPU and GPU) are implicitly disposed via .NET finalizers.
👎 The .NET GC doesn't know of the memory pressure from CPU tensors, so failure may happen if large tensors can't be allocated.
👎 The .NET GC doesn't know of GPU resources.
👎 Native operations that allocate temporaries, whether on CPU or GPU, may fail -- the GC scheme implemented by TorchSharp only works when the allocation is initiated by .NET code.
## Technique 2. Explicit disposal
In this technique, specific tensors (CPU and GPU) are explicitly disposed using `using` in C# or explicit calls to `System.IDisposable.Dispose()`.
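As a minimal sketch of the approach (the tensor sizes are arbitrary, and `rand`, `mm`, and `sum` are assumed to follow the current TorchSharp API):

```C#
using System;
using TorchSharp;
using static TorchSharp.torch;

// Each tensor's native storage is released deterministically at the end
// of its 'using' scope, instead of whenever finalizers happen to run.
using (var x = rand(1024, 1024))
using (var y = rand(1024, 1024))
using (var z = x.mm(y))
using (var total = z.sum())
{
    Console.WriteLine(total.ToSingle());
}
```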
👍 Specific lifetime management of all resources.
👎 Cumbersome, requiring lots of `using` statements in your code.
👎 You must know when to dispose.
👎 Temporaries are not covered by this approach, so to maximize the benefit, you may have to bind all temporaries to variables and dispose of them explicitly, as sketched below.
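For example, an expression like `(a * b) + c` produces an intermediate tensor that no `using` statement covers. A sketch of the workaround, assuming the arithmetic operator overloads on `Tensor`:

```C#
using TorchSharp;
using static TorchSharp.torch;

static Tensor AddProduct(Tensor a, Tensor b, Tensor c)
{
    // Written as 'return (a * b) + c;', the 'a * b' intermediate would
    // linger until a finalizer reclaims it. Naming it lets us dispose of
    // it deterministically; only the final result outlives the call.
    using var product = a * b;
    return product + c;
}
```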
> NOTE: Disposing a tensor only releases the underlying storage if this is the last
> live TorchTensor which has a view on that tensor -- the native runtime does reference counting of tensors.
## Saving and restoring model weights

When using PyTorch, the expected pattern for saving and later restoring models from disk or other permanent storage media is to get the model's state and pickle that using the standard Python format.
When restoring the model, you are expected to first create a model of the exact same structure as the original, with random weights, then restore the state:
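In Python, that pattern looks something like this (the file name and `MyModel` are illustrative):

```Python
import torch

# Save: pickle only the model's state dictionary.
torch.save(model.state_dict(), 'model_weights.pth')

# Restore: recreate a model of the same structure, with random
# weights, then load the saved state into it.
model = MyModel()
model.load_state_dict(torch.load('model_weights.pth'))
```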
This presents a couple of problems for a .NET implementation. First, Python pickling is intimately coupled with Python and its runtime object model. It is a complex format that supports object graphs that form DAGs and faithfully maintains all object state.
Second, in order to share models between .NET applications, Python pickling is not necessary, and even for moving model state from Python to .NET, it is overkill. The state of a model is a simple dictionary where the keys are strings and the values are tensors.
Therefore, TorchSharp, in its current form, implements its own very simple model serialization format, which allows models originating in either .NET or Python to be loaded using .NET, as long as the model was saved using the special format.
The MNIST and AdversarialExampleGeneration examples in this repo rely on saving and restoring model state -- the latter example relies on a pre-trained model from MNIST.
> A future version of TorchSharp may include support for reading and writing Python pickle files directly.
## How to use the TorchSharp format
In C#, saving a model looks like this:
```C#
model.save("model_weights.dat");
```
It's important to note that calling `save` will move the model to the CPU, where it remains after the call. If you need to continue to use the model after saving it, you will have to explicitly move it back:
```C#
model.to(Device.CUDA);
```
And loading it again is done by:
```C#
model = [...];
model.load("model_weights.dat");
```
The model should be created on the CPU before loading weights, then moved to the target device.
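Putting the pieces together -- `MyModel` stands in for your own module type:

```C#
var model = new MyModel();        // created on the CPU, with random weights
model.load("model_weights.dat");  // the saved weights are restored on the CPU
model.to(Device.CUDA);            // then the model is moved to the target device
```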
If the model starts out in Python, there's a simple script that allows you to use code that is very similar to the PyTorch API to save models to the TorchSharp format. Rather than placing this trivial script in a Python package and publishing it, we choose to just refer you to the script file itself, [exportsd.py](../src/Python/exportsd.py), which has all the necessary code.
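Assuming `exportsd.py` is importable from your Python code, and assuming its entry point is a `save_state_dict` function that writes a state dictionary to a binary stream (check the script itself for the exact signature), usage might look like:

```Python
import torch
import exportsd

# 'model' is your trained torch.nn.Module. Move it to the CPU and
# write its state dictionary in the TorchSharp format.
with open('model_weights.dat', 'wb') as f:
    exportsd.save_state_dict(model.to('cpu').state_dict(), f)
```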