examples/perplexity/README.md (+4 −1)
@@ -42,10 +42,13 @@ In addition to the KL divergence the following statistics are calculated with `-
Results were generated using the CUDA backend and are sorted by Kullback-Leibler divergence relative to FP16.
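
For reference, the Kullback-Leibler divergence used for this sorting follows the standard definition; in sketch form, with $p_t$ denoting the FP16 token distribution and $q_t$ the quantized one at position $t$ (averaging over the $T$ evaluated positions is an assumed convention, not something stated here):

$$
\mathrm{KLD} = \frac{1}{T} \sum_{t=1}^{T} \sum_{i \in \mathcal{V}} p_t(i)\,\log\frac{p_t(i)}{q_t(i)}
$$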
The "WT" importance matrices were created using varying numbers of Wikitext tokens and can be found [here](https://huggingface.co/JohannesGaessler/llama.cpp_importance_matrices/blob/main/imatrix-llama_3-8b-f16-2.7m_tokens.dat).
+
Note: the FP16 logits used for the calculation of all metrics other than perplexity are stored in a binary file between runs.
+
In order to save space this file does **not** contain the exact same FP32 logits but instead casts them to 16 bit unsigned integers (with some scaling).
+
So the "f16" results are to be understood as the difference resulting only from this downcast.
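
The downcast can be pictured as a linear mapping of each set of logits onto the `uint16_t` range. The snippet below is a hypothetical illustration rather than the actual llama.cpp code, and the per-row min/max scaling is an assumption, since the text above only says "with some scaling".

```cpp
// Hypothetical sketch of the logit downcast described above (not the actual
// llama.cpp implementation). Each FP32 logit is mapped linearly onto the
// uint16_t range using the row's min and max as the assumed scaling.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

struct CompressedLogits {
    float                 min_logit; // offset of the linear mapping
    float                 scale;     // (max - min) / 65535
    std::vector<uint16_t> values;    // downcast logits
};

CompressedLogits compress_logits(const std::vector<float> & logits) {
    const auto [mn, mx] = std::minmax_element(logits.begin(), logits.end());
    CompressedLogits out;
    out.min_logit = *mn;
    out.scale     = (*mx - *mn) / 65535.0f;
    out.values.reserve(logits.size());
    for (const float x : logits) {
        const float q = out.scale > 0.0f ? (x - out.min_logit) / out.scale : 0.0f;
        out.values.push_back((uint16_t) std::lround(q));
    }
    return out;
}

// Reconstructs approximate FP32 logits; the small error of this round trip is
// the only difference the "f16" results measure.
std::vector<float> decompress_logits(const CompressedLogits & cl) {
    std::vector<float> out;
    out.reserve(cl.values.size());
    for (const uint16_t v : cl.values) {
        out.push_back(cl.min_logit + cl.scale * (float) v);
    }
    return out;
}
```

Storing 16-bit integers instead of the FP32 values halves the file size at the cost of this small rounding error.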
| Quantization | imatrix | Model size [GiB] | PPL | ΔPPL | KLD | Mean Δp | RMS Δp |