more comments

Patrick Devine 2024-08-26 19:56:45 -07:00
parent 3ad243466b
commit 15b7ff3a89


@@ -114,6 +114,24 @@ Quantizing a model allows you to run models faster and with less memory consumption
Ollama can quantize FP16- and FP32-based models into different quantization levels using the `-q/--quantize` flag with the `ollama create` command.
First, create a Modelfile with the FP16- or FP32-based model you wish to quantize.
```dockerfile
FROM /path/to/my/gemma/f16/model
```
Next, use `ollama create` to create the quantized model.
```shell
$ ollama create --quantize q4_K_M mymodel
transferring model data
quantizing F16 model to Q4_K_M
creating new layer sha256:735e246cc1abfd06e9cdcf95504d6789a6cd1ad7577108a70d9902fef503c1bd
creating new layer sha256:0853f0ad24e5865173bbf9ffcc7b0f5d56b66fd690ab1009867e45e7d2c4db0f
writing manifest
success
```
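Once the quantized model has been created, it can be run like any other local model. A minimal check, reusing the `mymodel` name from the example above; running without a prompt starts an interactive session:
```shell
$ ollama run mymodel
```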
### Supported Quantizations
- `q4_0`
@@ -133,23 +151,6 @@ Ollama can quantize FP16- and FP32-based models into different quantization levels
- `q5_K_M`
- `q6_K`
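Any of the levels listed above can be passed to the `--quantize` flag. A minimal sketch, assuming the same Modelfile and selecting `q5_K_M` from the list:
```shell
$ ollama create --quantize q5_K_M mymodel
```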
First, create a Modelfile with the FP16- or FP32-based model you wish to quantize.
```dockerfile
FROM /path/to/my/gemma/f16/model
```
Next, use `ollama create` to create the quantized model.
```shell
$ ollama create -q q4_K_M mymodel
transferring model data
quantizing F16 model to Q4_K_M
creating new layer sha256:735e246cc1abfd06e9cdcf95504d6789a6cd1ad7577108a70d9902fef503c1bd
creating new layer sha256:0853f0ad24e5865173bbf9ffcc7b0f5d56b66fd690ab1009867e45e7d2c4db0f
writing manifest
success
```
## Sharing your model on ollama.com