diff --git a/docs/add-a-model.md b/docs/add-a-model.md
index 1358c4dae..e52d573ab 100644
--- a/docs/add-a-model.md
+++ b/docs/add-a-model.md
@@ -32,10 +32,22 @@ graph TB
     subgraph Hardware["Backend Execution Layer"]
         direction TB
         backend_impl[" The backend package provides:
         - Unified computation interface
         - Automatic hardware selection
         - Optimized kernels
         - Efficient memory management "]
+
+        subgraph Backends["Backend Implementations"]
+            direction LR
+            cpu["backend/cpu
+            - Pure Go implementation
+            - Fallback for all platforms"]
+
+            metal["backend/metal
+            - Apple Silicon (M1/M2/M3)
+            - MLX integration
+            - Leverages Apple Neural Engine"]
+
+            onnx["backend/onnx
+            - Cross-platform compatibility
+            - ONNX Runtime integration
+            - Pre-compiled graph execution"]
+
+            ggml["backend/ggml
+            - CPU/GPU quantized compute
+            - Low-precision operations
+            - Memory-efficient inference"]
+        end
     end

     Models --> |" Makes high-level calls
     (e.g., self-attention) "| ML_Ops
     ML_Ops --> |" Translates to tensor operations
     (e.g., matmul, softmax) "| Hardware
+    backend_impl --> Backends
 ```

 When implementing a new model, you'll primarily work in the model layer, interfacing with the neural network operations layer.
@@ -323,4 +335,4 @@ To open a draft PR:
 ```bash
 ollama create / -f /path/to/Modelfile
 ollama push /
-```
\ No newline at end of file
+```
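To make the diagram's flow concrete, here is a minimal Go sketch of what a "unified computation interface" with a pure-Go CPU fallback could look like. The `Backend` interface, the `cpuBackend` type, and the `pick` helper are hypothetical names for illustration; they are not Ollama's actual backend API, and the real package's selection logic and operations differ.

```go
package main

import (
	"fmt"
	"math"
)

// Backend is a hypothetical unified computation interface: every
// hardware backend (cpu, metal, onnx, ggml) would expose the same
// tensor operations behind it.
type Backend interface {
	Name() string
	Softmax(x []float32) []float32
}

// cpuBackend is a pure-Go implementation, playing the role the docs
// describe for backend/cpu: a fallback available on all platforms.
type cpuBackend struct{}

func (cpuBackend) Name() string { return "cpu" }

func (cpuBackend) Softmax(x []float32) []float32 {
	// Subtract the max before exponentiating for numerical stability.
	m := x[0]
	for _, v := range x[1:] {
		if v > m {
			m = v
		}
	}
	out := make([]float32, len(x))
	var sum float32
	for i, v := range x {
		out[i] = float32(math.Exp(float64(v - m)))
		sum += out[i]
	}
	for i := range out {
		out[i] /= sum
	}
	return out
}

// pick stands in for "automatic hardware selection": prefer an
// accelerated backend when one is available, else fall back to CPU.
func pick(accelerated ...Backend) Backend {
	if len(accelerated) > 0 {
		return accelerated[0]
	}
	return cpuBackend{}
}

func main() {
	b := pick() // no accelerated backend registered -> CPU fallback
	probs := b.Softmax([]float32{1, 2, 3})
	fmt.Printf("%s: %.3f\n", b.Name(), probs)
}
```

A model-layer call such as self-attention would bottom out in a handful of such backend operations (matmul, softmax), which is why adding a new model usually requires no backend changes at all.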