diff --git a/docs/add-a-model.md b/docs/add-a-model.md
index 1358c4dae..e52d573ab 100644
--- a/docs/add-a-model.md
+++ b/docs/add-a-model.md
@@ -32,10 +32,22 @@ graph TB
subgraph Hardware["Backend Execution Layer"]
direction TB
backend_impl[" The backend package provides:
- Unified computation interface
- Automatic hardware selection
- Optimized kernels
- Efficient memory management "]
+
+ subgraph Backends["Backend Implementations"]
+ direction LR
+ cpu["backend/cpu
+ - Pure Go implementation
+ - Fallback for all platforms"]
+
+ metal["backend/metal
+ - Apple Silicon (M1/M2/M3)
+ - MLX integration
+ - Leverages Apple Neural Engine"]
+
+ onnx["backend/onnx
+ - Cross-platform compatibility
+ - ONNX Runtime integration
+ - Pre-compiled graph execution"]
+
+ ggml["backend/ggml
+ - CPU/GPU quantized compute
+ - Low-precision operations
+ - Memory-efficient inference"]
+ end
end
Models --> |" Makes high-level calls
(e.g., self-attention) "| ML_Ops
ML_Ops --> |" Translates to tensor operations
(e.g., matmul, softmax) "| Hardware
+ backend_impl --> Backends
```
When implementing a new model, you'll primarily work in the model layer, interfacing with the neural network operations layer.
@@ -323,4 +335,4 @@ To open a draft PR:
```bash
ollama create <your-namespace>/<your-model> -f /path/to/Modelfile
ollama push <your-namespace>/<your-model>
- ```
\ No newline at end of file
+ ```