diff --git a/docs/implement.md b/docs/implement.md
new file mode 100644
index 000000000..17b8d7b04
--- /dev/null
+++ b/docs/implement.md
@@ -0,0 +1,216 @@
# Guide: Implementing Models in Ollama's Go Inference Engine

> **Note**: This guide and the Go inference engine are in early development and will be updated as implementation details evolve.

This guide outlines the process of implementing a new model in Ollama's Go GGML inference engine. It covers everything from initial setup to deploying your model to ollama.com.

## Architecture Overview

Below is a diagram showing Ollama's inference engine architecture layers and how they interact:

```mermaid
graph TB
    subgraph Models["Model Layer: LLM Implementations"]
        direction TB
        llama["model/models/llama/model.go"]
        mllama["model/models/mllama/model.go"]
        qwen["model/models/qwen2/model.go"]
        qwen_vl["model/models/qwen2vl/model.go"]

        note1["Each model implements a specific architecture
        - Defines model parameters
        - Implements forward pass"]
    end

    subgraph ML_Ops["Neural Network Operations"]
        direction TB
        nn_ops["nn/
        linear.go - Matrix operations
        embedding.go - Token embeddings
        normalization.go - Layer normalization
        convolution.go - Conv operations"]

        backend["ml/backend.go
        Hardware Abstraction Layer
        - Defines tensor operations
        - Manages computation graphs
        - Handles memory allocation"]

        note2["Common neural net operations
        used across different models
        - Abstracts hardware details
        - Provides unified API
        - Manages computation flow"]
    end

    subgraph GGML["Hardware Execution Layer"]
        direction TB
        ggml["ggml.go
        CGO Interface
        - Bridges Go and C++
        - Handles type conversion
        - Manages memory between languages"]

        subgraph Hardware_Specific["Hardware-Specific Implementations"]
            direction LR
            cpu["ggml-cpu.h
            CPU optimized ops"]
            cuda["ggml-cuda.h
            NVIDIA GPU ops"]
            metal["ggml-metal.h
            Apple GPU ops"]
            vulkan["ggml-vulkan.h
            Cross-platform GPU"]
            opencl["ggml-opencl.h
            OpenCL acceleration"]
        end

        note3["GGML provides optimized
        implementations for each hardware:
        - Automatic dispatch
        - Hardware-specific optimizations
        - Memory management
        - Parallel execution"]
    end

    %% Connections with explanations
    Models --> |"Makes high-level calls
    (e.g., self-attention)"| ML_Ops
    ML_Ops --> |"Translates to tensor operations
    (e.g., matmul, softmax)"| GGML
    GGML --> |"Executes optimized code
    on target hardware"| Hardware_Specific

    %% Styling
    classDef model fill:#fff,stroke:#01579b,stroke-width:2px
    classDef ml fill:#fff,stroke:#e65100,stroke-width:2px
    classDef hw fill:#fff,stroke:#b71c1c,stroke-width:2px
    classDef note fill:#fff,stroke:#666,stroke-dasharray: 5 5

    class llama,mllama,qwen,qwen_vl model
    class nn_ops,backend ml
    class ggml,cpu,cuda,metal,vulkan,opencl hw
    class note1,note2,note3 note

    %% Style subgraphs
    style Models fill:#fff,stroke:#01579b,stroke-width:2px
    style ML_Ops fill:#fff,stroke:#e65100,stroke-width:2px
    style GGML fill:#fff,stroke:#b71c1c,stroke-width:2px
    style Hardware_Specific fill:#fff,stroke:#b71c1c,stroke-width:1px
```

When implementing a new model, you'll primarily work in the model layer, interfacing with the neural network operations layer.

## Implementation Steps

### 1. Environment Setup

First, clone the Ollama repository and get it running locally. Follow the development setup guide at:
https://github.com/ollama/ollama/blob/main/docs/development.md
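As a rough sketch (assuming a working Go toolchain and the platform prerequisites from the development guide are installed), getting a local build running looks something like:

```bash
# Fetch the source
git clone https://github.com/ollama/ollama.git
cd ollama

# Build and start a local server from source
go run . serve
```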
### 2. Research Implementation

Get the original model implementation running. This typically involves:
- Cloning the research code repository (usually Python-based)
- Setting up the required environment
- Running inference with sample inputs
- Understanding the model architecture and forward pass

### 3. Project Structure Setup

Create the necessary file structure by referencing previous model implementations. You'll need:

```
model/models/
└── your-model/
    ├── model.go         # Architecture and forward pass implementation
    ├── convert.go       # Weight conversion logic (PyTorch/SafeTensors to GGML)
    └── convert_test.go  # Conversion logic tests
```

Register your model by adding it to the imports in [model/models/models.go](https://github.com/ollama/ollama/blob/main/model/models/models.go):

```go
package models

import (
    _ "github.com/ollama/ollama/model/models/llama"
    _ "github.com/ollama/ollama/model/models/mllama"
    _ "github.com/ollama/ollama/model/models/your-model" // Add your model here
)
```

### 4. Development Process

1. **Open a Draft PR**
   - Create a draft pull request in the `ollama/ollama` repository
   - Use this as a communication channel with Ollama maintainers

2. **Implement Weight Conversion**
   - Work on `convert.go`
   - Reference existing conversion implementations
   - Create a basic Modelfile:
     ```
     FROM /path/to/model
     ```
   - Test conversion:
     ```bash
     go run . create <my-model> -f /path/to/Modelfile
     ```

3. **Implement Model Logic**
   - Implement the `New()` and `Forward()` functions in `model.go` (a skeletal example appears in the appendix at the end of this guide)
   - Reference existing model implementations
   - Debug the forward pass:
     ```bash
     OLLAMA_DEBUG=1 go run . run <my-model>
     ```
   - Compare output with the research implementation

4. **Tokenizer Implementation**
   - Implement a new tokenizer if required
   - Ensure compatibility with the model architecture

5. **Text Generation Testing**
   - Implement proper prompt formatting
   - Test basic generation:
     ```bash
     go run . run <my-model> "hello"
     ```

### 5. Testing

1. Add comprehensive tests to:
   - `model_test.go`
   - `convert_test.go`

2. Ensure tests cover:
   - Weight conversion
   - Model initialization
   - Text generation

### 6. Model Deployment

1. **Finalize Pull Request**
   - Move the PR out of draft state
   - Address reviewer feedback

2. **Deploy to ollama.com**
   - Determine the model's prompt format
   - Convert the prompt format to a Go template (see the template sketch in the appendix)
   - Create the final Modelfile:
     ```
     FROM <converted model>
     TEMPLATE <prompt template>
     LICENSE <license information>
     # Add additional parameters if needed
     ```
   - Push to ollama.com:
     ```bash
     ollama create <your-namespace>/<your-model> -f /path/to/Modelfile
     ollama push <your-namespace>/<your-model>
     ```

3. **Integration Testing**
   - Run end-to-end tests
   - Verify model behavior in the production environment
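## Appendix: Example Sketches

To make the shape of a model implementation (step 4.3) concrete, here is a minimal, hypothetical skeleton for `model.go`. The registration name, config keys, struct tags, and method signatures below are illustrative assumptions based on existing implementations; mirror a current model such as [model/models/llama/model.go](https://github.com/ollama/ollama/blob/main/model/models/llama/model.go) for the interfaces the engine actually expects.

```go
package yourmodel

import (
	"github.com/ollama/ollama/ml"
	"github.com/ollama/ollama/ml/nn"
	"github.com/ollama/ollama/model"
)

// Options holds hyperparameters read from the converted model's metadata.
// The field set shown here is illustrative; read whatever your
// architecture needs.
type Options struct {
	hiddenSize, numHeads, numLayers int
	eps                             float32
}

// Model declares one struct field per weight tensor; the loader populates
// them by matching the gguf tags against tensor names in the model file.
type Model struct {
	model.Base

	TokenEmbedding *nn.Embedding `gguf:"token_embd"`
	OutputNorm     *nn.RMSNorm   `gguf:"output_norm"`
	Output         *nn.Linear    `gguf:"output"`

	*Options
}

func init() {
	// The name must match the architecture string written by convert.go.
	model.Register("your-model", New)
}

// New reads hyperparameters from the model config; the tensor fields are
// filled in by the loader after New returns.
func New(c ml.Config) (model.Model, error) {
	return &Model{
		Options: &Options{
			hiddenSize: int(c.Uint("embedding_length")),
			numHeads:   int(c.Uint("attention.head_count")),
			numLayers:  int(c.Uint("block_count")),
			eps:        c.Float("attention.layer_norm_rms_epsilon"),
		},
	}, nil
}

// Forward runs one pass over the input and returns the output logits.
func (m *Model) Forward(ctx ml.Context, opts model.Options) (ml.Tensor, error) {
	// 1. Embed the input tokens
	// 2. Run each transformer block (attention + feed-forward)
	// 3. Apply the final norm and output projection
	// (omitted here; see existing implementations)
	return nil, nil
}
```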
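For the prompt format step (6.2), templates use Go's text/template syntax with Ollama's standard template variables. Below is a hedged example of a chat-style Modelfile: `{{ if .System }}`, `{{ .System }}`, and `{{ .Prompt }}` are standard Ollama template constructs, while the `<|system|>`, `<|user|>`, `<|assistant|>`, and `<|end|>` control tokens are placeholders for whatever special tokens your model's prompt format actually defines.

```
FROM <converted model>

TEMPLATE """{{ if .System }}<|system|>
{{ .System }}<|end|>
{{ end }}<|user|>
{{ .Prompt }}<|end|>
<|assistant|>
"""
```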