Compare commits: v0.0.21...mattw/howt (130 commits)
Commits (abbreviated SHA1 only; the author, date, and message columns are empty in this capture):

4522109b11, b2974a7095, 3c975f898f, 9245c8a1df, 7a537cdca9, 257ffeb997, 9b513bb6b1, 042100f797, 7804b8fab9, 56497663c8, e1afcb8af2, 385eeea357, 8a41b244e8, 92578798bb, 788637918a, c413a55093, 630bb75d2a, a2055a1e93, b599946b74, aca2d65b82, b5e08e3373, 274d5a5fdf, fc6b49be32, 77295f716e, 615f7d1dea, cdf5e106ae, a85329f59a, f2ba1311aa, 65dcd0ce35, 0040f543a2, 767f9bdbbb, f7f5169c94, 2cfffea02e, f6e98334e4, ab0668293c, af4cf55884, d6786f2945, 38dc2f79bc, cb961c87ca, 0560b28a8d, 10199c5987, 288814d3e4, 04733438da, 711e891f0f, 090d08422b, 5b84404c64, 8544edca21, 5d22319a2c, 2130c0708b, 61ff1946e6, d06bc0cb6e, d104b7e997, 9e2de1bd2c, dc87e9c9ae, 367cb68dc1, c02c0cd483, 1852755154, 6f2ce74231, 6edcc5c79f, b1f7123301, 1fbf3585d6, 99d5161e8a, ea8380be45, 4f25092dc1, 4fc10acce9, 0a4f21c0a7, 9abb66254a, 1d0ebe67e8, a1b2d95f96, c0b1bf7537, cdfeb165ca, 92d454ec5f, 9333b0cc82, 9771b1ec51, 76db4a49cf, 4aa0976a2e, 92c20fdae6, c951da7096, 24d82a23a2, f40b3de758, 5f4008c296, 6ae33d8141, c5664c1fef, 958a5a8184, 8608eb4760, a2b210130f, ed20837f9a, 1db2a61dd0, 2ded8ab206, e6b3648bbf, 0625e805f0, c38ec5befb, c577721a43, 29c056ea39, 9fc3bba9cf, 7774ed4ae6, 11f920f209, 6e6b655956, 110ae89a6c, 5e388f931e, d5ad41dd7b, d294a11bc9, 93d887e4bc, 5306b0269d, 7de0c8345d, 1b9dcab3ab, 86279f4ae3, b934bf23e6, 2b8ef455ad, 0c5f47177c, 1210db2924, d0854bf1e6, 8396463255, a027bbf4d7, ed94a3dd02, f14f62ab3b, 0fb5268496, c65edb1506, 1605af32ec, ee3032ad89, 5b7a27281d, d2a784e33e, 413a2e4f91, a92fdff620, b5614f3ebc, 8b2ba9cab8, e29662ab5c, cbc40aa996, 5cb82540c9, d7849a1dc9
.gitignore (filename not captured; inferred from the ignored paths)

@@ -5,3 +5,4 @@ dist
 scripts
 llm/llama.cpp/ggml
 llm/llama.cpp/gguf
+.env
Dockerfile (20 changed lines)

@@ -1,9 +1,7 @@
-ARG CUDA_VERSION=12.2.0
-
-FROM nvidia/cuda:$CUDA_VERSION-devel-ubuntu22.04
+FROM nvidia/cuda:11.8.0-devel-ubuntu22.04
 
 ARG TARGETARCH
-ARG VERSION=0.0.0
+ARG GOFLAGS="'-ldflags=-w -s'"
 
 WORKDIR /go/src/github.com/jmorganca/ollama
 RUN apt-get update && apt-get install -y git build-essential cmake
@@ -12,20 +10,14 @@ RUN mkdir -p /usr/local && tar xz -C /usr/local </tmp/go1.21.1.tar.gz
 
 COPY . .
 ENV GOARCH=$TARGETARCH
+ENV GOFLAGS=$GOFLAGS
 RUN /usr/local/go/bin/go generate ./... \
-    && /usr/local/go/bin/go build -ldflags "-linkmode=external -extldflags='-static' -X=github.com/jmorganca/ollama/version.Version=$VERSION -X=github.com/jmorganca/ollama/server.mode=release" .
+    && /usr/local/go/bin/go build .
 
 FROM ubuntu:22.04
-ENV OLLAMA_HOST 0.0.0.0
 
 RUN apt-get update && apt-get install -y ca-certificates
 
-ARG USER=ollama
-ARG GROUP=ollama
-RUN groupadd $GROUP && useradd -m -g $GROUP $USER
-
 COPY --from=0 /go/src/github.com/jmorganca/ollama/ollama /bin/ollama
-USER $USER:$GROUP
+EXPOSE 11434
+ENV OLLAMA_HOST 0.0.0.0
 ENTRYPOINT ["/bin/ollama"]
 CMD ["serve"]
(filename not captured; the CentOS 7 build stages match the repository's Dockerfile.build)

@@ -1,4 +1,3 @@
-ARG VERSION=0.0.0
 
 # centos7 amd64 dependencies
 FROM --platform=linux/amd64 nvidia/cuda:11.8.0-devel-centos7 AS base-amd64
@@ -15,6 +14,7 @@ RUN yum install -y git cmake
 
 FROM base-${TARGETARCH}
 ARG TARGETARCH
+ARG GOFLAGS="'-ldflags -w -s'"
 
 # install go
 ADD https://dl.google.com/go/go1.21.1.linux-$TARGETARCH.tar.gz /tmp/go1.21.1.tar.gz
@@ -23,7 +23,10 @@ RUN mkdir -p /usr/local && tar xz -C /usr/local </tmp/go1.21.1.tar.gz
 # build the final binary
 WORKDIR /go/src/github.com/jmorganca/ollama
 COPY . .
 
+ENV GOOS=linux
 ENV GOARCH=$TARGETARCH
+ENV GOFLAGS=$GOFLAGS
+
 RUN /usr/local/go/bin/go generate ./... && \
-    /usr/local/go/bin/go build -ldflags "-X=github.com/jmorganca/ollama/version.Version=$VERSION -X=github.com/jmorganca/ollama/server.mode=release" .
+    /usr/local/go/bin/go build .
README.md (197 changed lines)

@@ -9,19 +9,27 @@
 [![Discord](https://dcbadge.vercel.app/api/server/ollama?style=flat&compact=true)](https://discord.gg/ollama)
 
-Run, create, and share large language models (LLMs).
+Get up and running with large language models locally.
 
-> Note: Ollama is in early preview. Please report any issues you find.
+### macOS
 
-## Download
+[Download](https://ollama.ai/download/Ollama-darwin.zip)
 
-- [Download](https://ollama.ai/download) for macOS
-- Download for Windows and Linux (coming soon)
-- Build [from source](#building)
+### Linux & WSL2
+
+```
+curl https://ollama.ai/install.sh | sh
+```
+
+[Manual install instructions](https://github.com/jmorganca/ollama/blob/main/docs/linux.md)
+
+### Windows
+
+coming soon
 
 ## Quickstart
 
-To run and chat with [Llama 2](https://ai.meta.com/llama), the new model by Meta:
+To run and chat with [Llama 2](https://ollama.ai/library/llama2):
 
 ```
 ollama run llama2
@@ -33,83 +41,46 @@ Ollama supports a list of open-source models available on [ollama.ai/library](ht
 
 Here are some example open-source models that can be downloaded:
 
 | Model | Parameters | Size | Download |
-| ------------------------ | ---------- | ----- | ------------------------------- |
-| Llama2 | 7B | 3.8GB | `ollama pull llama2` |
-| Llama2 13B | 13B | 7.3GB | `ollama pull llama2:13b` |
-| Llama2 70B | 70B | 39GB | `ollama pull llama2:70b` |
-| Llama2 Uncensored | 7B | 3.8GB | `ollama pull llama2-uncensored` |
-| Code Llama | 7B | 3.8GB | `ollama pull codellama` |
-| Orca Mini | 3B | 1.9GB | `ollama pull orca-mini` |
-| Vicuna | 7B | 3.8GB | `ollama pull vicuna` |
-| Nous-Hermes | 7B | 3.8GB | `ollama pull nous-hermes` |
-| Nous-Hermes 13B | 13B | 7.3GB | `ollama pull nous-hermes:13b` |
-| Wizard Vicuna Uncensored | 13B | 7.3GB | `ollama pull wizard-vicuna` |
+| ------------------ | ---------- | ----- | ------------------------------ |
+| Mistral | 7B | 4.1GB | `ollama run mistral` |
+| Llama 2 | 7B | 3.8GB | `ollama run llama2` |
+| Code Llama | 7B | 3.8GB | `ollama run codellama` |
+| Llama 2 Uncensored | 7B | 3.8GB | `ollama run llama2-uncensored` |
+| Llama 2 13B | 13B | 7.3GB | `ollama run llama2:13b` |
+| Llama 2 70B | 70B | 39GB | `ollama run llama2:70b` |
+| Orca Mini | 3B | 1.9GB | `ollama run orca-mini` |
+| Vicuna | 7B | 3.8GB | `ollama run vicuna` |
 
 > Note: You should have at least 8 GB of RAM to run the 3B models, 16 GB to run the 7B models, and 32 GB to run the 13B models.
 
-## Examples
+## Customize your own model
 
-### Pull a public model
+### Import from GGUF or GGML
 
-```
-ollama pull llama2
-```
+Ollama supports importing GGUF and GGML file formats in the Modelfile. This means if you have a model that is not in the Ollama library, you can create it, iterate on it, and upload it to the Ollama library to share with others when you are ready.
 
-> This command can also be used to update a local model. Only updated changes will be pulled.
+1. Create a file named Modelfile, and add a `FROM` instruction with the local filepath to the model you want to import.
 
-### Run a model interactively
+```
+FROM ./vicuna-33b.Q4_0.gguf
+```
 
-```
-ollama run llama2
->>> hi
-Hello! How can I help you today?
-```
+2. Create the model in Ollama
 
-For multiline input, you can wrap text with `"""`:
+```
+ollama create name -f path_to_modelfile
+```
 
-```
->>> """Hello,
-... world!
-... """
-I'm a basic program that prints the famous "Hello, world!" message to the console.
-```
+3. Run the model
 
-### Run a model non-interactively
+```
+ollama run name
+```
 
-```
-$ ollama run llama2 'tell me a joke'
-Sure! Here's a quick one:
-Why did the scarecrow win an award? Because he was outstanding in his field!
-```
+### Customize a prompt
 
-```
-$ cat <<EOF >prompts.txt
-tell me a joke about llamas
-tell me another one
-EOF
-$ ollama run llama2 <prompts.txt
->>> tell me a joke about llamas
-Why did the llama refuse to play hide-and-seek?
-nobody likes to be hided!
-
->>> tell me another one
-Sure, here's another one:
-
-Why did the llama go to the bar?
-To have a hay-often good time!
-```
-
-### Run a model on contents of a text file
-
-```
-$ ollama run llama2 "summarize this file:" "$(cat README.md)"
-Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.
-```
-
-### Customize a model
-
-Pull a base model:
-
+Models from the Ollama library can be customized with a prompt. The example
 ```
 ollama pull llama2
@@ -138,30 +109,61 @@ ollama run mario
 Hello! It's your friend Mario.
 ```
 
-For more examples, see the [examples](./examples) directory. For more information on creating a Modelfile, see the [Modelfile](./docs/modelfile.md) documentation.
+For more examples, see the [examples](examples) directory. For more information on working with a Modelfile, see the [Modelfile](docs/modelfile.md) documentation.
 
-### Listing local models
+## CLI Reference
+
+### Create a model
+
+`ollama create` is used to create a model from a Modelfile.
+
+### Pull a model
 
 ```
-ollama list
+ollama pull llama2
 ```
 
-### Removing local models
+> This command can also be used to update a local model. Only the diff will be pulled.
+
+### Remove a model
 
 ```
 ollama rm llama2
 ```
 
-## Model packages
+### Copy a model
 
-### Overview
+```
+ollama cp llama2 my-llama2
+```
 
-Ollama bundles model weights, configurations, and data into a single package, defined by a [Modelfile](./docs/modelfile.md).
+### Multiline input
 
-<picture>
-<source media="(prefers-color-scheme: dark)" height="480" srcset="https://github.com/jmorganca/ollama/assets/251292/2fd96b5f-191b-45c1-9668-941cfad4eb70">
-<img alt="logo" height="480" src="https://github.com/jmorganca/ollama/assets/251292/2fd96b5f-191b-45c1-9668-941cfad4eb70">
-</picture>
+For multiline input, you can wrap text with `"""`:
+
+```
+>>> """Hello,
+... world!
+... """
+I'm a basic program that prints the famous "Hello, world!" message to the console.
+```
+
+### Pass in prompt as arguments
+
+```
+$ ollama run llama2 "summarize this file:" "$(cat README.md)"
+Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.
+```
+
+### List models on your computer
+
+```
+ollama list
+```
+
+### Start Ollama
+
+`ollama serve` is used when you want to start ollama without running the desktop application.
 
 ## Building
@@ -193,7 +195,7 @@ Finally, in a separate shell, run a model:
 
 ## REST API
 
-> See the [API documentation](./docs/api.md) for all endpoints.
+> See the [API documentation](docs/api.md) for all endpoints.
 
 Ollama has an API for running and managing models. For example to generate text from a model:
@@ -204,18 +206,19 @@ curl -X POST http://localhost:11434/api/generate -d '{
 }'
 ```
 
-## Community Projects using Ollama
+## Community Integrations
 
-| Project | Description |
-| -------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| [LangChain][1] and [LangChain.js][2] | Also, there is a question-answering [example][3]. |
-| [Continue](https://github.com/continuedev/continue) | Embeds Ollama inside Visual Studio Code. The extension lets you highlight code to add to the prompt, ask questions in the sidebar, and generate code inline. |
-| [LiteLLM](https://github.com/BerriAI/litellm) | Lightweight Python package to simplify LLM API calls. |
-| [Discord AI Bot](https://github.com/mekb-turtle/discord-ai-bot) | Interact with Ollama as a chatbot on Discord. |
-| [Raycast Ollama](https://github.com/MassimilianoPasquini97/raycast_ollama) | Raycast extension to use Ollama for local llama inference on Raycast. |
-| [Simple HTML UI](https://github.com/rtcfirefly/ollama-ui) | Also, there is a Chrome extension. |
-| [Emacs client](https://github.com/zweifisch/ollama) | |
-
-[1]: https://python.langchain.com/docs/integrations/llms/ollama
-[2]: https://js.langchain.com/docs/modules/model_io/models/llms/integrations/ollama
-[3]: https://js.langchain.com/docs/use_cases/question_answering/local_retrieval_qa
+- [LangChain](https://python.langchain.com/docs/integrations/llms/ollama) and [LangChain.js](https://js.langchain.com/docs/modules/model_io/models/llms/integrations/ollama) with [example](https://js.langchain.com/docs/use_cases/question_answering/local_retrieval_qa)
+- [LlamaIndex](https://gpt-index.readthedocs.io/en/stable/examples/llm/ollama.html)
+- [Raycast extension](https://github.com/MassimilianoPasquini97/raycast_ollama)
+- [Discollama](https://github.com/mxyng/discollama) (Discord bot inside the Ollama discord channel)
+- [Continue](https://github.com/continuedev/continue)
+- [Obsidian Ollama plugin](https://github.com/hinterdupfinger/obsidian-ollama)
+- [Dagger Chatbot](https://github.com/samalba/dagger-chatbot)
+- [LiteLLM](https://github.com/BerriAI/litellm)
+- [Discord AI Bot](https://github.com/mekb-turtle/discord-ai-bot)
+- [Chatbot UI](https://github.com/ivanfioravanti/chatbot-ollama)
+- [HTML UI](https://github.com/rtcfirefly/ollama-ui)
+- [Typescript UI](https://github.com/ollama-interface/Ollama-Gui?tab=readme-ov-file)
+- [Dumbar](https://github.com/JerrySievert/Dumbar)
+- [Emacs client](https://github.com/zweifisch/ollama)
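For orientation before the code diffs: the generate call from the README's REST API section can be exercised from Go directly. This is a minimal illustrative sketch, not part of the diff; it assumes the request and streaming-response fields (`model`, `prompt`, `response`, `done`) described in docs/api.md, and it matches the newline-delimited JSON framing that the client change below advertises via the `application/x-ndjson` Accept header.

```
package main

import (
    "bufio"
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
)

func main() {
    // Request body mirrors the curl example in the README.
    body, _ := json.Marshal(map[string]any{
        "model":  "llama2",
        "prompt": "Why is the sky blue?",
    })

    resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    // The server streams one JSON object per line until "done" is true.
    scanner := bufio.NewScanner(resp.Body)
    for scanner.Scan() {
        var chunk struct {
            Response string `json:"response"`
            Done     bool   `json:"done"`
        }
        if err := json.Unmarshal(scanner.Bytes(), &chunk); err != nil {
            panic(err)
        }
        fmt.Print(chunk.Response)
        if chunk.Done {
            break
        }
    }
}
```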
api/client.go (filename not captured; inferred from the content)

@@ -7,6 +7,7 @@ import (
     "encoding/json"
     "fmt"
     "io"
+    "net"
     "net/http"
     "net/url"
     "os"
@@ -18,14 +19,11 @@ import (
 
 const DefaultHost = "127.0.0.1:11434"
 
-var (
-    envHost = os.Getenv("OLLAMA_HOST")
-)
+var envHost = os.Getenv("OLLAMA_HOST")
 
 type Client struct {
-    Base    url.URL
-    HTTP    http.Client
-    Headers http.Header
+    base *url.URL
+    http http.Client
 }
 
 func checkError(resp *http.Response, body []byte) error {
@@ -44,34 +42,44 @@ func checkError(resp *http.Response, body []byte) error {
     return apiError
 }
 
-// Host returns the default host to use for the client. It is determined in the following order:
-// 1. The OLLAMA_HOST environment variable
-// 2. The default host (localhost:11434)
-func Host() string {
-    if envHost != "" {
-        return envHost
-    }
-    return DefaultHost
-}
-
-// FromEnv creates a new client using Host() as the host. An error is returns
-// if the host is invalid.
-func FromEnv() (*Client, error) {
-    h := Host()
-    if !strings.HasPrefix(h, "http://") && !strings.HasPrefix(h, "https://") {
-        h = "http://" + h
+func ClientFromEnvironment() (*Client, error) {
+    scheme, hostport, ok := strings.Cut(os.Getenv("OLLAMA_HOST"), "://")
+    if !ok {
+        scheme, hostport = "http", os.Getenv("OLLAMA_HOST")
     }
 
-    u, err := url.Parse(h)
+    host, port, err := net.SplitHostPort(hostport)
     if err != nil {
-        return nil, fmt.Errorf("could not parse host: %w", err)
+        host, port = "127.0.0.1", "11434"
+        if ip := net.ParseIP(strings.Trim(os.Getenv("OLLAMA_HOST"), "[]")); ip != nil {
+            host = ip.String()
+        }
     }
 
-    if u.Port() == "" {
-        u.Host += ":11434"
+    client := Client{
+        base: &url.URL{
+            Scheme: scheme,
+            Host:   net.JoinHostPort(host, port),
+        },
     }
 
-    return &Client{Base: *u, HTTP: http.Client{}}, nil
+    mockRequest, err := http.NewRequest("HEAD", client.base.String(), nil)
+    if err != nil {
+        return nil, err
+    }
+
+    proxyURL, err := http.ProxyFromEnvironment(mockRequest)
+    if err != nil {
+        return nil, err
+    }
+
+    client.http = http.Client{
+        Transport: &http.Transport{
+            Proxy: http.ProxyURL(proxyURL),
+        },
+    }
+
+    return &client, nil
 }
 
 func (c *Client) do(ctx context.Context, method, path string, reqData, respData any) error {
@@ -86,7 +94,7 @@ func (c *Client) do(ctx context.Context, method, path string, reqData, respData
         reqBody = bytes.NewReader(data)
     }
 
-    requestURL := c.Base.JoinPath(path)
+    requestURL := c.base.JoinPath(path)
     request, err := http.NewRequestWithContext(ctx, method, requestURL.String(), reqBody)
     if err != nil {
         return err
@@ -96,11 +104,7 @@ func (c *Client) do(ctx context.Context, method, path string, reqData, respData
     request.Header.Set("Accept", "application/json")
     request.Header.Set("User-Agent", fmt.Sprintf("ollama/%s (%s %s) Go/%s", version.Version, runtime.GOARCH, runtime.GOOS, runtime.Version()))
 
-    for k, v := range c.Headers {
-        request.Header[k] = v
-    }
-
-    respObj, err := c.HTTP.Do(request)
+    respObj, err := c.http.Do(request)
     if err != nil {
         return err
     }
@@ -123,6 +127,8 @@ func (c *Client) do(ctx context.Context, method, path string, reqData, respData
     return nil
 }
 
+const maxBufferSize = 512 * 1000 // 512KB
+
 func (c *Client) stream(ctx context.Context, method, path string, data any, fn func([]byte) error) error {
     var buf *bytes.Buffer
     if data != nil {
@@ -134,23 +140,26 @@ func (c *Client) stream(ctx context.Context, method, path string, data any, fn f
         buf = bytes.NewBuffer(bts)
     }
 
-    requestURL := c.Base.JoinPath(path)
+    requestURL := c.base.JoinPath(path)
     request, err := http.NewRequestWithContext(ctx, method, requestURL.String(), buf)
     if err != nil {
         return err
     }
 
     request.Header.Set("Content-Type", "application/json")
-    request.Header.Set("Accept", "application/json")
+    request.Header.Set("Accept", "application/x-ndjson")
     request.Header.Set("User-Agent", fmt.Sprintf("ollama/%s (%s %s) Go/%s", version.Version, runtime.GOARCH, runtime.GOOS, runtime.Version()))
 
-    response, err := http.DefaultClient.Do(request)
+    response, err := c.http.Do(request)
     if err != nil {
         return err
     }
     defer response.Body.Close()
 
     scanner := bufio.NewScanner(response.Body)
+    // increase the buffer size to avoid running out of space
+    scanBuf := make([]byte, 0, maxBufferSize)
+    scanner.Buffer(scanBuf, maxBufferSize)
     for scanner.Scan() {
         var errorResponse struct {
             Error string `json:"error,omitempty"`
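A sketch of how the reworked client is intended to be used from CLI code (the cmd/cmd.go diff below switches every `api.FromEnv()` call to `api.ClientFromEnvironment()`). The `Generate` method and the `Model`/`Prompt` fields of `GenerateRequest` are assumed from their use elsewhere in the repository and are not shown in this diff; the accepted `OLLAMA_HOST` forms in the comment are inferred from the parsing logic above.

```
package main

import (
    "context"
    "fmt"
    "log"

    "github.com/jmorganca/ollama/api"
)

func main() {
    // OLLAMA_HOST may be unset (defaults to 127.0.0.1:11434), a bare
    // "host:port", a full "scheme://host:port", or a bare IP such as
    // "0.0.0.0", which is paired with the default port. HTTP(S) proxies
    // are now honored via http.ProxyFromEnvironment.
    client, err := api.ClientFromEnvironment()
    if err != nil {
        log.Fatal(err)
    }

    req := api.GenerateRequest{Model: "llama2", Prompt: "Hello"}
    fn := func(resp api.GenerateResponse) error {
        fmt.Print(resp.Response)
        return nil
    }
    if err := client.Generate(context.Background(), &req, fn); err != nil {
        log.Fatal(err)
    }
}
```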
api/types.go (81 changed lines)

@@ -37,6 +37,7 @@ type GenerateRequest struct {
     System   string `json:"system"`
     Template string `json:"template"`
     Context  []int  `json:"context,omitempty"`
+    Stream   *bool  `json:"stream,omitempty"`
 
     Options map[string]interface{} `json:"options"`
 }
@@ -53,8 +54,9 @@ type EmbeddingResponse struct {
 }
 
 type CreateRequest struct {
     Name string `json:"name"`
     Path string `json:"path"`
+    Stream *bool `json:"stream,omitempty"`
 }
 
 type DeleteRequest struct {
@@ -83,13 +85,14 @@ type PullRequest struct {
     Insecure bool   `json:"insecure,omitempty"`
     Username string `json:"username"`
     Password string `json:"password"`
+    Stream   *bool  `json:"stream,omitempty"`
 }
 
 type ProgressResponse struct {
     Status    string `json:"status"`
     Digest    string `json:"digest,omitempty"`
-    Total     int    `json:"total,omitempty"`
-    Completed int    `json:"completed,omitempty"`
+    Total     int64  `json:"total,omitempty"`
+    Completed int64  `json:"completed,omitempty"`
 }
 
 type PushRequest struct {
@@ -97,6 +100,7 @@ type PushRequest struct {
     Insecure bool   `json:"insecure,omitempty"`
     Username string `json:"username"`
     Password string `json:"password"`
+    Stream   *bool  `json:"stream,omitempty"`
 }
 
 type ListResponse struct {
@@ -106,7 +110,7 @@ type ListResponse struct {
 type ModelResponse struct {
     Name       string    `json:"name"`
     ModifiedAt time.Time `json:"modified_at"`
-    Size       int       `json:"size"`
+    Size       int64     `json:"size"`
     Digest     string    `json:"digest"`
 }
 
@@ -117,7 +121,7 @@ type TokenResponse struct {
 type GenerateResponse struct {
     Model     string    `json:"model"`
     CreatedAt time.Time `json:"created_at"`
-    Response  string    `json:"response,omitempty"`
+    Response  string    `json:"response"`
 
     Done    bool  `json:"done"`
     Context []int `json:"context,omitempty"`
@@ -201,6 +205,8 @@ type Options struct {
     NumThread int `json:"num_thread,omitempty"`
 }
 
+var ErrInvalidOpts = fmt.Errorf("invalid options")
+
 func (opts *Options) FromMap(m map[string]interface{}) error {
     valueOpts := reflect.ValueOf(opts).Elem() // names of the fields in the options struct
     typeOpts := reflect.TypeOf(opts).Elem()   // types of the fields in the options struct
@@ -214,6 +220,7 @@ func (opts *Options) FromMap(m map[string]interface{}) error {
         }
     }
 
+    invalidOpts := []string{}
     for key, val := range m {
         if opt, ok := jsonOpts[key]; ok {
             field := valueOpts.FieldByName(opt.Name)
@@ -231,12 +238,12 @@ func (opts *Options) FromMap(m map[string]interface{}) error {
                 // when JSON unmarshals numbers, it uses float64, not int
                 field.SetInt(int64(t))
             default:
-                log.Printf("could not convert model parameter %v to int, skipped", key)
+                log.Printf("could not convert model parameter %v of type %T to int, skipped", key, val)
             }
         case reflect.Bool:
             val, ok := val.(bool)
             if !ok {
-                log.Printf("could not convert model parameter %v to bool, skipped", key)
+                log.Printf("could not convert model parameter %v of type %T to bool, skipped", key, val)
                 continue
             }
             field.SetBool(val)
@@ -244,14 +251,14 @@ func (opts *Options) FromMap(m map[string]interface{}) error {
             // JSON unmarshals to float64
             val, ok := val.(float64)
             if !ok {
-                log.Printf("could not convert model parameter %v to float32, skipped", key)
+                log.Printf("could not convert model parameter %v of type %T to float32, skipped", key, val)
                 continue
             }
             field.SetFloat(val)
         case reflect.String:
             val, ok := val.(string)
             if !ok {
-                log.Printf("could not convert model parameter %v to string, skipped", key)
+                log.Printf("could not convert model parameter %v of type %T to string, skipped", key, val)
                 continue
             }
             field.SetString(val)
@@ -259,7 +266,7 @@ func (opts *Options) FromMap(m map[string]interface{}) error {
             // JSON unmarshals to []interface{}, not []string
             val, ok := val.([]interface{})
             if !ok {
-                log.Printf("could not convert model parameter %v to slice, skipped", key)
+                log.Printf("could not convert model parameter %v of type %T to slice, skipped", key, val)
                 continue
             }
             // convert []interface{} to []string
@@ -267,7 +274,7 @@ func (opts *Options) FromMap(m map[string]interface{}) error {
             for i, item := range val {
                 str, ok := item.(string)
                 if !ok {
-                    log.Printf("could not convert model parameter %v to slice of strings, skipped", key)
+                    log.Printf("could not convert model parameter %v of type %T to slice of strings, skipped", key, item)
                     continue
                 }
                 slice[i] = str
@@ -277,45 +284,51 @@ func (opts *Options) FromMap(m map[string]interface{}) error {
                 return fmt.Errorf("unknown type loading config params: %v", field.Kind())
             }
-        }
+        } else {
+            invalidOpts = append(invalidOpts, key)
+        }
     }
+
+    if len(invalidOpts) > 0 {
+        return fmt.Errorf("%w: %v", ErrInvalidOpts, strings.Join(invalidOpts, ", "))
+    }
     return nil
 }
 
 func DefaultOptions() Options {
     return Options{
-        Seed: -1,
+        // options set on request to runner
+        NumPredict: -1,
+        NumKeep:    -1,
 
-        UseNUMA: false,
-
-        NumCtx:             2048,
-        NumKeep:            -1,
-        NumBatch:           512,
-        NumGPU:             -1, // -1 here indicates that NumGPU should be set dynamically
-        NumGQA:             1,
-        LowVRAM:            false,
-        F16KV:              true,
-        UseMMap:            true,
-        UseMLock:           false,
-        RopeFrequencyBase:  10000.0,
-        RopeFrequencyScale: 1.0,
-        EmbeddingOnly:      true,
-
-        RepeatLastN:      64,
-        RepeatPenalty:    1.1,
-        FrequencyPenalty: 0.0,
-        PresencePenalty:  0.0,
         Temperature: 0.8,
         TopK:        40,
         TopP:        0.9,
         TFSZ:        1.0,
         TypicalP:    1.0,
+        RepeatLastN:      64,
+        RepeatPenalty:    1.1,
+        PresencePenalty:  0.0,
+        FrequencyPenalty: 0.0,
         Mirostat:        0,
         MirostatTau:     5.0,
         MirostatEta:     0.1,
         PenalizeNewline: true,
+        Seed:            -1,
 
-        NumThread: 0, // let the runtime decide
+        // options set when the model is loaded
+        NumCtx:             2048,
+        RopeFrequencyBase:  10000.0,
+        RopeFrequencyScale: 1.0,
+        NumBatch:           512,
+        NumGPU:             -1, // -1 here indicates that NumGPU should be set dynamically
+        NumGQA:             1,
+        NumThread:          0, // let the runtime decide
+        LowVRAM:            false,
+        F16KV:              true,
+        UseMLock:           false,
+        UseMMap:            true,
+        UseNUMA:            false,
+        EmbeddingOnly:      true,
     }
 }
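The `invalidOpts` bookkeeping above changes `FromMap` from silently ignoring unknown option keys to returning an error wrapping the new `ErrInvalidOpts`, so callers can distinguish a misspelled key from a type mismatch (which is still only logged). A hypothetical caller-side sketch:

```
package main

import (
    "errors"
    "fmt"

    "github.com/jmorganca/ollama/api"
)

func main() {
    opts := api.DefaultOptions()
    err := opts.FromMap(map[string]interface{}{
        "temperature": 0.5, // valid key; JSON numbers arrive as float64
        "temprature":  0.5, // misspelled key; collected into invalidOpts
    })
    if errors.Is(err, api.ErrInvalidOpts) {
        fmt.Println(err) // invalid options: temprature
    }
}
```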
Electron app main process (filename not captured; likely app/src/index.ts)

@@ -5,7 +5,7 @@ import winston from 'winston'
 import 'winston-daily-rotate-file'
 import * as path from 'path'
 
-import { analytics, id } from './telemetry'
+import { v4 as uuidv4 } from 'uuid'
 import { installed } from './install'
 
 require('@electron/remote/main').initialize()
@@ -164,11 +164,11 @@ app.on('before-quit', () => {
 
 function init() {
   if (app.isPackaged) {
-    heartbeat()
     autoUpdater.checkForUpdates()
     setInterval(() => {
-      heartbeat()
-      autoUpdater.checkForUpdates()
+      if (!updateAvailable) {
+        autoUpdater.checkForUpdates()
+      }
     }, 60 * 60 * 1000)
   }
 
@@ -234,28 +234,26 @@ app.on('window-all-closed', () => {
   }
 })
 
-// In this file you can include the rest of your app's specific main process
-// code. You can also put them in separate files and import them here.
-let aid = ''
-try {
-  aid = id()
-} catch (e) {}
+function id(): string {
+  const id = store.get('id') as string
+  if (id) {
+    return id
+  }
 
-autoUpdater.setFeedURL({
-  url: `https://ollama.ai/api/update?os=${process.platform}&arch=${process.arch}&version=${app.getVersion()}&id=${aid}`,
-})
+  const uuid = uuidv4()
+  store.set('id', uuid)
+  return uuid
+}
 
-async function heartbeat() {
-  analytics.track({
-    anonymousId: aid,
-    event: 'heartbeat',
-    properties: {
-      version: app.getVersion(),
-    },
-  })
-}
+autoUpdater.setFeedURL({
+  url: `https://ollama.ai/api/update?os=${process.platform}&arch=${
+    process.arch
+  }&version=${app.getVersion()}&id=${id()}`,
+})
 
 autoUpdater.on('error', e => {
+  logger.error(`update check failed - ${e.message}`)
   console.error(`update check failed - ${e.message}`)
 })
Telemetry module, deleted in this branch (filename not captured; likely app/src/telemetry.ts)

@@ -1,19 +0,0 @@
-import { Analytics } from '@segment/analytics-node'
-import { v4 as uuidv4 } from 'uuid'
-import Store from 'electron-store'
-
-const store = new Store()
-
-export const analytics = new Analytics({ writeKey: process.env.TELEMETRY_WRITE_KEY || '<empty>' })
-
-export function id(): string {
-  const id = store.get('id') as string
-
-  if (id) {
-    return id
-  }
-
-  const uuid = uuidv4()
-  store.set('id', uuid)
-  return uuid
-}
247
cmd/cmd.go
247
cmd/cmd.go
@@ -13,9 +13,11 @@ import (
|
|||||||
"net"
|
"net"
|
||||||
"os"
|
"os"
|
||||||
"os/exec"
|
"os/exec"
|
||||||
|
"os/signal"
|
||||||
"path/filepath"
|
"path/filepath"
|
||||||
"runtime"
|
"runtime"
|
||||||
"strings"
|
"strings"
|
||||||
|
"syscall"
|
||||||
"time"
|
"time"
|
||||||
|
|
||||||
"github.com/dustin/go-humanize"
|
"github.com/dustin/go-humanize"
|
||||||
@@ -33,15 +35,22 @@ import (
|
|||||||
)
|
)
|
||||||
|
|
||||||
type Painter struct {
|
type Painter struct {
|
||||||
HideHint bool
|
IsMultiLine bool
|
||||||
}
|
}
|
||||||
|
|
||||||
func (p Painter) Paint(line []rune, _ int) []rune {
|
func (p Painter) Paint(line []rune, _ int) []rune {
|
||||||
termType := os.Getenv("TERM")
|
termType := os.Getenv("TERM")
|
||||||
if termType == "xterm-256color" && len(line) == 0 && !p.HideHint {
|
if termType == "xterm-256color" && len(line) == 0 {
|
||||||
prompt := "Send a message (/? for help)"
|
var prompt string
|
||||||
|
if p.IsMultiLine {
|
||||||
|
prompt = "Use \"\"\" to end multi-line input"
|
||||||
|
} else {
|
||||||
|
prompt = "Send a message (/? for help)"
|
||||||
|
}
|
||||||
return []rune(fmt.Sprintf("\033[38;5;245m%s\033[%dD\033[0m", prompt, len(prompt)))
|
return []rune(fmt.Sprintf("\033[38;5;245m%s\033[%dD\033[0m", prompt, len(prompt)))
|
||||||
}
|
}
|
||||||
|
// add a space and a backspace to prevent the cursor from walking up the screen
|
||||||
|
line = append(line, []rune(" \b")...)
|
||||||
return line
|
return line
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -52,7 +61,7 @@ func CreateHandler(cmd *cobra.Command, args []string) error {
|
|||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
|
|
||||||
client, err := api.FromEnv()
|
client, err := api.ClientFromEnvironment()
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
@@ -71,18 +80,18 @@ func CreateHandler(cmd *cobra.Command, args []string) error {
|
|||||||
currentDigest = resp.Digest
|
currentDigest = resp.Digest
|
||||||
switch {
|
switch {
|
||||||
case strings.Contains(resp.Status, "embeddings"):
|
case strings.Contains(resp.Status, "embeddings"):
|
||||||
bar = progressbar.Default(int64(resp.Total), resp.Status)
|
bar = progressbar.Default(resp.Total, resp.Status)
|
||||||
bar.Set(resp.Completed)
|
bar.Set64(resp.Completed)
|
||||||
default:
|
default:
|
||||||
// pulling
|
// pulling
|
||||||
bar = progressbar.DefaultBytes(
|
bar = progressbar.DefaultBytes(
|
||||||
int64(resp.Total),
|
resp.Total,
|
||||||
resp.Status,
|
resp.Status,
|
||||||
)
|
)
|
||||||
bar.Set(resp.Completed)
|
bar.Set64(resp.Completed)
|
||||||
}
|
}
|
||||||
} else if resp.Digest == currentDigest && resp.Digest != "" {
|
} else if resp.Digest == currentDigest && resp.Digest != "" {
|
||||||
bar.Set(resp.Completed)
|
bar.Set64(resp.Completed)
|
||||||
} else {
|
} else {
|
||||||
currentDigest = ""
|
currentDigest = ""
|
||||||
if spinner != nil {
|
if spinner != nil {
|
||||||
@@ -110,7 +119,7 @@ func CreateHandler(cmd *cobra.Command, args []string) error {
|
|||||||
}
|
}
|
||||||
|
|
||||||
func RunHandler(cmd *cobra.Command, args []string) error {
|
func RunHandler(cmd *cobra.Command, args []string) error {
|
||||||
client, err := api.FromEnv()
|
client, err := api.ClientFromEnvironment()
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
@@ -120,13 +129,9 @@ func RunHandler(cmd *cobra.Command, args []string) error {
|
|||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
|
|
||||||
modelName, modelTag, ok := strings.Cut(args[0], ":")
|
canonicalModelPath := server.ParseModelPath(args[0])
|
||||||
if !ok {
|
|
||||||
modelTag = "latest"
|
|
||||||
}
|
|
||||||
|
|
||||||
for _, model := range models.Models {
|
for _, model := range models.Models {
|
||||||
if model.Name == strings.Join([]string{modelName, modelTag}, ":") {
|
if model.Name == canonicalModelPath.GetShortTagname() {
|
||||||
return RunGenerate(cmd, args)
|
return RunGenerate(cmd, args)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@@ -139,7 +144,7 @@ func RunHandler(cmd *cobra.Command, args []string) error {
|
|||||||
}
|
}
|
||||||
|
|
||||||
func PushHandler(cmd *cobra.Command, args []string) error {
|
func PushHandler(cmd *cobra.Command, args []string) error {
|
||||||
client, err := api.FromEnv()
|
client, err := api.ClientFromEnvironment()
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
@@ -157,13 +162,13 @@ func PushHandler(cmd *cobra.Command, args []string) error {
|
|||||||
if resp.Digest != currentDigest && resp.Digest != "" {
|
if resp.Digest != currentDigest && resp.Digest != "" {
|
||||||
currentDigest = resp.Digest
|
currentDigest = resp.Digest
|
||||||
bar = progressbar.DefaultBytes(
|
bar = progressbar.DefaultBytes(
|
||||||
int64(resp.Total),
|
resp.Total,
|
||||||
fmt.Sprintf("pushing %s...", resp.Digest[7:19]),
|
fmt.Sprintf("pushing %s...", resp.Digest[7:19]),
|
||||||
)
|
)
|
||||||
|
|
||||||
bar.Set(resp.Completed)
|
bar.Set64(resp.Completed)
|
||||||
} else if resp.Digest == currentDigest && resp.Digest != "" {
|
} else if resp.Digest == currentDigest && resp.Digest != "" {
|
||||||
bar.Set(resp.Completed)
|
bar.Set64(resp.Completed)
|
||||||
} else {
|
} else {
|
||||||
currentDigest = ""
|
currentDigest = ""
|
||||||
fmt.Println(resp.Status)
|
fmt.Println(resp.Status)
|
||||||
@@ -183,7 +188,7 @@ func PushHandler(cmd *cobra.Command, args []string) error {
|
|||||||
}
|
}
|
||||||
|
|
||||||
func ListHandler(cmd *cobra.Command, args []string) error {
|
func ListHandler(cmd *cobra.Command, args []string) error {
|
||||||
client, err := api.FromEnv()
|
client, err := api.ClientFromEnvironment()
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
@@ -216,7 +221,7 @@ func ListHandler(cmd *cobra.Command, args []string) error {
|
|||||||
}
|
}
|
||||||
|
|
||||||
func DeleteHandler(cmd *cobra.Command, args []string) error {
|
func DeleteHandler(cmd *cobra.Command, args []string) error {
|
||||||
client, err := api.FromEnv()
|
client, err := api.ClientFromEnvironment()
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
@@ -232,7 +237,7 @@ func DeleteHandler(cmd *cobra.Command, args []string) error {
|
|||||||
}
|
}
|
||||||
|
|
||||||
func ShowHandler(cmd *cobra.Command, args []string) error {
|
func ShowHandler(cmd *cobra.Command, args []string) error {
|
||||||
client, err := api.FromEnv()
|
client, err := api.ClientFromEnvironment()
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
@@ -310,7 +315,7 @@ func ShowHandler(cmd *cobra.Command, args []string) error {
|
|||||||
}
|
}
|
||||||
|
|
||||||
func CopyHandler(cmd *cobra.Command, args []string) error {
|
func CopyHandler(cmd *cobra.Command, args []string) error {
|
||||||
client, err := api.FromEnv()
|
client, err := api.ClientFromEnvironment()
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
@@ -333,7 +338,7 @@ func PullHandler(cmd *cobra.Command, args []string) error {
|
|||||||
}
|
}
|
||||||
|
|
||||||
func pull(model string, insecure bool) error {
|
func pull(model string, insecure bool) error {
|
||||||
client, err := api.FromEnv()
|
client, err := api.ClientFromEnvironment()
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
@@ -346,13 +351,13 @@ func pull(model string, insecure bool) error {
|
|||||||
if resp.Digest != currentDigest && resp.Digest != "" {
|
if resp.Digest != currentDigest && resp.Digest != "" {
|
||||||
currentDigest = resp.Digest
|
currentDigest = resp.Digest
|
||||||
bar = progressbar.DefaultBytes(
|
bar = progressbar.DefaultBytes(
|
||||||
int64(resp.Total),
|
resp.Total,
|
||||||
fmt.Sprintf("pulling %s...", resp.Digest[7:19]),
|
fmt.Sprintf("pulling %s...", resp.Digest[7:19]),
|
||||||
)
|
)
|
||||||
|
|
||||||
bar.Set(resp.Completed)
|
bar.Set64(resp.Completed)
|
||||||
} else if resp.Digest == currentDigest && resp.Digest != "" {
|
} else if resp.Digest == currentDigest && resp.Digest != "" {
|
||||||
bar.Set(resp.Completed)
|
bar.Set64(resp.Completed)
|
||||||
} else {
|
} else {
|
||||||
currentDigest = ""
|
currentDigest = ""
|
||||||
fmt.Println(resp.Status)
|
fmt.Println(resp.Status)
|
||||||
@@ -375,7 +380,20 @@ func pull(model string, insecure bool) error {
|
|||||||
func RunGenerate(cmd *cobra.Command, args []string) error {
|
func RunGenerate(cmd *cobra.Command, args []string) error {
|
||||||
if len(args) > 1 {
|
if len(args) > 1 {
|
||||||
// join all args into a single prompt
|
// join all args into a single prompt
|
||||||
return generate(cmd, args[0], strings.Join(args[1:], " "))
|
wordWrap := false
|
||||||
|
if term.IsTerminal(int(os.Stdout.Fd())) {
|
||||||
|
wordWrap = true
|
||||||
|
}
|
||||||
|
|
||||||
|
nowrap, err := cmd.Flags().GetBool("nowordwrap")
|
||||||
|
if err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
if nowrap {
|
||||||
|
wordWrap = false
|
||||||
|
}
|
||||||
|
|
||||||
|
return generate(cmd, args[0], strings.Join(args[1:], " "), wordWrap)
|
||||||
}
|
}
|
||||||
|
|
||||||
if readline.IsTerminal(int(os.Stdin.Fd())) {
|
if readline.IsTerminal(int(os.Stdin.Fd())) {
|
||||||
@@ -387,8 +405,8 @@ func RunGenerate(cmd *cobra.Command, args []string) error {
|
|||||||
|
|
||||||
type generateContextKey string
|
type generateContextKey string
|
||||||
|
|
||||||
func generate(cmd *cobra.Command, model, prompt string) error {
|
func generate(cmd *cobra.Command, model, prompt string, wordWrap bool) error {
|
||||||
client, err := api.FromEnv()
|
client, err := api.ClientFromEnvironment()
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
@@ -403,25 +421,23 @@ func generate(cmd *cobra.Command, model, prompt string) error {
|
|||||||
generateContext = []int{}
|
generateContext = []int{}
|
||||||
}
|
}
|
||||||
|
|
||||||
var wrapTerm bool
|
|
||||||
termType := os.Getenv("TERM")
|
|
||||||
if termType == "xterm-256color" {
|
|
||||||
wrapTerm = true
|
|
||||||
}
|
|
||||||
|
|
||||||
termWidth, _, err := term.GetSize(int(0))
|
termWidth, _, err := term.GetSize(int(0))
|
||||||
if err != nil {
|
if err != nil {
|
||||||
wrapTerm = false
|
wordWrap = false
|
||||||
}
|
}
|
||||||
|
|
||||||
// override wrapping if the user turned it off
|
cancelCtx, cancel := context.WithCancel(context.Background())
|
||||||
nowrap, err := cmd.Flags().GetBool("nowordwrap")
|
defer cancel()
|
||||||
if err != nil {
|
|
||||||
return err
|
sigChan := make(chan os.Signal, 1)
|
||||||
}
|
signal.Notify(sigChan, syscall.SIGINT)
|
||||||
if nowrap {
|
var abort bool
|
||||||
wrapTerm = false
|
|
||||||
}
|
go func() {
|
||||||
|
<-sigChan
|
||||||
|
cancel()
|
||||||
|
abort = true
|
||||||
|
}()
|
||||||
|
|
||||||
var currentLineLength int
|
var currentLineLength int
|
||||||
var wordBuffer string
|
var wordBuffer string
|
||||||
@@ -434,7 +450,7 @@ func generate(cmd *cobra.Command, model, prompt string) error {
|
|||||||
|
|
||||||
latest = response
|
latest = response
|
||||||
|
|
||||||
if wrapTerm {
|
if wordWrap {
|
||||||
for _, ch := range response.Response {
|
for _, ch := range response.Response {
|
||||||
if currentLineLength+1 > termWidth-5 {
|
if currentLineLength+1 > termWidth-5 {
|
||||||
// backtrack the length of the last word and clear to the end of the line
|
// backtrack the length of the last word and clear to the end of the line
|
||||||
@@ -462,18 +478,10 @@ func generate(cmd *cobra.Command, model, prompt string) error {
|
|||||||
return nil
|
return nil
|
||||||
}
|
}
|
||||||
|
|
||||||
if err := client.Generate(context.Background(), &request, fn); err != nil {
|
if err := client.Generate(cancelCtx, &request, fn); err != nil {
|
||||||
if strings.Contains(err.Error(), "failed to load model") {
|
if strings.Contains(err.Error(), "context canceled") && abort {
|
||||||
// tell the user to check the server log, if it exists locally
|
spinner.Finish()
|
||||||
home, nestedErr := os.UserHomeDir()
|
return nil
|
||||||
if nestedErr != nil {
|
|
||||||
// return the original error
|
|
||||||
return err
|
|
||||||
}
|
|
||||||
logPath := filepath.Join(home, ".ollama", "logs", "server.log")
|
|
||||||
if _, nestedErr := os.Stat(logPath); nestedErr == nil {
|
|
||||||
err = fmt.Errorf("%w\nFor more details, check the error logs at %s", err, logPath)
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
@@ -483,6 +491,9 @@ func generate(cmd *cobra.Command, model, prompt string) error {
|
|||||||
}
|
}
|
||||||
|
|
||||||
if !latest.Done {
|
if !latest.Done {
|
||||||
|
if abort {
|
||||||
|
return nil
|
||||||
|
}
|
||||||
return errors.New("unexpected end of response")
|
return errors.New("unexpected end of response")
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -509,7 +520,7 @@ func generateInteractive(cmd *cobra.Command, model string) error {
|
|||||||
}
|
}
|
||||||
|
|
||||||
// load the model
|
// load the model
|
||||||
if err := generate(cmd, model, ""); err != nil {
|
if err := generate(cmd, model, "", false); err != nil {
|
||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -536,8 +547,35 @@ func generateInteractive(cmd *cobra.Command, model string) error {
|
|||||||
)
|
)
|
||||||
|
|
||||||
usage := func() {
|
usage := func() {
|
||||||
fmt.Fprintln(os.Stderr, "commands:")
|
fmt.Fprintln(os.Stderr, "Available Commands:")
|
||||||
fmt.Fprintln(os.Stderr, completer.Tree(" "))
|
fmt.Fprintln(os.Stderr, " /set Set session variables")
|
||||||
|
fmt.Fprintln(os.Stderr, " /show Show model information")
|
||||||
|
fmt.Fprintln(os.Stderr, " /bye Exit")
|
||||||
|
fmt.Fprintln(os.Stderr, " /?, /help Help for a command")
|
||||||
|
fmt.Fprintln(os.Stderr, "")
|
||||||
|
fmt.Fprintln(os.Stderr, "Use \"\"\" to begin a multi-line message.")
|
||||||
|
fmt.Fprintln(os.Stderr, "")
|
||||||
|
}
|
||||||
|
|
||||||
|
usageSet := func() {
|
||||||
|
fmt.Fprintln(os.Stderr, "Available Commands:")
|
||||||
|
fmt.Fprintln(os.Stderr, " /set history Enable history")
|
||||||
|
fmt.Fprintln(os.Stderr, " /set nohistory Disable history")
|
||||||
|
fmt.Fprintln(os.Stderr, " /set wordwrap Enable wordwrap")
|
||||||
|
fmt.Fprintln(os.Stderr, " /set nowordwrap Disable wordwrap")
|
||||||
|
fmt.Fprintln(os.Stderr, " /set verbose Show LLM stats")
|
||||||
|
fmt.Fprintln(os.Stderr, " /set quiet Disable LLM stats")
|
||||||
|
fmt.Fprintln(os.Stderr, "")
|
||||||
|
}
|
||||||
|
|
||||||
|
usageShow := func() {
|
||||||
|
fmt.Fprintln(os.Stderr, "Available Commands:")
|
||||||
|
fmt.Fprintln(os.Stderr, " /show license Show model license")
|
||||||
|
fmt.Fprintln(os.Stderr, " /show modelfile Show Modelfile for this model")
|
||||||
|
fmt.Fprintln(os.Stderr, " /show parameters Show parameters for this model")
|
||||||
|
fmt.Fprintln(os.Stderr, " /show system Show system prompt")
|
||||||
|
fmt.Fprintln(os.Stderr, " /show template Show prompt template")
|
||||||
|
fmt.Fprintln(os.Stderr, "")
|
||||||
}
|
}
|
||||||
|
|
||||||
var painter Painter
|
var painter Painter
|
||||||
@@ -555,6 +593,21 @@ func generateInteractive(cmd *cobra.Command, model string) error {
|
|||||||
}
|
}
|
||||||
defer scanner.Close()
|
defer scanner.Close()
|
||||||
|
|
||||||
|
var wordWrap bool
|
||||||
|
termType := os.Getenv("TERM")
|
||||||
|
if termType == "xterm-256color" {
|
||||||
|
wordWrap = true
|
||||||
|
}
|
||||||
|
|
||||||
|
// override wrapping if the user turned it off
|
||||||
|
nowrap, err := cmd.Flags().GetBool("nowordwrap")
|
||||||
|
if err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
if nowrap {
|
||||||
|
wordWrap = false
|
||||||
|
}
|
||||||
|
|
||||||
var multiLineBuffer string
|
var multiLineBuffer string
|
||||||
var isMultiLine bool
|
var isMultiLine bool
|
||||||
|
|
||||||
@@ -565,7 +618,7 @@ func generateInteractive(cmd *cobra.Command, model string) error {
|
|||||||
return nil
|
return nil
|
||||||
case errors.Is(err, readline.ErrInterrupt):
|
case errors.Is(err, readline.ErrInterrupt):
|
||||||
if line == "" {
|
if line == "" {
|
||||||
return nil
|
fmt.Println("Use Ctrl-D or /bye to exit.")
|
||||||
}
|
}
|
||||||
|
|
||||||
continue
|
continue
|
||||||
@@ -579,7 +632,7 @@ func generateInteractive(cmd *cobra.Command, model string) error {
|
|||||||
case isMultiLine:
|
case isMultiLine:
|
||||||
if strings.HasSuffix(line, `"""`) {
|
if strings.HasSuffix(line, `"""`) {
|
||||||
isMultiLine = false
|
isMultiLine = false
|
||||||
painter.HideHint = false
|
painter.IsMultiLine = isMultiLine
|
||||||
multiLineBuffer += strings.TrimSuffix(line, `"""`)
|
multiLineBuffer += strings.TrimSuffix(line, `"""`)
|
||||||
line = multiLineBuffer
|
line = multiLineBuffer
|
||||||
multiLineBuffer = ""
|
multiLineBuffer = ""
|
||||||
@@ -590,9 +643,9 @@ func generateInteractive(cmd *cobra.Command, model string) error {
|
|||||||
}
|
}
|
||||||
case strings.HasPrefix(line, `"""`):
|
case strings.HasPrefix(line, `"""`):
|
||||||
isMultiLine = true
|
isMultiLine = true
|
||||||
|
painter.IsMultiLine = isMultiLine
|
||||||
multiLineBuffer = strings.TrimPrefix(line, `"""`) + " "
|
multiLineBuffer = strings.TrimPrefix(line, `"""`) + " "
|
||||||
scanner.SetPrompt("... ")
|
scanner.SetPrompt("... ")
|
||||||
painter.HideHint = true
|
|
||||||
continue
|
continue
|
||||||
case strings.HasPrefix(line, "/list"):
|
case strings.HasPrefix(line, "/list"):
|
||||||
args := strings.Fields(line)
|
args := strings.Fields(line)
|
||||||
@@ -608,10 +661,10 @@ func generateInteractive(cmd *cobra.Command, model string) error {
|
|||||||
case "nohistory":
|
case "nohistory":
|
||||||
scanner.HistoryDisable()
|
scanner.HistoryDisable()
|
||||||
case "wordwrap":
|
case "wordwrap":
|
||||||
cmd.Flags().Set("nowordwrap", "false")
|
wordWrap = true
|
||||||
fmt.Println("Set 'wordwrap' mode.")
|
fmt.Println("Set 'wordwrap' mode.")
|
||||||
case "nowordwrap":
|
case "nowordwrap":
|
||||||
cmd.Flags().Set("nowordwrap", "true")
|
wordWrap = false
|
||||||
fmt.Println("Set 'nowordwrap' mode.")
|
fmt.Println("Set 'nowordwrap' mode.")
|
||||||
case "verbose":
|
case "verbose":
|
||||||
cmd.Flags().Set("verbose", "true")
|
cmd.Flags().Set("verbose", "true")
|
||||||
@@ -632,9 +685,11 @@ func generateInteractive(cmd *cobra.Command, model string) error {
 					} else {
 						usage()
 					}
+				default:
+					fmt.Printf("Unknown command '/set %s'. Type /? for help\n", args[1])
 				}
 			} else {
-				usage()
+				usageSet()
 			}
 		case strings.HasPrefix(line, "/show"):
 			args := strings.Fields(line)
@@ -642,27 +697,54 @@ func generateInteractive(cmd *cobra.Command, model string) error {
 				resp, err := server.GetModelInfo(model)
 				if err != nil {
 					fmt.Println("error: couldn't get model")
+					return err
 				}
 
 				switch args[1] {
 				case "license":
-					fmt.Println(resp.License)
+					if resp.License == "" {
+						fmt.Print("No license was specified for this model.\n\n")
+					} else {
+						fmt.Println(resp.License)
+					}
 				case "modelfile":
 					fmt.Println(resp.Modelfile)
 				case "parameters":
-					fmt.Println(resp.Parameters)
+					if resp.Parameters == "" {
+						fmt.Print("No parameters were specified for this model.\n\n")
+					} else {
+						fmt.Println(resp.Parameters)
+					}
 				case "system":
-					fmt.Println(resp.System)
+					if resp.System == "" {
+						fmt.Print("No system prompt was specified for this model.\n\n")
+					} else {
+						fmt.Println(resp.System)
+					}
 				case "template":
-					fmt.Println(resp.Template)
+					if resp.Template == "" {
+						fmt.Print("No prompt template was specified for this model.\n\n")
+					} else {
+						fmt.Println(resp.Template)
+					}
 				default:
-					fmt.Println("error: unknown command")
+					fmt.Printf("Unknown command '/show %s'. Type /? for help\n", args[1])
+				}
+			} else {
+				usageShow()
+			}
+		case strings.HasPrefix(line, "/help"), strings.HasPrefix(line, "/?"):
+			args := strings.Fields(line)
+			if len(args) > 1 {
+				switch args[1] {
+				case "set", "/set":
+					usageSet()
+				case "show", "/show":
+					usageShow()
 				}
 			} else {
 				usage()
 			}
-		case line == "/help", line == "/?":
-			usage()
 		case line == "/exit", line == "/bye":
 			return nil
 		case strings.HasPrefix(line, "/"):
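With these changes, `/show` prints a friendly message when a section is empty instead of a blank line, and unknown subcommands point at `/?`. A hypothetical session against a model with no SYSTEM block (illustrative):

```
>>> /show system
No system prompt was specified for this model.
```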
@@ -671,7 +753,7 @@ func generateInteractive(cmd *cobra.Command, model string) error {
 		}
 
 		if len(line) > 0 && line[0] != '/' {
-			if err := generate(cmd, model, line); err != nil {
+			if err := generate(cmd, model, line, wordWrap); err != nil {
 				return err
 			}
 		}
@@ -683,7 +765,7 @@ func generateBatch(cmd *cobra.Command, model string) error {
 	for scanner.Scan() {
 		prompt := scanner.Text()
 		fmt.Printf(">>> %s\n", prompt)
-		if err := generate(cmd, model, prompt); err != nil {
+		if err := generate(cmd, model, prompt, false); err != nil {
 			return err
 		}
 	}
@@ -718,6 +800,15 @@ func RunServer(cmd *cobra.Command, _ []string) error {
 		if err := server.PruneLayers(); err != nil {
 			return err
 		}
+
+		manifestsPath, err := server.GetManifestPath()
+		if err != nil {
+			return err
+		}
+
+		if err := server.PruneDirectory(manifestsPath); err != nil {
+			return err
+		}
 	}
 
 	return server.Serve(ln, origins)
@@ -804,7 +895,7 @@ func startMacApp(client *api.Client) error {
 }
 
 func checkServerHeartbeat(_ *cobra.Command, _ []string) error {
-	client, err := api.FromEnv()
+	client, err := api.ClientFromEnvironment()
 	if err != nil {
 		return err
 	}

docs/api.md (42 changes)
@@ -12,7 +12,6 @@
 - [Push a Model](#push-a-model)
 - [Generate Embeddings](#generate-embeddings)
 
-
 ## Conventions
 
 ### Model names
@@ -23,6 +22,10 @@ Model names follow a `model:tag` format. Some examples are `orca-mini:3b-q4_1` a
 
 All durations are returned in nanoseconds.
 
+### Streaming responses
+
+Certain endpoints stream responses as JSON objects delineated with the newline (`\n`) character.
+
 ## Generate a completion
 
 ```shell
@@ -36,12 +39,13 @@ Generate a response for a given prompt with a provided model. This is a streamin
 - `model`: (required) the [model name](#model-names)
 - `prompt`: the prompt to generate a response for
 
-Advanced parameters:
+Advanced parameters (optional):
 
 - `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.md#valid-parameters-and-values) such as `temperature`
 - `system`: system prompt (overrides what is defined in the `Modelfile`)
 - `template`: the full prompt or prompt template (overrides what is defined in the `Modelfile`)
 - `context`: the context parameter returned from a previous request to `/generate`, this can be used to keep a short conversational memory
+- `stream`: if `false` the response will be returned as a single response object, rather than a stream of objects
 
 ### Request
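Both the streaming format described under Conventions and the new `stream` parameter are easiest to see with a request in hand. A minimal sketch in shell, assuming a local server on the default port with `llama2:7b` pulled; the prompt text is illustrative:

```shell
# streamed (default): one JSON object per line, read as they arrive
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama2:7b",
  "prompt": "Why is the sky blue?"
}' | while read -r line; do
  echo "$line"
done

# non-streaming: a single response object
curl http://localhost:11434/api/generate -d '{
  "model": "llama2:7b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```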
@@ -76,6 +80,7 @@ The final response in the stream also includes additional data about the generat
 - `eval_count`: number of tokens in the response
 - `eval_duration`: time in nanoseconds spent generating the response
 - `context`: an encoding of the conversation used in this response, this can be sent in the next request to keep a conversational memory
+- `response`: empty if the response was streamed, if not streamed, this will contain the full response
 
 To calculate how fast the response is generated in tokens per second (token/s), divide `eval_count` by `eval_duration`.
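Since `eval_duration` is reported in nanoseconds, convert units before dividing: for a hypothetical response with an `eval_count` of 100 tokens and an `eval_duration` of 2,000,000,000 ns, the rate is 100 tokens / 2 s = 50 token/s.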
@@ -83,6 +88,7 @@ To calculate how fast the response is generated in tokens per second (token/s),
 {
   "model": "llama2:7b",
   "created_at": "2023-08-04T19:22:45.499127Z",
+  "response": "",
   "context": [1, 2, 3],
   "done": true,
   "total_duration": 5589157167,
@@ -108,6 +114,7 @@ Create a model from a [`Modelfile`](./modelfile.md)
 
 - `name`: name of the model to create
 - `path`: path to the Modelfile
+- `stream`: (optional) if `false` the response will be returned as a single response object, rather than a stream of objects
 
 ### Request
 
@@ -175,7 +182,7 @@ Show details about a model including modelfile, template, parameters, license, a
 
 ### Request
 
 ```shell
 curl http://localhost:11434/api/show -d '{
   "name": "llama2:7b"
 }'
@@ -185,10 +192,10 @@ curl http://localhost:11434/api/show -d '{
 
 ```json
 {
   "license": "<contents of license block>",
   "modelfile": "# Modelfile generated by \"ollama show\"\n# To build a new Modelfile based on this one, replace the FROM line with:\n# FROM llama2:latest\n\nFROM /Users/username/.ollama/models/blobs/sha256:8daa9615cce30c259a9555b1cc250d461d1bc69980a274b44d7eda0be78076d8\nTEMPLATE \"\"\"[INST] {{ if and .First .System }}<<SYS>>{{ .System }}<</SYS>>\n\n{{ end }}{{ .Prompt }} [/INST] \"\"\"\nSYSTEM \"\"\"\"\"\"\nPARAMETER stop [INST]\nPARAMETER stop [/INST]\nPARAMETER stop <<SYS>>\nPARAMETER stop <</SYS>>\n",
   "parameters": "stop [INST]\nstop [/INST]\nstop <<SYS>>\nstop <</SYS>>",
   "template": "[INST] {{ if and .First .System }}<<SYS>>{{ .System }}<</SYS>>\n\n{{ end }}{{ .Prompt }} [/INST] "
 }
 ```
 
@@ -241,6 +248,7 @@ Download a model from the ollama library. Cancelled pulls are resumed from where
 
 - `name`: name of the model to pull
 - `insecure`: (optional) allow insecure connections to the library. Only use this if you are pulling from your own library during development.
+- `stream`: (optional) if `false` the response will be returned as a single response object, rather than a stream of objects
 
 ### Request
 
@@ -271,7 +279,8 @@ Upload a model to a model library. Requires registering for ollama.ai and adding
 ### Parameters
 
 - `name`: name of the model to push in the form of `<namespace>/<model>:<tag>`
- `insecure`: (optional) allow insecure connections to the library. Only use this if you are pushing to your library during development.
+- `stream`: (optional) if `false` the response will be returned as a single response object, rather than a stream of objects
 
 ### Request
 
@@ -286,15 +295,16 @@ curl -X POST http://localhost:11434/api/push -d '{
 Streaming response that starts with:
 
 ```json
-{"status":"retrieving manifest"}
+{ "status": "retrieving manifest" }
 ```
 
 and then:
 
 ```json
 {
-  "status":"starting upload","digest":"sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
-  "total":1928429856
+  "status": "starting upload",
+  "digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
+  "total": 1928429856
 }
 ```
 
@@ -302,9 +312,10 @@ Then there is a series of uploading responses:
 
 ```json
 {
-  "status":"starting upload",
-  "digest":"sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
-  "total":1928429856}
+  "status": "starting upload",
+  "digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
+  "total": 1928429856
+}
 ```
 
 Finally, when the upload is complete:
@@ -348,4 +359,5 @@ curl -X POST http://localhost:11434/api/embeddings -d '{
     0.5670403838157654, 0.009260174818336964, 0.23178744316101074, -0.2916173040866852, -0.8924556970596313,
     0.8785552978515625, -0.34576427936553955, 0.5742510557174683, -0.04222835972905159, -0.137906014919281
   ]
-}```
+}
+```

@@ -10,25 +10,25 @@ Install required tools:
 - go version 1.20 or higher
 - gcc version 11.4.0 or higher
 
-```
+```bash
 brew install go cmake gcc
 ```
 
 Get the required libraries:
 
-```
+```bash
 go generate ./...
 ```
 
 Then build ollama:
 
-```
+```bash
 go build .
 ```
 
 Now you can run `ollama`:
 
-```
+```bash
 ./ollama
 ```
 
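Once built, a quick smoke test is to start the server from the fresh binary and run a model against it from a second terminal (model name illustrative):

```bash
# terminal 1
./ollama serve

# terminal 2
./ollama run llama2
```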
@@ -2,16 +2,17 @@
 
 ## How can I expose the Ollama server?
 
-```
+```bash
 OLLAMA_HOST=0.0.0.0:11435 ollama serve
 ```
 
 By default, Ollama allows cross origin requests from `127.0.0.1` and `0.0.0.0`. To support more origins, you can use the `OLLAMA_ORIGINS` environment variable:
 
-```
+```bash
 OLLAMA_ORIGINS=http://192.168.1.1:*,https://example.com ollama serve
 ```
 
 ## Where are models stored?
 
-Raw model data is stored under `~/.ollama/models`.
+* macOS: Raw model data is stored under `~/.ollama/models`.
+* Linux: Raw model data is stored under `/usr/share/ollama/.ollama/models`

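One way to confirm a server exposed this way is reachable is to hit the API from another machine on the network. The host address below is illustrative, and this sketch assumes the list-models endpoint (`/api/tags`) is available in this version:

```bash
curl http://192.168.1.100:11435/api/tags
```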
docs/linux.md (new file, +83)

@@ -0,0 +1,83 @@
+# Installing Ollama on Linux
+
+> Note: A one-line installer for Ollama is available by running:
+>
+> ```bash
+> curl https://ollama.ai/install.sh | sh
+> ```
+
+## Download the `ollama` binary
+
+Ollama is distributed as a self-contained binary. Download it to a directory in your PATH:
+
+```bash
+sudo curl -L https://ollama.ai/download/ollama-linux-amd64 -o /usr/bin/ollama
+sudo chmod +x /usr/bin/ollama
+```
+
+## Start Ollama
+
+Start Ollama by running `ollama serve`:
+
+```bash
+ollama serve
+```
+
+Once Ollama is running, run a model in another terminal session:
+
+```bash
+ollama run llama2
+```
+
+## Install CUDA drivers (optional – for Nvidia GPUs)
+
+[Download and install](https://developer.nvidia.com/cuda-downloads) CUDA.
+
+Verify that the drivers are installed by running the following command, which should print details about your GPU:
+
+```bash
+nvidia-smi
+```
+
+## Adding Ollama as a startup service (optional)
+
+Create a user for Ollama:
+
+```bash
+sudo useradd -r -s /bin/false -m -d /usr/share/ollama ollama
+```
+
+Create a service file in `/etc/systemd/system/ollama.service`:
+
+```ini
+[Unit]
+Description=Ollama Service
+After=network-online.target
+
+[Service]
+ExecStart=/usr/bin/ollama serve
+User=ollama
+Group=ollama
+Restart=always
+RestartSec=3
+Environment="HOME=/usr/share/ollama"
+
+[Install]
+WantedBy=default.target
+```
+
+Then reload systemd and enable the service so it starts on boot:
+
+```bash
+sudo systemctl daemon-reload
+sudo systemctl enable ollama
+```
+
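Enabling the unit only arranges for it to start on boot; to bring it up immediately and check that it is running, the standard systemd commands apply:

```bash
sudo systemctl start ollama
systemctl status ollama
```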
+### Viewing logs
+
+To view logs of Ollama running as a startup service, run:
+
+```bash
+journalctl -u ollama
+```

@@ -1,6 +1,6 @@
 # Ollama Model File
 
-> Note: this model file syntax is in development
+> Note: this `Modelfile` syntax is in development
 
 A model file is the blueprint to create and share models with Ollama.
 
@@ -24,7 +24,7 @@ A model file is the blueprint to create and share models with Ollama.
 
 ## Format
 
-The format of the Modelfile:
+The format of the `Modelfile`:
 
 ```modelfile
 # comment
@@ -42,9 +42,9 @@ INSTRUCTION arguments
 
 ## Examples
 
-An example of a model file creating a mario blueprint:
+An example of a `Modelfile` creating a mario blueprint:
 
-```
+```modelfile
 FROM llama2
 # sets the temperature to 1 [higher is more creative, lower is more coherent]
 PARAMETER temperature 1
@@ -57,9 +57,9 @@ SYSTEM You are Mario from super mario bros, acting as an assistant.
 
 To use this:
 
-1. Save it as a file (eg. `Modelfile`)
-2. `ollama create NAME -f <location of the file eg. ./Modelfile>`
-3. `ollama run NAME`
+1. Save it as a file (e.g. `Modelfile`)
+2. `ollama create choose-a-model-name -f <location of the file e.g. ./Modelfile>`
+3. `ollama run choose-a-model-name`
 4. Start using the model!
 
 More examples are available in the [examples directory](../examples).
@@ -68,33 +68,34 @@ More examples are available in the [examples directory](../examples).
 
 ### FROM (Required)
 
-The FROM instruction defines the base model to use when creating a model.
+The `FROM` instruction defines the base model to use when creating a model.
 
-```
+```modelfile
 FROM <model name>:<tag>
 ```
 
 #### Build from llama2
 
-```
+```modelfile
 FROM llama2
 ```
 
 A list of available base models:
 <https://github.com/jmorganca/ollama#model-library>
 
-#### Build from a bin file
+#### Build from a `bin` file
 
-```
+```modelfile
 FROM ./ollama-model.bin
 ```
 
-This bin file location should be specified as an absolute path or relative to the Modelfile location.
+This bin file location should be specified as an absolute path or relative to the `Modelfile` location.
 
 ### EMBED
 
-The EMBED instruction is used to add embeddings of files to a model. This is useful for adding custom data that the model can reference when generating an answer. Note that currently only text files are supported, formatted with each line as one embedding.
-```
+The `EMBED` instruction is used to add embeddings of files to a model. This is useful for adding custom data that the model can reference when generating an answer. Note that currently only text files are supported, formatted with each line as one embedding.
+
+```modelfile
 FROM <model name>:<tag>
 EMBED <file path>.txt
 EMBED <different file path>.txt
@@ -105,7 +106,7 @@ EMBED <path to directory>/*.txt
 
 The `PARAMETER` instruction defines a parameter that can be set when the model is run.
 
-```
+```modelfile
 PARAMETER <parameter> <parametervalue>
 ```
 
@@ -118,19 +119,21 @@ PARAMETER <parameter> <parametervalue>
 | mirostat_tau | Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. (Default: 5.0) | float | mirostat_tau 5.0 |
 | num_ctx | Sets the size of the context window used to generate the next token. (Default: 2048) | int | num_ctx 4096 |
 | num_gqa | The number of GQA groups in the transformer layer. Required for some models, for example it is 8 for llama2:70b | int | num_gqa 1 |
-| num_gpu | The number of GPUs to use. On macOS it defaults to 1 to enable metal support, 0 to disable. | int | num_gpu 1 |
+| num_gpu | The number of layers to send to the GPU(s). On macOS it defaults to 1 to enable metal support, 0 to disable. | int | num_gpu 50 |
 | num_thread | Sets the number of threads to use during computation. By default, Ollama will detect this for optimal performance. It is recommended to set this value to the number of physical CPU cores your system has (as opposed to the logical number of cores). | int | num_thread 8 |
 | repeat_last_n | Sets how far back for the model to look back to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx) | int | repeat_last_n 64 |
 | repeat_penalty | Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1) | float | repeat_penalty 1.1 |
 | temperature | The temperature of the model. Increasing the temperature will make the model answer more creatively. (Default: 0.8) | float | temperature 0.7 |
+| seed | Sets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt. | int | seed 42 |
 | stop | Sets the stop sequences to use. | string | stop "AI assistant:" |
 | tfs_z | Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting. (default: 1) | float | tfs_z 1 |
+| num_predict | Maximum number of tokens to predict when generating text. (Default: 128, -1 = infinite generation, -2 = fill context) | int | num_predict 42 |
 | top_k | Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40) | int | top_k 40 |
 | top_p | Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9) | float | top_p 0.9 |
 
 ### TEMPLATE
 
-`TEMPLATE` of the full prompt template to be passed into the model. It may include (optionally) a system prompt and a user's prompt. This is used to create a full custom prompt, and syntax may be model specific.
+`TEMPLATE` of the full prompt template to be passed into the model. It may include (optionally) a system prompt and a user's prompt. This is used to create a full custom prompt, and syntax may be model specific. You can usually find the template for a given model in the readme for that model.
 
 #### Template Variables
 
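The newly documented `seed` and `num_predict` parameters compose with the others in the usual way. A minimal `Modelfile` sketch (values illustrative) that pins the seed for reproducible output and caps the generation length:

```modelfile
FROM llama2
PARAMETER seed 42
PARAMETER num_predict 128
PARAMETER temperature 0.7
```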
@@ -140,7 +143,7 @@ PARAMETER <parameter> <parametervalue>
 | `{{ .Prompt }}` | The incoming prompt, this is not specified in the model file and will be set based on input. |
 | `{{ .First }}` | A boolean value used to render specific template information for the first generation of a session. |
 
-```
+```modelfile
 TEMPLATE """
 {{- if .First }}
 ### System:
@@ -160,7 +163,7 @@ SYSTEM """<system message>"""
 
 The `SYSTEM` instruction specifies the system prompt to be used in the template, if applicable.
 
-```
+```modelfile
 SYSTEM """<system message>"""
 ```
 
@@ -168,7 +171,7 @@ SYSTEM """<system message>"""
 
 The `ADAPTER` instruction specifies the LoRA adapter to apply to the base model. The value of this instruction should be an absolute path or a path relative to the Modelfile and the file must be in a GGML file format. The adapter should be tuned from the base model otherwise the behaviour is undefined.
 
-```
+```modelfile
 ADAPTER ./ollama-lora.bin
 ```
 
@@ -176,7 +179,7 @@ ADAPTER ./ollama-lora.bin
 
 The `LICENSE` instruction allows you to specify the legal license under which the model used with this Modelfile is shared or distributed.
 
-```
+```modelfile
 LICENSE """
 <license text>
 """
@@ -184,5 +187,5 @@ LICENSE """
 
 ## Notes
 
-- the **modelfile is not case sensitive**. In the examples, we use uppercase for instructions to make it easier to distinguish it from arguments.
+- the **`Modelfile` is not case sensitive**. In the examples, we use uppercase for instructions to make it easier to distinguish it from arguments.
 - Instructions can be in any order. In the examples, we start with FROM instruction to keep it easily readable.

docs/quantize.md (new file, +111)
@@ -0,0 +1,111 @@
+# How to Quantize a Model
+
+Sometimes the model you want to work with is not available at [https://ollama.ai/library](https://ollama.ai/library).
+
+## Figure out if we can run the model
+
+Not all models will work with Ollama. There are a number of factors that go into whether we are able to work with the next cool model: first it has to work with llama.cpp, then we have to have implemented the features of llama.cpp that it requires, and even with both of those the model still might not work.
+
+1. What is the model you want to convert and upload?
+2. Visit the model's page on HuggingFace.
+3. Switch to the **Files and versions** tab.
+4. Click on the **config.json** file. If there is no config.json file, it may not work.
+5. Take note of the **architecture** list in the JSON file.
+6. Does any entry in the list match one of the following architectures?
+   1. LlamaForCausalLM
+   2. MistralForCausalLM
+   3. RWForCausalLM
+   4. FalconForCausalLM
+   5. GPTNeoXForCausalLM
+   6. GPTBigCodeForCausalLM
+7. If the answer is yes, then there is a good chance the model will run after being converted and quantized.
+8. An alternative to this process is to visit [https://caniquant.tvl.st](https://caniquant.tvl.st) and enter the org/modelname in the box and submit.
+
+At this point there are two processes you can use. You can either use a Docker container to convert and quantize, OR you can manually run the scripts. The Docker container is the easiest way to do it, but it requires you to have Docker installed on your machine. If you don't have Docker installed, you can follow the manual process.
+
+## Convert and Quantize with Docker
+
+Run `docker run --rm -v /path/to/model/repo:/repo ollama/quantize -q quantlevel /repo`. For instance, if you have downloaded the latest Mistral 7B model, clone it to your machine, change into that directory, and run:
+
+```shell
+docker run --rm -v .:/repo ollama/quantize -q q4_0 /repo
+```
+
+You can find the different quantization levels below under **Quantize the Model**.
+
+This will output two files into the directory. The first is an f16.bin file, which is the model converted to GGUF. The second is a q4_0.bin file, which is the model quantized to 4-bit. You should rename it to something more descriptive.
+
+You can find the repository for the Docker container here: [https://github.com/mxyng/quantize](https://github.com/mxyng/quantize)
+
+For instance, if you wanted to convert the Mistral 7B model to a Q4 quantized model, then you could go through the following steps:
+
+1. First verify the model will potentially work.
+2. Now clone Mistral 7B to your machine. You can find the command to run when you click the three vertical dots button on the model page, then click **Clone Repository**.
+   1. For this repo, the command is:
+
+      ```shell
+      git lfs install
+      git clone https://huggingface.co/mistralai/Mistral-7B-v0.1
+      ```
+
+   2. Navigate into the new directory and run `docker run --rm -v .:/repo ollama/quantize -q q4_0 /repo`
+   3. Now you can create a modelfile using the q4_0.bin file that was created.
+
+## Convert and Quantize Manually
+
+### Clone llama.cpp to your machine
+
+If we know the model has a chance of working, then we need to convert and quantize. This is a matter of running two separate scripts in the llama.cpp project.
+
+1. Decide where you want the llama.cpp repository on your machine.
+2. Navigate to that location and then run:
+   [`git clone https://github.com/ggerganov/llama.cpp.git`](https://github.com/ggerganov/llama.cpp.git)
+   1. If you don't have git installed, download this zip file and unzip it to that location: https://github.com/ggerganov/llama.cpp/archive/refs/heads/master.zip
+3. Install the Python dependencies: `pip install torch transformers sentencepiece`
+4. Run `make` to build the project and the `quantize` executable.
+
+### Convert the model to GGUF
+
+1. Decide on the right convert script to run, based on the model architecture you found in the first section (a combined walk-through follows this list):
+   1. LlamaForCausalLM or MistralForCausalLM:
+      run `python3 convert.py <modelfilename>`
+      No need to specify fp16 or fp32.
+   2. FalconForCausalLM or RWForCausalLM:
+      run `python3 convert-falcon-hf-to-gguf.py <modelfilename> <fpsize>`
+      fpsize depends on the weight size. 1 for fp16, 0 for fp32
+   3. GPTNeoXForCausalLM:
+      run `python3 convert-gptneox-hf-to-gguf.py <modelfilename> <fpsize>`
+      fpsize depends on the weight size. 1 for fp16, 0 for fp32
+   4. GPTBigCodeForCausalLM:
+      run `python3 convert-starcoder-hf-to-gguf.py <modelfilename> <fpsize>`
+      fpsize depends on the weight size. 1 for fp16, 0 for fp32
+
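Putting the clone, build, and convert steps together for a Llama-architecture model, the manual flow sketched in the list above looks roughly like this (paths and model directory illustrative):

```shell
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
pip install torch transformers sentencepiece
make

# LlamaForCausalLM / MistralForCausalLM path; the other architectures
# use the architecture-specific scripts listed above
python3 convert.py ~/models/Mistral-7B-v0.1
```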
+### Quantize the model
+
+If the model converted successfully, there is a good chance it will also quantize successfully. Now you need to decide on the quantization to use. We will always try to create all the quantizations and upload them to the library. You should decide which level is more important to you and quantize accordingly.
+
+The quantization options are as follows. Note that some architectures such as Falcon do not support K quants.
+
+- Q4_0
+- Q4_1
+- Q5_0
+- Q5_1
+- Q2_K
+- Q3_K
+- Q3_K_S
+- Q3_K_M
+- Q3_K_L
+- Q4_K
+- Q4_K_S
+- Q4_K_M
+- Q5_K
+- Q5_K_S
+- Q5_K_M
+- Q6_K
+- Q8_0
+
+Run the following command: `quantize <converted model from above> <output file> <quantization type>`
+
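As a concrete sketch, assuming the converter produced an f16 GGUF file in the working directory (filenames illustrative):

```shell
./quantize ggml-model-f16.gguf ggml-model-q4_0.gguf q4_0
```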
+## Now Create the Model
+
+Now you can create the Ollama model. Refer to the [modelfile](./modelfile.md) doc for more information on doing that.
examples/.gitignore (vendored, new file, +171)

@@ -0,0 +1,171 @@
+node_modules
+# OSX
+.DS_STORE
+
+# Models
+models/
+
+# Local Chroma db
+.chroma/
+db/
+
(the remainder is the stock GitHub Python .gitignore template: byte-compiled files, packaging and distribution artifacts, test and coverage caches, framework-specific files, virtual environments, and IDE folders)
@@ -1,15 +1,3 @@
 # Examples
 
-This directory contains different examples of using Ollama
-
-To create a model:
-
-```
-ollama create example -f <example file>
-```
-
-To run a model:
-
-```
-ollama run example
-```
+This directory contains different examples of using Ollama.
examples/golang-simplegenerate/README.md (new file, empty)

examples/golang-simplegenerate/main.go (new file, +27)
|
|||||||
|
package main
|
||||||
|
|
||||||
|
import (
|
||||||
|
"bytes"
|
||||||
|
"fmt"
|
||||||
|
"net/http"
|
||||||
|
"os"
|
||||||
|
"io"
|
||||||
|
"log"
|
||||||
|
)
|
||||||
|
|
||||||
|
func main() {
|
||||||
|
body := []byte(`{"model":"mistral"}`)
|
||||||
|
resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewBuffer(body))
|
||||||
|
|
||||||
|
if err != nil {
|
||||||
|
fmt.Print(err.Error())
|
||||||
|
os.Exit(1)
|
||||||
|
}
|
||||||
|
|
||||||
|
responseData, err := io.ReadAll(resp.Body)
|
||||||
|
if err != nil {
|
||||||
|
log.Fatal(err)
|
||||||
|
}
|
||||||
|
fmt.Println(string(responseData))
|
||||||
|
|
||||||
|
}
|
||||||
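To try the example, the server must be running and the model pulled first; something along these lines (the model name matches the one hard-coded in the request body):

```shell
ollama pull mistral
go run main.go
```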
examples/langchain-typescript-simple/README.md (new file, +21)

@@ -0,0 +1,21 @@
+# LangChain
+
+This example is a basic "hello world" of using LangChain with Ollama using Node.js and TypeScript.
+
+## Setup
+
+```shell
+npm install
+```
+
+## Run
+
+```shell
+ts-node main.ts
+```
+
+Running this example will print the response for "hello":
+
+```plaintext
+Hello! It's nice to meet you. hopefully you are having a great day! Is there something I can help you with or would you like to chat?
+```
examples/langchain-typescript-simple/main.ts (new file, +15)

@@ -0,0 +1,15 @@
+import { Ollama } from 'langchain/llms/ollama';
+
+async function main() {
+  const ollama = new Ollama({
+    model: 'mistral'
+    // other parameters can be found at https://js.langchain.com/docs/api/llms_ollama/classes/Ollama
+  })
+
+  const stream = await ollama.stream("Hello");
+
+  for await (const chunk of stream) {
+    process.stdout.write(chunk);
+  }
+}
+
+main();
examples/langchain-typescript-simple/package-lock.json (generated, new file, +997)

(generated npm lockfile for this example: it pins `langchain` ^0.0.165 and `typescript` ^5.2.2 along with their transitive dependencies; the machine-generated contents, truncated in the source, are omitted here)
|
||||||
|
"binary-extensions": "^2.2.0",
|
||||||
|
"camelcase": "6",
|
||||||
|
"decamelize": "^1.2.0",
|
||||||
|
"expr-eval": "^2.0.2",
|
||||||
|
"flat": "^5.0.2",
|
||||||
|
"js-tiktoken": "^1.0.7",
|
||||||
|
"js-yaml": "^4.1.0",
|
||||||
|
"jsonpointer": "^5.0.1",
|
||||||
|
"langchainhub": "~0.0.6",
|
||||||
|
"langsmith": "~0.0.31",
|
||||||
|
"ml-distance": "^4.0.0",
|
||||||
|
"object-hash": "^3.0.0",
|
||||||
|
"openai": "~4.4.0",
|
||||||
|
"openapi-types": "^12.1.3",
|
||||||
|
"p-queue": "^6.6.2",
|
||||||
|
"p-retry": "4",
|
||||||
|
"uuid": "^9.0.0",
|
||||||
|
"yaml": "^2.2.1",
|
||||||
|
"zod": "^3.22.3",
|
||||||
|
"zod-to-json-schema": "^3.20.4"
|
||||||
|
},
|
||||||
|
"engines": {
|
||||||
|
"node": ">=18"
|
||||||
|
},
|
||||||
|
"peerDependencies": {
|
||||||
|
"@aws-crypto/sha256-js": "^5.0.0",
|
||||||
|
"@aws-sdk/client-bedrock-runtime": "^3.422.0",
|
||||||
|
"@aws-sdk/client-dynamodb": "^3.310.0",
|
||||||
|
"@aws-sdk/client-kendra": "^3.352.0",
|
||||||
|
"@aws-sdk/client-lambda": "^3.310.0",
|
||||||
|
"@aws-sdk/client-s3": "^3.310.0",
|
||||||
|
"@aws-sdk/client-sagemaker-runtime": "^3.310.0",
|
||||||
|
"@aws-sdk/client-sfn": "^3.310.0",
|
||||||
|
"@aws-sdk/credential-provider-node": "^3.388.0",
|
||||||
|
"@azure/storage-blob": "^12.15.0",
|
||||||
|
"@clickhouse/client": "^0.0.14",
|
||||||
|
"@cloudflare/ai": "^1.0.12",
|
||||||
|
"@elastic/elasticsearch": "^8.4.0",
|
||||||
|
"@getmetal/metal-sdk": "*",
|
||||||
|
"@getzep/zep-js": "^0.7.0",
|
||||||
|
"@gomomento/sdk": "^1.23.0",
|
||||||
|
"@google-ai/generativelanguage": "^0.2.1",
|
||||||
|
"@google-cloud/storage": "^6.10.1",
|
||||||
|
"@huggingface/inference": "^1.5.1",
|
||||||
|
"@mozilla/readability": "*",
|
||||||
|
"@notionhq/client": "^2.2.10",
|
||||||
|
"@opensearch-project/opensearch": "*",
|
||||||
|
"@pinecone-database/pinecone": "^1.1.0",
|
||||||
|
"@planetscale/database": "^1.8.0",
|
||||||
|
"@qdrant/js-client-rest": "^1.2.0",
|
||||||
|
"@raycast/api": "^1.55.2",
|
||||||
|
"@smithy/eventstream-codec": "^2.0.5",
|
||||||
|
"@smithy/protocol-http": "^3.0.6",
|
||||||
|
"@smithy/signature-v4": "^2.0.10",
|
||||||
|
"@smithy/util-utf8": "^2.0.0",
|
||||||
|
"@supabase/postgrest-js": "^1.1.1",
|
||||||
|
"@supabase/supabase-js": "^2.10.0",
|
||||||
|
"@tensorflow-models/universal-sentence-encoder": "*",
|
||||||
|
"@tensorflow/tfjs-converter": "*",
|
||||||
|
"@tensorflow/tfjs-core": "*",
|
||||||
|
"@upstash/redis": "^1.20.6",
|
||||||
|
"@vercel/postgres": "^0.5.0",
|
||||||
|
"@writerai/writer-sdk": "^0.40.2",
|
||||||
|
"@xata.io/client": "^0.25.1",
|
||||||
|
"@xenova/transformers": "^2.5.4",
|
||||||
|
"@zilliz/milvus2-sdk-node": ">=2.2.7",
|
||||||
|
"apify-client": "^2.7.1",
|
||||||
|
"axios": "*",
|
||||||
|
"cassandra-driver": "^4.6.4",
|
||||||
|
"cheerio": "^1.0.0-rc.12",
|
||||||
|
"chromadb": "*",
|
||||||
|
"cohere-ai": ">=6.0.0",
|
||||||
|
"d3-dsv": "^2.0.0",
|
||||||
|
"epub2": "^3.0.1",
|
||||||
|
"faiss-node": "^0.3.0",
|
||||||
|
"fast-xml-parser": "^4.2.7",
|
||||||
|
"firebase-admin": "^11.9.0",
|
||||||
|
"google-auth-library": "^8.9.0",
|
||||||
|
"googleapis": "^126.0.1",
|
||||||
|
"hnswlib-node": "^1.4.2",
|
||||||
|
"html-to-text": "^9.0.5",
|
||||||
|
"ignore": "^5.2.0",
|
||||||
|
"ioredis": "^5.3.2",
|
||||||
|
"jsdom": "*",
|
||||||
|
"llmonitor": "*",
|
||||||
|
"lodash": "^4.17.21",
|
||||||
|
"mammoth": "*",
|
||||||
|
"mongodb": "^5.2.0",
|
||||||
|
"mysql2": "^3.3.3",
|
||||||
|
"neo4j-driver": "*",
|
||||||
|
"node-llama-cpp": "*",
|
||||||
|
"notion-to-md": "^3.1.0",
|
||||||
|
"pdf-parse": "1.1.1",
|
||||||
|
"peggy": "^3.0.2",
|
||||||
|
"pg": "^8.11.0",
|
||||||
|
"pg-copy-streams": "^6.0.5",
|
||||||
|
"pickleparser": "^0.1.0",
|
||||||
|
"playwright": "^1.32.1",
|
||||||
|
"portkey-ai": "^0.1.11",
|
||||||
|
"puppeteer": "^19.7.2",
|
||||||
|
"redis": "^4.6.4",
|
||||||
|
"replicate": "^0.18.0",
|
||||||
|
"sonix-speech-recognition": "^2.1.1",
|
||||||
|
"srt-parser-2": "^1.2.2",
|
||||||
|
"typeorm": "^0.3.12",
|
||||||
|
"typesense": "^1.5.3",
|
||||||
|
"usearch": "^1.1.1",
|
||||||
|
"vectordb": "^0.1.4",
|
||||||
|
"voy-search": "0.6.2",
|
||||||
|
"weaviate-ts-client": "^1.4.0",
|
||||||
|
"web-auth-library": "^1.0.3",
|
||||||
|
"youtube-transcript": "^1.0.6",
|
||||||
|
"youtubei.js": "^5.8.0"
|
||||||
|
},
|
||||||
|
"peerDependenciesMeta": {
|
||||||
|
"@aws-crypto/sha256-js": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@aws-sdk/client-bedrock-runtime": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@aws-sdk/client-dynamodb": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@aws-sdk/client-kendra": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@aws-sdk/client-lambda": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@aws-sdk/client-s3": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@aws-sdk/client-sagemaker-runtime": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@aws-sdk/client-sfn": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@aws-sdk/credential-provider-node": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@azure/storage-blob": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@clickhouse/client": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@cloudflare/ai": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@elastic/elasticsearch": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@getmetal/metal-sdk": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@getzep/zep-js": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@gomomento/sdk": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@google-ai/generativelanguage": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@google-cloud/storage": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@huggingface/inference": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@mozilla/readability": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@notionhq/client": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@opensearch-project/opensearch": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@pinecone-database/pinecone": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@planetscale/database": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@qdrant/js-client-rest": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@raycast/api": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@smithy/eventstream-codec": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@smithy/protocol-http": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@smithy/signature-v4": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@smithy/util-utf8": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@supabase/postgrest-js": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@supabase/supabase-js": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@tensorflow-models/universal-sentence-encoder": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@tensorflow/tfjs-converter": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@tensorflow/tfjs-core": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@upstash/redis": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@vercel/postgres": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@writerai/writer-sdk": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@xata.io/client": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@xenova/transformers": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"@zilliz/milvus2-sdk-node": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"apify-client": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"axios": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"cassandra-driver": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"cheerio": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"chromadb": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"cohere-ai": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"d3-dsv": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"epub2": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"faiss-node": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"fast-xml-parser": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"firebase-admin": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"google-auth-library": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"googleapis": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"hnswlib-node": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"html-to-text": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"ignore": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"ioredis": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"jsdom": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"llmonitor": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"lodash": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"mammoth": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"mongodb": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"mysql2": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"neo4j-driver": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"node-llama-cpp": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"notion-to-md": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"pdf-parse": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"peggy": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"pg": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"pg-copy-streams": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"pickleparser": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"playwright": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"portkey-ai": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"puppeteer": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"redis": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"replicate": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"sonix-speech-recognition": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"srt-parser-2": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"typeorm": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"typesense": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"usearch": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"vectordb": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"voy-search": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"weaviate-ts-client": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"web-auth-library": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"youtube-transcript": {
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"youtubei.js": {
|
||||||
|
"optional": true
|
||||||
|
}
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/langchainhub": {
|
||||||
|
"version": "0.0.6",
|
||||||
|
"resolved": "https://registry.npmjs.org/langchainhub/-/langchainhub-0.0.6.tgz",
|
||||||
|
"integrity": "sha512-SW6105T+YP1cTe0yMf//7kyshCgvCTyFBMTgH2H3s9rTAR4e+78DA/BBrUL/Mt4Q5eMWui7iGuAYb3pgGsdQ9w=="
|
||||||
|
},
|
||||||
|
"node_modules/langsmith": {
|
||||||
|
"version": "0.0.42",
|
||||||
|
"resolved": "https://registry.npmjs.org/langsmith/-/langsmith-0.0.42.tgz",
|
||||||
|
"integrity": "sha512-sFuN+e7E+pPBIRaRgFqZh/BRBWNHTZNAwi6uj4kydQawooCZYoJmM5snOkiQrhVSvAhgu6xFhLvmfvkPcKzD7w==",
|
||||||
|
"dependencies": {
|
||||||
|
"@types/uuid": "^9.0.1",
|
||||||
|
"commander": "^10.0.1",
|
||||||
|
"p-queue": "^6.6.2",
|
||||||
|
"p-retry": "4",
|
||||||
|
"uuid": "^9.0.0"
|
||||||
|
},
|
||||||
|
"bin": {
|
||||||
|
"langsmith": "dist/cli/main.cjs"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/md5": {
|
||||||
|
"version": "2.3.0",
|
||||||
|
"resolved": "https://registry.npmjs.org/md5/-/md5-2.3.0.tgz",
|
||||||
|
"integrity": "sha512-T1GITYmFaKuO91vxyoQMFETst+O71VUPEU3ze5GNzDm0OWdP8v1ziTaAEPUr/3kLsY3Sftgz242A1SetQiDL7g==",
|
||||||
|
"dependencies": {
|
||||||
|
"charenc": "0.0.2",
|
||||||
|
"crypt": "0.0.2",
|
||||||
|
"is-buffer": "~1.1.6"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/mime-db": {
|
||||||
|
"version": "1.52.0",
|
||||||
|
"resolved": "https://registry.npmjs.org/mime-db/-/mime-db-1.52.0.tgz",
|
||||||
|
"integrity": "sha512-sPU4uV7dYlvtWJxwwxHD0PuihVNiE7TyAbQ5SWxDCB9mUYvOgroQOwYQQOKPJ8CIbE+1ETVlOoK1UC2nU3gYvg==",
|
||||||
|
"engines": {
|
||||||
|
"node": ">= 0.6"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/mime-types": {
|
||||||
|
"version": "2.1.35",
|
||||||
|
"resolved": "https://registry.npmjs.org/mime-types/-/mime-types-2.1.35.tgz",
|
||||||
|
"integrity": "sha512-ZDY+bPm5zTTF+YpCrAU9nK0UgICYPT0QtT1NZWFv4s++TNkcgVaT0g6+4R2uI4MjQjzysHB1zxuWL50hzaeXiw==",
|
||||||
|
"dependencies": {
|
||||||
|
"mime-db": "1.52.0"
|
||||||
|
},
|
||||||
|
"engines": {
|
||||||
|
"node": ">= 0.6"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/ml-array-mean": {
|
||||||
|
"version": "1.1.6",
|
||||||
|
"resolved": "https://registry.npmjs.org/ml-array-mean/-/ml-array-mean-1.1.6.tgz",
|
||||||
|
"integrity": "sha512-MIdf7Zc8HznwIisyiJGRH9tRigg3Yf4FldW8DxKxpCCv/g5CafTw0RRu51nojVEOXuCQC7DRVVu5c7XXO/5joQ==",
|
||||||
|
"dependencies": {
|
||||||
|
"ml-array-sum": "^1.1.6"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/ml-array-sum": {
|
||||||
|
"version": "1.1.6",
|
||||||
|
"resolved": "https://registry.npmjs.org/ml-array-sum/-/ml-array-sum-1.1.6.tgz",
|
||||||
|
"integrity": "sha512-29mAh2GwH7ZmiRnup4UyibQZB9+ZLyMShvt4cH4eTK+cL2oEMIZFnSyB3SS8MlsTh6q/w/yh48KmqLxmovN4Dw==",
|
||||||
|
"dependencies": {
|
||||||
|
"is-any-array": "^2.0.0"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/ml-distance": {
|
||||||
|
"version": "4.0.1",
|
||||||
|
"resolved": "https://registry.npmjs.org/ml-distance/-/ml-distance-4.0.1.tgz",
|
||||||
|
"integrity": "sha512-feZ5ziXs01zhyFUUUeZV5hwc0f5JW0Sh0ckU1koZe/wdVkJdGxcP06KNQuF0WBTj8FttQUzcvQcpcrOp/XrlEw==",
|
||||||
|
"dependencies": {
|
||||||
|
"ml-array-mean": "^1.1.6",
|
||||||
|
"ml-distance-euclidean": "^2.0.0",
|
||||||
|
"ml-tree-similarity": "^1.0.0"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/ml-distance-euclidean": {
|
||||||
|
"version": "2.0.0",
|
||||||
|
"resolved": "https://registry.npmjs.org/ml-distance-euclidean/-/ml-distance-euclidean-2.0.0.tgz",
|
||||||
|
"integrity": "sha512-yC9/2o8QF0A3m/0IXqCTXCzz2pNEzvmcE/9HFKOZGnTjatvBbsn4lWYJkxENkA4Ug2fnYl7PXQxnPi21sgMy/Q=="
|
||||||
|
},
|
||||||
|
"node_modules/ml-tree-similarity": {
|
||||||
|
"version": "1.0.0",
|
||||||
|
"resolved": "https://registry.npmjs.org/ml-tree-similarity/-/ml-tree-similarity-1.0.0.tgz",
|
||||||
|
"integrity": "sha512-XJUyYqjSuUQkNQHMscr6tcjldsOoAekxADTplt40QKfwW6nd++1wHWV9AArl0Zvw/TIHgNaZZNvr8QGvE8wLRg==",
|
||||||
|
"dependencies": {
|
||||||
|
"binary-search": "^1.3.5",
|
||||||
|
"num-sort": "^2.0.0"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/ms": {
|
||||||
|
"version": "2.1.3",
|
||||||
|
"resolved": "https://registry.npmjs.org/ms/-/ms-2.1.3.tgz",
|
||||||
|
"integrity": "sha512-6FlzubTLZG3J2a/NVCAleEhjzq5oxgHyaCU9yYXvcLsvoVaHJq/s5xXI6/XXP6tz7R9xAOtHnSO/tXtF3WRTlA=="
|
||||||
|
},
|
||||||
|
"node_modules/node-domexception": {
|
||||||
|
"version": "1.0.0",
|
||||||
|
"resolved": "https://registry.npmjs.org/node-domexception/-/node-domexception-1.0.0.tgz",
|
||||||
|
"integrity": "sha512-/jKZoMpw0F8GRwl4/eLROPA3cfcXtLApP0QzLmUT/HuPCZWyB7IY9ZrMeKw2O/nFIqPQB3PVM9aYm0F312AXDQ==",
|
||||||
|
"funding": [
|
||||||
|
{
|
||||||
|
"type": "github",
|
||||||
|
"url": "https://github.com/sponsors/jimmywarting"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"type": "github",
|
||||||
|
"url": "https://paypal.me/jimmywarting"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"engines": {
|
||||||
|
"node": ">=10.5.0"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/node-fetch": {
|
||||||
|
"version": "2.7.0",
|
||||||
|
"resolved": "https://registry.npmjs.org/node-fetch/-/node-fetch-2.7.0.tgz",
|
||||||
|
"integrity": "sha512-c4FRfUm/dbcWZ7U+1Wq0AwCyFL+3nt2bEw05wfxSz+DWpWsitgmSgYmy2dQdWyKC1694ELPqMs/YzUSNozLt8A==",
|
||||||
|
"dependencies": {
|
||||||
|
"whatwg-url": "^5.0.0"
|
||||||
|
},
|
||||||
|
"engines": {
|
||||||
|
"node": "4.x || >=6.0.0"
|
||||||
|
},
|
||||||
|
"peerDependencies": {
|
||||||
|
"encoding": "^0.1.0"
|
||||||
|
},
|
||||||
|
"peerDependenciesMeta": {
|
||||||
|
"encoding": {
|
||||||
|
"optional": true
|
||||||
|
}
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/num-sort": {
|
||||||
|
"version": "2.1.0",
|
||||||
|
"resolved": "https://registry.npmjs.org/num-sort/-/num-sort-2.1.0.tgz",
|
||||||
|
"integrity": "sha512-1MQz1Ed8z2yckoBeSfkQHHO9K1yDRxxtotKSJ9yvcTUUxSvfvzEq5GwBrjjHEpMlq/k5gvXdmJ1SbYxWtpNoVg==",
|
||||||
|
"engines": {
|
||||||
|
"node": ">=8"
|
||||||
|
},
|
||||||
|
"funding": {
|
||||||
|
"url": "https://github.com/sponsors/sindresorhus"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/object-hash": {
|
||||||
|
"version": "3.0.0",
|
||||||
|
"resolved": "https://registry.npmjs.org/object-hash/-/object-hash-3.0.0.tgz",
|
||||||
|
"integrity": "sha512-RSn9F68PjH9HqtltsSnqYC1XXoWe9Bju5+213R98cNGttag9q9yAOTzdbsqvIa7aNm5WffBZFpWYr2aWrklWAw==",
|
||||||
|
"engines": {
|
||||||
|
"node": ">= 6"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/openai": {
|
||||||
|
"version": "4.4.0",
|
||||||
|
"resolved": "https://registry.npmjs.org/openai/-/openai-4.4.0.tgz",
|
||||||
|
"integrity": "sha512-JN0t628Kh95T0IrXl0HdBqnlJg+4Vq0Bnh55tio+dfCnyzHvMLiWyCM9m726MAJD2YkDU4/8RQB6rNbEq9ct2w==",
|
||||||
|
"dependencies": {
|
||||||
|
"@types/node": "^18.11.18",
|
||||||
|
"@types/node-fetch": "^2.6.4",
|
||||||
|
"abort-controller": "^3.0.0",
|
||||||
|
"agentkeepalive": "^4.2.1",
|
||||||
|
"digest-fetch": "^1.3.0",
|
||||||
|
"form-data-encoder": "1.7.2",
|
||||||
|
"formdata-node": "^4.3.2",
|
||||||
|
"node-fetch": "^2.6.7"
|
||||||
|
},
|
||||||
|
"bin": {
|
||||||
|
"openai": "bin/cli"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/openapi-types": {
|
||||||
|
"version": "12.1.3",
|
||||||
|
"resolved": "https://registry.npmjs.org/openapi-types/-/openapi-types-12.1.3.tgz",
|
||||||
|
"integrity": "sha512-N4YtSYJqghVu4iek2ZUvcN/0aqH1kRDuNqzcycDxhOUpg7GdvLa2F3DgS6yBNhInhv2r/6I0Flkn7CqL8+nIcw=="
|
||||||
|
},
|
||||||
|
"node_modules/p-finally": {
|
||||||
|
"version": "1.0.0",
|
||||||
|
"resolved": "https://registry.npmjs.org/p-finally/-/p-finally-1.0.0.tgz",
|
||||||
|
"integrity": "sha512-LICb2p9CB7FS+0eR1oqWnHhp0FljGLZCWBE9aix0Uye9W8LTQPwMTYVGWQWIw9RdQiDg4+epXQODwIYJtSJaow==",
|
||||||
|
"engines": {
|
||||||
|
"node": ">=4"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/p-queue": {
|
||||||
|
"version": "6.6.2",
|
||||||
|
"resolved": "https://registry.npmjs.org/p-queue/-/p-queue-6.6.2.tgz",
|
||||||
|
"integrity": "sha512-RwFpb72c/BhQLEXIZ5K2e+AhgNVmIejGlTgiB9MzZ0e93GRvqZ7uSi0dvRF7/XIXDeNkra2fNHBxTyPDGySpjQ==",
|
||||||
|
"dependencies": {
|
||||||
|
"eventemitter3": "^4.0.4",
|
||||||
|
"p-timeout": "^3.2.0"
|
||||||
|
},
|
||||||
|
"engines": {
|
||||||
|
"node": ">=8"
|
||||||
|
},
|
||||||
|
"funding": {
|
||||||
|
"url": "https://github.com/sponsors/sindresorhus"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/p-retry": {
|
||||||
|
"version": "4.6.2",
|
||||||
|
"resolved": "https://registry.npmjs.org/p-retry/-/p-retry-4.6.2.tgz",
|
||||||
|
"integrity": "sha512-312Id396EbJdvRONlngUx0NydfrIQ5lsYu0znKVUzVvArzEIt08V1qhtyESbGVd1FGX7UKtiFp5uwKZdM8wIuQ==",
|
||||||
|
"dependencies": {
|
||||||
|
"@types/retry": "0.12.0",
|
||||||
|
"retry": "^0.13.1"
|
||||||
|
},
|
||||||
|
"engines": {
|
||||||
|
"node": ">=8"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/p-timeout": {
|
||||||
|
"version": "3.2.0",
|
||||||
|
"resolved": "https://registry.npmjs.org/p-timeout/-/p-timeout-3.2.0.tgz",
|
||||||
|
"integrity": "sha512-rhIwUycgwwKcP9yTOOFK/AKsAopjjCakVqLHePO3CC6Mir1Z99xT+R63jZxAT5lFZLa2inS5h+ZS2GvR99/FBg==",
|
||||||
|
"dependencies": {
|
||||||
|
"p-finally": "^1.0.0"
|
||||||
|
},
|
||||||
|
"engines": {
|
||||||
|
"node": ">=8"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/retry": {
|
||||||
|
"version": "0.13.1",
|
||||||
|
"resolved": "https://registry.npmjs.org/retry/-/retry-0.13.1.tgz",
|
||||||
|
"integrity": "sha512-XQBQ3I8W1Cge0Seh+6gjj03LbmRFWuoszgK9ooCpwYIrhhoO80pfq4cUkU5DkknwfOfFteRwlZ56PYOGYyFWdg==",
|
||||||
|
"engines": {
|
||||||
|
"node": ">= 4"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/tr46": {
|
||||||
|
"version": "0.0.3",
|
||||||
|
"resolved": "https://registry.npmjs.org/tr46/-/tr46-0.0.3.tgz",
|
||||||
|
"integrity": "sha512-N3WMsuqV66lT30CrXNbEjx4GEwlow3v6rr4mCcv6prnfwhS01rkgyFdjPNBYd9br7LpXV1+Emh01fHnq2Gdgrw=="
|
||||||
|
},
|
||||||
|
"node_modules/typescript": {
|
||||||
|
"version": "5.2.2",
|
||||||
|
"resolved": "https://registry.npmjs.org/typescript/-/typescript-5.2.2.tgz",
|
||||||
|
"integrity": "sha512-mI4WrpHsbCIcwT9cF4FZvr80QUeKvsUsUvKDoR+X/7XHQH98xYD8YHZg7ANtz2GtZt/CBq2QJ0thkGJMHfqc1w==",
|
||||||
|
"dev": true,
|
||||||
|
"bin": {
|
||||||
|
"tsc": "bin/tsc",
|
||||||
|
"tsserver": "bin/tsserver"
|
||||||
|
},
|
||||||
|
"engines": {
|
||||||
|
"node": ">=14.17"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/uuid": {
|
||||||
|
"version": "9.0.1",
|
||||||
|
"resolved": "https://registry.npmjs.org/uuid/-/uuid-9.0.1.tgz",
|
||||||
|
"integrity": "sha512-b+1eJOlsR9K8HJpow9Ok3fiWOWSIcIzXodvv0rQjVoOVNpWMpxf1wZNpt4y9h10odCNrqnYp1OBzRktckBe3sA==",
|
||||||
|
"funding": [
|
||||||
|
"https://github.com/sponsors/broofa",
|
||||||
|
"https://github.com/sponsors/ctavan"
|
||||||
|
],
|
||||||
|
"bin": {
|
||||||
|
"uuid": "dist/bin/uuid"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/web-streams-polyfill": {
|
||||||
|
"version": "4.0.0-beta.3",
|
||||||
|
"resolved": "https://registry.npmjs.org/web-streams-polyfill/-/web-streams-polyfill-4.0.0-beta.3.tgz",
|
||||||
|
"integrity": "sha512-QW95TCTaHmsYfHDybGMwO5IJIM93I/6vTRk+daHTWFPhwh+C8Cg7j7XyKrwrj8Ib6vYXe0ocYNrmzY4xAAN6ug==",
|
||||||
|
"engines": {
|
||||||
|
"node": ">= 14"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/webidl-conversions": {
|
||||||
|
"version": "3.0.1",
|
||||||
|
"resolved": "https://registry.npmjs.org/webidl-conversions/-/webidl-conversions-3.0.1.tgz",
|
||||||
|
"integrity": "sha512-2JAn3z8AR6rjK8Sm8orRC0h/bcl/DqL7tRPdGZ4I1CjdF+EaMLmYxBHyXuKL849eucPFhvBoxMsflfOb8kxaeQ=="
|
||||||
|
},
|
||||||
|
"node_modules/whatwg-url": {
|
||||||
|
"version": "5.0.0",
|
||||||
|
"resolved": "https://registry.npmjs.org/whatwg-url/-/whatwg-url-5.0.0.tgz",
|
||||||
|
"integrity": "sha512-saE57nupxk6v3HY35+jzBwYa0rKSy0XR8JSxZPwgLr7ys0IBzhGviA1/TUGJLmSVqs8pb9AnvICXEuOHLprYTw==",
|
||||||
|
"dependencies": {
|
||||||
|
"tr46": "~0.0.3",
|
||||||
|
"webidl-conversions": "^3.0.0"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/yaml": {
|
||||||
|
"version": "2.3.2",
|
||||||
|
"resolved": "https://registry.npmjs.org/yaml/-/yaml-2.3.2.tgz",
|
||||||
|
"integrity": "sha512-N/lyzTPaJasoDmfV7YTrYCI0G/3ivm/9wdG0aHuheKowWQwGTsK0Eoiw6utmzAnI6pkJa0DUVygvp3spqqEKXg==",
|
||||||
|
"engines": {
|
||||||
|
"node": ">= 14"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/zod": {
|
||||||
|
"version": "3.22.4",
|
||||||
|
"resolved": "https://registry.npmjs.org/zod/-/zod-3.22.4.tgz",
|
||||||
|
"integrity": "sha512-iC+8Io04lddc+mVqQ9AZ7OQ2MrUKGN+oIQyq1vemgt46jwCwLfhq7/pwnBnNXXXZb8VTVLKwp9EDkx+ryxIWmg==",
|
||||||
|
"funding": {
|
||||||
|
"url": "https://github.com/sponsors/colinhacks"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/zod-to-json-schema": {
|
||||||
|
"version": "3.21.4",
|
||||||
|
"resolved": "https://registry.npmjs.org/zod-to-json-schema/-/zod-to-json-schema-3.21.4.tgz",
|
||||||
|
"integrity": "sha512-fjUZh4nQ1s6HMccgIeE0VP4QG/YRGPmyjO9sAh890aQKPEk3nqbfUXhMFaC+Dr5KvYBm8BCyvfpZf2jY9aGSsw==",
|
||||||
|
"peerDependencies": {
|
||||||
|
"zod": "^3.21.4"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
8  examples/langchain-typescript-simple/package.json  Normal file
@@ -0,0 +1,8 @@
{
  "devDependencies": {
    "typescript": "^5.2.2"
  },
  "dependencies": {
    "langchain": "^0.0.165"
  }
}
@@ -1,8 +0,0 @@
-# Modelfile for creating a Midjourney prompts from a topic
-# This prompt was adapted from the original at https://www.greataiprompts.com/guide/midjourney/best-chatgpt-prompt-for-midjourney/
-# Run `ollama create mj -f ./Modelfile` and then `ollama run mj` and enter a topic
-
-FROM nous-hermes
-SYSTEM """
-Embrace your role as an AI-powered creative assistant, employing Midjourney to manifest compelling AI-generated art. I will outline a specific image concept, and in response, you must produce an exhaustive, multifaceted prompt for Midjourney, ensuring every detail of the original concept is represented in your instructions. Midjourney doesn't do well with text, so after the prompt, give me instructions that I can use to create the titles in a image editor.
-"""
23  examples/modelfile-10tweets/README.md  Normal file
@@ -0,0 +1,23 @@
# Ten Tweets Modelfile

This is a simple modelfile that generates ten tweets based on any topic.

```bash
ollama create tentweets

ollama run tentweets
>>> underwater basketweaving
Great! Here are ten creative tweets about underwater basketweaving:

1. "Just discovered the ultimate stress-reliever: Underwater basketweaving! 🌊🧵 #UnderwaterBasketweaving #StressRelief"
2. "Who needs meditation when you can do underwater basketweaving? 😴👀 #PeacefulDistraction #UnderwaterBasketweaving"
3. "Just spent an hour in the pool and still managed to knot my basket. Goal: untangle it before next session. 💪🏽 #ChallengeAccepted #UnderwaterBasketweaving"
4. "When life gives you lemons, make underwater basketweaving! 🍋🧵 #LemonadeLife #UnderwaterBasketweaving"
5. "Just realized my underwater basketweaving skills could come in handy during a zombie apocalypse. 😂🧡 #SurvivalTips #UnderwaterBasketweaving"
6. "I'm not lazy, I'm just conserving energy for my next underwater basketweaving session. 😴💤 #LazyDay #UnderwaterBasketweaving"
7. "Just found my inner peace while doing underwater basketweaving. It's like meditation, but with knots! 🙏🧵 #Mindfulness #UnderwaterBasketweaving"
8. "Why study for exams when you can do underwater basketweaving and forget all your worries? 😜🧵 #ProcrastinationStation #UnderwaterBasketweaving"
9. "Just had to cut my underwater basketweaving session short due to a sudden urge to breathe. 🤯🌊 #AquaticAdventures #UnderwaterBasketweaving"
10. "I'm not sure what's more impressive: my underwater basketweaving skills or the fact that I didn't drown trying to make this tweet. 😅🧵 #Accomplishment
#UnderwaterBasketweaving"
```
@@ -1,7 +1,7 @@
 # Modelfile for creating a devops engineer assistant
 # Run `ollama create devops-engineer -f ./Modelfile` and then `ollama run devops-engineer` and enter a topic
 
-FROM llama2:13b
+FROM mistral
 PARAMETER temperature 1
 SYSTEM """
 You are a senior devops engineer, acting as an assistant. You offer help with cloud technologies like: Terraform, AWS, kubernetes, python. You answer with code examples when possible
40  examples/modelfile-devopsengineer/README.md  Normal file
@@ -0,0 +1,40 @@
# DevOps Engineer Modelfile

This is a simple modelfile that will respond to any question a senior DevOps engineer should know how to solve.

````text
ollama create devopsengineer

ollama run devopsengineer
>>> how can i proxy all connections from inside the office through a single server that caches large files. Provide a config file for nginx to achieve this?

Here is an example configuration file for Nginx that can be used to proxy all connections from inside the office network to a centralized cache server:
```perl
http {
    upstream cache_server {
        server <cache_server_ip>;
        server <cache_server_ip>;
        server <cache_server_ip>;
    }

    server {
        listen 80;
        server_name localhost;

        location / {
            proxy_pass http://cache_server;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }
}
```
In this configuration, we define an `upstream` block that lists the IP addresses of the cache servers. We then define a server block that listens on port 80 and routes all
traffic to the `/` location to the cache servers using the `proxy_pass` directive. The `proxy_set_header` directives are used to preserve the source IP address of the client
request when forwarding it to the cache server.

To use this configuration, you would need to replace the placeholder `<cache_server_ip>` with the actual IP addresses of your cache servers. You would also need to make sure
that the cache servers are configured to accept incoming connections from the Nginx server and handle requests for files.

````
[binary image diff: 446 KiB before, 446 KiB after]
11  examples/modelfile-midjourney/Modelfile  Normal file
@@ -0,0 +1,11 @@
# Modelfile for creating Midjourney prompts from a topic
# This prompt was adapted from the original at https://www.greataiprompts.com/guide/midjourney/best-chatgpt-prompt-for-midjourney/
# Run `ollama create mj -f ./Modelfile` and then `ollama run mj` and enter a topic

FROM zephyr
PARAMETER temperature 0.8
PARAMETER top_k 500
PARAMETER top_p 0.9
SYSTEM """
Embrace your role as a creative illustrator. Based on a concept provided, you must produce a single paragraph with a multifaceted description of an image, ensuring significant details of the concept and more is represented in your instructions. You do not need to write complete sentences but rather short concepts with the following information: the level of detail that should be represented, an artistic style and maybe a specific name of a painter or illustrator, the ideal color palette, lighting, mood, perspective, the setting, time of day, weather, the season, the time period, location, materials, the textures, patterns, lines, brushstrokes, techniques, the medium, the genre, the rendering style. Don't include everything and keep the description length under 250 words.
"""
11  examples/modelfile-midjourney/README.md  Normal file
@@ -0,0 +1,11 @@
# Midjourney Prompt Generator Modelfile

This simple modelfile will help create a prompt to feed to Midjourney.

```text
ollama create midjourney

ollama run midjourney
>>> a sports car in the mountains.
A sleek, high-performance automobile cuts through a serpentine mountain landscape. The concept is a classic illustration of speed and power, depicted in the style of pop art by Andy Warhol. The color palette is dominated by bold, primary hues of red, blue, and yellow, with striking accent colors of white, black, and metallic shades. The lighting is bright and focused, casting sharp shadows on the rugged terrain. A sense of excitement and anticipation permeates throughout the scene, as the car navigates a treacherous course through the winding road. The perspective is low, allowing for a full view of the vehicle's sleek lines and intricate details. The setting takes place in the afternoon during a sunny day in autumn, as evidenced by the vibrant foliage on the mountainside. The time period is modern, with nods to classic car design. The materials are primarily digital, allowing for smooth curves and sharp contrasts. The textures are sleek and polished, with meticulously detailed lines and brushstrokes that accentuate the car's aerodynamic design. The patterns consist of geometric shapes and bold stripes, adding to the car's dynamic appeal. The genre is modern realism, with a focus on precision and detail. The rendering style is highly technical, capturing the nuances and subtleties of the vehicle and its surroundings in breathtaking detail.
```
20  examples/modelfile-recipemaker/README.md  Normal file
@@ -0,0 +1,20 @@
# Recipe Maker Modelfile

Simple modelfile to generate a recipe from a short list of ingredients.

```
ollama create recipemaker

ollama run recipemaker
>>> chilli pepper, white chocolate, kale
Ingredients:
- 1 small chili pepper
- 4 squares of white chocolate
- handful of kale leaves

Instructions:
1. In a blender or food processor, puree the chilies and white chocolate until smooth.
2. Add the chopped kale leaves to the blender and pulse until well combined.
3. Serve immediately as a dip for crackers or use it as an ingredient in your favorite recipe. The mixture of spicy chili pepper with sweet white chocolate and nutritious
kale will make your taste buds dance with delight!
```
@@ -1,4 +1,4 @@
-FROM llama2
+FROM mistral
 SYSTEM """
 You are an experienced Devops engineer focused on docker. When given specifications for a particular need or application you know the best way to host that within a docker container. For instance if someone tells you they want an nginx server to host files located at /web you will answer as follows
 
2  examples/typescript-mentors/.gitignore  vendored  Normal file
@@ -0,0 +1,2 @@
node_modules
package-lock.json
21  examples/typescript-mentors/README.md  Normal file
@@ -0,0 +1,21 @@
# Ask the Mentors

This example demonstrates how one would create a set of 'mentors' you can have a conversation with. The mentors are generated using the `character-generator.ts` file. This will use **Stable Beluga 70b** to create a bio and a list of verbal tics and common phrases used by each person. Then `mentors.ts` will take a question, choose three of the 'mentors', and start a conversation with them. Occasionally, they will talk to each other, and other times they will just deliver a set of monologues. It's fun to see what they do and say.

## Usage

```bash
ts-node ./character-generator.ts "Lorne Greene"
```

This will create `lornegreene/Modelfile`. Now you can create a model with this command:

```bash
ollama create lornegreene -f lornegreene/Modelfile
```

If you want to add your own mentors, you will have to update the code to look at your namespace instead of **mattw**. Also set the list of mentors to include yours.

```bash
ts-node ./mentors.ts "What is a Jackalope?"
```
26  examples/typescript-mentors/character-generator.ts  Normal file
@@ -0,0 +1,26 @@
import { Ollama } from 'ollama-node'
import fs from 'fs';
import path from 'path';

async function characterGenerator() {
  const character = process.argv[2];
  console.log(`You are creating a character for ${character}.`);
  const foldername = character.replace(/\s/g, '').toLowerCase();
  const directory = path.join(__dirname, foldername);
  if (!fs.existsSync(directory)) {
    fs.mkdirSync(directory, { recursive: true });
  }

  const ollama = new Ollama();
  ollama.setModel("stablebeluga2:70b-q4_K_M");
  const bio = await ollama.generate(`create a bio of ${character} in a single long paragraph. Instead of saying '${character} is...' or '${character} was...' use language like 'You are...' or 'You were...'. Then create a paragraph describing the speaking mannerisms and style of ${character}. Don't include anything about how ${character} looked or what they sounded like, just focus on the words they said. Instead of saying '${character} would say...' use language like 'You should say...'. If you use quotes, always use single quotes instead of double quotes. If there are any specific words or phrases you used a lot, show how you used them. `);

  const thecontents = `FROM llama2\nSYSTEM """\n${bio.response.replace(/(\r\n|\n|\r)/gm, " ").replace('would', 'should')} All answers to questions should be related back to what you are most known for.\n"""`;

  fs.writeFile(path.join(directory, 'Modelfile'), thecontents, (err: any) => {
    if (err) throw err;
    console.log('The file has been saved!');
  });
}

characterGenerator();
59  examples/typescript-mentors/mentors.ts  Normal file
@@ -0,0 +1,59 @@
import { Ollama } from 'ollama-node';

const mentorCount = 3;
const ollama = new Ollama();

function getMentors(): string[] {
  const mentors = ['Gary Vaynerchuk', 'Kanye West', 'Martha Stewart', 'Neil deGrasse Tyson', 'Owen Wilson', 'Ronald Reagan', 'Donald Trump', 'Barack Obama', 'Jeff Bezos'];
  const chosenMentors: string[] = [];
  for (let i = 0; i < mentorCount; i++) {
    const mentor = mentors[Math.floor(Math.random() * mentors.length)];
    chosenMentors.push(mentor);
    mentors.splice(mentors.indexOf(mentor), 1);
  }
  return chosenMentors;
}

function getMentorFileName(mentor: string): string {
  const model = mentor.toLowerCase().replace(/\s/g, '');
  return `mattw/${model}`;
}

async function getSystemPrompt(mentor: string, isLast: boolean, question: string): Promise<string> {
  ollama.setModel(getMentorFileName(mentor));
  const info = await ollama.showModelInfo()
  let SystemPrompt = info.system || '';
  SystemPrompt += ` You should continue the conversation as if you were ${mentor} and acknowledge the people before you in the conversation. You should adopt their mannerisms and tone, but also not use language they wouldn't use. If they are not known to know about the concept in the question, don't offer an answer. Your answer should be no longer than 1 paragraph. And definitely try not to sound like anyone else. Don't repeat any slang or phrases already used. And if it is a question the original ${mentor} wouldn't have know the answer to, just say that you don't know, in the style of ${mentor}. And think about the time the person lived. Don't use terminology that they wouldn't have used.`

  if (isLast) {
    SystemPrompt += ` End your answer with something like I hope our answers help you out`;
  } else {
    SystemPrompt += ` Remember, this is a conversation, so you don't need a conclusion, but end your answer with a question related to the first question: "${question}".`;
  }
  return SystemPrompt;
}

async function main() {
  const mentors = getMentors();
  const question = process.argv[2];
  let theConversation = `Here is the conversation so far.\nYou: ${question}\n`

  for await (const mentor of mentors) {
    const SystemPrompt = await getSystemPrompt(mentor, mentor === mentors[mentorCount - 1], question);
    ollama.setModel(getMentorFileName(mentor));
    ollama.setSystemPrompt(SystemPrompt);
    let output = '';
    process.stdout.write(`\n${mentor}: `);
    for await (const chunk of ollama.streamingGenerate(theConversation + `Continue the conversation as if you were ${mentor} on the question "${question}".`)) {
      if (chunk.response) {
        output += chunk.response;
        process.stdout.write(chunk.response);
      } else {
        process.stdout.write('\n');
      }
    }
    theConversation += `${mentor}: ${output}\n\n`
  }
}

main();
7  examples/typescript-mentors/package.json  Normal file
@@ -0,0 +1,7 @@
{
  "dependencies": {
    "fs": "^0.0.1-security",
    "ollama-node": "^0.0.3",
    "path": "^0.12.7"
  }
}
16  format/bytes.go  Normal file
@@ -0,0 +1,16 @@
package format

import "fmt"

func HumanBytes(b int64) string {
    switch {
    case b > 1000*1000*1000:
        return fmt.Sprintf("%d GB", b/1000/1000/1000)
    case b > 1000*1000:
        return fmt.Sprintf("%d MB", b/1000/1000)
    case b > 1000:
        return fmt.Sprintf("%d KB", b/1000)
    default:
        return fmt.Sprintf("%d B", b)
    }
}
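
For reference, a short sketch of how `HumanBytes` behaves: it uses decimal (1000-based) units and integer division, so values truncate rather than round. The `main` wrapper is only for illustration, and the import path assumes the repository's `github.com/jmorganca/ollama` module:

```go
package main

import (
    "fmt"

    "github.com/jmorganca/ollama/format"
)

func main() {
    // Below 1000 bytes, the default branch prints raw bytes.
    fmt.Println(format.HumanBytes(512)) // "512 B"

    // 7,340,032 / 1,000,000 truncates to 7 (not rounded to 7.3).
    fmt.Println(format.HumanBytes(7_340_032)) // "7 MB"

    // Anything over 1000^3 is reported in GB.
    fmt.Println(format.HumanBytes(4_600_000_000)) // "4 GB"
}
```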
@@ -7,26 +7,14 @@ import (
 	"time"
 )
 
-// HumanDuration returns a human-readable approximation of a duration
-// (eg. "About a minute", "4 hours ago", etc.).
-// Modified version of github.com/docker/go-units.HumanDuration
-func HumanDuration(d time.Duration) string {
-	return HumanDurationWithCase(d, true)
-}
-
-// HumanDurationWithCase returns a human-readable approximation of a
-// duration (eg. "About a minute", "4 hours ago", etc.). but allows
-// you to specify whether the first word should be capitalized
-// (eg. "About" vs. "about")
-func HumanDurationWithCase(d time.Duration, useCaps bool) string {
+// humanDuration returns a human-readable approximation of a
+// duration (eg. "About a minute", "4 hours ago", etc.).
+func humanDuration(d time.Duration) string {
 	seconds := int(d.Seconds())
 
 	switch {
 	case seconds < 1:
-		if useCaps {
-			return "Less than a second"
-		}
-		return "less than a second"
+		return "Less than a second"
 	case seconds == 1:
 		return "1 second"
 	case seconds < 60:
@@ -36,10 +24,7 @@ func HumanDurationWithCase(d time.Duration, useCaps bool) string {
 	minutes := int(d.Minutes())
 	switch {
 	case minutes == 1:
-		if useCaps {
-			return "About a minute"
-		}
-		return "about a minute"
+		return "About a minute"
 	case minutes < 60:
 		return fmt.Sprintf("%d minutes", minutes)
 	}
@@ -47,10 +32,7 @@ func HumanDurationWithCase(d time.Duration, useCaps bool) string {
 	hours := int(math.Round(d.Hours()))
 	switch {
 	case hours == 1:
-		if useCaps {
-			return "About an hour"
-		}
-		return "about an hour"
+		return "About an hour"
 	case hours < 48:
 		return fmt.Sprintf("%d hours", hours)
 	case hours < 24*7*2:
@@ -65,77 +47,22 @@ func HumanDurationWithCase(d time.Duration, useCaps bool) string {
 }
 
 func HumanTime(t time.Time, zeroValue string) string {
-	return humanTimeWithCase(t, zeroValue, true)
+	return humanTime(t, zeroValue)
 }
 
 func HumanTimeLower(t time.Time, zeroValue string) string {
-	return humanTimeWithCase(t, zeroValue, false)
+	return strings.ToLower(humanTime(t, zeroValue))
 }
 
-func humanTimeWithCase(t time.Time, zeroValue string, useCaps bool) string {
+func humanTime(t time.Time, zeroValue string) string {
 	if t.IsZero() {
 		return zeroValue
 	}
 
 	delta := time.Since(t)
 	if delta < 0 {
-		return HumanDurationWithCase(-delta, useCaps) + " from now"
+		return humanDuration(-delta) + " from now"
 	}
-	return HumanDurationWithCase(delta, useCaps) + " ago"
-}
-
-// ExcatDuration returns a human readable hours/minutes/seconds or milliseconds format of a duration
-// the most precise level of duration is milliseconds
-func ExactDuration(d time.Duration) string {
-	if d.Seconds() < 1 {
-		if d.Milliseconds() == 1 {
-			return fmt.Sprintf("%d millisecond", d.Milliseconds())
-		}
-		return fmt.Sprintf("%d milliseconds", d.Milliseconds())
-	}
-
-	var readableDur strings.Builder
-
-	dur := d.String()
-
-	// split the default duration string format of 0h0m0s into something nicer to read
-	h := strings.Split(dur, "h")
-	if len(h) > 1 {
-		hours := h[0]
-		if hours == "1" {
-			readableDur.WriteString(fmt.Sprintf("%s hour ", hours))
-		} else {
-			readableDur.WriteString(fmt.Sprintf("%s hours ", hours))
-		}
-		dur = h[1]
-	}
-
-	m := strings.Split(dur, "m")
-	if len(m) > 1 {
-		mins := m[0]
-		switch mins {
-		case "0":
-			// skip
-		case "1":
-			readableDur.WriteString(fmt.Sprintf("%s minute ", mins))
-		default:
-			readableDur.WriteString(fmt.Sprintf("%s minutes ", mins))
-		}
-		dur = m[1]
-	}
-
-	s := strings.Split(dur, "s")
-	if len(s) > 0 {
-		sec := s[0]
-		switch sec {
-		case "0":
-			// skip
-		case "1":
-			readableDur.WriteString(fmt.Sprintf("%s second ", sec))
-		default:
-			readableDur.WriteString(fmt.Sprintf("%s seconds ", sec))
-		}
-	}
-
-	return strings.TrimSpace(readableDur.String())
-}
+
+	return humanDuration(delta) + " ago"
+}
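
After this refactor the package's exported surface is just `HumanTime` and `HumanTimeLower`, with lowercasing done by `strings.ToLower` rather than a `useCaps` flag threaded through every helper. A minimal usage sketch, assuming the same module path as above:

```go
package main

import (
    "fmt"
    "time"

    "github.com/jmorganca/ollama/format"
)

func main() {
    created := time.Now().Add(-48 * time.Hour)

    // Capitalized form: "2 days ago"
    fmt.Println(format.HumanTime(created, "never"))

    // Lowercased form; identical here, but e.g. "Less than a second ago"
    // would become "less than a second ago".
    fmt.Println(format.HumanTimeLower(created, "never"))

    // The zero time falls back to the caller-supplied sentinel: "never"
    fmt.Println(format.HumanTime(time.Time{}, "never"))
}
```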
|
|||||||
@@ -11,92 +11,25 @@ func assertEqual(t *testing.T, a interface{}, b interface{}) {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
func TestHumanDuration(t *testing.T) {
|
|
||||||
day := 24 * time.Hour
|
|
||||||
week := 7 * day
|
|
||||||
month := 30 * day
|
|
||||||
year := 365 * day
|
|
||||||
|
|
||||||
assertEqual(t, "Less than a second", HumanDuration(450*time.Millisecond))
|
|
||||||
assertEqual(t, "Less than a second", HumanDurationWithCase(450*time.Millisecond, true))
|
|
||||||
assertEqual(t, "less than a second", HumanDurationWithCase(450*time.Millisecond, false))
|
|
||||||
assertEqual(t, "1 second", HumanDuration(1*time.Second))
|
|
||||||
assertEqual(t, "45 seconds", HumanDuration(45*time.Second))
|
|
||||||
assertEqual(t, "46 seconds", HumanDuration(46*time.Second))
|
|
||||||
assertEqual(t, "59 seconds", HumanDuration(59*time.Second))
|
|
||||||
assertEqual(t, "About a minute", HumanDuration(60*time.Second))
|
|
||||||
assertEqual(t, "About a minute", HumanDurationWithCase(1*time.Minute, true))
|
|
||||||
assertEqual(t, "about a minute", HumanDurationWithCase(1*time.Minute, false))
|
|
||||||
assertEqual(t, "3 minutes", HumanDuration(3*time.Minute))
|
|
||||||
assertEqual(t, "35 minutes", HumanDuration(35*time.Minute))
|
|
||||||
assertEqual(t, "35 minutes", HumanDuration(35*time.Minute+40*time.Second))
|
|
||||||
assertEqual(t, "45 minutes", HumanDuration(45*time.Minute))
|
|
||||||
assertEqual(t, "45 minutes", HumanDuration(45*time.Minute+40*time.Second))
|
|
||||||
assertEqual(t, "46 minutes", HumanDuration(46*time.Minute))
|
|
||||||
assertEqual(t, "59 minutes", HumanDuration(59*time.Minute))
|
|
||||||
assertEqual(t, "About an hour", HumanDuration(1*time.Hour))
|
|
||||||
assertEqual(t, "About an hour", HumanDurationWithCase(1*time.Hour+29*time.Minute, true))
|
|
||||||
assertEqual(t, "about an hour", HumanDurationWithCase(1*time.Hour+29*time.Minute, false))
|
|
||||||
assertEqual(t, "2 hours", HumanDuration(1*time.Hour+31*time.Minute))
|
|
||||||
assertEqual(t, "2 hours", HumanDuration(1*time.Hour+59*time.Minute))
|
|
||||||
assertEqual(t, "3 hours", HumanDuration(3*time.Hour))
|
|
||||||
assertEqual(t, "3 hours", HumanDuration(3*time.Hour+29*time.Minute))
|
|
||||||
assertEqual(t, "4 hours", HumanDuration(3*time.Hour+31*time.Minute))
|
|
||||||
assertEqual(t, "4 hours", HumanDuration(3*time.Hour+59*time.Minute))
|
|
||||||
assertEqual(t, "4 hours", HumanDuration(3*time.Hour+60*time.Minute))
|
|
||||||
assertEqual(t, "24 hours", HumanDuration(24*time.Hour))
|
|
||||||
assertEqual(t, "36 hours", HumanDuration(1*day+12*time.Hour))
|
|
||||||
assertEqual(t, "2 days", HumanDuration(2*day))
|
|
||||||
assertEqual(t, "7 days", HumanDuration(7*day))
|
|
||||||
assertEqual(t, "13 days", HumanDuration(13*day+5*time.Hour))
|
|
||||||
assertEqual(t, "2 weeks", HumanDuration(2*week))
|
|
||||||
assertEqual(t, "2 weeks", HumanDuration(2*week+4*day))
|
|
||||||
assertEqual(t, "3 weeks", HumanDuration(3*week))
|
|
||||||
assertEqual(t, "4 weeks", HumanDuration(4*week))
|
|
||||||
assertEqual(t, "4 weeks", HumanDuration(4*week+3*day))
|
|
||||||
assertEqual(t, "4 weeks", HumanDuration(1*month))
|
|
||||||
assertEqual(t, "6 weeks", HumanDuration(1*month+2*week))
|
|
||||||
assertEqual(t, "2 months", HumanDuration(2*month))
|
|
||||||
assertEqual(t, "2 months", HumanDuration(2*month+2*week))
|
|
||||||
assertEqual(t, "3 months", HumanDuration(3*month))
|
|
||||||
assertEqual(t, "3 months", HumanDuration(3*month+1*week))
|
|
||||||
assertEqual(t, "5 months", HumanDuration(5*month+2*week))
|
|
||||||
assertEqual(t, "13 months", HumanDuration(13*month))
|
|
||||||
assertEqual(t, "23 months", HumanDuration(23*month))
|
|
||||||
assertEqual(t, "24 months", HumanDuration(24*month))
|
|
||||||
assertEqual(t, "2 years", HumanDuration(24*month+2*week))
|
|
||||||
assertEqual(t, "3 years", HumanDuration(3*year+2*month))
|
|
||||||
}
|
|
||||||
|
|
||||||
func TestHumanTime(t *testing.T) {
|
func TestHumanTime(t *testing.T) {
|
||||||
now := time.Now()
|
now := time.Now()
|
||||||
|
|
||||||
t.Run("zero value", func(t *testing.T) {
|
t.Run("zero value", func(t *testing.T) {
|
||||||
assertEqual(t, HumanTime(time.Time{}, "never"), "never")
|
assertEqual(t, HumanTime(time.Time{}, "never"), "never")
|
||||||
})
|
})
|
||||||
|
|
||||||
t.Run("time in the future", func(t *testing.T) {
|
t.Run("time in the future", func(t *testing.T) {
|
||||||
v := now.Add(48 * time.Hour)
|
v := now.Add(48 * time.Hour)
|
||||||
assertEqual(t, HumanTime(v, ""), "2 days from now")
|
assertEqual(t, HumanTime(v, ""), "2 days from now")
|
||||||
})
|
})
|
||||||
|
|
||||||
t.Run("time in the past", func(t *testing.T) {
|
t.Run("time in the past", func(t *testing.T) {
|
||||||
v := now.Add(-48 * time.Hour)
|
v := now.Add(-48 * time.Hour)
|
||||||
assertEqual(t, HumanTime(v, ""), "2 days ago")
|
assertEqual(t, HumanTime(v, ""), "2 days ago")
|
||||||
})
|
})
|
||||||
}
|
|
||||||
|
|
||||||
func TestExactDuration(t *testing.T) {
|
t.Run("soon", func(t *testing.T) {
|
||||||
assertEqual(t, "1 millisecond", ExactDuration(1*time.Millisecond))
|
v := now.Add(800*time.Millisecond)
|
||||||
assertEqual(t, "10 milliseconds", ExactDuration(10*time.Millisecond))
|
assertEqual(t, HumanTime(v, ""), "Less than a second from now")
|
||||||
assertEqual(t, "1 second", ExactDuration(1*time.Second))
|
})
|
||||||
assertEqual(t, "10 seconds", ExactDuration(10*time.Second))
|
|
||||||
assertEqual(t, "1 minute", ExactDuration(1*time.Minute))
|
|
||||||
assertEqual(t, "10 minutes", ExactDuration(10*time.Minute))
|
|
||||||
assertEqual(t, "1 hour", ExactDuration(1*time.Hour))
|
|
||||||
assertEqual(t, "10 hours", ExactDuration(10*time.Hour))
|
|
||||||
assertEqual(t, "1 hour 1 second", ExactDuration(1*time.Hour+1*time.Second))
|
|
||||||
assertEqual(t, "1 hour 10 seconds", ExactDuration(1*time.Hour+10*time.Second))
|
|
||||||
assertEqual(t, "1 hour 1 minute", ExactDuration(1*time.Hour+1*time.Minute))
|
|
||||||
assertEqual(t, "1 hour 10 minutes", ExactDuration(1*time.Hour+10*time.Minute))
|
|
||||||
assertEqual(t, "1 hour 1 minute 1 second", ExactDuration(1*time.Hour+1*time.Minute+1*time.Second))
|
|
||||||
assertEqual(t, "10 hours 10 minutes 10 seconds", ExactDuration(10*time.Hour+10*time.Minute+10*time.Second))
|
|
||||||
}
|
}
|
||||||
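The tests above call an assertEqual helper that is defined elsewhere in this test file and is not part of this hunk. A minimal sketch of such a helper, an assumption rather than the repository's actual definition (the package name is also assumed), would be:

package format

import "testing"

// assertEqual is a hypothetical stand-in for the helper used above:
// it fails the test when the two values differ.
func assertEqual(t *testing.T, a interface{}, b interface{}) {
	t.Helper()
	if a != b {
		t.Errorf("assert: %v != %v", a, b)
	}
}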

go.mod (1 changed line)
@@ -10,6 +10,7 @@ require (
 	github.com/olekukonko/tablewriter v0.0.5
 	github.com/pdevine/readline v1.5.2
 	github.com/spf13/cobra v1.7.0
+	golang.org/x/sync v0.3.0
 )

 require github.com/rivo/uniseg v0.2.0 // indirect

go.sum (2 changed lines)
@@ -125,6 +125,8 @@ golang.org/x/exp v0.0.0-20230817173708-d852ddb80c63/go.mod h1:0v4NqG35kSWCMzLaMe
 golang.org/x/net v0.0.0-20210226172049-e18ecbb05110/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg=
 golang.org/x/net v0.10.0 h1:X2//UzNDwYmtCLn7To6G58Wr6f5ahEAQgKNzv9Y951M=
 golang.org/x/net v0.10.0/go.mod h1:0qNGK6F8kojg2nk9dLZ2mShWaEBan6FAoqfSigmmuDg=
+golang.org/x/sync v0.3.0 h1:ftCYgMx6zT/asHUrPw8BLLscYtGznsLAnjq5RH9P66E=
+golang.org/x/sync v0.3.0/go.mod h1:FU7BRWz2tNW+3quACPkgCx/L+uEAv1htQ0V83Z9Rj+Y=
 golang.org/x/sys v0.0.0-20201119102817-f84b799fce68/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
 golang.org/x/sys v0.0.0-20210615035016-665e8c7367d1/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
 golang.org/x/sys v0.0.0-20210630005230-0f9fa26af87c/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=

@@ -1,7 +1,5 @@
 package llm

-const ModelFamilyFalcon = "falcon"
-
 const (
 	falconModelType7B  = 32
 	falconModelType40B = 60

@@ -17,6 +15,6 @@ func falconModelType(numLayer uint32) string {
 	case 80:
 		return "180B"
 	default:
-		return "Unknown"
+		return "unknown"
 	}
 }

@@ -69,7 +69,7 @@ func fileType(fileType uint32) string {
 	case fileTypeQ6_K:
 		return "Q6_K"
 	default:
-		return "Unknown"
+		return "unknown"
 	}
 }

@@ -77,6 +77,7 @@ type model interface {
 	ModelFamily() string
 	ModelType() string
 	FileType() string
+	NumLayers() int64
 }

 type container interface {

llm/gguf.go (18 changed lines)
@@ -109,9 +109,13 @@ func (llm *ggufModel) ModelType() string {
 		if blocks, ok := llm.kv["falcon.block_count"].(uint32); ok {
 			return falconModelType(blocks)
 		}
+	case "starcoder":
+		if blocks, ok := llm.kv["starcoder.block_count"].(uint32); ok {
+			return starCoderModelType(blocks)
+		}
 	}

-	return "Unknown"
+	return "unknown"
 }

 func (llm *ggufModel) FileType() string {

@@ -120,7 +124,7 @@ func (llm *ggufModel) FileType() string {
 		return fileType(t)
 	}

-	return "Unknown"
+	return "unknown"
 }

 func (llm *ggufModel) Decode(r io.Reader) error {

@@ -195,6 +199,16 @@ func (llm *ggufModel) Decode(r io.Reader) error {
 	return nil
 }

+func (llm *ggufModel) NumLayers() int64 {
+	value, exists := llm.kv[fmt.Sprintf("%s.block_count", llm.ModelFamily())]
+	if !exists {
+		return 0
+	}
+
+	v := value.(uint32)
+	return int64(v)
+}
+
 func (ggufModel) readU8(r io.Reader) uint8 {
 	var u8 uint8
 	binary.Read(r, binary.LittleEndian, &u8)
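The new NumLayers builds the metadata key from ModelFamily(), so one method covers llama, falcon, and starcoder models alike ("llama.block_count", "falcon.block_count", and so on). A self-contained sketch of the same key/value lookup pattern, using a made-up kv map rather than a real GGUF file:

package main

import "fmt"

func main() {
	// GGUF metadata is a flat key/value map; the block (layer) count is
	// stored under a per-family key such as "llama.block_count"
	kv := map[string]any{"llama.block_count": uint32(40)}

	family := "llama"
	if value, ok := kv[fmt.Sprintf("%s.block_count", family)]; ok {
		fmt.Println(int64(value.(uint32))) // 40
	}
}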

@@ -9,8 +9,10 @@ package llm
 //go:generate git -C ggml apply ../patches/0004-metal-add-missing-barriers-for-mul-mat-2699.patch
 //go:generate cmake -S ggml -B ggml/build/cpu -DLLAMA_ACCELERATE=on -DLLAMA_K_QUANTS=on -DCMAKE_SYSTEM_PROCESSOR=x86_64 -DCMAKE_OSX_ARCHITECTURES=x86_64 -DCMAKE_OSX_DEPLOYMENT_TARGET=11.0
 //go:generate cmake --build ggml/build/cpu --target server --config Release
+//go:generate mv ggml/build/cpu/bin/server ggml/build/cpu/bin/ollama-runner

 //go:generate git submodule update --force gguf
 //go:generate git -C gguf apply ../patches/0001-remove-warm-up-logging.patch
 //go:generate cmake -S gguf -B gguf/build/cpu -DLLAMA_ACCELERATE=on -DLLAMA_K_QUANTS=on -DCMAKE_SYSTEM_PROCESSOR=x86_64 -DCMAKE_OSX_ARCHITECTURES=x86_64 -DCMAKE_OSX_DEPLOYMENT_TARGET=11.0
 //go:generate cmake --build gguf/build/cpu --target server --config Release
+//go:generate mv gguf/build/cpu/bin/server gguf/build/cpu/bin/ollama-runner

@@ -9,8 +9,10 @@ package llm
 //go:generate git -C ggml apply ../patches/0004-metal-add-missing-barriers-for-mul-mat-2699.patch
 //go:generate cmake -S ggml -B ggml/build/metal -DLLAMA_METAL=on -DLLAMA_ACCELERATE=on -DLLAMA_K_QUANTS=on -DCMAKE_SYSTEM_PROCESSOR=arm64 -DCMAKE_OSX_ARCHITECTURES=arm64 -DCMAKE_OSX_DEPLOYMENT_TARGET=11.0
 //go:generate cmake --build ggml/build/metal --target server --config Release
+//go:generate mv ggml/build/metal/bin/server ggml/build/metal/bin/ollama-runner

 //go:generate git submodule update --force gguf
 //go:generate git -C gguf apply ../patches/0001-remove-warm-up-logging.patch
 //go:generate cmake -S gguf -B gguf/build/metal -DLLAMA_METAL=on -DLLAMA_ACCELERATE=on -DLLAMA_K_QUANTS=on -DCMAKE_SYSTEM_PROCESSOR=arm64 -DCMAKE_OSX_ARCHITECTURES=arm64 -DCMAKE_OSX_DEPLOYMENT_TARGET=11.0
 //go:generate cmake --build gguf/build/metal --target server --config Release
+//go:generate mv gguf/build/metal/bin/server gguf/build/metal/bin/ollama-runner

@@ -9,14 +9,18 @@ package llm
 //go:generate git -C ggml apply ../patches/0001-copy-cuda-runtime-libraries.patch
 //go:generate cmake -S ggml -B ggml/build/cpu -DLLAMA_K_QUANTS=on
 //go:generate cmake --build ggml/build/cpu --target server --config Release
+//go:generate mv ggml/build/cpu/bin/server ggml/build/cpu/bin/ollama-runner

 //go:generate git submodule update --force gguf
 //go:generate git -C gguf apply ../patches/0001-copy-cuda-runtime-libraries.patch
 //go:generate git -C gguf apply ../patches/0001-remove-warm-up-logging.patch
 //go:generate cmake -S gguf -B gguf/build/cpu -DLLAMA_K_QUANTS=on
 //go:generate cmake --build gguf/build/cpu --target server --config Release
+//go:generate mv gguf/build/cpu/bin/server gguf/build/cpu/bin/ollama-runner

 //go:generate cmake -S ggml -B ggml/build/cuda -DLLAMA_CUBLAS=on -DLLAMA_ACCELERATE=on -DLLAMA_K_QUANTS=on
 //go:generate cmake --build ggml/build/cuda --target server --config Release
+//go:generate mv ggml/build/cuda/bin/server ggml/build/cuda/bin/ollama-runner
 //go:generate cmake -S gguf -B gguf/build/cuda -DLLAMA_CUBLAS=on -DLLAMA_ACCELERATE=on -DLLAMA_K_QUANTS=on
 //go:generate cmake --build gguf/build/cuda --target server --config Release
+//go:generate mv gguf/build/cuda/bin/server gguf/build/cuda/bin/ollama-runner

@@ -7,8 +7,10 @@ package llm
 //go:generate git -C ggml apply ../patches/0002-34B-model-support.patch
 //go:generate cmake -S ggml -B ggml/build/cpu -DLLAMA_K_QUANTS=on
 //go:generate cmake --build ggml/build/cpu --target server --config Release
+//go:generate cmd /c move ggml\build\cpu\bin\Release\server.exe ggml\build\cpu\bin\Release\ollama-runner.exe

 //go:generate git submodule update --force gguf
 //go:generate git -C gguf apply ../patches/0001-remove-warm-up-logging.patch
 //go:generate cmake -S gguf -B gguf/build/cpu -DLLAMA_K_QUANTS=on
 //go:generate cmake --build gguf/build/cpu --target server --config Release
+//go:generate cmd /c move gguf\build\cpu\bin\Release\server.exe gguf\build\cpu\bin\Release\ollama-runner.exe

llm/llama.go (245 changed lines)
@@ -20,6 +20,7 @@ import (
 	"runtime"
 	"strconv"
 	"strings"
+	"sync"
 	"time"

 	"github.com/jmorganca/ollama/api"

@@ -41,50 +42,52 @@ func chooseRunners(workDir, runnerType string) []ModelRunner {
 	switch runtime.GOOS {
 	case "darwin":
 		runners = []string{
-			path.Join(buildPath, "metal", "bin", "server"),
-			path.Join(buildPath, "cpu", "bin", "server"),
+			path.Join(buildPath, "metal", "bin", "ollama-runner"),
+			path.Join(buildPath, "cpu", "bin", "ollama-runner"),
 		}
 	case "linux":
 		runners = []string{
-			path.Join(buildPath, "cuda", "bin", "server"),
-			path.Join(buildPath, "cpu", "bin", "server"),
+			path.Join(buildPath, "cuda", "bin", "ollama-runner"),
+			path.Join(buildPath, "cpu", "bin", "ollama-runner"),
 		}
 	case "windows":
 		// TODO: select windows GPU runner here when available
 		runners = []string{
-			path.Join(buildPath, "cpu", "bin", "Release", "server.exe"),
+			path.Join(buildPath, "cpu", "bin", "Release", "ollama-runner.exe"),
 		}
 	default:
 		log.Printf("unknown OS, running on CPU: %s", runtime.GOOS)
 		runners = []string{
-			path.Join(buildPath, "cpu", "bin", "server"),
+			path.Join(buildPath, "cpu", "bin", "ollama-runner"),
 		}
 	}

 	runnerAvailable := false // if no runner files are found in the embed, this flag will cause a fast fail
 	for _, r := range runners {
 		// find all the files in the runner's bin directory
-		files, err := fs.Glob(llamaCppEmbed, filepath.Join(filepath.Dir(r), "*"))
+		files, err := fs.Glob(llamaCppEmbed, path.Join(path.Dir(r), "*"))
 		if err != nil {
 			// this is expected, ollama may be compiled without all runners packed in
 			log.Printf("%s runner not found: %v", r, err)
 			continue
 		}
-		runnerAvailable = true

 		for _, f := range files {
+			runnerAvailable = true
+
 			srcFile, err := llamaCppEmbed.Open(f)
 			if err != nil {
 				log.Fatalf("read llama runner %s: %v", f, err)
 			}
 			defer srcFile.Close()

-			// create the directory in case it does not exist
+			// create the directory in case it does not exist, filepath.Dir() converts the file path to the OS's format
 			destPath := filepath.Join(workDir, filepath.Dir(f))
 			if err := os.MkdirAll(destPath, 0o755); err != nil {
 				log.Fatalf("create runner temp dir %s: %v", filepath.Dir(f), err)
 			}

+			// create the path to the destination file, filepath.Base() converts the file path to the OS's format
 			destFile := filepath.Join(destPath, filepath.Base(f))

 			_, err = os.Stat(destFile)
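The filepath.Join to path.Join switch in the fs.Glob call is a Windows fix: io/fs and embed.FS paths are always slash-separated, while the filepath package uses the native separator. A small sketch of the difference:

package main

import (
	"fmt"
	"path"
	"path/filepath"
)

func main() {
	// io/fs (and embed.FS) always use forward slashes, regardless of OS
	fmt.Println(path.Join("llama.cpp", "gguf", "build")) // llama.cpp/gguf/build
	// filepath.Join uses the OS separator: backslashes on Windows,
	// which fs.Glob would never match against an embedded filesystem
	fmt.Println(filepath.Join("llama.cpp", "gguf", "build")) // llama.cpp\gguf\build on Windows
}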

@@ -111,7 +114,8 @@ func chooseRunners(workDir, runnerType string) []ModelRunner {
 	// return the runners to try in priority order
 	localRunnersByPriority := []ModelRunner{}
 	for _, r := range runners {
-		localRunnersByPriority = append(localRunnersByPriority, ModelRunner{Path: path.Join(workDir, r)})
+		// clean the ModelRunner paths so that they match the OS we are running on
+		localRunnersByPriority = append(localRunnersByPriority, ModelRunner{Path: filepath.Clean(path.Join(workDir, r))})
 	}

 	return localRunnersByPriority

@@ -140,7 +144,7 @@ func llamaModelType(numLayer uint32) string {
 	case 80:
 		return "65B"
 	default:
-		return "Unknown"
+		return "unknown"
 	}
 }

@@ -152,6 +156,10 @@ func (llm *llamaModel) FileType() string {
 	return fileType(llm.hyperparameters.FileType)
 }

+func (llm *llamaModel) NumLayers() int64 {
+	return int64(llm.hyperparameters.NumLayer)
+}
+
 type llamaHyperparameters struct {
 	// NumVocab is the size of the model's vocabulary.
 	NumVocab uint32

@@ -170,9 +178,12 @@ type llamaHyperparameters struct {
 }

 type Running struct {
 	Port   int
 	Cmd    *exec.Cmd
 	Cancel context.CancelFunc
+	exitOnce sync.Once
+	exitCh   chan error // channel to receive the exit status of the subprocess
+	exitErr  error      // error returned by the subprocess
 }

 type llama struct {

@@ -183,8 +194,8 @@ type llama struct {
 var errNoGPU = errors.New("nvidia-smi command failed")

 // CheckVRAM returns the available VRAM in MiB on Linux machines with NVIDIA GPUs
-func CheckVRAM() (int, error) {
-	cmd := exec.Command("nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits")
+func CheckVRAM() (int64, error) {
+	cmd := exec.Command("nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits")
 	var stdout bytes.Buffer
 	cmd.Stdout = &stdout
 	err := cmd.Run()

@@ -192,28 +203,27 @@ func CheckVRAM() (int, error) {
 		return 0, errNoGPU
 	}

-	var total int
+	var free int64
 	scanner := bufio.NewScanner(&stdout)
 	for scanner.Scan() {
 		line := scanner.Text()
-		vram, err := strconv.Atoi(line)
+		vram, err := strconv.ParseInt(strings.TrimSpace(line), 10, 64)
 		if err != nil {
 			return 0, fmt.Errorf("failed to parse available VRAM: %v", err)
 		}

-		total += vram
+		free += vram
 	}

-	return total, nil
+	return free, nil
 }

-func NumGPU(opts api.Options) int {
+func NumGPU(numLayer, fileSizeBytes int64, opts api.Options) int {
 	if opts.NumGPU != -1 {
 		return opts.NumGPU
 	}
-	n := 1 // default to enable metal on macOS
 	if runtime.GOOS == "linux" {
-		vram, err := CheckVRAM()
+		vramMib, err := CheckVRAM()
 		if err != nil {
 			if err.Error() != "nvidia-smi command failed" {
 				log.Print(err.Error())

@@ -221,33 +231,45 @@ func NumGPU(opts api.Options) int {
 			// nvidia driver not installed or no nvidia GPU found
 			return 0
 		}
-		// TODO: this is a very rough heuristic, better would be to calculate this based on number of layers and context size
-		switch {
-		case vram < 500:
-			log.Printf("WARNING: Low VRAM detected, disabling GPU")
-			n = 0
-		case vram < 1000:
-			n = 4
-		case vram < 2000:
-			n = 8
-		case vram < 4000:
-			n = 12
-		case vram < 8000:
-			n = 16
-		case vram < 12000:
-			n = 24
-		case vram < 16000:
-			n = 32
-		default:
-			n = 48
-		}
-		log.Printf("%d MB VRAM available, loading %d GPU layers", vram, n)
+
+		freeVramBytes := int64(vramMib) * 1024 * 1024 // 1 MiB = 1024^2 bytes
+
+		// Calculate bytes per layer
+		// TODO: this is a rough heuristic, better would be to calculate this based on number of layers and context size
+		bytesPerLayer := fileSizeBytes / numLayer
+
+		// max number of layers we can fit in VRAM, subtract 5% to prevent consuming all available VRAM and running out of memory
+		layers := int(freeVramBytes/bytesPerLayer) * 95 / 100
+		log.Printf("%d MiB VRAM available, loading up to %d GPU layers", vramMib, layers)
+
+		return layers
 	}
-	return n
+
+	// default to enable metal on macOS
+	return 1
 }

-func newLlama(model string, adapters []string, runners []ModelRunner, opts api.Options) (*llama, error) {
-	if _, err := os.Stat(model); err != nil {
+// StatusWriter is a writer that captures error messages from the llama runner process
+type StatusWriter struct {
+	ErrCh chan error
+}
+
+func NewStatusWriter() *StatusWriter {
+	return &StatusWriter{
+		ErrCh: make(chan error, 1),
+	}
+}
+
+func (w *StatusWriter) Write(b []byte) (int, error) {
+	if _, after, ok := bytes.Cut(b, []byte("error:")); ok {
+		err := fmt.Errorf("llama runner: %s", after)
+		w.ErrCh <- err
+	}
+	return os.Stderr.Write(b)
+}
+
+func newLlama(model string, adapters []string, runners []ModelRunner, numLayers int64, opts api.Options) (*llama, error) {
+	fileInfo, err := os.Stat(model)
+	if err != nil {
 		return nil, err
 	}
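The replacement heuristic derives the offload count from the model itself: bytes per layer is roughly the model file size divided by the layer count, and the runner then fits as many layers as 95% of free VRAM allows. A runnable sketch with made-up numbers (an 8 GiB card and a 7 GB, 40-layer model; none of these figures come from the diff):

package main

import "fmt"

func main() {
	var (
		vramMib       int64 = 8192          // as reported by nvidia-smi --query-gpu=memory.free
		fileSizeBytes int64 = 7_000_000_000 // size of the model file on disk
		numLayer      int64 = 40            // from the model metadata (NumLayers)
	)

	freeVramBytes := vramMib * 1024 * 1024
	bytesPerLayer := fileSizeBytes / numLayer

	// subtract 5% headroom so the runner does not consume all available VRAM
	layers := int(freeVramBytes/bytesPerLayer) * 95 / 100
	fmt.Printf("%d MiB free: offload up to %d layers\n", vramMib, layers) // 8192 MiB free: offload up to 46 layers
}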

@@ -261,7 +283,7 @@ func newLlama(model string, adapters []string, runners []ModelRunner, opts api.O
 		"--rope-freq-base", fmt.Sprintf("%f", opts.RopeFrequencyBase),
 		"--rope-freq-scale", fmt.Sprintf("%f", opts.RopeFrequencyScale),
 		"--batch-size", fmt.Sprintf("%d", opts.NumBatch),
-		"--n-gpu-layers", fmt.Sprintf("%d", NumGPU(opts)),
+		"--n-gpu-layers", fmt.Sprintf("%d", NumGPU(numLayers, fileInfo.Size(), opts)),
 		"--embedding",
 	}

@@ -291,6 +313,8 @@ func newLlama(model string, adapters []string, runners []ModelRunner, opts api.O
 		params = append(params, "--numa")
 	}

+	var runnerErr error
+
 	// start the llama.cpp server with a retry in case the port is already in use
 	for _, runner := range runners {
 		if _, err := os.Stat(runner.Path); err != nil {

@@ -307,9 +331,10 @@ func newLlama(model string, adapters []string, runners []ModelRunner, opts api.O
 		)
 		cmd.Env = append(os.Environ(), fmt.Sprintf("LD_LIBRARY_PATH=%s", filepath.Dir(runner.Path)))
 		cmd.Stdout = os.Stderr
-		cmd.Stderr = os.Stderr
+		statusWriter := NewStatusWriter()
+		cmd.Stderr = statusWriter

-		llm := &llama{Options: opts, Running: Running{Port: port, Cmd: cmd, Cancel: cancel}}
+		llm := &llama{Options: opts, Running: Running{Port: port, Cmd: cmd, Cancel: cancel, exitCh: make(chan error)}}

 		log.Print("starting llama runner")
 		if err := llm.Cmd.Start(); err != nil {
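The StatusWriter wiring above relies on bytes.Cut, which splits on the first occurrence of a separator: everything after "error:" in a stderr chunk is forwarded on ErrCh while the full chunk still reaches os.Stderr. In isolation, with a made-up log line:

package main

import (
	"bytes"
	"fmt"
)

func main() {
	b := []byte("llama.cpp: error: unable to load model")
	// Cut reports whether the separator was found and yields the text after it
	if _, after, ok := bytes.Cut(b, []byte("error:")); ok {
		fmt.Printf("captured: %q\n", after) // captured: " unable to load model"
	}
}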

@@ -317,19 +342,30 @@ func newLlama(model string, adapters []string, runners []ModelRunner, opts api.O
 			continue
 		}

-		// monitor the command, it is blocking, so if it exits we need to capture that
+		// monitor the llama runner process and signal when it exits
 		go func() {
-			err := llm.Cmd.Wait() // this will block until the command exits
-			if err != nil {
-				log.Printf("llama runner exited with error: %v", err)
-			} else {
-				log.Printf("llama runner exited")
-			}
+			err := llm.Cmd.Wait()
+			llm.exitErr = err
+			// llm.Cmd.Wait() can only be called once, use this exit channel to signal that the process has exited
+			llm.exitOnce.Do(func() {
+				close(llm.exitCh)
+			})
 		}()

 		if err := waitForServer(llm); err != nil {
 			log.Printf("error starting llama runner: %v", err)
 			llm.Close()
+
+			// default the runnerErr to the error returned by the most recent llama runner process
+			runnerErr = err
+
+			// capture the error directly from the runner process, if any
+			select {
+			case runnerErr = <-statusWriter.ErrCh:
+			default:
+				// the runner process probably timed out
+			}
+
 			// try again
 			continue
 		}
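The select with a default case above is a non-blocking receive: if the StatusWriter captured a concrete error from the runner's stderr it replaces the generic startup error, and otherwise the code falls through immediately instead of waiting. The pattern in isolation:

package main

import (
	"errors"
	"fmt"
)

func main() {
	errCh := make(chan error, 1)
	errCh <- errors.New("llama runner: unable to load model")

	runnerErr := errors.New("timed out waiting for llama runner to start")
	select {
	case runnerErr = <-errCh: // a more specific error was captured
	default: // nothing captured; keep the generic error, without blocking
	}
	fmt.Println(runnerErr) // llama runner: unable to load model
}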

@@ -338,37 +374,54 @@ func newLlama(model string, adapters []string, runners []ModelRunner, opts api.O
 			return llm, nil
 		}

+	if runnerErr != nil {
+		// this is the error returned from the llama runner process that failed most recently
+		return nil, runnerErr
+	}
+
 	return nil, fmt.Errorf("failed to start a llama runner")
 }

 func waitForServer(llm *llama) error {
-	// wait for the server to start responding
 	start := time.Now()
-	expiresAt := time.Now().Add(2 * time.Minute) // be generous with timeout, large models can take a while to load
+	expiresAt := time.Now().Add(3 * time.Minute) // be generous with timeout, large models can take a while to load
 	ticker := time.NewTicker(200 * time.Millisecond)
+	defer ticker.Stop()

 	log.Print("waiting for llama runner to start responding")
-	for range ticker.C {
-		if time.Now().After(expiresAt) {
-			return fmt.Errorf("llama runner did not start within alloted time, retrying")
-		}
-
-		// check if the server process has terminated
-		if llm.Cmd.ProcessState != nil && llm.Cmd.ProcessState.Exited() {
+	for {
+		select {
+		case <-llm.exitCh:
+			// failed to start subprocess
 			return fmt.Errorf("llama runner process has terminated")
-		}
+		case <-ticker.C:
+			if time.Now().After(expiresAt) {
+				// timeout
+				return fmt.Errorf("timed out waiting for llama runner to start")
+			}

 			if err := llm.Ping(context.Background()); err == nil {
-				break
+				// success
+				log.Printf("llama runner started in %f seconds", time.Since(start).Seconds())
+				return nil
+			}
 		}
 	}
-
-	log.Printf("llama runner started in %f seconds", time.Since(start).Seconds())
-	return nil
 }

 func (llm *llama) Close() {
+	// signal the sub-process to terminate
 	llm.Cancel()
+
+	// wait for the command to exit to prevent race conditions with the next run
+	<-llm.exitCh
+	err := llm.exitErr
+
+	if err != nil {
+		log.Printf("llama runner stopped with error: %v", err)
+	} else {
+		log.Print("llama runner stopped successfully")
+	}
 }

 func (llm *llama) SetOptions(opts api.Options) {
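The exitOnce/exitCh pair turns Cmd.Wait, which may only be called once per process, into a broadcast: the monitor goroutine records the exit error and closes the channel exactly once, and any number of readers (the select in waitForServer, the blocking receive in Close) observe the closed channel. A reduced sketch of the pattern:

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var exitOnce sync.Once
	exitCh := make(chan error)

	// monitor goroutine: signal exit exactly once by closing the channel
	go func() {
		time.Sleep(10 * time.Millisecond) // stand-in for cmd.Wait()
		exitOnce.Do(func() { close(exitCh) })
	}()

	<-exitCh // every receiver unblocks when the channel is closed
	fmt.Println("subprocess exited")
}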

@@ -418,30 +471,29 @@ type Prediction struct {
 }

 type PredictRequest struct {
-	Stream           bool            `json:"stream"`
-	NPredict         int             `json:"n_predict,omitempty"`
-	TopK             int             `json:"top_k,omitempty"`
-	TopP             float32         `json:"top_p,omitempty"`
-	TfsZ             float32         `json:"tfs_z,omitempty"`
-	TypicalP         float32         `json:"typical_p,omitempty"`
-	RepeatLastN      int             `json:"repeat_last_n,omitempty"`
-	Temperature      float32         `json:"temperature,omitempty"`
-	RepeatPenalty    float32         `json:"repeat_penalty,omitempty"`
-	PresencePenalty  float32         `json:"presence_penalty,omitempty"`
-	FrequencyPenalty float32         `json:"frequency_penalty,omitempty"`
-	Mirostat         int             `json:"mirostat,omitempty"`
-	MirostatTau      float32         `json:"mirostat_tau,omitempty"`
-	MirostatEta      float32         `json:"mirostat_eta,omitempty"`
-	PenalizeNl       bool            `json:"penalize_nl,omitempty"`
-	NKeep            int             `json:"n_keep,omitempty"`
-	Seed             int             `json:"seed,omitempty"`
-	Prompt           string          `json:"prompt,omitempty"`
-	NProbs           int             `json:"n_probs,omitempty"`
-	LogitBias        map[int]float32 `json:"logit_bias,omitempty"`
-	IgnoreEos        bool            `json:"ignore_eos,omitempty"`
-	Stop             []string        `json:"stop,omitempty"`
+	Prompt           string   `json:"prompt"`
+	Stream           bool     `json:"stream"`
+	NPredict         int      `json:"n_predict"`
+	NKeep            int      `json:"n_keep"`
+	Temperature      float32  `json:"temperature"`
+	TopK             int      `json:"top_k"`
+	TopP             float32  `json:"top_p"`
+	TfsZ             float32  `json:"tfs_z"`
+	TypicalP         float32  `json:"typical_p"`
+	RepeatLastN      int      `json:"repeat_last_n"`
+	RepeatPenalty    float32  `json:"repeat_penalty"`
+	PresencePenalty  float32  `json:"presence_penalty"`
+	FrequencyPenalty float32  `json:"frequency_penalty"`
+	Mirostat         int      `json:"mirostat"`
+	MirostatTau      float32  `json:"mirostat_tau"`
+	MirostatEta      float32  `json:"mirostat_eta"`
+	PenalizeNl       bool     `json:"penalize_nl"`
+	Seed             int      `json:"seed"`
+	Stop             []string `json:"stop,omitempty"`
 }

+const maxBufferSize = 512 * 1000 // 512KB
+
 func (llm *llama) Predict(ctx context.Context, prevContext []int, prompt string, fn func(api.GenerateResponse)) error {
 	prevConvo, err := llm.Decode(ctx, prevContext)
 	if err != nil {
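Dropping omitempty means zero values (temperature 0, top_k 0, and so on) are now serialized explicitly rather than silently omitted, so the llama.cpp server always receives exactly what the caller set. A reduced illustration using a three-field subset of the struct above:

package main

import (
	"encoding/json"
	"fmt"
)

// subset of the PredictRequest fields above, for illustration only
type predictRequest struct {
	Prompt      string  `json:"prompt"`
	Stream      bool    `json:"stream"`
	Temperature float32 `json:"temperature"`
}

func main() {
	data, _ := json.Marshal(predictRequest{Prompt: "hello", Stream: true})
	// the zero temperature is present in the payload instead of being dropped
	fmt.Println(string(data)) // {"prompt":"hello","stream":true,"temperature":0}
}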

@@ -471,8 +523,10 @@ func (llm *llama) Predict(ctx context.Context, prevContext []int, prompt string,
 		MirostatTau: llm.MirostatTau,
 		MirostatEta: llm.MirostatEta,
 		PenalizeNl:  llm.PenalizeNewline,
+		Seed:        llm.Seed,
 		Stop:        llm.Stop,
 	}

 	data, err := json.Marshal(predReq)
 	if err != nil {
 		return fmt.Errorf("error marshaling data: %v", err)

@@ -500,6 +554,9 @@ func (llm *llama) Predict(ctx context.Context, prevContext []int, prompt string,
 	}

 	scanner := bufio.NewScanner(resp.Body)
+	// increase the buffer size to avoid running out of space
+	buf := make([]byte, 0, maxBufferSize)
+	scanner.Buffer(buf, maxBufferSize)
 	for scanner.Scan() {
 		select {
 		case <-ctx.Done():
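The Buffer call matters because bufio.Scanner otherwise caps tokens at bufio.MaxScanTokenSize (64 KB) and fails with bufio.ErrTooLong on longer lines, which a single streamed prediction line can exceed. In isolation:

package main

import (
	"bufio"
	"fmt"
	"strings"
)

const maxBufferSize = 512 * 1000 // 512KB, matching the constant above

func main() {
	line := strings.Repeat("x", 100_000) // longer than the 64KB default limit
	scanner := bufio.NewScanner(strings.NewReader(line))
	// without this, scanning any line over bufio.MaxScanTokenSize fails
	scanner.Buffer(make([]byte, 0, maxBufferSize), maxBufferSize)
	for scanner.Scan() {
		fmt.Println(len(scanner.Text())) // 100000
	}
}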

llm/llm.go (71 changed lines)
@@ -5,6 +5,7 @@ import (
 	"fmt"
 	"log"
 	"os"
+	"runtime"

 	"github.com/pbnjay/memory"

@@ -37,53 +38,55 @@ func New(workDir, model string, adapters []string, opts api.Options) (LLM, error
 		return nil, err
 	}

-	switch ggml.FileType() {
-	case "Q8_0":
-		if ggml.Name() != "gguf" && opts.NumGPU != 0 {
-			// GGML Q8_0 do not support Metal API and will
-			// cause the runner to segmentation fault so disable GPU
-			log.Printf("WARNING: GPU disabled for F32, Q5_0, Q5_1, and Q8_0")
-			opts.NumGPU = 0
-		}
-	case "F32", "Q5_0", "Q5_1":
-		if opts.NumGPU != 0 {
-			// F32, Q5_0, Q5_1, and Q8_0 do not support Metal API and will
-			// cause the runner to segmentation fault so disable GPU
-			log.Printf("WARNING: GPU disabled for F32, Q5_0, Q5_1, and Q8_0")
-			opts.NumGPU = 0
+	if runtime.GOOS == "darwin" {
+		switch ggml.FileType() {
+		case "Q8_0":
+			if ggml.Name() != "gguf" && opts.NumGPU != 0 {
+				// GGML Q8_0 do not support Metal API and will
+				// cause the runner to segmentation fault so disable GPU
+				log.Printf("WARNING: GPU disabled for F32, Q5_0, Q5_1, and Q8_0")
+				opts.NumGPU = 0
+			}
+		case "F32", "Q5_0", "Q5_1":
+			if opts.NumGPU != 0 {
+				// F32, Q5_0, Q5_1, and Q8_0 do not support Metal API and will
+				// cause the runner to segmentation fault so disable GPU
+				log.Printf("WARNING: GPU disabled for F32, Q5_0, Q5_1, and Q8_0")
+				opts.NumGPU = 0
+			}
 		}
 	}

 	totalResidentMemory := memory.TotalMemory()
 	switch ggml.ModelType() {
 	case "3B", "7B":
-		if ggml.FileType() == "F16" && totalResidentMemory < 16*1024*1024 {
-			return nil, fmt.Errorf("F16 model requires at least 16GB of memory")
-		} else if totalResidentMemory < 8*1024*1024 {
-			return nil, fmt.Errorf("model requires at least 8GB of memory")
+		if ggml.FileType() == "F16" && totalResidentMemory < 16*1000*1000 {
+			return nil, fmt.Errorf("F16 model requires at least 16 GB of memory")
+		} else if totalResidentMemory < 8*1000*1000 {
+			return nil, fmt.Errorf("model requires at least 8 GB of memory")
 		}
 	case "13B":
-		if ggml.FileType() == "F16" && totalResidentMemory < 32*1024*1024 {
-			return nil, fmt.Errorf("F16 model requires at least 32GB of memory")
-		} else if totalResidentMemory < 16*1024*1024 {
-			return nil, fmt.Errorf("model requires at least 16GB of memory")
+		if ggml.FileType() == "F16" && totalResidentMemory < 32*1000*1000 {
+			return nil, fmt.Errorf("F16 model requires at least 32 GB of memory")
+		} else if totalResidentMemory < 16*1000*1000 {
+			return nil, fmt.Errorf("model requires at least 16 GB of memory")
 		}
 	case "30B", "34B", "40B":
-		if ggml.FileType() == "F16" && totalResidentMemory < 64*1024*1024 {
-			return nil, fmt.Errorf("F16 model requires at least 64GB of memory")
-		} else if totalResidentMemory < 32*1024*1024 {
-			return nil, fmt.Errorf("model requires at least 32GB of memory")
+		if ggml.FileType() == "F16" && totalResidentMemory < 64*1000*1000 {
+			return nil, fmt.Errorf("F16 model requires at least 64 GB of memory")
+		} else if totalResidentMemory < 32*1000*1000 {
+			return nil, fmt.Errorf("model requires at least 32 GB of memory")
 		}
 	case "65B", "70B":
-		if ggml.FileType() == "F16" && totalResidentMemory < 128*1024*1024 {
-			return nil, fmt.Errorf("F16 model requires at least 128GB of memory")
-		} else if totalResidentMemory < 64*1024*1024 {
-			return nil, fmt.Errorf("model requires at least 64GB of memory")
+		if ggml.FileType() == "F16" && totalResidentMemory < 128*1000*1000 {
+			return nil, fmt.Errorf("F16 model requires at least 128 GB of memory")
+		} else if totalResidentMemory < 64*1000*1000 {
+			return nil, fmt.Errorf("model requires at least 64 GB of memory")
 		}
 	case "180B":
-		if ggml.FileType() == "F16" && totalResidentMemory < 512*1024*1024 {
+		if ggml.FileType() == "F16" && totalResidentMemory < 512*1000*1000 {
 			return nil, fmt.Errorf("F16 model requires at least 512GB of memory")
-		} else if totalResidentMemory < 128*1024*1024 {
+		} else if totalResidentMemory < 128*1000*1000 {
 			return nil, fmt.Errorf("model requires at least 128GB of memory")
 		}
 	}
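One unit note on the thresholds above, offered as an observation rather than part of the diff: memory.TotalMemory from github.com/pbnjay/memory reports bytes, so 16*1000*1000 is 16 MB, not 16 GB; both the old 1024-based constants and the new 1000-based ones appear to be a factor of roughly 1000 short of the gigabyte floors the error messages describe. A quick check of the arithmetic, assuming the function does return bytes:

package main

import "fmt"

func main() {
	totalResidentMemory := uint64(8) * 1024 * 1024 * 1024 // an 8 GB machine, in bytes
	fmt.Println(totalResidentMemory < 16*1000*1000)       // false: 8 GB slips past a "16" check meant as GB
	fmt.Println(totalResidentMemory < 16*1000*1000*1000)  // true: the comparison as presumably intended
}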

@@ -91,9 +94,9 @@ func New(workDir, model string, adapters []string, opts api.Options) (LLM, error
 	switch ggml.Name() {
 	case "gguf":
 		opts.NumGQA = 0 // TODO: remove this when llama.cpp runners differ enough to need separate newLlama functions
-		return newLlama(model, adapters, chooseRunners(workDir, "gguf"), opts)
+		return newLlama(model, adapters, chooseRunners(workDir, "gguf"), ggml.NumLayers(), opts)
 	case "ggml", "ggmf", "ggjt", "ggla":
-		return newLlama(model, adapters, chooseRunners(workDir, "ggml"), opts)
+		return newLlama(model, adapters, chooseRunners(workDir, "ggml"), ggml.NumLayers(), opts)
 	default:
 		return nil, fmt.Errorf("unknown ggml type: %s", ggml.ModelFamily())
 	}

llm/starcoder.go (new file, 23 lines)
@@ -0,0 +1,23 @@
+package llm
+
+const (
+	starCoderModelType1B  = 24
+	starCoderModelType3B  = 36
+	starCoderModelType7B  = 42
+	starCoderModelType15B = 40
+)
+
+func starCoderModelType(numLayer uint32) string {
+	switch numLayer {
+	case 24:
+		return "1B"
+	case 36:
+		return "3B"
+	case 42:
+		return "7B"
+	case 40:
+		return "15B"
+	default:
+		return "unknown"
+	}
+}
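Note the mapping is keyed on exact layer counts rather than ranges, and for StarCoder it is not monotonic: the 15B variant has fewer layers (40) than the 7B (42). A tiny usage check, duplicating the function above so the example runs standalone:

package main

import "fmt"

// local copy of starCoderModelType from the new file above
func starCoderModelType(numLayer uint32) string {
	switch numLayer {
	case 24:
		return "1B"
	case 36:
		return "3B"
	case 42:
		return "7B"
	case 40:
		return "15B"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(starCoderModelType(42)) // 7B
	fmt.Println(starCoderModelType(40)) // 15B: fewer layers than the 7B variant
}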
scripts/build.sh (new file, 21 lines)
@@ -0,0 +1,21 @@
+#!/bin/sh
+
+set -eu
+
+usage() {
+    echo "usage: $(basename $0) VERSION"
+    exit 1
+}
+
+[ "$#" -eq 1 ] || usage
+
+export VERSION="$1"
+
+# build universal MacOS binary
+sh $(dirname $0)/build_darwin.sh
+
+# # build arm64 and amd64 Linux binaries
+sh $(dirname $0)/build_linux.sh
+
+# # build arm64 and amd64 Docker images
+sh $(dirname $0)/build_docker.sh

@@ -1,29 +1,30 @@
-#!/bin/bash
+#!/bin/sh
+
+set -eu
+
+export VERSION=${VERSION:-0.0.0}
+export GOFLAGS="'-ldflags=-w -s \"-X=github.com/jmorganca/ollama/version.Version=$VERSION\" \"-X=github.com/jmorganca/ollama/server.mode=release\"'"

 mkdir -p dist

-GO_LDFLAGS="-X github.com/jmorganca/ollama/version.Version=$VERSION"
-GO_LDFLAGS="$GO_LDFLAGS -X github.com/jmorganca/ollama/server.mode=release"
-
-# build universal binary
-GOARCH=arm64 go generate ./...
-GOARCH=arm64 go build -ldflags "$GO_LDFLAGS" -o dist/ollama-darwin-arm64
-rm -rf llm/llama.cpp/*/build/*/bin
-GOARCH=amd64 go generate ./...
-GOARCH=amd64 go build -ldflags "$GO_LDFLAGS" -o dist/ollama-darwin-amd64
-lipo -create -output dist/ollama dist/ollama-darwin-arm64 dist/ollama-darwin-amd64
-rm dist/ollama-darwin-amd64 dist/ollama-darwin-arm64
+for TARGETARCH in arm64 amd64; do
+    GOOS=darwin GOARCH=$TARGETARCH go generate ./...
+    GOOS=darwin GOARCH=$TARGETARCH go build -o dist/ollama-darwin-$TARGETARCH
+done
+
+lipo -create -output dist/ollama dist/ollama-darwin-*
+rm -f dist/ollama-darwin-*
 codesign --deep --force --options=runtime --sign "$APPLE_IDENTITY" --timestamp dist/ollama
 chmod +x dist/ollama

 # build and sign the mac app
 npm install --prefix app
 npm run --prefix app make:sign
-cp app/out/make/zip/darwin/universal/Ollama-darwin-universal-${VERSION:-0.0.0}.zip dist/Ollama-darwin.zip
+cp app/out/make/zip/darwin/universal/Ollama-darwin-universal-$VERSION.zip dist/Ollama-darwin.zip

 # sign the binary and rename it
 codesign -f --timestamp -s "$APPLE_IDENTITY" --identifier ai.ollama.ollama --options=runtime dist/ollama
 ditto -c -k --keepParent dist/ollama dist/temp.zip
 xcrun notarytool submit dist/temp.zip --wait --timeout 10m --apple-id $APPLE_ID --password $APPLE_PASSWORD --team-id $APPLE_TEAM_ID
 mv dist/ollama dist/ollama-darwin
-rm dist/temp.zip
+rm -f dist/temp.zip

scripts/build_docker.sh (new executable file, 15 lines)
@@ -0,0 +1,15 @@
+#!/bin/sh
+
+set -eu
+
+export VERSION=${VERSION:-0.0.0}
+export GOFLAGS="'-ldflags=-w -s \"-X=github.com/jmorganca/ollama/version.Version=$VERSION\" \"-X=github.com/jmorganca/ollama/server.mode=release\"'"
+
+docker buildx build \
+    --load \
+    --platform=linux/arm64,linux/amd64 \
+    --build-arg=VERSION \
+    --build-arg=GOFLAGS \
+    -f Dockerfile \
+    -t ollama \
+    .

@@ -1,12 +1,15 @@
-#!/bin/bash
+#!/bin/sh

-set -e
+set -eu
+
+export VERSION=${VERSION:-0.0.0}
+export GOFLAGS="'-ldflags=-w -s \"-X=github.com/jmorganca/ollama/version.Version=$VERSION\" \"-X=github.com/jmorganca/ollama/server.mode=release\"'"

 mkdir -p dist

-for ARCH in arm64 amd64; do
-    docker buildx build --platform=linux/$ARCH -f Dockerfile.build . -t builder:$ARCH --load
-    docker create --platform linux/$ARCH --name builder builder:$ARCH
-    docker cp builder:/go/src/github.com/jmorganca/ollama/ollama ./dist/ollama-linux-$ARCH
-    docker rm builder
+for TARGETARCH in arm64 amd64; do
+    docker buildx build --load --platform=linux/$TARGETARCH --build-arg=VERSION --build-arg=GOFLAGS -f Dockerfile.build -t builder:$TARGETARCH .
+    docker create --platform linux/$TARGETARCH --name builder-$TARGETARCH builder:$TARGETARCH
+    docker cp builder-$TARGETARCH:/go/src/github.com/jmorganca/ollama/ollama ./dist/ollama-linux-$TARGETARCH
+    docker rm builder-$TARGETARCH
 done

@@ -4,157 +4,240 @@

 set -eu

-check_os() {
-    if [ "$(uname -s)" != "Linux" ]; then
-        echo "This script is intended to run on Linux only."
-        exit 1
-    fi
-}
-
-determine_architecture() {
-    ARCH=$(uname -m)
-    case $ARCH in
-        x86_64)
-            ARCH_SUFFIX="amd64"
-            ;;
-        aarch64|arm64)
-            ARCH_SUFFIX="arm64"
-            ;;
-        *)
-            echo "Unsupported architecture: $ARCH"
-            exit 1
-            ;;
-    esac
-}
-
-check_sudo() {
-    if [ "$(id -u)" -ne 0 ]; then
-        if command -v sudo >/dev/null 2>&1; then
-            SUDO_CMD="sudo"
-            echo "Downloading the ollama executable to the PATH, this will require sudo permissions."
-        else
-            echo "Error: sudo is not available. Please run as root or install sudo."
-            exit 1
-        fi
-    else
-        SUDO_CMD=""
-    fi
-}
-
-install_cuda_drivers() {
-    local os_name os_version
-    if [ -f "/etc/os-release" ]; then
-        . /etc/os-release
-        os_name=$ID
-        os_version=$VERSION_ID
-    else
-        echo "Unable to detect operating system. Skipping CUDA installation."
-        return 1
-    fi
-
-    # based on https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#package-manager-installation
-    case $os_name in
-        CentOS)
-            $SUDO_CMD yum install yum-utils
-            $SUDO_CMD yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
-            $SUDO_CMD yum clean all
-            $SUDO_CMD yum -y install nvidia-driver-latest-dkms
-            $SUDO_CMD yum -y install cuda-driver
-            $SUDO_CMD yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
-            $SUDO_CMD dkms status | awk -F: '/added/ { print $1 }' | xargs -n1 $SUDO_CMD dkms install
-            $SUDO_CMD modprobe nvidia
-            ;;
-        ubuntu)
-            case $os_version in
-                20.04)
-                    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb
-                    ;;
-                22.04)
-                    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
-                    ;;
-                *)
-                    echo "Skipping automatic CUDA installation, not supported for Ubuntu ($os_version)."
-                    return
-                    ;;
-            esac
-            $SUDO_CMD dpkg -i cuda-keyring_1.1-1_all.deb
-            $SUDO_CMD apt-get update
-            $SUDO_CMD apt-get -y install cuda-drivers
-            ;;
-        RedHatEnterprise*|Kylin|Fedora|SLES|openSUSE*|Microsoft|Debian)
-            echo "NVIDIA CUDA drivers may not be installed, you can install them from: https://developer.nvidia.com/cuda-downloads"
-            ;;
-        *)
-            echo "Unsupported or unknown distribution, skipping GPU CUDA driver install: $os_name"
-            ;;
-    esac
-}
-
-check_install_cuda_drivers() {
-    if lspci -d '10de:' | grep 'NVIDIA' >/dev/null; then
-        # NVIDIA Corporation [10de] device is available
-        if command -v nvidia-smi >/dev/null 2>&1; then
-            CUDA_VERSION=$(nvidia-smi | grep -o "CUDA Version: [0-9]*\.[0-9]*")
-            if [ -z "$CUDA_VERSION" ]; then
-                echo "Warning: NVIDIA-SMI is available, but the CUDA version cannot be detected. Installing CUDA drivers..."
-                install_cuda_drivers
-            else
-                echo "Detected CUDA version $CUDA_VERSION"
-            fi
-        else
-            echo "Warning: NVIDIA GPU detected but NVIDIA-SMI is not available. Installing CUDA drivers..."
-            install_cuda_drivers
-        fi
-    else
-        echo "No NVIDIA GPU detected. Skipping driver installation."
-    fi
-}
-
-download_ollama() {
-    $SUDO_CMD mkdir -p /usr/bin
-    $SUDO_CMD curl -fsSL -o /usr/bin/ollama "https://ollama.ai/download/latest/ollama-linux-$ARCH_SUFFIX"
-}
-
-configure_systemd() {
-    if command -v systemctl >/dev/null 2>&1; then
-        $SUDO_CMD useradd -r -s /bin/false -m -d /home/ollama ollama 2>/dev/null
-
-        echo "Creating systemd service file for ollama..."
-        cat <<EOF | $SUDO_CMD tee /etc/systemd/system/ollama.service >/dev/null
-[Unit]
-Description=Ollama Service
-After=network-online.target
-
-[Service]
-ExecStart=/usr/bin/ollama serve
-User=ollama
-Group=ollama
-Restart=always
-RestartSec=3
-Environment="HOME=/home/ollama"
-
-[Install]
-WantedBy=default.target
-EOF
-        echo "Reloading systemd and enabling ollama service..."
-        if [ "$(systemctl is-system-running || echo 'not running')" = 'running' ]; then
-            $SUDO_CMD systemctl daemon-reload
-            $SUDO_CMD systemctl enable ollama
-            $SUDO_CMD systemctl restart ollama
-        fi
-    else
-        echo "Run 'ollama serve' from the command line to start the service."
-    fi
-}
-
-main() {
-    check_os
-    determine_architecture
-    check_sudo
-    download_ollama
-    configure_systemd
-    check_install_cuda_drivers
-    echo "Installation complete. You can now run 'ollama' from the command line."
-}
-
-main
+status() { echo ">>> $*" >&2; }
+error() { echo "ERROR $*"; exit 1; }
+warning() { echo "WARNING: $*"; }
+
+TEMP_DIR=$(mktemp -d)
+cleanup() { rm -rf $TEMP_DIR; }
+trap cleanup EXIT
+
+available() { command -v $1 >/dev/null; }
+require() {
+    local MISSING=''
+    for TOOL in $*; do
+        if ! available $TOOL; then
+            MISSING="$MISSING $TOOL"
+        fi
+    done
+
+    echo $MISSING
+}
+
+[ "$(uname -s)" = "Linux" ] || error 'This script is intended to run on Linux only.'
+
+case "$(uname -m)" in
+    x86_64) ARCH="amd64" ;;
+    aarch64|arm64) ARCH="arm64" ;;
+    *) error "Unsupported architecture: $ARCH" ;;
+esac
+
+SUDO=
+if [ "$(id -u)" -ne 0 ]; then
+    # Running as root, no need for sudo
+    if ! available sudo; then
+        error "This script requires superuser permissions. Please re-run as root."
+    fi
+
+    SUDO="sudo"
+fi
+
+NEEDS=$(require curl awk grep sed tee xargs)
+if [ -n "$NEEDS" ]; then
+    status "ERROR: The following tools are required but missing:"
+    for NEED in $NEEDS; do
+        echo "  - $NEED"
+    done
+    exit 1
+fi
+
+status "Downloading ollama..."
+curl --fail --show-error --location --progress-bar -o $TEMP_DIR/ollama "https://ollama.ai/download/ollama-linux-$ARCH"
+
+for BINDIR in /usr/local/bin /usr/bin /bin; do
+    echo $PATH | grep -q $BINDIR && break || continue
+done
+
+status "Installing ollama to $BINDIR..."
+$SUDO install -o0 -g0 -m755 -d $BINDIR
+$SUDO install -o0 -g0 -m755 $TEMP_DIR/ollama $BINDIR/ollama
+
+install_success() { status 'Install complete. Run "ollama" from the command line.'; }
+trap install_success EXIT
+
+# Everything from this point onwards is optional.
+
+configure_systemd() {
+    if ! id ollama >/dev/null 2>&1; then
+        status "Creating ollama user..."
+        $SUDO useradd -r -s /bin/false -m -d /usr/share/ollama ollama
+    fi
+
+    status "Creating ollama systemd service..."
+    cat <<EOF | $SUDO tee /etc/systemd/system/ollama.service >/dev/null
+[Unit]
+Description=Ollama Service
+After=network-online.target
+
+[Service]
+ExecStart=$BINDIR/ollama serve
+User=ollama
+Group=ollama
+Restart=always
+RestartSec=3
+Environment="HOME=/usr/share/ollama"
+Environment="PATH=$PATH"
+
+[Install]
+WantedBy=default.target
+EOF
+    SYSTEMCTL_RUNNING="$(systemctl is-system-running || true)"
+    case $SYSTEMCTL_RUNNING in
+        running|degraded)
+            status "Enabling and starting ollama service..."
+            $SUDO systemctl daemon-reload
+            $SUDO systemctl enable ollama
+
+            start_service() { $SUDO systemctl restart ollama; }
+            trap start_service EXIT
+            ;;
+    esac
+}
+
+if available systemctl; then
+    configure_systemd
+fi
+
+if ! available lspci && ! available lshw; then
+    warning "Unable to detect NVIDIA GPU. Install lspci or lshw to automatically detect and install NVIDIA CUDA drivers."
+    exit 0
+fi
+
+check_gpu() {
+    case $1 in
+        lspci) available lspci && lspci -d '10de:' | grep -q 'NVIDIA' || return 1 ;;
+        lshw) available lshw && $SUDO lshw -c display -numeric | grep -q 'vendor: .* \[10DE\]' || return 1 ;;
+        nvidia-smi) available nvidia-smi || return 1 ;;
+    esac
+}
+
+if check_gpu nvidia-smi; then
+    status "NVIDIA GPU installed."
+    exit 0
+fi
+
+if ! check_gpu lspci && ! check_gpu lshw; then
+    warning "No NVIDIA GPU detected. Ollama will run in CPU-only mode."
+    exit 0
+fi
+
+# ref: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#rhel-7-centos-7
+# ref: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#rhel-8-rocky-8
+# ref: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#rhel-9-rocky-9
+# ref: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#fedora
+install_cuda_driver_yum() {
+    status 'Installing NVIDIA repository...'
+    case $PACKAGE_MANAGER in
+        yum)
+            $SUDO $PACKAGE_MANAGER -y install yum-utils
+            $SUDO $PACKAGE_MANAGER-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/$1$2/$(uname -m)/cuda-$1$2.repo
+            ;;
+        dnf)
+            $SUDO $PACKAGE_MANAGER config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/$1$2/$(uname -m)/cuda-$1$2.repo
+            ;;
+    esac
+
+    case $1 in
+        rhel)
+            status 'Installing EPEL repository...'
+            # EPEL is required for third-party dependencies such as dkms and libvdpau
+            $SUDO $PACKAGE_MANAGER -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-$2.noarch.rpm || true
+            ;;
+    esac
+
+    status 'Installing CUDA driver...'
+
+    if [ "$1" = 'centos' ] || [ "$1$2" = 'rhel7' ]; then
+        $SUDO $PACKAGE_MANAGER -y install nvidia-driver-latest-dkms
+    fi
+
+    $SUDO $PACKAGE_MANAGER -y install cuda-drivers
+}
+
+# ref: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#ubuntu
+# ref: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#debian
+install_cuda_driver_apt() {
+    status 'Installing NVIDIA repository...'
+    curl -fsSL -o $TEMP_DIR/cuda-keyring.deb https://developer.download.nvidia.com/compute/cuda/repos/$1$2/$(uname -m)/cuda-keyring_1.1-1_all.deb
+
+    case $1 in
+        debian)
+            status 'Enabling contrib sources...'
+            $SUDO sed 's/main/contrib/' < /etc/apt/sources.list | sudo tee /etc/apt/sources.list.d/contrib.list > /dev/null
+            ;;
+    esac
+
+    status 'Installing CUDA driver...'
+    $SUDO dpkg -i $TEMP_DIR/cuda-keyring.deb
+    $SUDO apt-get update
+
+    [ -n "$SUDO" ] && SUDO_E="$SUDO -E" || SUDO_E=
+    DEBIAN_FRONTEND=noninteractive $SUDO_E apt-get -y install cuda-drivers -q
+}
+
+if [ ! -f "/etc/os-release" ]; then
+    error "Unknown distribution. Skipping CUDA installation."
+fi
+
+. /etc/os-release
+
+OS_NAME=$ID
+OS_VERSION=$VERSION_ID
+
+PACKAGE_MANAGER=
+for PACKAGE_MANAGER in dnf yum apt-get; do
+    if available $PACKAGE_MANAGER; then
+        break
+    fi
+done
+
+if [ -z "$PACKAGE_MANAGER" ]; then
+    error "Unknown package manager. Skipping CUDA installation."
+fi
+
+if ! check_gpu nvidia-smi || [ -z "$(nvidia-smi | grep -o "CUDA Version: [0-9]*\.[0-9]*")" ]; then
+    case $OS_NAME in
+        centos|rhel) install_cuda_driver_yum 'rhel' $OS_VERSION ;;
+        rocky) install_cuda_driver_yum 'rhel' $(echo $OS_VERSION | cut -c1) ;;
+        fedora) install_cuda_driver_yum $OS_NAME $OS_VERSION ;;
+        amzn) install_cuda_driver_yum 'fedora' '35' ;;
+        debian) install_cuda_driver_apt $OS_NAME $OS_VERSION ;;
+        ubuntu) install_cuda_driver_apt $OS_NAME $(echo $OS_VERSION | sed 's/\.//') ;;
|
||||||
|
*) exit ;;
|
||||||
|
esac
|
||||||
|
fi
|
||||||
|
|
||||||
|
if ! lsmod | grep -q nvidia; then
|
||||||
|
KERNEL_RELEASE="$(uname -r)"
|
||||||
|
case $OS_NAME in
|
||||||
|
centos|rhel|rocky|amzn) $SUDO $PACKAGE_MANAGER -y install kernel-devel-$KERNEL_RELEASE kernel-headers-$KERNEL_RELEASE ;;
|
||||||
|
fedora) $SUDO $PACKAGE_MANAGER -y install kernel-devel-$KERNEL_RELEASE ;;
|
||||||
|
debian|ubuntu) $SUDO apt-get -y install linux-headers-$KERNEL_RELEASE ;;
|
||||||
|
*) exit ;;
|
||||||
|
esac
|
||||||
|
|
||||||
|
NVIDIA_CUDA_VERSION=$($SUDO dkms status | awk -F: '/added/ { print $1 }')
|
||||||
|
if [ -n "$NVIDIA_CUDA_VERSION" ]; then
|
||||||
|
$SUDO dkms install $NVIDIA_CUDA_VERSION
|
||||||
|
fi
|
||||||
|
|
||||||
|
if lsmod | grep -q nouveau; then
|
||||||
|
status 'Reboot to complete NVIDIA CUDA driver install.'
|
||||||
|
exit 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
$SUDO modprobe nvidia
|
||||||
|
fi
|
||||||
|
|
||||||
|
|
||||||
|
status "NVIDIA CUDA drivers installed."
|
||||||
|
|||||||
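The script's GPU probe keys off PCI vendor ID 10de (NVIDIA): lspci filters by vendor with -d '10de:' and lshw matches the numeric vendor in its display-class output. A minimal sketch of the same lspci probe, written in Go purely for illustration (it assumes lspci is on PATH; nothing here comes from the diff itself):

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// hasNvidiaGPU mirrors the script's lspci check: list PCI devices with
// vendor ID 10de (NVIDIA) and report whether any were found.
func hasNvidiaGPU() bool {
	out, err := exec.Command("lspci", "-d", "10de:").Output()
	if err != nil {
		return false // lspci missing or failed; treat as "not detected"
	}
	return strings.Contains(string(out), "NVIDIA")
}

func main() {
	fmt.Println("NVIDIA GPU detected:", hasNvidiaGPU())
}
```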
@@ -2,38 +2,309 @@ package server

 import (
 	"context"
+	"encoding/json"
 	"errors"
 	"fmt"
 	"io"
 	"log"
 	"net/http"
+	"net/url"
 	"os"
 	"path/filepath"
 	"strconv"
+	"strings"
 	"sync"
+	"sync/atomic"
 	"time"

+	"golang.org/x/sync/errgroup"
+
 	"github.com/jmorganca/ollama/api"
+	"github.com/jmorganca/ollama/format"
 )

-type FileDownload struct {
-	Digest    string
-	FilePath  string
-	Total     int64
-	Completed int64
-}
-
-var inProgress sync.Map // map of digests currently being downloaded to their current download progress
+var blobDownloadManager sync.Map
+
+type blobDownload struct {
+	Name   string
+	Digest string
+
+	Total     int64
+	Completed atomic.Int64
+
+	Parts []*blobDownloadPart
+
+	context.CancelFunc
+
+	done       bool
+	err        error
+	references atomic.Int32
+}
+
+type blobDownloadPart struct {
+	N         int
+	Offset    int64
+	Size      int64
+	Completed int64
+
+	*blobDownload `json:"-"`
+}
+
+const (
+	numDownloadParts          = 64
+	minDownloadPartSize int64 = 32 * 1000 * 1000
+	maxDownloadPartSize int64 = 256 * 1000 * 1000
+)
+
+func (p *blobDownloadPart) Name() string {
+	return strings.Join([]string{
+		p.blobDownload.Name, "partial", strconv.Itoa(p.N),
+	}, "-")
+}
+
+func (p *blobDownloadPart) StartsAt() int64 {
+	return p.Offset + p.Completed
+}
+
+func (p *blobDownloadPart) StopsAt() int64 {
+	return p.Offset + p.Size
+}
+
+func (b *blobDownload) Prepare(ctx context.Context, requestURL *url.URL, opts *RegistryOptions) error {
+	partFilePaths, err := filepath.Glob(b.Name + "-partial-*")
+	if err != nil {
+		return err
+	}
+
+	for _, partFilePath := range partFilePaths {
+		part, err := b.readPart(partFilePath)
+		if err != nil {
+			return err
+		}
+
+		b.Total += part.Size
+		b.Completed.Add(part.Completed)
+		b.Parts = append(b.Parts, part)
+	}
+
+	if len(b.Parts) == 0 {
+		resp, err := makeRequest(ctx, "HEAD", requestURL, nil, nil, opts)
+		if err != nil {
+			return err
+		}
+		defer resp.Body.Close()
+
+		if resp.StatusCode >= http.StatusBadRequest {
+			body, _ := io.ReadAll(resp.Body)
+			return fmt.Errorf("registry responded with code %d: %v", resp.StatusCode, string(body))
+		}
+
+		b.Total, _ = strconv.ParseInt(resp.Header.Get("Content-Length"), 10, 64)
+
+		var size = b.Total / numDownloadParts
+		switch {
+		case size < minDownloadPartSize:
+			size = minDownloadPartSize
+		case size > maxDownloadPartSize:
+			size = maxDownloadPartSize
+		}
+
+		var offset int64
+		for offset < b.Total {
+			if offset+size > b.Total {
+				size = b.Total - offset
+			}
+
+			if err := b.newPart(offset, size); err != nil {
+				return err
+			}
+
+			offset += size
+		}
+	}
+
+	log.Printf("downloading %s in %d %s part(s)", b.Digest[7:19], len(b.Parts), format.HumanBytes(b.Parts[0].Size))
+	return nil
+}
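Prepare splits a blob of unknown layout into fixed ranges: the nominal part size is Total/64, clamped to the [32 MB, 256 MB] window (decimal megabytes), and the final part absorbs the remainder. A small standalone sketch of that arithmetic (the input sizes below are illustrative, not values from the diff):

```go
package main

import "fmt"

const (
	numDownloadParts          = 64
	minDownloadPartSize int64 = 32 * 1000 * 1000
	maxDownloadPartSize int64 = 256 * 1000 * 1000
)

// partSizes reproduces Prepare's splitting logic for a blob of `total` bytes.
func partSizes(total int64) []int64 {
	size := total / numDownloadParts
	switch {
	case size < minDownloadPartSize:
		size = minDownloadPartSize
	case size > maxDownloadPartSize:
		size = maxDownloadPartSize
	}

	var parts []int64
	for offset := int64(0); offset < total; offset += size {
		n := size
		if offset+n > total {
			n = total - offset // final short part
		}
		parts = append(parts, n)
	}
	return parts
}

func main() {
	// A 1 GB blob clamps up to 32 MB parts: 31 full parts plus an 8 MB tail.
	fmt.Println(len(partSizes(1_000_000_000))) // 32
	// A 5 GB blob divides evenly into 64 parts of 78,125,000 bytes.
	fmt.Println(len(partSizes(5_000_000_000))) // 64
}
```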
+
+func (b *blobDownload) Run(ctx context.Context, requestURL *url.URL, opts *RegistryOptions) {
+	b.err = b.run(ctx, requestURL, opts)
+}
+
+func (b *blobDownload) run(ctx context.Context, requestURL *url.URL, opts *RegistryOptions) error {
+	defer blobDownloadManager.Delete(b.Digest)
+
+	ctx, b.CancelFunc = context.WithCancel(ctx)
+
+	file, err := os.OpenFile(b.Name+"-partial", os.O_CREATE|os.O_RDWR, 0644)
+	if err != nil {
+		return err
+	}
+	defer file.Close()
+
+	file.Truncate(b.Total)
+
+	g, inner := errgroup.WithContext(ctx)
+	g.SetLimit(numDownloadParts)
+	for i := range b.Parts {
+		part := b.Parts[i]
+		if part.Completed == part.Size {
+			continue
+		}
+
+		i := i
+		g.Go(func() error {
+			for try := 0; try < maxRetries; try++ {
+				w := io.NewOffsetWriter(file, part.StartsAt())
+				err := b.downloadChunk(inner, requestURL, w, part, opts)
+				switch {
+				case errors.Is(err, context.Canceled):
+					return err
+				case err != nil:
+					log.Printf("%s part %d attempt %d failed: %v, retrying", b.Digest[7:19], i, try, err)
+					continue
+				default:
+					return nil
+				}
+			}
+
+			return errors.New("max retries exceeded")
+		})
+	}
+
+	if err := g.Wait(); err != nil {
+		return err
+	}
+
+	// explicitly close the file so we can rename it
+	if err := file.Close(); err != nil {
+		return err
+	}
+
+	for i := range b.Parts {
+		if err := os.Remove(file.Name() + "-" + strconv.Itoa(i)); err != nil {
+			return err
+		}
+	}
+
+	if err := os.Rename(file.Name(), b.Name); err != nil {
+		return err
+	}
+
+	b.done = true
+	return nil
+}
+
+func (b *blobDownload) downloadChunk(ctx context.Context, requestURL *url.URL, w io.Writer, part *blobDownloadPart, opts *RegistryOptions) error {
+	headers := make(http.Header)
+	headers.Set("Range", fmt.Sprintf("bytes=%d-%d", part.StartsAt(), part.StopsAt()-1))
+	resp, err := makeRequest(ctx, "GET", requestURL, headers, nil, opts)
+	if err != nil {
+		return err
+	}
+	defer resp.Body.Close()
+
+	n, err := io.Copy(w, io.TeeReader(resp.Body, b))
+	if err != nil && !errors.Is(err, context.Canceled) {
+		// rollback progress
+		b.Completed.Add(-n)
+		return err
+	}
+
+	part.Completed += n
+	if err := b.writePart(part.Name(), part); err != nil {
+		return err
+	}
+
+	// return nil or context.Canceled
+	return err
+}
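The Range header is inclusive on both ends, which is why downloadChunk asks for bytes=StartsAt()-(StopsAt()-1): a part covering offsets [0, 1024) requests bytes=0-1023. A hedged, self-contained sketch of one such ranged GET, using a throwaway test server since http.ServeContent implements Range handling:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

func main() {
	// Throwaway server; http.ServeContent honors Range requests.
	blob := strings.Repeat("x", 4096)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.ServeContent(w, r, "blob", time.Time{}, strings.NewReader(blob))
	}))
	defer srv.Close()

	req, _ := http.NewRequest("GET", srv.URL, nil)
	// Both Range offsets are inclusive: this asks for exactly 1024 bytes.
	req.Header.Set("Range", "bytes=0-1023")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.StatusCode, len(body)) // 206 1024
}
```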
+
+func (b *blobDownload) newPart(offset, size int64) error {
+	part := blobDownloadPart{blobDownload: b, Offset: offset, Size: size, N: len(b.Parts)}
+	if err := b.writePart(part.Name(), &part); err != nil {
+		return err
+	}
+
+	b.Parts = append(b.Parts, &part)
+	return nil
+}
+
+func (b *blobDownload) readPart(partName string) (*blobDownloadPart, error) {
+	var part blobDownloadPart
+	partFile, err := os.Open(partName)
+	if err != nil {
+		return nil, err
+	}
+	defer partFile.Close()
+
+	if err := json.NewDecoder(partFile).Decode(&part); err != nil {
+		return nil, err
+	}
+
+	part.blobDownload = b
+	return &part, nil
+}
+
+func (b *blobDownload) writePart(partName string, part *blobDownloadPart) error {
+	partFile, err := os.OpenFile(partName, os.O_CREATE|os.O_RDWR|os.O_TRUNC, 0644)
+	if err != nil {
+		return err
+	}
+	defer partFile.Close()
+
+	return json.NewEncoder(partFile).Encode(part)
+}
+
+func (b *blobDownload) Write(p []byte) (n int, err error) {
+	n = len(p)
+	b.Completed.Add(int64(n))
+	return n, nil
+}
+
+func (b *blobDownload) acquire() {
+	b.references.Add(1)
+}
+
+func (b *blobDownload) release() {
+	if b.references.Add(-1) == 0 {
+		b.CancelFunc()
+	}
+}
+
+func (b *blobDownload) Wait(ctx context.Context, fn func(api.ProgressResponse)) error {
+	b.acquire()
+	defer b.release()
+
+	ticker := time.NewTicker(60 * time.Millisecond)
+	for {
+		select {
+		case <-ticker.C:
+		case <-ctx.Done():
+			return ctx.Err()
+		}
+
+		fn(api.ProgressResponse{
+			Status:    fmt.Sprintf("downloading %s", b.Digest),
+			Digest:    b.Digest,
+			Total:     b.Total,
+			Completed: b.Completed.Load(),
+		})
+
+		if b.done || b.err != nil {
+			return b.err
+		}
+	}
+}

 type downloadOpts struct {
 	mp      ModelPath
 	digest  string
 	regOpts *RegistryOptions
 	fn      func(api.ProgressResponse)
-	retry   int // track the number of retries on this download
 }

-const maxRetry = 3
+const maxRetries = 3

 // downloadBlob downloads a blob from the registry and stores it in the blobs directory
 func downloadBlob(ctx context.Context, opts downloadOpts) error {
@@ -42,196 +313,34 @@ func downloadBlob(ctx context.Context, opts downloadOpts) error {
 		return err
 	}

-	if fi, _ := os.Stat(fp); fi != nil {
-		// we already have the file, so return
+	fi, err := os.Stat(fp)
+	switch {
+	case errors.Is(err, os.ErrNotExist):
+	case err != nil:
+		return err
+	default:
 		opts.fn(api.ProgressResponse{
+			Status:    fmt.Sprintf("downloading %s", opts.digest),
 			Digest:    opts.digest,
-			Total:     int(fi.Size()),
-			Completed: int(fi.Size()),
+			Total:     fi.Size(),
+			Completed: fi.Size(),
 		})

 		return nil
 	}

-	fileDownload := &FileDownload{
-		Digest:    opts.digest,
-		FilePath:  fp,
-		Total:     1, // dummy value to indicate that we don't know the total size yet
-		Completed: 0,
-	}
-
-	_, downloading := inProgress.LoadOrStore(opts.digest, fileDownload)
-	if downloading {
-		// this is another client requesting the server to download the same blob concurrently
-		return monitorDownload(ctx, opts, fileDownload)
-	}
-	if err := doDownload(ctx, opts, fileDownload); err != nil {
-		if errors.Is(err, errDownload) && opts.retry < maxRetry {
-			opts.retry++
-			log.Print(err)
-			log.Printf("retrying download of %s", opts.digest)
-			return downloadBlob(ctx, opts)
-		}
-		return err
-	}
-	return nil
-}
-
-var downloadMu sync.Mutex // mutex to check to resume a download while monitoring
-
-// monitorDownload monitors the download progress of a blob and resumes it if it is interrupted
-func monitorDownload(ctx context.Context, opts downloadOpts, f *FileDownload) error {
-	tick := time.NewTicker(time.Second)
-	for range tick.C {
-		done, resume, err := func() (bool, bool, error) {
-			downloadMu.Lock()
-			defer downloadMu.Unlock()
-			val, downloading := inProgress.Load(f.Digest)
-			if !downloading {
-				// check once again if the download is complete
-				if fi, _ := os.Stat(f.FilePath); fi != nil {
-					// successful download while monitoring
-					opts.fn(api.ProgressResponse{
-						Digest:    f.Digest,
-						Total:     int(fi.Size()),
-						Completed: int(fi.Size()),
-					})
-					return true, false, nil
-				}
-				// resume the download
-				inProgress.Store(f.Digest, f) // store the file download again to claim the resume
-				return false, true, nil
-			}
-			f, ok := val.(*FileDownload)
-			if !ok {
-				return false, false, fmt.Errorf("invalid type for in progress download: %T", val)
-			}
-			opts.fn(api.ProgressResponse{
-				Status:    fmt.Sprintf("downloading %s", f.Digest),
-				Digest:    f.Digest,
-				Total:     int(f.Total),
-				Completed: int(f.Completed),
-			})
-			return false, false, nil
-		}()
-		if err != nil {
+	data, ok := blobDownloadManager.LoadOrStore(opts.digest, &blobDownload{Name: fp, Digest: opts.digest})
+	download := data.(*blobDownload)
+	if !ok {
+		requestURL := opts.mp.BaseURL()
+		requestURL = requestURL.JoinPath("v2", opts.mp.GetNamespaceRepository(), "blobs", opts.digest)
+		if err := download.Prepare(ctx, requestURL, opts.regOpts); err != nil {
+			blobDownloadManager.Delete(opts.digest)
 			return err
 		}
-		if done {
-			// done downloading
-			return nil
-		}
-		if resume {
-			return doDownload(ctx, opts, f)
-		}
-	}
-	return nil
-}
-
-var (
-	chunkSize   = 1024 * 1024 // 1 MiB in bytes
-	errDownload = fmt.Errorf("download failed")
-)
-
-// doDownload downloads a blob from the registry and stores it in the blobs directory
-func doDownload(ctx context.Context, opts downloadOpts, f *FileDownload) error {
-	defer inProgress.Delete(f.Digest)
-	var size int64
-
-	fi, err := os.Stat(f.FilePath + "-partial")
-	switch {
-	case errors.Is(err, os.ErrNotExist):
-		// noop, file doesn't exist so create it
-	case err != nil:
-		return fmt.Errorf("stat: %w", err)
-	default:
-		size = fi.Size()
-		// Ensure the size is divisible by the chunk size by removing excess bytes
-		size -= size % int64(chunkSize)
-
-		err := os.Truncate(f.FilePath+"-partial", size)
-		if err != nil {
-			return fmt.Errorf("truncate: %w", err)
-		}
-	}
-
-	requestURL := opts.mp.BaseURL()
-	requestURL = requestURL.JoinPath("v2", opts.mp.GetNamespaceRepository(), "blobs", f.Digest)
-
-	headers := make(http.Header)
-	headers.Set("Range", fmt.Sprintf("bytes=%d-", size))
-
-	resp, err := makeRequest(ctx, "GET", requestURL, headers, nil, opts.regOpts)
-	if err != nil {
-		log.Printf("couldn't download blob: %v", err)
-		return fmt.Errorf("%w: %w", errDownload, err)
-	}
-	defer resp.Body.Close()
-
-	if resp.StatusCode >= http.StatusBadRequest {
-		body, _ := io.ReadAll(resp.Body)
-		return fmt.Errorf("%w: on download registry responded with code %d: %v", errDownload, resp.StatusCode, string(body))
-	}
-
-	err = os.MkdirAll(filepath.Dir(f.FilePath), 0o700)
-	if err != nil {
-		return fmt.Errorf("make blobs directory: %w", err)
-	}
-
-	remaining, _ := strconv.ParseInt(resp.Header.Get("Content-Length"), 10, 64)
-	f.Completed = size
-	f.Total = remaining + f.Completed
-
-	inProgress.Store(f.Digest, f)
-
-	out, err := os.OpenFile(f.FilePath+"-partial", os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0o644)
-	if err != nil {
-		return fmt.Errorf("open file: %w", err)
-	}
-	defer out.Close()
-outerLoop:
-	for {
-		select {
-		case <-ctx.Done():
-			// handle client request cancellation
-			inProgress.Delete(f.Digest)
-			return nil
-		default:
-			opts.fn(api.ProgressResponse{
-				Status:    fmt.Sprintf("downloading %s", f.Digest),
-				Digest:    f.Digest,
-				Total:     int(f.Total),
-				Completed: int(f.Completed),
-			})
-
-			if f.Completed >= f.Total {
-				if err := out.Close(); err != nil {
-					return err
-				}
-
-				if err := os.Rename(f.FilePath+"-partial", f.FilePath); err != nil {
-					opts.fn(api.ProgressResponse{
-						Status:    fmt.Sprintf("error renaming file: %v", err),
-						Digest:    f.Digest,
-						Total:     int(f.Total),
-						Completed: int(f.Completed),
-					})
-					return err
-				}
-
-				break outerLoop
-			}
-		}
-
-		n, err := io.CopyN(out, resp.Body, int64(chunkSize))
-		if err != nil && !errors.Is(err, io.EOF) {
-			return fmt.Errorf("%w: %w", errDownload, err)
-		}
-		f.Completed += n
-
-		inProgress.Store(f.Digest, f)
-	}
-
-	log.Printf("success getting %s\n", f.Digest)
-	return nil
+
+		go download.Run(context.Background(), requestURL, opts.regOpts)
+	}
+
+	return download.Wait(ctx, opts.fn)
 }
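downloadBlob deduplicates concurrent pulls through a sync.Map: LoadOrStore returns the already-registered in-flight download when one exists under the digest, so only the first caller Prepares and Runs it while every other caller just Waits on the same object. A minimal sketch of that idiom (sync.Once stands in here for the `!ok` branch of LoadOrStore; the key and job type are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

var manager sync.Map

type job struct{ once sync.Once }

// start registers (or finds) the job for key; only the first caller
// actually runs fn, mirroring how only one goroutine Runs a blobDownload.
func start(key string, fn func()) *job {
	data, _ := manager.LoadOrStore(key, &job{})
	j := data.(*job)
	j.once.Do(fn)
	return j
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			start("sha256:abc", func() { fmt.Println("downloading once") })
		}()
	}
	wg.Wait()
}
```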
@@ -30,8 +30,6 @@ import (
 	"github.com/jmorganca/ollama/version"
 )

-const MaxRetries = 3
-
 type RegistryOptions struct {
 	Insecure bool
 	Username string
@@ -103,7 +101,7 @@ type ManifestV2 struct {
 type Layer struct {
 	MediaType string `json:"mediaType"`
 	Digest    string `json:"digest"`
-	Size      int    `json:"size"`
+	Size      int64  `json:"size"`
 	From      string `json:"from,omitempty"`
 }

@@ -129,11 +127,11 @@ type RootFS struct {
 	DiffIDs []string `json:"diff_ids"`
 }

-func (m *ManifestV2) GetTotalSize() int {
-	var total int
+func (m *ManifestV2) GetTotalSize() (total int64) {
 	for _, layer := range m.Layers {
 		total += layer.Size
 	}

 	total += m.Config.Size
 	return total
 }
@@ -374,6 +372,7 @@ func CreateModel(ctx context.Context, workDir, name string, path string, fn func
 	}

 	if mf != nil {
+		fn(api.ProgressResponse{Status: "reading model metadata"})
 		sourceBlobPath, err := GetBlobsPath(mf.Config.Digest)
 		if err != nil {
 			return err
@@ -649,8 +648,8 @@ func embeddingLayers(workDir string, e EmbeddingParams) ([]*LayerReader, error)
 			e.fn(api.ProgressResponse{
 				Status:    fmt.Sprintf("creating embeddings for file %s", filePath),
 				Digest:    fileDigest,
-				Total:     len(data) - 1,
-				Completed: i,
+				Total:     int64(len(data) - 1),
+				Completed: int64(i),
 			})
 			if len(existing[d]) > 0 {
 				// already have an embedding for this line
@@ -675,7 +674,7 @@ func embeddingLayers(workDir string, e EmbeddingParams) ([]*LayerReader, error)
 				Layer: Layer{
 					MediaType: "application/vnd.ollama.image.embed",
 					Digest:    digest,
-					Size:      r.Len(),
+					Size:      r.Size(),
 				},
 				Reader: r,
 			}
@@ -831,7 +830,7 @@ func formatParams(params map[string][]string) (map[string]interface{}, error) {
 				return nil, fmt.Errorf("invalid int value %s", vals)
 			}

-			out[key] = int(intVal)
+			out[key] = intVal
 		case reflect.Bool:
 			boolVal, err := strconv.ParseBool(vals[0])
 			if err != nil {
@@ -1005,6 +1004,39 @@ func PruneLayers() error {
 	return nil
 }

+func PruneDirectory(path string) error {
+	info, err := os.Lstat(path)
+	if err != nil {
+		return err
+	}
+
+	if info.IsDir() && info.Mode()&os.ModeSymlink == 0 {
+		entries, err := os.ReadDir(path)
+		if err != nil {
+			return err
+		}
+
+		for _, entry := range entries {
+			if err := PruneDirectory(filepath.Join(path, entry.Name())); err != nil {
+				return err
+			}
+		}
+
+		entries, err = os.ReadDir(path)
+		if err != nil {
+			return err
+		}
+
+		if len(entries) > 0 {
+			return nil
+		}
+
+		return os.Remove(path)
+	}
+
+	return nil
+}
+
 func DeleteModel(name string) error {
 	mp := ParseModelPath(name)
 	manifest, _, err := GetManifest(mp)
@@ -1356,14 +1388,14 @@ func createConfigLayer(config ConfigV2, layers []string) (*LayerReader, error) {
 }

 // GetSHA256Digest returns the SHA256 hash of a given buffer and returns it, and the size of buffer
-func GetSHA256Digest(r io.Reader) (string, int) {
+func GetSHA256Digest(r io.Reader) (string, int64) {
 	h := sha256.New()
 	n, err := io.Copy(h, r)
 	if err != nil {
 		log.Fatal(err)
 	}

-	return fmt.Sprintf("sha256:%x", h.Sum(nil)), int(n)
+	return fmt.Sprintf("sha256:%x", h.Sum(nil)), n
 }
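GetSHA256Digest now returns the byte count as int64, matching the widened Layer.Size, and formats the digest with the sha256: prefix used throughout the registry paths. A quick usage sketch against an in-memory reader (a standalone copy of the helper, not the server package itself):

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"strings"
)

// getSHA256Digest mirrors the helper in the diff: hash the reader's
// contents and report both the prefixed digest and the size consumed.
func getSHA256Digest(r io.Reader) (string, int64) {
	h := sha256.New()
	n, err := io.Copy(h, r)
	if err != nil {
		panic(err) // the original calls log.Fatal here
	}
	return fmt.Sprintf("sha256:%x", h.Sum(nil)), n
}

func main() {
	digest, size := getSHA256Digest(strings.NewReader("hello"))
	fmt.Println(digest, size) // sha256:2cf24dba... 5
}
```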
 // Function to check if a blob already exists in the Docker registry
@@ -1384,7 +1416,7 @@ func checkBlobExistence(ctx context.Context, mp ModelPath, digest string, regOpt

 func makeRequestWithRetry(ctx context.Context, method string, requestURL *url.URL, headers http.Header, body io.ReadSeeker, regOpts *RegistryOptions) (*http.Response, error) {
 	var status string
-	for try := 0; try < MaxRetries; try++ {
+	for try := 0; try < maxRetries; try++ {
 		resp, err := makeRequest(ctx, method, requestURL, headers, body, regOpts)
 		if err != nil {
 			log.Printf("couldn't start upload: %v", err)
@@ -1454,13 +1486,14 @@ func makeRequest(ctx context.Context, method string, requestURL *url.URL, header
 		req.ContentLength = contentLength
 	}

-	client := &http.Client{
-		CheckRedirect: func(req *http.Request, via []*http.Request) error {
-			if len(via) >= 10 {
-				return fmt.Errorf("too many redirects")
-			}
-			log.Printf("redirected to: %s\n", req.URL)
-			return nil
+	proxyURL, err := http.ProxyFromEnvironment(req)
+	if err != nil {
+		return nil, err
+	}
+
+	client := http.Client{
+		Transport: &http.Transport{
+			Proxy: http.ProxyURL(proxyURL),
 		},
 	}
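A note on the makeRequest change above: the redirect-logging client is replaced with one whose Transport resolves a proxy per request. http.ProxyFromEnvironment honors the standard HTTP_PROXY, HTTPS_PROXY, and NO_PROXY variables (and their lowercase forms), so registry traffic can be routed through a proxy without any ollama-specific configuration. Dropping the custom CheckRedirect also falls back to Go's default behavior of stopping after ten redirects rather than reimplementing that limit by hand.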
@@ -4,9 +4,9 @@ import "testing"

 func TestParseModelPath(t *testing.T) {
 	tests := []struct {
 		name string
 		arg  string
 		want ModelPath
 	}{
 		{
 			"full path https",
@@ -3,8 +3,10 @@ package server
 import (
 	"context"
 	"encoding/json"
+	"errors"
 	"fmt"
 	"io"
+	"io/fs"
 	"log"
 	"net"
 	"net/http"
@@ -66,7 +68,6 @@ func load(ctx context.Context, workDir string, model *Model, reqOpts map[string]
 	}

 	if err := opts.FromMap(reqOpts); err != nil {
-		log.Printf("could not merge model options: %v", err)
 		return err
 	}

@@ -170,6 +171,11 @@ func GenerateHandler(c *gin.Context) {

 	model, err := GetModel(req.Model)
 	if err != nil {
+		var pErr *fs.PathError
+		if errors.As(err, &pErr) {
+			c.JSON(http.StatusNotFound, gin.H{"error": fmt.Sprintf("model '%s' not found, try pulling it first", req.Model)})
+			return
+		}
 		c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
 		return
 	}
@@ -179,6 +185,10 @@ func GenerateHandler(c *gin.Context) {
 	// TODO: set this duration from the request if specified
 	sessionDuration := defaultSessionDuration
 	if err := load(c.Request.Context(), workDir, model, req.Options, sessionDuration); err != nil {
+		if errors.Is(err, api.ErrInvalidOpts) {
+			c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
+			return
+		}
 		c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
 		return
 	}
@@ -233,6 +243,23 @@ func GenerateHandler(c *gin.Context) {
 		}
 	}()

+	if req.Stream != nil && !*req.Stream {
+		var response api.GenerateResponse
+		generated := ""
+		for resp := range ch {
+			if r, ok := resp.(api.GenerateResponse); ok {
+				generated += r.Response
+				response = r
+			} else {
+				c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
+				return
+			}
+		}
+		response.Response = generated
+		c.JSON(http.StatusOK, response)
+		return
+	}
+
 	streamResponse(c, ch)
 }

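With stream set to false, the handler drains the channel itself, concatenates the chunked Response fields, and answers with one JSON object instead of NDJSON. A hedged client-side sketch of using that mode (the request fields model, prompt, and stream follow the api package; the address and model name are illustrative — 11434 is ollama's default port):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	body, _ := json.Marshal(map[string]any{
		"model":  "llama2", // illustrative model name
		"prompt": "Why is the sky blue?",
		"stream": false, // ask for one aggregated JSON response
	})

	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Non-streaming replies decode as a single object, not NDJSON lines.
	var out struct {
		Response string `json:"response"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println(out.Response)
}
```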
@@ -292,8 +319,6 @@ func PullModelHandler(c *gin.Context) {

 	regOpts := &RegistryOptions{
 		Insecure: req.Insecure,
-		Username: req.Username,
-		Password: req.Password,
 	}

 	ctx, cancel := context.WithCancel(c.Request.Context())
@@ -304,6 +329,11 @@ func PullModelHandler(c *gin.Context) {
 		}
 	}()

+	if req.Stream != nil && !*req.Stream {
+		waitForStream(c, ch)
+		return
+	}
+
 	streamResponse(c, ch)
 }

@@ -323,8 +353,6 @@ func PushModelHandler(c *gin.Context) {

 	regOpts := &RegistryOptions{
 		Insecure: req.Insecure,
-		Username: req.Username,
-		Password: req.Password,
 	}

 	ctx := context.Background()
@@ -333,6 +361,11 @@ func PushModelHandler(c *gin.Context) {
 		}
 	}()

+	if req.Stream != nil && !*req.Stream {
+		waitForStream(c, ch)
+		return
+	}
+
 	streamResponse(c, ch)
 }

@@ -360,6 +393,11 @@ func CreateModelHandler(c *gin.Context) {
 		}
 	}()

+	if req.Stream != nil && !*req.Stream {
+		waitForStream(c, ch)
+		return
+	}
+
 	streamResponse(c, ch)
 }

@@ -378,6 +416,18 @@ func DeleteModelHandler(c *gin.Context) {
 		}
 		return
 	}

+	manifestsPath, err := GetManifestPath()
+	if err != nil {
+		c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
+		return
+	}
+
+	if err := PruneDirectory(manifestsPath); err != nil {
+		c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
+		return
+	}
+
 	c.JSON(http.StatusOK, nil)
 }

@@ -588,6 +638,31 @@ func Serve(ln net.Listener, allowOrigins []string) error {
 	return s.Serve(ln)
 }

+func waitForStream(c *gin.Context, ch chan interface{}) {
+	c.Header("Content-Type", "application/json")
+	for resp := range ch {
+		switch r := resp.(type) {
+		case api.ProgressResponse:
+			if r.Status == "success" {
+				c.JSON(http.StatusOK, r)
+				return
+			}
+		case gin.H:
+			if errorMsg, ok := r["error"].(string); ok {
+				c.JSON(http.StatusInternalServerError, gin.H{"error": errorMsg})
+				return
+			} else {
+				c.JSON(http.StatusInternalServerError, gin.H{"error": "unexpected error format in progress response"})
+				return
+			}
+		default:
+			c.JSON(http.StatusInternalServerError, gin.H{"error": "unexpected progress response"})
+			return
+		}
+	}
+	c.JSON(http.StatusInternalServerError, gin.H{"error": "unexpected end of progress response"})
+}
+
 func streamResponse(c *gin.Context, ch chan any) {
 	c.Header("Content-Type", "application/x-ndjson")
 	c.Stream(func(w io.Writer) bool {
@@ -602,6 +677,7 @@ func streamResponse(c *gin.Context, ch chan any) {
 			return false
 		}

+		// Delineate chunks with new-line delimiter
 		bts = append(bts, '\n')
 		if _, err := w.Write(bts); err != nil {
 			log.Printf("streamResponse: w.Write failed with %s", err)
@@ -10,13 +10,14 @@ import (
 	"net/url"
 	"os"
 	"strconv"
+	"sync"

 	"github.com/jmorganca/ollama/api"
 )

 const (
-	redirectChunkSize = 1024 * 1024 * 1024
-	regularChunkSize  = 95 * 1024 * 1024
+	redirectChunkSize int64 = 1024 * 1024 * 1024
+	regularChunkSize  int64 = 95 * 1024 * 1024
 )

 func startUpload(ctx context.Context, mp ModelPath, layer *Layer, regOpts *RegistryOptions) (*url.URL, int64, error) {
@@ -48,7 +49,7 @@ func startUpload(ctx context.Context, mp ModelPath, layer *Layer, regOpts *Regis
 		return nil, 0, err
 	}

-	return locationURL, int64(chunkSize), nil
+	return locationURL, chunkSize, nil
 }

 func uploadBlob(ctx context.Context, requestURL *url.URL, layer *Layer, chunkSize int64, regOpts *RegistryOptions, fn func(api.ProgressResponse)) error {
@@ -73,10 +74,10 @@ func uploadBlob(ctx context.Context, requestURL *url.URL, layer *Layer, chunkSiz
 		fn: fn,
 	}

-	for offset := int64(0); offset < int64(layer.Size); {
-		chunk := int64(layer.Size) - offset
-		if chunk > int64(chunkSize) {
-			chunk = int64(chunkSize)
+	for offset := int64(0); offset < layer.Size; {
+		chunk := layer.Size - offset
+		if chunk > chunkSize {
+			chunk = chunkSize
 		}

 		resp, err := uploadBlobChunk(ctx, http.MethodPatch, requestURL, f, offset, chunk, regOpts, &pw)
@@ -85,7 +86,7 @@ func uploadBlob(ctx context.Context, requestURL *url.URL, layer *Layer, chunkSiz
 				Status:    fmt.Sprintf("error uploading chunk: %v", err),
 				Digest:    layer.Digest,
 				Total:     layer.Size,
-				Completed: int(offset),
+				Completed: offset,
 			})

 			return err
@@ -127,7 +128,7 @@ func uploadBlob(ctx context.Context, requestURL *url.URL, layer *Layer, chunkSiz
 }

 func uploadBlobChunk(ctx context.Context, method string, requestURL *url.URL, r io.ReaderAt, offset, limit int64, opts *RegistryOptions, pw *ProgressWriter) (*http.Response, error) {
-	sectionReader := io.NewSectionReader(r, int64(offset), limit)
+	sectionReader := io.NewSectionReader(r, offset, limit)

 	headers := make(http.Header)
 	headers.Set("Content-Type", "application/octet-stream")
@@ -138,7 +139,7 @@ func uploadBlobChunk(ctx context.Context, method string, requestURL *url.URL, r
 		headers.Set("Content-Range", fmt.Sprintf("%d-%d", offset, offset+sectionReader.Size()-1))
 	}

-	for try := 0; try < MaxRetries; try++ {
+	for try := 0; try < maxRetries; try++ {
 		resp, err := makeRequest(ctx, method, requestURL, headers, io.TeeReader(sectionReader, pw), opts)
 		if err != nil && !errors.Is(err, io.EOF) {
 			return nil, err
@@ -152,7 +153,7 @@ func uploadBlobChunk(ctx context.Context, method string, requestURL *url.URL, r
 			return nil, err
 		}

-		pw.completed = int(offset)
+		pw.completed = offset
 		if _, err := uploadBlobChunk(ctx, http.MethodPut, location, r, offset, limit, nil, pw); err != nil {
 			// retry
 			log.Printf("retrying redirected upload: %v", err)
@@ -170,7 +171,7 @@ func uploadBlobChunk(ctx context.Context, method string, requestURL *url.URL, r

 		opts.Token = token

-		pw.completed = int(offset)
+		pw.completed = offset
 		sectionReader = io.NewSectionReader(r, offset, limit)
 		continue
 	case resp.StatusCode >= http.StatusBadRequest:
@@ -187,19 +188,23 @@ func uploadBlobChunk(ctx context.Context, method string, requestURL *url.URL, r
 type ProgressWriter struct {
 	status    string
 	digest    string
-	bucket    int
-	completed int
-	total     int
+	bucket    int64
+	completed int64
+	total     int64
 	fn        func(api.ProgressResponse)
+	mu        sync.Mutex
 }

 func (pw *ProgressWriter) Write(b []byte) (int, error) {
+	pw.mu.Lock()
+	defer pw.mu.Unlock()
+
 	n := len(b)
-	pw.bucket += n
-	pw.completed += n
+	pw.bucket += int64(n)

 	// throttle status updates to not spam the client
-	if pw.bucket >= 1024*1024 || pw.completed >= pw.total {
+	if pw.bucket >= 1024*1024 || pw.completed+pw.bucket >= pw.total {
+		pw.completed += pw.bucket
 		pw.fn(api.ProgressResponse{
 			Status: pw.status,
 			Digest: pw.digest,
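The ProgressWriter change does two things: a mutex makes Write safe under concurrent use, and completed now only advances when a bucket of at least 1 MiB (or the final tail) is flushed, so callback frequency is bounded without losing bytes. A small self-contained sketch of the same bucketed throttle (the threshold and reset are assumptions for illustration; the tail of the original hunk is truncated in this view):

```go
package main

import (
	"fmt"
	"sync"
)

// throttled accumulates writes and reports progress only once at least
// `threshold` bytes are pending (or the total is reached), mirroring the
// bucket logic in ProgressWriter.
type throttled struct {
	mu               sync.Mutex
	bucket, done     int64
	total, threshold int64
	report           func(done, total int64)
}

func (t *throttled) Write(p []byte) (int, error) {
	t.mu.Lock()
	defer t.mu.Unlock()

	t.bucket += int64(len(p))
	if t.bucket >= t.threshold || t.done+t.bucket >= t.total {
		t.done += t.bucket // flush the bucket into the committed count
		t.bucket = 0
		t.report(t.done, t.total)
	}
	return len(p), nil
}

func main() {
	w := &throttled{total: 10, threshold: 4, report: func(d, t int64) {
		fmt.Printf("%d/%d\n", d, t)
	}}
	for i := 0; i < 10; i++ {
		w.Write([]byte{0}) // ten 1-byte writes -> reports at 4, 8, 10
	}
}
```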