Compare commits

..

3015 Commits

Author SHA1 Message Date
ParthSareen
23e8ac9428 wip? 2025-05-07 19:00:44 -07:00
ParthSareen
611d3a17ed server: add python tool parsing logic 2025-05-02 16:23:54 -07:00
Michael Yang
5cfc1c39f3 model: fix build (#10416) 2025-04-25 19:24:48 -07:00
Michael Yang
f0ad49ea17 memory 2025-04-25 16:59:20 -07:00
Michael Yang
7ba9fa9c7d fixes for maverick 2025-04-25 16:59:20 -07:00
Michael Yang
8bf11b84c1 chunked attention 2025-04-25 16:59:20 -07:00
Michael Yang
470af8ab89 connect vision to text 2025-04-25 16:59:20 -07:00
Michael Yang
178761aef3 image processing
Co-authored-by: Patrick Devine <patrick@infrahq.com>
2025-04-25 16:59:20 -07:00
Michael Yang
f0c66e6dea llama4 2025-04-25 16:59:20 -07:00
Michael Yang
54055a6dae fix test 2025-04-25 16:59:01 -07:00
Michael Yang
340448d2d1 explicitly decode maxarraysize 1024 2025-04-25 16:59:01 -07:00
Michael Yang
ced7d0e53d fix parameter count 2025-04-25 16:59:01 -07:00
Michael Yang
a0dba0f8ae default slice values 2025-04-25 16:59:01 -07:00
Michael Yang
5e20b170a7 update comment 2025-04-25 16:59:01 -07:00
Michael Yang
d26c18e25c fix token type 2025-04-25 16:59:01 -07:00
Michael Yang
8d376acc9b zero means zero
use a default of 1024 when asking for zero is confusing since most calls
seem to assume 0 means do not ready any data
2025-04-25 16:59:01 -07:00
Michael Yang
dc1e81f027 convert: use -1 for read all 2025-04-25 16:59:01 -07:00
Michael Yang
5d0279164c generic ggml.array 2025-04-25 16:59:01 -07:00
Michael Yang
214a7678ea fix superfluous call to WriteHeader
the first call to http.ResponseWriter.Write implicitly calls WriteHeader
with http.StatusOK if it hasn't already been called. once WriteHeader
has been called, subsequent calls has no effect. Write is called when
JSON encoding progressUpdateJSON{}. calls to
http.ResponseWriter.WriteHeader after the first encode is useless and
produces a warning:

http: superfluous response.WriteHeader call from github.com/ollama/ollama/server/internal/registry.(*statusCodeRecorder).WriteHeader (server.go:77)
2025-04-25 16:58:49 -07:00
Michael Yang
4892872c18 convert: change to colmajor 2025-04-25 15:27:39 -07:00
Michael Yang
0b9198bf47 ci: silence deprecated gpu targets warning 2025-04-25 13:37:54 -07:00
Jeffrey Morgan
e9e5f61c45 llama: update to commit 2016f07b (#10352) 2025-04-24 17:26:02 -07:00
Parth Sareen
11dde41824 server: improve spacing for JSON grammar (#10131) 2025-04-24 16:47:57 -07:00
Parth Sareen
a53d744b01 llama: remove model loading for grammar (#10096) 2025-04-24 11:51:19 -07:00
Adrien Duermael
40b10eee6d api: fix ImageData struct comment to expect raw image bytes (#10386) 2025-04-24 12:13:51 +09:00
Devon Rifkin
424f648632 increase default context length to 4096 (#10364)
* increase default context length to 4096

We lower the default numParallel from 4 to 2 and use these "savings" to
double the default context length from 2048 to 4096.

We're memory neutral in cases when we previously would've used
numParallel == 4, but we add the following mitigation to handle some
cases where we would have previously fallen back to 1x2048 due to low
VRAM: we decide between 2048 and 4096 using a runtime check, choosing
2048 if we're on a one GPU system with total VRAM of <= 4 GB. We
purposefully don't check the available VRAM because we don't want the
context window size to change unexpectedly based on the available VRAM.

We plan on making the default even larger, but this is a relatively
low-risk change we can make to quickly double it.

* fix tests

add an explicit context length so they don't get truncated. The code
that converts -1 from being a signal for doing a runtime check isn't
running as part of these tests.

* tweak small gpu message

* clarify context length default

also make it actually show up in `ollama serve --help`
2025-04-22 16:33:24 -07:00
Richard Shiue
2eb1fb3231 readme: add AppFlowy to community integrations (#10335) 2025-04-20 15:38:06 -07:00
greengrass821
0806521642 cmd: add support for escaping ~ in filepath (#10339)
Co-authored-by: tooth paste <tooth_paste91@Poorneshwars-MacBook-Pro.local>
2025-04-20 15:21:48 -07:00
Michael Yang
88738b357b create tempdir in models directory
the models directory should have plenty of storage and also ensure
there's no cross-device copy
2025-04-18 18:13:05 -07:00
Blake Mizerany
4e535e6188 server/internal/registry: make pull send errors with Error field (#10326)
Previously, the pull handler would send an error message in the Status
field, this prevented the client from using the message as a signal to
stop. In the case of the "run" command, it would follow the pull with a
"show" which would print a nearly identical "not found" message for
unresolved models.

Fixes #10307
2025-04-18 18:12:28 -07:00
Michael Yang
40b8fdbdca arange 2025-04-18 11:45:44 -07:00
Blake Mizerany
1d99451ad7 server/internal/client/ollama: handle some network errors gracefully (#10317) 2025-04-17 12:43:09 -07:00
Jeffrey Morgan
09bb2e30f6 ml/backend/ggml: use default CUDA compression mode (#10314) 2025-04-16 19:54:20 -07:00
Jeffrey Morgan
dc264be6ff ml: add missing cmake property and remove additional CMakeLists.txt (#10310) 2025-04-16 18:56:29 -07:00
Devon Rifkin
fbe7039618 Merge pull request #10290 from ollama/drifkin/template-highlighting
docs: change more template blocks to have syntax highlighting
2025-04-16 15:15:08 -07:00
Jeffrey Morgan
943464ccb8 llama: update to commit 71e90e88 (#10192) 2025-04-16 15:14:01 -07:00
Blake Mizerany
369de832cd server/internal/registry: remove superfluous progress bar flush (#10303)
This removes the extra flushProgress() at the end of handlePull. It is
unnecessary because final progress updates are flushed in all cases of
the main select loop.
2025-04-16 14:43:07 -07:00
Blake Mizerany
3457a315b2 server/internal/client/ollama: cleanup use of multiple counters (#10304)
The completed and received counters must work in tandem and the code
should better reflect that. Previously, the act of updating them was 2-3
lines of code duplicated in multiple places. This consolidates them into
a single update closure for easy reading and maintenance.

This also simplifies error handling in places where we can use a return
parameter and defer to handle the error case for updates.

Also, remove the old Layer field from the trackingReader struct.
2025-04-16 14:33:40 -07:00
Daniel Hiltgen
ed4e139314 Integration test improvements (#9654)
Add some new test coverage for various model architectures,
and switch from orca-mini to the small llama model.
2025-04-16 14:25:55 -07:00
Daniel Hiltgen
56dc316a57 Give tests more time to run (#10306)
Fix flake failures on windows
2025-04-16 13:37:00 -07:00
Michael Yang
2fec73eef6 fix write gguf padding 2025-04-16 10:24:35 -07:00
Blake Mizerany
1e7f62cb42 cmd: add retry/backoff (#10069)
This commit adds retry/backoff to the registry client for pull requests.

Also, revert progress indication to match original client's until we can
"get it right."

Also, make WithTrace wrap existing traces instead of clobbering them.
This allows clients to compose traces.
2025-04-15 23:24:44 -07:00
Jesse Gross
ccb7eb8135 ggml: Free ggml_backend_buffer_t when releasing buffer
When ggml_backend_buffer_free() is called, the device memory
is released but not all backends consistently release the actual
ggml_backend_buffer_t in system RAM, causing a memory leak.

Bug #10040
2025-04-15 15:29:58 -07:00
Devon Rifkin
637fd21230 docs: change more template blocks to have syntax highlighting
In #8215 syntax highlighting was added to most of the blocks, but there were a couple that were still being rendered as plaintext
2025-04-15 12:08:11 -07:00
Devon Rifkin
0fe487e732 Merge pull request #10276 from ollama/drifkin/cors-headers
server: add `OpenAI-Beta` header to CORS safelist
2025-04-14 17:42:51 -07:00
Devon Rifkin
6bfaa6e282 Merge pull request #10277 from ollama/drifkin/docs-json-errors
docs: update some response code blocks to json5
2025-04-14 17:11:20 -07:00
Devon Rifkin
378d3210dc docs: update some response code blocks to json5
This is to prevent rendering bright red comments indicating invalid JSON when the comments are just supposed to be explanatory
2025-04-14 17:09:06 -07:00
Devon Rifkin
97fe45e36d server: add OpenAI-Beta header to CORS safelist
alphabetized the compat list and then added a single header

fixes: #9801
2025-04-14 15:36:10 -07:00
CYJiang
64a9cc8f05 cmd: add missing file close in tests (#10179) 2025-04-14 07:49:41 -04:00
Jesse Gross
f50d691254 ggml: Fix memory leak on input tensors
For every forward pass through the model, we need to allocate input
tensors: tokens, images, positions, outputs and masks. These get
allocated in system memory.

However, when we close the context that the tensors were allocated
through, the metadata gets freed but the actual backend memory does
not. This results in a significant memory leak.

This makes it so that all the memory allocated through a context
gets freed when it is closed.

Fixes #10040
2025-04-11 11:13:22 -07:00
Jesse Gross
34c3b68fc8 ggml: Don't allocate CPU buffers as CUDA Host buffers
Allocating (and in particular, freeing) memory from CUDA host buffers
is expensive and can cause a significant performance hit if we do
it for every token. Using normal system memory avoids this issue
and also gives the OS more flexibility to manage it.

There is no performance impact from this patch directly (either
positive or negative) but it makes a difference once we start
freeing memory correctly.
2025-04-11 11:13:22 -07:00
Jesse Gross
f33ccd5d27 ggml: Use pointer receivers for Context
Context is currently mixed between pointer and value receivers. Change
this to be all pointer receivers so don't have to reason about whether
the things we are updating in the struct will be retained.
2025-04-11 11:13:22 -07:00
Jesse Gross
bc108b9ad6 ggml: Log filesystem errors
Sometimes loading the GGUF file fails with:
panic: context canceled

This is probably a filesystem error but it doesn't provide any
information about what happened.
2025-04-11 11:13:06 -07:00
Tom Sheffler
ef65174df2 types: include the 'items' and '$defs' fields to properly handle "array" types (#10091)
---------

Co-authored-by: Parth Sareen <parth.sareen@ollama.com>
2025-04-09 17:45:49 -07:00
Ire Gaddr
42ecb9f138 fix(scheduler): make model unload order deterministic (#10185) 2025-04-09 16:01:02 -07:00
湛露先生
5c0331fd83 Fix dockerfile. (#9855)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-04-09 13:24:56 -07:00
CYJiang
e7019c9455 fix(integration): move waitgroup Add(1) outside goroutine to avoid potential issue (#10070)
Signed-off-by: googs1025 <googs1025@gmail.com>
2025-04-08 15:17:40 -07:00
Michael Yang
d98bfe7e70 kvcache: stub out test structs 2025-04-08 15:08:29 -07:00
Parth Sareen
6747099d71 types: add any type and validation for ToolFunction enum (#10166) 2025-04-08 15:05:38 -07:00
frob
ccc8c6777b cleanup: remove OLLAMA_TMPDIR and references to temporary executables (#10182)
* cleanup: remove OLLAMA_TMPDIR
* cleanup: ollama doesn't use temporary executables anymore

---------

Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-04-08 15:01:39 -07:00
Jesse Gross
dbb149e6f7 ollamarunner: Preallocate worst case graph at startup
Currently, the KV cache and graph are lazily allocated as needed.
The cache is fully allocated on first use of the corresponding
layer whereas the graph grows with the size of the context.

This can be an issue if another application allocates more VRAM
after we do our calculations - Ollama will crash in the middle of
inference. If we instead allocate the maximum needed memory at
startup of the runner, we will either succeed or fail at that point
rather than at some surprising time in the future.

Currently, this only generates a worst case batch for text, which
means that vision models may get a partial allocation and continue
to lazily allocate the rest.
2025-04-08 10:01:28 -07:00
Jesse Gross
a807985e59 ggml: Check for OOM and return as Go errors
If there is a CUDA OOM, we currently don't check the return value
and will evetually segfault. This checks for the problem and generates
a Go error. At the moment, this will still result in a panic but having
the error is the first step to being able to handle it more gracefully.
2025-04-08 10:01:28 -07:00
qwerty108109
8643c4d5bf readme: fix url for big-AGI in community integrations (#10173) 2025-04-07 19:42:26 -07:00
Jonathan Hecl
b0c3aba590 readme: add GGUF-to-ollama to community integrations (#10156) 2025-04-07 16:31:45 -07:00
qwerty108109
19c0c25de8 readme: rename community integration from Claude Dev to Cline (#10168) 2025-04-07 16:27:20 -07:00
Alex Rozgo
2f723ac2d6 types: allow tool function parameters with a single type or an array of types (#9434) 2025-04-07 14:27:01 -07:00
Devon Rifkin
249fbbe52f Merge pull request #10169 from ollama/drifkin/fix-contributing-formatting
CONTRIBUTING: fix code block formatting
2025-04-07 14:02:35 -07:00
Devon Rifkin
c38680b8a1 CONTRIBUTING: fix code block formatting
There were only 3 spaces instead of 4, so the example was being considered to include html elements
2025-04-07 13:53:33 -07:00
Michael Yang
16fca86c4a digest files in parallel 2025-04-07 09:46:31 -07:00
Daniel Hipke
0f3f9e353d ml/backend/ggml: create a new file descriptor for tensor (#10133)
improves model loading times on network-based filesystems
such as GCS fuse by creating a dedicated file descriptor for each
section of the file being read, reducing seeking
2025-04-04 17:04:24 -07:00
Bruce MacDonald
6bd0a983cd model: support for mistral-small in the ollama runner
Mistral is a popular research lab making open source models. This updates
the forward pass of llama architecture models to support both llama models
and mistral models by accounting for additional metadata present in mistral
models, and finding the correct dimensions for the output projection.
2025-04-03 16:57:36 -07:00
Michael Yang
1861fbdeb5 Merge pull request #9873 from ollama/mxyng/fs-config
fs: move ml.Config to fs package
2025-04-03 14:05:21 -07:00
Michael Yang
3b96a93672 fs: move ml.Config to fs package 2025-04-03 13:12:24 -07:00
Bruce MacDonald
e53b3cbd0c llm: set done reason at server level (#9830)
No functional change. Many different done reasons can be set at the runner
level, so rather than obsuring them we should return them to the server
process and let it choose what to do with the done reason. This separates
the API concerns from the runner.
2025-04-03 10:19:24 -07:00
Jeffrey Morgan
b51e0f397c model: fix issues with spm tokenizer for Gemma 3 (#10081) 2025-04-02 13:22:56 -07:00
jmorganca
b42970063d kvcache: Add check for values that fall out of sliding window cache
The sliding window cache trims entries that are outside the window for
the latest token. This works when we are extending the cache, such as
when the conversation continues. However, if we have a partial overlap
in conversation (including the BOS tokens), then we resume from a past
point in the conversation and the needed tokens are no longer stored
in memory. This verifies that the new window overlaps with the old one
before reusing the cache.

Co-authored-by: Jesse Gross <jesse@ollama.com>
2025-04-02 11:55:48 -07:00
Jesse Gross
493385eb3e ollamarunner: Don't truncate a SameBatch
When truncating inputs to the the context window at the beginning of
a sequence, we remove the minimum amount possible. However, this
may cause us to truncate to the middle of a set of inputs that
the model specified should not be split up. To avoid this, we
need to remove the rest of the partial batch.
2025-04-02 10:40:38 -07:00
Bruce MacDonald
9876c9faa4 chore(all): replace instances of interface with any (#10067)
Both interface{} and any (which is just an alias for interface{} introduced in Go 1.18) represent the empty interface that all types satisfy.
2025-04-02 09:44:27 -07:00
IsAurora6
4e415029b3 readme: add Casibase to community integrations (#10057) 2025-04-02 01:27:16 -07:00
Bruce MacDonald
e172f095ba api: return model capabilities from the show endpoint (#10066)
With support for multimodal models becoming more varied and common it is important for clients to be able to easily see what capabilities a model has. Retuning these from the show endpoint will allow clients to easily see what a model can do.
2025-04-01 15:21:46 -07:00
Ilian
c001b98087 docs: add TagSpaces to community integrations (#9983) 2025-03-31 17:28:59 -07:00
Abyss-c0re
23fc8e92eb docs: add DeepShell to community projects (#9955)
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2025-03-31 17:23:04 -07:00
湛露先生
4059a297a6 discover: /proc/cpuinfo file open and close. (#9950)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-03-31 17:07:42 -07:00
Bruce MacDonald
66b2539238 runner: clear cache when shift is not possible (#9433)
Clear KV cache when shift operation is not supported by model.
Added KvCacheCanShift() check to handle models that can't perform cache shifts,
falling back to full cache clear while preserving logical token history to
maintain expected behavior when context window fills up.
2025-03-31 12:54:45 -07:00
Blake Mizerany
ef27d52e79 server/internal/client/ollama: cache completed chunks (#9933)
This change adds tracking of download chunks during the pull process so
that subsequent pulls can skip downloading already completed chunks.
This works across restarts of ollama.

Currently, download state will be lost if a prune is triggered during a
pull (e.g. restart or remove). This issue should be addressed in a
follow-up PR.
2025-03-30 23:54:54 -07:00
Jesse Gross
b2a465296d runner: Release semaphore and improve error messages on failures
If we have an error after creating a new sequence but before
finding a slot for it, we return without releasing the semaphore.
This reduces our parallel sequences and eventually leads to deadlock.

In practice this should never happen because once we have acquired
the semaphore, we should always be able to find a slot. However, the
code is clearly not correct.
2025-03-30 19:21:54 -07:00
Jesse Gross
5d097277ef ollamarunner: Ensure batch size limits are not exceeded
With the llama runner, we can generate up to NUM_PARALLEL batches
at once, which will then get broken up to into individual batches
to get executed by llama.cpp (i.e. we add up to 2048 tokens and
this gets split into 4 batches of 512 tokens at default settings).

This splitting can improve parallelism on multi-GPU systems because
the individual batches can move though the pipeline without blocking
on the first one to fully complete. However, we don't yet support
this in the Ollama runner, partially because it makes it hard to
enforce model-specified batch constraints, which didn't exist
previously.

The result is that we will try to execute the full, unsplit batch.
This could result in out of memory or insufficient KV cache space
errors.

This triggers batch breaking when the total inputs from all sequences
exceeds the batch size, rather than per-sequence. In order to ensure
fairness, it also reintroduces round-robinning around sequences so
that we don't let one busy sequence starve the others.
2025-03-30 19:21:01 -07:00
Leandro Borges Ferreira
071a9872cb readme: add Writeopia to community integrations (#10042) 2025-03-30 17:28:06 -07:00
CYJiang
0bd0454ea7 server: organize error types (#9465)
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2025-03-28 11:50:22 -07:00
Jesse Gross
01aa788722 ml: Remove Output from Context interface
Model implementations should use Input for all of their tensors
supplied to the model. This includes tensors that relate to the
outputs, which is confusing since there is also an Output funciton.

Since Output is only used internally in GGML and not used by any
model implementations, we can remove it from the interface to
reduce confusion.
2025-03-27 12:19:43 -07:00
saman-amd
ead27aa9fe Add gfx1200 & gfx1201 support on linux (#9878) 2025-03-27 07:35:19 -07:00
Parth Sareen
b816ff86c9 docs: make context length faq readable (#10006) 2025-03-26 17:34:18 -07:00
molbal
e5d84fb90b docs: add molbal/orca-cli to community integrations (#9909) 2025-03-26 13:39:01 -07:00
Hengky Steen
dd66712e31 docs: add ollamb to community projects 2025-03-26 13:38:05 -07:00
Jesse Gross
f66216e399 ggml: Support heterogeneous KV cache layer sizes in memory estimation
Gemma3 uses sliding windows for its context on 5/6 layers, significantly
reducing memory usage but leading to uneven usage across layers,
which makes allocation to the correct GPU difficult. We currently
estimate very conservatively by assuming all layers are consistent
at the max size.

Llama3.2-vision is also inconsistent between self attention and cross
attention layers - at moment, we calculate the correct total size
and then average this across layers. In some cases, this may lead
to crashes if a large layer is placed on a GPU sized by the average.

This allows memory estimation to calculate per-layer KV cache size
and take this account when placing layers onto GPUs. We already do
this for weights that vary per-tensor, so this is a logical extension.

Fixes #9730
Fixes #9890
2025-03-26 13:16:03 -07:00
Jesse Gross
f4f0992b6e llm: Fix debug logging for memory estimates 2025-03-26 13:16:03 -07:00
Jesse Gross
1feff61977 kvcache: Sliding window cache only needs a single batch total
When computing the size of the cache for sliding window attention,
we don't need to multiple the batch size by the number of parallel
sequences - the batch size is constant.

This also simplifies the check for whether to allocate the cache
size based on capacity or window size as the batch size is already
incorporated into the capacity when handled by the runner.
2025-03-26 13:16:03 -07:00
copeland3300
5e0b904e88 docs: add flags to example linux log output command (#9852) 2025-03-25 09:52:23 -07:00
Matheus C. França
131f0355a5 readme: add ollama-d library (#9907) 2025-03-24 09:25:58 -07:00
Blake Mizerany
ce929984a3 server/internal/client/ollama: fix file descriptor management in Pull (#9931)
Close chunked writers as soon as downloads complete, rather than
deferring closure until Pull exits. This prevents exhausting file
descriptors when pulling many layers.

Instead of unbounded defers, use a WaitGroup and background goroutine
to close each chunked writer as soon as its downloads finish.

Also rename 'total' to 'received' for clarity.
2025-03-21 16:16:38 -07:00
Michael Yang
4b34930a31 Merge pull request #9897 from ollama/mxyng/chunk-load
ml/backend/ggml: load tensors in 128KiB chunks
2025-03-21 14:47:13 -07:00
Michael Yang
74bd09652d ml/backend/ggml: load tensors in 32KiB chunks 2025-03-21 14:43:52 -07:00
Bruce MacDonald
fb6252d786 benchmark: performance of running ollama server (#8643) 2025-03-21 13:08:20 -07:00
Blake Mizerany
c794fef2f2 server/internal/client/ollama: persist through chunk download errors (#9923) 2025-03-21 13:03:43 -07:00
Parth Sareen
00ebda8cc4 Revert "parser: remove role validation from Modelfile parser" (#9917)
This reverts commit ffbfe833da.
2025-03-21 12:38:09 -07:00
Parth Sareen
d14ce75b95 docs: update final response for /api/chat stream (#9919) 2025-03-21 12:35:47 -07:00
Jesse Gross
2d6eac9084 kvcache: Optimize sliding window attention
Currently sliding window attention allocates and uses the full
context size and just masks out any tokens that are outside of the
window. However, we really only need (roughly) the sliding window
size.

At large context sizes this improves two things:
 - Memory allocated - since the fully context size is allocated up front,
   memory requirements drop substantially. On Gemma3:4b with a 32k
   context window, total memory usage (including weights and non-sliding
   layers) drops from ~20GB to ~8GB.
 - Computation - ranges that are completely outside of the sliding
   window are now removed from the tensors that are returned from the
   cache rather than simply being masked out. This results in more
   efficient processing, scaling with the size of the context that
   has actually been used.

Notable, this does not update the scheduler for any model to be aware of
the smaller memory requirements. This is difficult for Gemma3 because
the layers are heterogeneous between sliding and non-sliding attention.
As a result, while actual memory consumption will be reduced, the
scheduler will over-estimate the requirements of the model. This means
that splitting between GPUs or GPUs and CPUs will still be suboptimal.

Bug #9730
2025-03-21 11:20:19 -07:00
Jesse Gross
3ed7ad3ab3 kvcache: Pass granular cache size into implementations
Currently the runner computes the kv size needed and creates a
cache of that size. This is the context size times number of
parallel sequences.

Cache implementations can make better decisions about their memory
usage, so instead pass in the required capacity, number of sequences
and maximum batch size. For now, the causal cache just uses this to
compute the size in the same way as before.
2025-03-21 11:20:19 -07:00
Patrick Devine
6d1103048e fix: show correct bool value for kv in verbose show information (#9928) 2025-03-21 11:13:54 -07:00
Jesse Gross
0ff28758b3 ollamarunner: Provide mechanism for backends to report loading progress
This enables the runner to report progress back to the Ollama server,
both for showing status to the user and also to prevent the server
from killing the runner if it thinks things have stalled.

Most of the infrastructure was already there, this extends it to
be available to the backends.
2025-03-21 10:44:26 -07:00
Jesse Gross
d3e9ca3eda kvcache: Account for source tensors in defrag operation count
Defragging the KV cache can generate a lot of operations, so we
need to be careful that we don't overflow the number that the graph
can support. We currently account for all of the nodes that we add
to the graph for each move but we also need to include the original
cache tensors as well.

Fixes #9904
2025-03-21 10:42:19 -07:00
Jesse Gross
0fbfcf3c9c model: Pass input tensor instead of raw data to models
Rather than directly giving the input data to models, we can
pass a tensor instead. In the short term, this saves some duplicated
code.

Longer term, we will want to overlap setting up the next batch with
processing of the current one. In this case, we will only have the
shape of tensor but it will not be loaded with data at the time of
graph generation. By passing only a tensor to models now, we set up
this possibility and prevent them from relying on data that they won't
have in the future.

Although the same could be done for Positions and Outputs, in some
cases we either need the raw input data or don't use them at all.
Therefore, for now we leave them as they are and allow models to
convert them to tensors as needed.
2025-03-20 13:28:13 -07:00
Jesse Gross
0c220935bd input: Rename Options to Batch
Options is no longer very descriptive of this struct.
2025-03-20 13:28:13 -07:00
rylativity
ffbfe833da parser: remove role validation from Modelfile parser (#9874)
* updates parser/parser.go to allow arbitrary roles in Modelfile MESSAGE blocks
2025-03-20 13:11:17 -07:00
Parth Sareen
42a14f7f63 sample: add error handling for empty logits (#9740) 2025-03-20 11:11:18 -07:00
Patrick Devine
f8c3dbe5b5 templates: add autotemplate for gemma3 (#9880)
This change allows the gemma3 template to be autodetected during `ollama
create`.
2025-03-20 00:15:30 -07:00
Jesse Gross
b078dd157c gemma2: Remove second call to Rows
Looks like a merge conflict that broke the model.
2025-03-19 17:28:49 -07:00
Blake Mizerany
2ddacd7516 server/internal/client/ollama: confirm all chunksums were received (#9893)
If the chunksums response is missing a chunk, the client should fail
the download. This changes the client to check that all bytes are
accounted for in the chunksums response.

It is possible there are overlaps or gaps in the chunksums response and
so the size is not the only thing left to check, but this provides
enough coverage for now. We may want to check that chunks are contiguous
later.
2025-03-19 14:59:57 -07:00
Jeffrey Morgan
da0e345200 ml: use input context for extracting outputs (#9875) 2025-03-18 18:08:19 -07:00
Bruce MacDonald
df94175a0f ggml: return error on failure to read tensor data (#9872)
When converting a ggml model if there is a failure to read tensor data a nil error value was being returned. It should be assigned to the actual error from reading.
2025-03-18 16:51:33 -07:00
Bruce MacDonald
61a8825216 convert: return name of unsupported architecture (#9862)
When a model's architecture cannot be converted return the name of the unsupported arch in the error message.
2025-03-18 10:38:28 -07:00
Michael Yang
021dcf089d Merge pull request #9824 from ollama/mxyng/sched
conditionally enable parallel pipelines
2025-03-17 15:41:37 -07:00
Jesse Gross
bf24498b1e ollamarunner: Check for minBatch of context space when shifting
Models can specify that a group of inputs need to be handled a single
batch. However, context shifting didn't respect this and could trigger
a break anyways. In this case, we should instead trigger a context
shift earlier so that it occurs before the grouped batch.

Note that there still some corner cases:
 - A long prompt that exceeds the context window can get truncated
   in the middle of an image. With the current models, this will
   result in the model not recognizing the image at all, which is
   pretty much the expected result with truncation.
 - The context window is set less than the minimum batch size. The
   only solution to this is to refuse to load the model with these
   settings. However, this can never occur with current models and
   default settings.

Since users are unlikely to run into these scenarios, fixing them is
left as a follow up.
2025-03-17 15:33:16 -07:00
Bruce MacDonald
95e271d98f runner: remove cache prompt flag from ollama runner (#9826)
We do not need to bypass the prompt caching in the ollama runner yet, as
only embedding models needed to bypass the prompt caching. When embedding
models are implemented they can skip initializing this cache completely.
2025-03-17 15:11:15 -07:00
Jeffrey Morgan
364629b8d6 ml/backend/ggml: allocate memory with malloc when loading model (#9822) 2025-03-17 13:32:40 -07:00
Parth Sareen
108fe02165 sample: make mutations in transforms explicit (#9743)
* updated minP to use early exit making use of sorted tokens
2025-03-17 11:24:18 -07:00
Michael Yang
4561fff36e conditionally enable parallel pipelines 2025-03-17 09:46:07 -07:00
Daniel Hiltgen
50b5962042 Add support for ROCm gfx1151 (#9773) 2025-03-17 09:33:57 -07:00
Louis Beaumont
e27e4a3c1b readme: add screenpipe to community integrations (#9786) 2025-03-16 21:56:42 -04:00
zeo
088514bbd4 readme: add Ellama to list of community integrations (#9800) 2025-03-16 21:54:43 -04:00
Patrick Devine
2c8b484643 fix: correctly save in interactive mode (#9788)
This fixes the case where a FROM line in previous modelfile points to a
file which may/may not be present in a different ollama instance. We
shouldn't be relying on the filename though and instead just check if
the FROM line was instead a valid model name and point to that instead.
2025-03-15 12:09:02 -07:00
Blake Mizerany
8294676150 server/internal/client/ollama: set User-Agent for registry client (#9775)
This sets the agent header in DefaultRegistry to include the version of
the client, OS, and architecture in the previous format, with a minor
twist.

Note: The version is obtained from the build info, instead of the
version in version.Version, which should not longer be necessary, but we
can remove in a future commit. Using the build info is more accurate and
also provides extra build information if the build is not tagged, and if
it is "dirty". Previously, the version was just "0.0.0" with no other
helpful information. The ollama.com registry and others handle this
swimmingly.
2025-03-14 18:33:07 -07:00
Patrick Devine
ef378ad673 gemma3 quantization (#9776) 2025-03-14 17:41:07 -07:00
Daniel Hiltgen
2d2247e59e Align versions for local builds (#9635)
Darwin was using a different pattern for the version string
than linux or windows.
2025-03-14 15:44:08 -07:00
Jesse Gross
7bf793a600 gemma3: Allow multiple image in a single input
Previously processing multiple images in a batch would trigger
segfaults so sending images together was disabled as a way to
mitigate this. The trigger was processing one image on the CPU
and one on the GPU.

This can no longer happen:
 - The vision encoder is now on the GPU so both images would be
   processed on the GPU.
 - We require images to be fully contained in a batch and each
   image including its special tokens is over half the batch size.
   As a result, we will never get two images in the same batch.

Fixes #9731
2025-03-14 15:38:54 -07:00
Jesse Gross
282bfaaa95 ollamarunner: Use a separate context per multimodal input
Currently there is a single context per sequence, shared all by
all multimodal inputs. Since we build a vision encoder graph per
image, with a large number of inputs we can eventually hit the
maximum number of graph nodes per context.

This changes to use a separate context for each image, ensuring
that available resource limits are consistent.
2025-03-14 15:38:54 -07:00
Jesse Gross
9679f40146 ml: Allow models to constrain inputs to a single batch
Models may require that a set of inputs all be processed as part
of the same batch. For example, if an image has multiple patches
with fully connected attention between them, we should not split
the batch in the middle of an image.

Fixes #9697
2025-03-14 15:38:54 -07:00
Bruce MacDonald
3892c3a703 llm: remove internal subprocess req and resp types (#9324)
This commit refactors the LLM subsystem by removing internal subprocess
request and response types. It consolidates duplicate type definitions
across the codebase, moving them to centralized locations. The change also
standardizes interfaces between components, simplifies the ServerStatusResp
struct, and moves the ParseDurationMs function to a common package. This
cleanup reduces code duplication between different runner implementations
(llamarunner and ollamarunner).
2025-03-14 15:21:53 -07:00
Blake Mizerany
4e320b8b90 server/internal/chunks: remove chunks package (#9755) 2025-03-14 08:57:59 -07:00
Blake Mizerany
eb2b22b042 server/internal/client: use chunksums for concurrent blob verification (#9746)
Replace large-chunk blob downloads with parallel small-chunk
verification to solve timeout and performance issues. Registry users
experienced progressively slowing download speeds as large-chunk
transfers aged, often timing out completely.

The previous approach downloaded blobs in a few large chunks but
required a separate, single-threaded pass to read the entire blob back
from disk for verification after download completion.

This change uses the new chunksums API to fetch many smaller
chunk+digest pairs, allowing concurrent downloads and immediate
verification as each chunk arrives. Chunks are written directly to their
final positions, eliminating the entire separate verification pass.

The result is more reliable downloads that maintain speed throughout the
transfer process and significantly faster overall completion, especially
over unstable connections or with large blobs.
2025-03-13 22:18:29 -07:00
Michael Yang
4ea4d2b189 Merge pull request #9703 from ollama/mxyng/gemma3-memory
count gemma3 vision tensors
2025-03-13 16:56:34 -07:00
Michael Yang
8d76fa23ef count non-repeating vision layers 2025-03-13 16:53:29 -07:00
Bradley Erickson
74b44fdf8f docs: Add OLLAMA_ORIGINS for browser extension support (#9643) 2025-03-13 16:35:20 -07:00
Michael Yang
65b88c544f fix divide by zero 2025-03-13 16:35:00 -07:00
Michael Yang
a422ba39c9 roughly count gemma3 graph
the largest operation is by far (q @ k) so just count that for
simplicity
2025-03-13 16:35:00 -07:00
Michael Yang
d2ec22371e count all vision tensors 2025-03-13 16:35:00 -07:00
Michael Yang
033cec232a count gemma3 vision tensors 2025-03-13 16:34:42 -07:00
Michael Yang
543240fb5f Merge pull request #9741 from ollama/mxyng/visionless
fix: error if image requested without vision model
2025-03-13 15:03:25 -07:00
Patrick Devine
4bed739259 add verbose mode to the show command (#9640)
Add metadata and tensor information to the show command to be able to
see more information about a model. This outputs the same data as
shown on the model details page on ollama.com
2025-03-13 14:24:27 -07:00
Patrick Devine
80c7ce381b fix: change default context size for gemma3 (#9744) 2025-03-13 13:59:19 -07:00
Michael Yang
ccfd41c4f0 Merge pull request #9742 from ollama/mxyng/engine-error-embeddings
fix: error on models that don't support embeddings
2025-03-13 13:12:33 -07:00
Michael Yang
3e102b7dad Update model/model.go
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2025-03-13 13:11:52 -07:00
Michael Yang
ec46f3286c engine: error on embeddings; not currently implemented 2025-03-13 11:40:55 -07:00
Michael Yang
5e2e0b46b1 fix: error if image requested without vision model 2025-03-13 10:52:09 -07:00
Michael Yang
45a13b1dec Merge pull request #9688 from Shane-XB-Qian/debug_mistype_lld
ollama-debug.c: correct mistype
2025-03-13 10:12:44 -07:00
Parth Sareen
5c0b663969 sample: separate softmax and temperature transforms (#9732) 2025-03-13 09:53:27 -07:00
shane.xb.qian
30d7a59ba8 ollama-debug.c: change 'ld' to 'PRIi64'
* macOS has different definition per info from @mxyng
2025-03-13 17:10:37 +08:00
ParthSareen
4aeb67ef4c sample: do all sorting in topK 2025-03-12 11:59:17 -07:00
ParthSareen
3ba91634c1 sample: simplify top_k=0 sorting 2025-03-12 11:59:17 -07:00
ParthSareen
1b7433b71e sample: use container/heap for top_k 2025-03-12 11:59:17 -07:00
Bruce MacDonald
a70820daa0 models/gemma3: remove final logit softcap (#9692)
Softcap isn't in the whitepaper/implementation for the language model so we should remove it. There is no discernible difference in output with it removed.
2025-03-12 10:17:57 -07:00
Shane-XB-Qian
6b45b1d6b4 cli: adding support ctrl-n/p like general cli (#9136)
Signed-off-by: shane.xb.qian <shane.qian@foxmail.com>
2025-03-12 08:51:56 -07:00
shane.xb.qian
85ab552028 ollama-debug.c: correct mistype
Signed-off-by: shane.xb.qian <shane.qian@foxmail.com>
2025-03-12 22:32:30 +08:00
frob
b3af953a55 cli: don't exit for invalid model during /load. (#9576)
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-03-11 23:42:53 -07:00
Michael
ad4e0bf3be Adding Gemma 3 to readme (#9671) 2025-03-12 07:39:25 +01:00
Michael Yang
aee28501b5 Merge pull request #9661 from ollama/gemma
engine: add gemma support
2025-03-11 15:07:50 -07:00
jmorganca
83f0ec8269 all: address linter errors 2025-03-11 14:49:20 -07:00
jmorganca
c6b6938b3a kvcache: fix tests by adding AvgPool2D stub 2025-03-11 14:49:20 -07:00
jmorganca
fb4664fcec model: add more spm tokenizer tests 2025-03-11 14:49:20 -07:00
jmorganca
20e3593863 model: validate left and right pairs before merging them 2025-03-11 14:49:20 -07:00
Michael Yang
63a394068c use 2d pooling 2025-03-11 14:49:20 -07:00
Daniel Hiltgen
ab39e08eb9 llm: auto detect models that require Ollama Engine (#1) 2025-03-11 14:49:20 -07:00
jmorganca
11bfa62796 add trailing \n\n after <end_of_image> to match reference implementation 2025-03-11 14:49:20 -07:00
jmorganca
f63e62e546 reduce kernel size, add TODO for loading from config 2025-03-11 14:49:20 -07:00
jmorganca
65b0f329d1 Revert "Allow models to force a new batch"
This reverts commit c7eae586b899083acebcd9b3847b89ea78c2850c.
2025-03-11 14:49:20 -07:00
Jesse Gross
06007c0a18 Allow models to force a new batch
This is useful for a few things:
 - Work around bugs, such as having 2 images in one batch
 - Keep the image in a single batch for fully connected attention
 - Improve performance by not evaluating embeddings multiple times
2025-03-11 14:49:20 -07:00
Jesse Gross
a8e83a7654 Disable causal attention based on batch index
Currently we are using positions, which are relative to a
sequence and may not be unique.
2025-03-11 14:49:20 -07:00
Jesse Gross
475005504e Restrict Gemma to a single image per request 2025-03-11 14:49:20 -07:00
Jesse Gross
2c40c4d35e Fix follow up images and images split across batches 2025-03-11 14:49:19 -07:00
Michael Yang
e95278932b use non-causal mask only for image positions 2025-03-11 14:49:19 -07:00
Michael Yang
9d2a20a763 use non-causal mask for inputs with images 2025-03-11 14:49:19 -07:00
Patrick Devine
2e54d72fc3 fix gemma3 1b conversion 2025-03-11 14:49:19 -07:00
Michael Yang
6b32a2d549 compat with upstream gguf 2025-03-11 14:49:19 -07:00
Michael Yang
c5cbe4fc2a fallback to cpu 2025-03-11 14:49:19 -07:00
Michael Yang
f888912870 fix vision encoder 2025-03-11 14:49:19 -07:00
Michael Yang
9e4642e9b3 ollama debug tensor 2025-03-11 14:49:19 -07:00
Michael Yang
6b0486c216 duplicate token_embd to output 2025-03-11 14:49:19 -07:00
Michael Yang
d368c039f0 skip repacking vision tensors 2025-03-11 14:49:19 -07:00
Patrick Devine
9b54267e69 fix configs 2025-03-11 14:49:19 -07:00
Michael Yang
46bb0169c4 update model 2025-03-11 14:49:19 -07:00
Michael Yang
8934324b72 use fast attention 2025-03-11 14:49:18 -07:00
Jesse Gross
0e886595bf Fix tests and drift from main 2025-03-11 14:49:18 -07:00
Patrick Devine
c62861f4fa fix conversion 2025-03-11 14:49:18 -07:00
Michael Yang
0df1800436 set non-causal attention 2025-03-11 14:49:18 -07:00
Patrick Devine
631fecc6d9 temporary work around for converting spm 2025-03-11 14:49:18 -07:00
Jesse Gross
4346c2409d fix drift from main 2025-03-11 14:49:18 -07:00
Michael Yang
4b037a97dc add gemma vision encoder 2025-03-11 14:49:17 -07:00
Patrick Devine
5f74d1fd47 gemma2 impl 2025-03-11 14:35:08 -07:00
Daniel Hiltgen
4dcf80167a Build release for windows with local script (#9636) 2025-03-11 08:34:20 -07:00
Michael Yang
26a26998fb Merge pull request #9590 from ollama/mxyng/dump-pad
fix: pad tensor item if ge zero
2025-03-10 16:34:55 -07:00
Michael Yang
9926eae015 fix: pad tensor item if ge zero
this produces a nicer output since both positive and negative values
produces the same width
2025-03-10 16:18:12 -07:00
Vincent Koc
8585b7b151 docs: add opik to observability integrations (#9626) 2025-03-10 16:15:10 -07:00
Parth Sareen
7e34f4fbfa sample: add numerical stability to temperature/softmax transform (#9631) 2025-03-10 14:43:53 -07:00
Michael Yang
fe776293f7 Merge pull request #9569 from dwt/patch-1
Better WantedBy declaration
2025-03-10 14:09:37 -07:00
frob
d8a5d96b98 docs: Add OLLAMA_CONTEXT_LENGTH to FAQ. (#9545) 2025-03-10 11:02:54 -07:00
Xiaowei Zhu
757668c42f docs: add SwiftChat (#9540) 2025-03-10 11:01:09 -07:00
Sam
96ec8afd09 docs(tool): add mcp-llm (#9537) 2025-03-10 09:52:02 -07:00
Jeffrey Morgan
e093db92c4 sample: temporarily use grammars for constrained generation in new engine (#9586) 2025-03-10 16:17:39 +01:00
Jesse Gross
a1cda80bcb model: Update encoder cache to use multimodal input processing handler
The encoder cache needs to know the position of images in the input
stream so that it knows when to delete them. Previously images didn't
have a position, so we implied one by breaking batches before an
image and then assuming the image was in the first position. However,
multimodal objects are now given explicit positions in the input
stream, so we can use that instead.

Breaking batches was also a way to simulate a cross attention mask
for mllama. However, given that it only supports a single sequence
and a single image, this mask doesn't serve any real purpose.
Removing the batch break does not appear to affect the quality of
the output.

Most of this is simply moving the input data structures to a new
package to avoid import cycles.
2025-03-09 17:05:26 -07:00
Jesse Gross
4614fafae0 ollamarunner: Don't panic for unimplemented features at runtime.
It's ok to fail on startup but we shouldn't panic during runtime
based on user input. Downgrade the panic to a warning.
2025-03-08 18:58:18 -08:00
Jesse Gross
4100ed7bdd ml: Add support for quantized KV cache
Similar to the llama engine, quantizing the KV cache requires
flash attention to be enabled through the Ollama server.
2025-03-07 18:43:39 -08:00
Jesse Gross
f52b2615ef kvcache: Set context for shift offsets 2025-03-07 18:43:39 -08:00
Jesse Gross
25f9b152f9 ggml-backend: Ensure allocation meet backend requirements
Backends can impose additional alignment requirements on buffer sizes.
We should ensure that we meet these or allocations can fail.
2025-03-07 18:43:39 -08:00
Jesse Gross
6da8b6a879 kvcache: Support non-causal attention
Models can disable causality for all or part of their processing
while continuing to store data in the KV cache.
2025-03-07 18:39:27 -08:00
Jesse Gross
0daaaef8c9 ollamarunner: Quiet debug logging and panic on unimplemented features
Debug logging of every token has previously caused test timeouts
on slower machines.
2025-03-07 18:38:02 -08:00
Jesse Gross
98272fbd58 additional review comments 2025-03-07 14:08:21 -08:00
Michael Yang
b27e8f3f10 ml/backend/ggml: use backend buffer type
this ensures the tensor is created on the right buffer type for backends
such as cpu
2025-03-07 14:08:21 -08:00
Michael Yang
45df786f09 comments 2025-03-07 14:08:21 -08:00
Michael Yang
daaf42e4a4 ml/backend/ggml: clean up 2025-03-07 14:08:21 -08:00
Michael Yang
2dc60d4620 ml/backend/ggml: offload vision to cpu
temporary until tensor loading can accurately account for vision models
2025-03-07 14:08:21 -08:00
Michael Yang
b5312f30e8 ml/backend/ggml: handle tensor split 2025-03-07 14:08:21 -08:00
Michael Yang
26c2e0bd35 ml/backend/ggml: handle user specified cpu offloading 2025-03-07 14:08:21 -08:00
Michael Yang
bf920883d5 ml/backend/ggml: set cpu n_threads 2025-03-07 14:08:21 -08:00
Michael Yang
58b9ec1f6b kvcache: update tests 2025-03-07 14:08:21 -08:00
Michael Yang
7bae7fa5ce ml/backend/ggml: create tensor on specific backend
some tensors should be created on specific backends to reduce number of
copies and improve performance
2025-03-07 14:08:21 -08:00
Michael Yang
764e199d67 kvcache: create cache ctx per layer
each cache layer creates and maintains its own context instead of using
a large context for all layers
2025-03-07 14:08:21 -08:00
Michael Yang
bfce55db3d model: load non-repeated tensors into multiple backends
some tensors are expected to be used in repeating layers but are not
themselves repeated. this change copies these tensors into the same
backends as their repeating counterparts to minimize copying tensors
between backends
2025-03-07 14:08:21 -08:00
Michael Yang
bab6f34dc0 ml/backend/ggml: update model loading for hybrid/multi backends
use a similar strategy as llama.cpp for deciding where tensors should be
allocated. this will be improved later to be aware of usable memory
before assigning the tensor
2025-03-07 14:08:21 -08:00
Parth Sareen
0682dae027 sample: improve ollama engine sampler performance (#9374)
This change bring in various interface cleanups along with greatly improving the performance of the sampler.

Tested with llama3.2 on local machine.
Improves performance from ~ 70 tokens/s -> 135 tokens/s with topK(40) enabled.
Without topK performance is ~ 110 tokens/s
2025-03-07 12:37:48 -08:00
Breaker
1f6986e919 readme: add QwQ to the supported models list (#9565) 2025-03-07 09:30:07 -08:00
Jeffrey Morgan
4289c74359 llama: fix kv loading on snowflake-arctic-embed models (#9536) 2025-03-07 09:25:34 -08:00
‮rekcäH nitraM‮
25248f4bd5 Better WantedBy declaration
The problem with default.target is that it always points to the target that is currently started. So if you boot into single user mode or the rescue mode still Ollama tries to start.

I noticed this because either tried (and failed) to start all the time during a system update, where Ollama definitely is not wanted.
2025-03-07 10:26:31 +01:00
Jesse Gross
a7e63b82be ollamarunner: Improve multimodal input handling
Various vision models have different requirements for how they
receive their inputs. For example:
 - Mllama wants images together with text and the image embeddings
   don't themselves have positions or get stored in the main KV cache
 - Llava-style models feed in embeddings similar to tokens and
   images correspond to a varying number of tokens in the cache.

In addition, the strategy for providing inputs must support batching
and multiple sequences, which are managed by the runner. At the same
time, we want to keep data handling fully in the model so that new
architectures are not bottlenecked by runner code which does not
understand their particular requirements.

This provides a method for models to edit the input stream so that
it meets their needs while still being in a format that the runner
understands. This allows the runner to avoid special processing
for different models.

In addition, this fixes a regression where non-vision models may
try to incorrectly interpret images.
2025-03-06 16:54:16 -08:00
Jesse Gross
b70fc4d51e model: Don't unconditionally add special tokens
We sometimes tokenize partial strings. For example, with
multimodal inputs, we split the input string around the images
and then tokenize each piece. In these cases, we should only add
the special tokens on the first piece.
2025-03-06 16:54:16 -08:00
Blake Mizerany
e2252d0fc6 server/internal/registry: take over pulls from server package (#9485)
This commit replaces the old pull implementation in the server package
with the new, faster, more robust pull implementation in the registry
package.

The new endpoint, and now the remove endpoint too, are behind the
feature gate "client2" enabled only by setting the OLLAMA_EXPERIMENT
environment variable include "client2".

Currently, the progress indication is wired to perform the same as the
previous implementation to avoid making changes to the CLI, and because
the status reports happen at the start of the download, and the end of
the write to disk, the progress indication is not as smooth as it could
be. This is a known issue and will be addressed in a future change.

This implementation may be ~0.5-1.0% slower in rare cases, depending on
network and disk speed, but is generally MUCH faster and more robust
than the its predecessor in all other cases.
2025-03-05 14:48:18 -08:00
Daniel Hiltgen
cae5d4d4ea Win: doc new rocm zip file (#9367)
To stay under the 2G github artifact limit, we're splitting ROCm
out like we do on linux.
2025-03-05 14:11:21 -08:00
Michael Yang
05a01fdecb ml/backend/ggml: consolidate system info logging
- output backend system info when initializing the backend. this ensures
  this information is always present without needing to be called
  explicitly
- convert to structured logging
- enumerate devices rather than backends since devices are ordered
- track device indices grouped by device name
2025-03-04 15:14:31 -08:00
aritra saha
8fe6f69f28 docs: add granite-3.2 to the readme 2025-03-04 11:10:56 -08:00
Daniel Hiltgen
1fdb351c37 New engine: vision models and auto-fallback (#9113)
* Include unified vision layers in memory prediction

For newer vision models with a single gguf, include
the projection estimates.

* Adjust CLI to handle both styles of vision model metadata

* Wire up new tokenizers for new engine

If we're loading the new engine, utilize the new model
text processor instead of calling into cgo wrappers for
llama.cpp.  This also cleans up some tech debt from the
older tokenization flow for the C++ server which was
no longer used.

This also adjusts the grammar handling logic to pass
through to the new engine instead of utilizing the cgo
schema to grammar call.

* Lay foundation for auto selection of new engine
2025-03-04 09:03:46 -08:00
Blake Mizerany
7a01ad7614 server/internal/registry: reintroduce pruning on model deletion (#9489)
This reintroduces aggressive pruning on model deletion as a temporary
measure until a more controlled garbage collection (GC) mechanism is
implemented.

Issues with the current approach:

1. Users may accidentally delete a model (`ollama rm llama3.3` instead
   of `ollama rm llama3.2`), requiring a full re-download unless another
   model references the same blobs.

2. Users may assume a deleted model is still referenced elsewhere, but
   due to prior updates or deletions, the references no longer exist,
   leading to unnecessary re-downloads.

Soon, we should implement a structured GC mechanism to retain
unreferenced blobs for a configurable period before removal, which will
run on "ollama rm" and other commands we deem appropriate.

Users that want to immediately remove unreferenced blobs can use a new
prune command that will allow them to specify the age and class of blobs
to remove.

Example usage:

    # Run basic blob GC
    $ ollama prune

    # Remove unreferenced blobs older than 7 days
    $ ollama prune --age 7d

    # Remove all blobs, referenced or not, older than 7 days (and their manifests?)
    $ ollama prune --age 7d --all

    # Remove all unreferenced blobs immediately
    $ ollama prune --age 0 --all

    # Remove all blobs
    $ ollama prune --age 0 --all

This should provide a safer and more predictable cleanup process.
2025-03-03 19:11:16 -08:00
Blake Mizerany
55ab9f371a server/.../backoff,syncs: don't break builds without synctest (#9484)
Previously, developers without the synctest experiment enabled would see
build failures when running tests in some server/internal/internal
packages using the synctest package. This change makes the transition to
use of the package less painful but guards the use of the synctest
package with build tags.

synctest is enabled in CI. If a new change will break a synctest
package, it will break in CI, even if it does not break locally.

The developer docs have been updated to help with any confusion about
why package tests pass locally but fail in CI.
2025-03-03 16:45:40 -08:00
KindBrave
fefbf8f74b docs: add Ollama Android Chat community integration 2025-03-03 16:38:32 -08:00
Michael Yang
b428ddd796 docker: use go version from go.mod 2025-03-03 13:02:02 -08:00
Michael Yang
ba7d31240e fix: own lib/ollama directory
expand backend loading error handling to catch more problems and log
them instead of panicing
2025-03-03 13:01:18 -08:00
CYJiang
d25efe3954 cmd: add default err return for stop (#9458) 2025-03-03 12:13:41 -08:00
Mark
36dfb906bb docs: don't use self-closing tag for anchor element (#9456) 2025-03-03 11:56:34 -08:00
aritra saha
a6f0f908b9 docs: update phi3-mini to phi4-mini (#9424)
* Update README.md

removed phi 3 mini and added phi4-mini

* Update README.md

---------

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2025-03-03 11:09:21 -08:00
İbrahim Çetin
3b1ddb2b3a docs: add reins to community integrations (#9411) 2025-03-03 11:06:30 -08:00
Jeffrey Morgan
1579c4f06d build: install binutils alongside gcc in Dockerfile (#9475) 2025-03-03 01:20:49 -08:00
Blake Mizerany
3519dd1c6e server/internal/client/ollama: hold DiskCache on Registry (#9463)
Previously, using a Registry required a DiskCache to be passed in for
use in various methods. This was a bit cumbersome, as the DiskCache is
required for most operations, and the DefaultCache is used in most of
those cases. This change makes the DiskCache an optional field on the
Registry struct.

This also changes DefaultCache to initialize on first use. This is to
not burden clients with the cost of creating a new cache per use, or
having to hold onto a cache for the lifetime of the Registry.

Also, slip in some minor docs updates for Trace.
2025-03-02 20:55:44 -08:00
Jeffrey Morgan
e41c4cbea7 build: install ccache manually in Dockerfile (#9464)
Reverts ccache installation to be done manually via curl instead of
using the dnf package manager as this has side effects of prepending
ccache's install directory to the front of the PATH
2025-03-02 16:48:31 -08:00
Blake Mizerany
ee048b76d4 server/internal/client/ollama: handle extended names in client/ollama (#9454)
The extended name format is a superset of the name format that only the
client needs to know about, not the server or other dependents of the
name package, so move the split logic into the client package.

Also, take advantage of knowing about the extended name format to allow
the client to use the extended name format when unlinking to verify they
are unlinking the manifest with the content they intend.
2025-03-02 13:30:41 -08:00
Soulter
af68d60a58 readme: add AstrBot to community integrations (#9442) 2025-03-01 21:58:34 -08:00
Jesse Gross
21aa666a1e ml: Enable support for flash attention
The GGML flash attention kernel has specific requirements for
padding and permutation. This adds support to the KV cache
for conforming to these requirements so that flash attention
can be enabled.

Flash attention can be used in the same situations as the llama
engine and is enabled by the user in the same way.
2025-03-01 20:53:23 -08:00
Jesse Gross
ee141cc821 ml: Empty tensor constructor for tensors
In cases where we allocate a tensor and then fully overwrite it with
copied data, it is wasteful to first zero out the memory.
2025-03-01 20:53:23 -08:00
Jesse Gross
55e5776c44 ggml-backend: Store parent backend as part of tensor
It can be important for a tensor to know what backend it came from -
for example, to know if flash attention is enabled.
2025-03-01 20:53:23 -08:00
Jesse Gross
854a9195f3 attention: Remove unnecessary contiguous operations
Prior to performing attention, we need to permute query, key
and value. Currently we call Contiguous after each of these
permutations, which is correct but expensive. Avoiding the
3 calls to Contiguous increases performance by over 20%.

The permutations of query and key do not violate the continuity
rules for mulmat and the Contiguous call can be simply removed.

Value requires a different permutation and does require Contiguous.
However, we can use the copy into the cache as a way to perform this
without further overhead.

To support this and avoid unexpected tensor shapes that are seen by
models, we need tighter integration between attention, cache
and backend. Future optimization will also likely need this structure
 - for example, flash attention has special padding requirements in
the cache and other backends may have their own needs.

This further contains the operations that go into attention so that
these and other optimizations can be handled transparently. Models
that have special requirements for attention can still implement
their own version of it.
2025-03-01 20:53:23 -08:00
Jeffrey Morgan
96a97adf9b build: use correct GGML_HIP_NO_VMM compiler definition for ggml-hip (#9451) 2025-03-01 17:00:31 -08:00
Jeffrey Morgan
e75c6126e9 build: set GGML_CUDA_NO_VMM for ggml-hip target (#9449) 2025-03-01 14:02:19 -08:00
Blake Mizerany
cda6f5c66c server/internal/internal/names: validate names (#9400)
This commit is a step towards a goal to make names less ceremonial
outside of the registry client. Clients of the registry package can
treat names as opaque strings, and the registry package will handle
parsing, validating, and normalizing names.

Ideally we end up with the names package tucked away in an internal
package for good. We'll see how things go.

Also, this package name is not permanent. This another step in the
on-going process of refactoring the server code, and at some point it
will most likely be renamed/moved.
2025-03-01 13:15:14 -08:00
Bruce MacDonald
bebb6823c0 server: validate local path on safetensor create (#9379)
More validation during the safetensor creation process.
Properly handle relative paths (like ./model.safetensors) while rejecting absolute paths
Add comprehensive test coverage for various paths
No functionality changes for valid inputs - existing workflows remain unaffected
Leverages Go 1.24's new os.Root functionality for secure containment
2025-02-28 16:10:43 -08:00
Michael Yang
31e472baa4 runner: defer context cancel
defer the cancel to guarantee it runs
2025-02-28 22:27:28 +00:00
Michael Yang
657685e85d fix: replace deprecated functions 2025-02-28 21:29:34 +00:00
Jeffrey Morgan
a14912858e build: add compute capability 12.0 to CUDA 12 preset (#9426)
Focuses initial Blackwell support on compute capability 12.0
which includes the 50x series of GeForce cards. In the future
additional compute capabilities may be added
2025-02-28 13:12:31 -08:00
Blake Mizerany
eed11ded30 server/.../safetensors: fix offsets and include all model parts (#9427)
Also, require the -as flag to be set when importing a model. This
prevents the confusing error message "invalid name".

Also, allow short names to be used when importing a model and
auto-complete the name with the default mask.
2025-02-28 13:08:10 -08:00
Michael Yang
b42aba40ed cuda: enable flash attention
ggml added an option to disable flash attention so explicitly enable it
2025-02-28 19:40:34 +00:00
王贺
25885e5335 docs: Add 1Panel to Community Integrations (#9312) 2025-02-28 09:53:03 -08:00
Jeffrey Morgan
98d44fa39d llama: add phi4 mini support (#9403) 2025-02-27 19:30:32 -08:00
Blake Mizerany
2099e2d267 CONTRIBUTING: provide clarity on good commit messages, and bad (#9405)
Also, our commit messages have been getting better, but we can do
better, and be more consistent. This adds more clarity on how to write
commit messages and provides examples of good and bad messages.

Also, our contributing guide was lacking helpful guidance on how to
start change proposals. This commit adds the start of that section.

Soon, we should add a proposal template to the issue tracker with a link
back to the proposal section, which should also be expanded upon.
2025-02-27 19:22:26 -08:00
Bruce MacDonald
0c1041ad85 runner: default to greedy sampler for performance (#9407)
As are adding support for weighted sampling we have seen some performance
regressions, bypassing the sampler logic for now and defaulting to greedy
until we can benchmark the new sampler logic.
2025-02-27 16:41:20 -08:00
Parth Sareen
c245b0406f sample: remove transforms from greedy sampling (#9377) 2025-02-27 15:44:53 -08:00
Michael Yang
8b194b7520 kvcache: update tests 2025-02-27 22:27:16 +00:00
Michael Yang
3e8b8a1933 ml: update Context.Forward interface
update Context.Forward to accept multiple tensors to match
Context.Compute signature

update Context.Forward to return Context such that it can be chained
with Context.Compute
2025-02-27 22:27:16 +00:00
Blake Mizerany
41dc280491 server/internal/registry: implement CloseNotify and Flush (for now) (#9402)
This fixes panics introduced in 2412adf42b
when Gin ungracefully assumes that the http.ResponseWriter implements
http.CloseNotifier and http.Flusher, which our new statusCodeRecorder
does not. This is a temporary fix until we can pour the rest of the Gin
out.
2025-02-27 14:00:37 -08:00
Michael Yang
53d2990d9b model: add bos token if configured 2025-02-27 21:04:59 +00:00
Jesse Gross
e185c08ad9 go.mod: Use full version for go 1.24.0
Otherwise on Linux I get:
go: download go1.24 for linux/amd64: toolchain not available
2025-02-27 13:01:32 -08:00
Blake Mizerany
2412adf42b server/internal: replace model delete API with new registry handler. (#9347)
This commit introduces a new API implementation for handling
interactions with the registry and the local model cache. The new API is
located in server/internal/registry. The package name is "registry" and
should be considered temporary; it is hidden and not bleeding outside of
the server package. As the commits roll in, we'll start consuming more
of the API and then let reverse osmosis take effect, at which point it
will surface closer to the root level packages as much as needed.
2025-02-27 12:04:53 -08:00
Steven Hartland
be2ac1ed93 docs: fix api examples link (#9360)
Fix the examples link in the go package documentation for the API.
2025-02-27 10:51:12 -08:00
Eries Trisnadi
dc13813a03 server: allow vscode-file origins (#9313) 2025-02-27 10:39:43 -08:00
Michael Yang
d6af13efed runner: simplify tensor split parsing 2025-02-27 18:36:46 +00:00
Michael Yang
a59f665235 ml/backend/ggml: fix debug logging 2025-02-27 18:30:57 +00:00
Daniel Hiltgen
688925aca9 Windows ARM build (#9120)
* Windows ARM build

Skip cmake, and note it's unused in the developer docs.

* Win: only check for ninja when we need it

On windows ARM, the cim lookup fails, but we don't need ninja anyway.
2025-02-27 09:02:25 -08:00
Blake Mizerany
76e903cf9d .github/workflows: swap order of go test and golangci-lint (#9389)
The linter is secondary to the tests, so it should run after the tests,
exposing test failures faster.
2025-02-26 23:03:48 -08:00
Jeffrey Morgan
a5272130c4 ml/backend/ggml: follow on fixes after updating vendored code (#9388)
Fixes sync filters and lowers CUDA version to 11.3 in test.yaml
2025-02-26 22:33:53 -08:00
Jeffrey Morgan
d7d7e99662 llama: update llama.cpp vendor code to commit d7cfe1ff (#9356) 2025-02-26 20:34:44 -08:00
Gordon Kamer
2db96c18e7 readme: add Nichey to community integrations (#9370) 2025-02-26 10:40:53 -08:00
Daniel Hiltgen
e12af460ed Add cuda Blackwell architecture for v12 (#9350)
* Add cuda Blackwell architecture for v12

* Win: Split rocm out to separate zip file

* Reduce CC matrix

The 6.2 and 7.2 architectures only appear on Jetsons, so they were wasting space.
The 5.0 should be forward compatible with 5.2 and 5.3.
2025-02-26 09:20:52 -08:00
Jeffrey Morgan
3ad4bc8afe llama: removed unused 'vendoring' file (#9351) 2025-02-25 14:33:03 -08:00
Blake Mizerany
0d694793f2 .github: always run tests, and other helpful fixes (#9348)
During work on our new registry client, I ran into frustrations with CI
where a misspelling in a comment caused the linter to fail, which caused
the tests to not run, which caused the build to not be cached, which
caused the next run to be slow, which caused me to be sad.

This commit address these issues, and pulls in some helpful changes
we've had in CI on ollama.com for some time now.

They are:

* Always run tests, even if the other checks fail.

Tests are the most important part of CI, and should always run. Failures
in tests can be correlated with failures in other checks, and can help
surface the root cause of the failure sooner. This is especially
important when the failure is platform specific, and the tests are not
platform independent.

* Check that `go generate` is clean.

This prevents 'go generate' abuse regressions. This codebase used to use
it to generate platform specific binary build artifacts. Let's make sure
that does not happen again and this powerful tool is used correctly, and
the generated code is checked in.

Also, while adding `go generate` the check, it was revealed that the
generated metal code was putting dates in the comments, resulting in
non-deterministic builds. This is a bad practice, and this commit fixes
that. Git tells us the most important date: the commit date along with
other associated changes.

* Check that `go mod tidy` is clean.

A new job to check that `go mod tidy` is clean was added, to prevent
easily preventable merge conflicts or go.mod changes being deferred to a
future PR that is unrelated to the change that caused the go.mod to
change.

* More robust caching.

We now cache the go build cache, and the go mod download cache
independently. This is because the download cache contains zips that can
be unpacked in parallel faster than they can be fetched and extracted by
tar. This speeds up the build significantly.

The linter is hostile enough. It does not need to also punish us with
longer build times due to small failures like misspellings.
2025-02-25 14:28:07 -08:00
Daniel Hiltgen
e91ae3d47d Update ROCm (6.3 linux, 6.2 windows) and CUDA v12.8 (#9304)
* Bump cuda and rocm versions

Update ROCm to linux:6.3 win:6.2 and CUDA v12 to 12.8.
Yum has some silent failure modes, so largely switch to dnf.

* Fix windows build script
2025-02-25 13:47:36 -08:00
José Pekkarinen
6ecd7f64ba docker: upgrade rocm to 6.3.3 (#8211)
centos-7 images have been deprecated upstream and replaced with
almalinux-8 images instead, requiring some small extra work.

Signed-off-by: José Pekkarinen <jose.pekkarinen@foxhound.fi>
2025-02-25 13:38:08 -08:00
Chuanhui Liu
888855675e docs: rocm install link (#9346) 2025-02-25 13:15:47 -08:00
Michael Yang
b16367b4b2 fix: add back bf16 support
this was accidentally removed when moving fs/ggml from its previous
location
2025-02-25 19:26:14 +00:00
Pavol Rusnak
a499390648 build: support Compute Capability 5.0, 5.2 and 5.3 for CUDA 12.x (#8567)
CUDA 12.x still supports Compute Capability 5.0, 5.2 and 5.3,
so let's build for these architectures as well
2025-02-25 09:54:19 -08:00
frob
4df98f3eb5 Move cgroups fix out of AMD section. (#9072)
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-02-25 08:52:50 -08:00
Blake Mizerany
348b3e0983 server/internal: copy bmizerany/ollama-go to internal package (#9294)
This commit copies (without history) the bmizerany/ollama-go repository
with the intention of integrating it into the ollama as a replacement
for the pushing, and pulling of models, and management of the cache they
are pushed and pulled from.

New homes for these packages will be determined as they are integrated
and we have a better understanding of proper package boundaries.
2025-02-24 22:39:44 -08:00
Parth Sareen
0b7e1676eb sample: add sampling package for new engine (#8410) 2025-02-24 17:19:01 -08:00
Parth Sareen
314573bfe8 config: allow setting context length through env var (#8938)
* envconfig: allow setting context length through env var
2025-02-24 13:26:35 -08:00
Blake Mizerany
4604b10306 go.mod: bump to go1.24 (#9242) 2025-02-24 13:11:46 -08:00
Jeffrey Morgan
8c13cfa4dd ml/backend/ggml: fix crash on windows paths with wide characters (#9305) 2025-02-23 19:13:53 -08:00
Jeffrey Morgan
7cfd4aee4d docs: add additional ROCm docs for building (#9066) 2025-02-22 11:22:59 -08:00
Blake Mizerany
68bac1e0a6 server: group routes by category and purpose (#9270)
The route assembly in Handler lacked clear organization making it
difficult scan for routes and their relationships to each other. This
commit aims to fix that by reordering the assembly of routes to group
them by category and purpose.

Also, be more specific about what "config" refers to (it is about CORS
if you were wondering... I was.)
2025-02-21 21:02:26 -08:00
Jesse Gross
f53f4198c3 ml: Abstract attention out of model definitions
There are two benefits to doing this:
 - Provide a library function that models can use, reducing code for
   each model implementation
 - Enables a single place to drop in optimized implementations of
   attention based on the backend or other factors. One is provided for
   GGML.

On CUDA this improves token generation rate by about 3%. It does not
have a significant effect on Metal.

Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
2025-02-21 13:16:21 -08:00
Michael Yang
2192a28eed ml/backend/ggml: fix rms norm 2025-02-21 18:34:19 +00:00
Junyan Qin (Chin)
5d81c1a184 docs: add RockChinQ/LangBot to integrations list (#9272) 2025-02-21 09:36:55 -08:00
Jesse Gross
5c5535c064 models: Prune unused outputs earlier in the forward pass
Currently Rows is called as the last step in a model computation
to get the values for the output tokens. However, if we move it
earlier in the process then we can trim out computations that
never get used. This is similar to how models are defined in
llama.cpp.

Changing the model definition in this way improves token generation
performance by approximately 8%.
2025-02-20 14:49:47 -08:00
Jesse Gross
e5bcc51ae1 ggml-backend: Don't recreate the scheduler for each context
We don't need to create and destroy the GGML scheduler for every
context. This introduces extra CPU overhead for every forward
pass and extra memory for contexts that don't actually get scheduled
(for example, KV caches). We can instead just have one scheduler
for the backend and reset it each time we call Compute.

This improves token generation performance by 1-2% and removes
scheduler create/destroy from profile traces.
2025-02-20 14:49:47 -08:00
Jesse Gross
bd6a7d5e64 ollamarunner: Pass runner performance parameters to backends
Currently the following parameters are in the runner but not used:
 - numGPULayers
 - mainGPU
 - threads
 - tensorSplit

This passes them through to the backend, which is where they would
actually get used. However, the GGML backend does not yet do anything
with them.
2025-02-20 13:27:57 -08:00
Bruce MacDonald
14b5a9a150 api: document client stream behavior with a test (#8996)
Added unit tests to verify error handling behavior in the Client.stream and Client.do methods.
Tests cover various error scenarios including:
- Error responses with status codes >= 400
- Error messages with successful status codes
- Empty error messages
- Successful responses
2025-02-20 13:19:58 -08:00
Michael Yang
ba9ec3d05e ci: use clang for windows cpu builds
clang outputs are faster. we were previously building with clang via gcc
wrapper in cgo but this was missed during the build updates so there was
a drop in performance
2025-02-20 20:22:36 +00:00
frob
7c168b08c9 server: add missing function parens to debug log (#9255) 2025-02-20 12:10:15 -08:00
danielekp
3d4cc7833c docs: Add yla to community integrations 2025-02-20 11:34:24 -08:00
Lucas Hahn
351a85d9ea openai: add 'timeout' to allowable x-stainless headers (#9237) 2025-02-19 21:56:18 -08:00
Michael Yang
bda4ef6c56 reorder patches 2025-02-20 03:49:24 +00:00
Michael Yang
1e438b237c Merge pull request #9203 from ollama/mxyng/sapphirerapids
build: remove backend build for sapphirerapids
2025-02-19 21:42:00 +00:00
yuiseki
d721a02e7d test: add test cases for ListHandler (#9146) 2025-02-19 13:24:27 -08:00
zyxucp
778603a818 docs: Add AntSK to Community Integrations (#9214) 2025-02-19 13:22:48 -08:00
maninhill
3c874df46e docs: Add MaxKB to Community Integrations (#9212) 2025-02-19 13:20:09 -08:00
Jeffrey Morgan
d2eb226c91 llama: add patch to fix ggml backend reg on Linux with utf-8 characters in the path (#9159) 2025-02-18 22:46:17 -05:00
Michael Yang
e13e7c8d94 Merge pull request #9079 from jeremyschlatter/main
cmd: fix flickering in progress bar
2025-02-18 22:59:29 +00:00
Jeremy Schlatter
78f403ff45 address code review comments 2025-02-18 14:50:09 -08:00
Michael Yang
5f8c03189e build: remove backend build for sapphirerapids
sapphire rapids has amx support but it ends up having a negative
performance impact.

emerald rapids also has amx support with a positive performance impact
however there's no reasonable way in ggml to differentiate between the
two. the impact is small (~6%) so disable amx entirely for simplicity
2025-02-18 14:47:58 -08:00
Michael Yang
08a299e1d0 cmake: avoid building intel backends on linux 2025-02-18 22:17:00 +00:00
Michael Yang
7b5d916a9a ci: set owner/group in tarball
set owner and group when building the linux tarball so extracted files
are consistent. this is the behaviour of release tarballs in version
0.5.7 and lower
2025-02-18 20:11:09 +00:00
benhaotang
33ad61b112 Add OpenDeepResearcher-via-searxng to Community Integrations (#9138) 2025-02-18 11:39:11 -08:00
L. Jiang
716e365615 test: add test cases for HumanNumber (#9108) 2025-02-18 11:35:26 -08:00
innightwolfsleep
3b4424ff98 readme: add LLM Telegram Bot to community integrations (#9150) 2025-02-18 10:04:30 -05:00
Jeremy Schlatter
f9c7ead160 cmd: eliminate flickering with synchronized output 2025-02-17 20:01:03 -08:00
Jeremy Schlatter
5930aaeb1a cmd: fix cursor flickering in progress bar
The previous commit fixed flickering in the progress bar itself. Cursor
flickering is harder to address.

Cursor flickering could be fixed by hiding the cursor altogether while
the progress bar is displayed. The downside of this is that if the
program is killed in such a way that it can't clean up its state, it
would leave the cursor invisible.

Instead, this commit introduces an output buffer. All of the escape
codes and content for a single progress update are written to a buffer,
which is then flushed to the terminal all at once. This significantly
decreases the time during which the terminal has seen the cursor-hiding
code but has not yet seen the cursor-showing code, thus minimizing (but
not 100% eliminating) cursor flickering.

For more context, see:
https://gitlab.gnome.org/GNOME/vte/-/issues/2837#note_2269501
2025-02-17 14:56:57 -08:00
Jeremy Schlatter
faf67db089 cmd: fix progress bar flickering
Previous code cleared the display before writing new content, creating a
window where the terminal could (and in some cases did) render empty lines.

Instead, we now write new content over the old content, only clearing
the trailing end of lines for cases where the new line is shorter.

Fixes #1664
2025-02-17 13:39:02 -08:00
James-William-Kincaid-III
0667baddc6 docs: fix incorrect shortcut key in windows.md (#9098) 2025-02-15 15:38:24 -05:00
Bruce MacDonald
d006e1e09b model: document high-level model interface (#9122) 2025-02-14 16:01:00 -08:00
Daniel Hiltgen
df2680b4b9 Wire up system info log for new engine (#9123) 2025-02-14 15:55:33 -08:00
Jesse Gross
010313bb63 llamarunner: Init GGML before printing system info
We currently print system info before the GGML backends are loaded.
This results in only getting information about the default lowest
common denominator runner. If we move up the GGML init then we can
see what we are actually running.

Before:
time=2025-02-14T11:15:07.606-08:00 level=INFO source=runner.go:935 msg=system info="CPU : LLAMAFILE = 1 | CPU : LLAMAFILE = 1 | cgo(gcc)" threads=24

After:
time=2025-02-14T11:16:02.936-08:00 level=INFO source=runner.go:935 msg=system info="CPU : LLAMAFILE = 1 | CPU : LLAMAFILE = 1 | CUDA : ARCHS = 890 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | LLAMAFILE = 1 | cgo(gcc)" threads=24
2025-02-14 11:41:53 -08:00
Jeffrey Morgan
5296f487a8 llm: attempt to evaluate symlinks, but do not fail (#9089)
provides a better approach to #9088 that will attempt to
evaluate symlinks (important for macOS where 'ollama' is
often a symlink), but use the result of os.Executable()
as a fallback in scenarios where filepath.EvalSymlinks
fails due to permission erorrs or other issues
2025-02-13 22:37:59 -08:00
Jeffrey Morgan
f05774b04c llm: do not evaluate symlink for exe path lookup (#9088)
In some cases, the directories in the executable path read by
filepath.EvalSymlinks are not accessible, resulting in permission
errors which results in an error when running models. It also
doesn't work well on long paths on windows, also resulting in
errors. This change removes filepath.EvalSymlinks when accessing
os.Executable() altogether
2025-02-13 22:13:00 -08:00
Jeffrey Morgan
6600bd7d91 ml/backend/ggml: stable sort devices by score (#9081) 2025-02-13 18:42:36 -08:00
Jesse Gross
ed443a0393 Runner for Ollama engine
This provides integration with the new Ollama engine
(5824541 next ollama runner (#7913)) and the rest of the Ollama
infrastructure such as the runner and Ollama server.

In addition, it also builds out the KV cache infrastructure to
support requirements of how Ollama runs models such as:
 - Parallel processing
 - Memory management for defragmentation and shifting
 - Multi-modal modals

Both old and new engines continue to be supported. By default, only
the old engine is used. To enable the new engine:

Start the server with the OLLAMA_NEW_ENGINE environment variable set:
OLLAMA_NEW_ENGINE=1 ./ollama serve

Start a model that is supported by the Ollama engine. This one is Llama 3.1 8b Q4_K_M:
./ollama run jessegross/llama3.1
2025-02-13 17:09:26 -08:00
Jesse Gross
6945617af5 models: Move model into their own directory
This allows there to be a file that is a list of models that is
not mixed into the runner code.
2025-02-13 17:09:26 -08:00
Jesse Gross
7916f55009 vocab: Use int32 for special tokens
Special tokens are currently read as uint32 from the model metadata.
However, all other parts of the system (including the tokenizer) use
int32 to represent tokens so it is impossible to represent the high
portion of the unsigned range. For consistency and to avoid casts,
we should just use int32 everywhere.
2025-02-13 17:09:26 -08:00
Jesse Gross
d650ad398f model: Load tensors behind an interface
Currently, if a model uses an interface for its data structures (as mllama
does) then the tensor data in the structs implementing that interface will
not get loaded.
2025-02-13 17:09:26 -08:00
Jesse Gross
d223f3b697 ggml-backend: Close on nil should be a no-op 2025-02-13 17:09:26 -08:00
Jesse Gross
60830695c2 ggml-backend: Ensure data is available after async computation
We need to sync before retrieving data after async computation.
It is also important to ensure that the Go buffer is not moved by
the GC across function calls so we do a synchronous copy.
2025-02-13 17:09:26 -08:00
Jesse Gross
01d9a46854 ggml-backend: Let GGML allocate context memory
Passing in a Go buffer is not safe because the garbage collector could
free or move the memory while the context is still open. However, if
we pass in the size and a nil pointer then GGML will allocate it from
the C side.
2025-02-13 17:09:26 -08:00
Jesse Gross
d773b7d671 backend: API to support full precision matmul
Most tensor backends try to optimize performance by using a lower
precision for matmuls. However, some operations (such as kq) on
some models are sensitive to this and require full precision.
2025-02-13 17:09:26 -08:00
Jesse Gross
4d4463b2bd backend: Support graph computation that does not return an output
There are two cases where we may not have an output after computing:
 - Prompt processing where the length of the input exceeds the batch
   size
 - Internal memory management operations such as cache defrag and shift
2025-02-13 17:09:26 -08:00
Jesse Gross
0e38297f87 backend: Consistently use int (vs. int64) for tensor shapes
Currently there is a mixture of int and int64 used when dealing with
tensor dimensions and shapes, which causes unnecessary conversions -
they all should be the same type.

In general, most interfaces (such as Pytorch) use int64 for
generality but most implementations (such as CUDA) use int32 for
performance. There isn't much benefit to us to being more flexible
than the implementations we are likely to run on.

In addition, as a practical matter, a model with a tensor with a single
dimension larger than 32 bits is unlikely to run on a 32-bit machine.
2025-02-13 17:09:26 -08:00
Jesse Gross
7e13f568dc backend: Don't return an error on Close
It is not common to return errors with close/free operations - most
people won't check it and even if they did there's probably not much
that can do. It's better to not give implementations false expectations.
2025-02-13 17:09:26 -08:00
Michael Yang
58245413f4 next ollama runner (#7913)
feat: add new Ollama engine using ggml through cgo

This change introduces a new way to run pretrained models. It introduces 3 high level interfaces and a bunch of smaller helper interfaces to facilitate this.

- `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go`
- `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`
- `ml.Tensor` defines the interface for a tensor and tensor operations

This is the first implementation of the new engine. Follow up PRs will implement more features:

- non-greedy sampling (#8410)
- integration with Ollama and KV caching (#8301)
- more model support (#9080) with more coming soon

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2025-02-13 16:31:21 -08:00
Bùi Đức Nhật
8cf16063a5 docs: add ollamazing to the README.md (#9075) 2025-02-13 10:47:09 -08:00
frob
3a4449e2f1 docs: add H200 as supported device. (#9076)
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-02-13 10:44:23 -08:00
Anuraag (Rag) Agrawal
10d59d5f90 openai: finish_reason as tool_calls for streaming with tools (#7963) 2025-02-13 10:20:12 -08:00
Jeffrey Morgan
a4f69a0191 build: add -DGGML_CUDA_NO_PEER_COPY=ON for rocm builds on windows (#9060) 2025-02-13 00:23:17 -08:00
Clinton
82658c3eec readme: add Homebrew to package managers section (#9052) 2025-02-12 11:17:39 -08:00
bloominstrong
378d6e1e6a docs: fix nix package link (#9045)
removing the channel tag from the url so it will always go to the current stable channel.
2025-02-12 09:16:26 -08:00
Hugues Chocart
afa55bc70c doc: fix link for Abso (#9043) 2025-02-12 09:15:08 -08:00
Michael Yang
49df03da9a fix: harden backend loading (#9024)
* wrap ggml_backend_load_best in try/catch
* ignore non-ollama paths
2025-02-11 15:36:53 -08:00
Hugues Chocart
0189bdd0b7 readme: add Abso SDK to community integrations (#8973) 2025-02-11 00:14:45 -08:00
Jeffrey Morgan
f4711da7bd ml/backend/ggml: fix crash on dlopen for non-AVX systems (#8976) 2025-02-10 09:52:12 -08:00
Hugues Chocart
38117fba83 readme: add Lunary to observability community integrations (#8975) 2025-02-09 22:08:46 -08:00
Michael Yang
1f766c36fb ci: use windows-2022 to sign and bundle (#8941)
ollama requires vcruntime140_1.dll which isn't found on 2019. previously
the job used the windows runner (2019) but it explicitly installs
2022 to build the app. since the sign job doesn't actually build
anything, it can use the windows-2022 runner instead.
2025-02-08 13:07:00 -08:00
Qusai Ismael
484a99e428 docs: add LocalLLM app to community integrations (#8953) 2025-02-08 12:28:01 -08:00
DravenK
ec6121c331 docs: ollama zig community lib (#8688) 2025-02-08 11:10:47 -08:00
Jeffrey Morgan
b86c0a1500 docs: link directly to latest release page for tdm-gcc (#8939) 2025-02-08 00:21:10 -08:00
Guddu Kumar
7e402ebb8c readme: add deepseek to supported models 2025-02-07 11:28:28 -08:00
Azis Alvriyanto
b901a712c6 docs: improve syntax highlighting in code blocks (#8854) 2025-02-07 09:55:07 -08:00
Michael Yang
abb8dd57f8 add gfx instinct gpus (#8933) 2025-02-07 09:51:22 -08:00
Leisure Linux
a400df48c0 docs: include port in faq.md OLLAMA_HOST examples (#8905) 2025-02-06 18:45:09 -08:00
annilq
6ab4ba4c26 readme: add React Native client to community integrations (#8877) 2025-02-06 17:15:48 -08:00
CosmicEventHorizon
e8d4eb3e68 readme: add ChibiChat to community integrations (#8883) 2025-02-06 16:08:46 -08:00
Michael Yang
ae7e368f75 build(rocm): add numa, elf (#8900) 2025-02-06 15:46:30 -08:00
oslook
31acd1ebf9 readme: add Ollama Chat WebUI for Docker to community integrations (#8084) 2025-02-06 15:41:02 -08:00
Michael Yang
9a4757ae66 build(rocm): add tinfo (#8899) 2025-02-06 15:08:12 -08:00
Abhinav Pant
7814019708 docs: add step for removing libraries in linux.md (#8897) 2025-02-06 14:54:58 -08:00
Michael Yang
b698f9a0d8 build: add missing dependencies (#8896) 2025-02-06 13:12:16 -08:00
Azis Alvriyanto
32285a6d19 format: rename test file from byte_test.go to bytes_test.go (#8865) 2025-02-06 13:06:15 -08:00
Michael Yang
1c198977ec ci: fix linux archive (#8862)
the find returns intermediate directories which pulls the parent
directories. it also omits files under lib/ollama.

switch back to globbing
2025-02-05 19:45:58 -08:00
zyphixor
330b6c50b0 readme: add simple-discord-ai to community integrations (#8659) 2025-02-05 18:35:04 -08:00
Diego Pereira
928911bc68 runner: avoid buffer overwrite when generating multiple embeddings (#8714)
Shield the code processing the embedding result
from subsequent calls that may overwrite the same
buffer to process a second input when retrieving
model embeddings.
2025-02-05 16:53:33 -08:00
Michael Yang
5b446cc815 chore: update gitattributes (#8860)
* chore: update gitattributes
* chore: add build info source
2025-02-05 16:37:18 -08:00
Daniel Lok
451c1596af readme: add MLflow Tracing as an observability integration (#8811) 2025-02-05 16:04:24 -08:00
Michael Yang
932bded12f chore: add optional field for server logs 2025-02-05 15:55:32 -08:00
Michael Yang
070ad913ac ci: fix linux archive 2025-02-05 15:08:02 -08:00
Azis Alvriyanto
8d8b9f83ae format: byte formatting test coverage (#8692)
Removed redundant checks and streamlined the switch-case structure.
Added test cases for both HumanBytes and HumanBytes2 to cover a wide range of scenarios.
2025-02-05 12:23:07 -08:00
Jeffrey Morgan
f00d359a67 docs: add section in development.md on library detection (#8855) 2025-02-05 11:16:27 -08:00
Yashwanth A
291def6adb server: increase timeout in stall detection from 5s to 30s (#8831)
In some cases, downloads slow due to disk i/o or other factors,
causing the download to restart a part. This causes the download
to "reverse" in percent completion. By increasing the timeout to 30s,
this should happen less frequently.
2025-02-05 10:00:26 -08:00
Jeffrey Morgan
cd3fbf1c49 llama: use dynamic backend loading for mllama and clip (#8835) 2025-02-05 09:46:56 -08:00
Jeffrey Morgan
c852b8e021 server: always print upload/download part info (#8832) 2025-02-04 19:30:49 -08:00
William
d8932c55e7 server: fix out of bounds exception on model download (#8746) 2025-02-04 18:52:47 -08:00
Michael Yang
63f0269f7f ci: split docker build by platform
this improves build reliability and concurrency
2025-02-04 17:04:27 -08:00
Jeffrey Morgan
4759ecae19 ml/backend/ggml: fix library loading on macOS amd64 (#8827) 2025-02-04 15:05:39 -08:00
Michael Yang
65b7ecac7b fix extra quote 2025-02-04 08:35:30 -08:00
Michael Yang
f9d2d89135 fix linux archive 2025-02-03 16:12:33 -08:00
Michael Yang
669dc31cf3 fix build 2025-02-03 15:10:51 -08:00
Tilman Griesel
d4d338c224 readme: add Chipper to community integrations (#8803) 2025-02-03 14:18:19 -08:00
Melroy van den Berg
bfdeffc375 docs: use OLLAMA_VERSION=0.5.7 for install version override (#8802) 2025-02-03 13:54:08 -08:00
Michael Yang
e806184023 fix release workflow 2025-02-03 13:19:57 -08:00
Jeffrey Morgan
50566113ac llm: do not error if LibOllamaPath does not exist (#8801) 2025-02-03 12:27:48 -08:00
Davide Bertoni
ad22ace439 docs: add missing json and shell code blocks in api.md (#8766) 2025-02-02 13:12:55 -08:00
Anıl Kaynar
f4321a421c readme: add MinimalNextOllamaChat to community integrations (#8767) 2025-02-02 12:56:10 -08:00
Michael Yang
475333d533 fix docker build-args
env context is not accessible from job.*.strategy. since it's in the
environment, just tell docker to use the environment variable[1]

[1]: https://docs.docker.com/reference/cli/docker/buildx/build/#build-arg
2025-01-31 14:56:02 -08:00
Michael Yang
39fd89308c build: set CFLAGS=-O3 specifically for cpu.go 2025-01-31 10:25:39 -08:00
Michael Yang
548a9f56a6 Revert "cgo: use O3"
This reverts commit bea1f1fac6.
2025-01-31 10:25:39 -08:00
Michael Yang
3f0cb36bdb build: set goflags in linux release 2025-01-30 13:07:32 -08:00
Michael Yang
bea1f1fac6 cgo: use O3 2025-01-30 12:21:50 -08:00
Jeffrey Morgan
5d75d837ef discover: fix default LibOllamaPath value (#8702) 2025-01-30 12:21:38 -08:00
Parth Sareen
711648c9bb docs: update api.md with streaming with tools is enabled (#8676) 2025-01-29 15:14:30 -08:00
Michael Yang
dcfb7a105c next build (#8539)
* add build to .dockerignore

* test: only build one arch

* add build to .gitignore

* fix ccache path

* filter amdgpu targets

* only filter if autodetecting

* Don't clobber gpu list for default runner

This ensures the GPU specific environment variables are set properly

* explicitly set CXX compiler for HIP

* Update build_windows.ps1

This isn't complete, but is close.  Dependencies are missing, and it only builds the "default" preset.

* build: add ollama subdir

* add .git to .dockerignore

* docs: update development.md

* update build_darwin.sh

* remove unused scripts

* llm: add cwd and build/lib/ollama to library paths

* default DYLD_LIBRARY_PATH to LD_LIBRARY_PATH in runner on macOS

* add additional cmake output vars for msvc

* interim edits to make server detection logic work with dll directories like lib/ollama/cuda_v12

* remove unncessary filepath.Dir, cleanup

* add hardware-specific directory to path

* use absolute server path

* build: linux arm

* cmake install targets

* remove unused files

* ml: visit each library path once

* build: skip cpu variants on arm

* build: install cpu targets

* build: fix workflow

* shorter names

* fix rocblas install

* docs: clean up development.md

* consistent build dir removal in development.md

* silence -Wimplicit-function-declaration build warnings in ggml-cpu

* update readme

* update development readme

* llm: update library lookup logic now that there is one runner (#8587)

* tweak development.md

* update docs

* add windows cuda/rocm tests

---------

Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
2025-01-29 15:03:38 -08:00
Xiaofu Huang
2ef3c803a1 readme: add AI Toolkit for VSCode to community integrations (#8604) 2025-01-27 00:36:23 -08:00
Matěj Štágl
453e4d090b readme: add LlmTornado to community integrations (#8551) 2025-01-25 01:04:07 -08:00
Daniel Jalkut
ca2f9843c8 docs: remove reference to the deleted examples folder (#8524) 2025-01-22 22:52:15 -08:00
frob
294b6f5a22 docs: remove tfs_z option from documentation (#8515) 2025-01-21 09:28:59 -08:00
EndoTheDev
7bb356c680 docs: update suspend header in gpu.md (#8487) 2025-01-19 18:45:35 -08:00
Jannik Maierhöfer
021817e59a readme: add link to Langfuse (#8455) 2025-01-16 22:41:12 -08:00
Patrick Devine
a420a453b4 fix default modelfile for create (#8452) 2025-01-16 01:14:04 -08:00
Jeffrey Morgan
42cf4db601 parser: fix parsing Modelfiles with multiple FROM commands (#8449) 2025-01-16 00:14:04 -08:00
Josh
93a8daf285 convert: import support for command-r models from safetensors (#6063)
---------

Co-authored-by: Patrick Devine <patrick@infrahq.com>
2025-01-15 16:31:22 -08:00
Gloryjaw
a041b4df7c docs: fix path to examples (#8438) 2025-01-15 11:49:12 -08:00
Patrick Devine
2539f2dbf9 Fix absolute path names + gguf detection (#8428) 2025-01-14 19:01:24 -08:00
Jeffrey Morgan
61676fb506 llama: move grammar tests to llama_test.go (#8411) 2025-01-14 12:55:45 -08:00
Bruce MacDonald
f6f3713001 convert: qwen2 from safetensors (#8408)
Add native support for converting Qwen2 family models (including Qwen2.5)
from safetensors to gguf format so we can run it.
2025-01-14 10:34:37 -08:00
Steve Berdy
a30f347201 readme: add LangChain for .NET to community integrations (#8352) 2025-01-14 09:37:35 -08:00
Jeffrey Morgan
74ea4fb604 remove .prettierrc.json (#8413) 2025-01-14 09:30:34 -08:00
Jeffrey Morgan
6982e9cc96 readme: remove link to missing page 2025-01-13 18:56:31 -08:00
Patrick Devine
ab39872cb4 add new create api doc (#8388) 2025-01-13 17:30:24 -08:00
Parth Sareen
84a2314463 examples: remove codified examples (#8267) 2025-01-13 11:26:22 -08:00
Jeffrey Morgan
17fcdea698 readme: move discord link 2025-01-12 22:45:47 -08:00
Patrick Devine
32bd37adf8 make the modelfile path relative for ollama create (#8380) 2025-01-10 16:14:08 -08:00
Michael Yang
9446c2c902 Merge pull request #8196 from ollama/mxyng/gods-v2
chore: upgrade to gods v2
2025-01-10 13:50:11 -08:00
Jeffrey Morgan
9aa141d023 readme: remove discord badge image for now 2025-01-09 22:02:18 -08:00
Patrick Devine
8bccae4f92 show a more descriptive error in the client if it is newer than the server (#8351) 2025-01-09 10:12:30 -08:00
isamu arimoto
6ae2adc1af openai: accept additional headers to fix CORS errors (#8343) 2025-01-08 11:28:11 -08:00
Jeffrey Morgan
1deafd8254 llama: update vendored code to commit 46e3556 (#8308) 2025-01-08 11:22:01 -08:00
Michael
57f038ec7b readme: add phi4 model (#8350) 2025-01-08 11:21:39 -08:00
frob
cdf3a181dc Add CUSTOM_CPU_FLAGS to Dockerfile. (#8284)
* Add CUSTOM_CPU_FLAGS.

* fix golangci-lint error.

---------

Co-authored-by: Richard Lyons <rick@frob.com.au>
2025-01-06 09:17:19 -08:00
Ubaldo Porcheddu
3919f4ba3d llama: fix runner api example url in README.md (#8307) 2025-01-04 15:45:16 -08:00
Bruce MacDonald
2d33c4e97d discover: remove leading new-line for linter 2025-01-03 12:03:58 -08:00
Bruce MacDonald
29a8975c66 api: remove unused create fields
These fields are deprecated, but specifying them will not do anything. Removing them as the other deprecated fields will still work, but these do not, so they dont match our existing pattern.
2025-01-03 12:03:58 -08:00
Patrick Devine
86a622cbdc Update the /api/create endpoint to use JSON (#7935)
Replaces `POST /api/create` to use JSON instead of a Modelfile.

This is a breaking change.
2024-12-31 18:02:30 -08:00
Jeffrey Morgan
459d822b51 readme: link header to ollama.com 2024-12-29 17:36:07 -05:00
Simon Schampijer
844899440a examples: updated deprecated imports (#3602) 2024-12-29 14:36:25 -05:00
Anas Khan
103db4216d docs: add /api/version endpoint documentation (#8082)
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-12-29 14:33:44 -05:00
Jeffrey Morgan
6daddcde01 readme: update import header 2024-12-29 14:12:23 -05:00
Emilien Lancelot
07f7e69b36 readme: add Yacana multi-agent framework to community integrations (#7259) 2024-12-28 15:05:57 -05:00
CIIDMike
b68e8e5727 docs: add syntax highlighting on Go template code blocks (#8215) 2024-12-27 13:17:49 -05:00
Adarsh Mishra
369fb529e2 readme: add TextLLaMA to community integrations 2024-12-27 13:16:06 -05:00
Jared Donnell
023e4bca14 readme: add neollama to terminal section of community integrations (#8242) 2024-12-25 17:16:11 -05:00
aritra saha
51af455f62 readme: add alpaca client application to community integrations (#8227) 2024-12-24 23:05:35 -05:00
Emanuil Rusev
ffe3549064 readme: add IntelliBar to community integrations (#7950) 2024-12-23 12:04:18 -05:00
湛露先生
928de9050e server: reuse InvalidModelNameErrMsg type (#8163) 2024-12-23 10:38:34 -05:00
ItzCrazyKns
36aea6154a readme: add Perplexica to community-integrations (#8198) 2024-12-22 20:04:01 -05:00
Patrick Devine
dd352ab27f fix crash bug with /save when quotes are used (#8208) 2024-12-21 22:31:37 -08:00
Michael Yang
cb40d60469 chore: upgrade to gods v2
gods v2 uses go generics rather than interfaces which simplifies the
code considerably
2024-12-21 00:05:16 -08:00
Patrick Devine
d8bab8ea44 remove tutorials.md which pointed to removed tutorials (#8189) 2024-12-20 14:04:20 -08:00
Squishedmac
9ab62eb96f update golang.org/x dependencies (#8172) 2024-12-20 09:29:30 -08:00
Parth Sareen
290cf2040a llama: test key order preservation in schema_to_grammar (#8078)
This change adds a test to catch a regression in schema_to_grammar where
the order of keys in the JSON schema is not preserved in the generated
grammar, which is critical for step-by-step reasoning.
2024-12-18 19:44:50 -08:00
Jeffrey Morgan
a72f2dce45 scripts: sign renamed macOS binary (#8131) 2024-12-17 18:03:49 -08:00
Jesse Gross
08a832b482 llama: Ensure KV cache is fully defragmented.
Sometimes the KV cache requires defragmentation even without
triggering the threshold heuristic. In this case, decoding
will not being able to find a KV cache slot. This is particularly
difficult for the caller to handle if it happens in between
ubatches. To avoid this, we should immediately trigger a defrag.

In addition, a heavily fragmented cache can require more than
max_moves to defragment. Currently, we stop when we hit the limit
but this can leave a cache that still does not have adequate space
even after defragmentation is triggered. Instead, we should do
multiple batches of processing until everything is complete.

Fixes #7949
2024-12-17 14:01:19 -08:00
Blake Mizerany
2ddc32d5c5 llm: do not error on "null" format (#8139)
This fixes another regression in the previous commit that fixed other
known bugs.
2024-12-17 09:49:37 -08:00
Jascha Beste
2cde4b8817 readme: change getting started guide link for pgai (#8119) 2024-12-16 22:13:23 -08:00
Blake Mizerany
87f0a49fe6 llm: do not silently fail for supplied, but invalid formats (#8130)
Changes in #8002 introduced fixes for bugs with mangling JSON Schemas.
It also fixed a bug where the server would silently fail when clients
requested invalid formats. It also, unfortunately, introduced a bug
where the server would reject requests with an empty format, which
should be allowed.

The change in #8127 updated the code to allow the empty format, but also
reintroduced the regression where the server would silently fail when
the format was set, but invalid.

This commit fixes both regressions. The server does not reject the empty
format, but it does reject invalid formats. It also adds tests to help
us catch regressions in the future.

Also, the updated code provides a more detailed error message when a
client sends a non-empty, but invalid format, echoing the invalid format
in the response.

This commits also takes the opportunity to remove superfluous linter
checks.
2024-12-16 21:57:49 -08:00
Jeffrey Morgan
0f06a6daa7 llm: loosen format check to default to no format (#8127) 2024-12-16 18:45:46 -08:00
Daniel Hiltgen
8f805dd74b darwin: restore multiple runners for x86 (#8125)
In 0.5.2 we simplified packaging to have avx only for macos x86.  It looks like
there may still be some non-AVX systems out there, so this puts back the prior
logic of building no-AVX for the primary binary, and now 2 runners for avx and avx2.
These will be packaged in the App bundle only, so the stand-alone binary will now be
without AVX support on macos.  On arm, we'll also see these runners reported
as available in the log, but they're dormant and will never be used at runtime.
2024-12-16 18:45:02 -08:00
Michael
89d5e2f2fd readme: example/get started guide for pgai with Ollama (#8115)
readme: example/get started guide for pgai with Ollama
2024-12-16 17:14:37 +08:00
Jascha Beste
297ada6c87 readme: add pgai to readme for semantic search (#8028)
* docs: switch around database integrations order and link to quickstart

* docs: link to blog post in example readme

* chore: link to main readme

* readme: removing example to link externally

readme: removing example to link externally so we don't have to keep this example up-to-date

---------
2024-12-16 17:02:28 +08:00
Patrick Devine
8c9fb8eb73 imageproc mllama refactor (#7537)
Refactor mllama image processing code, and add pixtral and qwen2vl
2024-12-14 19:50:15 -08:00
Daniel Hiltgen
b75ccfc5ec ci: be more aggressive on parallelism in build (#8102) 2024-12-14 14:56:05 -08:00
Jeffrey Morgan
7a81daf026 llama: update vendor code to commit ba1cb19c (#8101) 2024-12-14 14:55:51 -08:00
Daniel Hiltgen
60f75560a2 runner: switch logging back to stderr (#8091)
This puts the low-level runner logging back on stderr for consistency with prior releases
2024-12-13 14:36:50 -08:00
Anuraag (Rag) Agrawal
e28f2d4900 openai: return usage as final chunk for streams (#6784)
* openai: return usage as final chunk for streams

---------

Co-authored-by: ParthSareen <parth.sareen@ollama.com>
2024-12-12 17:09:30 -08:00
Pascal Patry
c216850523 llama: parse JSON schema using nlohmann::ordered_json to maintain ordering (#8071) 2024-12-12 09:57:28 -08:00
Parth Sareen
18f6a98bd6 llama: enable JSON schema key ordering for generating grammars (#8055) 2024-12-11 17:17:36 -08:00
Blake Mizerany
b1fd7fef86 server: more support for mixed-case model names (#8017)
Fixes #7944
2024-12-11 15:29:59 -08:00
Daniel Hiltgen
36d111e788 ci: fix linux version (#8054)
Pass through the version override so the makefiles use it
2024-12-11 14:09:57 -08:00
Blake Mizerany
9039c821a2 llama: preserve field order in user-defined JSON schemas (#8002)
Previously we decoded and re-encoded JSON schemas during validation,
which served no purpose since json.RawMessage already validates JSON
syntax. Worse, the re-encoding lost field ordering from the original
schema, which affects inference quality during step-by-step reasoning.

While fixing this ordering issue by using json.RawMessage directly,
testing revealed that schema_to_grammar (from llama.cpp) also fails to
preserve field order during grammar generation. This appears to be the
root cause of inference degradation.

This change prevents us from mangling the user's original schema order,
but we still need to address the ordering issue in schema_to_grammar.
That will be a separate change.

Updates #7978
2024-12-11 14:07:30 -08:00
Daniel Hiltgen
581a4a5553 ci: fix artifact path prefix for missing windows payloads (#8052)
upload-artifacts strips off leading common paths so when
the ./build/ artifacts were removed, the ./dist/windows-amd64
prefix became common and was stripped, making the
later download-artifacts place them in the wrong location
2024-12-11 10:59:32 -08:00
Daniel Hiltgen
cf4d7c52c4 win: builtin arm runner (#8039)
The new build embeds the arm runner in the
main binary, so there is no longer a lib/ollama
2024-12-11 08:32:13 -08:00
Daniel Hiltgen
6a6328a5e9 ci: build dir changed (#8037)
Remove no longer relevant build log dir
2024-12-10 20:33:34 -08:00
Jeffrey Morgan
527cc97899 llama: update vendored code to commit 40c6d79f (#7875) 2024-12-10 19:21:34 -08:00
Blake Mizerany
a37f4a86a7 go.mod: go 1.22.8 -> 1.23.4 (#8036) 2024-12-10 18:16:16 -08:00
湛露先生
46f74e0cb5 Return err when NewHipLib() detect error. (#8012)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2024-12-10 16:32:29 -08:00
Phil Wornath
7622ea21af readme: add AI summary helper plugin to community-integrations (#7202) 2024-12-10 16:13:06 -08:00
Tao Zuhong
c5d3947084 readme: add Kangaroo, an AI-powered SQL admin tool to community integrations (#7948) 2024-12-10 13:48:32 -08:00
frob
757eeacc1b server: lowercase hostname for Host header check (#5851) 2024-12-10 13:43:22 -08:00
Dr. Daniel Bender
dd42acf737 readme: add aidful-ollama-model-delete to community integrations (#8024) 2024-12-10 13:03:19 -08:00
Daniel Hiltgen
b9ccb3741e Remove unused runner CpuFeatures (#8032)
The final implementation of #7499 removed dynamic vector requirements
in favor of a simpler filename based model, and this was left over logic that
is no longer needed.
2024-12-10 12:59:39 -08:00
Stefan Weil
abfdc4710f all: fix typos in documentation, code, and comments (#7021) 2024-12-10 12:58:06 -08:00
Daniel Hiltgen
82a02e18d9 build: fix typo in override variable (#8031)
The "F" was missing.
2024-12-10 10:51:16 -08:00
Daniel Hiltgen
4879a234c4 build: Make target improvements (#7499)
* llama: wire up builtin runner

This adds a new entrypoint into the ollama CLI to run the cgo built runner.
On Mac arm64, this will have GPU support, but on all other platforms it will
be the lowest common denominator CPU build.  After we fully transition
to the new Go runners more tech-debt can be removed and we can stop building
the "default" runner via make and rely on the builtin always.

* build: Make target improvements

Add a few new targets and help for building locally.
This also adjusts the runner lookup to favor local builds, then
runners relative to the executable, and finally payloads.

* Support customized CPU flags for runners

This implements a simplified custom CPU flags pattern for the runners.
When built without overrides, the runner name contains the vector flag
we check for (AVX) to ensure we don't try to run on unsupported systems
and crash.  If the user builds a customized set, we omit the naming
scheme and don't check for compatibility.  This avoids checking
requirements at runtime, so that logic has been removed as well.  This
can be used to build GPU runners with no vector flags, or CPU/GPU
runners with additional flags (e.g. AVX512) enabled.

* Use relative paths

If the user checks out the repo in a path that contains spaces, make gets
really confused so use relative paths for everything in-repo to avoid breakage.

* Remove payloads from main binary

* install: clean up prior libraries

This removes support for v0.3.6 and older versions (before the tar bundle)
and ensures we clean up prior libraries before extracting the bundle(s).
Without this change, runners and dependent libraries could leak when we
update and lead to subtle runtime errors.
2024-12-10 09:47:19 -08:00
frob
63269668c0 Prevent underflow when FreeMemory < overhead (#8014)
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2024-12-10 09:10:40 -08:00
Jesse Gross
900f64e6be prompt: Don't trim whitespace from prompts
New lines can be an important part of a user's prompt and trimming
it can alter the results. We previously only trimmed prompts with
images but refactoring brought this behavior to all prompts, where
it became more noticable.

The /generate endpoint adds less whitespace and therefore doesn't
need to trim it out - this brings the same behavior to /chat.

Thanks to @gabe-l-hart for spotting the issue!

Fixes #7795
2024-12-09 11:02:55 -08:00
Yannick Gloster
da09488fbf docs: remove comment regarding tool streaming in openai.md (#7960) 2024-12-07 22:16:21 -08:00
湛露先生
7f0ccc8a9d docs: fix syntax error in openai.md (#7986) 2024-12-07 22:14:36 -08:00
Parth Sareen
de52b6c2f9 bugfix: "null" value json mode (#7979) 2024-12-06 14:13:15 -08:00
Michael
acd7d03266 readme: add llama3.3 to readme (#7975)
readme: add llama3.3 to readme
2024-12-06 14:05:11 -05:00
Parth Sareen
f6e87fd628 docs: update readmes for structured outputs (#7962) 2024-12-06 10:35:37 -08:00
Jeffrey Morgan
aed1419c64 ci: skip go build for tests (#7899) 2024-12-04 21:22:36 -08:00
Parth Sareen
c6c526275d api: add generate endpoint for structured outputs (#7939) 2024-12-04 17:37:12 -08:00
Parth Sareen
630e7dc6ff api: structured outputs - chat endpoint (#7900)
Adds structured outputs to chat endpoint
---------

Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Hieu Nguyen <hieunguyen1053@outlook.com>
2024-12-04 16:31:19 -08:00
Michael Yang
eb8366d658 Merge pull request #7932 from ollama/mxyng/fix-merges 2024-12-04 10:04:52 -08:00
Michael Yang
4456012956 fix unmarshaling merges 2024-12-04 09:21:56 -08:00
Sam
539be43640 llm: normalise kvct parameter handling (#7926) 2024-12-03 16:30:40 -08:00
Sam
1bdab9fdb1 llm: introduce k/v context quantization (vRAM improvements) (#6279) 2024-12-03 15:57:19 -08:00
owboson
2b82c5a8a1 docs: correct default num_predict value in modelfile.md (#7693) 2024-12-03 15:00:05 -08:00
Tigran
55c3efa900 docs: remove extra quote in modelfile.md (#7908) 2024-12-02 09:28:56 -08:00
David Mayboroda
1aedffad93 readme: add minima to community integrations (#7906) 2024-12-02 01:14:47 -08:00
Jeffrey Morgan
ff6c2d6dc8 cmd: don't rely on reading repo file for test (#7898) 2024-11-30 14:12:53 -08:00
Jeffrey Morgan
d543b282a7 server: add warning message for deprecated context field (#7878) 2024-11-30 14:05:50 -08:00
Parth Sareen
5f8051180e Enable index tracking for tools - openai api support (#7888) 2024-11-29 20:00:09 -08:00
Jeffrey Morgan
39e29ae5dd llama: fix typo and formatting in readme (#7876) 2024-11-28 17:27:11 -08:00
TheCookingSenpai
30a9f063c9 readme: add SpaceLlama, YouLama, and DualMind to community integrations (#7216) 2024-11-28 15:16:27 -08:00
Parth Sareen
ce7455a8e1 api: enable tool streaming (#7836) 2024-11-27 13:40:57 -08:00
ItzCrazyKns
e3936d4fb3 Support Multiple LoRa Adapters (#7667)
Closes #7627
2024-11-27 11:00:04 -08:00
Bruce MacDonald
940e62772e openai: remove unused error code (#7850)
The writeError takes a code argument which is no longer used. Remove it for clarity.
2024-11-26 16:08:09 -08:00
Jesse Gross
71e6a0d0d1 runner.go: Don't try to extract image tags for text models
When processing a prompt, we look for image tags of the form
[img-0], which are inserted by the Ollama server process.
However, this can cause errors if the original prompt has these
tags - typically an image not found error is returned.

This changes tag searching behavior to be similar to the 0.3.x
series, which will largely avoid these problems. However,they can
still happen when input text with these tags is used with image
models. The correct solution is to escape the tags but this is a
larger issue with special sequences in general so this is an
incremental fix that should avoid the problem for the majority
of cases.
2024-11-26 13:23:24 -08:00
Jesse Gross
2cd11ae365 runner.go: Add unit tests for context shifting
This also makes it easier to truncate long inputs the same as
shifting but does not actually implement it. This type of
truncation has a trade off between quality and time to first
token.
2024-11-26 11:21:35 -08:00
jake83741
52bbad12f9 readme: update description for vnc-lm community integration (#7832) 2024-11-25 17:56:30 -08:00
frob
30e88d7f31 cmd: don't submit svg files as images for now (#7830) 2024-11-25 16:43:29 -08:00
Blake Mizerany
2b7ed61ca2 server: fix Transport override (#7834)
This changes makeRequest to update the http client Transport if and only
if testMakeRequestDialContext is set. This is to avoid overriding the
default Transport when testMakeRequestDialContext is nil, which broke
existing behavior, included proxies, timeouts, and other behaviors.

Fixes #7829
Fixes #7788
2024-11-25 15:08:34 -08:00
Shikhar Bakhda
647513a7d4 readme: add HoneyHive to community integrations (#7831) 2024-11-25 09:55:33 -08:00
Bruce MacDonald
a210ec74d2 cmd: print location of model after pushing (#7695)
After a user pushes their model it is not clear what to do next. Add a link
to the output of `ollama push` that tells the user where their model can now
be found.
2024-11-25 09:40:16 -08:00
Simon Schampijer
cfb1ddd6fc examples: update langchain-python-simple (#3591)
- better formatting of input prompt
- use invoke instead of predict
2024-11-24 16:06:22 -08:00
reid41
3987acd7ec readme: add descriptions for QA-Pilot and shell-pilot community integrations (#4303) 2024-11-24 15:55:09 -08:00
frob
fda1e6b563 llm: bring fileTypes into alignment with llama.cpp (#7819) 2024-11-24 10:33:33 -08:00
Adarsh Mishra
3440ffb37b readme: add description for OpenTalkGpt in community integrations (#7818) 2024-11-24 10:32:23 -08:00
Patcher
a820d2b267 readme: add observability section with OpenLIT to community-integrations 2024-11-23 18:03:12 -08:00
Meng Zhuo
2ebdb54fb3 all: update math32 go mod to v1.11.0 (#6627) 2024-11-23 15:21:54 -08:00
josc146
bb52abfa55 readme: add ChatGPTBox and RWKV-Runner to community integrations (#4118) 2024-11-23 13:31:27 -08:00
oza6ut0ne
31cb1ca9e5 openai: accept X-Stainless-Retry-Count header (#6910) 2024-11-23 12:39:05 -08:00
Rodrigo Ribeiro Gomes
78f779a323 readme: add powershai, a powershell module with ollama support to community integrations (#7438) 2024-11-23 10:08:59 -08:00
Jesse Gross
3478b2cf14 runner.go: Fix deadlock with many concurrent requests
If there are no avilable slots for new sequences then a request
will not be added to the processing queue but will continue on
to wait for a response that never comes. Besides never giving a
response to the request, this prevents the model from being
unloaded due to the outstanding request.

To prevent this, there are semaphores that prevent more requests
from being processed than there are slots - one in the Ollama
server and one in the runner.
 - The Ollama server one works but it is not designed to protect
the runner's data internal structures and the runner can return a
final response before clearing its data structures.
 - The internal runner semaphore has similar behavior where it
 can release the semaphore when it issues a response. This is
 wrong - it should only release the semaphore after it has
 cleared the data structure.

In addition, we should return an error if a slot is not found
rather than deadlocking in the event we ever get to this spot.

Fixes #7779
2024-11-22 16:14:51 -08:00
Bruce MacDonald
7b5585b9cb server: remove out of date anonymous access check (#7785)
In the past the ollama.com server would return a JWT that contained
information about the user being authenticated. This was used to return
different error messages to the user. This is no longer possible since the
token used to authenticate does not contain information about the user
anymore. Removing this code that no longer works.

Follow up changes will improve the error messages returned here, but good to
clean up first.
2024-11-22 11:57:35 -08:00
Daniel Hiltgen
f0a351810c tests: fix max queue integration test (#7782)
This had fallen out of sync with the envconfig behavior, where max queue default was not zero.
2024-11-22 08:05:45 -08:00
Daniel Hiltgen
b85520bfb9 logs: explain client aborts better (#7783)
Users get confused by "Failed to acquire semaphore" error="context canceled"
messages in the logs, which are actually clients giving up.  While there could be
a legitimate hang bug in the system, sometimes this is just short client timeouts
with an overloaded system, so this should help users understand what's going on
better.
2024-11-22 08:05:32 -08:00
Daniel Hiltgen
d88972ea48 Be quiet when redirecting output (#7360)
This avoids emitting the progress indicators to stderr, and the interactive
prompts to the output file or pipe.  Running "ollama run model > out.txt"
now exits immediately, and "echo hello | ollama run model > out.txt"
produces zero stderr output and a typical response in out.txt
2024-11-22 08:04:54 -08:00
Leon Sander
25c9339e2d readme: add Local Multimodal AI Chat app to community integrations (#6931) 2024-11-21 20:39:38 -08:00
Mikel Olasagasti Uranga
597072ef1b readme: update google/uuid module (#7310)
update uuid.New().String() to uuid.NewString()
2024-11-21 19:37:04 -08:00
Dustin
84b3e07f1b readme: add ollamarama-matrix to community integrations (#7325) 2024-11-21 17:49:30 -08:00
Edwin.JH.Lee
422d52858c readme: add x-cmd ollama module to community integrations (#5191) 2024-11-21 16:55:25 -08:00
Elias
723f285813 readme: add OrionChat to community integrations (#7084)
OrionChat is a free web-based chat interface that simplifies interactions
with multiple AI model providers. It provides a unified platform for chatting
and exploring multiple large language models (LLMs).
2024-11-21 11:23:42 -08:00
湛露先生
eaaf5d309d cmd: delete duplicated call to sb.Reset() (#7308)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2024-11-21 11:20:48 -08:00
Jeffrey Morgan
27d9c749d5 docs: remove tutorials, add cloud section to community integrations (#7784) 2024-11-21 09:59:53 -08:00
R0CKSTAR
b7bddeebc1 env.sh: cleanup unused RELEASE_IMAGE_REPO (#6855)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-11-21 08:28:04 -08:00
Paul Robello
6a0c2ec50f readme: add terminal tool ParLlama to community integrations (#5623) 2024-11-21 02:55:35 -08:00
毛巳煜
baa41be2aa readme: add a community made ollama web management tool (#7126) 2024-11-21 02:51:45 -08:00
xuyangbocn
2157b1232e readme: add Terraform AWS Ollama & Open WebUI community example (#5633) 2024-11-21 02:28:57 -08:00
emrgnt-cmplxty
37711578a2 readme: add R2R to community integrations (#5587) 2024-11-21 02:09:36 -08:00
Cyril Blaecke
fb2c9594e0 readme: Add Nosia to Community Integrations (#5381) 2024-11-21 02:07:17 -08:00
Christian Tzolov
7fbcd55da3 readme: Add Spring AI library reference (#5981) 2024-11-21 02:02:14 -08:00
Philippe Charrière
b4348bdd25 readme: add Parakeet to community integrations
Parakeet is a GoLang SDK for Ollama

---------

Co-authored-by: Parth Sareen <parth.sareen@ollama.com>
2024-11-21 02:00:32 -08:00
Marcin Szczygliński
155734e09a readme: add community integration py-gpt (#6503) 2024-11-21 01:54:39 -08:00
Michael
883d80e097 readme: add Promptery to community integrations (#7093) 2024-11-21 01:46:20 -08:00
Jakub Burkiewicz
e4c9f75b23 readme: add node-red-contrib-ollama to community integrations (#4648) 2024-11-21 01:09:37 -08:00
Dezoito
f5ec7cc872 readme: add ollama grid search, a community project (#4301) 2024-11-21 01:02:46 -08:00
Franco Lombardo
811bafba82 readme: Add LLPhant to community integrations (#5679) 2024-11-21 00:54:26 -08:00
Aarushi
431075fcbb readme: add autogpt integration to list of community integrations (#6459) 2024-11-21 00:51:38 -08:00
Kevin Brake
c4f27225ac readme: add community contribution to readme ollama-kis (#5575) 2024-11-21 00:31:27 -08:00
chyok
b7aa5ee06c readme: Add tkinter-based client to community based integrations (#5412) 2024-11-21 00:19:24 -08:00
Nico
3f87f71755 readme: add Shinkai Desktop to community integrations (#4877) 2024-11-21 00:16:18 -08:00
Laurent Eschenauer
20623cec13 readme: add OpenGPA to community integrations (#5497) 2024-11-21 00:13:54 -08:00
Andy Gill
0e5f31a86d readme: add Haverscript to community integrations (#6945)
Haverscript uses classical functional programming techniques to provide a composable interface for interacting with ollama-hosted LLMs.
2024-11-21 00:11:39 -08:00
drunkwcodes
7e92091751 readme: Terminal app bb7 to community integrations (#7064) 2024-11-21 00:03:11 -08:00
boessu
1a742f54c9 readme: update AMD ROCm links (#7213) 2024-11-20 23:48:55 -08:00
奶茶叔叔
6a89dcf848 readme: flutter-based chat app to community integrations (#7221) 2024-11-20 23:30:10 -08:00
Alexander F. Rødseth
c5e238e8e5 readme: orbiton to community integrations (#7770) 2024-11-20 23:24:05 -08:00
Nikita Ganzikov
fce30f407a app: typo in wintray messages const (#7705) 2024-11-20 22:01:58 -08:00
Daniel Hiltgen
d863298210 docs: Link to AMD guide on multi-GPU guidance (#7744) 2024-11-20 16:00:46 -08:00
Jesse Gross
c4b34f2a2a runner.go: Truncate inputs that exceed context rather than shifting
Previous versions of the runner would truncate inputs to the context
window before beginning processing. The main processing loop relied
on this behavior if the context needed to be shifted later (due to
token generation). If truncation did not occur then invariants
would be broken, causing crashes or infinite loops.

Later versions attempted to fix these bugs and make the logic less
subtle so that all inputs could be handled. Truncation was removed
to make things consistent.

However, truncation is much faster than processing and shifting, so
removing it caused performance problems when the input vastly exceeded
the context size. This restores the input truncation as a performance
optimization while keeping the more robust processing logic.

Fixes #7762
2024-11-20 12:49:24 -08:00
Jesse Gross
c3ff916431 runner.go: Don't add inputs to cache view until actually processed
We need to track which tokens are in the cache ourselves. We currently
add tokens to the cache tracker when we add them to batch but they are
not actually in the cache until we call Decode. This can cause
confusion when we are shifting the cache.

Avoids "could not find a KV slot for the batch" issues.

Bug #7545
2024-11-20 12:49:24 -08:00
Jesse Gross
3fc1dc0e6f runner.go: Hard fail on errors rather than potentially infinite looping
We try to recover from errors by dropping the tokens that caused the
problem and re-trying. However, dropping the tokens is not correct
and continuing often leads to infinite loops. To avoid, this we
end the sequence if such a condition is detected, which is also
surprising.

At this point, it is better to just report the error. This will make
it easier to find problems and the alternatives are perhaps even more
surprising to users.

This is not a very satisfactory solution either - we should isolate
the error and return it to the user without killing the whole process.
However, this is an incremental step and consistent with most other
failures (which either manifest as abort() or panic).
2024-11-20 12:49:24 -08:00
Jesse Gross
7121dfa309 runner.go: Retry decoding after defragmentation if needed
Fragmentation of the KV cache can occur due to cache shifting or
different sequences getting processed. Decode uses a heuristic to
decide if it should defrag. However, this heuristic isn't 100%
accurate, so decoding can sometimes fail by surprise.

For these cases, if decode indicates that there is no KV cache space,
we should defrag and then try again.
2024-11-20 12:49:24 -08:00
Jesse Gross
5f68fcab12 runner.go: Use correct index when retrieving embedding results
This doesn't have any impact currently because NUM_PARALLEL is forced
to 1 for embeddings, so both indicies will always be 0.
2024-11-20 12:49:24 -08:00
Emir Sahin
ecf41eed05 readme: add llm-axe to community integrations (#5931) 2024-11-20 10:53:14 -08:00
Marcus Ziadé
b8c66d3307 readme: add a swift community integration (#7383) 2024-11-20 10:49:15 -08:00
thewh1teagle
303f4bc79e readme: add vibe app to community integrations (#7607) 2024-11-20 10:45:10 -08:00
Adarsh Mishra
d2a25206b1 readme: add opentalkgpt to community integrations (#7707) 2024-11-20 10:42:55 -08:00
rohitanshu
2f0a8c8778 docs: fix minor typo in import.md (#7764)
change 'containg' to 'containing'
2024-11-20 09:57:32 -08:00
Gordon Kamer
bfd30f4286 readme: add Abbey to community integrations (#7746) 2024-11-19 21:37:15 -08:00
Jonathan Hecl
0ef17ede89 readme: add Gollama to community integrations (#7756) 2024-11-19 21:31:43 -08:00
Daniel Hiltgen
909a88c5c0 Improve crash reporting (#7728)
Many model crashes are masked behind "An existing connection was forcibly closed by the remote host"
This captures that common error message and wires in any detected errors from the log.

This also adds the deepseek context shift error to the known errors we capture.
2024-11-19 16:26:57 -08:00
Daniel Hiltgen
f602ab4de4 expose underlying error on embedding failure (#7743)
Avoid a round-trip asking users for logs to see what went wrong.
2024-11-19 16:26:05 -08:00
Gabe Goodhart
807ace5b1f fix(runner): Set logits to 0 if false on Batch.Add
https://github.com/ollama/ollama/issues/7656
Branch: Granite3StoppingBug-7656

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2024-11-19 15:45:37 -08:00
Blake Mizerany
4b8a2e341a server: allow mixed-case model names on push, pull, cp, and create (#7676)
This change allows for mixed-case model names to be pushed, pulled,
copied, and created, which was previously disallowed because the Ollama
registry was backed by a Docker registry that enforced a naming
convention that disallowed mixed-case names, which is no longer the
case.

This does not break existing, intended, behaviors.

Also, make TestCase test a story of creating, updating, pulling, and
copying a model with case variations, ensuring the model's manifest is
updated correctly, and not duplicated across different files with
different case variations.
2024-11-19 15:05:57 -08:00
frob
e66c29261a Better error suppresion when getting terminal colours (#7739)
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2024-11-19 08:33:52 -08:00
Patrick Devine
712d63c3f0 update the docs (#7731) 2024-11-18 21:17:38 -08:00
Patrick Sy
6cdf27d154 readme: add Alfred Ollama to community integrations (#7724) 2024-11-18 19:33:23 -08:00
frob
5c18e66384 Notify the user if systemd is not running (#6693)
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2024-11-18 15:02:41 -08:00
Daniel Hiltgen
35096a7eff win: add right click menu support (#7727)
Enable both left and right click on the pop-up menu
2024-11-18 14:39:52 -08:00
Daniel Hiltgen
81d55d3e4d fix index out of range on zero layer metal load (#7696)
If the model doesn't fit any layers on metal, and we load zero layers
we would panic trying to look up the GPU size during scheduling ops
2024-11-18 11:48:13 -08:00
Vinh Nguyen
a14f76491d readme: improve Community Integrations section (#7718) 2024-11-17 19:30:22 -08:00
Nicolas Bonamy
760cfa27e5 readme: add Witsy and multi-llm-ts to community integrations (#7713) 2024-11-17 16:33:10 -08:00
Darius Kocar
c9a5aca3da readme: add Perfect Memory AI to community integrations (#7431) 2024-11-17 15:19:26 -08:00
Tushar Adhatrao
d5da2ab7e8 readme: add ollama-haskell library to community integrations (#7451) 2024-11-17 15:18:04 -08:00
Vinh Nguyen
1c04117114 readme: add the VT app to the community integrations section (#7706) 2024-11-17 14:35:41 -08:00
Jeffrey Morgan
8b4b243f5f server: fix warnings in prompt_test.go (#7710) 2024-11-17 13:01:04 -08:00
Jeffrey Morgan
b42a596425 docs: add customization section in linux.md (#7709) 2024-11-17 11:48:12 -08:00
Daniel Hiltgen
4759d879f2 Install support for jetpacks (#7632)
Follow up to #7217 - merge after release
2024-11-15 16:47:54 -08:00
Jesse Gross
d875e99e46 runner.go: Propagate panics back to the user.
This is a partial revert of 8a35bb92
"runner.go: Increase survivability of main processing loop", removing
the panic handler.

Although we want to avoid errors taking down the runner, we also
should make the user aware of problems when they happen. In the
future, we can restructure things so both parts are true.
2024-11-15 11:52:25 -08:00
Jesse Gross
8a35bb926e runner.go: Increase survivability of main processing loop
Currently, if an error occurs during the prep stages (such as
tokenizing) of a single request, it will only affect that request.
However, if an error happens during decoding, it can take down the
entire runner.

Instead, it's better to drop the tokens that triggered the error and try to
keep going. However, we also need to stop when we run out of tokens,
otherwise, this just causes an infinite loop. This is likely the cause
of at least some of the hanging issues that have been reported.

Bug #7573
2024-11-14 17:18:41 -08:00
Daniel Hiltgen
a0ea067b63 build: fix arm container image (#7674)
Fix a rebase glitch from the old C++ runner build model
2024-11-14 16:02:01 -08:00
Patrick Devine
4efb98cb4f add line numbers for parser errors (#7326) 2024-11-14 13:59:44 -08:00
Bruce MacDonald
0679d491fe chore(deps): bump golang.org/x dependencies (#7655)
- golang.org/x/sync v0.3.0 -> v0.9.0
- golang.org/x/image v0.14.0 -> v0.22.0
- golang.org/x/text v0.15.0 -> v0.20.0
2024-11-14 13:58:25 -08:00
Jesse Gross
c25ffde91d runner.go: Don't trim whitespace from inputs
It's possible to get prompts that consist entirely of whitespace -
this is most likely to happen when generating embeddings. Currently,
we will trim this away, leaving an empty prompt, which will then
generate an error.

Generating embeddings from whitespace should not trigger an error,
as this may break pipelines. It's better to just leave the whitespace
in place and process what we are given. This is consistent with
past versions of Ollama.

Bug #7578
2024-11-14 11:23:06 -08:00
Jesse Gross
17b386a891 runner.go: Enforce NUM_PARALLEL directly in the runner
NUM_PARALEL is currently enforced by the Ollama server process - it
will only issue requests to the runner if the maximum number of
concurrent requests has not been exceeded. Although this should
be sufficient, it is good for the runner to protect its own data
structures. Currently, if too many requests get through to the
runner, they will just get stuck and never return.

This may help with reports of Ollama hanging, though it is unclear
how it would actually occur.

Bug #7573
2024-11-14 11:21:59 -08:00
Michael Yang
549c2bdfcf Merge pull request #7657 from ollama/mxyng/sync
fix(mllama): sync backend between batches
2024-11-14 09:40:04 -08:00
Blake Mizerany
67691e410d cmd: preserve exact bytes when displaying template/system layers (#7586) 2024-11-13 23:53:30 -08:00
Michael Yang
5b3393b6a2 fix(mllama): sync backend between batches 2024-11-13 16:37:21 -08:00
Jesse Gross
d7eb05b936 runner.go: Fix off-by-one for num predicted 2024-11-12 11:35:57 -08:00
Daniel Hiltgen
636a743c2b CI: give windows lint more time (#7635)
It looks like 8 minutes isn't quite enough and we're seeing sporadic timeouts
2024-11-12 11:22:39 -08:00
Daniel Hiltgen
df011054fa Jetpack support for Go server (#7217)
This adds support for the Jetson JetPack variants into the Go runner
2024-11-12 10:31:52 -08:00
Daniel Hiltgen
ac07160c8d doc: capture numeric group requirement (#6941)
Docker uses the container filesystem for name resolution, so we can't guide users
to use the name of the host group.  Instead they must specify the numeric ID.
2024-11-12 09:13:23 -08:00
Daniel Hiltgen
6606e4243c docs: Capture docker cgroup workaround (#7519)
GPU support can break on some systems after a while.  This captures a
known workaround to solve the problem.
2024-11-12 09:12:50 -08:00
Jesse Gross
65973ceb64 runner.go: Make KV entry accounting more robust
The structure of the accounting for KV cache shifting was carried
over from the old runner but it now doesn't feel natural with the new
runner. There are a number of invariants that should hold true but
are difficult to reason about. There is at least one bug report
that would imply that the invariants are not holding.

This reduces the number of implicit assumptions and is more forgiving
of unexpected situations. It also improves behavior around which input
tokens are kept when truncation occurs.

Bug #7545
2024-11-11 20:23:03 -08:00
Joey Zheng
bebef1e50d readme: add aichat terminal app to community integrations (#7418) 2024-11-11 16:44:46 -08:00
Evan
d48c1c5a44 api: fix typos in Go Doc comments (#7620) 2024-11-11 16:21:58 -08:00
Prasad Bhalerao
36a8372b28 readme: add GoLamify to community integrations (#7521) 2024-11-10 22:38:18 -08:00
Ivo Stoykov
4e94227b5d readme: add browser extension that enables using Ollama for interacting with web pages (#5827) 2024-11-10 22:14:22 -08:00
frances720
479d551766 docs: add mentions of Llama 3.2 (#7517) 2024-11-10 19:04:23 -08:00
Evan
76b2b723b2 api: fix typo in python ClientFromEnvironment docs (#7604) 2024-11-10 17:30:27 -08:00
Arhan Busam
b8d77cdeab readme: add llama3.2-vision to model list (#7580) 2024-11-10 13:36:25 -08:00
Jesse Gross
c2e8cbaa14 runner.go: Check for zero length images
If we get a request with a zero length image, it will result in
an out-of-bounds error when we pass the data to the image encoder.
2024-11-08 09:39:32 -08:00
Edward J. Schwartz
771fab1dd8 docs: update langchainpy.md with proper model name (#7527) 2024-11-08 09:36:17 -08:00
Daniel Hiltgen
3a5239e6bf Set macos min version for all architectures (#7579) 2024-11-08 09:27:04 -08:00
Daniel Hiltgen
3d25e7bf8c win: remove preview title from installer (#7529)
This should have been in #7347 but was overlooked.
2024-11-07 14:26:47 -08:00
Daniel Hiltgen
1618700c5a Workaround buggy P2P ROCm copy on windows (#7466)
This enables the workaround code only for windows which should help windows users with muliple AMD GPUs
2024-11-07 14:26:31 -08:00
Daniel Hiltgen
b111aa5a91 Debug logging for nvcuda init (#7532)
Some users are reporting crashes during nvcuda.dll initialization
on windows.  This should help narrow down where things are going bad.
2024-11-07 14:25:53 -08:00
Daniel Hiltgen
9e83e550e1 Align rocm compiler flags (#7467)
Bring consistency with the old generate script behavior
2024-11-07 10:20:50 -08:00
Daniel Hiltgen
fc2a0715df Be explicit for gpu library link dir (#7560)
On linux nvcc isn't automatically linking to the same cuda version.
2024-11-07 09:20:40 -08:00
Jesse Gross
3020d2dc58 docs: OLLAMA_NEW_RUNNERS no longer exists 2024-11-06 14:39:02 -08:00
Jesse Gross
a909417602 runner.go: Remove unused arguments
Now that server.cpp is gone, we don't need to keep passing arguments
that were only ignored and only kept for compatibility.
2024-11-06 13:32:18 -08:00
Jesse Gross
6cd566872b sched: Lift parallel restriction for multimodal models except mllama
The Go runner does not have a problem with supporting parallel
requests for most multimodal models. Now that we won't be potentially
falling back to server.cpp, this restriction can be lifted.

However, the new mllama model can't support parallel requests, so we
will need to keep a restriction for that.
2024-11-06 13:32:18 -08:00
RAPID ARCHITECT
9d71bcc3e2 Update README.md (#7516)
added reddit rate below hexabot, ollama powered reddit search and analysis with streamlit for the intervace
2024-11-05 15:07:25 -08:00
Daniel Hiltgen
a4c70fe157 One corrupt manifest should not wedge model operations (#7515)
One potential failure mode is an empty file which bubbles up as an EOF error,
leading to all pulls and listing operations failing.  Instead, continue and
warn about the corrupt manifest.  This also allows re-pulling the corrupt
manifest to repair the system.
2024-11-05 14:21:45 -08:00
Jesse Gross
34a75102f7 prompt: Use a single token when estimating mllama context size
Currently we assume that images take 768 tokens of context size for
the purposes of clipping old messages that exceed the context window.
However, our mllama implementation stores the full image embedding
in a single token. As a result, there is significant waste of context
space.

Ideally, we would handle this more generically and have the
implementation report the number of tokens. However, at the moment
this would just result in a similar set of 'if' conditions in the
runner plus APIs to report it back. So for now, we just keep this
simple.
2024-11-05 10:11:50 -08:00
Med Marrouchi
4157d1f7b6 readme: add Hexabot to the list of community integrations 2024-11-05 09:06:38 -08:00
Daniel Hiltgen
4ebfa2cb91 Quiet down debug log of image payload (#7454)
Avoid excessive log spew and make consistent with chat logging
2024-11-04 13:05:16 -08:00
Daniel Hiltgen
046054fa3b CI: Switch to v13 macos runner (#7498) 2024-11-04 13:02:07 -08:00
Daniel Hiltgen
95483f348b CI: matrix strategy fix (#7496)
Github actions matrix strategy can't access env settings
2024-11-04 10:48:35 -08:00
Michael Yang
f247a6233e Merge pull request #7456 from ollama/mxyng/llama3.2-vision-mem
update llama3.2 vision memory estimation
2024-11-04 09:48:43 -08:00
Daniel Hiltgen
44bd9e5994 Sign windows arm64 official binaries (#7493) 2024-11-04 09:15:14 -08:00
suncloudsmoon
18237be9b2 readme: add TextCraft to community integrations (#7377) 2024-11-03 16:53:51 -08:00
Daniel Hiltgen
29ab9fa7d7 nvidia libs have inconsistent ordering (#7473)
The runtime and management libraries may not always have
identical ordering, so use the device UUID to correlate instead of ID.
2024-11-02 16:35:41 -07:00
Daniel Hiltgen
b8d5036e33 CI: omit unused tools for faster release builds (#7432)
This leverages caching, and some reduced installer scope to try
to speed up builds. It also tidies up some windows build logic
that was only relevant for the older generate/cmake builds.
2024-11-02 13:56:54 -07:00
Jesse Gross
312d9de1d1 llama: Improve error handling
Check for NULL return values from llama.cpp in more places and
convert them into Go errors, which should make debugging easier
in the future rather than having hidden surprises in our data
structures.
2024-11-02 13:37:55 -07:00
Jesse Gross
a103dae01e runner.go: Only allocate 1 element embedding batches for mllama
Mllama has large embeddings (100 MB per image) and each embedding is
represented as 1 token when passed to llama.cpp. Batches are pre-
allocated for the size of the tokens times the batch size, so this
results in allocations of over 50 GB at the default batch size.
On some systems, these mallocs will fail.

Since an image is represented as a single token and mllama doesn't
support more than 1 image per request, we only need to allocate a
batch size of 1, which is much more reasonable. In addition, for
non-multimodal models, we don't need to allocate the embedding
batches at all.

Fixes #7464
2024-11-02 13:37:55 -07:00
Michael Yang
d07cf41a97 refactor kv estimation 2024-11-01 16:23:55 -07:00
Michael Yang
8c238e70ab mllama cross attention 2024-11-01 16:23:55 -07:00
Daniel Hiltgen
8a9bb0d000 Add basic mllama integration tests (#7455) 2024-10-31 17:25:48 -07:00
Jesse Gross
26acdcf44e runner.go: Don't set cross attention before sending embeddings
Currently if an input has embeddings at any point then we will set
cross attention to true from the beginning. This means that any
tokens before the embeddings are sent will incorrectly have cross
attention layers applied.

This only sets cross attention when we have an embedding, either
previously in this sequence or in the cache. It also makes cross
attention capable of supporting parallelism at the runner level,
though the mllama implementation doesn't support that yet.
2024-10-31 13:56:08 -07:00
Daniel Hiltgen
921779bb10 Give unicode test more time to run (#7437)
* Give unicode test more time to run

Some slower GPUs (or partial CPU/GPU loads) can take more than the default 30s to complete this test

* Give more time for concurrency test

CPU inference can be very slow under stress
2024-10-31 13:35:31 -07:00
Daniel Hiltgen
16f4eabe2d Refine default thread selection for NUMA systems (#7322)
Until we have full NUMA support, this adjusts the default thread selection
algorithm to count up the number of performance cores across all sockets.
2024-10-30 15:05:45 -07:00
Jesse Gross
c826e57475 runner.go: Better abstract vision model integration
-Update mllama to take the cross attention state as embeddings in
a batch, more similar to how Llava handles it. This improves
integration with the input cache.
-Pass locations in a prompt for embeddings using tags similar to Llava.
-Abstract interface to vision models so the main runner accesses Clip
and Mllama similarly

Co-authored-by: Michael Yang <mxyng@pm.me>
2024-10-30 14:53:43 -07:00
Daniel Hiltgen
712e99d477 Soften windows clang requirement (#7428)
This will no longer error if built with regular gcc on windows.  To help
triage issues that may come in related to different compilers, the runner now
reports the compier used by cgo.
2024-10-30 12:28:36 -07:00
Daniel Hiltgen
b754f5a6a3 Remove submodule and shift to Go server - 0.4.0 (#7157)
* Remove llama.cpp submodule and shift new build to top

* CI: install msys and clang gcc on win

Needed for deepseek to work properly on windows
2024-10-30 10:34:28 -07:00
Daniel Hiltgen
a805e5947e Move windows app out of preview (#7347) 2024-10-30 09:24:59 -07:00
Daniel Hiltgen
91dfbb1bba windows: Support alt install paths, fit and finish (#6967)
* windows: Support alt install paths

Advanced users are leveraging innosetup's /DIR switch to target
an alternate location, but we get confused by things not existing in the LocalAppData dir.
This also hardens the server path lookup code for a future attempt to unify with a ./bin prefix

* Fit and finish improvements for windows app

Document alternate install location instructions for binaries and model.
Pop up progress UI for upgrades (automatic, with cancel button).
Expose non-default port in menu to disambiguate mutiple instances.
Set minimum Windows version to 10 22H2
2024-10-30 09:24:31 -07:00
Patrick Devine
db1842b9e1 add more tests for getting the optimal tiled canvas (#7411) 2024-10-29 16:28:02 -07:00
Daniel Hiltgen
c9ca386131 Switch windows to clang (#7407)
* Switch over to clang for deepseek on windows

The patch for deepseek requires clang on windows. gcc on windows
has a buggy c++ library and can't handle the unicode characters

* Fail fast with wrong compiler on windows

Avoid users mistakenly building with GCC when we need clang
2024-10-29 13:15:04 -07:00
Jesse Gross
078f666f73 tests: Add test for Unicode processing 2024-10-28 18:12:29 -07:00
Jesse Gross
de1557a0dc runner.go: Better handle return NULL values from llama.cpp
Llama.cpp sometimes returns NULL as a return value to report an
error. We should explicitly check for this and convert it to a Go
error rather than putting NULL in our data structures and waiting
for it to blow up later.
2024-10-28 18:12:29 -07:00
Patrick Devine
084929c293 add mllama image processing to the generate handler (#7384) 2024-10-28 13:51:19 -07:00
Daniel Hiltgen
abd5dfd06a Bump to latest Go 1.22 patch (#7379) 2024-10-26 17:03:37 -07:00
Daniel Hiltgen
099f7077a1 Fix deepseek deseret regex (#7369)
On windows compiled with gcc the c++ regex library failed to handle
the characters
2024-10-26 14:58:54 -07:00
Daniel Hiltgen
d7c94e0ca6 Better support for AMD multi-GPU on linux (#7212)
* Better support for AMD multi-GPU

This resolves a number of problems related to AMD multi-GPU setups on linux.

The numeric IDs used by rocm are not the same as the numeric IDs exposed in
sysfs although the ordering is consistent.  We have to count up from the first
valid gfx (major/minor/patch with non-zero values) we find starting at zero.

There are 3 different env vars for selecting GPUs, and only ROCR_VISIBLE_DEVICES
supports UUID based identification, so we should favor that one, and try
to use UUIDs if detected to avoid potential ordering bugs with numeric IDs

* ROCR_VISIBLE_DEVICES only works on linux

Use the numeric ID only HIP_VISIBLE_DEVICES on windows
2024-10-26 14:04:14 -07:00
Daniel Hiltgen
35ec7f079f Fix unicode output on windows with redirect to file (#7358)
If we're not writing out to a terminal, avoid setting the console mode
on windows, which corrupts the output file.
2024-10-25 13:43:16 -07:00
Daniel Hiltgen
5231ae52d9 Fix incremental build file deps (#7361)
The common src/hdr defs should be in the common definitions, not gpu specific.
2024-10-25 11:50:45 -07:00
Daniel Hiltgen
3085c47bea Improve dependency gathering logic (#7345)
This unfies the rocm/cuda dependency logic into the makefile
and fixes a missing define which broke windows rocm
2024-10-24 09:51:53 -07:00
Bill Wang
0ccc73251a fix #7247 - invalid image input (#7249)
---------

Co-authored-by: Bill Wang <bill.wang@bill.wang>
2024-10-23 10:31:04 -07:00
Daniel Hiltgen
dc6fe82051 integration: harden embedding test (#7306)
Use cosine similarity to make the embeddings tests more robust
2024-10-22 15:25:22 -07:00
Patrick Devine
d78fb62056 default to "FROM ." if a Modelfile isn't present (#7250) 2024-10-22 13:32:24 -07:00
Daniel Hiltgen
5c44461ccf Fix rocm windows build and clean up dependency gathering (#7305)
On windows ensure windows version define is properly set for rocm.
Remove duplicate rocm arch flags.
Resolve wildcards in the targets so parallel builds don't race.
Use readlink to resolve rocm dependencies since wildcards omit libelf
Keep windows rocm deps aligned with unified packaging model
2024-10-22 12:54:15 -07:00
Jesse Gross
03e40efa51 runner.go: Merge partial unicode characters before sending
We check for partial unicode characters and accumulate them before
sending. However, when we did send, we still sent each individual piece
separately, leading to broken output. This combines everything into
a single group, which is also more efficient.

This also switches to the built-in check for valid unicode characters,
which is stricter. After this, we should never send back an invalid
sequence.

Fixes #7290
2024-10-22 12:07:51 -07:00
Mattt
23f746508d readme: add Ollama for Swift to the community integrations (#7295) 2024-10-21 22:29:11 -07:00
Jeffrey Morgan
48708ca0d5 server: allow vscode-webview origin (#7273) 2024-10-19 14:06:41 -07:00
Patrick Devine
c7cb0f0602 image processing for llama3.2 (#6963)
Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Jesse Gross <jesse@ollama.com>
2024-10-18 16:12:35 -07:00
Daniel Hiltgen
bf4018b9ec llama: Decouple patching script from submodule (#7139)
* Refine llama.cpp vendoring workflow tools

Switch from the sync.sh over to make based tooling

* Run new make sync and patch flow
2024-10-17 15:03:09 -07:00
Daniel Hiltgen
f86d00cd95 llama: add compiler tags for cpu features (#7137)
This adds the ability to customize the default runner with user specified flags
2024-10-17 13:43:20 -07:00
Gabe Goodhart
f2890a4494 IBM granite/granitemoe architecture support (#6760)
* fix(ext_server): Port llama.cpp sampling refactors to ext_server

This was a fairly large changeset. I closely followed the changes here:
df270ef745

Branch: IBMGraniteArchitectureSupport

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(server.cpp): Refactor server.cpp logging for llama.cpp overhaul

Branch: IBMGraniteArchitectureSupport

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat: Bump llama.cpp to the latest master with `granite` support

This does not yet have granite MoE support, but that can come in a
follow up PR

Branch: IBMGraniteArchitectureSupport

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(patches): Update all patches (except solar-pro) to work with bumped llama.cpp

Branch: IBMGraniteArchitectureSupport

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(solar): Update solar patch for llama.cpp bump

Branch: IBMGraniteArchitectureSupport

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat(llama.cpp): Bump llama.cpp for granitemoe support

Branch: IBMGraniteArchitectureSupport

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat(llama.cpp): Bump llama.cpp for granitemoe support

Branch: IBMGraniteArchitectureSupport

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(solar): Update the solar-pro patch for latest llama.cpp bump

Branch: IBMGraniteArchitectureSupport

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat(llama.cpp): Bump to the latest master of llama.cpp

Branch: IBMGraniteArchitectureSupport

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(patches): Update all patches for latest bump

Branch: IBMGraniteArchitectureSupport

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat(llama): Always run sync.sh from the right directory

Branch: IBMGraniteArchitectureSupport

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(llama/patches): Update llama patches

Branch: IBMGraniteArchitectureSupport

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat(llama)!: Rough sync with llama.cpp submodule

There are a number of changes that will need to be propagated to llama.go
before any of this works!

Branch: IBMGraniteArchitectureSupport

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(llama/patches): Add a patch and update for missing ggml-impl.h include

This include is where the ggml_cgraph struct is defined. It is included in
many of the .c files to define the forward declartion in ggml.h. It seems
that with the subset of code included here, the import was somehow lost (or
out-of-order) when building, so adding this include to llama.cpp fixes the
missing definition.

Branch: IBMGraniteArchitectureSupport

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(llama/sync): Add missing ggml-cpu-impl.h copy-over in sync.sh

Branch: IBMGraniteArchitectureSupport

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(llama): Add missing log.cpp

This was added as part of the logging overhaul done in llama.cpp

Branch: IBMGraniteArchitectureSupport

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(llama): Overhaul use of sampling module for llama.cpp changes

The changes here reflect the changes made in the big llama.cpp sampling PR
https://github.com/ggerganov/llama.cpp/pull/9294

The sampling functionality is now broken into the base interface
(llama_sampler) and the generation implementation (gpt_sampler). The
changes here reflect that. Since the sampling.h/sampling.cpp code uses c++
STL headers, the sampling_ext.[h|cpp] wrapper is maintained to allow go to
access a pure-C interface.

Branch: IBMGraniteArchitectureSupport

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(llama): Fix the impl of SampleTokenGreedy for new sampling

I don't think this method is currently used, so it could probably just be
removed so that all sampling goes through the GPT interface, but in the
interest of doing no harm, this should keep the method working as expected.

Branch: IBMGraniteArchitectureSupport

* fix(llama): Remove unused SampleTokenGreedy

Branch: IBMGraniteArchitectureSupport

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(sync): Remove bash-specific change to sync.sh

Branch: IBMGraniteArchitectureSupport

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* chore(gofumpt): Format on llama.go to pass linting

Branch: IBMGraniteArchitectureSupport

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(llm): Fix missing <thread> include in ext_server

Branch: IBMGraniteArchitectureSupport

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(llama): Remove TODO about grammar_first

This feature was not used/needed previously so should be fine without
plumbing it through now.

Branch: IBMGraniteArchitectureSupport

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(llama): Better naming for sampling wrapper and args

Branch: IBMGraniteArchitectureSupport

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(llama): Fix patch 05 to use new wrapper api and re-sync

Branch: IBMGraniteArchitectureSupport

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* runner: Flush pending responses before returning

If there are any pending reponses (such as from potential stop
tokens) then we should send them back before ending the sequence.
Otherwise, we can be missing tokens at the end of a response.

Fixes #6707

* fix(llama/sampling): Use gpt_sampler with a forward declaration

Branch: IBMGraniteArchitectureSupport

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(llama): Remove unnecessary patch for gguf impl header

This was caused by an earlier mistake in the embeddings patch that was
dereferencing the pointer instead of using the wrapper API.

Branch: IBMGraniteArchitectureSupport

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(llm): Remove use of deprecated --log-disable flag

Branch: IBMGraniteArchitectureSupport

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

---------

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2024-10-17 11:59:52 -07:00
Daniel Hiltgen
05cd82ef94 Rename gpu package discover (#7143)
Cleaning up go package naming
2024-10-16 17:45:00 -07:00
Daniel Hiltgen
7d6eb0d4c3 Move macos v11 support flags to build script (#7203)
Having v11 support hard-coded into the cgo settings causes warnings
for newer Xcode versions.  This should help keep the build clean for users
building from source with the latest tools, while still allow us to target
the older OS via our CI processes.
2024-10-16 12:49:46 -07:00
Daniel Hiltgen
24636dfa87 Discovery CPU details for default thread selection (#6264)
On windows, detect large multi-socket systems and reduce to the number of cores
in one socket for best performance
2024-10-15 11:36:08 -07:00
JHubi1
1d7fa3ad2d Adding 'Ollama App' as community integrations (#6465) 2024-10-15 09:57:32 -07:00
frob
09035b71cd Add missing BF16 tensor type. (#7193)
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2024-10-14 17:06:35 -07:00
Daniel Hiltgen
f3c8b898cd Track GPU discovery failure information (#5820)
* Expose GPU discovery failure information

* Remove exposed API for now
2024-10-14 16:26:45 -07:00
Daniel Hiltgen
5dd0477fd4 Fix regression on older macos versions (#7192)
The new cgo compilation requires a flag to target older macos versions
2024-10-13 10:47:42 -07:00
Daniel Hiltgen
c3d321d405 llm: Remove GGML_CUDA_NO_PEER_COPY for ROCm (#7174)
This workaround logic in llama.cpp is causing crashes for users with less system memory than VRAM.
2024-10-12 09:56:49 -07:00
Jesse Gross
7fe3902552 cli: Send all images in conversation history
Currently the CLI only sends images from the most recent image-
containing message. This prevents doing things like sending
one message with an image and then a follow message with a
second image and asking for comparision based on additional
information not present in any text that was output.

It's possible that some models have a problem with this but the
CLI is not the right place to do this since any adjustments are
model-specific and should affect all clients.

Both llava:34b and minicpm-v do reasonable things with multiple
images in the history.
2024-10-10 11:21:51 -07:00
Jesse Gross
0077e22d52 runner.go: Handle truncation of tokens for stop sequences
When a single token contains both text to be return and a stop
sequence, this causes an out of bounds error when we update the
cache to match our text. This is because we currently assume that
the removing the stop sequence will consume at least one token.

This also inverts the logic to deal with positive numbers, rather
than a value to be subtracted, which is easier to reason about.

Fixes #7153
2024-10-09 20:39:04 -07:00
Jesse Gross
03408f3437 server: Don't clear cmd when closing a server
Close can be called on an LLM server if the runner subprocess dies.
However, the Ollama scheduler code may not know about this yet and
still try to access it. In this case, it is important that 'cmd'
is still available as it is used to check on the status of the
subprocess. If this happens, Kill may be called twice on the subprocess -
that is fine.

In addition, model unloading may race with new accesses, so we should
hold a lock around this. This may result in the model being reloaded
after the first close call - this is also fine as close will be called
again later.
2024-10-09 20:39:04 -07:00
Daniel Hiltgen
cd7e01e8b9 fix vendoring attribute for metal (#7156)
Add missing metal files to vendoring list
2024-10-09 15:22:36 -07:00
Daniel Hiltgen
7a962bd802 fix vendoring attribute (#7155)
Expand out the file extensions for vendored code so git reports the
status correctly
2024-10-09 14:21:02 -07:00
Daniel Hiltgen
f9584deba5 Fix build leakages (#7141)
The recent change to applying patches leaves the submodule dirty based on
"new commits" being present.  This ensures we clean up so the tree no longer
reports dirty after a `go generate ./...` run.

The Makefile was being a bit too aggressive in cleaning things up and would result in deleting the placeholder files which someone might accidentally commit.
2024-10-08 13:04:59 -07:00
Jeffrey Morgan
96efd9052f Re-introduce the llama package (#5034)
* Re-introduce the llama package

This PR brings back the llama package, making it possible to call llama.cpp and
ggml APIs from Go directly via CGo. This has a few advantages:

- C APIs can be called directly from Go without needing to use the previous
  "server" REST API
- On macOS and for CPU builds on Linux and Windows, Ollama can be built without
  a go generate ./... step, making it easy to get up and running to hack on
  parts of Ollama that don't require fast inference
- Faster build times for AVX,AVX2,CUDA and ROCM (a full build of all runners
  takes <5 min on a fast CPU)
- No git submodule making it easier to clone and build from source

This is a big PR, but much of it is vendor code except for:

- llama.go CGo bindings
- example/: a simple example of running inference
- runner/: a subprocess server designed to replace the llm/ext_server package
- Makefile an as minimal as possible Makefile to build the runner package for
  different targets (cpu, avx, avx2, cuda, rocm)

Co-authored-by: Jesse Gross <jesse@ollama.com>
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>

* cache: Clear old KV cache entries when evicting a slot

When forking a cache entry, if no empty slots are available we
evict the least recently used one and copy over the KV entries
from the closest match. However, this copy does not overwrite
existing values but only adds new ones. Therefore, we need to
clear the old slot first.

This change fixes two issues:
 - The KV cache fills up and runs out of space even though we think
   we are managing it correctly
 - Performance gets worse over time as we use new cache entries that
   are not hot in the processor caches

* doc: explain golang objc linker warning (#6830)

* llama: gather transitive dependencies for rocm for dist packaging (#6848)

* Refine go server makefiles to be more DRY (#6924)

This breaks up the monolithic Makefile for the Go based runners into a
set of utility files as well as recursive Makefiles for the runners.
Files starting with the name "Makefile" are buildable, while files that
end with ".make" are utilities to include in other Makefiles.  This
reduces the amount of nearly identical targets and helps set a pattern
for future community contributions for new GPU runner architectures.

When we are ready to switch over to the Go runners, these files should
move to the top of the repo, and we should add targets for the main CLI,
as well as a helper "install" (put all the built binaries on the local
system in a runnable state) and "dist" target (generate the various
tar/zip files for distribution) for local developer use.

* llama: don't create extraneous directories (#6988)

* llama: Exercise the new build in CI (#6989)

Wire up some basic sanity testing in CI for the Go runner.  GPU runners are not covered yet.

* llama: Refine developer docs for Go server (#6842)

This enhances the documentation for development focusing on the new Go
server.  After we complete the transition further doc refinements
can remove the "transition" discussion.

* runner.go: Allocate batches for all sequences during init

We should tell the model that we could have full batches for all
sequences. We already do this when we allocate the batches but it was
missed during initialization.

* llama.go: Don't return nil from Tokenize on zero length input

Potentially receiving nil in a non-error condition is surprising to
most callers - it's better to return an empty slice.

* runner.go: Remove stop tokens from cache

If the last token is EOG then we don't return this and it isn't
present in the cache (because it was never submitted to Decode).
This works well for extending the cache entry with a new sequence.

However, for multi-token stop sequences, we won't return any of the
tokens but all but the last one will be in the cache. This means
when the conversation continues the cache will contain tokens that
don't overlap with the new prompt.

This works (we will pick up the portion where there is overlap) but
it causes unnecessary cache thrashing because we will fork the original
cache entry as it is not a perfect match.

By trimming the cache to the tokens that we actually return this
issue can be avoided.

* runner.go: Simplify flushing of pending tokens

* runner.go: Update TODOs

* runner.go: Don't panic when processing sequences

If there is an error processing a sequence, we should return a
clean HTTP error back to Ollama rather than panicing. This will
make us more resilient to transient failures.

Panics can still occur during startup as there is no way to serve
requests if that fails.

Co-authored-by: jmorganca <jmorganca@gmail.com>

* runner.go: More accurately capture timings

Currently prompt processing time doesn't capture the that it takes
to tokenize the input, only decoding time. We should capture the
full process to more accurately reflect reality. This is especially
true once we start processing images where the initial processing
can take significant time. This is also more consistent with the
existing C++ runner.

* runner.go: Support for vision models

In addition to bringing feature parity with the C++ runner, this also
incorporates several improvements:
 - Cache prompting works with images, avoiding the need to re-decode
   embeddings for every message in a conversation
 - Parallelism is supported, avoiding the need to restrict to one
   sequence at a time. (Though for now Ollama will not schedule
   them while we might need to fall back to the old runner.)

Co-authored-by: jmorganca <jmorganca@gmail.com>

* runner.go: Move Unicode checking code and add tests

* runner.go: Export external cache members

Runner and cache are in the same package so the change doesn't
affect anything but it is more internally consistent.

* runner.go: Image embedding cache

Generating embeddings from images can take significant time (on
my machine between 100ms and 8s depending on the model). Although
we already cache the result of decoding these images, the embeddings
need to be regenerated every time. This is not necessary if we get
the same image over and over again, for example, during a conversation.

This currently uses a very small cache with a very simple algorithm
but it is easy to improve as is warranted.

* llama: catch up on patches

Carry forward solar-pro and cli-unicode patches

* runner.go: Don't re-allocate memory for every batch

We can reuse memory allocated from batch to batch since batch
size is fixed. This both saves the cost of reallocation as well
keeps the cache lines hot.

This results in a roughly 1% performance improvement for token
generation with Nvidia GPUs on Linux.

* runner.go: Default to classic input cache policy

The input cache as part of the go runner implemented a cache
policy that aims to maximize hit rate in both single and multi-
user scenarios. When there is a cache hit, the response is
very fast.

However, performance is actually slower when there is an input
cache miss due to worse GPU VRAM locality. This means that
performance is generally better overall for multi-user scenarios
(better input cache hit rate, locality was relatively poor already).
But worse for single users (input cache hit rate is about the same,
locality is now worse).

This defaults the policy back to the old one to avoid a regression
but keeps the new one available through an environment variable
OLLAMA_MULTIUSER_CACHE. This is left undocumented as the goal is
to improve this in the future to get the best of both worlds
without user configuration.

For inputs that result in cache misses, on Nvidia/Linux this
change improves performance by 31% for prompt processing and
13% for token generation.

* runner.go: Increase size of response channel

Generally the CPU can easily keep up with handling reponses that
are generated but there's no reason not to let generation continue
and handle things in larger batches if needed.

* llama: Add CI to verify all vendored changes have patches (#7066)

Make sure we don't accidentally merge changes in the vendored code
that aren't also reflected in the patches.

* llama: adjust clip patch for mingw utf-16 (#7065)

* llama: adjust clip patch for mingw utf-16

* llama: ensure static linking of runtime libs

Avoid runtime dependencies on non-standard libraries

* runner.go: Enable llamafile (all platforms) and BLAS (Mac OS)

These are two features that are shown on llama.cpp's system info
that are currently different between the two runners. On my test
systems the performance difference is very small to negligible
but it is probably still good to equalize the features.

* llm: Don't add BOS/EOS for tokenize requests

This is consistent with what server.cpp currently does. It affects
things like token processing counts for embedding requests.

* runner.go: Don't cache prompts for embeddings

Our integration with server.cpp implicitly disables prompt caching
because it is not part of the JSON object being parsed, this makes
the Go runner behavior similarly.

Prompt caching has been seen to affect the results of text completions
on certain hardware. The results are not wrong either way but they
are non-deterministic. However, embeddings seem to be affected even
on hardware that does not show this behavior for completions. For
now, it is best to maintain consistency with the existing behavior.

* runner.go: Adjust debug log levels

Add system info printed at startup and quiet down noisier logging.

* llama: fix compiler flag differences (#7082)

Adjust the flags for the new Go server to more closely match the
generate flow

* llama: refine developer docs (#7121)

* llama: doc and example clean up (#7122)

* llama: doc and example clean up

* llama: Move new dockerfile into llama dir

Temporary home until we fully transition to the Go server

* llama: runner doc cleanup

* llama.go: Add description for Tokenize error case

---------

Co-authored-by: Jesse Gross <jesse@ollama.com>
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
Co-authored-by: Daniel Hiltgen <dhiltgen@users.noreply.github.com>
2024-10-08 08:53:54 -07:00
Shifra Goldstone
de982616f1 readme: replace stale links to LangChain documentation (#7117) 2024-10-07 21:16:56 -04:00
hidden1nin
defbf9425a readme: add G1 to list of community integrations (#7096) 2024-10-05 11:57:53 -07:00
Alex Mavrogiannis
f40bb398f6 Stop model before deletion if loaded (fixed #6957) (#7050) 2024-10-01 15:45:43 -07:00
zmldndx
79d3b1e2bd readme: add ARGO LLM tool to community integrations (#7027) 2024-09-29 13:01:01 -07:00
Blake Mizerany
03608cb46e server: close response body on error (#6986)
This change closes the response body when an error occurs in
makeRequestWithRetry. Previously, the first, non-200 response body was
not closed before reattempting the request. This change ensures that
the response body is closed in all cases where an error occurs,
preventing leaks of file descriptors.

Fixes #6974
2024-09-26 12:00:31 -07:00
Xe Iaso
450acb71a6 readme: fix llama3.1 -> llama3.2 typo (#6962) 2024-09-25 11:53:47 -07:00
Jeffrey Morgan
55ea963c9e update default model to llama3.2 (#6959) 2024-09-25 11:11:22 -07:00
Daniel Hiltgen
e9e9bdb8d9 CI: Fix win arm version defect (#6940)
write-host in powershell writes directly to the console and will not be picked
up by a pipe.  Echo, or write-output will.
2024-09-24 15:18:10 -07:00
Alex Yang
35bb6d32b3 readme: update llamaindex links (#6939) 2024-09-24 12:15:43 -07:00
Deep Lakhani
98701b58b3 readme: add LLMChat to community integrations (#6919) 2024-09-23 17:49:46 -07:00
Mahesh Sathiamoorthy
ad935f45ac examples: use punkt_tab instead of punkt (#6907)
This was causing an error since we depend on punkt_tab.
2024-09-21 18:55:28 -07:00
Daniel Hiltgen
dbba73469d runner: Set windows above normal priority (#6905)
When running the subprocess as a background service windows may
throttle, which can lead to thrashing and very poor token rate.
2024-09-21 16:54:49 -07:00
Daniel Hiltgen
6c2eb73a70 Fix missing dep path on windows CPU runners (#6884)
GPUs handled the dependency path properly, but CPU runners didn't which
results in missing vc redist libraries on systems where the user didn't
already have it installed from some other app.
2024-09-21 16:28:29 -07:00
Daniel Hiltgen
2a038c1d7e CI: win arm artifact dist dir (#6900)
The upload artifact is missing the dist prefix since all
payloads are in the same directory, so restore the prefix
on download.
2024-09-20 19:16:18 -07:00
Daniel Hiltgen
616c5eafee CI: win arm adjustments (#6898) 2024-09-20 16:58:56 -07:00
Daniel Hiltgen
f5ff917b1d CI: adjust step ordering for win arm to match x64 (#6895) 2024-09-20 14:20:57 -07:00
Daniel Hiltgen
d632e23fba Add Windows arm64 support to official builds (#5712)
* Unified arm/x86 windows installer

This adjusts the installer payloads to be architecture aware so we can cary
both amd64 and arm64 binaries in the installer, and install only the applicable
architecture at install time.

* Include arm64 in official windows build

* Harden schedule test for slow windows timers

This test seems to be a bit flaky on windows, so give it more time to converge
2024-09-20 13:09:38 -07:00
Patrick Devine
5804cf1723 documentation for stopping a model (#6766) 2024-09-18 16:26:42 -07:00
Ryan Marten
bf7ee0f4d4 examples: add python examples for bespoke-minicheck (#6841) 2024-09-18 09:35:25 -07:00
Michael Yang
504a410f02 llm: add solar pro (preview) (#6846) 2024-09-17 18:11:26 -07:00
Jeffrey Morgan
d05da29912 server: add tool parsing support for nemotron-mini (#6849) 2024-09-17 18:06:16 -07:00
Michael Yang
72962c6e08 Merge pull request #6833 from ollama/mxyng/git-am
make patches git am-able
2024-09-17 16:33:23 -07:00
Michael Yang
7bd7b02712 make patches git am-able
raw diffs can be applied using `git apply` but not with `git am`. git
patches, e.g. through `git format-patch` are both apply-able and am-able
2024-09-17 15:26:40 -07:00
Daniel Hiltgen
8f9ab5e14d CI: dist directories no longer present (#6834)
The new buildx based build no longer leaves the dist/linux-* directories
around, so we don't have to clean them up before uploading.
2024-09-16 17:31:37 -07:00
Daniel Hiltgen
7717bb6a84 CI: clean up naming, fix tagging latest (#6832)
The rocm CI step for RCs was incorrectly tagging them as the latest rocm build.
The multiarch manifest was incorrectly tagged twice (with and without the
prefix "v").  Static windows artifacts weren't being carried between build
jobs.  This also fixes the latest tagging script.
2024-09-16 16:18:41 -07:00
Daniel Hiltgen
0ec2915ea7 CI: set platform build build_linux script to keep buildx happy (#6829)
The runners don't have emulation set up so the default multi-platform build
wont work.
2024-09-16 14:07:29 -07:00
Michael Yang
c9a7541b9c readme: add Agents-Flex to community integrations (#6788) 2024-09-16 13:42:52 -07:00
Patrick Devine
d81cfd7d6f fix typo in import docs (#6828) 2024-09-16 11:48:14 -07:00
Pepo
b330c830d3 readme: add vim-intelligence-bridge to Terminal section (#6818) 2024-09-15 21:20:36 -04:00
Edward Cui
d889c6fd07 readme: add Obsidian Quiz Generator plugin to community integrations (#6789) 2024-09-14 23:52:37 -04:00
Daniel Hiltgen
56b9af336a Fix incremental builds on linux (#6780)
scripts: fix incremental builds on linux or similar
2024-09-13 08:24:08 -07:00
Daniel Hiltgen
fda0d3be52 Use GOARCH for build dirs (#6779)
Corrects x86_64 vs amd64 discrepancy
2024-09-12 16:38:05 -07:00
Daniel Hiltgen
cd5c8f6471 Optimize container images for startup (#6547)
* Optimize container images for startup

This change adjusts how to handle runner payloads to support
container builds where we keep them extracted in the filesystem.
This makes it easier to optimize the cpu/cuda vs cpu/rocm images for
size, and should result in faster startup times for container images.

* Refactor payload logic and add buildx support for faster builds

* Move payloads around

* Review comments

* Converge to buildx based helper scripts

* Use docker buildx action for release
2024-09-12 12:10:30 -07:00
dcasota
fef257c5c5 examples: updated requirements.txt for privategpt example 2024-09-11 18:56:56 -07:00
Adrian Cole
d066d9b8e0 examples: polish loganalyzer example (#6744) 2024-09-11 18:37:37 -07:00
RAPID ARCHITECT
5a00dc9fc9 readme: add ollama_moe to community integrations (#6752) 2024-09-11 18:36:26 -07:00
Jesse Gross
c354e87809 Merge pull request #6767 from ollama/jessegross/bug_6707
runner: Flush pending responses before returning
2024-09-11 17:20:22 -07:00
Jesse Gross
93ac3760cb runner: Flush pending responses before returning
If there are any pending reponses (such as from potential stop
tokens) then we should send them back before ending the sequence.
Otherwise, we can be missing tokens at the end of a response.

Fixes #6707
2024-09-11 16:39:32 -07:00
Patrick Devine
abed273de3 add "stop" command (#6739) 2024-09-11 16:36:21 -07:00
Michael Yang
034392624c Merge pull request #6762 from ollama/mxyng/show-output
refactor show ouput
2024-09-11 14:58:40 -07:00
Michael Yang
ecab6f1cc5 refactor show ouput
fixes line wrapping on long texts
2024-09-11 14:23:09 -07:00
Petr Mironychev
7d6900827d readme: add QodeAssist to community integrations (#6754) 2024-09-11 13:19:49 -07:00
Daniel Hiltgen
9246e6dd15 Verify permissions for AMD GPU (#6736)
This adds back a check which was lost many releases back to verify /dev/kfd permissions
which when lacking, can lead to confusing failure modes of:
  "rocBLAS error: Could not initialize Tensile host: No devices found"

This implementation does not hard fail the serve command but instead will fall back to CPU
with an error log.  In the future we can include this in the GPU discovery UX to show
detected but unsupported devices we discovered.
2024-09-11 11:38:25 -07:00
Michael Yang
735a0ca2e4 Merge pull request #6732 from ollama/mxyng/debug-proxy
add *_proxy to env map for debugging
2024-09-10 16:13:25 -07:00
Michael Yang
dddb72e084 add *_proxy for debugging 2024-09-10 09:43:35 -07:00
Jeffrey Morgan
83a9b5271a docs: update examples to use llama3.1 (#6718) 2024-09-09 22:47:16 -07:00
Daniel Hiltgen
4a8069f9c4 Quiet down dockers new lint warnings (#6716)
* Quiet down dockers new lint warnings

Docker has recently added lint warnings to build.  This cleans up those warnings.

* Fix go lint regression
2024-09-09 17:22:20 -07:00
Patrick Devine
84b84ce2db catch when model vocab size is set correctly (#6714) 2024-09-09 17:18:54 -07:00
Jeffrey Morgan
bb6a086d63 readme: add crewAI to community integrations (#6699) 2024-09-08 00:36:24 -07:00
RAPID ARCHITECT
30c8f201cc readme: add crewAI with mesop to community integrations 2024-09-08 00:35:59 -07:00
frob
06d4fba851 openai: align chat temperature and frequency_penalty options with completion (#6688) 2024-09-07 09:08:08 -07:00
Jeffrey Morgan
108fb6c1d1 docs: improve linux install documentation (#6683)
Includes small improvements to document layout and code blocks
2024-09-06 22:05:37 -07:00
Yaroslav
da915345d1 openai: don't scale temperature or frequency_penalty (#6514) 2024-09-06 17:45:45 -07:00
nickthecook
8a027bc401 readme: add Archyve to community integrations (#6680) 2024-09-06 14:06:01 -07:00
imoize
5446903fbd readme: add Plasmoid Ollama Control to community integrations (#6681) 2024-09-06 14:04:12 -07:00
Daniel Hiltgen
56318fb365 Improve logging on GPU too small (#6666)
When we determine a GPU is too small for any layers, it's not always clear why.
This will help troubleshoot those scenarios.
2024-09-06 08:29:36 -07:00
frob
fe91d7fff1 openai: fix "presence_penalty" typo and add test (#6665) 2024-09-06 01:16:28 -07:00
Patrick Devine
608e87bf87 Fix gemma2 2b conversion (#6645) 2024-09-05 17:02:28 -07:00
Daniel Hiltgen
48685c6ed0 Document uninstall on windows (#6663) 2024-09-05 15:57:38 -07:00
Daniel Hiltgen
9565fa64a8 Revert "Detect running in a container (#6495)" (#6662)
This reverts commit a60d9b89ce.
2024-09-05 14:26:00 -07:00
Daniel Hiltgen
6719097649 llm: make load time stall duration configurable via OLLAMA_LOAD_TIMEOUT
With the new very large parameter models, some users are willing to wait for
a very long time for models to load.
2024-09-05 14:00:08 -07:00
Daniel Hiltgen
b05c9e83d9 Introduce GPU Overhead env var (#5922)
Provide a mechanism for users to set aside an amount of VRAM on each GPU
to make room for other applications they want to start after Ollama, or workaround
memory prediction bugs
2024-09-05 13:46:35 -07:00
Daniel Hiltgen
a60d9b89ce Detect running in a container (#6495) 2024-09-05 13:24:51 -07:00
Michael Yang
bf612cd608 Merge pull request #6260 from ollama/mxyng/mem
llama3.1 memory
2024-09-05 13:22:08 -07:00
Zeyo
ef98e56122 readme: add AiLama to the list of community integrations (#4957) 2024-09-05 13:10:44 -07:00
Michael
5f944baac7 Update gpu.md: Add RTX 3050 Ti and RTX 3050 Ti (#5888)
* Update gpu.md

    Seems strange that the laptop versions of 3050 and 3050 Ti would be supported but not the non-notebook, but this is what the page (https://developer.nvidia.com/cuda-gpus) says.

Signed-off-by: bean5 <2052646+bean5@users.noreply.github.com>

* Update gpu.md

Remove notebook reference

---------

Signed-off-by: bean5 <2052646+bean5@users.noreply.github.com>
2024-09-05 11:24:26 -07:00
Tobias Heinze
6fc9d22707 server: fix blob download when receiving a 200 response (#6656) 2024-09-05 10:48:26 -07:00
Vitaly Zdanevich
f27c00d8c5 readme: add Gentoo package manager entry to community integrations (#5714) 2024-09-05 09:58:14 -07:00
王卿
c7c845ec52 Update install.sh:Replace "command -v" with encapsulated functionality (#6035)
Replace "command -v" with encapsulated functionality
2024-09-05 09:49:48 -07:00
Augustinas Malinauskas
cf48603943 readme: include Enchanted for Apple Vision Pro (#4949)
Added Enchanted with Apple Vision Pro support
2024-09-05 01:30:19 -04:00
Silas Marvin
6e67be09b6 readme: add lsp-ai to community integrations (#5063) 2024-09-05 01:17:34 -04:00
Arda Günsüren
0f5f060d2b readme: add ollama-php library to community integrations (#6361) 2024-09-05 01:01:14 -04:00
jk011ru
b3554778bd readme: add vnc-lm discord bot community integration (#6644) 2024-09-04 19:46:02 -04:00
Pascal Patry
bbe7b96ded llm: use json.hpp from common (#6642) 2024-09-04 19:34:42 -04:00
Rune Berg
c18ff18b2c readme: add confichat to community integrations (#6378) 2024-09-04 17:26:02 -04:00
Tomoya Fujita
133770a548 docs: add group to manual Linux isntructions and verify service is running (#6430) 2024-09-04 14:45:09 -04:00
Teïlo M
f36ebfb478 readme: add gollm to the list of community libraries (#6099) 2024-09-04 14:19:41 -04:00
亢奋猫
5b55379651 readme: add Cherry Studio to community integrations (#6633) 2024-09-04 10:53:36 -04:00
Mitar
93eb43d020 readme: add Go fun package (#6421) 2024-09-04 10:52:46 -04:00
Carter
369479cc30 docs: fix spelling error (#6391)
change "dorrect" to "correct"
2024-09-04 09:42:33 -04:00
Erkin Alp Güney
7d89e48f5c install.sh: update instructions to use WSL2 (#6450) 2024-09-04 09:34:53 -04:00
Sam
27bcce6d9f readme: add claude-dev to community integrations (#6630) 2024-09-04 09:32:26 -04:00
Viz
491fc312ae readme: add PyOllaMx project (#6624) 2024-09-03 23:10:53 -04:00
Jeffrey Morgan
5e2653f9fe llm: update llama.cpp commit to 8962422 (#6618) 2024-09-03 21:12:39 -04:00
Daniel Hiltgen
f29b167e1a Use cuda v11 for driver 525 and older (#6620)
It looks like driver 525 (aka, cuda driver 12.0) has problems with the cuda v12 library
we compile against, so run v11 on those older drivers if detected.
2024-09-03 17:15:31 -07:00
Daniel Hiltgen
037a4d103e Log system memory at info (#6617)
On systems with low system memory, we can hit allocation failures that are difficult to diagnose
without debug logs.  This will make it easier to spot.
2024-09-03 14:55:20 -07:00
Mateusz Migas
50c05d57e0 readme: add Painting Droid community integration (#5514) 2024-09-03 16:15:54 -04:00
Amith Koujalgi
35159de18a readme: update Ollama4j link and add link to Ollama4j Web UI (#6608) 2024-09-03 16:08:50 -04:00
FellowTraveler
94fff5805f Fix sprintf to snprintf (#5664)
/Users/au/src/ollama/llm/ext_server/server.cpp:289:9: warning: 'sprintf' is deprecated: This function is provided for compatibility reasons only. Due to security concerns inherent in the design of sprintf(3), it is highly recommended that you use snprintf(3) instead.
2024-09-03 09:32:59 -07:00
OpenVMP
14d5093cd0 readme: add PartCAD tool to readme for generating 3D CAD models using Ollama (#6605) 2024-09-03 12:28:01 -04:00
R0CKSTAR
9df5f0e8e4 Reduce docker image size (#5847)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
2024-09-03 09:25:31 -07:00
presbrey
ad3eb00bee readme: add OllamaFarm project (#6508) 2024-09-02 16:05:36 -04:00
Jonathan Hecl
bfc2d61549 readme: add go-crew and Ollamaclient projects (#6583) 2024-09-02 15:34:26 -04:00
SnoopyTlion
741affdfd6 docs: update faq.md for OLLAMA_MODELS env var permissions (#6587) 2024-09-02 15:31:29 -04:00
Vimal Kumar
5f7b4a5e30 fix(cmd): show info may have nil ModelInfo (#6579) 2024-08-31 21:12:17 -07:00
rayfiyo
1aad838707 docs: update GGUF examples and references (#6577) 2024-08-31 19:34:25 -07:00
Daniel Hiltgen
a1cef4d0a5 Add findutils to base images (#6581)
This caused missing internal files
2024-08-31 10:40:05 -07:00
Michael Yang
c41f0b9e6c Merge pull request #6562 from ollama/mxyng/build-artifacts
remove any unneeded build artifacts
2024-08-30 09:40:50 -07:00
Michael Yang
142cbb722d Merge pull request #6482 from ollama/mxyng/client-path
passthrough OLLAMA_HOST path to client
2024-08-30 09:40:34 -07:00
Michael Yang
9468c6824a Merge pull request #6534 from ollama/mxyng/messages
update templates to use messages
2024-08-30 09:39:59 -07:00
Michael Yang
11018196e0 remove any unneeded build artifacts 2024-08-29 13:40:47 -07:00
Bryan Honof
56346ccfa3 doc: Add Nix and Flox to package manager listing (#6074) 2024-08-29 12:45:35 -04:00
Patrick Devine
8e4e509fa4 update the openai docs to explain how to set the context size (#6548) 2024-08-28 17:11:46 -07:00
Michael Yang
47c2b947a9 Merge pull request #6546 from ollama/mxyng/fix-test
fix(test): do not clobber models directory
2024-08-28 15:37:47 -07:00
Michael Yang
5eb77bf976 Merge pull request #6539 from ollama/mxyng/validate-modelpath
fix: validate modelpath
2024-08-28 14:38:27 -07:00
Michael Yang
e4d0a9c325 fix(test): do not clobber models directory 2024-08-28 14:07:48 -07:00
Patrick Devine
7416ced70f add llama3.1 chat template (#6545) 2024-08-28 14:03:20 -07:00
Michael Yang
9cfd2dd3e3 Merge pull request #6522 from ollama/mxyng/detect-chat
detect chat template from configs that contain lists
2024-08-28 11:04:18 -07:00
Michael Yang
8e6da3cbc5 update deprecated warnings 2024-08-28 09:55:11 -07:00
Michael Yang
d9d50c43cc validate model path 2024-08-28 09:32:57 -07:00
Patrick Devine
6c1c1ad6a9 throw an error when encountering unsupport tensor sizes (#6538) 2024-08-27 17:54:04 -07:00
Daniel Hiltgen
93ea9240ae Move ollama executable out of bin dir (#6535) 2024-08-27 16:19:00 -07:00
Michael Yang
413ae39f3c update templates to use messages 2024-08-27 15:44:04 -07:00
Michael Yang
60e47573a6 more tokenizer tests 2024-08-27 14:51:10 -07:00
Patrick Devine
d13c3daa0b add safetensors to the modelfile docs (#6532) 2024-08-27 14:46:47 -07:00
Patrick Devine
1713eddcd0 Fix import image width (#6528) 2024-08-27 14:19:47 -07:00
Daniel Hiltgen
4e1c4f6e0b Update manual instructions with discrete ROCm bundle (#6445) 2024-08-27 13:42:28 -07:00
Sean Khatiri
397cae7962 llm: fix typo in comment (#6530) 2024-08-27 13:28:29 -07:00
Patrick Devine
1c70a00f71 adjust image sizes 2024-08-27 11:15:25 -07:00
Michael Yang
eae3af6807 clean up convert tokenizer 2024-08-27 11:11:43 -07:00
Michael Yang
3eb08377f8 detect chat template from configs that contain lists 2024-08-27 10:49:33 -07:00
Patrick Devine
ac80010db8 update the import docs (#6104) 2024-08-26 19:57:26 -07:00
Jeffrey Morgan
47fa0839b9 server: clean up route names for consistency (#6524) 2024-08-26 19:36:11 -07:00
Daniel Hiltgen
0f92b19bec Only enable numa on CPUs (#6484)
The numa flag may be having a performance impact on multi-socket systems with GPU loads
2024-08-24 17:24:50 -07:00
Daniel Hiltgen
69be940bf6 gpu: Group GPU Library sets by variant (#6483)
The recent cuda variant changes uncovered a bug in ByLibrary
which failed to group by common variant for GPU types.
2024-08-23 15:11:56 -07:00
Michael Yang
9638c24c58 Merge pull request #5446 from ollama/mxyng/faq
update faq
2024-08-23 14:05:59 -07:00
Michael Yang
bb362caf88 update faq 2024-08-23 13:37:21 -07:00
Michael Yang
386af6c1a0 passthrough OLLAMA_HOST path to client 2024-08-23 13:23:28 -07:00
Patrick Devine
0c819e167b convert safetensor adapters into GGUF (#6327) 2024-08-23 11:29:56 -07:00
Daniel Hiltgen
7a1e1c1caf gpu: Ensure driver version set before variant (#6480)
During rebasing, the ordering was inverted causing the cuda version
selection logic to break, with driver version being evaluated as zero
incorrectly causing a downgrade to v11.
2024-08-23 11:21:12 -07:00
Daniel Hiltgen
0b03b9c32f llm: Align cmake define for cuda no peer copy (#6455)
Define changed recently and this slipped through the cracks with the old
name.
2024-08-23 11:20:39 -07:00
Daniel Hiltgen
90ca84172c Fix embeddings memory corruption (#6467)
* Fix embeddings memory corruption

The patch was leading to a buffer overrun corruption.  Once removed though, parallism
in server.cpp lead to hitting an assert due to slot/seq IDs being >= token count.  To
work around this, only use slot 0 for embeddings.

* Fix embed integration test assumption

The token eval count has changed with recent llama.cpp bumps (0.3.5+)
2024-08-22 14:51:42 -07:00
Michael Yang
6bd8a4b0a1 Merge pull request #6064 from ollama/mxyng/convert-llama3
convert: update llama conversion for llama3.1
2024-08-21 12:57:09 -07:00
Michael Yang
77903ab8b4 llama3.1 2024-08-21 11:49:31 -07:00
Michael Yang
e22286c9e1 Merge pull request #5365 from ollama/mxyng/convert-gemma2
convert gemma2
2024-08-21 11:48:43 -07:00
Michael Yang
107f695929 Merge pull request #4917 from ollama/mxyng/convert-bert
convert bert model from safetensors
2024-08-21 11:48:29 -07:00
Michael Yang
4ecc70d3b4 Merge pull request #6386 from zwwhdls/fix-new-layer
fix: chmod new layer to 0o644 when creating it
2024-08-21 10:58:45 -07:00
Michael Yang
3546bbd08c convert gemma2 2024-08-20 17:27:51 -07:00
Michael Yang
beb49eef65 create bert models from cli 2024-08-20 17:27:34 -07:00
Michael Yang
5a28b9cf5f bert 2024-08-20 17:27:34 -07:00
Daniel Hiltgen
a017cf2fea Split rocm back out of bundle (#6432)
We're over budget for github's maximum release artifact size with rocm + 2 cuda
versions.  This splits rocm back out as a discrete artifact, but keeps the layout so it can
be extracted into the same location as the main bundle.
2024-08-20 07:26:38 -07:00
Daniel Hiltgen
19e5a890f7 CI: remove directories from dist dir before upload step (#6429) 2024-08-19 15:19:21 -07:00
Daniel Hiltgen
f91c9e3709 CI: handle directories during checksum (#6427) 2024-08-19 13:48:45 -07:00
Daniel Hiltgen
2df6905ede Merge pull request #6424 from dhiltgen/cuda_v12
Fix overlapping artifact name on CI
2024-08-19 12:11:58 -07:00
Daniel Hiltgen
d8be22e47d Fix overlapping artifact name on CI 2024-08-19 12:07:18 -07:00
Daniel Hiltgen
652c273f0e Merge pull request #5049 from dhiltgen/cuda_v12
Cuda v12
2024-08-19 11:14:24 -07:00
Daniel Hiltgen
88e7705079 Merge pull request #6402 from rick-github/numParallel
Override numParallel in pickBestPartialFitByLibrary() only if unset.
2024-08-19 11:07:22 -07:00
Daniel Hiltgen
f9e31da946 Review comments 2024-08-19 10:36:15 -07:00
Daniel Hiltgen
88bb9e3328 Adjust layout to bin+lib/ollama 2024-08-19 09:38:53 -07:00
Daniel Hiltgen
3b19cdba2a Remove Jetpack 2024-08-19 09:38:53 -07:00
Daniel Hiltgen
927d98a6cd Add windows cuda v12 + v11 support 2024-08-19 09:38:53 -07:00
Daniel Hiltgen
f6c811b320 Enable cuda v12 flags 2024-08-19 09:38:53 -07:00
Daniel Hiltgen
4fe3a556fa Add cuda v12 variant and selection logic
Based on compute capability and driver version, pick
v12 or v11 cuda variants.
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
fc3b4cda89 Report GPU variant in log 2024-08-19 09:38:53 -07:00
Daniel Hiltgen
d470ebe78b Add Jetson cuda variants for arm
This adds new variants for arm64 specific to Jetson platforms
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
c7bcb00319 Wire up ccache and pigz in the docker based build
This should help speed things up a little
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
74d45f0102 Refactor linux packaging
This adjusts linux to follow a similar model to windows with a discrete archive
(zip/tgz) to cary the primary executable, and dependent libraries. Runners are
still carried as payloads inside the main binary

Darwin retain the payload model where the go binary is fully self contained.
2024-08-19 09:38:53 -07:00
Jeffrey Morgan
9fddef3731 server: limit upload parts to 16 (#6411) 2024-08-19 09:20:52 -07:00
Richard Lyons
885cf45087 Fix white space. 2024-08-18 03:07:16 +02:00
Richard Lyons
9352eeb752 Reset NumCtx. 2024-08-18 02:55:01 +02:00
Richard Lyons
0ad0e738cd Override numParallel only if unset. 2024-08-18 01:43:26 +02:00
zwwhdls
bdc4308afb fix: chmod new layer to 0o644 when creating it
Signed-off-by: zwwhdls <zww@hdls.me>
2024-08-16 11:43:19 +08:00
Daniel Hiltgen
d29cd4c2ed Merge pull request #6381 from eust-w/main
fix: Add tooltip to system tray icon
2024-08-15 15:31:15 -07:00
eust-w
a84c05cf91 fix: Add tooltip to system tray icon
- Updated setIcon method to include tooltip text for the system tray icon.
- Added NIF_TIP flag and set the tooltip text using UTF16 encoding.

Resolves: #6372
2024-08-16 06:00:12 +08:00
Michael Yang
e3d7f32af7 Merge pull request #6363 from ollama/mxyng/fix-noprune
fix: noprune on pull
2024-08-15 12:20:38 -07:00
Michael Yang
3a75e74e34 only skip invalid json manifests 2024-08-15 10:29:14 -07:00
Michael Yang
237dccba1e skip invalid manifest files 2024-08-14 16:55:45 -07:00
Michael Yang
b3f75fc812 fix noprune 2024-08-14 15:48:51 -07:00
Jeffrey Morgan
8200c371ae add CONTRIBUTING.md (#6349) 2024-08-14 15:19:50 -07:00
longtao
0a8d6ea86d Fix typo and improve readability (#5964)
* Fix typo and improve readability

Summary:
* Rename updatAvailableMenuID to updateAvailableMenuID
* Replace unused cmd parameter with _ in RunServer function
* Fix typos in comments

(cherry picked from commit 5b8715f0b04773369e8eb1f9e6737995a0ab3ba7)

* Update api/client.go

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-08-13 17:54:19 -07:00
Blake Mizerany
8e1050f366 server: reduce max connections used in download (#6347)
The previous value of 64 was WAY too high and unnecessary. It reached
diminishing returns and blew past it. This is a more reasonable number
for _most_ normal cases. For users on cloud servers with excellent
network quality, this will keep screaming for them, without hitting our
CDN limits. For users with relatively poor network quality, this will
keep them from saturating their network and causing other issues.
2024-08-13 16:47:35 -07:00
Bruce MacDonald
eda8a32a09 update chatml template format to latest in docs (#6344) 2024-08-13 16:39:18 -07:00
Michael Yang
a0a40aa20c Merge pull request #6346 from ollama/mxyng/lint 2024-08-13 14:58:35 -07:00
Michael Yang
2697d7f5aa lint
- fixes printf: non-constant format string in call to fmt.Printf
- fixes SA1032: arguments have the wrong order
- disables testifylint
2024-08-13 14:36:33 -07:00
Pamela Fox
1f32276178 Update openai.md to remove extra checkbox (#6345) 2024-08-13 13:36:05 -07:00
Daniel Hiltgen
4c4fe3f87f Merge pull request #6343 from dhiltgen/revert_win_go_version
Go back to a pinned Go version
2024-08-13 11:53:49 -07:00
Daniel Hiltgen
feedf49c71 Go back to a pinned Go version
Go version 1.22.6 is triggering AV false positives, so go back to 1.22.5
2024-08-13 11:45:44 -07:00
royjhan
8b00a415ab Load Embedding Model on Empty Input (#6325)
* load on empty input

* no load on invalid input
2024-08-13 10:19:56 -07:00
Michael Yang
01b80e9ffc Merge pull request #5443 from ollama/mxyng/convert-phi3
add conversion for microsoft phi 3 mini/medium 4k, 128k
2024-08-12 15:47:58 -07:00
Michael Yang
bd5e432630 update import.md 2024-08-12 15:13:29 -07:00
Bruce MacDonald
aec77d6a05 support new "longrope" attention factor 2024-08-12 15:13:29 -07:00
Michael Yang
6ffb5cb017 add conversion for microsoft phi 3 mini/medium 4k, 128 2024-08-12 15:13:29 -07:00
Josh
f7e3b9190f cmd: spinner progress for transfer model data (#6100) 2024-08-12 11:46:32 -07:00
Josh
980dd15f81 cmd: speed up gguf creates (#6324) 2024-08-12 11:46:09 -07:00
royjhan
01d544d373 OpenAI: Simplify input output in testing (#5858)
* simplify input output

* direct comp

* in line image

* rm error pointer type

* update response testing

* lint
2024-08-12 10:33:34 -07:00
Josh
1dc3ef3aa9 Revert "server: speed up single gguf creates (#5898)" (#6323)
This reverts commit 8aac22438e.
2024-08-12 09:57:51 -07:00
Josh
8aac22438e server: speed up single gguf creates (#5898) 2024-08-12 09:28:55 -07:00
Jeffrey Morgan
15c2d8fe14 server: parallelize embeddings in API web handler instead of in subprocess runner (#6220)
For simplicity, perform parallelization of embedding requests in the API handler instead of offloading this to the subprocess runner. This keeps the scheduling story simpler as it builds on existing parallel requests, similar to existing text completion functionality.
2024-08-11 11:57:10 -07:00
Daniel Hiltgen
25906d72d1 llm: prevent loading too large models on windows (#5926)
Don't allow loading models that would lead to memory exhaustion (across vram, system memory and disk paging). This check was already applied on Linux but should also be applied on Windows as well.
2024-08-11 11:30:20 -07:00
CognitiveTech
023451ce47 add integration obook-summary (#6305) 2024-08-10 18:43:08 -07:00
Jesse Gross
9b53e39d8e Merge pull request #6258 from coolljt0725/fix_typo
server/download.go: Fix a typo in log
2024-08-09 17:19:48 -07:00
Michael Yang
97fae2df95 Merge pull request #6235 from Nicholas42/fix_line_endings
Set *.png and *.ico to be treated as binary files.
2024-08-09 17:06:30 -07:00
Michael Yang
160d9d4900 Merge pull request #6171 from ollama/mxyng/remove-temp
removeall to remove non-empty temp dirs
2024-08-09 15:47:13 -07:00
Nicholas Schwab
d4e6407464 Restrict text files with explicit line feeds to *.go.
This partially reverts b732beba6a. It
seems like explicitly setting all files to use line feeds was done due
to issues with the go linter, hence it can be restricted to those files
(https://github.com/ollama/ollama/pull/6235#issuecomment-2278745953).
2024-08-09 23:14:13 +02:00
Daniel Hiltgen
b7f7d8cd15 Merge pull request #6291 from dhiltgen/no_sparse_fail
Don't hard fail on sparse setup error
2024-08-09 12:30:25 -07:00
Daniel Hiltgen
2fa1db4345 Don't hard fail on sparse setup error
It seems this can fail in some casees, but proceed
with the download anyway.
2024-08-09 12:16:19 -07:00
Daniel Hiltgen
71b0945fc6 Merge pull request #6290 from dhiltgen/intel_npe
Harden intel boostrap for nil pointers
2024-08-09 12:14:42 -07:00
Daniel Hiltgen
5bca2e60a7 Harden intel boostrap for nil pointers 2024-08-09 11:31:38 -07:00
Nicholas42
67472e0e89 Also flag *.icns as binary 2024-08-09 13:41:20 +02:00
Daniel Hiltgen
e9aa5117c4 Merge pull request #6133 from dhiltgen/cuda_repo
Adjust arm cuda repo paths
2024-08-08 12:33:35 -07:00
Daniel Hiltgen
2473bdba5e Merge pull request #6182 from dhiltgen/more_patterns
Catch one more error log
2024-08-08 12:33:17 -07:00
Michael Yang
2003d60159 llama3.1 memory 2024-08-08 11:18:13 -07:00
Jesse Gross
7d1c0047fa Merge pull request #6247 from ollama/jessegross/layers
Store layers inside manifests consistently as values.
2024-08-08 10:46:43 -07:00
Jitang Lei
7b61eba471 server/download.go: Fix a typo in log
Signed-off-by: Jitang Lei <leijitang@outlook.com>
2024-08-08 20:28:01 +08:00
Jesse Gross
7edaf6e7e8 manifest: Store layers inside manifests consistently as values.
Commit 1829fb61 ("manifest: Fix crash on startup when trying to clean up
unused files (#5840)") changed the config layer stored in manifests
from a pointer to a value. This was done in order to avoid potential
nil pointer dereferences after it is deserialized from JSON in the
event that the field is missing.

This changes the Layers slice to also be stored by value. This enables
consistency in handling across the two objects.
2024-08-07 17:03:06 -07:00
Jesse Gross
97ec8cfd4e image: Clarify argument to WriteManifest is config
When creating a model the config layer is appended to the list of
layers and then the last layer is used as the config when writing the
manifest. This change directly uses the config layer to write the
manifest. There is no behavior change but it is less error prone.
2024-08-07 16:58:42 -07:00
royjhan
5b3a21b578 add metrics to docs (#6079) 2024-08-07 14:43:44 -07:00
Kyle Kelley
ad0c19dde4 Use llama3.1 in tools example (#5985)
* Use llama3.1 in tools example

* Update api.md
2024-08-07 17:20:50 -04:00
Jesse Gross
69eb06c40e Merge pull request #6145 from ollama/jessegross/bug5840
Fix crash on startup when trying to clean up unused files (#5840)
2024-08-07 11:24:15 -07:00
Jesse Gross
1829fb61bd manifest: Fix crash on startup when trying to clean up unused files (#5840)
Currently if the config field is missing in the manifest file (or
corrupted), Ollama will crash when it tries to read it. This can
happen at startup or when pulling new models.

This data is mostly just used for showing model information so we
can be tolerant of it not being present - it is not required to
run the models. Besides avoiding crashing, this also gives us the
ability to restructure the config in the future by pulling it
into the main manifest file.
2024-08-07 10:30:44 -07:00
Nicholas Schwab
ce67706037 Set *.png and *.ico to be treated as binary files.
The change b732beba6 makes all files text files and sets lf as eol. This
will automatically change all files to have lf if they are touched by
git (e.g. via git status). This change cannot be stashed and makes it
hard to work with the repo (rebase and checkout don't really work). See
also #6183.

Here, we set the offending files (*.png and *.ico, but that might be
more in the future) to be treated as binary files and not be changed by
git.
2024-08-07 18:20:11 +02:00
Jesse Gross
685a53534b manifest: Don't prune layers if we can't open a manifest file
If there is an error when opening a manifest file (corrupted, permission denied, etc.)
then the referenced layers will not be included in the list of active
layers. This causes them to be deleted when pruning happens at startup
or a model is pulled.

In such a situation, we should prefer to preserve data in the hopes that
it can be recovered rather than being agressive about deletion.
2024-08-06 23:11:19 -07:00
Jeffrey Morgan
de4fc29773 llm: reserve required number of slots for embeddings (#6219) 2024-08-06 23:20:49 -04:00
Jeffrey Morgan
e04c7012c2 update llama.cpp submodule to 1e6f6554 (#6208) 2024-08-06 15:11:45 -04:00
Chua Chee Seng
d4a7216c82 Fixed invalid option provided not displaying the invalid option name problem. (#6202) 2024-08-06 14:37:16 -04:00
Daniel Hiltgen
a4fdd03c3b Merge pull request #6207 from dhiltgen/sparse_win
Ensure sparse files on windows during download
2024-08-06 11:06:06 -07:00
Daniel Hiltgen
fc85f50a2b Ensure sparse files on windows during download
The file.Truncate call on windows will write the whole file
unless you set the sparse flag, leading to heavy I/O at the
beginning of download.  This should improve our
I/O behavior on windows and put less stress on the users disk.
2024-08-06 10:58:08 -07:00
royjhan
86b907f82a sort batch results (#6189) 2024-08-05 16:55:34 -07:00
Michael Yang
10d49bce70 Merge pull request #6190 from ollama/mxyng/fix-integration
fix concurrency test
2024-08-05 16:45:49 -07:00
Michael Yang
7ed367419e fix concurrency test 2024-08-05 16:36:16 -07:00
Daniel Hiltgen
50ee8b5f56 Merge pull request #6186 from dhiltgen/numa
Implement linux NUMA detection
2024-08-05 15:20:06 -07:00
Michael Yang
03bdac0595 Merge pull request #6146 from ollama/mxyng/testing
use testing tempdirs
2024-08-05 13:00:05 -07:00
Daniel Hiltgen
f457d63400 Implement linux NUMA detection
If the system has multiple numa nodes, enable numa support in llama.cpp
If we detect numactl in the path, use that, else use the basic "distribute" mode.
2024-08-05 12:56:20 -07:00
Daniel Hiltgen
04210aa6dd Catch one more error log 2024-08-05 09:28:07 -07:00
Michael Yang
43f9d92008 close pid file 2024-08-05 00:41:16 -07:00
Michael Yang
ed6c8bfe57 removeall to remove non-empty temp dirs 2024-08-05 00:41:16 -07:00
Michael Yang
39f2bc6bfc Merge pull request #6167 from ollama/mxyng/line-feed
line feed
2024-08-05 00:06:28 -07:00
frob
b73b0940ef Disable paging for journalctl (#6154)
Users using `journalctl` to get logs for issue logging sometimes don't realize that paging is causing information to be missed.
2024-08-05 00:10:53 -04:00
Michael Yang
6a07344786 line feed 2024-08-04 17:25:41 -07:00
sryu1
8b920f35a4 Add Gemma 2 2b (#6151) 2024-08-04 10:58:39 -04:00
Ivan Charapanau
4221e39867 Reference ollama integration with Harbor (#6147) 2024-08-02 17:03:46 -07:00
Michael Yang
a091fadfda use testing tempdirs 2024-08-02 16:04:06 -07:00
Michael Yang
77ccbf04dc Merge pull request #6128 from ollama/mxyng/lint
enable gofmt/gofumpt/goimports/tenv
2024-08-02 14:58:40 -07:00
royjhan
4addf6b587 Update OpenAI Compatibility Docs with /v1/completions (#5311)
* Update docs

* token bug corrected

* Update docs/openai.md

* Update docs/openai.md

* add suffix

* merge conflicts

* merge conflicts
2024-08-02 13:16:23 -07:00
royjhan
85c7f11170 Update docs (#5310) 2024-08-02 13:05:57 -07:00
Daniel Hiltgen
df3802a65f Adjust arm cuda repo paths
Ubuntu distros fail to install cuda drivers since aarch64 isn't valid
2024-08-01 17:22:25 -07:00
Michael Yang
b732beba6a lint 2024-08-01 17:06:06 -07:00
Kim Hallberg
ce1fb4447e Fix models/{model} URL (#6132) 2024-08-01 16:31:47 -07:00
royjhan
558a54b098 Update OpenAI Compatibility Docs with /v1/embeddings (#5470)
* docs without usage

* no usage

* rm metric note
2024-08-01 16:00:29 -07:00
royjhan
ed52833bb1 Add to docs (#5309) 2024-08-01 15:58:13 -07:00
royjhan
6f133a0bdd OpenAI: Add Usage to v1/embeddings (#5886)
* add prompt tokens to embed response

* rm slog

* metrics

* types

* prompt n

* clean up

* reset submodule

* add tokens to v1/embeddings

* separate usage
2024-08-01 15:49:37 -07:00
royjhan
f561eecfb8 Update OpenAI Compatibility Docs with /v1/models (#5151)
* OpenAI Docs

* Update docs/openai.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Remove newline

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-08-01 15:48:44 -07:00
Michael Yang
ff7c9060ec Merge pull request #6115 from slouffka/fix-context
Fix context in /api/generate grows too much (#5980).
2024-08-01 15:13:59 -07:00
Michael Yang
0ff42e84b0 Merge pull request #4756 from ollama/mxyng/convert2
refactor convert
2024-08-01 14:16:30 -07:00
Vyacheslav Moskalev
8a9f946ca7 Refactor and format code. 2024-08-02 03:50:05 +07:00
Vyacheslav Moskalev
3b5210548e Refactor code. Remove extra variable. 2024-08-01 19:56:15 +07:00
Vyacheslav Moskalev
b0c216584c Better types and naming closer to style. 2024-08-01 19:43:44 +07:00
Vyacheslav Moskalev
49a5483139 Change the order of context and prompt. 2024-08-01 19:25:56 +07:00
Vyacheslav Moskalev
6bc5c13758 Fix extra context concatenation in generate handler (#5980). 2024-08-01 15:45:58 +07:00
Michael Yang
3e614260af Merge pull request #6109 from ollama/mxyng/fix-modelfile
fix modelfile message quotes
2024-07-31 17:05:43 -07:00
Michael Yang
d87b4a488e fix modelfile message quotes 2024-07-31 16:52:09 -07:00
Michael Yang
4c14855ad7 Merge pull request #6106 from ollama/mxyng/default-sliding-window-attention
patches: phi3 optional sliding window attention
2024-07-31 16:12:06 -07:00
Blake Mizerany
dc77bbcfa4 server: fix json marshalling of downloadBlobPart (#6108) 2024-07-31 16:01:24 -07:00
Michael Yang
d8e2664c33 convert: fix parse functions 2024-07-31 15:58:55 -07:00
Michael Yang
eafc607abb convert: only extract large files 2024-07-31 15:58:55 -07:00
Michael Yang
781fc2d576 Update convert/reader_safetensors.go
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-07-31 15:58:55 -07:00
Michael Yang
df993fa37b comments 2024-07-31 15:58:55 -07:00
Michael Yang
5e9db9fb0b refactor convert 2024-07-31 15:58:33 -07:00
Michael Yang
0f3271db88 patches: phi3 default sliding window attention 2024-07-31 14:58:34 -07:00
Michael Yang
6b252918fb update convert test to check result data 2024-07-31 10:59:38 -07:00
Michael Yang
c4c84b7a0d Merge pull request #5196 from ollama/mxyng/messages-2
include modelfile messages
2024-07-31 10:18:17 -07:00
Michael Yang
5c1912769e Merge pull request #5473 from ollama/mxyng/environ
fix: environ lookup
2024-07-31 10:18:05 -07:00
Daniel Nguyen
71399aa682 Added BoltAI as a desktop UI for Ollama (#6096) 2024-07-31 08:44:58 -07:00
Jeffrey Morgan
463a8aa273 Create SECURITY.md 2024-07-30 21:01:12 -07:00
Michael
3579b4966a Update README to include Firebase Genkit (#6083)
Firebase Genkit
2024-07-30 18:40:09 -07:00
Jeffrey Morgan
5d66578356 Update README.md
Better example for multi-modal input
2024-07-30 18:08:34 -07:00
jmorganca
afa8d6e9d5 patch gemma support 2024-07-30 18:07:29 -07:00
royjhan
1b44d873e7 Add Metrics to api\embed response (#5709)
* add prompt tokens to embed response

* rm slog

* metrics

* types

* prompt n

* clean up

* reset submodule

* update tests

* test name

* list metrics
2024-07-30 13:12:21 -07:00
Daniel Hiltgen
cef2c6054d Merge pull request #5859 from dhiltgen/homogeneous_gpus
Prevent partial loading on mixed GPU brands
2024-07-30 11:06:42 -07:00
Daniel Hiltgen
345420998e Prevent partial loading on mixed GPU brands
In mult-brand GPU setups, if we couldn't fully load the model we
would fall through the scheduler and mistakenly try to load across
a mix of brands.  This makes sure we find the set of GPU(s) that
best fit for the partial load.
2024-07-30 11:00:55 -07:00
Kim Hallberg
0be8baad2b Update and Fix example models (#6065)
* Update example models

* Remove unused README.md
2024-07-29 23:56:37 -07:00
Daniel Hiltgen
1a83581a8e Merge pull request #5895 from dhiltgen/sched_faq
Better explain multi-gpu behavior
2024-07-29 14:25:41 -07:00
Daniel Hiltgen
37926eb991 Merge pull request #5927 from dhiltgen/high_cpu_count
Ensure amd gpu nodes are numerically sorted
2024-07-29 14:24:57 -07:00
Daniel Hiltgen
3d4634fdff Merge pull request #5934 from dhiltgen/missing_cuda_repo
Report better error on cuda unsupported os/arch
2024-07-29 14:24:20 -07:00
royjhan
365431d406 return tool calls finish reason for openai (#5995)
* hot fix

* backend stream support

* clean up

* finish reason

* move to openai
2024-07-29 13:56:57 -07:00
Daniel Hiltgen
161e12cecf Merge pull request #5932 from dhiltgen/win_font
Explain font problems on windows 10
2024-07-29 13:40:24 -07:00
Jeffrey Morgan
46e6327e0f api: add stringifier for Tool (#5891) 2024-07-29 13:35:16 -07:00
Jeffrey Morgan
68ee42f995 update llama.cpp submodule to 6eeaeba1 (#6039) 2024-07-29 13:20:26 -07:00
Ikko Eltociear Ashimine
f26aef9a8b docs: update README.md (#6059)
HuggingFace -> Hugging Face
2024-07-29 10:53:30 -07:00
Michael Yang
38d9036b59 Merge pull request #5992 from ollama/mxyng/save
fix: model save
2024-07-29 09:53:19 -07:00
Veit Heller
6f26e9322f Fix typo in image docs (#6041) 2024-07-29 08:50:53 -07:00
Jeffrey Morgan
0e4d653687 upate to llama3.1 elsewhere in repo (#6032) 2024-07-28 19:56:02 -07:00
Michael
2c01610616 update readme to llama3.1 (#5933) 2024-07-28 14:21:38 -07:00
Tibor Schmidt
f3d7a481b7 feat: add support for min_p (resolve #1142) (#1825) 2024-07-27 14:37:40 -07:00
Jeffrey Morgan
f2a96c7d77 llm: keep patch for llama 3 rope factors (#5987) 2024-07-26 15:20:52 -07:00
Daniel Hiltgen
e8a66680d1 Merge pull request #5705 from dhiltgen/win_errormode
Enable windows error dialog for subprocess
2024-07-26 14:49:34 -07:00
Michael Yang
079b2c3b03 Merge pull request #5999 from ollama/mxyng/fix-push
fix nil deref in auth.go
2024-07-26 14:28:34 -07:00
Blake Mizerany
750c1c55f7 server: fix race conditions during download (#5994)
This fixes various data races scattered throughout the download/pull
client where the client was accessing the download state concurrently.

This commit is mostly a hot-fix and will be replaced by a new client one
day soon.

Also, remove the unnecessary opts argument from downloadChunk.
2024-07-26 14:24:24 -07:00
Michael Yang
a622c47bd3 fix nil deref in auth.go 2024-07-26 14:14:48 -07:00
Michael Yang
ec4c35fe99 Merge pull request #5512 from ollama/mxyng/detect-stop
autodetect stop parameters from template
2024-07-26 13:48:23 -07:00
Michael Yang
a250c2cb13 display messages 2024-07-26 13:39:57 -07:00
Michael Yang
3d9de805b7 fix: model save
stop parameter is saved as a slice which is incompatible with modelfile
parsing
2024-07-26 13:23:06 -07:00
Michael Yang
15af558423 include modelfile messages 2024-07-26 11:40:11 -07:00
Jeffrey Morgan
f5e3939220 Update api.md (#5968) 2024-07-25 23:10:18 -04:00
Jeffrey Morgan
ae27d9dcfd Update openai.md 2024-07-25 20:27:33 -04:00
Michael Yang
37096790a7 Merge pull request #5552 from ollama/mxyng/messages-docs
docs
2024-07-25 16:26:19 -07:00
Michael Yang
997c903884 Update docs/template.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-07-25 16:23:40 -07:00
Blake Mizerany
c8af3c2d96 server: reuse original download URL for images (#5962)
This changes the registry client to reuse the original download URL
it gets on the first redirect response for all subsequent requests,
preventing thundering herd issues when hot new LLMs are released.
2024-07-25 15:58:30 -07:00
Jeffrey Morgan
455e61170d Update openai.md 2024-07-25 18:34:47 -04:00
royjhan
4de1370a9d openai tools doc (#5617) 2024-07-25 18:34:06 -04:00
Jeffrey Morgan
bbf8f102ee Revert "llm(llama): pass rope factors (#5924)" (#5963)
This reverts commit bb46bbcf5e.
2024-07-25 18:24:55 -04:00
Daniel Hiltgen
ce3c93b08f Report better error on cuda unsupported os/arch
If we detect an NVIDIA GPU, but nvidia doesn't support the os/arch,
this will report a better error for the user and point them to docs
to self-install the drivers if possible.
2024-07-24 17:09:20 -07:00
Daniel Hiltgen
6c2129d5d0 Explain font problems on windows 10 2024-07-24 15:22:00 -07:00
Daniel Hiltgen
7c2a157ca4 Ensure amd gpu nodes are numerically sorted
For systems that enumerate over 10 CPUs the default lexicographical
sort order interleaves CPUs and GPUs.
2024-07-24 13:43:26 -07:00
Michael Yang
bb46bbcf5e llm(llama): pass rope factors (#5924) 2024-07-24 16:05:59 -04:00
royjhan
ac33aa7d37 Fix Embed Test Flakes (#5893)
* float cmp

* increase tolerance
2024-07-24 11:15:46 -07:00
Daniel Hiltgen
830fdd2715 Better explain multi-gpu behavior 2024-07-23 15:16:38 -07:00
Ajay Chintala
a6cd8f6169 Update README.md to add LLMStack integration (#5799) 2024-07-23 14:40:23 -04:00
Daniel Hiltgen
c78089263a Merge pull request #5864 from dhiltgen/bump_go
Bump Go patch version
2024-07-22 16:34:18 -07:00
Daniel Hiltgen
3e5ea035d5 Merge pull request #5757 from lreed-mdsol/lreed/bump-go-version-fix-vulnerabilities
bump go version to 1.22.5 to fix security vulnerabilities in docker
2024-07-22 16:32:43 -07:00
Daniel Hiltgen
5d604eec5b Bump Go patch version 2024-07-22 16:16:28 -07:00
Josh
db0968f30c fix dupe err message (#5857) 2024-07-22 15:48:15 -07:00
Daniel Hiltgen
e12fff8810 Enable windows error dialog for subprocess startup
Make sure if something goes wrong spawning the process, the user gets
enough info to be able to try to self correct, or at least file a bug
with details so we can fix it.  Once the process starts, we immediately
change back to the recommended setting to prevent the blocking dialog.
This ensures if the model fails to load (OOM, unsupported model type,
etc.) the process will exit quickly and we can scan the stdout/stderr
of the subprocess for the reason to report via API.
2024-07-22 14:07:27 -07:00
Michael Yang
9b60a038e5 update api.md 2024-07-22 13:49:51 -07:00
Michael Yang
83a0cb8d88 docs 2024-07-22 13:38:09 -07:00
royjhan
c0648233f2 api embed docs (#5282) 2024-07-22 13:37:08 -07:00
Jeffrey Morgan
d835368eb8 convert: capture head_dim for mistral (#5818) 2024-07-22 16:16:22 -04:00
Michael Yang
85d9d73a72 comments 2024-07-22 11:49:03 -07:00
Michael Yang
78140a712c cleanup tests 2024-07-22 11:49:03 -07:00
Michael Yang
1954ec5917 uint64 2024-07-22 11:49:02 -07:00
Michael Yang
0f1910129f int 2024-07-22 11:30:07 -07:00
Michael Yang
e2c3f6b3e2 string 2024-07-22 11:27:52 -07:00
Michael Yang
8570c1c0ef keepalive 2024-07-22 11:27:22 -07:00
Michael Yang
55cd3ddcca bool 2024-07-22 11:27:21 -07:00
Michael Yang
66fe77f084 models 2024-07-22 11:26:12 -07:00
Michael Yang
d1a5227cad origins 2024-07-22 11:25:30 -07:00
Michael Yang
4f1afd575d host 2024-07-22 11:25:30 -07:00
Michael Yang
35b89b2eab rfc: dynamic environ lookup 2024-07-22 11:25:30 -07:00
Daniel Hiltgen
5784c05397 Merge pull request #5854 from dhiltgen/win_exit_status
Refine error reporting for subprocess crash
2024-07-22 10:40:22 -07:00
Daniel Hiltgen
f14aa5435d Merge pull request #5855 from dhiltgen/remove_max_vram
Remove no longer supported max vram var
2024-07-22 10:35:29 -07:00
Jeffrey Morgan
f8fedbda20 Update llama.cpp submodule commit to d94c6e0c (#5805) 2024-07-22 12:42:00 -04:00
Jeffrey Morgan
b3e5491e41 server: collect nested tool call objects when parsing (#5824) 2024-07-22 12:38:03 -04:00
Daniel Hiltgen
cc269ba094 Remove no longer supported max vram var
The OLLAMA_MAX_VRAM env var was a temporary workaround for OOM
scenarios.  With Concurrency this was no longer wired up, and the simplistic
value doesn't map to multi-GPU setups.  Users can still set `num_gpu`
to limit memory usage to avoid OOM if we get our predictions wrong.
2024-07-22 09:08:11 -07:00
Daniel Hiltgen
a3c20e3f18 Refine error reporting for subprocess crash
On windows, the exit status winds up being the search term many
users search for and end up piling in on issues that are unrelated.
This refines the reporting so that if we have a more detailed message
we'll suppress the exit status portion of the message.
2024-07-22 08:52:16 -07:00
Jeffrey Morgan
80ee9b5e47 Remove out of space test temporarily (#5825) 2024-07-21 00:22:11 -04:00
Jeffrey Morgan
5534f2cc6a llm: consider head_dim in llama arch (#5817) 2024-07-20 21:48:12 -04:00
Daniel Hiltgen
d321297d8a Merge pull request #5815 from dhiltgen/win_rocm_gfx_features
Adjust windows ROCm discovery
2024-07-20 16:02:55 -07:00
Daniel Hiltgen
06e5d74e34 Merge pull request #5506 from dhiltgen/sched_tests
Refine scheduler unit tests for reliability
2024-07-20 15:48:39 -07:00
Daniel Hiltgen
5d707e6fd5 Merge pull request #5583 from dhiltgen/integration_improvements
Fix context exhaustion integration test for small gpus
2024-07-20 15:48:21 -07:00
Daniel Hiltgen
283948c83b Adjust windows ROCm discovery
The v5 hip library returns unsupported GPUs which wont enumerate at
inference time in the runner so this makes sure we align discovery.  The
gfx906 cards are no longer supported so we shouldn't compile with that
GPU type as it wont enumerate at runtime.
2024-07-20 15:17:50 -07:00
Jeffrey Morgan
1475eab95f add patch for tekken (#5807) 2024-07-20 13:41:21 -04:00
Jeffrey Morgan
20090f3172 preserve last assistant message (#5802) 2024-07-19 20:19:26 -07:00
Jeffrey Morgan
69a2d4ccff Fix generate test flakyness (#5804) 2024-07-19 19:11:25 -07:00
Josh
e8b954c646 server: validate template (#5734)
add template validation to modelfile
2024-07-19 15:24:29 -07:00
royjhan
c57317cbf0 OpenAI: Function Based Testing (#5752)
* distinguish error forwarding

* more coverage

* rm comment
2024-07-19 11:37:12 -07:00
royjhan
51b2fd299c adjust openai chat msg processing (#5729) 2024-07-19 11:19:20 -07:00
Michael Yang
d0634b1596 Merge pull request #5780 from ollama/mxyng/tools
fix parsing tool calls: break on unexpected eofs
2024-07-18 12:14:10 -07:00
Michael Yang
43606d6d6a fix parsing tool calls 2024-07-18 12:08:11 -07:00
Jeffrey Morgan
70b1010fa5 server: check for empty tools array too (#5779) 2024-07-18 11:44:57 -07:00
Jeffrey Morgan
84e5721f3a always provide content even if empty (#5778) 2024-07-18 11:28:19 -07:00
Jeffrey Morgan
319fb1ce03 server: only parse tool calls if tools are provided (#5771)
* server: only parse tool calls if tools are provided

* still set `resp.Message.Content`
2024-07-18 08:50:23 -07:00
Michael Yang
b255445557 marshal json automatically for some template values (#5758) 2024-07-17 15:35:11 -07:00
lreed
f02f83660c bump go version to 1.22.5 to fix security vulnerabilities 2024-07-17 21:44:19 +00:00
Michael Yang
b23424bb3c Merge pull request #5753 from ollama/mxyng/parse-tool-call
parse tool call as individual objects
2024-07-17 11:47:53 -07:00
Michael Yang
5fd6988126 parse tool call as individual objects 2024-07-17 11:19:04 -07:00
Michael Yang
5b82960df8 stub response (#5750) 2024-07-17 10:39:22 -07:00
Michael Yang
cc9a252d8c Merge pull request #5732 from ollama/mxyng/cleanup
remove ToolCall from GenerateResponse
2024-07-17 10:26:54 -07:00
Pákozdi György
d281a6e603 add sidellama link (#5702) 2024-07-17 10:24:44 -07:00
royjhan
154f6f45d4 OpenAI: Support Tools (#5614)
* reopen pr

* tools

* remove tc from stream for now

* ID and Function

* openai expects arguments to be a string (#5739)

* mutually exclusive content and tool calls

* clean up

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-07-16 20:52:59 -07:00
royjhan
0d41623b52 OpenAI: Add Suffix to v1/completions (#5611)
* add suffix

* remove todo

* remove TODO

* add to test

* rm outdated prompt tokens info md

* fix test

* fix test
2024-07-16 20:50:14 -07:00
Michael Yang
c279f96371 remove ToolCall from GenerateResponse 2024-07-16 15:22:49 -07:00
Michael Yang
499e87c9ba Merge pull request #5730 from ollama/mxyng/cleanup
remove unneeded tool calls
2024-07-16 14:42:13 -07:00
Michael Yang
cd0853f2d5 Merge pull request #5207 from ollama/mxyng/suffix
add insert support to generate endpoint
2024-07-16 14:37:32 -07:00
Michael Yang
d290e87513 add suffix support to generate endpoint
this change is triggered by the presence of "suffix", particularly
useful for code completion tasks
2024-07-16 14:31:35 -07:00
Thorsten Sommer
97c20ede33 README: Added AI Studio to the list of UIs (#5721)
* Added AI Studio to the list of UIs
2024-07-16 14:24:27 -07:00
Michael Yang
5a83f79afd remove unneeded tool calls 2024-07-16 13:48:45 -07:00
royjhan
987dbab0b0 OpenAI: /v1/embeddings compatibility (#5285)
* OpenAI v1 models

* Empty List Testing

* Add back envconfig

* v1/models docs

* Remove Docs

* OpenAI batch embed compatibility

* merge conflicts

* integrate with api/embed

* ep

* merge conflicts

* request tests

* rm resp test

* merge conflict

* merge conflict

* test fixes

* test fn renaming

* input validation for empty string

---------

Co-authored-by: jmorganca <jmorganca@gmail.com>
2024-07-16 13:36:08 -07:00
Michael Yang
a8388beb94 Merge pull request #5726 from ollama/mxyng/tools-templates
fix unmarshal type errors
2024-07-16 12:12:10 -07:00
Michael Yang
5afbb60fc4 fix unmarshal type errors 2024-07-16 11:39:34 -07:00
Jeffrey Morgan
4cb5d7decc server: omit model system prompt if empty (#5717) 2024-07-16 11:09:00 -07:00
Michael Yang
8eac50dd4f Merge pull request #5684 from ollama/mxyng/tests
add chat and generate tests with mock runner
2024-07-16 09:44:45 -07:00
Michael Yang
4a565cbf94 add chat and generate tests with mock runner 2024-07-16 09:39:31 -07:00
Michael Yang
64039df6d7 Merge pull request #5284 from ollama/mxyng/tools
tools
2024-07-15 18:03:37 -07:00
Jeffrey Morgan
7ac6d462ec server: return empty slice on empty /api/embed request (#5713)
* server: return empty slice on empty `/api/embed` request

* fix tests
2024-07-15 17:39:44 -07:00
Michael Yang
ef5136a745 tools test 2024-07-15 17:18:21 -07:00
Daniel Hiltgen
8288ec8824 Merge pull request #5710 from dhiltgen/rocm_bump
Bump linux ROCm to 6.1.2
2024-07-15 15:32:18 -07:00
Michael Yang
d02bbebb11 tools 2024-07-15 15:26:16 -07:00
Daniel Hiltgen
224337b32f Bump linux ROCm to 6.1.2 2024-07-15 15:10:22 -07:00
Jeffrey Morgan
9e35d9bbee server: lowercase roles for compatibility with clients (#5695) 2024-07-15 13:55:57 -07:00
royjhan
b9f5e16c80 Introduce /api/embed endpoint supporting batch embedding (#5127)
* Initial Batch Embedding

* Revert "Initial Batch Embedding"

This reverts commit c22d54895a.

* Initial Draft

* mock up notes

* api/embed draft

* add server function

* check normalization

* clean up

* normalization

* playing around with truncate stuff

* Truncation

* Truncation

* move normalization to go

* Integration Test Template

* Truncation Integration Tests

* Clean up

* use float32

* move normalize

* move normalize test

* refactoring

* integration float32

* input handling and handler testing

* Refactoring of legacy and new

* clear comments

* merge conflicts

* touches

* embedding type 64

* merge conflicts

* fix hanging on single string

* refactoring

* test values

* set context length

* clean up

* testing clean up

* testing clean up

* remove function closure

* Revert "remove function closure"

This reverts commit 55d48c6ed1.

* remove function closure

* remove redundant error check

* clean up

* more clean up

* clean up
2024-07-15 12:14:24 -07:00
royjhan
e9f7f36029 Support image input for OpenAI chat compatibility (#5208)
* OpenAI v1 models

* Refactor Writers

* Add Test

Co-Authored-By: Attila Kerekes

* Credit Co-Author

Co-Authored-By: Attila Kerekes <439392+keriati@users.noreply.github.com>

* Empty List Testing

* Use Namespace for Ownedby

* Update Test

* Add back envconfig

* v1/models docs

* Use ModelName Parser

* Test Names

* Remove Docs

* Clean Up

* Test name

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Add Middleware for Chat and List

* Testing Cleanup

* Test with Fatal

* Add functionality to chat test

* Support image input for OpenAI chat

* Decoding

* Fix message processing logic

* openai vision test

* type errors

* clean up

* redundant check

* merge conflicts

* merge conflicts

* merge conflicts

* flattening and smaller image

* add test

* support python and js SDKs and mandate prefixing

* clean up

---------

Co-authored-by: Attila Kerekes <439392+keriati@users.noreply.github.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-07-13 22:07:45 -07:00
Patrick Devine
057d31861e remove template (#5655) 2024-07-13 20:56:24 -07:00
jmorganca
f7ee012300 server: prepend system message in chat handler 2024-07-13 15:08:00 -07:00
Jeffrey Morgan
1ed0aa8fea server: fix context, load_duration and total_duration fields (#5676)
* server: fix `contet`, `load_duration` and `total_duration` fields

* Update server/routes.go
2024-07-13 09:25:31 -07:00
Jeffrey Morgan
ef98803d63 llm: looser checks for minimum memory (#5677) 2024-07-13 09:20:05 -07:00
Jarek
02fea420e5 Add Kerlig AI, an app for macOS (#5675) 2024-07-13 08:33:46 -07:00
Michael Yang
22c5451fc2 fix system prompt (#5662)
* fix system prompt

* execute template when hitting previous roles

* fix tests

---------

Co-authored-by: jmorganca <jmorganca@gmail.com>
2024-07-12 21:04:44 -07:00
Michael Yang
ebc529cbb3 autodetect stop parameters from template 2024-07-12 16:01:23 -07:00
Patrick Devine
23ebbaa46e Revert "remove template from tests"
This reverts commit 9ac0a7a50b.
2024-07-12 15:47:17 -07:00
Patrick Devine
9ac0a7a50b remove template from tests 2024-07-12 15:41:31 -07:00
Michael Yang
e5c65a85df Merge pull request #5653 from ollama/mxyng/collect-system
template: preprocess message and collect system
2024-07-12 12:32:34 -07:00
Jeffrey Morgan
33627331a3 app: also clean up tempdir runners on install (#5646) 2024-07-12 12:29:23 -07:00
Michael Yang
36c87c433b template: preprocess message and collect system 2024-07-12 12:26:43 -07:00
Jeffrey Morgan
179737feb7 Clean up old files when installing on Windows (#5645)
* app: always clean up install dir; force close applications

* remove wildcard

* revert `CloseApplications`

* whitespace

* update `LOCALAPPDATA` var
2024-07-11 22:53:46 -07:00
Michael Yang
47353f5ee4 Merge pull request #5639 from ollama/mxyng/unaggregated-system 2024-07-11 17:48:50 -07:00
Josh
10e768826c fix: quant err message (#5616) 2024-07-11 17:24:29 -07:00
Michael Yang
5056bb9c01 rename aggregate to contents 2024-07-11 17:00:26 -07:00
Jeffrey Morgan
c4cf8ad559 llm: avoid loading model if system memory is too small (#5637)
* llm: avoid loading model if system memory is too small

* update log

* Instrument swap free space

On linux and windows, expose how much swap space is available
so we can take that into consideration when scheduling models

* use `systemSwapFreeMemory` in check

---------

Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
2024-07-11 16:42:57 -07:00
Michael Yang
57ec6901eb revert embedded templates to use prompt/response
This reverts commit 19753c18c0.

for compat. messages will be added at a later date
2024-07-11 14:49:35 -07:00
Michael Yang
e64f9ebb44 do no automatically aggregate system messages 2024-07-11 14:49:35 -07:00
Jeffrey Morgan
791650ddef sched: only error when over-allocating system memory (#5626) 2024-07-11 00:53:12 -07:00
Jeffrey Morgan
efbf41ed81 llm: dont link cuda with compat libs (#5621) 2024-07-10 20:01:52 -07:00
Michael Yang
cf15589851 Merge pull request #5620 from ollama/mxyng/templates
update embedded templates
2024-07-10 17:16:24 -07:00
Michael Yang
19753c18c0 update embedded templates 2024-07-10 17:03:08 -07:00
Michael Yang
41be28096a add system prompt to first legacy template 2024-07-10 17:03:08 -07:00
Michael Yang
37a570f962 Merge pull request #5612 from ollama/mxyng/mem
chatglm graph
2024-07-10 14:18:33 -07:00
Michael Yang
5a739ff4cb chatglm graph 2024-07-10 13:43:47 -07:00
Jeffrey Morgan
4e262eb2a8 remove GGML_CUDA_FORCE_MMQ=on from build (#5588) 2024-07-10 13:17:13 -07:00
Daniel Hiltgen
4cfcbc328f Merge pull request #5124 from dhiltgen/amd_windows
Wire up windows AMD driver reporting
2024-07-10 12:50:23 -07:00
Daniel Hiltgen
79292ff3e0 Merge pull request #5555 from dhiltgen/msvc_deps
Bundle missing CRT libraries
2024-07-10 12:50:02 -07:00
Daniel Hiltgen
8ea500441d Merge pull request #5580 from dhiltgen/cuda_overhead
Detect CUDA OS overhead
2024-07-10 12:47:31 -07:00
Daniel Hiltgen
b50c818623 Merge pull request #5607 from dhiltgen/win_rocm_v6
Bump ROCm on windows to 6.1.2
2024-07-10 12:47:10 -07:00
Daniel Hiltgen
b99e750b62 Merge pull request #5605 from dhiltgen/merge_glitch
Remove duplicate merge glitch
2024-07-10 11:47:08 -07:00
Daniel Hiltgen
1f50356e8e Bump ROCm on windows to 6.1.2
This also adjusts our algorithm to favor our bundled ROCm.
I've confirmed VRAM reporting still doesn't work properly so we
can't yet enable concurrency by default.
2024-07-10 11:01:22 -07:00
Daniel Hiltgen
22c81f62ec Remove duplicate merge glitch 2024-07-10 09:01:33 -07:00
Daniel Hiltgen
73e2c8f68f Fix context exhaustion integration test for small gpus
On the smaller GPUs, the initial model load of llama2 took over 30s (the
default timeout for the DoGenerate helper)
2024-07-09 16:24:14 -07:00
Daniel Hiltgen
f4408219e9 Refine scheduler unit tests for reliability
This breaks up some of the test scenarios to create a
more reliable set of tests, as well as adding a little more
coverage.
2024-07-09 16:00:08 -07:00
Daniel Hiltgen
2d1e3c3229 Merge pull request #5503 from dhiltgen/dual_rocm
Workaround broken ROCm p2p copy
2024-07-09 15:44:16 -07:00
royjhan
4918fae535 OpenAI v1/completions: allow stop token list (#5551)
* stop token parsing fix

* add stop test
2024-07-09 14:01:26 -07:00
royjhan
0aff67877e separate request tests (#5578) 2024-07-09 13:48:31 -07:00
Daniel Hiltgen
f6f759fc5f Detect CUDA OS Overhead
This adds logic to detect skew between the driver and
management library which can be attributed to OS overhead
and records that so we can adjust subsequent management
library free VRAM updates and avoid OOM scenarios.
2024-07-09 12:21:50 -07:00
Daniel Hiltgen
9544a57ee4 Merge pull request #5579 from dhiltgen/win_static_deps
Statically link c++ and thread lib on windows
2024-07-09 12:21:13 -07:00
Daniel Hiltgen
b51e3b63ac Statically link c++ and thread lib
This makes sure we statically link the c++ and thread library on windows
to avoid unnecessary runtime dependencies on non-standard DLLs
2024-07-09 11:34:30 -07:00
Michael Yang
6bbbc50f10 Merge pull request #5440 from ollama/mxyng/messages-templates
update named templates
2024-07-09 09:36:32 -07:00
Michael Yang
9bbddc37a7 Merge pull request #5126 from ollama/mxyng/messages
update message processing
2024-07-09 09:20:44 -07:00
Jeffrey Morgan
e4ff73297d server: fix model reloads when setting OLLAMA_NUM_PARALLEL (#5560)
* server: fix unneeded model reloads when setting `OLLAMA_NUM_PARALLEL`

* remove whitespace change

* undo some changes
2024-07-08 22:32:15 -07:00
Daniel Hiltgen
b44320db13 Bundle missing CRT libraries
Some users are experienging runner startup errors due
to not having these msvc redist libraries on their host
2024-07-08 18:24:21 -07:00
Daniel Hiltgen
0bacb30007 Workaround broken ROCm p2p copy
Enable the build flag for llama.cpp to use CPU copy for multi-GPU scenarios.
2024-07-08 09:40:52 -07:00
Jeffrey Morgan
53da2c6965 llm: remove ambiguous comment when putting upper limit on predictions to avoid infinite generation (#5535) 2024-07-07 14:32:05 -04:00
Jeffrey Morgan
d8def1ff94 llm: allow gemma 2 to context shift (#5534) 2024-07-07 13:41:51 -04:00
Jeffrey Morgan
571dc61955 Update llama.cpp submodule to a8db2a9c (#5530) 2024-07-07 13:03:09 -04:00
Jeffrey Morgan
0e09c380fc llm: print caching notices in debug only (#5533) 2024-07-07 12:38:04 -04:00
Jeffrey Morgan
0ee87615c7 sched: don't error if paging to disk on Windows and macOS (#5523) 2024-07-06 22:01:52 -04:00
Jeffrey Morgan
f8241bfba3 gpu: report system free memory instead of 0 (#5521) 2024-07-06 19:35:04 -04:00
Jeffrey Morgan
4607c70641 llm: add -DBUILD_SHARED_LIBS=off to common cpu cmake flags (#5520) 2024-07-06 18:58:16 -04:00
jmorganca
c12f1c5b99 release: move mingw library cleanup to correct job 2024-07-06 16:12:29 -04:00
jmorganca
a08f20d910 release: remove unwanted mingw dll.a files 2024-07-06 15:21:15 -04:00
jmorganca
6cea036027 Revert "llm: only statically link libstdc++"
This reverts commit 5796bfc401.
2024-07-06 15:10:48 -04:00
jmorganca
5796bfc401 llm: only statically link libstdc++ 2024-07-06 14:06:20 -04:00
jmorganca
f1a379aa56 llm: statically link pthread and stdc++ dependencies in windows build 2024-07-06 12:54:02 -04:00
jmorganca
9ae146993e llm: add GGML_STATIC flag to windows static lib 2024-07-06 03:27:05 -04:00
Jeffrey Morgan
e0348d3fe8 llm: add COMMON_DARWIN_DEFS to arm static build (#5513) 2024-07-05 22:42:42 -04:00
Jeffrey Morgan
2cc854f8cb llm: fix missing dylibs by restoring old build behavior on Linux and macOS (#5511)
* Revert "fix cmake build (#5505)"

This reverts commit 4fd5f3526a.

* llm: fix missing dylibs by restoring old build behavior

* crlf -> lf
2024-07-05 21:48:31 -04:00
Jeffrey Morgan
5304b765b2 llm: put back old include dir (#5507)
* llm: put back old include dir

* llm: update link paths for old submodule commits
2024-07-05 19:34:21 -04:00
Michael Yang
fb6cbc02fb update named templates 2024-07-05 16:29:32 -07:00
Jeffrey Morgan
4fd5f3526a fix cmake build (#5505) 2024-07-05 19:07:01 -04:00
Daniel Hiltgen
842f85f758 Merge pull request #5502 from dhiltgen/ci_fixes
Always go build in CI generate steps
2024-07-05 15:39:11 -07:00
Daniel Hiltgen
9d30f9f8b3 Always go build in CI generate steps
With the recent cgo changes, bugs can sneak through
if we don't make sure to `go build` all the permutations
2024-07-05 15:31:52 -07:00
Blake Mizerany
631cfd9e62 types/model: remove knowledge of digest (#5500)
This was leading to ambiguity and confusion in ollama.com, and is not
used anywhere in ollama at the moment. Once manifests are addressable by
digest, we can add this back in, and in a way that is more tailored to
the concept of addressing a manifest by digest.
2024-07-05 13:42:30 -07:00
Michael Yang
326363b3a7 no funcs 2024-07-05 13:17:25 -07:00
Michael Yang
ac7a842e55 fix model reloading
ensure runtime model changes (template, system prompt, messages,
options) are captured on model updates without needing to reload the
server
2024-07-05 13:17:25 -07:00
Michael Yang
2c3fe1fd97 comments 2024-07-05 13:17:24 -07:00
Michael Yang
269ed6e6a2 update message processing 2024-07-05 13:16:58 -07:00
Jeffrey Morgan
78fb33dd07 fix typo in cgo directives in llm.go (#5501) 2024-07-05 15:18:36 -04:00
Jeffrey Morgan
8f8e736b13 update llama.cpp submodule to d7fd29f (#5475) 2024-07-05 13:25:58 -04:00
Jeffrey Morgan
d89454de80 Use slot with cached prompt instead of least recently used (#5492)
* Use common prefix to select slot

* actually report `longest`
2024-07-05 12:32:47 -04:00
Daniel Hiltgen
af28b94533 Merge pull request #5469 from dhiltgen/prevent_system_oom
Prevent loading models larger than total memory
2024-07-05 08:22:20 -07:00
Jeffrey Morgan
e9188e971a Fix assert on small embedding inputs (#5491)
* Fix assert on small embedding inputs

* Update llm/patches/09-pooling.diff
2024-07-05 11:20:57 -04:00
Daniel Hiltgen
78eddfc068 Merge pull request #4412 from dhiltgen/win_docs
Document older win10 terminal problems
2024-07-05 08:18:22 -07:00
Daniel Hiltgen
02c24d3d01 Merge pull request #5466 from dhiltgen/fix_clip_unicode
Fix clip model loading with unicode paths
2024-07-05 08:16:58 -07:00
Daniel Hiltgen
52abc8acb7 Document older win10 terminal problems
We haven't found a workaround, so for now recommend updating.
2024-07-03 17:32:14 -07:00
Jeffrey Morgan
4d71c559b2 fix error detection by limiting model loading error parsing (#5472) 2024-07-03 20:04:30 -04:00
Anatoli Babenia
0d16eb310e fix: use envconfig.ModelsDir directly (#4821)
* Co-authored-by: Anatoli Babenia <anatoli@rainforce.org>

Co-authored-by: Maas Lalani <maas@lalani.dev>
2024-07-03 15:36:11 -07:00
Daniel Hiltgen
8072e205ff Merge pull request #5447 from dhiltgen/fix_keepalive
Only set default keep_alive on initial model load
2024-07-03 15:34:38 -07:00
Daniel Hiltgen
955f2a4e03 Only set default keep_alive on initial model load
This change fixes the handling of keep_alive so that if client
request omits the setting, we only set this on initial load.  Once
the model is loaded, if new requests leave this unset, we'll keep
whatever keep_alive was there.
2024-07-03 15:29:56 -07:00
Daniel Hiltgen
3c75113e37 Prevent loading models larger than total memory
Users may not realize the siny new model they're trying to load
fits on their disk, but can't load into system+GPU memory.  Today
we crash, but with this fix, we'll give them a better error message
before even trying to load it.
2024-07-03 14:47:42 -07:00
Daniel Hiltgen
ccd7785859 Merge pull request #5243 from dhiltgen/modelfile_use_mmap
Fix use_mmap for modefiles
2024-07-03 13:59:42 -07:00
royjhan
3b5a4a77f3 Return Correct Prompt Eval Count Regardless of Cache Prompt (#5371)
* openai compatibility

* Revert "openai compatibility"

This reverts commit d3f98a811e.

* remove erroneous subtraction of prompt cache
2024-07-03 13:46:23 -07:00
Daniel Hiltgen
daed0634a9 Merge pull request #5467 from dhiltgen/bogus_cpu_mac_error
Fix corner cases on tmp cleaner on mac
2024-07-03 13:39:36 -07:00
Daniel Hiltgen
0d4dd707bc Merge pull request #5465 from dhiltgen/better_cuda_logging
Better nvidia GPU discovery logging
2024-07-03 13:12:22 -07:00
Daniel Hiltgen
0e982bc1f4 Fix corner cases on tmp cleaner on mac
When ollama is running a long time, tmp cleaners can remove the
runners.  This tightens up a few corner cases on arm macs where
we failed with "server cpu not listed in available servers map[]"
2024-07-03 13:10:14 -07:00
Daniel Hiltgen
6298f49816 Fix clip model loading with unicode paths
On windows, if the model dir contained unicode characters
clip models would fail to load.  This fixes the file name
handling in clip.cpp to support utf16 on windows.
2024-07-03 12:46:36 -07:00
Daniel Hiltgen
ef757da2c9 Better nvidia GPU discovery logging
Refine the way we log GPU discovery to improve the non-debug
output, and report more actionable log messages when possible
to help users troubleshoot on their own.
2024-07-03 10:50:40 -07:00
Michael Yang
e5352297d9 Merge pull request #5448 from ollama/mxyng/fix-generate
use model template by default
2024-07-02 16:48:06 -07:00
Michael Yang
65a5040e09 fix generate template 2024-07-02 16:42:17 -07:00
royjhan
d626b99b54 OpenAI: v1/completions compatibility (#5209)
* OpenAI v1 models

* Refactor Writers

* Add Test

Co-Authored-By: Attila Kerekes

* Credit Co-Author

Co-Authored-By: Attila Kerekes <439392+keriati@users.noreply.github.com>

* Empty List Testing

* Use Namespace for Ownedby

* Update Test

* Add back envconfig

* v1/models docs

* Use ModelName Parser

* Test Names

* Remove Docs

* Clean Up

* Test name

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Add Middleware for Chat and List

* Completions Endpoint

* Testing Cleanup

* Test with Fatal

* Add functionality to chat test

* Rename function

* float types

* type cleanup

* cleaning

* more cleaning

* Extra test cases

* merge conflicts

* merge conflicts

* merge conflicts

* merge conflicts

* cleaning

* cleaning

---------

Co-authored-by: Attila Kerekes <439392+keriati@users.noreply.github.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-07-02 16:01:45 -07:00
Michael Yang
dddb58a38b Merge pull request #5051 from ollama/mxyng/capabilities
add model capabilities
2024-07-02 14:26:07 -07:00
Michael Yang
400056e154 Merge pull request #5420 from ollama/mxyng/insecure-path
err on insecure path
2024-07-02 14:03:23 -07:00
Daniel Hiltgen
d2f19024d0 Merge pull request #5442 from dhiltgen/concurrency_docs
Add windows radeon concurrency note
2024-07-02 12:47:47 -07:00
Daniel Hiltgen
69c04eecc4 Add windows radeon concurreny note 2024-07-02 12:46:14 -07:00
royjhan
996bb1b85e OpenAI: /v1/models and /v1/models/{model} compatibility (#5007)
* OpenAI v1 models

* Refactor Writers

* Add Test

Co-Authored-By: Attila Kerekes

* Credit Co-Author

Co-Authored-By: Attila Kerekes <439392+keriati@users.noreply.github.com>

* Empty List Testing

* Use Namespace for Ownedby

* Update Test

* Add back envconfig

* v1/models docs

* Use ModelName Parser

* Test Names

* Remove Docs

* Clean Up

* Test name

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Add Middleware for Chat and List

* Testing Cleanup

* Test with Fatal

* Add functionality to chat test

* OpenAI: /v1/models/{model} compatibility (#5028)

* Retrieve Model

* OpenAI Delete Model

* Retrieve Middleware

* Remove Delete from Branch

* Update Test

* Middleware Test File

* Function name

* Cleanup

* Test Update

* Test Update

---------

Co-authored-by: Attila Kerekes <439392+keriati@users.noreply.github.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-07-02 11:50:56 -07:00
Daniel Hiltgen
422dcc3856 Merge pull request #5439 from dhiltgen/fix_centos_7_build
Switch ARM64 container image base to rocky 8
2024-07-02 11:01:15 -07:00
Daniel Hiltgen
020bd60ab2 Switch amd container image base to rocky 8
The centos 7 arm mirrors have disappeared due to the EOL 2 days
ago, and the vault sed workaround which works for x86 doesn't work for arm.
2024-07-02 10:34:47 -07:00
Daniel Hiltgen
8e277b72bb Merge pull request #5438 from dhiltgen/fix_centos_7_build
Centos 7 EOL broke mirrors
2024-07-02 09:28:00 -07:00
Daniel Hiltgen
4f67b39d26 Centos 7 EOL broke mirrors
As of July 1st 2024: Could not resolve host: mirrorlist.centos.org
This is expected due to EOL dates.
2024-07-02 09:22:17 -07:00
Josh
2425281317 Merge pull request #5336 from ollama/jyan/from-errors
fix: trim spaces for FROM argument, don't trim inside of quotes
2024-07-01 16:32:46 -07:00
Josh
0403e9860e Merge pull request #5421 from ollama/jyan/ver
fix: add unsupported architecture message for linux/windows
2024-07-01 16:32:14 -07:00
Josh Yan
33a65e3ba3 error 2024-07-01 16:04:13 -07:00
Michael Yang
88bcd79bb9 err on insecure path 2024-07-01 15:55:59 -07:00
Josh Yan
7e571f95f0 trimspace test case 2024-07-01 11:07:48 -07:00
Michael Yang
da8e2a0447 use kvs to detect embedding models 2024-07-01 10:47:43 -07:00
Michael Yang
a30915bde1 add capabilities 2024-07-01 10:47:43 -07:00
Michael Yang
58e3fff311 rename templates to template 2024-07-01 10:40:54 -07:00
Michael Yang
3f0b309ad4 remove ManifestV2 2024-07-01 10:40:54 -07:00
Daniel Hiltgen
e70610ef06 Merge pull request #5410 from dhiltgen/ctx_cleanup
Fix case for NumCtx
2024-07-01 09:54:20 -07:00
Daniel Hiltgen
dfded7e075 Merge pull request #5364 from dhiltgen/concurrency_docs
Document concurrent behavior and settings
2024-07-01 09:49:48 -07:00
Daniel Hiltgen
173b550438 Remove default auto from help message
This may confuse users thinking "auto" is an acceptable string - it must be numeric
2024-07-01 09:48:05 -07:00
Daniel Hiltgen
cff3f44f4a Fix case for NumCtx 2024-07-01 09:43:59 -07:00
Josh Yan
26e4e66faf updated parsefile test 2024-07-01 09:43:49 -07:00
Daniel Hiltgen
97c9e11768 Switch use_mmap to a pointer type
This uses nil as undefined for a cleaner implementation.
2024-07-01 08:44:59 -07:00
Daniel Hiltgen
3518aaef33 Merge pull request #4218 from dhiltgen/auto_parallel
Enable concurrency by default
2024-07-01 08:32:29 -07:00
RAPID ARCHITECT
1963c00201 Update README.md (#5214)
* Update README.md

Added Mesop example to web & desktop

* Update README.md

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-06-30 22:00:57 -04:00
Eduard
27402cb7a2 Update gpu.md (#5382)
Runs fine on a NVIDIA GeForce GTX 1050 Ti
2024-06-30 21:48:51 -04:00
Jeffrey Morgan
c1218199cf Update api.md 2024-06-29 16:22:49 -07:00
Jeffrey Morgan
717f7229eb Do not shift context for sliding window models (#5368)
* Do not shift context for sliding window models

* truncate prompt > 2/3 tokens

* only target gemma2
2024-06-28 19:39:31 -07:00
Daniel Hiltgen
aae56abb7c Document concurrent behavior and settings 2024-06-28 13:15:57 -07:00
royjhan
5f034f5b63 Include Show Info in Interactive (#5342) 2024-06-28 13:15:52 -07:00
royjhan
b910fa9010 Ollama Show: Check for Projector Type (#5307)
* Check exists projtype

* Maintain Ordering
2024-06-28 11:30:16 -07:00
royjhan
6d4219083c Update docs (#5312) 2024-06-28 09:58:14 -07:00
Michael Yang
1ed4f521c4 Merge pull request #5340 from ollama/mxyng/mem
gemma2 graph
2024-06-27 14:26:49 -07:00
Michael Yang
de2163dafd gemma2 graph 2024-06-27 13:34:52 -07:00
Josh Yan
9bd00041fa trim all params 2024-06-27 11:18:38 -07:00
Josh Yan
4e986a823c unquote, trimp space 2024-06-27 10:59:15 -07:00
Michael
2cc7d05012 update readme for gemma 2 (#5333)
* update readme for gemma 2
2024-06-27 12:45:16 -04:00
Michael Yang
123a722a6f zip: prevent extracting files into parent dirs (#5314) 2024-06-26 21:38:21 -07:00
Jeffrey Morgan
4d311eb731 llm: architecture patch (#5316) 2024-06-26 21:38:12 -07:00
Blake Mizerany
cb42e607c5 llm: speed up gguf decoding by a lot (#5246)
Previously, some costly things were causing the loading of GGUF files
and their metadata and tensor information to be VERY slow:

  * Too many allocations when decoding strings
  * Hitting disk for each read of each key and value, resulting in a
    not-okay amount of syscalls/disk I/O.

The show API is now down to 33ms from 800ms+ for llama3 on a macbook pro
m3.

This commit also prevents collecting large arrays of values when
decoding GGUFs (if desired). When such keys are encountered, their
values are null, and are encoded as such in JSON.

Also, this fixes a broken test that was not encoding valid GGUF.
2024-06-24 21:47:52 -07:00
Blake Mizerany
2aa91a937b cmd: defer stating model info until necessary (#5248)
This commit changes the 'ollama run' command to defer fetching model
information until it really needs it. That is, when in interactive mode.

It also removes one such case where the model information is fetch in
duplicate, just before calling generateInteractive and then again, first
thing, in generateInteractive.

This positively impacts the performance of the command:

    ; time ./before run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.168 total
    ; time ./before run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.220 total
    ; time ./before run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.217 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.02s user 0.01s system 4% cpu 0.652 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.01s user 0.01s system 5% cpu 0.498 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with or would you like to chat?

    ./after run llama3 'hi'  0.01s user 0.01s system 3% cpu 0.479 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.02s user 0.01s system 5% cpu 0.507 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.02s user 0.01s system 5% cpu 0.507 total
2024-06-24 20:14:03 -07:00
Daniel Hiltgen
ccef9431c8 Merge pull request #5205 from dhiltgen/modelfile_use_mmap
Fix use_mmap parsing for modelfiles
2024-06-21 16:30:36 -07:00
Daniel Hiltgen
642cee1342 Sort the ps output
Provide consistent ordering for the ps command - longest duration listed first
2024-06-21 15:59:41 -07:00
royjhan
9a9e7d83c4 Docs (#5149) 2024-06-21 15:52:09 -07:00
Daniel Hiltgen
9929751cc8 Disable concurrency for AMD + Windows
Until ROCm v6.2 ships, we wont be able to get accurate free memory
reporting on windows, which makes automatic concurrency too risky.
Users can still opt-in but will need to pay attention to model sizes otherwise they may thrash/page VRAM or cause OOM crashes.
All other platforms and GPUs have accurate VRAM reporting wired
up now, so we can turn on concurrency by default.
2024-06-21 15:45:05 -07:00
Daniel Hiltgen
17b7186cd7 Enable concurrency by default
This adjusts our default settings to enable multiple models and parallel
requests to a single model.  Users can still override these by the same
env var settings as before.  Parallel has a direct impact on
num_ctx, which in turn can have a significant impact on small VRAM GPUs
so this change also refines the algorithm so that when parallel is not
explicitly set by the user, we try to find a reasonable default that fits
the model on their GPU(s).  As before, multiple models will only load
concurrently if they fully fit in VRAM.
2024-06-21 15:45:05 -07:00
Michael Yang
189a43caa2 Merge pull request #5206 from ollama/mxyng/quantize
fix: quantization with template
2024-06-21 13:44:34 -07:00
Michael Yang
e835ef1836 fix: quantization with template 2024-06-21 13:39:25 -07:00
Daniel Hiltgen
7e7749224c Fix use_mmap parsing for modelfiles
Add the new tristate parsing logic for the code path for modelfiles,
as well as a unit test.
2024-06-21 12:27:19 -07:00
Daniel Hiltgen
c7c2f3bc22 Merge pull request #5194 from dhiltgen/linux_mmap_auto
Refine mmap default logic on linux
2024-06-20 11:44:08 -07:00
Daniel Hiltgen
54a79d6a8a Merge pull request #5125 from dhiltgen/fedora39
Bump latest fedora cuda repo to 39
2024-06-20 11:27:24 -07:00
Daniel Hiltgen
5bf5aeec01 Refine mmap default logic on linux
If we try to use mmap when the model is larger than the system free space, loading is slower than the no-mmap approach.
2024-06-20 11:07:04 -07:00
Michael Yang
e01e535cbb Merge pull request #5192 from ollama/mxyng/kv
handle asymmetric embedding KVs
2024-06-20 10:46:24 -07:00
Josh
0195d6a2f8 Merge pull request #5188 from ollama/jyan/tmpdir2
fix: skip os.removeAll() if PID does not exist
2024-06-20 10:40:59 -07:00
Michael Yang
8e0641a9bf handle asymmetric embedding KVs 2024-06-20 09:57:27 -07:00
Josh Yan
662568d453 err!=nil check 2024-06-20 09:30:59 -07:00
Josh Yan
4ebb66c662 reformat error check 2024-06-20 09:23:43 -07:00
Josh Yan
23e899f32d skip os.removeAll() if PID does not exist 2024-06-20 08:51:35 -07:00
royjhan
fedf71635e Extend api/show and ollama show to return more model info (#4881)
* API Show Extended

* Initial Draft of Information

Co-Authored-By: Patrick Devine <pdevine@sonic.net>

* Clean Up

* Descriptive arg error messages and other fixes

* Second Draft of Show with Projectors Included

* Remove Chat Template

* Touches

* Prevent wrapping from files

* Verbose functionality

* Docs

* Address Feedback

* Lint

* Resolve Conflicts

* Function Name

* Tests for api/show model info

* Show Test File

* Add Projector Test

* Clean routes

* Projector Check

* Move Show Test

* Touches

* Doc update

---------

Co-authored-by: Patrick Devine <pdevine@sonic.net>
2024-06-19 14:19:02 -07:00
Daniel Hiltgen
97c59be653 Merge pull request #5074 from dhiltgen/app_log_rotation
Implement log rotation for tray app
2024-06-19 13:02:24 -07:00
Daniel Hiltgen
9d8a4988e8 Implement log rotation for tray app 2024-06-19 12:53:34 -07:00
Michael Yang
1ae0750a21 Merge pull request #5147 from ollama/mxyng/cleanup
remove confusing log message
2024-06-19 12:50:31 -07:00
Michael Yang
9d91e5e587 remove confusing log message 2024-06-19 11:14:11 -07:00
Daniel Hiltgen
96624aa412 Merge pull request #5072 from dhiltgen/windows_path
Move libraries out of users path
2024-06-19 09:13:39 -07:00
Daniel Hiltgen
10f33b8537 Merge pull request #5146 from dhiltgen/backout
Put back temporary intel GPU env var
2024-06-19 09:12:45 -07:00
Daniel Hiltgen
4a633cc295 Merge pull request #5145 from dhiltgen/bad_loads
Fix bad symbol load detection
2024-06-19 09:12:33 -07:00
Daniel Hiltgen
d34d88e417 Revert "Revert "gpu: add env var for detecting Intel oneapi gpus (#5076)""
This reverts commit 755b4e4fc2.
2024-06-19 08:57:41 -07:00
Daniel Hiltgen
52ce350b7a Fix bad symbol load detection
pointer deref's weren't correct on a few libraries, which explains
some crashes on older systems or miswired symlinks for discovery libraries.
2024-06-19 08:39:07 -07:00
Daniel Hiltgen
2abebb2cbe Merge pull request #5128 from zhewang1-intc/fix_levelzero_empty_symbol_detect
Fix levelzero empty symbol detect
2024-06-19 08:33:16 -07:00
Blake Mizerany
380e06e5be types/model: remove Digest
The Digest type in its current form is awkward to work with and presents
challenges with regard to how it serializes via String using the '-'
prefix.

We currently only use this in ollama.com, so we'll move our specific
needs around digest parsing and validation there.
2024-06-18 20:28:11 -07:00
Wang,Zhe
badf975e45 get real func ptr. 2024-06-19 09:00:51 +08:00
Wang,Zhe
755b4e4fc2 Revert "gpu: add env var for detecting Intel oneapi gpus (#5076)"
This reverts commit 163cd3e77c.
2024-06-19 08:59:58 +08:00
Daniel Hiltgen
1a1c99e334 Bump latest fedora cuda repo to 39 2024-06-18 17:13:54 -07:00
Michael Yang
21adf8b6d2 Merge pull request #5121 from ollama/mxyng/deepseekv2
deepseek v2 graph
2024-06-18 16:30:58 -07:00
Daniel Hiltgen
784bf88b0d Wire up windows AMD driver reporting
This seems to be ROCm version, not actually driver version, but
it may be useful for toggling logic for VRAM reporting in the future
2024-06-18 16:22:47 -07:00
Michael Yang
e873841cbb deepseek v2 graph 2024-06-18 15:35:12 -07:00
Daniel Hiltgen
26d0bf9236 Merge pull request #5117 from dhiltgen/fix_prediction
Handle models with divergent layer sizes
2024-06-18 11:36:51 -07:00
Daniel Hiltgen
359b15a597 Handle models with divergent layer sizes
The recent refactoring of the memory prediction assumed all layers
are the same size, but for some models (like deepseek-coder-v2) this
is not the case, so our predictions were significantly off.
2024-06-18 11:05:34 -07:00
Daniel Hiltgen
b55958a587 Merge pull request #5106 from dhiltgen/clean_logs
Tighten up memory prediction logging
2024-06-18 09:24:38 -07:00
Daniel Hiltgen
7784ca33ce Tighten up memory prediction logging
Prior to this change, we logged the memory prediction multiple times
as the scheduler iterates to find a suitable configuration, which can be
confusing since only the last log before the server starts is actually valid.
This now logs once just before starting the server on the final configuration.
It also reports what library instead of always saying "offloading to gpu" when
using CPU.
2024-06-18 09:15:35 -07:00
Daniel Hiltgen
c9c8c98bf6 Merge pull request #5105 from dhiltgen/cuda_mmap
Adjust mmap logic for cuda windows for faster model load
2024-06-17 17:07:30 -07:00
Daniel Hiltgen
171796791f Adjust mmap logic for cuda windows for faster model load
On Windows, recent llama.cpp changes make mmap slower in most
cases, so default to off.  This also implements a tri-state for
use_mmap so we can detect the difference between a user provided
value of true/false, or unspecified.
2024-06-17 16:54:30 -07:00
Jeffrey Morgan
176d0f7075 Update import.md 2024-06-17 19:44:14 -04:00
Daniel Hiltgen
8ed51cac37 Merge pull request #5103 from dhiltgen/faster_win_build
Revert powershell jobs, but keep nvcc and cmake parallelism
2024-06-17 14:23:18 -07:00
Daniel Hiltgen
c9e6f0542d Merge pull request #5069 from dhiltgen/ci_release
Implement custom github release action
2024-06-17 13:59:37 -07:00
Daniel Hiltgen
b0930626c5 Add back lower level parallel flags
nvcc supports parallelism (threads) and cmake + make can use -j,
while msbuild requires /p:CL_MPcount=8
2024-06-17 13:44:46 -07:00
Daniel Hiltgen
e890be4814 Revert "More parallelism on windows generate"
This reverts commit 0577af98f4.
2024-06-17 13:32:46 -07:00
Daniel Hiltgen
b2799f111b Move libraries out of users path
We update the PATH on windows to get the CLI mapped, but this has
an unintended side effect of causing other apps that may use our bundled
DLLs to get terminated when we upgrade.
2024-06-17 13:12:18 -07:00
Jeffrey Morgan
152fc202f5 llm: update llama.cpp commit to 7c26775 (#4896)
* llm: update llama.cpp submodule to `7c26775`

* disable `LLAMA_BLAS` for now

* `-DLLAMA_OPENMP=off`
2024-06-17 15:56:16 -04:00
Lei Jitang
4ad0d4d6d3 Fix a build warning (#5096)
Signed-off-by: Lei Jitang <leijitang@outlook.com>
2024-06-17 14:47:48 -04:00
Jeffrey Morgan
163cd3e77c gpu: add env var for detecting Intel oneapi gpus (#5076)
* gpu: add env var for detecting intel oneapi gpus

* fix build error
2024-06-16 20:09:05 -04:00
Daniel Hiltgen
4c2c8f93dd Merge pull request #5080 from dhiltgen/debug_intel_crash
Add some more debugging logs for intel discovery
2024-06-16 14:42:41 -07:00
Daniel Hiltgen
fd1e6e0590 Add some more debugging logs for intel discovery
Also removes an unused overall count variable
2024-06-16 07:42:52 -07:00
royjhan
89c79bec8c Add ModifiedAt Field to /api/show (#5033)
* Add Mod Time to Show

* Error Handling
2024-06-15 20:53:56 -07:00
Jeffrey Morgan
c7b77004e3 docs: add missing powershell package to windows development instructions (#5075)
* docs: add missing instruction for powershell build

The powershell script for building Ollama on Windows now requires the `ThreadJob` module. Add this to the instructions and dependency list.

* Update development.md
2024-06-15 23:08:09 -04:00
Daniel Hiltgen
07d143f412 Merge pull request #5058 from coolljt0725/fix_build_warning
gpu: Fix build warning
2024-06-15 11:52:36 -07:00
Daniel Hiltgen
a12283e2ff Implement custom github release action
This implements the release logic we want via gh cli
to support updating releases with rc tags in place and retain
release notes and other community reactions.
2024-06-15 11:36:56 -07:00
Daniel Hiltgen
4b0050cf0e Merge pull request #5037 from dhiltgen/faster_win_build
More parallelism on windows generate
2024-06-15 08:03:05 -07:00
Daniel Hiltgen
0577af98f4 More parallelism on windows generate
Make the build faster
2024-06-15 07:44:55 -07:00
Daniel Hiltgen
17ce203a26 Merge pull request #4875 from dhiltgen/rocm_gfx900_workaround
Rocm gfx900 workaround
2024-06-15 07:38:58 -07:00
Daniel Hiltgen
d76555ffb5 Merge pull request #4874 from dhiltgen/rocm_v6_bump
Rocm v6 bump
2024-06-15 07:38:32 -07:00
Daniel Hiltgen
2786dff5d3 Merge pull request #4264 from dhiltgen/show_gpu_visible_settings
Centralize GPU configuration vars
2024-06-15 07:33:52 -07:00
Lei Jitang
225f0d1219 gpu: Fix build warning
Signed-off-by: Lei Jitang <leijitang@outlook.com>
2024-06-15 14:26:23 +08:00
Daniel Hiltgen
532db58311 Merge pull request #4972 from jayson-cloude/main
fix: "Skip searching for network devices"
2024-06-14 17:04:40 -07:00
Daniel Hiltgen
6be309e1bd Centralize GPU configuration vars
This should aid in troubleshooting by capturing and reporting the GPU
settings at startup in the logs along with all the other server settings.
2024-06-14 15:59:10 -07:00
Daniel Hiltgen
da3bf23354 Workaround gfx900 SDMA bugs
Implement support for GPU env var workarounds, and leverage
this for the Vega RX 56 which needs
HSA_ENABLE_SDMA=0 set to work properly
2024-06-14 15:38:13 -07:00
Daniel Hiltgen
26ab67732b Bump ROCm linux to 6.1.1 2024-06-14 15:37:54 -07:00
Daniel Hiltgen
45cacbaf05 Merge pull request #4517 from dhiltgen/gpu_incremental
Enhanced GPU discovery and multi-gpu support with concurrency
2024-06-14 15:35:00 -07:00
Daniel Hiltgen
17df6520c8 Remove mmap related output calc logic 2024-06-14 14:55:50 -07:00
Daniel Hiltgen
6f351bf586 review comments and coverage 2024-06-14 14:55:50 -07:00
Daniel Hiltgen
ff4f0cbd1d Prevent multiple concurrent loads on the same gpus
While models are loading, the VRAM metrics are dynamic, so try
to load on a GPU that doesn't have a model actively loading, or wait
to avoid races that lead to OOMs
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
fc37c192ae Refine CPU load behavior with system memory visibility 2024-06-14 14:51:40 -07:00
Daniel Hiltgen
434dfe30c5 Reintroduce nvidia nvml library for windows
This library will give us the most reliable free VRAM reporting on windows
to enable concurrent model scheduling.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
4e2b7e181d Refactor intel gpu discovery 2024-06-14 14:51:40 -07:00
Daniel Hiltgen
48702dd149 Harden unload for empty runners 2024-06-14 14:51:40 -07:00
Daniel Hiltgen
68dfc6236a refined test timing
adjust timing on some tests so they don't timeout on small/slow GPUs
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
5e8ff556cb Support forced spreading for multi GPU
Our default behavior today is to try to fit into a single GPU if possible.
Some users would prefer the old behavior of always spreading across
multiple GPUs even if the model can fit into one.  This exposes that
tunable behavior.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
6fd04ca922 Improve multi-gpu handling at the limit
Still not complete, needs some refinement to our prediction to understand the
discrete GPUs available space so we can see how many layers fit in each one
since we can't split one layer across multiple GPUs we can't treat free space
as one logical block
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
206797bda4 Fix concurrency integration test to work locally
This worked remotely but wound up trying to spawn multiple servers
locally which doesn't work
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
43ed358f9a Refine GPU discovery to bootstrap once
Now that we call the GPU discovery routines many times to
update memory, this splits initial discovery from free memory
updating.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
b32ebb4f29 Use DRM driver for VRAM info for amd
The amdgpu drivers free VRAM reporting omits some other apps, so leverage the
upstream DRM driver which keeps better tabs on things
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
fb9cdfa723 Fix server.cpp for the new cuda build macros 2024-06-14 14:51:40 -07:00
Daniel Hiltgen
efac488675 Revert "Limit GPU lib search for now (#4777)"
This reverts commit 476fb8e892.
2024-06-14 14:51:40 -07:00
Jeffrey Morgan
6b800aa7b7 openai: do not set temperature to 0 when setting seed (#5045) 2024-06-14 13:43:56 -07:00
Jeffrey Morgan
dd7c9ebeaf server: longer timeout in TestRequests (#5046) 2024-06-14 09:48:25 -07:00
Patrick Devine
4dc7fb9525 update 40xx gpu compat matrix (#5036) 2024-06-13 17:10:33 -07:00
Daniel Hiltgen
c39761c552 Merge pull request #5032 from dhiltgen/actually_skip
Actually skip PhysX on windows
2024-06-13 13:26:09 -07:00
Daniel Hiltgen
aac367636d Actually skip PhysX on windows 2024-06-13 13:17:19 -07:00
Michael Yang
15a687ae4b Merge pull request #5031 from ollama/mxyng/fix-multibyte-utf16
fix: multibyte utf16
2024-06-13 13:14:55 -07:00
Michael Yang
d528e1af75 fix utf16 for multibyte runes 2024-06-13 13:07:42 -07:00
Michael Yang
cd234ce22c parser: add test for multibyte runes 2024-06-13 13:07:42 -07:00
Patrick Devine
94618b2365 add OLLAMA_MODELS to envconfig (#5029) 2024-06-13 12:52:03 -07:00
Jeffrey Morgan
1fd236d177 server: remove jwt decoding error (#5027) 2024-06-13 11:21:15 -07:00
Michael Yang
e87fc7200d Merge pull request #5025 from ollama/mxyng/revert-parser-scan
Revert "proper utf16 support"
2024-06-13 10:31:25 -07:00
Michael Yang
20b9f8e6f4 Revert "proper utf16 support"
This reverts commit 66ab48772f.

this change broke utf-8 scanning of multi-byte runes
2024-06-13 10:22:16 -07:00
Patrick Devine
c69bc19e46 move OLLAMA_HOST to envconfig (#5009) 2024-06-12 18:48:16 -04:00
Michael Yang
bba5d177aa Merge pull request #5004 from ollama/mxyng/fix-templates
fix: multiple templates when creating from model
2024-06-12 14:39:29 -07:00
Michael Yang
c16f8af911 fix: multiple templates when creating from model
multiple templates may appear in a model if a model is created from
another model that 1) has an autodetected template and 2) defines a
custom template
2024-06-12 13:35:49 -07:00
Michael Yang
217f60c3d9 Merge pull request #4987 from ollama/mxyng/revert-byte-order
Revert "Merge pull request #4938 from ollama/mxyng/fix-byte-order"
2024-06-11 16:04:20 -07:00
Michael Yang
7bdcd1da94 Revert "Merge pull request #4938 from ollama/mxyng/fix-byte-order"
This reverts commit f5f245cc15, reversing
changes made to 94d37fdcae.

this change broke gguf v2 which is incorrectly detected as big endian
2024-06-11 15:56:17 -07:00
Jeffrey Morgan
ead259d877 llm: fix seed value not being applied to requests (#4986) 2024-06-11 14:24:41 -07:00
James Montgomery
2ff45d571d Add Ollama-hpp to Community Libraries in README. (#4983) 2024-06-11 11:15:05 -07:00
jayson-cloude
157f09acdf fix: "Skip searching for network devices"
On an Ubuntu 24.04 computer with vmware installed, the sudo lshw command will get stuck. "Network interfaces" is always displayed
2024-06-11 16:11:35 +08:00
Michael Yang
0f3cf1d42e Merge pull request #4715 from ollama/mxyng/utf16-parser
proper utf16 support
2024-06-10 11:41:29 -07:00
Michael Yang
5bc029c529 Merge pull request #4921 from ollama/mxyng/import-md
update import.md
2024-06-10 11:41:09 -07:00
Michael Yang
e9a9c6a8e8 Merge pull request #4965 from ollama/mxyng/skip-layer-remove
fix: skip removing layers that no longer exist
2024-06-10 11:40:03 -07:00
Michael Yang
515f497e6d fix: skip removing layers that no longer exist 2024-06-10 11:32:19 -07:00
Michael Yang
b27268aaef add test 2024-06-10 11:32:15 -07:00
Michael Yang
f5f245cc15 Merge pull request #4938 from ollama/mxyng/fix-byte-order
fix parsing big endian gguf
2024-06-10 09:38:12 -07:00
Jim Scardelis
94d37fdcae fix: examples/langchain-python-rag-privategpt/requirements.txt (#3382) 2024-06-09 10:58:09 -07:00
Craig Hughes
b84aea1685 Critical fix from llama.cpp JSON grammar to forbid un-escaped escape characters inside strings, which breaks parsing. (#3782) 2024-06-09 10:57:09 -07:00
Napuh
896495de7b Add instructions to easily install specific versions on faq.md (#4084)
* Added instructions to easily install specific versions on faq.md

* Small typo

* Moved instructions on how to install specific version to linux.md

* Update docs/linux.md

* Update docs/linux.md

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-06-09 10:49:03 -07:00
dcasota
5528dd9d11 Error handling load_single_document() in ingest.py (#4852)
load_single_document() handles
- corrupt files
- empty (zero byte) files
- unsupported file extensions
2024-06-09 10:41:07 -07:00
Jeffrey Morgan
943172cbf4 Update api.md 2024-06-08 23:04:32 -07:00
Nischal Jain
85169e8d6f Added headless-ollama (#4612) 2024-06-08 18:51:16 -07:00
Jeffrey Morgan
34f142797a llm: always add bos token to prompt (#4941)
* fix embedding by adding fixes from llama.cpp upstream

* remove assert

---------

Co-authored-by: Jesper Ek <deadbeef84@gmail.com>
2024-06-08 18:47:10 -07:00
Erhan
46a7f1e74a Update README.md with LangChainRust (#4854) 2024-06-08 17:29:36 -07:00
Michael Yang
620d5c569e fix parsing big endian gguf 2024-06-08 12:35:26 -07:00
Michael Yang
b9ce7bf75e update import.md 2024-06-07 16:45:15 -07:00
Daniel Hiltgen
cddc63381c Merge pull request #4909 from dhiltgen/oneapi_disable
Add ability to skip oneapi generate
2024-06-07 14:07:15 -07:00
Michael Yang
385a32ecb5 Merge pull request #4910 from ollama/mxyng/detect-chat-template
fix create model when template detection errors
2024-06-07 11:07:39 -07:00
Michael Yang
030e765e76 fix create model when template detection errors 2024-06-07 10:51:35 -07:00
Daniel Hiltgen
ab8c929e20 Add ability to skip oneapi generate
This follows the same pattern for cuda and rocm to allow
disabling the build even when we detect the dependent libraries
2024-06-07 08:32:49 -07:00
Jeffrey Morgan
ce0dc33cb8 llm: patch to fix qwen 2 temporarily on nvidia (#4897) 2024-06-06 23:14:33 -07:00
Michael Yang
78f81fc0e5 Merge pull request #4800 from ollama/mxyng/detect-chat-template
detect chat template from KV
2024-06-06 16:17:18 -07:00
Michael Yang
9b6c2e6eb6 detect chat template from KV 2024-06-06 16:03:47 -07:00
royjhan
1a29e9a879 API app/browser access (#4879)
* API app/browser access

* Add tauri (resolves #2291, #4791, #3799, #4388)
2024-06-06 15:19:03 -07:00
royjhan
4bf1da4944 Separate ListResponse and ModelResponse for api/tags vs api/ps (#4842)
* Remove false time fields

* Struct Separation for List and Process

* Remove Marshaler
2024-06-06 10:11:45 -07:00
Blake Mizerany
de5beb06b3 server: skip blob verification for already verified blobs 2024-06-05 16:39:11 -07:00
Sam
98e65929dc docs(tools): add gollama (#4829) 2024-06-05 14:13:39 -07:00
Michael Yang
66ab48772f proper utf16 support 2024-06-05 13:11:50 -07:00
Michael Yang
22fcf8f7de Merge pull request #3737 from ollama/mxyng/modelname-4
update create handler to use model.Name
2024-06-05 12:05:05 -07:00
royjhan
28c7813ac4 API PS Documentation (#4822)
* API PS Documentation
2024-06-05 11:06:53 -07:00
Kartikeya Mishra
1d8616d30f docs: update to add LLocal.in to web & desktop integrations (#4719) 2024-06-04 14:43:59 -07:00
Michael Yang
d61ef8b954 update create handler to use model.Name 2024-06-04 13:28:25 -07:00
Michael Yang
89d9900152 Merge pull request #4570 from ollama/mxyng/slices
lint some of the things
2024-06-04 13:27:05 -07:00
Michael
4a048715b6 local wording was confusing people
local wording was confusing people -- Ollama runs on cloud providers
2024-06-04 13:25:25 -07:00
Michael Yang
6297f85606 gofmt, goimports 2024-06-04 13:20:24 -07:00
Michael Yang
ed56428dd7 warn on intrange, usestdlibvars 2024-06-04 11:52:48 -07:00
Michael Yang
ad40b92b6a disable intrange 2024-06-04 11:35:30 -07:00
Michael Yang
8ce4032e72 more lint 2024-06-04 11:13:30 -07:00
Michael Yang
42660466f8 no usestdlibvars 2024-06-04 11:13:30 -07:00
Michael Yang
e919f6811f lint windows 2024-06-04 11:13:30 -07:00
Michael Yang
bf7edb0d5d lint linux 2024-06-04 11:13:30 -07:00
Michael Yang
f38353d6b9 stdin.fd 2024-06-04 11:13:30 -07:00
Michael Yang
201d853fdf nolintlint 2024-06-04 11:13:30 -07:00
Michael Yang
e40145a39d lint 2024-06-04 11:13:30 -07:00
Michael Yang
c895a7d13f some gocritic 2024-06-04 11:13:30 -07:00
Michael Yang
dad7a987ae nosprintfhostport 2024-06-04 11:13:30 -07:00
Michael Yang
8ffb51749f nolintlint 2024-06-04 11:13:30 -07:00
Michael Yang
55f6eba049 gofmt 2024-06-04 11:13:30 -07:00
Michael Yang
04f3c12bb7 replace x/exp/slices with slices 2024-06-04 11:13:30 -07:00
Shubham
60323e0805 add embed model command and fix question invoke (#4766)
* add embed model command and fix question invoke

* Update docs/tutorials/langchainpy.md

Co-authored-by: Kim Hallberg <hallberg.kim@gmail.com>

* Update docs/tutorials/langchainpy.md

---------

Co-authored-by: Kim Hallberg <hallberg.kim@gmail.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-06-03 22:20:48 -07:00
Jeffrey Morgan
d4a86102fd update welcome prompt in windows to llama3 (#4779) 2024-06-01 21:05:51 -07:00
Jeffrey Morgan
476fb8e892 Limit GPU lib search for now (#4777)
* fix oneapi errors on windows 10
2024-06-01 19:24:33 -07:00
Michael Yang
829ff87bd1 revert tokenize ffi (#4761)
* Revert "use `int32_t` for call to tokenize (#4738)"

This reverts commit 763bb65dbb.

* Revert "vocab only"

This reverts commit bf54c845e9.

* Revert "use ffi for tokenizing/detokenizing"

This reverts commit 26a00a0410.
2024-05-31 18:54:21 -07:00
Josh
f6b622c4b3 Merge pull request #4733 from ollama/jyan/isvalidname
added IsValidNamespace function
2024-05-31 14:08:45 -07:00
Josh Yan
2e4da8eec2 added tests for IsValidNamespace 2024-05-31 11:48:07 -07:00
Jeffrey Morgan
763bb65dbb use int32_t for call to tokenize (#4738)
* use `int32_t` for call to tokenize

* variable naming

* cleanup

* fix crash
2024-05-30 21:43:30 -07:00
Jeffrey Morgan
7ca9605f54 speed up tests by only building static lib (#4740) 2024-05-30 21:43:15 -07:00
Michael Yang
eb2c443a79 Merge pull request #4736 from ollama/mxyng/vocab-only
vocab only for tokenize
2024-05-30 17:21:00 -07:00
Michael Yang
278e25ea44 Merge pull request #4737 from ollama/mxyng/less-generate
only generate on relevant changes
2024-05-30 17:17:50 -07:00
Jeffrey Morgan
a50a87a7b8 partial offloading: allow flash attention and disable mmap (#4734)
* partial offloading: allow flash attention and disable mmap

* allow mmap with num_gpu=0
2024-05-30 16:58:01 -07:00
Michael Yang
98085015d5 only generate on relevant changes 2024-05-30 16:54:11 -07:00
Michael Yang
bf54c845e9 vocab only 2024-05-30 16:49:28 -07:00
Josh Yan
c365f195a8 directly use isvalidpart 2024-05-30 16:40:04 -07:00
Josh
e91d0ef737 Merge pull request #4728 from ollama/jyan/japanese
fixed japanese characters deleted at end of line
2024-05-30 16:25:12 -07:00
Jeffrey Morgan
22f5c12ced Update llama.cpp submodule to 5921b8f0 (#4731)
* update llama.cpp submodule to `5921b8f089d3b7bda86aac5a66825df6a6c10603`

* add patch
2024-05-30 16:20:22 -07:00
Josh Yan
298c996e54 added IsValidNamespace function 2024-05-30 16:02:07 -07:00
Daniel Hiltgen
0fc0cfc6d2 Merge pull request #4594 from dhiltgen/doc_container_workarounds
Add isolated gpu test to troubleshooting
2024-05-30 13:10:54 -07:00
Josh Yan
914f68f021 replaced duplicate call with variable 2024-05-30 10:38:07 -07:00
Josh Yan
bd1d119ba9 fixed japanese characters deleted at end of line 2024-05-30 10:24:21 -07:00
Lei Jitang
a03be18189 Fix OLLAMA_LLM_LIBRARY with wrong map name and add more env vars to help message (#4663)
* envconfig/config.go: Fix wrong description of OLLAMA_LLM_LIBRARY

Signed-off-by: Lei Jitang <leijitang@outlook.com>

* serve: Add more env to help message of ollama serve

Add more enviroment variables to `ollama serve --help`
to let users know what can be configurated.

Signed-off-by: Lei Jitang <leijitang@outlook.com>

---------

Signed-off-by: Lei Jitang <leijitang@outlook.com>
2024-05-30 09:36:51 -07:00
Michael Yang
96bc232b43 Merge pull request #4413 from ollama/mxyng/name-check
check if name exists before create/pull/copy
2024-05-29 12:06:58 -07:00
Michael Yang
bca7b12284 Merge pull request #3718 from ollama/mxyng/modelname-3
update delete handler to use model.Name
2024-05-29 12:02:07 -07:00
Michael Yang
32cb1960c1 Merge pull request #4380 from ollama/mxyng/tokenize
use tokenize/detokenize
2024-05-29 12:00:59 -07:00
Michael Yang
de781b37c8 rm unused infill 2024-05-29 11:26:47 -07:00
Michael Yang
3e21799377 rm unused system prompt 2024-05-29 11:26:47 -07:00
Michael Yang
26a00a0410 use ffi for tokenizing/detokenizing 2024-05-29 11:26:47 -07:00
Daniel Hiltgen
646371f56d Merge pull request #3278 from zhewang1-intc/rebase_ollama_main
Enabling ollama to run on Intel GPUs with SYCL backend
2024-05-28 16:30:50 -07:00
Jeffrey Morgan
1f5008544b Update install.sh 2024-05-28 15:01:22 -07:00
Jeffrey Morgan
45cbfc5aee fix wsl2 status check for nvidia cards (#4689) 2024-05-28 14:49:46 -07:00
Jeffrey Morgan
6d423b383b Improve install experience on WSL2 and Linux (#4653) 2024-05-28 14:41:50 -07:00
Josh
ad897080a2 working on integration of multi-byte and multi-width runes (#4549)
* integrated runewidth for display management - fixed cursor movement for mutli-width char

* updated input and deletion of multi-byte chars

* fixed line history with some exceptions

* improved insert and add

* fixed issues with moving across lines

* end of line extra space tracking'

* saved changes

* fixed end of line issues with empty spaces

* worked some more

* worked on end of line

* fixed failed test

* fixed minor inserting bug

* fixed movement hotkeys

* adjusted hotkeys

* removed comments

* Update readline/buffer.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update readline/buffer.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update readline/buffer.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update readline/buffer.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update readline/buffer.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update readline/buffer.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update readline/buffer.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update readline/buffer.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update readline/buffer.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update readline/buffer.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update readline/buffer.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update readline/buffer.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* deleted comments and duplicate code

* removed duplicate code

* added comments, refactored add function to use addChar

* added helper to retrieve lineSpacing, renamed lineFlags for clarity

* fixed remove()

---------

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-05-28 12:04:03 -07:00
Jeffrey Morgan
b7d316d98d fix nvidia detection in install script (#4683) 2024-05-28 09:59:36 -07:00
Daniel Hiltgen
d7339fad52 Merge pull request #4682 from dhiltgen/more_time
Give the final model loading more time
2024-05-28 09:36:02 -07:00
Daniel Hiltgen
92c81e8117 Give the final model loading more time
On some systems, 1 minute isn't sufficient to finish the load after it
hits 100% This creates 2 distinct timers, although they're both set to
the same value for now so we can refine the timeouts further.
2024-05-28 09:08:10 -07:00
Tai
9db0996ed4 Add OllamaSpring Project to Readme (#4672)
* Add OllamaSpring Project to Readme

* Update README.md

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-27 19:58:26 -07:00
Orfeo Ciano
6f43898b17 Adds olpaka flutter client (#4647)
* Adds olpaka flutter client

* Update README.md

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-27 17:22:01 -07:00
Lei Jitang
7487229c34 llm/server.go: Fix 2 minor typos (#4661)
Signed-off-by: Lei Jitang <leijitang@outlook.com>
2024-05-27 17:21:10 -07:00
Rayan Mostovoi
8a8e7afa96 small fix on examples/python-simplechat/client.py to actually get a streamed response and get tokens printed as we receive it (#4671) 2024-05-27 17:19:20 -07:00
Jeffrey Morgan
c79f8c9c39 Ensure nvidia and nvidia_uvm kernel modules are loaded in install.sh script and at startup (#4652)
* ensure kernel modules are loaded in `install.sh` script and at startup

* indentation

* use `SUDO` variable

* restart if nouveau is detected

* consistent success message for AMD
2024-05-26 14:57:17 -07:00
Jeffrey Morgan
485016bfbb Update install.sh 2024-05-26 11:46:00 -07:00
Daniel Hiltgen
0165ba1651 Merge pull request #4638 from dhiltgen/better_error
Report better warning on client closed abort of load
2024-05-25 14:32:28 -07:00
Daniel Hiltgen
c4209d6d21 Report better warning on client closed abort of load
If the client closes the connection before we finish loading the model
we abort, so lets make the log message clearer why to help users
understand this failure mode
2024-05-25 09:23:28 -07:00
Michael Yang
6adca97f37 Merge pull request #4619 from noxer/patch-1
Fix download retry issue
2024-05-24 17:21:57 -07:00
Michael Yang
9a3c8003c8 Merge pull request #4624 from ollama/mxyng/fix-5
fix q5_0, q5_1
2024-05-24 16:11:21 -07:00
Michael Yang
d51f15257c Update llm/ggml.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-05-24 16:10:43 -07:00
Michael Yang
8f440d579a fix q5_0, q5_1 2024-05-24 16:01:46 -07:00
Patrick Devine
4cc3be3035 Move envconfig and consolidate env vars (#4608) 2024-05-24 14:57:15 -07:00
Tim Scheuermann
db2ffa79f1 Fix download retry issue 2024-05-24 20:30:42 +02:00
Jeffrey Morgan
afd2b058b4 set codesign timeout to longer (#4605) 2024-05-23 22:46:23 -07:00
Wang,Zhe
fd5971be0b support ollama run on Intel GPUs 2024-05-24 11:18:27 +08:00
Daniel Hiltgen
89bf98bcf2 Merge pull request #4598 from dhiltgen/docs
Tidy up developer guide a little
2024-05-23 15:14:29 -07:00
Daniel Hiltgen
1b2d156094 Tidy up developer guide a little 2024-05-23 15:14:05 -07:00
Michael Yang
714adb8bd1 bump (#4597) 2024-05-23 14:16:26 -07:00
Daniel Hiltgen
95b1133d0c Merge pull request #4547 from dhiltgen/load_progress
Wire up load progress
2024-05-23 14:06:02 -07:00
Daniel Hiltgen
b37b496a12 Wire up load progress
This doesn't expose a UX yet, but wires the initial server portion
of progress reporting during load
2024-05-23 13:36:48 -07:00
Bruce MacDonald
d6f692ad1a Add support for IQ1_S, IQ3_S, IQ2_S, IQ4_XS. IQ4_NL (#4322)
Co-authored-by: ManniX-ITA <20623405+mann1x@users.noreply.github.com>
2024-05-23 13:21:49 -07:00
Daniel Hiltgen
f77713bf1f Add isolated gpu test to troubleshooting 2024-05-23 09:33:25 -07:00
Jeffrey Morgan
38255d2af1 Use flash attention flag for now (#4580)
* put flash attention behind flag for now

* add test

* remove print

* up timeout for sheduler tests
2024-05-22 21:52:09 -07:00
Michael
73630a7e85 add phi 3 medium (#4578) 2024-05-22 12:53:45 -04:00
Ikko Eltociear Ashimine
955c317cab chore: update tokenizer.go (#4571)
PreTokenziers -> PreTokenizers
2024-05-22 00:25:23 -07:00
Josh
9f18b88a06 Merge pull request #4566 from ollama/jyan/shortcuts
add Ctrl + W shortcut
2024-05-21 22:49:36 -07:00
Josh Yan
353f83a9c7 add Ctrl + W shortcut 2024-05-21 16:55:09 -07:00
Patrick Devine
3bade04e10 doc updates for the faq/troubleshooting (#4565) 2024-05-21 15:30:09 -07:00
Michael Yang
a6d0f443eb Merge pull request #4543 from ollama/mxyng/simple-safetensors
simplify safetensors reading
2024-05-21 14:43:55 -07:00
Michael Yang
96236b7968 Merge pull request #4268 from ollama/pdevine/llama3
Convert directly from llama3
2024-05-21 14:43:37 -07:00
Sang Park
4434d7f447 Correct typo in error message (#4535)
The spelling of the term "request" has been corrected, which was previously mistakenly written as "requeset" in the error log message.
2024-05-21 13:39:01 -07:00
Michael Yang
171eb040fc simplify safetensors reading 2024-05-21 11:28:22 -07:00
Michael Yang
3591bbe56f add test 2024-05-21 11:28:22 -07:00
Michael Yang
34d5ef29b3 fix conversion for f16 or f32 inputs 2024-05-21 11:28:22 -07:00
Michael Yang
bbbd9f20f3 cleanup 2024-05-20 16:13:57 -07:00
Michael Yang
547132e820 bpe pretokenizer 2024-05-20 16:13:57 -07:00
Patrick Devine
2d315ba9a9 add missing file 2024-05-20 16:13:57 -07:00
Patrick Devine
d355d2020f add fixes for llama 2024-05-20 16:13:57 -07:00
Patrick Devine
c8cf0d94ed llama3 conversion 2024-05-20 16:13:57 -07:00
Patrick Devine
4730762e5c add safetensors version 2024-05-20 16:13:57 -07:00
Patrick Devine
d88582dffd some changes for llama3 2024-05-20 16:13:57 -07:00
Michael Yang
2f81b3dce2 Merge pull request #4502 from ollama/mxyng/fix-quantize
fix quantize file types
2024-05-20 16:09:27 -07:00
jmorganca
5cab13739e set llama.cpp submodule commit to 614d3b9 2024-05-20 15:28:17 -07:00
Josh Yan
8aadad9c72 updated updateURL 2024-05-20 15:24:32 -07:00
Michael Yang
807d092761 fix quantize file types 2024-05-20 15:22:11 -07:00
Michael Yang
f36f1d6be9 tidy intermediate blobs 2024-05-20 15:15:06 -07:00
alwqx
8800c8a59b chore: fix typo in docs (#4536) 2024-05-20 14:19:03 -07:00
Michael Yang
b4dce13309 Merge pull request #4330 from ollama/mxyng/cache-intermediate-layers
cache and reuse intermediate blobs
2024-05-20 13:54:41 -07:00
Sam
e15307fdf4 feat: add support for flash_attn (#4120)
* feat: enable flash attention if supported

* feat: enable flash attention if supported

* feat: enable flash attention if supported

* feat: add flash_attn support
2024-05-20 13:36:03 -07:00
Michael Yang
3520c0e4d5 cache and reuse intermediate blobs
particularly useful for zipfiles and f16s
2024-05-20 13:25:10 -07:00
Patrick Devine
ccdf0b2a44 Move the parser back + handle utf16 files (#4533) 2024-05-20 11:26:45 -07:00
jmorganca
63a453554d go mod tidy 2024-05-19 23:03:57 -07:00
Patrick Devine
105186aa17 add OLLAMA_NOHISTORY to turn off history in interactive mode (#4508) 2024-05-18 11:51:57 -07:00
Daniel Hiltgen
ba04afc9a4 Merge pull request #4483 from dhiltgen/clean_exit
Don't return error on signal exit
2024-05-17 11:41:57 -07:00
Daniel Hiltgen
7e1e0086e7 Merge pull request #4482 from dhiltgen/integration_improvements
Skip max queue test on remote
2024-05-16 16:43:48 -07:00
Daniel Hiltgen
02b31c9dc8 Don't return error on signal exit 2024-05-16 16:25:38 -07:00
Daniel Hiltgen
7f2fbad736 Skip max queue test on remote
This test needs to be able to adjust the queue size down from
our default setting for a reliable test, so it needs to skip on
remote test execution mode.
2024-05-16 16:24:18 -07:00
Josh
5bece94509 Merge pull request #4463 from ollama/jyan/line-display
changed line display to be calculated with runewidth
2024-05-16 14:15:08 -07:00
Josh Yan
3d90156e99 removed comment 2024-05-16 14:12:03 -07:00
Rose Heart
5e46c5c435 Updating software for read me (#4467)
* Update README.md

Added chat/moderation bot to list of software.

* Update README.md

Fixed link error.
2024-05-16 13:55:14 -07:00
Jeffrey Morgan
583c1f472c update llama.cpp submodule to 614d3b9 (#4414) 2024-05-16 13:53:09 -07:00
Josh Yan
26bfc1c443 go fmt'd cmd.go 2024-05-15 17:26:39 -07:00
Josh Yan
799aa9883c go fmt'd cmd.go 2024-05-15 17:24:17 -07:00
Michael Yang
84ed77cbd8 Merge pull request #4436 from ollama/mxyng/done-part
return on part done
2024-05-15 17:16:24 -07:00
Josh Yan
c9e584fb90 updated double-width display 2024-05-15 16:45:24 -07:00
Josh Yan
17b1e81ca1 fixed width and word count for double spacing 2024-05-15 16:29:33 -07:00
Daniel Hiltgen
7e9a2da097 Merge pull request #4462 from dhiltgen/opt_out_build
Port cuda/rocm skip build vars to linux
2024-05-15 16:27:47 -07:00
Daniel Hiltgen
c48c1d7c46 Port cuda/rocm skip build vars to linux
Windows already implements these, carry over to linux.
2024-05-15 15:56:43 -07:00
Patrick Devine
d1692fd3e0 fix the cpu estimatedTotal memory + get the expiry time for loading models (#4461) 2024-05-15 15:43:16 -07:00
Daniel Hiltgen
5fa36a0833 Merge pull request #4459 from dhiltgen/sanitize_env_log
Sanitize the env var debug log
2024-05-15 14:58:55 -07:00
Daniel Hiltgen
853ae490e1 Sanitize the env var debug log
Only dump env vars we care about in the logs
2024-05-15 14:42:57 -07:00
Patrick Devine
f2cf97d6f1 fix typo in modelfile generation (#4439) 2024-05-14 15:34:29 -07:00
Patrick Devine
c344da4c5a fix keepalive for non-interactive mode (#4438) 2024-05-14 15:17:04 -07:00
Michael Yang
85a57006d1 check if name exists before create/pull/copy 2024-05-14 14:58:58 -07:00
Michael Yang
c5e892cb3e update tests 2024-05-14 14:56:31 -07:00
Michael Yang
81fb06f530 more resilient Manifests 2024-05-14 14:08:24 -07:00
Michael Yang
a385382ff5 filepath.Join 2024-05-14 14:08:24 -07:00
Michael Yang
b8772a353f remove DeleteModel 2024-05-14 14:08:24 -07:00
Michael Yang
c2714fcbfd routes: use Manifests for ListHandler 2024-05-14 14:08:24 -07:00
Michael Yang
a2fc933fed update delete handler to use model.Name 2024-05-14 14:08:24 -07:00
Michael Yang
0e331c7168 Merge pull request #4328 from ollama/mxyng/mem
count memory up to NumGPU if set by user
2024-05-14 13:47:44 -07:00
Michael Yang
ac145f75ca return on part done 2024-05-14 13:04:30 -07:00
Patrick Devine
a4b8d1f89a re-add system context (#4435) 2024-05-14 11:38:20 -07:00
Ryo Machida
798b107f19 Fixed the API endpoint /api/tags when the model list is empty. (#4424)
* Fixed the API endpoint /api/tags to return {models: []} instead of {models: null} when the model list is empty.

* Update server/routes.go

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-14 11:18:10 -07:00
Daniel Hiltgen
6a1b471365 Merge pull request #4430 from dhiltgen/gpu_info
Remove VRAM convergence check for windows
2024-05-14 10:59:06 -07:00
Daniel Hiltgen
ec231a7923 Remove VRAM convergence check for windows
The APIs we query are optimistic on free space, and windows pages
VRAM, so we don't have to wait to see reported usage recover on unload
2024-05-14 09:53:46 -07:00
Patrick Devine
7ca71a6b0f don't abort when an invalid model name is used in /save (#4416) 2024-05-13 18:48:28 -07:00
Josh
7607e6e902 Merge pull request #4379 from WolfTheDeveloper/main
Update `LlamaScript` to point to new link from Legacy link.
2024-05-13 18:08:32 -07:00
Patrick Devine
f1548ef62d update the FAQ to be more clear about windows env variables (#4415) 2024-05-13 18:01:13 -07:00
Patrick Devine
6845988807 Ollama ps command for showing currently loaded models (#4327) 2024-05-13 17:17:36 -07:00
Josh
9eed4a90ce Merge pull request #4411 from joshyan1/main
removed inconsistent punctuation
2024-05-13 15:30:45 -07:00
Josh Yan
f8464785a6 removed inconsistencies 2024-05-13 14:50:52 -07:00
Michael Yang
1d359e737e typo 2024-05-13 14:18:34 -07:00
Michael Yang
50b9056e09 count memory up to NumGPU 2024-05-13 14:13:10 -07:00
Josh Yan
91a090a485 removed inconsistent punctuation 2024-05-13 14:08:22 -07:00
睡觉型学渣
9c76b30d72 Correct typos. (#4387)
* Correct typos.

* Correct typos.
2024-05-12 18:21:11 -07:00
Zander Lewis
93f19910c5 Update LlamaScript to point to new link.
Still used Legacy link.
2024-05-12 11:24:21 -04:00
jmorganca
4ec7445a6f Revert "use post token"
This reverts commit 0fec3525ad.
2024-05-11 22:19:14 -07:00
Michael Yang
0372c51f82 Merge pull request #4369 from ollama/mxyng/post-token
use post token
2024-05-11 19:29:14 -07:00
Michael Yang
0fec3525ad use post token 2024-05-11 19:13:16 -07:00
Jeffrey Morgan
41ba3017fd Fix OpenAI finish_reason values when empty (#4368) 2024-05-11 15:31:41 -07:00
todashuta
8080fbce35 fix ollama create's usage string (#4362) 2024-05-11 14:47:49 -07:00
Michael Yang
ec14f6ceda case sensitive filepaths (#4366) 2024-05-11 14:12:36 -07:00
Daniel Hiltgen
c60a086635 Merge pull request #4331 from dhiltgen/fix_unit
Fix envconfig unit test
2024-05-11 09:16:28 -07:00
jmorganca
92ca2cca95 Revert "only forward some env vars"
This reverts commit ce3b212d12.
2024-05-10 22:53:21 -07:00
Patrick Devine
1e1634daca update go deps (#4324) 2024-05-10 21:39:27 -07:00
Daniel Hiltgen
824ee5446f Fix envconfig unit test 2024-05-10 16:49:48 -07:00
Daniel Hiltgen
879e2caf8c Merge pull request #4329 from dhiltgen/zero_layers
Fall back to CPU runner with zero layers
2024-05-10 15:23:16 -07:00
Daniel Hiltgen
c4014e73a2 Fall back to CPU runner with zero layers 2024-05-10 15:09:48 -07:00
Daniel Hiltgen
be9efdb981 Merge pull request #4326 from dhiltgen/fix_integration
Integration fixes
2024-05-10 14:25:59 -07:00
Daniel Hiltgen
074dc3b9d8 Integration fixes 2024-05-10 14:20:10 -07:00
Daniel Hiltgen
86f9b582d5 Merge pull request #4323 from dhiltgen/sort_by_free
Always use the sorted list of GPUs
2024-05-10 14:12:15 -07:00
Daniel Hiltgen
4142c3ef7c Always use the sorted list of GPUs
Make sure the first GPU has the most free space
2024-05-10 13:53:21 -07:00
Jeffrey Morgan
6602e793c0 Use --quantize flag and quantize api parameter (#4321)
* rename `--quantization` to `--quantize`

* backwards

* Update api/types.go

Co-authored-by: Michael Yang <mxyng@pm.me>

---------

Co-authored-by: Michael Yang <mxyng@pm.me>
2024-05-10 13:06:13 -07:00
Michael Yang
ea0fdaed28 Merge pull request #4320 from ollama/mxyng/phi2-mem
add phi2 mem
2024-05-10 12:35:08 -07:00
Michael Yang
1eb382da5a add phi2 mem 2024-05-10 12:13:28 -07:00
Jeffrey Morgan
bb6fd02298 Don't clamp ctx size in PredictServerFit (#4317)
* dont clamp ctx size in `PredictServerFit`

* minimum 4 context

* remove context warning
2024-05-10 10:17:12 -07:00
Daniel Hiltgen
7e2bceceee Merge pull request #4316 from dhiltgen/more_buffer
Bump VRAM buffer back up
2024-05-10 10:02:34 -07:00
Daniel Hiltgen
30a7d7096c Bump VRAM buffer back up
Under stress scenarios we're seeing OOMs so this should help stabilize
the allocations under heavy concurrency stress.
2024-05-10 09:15:28 -07:00
Michael Yang
200a18820e Merge pull request #4306 from ollama/mxyng/fix-routes 2024-05-10 08:58:16 -07:00
Michael Yang
e03637176d fix(routes): skip bad manifests 2024-05-10 08:46:11 -07:00
Bruce MacDonald
c02db93243 omit empty done reason 2024-05-09 16:45:29 -07:00
Michael Yang
ffa4d5134a Merge pull request #4305 from ollama/mxyng/typo
fix typo
2024-05-09 16:42:09 -07:00
Jeffrey Morgan
302d7fdbf3 prune partial downloads (#4272) 2024-05-09 16:35:20 -07:00
Michael Yang
cf442cd57e fix typo 2024-05-09 16:23:37 -07:00
Michael Yang
0e1ba65855 Merge pull request #4302 from ollama/mxyng/forward-env
only forward some env vars
2024-05-09 16:21:05 -07:00
Michael Yang
6aad333c63 Merge pull request #4298 from ollama/mxyng/log-cleanup
log clean up
2024-05-09 16:20:57 -07:00
Daniel Hiltgen
4fcc84e67a Merge pull request #4304 from dhiltgen/signals
Fix race in shutdown logic
2024-05-09 15:58:44 -07:00
Daniel Hiltgen
3ae2f441e0 Fix race in shutdown logic
Ensure the runners are terminated
2024-05-09 15:54:02 -07:00
Zander Lewis
2abb3f6424 Update README.md (#4300) 2024-05-09 15:30:49 -07:00
Michael Yang
ce3b212d12 only forward some env vars 2024-05-09 15:16:09 -07:00
Daniel Hiltgen
83d6d46e29 Merge pull request #4299 from dhiltgen/handle_vram_reporting_lag
Wait for GPU free memory reporting to converge
2024-05-09 15:08:56 -07:00
Daniel Hiltgen
354ad9254e Wait for GPU free memory reporting to converge
The GPU drivers take a while to update their free memory reporting, so we need
to wait until the values converge with what we're expecting before proceeding
to start another runner in order to get an accurate picture.
2024-05-09 14:56:01 -07:00
Michael Yang
58876091f7 log clean up 2024-05-09 14:55:36 -07:00
Daniel Hiltgen
dc18eee39d Merge pull request #4238 from dhiltgen/gpu_info
Record more GPU information
2024-05-09 14:26:58 -07:00
Daniel Hiltgen
8727a9c140 Record more GPU information
This cleans up the logging for GPU discovery a bit, and can
serve as a foundation to report GPU information in a future UX.
2024-05-09 14:18:14 -07:00
Daniel Hiltgen
d0425f26cf Merge pull request #4294 from dhiltgen/harden_subprocess_reaping
Harden subprocess reaping
2024-05-09 14:02:16 -07:00
Bruce MacDonald
cfa84b8470 add done_reason to the api (#4235) 2024-05-09 13:30:14 -07:00
Michael Yang
1580ed4c06 Merge pull request #4295 from ollama/mxyng/fix-list
routes: skip invalid filepaths
2024-05-09 11:37:34 -07:00
Michael Yang
a7ee84fc31 routes: skip invalid filepaths 2024-05-09 11:23:22 -07:00
Daniel Hiltgen
84ac7ce139 Refine subprocess reaping 2024-05-09 11:21:31 -07:00
tusharhero
788b092c49 docs: add Guix package manager in README. (#4040) 2024-05-09 11:10:24 -07:00
J S
5cde17a096 Add PromptingTools.jl (#2192) 2024-05-09 09:39:05 -07:00
Daniel Hiltgen
c3837eb08c Merge pull request #4289 from dhiltgen/doc_container_workarounds
Doc container usage and workaround for nvidia errors
2024-05-09 09:27:29 -07:00
Daniel Hiltgen
8cc0ee2efe Doc container usage and workaround for nvidia errors 2024-05-09 09:26:45 -07:00
Jeffrey Morgan
d5eec16d23 use model defaults for num_gqa, rope_frequency_base and rope_frequency_scale (#1983) 2024-05-09 09:06:13 -07:00
Carlos Gamez
daa1a032f7 Update langchainjs.md (#2027)
Updated sample code as per warning notification from the package maintainers
2024-05-08 20:21:03 -07:00
jmorganca
6042e8bc57 remove bash-comparemodels example 2024-05-08 19:49:45 -07:00
Daniel Hiltgen
920a4b0794 Merge remote-tracking branch 'upstream/main' into pr3702 2024-05-08 16:44:35 -07:00
Daniel Hiltgen
ee49844d09 Merge pull request #4153 from dhiltgen/gpu_verbose_response
Add GPU usage
2024-05-08 16:39:11 -07:00
Daniel Hiltgen
8a516ac862 Merge pull request #4241 from dhiltgen/fix_tmp_override
Detect noexec and report a better error
2024-05-08 15:34:22 -07:00
Daniel Hiltgen
bee2f4a3b0 Record GPU usage information
This records more GPU usage information for eventual UX inclusion.
2024-05-08 14:45:39 -07:00
Bruce MacDonald
cef45feaa4 Add preflight OPTIONS handling and update CORS config (#4086)
* Add preflight OPTIONS handling and update CORS config

- Implement early return with HTTP 204 (No Content) for OPTIONS requests in allowedHostsMiddleware to optimize preflight handling.

- Extend CORS configuration to explicitly allow 'Authorization' headers and 'OPTIONS' method when OLLAMA_ORIGINS environment variable is set.

* allow auth, content-type, and user-agent headers

* Update routes.go
2024-05-08 13:14:00 -07:00
Michael Yang
2687f02c96 Merge pull request #4265 from ollama/mxyng/fix-show-llava
routes: fix show llava models
2024-05-08 12:51:21 -07:00
Michael Yang
b25976aeb8 routes: fix show llava models 2024-05-08 12:43:36 -07:00
Michael Yang
001f167aad Merge pull request #4261 from ollama/mxyng/fix-tag-case
types/model: fix tag case
2024-05-08 11:09:47 -07:00
Michael Yang
486a2c1d94 types/model: fix tag case 2024-05-08 08:47:16 -07:00
Michael Yang
88cf154483 Merge pull request #4244 from ollama/mxyng/skip-if-same
skip if same quantization
2024-05-07 19:03:37 -07:00
Bruce MacDonald
8cbd3e7510 skip hidden files in list models handler (#4247) 2024-05-07 19:01:45 -07:00
Michael Yang
eeb695261f skip if same quantization 2024-05-07 17:44:19 -07:00
Bruce MacDonald
dc9b1111e0 fix invalid destination error message 2024-05-07 17:35:52 -07:00
Tobias Gårdhus
06ac829e70 Fix help string for stop parameter (#2307) 2024-05-07 16:48:35 -07:00
Daniel Hiltgen
72700279e2 Detect noexec and report a better error
This will bubble up a much more informative error message if noexec
is preventing us from running the subprocess
2024-05-07 16:46:15 -07:00
boessu
5d3f7fff26 Update langchainpy.md (#4236)
fixing pip code.
2024-05-07 16:36:34 -07:00
Eli Bendersky
d77c1c5f9d api: fill up API documentation (#3596)
* api: fill up API documentation

Followup for #2878

Now that the documentation is more complete, mention it in the README.

Updates #2840

* fix typo/lint

* Update README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-07 16:27:46 -07:00
Giuseppe Lumia
2a5302a1cf Fix paste of text with line feed characters (#3043)
Some terminals may send line feed characters when pasting text with
newlines.
2024-05-07 15:26:07 -07:00
Michael Yang
ffbd3d173f Merge pull request #3715 from ollama/mxyng/modelname-2
update list handler to use model.Name
2024-05-07 15:21:39 -07:00
Michael Yang
1e0a669f75 Merge pull request #3682 from ollama/mxyng/quantize-all-the-things
quantize any fp16/fp32 model
2024-05-07 15:20:49 -07:00
Bruce MacDonald
527e9be058 fix: store accurate model parameter size (#4058)
- add test for number formatting
- fix bug where 1B and 1M were not stored correctly
- display 2 decimal points for million param sizes
- display 1 decimal point for billion param sizes
2024-05-07 14:41:53 -07:00
Renat
34bea2e272 Add macai to list of Web & Desktop integrations (#3881) 2024-05-07 13:31:34 -07:00
Fernando Maclen
fe44ae3371 Update README.md (#3884) 2024-05-07 13:17:35 -07:00
Michael Yang
adeb40eaf2 Merge pull request #4231 from ollama/mxyng/parser
types/model: fix parser for empty values
2024-05-07 10:48:32 -07:00
Michael Yang
d7d33e5255 Merge pull request #951 from ollama/mxyng/example-fly
fly example
2024-05-07 10:46:24 -07:00
Michael Yang
63bc884e25 types/model: fix parser for empty values 2024-05-07 10:44:43 -07:00
Michael Yang
ef4e095d24 Merge pull request #4232 from ollama/revert-4190-fix/golang-ci
Revert "fix golangci workflow not enable gofmt and goimports"
2024-05-07 10:39:37 -07:00
Michael Yang
4d4f75a8a8 Revert "fix golangci workflow missing gofmt and goimports (#4190)"
This reverts commit 04f971c84b.
2024-05-07 10:35:44 -07:00
Mélony QIN
3f71ba406a Correct the kubernetes terminology (#3843)
* add details on kubernetes deployment and separate the testing process

* Update examples/kubernetes/README.md

thanks for suggesting this change, I agree with you and let's make this project better together !

Co-authored-by: JonZeolla <Zeolla@gmail.com>

---------

Co-authored-by: QIN Mélony <MQN1@dsone.3ds.com>
Co-authored-by: JonZeolla <Zeolla@gmail.com>
2024-05-07 09:53:08 -07:00
Hause Lin
88a67127d8 Update README.md to include ollama-r library (#4012)
* Update README.md

Add Ollama for R - ollama-r library

* Update README.md

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-07 09:52:30 -07:00
Jeffrey Morgan
f7dc7dcc64 Update .gitattributes 2024-05-07 09:50:19 -07:00
alwqx
04f971c84b fix golangci workflow missing gofmt and goimports (#4190) 2024-05-07 09:49:40 -07:00
Michael Yang
548a7df014 update list handler to use model.Name 2024-05-07 09:38:45 -07:00
Michael Yang
70edb9bc4d Merge pull request #4215 from ollama/mxyng/mem
llm: add minimum based on layer size
2024-05-07 09:26:33 -07:00
Michael Yang
3f0ed03856 Update examples/flyio/README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-07 09:25:01 -07:00
Michael Yang
4736391bfb llm: add minimum based on layer size 2024-05-06 17:04:19 -07:00
CrispStrobe
7c5330413b note on naming restrictions (#2625)
* note on naming restrictions

else push would fail with cryptic
retrieving manifest 
Error: file does not exist
==> maybe change that in code too

* Update docs/import.md

---------

Co-authored-by: C-4-5-3 <154636388+C-4-5-3@users.noreply.github.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-06 16:03:21 -07:00
Jeffrey Morgan
39d9d22ca3 close server on receiving signal (#4213) 2024-05-06 16:01:37 -07:00
Jackie Li
af47413dba Add MarshalJSON to Duration (#3284)
---------

Co-authored-by: Patrick Devine <patrick@infrahq.com>
2024-05-06 15:59:18 -07:00
Michael Yang
b2f00aa977 close zip files 2024-05-06 15:27:19 -07:00
Michael Yang
6694be5e50 convert/llama: use WriteSeeker 2024-05-06 15:24:01 -07:00
Michael Yang
f5e8b207fb s/DisplayLongest/String/ 2024-05-06 15:24:01 -07:00
Michael Yang
d245460362 only quantize language models 2024-05-06 15:24:01 -07:00
Michael Yang
4d0d0fa383 no iterator 2024-05-06 15:24:01 -07:00
Michael Yang
7ffe45734d rebase 2024-05-06 15:24:01 -07:00
Michael Yang
01811c176a comments 2024-05-06 15:24:01 -07:00
Michael Yang
a7248f6ea8 update tests 2024-05-06 15:24:01 -07:00
Michael Yang
9685c34509 quantize any fp16/fp32 model
- FROM /path/to/{safetensors,pytorch}
- FROM /path/to/fp{16,32}.bin
- FROM model:fp{16,32}
2024-05-06 15:24:01 -07:00
Jeffrey Chen
d091fe3c21 Windows automatically recognizes username (#3214) 2024-05-06 15:03:14 -07:00
Mohamed A. Fouad
ee02f548c8 Update linux.md (#3847)
Add -e to viewing logs in order to show end of ollama logs
2024-05-06 15:02:25 -07:00
Daniel Hiltgen
b08870aff3 Merge pull request #4188 from dhiltgen/use_our_lib
User our bundled libraries (cuda) instead of the host library
2024-05-06 14:41:05 -07:00
Darinka
3ecae420ac Update api.md (#3945)
* Update api.md

Changed the calculation of tps (token/s) in the documentation

* Update docs/api.md

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-06 14:39:58 -07:00
Daniel Hiltgen
4cbbf0e13b Merge pull request #4090 from dhiltgen/rocm_paths
Support Fedoras standard ROCm location
2024-05-06 14:33:41 -07:00
Daniel Hiltgen
380378cc80 Use our libraries first
Trying to live off the land for cuda libraries was not the right strategy.  We need to use the version we compiled against to ensure things work properly
2024-05-06 14:23:29 -07:00
Daniel Hiltgen
0963c65027 Merge pull request #4208 from dhiltgen/fix_sched_test
Fix stale test logic
2024-05-06 14:23:12 -07:00
Jeffrey Morgan
ed740a2504 Fix no slots available error with concurrent requests (#4160) 2024-05-06 14:22:53 -07:00
Jeffrey Morgan
c9f98622b1 Skip scheduling cancelled requests, always reload unloaded runners (#4189) 2024-05-06 14:22:24 -07:00
Daniel Hiltgen
0a954e5066 Fix stale test logic
The model processing was recently changed to be deferred but
this test scenario hadn't been adjusted for that change in behavior.
2024-05-06 14:15:37 -07:00
Adrien Brault
aa93423fbf docs: pbcopy on mac (#3129) 2024-05-06 13:47:00 -07:00
Nurgo
01c9386267 Add BrainSoup to compatible clients list (#3473) 2024-05-06 13:42:16 -07:00
Daniel Hiltgen
af9eb36f9f Merge pull request #4135 from dhiltgen/no_physx
Skip PhysX cudart library
2024-05-06 13:34:00 -07:00
Daniel Hiltgen
06093fd396 Merge pull request #4067 from dhiltgen/cudart
Add CUDA Driver API for GPU discovery
2024-05-06 13:30:27 -07:00
Tony Loehr
86b7fcac32 Update README.md with StreamDeploy (#3621)
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-05-06 11:14:41 -07:00
Hyden Liu
fb8ddc564e chore: delete HEAD (#4194) 2024-05-06 10:32:30 -07:00
Saif
242efe6611 👌 IMPROVE: add portkey library for production tools (#4119) 2024-05-06 10:25:23 -07:00
Jeffrey Morgan
1b0e6c9c0e Fix llava models not working after first request (#4164)
* fix llava models not working after first request

* individual requests only for llava models
2024-05-05 20:50:31 -07:00
Jeffrey Morgan
dfa2f32ca0 unload in critical section (#4187) 2024-05-05 17:18:27 -07:00
Daniel Hiltgen
840424a2c4 Merge pull request #4154 from dhiltgen/central_config
Centralize server config handling
2024-05-05 17:08:26 -07:00
Daniel Hiltgen
f56aa20014 Centralize server config handling
This moves all the env var reading into one central module
and logs the loaded config once at startup which should
help in troubleshooting user server logs
2024-05-05 16:49:50 -07:00
alwqx
6707768ebd chore: format go code (#4149) 2024-05-05 16:08:09 -07:00
Lord Basil - Automate EVERYTHING
c78bb76a12 update libraries for langchain_community + llama3 changed from llama2 (#4174) 2024-05-05 16:07:04 -07:00
Jeffrey Morgan
942c979232 allocate a large enough kv cache for all parallel requests (#4162) 2024-05-05 15:59:32 -07:00
Bernardo de Oliveira Bruning
06164911dd Update README.md (#4111)
---------

Co-authored-by: Patrick Devine <patrick@infrahq.com>
2024-05-05 14:45:32 -07:00
Patrick Devine
2a21363bb7 validate the format of the digest when getting the model path (#4175) 2024-05-05 11:46:12 -07:00
Daniel Hiltgen
026869915f Merge pull request #4144 from dhiltgen/max_queue
Make maximum pending request configurable
2024-05-05 10:53:44 -07:00
Daniel Hiltgen
45d61aaaa3 Add integration test to push max queue limits 2024-05-05 10:46:25 -07:00
Daniel Hiltgen
20f6c06569 Make maximum pending request configurable
This also bumps up the default to be 50 queued requests
instead of 10.
2024-05-04 21:00:52 -07:00
Daniel Hiltgen
371f5e52aa Merge pull request #4141 from dhiltgen/win_docs
Explain the 2 different windows download options
2024-05-04 12:50:16 -07:00
Daniel Hiltgen
e006480e49 Explain the 2 different windows download options 2024-05-04 12:50:05 -07:00
Michael Yang
aed545872d Merge pull request #4143 from ollama/mxyng/final-response
omit prompt and generate settings from final response
2024-05-03 17:39:49 -07:00
Michael Yang
44869c59d6 omit prompt and generate settings from final response 2024-05-03 17:00:02 -07:00
Daniel Hiltgen
52663284cf Merge pull request #4145 from dhiltgen/fix_lint
Fix lint warnings
2024-05-03 16:53:17 -07:00
Daniel Hiltgen
42fa9d7f0a Fix lint warnings 2024-05-03 16:44:19 -07:00
Michael Yang
b7a87a22b6 Merge pull request #4059 from ollama/mxyng/parser-2
rename parser to model/file
2024-05-03 13:01:22 -07:00
Dr Nic Williams
e8aaea030e Update 'llama2' -> 'llama3' in most places (#4116)
* Update 'llama2' -> 'llama3' in most places

---------

Co-authored-by: Patrick Devine <patrick@infrahq.com>
2024-05-03 15:25:04 -04:00
Daniel Hiltgen
b1ad3a43cb Skip PhysX cudart library
For some reason this library gives incorrect GPU information, so skip it
2024-05-03 11:55:32 -07:00
Daniel Hiltgen
267e25a750 Merge pull request #4129 from dhiltgen/unit_tests
Soften timeouts on sched unit tests
2024-05-03 11:10:26 -07:00
Daniel Hiltgen
9a32c514cb Soften timeouts on sched unit tests
This gives us more headroom on the scheduler tests to tamp
down some flakes.
2024-05-03 09:08:33 -07:00
Michael Yang
e9ae607ece Merge pull request #3892 from ollama/mxyng/parser
refactor modelfile parser
2024-05-02 17:04:47 -07:00
Michael Yang
93707fa3f2 Merge pull request #4108 from ollama/mxyng/lf
fix line ending
2024-05-02 14:55:15 -07:00
Michael Yang
94c369095f fix line ending
replace CRLF with LF
2024-05-02 14:53:13 -07:00
Jeffrey Morgan
9164b0161b Update .gitattributes 2024-05-02 14:06:31 -04:00
Daniel Hiltgen
e592e8fccb Support Fedoras standard ROCm location 2024-05-01 15:47:12 -07:00
Bryce Reitano
bf4fc25f7b Add a /clear command (#3947)
* Add a /clear command

* change help messages

---------

Co-authored-by: Patrick Devine <patrick@infrahq.com>
2024-05-01 17:44:36 -04:00
Michael Yang
5b806d8d24 Merge pull request #4089 from ollama/mxyng/target-invalid
server: destination invalid
2024-05-01 12:46:35 -07:00
Michael Yang
cb1e072643 Merge pull request #4087 from ollama/mxyng/fix-host-port
types/model: fix name for hostport
2024-05-01 12:42:07 -07:00
Michael Yang
45b6a12e45 server: target invalid 2024-05-01 12:40:45 -07:00
alwqx
68755f1f5e chore: fix typo in docs/development.md (#4073) 2024-05-01 15:39:11 -04:00
Michael Yang
997a455039 want filepath 2024-05-01 12:33:41 -07:00
Michael Yang
88775e1ff9 strip scheme from name 2024-05-01 12:26:19 -07:00
Michael Yang
8867e744ff types/model: fix name for hostport 2024-05-01 12:14:53 -07:00
Daniel Hiltgen
4fd064bea6 Merge pull request #4031 from MarkWard0110/fix/issue-3736
Fix/issue 3736: When runners are closing or expiring. Scheduler is getting dirty VRAM size readings.
2024-05-01 12:13:26 -07:00
Jeffrey Morgan
59fbceedcc use lf for line endings (#4085) 2024-05-01 15:02:45 -04:00
Mark Ward
321d57e1a0 Removing go routine calling .wait from load. 2024-05-01 18:51:10 +00:00
Mark Ward
ba26c7aa00 it will always return an error due to Kill() discarding Wait() errors 2024-05-01 18:51:10 +00:00
Mark Ward
63c763685f log when the waiting for the process to stop to help debug when other tasks execute during this wait.
expire timer clear the timer reference because it will not be reused.
close will clean up expireTimer if calling code has not already done this.
2024-05-01 18:51:10 +00:00
Mark Ward
34a4a94f13 ignore debug bin files 2024-05-01 18:51:10 +00:00
Mark Ward
f4a73d57a4 fix runner expire during active use. Clearing the expire timer as it is used. Allowing the finish to assign an expire timer so that the runner will expire after no use. 2024-05-01 18:51:10 +00:00
Mark Ward
948114e3e3 fix sched to wait for the runner to terminate to ensure following vram check will be more accurate 2024-05-01 18:51:10 +00:00
Arpit Jain
a3e60d9058 README.md: fix typos (#4007)
Co-authored-by: Blake Mizerany <blake.mizerany@gmail.com>
2024-05-01 10:39:38 -07:00
Michael Yang
8acb233668 use strings.Builder 2024-05-01 10:01:09 -07:00
Michael Yang
119589fcb3 rename parser to model/file 2024-05-01 09:53:50 -07:00
Michael Yang
5ea844964e cmd: import regexp 2024-05-01 09:53:45 -07:00
Michael Yang
bd8eed57fc fix parser name 2024-05-01 09:52:54 -07:00
Michael Yang
9cf0f2e973 use parser.Format instead of templating modelfile 2024-05-01 09:52:54 -07:00
Michael Yang
176ad3aa6e parser: add commands format 2024-05-01 09:52:54 -07:00
Michael Yang
4d08363580 comments 2024-05-01 09:52:54 -07:00
Michael Yang
8907bf51d2 fix multiline 2024-05-01 09:52:54 -07:00
Michael Yang
abe614c705 tests 2024-05-01 09:52:54 -07:00
Michael Yang
238715037d linting 2024-05-01 09:52:54 -07:00
Michael Yang
c0a00f68ae refactor modelfile parser 2024-05-01 09:52:54 -07:00
Jeffrey Morgan
f0c454ab57 gpu: add 512MiB to darwin minimum, metal doesn't have partial offloading overhead (#4068) 2024-05-01 11:46:03 -04:00
Daniel Hiltgen
089daaeabc Add CUDA Driver API for GPU discovery
We're seeing some corner cases with cudart which might be resolved by
switching to the driver API which comes bundled with the driver package
2024-04-30 18:00:45 -07:00
Blake Mizerany
b9f74ff3d6 types/model: reintroduce Digest (#4065) 2024-04-30 16:38:03 -07:00
jmorganca
fcf4d60eee llm: add back check for empty token cache 2024-04-30 17:38:44 -04:00
jmorganca
e33d5c2dbc update llama.cpp commit to 952d03d 2024-04-30 17:31:20 -04:00
Jeffrey Morgan
18d9a7e1f1 update llama.cpp submodule to f364eb6 (#4060) 2024-04-30 17:25:39 -04:00
Michael
8488388cbd Update README.md 2024-04-30 15:45:56 -04:00
Blake Mizerany
588901f449 types/model: reduce Name.Filepath allocs from 5 to 2 (#4039) 2024-04-30 11:09:19 -07:00
Bruce MacDonald
0a7fdbe533 prompt to display and add local ollama keys to account (#3717)
- return descriptive error messages when unauthorized to create blob or push a model
- display the local public key associated with the request that was denied
2024-04-30 11:02:08 -07:00
Christian Frantzen
5950c176ca Update langchainpy.md (#4037)
Updated the code a bit
2024-04-29 23:19:06 -04:00
Daniel Hiltgen
23d23409a0 Update llama.cpp (#4036)
* Bump llama.cpp to b2761

* Adjust types for bump
2024-04-29 23:18:48 -04:00
Patrick Devine
9009bedf13 better checking for OLLAMA_HOST variable (#3661) 2024-04-29 19:14:07 -04:00
Daniel Hiltgen
d4ac57e240 Merge pull request #4035 from dhiltgen/fix_relative_paths
Fix relative path lookup
2024-04-29 16:08:06 -07:00
Daniel Hiltgen
7b59d1770f Fix relative path lookup 2024-04-29 16:00:08 -07:00
Jeffrey Morgan
95ead8ffba Restart server on failure when running Windows app (#3985)
* app: restart server on failure

* fix linter

* address comments

* refactor log directory creation to be where logs are written

* check all log dir creation errors
2024-04-29 10:07:52 -04:00
Jeffrey Morgan
7aa08a77ca llm: dont cap context window limit to training context window (#3988) 2024-04-29 10:07:30 -04:00
Blake Mizerany
7e432cdfac types/model: remove old comment (#4020) 2024-04-28 20:52:26 -07:00
Jeffrey Morgan
586672f490 fix copying model to itself (#4019) 2024-04-28 23:47:49 -04:00
Daniel Hiltgen
b03408de74 Merge pull request #3972 from hmartinez82/win_arm64
Add support for building on Windows ARM64
2024-04-28 14:52:58 -07:00
Daniel Hiltgen
1e6a28bf5b Merge pull request #4009 from dhiltgen/cpu_concurrency
Fix concurrency for CPU mode
2024-04-28 14:20:27 -07:00
Daniel Hiltgen
d6e3b64582 Fix concurrency for CPU mode
Prior refactoring passes accidentally removed the logic to bypass VRAM
checks for CPU loads.  This adds that back, along with test coverage.

This also fixes loaded map access in the unit test to be behind the mutex which was
likely the cause of various flakes in the tests.
2024-04-28 13:42:39 -07:00
Blake Mizerany
114c932a8e types/model: allow _ as starter character in Name parts (#3991) 2024-04-27 21:24:52 -07:00
Jeffrey Morgan
7f7103de06 mac: update setup command to llama3 (#3986) 2024-04-27 22:52:10 -04:00
Blake Mizerany
c631a9c726 types/model: relax name length constraint from 2 to 1 (#3984) 2024-04-27 17:58:41 -07:00
Blake Mizerany
8fd9e56804 types/structs: drop unused structs package (#3981) 2024-04-27 14:06:11 -07:00
Hernan Martinez
8a65717f55 Do not build AVX runners on ARM64 2024-04-26 23:55:32 -06:00
Hernan Martinez
6d3152a98a Use architecture specific folders in installer script 2024-04-26 23:35:16 -06:00
Hernan Martinez
b438d485f1 Use architecture specific folders in the generate script 2024-04-26 23:34:12 -06:00
Hernan Martinez
204349b17b Use architecture specific folders in the build script 2024-04-26 23:26:03 -06:00
Hernan Martinez
86e67fc4a9 Add import declaration for windows,arm64 to llm.go 2024-04-26 23:23:53 -06:00
Blake Mizerany
2bed62926e types/model: remove Digest (for now) (#3970)
The Digest type needs more thought and is not necessary at the moment.
2024-04-26 21:14:28 -07:00
Jeffrey Morgan
aad8d128a0 also look at cwd as a root for windows runners (#3959) 2024-04-26 19:14:08 -04:00
Daniel Hiltgen
ec1acbb867 Merge pull request #3968 from dhiltgen/win_generate
Fine grain control over windows generate steps
2024-04-26 16:03:38 -07:00
Daniel Hiltgen
e4859c4563 Fine grain control over windows generate steps
This will speed up CI which already tries to only build static for unit tests
2024-04-26 15:49:46 -07:00
Nataly Merezhuk
8e30eb26bd Updates the setup command to use llama3. (#3962) 2024-04-26 18:41:01 -04:00
Daniel Hiltgen
0b5c589ca2 Merge pull request #3966 from dhiltgen/bump
Fix target in gen_windows.ps1
2024-04-26 15:36:53 -07:00
Michael Yang
65fadddc85 Merge pull request #3964 from ollama/mxyng/weights
fix gemma, command-r layer weights
2024-04-26 15:23:33 -07:00
Daniel Hiltgen
ed5fb088c4 Fix target in gen_windows.ps1 2024-04-26 15:10:42 -07:00
Michael Yang
f81f308118 fix gemma, command-r layer weights 2024-04-26 15:00:55 -07:00
Blake Mizerany
b1390a7b37 types/model: export ParseNameBare and Merge (#3957)
These are useful outside this package.
2024-04-26 14:58:07 -07:00
Michael Yang
11d83386a5 Merge pull request #3951 from ollama/mxyng/zip
check file type before zip
2024-04-26 14:51:23 -07:00
Jeffrey Morgan
bb31def011 return code 499 when user cancels request while a model is loading (#3955) 2024-04-26 17:38:29 -04:00
Michael Yang
41e03ede95 check file type before zip 2024-04-26 14:18:07 -07:00
Michael Yang
7fea1ecdf6 Merge pull request #3958 from ollama/mxyng/fix-workflow
use merge base for diff-tree
2024-04-26 14:17:56 -07:00
Blake Mizerany
054894271d .github/workflows/test.yaml: add in-flight cancellations on new push (#3956)
Also, remove a superfluous 'go get'
2024-04-26 13:54:24 -07:00
Michael Yang
6fef042f0b use merge base for diff-tree 2024-04-26 13:54:15 -07:00
Daniel Hiltgen
5c0c2d1d09 Merge pull request #3954 from dhiltgen/ci_fixes
Put back non-avx CPU build for windows
2024-04-26 13:09:03 -07:00
Blake Mizerany
37f9c8ad99 types/model: overhaul Name and Digest types (#3924) 2024-04-26 13:08:32 -07:00
Quinten van Buul
2a80f55e2a Update windows.md (#3855)
Fixed a typo
2024-04-26 16:04:15 -04:00
Daniel Hiltgen
421c878a2d Put back non-avx CPU build for windows 2024-04-26 12:44:07 -07:00
Daniel Hiltgen
36666c2142 Merge pull request #3925 from dhiltgen/bump
Bump llama.cpp to b2737
2024-04-26 10:09:38 -07:00
Daniel Hiltgen
85801317d1 Fix clip log import 2024-04-26 09:43:46 -07:00
Daniel Hiltgen
2ed0d65948 Bump llama.cpp to b2737 2024-04-26 09:43:28 -07:00
Daniel Hiltgen
d459dc4ad1 Merge pull request #3950 from dhiltgen/windows_packaging
Fix exe name for zip packaging on windows
2024-04-26 09:27:37 -07:00
Daniel Hiltgen
40bc4622ef Fix exe name for zip packaging on windows
The zip file encodes the OS and architecture, so keep the short exe name
2024-04-26 09:18:05 -07:00
Daniel Hiltgen
c0f818a07a Merge pull request #3948 from dhiltgen/win_generate
Refactor windows generate for more modular usage
2024-04-26 09:17:20 -07:00
Daniel Hiltgen
8671fdeda6 Refactor windows generate for more modular usage 2024-04-26 08:35:50 -07:00
Daniel Hiltgen
2619850fb4 Merge pull request #3933 from dhiltgen/ci_fixes
Move cuda/rocm dependency gathering into generate script
2024-04-26 07:01:24 -07:00
Daniel Hiltgen
8feb97dc0d Move cuda/rocm dependency gathering into generate script
This will make it simpler for CI to accumulate artifacts from prior steps
2024-04-25 22:38:44 -07:00
Daniel Hiltgen
4e1ff6dcbb Merge pull request #3926 from dhiltgen/ci_fixes
Fix release CI
2024-04-25 17:42:31 -07:00
Daniel Hiltgen
8589d752ac Fix release CI
download-artifact path was being used incorrectly.  It is where to
extract the zip not the files in the zip to extract.  Default is
workspace dir which is what we want, so omit it
2024-04-25 17:27:11 -07:00
Michael Yang
de4ded68b0 Merge pull request #3923 from ollama/mxyng/mem
only count output tensors
2024-04-25 16:34:17 -07:00
Daniel Hiltgen
9b5a3c5991 Merge pull request #3914 from dhiltgen/mac_perf
Improve mac parallel performance
2024-04-25 16:28:31 -07:00
Jeffrey Morgan
00b0699c75 Reload model if num_gpu changes (#3920)
* reload model if `num_gpu` changes

* dont reload on -1

* fix tests
2024-04-25 19:02:40 -04:00
Jeffrey Morgan
993cf8bf55 llm: limit generation to 10x context size to avoid run on generations (#3918)
* llm: limit generation to 10x context size to avoid run on generations

* add comment

* simplify condition statement
2024-04-25 19:02:30 -04:00
Michael Yang
7bb7cb8a60 only count output tensors 2024-04-25 15:24:08 -07:00
Daniel Hiltgen
b123be5b71 Adjust context size for parallelism 2024-04-25 13:58:54 -07:00
jmorganca
ddf5c09a9b use matrix multiplcation kernels in more cases 2024-04-25 13:58:54 -07:00
Roy Yang
5f73c08729 Remove trailing spaces (#3889) 2024-04-25 14:32:26 -04:00
Daniel Hiltgen
f503a848c2 Merge pull request #3895 from brycereitano/shiftloading
Move ggml loading to when attempting to fit
2024-04-25 09:24:08 -07:00
Bryce Reitano
36a6daccab Restructure loading conditional chain 2024-04-24 17:37:03 -06:00
Bryce Reitano
ceb0e26e5e Provide variable ggml for TestLoad 2024-04-24 17:19:55 -06:00
Bryce Reitano
284e02bed0 Move ggml loading to when we attempt fitting 2024-04-24 17:17:24 -06:00
Michael Yang
3450a57d4a Merge pull request #3713 from ollama/mxyng/modelname
update copy handler to use model.Name
2024-04-24 16:00:32 -07:00
Michael Yang
592dae31c8 update copy to use model.Name 2024-04-24 15:54:54 -07:00
Michael Yang
2010cbc5fa Merge pull request #3833 from ollama/mxyng/fix-from
fix: from blob
2024-04-24 15:13:47 -07:00
Michael Yang
ac0801eced only replace if it matches command 2024-04-24 14:49:26 -07:00
Michael Yang
ad66e5b060 split temp zip files 2024-04-24 14:18:01 -07:00
Blake Mizerany
ade4b55520 types/model: make ParseName use default without question (#3886) 2024-04-24 11:52:55 -07:00
Daniel Hiltgen
a6d62e0617 Merge pull request #3882 from dhiltgen/amd_gfx
AMD gfx patch rev is hex
2024-04-24 11:07:49 -07:00
Daniel Hiltgen
6e76348df7 Merge pull request #3834 from dhiltgen/not_found_in_path
Report errors on server lookup instead of path lookup failure
2024-04-24 10:50:48 -07:00
Daniel Hiltgen
0d6687f84c AMD gfx patch rev is hex
Correctly handle gfx90a discovery
2024-04-24 09:43:52 -07:00
Patrick Devine
74d2a9ef9a add OLLAMA_KEEP_ALIVE env variable to FAQ (#3865) 2024-04-23 21:06:51 -07:00
Patrick Devine
14476d48cc fixes for gguf (#3863) 2024-04-23 20:57:20 -07:00
Patrick Devine
ce8ce82567 add mixtral 8x7b model conversion (#3859) 2024-04-23 20:17:04 -07:00
Blake Mizerany
4dc4f1be34 types/model: restrict digest hash part to a minimum of 2 characters (#3858)
This allows users of a valid Digest to know it has a minimum of 2
characters in the hash part for use when sharding.

This is a reasonable restriction as the hash part is a SHA256 hash which
is 64 characters long, which is the common hash used. There is no
anticipation of using a hash with less than 2 characters.

Also, add MustParseDigest.

Also, replace Digest.Type with Digest.Split for getting both the type
and hash parts together, which is most the common case when asking for
either.
2024-04-23 18:24:17 -07:00
Daniel Hiltgen
16b52331a4 Merge pull request #3857 from dhiltgen/mem_escape_valve
Add back memory escape valve
2024-04-23 17:32:24 -07:00
Daniel Hiltgen
5445aaa94e Add back memory escape valve
If we get our predictions wrong, this can be used to
set a lower memory limit as a workaround.  Recent multi-gpu
refactoring accidentally removed it, so this adds it back.
2024-04-23 17:09:02 -07:00
Daniel Hiltgen
2ac3dd6853 Merge pull request #3850 from dhiltgen/windows_packaging
Move nested payloads to installer and zip file on windows
2024-04-23 16:35:20 -07:00
Daniel Hiltgen
d8851cb7a0 Harden sched TestLoad
Give the go routine a moment to deliver the expired event
2024-04-23 16:14:47 -07:00
Daniel Hiltgen
058f6cd2cc Move nested payloads to installer and zip file on windows
Now that the llm runner is an executable and not just a dll, more users are facing
problems with security policy configurations on windows that prevent users
writing to directories and then executing binaries from the same location.
This change removes payloads from the main executable on windows and shifts them
over to be packaged in the installer and discovered based on the executables location.
This also adds a new zip file for people who want to "roll their own" installation model.
2024-04-23 16:14:47 -07:00
Daniel Hiltgen
790cf34d17 Merge pull request #3846 from dhiltgen/missing_runner
Detect and recover if runner removed
2024-04-23 13:14:12 -07:00
Michael
928d844896 adding phi-3 mini to readme
adding phi-3 mini to readme
2024-04-23 13:58:31 -04:00
Daniel Hiltgen
939d6a8606 Make CI lint verbvose 2024-04-23 10:17:42 -07:00
Daniel Hiltgen
58888a74bc Detect and recover if runner removed
Tmp cleaners can nuke the file out from underneath us.  This detects the missing
runner, and re-initializes the payloads.
2024-04-23 10:05:26 -07:00
Daniel Hiltgen
cc5a71e0e3 Merge pull request #3709 from remy415/custom-gpu-defs
Adds support for customizing GPU build flags in llama.cpp
2024-04-23 09:28:34 -07:00
Michael Yang
e83bcf7f9a Merge pull request #3836 from ollama/mxyng/mixtral
fix: mixtral graph
2024-04-23 09:15:10 -07:00
Daniel Hiltgen
5690e5ce99 Merge pull request #3418 from dhiltgen/concurrency
Request and model concurrency
2024-04-23 08:31:38 -07:00
Daniel Hiltgen
f2ea8470e5 Local unicode test case 2024-04-22 19:29:12 -07:00
Daniel Hiltgen
34b9db5afc Request and model concurrency
This change adds support for multiple concurrent requests, as well as
loading multiple models by spawning multiple runners. The default
settings are currently set at 1 concurrent request per model and only 1
loaded model at a time, but these can be adjusted by setting
OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.
2024-04-22 19:29:12 -07:00
Daniel Hiltgen
8711d03df7 Report errors on server lookup instead of path lookup failure 2024-04-22 19:08:47 -07:00
Daniel Hiltgen
ee448deaba Merge pull request #3835 from dhiltgen/harden_llm_override
Trim spaces and quotes from llm lib override
2024-04-22 19:06:54 -07:00
Bruce MacDonald
6e8db04716 tidy community integrations
- move some popular integrations to the top of the lists
2024-04-22 17:29:08 -07:00
Bruce MacDonald
658e60cf73 Revert "stop running model on interactive exit"
This reverts commit fad00a85e5.
2024-04-22 17:23:11 -07:00
Bruce MacDonald
4c78f028f8 Merge branch 'main' of https://github.com/ollama/ollama 2024-04-22 17:22:28 -07:00
Michael Yang
435cc866a3 fix: mixtral graph 2024-04-22 17:19:44 -07:00
Hao Wu
c7d3a558f6 docs: update README to add chat (web UI) for LLM (#3810)
* add chat (web UI) for LLM

I have used chat with llama3 in local successfully and the code is MIT licensed.

* Update README.md

---------

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-04-22 20:19:39 -04:00
Maple Gao
089cdb2877 docs: Update README for Lobe-chat integration. (#3817)
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-04-22 20:18:15 -04:00
Võ Đình Đạt
ea1e9aa36b Update README.md (#3655) 2024-04-22 20:16:55 -04:00
Jonathan Smoley
d0d28ef90d Update README.md with Discord-Ollama project (#3633)
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-04-22 20:14:20 -04:00
Eric Curtin
6654186a7c Add podman-ollama to terminal apps (#3626)
The goal of podman-ollama is to make AI even more boring.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2024-04-22 20:13:23 -04:00
Daniel Hiltgen
aa72281eae Trim spaces and quotes from llm lib override 2024-04-22 17:11:14 -07:00
reid41
74bcbf828f add qa-pilot link (#3612)
* add qa-pilot link

* format the link

* add shell-pilot
2024-04-22 20:10:34 -04:00
Christian Neff
fe39147e64 Add Chatbot UI v2 to Community Integrations (#3503) 2024-04-22 20:09:55 -04:00
Bruce MacDonald
fad00a85e5 stop running model on interactive exit 2024-04-22 16:22:14 -07:00
Jeremy
9c0db4cc83 Update gen_windows.ps1
Fixed improper env references
2024-04-21 16:13:41 -04:00
Cheng
62be2050dd chore: use errors.New to replace fmt.Errorf will much better (#3789) 2024-04-20 22:11:06 -04:00
Blake Mizerany
56f8aa6912 types/model: export IsValidNamePart (#3788) 2024-04-20 18:26:34 -07:00
Sri Siddhaarth
e6f9bfc0e8 Update api.md (#3705) 2024-04-20 15:17:03 -04:00
Jeremy
6f18297b3a Update gen_windows.ps1
Forgot a " on the write-host
2024-04-18 19:47:44 -04:00
Jeremy
15016413de Update gen_windows.ps1
Added OLLAMA_CUSTOM_CUDA_DEFS and OLLAMA_CUSTOM_ROCM_DEFS to customize GPU builds on Windows
2024-04-18 19:27:16 -04:00
Jeremy
440b7190ed Update gen_linux.sh
Added OLLAMA_CUSTOM_CUDA_DEFS and OLLAMA_CUSTOM_ROCM_DEFS instead of OLLAMA_CUSTOM_GPU_DEFS
2024-04-18 19:18:10 -04:00
Daniel Hiltgen
8d1995c625 Merge pull request #3708 from remy415/arm64static
move Ollama static build to its own flag
2024-04-18 16:04:12 -07:00
Daniel Hiltgen
fd01fbf038 Merge pull request #3710 from remy415/update-jetson-docs
update jetson tutorial
2024-04-18 16:02:08 -07:00
Blake Mizerany
0408205c1c types/model: accept former : as a separator in digest (#3724)
This also converges the old sep `:` to the new sep `-`.
2024-04-18 14:17:46 -07:00
Jeffrey Morgan
63a7edd771 Update README.md 2024-04-18 16:09:38 -04:00
Michael
554ffdcce3 add llama3 to readme
add llama3 to readme
2024-04-18 15:18:48 -04:00
ManniX-ITA
c496967e56 Merge branch 'ollama:main' into mannix-server 2024-04-18 18:45:15 +02:00
Jeremy
9850a4ce08 Merge branch 'ollama:main' into update-jetson-docs 2024-04-18 09:55:17 -04:00
Jeremy
3934c15895 Merge branch 'ollama:main' into custom-gpu-defs 2024-04-18 09:55:10 -04:00
Jeremy
fd048f1367 Merge branch 'ollama:main' into arm64static 2024-04-18 09:55:04 -04:00
Michael Yang
8645076a71 Merge pull request #3712 from ollama/mxyng/mem
add stablelm graph calculation
2024-04-17 15:57:51 -07:00
Michael Yang
05e9424824 Merge pull request #3664 from ollama/mxyng/fix-padding-2
fix padding to only return padding
2024-04-17 15:57:40 -07:00
Michael Yang
52ebe67a98 Merge pull request #3714 from ollama/mxyng/model-name-host
types/model: support : in PartHost for host:port
2024-04-17 15:34:03 -07:00
Michael Yang
889b31ab78 types/model: support : in PartHost for host:port 2024-04-17 15:16:07 -07:00
Michael Yang
3cf483fe48 add stablelm graph calculation 2024-04-17 13:57:19 -07:00
Jeremy
8dca03173d Merge remote-tracking branch 'upstream/main' into update-jetson-docs 2024-04-17 16:18:50 -04:00
Jeremy
85bdf14b56 update jetson tutorial 2024-04-17 16:17:42 -04:00
Jeremy
d524e5ef5e Merge branch 'custom-gpu-defs' of https://github.com/remy415/ollama into custom-gpu-defs 2024-04-17 16:01:03 -04:00
Jeremy
52f5370c48 add support for custom gpu build flags for llama.cpp 2024-04-17 16:00:48 -04:00
Jeremy
da8a0c7657 Merge branch 'ollama:main' into arm64static 2024-04-17 15:22:34 -04:00
Jeremy
1b42b4b59a Merge branch 'ollama:main' into custom-gpu-defs 2024-04-17 15:21:56 -04:00
Jeremy
7c000ec3ed adds support for OLLAMA_CUSTOM_GPU_DEFS to customize GPU build flags 2024-04-17 15:21:05 -04:00
jmorganca
c8afe7168c use correct extension for feature and model request issue templates 2024-04-17 15:18:40 -04:00
jmorganca
28d3cd0148 simpler feature and model request forms 2024-04-17 15:17:08 -04:00
jmorganca
eb5554232a simpler feature and model request forms 2024-04-17 15:14:49 -04:00
Jeremy
ea4c284a48 Merge branch 'ollama:main' into arm64static 2024-04-17 15:11:38 -04:00
jmorganca
2bdc320216 add descriptions to issue templates 2024-04-17 15:08:36 -04:00
jmorganca
32561aed09 simplify github issue templates a bit 2024-04-17 15:07:03 -04:00
Michael Yang
71548d9829 Merge pull request #3706 from ollama/mxyng/mem
account for all non-repeating layers
2024-04-17 11:58:20 -07:00
Jeremy
8aec92fa6d rearranged conditional logic for static build, dockerfile updated 2024-04-17 14:43:28 -04:00
Michael Yang
a8b9b930b4 account for all non-repeating layers 2024-04-17 11:21:21 -07:00
Michael
9755cf9173 acknowledge the amazing work done by Georgi and team! 2024-04-17 13:48:14 -04:00
Jeremy
70261b9bb6 move static build to its own flag 2024-04-17 13:04:28 -04:00
ManniX-ITA
c942e4a07b Fixed startup sequence to report model loading 2024-04-17 17:40:32 +02:00
ManniX-ITA
bd54b08261 Streamlined WaitUntilRunning 2024-04-17 17:39:52 +02:00
Blake Mizerany
9df6c85c3a types/model: add FilepathNoBuild (#3680)
Also, add test for DisplayLongest.

Also, plumb fill param to ParseName in MustParseName
2024-04-16 18:35:43 -07:00
Michael Yang
e74163af4c fix padding to only return padding 2024-04-16 15:43:26 -07:00
Michael Yang
fb9580df85 Merge pull request #3684 from ollama/mxyng/scale-graph
scale graph based on gpu count
2024-04-16 14:57:09 -07:00
Michael Yang
26df674785 scale graph based on gpu count 2024-04-16 14:44:13 -07:00
Jeffrey Morgan
7c9792a6e0 Support unicode characters in model path (#3681)
* parse wide argv characters on windows

* cleanup

* move cleanup to end of `main`
2024-04-16 17:00:12 -04:00
Michael Yang
7afb2e125a Merge pull request #3678 from ollama/mxyng/fix-darwin-partial-offloading
darwin: no partial offloading if required memory greater than system
2024-04-16 12:05:56 -07:00
Michael Yang
41a272de9f darwin: no partial offloading if required memory greater than system 2024-04-16 11:22:38 -07:00
Jeffrey Morgan
f335722275 update llama.cpp submodule to 7593639 (#3665) 2024-04-15 23:04:43 -04:00
Michael Yang
6d53b67c2c Merge pull request #3663 from ollama/mxyng/fix-padding 2024-04-15 17:44:54 -07:00
Michael Yang
969238b19e fix padding in decode
TODO: update padding() to _only_ returning the padding
2024-04-15 17:27:06 -07:00
Blake Mizerany
949d7832cf Revert "cmd: provide feedback if OLLAMA_MODELS is set on non-serve command (#3470)" (#3662)
This reverts commit 7d05a6ee8f.

This proved to be more painful than useful.

See: https://github.com/ollama/ollama/issues/3624
2024-04-15 16:58:00 -07:00
Sung Kim
99d227c9db Added Solar example at README.md (#3610)
Added just one line

| Solar              | 10.7B      | 6.1GB | `ollama run solar`             |
2024-04-15 19:54:23 -04:00
Carlos Gamez
a27e419b47 Update langchainjs.md (#2030)
Changed ollama.call() for ollama.invoke() as per deprecated documentation from langchain
2024-04-15 18:37:30 -04:00
Chandre Van Der Westhuizen
e4d0db5a97 Added MindsDB information (#3595)
* Added MindsDB information

Added more details to MindsDB so that Ollama users can know that they can connect their Ollama model with 200+ databases and apps

* updated text for mindsdb
2024-04-15 18:35:29 -04:00
Eli Bendersky
ba460802c2 examples: add more Go examples using the API (#3599)
* examples: go-multimodal

* examples: add go-pull-progress

* examples: add go-chat

* fix
2024-04-15 18:34:54 -04:00
Jeffrey Morgan
e54a3c7fcd Update modelfile.md
Remove Modelfile parameters that are decided at runtime
2024-04-15 15:35:44 -04:00
Patrick Devine
9f8691c6c8 Add llama2 / torch models for ollama create (#3607) 2024-04-15 11:26:42 -07:00
Jeffrey Morgan
a0b8a32eb4 Terminate subprocess if receiving SIGINT or SIGTERM signals while model is loading (#3653)
* terminate subprocess if receiving `SIGINT` or `SIGTERM` signals while model is loading

* use `unload` in signal handler
2024-04-15 12:09:32 -04:00
Jeffrey Morgan
7027f264fb app: gracefully shut down ollama serve on windows (#3641)
* app: gracefully shut down `ollama serve` on windows

* fix linter errors

* bring back `HideWindow`

* remove creation flags

* restore `windows.CREATE_NEW_PROCESS_GROUP`
2024-04-14 18:33:25 -04:00
Blake Mizerany
9bee3b63b1 types/model: add path helpers (#3619)
This commit adds path helpers for working with Names in URL and file
paths. The new helpers are ParseNameFromPath, ParseNameFromFilePath,
Name.Path, and Name.FilePath.

This commit also adds Name.DisplayLongest, and Name.DisplayLong.

Also, be it updates a place where strings.StripPrefix is more consistent
with the surrounding code.

Also, replace Parts with specific methods
2024-04-13 12:59:19 -07:00
Jeffrey Morgan
309aef7fee update llama.cpp submodule to 4bd0f93 (#3627) 2024-04-13 10:43:02 -07:00
Blake Mizerany
08655170aa types/model: make ParseName variants less confusing (#3617)
Also, fix http stripping bug.

Also, improve upon docs about fills and masks.
2024-04-12 13:57:57 -07:00
Blake Mizerany
2b341069a7 types/model: remove (*Digest).Scan and Digest.Value (#3605) 2024-04-11 13:32:31 -07:00
Daniel Hiltgen
c00fee6936 Merge pull request #3604 from dhiltgen/fix_rocm_deps
Fix rocm deps with new subprocess paths
2024-04-11 13:08:29 -07:00
Daniel Hiltgen
c2d813bdc3 Fix rocm deps with new subprocess paths 2024-04-11 12:52:06 -07:00
Michael Yang
786f3a1c44 Merge pull request #3600 from ollama/mxyng/mixtral 2024-04-11 12:23:37 -07:00
Michael Yang
3397eff0cd mixtral mem 2024-04-11 11:10:41 -07:00
Blake Mizerany
0efb7931c7 Revert "types/model: remove (*Digest).Scan and Digest.Value (#3589)"
This reverts commit 42f2cc408e.
2024-04-11 00:45:07 -07:00
Blake Mizerany
42f2cc408e types/model: remove (*Digest).Scan and Digest.Value (#3589) 2024-04-11 00:37:26 -07:00
Blake Mizerany
9446b795b5 types/model: remove DisplayLong (#3587) 2024-04-10 16:55:12 -07:00
Blake Mizerany
62f8cda3b3 types/model: remove MarshalText/UnmarshalText from Digest (#3586) 2024-04-10 16:52:49 -07:00
Blake Mizerany
6a1de23175 types/model: init with Name and Digest types (#3541) 2024-04-10 16:30:05 -07:00
Blake Mizerany
a7b431e743 server: provide helpful workaround hint when stalling on pull (#3584)
This is a quick fix to help users who are stuck on the "pull" step at
99%.

In the near future we're introducing a new registry client that
should/will hopefully be smarter. In the meantime, this should unblock
the users hitting issue #1736.
2024-04-10 16:24:37 -07:00
Michael Yang
5a25f93522 Merge pull request #3478 from ollama/mxyng/tensor-layer
refactor tensor query
2024-04-10 12:45:03 -07:00
Michael Yang
7e33a017c0 partial offloading 2024-04-10 11:37:20 -07:00
Michael Yang
8b2c10061c refactor tensor query 2024-04-10 11:37:20 -07:00
Michael Yang
c5c451ca3b Merge pull request #3579 from ollama/mxyng/fix-ci
fix ci
2024-04-10 11:37:01 -07:00
Michael Yang
2b4ca6cf36 fix ci 2024-04-10 11:35:12 -07:00
Eli Bendersky
ad90b9ab3d api: start adding documentation to package api (#2878)
* api: start adding documentation to package api

Updates #2840

* Fix lint typo report
2024-04-10 13:31:55 -04:00
Eli Bendersky
4340f8eba4 examples: start adding Go examples using api/ (#2879)
We can have the same examples as e.g. https://github.com/ollama/ollama-python/tree/main/examples
here. Using consistent naming and renaming the existing example to have -http-
since it uses direct HTTP requests rather than api/

Updates #2840
2024-04-10 13:26:45 -04:00
Daniel Hiltgen
4c7db6b7e9 Merge pull request #3566 from dhiltgen/more_time
Handle very slow model loads
2024-04-09 16:53:49 -07:00
Michael Yang
c03f0e3c3d Merge pull request #3565 from ollama/mxyng/rope
fix: rope
2024-04-09 16:36:55 -07:00
Daniel Hiltgen
c5ff443b9f Handle very slow model loads
During testing, we're seeing some models take over 3 minutes.
2024-04-09 16:35:10 -07:00
Michael Yang
01114b4526 fix: rope 2024-04-09 16:15:24 -07:00
Blake Mizerany
1524f323a3 Revert "build.go: introduce a friendlier way to build Ollama (#3548)" (#3564) 2024-04-09 15:57:45 -07:00
Blake Mizerany
fccf3eecaa build.go: introduce a friendlier way to build Ollama (#3548)
This commit introduces a more friendly way to build Ollama dependencies
and the binary without abusing `go generate` and removing the
unnecessary extra steps it brings with it.

This script also provides nicer feedback to the user about what is
happening during the build process.

At the end, it prints a helpful message to the user about what to do
next (e.g. run the new local Ollama).
2024-04-09 14:18:47 -07:00
Michael Yang
c77d45d836 Merge pull request #3506 from ollama/mxyng/quantize-redux
cgo quantize
2024-04-09 12:32:53 -07:00
Jeffrey Morgan
5ec12cec6c update llama.cpp submodule to 1b67731 (#3561) 2024-04-09 15:10:17 -04:00
Michael Yang
d9578d2bad Merge pull request #3559 from ollama/mxyng/ci
ci: use go-version-file
2024-04-09 11:03:18 -07:00
Michael Yang
cb8352d6b4 ci: use go-version-file 2024-04-09 09:50:12 -07:00
Alex Mavrogiannis
fc6558f47f Correct directory reference in macapp/README (#3555) 2024-04-09 09:48:46 -04:00
Michael Yang
9502e5661f cgo quantize 2024-04-08 15:31:08 -07:00
Michael Yang
e1c9a2a00f no blob create if already exists 2024-04-08 15:09:48 -07:00
writinwaters
1341ee1b56 Update README.md (#3539)
RAGFlow now supports integration with Ollama.
2024-04-08 10:58:14 -04:00
Jeffrey Morgan
63efa075a0 update generate scripts with new LLAMA_CUDA variable, set HIP_PLATFORM to avoid compiler errors (#3528) 2024-04-07 19:29:51 -04:00
Thomas Vitale
cb03fc9571 Docs: Remove wrong parameter for Chat Completion (#3515)
Fixes gh-3514

Signed-off-by: Thomas Vitale <ThomasVitale@users.noreply.github.com>
2024-04-06 09:08:35 -07:00
Michael Yang
a5ec9cfc0f Merge pull request #3508 from ollama/mxyng/rope 2024-04-05 18:46:06 -07:00
Michael Yang
be517e491c no rope parameters 2024-04-05 18:05:27 -07:00
Michael Yang
fc8e108642 Merge pull request #3496 from ollama/mxyng/cmd-r-graph
add command-r graph estimate
2024-04-05 12:26:21 -07:00
Daniel Hiltgen
c5d5c4a96c Merge pull request #3491 from dhiltgen/context_bust_test
Add test case for context exhaustion
2024-04-04 16:20:20 -07:00
Daniel Hiltgen
dfe330fa1c Merge pull request #3488 from mofanke/fix-windows-dll-compress
fix dll compress in windows building
2024-04-04 16:12:13 -07:00
Michael Yang
01f77ae25d add command-r graph estimate 2024-04-04 14:07:24 -07:00
Daniel Hiltgen
483b81a863 Merge pull request #3494 from dhiltgen/ci_release
Fail fast if mingw missing on windows
2024-04-04 10:15:40 -07:00
Daniel Hiltgen
36bd967722 Fail fast if mingw missing on windows 2024-04-04 09:51:26 -07:00
Jeffrey Morgan
b0e7d35db8 use an older version of the mac os sdk in release (#3484) 2024-04-04 09:48:54 -07:00
Daniel Hiltgen
aeb1fb5192 Add test case for context exhaustion
Confirmed this fails on 0.1.30 with known regression
but passes on main
2024-04-04 07:42:17 -07:00
Daniel Hiltgen
a2e60ebcaf Merge pull request #3490 from dhiltgen/ci_fixes
CI missing archive
2024-04-04 07:24:24 -07:00
Daniel Hiltgen
883ec4d1ef CI missing archive 2024-04-04 07:23:27 -07:00
mofanke
4de0126719 fix dll compress in windows building 2024-04-04 21:27:33 +08:00
Daniel Hiltgen
9768e2dc75 Merge pull request #3481 from dhiltgen/ci_fixes
CI subprocess path fix
2024-04-03 19:29:09 -07:00
Daniel Hiltgen
08600d5bec CI subprocess path fix 2024-04-03 19:12:53 -07:00
Daniel Hiltgen
a624e672d2 Merge pull request #3479 from dhiltgen/ci_fixes
Fix CI release glitches
2024-04-03 18:42:27 -07:00
Daniel Hiltgen
e4a7e5b2ca Fix CI release glitches
The subprocess change moved the build directory
arm64 builds weren't setting cross-compilation flags when building on x86
2024-04-03 16:41:40 -07:00
Michael Yang
a0a15cfd5b Merge pull request #3463 from ollama/mxyng/graph-estimate
update graph size estimate
2024-04-03 14:27:30 -07:00
Michael Yang
12e923e158 update graph size estimate 2024-04-03 13:34:12 -07:00
Jeffrey Morgan
cd135317d2 Fix macOS builds on older SDKs (#3467) 2024-04-03 10:45:54 -07:00
Michael Yang
4f895d633f Merge pull request #3466 from ollama/mxyng/head-kv
default head_kv to 1
2024-04-03 10:41:00 -07:00
Blake Mizerany
7d05a6ee8f cmd: provide feedback if OLLAMA_MODELS is set on non-serve command (#3470)
This also moves the checkServerHeartbeat call out of the "RunE" Cobra
stuff (that's the only word I have for that) to on-site where it's after
the check for OLLAMA_MODELS, which allows the helpful error message to
be printed before the server heartbeat check. This also arguably makes
the code more readable without the magic/superfluous "pre" function
caller.
2024-04-02 22:11:13 -07:00
Daniel Hiltgen
464d817824 Merge pull request #3464 from dhiltgen/subprocess
Fix numgpu opt miscomparison
2024-04-02 20:10:17 -07:00
Pier Francesco Contino
531324a9be feat: add OLLAMA_DEBUG in ollama server help message (#3461)
Co-authored-by: Pier Francesco Contino <pfcontino@gmail.com>
2024-04-02 18:20:03 -07:00
Daniel Hiltgen
6589eb8a8c Revert options as a ref in the server 2024-04-02 16:44:10 -07:00
Michael Yang
90f071c658 default head_kv to 1 2024-04-02 16:37:59 -07:00
Michael Yang
a039e383cd Merge pull request #3465 from ollama/mxyng/fix-metal
fix metal gpu
2024-04-02 16:29:58 -07:00
Michael Yang
80163ebcb5 fix metal gpu 2024-04-02 16:06:45 -07:00
Daniel Hiltgen
a57818d93e Merge pull request #3343 from dhiltgen/bump_more2
Bump llama.cpp to b2581
2024-04-02 15:08:26 -07:00
Daniel Hiltgen
841adda157 Fix windows lint CI flakiness 2024-04-02 12:22:16 -07:00
Daniel Hiltgen
0035e31af8 Bump to b2581 2024-04-02 11:53:07 -07:00
Daniel Hiltgen
c863c6a96d Merge pull request #3218 from dhiltgen/subprocess
Switch back to subprocessing for llama.cpp
2024-04-02 10:49:44 -07:00
Daniel Hiltgen
1f11b52511 Refined min memory from testing 2024-04-01 16:48:33 -07:00
Daniel Hiltgen
526d4eb204 Release gpu discovery library after use
Leaving the cudart library loaded kept ~30m of memory
pinned in the GPU in the main process.  This change ensures
we don't hold GPU resources when idle.
2024-04-01 16:48:33 -07:00
Daniel Hiltgen
0a74cb31d5 Safeguard for noexec
We may have users that run into problems with our current
payload model, so this gives us an escape valve.
2024-04-01 16:48:33 -07:00
Daniel Hiltgen
10ed1b6292 Detect too-old cuda driver
"cudart init failure: 35" isn't particularly helpful in the logs.
2024-04-01 16:48:33 -07:00
Daniel Hiltgen
4fec5816d6 Integration test improvements
Cleaner shutdown logic, a bit of response hardening
2024-04-01 16:48:18 -07:00
Daniel Hiltgen
0a0e9f3e0f Apply 01-cache.diff 2024-04-01 16:48:18 -07:00
Daniel Hiltgen
58d95cc9bd Switch back to subprocessing for llama.cpp
This should resolve a number of memory leak and stability defects by allowing
us to isolate llama.cpp in a separate process and shutdown when idle, and
gracefully restart if it has problems.  This also serves as a first step to be
able to run multiple copies to support multiple models concurrently.
2024-04-01 16:48:18 -07:00
Patrick Devine
3b6a9154dd Simplify model conversion (#3422) 2024-04-01 16:14:53 -07:00
Michael Yang
d6dd2ff839 Merge pull request #3241 from ollama/mxyng/mem
update memory estimations for gpu offloading
2024-04-01 13:59:14 -07:00
Michael Yang
e57a6ba89f Merge pull request #2926 from ollama/mxyng/decode-ggml-v2
refactor model parsing
2024-04-01 13:58:13 -07:00
Michael Yang
12ec2346ef Merge pull request #3442 from ollama/mxyng/generate-output
fix generate output
2024-04-01 13:56:09 -07:00
Michael Yang
1ec0df1069 fix generate output 2024-04-01 13:47:34 -07:00
Michael Yang
91b3e4d282 update memory calcualtions
count each layer independently when deciding gpu offloading
2024-04-01 13:16:32 -07:00
Michael Yang
d338d70492 refactor model parsing 2024-04-01 13:16:15 -07:00
Philipp Gillé
011bb67351 Add chromem-go to community integrations (#3437) 2024-04-01 11:17:37 -04:00
Saifeddine ALOUI
d124627202 Update README.md (#3436) 2024-04-01 11:16:31 -04:00
Jesse Zhang
b0a8246a69 Community Integration: CRAG Ollama Chat (#3423)
Corrective Retrieval Augmented Generation Demo, powered by Langgraph and Streamlit 🤗

Support: 
- Ollama
- OpenAI APIs
2024-04-01 11:16:14 -04:00
Yaroslav
e6fb39c182 Update README.md (#3378)
Plugins list updated
2024-03-31 13:10:05 -04:00
sugarforever
e1f1c374ea Community Integration: ChatOllama (#3400)
* Community Integration: ChatOllama

* fixed typo
2024-03-30 22:46:50 -04:00
Jeffrey Morgan
06a1508bfe Update 90_bug_report.yml 2024-03-29 10:11:17 -04:00
Patrick Devine
5a5efee46b Add gemma safetensors conversion (#3250)
Co-authored-by: Michael Yang <mxyng@pm.me>
2024-03-28 18:54:01 -07:00
Daniel Hiltgen
97ae517fbf Merge pull request #3398 from dhiltgen/release_latest
CI automation for tagging latest images
2024-03-28 16:25:54 -07:00
Daniel Hiltgen
44b813e459 Merge pull request #3377 from dhiltgen/rocm_v6_bump
Bump ROCm to 6.0.2 patch release
2024-03-28 16:07:54 -07:00
Daniel Hiltgen
539043f5e0 CI automation for tagging latest images 2024-03-28 16:07:37 -07:00
Daniel Hiltgen
dbcace6847 Merge pull request #3392 from dhiltgen/ci_build_win_cuda
CI windows gpu builds
2024-03-28 16:03:52 -07:00
Daniel Hiltgen
c91a4ebcff Bump ROCm to 6.0.2 patch release 2024-03-28 15:58:57 -07:00
Daniel Hiltgen
b79c7e4528 CI windows gpu builds
If we're doing generate, test windows cuda and rocm as well
2024-03-28 14:39:10 -07:00
Michael Yang
035b274b70 Merge pull request #3379 from ollama/mxyng/origins
fix: trim quotes on OLLAMA_ORIGINS
2024-03-28 14:14:18 -07:00
Michael Yang
9c6a254945 Merge pull request #3391 from ollama/mxyng-patch-1 2024-03-28 13:15:56 -07:00
Michael Yang
f31f2bedf4 Update troubleshooting link 2024-03-28 12:05:26 -07:00
Michael Yang
756c257553 Merge pull request #3380 from ollama/mxyng/conditional-generate
fix: workflows
2024-03-28 00:35:27 +01:00
Michael Yang
5255d0af8a fix: workflows 2024-03-27 16:30:01 -07:00
Michael Yang
af8a8a6b59 fix: trim quotes on OLLAMA_ORIGINS 2024-03-27 15:24:29 -07:00
Michael Yang
461ad25015 Merge pull request #3376 from ollama/mxyng/conditional-generate
only generate on changes to llm subdirectory
2024-03-27 22:12:53 +01:00
Michael Yang
8838ae787d stub stub 2024-03-27 13:59:12 -07:00
Michael Yang
db75402ade mangle arch 2024-03-27 13:44:50 -07:00
Michael Yang
1e85a140a3 only generate on changes to llm subdirectory 2024-03-27 12:45:35 -07:00
Michael Yang
c363282fdc Merge pull request #3375 from ollama/mxyng/conditional-generate
only generate cuda/rocm when changes to llm detected
2024-03-27 20:40:55 +01:00
Michael Yang
5b0c48d29e only generate cuda/rocm when changes to llm detected 2024-03-27 12:23:09 -07:00
Jeffrey Morgan
913306f4fd Detect arrow keys on windows (#3363)
* detect arrow keys on windows
* add some helpful comments
2024-03-26 18:21:56 -04:00
Jeffrey Morgan
f5ca7f8c8e add license in file header for vendored llama.cpp code (#3351) 2024-03-26 16:23:23 -04:00
Jeffrey Morgan
856b8ec131 remove need for $VSINSTALLDIR since build will fail if ninja cannot be found (#3350) 2024-03-26 16:23:16 -04:00
Patrick Devine
1b272d5bcd change github.com/jmorganca/ollama to github.com/ollama/ollama (#3347) 2024-03-26 13:04:17 -07:00
Christophe Dervieux
29715dbca7 malformed markdown link (#3358) 2024-03-26 10:46:36 -04:00
Daniel Hiltgen
54a028d07f Merge pull request #3356 from dhiltgen/fix_arm_linux
Switch runner for final release job
2024-03-25 20:54:46 -07:00
Daniel Hiltgen
f83e4db365 Switch runner for final release job
The manifest and tagging step use a lot of disk space
2024-03-25 20:51:40 -07:00
Daniel Hiltgen
3b5866a233 Merge pull request #3353 from dhiltgen/fix_arm_linux
Use Rocky Linux Vault to get GCC 10.2 installed
2024-03-25 19:38:56 -07:00
Daniel Hiltgen
b8c2be6142 Use Rocky Linux Vault to get GCC 10.2 installed
This should hopefully only be a temporary workaround until Rocky 8
picks up GCC 10.4 which fixes the NVCC bug
2024-03-25 19:18:50 -07:00
Daniel Hiltgen
e0319bd78d Revert "Switch arm cuda base image to centos 7"
This reverts commit 5dacc1ebe8.
2024-03-25 19:01:11 -07:00
Daniel Hiltgen
b31ed7f031 Merge pull request #3352 from dhiltgen/fix_arm_linux
Switch arm cuda base image to centos 7
2024-03-25 16:13:10 -07:00
Daniel Hiltgen
5dacc1ebe8 Switch arm cuda base image to centos 7
We had started using rocky linux 8, but they've updated to GCC 10.3,
which breaks NVCC.  10.2 is compatible (or 10.4, but that's not
available from rocky linux 8 repos yet)
2024-03-25 15:57:08 -07:00
Daniel Hiltgen
c2712b5566 Merge pull request #3348 from dhiltgen/bump_llamacpp
Bump llama.cpp to b2527
2024-03-25 14:15:53 -07:00
Daniel Hiltgen
8091ef2eeb Bump llama.cpp to b2527 2024-03-25 13:47:44 -07:00
Jeffrey Morgan
f38b705dc7 Fix ROCm link in development.md 2024-03-25 16:32:44 -04:00
Daniel Hiltgen
560be5e0b6 Merge pull request #3308 from dhiltgen/bump_more
Bump llama.cpp to b2510
2024-03-25 12:56:12 -07:00
Daniel Hiltgen
4a1c76b3aa Merge pull request #3331 from dhiltgen/integration_testing
Integration tests conditionally pull
2024-03-25 12:48:51 -07:00
Daniel Hiltgen
28a64e23ca Merge pull request #2279 from remy415/main
Add support for libcudart.so for CUDA devices (Adds Jetson support)
2024-03-25 12:46:28 -07:00
Niclas Pahlfer
92d74e2f59 adds ooo to community integrations (#1623)
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-03-25 15:08:33 -04:00
Herval Freire
6f8f57dd1d Add cliobot to ollama supported list (#1873)
* Update README.md

* Update README.md

---------

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-03-25 15:07:19 -04:00
Chenhe Gu
b2fa68b0ea Add Dify.AI to community integrations (#1944)
Dify.AI is a model-agnostic LLMOps platform for building and managing LLM applications.

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-03-25 15:06:39 -04:00
Marco Antônio
3767d5ef0d enh: add ollero.nvim to community applications (#1905)
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-03-25 15:06:08 -04:00
Ani Betts
9fed85bc8b Add typechat-cli to Terminal apps (#2428)
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-03-25 15:05:04 -04:00
Miguel
4501bc0913 add new Web & Desktop link in readme for alpaca webui (#2881)
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-03-25 15:00:18 -04:00
Danny Avila
57ba519e63 Add LibreChat to Web & Desktop Apps (#2918) 2024-03-25 14:59:18 -04:00
enoch1118
d98d322d24 Add Community Integration: OllamaGUI (#2927)
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-03-25 14:58:28 -04:00
fly2tomato
0c3ec74cf1 Add Community Integration: OpenAOE (#2946)
* Update README.md

* Update README.md

---------

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-03-25 14:57:40 -04:00
tusharhero
42ae8359fa docs: Add AI telegram to Community Integrations. (#3033) 2024-03-25 14:56:42 -04:00
Timothy Carambat
e4b76dfb76 docs: Add AnythingLLM to README as integration option (#3145)
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-03-25 14:54:48 -04:00
Jikku Jose
2c56517494 Add Saddle (#3178) 2024-03-25 14:54:09 -04:00
Yusuf Can Bayrak
cfbc1b152b tlm added to README.md terminal section. (#3274) 2024-03-25 14:53:26 -04:00
RAPID ARCHITECT
9305ac1b2e Update README.md (#3288)
Added Ollama Basic chat based on hyperdiv

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-03-25 14:52:25 -04:00
drazdra
45d6292959 Update README.md (#3338)
adding drazdra/ollama-chats to the list of UI :)
2024-03-25 14:50:51 -04:00
Blake Mizerany
22921a3969 doc: specify ADAPTER is optional (#3333) 2024-03-25 09:43:19 -07:00
Daniel Hiltgen
7b6cbc10ec Integration tests conditionally pull
If images aren't present, pull them.
Also fixes the expected responses
2024-03-25 08:57:45 -07:00
Jeremy
dfc6721b20 add support for libcudart.so for CUDA devices (adds Jetson support) 2024-03-25 11:07:44 -04:00
Blake Mizerany
acfa2b9422 llm: prevent race appending to slice (#3320) 2024-03-24 11:35:54 -07:00
Daniel Hiltgen
2c390a73ac Merge pull request #3282 from dhiltgen/gpu_docs
Add docs for GPU selection and nvidia uvm workaround
2024-03-24 19:15:03 +01:00
Daniel Hiltgen
3e30c75f3e Bump llama.cpp to b2510 2024-03-23 19:55:56 +01:00
Eddú Meléndez Gonzales
7e430ff352 Add Testcontainers into Libraries section (#3291)
Testcontainers provides a module for Ollama.
2024-03-23 19:55:25 +01:00
Daniel Hiltgen
1784113ef5 Merge pull request #3309 from dhiltgen/integration_testing
Revamp go based integration tests
2024-03-23 19:08:49 +01:00
Daniel Hiltgen
949b6c01e0 Revamp go based integration tests
This uplevels the integration tests to run the server which can allow
testing an existing server, or a remote server.
2024-03-23 14:24:18 +01:00
jmorganca
38daf0a252 rename .gitattributes 2024-03-23 12:40:31 +01:00
Daniel Hiltgen
43799532c1 Bump llama.cpp to b2474
The release just before ggml-cuda.cu refactoring
2024-03-23 09:54:56 +01:00
Daniel Hiltgen
d8fdbfd8da Add docs for GPU selection and nvidia uvm workaround 2024-03-21 11:52:54 +01:00
Bruce MacDonald
a5ba0fcf78 doc: faq gpu compatibility (#3142) 2024-03-21 05:21:34 -04:00
Jeffrey Morgan
3a30bf56dc Update faq.md 2024-03-20 17:48:39 +01:00
Daniel Hiltgen
a1c0a48524 Merge pull request #3122 from dhiltgen/better_tmp_cleanup
Better tmpdir cleanup
2024-03-20 16:28:03 +01:00
Daniel Hiltgen
74788b487c Better tmpdir cleanup
If expanding the runners fails, don't leave a corrupt/incomplete payloads dir
We now write a pid file out to the tmpdir, which allows us to scan for stale tmpdirs
and remove this as long as there isn't still a process running.
2024-03-20 16:03:19 +01:00
Jeffrey Morgan
7ed3e94105 Update faq.md 2024-03-18 10:24:39 +01:00
jmorganca
2297ad39da update faq.md 2024-03-18 10:17:59 +01:00
Michael Yang
01cff6136d Merge pull request #3217 from ollama/mxyng/cleanup
remove global
2024-03-18 02:13:30 -07:00
Michael Yang
3c4ad0ecab dyn global 2024-03-18 09:45:45 +01:00
Michael Yang
22f326464e Merge pull request #3083 from ollama/mxyng/refactor-readseeker
refactor readseeker
2024-03-16 12:08:56 -07:00
Jeffrey Morgan
e95ffc7448 llama: remove server static assets (#3174) 2024-03-15 19:24:12 -07:00
Jeffrey Morgan
2dce1ab40b add llm/ext_server directory to linguist-vendored (#3173) 2024-03-15 17:46:46 -07:00
Daniel Hiltgen
f4b31c2d53 Merge pull request #3111 from alitrack/main
Update ollama.iss
2024-03-15 16:46:59 -07:00
Daniel Hiltgen
ab3456207b Merge pull request #3028 from ollama/ci_release
CI release process
2024-03-15 16:40:54 -07:00
Daniel Hiltgen
6ad414f31e Merge pull request #3086 from dhiltgen/import_server
Import server.cpp to retain llava support
2024-03-15 16:10:35 -07:00
Daniel Hiltgen
052b5a3b77 Merge pull request #3171 from dhiltgen/rocm_94x
Add Radeon gfx940-942 GPU support
2024-03-15 15:58:33 -07:00
Daniel Hiltgen
d4c10df2b0 Add Radeon gfx940-942 GPU support 2024-03-15 15:34:58 -07:00
Daniel Hiltgen
540f4af45f Wire up more complete CI for releases
Flesh out our github actions CI so we can build official releaes.
2024-03-15 12:37:36 -07:00
Blake Mizerany
6ce37e4d96 llm,readline: use errors.Is instead of simple == check (#3161)
This fixes some brittle, simple equality checks to use errors.Is. Since
go1.13, errors.Is is the idiomatic way to check for errors.

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-03-15 07:14:12 -07:00
Blake Mizerany
703684a82a server: replace blob prefix separator from ':' to '-' (#3146)
This fixes issues with blob file names that contain ':' characters to be rejected by file systems that do not support them.
2024-03-14 20:18:06 -07:00
Daniel Hiltgen
6459377ae0 Add ROCm support to linux install script (#2966) 2024-03-14 18:00:16 -07:00
Blake Mizerany
8546dd3d72 .github: fix model and feature request yml (#3155) 2024-03-14 15:26:06 -07:00
Blake Mizerany
87100be5e0 .github: add issue templates (#3143) 2024-03-14 15:19:10 -07:00
Michael Yang
e87c780ff9 Merge pull request #3149 from ollama/mxyng/fix-memory-leak
fix: clip memory leak
2024-03-14 13:34:15 -07:00
Michael Yang
291c663865 fix: clip memory leak 2024-03-14 13:12:42 -07:00
Daniel Hiltgen
da20786e3e Merge pull request #3068 from dhiltgen/win_pipe
Use stdin for term discovery on windows
2024-03-14 11:55:19 -07:00
Jeffrey Morgan
5ce997a7b9 Update README.md 2024-03-13 21:12:17 -07:00
Jeffrey Morgan
672ffe9b7d add OLLAMA_KEEP_ALIVE to environment variable docs for ollama serve (#3127) 2024-03-13 14:35:33 -07:00
Patrick Devine
47cfe58af5 Default Keep Alive environment variable (#3094)
---------

Co-authored-by: Chris-AS1 <8493773+Chris-AS1@users.noreply.github.com>
2024-03-13 13:29:40 -07:00
Daniel Hiltgen
c1a81c6fe3 Use stdin for term discovery on windows
When you feed input to the cmd via a pipe it no longer reports a warning
2024-03-13 10:37:31 -07:00
Steven Lee
152ab524c2 Update ollama.iss
add arm64 support
2024-03-13 20:15:45 +08:00
Jeffrey Morgan
e72c567cfd restore locale patch (#3091) 2024-03-12 22:08:13 -07:00
Bruce MacDonald
3e22611200 token repeat limit for prediction requests (#3080) 2024-03-12 22:08:25 -04:00
Daniel Hiltgen
a54d4a28dc Merge pull request #3088 from dhiltgen/rocm_igpu_linux
Fix iGPU detection for linux
2024-03-12 17:20:27 -07:00
Daniel Hiltgen
82b0c7c27e Fix iGPU detection for linux
This fixes a few bugs in the new sysfs discovery logic.  iGPUs are now
correctly identified by their <1G VRAM reported.  the sysfs IDs are off
by one compared to what HIP wants due to the CPU being reported
in amdgpu, but HIP only cares about GPUs.
2024-03-12 16:57:19 -07:00
Patrick Devine
ba7cf7fb66 add more docs on for the modelfile message command (#3087) 2024-03-12 16:41:41 -07:00
Bruce MacDonald
2f804068bd warn when json format is expected but not mentioned in prompt (#3081) 2024-03-12 19:07:11 -04:00
Daniel Hiltgen
85129d3a32 Adapt our build for imported server.cpp 2024-03-12 14:57:15 -07:00
Daniel Hiltgen
9ac6440da3 Import server.cpp as of b2356 2024-03-12 13:58:06 -07:00
Michael Yang
0085297928 refactor readseeker 2024-03-12 12:54:18 -07:00
Daniel Hiltgen
34d00f90b1 Merge pull request #3070 from dhiltgen/visible_devices
Add docs explaining GPU selection env vars
2024-03-12 11:36:46 -07:00
Daniel Hiltgen
b53229a2ed Add docs explaining GPU selection env vars 2024-03-12 11:33:06 -07:00
racerole
53c107e20e chore: fix typo (#3073)
Signed-off-by: racerole <jiangyifeng@outlook.com>
2024-03-12 14:09:22 -04:00
mofanke
51578d8573 fix gpu_info_cuda.c compile warning (#3077) 2024-03-12 14:08:40 -04:00
Jeffrey Morgan
b5fcd9d3aa use -trimpath when building releases (#3069) 2024-03-11 15:58:46 -07:00
Bruce MacDonald
b80661e8c7 relay load model errors to the client (#3065) 2024-03-11 16:48:27 -04:00
Jeffrey Morgan
6d3adfbea2 Update troubleshooting.md 2024-03-11 13:22:28 -07:00
Jeffrey Morgan
369eda65f5 update llama.cpp submodule to ceca1ae (#3064) 2024-03-11 12:57:48 -07:00
Michael Yang
f878e91070 Merge pull request #3044 from ollama/mxyng/fix-convert-shape
convert: fix shape
2024-03-11 09:56:57 -07:00
Daniel Hiltgen
0d651478e4 Merge pull request #3056 from dhiltgen/rocm_link_clash
Avoid rocm runner and dependency clash
2024-03-11 09:48:48 -07:00
Michael Yang
9ea492f1ce convert: fix shape 2024-03-11 09:41:01 -07:00
Daniel Hiltgen
bc13da2bfe Avoid rocm runner and dependency clash
Putting the rocm symlink next to the runners is risky.  This moves
the payloads into a subdir to avoid potential clashes.
2024-03-11 09:33:22 -07:00
Jeffrey Morgan
41b00b9856 fix 03-locale.diff 2024-03-10 16:21:05 -07:00
Daniel Hiltgen
c2a8ed48e7 Merge pull request #3048 from dhiltgen/harden_rocm_deps
Harden for deps file being empty (or short)
2024-03-10 15:17:22 -07:00
Daniel Hiltgen
3dc1bb6a35 Harden for deps file being empty (or short) 2024-03-10 14:45:38 -07:00
Daniel Hiltgen
7865a6996a Merge pull request #3046 from dhiltgen/rocm_search_paths
Add ollama executable peer dir for rocm
2024-03-10 12:30:56 -07:00
Daniel Hiltgen
00ec269321 Add ollama executable peer dir for rocm
This allows people who package up ollama on their own to place
the rocm dependencies in a peer directory to the ollama executable
much like our windows install flow.
2024-03-10 12:16:30 -07:00
Jeffrey Morgan
908005d90b patch: use default locale in wpm tokenizer (#3034) 2024-03-09 21:12:12 -08:00
Jeffrey Morgan
cdf65e793f only copy deps for amd64 in build_linux.sh 2024-03-09 17:55:22 -08:00
Daniel Hiltgen
82ca694d68 Rename ROCm deps file to avoid confusion (#3025) 2024-03-09 17:48:38 -08:00
Jeffrey Morgan
5017a15bcb add macapp to .dockerignore 2024-03-09 16:07:06 -08:00
Jeffrey Morgan
e11668aa07 add bundle_metal and cleanup_metal funtions to gen_darwin.sh 2024-03-09 16:04:57 -08:00
Jeffrey Morgan
0bd0f4a29c tidy cleanup logs 2024-03-09 15:56:48 -08:00
Jeffrey Morgan
1ffb1e2874 update llama.cpp submodule to 77d1ac7 (#3030) 2024-03-09 15:55:34 -08:00
Daniel Hiltgen
0a7844413c Merge pull request #3026 from dhiltgen/win_rocm_docs
Doc how to set up ROCm builds on windows
2024-03-09 14:17:19 -08:00
Jeffrey Morgan
f9cd55c70b disable gpu for certain model architectures and fix divide-by-zero on memory estimation 2024-03-09 12:51:38 -08:00
Daniel Hiltgen
0fdebb34a9 Doc how to set up ROCm builds on windows 2024-03-09 11:29:45 -08:00
Daniel Hiltgen
ac64cd4ef9 Merge pull request #3008 from dhiltgen/no_more_idempotent
Finish unwinding idempotent payload logic
2024-03-09 09:13:24 -08:00
Daniel Hiltgen
4a5c9b8035 Finish unwinding idempotent payload logic
The recent ROCm change partially removed idempotent
payloads, but the ggml-metal.metal file for mac was still
idempotent.  This finishes switching to always extract
the payloads, and now that idempotentcy is gone, the
version directory is no longer useful.
2024-03-09 08:34:39 -08:00
Jeffrey Morgan
efe5617b64 update llama.cpp submodule to c2101a2 (#3020) 2024-03-09 00:44:50 -08:00
Jeffrey Morgan
5b3fad9636 separate out isLocalIP 2024-03-09 00:22:08 -08:00
Jeffrey Morgan
bfec2c6e10 simplify host checks 2024-03-08 23:29:53 -08:00
Jeffrey Morgan
5c143af726 add additional allowed hosts 2024-03-08 23:23:59 -08:00
Jeffrey Morgan
6c0af2599e Update docs README.md and table of contents 2024-03-08 22:45:11 -08:00
Jeffrey Morgan
fc8c044584 add allowed host middleware and remove workDir middleware (#3018) 2024-03-08 22:23:47 -08:00
Michael Yang
ecc133d843 Merge pull request #3014 from ollama/mxyng/decode-ggla 2024-03-08 16:14:53 -08:00
Michael Yang
76bdebbadf decode ggla 2024-03-08 15:46:25 -08:00
Michael Yang
18979ad4a1 convert: fix default shape 2024-03-08 15:42:48 -08:00
Michael Yang
8e0ef931d8 Merge pull request #2990 from ollama/mxyng/default-term-size
fix: default terminal width, height
2024-03-08 15:20:54 -08:00
Daniel Hiltgen
280da44522 Merge pull request #2988 from dhiltgen/rocm_docs
Refined ROCm troubleshooting docs
2024-03-08 13:33:30 -08:00
Bruce MacDonald
0cebc79cba fix: allow importing a model from name reference (#3005) 2024-03-08 12:27:47 -05:00
Jeffrey Morgan
0e4669b04f update llama.cpp submodule to 6cdabe6 (#2999) 2024-03-08 00:26:20 -08:00
Jeffrey Morgan
b886bec3f9 Update api.md 2024-03-07 23:27:51 -08:00
Jeffrey Morgan
fc06205971 Revert "adjust download and upload concurrency based on available bandwidth" (#2995) 2024-03-07 18:10:16 -08:00
Blake Mizerany
2ada81e068 cmd: tighten up env var usage sections (#2962)
Also, document OLLAMA_HOST client semantics per command that honors it.
This looks nicer than having a general puprose environment variable
section in the root usage which was showing up after the "addition help
topics" section outputed by Cobra's default template.

It was decided this was easier to work with than using a custom template
for Cobra right now.
2024-03-07 13:57:07 -08:00
Michael Yang
b1e74d4fda default terminal width, height 2024-03-07 11:35:42 -08:00
Michael Yang
f678f5c5c3 Merge pull request #2991 from ollama/mxyng/fix-ci
fix ci
2024-03-07 11:35:06 -08:00
Michael Yang
2cb74e23fb fix ci 2024-03-07 11:33:49 -08:00
Daniel Hiltgen
69f0227813 Refined ROCm troubleshooting docs 2024-03-07 11:22:37 -08:00
Daniel Hiltgen
3c8df3808b Merge pull request #2885 from dhiltgen/rocm_v6_only
Revamp ROCm support
2024-03-07 10:51:00 -08:00
Michael Yang
7d564835c2 Merge pull request #2985 from ollama/rm-empty-examples
remove empty examples
2024-03-07 10:49:40 -08:00
Michael Yang
72431031d9 no ci test on docs, examples 2024-03-07 10:44:48 -08:00
Michael Yang
6041abb5b2 remove empty examples 2024-03-07 10:40:32 -08:00
Daniel Hiltgen
6c5ccb11f9 Revamp ROCm support
This refines where we extract the LLM libraries to by adding a new
OLLAMA_HOME env var, that defaults to `~/.ollama` The logic was already
idempotenent, so this should speed up startups after the first time a
new release is deployed.  It also cleans up after itself.

We now build only a single ROCm version (latest major) on both windows
and linux.  Given the large size of ROCms tensor files, we split the
dependency out.  It's bundled into the installer on windows, and a
separate download on windows.  The linux install script is now smart and
detects the presence of AMD GPUs and looks to see if rocm v6 is already
present, and if not, then downloads our dependency tar file.

For Linux discovery, we now use sysfs and check each GPU against what
ROCm supports so we can degrade to CPU gracefully instead of having
llama.cpp+rocm assert/crash on us.  For Windows, we now use go's windows
dynamic library loading logic to access the amdhip64.dll APIs to query
the GPU information.
2024-03-07 10:36:50 -08:00
Michael Yang
2e20110e50 Merge pull request #2221 from ollama/mxyng/up-down-ccy
adjust download and upload concurrency based on available bandwidth
2024-03-07 09:27:33 -08:00
Daniel Hiltgen
82ddc3e441 Merge pull request #2964 from dhiltgen/mem_limit_var
Allow setting max vram for workarounds
2024-03-07 09:25:44 -08:00
Jeffrey Morgan
d481fb3cc8 update go to 1.22 in other places (#2975) 2024-03-07 07:39:49 -08:00
DJ Johnson
23ee633252 docs: Add LLM-X to Web Integration section (#2759) 2024-03-07 10:11:53 -05:00
John
23ebe8fe11 fix some typos (#2973)
Signed-off-by: hishope <csqiye@126.com>
2024-03-06 22:50:11 -08:00
Patrick Devine
2c017ca441 Convert Safetensors to an Ollama model (#2824) 2024-03-06 21:01:51 -08:00
Daniel Hiltgen
be330174dd Allow setting max vram for workarounds
Until we get all the memory calculations correct, this can provide
and escape valve for users to workaround out of memory crashes.
2024-03-06 17:15:06 -08:00
Blake Mizerany
0ded7fdc4b cmd: document environment variables for serve command
Updates #2944
2024-03-06 13:48:46 -08:00
Leo
2103a5073c Add Odin Runes, a Feature-Rich Java UI for Ollama, to README (#2440)
* Add Odin Runes to README

Add Odin Runes to README

This commit adds Odin Runes to the "Community Integrations" section of the README. Odin Runes is a Java-based GPT client designed to provide seamless interaction with GPT models, enhancing productivity in prompt engineering and text generation tasks. This addition highlights the integration between Odin Runes and Ollama, offering users the flexibility to leverage large language models locally within their development workflow.

* Update README.md

this commit applies the comments of the reviewer.
2024-03-06 11:57:49 -08:00
Jeffrey Morgan
ce9f7c4674 Update api.md 2024-03-05 13:13:23 -08:00
Anders Rex
e5596c1944 Add NotesOllama to Community Integrations (#2909) 2024-03-04 01:18:10 -08:00
Timothy Graupmann
9bc3fee694 Added community link for Ollama Copilot (#2582)
* Added community link for Ollama Copilot

* Update README.md

---------

Co-authored-by: Michael <mchiang0610@users.noreply.github.com>
2024-03-04 00:40:36 -08:00
Jeffrey Morgan
21347e1ed6 update llama.cpp submodule to c29af7e (#2868) 2024-03-01 15:26:04 -08:00
Jeffrey Morgan
3b4bab3dc5 Fix embeddings load model behavior (#2848) 2024-02-29 17:40:56 -08:00
Daniel Hiltgen
cbd6e3b38e Merge pull request #2838 from dhiltgen/opensuse
Add ollama user to video group
2024-02-29 15:47:56 -08:00
Daniel Hiltgen
b830afa716 Merge pull request #2837 from dhiltgen/podman_image_support
Add env var so podman will map cuda GPUs
2024-02-29 15:47:37 -08:00
Daniel Hiltgen
bd1d8b0d14 Merge pull request #2836 from bmwiedemann/gzip
Omit build date from gzip headers
2024-02-29 15:46:46 -08:00
fred-bf
25c2912120 Add Community Integration: NextChat (#2780) 2024-02-29 12:12:13 -08:00
Michael Yang
0e19476b56 prepend image tags (#2789)
instead of appending image tags, prepend them - this generally produces better results
2024-02-29 11:30:14 -08:00
tylinux
fa2f2b3563 fix: print usedMemory size right (#2827) 2024-02-29 11:11:04 -08:00
Jeffrey Morgan
cbf4970e0f bump submodule to 87c91c07663b707e831c59ec373b5e665ff9d64a (#2828) 2024-02-29 09:42:08 -08:00
Daniel Hiltgen
74468513bd Add ollama user to video group
On OpenSUSE, ollama needs to be a member of the video group
to access the GPU
2024-02-29 08:50:10 -08:00
Daniel Hiltgen
794a916a72 Add env var so podman will map cuda GPUs
Without this env var, podman's GPU logic doesn't map the GPU through
2024-02-29 08:43:08 -08:00
Bernhard M. Wiedemann
76e5d9ec88 Omit build date from gzip headers
See https://reproducible-builds.org/ for why this is good.

This patch was done while working on reproducible builds for openSUSE.
2024-02-29 16:48:19 +01:00
Daniel Hiltgen
076237b8ea Merge pull request #2771 from dhiltgen/toggle_models
Bump llama.cpp to b2276
2024-02-27 11:29:53 -08:00
Daniel Hiltgen
53d694c67f Merge pull request #2772 from dhiltgen/container_image
Refine container image build script
2024-02-27 11:29:08 -08:00
Daniel Hiltgen
5aa6bfea94 Merge pull request #2785 from dhiltgen/win_download
Log unexpected server errors checking for update
2024-02-27 10:43:14 -08:00
Daniel Hiltgen
1cde63dd64 Log unexpected server errors checking for update
This should unmask some failure modes that likely
show up in app logs as unmarshal errors
2024-02-27 09:17:04 -08:00
Daniel Hiltgen
98e0b7e94f Refine container image build script
Allow overriding the platform, image name, and tag latest for
standard and rocm images.
2024-02-26 17:26:49 -08:00
Daniel Hiltgen
061e8f6abc Bump llama.cpp to b2276 2024-02-26 16:49:24 -08:00
peanut256
a189810df6 Determine max VRAM on macOS using recommendedMaxWorkingSetSize (#2354)
* read iogpu.wired_limit_mb on macOS

Fix for https://github.com/ollama/ollama/issues/1826

* improved determination of available vram on macOS

read the recommended maximal vram on macOS via Metal API

* Removed macOS-specific logging

* Remove logging from gpu_darwin.go

* release Core Foundation object

fixes a possible memory leak
2024-02-25 18:16:45 -05:00
Ikko Eltociear Ashimine
e95b896790 Update types.go (#2744)
specfied -> specified
2024-02-25 13:41:25 -05:00
elthommy
1f087c4d26 Update langchain python tutorial (#2737)
Remove unused GPT4all
Use nomic-embed-text as embedded model
Fix a deprecation warning (__call__)
2024-02-25 00:31:36 -05:00
Jeffrey Morgan
5d7ea6616f no extra disk space for windows installation (#2739) 2024-02-25 00:20:35 -05:00
Michael Yang
2a4b128ae3 Merge pull request #2719 from ollama/mxyng/format-private-key
remove format private key
2024-02-23 17:15:14 -08:00
Michael Yang
fc483274ad clean up go.mod 2024-02-23 16:53:36 -08:00
Michael Yang
fd10a2ad4b remove format/openssh.go
this is unnecessary now that x/crypto/ssh.MarshalPrivateKey has been
added
2024-02-23 16:52:23 -08:00
Benn Huang
b291f63188 Add Community Integration: Chatbox
Co-authored-by: bennhuang <bennhuang@tencent.com>
2024-02-23 07:17:28 -05:00
Jeffrey Morgan
f58856bf6f better directory cleanup in ollama.iss 2024-02-23 07:14:59 -05:00
Jeffrey Morgan
275ea01587 restore windows build flags and compression 2024-02-22 18:07:18 -05:00
Jeffrey Morgan
8782dd5628 fix build_windows.ps1 script to run go build with the correct flags 2024-02-22 17:41:43 -05:00
Jeffrey Morgan
11bfff8ee1 update llama.cpp submodule to 96633eeca1265ed03e57230de54032041c58f9cd 2024-02-22 16:44:26 -05:00
Logan Yang
7c0167a8f6 Add copilot for obsidian plugin to community integration (#1918) 2024-02-22 14:17:20 -05:00
LangChain4j
74d898e37d Added LangChain4j links (#1690) 2024-02-22 14:09:08 -05:00
Yuan-Man
c6e8b00718 Add README.md (#2249) 2024-02-22 14:03:44 -05:00
B-Tocs.org Community
be9980ef13 Update README.md - Ollama for SAP ABAP (#2510) 2024-02-22 13:12:27 -05:00
Augustinas Malinauskas
646a0dedb9 Update README.md (#2504)
- Enchanted is now supported for desktop on macOS
2024-02-22 13:09:29 -05:00
Azhar Khan
7f964d938c update README to add Gemma 2B, 7B model in Model Library Table (#2686) 2024-02-22 13:07:47 -05:00
Pavel Frankov
e6b8a139ff Update README.md (#2138) 2024-02-22 10:52:36 -05:00
Jeffrey Morgan
bdc0ea1ba5 Update import.md 2024-02-22 02:08:03 -05:00
Jeffrey Morgan
7fab7918cc Update import.md 2024-02-22 02:06:24 -05:00
Michael Yang
74c1bdba0d Merge pull request #2657 from joshyan1/patch-1
Update install.sh success message
2024-02-21 15:55:20 -08:00
Josh
f983ef7f5f Update install.sh success message 2024-02-21 18:30:01 -05:00
Jeffrey Morgan
1ae1c33651 Windows build + installer adjustments (#2656)
* remove `-w -s` linker flags on windows

* use `zip` for windows installer compression
2024-02-21 18:21:26 -05:00
Michael Yang
084d846621 refactor 2024-02-21 13:42:48 -08:00
Michael Yang
6a4b994433 lint 2024-02-21 13:42:48 -08:00
Michael Yang
bea007deb7 use LimitGroup for uploads 2024-02-21 13:42:48 -08:00
Michael Yang
074934be03 adjust group limit based on download speed 2024-02-21 13:42:48 -08:00
Michael Yang
0de12368a0 add new LimitGroup for dynamic concurrency 2024-02-21 13:42:48 -08:00
Michael Yang
917bd61084 refactor download run 2024-02-21 13:42:46 -08:00
Jeffrey Morgan
efe040f8c0 reset with init_vars ahead of each cpu build in gen_windows.ps1 (#2654) 2024-02-21 16:35:34 -05:00
Jeffrey Morgan
2a7553ce09 update llama.cpp submodule to c14f72d 2024-02-21 09:03:14 -05:00
Sun Bo
10af6070a9 Update big-AGI config file link (#2626)
Co-authored-by: bo.sun <bo.sun@cotticoffee.com>
2024-02-21 01:24:48 -05:00
Jeffrey Morgan
92423b0600 add dist directory in build_windows.ps 2024-02-21 00:05:05 -05:00
Jeffrey Morgan
b3eac61cac update llama.cpp submodule to f0d1fafc029a056cd765bdae58dcaa12312e9879 2024-02-20 22:56:51 -05:00
Jeffrey Morgan
287ba11500 better error message when calling /api/generate or /api/chat with embedding models 2024-02-20 21:53:45 -05:00
Jeffrey Morgan
63861f58cc Support for bert and nomic-bert embedding models 2024-02-20 21:37:29 -05:00
Jeffrey Morgan
f0425d3de9 Update faq.md 2024-02-20 20:44:45 -05:00
Michael Yang
210b65268e replace strings buffer with hasher (#2437)
the buffered value is going into the hasher eventually so write directly
to the hasher instead
2024-02-20 19:07:50 -05:00
Michael Yang
949d7b1c48 add gguf file types (#2532) 2024-02-20 19:06:29 -05:00
Michael Yang
897b213468 use http.DefaultClient (#2530)
default client already handles proxy
2024-02-20 18:34:47 -05:00
Jeffrey Morgan
4613a080e7 update llama.cpp submodule to 66c1968f7 (#2618) 2024-02-20 17:42:31 -05:00
Muhammed Nazeem
ace2cdf1c6 Add Page Assist to the community integrations (#2447) 2024-02-20 14:03:58 -05:00
Nikesh Parajuli
eed92bc19a docs: add Msty app in readme (#1775)
* docs: add Msty app in readme

* docs: update msty url
2024-02-20 14:03:33 -05:00
Michael Edoror
e0a2f46466 Update README.md to include Elixir LangChain Library (#2180)
The Elixir LangChain Library now supports Ollama Chat with this [PR](https://github.com/brainlid/langchain/pull/70)
2024-02-20 14:03:02 -05:00
Taras Tsugrii
01ff2e14db [nit] Remove unused msg local var. (#2511) 2024-02-20 14:02:34 -05:00
BADR
199e79ec0c docs: add tenere to terminal clients (#2329) 2024-02-19 23:13:03 -05:00
Jeffrey Morgan
8125ce4cb6 Update import.md
Add instructions to get public key on windows
2024-02-19 22:48:24 -05:00
Daniel
636d6eea99 Add ShellOracle to community terminal integrations (#1767) 2024-02-19 22:18:05 -05:00
Jeffrey Morgan
df56f1ee5e Update faq.md 2024-02-19 22:16:42 -05:00
Jean-Baptiste Detroyes
0b6c6c9092 feat: add Helm Chart link to Package managers list (#1673) 2024-02-19 22:05:14 -05:00
Jakob Hoeg Mørk
cb60389de7 NextJS web interface for Ollama (#2466) 2024-02-19 21:57:36 -05:00
lulz
ce0c95d097 [fix] /bye and /exit are now treated as prefixes (#2381)
* [fix] /bye and /exit are now treated as prefixes
instead of being treated as entire lines which doesn't align with the way the rest of the commands are treated

* Update cmd/interactive.go

Fixing whitespace

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-02-19 21:56:49 -05:00
Eddú Meléndez Gonzales
a9bc1e1c37 Add LangChain4J (#2164) 2024-02-19 21:17:32 -05:00
Branislav Gerazov
62c71f4cb1 add ollama-chat.nvim (#2188) 2024-02-19 21:14:29 -05:00
Jeffrey Morgan
41aca5c2d0 Update faq.md 2024-02-19 21:11:01 -05:00
Jeffrey Morgan
753724d867 Update api.md to include examples for reproducible outputs 2024-02-19 20:36:16 -05:00
Jeffrey Morgan
e4576c2ee1 Update README.md 2024-02-19 20:15:24 -05:00
Patrick Devine
9a7a4b9533 add faqs for memory pre-loading and the keep_alive setting (#2601) 2024-02-19 14:45:25 -08:00
Daniel Hiltgen
2653191222 Merge pull request #2600 from dhiltgen/refined_win_docs
Document setting server vars for windows
2024-02-19 13:46:37 -08:00
Daniel Hiltgen
b338c0635f Document setting server vars for windows 2024-02-19 13:30:46 -08:00
Daniel Hiltgen
4fcbf1cde6 Merge pull request #2599 from dhiltgen/fix_avx
Explicitly disable AVX2 on GPU builds
2024-02-19 13:13:05 -08:00
Daniel Hiltgen
9220b4fa91 Merge pull request #2585 from dhiltgen/cuda_leaks
Fix cuda leaks
2024-02-19 12:48:00 -08:00
Daniel Hiltgen
fc39a6cd7a Fix cuda leaks
This should resolve the problem where we don't fully unload from the GPU
when we go idle.
2024-02-18 18:37:20 -08:00
Justin Hayes
1e23e82324 Update Web UI link to new project name (#2563)
Ollama WebUI is now known as Open WebUI.
2024-02-17 20:05:20 -08:00
Daniel Hiltgen
f9fd08040b Merge pull request #2552 from dhiltgen/dup_update_menus
Fix duplicate menus on update and exit on signals
2024-02-16 17:23:37 -08:00
Daniel Hiltgen
4318e35ee3 Merge pull request #2553 from dhiltgen/amdgpu_version
Harden AMD driver lookup logic
2024-02-16 17:23:12 -08:00
Daniel Hiltgen
9754c6d9d8 Harden AMD driver lookup logic
It looks like the version file doesnt exist on older(?) drivers
2024-02-16 16:20:16 -08:00
Daniel Hiltgen
a497235a55 Fix view logs menu 2024-02-16 15:42:53 -08:00
Daniel Hiltgen
df6dc4fd96 Fix duplicate menus on update and exit on signals
Also fixes a few fit-and-finish items for better developer experience
2024-02-16 15:33:16 -08:00
Bruce MacDonald
88622847c6 fix: chat system prompting overrides (#2542) 2024-02-16 14:42:43 -05:00
Tristan Rhodes
9774663013 Update faq.md with the location of models on Windows (#2545) 2024-02-16 11:04:19 -08:00
Daniel Hiltgen
a468ae0459 Merge pull request #2499 from ollama/windows-preview
Windows Preview
2024-02-15 16:06:32 -08:00
Daniel Hiltgen
c3e62ba38a Merge pull request #2516 from dhiltgen/single_tray_app
Fix a couple duplicate instance bugs
2024-02-15 15:52:43 -08:00
Daniel Hiltgen
117369aa73 Exit if we detect another copy of Ollama running 2024-02-15 14:58:29 -08:00
Daniel Hiltgen
1ba734de67 typo 2024-02-15 14:56:55 -08:00
Daniel Hiltgen
5208cf09b1 clean up some logging 2024-02-15 14:56:55 -08:00
Daniel Hiltgen
bb9de6037c Prevent multiple installers running concurrently 2024-02-15 14:56:55 -08:00
Daniel Hiltgen
272e53a1f5 Prepare to distribute standalone windows executable
This will be useful for our automated test riggig, and may be useful for
advanced users who want to "roll their own" system service
2024-02-15 14:56:55 -08:00
Daniel Hiltgen
db2a9ad1fe Explicitly disable AVX2 on GPU builds
Even though we weren't setting it to on, somewhere in the cmake config
it was getting toggled on.  By explicitly setting it to off, we get `/arch:AVX`
as intended.
2024-02-15 14:50:11 -08:00
Daniel Hiltgen
c9ab1aead3 Merge pull request #2526 from dhiltgen/harden_for_quotes
Harden the OLLAMA_HOST lookup for quotes
2024-02-15 14:13:40 -08:00
Daniel Hiltgen
4a10e7a7fa Harden the OLLAMA_HOST lookup for quotes 2024-02-15 13:46:56 -08:00
Michael Yang
86808f80a8 remove unused import 2024-02-15 12:09:11 -08:00
Michael Yang
4240b045e6 always enable view logs 2024-02-15 12:08:27 -08:00
Michael Yang
e547378893 disable default debug 2024-02-15 12:05:13 -08:00
Michael Yang
fd77dbec4d do not print update request headers 2024-02-15 11:36:35 -08:00
Michael
fefb3e77d1 Update README.md 2024-02-15 10:32:40 -08:00
Jeffrey Morgan
ed5489a96e higher resolution tray icons 2024-02-14 22:55:03 -08:00
jmorganca
76113742cf update installer title 2024-02-15 05:56:45 +00:00
Jeffrey Morgan
57e60c836f better windows app and tray icons 2024-02-15 05:56:45 +00:00
jmorganca
622b1f3e67 update installer and app.exe metadata 2024-02-15 05:56:45 +00:00
jmorganca
7ad9844ac0 set exe metadata using resource files 2024-02-15 05:56:45 +00:00
Michael Yang
e43648afe5 rerefactor 2024-02-15 05:56:45 +00:00
Daniel Hiltgen
823a520266 Fix lint error on ignored error for win console 2024-02-15 05:56:45 +00:00
vinjn
66ef308abd Import "containerd/console" lib to support colorful output in Windows terminal 2024-02-15 05:56:45 +00:00
Daniel Hiltgen
29e90cc13b Implement new Go based Desktop app
This focuses on Windows first, but coudl be used for Mac
and possibly linux in the future.
2024-02-15 05:56:45 +00:00
Daniel Hiltgen
f397e0e988 Move hub auth out to new package 2024-02-15 05:56:45 +00:00
Daniel Hiltgen
9da9e8fb72 Move Mac App to a new dir 2024-02-15 05:56:45 +00:00
Patrick Devine
42e77e2a69 handle race condition while setting raw mode in windows (#2509) 2024-02-14 21:28:35 -08:00
Jeffrey Morgan
9241a29336 Revert "Revert "bump submodule to 6c00a06 (#2479)"" (#2485)
This reverts commit 6920964b87.
2024-02-13 18:18:41 -08:00
Jeffrey Morgan
f7231ad9ad set shutting_down to false once shutdown is complete (#2484) 2024-02-13 17:48:41 -08:00
Jeffrey Morgan
6920964b87 Revert "bump submodule to 6c00a06 (#2479)"
This reverts commit 2f9ed52bbd.
2024-02-13 17:23:05 -08:00
Jeffrey Morgan
2f9ed52bbd bump submodule to 6c00a06 (#2479) 2024-02-13 17:12:42 -08:00
bnorick
caf2b13c10 Fix infinite keep_alive (#2480) 2024-02-13 15:40:32 -08:00
lebrunel
1d263449ff Update README.md to include link to Ollama-ex Elixir library (#2477) 2024-02-13 11:40:44 -08:00
Jeffrey Morgan
48a273f80b Fix issues with templating prompt in chat mode (#2460) 2024-02-12 15:06:57 -08:00
Daniel Hiltgen
939c60473f Merge pull request #2422 from dhiltgen/better_kill
More robust shutdown
2024-02-12 14:05:06 -08:00
Jeffrey Morgan
f76ca04f9e update submodule to 099afc6 (#2468) 2024-02-12 14:01:16 -08:00
Daniel Hiltgen
76b8728f0c Merge pull request #2465 from dhiltgen/block_rocm_pre_9
Detect AMD GPU info via sysfs and block old cards
2024-02-12 12:41:43 -08:00
Jeffrey Morgan
1f9078d6ae Check image filetype in api handlers (#2467) 2024-02-12 11:16:20 -08:00
Daniel Hiltgen
6d84f07505 Detect AMD GPU info via sysfs and block old cards
This wires up some new logic to start using sysfs to discover AMD GPU
information and detects old cards we can't yet support so we can fallback to CPU mode.
2024-02-12 08:19:41 -08:00
Jeffrey Morgan
26b13fc33c patch: always add token to cache_tokens (#2459) 2024-02-12 08:10:16 -08:00
Jeffrey Morgan
1c8435ffa9 Update domain name references in docs and install script (#2435) 2024-02-09 15:19:30 -08:00
Daniel Hiltgen
6680761596 Shutdown faster
Make sure that when a shutdown signal comes, we shutdown quickly instead
of waiting for a potentially long exchange to wrap up.
2024-02-08 22:22:50 -08:00
Jeffrey Morgan
42b797ed9c Update openai.md 2024-02-08 15:03:23 -05:00
Jeffrey Morgan
336aa43f3c Update openai.md 2024-02-08 12:48:28 -05:00
Daniel Hiltgen
69f392c9b7 Merge pull request #2403 from dhiltgen/handle_tmp_cleanup
Ensure the libraries are present
2024-02-07 17:55:31 -08:00
Daniel Hiltgen
a1dfab43b9 Ensure the libraries are present
When we store our libraries in a temp dir, a reaper might clean
them when we are idle, so make sure to check for them before
we reload.
2024-02-07 17:27:49 -08:00
Jeffrey Morgan
a0a199b108 Fix hanging issue when sending empty content (#2399) 2024-02-07 19:30:33 -05:00
Jeffrey Morgan
ab0d37fde4 Update openai.md 2024-02-07 17:25:33 -05:00
Jeffrey Morgan
14e71350c8 Update openai.md 2024-02-07 17:25:24 -05:00
Jeffrey Morgan
453f572f83 Initial OpenAI /v1/chat/completions API compatibility (#2376) 2024-02-07 17:24:29 -05:00
Daniel Hiltgen
c9dfa6e571 Merge pull request #2377 from dhiltgen/bump_llamacpp
Bump llama.cpp to b2081
2024-02-07 12:04:38 -08:00
Michael Yang
3dcbcd367d Merge pull request #2394 from ollama/mxyng/fix-error-response 2024-02-07 11:47:31 -08:00
Michael Yang
e805ac1d59 fix response on token error 2024-02-07 11:05:49 -08:00
Michael Yang
b9229ffca5 Merge pull request #2378 from ollama/mxyng/runners
runners
2024-02-06 13:49:58 -08:00
Michael Yang
46c847c4ad enable rocm builds 2024-02-06 13:36:13 -08:00
Michael Yang
92b1a21f79 use linux runners 2024-02-06 13:36:04 -08:00
Daniel Hiltgen
de76b95dd4 Bump llama.cpp to b2081 2024-02-06 12:06:43 -08:00
Michael Yang
59ec837ef6 Merge pull request #2374 from ollama/mxyng/rocm-builds
disable rocm builds
2024-02-06 09:41:02 -08:00
Michael Yang
f06b99a461 disable rocm builds 2024-02-06 09:29:42 -08:00
Bruce MacDonald
128fce5495 docs: keep_alive (#2258) 2024-02-06 11:00:05 -05:00
Daniel Hiltgen
27aa2d4a19 Merge pull request #1849 from mraiser/main
Accomodate split cuda lib dir
2024-02-05 16:01:16 -08:00
Jeffrey Morgan
b9f91a0b36 Update import instructions to use convert and quantize tooling from llama.cpp submodule (#2247) 2024-02-05 00:50:44 -05:00
Erik S
b538dc3858 Add llm-ollama plugin for Datasette's LLM CLI to README (#2340)
Co-authored-by: Erik Sp <git@aschwa.com>
2024-02-03 15:40:50 -08:00
Jeffrey Morgan
f0e9496c85 Update api.md 2024-02-02 12:17:24 -08:00
Jeffrey Morgan
09a6f76f4c fix error on ollama run with a non-existent model 2024-02-01 23:11:52 -08:00
Jeffrey Morgan
e135167484 Add multimodel support to ollama run in noninteractive mopde (#2317) 2024-02-01 21:33:06 -08:00
Jeffrey Morgan
38296ab352 clear previous images when submitting an image to ollama run (#2316) 2024-02-01 21:30:26 -08:00
Daniel Hiltgen
f43dea68d1 Merge pull request #2318 from dhiltgen/more_clean
Harden generate patching model
2024-02-01 20:41:29 -08:00
Daniel Hiltgen
e1f50377f4 Harden generate patching model
Only apply patches if we have any, and make sure to cleanup
every file we patched at the end to leave the tree clean
2024-02-01 19:34:36 -08:00
Jeffrey Morgan
7913104527 Improvements to ollama run for multimodal models (#2300) 2024-02-01 17:09:51 -08:00
Michael Yang
bfbf2f7cf7 Merge pull request #2296 from ollama/mxyng/img-tags
append image tags to user content
2024-02-01 13:16:59 -08:00
Michael Yang
fe3cbd014f Merge pull request #2298 from ollama/mxyng/debug-prompt
structured debug prompt
2024-02-01 13:16:49 -08:00
Michael Yang
3d6f48507a structured debug prompt 2024-02-01 11:56:28 -08:00
Michael Yang
f3761405c8 use image id 2024-02-01 11:52:42 -08:00
Michael Yang
e49dc9f3d8 fix tests 2024-02-01 11:48:11 -08:00
Michael Yang
d125510b4b remove image tags 2024-02-01 11:32:51 -08:00
Russell Canfield
1ca386aa9e Feature - Add Wingman Extension (#2313) 2024-02-01 11:16:24 -08:00
Michael Yang
fb56988014 account for image projection in token count 2024-02-01 09:50:48 -08:00
Michael Yang
d046bee790 use llm.ImageData for chat 2024-01-31 19:18:25 -08:00
Jeffrey Morgan
f11bf0740b use llm.ImageData 2024-01-31 19:13:48 -08:00
Michael Yang
8450bf66e6 trim images 2024-01-31 19:13:47 -08:00
Michael Yang
b4e11be8ef append image tags to user content 2024-01-31 19:13:10 -08:00
Bruce MacDonald
a896079705 preserve last system message from modelfile (#2289) 2024-01-31 21:45:01 -05:00
Michael Yang
583950c828 Merge pull request #2294 from ollama/mxyng/slog-source
update slog handler options
2024-01-31 15:29:11 -08:00
Michael Yang
8ac08a0eec update slog handler options
- consistent format by using text handler for debug and non-debug
- truncate source file to just the file name
2024-01-31 15:15:00 -08:00
Michael Yang
60f47be64c Merge pull request #2284 from ollama/mxyng/parse-raw
remove unnecessary parse raw
2024-01-31 09:40:48 -08:00
Daniel Hiltgen
6e56077ada Merge pull request #2263 from dhiltgen/bump_llamacpp
Bump llama.cpp to b1999
2024-01-31 08:39:41 -08:00
Hoang Nguyen
98ae9467bb Added MindMac to Community Integrations -> Web & Desktop section (#1957) 2024-01-31 07:48:37 -08:00
Richard Macarthy
b7a24af083 Add twinny vscode extension to Extensions and Plugins (#1950) 2024-01-31 06:25:06 -08:00
Michael Yang
c8b1f2369e remove unnecessary parse raw 2024-01-30 17:00:53 -08:00
Daniel Hiltgen
72b12c3be7 Bump llama.cpp to b1999
This requires an upstream change to support graceful termination,
carried as a patch.
2024-01-30 16:52:12 -08:00
Bruce MacDonald
0632dff3f8 trim chat prompt based on llm context size (#1963) 2024-01-30 15:59:29 -05:00
Maximilian Weber
509e2dec8a Update README.md (#2252)
Added - [Ollama for R - rollama](https://github.com/JBGruber/rollama) in Libraries in README.md
2024-01-30 11:56:51 -08:00
Daniel Hiltgen
78a48de804 Merge pull request #2256 from dhiltgen/container_logs
Add container hints for troubleshooting
2024-01-30 08:12:48 -08:00
Daniel Hiltgen
e7dbb00331 Add container hints for troubleshooting
Some users are new to containers and unsure where the server logs go
2024-01-29 08:53:41 -08:00
Marc Raiser
c3f9538636 remove default.nix 2024-01-29 00:05:07 -05:00
Jeffrey Morgan
2e06ed01d5 remove unknown CPPFLAGS option 2024-01-28 17:51:23 -08:00
Daniel Hiltgen
4072b5879b Merge pull request #2246 from dhiltgen/reject_cuda_without_avx
Don't disable GPUs on arm without AVX
2024-01-28 16:26:55 -08:00
Daniel Hiltgen
15562e887d Don't disable GPUs on arm without AVX
AVX is an x86 feature, so ARM should be excluded from
the check.
2024-01-28 15:22:38 -08:00
Jeffrey Morgan
f2245c7c77 print prompt with OLLAMA_DEBUG=1 (#2245) 2024-01-28 15:22:35 -08:00
Jeffrey Morgan
e4b9b72f2a Do not repeat system prompt for chat templating (#2241) 2024-01-28 14:15:56 -08:00
Daniel Hiltgen
311f8e0c3f Merge pull request #2243 from dhiltgen/harden_zero_gpus
Harden for zero detected GPUs
2024-01-28 13:30:44 -08:00
Daniel Hiltgen
f07f8b7a9e Harden for zero detected GPUs
At least with the ROCm libraries, its possible to have the library
present with zero GPUs.  This fix avoids a divide by zero bug in llm.go
when we try to calculate GPU memory with zero GPUs.
2024-01-28 13:13:10 -08:00
mraiser
4c4c730a0a Merge branch 'ollama:main' into main 2024-01-27 21:56:11 -05:00
Daniel Hiltgen
e02ecfb6c8 Merge pull request #2116 from dhiltgen/cc_50_80
Add support for CUDA 5.0 cards
2024-01-27 10:28:38 -08:00
Daniel Hiltgen
c8059b4dcf Merge pull request #2224 from jaglinux/fix_rocm_get_version_message
ROCm: Correct the response string in rocm_get_version function
2024-01-27 07:29:32 -08:00
Jagadish Krishnamoorthy
59d87127f5 Update gpu_info_rocm.c 2024-01-26 22:08:27 -08:00
Patrick Devine
b5cf31b460 add keep_alive to generate/chat/embedding api endpoints (#2146) 2024-01-26 14:28:02 -08:00
Daniel Hiltgen
cc4915e262 Merge pull request #2214 from dhiltgen/reject_cuda_without_avx
Detect lack of AVX and fallback to CPU mode
2024-01-26 12:06:44 -08:00
Daniel Hiltgen
667a2ba18a Detect lack of AVX and fallback to CPU mode
We build the GPU libraries with AVX enabled to ensure that if not all
layers fit on the GPU we get better performance in a mixed mode.
If the user is using a virtualization/emulation system that lacks AVX
this used to result in an illegal instruction error and crash before this
fix.  Now we will report a warning in the server log, and just use
CPU mode to ensure we don't crash.
2024-01-26 11:36:03 -08:00
Michael Yang
e054ebe059 Merge pull request #2212 from ollama/mxyng/fix-build
fix build
2024-01-26 11:19:08 -08:00
Michael Yang
9d3dcfd0ec fix logging 2024-01-26 11:04:27 -08:00
Michael Yang
6e0ea5ecc8 Merge pull request #1916 from ollama/mxyng/inactivity-monitor
download: add inactivity monitor
2024-01-26 10:56:00 -08:00
Daniel Hiltgen
a47d8b2557 Merge pull request #2197 from dhiltgen/remove_rocm_image
Add back ROCm container support
2024-01-26 09:34:23 -08:00
Daniel Hiltgen
30c43c285c Merge pull request #2195 from dhiltgen/rocm_real_gpus
Ignore AMD integrated GPUs
2024-01-26 09:30:24 -08:00
Daniel Hiltgen
23a7ea593b Merge pull request #2209 from dhiltgen/harden_mgmt
Fix crash on cuda ml init failure
2024-01-26 09:30:13 -08:00
Daniel Hiltgen
75c44aa319 Add back ROCm container support
This adds ROCm support back as a discrete image.
2024-01-26 09:24:29 -08:00
Daniel Hiltgen
9d7b5d6c91 Ignore AMD integrated GPUs
Detect and ignore integrated GPUs reported by rocm.
2024-01-26 09:21:35 -08:00
Daniel Hiltgen
5d9c4a5f5a Fix crash on cuda ml init failure
The new driver lookup code was triggering after init failure due to a missing return
2024-01-26 09:18:33 -08:00
Daniel Hiltgen
197e420a97 Merge pull request #2196 from dhiltgen/remove_rocm_image
Switch back to ubuntu base
2024-01-25 16:50:32 -08:00
Daniel Hiltgen
a34e1ad3cf Switch back to ubuntu base
The size increase for rocm support in the standard image is problematic
We'll revisit multiple tags for rocm support in a follow up PR.
2024-01-25 16:46:01 -08:00
Michael Yang
2ae0556292 Merge pull request #1679 from ollama/mxyng/build-gpus
build cuda and rocm
2024-01-25 16:38:14 -08:00
Jeffrey Morgan
5be9bdd444 Update modelfile.md 2024-01-25 16:29:48 -08:00
Jeffrey Morgan
b706794905 Update modelfile.md to include MESSAGE 2024-01-25 16:29:32 -08:00
Michael Yang
a8c5413d06 only generate gpu libs 2024-01-25 15:41:56 -08:00
Michael Yang
5580de4571 archive ollama binaries 2024-01-25 15:40:16 -08:00
Michael Yang
946431d5b0 build cuda and rocm 2024-01-25 15:40:15 -08:00
Michael Yang
0610126049 remove env setting 2024-01-25 15:39:43 -08:00
Jeffrey Morgan
3ebd6a83fc update submodule to cd4fddb29f81d6a1f6d51a0c016bc6b486d68def 2024-01-25 13:54:11 -08:00
Jeffrey Morgan
a64570dcae Fix clearing kv cache between requests with the same prompt (#2186)
* Fix clearing kv cache between requests with the same prompt

* fix powershell script
2024-01-25 13:46:20 -08:00
Patrick Devine
7c40a67841 Save and load sessions (#2063) 2024-01-25 12:12:36 -08:00
Michael Yang
e64b5b07a2 Merge pull request #2181 from ollama/mxyng/stub-lint
stub generate outputs for lint
2024-01-25 11:55:15 -08:00
Michael Yang
9e1e295cdc Merge pull request #2175 from ollama/mxyng/refactor-tensor-read
refactor tensor read
2024-01-25 09:22:42 -08:00
Marc Raiser
6eb3cddcb6 To build on NixOS: nix-shell --run 'go generate ./... && go build .' 2024-01-25 10:17:22 -05:00
mraiser
a4564232a4 Update gen_linux.sh to find libcudart in separate directory 2024-01-25 09:49:35 -05:00
Jeffrey Morgan
a643823f86 Update README.md 2024-01-24 21:36:56 -08:00
Michael Yang
8e5d359a03 stub generate outputs for lint 2024-01-24 17:36:10 -08:00
Daniel Hiltgen
a170888dd4 Merge pull request #2174 from dhiltgen/rocm_real_gpus
More logging for gpu management
2024-01-24 11:09:17 -08:00
Michael Yang
cd22855ef8 refactor tensor read 2024-01-24 10:48:31 -08:00
Daniel Hiltgen
013fd07139 More logging for gpu management
Fix an ordering glitch of dlerr/dlclose and add more logging to help
root cause some crashes users are hitting. This also refines the
function pointer names to use the underlying function names instead
of simplified names for readability.
2024-01-24 10:32:36 -08:00
Daniel Hiltgen
f63dc2db5c Merge pull request #2162 from dhiltgen/rocm_real_gpus
Report more information about GPUs in verbose mode
2024-01-23 17:45:40 -08:00
Jeffrey Morgan
eaa5a396d9 Update README.md 2024-01-23 16:08:15 -08:00
Jeffrey Morgan
8ed22f5d72 Update README.md 2024-01-23 14:38:01 -08:00
Daniel Hiltgen
987c16b2f7 Report more information about GPUs in verbose mode
This adds additional calls to both CUDA and ROCm management libraries to
discover additional attributes about the GPU(s) detected in the system, and
wires up runtime verbosity selection.  When users hit problems with GPUs we can
ask them to run with `OLLAMA_DEBUG=1 ollama serve` and share the results.
2024-01-23 11:37:02 -08:00
Jeffrey Morgan
950f636d64 Update README.md 2024-01-23 10:29:10 -08:00
Jeffrey Morgan
4458efb73a Load all layers on arm64 macOS if model is small enough (#2149) 2024-01-22 17:40:06 -08:00
Daniel Hiltgen
ceea599494 Merge pull request #2150 from dhiltgen/default_version
Set a default version using git describe
2024-01-22 17:38:27 -08:00
Daniel Hiltgen
3005ec74b3 Set a default version using git describe
If a VERSION is not specified, this will generate a version string that
represents the state of the repo.  For example `0.1.21-12-gffaf52e-dirty`
representing 12 commits away from 0.1.21 tag, on commit gffaf52e
and the tree is dirty.
2024-01-22 17:12:20 -08:00
Daniel Hiltgen
0759d8996e Merge pull request #2148 from dhiltgen/intel_mac
Refine Accelerate usage on mac
2024-01-22 16:56:58 -08:00
Daniel Hiltgen
0f5b843319 Refine Accelerate usage on mac
For old macs, accelerate seems to cause crashes, but for
AVX2 capable macs, it does not.
2024-01-22 16:25:56 -08:00
Jeffrey Morgan
ffaf52e1e9 update submodule to 011e8ec577fd135cbc02993d3ea9840c516d6a1c 2024-01-22 15:16:54 -08:00
Michael Yang
940b10b036 Merge pull request #2144 from jmorganca/mxyng/update-faq
faq: update to use launchctl setenv
2024-01-22 13:46:57 -08:00
Daniel Hiltgen
3bc28736cd Merge pull request #2143 from dhiltgen/llm_verbosity
Refine debug logging for llm
2024-01-22 13:19:16 -08:00
Michael Yang
93a756266c faq: update to use launchctl setenv 2024-01-22 13:10:13 -08:00
Daniel Hiltgen
a0a829bf7a Merge pull request #2142 from dhiltgen/debug_on_fail
Debug logging on init failure
2024-01-22 12:29:22 -08:00
Daniel Hiltgen
730dcfcc7a Refine debug logging for llm
This wires up logging in llama.cpp to always go to stderr, and also
turns up logging if OLLAMA_DEBUG is set.
2024-01-22 12:26:49 -08:00
Daniel Hiltgen
27a2d5af54 Debug logging on init failure 2024-01-22 12:08:22 -08:00
Jeffrey Morgan
5f81a33f43 update submodule to 6f9939d (#2115) 2024-01-22 11:56:40 -08:00
Michael Yang
6225fde046 Merge pull request #2102 from jmorganca/mxyng/fix-create-override
fix: remove overwritten model layers
2024-01-22 09:37:48 -08:00
Meng Zhuo
069184562b readline: drop not use min function (#2134) 2024-01-22 08:15:08 -08:00
Daniel Hiltgen
5576bb2348 Merge pull request #2130 from dhiltgen/more_faster
Make CPU builds parallel and customizable AMD GPUs
2024-01-21 16:14:12 -08:00
Daniel Hiltgen
2738837786 Merge pull request #2131 from dhiltgen/probe_cards_at_init
Probe GPUs before backend init
2024-01-21 16:13:47 -08:00
Daniel Hiltgen
ec3764538d Probe GPUs before backend init
Detect potential error scenarios so we can fallback to CPU mode without
hitting asserts.
2024-01-21 15:59:38 -08:00
Daniel Hiltgen
df54c723ae Make CPU builds parallel and customizable AMD GPUs
The linux build now support parallel CPU builds to speed things up.
This also exposes AMD GPU targets as an optional setting for advaced
users who want to alter our default set.
2024-01-21 15:12:21 -08:00
Daniel Hiltgen
fa8c990e58 Merge pull request #2127 from dhiltgen/rocm_container
Combine the 2 Dockerfiles and add ROCm
2024-01-21 11:49:01 -08:00
Daniel Hiltgen
da72235ebf Combine the 2 Dockerfiles and add ROCm
This renames Dockerfile.build to Dockerfile, and adds some new stages
to support 2 modes of building - the build_linux.sh script uses
intermediate stages to extract the artifacts for ./dist, and the default
build generates a container image usable by both cuda and rocm cards.
This required transitioniing the x86 base to the rocm image to avoid
layer bloat.
2024-01-21 11:37:11 -08:00
Jeffrey Morgan
89c4aee29e Unlock mutex when failing to load model (#2117) 2024-01-20 20:54:46 -05:00
Daniel Hiltgen
a447a083f2 Add compute capability 5.0, 7.5, and 8.0 2024-01-20 14:24:05 -08:00
Jeffrey Morgan
f32ea81b21 increase minimum overhead to 1024MiB (#2114) 2024-01-20 17:11:38 -05:00
Daniel Hiltgen
681a914990 Add support for CUDA 5.2 cards 2024-01-20 10:48:43 -08:00
Jeffrey Morgan
4c54f0ddeb sign dylibs on macOS (#2101) 2024-01-19 19:24:11 -05:00
Michael Yang
c08dfaa23d fix: remove overwritten model layers
if create overrides a manifest, first add the older manifest's layers to
the delete map so they can be cleaned up
2024-01-19 14:58:37 -08:00
Daniel Hiltgen
3b76e736ae Merge pull request #2100 from dhiltgen/more_wsl_globs
More WSL paths
2024-01-19 13:41:08 -08:00
Daniel Hiltgen
552db98bf1 More WSL paths 2024-01-19 13:23:29 -08:00
Daniel Hiltgen
fdcdfef620 Merge pull request #2099 from dhiltgen/fix_cuda_model_swap
Switch to local dlopen symbols
2024-01-19 12:22:04 -08:00
Daniel Hiltgen
6a042438af Switch to local dlopen symbols 2024-01-19 11:37:02 -08:00
Jeffrey Morgan
dc88cc3981 use gzip for runner embedding (#2067) 2024-01-19 13:23:03 -05:00
Daniel Hiltgen
62976087c6 Merge pull request #1999 from lainedfles/termux_android_cpu_only
Fix CPU-only build under Android Termux enviornment.
2024-01-18 17:16:53 -08:00
Self Denial
344342abdf Restore dyn_ext_server.c since RTLD_DEEPBIND has been removed 2024-01-18 17:30:42 -07:00
Self Denial
eb76f3e379 Fix CPU-only build under Android Termux enviornment.
Update gpu.go initGPUHandles() to declare gpuHandles variable before
reading it. This resolves an "invalid memory address or nil pointer
dereference" error.

Update dyn_ext_server.c to avoid setting the RTLD_DEEPBIND flag under
__TERMUX__ (Android).
2024-01-18 17:25:33 -07:00
Michael Yang
d017e3d0a6 Merge pull request #2060 from jmorganca/mxyng/fix-show
fix show handler
2024-01-18 16:02:27 -08:00
Michael Yang
aac9ab4db7 fix show handler 2024-01-18 15:36:50 -08:00
Michael Yang
1f5b7ff976 Merge pull request #1932 from jmorganca/mxyng/api-fields
api: add model for all requests
2024-01-18 14:56:51 -08:00
Michael Yang
e299831e2c Merge pull request #1958 from purificant/ci
ci: update setup-go action
2024-01-18 14:53:36 -08:00
Michael Yang
745b5934fa add model to ModelResponse 2024-01-18 14:32:55 -08:00
Michael Yang
a38d88d828 api: add model for all requests
prefer using req.Model and fallback to req.Name
2024-01-18 14:31:37 -08:00
Daniel Hiltgen
abec7f06e5 Merge pull request #2056 from dhiltgen/slog
Mechanical switch from log to slog
2024-01-18 14:27:24 -08:00
Michael Yang
e5da190bac Merge pull request #2020 from jmorganca/mxyng/install-fedora
install: pin fedora to max 37
2024-01-18 14:23:42 -08:00
Daniel Hiltgen
ecbfc0182f Go bump to v1.21 to pick up slog 2024-01-18 14:12:57 -08:00
Daniel Hiltgen
fedd705aea Mechanical switch from log to slog
A few obvious levels were adjusted, but generally everything mapped to "info" level.
2024-01-18 14:12:57 -08:00
Mike Bird
82ee019bfc add open interpreter to list of extensions (#2016) 2024-01-18 13:59:39 -08:00
Sachin Sachdeva
ad9dbc2a04 Haystack Ollama Integration (#2021)
Updated readme with the web link for haystack ollama integration
2024-01-18 13:38:32 -08:00
Daniel Hiltgen
fccdf4c635 Merge pull request #1987 from xyproto/archlinux
Let gpu.go and gen_linux.sh also find CUDA on Arch Linux
2024-01-18 13:32:10 -08:00
Daniel Hiltgen
d450fb1d1e Merge pull request #2055 from dhiltgen/cuda_docs
Refine the linux cuda/rocm developer docs
2024-01-18 12:07:31 -08:00
Daniel Hiltgen
df40b11d03 Merge pull request #2007 from dhiltgen/cpu_fallback
Add multiple CPU variants for Intel Mac
2024-01-18 11:32:29 -08:00
Daniel Hiltgen
9cd20b0ec8 Refine the linux cuda/rocm developer docs 2024-01-18 09:44:44 -08:00
Daniel Hiltgen
b992bf65fc Disable arm64 for test phase
The runners are x86 so we can only run binaries that match.
2024-01-17 19:26:13 -08:00
Daniel Hiltgen
1b249748ab Add multiple CPU variants for Intel Mac
This also refines the build process for the ext_server build.
2024-01-17 15:08:54 -08:00
Alexander F. Rødseth
cbe2adc78a Merge branch 'main' into archlinux 2024-01-17 12:50:11 +01:00
Michael Yang
d5a7353357 Merge pull request #2026 from jmorganca/mxyng/fix-windows
fix: normalize name path before splitting
2024-01-16 16:58:42 -08:00
Michael Yang
96cfb62641 fix: normalize name path before splitting 2024-01-16 16:48:29 -08:00
Daniel Hiltgen
7d00b5d110 Merge pull request #1915 from dhiltgen/bump_llama_with_new_dep
Bump llama.cpp to b1842 and add new cuda lib dep
2024-01-16 13:36:49 -08:00
Daniel Hiltgen
795674dd90 Bump llama.cpp to b1842 and add new cuda lib dep
Upstream llama.cpp has added a new dependency with the
NVIDIA CUDA Driver Libraries (libcuda.so) which is part of the
driver distribution, not the general cuda libraries, and is not
available as an archive, so we can not statically link it.  This may
introduce some additional compatibility challenges which we'll
need to keep an eye on.
2024-01-16 12:53:52 -08:00
Daniel Hiltgen
e282bdccdd Merge pull request #1990 from dhiltgen/ci_mac_cross
Add macos cross-compile CI coverage
2024-01-16 12:31:37 -08:00
Michael Yang
d9bfb2f08f install: pin fedora to max 37
repos for fedora 38 and newer do not exist as of this commit

```
$ dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/fedora38/x86_64/cuda-fedora38.repo
Adding repo from: https://developer.download.nvidia.com/compute/cuda/repos/fedora38/x86_64/cuda-fedora38.repo
Status code: 404 for https://developer.download.nvidia.com/compute/cuda/repos/fedora38/x86_64/cuda-fedora38.repo (IP: 152.195.19.142)
Error: Configuration of repo failed
```
2024-01-16 11:45:21 -08:00
Michael Yang
598d6d5572 Merge pull request #1937 from jmorganca/mxyng/remove-client-py
remove client.py
2024-01-16 11:01:41 -08:00
Bruce MacDonald
a897e833b8 do not cache prompt (#2018)
- prompt cache causes inferance to hang after some time
2024-01-16 13:48:05 -05:00
Patrick Devine
eef50accb4 Fix show parameters (#2017) 2024-01-16 10:34:44 -08:00
Michael Yang
05d53de7a1 Merge pull request #1968 from jmorganca/mxyng/fix-request-retry
fix: request retry with error
2024-01-16 10:33:50 -08:00
Daniel Hiltgen
8795447dad Merge pull request #1966 from fpreiss/fpreiss/gen_linux_cuda_detection
improve cuda detection (rel. issue #1704)
2024-01-14 18:00:11 -08:00
Daniel Hiltgen
b3035112a1 Add macos cross-compile CI coverage 2024-01-14 10:38:59 -08:00
Daniel Hiltgen
95ad9a9fc8 Merge pull request #1988 from dhiltgen/fix_intel_mac
Fix typo in arm mac arch script
2024-01-14 08:45:18 -08:00
Daniel Hiltgen
3ca5f69ce8 Fix typo in arm mac arch script 2024-01-14 08:32:57 -08:00
Daniel Hiltgen
cfa6337960 Merge pull request #1982 from dhiltgen/fix_intel_mac
Fix intel mac build
2024-01-14 08:26:46 -08:00
Alexander F. Rødseth
f4bf1d514f Let gpu.go and gen_linux.sh also find CUDA on Arch Linux 2024-01-14 13:40:36 +01:00
Jeffrey Morgan
557110d0ba Disable mmap with lora layers (#1985) 2024-01-13 23:36:31 -05:00
Daniel Hiltgen
2ecb247276 Fix intel mac build
Make sure we're building an x86 ext_server lib when cross-compiling
2024-01-13 14:46:34 -08:00
Jeffrey Morgan
288ef8ff95 add gcc -lstdc++ flag for linux cpu (#1974) 2024-01-13 03:53:00 -05:00
Jeffrey Morgan
4cf17990f7 use g++ to build libext_server.so on linux (#1972) 2024-01-13 03:12:42 -05:00
Michael Yang
27331ae3a8 download: add inactivity monitor
if a download part is inactive for some time, restart it
2024-01-12 15:23:15 -08:00
Michael Yang
b6c0ef1e70 Merge pull request #1961 from jmorganca/mxyng/rm-double-newline
remove double newlines in /set parameter
2024-01-12 15:18:19 -08:00
Michael Yang
356d178f6e Merge pull request #1971 from jmorganca/mxyng/max-context-length
add max context length check
2024-01-12 15:10:25 -08:00
Michael Yang
eaed6f8c45 add max context length check 2024-01-12 14:54:07 -08:00
purificant
6a5bfc2ed6 update actions/setup-go 2024-01-12 22:27:25 +00:00
Michael Yang
cf29bd2d72 fix: request retry with error
this fixes a subtle bug with makeRequestWithRetry where an HTTP status
error on a retried request will potentially not return the right err
2024-01-12 13:32:27 -08:00
Fabian Preiss
905862e17b improve cuda detection (rel. issue #1704) 2024-01-12 21:59:19 +01:00
Patrick Devine
565f8a3c44 Convert the REPL to use /api/chat for interactive responses (#1936) 2024-01-12 12:05:52 -08:00
Michael Yang
5121b7ac9c remove double newlines in /set parameter 2024-01-12 11:21:15 -08:00
Michael Yang
a70262c6b2 Update README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-01-12 09:43:04 -08:00
Tristram Oaten
40a0a90a88 Add group delete to uninstall instructions (#1924)
After executing the `userdel ollama` command, I saw this message:

```sh
$ sudo userdel ollama
userdel: group ollama not removed because it has other members.
```

Which reminded me that I had to remove the dangling group too. For completeness, the uninstall instructions should do this too.

Thanks!
2024-01-12 00:07:00 -05:00
Michael Yang
cbe20c4375 update readme 2024-01-11 16:24:37 -08:00
Michael Yang
5ffbbea1d7 remove client.py 2024-01-11 15:53:10 -08:00
Daniel Hiltgen
3773fb6465 Merge pull request #1935 from dhiltgen/cpu_fallback
Fix up the CPU fallback selection
2024-01-11 15:52:32 -08:00
Daniel Hiltgen
7427fa1387 Fix up the CPU fallback selection
The memory changes and multi-variant change had some merge
glitches I missed.  This fixes them so we actually get the cpu llm lib
and best variant for the given system.
2024-01-11 15:27:06 -08:00
Michael Yang
f84537e0e0 Merge pull request #1934 from jmorganca/mxyng/fix-slices
fix build and lint
2024-01-11 14:36:20 -08:00
Michael Yang
d2be6387c9 fix typo 2024-01-11 14:25:21 -08:00
Michael Yang
d7af35d3d0 import fmt 2024-01-11 14:22:32 -08:00
Michael Yang
defc1dbd6e use x/exp/slices 2024-01-11 14:20:13 -08:00
Daniel Hiltgen
de2fbdec99 Merge pull request #1819 from dhiltgen/multi_variant
Support multiple LLM libs; ROCm v5 and v6; Rosetta, AVX, and AVX2 compatible CPU builds
2024-01-11 14:00:48 -08:00
Eduard van Valkenburg
f5faf79aa1 Add semantic kernel to Readme (#1931) 2024-01-11 14:40:23 -05:00
Michael Yang
f4f939de28 Merge pull request #1552 from jmorganca/mxyng/lint-test
add lint and test on pull_request
2024-01-11 09:37:45 -08:00
Daniel Hiltgen
39928a42e8 Always dynamically load the llm server library
This switches darwin to dynamic loading, and refactors the code now that no
static linking of the library is used on any platform
2024-01-11 08:42:47 -08:00
Daniel Hiltgen
d88c527be3 Build multiple CPU variants and pick the best
This reduces the built-in linux version to not use any vector extensions
which enables the resulting builds to run under Rosetta on MacOS in
Docker.  Then at runtime it checks for the actual CPU vector
extensions and loads the best CPU library available
2024-01-11 08:42:47 -08:00
Fabian Preiß
3bc8b9832b fix gpu_test.go Error (same type) uint64->uint32 (#1921) 2024-01-11 08:22:23 -05:00
Jeffrey Morgan
ab6be852c7 revisit memory allocation to account for full kv cache on main gpu 2024-01-11 01:45:31 -05:00
Daniel Hiltgen
052b33b81b DRY out the Dockefile.build 2024-01-10 17:27:51 -08:00
Daniel Hiltgen
8da7bef05f Support multiple variants for a given llm lib type
In some cases we may want multiple variants for a given GPU type or CPU.
This adds logic to have an optional Variant which we can use to select
an optimal library, but also allows us to try multiple variants in case
some fail to load.

This can be useful for scenarios such as ROCm v5 vs v6 incompatibility
or potentially CPU features.
2024-01-10 17:27:51 -08:00
Jeffrey Morgan
b24e8d17b2 Increase minimum CUDA memory allocation overhead and fix minimum overhead for multi-gpu (#1896)
* increase minimum cuda overhead and fix minimum overhead for multi-gpu

* fix multi gpu overhead

* limit overhead to 10% of all gpus

* better wording

* allocate fixed amount before layers

* fixed only includes graph alloc
2024-01-10 19:08:51 -05:00
Jeffrey Morgan
f83881390f revert submodule back to 328b83de23b33240e28f4e74900d1d06726f5eb1 2024-01-10 18:42:39 -05:00
Daniel Hiltgen
ac70ab6761 Merge pull request #1914 from dhiltgen/smarter_cuda_detection
Smarter GPU Management library detection
2024-01-10 15:21:56 -08:00
Daniel Hiltgen
3c49c3ab0d Harden GPU mgmt library lookup
When there are multiple management libraries installed on a system
not every one will be compatible with the current driver.  This change
improves our management library algorithm to build up a set of discovered
libraries based on glob patterns, and then try all of them until we're able to
load one without error.
2024-01-10 15:06:41 -08:00
Daniel Hiltgen
9754ae4c89 Support optional override of the target archictures
This can help speed up incremental builds when you're only testing one
archicture, like amd64.  E.g.
BUILD_ARCH=amd64 ./scripts/build_linux.sh && scp ./dist/ollama-linux-amd64 test-system:
2024-01-10 14:43:24 -08:00
Jeffrey Morgan
224fbf2795 update submodule to commit 1fc2f265ff9377a37fd2c61eae9cd813a3491bea until its main branch is fixed 2024-01-10 17:03:15 -05:00
Jeffrey Morgan
2c6e8f5248 Update submodule to 6efb8eb30e7025b168f3fda3ff83b9b386428ad6 (#1885)
* update submodule to `6efb8eb30e7025b168f3fda3ff83b9b386428ad6`
* unblock condition variable in `update_slots` when closing server
2024-01-10 16:48:38 -05:00
Jeffrey Morgan
34344d801c clean up cmake build directory when cross compiling macOS builds 2024-01-09 17:13:56 -05:00
Robin Glauser
e868c8a5c7 Update api.md (#1878)
Fixed assistant in the example response.
2024-01-09 16:21:17 -05:00
Jeffrey Morgan
c336693f07 calculate overhead based number of gpu devices (#1875) 2024-01-09 15:53:33 -05:00
Daniel Hiltgen
e89dc1d54b Merge pull request #1874 from dhiltgen/correct_cuda_min
Set corret CUDA minimum compute capability version
2024-01-09 11:37:22 -08:00
Daniel Hiltgen
1961a81f03 Set corret CUDA minimum compute capability version
If you attempt to run the current CUDA build on compute capability 5.2
cards, you'll hit the following failure:
cuBLAS error 15 at ggml-cuda.cu:7956: the requested functionality is not supported
2024-01-09 11:28:24 -08:00
Jeffrey Morgan
8a8c7e7f8d only build for metal on arm64 2024-01-09 13:51:08 -05:00
Jeffrey Morgan
6df83e6daa update rough cuda overhead estimate to 15% + 384MiB 2024-01-09 13:51:08 -05:00
Michael Yang
f921e2696e typo 2024-01-09 09:45:42 -08:00
Michael Yang
4a33cede20 remove unused fields and functions 2024-01-09 09:37:40 -08:00
Michael Yang
f95d2f25f3 fix temporary history file permissions 2024-01-09 09:36:58 -08:00
Michael Yang
2b9892a808 fix(windows): modelpath and list 2024-01-09 09:36:58 -08:00
Michael Yang
2bb2bdd5d4 fix lint 2024-01-09 09:36:58 -08:00
Michael Yang
acfc376efd add .golangci.yaml 2024-01-09 09:36:58 -08:00
Michael Yang
997253143f add lint and test on pull_request 2024-01-09 09:36:58 -08:00
Michael Yang
62023177f6 Merge pull request #1614 from jmorganca/mxyng/fix-set-template
fix: set template without triple quotes
2024-01-09 09:36:24 -08:00
Jeffrey Morgan
6164f378f2 revert cuda overhead to 20% 2024-01-09 00:54:29 -05:00
Jeffrey Morgan
f387e9631b use runner if cuda alloc won't fit 2024-01-09 00:44:34 -05:00
Jeffrey Morgan
6566387ae3 add TODO for cuda overhead 2024-01-09 00:28:03 -05:00
Jeffrey Morgan
37708931fb update cuda overhead to 20% to fix crashes when switching between models and large context sizes 2024-01-09 00:05:23 -05:00
Jeffrey Morgan
f6cb0a553c update cuda overhead to 15% or 400MiB 2024-01-08 23:45:45 -05:00
Jeffrey Morgan
2680078c13 fix build on linux 2024-01-08 23:44:13 -05:00
Jeffrey Morgan
f1b7e5f560 update overhead to 15% 2024-01-08 23:37:45 -05:00
Jeffrey Morgan
cb534e6ac2 use 10% vram overhead for cuda 2024-01-08 23:17:44 -05:00
Jeffrey Morgan
58ce2d8273 better estimate scratch buffer size 2024-01-08 21:32:44 -05:00
Jeffrey Morgan
18ddf6d57d fix windows build 2024-01-08 20:04:01 -05:00
Michael Yang
61e6502449 Merge pull request #1818 from jmorganca/mxyng/fix-alt-prompt
fix(cmd): history in alt prompt
2024-01-08 13:48:34 -08:00
Jeffrey Morgan
08f1e18965 Offload layers to GPU based on new model size estimates (#1850)
* select layers based on estimated model memory usage

* always account for scratch vram

* dont load +1 layers

* better estmation for graph alloc

* Update gpu/gpu_darwin.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update llm/llm.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update llm/llm.go

* add overhead for cuda memory

* Update llm/llm.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* fix build error on linux

* address comments

---------

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-01-08 16:42:00 -05:00
Bruce MacDonald
7e8f7c8358 remove ggml automatic re-pull (#1856) 2024-01-08 14:41:01 -05:00
Bruce MacDonald
3f3eb19a3b document response in modelfile template variables (#1428) 2024-01-08 14:38:51 -05:00
Daniel Hiltgen
059ae4585e Merge pull request #1834 from dhiltgen/old_cuda
Detect very old CUDA GPUs and fall back to CPU
2024-01-07 10:39:49 -08:00
Daniel Hiltgen
6347f501ca Merge pull request #1828 from dhiltgen/fix_llava
Accept windows paths for image processing
2024-01-07 09:05:46 -08:00
Jeffrey Morgan
5feec959ad dont use -Wall in static build (#1833) 2024-01-07 10:39:19 -05:00
Jeffrey Morgan
dbdd50b283 add -DCMAKE_SYSTEM_NAME=Darwin cmake flag (#1832) 2024-01-07 00:46:17 -05:00
Daniel Hiltgen
d74ce6bd4f Detect very old CUDA GPUs and fall back to CPU
If we try to load the CUDA library on an old GPU, it panics and crashes
the server.  This checks the compute capability before we load the
library so we can gracefully fall back to CPU mode.
2024-01-06 21:40:29 -08:00
Guilherme Baptista
57942b4676 Update README.md - Community Integrations - Ollama for Ruby (#1830) 2024-01-06 22:31:39 -05:00
Daniel Hiltgen
e0d05b0f1e Accept windows paths for image processing
This enhances our regex to support windows style paths.  The regex will
match invalid path specifications, but we'll still validate file
existence and filter out mismatches
2024-01-06 10:50:27 -08:00
Daniel Hiltgen
2d9dd14f27 Merge pull request #1697 from dhiltgen/win_docs
Add windows native build instructions
2024-01-05 19:34:20 -08:00
Jeffrey Morgan
1caa56128f add cuda lib path for nvidia container toolkit 2024-01-05 21:10:37 -05:00
Michael Yang
0101e76dbe Merge pull request #1797 from sublimator/nd-allow-extension-origins-still-needs-explicit-listing-2024-01-05
fix: allow extension origins (still needs explicit listing), fixes #1686
2024-01-05 17:20:09 -08:00
Michael Yang
2ef9352b94 fix(cmd): history in alt mode 2024-01-05 16:20:02 -08:00
Michael Yang
5580ae2472 fix: set template without triple quotes 2024-01-05 15:51:33 -08:00
Bruce MacDonald
3a9f447141 only pull gguf model if already exists (#1817) 2024-01-05 18:50:00 -05:00
Patrick Devine
9c2941e61b switch api for ShowRequest to use the name field (#1816) 2024-01-05 15:06:43 -08:00
Patrick Devine
238ac5e765 Add unit tests for Parser (#1815) 2024-01-05 14:04:31 -08:00
Bruce MacDonald
4f4980b66b simplify ggml update logic (#1814)
- additional information is now available in show response, use this to pull gguf before running
- make gguf updates cancellable
2024-01-05 15:22:32 -05:00
Patrick Devine
22e93efa41 add show info command and fix the modelfile 2024-01-05 12:20:05 -08:00
Patrick Devine
2909dce894 split up interactive generation 2024-01-05 12:20:05 -08:00
Jeffrey Morgan
df32537312 gpu: read memory info from all cuda devices (#1802)
* gpu: read memory info from all cuda devices

* add `LOOKUP_SIZE` constant

* better constant name

* address comments
2024-01-05 11:25:58 -05:00
Bruce MacDonald
3367b5f3df remove unused generate patches (#1810) 2024-01-05 11:25:45 -05:00
Matt Williams
46edbbc518 Merge pull request #1801 from jmorganca/mattw/correctdockerlink 2024-01-04 19:20:45 -08:00
Michael Yang
d2ff18cd6b Merge pull request #1791 from jmorganca/mxyng/update-build
update Dockerfile.build
2024-01-04 19:13:44 -08:00
Matt Williams
df086d3c8c fix docker doc to point to hub
Signed-off-by: Matt Williams <m@technovangelist.com>
2024-01-04 18:42:23 -08:00
Nicholas Dudfield
8baaaa39c0 Allow extension origins (still needs explicit listing), fixes #1686 2024-01-05 09:06:47 +07:00
Michael Yang
f9961c70ae update build 2024-01-04 17:34:38 -08:00
Daniel Hiltgen
cd8fad3398 Merge pull request #1790 from dhiltgen/llm_code_shuffle
Cleaup stale submodule
2024-01-04 13:47:25 -08:00
Daniel Hiltgen
9983fa5f4e Cleaup stale submodule
If the tree has a stale submodule, make sure we clean it up first
2024-01-04 13:40:16 -08:00
Daniel Hiltgen
dfda91c2ee Merge pull request #1788 from dhiltgen/llm_code_shuffle
Revamp code layout for the llm directory and llama.cpp submodule
2024-01-04 13:14:28 -08:00
Daniel Hiltgen
fac9060da5 Init submodule with new path 2024-01-04 13:00:13 -08:00
Daniel Hiltgen
a554616f8e remove old llama.cpp submodule path 2024-01-04 12:12:21 -08:00
Daniel Hiltgen
77d96da94b Code shuffle to clean up the llm dir 2024-01-04 12:12:05 -08:00
Brian Murray
0d6e3565ae Add embeddings to API (#1773) 2024-01-04 15:00:52 -05:00
Daniel Hiltgen
b5939008a1 Merge pull request #1785 from dhiltgen/win_native_cli
Load dynamic cpu lib on windows
2024-01-04 08:55:01 -08:00
Daniel Hiltgen
e9ce91e9a6 Load dynamic cpu lib on windows
On linux, we link the CPU library in to the Go app and fall back to it
when no GPU match is found. On windows we do not link in the CPU library
so that we can better control our dependencies for the CLI.  This fixes
the logic so we correctly fallback to the dynamic CPU library
on windows.
2024-01-04 08:41:41 -08:00
Bruce MacDonald
4ad6c9b11f fix: pull either original model or from model on create (#1774) 2024-01-04 01:34:38 -05:00
Jeffrey Morgan
c0285158a9 tweak memory requirements error text 2024-01-03 19:47:18 -05:00
Jeffrey Morgan
77a66df72c add macOS memory check for 47B models 2024-01-03 19:46:16 -05:00
Jeffrey Morgan
5b4837f881 remove unused filetype check 2024-01-03 19:45:39 -05:00
Jeffrey Morgan
29340c2e62 update cmake flags for amd64 macOS (#1780)
* update cmake flags for intel macOS

* remove `LLAMA_K_QUANTS`

* put back `CMAKE_OSX_DEPLOYMENT_TARGET` and disable `LLAMA_F16C`
2024-01-03 19:22:15 -05:00
Daniel Hiltgen
d5ec730354 Merge pull request #1779 from dhiltgen/refined_amd_gpu_list
Improve maintainability of Radeon card list
2024-01-03 16:18:57 -08:00
Daniel Hiltgen
8bed487aba Merge pull request #1778 from dhiltgen/wsl1
Fail fast on WSL1 while allowing on WSL2
2024-01-03 16:18:41 -08:00
Daniel Hiltgen
c1a10a6e9b Merge pull request #1781 from dhiltgen/cpu_only_build
Fix CPU only builds
2024-01-03 16:18:25 -08:00
Daniel Hiltgen
ddbfa6fe31 Fix CPU only builds
Go embed doesn't like when there's no matching files, so put
a dummy placeholder in to allow building without any GPU support
If no "server" library is found, it's safely ignored at runtime.
2024-01-03 16:08:34 -08:00
Daniel Hiltgen
2fcd41ef81 Fail fast on WSL1 while allowing on WSL2
This prevents users from accidentally installing on WSL1 with instructions
guiding how to upgrade their WSL instance to version 2.  Once running WSL2
if you have an NVIDIA card, you can follow their instructions to set up
GPU passthrough and run models on the GPU.  This is not possible on WSL1.
2024-01-03 16:02:32 -08:00
Daniel Hiltgen
16f4603b67 Improve maintainability of Radeon card list
This moves the list of AMD GPUs to an easier to maintain list which
should make it easier to update over time.
2024-01-03 15:16:56 -08:00
Daniel Hiltgen
1184686649 Merge pull request #1776 from dhiltgen/render_group
Add ollama user to render group for Radeon support
2024-01-03 13:07:54 -08:00
Daniel Hiltgen
2588cb2daa Add ollama user to render group for Radeon support
For the ROCm libraries to access the driver, we need to add the ollama user
to the render group.
2024-01-03 12:56:31 -08:00
Jeffrey Morgan
c7ea8f237e set num_gpu to 1 only by default on darwin arm64 (#1771) 2024-01-03 14:10:29 -05:00
Bruce MacDonald
0b3118e0af fix: relay request opts to loaded llm prediction (#1761) 2024-01-03 12:01:42 -05:00
Daniel Hiltgen
05face44ef Merge pull request #1683 from dhiltgen/fix_windows_test
Fix windows system memory lookup
2024-01-03 09:00:39 -08:00
Daniel Hiltgen
a2ad952440 Fix windows system memory lookup
This refines the gpu package error handling and fixes a bug with the
system memory lookup on windows.
2024-01-03 08:50:01 -08:00
Daniel Hiltgen
5fea4410be Merge pull request #1680 from dhiltgen/better_patching
Refactor how we augment llama.cpp and refine windows native build
2024-01-03 08:10:17 -08:00
Bruce MacDonald
b846eb64d0 Fix template api doc description (#1661) 2024-01-03 11:00:59 -05:00
Cole Gillespie
3c5dd9ed1d Update README.md (#1766) 2024-01-03 10:44:22 -05:00
Jeffrey Morgan
b17ccd0542 Update import.md 2024-01-02 22:28:18 -05:00
Patrick Devine
d0409f772f keyboard shortcut help (#1764) 2024-01-02 18:04:12 -08:00
Jeffrey Morgan
ec261422af use docker build in build scripts 2024-01-02 19:32:54 -05:00
Daniel Hiltgen
0498f7ce56 Get rid of one-line llama.log
This one log line was triggering a single line llama.log to be generated
in the pwd of the server
2024-01-02 15:36:16 -08:00
Daniel Hiltgen
738a8d12eb Rename the ollama cmakefile 2024-01-02 15:36:16 -08:00
Daniel Hiltgen
d966b730ac Switch windows build to fully dynamic
Refactor where we store build outputs, and support a fully dynamic loading
model on windows so the base executable has no special dependencies thus
doesn't require a special PATH.
2024-01-02 15:36:16 -08:00
Daniel Hiltgen
9a70aecccb Refactor how we augment llama.cpp
This changes the model for llama.cpp inclusion so we're not applying a patch,
but instead have the C++ code directly in the ollama tree, which should make it
easier to refine and update over time.
2024-01-02 15:35:55 -08:00
Karim ElGhandour
22cd5eaab6 Added Ollama-SwiftUI to integrations (#1747) 2024-01-02 09:47:50 -05:00
Dane Madsen
304a8799ca Update README.md (#1757) 2024-01-02 09:47:08 -05:00
Jeffrey Morgan
2a2fa3c329 api.md cleanup & formatting 2023-12-27 14:32:35 -05:00
Jeffrey Morgan
55978c1dc9 clean up cache api option 2023-12-27 14:27:45 -05:00
Jeffrey Morgan
d4ebdadbe7 enable cache_prompt by default 2023-12-27 14:23:42 -05:00
Daniel Hiltgen
e201efa14b Add windows native build instructions 2023-12-25 08:31:34 -08:00
Icelain
c5f21f73a4 follow best practices by adding resp.Body.Close() (#1708) 2023-12-25 09:01:37 -05:00
Jeffrey Morgan
371bc73531 Update README.md 2023-12-24 11:54:08 -05:00
Jeffrey Morgan
c651d8b824 Update README.md 2023-12-23 11:18:12 -05:00
Daniel Hiltgen
cf50ef5b51 Merge pull request #1684 from dhiltgen/tag_integration_tests
Guard integration tests with a tag
2023-12-22 16:43:41 -08:00
Daniel Hiltgen
697bea6939 Guard integration tests with a tag
This should help CI avoid running the integration test logic in a
container where it's not currently possible.
2023-12-22 16:33:27 -08:00
K0IN
10da41d677 Add Cache flag to api (#1642) 2023-12-22 17:16:20 -05:00
Bruce MacDonald
db356c8519 post-response templating (#1427) 2023-12-22 17:07:05 -05:00
Jeffrey Morgan
b80081022f cache docker builds in build_linux.sh 2023-12-22 16:01:20 -05:00
Matt Williams
790457398a Merge pull request #1677 from jmorganca/mattw/docrunupdate
update where are models stored q
2023-12-22 09:56:27 -08:00
Matt Williams
511069a2a5 update where are models stored q
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-12-22 09:48:44 -08:00
Matt Williams
5a85070c22 Update readmes, requirements, packagejsons, etc for all examples (#1452)
Most of the examples needed updates of Readmes to show how to run them. Some of the requirements.txt files had extra content that wasn't needed, or missing altogether. Apparently some folks like to run npm start
to run typescript, so a script was added to all typescript examples which
hadn't been done before.

Basically just a lot of cleanup.

Signed-off-by: Matt Williams <m@technovangelist.com>
2023-12-22 09:10:41 -08:00
Matt Williams
291700c92d Clean up documentation (#1506)
* Clean up documentation

Will probably need to update with PRs for new release.

Signed-off-by: Matt Williams <m@technovangelist.com>

* Correcting to fit in 0.1.15 changes

Signed-off-by: Matt Williams <m@technovangelist.com>

* Update README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* addressing comments

Signed-off-by: Matt Williams <m@technovangelist.com>

* more api cleanup

Signed-off-by: Matt Williams <m@technovangelist.com>

* its llava not llama

Signed-off-by: Matt Williams <m@technovangelist.com>

* Update docs/troubleshooting.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Updated hosting to server and documented all env vars

Signed-off-by: Matt Williams <m@technovangelist.com>

* remove last of the cli descriptions

Signed-off-by: Matt Williams <m@technovangelist.com>

* Update README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* update further per conversation with jeff earlier today

Signed-off-by: Matt Williams <m@technovangelist.com>

* cleanup the doc readme

Signed-off-by: Matt Williams <m@technovangelist.com>

* move upgrade to faq

Signed-off-by: Matt Williams <m@technovangelist.com>

* first change

Signed-off-by: Matt Williams <m@technovangelist.com>

* updated

Signed-off-by: Matt Williams <m@technovangelist.com>

* Update docs/faq.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* examples in parent

Signed-off-by: Matt Williams <m@technovangelist.com>

* add exapmle for create model.

Signed-off-by: Matt Williams <m@technovangelist.com>

* update faq

Signed-off-by: Matt Williams <m@technovangelist.com>

* update create model api

Signed-off-by: Matt Williams <m@technovangelist.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/faq.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/troubleshooting.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* update the readme in docs

Signed-off-by: Matt Williams <m@technovangelist.com>

* update a few more things

Signed-off-by: Matt Williams <m@technovangelist.com>

* Update docs/troubleshooting.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/faq.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/modelfile.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/troubleshooting.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

---------

Signed-off-by: Matt Williams <m@technovangelist.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-12-22 09:10:01 -08:00
Daniel Hiltgen
9db28af84e Merge pull request #1675 from dhiltgen/less_verbose
Quiet down llama.cpp logging by default
2023-12-22 08:57:17 -08:00
Daniel Hiltgen
e5202eb687 Quiet down llama.cpp logging by default
By default builds will now produce non-debug and non-verbose binaries.
To enable verbose logs in llama.cpp and debug symbols in the
native code, set `CGO_CFLAGS=-g`
2023-12-22 08:47:18 -08:00
Daniel Hiltgen
96fb441abd Merge pull request #1146 from dhiltgen/ext_server_cgo
Add cgo implementation for llama.cpp
2023-12-22 08:16:31 -08:00
Daniel Hiltgen
495c06e4a6 Fix doc glitch 2023-12-21 18:21:31 -08:00
Daniel Hiltgen
fa24e73b82 Remove CPU build, fixup linux build script 2023-12-21 18:21:31 -08:00
Daniel Hiltgen
325d74985b Fix CPU performance on hyperthreaded systems
The default thread count logic was broken and resulted in 2x the number
of threads as it should on a hyperthreading CPU
resulting in thrashing and poor performance.
2023-12-21 16:23:36 -08:00
Bruce MacDonald
fabf2f3467 allow for starting llava queries with filepath (#1549) 2023-12-21 13:20:59 -05:00
Daniel Hiltgen
d9cd3d9667 Revive windows build
The windows native setup still needs some more work, but this gets it building
again and if you set the PATH properly, you can run the resulting exe on a cuda system.
2023-12-20 17:21:54 -08:00
Patrick Devine
a607d922f0 add FAQ for slow networking in WSL2 (#1646) 2023-12-20 16:27:24 -08:00
Daniel Hiltgen
7555ea44f8 Revamp the dynamic library shim
This switches the default llama.cpp to be CPU based, and builds the GPU variants
as dynamically loaded libraries which we can select at runtime.

This also bumps the ROCm library to version 6 given 5.7 builds don't work
on the latest ROCm library that just shipped.
2023-12-20 14:45:57 -08:00
Jeffrey Morgan
df06812494 Update api.md 2023-12-20 08:47:53 -05:00
Daniel Hiltgen
1d1eb1688c Additional nvidial-ml path to check 2023-12-19 15:52:34 -08:00
Michael Yang
23dc179350 Merge pull request #1619 from jmorganca/mxyng/fix-version-test
fix(test): use real version string for comparison
2023-12-19 15:48:52 -08:00
Michael Yang
63aac0edc5 fix(test): use real version string for comparison 2023-12-19 15:03:02 -08:00
Daniel Hiltgen
6558f94ed0 Fix darwin intel build 2023-12-19 13:32:24 -08:00
Erick Ghaumez
1ca484f67e Add Langchain Dart library (#1564)
* Add Langchain Dart

* Update README.md

---------

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-19 14:04:52 -05:00
Jeffrey Morgan
72b0c32fe9 Update README.md 2023-12-19 12:59:22 -05:00
Jeffrey Morgan
68c28224f8 Update README.md 2023-12-19 12:59:03 -05:00
Daniel Hiltgen
54dbfa4c4a Carry ggml-metal.metal as payload 2023-12-19 09:05:46 -08:00
Daniel Hiltgen
5646826a79 Add WSL2 path to nvidia-ml.so library 2023-12-19 09:05:46 -08:00
Daniel Hiltgen
3269535a4c Refine handling of shim presence
This allows the CPU only builds to work on systems with Radeon cards
2023-12-19 09:05:46 -08:00
Daniel Hiltgen
1b991d0ba9 Refine build to support CPU only
If someone checks out the ollama repo and doesn't install the CUDA
library, this will ensure they can build a CPU only version
2023-12-19 09:05:46 -08:00
Daniel Hiltgen
51082535e1 Add automated test for multimodal
A simple test case that verifies llava:7b can read text in an image
2023-12-19 09:05:46 -08:00
Daniel Hiltgen
9adca7f711 Bump llama.cpp to b1662 and set n_parallel=1 2023-12-19 09:05:46 -08:00
Daniel Hiltgen
89bbaafa64 Build linux using ubuntu 20.04
This changes the container-based linux build to use an older Ubuntu
distro to improve our compatibility matrix for older user machines
2023-12-19 09:05:46 -08:00
Daniel Hiltgen
35934b2e05 Adapted rocm support to cgo based llama.cpp 2023-12-19 09:05:46 -08:00
65a
f8ef4439e9 Use build tags to generate accelerated binaries for CUDA and ROCm on Linux.
The build tags rocm or cuda must be specified to both go generate and go build.
ROCm builds should have both ROCM_PATH set (and the ROCM SDK present) as well
as CLBlast installed (for GGML) and CLBlast_DIR set in the environment to the
CLBlast cmake directory (likely /usr/lib/cmake/CLBlast). Build tags are also
used to switch VRAM detection between cuda and rocm implementations, using
added "accelerator_foo.go" files which contain architecture specific functions
and variables. accelerator_none is used when no tags are set, and a helper
function addRunner will ignore it if it is the chosen accelerator. Fix go
generate commands, thanks @deadmeu for testing.
2023-12-19 09:05:46 -08:00
Daniel Hiltgen
d4cd695759 Add cgo implementation for llama.cpp
Run the server.cpp directly inside the Go runtime via cgo
while retaining the LLM Go abstractions.
2023-12-19 09:05:46 -08:00
Bruce MacDonald
5e7fd6906f Update images.go 2023-12-19 09:05:46 -08:00
Bruce MacDonald
811b1f03c8 deprecate ggml
- remove ggml runner
- automatically pull gguf models when ggml detected
- tell users to update to gguf in the case automatic pull fails

Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>
2023-12-19 09:05:46 -08:00
Matt Williams
ed195f3562 Merge pull request #1595 from pgibler/main
Added cmdh to community section in README
2023-12-18 20:55:18 -08:00
Matt Williams
e0d0072ef1 Merge pull request #1592 from jmorganca/mattw/examplepruning
Lets get rid of these old modelfile examples
2023-12-18 20:29:48 -08:00
pgibler
620a2ffcfb Added cmdh to community section in README 2023-12-18 22:04:40 -05:00
Matt Williams
d287013f24 Lets get rid of these old modelfile examples
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-12-18 17:47:33 -08:00
Jeffrey Morgan
6b5bdfa6c9 update runner submodule 2023-12-18 17:33:46 -05:00
Jeffrey Morgan
c063ee4af0 update runner submodule to fix hipblas build 2023-12-18 15:41:13 -05:00
Bruce MacDonald
d99fa6ce0a send empty messages on last chat response (#1530) 2023-12-18 14:23:38 -05:00
Patrick Devine
3948c6ea06 add magic header for unit tests (#1558) 2023-12-18 10:41:02 -08:00
Jeffrey Morgan
b85982eb91 update runner submodule 2023-12-18 12:43:31 -05:00
Patrick Devine
86b0dd4b16 add API create/copy handlers (#1541) 2023-12-15 11:59:18 -08:00
Augustinas Malinauskas
f728738427 README with Enchanted iOS App (#1529)
* feat(docs): README with Enchanted iOS app

* Update README.md

---------

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-15 14:37:29 -05:00
Ian Purton
115048a0d8 Added Bionic GPT as a front end. (#1463)
* Added Bionic GPT as a front end.

* Update README.md

---------

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-15 14:33:04 -05:00
Bruce MacDonald
1b417a7836 use exp slices for go 1.20 compatibility (#1544) 2023-12-15 14:15:56 -05:00
Patrick Devine
0174665d0e add API tests for list handler (#1535) 2023-12-14 18:18:25 -08:00
Patrick Devine
630518f0d9 Add unit test of API routes (#1528) 2023-12-14 16:47:40 -08:00
Bruce MacDonald
6e16098a60 remove sample_count from docs (#1527)
this info has not been returned from these endpoints in some time
2023-12-14 17:49:00 -05:00
Bruce MacDonald
6ee8c80199 restore model load duration on generate response (#1524)
* restore model load duration on generate response

- set model load duration on generate and chat done response
- calculate createAt time when response created

* remove checkpoints predict opts

* Update routes.go
2023-12-14 12:15:50 -05:00
Jeffrey Morgan
31f0551dab Update runner to support mixtral and mixture of experts (MoE) (#1475) 2023-12-13 17:15:10 -05:00
Jeffrey Morgan
4a1abfe4fa fix tests 2023-12-13 14:42:30 -05:00
Jeffrey Morgan
bbd41494bf add multimodal to README.md 2023-12-13 14:38:47 -05:00
Jeffrey Morgan
fedba24a63 Docs for multimodal support (#1485)
* add multimodal docs

* add chat api docs

* consistency between `/api/generate` and `/api/chat`

* simplify docs
2023-12-13 13:59:33 -05:00
pepperoni21
e3b090dbc5 Added message format for chat api (#1488) 2023-12-13 11:21:23 -05:00
Patrick Devine
d9e60f634b add image support to the chat api (#1490) 2023-12-12 13:28:58 -08:00
Michael Yang
4251b342de Merge pull request #1469 from jmorganca/mxyng/model-types
remove per-model types
2023-12-12 12:27:03 -08:00
Jeffrey Morgan
0a9d348023 Fix issues with /set template and /set system (#1486) 2023-12-12 14:43:19 -05:00
Bruce MacDonald
3144e2a439 exponential back-off (#1484) 2023-12-12 12:33:02 -05:00
Bruce MacDonald
c0960e29b5 retry on concurrent request failure (#1483)
- remove parallel
2023-12-12 12:14:35 -05:00
ruecat
5314fc9b63 Fix Readme "Database -> MindsDB" link (#1479) 2023-12-12 10:26:13 -05:00
Jorge Torres
a36b5fef3b Update README.md (#1412) 2023-12-11 18:05:10 -05:00
Patrick Devine
910e9401d0 Multimodal support (#1216)
---------

Co-authored-by: Matt Apperson <mattapperson@Matts-MacBook-Pro.local>
2023-12-11 13:56:22 -08:00
Michael Yang
56ffc3023a remove per-model types
mostly replaced by decoding tensors except ggml models which only
support llama
2023-12-11 09:40:21 -08:00
Bruce MacDonald
7a1b37ac64 os specific ctrl-z (#1420) 2023-12-11 10:48:14 -05:00
Jeffrey Morgan
5d4d2e2c60 update docs with chat completion api 2023-12-10 13:53:36 -05:00
Jeffrey Morgan
7db5bcf73b fix go-staticcheck warning 2023-12-10 11:44:27 -05:00
Jeffrey Morgan
fa2f095bd9 fix model name returned by /api/generate being different than the model name provided 2023-12-10 11:42:15 -05:00
Jeffrey Morgan
045b855db9 fix error on accumulating final chat response 2023-12-10 11:24:39 -05:00
Jeffrey Morgan
32064a0646 fix empty response when receiving runner error 2023-12-10 10:53:38 -05:00
Jeffrey Morgan
d9a250e9b5 seek to end of file when decoding older model formats 2023-12-09 21:14:35 -05:00
Jeffrey Morgan
944519ed16 seek to eof for older model binaries 2023-12-09 20:48:57 -05:00
Jeffrey Morgan
2dd040d04c do not use --parallel 2 for old runners 2023-12-09 20:17:33 -05:00
Bruce MacDonald
bbe41ce41a fix: parallel queueing race condition caused silent failure (#1445)
* fix: queued request failures

- increase parallel requests to 2 to complete queued request, queueing is managed in ollama

* log steam errors
2023-12-09 14:14:02 -05:00
Jeffrey Morgan
9e1406e4ed Don't expose model information in /api/generate 2023-12-09 02:05:43 -08:00
Jeffrey Morgan
b74580c913 Update api.md 2023-12-08 16:02:07 -08:00
Bruce MacDonald
7e9405fd07 fix: encode full previous prompt in context (#1424) 2023-12-08 16:53:51 -05:00
Bruce MacDonald
3b0b8930d4 fix: only flush template in chat when current role encountered (#1426) 2023-12-08 16:44:24 -05:00
Bruce MacDonald
e3f925fc1b fix: restore modelfile system in prompt template (#1425) 2023-12-08 14:20:19 -05:00
Jeffrey Morgan
2a2289fb6b Update api.md 2023-12-08 09:36:45 -08:00
Matt Williams
dd427f499a Merge pull request #1419 from jmorganca/mattw/typescript-simplechat
Simple chat example for typescript
2023-12-07 14:42:24 -08:00
Michael Yang
2ae573c7ed Merge pull request #1421 from jmorganca/mxyng/fix-newline
fix redundant newline
2023-12-07 13:47:23 -08:00
Matt Williams
02fe26c44b update the readme as per bruce
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-12-07 13:46:30 -08:00
Michael Yang
16c7548460 fix redundant newline 2023-12-07 13:44:45 -08:00
Matt Williams
fa75998c0d Update examples/typescript-simplechat/readme.md
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-07 13:40:54 -08:00
Matt Williams
5344f886c8 Update examples/typescript-simplechat/client.ts
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-07 13:40:37 -08:00
Matt Williams
6cc823c9b5 Update examples/typescript-simplechat/client.ts
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-07 13:39:59 -08:00
Matt Williams
b84d34e632 Update examples/typescript-simplechat/readme.md
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-07 13:39:33 -08:00
Matt Williams
30229a913c Update examples/typescript-simplechat/client.ts
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-07 13:39:24 -08:00
Matt Williams
1ade380bd7 Simple chat example for typescript
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-12-07 11:48:25 -08:00
Jeffrey Morgan
ba264e9da8 add future version note to chat api docs 2023-12-07 09:42:15 -08:00
Matt Williams
a2405ec831 Merge pull request #1409 from jmorganca/mattw/python-simplechat
Simple chat example
2023-12-06 15:49:45 -08:00
Matt Williams
ce809bb529 Merge branch 'mattw/python-simplechat' of github.com:jmorganca/ollama into mattw/python-simplechat 2023-12-06 15:48:42 -08:00
Matt Williams
76bc4d0458 Cleanup as per Bruce
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-12-06 15:44:40 -08:00
Bruce MacDonald
4a02945a15 Update examples/python-simplechat/client.py 2023-12-06 18:36:45 -05:00
Matt Williams
aec742b6d2 Update examples/python-simplechat/readme.md
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-06 15:30:45 -08:00
Matt Williams
f337642e94 Update examples/python-simplechat/readme.md
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-06 15:30:35 -08:00
Matt Williams
51131cc6e2 Update examples/python-simplechat/client.py
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-06 15:30:10 -08:00
Matt Williams
43027789dc Simple chat example
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-12-06 14:35:58 -08:00
Xe Iaso
f9b7d65e2b docs/tutorials: add bit on how to use Fly GPUs on-demand with Ollama (#1406)
Signed-off-by: Xe Iaso <xe@camellia.finch-kitefin.ts.net>
2023-12-06 14:14:02 -08:00
Michael Yang
1f05d77110 Merge pull request #1244 from jmorganca/brucemacd/no-fail-template
do not fail on unsupported template variables
2023-12-06 13:23:04 -08:00
Michael Yang
c3ff36088b Merge pull request #774 from jmorganca/mxyng/server-version
add version api and show server version in cli
2023-12-06 13:22:55 -08:00
Samuel Calderon
13524b5e72 List "Send chat messages" in table of contents (#1399)
Thank you @calderonsamuel
2023-12-06 12:34:27 -08:00
Michael Yang
f1b049fed8 Merge pull request #1377 from jmorganca/mxyng/qwen
update for qwen
2023-12-06 12:31:51 -08:00
Jeffrey Morgan
97c5696945 fix base urls in chat examples 2023-12-06 12:10:20 -08:00
Bruce MacDonald
47d4e22673 use missingkey in set empty interface when missing 2023-12-05 15:49:05 -08:00
Michael Yang
32f62fbb8e Merge pull request #1334 from jmorganca/mxyng/load-projectors
load projectors
2023-12-05 14:40:53 -08:00
Michael Yang
5d75505ebd return model configuration in generate 2023-12-05 14:39:02 -08:00
Michael Yang
b9495ea162 load projectors 2023-12-05 14:36:12 -08:00
Michael Yang
409bb9674e Merge pull request #1308 from jmorganca/mxyng/split-from
split from into one or more models
2023-12-05 14:33:03 -08:00
Michael Yang
d3479c07a1 Merge pull request #1250 from jmorganca/mxyng/create-layer
refactor layer creation
2023-12-05 14:32:52 -08:00
Michael Yang
b12f1b984f Merge pull request #1393 from jmorganca/mxyng/fix-whitespace
fix: trim space in modelfile fields
2023-12-05 12:18:01 -08:00
Bruce MacDonald
195e3d9dbd chat api endpoint (#1392) 2023-12-05 14:57:33 -05:00
Michael Yang
38fe1a368b fix: trim space in modelfile fields 2023-12-05 11:57:29 -08:00
Michael Yang
4b77fcb2b9 comments 2023-12-05 09:43:50 -08:00
Michael Yang
cde13bcdea cmd: only print server version when different 2023-12-05 09:36:01 -08:00
Michael Yang
0f0cd265a7 cmd: add server version 2023-12-05 09:36:01 -08:00
Michael Yang
0db4706ec2 api: add version api handler 2023-12-05 09:36:01 -08:00
Michael Yang
1ebdbd9694 server: add version handler 2023-12-05 09:36:01 -08:00
Michael Yang
5c59455b59 cmd: use existing cmd context 2023-12-05 09:36:01 -08:00
Jeffrey Morgan
00d06619a1 Revert "chat api (#991)" while context variable is fixed
This reverts commit 7a0899d62d.
2023-12-04 21:16:27 -08:00
Matt Williams
f1ef3f9947 remove mention of gpt-neox in import (#1381)
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-12-04 20:58:10 -08:00
Michael Yang
5a5dca13b2 comments 2023-12-04 16:59:23 -08:00
Michael Yang
7232f1fa41 go mod tidy 2023-12-04 16:59:23 -08:00
Michael Yang
72e7a49aa9 seek instead of copyn 2023-12-04 16:59:23 -08:00
Michael Yang
a3737cbd33 use NewLayer for CreateBlobHandler 2023-12-04 16:59:23 -08:00
Michael Yang
998f1785b6 add modelfamilies 2023-12-04 16:59:23 -08:00
Michael Yang
70a93057cd refactor layer creation
previous layer creation was not ideal because:

1. it required reading the input file multiple times, once to calculate
   the sha256 checksum, another to write it to disk, and potentially one
   more to decode the underlying gguf
2. used io.ReadSeeker which is prone to user error. if the file isn't
   reset correctly or in the right place, it could end up reading an
   empty file

there are also some brittleness when reading existing layers else
writing the inherited layers will error reading an already closed file

this commit aims to fix these issues by restructuring layer creation.

1. it will now write the layer to a temporary file as well as the hash
   function and move it to the final location on Commit
2. layers are read once once when copied to the destination. exception
   is raw model files which still requires a second read to decode the
   model metadata
2023-12-04 16:59:23 -08:00
Michael Yang
2cb0fa7d40 split from into one or more models 2023-12-04 16:59:23 -08:00
Michael Yang
b2816bca67 unnecessary ReadSeeker for DecodeGGML 2023-12-04 16:59:23 -08:00
Patrick Devine
bf704423c5 revert cli to use /api/generate (#1383) 2023-12-04 16:35:29 -08:00
Bruce MacDonald
7a0899d62d chat api (#991)
- update chat docs
- add messages chat endpoint
- remove deprecated context and template generate parameters from docs
- context and template are still supported for the time being and will continue to work as expected
- add partial response to chat history
2023-12-04 18:01:06 -05:00
Michael Yang
0cca1486dd Merge pull request #1376 from jmorganca/mxyng/rocky-install
install: fix rocky kernel packages
2023-12-04 14:23:43 -08:00
Patrick Devine
2113c9d31a make linewrap still work when the terminal width has changed (#1350) 2023-12-04 14:14:56 -08:00
Michael Yang
6deebf2489 update for qwen 2023-12-04 11:38:05 -08:00
Michael Yang
95cb38ae47 install: fix rocky kernel packages 2023-12-04 11:10:42 -08:00
ruecat
1f126afb2d Ollama Telegram Bot (#1364)
* Add "ollama-telegram" to Extensions & Plugins

* Update README.md
2023-12-03 11:19:55 -08:00
Jeffrey Morgan
f6201a7a6c remove duplicate community integration in README.md 2023-12-02 21:18:13 -08:00
Michael Yang
b3f6c6598f Merge pull request #1349 from jmorganca/mxyng/ctrl-z
handle ctrl+z
2023-12-01 16:21:49 -08:00
Michael Yang
88620e983a handle ctrl+z 2023-12-01 16:15:20 -08:00
Michael Yang
cedae0d17a Merge pull request #1347 from jshph/adapter-hash
Fix adapter loading from SHA hash
2023-12-01 11:08:25 -08:00
Joshua Pham
bb80a597db Fix adapter loading from SHA hash 2023-12-01 13:50:55 -05:00
Patrick Devine
6681d37861 allow setting the system and template for prompts in the repl (#1335) 2023-12-01 09:28:35 -08:00
Michael Yang
0409c1fa59 docker: set PATH, LD_LIBRARY_PATH, and capabilities (#1336)
* docker: set PATH, LD_LIBRARY_PATH, and capabilities

* example: update k8s gpu manifest
2023-11-30 21:16:56 -08:00
Michael Yang
b56e92470a Merge pull request #1229 from jmorganca/mxyng/calculate-as-you-go
revert checksum calculation to calculate-as-you-go
2023-11-30 10:54:38 -08:00
Jeffrey Morgan
5687f1a0cf fix unexpected end of response errors when cancelling in ollama run 2023-11-30 00:30:21 -05:00
James Radtke
7eda3d0c55 Corrected transposed 129 to 192 for OLLAMA_ORIGINS example (#1325) 2023-11-29 22:44:17 -05:00
Bruce MacDonald
7194a07d4d Add chatd to example projects 2023-11-29 21:18:21 -05:00
Michael Yang
13efd5f218 upload: fix PUT retry 2023-11-29 16:38:35 -08:00
Michael Yang
c4bdfffd96 upload: separate progress tracking 2023-11-29 16:38:33 -08:00
Michael Yang
26c63418e0 new hasher 2023-11-29 14:52:41 -08:00
Michael Yang
2799784ac8 revert checksum calculation to calculate-as-you-go 2023-11-29 13:47:58 -08:00
Alec Hammond
91897a606f Add OllamaEmbeddings to python LangChain example (#994)
* Add OllamaEmbeddings to python LangChain example

* typo

---------

Co-authored-by: Alec Hammond <alechammond@fb.com>
2023-11-29 16:25:39 -05:00
Bruce MacDonald
96122b7271 validate model tags on copy (#1323) 2023-11-29 15:54:29 -05:00
jeremiahbuckley
39be7fdb98 fix rhel cuda install (#1321)
Co-authored-by: Cloud User <azureuser@testgpu2.hqzwom21okjenksna4y3c4ymjd.phxx.internal.cloudapp.net>
2023-11-29 14:55:15 -05:00
Timothy Jaeryang Baek
c2e3b89176 fix: disable ':' in tag names (#1280)
Co-authored-by: rootedbox
2023-11-29 13:33:45 -05:00
Patrick Devine
cde31cb220 Allow setting parameters in the REPL (#1294) 2023-11-29 09:56:42 -08:00
ToasterUwU
63097607b2 Correct MacOS Host port example (#1301) 2023-11-29 11:44:03 -05:00
Michael
2ae80e1e27 Update README.md
add new recent models as examples
2023-11-28 22:16:37 -05:00
Michael Yang
b173cfc558 Merge pull request #1195 from jmorganca/mxyng/fix-bar-rate
progress: fix bar rate
2023-11-28 11:55:23 -08:00
Michael Yang
424d53ac70 progress: fix bar rate 2023-11-28 11:44:56 -08:00
ftorto
e1a69d44c9 Update faq.md (#1299)
Fix a typo in the CA update command
2023-11-28 09:54:42 -05:00
Jason Jacobs
3d620f9462 ignore jetbrain ides (#1287) 2023-11-27 15:57:45 -05:00
Bruce MacDonald
928950fcc6 update python client create example (#1227)
* add remote create to python example client
2023-11-27 15:36:19 -05:00
Kasumi
39c6d949fc Add Amica to community integrations (#1281) 2023-11-27 10:44:37 -05:00
Jeffrey Morgan
16a9006306 add back f16c instructions on intel mac 2023-11-26 15:59:49 -05:00
Jeffrey Morgan
e9216ea459 fix readline history on linux 2023-11-26 15:59:04 -05:00
Jeffrey Morgan
9e4a316405 update submodule commit 2023-11-26 14:52:00 -05:00
Jeffrey Morgan
9fb5e8399c Fix issues with inputting and formatting multi line strings in ollama run
Co-authored-by: Wen Sun <iwendellsun@gmail.com>
2023-11-26 12:54:29 -05:00
Jing Zhang
82b9b329ff windows CUDA support (#1262)
* Support cuda build in Windows
* Enable dynamic NumGPU allocation for Windows
2023-11-24 17:16:36 -05:00
Jongwook Choi
12e8c12d2b Disable CUDA peer access as a workaround for multi-gpu inference bug (#1261)
When CUDA peer access is enabled, multi-gpu inference will produce
garbage output. This is a known bug of llama.cpp (or nvidia). Until the
upstream bug is fixed, we can disable CUDA peer access temporarily
to ensure correct output.

See #961.
2023-11-24 14:05:57 -05:00
Jeffrey Morgan
d77dde126b consistent cpu instructions on macos and linux 2023-11-22 16:26:46 -05:00
Michael Yang
c7e70cd3bb Merge pull request #1245 from jmorganca/mxyng/gguf-int
fix: gguf int type
2023-11-22 11:42:56 -08:00
Michael Yang
199941cd15 fix: gguf int type 2023-11-22 11:40:30 -08:00
Long Huynh
c9474f7f61 Update README.md - Community Integrations - Obsidian BMO Chatbot plugin (#1239) 2023-11-22 14:32:30 -05:00
Jeffrey Morgan
927e3ba4a4 tag image with correct version when building with build_docker script 2023-11-22 14:32:17 -05:00
Bruce MacDonald
37d95157df fix relative path on create (#1222) 2023-11-21 15:43:17 -05:00
Jeffrey Morgan
2eaa95b417 Update api.md 2023-11-21 15:32:05 -05:00
Kevin Cao
3cd07728f4 Make alt+backspace delete word (#1223) 2023-11-21 12:26:47 -08:00
Michael Yang
ecf8b793f0 Merge pull request #1224 from jmorganca/mxyng/update
update llama.cpp
2023-11-21 12:21:59 -08:00
Matt Williams
abf294826b Merge pull request #1221 from jmorganca/mattw/communityinstalls
add installation packages category to community
2023-11-21 12:12:23 -08:00
Steve Korshakov
ae06bb426b add Llama Coder (#1225)
* add Llama Coder
* Update README.md
2023-11-21 14:08:19 -05:00
Matt Williams
d8e0f62ebb Merge pull request #1159 from jmorganca/mattw/functioncalling
Example: Function Calling in Typescript
2023-11-21 10:06:55 -08:00
Michael Yang
a00fac4ec8 update llama.cpp 2023-11-21 09:50:02 -08:00
Jeffrey Morgan
f2113c1fc7 fix potential error in progress bar calculation 2023-11-21 12:48:20 -05:00
Jeffrey Morgan
6452e2ecb8 fix cases where progress bar would not be fixed size 2023-11-21 12:07:25 -05:00
Matt Williams
9a28e263a5 Update README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-11-21 07:25:32 -08:00
Matt Williams
0c066c9214 Update README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-11-21 07:25:26 -08:00
Jeffrey Morgan
aabd71aede fix rendering and variable width issues on progress bar 2023-11-21 10:02:37 -05:00
Matt Williams
da4d7c9f9c add installation packages category to community
Moved the arch package and someone has added a pr for brew.
that needs to get updated to be a link.

Signed-off-by: Matt Williams <m@technovangelist.com>
2023-11-21 06:40:59 -08:00
Matt Williams
f321b13a03 Merge pull request #1178 from tusharhero/install-instructions-archlinux
Add Installation instructions for Archlinux
2023-11-21 06:33:22 -08:00
Matt Williams
5ebcde1541 Merge branch 'main' into install-instructions-archlinux 2023-11-21 06:32:50 -08:00
Matt Williams
45206cb7cc Merge pull request #1218 from danemadsen/main
Update Maid repo
2023-11-21 06:30:33 -08:00
Matt Williams
6e65b84f54 Merge pull request #1219 from dustinblackman/main
docs: Add Oatmeal to terminal integrations
2023-11-21 06:28:12 -08:00
Dustin Blackman
c00ce12e83 docs: Add Oatmeal to terminal integrations 2023-11-21 06:47:43 -05:00
tusharhero
e1cd3152c9 Move Archlinux package to Community Integrations section. 2023-11-21 16:28:50 +05:30
Dane Madsen
0bef3778c9 Update README.md 2023-11-21 21:02:13 +11:00
Dane Madsen
6ebab38b89 Merge branch 'jmorganca:main' into main 2023-11-21 20:01:13 +10:00
Dane Madsen
5d8e864d44 Update Maid repo 2023-11-21 21:00:54 +11:00
Matt Williams
5f7acd0bbd remove 'recent'
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-11-20 17:03:25 -08:00
Matt Williams
44b3a1ad42 Merge branch 'mattw/functioncalling' of github.com:jmorganca/ollama into mattw/functioncalling
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-11-20 17:01:41 -08:00
Matt Williams
0260be4414 remove 'recently'
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-11-20 16:57:07 -08:00
Jeffrey Morgan
a3fcecf943 only set main_gpu if value > 0 is provided 2023-11-20 19:54:04 -05:00
Jeffrey Morgan
df07e4a097 remove redundant filename parameter (#1213) 2023-11-20 17:05:36 -05:00
Michael Yang
0b7ade0d4c Merge pull request #1212 from jmorganca/mxyng/metal
enable metal for fp32, q5_0, q5_1
2023-11-20 13:56:39 -08:00
Michael Yang
19b7a4d715 recent llama.cpp update added kernels for fp32, q5_0, and q5_1 2023-11-20 13:44:31 -08:00
Bruce MacDonald
31ab453d37 resolve FROM path before sending modelfile (#1211) 2023-11-20 16:43:48 -05:00
Jeffrey Morgan
35c4b5ec16 calculate hash separately from http request 2023-11-20 15:45:11 -05:00
James Braza
f24741ff39 Documenting how to view Modelfiles (#723)
* Documented viewing Modelfiles in ollama.ai/library

* Moved Modelfile in ollama.ai down per request
2023-11-20 15:24:29 -05:00
Jeffrey Morgan
8c4022b06b fix initial progress stats 2023-11-20 14:33:46 -05:00
Jeffrey Morgan
433702f421 hide progress stats on completion 2023-11-20 14:22:39 -05:00
Matt Williams
48896f626c Update examples/typescript-functioncalling/extractwp.ts
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-20 10:12:10 -08:00
Matt Williams
c57aee6fba Update examples/typescript-functioncalling/readme.md
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-20 10:10:42 -08:00
Jeffrey Morgan
6066c70edd restore progress messages for older endpoints 2023-11-20 11:37:17 -05:00
Jeffrey Morgan
f10ac5de19 restore stats updated every second to progress bar 2023-11-20 10:58:19 -05:00
Jeffrey Morgan
93a108214c only show decimal points for smaller file size numbers 2023-11-20 10:58:19 -05:00
Purinda Gunasekara
be61a81758 main-gpu argument is not getting passed to llamacpp, fixed. (#1192) 2023-11-20 10:52:52 -05:00
Toni Soriano
2fdf1b5ff8 add laravel package to README.md (#1208)
Co-authored-by: Toni <cloudstudio@Tonis-Mac-mini.local>
2023-11-20 10:48:35 -05:00
Huy Le
331068b964 Adding ogpt.nvim into the list of plugins! (#1190)
* adding ollama.nvim for visibility

* adding an ogpt.nvim neovim plugin
2023-11-20 10:39:14 -05:00
Andy Brenneke
0179d8eb6b Add Rivet to Community Integrations (#1183) 2023-11-20 10:36:47 -05:00
Eli Bendersky
be48741308 README: link to LangChainGo for talking to ollama, with an example (#1206) 2023-11-20 10:35:07 -05:00
Jeffrey Morgan
6bbd6e26fb fix temporary newline created and removed with spinner in ollama run 2023-11-20 00:49:08 -05:00
Jeffrey Morgan
e6ad4813d3 dont crash when redirecting stderr 2023-11-19 23:50:45 -05:00
Jeffrey Morgan
13ba6df5ab enable cpu instructions on intel macs 2023-11-19 23:20:26 -05:00
Jeffrey Morgan
9d73d3a6b5 add back part.Reset() 2023-11-19 14:32:19 -05:00
Jeffrey Morgan
72cd336410 dont retry on upload complete context cancel 2023-11-19 14:32:19 -05:00
Jeffrey Morgan
1bd594b2fa revert to using one open file for blob uploads 2023-11-19 14:32:19 -05:00
Jeffrey Morgan
9a8c21ac3d use exponential everywhere 2023-11-19 14:32:19 -05:00
Jeffrey Morgan
f6b317e8c9 fix sending too little data in chunk upload body 2023-11-19 14:32:19 -05:00
Jeffrey Morgan
ac5076ce1e exponential backoff up to 30s 2023-11-19 14:32:19 -05:00
Michael Yang
42c2e3a624 upload: retry complete upload 2023-11-19 14:32:19 -05:00
Michael Yang
cb42589792 adjust download/upload parts 2023-11-19 14:32:19 -05:00
Jeffrey Morgan
258addc799 fix comment in progress.go 2023-11-19 13:46:19 -05:00
Jeffrey Morgan
c06b9b7304 update progress rendering to be closer to v0.1.10 2023-11-19 13:43:21 -05:00
Jeffrey Morgan
95b9acd324 improve pull percentage rendering 2023-11-19 11:00:43 -05:00
Jeffrey Morgan
04cbf5ccc0 progress bar styling improvements 2023-11-19 09:54:33 -05:00
Jeffrey Morgan
e1d7056496 update progress statuses 2023-11-19 09:21:13 -05:00
Jeffrey Morgan
02524a56ff check retry for authorization error 2023-11-19 00:19:53 -05:00
Jeffrey Morgan
1657c6abc7 add note to specify JSON in the prompt when using JSON mode 2023-11-18 22:59:26 -05:00
Jeffrey Morgan
12e046f12a remove unused function 2023-11-18 22:16:51 -05:00
Jeffrey Morgan
36a3bbf65f Update llm/llama.go 2023-11-18 21:25:07 -05:00
Bruce MacDonald
43a726149d fix potentially inaccurate error message 2023-11-18 21:25:07 -05:00
Jeffrey Morgan
984714f131 update status text when transfering blob on ollama create 2023-11-18 09:40:10 -05:00
Jeffrey Morgan
bab9494176 add - separator to temp file created on ollama create 2023-11-18 09:39:52 -05:00
Jeffrey Morgan
85e4441c6a cache docker builds 2023-11-18 08:51:38 -05:00
Michael Yang
42e43736a4 Merge pull request #1186 from jmorganca/mxyng/copy-blob
fix cross device rename
2023-11-17 21:54:53 -08:00
Michael Yang
c6e6c8ee7e fix cross device rename 2023-11-17 15:22:17 -08:00
Jeffrey Morgan
a185b29719 fix install script error on linux 2023-11-17 18:00:41 -05:00
Michael Yang
dc84b20d6b Merge pull request #1104 from jmorganca/mxyng/jupyter
add jupyter notebook example
2023-11-17 14:46:26 -08:00
Michael Yang
ad8659b980 Merge pull request #1161 from jmorganca/mxyng/systemd-placeholder
placeholder environment variables
2023-11-17 14:45:38 -08:00
Michael Yang
c1bbf5ddee Merge pull request #1134 from jmorganca/mxyng/progress
progress bar
2023-11-17 14:03:35 -08:00
Bruce MacDonald
0b19e24d81 only retry once on auth failure (#1175) 2023-11-17 14:22:35 -05:00
Michael Yang
3cb07d2773 simplify StopAndClear 2023-11-17 10:26:22 -08:00
Michael Yang
976068369b stop all spinners on progress stop 2023-11-17 10:06:19 -08:00
Michael Yang
4d677ee389 no divide by zero 2023-11-17 10:06:19 -08:00
Michael Yang
7ea905871a only move cursor up if pos > 0 2023-11-17 10:06:19 -08:00
Michael Yang
d6ecaa2cbf update progress responses 2023-11-17 10:06:19 -08:00
Michael Yang
4dcf7a59b1 generate progress 2023-11-17 10:06:19 -08:00
Michael Yang
1c0e092ead progress cmd 2023-11-17 10:06:19 -08:00
Michael Yang
c4a3ccd7ac progress 2023-11-17 10:06:19 -08:00
Michael Yang
9f04e5a8ea format bytes 2023-11-17 10:06:19 -08:00
Michael Yang
f91bb2f7f0 remove progressbar 2023-11-17 10:06:19 -08:00
Michael Yang
0813387414 Merge pull request #1177 from jmorganca/mxyng/faq
faq: fix heading and add more details
2023-11-17 10:05:21 -08:00
Michael Yang
4936b5bb37 add jupyter readme 2023-11-17 10:04:52 -08:00
tusharhero
786288829e Make Archlinux a sub-heading of Linux. 2023-11-17 23:17:36 +05:30
tusharhero
72dcc952b6 Add Installation instructions for Archlinux
Pacman is the recommended installation method. And the package is in
the official repository, so makes sense to mention it in the README.
2023-11-17 23:13:40 +05:30
Michael Yang
f7f6d6c693 Update examples/jupyter-notebook/ollama.ipynb
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-17 09:30:07 -08:00
Michael Yang
a3053b66d2 add jupyter notebook example 2023-11-17 09:30:07 -08:00
Michael Yang
c82ead4d01 faq: fix heading and add more details 2023-11-17 09:02:17 -08:00
Michael Yang
90860b6a7e update faq (#1176) 2023-11-17 11:42:58 -05:00
Jeffrey Morgan
81092147c4 remove unnecessary -X POST from example curl commands 2023-11-17 09:50:38 -05:00
Jeffrey Morgan
92656a74b7 Use llama2 as the model in api.md 2023-11-17 07:17:51 -05:00
Jeffrey Morgan
41434a7cdc build intel mac with correct binary and compile flags 2023-11-16 22:14:51 -05:00
Michael Yang
71687ab809 Merge pull request #1164 from jmorganca/mxyng/faq
update faq
2023-11-16 17:20:18 -08:00
Michael Yang
d8842b4d4b update faq 2023-11-16 17:07:36 -08:00
Michael Yang
32add8577d placeholder environment variables 2023-11-16 16:57:39 -08:00
Michael Yang
585f9c01fa Merge pull request #1160 from jmorganca/mxyng/faq
update faq
2023-11-16 16:48:51 -08:00
Michael Yang
c13bde962d Update docs/faq.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-11-16 16:48:38 -08:00
Michael Yang
ee307937fd update faq 2023-11-16 16:46:43 -08:00
Matt Williams
ab6639bc47 Merge pull request #1074 from jmorganca/mattw/loganalysisexample
Log Analysis Example
2023-11-16 16:33:07 -08:00
Matt Williams
fefae84c06 example: function calling
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-11-16 16:26:29 -08:00
Jeffrey Morgan
dbe6e77472 Update README.md 2023-11-16 16:46:38 -05:00
Bruce MacDonald
4b3f4bc7d9 return failure details when unauthorized to push (#1131)
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-11-16 16:44:18 -05:00
Michael Yang
a5ccf742c1 fix cross repo mounts 2023-11-16 16:33:30 -05:00
Michael Yang
e33ef391cd fix push scope error for inherited model 2023-11-16 16:33:30 -05:00
yanndegat
75295b9528 install: fix enable contrib on debian 12 (#1151)
On debian 12, sources definitions have moved from
/etc/apt/sources.list to /etc/apt/sources.list.d/debian.sources
2023-11-16 15:53:06 -05:00
Matt Williams
db5ef3004c Merge pull request #1079 from jmorganca/mattw/jsonexample
Add example using JSON format output
2023-11-16 09:13:34 -08:00
Michael Yang
b5f158f046 add faq for proxies (#1147) 2023-11-16 11:43:37 -05:00
Piero Savastano
30141b42e9 Add Cheshire Cat to community integrations (#1124) 2023-11-16 11:30:54 -05:00
Dane Madsen
5f301ece1d Add Maid to Community Integrations (#1120) 2023-11-16 11:27:53 -05:00
Michael Yang
77954bea0e Merge pull request #898 from jmorganca/mxyng/build-context
create remote models
2023-11-15 16:41:12 -08:00
Michael Yang
54f92f01cb update docs 2023-11-15 15:28:15 -08:00
Michael
30ae6e731e Update randomaddresses.py 2023-11-15 18:24:50 -05:00
Michael
b28a30f7ba Update examples/python-json-datagenerator/predefinedschema.py
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-15 18:23:36 -05:00
Jeffrey Morgan
ecd71347ab Update faq.md 2023-11-15 18:17:13 -05:00
Jeffrey Morgan
8ee4cbea0f Remove table of contents in faq.md 2023-11-15 18:16:27 -05:00
Michael Yang
652d90e1c7 Update server/images.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-15 15:16:23 -08:00
Michael Yang
bc22d5a38b no blob response 2023-11-15 15:16:23 -08:00
Michael Yang
71d71d0988 update docs 2023-11-15 15:16:23 -08:00
Michael Yang
1901044b07 use checksum reference 2023-11-15 15:16:23 -08:00
Michael Yang
d660eebf22 fix create from model tag 2023-11-15 15:16:23 -08:00
Michael Yang
cac11c9137 update api docs 2023-11-15 15:16:23 -08:00
Michael Yang
a07c935d34 ignore non blobs 2023-11-15 15:16:23 -08:00
Michael Yang
1552cee59f client create modelfile 2023-11-15 15:16:23 -08:00
Michael Yang
3ca56b5ada add create modelfile field 2023-11-15 15:16:23 -08:00
Michael Yang
b0d14ed51c refactor create model 2023-11-15 15:16:23 -08:00
Matt Williams
f61f340279 FAQ: answer a few faq questions (#1128)
* faq: does ollama share my prompts

Signed-off-by: Matt Williams <m@technovangelist.com>

* faq: ollama and openai

Signed-off-by: Matt Williams <m@technovangelist.com>

* faq: vscode plugins

Signed-off-by: Matt Williams <m@technovangelist.com>

* faq: send a doc to Ollama

Signed-off-by: Matt Williams <m@technovangelist.com>

* extra spacing

Signed-off-by: Matt Williams <m@technovangelist.com>

* Update faq.md

* Update faq.md

---------

Signed-off-by: Matt Williams <m@technovangelist.com>
Co-authored-by: Michael <mchiang0610@users.noreply.github.com>
2023-11-15 18:05:13 -05:00
Michael Yang
686f85d6ca Merge pull request #1132 from jmorganca/mxyng/human-bytes
replace go-humanize with format.HumanBytes
2023-11-15 09:46:21 -08:00
bnodnarb
85951d25ef Created tutorial for running Ollama on NVIDIA Jetson devices (#1098) 2023-11-15 12:32:37 -05:00
Dane Madsen
779e196ef6 Merge branch 'jmorganca:main' into main 2023-11-15 21:38:07 +10:00
Michael Yang
01ea6002c4 replace go-humanize with format.HumanBytes 2023-11-14 14:57:41 -08:00
Jeffrey Morgan
423862042a treat ollama run model < file as entire prompt, not prompt-per-line (#1126)
Previously, `ollama run` treated a non-terminal stdin (such as `ollama run model < file`) as containing one prompt per line. To run inference on a multi-line prompt, the only non-API workaround was to run `ollama run` interactively and wrap the prompt in `"""..."""`.

Now, `ollama run` treats a non-terminal stdin as containing a single prompt. For example, if `myprompt.txt` is a multi-line file, then `ollama run model < myprompt.txt` would treat `myprompt.txt`'s entire contents as the prompt.

Co-authored-by: Quinn Slack <quinn@slack.org>
2023-11-14 16:42:21 -05:00
Bruce MacDonald
df18486c35 Move /generate format to optional parameters (#1127)
This field is optional and should be under the `Advanced parameters` header
2023-11-14 16:12:30 -05:00
Jeffrey Morgan
4e612a2e92 use stdout fd for terminal size (#1125) 2023-11-14 16:09:09 -05:00
Matt Williams
47ffb81db7 Update examples/python-json-datagenerator/readme.md
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-14 10:33:34 -08:00
Matt Williams
69795d2db0 Update examples/python-json-datagenerator/readme.md
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-14 10:33:16 -08:00
Matt Williams
acde0819d9 Update examples/python-json-datagenerator/randomaddresses.py
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-14 10:33:02 -08:00
Matt Williams
f748331aa3 Update examples/python-json-datagenerator/predefinedschema.py
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-14 10:32:45 -08:00
Matt Williams
f4edc302a8 Update examples/python-loganalysis/readme.md
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-14 10:31:22 -08:00
Matt Williams
64b7e0c218 Update examples/python-loganalysis/loganalysis.py
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-14 10:31:05 -08:00
Matt Williams
eced0d52ab Update examples/python-loganalysis/loganalysis.py
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-14 10:30:30 -08:00
Matt Williams
96bf9cafa7 Update examples/python-loganalysis/loganalysis.py
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-14 10:30:17 -08:00
Jeffrey Morgan
6e0f686afa --format json should work in interactive mode 2023-11-14 10:22:03 -05:00
Dane Madsen
c1a5220860 Update README.md 2023-11-14 15:31:31 +10:00
Dane Madsen
3b15175a70 Add maid to community integrations 2023-11-14 15:30:03 +10:00
Jeffrey Morgan
c1844bbee2 add json mode to cli (#1095) 2023-11-13 21:54:02 -05:00
Huy Le
cb745965ce adding ollama.nvim for visibility (#1115) 2023-11-13 17:00:17 -05:00
Enrico Ros
8d29b6a2b6 New big-AGI integration (#1078)
* New big-AGI integration

Ollama works great in big-AGI, and this document explains how to link the two projects.

* Update README.md
2023-11-13 16:59:00 -05:00
Ilya Breitburg
724aa64bee Add Dart library to README.md (#1106) 2023-11-13 14:50:42 -05:00
Michael Yang
d91c103e74 Merge pull request #1055 from dansreis/946-fix-incorrect-base-model-name
Fixed incorrect base model name
2023-11-13 08:42:55 -08:00
Kevin Hermawan
98ec7d81e3 Add OllamaKit to the community integrations (#1085) 2023-11-11 14:41:42 -08:00
Matt Williams
b6817a83d8 Add gif and finish readme
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-11-10 16:41:48 -06:00
Matt Williams
73f3448ede add example showing use of JSON format
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-11-10 16:33:56 -06:00
Daniel Reis
7c438f2c53 Replaced method 2023-11-10 20:22:03 +00:00
Daniel Reis
6e46338d44 Reverting previous changes 2023-11-10 20:21:35 +00:00
Jeffrey Morgan
cdddd3df65 add format to example python client 2023-11-10 10:22:21 -08:00
Daniel Hiltgen
afa61bdf45 Merge pull request #1075 from jmorganca/dhiltgen/unexpected-eof
Resume chunk download on UnexpectedEOF errors
2023-11-10 08:48:27 -08:00
Daniel Hiltgen
cc54a416c6 Resume chunk download on UnexpectedEOF errors
If the chunk download is interrupted, resume from where we left off
2023-11-10 08:29:42 -08:00
Matt Williams
c819d7f68a Merge pull request #955 from jmorganca/mattw/example-bash-compare
docs: add examples using bash to compare models
2023-11-10 08:59:32 -06:00
Matt Williams
e4f59ba073 better streaming plus gif
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-11-10 08:55:17 -06:00
Matt Williams
5de568bffe Add a simple log analysis example
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-11-10 08:28:52 -06:00
Jeffrey Morgan
5cba29b9d6 JSON mode: add `"format" as an api parameter (#1051)
* add `"format": "json"` as an API parameter
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-09 16:44:02 -08:00
Daniel Reis
d17730356a Removed inline parse model path 2023-11-09 22:44:26 +00:00
Daniel Reis
32d79a6eea Using 'GetShortTagname' method instead 2023-11-09 22:40:37 +00:00
Bruce MacDonald
5b39503bcd document specifying multiple stop params (#1061) 2023-11-09 13:16:26 -08:00
Bruce MacDonald
1ae84bc2a2 skip gpu if less than 2GB VRAM are available (#1059) 2023-11-09 13:16:16 -08:00
Bruce MacDonald
db8bf336fc Update README.md 2023-11-09 12:53:24 -08:00
Nick Anderson
d77e094a90 Added gptel to list of integrations (#1062) 2023-11-09 12:52:36 -08:00
Matt Williams
dd3dc47ddb Merge pull request #992 from aashish2057/aashish2057/langchainjs_doc_update 2023-11-09 05:08:31 -08:00
Michael Yang
c5e1bbabda instead of static number of parameters for each model family, get the real number from the tensors (#1022)
* parse tensor info

* refactor decoder

* return actual parameter count

* explicit rounding

* s/Human/HumanNumber/
2023-11-08 17:55:46 -08:00
Bruce MacDonald
a49d6acc1e add a complete /generate options example (#1035) 2023-11-08 16:44:36 -08:00
Moritz Poldrack
6e9bcdb9b3 progressbar: make start and end seamless (#1042) 2023-11-08 16:42:40 -08:00
Matt Williams
13086363bd Update as per bmacd
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-11-08 18:09:05 -06:00
Bruce MacDonald
ec2a31e9b3 support raw generation requests (#952)
- add the optional `raw` generate request parameter to bypass prompt formatting and response context
-add raw request to docs
2023-11-08 14:05:02 -08:00
Amith Koujalgi
ec84c02d54 Add Ollama4j Java library to the list of community libraries (#1044) 2023-11-08 11:04:32 -08:00
Kevin Hermawan
2a88b66bc9 Add Ollamac to community integrations (#1043) 2023-11-08 11:01:09 -08:00
Jeffrey Morgan
2d0faea96c clean up README.md 2023-11-08 00:03:29 -08:00
Jeffrey Morgan
637142181a clean up README.md 2023-11-07 23:52:31 -08:00
Matt Williams
bcbff421c9 Merge pull request #1023 from jmorganca/mattw/wherearemodelsfaq 2023-11-07 17:59:54 -08:00
thealhu
1359d6cf3b Fix sudo variable in install.sh (#1034)
It was forgotten to replace sudo at one place with the variable for sudo.
2023-11-07 09:59:57 -08:00
Omar Magdy
6e2d0224d9 Added logseq ollama plugin (#1029) 2023-11-07 09:58:13 -08:00
Ikko Eltociear Ashimine
921406f721 Update client.py (#1026)
recieve -> receive
2023-11-07 09:55:47 -08:00
Michael Yang
c7047d7353 Merge pull request #959 from jmorganca/mxyng/example-k8s 2023-11-07 10:43:21 -06:00
Matt Williams
1d155caba3 docs: clarify where the models are stored in the faq
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-11-06 14:38:49 -08:00
Michael Yang
866324b9a5 Merge pull request #943 from tjbck/patch-1
doc: categorised community integrations + added ollama-webui
2023-11-06 11:35:39 -08:00
Michael Yang
145e060855 Apply suggestions from code review
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-06 11:32:23 -08:00
Michael Yang
146072113d Merge pull request #993 from jmorganca/mxyng/cleanup
cleanup upload and download errors
2023-11-06 11:32:12 -08:00
Timothy Jaeryang Baek
33d31d1b56 Merge branch 'main' into patch-1 2023-11-06 14:27:02 -05:00
Dr. David A. Kunz
274c6cbf4c Added gen.nvim to community integrations (#996) 2023-11-06 10:51:41 -08:00
Elton Renda
7ebbd89bbf add hass-ollama-conversation (#999) 2023-11-06 10:50:35 -08:00
Lars Grammel
9079b1bb6d Add ModelFusion community integration (#1020) 2023-11-06 10:46:16 -08:00
Timothy Jaeryang Baek
6febde7200 Merge branch 'main' into patch-1 2023-11-04 19:12:18 -05:00
pepperoni21
325cfcd9ff Added ollama-rs to community integrations (#995)
Co-authored-by: pepperoni21 <pepperoni2100@gmail.com>
2023-11-04 14:51:29 -07:00
Jeffrey Morgan
639d0fd070 Update README.md 2023-11-04 12:24:24 -07:00
Jeffrey Morgan
e21579a0f1 Restore system prompt on requests 2023-11-03 17:26:45 -07:00
Jeffrey Morgan
c44b619428 remove unused fmt.Println 2023-11-03 17:24:58 -07:00
Michael Yang
434a6f9d46 return last error 2023-11-03 16:49:51 -07:00
aashish2057
b13586cc72 update langchainjs doc 2023-11-03 18:45:19 -05:00
Jeffrey Morgan
17678b7225 Restore system prompt on requests and default num_keep to 0 2023-11-03 13:25:25 -07:00
Michael Yang
84725ec7e3 refactor part reset 2023-11-03 09:20:32 -07:00
Bruce MacDonald
6109bebba6 reformat api docs for more examples (#972) 2023-11-03 10:57:00 -04:00
Noah Gitsham
8ae8c9fa8c Remove duplicate "install" in GPU support warning (#984) 2023-11-03 00:45:14 -07:00
Noah Gitsham
f39daff461 Add missing "be" to GPU support warning message (#983) 2023-11-02 18:37:12 -07:00
Jeffrey Morgan
c50b01bc21 check request.Context for initial system prompt 2023-11-02 18:17:00 -07:00
Bruce MacDonald
b9dc875401 remove modelfile context deprecated in v0.0.7 (#974) 2023-11-02 20:52:56 -04:00
Jeffrey Morgan
06589a3b30 Set NumKeep to 4 by default (#982) 2023-11-02 17:26:11 -07:00
Michael Yang
1fd511e661 Merge pull request #975 from jmorganca/mxyng/downloads
update downloads to use retry wrapper
2023-11-02 16:12:48 -07:00
Michael Yang
c01bbe94fd Merge pull request #979 from jmorganca/mxyng/num-keep
update default NumKeep
2023-11-02 15:48:44 -07:00
Jeffrey Morgan
1beb5645a9 only use system prompt if context is not provided (#978) 2023-11-02 15:48:02 -07:00
Michael Yang
6db3691b8f update default NumKeep 2023-11-02 15:47:35 -07:00
Michael Yang
fe5a872444 fix upload 2023-11-02 13:25:58 -07:00
Michael Yang
d39709260f download with retry 2023-11-02 13:16:11 -07:00
Michael Yang
60bb3c03a1 use http.Method 2023-11-02 13:12:45 -07:00
Jeffrey Morgan
2e53704685 default rope params to 0 for new models (#968) 2023-11-02 08:41:30 -07:00
Michael Yang
527f9a7975 Merge pull request #966 from jmorganca/mxyng/fix-log 2023-11-01 17:49:10 -07:00
Michael Yang
c4cc738cbf fix log 2023-11-01 17:18:11 -07:00
Michael Yang
2c6189f4fe Merge pull request #750 from jmorganca/mxyng/concurrent-uploads
concurrent uploads
2023-11-01 15:00:01 -07:00
Michael Yang
b99c291f47 fly example 2023-11-01 14:58:20 -07:00
Michael Yang
dccac8c8fa k8s example 2023-11-01 14:52:58 -07:00
Michael Yang
c05ab9a86e Merge pull request #965 from jmorganca/mxyng/go-mod-tidy
go mod tidy
2023-11-01 11:55:43 -07:00
Michael Yang
f42f3d9b27 go fmt 2023-11-01 11:55:08 -07:00
Michael Yang
341fb7e35f go mod tidy 2023-11-01 11:54:25 -07:00
Michael
f31961637f Update README.md 2023-11-01 12:20:55 -04:00
Michael Yang
ec3614812a Merge pull request #960 from jmorganca/mxyng/fix-tautology 2023-11-01 08:30:49 -07:00
Michael Yang
f14969314a Merge pull request #958 from jmorganca/mxyng/append-ld-library-path 2023-11-01 08:30:38 -07:00
Bruce MacDonald
1fb9288661 notify that the ollama api is available after linux install (#954) 2023-11-01 11:28:26 -04:00
Matt Williams
01a03caa20 Merge pull request #956 from jmorganca/mattw/apidocupdate 2023-10-31 21:43:11 -07:00
Michael Yang
bf6786bb39 fix tautology 2023-10-31 20:49:48 -07:00
Michael Yang
642128b75a append LD_LIBRARY_PATH 2023-10-31 15:54:49 -07:00
Matt Williams
f21bd6210d docs: clarify and clean up API docs
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-31 13:11:33 -07:00
Matt Williams
80362fedce better readme
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-31 12:40:46 -07:00
Matt Williams
5757925060 add a gif
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-31 11:52:01 -07:00
Michael
4512301756 Update README.md 2023-10-31 13:25:36 -04:00
Matt Williams
2236a93efc docs: add examples using bash to compare models
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-31 09:12:39 -07:00
Matt Williams
ad88799411 Merge pull request #949 from jmorganca/matt/fixPrivateGPT
fix: private gpt example was broken due to changes in chroma
2023-10-30 17:17:00 -07:00
Bruce MacDonald
0818b5e318 readline windows terminal support (#950)
- update the readline package to have basic support on windows, this is not full feature parity with the unix cli yet
2023-10-30 16:18:12 -04:00
Matt Williams
1df6100c77 Update examples/langchain-python-rag-privategpt/privateGPT.py
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-10-30 12:48:17 -07:00
Matt Williams
5c48fe1fb0 Update examples/langchain-python-rag-privategpt/constants.py
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-10-30 12:47:56 -07:00
Dirk Loss
874bb31986 Fix conversion command for gptneox (#948) 2023-10-30 14:34:29 -04:00
Matt Williams
f7856a57eb fix: private gpt example was broken due to changes in chroma
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-30 10:56:25 -07:00
Bruce MacDonald
f9a4281124 clean up: remove server functions from client (#937) 2023-10-30 11:10:18 -04:00
Timothy Jaeryang Baek
96da0792e6 doc: OllamaSharp for .NET moved to libraries 2023-10-28 16:18:38 -05:00
Timothy Jaeryang Baek
95d24262fc doc: categorised community integrations + added web-ui 2023-10-28 16:02:13 -05:00
Jeffrey Morgan
8d03bd7b54 remove +build directive in term.go 2023-10-28 09:56:03 -07:00
Jeffrey Morgan
9ec16f0f03 fix formatting when exiting ollama run 2023-10-27 21:26:23 -07:00
Jeffrey Morgan
57a58db1b0 history: update pos after compact 2023-10-27 20:38:03 -07:00
Jeffrey Morgan
2d75a4537c close input channel when receiving io.EOF 2023-10-27 20:26:04 -07:00
Jeffrey Morgan
4748609611 Don't quit ioloop on NUL character (#940)
* dont quit ioloop on 0 rune

* check for closed channel

* remove unused error on `Close()`
2023-10-27 20:01:48 -07:00
Jeffrey Morgan
c0dcea1398 Update faq.md 2023-10-27 18:29:00 -07:00
Michael Yang
115fc56eb7 calculate and verify md5 checksum 2023-10-27 17:07:33 -07:00
Michael Yang
186f685224 retry PUT 2023-10-27 17:07:33 -07:00
Michael Yang
12efcbb057 comments 2023-10-27 17:07:33 -07:00
Michael Yang
4e09aab8b9 concurrent uploads 2023-10-27 17:07:33 -07:00
Jeffrey Morgan
3a1ed9ff70 restore building runner with AVX on by default (#900) 2023-10-27 12:13:44 -07:00
Bruce MacDonald
6d283882b1 catch insufficient permissions nvidia err (#934) 2023-10-27 12:42:40 -04:00
Bruce MacDonald
5c3491f425 allow for a configurable ollama model storage directory (#897)
* allow for a configurable ollama models directory

- set OLLAMA_MODELS in the environment that ollama is running in to change where model files are stored
- update docs

Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>
Co-Authored-By: Jay Nakrani <dhananjaynakrani@gmail.com>
Co-Authored-By: Akhil Acharya <akhilcacharya@gmail.com>
Co-Authored-By: Sasha Devol <sasha.devol@protonmail.com>
2023-10-27 10:19:59 -04:00
James Braza
e5d1ce4dde Tweaks to README.md (#906)
* Mentioned Docker Hub in docs
* Consolidated brew installs to one line
2023-10-27 00:10:23 -07:00
Bruce MacDonald
2665f3c28e offload 75% of available vram to improve stability (#921) 2023-10-26 20:49:55 -04:00
Patrick Devine
a79f030e75 add bracketed paste mode (#922) 2023-10-26 15:57:00 -07:00
Michael Yang
9bc5864a03 Merge pull request #918 from jmorganca/mxyng/fix-out-of-space
fix(download): no retry when out of space
2023-10-26 12:24:20 -07:00
Michael Yang
b88cc0fac9 Merge pull request #916 from jmorganca/mxyng/fix-client-host
fix(client): trim trailing slash
2023-10-26 12:24:12 -07:00
Patrick Devine
5b2cf16397 fix docker build annotations (#917) 2023-10-26 12:00:33 -07:00
Michael Yang
910816a532 fix(download): no retry when out of space 2023-10-26 11:34:07 -07:00
Michael Yang
28c3f288e2 client: fix trailing slash 2023-10-26 11:09:38 -07:00
Patrick Devine
deeac961bb new readline library (#847) 2023-10-25 16:41:18 -07:00
Jeffrey Morgan
49443e7da5 fix typo in README.md 2023-10-25 16:19:27 -07:00
Ajay Kemparaj
bb8464c0d2 update golang.org/x/net fixes CVE-2023-3978,CVE-2023-39325,CVE-2023-44487 (#855) 2023-10-25 16:17:24 -07:00
Michael Yang
daa5bb4473 Merge pull request #907 from jmorganca/mxyng/linux
update linux.md
2023-10-25 15:03:34 -07:00
Michael Yang
92119de9d8 update linux.md 2023-10-25 14:57:50 -07:00
Michael Yang
53b0ba8d43 Merge pull request #893 from jmorganca/mxyng/update-faq
update faq
2023-10-24 16:02:35 -07:00
Michael Yang
db342691f9 Update docs/faq.md
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-10-24 13:59:33 -07:00
Bruce MacDonald
cecf83141e Linux uninstall instructions (#894) 2023-10-24 14:07:05 -04:00
Michael Yang
a5a2adf1ec update faq 2023-10-24 10:54:16 -07:00
Jeffrey Morgan
b0c9cd0f3b fix metal assertion errors 2023-10-24 00:32:36 -07:00
Jeffrey Morgan
77f61c6301 update submodule commit 2023-10-24 00:30:27 -07:00
Jeffrey Morgan
f3604534e5 update submodule commit 2023-10-23 23:59:12 -07:00
Jeffrey Morgan
914428351a Update import.md 2023-10-23 17:44:53 -07:00
Jeffrey Morgan
9afea9e3b9 Update import.md
Separate GGUF and PyTorch guides
2023-10-23 17:42:17 -07:00
Bruce MacDonald
c039432b5c add current user to ollama group on install (#772) 2023-10-23 17:06:31 -04:00
Michael Yang
c345b4ca7c Merge pull request #884 from jmorganca/mxyng/update-submodules
bump submodules
2023-10-23 11:27:38 -07:00
Michael Yang
0c7a00a264 bump submodules
pin to 9e70cc03229df19ca2d28ce23cc817198f897278 for now since
438c2ca83045a00ef244093d27e9ed41a8cb4ea9 is breaking
2023-10-23 11:17:59 -07:00
Michael Yang
36c160f1c3 Merge pull request #881 from jmorganca/mxyng/ggufv3
ggufv3
2023-10-23 10:50:45 -07:00
Michael Yang
b66bcaa582 Merge pull request #883 from jmorganca/mxyng/logs
update default log target
2023-10-23 10:50:29 -07:00
Michael Yang
c9167494cb update default log target 2023-10-23 10:44:50 -07:00
Michael Yang
125d0a013a ggufv3
ggufv3 adds support for big endianness, mainly for s390x architecture.
while that's not currently supported for ollama, the change is simple.

loosen version check to be more forward compatible. unless specified,
gguf versions other v1 will be decoded into v2.
2023-10-23 09:35:49 -07:00
Richard Awoyemi
ba2da6ceaa Added a minimalist React UI for Ollama models to the community contributions.md (#870) 2023-10-23 10:44:39 -04:00
1011 changed files with 748403 additions and 69014 deletions

View File

@@ -1,8 +1,11 @@
.vscode
ollama
app
macapp
dist
scripts
llm/llama.cpp/ggml
llm/llama.cpp/gguf
build
.env
.cache
test_data
.git

24
.gitattributes vendored Normal file
View File

@@ -0,0 +1,24 @@
llama/**/*.cpp linguist-vendored
llama/**/*.hpp linguist-vendored
llama/**/*.h linguist-vendored
llama/**/*.c linguist-vendored
llama/**/*.cu linguist-vendored
llama/**/*.cuh linguist-vendored
llama/**/*.m linguist-vendored
llama/**/*.metal linguist-vendored
ml/backend/**/*.c linguist-vendored
ml/backend/**/*.h linguist-vendored
ml/backend/**/*.cpp linguist-vendored
ml/backend/**/*.hpp linguist-vendored
ml/backend/**/*.cu linguist-vendored
ml/backend/**/*.cuh linguist-vendored
ml/backend/**/*.m linguist-vendored
ml/backend/**/*.metal linguist-vendored
ml/backend/**/CMakeLists.txt linguist-vendored
llama/build-info.cpp linguist-generated
ml/backend/ggml/ggml/src/ggml-metal/ggml-metal-embed.s linguist-generated
* text=auto
*.go text eol=lf

View File

@@ -0,0 +1,68 @@
name: Bug report
labels: [bug]
description: Something isn't working right.
body:
- type: textarea
id: description
attributes:
label: What is the issue?
description: What happened? What did you expect to happen?
validations:
required: true
- type: textarea
id: logs
attributes:
label: Relevant log output
description: Please copy and paste any relevant log output. See [Troubleshooting Guide](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) for details.
render: shell
validations:
required: false
- type: dropdown
id: os
attributes:
label: OS
description: Which operating system are you using?
multiple: true
options:
- Linux
- macOS
- Windows
- Docker
- WSL2
validations:
required: false
- type: dropdown
id: gpu
attributes:
label: GPU
description: Which GPU are you using?
multiple: true
options:
- Nvidia
- AMD
- Intel
- Apple
- Other
validations:
required: false
- type: dropdown
id: cpu
attributes:
label: CPU
description: Which CPU are you using?
multiple: true
options:
- Intel
- AMD
- Apple
- Other
validations:
required: false
- type: input
id: version
attributes:
label: Ollama version
description: What version of Ollama are you using? (`ollama --version`)
placeholder: e.g., 0.1.32
validations:
required: false

View File

@@ -0,0 +1,6 @@
---
name: Feature request
about: Request a new feature
labels: feature request
---

View File

@@ -0,0 +1,5 @@
---
name: Model request
about: Request support for a new model to be added to Ollama
labels: model request
---

8
.github/ISSUE_TEMPLATE/config.yml vendored Normal file
View File

@@ -0,0 +1,8 @@
blank_issues_enabled: true
contact_links:
- name: Help
url: https://discord.com/invite/ollama
about: Please join our Discord server for help using Ollama
- name: Troubleshooting
url: https://github.com/ollama/ollama/blob/main/docs/faq.md#faq
about: See the FAQ for common issues and solutions

24
.github/workflows/latest.yaml vendored Normal file
View File

@@ -0,0 +1,24 @@
name: latest
on:
release:
types: [released]
jobs:
update-latest:
environment: release
runs-on: linux
steps:
- uses: actions/checkout@v4
- name: Login to Docker Hub
uses: docker/login-action@v3
with:
username: ${{ vars.DOCKER_USER }}
password: ${{ secrets.DOCKER_ACCESS_TOKEN }}
- name: Tag images as latest
env:
PUSH: "1"
shell: bash
run: |
export "VERSION=${GITHUB_REF_NAME#v}"
./scripts/tag_latest.sh

479
.github/workflows/release.yaml vendored Normal file
View File

@@ -0,0 +1,479 @@
name: release
on:
push:
tags:
- 'v*'
env:
CGO_CFLAGS: '-O3'
CGO_CXXFLAGS: '-O3'
jobs:
setup-environment:
runs-on: ubuntu-latest
environment: release
outputs:
GOFLAGS: ${{ steps.goflags.outputs.GOFLAGS }}
steps:
- uses: actions/checkout@v4
- name: Set environment
id: goflags
run: |
echo GOFLAGS="'-ldflags=-w -s \"-X=github.com/ollama/ollama/version.Version=${GITHUB_REF_NAME#v}\" \"-X=github.com/ollama/ollama/server.mode=release\"'" >>$GITHUB_OUTPUT
darwin-build:
runs-on: macos-13
environment: release
needs: setup-environment
strategy:
matrix:
os: [darwin]
arch: [amd64, arm64]
env:
GOFLAGS: ${{ needs.setup-environment.outputs.GOFLAGS }}
steps:
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
with:
go-version-file: go.mod
- run: |
go build -o dist/ .
env:
GOOS: ${{ matrix.os }}
GOARCH: ${{ matrix.arch }}
CGO_ENABLED: 1
CGO_CPPFLAGS: '-mmacosx-version-min=11.3'
- if: matrix.arch == 'amd64'
run: |
cmake --preset CPU -DCMAKE_OSX_DEPLOYMENT_TARGET=11.3 -DCMAKE_SYSTEM_PROCESSOR=x86_64 -DCMAKE_OSX_ARCHITECTURES=x86_64
cmake --build --parallel --preset CPU
cmake --install build --component CPU --strip --parallel 8
- uses: actions/upload-artifact@v4
with:
name: build-${{ matrix.os }}-${{ matrix.arch }}
path: dist/*
darwin-sign:
runs-on: macos-13
environment: release
needs: darwin-build
steps:
- uses: actions/checkout@v4
- run: |
echo $MACOS_SIGNING_KEY | base64 --decode > certificate.p12
security create-keychain -p password build.keychain
security default-keychain -s build.keychain
security unlock-keychain -p password build.keychain
security import certificate.p12 -k build.keychain -P $MACOS_SIGNING_KEY_PASSWORD -T /usr/bin/codesign
security set-key-partition-list -S apple-tool:,apple:,codesign: -s -k password build.keychain
security set-keychain-settings -lut 3600 build.keychain
env:
MACOS_SIGNING_KEY: ${{ secrets.MACOS_SIGNING_KEY }}
MACOS_SIGNING_KEY_PASSWORD: ${{ secrets.MACOS_SIGNING_KEY_PASSWORD }}
- uses: actions/download-artifact@v4
with:
name: build-darwin-amd64
path: dist/darwin-amd64
- uses: actions/download-artifact@v4
with:
name: build-darwin-arm64
path: dist/darwin-arm64
- run: |
export VERSION=${GITHUB_REF_NAME#v}
./scripts/build_darwin.sh sign macapp
env:
APPLE_IDENTITY: ${{ secrets.APPLE_IDENTITY }}
APPLE_PASSWORD: ${{ secrets.APPLE_PASSWORD }}
APPLE_TEAM_ID: ${{ vars.APPLE_TEAM_ID }}
APPLE_ID: ${{ vars.APPLE_ID }}
SDKROOT: /Applications/Xcode_14.1.0.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk
DEVELOPER_DIR: /Applications/Xcode_14.1.0.app/Contents/Developer
- uses: actions/upload-artifact@v4
with:
name: dist-darwin
path: |
dist/Ollama-darwin.zip
dist/ollama-darwin.tgz
windows-depends:
strategy:
matrix:
os: [windows]
arch: [amd64]
preset: ['CPU']
include:
- os: windows
arch: amd64
preset: 'CUDA 11'
install: https://developer.download.nvidia.com/compute/cuda/11.3.1/local_installers/cuda_11.3.1_465.89_win10.exe
cuda-version: '11.3'
- os: windows
arch: amd64
preset: 'CUDA 12'
install: https://developer.download.nvidia.com/compute/cuda/12.8.0/local_installers/cuda_12.8.0_571.96_windows.exe
cuda-version: '12.8'
- os: windows
arch: amd64
preset: 'ROCm 6'
install: https://download.amd.com/developer/eula/rocm-hub/AMD-Software-PRO-Edition-24.Q4-WinSvr2022-For-HIP.exe
rocm-version: '6.2'
runs-on: ${{ matrix.arch == 'arm64' && format('{0}-{1}', matrix.os, matrix.arch) || matrix.os }}
environment: release
env:
GOFLAGS: ${{ needs.setup-environment.outputs.GOFLAGS }}
steps:
- name: Install system dependencies
run: |
choco install -y --no-progress ccache ninja
ccache -o cache_dir=${{ github.workspace }}\.ccache
- if: startsWith(matrix.preset, 'CUDA ') || startsWith(matrix.preset, 'ROCm ')
id: cache-install
uses: actions/cache/restore@v4
with:
path: |
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA
C:\Program Files\AMD\ROCm
key: ${{ matrix.install }}
- if: startsWith(matrix.preset, 'CUDA ')
name: Install CUDA ${{ matrix.cuda-version }}
run: |
$ErrorActionPreference = "Stop"
if ("${{ steps.cache-install.outputs.cache-hit }}" -ne 'true') {
Invoke-WebRequest -Uri "${{ matrix.install }}" -OutFile "install.exe"
$subpackages = @("cudart", "nvcc", "cublas", "cublas_dev") | Foreach-Object {"${_}_${{ matrix.cuda-version }}"}
Start-Process -FilePath .\install.exe -ArgumentList (@("-s") + $subpackages) -NoNewWindow -Wait
}
$cudaPath = (Resolve-Path "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\*").path
echo "$cudaPath\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
- if: startsWith(matrix.preset, 'ROCm')
name: Install ROCm ${{ matrix.rocm-version }}
run: |
$ErrorActionPreference = "Stop"
if ("${{ steps.cache-install.outputs.cache-hit }}" -ne 'true') {
Invoke-WebRequest -Uri "${{ matrix.install }}" -OutFile "install.exe"
Start-Process -FilePath .\install.exe -ArgumentList '-install' -NoNewWindow -Wait
}
$hipPath = (Resolve-Path "C:\Program Files\AMD\ROCm\*").path
echo "$hipPath\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
echo "CC=$hipPath\bin\clang.exe" | Out-File -FilePath $env:GITHUB_ENV -Append
echo "CXX=$hipPath\bin\clang++.exe" | Out-File -FilePath $env:GITHUB_ENV -Append
- if: matrix.preset == 'CPU'
run: |
echo "CC=clang.exe" | Out-File -FilePath $env:GITHUB_ENV -Append
echo "CXX=clang++.exe" | Out-File -FilePath $env:GITHUB_ENV -Append
- if: ${{ !cancelled() && steps.cache-install.outputs.cache-hit != 'true' }}
uses: actions/cache/save@v4
with:
path: |
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA
C:\Program Files\AMD\ROCm
key: ${{ matrix.install }}
- uses: actions/checkout@v4
- uses: actions/cache@v4
with:
path: ${{ github.workspace }}\.ccache
key: ccache-${{ matrix.os }}-${{ matrix.arch }}-${{ matrix.preset }}
- name: Build target "${{ matrix.preset }}"
run: |
Import-Module 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\Tools\Microsoft.VisualStudio.DevShell.dll'
Enter-VsDevShell -VsInstallPath 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise' -SkipAutomaticLocation -DevCmdArguments '-arch=x64 -no_logo'
cmake --preset "${{ matrix.preset }}"
cmake --build --parallel --preset "${{ matrix.preset }}"
cmake --install build --component "${{ startsWith(matrix.preset, 'CUDA ') && 'CUDA' || startsWith(matrix.preset, 'ROCm ') && 'HIP' || 'CPU' }}" --strip --parallel 8
env:
CMAKE_GENERATOR: Ninja
- uses: actions/upload-artifact@v4
with:
name: depends-${{ matrix.os }}-${{ matrix.arch }}-${{ matrix.preset }}
path: dist\*
windows-build:
strategy:
matrix:
os: [windows]
arch: [amd64, arm64]
runs-on: ${{ matrix.arch == 'arm64' && format('{0}-{1}', matrix.os, matrix.arch) || matrix.os }}
environment: release
needs: [setup-environment]
env:
GOFLAGS: ${{ needs.setup-environment.outputs.GOFLAGS }}
steps:
- name: Install AMD64 system dependencies
if: matrix.arch == 'amd64'
run: |
$ErrorActionPreference = "Stop"
Start-Process "C:\msys64\usr\bin\pacman.exe" -ArgumentList @("-S", "--noconfirm", "mingw-w64-clang-x86_64-gcc-compat", "mingw-w64-clang-x86_64-clang") -NoNewWindow -Wait
echo "C:\msys64\usr\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
echo "C:\msys64\clang64\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
- name: Install ARM64 system dependencies
if: matrix.arch == 'arm64'
run: |
$ErrorActionPreference = "Stop"
Set-ExecutionPolicy Bypass -Scope Process -Force
[System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072
iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))
echo "C:\ProgramData\chocolatey\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
choco install -y --no-progress git gzip
echo "C:\Program Files\Git\cmd" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
Invoke-WebRequest -Uri "https://github.com/mstorsjo/llvm-mingw/releases/download/20240619/llvm-mingw-20240619-ucrt-aarch64.zip" -OutFile "${{ runner.temp }}\llvm-mingw-ucrt-aarch64.zip"
Expand-Archive -Path ${{ runner.temp }}\llvm-mingw-ucrt-aarch64.zip -DestinationPath "C:\Program Files\"
$installPath=(Resolve-Path -Path "C:\Program Files\llvm-mingw-*-ucrt-aarch64").path
echo $installPath\bin | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
with:
go-version-file: go.mod
- run: |
go build -o dist/${{ matrix.os }}-${{ matrix.arch }}/ .
- if: matrix.arch == 'arm64'
run: |
Invoke-WebRequest -Uri "https://aka.ms/vs/17/release/vc_redist.arm64.exe" -OutFile "dist\windows-arm64\vc_redist.arm64.exe"
- run: |
$env:VERSION='${{ github.ref_name }}' -Replace "v(.*)", '$1'
& .\scripts\build_windows.ps1 buildApp
env:
VCToolsRedistDir: stub
- uses: actions/upload-artifact@v4
with:
name: build-${{ matrix.os }}-${{ matrix.arch }}
path: |
dist\${{ matrix.os }}-${{ matrix.arch }}\*.exe
dist\${{ matrix.os }}-${{ matrix.arch }}-app.exe
windows-sign:
runs-on: windows-2022
environment: release
needs: [windows-depends, windows-build]
steps:
- uses: actions/checkout@v4
- uses: google-github-actions/auth@v2
with:
project_id: ollama
credentials_json: ${{ secrets.GOOGLE_SIGNING_CREDENTIALS }}
- run: |
$ErrorActionPreference = "Stop"
Invoke-WebRequest -Uri "https://go.microsoft.com/fwlink/p/?LinkId=323507" -OutFile "${{ runner.temp }}\sdksetup.exe"
Start-Process "${{ runner.temp }}\sdksetup.exe" -ArgumentList @("/q") -NoNewWindow -Wait
Invoke-WebRequest -Uri "https://github.com/GoogleCloudPlatform/kms-integrations/releases/download/cng-v1.0/kmscng-1.0-windows-amd64.zip" -OutFile "${{ runner.temp }}\plugin.zip"
Expand-Archive -Path "${{ runner.temp }}\plugin.zip" -DestinationPath "${{ runner.temp }}\plugin\"
& "${{ runner.temp }}\plugin\*\kmscng.msi" /quiet
echo "${{ vars.OLLAMA_CERT }}" >ollama_inc.crt
- uses: actions/download-artifact@v4
with:
pattern: build-windows-*
path: dist\
merge-multiple: true
- uses: actions/download-artifact@v4
with:
pattern: depends-windows-amd64-*
path: dist\windows-amd64\
merge-multiple: true
- run: |
& .\scripts\build_windows.ps1 gatherDependencies sign buildInstaller distZip
env:
KEY_CONTAINER: ${{ vars.KEY_CONTAINER }}
- uses: actions/upload-artifact@v4
with:
name: dist-windows
path: |
dist\OllamaSetup.exe
dist\ollama-windows-*.zip
linux-build:
strategy:
matrix:
include:
- os: linux
arch: amd64
target: archive
- os: linux
arch: amd64
target: rocm
- os: linux
arch: arm64
target: archive
runs-on: ${{ matrix.arch == 'arm64' && format('{0}-{1}', matrix.os, matrix.arch) || matrix.os }}
environment: release
needs: setup-environment
env:
GOFLAGS: ${{ needs.setup-environment.outputs.GOFLAGS }}
steps:
- uses: actions/checkout@v4
- uses: docker/setup-buildx-action@v3
- uses: docker/build-push-action@v6
with:
context: .
platforms: ${{ matrix.os }}/${{ matrix.arch }}
target: ${{ matrix.target }}
build-args: |
GOFLAGS=${{ env.GOFLAGS }}
CGO_CFLAGS=${{ env.CGO_CFLAGS }}
CGO_CXXFLAGS=${{ env.CGO_CXXFLAGS }}
outputs: type=local,dest=dist/${{ matrix.os }}-${{ matrix.arch }}
cache-from: type=registry,ref=ollama/ollama:latest
cache-to: type=inline
- run: |
for COMPONENT in bin/* lib/ollama/*; do
case "$COMPONENT" in
bin/ollama) echo $COMPONENT >>ollama-${{ matrix.os }}-${{ matrix.arch }}.tar.in ;;
lib/ollama/*.so) echo $COMPONENT >>ollama-${{ matrix.os }}-${{ matrix.arch }}.tar.in ;;
lib/ollama/cuda_v11) echo $COMPONENT >>ollama-${{ matrix.os }}-${{ matrix.arch }}.tar.in ;;
lib/ollama/cuda_v12) echo $COMPONENT >>ollama-${{ matrix.os }}-${{ matrix.arch }}.tar.in ;;
lib/ollama/cuda_jetpack5) echo $COMPONENT >>ollama-${{ matrix.os }}-${{ matrix.arch }}-jetpack5.tar.in ;;
lib/ollama/cuda_jetpack6) echo $COMPONENT >>ollama-${{ matrix.os }}-${{ matrix.arch }}-jetpack6.tar.in ;;
lib/ollama/rocm) echo $COMPONENT >>ollama-${{ matrix.os }}-${{ matrix.arch }}-rocm.tar.in ;;
esac
done
working-directory: dist/${{ matrix.os }}-${{ matrix.arch }}
- run: |
for ARCHIVE in dist/${{ matrix.os }}-${{ matrix.arch }}/*.tar.in; do
tar c -C dist/${{ matrix.os }}-${{ matrix.arch }} -T $ARCHIVE --owner 0 --group 0 | pigz -9vc >$(basename ${ARCHIVE//.*/}.tgz);
done
- uses: actions/upload-artifact@v4
with:
name: dist-${{ matrix.os }}-${{ matrix.arch }}-${{ matrix.target }}
path: |
*.tgz
# Build each Docker variant (OS, arch, and flavor) separately. Using QEMU is unreliable and slower.
docker-build-push:
strategy:
matrix:
include:
- os: linux
arch: arm64
build-args: |
CGO_CFLAGS
CGO_CXXFLAGS
GOFLAGS
- os: linux
arch: amd64
build-args: |
CGO_CFLAGS
CGO_CXXFLAGS
GOFLAGS
- os: linux
arch: amd64
suffix: '-rocm'
build-args: |
CGO_CFLAGS
CGO_CXXFLAGS
GOFLAGS
FLAVOR=rocm
runs-on: ${{ matrix.arch == 'arm64' && format('{0}-{1}', matrix.os, matrix.arch) || matrix.os }}
environment: release
needs: setup-environment
env:
GOFLAGS: ${{ needs.setup-environment.outputs.GOFLAGS }}
steps:
- uses: actions/checkout@v4
- uses: docker/setup-buildx-action@v3
- uses: docker/login-action@v3
with:
username: ${{ vars.DOCKER_USER }}
password: ${{ secrets.DOCKER_ACCESS_TOKEN }}
- id: build-push
uses: docker/build-push-action@v6
with:
context: .
platforms: ${{ matrix.os }}/${{ matrix.arch }}
build-args: ${{ matrix.build-args }}
outputs: type=image,name=ollama/ollama,push-by-digest=true,name-canonical=true,push=true
cache-from: type=registry,ref=ollama/ollama:latest
cache-to: type=inline
- run: |
mkdir -p ${{ matrix.os }}-${{ matrix.arch }}
echo "${{ steps.build-push.outputs.digest }}" >${{ matrix.os }}-${{ matrix.arch }}-${{ matrix.suffix }}.txt
working-directory: ${{ runner.temp }}
- uses: actions/upload-artifact@v4
with:
name: digest-${{ matrix.os }}-${{ matrix.arch }}-${{ matrix.suffix }}
path: |
${{ runner.temp }}/${{ matrix.os }}-${{ matrix.arch }}-${{ matrix.suffix }}.txt
# Merge Docker images for the same flavor into a single multi-arch manifest
docker-merge-push:
strategy:
matrix:
suffix: ['', '-rocm']
runs-on: linux
environment: release
needs: [docker-build-push]
steps:
- uses: docker/login-action@v3
with:
username: ${{ vars.DOCKER_USER }}
password: ${{ secrets.DOCKER_ACCESS_TOKEN }}
- id: metadata
uses: docker/metadata-action@v4
with:
flavor: |
latest=false
suffix=${{ matrix.suffix }}
images: |
ollama/ollama
tags: |
type=ref,enable=true,priority=600,prefix=pr-,event=pr
type=semver,pattern={{version}}
- uses: actions/download-artifact@v4
with:
pattern: digest-*
path: ${{ runner.temp }}
merge-multiple: true
- run: |
docker buildx imagetools create $(echo '${{ steps.metadata.outputs.json }}' | jq -cr '.tags | map("-t", .) | join(" ")') $(cat *-${{ matrix.suffix }}.txt | xargs printf 'ollama/ollama@%s ')
docker buildx imagetools inspect ollama/ollama:${{ steps.metadata.outputs.version }}
working-directory: ${{ runner.temp }}
# Aggregate all the assets and ship a release
release:
needs: [darwin-sign, windows-sign, linux-build]
runs-on: linux
environment: release
permissions:
contents: write
env:
GH_TOKEN: ${{ github.token }}
steps:
- uses: actions/checkout@v4
- uses: actions/download-artifact@v4
with:
name: dist-darwin
path: dist
- uses: actions/download-artifact@v4
with:
name: dist-windows
path: dist
- uses: actions/download-artifact@v4
with:
pattern: dist-linux-*
path: dist
merge-multiple: true
- run: find . -type f -not -name 'sha256sum.txt' | xargs sha256sum | tee sha256sum.txt
working-directory: dist
- name: Create or update Release
run: |
RELEASE_VERSION="$(echo ${GITHUB_REF_NAME} | cut -f1 -d-)"
echo "Looking for existing release for ${RELEASE_VERSION}"
OLD_TAG=$(gh release ls --json name,tagName | jq -r ".[] | select(.name == \"${RELEASE_VERSION}\") | .tagName")
if [ -n "$OLD_TAG" ]; then
echo "Updating release ${RELEASE_VERSION} to point to new tag ${GITHUB_REF_NAME}"
gh release edit ${OLD_TAG} --tag ${GITHUB_REF_NAME}
else
echo "Creating new release ${RELEASE_VERSION} pointing to tag ${GITHUB_REF_NAME}"
gh release create ${GITHUB_REF_NAME} \
--title ${RELEASE_VERSION} \
--draft \
--generate-notes \
--prerelease
fi
echo "Uploading artifacts for tag ${GITHUB_REF_NAME}"
gh release upload ${GITHUB_REF_NAME} dist/* --clobber

241
.github/workflows/test.yaml vendored Normal file
View File

@@ -0,0 +1,241 @@
name: test
concurrency:
# For PRs, later CI runs preempt previous ones. e.g. a force push on a PR
# cancels running CI jobs and starts all new ones.
#
# For non-PR pushes, concurrency.group needs to be unique for every distinct
# CI run we want to have happen. Use run_id, which in practice means all
# non-PR CI runs will be allowed to run without preempting each other.
group: ${{ github.workflow }}-$${{ github.pull_request.number || github.run_id }}
cancel-in-progress: true
on:
pull_request:
paths:
- '**/*'
- '!docs/**'
- '!README.md'
jobs:
changes:
runs-on: ubuntu-latest
outputs:
changed: ${{ steps.changes.outputs.changed }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- id: changes
run: |
changed() {
local BASE=${{ github.event.pull_request.base.sha }}
local HEAD=${{ github.event.pull_request.head.sha }}
local MERGE_BASE=$(git merge-base $BASE $HEAD)
git diff-tree -r --no-commit-id --name-only "$MERGE_BASE" "$HEAD" \
| xargs python3 -c "import sys; from pathlib import Path; print(any(Path(x).match(glob) for x in sys.argv[1:] for glob in '$*'.split(' ')))"
}
echo changed=$(changed 'llama/llama.cpp/**' 'ml/backend/ggml/ggml/**') | tee -a $GITHUB_OUTPUT
linux:
needs: [changes]
if: needs.changes.outputs.changed == 'True'
strategy:
matrix:
include:
- preset: CPU
- preset: CUDA
container: nvidia/cuda:11.8.0-devel-ubuntu22.04
flags: '-DCMAKE_CUDA_ARCHITECTURES=87'
- preset: ROCm
container: rocm/dev-ubuntu-22.04:6.1.2
extra-packages: rocm-libs
flags: '-DAMDGPU_TARGETS=gfx1010 -DCMAKE_PREFIX_PATH=/opt/rocm'
runs-on: linux
container: ${{ matrix.container }}
steps:
- uses: actions/checkout@v4
- run: |
[ -n "${{ matrix.container }}" ] || sudo=sudo
$sudo apt-get update
$sudo apt-get install -y cmake ccache ${{ matrix.extra-packages }}
env:
DEBIAN_FRONTEND: noninteractive
- uses: actions/cache@v4
with:
path: /github/home/.cache/ccache
key: ccache-${{ runner.os }}-${{ runner.arch }}-${{ matrix.preset }}
- run: |
cmake --preset ${{ matrix.preset }} ${{ matrix.flags }}
cmake --build --preset ${{ matrix.preset }} --parallel
windows:
needs: [changes]
if: needs.changes.outputs.changed == 'True'
strategy:
matrix:
include:
- preset: CPU
- preset: CUDA
install: https://developer.download.nvidia.com/compute/cuda/11.3.1/local_installers/cuda_11.3.1_465.89_win10.exe
flags: '-DCMAKE_CUDA_ARCHITECTURES=80'
- preset: ROCm
install: https://download.amd.com/developer/eula/rocm-hub/AMD-Software-PRO-Edition-24.Q4-WinSvr2022-For-HIP.exe
flags: '-DAMDGPU_TARGETS=gfx1010'
runs-on: windows
steps:
- run: |
choco install -y --no-progress ccache ninja
ccache -o cache_dir=${{ github.workspace }}\.ccache
- if: matrix.preset == 'CUDA' || matrix.preset == 'ROCm'
id: cache-install
uses: actions/cache/restore@v4
with:
path: |
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA
C:\Program Files\AMD\ROCm
key: ${{ matrix.install }}
- if: matrix.preset == 'CUDA'
name: Install CUDA ${{ matrix.cuda-version }}
run: |
$ErrorActionPreference = "Stop"
if ("${{ steps.cache-install.outputs.cache-hit }}" -ne 'true') {
Invoke-WebRequest -Uri "${{ matrix.install }}" -OutFile "install.exe"
Start-Process -FilePath .\install.exe -ArgumentList (@("-s", "cudart_11.3", "nvcc_11.3", "cublas_11.3", "cublas_dev_11.3")) -NoNewWindow -Wait
}
$cudaPath = (Resolve-Path "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\*").path
echo "$cudaPath\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
- if: matrix.preset == 'ROCm'
name: Install ROCm ${{ matrix.rocm-version }}
run: |
$ErrorActionPreference = "Stop"
if ("${{ steps.cache-install.outputs.cache-hit }}" -ne 'true') {
Invoke-WebRequest -Uri "${{ matrix.install }}" -OutFile "install.exe"
Start-Process -FilePath .\install.exe -ArgumentList '-install' -NoNewWindow -Wait
}
$hipPath = (Resolve-Path "C:\Program Files\AMD\ROCm\*").path
echo "$hipPath\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
echo "CC=$hipPath\bin\clang.exe" | Out-File -FilePath $env:GITHUB_ENV -Append
echo "CXX=$hipPath\bin\clang++.exe" | Out-File -FilePath $env:GITHUB_ENV -Append
- if: ${{ !cancelled() && steps.cache-install.outputs.cache-hit != 'true' }}
uses: actions/cache/save@v4
with:
path: |
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA
C:\Program Files\AMD\ROCm
key: ${{ matrix.install }}
- uses: actions/checkout@v4
- uses: actions/cache@v4
with:
path: ${{ github.workspace }}\.ccache
key: ccache-${{ runner.os }}-${{ runner.arch }}-${{ matrix.preset }}
- run: |
Import-Module 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\Tools\Microsoft.VisualStudio.DevShell.dll'
Enter-VsDevShell -VsInstallPath 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise' -SkipAutomaticLocation -DevCmdArguments '-arch=x64 -no_logo'
cmake --preset "${{ matrix.preset }}" ${{ matrix.flags }}
cmake --build --parallel --preset "${{ matrix.preset }}"
env:
CMAKE_GENERATOR: Ninja
go_mod_tidy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: check that 'go mod tidy' is clean
run: go mod tidy --diff || (echo "Please run 'go mod tidy'." && exit 1)
test:
strategy:
matrix:
os: [ubuntu-latest, macos-latest, windows-latest]
runs-on: ${{ matrix.os }}
env:
CGO_ENABLED: '1'
GOEXPERIMENT: 'synctest'
steps:
- name: checkout
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # 4.2.2
- name: cache restore
uses: actions/cache/restore@1bd1e32a3bdc45362d1e726936510720a7c30a57 # v4.2.0
with:
# Note: unlike the other setups, this is only grabbing the mod download
# cache, rather than the whole mod directory, as the download cache
# contains zips that can be unpacked in parallel faster than they can be
# fetched and extracted by tar
path: |
~/.cache/go-build
~/go/pkg/mod/cache
~\AppData\Local\go-build
# NOTE: The -3- here should be incremented when the scheme of data to be
# cached changes (e.g. path above changes).
key: ${{ github.job }}-${{ runner.os }}-${{ matrix.goarch }}-${{ matrix.buildflags }}-go-3-${{ hashFiles('**/go.sum') }}-${{ github.run_id }}
restore-keys: |
${{ github.job }}-${{ runner.os }}-${{ matrix.goarch }}-${{ matrix.buildflags }}-go-3-${{ hashFiles('**/go.sum') }}
${{ github.job }}-${{ runner.os }}-${{ matrix.goarch }}-${{ matrix.buildflags }}-go-3-
- name: Setup Go
uses: actions/setup-go@v5
with:
# The caching strategy of setup-go is less than ideal, and wastes
# time by not saving artifacts due to small failures like the linter
# complaining, etc. This means subsequent have to rebuild their world
# again until all checks pass. For instance, if you mispell a word,
# you're punished until you fix it. This is more hostile than
# helpful.
cache: false
go-version-file: go.mod
# It is tempting to run this in a platform independent way, but the past
# shows this codebase will see introductions of platform specific code
# generation, and so we need to check this per platform to ensure we
# don't abuse go generate on specific platforms.
- name: check that 'go generate' is clean
if: always()
run: |
go generate ./...
git diff --name-only --exit-code || (echo "Please run 'go generate ./...'." && exit 1)
- name: go test
if: always()
run: go test -count=1 -benchtime=1x ./...
# TODO(bmizerany): replace this heavy tool with just the
# tools/checks/binaries we want and then make them all run in parallel
# across jobs, not on a single tiny vm on Github Actions.
- uses: golangci/golangci-lint-action@v6
with:
args: --timeout 10m0s -v
- name: cache save
# Always save the cache, even if the job fails. The artifacts produced
# during the building of test binaries are not all for naught. They can
# be used to speed up subsequent runs.
if: always()
uses: actions/cache/save@1bd1e32a3bdc45362d1e726936510720a7c30a57 # v4.2.0
with:
# Note: unlike the other setups, this is only grabbing the mod download
# cache, rather than the whole mod directory, as the download cache
# contains zips that can be unpacked in parallel faster than they can be
# fetched and extracted by tar
path: |
~/.cache/go-build
~/go/pkg/mod/cache
~\AppData\Local\go-build
# NOTE: The -3- here should be incremented when the scheme of data to be
# cached changes (e.g. path above changes).
key: ${{ github.job }}-${{ runner.os }}-${{ matrix.goarch }}-${{ matrix.buildflags }}-go-3-${{ hashFiles('**/go.sum') }}-${{ github.run_id }}
patches:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Verify patches apply cleanly and do not change files
run: |
make -f Makefile.sync clean checkout apply-patches sync
git diff --compact-summary --exit-code

12
.gitignore vendored
View File

@@ -4,5 +4,13 @@
.venv
.swp
dist
ollama
ggml-metal.metal
build
.cache
*.exe
.idea
test_data
*.crt
__debug_bin*
llama/build
llama/vendor
/ollama

10
.gitmodules vendored
View File

@@ -1,10 +0,0 @@
[submodule "llm/llama.cpp/ggml"]
path = llm/llama.cpp/ggml
url = https://github.com/ggerganov/llama.cpp.git
ignore = dirty
shallow = true
[submodule "llm/llama.cpp/gguf"]
path = llm/llama.cpp/gguf
url = https://github.com/ggerganov/llama.cpp.git
ignore = dirty
shallow = true

41
.golangci.yaml Normal file
View File

@@ -0,0 +1,41 @@
run:
timeout: 5m
linters:
enable:
- asasalint
- bidichk
- bodyclose
- containedctx
- gocheckcompilerdirectives
- gofmt
- gofumpt
- gosimple
- govet
- ineffassign
- intrange
- makezero
- misspell
- nilerr
- nolintlint
- nosprintfhostport
- staticcheck
- tenv
- unconvert
- wastedassign
- whitespace
disable:
- usestdlibvars
- errcheck
linters-settings:
staticcheck:
checks:
- all
- -SA1019 # omit Deprecated check
severity:
default-severity: error
rules:
- linters:
- gofmt
- goimports
- intrange
severity: info

View File

@@ -1,10 +0,0 @@
{
"trailingComma": "es5",
"tabWidth": 2,
"useTabs": false,
"semi": false,
"singleQuote": true,
"jsxSingleQuote": true,
"printWidth": 120,
"arrowParens": "avoid"
}

133
CMakeLists.txt Normal file
View File

@@ -0,0 +1,133 @@
cmake_minimum_required(VERSION 3.21)
project(Ollama C CXX)
include(CheckLanguage)
find_package(Threads REQUIRED)
set(CMAKE_BUILD_TYPE Release)
set(BUILD_SHARED_LIBS ON)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)
set(GGML_BUILD ON)
set(GGML_SHARED ON)
set(GGML_CCACHE ON)
set(GGML_BACKEND_DL ON)
set(GGML_BACKEND_SHARED ON)
set(GGML_SCHED_MAX_COPIES 4)
set(GGML_LLAMAFILE ON)
set(GGML_CUDA_PEER_MAX_BATCH_SIZE 128)
set(GGML_CUDA_GRAPHS ON)
set(GGML_CUDA_FA ON)
set(GGML_CUDA_COMPRESSION_MODE default)
if((CMAKE_OSX_ARCHITECTURES AND NOT CMAKE_OSX_ARCHITECTURES MATCHES "arm64")
OR (NOT CMAKE_OSX_ARCHITECTURES AND NOT CMAKE_SYSTEM_PROCESSOR MATCHES "arm|aarch64|ARM64|ARMv[0-9]+"))
set(GGML_CPU_ALL_VARIANTS ON)
endif()
if (CMAKE_OSX_ARCHITECTURES MATCHES "x86_64")
set(CMAKE_BUILD_RPATH "@loader_path")
set(CMAKE_INSTALL_RPATH "@loader_path")
endif()
set(OLLAMA_BUILD_DIR ${CMAKE_BINARY_DIR}/lib/ollama)
set(OLLAMA_INSTALL_DIR ${CMAKE_INSTALL_PREFIX}/lib/ollama)
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${OLLAMA_BUILD_DIR})
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY_DEBUG ${OLLAMA_BUILD_DIR})
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY_RELEASE ${OLLAMA_BUILD_DIR})
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${OLLAMA_BUILD_DIR})
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY_DEBUG ${OLLAMA_BUILD_DIR})
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY_RELEASE ${OLLAMA_BUILD_DIR})
include_directories(${CMAKE_CURRENT_SOURCE_DIR}/ml/backend/ggml/ggml/src)
include_directories(${CMAKE_CURRENT_SOURCE_DIR}/ml/backend/ggml/ggml/src/include)
include_directories(${CMAKE_CURRENT_SOURCE_DIR}/ml/backend/ggml/ggml/src/ggml-cpu)
include_directories(${CMAKE_CURRENT_SOURCE_DIR}/ml/backend/ggml/ggml/src/ggml-cpu/amx)
set(GGML_CPU ON)
add_subdirectory(${CMAKE_CURRENT_SOURCE_DIR}/ml/backend/ggml/ggml/src)
set_property(TARGET ggml PROPERTY EXCLUDE_FROM_ALL TRUE)
get_target_property(CPU_VARIANTS ggml-cpu MANUALLY_ADDED_DEPENDENCIES)
if(NOT CPU_VARIANTS)
set(CPU_VARIANTS "ggml-cpu")
endif()
install(TARGETS ggml-base ${CPU_VARIANTS}
RUNTIME_DEPENDENCIES
PRE_EXCLUDE_REGEXES ".*"
RUNTIME DESTINATION ${OLLAMA_INSTALL_DIR} COMPONENT CPU
LIBRARY DESTINATION ${OLLAMA_INSTALL_DIR} COMPONENT CPU
FRAMEWORK DESTINATION ${OLLAMA_INSTALL_DIR} COMPONENT CPU
)
check_language(CUDA)
if(CMAKE_CUDA_COMPILER)
if(CMAKE_VERSION VERSION_GREATER_EQUAL "3.24" AND NOT CMAKE_CUDA_ARCHITECTURES)
set(CMAKE_CUDA_ARCHITECTURES "native")
endif()
find_package(CUDAToolkit)
add_subdirectory(${CMAKE_CURRENT_SOURCE_DIR}/ml/backend/ggml/ggml/src/ggml-cuda)
set(OLLAMA_CUDA_INSTALL_DIR ${OLLAMA_INSTALL_DIR}/cuda_v${CUDAToolkit_VERSION_MAJOR})
install(TARGETS ggml-cuda
RUNTIME_DEPENDENCIES
DIRECTORIES ${CUDAToolkit_BIN_DIR} ${CUDAToolkit_LIBRARY_DIR}
PRE_INCLUDE_REGEXES cublas cublasLt cudart
PRE_EXCLUDE_REGEXES ".*"
RUNTIME DESTINATION ${OLLAMA_CUDA_INSTALL_DIR} COMPONENT CUDA
LIBRARY DESTINATION ${OLLAMA_CUDA_INSTALL_DIR} COMPONENT CUDA
)
endif()
set(WINDOWS_AMDGPU_TARGETS_EXCLUDE_REGEX "^gfx(906|908|90a|1200|1201):xnack[+-]$"
CACHE STRING
"Regular expression describing AMDGPU_TARGETS not supported on Windows. Override to force building these targets. Default \"^gfx(906|908|90a|1200|1201):xnack[+-]$\"."
)
check_language(HIP)
if(CMAKE_HIP_COMPILER)
set(HIP_PLATFORM "amd")
find_package(hip REQUIRED)
if(NOT AMDGPU_TARGETS)
list(FILTER AMDGPU_TARGETS INCLUDE REGEX "^gfx(900|94[012]|101[02]|1030|110[012]|120[01])$")
elseif(WIN32 AND WINDOWS_AMDGPU_TARGETS_EXCLUDE_REGEX)
list(FILTER AMDGPU_TARGETS EXCLUDE REGEX ${WINDOWS_AMDGPU_TARGETS_EXCLUDE_REGEX})
endif()
if(AMDGPU_TARGETS)
add_subdirectory(${CMAKE_CURRENT_SOURCE_DIR}/ml/backend/ggml/ggml/src/ggml-hip)
if (WIN32)
target_compile_definitions(ggml-hip PRIVATE GGML_CUDA_NO_PEER_COPY)
endif()
target_compile_definitions(ggml-hip PRIVATE GGML_HIP_NO_VMM)
set(OLLAMA_HIP_INSTALL_DIR ${OLLAMA_INSTALL_DIR}/rocm)
install(TARGETS ggml-hip
RUNTIME_DEPENDENCIES
DIRECTORIES ${HIP_BIN_INSTALL_DIR} ${HIP_LIB_INSTALL_DIR}
PRE_INCLUDE_REGEXES hipblas rocblas amdhip64 rocsolver amd_comgr hsa-runtime64 rocsparse tinfo rocprofiler-register drm drm_amdgpu numa elf
PRE_EXCLUDE_REGEXES ".*"
POST_EXCLUDE_REGEXES "system32"
RUNTIME DESTINATION ${OLLAMA_HIP_INSTALL_DIR} COMPONENT HIP
LIBRARY DESTINATION ${OLLAMA_HIP_INSTALL_DIR} COMPONENT HIP
)
foreach(HIP_LIB_BIN_INSTALL_DIR IN ITEMS ${HIP_BIN_INSTALL_DIR} ${HIP_LIB_INSTALL_DIR})
if(EXISTS ${HIP_LIB_BIN_INSTALL_DIR}/rocblas)
install(DIRECTORY ${HIP_LIB_BIN_INSTALL_DIR}/rocblas DESTINATION ${OLLAMA_HIP_INSTALL_DIR} COMPONENT HIP)
break()
endif()
endforeach()
endif()
endif()

112
CMakePresets.json Normal file
View File

@@ -0,0 +1,112 @@
{
"version": 3,
"configurePresets": [
{
"name": "Default",
"binaryDir": "${sourceDir}/build",
"installDir": "${sourceDir}/dist",
"cacheVariables": {
"CMAKE_BUILD_TYPE": "Release"
}
},
{
"name": "CPU",
"inherits": [ "Default" ]
},
{
"name": "CUDA",
"inherits": [ "Default" ]
},
{
"name": "CUDA 11",
"inherits": [ "CUDA" ],
"cacheVariables": {
"CMAKE_CUDA_ARCHITECTURES": "50;52;53;60;61;70;75;80;86",
"CMAKE_CUDA_FLAGS": "-Wno-deprecated-gpu-targets"
}
},
{
"name": "CUDA 12",
"inherits": [ "CUDA" ],
"cacheVariables": {
"CMAKE_CUDA_ARCHITECTURES": "50;60;61;70;75;80;86;87;89;90;90a;120",
"CMAKE_CUDA_FLAGS": "-Wno-deprecated-gpu-targets"
}
},
{
"name": "JetPack 5",
"inherits": [ "CUDA" ],
"cacheVariables": {
"CMAKE_CUDA_ARCHITECTURES": "72;87"
}
},
{
"name": "JetPack 6",
"inherits": [ "CUDA" ],
"cacheVariables": {
"CMAKE_CUDA_ARCHITECTURES": "87"
}
},
{
"name": "ROCm",
"inherits": [ "Default" ],
"cacheVariables": {
"CMAKE_HIP_PLATFORM": "amd"
}
},
{
"name": "ROCm 6",
"inherits": [ "ROCm" ],
"cacheVariables": {
"AMDGPU_TARGETS": "gfx900;gfx940;gfx941;gfx942;gfx1010;gfx1012;gfx1030;gfx1100;gfx1101;gfx1102;gfx1151;gfx1200;gfx1201;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack+;gfx90a:xnack-"
}
}
],
"buildPresets": [
{
"name": "Default",
"configurePreset": "Default",
"configuration": "Release"
},
{
"name": "CPU",
"configurePreset": "Default",
"targets": [ "ggml-cpu" ]
},
{
"name": "CUDA",
"configurePreset": "CUDA",
"targets": [ "ggml-cuda" ]
},
{
"name": "CUDA 11",
"inherits": [ "CUDA" ],
"configurePreset": "CUDA 11"
},
{
"name": "CUDA 12",
"inherits": [ "CUDA" ],
"configurePreset": "CUDA 12"
},
{
"name": "JetPack 5",
"inherits": [ "CUDA" ],
"configurePreset": "JetPack 5"
},
{
"name": "JetPack 6",
"inherits": [ "CUDA" ],
"configurePreset": "JetPack 6"
},
{
"name": "ROCm",
"configurePreset": "ROCm",
"targets": [ "ggml-hip" ]
},
{
"name": "ROCm 6",
"inherits": [ "ROCm" ],
"configurePreset": "ROCm 6"
}
]
}

88
CONTRIBUTING.md Normal file
View File

@@ -0,0 +1,88 @@
# Contributing to Ollama
Thank you for your interest in contributing to Ollama! Here are a few guidelines to help get you started.
## Set up
See the [development documentation](./docs/development.md) for instructions on how to build and run Ollama locally.
### Ideal issues
* [Bugs](https://github.com/ollama/ollama/issues?q=is%3Aissue+is%3Aopen+label%3Abug): issues where Ollama stops working or where it results in an unexpected error.
* [Performance](https://github.com/ollama/ollama/issues?q=is%3Aissue+is%3Aopen+label%3Aperformance): issues to make Ollama faster at model inference, downloading or uploading.
* [Security](https://github.com/ollama/ollama/blob/main/SECURITY.md): issues that could lead to a security vulnerability. As mentioned in [SECURITY.md](https://github.com/ollama/ollama/blob/main/SECURITY.md), please do not disclose security vulnerabilities publicly.
### Issues that are harder to review
* New features: new features (e.g. API fields, environment variables) add surface area to Ollama and make it harder to maintain in the long run as they cannot be removed without potentially breaking users in the future.
* Refactoring: large code improvements are important, but can be harder or take longer to review and merge.
* Documentation: small updates to fill in or correct missing documentation is helpful, however large documentation additions can be hard to maintain over time.
### Issues that may not be accepted
* Changes that break backwards compatibility in Ollama's API (including the OpenAI-compatible API)
* Changes that add significant friction to the user experience
* Changes that create a large future maintenance burden for maintainers and contributors
## Proposing a (non-trivial) change
> By "non-trivial", we mean a change that is not a bug fix or small
> documentation update. If you are unsure, please ask us on our [Discord
> server](https://discord.gg/ollama).
Before opening a non-trivial Pull Request, please open an issue to discuss the change and
get feedback from the maintainers. This helps us understand the context of the
change and how it fits into Ollama's roadmap and prevents us from duplicating
work or you from spending time on a change that we may not be able to accept.
Tips for proposals:
* Explain the problem you are trying to solve, not what you are trying to do.
* Explain why the change is important.
* Explain how the change will be used.
* Explain how the change will be tested.
Additionally, for bonus points: Provide draft documentation you would expect to
see if the change were accepted.
## Pull requests
**Commit messages**
The title should look like:
<package>: <short description>
The package is the most affected Go package. If the change does not affect Go
code, then use the directory name instead. Changes to a single well-known
file in the root directory may use the file name.
The short description should start with a lowercase letter and be a
continuation of the sentence:
"This changes Ollama to..."
Examples:
llm/backend/mlx: support the llama architecture
CONTRIBUTING: provide clairity on good commit messages, and bad
Bad Examples:
feat: add more emoji
fix: was not using famous web framework
chore: generify code
**Tests**
Please include tests. Strive to test behavior, not implementation.
**New dependencies**
Dependencies should be added sparingly. If you are adding a new dependency,
please explain why it is necessary and what other ways you attempted that
did not work without it.
## Need help?
If you need help with anything, feel free to reach out to us on our [Discord server](https://discord.gg/ollama).

View File

@@ -1,23 +1,131 @@
FROM nvidia/cuda:11.8.0-devel-ubuntu22.04
# vim: filetype=dockerfile
ARG TARGETARCH
ARG GOFLAGS="'-ldflags=-w -s'"
ARG FLAVOR=${TARGETARCH}
WORKDIR /go/src/github.com/jmorganca/ollama
RUN apt-get update && apt-get install -y git build-essential cmake
ADD https://dl.google.com/go/go1.21.3.linux-$TARGETARCH.tar.gz /tmp/go1.21.3.tar.gz
RUN mkdir -p /usr/local && tar xz -C /usr/local </tmp/go1.21.3.tar.gz
ARG ROCMVERSION=6.3.3
ARG JETPACK5VERSION=r35.4.1
ARG JETPACK6VERSION=r36.4.0
ARG CMAKEVERSION=3.31.2
# CUDA v11 requires gcc v10. v10.3 has regressions, so the rockylinux 8.5 AppStream has the latest compatible version
FROM --platform=linux/amd64 rocm/dev-almalinux-8:${ROCMVERSION}-complete AS base-amd64
RUN yum install -y yum-utils \
&& yum-config-manager --add-repo https://dl.rockylinux.org/vault/rocky/8.5/AppStream/\$basearch/os/ \
&& rpm --import https://dl.rockylinux.org/pub/rocky/RPM-GPG-KEY-Rocky-8 \
&& dnf install -y yum-utils ccache gcc-toolset-10-gcc-10.2.1-8.2.el8 gcc-toolset-10-gcc-c++-10.2.1-8.2.el8 gcc-toolset-10-binutils-2.35-11.el8 \
&& yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
ENV PATH=/opt/rh/gcc-toolset-10/root/usr/bin:$PATH
FROM --platform=linux/arm64 almalinux:8 AS base-arm64
# install epel-release for ccache
RUN yum install -y yum-utils epel-release \
&& dnf install -y clang ccache \
&& yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/sbsa/cuda-rhel8.repo
ENV CC=clang CXX=clang++
FROM base-${TARGETARCH} AS base
ARG CMAKEVERSION
RUN curl -fsSL https://github.com/Kitware/CMake/releases/download/v${CMAKEVERSION}/cmake-${CMAKEVERSION}-linux-$(uname -m).tar.gz | tar xz -C /usr/local --strip-components 1
COPY CMakeLists.txt CMakePresets.json .
COPY ml/backend/ggml/ggml ml/backend/ggml/ggml
ENV LDFLAGS=-s
FROM base AS cpu
RUN dnf install -y gcc-toolset-11-gcc gcc-toolset-11-gcc-c++
ENV PATH=/opt/rh/gcc-toolset-11/root/usr/bin:$PATH
RUN --mount=type=cache,target=/root/.ccache \
cmake --preset 'CPU' \
&& cmake --build --parallel --preset 'CPU' \
&& cmake --install build --component CPU --strip --parallel 8
FROM base AS cuda-11
ARG CUDA11VERSION=11.3
RUN dnf install -y cuda-toolkit-${CUDA11VERSION//./-}
ENV PATH=/usr/local/cuda-11/bin:$PATH
RUN --mount=type=cache,target=/root/.ccache \
cmake --preset 'CUDA 11' \
&& cmake --build --parallel --preset 'CUDA 11' \
&& cmake --install build --component CUDA --strip --parallel 8
FROM base AS cuda-12
ARG CUDA12VERSION=12.8
RUN dnf install -y cuda-toolkit-${CUDA12VERSION//./-}
ENV PATH=/usr/local/cuda-12/bin:$PATH
RUN --mount=type=cache,target=/root/.ccache \
cmake --preset 'CUDA 12' \
&& cmake --build --parallel --preset 'CUDA 12' \
&& cmake --install build --component CUDA --strip --parallel 8
FROM base AS rocm-6
ENV PATH=/opt/rocm/hcc/bin:/opt/rocm/hip/bin:/opt/rocm/bin:/opt/rocm/hcc/bin:$PATH
RUN --mount=type=cache,target=/root/.ccache \
cmake --preset 'ROCm 6' \
&& cmake --build --parallel --preset 'ROCm 6' \
&& cmake --install build --component HIP --strip --parallel 8
FROM --platform=linux/arm64 nvcr.io/nvidia/l4t-jetpack:${JETPACK5VERSION} AS jetpack-5
ARG CMAKEVERSION
RUN apt-get update && apt-get install -y curl ccache \
&& curl -fsSL https://github.com/Kitware/CMake/releases/download/v${CMAKEVERSION}/cmake-${CMAKEVERSION}-linux-$(uname -m).tar.gz | tar xz -C /usr/local --strip-components 1
COPY CMakeLists.txt CMakePresets.json .
COPY ml/backend/ggml/ggml ml/backend/ggml/ggml
RUN --mount=type=cache,target=/root/.ccache \
cmake --preset 'JetPack 5' \
&& cmake --build --parallel --preset 'JetPack 5' \
&& cmake --install build --component CUDA --strip --parallel 8
FROM --platform=linux/arm64 nvcr.io/nvidia/l4t-jetpack:${JETPACK6VERSION} AS jetpack-6
ARG CMAKEVERSION
RUN apt-get update && apt-get install -y curl ccache \
&& curl -fsSL https://github.com/Kitware/CMake/releases/download/v${CMAKEVERSION}/cmake-${CMAKEVERSION}-linux-$(uname -m).tar.gz | tar xz -C /usr/local --strip-components 1
COPY CMakeLists.txt CMakePresets.json .
COPY ml/backend/ggml/ggml ml/backend/ggml/ggml
RUN --mount=type=cache,target=/root/.ccache \
cmake --preset 'JetPack 6' \
&& cmake --build --parallel --preset 'JetPack 6' \
&& cmake --install build --component CUDA --strip --parallel 8
FROM base AS build
WORKDIR /go/src/github.com/ollama/ollama
COPY go.mod go.sum .
RUN curl -fsSL https://golang.org/dl/go$(awk '/^go/ { print $2 }' go.mod).linux-$(case $(uname -m) in x86_64) echo amd64 ;; aarch64) echo arm64 ;; esac).tar.gz | tar xz -C /usr/local
ENV PATH=/usr/local/go/bin:$PATH
RUN go mod download
COPY . .
ENV GOARCH=$TARGETARCH
ENV GOFLAGS=$GOFLAGS
RUN /usr/local/go/bin/go generate ./... \
&& /usr/local/go/bin/go build .
ARG GOFLAGS="'-ldflags=-w -s'"
ENV CGO_ENABLED=1
RUN --mount=type=cache,target=/root/.cache/go-build \
go build -trimpath -buildmode=pie -o /bin/ollama .
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y ca-certificates
COPY --from=0 /go/src/github.com/jmorganca/ollama/ollama /bin/ollama
FROM --platform=linux/amd64 scratch AS amd64
COPY --from=cuda-11 dist/lib/ollama/cuda_v11 /lib/ollama/cuda_v11
COPY --from=cuda-12 dist/lib/ollama/cuda_v12 /lib/ollama/cuda_v12
FROM --platform=linux/arm64 scratch AS arm64
COPY --from=cuda-11 dist/lib/ollama/cuda_v11 /lib/ollama/cuda_v11
COPY --from=cuda-12 dist/lib/ollama/cuda_v12 /lib/ollama/cuda_v12
COPY --from=jetpack-5 dist/lib/ollama/cuda_v11 /lib/ollama/cuda_jetpack5
COPY --from=jetpack-6 dist/lib/ollama/cuda_v12 /lib/ollama/cuda_jetpack6
FROM scratch AS rocm
COPY --from=rocm-6 dist/lib/ollama/rocm /lib/ollama/rocm
FROM ${FLAVOR} AS archive
COPY --from=cpu dist/lib/ollama /lib/ollama
COPY --from=build /bin/ollama /bin/ollama
FROM ubuntu:20.04
RUN apt-get update \
&& apt-get install -y ca-certificates \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
COPY --from=archive /bin /usr/bin
ENV PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
COPY --from=archive /lib/ollama /usr/lib/ollama
ENV LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
ENV NVIDIA_VISIBLE_DEVICES=all
ENV OLLAMA_HOST=0.0.0.0:11434
EXPOSE 11434
ENV OLLAMA_HOST 0.0.0.0
ENTRYPOINT ["/bin/ollama"]
CMD ["serve"]

View File

@@ -1,31 +0,0 @@
# centos7 amd64 dependencies
FROM --platform=linux/amd64 nvidia/cuda:11.3.1-devel-centos7 AS base-amd64
RUN yum install -y https://repo.ius.io/ius-release-el7.rpm centos-release-scl && \
yum update -y && \
yum install -y devtoolset-10-gcc devtoolset-10-gcc-c++ git236 wget
RUN wget "https://github.com/Kitware/CMake/releases/download/v3.27.6/cmake-3.27.6-linux-x86_64.sh" -O cmake-installer.sh && chmod +x cmake-installer.sh && ./cmake-installer.sh --skip-license --prefix=/usr/local
ENV PATH /opt/rh/devtoolset-10/root/usr/bin:$PATH
# centos8 arm64 dependencies
FROM --platform=linux/arm64 nvidia/cuda-arm64:11.3.1-devel-centos8 AS base-arm64
RUN sed -i -e 's/mirrorlist/#mirrorlist/g' -e 's|#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g' /etc/yum.repos.d/CentOS-*
RUN yum install -y git cmake
FROM base-${TARGETARCH}
ARG TARGETARCH
ARG GOFLAGS="'-ldflags -w -s'"
# install go
ADD https://dl.google.com/go/go1.21.3.linux-$TARGETARCH.tar.gz /tmp/go1.21.3.tar.gz
RUN mkdir -p /usr/local && tar xz -C /usr/local </tmp/go1.21.3.tar.gz
# build the final binary
WORKDIR /go/src/github.com/jmorganca/ollama
COPY . .
ENV GOOS=linux
ENV GOARCH=$TARGETARCH
ENV GOFLAGS=$GOFLAGS
RUN /usr/local/go/bin/go generate ./... && \
/usr/local/go/bin/go build .

60
Makefile.sync Normal file
View File

@@ -0,0 +1,60 @@
UPSTREAM=https://github.com/ggerganov/llama.cpp.git
WORKDIR=llama/vendor
FETCH_HEAD=2016f07bd106c73699ecbaace80f55db5ed95dac
.PHONY: help
help:
@echo "Available targets:"
@echo " sync Sync with upstream repositories"
@echo " checkout Checkout upstream repository"
@echo " apply-patches Apply patches to local repository"
@echo " format-patches Format patches from local repository"
@echo " clean Clean local repository"
@echo
@echo "Example:"
@echo " make -f $(lastword $(MAKEFILE_LIST)) clean sync"
.PHONY: sync
sync: llama/build-info.cpp llama/llama.cpp ml/backend/ggml/ggml
.PHONY: llama/build-info.cpp
llama/build-info.cpp: llama/build-info.cpp.in
sed -e 's|@FETCH_HEAD@|$(FETCH_HEAD)|' $< > $@
.PHONY: llama/llama.cpp
llama/llama.cpp: llama/vendor/
rsync -arvzc -f "merge $@/.rsync-filter" $< $@
.PHONY: ml/backend/ggml/ggml
ml/backend/ggml/ggml: llama/vendor/ggml/
rsync -arvzc -f "merge $@/.rsync-filter" $< $@
PATCHES=$(wildcard llama/patches/*.patch)
.PHONY: apply-patches
.NOTPARALLEL:
apply-patches: $(addsuffix ed, $(PATCHES))
%.patched: %.patch
@if git -c user.name=nobody -c 'user.email=<>' -C $(WORKDIR) am -3 $(realpath $<); then touch $@; else git -C $(WORKDIR) am --abort; exit 1; fi
.PHONY: checkout
checkout: $(WORKDIR)
git -C $(WORKDIR) fetch
git -C $(WORKDIR) checkout -f $(FETCH_HEAD)
$(WORKDIR):
git clone $(UPSTREAM) $(WORKDIR)
.PHONE: format-patches
format-patches: llama/patches
git -C $(WORKDIR) format-patch \
--no-signature \
--no-numbered \
--zero-commit \
-o $(realpath $<) \
$(FETCH_HEAD)
.PHONE: clean
clean: checkout
$(RM) $(addsuffix ed, $(PATCHES))

510
README.md
View File

@@ -1,64 +1,88 @@
<div align="center">
<picture>
<source media="(prefers-color-scheme: dark)" height="200px" srcset="https://github.com/jmorganca/ollama/assets/3325447/56ea1849-1284-4645-8970-956de6e51c3c">
<img alt="logo" height="200px" src="https://github.com/jmorganca/ollama/assets/3325447/0d0b44e2-8f4a-4e99-9b52-a5c1c741c8f7">
</picture>
  <a href="https://ollama.com">
<img alt="ollama" height="200px" src="https://github.com/ollama/ollama/assets/3325447/0d0b44e2-8f4a-4e99-9b52-a5c1c741c8f7">
</a>
</div>
# Ollama
[![Discord](https://dcbadge.vercel.app/api/server/ollama?style=flat&compact=true)](https://discord.gg/ollama)
Get up and running with large language models locally.
Get up and running with large language models.
### macOS
[Download](https://ollama.ai/download/Ollama-darwin.zip)
[Download](https://ollama.com/download/Ollama-darwin.zip)
### Windows
Coming soon!
[Download](https://ollama.com/download/OllamaSetup.exe)
### Linux & WSL2
### Linux
```
curl https://ollama.ai/install.sh | sh
```shell
curl -fsSL https://ollama.com/install.sh | sh
```
[Manual install instructions](https://github.com/jmorganca/ollama/blob/main/docs/linux.md)
[Manual install instructions](https://github.com/ollama/ollama/blob/main/docs/linux.md)
### Docker
See the official [Docker image](https://hub.docker.com/r/ollama/ollama).
The official [Ollama Docker image](https://hub.docker.com/r/ollama/ollama) `ollama/ollama` is available on Docker Hub.
### Libraries
- [ollama-python](https://github.com/ollama/ollama-python)
- [ollama-js](https://github.com/ollama/ollama-js)
### Community
- [Discord](https://discord.gg/ollama)
- [Reddit](https://reddit.com/r/ollama)
## Quickstart
To run and chat with [Llama 2](https://ollama.ai/library/llama2):
To run and chat with [Llama 3.2](https://ollama.com/library/llama3.2):
```
ollama run llama2
```shell
ollama run llama3.2
```
## Model library
Ollama supports a list of open-source models available on [ollama.ai/library](https://ollama.ai/library 'ollama model library')
Ollama supports a list of models available on [ollama.com/library](https://ollama.com/library 'ollama model library')
Here are some example open-source models that can be downloaded:
Here are some example models that can be downloaded:
| Model | Parameters | Size | Download |
| ------------------ | ---------- | ----- | ------------------------------ |
| ------------------ | ---------- | ----- | -------------------------------- |
| Gemma 3 | 1B | 815MB | `ollama run gemma3:1b` |
| Gemma 3 | 4B | 3.3GB | `ollama run gemma3` |
| Gemma 3 | 12B | 8.1GB | `ollama run gemma3:12b` |
| Gemma 3 | 27B | 17GB | `ollama run gemma3:27b` |
| QwQ | 32B | 20GB | `ollama run qwq` |
| DeepSeek-R1 | 7B | 4.7GB | `ollama run deepseek-r1` |
| DeepSeek-R1 | 671B | 404GB | `ollama run deepseek-r1:671b` |
| Llama 3.3 | 70B | 43GB | `ollama run llama3.3` |
| Llama 3.2 | 3B | 2.0GB | `ollama run llama3.2` |
| Llama 3.2 | 1B | 1.3GB | `ollama run llama3.2:1b` |
| Llama 3.2 Vision | 11B | 7.9GB | `ollama run llama3.2-vision` |
| Llama 3.2 Vision | 90B | 55GB | `ollama run llama3.2-vision:90b` |
| Llama 3.1 | 8B | 4.7GB | `ollama run llama3.1` |
| Llama 3.1 | 405B | 231GB | `ollama run llama3.1:405b` |
| Phi 4 | 14B | 9.1GB | `ollama run phi4` |
| Phi 4 Mini | 3.8B | 2.5GB | `ollama run phi4-mini` |
| Mistral | 7B | 4.1GB | `ollama run mistral` |
| Llama 2 | 7B | 3.8GB | `ollama run llama2` |
| Moondream 2 | 1.4B | 829MB | `ollama run moondream` |
| Neural Chat | 7B | 4.1GB | `ollama run neural-chat` |
| Starling | 7B | 4.1GB | `ollama run starling-lm` |
| Code Llama | 7B | 3.8GB | `ollama run codellama` |
| Llama 2 Uncensored | 7B | 3.8GB | `ollama run llama2-uncensored` |
| Llama 2 13B | 13B | 7.3GB | `ollama run llama2:13b` |
| Llama 2 70B | 70B | 39GB | `ollama run llama2:70b` |
| Orca Mini | 3B | 1.9GB | `ollama run orca-mini` |
| Vicuna | 7B | 3.8GB | `ollama run vicuna` |
| LLaVA | 7B | 4.5GB | `ollama run llava` |
| Granite-3.2 | 8B | 4.9GB | `ollama run granite3.2` |
> Note: You should have at least 8 GB of RAM to run the 3B models, 16 GB to run the 7B models, and 32 GB to run the 13B models.
> [!NOTE]
> You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
## Customize your own model
## Customize a model
### Import from GGUF
@@ -72,37 +96,37 @@ Ollama supports importing GGUF models in the Modelfile:
2. Create the model in Ollama
```
```shell
ollama create example -f Modelfile
```
3. Run the model
```
```shell
ollama run example
```
### Import from PyTorch or Safetensors
### Import from Safetensors
See the [guide](docs/import.md) on importing models for more information.
### Customize a prompt
Models from the Ollama library can be customized with a prompt. The example
Models from the Ollama library can be customized with a prompt. For example, to customize the `llama3.2` model:
```
ollama pull llama2
```shell
ollama pull llama3.2
```
Create a `Modelfile`:
```
FROM llama2
FROM llama3.2
# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# set the system prompt
# set the system message
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""
@@ -117,7 +141,7 @@ ollama run mario
Hello! It's your friend Mario.
```
For more examples, see the [examples](examples) directory. For more information on working with a Modelfile, see the [Modelfile](docs/modelfile.md) documentation.
For more information on working with a Modelfile, see the [Modelfile](docs/modelfile.md) documentation.
## CLI Reference
@@ -125,24 +149,28 @@ For more examples, see the [examples](examples) directory. For more information
`ollama create` is used to create a model from a Modelfile.
```shell
ollama create mymodel -f ./Modelfile
```
### Pull a model
```
ollama pull llama2
```shell
ollama pull llama3.2
```
> This command can also be used to update a local model. Only the diff will be pulled.
### Remove a model
```
ollama rm llama2
```shell
ollama rm llama3.2
```
### Copy a model
```
ollama cp llama2 my-llama2
```shell
ollama cp llama3.2 my-model
```
### Multiline input
@@ -156,80 +184,408 @@ For multiline input, you can wrap text with `"""`:
I'm a basic program that prints the famous "Hello, world!" message to the console.
```
### Pass in prompt as arguments
### Multimodal models
```
$ ollama run llama2 "summarize this file:" "$(cat README.md)"
Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.
ollama run llava "What's in this image? /Users/jmorgan/Desktop/smile.png"
```
> **Output**: The image features a yellow smiley face, which is likely the central focus of the picture.
### Pass the prompt as an argument
```shell
ollama run llama3.2 "Summarize this file: $(cat README.md)"
```
> **Output**: Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.
### Show model information
```shell
ollama show llama3.2
```
### List models on your computer
```
```shell
ollama list
```
### List which models are currently loaded
```shell
ollama ps
```
### Stop a model which is currently running
```shell
ollama stop llama3.2
```
### Start Ollama
`ollama serve` is used when you want to start ollama without running the desktop application.
## Building
Install `cmake` and `go`:
See the [developer guide](https://github.com/ollama/ollama/blob/main/docs/development.md)
```
brew install cmake
brew install go
```
Then generate dependencies and build:
```
go generate ./...
go build .
```
### Running local builds
Next, start the server:
```
```shell
./ollama serve
```
Finally, in a separate shell, run a model:
```
./ollama run llama2
```shell
./ollama run llama3.2
```
## REST API
See the [API documentation](docs/api.md) for all endpoints.
Ollama has a REST API for running and managing models.
Ollama has an API for running and managing models. For example to generate text from a model:
### Generate a response
```
curl -X POST http://localhost:11434/api/generate -d '{
"model": "llama2",
```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt":"Why is the sky blue?"
}'
```
### Chat with a model
```shell
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{ "role": "user", "content": "why is the sky blue?" }
]
}'
```
See the [API documentation](./docs/api.md) for all endpoints.
## Community Integrations
- [LangChain](https://python.langchain.com/docs/integrations/llms/ollama) and [LangChain.js](https://js.langchain.com/docs/modules/model_io/models/llms/integrations/ollama) with [example](https://js.langchain.com/docs/use_cases/question_answering/local_retrieval_qa)
- [LlamaIndex](https://gpt-index.readthedocs.io/en/stable/examples/llm/ollama.html)
### Web & Desktop
- [Open WebUI](https://github.com/open-webui/open-webui)
- [SwiftChat (macOS with ReactNative)](https://github.com/aws-samples/swift-chat)
- [Enchanted (macOS native)](https://github.com/AugustDev/enchanted)
- [Hollama](https://github.com/fmaclen/hollama)
- [Lollms-Webui](https://github.com/ParisNeo/lollms-webui)
- [LibreChat](https://github.com/danny-avila/LibreChat)
- [Bionic GPT](https://github.com/bionic-gpt/bionic-gpt)
- [HTML UI](https://github.com/rtcfirefly/ollama-ui)
- [Saddle](https://github.com/jikkuatwork/saddle)
- [TagSpaces](https://www.tagspaces.org) (A platform for file based apps, [utilizing Ollama](https://docs.tagspaces.org/ai/) for the generation of tags and descriptions)
- [Chatbot UI](https://github.com/ivanfioravanti/chatbot-ollama)
- [Chatbot UI v2](https://github.com/mckaywrigley/chatbot-ui)
- [Typescript UI](https://github.com/ollama-interface/Ollama-Gui?tab=readme-ov-file)
- [Minimalistic React UI for Ollama Models](https://github.com/richawo/minimal-llm-ui)
- [Ollamac](https://github.com/kevinhermawan/Ollamac)
- [big-AGI](https://github.com/enricoros/big-AGI)
- [Cheshire Cat assistant framework](https://github.com/cheshire-cat-ai/core)
- [Amica](https://github.com/semperai/amica)
- [chatd](https://github.com/BruceMacD/chatd)
- [Ollama-SwiftUI](https://github.com/kghandour/Ollama-SwiftUI)
- [Dify.AI](https://github.com/langgenius/dify)
- [MindMac](https://mindmac.app)
- [NextJS Web Interface for Ollama](https://github.com/jakobhoeg/nextjs-ollama-llm-ui)
- [Msty](https://msty.app)
- [Chatbox](https://github.com/Bin-Huang/Chatbox)
- [WinForm Ollama Copilot](https://github.com/tgraupmann/WinForm_Ollama_Copilot)
- [NextChat](https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web) with [Get Started Doc](https://docs.nextchat.dev/models/ollama)
- [Alpaca WebUI](https://github.com/mmo80/alpaca-webui)
- [OllamaGUI](https://github.com/enoch1118/ollamaGUI)
- [OpenAOE](https://github.com/InternLM/OpenAOE)
- [Odin Runes](https://github.com/leonid20000/OdinRunes)
- [LLM-X](https://github.com/mrdjohnson/llm-x) (Progressive Web App)
- [AnythingLLM (Docker + MacOs/Windows/Linux native app)](https://github.com/Mintplex-Labs/anything-llm)
- [Ollama Basic Chat: Uses HyperDiv Reactive UI](https://github.com/rapidarchitect/ollama_basic_chat)
- [Ollama-chats RPG](https://github.com/drazdra/ollama-chats)
- [IntelliBar](https://intellibar.app/) (AI-powered assistant for macOS)
- [QA-Pilot](https://github.com/reid41/QA-Pilot) (Interactive chat tool that can leverage Ollama models for rapid understanding and navigation of GitHub code repositories)
- [ChatOllama](https://github.com/sugarforever/chat-ollama) (Open Source Chatbot based on Ollama with Knowledge Bases)
- [CRAG Ollama Chat](https://github.com/Nagi-ovo/CRAG-Ollama-Chat) (Simple Web Search with Corrective RAG)
- [RAGFlow](https://github.com/infiniflow/ragflow) (Open-source Retrieval-Augmented Generation engine based on deep document understanding)
- [StreamDeploy](https://github.com/StreamDeploy-DevRel/streamdeploy-llm-app-scaffold) (LLM Application Scaffold)
- [chat](https://github.com/swuecho/chat) (chat web app for teams)
- [Lobe Chat](https://github.com/lobehub/lobe-chat) with [Integrating Doc](https://lobehub.com/docs/self-hosting/examples/ollama)
- [Ollama RAG Chatbot](https://github.com/datvodinh/rag-chatbot.git) (Local Chat with multiple PDFs using Ollama and RAG)
- [BrainSoup](https://www.nurgo-software.com/products/brainsoup) (Flexible native client with RAG & multi-agent automation)
- [macai](https://github.com/Renset/macai) (macOS client for Ollama, ChatGPT, and other compatible API back-ends)
- [RWKV-Runner](https://github.com/josStorer/RWKV-Runner) (RWKV offline LLM deployment tool, also usable as a client for ChatGPT and Ollama)
- [Ollama Grid Search](https://github.com/dezoito/ollama-grid-search) (app to evaluate and compare models)
- [Olpaka](https://github.com/Otacon/olpaka) (User-friendly Flutter Web App for Ollama)
- [Casibase](https://casibase.org) (An open source AI knowledge base and dialogue system combining the latest RAG, SSO, ollama support and multiple large language models.)
- [OllamaSpring](https://github.com/CrazyNeil/OllamaSpring) (Ollama Client for macOS)
- [LLocal.in](https://github.com/kartikm7/llocal) (Easy to use Electron Desktop Client for Ollama)
- [Shinkai Desktop](https://github.com/dcSpark/shinkai-apps) (Two click install Local AI using Ollama + Files + RAG)
- [AiLama](https://github.com/zeyoyt/ailama) (A Discord User App that allows you to interact with Ollama anywhere in discord )
- [Ollama with Google Mesop](https://github.com/rapidarchitect/ollama_mesop/) (Mesop Chat Client implementation with Ollama)
- [R2R](https://github.com/SciPhi-AI/R2R) (Open-source RAG engine)
- [Ollama-Kis](https://github.com/elearningshow/ollama-kis) (A simple easy to use GUI with sample custom LLM for Drivers Education)
- [OpenGPA](https://opengpa.org) (Open-source offline-first Enterprise Agentic Application)
- [Painting Droid](https://github.com/mateuszmigas/painting-droid) (Painting app with AI integrations)
- [Kerlig AI](https://www.kerlig.com/) (AI writing assistant for macOS)
- [AI Studio](https://github.com/MindWorkAI/AI-Studio)
- [Sidellama](https://github.com/gyopak/sidellama) (browser-based LLM client)
- [LLMStack](https://github.com/trypromptly/LLMStack) (No-code multi-agent framework to build LLM agents and workflows)
- [BoltAI for Mac](https://boltai.com) (AI Chat Client for Mac)
- [Harbor](https://github.com/av/harbor) (Containerized LLM Toolkit with Ollama as default backend)
- [PyGPT](https://github.com/szczyglis-dev/py-gpt) (AI desktop assistant for Linux, Windows and Mac)
- [Alpaca](https://github.com/Jeffser/Alpaca) (An Ollama client application for linux and macos made with GTK4 and Adwaita)
- [AutoGPT](https://github.com/Significant-Gravitas/AutoGPT/blob/master/docs/content/platform/ollama.md) (AutoGPT Ollama integration)
- [Go-CREW](https://www.jonathanhecl.com/go-crew/) (Powerful Offline RAG in Golang)
- [PartCAD](https://github.com/openvmp/partcad/) (CAD model generation with OpenSCAD and CadQuery)
- [Ollama4j Web UI](https://github.com/ollama4j/ollama4j-web-ui) - Java-based Web UI for Ollama built with Vaadin, Spring Boot and Ollama4j
- [PyOllaMx](https://github.com/kspviswa/pyOllaMx) - macOS application capable of chatting with both Ollama and Apple MLX models.
- [Cline](https://github.com/cline/cline) - Formerly known as Claude Dev is a VSCode extension for multi-file/whole-repo coding
- [Cherry Studio](https://github.com/kangfenmao/cherry-studio) (Desktop client with Ollama support)
- [ConfiChat](https://github.com/1runeberg/confichat) (Lightweight, standalone, multi-platform, and privacy focused LLM chat interface with optional encryption)
- [Archyve](https://github.com/nickthecook/archyve) (RAG-enabling document library)
- [crewAI with Mesop](https://github.com/rapidarchitect/ollama-crew-mesop) (Mesop Web Interface to run crewAI with Ollama)
- [Tkinter-based client](https://github.com/chyok/ollama-gui) (Python tkinter-based Client for Ollama)
- [LLMChat](https://github.com/trendy-design/llmchat) (Privacy focused, 100% local, intuitive all-in-one chat interface)
- [Local Multimodal AI Chat](https://github.com/Leon-Sander/Local-Multimodal-AI-Chat) (Ollama-based LLM Chat with support for multiple features, including PDF RAG, voice chat, image-based interactions, and integration with OpenAI.)
- [ARGO](https://github.com/xark-argo/argo) (Locally download and run Ollama and Huggingface models with RAG on Mac/Windows/Linux)
- [OrionChat](https://github.com/EliasPereirah/OrionChat) - OrionChat is a web interface for chatting with different AI providers
- [G1](https://github.com/bklieger-groq/g1) (Prototype of using prompting strategies to improve the LLM's reasoning through o1-like reasoning chains.)
- [Web management](https://github.com/lemonit-eric-mao/ollama-web-management) (Web management page)
- [Promptery](https://github.com/promptery/promptery) (desktop client for Ollama.)
- [Ollama App](https://github.com/JHubi1/ollama-app) (Modern and easy-to-use multi-platform client for Ollama)
- [chat-ollama](https://github.com/annilq/chat-ollama) (a React Native client for Ollama)
- [SpaceLlama](https://github.com/tcsenpai/spacellama) (Firefox and Chrome extension to quickly summarize web pages with ollama in a sidebar)
- [YouLama](https://github.com/tcsenpai/youlama) (Webapp to quickly summarize any YouTube video, supporting Invidious as well)
- [DualMind](https://github.com/tcsenpai/dualmind) (Experimental app allowing two models to talk to each other in the terminal or in a web interface)
- [ollamarama-matrix](https://github.com/h1ddenpr0cess20/ollamarama-matrix) (Ollama chatbot for the Matrix chat protocol)
- [ollama-chat-app](https://github.com/anan1213095357/ollama-chat-app) (Flutter-based chat app)
- [Perfect Memory AI](https://www.perfectmemory.ai/) (Productivity AI assists personalized by what you have seen on your screen, heard and said in the meetings)
- [Hexabot](https://github.com/hexastack/hexabot) (A conversational AI builder)
- [Reddit Rate](https://github.com/rapidarchitect/reddit_analyzer) (Search and Rate Reddit topics with a weighted summation)
- [OpenTalkGpt](https://github.com/adarshM84/OpenTalkGpt) (Chrome Extension to manage open-source models supported by Ollama, create custom models, and chat with models from a user-friendly UI)
- [VT](https://github.com/vinhnx/vt.ai) (A minimal multimodal AI chat app, with dynamic conversation routing. Supports local models via Ollama)
- [Nosia](https://github.com/nosia-ai/nosia) (Easy to install and use RAG platform based on Ollama)
- [Witsy](https://github.com/nbonamy/witsy) (An AI Desktop application available for Mac/Windows/Linux)
- [Abbey](https://github.com/US-Artificial-Intelligence/abbey) (A configurable AI interface server with notebooks, document storage, and YouTube support)
- [Minima](https://github.com/dmayboroda/minima) (RAG with on-premises or fully local workflow)
- [aidful-ollama-model-delete](https://github.com/AidfulAI/aidful-ollama-model-delete) (User interface for simplified model cleanup)
- [Perplexica](https://github.com/ItzCrazyKns/Perplexica) (An AI-powered search engine & an open-source alternative to Perplexity AI)
- [Ollama Chat WebUI for Docker ](https://github.com/oslook/ollama-webui) (Support for local docker deployment, lightweight ollama webui)
- [AI Toolkit for Visual Studio Code](https://aka.ms/ai-tooklit/ollama-docs) (Microsoft-official VSCode extension to chat, test, evaluate models with Ollama support, and use them in your AI applications.)
- [MinimalNextOllamaChat](https://github.com/anilkay/MinimalNextOllamaChat) (Minimal Web UI for Chat and Model Control)
- [Chipper](https://github.com/TilmanGriesel/chipper) AI interface for tinkerers (Ollama, Haystack RAG, Python)
- [ChibiChat](https://github.com/CosmicEventHorizon/ChibiChat) (Kotlin-based Android app to chat with Ollama and Koboldcpp API endpoints)
- [LocalLLM](https://github.com/qusaismael/localllm) (Minimal Web-App to run ollama models on it with a GUI)
- [Ollamazing](https://github.com/buiducnhat/ollamazing) (Web extension to run Ollama models)
- [OpenDeepResearcher-via-searxng](https://github.com/benhaotang/OpenDeepResearcher-via-searxng) (A Deep Research equivent endpoint with Ollama support for running locally)
- [AntSK](https://github.com/AIDotNet/AntSK) (Out-of-the-box & Adaptable RAG Chatbot)
- [MaxKB](https://github.com/1Panel-dev/MaxKB/) (Ready-to-use & flexible RAG Chatbot)
- [yla](https://github.com/danielekp/yla) (Web interface to freely interact with your customized models)
- [LangBot](https://github.com/RockChinQ/LangBot) (LLM-based instant messaging bots platform, with Agents, RAG features, supports multiple platforms)
- [1Panel](https://github.com/1Panel-dev/1Panel/) (Web-based Linux Server Management Tool)
- [AstrBot](https://github.com/Soulter/AstrBot/) (User-friendly LLM-based multi-platform chatbot with a WebUI, supporting RAG, LLM agents, and plugins integration)
- [Reins](https://github.com/ibrahimcetin/reins) (Easily tweak parameters, customize system prompts per chat, and enhance your AI experiments with reasoning model support.)
- [Ellama](https://github.com/zeozeozeo/ellama) (Friendly native app to chat with an Ollama instance)
- [screenpipe](https://github.com/mediar-ai/screenpipe) Build agents powered by your screen history
- [Ollamb](https://github.com/hengkysteen/ollamb) (Simple yet rich in features, cross-platform built with Flutter and designed for Ollama. Try the [web demo](https://hengkysteen.github.io/demo/ollamb/).)
- [Writeopia](https://github.com/Writeopia/Writeopia) (Text editor with integration with Ollama)
- [AppFlowy](https://github.com/AppFlowy-IO/AppFlowy) (AI collaborative workspace with Ollama, cross-platform and self-hostable)
### Cloud
- [Google Cloud](https://cloud.google.com/run/docs/tutorials/gpu-gemma2-with-ollama)
- [Fly.io](https://fly.io/docs/python/do-more/add-ollama/)
- [Koyeb](https://www.koyeb.com/deploy/ollama)
### Terminal
- [oterm](https://github.com/ggozad/oterm)
- [Ellama Emacs client](https://github.com/s-kostyaev/ellama)
- [Emacs client](https://github.com/zweifisch/ollama)
- [neollama](https://github.com/paradoxical-dev/neollama) UI client for interacting with models from within Neovim
- [gen.nvim](https://github.com/David-Kunz/gen.nvim)
- [ollama.nvim](https://github.com/nomnivore/ollama.nvim)
- [ollero.nvim](https://github.com/marco-souza/ollero.nvim)
- [ollama-chat.nvim](https://github.com/gerazov/ollama-chat.nvim)
- [ogpt.nvim](https://github.com/huynle/ogpt.nvim)
- [gptel Emacs client](https://github.com/karthink/gptel)
- [Oatmeal](https://github.com/dustinblackman/oatmeal)
- [cmdh](https://github.com/pgibler/cmdh)
- [ooo](https://github.com/npahlfer/ooo)
- [shell-pilot](https://github.com/reid41/shell-pilot)(Interact with models via pure shell scripts on Linux or macOS)
- [tenere](https://github.com/pythops/tenere)
- [llm-ollama](https://github.com/taketwo/llm-ollama) for [Datasette's LLM CLI](https://llm.datasette.io/en/stable/).
- [typechat-cli](https://github.com/anaisbetts/typechat-cli)
- [ShellOracle](https://github.com/djcopley/ShellOracle)
- [tlm](https://github.com/yusufcanb/tlm)
- [podman-ollama](https://github.com/ericcurtin/podman-ollama)
- [gollama](https://github.com/sammcj/gollama)
- [ParLlama](https://github.com/paulrobello/parllama)
- [Ollama eBook Summary](https://github.com/cognitivetech/ollama-ebook-summary/)
- [Ollama Mixture of Experts (MOE) in 50 lines of code](https://github.com/rapidarchitect/ollama_moe)
- [vim-intelligence-bridge](https://github.com/pepo-ec/vim-intelligence-bridge) Simple interaction of "Ollama" with the Vim editor
- [x-cmd ollama](https://x-cmd.com/mod/ollama)
- [bb7](https://github.com/drunkwcodes/bb7)
- [SwollamaCLI](https://github.com/marcusziade/Swollama) bundled with the Swollama Swift package. [Demo](https://github.com/marcusziade/Swollama?tab=readme-ov-file#cli-usage)
- [aichat](https://github.com/sigoden/aichat) All-in-one LLM CLI tool featuring Shell Assistant, Chat-REPL, RAG, AI tools & agents, with access to OpenAI, Claude, Gemini, Ollama, Groq, and more.
- [PowershAI](https://github.com/rrg92/powershai) PowerShell module that brings AI to terminal on Windows, including support for Ollama
- [DeepShell](https://github.com/Abyss-c0re/deepshell) Your self-hosted AI assistant. Interactive Shell, Files and Folders analysis.
- [orbiton](https://github.com/xyproto/orbiton) Configuration-free text editor and IDE with support for tab completion with Ollama.
- [orca-cli](https://github.com/molbal/orca-cli) Ollama Registry CLI Application - Browse, pull and download models from Ollama Registry in your terminal.
- [GGUF-to-Ollama](https://github.com/jonathanhecl/gguf-to-ollama) - Importing GGUF to Ollama made easy (multiplatform)
### Apple Vision Pro
- [SwiftChat](https://github.com/aws-samples/swift-chat) (Cross-platform AI chat app supporting Apple Vision Pro via "Designed for iPad")
- [Enchanted](https://github.com/AugustDev/enchanted)
### Database
- [pgai](https://github.com/timescale/pgai) - PostgreSQL as a vector database (Create and search embeddings from Ollama models using pgvector)
- [Get started guide](https://github.com/timescale/pgai/blob/main/docs/vectorizer-quick-start.md)
- [MindsDB](https://github.com/mindsdb/mindsdb/blob/staging/mindsdb/integrations/handlers/ollama_handler/README.md) (Connects Ollama models with nearly 200 data platforms and apps)
- [chromem-go](https://github.com/philippgille/chromem-go/blob/v0.5.0/embed_ollama.go) with [example](https://github.com/philippgille/chromem-go/tree/v0.5.0/examples/rag-wikipedia-ollama)
- [Kangaroo](https://github.com/dbkangaroo/kangaroo) (AI-powered SQL client and admin tool for popular databases)
### Package managers
- [Pacman](https://archlinux.org/packages/extra/x86_64/ollama/)
- [Gentoo](https://github.com/gentoo/guru/tree/master/app-misc/ollama)
- [Homebrew](https://formulae.brew.sh/formula/ollama)
- [Helm Chart](https://artifacthub.io/packages/helm/ollama-helm/ollama)
- [Guix channel](https://codeberg.org/tusharhero/ollama-guix)
- [Nix package](https://search.nixos.org/packages?show=ollama&from=0&size=50&sort=relevance&type=packages&query=ollama)
- [Flox](https://flox.dev/blog/ollama-part-one)
### Libraries
- [LangChain](https://python.langchain.com/docs/integrations/llms/ollama) and [LangChain.js](https://js.langchain.com/docs/integrations/chat/ollama/) with [example](https://js.langchain.com/docs/tutorials/local_rag/)
- [Firebase Genkit](https://firebase.google.com/docs/genkit/plugins/ollama)
- [crewAI](https://github.com/crewAIInc/crewAI)
- [Yacana](https://remembersoftwares.github.io/yacana/) (User-friendly multi-agent framework for brainstorming and executing predetermined flows with built-in tool integration)
- [Spring AI](https://github.com/spring-projects/spring-ai) with [reference](https://docs.spring.io/spring-ai/reference/api/chat/ollama-chat.html) and [example](https://github.com/tzolov/ollama-tools)
- [LangChainGo](https://github.com/tmc/langchaingo/) with [example](https://github.com/tmc/langchaingo/tree/main/examples/ollama-completion-example)
- [LangChain4j](https://github.com/langchain4j/langchain4j) with [example](https://github.com/langchain4j/langchain4j-examples/tree/main/ollama-examples/src/main/java)
- [LangChainRust](https://github.com/Abraxas-365/langchain-rust) with [example](https://github.com/Abraxas-365/langchain-rust/blob/main/examples/llm_ollama.rs)
- [LangChain for .NET](https://github.com/tryAGI/LangChain) with [example](https://github.com/tryAGI/LangChain/blob/main/examples/LangChain.Samples.OpenAI/Program.cs)
- [LLPhant](https://github.com/theodo-group/LLPhant?tab=readme-ov-file#ollama)
- [LlamaIndex](https://docs.llamaindex.ai/en/stable/examples/llm/ollama/) and [LlamaIndexTS](https://ts.llamaindex.ai/modules/llms/available_llms/ollama)
- [LiteLLM](https://github.com/BerriAI/litellm)
- [OllamaFarm for Go](https://github.com/presbrey/ollamafarm)
- [OllamaSharp for .NET](https://github.com/awaescher/OllamaSharp)
- [Ollama for Ruby](https://github.com/gbaptista/ollama-ai)
- [Ollama-rs for Rust](https://github.com/pepperoni21/ollama-rs)
- [Ollama-hpp for C++](https://github.com/jmont-dev/ollama-hpp)
- [Ollama4j for Java](https://github.com/ollama4j/ollama4j)
- [ModelFusion Typescript Library](https://modelfusion.dev/integration/model-provider/ollama)
- [OllamaKit for Swift](https://github.com/kevinhermawan/OllamaKit)
- [Ollama for Dart](https://github.com/breitburg/dart-ollama)
- [Ollama for Laravel](https://github.com/cloudstudio/ollama-laravel)
- [LangChainDart](https://github.com/davidmigloz/langchain_dart)
- [Semantic Kernel - Python](https://github.com/microsoft/semantic-kernel/tree/main/python/semantic_kernel/connectors/ai/ollama)
- [Haystack](https://github.com/deepset-ai/haystack-integrations/blob/main/integrations/ollama.md)
- [Elixir LangChain](https://github.com/brainlid/langchain)
- [Ollama for R - rollama](https://github.com/JBGruber/rollama)
- [Ollama for R - ollama-r](https://github.com/hauselin/ollama-r)
- [Ollama-ex for Elixir](https://github.com/lebrunel/ollama-ex)
- [Ollama Connector for SAP ABAP](https://github.com/b-tocs/abap_btocs_ollama)
- [Testcontainers](https://testcontainers.com/modules/ollama/)
- [Portkey](https://portkey.ai/docs/welcome/integration-guides/ollama)
- [PromptingTools.jl](https://github.com/svilupp/PromptingTools.jl) with an [example](https://svilupp.github.io/PromptingTools.jl/dev/examples/working_with_ollama)
- [LlamaScript](https://github.com/Project-Llama/llamascript)
- [llm-axe](https://github.com/emirsahin1/llm-axe) (Python Toolkit for Building LLM Powered Apps)
- [Gollm](https://docs.gollm.co/examples/ollama-example)
- [Gollama for Golang](https://github.com/jonathanhecl/gollama)
- [Ollamaclient for Golang](https://github.com/xyproto/ollamaclient)
- [High-level function abstraction in Go](https://gitlab.com/tozd/go/fun)
- [Ollama PHP](https://github.com/ArdaGnsrn/ollama-php)
- [Agents-Flex for Java](https://github.com/agents-flex/agents-flex) with [example](https://github.com/agents-flex/agents-flex/tree/main/agents-flex-llm/agents-flex-llm-ollama/src/test/java/com/agentsflex/llm/ollama)
- [Parakeet](https://github.com/parakeet-nest/parakeet) is a GoLang library, made to simplify the development of small generative AI applications with Ollama.
- [Haverscript](https://github.com/andygill/haverscript) with [examples](https://github.com/andygill/haverscript/tree/main/examples)
- [Ollama for Swift](https://github.com/mattt/ollama-swift)
- [Swollama for Swift](https://github.com/marcusziade/Swollama) with [DocC](https://marcusziade.github.io/Swollama/documentation/swollama/)
- [GoLamify](https://github.com/prasad89/golamify)
- [Ollama for Haskell](https://github.com/tusharad/ollama-haskell)
- [multi-llm-ts](https://github.com/nbonamy/multi-llm-ts) (A Typescript/JavaScript library allowing access to different LLM in unified API)
- [LlmTornado](https://github.com/lofcz/llmtornado) (C# library providing a unified interface for major FOSS & Commercial inference APIs)
- [Ollama for Zig](https://github.com/dravenk/ollama-zig)
- [Abso](https://github.com/lunary-ai/abso) (OpenAI-compatible TypeScript SDK for any LLM provider)
- [Nichey](https://github.com/goodreasonai/nichey) is a Python package for generating custom wikis for your research topic
- [Ollama for D](https://github.com/kassane/ollama-d)
### Mobile
- [SwiftChat](https://github.com/aws-samples/swift-chat) (Lightning-fast Cross-platform AI chat app with native UI for Android, iOS and iPad)
- [Enchanted](https://github.com/AugustDev/enchanted)
- [Maid](https://github.com/Mobile-Artificial-Intelligence/maid)
- [Ollama App](https://github.com/JHubi1/ollama-app) (Modern and easy-to-use multi-platform client for Ollama)
- [ConfiChat](https://github.com/1runeberg/confichat) (Lightweight, standalone, multi-platform, and privacy focused LLM chat interface with optional encryption)
- [Ollama Android Chat](https://github.com/sunshine0523/OllamaServer) (No need for Termux, start the Ollama service with one click on an Android device)
- [Reins](https://github.com/ibrahimcetin/reins) (Easily tweak parameters, customize system prompts per chat, and enhance your AI experiments with reasoning model support.)
### Extensions & Plugins
- [Raycast extension](https://github.com/MassimilianoPasquini97/raycast_ollama)
- [Discollama](https://github.com/mxyng/discollama) (Discord bot inside the Ollama discord channel)
- [Continue](https://github.com/continuedev/continue)
- [Vibe](https://github.com/thewh1teagle/vibe) (Transcribe and analyze meetings with Ollama)
- [Obsidian Ollama plugin](https://github.com/hinterdupfinger/obsidian-ollama)
- [Logseq Ollama plugin](https://github.com/omagdy7/ollama-logseq)
- [NotesOllama](https://github.com/andersrex/notesollama) (Apple Notes Ollama plugin)
- [Dagger Chatbot](https://github.com/samalba/dagger-chatbot)
- [LiteLLM](https://github.com/BerriAI/litellm)
- [Discord AI Bot](https://github.com/mekb-turtle/discord-ai-bot)
- [Chatbot UI](https://github.com/ivanfioravanti/chatbot-ollama)
- [HTML UI](https://github.com/rtcfirefly/ollama-ui)
- [Typescript UI](https://github.com/ollama-interface/Ollama-Gui?tab=readme-ov-file)
- [Dumbar](https://github.com/JerrySievert/Dumbar)
- [Emacs client](https://github.com/zweifisch/ollama)
- [oterm](https://github.com/ggozad/oterm)
- [Ellama Emacs client](https://github.com/s-kostyaev/ellama)
- [OllamaSharp for .NET](https://github.com/awaescher/OllamaSharp)
- [Ollama Telegram Bot](https://github.com/ruecat/ollama-telegram)
- [Hass Ollama Conversation](https://github.com/ej52/hass-ollama-conversation)
- [Rivet plugin](https://github.com/abrenneke/rivet-plugin-ollama)
- [Obsidian BMO Chatbot plugin](https://github.com/longy2k/obsidian-bmo-chatbot)
- [Cliobot](https://github.com/herval/cliobot) (Telegram bot with Ollama support)
- [Copilot for Obsidian plugin](https://github.com/logancyang/obsidian-copilot)
- [Obsidian Local GPT plugin](https://github.com/pfrankov/obsidian-local-gpt)
- [Open Interpreter](https://docs.openinterpreter.com/language-model-setup/local-models/ollama)
- [Llama Coder](https://github.com/ex3ndr/llama-coder) (Copilot alternative using Ollama)
- [Ollama Copilot](https://github.com/bernardo-bruning/ollama-copilot) (Proxy that allows you to use ollama as a copilot like Github copilot)
- [twinny](https://github.com/rjmacarthy/twinny) (Copilot and Copilot chat alternative using Ollama)
- [Wingman-AI](https://github.com/RussellCanfield/wingman-ai) (Copilot code and chat alternative using Ollama and Hugging Face)
- [Page Assist](https://github.com/n4ze3m/page-assist) (Chrome Extension)
- [Plasmoid Ollama Control](https://github.com/imoize/plasmoid-ollamacontrol) (KDE Plasma extension that allows you to quickly manage/control Ollama model)
- [AI Telegram Bot](https://github.com/tusharhero/aitelegrambot) (Telegram bot using Ollama in backend)
- [AI ST Completion](https://github.com/yaroslavyaroslav/OpenAI-sublime-text) (Sublime Text 4 AI assistant plugin with Ollama support)
- [Discord-Ollama Chat Bot](https://github.com/kevinthedang/discord-ollama) (Generalized TypeScript Discord Bot w/ Tuning Documentation)
- [ChatGPTBox: All in one browser extension](https://github.com/josStorer/chatGPTBox) with [Integrating Tutorial](https://github.com/josStorer/chatGPTBox/issues/616#issuecomment-1975186467)
- [Discord AI chat/moderation bot](https://github.com/rapmd73/Companion) Chat/moderation bot written in python. Uses Ollama to create personalities.
- [Headless Ollama](https://github.com/nischalj10/headless-ollama) (Scripts to automatically install ollama client & models on any OS for apps that depends on ollama server)
- [Terraform AWS Ollama & Open WebUI](https://github.com/xuyangbocn/terraform-aws-self-host-llm) (A Terraform module to deploy on AWS a ready-to-use Ollama service, together with its front end Open WebUI service.)
- [node-red-contrib-ollama](https://github.com/jakubburkiewicz/node-red-contrib-ollama)
- [Local AI Helper](https://github.com/ivostoykov/localAI) (Chrome and Firefox extensions that enable interactions with the active tab and customisable API endpoints. Includes secure storage for user prompts.)
- [vnc-lm](https://github.com/jake83741/vnc-lm) (Discord bot for messaging with LLMs through Ollama and LiteLLM. Seamlessly move between local and flagship models.)
- [LSP-AI](https://github.com/SilasMarvin/lsp-ai) (Open-source language server for AI-powered functionality)
- [QodeAssist](https://github.com/Palm1r/QodeAssist) (AI-powered coding assistant plugin for Qt Creator)
- [Obsidian Quiz Generator plugin](https://github.com/ECuiDev/obsidian-quiz-generator)
- [AI Summmary Helper plugin](https://github.com/philffm/ai-summary-helper)
- [TextCraft](https://github.com/suncloudsmoon/TextCraft) (Copilot in Word alternative using Ollama)
- [Alfred Ollama](https://github.com/zeitlings/alfred-ollama) (Alfred Workflow)
- [TextLLaMA](https://github.com/adarshM84/TextLLaMA) A Chrome Extension that helps you write emails, correct grammar, and translate into any language
- [Simple-Discord-AI](https://github.com/zyphixor/simple-discord-ai)
- [LLM Telegram Bot](https://github.com/innightwolfsleep/llm_telegram_bot) (telegram bot, primary for RP. Oobabooga-like buttons, [A1111](https://github.com/AUTOMATIC1111/stable-diffusion-webui) API integration e.t.c)
- [mcp-llm](https://github.com/sammcj/mcp-llm) (MCP Server to allow LLMs to call other LLMs)
### Supported backends
- [llama.cpp](https://github.com/ggerganov/llama.cpp) project founded by Georgi Gerganov.
### Observability
- [Opik](https://www.comet.com/docs/opik/cookbook/ollama) is an open-source platform to debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards. Opik supports native intergration to Ollama.
- [Lunary](https://lunary.ai/docs/integrations/ollama) is the leading open-source LLM observability platform. It provides a variety of enterprise-grade features such as real-time analytics, prompt templates management, PII masking, and comprehensive agent tracing.
- [OpenLIT](https://github.com/openlit/openlit) is an OpenTelemetry-native tool for monitoring Ollama Applications & GPUs using traces and metrics.
- [HoneyHive](https://docs.honeyhive.ai/integrations/ollama) is an AI observability and evaluation platform for AI agents. Use HoneyHive to evaluate agent performance, interrogate failures, and monitor quality in production.
- [Langfuse](https://langfuse.com/docs/integrations/ollama) is an open source LLM observability platform that enables teams to collaboratively monitor, evaluate and debug AI applications.
- [MLflow Tracing](https://mlflow.org/docs/latest/llms/tracing/index.html#automatic-tracing) is an open source LLM observability tool with a convenient API to log and visualize traces, making it easy to debug and evaluate GenAI applications.

25
SECURITY.md Normal file
View File

@@ -0,0 +1,25 @@
# Security
The Ollama maintainer team takes security seriously and will actively work to resolve security issues.
## Reporting a vulnerability
If you discover a security vulnerability, please do not open a public issue. Instead, please report it by emailing hello@ollama.com. We ask that you give us sufficient time to investigate and address the vulnerability before disclosing it publicly.
Please include the following details in your report:
- A description of the vulnerability
- Steps to reproduce the issue
- Your assessment of the potential impact
- Any possible mitigations
## Security best practices
While the maintainer team does their best to secure Ollama, users are encouraged to implement their own security best practices, such as:
- Regularly updating to the latest version of Ollama
- Securing access to hosted instances of Ollama
- Monitoring systems for unusual activity
## Contact
For any other questions or concerns related to security, please contact us at hello@ollama.com

View File

@@ -1,3 +1,16 @@
// Package api implements the client-side API for code wishing to interact
// with the ollama service. The methods of the [Client] type correspond to
// the ollama REST API as described in [the API documentation].
// The ollama command-line client itself uses this package to interact with
// the backend service.
//
// # Examples
//
// Several examples of using this package are available [in the GitHub
// repository].
//
// [the API documentation]: https://github.com/ollama/ollama/blob/main/docs/api.md
// [in the GitHub repository]: https://github.com/ollama/ollama/tree/main/api/examples
package api
import (
@@ -5,26 +18,23 @@ import (
"bytes"
"context"
"encoding/json"
"errors"
"fmt"
"io"
"net"
"net/http"
"net/url"
"os"
"runtime"
"strings"
"github.com/jmorganca/ollama/format"
"github.com/jmorganca/ollama/version"
"github.com/ollama/ollama/envconfig"
"github.com/ollama/ollama/format"
"github.com/ollama/ollama/version"
)
const DefaultHost = "127.0.0.1:11434"
var envHost = os.Getenv("OLLAMA_HOST")
// Client encapsulates client state for interacting with the ollama
// service. Use [ClientFromEnvironment] to create new Clients.
type Client struct {
base *url.URL
http http.Client
http *http.Client
}
func checkError(resp *http.Response, body []byte) error {
@@ -43,57 +53,46 @@ func checkError(resp *http.Response, body []byte) error {
return apiError
}
// ClientFromEnvironment creates a new [Client] using configuration from the
// environment variable OLLAMA_HOST, which points to the network host and
// port on which the ollama service is listening. The format of this variable
// is:
//
// <scheme>://<host>:<port>
//
// If the variable is not specified, a default ollama host and port will be
// used.
func ClientFromEnvironment() (*Client, error) {
scheme, hostport, ok := strings.Cut(os.Getenv("OLLAMA_HOST"), "://")
if !ok {
scheme, hostport = "http", os.Getenv("OLLAMA_HOST")
return &Client{
base: envconfig.Host(),
http: http.DefaultClient,
}, nil
}
host, port, err := net.SplitHostPort(hostport)
if err != nil {
host, port = "127.0.0.1", "11434"
if ip := net.ParseIP(strings.Trim(hostport, "[]")); ip != nil {
host = ip.String()
} else if hostport != "" {
host = hostport
func NewClient(base *url.URL, http *http.Client) *Client {
return &Client{
base: base,
http: http,
}
}
client := Client{
base: &url.URL{
Scheme: scheme,
Host: net.JoinHostPort(host, port),
},
}
mockRequest, err := http.NewRequest("HEAD", client.base.String(), nil)
if err != nil {
return nil, err
}
proxyURL, err := http.ProxyFromEnvironment(mockRequest)
if err != nil {
return nil, err
}
client.http = http.Client{
Transport: &http.Transport{
Proxy: http.ProxyURL(proxyURL),
},
}
return &client, nil
}
func (c *Client) do(ctx context.Context, method, path string, reqData, respData any) error {
var reqBody io.Reader
var data []byte
var err error
if reqData != nil {
switch reqData := reqData.(type) {
case io.Reader:
// reqData is already an io.Reader
reqBody = reqData
case nil:
// noop
default:
data, err = json.Marshal(reqData)
if err != nil {
return err
}
reqBody = bytes.NewReader(data)
}
@@ -133,7 +132,7 @@ func (c *Client) do(ctx context.Context, method, path string, reqData, respData
const maxBufferSize = 512 * format.KiloByte
func (c *Client) stream(ctx context.Context, method, path string, data any, fn func([]byte) error) error {
var buf *bytes.Buffer
var buf io.Reader
if data != nil {
bts, err := json.Marshal(data)
if err != nil {
@@ -174,7 +173,7 @@ func (c *Client) stream(ctx context.Context, method, path string, data any, fn f
}
if errorResponse.Error != "" {
return fmt.Errorf(errorResponse.Error)
return errors.New(errorResponse.Error)
}
if response.StatusCode >= http.StatusBadRequest {
@@ -193,8 +192,14 @@ func (c *Client) stream(ctx context.Context, method, path string, data any, fn f
return nil
}
// GenerateResponseFunc is a function that [Client.Generate] invokes every time
// a response is received from the service. If this function returns an error,
// [Client.Generate] will stop generating and return this error.
type GenerateResponseFunc func(GenerateResponse) error
// Generate generates a response for a given prompt. The req parameter should
// be populated with prompt details. fn is called for each response (there may
// be multiple responses, e.g. in case streaming is enabled).
func (c *Client) Generate(ctx context.Context, req *GenerateRequest, fn GenerateResponseFunc) error {
return c.stream(ctx, http.MethodPost, "/api/generate", req, func(bts []byte) error {
var resp GenerateResponse
@@ -206,8 +211,34 @@ func (c *Client) Generate(ctx context.Context, req *GenerateRequest, fn Generate
})
}
// ChatResponseFunc is a function that [Client.Chat] invokes every time
// a response is received from the service. If this function returns an error,
// [Client.Chat] will stop generating and return this error.
type ChatResponseFunc func(ChatResponse) error
// Chat generates the next message in a chat. [ChatRequest] may contain a
// sequence of messages which can be used to maintain chat history with a model.
// fn is called for each response (there may be multiple responses, e.g. if case
// streaming is enabled).
func (c *Client) Chat(ctx context.Context, req *ChatRequest, fn ChatResponseFunc) error {
return c.stream(ctx, http.MethodPost, "/api/chat", req, func(bts []byte) error {
var resp ChatResponse
if err := json.Unmarshal(bts, &resp); err != nil {
return err
}
return fn(resp)
})
}
// PullProgressFunc is a function that [Client.Pull] invokes every time there
// is progress with a "pull" request sent to the service. If this function
// returns an error, [Client.Pull] will stop the process and return this error.
type PullProgressFunc func(ProgressResponse) error
// Pull downloads a model from the ollama library. fn is called each time
// progress is made on the request and can be used to display a progress bar,
// etc.
func (c *Client) Pull(ctx context.Context, req *PullRequest, fn PullProgressFunc) error {
return c.stream(ctx, http.MethodPost, "/api/pull", req, func(bts []byte) error {
var resp ProgressResponse
@@ -219,8 +250,14 @@ func (c *Client) Pull(ctx context.Context, req *PullRequest, fn PullProgressFunc
})
}
// PushProgressFunc is a function that [Client.Push] invokes when progress is
// made.
// It's similar to other progress function types like [PullProgressFunc].
type PushProgressFunc func(ProgressResponse) error
// Push uploads a model to the model library; requires registering for ollama.ai
// and adding a public key first. fn is called each time progress is made on
// the request and can be used to display a progress bar, etc.
func (c *Client) Push(ctx context.Context, req *PushRequest, fn PushProgressFunc) error {
return c.stream(ctx, http.MethodPost, "/api/push", req, func(bts []byte) error {
var resp ProgressResponse
@@ -232,8 +269,15 @@ func (c *Client) Push(ctx context.Context, req *PushRequest, fn PushProgressFunc
})
}
// CreateProgressFunc is a function that [Client.Create] invokes when progress
// is made.
// It's similar to other progress function types like [PullProgressFunc].
type CreateProgressFunc func(ProgressResponse) error
// Create creates a model from a [Modelfile]. fn is a progress function that
// behaves similarly to other methods (see [Client.Pull]).
//
// [Modelfile]: https://github.com/ollama/ollama/blob/main/docs/modelfile.md
func (c *Client) Create(ctx context.Context, req *CreateRequest, fn CreateProgressFunc) error {
return c.stream(ctx, http.MethodPost, "/api/create", req, func(bts []byte) error {
var resp ProgressResponse
@@ -245,6 +289,7 @@ func (c *Client) Create(ctx context.Context, req *CreateRequest, fn CreateProgre
})
}
// List lists models that are available locally.
func (c *Client) List(ctx context.Context) (*ListResponse, error) {
var lr ListResponse
if err := c.do(ctx, http.MethodGet, "/api/tags", nil, &lr); err != nil {
@@ -253,6 +298,17 @@ func (c *Client) List(ctx context.Context) (*ListResponse, error) {
return &lr, nil
}
// ListRunning lists running models.
func (c *Client) ListRunning(ctx context.Context) (*ProcessResponse, error) {
var lr ProcessResponse
if err := c.do(ctx, http.MethodGet, "/api/ps", nil, &lr); err != nil {
return nil, err
}
return &lr, nil
}
// Copy copies a model - creating a model with another name from an existing
// model.
func (c *Client) Copy(ctx context.Context, req *CopyRequest) error {
if err := c.do(ctx, http.MethodPost, "/api/copy", req, nil); err != nil {
return err
@@ -260,6 +316,7 @@ func (c *Client) Copy(ctx context.Context, req *CopyRequest) error {
return nil
}
// Delete deletes a model and its data.
func (c *Client) Delete(ctx context.Context, req *DeleteRequest) error {
if err := c.do(ctx, http.MethodDelete, "/api/delete", req, nil); err != nil {
return err
@@ -267,6 +324,7 @@ func (c *Client) Delete(ctx context.Context, req *DeleteRequest) error {
return nil
}
// Show obtains model information, including details, modelfile, license etc.
func (c *Client) Show(ctx context.Context, req *ShowRequest) (*ShowResponse, error) {
var resp ShowResponse
if err := c.do(ctx, http.MethodPost, "/api/show", req, &resp); err != nil {
@@ -275,9 +333,48 @@ func (c *Client) Show(ctx context.Context, req *ShowRequest) (*ShowResponse, err
return &resp, nil
}
// Heartbeat checks if the server has started and is responsive; if yes, it
// returns nil, otherwise an error.
func (c *Client) Heartbeat(ctx context.Context) error {
if err := c.do(ctx, http.MethodHead, "/", nil, nil); err != nil {
return err
}
return nil
}
// Embed generates embeddings from a model.
func (c *Client) Embed(ctx context.Context, req *EmbedRequest) (*EmbedResponse, error) {
var resp EmbedResponse
if err := c.do(ctx, http.MethodPost, "/api/embed", req, &resp); err != nil {
return nil, err
}
return &resp, nil
}
// Embeddings generates an embedding from a model.
func (c *Client) Embeddings(ctx context.Context, req *EmbeddingRequest) (*EmbeddingResponse, error) {
var resp EmbeddingResponse
if err := c.do(ctx, http.MethodPost, "/api/embeddings", req, &resp); err != nil {
return nil, err
}
return &resp, nil
}
// CreateBlob creates a blob from a file on the server. digest is the
// expected SHA256 digest of the file, and r represents the file.
func (c *Client) CreateBlob(ctx context.Context, digest string, r io.Reader) error {
return c.do(ctx, http.MethodPost, fmt.Sprintf("/api/blobs/%s", digest), r, nil)
}
// Version returns the Ollama server version as a string.
func (c *Client) Version(ctx context.Context) (string, error) {
var version struct {
Version string `json:"version"`
}
if err := c.do(ctx, http.MethodGet, "/api/version", nil, &version); err != nil {
return "", err
}
return version.Version, nil
}

View File

@@ -1,225 +0,0 @@
import os
import json
import requests
BASE_URL = os.environ.get('OLLAMA_HOST', 'http://localhost:11434')
# Generate a response for a given prompt with a provided model. This is a streaming endpoint, so will be a series of responses.
# The final response object will include statistics and additional data from the request. Use the callback function to override
# the default handler.
def generate(model_name, prompt, system=None, template=None, context=None, options=None, callback=None):
try:
url = f"{BASE_URL}/api/generate"
payload = {
"model": model_name,
"prompt": prompt,
"system": system,
"template": template,
"context": context,
"options": options
}
# Remove keys with None values
payload = {k: v for k, v in payload.items() if v is not None}
with requests.post(url, json=payload, stream=True) as response:
response.raise_for_status()
# Creating a variable to hold the context history of the final chunk
final_context = None
# Variable to hold concatenated response strings if no callback is provided
full_response = ""
# Iterating over the response line by line and displaying the details
for line in response.iter_lines():
if line:
# Parsing each line (JSON chunk) and extracting the details
chunk = json.loads(line)
# If a callback function is provided, call it with the chunk
if callback:
callback(chunk)
else:
# If this is not the last chunk, add the "response" field value to full_response and print it
if not chunk.get("done"):
response_piece = chunk.get("response", "")
full_response += response_piece
print(response_piece, end="", flush=True)
# Check if it's the last chunk (done is true)
if chunk.get("done"):
final_context = chunk.get("context")
# Return the full response and the final context
return full_response, final_context
except requests.exceptions.RequestException as e:
print(f"An error occurred: {e}")
return None, None
# Create a model from a Modelfile. Use the callback function to override the default handler.
def create(model_name, model_path, callback=None):
try:
url = f"{BASE_URL}/api/create"
payload = {"name": model_name, "path": model_path}
# Making a POST request with the stream parameter set to True to handle streaming responses
with requests.post(url, json=payload, stream=True) as response:
response.raise_for_status()
# Iterating over the response line by line and displaying the status
for line in response.iter_lines():
if line:
# Parsing each line (JSON chunk) and extracting the status
chunk = json.loads(line)
if callback:
callback(chunk)
else:
print(f"Status: {chunk.get('status')}")
except requests.exceptions.RequestException as e:
print(f"An error occurred: {e}")
# Pull a model from a the model registry. Cancelled pulls are resumed from where they left off, and multiple
# calls to will share the same download progress. Use the callback function to override the default handler.
def pull(model_name, insecure=False, callback=None):
try:
url = f"{BASE_URL}/api/pull"
payload = {
"name": model_name,
"insecure": insecure
}
# Making a POST request with the stream parameter set to True to handle streaming responses
with requests.post(url, json=payload, stream=True) as response:
response.raise_for_status()
# Iterating over the response line by line and displaying the details
for line in response.iter_lines():
if line:
# Parsing each line (JSON chunk) and extracting the details
chunk = json.loads(line)
# If a callback function is provided, call it with the chunk
if callback:
callback(chunk)
else:
# Print the status message directly to the console
print(chunk.get('status', ''), end='', flush=True)
# If there's layer data, you might also want to print that (adjust as necessary)
if 'digest' in chunk:
print(f" - Digest: {chunk['digest']}", end='', flush=True)
print(f" - Total: {chunk['total']}", end='', flush=True)
print(f" - Completed: {chunk['completed']}", end='\n', flush=True)
else:
print()
except requests.exceptions.RequestException as e:
print(f"An error occurred: {e}")
# Push a model to the model registry. Use the callback function to override the default handler.
def push(model_name, insecure=False, callback=None):
try:
url = f"{BASE_URL}/api/push"
payload = {
"name": model_name,
"insecure": insecure
}
# Making a POST request with the stream parameter set to True to handle streaming responses
with requests.post(url, json=payload, stream=True) as response:
response.raise_for_status()
# Iterating over the response line by line and displaying the details
for line in response.iter_lines():
if line:
# Parsing each line (JSON chunk) and extracting the details
chunk = json.loads(line)
# If a callback function is provided, call it with the chunk
if callback:
callback(chunk)
else:
# Print the status message directly to the console
print(chunk.get('status', ''), end='', flush=True)
# If there's layer data, you might also want to print that (adjust as necessary)
if 'digest' in chunk:
print(f" - Digest: {chunk['digest']}", end='', flush=True)
print(f" - Total: {chunk['total']}", end='', flush=True)
print(f" - Completed: {chunk['completed']}", end='\n', flush=True)
else:
print()
except requests.exceptions.RequestException as e:
print(f"An error occurred: {e}")
# List models that are available locally.
def list():
try:
response = requests.get(f"{BASE_URL}/api/tags")
response.raise_for_status()
data = response.json()
models = data.get('models', [])
return models
except requests.exceptions.RequestException as e:
print(f"An error occurred: {e}")
return None
# Copy a model. Creates a model with another name from an existing model.
def copy(source, destination):
try:
# Create the JSON payload
payload = {
"source": source,
"destination": destination
}
response = requests.post(f"{BASE_URL}/api/copy", json=payload)
response.raise_for_status()
# If the request was successful, return a message indicating that the copy was successful
return "Copy successful"
except requests.exceptions.RequestException as e:
print(f"An error occurred: {e}")
return None
# Delete a model and its data.
def delete(model_name):
try:
url = f"{BASE_URL}/api/delete"
payload = {"name": model_name}
response = requests.delete(url, json=payload)
response.raise_for_status()
return "Delete successful"
except requests.exceptions.RequestException as e:
print(f"An error occurred: {e}")
return None
# Show info about a model.
def show(model_name):
try:
url = f"{BASE_URL}/api/show"
payload = {"name": model_name}
response = requests.post(url, json=payload)
response.raise_for_status()
# Parse the JSON response and return it
data = response.json()
return data
except requests.exceptions.RequestException as e:
print(f"An error occurred: {e}")
return None
def heartbeat():
try:
url = f"{BASE_URL}/"
response = requests.head(url)
response.raise_for_status()
return "Ollama is running"
except requests.exceptions.RequestException as e:
print(f"An error occurred: {e}")
return "Ollama is not running"

255
api/client_test.go Normal file
View File

@@ -0,0 +1,255 @@
package api
import (
"context"
"encoding/json"
"fmt"
"net/http"
"net/http/httptest"
"net/url"
"strings"
"testing"
)
func TestClientFromEnvironment(t *testing.T) {
type testCase struct {
value string
expect string
err error
}
testCases := map[string]*testCase{
"empty": {value: "", expect: "http://127.0.0.1:11434"},
"only address": {value: "1.2.3.4", expect: "http://1.2.3.4:11434"},
"only port": {value: ":1234", expect: "http://:1234"},
"address and port": {value: "1.2.3.4:1234", expect: "http://1.2.3.4:1234"},
"scheme http and address": {value: "http://1.2.3.4", expect: "http://1.2.3.4:80"},
"scheme https and address": {value: "https://1.2.3.4", expect: "https://1.2.3.4:443"},
"scheme, address, and port": {value: "https://1.2.3.4:1234", expect: "https://1.2.3.4:1234"},
"hostname": {value: "example.com", expect: "http://example.com:11434"},
"hostname and port": {value: "example.com:1234", expect: "http://example.com:1234"},
"scheme http and hostname": {value: "http://example.com", expect: "http://example.com:80"},
"scheme https and hostname": {value: "https://example.com", expect: "https://example.com:443"},
"scheme, hostname, and port": {value: "https://example.com:1234", expect: "https://example.com:1234"},
"trailing slash": {value: "example.com/", expect: "http://example.com:11434"},
"trailing slash port": {value: "example.com:1234/", expect: "http://example.com:1234"},
}
for k, v := range testCases {
t.Run(k, func(t *testing.T) {
t.Setenv("OLLAMA_HOST", v.value)
client, err := ClientFromEnvironment()
if err != v.err {
t.Fatalf("expected %s, got %s", v.err, err)
}
if client.base.String() != v.expect {
t.Fatalf("expected %s, got %s", v.expect, client.base.String())
}
})
}
}
// testError represents an internal error type with status code and message
// this is used since the error response from the server is not a standard error struct
type testError struct {
message string
statusCode int
}
func (e testError) Error() string {
return e.message
}
func TestClientStream(t *testing.T) {
testCases := []struct {
name string
responses []any
wantErr string
}{
{
name: "immediate error response",
responses: []any{
testError{
message: "test error message",
statusCode: http.StatusBadRequest,
},
},
wantErr: "test error message",
},
{
name: "error after successful chunks, ok response",
responses: []any{
ChatResponse{Message: Message{Content: "partial response 1"}},
ChatResponse{Message: Message{Content: "partial response 2"}},
testError{
message: "mid-stream error",
statusCode: http.StatusOK,
},
},
wantErr: "mid-stream error",
},
{
name: "successful stream completion",
responses: []any{
ChatResponse{Message: Message{Content: "chunk 1"}},
ChatResponse{Message: Message{Content: "chunk 2"}},
ChatResponse{
Message: Message{Content: "final chunk"},
Done: true,
DoneReason: "stop",
},
},
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
flusher, ok := w.(http.Flusher)
if !ok {
t.Fatal("expected http.Flusher")
}
w.Header().Set("Content-Type", "application/x-ndjson")
for _, resp := range tc.responses {
if errResp, ok := resp.(testError); ok {
w.WriteHeader(errResp.statusCode)
err := json.NewEncoder(w).Encode(map[string]string{
"error": errResp.message,
})
if err != nil {
t.Fatal("failed to encode error response:", err)
}
return
}
if err := json.NewEncoder(w).Encode(resp); err != nil {
t.Fatalf("failed to encode response: %v", err)
}
flusher.Flush()
}
}))
defer ts.Close()
client := NewClient(&url.URL{Scheme: "http", Host: ts.Listener.Addr().String()}, http.DefaultClient)
var receivedChunks []ChatResponse
err := client.stream(context.Background(), http.MethodPost, "/v1/chat", nil, func(chunk []byte) error {
var resp ChatResponse
if err := json.Unmarshal(chunk, &resp); err != nil {
return fmt.Errorf("failed to unmarshal chunk: %w", err)
}
receivedChunks = append(receivedChunks, resp)
return nil
})
if tc.wantErr != "" {
if err == nil {
t.Fatal("expected error but got nil")
}
if !strings.Contains(err.Error(), tc.wantErr) {
t.Errorf("expected error containing %q, got %v", tc.wantErr, err)
}
return
}
if err != nil {
t.Errorf("unexpected error: %v", err)
}
})
}
}
func TestClientDo(t *testing.T) {
testCases := []struct {
name string
response any
wantErr string
}{
{
name: "immediate error response",
response: testError{
message: "test error message",
statusCode: http.StatusBadRequest,
},
wantErr: "test error message",
},
{
name: "server error response",
response: testError{
message: "internal error",
statusCode: http.StatusInternalServerError,
},
wantErr: "internal error",
},
{
name: "successful response",
response: struct {
ID string `json:"id"`
Success bool `json:"success"`
}{
ID: "msg_123",
Success: true,
},
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if errResp, ok := tc.response.(testError); ok {
w.WriteHeader(errResp.statusCode)
err := json.NewEncoder(w).Encode(map[string]string{
"error": errResp.message,
})
if err != nil {
t.Fatal("failed to encode error response:", err)
}
return
}
w.Header().Set("Content-Type", "application/json")
if err := json.NewEncoder(w).Encode(tc.response); err != nil {
t.Fatalf("failed to encode response: %v", err)
}
}))
defer ts.Close()
client := NewClient(&url.URL{Scheme: "http", Host: ts.Listener.Addr().String()}, http.DefaultClient)
var resp struct {
ID string `json:"id"`
Success bool `json:"success"`
}
err := client.do(context.Background(), http.MethodPost, "/v1/messages", nil, &resp)
if tc.wantErr != "" {
if err == nil {
t.Fatalf("got nil, want error %q", tc.wantErr)
}
if err.Error() != tc.wantErr {
t.Errorf("error message mismatch: got %q, want %q", err.Error(), tc.wantErr)
}
return
}
if err != nil {
t.Fatalf("got error %q, want nil", err)
}
if expectedResp, ok := tc.response.(struct {
ID string `json:"id"`
Success bool `json:"success"`
}); ok {
if resp.ID != expectedResp.ID {
t.Errorf("response ID mismatch: got %q, want %q", resp.ID, expectedResp.ID)
}
if resp.Success != expectedResp.Success {
t.Errorf("response Success mismatch: got %v, want %v", resp.Success, expectedResp.Success)
}
}
})
}
}

18
api/examples/README.md Normal file
View File

@@ -0,0 +1,18 @@
# Ollama API Examples
Run the examples in this directory with:
```shell
go run example_name/main.go
```
## Chat - Chat with a model
- [chat/main.go](chat/main.go)
## Generate - Generate text from a model
- [generate/main.go](generate/main.go)
- [generate-streaming/main.go](generate-streaming/main.go)
## Pull - Pull a model
- [pull-progress/main.go](pull-progress/main.go)

51
api/examples/chat/main.go Normal file
View File

@@ -0,0 +1,51 @@
package main
import (
"context"
"fmt"
"log"
"github.com/ollama/ollama/api"
)
func main() {
client, err := api.ClientFromEnvironment()
if err != nil {
log.Fatal(err)
}
messages := []api.Message{
api.Message{
Role: "system",
Content: "Provide very brief, concise responses",
},
api.Message{
Role: "user",
Content: "Name some unusual animals",
},
api.Message{
Role: "assistant",
Content: "Monotreme, platypus, echidna",
},
api.Message{
Role: "user",
Content: "which of these is the most dangerous?",
},
}
ctx := context.Background()
req := &api.ChatRequest{
Model: "llama3.2",
Messages: messages,
}
respFunc := func(resp api.ChatResponse) error {
fmt.Print(resp.Message.Content)
return nil
}
err = client.Chat(ctx, req, respFunc)
if err != nil {
log.Fatal(err)
}
}

View File

@@ -0,0 +1,40 @@
package main
import (
"context"
"fmt"
"log"
"github.com/ollama/ollama/api"
)
func main() {
client, err := api.ClientFromEnvironment()
if err != nil {
log.Fatal(err)
}
// By default, GenerateRequest is streaming.
req := &api.GenerateRequest{
Model: "gemma2",
Prompt: "how many planets are there?",
}
ctx := context.Background()
respFunc := func(resp api.GenerateResponse) error {
// Only print the response here; GenerateResponse has a number of other
// interesting fields you want to examine.
// In streaming mode, responses are partial so we call fmt.Print (and not
// Println) in order to avoid spurious newlines being introduced. The
// model will insert its own newlines if it wants.
fmt.Print(resp.Response)
return nil
}
err = client.Generate(ctx, req, respFunc)
if err != nil {
log.Fatal(err)
}
fmt.Println()
}

View File

@@ -0,0 +1,37 @@
package main
import (
"context"
"fmt"
"log"
"github.com/ollama/ollama/api"
)
func main() {
client, err := api.ClientFromEnvironment()
if err != nil {
log.Fatal(err)
}
req := &api.GenerateRequest{
Model: "gemma2",
Prompt: "how many planets are there?",
// set streaming to false
Stream: new(bool),
}
ctx := context.Background()
respFunc := func(resp api.GenerateResponse) error {
// Only print the response here; GenerateResponse has a number of other
// interesting fields you want to examine.
fmt.Println(resp.Response)
return nil
}
err = client.Generate(ctx, req, respFunc)
if err != nil {
log.Fatal(err)
}
}

View File

@@ -0,0 +1,47 @@
package main
import (
"context"
"fmt"
"log"
"os"
"github.com/ollama/ollama/api"
)
func main() {
if len(os.Args) <= 1 {
log.Fatal("usage: <image name>")
}
imgData, err := os.ReadFile(os.Args[1])
if err != nil {
log.Fatal(err)
}
client, err := api.ClientFromEnvironment()
if err != nil {
log.Fatal(err)
}
req := &api.GenerateRequest{
Model: "llava",
Prompt: "describe this image",
Images: []api.ImageData{imgData},
}
ctx := context.Background()
respFunc := func(resp api.GenerateResponse) error {
// In streaming mode, responses are partial so we call fmt.Print (and not
// Println) in order to avoid spurious newlines being introduced. The
// model will insert its own newlines if it wants.
fmt.Print(resp.Response)
return nil
}
err = client.Generate(ctx, req, respFunc)
if err != nil {
log.Fatal(err)
}
fmt.Println()
}

View File

@@ -0,0 +1,31 @@
package main
import (
"context"
"fmt"
"log"
"github.com/ollama/ollama/api"
)
func main() {
client, err := api.ClientFromEnvironment()
if err != nil {
log.Fatal(err)
}
ctx := context.Background()
req := &api.PullRequest{
Model: "mistral",
}
progressFunc := func(resp api.ProgressResponse) error {
fmt.Printf("Progress: status=%v, total=%v, completed=%v\n", resp.Status, resp.Total, resp.Completed)
return nil
}
err = client.Pull(ctx, req, progressFunc)
if err != nil {
log.Fatal(err)
}
}

View File

@@ -3,13 +3,19 @@ package api
import (
"encoding/json"
"fmt"
"log/slog"
"math"
"os"
"reflect"
"strconv"
"strings"
"time"
"github.com/ollama/ollama/envconfig"
"github.com/ollama/ollama/types/model"
)
// StatusError is an error with an HTTP status code and message.
type StatusError struct {
StatusCode int
Status string
@@ -30,101 +36,215 @@ func (e StatusError) Error() string {
}
}
// ImageData represents the raw binary data of an image file.
type ImageData []byte
// GenerateRequest describes a request sent by [Client.Generate]. While you
// have to specify the Model and Prompt fields, all the other fields have
// reasonable defaults for basic uses.
type GenerateRequest struct {
// Model is the model name; it should be a name familiar to Ollama from
// the library at https://ollama.com/library
Model string `json:"model"`
// Prompt is the textual prompt to send to the model.
Prompt string `json:"prompt"`
// Suffix is the text that comes after the inserted text.
Suffix string `json:"suffix"`
// System overrides the model's default system message/prompt.
System string `json:"system"`
// Template overrides the model's default prompt template.
Template string `json:"template"`
// Context is the context parameter returned from a previous call to
// [Client.Generate]. It can be used to keep a short conversational memory.
Context []int `json:"context,omitempty"`
// Stream specifies whether the response is streaming; it is true by default.
Stream *bool `json:"stream,omitempty"`
Options map[string]interface{} `json:"options"`
// Raw set to true means that no formatting will be applied to the prompt.
Raw bool `json:"raw,omitempty"`
// Format specifies the format to return a response in.
Format json.RawMessage `json:"format,omitempty"`
// KeepAlive controls how long the model will stay loaded in memory following
// this request.
KeepAlive *Duration `json:"keep_alive,omitempty"`
// Images is an optional list of raw image bytes accompanying this
// request, for multimodal models.
Images []ImageData `json:"images,omitempty"`
// Options lists model-specific options. For example, temperature can be
// set through this field, if the model supports it.
Options map[string]any `json:"options"`
}
type EmbeddingRequest struct {
// ChatRequest describes a request sent by [Client.Chat].
type ChatRequest struct {
// Model is the model name, as in [GenerateRequest].
Model string `json:"model"`
Prompt string `json:"prompt"`
Options map[string]interface{} `json:"options"`
}
// Messages is the messages of the chat - can be used to keep a chat memory.
Messages []Message `json:"messages"`
type EmbeddingResponse struct {
Embedding []float64 `json:"embedding"`
}
type CreateRequest struct {
Name string `json:"name"`
Path string `json:"path"`
// Stream enables streaming of returned responses; true by default.
Stream *bool `json:"stream,omitempty"`
// Format is the format to return the response in (e.g. "json").
Format json.RawMessage `json:"format,omitempty"`
// KeepAlive controls how long the model will stay loaded into memory
// following the request.
KeepAlive *Duration `json:"keep_alive,omitempty"`
// Tools is an optional list of tools the model has access to.
Tools `json:"tools,omitempty"`
// Options lists model-specific options.
Options map[string]any `json:"options"`
}
type DeleteRequest struct {
type Tools []Tool
func (t Tools) String() string {
bts, _ := json.Marshal(t)
return string(bts)
}
func (t Tool) String() string {
bts, _ := json.Marshal(t)
return string(bts)
}
// Message is a single message in a chat sequence. The message contains the
// role ("system", "user", or "assistant"), the content and an optional list
// of images.
type Message struct {
Role string `json:"role"`
Content string `json:"content"`
Images []ImageData `json:"images,omitempty"`
ToolCalls []ToolCall `json:"tool_calls,omitempty"`
}
func (m *Message) UnmarshalJSON(b []byte) error {
type Alias Message
var a Alias
if err := json.Unmarshal(b, &a); err != nil {
return err
}
*m = Message(a)
m.Role = strings.ToLower(m.Role)
return nil
}
type ToolCall struct {
Function ToolCallFunction `json:"function"`
}
type ToolCallFunction struct {
Index int `json:"index,omitempty"`
Name string `json:"name"`
Arguments ToolCallFunctionArguments `json:"arguments"`
}
type ShowRequest struct {
type ToolCallFunctionArguments map[string]any
func (t *ToolCallFunctionArguments) String() string {
bts, _ := json.Marshal(t)
return string(bts)
}
type Tool struct {
Type string `json:"type"`
Items any `json:"items,omitempty"`
Function ToolFunction `json:"function"`
}
// PropertyType can be either a string or an array of strings
type PropertyType []string
// UnmarshalJSON implements the json.Unmarshaler interface
func (pt *PropertyType) UnmarshalJSON(data []byte) error {
// Try to unmarshal as a string first
var s string
if err := json.Unmarshal(data, &s); err == nil {
*pt = []string{s}
return nil
}
// If that fails, try to unmarshal as an array of strings
var a []string
if err := json.Unmarshal(data, &a); err != nil {
return err
}
*pt = a
return nil
}
// MarshalJSON implements the json.Marshaler interface
func (pt PropertyType) MarshalJSON() ([]byte, error) {
if len(pt) == 1 {
// If there's only one type, marshal as a string
return json.Marshal(pt[0])
}
// Otherwise marshal as an array
return json.Marshal([]string(pt))
}
// String returns a string representation of the PropertyType
func (pt PropertyType) String() string {
if len(pt) == 0 {
return ""
}
if len(pt) == 1 {
return pt[0]
}
return fmt.Sprintf("%v", []string(pt))
}
type ToolFunction struct {
Name string `json:"name"`
Description string `json:"description"`
Parameters struct {
Type string `json:"type"`
Defs any `json:"$defs,omitempty"`
Items any `json:"items,omitempty"`
Required []string `json:"required"`
Properties map[string]struct {
Type PropertyType `json:"type"`
Items any `json:"items,omitempty"`
Description string `json:"description"`
Enum []any `json:"enum,omitempty"`
} `json:"properties"`
} `json:"parameters"`
}
type ShowResponse struct {
License string `json:"license,omitempty"`
Modelfile string `json:"modelfile,omitempty"`
Parameters string `json:"parameters,omitempty"`
Template string `json:"template,omitempty"`
System string `json:"system,omitempty"`
func (t *ToolFunction) String() string {
bts, _ := json.Marshal(t)
return string(bts)
}
type CopyRequest struct {
Source string `json:"source"`
Destination string `json:"destination"`
}
type PullRequest struct {
Name string `json:"name"`
Insecure bool `json:"insecure,omitempty"`
Username string `json:"username"`
Password string `json:"password"`
Stream *bool `json:"stream,omitempty"`
}
type ProgressResponse struct {
Status string `json:"status"`
Digest string `json:"digest,omitempty"`
Total int64 `json:"total,omitempty"`
Completed int64 `json:"completed,omitempty"`
}
type PushRequest struct {
Name string `json:"name"`
Insecure bool `json:"insecure,omitempty"`
Username string `json:"username"`
Password string `json:"password"`
Stream *bool `json:"stream,omitempty"`
}
type ListResponse struct {
Models []ModelResponse `json:"models"`
}
type ModelResponse struct {
Name string `json:"name"`
ModifiedAt time.Time `json:"modified_at"`
Size int64 `json:"size"`
Digest string `json:"digest"`
}
type TokenResponse struct {
Token string `json:"token"`
}
type GenerateResponse struct {
// ChatResponse is the response returned by [Client.Chat]. Its fields are
// similar to [GenerateResponse].
type ChatResponse struct {
Model string `json:"model"`
CreatedAt time.Time `json:"created_at"`
Response string `json:"response"`
Message Message `json:"message"`
DoneReason string `json:"done_reason,omitempty"`
Done bool `json:"done"`
Context []int `json:"context,omitempty"`
Metrics
}
type Metrics struct {
TotalDuration time.Duration `json:"total_duration,omitempty"`
LoadDuration time.Duration `json:"load_duration,omitempty"`
PromptEvalCount int `json:"prompt_eval_count,omitempty"`
@@ -133,54 +253,8 @@ type GenerateResponse struct {
EvalDuration time.Duration `json:"eval_duration,omitempty"`
}
func (r *GenerateResponse) Summary() {
if r.TotalDuration > 0 {
fmt.Fprintf(os.Stderr, "total duration: %v\n", r.TotalDuration)
}
if r.LoadDuration > 0 {
fmt.Fprintf(os.Stderr, "load duration: %v\n", r.LoadDuration)
}
if r.PromptEvalCount > 0 {
fmt.Fprintf(os.Stderr, "prompt eval count: %d token(s)\n", r.PromptEvalCount)
}
if r.PromptEvalDuration > 0 {
fmt.Fprintf(os.Stderr, "prompt eval duration: %s\n", r.PromptEvalDuration)
fmt.Fprintf(os.Stderr, "prompt eval rate: %.2f tokens/s\n", float64(r.PromptEvalCount)/r.PromptEvalDuration.Seconds())
}
if r.EvalCount > 0 {
fmt.Fprintf(os.Stderr, "eval count: %d token(s)\n", r.EvalCount)
}
if r.EvalDuration > 0 {
fmt.Fprintf(os.Stderr, "eval duration: %s\n", r.EvalDuration)
fmt.Fprintf(os.Stderr, "eval rate: %.2f tokens/s\n", float64(r.EvalCount)/r.EvalDuration.Seconds())
}
}
// Runner options which must be set when the model is loaded into memory
type Runner struct {
UseNUMA bool `json:"numa,omitempty"`
NumCtx int `json:"num_ctx,omitempty"`
NumBatch int `json:"num_batch,omitempty"`
NumGQA int `json:"num_gqa,omitempty"`
NumGPU int `json:"num_gpu,omitempty"`
MainGPU int `json:"main_gpu,omitempty"`
LowVRAM bool `json:"low_vram,omitempty"`
F16KV bool `json:"f16_kv,omitempty"`
LogitsAll bool `json:"logits_all,omitempty"`
VocabOnly bool `json:"vocab_only,omitempty"`
UseMMap bool `json:"use_mmap,omitempty"`
UseMLock bool `json:"use_mlock,omitempty"`
EmbeddingOnly bool `json:"embedding_only,omitempty"`
RopeFrequencyBase float32 `json:"rope_frequency_base,omitempty"`
RopeFrequencyScale float32 `json:"rope_frequency_scale,omitempty"`
NumThread int `json:"num_thread,omitempty"`
}
// Options specified in [GenerateRequest]. If you add a new option here, also
// add it to the API docs.
type Options struct {
Runner
@@ -190,7 +264,7 @@ type Options struct {
NumPredict int `json:"num_predict,omitempty"`
TopK int `json:"top_k,omitempty"`
TopP float32 `json:"top_p,omitempty"`
TFSZ float32 `json:"tfs_z,omitempty"`
MinP float32 `json:"min_p,omitempty"`
TypicalP float32 `json:"typical_p,omitempty"`
RepeatLastN int `json:"repeat_last_n,omitempty"`
Temperature float32 `json:"temperature,omitempty"`
@@ -200,13 +274,284 @@ type Options struct {
Mirostat int `json:"mirostat,omitempty"`
MirostatTau float32 `json:"mirostat_tau,omitempty"`
MirostatEta float32 `json:"mirostat_eta,omitempty"`
PenalizeNewline bool `json:"penalize_newline,omitempty"`
Stop []string `json:"stop,omitempty"`
}
var ErrInvalidOpts = fmt.Errorf("invalid options")
// Runner options which must be set when the model is loaded into memory
type Runner struct {
NumCtx int `json:"num_ctx,omitempty"`
NumBatch int `json:"num_batch,omitempty"`
NumGPU int `json:"num_gpu,omitempty"`
MainGPU int `json:"main_gpu,omitempty"`
LowVRAM bool `json:"low_vram,omitempty"`
F16KV bool `json:"f16_kv,omitempty"` // Deprecated: This option is ignored
LogitsAll bool `json:"logits_all,omitempty"`
VocabOnly bool `json:"vocab_only,omitempty"`
UseMMap *bool `json:"use_mmap,omitempty"`
UseMLock bool `json:"use_mlock,omitempty"`
NumThread int `json:"num_thread,omitempty"`
}
func (opts *Options) FromMap(m map[string]interface{}) error {
// EmbedRequest is the request passed to [Client.Embed].
type EmbedRequest struct {
// Model is the model name.
Model string `json:"model"`
// Input is the input to embed.
Input any `json:"input"`
// KeepAlive controls how long the model will stay loaded in memory following
// this request.
KeepAlive *Duration `json:"keep_alive,omitempty"`
Truncate *bool `json:"truncate,omitempty"`
// Options lists model-specific options.
Options map[string]any `json:"options"`
}
// EmbedResponse is the response from [Client.Embed].
type EmbedResponse struct {
Model string `json:"model"`
Embeddings [][]float32 `json:"embeddings"`
TotalDuration time.Duration `json:"total_duration,omitempty"`
LoadDuration time.Duration `json:"load_duration,omitempty"`
PromptEvalCount int `json:"prompt_eval_count,omitempty"`
}
// EmbeddingRequest is the request passed to [Client.Embeddings].
type EmbeddingRequest struct {
// Model is the model name.
Model string `json:"model"`
// Prompt is the textual prompt to embed.
Prompt string `json:"prompt"`
// KeepAlive controls how long the model will stay loaded in memory following
// this request.
KeepAlive *Duration `json:"keep_alive,omitempty"`
// Options lists model-specific options.
Options map[string]any `json:"options"`
}
// EmbeddingResponse is the response from [Client.Embeddings].
type EmbeddingResponse struct {
Embedding []float64 `json:"embedding"`
}
// CreateRequest is the request passed to [Client.Create].
type CreateRequest struct {
Model string `json:"model"`
Stream *bool `json:"stream,omitempty"`
Quantize string `json:"quantize,omitempty"`
From string `json:"from,omitempty"`
Files map[string]string `json:"files,omitempty"`
Adapters map[string]string `json:"adapters,omitempty"`
Template string `json:"template,omitempty"`
License any `json:"license,omitempty"`
System string `json:"system,omitempty"`
Parameters map[string]any `json:"parameters,omitempty"`
Messages []Message `json:"messages,omitempty"`
// Deprecated: set the model name with Model instead
Name string `json:"name"`
// Deprecated: use Quantize instead
Quantization string `json:"quantization,omitempty"`
}
// DeleteRequest is the request passed to [Client.Delete].
type DeleteRequest struct {
Model string `json:"model"`
// Deprecated: set the model name with Model instead
Name string `json:"name"`
}
// ShowRequest is the request passed to [Client.Show].
type ShowRequest struct {
Model string `json:"model"`
System string `json:"system"`
// Template is deprecated
Template string `json:"template"`
Verbose bool `json:"verbose"`
Options map[string]any `json:"options"`
// Deprecated: set the model name with Model instead
Name string `json:"name"`
}
// ShowResponse is the response returned from [Client.Show].
type ShowResponse struct {
License string `json:"license,omitempty"`
Modelfile string `json:"modelfile,omitempty"`
Parameters string `json:"parameters,omitempty"`
Template string `json:"template,omitempty"`
System string `json:"system,omitempty"`
Details ModelDetails `json:"details,omitempty"`
Messages []Message `json:"messages,omitempty"`
ModelInfo map[string]any `json:"model_info,omitempty"`
ProjectorInfo map[string]any `json:"projector_info,omitempty"`
Tensors []Tensor `json:"tensors,omitempty"`
Capabilities []model.Capability `json:"capabilities,omitempty"`
ModifiedAt time.Time `json:"modified_at,omitempty"`
}
// CopyRequest is the request passed to [Client.Copy].
type CopyRequest struct {
Source string `json:"source"`
Destination string `json:"destination"`
}
// PullRequest is the request passed to [Client.Pull].
type PullRequest struct {
Model string `json:"model"`
Insecure bool `json:"insecure,omitempty"` // Deprecated: ignored
Username string `json:"username"` // Deprecated: ignored
Password string `json:"password"` // Deprecated: ignored
Stream *bool `json:"stream,omitempty"`
// Deprecated: set the model name with Model instead
Name string `json:"name"`
}
// ProgressResponse is the response passed to progress functions like
// [PullProgressFunc] and [PushProgressFunc].
type ProgressResponse struct {
Status string `json:"status"`
Digest string `json:"digest,omitempty"`
Total int64 `json:"total,omitempty"`
Completed int64 `json:"completed,omitempty"`
}
// PushRequest is the request passed to [Client.Push].
type PushRequest struct {
Model string `json:"model"`
Insecure bool `json:"insecure,omitempty"`
Username string `json:"username"`
Password string `json:"password"`
Stream *bool `json:"stream,omitempty"`
// Deprecated: set the model name with Model instead
Name string `json:"name"`
}
// ListResponse is the response from [Client.List].
type ListResponse struct {
Models []ListModelResponse `json:"models"`
}
// ProcessResponse is the response from [Client.Process].
type ProcessResponse struct {
Models []ProcessModelResponse `json:"models"`
}
// ListModelResponse is a single model description in [ListResponse].
type ListModelResponse struct {
Name string `json:"name"`
Model string `json:"model"`
ModifiedAt time.Time `json:"modified_at"`
Size int64 `json:"size"`
Digest string `json:"digest"`
Details ModelDetails `json:"details,omitempty"`
}
// ProcessModelResponse is a single model description in [ProcessResponse].
type ProcessModelResponse struct {
Name string `json:"name"`
Model string `json:"model"`
Size int64 `json:"size"`
Digest string `json:"digest"`
Details ModelDetails `json:"details,omitempty"`
ExpiresAt time.Time `json:"expires_at"`
SizeVRAM int64 `json:"size_vram"`
}
type RetrieveModelResponse struct {
Id string `json:"id"`
Object string `json:"object"`
Created int64 `json:"created"`
OwnedBy string `json:"owned_by"`
}
type TokenResponse struct {
Token string `json:"token"`
}
// GenerateResponse is the response passed into [GenerateResponseFunc].
type GenerateResponse struct {
// Model is the model name that generated the response.
Model string `json:"model"`
// CreatedAt is the timestamp of the response.
CreatedAt time.Time `json:"created_at"`
// Response is the textual response itself.
Response string `json:"response"`
// Done specifies if the response is complete.
Done bool `json:"done"`
// DoneReason is the reason the model stopped generating text.
DoneReason string `json:"done_reason,omitempty"`
// Context is an encoding of the conversation used in this response; this
// can be sent in the next request to keep a conversational memory.
Context []int `json:"context,omitempty"`
Metrics
}
// ModelDetails provides details about a model.
type ModelDetails struct {
ParentModel string `json:"parent_model"`
Format string `json:"format"`
Family string `json:"family"`
Families []string `json:"families"`
ParameterSize string `json:"parameter_size"`
QuantizationLevel string `json:"quantization_level"`
}
// Tensor describes the metadata for a given tensor.
type Tensor struct {
Name string `json:"name"`
Type string `json:"type"`
Shape []uint64 `json:"shape"`
}
func (m *Metrics) Summary() {
if m.TotalDuration > 0 {
fmt.Fprintf(os.Stderr, "total duration: %v\n", m.TotalDuration)
}
if m.LoadDuration > 0 {
fmt.Fprintf(os.Stderr, "load duration: %v\n", m.LoadDuration)
}
if m.PromptEvalCount > 0 {
fmt.Fprintf(os.Stderr, "prompt eval count: %d token(s)\n", m.PromptEvalCount)
}
if m.PromptEvalDuration > 0 {
fmt.Fprintf(os.Stderr, "prompt eval duration: %s\n", m.PromptEvalDuration)
fmt.Fprintf(os.Stderr, "prompt eval rate: %.2f tokens/s\n", float64(m.PromptEvalCount)/m.PromptEvalDuration.Seconds())
}
if m.EvalCount > 0 {
fmt.Fprintf(os.Stderr, "eval count: %d token(s)\n", m.EvalCount)
}
if m.EvalDuration > 0 {
fmt.Fprintf(os.Stderr, "eval duration: %s\n", m.EvalDuration)
fmt.Fprintf(os.Stderr, "eval rate: %.2f tokens/s\n", float64(m.EvalCount)/m.EvalDuration.Seconds())
}
}
func (opts *Options) FromMap(m map[string]any) error {
valueOpts := reflect.ValueOf(opts).Elem() // names of the fields in the options struct
typeOpts := reflect.TypeOf(opts).Elem() // types of the fields in the options struct
@@ -219,9 +564,13 @@ func (opts *Options) FromMap(m map[string]interface{}) error {
}
}
invalidOpts := []string{}
for key, val := range m {
if opt, ok := jsonOpts[key]; ok {
opt, ok := jsonOpts[key]
if !ok {
slog.Warn("invalid option provided", "option", key)
continue
}
field := valueOpts.FieldByName(opt.Name)
if field.IsValid() && field.CanSet() {
if val == nil {
@@ -259,12 +608,12 @@ func (opts *Options) FromMap(m map[string]interface{}) error {
}
field.SetString(val)
case reflect.Slice:
// JSON unmarshals to []interface{}, not []string
val, ok := val.([]interface{})
// JSON unmarshals to []any, not []string
val, ok := val.([]any)
if !ok {
return fmt.Errorf("option %q must be of type array", key)
}
// convert []interface{} to []string
// convert []any to []string
slice := make([]string, len(val))
for i, item := range val {
str, ok := item.(string)
@@ -274,30 +623,38 @@ func (opts *Options) FromMap(m map[string]interface{}) error {
slice[i] = str
}
field.Set(reflect.ValueOf(slice))
case reflect.Pointer:
var b bool
if field.Type() == reflect.TypeOf(&b) {
val, ok := val.(bool)
if !ok {
return fmt.Errorf("option %q must be of type boolean", key)
}
field.Set(reflect.ValueOf(&val))
} else {
return fmt.Errorf("unknown type loading config params: %v %v", field.Kind(), field.Type())
}
default:
return fmt.Errorf("unknown type loading config params: %v", field.Kind())
}
}
} else {
invalidOpts = append(invalidOpts, key)
}
}
if len(invalidOpts) > 0 {
return fmt.Errorf("%w: %v", ErrInvalidOpts, strings.Join(invalidOpts, ", "))
}
return nil
}
// DefaultOptions is the default set of options for [GenerateRequest]; these
// values are used unless the user specifies other values explicitly.
func DefaultOptions() Options {
return Options{
// options set on request to runner
NumPredict: -1,
NumKeep: -1,
// set a minimal num_keep to avoid issues on context shifts
NumKeep: 4,
Temperature: 0.8,
TopK: 40,
TopP: 0.9,
TFSZ: 1.0,
TypicalP: 1.0,
RepeatLastN: 64,
RepeatPenalty: 1.1,
@@ -306,24 +663,17 @@ func DefaultOptions() Options {
Mirostat: 0,
MirostatTau: 5.0,
MirostatEta: 0.1,
PenalizeNewline: true,
Seed: -1,
Runner: Runner{
// options set when the model is loaded
NumCtx: 2048,
RopeFrequencyBase: 10000.0,
RopeFrequencyScale: 1.0,
NumCtx: int(envconfig.ContextLength()),
NumBatch: 512,
NumGPU: -1, // -1 here indicates that NumGPU should be set dynamically
NumGQA: 1,
NumThread: 0, // let the runtime decide
LowVRAM: false,
F16KV: true,
UseMLock: false,
UseMMap: true,
UseNUMA: false,
EmbeddingOnly: true,
UseMMap: nil,
},
}
}
@@ -332,6 +682,13 @@ type Duration struct {
time.Duration
}
func (d Duration) MarshalJSON() ([]byte, error) {
if d.Duration < 0 {
return []byte("-1"), nil
}
return []byte("\"" + d.Duration.String() + "\""), nil
}
func (d *Duration) UnmarshalJSON(b []byte) (err error) {
var v any
if err := json.Unmarshal(b, &v); err != nil {
@@ -343,16 +700,92 @@ func (d *Duration) UnmarshalJSON(b []byte) (err error) {
switch t := v.(type) {
case float64:
if t < 0 {
t = math.MaxFloat64
d.Duration = time.Duration(math.MaxInt64)
} else {
d.Duration = time.Duration(int(t) * int(time.Second))
}
d.Duration = time.Duration(t)
case string:
d.Duration, err = time.ParseDuration(t)
if err != nil {
return err
}
if d.Duration < 0 {
d.Duration = time.Duration(math.MaxInt64)
}
default:
return fmt.Errorf("Unsupported type: '%s'", reflect.TypeOf(v))
}
return nil
}
// FormatParams converts specified parameter options to their correct types
func FormatParams(params map[string][]string) (map[string]any, error) {
opts := Options{}
valueOpts := reflect.ValueOf(&opts).Elem() // names of the fields in the options struct
typeOpts := reflect.TypeOf(opts) // types of the fields in the options struct
// build map of json struct tags to their types
jsonOpts := make(map[string]reflect.StructField)
for _, field := range reflect.VisibleFields(typeOpts) {
jsonTag := strings.Split(field.Tag.Get("json"), ",")[0]
if jsonTag != "" {
jsonOpts[jsonTag] = field
}
}
out := make(map[string]any)
// iterate params and set values based on json struct tags
for key, vals := range params {
if opt, ok := jsonOpts[key]; !ok {
return nil, fmt.Errorf("unknown parameter '%s'", key)
} else {
field := valueOpts.FieldByName(opt.Name)
if field.IsValid() && field.CanSet() {
switch field.Kind() {
case reflect.Float32:
floatVal, err := strconv.ParseFloat(vals[0], 32)
if err != nil {
return nil, fmt.Errorf("invalid float value %s", vals)
}
out[key] = float32(floatVal)
case reflect.Int:
intVal, err := strconv.ParseInt(vals[0], 10, 64)
if err != nil {
return nil, fmt.Errorf("invalid int value %s", vals)
}
out[key] = intVal
case reflect.Bool:
boolVal, err := strconv.ParseBool(vals[0])
if err != nil {
return nil, fmt.Errorf("invalid bool value %s", vals)
}
out[key] = boolVal
case reflect.String:
out[key] = vals[0]
case reflect.Slice:
// TODO: only string slices are supported right now
out[key] = vals
case reflect.Pointer:
var b bool
if field.Type() == reflect.TypeOf(&b) {
boolVal, err := strconv.ParseBool(vals[0])
if err != nil {
return nil, fmt.Errorf("invalid bool value %s", vals)
}
out[key] = &boolVal
} else {
return nil, fmt.Errorf("unknown type %s for %s", field.Kind(), key)
}
default:
return nil, fmt.Errorf("unknown type %s for %s", field.Kind(), key)
}
}
}
}
return out, nil
}

374
api/types_test.go Normal file
View File

@@ -0,0 +1,374 @@
package api
import (
"encoding/json"
"errors"
"math"
"testing"
"time"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestKeepAliveParsingFromJSON(t *testing.T) {
tests := []struct {
name string
req string
exp *Duration
}{
{
name: "Positive Integer",
req: `{ "keep_alive": 42 }`,
exp: &Duration{42 * time.Second},
},
{
name: "Positive Float",
req: `{ "keep_alive": 42.5 }`,
exp: &Duration{42 * time.Second},
},
{
name: "Positive Integer String",
req: `{ "keep_alive": "42m" }`,
exp: &Duration{42 * time.Minute},
},
{
name: "Negative Integer",
req: `{ "keep_alive": -1 }`,
exp: &Duration{math.MaxInt64},
},
{
name: "Negative Float",
req: `{ "keep_alive": -3.14 }`,
exp: &Duration{math.MaxInt64},
},
{
name: "Negative Integer String",
req: `{ "keep_alive": "-1m" }`,
exp: &Duration{math.MaxInt64},
},
}
for _, test := range tests {
t.Run(test.name, func(t *testing.T) {
var dec ChatRequest
err := json.Unmarshal([]byte(test.req), &dec)
require.NoError(t, err)
assert.Equal(t, test.exp, dec.KeepAlive)
})
}
}
func TestDurationMarshalUnmarshal(t *testing.T) {
tests := []struct {
name string
input time.Duration
expected time.Duration
}{
{
"negative duration",
time.Duration(-1),
time.Duration(math.MaxInt64),
},
{
"positive duration",
42 * time.Second,
42 * time.Second,
},
{
"another positive duration",
42 * time.Minute,
42 * time.Minute,
},
{
"zero duration",
time.Duration(0),
time.Duration(0),
},
{
"max duration",
time.Duration(math.MaxInt64),
time.Duration(math.MaxInt64),
},
}
for _, test := range tests {
t.Run(test.name, func(t *testing.T) {
b, err := json.Marshal(Duration{test.input})
require.NoError(t, err)
var d Duration
err = json.Unmarshal(b, &d)
require.NoError(t, err)
assert.Equal(t, test.expected, d.Duration, "input %v, marshalled %v, got %v", test.input, string(b), d.Duration)
})
}
}
func TestUseMmapParsingFromJSON(t *testing.T) {
tr := true
fa := false
tests := []struct {
name string
req string
exp *bool
}{
{
name: "Undefined",
req: `{ }`,
exp: nil,
},
{
name: "True",
req: `{ "use_mmap": true }`,
exp: &tr,
},
{
name: "False",
req: `{ "use_mmap": false }`,
exp: &fa,
},
}
for _, test := range tests {
t.Run(test.name, func(t *testing.T) {
var oMap map[string]any
err := json.Unmarshal([]byte(test.req), &oMap)
require.NoError(t, err)
opts := DefaultOptions()
err = opts.FromMap(oMap)
require.NoError(t, err)
assert.Equal(t, test.exp, opts.UseMMap)
})
}
}
func TestUseMmapFormatParams(t *testing.T) {
tr := true
fa := false
tests := []struct {
name string
req map[string][]string
exp *bool
err error
}{
{
name: "True",
req: map[string][]string{
"use_mmap": {"true"},
},
exp: &tr,
err: nil,
},
{
name: "False",
req: map[string][]string{
"use_mmap": {"false"},
},
exp: &fa,
err: nil,
},
{
name: "Numeric True",
req: map[string][]string{
"use_mmap": {"1"},
},
exp: &tr,
err: nil,
},
{
name: "Numeric False",
req: map[string][]string{
"use_mmap": {"0"},
},
exp: &fa,
err: nil,
},
{
name: "invalid string",
req: map[string][]string{
"use_mmap": {"foo"},
},
exp: nil,
err: errors.New("invalid bool value [foo]"),
},
}
for _, test := range tests {
t.Run(test.name, func(t *testing.T) {
resp, err := FormatParams(test.req)
require.Equal(t, test.err, err)
respVal, ok := resp["use_mmap"]
if test.exp != nil {
assert.True(t, ok, "resp: %v", resp)
assert.Equal(t, *test.exp, *respVal.(*bool))
}
})
}
}
func TestMessage_UnmarshalJSON(t *testing.T) {
tests := []struct {
input string
expected string
}{
{`{"role": "USER", "content": "Hello!"}`, "user"},
{`{"role": "System", "content": "Initialization complete."}`, "system"},
{`{"role": "assistant", "content": "How can I help you?"}`, "assistant"},
{`{"role": "TOOl", "content": "Access granted."}`, "tool"},
}
for _, test := range tests {
var msg Message
if err := json.Unmarshal([]byte(test.input), &msg); err != nil {
t.Errorf("Unexpected error: %v", err)
}
if msg.Role != test.expected {
t.Errorf("role not lowercased: got %v, expected %v", msg.Role, test.expected)
}
}
}
func TestToolFunction_UnmarshalJSON(t *testing.T) {
tests := []struct {
name string
input string
wantErr string
}{
{
name: "valid enum with same types",
input: `{
"name": "test",
"description": "test function",
"parameters": {
"type": "object",
"required": ["test"],
"properties": {
"test": {
"type": "string",
"description": "test prop",
"enum": ["a", "b", "c"]
}
}
}
}`,
wantErr: "",
},
{
name: "empty enum array",
input: `{
"name": "test",
"description": "test function",
"parameters": {
"type": "object",
"required": ["test"],
"properties": {
"test": {
"type": "string",
"description": "test prop",
"enum": []
}
}
}
}`,
wantErr: "",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
var tf ToolFunction
err := json.Unmarshal([]byte(tt.input), &tf)
if tt.wantErr != "" {
require.Error(t, err)
assert.Contains(t, err.Error(), tt.wantErr)
} else {
require.NoError(t, err)
}
})
}
}
func TestPropertyType_UnmarshalJSON(t *testing.T) {
tests := []struct {
name string
input string
expected PropertyType
}{
{
name: "string type",
input: `"string"`,
expected: PropertyType{"string"},
},
{
name: "array of types",
input: `["string", "number"]`,
expected: PropertyType{"string", "number"},
},
{
name: "array with single type",
input: `["string"]`,
expected: PropertyType{"string"},
},
}
for _, test := range tests {
t.Run(test.name, func(t *testing.T) {
var pt PropertyType
if err := json.Unmarshal([]byte(test.input), &pt); err != nil {
t.Errorf("Unexpected error: %v", err)
}
if len(pt) != len(test.expected) {
t.Errorf("Length mismatch: got %v, expected %v", len(pt), len(test.expected))
}
for i, v := range pt {
if v != test.expected[i] {
t.Errorf("Value mismatch at index %d: got %v, expected %v", i, v, test.expected[i])
}
}
})
}
}
func TestPropertyType_MarshalJSON(t *testing.T) {
tests := []struct {
name string
input PropertyType
expected string
}{
{
name: "single type",
input: PropertyType{"string"},
expected: `"string"`,
},
{
name: "multiple types",
input: PropertyType{"string", "number"},
expected: `["string","number"]`,
},
{
name: "empty type",
input: PropertyType{},
expected: `[]`,
},
}
for _, test := range tests {
t.Run(test.name, func(t *testing.T) {
data, err := json.Marshal(test.input)
if err != nil {
t.Errorf("Unexpected error: %v", err)
}
if string(data) != test.expected {
t.Errorf("Marshaled data mismatch: got %v, expected %v", string(data), test.expected)
}
})
}
}

93
app/.gitignore vendored
View File

@@ -1,92 +1 @@
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
lerna-debug.log*
# Diagnostic reports (https://nodejs.org/api/report.html)
report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json
# Runtime data
pids
*.pid
*.seed
*.pid.lock
.DS_Store
# Directory for instrumented libs generated by jscoverage/JSCover
lib-cov
# Coverage directory used by tools like istanbul
coverage
*.lcov
# nyc test coverage
.nyc_output
# node-waf configuration
.lock-wscript
# Compiled binary addons (https://nodejs.org/api/addons.html)
build/Release
# Dependency directories
node_modules/
jspm_packages/
# TypeScript v1 declaration files
typings/
# TypeScript cache
*.tsbuildinfo
# Optional npm cache directory
.npm
# Optional eslint cache
.eslintcache
# Optional REPL history
.node_repl_history
# Output of 'npm pack'
*.tgz
# Yarn Integrity file
.yarn-integrity
# dotenv environment variables file
.env
.env.test
# parcel-bundler cache (https://parceljs.org/)
.cache
# next.js build output
.next
# nuxt.js build output
.nuxt
# vuepress build output
.vuepress/dist
# Serverless directories
.serverless/
# FuseBox cache
.fusebox/
# DynamoDB Local files
.dynamodb/
# Webpack
.webpack/
# Vite
.vite/
# Electron-Forge
out/
ollama.syso

View File

@@ -1,21 +1,22 @@
# Desktop
# Ollama App
This app builds upon Ollama to provide a desktop experience for running models.
## Linux
## Developing
TODO
First, build the `ollama` binary:
## MacOS
TODO
## Windows
If you want to build the installer, youll need to install
- https://jrsoftware.org/isinfo.php
In the top directory of this repo, run the following powershell script
to build the ollama CLI, ollama app, and ollama installer.
```powershell
powershell -ExecutionPolicy Bypass -File .\scripts\build_windows.ps1
```
cd ..
go build .
```
Then run the desktop app with `npm start`:
```
cd app
npm install
npm start
```

BIN
app/assets/app.ico Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 7.3 KiB

17
app/assets/assets.go Normal file
View File

@@ -0,0 +1,17 @@
package assets
import (
"embed"
"io/fs"
)
//go:embed *.ico
var icons embed.FS
func ListIcons() ([]string, error) {
return fs.Glob(icons, "*")
}
func GetIcon(filename string) ([]byte, error) {
return icons.ReadFile(filename)
}

BIN
app/assets/setup.bmp Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 76 KiB

BIN
app/assets/tray.ico Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 89 KiB

BIN
app/assets/tray_upgrade.ico Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 91 KiB

View File

@@ -0,0 +1,9 @@
//go:build !windows
package lifecycle
import "errors"
func GetStarted() error {
return errors.New("not implemented")
}

View File

@@ -0,0 +1,43 @@
package lifecycle
import (
"fmt"
"log/slog"
"os"
"os/exec"
"path/filepath"
"syscall"
)
func GetStarted() error {
const CREATE_NEW_CONSOLE = 0x00000010
var err error
bannerScript := filepath.Join(AppDir, "ollama_welcome.ps1")
args := []string{
// TODO once we're signed, the execution policy bypass should be removed
"powershell", "-noexit", "-ExecutionPolicy", "Bypass", "-nologo", "-file", bannerScript,
}
args[0], err = exec.LookPath(args[0])
if err != nil {
return err
}
// Make sure the script actually exists
_, err = os.Stat(bannerScript)
if err != nil {
return fmt.Errorf("getting started banner script error %s", err)
}
slog.Info(fmt.Sprintf("opening getting started terminal with %v", args))
attrs := &os.ProcAttr{
Files: []*os.File{os.Stdin, os.Stdout, os.Stderr},
Sys: &syscall.SysProcAttr{CreationFlags: CREATE_NEW_CONSOLE, HideWindow: false},
}
proc, err := os.StartProcess(args[0], args, attrs)
if err != nil {
return fmt.Errorf("unable to start getting started shell %w", err)
}
slog.Debug(fmt.Sprintf("getting started terminal PID: %d", proc.Pid))
return proc.Release()
}

View File

@@ -0,0 +1,94 @@
package lifecycle
import (
"context"
"fmt"
"log"
"log/slog"
"os"
"os/signal"
"syscall"
"github.com/ollama/ollama/app/store"
"github.com/ollama/ollama/app/tray"
"github.com/ollama/ollama/envconfig"
)
func Run() {
InitLogging()
slog.Info("app config", "env", envconfig.Values())
ctx, cancel := context.WithCancel(context.Background())
var done chan int
t, err := tray.NewTray()
if err != nil {
log.Fatalf("Failed to start: %s", err)
}
callbacks := t.GetCallbacks()
signals := make(chan os.Signal, 1)
signal.Notify(signals, syscall.SIGINT, syscall.SIGTERM)
go func() {
slog.Debug("starting callback loop")
for {
select {
case <-callbacks.Quit:
slog.Debug("quit called")
t.Quit()
case <-signals:
slog.Debug("shutting down due to signal")
t.Quit()
case <-callbacks.Update:
err := DoUpgrade(cancel, done)
if err != nil {
slog.Warn(fmt.Sprintf("upgrade attempt failed: %s", err))
}
case <-callbacks.ShowLogs:
ShowLogs()
case <-callbacks.DoFirstUse:
err := GetStarted()
if err != nil {
slog.Warn(fmt.Sprintf("Failed to launch getting started shell: %s", err))
}
}
}
}()
// Are we first use?
if !store.GetFirstTimeRun() {
slog.Debug("First time run")
err = t.DisplayFirstUseNotification()
if err != nil {
slog.Debug(fmt.Sprintf("XXX failed to display first use notification %v", err))
}
store.SetFirstTimeRun(true)
} else {
slog.Debug("Not first time, skipping first run notification")
}
if IsServerRunning(ctx) {
slog.Info("Detected another instance of ollama running, exiting")
os.Exit(1)
} else {
done, err = SpawnServer(ctx, CLIName)
if err != nil {
// TODO - should we retry in a backoff loop?
// TODO - should we pop up a warning and maybe add a menu item to view application logs?
slog.Error(fmt.Sprintf("Failed to spawn ollama server %s", err))
done = make(chan int, 1)
done <- 1
}
}
StartBackgroundUpdaterChecker(ctx, t.UpdateAvailable)
t.Run()
cancel()
slog.Info("Waiting for ollama server to shutdown...")
if done != nil {
<-done
}
slog.Info("Ollama app exiting")
}

80
app/lifecycle/logging.go Normal file
View File

@@ -0,0 +1,80 @@
package lifecycle
import (
"fmt"
"log/slog"
"os"
"path/filepath"
"strconv"
"strings"
"github.com/ollama/ollama/envconfig"
)
func InitLogging() {
level := slog.LevelInfo
if envconfig.Debug() {
level = slog.LevelDebug
}
var logFile *os.File
var err error
// Detect if we're a GUI app on windows, and if not, send logs to console
if os.Stderr.Fd() != 0 {
// Console app detected
logFile = os.Stderr
// TODO - write one-line to the app.log file saying we're running in console mode to help avoid confusion
} else {
rotateLogs(AppLogFile)
logFile, err = os.OpenFile(AppLogFile, os.O_APPEND|os.O_WRONLY|os.O_CREATE, 0o755)
if err != nil {
slog.Error(fmt.Sprintf("failed to create server log %v", err))
return
}
}
handler := slog.NewTextHandler(logFile, &slog.HandlerOptions{
Level: level,
AddSource: true,
ReplaceAttr: func(_ []string, attr slog.Attr) slog.Attr {
if attr.Key == slog.SourceKey {
source := attr.Value.Any().(*slog.Source)
source.File = filepath.Base(source.File)
}
return attr
},
})
slog.SetDefault(slog.New(handler))
slog.Info("ollama app started")
}
func rotateLogs(logFile string) {
if _, err := os.Stat(logFile); os.IsNotExist(err) {
return
}
index := strings.LastIndex(logFile, ".")
pre := logFile[:index]
post := "." + logFile[index+1:]
for i := LogRotationCount; i > 0; i-- {
older := pre + "-" + strconv.Itoa(i) + post
newer := pre + "-" + strconv.Itoa(i-1) + post
if i == 1 {
newer = pre + post
}
if _, err := os.Stat(newer); err == nil {
if _, err := os.Stat(older); err == nil {
err := os.Remove(older)
if err != nil {
slog.Warn("Failed to remove older log", "older", older, "error", err)
continue
}
}
err := os.Rename(newer, older)
if err != nil {
slog.Warn("Failed to rotate log", "older", older, "newer", newer, "error", err)
}
}
}
}

View File

@@ -0,0 +1,9 @@
//go:build !windows
package lifecycle
import "log/slog"
func ShowLogs() {
slog.Warn("not implemented")
}

View File

@@ -0,0 +1,44 @@
package lifecycle
import (
"os"
"path/filepath"
"strconv"
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestRotateLogs(t *testing.T) {
logDir := t.TempDir()
logFile := filepath.Join(logDir, "testlog.log")
// No log exists
rotateLogs(logFile)
require.NoError(t, os.WriteFile(logFile, []byte("1"), 0o644))
assert.FileExists(t, logFile)
// First rotation
rotateLogs(logFile)
assert.FileExists(t, filepath.Join(logDir, "testlog-1.log"))
assert.NoFileExists(t, filepath.Join(logDir, "testlog-2.log"))
assert.NoFileExists(t, logFile)
// Should be a no-op without a new log
rotateLogs(logFile)
assert.FileExists(t, filepath.Join(logDir, "testlog-1.log"))
assert.NoFileExists(t, filepath.Join(logDir, "testlog-2.log"))
assert.NoFileExists(t, logFile)
for i := 2; i <= LogRotationCount+1; i++ {
require.NoError(t, os.WriteFile(logFile, []byte(strconv.Itoa(i)), 0o644))
assert.FileExists(t, logFile)
rotateLogs(logFile)
assert.NoFileExists(t, logFile)
for j := 1; j < i; j++ {
assert.FileExists(t, filepath.Join(logDir, "testlog-"+strconv.Itoa(j)+".log"))
}
assert.NoFileExists(t, filepath.Join(logDir, "testlog-"+strconv.Itoa(i+1)+".log"))
}
}

View File

@@ -0,0 +1,19 @@
package lifecycle
import (
"fmt"
"log/slog"
"os/exec"
"syscall"
)
func ShowLogs() {
cmd_path := "c:\\Windows\\system32\\cmd.exe"
slog.Debug(fmt.Sprintf("viewing logs with start %s", AppDataDir))
cmd := exec.Command(cmd_path, "/c", "start", AppDataDir)
cmd.SysProcAttr = &syscall.SysProcAttr{HideWindow: false, CreationFlags: 0x08000000}
err := cmd.Start()
if err != nil {
slog.Error(fmt.Sprintf("Failed to open log dir: %s", err))
}
}

84
app/lifecycle/paths.go Normal file
View File

@@ -0,0 +1,84 @@
package lifecycle
import (
"errors"
"fmt"
"log/slog"
"os"
"path/filepath"
"runtime"
"strings"
)
var (
AppName = "ollama app"
CLIName = "ollama"
AppDir = "/opt/Ollama"
AppDataDir = "/opt/Ollama"
// TODO - should there be a distinct log dir?
UpdateStageDir = "/tmp"
AppLogFile = "/tmp/ollama_app.log"
ServerLogFile = "/tmp/ollama.log"
UpgradeLogFile = "/tmp/ollama_update.log"
Installer = "OllamaSetup.exe"
LogRotationCount = 5
)
func init() {
if runtime.GOOS == "windows" {
AppName += ".exe"
CLIName += ".exe"
// Logs, configs, downloads go to LOCALAPPDATA
localAppData := os.Getenv("LOCALAPPDATA")
AppDataDir = filepath.Join(localAppData, "Ollama")
UpdateStageDir = filepath.Join(AppDataDir, "updates")
AppLogFile = filepath.Join(AppDataDir, "app.log")
ServerLogFile = filepath.Join(AppDataDir, "server.log")
UpgradeLogFile = filepath.Join(AppDataDir, "upgrade.log")
exe, err := os.Executable()
if err != nil {
slog.Warn("error discovering executable directory", "error", err)
AppDir = filepath.Join(localAppData, "Programs", "Ollama")
} else {
AppDir = filepath.Dir(exe)
}
// Make sure we have PATH set correctly for any spawned children
paths := strings.Split(os.Getenv("PATH"), ";")
// Start with whatever we find in the PATH/LD_LIBRARY_PATH
found := false
for _, path := range paths {
d, err := filepath.Abs(path)
if err != nil {
continue
}
if strings.EqualFold(AppDir, d) {
found = true
}
}
if !found {
paths = append(paths, AppDir)
pathVal := strings.Join(paths, ";")
slog.Debug("setting PATH=" + pathVal)
err := os.Setenv("PATH", pathVal)
if err != nil {
slog.Error(fmt.Sprintf("failed to update PATH: %s", err))
}
}
// Make sure our logging dir exists
_, err = os.Stat(AppDataDir)
if errors.Is(err, os.ErrNotExist) {
if err := os.MkdirAll(AppDataDir, 0o755); err != nil {
slog.Error(fmt.Sprintf("create ollama dir %s: %v", AppDataDir, err))
}
}
} else if runtime.GOOS == "darwin" {
// TODO
AppName += ".app"
// } else if runtime.GOOS == "linux" {
// TODO
}
}

186
app/lifecycle/server.go Normal file
View File

@@ -0,0 +1,186 @@
package lifecycle
import (
"context"
"errors"
"fmt"
"io"
"log/slog"
"os"
"os/exec"
"path/filepath"
"time"
"github.com/ollama/ollama/api"
)
func getCLIFullPath(command string) string {
var cmdPath string
appExe, err := os.Executable()
if err == nil {
// Check both the same location as the tray app, as well as ./bin
cmdPath = filepath.Join(filepath.Dir(appExe), command)
_, err := os.Stat(cmdPath)
if err == nil {
return cmdPath
}
cmdPath = filepath.Join(filepath.Dir(appExe), "bin", command)
_, err = os.Stat(cmdPath)
if err == nil {
return cmdPath
}
}
cmdPath, err = exec.LookPath(command)
if err == nil {
_, err := os.Stat(cmdPath)
if err == nil {
return cmdPath
}
}
pwd, err := os.Getwd()
if err == nil {
cmdPath = filepath.Join(pwd, command)
_, err = os.Stat(cmdPath)
if err == nil {
return cmdPath
}
}
return command
}
func start(ctx context.Context, command string) (*exec.Cmd, error) {
cmd := getCmd(ctx, getCLIFullPath(command))
stdout, err := cmd.StdoutPipe()
if err != nil {
return nil, fmt.Errorf("failed to spawn server stdout pipe: %w", err)
}
stderr, err := cmd.StderrPipe()
if err != nil {
return nil, fmt.Errorf("failed to spawn server stderr pipe: %w", err)
}
rotateLogs(ServerLogFile)
logFile, err := os.OpenFile(ServerLogFile, os.O_APPEND|os.O_WRONLY|os.O_CREATE, 0o755)
if err != nil {
return nil, fmt.Errorf("failed to create server log: %w", err)
}
logDir := filepath.Dir(ServerLogFile)
_, err = os.Stat(logDir)
if err != nil {
if !errors.Is(err, os.ErrNotExist) {
return nil, fmt.Errorf("stat ollama server log dir %s: %v", logDir, err)
}
if err := os.MkdirAll(logDir, 0o755); err != nil {
return nil, fmt.Errorf("create ollama server log dir %s: %v", logDir, err)
}
}
go func() {
defer logFile.Close()
io.Copy(logFile, stdout) //nolint:errcheck
}()
go func() {
defer logFile.Close()
io.Copy(logFile, stderr) //nolint:errcheck
}()
// Re-wire context done behavior to attempt a graceful shutdown of the server
cmd.Cancel = func() error {
if cmd.Process != nil {
err := terminate(cmd)
if err != nil {
slog.Warn("error trying to gracefully terminate server", "err", err)
return cmd.Process.Kill()
}
tick := time.NewTicker(10 * time.Millisecond)
defer tick.Stop()
for {
select {
case <-tick.C:
exited, err := isProcessExited(cmd.Process.Pid)
if err != nil {
return err
}
if exited {
return nil
}
case <-time.After(5 * time.Second):
slog.Warn("graceful server shutdown timeout, killing", "pid", cmd.Process.Pid)
return cmd.Process.Kill()
}
}
}
return nil
}
// run the command and wait for it to finish
if err := cmd.Start(); err != nil {
return nil, fmt.Errorf("failed to start server %w", err)
}
if cmd.Process != nil {
slog.Info(fmt.Sprintf("started ollama server with pid %d", cmd.Process.Pid))
}
slog.Info(fmt.Sprintf("ollama server logs %s", ServerLogFile))
return cmd, nil
}
func SpawnServer(ctx context.Context, command string) (chan int, error) {
done := make(chan int)
go func() {
// Keep the server running unless we're shuttind down the app
crashCount := 0
for {
slog.Info("starting server...")
cmd, err := start(ctx, command)
if err != nil {
crashCount++
slog.Error(fmt.Sprintf("failed to start server %s", err))
time.Sleep(500 * time.Millisecond * time.Duration(crashCount))
continue
}
cmd.Wait() //nolint:errcheck
var code int
if cmd.ProcessState != nil {
code = cmd.ProcessState.ExitCode()
}
select {
case <-ctx.Done():
slog.Info(fmt.Sprintf("server shutdown with exit code %d", code))
done <- code
return
default:
crashCount++
slog.Warn(fmt.Sprintf("server crash %d - exit code %d - respawning", crashCount, code))
time.Sleep(500 * time.Millisecond * time.Duration(crashCount))
break
}
}
}()
return done, nil
}
func IsServerRunning(ctx context.Context) bool {
client, err := api.ClientFromEnvironment()
if err != nil {
slog.Info("unable to connect to server")
return false
}
err = client.Heartbeat(ctx)
if err != nil {
slog.Debug(fmt.Sprintf("heartbeat from server: %s", err))
slog.Info("unable to connect to server")
return false
}
return true
}

View File

@@ -0,0 +1,38 @@
//go:build !windows
package lifecycle
import (
"context"
"errors"
"fmt"
"os"
"os/exec"
"syscall"
)
func getCmd(ctx context.Context, cmd string) *exec.Cmd {
return exec.CommandContext(ctx, cmd, "serve")
}
func terminate(cmd *exec.Cmd) error {
return cmd.Process.Signal(os.Interrupt)
}
func isProcessExited(pid int) (bool, error) {
proc, err := os.FindProcess(pid)
if err != nil {
return false, fmt.Errorf("failed to find process: %v", err)
}
err = proc.Signal(syscall.Signal(0))
if err != nil {
if errors.Is(err, os.ErrProcessDone) || errors.Is(err, syscall.ESRCH) {
return true, nil
}
return false, fmt.Errorf("error signaling process: %v", err)
}
return false, nil
}

View File

@@ -0,0 +1,91 @@
package lifecycle
import (
"context"
"fmt"
"os/exec"
"syscall"
"golang.org/x/sys/windows"
)
func getCmd(ctx context.Context, exePath string) *exec.Cmd {
cmd := exec.CommandContext(ctx, exePath, "serve")
cmd.SysProcAttr = &syscall.SysProcAttr{
HideWindow: true,
CreationFlags: windows.CREATE_NEW_PROCESS_GROUP,
}
return cmd
}
func terminate(cmd *exec.Cmd) error {
dll, err := windows.LoadDLL("kernel32.dll")
if err != nil {
return err
}
//nolint:errcheck
defer dll.Release()
pid := cmd.Process.Pid
f, err := dll.FindProc("AttachConsole")
if err != nil {
return err
}
r1, _, err := f.Call(uintptr(pid))
if r1 == 0 && err != syscall.ERROR_ACCESS_DENIED {
return err
}
f, err = dll.FindProc("SetConsoleCtrlHandler")
if err != nil {
return err
}
r1, _, err = f.Call(0, 1)
if r1 == 0 {
return err
}
f, err = dll.FindProc("GenerateConsoleCtrlEvent")
if err != nil {
return err
}
r1, _, err = f.Call(windows.CTRL_BREAK_EVENT, uintptr(pid))
if r1 == 0 {
return err
}
r1, _, err = f.Call(windows.CTRL_C_EVENT, uintptr(pid))
if r1 == 0 {
return err
}
return nil
}
const STILL_ACTIVE = 259
func isProcessExited(pid int) (bool, error) {
hProcess, err := windows.OpenProcess(windows.PROCESS_QUERY_INFORMATION, false, uint32(pid))
if err != nil {
return false, fmt.Errorf("failed to open process: %v", err)
}
//nolint:errcheck
defer windows.CloseHandle(hProcess)
var exitCode uint32
err = windows.GetExitCodeProcess(hProcess, &exitCode)
if err != nil {
return false, fmt.Errorf("failed to get exit code: %v", err)
}
if exitCode == STILL_ACTIVE {
return false, nil
}
return true, nil
}

229
app/lifecycle/updater.go Normal file
View File

@@ -0,0 +1,229 @@
package lifecycle
import (
"context"
"crypto/rand"
"encoding/json"
"errors"
"fmt"
"io"
"log/slog"
"mime"
"net/http"
"net/url"
"os"
"path"
"path/filepath"
"runtime"
"strconv"
"strings"
"time"
"github.com/ollama/ollama/auth"
"github.com/ollama/ollama/version"
)
var (
UpdateCheckURLBase = "https://ollama.com/api/update"
UpdateDownloaded = false
UpdateCheckInterval = 60 * 60 * time.Second
)
// TODO - maybe move up to the API package?
type UpdateResponse struct {
UpdateURL string `json:"url"`
UpdateVersion string `json:"version"`
}
func IsNewReleaseAvailable(ctx context.Context) (bool, UpdateResponse) {
var updateResp UpdateResponse
requestURL, err := url.Parse(UpdateCheckURLBase)
if err != nil {
return false, updateResp
}
query := requestURL.Query()
query.Add("os", runtime.GOOS)
query.Add("arch", runtime.GOARCH)
query.Add("version", version.Version)
query.Add("ts", strconv.FormatInt(time.Now().Unix(), 10))
nonce, err := auth.NewNonce(rand.Reader, 16)
if err != nil {
return false, updateResp
}
query.Add("nonce", nonce)
requestURL.RawQuery = query.Encode()
data := []byte(fmt.Sprintf("%s,%s", http.MethodGet, requestURL.RequestURI()))
signature, err := auth.Sign(ctx, data)
if err != nil {
return false, updateResp
}
req, err := http.NewRequestWithContext(ctx, http.MethodGet, requestURL.String(), nil)
if err != nil {
slog.Warn(fmt.Sprintf("failed to check for update: %s", err))
return false, updateResp
}
req.Header.Set("Authorization", signature)
req.Header.Set("User-Agent", fmt.Sprintf("ollama/%s (%s %s) Go/%s", version.Version, runtime.GOARCH, runtime.GOOS, runtime.Version()))
slog.Debug("checking for available update", "requestURL", requestURL)
resp, err := http.DefaultClient.Do(req)
if err != nil {
slog.Warn(fmt.Sprintf("failed to check for update: %s", err))
return false, updateResp
}
defer resp.Body.Close()
if resp.StatusCode == http.StatusNoContent {
slog.Debug("check update response 204 (current version is up to date)")
return false, updateResp
}
body, err := io.ReadAll(resp.Body)
if err != nil {
slog.Warn(fmt.Sprintf("failed to read body response: %s", err))
}
if resp.StatusCode != http.StatusOK {
slog.Info(fmt.Sprintf("check update error %d - %.96s", resp.StatusCode, string(body)))
return false, updateResp
}
err = json.Unmarshal(body, &updateResp)
if err != nil {
slog.Warn(fmt.Sprintf("malformed response checking for update: %s", err))
return false, updateResp
}
// Extract the version string from the URL in the github release artifact path
updateResp.UpdateVersion = path.Base(path.Dir(updateResp.UpdateURL))
slog.Info("New update available at " + updateResp.UpdateURL)
return true, updateResp
}
func DownloadNewRelease(ctx context.Context, updateResp UpdateResponse) error {
// Do a head first to check etag info
req, err := http.NewRequestWithContext(ctx, http.MethodHead, updateResp.UpdateURL, nil)
if err != nil {
return err
}
resp, err := http.DefaultClient.Do(req)
if err != nil {
return fmt.Errorf("error checking update: %w", err)
}
if resp.StatusCode != http.StatusOK {
return fmt.Errorf("unexpected status attempting to download update %d", resp.StatusCode)
}
resp.Body.Close()
etag := strings.Trim(resp.Header.Get("etag"), "\"")
if etag == "" {
slog.Debug("no etag detected, falling back to filename based dedup")
etag = "_"
}
filename := Installer
_, params, err := mime.ParseMediaType(resp.Header.Get("content-disposition"))
if err == nil {
filename = params["filename"]
}
stageFilename := filepath.Join(UpdateStageDir, etag, filename)
// Check to see if we already have it downloaded
_, err = os.Stat(stageFilename)
if err == nil {
slog.Info("update already downloaded")
return nil
}
cleanupOldDownloads()
req.Method = http.MethodGet
resp, err = http.DefaultClient.Do(req)
if err != nil {
return fmt.Errorf("error checking update: %w", err)
}
defer resp.Body.Close()
etag = strings.Trim(resp.Header.Get("etag"), "\"")
if etag == "" {
slog.Debug("no etag detected, falling back to filename based dedup") // TODO probably can get rid of this redundant log
etag = "_"
}
stageFilename = filepath.Join(UpdateStageDir, etag, filename)
_, err = os.Stat(filepath.Dir(stageFilename))
if errors.Is(err, os.ErrNotExist) {
if err := os.MkdirAll(filepath.Dir(stageFilename), 0o755); err != nil {
return fmt.Errorf("create ollama dir %s: %v", filepath.Dir(stageFilename), err)
}
}
payload, err := io.ReadAll(resp.Body)
if err != nil {
return fmt.Errorf("failed to read body response: %w", err)
}
fp, err := os.OpenFile(stageFilename, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o755)
if err != nil {
return fmt.Errorf("write payload %s: %w", stageFilename, err)
}
defer fp.Close()
if n, err := fp.Write(payload); err != nil || n != len(payload) {
return fmt.Errorf("write payload %s: %d vs %d -- %w", stageFilename, n, len(payload), err)
}
slog.Info("new update downloaded " + stageFilename)
UpdateDownloaded = true
return nil
}
func cleanupOldDownloads() {
files, err := os.ReadDir(UpdateStageDir)
if err != nil && errors.Is(err, os.ErrNotExist) {
// Expected behavior on first run
return
} else if err != nil {
slog.Warn(fmt.Sprintf("failed to list stage dir: %s", err))
return
}
for _, file := range files {
fullname := filepath.Join(UpdateStageDir, file.Name())
slog.Debug("cleaning up old download: " + fullname)
err = os.RemoveAll(fullname)
if err != nil {
slog.Warn(fmt.Sprintf("failed to cleanup stale update download %s", err))
}
}
}
func StartBackgroundUpdaterChecker(ctx context.Context, cb func(string) error) {
go func() {
// Don't blast an update message immediately after startup
// time.Sleep(30 * time.Second)
time.Sleep(3 * time.Second)
for {
available, resp := IsNewReleaseAvailable(ctx)
if available {
err := DownloadNewRelease(ctx, resp)
if err != nil {
slog.Error(fmt.Sprintf("failed to download new release: %s", err))
}
err = cb(resp.UpdateVersion)
if err != nil {
slog.Warn(fmt.Sprintf("failed to register update available with tray: %s", err))
}
}
select {
case <-ctx.Done():
slog.Debug("stopping background update checker")
return
default:
time.Sleep(UpdateCheckInterval)
}
}
}()
}

View File

@@ -0,0 +1,12 @@
//go:build !windows
package lifecycle
import (
"context"
"errors"
)
func DoUpgrade(cancel context.CancelFunc, done chan int) error {
return errors.New("not implemented")
}

View File

@@ -0,0 +1,74 @@
package lifecycle
import (
"context"
"errors"
"fmt"
"log/slog"
"os"
"os/exec"
"path/filepath"
)
func DoUpgrade(cancel context.CancelFunc, done chan int) error {
files, err := filepath.Glob(filepath.Join(UpdateStageDir, "*", "*.exe")) // TODO generalize for multiplatform
if err != nil {
return fmt.Errorf("failed to lookup downloads: %s", err)
}
if len(files) == 0 {
return errors.New("no update downloads found")
} else if len(files) > 1 {
// Shouldn't happen
slog.Warn(fmt.Sprintf("multiple downloads found, using first one %v", files))
}
installerExe := files[0]
slog.Info("starting upgrade with " + installerExe)
slog.Info("upgrade log file " + UpgradeLogFile)
// make the upgrade show progress, but non interactive
installArgs := []string{
"/CLOSEAPPLICATIONS", // Quit the tray app if it's still running
"/LOG=" + filepath.Base(UpgradeLogFile), // Only relative seems reliable, so set pwd
"/FORCECLOSEAPPLICATIONS", // Force close the tray app - might be needed
"/SP", // Skip the "This will install... Do you wish to continue" prompt
"/NOCANCEL", // Disable the ability to cancel upgrade mid-flight to avoid partially installed upgrades
"/SILENT",
}
// Safeguard in case we have requests in flight that need to drain...
slog.Info("Waiting for server to shutdown")
cancel()
if done != nil {
<-done
} else {
// Shouldn't happen
slog.Warn("done chan was nil, not actually waiting")
}
slog.Debug(fmt.Sprintf("starting installer: %s %v", installerExe, installArgs))
os.Chdir(filepath.Dir(UpgradeLogFile)) //nolint:errcheck
cmd := exec.Command(installerExe, installArgs...)
if err := cmd.Start(); err != nil {
return fmt.Errorf("unable to start ollama app %w", err)
}
if cmd.Process != nil {
err = cmd.Process.Release()
if err != nil {
slog.Error(fmt.Sprintf("failed to release server process: %s", err))
}
} else {
// TODO - some details about why it didn't start, or is this a pedantic error case?
return errors.New("installer process did not start")
}
// TODO should we linger for a moment and check to make sure it's actually running by checking the pid?
slog.Info("Installer started in background, exiting")
os.Exit(0)
// Not reached
return nil
}

12
app/main.go Normal file
View File

@@ -0,0 +1,12 @@
package main
// Compile with the following to get rid of the cmd pop up on windows
// go build -ldflags="-H windowsgui" .
import (
"github.com/ollama/ollama/app/lifecycle"
)
func main() {
lifecycle.Run()
}

204
app/ollama.iss Normal file
View File

@@ -0,0 +1,204 @@
; Inno Setup Installer for Ollama
;
; To build the installer use the build script invoked from the top of the source tree
;
; powershell -ExecutionPolicy Bypass -File .\scripts\build_windows.ps
#define MyAppName "Ollama"
#if GetEnv("PKG_VERSION") != ""
#define MyAppVersion GetEnv("PKG_VERSION")
#else
#define MyAppVersion "0.0.0"
#endif
#define MyAppPublisher "Ollama"
#define MyAppURL "https://ollama.com/"
#define MyAppExeName "ollama app.exe"
#define MyIcon ".\assets\app.ico"
[Setup]
; NOTE: The value of AppId uniquely identifies this application. Do not use the same AppId value in installers for other applications.
; (To generate a new GUID, click Tools | Generate GUID inside the IDE.)
AppId={{44E83376-CE68-45EB-8FC1-393500EB558C}
AppName={#MyAppName}
AppVersion={#MyAppVersion}
VersionInfoVersion={#MyAppVersion}
;AppVerName={#MyAppName} {#MyAppVersion}
AppPublisher={#MyAppPublisher}
AppPublisherURL={#MyAppURL}
AppSupportURL={#MyAppURL}
AppUpdatesURL={#MyAppURL}
ArchitecturesAllowed=x64compatible arm64
ArchitecturesInstallIn64BitMode=x64compatible arm64
DefaultDirName={localappdata}\Programs\{#MyAppName}
DefaultGroupName={#MyAppName}
DisableProgramGroupPage=yes
PrivilegesRequired=lowest
OutputBaseFilename="OllamaSetup"
SetupIconFile={#MyIcon}
UninstallDisplayIcon={uninstallexe}
Compression=lzma2
SolidCompression=no
WizardStyle=modern
ChangesEnvironment=yes
OutputDir=..\dist\
; Disable logging once everything's battle tested
; Filename will be %TEMP%\Setup Log*.txt
SetupLogging=yes
CloseApplications=yes
RestartApplications=no
RestartIfNeededByRun=no
; https://jrsoftware.org/ishelp/index.php?topic=setup_wizardimagefile
WizardSmallImageFile=.\assets\setup.bmp
; Ollama requires Windows 10 22H2 or newer for proper unicode rendering
; TODO: consider setting this to 10.0.19045
MinVersion=10.0.10240
; First release that supports WinRT UI Composition for win32 apps
; MinVersion=10.0.17134
; First release with XAML Islands - possible UI path forward
; MinVersion=10.0.18362
; quiet...
DisableDirPage=yes
DisableFinishedPage=yes
DisableReadyMemo=yes
DisableReadyPage=yes
DisableStartupPrompt=yes
DisableWelcomePage=yes
; TODO - percentage can't be set less than 100, so how to make it shorter?
; WizardSizePercent=100,80
#if GetEnv("KEY_CONTAINER")
SignTool=MySignTool
SignedUninstaller=yes
#endif
SetupMutex=OllamaSetupMutex
[Languages]
Name: "english"; MessagesFile: "compiler:Default.isl"
[LangOptions]
DialogFontSize=12
[Files]
#if DirExists("..\dist\windows-amd64")
Source: "..\dist\windows-amd64-app.exe"; DestDir: "{app}"; DestName: "{#MyAppExeName}" ;Check: not IsArm64(); Flags: ignoreversion 64bit
Source: "..\dist\windows-amd64\ollama.exe"; DestDir: "{app}"; Check: not IsArm64(); Flags: ignoreversion 64bit
Source: "..\dist\windows-amd64\lib\ollama\*"; DestDir: "{app}\lib\ollama\"; Check: not IsArm64(); Flags: ignoreversion 64bit recursesubdirs
#endif
#if DirExists("..\dist\windows-arm64")
Source: "..\dist\windows-arm64\vc_redist.arm64.exe"; DestDir: "{tmp}"; Check: IsArm64() and vc_redist_needed(); Flags: deleteafterinstall
Source: "..\dist\windows-arm64-app.exe"; DestDir: "{app}"; DestName: "{#MyAppExeName}" ;Check: IsArm64(); Flags: ignoreversion 64bit
Source: "..\dist\windows-arm64\ollama.exe"; DestDir: "{app}"; Check: IsArm64(); Flags: ignoreversion 64bit
#endif
Source: "..\dist\ollama_welcome.ps1"; DestDir: "{app}"; Flags: ignoreversion
Source: ".\assets\app.ico"; DestDir: "{app}"; Flags: ignoreversion
[Icons]
Name: "{group}\{#MyAppName}"; Filename: "{app}\{#MyAppExeName}"; IconFilename: "{app}\app.ico"
Name: "{userstartup}\{#MyAppName}"; Filename: "{app}\{#MyAppExeName}"; IconFilename: "{app}\app.ico"
Name: "{userprograms}\{#MyAppName}"; Filename: "{app}\{#MyAppExeName}"; IconFilename: "{app}\app.ico"
[Run]
#if DirExists("..\dist\windows-arm64")
Filename: "{tmp}\vc_redist.arm64.exe"; Parameters: "/install /passive /norestart"; Check: IsArm64() and vc_redist_needed(); StatusMsg: "Installing VC++ Redistributables..."; Flags: waituntilterminated
#endif
Filename: "{cmd}"; Parameters: "/C set PATH={app};%PATH% & ""{app}\{#MyAppExeName}"""; Flags: postinstall nowait runhidden
[UninstallRun]
; Filename: "{cmd}"; Parameters: "/C ""taskkill /im ''{#MyAppExeName}'' /f /t"; Flags: runhidden
; Filename: "{cmd}"; Parameters: "/C ""taskkill /im ollama.exe /f /t"; Flags: runhidden
Filename: "taskkill"; Parameters: "/im ""{#MyAppExeName}"" /f /t"; Flags: runhidden
Filename: "taskkill"; Parameters: "/im ""ollama.exe"" /f /t"; Flags: runhidden
; HACK! need to give the server and app enough time to exit
; TODO - convert this to a Pascal code script so it waits until they're no longer running, then completes
Filename: "{cmd}"; Parameters: "/c timeout 5"; Flags: runhidden
[UninstallDelete]
Type: filesandordirs; Name: "{%TEMP}\ollama*"
Type: filesandordirs; Name: "{%LOCALAPPDATA}\Ollama"
Type: filesandordirs; Name: "{%LOCALAPPDATA}\Programs\Ollama"
Type: filesandordirs; Name: "{%USERPROFILE}\.ollama\models"
Type: filesandordirs; Name: "{%USERPROFILE}\.ollama\history"
; NOTE: if the user has a custom OLLAMA_MODELS it will be preserved
[InstallDelete]
Type: filesandordirs; Name: "{%TEMP}\ollama*"
Type: filesandordirs; Name: "{%LOCALAPPDATA}\Programs\Ollama"
[Messages]
WizardReady=Ollama
ReadyLabel1=%nLet's get you up and running with your own large language models.
SetupAppRunningError=Another Ollama installer is running.%n%nPlease cancel or finish the other installer, then click OK to continue with this install, or Cancel to exit.
;FinishedHeadingLabel=Run your first model
;FinishedLabel=%nRun this command in a PowerShell or cmd terminal.%n%n%n ollama run llama3.2
;ClickFinish=%n
[Registry]
Root: HKCU; Subkey: "Environment"; \
ValueType: expandsz; ValueName: "Path"; ValueData: "{olddata};{app}"; \
Check: NeedsAddPath('{app}')
[Code]
function NeedsAddPath(Param: string): boolean;
var
OrigPath: string;
begin
if not RegQueryStringValue(HKEY_CURRENT_USER,
'Environment',
'Path', OrigPath)
then begin
Result := True;
exit;
end;
{ look for the path with leading and trailing semicolon }
{ Pos() returns 0 if not found }
Result := Pos(';' + ExpandConstant(Param) + ';', ';' + OrigPath + ';') = 0;
end;
{ --- VC Runtime libraries discovery code - Only install vc_redist if it isn't already installed ----- }
const VCRTL_MIN_V1 = 14;
const VCRTL_MIN_V2 = 40;
const VCRTL_MIN_V3 = 33807;
const VCRTL_MIN_V4 = 0;
// check if the minimum required vc redist is installed (by looking the registry)
function vc_redist_needed (): Boolean;
var
sRegKey: string;
v1: Cardinal;
v2: Cardinal;
v3: Cardinal;
v4: Cardinal;
begin
sRegKey := 'SOFTWARE\WOW6432Node\Microsoft\VisualStudio\14.0\VC\Runtimes\arm64';
if (RegQueryDWordValue (HKEY_LOCAL_MACHINE, sRegKey, 'Major', v1) and
RegQueryDWordValue (HKEY_LOCAL_MACHINE, sRegKey, 'Minor', v2) and
RegQueryDWordValue (HKEY_LOCAL_MACHINE, sRegKey, 'Bld', v3) and
RegQueryDWordValue (HKEY_LOCAL_MACHINE, sRegKey, 'RBld', v4)) then
begin
Log ('VC Redist version: ' + IntToStr (v1) +
'.' + IntToStr (v2) + '.' + IntToStr (v3) +
'.' + IntToStr (v4));
{ Version info was found. Return true if later or equal to our
minimal required version RTL_MIN_Vx }
Result := not (
(v1 > VCRTL_MIN_V1) or ((v1 = VCRTL_MIN_V1) and
((v2 > VCRTL_MIN_V2) or ((v2 = VCRTL_MIN_V2) and
((v3 > VCRTL_MIN_V3) or ((v3 = VCRTL_MIN_V3) and
(v4 >= VCRTL_MIN_V4)))))));
end
else
Result := TRUE;
end;

29
app/ollama.rc Normal file
View File

@@ -0,0 +1,29 @@
#include <winver.h>
VS_VERSION_INFO VERSIONINFO
FILEFLAGSMASK 0x3fL
#ifdef _DEBUG
FILEFLAGS 0x1L
#else
FILEFLAGS 0x0L
#endif
FILEOS 0x40004L
FILETYPE 0x1L
FILESUBTYPE 0x0L
BEGIN
BLOCK "StringFileInfo"
BEGIN
BLOCK "040904b0"
BEGIN
VALUE "FileDescription", "Ollama"
VALUE "InternalName", "Ollama"
VALUE "OriginalFilename", "ollama app.exe"
VALUE "ProductName", "Ollama"
END
END
BLOCK "VarFileInfo"
BEGIN
VALUE "Translation", 0x409, 1200
END
END

8
app/ollama_welcome.ps1 Normal file
View File

@@ -0,0 +1,8 @@
# TODO - consider ANSI colors and maybe ASCII art...
write-host ""
write-host "Welcome to Ollama!"
write-host ""
write-host "Run your first model:"
write-host ""
write-host "`tollama run llama3.2"
write-host ""

97
app/store/store.go Normal file
View File

@@ -0,0 +1,97 @@
package store
import (
"encoding/json"
"errors"
"fmt"
"log/slog"
"os"
"path/filepath"
"sync"
"github.com/google/uuid"
)
type Store struct {
ID string `json:"id"`
FirstTimeRun bool `json:"first-time-run"`
}
var (
lock sync.Mutex
store Store
)
func GetID() string {
lock.Lock()
defer lock.Unlock()
if store.ID == "" {
initStore()
}
return store.ID
}
func GetFirstTimeRun() bool {
lock.Lock()
defer lock.Unlock()
if store.ID == "" {
initStore()
}
return store.FirstTimeRun
}
func SetFirstTimeRun(val bool) {
lock.Lock()
defer lock.Unlock()
if store.FirstTimeRun == val {
return
}
store.FirstTimeRun = val
writeStore(getStorePath())
}
// lock must be held
func initStore() {
storeFile, err := os.Open(getStorePath())
if err == nil {
defer storeFile.Close()
err = json.NewDecoder(storeFile).Decode(&store)
if err == nil {
slog.Debug(fmt.Sprintf("loaded existing store %s - ID: %s", getStorePath(), store.ID))
return
}
} else if !errors.Is(err, os.ErrNotExist) {
slog.Debug(fmt.Sprintf("unexpected error searching for store: %s", err))
}
slog.Debug("initializing new store")
store.ID = uuid.NewString()
writeStore(getStorePath())
}
func writeStore(storeFilename string) {
ollamaDir := filepath.Dir(storeFilename)
_, err := os.Stat(ollamaDir)
if errors.Is(err, os.ErrNotExist) {
if err := os.MkdirAll(ollamaDir, 0o755); err != nil {
slog.Error(fmt.Sprintf("create ollama dir %s: %v", ollamaDir, err))
return
}
}
payload, err := json.Marshal(store)
if err != nil {
slog.Error(fmt.Sprintf("failed to marshal store: %s", err))
return
}
fp, err := os.OpenFile(storeFilename, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o755)
if err != nil {
slog.Error(fmt.Sprintf("write store payload %s: %v", storeFilename, err))
return
}
defer fp.Close()
if n, err := fp.Write(payload); err != nil || n != len(payload) {
slog.Error(fmt.Sprintf("write store payload %s: %d vs %d -- %v", storeFilename, n, len(payload), err))
return
}
slog.Debug("Store contents: " + string(payload))
slog.Info(fmt.Sprintf("wrote store: %s", storeFilename))
}

13
app/store/store_darwin.go Normal file
View File

@@ -0,0 +1,13 @@
package store
import (
"os"
"path/filepath"
)
func getStorePath() string {
// TODO - system wide location?
home := os.Getenv("HOME")
return filepath.Join(home, "Library", "Application Support", "Ollama", "config.json")
}

16
app/store/store_linux.go Normal file
View File

@@ -0,0 +1,16 @@
package store
import (
"os"
"path/filepath"
)
func getStorePath() string {
if os.Geteuid() == 0 {
// TODO where should we store this on linux for system-wide operation?
return "/etc/ollama/config.json"
}
home := os.Getenv("HOME")
return filepath.Join(home, ".ollama", "config.json")
}

View File

@@ -0,0 +1,11 @@
package store
import (
"os"
"path/filepath"
)
func getStorePath() string {
localAppData := os.Getenv("LOCALAPPDATA")
return filepath.Join(localAppData, "Ollama", "config.json")
}

View File

@@ -0,0 +1,24 @@
package commontray
var (
Title = "Ollama"
ToolTip = "Ollama"
UpdateIconName = "tray_upgrade"
IconName = "tray"
)
type Callbacks struct {
Quit chan struct{}
Update chan struct{}
DoFirstUse chan struct{}
ShowLogs chan struct{}
}
type OllamaTray interface {
GetCallbacks() Callbacks
Run()
UpdateAvailable(ver string) error
DisplayFirstUseNotification() error
Quit()
}

28
app/tray/tray.go Normal file
View File

@@ -0,0 +1,28 @@
package tray
import (
"fmt"
"runtime"
"github.com/ollama/ollama/app/assets"
"github.com/ollama/ollama/app/tray/commontray"
)
func NewTray() (commontray.OllamaTray, error) {
extension := ".png"
if runtime.GOOS == "windows" {
extension = ".ico"
}
iconName := commontray.UpdateIconName + extension
updateIcon, err := assets.GetIcon(iconName)
if err != nil {
return nil, fmt.Errorf("failed to load icon %s: %w", iconName, err)
}
iconName = commontray.IconName + extension
icon, err := assets.GetIcon(iconName)
if err != nil {
return nil, fmt.Errorf("failed to load icon %s: %w", iconName, err)
}
return InitPlatformTray(icon, updateIcon)
}

View File

@@ -0,0 +1,13 @@
//go:build !windows
package tray
import (
"errors"
"github.com/ollama/ollama/app/tray/commontray"
)
func InitPlatformTray(icon, updateIcon []byte) (commontray.OllamaTray, error) {
return nil, errors.New("not implemented")
}

10
app/tray/tray_windows.go Normal file
View File

@@ -0,0 +1,10 @@
package tray
import (
"github.com/ollama/ollama/app/tray/commontray"
"github.com/ollama/ollama/app/tray/wintray"
)
func InitPlatformTray(icon, updateIcon []byte) (commontray.OllamaTray, error) {
return wintray.InitTray(icon, updateIcon)
}

View File

@@ -0,0 +1,181 @@
//go:build windows
package wintray
import (
"fmt"
"log/slog"
"sync"
"unsafe"
"golang.org/x/sys/windows"
)
var quitOnce sync.Once
func (t *winTray) Run() {
nativeLoop()
}
func nativeLoop() {
// Main message pump.
slog.Debug("starting event handling loop")
m := &struct {
WindowHandle windows.Handle
Message uint32
Wparam uintptr
Lparam uintptr
Time uint32
Pt point
LPrivate uint32
}{}
for {
ret, _, err := pGetMessage.Call(uintptr(unsafe.Pointer(m)), 0, 0, 0)
// If the function retrieves a message other than WM_QUIT, the return value is nonzero.
// If the function retrieves the WM_QUIT message, the return value is zero.
// If there is an error, the return value is -1
// https://msdn.microsoft.com/en-us/library/windows/desktop/ms644936(v=vs.85).aspx
switch int32(ret) {
case -1:
slog.Error(fmt.Sprintf("get message failure: %v", err))
return
case 0:
return
default:
pTranslateMessage.Call(uintptr(unsafe.Pointer(m))) //nolint:errcheck
pDispatchMessage.Call(uintptr(unsafe.Pointer(m))) //nolint:errcheck
}
}
}
// WindowProc callback function that processes messages sent to a window.
// https://msdn.microsoft.com/en-us/library/windows/desktop/ms633573(v=vs.85).aspx
func (t *winTray) wndProc(hWnd windows.Handle, message uint32, wParam, lParam uintptr) (lResult uintptr) {
const (
WM_RBUTTONUP = 0x0205
WM_LBUTTONUP = 0x0202
WM_COMMAND = 0x0111
WM_ENDSESSION = 0x0016
WM_CLOSE = 0x0010
WM_DESTROY = 0x0002
WM_MOUSEMOVE = 0x0200
WM_LBUTTONDOWN = 0x0201
)
switch message {
case WM_COMMAND:
menuItemId := int32(wParam)
// https://docs.microsoft.com/en-us/windows/win32/menurc/wm-command#menus
switch menuItemId {
case quitMenuID:
select {
case t.callbacks.Quit <- struct{}{}:
// should not happen but in case not listening
default:
slog.Error("no listener on Quit")
}
case updateMenuID:
select {
case t.callbacks.Update <- struct{}{}:
// should not happen but in case not listening
default:
slog.Error("no listener on Update")
}
case diagLogsMenuID:
select {
case t.callbacks.ShowLogs <- struct{}{}:
// should not happen but in case not listening
default:
slog.Error("no listener on ShowLogs")
}
default:
slog.Debug(fmt.Sprintf("Unexpected menu item id: %d", menuItemId))
}
case WM_CLOSE:
boolRet, _, err := pDestroyWindow.Call(uintptr(t.window))
if boolRet == 0 {
slog.Error(fmt.Sprintf("failed to destroy window: %s", err))
}
err = t.wcex.unregister()
if err != nil {
slog.Error(fmt.Sprintf("failed to unregister window %s", err))
}
case WM_DESTROY:
// same as WM_ENDSESSION, but throws 0 exit code after all
defer pPostQuitMessage.Call(uintptr(int32(0))) //nolint:errcheck
fallthrough
case WM_ENDSESSION:
t.muNID.Lock()
if t.nid != nil {
err := t.nid.delete()
if err != nil {
slog.Error(fmt.Sprintf("failed to delete nid: %s", err))
}
}
t.muNID.Unlock()
case t.wmSystrayMessage:
switch lParam {
case WM_MOUSEMOVE, WM_LBUTTONDOWN:
// Ignore these...
case WM_RBUTTONUP, WM_LBUTTONUP:
err := t.showMenu()
if err != nil {
slog.Error(fmt.Sprintf("failed to show menu: %s", err))
}
case 0x405: // TODO - how is this magic value derived for the notification left click
if t.pendingUpdate {
select {
case t.callbacks.Update <- struct{}{}:
// should not happen but in case not listening
default:
slog.Error("no listener on Update")
}
} else {
select {
case t.callbacks.DoFirstUse <- struct{}{}:
// should not happen but in case not listening
default:
slog.Error("no listener on DoFirstUse")
}
}
case 0x404: // Middle click or close notification
// slog.Debug("doing nothing on close of first time notification")
default:
// 0x402 also seems common - what is it?
slog.Debug(fmt.Sprintf("unmanaged app message, lParm: 0x%x", lParam))
}
case t.wmTaskbarCreated: // on explorer.exe restarts
t.muNID.Lock()
err := t.nid.add()
if err != nil {
slog.Error(fmt.Sprintf("failed to refresh the taskbar on explorer restart: %s", err))
}
t.muNID.Unlock()
default:
// Calls the default window procedure to provide default processing for any window messages that an application does not process.
// https://msdn.microsoft.com/en-us/library/windows/desktop/ms633572(v=vs.85).aspx
lResult, _, _ = pDefWindowProc.Call(
uintptr(hWnd),
uintptr(message),
wParam,
lParam,
)
}
return
}
func (t *winTray) Quit() {
quitOnce.Do(quit)
}
func quit() {
boolRet, _, err := pPostMessage.Call(
uintptr(wt.window),
WM_CLOSE,
0,
0,
)
if boolRet == 0 {
slog.Error(fmt.Sprintf("failed to post close message on shutdown %s", err))
}
}

72
app/tray/wintray/menus.go Normal file
View File

@@ -0,0 +1,72 @@
//go:build windows
package wintray
import (
"fmt"
"log/slog"
"unsafe"
"golang.org/x/sys/windows"
)
const (
_ = iota
updateAvailableMenuID
updateMenuID
separatorMenuID
diagLogsMenuID
diagSeparatorMenuID
quitMenuID
)
func (t *winTray) initMenus() error {
if err := t.addOrUpdateMenuItem(diagLogsMenuID, 0, diagLogsMenuTitle, false); err != nil {
return fmt.Errorf("unable to create menu entries %w\n", err)
}
if err := t.addSeparatorMenuItem(diagSeparatorMenuID, 0); err != nil {
return fmt.Errorf("unable to create menu entries %w", err)
}
if err := t.addOrUpdateMenuItem(quitMenuID, 0, quitMenuTitle, false); err != nil {
return fmt.Errorf("unable to create menu entries %w\n", err)
}
return nil
}
func (t *winTray) UpdateAvailable(ver string) error {
if !t.updateNotified {
slog.Debug("updating menu and sending notification for new update")
if err := t.addOrUpdateMenuItem(updateAvailableMenuID, 0, updateAvailableMenuTitle, true); err != nil {
return fmt.Errorf("unable to create menu entries %w", err)
}
if err := t.addOrUpdateMenuItem(updateMenuID, 0, updateMenuTitle, false); err != nil {
return fmt.Errorf("unable to create menu entries %w", err)
}
if err := t.addSeparatorMenuItem(separatorMenuID, 0); err != nil {
return fmt.Errorf("unable to create menu entries %w", err)
}
iconFilePath, err := iconBytesToFilePath(wt.updateIcon)
if err != nil {
return fmt.Errorf("unable to write icon data to temp file: %w", err)
}
if err := wt.setIcon(iconFilePath); err != nil {
return fmt.Errorf("unable to set icon: %w", err)
}
t.updateNotified = true
t.pendingUpdate = true
// Now pop up the notification
t.muNID.Lock()
defer t.muNID.Unlock()
copy(t.nid.InfoTitle[:], windows.StringToUTF16(updateTitle))
copy(t.nid.Info[:], windows.StringToUTF16(fmt.Sprintf(updateMessage, ver)))
t.nid.Flags |= NIF_INFO
t.nid.Timeout = 10
t.nid.Size = uint32(unsafe.Sizeof(*wt.nid))
err = t.nid.modify()
if err != nil {
return err
}
}
return nil
}

View File

@@ -0,0 +1,15 @@
//go:build windows
package wintray
const (
firstTimeTitle = "Ollama is running"
firstTimeMessage = "Click here to get started"
updateTitle = "Update available"
updateMessage = "Ollama version %s is ready to install"
quitMenuTitle = "Quit Ollama"
updateAvailableMenuTitle = "An update is available"
updateMenuTitle = "Restart to update"
diagLogsMenuTitle = "View logs"
)

View File

@@ -0,0 +1,66 @@
//go:build windows
package wintray
import (
"unsafe"
"golang.org/x/sys/windows"
)
// Contains information that the system needs to display notifications in the notification area.
// Used by Shell_NotifyIcon.
// https://msdn.microsoft.com/en-us/library/windows/desktop/bb773352(v=vs.85).aspx
// https://msdn.microsoft.com/en-us/library/windows/desktop/bb762159
type notifyIconData struct {
Size uint32
Wnd windows.Handle
ID, Flags, CallbackMessage uint32
Icon windows.Handle
Tip [128]uint16
State, StateMask uint32
Info [256]uint16
// Timeout, Version uint32
Timeout uint32
InfoTitle [64]uint16
InfoFlags uint32
GuidItem windows.GUID
BalloonIcon windows.Handle
}
func (nid *notifyIconData) add() error {
const NIM_ADD = 0x00000000
res, _, err := pShellNotifyIcon.Call(
uintptr(NIM_ADD),
uintptr(unsafe.Pointer(nid)),
)
if res == 0 {
return err
}
return nil
}
func (nid *notifyIconData) modify() error {
const NIM_MODIFY = 0x00000001
res, _, err := pShellNotifyIcon.Call(
uintptr(NIM_MODIFY),
uintptr(unsafe.Pointer(nid)),
)
if res == 0 {
return err
}
return nil
}
func (nid *notifyIconData) delete() error {
const NIM_DELETE = 0x00000002
res, _, err := pShellNotifyIcon.Call(
uintptr(NIM_DELETE),
uintptr(unsafe.Pointer(nid)),
)
if res == 0 {
return err
}
return nil
}

488
app/tray/wintray/tray.go Normal file
View File

@@ -0,0 +1,488 @@
//go:build windows
package wintray
import (
"crypto/md5"
"encoding/hex"
"fmt"
"log/slog"
"os"
"path/filepath"
"sort"
"sync"
"syscall"
"unsafe"
"golang.org/x/sys/windows"
"github.com/ollama/ollama/app/tray/commontray"
)
// Helpful sources: https://github.com/golang/exp/blob/master/shiny/driver/internal/win32
// Contains information about loaded resources
type winTray struct {
instance,
icon,
cursor,
window windows.Handle
loadedImages map[string]windows.Handle
muLoadedImages sync.RWMutex
// menus keeps track of the submenus keyed by the menu item ID, plus 0
// which corresponds to the main popup menu.
menus map[uint32]windows.Handle
muMenus sync.RWMutex
menuOf map[uint32]windows.Handle
muMenuOf sync.RWMutex
// menuItemIcons maintains the bitmap of each menu item (if applies). It's
// needed to show the icon correctly when showing a previously hidden menu
// item again.
// menuItemIcons map[uint32]windows.Handle
// muMenuItemIcons sync.RWMutex
visibleItems map[uint32][]uint32
muVisibleItems sync.RWMutex
nid *notifyIconData
muNID sync.RWMutex
wcex *wndClassEx
wmSystrayMessage,
wmTaskbarCreated uint32
pendingUpdate bool
updateNotified bool // Only pop up the notification once - TODO consider daily nag?
// Callbacks
callbacks commontray.Callbacks
normalIcon []byte
updateIcon []byte
}
var wt winTray
func (t *winTray) GetCallbacks() commontray.Callbacks {
return t.callbacks
}
func InitTray(icon, updateIcon []byte) (*winTray, error) {
wt.callbacks.Quit = make(chan struct{})
wt.callbacks.Update = make(chan struct{})
wt.callbacks.ShowLogs = make(chan struct{})
wt.callbacks.DoFirstUse = make(chan struct{})
wt.normalIcon = icon
wt.updateIcon = updateIcon
if err := wt.initInstance(); err != nil {
return nil, fmt.Errorf("Unable to init instance: %w\n", err)
}
if err := wt.createMenu(); err != nil {
return nil, fmt.Errorf("Unable to create menu: %w\n", err)
}
iconFilePath, err := iconBytesToFilePath(wt.normalIcon)
if err != nil {
return nil, fmt.Errorf("Unable to write icon data to temp file: %w", err)
}
if err := wt.setIcon(iconFilePath); err != nil {
return nil, fmt.Errorf("Unable to set icon: %w", err)
}
return &wt, wt.initMenus()
}
func (t *winTray) initInstance() error {
const (
className = "OllamaClass"
windowName = ""
)
t.wmSystrayMessage = WM_USER + 1
t.visibleItems = make(map[uint32][]uint32)
t.menus = make(map[uint32]windows.Handle)
t.menuOf = make(map[uint32]windows.Handle)
t.loadedImages = make(map[string]windows.Handle)
taskbarEventNamePtr, _ := windows.UTF16PtrFromString("TaskbarCreated")
// https://msdn.microsoft.com/en-us/library/windows/desktop/ms644947
res, _, err := pRegisterWindowMessage.Call(
uintptr(unsafe.Pointer(taskbarEventNamePtr)),
)
if res == 0 { // success 0xc000-0xfff
return fmt.Errorf("failed to register window: %w", err)
}
t.wmTaskbarCreated = uint32(res)
instanceHandle, _, err := pGetModuleHandle.Call(0)
if instanceHandle == 0 {
return err
}
t.instance = windows.Handle(instanceHandle)
// https://msdn.microsoft.com/en-us/library/windows/desktop/ms648072(v=vs.85).aspx
iconHandle, _, err := pLoadIcon.Call(0, uintptr(IDI_APPLICATION))
if iconHandle == 0 {
return err
}
t.icon = windows.Handle(iconHandle)
// https://msdn.microsoft.com/en-us/library/windows/desktop/ms648391(v=vs.85).aspx
cursorHandle, _, err := pLoadCursor.Call(0, uintptr(IDC_ARROW))
if cursorHandle == 0 {
return err
}
t.cursor = windows.Handle(cursorHandle)
classNamePtr, err := windows.UTF16PtrFromString(className)
if err != nil {
return err
}
windowNamePtr, err := windows.UTF16PtrFromString(windowName)
if err != nil {
return err
}
t.wcex = &wndClassEx{
Style: CS_HREDRAW | CS_VREDRAW,
WndProc: windows.NewCallback(t.wndProc),
Instance: t.instance,
Icon: t.icon,
Cursor: t.cursor,
Background: windows.Handle(6), // (COLOR_WINDOW + 1)
ClassName: classNamePtr,
IconSm: t.icon,
}
if err := t.wcex.register(); err != nil {
return err
}
windowHandle, _, err := pCreateWindowEx.Call(
uintptr(0),
uintptr(unsafe.Pointer(classNamePtr)),
uintptr(unsafe.Pointer(windowNamePtr)),
uintptr(WS_OVERLAPPEDWINDOW),
uintptr(CW_USEDEFAULT),
uintptr(CW_USEDEFAULT),
uintptr(CW_USEDEFAULT),
uintptr(CW_USEDEFAULT),
uintptr(0),
uintptr(0),
uintptr(t.instance),
uintptr(0),
)
if windowHandle == 0 {
return err
}
t.window = windows.Handle(windowHandle)
pShowWindow.Call(uintptr(t.window), uintptr(SW_HIDE)) //nolint:errcheck
boolRet, _, err := pUpdateWindow.Call(uintptr(t.window))
if boolRet == 0 {
slog.Error(fmt.Sprintf("failed to update window: %s", err))
}
t.muNID.Lock()
defer t.muNID.Unlock()
t.nid = &notifyIconData{
Wnd: t.window,
ID: 100,
Flags: NIF_MESSAGE,
CallbackMessage: t.wmSystrayMessage,
}
t.nid.Size = uint32(unsafe.Sizeof(*t.nid))
return t.nid.add()
}
func (t *winTray) createMenu() error {
menuHandle, _, err := pCreatePopupMenu.Call()
if menuHandle == 0 {
return err
}
t.menus[0] = windows.Handle(menuHandle)
// https://msdn.microsoft.com/en-us/library/windows/desktop/ms647575(v=vs.85).aspx
mi := struct {
Size, Mask, Style, Max uint32
Background windows.Handle
ContextHelpID uint32
MenuData uintptr
}{
Mask: MIM_APPLYTOSUBMENUS,
}
mi.Size = uint32(unsafe.Sizeof(mi))
res, _, err := pSetMenuInfo.Call(
uintptr(t.menus[0]),
uintptr(unsafe.Pointer(&mi)),
)
if res == 0 {
return err
}
return nil
}
// Contains information about a menu item.
// https://msdn.microsoft.com/en-us/library/windows/desktop/ms647578(v=vs.85).aspx
type menuItemInfo struct {
Size, Mask, Type, State uint32
ID uint32
SubMenu, Checked, Unchecked windows.Handle
ItemData uintptr
TypeData *uint16
Cch uint32
BMPItem windows.Handle
}
func (t *winTray) addOrUpdateMenuItem(menuItemId uint32, parentId uint32, title string, disabled bool) error {
titlePtr, err := windows.UTF16PtrFromString(title)
if err != nil {
return err
}
mi := menuItemInfo{
Mask: MIIM_FTYPE | MIIM_STRING | MIIM_ID | MIIM_STATE,
Type: MFT_STRING,
ID: menuItemId,
TypeData: titlePtr,
Cch: uint32(len(title)),
}
mi.Size = uint32(unsafe.Sizeof(mi))
if disabled {
mi.State |= MFS_DISABLED
}
var res uintptr
t.muMenus.RLock()
menu := t.menus[parentId]
t.muMenus.RUnlock()
if t.getVisibleItemIndex(parentId, menuItemId) != -1 {
// We set the menu item info based on the menuID
boolRet, _, err := pSetMenuItemInfo.Call(
uintptr(menu),
uintptr(menuItemId),
0,
uintptr(unsafe.Pointer(&mi)),
)
if boolRet == 0 {
return fmt.Errorf("failed to set menu item: %w", err)
}
}
if res == 0 {
// Menu item does not already exist, create it
t.muMenus.RLock()
submenu, exists := t.menus[menuItemId]
t.muMenus.RUnlock()
if exists {
mi.Mask |= MIIM_SUBMENU
mi.SubMenu = submenu
}
t.addToVisibleItems(parentId, menuItemId)
position := t.getVisibleItemIndex(parentId, menuItemId)
res, _, err = pInsertMenuItem.Call(
uintptr(menu),
uintptr(position),
1,
uintptr(unsafe.Pointer(&mi)),
)
if res == 0 {
t.delFromVisibleItems(parentId, menuItemId)
return err
}
t.muMenuOf.Lock()
t.menuOf[menuItemId] = menu
t.muMenuOf.Unlock()
}
return nil
}
func (t *winTray) addSeparatorMenuItem(menuItemId, parentId uint32) error {
mi := menuItemInfo{
Mask: MIIM_FTYPE | MIIM_ID | MIIM_STATE,
Type: MFT_SEPARATOR,
ID: menuItemId,
}
mi.Size = uint32(unsafe.Sizeof(mi))
t.addToVisibleItems(parentId, menuItemId)
position := t.getVisibleItemIndex(parentId, menuItemId)
t.muMenus.RLock()
menu := uintptr(t.menus[parentId])
t.muMenus.RUnlock()
res, _, err := pInsertMenuItem.Call(
menu,
uintptr(position),
1,
uintptr(unsafe.Pointer(&mi)),
)
if res == 0 {
return err
}
return nil
}
// func (t *winTray) hideMenuItem(menuItemId, parentId uint32) error {
// const ERROR_SUCCESS syscall.Errno = 0
// t.muMenus.RLock()
// menu := uintptr(t.menus[parentId])
// t.muMenus.RUnlock()
// res, _, err := pRemoveMenu.Call(
// menu,
// uintptr(menuItemId),
// MF_BYCOMMAND,
// )
// if res == 0 && err.(syscall.Errno) != ERROR_SUCCESS {
// return err
// }
// t.delFromVisibleItems(parentId, menuItemId)
// return nil
// }
func (t *winTray) showMenu() error {
p := point{}
boolRet, _, err := pGetCursorPos.Call(uintptr(unsafe.Pointer(&p)))
if boolRet == 0 {
return err
}
boolRet, _, err = pSetForegroundWindow.Call(uintptr(t.window))
if boolRet == 0 {
slog.Warn(fmt.Sprintf("failed to bring menu to foreground: %s", err))
}
boolRet, _, err = pTrackPopupMenu.Call(
uintptr(t.menus[0]),
TPM_BOTTOMALIGN|TPM_LEFTALIGN|TPM_RIGHTBUTTON,
uintptr(p.X),
uintptr(p.Y),
0,
uintptr(t.window),
0,
)
if boolRet == 0 {
return err
}
return nil
}
func (t *winTray) delFromVisibleItems(parent, val uint32) {
t.muVisibleItems.Lock()
defer t.muVisibleItems.Unlock()
visibleItems := t.visibleItems[parent]
for i, itemval := range visibleItems {
if val == itemval {
t.visibleItems[parent] = append(visibleItems[:i], visibleItems[i+1:]...)
break
}
}
}
func (t *winTray) addToVisibleItems(parent, val uint32) {
t.muVisibleItems.Lock()
defer t.muVisibleItems.Unlock()
if visibleItems, exists := t.visibleItems[parent]; !exists {
t.visibleItems[parent] = []uint32{val}
} else {
newvisible := append(visibleItems, val)
sort.Slice(newvisible, func(i, j int) bool { return newvisible[i] < newvisible[j] })
t.visibleItems[parent] = newvisible
}
}
func (t *winTray) getVisibleItemIndex(parent, val uint32) int {
t.muVisibleItems.RLock()
defer t.muVisibleItems.RUnlock()
for i, itemval := range t.visibleItems[parent] {
if val == itemval {
return i
}
}
return -1
}
func iconBytesToFilePath(iconBytes []byte) (string, error) {
bh := md5.Sum(iconBytes)
dataHash := hex.EncodeToString(bh[:])
iconFilePath := filepath.Join(os.TempDir(), "ollama_temp_icon_"+dataHash)
if _, err := os.Stat(iconFilePath); os.IsNotExist(err) {
if err := os.WriteFile(iconFilePath, iconBytes, 0o644); err != nil {
return "", err
}
}
return iconFilePath, nil
}
// Loads an image from file and shows it in tray.
// Shell_NotifyIcon: https://msdn.microsoft.com/en-us/library/windows/desktop/bb762159(v=vs.85).aspx
func (t *winTray) setIcon(src string) error {
h, err := t.loadIconFrom(src)
if err != nil {
return err
}
t.muNID.Lock()
defer t.muNID.Unlock()
t.nid.Icon = h
t.nid.Flags |= NIF_ICON | NIF_TIP
if toolTipUTF16, err := syscall.UTF16FromString(commontray.ToolTip); err == nil {
copy(t.nid.Tip[:], toolTipUTF16)
} else {
return err
}
t.nid.Size = uint32(unsafe.Sizeof(*t.nid))
return t.nid.modify()
}
// Loads an image from file to be shown in tray or menu item.
// LoadImage: https://msdn.microsoft.com/en-us/library/windows/desktop/ms648045(v=vs.85).aspx
func (t *winTray) loadIconFrom(src string) (windows.Handle, error) {
// Save and reuse handles of loaded images
t.muLoadedImages.RLock()
h, ok := t.loadedImages[src]
t.muLoadedImages.RUnlock()
if !ok {
srcPtr, err := windows.UTF16PtrFromString(src)
if err != nil {
return 0, err
}
res, _, err := pLoadImage.Call(
0,
uintptr(unsafe.Pointer(srcPtr)),
IMAGE_ICON,
0,
0,
LR_LOADFROMFILE|LR_DEFAULTSIZE,
)
if res == 0 {
return 0, err
}
h = windows.Handle(res)
t.muLoadedImages.Lock()
t.loadedImages[src] = h
t.muLoadedImages.Unlock()
}
return h, nil
}
func (t *winTray) DisplayFirstUseNotification() error {
t.muNID.Lock()
defer t.muNID.Unlock()
copy(t.nid.InfoTitle[:], windows.StringToUTF16(firstTimeTitle))
copy(t.nid.Info[:], windows.StringToUTF16(firstTimeMessage))
t.nid.Flags |= NIF_INFO
t.nid.Size = uint32(unsafe.Sizeof(*wt.nid))
return t.nid.modify()
}

View File

@@ -0,0 +1,91 @@
//go:build windows
package wintray
import (
"runtime"
"golang.org/x/sys/windows"
)
var (
k32 = windows.NewLazySystemDLL("Kernel32.dll")
u32 = windows.NewLazySystemDLL("User32.dll")
s32 = windows.NewLazySystemDLL("Shell32.dll")
pCreatePopupMenu = u32.NewProc("CreatePopupMenu")
pCreateWindowEx = u32.NewProc("CreateWindowExW")
pDefWindowProc = u32.NewProc("DefWindowProcW")
pDestroyWindow = u32.NewProc("DestroyWindow")
pDispatchMessage = u32.NewProc("DispatchMessageW")
pGetCursorPos = u32.NewProc("GetCursorPos")
pGetMessage = u32.NewProc("GetMessageW")
pGetModuleHandle = k32.NewProc("GetModuleHandleW")
pInsertMenuItem = u32.NewProc("InsertMenuItemW")
pLoadCursor = u32.NewProc("LoadCursorW")
pLoadIcon = u32.NewProc("LoadIconW")
pLoadImage = u32.NewProc("LoadImageW")
pPostMessage = u32.NewProc("PostMessageW")
pPostQuitMessage = u32.NewProc("PostQuitMessage")
pRegisterClass = u32.NewProc("RegisterClassExW")
pRegisterWindowMessage = u32.NewProc("RegisterWindowMessageW")
pSetForegroundWindow = u32.NewProc("SetForegroundWindow")
pSetMenuInfo = u32.NewProc("SetMenuInfo")
pSetMenuItemInfo = u32.NewProc("SetMenuItemInfoW")
pShellNotifyIcon = s32.NewProc("Shell_NotifyIconW")
pShowWindow = u32.NewProc("ShowWindow")
pTrackPopupMenu = u32.NewProc("TrackPopupMenu")
pTranslateMessage = u32.NewProc("TranslateMessage")
pUnregisterClass = u32.NewProc("UnregisterClassW")
pUpdateWindow = u32.NewProc("UpdateWindow")
)
const (
CS_HREDRAW = 0x0002
CS_VREDRAW = 0x0001
CW_USEDEFAULT = 0x80000000
IDC_ARROW = 32512 // Standard arrow
IDI_APPLICATION = 32512
IMAGE_ICON = 1 // Loads an icon
LR_DEFAULTSIZE = 0x00000040 // Loads default-size icon for windows(SM_CXICON x SM_CYICON) if cx, cy are set to zero
LR_LOADFROMFILE = 0x00000010 // Loads the stand-alone image from the file
MF_BYCOMMAND = 0x00000000
MFS_DISABLED = 0x00000003
MFT_SEPARATOR = 0x00000800
MFT_STRING = 0x00000000
MIIM_BITMAP = 0x00000080
MIIM_FTYPE = 0x00000100
MIIM_ID = 0x00000002
MIIM_STATE = 0x00000001
MIIM_STRING = 0x00000040
MIIM_SUBMENU = 0x00000004
MIM_APPLYTOSUBMENUS = 0x80000000
NIF_ICON = 0x00000002
NIF_TIP = 0x00000004
NIF_INFO = 0x00000010
NIF_MESSAGE = 0x00000001
SW_HIDE = 0
TPM_BOTTOMALIGN = 0x0020
TPM_LEFTALIGN = 0x0000
TPM_RIGHTBUTTON = 0x0002
WM_CLOSE = 0x0010
WM_USER = 0x0400
WS_CAPTION = 0x00C00000
WS_MAXIMIZEBOX = 0x00010000
WS_MINIMIZEBOX = 0x00020000
WS_OVERLAPPED = 0x00000000
WS_OVERLAPPEDWINDOW = WS_OVERLAPPED | WS_CAPTION | WS_SYSMENU | WS_THICKFRAME | WS_MINIMIZEBOX | WS_MAXIMIZEBOX
WS_SYSMENU = 0x00080000
WS_THICKFRAME = 0x00040000
)
// Not sure if this is actually needed on windows
func init() {
runtime.LockOSThread()
}
// The POINT structure defines the x- and y- coordinates of a point.
// https://msdn.microsoft.com/en-us/library/windows/desktop/dd162805(v=vs.85).aspx
type point struct {
X, Y int32
}

View File

@@ -0,0 +1,45 @@
//go:build windows
package wintray
import (
"unsafe"
"golang.org/x/sys/windows"
)
// Contains window class information.
// It is used with the RegisterClassEx and GetClassInfoEx functions.
// https://msdn.microsoft.com/en-us/library/ms633577.aspx
type wndClassEx struct {
Size, Style uint32
WndProc uintptr
ClsExtra, WndExtra int32
Instance, Icon, Cursor, Background windows.Handle
MenuName, ClassName *uint16
IconSm windows.Handle
}
// Registers a window class for subsequent use in calls to the CreateWindow or CreateWindowEx function.
// https://msdn.microsoft.com/en-us/library/ms633587.aspx
func (w *wndClassEx) register() error {
w.Size = uint32(unsafe.Sizeof(*w))
res, _, err := pRegisterClass.Call(uintptr(unsafe.Pointer(w)))
if res == 0 {
return err
}
return nil
}
// Unregisters a window class, freeing the memory required for the class.
// https://msdn.microsoft.com/en-us/library/ms644899.aspx
func (w *wndClassEx) unregister() error {
res, _, err := pUnregisterClass.Call(
uintptr(unsafe.Pointer(w.ClassName)),
uintptr(w.Instance),
)
if res == 0 {
return err
}
return nil
}

92
auth/auth.go Normal file
View File

@@ -0,0 +1,92 @@
package auth
import (
"bytes"
"context"
"crypto/rand"
"encoding/base64"
"errors"
"fmt"
"io"
"log/slog"
"os"
"path/filepath"
"strings"
"golang.org/x/crypto/ssh"
)
const defaultPrivateKey = "id_ed25519"
func keyPath() (string, error) {
home, err := os.UserHomeDir()
if err != nil {
return "", err
}
return filepath.Join(home, ".ollama", defaultPrivateKey), nil
}
func GetPublicKey() (string, error) {
keyPath, err := keyPath()
if err != nil {
return "", err
}
privateKeyFile, err := os.ReadFile(keyPath)
if err != nil {
slog.Info(fmt.Sprintf("Failed to load private key: %v", err))
return "", err
}
privateKey, err := ssh.ParsePrivateKey(privateKeyFile)
if err != nil {
return "", err
}
publicKey := ssh.MarshalAuthorizedKey(privateKey.PublicKey())
return strings.TrimSpace(string(publicKey)), nil
}
func NewNonce(r io.Reader, length int) (string, error) {
nonce := make([]byte, length)
if _, err := io.ReadFull(r, nonce); err != nil {
return "", err
}
return base64.RawURLEncoding.EncodeToString(nonce), nil
}
func Sign(ctx context.Context, bts []byte) (string, error) {
keyPath, err := keyPath()
if err != nil {
return "", err
}
privateKeyFile, err := os.ReadFile(keyPath)
if err != nil {
slog.Info(fmt.Sprintf("Failed to load private key: %v", err))
return "", err
}
privateKey, err := ssh.ParsePrivateKey(privateKeyFile)
if err != nil {
return "", err
}
// get the pubkey, but remove the type
publicKey := ssh.MarshalAuthorizedKey(privateKey.PublicKey())
parts := bytes.Split(publicKey, []byte(" "))
if len(parts) < 2 {
return "", errors.New("malformed public key")
}
signedData, err := privateKey.Sign(rand.Reader, bts)
if err != nil {
return "", err
}
// signature is <pubkey>:<signature>
return fmt.Sprintf("%s:%s", bytes.TrimSpace(parts[1]), base64.StdEncoding.EncodeToString(signedData.Blob)), nil
}

View File

@@ -0,0 +1,178 @@
package benchmark
import (
"context"
"flag"
"fmt"
"testing"
"time"
"github.com/ollama/ollama/api"
)
// Command line flags
var modelFlag string
func init() {
flag.StringVar(&modelFlag, "m", "", "Name of the model to benchmark")
flag.Lookup("m").DefValue = "model"
}
// modelName returns the model name from flags, failing the test if not set
func modelName(b *testing.B) string {
if modelFlag == "" {
b.Fatal("Error: -m flag is required for benchmark tests")
}
return modelFlag
}
type TestCase struct {
name string
prompt string
maxTokens int
}
// runGenerateBenchmark contains the common generate and metrics logic
func runGenerateBenchmark(b *testing.B, ctx context.Context, client *api.Client, req *api.GenerateRequest) {
start := time.Now()
var ttft time.Duration
var metrics api.Metrics
err := client.Generate(ctx, req, func(resp api.GenerateResponse) error {
if ttft == 0 && resp.Response != "" {
ttft = time.Since(start)
}
if resp.Done {
metrics = resp.Metrics
}
return nil
})
// Report custom metrics as part of the benchmark results
b.ReportMetric(float64(ttft.Milliseconds()), "ttft_ms")
b.ReportMetric(float64(metrics.LoadDuration.Milliseconds()), "load_ms")
// Token throughput metrics
promptThroughput := float64(metrics.PromptEvalCount) / metrics.PromptEvalDuration.Seconds()
genThroughput := float64(metrics.EvalCount) / metrics.EvalDuration.Seconds()
b.ReportMetric(promptThroughput, "prompt_tok/s")
b.ReportMetric(genThroughput, "gen_tok/s")
// Token counts
b.ReportMetric(float64(metrics.PromptEvalCount), "prompt_tokens")
b.ReportMetric(float64(metrics.EvalCount), "gen_tokens")
if err != nil {
b.Fatal(err)
}
}
// BenchmarkColdStart runs benchmarks with model loading from cold state
func BenchmarkColdStart(b *testing.B) {
client := setup(b)
tests := []TestCase{
{"short_prompt", "Write a long story", 100},
{"medium_prompt", "Write a detailed economic analysis", 500},
{"long_prompt", "Write a comprehensive AI research paper", 1000},
}
m := modelName(b)
for _, tt := range tests {
b.Run(fmt.Sprintf("%s/cold/%s", m, tt.name), func(b *testing.B) {
ctx := context.Background()
// Set number of tokens as our throughput metric
b.SetBytes(int64(tt.maxTokens))
for b.Loop() {
b.StopTimer()
// Ensure model is unloaded before each iteration
unload(client, m, b)
b.StartTimer()
req := &api.GenerateRequest{
Model: m,
Prompt: tt.prompt,
Options: map[string]any{"num_predict": tt.maxTokens, "temperature": 0.1},
}
runGenerateBenchmark(b, ctx, client, req)
}
})
}
}
// BenchmarkWarmStart runs benchmarks with pre-loaded model
func BenchmarkWarmStart(b *testing.B) {
client := setup(b)
tests := []TestCase{
{"short_prompt", "Write a long story", 100},
{"medium_prompt", "Write a detailed economic analysis", 500},
{"long_prompt", "Write a comprehensive AI research paper", 1000},
}
m := modelName(b)
for _, tt := range tests {
b.Run(fmt.Sprintf("%s/warm/%s", m, tt.name), func(b *testing.B) {
ctx := context.Background()
// Pre-warm the model
warmup(client, m, tt.prompt, b)
// Set number of tokens as our throughput metric
b.SetBytes(int64(tt.maxTokens))
for b.Loop() {
req := &api.GenerateRequest{
Model: m,
Prompt: tt.prompt,
Options: map[string]any{"num_predict": tt.maxTokens, "temperature": 0.1},
}
runGenerateBenchmark(b, ctx, client, req)
}
})
}
}
// setup verifies server and model availability
func setup(b *testing.B) *api.Client {
client, err := api.ClientFromEnvironment()
if err != nil {
b.Fatal(err)
}
if _, err := client.Show(context.Background(), &api.ShowRequest{Model: modelName(b)}); err != nil {
b.Fatalf("Model unavailable: %v", err)
}
return client
}
// warmup ensures the model is loaded and warmed up
func warmup(client *api.Client, model string, prompt string, b *testing.B) {
for range 3 {
err := client.Generate(
context.Background(),
&api.GenerateRequest{
Model: model,
Prompt: prompt,
Options: map[string]any{"num_predict": 50, "temperature": 0.1},
},
func(api.GenerateResponse) error { return nil },
)
if err != nil {
b.Logf("Error during model warm-up: %v", err)
}
}
}
// unload forces model unloading using KeepAlive: 0 parameter
func unload(client *api.Client, model string, b *testing.B) {
req := &api.GenerateRequest{
Model: model,
KeepAlive: &api.Duration{Duration: 0},
}
if err := client.Generate(context.Background(), req, func(api.GenerateResponse) error { return nil }); err != nil {
b.Logf("Unload error: %v", err)
}
time.Sleep(1 * time.Second)
}

1504
cmd/cmd.go

File diff suppressed because it is too large Load Diff

922
cmd/cmd_test.go Normal file
View File

@@ -0,0 +1,922 @@
package cmd
import (
"bytes"
"context"
"encoding/json"
"io"
"net/http"
"net/http/httptest"
"os"
"strings"
"testing"
"time"
"github.com/google/go-cmp/cmp"
"github.com/spf13/cobra"
"github.com/ollama/ollama/api"
"github.com/ollama/ollama/types/model"
)
func TestShowInfo(t *testing.T) {
t.Run("bare details", func(t *testing.T) {
var b bytes.Buffer
if err := showInfo(&api.ShowResponse{
Details: api.ModelDetails{
Family: "test",
ParameterSize: "7B",
QuantizationLevel: "FP16",
},
}, false, &b); err != nil {
t.Fatal(err)
}
expect := ` Model
architecture test
parameters 7B
quantization FP16
`
if diff := cmp.Diff(expect, b.String()); diff != "" {
t.Errorf("unexpected output (-want +got):\n%s", diff)
}
})
t.Run("bare model info", func(t *testing.T) {
var b bytes.Buffer
if err := showInfo(&api.ShowResponse{
ModelInfo: map[string]any{
"general.architecture": "test",
"general.parameter_count": float64(7_000_000_000),
"test.context_length": float64(0),
"test.embedding_length": float64(0),
},
Details: api.ModelDetails{
Family: "test",
ParameterSize: "7B",
QuantizationLevel: "FP16",
},
}, false, &b); err != nil {
t.Fatal(err)
}
expect := ` Model
architecture test
parameters 7B
context length 0
embedding length 0
quantization FP16
`
if diff := cmp.Diff(expect, b.String()); diff != "" {
t.Errorf("unexpected output (-want +got):\n%s", diff)
}
})
t.Run("verbose model", func(t *testing.T) {
var b bytes.Buffer
if err := showInfo(&api.ShowResponse{
Details: api.ModelDetails{
Family: "test",
ParameterSize: "8B",
QuantizationLevel: "FP16",
},
Parameters: `
stop up`,
ModelInfo: map[string]any{
"general.architecture": "test",
"general.parameter_count": float64(8_000_000_000),
"some.true_bool": true,
"some.false_bool": false,
"test.context_length": float64(1000),
"test.embedding_length": float64(11434),
},
Tensors: []api.Tensor{
{Name: "blk.0.attn_k.weight", Type: "BF16", Shape: []uint64{42, 3117}},
{Name: "blk.0.attn_q.weight", Type: "FP16", Shape: []uint64{3117, 42}},
},
}, true, &b); err != nil {
t.Fatal(err)
}
expect := ` Model
architecture test
parameters 8B
context length 1000
embedding length 11434
quantization FP16
Parameters
stop up
Metadata
general.architecture test
general.parameter_count 8e+09
some.false_bool false
some.true_bool true
test.context_length 1000
test.embedding_length 11434
Tensors
blk.0.attn_k.weight BF16 [42 3117]
blk.0.attn_q.weight FP16 [3117 42]
`
if diff := cmp.Diff(expect, b.String()); diff != "" {
t.Errorf("unexpected output (-want +got):\n%s", diff)
}
})
t.Run("parameters", func(t *testing.T) {
var b bytes.Buffer
if err := showInfo(&api.ShowResponse{
Details: api.ModelDetails{
Family: "test",
ParameterSize: "7B",
QuantizationLevel: "FP16",
},
Parameters: `
stop never
stop gonna
stop give
stop you
stop up
temperature 99`,
}, false, &b); err != nil {
t.Fatal(err)
}
expect := ` Model
architecture test
parameters 7B
quantization FP16
Parameters
stop never
stop gonna
stop give
stop you
stop up
temperature 99
`
if diff := cmp.Diff(expect, b.String()); diff != "" {
t.Errorf("unexpected output (-want +got):\n%s", diff)
}
})
t.Run("project info", func(t *testing.T) {
var b bytes.Buffer
if err := showInfo(&api.ShowResponse{
Details: api.ModelDetails{
Family: "test",
ParameterSize: "7B",
QuantizationLevel: "FP16",
},
ProjectorInfo: map[string]any{
"general.architecture": "clip",
"general.parameter_count": float64(133_700_000),
"clip.vision.embedding_length": float64(0),
"clip.vision.projection_dim": float64(0),
},
}, false, &b); err != nil {
t.Fatal(err)
}
expect := ` Model
architecture test
parameters 7B
quantization FP16
Projector
architecture clip
parameters 133.70M
embedding length 0
dimensions 0
`
if diff := cmp.Diff(expect, b.String()); diff != "" {
t.Errorf("unexpected output (-want +got):\n%s", diff)
}
})
t.Run("system", func(t *testing.T) {
var b bytes.Buffer
if err := showInfo(&api.ShowResponse{
Details: api.ModelDetails{
Family: "test",
ParameterSize: "7B",
QuantizationLevel: "FP16",
},
System: `You are a pirate!
Ahoy, matey!
Weigh anchor!
`,
}, false, &b); err != nil {
t.Fatal(err)
}
expect := ` Model
architecture test
parameters 7B
quantization FP16
System
You are a pirate!
Ahoy, matey!
`
if diff := cmp.Diff(expect, b.String()); diff != "" {
t.Errorf("unexpected output (-want +got):\n%s", diff)
}
})
t.Run("license", func(t *testing.T) {
var b bytes.Buffer
license := "MIT License\nCopyright (c) Ollama\n"
if err := showInfo(&api.ShowResponse{
Details: api.ModelDetails{
Family: "test",
ParameterSize: "7B",
QuantizationLevel: "FP16",
},
License: license,
}, false, &b); err != nil {
t.Fatal(err)
}
expect := ` Model
architecture test
parameters 7B
quantization FP16
License
MIT License
Copyright (c) Ollama
`
if diff := cmp.Diff(expect, b.String()); diff != "" {
t.Errorf("unexpected output (-want +got):\n%s", diff)
}
})
t.Run("capabilities", func(t *testing.T) {
var b bytes.Buffer
if err := showInfo(&api.ShowResponse{
Details: api.ModelDetails{
Family: "test",
ParameterSize: "7B",
QuantizationLevel: "FP16",
},
Capabilities: []model.Capability{model.CapabilityVision, model.CapabilityTools},
}, false, &b); err != nil {
t.Fatal(err)
}
expect := " Model\n" +
" architecture test \n" +
" parameters 7B \n" +
" quantization FP16 \n" +
"\n" +
" Capabilities\n" +
" vision \n" +
" tools \n" +
"\n"
if diff := cmp.Diff(expect, b.String()); diff != "" {
t.Errorf("unexpected output (-want +got):\n%s", diff)
}
})
}
func TestDeleteHandler(t *testing.T) {
stopped := false
mockServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.URL.Path == "/api/delete" && r.Method == http.MethodDelete {
var req api.DeleteRequest
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
http.Error(w, err.Error(), http.StatusBadRequest)
return
}
if req.Name == "test-model" {
w.WriteHeader(http.StatusOK)
} else {
w.WriteHeader(http.StatusNotFound)
}
return
}
if r.URL.Path == "/api/generate" && r.Method == http.MethodPost {
var req api.GenerateRequest
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
http.Error(w, err.Error(), http.StatusBadRequest)
return
}
if req.Model == "test-model" {
w.WriteHeader(http.StatusOK)
if err := json.NewEncoder(w).Encode(api.GenerateResponse{
Done: true,
}); err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
}
stopped = true
return
} else {
w.WriteHeader(http.StatusNotFound)
if err := json.NewEncoder(w).Encode(api.GenerateResponse{
Done: false,
}); err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
}
}
}
}))
t.Setenv("OLLAMA_HOST", mockServer.URL)
t.Cleanup(mockServer.Close)
cmd := &cobra.Command{}
cmd.SetContext(context.TODO())
if err := DeleteHandler(cmd, []string{"test-model"}); err != nil {
t.Fatalf("DeleteHandler failed: %v", err)
}
if !stopped {
t.Fatal("Model was not stopped before deletion")
}
err := DeleteHandler(cmd, []string{"test-model-not-found"})
if err == nil || !strings.Contains(err.Error(), "unable to stop existing running model \"test-model-not-found\"") {
t.Fatalf("DeleteHandler failed: expected error about stopping non-existent model, got %v", err)
}
}
func TestGetModelfileName(t *testing.T) {
tests := []struct {
name string
modelfileName string
fileExists bool
expectedName string
expectedErr error
}{
{
name: "no modelfile specified, no modelfile exists",
modelfileName: "",
fileExists: false,
expectedName: "",
expectedErr: os.ErrNotExist,
},
{
name: "no modelfile specified, modelfile exists",
modelfileName: "",
fileExists: true,
expectedName: "Modelfile",
expectedErr: nil,
},
{
name: "modelfile specified, no modelfile exists",
modelfileName: "crazyfile",
fileExists: false,
expectedName: "",
expectedErr: os.ErrNotExist,
},
{
name: "modelfile specified, modelfile exists",
modelfileName: "anotherfile",
fileExists: true,
expectedName: "anotherfile",
expectedErr: nil,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
cmd := &cobra.Command{
Use: "fakecmd",
}
cmd.Flags().String("file", "", "path to modelfile")
var expectedFilename string
if tt.fileExists {
tempDir, err := os.MkdirTemp("", "modelfiledir")
defer os.RemoveAll(tempDir)
if err != nil {
t.Fatalf("temp modelfile dir creation failed: %v", err)
}
var fn string
if tt.modelfileName != "" {
fn = tt.modelfileName
} else {
fn = "Modelfile"
}
tempFile, err := os.CreateTemp(tempDir, fn)
if err != nil {
t.Fatalf("temp modelfile creation failed: %v", err)
}
defer tempFile.Close()
expectedFilename = tempFile.Name()
err = cmd.Flags().Set("file", expectedFilename)
if err != nil {
t.Fatalf("couldn't set file flag: %v", err)
}
} else {
expectedFilename = tt.expectedName
if tt.modelfileName != "" {
err := cmd.Flags().Set("file", tt.modelfileName)
if err != nil {
t.Fatalf("couldn't set file flag: %v", err)
}
}
}
actualFilename, actualErr := getModelfileName(cmd)
if actualFilename != expectedFilename {
t.Errorf("expected filename: '%s' actual filename: '%s'", expectedFilename, actualFilename)
}
if tt.expectedErr != os.ErrNotExist {
if actualErr != tt.expectedErr {
t.Errorf("expected err: %v actual err: %v", tt.expectedErr, actualErr)
}
} else {
if !os.IsNotExist(actualErr) {
t.Errorf("expected err: %v actual err: %v", tt.expectedErr, actualErr)
}
}
})
}
}
func TestPushHandler(t *testing.T) {
tests := []struct {
name string
modelName string
serverResponse map[string]func(w http.ResponseWriter, r *http.Request)
expectedError string
expectedOutput string
}{
{
name: "successful push",
modelName: "test-model",
serverResponse: map[string]func(w http.ResponseWriter, r *http.Request){
"/api/push": func(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodPost {
t.Errorf("expected POST request, got %s", r.Method)
}
var req api.PushRequest
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
http.Error(w, err.Error(), http.StatusBadRequest)
return
}
if req.Name != "test-model" {
t.Errorf("expected model name 'test-model', got %s", req.Name)
}
// Simulate progress updates
responses := []api.ProgressResponse{
{Status: "preparing manifest"},
{Digest: "sha256:abc123456789", Total: 100, Completed: 50},
{Digest: "sha256:abc123456789", Total: 100, Completed: 100},
}
for _, resp := range responses {
if err := json.NewEncoder(w).Encode(resp); err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
w.(http.Flusher).Flush()
}
},
},
expectedOutput: "\nYou can find your model at:\n\n\thttps://ollama.com/test-model\n",
},
{
name: "unauthorized push",
modelName: "unauthorized-model",
serverResponse: map[string]func(w http.ResponseWriter, r *http.Request){
"/api/push": func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusUnauthorized)
err := json.NewEncoder(w).Encode(map[string]string{
"error": "access denied",
})
if err != nil {
t.Fatal(err)
}
},
},
expectedError: "you are not authorized to push to this namespace, create the model under a namespace you own",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
mockServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if handler, ok := tt.serverResponse[r.URL.Path]; ok {
handler(w, r)
return
}
http.Error(w, "not found", http.StatusNotFound)
}))
defer mockServer.Close()
t.Setenv("OLLAMA_HOST", mockServer.URL)
cmd := &cobra.Command{}
cmd.Flags().Bool("insecure", false, "")
cmd.SetContext(context.TODO())
// Redirect stderr to capture progress output
oldStderr := os.Stderr
r, w, _ := os.Pipe()
os.Stderr = w
// Capture stdout for the "Model pushed" message
oldStdout := os.Stdout
outR, outW, _ := os.Pipe()
os.Stdout = outW
err := PushHandler(cmd, []string{tt.modelName})
// Restore stderr
w.Close()
os.Stderr = oldStderr
// drain the pipe
if _, err := io.ReadAll(r); err != nil {
t.Fatal(err)
}
// Restore stdout and get output
outW.Close()
os.Stdout = oldStdout
stdout, _ := io.ReadAll(outR)
if tt.expectedError == "" {
if err != nil {
t.Errorf("expected no error, got %v", err)
}
if tt.expectedOutput != "" {
if got := string(stdout); got != tt.expectedOutput {
t.Errorf("expected output %q, got %q", tt.expectedOutput, got)
}
}
} else {
if err == nil || !strings.Contains(err.Error(), tt.expectedError) {
t.Errorf("expected error containing %q, got %v", tt.expectedError, err)
}
}
})
}
}
func TestListHandler(t *testing.T) {
tests := []struct {
name string
args []string
serverResponse []api.ListModelResponse
expectedError string
expectedOutput string
}{
{
name: "list all models",
args: []string{},
serverResponse: []api.ListModelResponse{
{Name: "model1", Digest: "sha256:abc123", Size: 1024, ModifiedAt: time.Now().Add(-24 * time.Hour)},
{Name: "model2", Digest: "sha256:def456", Size: 2048, ModifiedAt: time.Now().Add(-48 * time.Hour)},
},
expectedOutput: "NAME ID SIZE MODIFIED \n" +
"model1 sha256:abc12 1.0 KB 24 hours ago \n" +
"model2 sha256:def45 2.0 KB 2 days ago \n",
},
{
name: "filter models by prefix",
args: []string{"model1"},
serverResponse: []api.ListModelResponse{
{Name: "model1", Digest: "sha256:abc123", Size: 1024, ModifiedAt: time.Now().Add(-24 * time.Hour)},
{Name: "model2", Digest: "sha256:def456", Size: 2048, ModifiedAt: time.Now().Add(-24 * time.Hour)},
},
expectedOutput: "NAME ID SIZE MODIFIED \n" +
"model1 sha256:abc12 1.0 KB 24 hours ago \n",
},
{
name: "server error",
args: []string{},
expectedError: "server error",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
mockServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.URL.Path != "/api/tags" || r.Method != http.MethodGet {
t.Errorf("unexpected request to %s %s", r.Method, r.URL.Path)
http.Error(w, "not found", http.StatusNotFound)
return
}
if tt.expectedError != "" {
http.Error(w, tt.expectedError, http.StatusInternalServerError)
return
}
response := api.ListResponse{Models: tt.serverResponse}
if err := json.NewEncoder(w).Encode(response); err != nil {
t.Fatal(err)
}
}))
defer mockServer.Close()
t.Setenv("OLLAMA_HOST", mockServer.URL)
cmd := &cobra.Command{}
cmd.SetContext(context.TODO())
// Capture stdout
oldStdout := os.Stdout
r, w, _ := os.Pipe()
os.Stdout = w
err := ListHandler(cmd, tt.args)
// Restore stdout and get output
w.Close()
os.Stdout = oldStdout
output, _ := io.ReadAll(r)
if tt.expectedError == "" {
if err != nil {
t.Errorf("expected no error, got %v", err)
}
if got := string(output); got != tt.expectedOutput {
t.Errorf("expected output:\n%s\ngot:\n%s", tt.expectedOutput, got)
}
} else {
if err == nil || !strings.Contains(err.Error(), tt.expectedError) {
t.Errorf("expected error containing %q, got %v", tt.expectedError, err)
}
}
})
}
}
func TestCreateHandler(t *testing.T) {
tests := []struct {
name string
modelName string
modelFile string
serverResponse map[string]func(w http.ResponseWriter, r *http.Request)
expectedError string
expectedOutput string
}{
{
name: "successful create",
modelName: "test-model",
modelFile: "FROM foo",
serverResponse: map[string]func(w http.ResponseWriter, r *http.Request){
"/api/create": func(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodPost {
t.Errorf("expected POST request, got %s", r.Method)
}
req := api.CreateRequest{}
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
http.Error(w, err.Error(), http.StatusBadRequest)
return
}
if req.Name != "test-model" {
t.Errorf("expected model name 'test-model', got %s", req.Name)
}
if req.From != "foo" {
t.Errorf("expected from 'foo', got %s", req.From)
}
responses := []api.ProgressResponse{
{Status: "using existing layer sha256:56bb8bd477a519ffa694fc449c2413c6f0e1d3b1c88fa7e3c9d88d3ae49d4dcb"},
{Status: "writing manifest"},
{Status: "success"},
}
for _, resp := range responses {
if err := json.NewEncoder(w).Encode(resp); err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
w.(http.Flusher).Flush()
}
},
},
expectedOutput: "",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
mockServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
handler, ok := tt.serverResponse[r.URL.Path]
if !ok {
t.Errorf("unexpected request to %s", r.URL.Path)
http.Error(w, "not found", http.StatusNotFound)
return
}
handler(w, r)
}))
t.Setenv("OLLAMA_HOST", mockServer.URL)
t.Cleanup(mockServer.Close)
tempFile, err := os.CreateTemp("", "modelfile")
if err != nil {
t.Fatal(err)
}
defer os.Remove(tempFile.Name())
if _, err := tempFile.WriteString(tt.modelFile); err != nil {
t.Fatal(err)
}
if err := tempFile.Close(); err != nil {
t.Fatal(err)
}
cmd := &cobra.Command{}
cmd.Flags().String("file", "", "")
if err := cmd.Flags().Set("file", tempFile.Name()); err != nil {
t.Fatal(err)
}
cmd.Flags().Bool("insecure", false, "")
cmd.SetContext(context.TODO())
// Redirect stderr to capture progress output
oldStderr := os.Stderr
r, w, _ := os.Pipe()
os.Stderr = w
// Capture stdout for the "Model pushed" message
oldStdout := os.Stdout
outR, outW, _ := os.Pipe()
os.Stdout = outW
err = CreateHandler(cmd, []string{tt.modelName})
// Restore stderr
w.Close()
os.Stderr = oldStderr
// drain the pipe
if _, err := io.ReadAll(r); err != nil {
t.Fatal(err)
}
// Restore stdout and get output
outW.Close()
os.Stdout = oldStdout
stdout, _ := io.ReadAll(outR)
if tt.expectedError == "" {
if err != nil {
t.Errorf("expected no error, got %v", err)
}
if tt.expectedOutput != "" {
if got := string(stdout); got != tt.expectedOutput {
t.Errorf("expected output %q, got %q", tt.expectedOutput, got)
}
}
}
})
}
}
func TestNewCreateRequest(t *testing.T) {
tests := []struct {
name string
from string
opts runOptions
expected *api.CreateRequest
}{
{
"basic test",
"newmodel",
runOptions{
Model: "mymodel",
ParentModel: "",
Prompt: "You are a fun AI agent",
Messages: []api.Message{},
WordWrap: true,
},
&api.CreateRequest{
From: "mymodel",
Model: "newmodel",
},
},
{
"parent model test",
"newmodel",
runOptions{
Model: "mymodel",
ParentModel: "parentmodel",
Messages: []api.Message{},
WordWrap: true,
},
&api.CreateRequest{
From: "parentmodel",
Model: "newmodel",
},
},
{
"parent model as filepath test",
"newmodel",
runOptions{
Model: "mymodel",
ParentModel: "/some/file/like/etc/passwd",
Messages: []api.Message{},
WordWrap: true,
},
&api.CreateRequest{
From: "mymodel",
Model: "newmodel",
},
},
{
"parent model as windows filepath test",
"newmodel",
runOptions{
Model: "mymodel",
ParentModel: "D:\\some\\file\\like\\etc\\passwd",
Messages: []api.Message{},
WordWrap: true,
},
&api.CreateRequest{
From: "mymodel",
Model: "newmodel",
},
},
{
"options test",
"newmodel",
runOptions{
Model: "mymodel",
ParentModel: "parentmodel",
Options: map[string]any{
"temperature": 1.0,
},
},
&api.CreateRequest{
From: "parentmodel",
Model: "newmodel",
Parameters: map[string]any{
"temperature": 1.0,
},
},
},
{
"messages test",
"newmodel",
runOptions{
Model: "mymodel",
ParentModel: "parentmodel",
System: "You are a fun AI agent",
Messages: []api.Message{
{
Role: "user",
Content: "hello there!",
},
{
Role: "assistant",
Content: "hello to you!",
},
},
WordWrap: true,
},
&api.CreateRequest{
From: "parentmodel",
Model: "newmodel",
System: "You are a fun AI agent",
Messages: []api.Message{
{
Role: "user",
Content: "hello there!",
},
{
Role: "assistant",
Content: "hello to you!",
},
},
},
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
actual := NewCreateRequest(tt.from, tt.opts)
if !cmp.Equal(actual, tt.expected) {
t.Errorf("expected output %#v, got %#v", tt.expected, actual)
}
})
}
}

582
cmd/interactive.go Normal file
View File

@@ -0,0 +1,582 @@
package cmd
import (
"cmp"
"errors"
"fmt"
"io"
"net/http"
"os"
"path/filepath"
"regexp"
"slices"
"strings"
"github.com/spf13/cobra"
"github.com/ollama/ollama/api"
"github.com/ollama/ollama/envconfig"
"github.com/ollama/ollama/readline"
"github.com/ollama/ollama/types/errtypes"
"github.com/ollama/ollama/types/model"
)
type MultilineState int
const (
MultilineNone MultilineState = iota
MultilinePrompt
MultilineSystem
)
func generateInteractive(cmd *cobra.Command, opts runOptions) error {
usage := func() {
fmt.Fprintln(os.Stderr, "Available Commands:")
fmt.Fprintln(os.Stderr, " /set Set session variables")
fmt.Fprintln(os.Stderr, " /show Show model information")
fmt.Fprintln(os.Stderr, " /load <model> Load a session or model")
fmt.Fprintln(os.Stderr, " /save <model> Save your current session")
fmt.Fprintln(os.Stderr, " /clear Clear session context")
fmt.Fprintln(os.Stderr, " /bye Exit")
fmt.Fprintln(os.Stderr, " /?, /help Help for a command")
fmt.Fprintln(os.Stderr, " /? shortcuts Help for keyboard shortcuts")
fmt.Fprintln(os.Stderr, "")
fmt.Fprintln(os.Stderr, "Use \"\"\" to begin a multi-line message.")
if opts.MultiModal {
fmt.Fprintf(os.Stderr, "Use %s to include .jpg or .png images.\n", filepath.FromSlash("/path/to/file"))
}
fmt.Fprintln(os.Stderr, "")
}
usageSet := func() {
fmt.Fprintln(os.Stderr, "Available Commands:")
fmt.Fprintln(os.Stderr, " /set parameter ... Set a parameter")
fmt.Fprintln(os.Stderr, " /set system <string> Set system message")
fmt.Fprintln(os.Stderr, " /set history Enable history")
fmt.Fprintln(os.Stderr, " /set nohistory Disable history")
fmt.Fprintln(os.Stderr, " /set wordwrap Enable wordwrap")
fmt.Fprintln(os.Stderr, " /set nowordwrap Disable wordwrap")
fmt.Fprintln(os.Stderr, " /set format json Enable JSON mode")
fmt.Fprintln(os.Stderr, " /set noformat Disable formatting")
fmt.Fprintln(os.Stderr, " /set verbose Show LLM stats")
fmt.Fprintln(os.Stderr, " /set quiet Disable LLM stats")
fmt.Fprintln(os.Stderr, "")
}
usageShortcuts := func() {
fmt.Fprintln(os.Stderr, "Available keyboard shortcuts:")
fmt.Fprintln(os.Stderr, " Ctrl + a Move to the beginning of the line (Home)")
fmt.Fprintln(os.Stderr, " Ctrl + e Move to the end of the line (End)")
fmt.Fprintln(os.Stderr, " Alt + b Move back (left) one word")
fmt.Fprintln(os.Stderr, " Alt + f Move forward (right) one word")
fmt.Fprintln(os.Stderr, " Ctrl + k Delete the sentence after the cursor")
fmt.Fprintln(os.Stderr, " Ctrl + u Delete the sentence before the cursor")
fmt.Fprintln(os.Stderr, " Ctrl + w Delete the word before the cursor")
fmt.Fprintln(os.Stderr, "")
fmt.Fprintln(os.Stderr, " Ctrl + l Clear the screen")
fmt.Fprintln(os.Stderr, " Ctrl + c Stop the model from responding")
fmt.Fprintln(os.Stderr, " Ctrl + d Exit ollama (/bye)")
fmt.Fprintln(os.Stderr, "")
}
usageShow := func() {
fmt.Fprintln(os.Stderr, "Available Commands:")
fmt.Fprintln(os.Stderr, " /show info Show details for this model")
fmt.Fprintln(os.Stderr, " /show license Show model license")
fmt.Fprintln(os.Stderr, " /show modelfile Show Modelfile for this model")
fmt.Fprintln(os.Stderr, " /show parameters Show parameters for this model")
fmt.Fprintln(os.Stderr, " /show system Show system message")
fmt.Fprintln(os.Stderr, " /show template Show prompt template")
fmt.Fprintln(os.Stderr, "")
}
// only list out the most common parameters
usageParameters := func() {
fmt.Fprintln(os.Stderr, "Available Parameters:")
fmt.Fprintln(os.Stderr, " /set parameter seed <int> Random number seed")
fmt.Fprintln(os.Stderr, " /set parameter num_predict <int> Max number of tokens to predict")
fmt.Fprintln(os.Stderr, " /set parameter top_k <int> Pick from top k num of tokens")
fmt.Fprintln(os.Stderr, " /set parameter top_p <float> Pick token based on sum of probabilities")
fmt.Fprintln(os.Stderr, " /set parameter min_p <float> Pick token based on top token probability * min_p")
fmt.Fprintln(os.Stderr, " /set parameter num_ctx <int> Set the context size")
fmt.Fprintln(os.Stderr, " /set parameter temperature <float> Set creativity level")
fmt.Fprintln(os.Stderr, " /set parameter repeat_penalty <float> How strongly to penalize repetitions")
fmt.Fprintln(os.Stderr, " /set parameter repeat_last_n <int> Set how far back to look for repetitions")
fmt.Fprintln(os.Stderr, " /set parameter num_gpu <int> The number of layers to send to the GPU")
fmt.Fprintln(os.Stderr, " /set parameter stop <string> <string> ... Set the stop parameters")
fmt.Fprintln(os.Stderr, "")
}
scanner, err := readline.New(readline.Prompt{
Prompt: ">>> ",
AltPrompt: "... ",
Placeholder: "Send a message (/? for help)",
AltPlaceholder: `Use """ to end multi-line input`,
})
if err != nil {
return err
}
if envconfig.NoHistory() {
scanner.HistoryDisable()
}
fmt.Print(readline.StartBracketedPaste)
defer fmt.Printf(readline.EndBracketedPaste)
var sb strings.Builder
var multiline MultilineState
for {
line, err := scanner.Readline()
switch {
case errors.Is(err, io.EOF):
fmt.Println()
return nil
case errors.Is(err, readline.ErrInterrupt):
if line == "" {
fmt.Println("\nUse Ctrl + d or /bye to exit.")
}
scanner.Prompt.UseAlt = false
sb.Reset()
continue
case err != nil:
return err
}
switch {
case multiline != MultilineNone:
// check if there's a multiline terminating string
before, ok := strings.CutSuffix(line, `"""`)
sb.WriteString(before)
if !ok {
fmt.Fprintln(&sb)
continue
}
switch multiline {
case MultilineSystem:
opts.System = sb.String()
opts.Messages = append(opts.Messages, api.Message{Role: "system", Content: opts.System})
fmt.Println("Set system message.")
sb.Reset()
}
multiline = MultilineNone
scanner.Prompt.UseAlt = false
case strings.HasPrefix(line, `"""`):
line := strings.TrimPrefix(line, `"""`)
line, ok := strings.CutSuffix(line, `"""`)
sb.WriteString(line)
if !ok {
// no multiline terminating string; need more input
fmt.Fprintln(&sb)
multiline = MultilinePrompt
scanner.Prompt.UseAlt = true
}
case scanner.Pasting:
fmt.Fprintln(&sb, line)
continue
case strings.HasPrefix(line, "/list"):
args := strings.Fields(line)
if err := ListHandler(cmd, args[1:]); err != nil {
return err
}
case strings.HasPrefix(line, "/load"):
args := strings.Fields(line)
if len(args) != 2 {
fmt.Println("Usage:\n /load <modelname>")
continue
}
opts.Model = args[1]
opts.Messages = []api.Message{}
fmt.Printf("Loading model '%s'\n", opts.Model)
if err := loadOrUnloadModel(cmd, &opts); err != nil {
if strings.Contains(err.Error(), "not found") {
fmt.Printf("error: %v\n", err)
continue
}
return err
}
continue
case strings.HasPrefix(line, "/save"):
args := strings.Fields(line)
if len(args) != 2 {
fmt.Println("Usage:\n /save <modelname>")
continue
}
client, err := api.ClientFromEnvironment()
if err != nil {
fmt.Println("error: couldn't connect to ollama server")
return err
}
req := NewCreateRequest(args[1], opts)
fn := func(resp api.ProgressResponse) error { return nil }
err = client.Create(cmd.Context(), req, fn)
if err != nil {
if strings.Contains(err.Error(), errtypes.InvalidModelNameErrMsg) {
fmt.Printf("error: The model name '%s' is invalid\n", args[1])
continue
}
return err
}
fmt.Printf("Created new model '%s'\n", args[1])
continue
case strings.HasPrefix(line, "/clear"):
opts.Messages = []api.Message{}
if opts.System != "" {
newMessage := api.Message{Role: "system", Content: opts.System}
opts.Messages = append(opts.Messages, newMessage)
}
fmt.Println("Cleared session context")
continue
case strings.HasPrefix(line, "/set"):
args := strings.Fields(line)
if len(args) > 1 {
switch args[1] {
case "history":
scanner.HistoryEnable()
case "nohistory":
scanner.HistoryDisable()
case "wordwrap":
opts.WordWrap = true
fmt.Println("Set 'wordwrap' mode.")
case "nowordwrap":
opts.WordWrap = false
fmt.Println("Set 'nowordwrap' mode.")
case "verbose":
if err := cmd.Flags().Set("verbose", "true"); err != nil {
return err
}
fmt.Println("Set 'verbose' mode.")
case "quiet":
if err := cmd.Flags().Set("verbose", "false"); err != nil {
return err
}
fmt.Println("Set 'quiet' mode.")
case "format":
if len(args) < 3 || args[2] != "json" {
fmt.Println("Invalid or missing format. For 'json' mode use '/set format json'")
} else {
opts.Format = args[2]
fmt.Printf("Set format to '%s' mode.\n", args[2])
}
case "noformat":
opts.Format = ""
fmt.Println("Disabled format.")
case "parameter":
if len(args) < 4 {
usageParameters()
continue
}
params := args[3:]
fp, err := api.FormatParams(map[string][]string{args[2]: params})
if err != nil {
fmt.Printf("Couldn't set parameter: %q\n", err)
continue
}
fmt.Printf("Set parameter '%s' to '%s'\n", args[2], strings.Join(params, ", "))
opts.Options[args[2]] = fp[args[2]]
case "system":
if len(args) < 3 {
usageSet()
continue
}
multiline = MultilineSystem
line := strings.Join(args[2:], " ")
line, ok := strings.CutPrefix(line, `"""`)
if !ok {
multiline = MultilineNone
} else {
// only cut suffix if the line is multiline
line, ok = strings.CutSuffix(line, `"""`)
if ok {
multiline = MultilineNone
}
}
sb.WriteString(line)
if multiline != MultilineNone {
scanner.Prompt.UseAlt = true
continue
}
opts.System = sb.String() // for display in modelfile
newMessage := api.Message{Role: "system", Content: sb.String()}
// Check if the slice is not empty and the last message is from 'system'
if len(opts.Messages) > 0 && opts.Messages[len(opts.Messages)-1].Role == "system" {
// Replace the last message
opts.Messages[len(opts.Messages)-1] = newMessage
} else {
opts.Messages = append(opts.Messages, newMessage)
}
fmt.Println("Set system message.")
sb.Reset()
continue
default:
fmt.Printf("Unknown command '/set %s'. Type /? for help\n", args[1])
}
} else {
usageSet()
}
case strings.HasPrefix(line, "/show"):
args := strings.Fields(line)
if len(args) > 1 {
client, err := api.ClientFromEnvironment()
if err != nil {
fmt.Println("error: couldn't connect to ollama server")
return err
}
req := &api.ShowRequest{
Name: opts.Model,
System: opts.System,
Options: opts.Options,
}
resp, err := client.Show(cmd.Context(), req)
if err != nil {
fmt.Println("error: couldn't get model")
return err
}
switch args[1] {
case "info":
_ = showInfo(resp, false, os.Stderr)
case "license":
if resp.License == "" {
fmt.Println("No license was specified for this model.")
} else {
fmt.Println(resp.License)
}
case "modelfile":
fmt.Println(resp.Modelfile)
case "parameters":
if resp.Parameters == "" {
fmt.Println("No parameters were specified for this model.")
} else {
if len(opts.Options) > 0 {
fmt.Println("User defined parameters:")
for k, v := range opts.Options {
fmt.Printf("%-*s %v\n", 30, k, v)
}
fmt.Println()
}
fmt.Println("Model defined parameters:")
fmt.Println(resp.Parameters)
}
case "system":
switch {
case opts.System != "":
fmt.Println(opts.System + "\n")
case resp.System != "":
fmt.Println(resp.System + "\n")
default:
fmt.Println("No system message was specified for this model.")
}
case "template":
if resp.Template != "" {
fmt.Println(resp.Template)
} else {
fmt.Println("No prompt template was specified for this model.")
}
default:
fmt.Printf("Unknown command '/show %s'. Type /? for help\n", args[1])
}
} else {
usageShow()
}
case strings.HasPrefix(line, "/help"), strings.HasPrefix(line, "/?"):
args := strings.Fields(line)
if len(args) > 1 {
switch args[1] {
case "set", "/set":
usageSet()
case "show", "/show":
usageShow()
case "shortcut", "shortcuts":
usageShortcuts()
}
} else {
usage()
}
case strings.HasPrefix(line, "/exit"), strings.HasPrefix(line, "/bye"):
return nil
case strings.HasPrefix(line, "/"):
args := strings.Fields(line)
isFile := false
if opts.MultiModal {
for _, f := range extractFileNames(line) {
if strings.HasPrefix(f, args[0]) {
isFile = true
break
}
}
}
if !isFile {
fmt.Printf("Unknown command '%s'. Type /? for help\n", args[0])
continue
}
sb.WriteString(line)
default:
sb.WriteString(line)
}
if sb.Len() > 0 && multiline == MultilineNone {
newMessage := api.Message{Role: "user", Content: sb.String()}
if opts.MultiModal {
msg, images, err := extractFileData(sb.String())
if err != nil {
return err
}
newMessage.Content = msg
newMessage.Images = images
}
opts.Messages = append(opts.Messages, newMessage)
assistant, err := chat(cmd, opts)
if err != nil {
return err
}
if assistant != nil {
opts.Messages = append(opts.Messages, *assistant)
}
sb.Reset()
}
}
}
func NewCreateRequest(name string, opts runOptions) *api.CreateRequest {
parentModel := opts.ParentModel
modelName := model.ParseName(parentModel)
if !modelName.IsValid() {
parentModel = ""
}
req := &api.CreateRequest{
Model: name,
From: cmp.Or(parentModel, opts.Model),
}
if opts.System != "" {
req.System = opts.System
}
if len(opts.Options) > 0 {
req.Parameters = opts.Options
}
if len(opts.Messages) > 0 {
req.Messages = opts.Messages
}
return req
}
func normalizeFilePath(fp string) string {
return strings.NewReplacer(
"\\ ", " ", // Escaped space
"\\(", "(", // Escaped left parenthesis
"\\)", ")", // Escaped right parenthesis
"\\[", "[", // Escaped left square bracket
"\\]", "]", // Escaped right square bracket
"\\{", "{", // Escaped left curly brace
"\\}", "}", // Escaped right curly brace
"\\$", "$", // Escaped dollar sign
"\\&", "&", // Escaped ampersand
"\\;", ";", // Escaped semicolon
"\\'", "'", // Escaped single quote
"\\\\", "\\", // Escaped backslash
"\\*", "*", // Escaped asterisk
"\\?", "?", // Escaped question mark
"\\~", "~", // Escaped tilde
).Replace(fp)
}
func extractFileNames(input string) []string {
// Regex to match file paths starting with optional drive letter, / ./ \ or .\ and include escaped or unescaped spaces (\ or %20)
// and followed by more characters and a file extension
// This will capture non filename strings, but we'll check for file existence to remove mismatches
regexPattern := `(?:[a-zA-Z]:)?(?:\./|/|\\)[\S\\ ]+?\.(?i:jpg|jpeg|png)\b`
re := regexp.MustCompile(regexPattern)
return re.FindAllString(input, -1)
}
func extractFileData(input string) (string, []api.ImageData, error) {
filePaths := extractFileNames(input)
var imgs []api.ImageData
for _, fp := range filePaths {
nfp := normalizeFilePath(fp)
data, err := getImageData(nfp)
if errors.Is(err, os.ErrNotExist) {
continue
} else if err != nil {
fmt.Fprintf(os.Stderr, "Couldn't process image: %q\n", err)
return "", imgs, err
}
fmt.Fprintf(os.Stderr, "Added image '%s'\n", nfp)
input = strings.ReplaceAll(input, fp, "")
imgs = append(imgs, data)
}
return strings.TrimSpace(input), imgs, nil
}
func getImageData(filePath string) ([]byte, error) {
file, err := os.Open(filePath)
if err != nil {
return nil, err
}
defer file.Close()
buf := make([]byte, 512)
_, err = file.Read(buf)
if err != nil {
return nil, err
}
contentType := http.DetectContentType(buf)
allowedTypes := []string{"image/jpeg", "image/jpg", "image/png"}
if !slices.Contains(allowedTypes, contentType) {
return nil, fmt.Errorf("invalid image type: %s", contentType)
}
info, err := file.Stat()
if err != nil {
return nil, err
}
// Check if the file size exceeds 100MB
var maxSize int64 = 100 * 1024 * 1024 // 100MB in bytes
if info.Size() > maxSize {
return nil, errors.New("file size exceeds maximum limit (100MB)")
}
buf = make([]byte, info.Size())
_, err = file.Seek(0, 0)
if err != nil {
return nil, err
}
_, err = io.ReadFull(file, buf)
if err != nil {
return nil, err
}
return buf, nil
}

52
cmd/interactive_test.go Normal file
View File

@@ -0,0 +1,52 @@
package cmd
import (
"testing"
"github.com/stretchr/testify/assert"
)
func TestExtractFilenames(t *testing.T) {
// Unix style paths
input := ` some preamble
./relative\ path/one.png inbetween1 ./not a valid two.jpg inbetween2 ./1.svg
/unescaped space /three.jpeg inbetween3 /valid\ path/dir/four.png "./quoted with spaces/five.JPG`
res := extractFileNames(input)
assert.Len(t, res, 5)
assert.Contains(t, res[0], "one.png")
assert.Contains(t, res[1], "two.jpg")
assert.Contains(t, res[2], "three.jpeg")
assert.Contains(t, res[3], "four.png")
assert.Contains(t, res[4], "five.JPG")
assert.NotContains(t, res[4], '"')
assert.NotContains(t, res, "inbetween1")
assert.NotContains(t, res, "./1.svg")
// Windows style paths
input = ` some preamble
c:/users/jdoe/one.png inbetween1 c:/program files/someplace/two.jpg inbetween2
/absolute/nospace/three.jpeg inbetween3 /absolute/with space/four.png inbetween4
./relative\ path/five.JPG inbetween5 "./relative with/spaces/six.png inbetween6
d:\path with\spaces\seven.JPEG inbetween7 c:\users\jdoe\eight.png inbetween8
d:\program files\someplace\nine.png inbetween9 "E:\program files\someplace\ten.PNG some ending
`
res = extractFileNames(input)
assert.Len(t, res, 10)
assert.NotContains(t, res, "inbetween2")
assert.Contains(t, res[0], "one.png")
assert.Contains(t, res[0], "c:")
assert.Contains(t, res[1], "two.jpg")
assert.Contains(t, res[1], "c:")
assert.Contains(t, res[2], "three.jpeg")
assert.Contains(t, res[3], "four.png")
assert.Contains(t, res[4], "five.JPG")
assert.Contains(t, res[5], "six.png")
assert.Contains(t, res[6], "seven.JPEG")
assert.Contains(t, res[6], "d:")
assert.Contains(t, res[7], "eight.png")
assert.Contains(t, res[7], "c:")
assert.Contains(t, res[8], "nine.png")
assert.Contains(t, res[8], "d:")
assert.Contains(t, res[9], "ten.PNG")
assert.Contains(t, res[9], "E:")
}

15
cmd/runner/main.go Normal file
View File

@@ -0,0 +1,15 @@
package main
import (
"fmt"
"os"
"github.com/ollama/ollama/runner"
)
func main() {
if err := runner.Execute(os.Args[1:]); err != nil {
fmt.Fprintf(os.Stderr, "error: %s\n", err)
os.Exit(1)
}
}

View File

@@ -1,44 +0,0 @@
package cmd
import (
"fmt"
"os"
"time"
"github.com/jmorganca/ollama/progressbar"
)
type Spinner struct {
description string
*progressbar.ProgressBar
}
func NewSpinner(description string) *Spinner {
return &Spinner{
description: description,
ProgressBar: progressbar.NewOptions(-1,
progressbar.OptionSetWriter(os.Stderr),
progressbar.OptionThrottle(60*time.Millisecond),
progressbar.OptionSpinnerType(14),
progressbar.OptionSetRenderBlankState(true),
progressbar.OptionSetElapsedTime(false),
progressbar.OptionClearOnFinish(),
progressbar.OptionSetDescription(description),
),
}
}
func (s *Spinner) Spin(tick time.Duration) {
for range time.Tick(tick) {
if s.IsFinished() {
break
}
s.Add(1)
}
}
func (s *Spinner) Stop() {
s.Finish()
fmt.Println(s.description)
}

27
cmd/start.go Normal file
View File

@@ -0,0 +1,27 @@
//go:build darwin || windows
package cmd
import (
"context"
"errors"
"time"
"github.com/ollama/ollama/api"
)
func waitForServer(ctx context.Context, client *api.Client) error {
// wait for the server to start
timeout := time.After(5 * time.Second)
tick := time.Tick(500 * time.Millisecond)
for {
select {
case <-timeout:
return errors.New("timed out waiting for server to start")
case <-tick:
if err := client.Heartbeat(ctx); err == nil {
return nil // server has started
}
}
}
}

30
cmd/start_darwin.go Normal file
View File

@@ -0,0 +1,30 @@
package cmd
import (
"context"
"errors"
"os"
"os/exec"
"strings"
"github.com/ollama/ollama/api"
)
func startApp(ctx context.Context, client *api.Client) error {
exe, err := os.Executable()
if err != nil {
return err
}
link, err := os.Readlink(exe)
if err != nil {
return err
}
if !strings.Contains(link, "Ollama.app") {
return errors.New("could not find ollama app")
}
path := strings.Split(link, "Ollama.app")
if err := exec.Command("/usr/bin/open", "-a", path[0]+"Ollama.app").Run(); err != nil {
return err
}
return waitForServer(ctx, client)
}

14
cmd/start_default.go Normal file
View File

@@ -0,0 +1,14 @@
//go:build !windows && !darwin
package cmd
import (
"context"
"errors"
"github.com/ollama/ollama/api"
)
func startApp(ctx context.Context, client *api.Client) error {
return errors.New("could not connect to ollama server, run 'ollama serve' to start it")
}

58
cmd/start_windows.go Normal file
View File

@@ -0,0 +1,58 @@
package cmd
import (
"context"
"errors"
"fmt"
"os"
"os/exec"
"path/filepath"
"strings"
"syscall"
"github.com/ollama/ollama/api"
)
func startApp(ctx context.Context, client *api.Client) error {
// log.Printf("XXX Attempting to find and start ollama app")
AppName := "ollama app.exe"
exe, err := os.Executable()
if err != nil {
return err
}
appExe := filepath.Join(filepath.Dir(exe), AppName)
_, err = os.Stat(appExe)
if errors.Is(err, os.ErrNotExist) {
// Try the standard install location
localAppData := os.Getenv("LOCALAPPDATA")
appExe = filepath.Join(localAppData, "Ollama", AppName)
_, err := os.Stat(appExe)
if errors.Is(err, os.ErrNotExist) {
// Finally look in the path
appExe, err = exec.LookPath(AppName)
if err != nil {
return errors.New("could not locate ollama app")
}
}
}
// log.Printf("XXX attempting to start app %s", appExe)
cmd_path := "c:\\Windows\\system32\\cmd.exe"
cmd := exec.Command(cmd_path, "/c", appExe)
// TODO - these hide flags aren't working - still pops up a command window for some reason
cmd.SysProcAttr = &syscall.SysProcAttr{CreationFlags: 0x08000000, HideWindow: true}
// TODO this didn't help either...
cmd.Stdin = strings.NewReader("")
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
if err := cmd.Start(); err != nil {
return fmt.Errorf("unable to start ollama app %w", err)
}
if cmd.Process != nil {
defer cmd.Process.Release() //nolint:errcheck
}
return waitForServer(ctx, client)
}

251
convert/convert.go Normal file
View File

@@ -0,0 +1,251 @@
package convert
import (
"encoding/json"
"errors"
"fmt"
"io"
"io/fs"
"log/slog"
"slices"
"strings"
"github.com/ollama/ollama/fs/ggml"
)
type ModelParameters struct {
Architectures []string `json:"architectures"`
VocabSize uint32 `json:"vocab_size"`
TextModel TextParameters `json:"text_config"`
}
type TextParameters struct {
VocabSize uint32 `json:"vocab_size"`
}
type AdapterParameters struct {
Alpha uint32 `json:"lora_alpha"`
LoraLayers uint32 `json:"lora_layers"`
LoraParameters struct {
Rank uint32 `json:"rank"`
Alpha float32 `json:"alpha"`
Scale float32 `json:"scale"`
} `json:"lora_parameters"`
}
func (ModelParameters) KV(t *Tokenizer) ggml.KV {
kv := ggml.KV{
"general.file_type": uint32(1),
"general.quantization_version": uint32(2),
"tokenizer.ggml.pre": t.Pre,
"tokenizer.ggml.model": t.Vocabulary.Model,
"tokenizer.ggml.tokens": t.Vocabulary.Tokens,
"tokenizer.ggml.scores": t.Vocabulary.Scores,
"tokenizer.ggml.token_type": t.Vocabulary.Types,
}
if len(t.Merges) > 0 {
kv["tokenizer.ggml.merges"] = t.Merges
}
if t.Template != "" {
kv["tokenizer.chat_template"] = t.Template
}
for _, sv := range t.SpecialVocabulary {
kv[fmt.Sprintf("tokenizer.ggml.%s_token_id", sv.Key())] = uint32(sv.ID)
kv[fmt.Sprintf("tokenizer.ggml.add_%s_token", sv.Key())] = sv.AddToken
}
return kv
}
func (p AdapterParameters) KV() ggml.KV {
var alpha float32
if p.LoraParameters.Alpha == 0 {
alpha = float32(p.Alpha)
} else {
alpha = p.LoraParameters.Alpha
}
kv := ggml.KV{
"adapter.lora.alpha": alpha,
"adapter.type": "lora",
"general.file_type": uint32(1),
"general.type": "adapter",
"general.version": "v0.2",
}
return kv
}
func (ModelParameters) specialTokenTypes() []string {
return []string{
"bos", "eos", "unk", "sep", "pad", "cls", "mask",
}
}
type ModelConverter interface {
// KV maps parameters to LLM key-values
KV(*Tokenizer) ggml.KV
// Tensors maps input tensors to LLM tensors. Model specific modifications can be done here.
Tensors([]Tensor) []ggml.Tensor
// Replacements returns a list of string pairs to replace in tensor names.
// See [strings.Replacer](https://pkg.go.dev/strings#Replacer) for details
Replacements() []string
// specialTokenTypes returns any special token types the model uses
specialTokenTypes() []string
}
type moreParser interface {
parseMore(fs.FS) error
}
type AdapterConverter interface {
// KV maps parameters to LLM key-values
KV(ggml.KV) ggml.KV
// Tensors maps input tensors to LLM tensors. Adapter specific modifications can be done here.
Tensors([]Tensor) []ggml.Tensor
// Replacements returns a list of string pairs to replace in tensor names.
// See [strings.Replacer](https://pkg.go.dev/strings#Replacer) for details
Replacements() []string
}
func ConvertAdapter(fsys fs.FS, ws io.WriteSeeker, baseKV ggml.KV) error {
bts, err := fs.ReadFile(fsys, "adapter_config.json")
if err != nil {
return err
}
var p AdapterParameters
if err := json.Unmarshal(bts, &p); err != nil {
return err
}
arch, ok := baseKV["general.architecture"]
if !ok {
return errors.New("architecture not set for the base model")
}
var conv AdapterConverter
switch arch {
case "llama":
conv = &llamaAdapter{}
case "gemma2":
conv = &gemma2Adapter{}
default:
return errors.New("unsupported architecture")
}
ts, err := parseTensors(fsys, strings.NewReplacer(conv.Replacements()...))
if err != nil {
return err
}
if err := json.Unmarshal(bts, conv); err != nil {
return err
}
return writeFile(ws, conv.KV(baseKV), conv.Tensors(ts))
}
// Convert writes an Ollama compatible model to the provided io.WriteSeeker based on configurations
// and files it finds in the input path.
// Supported input model formats include safetensors.
// Supported input tokenizers files include tokenizer.json (preferred) and tokenizer.model.
func ConvertModel(fsys fs.FS, ws io.WriteSeeker) error {
bts, err := fs.ReadFile(fsys, "config.json")
if err != nil {
return err
}
var p ModelParameters
if err := json.Unmarshal(bts, &p); err != nil {
return err
}
if len(p.Architectures) < 1 {
return errors.New("unknown architecture")
}
var conv ModelConverter
switch p.Architectures[0] {
case "LlamaForCausalLM":
conv = &llamaModel{}
case "Llama4ForConditionalGeneration":
conv = &llama4Model{}
case "Mistral3ForConditionalGeneration":
conv = &mistral3Model{}
case "MixtralForCausalLM":
conv = &mixtralModel{}
case "GemmaForCausalLM":
conv = &gemmaModel{}
case "Gemma2ForCausalLM":
conv = &gemma2Model{}
case "Gemma3ForCausalLM", "Gemma3ForConditionalGeneration":
conv = &gemma3Model{Architecture: p.Architectures[0]}
case "Phi3ForCausalLM":
conv = &phi3Model{}
case "Qwen2ForCausalLM":
conv = &qwen2Model{}
case "BertModel":
conv = &bertModel{}
case "CohereForCausalLM":
conv = &commandrModel{}
default:
return fmt.Errorf("unsupported architecture %q", p.Architectures[0])
}
if err := json.Unmarshal(bts, conv); err != nil {
return err
}
if t, ok := conv.(moreParser); ok {
if err := t.parseMore(fsys); err != nil {
return err
}
}
t, err := parseTokenizer(fsys, conv.specialTokenTypes())
if err != nil {
return err
}
vocabSize := int(p.VocabSize)
if vocabSize == 0 {
tVocabSize := int(p.TextModel.VocabSize)
vocabSize = tVocabSize
}
switch {
case vocabSize == 0:
slog.Warn("vocabulary size was not explicitly set by the model", "default size", len(t.Vocabulary.Tokens))
case vocabSize > len(t.Vocabulary.Tokens):
slog.Warn("vocabulary is smaller than expected, padding with dummy tokens", "expect", vocabSize, "actual", len(t.Vocabulary.Tokens))
for i := range vocabSize - len(t.Vocabulary.Tokens) {
t.Vocabulary.Tokens = append(t.Vocabulary.Tokens, fmt.Sprintf("[PAD%d]", i))
t.Vocabulary.Scores = append(t.Vocabulary.Scores, -1)
t.Vocabulary.Types = append(t.Vocabulary.Types, tokenTypeUserDefined)
}
case vocabSize < len(t.Vocabulary.Tokens):
return fmt.Errorf("vocabulary is larger than expected '%d' instead of '%d'", len(t.Vocabulary.Tokens), vocabSize)
default:
slog.Debug("vocabulary", "size", len(t.Vocabulary.Tokens))
}
ts, err := parseTensors(fsys, strings.NewReplacer(conv.Replacements()...))
if err != nil {
return err
}
return writeFile(ws, conv.KV(t), conv.Tensors(ts))
}
func writeFile(ws io.WriteSeeker, kv ggml.KV, ts []ggml.Tensor) error {
for i := range ts {
ts[i].Shape = slices.Clone(ts[i].Shape)
slices.Reverse(ts[i].Shape)
}
return ggml.WriteGGUF(ws, kv, ts)
}

174
convert/convert_bert.go Normal file
View File

@@ -0,0 +1,174 @@
package convert
import (
"cmp"
"encoding/json"
"io/fs"
"path/filepath"
"slices"
"strings"
"github.com/ollama/ollama/fs/ggml"
)
type bertModel struct {
ModelParameters
NLayers uint32 `json:"n_layers"`
NumHiddenLayers uint32 `json:"num_hidden_layers"`
NLayer uint32 `json:"n_layer"`
MaxPositionEmbeddings uint32 `json:"max_position_embeddings"`
NCtx uint32 `json:"n_ctx"`
HiddenSize uint32 `json:"hidden_size"`
NEmbd uint32 `json:"n_embd"`
IntermediateSize uint32 `json:"intermediate_size"`
NInner uint32 `json:"n_inner"`
NumAttentionHeads uint32 `json:"num_attention_heads"`
NHead uint32 `json:"n_head"`
NumKeyValueHeads uint32 `json:"num_key_value_heads"`
LayerNormEPS float32 `json:"layer_norm_eps"`
LayerNormEpsilon float32 `json:"layer_norm_epsilon"`
NormEpsilon float32 `json:"norm_epsilon"`
PoolingType uint32
}
var (
_ ModelConverter = (*bertModel)(nil)
_ moreParser = (*bertModel)(nil)
)
func (p *bertModel) parseMore(fsys fs.FS) error {
bts, err := fs.ReadFile(fsys, "modules.json")
if err != nil {
return err
}
var modules []struct {
Type string `json:"type"`
Path string `json:"path"`
}
if err := json.Unmarshal(bts, &modules); err != nil {
return err
}
var pooling string
for _, m := range modules {
if m.Type == "sentence_transformers.models.Pooling" {
pooling = m.Path
break
}
}
if pooling != "" {
bts, err := fs.ReadFile(fsys, filepath.Join(pooling, "config.json"))
if err != nil {
return err
}
var pc struct {
PoolingModeCLSToken bool `json:"pooling_mode_cls_token"`
PoolingModeMeanTokens bool `json:"pooling_mode_mean_tokens"`
}
if err := json.Unmarshal(bts, &pc); err != nil {
return err
}
if pc.PoolingModeMeanTokens {
p.PoolingType = 1
} else if pc.PoolingModeCLSToken {
p.PoolingType = 2
}
}
return nil
}
func (p *bertModel) KV(t *Tokenizer) ggml.KV {
kv := p.ModelParameters.KV(t)
kv["general.architecture"] = "bert"
kv["bert.attention.causal"] = false
kv["bert.pooling_type"] = p.PoolingType
kv["bert.block_count"] = cmp.Or(p.NLayers, p.NumHiddenLayers, p.NLayer)
if contextLength := cmp.Or(p.MaxPositionEmbeddings, p.NCtx); contextLength > 0 {
kv["bert.context_length"] = contextLength
}
if embeddingLength := cmp.Or(p.HiddenSize, p.NEmbd); embeddingLength > 0 {
kv["bert.embedding_length"] = cmp.Or(p.HiddenSize, p.NEmbd)
}
if feedForwardLength := cmp.Or(p.IntermediateSize, p.NInner); feedForwardLength > 0 {
kv["bert.feed_forward_length"] = cmp.Or(p.IntermediateSize, p.NInner)
}
if headCount := cmp.Or(p.NumAttentionHeads, p.NHead); headCount > 0 {
kv["bert.attention.head_count"] = cmp.Or(p.NumAttentionHeads, p.NHead)
}
if layerNormEpsilon := cmp.Or(p.LayerNormEPS, p.LayerNormEpsilon, p.NormEpsilon); layerNormEpsilon > 0 {
kv["bert.attention.layer_norm_epsilon"] = layerNormEpsilon
}
kv["tokenizer.ggml.model"] = "bert"
kv["tokenizer.ggml.token_type_count"] = uint32(2)
// convert to phantom space tokens
for i, e := range t.Tokens {
if strings.HasPrefix(e, "[") && strings.HasSuffix(e, "]") {
// noop
} else if strings.HasPrefix(e, "##") {
t.Tokens[i] = e[2:]
} else {
t.Tokens[i] = "\u2581" + e
}
}
kv["tokenizer.ggml.tokens"] = t.Tokens
return kv
}
func (p *bertModel) Tensors(ts []Tensor) []ggml.Tensor {
var out []ggml.Tensor
for _, t := range ts {
if slices.Contains([]string{
"embeddings.position_ids",
"pooler.dense.weight",
"pooler.dense.bias",
}, t.Name()) {
continue
}
out = append(out, ggml.Tensor{
Name: t.Name(),
Kind: t.Kind(),
Shape: t.Shape(),
WriterTo: t,
})
}
return out
}
func (bertModel) Replacements() []string {
return []string{
"encoder.layer", "blk",
"encoder.layers", "blk",
"embeddings.word_embeddings", "token_embd",
"embeddings.token_type_embeddings", "token_types",
"embeddings.LayerNorm", "token_embd_norm",
"embeddings.position_embeddings", "position_embd",
"attention.self.query", "attn_q",
"attention.self.key", "attn_k",
"attention.self.value", "attn_v",
"attention.output.dense", "attn_output",
"attention.output.LayerNorm", "attn_output_norm",
"intermediate.dense", "ffn_up",
"output.dense", "ffn_down",
"output.LayerNorm", "layer_output_norm",
}
}

View File

@@ -0,0 +1,76 @@
package convert
import (
"cmp"
"github.com/ollama/ollama/fs/ggml"
)
type commandrModel struct {
ModelParameters
MaxPositionEmbeddings uint32 `json:"max_position_embeddings"`
HiddenSize uint32 `json:"hidden_size"`
HiddenLayers uint32 `json:"num_hidden_layers"`
IntermediateSize uint32 `json:"intermediate_size"`
NumAttentionHeads uint32 `json:"num_attention_heads"`
NumKeyValueHeads uint32 `json:"num_key_value_heads"`
LayerNormEPS float32 `json:"layer_norm_eps"`
RopeTheta float32 `json:"rope_theta"`
UseQKNorm bool `json:"use_qk_norm"`
MaxLength uint32 `json:"model_max_length"`
LogitScale float32 `json:"logit_scale"`
NCtx uint32 `json:"n_ctx"`
}
var _ ModelConverter = (*commandrModel)(nil)
func (p *commandrModel) KV(t *Tokenizer) ggml.KV {
kv := p.ModelParameters.KV(t)
kv["general.architecture"] = "command-r"
kv["general.name"] = "command-r"
kv["command-r.context_length"] = cmp.Or(p.MaxLength, p.MaxPositionEmbeddings, p.NCtx)
kv["command-r.embedding_length"] = p.HiddenSize
kv["command-r.block_count"] = p.HiddenLayers
kv["command-r.feed_forward_length"] = p.IntermediateSize
kv["command-r.attention.head_count"] = p.NumAttentionHeads
kv["command-r.attention.head_count_kv"] = p.NumKeyValueHeads
kv["command-r.attention.layer_norm_epsilon"] = p.LayerNormEPS
kv["command-r.rope.freq_base"] = p.RopeTheta
kv["command-r.max_position_embeddings"] = cmp.Or(p.MaxLength, p.MaxPositionEmbeddings)
kv["command-r.logit_scale"] = p.LogitScale
kv["command-r.rope.scaling.type"] = "none"
return kv
}
func (p *commandrModel) Tensors(ts []Tensor) []ggml.Tensor {
var out []ggml.Tensor
for _, t := range ts {
out = append(out, ggml.Tensor{
Name: t.Name(),
Kind: t.Kind(),
Shape: t.Shape(),
WriterTo: t,
})
}
return out
}
func (p *commandrModel) Replacements() []string {
return []string{
"self_attn.q_norm", "attn_q_norm",
"self_attn.k_norm", "attn_k_norm",
"model.layers", "blk",
"input_layernorm", "attn_norm",
"mlp.down_proj", "ffn_down",
"mlp.gate_proj", "ffn_gate",
"mlp.up_proj", "ffn_up",
"self_attn.k_proj", "attn_k",
"self_attn.o_proj", "attn_output",
"self_attn.q_proj", "attn_q",
"self_attn.v_proj", "attn_v",
"model.norm", "output_norm",
"model.embed_tokens", "token_embd",
}
}

100
convert/convert_gemma.go Normal file
View File

@@ -0,0 +1,100 @@
package convert
import (
"strings"
"github.com/pdevine/tensor"
"github.com/pdevine/tensor/native"
"github.com/ollama/ollama/fs/ggml"
)
type gemmaModel struct {
ModelParameters
MaxPositionEmbeddings uint32 `json:"max_position_embeddings"`
HiddenSize uint32 `json:"hidden_size"`
HiddenLayers uint32 `json:"num_hidden_layers"`
IntermediateSize uint32 `json:"intermediate_size"`
NumAttentionHeads uint32 `json:"num_attention_heads"`
NumKeyValueHeads uint32 `json:"num_key_value_heads"`
RMSNormEPS float32 `json:"rms_norm_eps"`
HeadDim uint32 `json:"head_dim"`
}
var _ ModelConverter = (*gemmaModel)(nil)
func (p *gemmaModel) KV(t *Tokenizer) ggml.KV {
kv := p.ModelParameters.KV(t)
kv["general.architecture"] = "gemma"
kv["gemma.context_length"] = p.MaxPositionEmbeddings
kv["gemma.embedding_length"] = p.HiddenSize
kv["gemma.block_count"] = p.HiddenLayers
kv["gemma.feed_forward_length"] = p.IntermediateSize
kv["gemma.attention.head_count"] = p.NumAttentionHeads
kv["gemma.attention.head_count_kv"] = p.NumKeyValueHeads
kv["gemma.attention.layer_norm_rms_epsilon"] = p.RMSNormEPS
kv["gemma.attention.key_length"] = p.HeadDim
kv["gemma.attention.value_length"] = p.HeadDim
kv["tokenizer.ggml.eot_token_id"] = uint32(107)
kv["tokenizer.ggml.middle_token_id"] = uint32(68)
kv["tokenizer.ggml.prefix_token_id"] = uint32(67)
kv["tokenizer.ggml.suffix_token_id"] = uint32(69)
return kv
}
func (p *gemmaModel) Tensors(ts []Tensor) []ggml.Tensor {
var out []ggml.Tensor
for _, t := range ts {
if !strings.HasPrefix(t.Name(), "v.") && strings.HasSuffix(t.Name(), "_norm.weight") {
t.SetRepacker(p.addOne)
}
out = append(out, ggml.Tensor{
Name: t.Name(),
Kind: t.Kind(),
Shape: t.Shape(),
WriterTo: t,
})
}
return out
}
func (p *gemmaModel) Replacements() []string {
return []string{
"model.embed_tokens", "token_embd",
"model.norm", "output_norm",
"model.layers", "blk",
"input_layernorm", "attn_norm",
"self_attn.q_proj", "attn_q",
"self_attn.k_proj", "attn_k",
"self_attn.v_proj", "attn_v",
"self_attn.o_proj", "attn_output",
"mlp.gate_proj", "ffn_gate",
"mlp.down_proj", "ffn_down",
"mlp.up_proj", "ffn_up",
"post_attention_layernorm", "ffn_norm",
}
}
func (*gemmaModel) addOne(_ string, data []float32, shape []uint64) ([]float32, error) {
n := tensor.New(tensor.WithShape(int(shape[0])), tensor.WithBacking(data))
ones := tensor.Ones(tensor.Float32, int(shape[0]))
n, err := n.Add(ones)
if err != nil {
return nil, err
}
ts, err := native.SelectF32(n, 0)
if err != nil {
return nil, err
}
var f32s []float32
for _, t := range ts {
f32s = append(f32s, t...)
}
return f32s, nil
}

51
convert/convert_gemma2.go Normal file
View File

@@ -0,0 +1,51 @@
package convert
import "github.com/ollama/ollama/fs/ggml"
type gemma2Model struct {
gemmaModel
SlidingWindow uint32 `json:"sliding_window"`
AttentionLogitSoftcap float32 `json:"attn_logit_softcapping"`
FinalLogitSoftcap float32 `json:"final_logit_softcapping"`
}
func (p *gemma2Model) KV(t *Tokenizer) ggml.KV {
kv := p.ModelParameters.KV(t)
kv["general.architecture"] = "gemma2"
kv["gemma2.context_length"] = p.MaxPositionEmbeddings
kv["gemma2.embedding_length"] = p.HiddenSize
kv["gemma2.block_count"] = p.HiddenLayers
kv["gemma2.feed_forward_length"] = p.IntermediateSize
kv["gemma2.attention.head_count"] = p.NumAttentionHeads
kv["gemma2.attention.head_count_kv"] = p.NumKeyValueHeads
kv["gemma2.attention.layer_norm_rms_epsilon"] = p.RMSNormEPS
kv["gemma2.attention.key_length"] = p.HeadDim
kv["gemma2.attention.value_length"] = p.HeadDim
kv["gemma2.attention.sliding_window"] = p.SlidingWindow
kv["gemma2.attn_logit_softcapping"] = p.AttentionLogitSoftcap
kv["gemma2.final_logit_softcapping"] = p.FinalLogitSoftcap
kv["tokenizer.ggml.eot_token_id"] = uint32(107)
kv["tokenizer.ggml.middle_token_id"] = uint32(68)
kv["tokenizer.ggml.prefix_token_id"] = uint32(67)
kv["tokenizer.ggml.suffix_token_id"] = uint32(69)
return kv
}
func (p *gemma2Model) Replacements() []string {
return []string{
"model.embed_tokens", "token_embd",
"model.norm", "output_norm",
"model.layers", "blk",
"input_layernorm", "attn_norm",
"self_attn.q_proj", "attn_q",
"self_attn.k_proj", "attn_k",
"self_attn.v_proj", "attn_v",
"self_attn.o_proj", "attn_output",
"mlp.gate_proj", "ffn_gate",
"mlp.down_proj", "ffn_down",
"mlp.up_proj", "ffn_up",
"post_attention_layernorm", "post_attention_norm",
"pre_feedforward_layernorm", "ffn_norm",
"post_feedforward_layernorm", "post_ffw_norm",
}
}

View File

@@ -0,0 +1,91 @@
package convert
import (
"strings"
"github.com/pdevine/tensor"
"github.com/pdevine/tensor/native"
"github.com/ollama/ollama/fs/ggml"
)
type gemma2Adapter struct {
AdapterParameters
}
var _ AdapterConverter = (*gemma2Adapter)(nil)
func (p *gemma2Adapter) KV(baseKV ggml.KV) ggml.KV {
kv := p.AdapterParameters.KV()
kv["general.architecture"] = "gemma2"
return kv
}
func (p *gemma2Adapter) Tensors(ts []Tensor) []ggml.Tensor {
var out []ggml.Tensor
for _, t := range ts {
shape := t.Shape()
if (strings.HasSuffix(t.Name(), "weight.lora_a") && shape[0] > shape[1]) ||
(strings.HasSuffix(t.Name(), "weight.lora_b") && shape[0] < shape[1]) {
shape[0], shape[1] = shape[1], shape[0]
t.SetRepacker(p.repack)
}
out = append(out, ggml.Tensor{
Name: t.Name(),
Kind: t.Kind(),
Shape: t.Shape(),
WriterTo: t,
})
}
return out
}
func (p *gemma2Adapter) Replacements() []string {
return []string{
"base_model.model.", "",
"model.layers", "blk",
"self_attn.q_proj", "attn_q",
"self_attn.k_proj", "attn_k",
"self_attn.v_proj", "attn_v",
"self_attn.o_proj", "attn_output",
"mlp.gate_proj", "ffn_gate",
"mlp.down_proj", "ffn_down",
"mlp.up_proj", "ffn_up",
"lora_A.weight", "weight.lora_a",
"lora_B.weight", "weight.lora_b",
"lora_a", "weight.lora_a",
"lora_b", "weight.lora_b",
}
}
func (p *gemma2Adapter) repack(name string, data []float32, shape []uint64) ([]float32, error) {
dims := []int{int(shape[1]), int(shape[0])}
n := tensor.New(tensor.WithShape(dims...), tensor.WithBacking(data))
if err := n.T(1, 0); err != nil {
return nil, err
}
if err := n.Reshape(dims...); err != nil {
return nil, err
}
if err := n.Transpose(); err != nil {
return nil, err
}
ts, err := native.SelectF32(n, 1)
if err != nil {
return nil, err
}
var f32s []float32
for _, t := range ts {
f32s = append(f32s, t...)
}
return f32s, nil
}

142
convert/convert_gemma3.go Normal file
View File

@@ -0,0 +1,142 @@
package convert
import (
"cmp"
"github.com/ollama/ollama/fs/ggml"
)
type gemma3Model struct {
gemmaModel
Architecture string
TextModel struct {
HeadDim uint32 `json:"head_dim"`
HiddenSize uint32 `json:"hidden_size"`
HiddenLayers uint32 `json:"num_hidden_layers"`
IntermediateSize uint32 `json:"intermediate_size"`
SlidingWindow uint32 `json:"sliding_window"`
} `json:"text_config"`
VisionModel struct {
NumAttentionHeads uint32 `json:"num_attention_heads"` // attention.head_count 16
LayerNormEpsilon float32 `json:"layer_norm_eps"` // attention.layer_norm_epsilon 1e-05
NumHiddenLayers uint32 `json:"num_hidden_layers"` // block_count 32
HiddenSize uint32 `json:"hidden_size"` // embedding_length 1280
IntermediateSize uint32 `json:"intermediate_size"` // feed_forward_length 5120
ImageSize uint32 `json:"image_size"` // image_size 560
NumChannels uint32 `json:"num_channels"` // num_channels 3
PatchSize uint32 `json:"patch_size"` // patch_size 14
} `json:"vision_config"`
MaxPositionEmbeddings uint32 `json:"max_position_embeddings"`
NumAttentionHeads uint32 `json:"num_attention_heads"`
NumKeyValueHeads uint32 `json:"num_key_value_heads"`
RMSNormEPS float32 `json:"rms_norm_eps"`
HeadDim uint32 `json:"head_dim"`
FinalLogitSoftcap float32 `json:"final_logit_softcapping"`
RopeLocalTheta float32 `json:"rope_local_base_freq"`
RopeGlobalTheta float32 `json:"rope_global_base_freq"`
SlidingWindow uint32 `json:"sliding_window"`
MultiModalTokensPerImage uint32 `json:"mm_tokens_per_image"`
}
const (
gemma4BLayerCount = 34
gemma12BLayerCount = 48
gemma27BLayerCount = 62
)
func (p *gemma3Model) KV(t *Tokenizer) ggml.KV {
kv := p.ModelParameters.KV(t)
kv["general.architecture"] = "gemma3"
numBlocks := cmp.Or(p.HiddenLayers, p.TextModel.HiddenLayers)
kv["gemma3.block_count"] = numBlocks
var (
numHeads uint32
numKVHeads uint32
)
switch numBlocks {
case gemma4BLayerCount:
numHeads = 8
numKVHeads = 4
case gemma12BLayerCount:
numHeads = 16
numKVHeads = 8
case gemma27BLayerCount:
numHeads = 32
numKVHeads = 16
default:
numHeads = p.NumAttentionHeads
numKVHeads = p.NumKeyValueHeads
}
kv["gemma3.attention.head_count"] = numHeads
kv["gemma3.attention.head_count_kv"] = numKVHeads
switch p.Architecture {
case "Gemma3ForCausalLM":
kv["gemma3.context_length"] = p.MaxPositionEmbeddings
kv["gemma3.attention.layer_norm_rms_epsilon"] = p.RMSNormEPS
kv["gemma3.attention.key_length"] = p.HeadDim
kv["gemma3.attention.value_length"] = p.HeadDim
kv["gemma3.attention.sliding_window"] = p.SlidingWindow
kv["gemma3.final_logit_softcapping"] = cmp.Or(p.FinalLogitSoftcap, 30)
kv["gemma3.rope.local.freq_base"] = cmp.Or(p.RopeLocalTheta, 10000.0)
kv["gemma3.rope.global.freq_base"] = cmp.Or(p.RopeGlobalTheta, 1000000.0)
kv["gemma3.embedding_length"] = p.HiddenSize
kv["gemma3.feed_forward_length"] = p.IntermediateSize
default:
kv["gemma3.context_length"] = cmp.Or(p.MaxPositionEmbeddings, 131072)
kv["gemma3.embedding_length"] = p.TextModel.HiddenSize
kv["gemma3.feed_forward_length"] = p.TextModel.IntermediateSize
kv["gemma3.attention.sliding_window"] = p.TextModel.SlidingWindow
kv["gemma3.vision.block_count"] = p.VisionModel.NumHiddenLayers
kv["gemma3.vision.embedding_length"] = p.VisionModel.HiddenSize
kv["gemma3.vision.feed_forward_length"] = p.VisionModel.IntermediateSize
kv["gemma3.vision.image_size"] = p.VisionModel.ImageSize
kv["gemma3.vision.patch_size"] = p.VisionModel.PatchSize
kv["gemma3.vision.num_channels"] = cmp.Or(p.VisionModel.NumChannels, 3)
kv["gemma3.vision.attention.head_count"] = p.VisionModel.NumAttentionHeads
kv["gemma3.vision.attention.layer_norm_epsilon"] = cmp.Or(p.VisionModel.LayerNormEpsilon, 1e-6)
kv["gemma3.attention.key_length"] = cmp.Or(p.TextModel.HeadDim, 256)
kv["gemma3.attention.value_length"] = cmp.Or(p.TextModel.HeadDim, 256)
}
if p.MultiModalTokensPerImage > 0 {
kv["gemma3.mm.tokens_per_image"] = p.MultiModalTokensPerImage
}
return kv
}
func (p *gemma3Model) Replacements() []string {
return []string{
"lm_head", "output",
"model.embed_tokens", "token_embd",
"model.norm", "output_norm",
"vision_tower.vision_model.embeddings", "v",
"vision_tower.vision_model", "v",
"vision_model.vision_model.embeddings", "v",
"vision_model.vision_model", "v",
"language_model.", "",
"model.layers", "blk",
"encoder.layers", "blk",
"input_layernorm", "attn_norm",
"self_attn.q_proj", "attn_q",
"self_attn.q_norm", "attn_q_norm",
"self_attn.k_proj", "attn_k",
"self_attn.k_norm", "attn_k_norm",
"self_attn.v_proj", "attn_v",
"self_attn.o_proj", "attn_output",
"self_attn.out_proj", "attn_output",
"mlp.gate_proj", "ffn_gate",
"mlp.down_proj", "ffn_down",
"mlp.up_proj", "ffn_up",
"post_attention_layernorm", "post_attention_norm",
"pre_feedforward_layernorm", "ffn_norm",
"post_feedforward_layernorm", "post_ffw_norm",
"input_projection_weight", "input_projection.weight",
"multi_modal_projector", "mm",
}
}

220
convert/convert_llama.go Normal file
View File

@@ -0,0 +1,220 @@
package convert
import (
"cmp"
"fmt"
"math"
"strings"
"github.com/pdevine/tensor"
"github.com/pdevine/tensor/native"
"github.com/ollama/ollama/fs/ggml"
)
type llamaModel struct {
ModelParameters
NLayers uint32 `json:"n_layers"`
NumHiddenLayers uint32 `json:"num_hidden_layers"`
NLayer uint32 `json:"n_layer"`
MaxPositionEmbeddings uint32 `json:"max_position_embeddings"`
NCtx uint32 `json:"n_ctx"`
HiddenSize uint32 `json:"hidden_size"`
NEmbd uint32 `json:"n_embd"`
IntermediateSize uint32 `json:"intermediate_size"`
NInner uint32 `json:"n_inner"`
NumAttentionHeads uint32 `json:"num_attention_heads"`
NHead uint32 `json:"n_head"`
NumKeyValueHeads uint32 `json:"num_key_value_heads"`
RopeTheta float32 `json:"rope_theta"`
RopeScaling struct {
Type string `json:"type"`
RopeType string `json:"rope_type"`
Factor float32 `json:"factor"`
LowFrequencyFactor float32 `json:"low_freq_factor"`
HighFrequencyFactor float32 `json:"high_freq_factor"`
OriginalMaxPositionEmbeddings uint32 `json:"original_max_position_embeddings"`
factors ropeFactor
} `json:"rope_scaling"`
RMSNormEPS float32 `json:"rms_norm_eps"`
LayerNormEPS float32 `json:"layer_norm_eps"`
LayerNormEpsilon float32 `json:"layer_norm_epsilon"`
NormEpsilon float32 `json:"norm_epsilon"`
HeadDim uint32 `json:"head_dim"`
skipRepack bool
}
var _ ModelConverter = (*llamaModel)(nil)
func (p *llamaModel) KV(t *Tokenizer) ggml.KV {
kv := p.ModelParameters.KV(t)
kv["general.architecture"] = "llama"
kv["llama.vocab_size"] = p.VocabSize
kv["llama.block_count"] = cmp.Or(p.NLayers, p.NumHiddenLayers, p.NLayer)
if contextLength := cmp.Or(p.MaxPositionEmbeddings, p.NCtx); contextLength > 0 {
kv["llama.context_length"] = contextLength
}
if embeddingLength := cmp.Or(p.HiddenSize, p.NEmbd); embeddingLength > 0 {
kv["llama.embedding_length"] = cmp.Or(p.HiddenSize, p.NEmbd)
}
if feedForwardLength := cmp.Or(p.IntermediateSize, p.NInner); feedForwardLength > 0 {
kv["llama.feed_forward_length"] = cmp.Or(p.IntermediateSize, p.NInner)
}
if headCount := cmp.Or(p.NumAttentionHeads, p.NHead); headCount > 0 {
kv["llama.attention.head_count"] = cmp.Or(p.NumAttentionHeads, p.NHead)
kv["llama.rope.dimension_count"] = p.HiddenSize / headCount
}
if p.HeadDim > 0 {
kv["llama.attention.head_dim"] = p.HeadDim
}
if p.RopeTheta > 0 {
kv["llama.rope.freq_base"] = p.RopeTheta
}
if p.RopeScaling.Type == "linear" {
kv["llama.rope.scaling.type"] = p.RopeScaling.Type
kv["llama.rope.scaling.factor"] = p.RopeScaling.Factor
} else if p.RopeScaling.RopeType == "llama3" {
dim := p.HiddenSize / p.NumAttentionHeads
for i := uint32(0); i < dim; i += 2 {
factor := cmp.Or(p.RopeScaling.Factor, 8.0)
factorLow := cmp.Or(p.RopeScaling.LowFrequencyFactor, 1.0)
factorHigh := cmp.Or(p.RopeScaling.HighFrequencyFactor, 4.0)
original := cmp.Or(p.RopeScaling.OriginalMaxPositionEmbeddings, 8192)
lambdaLow := float32(original) / factorLow
lambdaHigh := float32(original) / factorHigh
lambda := 2 * math.Pi * math.Pow(float64(p.RopeTheta), float64(i)/float64(dim))
if lambda < float64(lambdaHigh) {
p.RopeScaling.factors = append(p.RopeScaling.factors, 1.0)
} else if lambda > float64(lambdaLow) {
p.RopeScaling.factors = append(p.RopeScaling.factors, factor)
} else {
smooth := (float32(original)/float32(lambda) - factorLow) / (factorHigh - factorLow)
p.RopeScaling.factors = append(p.RopeScaling.factors, 1.0/((1-smooth)/factor+smooth))
}
}
}
if p.NumKeyValueHeads > 0 {
kv["llama.attention.head_count_kv"] = p.NumKeyValueHeads
}
if p.RMSNormEPS > 0 {
kv["llama.attention.layer_norm_rms_epsilon"] = p.RMSNormEPS
}
if layerNormEpsilon := cmp.Or(p.LayerNormEPS, p.LayerNormEpsilon, p.NormEpsilon); layerNormEpsilon > 0 {
kv["llama.attention.layer_norm_epsilon"] = layerNormEpsilon
}
if p.HeadDim > 0 {
kv["llama.attention.key_length"] = p.HeadDim
kv["llama.attention.value_length"] = p.HeadDim
}
return kv
}
func (p *llamaModel) Tensors(ts []Tensor) []ggml.Tensor {
var out []ggml.Tensor
if p.RopeScaling.factors != nil {
out = append(out, ggml.Tensor{
Name: "rope_freqs.weight",
Kind: 0,
Shape: []uint64{uint64(len(p.RopeScaling.factors))},
WriterTo: p.RopeScaling.factors,
})
}
for _, t := range ts {
if strings.HasSuffix(t.Name(), "attn_q.weight") || strings.HasSuffix(t.Name(), "attn_k.weight") {
if !p.skipRepack {
t.SetRepacker(p.repack)
}
}
out = append(out, ggml.Tensor{
Name: t.Name(),
Kind: t.Kind(),
Shape: t.Shape(),
WriterTo: t,
})
}
return out
}
func (p *llamaModel) Replacements() []string {
return []string{
"lm_head", "output",
"model.embed_tokens", "token_embd",
"model.norm", "output_norm",
"model.layers", "blk",
"input_layernorm", "attn_norm",
"self_attn.q_proj", "attn_q",
"self_attn.k_proj", "attn_k",
"self_attn.v_proj", "attn_v",
"self_attn.o_proj", "attn_output",
"mlp.gate_proj", "ffn_gate",
"mlp.down_proj", "ffn_down",
"mlp.up_proj", "ffn_up",
"post_attention_layernorm", "ffn_norm",
}
}
func (p *llamaModel) repack(name string, data []float32, shape []uint64) ([]float32, error) {
var dims []int
for _, dim := range shape {
dims = append(dims, int(dim))
}
var heads uint32
if strings.HasSuffix(name, "attn_q.weight") {
heads = p.NumAttentionHeads
} else if strings.HasSuffix(name, "attn_k.weight") {
heads = cmp.Or(p.NumKeyValueHeads, p.NumAttentionHeads)
} else {
return nil, fmt.Errorf("unknown tensor for repack: %s", name)
}
n := tensor.New(tensor.WithShape(dims...), tensor.WithBacking(data))
if err := n.Reshape(append([]int{int(heads), 2, dims[0] / int(heads) / 2}, dims[1:]...)...); err != nil {
return nil, err
}
if err := n.T(0, 2, 1, 3); err != nil {
return nil, err
}
if err := n.Reshape(dims...); err != nil {
return nil, err
}
if err := n.Transpose(); err != nil {
return nil, err
}
ts, err := native.SelectF32(n, 1)
if err != nil {
return nil, err
}
var f32s []float32
for _, t := range ts {
f32s = append(f32s, t...)
}
return f32s, nil
}

169
convert/convert_llama4.go Normal file
View File

@@ -0,0 +1,169 @@
package convert
import (
"slices"
"strings"
"github.com/pdevine/tensor"
"github.com/pdevine/tensor/native"
"github.com/ollama/ollama/fs/ggml"
)
type llama4Model struct {
ModelParameters
TextModel struct {
llamaModel
NumExpertsPerToken uint32 `json:"num_experts_per_tok"`
NumLocalExperts uint32 `json:"num_local_experts"`
InterleaveMOELayerStep uint32 `json:"interleave_moe_layer_step"`
UseQKNorm bool `json:"use_qk_norm"`
IntermediateSizeMLP uint32 `json:"intermediate_size_mlp"`
AttentionChunkSize uint32 `json:"attention_chunk_size"`
} `json:"text_config"`
VisionModel struct {
NumHiddenLayers uint32 `json:"num_hidden_layers"`
HiddenSize uint32 `json:"hidden_size"`
IntermediateSize uint32 `json:"intermediate_size"`
NumAttentionHeads uint32 `json:"num_attention_heads"`
ImageSize uint32 `json:"image_size"`
PatchSize uint32 `json:"patch_size"`
RopeTheta float32 `json:"rope_theta"`
NormEpsilon float32 `json:"norm_eps"`
PixelShuffleRatio float32 `json:"pixel_shuffle_ratio"`
} `json:"vision_config"`
}
// KV implements ModelConverter.
func (p *llama4Model) KV(t *Tokenizer) ggml.KV {
kv := p.ModelParameters.KV(t)
kv["general.architecture"] = "llama4"
for k, v := range p.TextModel.KV(t) {
if strings.HasPrefix(k, "llama.") {
kv[strings.ReplaceAll(k, "llama.", "llama4.")] = v
}
}
kv["llama4.feed_forward_length"] = p.TextModel.IntermediateSizeMLP
kv["llama4.expert_feed_forward_length"] = p.TextModel.IntermediateSize
kv["llama4.expert_count"] = p.TextModel.NumLocalExperts
kv["llama4.expert_used_count"] = p.TextModel.NumExpertsPerToken
kv["llama4.interleave_moe_layer_step"] = p.TextModel.InterleaveMOELayerStep
kv["llama4.use_qk_norm"] = p.TextModel.UseQKNorm
kv["llama4.attention.chunk_size"] = p.TextModel.AttentionChunkSize
kv["llama4.vision.block_count"] = p.VisionModel.NumHiddenLayers
kv["llama4.vision.embedding_length"] = p.VisionModel.HiddenSize
kv["llama4.vision.feed_forward_length"] = p.VisionModel.IntermediateSize
kv["llama4.vision.attention.head_count"] = p.VisionModel.NumAttentionHeads
kv["llama4.vision.image_size"] = p.VisionModel.ImageSize
kv["llama4.vision.patch_size"] = p.VisionModel.PatchSize
kv["llama4.vision.rope.freq_base"] = p.VisionModel.RopeTheta
kv["llama4.vision.layer_norm_epsilon"] = p.VisionModel.NormEpsilon
kv["llama4.vision.pixel_shuffle_ratio"] = p.VisionModel.PixelShuffleRatio
return kv
}
// Replacements implements ModelConverter.
func (p *llama4Model) Replacements() []string {
return append(
p.TextModel.Replacements(),
"language_model.", "",
"vision_model", "v",
"multi_modal_projector", "mm",
"feed_forward.down_proj", "ffn_down",
"feed_forward.up_proj", "ffn_up",
"feed_forward.gate_proj", "ffn_gate",
"feed_forward.", "ffn_",
"shared_expert.down_proj", "down_shexp",
"shared_expert.gate_proj", "gate_shexp",
"shared_expert.up_proj", "up_shexp",
"experts.down_proj", "down_exps.weight",
"experts.gate_up_proj", "gate_up_exps.weight",
"router", "gate_inp",
"patch_embedding.linear", "patch_embedding",
)
}
// Tensors implements ModelConverter.
func (p *llama4Model) Tensors(ts []Tensor) []ggml.Tensor {
var out []ggml.Tensor
var textTensors []Tensor
for _, t := range ts {
if strings.HasPrefix(t.Name(), "v.") || strings.HasPrefix(t.Name(), "mm.") {
out = append(out, ggml.Tensor{
Name: t.Name(),
Kind: t.Kind(),
Shape: t.Shape(),
WriterTo: t,
})
} else if strings.Contains(t.Name(), "ffn_gate_up_exps") {
// gate and up projectors are fused
// dims[1], dims[2] must be swapped
// [experts, hidden_size, intermediate_size * 2] --> [experts, intermediate_size, hidden_size]
halfDim := int(t.Shape()[2]) / 2
newShape := slices.Clone(t.Shape())
newShape[1], newShape[2] = newShape[2]/2, newShape[1]
for i, name := range []string{"ffn_gate_exps", "ffn_up_exps"} {
// clone tensor since we need separate repackers
tt := t.Clone()
tt.SetRepacker(p.repack(nil, nil, tensor.S(i*halfDim, (i+1)*halfDim)))
out = append(out, ggml.Tensor{
Name: strings.ReplaceAll(tt.Name(), "ffn_gate_up_exps", name),
Kind: tt.Kind(),
Shape: newShape,
WriterTo: tt,
})
}
} else if strings.Contains(t.Name(), "ffn_down_exps") {
// dims[1], dims[2] must be swapped
// [experts, intermediate_size, hidden_size] --> [experts, hidden_size, intermediate_size]
t.SetRepacker(p.repack())
newShape := slices.Clone(t.Shape())
newShape[1], newShape[2] = newShape[2], newShape[1]
out = append(out, ggml.Tensor{
Name: t.Name(),
Kind: t.Kind(),
Shape: newShape,
WriterTo: t,
})
} else {
textTensors = append(textTensors, t)
}
}
p.TextModel.skipRepack = true
out = append(out, p.TextModel.Tensors(textTensors)...)
return out
}
func (p *llama4Model) repack(slice ...tensor.Slice) Repacker {
return func(name string, data []float32, shape []uint64) ([]float32, error) {
dims := make([]int, len(shape))
for i, dim := range shape {
dims[i] = int(dim)
}
var t tensor.Tensor = tensor.New(tensor.WithShape(dims...), tensor.WithBacking(data))
t, err := t.Slice(slice...)
if err != nil {
return nil, err
}
if err := t.T(0, 2, 1); err != nil {
return nil, err
}
t = tensor.Materialize(t)
// flatten tensor so it can be return as a vector
if err := t.Reshape(t.Shape().TotalSize()); err != nil {
return nil, err
}
return native.VectorF32(t.(*tensor.Dense))
}
}

View File

@@ -0,0 +1,169 @@
package convert
import (
"cmp"
"strings"
"github.com/pdevine/tensor"
"github.com/pdevine/tensor/native"
"github.com/ollama/ollama/fs/ggml"
)
type llamaAdapter struct {
AdapterParameters
NumAttentionHeads uint32 `json:"num_attention_heads"`
NumKeyValueHeads uint32 `json:"num_key_value_heads"`
}
var _ AdapterConverter = (*llamaAdapter)(nil)
func (p *llamaAdapter) KV(baseKV ggml.KV) ggml.KV {
kv := p.AdapterParameters.KV()
kv["general.architecture"] = "llama"
kv["llama.attention.head_count"] = baseKV["llama.attention.head_count"]
kv["llama.attention.head_count_kv"] = baseKV["llama.attention.head_count_kv"]
p.NumAttentionHeads = baseKV["llama.attention.head_count"].(uint32)
return kv
}
func (p *llamaAdapter) Tensors(ts []Tensor) []ggml.Tensor {
var out []ggml.Tensor
for _, t := range ts {
shape := t.Shape()
if (strings.HasSuffix(t.Name(), "weight.lora_a") && shape[0] > shape[1]) ||
(strings.HasSuffix(t.Name(), "weight.lora_b") && shape[0] < shape[1]) {
shape[0], shape[1] = shape[1], shape[0]
t.SetRepacker(p.repackAndTranspose)
} else {
t.SetRepacker(p.repack)
}
out = append(out, ggml.Tensor{
Name: t.Name(),
Kind: t.Kind(),
Shape: shape,
WriterTo: t,
})
}
return out
}
func (p *llamaAdapter) Replacements() []string {
return []string{
"base_model.model.", "",
"model.layers", "blk",
"self_attn.q_proj", "attn_q",
"self_attn.k_proj", "attn_k",
"self_attn.v_proj", "attn_v",
"self_attn.o_proj", "attn_output",
"mlp.gate_proj", "ffn_gate",
"mlp.down_proj", "ffn_down",
"mlp.up_proj", "ffn_up",
"lora_A.weight", "weight.lora_a",
"lora_B.weight", "weight.lora_b",
"lora_a", "weight.lora_a",
"lora_b", "weight.lora_b",
}
}
func (p *llamaAdapter) repack(name string, data []float32, shape []uint64) ([]float32, error) {
dims := []int{int(shape[1]), int(shape[0])}
var heads uint32
if strings.HasSuffix(name, "attn_q.weight.lora_a") {
heads = p.NumAttentionHeads
} else if strings.HasSuffix(name, "attn_k.weight.lora_a") {
heads = cmp.Or(p.NumKeyValueHeads, p.NumAttentionHeads)
} else {
return data, nil
}
n := tensor.New(tensor.WithShape(dims...), tensor.WithBacking(data))
if err := n.Reshape(append([]int{int(heads), 2, dims[0] / int(heads) / 2}, dims[1:]...)...); err != nil {
return nil, err
}
if err := n.T(0, 2, 1, 3); err != nil {
return nil, err
}
if err := n.Reshape(dims...); err != nil {
return nil, err
}
if err := n.Transpose(); err != nil {
return nil, err
}
ts, err := native.SelectF32(n, 1)
if err != nil {
return nil, err
}
var f32s []float32
for _, t := range ts {
f32s = append(f32s, t...)
}
return f32s, nil
}
func (p *llamaAdapter) repackAndTranspose(name string, data []float32, shape []uint64) ([]float32, error) {
dims := []int{int(shape[1]), int(shape[0])}
n := tensor.New(tensor.WithShape(dims...), tensor.WithBacking(data))
var heads uint32
if strings.HasSuffix(name, "attn_q.weight.lora_a") {
heads = p.NumAttentionHeads
} else if strings.HasSuffix(name, "attn_k.weight.lora_a") {
heads = cmp.Or(p.NumKeyValueHeads, p.NumAttentionHeads)
}
if heads > 0 {
if err := n.Reshape(append([]int{int(heads), 2, dims[0] / int(heads) / 2}, dims[1:]...)...); err != nil {
return nil, err
}
if err := n.T(0, 2, 1, 3); err != nil {
return nil, err
}
if err := n.Reshape(dims...); err != nil {
return nil, err
}
if err := n.Transpose(); err != nil {
return nil, err
}
}
if err := n.T(1, 0); err != nil {
return nil, err
}
if err := n.Reshape(dims...); err != nil {
return nil, err
}
if err := n.Transpose(); err != nil {
return nil, err
}
ts, err := native.SelectF32(n, 1)
if err != nil {
return nil, err
}
var f32s []float32
for _, t := range ts {
f32s = append(f32s, t...)
}
return f32s, nil
}

190
convert/convert_mistral.go Normal file
View File

@@ -0,0 +1,190 @@
package convert
import (
"cmp"
"fmt"
"strings"
"github.com/pdevine/tensor"
"github.com/pdevine/tensor/native"
"github.com/ollama/ollama/fs/ggml"
)
type mistral3Model struct {
ModelParameters
ImageTokenIndex uint32 `json:"image_token_index"`
SpatialMergeSize uint32 `json:"spatial_merge_size"`
VisionFeatureLayer int32 `json:"vision_feature_layer"`
TextModel struct {
NumHiddenLayers uint32 `json:"num_hidden_layers"`
MaxPositionEmbeddings uint32 `json:"max_position_embeddings"`
HiddenSize uint32 `json:"hidden_size"`
IntermediateSize uint32 `json:"intermediate_size"`
NumAttentionHeads uint32 `json:"num_attention_heads"`
NumKeyValueHeads uint32 `json:"num_key_value_heads"`
RopeTheta float32 `json:"rope_theta"`
RMSNormEPS float32 `json:"rms_norm_eps"`
HeadDim uint32 `json:"head_dim"`
SlidingWindow *uint32 `json:"sliding_window"`
HiddenAct string `json:"hidden_act"`
VocabSize uint32 `json:"vocab_size"`
} `json:"text_config"`
VisionModel struct {
NumAttentionHeads uint32 `json:"num_attention_heads"`
NumHiddenLayers uint32 `json:"num_hidden_layers"`
HiddenSize uint32 `json:"hidden_size"`
IntermediateSize uint32 `json:"intermediate_size"`
ImageSize uint32 `json:"image_size"`
NumChannels uint32 `json:"num_channels"`
PatchSize uint32 `json:"patch_size"`
HeadDim uint32 `json:"head_dim"`
HiddenAct string `json:"hidden_act"`
RopeTheta float32 `json:"rope_theta"`
} `json:"vision_config"`
MultiModalProjectorBias bool `json:"multimodal_projector_bias"`
ProjectorHiddenAct string `json:"projector_hidden_act"`
}
func (p *mistral3Model) KV(t *Tokenizer) ggml.KV {
kv := p.ModelParameters.KV(t)
kv["general.architecture"] = "mistral3"
kv["mistral3.vocab_size"] = p.TextModel.VocabSize
// Text configuration
kv["mistral3.block_count"] = p.TextModel.NumHiddenLayers
kv["mistral3.context_length"] = p.TextModel.MaxPositionEmbeddings
kv["mistral3.embedding_length"] = p.TextModel.HiddenSize
kv["mistral3.feed_forward_length"] = p.TextModel.IntermediateSize
kv["mistral3.attention.head_count"] = p.TextModel.NumAttentionHeads
kv["mistral3.attention.head_count_kv"] = p.TextModel.NumKeyValueHeads
kv["mistral3.attention.layer_norm_rms_epsilon"] = p.TextModel.RMSNormEPS
kv["mistral3.attention.key_length"] = p.TextModel.HeadDim
kv["mistral3.attention.value_length"] = p.TextModel.HeadDim
kv["mistral3.rope.dimension_count"] = p.TextModel.HiddenSize / p.TextModel.NumHiddenLayers
kv["mistral3.rope.freq_base"] = p.TextModel.RopeTheta
// Vision configuration
kv["mistral3.vision.block_count"] = p.VisionModel.NumHiddenLayers
kv["mistral3.vision.embedding_length"] = p.VisionModel.HiddenSize
kv["mistral3.vision.feed_forward_length"] = p.VisionModel.IntermediateSize
kv["mistral3.vision.attention.head_count"] = p.VisionModel.NumAttentionHeads
kv["mistral3.vision.attention.key_length"] = p.VisionModel.HeadDim
kv["mistral3.vision.image_size"] = p.VisionModel.ImageSize
kv["mistral3.vision.patch_size"] = p.VisionModel.PatchSize
kv["mistral3.vision.num_channels"] = p.VisionModel.NumChannels
// kv["mistral3.vision.attention.layer_norm_epsilon"] = 1e-05 // Default value
kv["mistral3.vision.rope.freq_base"] = p.VisionModel.RopeTheta
// Multimodal configuration
kv["mistral3.image_token_index"] = p.ImageTokenIndex
kv["mistral3.spatial_merge_size"] = p.SpatialMergeSize
kv["mistral3.mm.projector_bias"] = p.MultiModalProjectorBias
if p.ProjectorHiddenAct != "" {
kv["mistral3.mm.projector_hidden_act"] = p.ProjectorHiddenAct
}
return kv
}
func (p *mistral3Model) Tensors(ts []Tensor) []ggml.Tensor {
var out []ggml.Tensor
for _, t := range ts {
if !strings.HasPrefix(t.Name(), "v.") {
if strings.HasSuffix(t.Name(), ".attn_q.weight") ||
strings.HasSuffix(t.Name(), ".attn_k.weight") {
t.SetRepacker(p.repack)
}
}
out = append(out, ggml.Tensor{
Name: t.Name(),
Kind: t.Kind(),
Shape: t.Shape(),
WriterTo: t,
})
}
return out
}
func (p *mistral3Model) Replacements() []string {
return []string{
"language_model.model.norm", "output_norm",
"language_model.model.", "",
"language_model.", "",
"layers", "blk",
"transformer.layers", "blk",
"vision_tower", "v",
"ln_pre", "encoder_norm",
"input_layernorm", "attn_norm",
"post_attention_layernorm", "ffn_norm",
"embed_tokens", "token_embd",
"self_attn.q_proj", "attn_q",
"self_attn.k_proj", "attn_k",
"self_attn.v_proj", "attn_v",
"self_attn.o_proj", "attn_output",
"mlp.down_proj", "ffn_down",
"mlp.gate_proj", "ffn_gate",
"mlp.up_proj", "ffn_up",
"attention.q_proj", "attn_q",
"attention.k_proj", "attn_k",
"attention.v_proj", "attn_v",
"attention.o_proj", "attn_output",
"attention_norm", "attn_norm",
"feed_forward.gate_proj", "ffn_gate",
"feed_forward.down_proj", "ffn_down",
"feed_forward.up_proj", "ffn_up",
"multi_modal_projector", "mm",
"ffn_norm", "ffn_norm",
"lm_head", "output",
}
}
func (p *mistral3Model) repack(name string, data []float32, shape []uint64) ([]float32, error) {
var dims []int
for _, dim := range shape {
dims = append(dims, int(dim))
}
var heads uint32
if strings.HasSuffix(name, ".attn_q.weight") {
heads = p.TextModel.NumAttentionHeads
} else if strings.HasSuffix(name, ".attn_k.weight") {
heads = cmp.Or(p.TextModel.NumKeyValueHeads, p.TextModel.NumAttentionHeads)
} else {
return nil, fmt.Errorf("unknown tensor for repack: %s", name)
}
n := tensor.New(tensor.WithShape(dims...), tensor.WithBacking(data))
if err := n.Reshape(append([]int{int(heads), 2, dims[0] / int(heads) / 2}, dims[1:]...)...); err != nil {
return nil, err
}
if err := n.T(0, 2, 1, 3); err != nil {
return nil, err
}
if err := n.Reshape(dims...); err != nil {
return nil, err
}
if err := n.Transpose(); err != nil {
return nil, err
}
ts, err := native.SelectF32(n, 1)
if err != nil {
return nil, err
}
var f32s []float32
for _, t := range ts {
f32s = append(f32s, t...)
}
return f32s, nil
}

View File

@@ -0,0 +1,94 @@
package convert
import (
"fmt"
"io"
"slices"
"strings"
"github.com/ollama/ollama/fs/ggml"
)
type mixtralModel struct {
llamaModel
NumLocalExperts uint32 `json:"num_local_experts"`
NumExpertsPerToken uint32 `json:"num_experts_per_tok"`
}
func (p *mixtralModel) KV(t *Tokenizer) ggml.KV {
kv := p.llamaModel.KV(t)
if p.NumLocalExperts > 0 {
kv["llama.expert_count"] = p.NumLocalExperts
}
if p.NumExpertsPerToken > 0 {
kv["llama.expert_used_count"] = p.NumExpertsPerToken
}
return kv
}
func (p *mixtralModel) Tensors(ts []Tensor) []ggml.Tensor {
oldnew := []string{
"model.layers", "blk",
"w1", "ffn_gate_exps",
"w2", "ffn_down_exps",
"w3", "ffn_up_exps",
}
for i := range p.NumLocalExperts {
oldnew = append(oldnew, fmt.Sprintf(".block_sparse_moe.experts.%d.", i), ".")
}
// group experts of the same layer (model.layers.%d) and type (w[123]) into a single tensor
namer := strings.NewReplacer(oldnew...)
experts := make(map[string]experts)
// merge experts into a single tensor while removing them from ts
ts = slices.DeleteFunc(ts, func(t Tensor) bool {
if !strings.Contains(t.Name(), ".block_sparse_moe.experts.") {
return false
}
name := namer.Replace(t.Name())
experts[name] = append(experts[name], t)
return true
})
var out []ggml.Tensor
for n, e := range experts {
// TODO(mxyng): sanity check experts
out = append(out, ggml.Tensor{
Name: n,
Kind: e[0].Kind(),
Shape: append([]uint64{uint64(len(e))}, e[0].Shape()...),
WriterTo: e,
})
}
return append(out, p.llamaModel.Tensors(ts)...)
}
func (p *mixtralModel) Replacements() []string {
return append(
p.llamaModel.Replacements(),
"block_sparse_moe.gate", "ffn_gate_inp",
)
}
type experts []Tensor
func (e experts) WriteTo(w io.Writer) (int64, error) {
// TODO(mxyng): experts _should_ be numerically sorted by expert but this should check
for _, t := range e {
// the canonical merged experts tensor stacks all experts along a new, 0 axis,
// e.g. `tensor.Stack(0, e[0], e[1:]...)`, which requires allocating temporary buffers
// this accomplishes the same thing by writing each expert tensor in sequence
if _, err := t.WriteTo(w); err != nil {
return 0, err
}
}
return 0, nil
}

122
convert/convert_phi3.go Normal file
View File

@@ -0,0 +1,122 @@
package convert
import (
"cmp"
"encoding/binary"
"io"
"math"
"strings"
"sync"
"github.com/ollama/ollama/fs/ggml"
)
type phi3Model struct {
ModelParameters
NumHiddenLayers uint32 `json:"num_hidden_layers"`
NLayers uint32 `json:"n_layers"`
HiddenSize uint32 `json:"hidden_size"`
NEmbd uint32 `json:"n_embd"`
IntermediateSize uint32 `json:"intermediate_size"`
NumAttentionHeads uint32 `json:"num_attention_heads"`
NHead uint32 `json:"n_head"`
NumKeyValueHeads uint32 `json:"num_key_value_heads"`
NHeadKV uint32 `json:"n_head_kv"`
RopeTheta float32 `json:"rope_theta"`
RopeScaling struct {
Type string `json:"type"`
LongFactor ropeFactor `json:"long_factor"`
ShortFactor ropeFactor `json:"short_factor"`
} `json:"rope_scaling"`
RMSNormEPS float32 `json:"rms_norm_eps"`
NPositions uint32 `json:"n_positions"`
MaxPositionEmbeddings uint32 `json:"max_position_embeddings"`
OriginalMaxPositionEmbeddings uint32 `json:"original_max_position_embeddings"`
SlidingWindow uint32 `json:"sliding_window"`
}
var _ ModelConverter = (*phi3Model)(nil)
func (p *phi3Model) KV(t *Tokenizer) ggml.KV {
kv := p.ModelParameters.KV(t)
kv["general.architecture"] = "phi3"
kv["phi3.context_length"] = p.MaxPositionEmbeddings
kv["phi3.embedding_length"] = cmp.Or(p.HiddenSize, p.NEmbd)
kv["phi3.feed_forward_length"] = p.IntermediateSize
kv["phi3.block_count"] = cmp.Or(p.NumHiddenLayers, p.NLayers)
kv["phi3.attention.head_count"] = cmp.Or(p.NumAttentionHeads, p.NHead)
kv["phi3.attention.head_count_kv"] = cmp.Or(p.NumKeyValueHeads, p.NHeadKV)
kv["phi3.attention.layer_norm_rms_epsilon"] = p.RMSNormEPS
kv["phi3.rope.dimension_count"] = p.HiddenSize / cmp.Or(p.NumAttentionHeads, p.NHead)
kv["phi3.rope.freq_base"] = p.RopeTheta
kv["phi3.rope.scaling.original_context_length"] = p.OriginalMaxPositionEmbeddings
kv["phi3.attention.sliding_window"] = p.SlidingWindow
scale := float64(p.MaxPositionEmbeddings) / float64(p.OriginalMaxPositionEmbeddings)
switch p.RopeScaling.Type {
case "":
// no scaling
case "su", "longrope":
kv["phi3.rope.scaling.attn_factor"] = float32(max(math.Sqrt(1+math.Log(scale)/math.Log(float64(p.OriginalMaxPositionEmbeddings))), 1.0))
case "yarn":
kv["phi3.rope.scaling.attn_factor"] = float32(max(0.1*math.Log(scale)+1.0, 1.0))
default:
panic("unknown rope scaling type")
}
return kv
}
func (p *phi3Model) Tensors(ts []Tensor) []ggml.Tensor {
var addRopeFactors sync.Once
out := make([]ggml.Tensor, 0, len(ts)+2)
for _, t := range ts {
if strings.HasPrefix(t.Name(), "blk.0.") {
addRopeFactors.Do(func() {
out = append(out, ggml.Tensor{
Name: "rope_factors_long.weight",
Kind: 0,
Shape: []uint64{uint64(len(p.RopeScaling.LongFactor))},
WriterTo: p.RopeScaling.LongFactor,
}, ggml.Tensor{
Name: "rope_factors_short.weight",
Kind: 0,
Shape: []uint64{uint64(len(p.RopeScaling.ShortFactor))},
WriterTo: p.RopeScaling.ShortFactor,
})
})
}
out = append(out, ggml.Tensor{
Name: t.Name(),
Kind: t.Kind(),
Shape: t.Shape(),
WriterTo: t,
})
}
return out
}
func (p *phi3Model) Replacements() []string {
return []string{
"lm_head", "output",
"model.embed_tokens", "token_embd",
"model.norm", "output_norm",
"model.layers", "blk",
"input_layernorm", "attn_norm",
"self_attn.qkv_proj", "attn_qkv",
"self_attn.o_proj", "attn_output",
"mlp.down_proj", "ffn_down",
"mlp.gate_up_proj", "ffn_up",
"post_attention_layernorm", "ffn_norm",
}
}
type ropeFactor []float32
func (r ropeFactor) WriteTo(w io.Writer) (int64, error) {
return 0, binary.Write(w, binary.LittleEndian, r)
}

78
convert/convert_qwen2.go Normal file
View File

@@ -0,0 +1,78 @@
package convert
import "github.com/ollama/ollama/fs/ggml"
type qwen2Model struct {
ModelParameters
MaxPositionEmbeddings uint32 `json:"max_position_embeddings"`
HiddenSize uint32 `json:"hidden_size"`
HiddenLayers uint32 `json:"num_hidden_layers"`
IntermediateSize uint32 `json:"intermediate_size"`
NumAttentionHeads uint32 `json:"num_attention_heads"`
NumKeyValueHeads uint32 `json:"num_key_value_heads"`
RopeTheta float32 `json:"rope_theta"`
RopeScaling struct {
Type string `json:"type"`
Factor ropeFactor `json:"factor"`
OriginalMaxPositionEmbeddings uint32 `json:"original_max_position_embeddings"`
} `json:"rope_scaling"`
RMSNormEPS float32 `json:"rms_norm_eps"`
}
var _ ModelConverter = (*qwen2Model)(nil)
func (q *qwen2Model) KV(t *Tokenizer) ggml.KV {
kv := q.ModelParameters.KV(t)
kv["general.architecture"] = "qwen2"
kv["qwen2.block_count"] = q.HiddenLayers
kv["qwen2.context_length"] = q.MaxPositionEmbeddings
kv["qwen2.embedding_length"] = q.HiddenSize
kv["qwen2.feed_forward_length"] = q.IntermediateSize
kv["qwen2.attention.head_count"] = q.NumAttentionHeads
kv["qwen2.attention.head_count_kv"] = q.NumKeyValueHeads
kv["qwen2.rope.freq_base"] = q.RopeTheta
kv["qwen2.attention.layer_norm_rms_epsilon"] = q.RMSNormEPS
switch q.RopeScaling.Type {
case "":
// no scaling
case "yarn":
kv["qwen2.rope.scaling.type"] = q.RopeScaling.Type
kv["qwen2.rope.scaling.factor"] = q.RopeScaling.Factor
default:
panic("unknown rope scaling type")
}
return kv
}
func (q *qwen2Model) Tensors(ts []Tensor) []ggml.Tensor {
var out []ggml.Tensor
for _, t := range ts {
out = append(out, ggml.Tensor{
Name: t.Name(),
Kind: t.Kind(),
Shape: t.Shape(),
WriterTo: t,
})
}
return out
}
func (p *qwen2Model) Replacements() []string {
return []string{
"lm_head", "output",
"model.embed_tokens", "token_embd",
"model.layers", "blk",
"input_layernorm", "attn_norm",
"self_attn.k_proj", "attn_k",
"self_attn.v_proj", "attn_v",
"self_attn.q_proj", "attn_q",
"self_attn.o_proj", "attn_output",
"mlp.down_proj", "ffn_down",
"mlp.gate_proj", "ffn_gate",
"mlp.up_proj", "ffn_up",
"post_attention_layernorm", "ffn_norm",
"model.norm", "output_norm",
}
}

477
convert/convert_test.go Normal file
View File

@@ -0,0 +1,477 @@
package convert
import (
"bytes"
"crypto/sha256"
"encoding/binary"
"encoding/hex"
"encoding/json"
"flag"
"fmt"
"io"
"io/fs"
"log/slog"
"os"
"path/filepath"
"slices"
"strings"
"testing"
"golang.org/x/exp/maps"
"github.com/ollama/ollama/fs/ggml"
)
type tensorData struct {
Offsets []int `json:"data_offsets"`
Type string `json:"dtype"`
Shape []int `json:"shape"`
}
func convertFull(t *testing.T, fsys fs.FS) (*os.File, ggml.KV, ggml.Tensors) {
t.Helper()
f, err := os.CreateTemp(t.TempDir(), "f16")
if err != nil {
t.Fatal(err)
}
defer f.Close()
if err := ConvertModel(fsys, f); err != nil {
t.Fatal(err)
}
r, err := os.Open(f.Name())
if err != nil {
t.Fatal(err)
}
t.Cleanup(func() { r.Close() })
m, _, err := ggml.Decode(r, -1)
if err != nil {
t.Fatal(err)
}
if _, err := r.Seek(0, io.SeekStart); err != nil {
t.Fatal(err)
}
return r, m.KV(), m.Tensors()
}
func generateResultsJSON(t *testing.T, f *os.File, kv ggml.KV, tensors ggml.Tensors) map[string]string {
actual := make(map[string]string)
for k, v := range kv {
if s, ok := v.(json.Marshaler); !ok {
actual[k] = fmt.Sprintf("%v", v)
} else {
bts, err := json.Marshal(s)
if err != nil {
t.Fatal(err)
}
actual[k] = fmt.Sprintf("%x", sha256.Sum256(bts))
}
}
for _, tensor := range tensors.Items() {
sha256sum := sha256.New()
sr := io.NewSectionReader(f, int64(tensors.Offset+tensor.Offset), int64(tensor.Size()))
if _, err := io.Copy(sha256sum, sr); err != nil {
t.Fatal(err)
}
actual[tensor.Name] = hex.EncodeToString(sha256sum.Sum(nil))
}
return actual
}
func TestMain(m *testing.M) {
var level slog.Level
flag.TextVar(&level, "level", slog.LevelInfo, "log level")
flag.Parse()
slog.SetLogLoggerLevel(level)
os.Exit(m.Run())
}
func TestConvertModel(t *testing.T) {
cases := []string{
"Meta-Llama-3-8B-Instruct",
"Meta-Llama-3.1-8B-Instruct",
"Mistral-7B-Instruct-v0.2",
"Mixtral-8x7B-Instruct-v0.1",
"gemma-2b-it",
"gemma-2-2b-it",
// microsoft/Phi-3-mini-128-instruct@d548c233192db00165d842bf8edff054bb3212f8
"Phi-3-mini-128k-instruct",
"all-MiniLM-L6-v2",
"gemma-2-9b-it",
"Qwen2.5-0.5B-Instruct",
"c4ai-command-r-v01",
}
for i := range cases {
tt := cases[i]
t.Run(tt, func(t *testing.T) {
t.Parallel()
p := filepath.Join("testdata", tt)
if testing.Short() {
t.Skip("skipping in short mode")
} else if _, err := os.Stat(p); err != nil {
t.Skipf("%s not found", p)
}
f, kv, tensors := convertFull(t, os.DirFS(p))
actual := generateResultsJSON(t, f, kv, tensors)
expectFile, err := os.Open(filepath.Join("testdata", fmt.Sprintf("%s.json", tt)))
if err != nil {
t.Fatal(err)
}
var expect map[string]string
if err := json.NewDecoder(expectFile).Decode(&expect); err != nil {
t.Fatal(err)
}
keys := maps.Keys(expect)
slices.Sort(keys)
for _, k := range keys {
if v, ok := actual[k]; !ok {
t.Errorf("missing %s", k)
} else if v != expect[k] {
t.Errorf("unexpected %s: want %s, got %s", k, expect[k], v)
}
}
})
}
}
func TestConvertInvalidTensorNames(t *testing.T) {
f, err := os.CreateTemp(t.TempDir(), "testmodel")
if err != nil {
t.Fatal(err)
}
defer f.Close()
tempDir := t.TempDir()
td := map[string]*tensorData{}
offset := 4096
td["model.layers.0.self_attn.q_proj.weight"] = &tensorData{
Offsets: []int{0, offset},
Type: "F32",
Shape: []int{4096, 4096},
}
td["blk.0.attn_q.weight"] = &tensorData{
Offsets: []int{offset, offset * 2},
Type: "F32",
Shape: []int{4096, 4096},
}
generateSafetensorTestData(t, tempDir, td)
err = ConvertModel(os.DirFS(tempDir), f)
if err == nil || !strings.HasPrefix(err.Error(), "duplicate tensor name") {
t.Errorf("expected error but didn't get one")
}
}
func TestConvertInvalidDatatype(t *testing.T) {
f, err := os.CreateTemp(t.TempDir(), "testmodel")
if err != nil {
t.Fatal(err)
}
defer f.Close()
tempDir := t.TempDir()
td := map[string]*tensorData{}
offset := 4096 * 14336
td["model.layers.0.mlp.down_proj.weight"] = &tensorData{
Offsets: []int{0, offset},
Type: "I8",
Shape: []int{4096, 14336},
}
td["model.layers.0.mlp.down_proj.weight_format"] = &tensorData{
Offsets: []int{offset, offset},
Type: "U8",
Shape: []int{},
}
generateSafetensorTestData(t, tempDir, td)
err = ConvertModel(os.DirFS(tempDir), f)
if err == nil || err.Error() != "unsupported safetensors model" {
t.Errorf("expected error but didn't get one")
}
}
func generateSafetensorTestData(t *testing.T, tempDir string, tensorData map[string]*tensorData) {
data, err := json.Marshal(tensorData)
if err != nil {
t.Fatal(err)
}
var buf bytes.Buffer
l := int64(len(data))
err = binary.Write(&buf, binary.LittleEndian, l)
if err != nil {
t.Fatal(err)
}
_, err = buf.Write(data)
if err != nil {
t.Fatal(err)
}
fdata, err := os.Create(filepath.Join(tempDir, "model-00001-of-00001.safetensors"))
if err != nil {
t.Fatal(err)
}
defer fdata.Close()
_, err = fdata.Write(buf.Bytes())
if err != nil {
t.Fatal(err)
}
configData := `
{
"architectures": [
"LlamaForCausalLM"
]
}
`
f, err := os.Create(filepath.Join(tempDir, "config.json"))
if err != nil {
t.Fatal(err)
}
defer f.Close()
_, err = f.WriteString(configData)
if err != nil {
t.Fatal(err)
}
tokenizerData := `
{
}
`
f, err = os.Create(filepath.Join(tempDir, "tokenizer.json"))
if err != nil {
t.Fatal(err)
}
defer f.Close()
_, err = f.WriteString(tokenizerData)
if err != nil {
t.Fatal(err)
}
}
func TestConvertAdapter(t *testing.T) {
type AdapterCase struct {
Name string
BaseKV map[string]any
Expected map[string]string
}
cases := []AdapterCase{
{
Name: "discollama",
BaseKV: map[string]any{
"general.architecture": "llama",
"llama.attention.head_count": uint32(32),
"llama.attention.head_count_kv": uint32(8),
},
Expected: map[string]string{
"general.architecture": "llama",
"general.file_type": "1",
"general.parameter_count": "106496",
"general.type": "adapter",
"general.version": "v0.2",
"adapter.lora.alpha": "16",
"adapter.type": "lora",
"llama.attention.head_count": "32",
"llama.attention.head_count_kv": "8",
"blk.31.attn_q.weight.lora_a": "0eb3318b02cd313429bcc7621b539fdbb10240fea190c56c9e5f93fcd37a4e50",
"blk.31.attn_q.weight.lora_b": "0eb3318b02cd313429bcc7621b539fdbb10240fea190c56c9e5f93fcd37a4e50",
"blk.31.attn_v.weight.lora_a": "0eb3318b02cd313429bcc7621b539fdbb10240fea190c56c9e5f93fcd37a4e50",
"blk.31.attn_v.weight.lora_b": "071dcafe89df065d6e1c935ecb8fdf6479b3c202eb912e7da938597673ff5857",
},
},
}
for _, c := range cases {
t.Run(c.Name, func(t *testing.T) {
t.Parallel()
f, err := os.CreateTemp(t.TempDir(), "f16")
if err != nil {
t.Fatal(err)
}
defer f.Close()
tempDir := t.TempDir()
generateLoraTestData(t, tempDir)
if err = ConvertAdapter(os.DirFS(tempDir), f, c.BaseKV); err != nil {
t.Fatal(err)
}
r, err := os.Open(f.Name())
if err != nil {
t.Fatal(err)
}
defer r.Close()
m, _, err := ggml.Decode(r, -1)
if err != nil {
t.Fatal(err)
}
if _, err := r.Seek(0, io.SeekStart); err != nil {
t.Fatal(err)
}
actual := generateResultsJSON(t, r, m.KV(), m.Tensors())
keys := maps.Keys(c.Expected)
slices.Sort(keys)
for _, k := range keys {
if v, ok := actual[k]; !ok {
t.Errorf("missing %s", k)
} else if v != c.Expected[k] {
t.Errorf("unexpected %s: want %s, got %s", k, c.Expected[k], v)
}
}
})
}
}
func generateLoraTestData(t *testing.T, tempDir string) {
offset := 4096 * 8 * 4
td := map[string]*tensorData{"__metadata__": nil}
td["model.layers.31.self_attn.q_proj.lora_a"] = &tensorData{
Offsets: []int{0, offset},
Type: "F32",
Shape: []int{4096, 8},
}
td["model.layers.31.self_attn.q_proj.lora_b"] = &tensorData{
Offsets: []int{offset, offset * 2},
Type: "F32",
Shape: []int{8, 4096},
}
td["model.layers.31.self_attn.v_proj.lora_a"] = &tensorData{
Offsets: []int{offset * 2, offset * 3},
Type: "F32",
Shape: []int{4096, 8},
}
td["model.layers.31.self_attn.v_proj.lora_b"] = &tensorData{
Offsets: []int{offset * 3, offset*3 + 8*1024*4},
Type: "F32",
Shape: []int{8, 1024},
}
data, err := json.Marshal(td)
if err != nil {
t.Fatal(err)
}
var buf bytes.Buffer
l := int64(len(data))
err = binary.Write(&buf, binary.LittleEndian, l)
if err != nil {
t.Fatal(err)
}
_, err = buf.Write(data)
if err != nil {
t.Fatal(err)
}
// write some data for the tensors
ones := make([]float32, 4096*8)
for i := range ones {
ones[i] = float32(1)
}
for range 3 {
err = binary.Write(&buf, binary.LittleEndian, ones)
if err != nil {
t.Fatal(err)
}
}
ones = make([]float32, 1024*8)
for i := range ones {
ones[i] = float32(1)
}
err = binary.Write(&buf, binary.LittleEndian, ones)
if err != nil {
t.Fatal(err)
}
fdata, err := os.Create(filepath.Join(tempDir, "adapters.safetensors"))
if err != nil {
t.Fatal(err)
}
defer fdata.Close()
_, err = fdata.Write(buf.Bytes())
if err != nil {
t.Fatal(err)
}
configData := `
{
"adapter_path": "adapters-test",
"batch_size": 8,
"config": "config-tiny.json",
"data": "../discollama-completion",
"grad_checkpoint": null,
"iters": 1000,
"learning_rate": 1e-05,
"lora_layers": 1,
"lora_parameters": {
"rank": 8,
"alpha": 16,
"dropout": 0.0,
"scale": 2.0
},
"lr_schedule": null,
"max_seq_length": 2048,
"model": "/Users/pdevine/git/Meta-Llama-3-8B-Instruct",
"resume_adapter_file": null,
"save_every": 100,
"seed": 0,
"steps_per_eval": 200,
"steps_per_report": 10,
"test": false,
"test_batches": 500,
"train": true,
"use_dora": false,
"val_batches": 25
}
`
f, err := os.Create(filepath.Join(tempDir, "adapter_config.json"))
if err != nil {
t.Fatal(err)
}
defer f.Close()
_, err = f.WriteString(configData)
if err != nil {
t.Fatal(err)
}
}

58
convert/fs.go Normal file
View File

@@ -0,0 +1,58 @@
package convert
import (
"archive/zip"
"errors"
"io"
"io/fs"
"os"
"path/filepath"
)
type ZipReader struct {
r *zip.Reader
p string
// limit is the maximum size of a file that can be read directly
// from the zip archive. Files larger than this size will be extracted
limit int64
}
func NewZipReader(r *zip.Reader, p string, limit int64) fs.FS {
return &ZipReader{r, p, limit}
}
func (z *ZipReader) Open(name string) (fs.File, error) {
r, err := z.r.Open(name)
if err != nil {
return nil, err
}
defer r.Close()
if fi, err := r.Stat(); err != nil {
return nil, err
} else if fi.Size() < z.limit {
return r, nil
}
if !filepath.IsLocal(name) {
return nil, zip.ErrInsecurePath
}
n := filepath.Join(z.p, name)
if _, err := os.Stat(n); errors.Is(err, os.ErrNotExist) {
w, err := os.Create(n)
if err != nil {
return nil, err
}
defer w.Close()
if _, err := io.Copy(w, r); err != nil {
return nil, err
}
} else if err != nil {
return nil, err
}
return os.Open(n)
}

Some files were not shown because too many files have changed in this diff Show More