Patrick Devine
cb576a6b23
fix ref
2024-08-26 19:59:33 -07:00
Patrick Devine
15b7ff3a89
more comments
2024-08-26 19:56:45 -07:00
Patrick Devine
3ad243466b
comments
2024-08-26 19:54:06 -07:00
Patrick Devine
a13e583c49
cleanup whitespace
2024-08-26 18:09:21 -07:00
Patrick Devine
3c1994d0ee
small change
2024-08-26 18:07:59 -07:00
Patrick Devine
1b2da3829d
update the import docs
2024-08-26 18:04:46 -07:00
Daniel Hiltgen
0f92b19bec
Only enable numa on CPUs ( #6484 )
...
The numa flag may be having a performance impact on multi-socket systems with GPU loads
2024-08-24 17:24:50 -07:00
Daniel Hiltgen
69be940bf6
gpu: Group GPU Library sets by variant ( #6483 )
...
The recent cuda variant changes uncovered a bug in ByLibrary
which failed to group by common variant for GPU types.
2024-08-23 15:11:56 -07:00
Michael Yang
9638c24c58
Merge pull request #5446 from ollama/mxyng/faq
...
update faq
2024-08-23 14:05:59 -07:00
Michael Yang
bb362caf88
update faq
2024-08-23 13:37:21 -07:00
Patrick Devine
0c819e167b
convert safetensor adapters into GGUF ( #6327 )
2024-08-23 11:29:56 -07:00
Daniel Hiltgen
7a1e1c1caf
gpu: Ensure driver version set before variant ( #6480 )
...
During rebasing, the ordering was inverted causing the cuda version
selection logic to break, with driver version being evaluated as zero
incorrectly causing a downgrade to v11.
2024-08-23 11:21:12 -07:00
Daniel Hiltgen
0b03b9c32f
llm: Align cmake define for cuda no peer copy ( #6455 )
...
Define changed recently and this slipped through the cracks with the old
name.
2024-08-23 11:20:39 -07:00
Daniel Hiltgen
90ca84172c
Fix embeddings memory corruption ( #6467 )
...
* Fix embeddings memory corruption
The patch was leading to a buffer overrun corruption. Once removed though, parallism
in server.cpp lead to hitting an assert due to slot/seq IDs being >= token count. To
work around this, only use slot 0 for embeddings.
* Fix embed integration test assumption
The token eval count has changed with recent llama.cpp bumps (0.3.5+)
2024-08-22 14:51:42 -07:00
Michael Yang
6bd8a4b0a1
Merge pull request #6064 from ollama/mxyng/convert-llama3
...
convert: update llama conversion for llama3.1
2024-08-21 12:57:09 -07:00
Michael Yang
77903ab8b4
llama3.1
2024-08-21 11:49:31 -07:00
Michael Yang
e22286c9e1
Merge pull request #5365 from ollama/mxyng/convert-gemma2
...
convert gemma2
2024-08-21 11:48:43 -07:00
Michael Yang
107f695929
Merge pull request #4917 from ollama/mxyng/convert-bert
...
convert bert model from safetensors
2024-08-21 11:48:29 -07:00
Michael Yang
4ecc70d3b4
Merge pull request #6386 from zwwhdls/fix-new-layer
...
fix: chmod new layer to 0o644 when creating it
2024-08-21 10:58:45 -07:00
Michael Yang
3546bbd08c
convert gemma2
2024-08-20 17:27:51 -07:00
Michael Yang
beb49eef65
create bert models from cli
2024-08-20 17:27:34 -07:00
Michael Yang
5a28b9cf5f
bert
2024-08-20 17:27:34 -07:00
Daniel Hiltgen
a017cf2fea
Split rocm back out of bundle ( #6432 )
...
We're over budget for github's maximum release artifact size with rocm + 2 cuda
versions. This splits rocm back out as a discrete artifact, but keeps the layout so it can
be extracted into the same location as the main bundle.
2024-08-20 07:26:38 -07:00
Daniel Hiltgen
19e5a890f7
CI: remove directories from dist dir before upload step ( #6429 )
2024-08-19 15:19:21 -07:00
Daniel Hiltgen
f91c9e3709
CI: handle directories during checksum ( #6427 )
2024-08-19 13:48:45 -07:00
Daniel Hiltgen
2df6905ede
Merge pull request #6424 from dhiltgen/cuda_v12
...
Fix overlapping artifact name on CI
2024-08-19 12:11:58 -07:00
Daniel Hiltgen
d8be22e47d
Fix overlapping artifact name on CI
2024-08-19 12:07:18 -07:00
Daniel Hiltgen
652c273f0e
Merge pull request #5049 from dhiltgen/cuda_v12
...
Cuda v12
2024-08-19 11:14:24 -07:00
Daniel Hiltgen
88e7705079
Merge pull request #6402 from rick-github/numParallel
...
Override numParallel in pickBestPartialFitByLibrary() only if unset.
2024-08-19 11:07:22 -07:00
Daniel Hiltgen
f9e31da946
Review comments
2024-08-19 10:36:15 -07:00
Daniel Hiltgen
88bb9e3328
Adjust layout to bin+lib/ollama
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
3b19cdba2a
Remove Jetpack
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
927d98a6cd
Add windows cuda v12 + v11 support
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
f6c811b320
Enable cuda v12 flags
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
4fe3a556fa
Add cuda v12 variant and selection logic
...
Based on compute capability and driver version, pick
v12 or v11 cuda variants.
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
fc3b4cda89
Report GPU variant in log
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
d470ebe78b
Add Jetson cuda variants for arm
...
This adds new variants for arm64 specific to Jetson platforms
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
c7bcb00319
Wire up ccache and pigz in the docker based build
...
This should help speed things up a little
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
74d45f0102
Refactor linux packaging
...
This adjusts linux to follow a similar model to windows with a discrete archive
(zip/tgz) to cary the primary executable, and dependent libraries. Runners are
still carried as payloads inside the main binary
Darwin retain the payload model where the go binary is fully self contained.
2024-08-19 09:38:53 -07:00
Jeffrey Morgan
9fddef3731
server: limit upload parts to 16 ( #6411 )
2024-08-19 09:20:52 -07:00
Richard Lyons
885cf45087
Fix white space.
2024-08-18 03:07:16 +02:00
Richard Lyons
9352eeb752
Reset NumCtx.
2024-08-18 02:55:01 +02:00
Richard Lyons
0ad0e738cd
Override numParallel only if unset.
2024-08-18 01:43:26 +02:00
zwwhdls
bdc4308afb
fix: chmod new layer to 0o644 when creating it
...
Signed-off-by: zwwhdls <zww@hdls.me >
2024-08-16 11:43:19 +08:00
Daniel Hiltgen
d29cd4c2ed
Merge pull request #6381 from eust-w/main
...
fix: Add tooltip to system tray icon
2024-08-15 15:31:15 -07:00
eust-w
a84c05cf91
fix: Add tooltip to system tray icon
...
- Updated setIcon method to include tooltip text for the system tray icon.
- Added NIF_TIP flag and set the tooltip text using UTF16 encoding.
Resolves : #6372
2024-08-16 06:00:12 +08:00
Michael Yang
e3d7f32af7
Merge pull request #6363 from ollama/mxyng/fix-noprune
...
fix: noprune on pull
2024-08-15 12:20:38 -07:00
Michael Yang
3a75e74e34
only skip invalid json manifests
2024-08-15 10:29:14 -07:00
Michael Yang
237dccba1e
skip invalid manifest files
2024-08-14 16:55:45 -07:00
Michael Yang
b3f75fc812
fix noprune
2024-08-14 15:48:51 -07:00
Jeffrey Morgan
8200c371ae
add CONTRIBUTING.md ( #6349 )
2024-08-14 15:19:50 -07:00
longtao
0a8d6ea86d
Fix typo and improve readability ( #5964 )
...
* Fix typo and improve readability
Summary:
* Rename updatAvailableMenuID to updateAvailableMenuID
* Replace unused cmd parameter with _ in RunServer function
* Fix typos in comments
(cherry picked from commit 5b8715f0b04773369e8eb1f9e6737995a0ab3ba7)
* Update api/client.go
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2024-08-13 17:54:19 -07:00
Blake Mizerany
8e1050f366
server: reduce max connections used in download ( #6347 )
...
The previous value of 64 was WAY too high and unnecessary. It reached
diminishing returns and blew past it. This is a more reasonable number
for _most_ normal cases. For users on cloud servers with excellent
network quality, this will keep screaming for them, without hitting our
CDN limits. For users with relatively poor network quality, this will
keep them from saturating their network and causing other issues.
2024-08-13 16:47:35 -07:00
Bruce MacDonald
eda8a32a09
update chatml template format to latest in docs ( #6344 )
2024-08-13 16:39:18 -07:00
Michael Yang
a0a40aa20c
Merge pull request #6346 from ollama/mxyng/lint
2024-08-13 14:58:35 -07:00
Michael Yang
2697d7f5aa
lint
...
- fixes printf: non-constant format string in call to fmt.Printf
- fixes SA1032: arguments have the wrong order
- disables testifylint
2024-08-13 14:36:33 -07:00
Pamela Fox
1f32276178
Update openai.md to remove extra checkbox ( #6345 )
2024-08-13 13:36:05 -07:00
Daniel Hiltgen
4c4fe3f87f
Merge pull request #6343 from dhiltgen/revert_win_go_version
...
Go back to a pinned Go version
2024-08-13 11:53:49 -07:00
Daniel Hiltgen
feedf49c71
Go back to a pinned Go version
...
Go version 1.22.6 is triggering AV false positives, so go back to 1.22.5
2024-08-13 11:45:44 -07:00
royjhan
8b00a415ab
Load Embedding Model on Empty Input ( #6325 )
...
* load on empty input
* no load on invalid input
2024-08-13 10:19:56 -07:00
Michael Yang
01b80e9ffc
Merge pull request #5443 from ollama/mxyng/convert-phi3
...
add conversion for microsoft phi 3 mini/medium 4k, 128k
2024-08-12 15:47:58 -07:00
Michael Yang
bd5e432630
update import.md
2024-08-12 15:13:29 -07:00
Bruce MacDonald
aec77d6a05
support new "longrope" attention factor
2024-08-12 15:13:29 -07:00
Michael Yang
6ffb5cb017
add conversion for microsoft phi 3 mini/medium 4k, 128
2024-08-12 15:13:29 -07:00
Josh
f7e3b9190f
cmd: spinner progress for transfer model data ( #6100 )
2024-08-12 11:46:32 -07:00
Josh
980dd15f81
cmd: speed up gguf creates ( #6324 )
2024-08-12 11:46:09 -07:00
royjhan
01d544d373
OpenAI: Simplify input output in testing ( #5858 )
...
* simplify input output
* direct comp
* in line image
* rm error pointer type
* update response testing
* lint
2024-08-12 10:33:34 -07:00
Josh
1dc3ef3aa9
Revert "server: speed up single gguf creates ( #5898 )" ( #6323 )
...
This reverts commit 8aac22438e .
2024-08-12 09:57:51 -07:00
Josh
8aac22438e
server: speed up single gguf creates ( #5898 )
2024-08-12 09:28:55 -07:00
Jeffrey Morgan
15c2d8fe14
server: parallelize embeddings in API web handler instead of in subprocess runner ( #6220 )
...
For simplicity, perform parallelization of embedding requests in the API handler instead of offloading this to the subprocess runner. This keeps the scheduling story simpler as it builds on existing parallel requests, similar to existing text completion functionality.
2024-08-11 11:57:10 -07:00
Daniel Hiltgen
25906d72d1
llm: prevent loading too large models on windows ( #5926 )
...
Don't allow loading models that would lead to memory exhaustion (across vram, system memory and disk paging). This check was already applied on Linux but should also be applied on Windows as well.
2024-08-11 11:30:20 -07:00
CognitiveTech
023451ce47
add integration obook-summary ( #6305 )
2024-08-10 18:43:08 -07:00
Jesse Gross
9b53e39d8e
Merge pull request #6258 from coolljt0725/fix_typo
...
server/download.go: Fix a typo in log
2024-08-09 17:19:48 -07:00
Michael Yang
97fae2df95
Merge pull request #6235 from Nicholas42/fix_line_endings
...
Set *.png and *.ico to be treated as binary files.
2024-08-09 17:06:30 -07:00
Michael Yang
160d9d4900
Merge pull request #6171 from ollama/mxyng/remove-temp
...
removeall to remove non-empty temp dirs
2024-08-09 15:47:13 -07:00
Nicholas Schwab
d4e6407464
Restrict text files with explicit line feeds to *.go.
...
This partially reverts b732beba6a . It
seems like explicitly setting all files to use line feeds was done due
to issues with the go linter, hence it can be restricted to those files
(https://github.com/ollama/ollama/pull/6235#issuecomment-2278745953 ).
2024-08-09 23:14:13 +02:00
Daniel Hiltgen
b7f7d8cd15
Merge pull request #6291 from dhiltgen/no_sparse_fail
...
Don't hard fail on sparse setup error
2024-08-09 12:30:25 -07:00
Daniel Hiltgen
2fa1db4345
Don't hard fail on sparse setup error
...
It seems this can fail in some casees, but proceed
with the download anyway.
2024-08-09 12:16:19 -07:00
Daniel Hiltgen
71b0945fc6
Merge pull request #6290 from dhiltgen/intel_npe
...
Harden intel boostrap for nil pointers
2024-08-09 12:14:42 -07:00
Daniel Hiltgen
5bca2e60a7
Harden intel boostrap for nil pointers
2024-08-09 11:31:38 -07:00
Nicholas42
67472e0e89
Also flag *.icns as binary
2024-08-09 13:41:20 +02:00
Daniel Hiltgen
e9aa5117c4
Merge pull request #6133 from dhiltgen/cuda_repo
...
Adjust arm cuda repo paths
2024-08-08 12:33:35 -07:00
Daniel Hiltgen
2473bdba5e
Merge pull request #6182 from dhiltgen/more_patterns
...
Catch one more error log
2024-08-08 12:33:17 -07:00
Jesse Gross
7d1c0047fa
Merge pull request #6247 from ollama/jessegross/layers
...
Store layers inside manifests consistently as values.
2024-08-08 10:46:43 -07:00
Jitang Lei
7b61eba471
server/download.go: Fix a typo in log
...
Signed-off-by: Jitang Lei <leijitang@outlook.com >
2024-08-08 20:28:01 +08:00
Jesse Gross
7edaf6e7e8
manifest: Store layers inside manifests consistently as values.
...
Commit 1829fb61 ("manifest: Fix crash on startup when trying to clean up
unused files (#5840 )") changed the config layer stored in manifests
from a pointer to a value. This was done in order to avoid potential
nil pointer dereferences after it is deserialized from JSON in the
event that the field is missing.
This changes the Layers slice to also be stored by value. This enables
consistency in handling across the two objects.
2024-08-07 17:03:06 -07:00
Jesse Gross
97ec8cfd4e
image: Clarify argument to WriteManifest is config
...
When creating a model the config layer is appended to the list of
layers and then the last layer is used as the config when writing the
manifest. This change directly uses the config layer to write the
manifest. There is no behavior change but it is less error prone.
2024-08-07 16:58:42 -07:00
royjhan
5b3a21b578
add metrics to docs ( #6079 )
2024-08-07 14:43:44 -07:00
Kyle Kelley
ad0c19dde4
Use llama3.1 in tools example ( #5985 )
...
* Use llama3.1 in tools example
* Update api.md
2024-08-07 17:20:50 -04:00
Jesse Gross
69eb06c40e
Merge pull request #6145 from ollama/jessegross/bug5840
...
Fix crash on startup when trying to clean up unused files (#5840 )
2024-08-07 11:24:15 -07:00
Jesse Gross
1829fb61bd
manifest: Fix crash on startup when trying to clean up unused files ( #5840 )
...
Currently if the config field is missing in the manifest file (or
corrupted), Ollama will crash when it tries to read it. This can
happen at startup or when pulling new models.
This data is mostly just used for showing model information so we
can be tolerant of it not being present - it is not required to
run the models. Besides avoiding crashing, this also gives us the
ability to restructure the config in the future by pulling it
into the main manifest file.
2024-08-07 10:30:44 -07:00
Nicholas Schwab
ce67706037
Set *.png and *.ico to be treated as binary files.
...
The change b732beba6 makes all files text files and sets lf as eol. This
will automatically change all files to have lf if they are touched by
git (e.g. via git status). This change cannot be stashed and makes it
hard to work with the repo (rebase and checkout don't really work). See
also #6183 .
Here, we set the offending files (*.png and *.ico, but that might be
more in the future) to be treated as binary files and not be changed by
git.
2024-08-07 18:20:11 +02:00
Jesse Gross
685a53534b
manifest: Don't prune layers if we can't open a manifest file
...
If there is an error when opening a manifest file (corrupted, permission denied, etc.)
then the referenced layers will not be included in the list of active
layers. This causes them to be deleted when pruning happens at startup
or a model is pulled.
In such a situation, we should prefer to preserve data in the hopes that
it can be recovered rather than being agressive about deletion.
2024-08-06 23:11:19 -07:00
Jeffrey Morgan
de4fc29773
llm: reserve required number of slots for embeddings ( #6219 )
2024-08-06 23:20:49 -04:00
Jeffrey Morgan
e04c7012c2
update llama.cpp submodule to 1e6f6554 ( #6208 )
2024-08-06 15:11:45 -04:00
Chua Chee Seng
d4a7216c82
Fixed invalid option provided not displaying the invalid option name problem. ( #6202 )
2024-08-06 14:37:16 -04:00
Daniel Hiltgen
a4fdd03c3b
Merge pull request #6207 from dhiltgen/sparse_win
...
Ensure sparse files on windows during download
2024-08-06 11:06:06 -07:00
Daniel Hiltgen
fc85f50a2b
Ensure sparse files on windows during download
...
The file.Truncate call on windows will write the whole file
unless you set the sparse flag, leading to heavy I/O at the
beginning of download. This should improve our
I/O behavior on windows and put less stress on the users disk.
2024-08-06 10:58:08 -07:00
royjhan
86b907f82a
sort batch results ( #6189 )
2024-08-05 16:55:34 -07:00
Michael Yang
10d49bce70
Merge pull request #6190 from ollama/mxyng/fix-integration
...
fix concurrency test
2024-08-05 16:45:49 -07:00
Michael Yang
7ed367419e
fix concurrency test
2024-08-05 16:36:16 -07:00
Daniel Hiltgen
50ee8b5f56
Merge pull request #6186 from dhiltgen/numa
...
Implement linux NUMA detection
2024-08-05 15:20:06 -07:00
Michael Yang
03bdac0595
Merge pull request #6146 from ollama/mxyng/testing
...
use testing tempdirs
2024-08-05 13:00:05 -07:00
Daniel Hiltgen
f457d63400
Implement linux NUMA detection
...
If the system has multiple numa nodes, enable numa support in llama.cpp
If we detect numactl in the path, use that, else use the basic "distribute" mode.
2024-08-05 12:56:20 -07:00
Daniel Hiltgen
04210aa6dd
Catch one more error log
2024-08-05 09:28:07 -07:00
Michael Yang
43f9d92008
close pid file
2024-08-05 00:41:16 -07:00
Michael Yang
ed6c8bfe57
removeall to remove non-empty temp dirs
2024-08-05 00:41:16 -07:00
Michael Yang
39f2bc6bfc
Merge pull request #6167 from ollama/mxyng/line-feed
...
line feed
2024-08-05 00:06:28 -07:00
frob
b73b0940ef
Disable paging for journalctl ( #6154 )
...
Users using `journalctl` to get logs for issue logging sometimes don't realize that paging is causing information to be missed.
2024-08-05 00:10:53 -04:00
Michael Yang
6a07344786
line feed
2024-08-04 17:25:41 -07:00
sryu1
8b920f35a4
Add Gemma 2 2b ( #6151 )
2024-08-04 10:58:39 -04:00
Ivan Charapanau
4221e39867
Reference ollama integration with Harbor ( #6147 )
2024-08-02 17:03:46 -07:00
Michael Yang
a091fadfda
use testing tempdirs
2024-08-02 16:04:06 -07:00
Michael Yang
77ccbf04dc
Merge pull request #6128 from ollama/mxyng/lint
...
enable gofmt/gofumpt/goimports/tenv
2024-08-02 14:58:40 -07:00
royjhan
4addf6b587
Update OpenAI Compatibility Docs with /v1/completions ( #5311 )
...
* Update docs
* token bug corrected
* Update docs/openai.md
* Update docs/openai.md
* add suffix
* merge conflicts
* merge conflicts
2024-08-02 13:16:23 -07:00
royjhan
85c7f11170
Update docs ( #5310 )
2024-08-02 13:05:57 -07:00
Daniel Hiltgen
df3802a65f
Adjust arm cuda repo paths
...
Ubuntu distros fail to install cuda drivers since aarch64 isn't valid
2024-08-01 17:22:25 -07:00
Michael Yang
b732beba6a
lint
2024-08-01 17:06:06 -07:00
Kim Hallberg
ce1fb4447e
Fix models/{model} URL ( #6132 )
2024-08-01 16:31:47 -07:00
royjhan
558a54b098
Update OpenAI Compatibility Docs with /v1/embeddings ( #5470 )
...
* docs without usage
* no usage
* rm metric note
2024-08-01 16:00:29 -07:00
royjhan
ed52833bb1
Add to docs ( #5309 )
2024-08-01 15:58:13 -07:00
royjhan
6f133a0bdd
OpenAI: Add Usage to v1/embeddings ( #5886 )
...
* add prompt tokens to embed response
* rm slog
* metrics
* types
* prompt n
* clean up
* reset submodule
* add tokens to v1/embeddings
* separate usage
2024-08-01 15:49:37 -07:00
royjhan
f561eecfb8
Update OpenAI Compatibility Docs with /v1/models ( #5151 )
...
* OpenAI Docs
* Update docs/openai.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Remove newline
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2024-08-01 15:48:44 -07:00
Michael Yang
ff7c9060ec
Merge pull request #6115 from slouffka/fix-context
...
Fix context in /api/generate grows too much (#5980 ).
2024-08-01 15:13:59 -07:00
Michael Yang
0ff42e84b0
Merge pull request #4756 from ollama/mxyng/convert2
...
refactor convert
2024-08-01 14:16:30 -07:00
Vyacheslav Moskalev
8a9f946ca7
Refactor and format code.
2024-08-02 03:50:05 +07:00
Vyacheslav Moskalev
3b5210548e
Refactor code. Remove extra variable.
2024-08-01 19:56:15 +07:00
Vyacheslav Moskalev
b0c216584c
Better types and naming closer to style.
2024-08-01 19:43:44 +07:00
Vyacheslav Moskalev
49a5483139
Change the order of context and prompt.
2024-08-01 19:25:56 +07:00
Vyacheslav Moskalev
6bc5c13758
Fix extra context concatenation in generate handler ( #5980 ).
2024-08-01 15:45:58 +07:00
Michael Yang
3e614260af
Merge pull request #6109 from ollama/mxyng/fix-modelfile
...
fix modelfile message quotes
2024-07-31 17:05:43 -07:00
Michael Yang
d87b4a488e
fix modelfile message quotes
2024-07-31 16:52:09 -07:00
Michael Yang
4c14855ad7
Merge pull request #6106 from ollama/mxyng/default-sliding-window-attention
...
patches: phi3 optional sliding window attention
2024-07-31 16:12:06 -07:00
Blake Mizerany
dc77bbcfa4
server: fix json marshalling of downloadBlobPart ( #6108 )
2024-07-31 16:01:24 -07:00
Michael Yang
d8e2664c33
convert: fix parse functions
2024-07-31 15:58:55 -07:00
Michael Yang
eafc607abb
convert: only extract large files
2024-07-31 15:58:55 -07:00
Michael Yang
781fc2d576
Update convert/reader_safetensors.go
...
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2024-07-31 15:58:55 -07:00
Michael Yang
df993fa37b
comments
2024-07-31 15:58:55 -07:00
Michael Yang
5e9db9fb0b
refactor convert
2024-07-31 15:58:33 -07:00
Michael Yang
0f3271db88
patches: phi3 default sliding window attention
2024-07-31 14:58:34 -07:00
Michael Yang
6b252918fb
update convert test to check result data
2024-07-31 10:59:38 -07:00
Michael Yang
c4c84b7a0d
Merge pull request #5196 from ollama/mxyng/messages-2
...
include modelfile messages
2024-07-31 10:18:17 -07:00
Michael Yang
5c1912769e
Merge pull request #5473 from ollama/mxyng/environ
...
fix: environ lookup
2024-07-31 10:18:05 -07:00
Daniel Nguyen
71399aa682
Added BoltAI as a desktop UI for Ollama ( #6096 )
2024-07-31 08:44:58 -07:00
Jeffrey Morgan
463a8aa273
Create SECURITY.md
2024-07-30 21:01:12 -07:00
Michael
3579b4966a
Update README to include Firebase Genkit ( #6083 )
...
Firebase Genkit
2024-07-30 18:40:09 -07:00
Jeffrey Morgan
5d66578356
Update README.md
...
Better example for multi-modal input
2024-07-30 18:08:34 -07:00
jmorganca
afa8d6e9d5
patch gemma support
2024-07-30 18:07:29 -07:00
royjhan
1b44d873e7
Add Metrics to api\embed response ( #5709 )
...
* add prompt tokens to embed response
* rm slog
* metrics
* types
* prompt n
* clean up
* reset submodule
* update tests
* test name
* list metrics
2024-07-30 13:12:21 -07:00
Daniel Hiltgen
cef2c6054d
Merge pull request #5859 from dhiltgen/homogeneous_gpus
...
Prevent partial loading on mixed GPU brands
2024-07-30 11:06:42 -07:00
Daniel Hiltgen
345420998e
Prevent partial loading on mixed GPU brands
...
In mult-brand GPU setups, if we couldn't fully load the model we
would fall through the scheduler and mistakenly try to load across
a mix of brands. This makes sure we find the set of GPU(s) that
best fit for the partial load.
2024-07-30 11:00:55 -07:00
Kim Hallberg
0be8baad2b
Update and Fix example models ( #6065 )
...
* Update example models
* Remove unused README.md
2024-07-29 23:56:37 -07:00
Daniel Hiltgen
1a83581a8e
Merge pull request #5895 from dhiltgen/sched_faq
...
Better explain multi-gpu behavior
2024-07-29 14:25:41 -07:00
Daniel Hiltgen
37926eb991
Merge pull request #5927 from dhiltgen/high_cpu_count
...
Ensure amd gpu nodes are numerically sorted
2024-07-29 14:24:57 -07:00
Daniel Hiltgen
3d4634fdff
Merge pull request #5934 from dhiltgen/missing_cuda_repo
...
Report better error on cuda unsupported os/arch
2024-07-29 14:24:20 -07:00
royjhan
365431d406
return tool calls finish reason for openai ( #5995 )
...
* hot fix
* backend stream support
* clean up
* finish reason
* move to openai
2024-07-29 13:56:57 -07:00
Daniel Hiltgen
161e12cecf
Merge pull request #5932 from dhiltgen/win_font
...
Explain font problems on windows 10
2024-07-29 13:40:24 -07:00
Jeffrey Morgan
46e6327e0f
api: add stringifier for Tool ( #5891 )
2024-07-29 13:35:16 -07:00
Jeffrey Morgan
68ee42f995
update llama.cpp submodule to 6eeaeba1 ( #6039 )
2024-07-29 13:20:26 -07:00
Ikko Eltociear Ashimine
f26aef9a8b
docs: update README.md ( #6059 )
...
HuggingFace -> Hugging Face
2024-07-29 10:53:30 -07:00
Michael Yang
38d9036b59
Merge pull request #5992 from ollama/mxyng/save
...
fix: model save
2024-07-29 09:53:19 -07:00
Veit Heller
6f26e9322f
Fix typo in image docs ( #6041 )
2024-07-29 08:50:53 -07:00
Jeffrey Morgan
0e4d653687
upate to llama3.1 elsewhere in repo ( #6032 )
2024-07-28 19:56:02 -07:00
Michael
2c01610616
update readme to llama3.1 ( #5933 )
2024-07-28 14:21:38 -07:00
Tibor Schmidt
f3d7a481b7
feat: add support for min_p ( resolve #1142 ) ( #1825 )
2024-07-27 14:37:40 -07:00
Jeffrey Morgan
f2a96c7d77
llm: keep patch for llama 3 rope factors ( #5987 )
2024-07-26 15:20:52 -07:00
Daniel Hiltgen
e8a66680d1
Merge pull request #5705 from dhiltgen/win_errormode
...
Enable windows error dialog for subprocess
2024-07-26 14:49:34 -07:00
Michael Yang
079b2c3b03
Merge pull request #5999 from ollama/mxyng/fix-push
...
fix nil deref in auth.go
2024-07-26 14:28:34 -07:00
Blake Mizerany
750c1c55f7
server: fix race conditions during download ( #5994 )
...
This fixes various data races scattered throughout the download/pull
client where the client was accessing the download state concurrently.
This commit is mostly a hot-fix and will be replaced by a new client one
day soon.
Also, remove the unnecessary opts argument from downloadChunk.
2024-07-26 14:24:24 -07:00
Michael Yang
a622c47bd3
fix nil deref in auth.go
2024-07-26 14:14:48 -07:00
Michael Yang
ec4c35fe99
Merge pull request #5512 from ollama/mxyng/detect-stop
...
autodetect stop parameters from template
2024-07-26 13:48:23 -07:00
Michael Yang
a250c2cb13
display messages
2024-07-26 13:39:57 -07:00
Michael Yang
3d9de805b7
fix: model save
...
stop parameter is saved as a slice which is incompatible with modelfile
parsing
2024-07-26 13:23:06 -07:00
Michael Yang
15af558423
include modelfile messages
2024-07-26 11:40:11 -07:00
Jeffrey Morgan
f5e3939220
Update api.md ( #5968 )
2024-07-25 23:10:18 -04:00
Jeffrey Morgan
ae27d9dcfd
Update openai.md
2024-07-25 20:27:33 -04:00
Michael Yang
37096790a7
Merge pull request #5552 from ollama/mxyng/messages-docs
...
docs
2024-07-25 16:26:19 -07:00
Michael Yang
997c903884
Update docs/template.md
...
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2024-07-25 16:23:40 -07:00
Blake Mizerany
c8af3c2d96
server: reuse original download URL for images ( #5962 )
...
This changes the registry client to reuse the original download URL
it gets on the first redirect response for all subsequent requests,
preventing thundering herd issues when hot new LLMs are released.
2024-07-25 15:58:30 -07:00
Jeffrey Morgan
455e61170d
Update openai.md
2024-07-25 18:34:47 -04:00
royjhan
4de1370a9d
openai tools doc ( #5617 )
2024-07-25 18:34:06 -04:00
Jeffrey Morgan
bbf8f102ee
Revert "llm(llama): pass rope factors ( #5924 )" ( #5963 )
...
This reverts commit bb46bbcf5e .
2024-07-25 18:24:55 -04:00
Daniel Hiltgen
ce3c93b08f
Report better error on cuda unsupported os/arch
...
If we detect an NVIDIA GPU, but nvidia doesn't support the os/arch,
this will report a better error for the user and point them to docs
to self-install the drivers if possible.
2024-07-24 17:09:20 -07:00
Daniel Hiltgen
6c2129d5d0
Explain font problems on windows 10
2024-07-24 15:22:00 -07:00
Daniel Hiltgen
7c2a157ca4
Ensure amd gpu nodes are numerically sorted
...
For systems that enumerate over 10 CPUs the default lexicographical
sort order interleaves CPUs and GPUs.
2024-07-24 13:43:26 -07:00
Michael Yang
bb46bbcf5e
llm(llama): pass rope factors ( #5924 )
2024-07-24 16:05:59 -04:00
royjhan
ac33aa7d37
Fix Embed Test Flakes ( #5893 )
...
* float cmp
* increase tolerance
2024-07-24 11:15:46 -07:00
Daniel Hiltgen
830fdd2715
Better explain multi-gpu behavior
2024-07-23 15:16:38 -07:00
Ajay Chintala
a6cd8f6169
Update README.md to add LLMStack integration ( #5799 )
2024-07-23 14:40:23 -04:00
Daniel Hiltgen
c78089263a
Merge pull request #5864 from dhiltgen/bump_go
...
Bump Go patch version
2024-07-22 16:34:18 -07:00
Daniel Hiltgen
3e5ea035d5
Merge pull request #5757 from lreed-mdsol/lreed/bump-go-version-fix-vulnerabilities
...
bump go version to 1.22.5 to fix security vulnerabilities in docker
2024-07-22 16:32:43 -07:00
Daniel Hiltgen
5d604eec5b
Bump Go patch version
2024-07-22 16:16:28 -07:00
Josh
db0968f30c
fix dupe err message ( #5857 )
2024-07-22 15:48:15 -07:00
Daniel Hiltgen
e12fff8810
Enable windows error dialog for subprocess startup
...
Make sure if something goes wrong spawning the process, the user gets
enough info to be able to try to self correct, or at least file a bug
with details so we can fix it. Once the process starts, we immediately
change back to the recommended setting to prevent the blocking dialog.
This ensures if the model fails to load (OOM, unsupported model type,
etc.) the process will exit quickly and we can scan the stdout/stderr
of the subprocess for the reason to report via API.
2024-07-22 14:07:27 -07:00
Michael Yang
9b60a038e5
update api.md
2024-07-22 13:49:51 -07:00
Michael Yang
83a0cb8d88
docs
2024-07-22 13:38:09 -07:00
royjhan
c0648233f2
api embed docs ( #5282 )
2024-07-22 13:37:08 -07:00
Jeffrey Morgan
d835368eb8
convert: capture head_dim for mistral ( #5818 )
2024-07-22 16:16:22 -04:00
Michael Yang
85d9d73a72
comments
2024-07-22 11:49:03 -07:00
Michael Yang
78140a712c
cleanup tests
2024-07-22 11:49:03 -07:00
Michael Yang
1954ec5917
uint64
2024-07-22 11:49:02 -07:00
Michael Yang
0f1910129f
int
2024-07-22 11:30:07 -07:00
Michael Yang
e2c3f6b3e2
string
2024-07-22 11:27:52 -07:00
Michael Yang
8570c1c0ef
keepalive
2024-07-22 11:27:22 -07:00
Michael Yang
55cd3ddcca
bool
2024-07-22 11:27:21 -07:00
Michael Yang
66fe77f084
models
2024-07-22 11:26:12 -07:00
Michael Yang
d1a5227cad
origins
2024-07-22 11:25:30 -07:00
Michael Yang
4f1afd575d
host
2024-07-22 11:25:30 -07:00
Michael Yang
35b89b2eab
rfc: dynamic environ lookup
2024-07-22 11:25:30 -07:00
Daniel Hiltgen
5784c05397
Merge pull request #5854 from dhiltgen/win_exit_status
...
Refine error reporting for subprocess crash
2024-07-22 10:40:22 -07:00
Daniel Hiltgen
f14aa5435d
Merge pull request #5855 from dhiltgen/remove_max_vram
...
Remove no longer supported max vram var
2024-07-22 10:35:29 -07:00
Jeffrey Morgan
f8fedbda20
Update llama.cpp submodule commit to d94c6e0c ( #5805 )
2024-07-22 12:42:00 -04:00
Jeffrey Morgan
b3e5491e41
server: collect nested tool call objects when parsing ( #5824 )
2024-07-22 12:38:03 -04:00
Daniel Hiltgen
cc269ba094
Remove no longer supported max vram var
...
The OLLAMA_MAX_VRAM env var was a temporary workaround for OOM
scenarios. With Concurrency this was no longer wired up, and the simplistic
value doesn't map to multi-GPU setups. Users can still set `num_gpu`
to limit memory usage to avoid OOM if we get our predictions wrong.
2024-07-22 09:08:11 -07:00
Daniel Hiltgen
a3c20e3f18
Refine error reporting for subprocess crash
...
On windows, the exit status winds up being the search term many
users search for and end up piling in on issues that are unrelated.
This refines the reporting so that if we have a more detailed message
we'll suppress the exit status portion of the message.
2024-07-22 08:52:16 -07:00
Jeffrey Morgan
80ee9b5e47
Remove out of space test temporarily ( #5825 )
2024-07-21 00:22:11 -04:00
Jeffrey Morgan
5534f2cc6a
llm: consider head_dim in llama arch ( #5817 )
2024-07-20 21:48:12 -04:00
Daniel Hiltgen
d321297d8a
Merge pull request #5815 from dhiltgen/win_rocm_gfx_features
...
Adjust windows ROCm discovery
2024-07-20 16:02:55 -07:00
Daniel Hiltgen
06e5d74e34
Merge pull request #5506 from dhiltgen/sched_tests
...
Refine scheduler unit tests for reliability
2024-07-20 15:48:39 -07:00
Daniel Hiltgen
5d707e6fd5
Merge pull request #5583 from dhiltgen/integration_improvements
...
Fix context exhaustion integration test for small gpus
2024-07-20 15:48:21 -07:00
Daniel Hiltgen
283948c83b
Adjust windows ROCm discovery
...
The v5 hip library returns unsupported GPUs which wont enumerate at
inference time in the runner so this makes sure we align discovery. The
gfx906 cards are no longer supported so we shouldn't compile with that
GPU type as it wont enumerate at runtime.
2024-07-20 15:17:50 -07:00
Jeffrey Morgan
1475eab95f
add patch for tekken ( #5807 )
2024-07-20 13:41:21 -04:00
Jeffrey Morgan
20090f3172
preserve last assistant message ( #5802 )
2024-07-19 20:19:26 -07:00
Jeffrey Morgan
69a2d4ccff
Fix generate test flakyness ( #5804 )
2024-07-19 19:11:25 -07:00
Josh
e8b954c646
server: validate template ( #5734 )
...
add template validation to modelfile
2024-07-19 15:24:29 -07:00
royjhan
c57317cbf0
OpenAI: Function Based Testing ( #5752 )
...
* distinguish error forwarding
* more coverage
* rm comment
2024-07-19 11:37:12 -07:00
royjhan
51b2fd299c
adjust openai chat msg processing ( #5729 )
2024-07-19 11:19:20 -07:00
Michael Yang
d0634b1596
Merge pull request #5780 from ollama/mxyng/tools
...
fix parsing tool calls: break on unexpected eofs
2024-07-18 12:14:10 -07:00
Michael Yang
43606d6d6a
fix parsing tool calls
2024-07-18 12:08:11 -07:00
Jeffrey Morgan
70b1010fa5
server: check for empty tools array too ( #5779 )
2024-07-18 11:44:57 -07:00
Jeffrey Morgan
84e5721f3a
always provide content even if empty ( #5778 )
2024-07-18 11:28:19 -07:00
Jeffrey Morgan
319fb1ce03
server: only parse tool calls if tools are provided ( #5771 )
...
* server: only parse tool calls if tools are provided
* still set `resp.Message.Content`
2024-07-18 08:50:23 -07:00
Michael Yang
b255445557
marshal json automatically for some template values ( #5758 )
2024-07-17 15:35:11 -07:00
lreed
f02f83660c
bump go version to 1.22.5 to fix security vulnerabilities
2024-07-17 21:44:19 +00:00
Michael Yang
b23424bb3c
Merge pull request #5753 from ollama/mxyng/parse-tool-call
...
parse tool call as individual objects
2024-07-17 11:47:53 -07:00
Michael Yang
5fd6988126
parse tool call as individual objects
2024-07-17 11:19:04 -07:00
Michael Yang
5b82960df8
stub response ( #5750 )
2024-07-17 10:39:22 -07:00
Michael Yang
cc9a252d8c
Merge pull request #5732 from ollama/mxyng/cleanup
...
remove ToolCall from GenerateResponse
2024-07-17 10:26:54 -07:00
Pákozdi György
d281a6e603
add sidellama link ( #5702 )
2024-07-17 10:24:44 -07:00
royjhan
154f6f45d4
OpenAI: Support Tools ( #5614 )
...
* reopen pr
* tools
* remove tc from stream for now
* ID and Function
* openai expects arguments to be a string (#5739 )
* mutually exclusive content and tool calls
* clean up
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2024-07-16 20:52:59 -07:00
royjhan
0d41623b52
OpenAI: Add Suffix to v1/completions ( #5611 )
...
* add suffix
* remove todo
* remove TODO
* add to test
* rm outdated prompt tokens info md
* fix test
* fix test
2024-07-16 20:50:14 -07:00
Michael Yang
c279f96371
remove ToolCall from GenerateResponse
2024-07-16 15:22:49 -07:00
Michael Yang
499e87c9ba
Merge pull request #5730 from ollama/mxyng/cleanup
...
remove unneeded tool calls
2024-07-16 14:42:13 -07:00
Michael Yang
cd0853f2d5
Merge pull request #5207 from ollama/mxyng/suffix
...
add insert support to generate endpoint
2024-07-16 14:37:32 -07:00
Michael Yang
d290e87513
add suffix support to generate endpoint
...
this change is triggered by the presence of "suffix", particularly
useful for code completion tasks
2024-07-16 14:31:35 -07:00
Thorsten Sommer
97c20ede33
README: Added AI Studio to the list of UIs ( #5721 )
...
* Added AI Studio to the list of UIs
2024-07-16 14:24:27 -07:00
Michael Yang
5a83f79afd
remove unneeded tool calls
2024-07-16 13:48:45 -07:00
royjhan
987dbab0b0
OpenAI: /v1/embeddings compatibility ( #5285 )
...
* OpenAI v1 models
* Empty List Testing
* Add back envconfig
* v1/models docs
* Remove Docs
* OpenAI batch embed compatibility
* merge conflicts
* integrate with api/embed
* ep
* merge conflicts
* request tests
* rm resp test
* merge conflict
* merge conflict
* test fixes
* test fn renaming
* input validation for empty string
---------
Co-authored-by: jmorganca <jmorganca@gmail.com >
2024-07-16 13:36:08 -07:00
Michael Yang
a8388beb94
Merge pull request #5726 from ollama/mxyng/tools-templates
...
fix unmarshal type errors
2024-07-16 12:12:10 -07:00
Michael Yang
5afbb60fc4
fix unmarshal type errors
2024-07-16 11:39:34 -07:00
Jeffrey Morgan
4cb5d7decc
server: omit model system prompt if empty ( #5717 )
2024-07-16 11:09:00 -07:00
Michael Yang
8eac50dd4f
Merge pull request #5684 from ollama/mxyng/tests
...
add chat and generate tests with mock runner
2024-07-16 09:44:45 -07:00
Michael Yang
4a565cbf94
add chat and generate tests with mock runner
2024-07-16 09:39:31 -07:00
Michael Yang
64039df6d7
Merge pull request #5284 from ollama/mxyng/tools
...
tools
2024-07-15 18:03:37 -07:00
Jeffrey Morgan
7ac6d462ec
server: return empty slice on empty /api/embed request ( #5713 )
...
* server: return empty slice on empty `/api/embed` request
* fix tests
2024-07-15 17:39:44 -07:00
Michael Yang
ef5136a745
tools test
2024-07-15 17:18:21 -07:00
Daniel Hiltgen
8288ec8824
Merge pull request #5710 from dhiltgen/rocm_bump
...
Bump linux ROCm to 6.1.2
2024-07-15 15:32:18 -07:00
Michael Yang
d02bbebb11
tools
2024-07-15 15:26:16 -07:00
Daniel Hiltgen
224337b32f
Bump linux ROCm to 6.1.2
2024-07-15 15:10:22 -07:00
Jeffrey Morgan
9e35d9bbee
server: lowercase roles for compatibility with clients ( #5695 )
2024-07-15 13:55:57 -07:00
royjhan
b9f5e16c80
Introduce /api/embed endpoint supporting batch embedding ( #5127 )
...
* Initial Batch Embedding
* Revert "Initial Batch Embedding"
This reverts commit c22d54895a .
* Initial Draft
* mock up notes
* api/embed draft
* add server function
* check normalization
* clean up
* normalization
* playing around with truncate stuff
* Truncation
* Truncation
* move normalization to go
* Integration Test Template
* Truncation Integration Tests
* Clean up
* use float32
* move normalize
* move normalize test
* refactoring
* integration float32
* input handling and handler testing
* Refactoring of legacy and new
* clear comments
* merge conflicts
* touches
* embedding type 64
* merge conflicts
* fix hanging on single string
* refactoring
* test values
* set context length
* clean up
* testing clean up
* testing clean up
* remove function closure
* Revert "remove function closure"
This reverts commit 55d48c6ed1 .
* remove function closure
* remove redundant error check
* clean up
* more clean up
* clean up
2024-07-15 12:14:24 -07:00
royjhan
e9f7f36029
Support image input for OpenAI chat compatibility ( #5208 )
...
* OpenAI v1 models
* Refactor Writers
* Add Test
Co-Authored-By: Attila Kerekes
* Credit Co-Author
Co-Authored-By: Attila Kerekes <439392+keriati@users.noreply.github.com >
* Empty List Testing
* Use Namespace for Ownedby
* Update Test
* Add back envconfig
* v1/models docs
* Use ModelName Parser
* Test Names
* Remove Docs
* Clean Up
* Test name
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Add Middleware for Chat and List
* Testing Cleanup
* Test with Fatal
* Add functionality to chat test
* Support image input for OpenAI chat
* Decoding
* Fix message processing logic
* openai vision test
* type errors
* clean up
* redundant check
* merge conflicts
* merge conflicts
* merge conflicts
* flattening and smaller image
* add test
* support python and js SDKs and mandate prefixing
* clean up
---------
Co-authored-by: Attila Kerekes <439392+keriati@users.noreply.github.com >
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2024-07-13 22:07:45 -07:00
Patrick Devine
057d31861e
remove template ( #5655 )
2024-07-13 20:56:24 -07:00
jmorganca
f7ee012300
server: prepend system message in chat handler
2024-07-13 15:08:00 -07:00
Jeffrey Morgan
1ed0aa8fea
server: fix context, load_duration and total_duration fields ( #5676 )
...
* server: fix `contet`, `load_duration` and `total_duration` fields
* Update server/routes.go
2024-07-13 09:25:31 -07:00
Jeffrey Morgan
ef98803d63
llm: looser checks for minimum memory ( #5677 )
2024-07-13 09:20:05 -07:00
Jarek
02fea420e5
Add Kerlig AI, an app for macOS ( #5675 )
2024-07-13 08:33:46 -07:00
Michael Yang
22c5451fc2
fix system prompt ( #5662 )
...
* fix system prompt
* execute template when hitting previous roles
* fix tests
---------
Co-authored-by: jmorganca <jmorganca@gmail.com >
2024-07-12 21:04:44 -07:00
Michael Yang
ebc529cbb3
autodetect stop parameters from template
2024-07-12 16:01:23 -07:00
Patrick Devine
23ebbaa46e
Revert "remove template from tests"
...
This reverts commit 9ac0a7a50b .
2024-07-12 15:47:17 -07:00
Patrick Devine
9ac0a7a50b
remove template from tests
2024-07-12 15:41:31 -07:00
Michael Yang
e5c65a85df
Merge pull request #5653 from ollama/mxyng/collect-system
...
template: preprocess message and collect system
2024-07-12 12:32:34 -07:00
Jeffrey Morgan
33627331a3
app: also clean up tempdir runners on install ( #5646 )
2024-07-12 12:29:23 -07:00
Michael Yang
36c87c433b
template: preprocess message and collect system
2024-07-12 12:26:43 -07:00
Jeffrey Morgan
179737feb7
Clean up old files when installing on Windows ( #5645 )
...
* app: always clean up install dir; force close applications
* remove wildcard
* revert `CloseApplications`
* whitespace
* update `LOCALAPPDATA` var
2024-07-11 22:53:46 -07:00
Michael Yang
47353f5ee4
Merge pull request #5639 from ollama/mxyng/unaggregated-system
2024-07-11 17:48:50 -07:00
Josh
10e768826c
fix: quant err message ( #5616 )
2024-07-11 17:24:29 -07:00
Michael Yang
5056bb9c01
rename aggregate to contents
2024-07-11 17:00:26 -07:00
Jeffrey Morgan
c4cf8ad559
llm: avoid loading model if system memory is too small ( #5637 )
...
* llm: avoid loading model if system memory is too small
* update log
* Instrument swap free space
On linux and windows, expose how much swap space is available
so we can take that into consideration when scheduling models
* use `systemSwapFreeMemory` in check
---------
Co-authored-by: Daniel Hiltgen <daniel@ollama.com >
2024-07-11 16:42:57 -07:00
Michael Yang
57ec6901eb
revert embedded templates to use prompt/response
...
This reverts commit 19753c18c0 .
for compat. messages will be added at a later date
2024-07-11 14:49:35 -07:00
Michael Yang
e64f9ebb44
do no automatically aggregate system messages
2024-07-11 14:49:35 -07:00
Jeffrey Morgan
791650ddef
sched: only error when over-allocating system memory ( #5626 )
2024-07-11 00:53:12 -07:00
Jeffrey Morgan
efbf41ed81
llm: dont link cuda with compat libs ( #5621 )
2024-07-10 20:01:52 -07:00
Michael Yang
cf15589851
Merge pull request #5620 from ollama/mxyng/templates
...
update embedded templates
2024-07-10 17:16:24 -07:00
Michael Yang
19753c18c0
update embedded templates
2024-07-10 17:03:08 -07:00
Michael Yang
41be28096a
add system prompt to first legacy template
2024-07-10 17:03:08 -07:00
Michael Yang
37a570f962
Merge pull request #5612 from ollama/mxyng/mem
...
chatglm graph
2024-07-10 14:18:33 -07:00
Michael Yang
5a739ff4cb
chatglm graph
2024-07-10 13:43:47 -07:00
Jeffrey Morgan
4e262eb2a8
remove GGML_CUDA_FORCE_MMQ=on from build ( #5588 )
2024-07-10 13:17:13 -07:00
Daniel Hiltgen
4cfcbc328f
Merge pull request #5124 from dhiltgen/amd_windows
...
Wire up windows AMD driver reporting
2024-07-10 12:50:23 -07:00
Daniel Hiltgen
79292ff3e0
Merge pull request #5555 from dhiltgen/msvc_deps
...
Bundle missing CRT libraries
2024-07-10 12:50:02 -07:00
Daniel Hiltgen
8ea500441d
Merge pull request #5580 from dhiltgen/cuda_overhead
...
Detect CUDA OS overhead
2024-07-10 12:47:31 -07:00
Daniel Hiltgen
b50c818623
Merge pull request #5607 from dhiltgen/win_rocm_v6
...
Bump ROCm on windows to 6.1.2
2024-07-10 12:47:10 -07:00
Daniel Hiltgen
b99e750b62
Merge pull request #5605 from dhiltgen/merge_glitch
...
Remove duplicate merge glitch
2024-07-10 11:47:08 -07:00
Daniel Hiltgen
1f50356e8e
Bump ROCm on windows to 6.1.2
...
This also adjusts our algorithm to favor our bundled ROCm.
I've confirmed VRAM reporting still doesn't work properly so we
can't yet enable concurrency by default.
2024-07-10 11:01:22 -07:00
Daniel Hiltgen
22c81f62ec
Remove duplicate merge glitch
2024-07-10 09:01:33 -07:00
Daniel Hiltgen
73e2c8f68f
Fix context exhaustion integration test for small gpus
...
On the smaller GPUs, the initial model load of llama2 took over 30s (the
default timeout for the DoGenerate helper)
2024-07-09 16:24:14 -07:00
Daniel Hiltgen
f4408219e9
Refine scheduler unit tests for reliability
...
This breaks up some of the test scenarios to create a
more reliable set of tests, as well as adding a little more
coverage.
2024-07-09 16:00:08 -07:00
Daniel Hiltgen
2d1e3c3229
Merge pull request #5503 from dhiltgen/dual_rocm
...
Workaround broken ROCm p2p copy
2024-07-09 15:44:16 -07:00
royjhan
4918fae535
OpenAI v1/completions: allow stop token list ( #5551 )
...
* stop token parsing fix
* add stop test
2024-07-09 14:01:26 -07:00
royjhan
0aff67877e
separate request tests ( #5578 )
2024-07-09 13:48:31 -07:00
Daniel Hiltgen
f6f759fc5f
Detect CUDA OS Overhead
...
This adds logic to detect skew between the driver and
management library which can be attributed to OS overhead
and records that so we can adjust subsequent management
library free VRAM updates and avoid OOM scenarios.
2024-07-09 12:21:50 -07:00
Daniel Hiltgen
9544a57ee4
Merge pull request #5579 from dhiltgen/win_static_deps
...
Statically link c++ and thread lib on windows
2024-07-09 12:21:13 -07:00
Daniel Hiltgen
b51e3b63ac
Statically link c++ and thread lib
...
This makes sure we statically link the c++ and thread library on windows
to avoid unnecessary runtime dependencies on non-standard DLLs
2024-07-09 11:34:30 -07:00
Michael Yang
6bbbc50f10
Merge pull request #5440 from ollama/mxyng/messages-templates
...
update named templates
2024-07-09 09:36:32 -07:00
Michael Yang
9bbddc37a7
Merge pull request #5126 from ollama/mxyng/messages
...
update message processing
2024-07-09 09:20:44 -07:00
Jeffrey Morgan
e4ff73297d
server: fix model reloads when setting OLLAMA_NUM_PARALLEL ( #5560 )
...
* server: fix unneeded model reloads when setting `OLLAMA_NUM_PARALLEL`
* remove whitespace change
* undo some changes
2024-07-08 22:32:15 -07:00
Daniel Hiltgen
b44320db13
Bundle missing CRT libraries
...
Some users are experienging runner startup errors due
to not having these msvc redist libraries on their host
2024-07-08 18:24:21 -07:00
Daniel Hiltgen
0bacb30007
Workaround broken ROCm p2p copy
...
Enable the build flag for llama.cpp to use CPU copy for multi-GPU scenarios.
2024-07-08 09:40:52 -07:00
Jeffrey Morgan
53da2c6965
llm: remove ambiguous comment when putting upper limit on predictions to avoid infinite generation ( #5535 )
2024-07-07 14:32:05 -04:00
Jeffrey Morgan
d8def1ff94
llm: allow gemma 2 to context shift ( #5534 )
2024-07-07 13:41:51 -04:00
Jeffrey Morgan
571dc61955
Update llama.cpp submodule to a8db2a9c ( #5530 )
2024-07-07 13:03:09 -04:00
Jeffrey Morgan
0e09c380fc
llm: print caching notices in debug only ( #5533 )
2024-07-07 12:38:04 -04:00
Jeffrey Morgan
0ee87615c7
sched: don't error if paging to disk on Windows and macOS ( #5523 )
2024-07-06 22:01:52 -04:00
Jeffrey Morgan
f8241bfba3
gpu: report system free memory instead of 0 ( #5521 )
2024-07-06 19:35:04 -04:00
Jeffrey Morgan
4607c70641
llm: add -DBUILD_SHARED_LIBS=off to common cpu cmake flags ( #5520 )
2024-07-06 18:58:16 -04:00
jmorganca
c12f1c5b99
release: move mingw library cleanup to correct job
2024-07-06 16:12:29 -04:00
jmorganca
a08f20d910
release: remove unwanted mingw dll.a files
2024-07-06 15:21:15 -04:00
jmorganca
6cea036027
Revert "llm: only statically link libstdc++"
...
This reverts commit 5796bfc401 .
2024-07-06 15:10:48 -04:00
jmorganca
5796bfc401
llm: only statically link libstdc++
2024-07-06 14:06:20 -04:00
jmorganca
f1a379aa56
llm: statically link pthread and stdc++ dependencies in windows build
2024-07-06 12:54:02 -04:00
jmorganca
9ae146993e
llm: add GGML_STATIC flag to windows static lib
2024-07-06 03:27:05 -04:00
Jeffrey Morgan
e0348d3fe8
llm: add COMMON_DARWIN_DEFS to arm static build ( #5513 )
2024-07-05 22:42:42 -04:00
Jeffrey Morgan
2cc854f8cb
llm: fix missing dylibs by restoring old build behavior on Linux and macOS ( #5511 )
...
* Revert "fix cmake build (#5505 )"
This reverts commit 4fd5f3526a .
* llm: fix missing dylibs by restoring old build behavior
* crlf -> lf
2024-07-05 21:48:31 -04:00
Jeffrey Morgan
5304b765b2
llm: put back old include dir ( #5507 )
...
* llm: put back old include dir
* llm: update link paths for old submodule commits
2024-07-05 19:34:21 -04:00
Michael Yang
fb6cbc02fb
update named templates
2024-07-05 16:29:32 -07:00
Jeffrey Morgan
4fd5f3526a
fix cmake build ( #5505 )
2024-07-05 19:07:01 -04:00
Daniel Hiltgen
842f85f758
Merge pull request #5502 from dhiltgen/ci_fixes
...
Always go build in CI generate steps
2024-07-05 15:39:11 -07:00
Daniel Hiltgen
9d30f9f8b3
Always go build in CI generate steps
...
With the recent cgo changes, bugs can sneak through
if we don't make sure to `go build` all the permutations
2024-07-05 15:31:52 -07:00
Blake Mizerany
631cfd9e62
types/model: remove knowledge of digest ( #5500 )
...
This was leading to ambiguity and confusion in ollama.com, and is not
used anywhere in ollama at the moment. Once manifests are addressable by
digest, we can add this back in, and in a way that is more tailored to
the concept of addressing a manifest by digest.
2024-07-05 13:42:30 -07:00
Michael Yang
326363b3a7
no funcs
2024-07-05 13:17:25 -07:00
Michael Yang
ac7a842e55
fix model reloading
...
ensure runtime model changes (template, system prompt, messages,
options) are captured on model updates without needing to reload the
server
2024-07-05 13:17:25 -07:00
Michael Yang
2c3fe1fd97
comments
2024-07-05 13:17:24 -07:00
Michael Yang
269ed6e6a2
update message processing
2024-07-05 13:16:58 -07:00
Jeffrey Morgan
78fb33dd07
fix typo in cgo directives in llm.go ( #5501 )
2024-07-05 15:18:36 -04:00
Jeffrey Morgan
8f8e736b13
update llama.cpp submodule to d7fd29f ( #5475 )
2024-07-05 13:25:58 -04:00
Jeffrey Morgan
d89454de80
Use slot with cached prompt instead of least recently used ( #5492 )
...
* Use common prefix to select slot
* actually report `longest`
2024-07-05 12:32:47 -04:00
Daniel Hiltgen
af28b94533
Merge pull request #5469 from dhiltgen/prevent_system_oom
...
Prevent loading models larger than total memory
2024-07-05 08:22:20 -07:00
Jeffrey Morgan
e9188e971a
Fix assert on small embedding inputs ( #5491 )
...
* Fix assert on small embedding inputs
* Update llm/patches/09-pooling.diff
2024-07-05 11:20:57 -04:00
Daniel Hiltgen
78eddfc068
Merge pull request #4412 from dhiltgen/win_docs
...
Document older win10 terminal problems
2024-07-05 08:18:22 -07:00
Daniel Hiltgen
02c24d3d01
Merge pull request #5466 from dhiltgen/fix_clip_unicode
...
Fix clip model loading with unicode paths
2024-07-05 08:16:58 -07:00
Daniel Hiltgen
52abc8acb7
Document older win10 terminal problems
...
We haven't found a workaround, so for now recommend updating.
2024-07-03 17:32:14 -07:00
Jeffrey Morgan
4d71c559b2
fix error detection by limiting model loading error parsing ( #5472 )
2024-07-03 20:04:30 -04:00
Anatoli Babenia
0d16eb310e
fix: use envconfig.ModelsDir directly ( #4821 )
...
* Co-authored-by: Anatoli Babenia <anatoli@rainforce.org >
Co-authored-by: Maas Lalani <maas@lalani.dev >
2024-07-03 15:36:11 -07:00
Daniel Hiltgen
8072e205ff
Merge pull request #5447 from dhiltgen/fix_keepalive
...
Only set default keep_alive on initial model load
2024-07-03 15:34:38 -07:00
Daniel Hiltgen
955f2a4e03
Only set default keep_alive on initial model load
...
This change fixes the handling of keep_alive so that if client
request omits the setting, we only set this on initial load. Once
the model is loaded, if new requests leave this unset, we'll keep
whatever keep_alive was there.
2024-07-03 15:29:56 -07:00
Daniel Hiltgen
3c75113e37
Prevent loading models larger than total memory
...
Users may not realize the siny new model they're trying to load
fits on their disk, but can't load into system+GPU memory. Today
we crash, but with this fix, we'll give them a better error message
before even trying to load it.
2024-07-03 14:47:42 -07:00
Daniel Hiltgen
ccd7785859
Merge pull request #5243 from dhiltgen/modelfile_use_mmap
...
Fix use_mmap for modefiles
2024-07-03 13:59:42 -07:00
royjhan
3b5a4a77f3
Return Correct Prompt Eval Count Regardless of Cache Prompt ( #5371 )
...
* openai compatibility
* Revert "openai compatibility"
This reverts commit d3f98a811e .
* remove erroneous subtraction of prompt cache
2024-07-03 13:46:23 -07:00
Daniel Hiltgen
daed0634a9
Merge pull request #5467 from dhiltgen/bogus_cpu_mac_error
...
Fix corner cases on tmp cleaner on mac
2024-07-03 13:39:36 -07:00
Daniel Hiltgen
0d4dd707bc
Merge pull request #5465 from dhiltgen/better_cuda_logging
...
Better nvidia GPU discovery logging
2024-07-03 13:12:22 -07:00
Daniel Hiltgen
0e982bc1f4
Fix corner cases on tmp cleaner on mac
...
When ollama is running a long time, tmp cleaners can remove the
runners. This tightens up a few corner cases on arm macs where
we failed with "server cpu not listed in available servers map[]"
2024-07-03 13:10:14 -07:00
Daniel Hiltgen
6298f49816
Fix clip model loading with unicode paths
...
On windows, if the model dir contained unicode characters
clip models would fail to load. This fixes the file name
handling in clip.cpp to support utf16 on windows.
2024-07-03 12:46:36 -07:00
Daniel Hiltgen
ef757da2c9
Better nvidia GPU discovery logging
...
Refine the way we log GPU discovery to improve the non-debug
output, and report more actionable log messages when possible
to help users troubleshoot on their own.
2024-07-03 10:50:40 -07:00
Michael Yang
e5352297d9
Merge pull request #5448 from ollama/mxyng/fix-generate
...
use model template by default
2024-07-02 16:48:06 -07:00
Michael Yang
65a5040e09
fix generate template
2024-07-02 16:42:17 -07:00
royjhan
d626b99b54
OpenAI: v1/completions compatibility ( #5209 )
...
* OpenAI v1 models
* Refactor Writers
* Add Test
Co-Authored-By: Attila Kerekes
* Credit Co-Author
Co-Authored-By: Attila Kerekes <439392+keriati@users.noreply.github.com >
* Empty List Testing
* Use Namespace for Ownedby
* Update Test
* Add back envconfig
* v1/models docs
* Use ModelName Parser
* Test Names
* Remove Docs
* Clean Up
* Test name
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Add Middleware for Chat and List
* Completions Endpoint
* Testing Cleanup
* Test with Fatal
* Add functionality to chat test
* Rename function
* float types
* type cleanup
* cleaning
* more cleaning
* Extra test cases
* merge conflicts
* merge conflicts
* merge conflicts
* merge conflicts
* cleaning
* cleaning
---------
Co-authored-by: Attila Kerekes <439392+keriati@users.noreply.github.com >
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2024-07-02 16:01:45 -07:00
Michael Yang
dddb58a38b
Merge pull request #5051 from ollama/mxyng/capabilities
...
add model capabilities
2024-07-02 14:26:07 -07:00
Michael Yang
400056e154
Merge pull request #5420 from ollama/mxyng/insecure-path
...
err on insecure path
2024-07-02 14:03:23 -07:00
Daniel Hiltgen
d2f19024d0
Merge pull request #5442 from dhiltgen/concurrency_docs
...
Add windows radeon concurrency note
2024-07-02 12:47:47 -07:00
Daniel Hiltgen
69c04eecc4
Add windows radeon concurreny note
2024-07-02 12:46:14 -07:00
royjhan
996bb1b85e
OpenAI: /v1/models and /v1/models/{model} compatibility ( #5007 )
...
* OpenAI v1 models
* Refactor Writers
* Add Test
Co-Authored-By: Attila Kerekes
* Credit Co-Author
Co-Authored-By: Attila Kerekes <439392+keriati@users.noreply.github.com >
* Empty List Testing
* Use Namespace for Ownedby
* Update Test
* Add back envconfig
* v1/models docs
* Use ModelName Parser
* Test Names
* Remove Docs
* Clean Up
* Test name
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Add Middleware for Chat and List
* Testing Cleanup
* Test with Fatal
* Add functionality to chat test
* OpenAI: /v1/models/{model} compatibility (#5028 )
* Retrieve Model
* OpenAI Delete Model
* Retrieve Middleware
* Remove Delete from Branch
* Update Test
* Middleware Test File
* Function name
* Cleanup
* Test Update
* Test Update
---------
Co-authored-by: Attila Kerekes <439392+keriati@users.noreply.github.com >
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2024-07-02 11:50:56 -07:00
Daniel Hiltgen
422dcc3856
Merge pull request #5439 from dhiltgen/fix_centos_7_build
...
Switch ARM64 container image base to rocky 8
2024-07-02 11:01:15 -07:00
Daniel Hiltgen
020bd60ab2
Switch amd container image base to rocky 8
...
The centos 7 arm mirrors have disappeared due to the EOL 2 days
ago, and the vault sed workaround which works for x86 doesn't work for arm.
2024-07-02 10:34:47 -07:00
Daniel Hiltgen
8e277b72bb
Merge pull request #5438 from dhiltgen/fix_centos_7_build
...
Centos 7 EOL broke mirrors
2024-07-02 09:28:00 -07:00
Daniel Hiltgen
4f67b39d26
Centos 7 EOL broke mirrors
...
As of July 1st 2024: Could not resolve host: mirrorlist.centos.org
This is expected due to EOL dates.
2024-07-02 09:22:17 -07:00
Josh
2425281317
Merge pull request #5336 from ollama/jyan/from-errors
...
fix: trim spaces for FROM argument, don't trim inside of quotes
2024-07-01 16:32:46 -07:00
Josh
0403e9860e
Merge pull request #5421 from ollama/jyan/ver
...
fix: add unsupported architecture message for linux/windows
2024-07-01 16:32:14 -07:00
Josh Yan
33a65e3ba3
error
2024-07-01 16:04:13 -07:00
Michael Yang
88bcd79bb9
err on insecure path
2024-07-01 15:55:59 -07:00
Josh Yan
7e571f95f0
trimspace test case
2024-07-01 11:07:48 -07:00
Michael Yang
da8e2a0447
use kvs to detect embedding models
2024-07-01 10:47:43 -07:00
Michael Yang
a30915bde1
add capabilities
2024-07-01 10:47:43 -07:00
Michael Yang
58e3fff311
rename templates to template
2024-07-01 10:40:54 -07:00
Michael Yang
3f0b309ad4
remove ManifestV2
2024-07-01 10:40:54 -07:00
Daniel Hiltgen
e70610ef06
Merge pull request #5410 from dhiltgen/ctx_cleanup
...
Fix case for NumCtx
2024-07-01 09:54:20 -07:00
Daniel Hiltgen
dfded7e075
Merge pull request #5364 from dhiltgen/concurrency_docs
...
Document concurrent behavior and settings
2024-07-01 09:49:48 -07:00
Daniel Hiltgen
173b550438
Remove default auto from help message
...
This may confuse users thinking "auto" is an acceptable string - it must be numeric
2024-07-01 09:48:05 -07:00
Daniel Hiltgen
cff3f44f4a
Fix case for NumCtx
2024-07-01 09:43:59 -07:00
Josh Yan
26e4e66faf
updated parsefile test
2024-07-01 09:43:49 -07:00
Daniel Hiltgen
97c9e11768
Switch use_mmap to a pointer type
...
This uses nil as undefined for a cleaner implementation.
2024-07-01 08:44:59 -07:00
Daniel Hiltgen
3518aaef33
Merge pull request #4218 from dhiltgen/auto_parallel
...
Enable concurrency by default
2024-07-01 08:32:29 -07:00
RAPID ARCHITECT
1963c00201
Update README.md ( #5214 )
...
* Update README.md
Added Mesop example to web & desktop
* Update README.md
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2024-06-30 22:00:57 -04:00
Eduard
27402cb7a2
Update gpu.md ( #5382 )
...
Runs fine on a NVIDIA GeForce GTX 1050 Ti
2024-06-30 21:48:51 -04:00
Jeffrey Morgan
c1218199cf
Update api.md
2024-06-29 16:22:49 -07:00
Jeffrey Morgan
717f7229eb
Do not shift context for sliding window models ( #5368 )
...
* Do not shift context for sliding window models
* truncate prompt > 2/3 tokens
* only target gemma2
2024-06-28 19:39:31 -07:00
Daniel Hiltgen
aae56abb7c
Document concurrent behavior and settings
2024-06-28 13:15:57 -07:00
royjhan
5f034f5b63
Include Show Info in Interactive ( #5342 )
2024-06-28 13:15:52 -07:00
royjhan
b910fa9010
Ollama Show: Check for Projector Type ( #5307 )
...
* Check exists projtype
* Maintain Ordering
2024-06-28 11:30:16 -07:00
royjhan
6d4219083c
Update docs ( #5312 )
2024-06-28 09:58:14 -07:00
Michael Yang
1ed4f521c4
Merge pull request #5340 from ollama/mxyng/mem
...
gemma2 graph
2024-06-27 14:26:49 -07:00
Michael Yang
de2163dafd
gemma2 graph
2024-06-27 13:34:52 -07:00
Josh Yan
9bd00041fa
trim all params
2024-06-27 11:18:38 -07:00
Josh Yan
4e986a823c
unquote, trimp space
2024-06-27 10:59:15 -07:00
Michael
2cc7d05012
update readme for gemma 2 ( #5333 )
...
* update readme for gemma 2
2024-06-27 12:45:16 -04:00
Michael Yang
123a722a6f
zip: prevent extracting files into parent dirs ( #5314 )
2024-06-26 21:38:21 -07:00
Jeffrey Morgan
4d311eb731
llm: architecture patch ( #5316 )
2024-06-26 21:38:12 -07:00
Blake Mizerany
cb42e607c5
llm: speed up gguf decoding by a lot ( #5246 )
...
Previously, some costly things were causing the loading of GGUF files
and their metadata and tensor information to be VERY slow:
* Too many allocations when decoding strings
* Hitting disk for each read of each key and value, resulting in a
not-okay amount of syscalls/disk I/O.
The show API is now down to 33ms from 800ms+ for llama3 on a macbook pro
m3.
This commit also prevents collecting large arrays of values when
decoding GGUFs (if desired). When such keys are encountered, their
values are null, and are encoded as such in JSON.
Also, this fixes a broken test that was not encoding valid GGUF.
2024-06-24 21:47:52 -07:00
Blake Mizerany
2aa91a937b
cmd: defer stating model info until necessary ( #5248 )
...
This commit changes the 'ollama run' command to defer fetching model
information until it really needs it. That is, when in interactive mode.
It also removes one such case where the model information is fetch in
duplicate, just before calling generateInteractive and then again, first
thing, in generateInteractive.
This positively impacts the performance of the command:
; time ./before run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
./before run llama3 'hi' 0.02s user 0.01s system 2% cpu 1.168 total
; time ./before run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
./before run llama3 'hi' 0.02s user 0.01s system 2% cpu 1.220 total
; time ./before run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
./before run llama3 'hi' 0.02s user 0.01s system 2% cpu 1.217 total
; time ./after run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
./after run llama3 'hi' 0.02s user 0.01s system 4% cpu 0.652 total
; time ./after run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
./after run llama3 'hi' 0.01s user 0.01s system 5% cpu 0.498 total
; time ./after run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with or would you like to chat?
./after run llama3 'hi' 0.01s user 0.01s system 3% cpu 0.479 total
; time ./after run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
./after run llama3 'hi' 0.02s user 0.01s system 5% cpu 0.507 total
; time ./after run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
./after run llama3 'hi' 0.02s user 0.01s system 5% cpu 0.507 total
2024-06-24 20:14:03 -07:00
Daniel Hiltgen
ccef9431c8
Merge pull request #5205 from dhiltgen/modelfile_use_mmap
...
Fix use_mmap parsing for modelfiles
2024-06-21 16:30:36 -07:00
Daniel Hiltgen
642cee1342
Sort the ps output
...
Provide consistent ordering for the ps command - longest duration listed first
2024-06-21 15:59:41 -07:00
royjhan
9a9e7d83c4
Docs ( #5149 )
2024-06-21 15:52:09 -07:00
Daniel Hiltgen
9929751cc8
Disable concurrency for AMD + Windows
...
Until ROCm v6.2 ships, we wont be able to get accurate free memory
reporting on windows, which makes automatic concurrency too risky.
Users can still opt-in but will need to pay attention to model sizes otherwise they may thrash/page VRAM or cause OOM crashes.
All other platforms and GPUs have accurate VRAM reporting wired
up now, so we can turn on concurrency by default.
2024-06-21 15:45:05 -07:00
Daniel Hiltgen
17b7186cd7
Enable concurrency by default
...
This adjusts our default settings to enable multiple models and parallel
requests to a single model. Users can still override these by the same
env var settings as before. Parallel has a direct impact on
num_ctx, which in turn can have a significant impact on small VRAM GPUs
so this change also refines the algorithm so that when parallel is not
explicitly set by the user, we try to find a reasonable default that fits
the model on their GPU(s). As before, multiple models will only load
concurrently if they fully fit in VRAM.
2024-06-21 15:45:05 -07:00
Michael Yang
189a43caa2
Merge pull request #5206 from ollama/mxyng/quantize
...
fix: quantization with template
2024-06-21 13:44:34 -07:00
Michael Yang
e835ef1836
fix: quantization with template
2024-06-21 13:39:25 -07:00
Daniel Hiltgen
7e7749224c
Fix use_mmap parsing for modelfiles
...
Add the new tristate parsing logic for the code path for modelfiles,
as well as a unit test.
2024-06-21 12:27:19 -07:00
Daniel Hiltgen
c7c2f3bc22
Merge pull request #5194 from dhiltgen/linux_mmap_auto
...
Refine mmap default logic on linux
2024-06-20 11:44:08 -07:00
Daniel Hiltgen
54a79d6a8a
Merge pull request #5125 from dhiltgen/fedora39
...
Bump latest fedora cuda repo to 39
2024-06-20 11:27:24 -07:00
Daniel Hiltgen
5bf5aeec01
Refine mmap default logic on linux
...
If we try to use mmap when the model is larger than the system free space, loading is slower than the no-mmap approach.
2024-06-20 11:07:04 -07:00
Michael Yang
e01e535cbb
Merge pull request #5192 from ollama/mxyng/kv
...
handle asymmetric embedding KVs
2024-06-20 10:46:24 -07:00
Josh
0195d6a2f8
Merge pull request #5188 from ollama/jyan/tmpdir2
...
fix: skip os.removeAll() if PID does not exist
2024-06-20 10:40:59 -07:00
Michael Yang
8e0641a9bf
handle asymmetric embedding KVs
2024-06-20 09:57:27 -07:00
Josh Yan
662568d453
err!=nil check
2024-06-20 09:30:59 -07:00
Josh Yan
4ebb66c662
reformat error check
2024-06-20 09:23:43 -07:00
Josh Yan
23e899f32d
skip os.removeAll() if PID does not exist
2024-06-20 08:51:35 -07:00
royjhan
fedf71635e
Extend api/show and ollama show to return more model info ( #4881 )
...
* API Show Extended
* Initial Draft of Information
Co-Authored-By: Patrick Devine <pdevine@sonic.net >
* Clean Up
* Descriptive arg error messages and other fixes
* Second Draft of Show with Projectors Included
* Remove Chat Template
* Touches
* Prevent wrapping from files
* Verbose functionality
* Docs
* Address Feedback
* Lint
* Resolve Conflicts
* Function Name
* Tests for api/show model info
* Show Test File
* Add Projector Test
* Clean routes
* Projector Check
* Move Show Test
* Touches
* Doc update
---------
Co-authored-by: Patrick Devine <pdevine@sonic.net >
2024-06-19 14:19:02 -07:00
Daniel Hiltgen
97c59be653
Merge pull request #5074 from dhiltgen/app_log_rotation
...
Implement log rotation for tray app
2024-06-19 13:02:24 -07:00
Daniel Hiltgen
9d8a4988e8
Implement log rotation for tray app
2024-06-19 12:53:34 -07:00
Michael Yang
1ae0750a21
Merge pull request #5147 from ollama/mxyng/cleanup
...
remove confusing log message
2024-06-19 12:50:31 -07:00
Michael Yang
9d91e5e587
remove confusing log message
2024-06-19 11:14:11 -07:00
Daniel Hiltgen
96624aa412
Merge pull request #5072 from dhiltgen/windows_path
...
Move libraries out of users path
2024-06-19 09:13:39 -07:00
Daniel Hiltgen
10f33b8537
Merge pull request #5146 from dhiltgen/backout
...
Put back temporary intel GPU env var
2024-06-19 09:12:45 -07:00
Daniel Hiltgen
4a633cc295
Merge pull request #5145 from dhiltgen/bad_loads
...
Fix bad symbol load detection
2024-06-19 09:12:33 -07:00
Daniel Hiltgen
d34d88e417
Revert "Revert "gpu: add env var for detecting Intel oneapi gpus ( #5076 )""
...
This reverts commit 755b4e4fc2 .
2024-06-19 08:57:41 -07:00
Daniel Hiltgen
52ce350b7a
Fix bad symbol load detection
...
pointer deref's weren't correct on a few libraries, which explains
some crashes on older systems or miswired symlinks for discovery libraries.
2024-06-19 08:39:07 -07:00
Daniel Hiltgen
2abebb2cbe
Merge pull request #5128 from zhewang1-intc/fix_levelzero_empty_symbol_detect
...
Fix levelzero empty symbol detect
2024-06-19 08:33:16 -07:00
Blake Mizerany
380e06e5be
types/model: remove Digest
...
The Digest type in its current form is awkward to work with and presents
challenges with regard to how it serializes via String using the '-'
prefix.
We currently only use this in ollama.com, so we'll move our specific
needs around digest parsing and validation there.
2024-06-18 20:28:11 -07:00
Wang,Zhe
badf975e45
get real func ptr.
2024-06-19 09:00:51 +08:00
Wang,Zhe
755b4e4fc2
Revert "gpu: add env var for detecting Intel oneapi gpus ( #5076 )"
...
This reverts commit 163cd3e77c .
2024-06-19 08:59:58 +08:00
Daniel Hiltgen
1a1c99e334
Bump latest fedora cuda repo to 39
2024-06-18 17:13:54 -07:00
Michael Yang
21adf8b6d2
Merge pull request #5121 from ollama/mxyng/deepseekv2
...
deepseek v2 graph
2024-06-18 16:30:58 -07:00
Daniel Hiltgen
784bf88b0d
Wire up windows AMD driver reporting
...
This seems to be ROCm version, not actually driver version, but
it may be useful for toggling logic for VRAM reporting in the future
2024-06-18 16:22:47 -07:00
Michael Yang
e873841cbb
deepseek v2 graph
2024-06-18 15:35:12 -07:00
Daniel Hiltgen
26d0bf9236
Merge pull request #5117 from dhiltgen/fix_prediction
...
Handle models with divergent layer sizes
2024-06-18 11:36:51 -07:00
Daniel Hiltgen
359b15a597
Handle models with divergent layer sizes
...
The recent refactoring of the memory prediction assumed all layers
are the same size, but for some models (like deepseek-coder-v2) this
is not the case, so our predictions were significantly off.
2024-06-18 11:05:34 -07:00
Daniel Hiltgen
b55958a587
Merge pull request #5106 from dhiltgen/clean_logs
...
Tighten up memory prediction logging
2024-06-18 09:24:38 -07:00
Daniel Hiltgen
7784ca33ce
Tighten up memory prediction logging
...
Prior to this change, we logged the memory prediction multiple times
as the scheduler iterates to find a suitable configuration, which can be
confusing since only the last log before the server starts is actually valid.
This now logs once just before starting the server on the final configuration.
It also reports what library instead of always saying "offloading to gpu" when
using CPU.
2024-06-18 09:15:35 -07:00
Daniel Hiltgen
c9c8c98bf6
Merge pull request #5105 from dhiltgen/cuda_mmap
...
Adjust mmap logic for cuda windows for faster model load
2024-06-17 17:07:30 -07:00
Daniel Hiltgen
171796791f
Adjust mmap logic for cuda windows for faster model load
...
On Windows, recent llama.cpp changes make mmap slower in most
cases, so default to off. This also implements a tri-state for
use_mmap so we can detect the difference between a user provided
value of true/false, or unspecified.
2024-06-17 16:54:30 -07:00
Jeffrey Morgan
176d0f7075
Update import.md
2024-06-17 19:44:14 -04:00
Daniel Hiltgen
8ed51cac37
Merge pull request #5103 from dhiltgen/faster_win_build
...
Revert powershell jobs, but keep nvcc and cmake parallelism
2024-06-17 14:23:18 -07:00
Daniel Hiltgen
c9e6f0542d
Merge pull request #5069 from dhiltgen/ci_release
...
Implement custom github release action
2024-06-17 13:59:37 -07:00
Daniel Hiltgen
b0930626c5
Add back lower level parallel flags
...
nvcc supports parallelism (threads) and cmake + make can use -j,
while msbuild requires /p:CL_MPcount=8
2024-06-17 13:44:46 -07:00
Daniel Hiltgen
e890be4814
Revert "More parallelism on windows generate"
...
This reverts commit 0577af98f4 .
2024-06-17 13:32:46 -07:00
Daniel Hiltgen
b2799f111b
Move libraries out of users path
...
We update the PATH on windows to get the CLI mapped, but this has
an unintended side effect of causing other apps that may use our bundled
DLLs to get terminated when we upgrade.
2024-06-17 13:12:18 -07:00
Jeffrey Morgan
152fc202f5
llm: update llama.cpp commit to 7c26775 ( #4896 )
...
* llm: update llama.cpp submodule to `7c26775`
* disable `LLAMA_BLAS` for now
* `-DLLAMA_OPENMP=off`
2024-06-17 15:56:16 -04:00
Lei Jitang
4ad0d4d6d3
Fix a build warning ( #5096 )
...
Signed-off-by: Lei Jitang <leijitang@outlook.com >
2024-06-17 14:47:48 -04:00
Jeffrey Morgan
163cd3e77c
gpu: add env var for detecting Intel oneapi gpus ( #5076 )
...
* gpu: add env var for detecting intel oneapi gpus
* fix build error
2024-06-16 20:09:05 -04:00
Daniel Hiltgen
4c2c8f93dd
Merge pull request #5080 from dhiltgen/debug_intel_crash
...
Add some more debugging logs for intel discovery
2024-06-16 14:42:41 -07:00
Daniel Hiltgen
fd1e6e0590
Add some more debugging logs for intel discovery
...
Also removes an unused overall count variable
2024-06-16 07:42:52 -07:00
royjhan
89c79bec8c
Add ModifiedAt Field to /api/show ( #5033 )
...
* Add Mod Time to Show
* Error Handling
2024-06-15 20:53:56 -07:00
Jeffrey Morgan
c7b77004e3
docs: add missing powershell package to windows development instructions ( #5075 )
...
* docs: add missing instruction for powershell build
The powershell script for building Ollama on Windows now requires the `ThreadJob` module. Add this to the instructions and dependency list.
* Update development.md
2024-06-15 23:08:09 -04:00
Daniel Hiltgen
07d143f412
Merge pull request #5058 from coolljt0725/fix_build_warning
...
gpu: Fix build warning
2024-06-15 11:52:36 -07:00
Daniel Hiltgen
a12283e2ff
Implement custom github release action
...
This implements the release logic we want via gh cli
to support updating releases with rc tags in place and retain
release notes and other community reactions.
2024-06-15 11:36:56 -07:00
Daniel Hiltgen
4b0050cf0e
Merge pull request #5037 from dhiltgen/faster_win_build
...
More parallelism on windows generate
2024-06-15 08:03:05 -07:00
Daniel Hiltgen
0577af98f4
More parallelism on windows generate
...
Make the build faster
2024-06-15 07:44:55 -07:00
Daniel Hiltgen
17ce203a26
Merge pull request #4875 from dhiltgen/rocm_gfx900_workaround
...
Rocm gfx900 workaround
2024-06-15 07:38:58 -07:00
Daniel Hiltgen
d76555ffb5
Merge pull request #4874 from dhiltgen/rocm_v6_bump
...
Rocm v6 bump
2024-06-15 07:38:32 -07:00
Daniel Hiltgen
2786dff5d3
Merge pull request #4264 from dhiltgen/show_gpu_visible_settings
...
Centralize GPU configuration vars
2024-06-15 07:33:52 -07:00
Lei Jitang
225f0d1219
gpu: Fix build warning
...
Signed-off-by: Lei Jitang <leijitang@outlook.com >
2024-06-15 14:26:23 +08:00
Daniel Hiltgen
532db58311
Merge pull request #4972 from jayson-cloude/main
...
fix: "Skip searching for network devices"
2024-06-14 17:04:40 -07:00
Daniel Hiltgen
6be309e1bd
Centralize GPU configuration vars
...
This should aid in troubleshooting by capturing and reporting the GPU
settings at startup in the logs along with all the other server settings.
2024-06-14 15:59:10 -07:00
Daniel Hiltgen
da3bf23354
Workaround gfx900 SDMA bugs
...
Implement support for GPU env var workarounds, and leverage
this for the Vega RX 56 which needs
HSA_ENABLE_SDMA=0 set to work properly
2024-06-14 15:38:13 -07:00
Daniel Hiltgen
26ab67732b
Bump ROCm linux to 6.1.1
2024-06-14 15:37:54 -07:00
Daniel Hiltgen
45cacbaf05
Merge pull request #4517 from dhiltgen/gpu_incremental
...
Enhanced GPU discovery and multi-gpu support with concurrency
2024-06-14 15:35:00 -07:00
Daniel Hiltgen
17df6520c8
Remove mmap related output calc logic
2024-06-14 14:55:50 -07:00
Daniel Hiltgen
6f351bf586
review comments and coverage
2024-06-14 14:55:50 -07:00
Daniel Hiltgen
ff4f0cbd1d
Prevent multiple concurrent loads on the same gpus
...
While models are loading, the VRAM metrics are dynamic, so try
to load on a GPU that doesn't have a model actively loading, or wait
to avoid races that lead to OOMs
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
fc37c192ae
Refine CPU load behavior with system memory visibility
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
434dfe30c5
Reintroduce nvidia nvml library for windows
...
This library will give us the most reliable free VRAM reporting on windows
to enable concurrent model scheduling.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
4e2b7e181d
Refactor intel gpu discovery
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
48702dd149
Harden unload for empty runners
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
68dfc6236a
refined test timing
...
adjust timing on some tests so they don't timeout on small/slow GPUs
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
5e8ff556cb
Support forced spreading for multi GPU
...
Our default behavior today is to try to fit into a single GPU if possible.
Some users would prefer the old behavior of always spreading across
multiple GPUs even if the model can fit into one. This exposes that
tunable behavior.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
6fd04ca922
Improve multi-gpu handling at the limit
...
Still not complete, needs some refinement to our prediction to understand the
discrete GPUs available space so we can see how many layers fit in each one
since we can't split one layer across multiple GPUs we can't treat free space
as one logical block
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
206797bda4
Fix concurrency integration test to work locally
...
This worked remotely but wound up trying to spawn multiple servers
locally which doesn't work
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
43ed358f9a
Refine GPU discovery to bootstrap once
...
Now that we call the GPU discovery routines many times to
update memory, this splits initial discovery from free memory
updating.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
b32ebb4f29
Use DRM driver for VRAM info for amd
...
The amdgpu drivers free VRAM reporting omits some other apps, so leverage the
upstream DRM driver which keeps better tabs on things
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
fb9cdfa723
Fix server.cpp for the new cuda build macros
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
efac488675
Revert "Limit GPU lib search for now ( #4777 )"
...
This reverts commit 476fb8e892 .
2024-06-14 14:51:40 -07:00
Jeffrey Morgan
6b800aa7b7
openai: do not set temperature to 0 when setting seed ( #5045 )
2024-06-14 13:43:56 -07:00
Jeffrey Morgan
dd7c9ebeaf
server: longer timeout in TestRequests ( #5046 )
2024-06-14 09:48:25 -07:00
Patrick Devine
4dc7fb9525
update 40xx gpu compat matrix ( #5036 )
2024-06-13 17:10:33 -07:00
Daniel Hiltgen
c39761c552
Merge pull request #5032 from dhiltgen/actually_skip
...
Actually skip PhysX on windows
2024-06-13 13:26:09 -07:00
Daniel Hiltgen
aac367636d
Actually skip PhysX on windows
2024-06-13 13:17:19 -07:00
Michael Yang
15a687ae4b
Merge pull request #5031 from ollama/mxyng/fix-multibyte-utf16
...
fix: multibyte utf16
2024-06-13 13:14:55 -07:00
Michael Yang
d528e1af75
fix utf16 for multibyte runes
2024-06-13 13:07:42 -07:00
Michael Yang
cd234ce22c
parser: add test for multibyte runes
2024-06-13 13:07:42 -07:00
Patrick Devine
94618b2365
add OLLAMA_MODELS to envconfig ( #5029 )
2024-06-13 12:52:03 -07:00
Jeffrey Morgan
1fd236d177
server: remove jwt decoding error ( #5027 )
2024-06-13 11:21:15 -07:00
Michael Yang
e87fc7200d
Merge pull request #5025 from ollama/mxyng/revert-parser-scan
...
Revert "proper utf16 support"
2024-06-13 10:31:25 -07:00
Michael Yang
20b9f8e6f4
Revert "proper utf16 support"
...
This reverts commit 66ab48772f .
this change broke utf-8 scanning of multi-byte runes
2024-06-13 10:22:16 -07:00
Patrick Devine
c69bc19e46
move OLLAMA_HOST to envconfig ( #5009 )
2024-06-12 18:48:16 -04:00
Michael Yang
bba5d177aa
Merge pull request #5004 from ollama/mxyng/fix-templates
...
fix: multiple templates when creating from model
2024-06-12 14:39:29 -07:00
Michael Yang
c16f8af911
fix: multiple templates when creating from model
...
multiple templates may appear in a model if a model is created from
another model that 1) has an autodetected template and 2) defines a
custom template
2024-06-12 13:35:49 -07:00
Michael Yang
217f60c3d9
Merge pull request #4987 from ollama/mxyng/revert-byte-order
...
Revert "Merge pull request #4938 from ollama/mxyng/fix-byte-order"
2024-06-11 16:04:20 -07:00
Michael Yang
7bdcd1da94
Revert "Merge pull request #4938 from ollama/mxyng/fix-byte-order"
...
This reverts commit f5f245cc15 , reversing
changes made to 94d37fdcae .
this change broke gguf v2 which is incorrectly detected as big endian
2024-06-11 15:56:17 -07:00
Jeffrey Morgan
ead259d877
llm: fix seed value not being applied to requests ( #4986 )
2024-06-11 14:24:41 -07:00
James Montgomery
2ff45d571d
Add Ollama-hpp to Community Libraries in README. ( #4983 )
2024-06-11 11:15:05 -07:00
jayson-cloude
157f09acdf
fix: "Skip searching for network devices"
...
On an Ubuntu 24.04 computer with vmware installed, the sudo lshw command will get stuck. "Network interfaces" is always displayed
2024-06-11 16:11:35 +08:00
Michael Yang
0f3cf1d42e
Merge pull request #4715 from ollama/mxyng/utf16-parser
...
proper utf16 support
2024-06-10 11:41:29 -07:00
Michael Yang
5bc029c529
Merge pull request #4921 from ollama/mxyng/import-md
...
update import.md
2024-06-10 11:41:09 -07:00
Michael Yang
e9a9c6a8e8
Merge pull request #4965 from ollama/mxyng/skip-layer-remove
...
fix: skip removing layers that no longer exist
2024-06-10 11:40:03 -07:00
Michael Yang
515f497e6d
fix: skip removing layers that no longer exist
2024-06-10 11:32:19 -07:00
Michael Yang
b27268aaef
add test
2024-06-10 11:32:15 -07:00
Michael Yang
f5f245cc15
Merge pull request #4938 from ollama/mxyng/fix-byte-order
...
fix parsing big endian gguf
2024-06-10 09:38:12 -07:00
Jim Scardelis
94d37fdcae
fix: examples/langchain-python-rag-privategpt/requirements.txt ( #3382 )
2024-06-09 10:58:09 -07:00
Craig Hughes
b84aea1685
Critical fix from llama.cpp JSON grammar to forbid un-escaped escape characters inside strings, which breaks parsing. ( #3782 )
2024-06-09 10:57:09 -07:00
Napuh
896495de7b
Add instructions to easily install specific versions on faq.md ( #4084 )
...
* Added instructions to easily install specific versions on faq.md
* Small typo
* Moved instructions on how to install specific version to linux.md
* Update docs/linux.md
* Update docs/linux.md
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2024-06-09 10:49:03 -07:00
dcasota
5528dd9d11
Error handling load_single_document() in ingest.py ( #4852 )
...
load_single_document() handles
- corrupt files
- empty (zero byte) files
- unsupported file extensions
2024-06-09 10:41:07 -07:00
Jeffrey Morgan
943172cbf4
Update api.md
2024-06-08 23:04:32 -07:00
Nischal Jain
85169e8d6f
Added headless-ollama ( #4612 )
2024-06-08 18:51:16 -07:00
Jeffrey Morgan
34f142797a
llm: always add bos token to prompt ( #4941 )
...
* fix embedding by adding fixes from llama.cpp upstream
* remove assert
---------
Co-authored-by: Jesper Ek <deadbeef84@gmail.com >
2024-06-08 18:47:10 -07:00
Erhan
46a7f1e74a
Update README.md with LangChainRust ( #4854 )
2024-06-08 17:29:36 -07:00
Michael Yang
620d5c569e
fix parsing big endian gguf
2024-06-08 12:35:26 -07:00
Michael Yang
b9ce7bf75e
update import.md
2024-06-07 16:45:15 -07:00
Daniel Hiltgen
cddc63381c
Merge pull request #4909 from dhiltgen/oneapi_disable
...
Add ability to skip oneapi generate
2024-06-07 14:07:15 -07:00
Michael Yang
385a32ecb5
Merge pull request #4910 from ollama/mxyng/detect-chat-template
...
fix create model when template detection errors
2024-06-07 11:07:39 -07:00
Michael Yang
030e765e76
fix create model when template detection errors
2024-06-07 10:51:35 -07:00
Daniel Hiltgen
ab8c929e20
Add ability to skip oneapi generate
...
This follows the same pattern for cuda and rocm to allow
disabling the build even when we detect the dependent libraries
2024-06-07 08:32:49 -07:00
Jeffrey Morgan
ce0dc33cb8
llm: patch to fix qwen 2 temporarily on nvidia ( #4897 )
2024-06-06 23:14:33 -07:00
Michael Yang
78f81fc0e5
Merge pull request #4800 from ollama/mxyng/detect-chat-template
...
detect chat template from KV
2024-06-06 16:17:18 -07:00
Michael Yang
9b6c2e6eb6
detect chat template from KV
2024-06-06 16:03:47 -07:00
royjhan
1a29e9a879
API app/browser access ( #4879 )
...
* API app/browser access
* Add tauri (resolves #2291 , #4791 , #3799 , #4388 )
2024-06-06 15:19:03 -07:00
royjhan
4bf1da4944
Separate ListResponse and ModelResponse for api/tags vs api/ps ( #4842 )
...
* Remove false time fields
* Struct Separation for List and Process
* Remove Marshaler
2024-06-06 10:11:45 -07:00
Blake Mizerany
de5beb06b3
server: skip blob verification for already verified blobs
2024-06-05 16:39:11 -07:00
Sam
98e65929dc
docs(tools): add gollama ( #4829 )
2024-06-05 14:13:39 -07:00
Michael Yang
66ab48772f
proper utf16 support
2024-06-05 13:11:50 -07:00
Michael Yang
22fcf8f7de
Merge pull request #3737 from ollama/mxyng/modelname-4
...
update create handler to use model.Name
2024-06-05 12:05:05 -07:00
royjhan
28c7813ac4
API PS Documentation ( #4822 )
...
* API PS Documentation
2024-06-05 11:06:53 -07:00
Kartikeya Mishra
1d8616d30f
docs: update to add LLocal.in to web & desktop integrations ( #4719 )
2024-06-04 14:43:59 -07:00
Michael Yang
d61ef8b954
update create handler to use model.Name
2024-06-04 13:28:25 -07:00
Michael Yang
89d9900152
Merge pull request #4570 from ollama/mxyng/slices
...
lint some of the things
2024-06-04 13:27:05 -07:00
Michael
4a048715b6
local wording was confusing people
...
local wording was confusing people -- Ollama runs on cloud providers
2024-06-04 13:25:25 -07:00
Michael Yang
6297f85606
gofmt, goimports
2024-06-04 13:20:24 -07:00
Michael Yang
ed56428dd7
warn on intrange, usestdlibvars
2024-06-04 11:52:48 -07:00
Michael Yang
ad40b92b6a
disable intrange
2024-06-04 11:35:30 -07:00
Michael Yang
8ce4032e72
more lint
2024-06-04 11:13:30 -07:00
Michael Yang
42660466f8
no usestdlibvars
2024-06-04 11:13:30 -07:00
Michael Yang
e919f6811f
lint windows
2024-06-04 11:13:30 -07:00
Michael Yang
bf7edb0d5d
lint linux
2024-06-04 11:13:30 -07:00
Michael Yang
f38353d6b9
stdin.fd
2024-06-04 11:13:30 -07:00
Michael Yang
201d853fdf
nolintlint
2024-06-04 11:13:30 -07:00
Michael Yang
e40145a39d
lint
2024-06-04 11:13:30 -07:00
Michael Yang
c895a7d13f
some gocritic
2024-06-04 11:13:30 -07:00
Michael Yang
dad7a987ae
nosprintfhostport
2024-06-04 11:13:30 -07:00
Michael Yang
8ffb51749f
nolintlint
2024-06-04 11:13:30 -07:00
Michael Yang
55f6eba049
gofmt
2024-06-04 11:13:30 -07:00
Michael Yang
04f3c12bb7
replace x/exp/slices with slices
2024-06-04 11:13:30 -07:00
Shubham
60323e0805
add embed model command and fix question invoke ( #4766 )
...
* add embed model command and fix question invoke
* Update docs/tutorials/langchainpy.md
Co-authored-by: Kim Hallberg <hallberg.kim@gmail.com >
* Update docs/tutorials/langchainpy.md
---------
Co-authored-by: Kim Hallberg <hallberg.kim@gmail.com >
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2024-06-03 22:20:48 -07:00
Jeffrey Morgan
d4a86102fd
update welcome prompt in windows to llama3 ( #4779 )
2024-06-01 21:05:51 -07:00
Jeffrey Morgan
476fb8e892
Limit GPU lib search for now ( #4777 )
...
* fix oneapi errors on windows 10
2024-06-01 19:24:33 -07:00
Michael Yang
829ff87bd1
revert tokenize ffi ( #4761 )
...
* Revert "use `int32_t` for call to tokenize (#4738 )"
This reverts commit 763bb65dbb .
* Revert "vocab only"
This reverts commit bf54c845e9 .
* Revert "use ffi for tokenizing/detokenizing"
This reverts commit 26a00a0410 .
2024-05-31 18:54:21 -07:00
Josh
f6b622c4b3
Merge pull request #4733 from ollama/jyan/isvalidname
...
added IsValidNamespace function
2024-05-31 14:08:45 -07:00
Josh Yan
2e4da8eec2
added tests for IsValidNamespace
2024-05-31 11:48:07 -07:00
Jeffrey Morgan
763bb65dbb
use int32_t for call to tokenize ( #4738 )
...
* use `int32_t` for call to tokenize
* variable naming
* cleanup
* fix crash
2024-05-30 21:43:30 -07:00
Jeffrey Morgan
7ca9605f54
speed up tests by only building static lib ( #4740 )
2024-05-30 21:43:15 -07:00
Michael Yang
eb2c443a79
Merge pull request #4736 from ollama/mxyng/vocab-only
...
vocab only for tokenize
2024-05-30 17:21:00 -07:00
Michael Yang
278e25ea44
Merge pull request #4737 from ollama/mxyng/less-generate
...
only generate on relevant changes
2024-05-30 17:17:50 -07:00
Jeffrey Morgan
a50a87a7b8
partial offloading: allow flash attention and disable mmap ( #4734 )
...
* partial offloading: allow flash attention and disable mmap
* allow mmap with num_gpu=0
2024-05-30 16:58:01 -07:00
Michael Yang
98085015d5
only generate on relevant changes
2024-05-30 16:54:11 -07:00
Michael Yang
bf54c845e9
vocab only
2024-05-30 16:49:28 -07:00
Josh Yan
c365f195a8
directly use isvalidpart
2024-05-30 16:40:04 -07:00
Josh
e91d0ef737
Merge pull request #4728 from ollama/jyan/japanese
...
fixed japanese characters deleted at end of line
2024-05-30 16:25:12 -07:00
Jeffrey Morgan
22f5c12ced
Update llama.cpp submodule to 5921b8f0 ( #4731 )
...
* update llama.cpp submodule to `5921b8f089d3b7bda86aac5a66825df6a6c10603`
* add patch
2024-05-30 16:20:22 -07:00
Josh Yan
298c996e54
added IsValidNamespace function
2024-05-30 16:02:07 -07:00
Daniel Hiltgen
0fc0cfc6d2
Merge pull request #4594 from dhiltgen/doc_container_workarounds
...
Add isolated gpu test to troubleshooting
2024-05-30 13:10:54 -07:00
Josh Yan
914f68f021
replaced duplicate call with variable
2024-05-30 10:38:07 -07:00
Josh Yan
bd1d119ba9
fixed japanese characters deleted at end of line
2024-05-30 10:24:21 -07:00
Lei Jitang
a03be18189
Fix OLLAMA_LLM_LIBRARY with wrong map name and add more env vars to help message ( #4663 )
...
* envconfig/config.go: Fix wrong description of OLLAMA_LLM_LIBRARY
Signed-off-by: Lei Jitang <leijitang@outlook.com >
* serve: Add more env to help message of ollama serve
Add more enviroment variables to `ollama serve --help`
to let users know what can be configurated.
Signed-off-by: Lei Jitang <leijitang@outlook.com >
---------
Signed-off-by: Lei Jitang <leijitang@outlook.com >
2024-05-30 09:36:51 -07:00
Michael Yang
96bc232b43
Merge pull request #4413 from ollama/mxyng/name-check
...
check if name exists before create/pull/copy
2024-05-29 12:06:58 -07:00
Michael Yang
bca7b12284
Merge pull request #3718 from ollama/mxyng/modelname-3
...
update delete handler to use model.Name
2024-05-29 12:02:07 -07:00
Michael Yang
32cb1960c1
Merge pull request #4380 from ollama/mxyng/tokenize
...
use tokenize/detokenize
2024-05-29 12:00:59 -07:00
Michael Yang
de781b37c8
rm unused infill
2024-05-29 11:26:47 -07:00
Michael Yang
3e21799377
rm unused system prompt
2024-05-29 11:26:47 -07:00
Michael Yang
26a00a0410
use ffi for tokenizing/detokenizing
2024-05-29 11:26:47 -07:00
Daniel Hiltgen
646371f56d
Merge pull request #3278 from zhewang1-intc/rebase_ollama_main
...
Enabling ollama to run on Intel GPUs with SYCL backend
2024-05-28 16:30:50 -07:00
Jeffrey Morgan
1f5008544b
Update install.sh
2024-05-28 15:01:22 -07:00
Jeffrey Morgan
45cbfc5aee
fix wsl2 status check for nvidia cards ( #4689 )
2024-05-28 14:49:46 -07:00
Jeffrey Morgan
6d423b383b
Improve install experience on WSL2 and Linux ( #4653 )
2024-05-28 14:41:50 -07:00
Josh
ad897080a2
working on integration of multi-byte and multi-width runes ( #4549 )
...
* integrated runewidth for display management - fixed cursor movement for mutli-width char
* updated input and deletion of multi-byte chars
* fixed line history with some exceptions
* improved insert and add
* fixed issues with moving across lines
* end of line extra space tracking'
* saved changes
* fixed end of line issues with empty spaces
* worked some more
* worked on end of line
* fixed failed test
* fixed minor inserting bug
* fixed movement hotkeys
* adjusted hotkeys
* removed comments
* Update readline/buffer.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
* Update readline/buffer.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
* Update readline/buffer.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
* Update readline/buffer.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
* Update readline/buffer.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
* Update readline/buffer.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
* Update readline/buffer.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
* Update readline/buffer.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
* Update readline/buffer.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
* Update readline/buffer.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
* Update readline/buffer.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
* Update readline/buffer.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
* deleted comments and duplicate code
* removed duplicate code
* added comments, refactored add function to use addChar
* added helper to retrieve lineSpacing, renamed lineFlags for clarity
* fixed remove()
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2024-05-28 12:04:03 -07:00
Jeffrey Morgan
b7d316d98d
fix nvidia detection in install script ( #4683 )
2024-05-28 09:59:36 -07:00
Daniel Hiltgen
d7339fad52
Merge pull request #4682 from dhiltgen/more_time
...
Give the final model loading more time
2024-05-28 09:36:02 -07:00
Daniel Hiltgen
92c81e8117
Give the final model loading more time
...
On some systems, 1 minute isn't sufficient to finish the load after it
hits 100% This creates 2 distinct timers, although they're both set to
the same value for now so we can refine the timeouts further.
2024-05-28 09:08:10 -07:00
Tai
9db0996ed4
Add OllamaSpring Project to Readme ( #4672 )
...
* Add OllamaSpring Project to Readme
* Update README.md
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2024-05-27 19:58:26 -07:00
Orfeo Ciano
6f43898b17
Adds olpaka flutter client ( #4647 )
...
* Adds olpaka flutter client
* Update README.md
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2024-05-27 17:22:01 -07:00
Lei Jitang
7487229c34
llm/server.go: Fix 2 minor typos ( #4661 )
...
Signed-off-by: Lei Jitang <leijitang@outlook.com >
2024-05-27 17:21:10 -07:00
Rayan Mostovoi
8a8e7afa96
small fix on examples/python-simplechat/client.py to actually get a streamed response and get tokens printed as we receive it ( #4671 )
2024-05-27 17:19:20 -07:00
Jeffrey Morgan
c79f8c9c39
Ensure nvidia and nvidia_uvm kernel modules are loaded in install.sh script and at startup ( #4652 )
...
* ensure kernel modules are loaded in `install.sh` script and at startup
* indentation
* use `SUDO` variable
* restart if nouveau is detected
* consistent success message for AMD
2024-05-26 14:57:17 -07:00
Jeffrey Morgan
485016bfbb
Update install.sh
2024-05-26 11:46:00 -07:00
Daniel Hiltgen
0165ba1651
Merge pull request #4638 from dhiltgen/better_error
...
Report better warning on client closed abort of load
2024-05-25 14:32:28 -07:00
Daniel Hiltgen
c4209d6d21
Report better warning on client closed abort of load
...
If the client closes the connection before we finish loading the model
we abort, so lets make the log message clearer why to help users
understand this failure mode
2024-05-25 09:23:28 -07:00
Michael Yang
6adca97f37
Merge pull request #4619 from noxer/patch-1
...
Fix download retry issue
2024-05-24 17:21:57 -07:00
Michael Yang
9a3c8003c8
Merge pull request #4624 from ollama/mxyng/fix-5
...
fix q5_0, q5_1
2024-05-24 16:11:21 -07:00
Michael Yang
d51f15257c
Update llm/ggml.go
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2024-05-24 16:10:43 -07:00
Michael Yang
8f440d579a
fix q5_0, q5_1
2024-05-24 16:01:46 -07:00
Patrick Devine
4cc3be3035
Move envconfig and consolidate env vars ( #4608 )
2024-05-24 14:57:15 -07:00
Tim Scheuermann
db2ffa79f1
Fix download retry issue
2024-05-24 20:30:42 +02:00
Jeffrey Morgan
afd2b058b4
set codesign timeout to longer ( #4605 )
2024-05-23 22:46:23 -07:00
Wang,Zhe
fd5971be0b
support ollama run on Intel GPUs
2024-05-24 11:18:27 +08:00
Daniel Hiltgen
89bf98bcf2
Merge pull request #4598 from dhiltgen/docs
...
Tidy up developer guide a little
2024-05-23 15:14:29 -07:00
Daniel Hiltgen
1b2d156094
Tidy up developer guide a little
2024-05-23 15:14:05 -07:00
Michael Yang
714adb8bd1
bump ( #4597 )
2024-05-23 14:16:26 -07:00
Daniel Hiltgen
95b1133d0c
Merge pull request #4547 from dhiltgen/load_progress
...
Wire up load progress
2024-05-23 14:06:02 -07:00
Daniel Hiltgen
b37b496a12
Wire up load progress
...
This doesn't expose a UX yet, but wires the initial server portion
of progress reporting during load
2024-05-23 13:36:48 -07:00
Bruce MacDonald
d6f692ad1a
Add support for IQ1_S, IQ3_S, IQ2_S, IQ4_XS. IQ4_NL ( #4322 )
...
Co-authored-by: ManniX-ITA <20623405+mann1x@users.noreply.github.com >
2024-05-23 13:21:49 -07:00
Daniel Hiltgen
f77713bf1f
Add isolated gpu test to troubleshooting
2024-05-23 09:33:25 -07:00
Jeffrey Morgan
38255d2af1
Use flash attention flag for now ( #4580 )
...
* put flash attention behind flag for now
* add test
* remove print
* up timeout for sheduler tests
2024-05-22 21:52:09 -07:00
Michael
73630a7e85
add phi 3 medium ( #4578 )
2024-05-22 12:53:45 -04:00
Ikko Eltociear Ashimine
955c317cab
chore: update tokenizer.go ( #4571 )
...
PreTokenziers -> PreTokenizers
2024-05-22 00:25:23 -07:00
Josh
9f18b88a06
Merge pull request #4566 from ollama/jyan/shortcuts
...
add Ctrl + W shortcut
2024-05-21 22:49:36 -07:00
Josh Yan
353f83a9c7
add Ctrl + W shortcut
2024-05-21 16:55:09 -07:00
Patrick Devine
3bade04e10
doc updates for the faq/troubleshooting ( #4565 )
2024-05-21 15:30:09 -07:00
Michael Yang
a6d0f443eb
Merge pull request #4543 from ollama/mxyng/simple-safetensors
...
simplify safetensors reading
2024-05-21 14:43:55 -07:00
Michael Yang
96236b7968
Merge pull request #4268 from ollama/pdevine/llama3
...
Convert directly from llama3
2024-05-21 14:43:37 -07:00
Sang Park
4434d7f447
Correct typo in error message ( #4535 )
...
The spelling of the term "request" has been corrected, which was previously mistakenly written as "requeset" in the error log message.
2024-05-21 13:39:01 -07:00
Michael Yang
171eb040fc
simplify safetensors reading
2024-05-21 11:28:22 -07:00
Michael Yang
3591bbe56f
add test
2024-05-21 11:28:22 -07:00
Michael Yang
34d5ef29b3
fix conversion for f16 or f32 inputs
2024-05-21 11:28:22 -07:00
Michael Yang
bbbd9f20f3
cleanup
2024-05-20 16:13:57 -07:00
Michael Yang
547132e820
bpe pretokenizer
2024-05-20 16:13:57 -07:00
Patrick Devine
2d315ba9a9
add missing file
2024-05-20 16:13:57 -07:00
Patrick Devine
d355d2020f
add fixes for llama
2024-05-20 16:13:57 -07:00
Patrick Devine
c8cf0d94ed
llama3 conversion
2024-05-20 16:13:57 -07:00
Patrick Devine
4730762e5c
add safetensors version
2024-05-20 16:13:57 -07:00
Patrick Devine
d88582dffd
some changes for llama3
2024-05-20 16:13:57 -07:00
Michael Yang
2f81b3dce2
Merge pull request #4502 from ollama/mxyng/fix-quantize
...
fix quantize file types
2024-05-20 16:09:27 -07:00
jmorganca
5cab13739e
set llama.cpp submodule commit to 614d3b9
2024-05-20 15:28:17 -07:00
Josh Yan
8aadad9c72
updated updateURL
2024-05-20 15:24:32 -07:00
Michael Yang
807d092761
fix quantize file types
2024-05-20 15:22:11 -07:00
Michael Yang
f36f1d6be9
tidy intermediate blobs
2024-05-20 15:15:06 -07:00
alwqx
8800c8a59b
chore: fix typo in docs ( #4536 )
2024-05-20 14:19:03 -07:00
Michael Yang
b4dce13309
Merge pull request #4330 from ollama/mxyng/cache-intermediate-layers
...
cache and reuse intermediate blobs
2024-05-20 13:54:41 -07:00
Sam
e15307fdf4
feat: add support for flash_attn ( #4120 )
...
* feat: enable flash attention if supported
* feat: enable flash attention if supported
* feat: enable flash attention if supported
* feat: add flash_attn support
2024-05-20 13:36:03 -07:00
Michael Yang
3520c0e4d5
cache and reuse intermediate blobs
...
particularly useful for zipfiles and f16s
2024-05-20 13:25:10 -07:00
Patrick Devine
ccdf0b2a44
Move the parser back + handle utf16 files ( #4533 )
2024-05-20 11:26:45 -07:00
jmorganca
63a453554d
go mod tidy
2024-05-19 23:03:57 -07:00
Patrick Devine
105186aa17
add OLLAMA_NOHISTORY to turn off history in interactive mode ( #4508 )
2024-05-18 11:51:57 -07:00
Daniel Hiltgen
ba04afc9a4
Merge pull request #4483 from dhiltgen/clean_exit
...
Don't return error on signal exit
2024-05-17 11:41:57 -07:00
Daniel Hiltgen
7e1e0086e7
Merge pull request #4482 from dhiltgen/integration_improvements
...
Skip max queue test on remote
2024-05-16 16:43:48 -07:00
Daniel Hiltgen
02b31c9dc8
Don't return error on signal exit
2024-05-16 16:25:38 -07:00
Daniel Hiltgen
7f2fbad736
Skip max queue test on remote
...
This test needs to be able to adjust the queue size down from
our default setting for a reliable test, so it needs to skip on
remote test execution mode.
2024-05-16 16:24:18 -07:00
Josh
5bece94509
Merge pull request #4463 from ollama/jyan/line-display
...
changed line display to be calculated with runewidth
2024-05-16 14:15:08 -07:00
Josh Yan
3d90156e99
removed comment
2024-05-16 14:12:03 -07:00
Rose Heart
5e46c5c435
Updating software for read me ( #4467 )
...
* Update README.md
Added chat/moderation bot to list of software.
* Update README.md
Fixed link error.
2024-05-16 13:55:14 -07:00
Jeffrey Morgan
583c1f472c
update llama.cpp submodule to 614d3b9 ( #4414 )
2024-05-16 13:53:09 -07:00
Josh Yan
26bfc1c443
go fmt'd cmd.go
2024-05-15 17:26:39 -07:00
Josh Yan
799aa9883c
go fmt'd cmd.go
2024-05-15 17:24:17 -07:00
Michael Yang
84ed77cbd8
Merge pull request #4436 from ollama/mxyng/done-part
...
return on part done
2024-05-15 17:16:24 -07:00
Josh Yan
c9e584fb90
updated double-width display
2024-05-15 16:45:24 -07:00
Josh Yan
17b1e81ca1
fixed width and word count for double spacing
2024-05-15 16:29:33 -07:00
Daniel Hiltgen
7e9a2da097
Merge pull request #4462 from dhiltgen/opt_out_build
...
Port cuda/rocm skip build vars to linux
2024-05-15 16:27:47 -07:00
Daniel Hiltgen
c48c1d7c46
Port cuda/rocm skip build vars to linux
...
Windows already implements these, carry over to linux.
2024-05-15 15:56:43 -07:00
Patrick Devine
d1692fd3e0
fix the cpu estimatedTotal memory + get the expiry time for loading models ( #4461 )
2024-05-15 15:43:16 -07:00
Daniel Hiltgen
5fa36a0833
Merge pull request #4459 from dhiltgen/sanitize_env_log
...
Sanitize the env var debug log
2024-05-15 14:58:55 -07:00
Daniel Hiltgen
853ae490e1
Sanitize the env var debug log
...
Only dump env vars we care about in the logs
2024-05-15 14:42:57 -07:00
Patrick Devine
f2cf97d6f1
fix typo in modelfile generation ( #4439 )
2024-05-14 15:34:29 -07:00
Patrick Devine
c344da4c5a
fix keepalive for non-interactive mode ( #4438 )
2024-05-14 15:17:04 -07:00
Michael Yang
85a57006d1
check if name exists before create/pull/copy
2024-05-14 14:58:58 -07:00
Michael Yang
c5e892cb3e
update tests
2024-05-14 14:56:31 -07:00
Michael Yang
81fb06f530
more resilient Manifests
2024-05-14 14:08:24 -07:00
Michael Yang
a385382ff5
filepath.Join
2024-05-14 14:08:24 -07:00
Michael Yang
b8772a353f
remove DeleteModel
2024-05-14 14:08:24 -07:00
Michael Yang
c2714fcbfd
routes: use Manifests for ListHandler
2024-05-14 14:08:24 -07:00
Michael Yang
a2fc933fed
update delete handler to use model.Name
2024-05-14 14:08:24 -07:00
Michael Yang
0e331c7168
Merge pull request #4328 from ollama/mxyng/mem
...
count memory up to NumGPU if set by user
2024-05-14 13:47:44 -07:00
Michael Yang
ac145f75ca
return on part done
2024-05-14 13:04:30 -07:00
Patrick Devine
a4b8d1f89a
re-add system context ( #4435 )
2024-05-14 11:38:20 -07:00
Ryo Machida
798b107f19
Fixed the API endpoint /api/tags when the model list is empty. ( #4424 )
...
* Fixed the API endpoint /api/tags to return {models: []} instead of {models: null} when the model list is empty.
* Update server/routes.go
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2024-05-14 11:18:10 -07:00
Daniel Hiltgen
6a1b471365
Merge pull request #4430 from dhiltgen/gpu_info
...
Remove VRAM convergence check for windows
2024-05-14 10:59:06 -07:00
Daniel Hiltgen
ec231a7923
Remove VRAM convergence check for windows
...
The APIs we query are optimistic on free space, and windows pages
VRAM, so we don't have to wait to see reported usage recover on unload
2024-05-14 09:53:46 -07:00
Patrick Devine
7ca71a6b0f
don't abort when an invalid model name is used in /save ( #4416 )
2024-05-13 18:48:28 -07:00
Josh
7607e6e902
Merge pull request #4379 from WolfTheDeveloper/main
...
Update `LlamaScript` to point to new link from Legacy link.
2024-05-13 18:08:32 -07:00
Patrick Devine
f1548ef62d
update the FAQ to be more clear about windows env variables ( #4415 )
2024-05-13 18:01:13 -07:00
Patrick Devine
6845988807
Ollama ps command for showing currently loaded models ( #4327 )
2024-05-13 17:17:36 -07:00
Josh
9eed4a90ce
Merge pull request #4411 from joshyan1/main
...
removed inconsistent punctuation
2024-05-13 15:30:45 -07:00
Josh Yan
f8464785a6
removed inconsistencies
2024-05-13 14:50:52 -07:00
Michael Yang
1d359e737e
typo
2024-05-13 14:18:34 -07:00
Michael Yang
50b9056e09
count memory up to NumGPU
2024-05-13 14:13:10 -07:00
Josh Yan
91a090a485
removed inconsistent punctuation
2024-05-13 14:08:22 -07:00
睡觉型学渣
9c76b30d72
Correct typos. ( #4387 )
...
* Correct typos.
* Correct typos.
2024-05-12 18:21:11 -07:00
Zander Lewis
93f19910c5
Update LlamaScript to point to new link.
...
Still used Legacy link.
2024-05-12 11:24:21 -04:00
jmorganca
4ec7445a6f
Revert "use post token"
...
This reverts commit 0fec3525ad .
2024-05-11 22:19:14 -07:00
Michael Yang
0372c51f82
Merge pull request #4369 from ollama/mxyng/post-token
...
use post token
2024-05-11 19:29:14 -07:00
Michael Yang
0fec3525ad
use post token
2024-05-11 19:13:16 -07:00
Jeffrey Morgan
41ba3017fd
Fix OpenAI finish_reason values when empty ( #4368 )
2024-05-11 15:31:41 -07:00
todashuta
8080fbce35
fix ollama create's usage string ( #4362 )
2024-05-11 14:47:49 -07:00
Michael Yang
ec14f6ceda
case sensitive filepaths ( #4366 )
2024-05-11 14:12:36 -07:00
Daniel Hiltgen
c60a086635
Merge pull request #4331 from dhiltgen/fix_unit
...
Fix envconfig unit test
2024-05-11 09:16:28 -07:00
jmorganca
92ca2cca95
Revert "only forward some env vars"
...
This reverts commit ce3b212d12 .
2024-05-10 22:53:21 -07:00
Patrick Devine
1e1634daca
update go deps ( #4324 )
2024-05-10 21:39:27 -07:00
Daniel Hiltgen
824ee5446f
Fix envconfig unit test
2024-05-10 16:49:48 -07:00
Daniel Hiltgen
879e2caf8c
Merge pull request #4329 from dhiltgen/zero_layers
...
Fall back to CPU runner with zero layers
2024-05-10 15:23:16 -07:00
Daniel Hiltgen
c4014e73a2
Fall back to CPU runner with zero layers
2024-05-10 15:09:48 -07:00
Daniel Hiltgen
be9efdb981
Merge pull request #4326 from dhiltgen/fix_integration
...
Integration fixes
2024-05-10 14:25:59 -07:00
Daniel Hiltgen
074dc3b9d8
Integration fixes
2024-05-10 14:20:10 -07:00
Daniel Hiltgen
86f9b582d5
Merge pull request #4323 from dhiltgen/sort_by_free
...
Always use the sorted list of GPUs
2024-05-10 14:12:15 -07:00
Daniel Hiltgen
4142c3ef7c
Always use the sorted list of GPUs
...
Make sure the first GPU has the most free space
2024-05-10 13:53:21 -07:00
Jeffrey Morgan
6602e793c0
Use --quantize flag and quantize api parameter ( #4321 )
...
* rename `--quantization` to `--quantize`
* backwards
* Update api/types.go
Co-authored-by: Michael Yang <mxyng@pm.me >
---------
Co-authored-by: Michael Yang <mxyng@pm.me >
2024-05-10 13:06:13 -07:00
Michael Yang
ea0fdaed28
Merge pull request #4320 from ollama/mxyng/phi2-mem
...
add phi2 mem
2024-05-10 12:35:08 -07:00
Michael Yang
1eb382da5a
add phi2 mem
2024-05-10 12:13:28 -07:00
Jeffrey Morgan
bb6fd02298
Don't clamp ctx size in PredictServerFit ( #4317 )
...
* dont clamp ctx size in `PredictServerFit`
* minimum 4 context
* remove context warning
2024-05-10 10:17:12 -07:00
Daniel Hiltgen
7e2bceceee
Merge pull request #4316 from dhiltgen/more_buffer
...
Bump VRAM buffer back up
2024-05-10 10:02:34 -07:00
Daniel Hiltgen
30a7d7096c
Bump VRAM buffer back up
...
Under stress scenarios we're seeing OOMs so this should help stabilize
the allocations under heavy concurrency stress.
2024-05-10 09:15:28 -07:00
Michael Yang
200a18820e
Merge pull request #4306 from ollama/mxyng/fix-routes
2024-05-10 08:58:16 -07:00
Michael Yang
e03637176d
fix(routes): skip bad manifests
2024-05-10 08:46:11 -07:00
Bruce MacDonald
c02db93243
omit empty done reason
2024-05-09 16:45:29 -07:00
Michael Yang
ffa4d5134a
Merge pull request #4305 from ollama/mxyng/typo
...
fix typo
2024-05-09 16:42:09 -07:00
Jeffrey Morgan
302d7fdbf3
prune partial downloads ( #4272 )
2024-05-09 16:35:20 -07:00
Michael Yang
cf442cd57e
fix typo
2024-05-09 16:23:37 -07:00
Michael Yang
0e1ba65855
Merge pull request #4302 from ollama/mxyng/forward-env
...
only forward some env vars
2024-05-09 16:21:05 -07:00
Michael Yang
6aad333c63
Merge pull request #4298 from ollama/mxyng/log-cleanup
...
log clean up
2024-05-09 16:20:57 -07:00
Daniel Hiltgen
4fcc84e67a
Merge pull request #4304 from dhiltgen/signals
...
Fix race in shutdown logic
2024-05-09 15:58:44 -07:00
Daniel Hiltgen
3ae2f441e0
Fix race in shutdown logic
...
Ensure the runners are terminated
2024-05-09 15:54:02 -07:00
Zander Lewis
2abb3f6424
Update README.md ( #4300 )
2024-05-09 15:30:49 -07:00
Michael Yang
ce3b212d12
only forward some env vars
2024-05-09 15:16:09 -07:00
Daniel Hiltgen
83d6d46e29
Merge pull request #4299 from dhiltgen/handle_vram_reporting_lag
...
Wait for GPU free memory reporting to converge
2024-05-09 15:08:56 -07:00
Daniel Hiltgen
354ad9254e
Wait for GPU free memory reporting to converge
...
The GPU drivers take a while to update their free memory reporting, so we need
to wait until the values converge with what we're expecting before proceeding
to start another runner in order to get an accurate picture.
2024-05-09 14:56:01 -07:00
Michael Yang
58876091f7
log clean up
2024-05-09 14:55:36 -07:00
Daniel Hiltgen
dc18eee39d
Merge pull request #4238 from dhiltgen/gpu_info
...
Record more GPU information
2024-05-09 14:26:58 -07:00
Daniel Hiltgen
8727a9c140
Record more GPU information
...
This cleans up the logging for GPU discovery a bit, and can
serve as a foundation to report GPU information in a future UX.
2024-05-09 14:18:14 -07:00
Daniel Hiltgen
d0425f26cf
Merge pull request #4294 from dhiltgen/harden_subprocess_reaping
...
Harden subprocess reaping
2024-05-09 14:02:16 -07:00
Bruce MacDonald
cfa84b8470
add done_reason to the api ( #4235 )
2024-05-09 13:30:14 -07:00
Michael Yang
1580ed4c06
Merge pull request #4295 from ollama/mxyng/fix-list
...
routes: skip invalid filepaths
2024-05-09 11:37:34 -07:00
Michael Yang
a7ee84fc31
routes: skip invalid filepaths
2024-05-09 11:23:22 -07:00
Daniel Hiltgen
84ac7ce139
Refine subprocess reaping
2024-05-09 11:21:31 -07:00
tusharhero
788b092c49
docs: add Guix package manager in README. ( #4040 )
2024-05-09 11:10:24 -07:00
J S
5cde17a096
Add PromptingTools.jl ( #2192 )
2024-05-09 09:39:05 -07:00
Daniel Hiltgen
c3837eb08c
Merge pull request #4289 from dhiltgen/doc_container_workarounds
...
Doc container usage and workaround for nvidia errors
2024-05-09 09:27:29 -07:00
Daniel Hiltgen
8cc0ee2efe
Doc container usage and workaround for nvidia errors
2024-05-09 09:26:45 -07:00
Jeffrey Morgan
d5eec16d23
use model defaults for num_gqa, rope_frequency_base and rope_frequency_scale ( #1983 )
2024-05-09 09:06:13 -07:00
Carlos Gamez
daa1a032f7
Update langchainjs.md ( #2027 )
...
Updated sample code as per warning notification from the package maintainers
2024-05-08 20:21:03 -07:00
jmorganca
6042e8bc57
remove bash-comparemodels example
2024-05-08 19:49:45 -07:00
Daniel Hiltgen
920a4b0794
Merge remote-tracking branch 'upstream/main' into pr3702
2024-05-08 16:44:35 -07:00
Daniel Hiltgen
ee49844d09
Merge pull request #4153 from dhiltgen/gpu_verbose_response
...
Add GPU usage
2024-05-08 16:39:11 -07:00
Daniel Hiltgen
8a516ac862
Merge pull request #4241 from dhiltgen/fix_tmp_override
...
Detect noexec and report a better error
2024-05-08 15:34:22 -07:00
Daniel Hiltgen
bee2f4a3b0
Record GPU usage information
...
This records more GPU usage information for eventual UX inclusion.
2024-05-08 14:45:39 -07:00
Bruce MacDonald
cef45feaa4
Add preflight OPTIONS handling and update CORS config ( #4086 )
...
* Add preflight OPTIONS handling and update CORS config
- Implement early return with HTTP 204 (No Content) for OPTIONS requests in allowedHostsMiddleware to optimize preflight handling.
- Extend CORS configuration to explicitly allow 'Authorization' headers and 'OPTIONS' method when OLLAMA_ORIGINS environment variable is set.
* allow auth, content-type, and user-agent headers
* Update routes.go
2024-05-08 13:14:00 -07:00
Michael Yang
2687f02c96
Merge pull request #4265 from ollama/mxyng/fix-show-llava
...
routes: fix show llava models
2024-05-08 12:51:21 -07:00
Michael Yang
b25976aeb8
routes: fix show llava models
2024-05-08 12:43:36 -07:00
Michael Yang
001f167aad
Merge pull request #4261 from ollama/mxyng/fix-tag-case
...
types/model: fix tag case
2024-05-08 11:09:47 -07:00
Michael Yang
486a2c1d94
types/model: fix tag case
2024-05-08 08:47:16 -07:00
Michael Yang
88cf154483
Merge pull request #4244 from ollama/mxyng/skip-if-same
...
skip if same quantization
2024-05-07 19:03:37 -07:00
Bruce MacDonald
8cbd3e7510
skip hidden files in list models handler ( #4247 )
2024-05-07 19:01:45 -07:00
Michael Yang
eeb695261f
skip if same quantization
2024-05-07 17:44:19 -07:00
Bruce MacDonald
dc9b1111e0
fix invalid destination error message
2024-05-07 17:35:52 -07:00
Tobias Gårdhus
06ac829e70
Fix help string for stop parameter ( #2307 )
2024-05-07 16:48:35 -07:00
Daniel Hiltgen
72700279e2
Detect noexec and report a better error
...
This will bubble up a much more informative error message if noexec
is preventing us from running the subprocess
2024-05-07 16:46:15 -07:00
boessu
5d3f7fff26
Update langchainpy.md ( #4236 )
...
fixing pip code.
2024-05-07 16:36:34 -07:00
Eli Bendersky
d77c1c5f9d
api: fill up API documentation ( #3596 )
...
* api: fill up API documentation
Followup for #2878
Now that the documentation is more complete, mention it in the README.
Updates #2840
* fix typo/lint
* Update README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2024-05-07 16:27:46 -07:00
Giuseppe Lumia
2a5302a1cf
Fix paste of text with line feed characters ( #3043 )
...
Some terminals may send line feed characters when pasting text with
newlines.
2024-05-07 15:26:07 -07:00
Michael Yang
ffbd3d173f
Merge pull request #3715 from ollama/mxyng/modelname-2
...
update list handler to use model.Name
2024-05-07 15:21:39 -07:00
Michael Yang
1e0a669f75
Merge pull request #3682 from ollama/mxyng/quantize-all-the-things
...
quantize any fp16/fp32 model
2024-05-07 15:20:49 -07:00
Bruce MacDonald
527e9be058
fix: store accurate model parameter size ( #4058 )
...
- add test for number formatting
- fix bug where 1B and 1M were not stored correctly
- display 2 decimal points for million param sizes
- display 1 decimal point for billion param sizes
2024-05-07 14:41:53 -07:00
Renat
34bea2e272
Add macai to list of Web & Desktop integrations ( #3881 )
2024-05-07 13:31:34 -07:00
Fernando Maclen
fe44ae3371
Update README.md ( #3884 )
2024-05-07 13:17:35 -07:00
Michael Yang
adeb40eaf2
Merge pull request #4231 from ollama/mxyng/parser
...
types/model: fix parser for empty values
2024-05-07 10:48:32 -07:00
Michael Yang
d7d33e5255
Merge pull request #951 from ollama/mxyng/example-fly
...
fly example
2024-05-07 10:46:24 -07:00
Michael Yang
63bc884e25
types/model: fix parser for empty values
2024-05-07 10:44:43 -07:00
Michael Yang
ef4e095d24
Merge pull request #4232 from ollama/revert-4190-fix/golang-ci
...
Revert "fix golangci workflow not enable gofmt and goimports"
2024-05-07 10:39:37 -07:00
Michael Yang
4d4f75a8a8
Revert "fix golangci workflow missing gofmt and goimports ( #4190 )"
...
This reverts commit 04f971c84b .
2024-05-07 10:35:44 -07:00
Mélony QIN
3f71ba406a
Correct the kubernetes terminology ( #3843 )
...
* add details on kubernetes deployment and separate the testing process
* Update examples/kubernetes/README.md
thanks for suggesting this change, I agree with you and let's make this project better together !
Co-authored-by: JonZeolla <Zeolla@gmail.com >
---------
Co-authored-by: QIN Mélony <MQN1@dsone.3ds.com >
Co-authored-by: JonZeolla <Zeolla@gmail.com >
2024-05-07 09:53:08 -07:00
Hause Lin
88a67127d8
Update README.md to include ollama-r library ( #4012 )
...
* Update README.md
Add Ollama for R - ollama-r library
* Update README.md
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2024-05-07 09:52:30 -07:00
Jeffrey Morgan
f7dc7dcc64
Update .gitattributes
2024-05-07 09:50:19 -07:00
alwqx
04f971c84b
fix golangci workflow missing gofmt and goimports ( #4190 )
2024-05-07 09:49:40 -07:00
Michael Yang
548a7df014
update list handler to use model.Name
2024-05-07 09:38:45 -07:00
Michael Yang
70edb9bc4d
Merge pull request #4215 from ollama/mxyng/mem
...
llm: add minimum based on layer size
2024-05-07 09:26:33 -07:00
Michael Yang
3f0ed03856
Update examples/flyio/README.md
...
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2024-05-07 09:25:01 -07:00
Michael Yang
4736391bfb
llm: add minimum based on layer size
2024-05-06 17:04:19 -07:00
CrispStrobe
7c5330413b
note on naming restrictions ( #2625 )
...
* note on naming restrictions
else push would fail with cryptic
retrieving manifest
Error: file does not exist
==> maybe change that in code too
* Update docs/import.md
---------
Co-authored-by: C-4-5-3 <154636388+C-4-5-3@users.noreply.github.com >
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2024-05-06 16:03:21 -07:00
Jeffrey Morgan
39d9d22ca3
close server on receiving signal ( #4213 )
2024-05-06 16:01:37 -07:00
Jackie Li
af47413dba
Add MarshalJSON to Duration ( #3284 )
...
---------
Co-authored-by: Patrick Devine <patrick@infrahq.com >
2024-05-06 15:59:18 -07:00
Michael Yang
b2f00aa977
close zip files
2024-05-06 15:27:19 -07:00
Michael Yang
6694be5e50
convert/llama: use WriteSeeker
2024-05-06 15:24:01 -07:00
Michael Yang
f5e8b207fb
s/DisplayLongest/String/
2024-05-06 15:24:01 -07:00
Michael Yang
d245460362
only quantize language models
2024-05-06 15:24:01 -07:00
Michael Yang
4d0d0fa383
no iterator
2024-05-06 15:24:01 -07:00
Michael Yang
7ffe45734d
rebase
2024-05-06 15:24:01 -07:00
Michael Yang
01811c176a
comments
2024-05-06 15:24:01 -07:00
Michael Yang
a7248f6ea8
update tests
2024-05-06 15:24:01 -07:00
Michael Yang
9685c34509
quantize any fp16/fp32 model
...
- FROM /path/to/{safetensors,pytorch}
- FROM /path/to/fp{16,32}.bin
- FROM model:fp{16,32}
2024-05-06 15:24:01 -07:00
Jeffrey Chen
d091fe3c21
Windows automatically recognizes username ( #3214 )
2024-05-06 15:03:14 -07:00
Mohamed A. Fouad
ee02f548c8
Update linux.md ( #3847 )
...
Add -e to viewing logs in order to show end of ollama logs
2024-05-06 15:02:25 -07:00
Daniel Hiltgen
b08870aff3
Merge pull request #4188 from dhiltgen/use_our_lib
...
User our bundled libraries (cuda) instead of the host library
2024-05-06 14:41:05 -07:00
Darinka
3ecae420ac
Update api.md ( #3945 )
...
* Update api.md
Changed the calculation of tps (token/s) in the documentation
* Update docs/api.md
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2024-05-06 14:39:58 -07:00
Daniel Hiltgen
4cbbf0e13b
Merge pull request #4090 from dhiltgen/rocm_paths
...
Support Fedoras standard ROCm location
2024-05-06 14:33:41 -07:00
Daniel Hiltgen
380378cc80
Use our libraries first
...
Trying to live off the land for cuda libraries was not the right strategy. We need to use the version we compiled against to ensure things work properly
2024-05-06 14:23:29 -07:00
Daniel Hiltgen
0963c65027
Merge pull request #4208 from dhiltgen/fix_sched_test
...
Fix stale test logic
2024-05-06 14:23:12 -07:00
Jeffrey Morgan
ed740a2504
Fix no slots available error with concurrent requests ( #4160 )
2024-05-06 14:22:53 -07:00
Jeffrey Morgan
c9f98622b1
Skip scheduling cancelled requests, always reload unloaded runners ( #4189 )
2024-05-06 14:22:24 -07:00
Daniel Hiltgen
0a954e5066
Fix stale test logic
...
The model processing was recently changed to be deferred but
this test scenario hadn't been adjusted for that change in behavior.
2024-05-06 14:15:37 -07:00
Adrien Brault
aa93423fbf
docs: pbcopy on mac ( #3129 )
2024-05-06 13:47:00 -07:00
Nurgo
01c9386267
Add BrainSoup to compatible clients list ( #3473 )
2024-05-06 13:42:16 -07:00
Daniel Hiltgen
af9eb36f9f
Merge pull request #4135 from dhiltgen/no_physx
...
Skip PhysX cudart library
2024-05-06 13:34:00 -07:00
Daniel Hiltgen
06093fd396
Merge pull request #4067 from dhiltgen/cudart
...
Add CUDA Driver API for GPU discovery
2024-05-06 13:30:27 -07:00
Tony Loehr
86b7fcac32
Update README.md with StreamDeploy ( #3621 )
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2024-05-06 11:14:41 -07:00
Hyden Liu
fb8ddc564e
chore: delete HEAD ( #4194 )
2024-05-06 10:32:30 -07:00
Saif
242efe6611
👌 IMPROVE: add portkey library for production tools ( #4119 )
2024-05-06 10:25:23 -07:00
Jeffrey Morgan
1b0e6c9c0e
Fix llava models not working after first request ( #4164 )
...
* fix llava models not working after first request
* individual requests only for llava models
2024-05-05 20:50:31 -07:00
Jeffrey Morgan
dfa2f32ca0
unload in critical section ( #4187 )
2024-05-05 17:18:27 -07:00
Daniel Hiltgen
840424a2c4
Merge pull request #4154 from dhiltgen/central_config
...
Centralize server config handling
2024-05-05 17:08:26 -07:00
Daniel Hiltgen
f56aa20014
Centralize server config handling
...
This moves all the env var reading into one central module
and logs the loaded config once at startup which should
help in troubleshooting user server logs
2024-05-05 16:49:50 -07:00
alwqx
6707768ebd
chore: format go code ( #4149 )
2024-05-05 16:08:09 -07:00
Lord Basil - Automate EVERYTHING
c78bb76a12
update libraries for langchain_community + llama3 changed from llama2 ( #4174 )
2024-05-05 16:07:04 -07:00
Jeffrey Morgan
942c979232
allocate a large enough kv cache for all parallel requests ( #4162 )
2024-05-05 15:59:32 -07:00
Bernardo de Oliveira Bruning
06164911dd
Update README.md ( #4111 )
...
---------
Co-authored-by: Patrick Devine <patrick@infrahq.com >
2024-05-05 14:45:32 -07:00
Patrick Devine
2a21363bb7
validate the format of the digest when getting the model path ( #4175 )
2024-05-05 11:46:12 -07:00
Daniel Hiltgen
026869915f
Merge pull request #4144 from dhiltgen/max_queue
...
Make maximum pending request configurable
2024-05-05 10:53:44 -07:00
Daniel Hiltgen
45d61aaaa3
Add integration test to push max queue limits
2024-05-05 10:46:25 -07:00
Daniel Hiltgen
20f6c06569
Make maximum pending request configurable
...
This also bumps up the default to be 50 queued requests
instead of 10.
2024-05-04 21:00:52 -07:00
Daniel Hiltgen
371f5e52aa
Merge pull request #4141 from dhiltgen/win_docs
...
Explain the 2 different windows download options
2024-05-04 12:50:16 -07:00
Daniel Hiltgen
e006480e49
Explain the 2 different windows download options
2024-05-04 12:50:05 -07:00
Michael Yang
aed545872d
Merge pull request #4143 from ollama/mxyng/final-response
...
omit prompt and generate settings from final response
2024-05-03 17:39:49 -07:00
Michael Yang
44869c59d6
omit prompt and generate settings from final response
2024-05-03 17:00:02 -07:00
Daniel Hiltgen
52663284cf
Merge pull request #4145 from dhiltgen/fix_lint
...
Fix lint warnings
2024-05-03 16:53:17 -07:00
Daniel Hiltgen
42fa9d7f0a
Fix lint warnings
2024-05-03 16:44:19 -07:00
Michael Yang
b7a87a22b6
Merge pull request #4059 from ollama/mxyng/parser-2
...
rename parser to model/file
2024-05-03 13:01:22 -07:00
Dr Nic Williams
e8aaea030e
Update 'llama2' -> 'llama3' in most places ( #4116 )
...
* Update 'llama2' -> 'llama3' in most places
---------
Co-authored-by: Patrick Devine <patrick@infrahq.com >
2024-05-03 15:25:04 -04:00
Daniel Hiltgen
b1ad3a43cb
Skip PhysX cudart library
...
For some reason this library gives incorrect GPU information, so skip it
2024-05-03 11:55:32 -07:00
Daniel Hiltgen
267e25a750
Merge pull request #4129 from dhiltgen/unit_tests
...
Soften timeouts on sched unit tests
2024-05-03 11:10:26 -07:00
Daniel Hiltgen
9a32c514cb
Soften timeouts on sched unit tests
...
This gives us more headroom on the scheduler tests to tamp
down some flakes.
2024-05-03 09:08:33 -07:00
Michael Yang
e9ae607ece
Merge pull request #3892 from ollama/mxyng/parser
...
refactor modelfile parser
2024-05-02 17:04:47 -07:00
Michael Yang
93707fa3f2
Merge pull request #4108 from ollama/mxyng/lf
...
fix line ending
2024-05-02 14:55:15 -07:00
Michael Yang
94c369095f
fix line ending
...
replace CRLF with LF
2024-05-02 14:53:13 -07:00
Jeffrey Morgan
9164b0161b
Update .gitattributes
2024-05-02 14:06:31 -04:00
Daniel Hiltgen
e592e8fccb
Support Fedoras standard ROCm location
2024-05-01 15:47:12 -07:00
Bryce Reitano
bf4fc25f7b
Add a /clear command ( #3947 )
...
* Add a /clear command
* change help messages
---------
Co-authored-by: Patrick Devine <patrick@infrahq.com >
2024-05-01 17:44:36 -04:00
Michael Yang
5b806d8d24
Merge pull request #4089 from ollama/mxyng/target-invalid
...
server: destination invalid
2024-05-01 12:46:35 -07:00
Michael Yang
cb1e072643
Merge pull request #4087 from ollama/mxyng/fix-host-port
...
types/model: fix name for hostport
2024-05-01 12:42:07 -07:00
Michael Yang
45b6a12e45
server: target invalid
2024-05-01 12:40:45 -07:00
alwqx
68755f1f5e
chore: fix typo in docs/development.md ( #4073 )
2024-05-01 15:39:11 -04:00
Michael Yang
997a455039
want filepath
2024-05-01 12:33:41 -07:00
Michael Yang
88775e1ff9
strip scheme from name
2024-05-01 12:26:19 -07:00
Michael Yang
8867e744ff
types/model: fix name for hostport
2024-05-01 12:14:53 -07:00
Daniel Hiltgen
4fd064bea6
Merge pull request #4031 from MarkWard0110/fix/issue-3736
...
Fix/issue 3736: When runners are closing or expiring. Scheduler is getting dirty VRAM size readings.
2024-05-01 12:13:26 -07:00
Jeffrey Morgan
59fbceedcc
use lf for line endings ( #4085 )
2024-05-01 15:02:45 -04:00
Mark Ward
321d57e1a0
Removing go routine calling .wait from load.
2024-05-01 18:51:10 +00:00
Mark Ward
ba26c7aa00
it will always return an error due to Kill() discarding Wait() errors
2024-05-01 18:51:10 +00:00
Mark Ward
63c763685f
log when the waiting for the process to stop to help debug when other tasks execute during this wait.
...
expire timer clear the timer reference because it will not be reused.
close will clean up expireTimer if calling code has not already done this.
2024-05-01 18:51:10 +00:00
Mark Ward
34a4a94f13
ignore debug bin files
2024-05-01 18:51:10 +00:00
Mark Ward
f4a73d57a4
fix runner expire during active use. Clearing the expire timer as it is used. Allowing the finish to assign an expire timer so that the runner will expire after no use.
2024-05-01 18:51:10 +00:00
Mark Ward
948114e3e3
fix sched to wait for the runner to terminate to ensure following vram check will be more accurate
2024-05-01 18:51:10 +00:00
Arpit Jain
a3e60d9058
README.md: fix typos ( #4007 )
...
Co-authored-by: Blake Mizerany <blake.mizerany@gmail.com >
2024-05-01 10:39:38 -07:00
Michael Yang
8acb233668
use strings.Builder
2024-05-01 10:01:09 -07:00
Michael Yang
119589fcb3
rename parser to model/file
2024-05-01 09:53:50 -07:00
Michael Yang
5ea844964e
cmd: import regexp
2024-05-01 09:53:45 -07:00
Michael Yang
bd8eed57fc
fix parser name
2024-05-01 09:52:54 -07:00
Michael Yang
9cf0f2e973
use parser.Format instead of templating modelfile
2024-05-01 09:52:54 -07:00
Michael Yang
176ad3aa6e
parser: add commands format
2024-05-01 09:52:54 -07:00
Michael Yang
4d08363580
comments
2024-05-01 09:52:54 -07:00
Michael Yang
8907bf51d2
fix multiline
2024-05-01 09:52:54 -07:00
Michael Yang
abe614c705
tests
2024-05-01 09:52:54 -07:00
Michael Yang
238715037d
linting
2024-05-01 09:52:54 -07:00
Michael Yang
c0a00f68ae
refactor modelfile parser
2024-05-01 09:52:54 -07:00
Jeffrey Morgan
f0c454ab57
gpu: add 512MiB to darwin minimum, metal doesn't have partial offloading overhead ( #4068 )
2024-05-01 11:46:03 -04:00
Daniel Hiltgen
089daaeabc
Add CUDA Driver API for GPU discovery
...
We're seeing some corner cases with cudart which might be resolved by
switching to the driver API which comes bundled with the driver package
2024-04-30 18:00:45 -07:00
Blake Mizerany
b9f74ff3d6
types/model: reintroduce Digest ( #4065 )
2024-04-30 16:38:03 -07:00
jmorganca
fcf4d60eee
llm: add back check for empty token cache
2024-04-30 17:38:44 -04:00
jmorganca
e33d5c2dbc
update llama.cpp commit to 952d03d
2024-04-30 17:31:20 -04:00
Jeffrey Morgan
18d9a7e1f1
update llama.cpp submodule to f364eb6 ( #4060 )
2024-04-30 17:25:39 -04:00
Michael
8488388cbd
Update README.md
2024-04-30 15:45:56 -04:00
Blake Mizerany
588901f449
types/model: reduce Name.Filepath allocs from 5 to 2 ( #4039 )
2024-04-30 11:09:19 -07:00
Bruce MacDonald
0a7fdbe533
prompt to display and add local ollama keys to account ( #3717 )
...
- return descriptive error messages when unauthorized to create blob or push a model
- display the local public key associated with the request that was denied
2024-04-30 11:02:08 -07:00
Christian Frantzen
5950c176ca
Update langchainpy.md ( #4037 )
...
Updated the code a bit
2024-04-29 23:19:06 -04:00
Daniel Hiltgen
23d23409a0
Update llama.cpp ( #4036 )
...
* Bump llama.cpp to b2761
* Adjust types for bump
2024-04-29 23:18:48 -04:00
Patrick Devine
9009bedf13
better checking for OLLAMA_HOST variable ( #3661 )
2024-04-29 19:14:07 -04:00
Daniel Hiltgen
d4ac57e240
Merge pull request #4035 from dhiltgen/fix_relative_paths
...
Fix relative path lookup
2024-04-29 16:08:06 -07:00
Daniel Hiltgen
7b59d1770f
Fix relative path lookup
2024-04-29 16:00:08 -07:00
Jeffrey Morgan
95ead8ffba
Restart server on failure when running Windows app ( #3985 )
...
* app: restart server on failure
* fix linter
* address comments
* refactor log directory creation to be where logs are written
* check all log dir creation errors
2024-04-29 10:07:52 -04:00
Jeffrey Morgan
7aa08a77ca
llm: dont cap context window limit to training context window ( #3988 )
2024-04-29 10:07:30 -04:00
Blake Mizerany
7e432cdfac
types/model: remove old comment ( #4020 )
2024-04-28 20:52:26 -07:00
Jeffrey Morgan
586672f490
fix copying model to itself ( #4019 )
2024-04-28 23:47:49 -04:00
Daniel Hiltgen
b03408de74
Merge pull request #3972 from hmartinez82/win_arm64
...
Add support for building on Windows ARM64
2024-04-28 14:52:58 -07:00
Daniel Hiltgen
1e6a28bf5b
Merge pull request #4009 from dhiltgen/cpu_concurrency
...
Fix concurrency for CPU mode
2024-04-28 14:20:27 -07:00
Daniel Hiltgen
d6e3b64582
Fix concurrency for CPU mode
...
Prior refactoring passes accidentally removed the logic to bypass VRAM
checks for CPU loads. This adds that back, along with test coverage.
This also fixes loaded map access in the unit test to be behind the mutex which was
likely the cause of various flakes in the tests.
2024-04-28 13:42:39 -07:00
Blake Mizerany
114c932a8e
types/model: allow _ as starter character in Name parts ( #3991 )
2024-04-27 21:24:52 -07:00
Jeffrey Morgan
7f7103de06
mac: update setup command to llama3 ( #3986 )
2024-04-27 22:52:10 -04:00
Blake Mizerany
c631a9c726
types/model: relax name length constraint from 2 to 1 ( #3984 )
2024-04-27 17:58:41 -07:00
Blake Mizerany
8fd9e56804
types/structs: drop unused structs package ( #3981 )
2024-04-27 14:06:11 -07:00
Hernan Martinez
8a65717f55
Do not build AVX runners on ARM64
2024-04-26 23:55:32 -06:00
Hernan Martinez
6d3152a98a
Use architecture specific folders in installer script
2024-04-26 23:35:16 -06:00
Hernan Martinez
b438d485f1
Use architecture specific folders in the generate script
2024-04-26 23:34:12 -06:00
Hernan Martinez
204349b17b
Use architecture specific folders in the build script
2024-04-26 23:26:03 -06:00
Hernan Martinez
86e67fc4a9
Add import declaration for windows,arm64 to llm.go
2024-04-26 23:23:53 -06:00
Blake Mizerany
2bed62926e
types/model: remove Digest (for now) ( #3970 )
...
The Digest type needs more thought and is not necessary at the moment.
2024-04-26 21:14:28 -07:00
Jeffrey Morgan
aad8d128a0
also look at cwd as a root for windows runners ( #3959 )
2024-04-26 19:14:08 -04:00
Daniel Hiltgen
ec1acbb867
Merge pull request #3968 from dhiltgen/win_generate
...
Fine grain control over windows generate steps
2024-04-26 16:03:38 -07:00
Daniel Hiltgen
e4859c4563
Fine grain control over windows generate steps
...
This will speed up CI which already tries to only build static for unit tests
2024-04-26 15:49:46 -07:00
Nataly Merezhuk
8e30eb26bd
Updates the setup command to use llama3. ( #3962 )
2024-04-26 18:41:01 -04:00
Daniel Hiltgen
0b5c589ca2
Merge pull request #3966 from dhiltgen/bump
...
Fix target in gen_windows.ps1
2024-04-26 15:36:53 -07:00
Michael Yang
65fadddc85
Merge pull request #3964 from ollama/mxyng/weights
...
fix gemma, command-r layer weights
2024-04-26 15:23:33 -07:00
Daniel Hiltgen
ed5fb088c4
Fix target in gen_windows.ps1
2024-04-26 15:10:42 -07:00
Michael Yang
f81f308118
fix gemma, command-r layer weights
2024-04-26 15:00:55 -07:00
Blake Mizerany
b1390a7b37
types/model: export ParseNameBare and Merge ( #3957 )
...
These are useful outside this package.
2024-04-26 14:58:07 -07:00
Michael Yang
11d83386a5
Merge pull request #3951 from ollama/mxyng/zip
...
check file type before zip
2024-04-26 14:51:23 -07:00
Jeffrey Morgan
bb31def011
return code 499 when user cancels request while a model is loading ( #3955 )
2024-04-26 17:38:29 -04:00
Michael Yang
41e03ede95
check file type before zip
2024-04-26 14:18:07 -07:00
Michael Yang
7fea1ecdf6
Merge pull request #3958 from ollama/mxyng/fix-workflow
...
use merge base for diff-tree
2024-04-26 14:17:56 -07:00
Blake Mizerany
054894271d
.github/workflows/test.yaml: add in-flight cancellations on new push ( #3956 )
...
Also, remove a superfluous 'go get'
2024-04-26 13:54:24 -07:00
Michael Yang
6fef042f0b
use merge base for diff-tree
2024-04-26 13:54:15 -07:00
Daniel Hiltgen
5c0c2d1d09
Merge pull request #3954 from dhiltgen/ci_fixes
...
Put back non-avx CPU build for windows
2024-04-26 13:09:03 -07:00
Blake Mizerany
37f9c8ad99
types/model: overhaul Name and Digest types ( #3924 )
2024-04-26 13:08:32 -07:00
Quinten van Buul
2a80f55e2a
Update windows.md ( #3855 )
...
Fixed a typo
2024-04-26 16:04:15 -04:00
Daniel Hiltgen
421c878a2d
Put back non-avx CPU build for windows
2024-04-26 12:44:07 -07:00
Daniel Hiltgen
36666c2142
Merge pull request #3925 from dhiltgen/bump
...
Bump llama.cpp to b2737
2024-04-26 10:09:38 -07:00
Daniel Hiltgen
85801317d1
Fix clip log import
2024-04-26 09:43:46 -07:00
Daniel Hiltgen
2ed0d65948
Bump llama.cpp to b2737
2024-04-26 09:43:28 -07:00
Daniel Hiltgen
d459dc4ad1
Merge pull request #3950 from dhiltgen/windows_packaging
...
Fix exe name for zip packaging on windows
2024-04-26 09:27:37 -07:00
Daniel Hiltgen
40bc4622ef
Fix exe name for zip packaging on windows
...
The zip file encodes the OS and architecture, so keep the short exe name
2024-04-26 09:18:05 -07:00
Daniel Hiltgen
c0f818a07a
Merge pull request #3948 from dhiltgen/win_generate
...
Refactor windows generate for more modular usage
2024-04-26 09:17:20 -07:00
Daniel Hiltgen
8671fdeda6
Refactor windows generate for more modular usage
2024-04-26 08:35:50 -07:00
Daniel Hiltgen
2619850fb4
Merge pull request #3933 from dhiltgen/ci_fixes
...
Move cuda/rocm dependency gathering into generate script
2024-04-26 07:01:24 -07:00
Daniel Hiltgen
8feb97dc0d
Move cuda/rocm dependency gathering into generate script
...
This will make it simpler for CI to accumulate artifacts from prior steps
2024-04-25 22:38:44 -07:00
Daniel Hiltgen
4e1ff6dcbb
Merge pull request #3926 from dhiltgen/ci_fixes
...
Fix release CI
2024-04-25 17:42:31 -07:00
Daniel Hiltgen
8589d752ac
Fix release CI
...
download-artifact path was being used incorrectly. It is where to
extract the zip not the files in the zip to extract. Default is
workspace dir which is what we want, so omit it
2024-04-25 17:27:11 -07:00
Michael Yang
de4ded68b0
Merge pull request #3923 from ollama/mxyng/mem
...
only count output tensors
2024-04-25 16:34:17 -07:00
Daniel Hiltgen
9b5a3c5991
Merge pull request #3914 from dhiltgen/mac_perf
...
Improve mac parallel performance
2024-04-25 16:28:31 -07:00
Jeffrey Morgan
00b0699c75
Reload model if num_gpu changes ( #3920 )
...
* reload model if `num_gpu` changes
* dont reload on -1
* fix tests
2024-04-25 19:02:40 -04:00
Jeffrey Morgan
993cf8bf55
llm: limit generation to 10x context size to avoid run on generations ( #3918 )
...
* llm: limit generation to 10x context size to avoid run on generations
* add comment
* simplify condition statement
2024-04-25 19:02:30 -04:00
Michael Yang
7bb7cb8a60
only count output tensors
2024-04-25 15:24:08 -07:00
Daniel Hiltgen
b123be5b71
Adjust context size for parallelism
2024-04-25 13:58:54 -07:00
jmorganca
ddf5c09a9b
use matrix multiplcation kernels in more cases
2024-04-25 13:58:54 -07:00
Roy Yang
5f73c08729
Remove trailing spaces ( #3889 )
2024-04-25 14:32:26 -04:00
Daniel Hiltgen
f503a848c2
Merge pull request #3895 from brycereitano/shiftloading
...
Move ggml loading to when attempting to fit
2024-04-25 09:24:08 -07:00
Bryce Reitano
36a6daccab
Restructure loading conditional chain
2024-04-24 17:37:03 -06:00
Bryce Reitano
ceb0e26e5e
Provide variable ggml for TestLoad
2024-04-24 17:19:55 -06:00
Bryce Reitano
284e02bed0
Move ggml loading to when we attempt fitting
2024-04-24 17:17:24 -06:00
Michael Yang
3450a57d4a
Merge pull request #3713 from ollama/mxyng/modelname
...
update copy handler to use model.Name
2024-04-24 16:00:32 -07:00
Michael Yang
592dae31c8
update copy to use model.Name
2024-04-24 15:54:54 -07:00
Michael Yang
2010cbc5fa
Merge pull request #3833 from ollama/mxyng/fix-from
...
fix: from blob
2024-04-24 15:13:47 -07:00
Michael Yang
ac0801eced
only replace if it matches command
2024-04-24 14:49:26 -07:00
Michael Yang
ad66e5b060
split temp zip files
2024-04-24 14:18:01 -07:00
Blake Mizerany
ade4b55520
types/model: make ParseName use default without question ( #3886 )
2024-04-24 11:52:55 -07:00
Daniel Hiltgen
a6d62e0617
Merge pull request #3882 from dhiltgen/amd_gfx
...
AMD gfx patch rev is hex
2024-04-24 11:07:49 -07:00
Daniel Hiltgen
6e76348df7
Merge pull request #3834 from dhiltgen/not_found_in_path
...
Report errors on server lookup instead of path lookup failure
2024-04-24 10:50:48 -07:00
Daniel Hiltgen
0d6687f84c
AMD gfx patch rev is hex
...
Correctly handle gfx90a discovery
2024-04-24 09:43:52 -07:00
Patrick Devine
74d2a9ef9a
add OLLAMA_KEEP_ALIVE env variable to FAQ ( #3865 )
2024-04-23 21:06:51 -07:00
Patrick Devine
14476d48cc
fixes for gguf ( #3863 )
2024-04-23 20:57:20 -07:00
Patrick Devine
ce8ce82567
add mixtral 8x7b model conversion ( #3859 )
2024-04-23 20:17:04 -07:00
Blake Mizerany
4dc4f1be34
types/model: restrict digest hash part to a minimum of 2 characters ( #3858 )
...
This allows users of a valid Digest to know it has a minimum of 2
characters in the hash part for use when sharding.
This is a reasonable restriction as the hash part is a SHA256 hash which
is 64 characters long, which is the common hash used. There is no
anticipation of using a hash with less than 2 characters.
Also, add MustParseDigest.
Also, replace Digest.Type with Digest.Split for getting both the type
and hash parts together, which is most the common case when asking for
either.
2024-04-23 18:24:17 -07:00
Daniel Hiltgen
16b52331a4
Merge pull request #3857 from dhiltgen/mem_escape_valve
...
Add back memory escape valve
2024-04-23 17:32:24 -07:00
Daniel Hiltgen
5445aaa94e
Add back memory escape valve
...
If we get our predictions wrong, this can be used to
set a lower memory limit as a workaround. Recent multi-gpu
refactoring accidentally removed it, so this adds it back.
2024-04-23 17:09:02 -07:00
Daniel Hiltgen
2ac3dd6853
Merge pull request #3850 from dhiltgen/windows_packaging
...
Move nested payloads to installer and zip file on windows
2024-04-23 16:35:20 -07:00
Daniel Hiltgen
d8851cb7a0
Harden sched TestLoad
...
Give the go routine a moment to deliver the expired event
2024-04-23 16:14:47 -07:00
Daniel Hiltgen
058f6cd2cc
Move nested payloads to installer and zip file on windows
...
Now that the llm runner is an executable and not just a dll, more users are facing
problems with security policy configurations on windows that prevent users
writing to directories and then executing binaries from the same location.
This change removes payloads from the main executable on windows and shifts them
over to be packaged in the installer and discovered based on the executables location.
This also adds a new zip file for people who want to "roll their own" installation model.
2024-04-23 16:14:47 -07:00
Daniel Hiltgen
790cf34d17
Merge pull request #3846 from dhiltgen/missing_runner
...
Detect and recover if runner removed
2024-04-23 13:14:12 -07:00
Michael
928d844896
adding phi-3 mini to readme
...
adding phi-3 mini to readme
2024-04-23 13:58:31 -04:00
Daniel Hiltgen
939d6a8606
Make CI lint verbvose
2024-04-23 10:17:42 -07:00
Daniel Hiltgen
58888a74bc
Detect and recover if runner removed
...
Tmp cleaners can nuke the file out from underneath us. This detects the missing
runner, and re-initializes the payloads.
2024-04-23 10:05:26 -07:00
Daniel Hiltgen
cc5a71e0e3
Merge pull request #3709 from remy415/custom-gpu-defs
...
Adds support for customizing GPU build flags in llama.cpp
2024-04-23 09:28:34 -07:00
Michael Yang
e83bcf7f9a
Merge pull request #3836 from ollama/mxyng/mixtral
...
fix: mixtral graph
2024-04-23 09:15:10 -07:00
Daniel Hiltgen
5690e5ce99
Merge pull request #3418 from dhiltgen/concurrency
...
Request and model concurrency
2024-04-23 08:31:38 -07:00
Daniel Hiltgen
f2ea8470e5
Local unicode test case
2024-04-22 19:29:12 -07:00
Daniel Hiltgen
34b9db5afc
Request and model concurrency
...
This change adds support for multiple concurrent requests, as well as
loading multiple models by spawning multiple runners. The default
settings are currently set at 1 concurrent request per model and only 1
loaded model at a time, but these can be adjusted by setting
OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.
2024-04-22 19:29:12 -07:00
Daniel Hiltgen
8711d03df7
Report errors on server lookup instead of path lookup failure
2024-04-22 19:08:47 -07:00
Daniel Hiltgen
ee448deaba
Merge pull request #3835 from dhiltgen/harden_llm_override
...
Trim spaces and quotes from llm lib override
2024-04-22 19:06:54 -07:00
Bruce MacDonald
6e8db04716
tidy community integrations
...
- move some popular integrations to the top of the lists
2024-04-22 17:29:08 -07:00
Bruce MacDonald
658e60cf73
Revert "stop running model on interactive exit"
...
This reverts commit fad00a85e5 .
2024-04-22 17:23:11 -07:00
Bruce MacDonald
4c78f028f8
Merge branch 'main' of https://github.com/ollama/ollama
2024-04-22 17:22:28 -07:00
Michael Yang
435cc866a3
fix: mixtral graph
2024-04-22 17:19:44 -07:00
Hao Wu
c7d3a558f6
docs: update README to add chat (web UI) for LLM ( #3810 )
...
* add chat (web UI) for LLM
I have used chat with llama3 in local successfully and the code is MIT licensed.
* Update README.md
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2024-04-22 20:19:39 -04:00
Maple Gao
089cdb2877
docs: Update README for Lobe-chat integration. ( #3817 )
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2024-04-22 20:18:15 -04:00
Võ Đình Đạt
ea1e9aa36b
Update README.md ( #3655 )
2024-04-22 20:16:55 -04:00
Jonathan Smoley
d0d28ef90d
Update README.md with Discord-Ollama project ( #3633 )
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2024-04-22 20:14:20 -04:00
Eric Curtin
6654186a7c
Add podman-ollama to terminal apps ( #3626 )
...
The goal of podman-ollama is to make AI even more boring.
Signed-off-by: Eric Curtin <ecurtin@redhat.com >
2024-04-22 20:13:23 -04:00
Daniel Hiltgen
aa72281eae
Trim spaces and quotes from llm lib override
2024-04-22 17:11:14 -07:00
reid41
74bcbf828f
add qa-pilot link ( #3612 )
...
* add qa-pilot link
* format the link
* add shell-pilot
2024-04-22 20:10:34 -04:00
Christian Neff
fe39147e64
Add Chatbot UI v2 to Community Integrations ( #3503 )
2024-04-22 20:09:55 -04:00
Bruce MacDonald
fad00a85e5
stop running model on interactive exit
2024-04-22 16:22:14 -07:00
Jeremy
9c0db4cc83
Update gen_windows.ps1
...
Fixed improper env references
2024-04-21 16:13:41 -04:00
Cheng
62be2050dd
chore: use errors.New to replace fmt.Errorf will much better ( #3789 )
2024-04-20 22:11:06 -04:00
Blake Mizerany
56f8aa6912
types/model: export IsValidNamePart ( #3788 )
2024-04-20 18:26:34 -07:00
Sri Siddhaarth
e6f9bfc0e8
Update api.md ( #3705 )
2024-04-20 15:17:03 -04:00
Jeremy
6f18297b3a
Update gen_windows.ps1
...
Forgot a " on the write-host
2024-04-18 19:47:44 -04:00
Jeremy
15016413de
Update gen_windows.ps1
...
Added OLLAMA_CUSTOM_CUDA_DEFS and OLLAMA_CUSTOM_ROCM_DEFS to customize GPU builds on Windows
2024-04-18 19:27:16 -04:00
Jeremy
440b7190ed
Update gen_linux.sh
...
Added OLLAMA_CUSTOM_CUDA_DEFS and OLLAMA_CUSTOM_ROCM_DEFS instead of OLLAMA_CUSTOM_GPU_DEFS
2024-04-18 19:18:10 -04:00
Daniel Hiltgen
8d1995c625
Merge pull request #3708 from remy415/arm64static
...
move Ollama static build to its own flag
2024-04-18 16:04:12 -07:00
Daniel Hiltgen
fd01fbf038
Merge pull request #3710 from remy415/update-jetson-docs
...
update jetson tutorial
2024-04-18 16:02:08 -07:00
Blake Mizerany
0408205c1c
types/model: accept former : as a separator in digest ( #3724 )
...
This also converges the old sep `:` to the new sep `-`.
2024-04-18 14:17:46 -07:00
Jeffrey Morgan
63a7edd771
Update README.md
2024-04-18 16:09:38 -04:00
Michael
554ffdcce3
add llama3 to readme
...
add llama3 to readme
2024-04-18 15:18:48 -04:00
ManniX-ITA
c496967e56
Merge branch 'ollama:main' into mannix-server
2024-04-18 18:45:15 +02:00
Jeremy
9850a4ce08
Merge branch 'ollama:main' into update-jetson-docs
2024-04-18 09:55:17 -04:00
Jeremy
3934c15895
Merge branch 'ollama:main' into custom-gpu-defs
2024-04-18 09:55:10 -04:00
Jeremy
fd048f1367
Merge branch 'ollama:main' into arm64static
2024-04-18 09:55:04 -04:00
Michael Yang
8645076a71
Merge pull request #3712 from ollama/mxyng/mem
...
add stablelm graph calculation
2024-04-17 15:57:51 -07:00
Michael Yang
05e9424824
Merge pull request #3664 from ollama/mxyng/fix-padding-2
...
fix padding to only return padding
2024-04-17 15:57:40 -07:00
Michael Yang
52ebe67a98
Merge pull request #3714 from ollama/mxyng/model-name-host
...
types/model: support : in PartHost for host:port
2024-04-17 15:34:03 -07:00
Michael Yang
889b31ab78
types/model: support : in PartHost for host:port
2024-04-17 15:16:07 -07:00
Michael Yang
3cf483fe48
add stablelm graph calculation
2024-04-17 13:57:19 -07:00
Jeremy
8dca03173d
Merge remote-tracking branch 'upstream/main' into update-jetson-docs
2024-04-17 16:18:50 -04:00
Jeremy
85bdf14b56
update jetson tutorial
2024-04-17 16:17:42 -04:00
Jeremy
d524e5ef5e
Merge branch 'custom-gpu-defs' of https://github.com/remy415/ollama into custom-gpu-defs
2024-04-17 16:01:03 -04:00
Jeremy
52f5370c48
add support for custom gpu build flags for llama.cpp
2024-04-17 16:00:48 -04:00
Jeremy
da8a0c7657
Merge branch 'ollama:main' into arm64static
2024-04-17 15:22:34 -04:00
Jeremy
1b42b4b59a
Merge branch 'ollama:main' into custom-gpu-defs
2024-04-17 15:21:56 -04:00
Jeremy
7c000ec3ed
adds support for OLLAMA_CUSTOM_GPU_DEFS to customize GPU build flags
2024-04-17 15:21:05 -04:00
jmorganca
c8afe7168c
use correct extension for feature and model request issue templates
2024-04-17 15:18:40 -04:00
jmorganca
28d3cd0148
simpler feature and model request forms
2024-04-17 15:17:08 -04:00
jmorganca
eb5554232a
simpler feature and model request forms
2024-04-17 15:14:49 -04:00
Jeremy
ea4c284a48
Merge branch 'ollama:main' into arm64static
2024-04-17 15:11:38 -04:00
jmorganca
2bdc320216
add descriptions to issue templates
2024-04-17 15:08:36 -04:00
jmorganca
32561aed09
simplify github issue templates a bit
2024-04-17 15:07:03 -04:00
Michael Yang
71548d9829
Merge pull request #3706 from ollama/mxyng/mem
...
account for all non-repeating layers
2024-04-17 11:58:20 -07:00
Jeremy
8aec92fa6d
rearranged conditional logic for static build, dockerfile updated
2024-04-17 14:43:28 -04:00
Michael Yang
a8b9b930b4
account for all non-repeating layers
2024-04-17 11:21:21 -07:00
Michael
9755cf9173
acknowledge the amazing work done by Georgi and team!
2024-04-17 13:48:14 -04:00
Jeremy
70261b9bb6
move static build to its own flag
2024-04-17 13:04:28 -04:00
ManniX-ITA
c942e4a07b
Fixed startup sequence to report model loading
2024-04-17 17:40:32 +02:00
ManniX-ITA
bd54b08261
Streamlined WaitUntilRunning
2024-04-17 17:39:52 +02:00
Blake Mizerany
9df6c85c3a
types/model: add FilepathNoBuild ( #3680 )
...
Also, add test for DisplayLongest.
Also, plumb fill param to ParseName in MustParseName
2024-04-16 18:35:43 -07:00
Michael Yang
e74163af4c
fix padding to only return padding
2024-04-16 15:43:26 -07:00
Michael Yang
fb9580df85
Merge pull request #3684 from ollama/mxyng/scale-graph
...
scale graph based on gpu count
2024-04-16 14:57:09 -07:00
Michael Yang
26df674785
scale graph based on gpu count
2024-04-16 14:44:13 -07:00
Jeffrey Morgan
7c9792a6e0
Support unicode characters in model path ( #3681 )
...
* parse wide argv characters on windows
* cleanup
* move cleanup to end of `main`
2024-04-16 17:00:12 -04:00
Michael Yang
7afb2e125a
Merge pull request #3678 from ollama/mxyng/fix-darwin-partial-offloading
...
darwin: no partial offloading if required memory greater than system
2024-04-16 12:05:56 -07:00
Michael Yang
41a272de9f
darwin: no partial offloading if required memory greater than system
2024-04-16 11:22:38 -07:00
Jeffrey Morgan
f335722275
update llama.cpp submodule to 7593639 ( #3665 )
2024-04-15 23:04:43 -04:00
Michael Yang
6d53b67c2c
Merge pull request #3663 from ollama/mxyng/fix-padding
2024-04-15 17:44:54 -07:00
Michael Yang
969238b19e
fix padding in decode
...
TODO: update padding() to _only_ returning the padding
2024-04-15 17:27:06 -07:00
Blake Mizerany
949d7832cf
Revert "cmd: provide feedback if OLLAMA_MODELS is set on non-serve command ( #3470 )" ( #3662 )
...
This reverts commit 7d05a6ee8f .
This proved to be more painful than useful.
See: https://github.com/ollama/ollama/issues/3624
2024-04-15 16:58:00 -07:00
Sung Kim
99d227c9db
Added Solar example at README.md ( #3610 )
...
Added just one line
| Solar | 10.7B | 6.1GB | `ollama run solar` |
2024-04-15 19:54:23 -04:00
Carlos Gamez
a27e419b47
Update langchainjs.md ( #2030 )
...
Changed ollama.call() for ollama.invoke() as per deprecated documentation from langchain
2024-04-15 18:37:30 -04:00
Chandre Van Der Westhuizen
e4d0db5a97
Added MindsDB information ( #3595 )
...
* Added MindsDB information
Added more details to MindsDB so that Ollama users can know that they can connect their Ollama model with 200+ databases and apps
* updated text for mindsdb
2024-04-15 18:35:29 -04:00
Eli Bendersky
ba460802c2
examples: add more Go examples using the API ( #3599 )
...
* examples: go-multimodal
* examples: add go-pull-progress
* examples: add go-chat
* fix
2024-04-15 18:34:54 -04:00
Jeffrey Morgan
e54a3c7fcd
Update modelfile.md
...
Remove Modelfile parameters that are decided at runtime
2024-04-15 15:35:44 -04:00
Patrick Devine
9f8691c6c8
Add llama2 / torch models for ollama create ( #3607 )
2024-04-15 11:26:42 -07:00
Jeffrey Morgan
a0b8a32eb4
Terminate subprocess if receiving SIGINT or SIGTERM signals while model is loading ( #3653 )
...
* terminate subprocess if receiving `SIGINT` or `SIGTERM` signals while model is loading
* use `unload` in signal handler
2024-04-15 12:09:32 -04:00
Jeffrey Morgan
7027f264fb
app: gracefully shut down ollama serve on windows ( #3641 )
...
* app: gracefully shut down `ollama serve` on windows
* fix linter errors
* bring back `HideWindow`
* remove creation flags
* restore `windows.CREATE_NEW_PROCESS_GROUP`
2024-04-14 18:33:25 -04:00
Blake Mizerany
9bee3b63b1
types/model: add path helpers ( #3619 )
...
This commit adds path helpers for working with Names in URL and file
paths. The new helpers are ParseNameFromPath, ParseNameFromFilePath,
Name.Path, and Name.FilePath.
This commit also adds Name.DisplayLongest, and Name.DisplayLong.
Also, be it updates a place where strings.StripPrefix is more consistent
with the surrounding code.
Also, replace Parts with specific methods
2024-04-13 12:59:19 -07:00
Jeffrey Morgan
309aef7fee
update llama.cpp submodule to 4bd0f93 ( #3627 )
2024-04-13 10:43:02 -07:00
Blake Mizerany
08655170aa
types/model: make ParseName variants less confusing ( #3617 )
...
Also, fix http stripping bug.
Also, improve upon docs about fills and masks.
2024-04-12 13:57:57 -07:00
Blake Mizerany
2b341069a7
types/model: remove (*Digest).Scan and Digest.Value ( #3605 )
2024-04-11 13:32:31 -07:00
Daniel Hiltgen
c00fee6936
Merge pull request #3604 from dhiltgen/fix_rocm_deps
...
Fix rocm deps with new subprocess paths
2024-04-11 13:08:29 -07:00
Daniel Hiltgen
c2d813bdc3
Fix rocm deps with new subprocess paths
2024-04-11 12:52:06 -07:00
Michael Yang
786f3a1c44
Merge pull request #3600 from ollama/mxyng/mixtral
2024-04-11 12:23:37 -07:00
Michael Yang
3397eff0cd
mixtral mem
2024-04-11 11:10:41 -07:00
Blake Mizerany
0efb7931c7
Revert "types/model: remove (*Digest).Scan and Digest.Value ( #3589 )"
...
This reverts commit 42f2cc408e .
2024-04-11 00:45:07 -07:00
Blake Mizerany
42f2cc408e
types/model: remove (*Digest).Scan and Digest.Value ( #3589 )
2024-04-11 00:37:26 -07:00
Blake Mizerany
9446b795b5
types/model: remove DisplayLong ( #3587 )
2024-04-10 16:55:12 -07:00
Blake Mizerany
62f8cda3b3
types/model: remove MarshalText/UnmarshalText from Digest ( #3586 )
2024-04-10 16:52:49 -07:00
Blake Mizerany
6a1de23175
types/model: init with Name and Digest types ( #3541 )
2024-04-10 16:30:05 -07:00
Blake Mizerany
a7b431e743
server: provide helpful workaround hint when stalling on pull ( #3584 )
...
This is a quick fix to help users who are stuck on the "pull" step at
99%.
In the near future we're introducing a new registry client that
should/will hopefully be smarter. In the meantime, this should unblock
the users hitting issue #1736 .
2024-04-10 16:24:37 -07:00
Michael Yang
5a25f93522
Merge pull request #3478 from ollama/mxyng/tensor-layer
...
refactor tensor query
2024-04-10 12:45:03 -07:00
Michael Yang
7e33a017c0
partial offloading
2024-04-10 11:37:20 -07:00
Michael Yang
8b2c10061c
refactor tensor query
2024-04-10 11:37:20 -07:00
Michael Yang
c5c451ca3b
Merge pull request #3579 from ollama/mxyng/fix-ci
...
fix ci
2024-04-10 11:37:01 -07:00
Michael Yang
2b4ca6cf36
fix ci
2024-04-10 11:35:12 -07:00
Eli Bendersky
ad90b9ab3d
api: start adding documentation to package api ( #2878 )
...
* api: start adding documentation to package api
Updates #2840
* Fix lint typo report
2024-04-10 13:31:55 -04:00
Eli Bendersky
4340f8eba4
examples: start adding Go examples using api/ ( #2879 )
...
We can have the same examples as e.g. https://github.com/ollama/ollama-python/tree/main/examples
here. Using consistent naming and renaming the existing example to have -http-
since it uses direct HTTP requests rather than api/
Updates #2840
2024-04-10 13:26:45 -04:00
Daniel Hiltgen
4c7db6b7e9
Merge pull request #3566 from dhiltgen/more_time
...
Handle very slow model loads
2024-04-09 16:53:49 -07:00
Michael Yang
c03f0e3c3d
Merge pull request #3565 from ollama/mxyng/rope
...
fix: rope
2024-04-09 16:36:55 -07:00
Daniel Hiltgen
c5ff443b9f
Handle very slow model loads
...
During testing, we're seeing some models take over 3 minutes.
2024-04-09 16:35:10 -07:00
Michael Yang
01114b4526
fix: rope
2024-04-09 16:15:24 -07:00
Blake Mizerany
1524f323a3
Revert "build.go: introduce a friendlier way to build Ollama ( #3548 )" ( #3564 )
2024-04-09 15:57:45 -07:00
Blake Mizerany
fccf3eecaa
build.go: introduce a friendlier way to build Ollama ( #3548 )
...
This commit introduces a more friendly way to build Ollama dependencies
and the binary without abusing `go generate` and removing the
unnecessary extra steps it brings with it.
This script also provides nicer feedback to the user about what is
happening during the build process.
At the end, it prints a helpful message to the user about what to do
next (e.g. run the new local Ollama).
2024-04-09 14:18:47 -07:00
Michael Yang
c77d45d836
Merge pull request #3506 from ollama/mxyng/quantize-redux
...
cgo quantize
2024-04-09 12:32:53 -07:00
Jeffrey Morgan
5ec12cec6c
update llama.cpp submodule to 1b67731 ( #3561 )
2024-04-09 15:10:17 -04:00
Michael Yang
d9578d2bad
Merge pull request #3559 from ollama/mxyng/ci
...
ci: use go-version-file
2024-04-09 11:03:18 -07:00
Michael Yang
cb8352d6b4
ci: use go-version-file
2024-04-09 09:50:12 -07:00
Alex Mavrogiannis
fc6558f47f
Correct directory reference in macapp/README ( #3555 )
2024-04-09 09:48:46 -04:00
Michael Yang
9502e5661f
cgo quantize
2024-04-08 15:31:08 -07:00
Michael Yang
e1c9a2a00f
no blob create if already exists
2024-04-08 15:09:48 -07:00
writinwaters
1341ee1b56
Update README.md ( #3539 )
...
RAGFlow now supports integration with Ollama.
2024-04-08 10:58:14 -04:00
Jeffrey Morgan
63efa075a0
update generate scripts with new LLAMA_CUDA variable, set HIP_PLATFORM to avoid compiler errors ( #3528 )
2024-04-07 19:29:51 -04:00
Thomas Vitale
cb03fc9571
Docs: Remove wrong parameter for Chat Completion ( #3515 )
...
Fixes gh-3514
Signed-off-by: Thomas Vitale <ThomasVitale@users.noreply.github.com >
2024-04-06 09:08:35 -07:00
Michael Yang
a5ec9cfc0f
Merge pull request #3508 from ollama/mxyng/rope
2024-04-05 18:46:06 -07:00
Michael Yang
be517e491c
no rope parameters
2024-04-05 18:05:27 -07:00
Michael Yang
fc8e108642
Merge pull request #3496 from ollama/mxyng/cmd-r-graph
...
add command-r graph estimate
2024-04-05 12:26:21 -07:00
Daniel Hiltgen
c5d5c4a96c
Merge pull request #3491 from dhiltgen/context_bust_test
...
Add test case for context exhaustion
2024-04-04 16:20:20 -07:00
Daniel Hiltgen
dfe330fa1c
Merge pull request #3488 from mofanke/fix-windows-dll-compress
...
fix dll compress in windows building
2024-04-04 16:12:13 -07:00
Michael Yang
01f77ae25d
add command-r graph estimate
2024-04-04 14:07:24 -07:00
Daniel Hiltgen
483b81a863
Merge pull request #3494 from dhiltgen/ci_release
...
Fail fast if mingw missing on windows
2024-04-04 10:15:40 -07:00
Daniel Hiltgen
36bd967722
Fail fast if mingw missing on windows
2024-04-04 09:51:26 -07:00
Jeffrey Morgan
b0e7d35db8
use an older version of the mac os sdk in release ( #3484 )
2024-04-04 09:48:54 -07:00
Daniel Hiltgen
aeb1fb5192
Add test case for context exhaustion
...
Confirmed this fails on 0.1.30 with known regression
but passes on main
2024-04-04 07:42:17 -07:00
Daniel Hiltgen
a2e60ebcaf
Merge pull request #3490 from dhiltgen/ci_fixes
...
CI missing archive
2024-04-04 07:24:24 -07:00
Daniel Hiltgen
883ec4d1ef
CI missing archive
2024-04-04 07:23:27 -07:00
mofanke
4de0126719
fix dll compress in windows building
2024-04-04 21:27:33 +08:00
Daniel Hiltgen
9768e2dc75
Merge pull request #3481 from dhiltgen/ci_fixes
...
CI subprocess path fix
2024-04-03 19:29:09 -07:00
Daniel Hiltgen
08600d5bec
CI subprocess path fix
2024-04-03 19:12:53 -07:00
Daniel Hiltgen
a624e672d2
Merge pull request #3479 from dhiltgen/ci_fixes
...
Fix CI release glitches
2024-04-03 18:42:27 -07:00
Daniel Hiltgen
e4a7e5b2ca
Fix CI release glitches
...
The subprocess change moved the build directory
arm64 builds weren't setting cross-compilation flags when building on x86
2024-04-03 16:41:40 -07:00
Michael Yang
a0a15cfd5b
Merge pull request #3463 from ollama/mxyng/graph-estimate
...
update graph size estimate
2024-04-03 14:27:30 -07:00
Michael Yang
12e923e158
update graph size estimate
2024-04-03 13:34:12 -07:00
Jeffrey Morgan
cd135317d2
Fix macOS builds on older SDKs ( #3467 )
2024-04-03 10:45:54 -07:00
Michael Yang
4f895d633f
Merge pull request #3466 from ollama/mxyng/head-kv
...
default head_kv to 1
2024-04-03 10:41:00 -07:00
Blake Mizerany
7d05a6ee8f
cmd: provide feedback if OLLAMA_MODELS is set on non-serve command ( #3470 )
...
This also moves the checkServerHeartbeat call out of the "RunE" Cobra
stuff (that's the only word I have for that) to on-site where it's after
the check for OLLAMA_MODELS, which allows the helpful error message to
be printed before the server heartbeat check. This also arguably makes
the code more readable without the magic/superfluous "pre" function
caller.
2024-04-02 22:11:13 -07:00
Daniel Hiltgen
464d817824
Merge pull request #3464 from dhiltgen/subprocess
...
Fix numgpu opt miscomparison
2024-04-02 20:10:17 -07:00
Pier Francesco Contino
531324a9be
feat: add OLLAMA_DEBUG in ollama server help message ( #3461 )
...
Co-authored-by: Pier Francesco Contino <pfcontino@gmail.com >
2024-04-02 18:20:03 -07:00
Daniel Hiltgen
6589eb8a8c
Revert options as a ref in the server
2024-04-02 16:44:10 -07:00
Michael Yang
90f071c658
default head_kv to 1
2024-04-02 16:37:59 -07:00
Michael Yang
a039e383cd
Merge pull request #3465 from ollama/mxyng/fix-metal
...
fix metal gpu
2024-04-02 16:29:58 -07:00
Michael Yang
80163ebcb5
fix metal gpu
2024-04-02 16:06:45 -07:00
Daniel Hiltgen
a57818d93e
Merge pull request #3343 from dhiltgen/bump_more2
...
Bump llama.cpp to b2581
2024-04-02 15:08:26 -07:00
Daniel Hiltgen
841adda157
Fix windows lint CI flakiness
2024-04-02 12:22:16 -07:00
Daniel Hiltgen
0035e31af8
Bump to b2581
2024-04-02 11:53:07 -07:00
Daniel Hiltgen
c863c6a96d
Merge pull request #3218 from dhiltgen/subprocess
...
Switch back to subprocessing for llama.cpp
2024-04-02 10:49:44 -07:00
Daniel Hiltgen
1f11b52511
Refined min memory from testing
2024-04-01 16:48:33 -07:00
Daniel Hiltgen
526d4eb204
Release gpu discovery library after use
...
Leaving the cudart library loaded kept ~30m of memory
pinned in the GPU in the main process. This change ensures
we don't hold GPU resources when idle.
2024-04-01 16:48:33 -07:00
Daniel Hiltgen
0a74cb31d5
Safeguard for noexec
...
We may have users that run into problems with our current
payload model, so this gives us an escape valve.
2024-04-01 16:48:33 -07:00
Daniel Hiltgen
10ed1b6292
Detect too-old cuda driver
...
"cudart init failure: 35" isn't particularly helpful in the logs.
2024-04-01 16:48:33 -07:00
Daniel Hiltgen
4fec5816d6
Integration test improvements
...
Cleaner shutdown logic, a bit of response hardening
2024-04-01 16:48:18 -07:00
Daniel Hiltgen
0a0e9f3e0f
Apply 01-cache.diff
2024-04-01 16:48:18 -07:00
Daniel Hiltgen
58d95cc9bd
Switch back to subprocessing for llama.cpp
...
This should resolve a number of memory leak and stability defects by allowing
us to isolate llama.cpp in a separate process and shutdown when idle, and
gracefully restart if it has problems. This also serves as a first step to be
able to run multiple copies to support multiple models concurrently.
2024-04-01 16:48:18 -07:00
Patrick Devine
3b6a9154dd
Simplify model conversion ( #3422 )
2024-04-01 16:14:53 -07:00
Michael Yang
d6dd2ff839
Merge pull request #3241 from ollama/mxyng/mem
...
update memory estimations for gpu offloading
2024-04-01 13:59:14 -07:00
Michael Yang
e57a6ba89f
Merge pull request #2926 from ollama/mxyng/decode-ggml-v2
...
refactor model parsing
2024-04-01 13:58:13 -07:00
Michael Yang
12ec2346ef
Merge pull request #3442 from ollama/mxyng/generate-output
...
fix generate output
2024-04-01 13:56:09 -07:00
Michael Yang
1ec0df1069
fix generate output
2024-04-01 13:47:34 -07:00
Michael Yang
91b3e4d282
update memory calcualtions
...
count each layer independently when deciding gpu offloading
2024-04-01 13:16:32 -07:00
Michael Yang
d338d70492
refactor model parsing
2024-04-01 13:16:15 -07:00
Philipp Gillé
011bb67351
Add chromem-go to community integrations ( #3437 )
2024-04-01 11:17:37 -04:00
Saifeddine ALOUI
d124627202
Update README.md ( #3436 )
2024-04-01 11:16:31 -04:00
Jesse Zhang
b0a8246a69
Community Integration: CRAG Ollama Chat ( #3423 )
...
Corrective Retrieval Augmented Generation Demo, powered by Langgraph and Streamlit 🤗
Support:
- Ollama
- OpenAI APIs
2024-04-01 11:16:14 -04:00
Yaroslav
e6fb39c182
Update README.md ( #3378 )
...
Plugins list updated
2024-03-31 13:10:05 -04:00
sugarforever
e1f1c374ea
Community Integration: ChatOllama ( #3400 )
...
* Community Integration: ChatOllama
* fixed typo
2024-03-30 22:46:50 -04:00
Jeffrey Morgan
06a1508bfe
Update 90_bug_report.yml
2024-03-29 10:11:17 -04:00
Patrick Devine
5a5efee46b
Add gemma safetensors conversion ( #3250 )
...
Co-authored-by: Michael Yang <mxyng@pm.me >
2024-03-28 18:54:01 -07:00
Daniel Hiltgen
97ae517fbf
Merge pull request #3398 from dhiltgen/release_latest
...
CI automation for tagging latest images
2024-03-28 16:25:54 -07:00
Daniel Hiltgen
44b813e459
Merge pull request #3377 from dhiltgen/rocm_v6_bump
...
Bump ROCm to 6.0.2 patch release
2024-03-28 16:07:54 -07:00
Daniel Hiltgen
539043f5e0
CI automation for tagging latest images
2024-03-28 16:07:37 -07:00
Daniel Hiltgen
dbcace6847
Merge pull request #3392 from dhiltgen/ci_build_win_cuda
...
CI windows gpu builds
2024-03-28 16:03:52 -07:00
Daniel Hiltgen
c91a4ebcff
Bump ROCm to 6.0.2 patch release
2024-03-28 15:58:57 -07:00
Daniel Hiltgen
b79c7e4528
CI windows gpu builds
...
If we're doing generate, test windows cuda and rocm as well
2024-03-28 14:39:10 -07:00
Michael Yang
035b274b70
Merge pull request #3379 from ollama/mxyng/origins
...
fix: trim quotes on OLLAMA_ORIGINS
2024-03-28 14:14:18 -07:00
Michael Yang
9c6a254945
Merge pull request #3391 from ollama/mxyng-patch-1
2024-03-28 13:15:56 -07:00
Michael Yang
f31f2bedf4
Update troubleshooting link
2024-03-28 12:05:26 -07:00
Michael Yang
756c257553
Merge pull request #3380 from ollama/mxyng/conditional-generate
...
fix: workflows
2024-03-28 00:35:27 +01:00
Michael Yang
5255d0af8a
fix: workflows
2024-03-27 16:30:01 -07:00
Michael Yang
af8a8a6b59
fix: trim quotes on OLLAMA_ORIGINS
2024-03-27 15:24:29 -07:00
Michael Yang
461ad25015
Merge pull request #3376 from ollama/mxyng/conditional-generate
...
only generate on changes to llm subdirectory
2024-03-27 22:12:53 +01:00
Michael Yang
8838ae787d
stub stub
2024-03-27 13:59:12 -07:00
Michael Yang
db75402ade
mangle arch
2024-03-27 13:44:50 -07:00
Michael Yang
1e85a140a3
only generate on changes to llm subdirectory
2024-03-27 12:45:35 -07:00
Michael Yang
c363282fdc
Merge pull request #3375 from ollama/mxyng/conditional-generate
...
only generate cuda/rocm when changes to llm detected
2024-03-27 20:40:55 +01:00
Michael Yang
5b0c48d29e
only generate cuda/rocm when changes to llm detected
2024-03-27 12:23:09 -07:00
Jeffrey Morgan
913306f4fd
Detect arrow keys on windows ( #3363 )
...
* detect arrow keys on windows
* add some helpful comments
2024-03-26 18:21:56 -04:00
Jeffrey Morgan
f5ca7f8c8e
add license in file header for vendored llama.cpp code ( #3351 )
2024-03-26 16:23:23 -04:00
Jeffrey Morgan
856b8ec131
remove need for $VSINSTALLDIR since build will fail if ninja cannot be found ( #3350 )
2024-03-26 16:23:16 -04:00
Patrick Devine
1b272d5bcd
change github.com/jmorganca/ollama to github.com/ollama/ollama ( #3347 )
2024-03-26 13:04:17 -07:00
Christophe Dervieux
29715dbca7
malformed markdown link ( #3358 )
2024-03-26 10:46:36 -04:00
Daniel Hiltgen
54a028d07f
Merge pull request #3356 from dhiltgen/fix_arm_linux
...
Switch runner for final release job
2024-03-25 20:54:46 -07:00
Daniel Hiltgen
f83e4db365
Switch runner for final release job
...
The manifest and tagging step use a lot of disk space
2024-03-25 20:51:40 -07:00
Daniel Hiltgen
3b5866a233
Merge pull request #3353 from dhiltgen/fix_arm_linux
...
Use Rocky Linux Vault to get GCC 10.2 installed
2024-03-25 19:38:56 -07:00
Daniel Hiltgen
b8c2be6142
Use Rocky Linux Vault to get GCC 10.2 installed
...
This should hopefully only be a temporary workaround until Rocky 8
picks up GCC 10.4 which fixes the NVCC bug
2024-03-25 19:18:50 -07:00
Daniel Hiltgen
e0319bd78d
Revert "Switch arm cuda base image to centos 7"
...
This reverts commit 5dacc1ebe8 .
2024-03-25 19:01:11 -07:00
Daniel Hiltgen
b31ed7f031
Merge pull request #3352 from dhiltgen/fix_arm_linux
...
Switch arm cuda base image to centos 7
2024-03-25 16:13:10 -07:00
Daniel Hiltgen
5dacc1ebe8
Switch arm cuda base image to centos 7
...
We had started using rocky linux 8, but they've updated to GCC 10.3,
which breaks NVCC. 10.2 is compatible (or 10.4, but that's not
available from rocky linux 8 repos yet)
2024-03-25 15:57:08 -07:00
Daniel Hiltgen
c2712b5566
Merge pull request #3348 from dhiltgen/bump_llamacpp
...
Bump llama.cpp to b2527
2024-03-25 14:15:53 -07:00
Daniel Hiltgen
8091ef2eeb
Bump llama.cpp to b2527
2024-03-25 13:47:44 -07:00
Jeffrey Morgan
f38b705dc7
Fix ROCm link in development.md
2024-03-25 16:32:44 -04:00
Daniel Hiltgen
560be5e0b6
Merge pull request #3308 from dhiltgen/bump_more
...
Bump llama.cpp to b2510
2024-03-25 12:56:12 -07:00
Daniel Hiltgen
4a1c76b3aa
Merge pull request #3331 from dhiltgen/integration_testing
...
Integration tests conditionally pull
2024-03-25 12:48:51 -07:00
Daniel Hiltgen
28a64e23ca
Merge pull request #2279 from remy415/main
...
Add support for libcudart.so for CUDA devices (Adds Jetson support)
2024-03-25 12:46:28 -07:00
Niclas Pahlfer
92d74e2f59
adds ooo to community integrations ( #1623 )
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2024-03-25 15:08:33 -04:00
Herval Freire
6f8f57dd1d
Add cliobot to ollama supported list ( #1873 )
...
* Update README.md
* Update README.md
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2024-03-25 15:07:19 -04:00
Chenhe Gu
b2fa68b0ea
Add Dify.AI to community integrations ( #1944 )
...
Dify.AI is a model-agnostic LLMOps platform for building and managing LLM applications.
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2024-03-25 15:06:39 -04:00
Marco Antônio
3767d5ef0d
enh: add ollero.nvim to community applications ( #1905 )
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2024-03-25 15:06:08 -04:00
Ani Betts
9fed85bc8b
Add typechat-cli to Terminal apps ( #2428 )
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2024-03-25 15:05:04 -04:00
Miguel
4501bc0913
add new Web & Desktop link in readme for alpaca webui ( #2881 )
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2024-03-25 15:00:18 -04:00
Danny Avila
57ba519e63
Add LibreChat to Web & Desktop Apps ( #2918 )
2024-03-25 14:59:18 -04:00
enoch1118
d98d322d24
Add Community Integration: OllamaGUI ( #2927 )
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2024-03-25 14:58:28 -04:00
fly2tomato
0c3ec74cf1
Add Community Integration: OpenAOE ( #2946 )
...
* Update README.md
* Update README.md
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2024-03-25 14:57:40 -04:00
tusharhero
42ae8359fa
docs: Add AI telegram to Community Integrations. ( #3033 )
2024-03-25 14:56:42 -04:00
Timothy Carambat
e4b76dfb76
docs: Add AnythingLLM to README as integration option ( #3145 )
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2024-03-25 14:54:48 -04:00
Jikku Jose
2c56517494
Add Saddle ( #3178 )
2024-03-25 14:54:09 -04:00
Yusuf Can Bayrak
cfbc1b152b
tlm added to README.md terminal section. ( #3274 )
2024-03-25 14:53:26 -04:00
RAPID ARCHITECT
9305ac1b2e
Update README.md ( #3288 )
...
Added Ollama Basic chat based on hyperdiv
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2024-03-25 14:52:25 -04:00
drazdra
45d6292959
Update README.md ( #3338 )
...
adding drazdra/ollama-chats to the list of UI :)
2024-03-25 14:50:51 -04:00
Blake Mizerany
22921a3969
doc: specify ADAPTER is optional ( #3333 )
2024-03-25 09:43:19 -07:00
Daniel Hiltgen
7b6cbc10ec
Integration tests conditionally pull
...
If images aren't present, pull them.
Also fixes the expected responses
2024-03-25 08:57:45 -07:00
Jeremy
dfc6721b20
add support for libcudart.so for CUDA devices (adds Jetson support)
2024-03-25 11:07:44 -04:00
Blake Mizerany
acfa2b9422
llm: prevent race appending to slice ( #3320 )
2024-03-24 11:35:54 -07:00
Daniel Hiltgen
2c390a73ac
Merge pull request #3282 from dhiltgen/gpu_docs
...
Add docs for GPU selection and nvidia uvm workaround
2024-03-24 19:15:03 +01:00
Daniel Hiltgen
3e30c75f3e
Bump llama.cpp to b2510
2024-03-23 19:55:56 +01:00
Eddú Meléndez Gonzales
7e430ff352
Add Testcontainers into Libraries section ( #3291 )
...
Testcontainers provides a module for Ollama.
2024-03-23 19:55:25 +01:00
Daniel Hiltgen
1784113ef5
Merge pull request #3309 from dhiltgen/integration_testing
...
Revamp go based integration tests
2024-03-23 19:08:49 +01:00
Daniel Hiltgen
949b6c01e0
Revamp go based integration tests
...
This uplevels the integration tests to run the server which can allow
testing an existing server, or a remote server.
2024-03-23 14:24:18 +01:00
jmorganca
38daf0a252
rename .gitattributes
2024-03-23 12:40:31 +01:00
Daniel Hiltgen
43799532c1
Bump llama.cpp to b2474
...
The release just before ggml-cuda.cu refactoring
2024-03-23 09:54:56 +01:00
Daniel Hiltgen
d8fdbfd8da
Add docs for GPU selection and nvidia uvm workaround
2024-03-21 11:52:54 +01:00
Bruce MacDonald
a5ba0fcf78
doc: faq gpu compatibility ( #3142 )
2024-03-21 05:21:34 -04:00
Jeffrey Morgan
3a30bf56dc
Update faq.md
2024-03-20 17:48:39 +01:00
Daniel Hiltgen
a1c0a48524
Merge pull request #3122 from dhiltgen/better_tmp_cleanup
...
Better tmpdir cleanup
2024-03-20 16:28:03 +01:00
Daniel Hiltgen
74788b487c
Better tmpdir cleanup
...
If expanding the runners fails, don't leave a corrupt/incomplete payloads dir
We now write a pid file out to the tmpdir, which allows us to scan for stale tmpdirs
and remove this as long as there isn't still a process running.
2024-03-20 16:03:19 +01:00
Jeffrey Morgan
7ed3e94105
Update faq.md
2024-03-18 10:24:39 +01:00
jmorganca
2297ad39da
update faq.md
2024-03-18 10:17:59 +01:00
Michael Yang
01cff6136d
Merge pull request #3217 from ollama/mxyng/cleanup
...
remove global
2024-03-18 02:13:30 -07:00
Michael Yang
3c4ad0ecab
dyn global
2024-03-18 09:45:45 +01:00
Michael Yang
22f326464e
Merge pull request #3083 from ollama/mxyng/refactor-readseeker
...
refactor readseeker
2024-03-16 12:08:56 -07:00
Jeffrey Morgan
e95ffc7448
llama: remove server static assets ( #3174 )
2024-03-15 19:24:12 -07:00
Jeffrey Morgan
2dce1ab40b
add llm/ext_server directory to linguist-vendored ( #3173 )
2024-03-15 17:46:46 -07:00
Daniel Hiltgen
f4b31c2d53
Merge pull request #3111 from alitrack/main
...
Update ollama.iss
2024-03-15 16:46:59 -07:00
Daniel Hiltgen
ab3456207b
Merge pull request #3028 from ollama/ci_release
...
CI release process
2024-03-15 16:40:54 -07:00
Daniel Hiltgen
6ad414f31e
Merge pull request #3086 from dhiltgen/import_server
...
Import server.cpp to retain llava support
2024-03-15 16:10:35 -07:00
Daniel Hiltgen
052b5a3b77
Merge pull request #3171 from dhiltgen/rocm_94x
...
Add Radeon gfx940-942 GPU support
2024-03-15 15:58:33 -07:00
Daniel Hiltgen
d4c10df2b0
Add Radeon gfx940-942 GPU support
2024-03-15 15:34:58 -07:00
Daniel Hiltgen
540f4af45f
Wire up more complete CI for releases
...
Flesh out our github actions CI so we can build official releaes.
2024-03-15 12:37:36 -07:00
Blake Mizerany
6ce37e4d96
llm,readline: use errors.Is instead of simple == check ( #3161 )
...
This fixes some brittle, simple equality checks to use errors.Is. Since
go1.13, errors.Is is the idiomatic way to check for errors.
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2024-03-15 07:14:12 -07:00
Blake Mizerany
703684a82a
server: replace blob prefix separator from ':' to '-' ( #3146 )
...
This fixes issues with blob file names that contain ':' characters to be rejected by file systems that do not support them.
2024-03-14 20:18:06 -07:00
Daniel Hiltgen
6459377ae0
Add ROCm support to linux install script ( #2966 )
2024-03-14 18:00:16 -07:00
Blake Mizerany
8546dd3d72
.github: fix model and feature request yml ( #3155 )
2024-03-14 15:26:06 -07:00
Blake Mizerany
87100be5e0
.github: add issue templates ( #3143 )
2024-03-14 15:19:10 -07:00
Michael Yang
e87c780ff9
Merge pull request #3149 from ollama/mxyng/fix-memory-leak
...
fix: clip memory leak
2024-03-14 13:34:15 -07:00
Michael Yang
291c663865
fix: clip memory leak
2024-03-14 13:12:42 -07:00
Daniel Hiltgen
da20786e3e
Merge pull request #3068 from dhiltgen/win_pipe
...
Use stdin for term discovery on windows
2024-03-14 11:55:19 -07:00
Jeffrey Morgan
5ce997a7b9
Update README.md
2024-03-13 21:12:17 -07:00
Jeffrey Morgan
672ffe9b7d
add OLLAMA_KEEP_ALIVE to environment variable docs for ollama serve ( #3127 )
2024-03-13 14:35:33 -07:00
Patrick Devine
47cfe58af5
Default Keep Alive environment variable ( #3094 )
...
---------
Co-authored-by: Chris-AS1 <8493773+Chris-AS1@users.noreply.github.com >
2024-03-13 13:29:40 -07:00
Daniel Hiltgen
c1a81c6fe3
Use stdin for term discovery on windows
...
When you feed input to the cmd via a pipe it no longer reports a warning
2024-03-13 10:37:31 -07:00
Steven Lee
152ab524c2
Update ollama.iss
...
add arm64 support
2024-03-13 20:15:45 +08:00
Jeffrey Morgan
e72c567cfd
restore locale patch ( #3091 )
2024-03-12 22:08:13 -07:00
Bruce MacDonald
3e22611200
token repeat limit for prediction requests ( #3080 )
2024-03-12 22:08:25 -04:00
Daniel Hiltgen
a54d4a28dc
Merge pull request #3088 from dhiltgen/rocm_igpu_linux
...
Fix iGPU detection for linux
2024-03-12 17:20:27 -07:00
Daniel Hiltgen
82b0c7c27e
Fix iGPU detection for linux
...
This fixes a few bugs in the new sysfs discovery logic. iGPUs are now
correctly identified by their <1G VRAM reported. the sysfs IDs are off
by one compared to what HIP wants due to the CPU being reported
in amdgpu, but HIP only cares about GPUs.
2024-03-12 16:57:19 -07:00
Patrick Devine
ba7cf7fb66
add more docs on for the modelfile message command ( #3087 )
2024-03-12 16:41:41 -07:00
Bruce MacDonald
2f804068bd
warn when json format is expected but not mentioned in prompt ( #3081 )
2024-03-12 19:07:11 -04:00
Daniel Hiltgen
85129d3a32
Adapt our build for imported server.cpp
2024-03-12 14:57:15 -07:00
Daniel Hiltgen
9ac6440da3
Import server.cpp as of b2356
2024-03-12 13:58:06 -07:00
Michael Yang
0085297928
refactor readseeker
2024-03-12 12:54:18 -07:00
Daniel Hiltgen
34d00f90b1
Merge pull request #3070 from dhiltgen/visible_devices
...
Add docs explaining GPU selection env vars
2024-03-12 11:36:46 -07:00
Daniel Hiltgen
b53229a2ed
Add docs explaining GPU selection env vars
2024-03-12 11:33:06 -07:00
racerole
53c107e20e
chore: fix typo ( #3073 )
...
Signed-off-by: racerole <jiangyifeng@outlook.com >
2024-03-12 14:09:22 -04:00
mofanke
51578d8573
fix gpu_info_cuda.c compile warning ( #3077 )
2024-03-12 14:08:40 -04:00
Jeffrey Morgan
b5fcd9d3aa
use -trimpath when building releases ( #3069 )
2024-03-11 15:58:46 -07:00
Bruce MacDonald
b80661e8c7
relay load model errors to the client ( #3065 )
2024-03-11 16:48:27 -04:00
Jeffrey Morgan
6d3adfbea2
Update troubleshooting.md
2024-03-11 13:22:28 -07:00
Jeffrey Morgan
369eda65f5
update llama.cpp submodule to ceca1ae ( #3064 )
2024-03-11 12:57:48 -07:00
Michael Yang
f878e91070
Merge pull request #3044 from ollama/mxyng/fix-convert-shape
...
convert: fix shape
2024-03-11 09:56:57 -07:00
Daniel Hiltgen
0d651478e4
Merge pull request #3056 from dhiltgen/rocm_link_clash
...
Avoid rocm runner and dependency clash
2024-03-11 09:48:48 -07:00
Michael Yang
9ea492f1ce
convert: fix shape
2024-03-11 09:41:01 -07:00
Daniel Hiltgen
bc13da2bfe
Avoid rocm runner and dependency clash
...
Putting the rocm symlink next to the runners is risky. This moves
the payloads into a subdir to avoid potential clashes.
2024-03-11 09:33:22 -07:00
Jeffrey Morgan
41b00b9856
fix 03-locale.diff
2024-03-10 16:21:05 -07:00
Daniel Hiltgen
c2a8ed48e7
Merge pull request #3048 from dhiltgen/harden_rocm_deps
...
Harden for deps file being empty (or short)
2024-03-10 15:17:22 -07:00
Daniel Hiltgen
3dc1bb6a35
Harden for deps file being empty (or short)
2024-03-10 14:45:38 -07:00
Daniel Hiltgen
7865a6996a
Merge pull request #3046 from dhiltgen/rocm_search_paths
...
Add ollama executable peer dir for rocm
2024-03-10 12:30:56 -07:00
Daniel Hiltgen
00ec269321
Add ollama executable peer dir for rocm
...
This allows people who package up ollama on their own to place
the rocm dependencies in a peer directory to the ollama executable
much like our windows install flow.
2024-03-10 12:16:30 -07:00
Jeffrey Morgan
908005d90b
patch: use default locale in wpm tokenizer ( #3034 )
2024-03-09 21:12:12 -08:00
Jeffrey Morgan
cdf65e793f
only copy deps for amd64 in build_linux.sh
2024-03-09 17:55:22 -08:00
Daniel Hiltgen
82ca694d68
Rename ROCm deps file to avoid confusion ( #3025 )
2024-03-09 17:48:38 -08:00
Jeffrey Morgan
5017a15bcb
add macapp to .dockerignore
2024-03-09 16:07:06 -08:00
Jeffrey Morgan
e11668aa07
add bundle_metal and cleanup_metal funtions to gen_darwin.sh
2024-03-09 16:04:57 -08:00
Jeffrey Morgan
0bd0f4a29c
tidy cleanup logs
2024-03-09 15:56:48 -08:00
Jeffrey Morgan
1ffb1e2874
update llama.cpp submodule to 77d1ac7 ( #3030 )
2024-03-09 15:55:34 -08:00
Daniel Hiltgen
0a7844413c
Merge pull request #3026 from dhiltgen/win_rocm_docs
...
Doc how to set up ROCm builds on windows
2024-03-09 14:17:19 -08:00
Jeffrey Morgan
f9cd55c70b
disable gpu for certain model architectures and fix divide-by-zero on memory estimation
2024-03-09 12:51:38 -08:00
Daniel Hiltgen
0fdebb34a9
Doc how to set up ROCm builds on windows
2024-03-09 11:29:45 -08:00
Daniel Hiltgen
ac64cd4ef9
Merge pull request #3008 from dhiltgen/no_more_idempotent
...
Finish unwinding idempotent payload logic
2024-03-09 09:13:24 -08:00
Daniel Hiltgen
4a5c9b8035
Finish unwinding idempotent payload logic
...
The recent ROCm change partially removed idempotent
payloads, but the ggml-metal.metal file for mac was still
idempotent. This finishes switching to always extract
the payloads, and now that idempotentcy is gone, the
version directory is no longer useful.
2024-03-09 08:34:39 -08:00
Jeffrey Morgan
efe5617b64
update llama.cpp submodule to c2101a2 ( #3020 )
2024-03-09 00:44:50 -08:00
Jeffrey Morgan
5b3fad9636
separate out isLocalIP
2024-03-09 00:22:08 -08:00
Jeffrey Morgan
bfec2c6e10
simplify host checks
2024-03-08 23:29:53 -08:00
Jeffrey Morgan
5c143af726
add additional allowed hosts
2024-03-08 23:23:59 -08:00
Jeffrey Morgan
6c0af2599e
Update docs README.md and table of contents
2024-03-08 22:45:11 -08:00
Jeffrey Morgan
fc8c044584
add allowed host middleware and remove workDir middleware ( #3018 )
2024-03-08 22:23:47 -08:00
Michael Yang
ecc133d843
Merge pull request #3014 from ollama/mxyng/decode-ggla
2024-03-08 16:14:53 -08:00
Michael Yang
76bdebbadf
decode ggla
2024-03-08 15:46:25 -08:00
Michael Yang
18979ad4a1
convert: fix default shape
2024-03-08 15:42:48 -08:00
Michael Yang
8e0ef931d8
Merge pull request #2990 from ollama/mxyng/default-term-size
...
fix: default terminal width, height
2024-03-08 15:20:54 -08:00
Daniel Hiltgen
280da44522
Merge pull request #2988 from dhiltgen/rocm_docs
...
Refined ROCm troubleshooting docs
2024-03-08 13:33:30 -08:00
Bruce MacDonald
0cebc79cba
fix: allow importing a model from name reference ( #3005 )
2024-03-08 12:27:47 -05:00
Jeffrey Morgan
0e4669b04f
update llama.cpp submodule to 6cdabe6 ( #2999 )
2024-03-08 00:26:20 -08:00
Jeffrey Morgan
b886bec3f9
Update api.md
2024-03-07 23:27:51 -08:00
Jeffrey Morgan
fc06205971
Revert "adjust download and upload concurrency based on available bandwidth" ( #2995 )
2024-03-07 18:10:16 -08:00
Blake Mizerany
2ada81e068
cmd: tighten up env var usage sections ( #2962 )
...
Also, document OLLAMA_HOST client semantics per command that honors it.
This looks nicer than having a general puprose environment variable
section in the root usage which was showing up after the "addition help
topics" section outputed by Cobra's default template.
It was decided this was easier to work with than using a custom template
for Cobra right now.
2024-03-07 13:57:07 -08:00
Michael Yang
b1e74d4fda
default terminal width, height
2024-03-07 11:35:42 -08:00
Michael Yang
f678f5c5c3
Merge pull request #2991 from ollama/mxyng/fix-ci
...
fix ci
2024-03-07 11:35:06 -08:00
Michael Yang
2cb74e23fb
fix ci
2024-03-07 11:33:49 -08:00
Daniel Hiltgen
69f0227813
Refined ROCm troubleshooting docs
2024-03-07 11:22:37 -08:00
Daniel Hiltgen
3c8df3808b
Merge pull request #2885 from dhiltgen/rocm_v6_only
...
Revamp ROCm support
2024-03-07 10:51:00 -08:00
Michael Yang
7d564835c2
Merge pull request #2985 from ollama/rm-empty-examples
...
remove empty examples
2024-03-07 10:49:40 -08:00
Michael Yang
72431031d9
no ci test on docs, examples
2024-03-07 10:44:48 -08:00
Michael Yang
6041abb5b2
remove empty examples
2024-03-07 10:40:32 -08:00
Daniel Hiltgen
6c5ccb11f9
Revamp ROCm support
...
This refines where we extract the LLM libraries to by adding a new
OLLAMA_HOME env var, that defaults to `~/.ollama` The logic was already
idempotenent, so this should speed up startups after the first time a
new release is deployed. It also cleans up after itself.
We now build only a single ROCm version (latest major) on both windows
and linux. Given the large size of ROCms tensor files, we split the
dependency out. It's bundled into the installer on windows, and a
separate download on windows. The linux install script is now smart and
detects the presence of AMD GPUs and looks to see if rocm v6 is already
present, and if not, then downloads our dependency tar file.
For Linux discovery, we now use sysfs and check each GPU against what
ROCm supports so we can degrade to CPU gracefully instead of having
llama.cpp+rocm assert/crash on us. For Windows, we now use go's windows
dynamic library loading logic to access the amdhip64.dll APIs to query
the GPU information.
2024-03-07 10:36:50 -08:00
Michael Yang
2e20110e50
Merge pull request #2221 from ollama/mxyng/up-down-ccy
...
adjust download and upload concurrency based on available bandwidth
2024-03-07 09:27:33 -08:00
Daniel Hiltgen
82ddc3e441
Merge pull request #2964 from dhiltgen/mem_limit_var
...
Allow setting max vram for workarounds
2024-03-07 09:25:44 -08:00
Jeffrey Morgan
d481fb3cc8
update go to 1.22 in other places ( #2975 )
2024-03-07 07:39:49 -08:00
DJ Johnson
23ee633252
docs: Add LLM-X to Web Integration section ( #2759 )
2024-03-07 10:11:53 -05:00
John
23ebe8fe11
fix some typos ( #2973 )
...
Signed-off-by: hishope <csqiye@126.com >
2024-03-06 22:50:11 -08:00
Patrick Devine
2c017ca441
Convert Safetensors to an Ollama model ( #2824 )
2024-03-06 21:01:51 -08:00
Daniel Hiltgen
be330174dd
Allow setting max vram for workarounds
...
Until we get all the memory calculations correct, this can provide
and escape valve for users to workaround out of memory crashes.
2024-03-06 17:15:06 -08:00
Blake Mizerany
0ded7fdc4b
cmd: document environment variables for serve command
...
Updates #2944
2024-03-06 13:48:46 -08:00
Leo
2103a5073c
Add Odin Runes, a Feature-Rich Java UI for Ollama, to README ( #2440 )
...
* Add Odin Runes to README
Add Odin Runes to README
This commit adds Odin Runes to the "Community Integrations" section of the README. Odin Runes is a Java-based GPT client designed to provide seamless interaction with GPT models, enhancing productivity in prompt engineering and text generation tasks. This addition highlights the integration between Odin Runes and Ollama, offering users the flexibility to leverage large language models locally within their development workflow.
* Update README.md
this commit applies the comments of the reviewer.
2024-03-06 11:57:49 -08:00
Jeffrey Morgan
ce9f7c4674
Update api.md
2024-03-05 13:13:23 -08:00
Anders Rex
e5596c1944
Add NotesOllama to Community Integrations ( #2909 )
2024-03-04 01:18:10 -08:00
Timothy Graupmann
9bc3fee694
Added community link for Ollama Copilot ( #2582 )
...
* Added community link for Ollama Copilot
* Update README.md
---------
Co-authored-by: Michael <mchiang0610@users.noreply.github.com >
2024-03-04 00:40:36 -08:00
Jeffrey Morgan
21347e1ed6
update llama.cpp submodule to c29af7e ( #2868 )
2024-03-01 15:26:04 -08:00
Jeffrey Morgan
3b4bab3dc5
Fix embeddings load model behavior ( #2848 )
2024-02-29 17:40:56 -08:00
Daniel Hiltgen
cbd6e3b38e
Merge pull request #2838 from dhiltgen/opensuse
...
Add ollama user to video group
2024-02-29 15:47:56 -08:00
Daniel Hiltgen
b830afa716
Merge pull request #2837 from dhiltgen/podman_image_support
...
Add env var so podman will map cuda GPUs
2024-02-29 15:47:37 -08:00
Daniel Hiltgen
bd1d8b0d14
Merge pull request #2836 from bmwiedemann/gzip
...
Omit build date from gzip headers
2024-02-29 15:46:46 -08:00
fred-bf
25c2912120
Add Community Integration: NextChat ( #2780 )
2024-02-29 12:12:13 -08:00
Michael Yang
0e19476b56
prepend image tags ( #2789 )
...
instead of appending image tags, prepend them - this generally produces better results
2024-02-29 11:30:14 -08:00
tylinux
fa2f2b3563
fix: print usedMemory size right ( #2827 )
2024-02-29 11:11:04 -08:00
Jeffrey Morgan
cbf4970e0f
bump submodule to 87c91c07663b707e831c59ec373b5e665ff9d64a ( #2828 )
2024-02-29 09:42:08 -08:00
Daniel Hiltgen
74468513bd
Add ollama user to video group
...
On OpenSUSE, ollama needs to be a member of the video group
to access the GPU
2024-02-29 08:50:10 -08:00
Daniel Hiltgen
794a916a72
Add env var so podman will map cuda GPUs
...
Without this env var, podman's GPU logic doesn't map the GPU through
2024-02-29 08:43:08 -08:00
Bernhard M. Wiedemann
76e5d9ec88
Omit build date from gzip headers
...
See https://reproducible-builds.org/ for why this is good.
This patch was done while working on reproducible builds for openSUSE.
2024-02-29 16:48:19 +01:00
Daniel Hiltgen
076237b8ea
Merge pull request #2771 from dhiltgen/toggle_models
...
Bump llama.cpp to b2276
2024-02-27 11:29:53 -08:00
Daniel Hiltgen
53d694c67f
Merge pull request #2772 from dhiltgen/container_image
...
Refine container image build script
2024-02-27 11:29:08 -08:00
Daniel Hiltgen
5aa6bfea94
Merge pull request #2785 from dhiltgen/win_download
...
Log unexpected server errors checking for update
2024-02-27 10:43:14 -08:00
Daniel Hiltgen
1cde63dd64
Log unexpected server errors checking for update
...
This should unmask some failure modes that likely
show up in app logs as unmarshal errors
2024-02-27 09:17:04 -08:00
Daniel Hiltgen
98e0b7e94f
Refine container image build script
...
Allow overriding the platform, image name, and tag latest for
standard and rocm images.
2024-02-26 17:26:49 -08:00
Daniel Hiltgen
061e8f6abc
Bump llama.cpp to b2276
2024-02-26 16:49:24 -08:00
peanut256
a189810df6
Determine max VRAM on macOS using recommendedMaxWorkingSetSize ( #2354 )
...
* read iogpu.wired_limit_mb on macOS
Fix for https://github.com/ollama/ollama/issues/1826
* improved determination of available vram on macOS
read the recommended maximal vram on macOS via Metal API
* Removed macOS-specific logging
* Remove logging from gpu_darwin.go
* release Core Foundation object
fixes a possible memory leak
2024-02-25 18:16:45 -05:00
Ikko Eltociear Ashimine
e95b896790
Update types.go ( #2744 )
...
specfied -> specified
2024-02-25 13:41:25 -05:00
elthommy
1f087c4d26
Update langchain python tutorial ( #2737 )
...
Remove unused GPT4all
Use nomic-embed-text as embedded model
Fix a deprecation warning (__call__)
2024-02-25 00:31:36 -05:00
Jeffrey Morgan
5d7ea6616f
no extra disk space for windows installation ( #2739 )
2024-02-25 00:20:35 -05:00
Michael Yang
2a4b128ae3
Merge pull request #2719 from ollama/mxyng/format-private-key
...
remove format private key
2024-02-23 17:15:14 -08:00
Michael Yang
fc483274ad
clean up go.mod
2024-02-23 16:53:36 -08:00
Michael Yang
fd10a2ad4b
remove format/openssh.go
...
this is unnecessary now that x/crypto/ssh.MarshalPrivateKey has been
added
2024-02-23 16:52:23 -08:00
Benn Huang
b291f63188
Add Community Integration: Chatbox
...
Co-authored-by: bennhuang <bennhuang@tencent.com >
2024-02-23 07:17:28 -05:00
Jeffrey Morgan
f58856bf6f
better directory cleanup in ollama.iss
2024-02-23 07:14:59 -05:00
Jeffrey Morgan
275ea01587
restore windows build flags and compression
2024-02-22 18:07:18 -05:00
Jeffrey Morgan
8782dd5628
fix build_windows.ps1 script to run go build with the correct flags
2024-02-22 17:41:43 -05:00
Jeffrey Morgan
11bfff8ee1
update llama.cpp submodule to 96633eeca1265ed03e57230de54032041c58f9cd
2024-02-22 16:44:26 -05:00
Logan Yang
7c0167a8f6
Add copilot for obsidian plugin to community integration ( #1918 )
2024-02-22 14:17:20 -05:00
LangChain4j
74d898e37d
Added LangChain4j links ( #1690 )
2024-02-22 14:09:08 -05:00
Yuan-Man
c6e8b00718
Add README.md ( #2249 )
2024-02-22 14:03:44 -05:00
B-Tocs.org Community
be9980ef13
Update README.md - Ollama for SAP ABAP ( #2510 )
2024-02-22 13:12:27 -05:00
Augustinas Malinauskas
646a0dedb9
Update README.md ( #2504 )
...
- Enchanted is now supported for desktop on macOS
2024-02-22 13:09:29 -05:00
Azhar Khan
7f964d938c
update README to add Gemma 2B, 7B model in Model Library Table ( #2686 )
2024-02-22 13:07:47 -05:00
Pavel Frankov
e6b8a139ff
Update README.md ( #2138 )
2024-02-22 10:52:36 -05:00
Jeffrey Morgan
bdc0ea1ba5
Update import.md
2024-02-22 02:08:03 -05:00
Jeffrey Morgan
7fab7918cc
Update import.md
2024-02-22 02:06:24 -05:00
Michael Yang
74c1bdba0d
Merge pull request #2657 from joshyan1/patch-1
...
Update install.sh success message
2024-02-21 15:55:20 -08:00
Josh
f983ef7f5f
Update install.sh success message
2024-02-21 18:30:01 -05:00
Jeffrey Morgan
1ae1c33651
Windows build + installer adjustments ( #2656 )
...
* remove `-w -s` linker flags on windows
* use `zip` for windows installer compression
2024-02-21 18:21:26 -05:00
Michael Yang
084d846621
refactor
2024-02-21 13:42:48 -08:00
Michael Yang
6a4b994433
lint
2024-02-21 13:42:48 -08:00
Michael Yang
bea007deb7
use LimitGroup for uploads
2024-02-21 13:42:48 -08:00
Michael Yang
074934be03
adjust group limit based on download speed
2024-02-21 13:42:48 -08:00
Michael Yang
0de12368a0
add new LimitGroup for dynamic concurrency
2024-02-21 13:42:48 -08:00
Michael Yang
917bd61084
refactor download run
2024-02-21 13:42:46 -08:00
Jeffrey Morgan
efe040f8c0
reset with init_vars ahead of each cpu build in gen_windows.ps1 ( #2654 )
2024-02-21 16:35:34 -05:00
Jeffrey Morgan
2a7553ce09
update llama.cpp submodule to c14f72d
2024-02-21 09:03:14 -05:00
Sun Bo
10af6070a9
Update big-AGI config file link ( #2626 )
...
Co-authored-by: bo.sun <bo.sun@cotticoffee.com >
2024-02-21 01:24:48 -05:00
Jeffrey Morgan
92423b0600
add dist directory in build_windows.ps
2024-02-21 00:05:05 -05:00
Jeffrey Morgan
b3eac61cac
update llama.cpp submodule to f0d1fafc029a056cd765bdae58dcaa12312e9879
2024-02-20 22:56:51 -05:00
Jeffrey Morgan
287ba11500
better error message when calling /api/generate or /api/chat with embedding models
2024-02-20 21:53:45 -05:00
Jeffrey Morgan
63861f58cc
Support for bert and nomic-bert embedding models
2024-02-20 21:37:29 -05:00
Jeffrey Morgan
f0425d3de9
Update faq.md
2024-02-20 20:44:45 -05:00
Michael Yang
210b65268e
replace strings buffer with hasher ( #2437 )
...
the buffered value is going into the hasher eventually so write directly
to the hasher instead
2024-02-20 19:07:50 -05:00
Michael Yang
949d7b1c48
add gguf file types ( #2532 )
2024-02-20 19:06:29 -05:00
Michael Yang
897b213468
use http.DefaultClient ( #2530 )
...
default client already handles proxy
2024-02-20 18:34:47 -05:00
Jeffrey Morgan
4613a080e7
update llama.cpp submodule to 66c1968f7 ( #2618 )
2024-02-20 17:42:31 -05:00
Muhammed Nazeem
ace2cdf1c6
Add Page Assist to the community integrations ( #2447 )
2024-02-20 14:03:58 -05:00
Nikesh Parajuli
eed92bc19a
docs: add Msty app in readme ( #1775 )
...
* docs: add Msty app in readme
* docs: update msty url
2024-02-20 14:03:33 -05:00
Michael Edoror
e0a2f46466
Update README.md to include Elixir LangChain Library ( #2180 )
...
The Elixir LangChain Library now supports Ollama Chat with this [PR](https://github.com/brainlid/langchain/pull/70 )
2024-02-20 14:03:02 -05:00
Taras Tsugrii
01ff2e14db
[nit] Remove unused msg local var. ( #2511 )
2024-02-20 14:02:34 -05:00
BADR
199e79ec0c
docs: add tenere to terminal clients ( #2329 )
2024-02-19 23:13:03 -05:00
Jeffrey Morgan
8125ce4cb6
Update import.md
...
Add instructions to get public key on windows
2024-02-19 22:48:24 -05:00
Daniel
636d6eea99
Add ShellOracle to community terminal integrations ( #1767 )
2024-02-19 22:18:05 -05:00
Jeffrey Morgan
df56f1ee5e
Update faq.md
2024-02-19 22:16:42 -05:00
Jean-Baptiste Detroyes
0b6c6c9092
feat: add Helm Chart link to Package managers list ( #1673 )
2024-02-19 22:05:14 -05:00
Jakob Hoeg Mørk
cb60389de7
NextJS web interface for Ollama ( #2466 )
2024-02-19 21:57:36 -05:00
lulz
ce0c95d097
[fix] /bye and /exit are now treated as prefixes ( #2381 )
...
* [fix] /bye and /exit are now treated as prefixes
instead of being treated as entire lines which doesn't align with the way the rest of the commands are treated
* Update cmd/interactive.go
Fixing whitespace
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2024-02-19 21:56:49 -05:00
Eddú Meléndez Gonzales
a9bc1e1c37
Add LangChain4J ( #2164 )
2024-02-19 21:17:32 -05:00
Branislav Gerazov
62c71f4cb1
add ollama-chat.nvim ( #2188 )
2024-02-19 21:14:29 -05:00
Jeffrey Morgan
41aca5c2d0
Update faq.md
2024-02-19 21:11:01 -05:00
Jeffrey Morgan
753724d867
Update api.md to include examples for reproducible outputs
2024-02-19 20:36:16 -05:00
Jeffrey Morgan
e4576c2ee1
Update README.md
2024-02-19 20:15:24 -05:00
Patrick Devine
9a7a4b9533
add faqs for memory pre-loading and the keep_alive setting ( #2601 )
2024-02-19 14:45:25 -08:00
Daniel Hiltgen
2653191222
Merge pull request #2600 from dhiltgen/refined_win_docs
...
Document setting server vars for windows
2024-02-19 13:46:37 -08:00
Daniel Hiltgen
b338c0635f
Document setting server vars for windows
2024-02-19 13:30:46 -08:00
Daniel Hiltgen
4fcbf1cde6
Merge pull request #2599 from dhiltgen/fix_avx
...
Explicitly disable AVX2 on GPU builds
2024-02-19 13:13:05 -08:00
Daniel Hiltgen
9220b4fa91
Merge pull request #2585 from dhiltgen/cuda_leaks
...
Fix cuda leaks
2024-02-19 12:48:00 -08:00
Daniel Hiltgen
fc39a6cd7a
Fix cuda leaks
...
This should resolve the problem where we don't fully unload from the GPU
when we go idle.
2024-02-18 18:37:20 -08:00
Justin Hayes
1e23e82324
Update Web UI link to new project name ( #2563 )
...
Ollama WebUI is now known as Open WebUI.
2024-02-17 20:05:20 -08:00
Daniel Hiltgen
f9fd08040b
Merge pull request #2552 from dhiltgen/dup_update_menus
...
Fix duplicate menus on update and exit on signals
2024-02-16 17:23:37 -08:00
Daniel Hiltgen
4318e35ee3
Merge pull request #2553 from dhiltgen/amdgpu_version
...
Harden AMD driver lookup logic
2024-02-16 17:23:12 -08:00
Daniel Hiltgen
9754c6d9d8
Harden AMD driver lookup logic
...
It looks like the version file doesnt exist on older(?) drivers
2024-02-16 16:20:16 -08:00
Daniel Hiltgen
a497235a55
Fix view logs menu
2024-02-16 15:42:53 -08:00
Daniel Hiltgen
df6dc4fd96
Fix duplicate menus on update and exit on signals
...
Also fixes a few fit-and-finish items for better developer experience
2024-02-16 15:33:16 -08:00
Bruce MacDonald
88622847c6
fix: chat system prompting overrides ( #2542 )
2024-02-16 14:42:43 -05:00
Tristan Rhodes
9774663013
Update faq.md with the location of models on Windows ( #2545 )
2024-02-16 11:04:19 -08:00
Daniel Hiltgen
a468ae0459
Merge pull request #2499 from ollama/windows-preview
...
Windows Preview
2024-02-15 16:06:32 -08:00
Daniel Hiltgen
c3e62ba38a
Merge pull request #2516 from dhiltgen/single_tray_app
...
Fix a couple duplicate instance bugs
2024-02-15 15:52:43 -08:00
Daniel Hiltgen
117369aa73
Exit if we detect another copy of Ollama running
2024-02-15 14:58:29 -08:00
Daniel Hiltgen
1ba734de67
typo
2024-02-15 14:56:55 -08:00
Daniel Hiltgen
5208cf09b1
clean up some logging
2024-02-15 14:56:55 -08:00
Daniel Hiltgen
bb9de6037c
Prevent multiple installers running concurrently
2024-02-15 14:56:55 -08:00
Daniel Hiltgen
272e53a1f5
Prepare to distribute standalone windows executable
...
This will be useful for our automated test riggig, and may be useful for
advanced users who want to "roll their own" system service
2024-02-15 14:56:55 -08:00
Daniel Hiltgen
db2a9ad1fe
Explicitly disable AVX2 on GPU builds
...
Even though we weren't setting it to on, somewhere in the cmake config
it was getting toggled on. By explicitly setting it to off, we get `/arch:AVX`
as intended.
2024-02-15 14:50:11 -08:00
Daniel Hiltgen
c9ab1aead3
Merge pull request #2526 from dhiltgen/harden_for_quotes
...
Harden the OLLAMA_HOST lookup for quotes
2024-02-15 14:13:40 -08:00
Daniel Hiltgen
4a10e7a7fa
Harden the OLLAMA_HOST lookup for quotes
2024-02-15 13:46:56 -08:00
Michael Yang
86808f80a8
remove unused import
2024-02-15 12:09:11 -08:00
Michael Yang
4240b045e6
always enable view logs
2024-02-15 12:08:27 -08:00
Michael Yang
e547378893
disable default debug
2024-02-15 12:05:13 -08:00
Michael Yang
fd77dbec4d
do not print update request headers
2024-02-15 11:36:35 -08:00
Michael
fefb3e77d1
Update README.md
2024-02-15 10:32:40 -08:00
Jeffrey Morgan
ed5489a96e
higher resolution tray icons
2024-02-14 22:55:03 -08:00
jmorganca
76113742cf
update installer title
2024-02-15 05:56:45 +00:00
Jeffrey Morgan
57e60c836f
better windows app and tray icons
2024-02-15 05:56:45 +00:00
jmorganca
622b1f3e67
update installer and app.exe metadata
2024-02-15 05:56:45 +00:00
jmorganca
7ad9844ac0
set exe metadata using resource files
2024-02-15 05:56:45 +00:00
Michael Yang
e43648afe5
rerefactor
2024-02-15 05:56:45 +00:00
Daniel Hiltgen
823a520266
Fix lint error on ignored error for win console
2024-02-15 05:56:45 +00:00
vinjn
66ef308abd
Import "containerd/console" lib to support colorful output in Windows terminal
2024-02-15 05:56:45 +00:00
Daniel Hiltgen
29e90cc13b
Implement new Go based Desktop app
...
This focuses on Windows first, but coudl be used for Mac
and possibly linux in the future.
2024-02-15 05:56:45 +00:00
Daniel Hiltgen
f397e0e988
Move hub auth out to new package
2024-02-15 05:56:45 +00:00
Daniel Hiltgen
9da9e8fb72
Move Mac App to a new dir
2024-02-15 05:56:45 +00:00
Patrick Devine
42e77e2a69
handle race condition while setting raw mode in windows ( #2509 )
2024-02-14 21:28:35 -08:00
Jeffrey Morgan
9241a29336
Revert "Revert "bump submodule to 6c00a06 ( #2479 )"" ( #2485 )
...
This reverts commit 6920964b87 .
2024-02-13 18:18:41 -08:00
Jeffrey Morgan
f7231ad9ad
set shutting_down to false once shutdown is complete ( #2484 )
2024-02-13 17:48:41 -08:00
Jeffrey Morgan
6920964b87
Revert "bump submodule to 6c00a06 ( #2479 )"
...
This reverts commit 2f9ed52bbd .
2024-02-13 17:23:05 -08:00
Jeffrey Morgan
2f9ed52bbd
bump submodule to 6c00a06 ( #2479 )
2024-02-13 17:12:42 -08:00
bnorick
caf2b13c10
Fix infinite keep_alive ( #2480 )
2024-02-13 15:40:32 -08:00
lebrunel
1d263449ff
Update README.md to include link to Ollama-ex Elixir library ( #2477 )
2024-02-13 11:40:44 -08:00
Jeffrey Morgan
48a273f80b
Fix issues with templating prompt in chat mode ( #2460 )
2024-02-12 15:06:57 -08:00
Daniel Hiltgen
939c60473f
Merge pull request #2422 from dhiltgen/better_kill
...
More robust shutdown
2024-02-12 14:05:06 -08:00
Jeffrey Morgan
f76ca04f9e
update submodule to 099afc6 ( #2468 )
2024-02-12 14:01:16 -08:00
Daniel Hiltgen
76b8728f0c
Merge pull request #2465 from dhiltgen/block_rocm_pre_9
...
Detect AMD GPU info via sysfs and block old cards
2024-02-12 12:41:43 -08:00
Jeffrey Morgan
1f9078d6ae
Check image filetype in api handlers ( #2467 )
2024-02-12 11:16:20 -08:00
Daniel Hiltgen
6d84f07505
Detect AMD GPU info via sysfs and block old cards
...
This wires up some new logic to start using sysfs to discover AMD GPU
information and detects old cards we can't yet support so we can fallback to CPU mode.
2024-02-12 08:19:41 -08:00
Jeffrey Morgan
26b13fc33c
patch: always add token to cache_tokens ( #2459 )
2024-02-12 08:10:16 -08:00
Jeffrey Morgan
1c8435ffa9
Update domain name references in docs and install script ( #2435 )
2024-02-09 15:19:30 -08:00
Daniel Hiltgen
6680761596
Shutdown faster
...
Make sure that when a shutdown signal comes, we shutdown quickly instead
of waiting for a potentially long exchange to wrap up.
2024-02-08 22:22:50 -08:00
Jeffrey Morgan
42b797ed9c
Update openai.md
2024-02-08 15:03:23 -05:00
Jeffrey Morgan
336aa43f3c
Update openai.md
2024-02-08 12:48:28 -05:00
Daniel Hiltgen
69f392c9b7
Merge pull request #2403 from dhiltgen/handle_tmp_cleanup
...
Ensure the libraries are present
2024-02-07 17:55:31 -08:00
Daniel Hiltgen
a1dfab43b9
Ensure the libraries are present
...
When we store our libraries in a temp dir, a reaper might clean
them when we are idle, so make sure to check for them before
we reload.
2024-02-07 17:27:49 -08:00
Jeffrey Morgan
a0a199b108
Fix hanging issue when sending empty content ( #2399 )
2024-02-07 19:30:33 -05:00
Jeffrey Morgan
ab0d37fde4
Update openai.md
2024-02-07 17:25:33 -05:00
Jeffrey Morgan
14e71350c8
Update openai.md
2024-02-07 17:25:24 -05:00
Jeffrey Morgan
453f572f83
Initial OpenAI /v1/chat/completions API compatibility ( #2376 )
2024-02-07 17:24:29 -05:00
Daniel Hiltgen
c9dfa6e571
Merge pull request #2377 from dhiltgen/bump_llamacpp
...
Bump llama.cpp to b2081
2024-02-07 12:04:38 -08:00
Michael Yang
3dcbcd367d
Merge pull request #2394 from ollama/mxyng/fix-error-response
2024-02-07 11:47:31 -08:00
Michael Yang
e805ac1d59
fix response on token error
2024-02-07 11:05:49 -08:00
Michael Yang
b9229ffca5
Merge pull request #2378 from ollama/mxyng/runners
...
runners
2024-02-06 13:49:58 -08:00
Michael Yang
46c847c4ad
enable rocm builds
2024-02-06 13:36:13 -08:00
Michael Yang
92b1a21f79
use linux runners
2024-02-06 13:36:04 -08:00
Daniel Hiltgen
de76b95dd4
Bump llama.cpp to b2081
2024-02-06 12:06:43 -08:00
Michael Yang
59ec837ef6
Merge pull request #2374 from ollama/mxyng/rocm-builds
...
disable rocm builds
2024-02-06 09:41:02 -08:00
Michael Yang
f06b99a461
disable rocm builds
2024-02-06 09:29:42 -08:00
Bruce MacDonald
128fce5495
docs: keep_alive ( #2258 )
2024-02-06 11:00:05 -05:00
Daniel Hiltgen
27aa2d4a19
Merge pull request #1849 from mraiser/main
...
Accomodate split cuda lib dir
2024-02-05 16:01:16 -08:00
Jeffrey Morgan
b9f91a0b36
Update import instructions to use convert and quantize tooling from llama.cpp submodule ( #2247 )
2024-02-05 00:50:44 -05:00
Erik S
b538dc3858
Add llm-ollama plugin for Datasette's LLM CLI to README ( #2340 )
...
Co-authored-by: Erik Sp <git@aschwa.com >
2024-02-03 15:40:50 -08:00
Jeffrey Morgan
f0e9496c85
Update api.md
2024-02-02 12:17:24 -08:00
Jeffrey Morgan
09a6f76f4c
fix error on ollama run with a non-existent model
2024-02-01 23:11:52 -08:00
Jeffrey Morgan
e135167484
Add multimodel support to ollama run in noninteractive mopde ( #2317 )
2024-02-01 21:33:06 -08:00
Jeffrey Morgan
38296ab352
clear previous images when submitting an image to ollama run ( #2316 )
2024-02-01 21:30:26 -08:00
Daniel Hiltgen
f43dea68d1
Merge pull request #2318 from dhiltgen/more_clean
...
Harden generate patching model
2024-02-01 20:41:29 -08:00
Daniel Hiltgen
e1f50377f4
Harden generate patching model
...
Only apply patches if we have any, and make sure to cleanup
every file we patched at the end to leave the tree clean
2024-02-01 19:34:36 -08:00
Jeffrey Morgan
7913104527
Improvements to ollama run for multimodal models ( #2300 )
2024-02-01 17:09:51 -08:00
Michael Yang
bfbf2f7cf7
Merge pull request #2296 from ollama/mxyng/img-tags
...
append image tags to user content
2024-02-01 13:16:59 -08:00
Michael Yang
fe3cbd014f
Merge pull request #2298 from ollama/mxyng/debug-prompt
...
structured debug prompt
2024-02-01 13:16:49 -08:00
Michael Yang
3d6f48507a
structured debug prompt
2024-02-01 11:56:28 -08:00
Michael Yang
f3761405c8
use image id
2024-02-01 11:52:42 -08:00
Michael Yang
e49dc9f3d8
fix tests
2024-02-01 11:48:11 -08:00
Michael Yang
d125510b4b
remove image tags
2024-02-01 11:32:51 -08:00
Russell Canfield
1ca386aa9e
Feature - Add Wingman Extension ( #2313 )
2024-02-01 11:16:24 -08:00
Michael Yang
fb56988014
account for image projection in token count
2024-02-01 09:50:48 -08:00
Michael Yang
d046bee790
use llm.ImageData for chat
2024-01-31 19:18:25 -08:00
Jeffrey Morgan
f11bf0740b
use llm.ImageData
2024-01-31 19:13:48 -08:00
Michael Yang
8450bf66e6
trim images
2024-01-31 19:13:47 -08:00
Michael Yang
b4e11be8ef
append image tags to user content
2024-01-31 19:13:10 -08:00
Bruce MacDonald
a896079705
preserve last system message from modelfile ( #2289 )
2024-01-31 21:45:01 -05:00
Michael Yang
583950c828
Merge pull request #2294 from ollama/mxyng/slog-source
...
update slog handler options
2024-01-31 15:29:11 -08:00
Michael Yang
8ac08a0eec
update slog handler options
...
- consistent format by using text handler for debug and non-debug
- truncate source file to just the file name
2024-01-31 15:15:00 -08:00
Michael Yang
60f47be64c
Merge pull request #2284 from ollama/mxyng/parse-raw
...
remove unnecessary parse raw
2024-01-31 09:40:48 -08:00
Daniel Hiltgen
6e56077ada
Merge pull request #2263 from dhiltgen/bump_llamacpp
...
Bump llama.cpp to b1999
2024-01-31 08:39:41 -08:00
Hoang Nguyen
98ae9467bb
Added MindMac to Community Integrations -> Web & Desktop section ( #1957 )
2024-01-31 07:48:37 -08:00
Richard Macarthy
b7a24af083
Add twinny vscode extension to Extensions and Plugins ( #1950 )
2024-01-31 06:25:06 -08:00
Michael Yang
c8b1f2369e
remove unnecessary parse raw
2024-01-30 17:00:53 -08:00
Daniel Hiltgen
72b12c3be7
Bump llama.cpp to b1999
...
This requires an upstream change to support graceful termination,
carried as a patch.
2024-01-30 16:52:12 -08:00
Bruce MacDonald
0632dff3f8
trim chat prompt based on llm context size ( #1963 )
2024-01-30 15:59:29 -05:00
Maximilian Weber
509e2dec8a
Update README.md ( #2252 )
...
Added - [Ollama for R - rollama](https://github.com/JBGruber/rollama ) in Libraries in README.md
2024-01-30 11:56:51 -08:00
Daniel Hiltgen
78a48de804
Merge pull request #2256 from dhiltgen/container_logs
...
Add container hints for troubleshooting
2024-01-30 08:12:48 -08:00
Daniel Hiltgen
e7dbb00331
Add container hints for troubleshooting
...
Some users are new to containers and unsure where the server logs go
2024-01-29 08:53:41 -08:00
Marc Raiser
c3f9538636
remove default.nix
2024-01-29 00:05:07 -05:00
Jeffrey Morgan
2e06ed01d5
remove unknown CPPFLAGS option
2024-01-28 17:51:23 -08:00
Daniel Hiltgen
4072b5879b
Merge pull request #2246 from dhiltgen/reject_cuda_without_avx
...
Don't disable GPUs on arm without AVX
2024-01-28 16:26:55 -08:00
Daniel Hiltgen
15562e887d
Don't disable GPUs on arm without AVX
...
AVX is an x86 feature, so ARM should be excluded from
the check.
2024-01-28 15:22:38 -08:00
Jeffrey Morgan
f2245c7c77
print prompt with OLLAMA_DEBUG=1 ( #2245 )
2024-01-28 15:22:35 -08:00
Jeffrey Morgan
e4b9b72f2a
Do not repeat system prompt for chat templating ( #2241 )
2024-01-28 14:15:56 -08:00
Daniel Hiltgen
311f8e0c3f
Merge pull request #2243 from dhiltgen/harden_zero_gpus
...
Harden for zero detected GPUs
2024-01-28 13:30:44 -08:00
Daniel Hiltgen
f07f8b7a9e
Harden for zero detected GPUs
...
At least with the ROCm libraries, its possible to have the library
present with zero GPUs. This fix avoids a divide by zero bug in llm.go
when we try to calculate GPU memory with zero GPUs.
2024-01-28 13:13:10 -08:00
mraiser
4c4c730a0a
Merge branch 'ollama:main' into main
2024-01-27 21:56:11 -05:00
Daniel Hiltgen
e02ecfb6c8
Merge pull request #2116 from dhiltgen/cc_50_80
...
Add support for CUDA 5.0 cards
2024-01-27 10:28:38 -08:00
Daniel Hiltgen
c8059b4dcf
Merge pull request #2224 from jaglinux/fix_rocm_get_version_message
...
ROCm: Correct the response string in rocm_get_version function
2024-01-27 07:29:32 -08:00
Jagadish Krishnamoorthy
59d87127f5
Update gpu_info_rocm.c
2024-01-26 22:08:27 -08:00
Patrick Devine
b5cf31b460
add keep_alive to generate/chat/embedding api endpoints ( #2146 )
2024-01-26 14:28:02 -08:00
Daniel Hiltgen
cc4915e262
Merge pull request #2214 from dhiltgen/reject_cuda_without_avx
...
Detect lack of AVX and fallback to CPU mode
2024-01-26 12:06:44 -08:00
Daniel Hiltgen
667a2ba18a
Detect lack of AVX and fallback to CPU mode
...
We build the GPU libraries with AVX enabled to ensure that if not all
layers fit on the GPU we get better performance in a mixed mode.
If the user is using a virtualization/emulation system that lacks AVX
this used to result in an illegal instruction error and crash before this
fix. Now we will report a warning in the server log, and just use
CPU mode to ensure we don't crash.
2024-01-26 11:36:03 -08:00
Michael Yang
e054ebe059
Merge pull request #2212 from ollama/mxyng/fix-build
...
fix build
2024-01-26 11:19:08 -08:00
Michael Yang
9d3dcfd0ec
fix logging
2024-01-26 11:04:27 -08:00
Michael Yang
6e0ea5ecc8
Merge pull request #1916 from ollama/mxyng/inactivity-monitor
...
download: add inactivity monitor
2024-01-26 10:56:00 -08:00
Daniel Hiltgen
a47d8b2557
Merge pull request #2197 from dhiltgen/remove_rocm_image
...
Add back ROCm container support
2024-01-26 09:34:23 -08:00
Daniel Hiltgen
30c43c285c
Merge pull request #2195 from dhiltgen/rocm_real_gpus
...
Ignore AMD integrated GPUs
2024-01-26 09:30:24 -08:00
Daniel Hiltgen
23a7ea593b
Merge pull request #2209 from dhiltgen/harden_mgmt
...
Fix crash on cuda ml init failure
2024-01-26 09:30:13 -08:00
Daniel Hiltgen
75c44aa319
Add back ROCm container support
...
This adds ROCm support back as a discrete image.
2024-01-26 09:24:29 -08:00
Daniel Hiltgen
9d7b5d6c91
Ignore AMD integrated GPUs
...
Detect and ignore integrated GPUs reported by rocm.
2024-01-26 09:21:35 -08:00
Daniel Hiltgen
5d9c4a5f5a
Fix crash on cuda ml init failure
...
The new driver lookup code was triggering after init failure due to a missing return
2024-01-26 09:18:33 -08:00
Daniel Hiltgen
197e420a97
Merge pull request #2196 from dhiltgen/remove_rocm_image
...
Switch back to ubuntu base
2024-01-25 16:50:32 -08:00
Daniel Hiltgen
a34e1ad3cf
Switch back to ubuntu base
...
The size increase for rocm support in the standard image is problematic
We'll revisit multiple tags for rocm support in a follow up PR.
2024-01-25 16:46:01 -08:00
Michael Yang
2ae0556292
Merge pull request #1679 from ollama/mxyng/build-gpus
...
build cuda and rocm
2024-01-25 16:38:14 -08:00
Jeffrey Morgan
5be9bdd444
Update modelfile.md
2024-01-25 16:29:48 -08:00
Jeffrey Morgan
b706794905
Update modelfile.md to include MESSAGE
2024-01-25 16:29:32 -08:00
Michael Yang
a8c5413d06
only generate gpu libs
2024-01-25 15:41:56 -08:00
Michael Yang
5580de4571
archive ollama binaries
2024-01-25 15:40:16 -08:00
Michael Yang
946431d5b0
build cuda and rocm
2024-01-25 15:40:15 -08:00
Michael Yang
0610126049
remove env setting
2024-01-25 15:39:43 -08:00
Jeffrey Morgan
3ebd6a83fc
update submodule to cd4fddb29f81d6a1f6d51a0c016bc6b486d68def
2024-01-25 13:54:11 -08:00
Jeffrey Morgan
a64570dcae
Fix clearing kv cache between requests with the same prompt ( #2186 )
...
* Fix clearing kv cache between requests with the same prompt
* fix powershell script
2024-01-25 13:46:20 -08:00
Patrick Devine
7c40a67841
Save and load sessions ( #2063 )
2024-01-25 12:12:36 -08:00
Michael Yang
e64b5b07a2
Merge pull request #2181 from ollama/mxyng/stub-lint
...
stub generate outputs for lint
2024-01-25 11:55:15 -08:00
Michael Yang
9e1e295cdc
Merge pull request #2175 from ollama/mxyng/refactor-tensor-read
...
refactor tensor read
2024-01-25 09:22:42 -08:00
Marc Raiser
6eb3cddcb6
To build on NixOS: nix-shell --run 'go generate ./... && go build .'
2024-01-25 10:17:22 -05:00
mraiser
a4564232a4
Update gen_linux.sh to find libcudart in separate directory
2024-01-25 09:49:35 -05:00
Jeffrey Morgan
a643823f86
Update README.md
2024-01-24 21:36:56 -08:00
Michael Yang
8e5d359a03
stub generate outputs for lint
2024-01-24 17:36:10 -08:00
Daniel Hiltgen
a170888dd4
Merge pull request #2174 from dhiltgen/rocm_real_gpus
...
More logging for gpu management
2024-01-24 11:09:17 -08:00
Michael Yang
cd22855ef8
refactor tensor read
2024-01-24 10:48:31 -08:00
Daniel Hiltgen
013fd07139
More logging for gpu management
...
Fix an ordering glitch of dlerr/dlclose and add more logging to help
root cause some crashes users are hitting. This also refines the
function pointer names to use the underlying function names instead
of simplified names for readability.
2024-01-24 10:32:36 -08:00
Daniel Hiltgen
f63dc2db5c
Merge pull request #2162 from dhiltgen/rocm_real_gpus
...
Report more information about GPUs in verbose mode
2024-01-23 17:45:40 -08:00
Jeffrey Morgan
eaa5a396d9
Update README.md
2024-01-23 16:08:15 -08:00
Jeffrey Morgan
8ed22f5d72
Update README.md
2024-01-23 14:38:01 -08:00
Daniel Hiltgen
987c16b2f7
Report more information about GPUs in verbose mode
...
This adds additional calls to both CUDA and ROCm management libraries to
discover additional attributes about the GPU(s) detected in the system, and
wires up runtime verbosity selection. When users hit problems with GPUs we can
ask them to run with `OLLAMA_DEBUG=1 ollama serve` and share the results.
2024-01-23 11:37:02 -08:00
Jeffrey Morgan
950f636d64
Update README.md
2024-01-23 10:29:10 -08:00
Jeffrey Morgan
4458efb73a
Load all layers on arm64 macOS if model is small enough ( #2149 )
2024-01-22 17:40:06 -08:00
Daniel Hiltgen
ceea599494
Merge pull request #2150 from dhiltgen/default_version
...
Set a default version using git describe
2024-01-22 17:38:27 -08:00
Daniel Hiltgen
3005ec74b3
Set a default version using git describe
...
If a VERSION is not specified, this will generate a version string that
represents the state of the repo. For example `0.1.21-12-gffaf52e-dirty`
representing 12 commits away from 0.1.21 tag, on commit gffaf52e
and the tree is dirty.
2024-01-22 17:12:20 -08:00
Daniel Hiltgen
0759d8996e
Merge pull request #2148 from dhiltgen/intel_mac
...
Refine Accelerate usage on mac
2024-01-22 16:56:58 -08:00
Daniel Hiltgen
0f5b843319
Refine Accelerate usage on mac
...
For old macs, accelerate seems to cause crashes, but for
AVX2 capable macs, it does not.
2024-01-22 16:25:56 -08:00
Jeffrey Morgan
ffaf52e1e9
update submodule to 011e8ec577fd135cbc02993d3ea9840c516d6a1c
2024-01-22 15:16:54 -08:00
Michael Yang
940b10b036
Merge pull request #2144 from jmorganca/mxyng/update-faq
...
faq: update to use launchctl setenv
2024-01-22 13:46:57 -08:00
Daniel Hiltgen
3bc28736cd
Merge pull request #2143 from dhiltgen/llm_verbosity
...
Refine debug logging for llm
2024-01-22 13:19:16 -08:00
Michael Yang
93a756266c
faq: update to use launchctl setenv
2024-01-22 13:10:13 -08:00
Daniel Hiltgen
a0a829bf7a
Merge pull request #2142 from dhiltgen/debug_on_fail
...
Debug logging on init failure
2024-01-22 12:29:22 -08:00
Daniel Hiltgen
730dcfcc7a
Refine debug logging for llm
...
This wires up logging in llama.cpp to always go to stderr, and also
turns up logging if OLLAMA_DEBUG is set.
2024-01-22 12:26:49 -08:00
Daniel Hiltgen
27a2d5af54
Debug logging on init failure
2024-01-22 12:08:22 -08:00
Jeffrey Morgan
5f81a33f43
update submodule to 6f9939d ( #2115 )
2024-01-22 11:56:40 -08:00
Michael Yang
6225fde046
Merge pull request #2102 from jmorganca/mxyng/fix-create-override
...
fix: remove overwritten model layers
2024-01-22 09:37:48 -08:00
Meng Zhuo
069184562b
readline: drop not use min function ( #2134 )
2024-01-22 08:15:08 -08:00
Daniel Hiltgen
5576bb2348
Merge pull request #2130 from dhiltgen/more_faster
...
Make CPU builds parallel and customizable AMD GPUs
2024-01-21 16:14:12 -08:00
Daniel Hiltgen
2738837786
Merge pull request #2131 from dhiltgen/probe_cards_at_init
...
Probe GPUs before backend init
2024-01-21 16:13:47 -08:00
Daniel Hiltgen
ec3764538d
Probe GPUs before backend init
...
Detect potential error scenarios so we can fallback to CPU mode without
hitting asserts.
2024-01-21 15:59:38 -08:00
Daniel Hiltgen
df54c723ae
Make CPU builds parallel and customizable AMD GPUs
...
The linux build now support parallel CPU builds to speed things up.
This also exposes AMD GPU targets as an optional setting for advaced
users who want to alter our default set.
2024-01-21 15:12:21 -08:00
Daniel Hiltgen
fa8c990e58
Merge pull request #2127 from dhiltgen/rocm_container
...
Combine the 2 Dockerfiles and add ROCm
2024-01-21 11:49:01 -08:00
Daniel Hiltgen
da72235ebf
Combine the 2 Dockerfiles and add ROCm
...
This renames Dockerfile.build to Dockerfile, and adds some new stages
to support 2 modes of building - the build_linux.sh script uses
intermediate stages to extract the artifacts for ./dist, and the default
build generates a container image usable by both cuda and rocm cards.
This required transitioniing the x86 base to the rocm image to avoid
layer bloat.
2024-01-21 11:37:11 -08:00
Jeffrey Morgan
89c4aee29e
Unlock mutex when failing to load model ( #2117 )
2024-01-20 20:54:46 -05:00
Daniel Hiltgen
a447a083f2
Add compute capability 5.0, 7.5, and 8.0
2024-01-20 14:24:05 -08:00
Jeffrey Morgan
f32ea81b21
increase minimum overhead to 1024MiB ( #2114 )
2024-01-20 17:11:38 -05:00
Daniel Hiltgen
681a914990
Add support for CUDA 5.2 cards
2024-01-20 10:48:43 -08:00
Jeffrey Morgan
4c54f0ddeb
sign dylibs on macOS ( #2101 )
2024-01-19 19:24:11 -05:00
Michael Yang
c08dfaa23d
fix: remove overwritten model layers
...
if create overrides a manifest, first add the older manifest's layers to
the delete map so they can be cleaned up
2024-01-19 14:58:37 -08:00
Daniel Hiltgen
3b76e736ae
Merge pull request #2100 from dhiltgen/more_wsl_globs
...
More WSL paths
2024-01-19 13:41:08 -08:00
Daniel Hiltgen
552db98bf1
More WSL paths
2024-01-19 13:23:29 -08:00
Daniel Hiltgen
fdcdfef620
Merge pull request #2099 from dhiltgen/fix_cuda_model_swap
...
Switch to local dlopen symbols
2024-01-19 12:22:04 -08:00
Daniel Hiltgen
6a042438af
Switch to local dlopen symbols
2024-01-19 11:37:02 -08:00
Jeffrey Morgan
dc88cc3981
use gzip for runner embedding ( #2067 )
2024-01-19 13:23:03 -05:00
Daniel Hiltgen
62976087c6
Merge pull request #1999 from lainedfles/termux_android_cpu_only
...
Fix CPU-only build under Android Termux enviornment.
2024-01-18 17:16:53 -08:00
Self Denial
344342abdf
Restore dyn_ext_server.c since RTLD_DEEPBIND has been removed
2024-01-18 17:30:42 -07:00
Self Denial
eb76f3e379
Fix CPU-only build under Android Termux enviornment.
...
Update gpu.go initGPUHandles() to declare gpuHandles variable before
reading it. This resolves an "invalid memory address or nil pointer
dereference" error.
Update dyn_ext_server.c to avoid setting the RTLD_DEEPBIND flag under
__TERMUX__ (Android).
2024-01-18 17:25:33 -07:00
Michael Yang
d017e3d0a6
Merge pull request #2060 from jmorganca/mxyng/fix-show
...
fix show handler
2024-01-18 16:02:27 -08:00
Michael Yang
aac9ab4db7
fix show handler
2024-01-18 15:36:50 -08:00
Michael Yang
1f5b7ff976
Merge pull request #1932 from jmorganca/mxyng/api-fields
...
api: add model for all requests
2024-01-18 14:56:51 -08:00
Michael Yang
e299831e2c
Merge pull request #1958 from purificant/ci
...
ci: update setup-go action
2024-01-18 14:53:36 -08:00
Michael Yang
745b5934fa
add model to ModelResponse
2024-01-18 14:32:55 -08:00
Michael Yang
a38d88d828
api: add model for all requests
...
prefer using req.Model and fallback to req.Name
2024-01-18 14:31:37 -08:00
Daniel Hiltgen
abec7f06e5
Merge pull request #2056 from dhiltgen/slog
...
Mechanical switch from log to slog
2024-01-18 14:27:24 -08:00
Michael Yang
e5da190bac
Merge pull request #2020 from jmorganca/mxyng/install-fedora
...
install: pin fedora to max 37
2024-01-18 14:23:42 -08:00
Daniel Hiltgen
ecbfc0182f
Go bump to v1.21 to pick up slog
2024-01-18 14:12:57 -08:00
Daniel Hiltgen
fedd705aea
Mechanical switch from log to slog
...
A few obvious levels were adjusted, but generally everything mapped to "info" level.
2024-01-18 14:12:57 -08:00
Mike Bird
82ee019bfc
add open interpreter to list of extensions ( #2016 )
2024-01-18 13:59:39 -08:00
Sachin Sachdeva
ad9dbc2a04
Haystack Ollama Integration ( #2021 )
...
Updated readme with the web link for haystack ollama integration
2024-01-18 13:38:32 -08:00
Daniel Hiltgen
fccdf4c635
Merge pull request #1987 from xyproto/archlinux
...
Let gpu.go and gen_linux.sh also find CUDA on Arch Linux
2024-01-18 13:32:10 -08:00
Daniel Hiltgen
d450fb1d1e
Merge pull request #2055 from dhiltgen/cuda_docs
...
Refine the linux cuda/rocm developer docs
2024-01-18 12:07:31 -08:00
Daniel Hiltgen
df40b11d03
Merge pull request #2007 from dhiltgen/cpu_fallback
...
Add multiple CPU variants for Intel Mac
2024-01-18 11:32:29 -08:00
Daniel Hiltgen
9cd20b0ec8
Refine the linux cuda/rocm developer docs
2024-01-18 09:44:44 -08:00
Daniel Hiltgen
b992bf65fc
Disable arm64 for test phase
...
The runners are x86 so we can only run binaries that match.
2024-01-17 19:26:13 -08:00
Daniel Hiltgen
1b249748ab
Add multiple CPU variants for Intel Mac
...
This also refines the build process for the ext_server build.
2024-01-17 15:08:54 -08:00
Alexander F. Rødseth
cbe2adc78a
Merge branch 'main' into archlinux
2024-01-17 12:50:11 +01:00
Michael Yang
d5a7353357
Merge pull request #2026 from jmorganca/mxyng/fix-windows
...
fix: normalize name path before splitting
2024-01-16 16:58:42 -08:00
Michael Yang
96cfb62641
fix: normalize name path before splitting
2024-01-16 16:48:29 -08:00
Daniel Hiltgen
7d00b5d110
Merge pull request #1915 from dhiltgen/bump_llama_with_new_dep
...
Bump llama.cpp to b1842 and add new cuda lib dep
2024-01-16 13:36:49 -08:00
Daniel Hiltgen
795674dd90
Bump llama.cpp to b1842 and add new cuda lib dep
...
Upstream llama.cpp has added a new dependency with the
NVIDIA CUDA Driver Libraries (libcuda.so) which is part of the
driver distribution, not the general cuda libraries, and is not
available as an archive, so we can not statically link it. This may
introduce some additional compatibility challenges which we'll
need to keep an eye on.
2024-01-16 12:53:52 -08:00
Daniel Hiltgen
e282bdccdd
Merge pull request #1990 from dhiltgen/ci_mac_cross
...
Add macos cross-compile CI coverage
2024-01-16 12:31:37 -08:00
Michael Yang
d9bfb2f08f
install: pin fedora to max 37
...
repos for fedora 38 and newer do not exist as of this commit
```
$ dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/fedora38/x86_64/cuda-fedora38.repo
Adding repo from: https://developer.download.nvidia.com/compute/cuda/repos/fedora38/x86_64/cuda-fedora38.repo
Status code: 404 for https://developer.download.nvidia.com/compute/cuda/repos/fedora38/x86_64/cuda-fedora38.repo (IP: 152.195.19.142)
Error: Configuration of repo failed
```
2024-01-16 11:45:21 -08:00
Michael Yang
598d6d5572
Merge pull request #1937 from jmorganca/mxyng/remove-client-py
...
remove client.py
2024-01-16 11:01:41 -08:00
Bruce MacDonald
a897e833b8
do not cache prompt ( #2018 )
...
- prompt cache causes inferance to hang after some time
2024-01-16 13:48:05 -05:00
Patrick Devine
eef50accb4
Fix show parameters ( #2017 )
2024-01-16 10:34:44 -08:00
Michael Yang
05d53de7a1
Merge pull request #1968 from jmorganca/mxyng/fix-request-retry
...
fix: request retry with error
2024-01-16 10:33:50 -08:00
Daniel Hiltgen
8795447dad
Merge pull request #1966 from fpreiss/fpreiss/gen_linux_cuda_detection
...
improve cuda detection (rel. issue #1704 )
2024-01-14 18:00:11 -08:00
Daniel Hiltgen
b3035112a1
Add macos cross-compile CI coverage
2024-01-14 10:38:59 -08:00
Daniel Hiltgen
95ad9a9fc8
Merge pull request #1988 from dhiltgen/fix_intel_mac
...
Fix typo in arm mac arch script
2024-01-14 08:45:18 -08:00
Daniel Hiltgen
3ca5f69ce8
Fix typo in arm mac arch script
2024-01-14 08:32:57 -08:00
Daniel Hiltgen
cfa6337960
Merge pull request #1982 from dhiltgen/fix_intel_mac
...
Fix intel mac build
2024-01-14 08:26:46 -08:00
Alexander F. Rødseth
f4bf1d514f
Let gpu.go and gen_linux.sh also find CUDA on Arch Linux
2024-01-14 13:40:36 +01:00
Jeffrey Morgan
557110d0ba
Disable mmap with lora layers ( #1985 )
2024-01-13 23:36:31 -05:00
Daniel Hiltgen
2ecb247276
Fix intel mac build
...
Make sure we're building an x86 ext_server lib when cross-compiling
2024-01-13 14:46:34 -08:00
Jeffrey Morgan
288ef8ff95
add gcc -lstdc++ flag for linux cpu ( #1974 )
2024-01-13 03:53:00 -05:00
Jeffrey Morgan
4cf17990f7
use g++ to build libext_server.so on linux ( #1972 )
2024-01-13 03:12:42 -05:00
Michael Yang
27331ae3a8
download: add inactivity monitor
...
if a download part is inactive for some time, restart it
2024-01-12 15:23:15 -08:00
Michael Yang
b6c0ef1e70
Merge pull request #1961 from jmorganca/mxyng/rm-double-newline
...
remove double newlines in /set parameter
2024-01-12 15:18:19 -08:00
Michael Yang
356d178f6e
Merge pull request #1971 from jmorganca/mxyng/max-context-length
...
add max context length check
2024-01-12 15:10:25 -08:00
Michael Yang
eaed6f8c45
add max context length check
2024-01-12 14:54:07 -08:00
purificant
6a5bfc2ed6
update actions/setup-go
2024-01-12 22:27:25 +00:00
Michael Yang
cf29bd2d72
fix: request retry with error
...
this fixes a subtle bug with makeRequestWithRetry where an HTTP status
error on a retried request will potentially not return the right err
2024-01-12 13:32:27 -08:00
Fabian Preiss
905862e17b
improve cuda detection (rel. issue #1704 )
2024-01-12 21:59:19 +01:00
Patrick Devine
565f8a3c44
Convert the REPL to use /api/chat for interactive responses ( #1936 )
2024-01-12 12:05:52 -08:00
Michael Yang
5121b7ac9c
remove double newlines in /set parameter
2024-01-12 11:21:15 -08:00
Michael Yang
a70262c6b2
Update README.md
...
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2024-01-12 09:43:04 -08:00
Tristram Oaten
40a0a90a88
Add group delete to uninstall instructions ( #1924 )
...
After executing the `userdel ollama` command, I saw this message:
```sh
$ sudo userdel ollama
userdel: group ollama not removed because it has other members.
```
Which reminded me that I had to remove the dangling group too. For completeness, the uninstall instructions should do this too.
Thanks!
2024-01-12 00:07:00 -05:00
Michael Yang
cbe20c4375
update readme
2024-01-11 16:24:37 -08:00
Michael Yang
5ffbbea1d7
remove client.py
2024-01-11 15:53:10 -08:00
Daniel Hiltgen
3773fb6465
Merge pull request #1935 from dhiltgen/cpu_fallback
...
Fix up the CPU fallback selection
2024-01-11 15:52:32 -08:00
Daniel Hiltgen
7427fa1387
Fix up the CPU fallback selection
...
The memory changes and multi-variant change had some merge
glitches I missed. This fixes them so we actually get the cpu llm lib
and best variant for the given system.
2024-01-11 15:27:06 -08:00
Michael Yang
f84537e0e0
Merge pull request #1934 from jmorganca/mxyng/fix-slices
...
fix build and lint
2024-01-11 14:36:20 -08:00
Michael Yang
d2be6387c9
fix typo
2024-01-11 14:25:21 -08:00
Michael Yang
d7af35d3d0
import fmt
2024-01-11 14:22:32 -08:00
Michael Yang
defc1dbd6e
use x/exp/slices
2024-01-11 14:20:13 -08:00
Daniel Hiltgen
de2fbdec99
Merge pull request #1819 from dhiltgen/multi_variant
...
Support multiple LLM libs; ROCm v5 and v6; Rosetta, AVX, and AVX2 compatible CPU builds
2024-01-11 14:00:48 -08:00
Eduard van Valkenburg
f5faf79aa1
Add semantic kernel to Readme ( #1931 )
2024-01-11 14:40:23 -05:00
Michael Yang
f4f939de28
Merge pull request #1552 from jmorganca/mxyng/lint-test
...
add lint and test on pull_request
2024-01-11 09:37:45 -08:00
Daniel Hiltgen
39928a42e8
Always dynamically load the llm server library
...
This switches darwin to dynamic loading, and refactors the code now that no
static linking of the library is used on any platform
2024-01-11 08:42:47 -08:00
Daniel Hiltgen
d88c527be3
Build multiple CPU variants and pick the best
...
This reduces the built-in linux version to not use any vector extensions
which enables the resulting builds to run under Rosetta on MacOS in
Docker. Then at runtime it checks for the actual CPU vector
extensions and loads the best CPU library available
2024-01-11 08:42:47 -08:00
Fabian Preiß
3bc8b9832b
fix gpu_test.go Error (same type) uint64->uint32 ( #1921 )
2024-01-11 08:22:23 -05:00
Jeffrey Morgan
ab6be852c7
revisit memory allocation to account for full kv cache on main gpu
2024-01-11 01:45:31 -05:00
Daniel Hiltgen
052b33b81b
DRY out the Dockefile.build
2024-01-10 17:27:51 -08:00
Daniel Hiltgen
8da7bef05f
Support multiple variants for a given llm lib type
...
In some cases we may want multiple variants for a given GPU type or CPU.
This adds logic to have an optional Variant which we can use to select
an optimal library, but also allows us to try multiple variants in case
some fail to load.
This can be useful for scenarios such as ROCm v5 vs v6 incompatibility
or potentially CPU features.
2024-01-10 17:27:51 -08:00
Jeffrey Morgan
b24e8d17b2
Increase minimum CUDA memory allocation overhead and fix minimum overhead for multi-gpu ( #1896 )
...
* increase minimum cuda overhead and fix minimum overhead for multi-gpu
* fix multi gpu overhead
* limit overhead to 10% of all gpus
* better wording
* allocate fixed amount before layers
* fixed only includes graph alloc
2024-01-10 19:08:51 -05:00
Jeffrey Morgan
f83881390f
revert submodule back to 328b83de23b33240e28f4e74900d1d06726f5eb1
2024-01-10 18:42:39 -05:00
Daniel Hiltgen
ac70ab6761
Merge pull request #1914 from dhiltgen/smarter_cuda_detection
...
Smarter GPU Management library detection
2024-01-10 15:21:56 -08:00
Daniel Hiltgen
3c49c3ab0d
Harden GPU mgmt library lookup
...
When there are multiple management libraries installed on a system
not every one will be compatible with the current driver. This change
improves our management library algorithm to build up a set of discovered
libraries based on glob patterns, and then try all of them until we're able to
load one without error.
2024-01-10 15:06:41 -08:00
Daniel Hiltgen
9754ae4c89
Support optional override of the target archictures
...
This can help speed up incremental builds when you're only testing one
archicture, like amd64. E.g.
BUILD_ARCH=amd64 ./scripts/build_linux.sh && scp ./dist/ollama-linux-amd64 test-system:
2024-01-10 14:43:24 -08:00
Jeffrey Morgan
224fbf2795
update submodule to commit 1fc2f265ff9377a37fd2c61eae9cd813a3491bea until its main branch is fixed
2024-01-10 17:03:15 -05:00
Jeffrey Morgan
2c6e8f5248
Update submodule to 6efb8eb30e7025b168f3fda3ff83b9b386428ad6 ( #1885 )
...
* update submodule to `6efb8eb30e7025b168f3fda3ff83b9b386428ad6`
* unblock condition variable in `update_slots` when closing server
2024-01-10 16:48:38 -05:00
Jeffrey Morgan
34344d801c
clean up cmake build directory when cross compiling macOS builds
2024-01-09 17:13:56 -05:00
Robin Glauser
e868c8a5c7
Update api.md ( #1878 )
...
Fixed assistant in the example response.
2024-01-09 16:21:17 -05:00
Jeffrey Morgan
c336693f07
calculate overhead based number of gpu devices ( #1875 )
2024-01-09 15:53:33 -05:00
Daniel Hiltgen
e89dc1d54b
Merge pull request #1874 from dhiltgen/correct_cuda_min
...
Set corret CUDA minimum compute capability version
2024-01-09 11:37:22 -08:00
Daniel Hiltgen
1961a81f03
Set corret CUDA minimum compute capability version
...
If you attempt to run the current CUDA build on compute capability 5.2
cards, you'll hit the following failure:
cuBLAS error 15 at ggml-cuda.cu:7956: the requested functionality is not supported
2024-01-09 11:28:24 -08:00
Jeffrey Morgan
8a8c7e7f8d
only build for metal on arm64
2024-01-09 13:51:08 -05:00
Jeffrey Morgan
6df83e6daa
update rough cuda overhead estimate to 15% + 384MiB
2024-01-09 13:51:08 -05:00
Michael Yang
f921e2696e
typo
2024-01-09 09:45:42 -08:00
Michael Yang
4a33cede20
remove unused fields and functions
2024-01-09 09:37:40 -08:00
Michael Yang
f95d2f25f3
fix temporary history file permissions
2024-01-09 09:36:58 -08:00
Michael Yang
2b9892a808
fix(windows): modelpath and list
2024-01-09 09:36:58 -08:00
Michael Yang
2bb2bdd5d4
fix lint
2024-01-09 09:36:58 -08:00
Michael Yang
acfc376efd
add .golangci.yaml
2024-01-09 09:36:58 -08:00
Michael Yang
997253143f
add lint and test on pull_request
2024-01-09 09:36:58 -08:00
Michael Yang
62023177f6
Merge pull request #1614 from jmorganca/mxyng/fix-set-template
...
fix: set template without triple quotes
2024-01-09 09:36:24 -08:00
Jeffrey Morgan
6164f378f2
revert cuda overhead to 20%
2024-01-09 00:54:29 -05:00
Jeffrey Morgan
f387e9631b
use runner if cuda alloc won't fit
2024-01-09 00:44:34 -05:00
Jeffrey Morgan
6566387ae3
add TODO for cuda overhead
2024-01-09 00:28:03 -05:00
Jeffrey Morgan
37708931fb
update cuda overhead to 20% to fix crashes when switching between models and large context sizes
2024-01-09 00:05:23 -05:00
Jeffrey Morgan
f6cb0a553c
update cuda overhead to 15% or 400MiB
2024-01-08 23:45:45 -05:00
Jeffrey Morgan
2680078c13
fix build on linux
2024-01-08 23:44:13 -05:00
Jeffrey Morgan
f1b7e5f560
update overhead to 15%
2024-01-08 23:37:45 -05:00
Jeffrey Morgan
cb534e6ac2
use 10% vram overhead for cuda
2024-01-08 23:17:44 -05:00
Jeffrey Morgan
58ce2d8273
better estimate scratch buffer size
2024-01-08 21:32:44 -05:00
Jeffrey Morgan
18ddf6d57d
fix windows build
2024-01-08 20:04:01 -05:00
Michael Yang
61e6502449
Merge pull request #1818 from jmorganca/mxyng/fix-alt-prompt
...
fix(cmd): history in alt prompt
2024-01-08 13:48:34 -08:00
Jeffrey Morgan
08f1e18965
Offload layers to GPU based on new model size estimates ( #1850 )
...
* select layers based on estimated model memory usage
* always account for scratch vram
* dont load +1 layers
* better estmation for graph alloc
* Update gpu/gpu_darwin.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
* Update llm/llm.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
* Update llm/llm.go
* add overhead for cuda memory
* Update llm/llm.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
* fix build error on linux
* address comments
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2024-01-08 16:42:00 -05:00
Bruce MacDonald
7e8f7c8358
remove ggml automatic re-pull ( #1856 )
2024-01-08 14:41:01 -05:00
Bruce MacDonald
3f3eb19a3b
document response in modelfile template variables ( #1428 )
2024-01-08 14:38:51 -05:00
Daniel Hiltgen
059ae4585e
Merge pull request #1834 from dhiltgen/old_cuda
...
Detect very old CUDA GPUs and fall back to CPU
2024-01-07 10:39:49 -08:00
Daniel Hiltgen
6347f501ca
Merge pull request #1828 from dhiltgen/fix_llava
...
Accept windows paths for image processing
2024-01-07 09:05:46 -08:00
Jeffrey Morgan
5feec959ad
dont use -Wall in static build ( #1833 )
2024-01-07 10:39:19 -05:00
Jeffrey Morgan
dbdd50b283
add -DCMAKE_SYSTEM_NAME=Darwin cmake flag ( #1832 )
2024-01-07 00:46:17 -05:00
Daniel Hiltgen
d74ce6bd4f
Detect very old CUDA GPUs and fall back to CPU
...
If we try to load the CUDA library on an old GPU, it panics and crashes
the server. This checks the compute capability before we load the
library so we can gracefully fall back to CPU mode.
2024-01-06 21:40:29 -08:00
Guilherme Baptista
57942b4676
Update README.md - Community Integrations - Ollama for Ruby ( #1830 )
2024-01-06 22:31:39 -05:00
Daniel Hiltgen
e0d05b0f1e
Accept windows paths for image processing
...
This enhances our regex to support windows style paths. The regex will
match invalid path specifications, but we'll still validate file
existence and filter out mismatches
2024-01-06 10:50:27 -08:00
Daniel Hiltgen
2d9dd14f27
Merge pull request #1697 from dhiltgen/win_docs
...
Add windows native build instructions
2024-01-05 19:34:20 -08:00
Jeffrey Morgan
1caa56128f
add cuda lib path for nvidia container toolkit
2024-01-05 21:10:37 -05:00
Michael Yang
0101e76dbe
Merge pull request #1797 from sublimator/nd-allow-extension-origins-still-needs-explicit-listing-2024-01-05
...
fix: allow extension origins (still needs explicit listing), fixes #1686
2024-01-05 17:20:09 -08:00
Michael Yang
2ef9352b94
fix(cmd): history in alt mode
2024-01-05 16:20:02 -08:00
Michael Yang
5580ae2472
fix: set template without triple quotes
2024-01-05 15:51:33 -08:00
Bruce MacDonald
3a9f447141
only pull gguf model if already exists ( #1817 )
2024-01-05 18:50:00 -05:00
Patrick Devine
9c2941e61b
switch api for ShowRequest to use the name field ( #1816 )
2024-01-05 15:06:43 -08:00
Patrick Devine
238ac5e765
Add unit tests for Parser ( #1815 )
2024-01-05 14:04:31 -08:00
Bruce MacDonald
4f4980b66b
simplify ggml update logic ( #1814 )
...
- additional information is now available in show response, use this to pull gguf before running
- make gguf updates cancellable
2024-01-05 15:22:32 -05:00
Patrick Devine
22e93efa41
add show info command and fix the modelfile
2024-01-05 12:20:05 -08:00
Patrick Devine
2909dce894
split up interactive generation
2024-01-05 12:20:05 -08:00
Jeffrey Morgan
df32537312
gpu: read memory info from all cuda devices ( #1802 )
...
* gpu: read memory info from all cuda devices
* add `LOOKUP_SIZE` constant
* better constant name
* address comments
2024-01-05 11:25:58 -05:00
Bruce MacDonald
3367b5f3df
remove unused generate patches ( #1810 )
2024-01-05 11:25:45 -05:00
Matt Williams
46edbbc518
Merge pull request #1801 from jmorganca/mattw/correctdockerlink
2024-01-04 19:20:45 -08:00
Michael Yang
d2ff18cd6b
Merge pull request #1791 from jmorganca/mxyng/update-build
...
update Dockerfile.build
2024-01-04 19:13:44 -08:00
Matt Williams
df086d3c8c
fix docker doc to point to hub
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2024-01-04 18:42:23 -08:00
Nicholas Dudfield
8baaaa39c0
Allow extension origins (still needs explicit listing), fixes #1686
2024-01-05 09:06:47 +07:00
Michael Yang
f9961c70ae
update build
2024-01-04 17:34:38 -08:00
Daniel Hiltgen
cd8fad3398
Merge pull request #1790 from dhiltgen/llm_code_shuffle
...
Cleaup stale submodule
2024-01-04 13:47:25 -08:00
Daniel Hiltgen
9983fa5f4e
Cleaup stale submodule
...
If the tree has a stale submodule, make sure we clean it up first
2024-01-04 13:40:16 -08:00
Daniel Hiltgen
dfda91c2ee
Merge pull request #1788 from dhiltgen/llm_code_shuffle
...
Revamp code layout for the llm directory and llama.cpp submodule
2024-01-04 13:14:28 -08:00
Daniel Hiltgen
fac9060da5
Init submodule with new path
2024-01-04 13:00:13 -08:00
Daniel Hiltgen
a554616f8e
remove old llama.cpp submodule path
2024-01-04 12:12:21 -08:00
Daniel Hiltgen
77d96da94b
Code shuffle to clean up the llm dir
2024-01-04 12:12:05 -08:00
Brian Murray
0d6e3565ae
Add embeddings to API ( #1773 )
2024-01-04 15:00:52 -05:00
Daniel Hiltgen
b5939008a1
Merge pull request #1785 from dhiltgen/win_native_cli
...
Load dynamic cpu lib on windows
2024-01-04 08:55:01 -08:00
Daniel Hiltgen
e9ce91e9a6
Load dynamic cpu lib on windows
...
On linux, we link the CPU library in to the Go app and fall back to it
when no GPU match is found. On windows we do not link in the CPU library
so that we can better control our dependencies for the CLI. This fixes
the logic so we correctly fallback to the dynamic CPU library
on windows.
2024-01-04 08:41:41 -08:00
Bruce MacDonald
4ad6c9b11f
fix: pull either original model or from model on create ( #1774 )
2024-01-04 01:34:38 -05:00
Jeffrey Morgan
c0285158a9
tweak memory requirements error text
2024-01-03 19:47:18 -05:00
Jeffrey Morgan
77a66df72c
add macOS memory check for 47B models
2024-01-03 19:46:16 -05:00
Jeffrey Morgan
5b4837f881
remove unused filetype check
2024-01-03 19:45:39 -05:00
Jeffrey Morgan
29340c2e62
update cmake flags for amd64 macOS ( #1780 )
...
* update cmake flags for intel macOS
* remove `LLAMA_K_QUANTS`
* put back `CMAKE_OSX_DEPLOYMENT_TARGET` and disable `LLAMA_F16C`
2024-01-03 19:22:15 -05:00
Daniel Hiltgen
d5ec730354
Merge pull request #1779 from dhiltgen/refined_amd_gpu_list
...
Improve maintainability of Radeon card list
2024-01-03 16:18:57 -08:00
Daniel Hiltgen
8bed487aba
Merge pull request #1778 from dhiltgen/wsl1
...
Fail fast on WSL1 while allowing on WSL2
2024-01-03 16:18:41 -08:00
Daniel Hiltgen
c1a10a6e9b
Merge pull request #1781 from dhiltgen/cpu_only_build
...
Fix CPU only builds
2024-01-03 16:18:25 -08:00
Daniel Hiltgen
ddbfa6fe31
Fix CPU only builds
...
Go embed doesn't like when there's no matching files, so put
a dummy placeholder in to allow building without any GPU support
If no "server" library is found, it's safely ignored at runtime.
2024-01-03 16:08:34 -08:00
Daniel Hiltgen
2fcd41ef81
Fail fast on WSL1 while allowing on WSL2
...
This prevents users from accidentally installing on WSL1 with instructions
guiding how to upgrade their WSL instance to version 2. Once running WSL2
if you have an NVIDIA card, you can follow their instructions to set up
GPU passthrough and run models on the GPU. This is not possible on WSL1.
2024-01-03 16:02:32 -08:00
Daniel Hiltgen
16f4603b67
Improve maintainability of Radeon card list
...
This moves the list of AMD GPUs to an easier to maintain list which
should make it easier to update over time.
2024-01-03 15:16:56 -08:00
Daniel Hiltgen
1184686649
Merge pull request #1776 from dhiltgen/render_group
...
Add ollama user to render group for Radeon support
2024-01-03 13:07:54 -08:00
Daniel Hiltgen
2588cb2daa
Add ollama user to render group for Radeon support
...
For the ROCm libraries to access the driver, we need to add the ollama user
to the render group.
2024-01-03 12:56:31 -08:00
Jeffrey Morgan
c7ea8f237e
set num_gpu to 1 only by default on darwin arm64 ( #1771 )
2024-01-03 14:10:29 -05:00
Bruce MacDonald
0b3118e0af
fix: relay request opts to loaded llm prediction ( #1761 )
2024-01-03 12:01:42 -05:00
Daniel Hiltgen
05face44ef
Merge pull request #1683 from dhiltgen/fix_windows_test
...
Fix windows system memory lookup
2024-01-03 09:00:39 -08:00
Daniel Hiltgen
a2ad952440
Fix windows system memory lookup
...
This refines the gpu package error handling and fixes a bug with the
system memory lookup on windows.
2024-01-03 08:50:01 -08:00
Daniel Hiltgen
5fea4410be
Merge pull request #1680 from dhiltgen/better_patching
...
Refactor how we augment llama.cpp and refine windows native build
2024-01-03 08:10:17 -08:00
Bruce MacDonald
b846eb64d0
Fix template api doc description ( #1661 )
2024-01-03 11:00:59 -05:00
Cole Gillespie
3c5dd9ed1d
Update README.md ( #1766 )
2024-01-03 10:44:22 -05:00
Jeffrey Morgan
b17ccd0542
Update import.md
2024-01-02 22:28:18 -05:00
Patrick Devine
d0409f772f
keyboard shortcut help ( #1764 )
2024-01-02 18:04:12 -08:00
Jeffrey Morgan
ec261422af
use docker build in build scripts
2024-01-02 19:32:54 -05:00
Daniel Hiltgen
0498f7ce56
Get rid of one-line llama.log
...
This one log line was triggering a single line llama.log to be generated
in the pwd of the server
2024-01-02 15:36:16 -08:00
Daniel Hiltgen
738a8d12eb
Rename the ollama cmakefile
2024-01-02 15:36:16 -08:00
Daniel Hiltgen
d966b730ac
Switch windows build to fully dynamic
...
Refactor where we store build outputs, and support a fully dynamic loading
model on windows so the base executable has no special dependencies thus
doesn't require a special PATH.
2024-01-02 15:36:16 -08:00
Daniel Hiltgen
9a70aecccb
Refactor how we augment llama.cpp
...
This changes the model for llama.cpp inclusion so we're not applying a patch,
but instead have the C++ code directly in the ollama tree, which should make it
easier to refine and update over time.
2024-01-02 15:35:55 -08:00
Karim ElGhandour
22cd5eaab6
Added Ollama-SwiftUI to integrations ( #1747 )
2024-01-02 09:47:50 -05:00
Dane Madsen
304a8799ca
Update README.md ( #1757 )
2024-01-02 09:47:08 -05:00
Jeffrey Morgan
2a2fa3c329
api.md cleanup & formatting
2023-12-27 14:32:35 -05:00
Jeffrey Morgan
55978c1dc9
clean up cache api option
2023-12-27 14:27:45 -05:00
Jeffrey Morgan
d4ebdadbe7
enable cache_prompt by default
2023-12-27 14:23:42 -05:00
Daniel Hiltgen
e201efa14b
Add windows native build instructions
2023-12-25 08:31:34 -08:00
Icelain
c5f21f73a4
follow best practices by adding resp.Body.Close() ( #1708 )
2023-12-25 09:01:37 -05:00
Jeffrey Morgan
371bc73531
Update README.md
2023-12-24 11:54:08 -05:00
Jeffrey Morgan
c651d8b824
Update README.md
2023-12-23 11:18:12 -05:00
Daniel Hiltgen
cf50ef5b51
Merge pull request #1684 from dhiltgen/tag_integration_tests
...
Guard integration tests with a tag
2023-12-22 16:43:41 -08:00
Daniel Hiltgen
697bea6939
Guard integration tests with a tag
...
This should help CI avoid running the integration test logic in a
container where it's not currently possible.
2023-12-22 16:33:27 -08:00
K0IN
10da41d677
Add Cache flag to api ( #1642 )
2023-12-22 17:16:20 -05:00
Bruce MacDonald
db356c8519
post-response templating ( #1427 )
2023-12-22 17:07:05 -05:00
Jeffrey Morgan
b80081022f
cache docker builds in build_linux.sh
2023-12-22 16:01:20 -05:00
Matt Williams
790457398a
Merge pull request #1677 from jmorganca/mattw/docrunupdate
...
update where are models stored q
2023-12-22 09:56:27 -08:00
Matt Williams
511069a2a5
update where are models stored q
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-12-22 09:48:44 -08:00
Matt Williams
5a85070c22
Update readmes, requirements, packagejsons, etc for all examples ( #1452 )
...
Most of the examples needed updates of Readmes to show how to run them. Some of the requirements.txt files had extra content that wasn't needed, or missing altogether. Apparently some folks like to run npm start
to run typescript, so a script was added to all typescript examples which
hadn't been done before.
Basically just a lot of cleanup.
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-12-22 09:10:41 -08:00
Matt Williams
291700c92d
Clean up documentation ( #1506 )
...
* Clean up documentation
Will probably need to update with PRs for new release.
Signed-off-by: Matt Williams <m@technovangelist.com >
* Correcting to fit in 0.1.15 changes
Signed-off-by: Matt Williams <m@technovangelist.com >
* Update README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* addressing comments
Signed-off-by: Matt Williams <m@technovangelist.com >
* more api cleanup
Signed-off-by: Matt Williams <m@technovangelist.com >
* its llava not llama
Signed-off-by: Matt Williams <m@technovangelist.com >
* Update docs/troubleshooting.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Updated hosting to server and documented all env vars
Signed-off-by: Matt Williams <m@technovangelist.com >
* remove last of the cli descriptions
Signed-off-by: Matt Williams <m@technovangelist.com >
* Update README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* update further per conversation with jeff earlier today
Signed-off-by: Matt Williams <m@technovangelist.com >
* cleanup the doc readme
Signed-off-by: Matt Williams <m@technovangelist.com >
* move upgrade to faq
Signed-off-by: Matt Williams <m@technovangelist.com >
* first change
Signed-off-by: Matt Williams <m@technovangelist.com >
* updated
Signed-off-by: Matt Williams <m@technovangelist.com >
* Update docs/faq.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Update docs/README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Update README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Update docs/README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Update docs/README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Update docs/README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Update docs/README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* examples in parent
Signed-off-by: Matt Williams <m@technovangelist.com >
* add exapmle for create model.
Signed-off-by: Matt Williams <m@technovangelist.com >
* update faq
Signed-off-by: Matt Williams <m@technovangelist.com >
* update create model api
Signed-off-by: Matt Williams <m@technovangelist.com >
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Update docs/faq.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Update docs/troubleshooting.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* update the readme in docs
Signed-off-by: Matt Williams <m@technovangelist.com >
* update a few more things
Signed-off-by: Matt Williams <m@technovangelist.com >
* Update docs/troubleshooting.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Update docs/faq.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Update README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Update docs/modelfile.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Update docs/troubleshooting.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
---------
Signed-off-by: Matt Williams <m@technovangelist.com >
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2023-12-22 09:10:01 -08:00
Daniel Hiltgen
9db28af84e
Merge pull request #1675 from dhiltgen/less_verbose
...
Quiet down llama.cpp logging by default
2023-12-22 08:57:17 -08:00
Daniel Hiltgen
e5202eb687
Quiet down llama.cpp logging by default
...
By default builds will now produce non-debug and non-verbose binaries.
To enable verbose logs in llama.cpp and debug symbols in the
native code, set `CGO_CFLAGS=-g`
2023-12-22 08:47:18 -08:00
Daniel Hiltgen
96fb441abd
Merge pull request #1146 from dhiltgen/ext_server_cgo
...
Add cgo implementation for llama.cpp
2023-12-22 08:16:31 -08:00
Daniel Hiltgen
495c06e4a6
Fix doc glitch
2023-12-21 18:21:31 -08:00
Daniel Hiltgen
fa24e73b82
Remove CPU build, fixup linux build script
2023-12-21 18:21:31 -08:00
Daniel Hiltgen
325d74985b
Fix CPU performance on hyperthreaded systems
...
The default thread count logic was broken and resulted in 2x the number
of threads as it should on a hyperthreading CPU
resulting in thrashing and poor performance.
2023-12-21 16:23:36 -08:00
Bruce MacDonald
fabf2f3467
allow for starting llava queries with filepath ( #1549 )
2023-12-21 13:20:59 -05:00
Daniel Hiltgen
d9cd3d9667
Revive windows build
...
The windows native setup still needs some more work, but this gets it building
again and if you set the PATH properly, you can run the resulting exe on a cuda system.
2023-12-20 17:21:54 -08:00
Patrick Devine
a607d922f0
add FAQ for slow networking in WSL2 ( #1646 )
2023-12-20 16:27:24 -08:00
Daniel Hiltgen
7555ea44f8
Revamp the dynamic library shim
...
This switches the default llama.cpp to be CPU based, and builds the GPU variants
as dynamically loaded libraries which we can select at runtime.
This also bumps the ROCm library to version 6 given 5.7 builds don't work
on the latest ROCm library that just shipped.
2023-12-20 14:45:57 -08:00
Jeffrey Morgan
df06812494
Update api.md
2023-12-20 08:47:53 -05:00
Daniel Hiltgen
1d1eb1688c
Additional nvidial-ml path to check
2023-12-19 15:52:34 -08:00
Michael Yang
23dc179350
Merge pull request #1619 from jmorganca/mxyng/fix-version-test
...
fix(test): use real version string for comparison
2023-12-19 15:48:52 -08:00
Michael Yang
63aac0edc5
fix(test): use real version string for comparison
2023-12-19 15:03:02 -08:00
Daniel Hiltgen
6558f94ed0
Fix darwin intel build
2023-12-19 13:32:24 -08:00
Erick Ghaumez
1ca484f67e
Add Langchain Dart library ( #1564 )
...
* Add Langchain Dart
* Update README.md
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-12-19 14:04:52 -05:00
Jeffrey Morgan
72b0c32fe9
Update README.md
2023-12-19 12:59:22 -05:00
Jeffrey Morgan
68c28224f8
Update README.md
2023-12-19 12:59:03 -05:00
Daniel Hiltgen
54dbfa4c4a
Carry ggml-metal.metal as payload
2023-12-19 09:05:46 -08:00
Daniel Hiltgen
5646826a79
Add WSL2 path to nvidia-ml.so library
2023-12-19 09:05:46 -08:00
Daniel Hiltgen
3269535a4c
Refine handling of shim presence
...
This allows the CPU only builds to work on systems with Radeon cards
2023-12-19 09:05:46 -08:00
Daniel Hiltgen
1b991d0ba9
Refine build to support CPU only
...
If someone checks out the ollama repo and doesn't install the CUDA
library, this will ensure they can build a CPU only version
2023-12-19 09:05:46 -08:00
Daniel Hiltgen
51082535e1
Add automated test for multimodal
...
A simple test case that verifies llava:7b can read text in an image
2023-12-19 09:05:46 -08:00
Daniel Hiltgen
9adca7f711
Bump llama.cpp to b1662 and set n_parallel=1
2023-12-19 09:05:46 -08:00
Daniel Hiltgen
89bbaafa64
Build linux using ubuntu 20.04
...
This changes the container-based linux build to use an older Ubuntu
distro to improve our compatibility matrix for older user machines
2023-12-19 09:05:46 -08:00
Daniel Hiltgen
35934b2e05
Adapted rocm support to cgo based llama.cpp
2023-12-19 09:05:46 -08:00
65a
f8ef4439e9
Use build tags to generate accelerated binaries for CUDA and ROCm on Linux.
...
The build tags rocm or cuda must be specified to both go generate and go build.
ROCm builds should have both ROCM_PATH set (and the ROCM SDK present) as well
as CLBlast installed (for GGML) and CLBlast_DIR set in the environment to the
CLBlast cmake directory (likely /usr/lib/cmake/CLBlast). Build tags are also
used to switch VRAM detection between cuda and rocm implementations, using
added "accelerator_foo.go" files which contain architecture specific functions
and variables. accelerator_none is used when no tags are set, and a helper
function addRunner will ignore it if it is the chosen accelerator. Fix go
generate commands, thanks @deadmeu for testing.
2023-12-19 09:05:46 -08:00
Daniel Hiltgen
d4cd695759
Add cgo implementation for llama.cpp
...
Run the server.cpp directly inside the Go runtime via cgo
while retaining the LLM Go abstractions.
2023-12-19 09:05:46 -08:00
Bruce MacDonald
5e7fd6906f
Update images.go
2023-12-19 09:05:46 -08:00
Bruce MacDonald
811b1f03c8
deprecate ggml
...
- remove ggml runner
- automatically pull gguf models when ggml detected
- tell users to update to gguf in the case automatic pull fails
Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com >
2023-12-19 09:05:46 -08:00
Matt Williams
ed195f3562
Merge pull request #1595 from pgibler/main
...
Added cmdh to community section in README
2023-12-18 20:55:18 -08:00
Matt Williams
e0d0072ef1
Merge pull request #1592 from jmorganca/mattw/examplepruning
...
Lets get rid of these old modelfile examples
2023-12-18 20:29:48 -08:00
pgibler
620a2ffcfb
Added cmdh to community section in README
2023-12-18 22:04:40 -05:00
Matt Williams
d287013f24
Lets get rid of these old modelfile examples
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-12-18 17:47:33 -08:00
Jeffrey Morgan
6b5bdfa6c9
update runner submodule
2023-12-18 17:33:46 -05:00
Jeffrey Morgan
c063ee4af0
update runner submodule to fix hipblas build
2023-12-18 15:41:13 -05:00
Bruce MacDonald
d99fa6ce0a
send empty messages on last chat response ( #1530 )
2023-12-18 14:23:38 -05:00
Patrick Devine
3948c6ea06
add magic header for unit tests ( #1558 )
2023-12-18 10:41:02 -08:00
Jeffrey Morgan
b85982eb91
update runner submodule
2023-12-18 12:43:31 -05:00
Patrick Devine
86b0dd4b16
add API create/copy handlers ( #1541 )
2023-12-15 11:59:18 -08:00
Augustinas Malinauskas
f728738427
README with Enchanted iOS App ( #1529 )
...
* feat(docs): README with Enchanted iOS app
* Update README.md
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-12-15 14:37:29 -05:00
Ian Purton
115048a0d8
Added Bionic GPT as a front end. ( #1463 )
...
* Added Bionic GPT as a front end.
* Update README.md
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-12-15 14:33:04 -05:00
Bruce MacDonald
1b417a7836
use exp slices for go 1.20 compatibility ( #1544 )
2023-12-15 14:15:56 -05:00
Patrick Devine
0174665d0e
add API tests for list handler ( #1535 )
2023-12-14 18:18:25 -08:00
Patrick Devine
630518f0d9
Add unit test of API routes ( #1528 )
2023-12-14 16:47:40 -08:00
Bruce MacDonald
6e16098a60
remove sample_count from docs ( #1527 )
...
this info has not been returned from these endpoints in some time
2023-12-14 17:49:00 -05:00
Bruce MacDonald
6ee8c80199
restore model load duration on generate response ( #1524 )
...
* restore model load duration on generate response
- set model load duration on generate and chat done response
- calculate createAt time when response created
* remove checkpoints predict opts
* Update routes.go
2023-12-14 12:15:50 -05:00
Jeffrey Morgan
31f0551dab
Update runner to support mixtral and mixture of experts (MoE) ( #1475 )
2023-12-13 17:15:10 -05:00
Jeffrey Morgan
4a1abfe4fa
fix tests
2023-12-13 14:42:30 -05:00
Jeffrey Morgan
bbd41494bf
add multimodal to README.md
2023-12-13 14:38:47 -05:00
Jeffrey Morgan
fedba24a63
Docs for multimodal support ( #1485 )
...
* add multimodal docs
* add chat api docs
* consistency between `/api/generate` and `/api/chat`
* simplify docs
2023-12-13 13:59:33 -05:00
pepperoni21
e3b090dbc5
Added message format for chat api ( #1488 )
2023-12-13 11:21:23 -05:00
Patrick Devine
d9e60f634b
add image support to the chat api ( #1490 )
2023-12-12 13:28:58 -08:00
Michael Yang
4251b342de
Merge pull request #1469 from jmorganca/mxyng/model-types
...
remove per-model types
2023-12-12 12:27:03 -08:00
Jeffrey Morgan
0a9d348023
Fix issues with /set template and /set system ( #1486 )
2023-12-12 14:43:19 -05:00
Bruce MacDonald
3144e2a439
exponential back-off ( #1484 )
2023-12-12 12:33:02 -05:00
Bruce MacDonald
c0960e29b5
retry on concurrent request failure ( #1483 )
...
- remove parallel
2023-12-12 12:14:35 -05:00
ruecat
5314fc9b63
Fix Readme "Database -> MindsDB" link ( #1479 )
2023-12-12 10:26:13 -05:00
Jorge Torres
a36b5fef3b
Update README.md ( #1412 )
2023-12-11 18:05:10 -05:00
Patrick Devine
910e9401d0
Multimodal support ( #1216 )
...
---------
Co-authored-by: Matt Apperson <mattapperson@Matts-MacBook-Pro.local >
2023-12-11 13:56:22 -08:00
Michael Yang
56ffc3023a
remove per-model types
...
mostly replaced by decoding tensors except ggml models which only
support llama
2023-12-11 09:40:21 -08:00
Bruce MacDonald
7a1b37ac64
os specific ctrl-z ( #1420 )
2023-12-11 10:48:14 -05:00
Jeffrey Morgan
5d4d2e2c60
update docs with chat completion api
2023-12-10 13:53:36 -05:00
Jeffrey Morgan
7db5bcf73b
fix go-staticcheck warning
2023-12-10 11:44:27 -05:00
Jeffrey Morgan
fa2f095bd9
fix model name returned by /api/generate being different than the model name provided
2023-12-10 11:42:15 -05:00
Jeffrey Morgan
045b855db9
fix error on accumulating final chat response
2023-12-10 11:24:39 -05:00
Jeffrey Morgan
32064a0646
fix empty response when receiving runner error
2023-12-10 10:53:38 -05:00
Jeffrey Morgan
d9a250e9b5
seek to end of file when decoding older model formats
2023-12-09 21:14:35 -05:00
Jeffrey Morgan
944519ed16
seek to eof for older model binaries
2023-12-09 20:48:57 -05:00
Jeffrey Morgan
2dd040d04c
do not use --parallel 2 for old runners
2023-12-09 20:17:33 -05:00
Bruce MacDonald
bbe41ce41a
fix: parallel queueing race condition caused silent failure ( #1445 )
...
* fix: queued request failures
- increase parallel requests to 2 to complete queued request, queueing is managed in ollama
* log steam errors
2023-12-09 14:14:02 -05:00
Jeffrey Morgan
9e1406e4ed
Don't expose model information in /api/generate
2023-12-09 02:05:43 -08:00
Jeffrey Morgan
b74580c913
Update api.md
2023-12-08 16:02:07 -08:00
Bruce MacDonald
7e9405fd07
fix: encode full previous prompt in context ( #1424 )
2023-12-08 16:53:51 -05:00
Bruce MacDonald
3b0b8930d4
fix: only flush template in chat when current role encountered ( #1426 )
2023-12-08 16:44:24 -05:00
Bruce MacDonald
e3f925fc1b
fix: restore modelfile system in prompt template ( #1425 )
2023-12-08 14:20:19 -05:00
Jeffrey Morgan
2a2289fb6b
Update api.md
2023-12-08 09:36:45 -08:00
Matt Williams
dd427f499a
Merge pull request #1419 from jmorganca/mattw/typescript-simplechat
...
Simple chat example for typescript
2023-12-07 14:42:24 -08:00
Michael Yang
2ae573c7ed
Merge pull request #1421 from jmorganca/mxyng/fix-newline
...
fix redundant newline
2023-12-07 13:47:23 -08:00
Matt Williams
02fe26c44b
update the readme as per bruce
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-12-07 13:46:30 -08:00
Michael Yang
16c7548460
fix redundant newline
2023-12-07 13:44:45 -08:00
Matt Williams
fa75998c0d
Update examples/typescript-simplechat/readme.md
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-12-07 13:40:54 -08:00
Matt Williams
5344f886c8
Update examples/typescript-simplechat/client.ts
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-12-07 13:40:37 -08:00
Matt Williams
6cc823c9b5
Update examples/typescript-simplechat/client.ts
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-12-07 13:39:59 -08:00
Matt Williams
b84d34e632
Update examples/typescript-simplechat/readme.md
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-12-07 13:39:33 -08:00
Matt Williams
30229a913c
Update examples/typescript-simplechat/client.ts
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-12-07 13:39:24 -08:00
Matt Williams
1ade380bd7
Simple chat example for typescript
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-12-07 11:48:25 -08:00
Jeffrey Morgan
ba264e9da8
add future version note to chat api docs
2023-12-07 09:42:15 -08:00
Matt Williams
a2405ec831
Merge pull request #1409 from jmorganca/mattw/python-simplechat
...
Simple chat example
2023-12-06 15:49:45 -08:00
Matt Williams
ce809bb529
Merge branch 'mattw/python-simplechat' of github.com:jmorganca/ollama into mattw/python-simplechat
2023-12-06 15:48:42 -08:00
Matt Williams
76bc4d0458
Cleanup as per Bruce
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-12-06 15:44:40 -08:00
Bruce MacDonald
4a02945a15
Update examples/python-simplechat/client.py
2023-12-06 18:36:45 -05:00
Matt Williams
aec742b6d2
Update examples/python-simplechat/readme.md
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-12-06 15:30:45 -08:00
Matt Williams
f337642e94
Update examples/python-simplechat/readme.md
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-12-06 15:30:35 -08:00
Matt Williams
51131cc6e2
Update examples/python-simplechat/client.py
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-12-06 15:30:10 -08:00
Matt Williams
43027789dc
Simple chat example
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-12-06 14:35:58 -08:00
Xe Iaso
f9b7d65e2b
docs/tutorials: add bit on how to use Fly GPUs on-demand with Ollama ( #1406 )
...
Signed-off-by: Xe Iaso <xe@camellia.finch-kitefin.ts.net >
2023-12-06 14:14:02 -08:00
Michael Yang
1f05d77110
Merge pull request #1244 from jmorganca/brucemacd/no-fail-template
...
do not fail on unsupported template variables
2023-12-06 13:23:04 -08:00
Michael Yang
c3ff36088b
Merge pull request #774 from jmorganca/mxyng/server-version
...
add version api and show server version in cli
2023-12-06 13:22:55 -08:00
Samuel Calderon
13524b5e72
List "Send chat messages" in table of contents ( #1399 )
...
Thank you @calderonsamuel
2023-12-06 12:34:27 -08:00
Michael Yang
f1b049fed8
Merge pull request #1377 from jmorganca/mxyng/qwen
...
update for qwen
2023-12-06 12:31:51 -08:00
Jeffrey Morgan
97c5696945
fix base urls in chat examples
2023-12-06 12:10:20 -08:00
Bruce MacDonald
47d4e22673
use missingkey in set empty interface when missing
2023-12-05 15:49:05 -08:00
Michael Yang
32f62fbb8e
Merge pull request #1334 from jmorganca/mxyng/load-projectors
...
load projectors
2023-12-05 14:40:53 -08:00
Michael Yang
5d75505ebd
return model configuration in generate
2023-12-05 14:39:02 -08:00
Michael Yang
b9495ea162
load projectors
2023-12-05 14:36:12 -08:00
Michael Yang
409bb9674e
Merge pull request #1308 from jmorganca/mxyng/split-from
...
split from into one or more models
2023-12-05 14:33:03 -08:00
Michael Yang
d3479c07a1
Merge pull request #1250 from jmorganca/mxyng/create-layer
...
refactor layer creation
2023-12-05 14:32:52 -08:00
Michael Yang
b12f1b984f
Merge pull request #1393 from jmorganca/mxyng/fix-whitespace
...
fix: trim space in modelfile fields
2023-12-05 12:18:01 -08:00
Bruce MacDonald
195e3d9dbd
chat api endpoint ( #1392 )
2023-12-05 14:57:33 -05:00
Michael Yang
38fe1a368b
fix: trim space in modelfile fields
2023-12-05 11:57:29 -08:00
Michael Yang
4b77fcb2b9
comments
2023-12-05 09:43:50 -08:00
Michael Yang
cde13bcdea
cmd: only print server version when different
2023-12-05 09:36:01 -08:00
Michael Yang
0f0cd265a7
cmd: add server version
2023-12-05 09:36:01 -08:00
Michael Yang
0db4706ec2
api: add version api handler
2023-12-05 09:36:01 -08:00
Michael Yang
1ebdbd9694
server: add version handler
2023-12-05 09:36:01 -08:00
Michael Yang
5c59455b59
cmd: use existing cmd context
2023-12-05 09:36:01 -08:00
Jeffrey Morgan
00d06619a1
Revert "chat api ( #991 )" while context variable is fixed
...
This reverts commit 7a0899d62d .
2023-12-04 21:16:27 -08:00
Matt Williams
f1ef3f9947
remove mention of gpt-neox in import ( #1381 )
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-12-04 20:58:10 -08:00
Michael Yang
5a5dca13b2
comments
2023-12-04 16:59:23 -08:00
Michael Yang
7232f1fa41
go mod tidy
2023-12-04 16:59:23 -08:00
Michael Yang
72e7a49aa9
seek instead of copyn
2023-12-04 16:59:23 -08:00
Michael Yang
a3737cbd33
use NewLayer for CreateBlobHandler
2023-12-04 16:59:23 -08:00
Michael Yang
998f1785b6
add modelfamilies
2023-12-04 16:59:23 -08:00
Michael Yang
70a93057cd
refactor layer creation
...
previous layer creation was not ideal because:
1. it required reading the input file multiple times, once to calculate
the sha256 checksum, another to write it to disk, and potentially one
more to decode the underlying gguf
2. used io.ReadSeeker which is prone to user error. if the file isn't
reset correctly or in the right place, it could end up reading an
empty file
there are also some brittleness when reading existing layers else
writing the inherited layers will error reading an already closed file
this commit aims to fix these issues by restructuring layer creation.
1. it will now write the layer to a temporary file as well as the hash
function and move it to the final location on Commit
2. layers are read once once when copied to the destination. exception
is raw model files which still requires a second read to decode the
model metadata
2023-12-04 16:59:23 -08:00
Michael Yang
2cb0fa7d40
split from into one or more models
2023-12-04 16:59:23 -08:00
Michael Yang
b2816bca67
unnecessary ReadSeeker for DecodeGGML
2023-12-04 16:59:23 -08:00
Patrick Devine
bf704423c5
revert cli to use /api/generate ( #1383 )
2023-12-04 16:35:29 -08:00
Bruce MacDonald
7a0899d62d
chat api ( #991 )
...
- update chat docs
- add messages chat endpoint
- remove deprecated context and template generate parameters from docs
- context and template are still supported for the time being and will continue to work as expected
- add partial response to chat history
2023-12-04 18:01:06 -05:00
Michael Yang
0cca1486dd
Merge pull request #1376 from jmorganca/mxyng/rocky-install
...
install: fix rocky kernel packages
2023-12-04 14:23:43 -08:00
Patrick Devine
2113c9d31a
make linewrap still work when the terminal width has changed ( #1350 )
2023-12-04 14:14:56 -08:00
Michael Yang
6deebf2489
update for qwen
2023-12-04 11:38:05 -08:00
Michael Yang
95cb38ae47
install: fix rocky kernel packages
2023-12-04 11:10:42 -08:00
ruecat
1f126afb2d
Ollama Telegram Bot ( #1364 )
...
* Add "ollama-telegram" to Extensions & Plugins
* Update README.md
2023-12-03 11:19:55 -08:00
Jeffrey Morgan
f6201a7a6c
remove duplicate community integration in README.md
2023-12-02 21:18:13 -08:00
Michael Yang
b3f6c6598f
Merge pull request #1349 from jmorganca/mxyng/ctrl-z
...
handle ctrl+z
2023-12-01 16:21:49 -08:00
Michael Yang
88620e983a
handle ctrl+z
2023-12-01 16:15:20 -08:00
Michael Yang
cedae0d17a
Merge pull request #1347 from jshph/adapter-hash
...
Fix adapter loading from SHA hash
2023-12-01 11:08:25 -08:00
Joshua Pham
bb80a597db
Fix adapter loading from SHA hash
2023-12-01 13:50:55 -05:00
Patrick Devine
6681d37861
allow setting the system and template for prompts in the repl ( #1335 )
2023-12-01 09:28:35 -08:00
Michael Yang
0409c1fa59
docker: set PATH, LD_LIBRARY_PATH, and capabilities ( #1336 )
...
* docker: set PATH, LD_LIBRARY_PATH, and capabilities
* example: update k8s gpu manifest
2023-11-30 21:16:56 -08:00
Michael Yang
b56e92470a
Merge pull request #1229 from jmorganca/mxyng/calculate-as-you-go
...
revert checksum calculation to calculate-as-you-go
2023-11-30 10:54:38 -08:00
Jeffrey Morgan
5687f1a0cf
fix unexpected end of response errors when cancelling in ollama run
2023-11-30 00:30:21 -05:00
James Radtke
7eda3d0c55
Corrected transposed 129 to 192 for OLLAMA_ORIGINS example ( #1325 )
2023-11-29 22:44:17 -05:00
Bruce MacDonald
7194a07d4d
Add chatd to example projects
2023-11-29 21:18:21 -05:00
Michael Yang
13efd5f218
upload: fix PUT retry
2023-11-29 16:38:35 -08:00
Michael Yang
c4bdfffd96
upload: separate progress tracking
2023-11-29 16:38:33 -08:00
Michael Yang
26c63418e0
new hasher
2023-11-29 14:52:41 -08:00
Michael Yang
2799784ac8
revert checksum calculation to calculate-as-you-go
2023-11-29 13:47:58 -08:00
Alec Hammond
91897a606f
Add OllamaEmbeddings to python LangChain example ( #994 )
...
* Add OllamaEmbeddings to python LangChain example
* typo
---------
Co-authored-by: Alec Hammond <alechammond@fb.com >
2023-11-29 16:25:39 -05:00
Bruce MacDonald
96122b7271
validate model tags on copy ( #1323 )
2023-11-29 15:54:29 -05:00
jeremiahbuckley
39be7fdb98
fix rhel cuda install ( #1321 )
...
Co-authored-by: Cloud User <azureuser@testgpu2.hqzwom21okjenksna4y3c4ymjd.phxx.internal.cloudapp.net >
2023-11-29 14:55:15 -05:00
Timothy Jaeryang Baek
c2e3b89176
fix: disable ':' in tag names ( #1280 )
...
Co-authored-by: rootedbox
2023-11-29 13:33:45 -05:00
Patrick Devine
cde31cb220
Allow setting parameters in the REPL ( #1294 )
2023-11-29 09:56:42 -08:00
ToasterUwU
63097607b2
Correct MacOS Host port example ( #1301 )
2023-11-29 11:44:03 -05:00
Michael
2ae80e1e27
Update README.md
...
add new recent models as examples
2023-11-28 22:16:37 -05:00
Michael Yang
b173cfc558
Merge pull request #1195 from jmorganca/mxyng/fix-bar-rate
...
progress: fix bar rate
2023-11-28 11:55:23 -08:00
Michael Yang
424d53ac70
progress: fix bar rate
2023-11-28 11:44:56 -08:00
ftorto
e1a69d44c9
Update faq.md ( #1299 )
...
Fix a typo in the CA update command
2023-11-28 09:54:42 -05:00
Jason Jacobs
3d620f9462
ignore jetbrain ides ( #1287 )
2023-11-27 15:57:45 -05:00
Bruce MacDonald
928950fcc6
update python client create example ( #1227 )
...
* add remote create to python example client
2023-11-27 15:36:19 -05:00
Kasumi
39c6d949fc
Add Amica to community integrations ( #1281 )
2023-11-27 10:44:37 -05:00
Jeffrey Morgan
16a9006306
add back f16c instructions on intel mac
2023-11-26 15:59:49 -05:00
Jeffrey Morgan
e9216ea459
fix readline history on linux
2023-11-26 15:59:04 -05:00
Jeffrey Morgan
9e4a316405
update submodule commit
2023-11-26 14:52:00 -05:00
Jeffrey Morgan
9fb5e8399c
Fix issues with inputting and formatting multi line strings in ollama run
...
Co-authored-by: Wen Sun <iwendellsun@gmail.com >
2023-11-26 12:54:29 -05:00
Jing Zhang
82b9b329ff
windows CUDA support ( #1262 )
...
* Support cuda build in Windows
* Enable dynamic NumGPU allocation for Windows
2023-11-24 17:16:36 -05:00
Jongwook Choi
12e8c12d2b
Disable CUDA peer access as a workaround for multi-gpu inference bug ( #1261 )
...
When CUDA peer access is enabled, multi-gpu inference will produce
garbage output. This is a known bug of llama.cpp (or nvidia). Until the
upstream bug is fixed, we can disable CUDA peer access temporarily
to ensure correct output.
See #961 .
2023-11-24 14:05:57 -05:00
Jeffrey Morgan
d77dde126b
consistent cpu instructions on macos and linux
2023-11-22 16:26:46 -05:00
Michael Yang
c7e70cd3bb
Merge pull request #1245 from jmorganca/mxyng/gguf-int
...
fix: gguf int type
2023-11-22 11:42:56 -08:00
Michael Yang
199941cd15
fix: gguf int type
2023-11-22 11:40:30 -08:00
Long Huynh
c9474f7f61
Update README.md - Community Integrations - Obsidian BMO Chatbot plugin ( #1239 )
2023-11-22 14:32:30 -05:00
Jeffrey Morgan
927e3ba4a4
tag image with correct version when building with build_docker script
2023-11-22 14:32:17 -05:00
Bruce MacDonald
37d95157df
fix relative path on create ( #1222 )
2023-11-21 15:43:17 -05:00
Jeffrey Morgan
2eaa95b417
Update api.md
2023-11-21 15:32:05 -05:00
Kevin Cao
3cd07728f4
Make alt+backspace delete word ( #1223 )
2023-11-21 12:26:47 -08:00
Michael Yang
ecf8b793f0
Merge pull request #1224 from jmorganca/mxyng/update
...
update llama.cpp
2023-11-21 12:21:59 -08:00
Matt Williams
abf294826b
Merge pull request #1221 from jmorganca/mattw/communityinstalls
...
add installation packages category to community
2023-11-21 12:12:23 -08:00
Steve Korshakov
ae06bb426b
add Llama Coder ( #1225 )
...
* add Llama Coder
* Update README.md
2023-11-21 14:08:19 -05:00
Matt Williams
d8e0f62ebb
Merge pull request #1159 from jmorganca/mattw/functioncalling
...
Example: Function Calling in Typescript
2023-11-21 10:06:55 -08:00
Michael Yang
a00fac4ec8
update llama.cpp
2023-11-21 09:50:02 -08:00
Jeffrey Morgan
f2113c1fc7
fix potential error in progress bar calculation
2023-11-21 12:48:20 -05:00
Jeffrey Morgan
6452e2ecb8
fix cases where progress bar would not be fixed size
2023-11-21 12:07:25 -05:00
Matt Williams
9a28e263a5
Update README.md
...
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2023-11-21 07:25:32 -08:00
Matt Williams
0c066c9214
Update README.md
...
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2023-11-21 07:25:26 -08:00
Jeffrey Morgan
aabd71aede
fix rendering and variable width issues on progress bar
2023-11-21 10:02:37 -05:00
Matt Williams
da4d7c9f9c
add installation packages category to community
...
Moved the arch package and someone has added a pr for brew.
that needs to get updated to be a link.
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-11-21 06:40:59 -08:00
Matt Williams
f321b13a03
Merge pull request #1178 from tusharhero/install-instructions-archlinux
...
Add Installation instructions for Archlinux
2023-11-21 06:33:22 -08:00
Matt Williams
5ebcde1541
Merge branch 'main' into install-instructions-archlinux
2023-11-21 06:32:50 -08:00
Matt Williams
45206cb7cc
Merge pull request #1218 from danemadsen/main
...
Update Maid repo
2023-11-21 06:30:33 -08:00
Matt Williams
6e65b84f54
Merge pull request #1219 from dustinblackman/main
...
docs: Add Oatmeal to terminal integrations
2023-11-21 06:28:12 -08:00
Dustin Blackman
c00ce12e83
docs: Add Oatmeal to terminal integrations
2023-11-21 06:47:43 -05:00
tusharhero
e1cd3152c9
Move Archlinux package to Community Integrations section.
2023-11-21 16:28:50 +05:30
Dane Madsen
0bef3778c9
Update README.md
2023-11-21 21:02:13 +11:00
Dane Madsen
6ebab38b89
Merge branch 'jmorganca:main' into main
2023-11-21 20:01:13 +10:00
Dane Madsen
5d8e864d44
Update Maid repo
2023-11-21 21:00:54 +11:00
Matt Williams
5f7acd0bbd
remove 'recent'
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-11-20 17:03:25 -08:00
Matt Williams
44b3a1ad42
Merge branch 'mattw/functioncalling' of github.com:jmorganca/ollama into mattw/functioncalling
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-11-20 17:01:41 -08:00
Matt Williams
0260be4414
remove 'recently'
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-11-20 16:57:07 -08:00
Jeffrey Morgan
a3fcecf943
only set main_gpu if value > 0 is provided
2023-11-20 19:54:04 -05:00
Jeffrey Morgan
df07e4a097
remove redundant filename parameter ( #1213 )
2023-11-20 17:05:36 -05:00
Michael Yang
0b7ade0d4c
Merge pull request #1212 from jmorganca/mxyng/metal
...
enable metal for fp32, q5_0, q5_1
2023-11-20 13:56:39 -08:00
Michael Yang
19b7a4d715
recent llama.cpp update added kernels for fp32, q5_0, and q5_1
2023-11-20 13:44:31 -08:00
Bruce MacDonald
31ab453d37
resolve FROM path before sending modelfile ( #1211 )
2023-11-20 16:43:48 -05:00
Jeffrey Morgan
35c4b5ec16
calculate hash separately from http request
2023-11-20 15:45:11 -05:00
James Braza
f24741ff39
Documenting how to view Modelfiles ( #723 )
...
* Documented viewing Modelfiles in ollama.ai/library
* Moved Modelfile in ollama.ai down per request
2023-11-20 15:24:29 -05:00
Jeffrey Morgan
8c4022b06b
fix initial progress stats
2023-11-20 14:33:46 -05:00
Jeffrey Morgan
433702f421
hide progress stats on completion
2023-11-20 14:22:39 -05:00
Matt Williams
48896f626c
Update examples/typescript-functioncalling/extractwp.ts
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-11-20 10:12:10 -08:00
Matt Williams
c57aee6fba
Update examples/typescript-functioncalling/readme.md
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-11-20 10:10:42 -08:00
Jeffrey Morgan
6066c70edd
restore progress messages for older endpoints
2023-11-20 11:37:17 -05:00
Jeffrey Morgan
f10ac5de19
restore stats updated every second to progress bar
2023-11-20 10:58:19 -05:00
Jeffrey Morgan
93a108214c
only show decimal points for smaller file size numbers
2023-11-20 10:58:19 -05:00
Purinda Gunasekara
be61a81758
main-gpu argument is not getting passed to llamacpp, fixed. ( #1192 )
2023-11-20 10:52:52 -05:00
Toni Soriano
2fdf1b5ff8
add laravel package to README.md ( #1208 )
...
Co-authored-by: Toni <cloudstudio@Tonis-Mac-mini.local >
2023-11-20 10:48:35 -05:00
Huy Le
331068b964
Adding ogpt.nvim into the list of plugins! ( #1190 )
...
* adding ollama.nvim for visibility
* adding an ogpt.nvim neovim plugin
2023-11-20 10:39:14 -05:00
Andy Brenneke
0179d8eb6b
Add Rivet to Community Integrations ( #1183 )
2023-11-20 10:36:47 -05:00
Eli Bendersky
be48741308
README: link to LangChainGo for talking to ollama, with an example ( #1206 )
2023-11-20 10:35:07 -05:00
Jeffrey Morgan
6bbd6e26fb
fix temporary newline created and removed with spinner in ollama run
2023-11-20 00:49:08 -05:00
Jeffrey Morgan
e6ad4813d3
dont crash when redirecting stderr
2023-11-19 23:50:45 -05:00
Jeffrey Morgan
13ba6df5ab
enable cpu instructions on intel macs
2023-11-19 23:20:26 -05:00
Jeffrey Morgan
9d73d3a6b5
add back part.Reset()
2023-11-19 14:32:19 -05:00
Jeffrey Morgan
72cd336410
dont retry on upload complete context cancel
2023-11-19 14:32:19 -05:00
Jeffrey Morgan
1bd594b2fa
revert to using one open file for blob uploads
2023-11-19 14:32:19 -05:00
Jeffrey Morgan
9a8c21ac3d
use exponential everywhere
2023-11-19 14:32:19 -05:00
Jeffrey Morgan
f6b317e8c9
fix sending too little data in chunk upload body
2023-11-19 14:32:19 -05:00
Jeffrey Morgan
ac5076ce1e
exponential backoff up to 30s
2023-11-19 14:32:19 -05:00
Michael Yang
42c2e3a624
upload: retry complete upload
2023-11-19 14:32:19 -05:00
Michael Yang
cb42589792
adjust download/upload parts
2023-11-19 14:32:19 -05:00
Jeffrey Morgan
258addc799
fix comment in progress.go
2023-11-19 13:46:19 -05:00
Jeffrey Morgan
c06b9b7304
update progress rendering to be closer to v0.1.10
2023-11-19 13:43:21 -05:00
Jeffrey Morgan
95b9acd324
improve pull percentage rendering
2023-11-19 11:00:43 -05:00
Jeffrey Morgan
04cbf5ccc0
progress bar styling improvements
2023-11-19 09:54:33 -05:00
Jeffrey Morgan
e1d7056496
update progress statuses
2023-11-19 09:21:13 -05:00
Jeffrey Morgan
02524a56ff
check retry for authorization error
2023-11-19 00:19:53 -05:00
Jeffrey Morgan
1657c6abc7
add note to specify JSON in the prompt when using JSON mode
2023-11-18 22:59:26 -05:00
Jeffrey Morgan
12e046f12a
remove unused function
2023-11-18 22:16:51 -05:00
Jeffrey Morgan
36a3bbf65f
Update llm/llama.go
2023-11-18 21:25:07 -05:00
Bruce MacDonald
43a726149d
fix potentially inaccurate error message
2023-11-18 21:25:07 -05:00
Jeffrey Morgan
984714f131
update status text when transfering blob on ollama create
2023-11-18 09:40:10 -05:00
Jeffrey Morgan
bab9494176
add - separator to temp file created on ollama create
2023-11-18 09:39:52 -05:00
Jeffrey Morgan
85e4441c6a
cache docker builds
2023-11-18 08:51:38 -05:00
Michael Yang
42e43736a4
Merge pull request #1186 from jmorganca/mxyng/copy-blob
...
fix cross device rename
2023-11-17 21:54:53 -08:00
Michael Yang
c6e6c8ee7e
fix cross device rename
2023-11-17 15:22:17 -08:00
Jeffrey Morgan
a185b29719
fix install script error on linux
2023-11-17 18:00:41 -05:00
Michael Yang
dc84b20d6b
Merge pull request #1104 from jmorganca/mxyng/jupyter
...
add jupyter notebook example
2023-11-17 14:46:26 -08:00
Michael Yang
ad8659b980
Merge pull request #1161 from jmorganca/mxyng/systemd-placeholder
...
placeholder environment variables
2023-11-17 14:45:38 -08:00
Michael Yang
c1bbf5ddee
Merge pull request #1134 from jmorganca/mxyng/progress
...
progress bar
2023-11-17 14:03:35 -08:00
Bruce MacDonald
0b19e24d81
only retry once on auth failure ( #1175 )
2023-11-17 14:22:35 -05:00
Michael Yang
3cb07d2773
simplify StopAndClear
2023-11-17 10:26:22 -08:00
Michael Yang
976068369b
stop all spinners on progress stop
2023-11-17 10:06:19 -08:00
Michael Yang
4d677ee389
no divide by zero
2023-11-17 10:06:19 -08:00
Michael Yang
7ea905871a
only move cursor up if pos > 0
2023-11-17 10:06:19 -08:00
Michael Yang
d6ecaa2cbf
update progress responses
2023-11-17 10:06:19 -08:00
Michael Yang
4dcf7a59b1
generate progress
2023-11-17 10:06:19 -08:00
Michael Yang
1c0e092ead
progress cmd
2023-11-17 10:06:19 -08:00
Michael Yang
c4a3ccd7ac
progress
2023-11-17 10:06:19 -08:00
Michael Yang
9f04e5a8ea
format bytes
2023-11-17 10:06:19 -08:00
Michael Yang
f91bb2f7f0
remove progressbar
2023-11-17 10:06:19 -08:00
Michael Yang
0813387414
Merge pull request #1177 from jmorganca/mxyng/faq
...
faq: fix heading and add more details
2023-11-17 10:05:21 -08:00
Michael Yang
4936b5bb37
add jupyter readme
2023-11-17 10:04:52 -08:00
tusharhero
786288829e
Make Archlinux a sub-heading of Linux.
2023-11-17 23:17:36 +05:30
tusharhero
72dcc952b6
Add Installation instructions for Archlinux
...
Pacman is the recommended installation method. And the package is in
the official repository, so makes sense to mention it in the README.
2023-11-17 23:13:40 +05:30
Michael Yang
f7f6d6c693
Update examples/jupyter-notebook/ollama.ipynb
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-11-17 09:30:07 -08:00
Michael Yang
a3053b66d2
add jupyter notebook example
2023-11-17 09:30:07 -08:00
Michael Yang
c82ead4d01
faq: fix heading and add more details
2023-11-17 09:02:17 -08:00
Michael Yang
90860b6a7e
update faq ( #1176 )
2023-11-17 11:42:58 -05:00
Jeffrey Morgan
81092147c4
remove unnecessary -X POST from example curl commands
2023-11-17 09:50:38 -05:00
Jeffrey Morgan
92656a74b7
Use llama2 as the model in api.md
2023-11-17 07:17:51 -05:00
Jeffrey Morgan
41434a7cdc
build intel mac with correct binary and compile flags
2023-11-16 22:14:51 -05:00
Michael Yang
71687ab809
Merge pull request #1164 from jmorganca/mxyng/faq
...
update faq
2023-11-16 17:20:18 -08:00
Michael Yang
d8842b4d4b
update faq
2023-11-16 17:07:36 -08:00
Michael Yang
32add8577d
placeholder environment variables
2023-11-16 16:57:39 -08:00
Michael Yang
585f9c01fa
Merge pull request #1160 from jmorganca/mxyng/faq
...
update faq
2023-11-16 16:48:51 -08:00
Michael Yang
c13bde962d
Update docs/faq.md
...
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2023-11-16 16:48:38 -08:00
Michael Yang
ee307937fd
update faq
2023-11-16 16:46:43 -08:00
Matt Williams
ab6639bc47
Merge pull request #1074 from jmorganca/mattw/loganalysisexample
...
Log Analysis Example
2023-11-16 16:33:07 -08:00
Matt Williams
fefae84c06
example: function calling
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-11-16 16:26:29 -08:00
Jeffrey Morgan
dbe6e77472
Update README.md
2023-11-16 16:46:38 -05:00
Bruce MacDonald
4b3f4bc7d9
return failure details when unauthorized to push ( #1131 )
...
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2023-11-16 16:44:18 -05:00
Michael Yang
a5ccf742c1
fix cross repo mounts
2023-11-16 16:33:30 -05:00
Michael Yang
e33ef391cd
fix push scope error for inherited model
2023-11-16 16:33:30 -05:00
yanndegat
75295b9528
install: fix enable contrib on debian 12 ( #1151 )
...
On debian 12, sources definitions have moved from
/etc/apt/sources.list to /etc/apt/sources.list.d/debian.sources
2023-11-16 15:53:06 -05:00
Matt Williams
db5ef3004c
Merge pull request #1079 from jmorganca/mattw/jsonexample
...
Add example using JSON format output
2023-11-16 09:13:34 -08:00
Michael Yang
b5f158f046
add faq for proxies ( #1147 )
2023-11-16 11:43:37 -05:00
Piero Savastano
30141b42e9
Add Cheshire Cat to community integrations ( #1124 )
2023-11-16 11:30:54 -05:00
Dane Madsen
5f301ece1d
Add Maid to Community Integrations ( #1120 )
2023-11-16 11:27:53 -05:00
Michael Yang
77954bea0e
Merge pull request #898 from jmorganca/mxyng/build-context
...
create remote models
2023-11-15 16:41:12 -08:00
Michael Yang
54f92f01cb
update docs
2023-11-15 15:28:15 -08:00
Michael
30ae6e731e
Update randomaddresses.py
2023-11-15 18:24:50 -05:00
Michael
b28a30f7ba
Update examples/python-json-datagenerator/predefinedschema.py
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-11-15 18:23:36 -05:00
Jeffrey Morgan
ecd71347ab
Update faq.md
2023-11-15 18:17:13 -05:00
Jeffrey Morgan
8ee4cbea0f
Remove table of contents in faq.md
2023-11-15 18:16:27 -05:00
Michael Yang
652d90e1c7
Update server/images.go
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-11-15 15:16:23 -08:00
Michael Yang
bc22d5a38b
no blob response
2023-11-15 15:16:23 -08:00
Michael Yang
71d71d0988
update docs
2023-11-15 15:16:23 -08:00
Michael Yang
1901044b07
use checksum reference
2023-11-15 15:16:23 -08:00
Michael Yang
d660eebf22
fix create from model tag
2023-11-15 15:16:23 -08:00
Michael Yang
cac11c9137
update api docs
2023-11-15 15:16:23 -08:00
Michael Yang
a07c935d34
ignore non blobs
2023-11-15 15:16:23 -08:00
Michael Yang
1552cee59f
client create modelfile
2023-11-15 15:16:23 -08:00
Michael Yang
3ca56b5ada
add create modelfile field
2023-11-15 15:16:23 -08:00
Michael Yang
b0d14ed51c
refactor create model
2023-11-15 15:16:23 -08:00
Matt Williams
f61f340279
FAQ: answer a few faq questions ( #1128 )
...
* faq: does ollama share my prompts
Signed-off-by: Matt Williams <m@technovangelist.com >
* faq: ollama and openai
Signed-off-by: Matt Williams <m@technovangelist.com >
* faq: vscode plugins
Signed-off-by: Matt Williams <m@technovangelist.com >
* faq: send a doc to Ollama
Signed-off-by: Matt Williams <m@technovangelist.com >
* extra spacing
Signed-off-by: Matt Williams <m@technovangelist.com >
* Update faq.md
* Update faq.md
---------
Signed-off-by: Matt Williams <m@technovangelist.com >
Co-authored-by: Michael <mchiang0610@users.noreply.github.com >
2023-11-15 18:05:13 -05:00
Dane Madsen
779e196ef6
Merge branch 'jmorganca:main' into main
2023-11-15 21:38:07 +10:00
Matt Williams
47ffb81db7
Update examples/python-json-datagenerator/readme.md
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-11-14 10:33:34 -08:00
Matt Williams
69795d2db0
Update examples/python-json-datagenerator/readme.md
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-11-14 10:33:16 -08:00
Matt Williams
acde0819d9
Update examples/python-json-datagenerator/randomaddresses.py
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-11-14 10:33:02 -08:00
Matt Williams
f748331aa3
Update examples/python-json-datagenerator/predefinedschema.py
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-11-14 10:32:45 -08:00
Matt Williams
f4edc302a8
Update examples/python-loganalysis/readme.md
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-11-14 10:31:22 -08:00
Matt Williams
64b7e0c218
Update examples/python-loganalysis/loganalysis.py
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-11-14 10:31:05 -08:00
Matt Williams
eced0d52ab
Update examples/python-loganalysis/loganalysis.py
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-11-14 10:30:30 -08:00
Matt Williams
96bf9cafa7
Update examples/python-loganalysis/loganalysis.py
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-11-14 10:30:17 -08:00
Dane Madsen
c1a5220860
Update README.md
2023-11-14 15:31:31 +10:00
Dane Madsen
3b15175a70
Add maid to community integrations
2023-11-14 15:30:03 +10:00
Matt Williams
b6817a83d8
Add gif and finish readme
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-11-10 16:41:48 -06:00
Matt Williams
73f3448ede
add example showing use of JSON format
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-11-10 16:33:56 -06:00
Matt Williams
e4f59ba073
better streaming plus gif
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-11-10 08:55:17 -06:00
Matt Williams
5de568bffe
Add a simple log analysis example
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-11-10 08:28:52 -06:00
Michael Yang
b99c291f47
fly example
2023-11-01 14:58:20 -07:00