Bruce MacDonald
5f62064e2f
examples
2025-03-25 09:33:17 -07:00
Bruce MacDonald
e3f3043f5b
Update add-a-model.md
2025-02-25 14:59:39 -08:00
Bruce MacDonald
b5fc84c930
rename doc
2025-02-21 09:32:26 -08:00
Bruce MacDonald
827b6b5d16
Update docs/implement.md
...
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2025-02-21 09:27:32 -08:00
Bruce MacDonald
0d15036d82
docs: add basic steps to implement a new model
...
Add detailed guide for implementing new models in Ollama's Go inference engine.
The guide walks through the full process from initial setup to deployment, including architecture overview,
file structure, conversion process, and testing requirements. This will help new contributors understand how to add models to Ollama.
2025-02-19 11:17:33 -08:00
Jeffrey Morgan
d2eb226c91
llama: add patch to fix ggml backend reg on Linux with utf-8 characters in the path ( #9159 )
2025-02-18 22:46:17 -05:00
Michael Yang
e13e7c8d94
Merge pull request #9079 from jeremyschlatter/main
...
cmd: fix flickering in progress bar
2025-02-18 22:59:29 +00:00
Jeremy Schlatter
78f403ff45
address code review comments
2025-02-18 14:50:09 -08:00
Michael Yang
08a299e1d0
cmake: avoid building intel backends on linux
2025-02-18 22:17:00 +00:00
Michael Yang
7b5d916a9a
ci: set owner/group in tarball
...
set owner and group when building the linux tarball so extracted files
are consistent. this is the behaviour of release tarballs in version
0.5.7 and lower
2025-02-18 20:11:09 +00:00
benhaotang
33ad61b112
Add OpenDeepResearcher-via-searxng to Community Integrations ( #9138 )
2025-02-18 11:39:11 -08:00
L. Jiang
716e365615
test: add test cases for HumanNumber ( #9108 )
2025-02-18 11:35:26 -08:00
innightwolfsleep
3b4424ff98
readme: add LLM Telegram Bot to community integrations ( #9150 )
2025-02-18 10:04:30 -05:00
Jeremy Schlatter
f9c7ead160
cmd: eliminate flickering with synchronized output
2025-02-17 20:01:03 -08:00
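A minimal Go sketch of what "synchronized output" usually means for terminal UIs. It assumes the commonly supported DEC private mode 2026 markers (`ESC[?2026h` to begin, `ESC[?2026l` to end). The commit title does not show which sequences Ollama emits, and `synchronized` is a hypothetical helper. Terminals that support the mode hold rendering between the markers and present the frame all at once. Terminals that don't typically ignore the unknown mode.

```go
package main

import "fmt"

// Synchronized output markers (DEC private mode 2026). This is an assumption
// about the mechanism the commit title refers to.
const (
	beginSyncUpdate = "\x1b[?2026h"
	endSyncUpdate   = "\x1b[?2026l"
)

// synchronized wraps one complete frame so a supporting terminal
// presents it atomically instead of drawing it piece by piece.
func synchronized(frame string) string {
	return beginSyncUpdate + frame + endSyncUpdate
}

func main() {
	// "\r" returns to column 0; "\x1b[K" erases any leftover characters.
	fmt.Print(synchronized("\rdownloading 42%\x1b[K"))
	fmt.Println()
}
```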
Jeremy Schlatter
5930aaeb1a
cmd: fix cursor flickering in progress bar
...
The previous commit fixed flickering in the progress bar itself. Cursor
flickering is harder to address.
Cursor flickering could be fixed by hiding the cursor altogether while
the progress bar is displayed. The downside of this is that if the
program is killed in such a way that it can't clean up its state, it
would leave the cursor invisible.
Instead, this commit introduces an output buffer. All of the escape
codes and content for a single progress update are written to a buffer,
which is then flushed to the terminal all at once. This significantly
decreases the time during which the terminal has seen the cursor-hiding
code but has not yet seen the cursor-showing code, thus minimizing (but
not 100% eliminating) cursor flickering.
For more context, see:
https://gitlab.gnome.org/GNOME/vte/-/issues/2837#note_2269501
2025-02-17 14:56:57 -08:00
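A minimal Go sketch of the approach described in faf67db089 and 5930aaeb1a. It assumes the standard escape sequences `ESC[?25l`/`ESC[?25h` (hide/show cursor), `ESC[nA` (cursor up) and `ESC[K` (erase to end of line). `renderFrame` is a hypothetical name, not Ollama's progress code. Each line overwrites the old content and erases only its trailing remainder. The whole update, escape codes included, is assembled in a `bytes.Buffer` and handed to the terminal in a single write.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// Standard ANSI/DEC sequences (assumed; not taken from Ollama's source).
const (
	hideCursor = "\x1b[?25l"
	showCursor = "\x1b[?25h"
	eraseToEOL = "\x1b[K"
)

// renderFrame assembles one complete progress update in memory.
// prevLines is the number of lines drawn by the previous frame.
func renderFrame(lines []string, prevLines int) []byte {
	var buf bytes.Buffer
	buf.WriteString(hideCursor)
	if prevLines > 1 {
		// Move back up to the first line of the previous frame.
		// (ESC[0A would move one line on many terminals, hence the guard.)
		fmt.Fprintf(&buf, "\x1b[%dA", prevLines-1)
	}
	buf.WriteString("\r")
	for i, line := range lines {
		if i > 0 {
			buf.WriteString("\n")
		}
		buf.WriteString(line)       // overwrite old content in place
		buf.WriteString(eraseToEOL) // clear leftovers if the old line was longer
	}
	buf.WriteString(showCursor)
	return buf.Bytes()
}

func main() {
	frame := renderFrame([]string{"pulling manifest", "downloading 42%"}, 2)
	// One Write call: hide, content, and show reach the terminal together.
	os.Stdout.Write(frame)
	fmt.Println()
}
```

A single `Write` shortens the window between hiding and re-showing the cursor, but it does not force the terminal to render the frame atomically. This matches the commit's note that cursor flickering is minimized but not fully eliminated.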
Jeremy Schlatter
faf67db089
cmd: fix progress bar flickering
...
Previous code cleared the display before writing new content, creating a
window where the terminal could (and in some cases did) render empty lines.
Instead, we now write new content over the old content, only clearing
the trailing end of lines for cases where the new line is shorter.
Fixes #1664
2025-02-17 13:39:02 -08:00
James-William-Kincaid-III
0667baddc6
docs: fix incorrect shortcut key in windows.md ( #9098 )
2025-02-15 15:38:24 -05:00
Bruce MacDonald
d006e1e09b
model: document high-level model interface ( #9122 )
2025-02-14 16:01:00 -08:00
Daniel Hiltgen
df2680b4b9
Wire up system info log for new engine ( #9123 )
2025-02-14 15:55:33 -08:00
Jesse Gross
010313bb63
llamarunner: Init GGML before printing system info
...
We currently print system info before the GGML backends are loaded.
This results in only getting information about the default lowest
common denominator runner. If we move up the GGML init then we can
see what we are actually running.
Before:
time=2025-02-14T11:15:07.606-08:00 level=INFO source=runner.go:935 msg=system info="CPU : LLAMAFILE = 1 | CPU : LLAMAFILE = 1 | cgo(gcc)" threads=24
After:
time=2025-02-14T11:16:02.936-08:00 level=INFO source=runner.go:935 msg=system info="CPU : LLAMAFILE = 1 | CPU : LLAMAFILE = 1 | CUDA : ARCHS = 890 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | LLAMAFILE = 1 | cgo(gcc)" threads=24
2025-02-14 11:41:53 -08:00
Jeffrey Morgan
5296f487a8
llm: attempt to evaluate symlinks, but do not fail ( #9089 )
...
provides a better approach to #9088 that will attempt to
evaluate symlinks (important for macOS where 'ollama' is
often a symlink), but use the result of os.Executable()
as a fallback in scenarios where filepath.EvalSymlinks
fails due to permission errors or other issues
2025-02-13 22:37:59 -08:00
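A minimal sketch of the fallback described in 5296f487a8, using only `os.Executable` and `filepath.EvalSymlinks` from the Go standard library. `resolveExecutable` is a hypothetical helper name, not the function in Ollama's `llm` package.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveExecutable tries to resolve symlinks in exe (on macOS the ollama
// binary is often a symlink) but falls back to the unresolved path when
// evaluation fails, e.g. because of permission errors.
func resolveExecutable(exe string) string {
	if resolved, err := filepath.EvalSymlinks(exe); err == nil {
		return resolved
	}
	return exe
}

func main() {
	exe, err := os.Executable()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot locate executable:", err)
		os.Exit(1)
	}
	fmt.Println(resolveExecutable(exe))
}
```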
Jeffrey Morgan
f05774b04c
llm: do not evaluate symlink for exe path lookup ( #9088 )
...
In some cases, the directories in the executable path read by
filepath.EvalSymlinks are not accessible, resulting in permission
errors that cause running models to fail. It also doesn't work
well with long paths on Windows, which likewise results in
errors. This change stops calling filepath.EvalSymlinks on the
result of os.Executable() altogether.
2025-02-13 22:13:00 -08:00
Jeffrey Morgan
6600bd7d91
ml/backend/ggml: stable sort devices by score ( #9081 )
2025-02-13 18:42:36 -08:00
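The commit title does not say how device scores are computed. The sketch below only shows why a *stable* sort matters here: devices with equal scores keep their original (discovery) order, so the selection is deterministic. The `device` type and the scores are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// device is a hypothetical stand-in for a backend device description.
type device struct {
	name  string
	score int
}

// sortByScore orders devices from highest to lowest score. sort.SliceStable
// keeps devices with equal scores in their original order.
func sortByScore(devs []device) {
	sort.SliceStable(devs, func(i, j int) bool {
		return devs[i].score > devs[j].score
	})
}

func main() {
	devs := []device{{"cpu", 0}, {"gpu0", 10}, {"gpu1", 10}}
	sortByScore(devs)
	fmt.Println(devs)
}
```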
Jesse Gross
ed443a0393
Runner for Ollama engine
...
This provides integration with the new Ollama engine
(5824541 next ollama runner (#7913 )) and the rest of the Ollama
infrastructure such as the runner and Ollama server.
In addition, it also builds out the KV cache infrastructure to
support requirements of how Ollama runs models such as:
- Parallel processing
- Memory management for defragmentation and shifting
- Multi-modal models
Both old and new engines continue to be supported. By default, only
the old engine is used. To enable the new engine:
Start the server with the OLLAMA_NEW_ENGINE environment variable set:
OLLAMA_NEW_ENGINE=1 ./ollama serve
Start a model that is supported by the Ollama engine. This one is Llama 3.1 8b Q4_K_M:
./ollama run jessegross/llama3.1
2025-02-13 17:09:26 -08:00
Jesse Gross
6945617af5
models: Move model into their own directory
...
This allows there to be a file that is a list of models that is
not mixed into the runner code.
2025-02-13 17:09:26 -08:00
Jesse Gross
7916f55009
vocab: Use int32 for special tokens
...
Special tokens are currently read as uint32 from the model metadata.
However, all other parts of the system (including the tokenizer) use
int32 to represent tokens so it is impossible to represent the high
portion of the unsigned range. For consistency and to avoid casts,
we should just use int32 everywhere.
2025-02-13 17:09:26 -08:00
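To illustrate the representability problem in 7916f55009: a token ID stored as `uint32` can hold values that `int32` cannot. After this commit the value is simply read as `int32`. The range-checked conversion below is a hypothetical helper that shows what a uint32 reader would otherwise need at every boundary.

```go
package main

import (
	"fmt"
	"math"
)

// toTokenID converts a uint32 token ID into the int32 used elsewhere,
// rejecting values in the upper half of the uint32 range.
func toTokenID(v uint32) (int32, error) {
	if v > math.MaxInt32 {
		return 0, fmt.Errorf("token id %d does not fit in int32", v)
	}
	return int32(v), nil
}

func main() {
	id, err := toTokenID(128001)
	fmt.Println(id, err)
	_, err = toTokenID(math.MaxUint32)
	fmt.Println(err)
}
```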
Jesse Gross
d650ad398f
model: Load tensors behind an interface
...
Currently, if a model uses an interface for its data structures (as mllama
does) then the tensor data in the structs implementing that interface will
not get loaded.
2025-02-13 17:09:26 -08:00
Jesse Gross
d223f3b697
ggml-backend: Close on nil should be a no-op
2025-02-13 17:09:26 -08:00
Jesse Gross
60830695c2
ggml-backend: Ensure data is available after async computation
...
We need to sync before retrieving data after async computation.
It is also important to ensure that the Go buffer is not moved by
the GC across function calls so we do a synchronous copy.
2025-02-13 17:09:26 -08:00
Jesse Gross
01d9a46854
ggml-backend: Let GGML allocate context memory
...
Passing in a Go buffer is not safe because the garbage collector could
free or move the memory while the context is still open. However, if
we pass in the size and a nil pointer then GGML will allocate it from
the C side.
2025-02-13 17:09:26 -08:00
Jesse Gross
d773b7d671
backend: API to support full precision matmul
...
Most tensor backends try to optimize performance by using a lower
precision for matmuls. However, some operations (such as kq) on
some models are sensitive to this and require full precision.
2025-02-13 17:09:26 -08:00
Jesse Gross
4d4463b2bd
backend: Support graph computation that does not return an output
...
There are two cases where we may not have an output after computing:
- Prompt processing where the length of the input exceeds the batch
size
- Internal memory management operations such as cache defrag and shift
2025-02-13 17:09:26 -08:00
Jesse Gross
0e38297f87
backend: Consistently use int (vs. int64) for tensor shapes
...
Currently there is a mixture of int and int64 used when dealing with
tensor dimensions and shapes, which causes unnecessary conversions -
they all should be the same type.
In general, most interfaces (such as PyTorch) use int64 for
generality, but most implementations (such as CUDA) use int32 for
performance. There isn't much benefit to us in being more flexible
than the implementations we are likely to run on.
In addition, as a practical matter, a model with a tensor with a single
dimension larger than 32 bits is unlikely to run on a 32-bit machine.
2025-02-13 17:09:26 -08:00
Jesse Gross
7e13f568dc
backend: Don't return an error on Close
...
It is not common to return errors from close/free operations: most
people won't check them, and even if they did, there's probably not much
they could do. It's better not to give implementations false expectations.
2025-02-13 17:09:26 -08:00
Michael Yang
58245413f4
next ollama runner ( #7913 )
...
feat: add new Ollama engine using ggml through cgo
This change introduces a new way to run pretrained models. It introduces 3 high level interfaces and a bunch of smaller helper interfaces to facilitate this.
- `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go`
- `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`
- `ml.Tensor` defines the interface for a tensor and tensor operations
This is the first implementation of the new engine. Follow up PRs will implement more features:
- non-greedy sampling (#8410 )
- integration with Ollama and KV caching (#8301 )
- more model support (#9080 ) with more coming soon
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2025-02-13 16:31:21 -08:00
Bùi Đức Nhật
8cf16063a5
docs: add ollamazing to the README.md ( #9075 )
2025-02-13 10:47:09 -08:00
frob
3a4449e2f1
docs: add H200 as supported device. ( #9076 )
...
Co-authored-by: Richard Lyons <frob@cloudstaff.com >
2025-02-13 10:44:23 -08:00
Anuraag (Rag) Agrawal
10d59d5f90
openai: finish_reason as tool_calls for streaming with tools ( #7963 )
2025-02-13 10:20:12 -08:00
Jeffrey Morgan
a4f69a0191
build: add -DGGML_CUDA_NO_PEER_COPY=ON for rocm builds on windows ( #9060 )
2025-02-13 00:23:17 -08:00
Clinton
82658c3eec
readme: add Homebrew to package managers section ( #9052 )
2025-02-12 11:17:39 -08:00
bloominstrong
378d6e1e6a
docs: fix nix package link ( #9045 )
...
removing the channel tag from the url so it will always go to the current stable channel.
2025-02-12 09:16:26 -08:00
Hugues Chocart
afa55bc70c
doc: fix link for Abso ( #9043 )
2025-02-12 09:15:08 -08:00
Michael Yang
49df03da9a
fix: harden backend loading ( #9024 )
...
* wrap ggml_backend_load_best in try/catch
* ignore non-ollama paths
2025-02-11 15:36:53 -08:00
Hugues Chocart
0189bdd0b7
readme: add Abso SDK to community integrations ( #8973 )
2025-02-11 00:14:45 -08:00
Jeffrey Morgan
f4711da7bd
ml/backend/ggml: fix crash on dlopen for non-AVX systems ( #8976 )
2025-02-10 09:52:12 -08:00
Hugues Chocart
38117fba83
readme: add Lunary to observability community integrations ( #8975 )
2025-02-09 22:08:46 -08:00
Michael Yang
1f766c36fb
ci: use windows-2022 to sign and bundle ( #8941 )
...
ollama requires vcruntime140_1.dll which isn't found on 2019. previously
the job used the windows runner (2019) but it explicitly installs
2022 to build the app. since the sign job doesn't actually build
anything, it can use the windows-2022 runner instead.
2025-02-08 13:07:00 -08:00
Qusai Ismael
484a99e428
docs: add LocalLLM app to community integrations ( #8953 )
2025-02-08 12:28:01 -08:00
DravenK
ec6121c331
docs: ollama zig community lib ( #8688 )
2025-02-08 11:10:47 -08:00
Jeffrey Morgan
b86c0a1500
docs: link directly to latest release page for tdm-gcc ( #8939 )
2025-02-08 00:21:10 -08:00
Guddu Kumar
7e402ebb8c
readme: add deepseek to supported models
2025-02-07 11:28:28 -08:00
Azis Alvriyanto
b901a712c6
docs: improve syntax highlighting in code blocks ( #8854 )
2025-02-07 09:55:07 -08:00
Michael Yang
abb8dd57f8
add gfx instinct gpus ( #8933 )
2025-02-07 09:51:22 -08:00
Leisure Linux
a400df48c0
docs: include port in faq.md OLLAMA_HOST examples ( #8905 )
2025-02-06 18:45:09 -08:00
annilq
6ab4ba4c26
readme: add React Native client to community integrations ( #8877 )
2025-02-06 17:15:48 -08:00
CosmicEventHorizon
e8d4eb3e68
readme: add ChibiChat to community integrations ( #8883 )
2025-02-06 16:08:46 -08:00
Michael Yang
ae7e368f75
build(rocm): add numa, elf ( #8900 )
2025-02-06 15:46:30 -08:00
oslook
31acd1ebf9
readme: add Ollama Chat WebUI for Docker to community integrations ( #8084 )
2025-02-06 15:41:02 -08:00
Michael Yang
9a4757ae66
build(rocm): add tinfo ( #8899 )
2025-02-06 15:08:12 -08:00
Abhinav Pant
7814019708
docs: add step for removing libraries in linux.md ( #8897 )
2025-02-06 14:54:58 -08:00
Michael Yang
b698f9a0d8
build: add missing dependencies ( #8896 )
2025-02-06 13:12:16 -08:00
Azis Alvriyanto
32285a6d19
format: rename test file from byte_test.go to bytes_test.go ( #8865 )
2025-02-06 13:06:15 -08:00
Michael Yang
1c198977ec
ci: fix linux archive ( #8862 )
...
the find returns intermediate directories which pulls the parent
directories. it also omits files under lib/ollama.
switch back to globbing
2025-02-05 19:45:58 -08:00
zyphixor
330b6c50b0
readme: add simple-discord-ai to community integrations ( #8659 )
2025-02-05 18:35:04 -08:00
Diego Pereira
928911bc68
runner: avoid buffer overwrite when generating multiple embeddings ( #8714 )
...
Shield the code processing the embedding result
from subsequent calls that may overwrite the same
buffer to process a second input when retrieving
model embeddings.
2025-02-05 16:53:33 -08:00
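A minimal Go sketch of the aliasing bug fixed in 928911bc68, assuming (hypothetically) a model wrapper that reuses one output buffer across calls. Returning the shared slice means a second embedding call overwrites the first result. Copying each result before the next call shields it.

```go
package main

import "fmt"

// embedder is a hypothetical wrapper that reuses one output buffer
// across calls, as a C-side inference library might.
type embedder struct {
	buf []float32
}

// embed fills the shared buffer and returns a slice that aliases it.
func (e *embedder) embed(seed float32) []float32 {
	for i := range e.buf {
		e.buf[i] = seed + float32(i)
	}
	return e.buf
}

// embedAll copies each result before the next call can overwrite it.
func embedAll(e *embedder, seeds []float32) [][]float32 {
	out := make([][]float32, 0, len(seeds))
	for _, s := range seeds {
		v := e.embed(s)
		out = append(out, append([]float32(nil), v...)) // defensive copy
	}
	return out
}

func main() {
	e := &embedder{buf: make([]float32, 3)}
	fmt.Println(embedAll(e, []float32{0, 10}))
}
```

Without the copy, both entries would alias `e.buf` and show the values from the last call.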
Michael Yang
5b446cc815
chore: update gitattributes ( #8860 )
...
* chore: update gitattributes
* chore: add build info source
2025-02-05 16:37:18 -08:00
Daniel Lok
451c1596af
readme: add MLflow Tracing as an observability integration ( #8811 )
2025-02-05 16:04:24 -08:00
Michael Yang
932bded12f
chore: add optional field for server logs
2025-02-05 15:55:32 -08:00
Michael Yang
070ad913ac
ci: fix linux archive
2025-02-05 15:08:02 -08:00
Azis Alvriyanto
8d8b9f83ae
format: byte formatting test coverage ( #8692 )
...
Removed redundant checks and streamlined the switch-case structure.
Added test cases for both HumanBytes and HumanBytes2 to cover a wide range of scenarios.
2025-02-05 12:23:07 -08:00
Jeffrey Morgan
f00d359a67
docs: add section in development.md on library detection ( #8855 )
2025-02-05 11:16:27 -08:00
Yashwanth A
291def6adb
server: increase timeout in stall detection from 5s to 30s ( #8831 )
...
In some cases, downloads slow down due to disk I/O or other factors,
causing a download part to restart. This makes the download appear
to "reverse" in percent completion. Increasing the timeout to 30s
should make this happen less frequently.
2025-02-05 10:00:26 -08:00
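A minimal sketch of stall detection with the 30s timeout from 291def6adb. The `partProgress` type and how progress is recorded are hypothetical. The commit only states the timeout change and that a stalled part is restarted. Only forward progress resets the clock, so a part that keeps moving slowly is not treated as stalled.

```go
package main

import (
	"fmt"
	"time"
)

// stallTimeout mirrors the 30s value from the commit message.
const stallTimeout = 30 * time.Second

// partProgress tracks when a download part last made progress.
type partProgress struct {
	completed    int64
	lastProgress time.Time
}

// update records completed bytes; only forward progress resets the clock.
func (p *partProgress) update(completed int64, now time.Time) {
	if completed > p.completed {
		p.completed = completed
		p.lastProgress = now
	}
}

// stalled reports whether no progress was seen within stallTimeout.
func (p *partProgress) stalled(now time.Time) bool {
	return now.Sub(p.lastProgress) > stallTimeout
}

func main() {
	start := time.Now()
	p := &partProgress{lastProgress: start}
	p.update(1<<20, start.Add(2*time.Second))
	fmt.Println(p.stalled(start.Add(10 * time.Second))) // slow, not stalled
	fmt.Println(p.stalled(start.Add(40 * time.Second))) // stalled
}
```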
Jeffrey Morgan
cd3fbf1c49
llama: use dynamic backend loading for mllama and clip ( #8835 )
2025-02-05 09:46:56 -08:00
Jeffrey Morgan
c852b8e021
server: always print upload/download part info ( #8832 )
2025-02-04 19:30:49 -08:00
William
d8932c55e7
server: fix out of bounds exception on model download ( #8746 )
2025-02-04 18:52:47 -08:00
Michael Yang
63f0269f7f
ci: split docker build by platform
...
this improves build reliability and concurrency
2025-02-04 17:04:27 -08:00
Jeffrey Morgan
4759ecae19
ml/backend/ggml: fix library loading on macOS amd64 ( #8827 )
2025-02-04 15:05:39 -08:00
Michael Yang
65b7ecac7b
fix extra quote
2025-02-04 08:35:30 -08:00
Michael Yang
f9d2d89135
fix linux archive
2025-02-03 16:12:33 -08:00
Michael Yang
669dc31cf3
fix build
2025-02-03 15:10:51 -08:00
Tilman Griesel
d4d338c224
readme: add Chipper to community integrations ( #8803 )
2025-02-03 14:18:19 -08:00
Melroy van den Berg
bfdeffc375
docs: use OLLAMA_VERSION=0.5.7 for install version override ( #8802 )
2025-02-03 13:54:08 -08:00
Michael Yang
e806184023
fix release workflow
2025-02-03 13:19:57 -08:00
Jeffrey Morgan
50566113ac
llm: do not error if LibOllamaPath does not exist ( #8801 )
2025-02-03 12:27:48 -08:00
Davide Bertoni
ad22ace439
docs: add missing json and shell code blocks in api.md ( #8766 )
2025-02-02 13:12:55 -08:00
Anıl Kaynar
f4321a421c
readme: add MinimalNextOllamaChat to community integrations ( #8767 )
2025-02-02 12:56:10 -08:00
Michael Yang
475333d533
fix docker build-args
...
env context is not accessible from job.*.strategy. since it's in the
environment, just tell docker to use the environment variable[1]
[1]: https://docs.docker.com/reference/cli/docker/buildx/build/#build-arg
2025-01-31 14:56:02 -08:00
Michael Yang
39fd89308c
build: set CFLAGS=-O3 specifically for cpu.go
2025-01-31 10:25:39 -08:00
Michael Yang
548a9f56a6
Revert "cgo: use O3"
...
This reverts commit bea1f1fac6 .
2025-01-31 10:25:39 -08:00
Michael Yang
3f0cb36bdb
build: set goflags in linux release
2025-01-30 13:07:32 -08:00
Michael Yang
bea1f1fac6
cgo: use O3
2025-01-30 12:21:50 -08:00
Jeffrey Morgan
5d75d837ef
discover: fix default LibOllamaPath value ( #8702 )
2025-01-30 12:21:38 -08:00
Parth Sareen
711648c9bb
docs: update api.md with streaming with tools is enabled ( #8676 )
2025-01-29 15:14:30 -08:00
Michael Yang
dcfb7a105c
next build ( #8539 )
...
* add build to .dockerignore
* test: only build one arch
* add build to .gitignore
* fix ccache path
* filter amdgpu targets
* only filter if autodetecting
* Don't clobber gpu list for default runner
This ensures the GPU specific environment variables are set properly
* explicitly set CXX compiler for HIP
* Update build_windows.ps1
This isn't complete, but is close. Dependencies are missing, and it only builds the "default" preset.
* build: add ollama subdir
* add .git to .dockerignore
* docs: update development.md
* update build_darwin.sh
* remove unused scripts
* llm: add cwd and build/lib/ollama to library paths
* default DYLD_LIBRARY_PATH to LD_LIBRARY_PATH in runner on macOS
* add additional cmake output vars for msvc
* interim edits to make server detection logic work with dll directories like lib/ollama/cuda_v12
* remove unnecessary filepath.Dir, cleanup
* add hardware-specific directory to path
* use absolute server path
* build: linux arm
* cmake install targets
* remove unused files
* ml: visit each library path once
* build: skip cpu variants on arm
* build: install cpu targets
* build: fix workflow
* shorter names
* fix rocblas install
* docs: clean up development.md
* consistent build dir removal in development.md
* silence -Wimplicit-function-declaration build warnings in ggml-cpu
* update readme
* update development readme
* llm: update library lookup logic now that there is one runner (#8587 )
* tweak development.md
* update docs
* add windows cuda/rocm tests
---------
Co-authored-by: jmorganca <jmorganca@gmail.com >
Co-authored-by: Daniel Hiltgen <daniel@ollama.com >
2025-01-29 15:03:38 -08:00
Xiaofu Huang
2ef3c803a1
readme: add AI Toolkit for VSCode to community integrations ( #8604 )
2025-01-27 00:36:23 -08:00
Matěj Štágl
453e4d090b
readme: add LlmTornado to community integrations ( #8551 )
2025-01-25 01:04:07 -08:00
Daniel Jalkut
ca2f9843c8
docs: remove reference to the deleted examples folder ( #8524 )
2025-01-22 22:52:15 -08:00
frob
294b6f5a22
docs: remove tfs_z option from documentation ( #8515 )
2025-01-21 09:28:59 -08:00
EndoTheDev
7bb356c680
docs: update suspend header in gpu.md ( #8487 )
2025-01-19 18:45:35 -08:00
Jannik Maierhöfer
021817e59a
readme: add link to Langfuse ( #8455 )
2025-01-16 22:41:12 -08:00
Patrick Devine
a420a453b4
fix default modelfile for create ( #8452 )
2025-01-16 01:14:04 -08:00
Jeffrey Morgan
42cf4db601
parser: fix parsing Modelfiles with multiple FROM commands ( #8449 )
2025-01-16 00:14:04 -08:00
Josh
93a8daf285
convert: import support for command-r models from safetensors ( #6063 )
...
---------
Co-authored-by: Patrick Devine <patrick@infrahq.com >
2025-01-15 16:31:22 -08:00
Gloryjaw
a041b4df7c
docs: fix path to examples ( #8438 )
2025-01-15 11:49:12 -08:00
Patrick Devine
2539f2dbf9
Fix absolute path names + gguf detection ( #8428 )
2025-01-14 19:01:24 -08:00
Jeffrey Morgan
61676fb506
llama: move grammar tests to llama_test.go ( #8411 )
2025-01-14 12:55:45 -08:00
Bruce MacDonald
f6f3713001
convert: qwen2 from safetensors ( #8408 )
...
Add native support for converting Qwen2 family models (including Qwen2.5)
from safetensors to gguf format so we can run it.
2025-01-14 10:34:37 -08:00
Steve Berdy
a30f347201
readme: add LangChain for .NET to community integrations ( #8352 )
2025-01-14 09:37:35 -08:00
Jeffrey Morgan
74ea4fb604
remove .prettierrc.json ( #8413 )
2025-01-14 09:30:34 -08:00
Jeffrey Morgan
6982e9cc96
readme: remove link to missing page
2025-01-13 18:56:31 -08:00
Patrick Devine
ab39872cb4
add new create api doc ( #8388 )
2025-01-13 17:30:24 -08:00
Parth Sareen
84a2314463
examples: remove codified examples ( #8267 )
2025-01-13 11:26:22 -08:00
Jeffrey Morgan
17fcdea698
readme: move discord link
2025-01-12 22:45:47 -08:00
Patrick Devine
32bd37adf8
make the modelfile path relative for ollama create ( #8380 )
2025-01-10 16:14:08 -08:00
Michael Yang
9446c2c902
Merge pull request #8196 from ollama/mxyng/gods-v2
...
chore: upgrade to gods v2
2025-01-10 13:50:11 -08:00
Jeffrey Morgan
9aa141d023
readme: remove discord badge image for now
2025-01-09 22:02:18 -08:00
Patrick Devine
8bccae4f92
show a more descriptive error in the client if it is newer than the server ( #8351 )
2025-01-09 10:12:30 -08:00
isamu arimoto
6ae2adc1af
openai: accept additional headers to fix CORS errors ( #8343 )
2025-01-08 11:28:11 -08:00
Jeffrey Morgan
1deafd8254
llama: update vendored code to commit 46e3556 ( #8308 )
2025-01-08 11:22:01 -08:00
Michael
57f038ec7b
readme: add phi4 model ( #8350 )
2025-01-08 11:21:39 -08:00
frob
cdf3a181dc
Add CUSTOM_CPU_FLAGS to Dockerfile. ( #8284 )
...
* Add CUSTOM_CPU_FLAGS.
* fix golangci-lint error.
---------
Co-authored-by: Richard Lyons <rick@frob.com.au >
2025-01-06 09:17:19 -08:00
Ubaldo Porcheddu
3919f4ba3d
llama: fix runner api example url in README.md ( #8307 )
2025-01-04 15:45:16 -08:00
Bruce MacDonald
2d33c4e97d
discover: remove leading new-line for linter
2025-01-03 12:03:58 -08:00
Bruce MacDonald
29a8975c66
api: remove unused create fields
...
These fields are deprecated, and specifying them has no effect. Remove them: the other deprecated fields still work, but these do not, so they don't match our existing pattern.
2025-01-03 12:03:58 -08:00
Patrick Devine
86a622cbdc
Update the /api/create endpoint to use JSON ( #7935 )
...
Changes `POST /api/create` to accept JSON instead of a Modelfile.
This is a breaking change.
2024-12-31 18:02:30 -08:00
Jeffrey Morgan
459d822b51
readme: link header to ollama.com
2024-12-29 17:36:07 -05:00
Simon Schampijer
844899440a
examples: updated deprecated imports ( #3602 )
2024-12-29 14:36:25 -05:00
Anas Khan
103db4216d
docs: add /api/version endpoint documentation ( #8082 )
...
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2024-12-29 14:33:44 -05:00
Jeffrey Morgan
6daddcde01
readme: update import header
2024-12-29 14:12:23 -05:00
Emilien Lancelot
07f7e69b36
readme: add Yacana multi-agent framework to community integrations ( #7259 )
2024-12-28 15:05:57 -05:00
CIIDMike
b68e8e5727
docs: add syntax highlighting on Go template code blocks ( #8215 )
2024-12-27 13:17:49 -05:00
Adarsh Mishra
369fb529e2
readme: add TextLLaMA to community integrations
2024-12-27 13:16:06 -05:00
Jared Donnell
023e4bca14
readme: add neollama to terminal section of community integrations ( #8242 )
2024-12-25 17:16:11 -05:00
aritra saha
51af455f62
readme: add alpaca client application to community integrations ( #8227 )
2024-12-24 23:05:35 -05:00
Emanuil Rusev
ffe3549064
readme: add IntelliBar to community integrations ( #7950 )
2024-12-23 12:04:18 -05:00
湛露先生
928de9050e
server: reuse InvalidModelNameErrMsg type ( #8163 )
2024-12-23 10:38:34 -05:00
ItzCrazyKns
36aea6154a
readme: add Perplexica to community-integrations ( #8198 )
2024-12-22 20:04:01 -05:00
Patrick Devine
dd352ab27f
fix crash bug with /save when quotes are used ( #8208 )
2024-12-21 22:31:37 -08:00
Michael Yang
cb40d60469
chore: upgrade to gods v2
...
gods v2 uses go generics rather than interfaces which simplifies the
code considerably
2024-12-21 00:05:16 -08:00
Patrick Devine
d8bab8ea44
remove tutorials.md which pointed to removed tutorials ( #8189 )
2024-12-20 14:04:20 -08:00
Squishedmac
9ab62eb96f
update golang.org/x dependencies ( #8172 )
2024-12-20 09:29:30 -08:00
Parth Sareen
290cf2040a
llama: test key order preservation in schema_to_grammar ( #8078 )
...
This change adds a test to catch a regression in schema_to_grammar where
the order of keys in the JSON schema is not preserved in the generated
grammar, which is critical for step-by-step reasoning.
2024-12-18 19:44:50 -08:00
Jeffrey Morgan
a72f2dce45
scripts: sign renamed macOS binary ( #8131 )
2024-12-17 18:03:49 -08:00
Jesse Gross
08a832b482
llama: Ensure KV cache is fully defragmented.
...
Sometimes the KV cache requires defragmentation even without
triggering the threshold heuristic. In this case, decoding
will not be able to find a KV cache slot. This is particularly
difficult for the caller to handle if it happens in between
ubatches. To avoid this, we should immediately trigger a defrag.
In addition, a heavily fragmented cache can require more than
max_moves to defragment. Currently, we stop when we hit the limit
but this can leave a cache that still does not have adequate space
even after defragmentation is triggered. Instead, we should do
multiple batches of processing until everything is complete.
Fixes #7949
2024-12-17 14:01:19 -08:00
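A toy Go model of the second fix in 08a832b482: when one pass is capped at `maxMoves`, keep running batches until the cache is fully compacted instead of stopping at the limit. Here the cache is hypothetically modeled as occupied/free cells. The real llama.cpp defragmentation moves KV tensor data and plans its moves differently.

```go
package main

import "fmt"

// defragment compacts occupied cells toward the front of a toy KV cache.
// Each batch performs at most maxMoves moves (maxMoves must be > 0 to make
// progress); batches repeat until no gaps remain. It returns the number of
// batches that moved at least one cell.
func defragment(cells []bool, maxMoves int) int {
	batches := 0
	for {
		moves := 0
		lo, hi := 0, len(cells)-1
		for moves < maxMoves {
			for lo < len(cells) && cells[lo] {
				lo++ // first free cell
			}
			for hi >= 0 && !cells[hi] {
				hi-- // last occupied cell
			}
			if lo >= hi {
				break // already contiguous
			}
			cells[lo], cells[hi] = true, false
			moves++
		}
		if moves == 0 {
			return batches
		}
		batches++
	}
}

func main() {
	cells := []bool{true, false, true, false, true, false, true}
	n := defragment(cells, 1)
	fmt.Println(n, cells) // 2 [true true true true false false false]
}
```

With a single capped pass, the example above would stop after one move and leave a gap. Looping until a batch makes no moves guarantees full compaction.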
Blake Mizerany
2ddc32d5c5
llm: do not error on "null" format ( #8139 )
...
This fixes another regression in the previous commit that fixed other
known bugs.
2024-12-17 09:49:37 -08:00
Jascha Beste
2cde4b8817
readme: change getting started guide link for pgai ( #8119 )
2024-12-16 22:13:23 -08:00
Blake Mizerany
87f0a49fe6
llm: do not silently fail for supplied, but invalid formats ( #8130 )
...
Changes in #8002 introduced fixes for bugs with mangling JSON Schemas.
It also fixed a bug where the server would silently fail when clients
requested invalid formats. It also, unfortunately, introduced a bug
where the server would reject requests with an empty format, which
should be allowed.
The change in #8127 updated the code to allow the empty format, but also
reintroduced the regression where the server would silently fail when
the format was set, but invalid.
This commit fixes both regressions. The server does not reject the empty
format, but it does reject invalid formats. It also adds tests to help
us catch regressions in the future.
Also, the updated code provides a more detailed error message when a
client sends a non-empty, but invalid format, echoing the invalid format
in the response.
This commit also takes the opportunity to remove superfluous linter
checks.
2024-12-16 21:57:49 -08:00
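A hedged sketch of the validation rules described in 87f0a49fe6 and 2ddc32d5c5: an absent, empty, or `null` format means "no format" and is allowed, the string `"json"` or a JSON object (a schema) is accepted, and anything else is rejected with an error that echoes the value. Treating `""` as empty is an assumption. The function name and error text are hypothetical, not Ollama's actual code.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// validateFormat accepts no format (absent, null, or ""), the string
// "json", or a JSON object treated as a schema. Anything else is an error
// that echoes the invalid value back to the client.
func validateFormat(format json.RawMessage) error {
	trimmed := bytes.TrimSpace(format)
	switch {
	case len(trimmed) == 0,
		bytes.Equal(trimmed, []byte("null")),
		bytes.Equal(trimmed, []byte(`""`)):
		return nil // no format requested
	case bytes.Equal(trimmed, []byte(`"json"`)):
		return nil
	case trimmed[0] == '{' && json.Valid(trimmed):
		return nil // assumed to be a JSON schema
	}
	return fmt.Errorf("invalid format: %s; expected \"json\" or a valid JSON schema", trimmed)
}

func main() {
	for _, f := range []string{"", "null", `"json"`, `{"type":"object"}`, `"xml"`} {
		fmt.Printf("%-20q -> %v\n", f, validateFormat(json.RawMessage(f)))
	}
}
```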
Jeffrey Morgan
0f06a6daa7
llm: loosen format check to default to no format ( #8127 )
2024-12-16 18:45:46 -08:00
Daniel Hiltgen
8f805dd74b
darwin: restore multiple runners for x86 ( #8125 )
...
In 0.5.2 we simplified packaging to have AVX only for macOS x86. It looks like
there may still be some non-AVX systems out there, so this puts back the prior
logic of building no-AVX for the primary binary, and now 2 runners for AVX and AVX2.
These will be packaged in the App bundle only, so the stand-alone binary will now be
without AVX support on macOS. On ARM, we'll also see these runners reported
as available in the log, but they're dormant and will never be used at runtime.
2024-12-16 18:45:02 -08:00
Michael
89d5e2f2fd
readme: example/get started guide for pgai with Ollama ( #8115 )
...
readme: example/get started guide for pgai with Ollama
2024-12-16 17:14:37 +08:00
Jascha Beste
297ada6c87
readme: add pgai to readme for semantic search ( #8028 )
...
* docs: switch around database integrations order and link to quickstart
* docs: link to blog post in example readme
* chore: link to main readme
* readme: removing example to link externally
readme: removing example to link externally so we don't have to keep this example up-to-date
---------
2024-12-16 17:02:28 +08:00
Patrick Devine
8c9fb8eb73
imageproc mllama refactor ( #7537 )
...
Refactor mllama image processing code, and add pixtral and qwen2vl
2024-12-14 19:50:15 -08:00
Daniel Hiltgen
b75ccfc5ec
ci: be more aggressive on parallelism in build ( #8102 )
2024-12-14 14:56:05 -08:00
Jeffrey Morgan
7a81daf026
llama: update vendor code to commit ba1cb19c ( #8101 )
2024-12-14 14:55:51 -08:00
Daniel Hiltgen
60f75560a2
runner: switch logging back to stderr ( #8091 )
...
This puts the low-level runner logging back on stderr for consistency with prior releases
2024-12-13 14:36:50 -08:00
Anuraag (Rag) Agrawal
e28f2d4900
openai: return usage as final chunk for streams ( #6784 )
...
* openai: return usage as final chunk for streams
---------
Co-authored-by: ParthSareen <parth.sareen@ollama.com >
2024-12-12 17:09:30 -08:00
Pascal Patry
c216850523
llama: parse JSON schema using nlohmann::ordered_json to maintain ordering ( #8071 )
2024-12-12 09:57:28 -08:00
Parth Sareen
18f6a98bd6
llama: enable JSON schema key ordering for generating grammars ( #8055 )
2024-12-11 17:17:36 -08:00
Blake Mizerany
b1fd7fef86
server: more support for mixed-case model names ( #8017 )
...
Fixes #7944
2024-12-11 15:29:59 -08:00
Daniel Hiltgen
36d111e788
ci: fix linux version ( #8054 )
...
Pass through the version override so the makefiles use it
2024-12-11 14:09:57 -08:00
Blake Mizerany
9039c821a2
llama: preserve field order in user-defined JSON schemas ( #8002 )
...
Previously we decoded and re-encoded JSON schemas during validation,
which served no purpose since json.RawMessage already validates JSON
syntax. Worse, the re-encoding lost field ordering from the original
schema, which affects inference quality during step-by-step reasoning.
While fixing this ordering issue by using json.RawMessage directly,
testing revealed that schema_to_grammar (from llama.cpp) also fails to
preserve field order during grammar generation. This appears to be the
root cause of inference degradation.
This change prevents us from mangling the user's original schema order,
but we still need to address the ordering issue in schema_to_grammar.
That will be a separate change.
Updates #7978
2024-12-11 14:07:30 -08:00
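A small Go demonstration of the ordering problem in 9039c821a2. Decoding a schema into a generic value and marshaling it again loses the user's field order, because `encoding/json` marshals map keys in sorted order. Keeping the bytes as a `json.RawMessage` preserves them exactly. `reencode` and `passthrough` are hypothetical names that stand in for the old and new behavior.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// reencode mimics the old behavior: decode into a generic value and marshal
// it again. Map keys come back sorted, so field order is lost.
func reencode(schema []byte) ([]byte, error) {
	var v any
	if err := json.Unmarshal(schema, &v); err != nil {
		return nil, err
	}
	return json.Marshal(v)
}

// passthrough mimics the new behavior: check syntax only and keep the
// user's bytes, and therefore their field order, untouched.
func passthrough(schema []byte) (json.RawMessage, error) {
	if !json.Valid(schema) {
		return nil, errors.New("invalid JSON schema")
	}
	return json.RawMessage(schema), nil
}

func main() {
	schema := []byte(`{"type":"object","properties":{"reasoning":{"type":"string"},"answer":{"type":"string"}}}`)
	re, _ := reencode(schema)
	raw, _ := passthrough(schema)
	fmt.Println(string(re))  // "answer" now precedes "reasoning"
	fmt.Println(string(raw)) // original order preserved
}
```

For step-by-step reasoning, the order matters because the grammar makes the model emit `reasoning` before `answer` only if that order survives.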
Daniel Hiltgen
581a4a5553
ci: fix artifact path prefix for missing windows payloads ( #8052 )
...
upload-artifacts strips off leading common paths so when
the ./build/ artifacts were removed, the ./dist/windows-amd64
prefix became common and was stripped, making the
later download-artifacts place them in the wrong location
2024-12-11 10:59:32 -08:00
Daniel Hiltgen
cf4d7c52c4
win: builtin arm runner ( #8039 )
...
The new build embeds the arm runner in the
main binary, so there is no longer a lib/ollama
2024-12-11 08:32:13 -08:00
Daniel Hiltgen
6a6328a5e9
ci: build dir changed ( #8037 )
...
Remove no longer relevant build log dir
2024-12-10 20:33:34 -08:00
Jeffrey Morgan
527cc97899
llama: update vendored code to commit 40c6d79f ( #7875 )
2024-12-10 19:21:34 -08:00
Blake Mizerany
a37f4a86a7
go.mod: go 1.22.8 -> 1.23.4 ( #8036 )
2024-12-10 18:16:16 -08:00
湛露先生
46f74e0cb5
Return err when NewHipLib() detects an error. ( #8012 )
...
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com >
2024-12-10 16:32:29 -08:00
Phil Wornath
7622ea21af
readme: add AI summary helper plugin to community-integrations ( #7202 )
2024-12-10 16:13:06 -08:00
Tao Zuhong
c5d3947084
readme: add Kangaroo, an AI-powered SQL admin tool to community integrations ( #7948 )
2024-12-10 13:48:32 -08:00
frob
757eeacc1b
server: lowercase hostname for Host header check ( #5851 )
2024-12-10 13:43:22 -08:00
Dr. Daniel Bender
dd42acf737
readme: add aidful-ollama-model-delete to community integrations ( #8024 )
2024-12-10 13:03:19 -08:00
Daniel Hiltgen
b9ccb3741e
Remove unused runner CpuFeatures ( #8032 )
...
The final implementation of #7499 removed dynamic vector requirements
in favor of a simpler filename based model, and this was left over logic that
is no longer needed.
2024-12-10 12:59:39 -08:00
Stefan Weil
abfdc4710f
all: fix typos in documentation, code, and comments ( #7021 )
2024-12-10 12:58:06 -08:00
Daniel Hiltgen
82a02e18d9
build: fix typo in override variable ( #8031 )
...
The "F" was missing.
2024-12-10 10:51:16 -08:00
Daniel Hiltgen
4879a234c4
build: Make target improvements ( #7499 )
...
* llama: wire up builtin runner
This adds a new entrypoint into the ollama CLI to run the cgo built runner.
On Mac arm64, this will have GPU support, but on all other platforms it will
be the lowest common denominator CPU build. After we fully transition
to the new Go runners more tech-debt can be removed and we can stop building
the "default" runner via make and rely on the builtin always.
* build: Make target improvements
Add a few new targets and help for building locally.
This also adjusts the runner lookup to favor local builds, then
runners relative to the executable, and finally payloads.
* Support customized CPU flags for runners
This implements a simplified custom CPU flags pattern for the runners.
When built without overrides, the runner name contains the vector flag
we check for (AVX) to ensure we don't try to run on unsupported systems
and crash. If the user builds a customized set, we omit the naming
scheme and don't check for compatibility. This avoids checking
requirements at runtime, so that logic has been removed as well. This
can be used to build GPU runners with no vector flags, or CPU/GPU
runners with additional flags (e.g. AVX512) enabled.
* Use relative paths
If the user checks out the repo in a path that contains spaces, make gets
really confused so use relative paths for everything in-repo to avoid breakage.
* Remove payloads from main binary
* install: clean up prior libraries
This removes support for v0.3.6 and older versions (before the tar bundle)
and ensures we clean up prior libraries before extracting the bundle(s).
Without this change, runners and dependent libraries could leak when we
update and lead to subtle runtime errors.
2024-12-10 09:47:19 -08:00
frob
63269668c0
Prevent underflow when FreeMemory < overhead ( #8014 )
...
Co-authored-by: Richard Lyons <frob@cloudstaff.com >
2024-12-10 09:10:40 -08:00
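The underflow named in the title above is the usual unsigned-subtraction hazard. A minimal sketch, assuming free memory and overhead are both `uint64` byte counts; `headroom` is a hypothetical helper, not the function changed in #8014. Without the guard, a subtraction that should go negative wraps to a huge value and makes a full device look nearly empty.

```go
package main

import "fmt"

// headroom returns free-overhead, clamped at zero. Both values are unsigned,
// so a bare free-overhead would wrap around to a huge number whenever the
// overhead exceeds the free memory.
func headroom(free, overhead uint64) uint64 {
	if free < overhead {
		return 0
	}
	return free - overhead
}

func main() {
	free, overhead := uint64(512<<20), uint64(1<<30)
	fmt.Println(free-overhead > free)     // true: the bare subtraction wrapped
	fmt.Println(headroom(free, overhead)) // 0
	fmt.Println(headroom(overhead, free)) // 536870912
}
```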
Jesse Gross
900f64e6be
prompt: Don't trim whitespace from prompts
...
New lines can be an important part of a user's prompt and trimming
it can alter the results. We previously only trimmed prompts with
images but refactoring brought this behavior to all prompts, where
it became more noticeable.
The /generate endpoint adds less whitespace and therefore doesn't
need to trim it out - this brings the same behavior to /chat.
Thanks to @gabe-l-hart for spotting the issue!
Fixes #7795
2024-12-09 11:02:55 -08:00
Yannick Gloster
da09488fbf
docs: remove comment regarding tool streaming in openai.md ( #7960 )
2024-12-07 22:16:21 -08:00
湛露先生
7f0ccc8a9d
docs: fix syntax error in openai.md ( #7986 )
2024-12-07 22:14:36 -08:00
Parth Sareen
de52b6c2f9
bugfix: "null" value json mode ( #7979 )
2024-12-06 14:13:15 -08:00
Michael
acd7d03266
readme: add llama3.3 to readme ( #7975 )
...
readme: add llama3.3 to readme
2024-12-06 14:05:11 -05:00
Parth Sareen
f6e87fd628
docs: update readmes for structured outputs ( #7962 )
2024-12-06 10:35:37 -08:00
Jeffrey Morgan
aed1419c64
ci: skip go build for tests ( #7899 )
2024-12-04 21:22:36 -08:00
Parth Sareen
c6c526275d
api: add generate endpoint for structured outputs ( #7939 )
2024-12-04 17:37:12 -08:00
Parth Sareen
630e7dc6ff
api: structured outputs - chat endpoint ( #7900 )
...
Adds structured outputs to chat endpoint
---------
Co-authored-by: Michael Yang <mxyng@pm.me >
Co-authored-by: Hieu Nguyen <hieunguyen1053@outlook.com >
2024-12-04 16:31:19 -08:00
Michael Yang
eb8366d658
Merge pull request #7932 from ollama/mxyng/fix-merges
2024-12-04 10:04:52 -08:00
Michael Yang
4456012956
fix unmarshaling merges
2024-12-04 09:21:56 -08:00
Sam
539be43640
llm: normalise kvct parameter handling ( #7926 )
2024-12-03 16:30:40 -08:00
Sam
1bdab9fdb1
llm: introduce k/v context quantization (vRAM improvements) ( #6279 )
2024-12-03 15:57:19 -08:00
owboson
2b82c5a8a1
docs: correct default num_predict value in modelfile.md ( #7693 )
2024-12-03 15:00:05 -08:00
Tigran
55c3efa900
docs: remove extra quote in modelfile.md ( #7908 )
2024-12-02 09:28:56 -08:00
David Mayboroda
1aedffad93
readme: add minima to community integrations ( #7906 )
2024-12-02 01:14:47 -08:00
Jeffrey Morgan
ff6c2d6dc8
cmd: don't rely on reading repo file for test ( #7898 )
2024-11-30 14:12:53 -08:00
Jeffrey Morgan
d543b282a7
server: add warning message for deprecated context field ( #7878 )
2024-11-30 14:05:50 -08:00
Parth Sareen
5f8051180e
Enable index tracking for tools - openai api support ( #7888 )
2024-11-29 20:00:09 -08:00
Jeffrey Morgan
39e29ae5dd
llama: fix typo and formatting in readme ( #7876 )
2024-11-28 17:27:11 -08:00
TheCookingSenpai
30a9f063c9
readme: add SpaceLlama, YouLama, and DualMind to community integrations ( #7216 )
2024-11-28 15:16:27 -08:00
Parth Sareen
ce7455a8e1
api: enable tool streaming ( #7836 )
2024-11-27 13:40:57 -08:00
ItzCrazyKns
e3936d4fb3
Support Multiple LoRa Adapters ( #7667 )
...
Closes #7627
2024-11-27 11:00:04 -08:00
Bruce MacDonald
940e62772e
openai: remove unused error code ( #7850 )
...
The writeError takes a code argument which is no longer used. Remove it for clarity.
2024-11-26 16:08:09 -08:00
Jesse Gross
71e6a0d0d1
runner.go: Don't try to extract image tags for text models
...
When processing a prompt, we look for image tags of the form
[img-0], which are inserted by the Ollama server process.
However, this can cause errors if the original prompt has these
tags - typically an image not found error is returned.
This changes tag searching behavior to be similar to the 0.3.x
series, which will largely avoid these problems. However, they can
still happen when input text with these tags is used with image
models. The correct solution is to escape the tags but this is a
larger issue with special sequences in general so this is an
incremental fix that should avoid the problem for the majority
of cases.
2024-11-26 13:23:24 -08:00
Jesse Gross
2cd11ae365
runner.go: Add unit tests for context shifting
...
This also makes it easier to truncate long inputs the same as
shifting but does not actually implement it. This type of
truncation has a trade off between quality and time to first
token.
2024-11-26 11:21:35 -08:00
jake83741
52bbad12f9
readme: update description for vnc-lm community integration ( #7832 )
2024-11-25 17:56:30 -08:00
frob
30e88d7f31
cmd: don't submit svg files as images for now ( #7830 )
2024-11-25 16:43:29 -08:00
Blake Mizerany
2b7ed61ca2
server: fix Transport override ( #7834 )
...
This changes makeRequest to update the http client Transport if and only
if testMakeRequestDialContext is set. This is to avoid overriding the
default Transport when testMakeRequestDialContext is nil, which broke
existing behavior, including proxies, timeouts, and other behaviors.
Fixes #7829
Fixes #7788
2024-11-25 15:08:34 -08:00
Shikhar Bakhda
647513a7d4
readme: add HoneyHive to community integrations ( #7831 )
2024-11-25 09:55:33 -08:00
Bruce MacDonald
a210ec74d2
cmd: print location of model after pushing ( #7695 )
...
After a user pushes their model it is not clear what to do next. Add a link
to the output of `ollama push` that tells the user where their model can now
be found.
2024-11-25 09:40:16 -08:00
Simon Schampijer
cfb1ddd6fc
examples: update langchain-python-simple ( #3591 )
...
- better formatting of input prompt
- use invoke instead of predict
2024-11-24 16:06:22 -08:00
reid41
3987acd7ec
readme: add descriptions for QA-Pilot and shell-pilot community integrations ( #4303 )
2024-11-24 15:55:09 -08:00
frob
fda1e6b563
llm: bring fileTypes into alignment with llama.cpp ( #7819 )
2024-11-24 10:33:33 -08:00
Adarsh Mishra
3440ffb37b
readme: add description for OpenTalkGpt in community integrations ( #7818 )
2024-11-24 10:32:23 -08:00
Patcher
a820d2b267
readme: add observability section with OpenLIT to community-integrations
2024-11-23 18:03:12 -08:00
Meng Zhuo
2ebdb54fb3
all: update math32 go mod to v1.11.0 ( #6627 )
2024-11-23 15:21:54 -08:00
josc146
bb52abfa55
readme: add ChatGPTBox and RWKV-Runner to community integrations ( #4118 )
2024-11-23 13:31:27 -08:00
oza6ut0ne
31cb1ca9e5
openai: accept X-Stainless-Retry-Count header ( #6910 )
2024-11-23 12:39:05 -08:00
Rodrigo Ribeiro Gomes
78f779a323
readme: add powershai, a powershell module with ollama support to community integrations ( #7438 )
2024-11-23 10:08:59 -08:00
Jesse Gross
3478b2cf14
runner.go: Fix deadlock with many concurrent requests
...
If there are no available slots for new sequences then a request
will not be added to the processing queue but will continue on
to wait for a response that never comes. Besides never giving a
response to the request, this prevents the model from being
unloaded due to the outstanding request.
To prevent this, there are semaphores that prevent more requests
from being processed than there are slots - one in the Ollama
server and one in the runner.
- The Ollama server one works but it is not designed to protect
the runner's internal data structures, and the runner can return a
final response before clearing its data structures.
- The internal runner semaphore has similar behavior where it
can release the semaphore when it issues a response. This is
wrong - it should only release the semaphore after it has
cleared the data structure.
In addition, we should return an error if a slot is not found
rather than deadlocking in the event we ever get to this spot.
Fixes #7779
2024-11-22 16:14:51 -08:00
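The ordering rule in the fix above (release the semaphore only after the slot is cleared, and return an error rather than wait when no slot is found) can be sketched with a buffered channel as a counting semaphore. This is a minimal illustration, not the runner's code: the `slots` type, `add`, `finish`, and `errNoSlot` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errNoSlot = errors.New("no free sequence slot")

// slots is a hypothetical runner-side table of in-flight sequences. A
// buffered channel acts as a counting semaphore sized to the number of slots.
type slots struct {
	mu   sync.Mutex
	seqs []*string
	sem  chan struct{}
}

func newSlots(n int) *slots {
	return &slots{seqs: make([]*string, n), sem: make(chan struct{}, n)}
}

// add takes a semaphore token and then claims a free slot. If no slot is free
// (which the semaphore should make impossible), it returns an error instead of
// leaving the caller waiting forever for a response.
func (s *slots) add(prompt string) (int, error) {
	s.sem <- struct{}{}
	s.mu.Lock()
	defer s.mu.Unlock()
	for i, seq := range s.seqs {
		if seq == nil {
			s.seqs[i] = &prompt
			return i, nil
		}
	}
	<-s.sem // give the token back; no slot was used
	return -1, errNoSlot
}

// finish clears the slot first and only then releases the semaphore, so any
// request admitted by the released token is guaranteed to find a free slot.
func (s *slots) finish(i int) {
	s.mu.Lock()
	s.seqs[i] = nil
	s.mu.Unlock()
	<-s.sem
}

func main() {
	s := newSlots(1)
	i, err := s.add("hello")
	fmt.Println(i, err) // 0 <nil>
	s.finish(i)
	j, err := s.add("world")
	fmt.Println(j, err) // 0 <nil>
	s.finish(j)
}
```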
Bruce MacDonald
7b5585b9cb
server: remove out of date anonymous access check ( #7785 )
...
In the past the ollama.com server would return a JWT that contained
information about the user being authenticated. This was used to return
different error messages to the user. This is no longer possible since the
token used to authenticate does not contain information about the user
anymore. Removing this code that no longer works.
Follow up changes will improve the error messages returned here, but good to
clean up first.
2024-11-22 11:57:35 -08:00
Daniel Hiltgen
f0a351810c
tests: fix max queue integration test ( #7782 )
...
This had fallen out of sync with the envconfig behavior, where the max queue default was not zero.
2024-11-22 08:05:45 -08:00
Daniel Hiltgen
b85520bfb9
logs: explain client aborts better ( #7783 )
...
Users get confused by "Failed to acquire semaphore" error="context canceled"
messages in the logs, which are actually clients giving up. While there could be
a legitimate hang bug in the system, sometimes this is just short client timeouts
with an overloaded system, so this should help users understand what's going on
better.
2024-11-22 08:05:32 -08:00
Daniel Hiltgen
d88972ea48
Be quiet when redirecting output ( #7360 )
...
This avoids emitting the progress indicators to stderr, and the interactive
prompts to the output file or pipe. Running "ollama run model > out.txt"
now exits immediately, and "echo hello | ollama run model > out.txt"
produces zero stderr output and a typical response in out.txt
2024-11-22 08:04:54 -08:00
Leon Sander
25c9339e2d
readme: add Local Multimodal AI Chat app to community integrations ( #6931 )
2024-11-21 20:39:38 -08:00
Mikel Olasagasti Uranga
597072ef1b
readme: update google/uuid module ( #7310 )
...
update uuid.New().String() to uuid.NewString()
2024-11-21 19:37:04 -08:00
Dustin
84b3e07f1b
readme: add ollamarama-matrix to community integrations ( #7325 )
2024-11-21 17:49:30 -08:00
Edwin.JH.Lee
422d52858c
readme: add x-cmd ollama module to community integrations ( #5191 )
2024-11-21 16:55:25 -08:00
Elias
723f285813
readme: add OrionChat to community integrations ( #7084 )
...
OrionChat is a free web-based chat interface that simplifies interactions
with multiple AI model providers. It provides a unified platform for chatting
and exploring multiple large language models (LLMs).
2024-11-21 11:23:42 -08:00
湛露先生
eaaf5d309d
cmd: delete duplicated call to sb.Reset() ( #7308 )
...
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com >
2024-11-21 11:20:48 -08:00
Jeffrey Morgan
27d9c749d5
docs: remove tutorials, add cloud section to community integrations ( #7784 )
2024-11-21 09:59:53 -08:00
R0CKSTAR
b7bddeebc1
env.sh: cleanup unused RELEASE_IMAGE_REPO ( #6855 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
2024-11-21 08:28:04 -08:00
Paul Robello
6a0c2ec50f
readme: add terminal tool ParLlama to community integrations ( #5623 )
2024-11-21 02:55:35 -08:00
毛巳煜
baa41be2aa
readme: add a community made ollama web management tool ( #7126 )
2024-11-21 02:51:45 -08:00
xuyangbocn
2157b1232e
readme: add Terraform AWS Ollama & Open WebUI community example ( #5633 )
2024-11-21 02:28:57 -08:00
emrgnt-cmplxty
37711578a2
readme: add R2R to community integrations ( #5587 )
2024-11-21 02:09:36 -08:00
Cyril Blaecke
fb2c9594e0
readme: Add Nosia to Community Integrations ( #5381 )
2024-11-21 02:07:17 -08:00
Christian Tzolov
7fbcd55da3
readme: Add Spring AI library reference ( #5981 )
2024-11-21 02:02:14 -08:00
Philippe Charrière
b4348bdd25
readme: add Parakeet to community integrations
...
Parakeet is a GoLang SDK for Ollama
---------
Co-authored-by: Parth Sareen <parth.sareen@ollama.com >
2024-11-21 02:00:32 -08:00
Marcin Szczygliński
155734e09a
readme: add community integration py-gpt ( #6503 )
2024-11-21 01:54:39 -08:00
Michael
883d80e097
readme: add Promptery to community integrations ( #7093 )
2024-11-21 01:46:20 -08:00
Jakub Burkiewicz
e4c9f75b23
readme: add node-red-contrib-ollama to community integrations ( #4648 )
2024-11-21 01:09:37 -08:00
Dezoito
f5ec7cc872
readme: add ollama grid search, a community project ( #4301 )
2024-11-21 01:02:46 -08:00
Franco Lombardo
811bafba82
readme: Add LLPhant to community integrations ( #5679 )
2024-11-21 00:54:26 -08:00
Aarushi
431075fcbb
readme: add autogpt integration to list of community integrations ( #6459 )
2024-11-21 00:51:38 -08:00
Kevin Brake
c4f27225ac
readme: add community contribution to readme ollama-kis ( #5575 )
2024-11-21 00:31:27 -08:00
chyok
b7aa5ee06c
readme: Add tkinter-based client to community based integrations ( #5412 )
2024-11-21 00:19:24 -08:00
Nico
3f87f71755
readme: add Shinkai Desktop to community integrations ( #4877 )
2024-11-21 00:16:18 -08:00
Laurent Eschenauer
20623cec13
readme: add OpenGPA to community integrations ( #5497 )
2024-11-21 00:13:54 -08:00
Andy Gill
0e5f31a86d
readme: add Haverscript to community integrations ( #6945 )
...
Haverscript uses classical functional programming techniques to provide a composable interface for interacting with ollama-hosted LLMs.
2024-11-21 00:11:39 -08:00
drunkwcodes
7e92091751
readme: Terminal app bb7 to community integrations ( #7064 )
2024-11-21 00:03:11 -08:00
boessu
1a742f54c9
readme: update AMD ROCm links ( #7213 )
2024-11-20 23:48:55 -08:00
奶茶叔叔
6a89dcf848
readme: flutter-based chat app to community integrations ( #7221 )
2024-11-20 23:30:10 -08:00
Alexander F. Rødseth
c5e238e8e5
readme: orbiton to community integrations ( #7770 )
2024-11-20 23:24:05 -08:00
Nikita Ganzikov
fce30f407a
app: typo in wintray messages const ( #7705 )
2024-11-20 22:01:58 -08:00
Daniel Hiltgen
d863298210
docs: Link to AMD guide on multi-GPU guidance ( #7744 )
2024-11-20 16:00:46 -08:00
Jesse Gross
c4b34f2a2a
runner.go: Truncate inputs that exceed context rather than shifting
...
Previous versions of the runner would truncate inputs to the context
window before beginning processing. The main processing loop relied
on this behavior if the context needed to be shifted later (due to
token generation). If truncation did not occur then invariants
would be broken, causing crashes or infinite loops.
Later versions attempted to fix these bugs and make the logic less
subtle so that all inputs could be handled. Truncation was removed
to make things consistent.
However, truncation is much faster than processing and shifting, so
removing it caused performance problems when the input vastly exceeded
the context size. This restores the input truncation as a performance
optimization while keeping the more robust processing logic.
Fixes #7762
2024-11-20 12:49:24 -08:00
Jesse Gross
c3ff916431
runner.go: Don't add inputs to cache view until actually processed
...
We need to track which tokens are in the cache ourselves. We currently
add tokens to the cache tracker when we add them to batch but they are
not actually in the cache until we call Decode. This can cause
confusion when we are shifting the cache.
Avoids "could not find a KV slot for the batch" issues.
Bug #7545
2024-11-20 12:49:24 -08:00
Jesse Gross
3fc1dc0e6f
runner.go: Hard fail on errors rather than potentially infinite looping
...
We try to recover from errors by dropping the tokens that caused the
problem and re-trying. However, dropping the tokens is not correct
and continuing often leads to infinite loops. To avoid this, we
end the sequence if such a condition is detected, which is also
surprising.
At this point, it is better to just report the error. This will make
it easier to find problems and the alternatives are perhaps even more
surprising to users.
This is not a very satisfactory solution either - we should isolate
the error and return it to the user without killing the whole process.
However, this is an incremental step and consistent with most other
failures (which either manifest as abort() or panic).
2024-11-20 12:49:24 -08:00
Jesse Gross
7121dfa309
runner.go: Retry decoding after defragmentation if needed
...
Fragmentation of the KV cache can occur due to cache shifting or
different sequences getting processed. Decode uses a heuristic to
decide if it should defrag. However, this heuristic isn't 100%
accurate, so decoding can sometimes fail by surprise.
For these cases, if decode indicates that there is no KV cache space,
we should defrag and then try again.
2024-11-20 12:49:24 -08:00
Jesse Gross
5f68fcab12
runner.go: Use correct index when retrieving embedding results
...
This doesn't have any impact currently because NUM_PARALLEL is forced
to 1 for embeddings, so both indices will always be 0.
2024-11-20 12:49:24 -08:00
Emir Sahin
ecf41eed05
readme: add llm-axe to community integrations ( #5931 )
2024-11-20 10:53:14 -08:00
Marcus Ziadé
b8c66d3307
readme: add a swift community integration ( #7383 )
2024-11-20 10:49:15 -08:00
thewh1teagle
303f4bc79e
readme: add vibe app to community integrations ( #7607 )
2024-11-20 10:45:10 -08:00
Adarsh Mishra
d2a25206b1
readme: add opentalkgpt to community integrations ( #7707 )
2024-11-20 10:42:55 -08:00
rohitanshu
2f0a8c8778
docs: fix minor typo in import.md ( #7764 )
...
change 'containg' to 'containing'
2024-11-20 09:57:32 -08:00
Gordon Kamer
bfd30f4286
readme: add Abbey to community integrations ( #7746 )
2024-11-19 21:37:15 -08:00
Jonathan Hecl
0ef17ede89
readme: add Gollama to community integrations ( #7756 )
2024-11-19 21:31:43 -08:00
Daniel Hiltgen
909a88c5c0
Improve crash reporting ( #7728 )
...
Many model crashes are masked behind "An existing connection was forcibly closed by the remote host"
This captures that common error message and wires in any detected errors from the log.
This also adds the deepseek context shift error to the known errors we capture.
2024-11-19 16:26:57 -08:00
Daniel Hiltgen
f602ab4de4
expose underlying error on embedding failure ( #7743 )
...
Avoid a round-trip asking users for logs to see what went wrong.
2024-11-19 16:26:05 -08:00
Gabe Goodhart
807ace5b1f
fix(runner): Set logits to 0 if false on Batch.Add
...
https://github.com/ollama/ollama/issues/7656
Branch: Granite3StoppingBug-7656
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
2024-11-19 15:45:37 -08:00
Blake Mizerany
4b8a2e341a
server: allow mixed-case model names on push, pull, cp, and create ( #7676 )
...
This change allows for mixed-case model names to be pushed, pulled,
copied, and created, which was previously disallowed because the Ollama
registry was backed by a Docker registry that enforced a naming
convention that disallowed mixed-case names, which is no longer the
case.
This does not break existing, intended, behaviors.
Also, make TestCase test a story of creating, updating, pulling, and
copying a model with case variations, ensuring the model's manifest is
updated correctly, and not duplicated across different files with
different case variations.
2024-11-19 15:05:57 -08:00
frob
e66c29261a
Better error suppression when getting terminal colours ( #7739 )
...
Co-authored-by: Richard Lyons <frob@cloudstaff.com >
2024-11-19 08:33:52 -08:00
Patrick Devine
712d63c3f0
update the docs ( #7731 )
2024-11-18 21:17:38 -08:00
Patrick Sy
6cdf27d154
readme: add Alfred Ollama to community integrations ( #7724 )
2024-11-18 19:33:23 -08:00
frob
5c18e66384
Notify the user if systemd is not running ( #6693 )
...
Co-authored-by: Richard Lyons <frob@cloudstaff.com >
2024-11-18 15:02:41 -08:00
Daniel Hiltgen
35096a7eff
win: add right click menu support ( #7727 )
...
Enable both left and right click on the pop-up menu
2024-11-18 14:39:52 -08:00
Daniel Hiltgen
81d55d3e4d
fix index out of range on zero layer metal load ( #7696 )
...
If the model doesn't fit any layers on metal, and we load zero layers
we would panic trying to look up the GPU size during scheduling ops
2024-11-18 11:48:13 -08:00
Vinh Nguyen
a14f76491d
readme: improve Community Integrations section ( #7718 )
2024-11-17 19:30:22 -08:00
Nicolas Bonamy
760cfa27e5
readme: add Witsy and multi-llm-ts to community integrations ( #7713 )
2024-11-17 16:33:10 -08:00
Darius Kocar
c9a5aca3da
readme: add Perfect Memory AI to community integrations ( #7431 )
2024-11-17 15:19:26 -08:00
Tushar Adhatrao
d5da2ab7e8
readme: add ollama-haskell library to community integrations ( #7451 )
2024-11-17 15:18:04 -08:00
Vinh Nguyen
1c04117114
readme: add the VT app to the community integrations section ( #7706 )
2024-11-17 14:35:41 -08:00
Jeffrey Morgan
8b4b243f5f
server: fix warnings in prompt_test.go ( #7710 )
2024-11-17 13:01:04 -08:00
Jeffrey Morgan
b42a596425
docs: add customization section in linux.md ( #7709 )
2024-11-17 11:48:12 -08:00
Daniel Hiltgen
4759d879f2
Install support for jetpacks ( #7632 )
...
Follow up to #7217 - merge after release
2024-11-15 16:47:54 -08:00
Jesse Gross
d875e99e46
runner.go: Propagate panics back to the user.
...
This is a partial revert of 8a35bb92
"runner.go: Increase survivability of main processing loop", removing
the panic handler.
Although we want to avoid errors taking down the runner, we also
should make the user aware of problems when they happen. In the
future, we can restructure things so both parts are true.
2024-11-15 11:52:25 -08:00
Jesse Gross
8a35bb926e
runner.go: Increase survivability of main processing loop
...
Currently, if an error occurs during the prep stages (such as
tokenizing) of a single request, it will only affect that request.
However, if an error happens during decoding, it can take down the
entire runner.
Instead, it's better to drop the tokens that triggered the error and try to
keep going. However, we also need to stop when we run out of tokens,
otherwise, this just causes an infinite loop. This is likely the cause
of at least some of the hanging issues that have been reported.
Bug #7573
2024-11-14 17:18:41 -08:00
Daniel Hiltgen
a0ea067b63
build: fix arm container image ( #7674 )
...
Fix a rebase glitch from the old C++ runner build model
2024-11-14 16:02:01 -08:00
Patrick Devine
4efb98cb4f
add line numbers for parser errors ( #7326 )
2024-11-14 13:59:44 -08:00
Bruce MacDonald
0679d491fe
chore(deps): bump golang.org/x dependencies ( #7655 )
...
- golang.org/x/sync v0.3.0 -> v0.9.0
- golang.org/x/image v0.14.0 -> v0.22.0
- golang.org/x/text v0.15.0 -> v0.20.0
2024-11-14 13:58:25 -08:00
Jesse Gross
c25ffde91d
runner.go: Don't trim whitespace from inputs
...
It's possible to get prompts that consist entirely of whitespace -
this is most likely to happen when generating embeddings. Currently,
we will trim this away, leaving an empty prompt, which will then
generate an error.
Generating embeddings from whitespace should not trigger an error,
as this may break pipelines. It's better to just leave the whitespace
in place and process what we are given. This is consistent with
past versions of Ollama.
Bug #7578
2024-11-14 11:23:06 -08:00
Jesse Gross
17b386a891
runner.go: Enforce NUM_PARALLEL directly in the runner
...
NUM_PARALLEL is currently enforced by the Ollama server process - it
will only issue requests to the runner if the maximum number of
concurrent requests has not been exceeded. Although this should
be sufficient, it is good for the runner to protect its own data
structures. Currently, if too many requests get through to the
runner, they will just get stuck and never return.
This may help with reports of Ollama hanging, though it is unclear
how it would actually occur.
Bug #7573
2024-11-14 11:21:59 -08:00
Michael Yang
549c2bdfcf
Merge pull request #7657 from ollama/mxyng/sync
...
fix(mllama): sync backend between batches
2024-11-14 09:40:04 -08:00
Blake Mizerany
67691e410d
cmd: preserve exact bytes when displaying template/system layers ( #7586 )
2024-11-13 23:53:30 -08:00
Michael Yang
5b3393b6a2
fix(mllama): sync backend between batches
2024-11-13 16:37:21 -08:00
Jesse Gross
d7eb05b936
runner.go: Fix off-by-one for num predicted
2024-11-12 11:35:57 -08:00
Daniel Hiltgen
636a743c2b
CI: give windows lint more time ( #7635 )
...
It looks like 8 minutes isn't quite enough and we're seeing sporadic timeouts
2024-11-12 11:22:39 -08:00
Daniel Hiltgen
df011054fa
Jetpack support for Go server ( #7217 )
...
This adds support for the Jetson JetPack variants into the Go runner
2024-11-12 10:31:52 -08:00
Daniel Hiltgen
ac07160c8d
doc: capture numeric group requirement ( #6941 )
...
Docker uses the container filesystem for name resolution, so we can't guide users
to use the name of the host group. Instead they must specify the numeric ID.
2024-11-12 09:13:23 -08:00
Daniel Hiltgen
6606e4243c
docs: Capture docker cgroup workaround ( #7519 )
...
GPU support can break on some systems after a while. This captures a
known workaround to solve the problem.
2024-11-12 09:12:50 -08:00
Jesse Gross
65973ceb64
runner.go: Make KV entry accounting more robust
...
The structure of the accounting for KV cache shifting was carried
over from the old runner but it now doesn't feel natural with the new
runner. There are a number of invariants that should hold true but
are difficult to reason about. There is at least one bug report
that would imply that the invariants are not holding.
This reduces the number of implicit assumptions and is more forgiving
of unexpected situations. It also improves behavior around which input
tokens are kept when truncation occurs.
Bug #7545
2024-11-11 20:23:03 -08:00
Joey Zheng
bebef1e50d
readme: add aichat terminal app to community integrations ( #7418 )
2024-11-11 16:44:46 -08:00
Evan
d48c1c5a44
api: fix typos in Go Doc comments ( #7620 )
2024-11-11 16:21:58 -08:00
Prasad Bhalerao
36a8372b28
readme: add GoLamify to community integrations ( #7521 )
2024-11-10 22:38:18 -08:00
Ivo Stoykov
4e94227b5d
readme: add browser extension that enables using Ollama for interacting with web pages ( #5827 )
2024-11-10 22:14:22 -08:00
frances720
479d551766
docs: add mentions of Llama 3.2 ( #7517 )
2024-11-10 19:04:23 -08:00
Evan
76b2b723b2
api: fix typo in python ClientFromEnvironment docs ( #7604 )
2024-11-10 17:30:27 -08:00
Arhan Busam
b8d77cdeab
readme: add llama3.2-vision to model list ( #7580 )
2024-11-10 13:36:25 -08:00
Jesse Gross
c2e8cbaa14
runner.go: Check for zero length images
...
If we get a request with a zero length image, it will result in
an out-of-bounds error when we pass the data to the image encoder.
2024-11-08 09:39:32 -08:00
Edward J. Schwartz
771fab1dd8
docs: update langchainpy.md with proper model name ( #7527 )
2024-11-08 09:36:17 -08:00
Daniel Hiltgen
3a5239e6bf
Set macos min version for all architectures ( #7579 )
2024-11-08 09:27:04 -08:00
Daniel Hiltgen
3d25e7bf8c
win: remove preview title from installer ( #7529 )
...
This should have been in #7347 but was overlooked.
2024-11-07 14:26:47 -08:00
Daniel Hiltgen
1618700c5a
Workaround buggy P2P ROCm copy on windows ( #7466 )
...
This enables the workaround code only for Windows, which should help Windows users with multiple AMD GPUs
2024-11-07 14:26:31 -08:00
Daniel Hiltgen
b111aa5a91
Debug logging for nvcuda init ( #7532 )
...
Some users are reporting crashes during nvcuda.dll initialization
on windows. This should help narrow down where things are going bad.
2024-11-07 14:25:53 -08:00
Daniel Hiltgen
9e83e550e1
Align rocm compiler flags ( #7467 )
...
Bring consistency with the old generate script behavior
2024-11-07 10:20:50 -08:00
Daniel Hiltgen
fc2a0715df
Be explicit for gpu library link dir ( #7560 )
...
On linux nvcc isn't automatically linking to the same cuda version.
2024-11-07 09:20:40 -08:00
Jesse Gross
3020d2dc58
docs: OLLAMA_NEW_RUNNERS no longer exists
2024-11-06 14:39:02 -08:00
Jesse Gross
a909417602
runner.go: Remove unused arguments
...
Now that server.cpp is gone, we don't need to keep passing arguments
that were only ignored and only kept for compatibility.
2024-11-06 13:32:18 -08:00
Jesse Gross
6cd566872b
sched: Lift parallel restriction for multimodal models except mllama
...
The Go runner does not have a problem with supporting parallel
requests for most multimodal models. Now that we won't be potentially
falling back to server.cpp, this restriction can be lifted.
However, the new mllama model can't support parallel requests, so we
will need to keep a restriction for that.
2024-11-06 13:32:18 -08:00
RAPID ARCHITECT
9d71bcc3e2
Update README.md ( #7516 )
...
Added Reddit Rate below Hexabot: Ollama-powered Reddit search and analysis, with Streamlit for the interface.
2024-11-05 15:07:25 -08:00
Daniel Hiltgen
a4c70fe157
One corrupt manifest should not wedge model operations ( #7515 )
...
One potential failure mode is an empty file which bubbles up as an EOF error,
leading to all pulls and listing operations failing. Instead, continue and
warn about the corrupt manifest. This also allows re-pulling the corrupt
manifest to repair the system.
2024-11-05 14:21:45 -08:00
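The skip-and-warn behavior above can be sketched as a loop that logs a decode failure and moves on instead of returning it. The `manifest` type here is a hypothetical, heavily simplified shape (real manifests carry layer objects, not strings), and manifests are passed in as strings rather than read from disk. An empty body makes `json.Decoder.Decode` return `io.EOF`, the error mentioned in the commit.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"strings"
)

// manifest is a hypothetical, simplified manifest shape for illustration.
type manifest struct {
	Layers []string `json:"layers"`
}

// loadManifests decodes each named manifest, skipping (with a warning) any
// that fail to decode instead of aborting the whole listing. An empty body
// surfaces as io.EOF from the decoder.
func loadManifests(files map[string]string) map[string]manifest {
	out := make(map[string]manifest)
	for name, body := range files {
		var m manifest
		if err := json.NewDecoder(strings.NewReader(body)).Decode(&m); err != nil {
			log.Printf("warning: skipping corrupt manifest %s: %v", name, err)
			continue
		}
		out[name] = m
	}
	return out
}

func main() {
	got := loadManifests(map[string]string{
		"library/good:latest":  `{"layers":["sha256-abc"]}`,
		"library/empty:latest": "",
	})
	fmt.Println(len(got)) // 1
}
```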
Jesse Gross
34a75102f7
prompt: Use a single token when estimating mllama context size
...
Currently we assume that images take 768 tokens of context size for
the purposes of clipping old messages that exceed the context window.
However, our mllama implementation stores the full image embedding
in a single token. As a result, there is significant waste of context
space.
Ideally, we would handle this more generically and have the
implementation report the number of tokens. However, at the moment
this would just result in a similar set of 'if' conditions in the
runner plus APIs to report it back. So for now, we just keep this
simple.
2024-11-05 10:11:50 -08:00
Med Marrouchi
4157d1f7b6
readme: add Hexabot to the list of community integrations
2024-11-05 09:06:38 -08:00
Daniel Hiltgen
4ebfa2cb91
Quiet down debug log of image payload ( #7454 )
...
Avoid excessive log spew and make consistent with chat logging
2024-11-04 13:05:16 -08:00
Daniel Hiltgen
046054fa3b
CI: Switch to v13 macos runner ( #7498 )
2024-11-04 13:02:07 -08:00
Daniel Hiltgen
95483f348b
CI: matrix strategy fix ( #7496 )
...
GitHub Actions matrix strategy can't access env settings
2024-11-04 10:48:35 -08:00
Michael Yang
f247a6233e
Merge pull request #7456 from ollama/mxyng/llama3.2-vision-mem
...
update llama3.2 vision memory estimation
2024-11-04 09:48:43 -08:00
Daniel Hiltgen
44bd9e5994
Sign windows arm64 official binaries ( #7493 )
2024-11-04 09:15:14 -08:00
suncloudsmoon
18237be9b2
readme: add TextCraft to community integrations ( #7377 )
2024-11-03 16:53:51 -08:00
Daniel Hiltgen
29ab9fa7d7
nvidia libs have inconsistent ordering ( #7473 )
...
The runtime and management libraries may not always have
identical ordering, so use the device UUID to correlate instead of ID.
2024-11-02 16:35:41 -07:00
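A sketch of correlating the two libraries by UUID instead of by numeric ID. `cudaDevice`, `nvmlDevice`, and `correlate` are hypothetical types, not the real CUDA runtime or NVML bindings. The example only shows why a UUID join holds up when the two enumerations come back in different orders.

```go
package main

import "fmt"

// cudaDevice is what a (hypothetical) runtime-library query reports.
type cudaDevice struct {
	Index int
	UUID  string
}

// nvmlDevice is what a (hypothetical) management-library query reports.
type nvmlDevice struct {
	Index   int
	UUID    string
	FreeMiB uint64
}

// correlate matches each runtime device to its management entry by UUID.
// Indices are never compared, because the libraries may order devices differently.
func correlate(cuda []cudaDevice, nvml []nvmlDevice) map[int]nvmlDevice {
	byUUID := make(map[string]nvmlDevice, len(nvml))
	for _, d := range nvml {
		byUUID[d.UUID] = d
	}
	out := make(map[int]nvmlDevice, len(cuda))
	for _, d := range cuda {
		if m, ok := byUUID[d.UUID]; ok {
			out[d.Index] = m
		}
	}
	return out
}

func main() {
	cuda := []cudaDevice{{0, "GPU-aaa"}, {1, "GPU-bbb"}}
	nvml := []nvmlDevice{{0, "GPU-bbb", 8000}, {1, "GPU-aaa", 24000}}
	m := correlate(cuda, nvml)
	fmt.Println(m[0].FreeMiB, m[1].FreeMiB) // 24000 8000
}
```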
Daniel Hiltgen
b8d5036e33
CI: omit unused tools for faster release builds ( #7432 )
...
This leverages caching and a reduced installer scope to try
to speed up builds. It also tidies up some Windows build logic
that was only relevant for the older generate/cmake builds.
2024-11-02 13:56:54 -07:00
Jesse Gross
312d9de1d1
llama: Improve error handling
...
Check for NULL return values from llama.cpp in more places and
convert them into Go errors, which should make debugging easier
in the future rather than having hidden surprises in our data
structures.
2024-11-02 13:37:55 -07:00
Jesse Gross
a103dae01e
runner.go: Only allocate 1 element embedding batches for mllama
...
Mllama has large embeddings (100 MB per image) and each embedding is
represented as 1 token when passed to llama.cpp. Batches are pre-
allocated for the size of the tokens times the batch size, so this
results in allocations of over 50 GB at the default batch size.
On some systems, these mallocs will fail.
Since an image is represented as a single token and mllama doesn't
support more than 1 image per request, we only need to allocate a
batch size of 1, which is much more reasonable. In addition, for
non-multimodal models, we don't need to allocate the embedding
batches at all.
Fixes #7464
2024-11-02 13:37:55 -07:00
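The sizing rule and the arithmetic behind the "over 50 GB" figure can be sketched as below. A default batch size of 512 is an assumption here (100 MiB × 512 = 50 GiB, which matches the message). `embeddingBatchSize` is a hypothetical helper, not the runner's actual code.

```go
package main

import "fmt"

const mib = int64(1) << 20

// embeddingBatchSize returns how many embedding slots to pre-allocate per
// batch, following the rule in the commit message: none for text-only
// models, one for mllama (one image per request, one token per image), and
// a full batch for other multimodal models.
func embeddingBatchSize(multimodal, mllama bool, batchSize int) int {
	switch {
	case !multimodal:
		return 0
	case mllama:
		return 1
	default:
		return batchSize
	}
}

func main() {
	const perImage = 100 * mib // roughly 100 MB per mllama image embedding
	const batch = 512          // assumed default batch size

	before := perImage * batch
	after := perImage * int64(embeddingBatchSize(true, true, batch))
	fmt.Printf("%d GiB before, %d MiB after\n", before>>30, after>>20) // 50 GiB before, 100 MiB after
}
```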
Michael Yang
d07cf41a97
refactor kv estimation
2024-11-01 16:23:55 -07:00
Michael Yang
8c238e70ab
mllama cross attention
2024-11-01 16:23:55 -07:00
Daniel Hiltgen
8a9bb0d000
Add basic mllama integration tests ( #7455 )
2024-10-31 17:25:48 -07:00
Jesse Gross
26acdcf44e
runner.go: Don't set cross attention before sending embeddings
...
Currently if an input has embeddings at any point then we will set
cross attention to true from the beginning. This means that any
tokens before the embeddings are sent will incorrectly have cross
attention layers applied.
This only sets cross attention when we have an embedding, either
previously in this sequence or in the cache. It also makes cross
attention capable of supporting parallelism at the runner level,
though the mllama implementation doesn't support that yet.
2024-10-31 13:56:08 -07:00
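A minimal sketch of the corrected rule: cross attention turns on only once an embedding has actually been seen, either earlier in the cache or at the current position. `input` and `crossAttentionFlags` are hypothetical; the real runner tracks this state differently.

```go
package main

import "fmt"

// input is a hypothetical prompt element: a text token or an embedding.
type input struct {
	Token int
	Embed []float32
}

// crossAttentionFlags reports, for each input, whether cross attention should
// be enabled when it is decoded. Tokens before the first embedding, with no
// embedding already in the cache, must not get cross attention.
func crossAttentionFlags(cacheHasEmbed bool, inputs []input) []bool {
	seen := cacheHasEmbed
	flags := make([]bool, len(inputs))
	for i, in := range inputs {
		if in.Embed != nil {
			seen = true
		}
		flags[i] = seen
	}
	return flags
}

func main() {
	prompt := []input{{Token: 1}, {Embed: []float32{0.1}}, {Token: 2}}
	fmt.Println(crossAttentionFlags(false, prompt)) // [false true true]
	fmt.Println(crossAttentionFlags(true, prompt))  // [true true true]
}
```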
Daniel Hiltgen
921779bb10
Give unicode test more time to run ( #7437 )
...
* Give unicode test more time to run
Some slower GPUs (or partial CPU/GPU loads) can take more than the default 30s to complete this test
* Give more time for concurrency test
CPU inference can be very slow under stress
2024-10-31 13:35:31 -07:00
Daniel Hiltgen
16f4eabe2d
Refine default thread selection for NUMA systems ( #7322 )
...
Until we have full NUMA support, this adjusts the default thread selection
algorithm to count up the number of performance cores across all sockets.
2024-10-30 15:05:45 -07:00
Jesse Gross
c826e57475
runner.go: Better abstract vision model integration
...
- Update mllama to take the cross attention state as embeddings in
a batch, more similar to how Llava handles it. This improves
integration with the input cache.
- Pass locations in a prompt for embeddings using tags similar to Llava.
- Abstract the interface to vision models so the main runner accesses Clip
and Mllama similarly.
Co-authored-by: Michael Yang <mxyng@pm.me >
2024-10-30 14:53:43 -07:00
Daniel Hiltgen
712e99d477
Soften windows clang requirement ( #7428 )
...
This will no longer error if built with regular gcc on windows. To help
triage issues that may come in related to different compilers, the runner now
reports the compiler used by cgo.
2024-10-30 12:28:36 -07:00
Daniel Hiltgen
b754f5a6a3
Remove submodule and shift to Go server - 0.4.0 ( #7157 )
...
* Remove llama.cpp submodule and shift new build to top
* CI: install msys and clang gcc on win
Needed for deepseek to work properly on windows
2024-10-30 10:34:28 -07:00
Daniel Hiltgen
a805e5947e
Move windows app out of preview ( #7347 )
2024-10-30 09:24:59 -07:00
Daniel Hiltgen
91dfbb1bba
windows: Support alt install paths, fit and finish ( #6967 )
...
* windows: Support alt install paths
Advanced users are leveraging innosetup's /DIR switch to target
an alternate location, but we get confused by things not existing in the LocalAppData dir.
This also hardens the server path lookup code for a future attempt to unify with a ./bin prefix
* Fit and finish improvements for windows app
Document alternate install location instructions for binaries and model.
Pop up progress UI for upgrades (automatic, with cancel button).
Expose non-default port in menu to disambiguate multiple instances.
Set minimum Windows version to 10 22H2
2024-10-30 09:24:31 -07:00
Patrick Devine
db1842b9e1
add more tests for getting the optimal tiled canvas ( #7411 )
2024-10-29 16:28:02 -07:00
Daniel Hiltgen
c9ca386131
Switch windows to clang ( #7407 )
...
* Switch over to clang for deepseek on windows
The patch for deepseek requires clang on windows. gcc on windows
has a buggy c++ library and can't handle the unicode characters
* Fail fast with wrong compiler on windows
Avoid users mistakenly building with GCC when we need clang
2024-10-29 13:15:04 -07:00
Jesse Gross
078f666f73
tests: Add test for Unicode processing
2024-10-28 18:12:29 -07:00
Jesse Gross
de1557a0dc
runner.go: Better handle return NULL values from llama.cpp
...
Llama.cpp sometimes returns NULL as a return value to report an
error. We should explicitly check for this and convert it to a Go
error rather than putting NULL in our data structures and waiting
for it to blow up later.
2024-10-28 18:12:29 -07:00
Patrick Devine
084929c293
add mllama image processing to the generate handler ( #7384 )
2024-10-28 13:51:19 -07:00
Daniel Hiltgen
abd5dfd06a
Bump to latest Go 1.22 patch ( #7379 )
2024-10-26 17:03:37 -07:00
Daniel Hiltgen
099f7077a1
Fix deepseek deseret regex ( #7369 )
...
On Windows, when compiled with gcc, the C++ regex library failed to handle
the Deseret characters.
2024-10-26 14:58:54 -07:00
Daniel Hiltgen
d7c94e0ca6
Better support for AMD multi-GPU on linux ( #7212 )
...
* Better support for AMD multi-GPU
This resolves a number of problems related to AMD multi-GPU setups on linux.
The numeric IDs used by rocm are not the same as the numeric IDs exposed in
sysfs although the ordering is consistent. We have to count up from the first
valid gfx (major/minor/patch with non-zero values) we find starting at zero.
There are 3 different env vars for selecting GPUs, and only ROCR_VISIBLE_DEVICES
supports UUID based identification, so we should favor that one, and try
to use UUIDs if detected to avoid potential ordering bugs with numeric IDs
* ROCR_VISIBLE_DEVICES only works on linux
Use only the numeric-ID-based HIP_VISIBLE_DEVICES on Windows
2024-10-26 14:04:14 -07:00
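The ID mapping described above can be sketched as follows. `gfxNode` and `assignROCmIDs` are hypothetical names. "Valid gfx" is read here as "not all of major/minor/patch are zero", since a real target such as gfx1030 (10.3.0) has a zero patch. Nodes that fail the check, such as CPU entries in the topology, are skipped, and each valid GPU gets the next ID starting from zero.

```go
package main

import "fmt"

// gfxNode is a hypothetical sysfs topology entry with its gfx target version.
type gfxNode struct {
	SysfsIndex          int
	Major, Minor, Patch int
}

// assignROCmIDs maps sysfs indices to the numeric IDs ROCm uses. Ordering is
// the same in both, but nodes without a valid gfx version are skipped, so
// ROCm IDs count up from zero starting at the first valid GPU.
func assignROCmIDs(nodes []gfxNode) map[int]int {
	ids := make(map[int]int)
	next := 0
	for _, n := range nodes {
		if n.Major == 0 && n.Minor == 0 && n.Patch == 0 {
			continue // not a GPU, for example a CPU node
		}
		ids[n.SysfsIndex] = next
		next++
	}
	return ids
}

func main() {
	nodes := []gfxNode{
		{SysfsIndex: 0},                      // CPU node: gfx 0.0.0
		{SysfsIndex: 1, Major: 11},           // gfx1100
		{SysfsIndex: 2, Major: 10, Minor: 3}, // gfx1030: zero patch, still valid
	}
	fmt.Println(assignROCmIDs(nodes)) // map[1:0 2:1]
}
```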
Daniel Hiltgen
35ec7f079f
Fix unicode output on windows with redirect to file ( #7358 )
...
If we're not writing out to a terminal, avoid setting the console mode
on windows, which corrupts the output file.
2024-10-25 13:43:16 -07:00
Daniel Hiltgen
5231ae52d9
Fix incremental build file deps ( #7361 )
...
The common src/hdr defs should be in the common definitions, not gpu specific.
2024-10-25 11:50:45 -07:00
Daniel Hiltgen
3085c47bea
Improve dependency gathering logic ( #7345 )
...
This unifies the rocm/cuda dependency logic into the makefile
and fixes a missing define which broke windows rocm
2024-10-24 09:51:53 -07:00
Bill Wang
0ccc73251a
fix #7247 - invalid image input ( #7249 )
...
---------
Co-authored-by: Bill Wang <bill.wang@bill.wang >
2024-10-23 10:31:04 -07:00
Daniel Hiltgen
dc6fe82051
integration: harden embedding test ( #7306 )
...
Use cosine similarity to make the embeddings tests more robust
2024-10-22 15:25:22 -07:00
Patrick Devine
d78fb62056
default to "FROM ." if a Modelfile isn't present ( #7250 )
2024-10-22 13:32:24 -07:00
Daniel Hiltgen
5c44461ccf
Fix rocm windows build and clean up dependency gathering ( #7305 )
...
On windows ensure windows version define is properly set for rocm.
Remove duplicate rocm arch flags.
Resolve wildcards in the targets so parallel builds don't race.
Use readlink to resolve rocm dependencies since wildcards omit libelf
Keep windows rocm deps aligned with unified packaging model
2024-10-22 12:54:15 -07:00
Jesse Gross
03e40efa51
runner.go: Merge partial unicode characters before sending
...
We check for partial unicode characters and accumulate them before
sending. However, when we did send, we still sent each individual piece
separately, leading to broken output. This combines everything into
a single group, which is also more efficient.
This also switches to the built-in check for valid unicode characters,
which is stricter. After this, we should never send back an invalid
sequence.
Fixes #7290
2024-10-22 12:07:51 -07:00
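A sketch of the corrected behavior: pieces are buffered until their concatenation is valid UTF-8, then sent as one string. `responseBuffer` is a hypothetical type, and the standard library's `utf8.ValidString` stands in for the built-in check the commit mentions.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// responseBuffer accumulates token pieces that may split a multi-byte
// character and releases them only once the joined text is valid UTF-8.
type responseBuffer struct {
	pending []string
}

// add appends a piece and returns the text to send, if any. Once the pending
// pieces form valid UTF-8 they are sent together as a single string rather
// than piece by piece.
func (b *responseBuffer) add(piece string) (string, bool) {
	b.pending = append(b.pending, piece)
	joined := strings.Join(b.pending, "")
	if !utf8.ValidString(joined) {
		return "", false // wait for the rest of the character
	}
	b.pending = b.pending[:0]
	return joined, true
}

func main() {
	var b responseBuffer
	// "世" is E4 B8 96 in UTF-8; here it arrives split across two tokens.
	for _, piece := range []string{"Hi ", "\xe4\xb8", "\x96"} {
		if out, ok := b.add(piece); ok {
			fmt.Printf("%q\n", out) // prints "Hi " then "世"
		}
	}
}
```

A production version would also need a bound so that a genuinely invalid sequence cannot stall output indefinitely.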
Mattt
23f746508d
readme: add Ollama for Swift to the community integrations ( #7295 )
2024-10-21 22:29:11 -07:00
Jeffrey Morgan
48708ca0d5
server: allow vscode-webview origin ( #7273 )
2024-10-19 14:06:41 -07:00
Patrick Devine
c7cb0f0602
image processing for llama3.2 ( #6963 )
...
Co-authored-by: jmorganca <jmorganca@gmail.com >
Co-authored-by: Michael Yang <mxyng@pm.me >
Co-authored-by: Jesse Gross <jesse@ollama.com >
2024-10-18 16:12:35 -07:00
Daniel Hiltgen
bf4018b9ec
llama: Decouple patching script from submodule ( #7139 )
...
* Refine llama.cpp vendoring workflow tools
Switch from the sync.sh over to make based tooling
* Run new make sync and patch flow
2024-10-17 15:03:09 -07:00
Daniel Hiltgen
f86d00cd95
llama: add compiler tags for cpu features ( #7137 )
...
This adds the ability to customize the default runner with user specified flags
2024-10-17 13:43:20 -07:00
Gabe Goodhart
f2890a4494
IBM granite/granitemoe architecture support ( #6760 )
...
* fix(ext_server): Port llama.cpp sampling refactors to ext_server
This was a fairly large changeset. I closely followed the changes here:
https://github.com/ggerganov/llama.cpp/commit/df270ef74596da8f1178f08991f4c51f18c9ee82
Branch: IBMGraniteArchitectureSupport
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
* fix(server.cpp): Refactor server.cpp logging for llama.cpp overhaul
Branch: IBMGraniteArchitectureSupport
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
* feat: Bump llama.cpp to the latest master with `granite` support
This does not yet have granite MoE support, but that can come in a
follow up PR
Branch: IBMGraniteArchitectureSupport
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
* fix(patches): Update all patches (except solar-pro) to work with bumped llama.cpp
Branch: IBMGraniteArchitectureSupport
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
* fix(solar): Update solar patch for llama.cpp bump
Branch: IBMGraniteArchitectureSupport
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
* feat(llama.cpp): Bump llama.cpp for granitemoe support
Branch: IBMGraniteArchitectureSupport
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
* feat(llama.cpp): Bump llama.cpp for granitemoe support
Branch: IBMGraniteArchitectureSupport
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
* fix(solar): Update the solar-pro patch for latest llama.cpp bump
Branch: IBMGraniteArchitectureSupport
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
* feat(llama.cpp): Bump to the latest master of llama.cpp
Branch: IBMGraniteArchitectureSupport
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
* fix(patches): Update all patches for latest bump
Branch: IBMGraniteArchitectureSupport
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
* feat(llama): Always run sync.sh from the right directory
Branch: IBMGraniteArchitectureSupport
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
* fix(llama/patches): Update llama patches
Branch: IBMGraniteArchitectureSupport
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
* feat(llama)!: Rough sync with llama.cpp submodule
There are a number of changes that will need to be propagated to llama.go
before any of this works!
Branch: IBMGraniteArchitectureSupport
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
* fix(llama/patches): Add a patch and update for missing ggml-impl.h include
This include is where the ggml_cgraph struct is defined. It is included in
many of the .c files to define the forward declaration in ggml.h. It seems
that with the subset of code included here, the import was somehow lost (or
out-of-order) when building, so adding this include to llama.cpp fixes the
missing definition.
Branch: IBMGraniteArchitectureSupport
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
* fix(llama/sync): Add missing ggml-cpu-impl.h copy-over in sync.sh
Branch: IBMGraniteArchitectureSupport
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
* fix(llama): Add missing log.cpp
This was added as part of the logging overhaul done in llama.cpp
Branch: IBMGraniteArchitectureSupport
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
* fix(llama): Overhaul use of sampling module for llama.cpp changes
The changes here reflect the changes made in the big llama.cpp sampling PR
https://github.com/ggerganov/llama.cpp/pull/9294
The sampling functionality is now broken into the base interface
(llama_sampler) and the generation implementation (gpt_sampler). The
changes here reflect that. Since the sampling.h/sampling.cpp code uses c++
STL headers, the sampling_ext.[h|cpp] wrapper is maintained to allow go to
access a pure-C interface.
Branch: IBMGraniteArchitectureSupport
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
* fix(llama): Fix the impl of SampleTokenGreedy for new sampling
I don't think this method is currently used, so it could probably just be
removed so that all sampling goes through the GPT interface, but in the
interest of doing no harm, this should keep the method working as expected.
Branch: IBMGraniteArchitectureSupport
* fix(llama): Remove unused SampleTokenGreedy
Branch: IBMGraniteArchitectureSupport
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
* fix(sync): Remove bash-specific change to sync.sh
Branch: IBMGraniteArchitectureSupport
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
* chore(gofumpt): Format on llama.go to pass linting
Branch: IBMGraniteArchitectureSupport
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
* fix(llm): Fix missing <thread> include in ext_server
Branch: IBMGraniteArchitectureSupport
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
* fix(llama): Remove TODO about grammar_first
This feature was not used/needed previously so should be fine without
plumbing it through now.
Branch: IBMGraniteArchitectureSupport
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
* fix(llama): Better naming for sampling wrapper and args
Branch: IBMGraniteArchitectureSupport
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
* fix(llama): Fix patch 05 to use new wrapper api and re-sync
Branch: IBMGraniteArchitectureSupport
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
* runner: Flush pending responses before returning
If there are any pending responses (such as from potential stop
tokens) then we should send them back before ending the sequence.
Otherwise, we can be missing tokens at the end of a response.
Fixes #6707
* fix(llama/sampling): Use gpt_sampler with a forward declaration
Branch: IBMGraniteArchitectureSupport
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
* fix(llama): Remove unnecessary patch for gguf impl header
This was caused by an earlier mistake in the embeddings patch that was
dereferencing the pointer instead of using the wrapper API.
Branch: IBMGraniteArchitectureSupport
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
* fix(llm): Remove use of deprecated --log-disable flag
Branch: IBMGraniteArchitectureSupport
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
---------
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
2024-10-17 11:59:52 -07:00
Daniel Hiltgen
05cd82ef94
Rename gpu package discover ( #7143 )
...
Cleaning up go package naming
2024-10-16 17:45:00 -07:00
Daniel Hiltgen
7d6eb0d4c3
Move macos v11 support flags to build script ( #7203 )
...
Having v11 support hard-coded into the cgo settings causes warnings
for newer Xcode versions. This should help keep the build clean for users
building from source with the latest tools, while still allowing us to target
the older OS via our CI processes.
2024-10-16 12:49:46 -07:00
Daniel Hiltgen
24636dfa87
Discovery CPU details for default thread selection ( #6264 )
...
On windows, detect large multi-socket systems and reduce to the number of cores
in one socket for best performance
2024-10-15 11:36:08 -07:00
JHubi1
1d7fa3ad2d
Adding 'Ollama App' as community integrations ( #6465 )
2024-10-15 09:57:32 -07:00
frob
09035b71cd
Add missing BF16 tensor type. ( #7193 )
...
Co-authored-by: Richard Lyons <frob@cloudstaff.com >
2024-10-14 17:06:35 -07:00
Daniel Hiltgen
f3c8b898cd
Track GPU discovery failure information ( #5820 )
...
* Expose GPU discovery failure information
* Remove exposed API for now
2024-10-14 16:26:45 -07:00
Daniel Hiltgen
5dd0477fd4
Fix regression on older macos versions ( #7192 )
...
The new cgo compilation requires a flag to target older macos versions
2024-10-13 10:47:42 -07:00
Daniel Hiltgen
c3d321d405
llm: Remove GGML_CUDA_NO_PEER_COPY for ROCm ( #7174 )
...
This workaround logic in llama.cpp is causing crashes for users with less system memory than VRAM.
2024-10-12 09:56:49 -07:00
Jesse Gross
7fe3902552
cli: Send all images in conversation history
...
Currently the CLI only sends images from the most recent image-
containing message. This prevents doing things like sending
one message with an image and then a follow-up message with a
second image and asking for a comparison based on additional
information not present in any text that was output.
It's possible that some models have a problem with this but the
CLI is not the right place to do this since any adjustments are
model-specific and should affect all clients.
Both llava:34b and minicpm-v do reasonable things with multiple
images in the history.
2024-10-10 11:21:51 -07:00
Jesse Gross
0077e22d52
runner.go: Handle truncation of tokens for stop sequences
...
When a single token contains both text to be returned and a stop
sequence, this causes an out of bounds error when we update the
cache to match our text. This is because we currently assume that
removing the stop sequence will consume at least one token.
This also inverts the logic to deal with positive numbers, rather
than a value to be subtracted, which is easier to reason about.
Fixes #7153
2024-10-09 20:39:04 -07:00
Jesse Gross
03408f3437
server: Don't clear cmd when closing a server
...
Close can be called on an LLM server if the runner subprocess dies.
However, the Ollama scheduler code may not know about this yet and
still try to access it. In this case, it is important that 'cmd'
is still available as it is used to check on the status of the
subprocess. If this happens, Kill may be called twice on the subprocess -
that is fine.
In addition, model unloading may race with new accesses, so we should
hold a lock around this. This may result in the model being reloaded
after the first close call - this is also fine as close will be called
again later.
2024-10-09 20:39:04 -07:00
Daniel Hiltgen
cd7e01e8b9
fix vendoring attribute for metal ( #7156 )
...
Add missing metal files to vendoring list
2024-10-09 15:22:36 -07:00
Daniel Hiltgen
7a962bd802
fix vendoring attribute ( #7155 )
...
Expand out the file extensions for vendored code so git reports the
status correctly
2024-10-09 14:21:02 -07:00
Daniel Hiltgen
f9584deba5
Fix build leakages ( #7141 )
...
The recent change to applying patches leaves the submodule dirty based on
"new commits" being present. This ensures we clean up so the tree no longer
reports dirty after a `go generate ./...` run.
The Makefile was being a bit too aggressive in cleaning things up and would result in deleting the placeholder files which someone might accidentally commit.
2024-10-08 13:04:59 -07:00
Jeffrey Morgan
96efd9052f
Re-introduce the llama package ( #5034 )
...
* Re-introduce the llama package
This PR brings back the llama package, making it possible to call llama.cpp and
ggml APIs from Go directly via CGo. This has a few advantages:
- C APIs can be called directly from Go without needing to use the previous
"server" REST API
- On macOS and for CPU builds on Linux and Windows, Ollama can be built without
a go generate ./... step, making it easy to get up and running to hack on
parts of Ollama that don't require fast inference
- Faster build times for AVX, AVX2, CUDA and ROCm (a full build of all runners
takes <5 min on a fast CPU)
- No git submodule making it easier to clone and build from source
This is a big PR, but much of it is vendor code except for:
- llama.go CGo bindings
- example/: a simple example of running inference
- runner/: a subprocess server designed to replace the llm/ext_server package
- Makefile: an as-minimal-as-possible Makefile to build the runner package for
different targets (cpu, avx, avx2, cuda, rocm)
Co-authored-by: Jesse Gross <jesse@ollama.com >
Co-authored-by: Daniel Hiltgen <daniel@ollama.com >
* cache: Clear old KV cache entries when evicting a slot
When forking a cache entry, if no empty slots are available we
evict the least recently used one and copy over the KV entries
from the closest match. However, this copy does not overwrite
existing values but only adds new ones. Therefore, we need to
clear the old slot first.
This change fixes two issues:
- The KV cache fills up and runs out of space even though we think
we are managing it correctly
- Performance gets worse over time as we use new cache entries that
are not hot in the processor caches
* doc: explain golang objc linker warning (#6830 )
* llama: gather transitive dependencies for rocm for dist packaging (#6848 )
* Refine go server makefiles to be more DRY (#6924 )
This breaks up the monolithic Makefile for the Go based runners into a
set of utility files as well as recursive Makefiles for the runners.
Files starting with the name "Makefile" are buildable, while files that
end with ".make" are utilities to include in other Makefiles. This
reduces the amount of nearly identical targets and helps set a pattern
for future community contributions for new GPU runner architectures.
When we are ready to switch over to the Go runners, these files should
move to the top of the repo, and we should add targets for the main CLI,
as well as a helper "install" (put all the built binaries on the local
system in a runnable state) and "dist" target (generate the various
tar/zip files for distribution) for local developer use.
* llama: don't create extraneous directories (#6988 )
* llama: Exercise the new build in CI (#6989 )
Wire up some basic sanity testing in CI for the Go runner. GPU runners are not covered yet.
* llama: Refine developer docs for Go server (#6842 )
This enhances the documentation for development focusing on the new Go
server. After we complete the transition further doc refinements
can remove the "transition" discussion.
* runner.go: Allocate batches for all sequences during init
We should tell the model that we could have full batches for all
sequences. We already do this when we allocate the batches but it was
missed during initialization.
* llama.go: Don't return nil from Tokenize on zero length input
Potentially receiving nil in a non-error condition is surprising to
most callers - it's better to return an empty slice.
* runner.go: Remove stop tokens from cache
If the last token is EOG then we don't return this and it isn't
present in the cache (because it was never submitted to Decode).
This works well for extending the cache entry with a new sequence.
However, for multi-token stop sequences, we won't return any of the
tokens but all but the last one will be in the cache. This means
when the conversation continues the cache will contain tokens that
don't overlap with the new prompt.
This works (we will pick up the portion where there is overlap) but
it causes unnecessary cache thrashing because we will fork the original
cache entry as it is not a perfect match.
By trimming the cache to the tokens that we actually return this
issue can be avoided.
* runner.go: Simplify flushing of pending tokens
* runner.go: Update TODOs
* runner.go: Don't panic when processing sequences
If there is an error processing a sequence, we should return a
clean HTTP error back to Ollama rather than panicking. This will
make us more resilient to transient failures.
Panics can still occur during startup as there is no way to serve
requests if that fails.
Co-authored-by: jmorganca <jmorganca@gmail.com >
* runner.go: More accurately capture timings
Currently prompt processing time doesn't capture the time that it takes
to tokenize the input, only decoding time. We should capture the
full process to more accurately reflect reality. This is especially
true once we start processing images where the initial processing
can take significant time. This is also more consistent with the
existing C++ runner.
* runner.go: Support for vision models
In addition to bringing feature parity with the C++ runner, this also
incorporates several improvements:
- Cache prompting works with images, avoiding the need to re-decode
embeddings for every message in a conversation
- Parallelism is supported, avoiding the need to restrict to one
sequence at a time. (Though for now Ollama will not schedule
them while we might need to fall back to the old runner.)
Co-authored-by: jmorganca <jmorganca@gmail.com >
* runner.go: Move Unicode checking code and add tests
* runner.go: Export external cache members
Runner and cache are in the same package so the change doesn't
affect anything but it is more internally consistent.
* runner.go: Image embedding cache
Generating embeddings from images can take significant time (on
my machine between 100ms and 8s depending on the model). Although
we already cache the result of decoding these images, the embeddings
need to be regenerated every time. This is not necessary if we get
the same image over and over again, for example, during a conversation.
This currently uses a very small cache with a very simple algorithm
but it is easy to improve as is warranted.
* llama: catch up on patches
Carry forward solar-pro and cli-unicode patches
* runner.go: Don't re-allocate memory for every batch
We can reuse memory allocated from batch to batch since batch
size is fixed. This both saves the cost of reallocation and
keeps the cache lines hot.
This results in a roughly 1% performance improvement for token
generation with Nvidia GPUs on Linux.
* runner.go: Default to classic input cache policy
The input cache as part of the go runner implemented a cache
policy that aims to maximize hit rate in both single and multi-
user scenarios. When there is a cache hit, the response is
very fast.
However, performance is actually slower when there is an input
cache miss due to worse GPU VRAM locality. This means that
performance is generally better overall for multi-user scenarios
(better input cache hit rate, locality was relatively poor already).
But worse for single users (input cache hit rate is about the same,
locality is now worse).
This defaults the policy back to the old one to avoid a regression
but keeps the new one available through an environment variable
OLLAMA_MULTIUSER_CACHE. This is left undocumented as the goal is
to improve this in the future to get the best of both worlds
without user configuration.
For inputs that result in cache misses, on Nvidia/Linux this
change improves performance by 31% for prompt processing and
13% for token generation.
* runner.go: Increase size of response channel
Generally the CPU can easily keep up with handling responses that
are generated but there's no reason not to let generation continue
and handle things in larger batches if needed.
* llama: Add CI to verify all vendored changes have patches (#7066 )
Make sure we don't accidentally merge changes in the vendored code
that aren't also reflected in the patches.
* llama: adjust clip patch for mingw utf-16 (#7065 )
* llama: adjust clip patch for mingw utf-16
* llama: ensure static linking of runtime libs
Avoid runtime dependencies on non-standard libraries
* runner.go: Enable llamafile (all platforms) and BLAS (Mac OS)
These are two features that are shown on llama.cpp's system info
that are currently different between the two runners. On my test
systems the performance difference is very small to negligible
but it is probably still good to equalize the features.
* llm: Don't add BOS/EOS for tokenize requests
This is consistent with what server.cpp currently does. It affects
things like token processing counts for embedding requests.
* runner.go: Don't cache prompts for embeddings
Our integration with server.cpp implicitly disables prompt caching
because it is not part of the JSON object being parsed; this makes
the Go runner behave similarly.
Prompt caching has been seen to affect the results of text completions
on certain hardware. The results are not wrong either way but they
are non-deterministic. However, embeddings seem to be affected even
on hardware that does not show this behavior for completions. For
now, it is best to maintain consistency with the existing behavior.
* runner.go: Adjust debug log levels
Add system info printed at startup and quiet down noisier logging.
* llama: fix compiler flag differences (#7082 )
Adjust the flags for the new Go server to more closely match the
generate flow
* llama: refine developer docs (#7121 )
* llama: doc and example clean up (#7122 )
* llama: doc and example clean up
* llama: Move new dockerfile into llama dir
Temporary home until we fully transition to the Go server
* llama: runner doc cleanup
* llama.go: Add description for Tokenize error case
---------
Co-authored-by: Jesse Gross <jesse@ollama.com >
Co-authored-by: Daniel Hiltgen <daniel@ollama.com >
Co-authored-by: Daniel Hiltgen <dhiltgen@users.noreply.github.com >
2024-10-08 08:53:54 -07:00
Shifra Goldstone
de982616f1
readme: replace stale links to LangChain documentation ( #7117 )
2024-10-07 21:16:56 -04:00
hidden1nin
defbf9425a
readme: add G1 to list of community integrations ( #7096 )
2024-10-05 11:57:53 -07:00
Alex Mavrogiannis
f40bb398f6
Stop model before deletion if loaded ( fixed #6957 ) ( #7050 )
2024-10-01 15:45:43 -07:00
zmldndx
79d3b1e2bd
readme: add ARGO LLM tool to community integrations ( #7027 )
2024-09-29 13:01:01 -07:00
Blake Mizerany
03608cb46e
server: close response body on error ( #6986 )
...
This change closes the response body when an error occurs in
makeRequestWithRetry. Previously, the first, non-200 response body was
not closed before reattempting the request. This change ensures that
the response body is closed in all cases where an error occurs,
preventing leaks of file descriptors.
Fixes #6974
2024-09-26 12:00:31 -07:00
Xe Iaso
450acb71a6
readme: fix llama3.1 -> llama3.2 typo ( #6962 )
2024-09-25 11:53:47 -07:00
Jeffrey Morgan
55ea963c9e
update default model to llama3.2 ( #6959 )
2024-09-25 11:11:22 -07:00
Daniel Hiltgen
e9e9bdb8d9
CI: Fix win arm version defect ( #6940 )
...
Write-Host in PowerShell writes directly to the console and will not be picked
up by a pipe. Echo or Write-Output will.
2024-09-24 15:18:10 -07:00
Alex Yang
35bb6d32b3
readme: update llamaindex links ( #6939 )
2024-09-24 12:15:43 -07:00
Deep Lakhani
98701b58b3
readme: add LLMChat to community integrations ( #6919 )
2024-09-23 17:49:46 -07:00
Mahesh Sathiamoorthy
ad935f45ac
examples: use punkt_tab instead of punkt ( #6907 )
...
This was causing an error since we depend on punkt_tab.
2024-09-21 18:55:28 -07:00
Daniel Hiltgen
dbba73469d
runner: Set windows above normal priority ( #6905 )
...
When the subprocess runs as a background service, Windows may
throttle it, which can lead to thrashing and a very poor token rate.
2024-09-21 16:54:49 -07:00
Daniel Hiltgen
6c2eb73a70
Fix missing dep path on windows CPU runners ( #6884 )
...
GPU runners handled the dependency path properly, but CPU runners didn't, which
resulted in missing VC redist libraries on systems where the user didn't
already have them installed from some other app.
2024-09-21 16:28:29 -07:00
Daniel Hiltgen
2a038c1d7e
CI: win arm artifact dist dir ( #6900 )
...
The upload artifact is missing the dist prefix since all
payloads are in the same directory, so restore the prefix
on download.
2024-09-20 19:16:18 -07:00
Daniel Hiltgen
616c5eafee
CI: win arm adjustments ( #6898 )
2024-09-20 16:58:56 -07:00
Daniel Hiltgen
f5ff917b1d
CI: adjust step ordering for win arm to match x64 ( #6895 )
2024-09-20 14:20:57 -07:00
Daniel Hiltgen
d632e23fba
Add Windows arm64 support to official builds ( #5712 )
...
* Unified arm/x86 windows installer
This adjusts the installer payloads to be architecture-aware so we can carry
both amd64 and arm64 binaries in the installer, and install only the applicable
architecture at install time.
* Include arm64 in official windows build
* Harden schedule test for slow windows timers
This test seems to be a bit flaky on windows, so give it more time to converge
2024-09-20 13:09:38 -07:00
Patrick Devine
5804cf1723
documentation for stopping a model ( #6766 )
2024-09-18 16:26:42 -07:00
Ryan Marten
bf7ee0f4d4
examples: add python examples for bespoke-minicheck ( #6841 )
2024-09-18 09:35:25 -07:00
Michael Yang
504a410f02
llm: add solar pro (preview) ( #6846 )
2024-09-17 18:11:26 -07:00
Jeffrey Morgan
d05da29912
server: add tool parsing support for nemotron-mini ( #6849 )
2024-09-17 18:06:16 -07:00
Michael Yang
72962c6e08
Merge pull request #6833 from ollama/mxyng/git-am
...
make patches git am-able
2024-09-17 16:33:23 -07:00
Michael Yang
7bd7b02712
make patches git am-able
...
Raw diffs can be applied using `git apply` but not with `git am`. Git
patches, e.g. those produced by `git format-patch`, are both apply-able and am-able.
2024-09-17 15:26:40 -07:00
Daniel Hiltgen
8f9ab5e14d
CI: dist directories no longer present ( #6834 )
...
The new buildx based build no longer leaves the dist/linux-* directories
around, so we don't have to clean them up before uploading.
2024-09-16 17:31:37 -07:00
Daniel Hiltgen
7717bb6a84
CI: clean up naming, fix tagging latest ( #6832 )
...
The rocm CI step for RCs was incorrectly tagging them as the latest rocm build.
The multiarch manifest was incorrectly tagged twice (with and without the
prefix "v"). Static windows artifacts weren't being carried between build
jobs. This also fixes the latest tagging script.
2024-09-16 16:18:41 -07:00
Daniel Hiltgen
0ec2915ea7
CI: set platform build build_linux script to keep buildx happy ( #6829 )
...
The runners don't have emulation set up, so the default multi-platform build
won't work.
2024-09-16 14:07:29 -07:00
Michael Yang
c9a7541b9c
readme: add Agents-Flex to community integrations ( #6788 )
2024-09-16 13:42:52 -07:00
Patrick Devine
d81cfd7d6f
fix typo in import docs ( #6828 )
2024-09-16 11:48:14 -07:00
Pepo
b330c830d3
readme: add vim-intelligence-bridge to Terminal section ( #6818 )
2024-09-15 21:20:36 -04:00
Edward Cui
d889c6fd07
readme: add Obsidian Quiz Generator plugin to community integrations ( #6789 )
2024-09-14 23:52:37 -04:00
Daniel Hiltgen
56b9af336a
Fix incremental builds on linux ( #6780 )
...
scripts: fix incremental builds on linux or similar
2024-09-13 08:24:08 -07:00
Daniel Hiltgen
fda0d3be52
Use GOARCH for build dirs ( #6779 )
...
Corrects x86_64 vs amd64 discrepancy
2024-09-12 16:38:05 -07:00
Daniel Hiltgen
cd5c8f6471
Optimize container images for startup ( #6547 )
...
* Optimize container images for startup
This change adjusts how to handle runner payloads to support
container builds where we keep them extracted in the filesystem.
This makes it easier to optimize the cpu/cuda vs cpu/rocm images for
size, and should result in faster startup times for container images.
* Refactor payload logic and add buildx support for faster builds
* Move payloads around
* Review comments
* Converge to buildx based helper scripts
* Use docker buildx action for release
2024-09-12 12:10:30 -07:00
dcasota
fef257c5c5
examples: updated requirements.txt for privategpt example
2024-09-11 18:56:56 -07:00
Adrian Cole
d066d9b8e0
examples: polish loganalyzer example ( #6744 )
2024-09-11 18:37:37 -07:00
RAPID ARCHITECT
5a00dc9fc9
readme: add ollama_moe to community integrations ( #6752 )
2024-09-11 18:36:26 -07:00
Jesse Gross
c354e87809
Merge pull request #6767 from ollama/jessegross/bug_6707
...
runner: Flush pending responses before returning
2024-09-11 17:20:22 -07:00
Jesse Gross
93ac3760cb
runner: Flush pending responses before returning
...
If there are any pending responses (such as from potential stop
tokens) then we should send them back before ending the sequence.
Otherwise, we can be missing tokens at the end of a response.
Fixes #6707
2024-09-11 16:39:32 -07:00
Patrick Devine
abed273de3
add "stop" command ( #6739 )
2024-09-11 16:36:21 -07:00
Michael Yang
034392624c
Merge pull request #6762 from ollama/mxyng/show-output
...
refactor show output
2024-09-11 14:58:40 -07:00
Michael Yang
ecab6f1cc5
refactor show output
...
fixes line wrapping on long texts
2024-09-11 14:23:09 -07:00
Petr Mironychev
7d6900827d
readme: add QodeAssist to community integrations ( #6754 )
2024-09-11 13:19:49 -07:00
Daniel Hiltgen
9246e6dd15
Verify permissions for AMD GPU ( #6736 )
...
This adds back a check which was lost many releases back to verify /dev/kfd permissions
which when lacking, can lead to confusing failure modes of:
"rocBLAS error: Could not initialize Tensile host: No devices found"
This implementation does not hard fail the serve command but instead will fall back to CPU
with an error log. In the future we can include this in the GPU discovery UX to show
detected but unsupported devices we discovered.
2024-09-11 11:38:25 -07:00
Michael Yang
735a0ca2e4
Merge pull request #6732 from ollama/mxyng/debug-proxy
...
add *_proxy to env map for debugging
2024-09-10 16:13:25 -07:00
Michael Yang
dddb72e084
add *_proxy for debugging
2024-09-10 09:43:35 -07:00
Jeffrey Morgan
83a9b5271a
docs: update examples to use llama3.1 ( #6718 )
2024-09-09 22:47:16 -07:00
Daniel Hiltgen
4a8069f9c4
Quiet down Docker's new lint warnings ( #6716 )
...
* Quiet down Docker's new lint warnings
Docker has recently added lint warnings to build. This cleans up those warnings.
* Fix go lint regression
2024-09-09 17:22:20 -07:00
Patrick Devine
84b84ce2db
catch when model vocab size is set correctly ( #6714 )
2024-09-09 17:18:54 -07:00
Jeffrey Morgan
bb6a086d63
readme: add crewAI to community integrations ( #6699 )
2024-09-08 00:36:24 -07:00
RAPID ARCHITECT
30c8f201cc
readme: add crewAI with mesop to community integrations
2024-09-08 00:35:59 -07:00
frob
06d4fba851
openai: align chat temperature and frequency_penalty options with completion ( #6688 )
2024-09-07 09:08:08 -07:00
Jeffrey Morgan
108fb6c1d1
docs: improve linux install documentation ( #6683 )
...
Includes small improvements to document layout and code blocks
2024-09-06 22:05:37 -07:00
Yaroslav
da915345d1
openai: don't scale temperature or frequency_penalty ( #6514 )
2024-09-06 17:45:45 -07:00
nickthecook
8a027bc401
readme: add Archyve to community integrations ( #6680 )
2024-09-06 14:06:01 -07:00
imoize
5446903fbd
readme: add Plasmoid Ollama Control to community integrations ( #6681 )
2024-09-06 14:04:12 -07:00
Daniel Hiltgen
56318fb365
Improve logging on GPU too small ( #6666 )
...
When we determine a GPU is too small for any layers, it's not always clear why.
This will help troubleshoot those scenarios.
2024-09-06 08:29:36 -07:00
frob
fe91d7fff1
openai: fix "presence_penalty" typo and add test ( #6665 )
2024-09-06 01:16:28 -07:00
Patrick Devine
608e87bf87
Fix gemma2 2b conversion ( #6645 )
2024-09-05 17:02:28 -07:00
Daniel Hiltgen
48685c6ed0
Document uninstall on windows ( #6663 )
2024-09-05 15:57:38 -07:00
Daniel Hiltgen
9565fa64a8
Revert "Detect running in a container ( #6495 )" ( #6662 )
...
This reverts commit a60d9b89ce .
2024-09-05 14:26:00 -07:00
Daniel Hiltgen
6719097649
llm: make load time stall duration configurable via OLLAMA_LOAD_TIMEOUT
...
With the new very large parameter models, some users are willing to wait for
a very long time for models to load.
2024-09-05 14:00:08 -07:00
Daniel Hiltgen
b05c9e83d9
Introduce GPU Overhead env var ( #5922 )
...
Provide a mechanism for users to set aside an amount of VRAM on each GPU
to make room for other applications they want to start after Ollama, or to work around
memory prediction bugs.
2024-09-05 13:46:35 -07:00
Daniel Hiltgen
a60d9b89ce
Detect running in a container ( #6495 )
2024-09-05 13:24:51 -07:00
Michael Yang
bf612cd608
Merge pull request #6260 from ollama/mxyng/mem
...
llama3.1 memory
2024-09-05 13:22:08 -07:00
Zeyo
ef98e56122
readme: add AiLama to the list of community integrations ( #4957 )
2024-09-05 13:10:44 -07:00
Michael
5f944baac7
Update gpu.md: Add RTX 3050 Ti and RTX 3050 Ti ( #5888 )
...
* Update gpu.md
Seems strange that the laptop versions of 3050 and 3050 Ti would be supported but not the non-notebook, but this is what the page (https://developer.nvidia.com/cuda-gpus ) says.
Signed-off-by: bean5 <2052646+bean5@users.noreply.github.com >
* Update gpu.md
Remove notebook reference
---------
Signed-off-by: bean5 <2052646+bean5@users.noreply.github.com >
2024-09-05 11:24:26 -07:00
Tobias Heinze
6fc9d22707
server: fix blob download when receiving a 200 response ( #6656 )
2024-09-05 10:48:26 -07:00
Vitaly Zdanevich
f27c00d8c5
readme: add Gentoo package manager entry to community integrations ( #5714 )
2024-09-05 09:58:14 -07:00
王卿
c7c845ec52
Update install.sh:Replace "command -v" with encapsulated functionality ( #6035 )
...
Replace "command -v" with encapsulated functionality
2024-09-05 09:49:48 -07:00
Augustinas Malinauskas
cf48603943
readme: include Enchanted for Apple Vision Pro ( #4949 )
...
Added Enchanted with Apple Vision Pro support
2024-09-05 01:30:19 -04:00
Silas Marvin
6e67be09b6
readme: add lsp-ai to community integrations ( #5063 )
2024-09-05 01:17:34 -04:00
Arda Günsüren
0f5f060d2b
readme: add ollama-php library to community integrations ( #6361 )
2024-09-05 01:01:14 -04:00
jk011ru
b3554778bd
readme: add vnc-lm discord bot community integration ( #6644 )
2024-09-04 19:46:02 -04:00
Pascal Patry
bbe7b96ded
llm: use json.hpp from common ( #6642 )
2024-09-04 19:34:42 -04:00
Rune Berg
c18ff18b2c
readme: add confichat to community integrations ( #6378 )
2024-09-04 17:26:02 -04:00
Tomoya Fujita
133770a548
docs: add group to manual Linux instructions and verify service is running ( #6430 )
2024-09-04 14:45:09 -04:00
Teïlo M
f36ebfb478
readme: add gollm to the list of community libraries ( #6099 )
2024-09-04 14:19:41 -04:00
亢奋猫
5b55379651
readme: add Cherry Studio to community integrations ( #6633 )
2024-09-04 10:53:36 -04:00
Mitar
93eb43d020
readme: add Go fun package ( #6421 )
2024-09-04 10:52:46 -04:00
Carter
369479cc30
docs: fix spelling error ( #6391 )
...
change "dorrect" to "correct"
2024-09-04 09:42:33 -04:00
Erkin Alp Güney
7d89e48f5c
install.sh: update instructions to use WSL2 ( #6450 )
2024-09-04 09:34:53 -04:00
Sam
27bcce6d9f
readme: add claude-dev to community integrations ( #6630 )
2024-09-04 09:32:26 -04:00
Viz
491fc312ae
readme: add PyOllaMx project ( #6624 )
2024-09-03 23:10:53 -04:00
Jeffrey Morgan
5e2653f9fe
llm: update llama.cpp commit to 8962422 ( #6618 )
2024-09-03 21:12:39 -04:00
Daniel Hiltgen
f29b167e1a
Use cuda v11 for driver 525 and older ( #6620 )
...
It looks like driver 525 (aka, cuda driver 12.0) has problems with the cuda v12 library
we compile against, so run v11 on those older drivers if detected.
2024-09-03 17:15:31 -07:00
Daniel Hiltgen
037a4d103e
Log system memory at info ( #6617 )
...
On systems with low system memory, we can hit allocation failures that are difficult to diagnose
without debug logs. This will make it easier to spot.
2024-09-03 14:55:20 -07:00
Mateusz Migas
50c05d57e0
readme: add Painting Droid community integration ( #5514 )
2024-09-03 16:15:54 -04:00
Amith Koujalgi
35159de18a
readme: update Ollama4j link and add link to Ollama4j Web UI ( #6608 )
2024-09-03 16:08:50 -04:00
FellowTraveler
94fff5805f
Fix sprintf to snprintf ( #5664 )
...
/Users/au/src/ollama/llm/ext_server/server.cpp:289:9: warning: 'sprintf' is deprecated: This function is provided for compatibility reasons only. Due to security concerns inherent in the design of sprintf(3), it is highly recommended that you use snprintf(3) instead.
2024-09-03 09:32:59 -07:00
OpenVMP
14d5093cd0
readme: add PartCAD tool to readme for generating 3D CAD models using Ollama ( #6605 )
2024-09-03 12:28:01 -04:00
R0CKSTAR
9df5f0e8e4
Reduce docker image size ( #5847 )
...
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com >
2024-09-03 09:25:31 -07:00
presbrey
ad3eb00bee
readme: add OllamaFarm project ( #6508 )
2024-09-02 16:05:36 -04:00
Jonathan Hecl
bfc2d61549
readme: add go-crew and Ollamaclient projects ( #6583 )
2024-09-02 15:34:26 -04:00
SnoopyTlion
741affdfd6
docs: update faq.md for OLLAMA_MODELS env var permissions ( #6587 )
2024-09-02 15:31:29 -04:00
Vimal Kumar
5f7b4a5e30
fix(cmd): show info may have nil ModelInfo ( #6579 )
2024-08-31 21:12:17 -07:00
rayfiyo
1aad838707
docs: update GGUF examples and references ( #6577 )
2024-08-31 19:34:25 -07:00
Daniel Hiltgen
a1cef4d0a5
Add findutils to base images ( #6581 )
...
This caused missing internal files
2024-08-31 10:40:05 -07:00
Michael Yang
c41f0b9e6c
Merge pull request #6562 from ollama/mxyng/build-artifacts
...
remove any unneeded build artifacts
2024-08-30 09:40:50 -07:00
Michael Yang
142cbb722d
Merge pull request #6482 from ollama/mxyng/client-path
...
passthrough OLLAMA_HOST path to client
2024-08-30 09:40:34 -07:00
Michael Yang
9468c6824a
Merge pull request #6534 from ollama/mxyng/messages
...
update templates to use messages
2024-08-30 09:39:59 -07:00
Michael Yang
11018196e0
remove any unneeded build artifacts
2024-08-29 13:40:47 -07:00
Bryan Honof
56346ccfa3
doc: Add Nix and Flox to package manager listing ( #6074 )
2024-08-29 12:45:35 -04:00
Patrick Devine
8e4e509fa4
update the openai docs to explain how to set the context size ( #6548 )
2024-08-28 17:11:46 -07:00
Michael Yang
47c2b947a9
Merge pull request #6546 from ollama/mxyng/fix-test
...
fix(test): do not clobber models directory
2024-08-28 15:37:47 -07:00
Michael Yang
5eb77bf976
Merge pull request #6539 from ollama/mxyng/validate-modelpath
...
fix: validate modelpath
2024-08-28 14:38:27 -07:00
Michael Yang
e4d0a9c325
fix(test): do not clobber models directory
2024-08-28 14:07:48 -07:00
Patrick Devine
7416ced70f
add llama3.1 chat template ( #6545 )
2024-08-28 14:03:20 -07:00
Michael Yang
9cfd2dd3e3
Merge pull request #6522 from ollama/mxyng/detect-chat
...
detect chat template from configs that contain lists
2024-08-28 11:04:18 -07:00
Michael Yang
8e6da3cbc5
update deprecated warnings
2024-08-28 09:55:11 -07:00
Michael Yang
d9d50c43cc
validate model path
2024-08-28 09:32:57 -07:00
Patrick Devine
6c1c1ad6a9
throw an error when encountering unsupported tensor sizes ( #6538 )
2024-08-27 17:54:04 -07:00
Daniel Hiltgen
93ea9240ae
Move ollama executable out of bin dir ( #6535 )
2024-08-27 16:19:00 -07:00
Michael Yang
413ae39f3c
update templates to use messages
2024-08-27 15:44:04 -07:00
Michael Yang
60e47573a6
more tokenizer tests
2024-08-27 14:51:10 -07:00
Patrick Devine
d13c3daa0b
add safetensors to the modelfile docs ( #6532 )
2024-08-27 14:46:47 -07:00
Patrick Devine
1713eddcd0
Fix import image width ( #6528 )
2024-08-27 14:19:47 -07:00
Daniel Hiltgen
4e1c4f6e0b
Update manual instructions with discrete ROCm bundle ( #6445 )
2024-08-27 13:42:28 -07:00
Sean Khatiri
397cae7962
llm: fix typo in comment ( #6530 )
2024-08-27 13:28:29 -07:00
Patrick Devine
1c70a00f71
adjust image sizes
2024-08-27 11:15:25 -07:00
Michael Yang
eae3af6807
clean up convert tokenizer
2024-08-27 11:11:43 -07:00
Michael Yang
3eb08377f8
detect chat template from configs that contain lists
2024-08-27 10:49:33 -07:00
Patrick Devine
ac80010db8
update the import docs ( #6104 )
2024-08-26 19:57:26 -07:00
Jeffrey Morgan
47fa0839b9
server: clean up route names for consistency ( #6524 )
2024-08-26 19:36:11 -07:00
Daniel Hiltgen
0f92b19bec
Only enable numa on CPUs ( #6484 )
...
The numa flag may be having a performance impact on multi-socket systems with GPU loads
2024-08-24 17:24:50 -07:00
Daniel Hiltgen
69be940bf6
gpu: Group GPU Library sets by variant ( #6483 )
...
The recent cuda variant changes uncovered a bug in ByLibrary
which failed to group by common variant for GPU types.
2024-08-23 15:11:56 -07:00
Michael Yang
9638c24c58
Merge pull request #5446 from ollama/mxyng/faq
...
update faq
2024-08-23 14:05:59 -07:00
Michael Yang
bb362caf88
update faq
2024-08-23 13:37:21 -07:00
Michael Yang
386af6c1a0
passthrough OLLAMA_HOST path to client
2024-08-23 13:23:28 -07:00
Patrick Devine
0c819e167b
convert safetensor adapters into GGUF ( #6327 )
2024-08-23 11:29:56 -07:00
Daniel Hiltgen
7a1e1c1caf
gpu: Ensure driver version set before variant ( #6480 )
...
During rebasing, the ordering was inverted causing the cuda version
selection logic to break, with driver version being evaluated as zero
incorrectly causing a downgrade to v11.
2024-08-23 11:21:12 -07:00
Daniel Hiltgen
0b03b9c32f
llm: Align cmake define for cuda no peer copy ( #6455 )
...
Define changed recently and this slipped through the cracks with the old
name.
2024-08-23 11:20:39 -07:00
Daniel Hiltgen
90ca84172c
Fix embeddings memory corruption ( #6467 )
...
* Fix embeddings memory corruption
The patch was causing a buffer overrun. Once it was removed, though, parallelism
in server.cpp led to hitting an assert because slot/seq IDs were >= the token count. To
work around this, only slot 0 is used for embeddings.
* Fix embed integration test assumption
The token eval count has changed with recent llama.cpp bumps (0.3.5+)
2024-08-22 14:51:42 -07:00
Michael Yang
6bd8a4b0a1
Merge pull request #6064 from ollama/mxyng/convert-llama3
...
convert: update llama conversion for llama3.1
2024-08-21 12:57:09 -07:00
Michael Yang
77903ab8b4
llama3.1
2024-08-21 11:49:31 -07:00
Michael Yang
e22286c9e1
Merge pull request #5365 from ollama/mxyng/convert-gemma2
...
convert gemma2
2024-08-21 11:48:43 -07:00
Michael Yang
107f695929
Merge pull request #4917 from ollama/mxyng/convert-bert
...
convert bert model from safetensors
2024-08-21 11:48:29 -07:00
Michael Yang
4ecc70d3b4
Merge pull request #6386 from zwwhdls/fix-new-layer
...
fix: chmod new layer to 0o644 when creating it
2024-08-21 10:58:45 -07:00
Michael Yang
3546bbd08c
convert gemma2
2024-08-20 17:27:51 -07:00
Michael Yang
beb49eef65
create bert models from cli
2024-08-20 17:27:34 -07:00
Michael Yang
5a28b9cf5f
bert
2024-08-20 17:27:34 -07:00
Daniel Hiltgen
a017cf2fea
Split rocm back out of bundle ( #6432 )
...
We're over budget for github's maximum release artifact size with rocm + 2 cuda
versions. This splits rocm back out as a discrete artifact, but keeps the layout so it can
be extracted into the same location as the main bundle.
2024-08-20 07:26:38 -07:00
Daniel Hiltgen
19e5a890f7
CI: remove directories from dist dir before upload step ( #6429 )
2024-08-19 15:19:21 -07:00
Daniel Hiltgen
f91c9e3709
CI: handle directories during checksum ( #6427 )
2024-08-19 13:48:45 -07:00
Daniel Hiltgen
2df6905ede
Merge pull request #6424 from dhiltgen/cuda_v12
...
Fix overlapping artifact name on CI
2024-08-19 12:11:58 -07:00
Daniel Hiltgen
d8be22e47d
Fix overlapping artifact name on CI
2024-08-19 12:07:18 -07:00
Daniel Hiltgen
652c273f0e
Merge pull request #5049 from dhiltgen/cuda_v12
...
Cuda v12
2024-08-19 11:14:24 -07:00
Daniel Hiltgen
88e7705079
Merge pull request #6402 from rick-github/numParallel
...
Override numParallel in pickBestPartialFitByLibrary() only if unset.
2024-08-19 11:07:22 -07:00
Daniel Hiltgen
f9e31da946
Review comments
2024-08-19 10:36:15 -07:00
Daniel Hiltgen
88bb9e3328
Adjust layout to bin+lib/ollama
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
3b19cdba2a
Remove Jetpack
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
927d98a6cd
Add windows cuda v12 + v11 support
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
f6c811b320
Enable cuda v12 flags
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
4fe3a556fa
Add cuda v12 variant and selection logic
...
Based on compute capability and driver version, pick
v12 or v11 cuda variants.
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
fc3b4cda89
Report GPU variant in log
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
d470ebe78b
Add Jetson cuda variants for arm
...
This adds new variants for arm64 specific to Jetson platforms
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
c7bcb00319
Wire up ccache and pigz in the docker based build
...
This should help speed things up a little
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
74d45f0102
Refactor linux packaging
...
This adjusts Linux to follow a similar model to Windows, with a discrete archive
(zip/tgz) to carry the primary executable and dependent libraries. Runners are
still carried as payloads inside the main binary.
Darwin retains the payload model, where the Go binary is fully self-contained.
2024-08-19 09:38:53 -07:00
Jeffrey Morgan
9fddef3731
server: limit upload parts to 16 ( #6411 )
2024-08-19 09:20:52 -07:00
Richard Lyons
885cf45087
Fix white space.
2024-08-18 03:07:16 +02:00
Richard Lyons
9352eeb752
Reset NumCtx.
2024-08-18 02:55:01 +02:00
Richard Lyons
0ad0e738cd
Override numParallel only if unset.
2024-08-18 01:43:26 +02:00
zwwhdls
bdc4308afb
fix: chmod new layer to 0o644 when creating it
...
Signed-off-by: zwwhdls <zww@hdls.me >
2024-08-16 11:43:19 +08:00
Daniel Hiltgen
d29cd4c2ed
Merge pull request #6381 from eust-w/main
...
fix: Add tooltip to system tray icon
2024-08-15 15:31:15 -07:00
eust-w
a84c05cf91
fix: Add tooltip to system tray icon
...
- Updated setIcon method to include tooltip text for the system tray icon.
- Added NIF_TIP flag and set the tooltip text using UTF16 encoding.
Resolves : #6372
2024-08-16 06:00:12 +08:00
Michael Yang
e3d7f32af7
Merge pull request #6363 from ollama/mxyng/fix-noprune
...
fix: noprune on pull
2024-08-15 12:20:38 -07:00
Michael Yang
3a75e74e34
only skip invalid json manifests
2024-08-15 10:29:14 -07:00
Michael Yang
237dccba1e
skip invalid manifest files
2024-08-14 16:55:45 -07:00
Michael Yang
b3f75fc812
fix noprune
2024-08-14 15:48:51 -07:00
Jeffrey Morgan
8200c371ae
add CONTRIBUTING.md ( #6349 )
2024-08-14 15:19:50 -07:00
longtao
0a8d6ea86d
Fix typo and improve readability ( #5964 )
...
* Fix typo and improve readability
Summary:
* Rename updatAvailableMenuID to updateAvailableMenuID
* Replace unused cmd parameter with _ in RunServer function
* Fix typos in comments
(cherry picked from commit 5b8715f0b04773369e8eb1f9e6737995a0ab3ba7)
* Update api/client.go
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2024-08-13 17:54:19 -07:00
Blake Mizerany
8e1050f366
server: reduce max connections used in download ( #6347 )
...
The previous value of 64 was WAY too high and unnecessary. It reached
diminishing returns and blew past it. This is a more reasonable number
for _most_ normal cases. For users on cloud servers with excellent
network quality, this will keep screaming for them, without hitting our
CDN limits. For users with relatively poor network quality, this will
keep them from saturating their network and causing other issues.
2024-08-13 16:47:35 -07:00
Bruce MacDonald
eda8a32a09
update chatml template format to latest in docs ( #6344 )
2024-08-13 16:39:18 -07:00
Michael Yang
a0a40aa20c
Merge pull request #6346 from ollama/mxyng/lint
2024-08-13 14:58:35 -07:00
Michael Yang
2697d7f5aa
lint
...
- fixes printf: non-constant format string in call to fmt.Printf
- fixes SA1032: arguments have the wrong order
- disables testifylint
2024-08-13 14:36:33 -07:00
Pamela Fox
1f32276178
Update openai.md to remove extra checkbox ( #6345 )
2024-08-13 13:36:05 -07:00
Daniel Hiltgen
4c4fe3f87f
Merge pull request #6343 from dhiltgen/revert_win_go_version
...
Go back to a pinned Go version
2024-08-13 11:53:49 -07:00
Daniel Hiltgen
feedf49c71
Go back to a pinned Go version
...
Go version 1.22.6 is triggering AV false positives, so go back to 1.22.5
2024-08-13 11:45:44 -07:00
royjhan
8b00a415ab
Load Embedding Model on Empty Input ( #6325 )
...
* load on empty input
* no load on invalid input
2024-08-13 10:19:56 -07:00
Michael Yang
01b80e9ffc
Merge pull request #5443 from ollama/mxyng/convert-phi3
...
add conversion for microsoft phi 3 mini/medium 4k, 128k
2024-08-12 15:47:58 -07:00
Michael Yang
bd5e432630
update import.md
2024-08-12 15:13:29 -07:00
Bruce MacDonald
aec77d6a05
support new "longrope" attention factor
2024-08-12 15:13:29 -07:00
Michael Yang
6ffb5cb017
add conversion for microsoft phi 3 mini/medium 4k, 128
2024-08-12 15:13:29 -07:00
Josh
f7e3b9190f
cmd: spinner progress for transfer model data ( #6100 )
2024-08-12 11:46:32 -07:00
Josh
980dd15f81
cmd: speed up gguf creates ( #6324 )
2024-08-12 11:46:09 -07:00
royjhan
01d544d373
OpenAI: Simplify input output in testing ( #5858 )
...
* simplify input output
* direct comp
* in line image
* rm error pointer type
* update response testing
* lint
2024-08-12 10:33:34 -07:00
Josh
1dc3ef3aa9
Revert "server: speed up single gguf creates ( #5898 )" ( #6323 )
...
This reverts commit 8aac22438e .
2024-08-12 09:57:51 -07:00
Josh
8aac22438e
server: speed up single gguf creates ( #5898 )
2024-08-12 09:28:55 -07:00
Jeffrey Morgan
15c2d8fe14
server: parallelize embeddings in API web handler instead of in subprocess runner ( #6220 )
...
For simplicity, perform parallelization of embedding requests in the API handler instead of offloading this to the subprocess runner. This keeps the scheduling story simpler as it builds on existing parallel requests, similar to existing text completion functionality.
2024-08-11 11:57:10 -07:00
Daniel Hiltgen
25906d72d1
llm: prevent loading too large models on windows ( #5926 )
...
Don't allow loading models that would lead to memory exhaustion (across VRAM, system memory and disk paging). This check was already applied on Linux and should be applied on Windows as well.
2024-08-11 11:30:20 -07:00
CognitiveTech
023451ce47
add integration obook-summary ( #6305 )
2024-08-10 18:43:08 -07:00
Jesse Gross
9b53e39d8e
Merge pull request #6258 from coolljt0725/fix_typo
...
server/download.go: Fix a typo in log
2024-08-09 17:19:48 -07:00
Michael Yang
97fae2df95
Merge pull request #6235 from Nicholas42/fix_line_endings
...
Set *.png and *.ico to be treated as binary files.
2024-08-09 17:06:30 -07:00
Michael Yang
160d9d4900
Merge pull request #6171 from ollama/mxyng/remove-temp
...
removeall to remove non-empty temp dirs
2024-08-09 15:47:13 -07:00
Nicholas Schwab
d4e6407464
Restrict text files with explicit line feeds to *.go.
...
This partially reverts b732beba6a . It
seems like explicitly setting all files to use line feeds was done due
to issues with the go linter, hence it can be restricted to those files
(https://github.com/ollama/ollama/pull/6235#issuecomment-2278745953 ).
2024-08-09 23:14:13 +02:00
Daniel Hiltgen
b7f7d8cd15
Merge pull request #6291 from dhiltgen/no_sparse_fail
...
Don't hard fail on sparse setup error
2024-08-09 12:30:25 -07:00
Daniel Hiltgen
2fa1db4345
Don't hard fail on sparse setup error
...
It seems this can fail in some cases; proceed
with the download anyway.
2024-08-09 12:16:19 -07:00
Daniel Hiltgen
71b0945fc6
Merge pull request #6290 from dhiltgen/intel_npe
...
Harden intel bootstrap for nil pointers
2024-08-09 12:14:42 -07:00
Daniel Hiltgen
5bca2e60a7
Harden intel bootstrap for nil pointers
2024-08-09 11:31:38 -07:00
Nicholas42
67472e0e89
Also flag *.icns as binary
2024-08-09 13:41:20 +02:00
Daniel Hiltgen
e9aa5117c4
Merge pull request #6133 from dhiltgen/cuda_repo
...
Adjust arm cuda repo paths
2024-08-08 12:33:35 -07:00
Daniel Hiltgen
2473bdba5e
Merge pull request #6182 from dhiltgen/more_patterns
...
Catch one more error log
2024-08-08 12:33:17 -07:00
Michael Yang
2003d60159
llama3.1 memory
2024-08-08 11:18:13 -07:00
Jesse Gross
7d1c0047fa
Merge pull request #6247 from ollama/jessegross/layers
...
Store layers inside manifests consistently as values.
2024-08-08 10:46:43 -07:00
Jitang Lei
7b61eba471
server/download.go: Fix a typo in log
...
Signed-off-by: Jitang Lei <leijitang@outlook.com >
2024-08-08 20:28:01 +08:00
Jesse Gross
7edaf6e7e8
manifest: Store layers inside manifests consistently as values.
...
Commit 1829fb61 ("manifest: Fix crash on startup when trying to clean up
unused files (#5840 )") changed the config layer stored in manifests
from a pointer to a value. This was done in order to avoid potential
nil pointer dereferences after it is deserialized from JSON in the
event that the field is missing.
This changes the Layers slice to also be stored by value. This enables
consistency in handling across the two objects.
2024-08-07 17:03:06 -07:00
Jesse Gross
97ec8cfd4e
image: Clarify argument to WriteManifest is config
...
When creating a model the config layer is appended to the list of
layers and then the last layer is used as the config when writing the
manifest. This change directly uses the config layer to write the
manifest. There is no behavior change but it is less error prone.
2024-08-07 16:58:42 -07:00
royjhan
5b3a21b578
add metrics to docs ( #6079 )
2024-08-07 14:43:44 -07:00
Kyle Kelley
ad0c19dde4
Use llama3.1 in tools example ( #5985 )
...
* Use llama3.1 in tools example
* Update api.md
2024-08-07 17:20:50 -04:00
Jesse Gross
69eb06c40e
Merge pull request #6145 from ollama/jessegross/bug5840
...
Fix crash on startup when trying to clean up unused files (#5840 )
2024-08-07 11:24:15 -07:00
Jesse Gross
1829fb61bd
manifest: Fix crash on startup when trying to clean up unused files ( #5840 )
...
Currently if the config field is missing in the manifest file (or
corrupted), Ollama will crash when it tries to read it. This can
happen at startup or when pulling new models.
This data is mostly just used for showing model information so we
can be tolerant of it not being present - it is not required to
run the models. Besides avoiding crashing, this also gives us the
ability to restructure the config in the future by pulling it
into the main manifest file.
2024-08-07 10:30:44 -07:00
Nicholas Schwab
ce67706037
Set *.png and *.ico to be treated as binary files.
...
The change b732beba6 makes all files text files and sets lf as eol. This
will automatically change all files to have lf if they are touched by
git (e.g. via git status). This change cannot be stashed and makes it
hard to work with the repo (rebase and checkout don't really work). See
also #6183 .
Here, we set the offending files (*.png and *.ico, but that might be
more in the future) to be treated as binary files and not be changed by
git.
2024-08-07 18:20:11 +02:00
Jesse Gross
685a53534b
manifest: Don't prune layers if we can't open a manifest file
...
If there is an error when opening a manifest file (corrupted, permission denied, etc.)
then the referenced layers will not be included in the list of active
layers. This causes them to be deleted when pruning happens at startup
or a model is pulled.
In such a situation, we should prefer to preserve data in the hopes that
it can be recovered rather than being aggressive about deletion.
2024-08-06 23:11:19 -07:00
Jeffrey Morgan
de4fc29773
llm: reserve required number of slots for embeddings ( #6219 )
2024-08-06 23:20:49 -04:00
Jeffrey Morgan
e04c7012c2
update llama.cpp submodule to 1e6f6554 ( #6208 )
2024-08-06 15:11:45 -04:00
Chua Chee Seng
d4a7216c82
Fixed invalid option provided not displaying the invalid option name problem. ( #6202 )
2024-08-06 14:37:16 -04:00
Daniel Hiltgen
a4fdd03c3b
Merge pull request #6207 from dhiltgen/sparse_win
...
Ensure sparse files on windows during download
2024-08-06 11:06:06 -07:00
Daniel Hiltgen
fc85f50a2b
Ensure sparse files on windows during download
...
The file.Truncate call on windows will write the whole file
unless you set the sparse flag, leading to heavy I/O at the
beginning of download. This should improve our
I/O behavior on windows and put less stress on the user's disk.
2024-08-06 10:58:08 -07:00
royjhan
86b907f82a
sort batch results ( #6189 )
2024-08-05 16:55:34 -07:00
Michael Yang
10d49bce70
Merge pull request #6190 from ollama/mxyng/fix-integration
...
fix concurrency test
2024-08-05 16:45:49 -07:00
Michael Yang
7ed367419e
fix concurrency test
2024-08-05 16:36:16 -07:00
Daniel Hiltgen
50ee8b5f56
Merge pull request #6186 from dhiltgen/numa
...
Implement linux NUMA detection
2024-08-05 15:20:06 -07:00
Michael Yang
03bdac0595
Merge pull request #6146 from ollama/mxyng/testing
...
use testing tempdirs
2024-08-05 13:00:05 -07:00
Daniel Hiltgen
f457d63400
Implement linux NUMA detection
...
If the system has multiple NUMA nodes, enable NUMA support in llama.cpp.
If we detect numactl in the path, use that, else use the basic "distribute" mode.
2024-08-05 12:56:20 -07:00
Daniel Hiltgen
04210aa6dd
Catch one more error log
2024-08-05 09:28:07 -07:00
Michael Yang
43f9d92008
close pid file
2024-08-05 00:41:16 -07:00
Michael Yang
ed6c8bfe57
removeall to remove non-empty temp dirs
2024-08-05 00:41:16 -07:00
Michael Yang
39f2bc6bfc
Merge pull request #6167 from ollama/mxyng/line-feed
...
line feed
2024-08-05 00:06:28 -07:00
frob
b73b0940ef
Disable paging for journalctl ( #6154 )
...
Users using `journalctl` to get logs for issue logging sometimes don't realize that paging is causing information to be missed.
2024-08-05 00:10:53 -04:00
Michael Yang
6a07344786
line feed
2024-08-04 17:25:41 -07:00
sryu1
8b920f35a4
Add Gemma 2 2b ( #6151 )
2024-08-04 10:58:39 -04:00
Ivan Charapanau
4221e39867
Reference ollama integration with Harbor ( #6147 )
2024-08-02 17:03:46 -07:00
Michael Yang
a091fadfda
use testing tempdirs
2024-08-02 16:04:06 -07:00
Michael Yang
77ccbf04dc
Merge pull request #6128 from ollama/mxyng/lint
...
enable gofmt/gofumpt/goimports/tenv
2024-08-02 14:58:40 -07:00
royjhan
4addf6b587
Update OpenAI Compatibility Docs with /v1/completions ( #5311 )
...
* Update docs
* token bug corrected
* Update docs/openai.md
* Update docs/openai.md
* add suffix
* merge conflicts
* merge conflicts
2024-08-02 13:16:23 -07:00
royjhan
85c7f11170
Update docs ( #5310 )
2024-08-02 13:05:57 -07:00
Daniel Hiltgen
df3802a65f
Adjust arm cuda repo paths
...
Ubuntu distros fail to install cuda drivers since aarch64 isn't valid
2024-08-01 17:22:25 -07:00
Michael Yang
b732beba6a
lint
2024-08-01 17:06:06 -07:00
Kim Hallberg
ce1fb4447e
Fix models/{model} URL ( #6132 )
2024-08-01 16:31:47 -07:00
royjhan
558a54b098
Update OpenAI Compatibility Docs with /v1/embeddings ( #5470 )
...
* docs without usage
* no usage
* rm metric note
2024-08-01 16:00:29 -07:00
royjhan
ed52833bb1
Add to docs ( #5309 )
2024-08-01 15:58:13 -07:00
royjhan
6f133a0bdd
OpenAI: Add Usage to v1/embeddings ( #5886 )
...
* add prompt tokens to embed response
* rm slog
* metrics
* types
* prompt n
* clean up
* reset submodule
* add tokens to v1/embeddings
* separate usage
2024-08-01 15:49:37 -07:00
royjhan
f561eecfb8
Update OpenAI Compatibility Docs with /v1/models ( #5151 )
...
* OpenAI Docs
* Update docs/openai.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Remove newline
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2024-08-01 15:48:44 -07:00
Michael Yang
ff7c9060ec
Merge pull request #6115 from slouffka/fix-context
...
Fix context in /api/generate grows too much (#5980 ).
2024-08-01 15:13:59 -07:00
Michael Yang
0ff42e84b0
Merge pull request #4756 from ollama/mxyng/convert2
...
refactor convert
2024-08-01 14:16:30 -07:00
Vyacheslav Moskalev
8a9f946ca7
Refactor and format code.
2024-08-02 03:50:05 +07:00
Vyacheslav Moskalev
3b5210548e
Refactor code. Remove extra variable.
2024-08-01 19:56:15 +07:00
Vyacheslav Moskalev
b0c216584c
Better types and naming closer to style.
2024-08-01 19:43:44 +07:00
Vyacheslav Moskalev
49a5483139
Change the order of context and prompt.
2024-08-01 19:25:56 +07:00
Vyacheslav Moskalev
6bc5c13758
Fix extra context concatenation in generate handler ( #5980 ).
2024-08-01 15:45:58 +07:00
Michael Yang
3e614260af
Merge pull request #6109 from ollama/mxyng/fix-modelfile
...
fix modelfile message quotes
2024-07-31 17:05:43 -07:00
Michael Yang
d87b4a488e
fix modelfile message quotes
2024-07-31 16:52:09 -07:00
Michael Yang
4c14855ad7
Merge pull request #6106 from ollama/mxyng/default-sliding-window-attention
...
patches: phi3 optional sliding window attention
2024-07-31 16:12:06 -07:00
Blake Mizerany
dc77bbcfa4
server: fix json marshalling of downloadBlobPart ( #6108 )
2024-07-31 16:01:24 -07:00
Michael Yang
d8e2664c33
convert: fix parse functions
2024-07-31 15:58:55 -07:00
Michael Yang
eafc607abb
convert: only extract large files
2024-07-31 15:58:55 -07:00
Michael Yang
781fc2d576
Update convert/reader_safetensors.go
...
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2024-07-31 15:58:55 -07:00
Michael Yang
df993fa37b
comments
2024-07-31 15:58:55 -07:00
Michael Yang
5e9db9fb0b
refactor convert
2024-07-31 15:58:33 -07:00
Michael Yang
0f3271db88
patches: phi3 default sliding window attention
2024-07-31 14:58:34 -07:00
Michael Yang
6b252918fb
update convert test to check result data
2024-07-31 10:59:38 -07:00
Michael Yang
c4c84b7a0d
Merge pull request #5196 from ollama/mxyng/messages-2
...
include modelfile messages
2024-07-31 10:18:17 -07:00
Michael Yang
5c1912769e
Merge pull request #5473 from ollama/mxyng/environ
...
fix: environ lookup
2024-07-31 10:18:05 -07:00
Daniel Nguyen
71399aa682
Added BoltAI as a desktop UI for Ollama ( #6096 )
2024-07-31 08:44:58 -07:00
Jeffrey Morgan
463a8aa273
Create SECURITY.md
2024-07-30 21:01:12 -07:00
Michael
3579b4966a
Update README to include Firebase Genkit ( #6083 )
...
Firebase Genkit
2024-07-30 18:40:09 -07:00
Jeffrey Morgan
5d66578356
Update README.md
...
Better example for multi-modal input
2024-07-30 18:08:34 -07:00
jmorganca
afa8d6e9d5
patch gemma support
2024-07-30 18:07:29 -07:00
royjhan
1b44d873e7
Add Metrics to api\embed response ( #5709 )
...
* add prompt tokens to embed response
* rm slog
* metrics
* types
* prompt n
* clean up
* reset submodule
* update tests
* test name
* list metrics
2024-07-30 13:12:21 -07:00
Daniel Hiltgen
cef2c6054d
Merge pull request #5859 from dhiltgen/homogeneous_gpus
...
Prevent partial loading on mixed GPU brands
2024-07-30 11:06:42 -07:00
Daniel Hiltgen
345420998e
Prevent partial loading on mixed GPU brands
...
In multi-brand GPU setups, if we couldn't fully load the model, we
would fall through the scheduler and mistakenly try to load across
a mix of brands. This makes sure we find the set of GPU(s) that
best fits the partial load.
2024-07-30 11:00:55 -07:00
Kim Hallberg
0be8baad2b
Update and Fix example models ( #6065 )
...
* Update example models
* Remove unused README.md
2024-07-29 23:56:37 -07:00
Daniel Hiltgen
1a83581a8e
Merge pull request #5895 from dhiltgen/sched_faq
...
Better explain multi-gpu behavior
2024-07-29 14:25:41 -07:00
Daniel Hiltgen
37926eb991
Merge pull request #5927 from dhiltgen/high_cpu_count
...
Ensure amd gpu nodes are numerically sorted
2024-07-29 14:24:57 -07:00
Daniel Hiltgen
3d4634fdff
Merge pull request #5934 from dhiltgen/missing_cuda_repo
...
Report better error on cuda unsupported os/arch
2024-07-29 14:24:20 -07:00
royjhan
365431d406
return tool calls finish reason for openai ( #5995 )
...
* hot fix
* backend stream support
* clean up
* finish reason
* move to openai
2024-07-29 13:56:57 -07:00
Daniel Hiltgen
161e12cecf
Merge pull request #5932 from dhiltgen/win_font
...
Explain font problems on windows 10
2024-07-29 13:40:24 -07:00
Jeffrey Morgan
46e6327e0f
api: add stringifier for Tool ( #5891 )
2024-07-29 13:35:16 -07:00
Jeffrey Morgan
68ee42f995
update llama.cpp submodule to 6eeaeba1 ( #6039 )
2024-07-29 13:20:26 -07:00
Ikko Eltociear Ashimine
f26aef9a8b
docs: update README.md ( #6059 )
...
HuggingFace -> Hugging Face
2024-07-29 10:53:30 -07:00
Michael Yang
38d9036b59
Merge pull request #5992 from ollama/mxyng/save
...
fix: model save
2024-07-29 09:53:19 -07:00
Veit Heller
6f26e9322f
Fix typo in image docs ( #6041 )
2024-07-29 08:50:53 -07:00
Jeffrey Morgan
0e4d653687
update to llama3.1 elsewhere in repo ( #6032 )
2024-07-28 19:56:02 -07:00
Michael
2c01610616
update readme to llama3.1 ( #5933 )
2024-07-28 14:21:38 -07:00
Tibor Schmidt
f3d7a481b7
feat: add support for min_p ( resolve #1142 ) ( #1825 )
2024-07-27 14:37:40 -07:00
Jeffrey Morgan
f2a96c7d77
llm: keep patch for llama 3 rope factors ( #5987 )
2024-07-26 15:20:52 -07:00
Daniel Hiltgen
e8a66680d1
Merge pull request #5705 from dhiltgen/win_errormode
...
Enable windows error dialog for subprocess
2024-07-26 14:49:34 -07:00
Michael Yang
079b2c3b03
Merge pull request #5999 from ollama/mxyng/fix-push
...
fix nil deref in auth.go
2024-07-26 14:28:34 -07:00
Blake Mizerany
750c1c55f7
server: fix race conditions during download ( #5994 )
...
This fixes various data races scattered throughout the download/pull
client where the client was accessing the download state concurrently.
This commit is mostly a hot-fix and will be replaced by a new client one
day soon.
Also, remove the unnecessary opts argument from downloadChunk.
2024-07-26 14:24:24 -07:00
Michael Yang
a622c47bd3
fix nil deref in auth.go
2024-07-26 14:14:48 -07:00
Michael Yang
ec4c35fe99
Merge pull request #5512 from ollama/mxyng/detect-stop
...
autodetect stop parameters from template
2024-07-26 13:48:23 -07:00
Michael Yang
a250c2cb13
display messages
2024-07-26 13:39:57 -07:00
Michael Yang
3d9de805b7
fix: model save
...
stop parameter is saved as a slice which is incompatible with modelfile
parsing
2024-07-26 13:23:06 -07:00
Michael Yang
15af558423
include modelfile messages
2024-07-26 11:40:11 -07:00
Jeffrey Morgan
f5e3939220
Update api.md ( #5968 )
2024-07-25 23:10:18 -04:00
Jeffrey Morgan
ae27d9dcfd
Update openai.md
2024-07-25 20:27:33 -04:00
Michael Yang
37096790a7
Merge pull request #5552 from ollama/mxyng/messages-docs
...
docs
2024-07-25 16:26:19 -07:00
Michael Yang
997c903884
Update docs/template.md
...
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2024-07-25 16:23:40 -07:00
Blake Mizerany
c8af3c2d96
server: reuse original download URL for images ( #5962 )
...
This changes the registry client to reuse the original download URL
it gets on the first redirect response for all subsequent requests,
preventing thundering herd issues when hot new LLMs are released.
2024-07-25 15:58:30 -07:00
Jeffrey Morgan
455e61170d
Update openai.md
2024-07-25 18:34:47 -04:00
royjhan
4de1370a9d
openai tools doc ( #5617 )
2024-07-25 18:34:06 -04:00
Jeffrey Morgan
bbf8f102ee
Revert "llm(llama): pass rope factors ( #5924 )" ( #5963 )
...
This reverts commit bb46bbcf5e .
2024-07-25 18:24:55 -04:00
Daniel Hiltgen
ce3c93b08f
Report better error on cuda unsupported os/arch
...
If we detect an NVIDIA GPU but NVIDIA doesn't support the OS/arch,
this will report a better error for the user and point them to docs
to self-install the drivers if possible.
2024-07-24 17:09:20 -07:00
Daniel Hiltgen
6c2129d5d0
Explain font problems on windows 10
2024-07-24 15:22:00 -07:00
Daniel Hiltgen
7c2a157ca4
Ensure amd gpu nodes are numerically sorted
...
For systems that enumerate over 10 CPUs, the default lexicographical
sort order interleaves CPUs and GPUs.
2024-07-24 13:43:26 -07:00
Michael Yang
bb46bbcf5e
llm(llama): pass rope factors ( #5924 )
2024-07-24 16:05:59 -04:00
royjhan
ac33aa7d37
Fix Embed Test Flakes ( #5893 )
...
* float cmp
* increase tolerance
2024-07-24 11:15:46 -07:00
Daniel Hiltgen
830fdd2715
Better explain multi-gpu behavior
2024-07-23 15:16:38 -07:00
Ajay Chintala
a6cd8f6169
Update README.md to add LLMStack integration ( #5799 )
2024-07-23 14:40:23 -04:00
Daniel Hiltgen
c78089263a
Merge pull request #5864 from dhiltgen/bump_go
...
Bump Go patch version
2024-07-22 16:34:18 -07:00
Daniel Hiltgen
3e5ea035d5
Merge pull request #5757 from lreed-mdsol/lreed/bump-go-version-fix-vulnerabilities
...
bump go version to 1.22.5 to fix security vulnerabilities in docker
2024-07-22 16:32:43 -07:00
Daniel Hiltgen
5d604eec5b
Bump Go patch version
2024-07-22 16:16:28 -07:00
Josh
db0968f30c
fix dupe err message ( #5857 )
2024-07-22 15:48:15 -07:00
Daniel Hiltgen
e12fff8810
Enable windows error dialog for subprocess startup
...
Make sure if something goes wrong spawning the process, the user gets
enough info to be able to try to self correct, or at least file a bug
with details so we can fix it. Once the process starts, we immediately
change back to the recommended setting to prevent the blocking dialog.
This ensures if the model fails to load (OOM, unsupported model type,
etc.) the process will exit quickly and we can scan the stdout/stderr
of the subprocess for the reason to report via API.
2024-07-22 14:07:27 -07:00
Michael Yang
9b60a038e5
update api.md
2024-07-22 13:49:51 -07:00
Michael Yang
83a0cb8d88
docs
2024-07-22 13:38:09 -07:00
royjhan
c0648233f2
api embed docs ( #5282 )
2024-07-22 13:37:08 -07:00
Jeffrey Morgan
d835368eb8
convert: capture head_dim for mistral ( #5818 )
2024-07-22 16:16:22 -04:00
Michael Yang
85d9d73a72
comments
2024-07-22 11:49:03 -07:00
Michael Yang
78140a712c
cleanup tests
2024-07-22 11:49:03 -07:00
Michael Yang
1954ec5917
uint64
2024-07-22 11:49:02 -07:00
Michael Yang
0f1910129f
int
2024-07-22 11:30:07 -07:00
Michael Yang
e2c3f6b3e2
string
2024-07-22 11:27:52 -07:00
Michael Yang
8570c1c0ef
keepalive
2024-07-22 11:27:22 -07:00
Michael Yang
55cd3ddcca
bool
2024-07-22 11:27:21 -07:00
Michael Yang
66fe77f084
models
2024-07-22 11:26:12 -07:00
Michael Yang
d1a5227cad
origins
2024-07-22 11:25:30 -07:00
Michael Yang
4f1afd575d
host
2024-07-22 11:25:30 -07:00
Michael Yang
35b89b2eab
rfc: dynamic environ lookup
2024-07-22 11:25:30 -07:00
Daniel Hiltgen
5784c05397
Merge pull request #5854 from dhiltgen/win_exit_status
...
Refine error reporting for subprocess crash
2024-07-22 10:40:22 -07:00
Daniel Hiltgen
f14aa5435d
Merge pull request #5855 from dhiltgen/remove_max_vram
...
Remove no longer supported max vram var
2024-07-22 10:35:29 -07:00
Jeffrey Morgan
f8fedbda20
Update llama.cpp submodule commit to d94c6e0c ( #5805 )
2024-07-22 12:42:00 -04:00
Jeffrey Morgan
b3e5491e41
server: collect nested tool call objects when parsing ( #5824 )
2024-07-22 12:38:03 -04:00
Daniel Hiltgen
cc269ba094
Remove no longer supported max vram var
...
The OLLAMA_MAX_VRAM env var was a temporary workaround for OOM
scenarios. With concurrency, this was no longer wired up, and the simplistic
value doesn't map to multi-GPU setups. Users can still set `num_gpu`
to limit memory usage to avoid OOM if we get our predictions wrong.
2024-07-22 09:08:11 -07:00
Daniel Hiltgen
a3c20e3f18
Refine error reporting for subprocess crash
...
On Windows, the exit status winds up being the term many users
search for, and they end up piling onto issues that are unrelated.
This refines the reporting so that if we have a more detailed message,
we'll suppress the exit status portion of the message.
2024-07-22 08:52:16 -07:00
Jeffrey Morgan
80ee9b5e47
Remove out of space test temporarily ( #5825 )
2024-07-21 00:22:11 -04:00
Jeffrey Morgan
5534f2cc6a
llm: consider head_dim in llama arch ( #5817 )
2024-07-20 21:48:12 -04:00
Daniel Hiltgen
d321297d8a
Merge pull request #5815 from dhiltgen/win_rocm_gfx_features
...
Adjust windows ROCm discovery
2024-07-20 16:02:55 -07:00
Daniel Hiltgen
06e5d74e34
Merge pull request #5506 from dhiltgen/sched_tests
...
Refine scheduler unit tests for reliability
2024-07-20 15:48:39 -07:00
Daniel Hiltgen
5d707e6fd5
Merge pull request #5583 from dhiltgen/integration_improvements
...
Fix context exhaustion integration test for small gpus
2024-07-20 15:48:21 -07:00
Daniel Hiltgen
283948c83b
Adjust windows ROCm discovery
...
The v5 hip library returns unsupported GPUs which won't enumerate at
inference time in the runner, so this makes sure we align discovery. The
gfx906 cards are no longer supported, so we shouldn't compile with that
GPU type as it won't enumerate at runtime.
2024-07-20 15:17:50 -07:00
Jeffrey Morgan
1475eab95f
add patch for tekken ( #5807 )
2024-07-20 13:41:21 -04:00
Jeffrey Morgan
20090f3172
preserve last assistant message ( #5802 )
2024-07-19 20:19:26 -07:00
Jeffrey Morgan
69a2d4ccff
Fix generate test flakiness ( #5804 )
2024-07-19 19:11:25 -07:00
Josh
e8b954c646
server: validate template ( #5734 )
...
add template validation to modelfile
2024-07-19 15:24:29 -07:00
royjhan
c57317cbf0
OpenAI: Function Based Testing ( #5752 )
...
* distinguish error forwarding
* more coverage
* rm comment
2024-07-19 11:37:12 -07:00
royjhan
51b2fd299c
adjust openai chat msg processing ( #5729 )
2024-07-19 11:19:20 -07:00
Michael Yang
d0634b1596
Merge pull request #5780 from ollama/mxyng/tools
...
fix parsing tool calls: break on unexpected eofs
2024-07-18 12:14:10 -07:00
Michael Yang
43606d6d6a
fix parsing tool calls
2024-07-18 12:08:11 -07:00
Jeffrey Morgan
70b1010fa5
server: check for empty tools array too ( #5779 )
2024-07-18 11:44:57 -07:00
Jeffrey Morgan
84e5721f3a
always provide content even if empty ( #5778 )
2024-07-18 11:28:19 -07:00
Jeffrey Morgan
319fb1ce03
server: only parse tool calls if tools are provided ( #5771 )
...
* server: only parse tool calls if tools are provided
* still set `resp.Message.Content`
2024-07-18 08:50:23 -07:00
Michael Yang
b255445557
marshal json automatically for some template values ( #5758 )
2024-07-17 15:35:11 -07:00
lreed
f02f83660c
bump go version to 1.22.5 to fix security vulnerabilities
2024-07-17 21:44:19 +00:00
Michael Yang
b23424bb3c
Merge pull request #5753 from ollama/mxyng/parse-tool-call
...
parse tool call as individual objects
2024-07-17 11:47:53 -07:00
Michael Yang
5fd6988126
parse tool call as individual objects
2024-07-17 11:19:04 -07:00
Michael Yang
5b82960df8
stub response ( #5750 )
2024-07-17 10:39:22 -07:00
Michael Yang
cc9a252d8c
Merge pull request #5732 from ollama/mxyng/cleanup
...
remove ToolCall from GenerateResponse
2024-07-17 10:26:54 -07:00
Pákozdi György
d281a6e603
add sidellama link ( #5702 )
2024-07-17 10:24:44 -07:00
royjhan
154f6f45d4
OpenAI: Support Tools ( #5614 )
...
* reopen pr
* tools
* remove tc from stream for now
* ID and Function
* openai expects arguments to be a string (#5739 )
* mutually exclusive content and tool calls
* clean up
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2024-07-16 20:52:59 -07:00
royjhan
0d41623b52
OpenAI: Add Suffix to v1/completions ( #5611 )
...
* add suffix
* remove todo
* remove TODO
* add to test
* rm outdated prompt tokens info md
* fix test
* fix test
2024-07-16 20:50:14 -07:00
Michael Yang
c279f96371
remove ToolCall from GenerateResponse
2024-07-16 15:22:49 -07:00
Michael Yang
499e87c9ba
Merge pull request #5730 from ollama/mxyng/cleanup
...
remove unneeded tool calls
2024-07-16 14:42:13 -07:00
Michael Yang
cd0853f2d5
Merge pull request #5207 from ollama/mxyng/suffix
...
add insert support to generate endpoint
2024-07-16 14:37:32 -07:00
Michael Yang
d290e87513
add suffix support to generate endpoint
...
this change is triggered by the presence of "suffix", particularly
useful for code completion tasks
2024-07-16 14:31:35 -07:00
Thorsten Sommer
97c20ede33
README: Added AI Studio to the list of UIs ( #5721 )
...
* Added AI Studio to the list of UIs
2024-07-16 14:24:27 -07:00
Michael Yang
5a83f79afd
remove unneeded tool calls
2024-07-16 13:48:45 -07:00
royjhan
987dbab0b0
OpenAI: /v1/embeddings compatibility ( #5285 )
...
* OpenAI v1 models
* Empty List Testing
* Add back envconfig
* v1/models docs
* Remove Docs
* OpenAI batch embed compatibility
* merge conflicts
* integrate with api/embed
* ep
* merge conflicts
* request tests
* rm resp test
* merge conflict
* merge conflict
* test fixes
* test fn renaming
* input validation for empty string
---------
Co-authored-by: jmorganca <jmorganca@gmail.com >
2024-07-16 13:36:08 -07:00
Michael Yang
a8388beb94
Merge pull request #5726 from ollama/mxyng/tools-templates
...
fix unmarshal type errors
2024-07-16 12:12:10 -07:00
Michael Yang
5afbb60fc4
fix unmarshal type errors
2024-07-16 11:39:34 -07:00
Jeffrey Morgan
4cb5d7decc
server: omit model system prompt if empty ( #5717 )
2024-07-16 11:09:00 -07:00
Michael Yang
8eac50dd4f
Merge pull request #5684 from ollama/mxyng/tests
...
add chat and generate tests with mock runner
2024-07-16 09:44:45 -07:00
Michael Yang
4a565cbf94
add chat and generate tests with mock runner
2024-07-16 09:39:31 -07:00
Michael Yang
64039df6d7
Merge pull request #5284 from ollama/mxyng/tools
...
tools
2024-07-15 18:03:37 -07:00
Jeffrey Morgan
7ac6d462ec
server: return empty slice on empty /api/embed request ( #5713 )
...
* server: return empty slice on empty `/api/embed` request
* fix tests
2024-07-15 17:39:44 -07:00
Michael Yang
ef5136a745
tools test
2024-07-15 17:18:21 -07:00
Daniel Hiltgen
8288ec8824
Merge pull request #5710 from dhiltgen/rocm_bump
...
Bump linux ROCm to 6.1.2
2024-07-15 15:32:18 -07:00
Michael Yang
d02bbebb11
tools
2024-07-15 15:26:16 -07:00
Daniel Hiltgen
224337b32f
Bump linux ROCm to 6.1.2
2024-07-15 15:10:22 -07:00
Jeffrey Morgan
9e35d9bbee
server: lowercase roles for compatibility with clients ( #5695 )
2024-07-15 13:55:57 -07:00
royjhan
b9f5e16c80
Introduce /api/embed endpoint supporting batch embedding ( #5127 )
...
* Initial Batch Embedding
* Revert "Initial Batch Embedding"
This reverts commit c22d54895a .
* Initial Draft
* mock up notes
* api/embed draft
* add server function
* check normalization
* clean up
* normalization
* playing around with truncate stuff
* Truncation
* Truncation
* move normalization to go
* Integration Test Template
* Truncation Integration Tests
* Clean up
* use float32
* move normalize
* move normalize test
* refactoring
* integration float32
* input handling and handler testing
* Refactoring of legacy and new
* clear comments
* merge conflicts
* touches
* embedding type 64
* merge conflicts
* fix hanging on single string
* refactoring
* test values
* set context length
* clean up
* testing clean up
* testing clean up
* remove function closure
* Revert "remove function closure"
This reverts commit 55d48c6ed1 .
* remove function closure
* remove redundant error check
* clean up
* more clean up
* clean up
2024-07-15 12:14:24 -07:00
royjhan
e9f7f36029
Support image input for OpenAI chat compatibility ( #5208 )
...
* OpenAI v1 models
* Refactor Writers
* Add Test
Co-Authored-By: Attila Kerekes
* Credit Co-Author
Co-Authored-By: Attila Kerekes <439392+keriati@users.noreply.github.com >
* Empty List Testing
* Use Namespace for Ownedby
* Update Test
* Add back envconfig
* v1/models docs
* Use ModelName Parser
* Test Names
* Remove Docs
* Clean Up
* Test name
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
* Add Middleware for Chat and List
* Testing Cleanup
* Test with Fatal
* Add functionality to chat test
* Support image input for OpenAI chat
* Decoding
* Fix message processing logic
* openai vision test
* type errors
* clean up
* redundant check
* merge conflicts
* merge conflicts
* merge conflicts
* flattening and smaller image
* add test
* support python and js SDKs and mandate prefixing
* clean up
---------
Co-authored-by: Attila Kerekes <439392+keriati@users.noreply.github.com >
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2024-07-13 22:07:45 -07:00
Patrick Devine
057d31861e
remove template ( #5655 )
2024-07-13 20:56:24 -07:00
jmorganca
f7ee012300
server: prepend system message in chat handler
2024-07-13 15:08:00 -07:00
Jeffrey Morgan
1ed0aa8fea
server: fix context, load_duration and total_duration fields ( #5676 )
...
* server: fix `context`, `load_duration` and `total_duration` fields
* Update server/routes.go
2024-07-13 09:25:31 -07:00
Jeffrey Morgan
ef98803d63
llm: looser checks for minimum memory ( #5677 )
2024-07-13 09:20:05 -07:00
Jarek
02fea420e5
Add Kerlig AI, an app for macOS ( #5675 )
2024-07-13 08:33:46 -07:00
Michael Yang
22c5451fc2
fix system prompt ( #5662 )
...
* fix system prompt
* execute template when hitting previous roles
* fix tests
---------
Co-authored-by: jmorganca <jmorganca@gmail.com >
2024-07-12 21:04:44 -07:00
Michael Yang
ebc529cbb3
autodetect stop parameters from template
2024-07-12 16:01:23 -07:00
Patrick Devine
23ebbaa46e
Revert "remove template from tests"
...
This reverts commit 9ac0a7a50b .
2024-07-12 15:47:17 -07:00
Patrick Devine
9ac0a7a50b
remove template from tests
2024-07-12 15:41:31 -07:00
Michael Yang
e5c65a85df
Merge pull request #5653 from ollama/mxyng/collect-system
...
template: preprocess message and collect system
2024-07-12 12:32:34 -07:00
Jeffrey Morgan
33627331a3
app: also clean up tempdir runners on install ( #5646 )
2024-07-12 12:29:23 -07:00
Michael Yang
36c87c433b
template: preprocess message and collect system
2024-07-12 12:26:43 -07:00
Jeffrey Morgan
179737feb7
Clean up old files when installing on Windows ( #5645 )
...
* app: always clean up install dir; force close applications
* remove wildcard
* revert `CloseApplications`
* whitespace
* update `LOCALAPPDATA` var
2024-07-11 22:53:46 -07:00
Michael Yang
47353f5ee4
Merge pull request #5639 from ollama/mxyng/unaggregated-system
2024-07-11 17:48:50 -07:00
Josh
10e768826c
fix: quant err message ( #5616 )
2024-07-11 17:24:29 -07:00
Michael Yang
5056bb9c01
rename aggregate to contents
2024-07-11 17:00:26 -07:00
Jeffrey Morgan
c4cf8ad559
llm: avoid loading model if system memory is too small ( #5637 )
...
* llm: avoid loading model if system memory is too small
* update log
* Instrument swap free space
On linux and windows, expose how much swap space is available
so we can take that into consideration when scheduling models
* use `systemSwapFreeMemory` in check
---------
Co-authored-by: Daniel Hiltgen <daniel@ollama.com >
2024-07-11 16:42:57 -07:00
Michael Yang
57ec6901eb
revert embedded templates to use prompt/response
...
This reverts commit 19753c18c0 .
for compatibility; messages will be added at a later date.
2024-07-11 14:49:35 -07:00
Michael Yang
e64f9ebb44
do not automatically aggregate system messages
2024-07-11 14:49:35 -07:00
Jeffrey Morgan
791650ddef
sched: only error when over-allocating system memory ( #5626 )
2024-07-11 00:53:12 -07:00
Jeffrey Morgan
efbf41ed81
llm: don't link cuda with compat libs ( #5621 )
2024-07-10 20:01:52 -07:00
Michael Yang
cf15589851
Merge pull request #5620 from ollama/mxyng/templates
...
update embedded templates
2024-07-10 17:16:24 -07:00
Michael Yang
19753c18c0
update embedded templates
2024-07-10 17:03:08 -07:00
Michael Yang
41be28096a
add system prompt to first legacy template
2024-07-10 17:03:08 -07:00
Michael Yang
37a570f962
Merge pull request #5612 from ollama/mxyng/mem
...
chatglm graph
2024-07-10 14:18:33 -07:00
Michael Yang
5a739ff4cb
chatglm graph
2024-07-10 13:43:47 -07:00
Jeffrey Morgan
4e262eb2a8
remove GGML_CUDA_FORCE_MMQ=on from build ( #5588 )
2024-07-10 13:17:13 -07:00
Daniel Hiltgen
4cfcbc328f
Merge pull request #5124 from dhiltgen/amd_windows
...
Wire up windows AMD driver reporting
2024-07-10 12:50:23 -07:00
Daniel Hiltgen
79292ff3e0
Merge pull request #5555 from dhiltgen/msvc_deps
...
Bundle missing CRT libraries
2024-07-10 12:50:02 -07:00
Daniel Hiltgen
8ea500441d
Merge pull request #5580 from dhiltgen/cuda_overhead
...
Detect CUDA OS overhead
2024-07-10 12:47:31 -07:00
Daniel Hiltgen
b50c818623
Merge pull request #5607 from dhiltgen/win_rocm_v6
...
Bump ROCm on windows to 6.1.2
2024-07-10 12:47:10 -07:00
Daniel Hiltgen
b99e750b62
Merge pull request #5605 from dhiltgen/merge_glitch
...
Remove duplicate merge glitch
2024-07-10 11:47:08 -07:00
Daniel Hiltgen
1f50356e8e
Bump ROCm on windows to 6.1.2
...
This also adjusts our algorithm to favor our bundled ROCm.
I've confirmed VRAM reporting still doesn't work properly so we
can't yet enable concurrency by default.
2024-07-10 11:01:22 -07:00
Daniel Hiltgen
22c81f62ec
Remove duplicate merge glitch
2024-07-10 09:01:33 -07:00
Daniel Hiltgen
73e2c8f68f
Fix context exhaustion integration test for small gpus
...
On the smaller GPUs, the initial model load of llama2 took over 30s (the
default timeout for the DoGenerate helper).
2024-07-09 16:24:14 -07:00
Daniel Hiltgen
f4408219e9
Refine scheduler unit tests for reliability
...
This breaks up some of the test scenarios to create a
more reliable set of tests, as well as adding a little more
coverage.
2024-07-09 16:00:08 -07:00
Daniel Hiltgen
2d1e3c3229
Merge pull request #5503 from dhiltgen/dual_rocm
...
Workaround broken ROCm p2p copy
2024-07-09 15:44:16 -07:00
royjhan
4918fae535
OpenAI v1/completions: allow stop token list ( #5551 )
...
* stop token parsing fix
* add stop test
2024-07-09 14:01:26 -07:00
royjhan
0aff67877e
separate request tests ( #5578 )
2024-07-09 13:48:31 -07:00
Daniel Hiltgen
f6f759fc5f
Detect CUDA OS Overhead
...
This adds logic to detect skew between the driver and
management library which can be attributed to OS overhead
and records that so we can adjust subsequent management
library free VRAM updates and avoid OOM scenarios.
2024-07-09 12:21:50 -07:00
Daniel Hiltgen
9544a57ee4
Merge pull request #5579 from dhiltgen/win_static_deps
...
Statically link c++ and thread lib on windows
2024-07-09 12:21:13 -07:00
Daniel Hiltgen
b51e3b63ac
Statically link c++ and thread lib
...
This makes sure we statically link the c++ and thread library on windows
to avoid unnecessary runtime dependencies on non-standard DLLs
2024-07-09 11:34:30 -07:00
Michael Yang
6bbbc50f10
Merge pull request #5440 from ollama/mxyng/messages-templates
...
update named templates
2024-07-09 09:36:32 -07:00
Michael Yang
9bbddc37a7
Merge pull request #5126 from ollama/mxyng/messages
...
update message processing
2024-07-09 09:20:44 -07:00
Jeffrey Morgan
e4ff73297d
server: fix model reloads when setting OLLAMA_NUM_PARALLEL ( #5560 )
...
* server: fix unneeded model reloads when setting `OLLAMA_NUM_PARALLEL`
* remove whitespace change
* undo some changes
2024-07-08 22:32:15 -07:00
Daniel Hiltgen
b44320db13
Bundle missing CRT libraries
...
Some users are experiencing runner startup errors due
to not having these MSVC redist libraries on their host.
2024-07-08 18:24:21 -07:00
Daniel Hiltgen
0bacb30007
Workaround broken ROCm p2p copy
...
Enable the build flag for llama.cpp to use CPU copy for multi-GPU scenarios.
2024-07-08 09:40:52 -07:00
Jeffrey Morgan
53da2c6965
llm: remove ambiguous comment when putting upper limit on predictions to avoid infinite generation ( #5535 )
2024-07-07 14:32:05 -04:00
Jeffrey Morgan
d8def1ff94
llm: allow gemma 2 to context shift ( #5534 )
2024-07-07 13:41:51 -04:00
Jeffrey Morgan
571dc61955
Update llama.cpp submodule to a8db2a9c ( #5530 )
2024-07-07 13:03:09 -04:00
Jeffrey Morgan
0e09c380fc
llm: print caching notices in debug only ( #5533 )
2024-07-07 12:38:04 -04:00
Jeffrey Morgan
0ee87615c7
sched: don't error if paging to disk on Windows and macOS ( #5523 )
2024-07-06 22:01:52 -04:00
Jeffrey Morgan
f8241bfba3
gpu: report system free memory instead of 0 ( #5521 )
2024-07-06 19:35:04 -04:00
Jeffrey Morgan
4607c70641
llm: add -DBUILD_SHARED_LIBS=off to common cpu cmake flags ( #5520 )
2024-07-06 18:58:16 -04:00
jmorganca
c12f1c5b99
release: move mingw library cleanup to correct job
2024-07-06 16:12:29 -04:00
jmorganca
a08f20d910
release: remove unwanted mingw dll.a files
2024-07-06 15:21:15 -04:00
jmorganca
6cea036027
Revert "llm: only statically link libstdc++"
This reverts commit 5796bfc401.
2024-07-06 15:10:48 -04:00
jmorganca
5796bfc401
llm: only statically link libstdc++
2024-07-06 14:06:20 -04:00
jmorganca
f1a379aa56
llm: statically link pthread and stdc++ dependencies in windows build
2024-07-06 12:54:02 -04:00
jmorganca
9ae146993e
llm: add GGML_STATIC flag to windows static lib
2024-07-06 03:27:05 -04:00
Jeffrey Morgan
e0348d3fe8
llm: add COMMON_DARWIN_DEFS to arm static build (#5513)
2024-07-05 22:42:42 -04:00
Jeffrey Morgan
2cc854f8cb
llm: fix missing dylibs by restoring old build behavior on Linux and macOS (#5511)
* Revert "fix cmake build (#5505)"
This reverts commit 4fd5f3526a.
* llm: fix missing dylibs by restoring old build behavior
* crlf -> lf
2024-07-05 21:48:31 -04:00
Jeffrey Morgan
5304b765b2
llm: put back old include dir (#5507)
* llm: put back old include dir
* llm: update link paths for old submodule commits
2024-07-05 19:34:21 -04:00
Michael Yang
fb6cbc02fb
update named templates
2024-07-05 16:29:32 -07:00
Jeffrey Morgan
4fd5f3526a
fix cmake build (#5505)
2024-07-05 19:07:01 -04:00
Daniel Hiltgen
842f85f758
Merge pull request #5502 from dhiltgen/ci_fixes
Always go build in CI generate steps
2024-07-05 15:39:11 -07:00
Daniel Hiltgen
9d30f9f8b3
Always go build in CI generate steps
With the recent cgo changes, bugs can sneak through
if we don't make sure to `go build` all the permutations
2024-07-05 15:31:52 -07:00
Blake Mizerany
631cfd9e62
types/model: remove knowledge of digest (#5500)
This was leading to ambiguity and confusion in ollama.com, and is not
used anywhere in ollama at the moment. Once manifests are addressable by
digest, we can add this back in, and in a way that is more tailored to
the concept of addressing a manifest by digest.
2024-07-05 13:42:30 -07:00
Michael Yang
326363b3a7
no funcs
2024-07-05 13:17:25 -07:00
Michael Yang
ac7a842e55
fix model reloading
ensure runtime model changes (template, system prompt, messages,
options) are captured on model updates without needing to reload the
server
2024-07-05 13:17:25 -07:00
Michael Yang
2c3fe1fd97
comments
2024-07-05 13:17:24 -07:00
Michael Yang
269ed6e6a2
update message processing
2024-07-05 13:16:58 -07:00
Jeffrey Morgan
78fb33dd07
fix typo in cgo directives in llm.go (#5501)
2024-07-05 15:18:36 -04:00
Jeffrey Morgan
8f8e736b13
update llama.cpp submodule to d7fd29f (#5475)
2024-07-05 13:25:58 -04:00
Jeffrey Morgan
d89454de80
Use slot with cached prompt instead of least recently used (#5492)
* Use common prefix to select slot
* actually report `longest`
2024-07-05 12:32:47 -04:00
Daniel Hiltgen
af28b94533
Merge pull request #5469 from dhiltgen/prevent_system_oom
Prevent loading models larger than total memory
2024-07-05 08:22:20 -07:00
Jeffrey Morgan
e9188e971a
Fix assert on small embedding inputs (#5491)
* Fix assert on small embedding inputs
* Update llm/patches/09-pooling.diff
2024-07-05 11:20:57 -04:00
Daniel Hiltgen
78eddfc068
Merge pull request #4412 from dhiltgen/win_docs
Document older win10 terminal problems
2024-07-05 08:18:22 -07:00
Daniel Hiltgen
02c24d3d01
Merge pull request #5466 from dhiltgen/fix_clip_unicode
Fix clip model loading with unicode paths
2024-07-05 08:16:58 -07:00
Daniel Hiltgen
52abc8acb7
Document older win10 terminal problems
We haven't found a workaround, so for now recommend updating.
2024-07-03 17:32:14 -07:00
Jeffrey Morgan
4d71c559b2
fix error detection by limiting model loading error parsing (#5472)
2024-07-03 20:04:30 -04:00
Anatoli Babenia
0d16eb310e
fix: use envconfig.ModelsDir directly (#4821)
* Co-authored-by: Anatoli Babenia <anatoli@rainforce.org>
Co-authored-by: Maas Lalani <maas@lalani.dev>
2024-07-03 15:36:11 -07:00
Daniel Hiltgen
8072e205ff
Merge pull request #5447 from dhiltgen/fix_keepalive
Only set default keep_alive on initial model load
2024-07-03 15:34:38 -07:00
Daniel Hiltgen
955f2a4e03
Only set default keep_alive on initial model load
This change fixes the handling of keep_alive so that if a client
request omits the setting, we only set it on initial load. Once
the model is loaded, if new requests leave this unset, we'll keep
whatever keep_alive was already there.
2024-07-03 15:29:56 -07:00
Daniel Hiltgen
3c75113e37
Prevent loading models larger than total memory
Users may not realize the shiny new model they're trying to load
fits on their disk but can't load into system+GPU memory. Today
we crash, but with this fix, we'll give them a better error message
before even trying to load it.
2024-07-03 14:47:42 -07:00
Daniel Hiltgen
ccd7785859
Merge pull request #5243 from dhiltgen/modelfile_use_mmap
Fix use_mmap for modelfiles
2024-07-03 13:59:42 -07:00
royjhan
3b5a4a77f3
Return Correct Prompt Eval Count Regardless of Cache Prompt (#5371)
* openai compatibility
* Revert "openai compatibility"
This reverts commit d3f98a811e.
* remove erroneous subtraction of prompt cache
2024-07-03 13:46:23 -07:00
Daniel Hiltgen
daed0634a9
Merge pull request #5467 from dhiltgen/bogus_cpu_mac_error
Fix corner cases on tmp cleaner on mac
2024-07-03 13:39:36 -07:00
Daniel Hiltgen
0d4dd707bc
Merge pull request #5465 from dhiltgen/better_cuda_logging
Better nvidia GPU discovery logging
2024-07-03 13:12:22 -07:00
Daniel Hiltgen
0e982bc1f4
Fix corner cases on tmp cleaner on mac
When ollama is running a long time, tmp cleaners can remove the
runners. This tightens up a few corner cases on arm macs where
we failed with "server cpu not listed in available servers map[]"
2024-07-03 13:10:14 -07:00
Daniel Hiltgen
6298f49816
Fix clip model loading with unicode paths
On Windows, if the model dir contained unicode characters,
clip models would fail to load. This fixes the file name
handling in clip.cpp to support UTF-16 on Windows.
2024-07-03 12:46:36 -07:00
Daniel Hiltgen
ef757da2c9
Better nvidia GPU discovery logging
Refine the way we log GPU discovery to improve the non-debug
output, and report more actionable log messages when possible
to help users troubleshoot on their own.
2024-07-03 10:50:40 -07:00
Michael Yang
e5352297d9
Merge pull request #5448 from ollama/mxyng/fix-generate
use model template by default
2024-07-02 16:48:06 -07:00
Michael Yang
65a5040e09
fix generate template
2024-07-02 16:42:17 -07:00
royjhan
d626b99b54
OpenAI: v1/completions compatibility (#5209)
* OpenAI v1 models
* Refactor Writers
* Add Test
Co-Authored-By: Attila Kerekes
* Credit Co-Author
Co-Authored-By: Attila Kerekes <439392+keriati@users.noreply.github.com>
* Empty List Testing
* Use Namespace for Ownedby
* Update Test
* Add back envconfig
* v1/models docs
* Use ModelName Parser
* Test Names
* Remove Docs
* Clean Up
* Test name
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Add Middleware for Chat and List
* Completions Endpoint
* Testing Cleanup
* Test with Fatal
* Add functionality to chat test
* Rename function
* float types
* type cleanup
* cleaning
* more cleaning
* Extra test cases
* merge conflicts
* merge conflicts
* merge conflicts
* merge conflicts
* cleaning
* cleaning
---------
Co-authored-by: Attila Kerekes <439392+keriati@users.noreply.github.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-07-02 16:01:45 -07:00
Michael Yang
dddb58a38b
Merge pull request #5051 from ollama/mxyng/capabilities
add model capabilities
2024-07-02 14:26:07 -07:00
Michael Yang
400056e154
Merge pull request #5420 from ollama/mxyng/insecure-path
err on insecure path
2024-07-02 14:03:23 -07:00
Daniel Hiltgen
d2f19024d0
Merge pull request #5442 from dhiltgen/concurrency_docs
Add windows radeon concurrency note
2024-07-02 12:47:47 -07:00
Daniel Hiltgen
69c04eecc4
Add windows radeon concurrency note
2024-07-02 12:46:14 -07:00
royjhan
996bb1b85e
OpenAI: /v1/models and /v1/models/{model} compatibility (#5007)
* OpenAI v1 models
* Refactor Writers
* Add Test
Co-Authored-By: Attila Kerekes
* Credit Co-Author
Co-Authored-By: Attila Kerekes <439392+keriati@users.noreply.github.com>
* Empty List Testing
* Use Namespace for Ownedby
* Update Test
* Add back envconfig
* v1/models docs
* Use ModelName Parser
* Test Names
* Remove Docs
* Clean Up
* Test name
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Add Middleware for Chat and List
* Testing Cleanup
* Test with Fatal
* Add functionality to chat test
* OpenAI: /v1/models/{model} compatibility (#5028)
* Retrieve Model
* OpenAI Delete Model
* Retrieve Middleware
* Remove Delete from Branch
* Update Test
* Middleware Test File
* Function name
* Cleanup
* Test Update
* Test Update
---------
Co-authored-by: Attila Kerekes <439392+keriati@users.noreply.github.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-07-02 11:50:56 -07:00
Daniel Hiltgen
422dcc3856
Merge pull request #5439 from dhiltgen/fix_centos_7_build
Switch ARM64 container image base to rocky 8
2024-07-02 11:01:15 -07:00
Daniel Hiltgen
020bd60ab2
Switch amd container image base to rocky 8
The centos 7 arm mirrors have disappeared due to the EOL 2 days
ago, and the vault sed workaround which works for x86 doesn't work for arm.
2024-07-02 10:34:47 -07:00
Daniel Hiltgen
8e277b72bb
Merge pull request #5438 from dhiltgen/fix_centos_7_build
Centos 7 EOL broke mirrors
2024-07-02 09:28:00 -07:00
Daniel Hiltgen
4f67b39d26
Centos 7 EOL broke mirrors
As of July 1st 2024: Could not resolve host: mirrorlist.centos.org
This is expected due to EOL dates.
2024-07-02 09:22:17 -07:00
Josh
2425281317
Merge pull request #5336 from ollama/jyan/from-errors
fix: trim spaces for FROM argument, don't trim inside of quotes
2024-07-01 16:32:46 -07:00
Josh
0403e9860e
Merge pull request #5421 from ollama/jyan/ver
fix: add unsupported architecture message for linux/windows
2024-07-01 16:32:14 -07:00
Josh Yan
33a65e3ba3
error
2024-07-01 16:04:13 -07:00
Michael Yang
88bcd79bb9
err on insecure path
2024-07-01 15:55:59 -07:00
Josh Yan
7e571f95f0
trimspace test case
2024-07-01 11:07:48 -07:00
Michael Yang
da8e2a0447
use kvs to detect embedding models
2024-07-01 10:47:43 -07:00
Michael Yang
a30915bde1
add capabilities
2024-07-01 10:47:43 -07:00
Michael Yang
58e3fff311
rename templates to template
2024-07-01 10:40:54 -07:00
Michael Yang
3f0b309ad4
remove ManifestV2
2024-07-01 10:40:54 -07:00
Daniel Hiltgen
e70610ef06
Merge pull request #5410 from dhiltgen/ctx_cleanup
Fix case for NumCtx
2024-07-01 09:54:20 -07:00
Daniel Hiltgen
dfded7e075
Merge pull request #5364 from dhiltgen/concurrency_docs
Document concurrent behavior and settings
2024-07-01 09:49:48 -07:00
Daniel Hiltgen
173b550438
Remove default auto from help message
This may lead users to think "auto" is an acceptable value; it must be numeric.
2024-07-01 09:48:05 -07:00
Daniel Hiltgen
cff3f44f4a
Fix case for NumCtx
2024-07-01 09:43:59 -07:00
Josh Yan
26e4e66faf
updated parsefile test
2024-07-01 09:43:49 -07:00
Daniel Hiltgen
97c9e11768
Switch use_mmap to a pointer type
This uses nil as undefined for a cleaner implementation.
2024-07-01 08:44:59 -07:00
Daniel Hiltgen
3518aaef33
Merge pull request #4218 from dhiltgen/auto_parallel
Enable concurrency by default
2024-07-01 08:32:29 -07:00
RAPID ARCHITECT
1963c00201
Update README.md (#5214)
* Update README.md
Added Mesop example to web & desktop
* Update README.md
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-06-30 22:00:57 -04:00
Eduard
27402cb7a2
Update gpu.md (#5382)
Runs fine on a NVIDIA GeForce GTX 1050 Ti
2024-06-30 21:48:51 -04:00
Jeffrey Morgan
c1218199cf
Update api.md
2024-06-29 16:22:49 -07:00
Jeffrey Morgan
717f7229eb
Do not shift context for sliding window models (#5368)
* Do not shift context for sliding window models
* truncate prompt > 2/3 tokens
* only target gemma2
2024-06-28 19:39:31 -07:00
Daniel Hiltgen
aae56abb7c
Document concurrent behavior and settings
2024-06-28 13:15:57 -07:00
royjhan
5f034f5b63
Include Show Info in Interactive (#5342)
2024-06-28 13:15:52 -07:00
royjhan
b910fa9010
Ollama Show: Check for Projector Type (#5307)
* Check exists projtype
* Maintain Ordering
2024-06-28 11:30:16 -07:00
royjhan
6d4219083c
Update docs (#5312)
2024-06-28 09:58:14 -07:00
Michael Yang
1ed4f521c4
Merge pull request #5340 from ollama/mxyng/mem
gemma2 graph
2024-06-27 14:26:49 -07:00
Michael Yang
de2163dafd
gemma2 graph
2024-06-27 13:34:52 -07:00
Josh Yan
9bd00041fa
trim all params
2024-06-27 11:18:38 -07:00
Josh Yan
4e986a823c
unquote, trim space
2024-06-27 10:59:15 -07:00
Michael
2cc7d05012
update readme for gemma 2 (#5333)
* update readme for gemma 2
2024-06-27 12:45:16 -04:00
Michael Yang
123a722a6f
zip: prevent extracting files into parent dirs (#5314)
2024-06-26 21:38:21 -07:00
Jeffrey Morgan
4d311eb731
llm: architecture patch (#5316)
2024-06-26 21:38:12 -07:00
Blake Mizerany
cb42e607c5
llm: speed up gguf decoding by a lot (#5246)
Previously, some costly things were causing the loading of GGUF files
and their metadata and tensor information to be VERY slow:
* Too many allocations when decoding strings
* Hitting disk for each read of each key and value, resulting in a
not-okay amount of syscalls/disk I/O.
The show API is now down to 33ms from 800ms+ for llama3 on a macbook pro
m3.
This commit also prevents collecting large arrays of values when
decoding GGUFs (if desired). When such keys are encountered, their
values are null, and are encoded as such in JSON.
Also, this fixes a broken test that was not encoding valid GGUF.
2024-06-24 21:47:52 -07:00
Blake Mizerany
2aa91a937b
cmd: defer stating model info until necessary (#5248)
This commit changes the 'ollama run' command to defer fetching model
information until it really needs it, that is, when in interactive mode.
It also removes one case where the model information was fetched
twice: once just before calling generateInteractive and then again,
first thing, in generateInteractive.
This positively impacts the performance of the command:
; time ./before run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
./before run llama3 'hi' 0.02s user 0.01s system 2% cpu 1.168 total
; time ./before run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
./before run llama3 'hi' 0.02s user 0.01s system 2% cpu 1.220 total
; time ./before run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
./before run llama3 'hi' 0.02s user 0.01s system 2% cpu 1.217 total
; time ./after run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
./after run llama3 'hi' 0.02s user 0.01s system 4% cpu 0.652 total
; time ./after run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
./after run llama3 'hi' 0.01s user 0.01s system 5% cpu 0.498 total
; time ./after run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with or would you like to chat?
./after run llama3 'hi' 0.01s user 0.01s system 3% cpu 0.479 total
; time ./after run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
./after run llama3 'hi' 0.02s user 0.01s system 5% cpu 0.507 total
; time ./after run llama3 'hi'
Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
./after run llama3 'hi' 0.02s user 0.01s system 5% cpu 0.507 total
2024-06-24 20:14:03 -07:00
Daniel Hiltgen
ccef9431c8
Merge pull request #5205 from dhiltgen/modelfile_use_mmap
Fix use_mmap parsing for modelfiles
2024-06-21 16:30:36 -07:00
Daniel Hiltgen
642cee1342
Sort the ps output
Provide consistent ordering for the ps command - longest duration listed first
2024-06-21 15:59:41 -07:00
royjhan
9a9e7d83c4
Docs (#5149)
2024-06-21 15:52:09 -07:00
Daniel Hiltgen
9929751cc8
Disable concurrency for AMD + Windows
Until ROCm v6.2 ships, we won't be able to get accurate free memory
reporting on Windows, which makes automatic concurrency too risky.
Users can still opt in, but will need to pay attention to model sizes; otherwise they may thrash/page VRAM or cause OOM crashes.
All other platforms and GPUs have accurate VRAM reporting wired
up now, so we can turn on concurrency by default.
2024-06-21 15:45:05 -07:00
Daniel Hiltgen
17b7186cd7
Enable concurrency by default
This adjusts our default settings to enable multiple models and parallel
requests to a single model. Users can still override these by the same
env var settings as before. Parallel has a direct impact on
num_ctx, which in turn can have a significant impact on small VRAM GPUs
so this change also refines the algorithm so that when parallel is not
explicitly set by the user, we try to find a reasonable default that fits
the model on their GPU(s). As before, multiple models will only load
concurrently if they fully fit in VRAM.
2024-06-21 15:45:05 -07:00
Michael Yang
189a43caa2
Merge pull request #5206 from ollama/mxyng/quantize
fix: quantization with template
2024-06-21 13:44:34 -07:00
Michael Yang
e835ef1836
fix: quantization with template
2024-06-21 13:39:25 -07:00
Daniel Hiltgen
7e7749224c
Fix use_mmap parsing for modelfiles
Add the new tristate parsing logic for the code path for modelfiles,
as well as a unit test.
2024-06-21 12:27:19 -07:00
Daniel Hiltgen
c7c2f3bc22
Merge pull request #5194 from dhiltgen/linux_mmap_auto
Refine mmap default logic on linux
2024-06-20 11:44:08 -07:00
Daniel Hiltgen
54a79d6a8a
Merge pull request #5125 from dhiltgen/fedora39
Bump latest fedora cuda repo to 39
2024-06-20 11:27:24 -07:00
Daniel Hiltgen
5bf5aeec01
Refine mmap default logic on linux
If we try to use mmap when the model is larger than the system free space, loading is slower than the no-mmap approach.
2024-06-20 11:07:04 -07:00
Michael Yang
e01e535cbb
Merge pull request #5192 from ollama/mxyng/kv
handle asymmetric embedding KVs
2024-06-20 10:46:24 -07:00
Josh
0195d6a2f8
Merge pull request #5188 from ollama/jyan/tmpdir2
fix: skip os.removeAll() if PID does not exist
2024-06-20 10:40:59 -07:00
Michael Yang
8e0641a9bf
handle asymmetric embedding KVs
2024-06-20 09:57:27 -07:00
Josh Yan
662568d453
err!=nil check
2024-06-20 09:30:59 -07:00
Josh Yan
4ebb66c662
reformat error check
2024-06-20 09:23:43 -07:00
Josh Yan
23e899f32d
skip os.removeAll() if PID does not exist
2024-06-20 08:51:35 -07:00
royjhan
fedf71635e
Extend api/show and ollama show to return more model info (#4881)
* API Show Extended
* Initial Draft of Information
Co-Authored-By: Patrick Devine <pdevine@sonic.net>
* Clean Up
* Descriptive arg error messages and other fixes
* Second Draft of Show with Projectors Included
* Remove Chat Template
* Touches
* Prevent wrapping from files
* Verbose functionality
* Docs
* Address Feedback
* Lint
* Resolve Conflicts
* Function Name
* Tests for api/show model info
* Show Test File
* Add Projector Test
* Clean routes
* Projector Check
* Move Show Test
* Touches
* Doc update
---------
Co-authored-by: Patrick Devine <pdevine@sonic.net>
2024-06-19 14:19:02 -07:00
Daniel Hiltgen
97c59be653
Merge pull request #5074 from dhiltgen/app_log_rotation
Implement log rotation for tray app
2024-06-19 13:02:24 -07:00
Daniel Hiltgen
9d8a4988e8
Implement log rotation for tray app
2024-06-19 12:53:34 -07:00
Michael Yang
1ae0750a21
Merge pull request #5147 from ollama/mxyng/cleanup
remove confusing log message
2024-06-19 12:50:31 -07:00
Michael Yang
9d91e5e587
remove confusing log message
2024-06-19 11:14:11 -07:00
Daniel Hiltgen
96624aa412
Merge pull request #5072 from dhiltgen/windows_path
Move libraries out of users path
2024-06-19 09:13:39 -07:00
Daniel Hiltgen
10f33b8537
Merge pull request #5146 from dhiltgen/backout
Put back temporary intel GPU env var
2024-06-19 09:12:45 -07:00
Daniel Hiltgen
4a633cc295
Merge pull request #5145 from dhiltgen/bad_loads
Fix bad symbol load detection
2024-06-19 09:12:33 -07:00
Daniel Hiltgen
d34d88e417
Revert "Revert "gpu: add env var for detecting Intel oneapi gpus (#5076)""
This reverts commit 755b4e4fc2.
2024-06-19 08:57:41 -07:00
Daniel Hiltgen
52ce350b7a
Fix bad symbol load detection
Pointer derefs weren't correct on a few libraries, which explains
some crashes on older systems or miswired symlinks for discovery libraries.
2024-06-19 08:39:07 -07:00
Daniel Hiltgen
2abebb2cbe
Merge pull request #5128 from zhewang1-intc/fix_levelzero_empty_symbol_detect
Fix levelzero empty symbol detect
2024-06-19 08:33:16 -07:00
Blake Mizerany
380e06e5be
types/model: remove Digest
The Digest type in its current form is awkward to work with and presents
challenges with regard to how it serializes via String using the '-'
prefix.
We currently only use this in ollama.com, so we'll move our specific
needs around digest parsing and validation there.
2024-06-18 20:28:11 -07:00
Wang,Zhe
badf975e45
get real func ptr.
2024-06-19 09:00:51 +08:00
Wang,Zhe
755b4e4fc2
Revert "gpu: add env var for detecting Intel oneapi gpus (#5076)"
This reverts commit 163cd3e77c.
2024-06-19 08:59:58 +08:00
Daniel Hiltgen
1a1c99e334
Bump latest fedora cuda repo to 39
2024-06-18 17:13:54 -07:00
Michael Yang
21adf8b6d2
Merge pull request #5121 from ollama/mxyng/deepseekv2
deepseek v2 graph
2024-06-18 16:30:58 -07:00
Daniel Hiltgen
784bf88b0d
Wire up windows AMD driver reporting
This seems to be ROCm version, not actually driver version, but
it may be useful for toggling logic for VRAM reporting in the future
2024-06-18 16:22:47 -07:00
Michael Yang
e873841cbb
deepseek v2 graph
2024-06-18 15:35:12 -07:00
Daniel Hiltgen
26d0bf9236
Merge pull request #5117 from dhiltgen/fix_prediction
Handle models with divergent layer sizes
2024-06-18 11:36:51 -07:00
Daniel Hiltgen
359b15a597
Handle models with divergent layer sizes
The recent refactoring of the memory prediction assumed all layers
are the same size, but for some models (like deepseek-coder-v2) this
is not the case, so our predictions were significantly off.
2024-06-18 11:05:34 -07:00
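A minimal Go sketch of why assuming uniform layer sizes skews the prediction. Both functions and the layer sizes are hypothetical; the point is only that summing the real per-layer sizes can give a different answer than dividing free memory by one layer's size.

```go
package main

import "fmt"

// layersThatFit walks the real per-layer sizes and counts how many
// leading layers fit in free bytes.
func layersThatFit(layerSizes []uint64, free uint64) int {
	var used uint64
	for i, sz := range layerSizes {
		if used+sz > free {
			return i
		}
		used += sz
	}
	return len(layerSizes)
}

// uniformEstimate is the flawed shortcut: it assumes every layer is the
// size of the first one.
func uniformEstimate(layerSizes []uint64, free uint64) int {
	if len(layerSizes) == 0 || layerSizes[0] == 0 {
		return 0
	}
	n := int(free / layerSizes[0])
	if n > len(layerSizes) {
		n = len(layerSizes)
	}
	return n
}

func main() {
	sizes := []uint64{100, 100, 400, 400} // hypothetical sizes, in MiB
	fmt.Println(uniformEstimate(sizes, 600), layersThatFit(sizes, 600)) // prints 4 3
}
```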
Daniel Hiltgen
b55958a587
Merge pull request #5106 from dhiltgen/clean_logs
Tighten up memory prediction logging
2024-06-18 09:24:38 -07:00
Daniel Hiltgen
7784ca33ce
Tighten up memory prediction logging
Prior to this change, we logged the memory prediction multiple times
as the scheduler iterates to find a suitable configuration, which can be
confusing since only the last log before the server starts is actually valid.
This now logs once just before starting the server on the final configuration.
It also reports which library is in use instead of always saying "offloading to gpu" when
using CPU.
2024-06-18 09:15:35 -07:00
Daniel Hiltgen
c9c8c98bf6
Merge pull request #5105 from dhiltgen/cuda_mmap
Adjust mmap logic for cuda windows for faster model load
2024-06-17 17:07:30 -07:00
Daniel Hiltgen
171796791f
Adjust mmap logic for cuda windows for faster model load
On Windows, recent llama.cpp changes make mmap slower in most
cases, so default to off. This also implements a tri-state for
use_mmap so we can detect the difference between a user provided
value of true/false, or unspecified.
2024-06-17 16:54:30 -07:00
Jeffrey Morgan
176d0f7075
Update import.md
2024-06-17 19:44:14 -04:00
Daniel Hiltgen
8ed51cac37
Merge pull request #5103 from dhiltgen/faster_win_build
Revert powershell jobs, but keep nvcc and cmake parallelism
2024-06-17 14:23:18 -07:00
Daniel Hiltgen
c9e6f0542d
Merge pull request #5069 from dhiltgen/ci_release
Implement custom github release action
2024-06-17 13:59:37 -07:00
Daniel Hiltgen
b0930626c5
Add back lower level parallel flags
nvcc supports parallelism (threads) and cmake + make can use -j,
while msbuild requires /p:CL_MPcount=8
2024-06-17 13:44:46 -07:00
Daniel Hiltgen
e890be4814
Revert "More parallelism on windows generate"
This reverts commit 0577af98f4.
2024-06-17 13:32:46 -07:00
Daniel Hiltgen
b2799f111b
Move libraries out of users path
We update the PATH on windows to get the CLI mapped, but this has
an unintended side effect of causing other apps that may use our bundled
DLLs to get terminated when we upgrade.
2024-06-17 13:12:18 -07:00
Jeffrey Morgan
152fc202f5
llm: update llama.cpp commit to 7c26775 (#4896)
* llm: update llama.cpp submodule to `7c26775`
* disable `LLAMA_BLAS` for now
* `-DLLAMA_OPENMP=off`
2024-06-17 15:56:16 -04:00
Lei Jitang
4ad0d4d6d3
Fix a build warning (#5096)
Signed-off-by: Lei Jitang <leijitang@outlook.com>
2024-06-17 14:47:48 -04:00
Jeffrey Morgan
163cd3e77c
gpu: add env var for detecting Intel oneapi gpus (#5076)
* gpu: add env var for detecting intel oneapi gpus
* fix build error
2024-06-16 20:09:05 -04:00
Daniel Hiltgen
4c2c8f93dd
Merge pull request #5080 from dhiltgen/debug_intel_crash
Add some more debugging logs for intel discovery
2024-06-16 14:42:41 -07:00
Daniel Hiltgen
fd1e6e0590
Add some more debugging logs for intel discovery
Also removes an unused overall count variable
2024-06-16 07:42:52 -07:00
royjhan
89c79bec8c
Add ModifiedAt Field to /api/show (#5033)
* Add Mod Time to Show
* Error Handling
2024-06-15 20:53:56 -07:00
Jeffrey Morgan
c7b77004e3
docs: add missing powershell package to windows development instructions (#5075)
* docs: add missing instruction for powershell build
The powershell script for building Ollama on Windows now requires the `ThreadJob` module. Add this to the instructions and dependency list.
* Update development.md
2024-06-15 23:08:09 -04:00
Daniel Hiltgen
07d143f412
Merge pull request #5058 from coolljt0725/fix_build_warning
gpu: Fix build warning
2024-06-15 11:52:36 -07:00
Daniel Hiltgen
a12283e2ff
Implement custom github release action
This implements the release logic we want via gh cli
to support updating releases with rc tags in place and retain
release notes and other community reactions.
2024-06-15 11:36:56 -07:00
Daniel Hiltgen
4b0050cf0e
Merge pull request #5037 from dhiltgen/faster_win_build
More parallelism on windows generate
2024-06-15 08:03:05 -07:00
Daniel Hiltgen
0577af98f4
More parallelism on windows generate
Make the build faster
2024-06-15 07:44:55 -07:00
Daniel Hiltgen
17ce203a26
Merge pull request #4875 from dhiltgen/rocm_gfx900_workaround
Rocm gfx900 workaround
2024-06-15 07:38:58 -07:00
Daniel Hiltgen
d76555ffb5
Merge pull request #4874 from dhiltgen/rocm_v6_bump
Rocm v6 bump
2024-06-15 07:38:32 -07:00
Daniel Hiltgen
2786dff5d3
Merge pull request #4264 from dhiltgen/show_gpu_visible_settings
Centralize GPU configuration vars
2024-06-15 07:33:52 -07:00
Lei Jitang
225f0d1219
gpu: Fix build warning
Signed-off-by: Lei Jitang <leijitang@outlook.com>
2024-06-15 14:26:23 +08:00
Daniel Hiltgen
532db58311
Merge pull request #4972 from jayson-cloude/main
fix: "Skip searching for network devices"
2024-06-14 17:04:40 -07:00
Daniel Hiltgen
6be309e1bd
Centralize GPU configuration vars
This should aid in troubleshooting by capturing and reporting the GPU
settings at startup in the logs along with all the other server settings.
2024-06-14 15:59:10 -07:00
Daniel Hiltgen
da3bf23354
Workaround gfx900 SDMA bugs
Implement support for GPU env var workarounds, and leverage
this for the Radeon RX Vega 56, which needs
HSA_ENABLE_SDMA=0 set to work properly.
2024-06-14 15:38:13 -07:00
Daniel Hiltgen
26ab67732b
Bump ROCm linux to 6.1.1
2024-06-14 15:37:54 -07:00
Daniel Hiltgen
45cacbaf05
Merge pull request #4517 from dhiltgen/gpu_incremental
Enhanced GPU discovery and multi-gpu support with concurrency
2024-06-14 15:35:00 -07:00
Daniel Hiltgen
17df6520c8
Remove mmap related output calc logic
2024-06-14 14:55:50 -07:00
Daniel Hiltgen
6f351bf586
review comments and coverage
2024-06-14 14:55:50 -07:00
Daniel Hiltgen
ff4f0cbd1d
Prevent multiple concurrent loads on the same gpus
While models are loading, the VRAM metrics are dynamic, so try
to load on a GPU that doesn't have a model actively loading, or wait
to avoid races that lead to OOMs
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
fc37c192ae
Refine CPU load behavior with system memory visibility
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
434dfe30c5
Reintroduce nvidia nvml library for windows
This library will give us the most reliable free VRAM reporting on windows
to enable concurrent model scheduling.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
4e2b7e181d
Refactor intel gpu discovery
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
48702dd149
Harden unload for empty runners
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
68dfc6236a
refined test timing
adjust timing on some tests so they don't timeout on small/slow GPUs
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
5e8ff556cb
Support forced spreading for multi GPU
Our default behavior today is to try to fit into a single GPU if possible.
Some users would prefer the old behavior of always spreading across
multiple GPUs even if the model can fit into one. This exposes that
tunable behavior.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
6fd04ca922
Improve multi-gpu handling at the limit
Still not complete; our prediction needs some refinement to understand each
discrete GPU's available space so we can see how many layers fit in each one.
Since we can't split one layer across multiple GPUs, we can't treat free space
as one logical block.
2024-06-14 14:51:40 -07:00
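A minimal Go sketch of the point above: because a layer cannot be split across GPUs, pooled free space overstates capacity. `assignLayers` is a hypothetical greedy placement written for illustration, not the scheduler's actual algorithm, and the sizes are made up.

```go
package main

import "fmt"

// assignLayers greedily places whole layers on GPUs in order. A layer is
// never split, so leftover space on one GPU cannot absorb part of the
// next layer.
func assignLayers(layerSizes, gpuFree []uint64) []int {
	counts := make([]int, len(gpuFree))
	g := 0
	var used uint64
	for _, sz := range layerSizes {
		// Move to the next GPU while this layer does not fit on the current one.
		for g < len(gpuFree) && used+sz > gpuFree[g] {
			g++
			used = 0
		}
		if g == len(gpuFree) {
			break // remaining layers stay on the CPU
		}
		counts[g]++
		used += sz
	}
	return counts
}

func main() {
	layers := []uint64{100, 100, 100} // hypothetical sizes, in MiB
	gpus := []uint64{150, 150}        // 300 MiB free in total
	// Pooled, 300 MiB suggests 3 layers; per GPU only one layer fits on each.
	fmt.Println(assignLayers(layers, gpus)) // prints [1 1]
}
```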
Daniel Hiltgen
206797bda4
Fix concurrency integration test to work locally
This worked remotely but wound up trying to spawn multiple servers
locally, which doesn't work.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
43ed358f9a
Refine GPU discovery to bootstrap once
Now that we call the GPU discovery routines many times to
update memory, this splits initial discovery from free memory
updating.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
b32ebb4f29
Use DRM driver for VRAM info for amd
The amdgpu driver's free VRAM reporting omits some other apps, so leverage the
upstream DRM driver, which keeps better tabs on things.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
fb9cdfa723
Fix server.cpp for the new cuda build macros
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
efac488675
Revert "Limit GPU lib search for now (#4777)"
This reverts commit 476fb8e892.
2024-06-14 14:51:40 -07:00
Jeffrey Morgan
6b800aa7b7
openai: do not set temperature to 0 when setting seed (#5045)
2024-06-14 13:43:56 -07:00
Jeffrey Morgan
dd7c9ebeaf
server: longer timeout in TestRequests (#5046)
2024-06-14 09:48:25 -07:00
Patrick Devine
4dc7fb9525
update 40xx gpu compat matrix (#5036)
2024-06-13 17:10:33 -07:00
Daniel Hiltgen
c39761c552
Merge pull request #5032 from dhiltgen/actually_skip
Actually skip PhysX on windows
2024-06-13 13:26:09 -07:00
Daniel Hiltgen
aac367636d
Actually skip PhysX on windows
2024-06-13 13:17:19 -07:00
Michael Yang
15a687ae4b
Merge pull request #5031 from ollama/mxyng/fix-multibyte-utf16
fix: multibyte utf16
2024-06-13 13:14:55 -07:00
Michael Yang
d528e1af75
fix utf16 for multibyte runes
2024-06-13 13:07:42 -07:00
Michael Yang
cd234ce22c
parser: add test for multibyte runes
2024-06-13 13:07:42 -07:00
Patrick Devine
94618b2365
add OLLAMA_MODELS to envconfig (#5029)
2024-06-13 12:52:03 -07:00
Jeffrey Morgan
1fd236d177
server: remove jwt decoding error (#5027)
2024-06-13 11:21:15 -07:00
Michael Yang
e87fc7200d
Merge pull request #5025 from ollama/mxyng/revert-parser-scan
Revert "proper utf16 support"
2024-06-13 10:31:25 -07:00
Michael Yang
20b9f8e6f4
Revert "proper utf16 support"
This reverts commit 66ab48772f.
this change broke utf-8 scanning of multi-byte runes
2024-06-13 10:22:16 -07:00
Patrick Devine
c69bc19e46
move OLLAMA_HOST to envconfig (#5009)
2024-06-12 18:48:16 -04:00
Michael Yang
bba5d177aa
Merge pull request #5004 from ollama/mxyng/fix-templates
fix: multiple templates when creating from model
2024-06-12 14:39:29 -07:00
Michael Yang
c16f8af911
fix: multiple templates when creating from model
multiple templates may appear in a model if a model is created from
another model that 1) has an autodetected template and 2) defines a
custom template
2024-06-12 13:35:49 -07:00
Michael Yang
217f60c3d9
Merge pull request #4987 from ollama/mxyng/revert-byte-order
...
Revert "Merge pull request #4938 from ollama/mxyng/fix-byte-order"
2024-06-11 16:04:20 -07:00
Michael Yang
7bdcd1da94
Revert "Merge pull request #4938 from ollama/mxyng/fix-byte-order"
...
This reverts commit f5f245cc15, reversing
changes made to 94d37fdcae.
this change broke gguf v2 which is incorrectly detected as big endian
2024-06-11 15:56:17 -07:00
Jeffrey Morgan
ead259d877
llm: fix seed value not being applied to requests (#4986)
2024-06-11 14:24:41 -07:00
James Montgomery
2ff45d571d
Add Ollama-hpp to Community Libraries in README (#4983)
2024-06-11 11:15:05 -07:00
jayson-cloude
157f09acdf
fix: "Skip searching for network devices"
...
On an Ubuntu 24.04 computer with VMware installed, the sudo lshw command hangs, and "Network interfaces" is always displayed.
2024-06-11 16:11:35 +08:00
Michael Yang
0f3cf1d42e
Merge pull request #4715 from ollama/mxyng/utf16-parser
...
proper utf16 support
2024-06-10 11:41:29 -07:00
Michael Yang
5bc029c529
Merge pull request #4921 from ollama/mxyng/import-md
...
update import.md
2024-06-10 11:41:09 -07:00
Michael Yang
e9a9c6a8e8
Merge pull request #4965 from ollama/mxyng/skip-layer-remove
...
fix: skip removing layers that no longer exist
2024-06-10 11:40:03 -07:00
Michael Yang
515f497e6d
fix: skip removing layers that no longer exist
2024-06-10 11:32:19 -07:00
Michael Yang
b27268aaef
add test
2024-06-10 11:32:15 -07:00
Michael Yang
f5f245cc15
Merge pull request #4938 from ollama/mxyng/fix-byte-order
...
fix parsing big endian gguf
2024-06-10 09:38:12 -07:00
Jim Scardelis
94d37fdcae
fix: examples/langchain-python-rag-privategpt/requirements.txt (#3382)
2024-06-09 10:58:09 -07:00
Craig Hughes
b84aea1685
Critical fix from llama.cpp JSON grammar to forbid unescaped escape characters inside strings, which break parsing (#3782)
2024-06-09 10:57:09 -07:00
Napuh
896495de7b
Add instructions to easily install specific versions on faq.md (#4084)
...
* Added instructions to easily install specific versions on faq.md
* Small typo
* Moved instructions on how to install specific version to linux.md
* Update docs/linux.md
* Update docs/linux.md
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-06-09 10:49:03 -07:00
dcasota
5528dd9d11
Error handling load_single_document() in ingest.py (#4852)
...
load_single_document() handles
- corrupt files
- empty (zero byte) files
- unsupported file extensions
2024-06-09 10:41:07 -07:00
Jeffrey Morgan
943172cbf4
Update api.md
2024-06-08 23:04:32 -07:00
Nischal Jain
85169e8d6f
Added headless-ollama (#4612)
2024-06-08 18:51:16 -07:00
Jeffrey Morgan
34f142797a
llm: always add bos token to prompt (#4941)
...
* fix embedding by adding fixes from llama.cpp upstream
* remove assert
---------
Co-authored-by: Jesper Ek <deadbeef84@gmail.com>
2024-06-08 18:47:10 -07:00
Erhan
46a7f1e74a
Update README.md with LangChainRust (#4854)
2024-06-08 17:29:36 -07:00
Michael Yang
620d5c569e
fix parsing big endian gguf
2024-06-08 12:35:26 -07:00
Michael Yang
b9ce7bf75e
update import.md
2024-06-07 16:45:15 -07:00
Daniel Hiltgen
cddc63381c
Merge pull request #4909 from dhiltgen/oneapi_disable
...
Add ability to skip oneapi generate
2024-06-07 14:07:15 -07:00
Michael Yang
385a32ecb5
Merge pull request #4910 from ollama/mxyng/detect-chat-template
...
fix create model when template detection errors
2024-06-07 11:07:39 -07:00
Michael Yang
030e765e76
fix create model when template detection errors
2024-06-07 10:51:35 -07:00
Daniel Hiltgen
ab8c929e20
Add ability to skip oneapi generate
...
This follows the same pattern for cuda and rocm to allow
disabling the build even when we detect the dependent libraries
2024-06-07 08:32:49 -07:00
Jeffrey Morgan
ce0dc33cb8
llm: patch to fix qwen 2 temporarily on nvidia (#4897)
2024-06-06 23:14:33 -07:00
Michael Yang
78f81fc0e5
Merge pull request #4800 from ollama/mxyng/detect-chat-template
...
detect chat template from KV
2024-06-06 16:17:18 -07:00
Michael Yang
9b6c2e6eb6
detect chat template from KV
2024-06-06 16:03:47 -07:00
royjhan
1a29e9a879
API app/browser access (#4879)
...
* API app/browser access
* Add tauri (resolves #2291, #4791, #3799, #4388)
2024-06-06 15:19:03 -07:00
royjhan
4bf1da4944
Separate ListResponse and ModelResponse for api/tags vs api/ps (#4842)
...
* Remove false time fields
* Struct Separation for List and Process
* Remove Marshaler
2024-06-06 10:11:45 -07:00
Blake Mizerany
de5beb06b3
server: skip blob verification for already verified blobs
2024-06-05 16:39:11 -07:00
Sam
98e65929dc
docs(tools): add gollama (#4829)
2024-06-05 14:13:39 -07:00
Michael Yang
66ab48772f
proper utf16 support
2024-06-05 13:11:50 -07:00
Michael Yang
22fcf8f7de
Merge pull request #3737 from ollama/mxyng/modelname-4
...
update create handler to use model.Name
2024-06-05 12:05:05 -07:00
royjhan
28c7813ac4
API PS Documentation (#4822)
...
* API PS Documentation
2024-06-05 11:06:53 -07:00
Kartikeya Mishra
1d8616d30f
docs: update to add LLocal.in to web & desktop integrations (#4719)
2024-06-04 14:43:59 -07:00
Michael Yang
d61ef8b954
update create handler to use model.Name
2024-06-04 13:28:25 -07:00
Michael Yang
89d9900152
Merge pull request #4570 from ollama/mxyng/slices
...
lint some of the things
2024-06-04 13:27:05 -07:00
Michael
4a048715b6
local wording was confusing people
...
local wording was confusing people -- Ollama runs on cloud providers
2024-06-04 13:25:25 -07:00
Michael Yang
6297f85606
gofmt, goimports
2024-06-04 13:20:24 -07:00
Michael Yang
ed56428dd7
warn on intrange, usestdlibvars
2024-06-04 11:52:48 -07:00
Michael Yang
ad40b92b6a
disable intrange
2024-06-04 11:35:30 -07:00
Michael Yang
8ce4032e72
more lint
2024-06-04 11:13:30 -07:00
Michael Yang
42660466f8
no usestdlibvars
2024-06-04 11:13:30 -07:00
Michael Yang
e919f6811f
lint windows
2024-06-04 11:13:30 -07:00
Michael Yang
bf7edb0d5d
lint linux
2024-06-04 11:13:30 -07:00
Michael Yang
f38353d6b9
stdin.fd
2024-06-04 11:13:30 -07:00
Michael Yang
201d853fdf
nolintlint
2024-06-04 11:13:30 -07:00
Michael Yang
e40145a39d
lint
2024-06-04 11:13:30 -07:00
Michael Yang
c895a7d13f
some gocritic
2024-06-04 11:13:30 -07:00
Michael Yang
dad7a987ae
nosprintfhostport
2024-06-04 11:13:30 -07:00
Michael Yang
8ffb51749f
nolintlint
2024-06-04 11:13:30 -07:00
Michael Yang
55f6eba049
gofmt
2024-06-04 11:13:30 -07:00
Michael Yang
04f3c12bb7
replace x/exp/slices with slices
2024-06-04 11:13:30 -07:00
Shubham
60323e0805
add embed model command and fix question invoke (#4766)
...
* add embed model command and fix question invoke
* Update docs/tutorials/langchainpy.md
Co-authored-by: Kim Hallberg <hallberg.kim@gmail.com>
* Update docs/tutorials/langchainpy.md
---------
Co-authored-by: Kim Hallberg <hallberg.kim@gmail.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-06-03 22:20:48 -07:00
Jeffrey Morgan
d4a86102fd
update welcome prompt in windows to llama3 (#4779)
2024-06-01 21:05:51 -07:00
Jeffrey Morgan
476fb8e892
Limit GPU lib search for now (#4777)
...
* fix oneapi errors on windows 10
2024-06-01 19:24:33 -07:00
Michael Yang
829ff87bd1
revert tokenize ffi (#4761)
...
* Revert "use `int32_t` for call to tokenize (#4738)"
This reverts commit 763bb65dbb.
* Revert "vocab only"
This reverts commit bf54c845e9.
* Revert "use ffi for tokenizing/detokenizing"
This reverts commit 26a00a0410.
2024-05-31 18:54:21 -07:00
Josh
f6b622c4b3
Merge pull request #4733 from ollama/jyan/isvalidname
...
added IsValidNamespace function
2024-05-31 14:08:45 -07:00
Josh Yan
2e4da8eec2
added tests for IsValidNamespace
2024-05-31 11:48:07 -07:00
Jeffrey Morgan
763bb65dbb
use int32_t for call to tokenize (#4738)
...
* use `int32_t` for call to tokenize
* variable naming
* cleanup
* fix crash
2024-05-30 21:43:30 -07:00
Jeffrey Morgan
7ca9605f54
speed up tests by only building static lib (#4740)
2024-05-30 21:43:15 -07:00
Michael Yang
eb2c443a79
Merge pull request #4736 from ollama/mxyng/vocab-only
...
vocab only for tokenize
2024-05-30 17:21:00 -07:00
Michael Yang
278e25ea44
Merge pull request #4737 from ollama/mxyng/less-generate
...
only generate on relevant changes
2024-05-30 17:17:50 -07:00
Jeffrey Morgan
a50a87a7b8
partial offloading: allow flash attention and disable mmap (#4734)
...
* partial offloading: allow flash attention and disable mmap
* allow mmap with num_gpu=0
2024-05-30 16:58:01 -07:00
Michael Yang
98085015d5
only generate on relevant changes
2024-05-30 16:54:11 -07:00
Michael Yang
bf54c845e9
vocab only
2024-05-30 16:49:28 -07:00
Josh Yan
c365f195a8
directly use isvalidpart
2024-05-30 16:40:04 -07:00
Josh
e91d0ef737
Merge pull request #4728 from ollama/jyan/japanese
...
fixed japanese characters deleted at end of line
2024-05-30 16:25:12 -07:00
Jeffrey Morgan
22f5c12ced
Update llama.cpp submodule to 5921b8f0 (#4731)
...
* update llama.cpp submodule to `5921b8f089d3b7bda86aac5a66825df6a6c10603`
* add patch
2024-05-30 16:20:22 -07:00
Josh Yan
298c996e54
added IsValidNamespace function
2024-05-30 16:02:07 -07:00
Daniel Hiltgen
0fc0cfc6d2
Merge pull request #4594 from dhiltgen/doc_container_workarounds
...
Add isolated gpu test to troubleshooting
2024-05-30 13:10:54 -07:00
Josh Yan
914f68f021
replaced duplicate call with variable
2024-05-30 10:38:07 -07:00
Josh Yan
bd1d119ba9
fixed japanese characters deleted at end of line
2024-05-30 10:24:21 -07:00
Lei Jitang
a03be18189
Fix OLLAMA_LLM_LIBRARY with wrong map name and add more env vars to help message (#4663)
...
* envconfig/config.go: Fix wrong description of OLLAMA_LLM_LIBRARY
Signed-off-by: Lei Jitang <leijitang@outlook.com>
* serve: Add more env to help message of ollama serve
Add more environment variables to `ollama serve --help`
to let users know what can be configured.
Signed-off-by: Lei Jitang <leijitang@outlook.com>
---------
Signed-off-by: Lei Jitang <leijitang@outlook.com>
2024-05-30 09:36:51 -07:00
Michael Yang
96bc232b43
Merge pull request #4413 from ollama/mxyng/name-check
...
check if name exists before create/pull/copy
2024-05-29 12:06:58 -07:00
Michael Yang
bca7b12284
Merge pull request #3718 from ollama/mxyng/modelname-3
...
update delete handler to use model.Name
2024-05-29 12:02:07 -07:00
Michael Yang
32cb1960c1
Merge pull request #4380 from ollama/mxyng/tokenize
...
use tokenize/detokenize
2024-05-29 12:00:59 -07:00
Michael Yang
de781b37c8
rm unused infill
2024-05-29 11:26:47 -07:00
Michael Yang
3e21799377
rm unused system prompt
2024-05-29 11:26:47 -07:00
Michael Yang
26a00a0410
use ffi for tokenizing/detokenizing
2024-05-29 11:26:47 -07:00
Daniel Hiltgen
646371f56d
Merge pull request #3278 from zhewang1-intc/rebase_ollama_main
...
Enabling ollama to run on Intel GPUs with SYCL backend
2024-05-28 16:30:50 -07:00
Jeffrey Morgan
1f5008544b
Update install.sh
2024-05-28 15:01:22 -07:00
Jeffrey Morgan
45cbfc5aee
fix wsl2 status check for nvidia cards (#4689)
2024-05-28 14:49:46 -07:00
Jeffrey Morgan
6d423b383b
Improve install experience on WSL2 and Linux (#4653)
2024-05-28 14:41:50 -07:00
Josh
ad897080a2
working on integration of multi-byte and multi-width runes (#4549)
...
* integrated runewidth for display management - fixed cursor movement for multi-width char
* updated input and deletion of multi-byte chars
* fixed line history with some exceptions
* improved insert and add
* fixed issues with moving across lines
* end of line extra space tracking
* saved changes
* fixed end of line issues with empty spaces
* worked some more
* worked on end of line
* fixed failed test
* fixed minor inserting bug
* fixed movement hotkeys
* adjusted hotkeys
* removed comments
* Update readline/buffer.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* Update readline/buffer.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* Update readline/buffer.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* Update readline/buffer.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* Update readline/buffer.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* Update readline/buffer.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* Update readline/buffer.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* Update readline/buffer.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* Update readline/buffer.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* Update readline/buffer.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* Update readline/buffer.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* Update readline/buffer.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* deleted comments and duplicate code
* removed duplicate code
* added comments, refactored add function to use addChar
* added helper to retrieve lineSpacing, renamed lineFlags for clarity
* fixed remove()
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-05-28 12:04:03 -07:00
Jeffrey Morgan
b7d316d98d
fix nvidia detection in install script (#4683)
2024-05-28 09:59:36 -07:00
Daniel Hiltgen
d7339fad52
Merge pull request #4682 from dhiltgen/more_time
...
Give the final model loading more time
2024-05-28 09:36:02 -07:00
Daniel Hiltgen
92c81e8117
Give the final model loading more time
...
On some systems, 1 minute isn't sufficient to finish the load after it
hits 100%. This creates 2 distinct timers, although they're both set to
the same value for now so we can refine the timeouts further.
2024-05-28 09:08:10 -07:00
Tai
9db0996ed4
Add OllamaSpring Project to Readme (#4672)
...
* Add OllamaSpring Project to Readme
* Update README.md
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-27 19:58:26 -07:00
Orfeo Ciano
6f43898b17
Adds olpaka flutter client (#4647)
...
* Adds olpaka flutter client
* Update README.md
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-27 17:22:01 -07:00
Lei Jitang
7487229c34
llm/server.go: Fix 2 minor typos (#4661)
...
Signed-off-by: Lei Jitang <leijitang@outlook.com>
2024-05-27 17:21:10 -07:00
Rayan Mostovoi
8a8e7afa96
small fix on examples/python-simplechat/client.py to actually get a streamed response and get tokens printed as we receive it (#4671)
2024-05-27 17:19:20 -07:00
Jeffrey Morgan
c79f8c9c39
Ensure nvidia and nvidia_uvm kernel modules are loaded in install.sh script and at startup (#4652)
...
* ensure kernel modules are loaded in `install.sh` script and at startup
* indentation
* use `SUDO` variable
* restart if nouveau is detected
* consistent success message for AMD
2024-05-26 14:57:17 -07:00
Jeffrey Morgan
485016bfbb
Update install.sh
2024-05-26 11:46:00 -07:00
Daniel Hiltgen
0165ba1651
Merge pull request #4638 from dhiltgen/better_error
...
Report better warning on client closed abort of load
2024-05-25 14:32:28 -07:00
Daniel Hiltgen
c4209d6d21
Report better warning on client closed abort of load
...
If the client closes the connection before we finish loading the model
we abort, so let's make the log message clearer about why, to help users
understand this failure mode
2024-05-25 09:23:28 -07:00
Michael Yang
6adca97f37
Merge pull request #4619 from noxer/patch-1
...
Fix download retry issue
2024-05-24 17:21:57 -07:00
Michael Yang
9a3c8003c8
Merge pull request #4624 from ollama/mxyng/fix-5
...
fix q5_0, q5_1
2024-05-24 16:11:21 -07:00
Michael Yang
d51f15257c
Update llm/ggml.go
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-05-24 16:10:43 -07:00
Michael Yang
8f440d579a
fix q5_0, q5_1
2024-05-24 16:01:46 -07:00
Patrick Devine
4cc3be3035
Move envconfig and consolidate env vars (#4608)
2024-05-24 14:57:15 -07:00
Tim Scheuermann
db2ffa79f1
Fix download retry issue
2024-05-24 20:30:42 +02:00
Jeffrey Morgan
afd2b058b4
set codesign timeout to longer (#4605)
2024-05-23 22:46:23 -07:00
Wang,Zhe
fd5971be0b
support ollama run on Intel GPUs
2024-05-24 11:18:27 +08:00
Daniel Hiltgen
89bf98bcf2
Merge pull request #4598 from dhiltgen/docs
...
Tidy up developer guide a little
2024-05-23 15:14:29 -07:00
Daniel Hiltgen
1b2d156094
Tidy up developer guide a little
2024-05-23 15:14:05 -07:00
Michael Yang
714adb8bd1
bump (#4597)
2024-05-23 14:16:26 -07:00
Daniel Hiltgen
95b1133d0c
Merge pull request #4547 from dhiltgen/load_progress
...
Wire up load progress
2024-05-23 14:06:02 -07:00
Daniel Hiltgen
b37b496a12
Wire up load progress
...
This doesn't expose a UX yet, but wires the initial server portion
of progress reporting during load
2024-05-23 13:36:48 -07:00
Bruce MacDonald
d6f692ad1a
Add support for IQ1_S, IQ3_S, IQ2_S, IQ4_XS, IQ4_NL (#4322)
...
Co-authored-by: ManniX-ITA <20623405+mann1x@users.noreply.github.com>
2024-05-23 13:21:49 -07:00
Daniel Hiltgen
f77713bf1f
Add isolated gpu test to troubleshooting
2024-05-23 09:33:25 -07:00
Jeffrey Morgan
38255d2af1
Use flash attention flag for now (#4580)
...
* put flash attention behind flag for now
* add test
* remove print
* up timeout for scheduler tests
2024-05-22 21:52:09 -07:00
Michael
73630a7e85
add phi 3 medium (#4578)
2024-05-22 12:53:45 -04:00
Ikko Eltociear Ashimine
955c317cab
chore: update tokenizer.go (#4571)
...
PreTokenziers -> PreTokenizers
2024-05-22 00:25:23 -07:00
Josh
9f18b88a06
Merge pull request #4566 from ollama/jyan/shortcuts
...
add Ctrl + W shortcut
2024-05-21 22:49:36 -07:00
Josh Yan
353f83a9c7
add Ctrl + W shortcut
2024-05-21 16:55:09 -07:00
Patrick Devine
3bade04e10
doc updates for the faq/troubleshooting (#4565)
2024-05-21 15:30:09 -07:00
Michael Yang
a6d0f443eb
Merge pull request #4543 from ollama/mxyng/simple-safetensors
...
simplify safetensors reading
2024-05-21 14:43:55 -07:00
Michael Yang
96236b7968
Merge pull request #4268 from ollama/pdevine/llama3
...
Convert directly from llama3
2024-05-21 14:43:37 -07:00
Sang Park
4434d7f447
Correct typo in error message (#4535)
...
The spelling of the term "request" has been corrected, which was previously mistakenly written as "requeset" in the error log message.
2024-05-21 13:39:01 -07:00
Michael Yang
171eb040fc
simplify safetensors reading
2024-05-21 11:28:22 -07:00
Michael Yang
3591bbe56f
add test
2024-05-21 11:28:22 -07:00
Michael Yang
34d5ef29b3
fix conversion for f16 or f32 inputs
2024-05-21 11:28:22 -07:00
Michael Yang
bbbd9f20f3
cleanup
2024-05-20 16:13:57 -07:00
Michael Yang
547132e820
bpe pretokenizer
2024-05-20 16:13:57 -07:00
Patrick Devine
2d315ba9a9
add missing file
2024-05-20 16:13:57 -07:00
Patrick Devine
d355d2020f
add fixes for llama
2024-05-20 16:13:57 -07:00
Patrick Devine
c8cf0d94ed
llama3 conversion
2024-05-20 16:13:57 -07:00
Patrick Devine
4730762e5c
add safetensors version
2024-05-20 16:13:57 -07:00
Patrick Devine
d88582dffd
some changes for llama3
2024-05-20 16:13:57 -07:00
Michael Yang
2f81b3dce2
Merge pull request #4502 from ollama/mxyng/fix-quantize
...
fix quantize file types
2024-05-20 16:09:27 -07:00
jmorganca
5cab13739e
set llama.cpp submodule commit to 614d3b9
2024-05-20 15:28:17 -07:00
Josh Yan
8aadad9c72
updated updateURL
2024-05-20 15:24:32 -07:00
Michael Yang
807d092761
fix quantize file types
2024-05-20 15:22:11 -07:00
Michael Yang
f36f1d6be9
tidy intermediate blobs
2024-05-20 15:15:06 -07:00
alwqx
8800c8a59b
chore: fix typo in docs (#4536)
2024-05-20 14:19:03 -07:00
Michael Yang
b4dce13309
Merge pull request #4330 from ollama/mxyng/cache-intermediate-layers
...
cache and reuse intermediate blobs
2024-05-20 13:54:41 -07:00
Sam
e15307fdf4
feat: add support for flash_attn (#4120)
...
* feat: enable flash attention if supported
* feat: enable flash attention if supported
* feat: enable flash attention if supported
* feat: add flash_attn support
2024-05-20 13:36:03 -07:00
Michael Yang
3520c0e4d5
cache and reuse intermediate blobs
...
particularly useful for zipfiles and f16s
2024-05-20 13:25:10 -07:00
Patrick Devine
ccdf0b2a44
Move the parser back + handle utf16 files (#4533)
2024-05-20 11:26:45 -07:00
jmorganca
63a453554d
go mod tidy
2024-05-19 23:03:57 -07:00
Patrick Devine
105186aa17
add OLLAMA_NOHISTORY to turn off history in interactive mode (#4508)
2024-05-18 11:51:57 -07:00
Daniel Hiltgen
ba04afc9a4
Merge pull request #4483 from dhiltgen/clean_exit
...
Don't return error on signal exit
2024-05-17 11:41:57 -07:00
Daniel Hiltgen
7e1e0086e7
Merge pull request #4482 from dhiltgen/integration_improvements
...
Skip max queue test on remote
2024-05-16 16:43:48 -07:00
Daniel Hiltgen
02b31c9dc8
Don't return error on signal exit
2024-05-16 16:25:38 -07:00
Daniel Hiltgen
7f2fbad736
Skip max queue test on remote
...
This test needs to be able to adjust the queue size down from
our default setting for a reliable test, so it needs to skip on
remote test execution mode.
2024-05-16 16:24:18 -07:00
Josh
5bece94509
Merge pull request #4463 from ollama/jyan/line-display
...
changed line display to be calculated with runewidth
2024-05-16 14:15:08 -07:00
Josh Yan
3d90156e99
removed comment
2024-05-16 14:12:03 -07:00
Rose Heart
5e46c5c435
Updating software for read me ( #4467 )
...
* Update README.md
Added chat/moderation bot to list of software.
* Update README.md
Fixed link error.
2024-05-16 13:55:14 -07:00
Jeffrey Morgan
583c1f472c
update llama.cpp submodule to 614d3b9 (#4414)
2024-05-16 13:53:09 -07:00
Josh Yan
26bfc1c443
go fmt'd cmd.go
2024-05-15 17:26:39 -07:00
Josh Yan
799aa9883c
go fmt'd cmd.go
2024-05-15 17:24:17 -07:00
Michael Yang
84ed77cbd8
Merge pull request #4436 from ollama/mxyng/done-part
...
return on part done
2024-05-15 17:16:24 -07:00
Josh Yan
c9e584fb90
updated double-width display
2024-05-15 16:45:24 -07:00
Josh Yan
17b1e81ca1
fixed width and word count for double spacing
2024-05-15 16:29:33 -07:00
Daniel Hiltgen
7e9a2da097
Merge pull request #4462 from dhiltgen/opt_out_build
...
Port cuda/rocm skip build vars to linux
2024-05-15 16:27:47 -07:00
Daniel Hiltgen
c48c1d7c46
Port cuda/rocm skip build vars to linux
...
Windows already implements these, carry over to linux.
2024-05-15 15:56:43 -07:00
Patrick Devine
d1692fd3e0
fix the cpu estimatedTotal memory + get the expiry time for loading models (#4461)
2024-05-15 15:43:16 -07:00
Daniel Hiltgen
5fa36a0833
Merge pull request #4459 from dhiltgen/sanitize_env_log
...
Sanitize the env var debug log
2024-05-15 14:58:55 -07:00
Daniel Hiltgen
853ae490e1
Sanitize the env var debug log
...
Only dump env vars we care about in the logs
2024-05-15 14:42:57 -07:00
Patrick Devine
f2cf97d6f1
fix typo in modelfile generation (#4439)
2024-05-14 15:34:29 -07:00
Patrick Devine
c344da4c5a
fix keepalive for non-interactive mode (#4438)
2024-05-14 15:17:04 -07:00
Michael Yang
85a57006d1
check if name exists before create/pull/copy
2024-05-14 14:58:58 -07:00
Michael Yang
c5e892cb3e
update tests
2024-05-14 14:56:31 -07:00
Michael Yang
81fb06f530
more resilient Manifests
2024-05-14 14:08:24 -07:00
Michael Yang
a385382ff5
filepath.Join
2024-05-14 14:08:24 -07:00
Michael Yang
b8772a353f
remove DeleteModel
2024-05-14 14:08:24 -07:00
Michael Yang
c2714fcbfd
routes: use Manifests for ListHandler
2024-05-14 14:08:24 -07:00
Michael Yang
a2fc933fed
update delete handler to use model.Name
2024-05-14 14:08:24 -07:00
Michael Yang
0e331c7168
Merge pull request #4328 from ollama/mxyng/mem
...
count memory up to NumGPU if set by user
2024-05-14 13:47:44 -07:00
Michael Yang
ac145f75ca
return on part done
2024-05-14 13:04:30 -07:00
Patrick Devine
a4b8d1f89a
re-add system context (#4435)
2024-05-14 11:38:20 -07:00
Ryo Machida
798b107f19
Fixed the API endpoint /api/tags when the model list is empty (#4424)
...
* Fixed the API endpoint /api/tags to return {models: []} instead of {models: null} when the model list is empty.
* Update server/routes.go
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-14 11:18:10 -07:00
Daniel Hiltgen
6a1b471365
Merge pull request #4430 from dhiltgen/gpu_info
...
Remove VRAM convergence check for windows
2024-05-14 10:59:06 -07:00
Daniel Hiltgen
ec231a7923
Remove VRAM convergence check for windows
...
The APIs we query are optimistic on free space, and windows pages
VRAM, so we don't have to wait to see reported usage recover on unload
2024-05-14 09:53:46 -07:00
Patrick Devine
7ca71a6b0f
don't abort when an invalid model name is used in /save (#4416)
2024-05-13 18:48:28 -07:00
Josh
7607e6e902
Merge pull request #4379 from WolfTheDeveloper/main
...
Update `LlamaScript` to point to new link from Legacy link.
2024-05-13 18:08:32 -07:00
Patrick Devine
f1548ef62d
update the FAQ to be more clear about windows env variables (#4415)
2024-05-13 18:01:13 -07:00
Patrick Devine
6845988807
Ollama ps command for showing currently loaded models (#4327)
2024-05-13 17:17:36 -07:00
Josh
9eed4a90ce
Merge pull request #4411 from joshyan1/main
...
removed inconsistent punctuation
2024-05-13 15:30:45 -07:00
Josh Yan
f8464785a6
removed inconsistencies
2024-05-13 14:50:52 -07:00
Michael Yang
1d359e737e
typo
2024-05-13 14:18:34 -07:00
Michael Yang
50b9056e09
count memory up to NumGPU
2024-05-13 14:13:10 -07:00
Josh Yan
91a090a485
removed inconsistent punctuation
2024-05-13 14:08:22 -07:00
睡觉型学渣
9c76b30d72
Correct typos (#4387)
...
* Correct typos.
* Correct typos.
2024-05-12 18:21:11 -07:00
Zander Lewis
93f19910c5
Update LlamaScript to point to new link.
...
Still used Legacy link.
2024-05-12 11:24:21 -04:00
jmorganca
4ec7445a6f
Revert "use post token"
...
This reverts commit 0fec3525ad.
2024-05-11 22:19:14 -07:00
Michael Yang
0372c51f82
Merge pull request #4369 from ollama/mxyng/post-token
...
use post token
2024-05-11 19:29:14 -07:00
Michael Yang
0fec3525ad
use post token
2024-05-11 19:13:16 -07:00
Jeffrey Morgan
41ba3017fd
Fix OpenAI finish_reason values when empty (#4368)
2024-05-11 15:31:41 -07:00
todashuta
8080fbce35
fix ollama create's usage string (#4362)
2024-05-11 14:47:49 -07:00
Michael Yang
ec14f6ceda
case sensitive filepaths (#4366)
2024-05-11 14:12:36 -07:00
Daniel Hiltgen
c60a086635
Merge pull request #4331 from dhiltgen/fix_unit
...
Fix envconfig unit test
2024-05-11 09:16:28 -07:00
jmorganca
92ca2cca95
Revert "only forward some env vars"
...
This reverts commit ce3b212d12.
2024-05-10 22:53:21 -07:00
Patrick Devine
1e1634daca
update go deps (#4324)
2024-05-10 21:39:27 -07:00
Daniel Hiltgen
824ee5446f
Fix envconfig unit test
2024-05-10 16:49:48 -07:00
Daniel Hiltgen
879e2caf8c
Merge pull request #4329 from dhiltgen/zero_layers
...
Fall back to CPU runner with zero layers
2024-05-10 15:23:16 -07:00
Daniel Hiltgen
c4014e73a2
Fall back to CPU runner with zero layers
2024-05-10 15:09:48 -07:00
Daniel Hiltgen
be9efdb981
Merge pull request #4326 from dhiltgen/fix_integration
...
Integration fixes
2024-05-10 14:25:59 -07:00
Daniel Hiltgen
074dc3b9d8
Integration fixes
2024-05-10 14:20:10 -07:00
Daniel Hiltgen
86f9b582d5
Merge pull request #4323 from dhiltgen/sort_by_free
...
Always use the sorted list of GPUs
2024-05-10 14:12:15 -07:00
Daniel Hiltgen
4142c3ef7c
Always use the sorted list of GPUs
...
Make sure the first GPU has the most free space
2024-05-10 13:53:21 -07:00
Jeffrey Morgan
6602e793c0
Use --quantize flag and quantize api parameter (#4321)
...
* rename `--quantization` to `--quantize`
* backwards
* Update api/types.go
Co-authored-by: Michael Yang <mxyng@pm.me>
---------
Co-authored-by: Michael Yang <mxyng@pm.me>
2024-05-10 13:06:13 -07:00
Michael Yang
ea0fdaed28
Merge pull request #4320 from ollama/mxyng/phi2-mem
...
add phi2 mem
2024-05-10 12:35:08 -07:00
Michael Yang
1eb382da5a
add phi2 mem
2024-05-10 12:13:28 -07:00
Jeffrey Morgan
bb6fd02298
Don't clamp ctx size in PredictServerFit (#4317)
...
* don't clamp ctx size in `PredictServerFit`
* minimum 4 context
* remove context warning
2024-05-10 10:17:12 -07:00
Daniel Hiltgen
7e2bceceee
Merge pull request #4316 from dhiltgen/more_buffer
...
Bump VRAM buffer back up
2024-05-10 10:02:34 -07:00
Daniel Hiltgen
30a7d7096c
Bump VRAM buffer back up
...
Under stress scenarios we're seeing OOMs so this should help stabilize
the allocations under heavy concurrency stress.
2024-05-10 09:15:28 -07:00
Michael Yang
200a18820e
Merge pull request #4306 from ollama/mxyng/fix-routes
2024-05-10 08:58:16 -07:00
Michael Yang
e03637176d
fix(routes): skip bad manifests
2024-05-10 08:46:11 -07:00
Bruce MacDonald
c02db93243
omit empty done reason
2024-05-09 16:45:29 -07:00
Michael Yang
ffa4d5134a
Merge pull request #4305 from ollama/mxyng/typo
...
fix typo
2024-05-09 16:42:09 -07:00
Jeffrey Morgan
302d7fdbf3
prune partial downloads (#4272)
2024-05-09 16:35:20 -07:00
Michael Yang
cf442cd57e
fix typo
2024-05-09 16:23:37 -07:00
Michael Yang
0e1ba65855
Merge pull request #4302 from ollama/mxyng/forward-env
...
only forward some env vars
2024-05-09 16:21:05 -07:00
Michael Yang
6aad333c63
Merge pull request #4298 from ollama/mxyng/log-cleanup
...
log clean up
2024-05-09 16:20:57 -07:00
Daniel Hiltgen
4fcc84e67a
Merge pull request #4304 from dhiltgen/signals
...
Fix race in shutdown logic
2024-05-09 15:58:44 -07:00
Daniel Hiltgen
3ae2f441e0
Fix race in shutdown logic
...
Ensure the runners are terminated
2024-05-09 15:54:02 -07:00
Zander Lewis
2abb3f6424
Update README.md (#4300)
2024-05-09 15:30:49 -07:00
Michael Yang
ce3b212d12
only forward some env vars
2024-05-09 15:16:09 -07:00
Daniel Hiltgen
83d6d46e29
Merge pull request #4299 from dhiltgen/handle_vram_reporting_lag
...
Wait for GPU free memory reporting to converge
2024-05-09 15:08:56 -07:00
Daniel Hiltgen
354ad9254e
Wait for GPU free memory reporting to converge
...
The GPU drivers take a while to update their free memory reporting, so we need
to wait until the values converge with what we're expecting before proceeding
to start another runner in order to get an accurate picture.
2024-05-09 14:56:01 -07:00
Michael Yang
58876091f7
log clean up
2024-05-09 14:55:36 -07:00
Daniel Hiltgen
dc18eee39d
Merge pull request #4238 from dhiltgen/gpu_info
...
Record more GPU information
2024-05-09 14:26:58 -07:00
Daniel Hiltgen
8727a9c140
Record more GPU information
...
This cleans up the logging for GPU discovery a bit, and can
serve as a foundation to report GPU information in a future UX.
2024-05-09 14:18:14 -07:00
Daniel Hiltgen
d0425f26cf
Merge pull request #4294 from dhiltgen/harden_subprocess_reaping
...
Harden subprocess reaping
2024-05-09 14:02:16 -07:00
Bruce MacDonald
cfa84b8470
add done_reason to the api (#4235)
2024-05-09 13:30:14 -07:00
Michael Yang
1580ed4c06
Merge pull request #4295 from ollama/mxyng/fix-list
...
routes: skip invalid filepaths
2024-05-09 11:37:34 -07:00
Michael Yang
a7ee84fc31
routes: skip invalid filepaths
2024-05-09 11:23:22 -07:00
Daniel Hiltgen
84ac7ce139
Refine subprocess reaping
2024-05-09 11:21:31 -07:00
tusharhero
788b092c49
docs: add Guix package manager in README (#4040)
2024-05-09 11:10:24 -07:00
J S
5cde17a096
Add PromptingTools.jl (#2192)
2024-05-09 09:39:05 -07:00
Daniel Hiltgen
c3837eb08c
Merge pull request #4289 from dhiltgen/doc_container_workarounds
...
Doc container usage and workaround for nvidia errors
2024-05-09 09:27:29 -07:00
Daniel Hiltgen
8cc0ee2efe
Doc container usage and workaround for nvidia errors
2024-05-09 09:26:45 -07:00
Jeffrey Morgan
d5eec16d23
use model defaults for num_gqa, rope_frequency_base and rope_frequency_scale (#1983)
2024-05-09 09:06:13 -07:00
Carlos Gamez
daa1a032f7
Update langchainjs.md (#2027)
...
Updated sample code as per warning notification from the package maintainers
2024-05-08 20:21:03 -07:00
jmorganca
6042e8bc57
remove bash-comparemodels example
2024-05-08 19:49:45 -07:00
Daniel Hiltgen
920a4b0794
Merge remote-tracking branch 'upstream/main' into pr3702
2024-05-08 16:44:35 -07:00
Daniel Hiltgen
ee49844d09
Merge pull request #4153 from dhiltgen/gpu_verbose_response
...
Add GPU usage
2024-05-08 16:39:11 -07:00
Daniel Hiltgen
8a516ac862
Merge pull request #4241 from dhiltgen/fix_tmp_override
...
Detect noexec and report a better error
2024-05-08 15:34:22 -07:00
Daniel Hiltgen
bee2f4a3b0
Record GPU usage information
...
This records more GPU usage information for eventual UX inclusion.
2024-05-08 14:45:39 -07:00
Bruce MacDonald
cef45feaa4
Add preflight OPTIONS handling and update CORS config (#4086)
...
* Add preflight OPTIONS handling and update CORS config
- Implement early return with HTTP 204 (No Content) for OPTIONS requests in allowedHostsMiddleware to optimize preflight handling.
- Extend CORS configuration to explicitly allow 'Authorization' headers and 'OPTIONS' method when OLLAMA_ORIGINS environment variable is set.
* allow auth, content-type, and user-agent headers
* Update routes.go
2024-05-08 13:14:00 -07:00
Michael Yang
2687f02c96
Merge pull request #4265 from ollama/mxyng/fix-show-llava
...
routes: fix show llava models
2024-05-08 12:51:21 -07:00
Michael Yang
b25976aeb8
routes: fix show llava models
2024-05-08 12:43:36 -07:00
Michael Yang
001f167aad
Merge pull request #4261 from ollama/mxyng/fix-tag-case
...
types/model: fix tag case
2024-05-08 11:09:47 -07:00
Michael Yang
486a2c1d94
types/model: fix tag case
2024-05-08 08:47:16 -07:00
Michael Yang
88cf154483
Merge pull request #4244 from ollama/mxyng/skip-if-same
...
skip if same quantization
2024-05-07 19:03:37 -07:00
Bruce MacDonald
8cbd3e7510
skip hidden files in list models handler (#4247)
2024-05-07 19:01:45 -07:00
Michael Yang
eeb695261f
skip if same quantization
2024-05-07 17:44:19 -07:00
Bruce MacDonald
dc9b1111e0
fix invalid destination error message
2024-05-07 17:35:52 -07:00
Tobias Gårdhus
06ac829e70
Fix help string for stop parameter (#2307)
2024-05-07 16:48:35 -07:00
Daniel Hiltgen
72700279e2
Detect noexec and report a better error
...
This will bubble up a much more informative error message if noexec
is preventing us from running the subprocess
2024-05-07 16:46:15 -07:00
boessu
5d3f7fff26
Update langchainpy.md (#4236)
...
fixing pip code.
2024-05-07 16:36:34 -07:00
Eli Bendersky
d77c1c5f9d
api: fill up API documentation (#3596)
...
* api: fill up API documentation
Followup for #2878
Now that the documentation is more complete, mention it in the README.
Updates #2840
* fix typo/lint
* Update README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-07 16:27:46 -07:00
Giuseppe Lumia
2a5302a1cf
Fix paste of text with line feed characters (#3043)
...
Some terminals may send line feed characters when pasting text with
newlines.
2024-05-07 15:26:07 -07:00
Michael Yang
ffbd3d173f
Merge pull request #3715 from ollama/mxyng/modelname-2
...
update list handler to use model.Name
2024-05-07 15:21:39 -07:00
Michael Yang
1e0a669f75
Merge pull request #3682 from ollama/mxyng/quantize-all-the-things
...
quantize any fp16/fp32 model
2024-05-07 15:20:49 -07:00
Bruce MacDonald
527e9be058
fix: store accurate model parameter size (#4058)
...
- add test for number formatting
- fix bug where 1B and 1M were not stored correctly
- display 2 decimal points for million param sizes
- display 1 decimal point for billion param sizes
2024-05-07 14:41:53 -07:00
Renat
34bea2e272
Add macai to list of Web & Desktop integrations (#3881)
2024-05-07 13:31:34 -07:00
Fernando Maclen
fe44ae3371
Update README.md (#3884)
2024-05-07 13:17:35 -07:00
Michael Yang
adeb40eaf2
Merge pull request #4231 from ollama/mxyng/parser
...
types/model: fix parser for empty values
2024-05-07 10:48:32 -07:00
Michael Yang
d7d33e5255
Merge pull request #951 from ollama/mxyng/example-fly
...
fly example
2024-05-07 10:46:24 -07:00
Michael Yang
63bc884e25
types/model: fix parser for empty values
2024-05-07 10:44:43 -07:00
Michael Yang
ef4e095d24
Merge pull request #4232 from ollama/revert-4190-fix/golang-ci
...
Revert "fix golangci workflow not enable gofmt and goimports"
2024-05-07 10:39:37 -07:00
Michael Yang
4d4f75a8a8
Revert "fix golangci workflow missing gofmt and goimports (#4190)"
...
This reverts commit 04f971c84b.
2024-05-07 10:35:44 -07:00
Mélony QIN
3f71ba406a
Correct the kubernetes terminology (#3843)
...
* add details on kubernetes deployment and separate the testing process
* Update examples/kubernetes/README.md
Thanks for suggesting this change; I agree with you, and let's make this project better together!
Co-authored-by: JonZeolla <Zeolla@gmail.com>
---------
Co-authored-by: QIN Mélony <MQN1@dsone.3ds.com>
Co-authored-by: JonZeolla <Zeolla@gmail.com>
2024-05-07 09:53:08 -07:00
Hause Lin
88a67127d8
Update README.md to include ollama-r library (#4012)
...
* Update README.md
Add Ollama for R - ollama-r library
* Update README.md
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-07 09:52:30 -07:00
Jeffrey Morgan
f7dc7dcc64
Update .gitattributes
2024-05-07 09:50:19 -07:00
alwqx
04f971c84b
fix golangci workflow missing gofmt and goimports (#4190)
2024-05-07 09:49:40 -07:00
Michael Yang
548a7df014
update list handler to use model.Name
2024-05-07 09:38:45 -07:00
Michael Yang
70edb9bc4d
Merge pull request #4215 from ollama/mxyng/mem
...
llm: add minimum based on layer size
2024-05-07 09:26:33 -07:00
Michael Yang
3f0ed03856
Update examples/flyio/README.md
...
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-07 09:25:01 -07:00
Michael Yang
4736391bfb
llm: add minimum based on layer size
2024-05-06 17:04:19 -07:00
CrispStrobe
7c5330413b
note on naming restrictions (#2625)
...
* note on naming restrictions
else push would fail with cryptic
retrieving manifest
Error: file does not exist
==> maybe change that in code too
* Update docs/import.md
---------
Co-authored-by: C-4-5-3 <154636388+C-4-5-3@users.noreply.github.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-06 16:03:21 -07:00
Jeffrey Morgan
39d9d22ca3
close server on receiving signal (#4213)
2024-05-06 16:01:37 -07:00
Jackie Li
af47413dba
Add MarshalJSON to Duration (#3284)
...
---------
Co-authored-by: Patrick Devine <patrick@infrahq.com>
2024-05-06 15:59:18 -07:00
Michael Yang
b2f00aa977
close zip files
2024-05-06 15:27:19 -07:00
Michael Yang
6694be5e50
convert/llama: use WriteSeeker
2024-05-06 15:24:01 -07:00
Michael Yang
f5e8b207fb
s/DisplayLongest/String/
2024-05-06 15:24:01 -07:00
Michael Yang
d245460362
only quantize language models
2024-05-06 15:24:01 -07:00
Michael Yang
4d0d0fa383
no iterator
2024-05-06 15:24:01 -07:00
Michael Yang
7ffe45734d
rebase
2024-05-06 15:24:01 -07:00
Michael Yang
01811c176a
comments
2024-05-06 15:24:01 -07:00
Michael Yang
a7248f6ea8
update tests
2024-05-06 15:24:01 -07:00
Michael Yang
9685c34509
quantize any fp16/fp32 model
...
- FROM /path/to/{safetensors,pytorch}
- FROM /path/to/fp{16,32}.bin
- FROM model:fp{16,32}
2024-05-06 15:24:01 -07:00
Jeffrey Chen
d091fe3c21
Windows automatically recognizes username (#3214)
2024-05-06 15:03:14 -07:00
Mohamed A. Fouad
ee02f548c8
Update linux.md (#3847)
...
Add -e to viewing logs in order to show end of ollama logs
2024-05-06 15:02:25 -07:00
Daniel Hiltgen
b08870aff3
Merge pull request #4188 from dhiltgen/use_our_lib
...
Use our bundled libraries (cuda) instead of the host library
2024-05-06 14:41:05 -07:00
Darinka
3ecae420ac
Update api.md (#3945)
...
* Update api.md
Changed the calculation of tps (token/s) in the documentation
* Update docs/api.md
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-06 14:39:58 -07:00
Daniel Hiltgen
4cbbf0e13b
Merge pull request #4090 from dhiltgen/rocm_paths
...
Support Fedora's standard ROCm location
2024-05-06 14:33:41 -07:00
Daniel Hiltgen
380378cc80
Use our libraries first
...
Trying to live off the land for cuda libraries was not the right strategy. We need to use the version we compiled against to ensure things work properly
2024-05-06 14:23:29 -07:00
Daniel Hiltgen
0963c65027
Merge pull request #4208 from dhiltgen/fix_sched_test
...
Fix stale test logic
2024-05-06 14:23:12 -07:00
Jeffrey Morgan
ed740a2504
Fix no slots available error with concurrent requests (#4160)
2024-05-06 14:22:53 -07:00
Jeffrey Morgan
c9f98622b1
Skip scheduling cancelled requests, always reload unloaded runners (#4189)
2024-05-06 14:22:24 -07:00
Daniel Hiltgen
0a954e5066
Fix stale test logic
...
The model processing was recently changed to be deferred but
this test scenario hadn't been adjusted for that change in behavior.
2024-05-06 14:15:37 -07:00
Adrien Brault
aa93423fbf
docs: pbcopy on mac (#3129)
2024-05-06 13:47:00 -07:00
Nurgo
01c9386267
Add BrainSoup to compatible clients list (#3473)
2024-05-06 13:42:16 -07:00
Daniel Hiltgen
af9eb36f9f
Merge pull request #4135 from dhiltgen/no_physx
...
Skip PhysX cudart library
2024-05-06 13:34:00 -07:00
Daniel Hiltgen
06093fd396
Merge pull request #4067 from dhiltgen/cudart
...
Add CUDA Driver API for GPU discovery
2024-05-06 13:30:27 -07:00
Tony Loehr
86b7fcac32
Update README.md with StreamDeploy (#3621)
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-05-06 11:14:41 -07:00
Hyden Liu
fb8ddc564e
chore: delete HEAD (#4194)
2024-05-06 10:32:30 -07:00
Saif
242efe6611
👌 IMPROVE: add portkey library for production tools (#4119)
2024-05-06 10:25:23 -07:00
Jeffrey Morgan
1b0e6c9c0e
Fix llava models not working after first request (#4164)
...
* fix llava models not working after first request
* individual requests only for llava models
2024-05-05 20:50:31 -07:00
Jeffrey Morgan
dfa2f32ca0
unload in critical section (#4187)
2024-05-05 17:18:27 -07:00
Daniel Hiltgen
840424a2c4
Merge pull request #4154 from dhiltgen/central_config
...
Centralize server config handling
2024-05-05 17:08:26 -07:00
Daniel Hiltgen
f56aa20014
Centralize server config handling
...
This moves all the env var reading into one central module
and logs the loaded config once at startup which should
help in troubleshooting user server logs
2024-05-05 16:49:50 -07:00
alwqx
6707768ebd
chore: format go code (#4149)
2024-05-05 16:08:09 -07:00
Lord Basil - Automate EVERYTHING
c78bb76a12
update libraries for langchain_community + llama3 changed from llama2 (#4174)
2024-05-05 16:07:04 -07:00
Jeffrey Morgan
942c979232
allocate a large enough kv cache for all parallel requests (#4162)
2024-05-05 15:59:32 -07:00
Bernardo de Oliveira Bruning
06164911dd
Update README.md (#4111)
...
---------
Co-authored-by: Patrick Devine <patrick@infrahq.com>
2024-05-05 14:45:32 -07:00
Patrick Devine
2a21363bb7
validate the format of the digest when getting the model path (#4175)
2024-05-05 11:46:12 -07:00
Daniel Hiltgen
026869915f
Merge pull request #4144 from dhiltgen/max_queue
...
Make maximum pending request configurable
2024-05-05 10:53:44 -07:00
Daniel Hiltgen
45d61aaaa3
Add integration test to push max queue limits
2024-05-05 10:46:25 -07:00
Daniel Hiltgen
20f6c06569
Make maximum pending request configurable
...
This also bumps up the default to be 50 queued requests
instead of 10.
2024-05-04 21:00:52 -07:00
Daniel Hiltgen
371f5e52aa
Merge pull request #4141 from dhiltgen/win_docs
...
Explain the 2 different windows download options
2024-05-04 12:50:16 -07:00
Daniel Hiltgen
e006480e49
Explain the 2 different windows download options
2024-05-04 12:50:05 -07:00
Michael Yang
aed545872d
Merge pull request #4143 from ollama/mxyng/final-response
...
omit prompt and generate settings from final response
2024-05-03 17:39:49 -07:00
Michael Yang
44869c59d6
omit prompt and generate settings from final response
2024-05-03 17:00:02 -07:00
Daniel Hiltgen
52663284cf
Merge pull request #4145 from dhiltgen/fix_lint
...
Fix lint warnings
2024-05-03 16:53:17 -07:00
Daniel Hiltgen
42fa9d7f0a
Fix lint warnings
2024-05-03 16:44:19 -07:00
Michael Yang
b7a87a22b6
Merge pull request #4059 from ollama/mxyng/parser-2
...
rename parser to model/file
2024-05-03 13:01:22 -07:00
Dr Nic Williams
e8aaea030e
Update 'llama2' -> 'llama3' in most places (#4116)
...
* Update 'llama2' -> 'llama3' in most places
---------
Co-authored-by: Patrick Devine <patrick@infrahq.com>
2024-05-03 15:25:04 -04:00
Daniel Hiltgen
b1ad3a43cb
Skip PhysX cudart library
...
For some reason this library gives incorrect GPU information, so skip it
2024-05-03 11:55:32 -07:00
Daniel Hiltgen
267e25a750
Merge pull request #4129 from dhiltgen/unit_tests
...
Soften timeouts on sched unit tests
2024-05-03 11:10:26 -07:00
Daniel Hiltgen
9a32c514cb
Soften timeouts on sched unit tests
...
This gives us more headroom on the scheduler tests to tamp
down some flakes.
2024-05-03 09:08:33 -07:00
Michael Yang
e9ae607ece
Merge pull request #3892 from ollama/mxyng/parser
...
refactor modelfile parser
2024-05-02 17:04:47 -07:00
Michael Yang
93707fa3f2
Merge pull request #4108 from ollama/mxyng/lf
...
fix line ending
2024-05-02 14:55:15 -07:00
Michael Yang
94c369095f
fix line ending
...
replace CRLF with LF
2024-05-02 14:53:13 -07:00
Jeffrey Morgan
9164b0161b
Update .gitattributes
2024-05-02 14:06:31 -04:00
Daniel Hiltgen
e592e8fccb
Support Fedora's standard ROCm location
2024-05-01 15:47:12 -07:00
Bryce Reitano
bf4fc25f7b
Add a /clear command (#3947)
...
* Add a /clear command
* change help messages
---------
Co-authored-by: Patrick Devine <patrick@infrahq.com>
2024-05-01 17:44:36 -04:00
Michael Yang
5b806d8d24
Merge pull request #4089 from ollama/mxyng/target-invalid
...
server: destination invalid
2024-05-01 12:46:35 -07:00
Michael Yang
cb1e072643
Merge pull request #4087 from ollama/mxyng/fix-host-port
...
types/model: fix name for hostport
2024-05-01 12:42:07 -07:00
Michael Yang
45b6a12e45
server: target invalid
2024-05-01 12:40:45 -07:00
alwqx
68755f1f5e
chore: fix typo in docs/development.md (#4073)
2024-05-01 15:39:11 -04:00
Michael Yang
997a455039
want filepath
2024-05-01 12:33:41 -07:00
Michael Yang
88775e1ff9
strip scheme from name
2024-05-01 12:26:19 -07:00
Michael Yang
8867e744ff
types/model: fix name for hostport
2024-05-01 12:14:53 -07:00
Daniel Hiltgen
4fd064bea6
Merge pull request #4031 from MarkWard0110/fix/issue-3736
...
Fix/issue 3736: When runners are closing or expiring, the scheduler is getting dirty VRAM size readings.
2024-05-01 12:13:26 -07:00
Jeffrey Morgan
59fbceedcc
use lf for line endings (#4085)
2024-05-01 15:02:45 -04:00
Mark Ward
321d57e1a0
Removing go routine calling .wait from load.
2024-05-01 18:51:10 +00:00
Mark Ward
ba26c7aa00
it will always return an error due to Kill() discarding Wait() errors
2024-05-01 18:51:10 +00:00
Mark Ward
63c763685f
log when waiting for the process to stop, to help debug when other tasks execute during this wait.
...
expire timer: clear the timer reference because it will not be reused.
close will clean up expireTimer if calling code has not already done this.
2024-05-01 18:51:10 +00:00
Mark Ward
34a4a94f13
ignore debug bin files
2024-05-01 18:51:10 +00:00
Mark Ward
f4a73d57a4
fix runner expire during active use. Clearing the expire timer as it is used. Allowing the finish to assign an expire timer so that the runner will expire after no use.
2024-05-01 18:51:10 +00:00
Mark Ward
948114e3e3
fix sched to wait for the runner to terminate to ensure following vram check will be more accurate
2024-05-01 18:51:10 +00:00
Arpit Jain
a3e60d9058
README.md: fix typos (#4007)
...
Co-authored-by: Blake Mizerany <blake.mizerany@gmail.com>
2024-05-01 10:39:38 -07:00
Michael Yang
8acb233668
use strings.Builder
2024-05-01 10:01:09 -07:00
Michael Yang
119589fcb3
rename parser to model/file
2024-05-01 09:53:50 -07:00
Michael Yang
5ea844964e
cmd: import regexp
2024-05-01 09:53:45 -07:00
Michael Yang
bd8eed57fc
fix parser name
2024-05-01 09:52:54 -07:00
Michael Yang
9cf0f2e973
use parser.Format instead of templating modelfile
2024-05-01 09:52:54 -07:00
Michael Yang
176ad3aa6e
parser: add commands format
2024-05-01 09:52:54 -07:00
Michael Yang
4d08363580
comments
2024-05-01 09:52:54 -07:00
Michael Yang
8907bf51d2
fix multiline
2024-05-01 09:52:54 -07:00
Michael Yang
abe614c705
tests
2024-05-01 09:52:54 -07:00
Michael Yang
238715037d
linting
2024-05-01 09:52:54 -07:00
Michael Yang
c0a00f68ae
refactor modelfile parser
2024-05-01 09:52:54 -07:00
Jeffrey Morgan
f0c454ab57
gpu: add 512MiB to darwin minimum, metal doesn't have partial offloading overhead (#4068)
2024-05-01 11:46:03 -04:00
Daniel Hiltgen
089daaeabc
Add CUDA Driver API for GPU discovery
...
We're seeing some corner cases with cudart which might be resolved by
switching to the driver API which comes bundled with the driver package
2024-04-30 18:00:45 -07:00
Blake Mizerany
b9f74ff3d6
types/model: reintroduce Digest (#4065)
2024-04-30 16:38:03 -07:00
jmorganca
fcf4d60eee
llm: add back check for empty token cache
2024-04-30 17:38:44 -04:00
jmorganca
e33d5c2dbc
update llama.cpp commit to 952d03d
2024-04-30 17:31:20 -04:00
Jeffrey Morgan
18d9a7e1f1
update llama.cpp submodule to f364eb6 (#4060)
2024-04-30 17:25:39 -04:00
Michael
8488388cbd
Update README.md
2024-04-30 15:45:56 -04:00
Blake Mizerany
588901f449
types/model: reduce Name.Filepath allocs from 5 to 2 (#4039)
2024-04-30 11:09:19 -07:00
Bruce MacDonald
0a7fdbe533
prompt to display and add local ollama keys to account (#3717)
...
- return descriptive error messages when unauthorized to create blob or push a model
- display the local public key associated with the request that was denied
2024-04-30 11:02:08 -07:00
Christian Frantzen
5950c176ca
Update langchainpy.md (#4037)
...
Updated the code a bit
2024-04-29 23:19:06 -04:00
Daniel Hiltgen
23d23409a0
Update llama.cpp (#4036)
...
* Bump llama.cpp to b2761
* Adjust types for bump
2024-04-29 23:18:48 -04:00
Patrick Devine
9009bedf13
better checking for OLLAMA_HOST variable (#3661)
2024-04-29 19:14:07 -04:00
Daniel Hiltgen
d4ac57e240
Merge pull request #4035 from dhiltgen/fix_relative_paths
...
Fix relative path lookup
2024-04-29 16:08:06 -07:00
Daniel Hiltgen
7b59d1770f
Fix relative path lookup
2024-04-29 16:00:08 -07:00
Jeffrey Morgan
95ead8ffba
Restart server on failure when running Windows app (#3985)
...
* app: restart server on failure
* fix linter
* address comments
* refactor log directory creation to be where logs are written
* check all log dir creation errors
2024-04-29 10:07:52 -04:00
Jeffrey Morgan
7aa08a77ca
llm: don't cap context window limit to training context window (#3988)
2024-04-29 10:07:30 -04:00
Blake Mizerany
7e432cdfac
types/model: remove old comment (#4020)
2024-04-28 20:52:26 -07:00
Jeffrey Morgan
586672f490
fix copying model to itself (#4019)
2024-04-28 23:47:49 -04:00
Daniel Hiltgen
b03408de74
Merge pull request #3972 from hmartinez82/win_arm64
...
Add support for building on Windows ARM64
2024-04-28 14:52:58 -07:00
Daniel Hiltgen
1e6a28bf5b
Merge pull request #4009 from dhiltgen/cpu_concurrency
...
Fix concurrency for CPU mode
2024-04-28 14:20:27 -07:00
Daniel Hiltgen
d6e3b64582
Fix concurrency for CPU mode
...
Prior refactoring passes accidentally removed the logic to bypass VRAM
checks for CPU loads. This adds that back, along with test coverage.
This also fixes loaded map access in the unit test to be behind the mutex which was
likely the cause of various flakes in the tests.
2024-04-28 13:42:39 -07:00
Blake Mizerany
114c932a8e
types/model: allow _ as starter character in Name parts (#3991)
2024-04-27 21:24:52 -07:00
Jeffrey Morgan
7f7103de06
mac: update setup command to llama3 (#3986)
2024-04-27 22:52:10 -04:00
Blake Mizerany
c631a9c726
types/model: relax name length constraint from 2 to 1 (#3984)
2024-04-27 17:58:41 -07:00
Blake Mizerany
8fd9e56804
types/structs: drop unused structs package (#3981)
2024-04-27 14:06:11 -07:00
Hernan Martinez
8a65717f55
Do not build AVX runners on ARM64
2024-04-26 23:55:32 -06:00
Hernan Martinez
6d3152a98a
Use architecture specific folders in installer script
2024-04-26 23:35:16 -06:00
Hernan Martinez
b438d485f1
Use architecture specific folders in the generate script
2024-04-26 23:34:12 -06:00
Hernan Martinez
204349b17b
Use architecture specific folders in the build script
2024-04-26 23:26:03 -06:00
Hernan Martinez
86e67fc4a9
Add import declaration for windows,arm64 to llm.go
2024-04-26 23:23:53 -06:00
Blake Mizerany
2bed62926e
types/model: remove Digest (for now) (#3970)
...
The Digest type needs more thought and is not necessary at the moment.
2024-04-26 21:14:28 -07:00
Jeffrey Morgan
aad8d128a0
also look at cwd as a root for windows runners (#3959)
2024-04-26 19:14:08 -04:00
Daniel Hiltgen
ec1acbb867
Merge pull request #3968 from dhiltgen/win_generate
...
Fine grain control over windows generate steps
2024-04-26 16:03:38 -07:00
Daniel Hiltgen
e4859c4563
Fine grain control over windows generate steps
...
This will speed up CI which already tries to only build static for unit tests
2024-04-26 15:49:46 -07:00
Nataly Merezhuk
8e30eb26bd
Updates the setup command to use llama3. (#3962)
2024-04-26 18:41:01 -04:00
Daniel Hiltgen
0b5c589ca2
Merge pull request #3966 from dhiltgen/bump
...
Fix target in gen_windows.ps1
2024-04-26 15:36:53 -07:00
Michael Yang
65fadddc85
Merge pull request #3964 from ollama/mxyng/weights
...
fix gemma, command-r layer weights
2024-04-26 15:23:33 -07:00
Daniel Hiltgen
ed5fb088c4
Fix target in gen_windows.ps1
2024-04-26 15:10:42 -07:00
Michael Yang
f81f308118
fix gemma, command-r layer weights
2024-04-26 15:00:55 -07:00
Blake Mizerany
b1390a7b37
types/model: export ParseNameBare and Merge (#3957)
...
These are useful outside this package.
2024-04-26 14:58:07 -07:00
Michael Yang
11d83386a5
Merge pull request #3951 from ollama/mxyng/zip
...
check file type before zip
2024-04-26 14:51:23 -07:00
Jeffrey Morgan
bb31def011
return code 499 when user cancels request while a model is loading (#3955)
2024-04-26 17:38:29 -04:00
Michael Yang
41e03ede95
check file type before zip
2024-04-26 14:18:07 -07:00
Michael Yang
7fea1ecdf6
Merge pull request #3958 from ollama/mxyng/fix-workflow
...
use merge base for diff-tree
2024-04-26 14:17:56 -07:00
Blake Mizerany
054894271d
.github/workflows/test.yaml: add in-flight cancellations on new push (#3956)
...
Also, remove a superfluous 'go get'
2024-04-26 13:54:24 -07:00
Michael Yang
6fef042f0b
use merge base for diff-tree
2024-04-26 13:54:15 -07:00
Daniel Hiltgen
5c0c2d1d09
Merge pull request #3954 from dhiltgen/ci_fixes
...
Put back non-avx CPU build for windows
2024-04-26 13:09:03 -07:00
Blake Mizerany
37f9c8ad99
types/model: overhaul Name and Digest types (#3924)
2024-04-26 13:08:32 -07:00
Quinten van Buul
2a80f55e2a
Update windows.md (#3855)
...
Fixed a typo
2024-04-26 16:04:15 -04:00
Daniel Hiltgen
421c878a2d
Put back non-avx CPU build for windows
2024-04-26 12:44:07 -07:00
Daniel Hiltgen
36666c2142
Merge pull request #3925 from dhiltgen/bump
...
Bump llama.cpp to b2737
2024-04-26 10:09:38 -07:00
Daniel Hiltgen
85801317d1
Fix clip log import
2024-04-26 09:43:46 -07:00
Daniel Hiltgen
2ed0d65948
Bump llama.cpp to b2737
2024-04-26 09:43:28 -07:00
Daniel Hiltgen
d459dc4ad1
Merge pull request #3950 from dhiltgen/windows_packaging
...
Fix exe name for zip packaging on windows
2024-04-26 09:27:37 -07:00
Daniel Hiltgen
40bc4622ef
Fix exe name for zip packaging on windows
...
The zip file encodes the OS and architecture, so keep the short exe name
2024-04-26 09:18:05 -07:00
Daniel Hiltgen
c0f818a07a
Merge pull request #3948 from dhiltgen/win_generate
...
Refactor windows generate for more modular usage
2024-04-26 09:17:20 -07:00
Daniel Hiltgen
8671fdeda6
Refactor windows generate for more modular usage
2024-04-26 08:35:50 -07:00
Daniel Hiltgen
2619850fb4
Merge pull request #3933 from dhiltgen/ci_fixes
...
Move cuda/rocm dependency gathering into generate script
2024-04-26 07:01:24 -07:00
Daniel Hiltgen
8feb97dc0d
Move cuda/rocm dependency gathering into generate script
...
This will make it simpler for CI to accumulate artifacts from prior steps
2024-04-25 22:38:44 -07:00
Daniel Hiltgen
4e1ff6dcbb
Merge pull request #3926 from dhiltgen/ci_fixes
...
Fix release CI
2024-04-25 17:42:31 -07:00
Daniel Hiltgen
8589d752ac
Fix release CI
...
download-artifact path was being used incorrectly. It is where to
extract the zip not the files in the zip to extract. Default is
workspace dir which is what we want, so omit it
2024-04-25 17:27:11 -07:00
Michael Yang
de4ded68b0
Merge pull request #3923 from ollama/mxyng/mem
...
only count output tensors
2024-04-25 16:34:17 -07:00
Daniel Hiltgen
9b5a3c5991
Merge pull request #3914 from dhiltgen/mac_perf
...
Improve mac parallel performance
2024-04-25 16:28:31 -07:00
Jeffrey Morgan
00b0699c75
Reload model if num_gpu changes (#3920)
...
* reload model if `num_gpu` changes
* don't reload on -1
* fix tests
2024-04-25 19:02:40 -04:00
Jeffrey Morgan
993cf8bf55
llm: limit generation to 10x context size to avoid run-on generations (#3918)
...
* llm: limit generation to 10x context size to avoid run-on generations
* add comment
* simplify condition statement
2024-04-25 19:02:30 -04:00
Michael Yang
7bb7cb8a60
only count output tensors
2024-04-25 15:24:08 -07:00
Daniel Hiltgen
b123be5b71
Adjust context size for parallelism
2024-04-25 13:58:54 -07:00
jmorganca
ddf5c09a9b
use matrix multiplication kernels in more cases
2024-04-25 13:58:54 -07:00
Roy Yang
5f73c08729
Remove trailing spaces (#3889)
2024-04-25 14:32:26 -04:00
Daniel Hiltgen
f503a848c2
Merge pull request #3895 from brycereitano/shiftloading
...
Move ggml loading to when attempting to fit
2024-04-25 09:24:08 -07:00
Bryce Reitano
36a6daccab
Restructure loading conditional chain
2024-04-24 17:37:03 -06:00
Bryce Reitano
ceb0e26e5e
Provide variable ggml for TestLoad
2024-04-24 17:19:55 -06:00
Bryce Reitano
284e02bed0
Move ggml loading to when we attempt fitting
2024-04-24 17:17:24 -06:00
Michael Yang
3450a57d4a
Merge pull request #3713 from ollama/mxyng/modelname
...
update copy handler to use model.Name
2024-04-24 16:00:32 -07:00
Michael Yang
592dae31c8
update copy to use model.Name
2024-04-24 15:54:54 -07:00
Michael Yang
2010cbc5fa
Merge pull request #3833 from ollama/mxyng/fix-from
...
fix: from blob
2024-04-24 15:13:47 -07:00
Michael Yang
ac0801eced
only replace if it matches command
2024-04-24 14:49:26 -07:00
Michael Yang
ad66e5b060
split temp zip files
2024-04-24 14:18:01 -07:00
Blake Mizerany
ade4b55520
types/model: make ParseName use default without question (#3886)
2024-04-24 11:52:55 -07:00
Daniel Hiltgen
a6d62e0617
Merge pull request #3882 from dhiltgen/amd_gfx
...
AMD gfx patch rev is hex
2024-04-24 11:07:49 -07:00
Daniel Hiltgen
6e76348df7
Merge pull request #3834 from dhiltgen/not_found_in_path
...
Report errors on server lookup instead of path lookup failure
2024-04-24 10:50:48 -07:00
Daniel Hiltgen
0d6687f84c
AMD gfx patch rev is hex
...
Correctly handle gfx90a discovery
2024-04-24 09:43:52 -07:00
Patrick Devine
74d2a9ef9a
add OLLAMA_KEEP_ALIVE env variable to FAQ (#3865)
2024-04-23 21:06:51 -07:00
Patrick Devine
14476d48cc
fixes for gguf (#3863)
2024-04-23 20:57:20 -07:00
Patrick Devine
ce8ce82567
add mixtral 8x7b model conversion (#3859)
2024-04-23 20:17:04 -07:00
Blake Mizerany
4dc4f1be34
types/model: restrict digest hash part to a minimum of 2 characters (#3858)
...
This allows users of a valid Digest to know it has a minimum of 2
characters in the hash part for use when sharding.
This is a reasonable restriction as the hash part is a SHA256 hash which
is 64 characters long, which is the common hash used. There is no
anticipation of using a hash with less than 2 characters.
Also, add MustParseDigest.
Also, replace Digest.Type with Digest.Split for getting both the type
and hash parts together, which is the most common case when asking for
either.
2024-04-23 18:24:17 -07:00
Daniel Hiltgen
16b52331a4
Merge pull request #3857 from dhiltgen/mem_escape_valve
...
Add back memory escape valve
2024-04-23 17:32:24 -07:00
Daniel Hiltgen
5445aaa94e
Add back memory escape valve
...
If we get our predictions wrong, this can be used to
set a lower memory limit as a workaround. Recent multi-gpu
refactoring accidentally removed it, so this adds it back.
2024-04-23 17:09:02 -07:00
Daniel Hiltgen
2ac3dd6853
Merge pull request #3850 from dhiltgen/windows_packaging
...
Move nested payloads to installer and zip file on windows
2024-04-23 16:35:20 -07:00
Daniel Hiltgen
d8851cb7a0
Harden sched TestLoad
...
Give the go routine a moment to deliver the expired event
2024-04-23 16:14:47 -07:00
Daniel Hiltgen
058f6cd2cc
Move nested payloads to installer and zip file on windows
...
Now that the llm runner is an executable and not just a dll, more users are facing
problems with security policy configurations on windows that prevent users
writing to directories and then executing binaries from the same location.
This change removes payloads from the main executable on windows and shifts them
over to be packaged in the installer and discovered based on the executables location.
This also adds a new zip file for people who want to "roll their own" installation model.
2024-04-23 16:14:47 -07:00
Daniel Hiltgen
790cf34d17
Merge pull request #3846 from dhiltgen/missing_runner
...
Detect and recover if runner removed
2024-04-23 13:14:12 -07:00
Michael
928d844896
adding phi-3 mini to readme
...
adding phi-3 mini to readme
2024-04-23 13:58:31 -04:00
Daniel Hiltgen
939d6a8606
Make CI lint verbose
2024-04-23 10:17:42 -07:00
Daniel Hiltgen
58888a74bc
Detect and recover if runner removed
...
Tmp cleaners can nuke the file out from underneath us. This detects the missing
runner, and re-initializes the payloads.
2024-04-23 10:05:26 -07:00
Daniel Hiltgen
cc5a71e0e3
Merge pull request #3709 from remy415/custom-gpu-defs
...
Adds support for customizing GPU build flags in llama.cpp
2024-04-23 09:28:34 -07:00
Michael Yang
e83bcf7f9a
Merge pull request #3836 from ollama/mxyng/mixtral
...
fix: mixtral graph
2024-04-23 09:15:10 -07:00
Daniel Hiltgen
5690e5ce99
Merge pull request #3418 from dhiltgen/concurrency
...
Request and model concurrency
2024-04-23 08:31:38 -07:00
Daniel Hiltgen
f2ea8470e5
Local unicode test case
2024-04-22 19:29:12 -07:00
Daniel Hiltgen
34b9db5afc
Request and model concurrency
...
This change adds support for multiple concurrent requests, as well as
loading multiple models by spawning multiple runners. The default
settings are currently set at 1 concurrent request per model and only 1
loaded model at a time, but these can be adjusted by setting
OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.
2024-04-22 19:29:12 -07:00
Daniel Hiltgen
8711d03df7
Report errors on server lookup instead of path lookup failure
2024-04-22 19:08:47 -07:00
Daniel Hiltgen
ee448deaba
Merge pull request #3835 from dhiltgen/harden_llm_override
...
Trim spaces and quotes from llm lib override
2024-04-22 19:06:54 -07:00
Bruce MacDonald
6e8db04716
tidy community integrations
...
- move some popular integrations to the top of the lists
2024-04-22 17:29:08 -07:00
Bruce MacDonald
658e60cf73
Revert "stop running model on interactive exit"
...
This reverts commit fad00a85e5.
2024-04-22 17:23:11 -07:00
Bruce MacDonald
4c78f028f8
Merge branch 'main' of https://github.com/ollama/ollama
2024-04-22 17:22:28 -07:00
Michael Yang
435cc866a3
fix: mixtral graph
2024-04-22 17:19:44 -07:00
Hao Wu
c7d3a558f6
docs: update README to add chat (web UI) for LLM (#3810)
...
* add chat (web UI) for LLM
I have used chat with llama3 locally with success, and the code is MIT licensed.
* Update README.md
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-04-22 20:19:39 -04:00
Maple Gao
089cdb2877
docs: Update README for Lobe-chat integration. (#3817)
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-04-22 20:18:15 -04:00
Võ Đình Đạt
ea1e9aa36b
Update README.md (#3655)
2024-04-22 20:16:55 -04:00
Jonathan Smoley
d0d28ef90d
Update README.md with Discord-Ollama project (#3633)
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-04-22 20:14:20 -04:00
Eric Curtin
6654186a7c
Add podman-ollama to terminal apps (#3626)
...
The goal of podman-ollama is to make AI even more boring.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2024-04-22 20:13:23 -04:00
Daniel Hiltgen
aa72281eae
Trim spaces and quotes from llm lib override
2024-04-22 17:11:14 -07:00
reid41
74bcbf828f
add qa-pilot link (#3612)
...
* add qa-pilot link
* format the link
* add shell-pilot
2024-04-22 20:10:34 -04:00
Christian Neff
fe39147e64
Add Chatbot UI v2 to Community Integrations (#3503)
2024-04-22 20:09:55 -04:00
Bruce MacDonald
fad00a85e5
stop running model on interactive exit
2024-04-22 16:22:14 -07:00
Jeremy
9c0db4cc83
Update gen_windows.ps1
...
Fixed improper env references
2024-04-21 16:13:41 -04:00
Cheng
62be2050dd
chore: use errors.New to replace fmt.Errorf, which is much better (#3789)
2024-04-20 22:11:06 -04:00
Blake Mizerany
56f8aa6912
types/model: export IsValidNamePart (#3788)
2024-04-20 18:26:34 -07:00
Sri Siddhaarth
e6f9bfc0e8
Update api.md (#3705)
2024-04-20 15:17:03 -04:00
Jeremy
6f18297b3a
Update gen_windows.ps1
...
Forgot a " on the write-host
2024-04-18 19:47:44 -04:00
Jeremy
15016413de
Update gen_windows.ps1
...
Added OLLAMA_CUSTOM_CUDA_DEFS and OLLAMA_CUSTOM_ROCM_DEFS to customize GPU builds on Windows
2024-04-18 19:27:16 -04:00
Jeremy
440b7190ed
Update gen_linux.sh
...
Added OLLAMA_CUSTOM_CUDA_DEFS and OLLAMA_CUSTOM_ROCM_DEFS instead of OLLAMA_CUSTOM_GPU_DEFS
2024-04-18 19:18:10 -04:00
Daniel Hiltgen
8d1995c625
Merge pull request #3708 from remy415/arm64static
...
move Ollama static build to its own flag
2024-04-18 16:04:12 -07:00
Daniel Hiltgen
fd01fbf038
Merge pull request #3710 from remy415/update-jetson-docs
...
update jetson tutorial
2024-04-18 16:02:08 -07:00
Blake Mizerany
0408205c1c
types/model: accept former : as a separator in digest (#3724)
...
This also converges the old sep `:` to the new sep `-`.
2024-04-18 14:17:46 -07:00
Jeffrey Morgan
63a7edd771
Update README.md
2024-04-18 16:09:38 -04:00
Michael
554ffdcce3
add llama3 to readme
...
add llama3 to readme
2024-04-18 15:18:48 -04:00
ManniX-ITA
c496967e56
Merge branch 'ollama:main' into mannix-server
2024-04-18 18:45:15 +02:00
Jeremy
9850a4ce08
Merge branch 'ollama:main' into update-jetson-docs
2024-04-18 09:55:17 -04:00
Jeremy
3934c15895
Merge branch 'ollama:main' into custom-gpu-defs
2024-04-18 09:55:10 -04:00
Jeremy
fd048f1367
Merge branch 'ollama:main' into arm64static
2024-04-18 09:55:04 -04:00
Michael Yang
8645076a71
Merge pull request #3712 from ollama/mxyng/mem
...
add stablelm graph calculation
2024-04-17 15:57:51 -07:00
Michael Yang
05e9424824
Merge pull request #3664 from ollama/mxyng/fix-padding-2
...
fix padding to only return padding
2024-04-17 15:57:40 -07:00
Michael Yang
52ebe67a98
Merge pull request #3714 from ollama/mxyng/model-name-host
...
types/model: support : in PartHost for host:port
2024-04-17 15:34:03 -07:00
Michael Yang
889b31ab78
types/model: support : in PartHost for host:port
2024-04-17 15:16:07 -07:00
Michael Yang
3cf483fe48
add stablelm graph calculation
2024-04-17 13:57:19 -07:00
Jeremy
8dca03173d
Merge remote-tracking branch 'upstream/main' into update-jetson-docs
2024-04-17 16:18:50 -04:00
Jeremy
85bdf14b56
update jetson tutorial
2024-04-17 16:17:42 -04:00
Jeremy
d524e5ef5e
Merge branch 'custom-gpu-defs' of https://github.com/remy415/ollama into custom-gpu-defs
2024-04-17 16:01:03 -04:00
Jeremy
52f5370c48
add support for custom gpu build flags for llama.cpp
2024-04-17 16:00:48 -04:00
Jeremy
da8a0c7657
Merge branch 'ollama:main' into arm64static
2024-04-17 15:22:34 -04:00
Jeremy
1b42b4b59a
Merge branch 'ollama:main' into custom-gpu-defs
2024-04-17 15:21:56 -04:00
Jeremy
7c000ec3ed
adds support for OLLAMA_CUSTOM_GPU_DEFS to customize GPU build flags
2024-04-17 15:21:05 -04:00
jmorganca
c8afe7168c
use correct extension for feature and model request issue templates
2024-04-17 15:18:40 -04:00
jmorganca
28d3cd0148
simpler feature and model request forms
2024-04-17 15:17:08 -04:00
jmorganca
eb5554232a
simpler feature and model request forms
2024-04-17 15:14:49 -04:00
Jeremy
ea4c284a48
Merge branch 'ollama:main' into arm64static
2024-04-17 15:11:38 -04:00
jmorganca
2bdc320216
add descriptions to issue templates
2024-04-17 15:08:36 -04:00
jmorganca
32561aed09
simplify github issue templates a bit
2024-04-17 15:07:03 -04:00
Michael Yang
71548d9829
Merge pull request #3706 from ollama/mxyng/mem
...
account for all non-repeating layers
2024-04-17 11:58:20 -07:00
Jeremy
8aec92fa6d
rearranged conditional logic for static build, dockerfile updated
2024-04-17 14:43:28 -04:00
Michael Yang
a8b9b930b4
account for all non-repeating layers
2024-04-17 11:21:21 -07:00
Michael
9755cf9173
acknowledge the amazing work done by Georgi and team!
2024-04-17 13:48:14 -04:00
Jeremy
70261b9bb6
move static build to its own flag
2024-04-17 13:04:28 -04:00
ManniX-ITA
c942e4a07b
Fixed startup sequence to report model loading
2024-04-17 17:40:32 +02:00
ManniX-ITA
bd54b08261
Streamlined WaitUntilRunning
2024-04-17 17:39:52 +02:00
Blake Mizerany
9df6c85c3a
types/model: add FilepathNoBuild ( #3680 )
...
Also, add test for DisplayLongest.
Also, plumb fill param to ParseName in MustParseName
2024-04-16 18:35:43 -07:00
Michael Yang
e74163af4c
fix padding to only return padding
2024-04-16 15:43:26 -07:00
Michael Yang
fb9580df85
Merge pull request #3684 from ollama/mxyng/scale-graph
...
scale graph based on gpu count
2024-04-16 14:57:09 -07:00
Michael Yang
26df674785
scale graph based on gpu count
2024-04-16 14:44:13 -07:00
Jeffrey Morgan
7c9792a6e0
Support unicode characters in model path (#3681)
...
* parse wide argv characters on windows
* cleanup
* move cleanup to end of `main`
2024-04-16 17:00:12 -04:00
Michael Yang
7afb2e125a
Merge pull request #3678 from ollama/mxyng/fix-darwin-partial-offloading
...
darwin: no partial offloading if required memory greater than system
2024-04-16 12:05:56 -07:00
Michael Yang
41a272de9f
darwin: no partial offloading if required memory greater than system
2024-04-16 11:22:38 -07:00
Jeffrey Morgan
f335722275
update llama.cpp submodule to 7593639 (#3665)
2024-04-15 23:04:43 -04:00
Michael Yang
6d53b67c2c
Merge pull request #3663 from ollama/mxyng/fix-padding
2024-04-15 17:44:54 -07:00
Michael Yang
969238b19e
fix padding in decode
...
TODO: update padding() to _only_ returning the padding
2024-04-15 17:27:06 -07:00
Blake Mizerany
949d7832cf
Revert "cmd: provide feedback if OLLAMA_MODELS is set on non-serve command (#3470)" (#3662)
...
This reverts commit 7d05a6ee8f.
This proved to be more painful than useful.
See: https://github.com/ollama/ollama/issues/3624
2024-04-15 16:58:00 -07:00
Sung Kim
99d227c9db
Added Solar example at README.md (#3610)
...
Added just one line
| Solar | 10.7B | 6.1GB | `ollama run solar` |
2024-04-15 19:54:23 -04:00
Carlos Gamez
a27e419b47
Update langchainjs.md (#2030)
...
Replaced ollama.call() with ollama.invoke(), since call() is deprecated per the LangChain documentation
2024-04-15 18:37:30 -04:00
Chandre Van Der Westhuizen
e4d0db5a97
Added MindsDB information (#3595)
...
* Added MindsDB information
Added more details to MindsDB so that Ollama users can know that they can connect their Ollama model with 200+ databases and apps
* updated text for mindsdb
2024-04-15 18:35:29 -04:00
Eli Bendersky
ba460802c2
examples: add more Go examples using the API (#3599)
...
* examples: go-multimodal
* examples: add go-pull-progress
* examples: add go-chat
* fix
2024-04-15 18:34:54 -04:00
Jeffrey Morgan
e54a3c7fcd
Update modelfile.md
...
Remove Modelfile parameters that are decided at runtime
2024-04-15 15:35:44 -04:00
Patrick Devine
9f8691c6c8
Add llama2 / torch models for ollama create (#3607)
2024-04-15 11:26:42 -07:00
Jeffrey Morgan
a0b8a32eb4
Terminate subprocess if receiving SIGINT or SIGTERM signals while model is loading (#3653)
...
* terminate subprocess if receiving `SIGINT` or `SIGTERM` signals while model is loading
* use `unload` in signal handler
2024-04-15 12:09:32 -04:00
Jeffrey Morgan
7027f264fb
app: gracefully shut down ollama serve on windows (#3641)
...
* app: gracefully shut down `ollama serve` on windows
* fix linter errors
* bring back `HideWindow`
* remove creation flags
* restore `windows.CREATE_NEW_PROCESS_GROUP`
2024-04-14 18:33:25 -04:00
Blake Mizerany
9bee3b63b1
types/model: add path helpers (#3619)
...
This commit adds path helpers for working with Names in URL and file
paths. The new helpers are ParseNameFromPath, ParseNameFromFilePath,
Name.Path, and Name.FilePath.
This commit also adds Name.DisplayLongest and Name.DisplayLong.
Also, it updates a place where strings.StripPrefix is more consistent
with the surrounding code.
Also, replace Parts with specific methods
2024-04-13 12:59:19 -07:00
Jeffrey Morgan
309aef7fee
update llama.cpp submodule to 4bd0f93 (#3627)
2024-04-13 10:43:02 -07:00
Blake Mizerany
08655170aa
types/model: make ParseName variants less confusing (#3617)
...
Also, fix http stripping bug.
Also, improve upon docs about fills and masks.
2024-04-12 13:57:57 -07:00
Blake Mizerany
2b341069a7
types/model: remove (*Digest).Scan and Digest.Value (#3605)
2024-04-11 13:32:31 -07:00
Daniel Hiltgen
c00fee6936
Merge pull request #3604 from dhiltgen/fix_rocm_deps
...
Fix rocm deps with new subprocess paths
2024-04-11 13:08:29 -07:00
Daniel Hiltgen
c2d813bdc3
Fix rocm deps with new subprocess paths
2024-04-11 12:52:06 -07:00
Michael Yang
786f3a1c44
Merge pull request #3600 from ollama/mxyng/mixtral
2024-04-11 12:23:37 -07:00
Michael Yang
3397eff0cd
mixtral mem
2024-04-11 11:10:41 -07:00
Blake Mizerany
0efb7931c7
Revert "types/model: remove (*Digest).Scan and Digest.Value (#3589)"
...
This reverts commit 42f2cc408e.
2024-04-11 00:45:07 -07:00
Blake Mizerany
42f2cc408e
types/model: remove (*Digest).Scan and Digest.Value (#3589)
2024-04-11 00:37:26 -07:00
Blake Mizerany
9446b795b5
types/model: remove DisplayLong (#3587)
2024-04-10 16:55:12 -07:00
Blake Mizerany
62f8cda3b3
types/model: remove MarshalText/UnmarshalText from Digest (#3586)
2024-04-10 16:52:49 -07:00
Blake Mizerany
6a1de23175
types/model: init with Name and Digest types (#3541)
2024-04-10 16:30:05 -07:00
Blake Mizerany
a7b431e743
server: provide helpful workaround hint when stalling on pull (#3584)
...
This is a quick fix to help users who are stuck on the "pull" step at
99%.
In the near future we're introducing a new registry client that
should/will hopefully be smarter. In the meantime, this should unblock
the users hitting issue #1736.
2024-04-10 16:24:37 -07:00
Michael Yang
5a25f93522
Merge pull request #3478 from ollama/mxyng/tensor-layer
...
refactor tensor query
2024-04-10 12:45:03 -07:00
Michael Yang
7e33a017c0
partial offloading
2024-04-10 11:37:20 -07:00
Michael Yang
8b2c10061c
refactor tensor query
2024-04-10 11:37:20 -07:00
Michael Yang
c5c451ca3b
Merge pull request #3579 from ollama/mxyng/fix-ci
...
fix ci
2024-04-10 11:37:01 -07:00
Michael Yang
2b4ca6cf36
fix ci
2024-04-10 11:35:12 -07:00
Eli Bendersky
ad90b9ab3d
api: start adding documentation to package api (#2878)
...
* api: start adding documentation to package api
Updates #2840
* Fix lint typo report
2024-04-10 13:31:55 -04:00
Eli Bendersky
4340f8eba4
examples: start adding Go examples using api/ (#2879)
...
We can have the same examples as e.g. https://github.com/ollama/ollama-python/tree/main/examples
here. Using consistent naming and renaming the existing example to have -http-
since it uses direct HTTP requests rather than api/
Updates #2840
2024-04-10 13:26:45 -04:00
Daniel Hiltgen
4c7db6b7e9
Merge pull request #3566 from dhiltgen/more_time
...
Handle very slow model loads
2024-04-09 16:53:49 -07:00
Michael Yang
c03f0e3c3d
Merge pull request #3565 from ollama/mxyng/rope
...
fix: rope
2024-04-09 16:36:55 -07:00
Daniel Hiltgen
c5ff443b9f
Handle very slow model loads
...
During testing, we're seeing some models take over 3 minutes.
2024-04-09 16:35:10 -07:00
Michael Yang
01114b4526
fix: rope
2024-04-09 16:15:24 -07:00
Blake Mizerany
1524f323a3
Revert "build.go: introduce a friendlier way to build Ollama (#3548)" (#3564)
2024-04-09 15:57:45 -07:00
Blake Mizerany
fccf3eecaa
build.go: introduce a friendlier way to build Ollama (#3548)
...
This commit introduces a more friendly way to build Ollama dependencies
and the binary without abusing `go generate` and removing the
unnecessary extra steps it brings with it.
This script also provides nicer feedback to the user about what is
happening during the build process.
At the end, it prints a helpful message to the user about what to do
next (e.g. run the new local Ollama).
2024-04-09 14:18:47 -07:00
Michael Yang
c77d45d836
Merge pull request #3506 from ollama/mxyng/quantize-redux
...
cgo quantize
2024-04-09 12:32:53 -07:00
Jeffrey Morgan
5ec12cec6c
update llama.cpp submodule to 1b67731 (#3561)
2024-04-09 15:10:17 -04:00
Michael Yang
d9578d2bad
Merge pull request #3559 from ollama/mxyng/ci
...
ci: use go-version-file
2024-04-09 11:03:18 -07:00
Michael Yang
cb8352d6b4
ci: use go-version-file
2024-04-09 09:50:12 -07:00
Alex Mavrogiannis
fc6558f47f
Correct directory reference in macapp/README (#3555)
2024-04-09 09:48:46 -04:00
Michael Yang
9502e5661f
cgo quantize
2024-04-08 15:31:08 -07:00
Michael Yang
e1c9a2a00f
no blob create if already exists
2024-04-08 15:09:48 -07:00
writinwaters
1341ee1b56
Update README.md (#3539)
...
RAGFlow now supports integration with Ollama.
2024-04-08 10:58:14 -04:00
Jeffrey Morgan
63efa075a0
update generate scripts with new LLAMA_CUDA variable, set HIP_PLATFORM to avoid compiler errors (#3528)
2024-04-07 19:29:51 -04:00
Thomas Vitale
cb03fc9571
Docs: Remove wrong parameter for Chat Completion (#3515)
...
Fixes gh-3514
Signed-off-by: Thomas Vitale <ThomasVitale@users.noreply.github.com>
2024-04-06 09:08:35 -07:00
Michael Yang
a5ec9cfc0f
Merge pull request #3508 from ollama/mxyng/rope
2024-04-05 18:46:06 -07:00
Michael Yang
be517e491c
no rope parameters
2024-04-05 18:05:27 -07:00
Michael Yang
fc8e108642
Merge pull request #3496 from ollama/mxyng/cmd-r-graph
...
add command-r graph estimate
2024-04-05 12:26:21 -07:00
Daniel Hiltgen
c5d5c4a96c
Merge pull request #3491 from dhiltgen/context_bust_test
...
Add test case for context exhaustion
2024-04-04 16:20:20 -07:00
Daniel Hiltgen
dfe330fa1c
Merge pull request #3488 from mofanke/fix-windows-dll-compress
...
fix dll compress in windows building
2024-04-04 16:12:13 -07:00
Michael Yang
01f77ae25d
add command-r graph estimate
2024-04-04 14:07:24 -07:00
Daniel Hiltgen
483b81a863
Merge pull request #3494 from dhiltgen/ci_release
...
Fail fast if mingw missing on windows
2024-04-04 10:15:40 -07:00
Daniel Hiltgen
36bd967722
Fail fast if mingw missing on windows
2024-04-04 09:51:26 -07:00
Jeffrey Morgan
b0e7d35db8
use an older version of the mac os sdk in release (#3484)
2024-04-04 09:48:54 -07:00
Daniel Hiltgen
aeb1fb5192
Add test case for context exhaustion
...
Confirmed this fails on 0.1.30 with known regression
but passes on main
2024-04-04 07:42:17 -07:00
Daniel Hiltgen
a2e60ebcaf
Merge pull request #3490 from dhiltgen/ci_fixes
...
CI missing archive
2024-04-04 07:24:24 -07:00
Daniel Hiltgen
883ec4d1ef
CI missing archive
2024-04-04 07:23:27 -07:00
mofanke
4de0126719
fix dll compress in windows building
2024-04-04 21:27:33 +08:00
Daniel Hiltgen
9768e2dc75
Merge pull request #3481 from dhiltgen/ci_fixes
...
CI subprocess path fix
2024-04-03 19:29:09 -07:00
Daniel Hiltgen
08600d5bec
CI subprocess path fix
2024-04-03 19:12:53 -07:00
Daniel Hiltgen
a624e672d2
Merge pull request #3479 from dhiltgen/ci_fixes
...
Fix CI release glitches
2024-04-03 18:42:27 -07:00
Daniel Hiltgen
e4a7e5b2ca
Fix CI release glitches
...
The subprocess change moved the build directory
arm64 builds weren't setting cross-compilation flags when building on x86
2024-04-03 16:41:40 -07:00
Michael Yang
a0a15cfd5b
Merge pull request #3463 from ollama/mxyng/graph-estimate
...
update graph size estimate
2024-04-03 14:27:30 -07:00
Michael Yang
12e923e158
update graph size estimate
2024-04-03 13:34:12 -07:00
Jeffrey Morgan
cd135317d2
Fix macOS builds on older SDKs (#3467)
2024-04-03 10:45:54 -07:00
Michael Yang
4f895d633f
Merge pull request #3466 from ollama/mxyng/head-kv
...
default head_kv to 1
2024-04-03 10:41:00 -07:00
Blake Mizerany
7d05a6ee8f
cmd: provide feedback if OLLAMA_MODELS is set on non-serve command (#3470)
...
This also moves the checkServerHeartbeat call out of the "RunE" Cobra
stuff (that's the only word I have for that) to on-site where it's after
the check for OLLAMA_MODELS, which allows the helpful error message to
be printed before the server heartbeat check. This also arguably makes
the code more readable without the magic/superfluous "pre" function
caller.
2024-04-02 22:11:13 -07:00
Daniel Hiltgen
464d817824
Merge pull request #3464 from dhiltgen/subprocess
...
Fix numgpu opt miscomparison
2024-04-02 20:10:17 -07:00
Pier Francesco Contino
531324a9be
feat: add OLLAMA_DEBUG in ollama server help message (#3461)
...
Co-authored-by: Pier Francesco Contino <pfcontino@gmail.com>
2024-04-02 18:20:03 -07:00
Daniel Hiltgen
6589eb8a8c
Revert options as a ref in the server
2024-04-02 16:44:10 -07:00
Michael Yang
90f071c658
default head_kv to 1
2024-04-02 16:37:59 -07:00
Michael Yang
a039e383cd
Merge pull request #3465 from ollama/mxyng/fix-metal
...
fix metal gpu
2024-04-02 16:29:58 -07:00
Michael Yang
80163ebcb5
fix metal gpu
2024-04-02 16:06:45 -07:00
Daniel Hiltgen
a57818d93e
Merge pull request #3343 from dhiltgen/bump_more2
...
Bump llama.cpp to b2581
2024-04-02 15:08:26 -07:00
Daniel Hiltgen
841adda157
Fix windows lint CI flakiness
2024-04-02 12:22:16 -07:00
Daniel Hiltgen
0035e31af8
Bump to b2581
2024-04-02 11:53:07 -07:00
Daniel Hiltgen
c863c6a96d
Merge pull request #3218 from dhiltgen/subprocess
...
Switch back to subprocessing for llama.cpp
2024-04-02 10:49:44 -07:00
Daniel Hiltgen
1f11b52511
Refined min memory from testing
2024-04-01 16:48:33 -07:00
Daniel Hiltgen
526d4eb204
Release gpu discovery library after use
...
Leaving the cudart library loaded kept ~30MB of memory
pinned in the GPU in the main process. This change ensures
we don't hold GPU resources when idle.
2024-04-01 16:48:33 -07:00
Daniel Hiltgen
0a74cb31d5
Safeguard for noexec
...
We may have users that run into problems with our current
payload model, so this gives us an escape valve.
2024-04-01 16:48:33 -07:00
Daniel Hiltgen
10ed1b6292
Detect too-old cuda driver
...
"cudart init failure: 35" isn't particularly helpful in the logs.
2024-04-01 16:48:33 -07:00
Daniel Hiltgen
4fec5816d6
Integration test improvements
...
Cleaner shutdown logic, a bit of response hardening
2024-04-01 16:48:18 -07:00
Daniel Hiltgen
0a0e9f3e0f
Apply 01-cache.diff
2024-04-01 16:48:18 -07:00
Daniel Hiltgen
58d95cc9bd
Switch back to subprocessing for llama.cpp
...
This should resolve a number of memory leak and stability defects by allowing
us to isolate llama.cpp in a separate process and shutdown when idle, and
gracefully restart if it has problems. This also serves as a first step to be
able to run multiple copies to support multiple models concurrently.
2024-04-01 16:48:18 -07:00
Patrick Devine
3b6a9154dd
Simplify model conversion (#3422)
2024-04-01 16:14:53 -07:00
Michael Yang
d6dd2ff839
Merge pull request #3241 from ollama/mxyng/mem
...
update memory estimations for gpu offloading
2024-04-01 13:59:14 -07:00
Michael Yang
e57a6ba89f
Merge pull request #2926 from ollama/mxyng/decode-ggml-v2
...
refactor model parsing
2024-04-01 13:58:13 -07:00
Michael Yang
12ec2346ef
Merge pull request #3442 from ollama/mxyng/generate-output
...
fix generate output
2024-04-01 13:56:09 -07:00
Michael Yang
1ec0df1069
fix generate output
2024-04-01 13:47:34 -07:00
Michael Yang
91b3e4d282
update memory calculations
...
count each layer independently when deciding gpu offloading
2024-04-01 13:16:32 -07:00
Michael Yang
d338d70492
refactor model parsing
2024-04-01 13:16:15 -07:00
Philipp Gillé
011bb67351
Add chromem-go to community integrations (#3437)
2024-04-01 11:17:37 -04:00
Saifeddine ALOUI
d124627202
Update README.md (#3436)
2024-04-01 11:16:31 -04:00
Jesse Zhang
b0a8246a69
Community Integration: CRAG Ollama Chat (#3423)
...
Corrective Retrieval Augmented Generation Demo, powered by Langgraph and Streamlit 🤗
Support:
- Ollama
- OpenAI APIs
2024-04-01 11:16:14 -04:00
Yaroslav
e6fb39c182
Update README.md (#3378)
...
Plugins list updated
2024-03-31 13:10:05 -04:00
sugarforever
e1f1c374ea
Community Integration: ChatOllama (#3400)
...
* Community Integration: ChatOllama
* fixed typo
2024-03-30 22:46:50 -04:00
Jeffrey Morgan
06a1508bfe
Update 90_bug_report.yml
2024-03-29 10:11:17 -04:00
Patrick Devine
5a5efee46b
Add gemma safetensors conversion (#3250)
...
Co-authored-by: Michael Yang <mxyng@pm.me>
2024-03-28 18:54:01 -07:00
Daniel Hiltgen
97ae517fbf
Merge pull request #3398 from dhiltgen/release_latest
...
CI automation for tagging latest images
2024-03-28 16:25:54 -07:00
Daniel Hiltgen
44b813e459
Merge pull request #3377 from dhiltgen/rocm_v6_bump
...
Bump ROCm to 6.0.2 patch release
2024-03-28 16:07:54 -07:00
Daniel Hiltgen
539043f5e0
CI automation for tagging latest images
2024-03-28 16:07:37 -07:00
Daniel Hiltgen
dbcace6847
Merge pull request #3392 from dhiltgen/ci_build_win_cuda
...
CI windows gpu builds
2024-03-28 16:03:52 -07:00
Daniel Hiltgen
c91a4ebcff
Bump ROCm to 6.0.2 patch release
2024-03-28 15:58:57 -07:00
Daniel Hiltgen
b79c7e4528
CI windows gpu builds
...
If we're doing generate, test windows cuda and rocm as well
2024-03-28 14:39:10 -07:00
Michael Yang
035b274b70
Merge pull request #3379 from ollama/mxyng/origins
...
fix: trim quotes on OLLAMA_ORIGINS
2024-03-28 14:14:18 -07:00
Michael Yang
9c6a254945
Merge pull request #3391 from ollama/mxyng-patch-1
2024-03-28 13:15:56 -07:00
Michael Yang
f31f2bedf4
Update troubleshooting link
2024-03-28 12:05:26 -07:00
Michael Yang
756c257553
Merge pull request #3380 from ollama/mxyng/conditional-generate
...
fix: workflows
2024-03-28 00:35:27 +01:00
Michael Yang
5255d0af8a
fix: workflows
2024-03-27 16:30:01 -07:00
Michael Yang
af8a8a6b59
fix: trim quotes on OLLAMA_ORIGINS
2024-03-27 15:24:29 -07:00
Michael Yang
461ad25015
Merge pull request #3376 from ollama/mxyng/conditional-generate
...
only generate on changes to llm subdirectory
2024-03-27 22:12:53 +01:00
Michael Yang
8838ae787d
stub stub
2024-03-27 13:59:12 -07:00
Michael Yang
db75402ade
mangle arch
2024-03-27 13:44:50 -07:00
Michael Yang
1e85a140a3
only generate on changes to llm subdirectory
2024-03-27 12:45:35 -07:00
Michael Yang
c363282fdc
Merge pull request #3375 from ollama/mxyng/conditional-generate
...
only generate cuda/rocm when changes to llm detected
2024-03-27 20:40:55 +01:00
Michael Yang
5b0c48d29e
only generate cuda/rocm when changes to llm detected
2024-03-27 12:23:09 -07:00
Jeffrey Morgan
913306f4fd
Detect arrow keys on windows (#3363)
...
* detect arrow keys on windows
* add some helpful comments
2024-03-26 18:21:56 -04:00
Jeffrey Morgan
f5ca7f8c8e
add license in file header for vendored llama.cpp code (#3351)
2024-03-26 16:23:23 -04:00
Jeffrey Morgan
856b8ec131
remove need for $VSINSTALLDIR since build will fail if ninja cannot be found (#3350)
2024-03-26 16:23:16 -04:00
Patrick Devine
1b272d5bcd
change github.com/jmorganca/ollama to github.com/ollama/ollama (#3347)
2024-03-26 13:04:17 -07:00
Christophe Dervieux
29715dbca7
malformed markdown link (#3358)
2024-03-26 10:46:36 -04:00
Daniel Hiltgen
54a028d07f
Merge pull request #3356 from dhiltgen/fix_arm_linux
...
Switch runner for final release job
2024-03-25 20:54:46 -07:00
Daniel Hiltgen
f83e4db365
Switch runner for final release job
...
The manifest and tagging step use a lot of disk space
2024-03-25 20:51:40 -07:00
Daniel Hiltgen
3b5866a233
Merge pull request #3353 from dhiltgen/fix_arm_linux
...
Use Rocky Linux Vault to get GCC 10.2 installed
2024-03-25 19:38:56 -07:00
Daniel Hiltgen
b8c2be6142
Use Rocky Linux Vault to get GCC 10.2 installed
...
This should hopefully only be a temporary workaround until Rocky 8
picks up GCC 10.4 which fixes the NVCC bug
2024-03-25 19:18:50 -07:00
Daniel Hiltgen
e0319bd78d
Revert "Switch arm cuda base image to centos 7"
...
This reverts commit 5dacc1ebe8.
2024-03-25 19:01:11 -07:00
Daniel Hiltgen
b31ed7f031
Merge pull request #3352 from dhiltgen/fix_arm_linux
...
Switch arm cuda base image to centos 7
2024-03-25 16:13:10 -07:00
Daniel Hiltgen
5dacc1ebe8
Switch arm cuda base image to centos 7
...
We had started using rocky linux 8, but they've updated to GCC 10.3,
which breaks NVCC. 10.2 is compatible (or 10.4, but that's not
available from rocky linux 8 repos yet)
2024-03-25 15:57:08 -07:00
Daniel Hiltgen
c2712b5566
Merge pull request #3348 from dhiltgen/bump_llamacpp
...
Bump llama.cpp to b2527
2024-03-25 14:15:53 -07:00
Daniel Hiltgen
8091ef2eeb
Bump llama.cpp to b2527
2024-03-25 13:47:44 -07:00
Jeffrey Morgan
f38b705dc7
Fix ROCm link in development.md
2024-03-25 16:32:44 -04:00
Daniel Hiltgen
560be5e0b6
Merge pull request #3308 from dhiltgen/bump_more
...
Bump llama.cpp to b2510
2024-03-25 12:56:12 -07:00
Daniel Hiltgen
4a1c76b3aa
Merge pull request #3331 from dhiltgen/integration_testing
...
Integration tests conditionally pull
2024-03-25 12:48:51 -07:00
Daniel Hiltgen
28a64e23ca
Merge pull request #2279 from remy415/main
...
Add support for libcudart.so for CUDA devices (Adds Jetson support)
2024-03-25 12:46:28 -07:00
Niclas Pahlfer
92d74e2f59
adds ooo to community integrations (#1623)
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-03-25 15:08:33 -04:00
Herval Freire
6f8f57dd1d
Add cliobot to ollama supported list (#1873)
...
* Update README.md
* Update README.md
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-03-25 15:07:19 -04:00
Chenhe Gu
b2fa68b0ea
Add Dify.AI to community integrations (#1944)
...
Dify.AI is a model-agnostic LLMOps platform for building and managing LLM applications.
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-03-25 15:06:39 -04:00
Marco Antônio
3767d5ef0d
enh: add ollero.nvim to community applications (#1905)
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-03-25 15:06:08 -04:00
Ani Betts
9fed85bc8b
Add typechat-cli to Terminal apps (#2428)
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-03-25 15:05:04 -04:00
Miguel
4501bc0913
add new Web & Desktop link in readme for alpaca webui (#2881)
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-03-25 15:00:18 -04:00
Danny Avila
57ba519e63
Add LibreChat to Web & Desktop Apps (#2918)
2024-03-25 14:59:18 -04:00
enoch1118
d98d322d24
Add Community Integration: OllamaGUI (#2927)
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-03-25 14:58:28 -04:00
fly2tomato
0c3ec74cf1
Add Community Integration: OpenAOE (#2946)
...
* Update README.md
* Update README.md
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-03-25 14:57:40 -04:00
tusharhero
42ae8359fa
docs: Add AI telegram to Community Integrations. (#3033)
2024-03-25 14:56:42 -04:00
Timothy Carambat
e4b76dfb76
docs: Add AnythingLLM to README as integration option (#3145)
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-03-25 14:54:48 -04:00
Jikku Jose
2c56517494
Add Saddle (#3178)
2024-03-25 14:54:09 -04:00
Yusuf Can Bayrak
cfbc1b152b
tlm added to README.md terminal section. (#3274)
2024-03-25 14:53:26 -04:00
RAPID ARCHITECT
9305ac1b2e
Update README.md (#3288)
...
Added Ollama Basic chat based on hyperdiv
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-03-25 14:52:25 -04:00
drazdra
45d6292959
Update README.md (#3338)
...
adding drazdra/ollama-chats to the list of UI :)
2024-03-25 14:50:51 -04:00
Blake Mizerany
22921a3969
doc: specify ADAPTER is optional (#3333)
2024-03-25 09:43:19 -07:00
Daniel Hiltgen
7b6cbc10ec
Integration tests conditionally pull
...
If images aren't present, pull them.
Also fixes the expected responses
2024-03-25 08:57:45 -07:00
Jeremy
dfc6721b20
add support for libcudart.so for CUDA devices (adds Jetson support)
2024-03-25 11:07:44 -04:00
Blake Mizerany
acfa2b9422
llm: prevent race appending to slice (#3320)
2024-03-24 11:35:54 -07:00
Daniel Hiltgen
2c390a73ac
Merge pull request #3282 from dhiltgen/gpu_docs
...
Add docs for GPU selection and nvidia uvm workaround
2024-03-24 19:15:03 +01:00
Daniel Hiltgen
3e30c75f3e
Bump llama.cpp to b2510
2024-03-23 19:55:56 +01:00
Eddú Meléndez Gonzales
7e430ff352
Add Testcontainers into Libraries section (#3291)
...
Testcontainers provides a module for Ollama.
2024-03-23 19:55:25 +01:00
Daniel Hiltgen
1784113ef5
Merge pull request #3309 from dhiltgen/integration_testing
...
Revamp go based integration tests
2024-03-23 19:08:49 +01:00
Daniel Hiltgen
949b6c01e0
Revamp go based integration tests
...
This uplevels the integration tests to run the server which can allow
testing an existing server, or a remote server.
2024-03-23 14:24:18 +01:00
jmorganca
38daf0a252
rename .gitattributes
2024-03-23 12:40:31 +01:00
Daniel Hiltgen
43799532c1
Bump llama.cpp to b2474
...
The release just before ggml-cuda.cu refactoring
2024-03-23 09:54:56 +01:00
Daniel Hiltgen
d8fdbfd8da
Add docs for GPU selection and nvidia uvm workaround
2024-03-21 11:52:54 +01:00
Bruce MacDonald
a5ba0fcf78
doc: faq gpu compatibility (#3142)
2024-03-21 05:21:34 -04:00
Jeffrey Morgan
3a30bf56dc
Update faq.md
2024-03-20 17:48:39 +01:00
Daniel Hiltgen
a1c0a48524
Merge pull request #3122 from dhiltgen/better_tmp_cleanup
...
Better tmpdir cleanup
2024-03-20 16:28:03 +01:00
Daniel Hiltgen
74788b487c
Better tmpdir cleanup
...
If expanding the runners fails, don't leave a corrupt/incomplete payloads dir
We now write a pid file out to the tmpdir, which allows us to scan for stale tmpdirs
and remove them as long as no process is still running.
2024-03-20 16:03:19 +01:00
Jeffrey Morgan
7ed3e94105
Update faq.md
2024-03-18 10:24:39 +01:00
jmorganca
2297ad39da
update faq.md
2024-03-18 10:17:59 +01:00
Michael Yang
01cff6136d
Merge pull request #3217 from ollama/mxyng/cleanup
...
remove global
2024-03-18 02:13:30 -07:00
Michael Yang
3c4ad0ecab
dyn global
2024-03-18 09:45:45 +01:00
Michael Yang
22f326464e
Merge pull request #3083 from ollama/mxyng/refactor-readseeker
...
refactor readseeker
2024-03-16 12:08:56 -07:00
Jeffrey Morgan
e95ffc7448
llama: remove server static assets (#3174)
2024-03-15 19:24:12 -07:00
Jeffrey Morgan
2dce1ab40b
add llm/ext_server directory to linguist-vendored (#3173)
2024-03-15 17:46:46 -07:00
Daniel Hiltgen
f4b31c2d53
Merge pull request #3111 from alitrack/main
...
Update ollama.iss
2024-03-15 16:46:59 -07:00
Daniel Hiltgen
ab3456207b
Merge pull request #3028 from ollama/ci_release
...
CI release process
2024-03-15 16:40:54 -07:00
Daniel Hiltgen
6ad414f31e
Merge pull request #3086 from dhiltgen/import_server
...
Import server.cpp to retain llava support
2024-03-15 16:10:35 -07:00
Daniel Hiltgen
052b5a3b77
Merge pull request #3171 from dhiltgen/rocm_94x
...
Add Radeon gfx940-942 GPU support
2024-03-15 15:58:33 -07:00
Daniel Hiltgen
d4c10df2b0
Add Radeon gfx940-942 GPU support
2024-03-15 15:34:58 -07:00
Daniel Hiltgen
540f4af45f
Wire up more complete CI for releases
...
Flesh out our github actions CI so we can build official releases.
2024-03-15 12:37:36 -07:00
Blake Mizerany
6ce37e4d96
llm,readline: use errors.Is instead of simple == check (#3161)
...
This fixes some brittle, simple equality checks to use errors.Is. Since
go1.13, errors.Is is the idiomatic way to check for errors.
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-03-15 07:14:12 -07:00
Blake Mizerany
703684a82a
server: replace blob prefix separator from ':' to '-' (#3146)
...
This fixes blob file names containing ':' characters being rejected by file systems that do not support them.
2024-03-14 20:18:06 -07:00
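A minimal Go sketch of the separator change, assuming digests of the form "sha256:<hex>": replacing the first ':' with '-' yields a name that file systems which reject ':' in file names (such as those on Windows) accept, and names already using '-' pass through unchanged. blobFileName is a hypothetical helper for illustration, not Ollama's actual function.

```go
package main

import (
	"fmt"
	"strings"
)

// blobFileName is a hypothetical helper: it converts a digest such as
// "sha256:0123abcd" into a file-system-safe blob name by replacing the
// first ':' with '-'. Names that already use '-' are returned as-is.
func blobFileName(digest string) string {
	return strings.Replace(digest, ":", "-", 1)
}

func main() {
	fmt.Println(blobFileName("sha256:0123abcd")) // old-style separator
	fmt.Println(blobFileName("sha256-0123abcd")) // already converted
}
```

Both calls print sha256-0123abcd, so old- and new-style names converge on the same file name.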
Daniel Hiltgen
6459377ae0
Add ROCm support to linux install script (#2966)
2024-03-14 18:00:16 -07:00
Blake Mizerany
8546dd3d72
.github: fix model and feature request yml (#3155)
2024-03-14 15:26:06 -07:00
Blake Mizerany
87100be5e0
.github: add issue templates (#3143)
2024-03-14 15:19:10 -07:00
Michael Yang
e87c780ff9
Merge pull request #3149 from ollama/mxyng/fix-memory-leak
...
fix: clip memory leak
2024-03-14 13:34:15 -07:00
Michael Yang
291c663865
fix: clip memory leak
2024-03-14 13:12:42 -07:00
Daniel Hiltgen
da20786e3e
Merge pull request #3068 from dhiltgen/win_pipe
Use stdin for term discovery on windows
2024-03-14 11:55:19 -07:00
Jeffrey Morgan
5ce997a7b9
Update README.md
2024-03-13 21:12:17 -07:00
Jeffrey Morgan
672ffe9b7d
add OLLAMA_KEEP_ALIVE to environment variable docs for ollama serve (#3127)
2024-03-13 14:35:33 -07:00
Patrick Devine
47cfe58af5
Default Keep Alive environment variable (#3094)
---------
Co-authored-by: Chris-AS1 <8493773+Chris-AS1@users.noreply.github.com>
2024-03-13 13:29:40 -07:00
Daniel Hiltgen
c1a81c6fe3
Use stdin for term discovery on windows
When you feed input to the cmd via a pipe, it no longer reports a warning.
2024-03-13 10:37:31 -07:00
Steven Lee
152ab524c2
Update ollama.iss
add arm64 support
2024-03-13 20:15:45 +08:00
Jeffrey Morgan
e72c567cfd
restore locale patch (#3091)
2024-03-12 22:08:13 -07:00
Bruce MacDonald
3e22611200
token repeat limit for prediction requests (#3080)
2024-03-12 22:08:25 -04:00
Daniel Hiltgen
a54d4a28dc
Merge pull request #3088 from dhiltgen/rocm_igpu_linux
Fix iGPU detection for linux
2024-03-12 17:20:27 -07:00
Daniel Hiltgen
82b0c7c27e
Fix iGPU detection for linux
This fixes a few bugs in the new sysfs discovery logic. iGPUs are now
correctly identified by the <1G of VRAM they report. The sysfs IDs are off
by one compared to what HIP wants, because the CPU is reported
in amdgpu, but HIP only cares about GPUs.
2024-03-12 16:57:19 -07:00
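The two rules in the commit above can be sketched as follows. This is a simplified illustration, not the actual discovery code. It assumes that exactly one CPU node is numbered ahead of the GPU nodes in sysfs, which is the off-by-one the message describes, and it uses 1 GiB as the iGPU threshold. The `amdNode` type and `hipDevice` function are hypothetical.

```go
package main

import "fmt"

// oneGiB is the VRAM threshold below which a device is treated as integrated.
const oneGiB uint64 = 1 << 30

// amdNode is a hypothetical, simplified record for one GPU node read from sysfs.
type amdNode struct {
	SysfsID   int    // node index as numbered in sysfs (CPU node included)
	VRAMBytes uint64 // total VRAM reported for the node
}

// hipDevice converts a sysfs node ID into the index HIP expects and flags
// likely iGPUs. Assumes a single CPU node precedes the GPU nodes.
func hipDevice(n amdNode) (hipID int, integrated bool) {
	return n.SysfsID - 1, n.VRAMBytes < oneGiB
}

func main() {
	nodes := []amdNode{
		{SysfsID: 1, VRAMBytes: 512 << 20}, // 512 MiB carve-out: likely an iGPU
		{SysfsID: 2, VRAMBytes: 24 << 30},  // 24 GiB: discrete card
	}
	for _, n := range nodes {
		id, igpu := hipDevice(n)
		fmt.Printf("sysfs %d -> HIP %d, iGPU=%v\n", n.SysfsID, id, igpu)
	}
}
```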
Patrick Devine
ba7cf7fb66
add more docs for the modelfile message command (#3087)
2024-03-12 16:41:41 -07:00
Bruce MacDonald
2f804068bd
warn when json format is expected but not mentioned in prompt (#3081)
2024-03-12 19:07:11 -04:00
Daniel Hiltgen
85129d3a32
Adapt our build for imported server.cpp
2024-03-12 14:57:15 -07:00
Daniel Hiltgen
9ac6440da3
Import server.cpp as of b2356
2024-03-12 13:58:06 -07:00
Michael Yang
0085297928
refactor readseeker
2024-03-12 12:54:18 -07:00
Daniel Hiltgen
34d00f90b1
Merge pull request #3070 from dhiltgen/visible_devices
Add docs explaining GPU selection env vars
2024-03-12 11:36:46 -07:00
Daniel Hiltgen
b53229a2ed
Add docs explaining GPU selection env vars
2024-03-12 11:33:06 -07:00
racerole
53c107e20e
chore: fix typo (#3073)
Signed-off-by: racerole <jiangyifeng@outlook.com>
2024-03-12 14:09:22 -04:00
mofanke
51578d8573
fix gpu_info_cuda.c compile warning (#3077)
2024-03-12 14:08:40 -04:00
Jeffrey Morgan
b5fcd9d3aa
use -trimpath when building releases (#3069)
2024-03-11 15:58:46 -07:00
Bruce MacDonald
b80661e8c7
relay load model errors to the client (#3065)
2024-03-11 16:48:27 -04:00
Jeffrey Morgan
6d3adfbea2
Update troubleshooting.md
2024-03-11 13:22:28 -07:00
Jeffrey Morgan
369eda65f5
update llama.cpp submodule to ceca1ae (#3064)
2024-03-11 12:57:48 -07:00
Michael Yang
f878e91070
Merge pull request #3044 from ollama/mxyng/fix-convert-shape
convert: fix shape
2024-03-11 09:56:57 -07:00
Daniel Hiltgen
0d651478e4
Merge pull request #3056 from dhiltgen/rocm_link_clash
Avoid rocm runner and dependency clash
2024-03-11 09:48:48 -07:00
Michael Yang
9ea492f1ce
convert: fix shape
2024-03-11 09:41:01 -07:00
Daniel Hiltgen
bc13da2bfe
Avoid rocm runner and dependency clash
Putting the rocm symlink next to the runners is risky. This moves
the payloads into a subdir to avoid potential clashes.
2024-03-11 09:33:22 -07:00
Jeffrey Morgan
41b00b9856
fix 03-locale.diff
2024-03-10 16:21:05 -07:00
Daniel Hiltgen
c2a8ed48e7
Merge pull request #3048 from dhiltgen/harden_rocm_deps
Harden for deps file being empty (or short)
2024-03-10 15:17:22 -07:00
Daniel Hiltgen
3dc1bb6a35
Harden for deps file being empty (or short)
2024-03-10 14:45:38 -07:00
Daniel Hiltgen
7865a6996a
Merge pull request #3046 from dhiltgen/rocm_search_paths
Add ollama executable peer dir for rocm
2024-03-10 12:30:56 -07:00
Daniel Hiltgen
00ec269321
Add ollama executable peer dir for rocm
This allows people who package up ollama on their own to place
the rocm dependencies in a peer directory to the ollama executable
much like our windows install flow.
2024-03-10 12:16:30 -07:00
Jeffrey Morgan
908005d90b
patch: use default locale in wpm tokenizer (#3034)
2024-03-09 21:12:12 -08:00
Jeffrey Morgan
cdf65e793f
only copy deps for amd64 in build_linux.sh
2024-03-09 17:55:22 -08:00
Daniel Hiltgen
82ca694d68
Rename ROCm deps file to avoid confusion (#3025)
2024-03-09 17:48:38 -08:00
Jeffrey Morgan
5017a15bcb
add macapp to .dockerignore
2024-03-09 16:07:06 -08:00
Jeffrey Morgan
e11668aa07
add bundle_metal and cleanup_metal functions to gen_darwin.sh
2024-03-09 16:04:57 -08:00
Jeffrey Morgan
0bd0f4a29c
tidy cleanup logs
2024-03-09 15:56:48 -08:00
Jeffrey Morgan
1ffb1e2874
update llama.cpp submodule to 77d1ac7 (#3030)
2024-03-09 15:55:34 -08:00
Daniel Hiltgen
0a7844413c
Merge pull request #3026 from dhiltgen/win_rocm_docs
Doc how to set up ROCm builds on windows
2024-03-09 14:17:19 -08:00
Jeffrey Morgan
f9cd55c70b
disable gpu for certain model architectures and fix divide-by-zero on memory estimation
2024-03-09 12:51:38 -08:00
Daniel Hiltgen
0fdebb34a9
Doc how to set up ROCm builds on windows
2024-03-09 11:29:45 -08:00
Daniel Hiltgen
ac64cd4ef9
Merge pull request #3008 from dhiltgen/no_more_idempotent
Finish unwinding idempotent payload logic
2024-03-09 09:13:24 -08:00
Daniel Hiltgen
4a5c9b8035
Finish unwinding idempotent payload logic
The recent ROCm change partially removed idempotent
payloads, but the ggml-metal.metal file for mac was still
idempotent. This finishes switching to always extract
the payloads, and now that idempotency is gone, the
version directory is no longer useful.
2024-03-09 08:34:39 -08:00
Jeffrey Morgan
efe5617b64
update llama.cpp submodule to c2101a2 (#3020)
2024-03-09 00:44:50 -08:00
Jeffrey Morgan
5b3fad9636
separate out isLocalIP
2024-03-09 00:22:08 -08:00
Jeffrey Morgan
bfec2c6e10
simplify host checks
2024-03-08 23:29:53 -08:00
Jeffrey Morgan
5c143af726
add additional allowed hosts
2024-03-08 23:23:59 -08:00
Jeffrey Morgan
6c0af2599e
Update docs README.md and table of contents
2024-03-08 22:45:11 -08:00
Jeffrey Morgan
fc8c044584
add allowed host middleware and remove workDir middleware (#3018)
2024-03-08 22:23:47 -08:00
Michael Yang
ecc133d843
Merge pull request #3014 from ollama/mxyng/decode-ggla
2024-03-08 16:14:53 -08:00
Michael Yang
76bdebbadf
decode ggla
2024-03-08 15:46:25 -08:00
Michael Yang
18979ad4a1
convert: fix default shape
2024-03-08 15:42:48 -08:00
Michael Yang
8e0ef931d8
Merge pull request #2990 from ollama/mxyng/default-term-size
fix: default terminal width, height
2024-03-08 15:20:54 -08:00
Daniel Hiltgen
280da44522
Merge pull request #2988 from dhiltgen/rocm_docs
Refined ROCm troubleshooting docs
2024-03-08 13:33:30 -08:00
Bruce MacDonald
0cebc79cba
fix: allow importing a model from name reference (#3005)
2024-03-08 12:27:47 -05:00
Jeffrey Morgan
0e4669b04f
update llama.cpp submodule to 6cdabe6 (#2999)
2024-03-08 00:26:20 -08:00
Jeffrey Morgan
b886bec3f9
Update api.md
2024-03-07 23:27:51 -08:00
Jeffrey Morgan
fc06205971
Revert "adjust download and upload concurrency based on available bandwidth" (#2995)
2024-03-07 18:10:16 -08:00
Blake Mizerany
2ada81e068
cmd: tighten up env var usage sections (#2962)
Also, document OLLAMA_HOST client semantics per command that honors it.
This looks nicer than having a general purpose environment variable
section in the root usage, which was showing up after the "Additional help
topics" section output by Cobra's default template.
It was decided this was easier to work with than using a custom template
for Cobra right now.
2024-03-07 13:57:07 -08:00
Michael Yang
b1e74d4fda
default terminal width, height
2024-03-07 11:35:42 -08:00
Michael Yang
f678f5c5c3
Merge pull request #2991 from ollama/mxyng/fix-ci
fix ci
2024-03-07 11:35:06 -08:00
Michael Yang
2cb74e23fb
fix ci
2024-03-07 11:33:49 -08:00
Daniel Hiltgen
69f0227813
Refined ROCm troubleshooting docs
2024-03-07 11:22:37 -08:00
Daniel Hiltgen
3c8df3808b
Merge pull request #2885 from dhiltgen/rocm_v6_only
Revamp ROCm support
2024-03-07 10:51:00 -08:00
Michael Yang
7d564835c2
Merge pull request #2985 from ollama/rm-empty-examples
remove empty examples
2024-03-07 10:49:40 -08:00
Michael Yang
72431031d9
no ci test on docs, examples
2024-03-07 10:44:48 -08:00
Michael Yang
6041abb5b2
remove empty examples
2024-03-07 10:40:32 -08:00
Daniel Hiltgen
6c5ccb11f9
Revamp ROCm support
This refines where we extract the LLM libraries to by adding a new
OLLAMA_HOME env var that defaults to `~/.ollama`. The logic was already
idempotent, so this should speed up startups after the first time a
new release is deployed. It also cleans up after itself.
We now build only a single ROCm version (latest major) on both windows
and linux. Given the large size of ROCm's tensor files, we split the
dependency out. It's bundled into the installer on windows, and a
separate download on linux. The linux install script is now smart: it
detects the presence of AMD GPUs, checks whether rocm v6 is already
present, and if not, downloads our dependency tar file.
For Linux discovery, we now use sysfs and check each GPU against what
ROCm supports so we can degrade to CPU gracefully instead of having
llama.cpp+rocm assert/crash on us. For Windows, we now use go's windows
dynamic library loading logic to access the amdhip64.dll APIs to query
the GPU information.
2024-03-07 10:36:50 -08:00
Michael Yang
2e20110e50
Merge pull request #2221 from ollama/mxyng/up-down-ccy
adjust download and upload concurrency based on available bandwidth
2024-03-07 09:27:33 -08:00
Daniel Hiltgen
82ddc3e441
Merge pull request #2964 from dhiltgen/mem_limit_var
Allow setting max vram for workarounds
2024-03-07 09:25:44 -08:00
Jeffrey Morgan
d481fb3cc8
update go to 1.22 in other places (#2975)
2024-03-07 07:39:49 -08:00
DJ Johnson
23ee633252
docs: Add LLM-X to Web Integration section (#2759)
2024-03-07 10:11:53 -05:00
John
23ebe8fe11
fix some typos (#2973)
Signed-off-by: hishope <csqiye@126.com>
2024-03-06 22:50:11 -08:00
Patrick Devine
2c017ca441
Convert Safetensors to an Ollama model (#2824)
2024-03-06 21:01:51 -08:00
Daniel Hiltgen
be330174dd
Allow setting max vram for workarounds
Until we get all the memory calculations correct, this can provide
an escape valve for users to work around out of memory crashes.
2024-03-06 17:15:06 -08:00
Blake Mizerany
0ded7fdc4b
cmd: document environment variables for serve command
Updates #2944
2024-03-06 13:48:46 -08:00
Leo
2103a5073c
Add Odin Runes, a Feature-Rich Java UI for Ollama, to README (#2440)
* Add Odin Runes to README
Add Odin Runes to README
This commit adds Odin Runes to the "Community Integrations" section of the README. Odin Runes is a Java-based GPT client designed to provide seamless interaction with GPT models, enhancing productivity in prompt engineering and text generation tasks. This addition highlights the integration between Odin Runes and Ollama, offering users the flexibility to leverage large language models locally within their development workflow.
* Update README.md
This commit applies the reviewer's comments.
2024-03-06 11:57:49 -08:00
Jeffrey Morgan
ce9f7c4674
Update api.md
2024-03-05 13:13:23 -08:00
Anders Rex
e5596c1944
Add NotesOllama to Community Integrations (#2909)
2024-03-04 01:18:10 -08:00
Timothy Graupmann
9bc3fee694
Added community link for Ollama Copilot (#2582)
* Added community link for Ollama Copilot
* Update README.md
---------
Co-authored-by: Michael <mchiang0610@users.noreply.github.com>
2024-03-04 00:40:36 -08:00
Jeffrey Morgan
21347e1ed6
update llama.cpp submodule to c29af7e (#2868)
2024-03-01 15:26:04 -08:00
Jeffrey Morgan
3b4bab3dc5
Fix embeddings load model behavior (#2848)
2024-02-29 17:40:56 -08:00
Daniel Hiltgen
cbd6e3b38e
Merge pull request #2838 from dhiltgen/opensuse
Add ollama user to video group
2024-02-29 15:47:56 -08:00
Daniel Hiltgen
b830afa716
Merge pull request #2837 from dhiltgen/podman_image_support
Add env var so podman will map cuda GPUs
2024-02-29 15:47:37 -08:00
Daniel Hiltgen
bd1d8b0d14
Merge pull request #2836 from bmwiedemann/gzip
Omit build date from gzip headers
2024-02-29 15:46:46 -08:00
fred-bf
25c2912120
Add Community Integration: NextChat (#2780)
2024-02-29 12:12:13 -08:00
Michael Yang
0e19476b56
prepend image tags (#2789)
instead of appending image tags, prepend them - this generally produces better results
2024-02-29 11:30:14 -08:00
tylinux
fa2f2b3563
fix: print usedMemory size right (#2827)
2024-02-29 11:11:04 -08:00
Jeffrey Morgan
cbf4970e0f
bump submodule to 87c91c07663b707e831c59ec373b5e665ff9d64a (#2828)
2024-02-29 09:42:08 -08:00
Daniel Hiltgen
74468513bd
Add ollama user to video group
On OpenSUSE, ollama needs to be a member of the video group
to access the GPU
2024-02-29 08:50:10 -08:00
Daniel Hiltgen
794a916a72
Add env var so podman will map cuda GPUs
Without this env var, podman's GPU logic doesn't map the GPU through
2024-02-29 08:43:08 -08:00
Bernhard M. Wiedemann
76e5d9ec88
Omit build date from gzip headers
See https://reproducible-builds.org/ for why this is good.
This patch was done while working on reproducible builds for openSUSE.
2024-02-29 16:48:19 +01:00
Daniel Hiltgen
076237b8ea
Merge pull request #2771 from dhiltgen/toggle_models
Bump llama.cpp to b2276
2024-02-27 11:29:53 -08:00
Daniel Hiltgen
53d694c67f
Merge pull request #2772 from dhiltgen/container_image
Refine container image build script
2024-02-27 11:29:08 -08:00
Daniel Hiltgen
5aa6bfea94
Merge pull request #2785 from dhiltgen/win_download
Log unexpected server errors checking for update
2024-02-27 10:43:14 -08:00
Daniel Hiltgen
1cde63dd64
Log unexpected server errors checking for update
This should unmask some failure modes that likely
show up in app logs as unmarshal errors
2024-02-27 09:17:04 -08:00
Daniel Hiltgen
98e0b7e94f
Refine container image build script
Allow overriding the platform, image name, and tag latest for
standard and rocm images.
2024-02-26 17:26:49 -08:00
Daniel Hiltgen
061e8f6abc
Bump llama.cpp to b2276
2024-02-26 16:49:24 -08:00
peanut256
a189810df6
Determine max VRAM on macOS using recommendedMaxWorkingSetSize (#2354)
* read iogpu.wired_limit_mb on macOS
Fix for https://github.com/ollama/ollama/issues/1826
* improved determination of available vram on macOS
read the recommended maximal vram on macOS via Metal API
* Removed macOS-specific logging
* Remove logging from gpu_darwin.go
* release Core Foundation object
fixes a possible memory leak
2024-02-25 18:16:45 -05:00
Ikko Eltociear Ashimine
e95b896790
Update types.go (#2744)
specfied -> specified
2024-02-25 13:41:25 -05:00
elthommy
1f087c4d26
Update langchain python tutorial (#2737)
Remove unused GPT4all
Use nomic-embed-text as embedding model
Fix a deprecation warning (__call__)
2024-02-25 00:31:36 -05:00
Jeffrey Morgan
5d7ea6616f
no extra disk space for windows installation (#2739)
2024-02-25 00:20:35 -05:00
Michael Yang
2a4b128ae3
Merge pull request #2719 from ollama/mxyng/format-private-key
remove format private key
2024-02-23 17:15:14 -08:00
Michael Yang
fc483274ad
clean up go.mod
2024-02-23 16:53:36 -08:00
Michael Yang
fd10a2ad4b
remove format/openssh.go
this is unnecessary now that x/crypto/ssh.MarshalPrivateKey has been
added
2024-02-23 16:52:23 -08:00
Benn Huang
b291f63188
Add Community Integration: Chatbox
Co-authored-by: bennhuang <bennhuang@tencent.com>
2024-02-23 07:17:28 -05:00
Jeffrey Morgan
f58856bf6f
better directory cleanup in ollama.iss
2024-02-23 07:14:59 -05:00
Jeffrey Morgan
275ea01587
restore windows build flags and compression
2024-02-22 18:07:18 -05:00
Jeffrey Morgan
8782dd5628
fix build_windows.ps1 script to run go build with the correct flags
2024-02-22 17:41:43 -05:00
Jeffrey Morgan
11bfff8ee1
update llama.cpp submodule to 96633eeca1265ed03e57230de54032041c58f9cd
2024-02-22 16:44:26 -05:00
Logan Yang
7c0167a8f6
Add copilot for obsidian plugin to community integration (#1918)
2024-02-22 14:17:20 -05:00
LangChain4j
74d898e37d
Added LangChain4j links (#1690)
2024-02-22 14:09:08 -05:00
Yuan-Man
c6e8b00718
Add README.md (#2249)
2024-02-22 14:03:44 -05:00
B-Tocs.org Community
be9980ef13
Update README.md - Ollama for SAP ABAP (#2510)
2024-02-22 13:12:27 -05:00
Augustinas Malinauskas
646a0dedb9
Update README.md (#2504)
- Enchanted is now supported for desktop on macOS
2024-02-22 13:09:29 -05:00
Azhar Khan
7f964d938c
update README to add Gemma 2B, 7B model in Model Library Table (#2686)
2024-02-22 13:07:47 -05:00
Pavel Frankov
e6b8a139ff
Update README.md (#2138)
2024-02-22 10:52:36 -05:00
Jeffrey Morgan
bdc0ea1ba5
Update import.md
2024-02-22 02:08:03 -05:00
Jeffrey Morgan
7fab7918cc
Update import.md
2024-02-22 02:06:24 -05:00
Michael Yang
74c1bdba0d
Merge pull request #2657 from joshyan1/patch-1
Update install.sh success message
2024-02-21 15:55:20 -08:00
Josh
f983ef7f5f
Update install.sh success message
2024-02-21 18:30:01 -05:00
Jeffrey Morgan
1ae1c33651
Windows build + installer adjustments (#2656)
* remove `-w -s` linker flags on windows
* use `zip` for windows installer compression
2024-02-21 18:21:26 -05:00
Michael Yang
084d846621
refactor
2024-02-21 13:42:48 -08:00
Michael Yang
6a4b994433
lint
2024-02-21 13:42:48 -08:00
Michael Yang
bea007deb7
use LimitGroup for uploads
2024-02-21 13:42:48 -08:00
Michael Yang
074934be03
adjust group limit based on download speed
2024-02-21 13:42:48 -08:00
Michael Yang
0de12368a0
add new LimitGroup for dynamic concurrency
2024-02-21 13:42:48 -08:00
Michael Yang
917bd61084
refactor download run
2024-02-21 13:42:46 -08:00
Jeffrey Morgan
efe040f8c0
reset with init_vars ahead of each cpu build in gen_windows.ps1 (#2654)
2024-02-21 16:35:34 -05:00
Jeffrey Morgan
2a7553ce09
update llama.cpp submodule to c14f72d
2024-02-21 09:03:14 -05:00
Sun Bo
10af6070a9
Update big-AGI config file link (#2626)
Co-authored-by: bo.sun <bo.sun@cotticoffee.com>
2024-02-21 01:24:48 -05:00
Jeffrey Morgan
92423b0600
add dist directory in build_windows.ps
2024-02-21 00:05:05 -05:00
Jeffrey Morgan
b3eac61cac
update llama.cpp submodule to f0d1fafc029a056cd765bdae58dcaa12312e9879
2024-02-20 22:56:51 -05:00
Jeffrey Morgan
287ba11500
better error message when calling /api/generate or /api/chat with embedding models
2024-02-20 21:53:45 -05:00
Jeffrey Morgan
63861f58cc
Support for bert and nomic-bert embedding models
2024-02-20 21:37:29 -05:00
Jeffrey Morgan
f0425d3de9
Update faq.md
2024-02-20 20:44:45 -05:00
Michael Yang
210b65268e
replace strings buffer with hasher (#2437)
the buffered value is going into the hasher eventually so write directly
to the hasher instead
2024-02-20 19:07:50 -05:00
Michael Yang
949d7b1c48
add gguf file types (#2532)
2024-02-20 19:06:29 -05:00
Michael Yang
897b213468
use http.DefaultClient (#2530)
default client already handles proxy
2024-02-20 18:34:47 -05:00
Jeffrey Morgan
4613a080e7
update llama.cpp submodule to 66c1968f7 (#2618)
2024-02-20 17:42:31 -05:00
Muhammed Nazeem
ace2cdf1c6
Add Page Assist to the community integrations (#2447)
2024-02-20 14:03:58 -05:00
Nikesh Parajuli
eed92bc19a
docs: add Msty app in readme (#1775)
* docs: add Msty app in readme
* docs: update msty url
2024-02-20 14:03:33 -05:00
Michael Edoror
e0a2f46466
Update README.md to include Elixir LangChain Library (#2180)
The Elixir LangChain Library now supports Ollama Chat with this [PR](https://github.com/brainlid/langchain/pull/70)
2024-02-20 14:03:02 -05:00
Taras Tsugrii
01ff2e14db
[nit] Remove unused msg local var. (#2511)
2024-02-20 14:02:34 -05:00
BADR
199e79ec0c
docs: add tenere to terminal clients (#2329)
2024-02-19 23:13:03 -05:00
Jeffrey Morgan
8125ce4cb6
Update import.md
Add instructions to get public key on windows
2024-02-19 22:48:24 -05:00
Daniel
636d6eea99
Add ShellOracle to community terminal integrations (#1767)
2024-02-19 22:18:05 -05:00
Jeffrey Morgan
df56f1ee5e
Update faq.md
2024-02-19 22:16:42 -05:00
Jean-Baptiste Detroyes
0b6c6c9092
feat: add Helm Chart link to Package managers list (#1673)
2024-02-19 22:05:14 -05:00
Jakob Hoeg Mørk
cb60389de7
NextJS web interface for Ollama (#2466)
2024-02-19 21:57:36 -05:00
lulz
ce0c95d097
[fix] /bye and /exit are now treated as prefixes (#2381)
* [fix] /bye and /exit are now treated as prefixes
instead of being treated as entire lines, which doesn't align with the way the rest of the commands are treated
* Update cmd/interactive.go
Fixing whitespace
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-02-19 21:56:49 -05:00
Eddú Meléndez Gonzales
a9bc1e1c37
Add LangChain4J (#2164)
2024-02-19 21:17:32 -05:00
Branislav Gerazov
62c71f4cb1
add ollama-chat.nvim (#2188)
2024-02-19 21:14:29 -05:00
Jeffrey Morgan
41aca5c2d0
Update faq.md
2024-02-19 21:11:01 -05:00
Jeffrey Morgan
753724d867
Update api.md to include examples for reproducible outputs
2024-02-19 20:36:16 -05:00
Jeffrey Morgan
e4576c2ee1
Update README.md
2024-02-19 20:15:24 -05:00
Patrick Devine
9a7a4b9533
add faqs for memory pre-loading and the keep_alive setting (#2601)
2024-02-19 14:45:25 -08:00
Daniel Hiltgen
2653191222
Merge pull request #2600 from dhiltgen/refined_win_docs
Document setting server vars for windows
2024-02-19 13:46:37 -08:00
Daniel Hiltgen
b338c0635f
Document setting server vars for windows
2024-02-19 13:30:46 -08:00
Daniel Hiltgen
4fcbf1cde6
Merge pull request #2599 from dhiltgen/fix_avx
Explicitly disable AVX2 on GPU builds
2024-02-19 13:13:05 -08:00
Daniel Hiltgen
9220b4fa91
Merge pull request #2585 from dhiltgen/cuda_leaks
Fix cuda leaks
2024-02-19 12:48:00 -08:00
Daniel Hiltgen
fc39a6cd7a
Fix cuda leaks
This should resolve the problem where we don't fully unload from the GPU
when we go idle.
2024-02-18 18:37:20 -08:00
Justin Hayes
1e23e82324
Update Web UI link to new project name (#2563)
Ollama WebUI is now known as Open WebUI.
2024-02-17 20:05:20 -08:00
Daniel Hiltgen
f9fd08040b
Merge pull request #2552 from dhiltgen/dup_update_menus
Fix duplicate menus on update and exit on signals
2024-02-16 17:23:37 -08:00
Daniel Hiltgen
4318e35ee3
Merge pull request #2553 from dhiltgen/amdgpu_version
Harden AMD driver lookup logic
2024-02-16 17:23:12 -08:00
Daniel Hiltgen
9754c6d9d8
Harden AMD driver lookup logic
It looks like the version file doesn't exist on older(?) drivers
2024-02-16 16:20:16 -08:00
Daniel Hiltgen
a497235a55
Fix view logs menu
2024-02-16 15:42:53 -08:00
Daniel Hiltgen
df6dc4fd96
Fix duplicate menus on update and exit on signals
Also fixes a few fit-and-finish items for better developer experience
2024-02-16 15:33:16 -08:00
Bruce MacDonald
88622847c6
fix: chat system prompting overrides (#2542)
2024-02-16 14:42:43 -05:00
Tristan Rhodes
9774663013
Update faq.md with the location of models on Windows (#2545)
2024-02-16 11:04:19 -08:00
Daniel Hiltgen
a468ae0459
Merge pull request #2499 from ollama/windows-preview
Windows Preview
2024-02-15 16:06:32 -08:00
Daniel Hiltgen
c3e62ba38a
Merge pull request #2516 from dhiltgen/single_tray_app
Fix a couple duplicate instance bugs
2024-02-15 15:52:43 -08:00
Daniel Hiltgen
117369aa73
Exit if we detect another copy of Ollama running
2024-02-15 14:58:29 -08:00
Daniel Hiltgen
1ba734de67
typo
2024-02-15 14:56:55 -08:00
Daniel Hiltgen
5208cf09b1
clean up some logging
2024-02-15 14:56:55 -08:00
Daniel Hiltgen
bb9de6037c
Prevent multiple installers running concurrently
2024-02-15 14:56:55 -08:00
Daniel Hiltgen
272e53a1f5
Prepare to distribute standalone windows executable
This will be useful for our automated test rigging, and may be useful for
advanced users who want to "roll their own" system service
2024-02-15 14:56:55 -08:00
Daniel Hiltgen
db2a9ad1fe
Explicitly disable AVX2 on GPU builds
Even though we weren't setting it to on, somewhere in the cmake config
it was getting toggled on. By explicitly setting it to off, we get `/arch:AVX`
as intended.
2024-02-15 14:50:11 -08:00
Daniel Hiltgen
c9ab1aead3
Merge pull request #2526 from dhiltgen/harden_for_quotes
Harden the OLLAMA_HOST lookup for quotes
2024-02-15 14:13:40 -08:00
Daniel Hiltgen
4a10e7a7fa
Harden the OLLAMA_HOST lookup for quotes
2024-02-15 13:46:56 -08:00
Michael Yang
86808f80a8
remove unused import
2024-02-15 12:09:11 -08:00
Michael Yang
4240b045e6
always enable view logs
2024-02-15 12:08:27 -08:00
Michael Yang
e547378893
disable default debug
2024-02-15 12:05:13 -08:00
Michael Yang
fd77dbec4d
do not print update request headers
2024-02-15 11:36:35 -08:00
Michael
fefb3e77d1
Update README.md
2024-02-15 10:32:40 -08:00
Jeffrey Morgan
ed5489a96e
higher resolution tray icons
2024-02-14 22:55:03 -08:00
jmorganca
76113742cf
update installer title
2024-02-15 05:56:45 +00:00
Jeffrey Morgan
57e60c836f
better windows app and tray icons
2024-02-15 05:56:45 +00:00
jmorganca
622b1f3e67
update installer and app.exe metadata
2024-02-15 05:56:45 +00:00
jmorganca
7ad9844ac0
set exe metadata using resource files
2024-02-15 05:56:45 +00:00
Michael Yang
e43648afe5
rerefactor
2024-02-15 05:56:45 +00:00
Daniel Hiltgen
823a520266
Fix lint error on ignored error for win console
2024-02-15 05:56:45 +00:00
vinjn
66ef308abd
Import "containerd/console" lib to support colorful output in Windows terminal
2024-02-15 05:56:45 +00:00
Daniel Hiltgen
29e90cc13b
Implement new Go based Desktop app
This focuses on Windows first, but could be used for Mac
and possibly linux in the future.
2024-02-15 05:56:45 +00:00
Daniel Hiltgen
f397e0e988
Move hub auth out to new package
2024-02-15 05:56:45 +00:00
Daniel Hiltgen
9da9e8fb72
Move Mac App to a new dir
2024-02-15 05:56:45 +00:00
Patrick Devine
42e77e2a69
handle race condition while setting raw mode in windows (#2509)
2024-02-14 21:28:35 -08:00
Jeffrey Morgan
9241a29336
Revert "Revert "bump submodule to 6c00a06 (#2479)"" (#2485)
This reverts commit 6920964b87.
2024-02-13 18:18:41 -08:00
Jeffrey Morgan
f7231ad9ad
set shutting_down to false once shutdown is complete (#2484)
2024-02-13 17:48:41 -08:00
Jeffrey Morgan
6920964b87
Revert "bump submodule to 6c00a06 (#2479)"
This reverts commit 2f9ed52bbd.
2024-02-13 17:23:05 -08:00
Jeffrey Morgan
2f9ed52bbd
bump submodule to 6c00a06 (#2479)
2024-02-13 17:12:42 -08:00
bnorick
caf2b13c10
Fix infinite keep_alive (#2480)
2024-02-13 15:40:32 -08:00
lebrunel
1d263449ff
Update README.md to include link to Ollama-ex Elixir library (#2477)
2024-02-13 11:40:44 -08:00
Jeffrey Morgan
48a273f80b
Fix issues with templating prompt in chat mode (#2460)
2024-02-12 15:06:57 -08:00
Daniel Hiltgen
939c60473f
Merge pull request #2422 from dhiltgen/better_kill
More robust shutdown
2024-02-12 14:05:06 -08:00
Jeffrey Morgan
f76ca04f9e
update submodule to 099afc6 (#2468)
2024-02-12 14:01:16 -08:00
Daniel Hiltgen
76b8728f0c
Merge pull request #2465 from dhiltgen/block_rocm_pre_9
Detect AMD GPU info via sysfs and block old cards
2024-02-12 12:41:43 -08:00
Jeffrey Morgan
1f9078d6ae
Check image filetype in api handlers (#2467)
2024-02-12 11:16:20 -08:00
Daniel Hiltgen
6d84f07505
Detect AMD GPU info via sysfs and block old cards
This wires up some new logic to start using sysfs to discover AMD GPU
information and detects old cards we can't yet support so we can fallback to CPU mode.
2024-02-12 08:19:41 -08:00
Jeffrey Morgan
26b13fc33c
patch: always add token to cache_tokens (#2459)
2024-02-12 08:10:16 -08:00
Jeffrey Morgan
1c8435ffa9
Update domain name references in docs and install script (#2435)
2024-02-09 15:19:30 -08:00
Daniel Hiltgen
6680761596
Shutdown faster
Make sure that when a shutdown signal comes, we shutdown quickly instead
of waiting for a potentially long exchange to wrap up.
2024-02-08 22:22:50 -08:00
Jeffrey Morgan
42b797ed9c
Update openai.md
2024-02-08 15:03:23 -05:00
Jeffrey Morgan
336aa43f3c
Update openai.md
2024-02-08 12:48:28 -05:00
Daniel Hiltgen
69f392c9b7
Merge pull request #2403 from dhiltgen/handle_tmp_cleanup
Ensure the libraries are present
2024-02-07 17:55:31 -08:00
Daniel Hiltgen
a1dfab43b9
Ensure the libraries are present
When we store our libraries in a temp dir, a reaper might clean
them when we are idle, so make sure to check for them before
we reload.
2024-02-07 17:27:49 -08:00
Jeffrey Morgan
a0a199b108
Fix hanging issue when sending empty content (#2399)
2024-02-07 19:30:33 -05:00
Jeffrey Morgan
ab0d37fde4
Update openai.md
2024-02-07 17:25:33 -05:00
Jeffrey Morgan
14e71350c8
Update openai.md
2024-02-07 17:25:24 -05:00
Jeffrey Morgan
453f572f83
Initial OpenAI /v1/chat/completions API compatibility (#2376)
2024-02-07 17:24:29 -05:00
Daniel Hiltgen
c9dfa6e571
Merge pull request #2377 from dhiltgen/bump_llamacpp
Bump llama.cpp to b2081
2024-02-07 12:04:38 -08:00
Michael Yang
3dcbcd367d
Merge pull request #2394 from ollama/mxyng/fix-error-response
2024-02-07 11:47:31 -08:00
Michael Yang
e805ac1d59
fix response on token error
2024-02-07 11:05:49 -08:00
Michael Yang
b9229ffca5
Merge pull request #2378 from ollama/mxyng/runners
runners
2024-02-06 13:49:58 -08:00
Michael Yang
46c847c4ad
enable rocm builds
2024-02-06 13:36:13 -08:00
Michael Yang
92b1a21f79
use linux runners
2024-02-06 13:36:04 -08:00
Daniel Hiltgen
de76b95dd4
Bump llama.cpp to b2081
2024-02-06 12:06:43 -08:00
Michael Yang
59ec837ef6
Merge pull request #2374 from ollama/mxyng/rocm-builds
disable rocm builds
2024-02-06 09:41:02 -08:00
Michael Yang
f06b99a461
disable rocm builds
2024-02-06 09:29:42 -08:00
Bruce MacDonald
128fce5495
docs: keep_alive (#2258)
2024-02-06 11:00:05 -05:00
Daniel Hiltgen
27aa2d4a19
Merge pull request #1849 from mraiser/main
Accommodate split cuda lib dir
2024-02-05 16:01:16 -08:00
Jeffrey Morgan
b9f91a0b36
Update import instructions to use convert and quantize tooling from llama.cpp submodule (#2247)
2024-02-05 00:50:44 -05:00
Erik S
b538dc3858
Add llm-ollama plugin for Datasette's LLM CLI to README (#2340)
Co-authored-by: Erik Sp <git@aschwa.com>
2024-02-03 15:40:50 -08:00
Jeffrey Morgan
f0e9496c85
Update api.md
2024-02-02 12:17:24 -08:00
Jeffrey Morgan
09a6f76f4c
fix error on ollama run with a non-existent model
2024-02-01 23:11:52 -08:00
Jeffrey Morgan
e135167484
Add multimodal support to ollama run in noninteractive mode (#2317)
2024-02-01 21:33:06 -08:00
Jeffrey Morgan
38296ab352
clear previous images when submitting an image to ollama run (#2316)
2024-02-01 21:30:26 -08:00
Daniel Hiltgen
f43dea68d1
Merge pull request #2318 from dhiltgen/more_clean
Harden generate patching model
2024-02-01 20:41:29 -08:00
Daniel Hiltgen
e1f50377f4
Harden generate patching model
Only apply patches if we have any, and make sure to clean up
every file we patched at the end to leave the tree clean
2024-02-01 19:34:36 -08:00
Jeffrey Morgan
7913104527
Improvements to ollama run for multimodal models ( #2300 )
2024-02-01 17:09:51 -08:00
Michael Yang
bfbf2f7cf7
Merge pull request #2296 from ollama/mxyng/img-tags
append image tags to user content
2024-02-01 13:16:59 -08:00
Michael Yang
fe3cbd014f
Merge pull request #2298 from ollama/mxyng/debug-prompt
structured debug prompt
2024-02-01 13:16:49 -08:00
Michael Yang
3d6f48507a
structured debug prompt
2024-02-01 11:56:28 -08:00
Michael Yang
f3761405c8
use image id
2024-02-01 11:52:42 -08:00
Michael Yang
e49dc9f3d8
fix tests
2024-02-01 11:48:11 -08:00
Michael Yang
d125510b4b
remove image tags
2024-02-01 11:32:51 -08:00
Russell Canfield
1ca386aa9e
Feature - Add Wingman Extension (#2313)
2024-02-01 11:16:24 -08:00
Michael Yang
fb56988014
account for image projection in token count
2024-02-01 09:50:48 -08:00
Michael Yang
d046bee790
use llm.ImageData for chat
2024-01-31 19:18:25 -08:00
Jeffrey Morgan
f11bf0740b
use llm.ImageData
2024-01-31 19:13:48 -08:00
Michael Yang
8450bf66e6
trim images
2024-01-31 19:13:47 -08:00
Michael Yang
b4e11be8ef
append image tags to user content
2024-01-31 19:13:10 -08:00
Bruce MacDonald
a896079705
preserve last system message from modelfile (#2289)
2024-01-31 21:45:01 -05:00
Michael Yang
583950c828
Merge pull request #2294 from ollama/mxyng/slog-source
...
update slog handler options
2024-01-31 15:29:11 -08:00
Michael Yang
8ac08a0eec
update slog handler options
...
- consistent format by using text handler for debug and non-debug
- truncate source file to just the file name
2024-01-31 15:15:00 -08:00
Michael Yang
60f47be64c
Merge pull request #2284 from ollama/mxyng/parse-raw
...
remove unnecessary parse raw
2024-01-31 09:40:48 -08:00
Daniel Hiltgen
6e56077ada
Merge pull request #2263 from dhiltgen/bump_llamacpp
...
Bump llama.cpp to b1999
2024-01-31 08:39:41 -08:00
Hoang Nguyen
98ae9467bb
Added MindMac to Community Integrations -> Web & Desktop section (#1957)
2024-01-31 07:48:37 -08:00
Richard Macarthy
b7a24af083
Add twinny vscode extension to Extensions and Plugins (#1950)
2024-01-31 06:25:06 -08:00
Michael Yang
c8b1f2369e
remove unnecessary parse raw
2024-01-30 17:00:53 -08:00
Daniel Hiltgen
72b12c3be7
Bump llama.cpp to b1999
...
This requires an upstream change to support graceful termination,
carried as a patch.
2024-01-30 16:52:12 -08:00
Bruce MacDonald
0632dff3f8
trim chat prompt based on llm context size (#1963)
2024-01-30 15:59:29 -05:00
Maximilian Weber
509e2dec8a
Update README.md (#2252)
...
Added - [Ollama for R - rollama](https://github.com/JBGruber/rollama) in Libraries in README.md
2024-01-30 11:56:51 -08:00
Daniel Hiltgen
78a48de804
Merge pull request #2256 from dhiltgen/container_logs
...
Add container hints for troubleshooting
2024-01-30 08:12:48 -08:00
Daniel Hiltgen
e7dbb00331
Add container hints for troubleshooting
...
Some users are new to containers and unsure where the server logs go
2024-01-29 08:53:41 -08:00
Marc Raiser
c3f9538636
remove default.nix
2024-01-29 00:05:07 -05:00
Jeffrey Morgan
2e06ed01d5
remove unknown CPPFLAGS option
2024-01-28 17:51:23 -08:00
Daniel Hiltgen
4072b5879b
Merge pull request #2246 from dhiltgen/reject_cuda_without_avx
...
Don't disable GPUs on arm without AVX
2024-01-28 16:26:55 -08:00
Daniel Hiltgen
15562e887d
Don't disable GPUs on arm without AVX
...
AVX is an x86 feature, so ARM should be excluded from
the check.
2024-01-28 15:22:38 -08:00
Jeffrey Morgan
f2245c7c77
print prompt with OLLAMA_DEBUG=1 (#2245)
2024-01-28 15:22:35 -08:00
Jeffrey Morgan
e4b9b72f2a
Do not repeat system prompt for chat templating (#2241)
2024-01-28 14:15:56 -08:00
Daniel Hiltgen
311f8e0c3f
Merge pull request #2243 from dhiltgen/harden_zero_gpus
...
Harden for zero detected GPUs
2024-01-28 13:30:44 -08:00
Daniel Hiltgen
f07f8b7a9e
Harden for zero detected GPUs
...
At least with the ROCm libraries, it's possible to have the library
present with zero GPUs. This fix avoids a divide by zero bug in llm.go
when we try to calculate GPU memory with zero GPUs.
2024-01-28 13:13:10 -08:00
mraiser
4c4c730a0a
Merge branch 'ollama:main' into main
2024-01-27 21:56:11 -05:00
Daniel Hiltgen
e02ecfb6c8
Merge pull request #2116 from dhiltgen/cc_50_80
...
Add support for CUDA 5.0 cards
2024-01-27 10:28:38 -08:00
Daniel Hiltgen
c8059b4dcf
Merge pull request #2224 from jaglinux/fix_rocm_get_version_message
...
ROCm: Correct the response string in rocm_get_version function
2024-01-27 07:29:32 -08:00
Jagadish Krishnamoorthy
59d87127f5
Update gpu_info_rocm.c
2024-01-26 22:08:27 -08:00
Patrick Devine
b5cf31b460
add keep_alive to generate/chat/embedding api endpoints (#2146)
2024-01-26 14:28:02 -08:00
Daniel Hiltgen
cc4915e262
Merge pull request #2214 from dhiltgen/reject_cuda_without_avx
...
Detect lack of AVX and fallback to CPU mode
2024-01-26 12:06:44 -08:00
Daniel Hiltgen
667a2ba18a
Detect lack of AVX and fallback to CPU mode
...
We build the GPU libraries with AVX enabled to ensure that if not all
layers fit on the GPU we get better performance in a mixed mode.
If the user is using a virtualization/emulation system that lacks AVX
this used to result in an illegal instruction error and crash before this
fix. Now we will report a warning in the server log, and just use
CPU mode to ensure we don't crash.
2024-01-26 11:36:03 -08:00
Michael Yang
e054ebe059
Merge pull request #2212 from ollama/mxyng/fix-build
...
fix build
2024-01-26 11:19:08 -08:00
Michael Yang
9d3dcfd0ec
fix logging
2024-01-26 11:04:27 -08:00
Michael Yang
6e0ea5ecc8
Merge pull request #1916 from ollama/mxyng/inactivity-monitor
...
download: add inactivity monitor
2024-01-26 10:56:00 -08:00
Daniel Hiltgen
a47d8b2557
Merge pull request #2197 from dhiltgen/remove_rocm_image
...
Add back ROCm container support
2024-01-26 09:34:23 -08:00
Daniel Hiltgen
30c43c285c
Merge pull request #2195 from dhiltgen/rocm_real_gpus
...
Ignore AMD integrated GPUs
2024-01-26 09:30:24 -08:00
Daniel Hiltgen
23a7ea593b
Merge pull request #2209 from dhiltgen/harden_mgmt
...
Fix crash on cuda ml init failure
2024-01-26 09:30:13 -08:00
Daniel Hiltgen
75c44aa319
Add back ROCm container support
...
This adds ROCm support back as a discrete image.
2024-01-26 09:24:29 -08:00
Daniel Hiltgen
9d7b5d6c91
Ignore AMD integrated GPUs
...
Detect and ignore integrated GPUs reported by rocm.
2024-01-26 09:21:35 -08:00
Daniel Hiltgen
5d9c4a5f5a
Fix crash on cuda ml init failure
...
The new driver lookup code was triggering after init failure due to a missing return
2024-01-26 09:18:33 -08:00
Daniel Hiltgen
197e420a97
Merge pull request #2196 from dhiltgen/remove_rocm_image
...
Switch back to ubuntu base
2024-01-25 16:50:32 -08:00
Daniel Hiltgen
a34e1ad3cf
Switch back to ubuntu base
...
The size increase for rocm support in the standard image is problematic.
We'll revisit multiple tags for rocm support in a follow up PR.
2024-01-25 16:46:01 -08:00
Michael Yang
2ae0556292
Merge pull request #1679 from ollama/mxyng/build-gpus
...
build cuda and rocm
2024-01-25 16:38:14 -08:00
Jeffrey Morgan
5be9bdd444
Update modelfile.md
2024-01-25 16:29:48 -08:00
Jeffrey Morgan
b706794905
Update modelfile.md to include MESSAGE
2024-01-25 16:29:32 -08:00
Michael Yang
a8c5413d06
only generate gpu libs
2024-01-25 15:41:56 -08:00
Michael Yang
5580de4571
archive ollama binaries
2024-01-25 15:40:16 -08:00
Michael Yang
946431d5b0
build cuda and rocm
2024-01-25 15:40:15 -08:00
Michael Yang
0610126049
remove env setting
2024-01-25 15:39:43 -08:00
Jeffrey Morgan
3ebd6a83fc
update submodule to cd4fddb29f81d6a1f6d51a0c016bc6b486d68def
2024-01-25 13:54:11 -08:00
Jeffrey Morgan
a64570dcae
Fix clearing kv cache between requests with the same prompt (#2186)
...
* Fix clearing kv cache between requests with the same prompt
* fix powershell script
2024-01-25 13:46:20 -08:00
Patrick Devine
7c40a67841
Save and load sessions (#2063)
2024-01-25 12:12:36 -08:00
Michael Yang
e64b5b07a2
Merge pull request #2181 from ollama/mxyng/stub-lint
...
stub generate outputs for lint
2024-01-25 11:55:15 -08:00
Michael Yang
9e1e295cdc
Merge pull request #2175 from ollama/mxyng/refactor-tensor-read
...
refactor tensor read
2024-01-25 09:22:42 -08:00
Marc Raiser
6eb3cddcb6
To build on NixOS: nix-shell --run 'go generate ./... && go build .'
2024-01-25 10:17:22 -05:00
mraiser
a4564232a4
Update gen_linux.sh to find libcudart in separate directory
2024-01-25 09:49:35 -05:00
Jeffrey Morgan
a643823f86
Update README.md
2024-01-24 21:36:56 -08:00
Michael Yang
8e5d359a03
stub generate outputs for lint
2024-01-24 17:36:10 -08:00
Daniel Hiltgen
a170888dd4
Merge pull request #2174 from dhiltgen/rocm_real_gpus
...
More logging for gpu management
2024-01-24 11:09:17 -08:00
Michael Yang
cd22855ef8
refactor tensor read
2024-01-24 10:48:31 -08:00
Daniel Hiltgen
013fd07139
More logging for gpu management
...
Fix an ordering glitch of dlerr/dlclose and add more logging to help
root cause some crashes users are hitting. This also refines the
function pointer names to use the underlying function names instead
of simplified names for readability.
2024-01-24 10:32:36 -08:00
Daniel Hiltgen
f63dc2db5c
Merge pull request #2162 from dhiltgen/rocm_real_gpus
...
Report more information about GPUs in verbose mode
2024-01-23 17:45:40 -08:00
Jeffrey Morgan
eaa5a396d9
Update README.md
2024-01-23 16:08:15 -08:00
Jeffrey Morgan
8ed22f5d72
Update README.md
2024-01-23 14:38:01 -08:00
Daniel Hiltgen
987c16b2f7
Report more information about GPUs in verbose mode
...
This adds additional calls to both CUDA and ROCm management libraries to
discover additional attributes about the GPU(s) detected in the system, and
wires up runtime verbosity selection. When users hit problems with GPUs we can
ask them to run with `OLLAMA_DEBUG=1 ollama serve` and share the results.
2024-01-23 11:37:02 -08:00
Jeffrey Morgan
950f636d64
Update README.md
2024-01-23 10:29:10 -08:00
Jeffrey Morgan
4458efb73a
Load all layers on arm64 macOS if model is small enough (#2149)
2024-01-22 17:40:06 -08:00
Daniel Hiltgen
ceea599494
Merge pull request #2150 from dhiltgen/default_version
...
Set a default version using git describe
2024-01-22 17:38:27 -08:00
Daniel Hiltgen
3005ec74b3
Set a default version using git describe
...
If a VERSION is not specified, this will generate a version string that
represents the state of the repo. For example, `0.1.21-12-gffaf52e-dirty`
means 12 commits past the 0.1.21 tag, on commit ffaf52e (git describe
prefixes the abbreviated hash with `g`), with uncommitted changes in the tree.
2024-01-22 17:12:20 -08:00
Daniel Hiltgen
0759d8996e
Merge pull request #2148 from dhiltgen/intel_mac
...
Refine Accelerate usage on mac
2024-01-22 16:56:58 -08:00
Daniel Hiltgen
0f5b843319
Refine Accelerate usage on mac
...
For old macs, accelerate seems to cause crashes, but for
AVX2 capable macs, it does not.
2024-01-22 16:25:56 -08:00
Jeffrey Morgan
ffaf52e1e9
update submodule to 011e8ec577fd135cbc02993d3ea9840c516d6a1c
2024-01-22 15:16:54 -08:00
Michael Yang
940b10b036
Merge pull request #2144 from jmorganca/mxyng/update-faq
...
faq: update to use launchctl setenv
2024-01-22 13:46:57 -08:00
Daniel Hiltgen
3bc28736cd
Merge pull request #2143 from dhiltgen/llm_verbosity
...
Refine debug logging for llm
2024-01-22 13:19:16 -08:00
Michael Yang
93a756266c
faq: update to use launchctl setenv
2024-01-22 13:10:13 -08:00
Daniel Hiltgen
a0a829bf7a
Merge pull request #2142 from dhiltgen/debug_on_fail
...
Debug logging on init failure
2024-01-22 12:29:22 -08:00
Daniel Hiltgen
730dcfcc7a
Refine debug logging for llm
...
This wires up logging in llama.cpp to always go to stderr, and also
turns up logging if OLLAMA_DEBUG is set.
2024-01-22 12:26:49 -08:00
Daniel Hiltgen
27a2d5af54
Debug logging on init failure
2024-01-22 12:08:22 -08:00
Jeffrey Morgan
5f81a33f43
update submodule to 6f9939d (#2115)
2024-01-22 11:56:40 -08:00
Michael Yang
6225fde046
Merge pull request #2102 from jmorganca/mxyng/fix-create-override
...
fix: remove overwritten model layers
2024-01-22 09:37:48 -08:00
Meng Zhuo
069184562b
readline: drop unused min function (#2134)
2024-01-22 08:15:08 -08:00
Daniel Hiltgen
5576bb2348
Merge pull request #2130 from dhiltgen/more_faster
...
Make CPU builds parallel and customizable AMD GPUs
2024-01-21 16:14:12 -08:00
Daniel Hiltgen
2738837786
Merge pull request #2131 from dhiltgen/probe_cards_at_init
...
Probe GPUs before backend init
2024-01-21 16:13:47 -08:00
Daniel Hiltgen
ec3764538d
Probe GPUs before backend init
...
Detect potential error scenarios so we can fall back to CPU mode without
hitting asserts.
2024-01-21 15:59:38 -08:00
Daniel Hiltgen
df54c723ae
Make CPU builds parallel and customizable AMD GPUs
...
The linux build now supports parallel CPU builds to speed things up.
This also exposes AMD GPU targets as an optional setting for advanced
users who want to alter our default set.
2024-01-21 15:12:21 -08:00
Daniel Hiltgen
fa8c990e58
Merge pull request #2127 from dhiltgen/rocm_container
...
Combine the 2 Dockerfiles and add ROCm
2024-01-21 11:49:01 -08:00
Daniel Hiltgen
da72235ebf
Combine the 2 Dockerfiles and add ROCm
...
This renames Dockerfile.build to Dockerfile, and adds some new stages
to support 2 modes of building - the build_linux.sh script uses
intermediate stages to extract the artifacts for ./dist, and the default
build generates a container image usable by both cuda and rocm cards.
This required transitioning the x86 base to the rocm image to avoid
layer bloat.
2024-01-21 11:37:11 -08:00
Jeffrey Morgan
89c4aee29e
Unlock mutex when failing to load model (#2117)
2024-01-20 20:54:46 -05:00
Daniel Hiltgen
a447a083f2
Add compute capability 5.0, 7.5, and 8.0
2024-01-20 14:24:05 -08:00
Jeffrey Morgan
f32ea81b21
increase minimum overhead to 1024MiB (#2114)
2024-01-20 17:11:38 -05:00
Daniel Hiltgen
681a914990
Add support for CUDA 5.2 cards
2024-01-20 10:48:43 -08:00
Jeffrey Morgan
4c54f0ddeb
sign dylibs on macOS (#2101)
2024-01-19 19:24:11 -05:00
Michael Yang
c08dfaa23d
fix: remove overwritten model layers
...
if create overrides a manifest, first add the older manifest's layers to
the delete map so they can be cleaned up
2024-01-19 14:58:37 -08:00
Daniel Hiltgen
3b76e736ae
Merge pull request #2100 from dhiltgen/more_wsl_globs
...
More WSL paths
2024-01-19 13:41:08 -08:00
Daniel Hiltgen
552db98bf1
More WSL paths
2024-01-19 13:23:29 -08:00
Daniel Hiltgen
fdcdfef620
Merge pull request #2099 from dhiltgen/fix_cuda_model_swap
...
Switch to local dlopen symbols
2024-01-19 12:22:04 -08:00
Daniel Hiltgen
6a042438af
Switch to local dlopen symbols
2024-01-19 11:37:02 -08:00
Jeffrey Morgan
dc88cc3981
use gzip for runner embedding (#2067)
2024-01-19 13:23:03 -05:00
Daniel Hiltgen
62976087c6
Merge pull request #1999 from lainedfles/termux_android_cpu_only
...
Fix CPU-only build under Android Termux environment.
2024-01-18 17:16:53 -08:00
Self Denial
344342abdf
Restore dyn_ext_server.c since RTLD_DEEPBIND has been removed
2024-01-18 17:30:42 -07:00
Self Denial
eb76f3e379
Fix CPU-only build under Android Termux environment.
...
Update gpu.go initGPUHandles() to declare gpuHandles variable before
reading it. This resolves an "invalid memory address or nil pointer
dereference" error.
Update dyn_ext_server.c to avoid setting the RTLD_DEEPBIND flag under
__TERMUX__ (Android).
2024-01-18 17:25:33 -07:00
Michael Yang
d017e3d0a6
Merge pull request #2060 from jmorganca/mxyng/fix-show
...
fix show handler
2024-01-18 16:02:27 -08:00
Michael Yang
aac9ab4db7
fix show handler
2024-01-18 15:36:50 -08:00
Michael Yang
1f5b7ff976
Merge pull request #1932 from jmorganca/mxyng/api-fields
...
api: add model for all requests
2024-01-18 14:56:51 -08:00
Michael Yang
e299831e2c
Merge pull request #1958 from purificant/ci
...
ci: update setup-go action
2024-01-18 14:53:36 -08:00
Michael Yang
745b5934fa
add model to ModelResponse
2024-01-18 14:32:55 -08:00
Michael Yang
a38d88d828
api: add model for all requests
...
prefer using req.Model and fallback to req.Name
2024-01-18 14:31:37 -08:00
Daniel Hiltgen
abec7f06e5
Merge pull request #2056 from dhiltgen/slog
...
Mechanical switch from log to slog
2024-01-18 14:27:24 -08:00
Michael Yang
e5da190bac
Merge pull request #2020 from jmorganca/mxyng/install-fedora
...
install: pin fedora to max 37
2024-01-18 14:23:42 -08:00
Daniel Hiltgen
ecbfc0182f
Go bump to v1.21 to pick up slog
2024-01-18 14:12:57 -08:00
Daniel Hiltgen
fedd705aea
Mechanical switch from log to slog
...
A few obvious levels were adjusted, but generally everything mapped to "info" level.
2024-01-18 14:12:57 -08:00
Mike Bird
82ee019bfc
add open interpreter to list of extensions (#2016)
2024-01-18 13:59:39 -08:00
Sachin Sachdeva
ad9dbc2a04
Haystack Ollama Integration (#2021)
...
Updated readme with the web link for haystack ollama integration
2024-01-18 13:38:32 -08:00
Daniel Hiltgen
fccdf4c635
Merge pull request #1987 from xyproto/archlinux
...
Let gpu.go and gen_linux.sh also find CUDA on Arch Linux
2024-01-18 13:32:10 -08:00
Daniel Hiltgen
d450fb1d1e
Merge pull request #2055 from dhiltgen/cuda_docs
...
Refine the linux cuda/rocm developer docs
2024-01-18 12:07:31 -08:00
Daniel Hiltgen
df40b11d03
Merge pull request #2007 from dhiltgen/cpu_fallback
...
Add multiple CPU variants for Intel Mac
2024-01-18 11:32:29 -08:00
Daniel Hiltgen
9cd20b0ec8
Refine the linux cuda/rocm developer docs
2024-01-18 09:44:44 -08:00
Daniel Hiltgen
b992bf65fc
Disable arm64 for test phase
...
The runners are x86 so we can only run binaries that match.
2024-01-17 19:26:13 -08:00
Daniel Hiltgen
1b249748ab
Add multiple CPU variants for Intel Mac
...
This also refines the build process for the ext_server build.
2024-01-17 15:08:54 -08:00
Alexander F. Rødseth
cbe2adc78a
Merge branch 'main' into archlinux
2024-01-17 12:50:11 +01:00
Michael Yang
d5a7353357
Merge pull request #2026 from jmorganca/mxyng/fix-windows
...
fix: normalize name path before splitting
2024-01-16 16:58:42 -08:00
Michael Yang
96cfb62641
fix: normalize name path before splitting
2024-01-16 16:48:29 -08:00
Daniel Hiltgen
7d00b5d110
Merge pull request #1915 from dhiltgen/bump_llama_with_new_dep
...
Bump llama.cpp to b1842 and add new cuda lib dep
2024-01-16 13:36:49 -08:00
Daniel Hiltgen
795674dd90
Bump llama.cpp to b1842 and add new cuda lib dep
...
Upstream llama.cpp has added a new dependency with the
NVIDIA CUDA Driver Libraries (libcuda.so) which is part of the
driver distribution, not the general cuda libraries, and is not
available as an archive, so we cannot statically link it. This may
introduce some additional compatibility challenges which we'll
need to keep an eye on.
2024-01-16 12:53:52 -08:00
Daniel Hiltgen
e282bdccdd
Merge pull request #1990 from dhiltgen/ci_mac_cross
...
Add macos cross-compile CI coverage
2024-01-16 12:31:37 -08:00
Michael Yang
d9bfb2f08f
install: pin fedora to max 37
...
repos for fedora 38 and newer do not exist as of this commit
```
$ dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/fedora38/x86_64/cuda-fedora38.repo
Adding repo from: https://developer.download.nvidia.com/compute/cuda/repos/fedora38/x86_64/cuda-fedora38.repo
Status code: 404 for https://developer.download.nvidia.com/compute/cuda/repos/fedora38/x86_64/cuda-fedora38.repo (IP: 152.195.19.142)
Error: Configuration of repo failed
```
2024-01-16 11:45:21 -08:00
Michael Yang
598d6d5572
Merge pull request #1937 from jmorganca/mxyng/remove-client-py
...
remove client.py
2024-01-16 11:01:41 -08:00
Bruce MacDonald
a897e833b8
do not cache prompt (#2018)
...
- prompt cache causes inference to hang after some time
2024-01-16 13:48:05 -05:00
Patrick Devine
eef50accb4
Fix show parameters (#2017)
2024-01-16 10:34:44 -08:00
Michael Yang
05d53de7a1
Merge pull request #1968 from jmorganca/mxyng/fix-request-retry
...
fix: request retry with error
2024-01-16 10:33:50 -08:00
Daniel Hiltgen
8795447dad
Merge pull request #1966 from fpreiss/fpreiss/gen_linux_cuda_detection
...
improve cuda detection (rel. issue #1704)
2024-01-14 18:00:11 -08:00
Daniel Hiltgen
b3035112a1
Add macos cross-compile CI coverage
2024-01-14 10:38:59 -08:00
Daniel Hiltgen
95ad9a9fc8
Merge pull request #1988 from dhiltgen/fix_intel_mac
...
Fix typo in arm mac arch script
2024-01-14 08:45:18 -08:00
Daniel Hiltgen
3ca5f69ce8
Fix typo in arm mac arch script
2024-01-14 08:32:57 -08:00
Daniel Hiltgen
cfa6337960
Merge pull request #1982 from dhiltgen/fix_intel_mac
...
Fix intel mac build
2024-01-14 08:26:46 -08:00
Alexander F. Rødseth
f4bf1d514f
Let gpu.go and gen_linux.sh also find CUDA on Arch Linux
2024-01-14 13:40:36 +01:00
Jeffrey Morgan
557110d0ba
Disable mmap with lora layers (#1985)
2024-01-13 23:36:31 -05:00
Daniel Hiltgen
2ecb247276
Fix intel mac build
...
Make sure we're building an x86 ext_server lib when cross-compiling
2024-01-13 14:46:34 -08:00
Jeffrey Morgan
288ef8ff95
add gcc -lstdc++ flag for linux cpu (#1974)
2024-01-13 03:53:00 -05:00
Jeffrey Morgan
4cf17990f7
use g++ to build libext_server.so on linux (#1972)
2024-01-13 03:12:42 -05:00
Michael Yang
27331ae3a8
download: add inactivity monitor
...
if a download part is inactive for some time, restart it
2024-01-12 15:23:15 -08:00
Michael Yang
b6c0ef1e70
Merge pull request #1961 from jmorganca/mxyng/rm-double-newline
...
remove double newlines in /set parameter
2024-01-12 15:18:19 -08:00
Michael Yang
356d178f6e
Merge pull request #1971 from jmorganca/mxyng/max-context-length
...
add max context length check
2024-01-12 15:10:25 -08:00
Michael Yang
eaed6f8c45
add max context length check
2024-01-12 14:54:07 -08:00
purificant
6a5bfc2ed6
update actions/setup-go
2024-01-12 22:27:25 +00:00
Michael Yang
cf29bd2d72
fix: request retry with error
...
this fixes a subtle bug with makeRequestWithRetry where an HTTP status
error on a retried request will potentially not return the right err
2024-01-12 13:32:27 -08:00
Fabian Preiss
905862e17b
improve cuda detection (rel. issue #1704)
2024-01-12 21:59:19 +01:00
Patrick Devine
565f8a3c44
Convert the REPL to use /api/chat for interactive responses (#1936)
2024-01-12 12:05:52 -08:00
Michael Yang
5121b7ac9c
remove double newlines in /set parameter
2024-01-12 11:21:15 -08:00
Michael Yang
a70262c6b2
Update README.md
...
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-01-12 09:43:04 -08:00
Tristram Oaten
40a0a90a88
Add group delete to uninstall instructions (#1924)
...
After executing the `userdel ollama` command, I saw this message:
```sh
$ sudo userdel ollama
userdel: group ollama not removed because it has other members.
```
Which reminded me that I had to remove the dangling group too. For completeness, the uninstall instructions should do this too.
Thanks!
2024-01-12 00:07:00 -05:00
Michael Yang
cbe20c4375
update readme
2024-01-11 16:24:37 -08:00
Michael Yang
5ffbbea1d7
remove client.py
2024-01-11 15:53:10 -08:00
Daniel Hiltgen
3773fb6465
Merge pull request #1935 from dhiltgen/cpu_fallback
...
Fix up the CPU fallback selection
2024-01-11 15:52:32 -08:00
Daniel Hiltgen
7427fa1387
Fix up the CPU fallback selection
...
The memory changes and multi-variant change had some merge
glitches I missed. This fixes them so we actually get the cpu llm lib
and best variant for the given system.
2024-01-11 15:27:06 -08:00
Michael Yang
f84537e0e0
Merge pull request #1934 from jmorganca/mxyng/fix-slices
...
fix build and lint
2024-01-11 14:36:20 -08:00
Michael Yang
d2be6387c9
fix typo
2024-01-11 14:25:21 -08:00
Michael Yang
d7af35d3d0
import fmt
2024-01-11 14:22:32 -08:00
Michael Yang
defc1dbd6e
use x/exp/slices
2024-01-11 14:20:13 -08:00
Daniel Hiltgen
de2fbdec99
Merge pull request #1819 from dhiltgen/multi_variant
...
Support multiple LLM libs; ROCm v5 and v6; Rosetta, AVX, and AVX2 compatible CPU builds
2024-01-11 14:00:48 -08:00
Eduard van Valkenburg
f5faf79aa1
Add semantic kernel to Readme (#1931)
2024-01-11 14:40:23 -05:00
Michael Yang
f4f939de28
Merge pull request #1552 from jmorganca/mxyng/lint-test
...
add lint and test on pull_request
2024-01-11 09:37:45 -08:00
Daniel Hiltgen
39928a42e8
Always dynamically load the llm server library
...
This switches darwin to dynamic loading, and refactors the code now that no
static linking of the library is used on any platform
2024-01-11 08:42:47 -08:00
Daniel Hiltgen
d88c527be3
Build multiple CPU variants and pick the best
...
This reduces the built-in linux version to not use any vector extensions
which enables the resulting builds to run under Rosetta on MacOS in
Docker. Then at runtime it checks for the actual CPU vector
extensions and loads the best CPU library available
2024-01-11 08:42:47 -08:00
Fabian Preiß
3bc8b9832b
fix gpu_test.go Error (same type) uint64->uint32 (#1921)
2024-01-11 08:22:23 -05:00
Jeffrey Morgan
ab6be852c7
revisit memory allocation to account for full kv cache on main gpu
2024-01-11 01:45:31 -05:00
Daniel Hiltgen
052b33b81b
DRY out the Dockerfile.build
2024-01-10 17:27:51 -08:00
Daniel Hiltgen
8da7bef05f
Support multiple variants for a given llm lib type
...
In some cases we may want multiple variants for a given GPU type or CPU.
This adds logic to have an optional Variant which we can use to select
an optimal library, but also allows us to try multiple variants in case
some fail to load.
This can be useful for scenarios such as ROCm v5 vs v6 incompatibility
or potentially CPU features.
2024-01-10 17:27:51 -08:00
Jeffrey Morgan
b24e8d17b2
Increase minimum CUDA memory allocation overhead and fix minimum overhead for multi-gpu (#1896)
...
* increase minimum cuda overhead and fix minimum overhead for multi-gpu
* fix multi gpu overhead
* limit overhead to 10% of all gpus
* better wording
* allocate fixed amount before layers
* fixed only includes graph alloc
2024-01-10 19:08:51 -05:00
Jeffrey Morgan
f83881390f
revert submodule back to 328b83de23b33240e28f4e74900d1d06726f5eb1
2024-01-10 18:42:39 -05:00
Daniel Hiltgen
ac70ab6761
Merge pull request #1914 from dhiltgen/smarter_cuda_detection
...
Smarter GPU Management library detection
2024-01-10 15:21:56 -08:00
Daniel Hiltgen
3c49c3ab0d
Harden GPU mgmt library lookup
...
When there are multiple management libraries installed on a system
not every one will be compatible with the current driver. This change
improves our management library algorithm to build up a set of discovered
libraries based on glob patterns, and then try all of them until we're able to
load one without error.
2024-01-10 15:06:41 -08:00
Daniel Hiltgen
9754ae4c89
Support optional override of the target architectures
...
This can help speed up incremental builds when you're only testing one
architecture, like amd64. E.g.
BUILD_ARCH=amd64 ./scripts/build_linux.sh && scp ./dist/ollama-linux-amd64 test-system:
2024-01-10 14:43:24 -08:00
Jeffrey Morgan
224fbf2795
update submodule to commit 1fc2f265ff9377a37fd2c61eae9cd813a3491bea until its main branch is fixed
2024-01-10 17:03:15 -05:00
Jeffrey Morgan
2c6e8f5248
Update submodule to 6efb8eb30e7025b168f3fda3ff83b9b386428ad6 (#1885)
...
* update submodule to `6efb8eb30e7025b168f3fda3ff83b9b386428ad6`
* unblock condition variable in `update_slots` when closing server
2024-01-10 16:48:38 -05:00
Jeffrey Morgan
34344d801c
clean up cmake build directory when cross compiling macOS builds
2024-01-09 17:13:56 -05:00
Robin Glauser
e868c8a5c7
Update api.md (#1878)
...
Fixed assistant in the example response.
2024-01-09 16:21:17 -05:00
Jeffrey Morgan
c336693f07
calculate overhead based on number of gpu devices (#1875)
2024-01-09 15:53:33 -05:00
Daniel Hiltgen
e89dc1d54b
Merge pull request #1874 from dhiltgen/correct_cuda_min
...
Set correct CUDA minimum compute capability version
2024-01-09 11:37:22 -08:00
Daniel Hiltgen
1961a81f03
Set correct CUDA minimum compute capability version
...
If you attempt to run the current CUDA build on compute capability 5.2
cards, you'll hit the following failure:
cuBLAS error 15 at ggml-cuda.cu:7956: the requested functionality is not supported
2024-01-09 11:28:24 -08:00
Jeffrey Morgan
8a8c7e7f8d
only build for metal on arm64
2024-01-09 13:51:08 -05:00
Jeffrey Morgan
6df83e6daa
update rough cuda overhead estimate to 15% + 384MiB
2024-01-09 13:51:08 -05:00
Michael Yang
f921e2696e
typo
2024-01-09 09:45:42 -08:00
Michael Yang
4a33cede20
remove unused fields and functions
2024-01-09 09:37:40 -08:00
Michael Yang
f95d2f25f3
fix temporary history file permissions
2024-01-09 09:36:58 -08:00
Michael Yang
2b9892a808
fix(windows): modelpath and list
2024-01-09 09:36:58 -08:00
Michael Yang
2bb2bdd5d4
fix lint
2024-01-09 09:36:58 -08:00
Michael Yang
acfc376efd
add .golangci.yaml
2024-01-09 09:36:58 -08:00
Michael Yang
997253143f
add lint and test on pull_request
2024-01-09 09:36:58 -08:00
Michael Yang
62023177f6
Merge pull request #1614 from jmorganca/mxyng/fix-set-template
...
fix: set template without triple quotes
2024-01-09 09:36:24 -08:00
Jeffrey Morgan
6164f378f2
revert cuda overhead to 20%
2024-01-09 00:54:29 -05:00
Jeffrey Morgan
f387e9631b
use runner if cuda alloc won't fit
2024-01-09 00:44:34 -05:00
Jeffrey Morgan
6566387ae3
add TODO for cuda overhead
2024-01-09 00:28:03 -05:00
Jeffrey Morgan
37708931fb
update cuda overhead to 20% to fix crashes when switching between models and large context sizes
2024-01-09 00:05:23 -05:00
Jeffrey Morgan
f6cb0a553c
update cuda overhead to 15% or 400MiB
2024-01-08 23:45:45 -05:00
Jeffrey Morgan
2680078c13
fix build on linux
2024-01-08 23:44:13 -05:00
Jeffrey Morgan
f1b7e5f560
update overhead to 15%
2024-01-08 23:37:45 -05:00
Jeffrey Morgan
cb534e6ac2
use 10% vram overhead for cuda
2024-01-08 23:17:44 -05:00
Jeffrey Morgan
58ce2d8273
better estimate scratch buffer size
2024-01-08 21:32:44 -05:00
Jeffrey Morgan
18ddf6d57d
fix windows build
2024-01-08 20:04:01 -05:00
Michael Yang
61e6502449
Merge pull request #1818 from jmorganca/mxyng/fix-alt-prompt
...
fix(cmd): history in alt prompt
2024-01-08 13:48:34 -08:00
Jeffrey Morgan
08f1e18965
Offload layers to GPU based on new model size estimates (#1850)
...
* select layers based on estimated model memory usage
* always account for scratch vram
* don't load +1 layers
* better estimation for graph alloc
* Update gpu/gpu_darwin.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* Update llm/llm.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* Update llm/llm.go
* add overhead for cuda memory
* Update llm/llm.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* fix build error on linux
* address comments
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-01-08 16:42:00 -05:00
Bruce MacDonald
7e8f7c8358
remove ggml automatic re-pull (#1856)
2024-01-08 14:41:01 -05:00
Bruce MacDonald
3f3eb19a3b
document response in modelfile template variables (#1428)
2024-01-08 14:38:51 -05:00
Daniel Hiltgen
059ae4585e
Merge pull request #1834 from dhiltgen/old_cuda
...
Detect very old CUDA GPUs and fall back to CPU
2024-01-07 10:39:49 -08:00
Daniel Hiltgen
6347f501ca
Merge pull request #1828 from dhiltgen/fix_llava
...
Accept windows paths for image processing
2024-01-07 09:05:46 -08:00
Jeffrey Morgan
5feec959ad
don't use -Wall in static build (#1833)
2024-01-07 10:39:19 -05:00
Jeffrey Morgan
dbdd50b283
add -DCMAKE_SYSTEM_NAME=Darwin cmake flag (#1832)
2024-01-07 00:46:17 -05:00
Daniel Hiltgen
d74ce6bd4f
Detect very old CUDA GPUs and fall back to CPU
...
If we try to load the CUDA library on an old GPU, it panics and crashes
the server. This checks the compute capability before we load the
library so we can gracefully fall back to CPU mode.
2024-01-06 21:40:29 -08:00
Guilherme Baptista
57942b4676
Update README.md - Community Integrations - Ollama for Ruby (#1830)
2024-01-06 22:31:39 -05:00
Daniel Hiltgen
e0d05b0f1e
Accept windows paths for image processing
...
This enhances our regex to support windows style paths. The regex will
match invalid path specifications, but we'll still validate file
existence and filter out mismatches
2024-01-06 10:50:27 -08:00
Daniel Hiltgen
2d9dd14f27
Merge pull request #1697 from dhiltgen/win_docs
...
Add windows native build instructions
2024-01-05 19:34:20 -08:00
Jeffrey Morgan
1caa56128f
add cuda lib path for nvidia container toolkit
2024-01-05 21:10:37 -05:00
Michael Yang
0101e76dbe
Merge pull request #1797 from sublimator/nd-allow-extension-origins-still-needs-explicit-listing-2024-01-05
...
fix: allow extension origins (still needs explicit listing), fixes #1686
2024-01-05 17:20:09 -08:00
Michael Yang
2ef9352b94
fix(cmd): history in alt mode
2024-01-05 16:20:02 -08:00
Michael Yang
5580ae2472
fix: set template without triple quotes
2024-01-05 15:51:33 -08:00
Bruce MacDonald
3a9f447141
only pull gguf model if already exists (#1817)
2024-01-05 18:50:00 -05:00
Patrick Devine
9c2941e61b
switch api for ShowRequest to use the name field (#1816)
2024-01-05 15:06:43 -08:00
Patrick Devine
238ac5e765
Add unit tests for Parser (#1815)
2024-01-05 14:04:31 -08:00
Bruce MacDonald
4f4980b66b
simplify ggml update logic (#1814)
...
- additional information is now available in show response, use this to pull gguf before running
- make gguf updates cancellable
2024-01-05 15:22:32 -05:00
Patrick Devine
22e93efa41
add show info command and fix the modelfile
2024-01-05 12:20:05 -08:00
Patrick Devine
2909dce894
split up interactive generation
2024-01-05 12:20:05 -08:00
Jeffrey Morgan
df32537312
gpu: read memory info from all cuda devices ( #1802 )
...
* gpu: read memory info from all cuda devices
* add `LOOKUP_SIZE` constant
* better constant name
* address comments
2024-01-05 11:25:58 -05:00
Bruce MacDonald
3367b5f3df
remove unused generate patches ( #1810 )
2024-01-05 11:25:45 -05:00
Matt Williams
46edbbc518
Merge pull request #1801 from jmorganca/mattw/correctdockerlink
2024-01-04 19:20:45 -08:00
Michael Yang
d2ff18cd6b
Merge pull request #1791 from jmorganca/mxyng/update-build
...
update Dockerfile.build
2024-01-04 19:13:44 -08:00
Matt Williams
df086d3c8c
fix docker doc to point to hub
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2024-01-04 18:42:23 -08:00
Nicholas Dudfield
8baaaa39c0
Allow extension origins (still needs explicit listing), fixes #1686
2024-01-05 09:06:47 +07:00
Michael Yang
f9961c70ae
update build
2024-01-04 17:34:38 -08:00
Daniel Hiltgen
cd8fad3398
Merge pull request #1790 from dhiltgen/llm_code_shuffle
...
Clean up stale submodule
2024-01-04 13:47:25 -08:00
Daniel Hiltgen
9983fa5f4e
Clean up stale submodule
...
If the tree has a stale submodule, make sure we clean it up first
2024-01-04 13:40:16 -08:00
Daniel Hiltgen
dfda91c2ee
Merge pull request #1788 from dhiltgen/llm_code_shuffle
...
Revamp code layout for the llm directory and llama.cpp submodule
2024-01-04 13:14:28 -08:00
Daniel Hiltgen
fac9060da5
Init submodule with new path
2024-01-04 13:00:13 -08:00
Daniel Hiltgen
a554616f8e
remove old llama.cpp submodule path
2024-01-04 12:12:21 -08:00
Daniel Hiltgen
77d96da94b
Code shuffle to clean up the llm dir
2024-01-04 12:12:05 -08:00
Brian Murray
0d6e3565ae
Add embeddings to API ( #1773 )
2024-01-04 15:00:52 -05:00
Daniel Hiltgen
b5939008a1
Merge pull request #1785 from dhiltgen/win_native_cli
...
Load dynamic cpu lib on windows
2024-01-04 08:55:01 -08:00
Daniel Hiltgen
e9ce91e9a6
Load dynamic cpu lib on windows
...
On linux, we link the CPU library in to the Go app and fall back to it
when no GPU match is found. On windows we do not link in the CPU library
so that we can better control our dependencies for the CLI. This fixes
the logic so we correctly fall back to the dynamic CPU library
on windows.
2024-01-04 08:41:41 -08:00
Bruce MacDonald
4ad6c9b11f
fix: pull either original model or from model on create ( #1774 )
2024-01-04 01:34:38 -05:00
Jeffrey Morgan
c0285158a9
tweak memory requirements error text
2024-01-03 19:47:18 -05:00
Jeffrey Morgan
77a66df72c
add macOS memory check for 47B models
2024-01-03 19:46:16 -05:00
Jeffrey Morgan
5b4837f881
remove unused filetype check
2024-01-03 19:45:39 -05:00
Jeffrey Morgan
29340c2e62
update cmake flags for amd64 macOS ( #1780 )
...
* update cmake flags for intel macOS
* remove `LLAMA_K_QUANTS`
* put back `CMAKE_OSX_DEPLOYMENT_TARGET` and disable `LLAMA_F16C`
2024-01-03 19:22:15 -05:00
Daniel Hiltgen
d5ec730354
Merge pull request #1779 from dhiltgen/refined_amd_gpu_list
...
Improve maintainability of Radeon card list
2024-01-03 16:18:57 -08:00
Daniel Hiltgen
8bed487aba
Merge pull request #1778 from dhiltgen/wsl1
...
Fail fast on WSL1 while allowing on WSL2
2024-01-03 16:18:41 -08:00
Daniel Hiltgen
c1a10a6e9b
Merge pull request #1781 from dhiltgen/cpu_only_build
...
Fix CPU only builds
2024-01-03 16:18:25 -08:00
Daniel Hiltgen
ddbfa6fe31
Fix CPU only builds
...
Go embed doesn't like when there are no matching files, so put
a dummy placeholder in to allow building without any GPU support.
If no "server" library is found, it's safely ignored at runtime.
2024-01-03 16:08:34 -08:00
Daniel Hiltgen
2fcd41ef81
Fail fast on WSL1 while allowing on WSL2
...
This prevents users from accidentally installing on WSL1 with instructions
guiding how to upgrade their WSL instance to version 2. Once running WSL2
if you have an NVIDIA card, you can follow their instructions to set up
GPU passthrough and run models on the GPU. This is not possible on WSL1.
2024-01-03 16:02:32 -08:00
Daniel Hiltgen
16f4603b67
Improve maintainability of Radeon card list
...
This moves the list of AMD GPUs to an easier to maintain list which
should make it easier to update over time.
2024-01-03 15:16:56 -08:00
Daniel Hiltgen
1184686649
Merge pull request #1776 from dhiltgen/render_group
...
Add ollama user to render group for Radeon support
2024-01-03 13:07:54 -08:00
Daniel Hiltgen
2588cb2daa
Add ollama user to render group for Radeon support
...
For the ROCm libraries to access the driver, we need to add the ollama user
to the render group.
2024-01-03 12:56:31 -08:00
Jeffrey Morgan
c7ea8f237e
set num_gpu to 1 only by default on darwin arm64 ( #1771 )
2024-01-03 14:10:29 -05:00
Bruce MacDonald
0b3118e0af
fix: relay request opts to loaded llm prediction ( #1761 )
2024-01-03 12:01:42 -05:00
Daniel Hiltgen
05face44ef
Merge pull request #1683 from dhiltgen/fix_windows_test
...
Fix windows system memory lookup
2024-01-03 09:00:39 -08:00
Daniel Hiltgen
a2ad952440
Fix windows system memory lookup
...
This refines the gpu package error handling and fixes a bug with the
system memory lookup on windows.
2024-01-03 08:50:01 -08:00
Daniel Hiltgen
5fea4410be
Merge pull request #1680 from dhiltgen/better_patching
...
Refactor how we augment llama.cpp and refine windows native build
2024-01-03 08:10:17 -08:00
Bruce MacDonald
b846eb64d0
Fix template api doc description ( #1661 )
2024-01-03 11:00:59 -05:00
Cole Gillespie
3c5dd9ed1d
Update README.md ( #1766 )
2024-01-03 10:44:22 -05:00
Jeffrey Morgan
b17ccd0542
Update import.md
2024-01-02 22:28:18 -05:00
Patrick Devine
d0409f772f
keyboard shortcut help ( #1764 )
2024-01-02 18:04:12 -08:00
Jeffrey Morgan
ec261422af
use docker build in build scripts
2024-01-02 19:32:54 -05:00
Daniel Hiltgen
0498f7ce56
Get rid of one-line llama.log
...
This one log line was triggering a single line llama.log to be generated
in the pwd of the server
2024-01-02 15:36:16 -08:00
Daniel Hiltgen
738a8d12eb
Rename the ollama cmakefile
2024-01-02 15:36:16 -08:00
Daniel Hiltgen
d966b730ac
Switch windows build to fully dynamic
...
Refactor where we store build outputs, and support a fully dynamic loading
model on windows so the base executable has no special dependencies and thus
doesn't require a special PATH.
2024-01-02 15:36:16 -08:00
Daniel Hiltgen
9a70aecccb
Refactor how we augment llama.cpp
...
This changes the model for llama.cpp inclusion so we're not applying a patch,
but instead have the C++ code directly in the ollama tree, which should make it
easier to refine and update over time.
2024-01-02 15:35:55 -08:00
Karim ElGhandour
22cd5eaab6
Added Ollama-SwiftUI to integrations ( #1747 )
2024-01-02 09:47:50 -05:00
Dane Madsen
304a8799ca
Update README.md ( #1757 )
2024-01-02 09:47:08 -05:00
Jeffrey Morgan
2a2fa3c329
api.md cleanup & formatting
2023-12-27 14:32:35 -05:00
Jeffrey Morgan
55978c1dc9
clean up cache api option
2023-12-27 14:27:45 -05:00
Jeffrey Morgan
d4ebdadbe7
enable cache_prompt by default
2023-12-27 14:23:42 -05:00
Daniel Hiltgen
e201efa14b
Add windows native build instructions
2023-12-25 08:31:34 -08:00
Icelain
c5f21f73a4
follow best practices by adding resp.Body.Close() ( #1708 )
2023-12-25 09:01:37 -05:00
Jeffrey Morgan
371bc73531
Update README.md
2023-12-24 11:54:08 -05:00
Jeffrey Morgan
c651d8b824
Update README.md
2023-12-23 11:18:12 -05:00
Daniel Hiltgen
cf50ef5b51
Merge pull request #1684 from dhiltgen/tag_integration_tests
...
Guard integration tests with a tag
2023-12-22 16:43:41 -08:00
Daniel Hiltgen
697bea6939
Guard integration tests with a tag
...
This should help CI avoid running the integration test logic in a
container where it's not currently possible.
2023-12-22 16:33:27 -08:00
K0IN
10da41d677
Add Cache flag to api ( #1642 )
2023-12-22 17:16:20 -05:00
Bruce MacDonald
db356c8519
post-response templating ( #1427 )
2023-12-22 17:07:05 -05:00
Jeffrey Morgan
b80081022f
cache docker builds in build_linux.sh
2023-12-22 16:01:20 -05:00
Matt Williams
790457398a
Merge pull request #1677 from jmorganca/mattw/docrunupdate
...
update where are models stored q
2023-12-22 09:56:27 -08:00
Matt Williams
511069a2a5
update where are models stored q
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-12-22 09:48:44 -08:00
Matt Williams
5a85070c22
Update readmes, requirements, packagejsons, etc for all examples ( #1452 )
...
Most of the examples needed updates of Readmes to show how to run them. Some of the requirements.txt files had extra content that wasn't needed, or were missing altogether. Apparently some folks like to run npm start
to run typescript, so a script was added to all typescript examples which
hadn't been done before.
Basically just a lot of cleanup.
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-12-22 09:10:41 -08:00
Matt Williams
291700c92d
Clean up documentation ( #1506 )
...
* Clean up documentation
Will probably need to update with PRs for new release.
Signed-off-by: Matt Williams <m@technovangelist.com>
* Correcting to fit in 0.1.15 changes
Signed-off-by: Matt Williams <m@technovangelist.com>
* Update README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* addressing comments
Signed-off-by: Matt Williams <m@technovangelist.com>
* more api cleanup
Signed-off-by: Matt Williams <m@technovangelist.com>
* it's llava not llama
Signed-off-by: Matt Williams <m@technovangelist.com>
* Update docs/troubleshooting.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Updated hosting to server and documented all env vars
Signed-off-by: Matt Williams <m@technovangelist.com>
* remove last of the cli descriptions
Signed-off-by: Matt Williams <m@technovangelist.com>
* Update README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* update further per conversation with jeff earlier today
Signed-off-by: Matt Williams <m@technovangelist.com>
* cleanup the doc readme
Signed-off-by: Matt Williams <m@technovangelist.com>
* move upgrade to faq
Signed-off-by: Matt Williams <m@technovangelist.com>
* first change
Signed-off-by: Matt Williams <m@technovangelist.com>
* updated
Signed-off-by: Matt Williams <m@technovangelist.com>
* Update docs/faq.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* examples in parent
Signed-off-by: Matt Williams <m@technovangelist.com>
* add example for create model.
Signed-off-by: Matt Williams <m@technovangelist.com>
* update faq
Signed-off-by: Matt Williams <m@technovangelist.com>
* update create model api
Signed-off-by: Matt Williams <m@technovangelist.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/faq.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/troubleshooting.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* update the readme in docs
Signed-off-by: Matt Williams <m@technovangelist.com>
* update a few more things
Signed-off-by: Matt Williams <m@technovangelist.com>
* Update docs/troubleshooting.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/faq.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/modelfile.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/troubleshooting.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
---------
Signed-off-by: Matt Williams <m@technovangelist.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-12-22 09:10:01 -08:00
Daniel Hiltgen
9db28af84e
Merge pull request #1675 from dhiltgen/less_verbose
...
Quiet down llama.cpp logging by default
2023-12-22 08:57:17 -08:00
Daniel Hiltgen
e5202eb687
Quiet down llama.cpp logging by default
...
By default builds will now produce non-debug and non-verbose binaries.
To enable verbose logs in llama.cpp and debug symbols in the
native code, set `CGO_CFLAGS=-g`
2023-12-22 08:47:18 -08:00
Daniel Hiltgen
96fb441abd
Merge pull request #1146 from dhiltgen/ext_server_cgo
...
Add cgo implementation for llama.cpp
2023-12-22 08:16:31 -08:00
Daniel Hiltgen
495c06e4a6
Fix doc glitch
2023-12-21 18:21:31 -08:00
Daniel Hiltgen
fa24e73b82
Remove CPU build, fixup linux build script
2023-12-21 18:21:31 -08:00
Daniel Hiltgen
325d74985b
Fix CPU performance on hyperthreaded systems
...
The default thread count logic was broken and resulted in twice as many
threads as it should have on a hyperthreading CPU,
resulting in thrashing and poor performance.
2023-12-21 16:23:36 -08:00
Bruce MacDonald
fabf2f3467
allow for starting llava queries with filepath ( #1549 )
2023-12-21 13:20:59 -05:00
Daniel Hiltgen
d9cd3d9667
Revive windows build
...
The windows native setup still needs some more work, but this gets it building
again and if you set the PATH properly, you can run the resulting exe on a cuda system.
2023-12-20 17:21:54 -08:00
Patrick Devine
a607d922f0
add FAQ for slow networking in WSL2 ( #1646 )
2023-12-20 16:27:24 -08:00
Daniel Hiltgen
7555ea44f8
Revamp the dynamic library shim
...
This switches the default llama.cpp to be CPU based, and builds the GPU variants
as dynamically loaded libraries which we can select at runtime.
This also bumps the ROCm library to version 6 given 5.7 builds don't work
on the latest ROCm library that just shipped.
2023-12-20 14:45:57 -08:00
Jeffrey Morgan
df06812494
Update api.md
2023-12-20 08:47:53 -05:00
Daniel Hiltgen
1d1eb1688c
Additional nvidia-ml path to check
2023-12-19 15:52:34 -08:00
Michael Yang
23dc179350
Merge pull request #1619 from jmorganca/mxyng/fix-version-test
...
fix(test): use real version string for comparison
2023-12-19 15:48:52 -08:00
Michael Yang
63aac0edc5
fix(test): use real version string for comparison
2023-12-19 15:03:02 -08:00
Daniel Hiltgen
6558f94ed0
Fix darwin intel build
2023-12-19 13:32:24 -08:00
Erick Ghaumez
1ca484f67e
Add Langchain Dart library ( #1564 )
...
* Add Langchain Dart
* Update README.md
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-19 14:04:52 -05:00
Jeffrey Morgan
72b0c32fe9
Update README.md
2023-12-19 12:59:22 -05:00
Jeffrey Morgan
68c28224f8
Update README.md
2023-12-19 12:59:03 -05:00
Daniel Hiltgen
54dbfa4c4a
Carry ggml-metal.metal as payload
2023-12-19 09:05:46 -08:00
Daniel Hiltgen
5646826a79
Add WSL2 path to nvidia-ml.so library
2023-12-19 09:05:46 -08:00
Daniel Hiltgen
3269535a4c
Refine handling of shim presence
...
This allows the CPU only builds to work on systems with Radeon cards
2023-12-19 09:05:46 -08:00
Daniel Hiltgen
1b991d0ba9
Refine build to support CPU only
...
If someone checks out the ollama repo and doesn't install the CUDA
library, this will ensure they can build a CPU only version
2023-12-19 09:05:46 -08:00
Daniel Hiltgen
51082535e1
Add automated test for multimodal
...
A simple test case that verifies llava:7b can read text in an image
2023-12-19 09:05:46 -08:00
Daniel Hiltgen
9adca7f711
Bump llama.cpp to b1662 and set n_parallel=1
2023-12-19 09:05:46 -08:00
Daniel Hiltgen
89bbaafa64
Build linux using ubuntu 20.04
...
This changes the container-based linux build to use an older Ubuntu
distro to improve our compatibility matrix for older user machines
2023-12-19 09:05:46 -08:00
Daniel Hiltgen
35934b2e05
Adapted rocm support to cgo based llama.cpp
2023-12-19 09:05:46 -08:00
65a
f8ef4439e9
Use build tags to generate accelerated binaries for CUDA and ROCm on Linux.
...
The build tags rocm or cuda must be specified to both go generate and go build.
ROCm builds should have both ROCM_PATH set (and the ROCM SDK present) as well
as CLBlast installed (for GGML) and CLBlast_DIR set in the environment to the
CLBlast cmake directory (likely /usr/lib/cmake/CLBlast). Build tags are also
used to switch VRAM detection between cuda and rocm implementations, using
added "accelerator_foo.go" files which contain architecture specific functions
and variables. accelerator_none is used when no tags are set, and a helper
function addRunner will ignore it if it is the chosen accelerator. Fix go
generate commands, thanks @deadmeu for testing.
2023-12-19 09:05:46 -08:00
Daniel Hiltgen
d4cd695759
Add cgo implementation for llama.cpp
...
Run the server.cpp directly inside the Go runtime via cgo
while retaining the LLM Go abstractions.
2023-12-19 09:05:46 -08:00
Bruce MacDonald
5e7fd6906f
Update images.go
2023-12-19 09:05:46 -08:00
Bruce MacDonald
811b1f03c8
deprecate ggml
...
- remove ggml runner
- automatically pull gguf models when ggml detected
- tell users to update to gguf in the case automatic pull fails
Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>
2023-12-19 09:05:46 -08:00
Matt Williams
ed195f3562
Merge pull request #1595 from pgibler/main
...
Added cmdh to community section in README
2023-12-18 20:55:18 -08:00
Matt Williams
e0d0072ef1
Merge pull request #1592 from jmorganca/mattw/examplepruning
...
Lets get rid of these old modelfile examples
2023-12-18 20:29:48 -08:00
pgibler
620a2ffcfb
Added cmdh to community section in README
2023-12-18 22:04:40 -05:00
Matt Williams
d287013f24
Lets get rid of these old modelfile examples
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-12-18 17:47:33 -08:00
Jeffrey Morgan
6b5bdfa6c9
update runner submodule
2023-12-18 17:33:46 -05:00
Jeffrey Morgan
c063ee4af0
update runner submodule to fix hipblas build
2023-12-18 15:41:13 -05:00
Bruce MacDonald
d99fa6ce0a
send empty messages on last chat response ( #1530 )
2023-12-18 14:23:38 -05:00
Patrick Devine
3948c6ea06
add magic header for unit tests ( #1558 )
2023-12-18 10:41:02 -08:00
Jeffrey Morgan
b85982eb91
update runner submodule
2023-12-18 12:43:31 -05:00
Patrick Devine
86b0dd4b16
add API create/copy handlers ( #1541 )
2023-12-15 11:59:18 -08:00
Augustinas Malinauskas
f728738427
README with Enchanted iOS App ( #1529 )
...
* feat(docs): README with Enchanted iOS app
* Update README.md
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-15 14:37:29 -05:00
Ian Purton
115048a0d8
Added Bionic GPT as a front end. ( #1463 )
...
* Added Bionic GPT as a front end.
* Update README.md
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-15 14:33:04 -05:00
Bruce MacDonald
1b417a7836
use exp slices for go 1.20 compatibility ( #1544 )
2023-12-15 14:15:56 -05:00
Patrick Devine
0174665d0e
add API tests for list handler ( #1535 )
2023-12-14 18:18:25 -08:00
Patrick Devine
630518f0d9
Add unit test of API routes ( #1528 )
2023-12-14 16:47:40 -08:00
Bruce MacDonald
6e16098a60
remove sample_count from docs ( #1527 )
...
this info has not been returned from these endpoints in some time
2023-12-14 17:49:00 -05:00
Bruce MacDonald
6ee8c80199
restore model load duration on generate response ( #1524 )
...
* restore model load duration on generate response
- set model load duration on generate and chat done response
- calculate createAt time when response created
* remove checkpoints predict opts
* Update routes.go
2023-12-14 12:15:50 -05:00
Jeffrey Morgan
31f0551dab
Update runner to support mixtral and mixture of experts (MoE) ( #1475 )
2023-12-13 17:15:10 -05:00
Jeffrey Morgan
4a1abfe4fa
fix tests
2023-12-13 14:42:30 -05:00
Jeffrey Morgan
bbd41494bf
add multimodal to README.md
2023-12-13 14:38:47 -05:00
Jeffrey Morgan
fedba24a63
Docs for multimodal support ( #1485 )
...
* add multimodal docs
* add chat api docs
* consistency between `/api/generate` and `/api/chat`
* simplify docs
2023-12-13 13:59:33 -05:00
pepperoni21
e3b090dbc5
Added message format for chat api ( #1488 )
2023-12-13 11:21:23 -05:00
Patrick Devine
d9e60f634b
add image support to the chat api ( #1490 )
2023-12-12 13:28:58 -08:00
Michael Yang
4251b342de
Merge pull request #1469 from jmorganca/mxyng/model-types
...
remove per-model types
2023-12-12 12:27:03 -08:00
Jeffrey Morgan
0a9d348023
Fix issues with /set template and /set system ( #1486 )
2023-12-12 14:43:19 -05:00
Bruce MacDonald
3144e2a439
exponential back-off ( #1484 )
2023-12-12 12:33:02 -05:00
Bruce MacDonald
c0960e29b5
retry on concurrent request failure ( #1483 )
...
- remove parallel
2023-12-12 12:14:35 -05:00
ruecat
5314fc9b63
Fix Readme "Database -> MindsDB" link ( #1479 )
2023-12-12 10:26:13 -05:00
Jorge Torres
a36b5fef3b
Update README.md ( #1412 )
2023-12-11 18:05:10 -05:00
Patrick Devine
910e9401d0
Multimodal support ( #1216 )
...
---------
Co-authored-by: Matt Apperson <mattapperson@Matts-MacBook-Pro.local>
2023-12-11 13:56:22 -08:00
Michael Yang
56ffc3023a
remove per-model types
...
mostly replaced by decoding tensors except ggml models which only
support llama
2023-12-11 09:40:21 -08:00
Bruce MacDonald
7a1b37ac64
os specific ctrl-z ( #1420 )
2023-12-11 10:48:14 -05:00
Jeffrey Morgan
5d4d2e2c60
update docs with chat completion api
2023-12-10 13:53:36 -05:00
Jeffrey Morgan
7db5bcf73b
fix go-staticcheck warning
2023-12-10 11:44:27 -05:00
Jeffrey Morgan
fa2f095bd9
fix model name returned by /api/generate being different than the model name provided
2023-12-10 11:42:15 -05:00
Jeffrey Morgan
045b855db9
fix error on accumulating final chat response
2023-12-10 11:24:39 -05:00
Jeffrey Morgan
32064a0646
fix empty response when receiving runner error
2023-12-10 10:53:38 -05:00
Jeffrey Morgan
d9a250e9b5
seek to end of file when decoding older model formats
2023-12-09 21:14:35 -05:00
Jeffrey Morgan
944519ed16
seek to eof for older model binaries
2023-12-09 20:48:57 -05:00
Jeffrey Morgan
2dd040d04c
do not use --parallel 2 for old runners
2023-12-09 20:17:33 -05:00
Bruce MacDonald
bbe41ce41a
fix: parallel queueing race condition caused silent failure ( #1445 )
...
* fix: queued request failures
- increase parallel requests to 2 to complete queued request, queueing is managed in ollama
* log stream errors
2023-12-09 14:14:02 -05:00
Jeffrey Morgan
9e1406e4ed
Don't expose model information in /api/generate
2023-12-09 02:05:43 -08:00
Jeffrey Morgan
b74580c913
Update api.md
2023-12-08 16:02:07 -08:00
Bruce MacDonald
7e9405fd07
fix: encode full previous prompt in context ( #1424 )
2023-12-08 16:53:51 -05:00
Bruce MacDonald
3b0b8930d4
fix: only flush template in chat when current role encountered ( #1426 )
2023-12-08 16:44:24 -05:00
Bruce MacDonald
e3f925fc1b
fix: restore modelfile system in prompt template ( #1425 )
2023-12-08 14:20:19 -05:00
Jeffrey Morgan
2a2289fb6b
Update api.md
2023-12-08 09:36:45 -08:00
Matt Williams
dd427f499a
Merge pull request #1419 from jmorganca/mattw/typescript-simplechat
...
Simple chat example for typescript
2023-12-07 14:42:24 -08:00
Michael Yang
2ae573c7ed
Merge pull request #1421 from jmorganca/mxyng/fix-newline
...
fix redundant newline
2023-12-07 13:47:23 -08:00
Matt Williams
02fe26c44b
update the readme as per bruce
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-12-07 13:46:30 -08:00
Michael Yang
16c7548460
fix redundant newline
2023-12-07 13:44:45 -08:00
Matt Williams
fa75998c0d
Update examples/typescript-simplechat/readme.md
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-07 13:40:54 -08:00
Matt Williams
5344f886c8
Update examples/typescript-simplechat/client.ts
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-07 13:40:37 -08:00
Matt Williams
6cc823c9b5
Update examples/typescript-simplechat/client.ts
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-07 13:39:59 -08:00
Matt Williams
b84d34e632
Update examples/typescript-simplechat/readme.md
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-07 13:39:33 -08:00
Matt Williams
30229a913c
Update examples/typescript-simplechat/client.ts
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-07 13:39:24 -08:00
Matt Williams
1ade380bd7
Simple chat example for typescript
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-12-07 11:48:25 -08:00
Jeffrey Morgan
ba264e9da8
add future version note to chat api docs
2023-12-07 09:42:15 -08:00
Matt Williams
a2405ec831
Merge pull request #1409 from jmorganca/mattw/python-simplechat
...
Simple chat example
2023-12-06 15:49:45 -08:00
Matt Williams
ce809bb529
Merge branch 'mattw/python-simplechat' of github.com:jmorganca/ollama into mattw/python-simplechat
2023-12-06 15:48:42 -08:00
Matt Williams
76bc4d0458
Cleanup as per Bruce
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-12-06 15:44:40 -08:00
Bruce MacDonald
4a02945a15
Update examples/python-simplechat/client.py
2023-12-06 18:36:45 -05:00
Matt Williams
aec742b6d2
Update examples/python-simplechat/readme.md
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-06 15:30:45 -08:00
Matt Williams
f337642e94
Update examples/python-simplechat/readme.md
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-06 15:30:35 -08:00
Matt Williams
51131cc6e2
Update examples/python-simplechat/client.py
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-06 15:30:10 -08:00
Matt Williams
43027789dc
Simple chat example
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-12-06 14:35:58 -08:00
Xe Iaso
f9b7d65e2b
docs/tutorials: add bit on how to use Fly GPUs on-demand with Ollama ( #1406 )
...
Signed-off-by: Xe Iaso <xe@camellia.finch-kitefin.ts.net>
2023-12-06 14:14:02 -08:00
Michael Yang
1f05d77110
Merge pull request #1244 from jmorganca/brucemacd/no-fail-template
...
do not fail on unsupported template variables
2023-12-06 13:23:04 -08:00
Michael Yang
c3ff36088b
Merge pull request #774 from jmorganca/mxyng/server-version
...
add version api and show server version in cli
2023-12-06 13:22:55 -08:00
Samuel Calderon
13524b5e72
List "Send chat messages" in table of contents ( #1399 )
...
Thank you @calderonsamuel
2023-12-06 12:34:27 -08:00
Michael Yang
f1b049fed8
Merge pull request #1377 from jmorganca/mxyng/qwen
...
update for qwen
2023-12-06 12:31:51 -08:00
Jeffrey Morgan
97c5696945
fix base urls in chat examples
2023-12-06 12:10:20 -08:00
Bruce MacDonald
47d4e22673
use missingkey in set empty interface when missing
2023-12-05 15:49:05 -08:00
Michael Yang
32f62fbb8e
Merge pull request #1334 from jmorganca/mxyng/load-projectors
...
load projectors
2023-12-05 14:40:53 -08:00
Michael Yang
5d75505ebd
return model configuration in generate
2023-12-05 14:39:02 -08:00
Michael Yang
b9495ea162
load projectors
2023-12-05 14:36:12 -08:00
Michael Yang
409bb9674e
Merge pull request #1308 from jmorganca/mxyng/split-from
...
split from into one or more models
2023-12-05 14:33:03 -08:00
Michael Yang
d3479c07a1
Merge pull request #1250 from jmorganca/mxyng/create-layer
...
refactor layer creation
2023-12-05 14:32:52 -08:00
Michael Yang
b12f1b984f
Merge pull request #1393 from jmorganca/mxyng/fix-whitespace
...
fix: trim space in modelfile fields
2023-12-05 12:18:01 -08:00
Bruce MacDonald
195e3d9dbd
chat api endpoint ( #1392 )
2023-12-05 14:57:33 -05:00
Michael Yang
38fe1a368b
fix: trim space in modelfile fields
2023-12-05 11:57:29 -08:00
Michael Yang
4b77fcb2b9
comments
2023-12-05 09:43:50 -08:00
Michael Yang
cde13bcdea
cmd: only print server version when different
2023-12-05 09:36:01 -08:00
Michael Yang
0f0cd265a7
cmd: add server version
2023-12-05 09:36:01 -08:00
Michael Yang
0db4706ec2
api: add version api handler
2023-12-05 09:36:01 -08:00
Michael Yang
1ebdbd9694
server: add version handler
2023-12-05 09:36:01 -08:00
Michael Yang
5c59455b59
cmd: use existing cmd context
2023-12-05 09:36:01 -08:00
Jeffrey Morgan
00d06619a1
Revert "chat api ( #991 )" while context variable is fixed
...
This reverts commit 7a0899d62d.
2023-12-04 21:16:27 -08:00
Matt Williams
f1ef3f9947
remove mention of gpt-neox in import ( #1381 )
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-12-04 20:58:10 -08:00
Michael Yang
5a5dca13b2
comments
2023-12-04 16:59:23 -08:00
Michael Yang
7232f1fa41
go mod tidy
2023-12-04 16:59:23 -08:00
Michael Yang
72e7a49aa9
seek instead of copyn
2023-12-04 16:59:23 -08:00
Michael Yang
a3737cbd33
use NewLayer for CreateBlobHandler
2023-12-04 16:59:23 -08:00
Michael Yang
998f1785b6
add modelfamilies
2023-12-04 16:59:23 -08:00
Michael Yang
70a93057cd
refactor layer creation
...
previous layer creation was not ideal because:
1. it required reading the input file multiple times, once to calculate
the sha256 checksum, another to write it to disk, and potentially one
more to decode the underlying gguf
2. used io.ReadSeeker which is prone to user error. if the file isn't
reset correctly or in the right place, it could end up reading an
empty file
there is also some brittleness when reading existing layers, else
writing the inherited layers will error reading an already closed file
this commit aims to fix these issues by restructuring layer creation.
1. it will now write the layer to a temporary file as well as the hash
function and move it to the final location on Commit
2. layers are read only once, when copied to the destination. the exception
is raw model files, which still require a second read to decode the
model metadata
2023-12-04 16:59:23 -08:00
Michael Yang
2cb0fa7d40
split from into one or more models
2023-12-04 16:59:23 -08:00
Michael Yang
b2816bca67
unnecessary ReadSeeker for DecodeGGML
2023-12-04 16:59:23 -08:00
Patrick Devine
bf704423c5
revert cli to use /api/generate ( #1383 )
2023-12-04 16:35:29 -08:00
Bruce MacDonald
7a0899d62d
chat api ( #991 )
...
- update chat docs
- add messages chat endpoint
- remove deprecated context and template generate parameters from docs
- context and template are still supported for the time being and will continue to work as expected
- add partial response to chat history
2023-12-04 18:01:06 -05:00
Michael Yang
0cca1486dd
Merge pull request #1376 from jmorganca/mxyng/rocky-install
...
install: fix rocky kernel packages
2023-12-04 14:23:43 -08:00
Patrick Devine
2113c9d31a
make linewrap still work when the terminal width has changed ( #1350 )
2023-12-04 14:14:56 -08:00
Michael Yang
6deebf2489
update for qwen
2023-12-04 11:38:05 -08:00
Michael Yang
95cb38ae47
install: fix rocky kernel packages
2023-12-04 11:10:42 -08:00
ruecat
1f126afb2d
Ollama Telegram Bot ( #1364 )
...
* Add "ollama-telegram" to Extensions & Plugins
* Update README.md
2023-12-03 11:19:55 -08:00
Jeffrey Morgan
f6201a7a6c
remove duplicate community integration in README.md
2023-12-02 21:18:13 -08:00
Michael Yang
b3f6c6598f
Merge pull request #1349 from jmorganca/mxyng/ctrl-z
...
handle ctrl+z
2023-12-01 16:21:49 -08:00
Michael Yang
88620e983a
handle ctrl+z
2023-12-01 16:15:20 -08:00
Michael Yang
cedae0d17a
Merge pull request #1347 from jshph/adapter-hash
...
Fix adapter loading from SHA hash
2023-12-01 11:08:25 -08:00
Joshua Pham
bb80a597db
Fix adapter loading from SHA hash
2023-12-01 13:50:55 -05:00
Patrick Devine
6681d37861
allow setting the system and template for prompts in the repl ( #1335 )
2023-12-01 09:28:35 -08:00
Michael Yang
0409c1fa59
docker: set PATH, LD_LIBRARY_PATH, and capabilities ( #1336 )
...
* docker: set PATH, LD_LIBRARY_PATH, and capabilities
* example: update k8s gpu manifest
2023-11-30 21:16:56 -08:00
Michael Yang
b56e92470a
Merge pull request #1229 from jmorganca/mxyng/calculate-as-you-go
...
revert checksum calculation to calculate-as-you-go
2023-11-30 10:54:38 -08:00
Jeffrey Morgan
5687f1a0cf
fix unexpected end of response errors when cancelling in ollama run
2023-11-30 00:30:21 -05:00
James Radtke
7eda3d0c55
Corrected transposed 129 to 192 for OLLAMA_ORIGINS example ( #1325 )
2023-11-29 22:44:17 -05:00
Bruce MacDonald
7194a07d4d
Add chatd to example projects
2023-11-29 21:18:21 -05:00
Michael Yang
13efd5f218
upload: fix PUT retry
2023-11-29 16:38:35 -08:00
Michael Yang
c4bdfffd96
upload: separate progress tracking
2023-11-29 16:38:33 -08:00
Michael Yang
26c63418e0
new hasher
2023-11-29 14:52:41 -08:00
Michael Yang
2799784ac8
revert checksum calculation to calculate-as-you-go
2023-11-29 13:47:58 -08:00
Alec Hammond
91897a606f
Add OllamaEmbeddings to python LangChain example ( #994 )
...
* Add OllamaEmbeddings to python LangChain example
* typo
---------
Co-authored-by: Alec Hammond <alechammond@fb.com >
2023-11-29 16:25:39 -05:00
Bruce MacDonald
96122b7271
validate model tags on copy ( #1323 )
2023-11-29 15:54:29 -05:00
jeremiahbuckley
39be7fdb98
fix rhel cuda install ( #1321 )
...
Co-authored-by: Cloud User <azureuser@testgpu2.hqzwom21okjenksna4y3c4ymjd.phxx.internal.cloudapp.net >
2023-11-29 14:55:15 -05:00
Timothy Jaeryang Baek
c2e3b89176
fix: disable ':' in tag names ( #1280 )
...
Co-authored-by: rootedbox
2023-11-29 13:33:45 -05:00
Patrick Devine
cde31cb220
Allow setting parameters in the REPL ( #1294 )
2023-11-29 09:56:42 -08:00
ToasterUwU
63097607b2
Correct MacOS Host port example ( #1301 )
2023-11-29 11:44:03 -05:00
Michael
2ae80e1e27
Update README.md
...
add new recent models as examples
2023-11-28 22:16:37 -05:00
Michael Yang
b173cfc558
Merge pull request #1195 from jmorganca/mxyng/fix-bar-rate
...
progress: fix bar rate
2023-11-28 11:55:23 -08:00
Michael Yang
424d53ac70
progress: fix bar rate
2023-11-28 11:44:56 -08:00
ftorto
e1a69d44c9
Update faq.md ( #1299 )
...
Fix a typo in the CA update command
2023-11-28 09:54:42 -05:00
Jason Jacobs
3d620f9462
ignore jetbrain ides ( #1287 )
2023-11-27 15:57:45 -05:00
Bruce MacDonald
928950fcc6
update python client create example ( #1227 )
...
* add remote create to python example client
2023-11-27 15:36:19 -05:00
Kasumi
39c6d949fc
Add Amica to community integrations ( #1281 )
2023-11-27 10:44:37 -05:00
Jeffrey Morgan
16a9006306
add back f16c instructions on intel mac
2023-11-26 15:59:49 -05:00
Jeffrey Morgan
e9216ea459
fix readline history on linux
2023-11-26 15:59:04 -05:00
Jeffrey Morgan
9e4a316405
update submodule commit
2023-11-26 14:52:00 -05:00
Jeffrey Morgan
9fb5e8399c
Fix issues with inputting and formatting multi line strings in ollama run
...
Co-authored-by: Wen Sun <iwendellsun@gmail.com >
2023-11-26 12:54:29 -05:00
Jing Zhang
82b9b329ff
windows CUDA support ( #1262 )
...
* Support cuda build in Windows
* Enable dynamic NumGPU allocation for Windows
2023-11-24 17:16:36 -05:00
Jongwook Choi
12e8c12d2b
Disable CUDA peer access as a workaround for multi-gpu inference bug ( #1261 )
...
When CUDA peer access is enabled, multi-gpu inference will produce
garbage output. This is a known bug in llama.cpp (or NVIDIA). Until the
upstream bug is fixed, we can disable CUDA peer access temporarily
to ensure correct output.
See #961.
2023-11-24 14:05:57 -05:00
Jeffrey Morgan
d77dde126b
consistent cpu instructions on macos and linux
2023-11-22 16:26:46 -05:00
Michael Yang
c7e70cd3bb
Merge pull request #1245 from jmorganca/mxyng/gguf-int
...
fix: gguf int type
2023-11-22 11:42:56 -08:00
Michael Yang
199941cd15
fix: gguf int type
2023-11-22 11:40:30 -08:00
Long Huynh
c9474f7f61
Update README.md - Community Integrations - Obsidian BMO Chatbot plugin ( #1239 )
2023-11-22 14:32:30 -05:00
Jeffrey Morgan
927e3ba4a4
tag image with correct version when building with build_docker script
2023-11-22 14:32:17 -05:00
Bruce MacDonald
37d95157df
fix relative path on create ( #1222 )
2023-11-21 15:43:17 -05:00
Jeffrey Morgan
2eaa95b417
Update api.md
2023-11-21 15:32:05 -05:00
Kevin Cao
3cd07728f4
Make alt+backspace delete word ( #1223 )
2023-11-21 12:26:47 -08:00
Michael Yang
ecf8b793f0
Merge pull request #1224 from jmorganca/mxyng/update
...
update llama.cpp
2023-11-21 12:21:59 -08:00
Matt Williams
abf294826b
Merge pull request #1221 from jmorganca/mattw/communityinstalls
...
add installation packages category to community
2023-11-21 12:12:23 -08:00
Steve Korshakov
ae06bb426b
add Llama Coder ( #1225 )
...
* add Llama Coder
* Update README.md
2023-11-21 14:08:19 -05:00
Matt Williams
d8e0f62ebb
Merge pull request #1159 from jmorganca/mattw/functioncalling
...
Example: Function Calling in Typescript
2023-11-21 10:06:55 -08:00
Michael Yang
a00fac4ec8
update llama.cpp
2023-11-21 09:50:02 -08:00
Jeffrey Morgan
f2113c1fc7
fix potential error in progress bar calculation
2023-11-21 12:48:20 -05:00
Jeffrey Morgan
6452e2ecb8
fix cases where progress bar would not be fixed size
2023-11-21 12:07:25 -05:00
Matt Williams
9a28e263a5
Update README.md
...
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2023-11-21 07:25:32 -08:00
Matt Williams
0c066c9214
Update README.md
...
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2023-11-21 07:25:26 -08:00
Jeffrey Morgan
aabd71aede
fix rendering and variable width issues on progress bar
2023-11-21 10:02:37 -05:00
Matt Williams
da4d7c9f9c
add installation packages category to community
...
Moved the Arch package. Someone has also added a PR for brew;
that entry needs to be updated to a link.
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-11-21 06:40:59 -08:00
Matt Williams
f321b13a03
Merge pull request #1178 from tusharhero/install-instructions-archlinux
...
Add Installation instructions for Archlinux
2023-11-21 06:33:22 -08:00
Matt Williams
5ebcde1541
Merge branch 'main' into install-instructions-archlinux
2023-11-21 06:32:50 -08:00
Matt Williams
45206cb7cc
Merge pull request #1218 from danemadsen/main
...
Update Maid repo
2023-11-21 06:30:33 -08:00
Matt Williams
6e65b84f54
Merge pull request #1219 from dustinblackman/main
...
docs: Add Oatmeal to terminal integrations
2023-11-21 06:28:12 -08:00
Dustin Blackman
c00ce12e83
docs: Add Oatmeal to terminal integrations
2023-11-21 06:47:43 -05:00
tusharhero
e1cd3152c9
Move Archlinux package to Community Integrations section.
2023-11-21 16:28:50 +05:30
Dane Madsen
0bef3778c9
Update README.md
2023-11-21 21:02:13 +11:00
Dane Madsen
6ebab38b89
Merge branch 'jmorganca:main' into main
2023-11-21 20:01:13 +10:00
Dane Madsen
5d8e864d44
Update Maid repo
2023-11-21 21:00:54 +11:00
Matt Williams
5f7acd0bbd
remove 'recent'
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-11-20 17:03:25 -08:00
Matt Williams
44b3a1ad42
Merge branch 'mattw/functioncalling' of github.com:jmorganca/ollama into mattw/functioncalling
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-11-20 17:01:41 -08:00
Matt Williams
0260be4414
remove 'recently'
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-11-20 16:57:07 -08:00
Jeffrey Morgan
a3fcecf943
only set main_gpu if value > 0 is provided
2023-11-20 19:54:04 -05:00
Jeffrey Morgan
df07e4a097
remove redundant filename parameter ( #1213 )
2023-11-20 17:05:36 -05:00
Michael Yang
0b7ade0d4c
Merge pull request #1212 from jmorganca/mxyng/metal
...
enable metal for fp32, q5_0, q5_1
2023-11-20 13:56:39 -08:00
Michael Yang
19b7a4d715
recent llama.cpp update added kernels for fp32, q5_0, and q5_1
2023-11-20 13:44:31 -08:00
Bruce MacDonald
31ab453d37
resolve FROM path before sending modelfile ( #1211 )
2023-11-20 16:43:48 -05:00
Jeffrey Morgan
35c4b5ec16
calculate hash separately from http request
2023-11-20 15:45:11 -05:00
James Braza
f24741ff39
Documenting how to view Modelfiles ( #723 )
...
* Documented viewing Modelfiles in ollama.ai/library
* Moved Modelfile in ollama.ai down per request
2023-11-20 15:24:29 -05:00
Jeffrey Morgan
8c4022b06b
fix initial progress stats
2023-11-20 14:33:46 -05:00
Jeffrey Morgan
433702f421
hide progress stats on completion
2023-11-20 14:22:39 -05:00
Matt Williams
48896f626c
Update examples/typescript-functioncalling/extractwp.ts
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-11-20 10:12:10 -08:00
Matt Williams
c57aee6fba
Update examples/typescript-functioncalling/readme.md
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-11-20 10:10:42 -08:00
Jeffrey Morgan
6066c70edd
restore progress messages for older endpoints
2023-11-20 11:37:17 -05:00
Jeffrey Morgan
f10ac5de19
restore stats updated every second to progress bar
2023-11-20 10:58:19 -05:00
Jeffrey Morgan
93a108214c
only show decimal points for smaller file size numbers
2023-11-20 10:58:19 -05:00
Purinda Gunasekara
be61a81758
main-gpu argument is not getting passed to llamacpp, fixed. ( #1192 )
2023-11-20 10:52:52 -05:00
Toni Soriano
2fdf1b5ff8
add laravel package to README.md ( #1208 )
...
Co-authored-by: Toni <cloudstudio@Tonis-Mac-mini.local >
2023-11-20 10:48:35 -05:00
Huy Le
331068b964
Adding ogpt.nvim into the list of plugins! ( #1190 )
...
* adding ollama.nvim for visibility
* adding an ogpt.nvim neovim plugin
2023-11-20 10:39:14 -05:00
Andy Brenneke
0179d8eb6b
Add Rivet to Community Integrations ( #1183 )
2023-11-20 10:36:47 -05:00
Eli Bendersky
be48741308
README: link to LangChainGo for talking to ollama, with an example ( #1206 )
2023-11-20 10:35:07 -05:00
Jeffrey Morgan
6bbd6e26fb
fix temporary newline created and removed with spinner in ollama run
2023-11-20 00:49:08 -05:00
Jeffrey Morgan
e6ad4813d3
don't crash when redirecting stderr
2023-11-19 23:50:45 -05:00
Jeffrey Morgan
13ba6df5ab
enable cpu instructions on intel macs
2023-11-19 23:20:26 -05:00
Jeffrey Morgan
9d73d3a6b5
add back part.Reset()
2023-11-19 14:32:19 -05:00
Jeffrey Morgan
72cd336410
don't retry on upload complete context cancel
2023-11-19 14:32:19 -05:00
Jeffrey Morgan
1bd594b2fa
revert to using one open file for blob uploads
2023-11-19 14:32:19 -05:00
Jeffrey Morgan
9a8c21ac3d
use exponential everywhere
2023-11-19 14:32:19 -05:00
Jeffrey Morgan
f6b317e8c9
fix sending too little data in chunk upload body
2023-11-19 14:32:19 -05:00
Jeffrey Morgan
ac5076ce1e
exponential backoff up to 30s
2023-11-19 14:32:19 -05:00
Michael Yang
42c2e3a624
upload: retry complete upload
2023-11-19 14:32:19 -05:00
Michael Yang
cb42589792
adjust download/upload parts
2023-11-19 14:32:19 -05:00
Jeffrey Morgan
258addc799
fix comment in progress.go
2023-11-19 13:46:19 -05:00
Jeffrey Morgan
c06b9b7304
update progress rendering to be closer to v0.1.10
2023-11-19 13:43:21 -05:00
Jeffrey Morgan
95b9acd324
improve pull percentage rendering
2023-11-19 11:00:43 -05:00
Jeffrey Morgan
04cbf5ccc0
progress bar styling improvements
2023-11-19 09:54:33 -05:00
Jeffrey Morgan
e1d7056496
update progress statuses
2023-11-19 09:21:13 -05:00
Jeffrey Morgan
02524a56ff
check retry for authorization error
2023-11-19 00:19:53 -05:00
Jeffrey Morgan
1657c6abc7
add note to specify JSON in the prompt when using JSON mode
2023-11-18 22:59:26 -05:00
Jeffrey Morgan
12e046f12a
remove unused function
2023-11-18 22:16:51 -05:00
Jeffrey Morgan
36a3bbf65f
Update llm/llama.go
2023-11-18 21:25:07 -05:00
Bruce MacDonald
43a726149d
fix potentially inaccurate error message
2023-11-18 21:25:07 -05:00
Jeffrey Morgan
984714f131
update status text when transferring blob on ollama create
2023-11-18 09:40:10 -05:00
Jeffrey Morgan
bab9494176
add - separator to temp file created on ollama create
2023-11-18 09:39:52 -05:00
Jeffrey Morgan
85e4441c6a
cache docker builds
2023-11-18 08:51:38 -05:00
Michael Yang
42e43736a4
Merge pull request #1186 from jmorganca/mxyng/copy-blob
...
fix cross device rename
2023-11-17 21:54:53 -08:00
Michael Yang
c6e6c8ee7e
fix cross device rename
2023-11-17 15:22:17 -08:00
Jeffrey Morgan
a185b29719
fix install script error on linux
2023-11-17 18:00:41 -05:00
Michael Yang
dc84b20d6b
Merge pull request #1104 from jmorganca/mxyng/jupyter
...
add jupyter notebook example
2023-11-17 14:46:26 -08:00
Michael Yang
ad8659b980
Merge pull request #1161 from jmorganca/mxyng/systemd-placeholder
...
placeholder environment variables
2023-11-17 14:45:38 -08:00
Michael Yang
c1bbf5ddee
Merge pull request #1134 from jmorganca/mxyng/progress
...
progress bar
2023-11-17 14:03:35 -08:00
Bruce MacDonald
0b19e24d81
only retry once on auth failure ( #1175 )
2023-11-17 14:22:35 -05:00
Michael Yang
3cb07d2773
simplify StopAndClear
2023-11-17 10:26:22 -08:00
Michael Yang
976068369b
stop all spinners on progress stop
2023-11-17 10:06:19 -08:00
Michael Yang
4d677ee389
no divide by zero
2023-11-17 10:06:19 -08:00
Michael Yang
7ea905871a
only move cursor up if pos > 0
2023-11-17 10:06:19 -08:00
Michael Yang
d6ecaa2cbf
update progress responses
2023-11-17 10:06:19 -08:00
Michael Yang
4dcf7a59b1
generate progress
2023-11-17 10:06:19 -08:00
Michael Yang
1c0e092ead
progress cmd
2023-11-17 10:06:19 -08:00
Michael Yang
c4a3ccd7ac
progress
2023-11-17 10:06:19 -08:00
Michael Yang
9f04e5a8ea
format bytes
2023-11-17 10:06:19 -08:00
Michael Yang
f91bb2f7f0
remove progressbar
2023-11-17 10:06:19 -08:00
Michael Yang
0813387414
Merge pull request #1177 from jmorganca/mxyng/faq
...
faq: fix heading and add more details
2023-11-17 10:05:21 -08:00
Michael Yang
4936b5bb37
add jupyter readme
2023-11-17 10:04:52 -08:00
tusharhero
786288829e
Make Archlinux a sub-heading of Linux.
2023-11-17 23:17:36 +05:30
tusharhero
72dcc952b6
Add Installation instructions for Archlinux
...
Pacman is the recommended installation method, and the package is in
the official repository, so it makes sense to mention it in the README.
2023-11-17 23:13:40 +05:30
Michael Yang
f7f6d6c693
Update examples/jupyter-notebook/ollama.ipynb
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-11-17 09:30:07 -08:00
Michael Yang
a3053b66d2
add jupyter notebook example
2023-11-17 09:30:07 -08:00
Michael Yang
c82ead4d01
faq: fix heading and add more details
2023-11-17 09:02:17 -08:00
Michael Yang
90860b6a7e
update faq ( #1176 )
2023-11-17 11:42:58 -05:00
Jeffrey Morgan
81092147c4
remove unnecessary -X POST from example curl commands
2023-11-17 09:50:38 -05:00
Jeffrey Morgan
92656a74b7
Use llama2 as the model in api.md
2023-11-17 07:17:51 -05:00
Jeffrey Morgan
41434a7cdc
build intel mac with correct binary and compile flags
2023-11-16 22:14:51 -05:00
Michael Yang
71687ab809
Merge pull request #1164 from jmorganca/mxyng/faq
...
update faq
2023-11-16 17:20:18 -08:00
Michael Yang
d8842b4d4b
update faq
2023-11-16 17:07:36 -08:00
Michael Yang
32add8577d
placeholder environment variables
2023-11-16 16:57:39 -08:00
Michael Yang
585f9c01fa
Merge pull request #1160 from jmorganca/mxyng/faq
...
update faq
2023-11-16 16:48:51 -08:00
Michael Yang
c13bde962d
Update docs/faq.md
...
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2023-11-16 16:48:38 -08:00
Michael Yang
ee307937fd
update faq
2023-11-16 16:46:43 -08:00
Matt Williams
ab6639bc47
Merge pull request #1074 from jmorganca/mattw/loganalysisexample
...
Log Analysis Example
2023-11-16 16:33:07 -08:00
Matt Williams
fefae84c06
example: function calling
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-11-16 16:26:29 -08:00
Jeffrey Morgan
dbe6e77472
Update README.md
2023-11-16 16:46:38 -05:00
Bruce MacDonald
4b3f4bc7d9
return failure details when unauthorized to push ( #1131 )
...
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com >
2023-11-16 16:44:18 -05:00
Michael Yang
a5ccf742c1
fix cross repo mounts
2023-11-16 16:33:30 -05:00
Michael Yang
e33ef391cd
fix push scope error for inherited model
2023-11-16 16:33:30 -05:00
yanndegat
75295b9528
install: fix enable contrib on debian 12 ( #1151 )
...
On debian 12, sources definitions have moved from
/etc/apt/sources.list to /etc/apt/sources.list.d/debian.sources
2023-11-16 15:53:06 -05:00
Matt Williams
db5ef3004c
Merge pull request #1079 from jmorganca/mattw/jsonexample
...
Add example using JSON format output
2023-11-16 09:13:34 -08:00
Michael Yang
b5f158f046
add faq for proxies ( #1147 )
2023-11-16 11:43:37 -05:00
Piero Savastano
30141b42e9
Add Cheshire Cat to community integrations ( #1124 )
2023-11-16 11:30:54 -05:00
Dane Madsen
5f301ece1d
Add Maid to Community Integrations ( #1120 )
2023-11-16 11:27:53 -05:00
Michael Yang
77954bea0e
Merge pull request #898 from jmorganca/mxyng/build-context
...
create remote models
2023-11-15 16:41:12 -08:00
Michael Yang
54f92f01cb
update docs
2023-11-15 15:28:15 -08:00
Michael
30ae6e731e
Update randomaddresses.py
2023-11-15 18:24:50 -05:00
Michael
b28a30f7ba
Update examples/python-json-datagenerator/predefinedschema.py
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-11-15 18:23:36 -05:00
Jeffrey Morgan
ecd71347ab
Update faq.md
2023-11-15 18:17:13 -05:00
Jeffrey Morgan
8ee4cbea0f
Remove table of contents in faq.md
2023-11-15 18:16:27 -05:00
Michael Yang
652d90e1c7
Update server/images.go
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-11-15 15:16:23 -08:00
Michael Yang
bc22d5a38b
no blob response
2023-11-15 15:16:23 -08:00
Michael Yang
71d71d0988
update docs
2023-11-15 15:16:23 -08:00
Michael Yang
1901044b07
use checksum reference
2023-11-15 15:16:23 -08:00
Michael Yang
d660eebf22
fix create from model tag
2023-11-15 15:16:23 -08:00
Michael Yang
cac11c9137
update api docs
2023-11-15 15:16:23 -08:00
Michael Yang
a07c935d34
ignore non blobs
2023-11-15 15:16:23 -08:00
Michael Yang
1552cee59f
client create modelfile
2023-11-15 15:16:23 -08:00
Michael Yang
3ca56b5ada
add create modelfile field
2023-11-15 15:16:23 -08:00
Michael Yang
b0d14ed51c
refactor create model
2023-11-15 15:16:23 -08:00
Matt Williams
f61f340279
FAQ: answer a few faq questions ( #1128 )
...
* faq: does ollama share my prompts
Signed-off-by: Matt Williams <m@technovangelist.com >
* faq: ollama and openai
Signed-off-by: Matt Williams <m@technovangelist.com >
* faq: vscode plugins
Signed-off-by: Matt Williams <m@technovangelist.com >
* faq: send a doc to Ollama
Signed-off-by: Matt Williams <m@technovangelist.com >
* extra spacing
Signed-off-by: Matt Williams <m@technovangelist.com >
* Update faq.md
* Update faq.md
---------
Signed-off-by: Matt Williams <m@technovangelist.com >
Co-authored-by: Michael <mchiang0610@users.noreply.github.com >
2023-11-15 18:05:13 -05:00
Michael Yang
686f85d6ca
Merge pull request #1132 from jmorganca/mxyng/human-bytes
...
replace go-humanize with format.HumanBytes
2023-11-15 09:46:21 -08:00
bnodnarb
85951d25ef
Created tutorial for running Ollama on NVIDIA Jetson devices ( #1098 )
2023-11-15 12:32:37 -05:00
Dane Madsen
779e196ef6
Merge branch 'jmorganca:main' into main
2023-11-15 21:38:07 +10:00
Michael Yang
01ea6002c4
replace go-humanize with format.HumanBytes
2023-11-14 14:57:41 -08:00
Jeffrey Morgan
423862042a
treat ollama run model < file as entire prompt, not prompt-per-line ( #1126 )
...
Previously, `ollama run` treated a non-terminal stdin (such as `ollama run model < file`) as containing one prompt per line. To run inference on a multi-line prompt, the only non-API workaround was to run `ollama run` interactively and wrap the prompt in `"""..."""`.
Now, `ollama run` treats a non-terminal stdin as containing a single prompt. For example, if `myprompt.txt` is a multi-line file, then `ollama run model < myprompt.txt` would treat `myprompt.txt`'s entire contents as the prompt.
Co-authored-by: Quinn Slack <quinn@slack.org >
2023-11-14 16:42:21 -05:00
Bruce MacDonald
df18486c35
Move /generate format to optional parameters ( #1127 )
...
This field is optional and should be under the `Advanced parameters` header
2023-11-14 16:12:30 -05:00
Jeffrey Morgan
4e612a2e92
use stdout fd for terminal size ( #1125 )
2023-11-14 16:09:09 -05:00
Matt Williams
47ffb81db7
Update examples/python-json-datagenerator/readme.md
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-11-14 10:33:34 -08:00
Matt Williams
69795d2db0
Update examples/python-json-datagenerator/readme.md
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-11-14 10:33:16 -08:00
Matt Williams
acde0819d9
Update examples/python-json-datagenerator/randomaddresses.py
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-11-14 10:33:02 -08:00
Matt Williams
f748331aa3
Update examples/python-json-datagenerator/predefinedschema.py
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-11-14 10:32:45 -08:00
Matt Williams
f4edc302a8
Update examples/python-loganalysis/readme.md
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-11-14 10:31:22 -08:00
Matt Williams
64b7e0c218
Update examples/python-loganalysis/loganalysis.py
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-11-14 10:31:05 -08:00
Matt Williams
eced0d52ab
Update examples/python-loganalysis/loganalysis.py
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-11-14 10:30:30 -08:00
Matt Williams
96bf9cafa7
Update examples/python-loganalysis/loganalysis.py
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-11-14 10:30:17 -08:00
Jeffrey Morgan
6e0f686afa
--format json should work in interactive mode
2023-11-14 10:22:03 -05:00
Dane Madsen
c1a5220860
Update README.md
2023-11-14 15:31:31 +10:00
Dane Madsen
3b15175a70
Add maid to community integrations
2023-11-14 15:30:03 +10:00
Jeffrey Morgan
c1844bbee2
add json mode to cli ( #1095 )
2023-11-13 21:54:02 -05:00
Huy Le
cb745965ce
adding ollama.nvim for visibility ( #1115 )
2023-11-13 17:00:17 -05:00
Enrico Ros
8d29b6a2b6
New big-AGI integration ( #1078 )
...
* New big-AGI integration
Ollama works great in big-AGI, and this document explains how to link the two projects.
* Update README.md
2023-11-13 16:59:00 -05:00
Ilya Breitburg
724aa64bee
Add Dart library to README.md ( #1106 )
2023-11-13 14:50:42 -05:00
Michael Yang
d91c103e74
Merge pull request #1055 from dansreis/946-fix-incorrect-base-model-name
...
Fixed incorrect base model name
2023-11-13 08:42:55 -08:00
Kevin Hermawan
98ec7d81e3
Add OllamaKit to the community integrations ( #1085 )
2023-11-11 14:41:42 -08:00
Matt Williams
b6817a83d8
Add gif and finish readme
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-11-10 16:41:48 -06:00
Matt Williams
73f3448ede
add example showing use of JSON format
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-11-10 16:33:56 -06:00
Daniel Reis
7c438f2c53
Replaced method
2023-11-10 20:22:03 +00:00
Daniel Reis
6e46338d44
Reverting previous changes
2023-11-10 20:21:35 +00:00
Jeffrey Morgan
cdddd3df65
add format to example python client
2023-11-10 10:22:21 -08:00
Daniel Hiltgen
afa61bdf45
Merge pull request #1075 from jmorganca/dhiltgen/unexpected-eof
...
Resume chunk download on UnexpectedEOF errors
2023-11-10 08:48:27 -08:00
Daniel Hiltgen
cc54a416c6
Resume chunk download on UnexpectedEOF errors
...
If the chunk download is interrupted, resume from where we left off
2023-11-10 08:29:42 -08:00
Matt Williams
c819d7f68a
Merge pull request #955 from jmorganca/mattw/example-bash-compare
...
docs: add examples using bash to compare models
2023-11-10 08:59:32 -06:00
Matt Williams
e4f59ba073
better streaming plus gif
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-11-10 08:55:17 -06:00
Matt Williams
5de568bffe
Add a simple log analysis example
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-11-10 08:28:52 -06:00
Jeffrey Morgan
5cba29b9d6
JSON mode: add `"format"` as an api parameter ( #1051 )
...
* add `"format": "json"` as an API parameter
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-11-09 16:44:02 -08:00
Daniel Reis
d17730356a
Removed inline parse model path
2023-11-09 22:44:26 +00:00
Daniel Reis
32d79a6eea
Using 'GetShortTagname' method instead
2023-11-09 22:40:37 +00:00
Bruce MacDonald
5b39503bcd
document specifying multiple stop params ( #1061 )
2023-11-09 13:16:26 -08:00
Bruce MacDonald
1ae84bc2a2
skip gpu if less than 2GB VRAM are available ( #1059 )
2023-11-09 13:16:16 -08:00
Bruce MacDonald
db8bf336fc
Update README.md
2023-11-09 12:53:24 -08:00
Nick Anderson
d77e094a90
Added gptel to list of integrations ( #1062 )
2023-11-09 12:52:36 -08:00
Matt Williams
dd3dc47ddb
Merge pull request #992 from aashish2057/aashish2057/langchainjs_doc_update
2023-11-09 05:08:31 -08:00
Michael Yang
c5e1bbabda
instead of static number of parameters for each model family, get the real number from the tensors ( #1022 )
...
* parse tensor info
* refactor decoder
* return actual parameter count
* explicit rounding
* s/Human/HumanNumber/
2023-11-08 17:55:46 -08:00
Bruce MacDonald
a49d6acc1e
add a complete /generate options example ( #1035 )
2023-11-08 16:44:36 -08:00
Moritz Poldrack
6e9bcdb9b3
progressbar: make start and end seamless ( #1042 )
2023-11-08 16:42:40 -08:00
Matt Williams
13086363bd
Update as per bmacd
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-11-08 18:09:05 -06:00
Bruce MacDonald
ec2a31e9b3
support raw generation requests ( #952 )
...
- add the optional `raw` generate request parameter to bypass prompt formatting and response context
- add raw request to docs
2023-11-08 14:05:02 -08:00
Amith Koujalgi
ec84c02d54
Add Ollama4j Java library to the list of community libraries ( #1044 )
2023-11-08 11:04:32 -08:00
Kevin Hermawan
2a88b66bc9
Add Ollamac to community integrations ( #1043 )
2023-11-08 11:01:09 -08:00
Jeffrey Morgan
2d0faea96c
clean up README.md
2023-11-08 00:03:29 -08:00
Jeffrey Morgan
637142181a
clean up README.md
2023-11-07 23:52:31 -08:00
Matt Williams
bcbff421c9
Merge pull request #1023 from jmorganca/mattw/wherearemodelsfaq
2023-11-07 17:59:54 -08:00
thealhu
1359d6cf3b
Fix sudo variable in install.sh ( #1034 )
...
One occurrence of sudo had not been replaced with the sudo variable.
2023-11-07 09:59:57 -08:00
Omar Magdy
6e2d0224d9
Added logseq ollama plugin ( #1029 )
2023-11-07 09:58:13 -08:00
Ikko Eltociear Ashimine
921406f721
Update client.py ( #1026 )
...
recieve -> receive
2023-11-07 09:55:47 -08:00
Michael Yang
c7047d7353
Merge pull request #959 from jmorganca/mxyng/example-k8s
2023-11-07 10:43:21 -06:00
Matt Williams
1d155caba3
docs: clarify where the models are stored in the faq
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-11-06 14:38:49 -08:00
Michael Yang
866324b9a5
Merge pull request #943 from tjbck/patch-1
...
doc: categorised community integrations + added ollama-webui
2023-11-06 11:35:39 -08:00
Michael Yang
145e060855
Apply suggestions from code review
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-11-06 11:32:23 -08:00
Michael Yang
146072113d
Merge pull request #993 from jmorganca/mxyng/cleanup
...
cleanup upload and download errors
2023-11-06 11:32:12 -08:00
Timothy Jaeryang Baek
33d31d1b56
Merge branch 'main' into patch-1
2023-11-06 14:27:02 -05:00
Dr. David A. Kunz
274c6cbf4c
Added gen.nvim to community integrations ( #996 )
2023-11-06 10:51:41 -08:00
Elton Renda
7ebbd89bbf
add hass-ollama-conversation ( #999 )
2023-11-06 10:50:35 -08:00
Lars Grammel
9079b1bb6d
Add ModelFusion community integration ( #1020 )
2023-11-06 10:46:16 -08:00
Timothy Jaeryang Baek
6febde7200
Merge branch 'main' into patch-1
2023-11-04 19:12:18 -05:00
pepperoni21
325cfcd9ff
Added ollama-rs to community integrations ( #995 )
...
Co-authored-by: pepperoni21 <pepperoni2100@gmail.com >
2023-11-04 14:51:29 -07:00
Jeffrey Morgan
639d0fd070
Update README.md
2023-11-04 12:24:24 -07:00
Jeffrey Morgan
e21579a0f1
Restore system prompt on requests
2023-11-03 17:26:45 -07:00
Jeffrey Morgan
c44b619428
remove unused fmt.Println
2023-11-03 17:24:58 -07:00
Michael Yang
434a6f9d46
return last error
2023-11-03 16:49:51 -07:00
aashish2057
b13586cc72
update langchainjs doc
2023-11-03 18:45:19 -05:00
Jeffrey Morgan
17678b7225
Restore system prompt on requests and default num_keep to 0
2023-11-03 13:25:25 -07:00
Michael Yang
84725ec7e3
refactor part reset
2023-11-03 09:20:32 -07:00
Bruce MacDonald
6109bebba6
reformat api docs for more examples ( #972 )
2023-11-03 10:57:00 -04:00
Noah Gitsham
8ae8c9fa8c
Remove duplicate "install" in GPU support warning ( #984 )
2023-11-03 00:45:14 -07:00
Noah Gitsham
f39daff461
Add missing "be" to GPU support warning message ( #983 )
2023-11-02 18:37:12 -07:00
Jeffrey Morgan
c50b01bc21
check request.Context for initial system prompt
2023-11-02 18:17:00 -07:00
Bruce MacDonald
b9dc875401
remove modelfile context deprecated in v0.0.7 ( #974 )
2023-11-02 20:52:56 -04:00
Jeffrey Morgan
06589a3b30
Set NumKeep to 4 by default ( #982 )
2023-11-02 17:26:11 -07:00
Michael Yang
1fd511e661
Merge pull request #975 from jmorganca/mxyng/downloads
...
update downloads to use retry wrapper
2023-11-02 16:12:48 -07:00
Michael Yang
c01bbe94fd
Merge pull request #979 from jmorganca/mxyng/num-keep
...
update default NumKeep
2023-11-02 15:48:44 -07:00
Jeffrey Morgan
1beb5645a9
only use system prompt if context is not provided ( #978 )
2023-11-02 15:48:02 -07:00
Michael Yang
6db3691b8f
update default NumKeep
2023-11-02 15:47:35 -07:00
Michael Yang
fe5a872444
fix upload
2023-11-02 13:25:58 -07:00
Michael Yang
d39709260f
download with retry
2023-11-02 13:16:11 -07:00
Michael Yang
60bb3c03a1
use http.Method
2023-11-02 13:12:45 -07:00
Jeffrey Morgan
2e53704685
default rope params to 0 for new models ( #968 )
2023-11-02 08:41:30 -07:00
Michael Yang
527f9a7975
Merge pull request #966 from jmorganca/mxyng/fix-log
2023-11-01 17:49:10 -07:00
Michael Yang
c4cc738cbf
fix log
2023-11-01 17:18:11 -07:00
Michael Yang
2c6189f4fe
Merge pull request #750 from jmorganca/mxyng/concurrent-uploads
...
concurrent uploads
2023-11-01 15:00:01 -07:00
Michael Yang
b99c291f47
fly example
2023-11-01 14:58:20 -07:00
Michael Yang
dccac8c8fa
k8s example
2023-11-01 14:52:58 -07:00
Michael Yang
c05ab9a86e
Merge pull request #965 from jmorganca/mxyng/go-mod-tidy
...
go mod tidy
2023-11-01 11:55:43 -07:00
Michael Yang
f42f3d9b27
go fmt
2023-11-01 11:55:08 -07:00
Michael Yang
341fb7e35f
go mod tidy
2023-11-01 11:54:25 -07:00
Michael
f31961637f
Update README.md
2023-11-01 12:20:55 -04:00
Michael Yang
ec3614812a
Merge pull request #960 from jmorganca/mxyng/fix-tautology
2023-11-01 08:30:49 -07:00
Michael Yang
f14969314a
Merge pull request #958 from jmorganca/mxyng/append-ld-library-path
2023-11-01 08:30:38 -07:00
Bruce MacDonald
1fb9288661
notify that the ollama api is available after linux install ( #954 )
2023-11-01 11:28:26 -04:00
Matt Williams
01a03caa20
Merge pull request #956 from jmorganca/mattw/apidocupdate
2023-10-31 21:43:11 -07:00
Michael Yang
bf6786bb39
fix tautology
2023-10-31 20:49:48 -07:00
Michael Yang
642128b75a
append LD_LIBRARY_PATH
2023-10-31 15:54:49 -07:00
Matt Williams
f21bd6210d
docs: clarify and clean up API docs
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-10-31 13:11:33 -07:00
Matt Williams
80362fedce
better readme
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-10-31 12:40:46 -07:00
Matt Williams
5757925060
add a gif
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-10-31 11:52:01 -07:00
Michael
4512301756
Update README.md
2023-10-31 13:25:36 -04:00
Matt Williams
2236a93efc
docs: add examples using bash to compare models
...
Signed-off-by: Matt Williams <m@technovangelist.com >
2023-10-31 09:12:39 -07:00
Matt Williams
ad88799411
Merge pull request #949 from jmorganca/matt/fixPrivateGPT
...
fix: private gpt example was broken due to changes in chroma
2023-10-30 17:17:00 -07:00
Bruce MacDonald
0818b5e318
readline windows terminal support ( #950 )
...
- update the readline package to have basic support on windows; this is not yet full feature parity with the unix cli
2023-10-30 16:18:12 -04:00
Matt Williams
1df6100c77
Update examples/langchain-python-rag-privategpt/privateGPT.py
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com >
2023-10-30 12:48:17 -07:00
Matt Williams
5c48fe1fb0
Update examples/langchain-python-rag-privategpt/constants.py
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-10-30 12:47:56 -07:00
Dirk Loss
874bb31986
Fix conversion command for gptneox (#948)
2023-10-30 14:34:29 -04:00
Matt Williams
f7856a57eb
fix: private gpt example was broken due to changes in chroma
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-30 10:56:25 -07:00
Bruce MacDonald
f9a4281124
clean up: remove server functions from client (#937)
2023-10-30 11:10:18 -04:00
Timothy Jaeryang Baek
96da0792e6
doc: OllamaSharp for .NET moved to libraries
2023-10-28 16:18:38 -05:00
Timothy Jaeryang Baek
95d24262fc
doc: categorised community integrations + added web-ui
2023-10-28 16:02:13 -05:00
Jeffrey Morgan
8d03bd7b54
remove +build directive in term.go
2023-10-28 09:56:03 -07:00
Jeffrey Morgan
9ec16f0f03
fix formatting when exiting ollama run
2023-10-27 21:26:23 -07:00
Jeffrey Morgan
57a58db1b0
history: update pos after compact
2023-10-27 20:38:03 -07:00
Jeffrey Morgan
2d75a4537c
close input channel when receiving io.EOF
2023-10-27 20:26:04 -07:00
Jeffrey Morgan
4748609611
Don't quit ioloop on NUL character (#940)
...
* don't quit ioloop on 0 rune
* check for closed channel
* remove unused error on `Close()`
2023-10-27 20:01:48 -07:00
Jeffrey Morgan
c0dcea1398
Update faq.md
2023-10-27 18:29:00 -07:00
Michael Yang
115fc56eb7
calculate and verify md5 checksum
2023-10-27 17:07:33 -07:00
Michael Yang
186f685224
retry PUT
2023-10-27 17:07:33 -07:00
Michael Yang
12efcbb057
comments
2023-10-27 17:07:33 -07:00
Michael Yang
4e09aab8b9
concurrent uploads
2023-10-27 17:07:33 -07:00
Jeffrey Morgan
3a1ed9ff70
restore building runner with AVX on by default (#900)
2023-10-27 12:13:44 -07:00
Bruce MacDonald
6d283882b1
catch insufficient permissions nvidia err (#934)
2023-10-27 12:42:40 -04:00
Bruce MacDonald
5c3491f425
allow for a configurable ollama model storage directory (#897)
...
* allow for a configurable ollama models directory
- set OLLAMA_MODELS in the environment that ollama is running in to change where model files are stored
- update docs
Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>
Co-Authored-By: Jay Nakrani <dhananjaynakrani@gmail.com>
Co-Authored-By: Akhil Acharya <akhilcacharya@gmail.com>
Co-Authored-By: Sasha Devol <sasha.devol@protonmail.com>
2023-10-27 10:19:59 -04:00
James Braza
e5d1ce4dde
Tweaks to README.md (#906)
...
* Mentioned Docker Hub in docs
* Consolidated brew installs to one line
2023-10-27 00:10:23 -07:00
Bruce MacDonald
2665f3c28e
offload 75% of available vram to improve stability (#921)
2023-10-26 20:49:55 -04:00
Patrick Devine
a79f030e75
add bracketed paste mode (#922)
2023-10-26 15:57:00 -07:00
Michael Yang
9bc5864a03
Merge pull request #918 from jmorganca/mxyng/fix-out-of-space
...
fix(download): no retry when out of space
2023-10-26 12:24:20 -07:00
Michael Yang
b88cc0fac9
Merge pull request #916 from jmorganca/mxyng/fix-client-host
...
fix(client): trim trailing slash
2023-10-26 12:24:12 -07:00
Patrick Devine
5b2cf16397
fix docker build annotations (#917)
2023-10-26 12:00:33 -07:00
Michael Yang
910816a532
fix(download): no retry when out of space
2023-10-26 11:34:07 -07:00
Michael Yang
28c3f288e2
client: fix trailing slash
2023-10-26 11:09:38 -07:00
Patrick Devine
deeac961bb
new readline library (#847)
2023-10-25 16:41:18 -07:00
Jeffrey Morgan
49443e7da5
fix typo in README.md
2023-10-25 16:19:27 -07:00
Ajay Kemparaj
bb8464c0d2
update golang.org/x/net fixes CVE-2023-3978, CVE-2023-39325, CVE-2023-44487 (#855)
2023-10-25 16:17:24 -07:00
Michael Yang
daa5bb4473
Merge pull request #907 from jmorganca/mxyng/linux
...
update linux.md
2023-10-25 15:03:34 -07:00
Michael Yang
92119de9d8
update linux.md
2023-10-25 14:57:50 -07:00
Michael Yang
53b0ba8d43
Merge pull request #893 from jmorganca/mxyng/update-faq
...
update faq
2023-10-24 16:02:35 -07:00
Michael Yang
db342691f9
Update docs/faq.md
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-10-24 13:59:33 -07:00
Bruce MacDonald
cecf83141e
Linux uninstall instructions (#894)
2023-10-24 14:07:05 -04:00
Michael Yang
a5a2adf1ec
update faq
2023-10-24 10:54:16 -07:00
Jeffrey Morgan
b0c9cd0f3b
fix metal assertion errors
2023-10-24 00:32:36 -07:00
Jeffrey Morgan
77f61c6301
update submodule commit
2023-10-24 00:30:27 -07:00
Jeffrey Morgan
f3604534e5
update submodule commit
2023-10-23 23:59:12 -07:00
Jeffrey Morgan
914428351a
Update import.md
2023-10-23 17:44:53 -07:00
Jeffrey Morgan
9afea9e3b9
Update import.md
...
Separate GGUF and PyTorch guides
2023-10-23 17:42:17 -07:00
Bruce MacDonald
c039432b5c
add current user to ollama group on install (#772)
2023-10-23 17:06:31 -04:00
Michael Yang
c345b4ca7c
Merge pull request #884 from jmorganca/mxyng/update-submodules
...
bump submodules
2023-10-23 11:27:38 -07:00
Michael Yang
0c7a00a264
bump submodules
...
pin to 9e70cc03229df19ca2d28ce23cc817198f897278 for now since
438c2ca83045a00ef244093d27e9ed41a8cb4ea9 is breaking
2023-10-23 11:17:59 -07:00
Michael Yang
36c160f1c3
Merge pull request #881 from jmorganca/mxyng/ggufv3
...
ggufv3
2023-10-23 10:50:45 -07:00
Michael Yang
b66bcaa582
Merge pull request #883 from jmorganca/mxyng/logs
...
update default log target
2023-10-23 10:50:29 -07:00
Michael Yang
c9167494cb
update default log target
2023-10-23 10:44:50 -07:00
Michael Yang
125d0a013a
ggufv3
...
ggufv3 adds support for big endianness, mainly for s390x architecture.
while that's not currently supported for ollama, the change is simple.
loosen version check to be more forward compatible. unless specified,
gguf versions other than v1 will be decoded into v2.
2023-10-23 09:35:49 -07:00
Richard Awoyemi
ba2da6ceaa
Added a minimalist React UI for Ollama models to the community contributions.md (#870)
2023-10-23 10:44:39 -04:00
Jeffrey Morgan
ccff9ca09c
Update README.md
2023-10-21 11:58:10 -04:00
Jeffrey Morgan
436a5be49c
Update README.md
2023-10-21 11:57:32 -04:00
Matt Williams
cc0bf96398
Merge pull request #829 from jmorganca/mattw/example-summarize-news
...
added python rag news summary
2023-10-20 21:03:16 -07:00
Michael Yang
386169205c
update runtime options (#864)
2023-10-20 21:17:14 -04:00
Michael Yang
0d6342a882
Merge pull request #863 from jmorganca/mxyng/nil-pointer
...
fix: nil pointer dereference
2023-10-20 17:23:37 -07:00
Michael Yang
75bee074b6
fix: nil pointer dereference
2023-10-20 16:55:24 -07:00
Michael Yang
533d76368c
Merge pull request #859 from jmorganca/mxyng/fix-hostname
...
fix: ollama host for hostname
2023-10-20 11:40:56 -07:00
Michael Yang
459f4a7889
fix: ollama host for hostname
2023-10-20 11:32:41 -07:00
Matt Williams
25c63c91d8
Update README.md
...
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-10-19 13:52:40 -07:00
Jeffrey Morgan
cbfff4f868
update dependencies in app/
2023-10-19 15:52:41 -04:00
Jeffrey Morgan
7ed5a39bc7
simpler check for model loading compatibility errors
2023-10-19 14:50:49 -04:00
Michael Yang
cc1d03f4ec
Merge pull request #841 from jmorganca/mxyng/cleanup-cmd-args
2023-10-19 11:22:40 -07:00
Michael Yang
846f593dbf
Merge pull request #828 from jmorganca/mxyng/template-parameters
...
image: show parameters
2023-10-19 09:31:31 -07:00
Michael Yang
0a53da03fd
Merge pull request #843 from jmorganca/mxyng/request-validation
...
basic request validation
2023-10-19 09:30:45 -07:00
Michael Yang
2ce1793a1d
go fmt
2023-10-19 09:21:51 -07:00
Michael Yang
e1c5be24e7
check json eof
2023-10-19 09:21:51 -07:00
Michael Yang
2ad8a074ac
generate: set created_at
...
move the empty response so it's more visible
2023-10-19 09:21:51 -07:00
Michael Yang
7e547c6833
s/message/error/
2023-10-19 09:21:04 -07:00
Michael Yang
689842b9ff
request: bad request when model missing fields
2023-10-19 09:21:04 -07:00
Michael Yang
a19d47642e
models: rm workDir from CreateModel
...
unused after removing EMBED
2023-10-19 09:21:04 -07:00
Jeffrey Morgan
a7dad24d92
add error for falcon and starcoder vocab compatibility (#844)
...
add error for falcon and starcoder vocab compatibility
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-10-19 12:18:31 -04:00
Jeffrey Morgan
6b213216d5
Update import.md
2023-10-19 12:17:36 -04:00
Bruce MacDonald
fe6f3b48f7
do not reload the running llm when runtime params change (#840)
...
- only reload the running llm if the model has changed, or the options for loading the running model have changed
- rename loaded llm to runner to differentiate from loaded model image
- remove logic which keeps the first system prompt in the generation context
2023-10-19 10:39:58 -04:00
Michael Yang
36c88cb9db
cmd: set ExactArgs
2023-10-18 14:40:48 -07:00
Michael Yang
235e43d7f6
Merge pull request #833 from discovertomorrow/leadingspace
...
Fix Issue with Leading Whitespaces in Decoded Context
2023-10-18 13:52:48 -07:00
Arne Müller
730996e530
use TrimPrefix instead of TrimLeft
2023-10-18 22:51:30 +02:00
Arne Müller
ce6197a8e0
removed redundant strings.CutPrefix from Decode
2023-10-18 22:47:20 +02:00
Arne Müller
46b9953f32
use strings.TrimLeft to remove spaces
2023-10-18 22:41:19 +02:00
Michael Yang
4dcceeffb7
let the template do the work
2023-10-18 13:12:00 -07:00
Michael Yang
019e4a4558
image: show parameters
2023-10-18 13:12:00 -07:00
Michael Yang
627d04d927
Merge pull request #827 from jmorganca/mxyng/template-adapters
...
model: native gotemplate adapter template
2023-10-18 13:11:25 -07:00
Michael Yang
940e8ebec3
Merge pull request #826 from jmorganca/mxyng/template-system
...
show: no template system if empty
2023-10-18 13:11:09 -07:00
Bruce MacDonald
565648f3f7
relay CUDA errors to the client (#825)
2023-10-18 15:36:56 -04:00
Arne Müller
90c49bed57
moved removal of leading space into Predict
2023-10-18 20:08:26 +02:00
Michael Yang
3a2477174f
Merge pull request #822 from ggozad/fix-tags-api
...
Fix /api/tags for no models.
2023-10-18 09:34:00 -07:00
Yiorgis Gozadinos
8c6c2cbc8c
When the .ollama folder is broken or there are no models return an empty list on /api/tags
2023-10-18 08:23:20 +02:00
Arne Müller
5dc0cff459
fix whitespace removal
2023-10-18 08:15:27 +02:00
Matt Williams
c5c8b4b16a
added python rag news summary
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-17 16:41:28 -07:00
Michael Yang
8299bf76ed
model: native gotemplate adapter template
2023-10-17 15:28:38 -07:00
Michael Yang
ee4979e510
show: no template system if empty
2023-10-17 15:25:43 -07:00
Michael Yang
08b0e04f40
Merge pull request #813 from jmorganca/mxyng/llama
...
refactor llm/llama.go
2023-10-17 14:05:58 -07:00
Michael Yang
b36b0b71f8
use cut prefix
2023-10-17 14:01:39 -07:00
Michael Yang
094df37563
remove unused struct
2023-10-17 14:01:38 -07:00
Bruce MacDonald
f3648fd206
Update llama.cpp gguf to latest (#710)
2023-10-17 16:55:16 -04:00
Bruce MacDonald
bd93a94abd
fix MB VRAM log output (#824)
2023-10-17 15:35:16 -04:00
Michael Yang
f55bdb6f10
Merge pull request #799 from deichbewohner/jsonmarshaling
...
Fix JSON Marshal Escaping for Special Characters
2023-10-17 08:46:02 -07:00
Michael Yang
2870a9bfc8
Merge pull request #812 from jmorganca/mxyng/fix-format-string
...
fix: wrong format string type
2023-10-17 08:40:49 -07:00
Michael Yang
c031c211d1
Merge pull request #809 from jmorganca/mxyng/fix-gpu
...
fix: regression unsupported metal types
2023-10-17 08:40:40 -07:00
Andreas Wäscher
68391b0055
Add OllamaSharp for .NET (#811)
2023-10-17 11:31:48 -04:00
Alexander F. Rødseth
b7e137323a
Fix a typo (#818)
2023-10-17 09:00:15 -04:00
Arne Müller
8fa3f366ad
Removed newline trimming and used buffer directly in POST request.
2023-10-17 08:17:35 +02:00
Michael Yang
fddb303f23
fix: format string wrong type
2023-10-16 16:14:28 -07:00
Michael Yang
ad5ee20c7b
Merge pull request #794 from ggozad/add_oterm
...
Add oterm to community integrations
2023-10-16 15:51:55 -07:00
Michael Yang
785b4eb5bf
Merge branch 'main' into add_oterm
2023-10-16 15:51:44 -07:00
Michael Yang
16ede1b30b
Merge pull request #801 from s-kostyaev/add-ellama-community-integration
...
Add ellama community integration
2023-10-16 15:51:25 -07:00
Michael Yang
17d6bbbb2a
Merge pull request #810 from vieux/patch-1
...
Update install.sh
2023-10-16 15:50:57 -07:00
Victor Vieux
6481b7f34c
Update install.sh, avoid ARCH: unbound variable
2023-10-16 14:40:24 -07:00
Michael Yang
cb4a80b693
fix: regression unsupported metal types
...
omitting `--n-gpu-layers` means use metal on macos which isn't correct
since ollama uses `num_gpu=0` to explicitly disable gpu for file types
that are not implemented in metal
2023-10-16 14:37:20 -07:00
Bruce MacDonald
68d7255bd3
show request to server rather than local check (#778)
2023-10-16 17:27:25 -04:00
Michael Yang
9ef2fce33a
Merge pull request #768 from jmorganca/mxyng/bytes
...
fix memory check
2023-10-16 12:42:41 -07:00
Michael Yang
43eaba3d60
Merge pull request #787 from jmorganca/mxyng/server-version2
...
server: print version on start
2023-10-16 09:59:30 -07:00
Michael Yang
1af493c5a0
server: print version on start
2023-10-16 09:59:14 -07:00
Bruce MacDonald
a0c3e989de
deprecate modelfile embed command (#759)
2023-10-16 11:07:37 -04:00
Sergey Kostyaev
7af0fdce48
add ellama community integration
2023-10-16 16:39:10 +07:00
Arne Müller
ee94693b1a
handling unescaped json marshaling
2023-10-16 11:15:55 +02:00
Yiorgis Gozadinos
731dbdc1a5
Add oterm to community integrations
2023-10-15 23:21:17 +02:00
Jeffrey Morgan
06bcfbd629
cleanup docker section in readme
2023-10-15 02:33:25 -04:00
Jeffrey Morgan
7d7c2510f8
add docker exec command to readme
2023-10-15 02:31:15 -04:00
Jeffrey Morgan
f9b2f999ac
update readme with docker setup and link to import.md
2023-10-15 02:23:03 -04:00
Jeffrey Morgan
c416087339
import.md: formatting and spelling
2023-10-15 01:39:46 -04:00
Jeffrey Morgan
6002cebd2c
import.md: convert and quantize docs
2023-10-15 00:11:51 -04:00
Jeffrey Morgan
212bdc541c
import.md: model architectures spelling
2023-10-15 00:07:58 -04:00
Jeffrey Morgan
dca6686273
add steps for creating a Modelfile and more example commands to import.md
2023-10-15 00:05:50 -04:00
Jeffrey Morgan
598621afab
add push script for docker images
2023-10-14 14:24:39 -04:00
Matt Williams
6479f49c09
Merge pull request #773 from jmorganca/mattw/howtoquant
...
add how to quantize doc
2023-10-14 08:29:39 -07:00
Matt Williams
b2974a7095
applied mikes comments
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-14 08:29:24 -07:00
Jeffrey Morgan
832b4db9d4
Use correct url for auto updates
2023-10-13 19:04:42 -04:00
Bruce MacDonald
c43873f33b
check update response (#785)
2023-10-13 18:05:46 -04:00
Michael Yang
11d82d7b9b
update checkvram
2023-10-13 14:47:29 -07:00
Michael Yang
36fe2deebf
only check system memory on macos
2023-10-13 14:47:29 -07:00
Michael Yang
4a8931f634
check total (system + video) memory
2023-10-13 14:47:29 -07:00
Michael Yang
bd6e38fb1a
refactor memory check
2023-10-13 14:47:29 -07:00
Michael Yang
92189a5855
fix memory check
2023-10-13 14:47:29 -07:00
Michael Yang
d790bf9916
Merge pull request #783 from jmorganca/mxyng/fix-gpu-offloading
...
fix: offloading on low end GPUs
2023-10-13 14:36:44 -07:00
Michael Yang
35afac099a
do not use gpu binary when num_gpu == 0
2023-10-13 14:32:12 -07:00
Michael Yang
811c3d1900
no gpu if vram < 2GB
2023-10-13 14:32:12 -07:00
Bruce MacDonald
3553d10769
check for newer updates (#784)
...
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-10-13 17:29:46 -04:00
Bruce MacDonald
6fe178134d
improve api error handling (#781)
...
- remove new lines from llama.cpp error messages relayed to client
- check api option types and return error on wrong type
- change num layers from 95% VRAM to 92% VRAM
2023-10-13 16:57:10 -04:00
Jeffrey Morgan
d890890f66
use lower glibc versions in Dockerfile.build
2023-10-13 01:06:19 -04:00
Jeffrey Morgan
89ba19feca
use Go 1.21.3 in Dockerfile
2023-10-12 23:23:12 -04:00
Jeffrey Morgan
6f58c77671
update Dockerfile.build for linux binary builds
2023-10-12 22:14:20 -04:00
Matt Williams
3c975f898f
update doc to refer to docker image
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-12 15:57:50 -07:00
Matt Williams
9245c8a1df
add how to quantize doc
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-12 15:34:57 -07:00
Michael Yang
7a537cdca9
Merge pull request #770 from jmorganca/mxyng/fix-download
...
fix download
2023-10-12 12:56:43 -07:00
Michael Yang
257ffeb997
fix download
2023-10-12 12:52:43 -07:00
Matt Williams
9b513bb6b1
Merge pull request #753 from jmorganca/mattw/examplereorg
...
rename the examples to be more descriptive
2023-10-12 11:24:12 -07:00
Matt Williams
042100f797
final rename
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-12 11:23:41 -07:00
Bruce MacDonald
7804b8fab9
validate api options fields from map (#711)
2023-10-12 11:18:11 -04:00
Bruce MacDonald
56497663c8
relay model runner error message to client (#720)
...
* give direction to user when runner fails
* also relay errors from timeout
* increase timeout to 3 minutes
2023-10-12 11:16:37 -04:00
Matt Williams
e1afcb8af2
simple gen to simple
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-11 21:29:07 -07:00
Matt Williams
385eeea357
remove with
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-11 21:26:11 -07:00
Matt Williams
8a41b244e8
add golang gen
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-11 21:20:50 -07:00
Jeffrey Morgan
92578798bb
fix relative links in README.md
2023-10-11 19:24:06 -04:00
Michael Yang
788637918a
Merge pull request #760 from jmorganca/mxyng/more-downloads
...
Mxyng/more downloads
2023-10-11 14:33:10 -07:00
Michael Yang
c413a55093
download: handle inner errors
2023-10-11 14:15:30 -07:00
Michael Yang
630bb75d2a
dynamically size download parts based on file size
2023-10-11 14:10:25 -07:00
Michael Yang
a2055a1e93
update download
2023-10-11 14:10:25 -07:00
Michael Yang
b599946b74
add format bytes
2023-10-11 14:08:23 -07:00
Michael Yang
aca2d65b82
Merge pull request #757 from jmorganca/mxyng/format-time
...
cleanup format time
2023-10-11 11:12:29 -07:00
Michael Yang
b5e08e3373
cleanup format time
2023-10-11 11:09:27 -07:00
Bruce MacDonald
274d5a5fdf
optional parameter to not stream response (#639)
...
* update streaming request accept header
* add optional stream param to request bodies
2023-10-11 12:54:27 -04:00
Matt Williams
fc6b49be32
add ts alternate to python langchain simplegen
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-11 09:50:15 -07:00
Bruce MacDonald
77295f716e
prevent waiting on exited command (#752)
...
* prevent waiting on exited command
* close llama runner once
2023-10-11 12:32:13 -04:00
Matt Williams
615f7d1dea
cleanup readme.
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-11 06:13:29 -07:00
Matt Williams
cdf5e106ae
rename dirs
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-11 06:10:24 -07:00
Matt Williams
a85329f59a
rename the models to be more descriptive
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-10 17:40:02 -07:00
Bruce MacDonald
f2ba1311aa
improve vram safety with 5% vram memory buffer (#724)
...
* check free memory not total
* wait for subprocess to exit
2023-10-10 16:16:09 -04:00
Jeffrey Morgan
65dcd0ce35
always cleanup blob download (#747)
2023-10-10 13:12:29 -04:00
Michael Yang
0040f543a2
Merge pull request #743 from jmorganca/mxyng/http-proxy
...
handle upstream proxies
2023-10-10 09:59:06 -07:00
Matt Williams
767f9bdbbb
Merge pull request #585 from jmorganca/matt/examplementors
...
add the example for ask the mentors
2023-10-09 13:58:14 -07:00
Costa Alexoglou
f7f5169c94
Update api.md (#741)
...
Avoid triple backticks showing in the visual editor and being copied to the clipboard.
2023-10-09 16:01:46 -04:00
Michael Yang
2cfffea02e
handle client proxy
2023-10-09 12:33:47 -07:00
Michael Yang
f6e98334e4
handle upstream proxies
2023-10-09 11:42:36 -07:00
Jeffrey Morgan
ab0668293c
llm: fix build on amd64
2023-10-06 14:39:54 -07:00
Bruce MacDonald
af4cf55884
not found error before pulling model (#718)
2023-10-06 16:06:20 -04:00
Bruce MacDonald
d6786f2945
add feedback for reading model metadata (#722)
2023-10-06 16:05:32 -04:00
Michael Yang
38dc2f79bc
Merge pull request #626 from jmorganca/mxyng/concurrent-downloads
...
parallel chunked downloads
2023-10-06 13:01:29 -07:00
Michael Yang
cb961c87ca
Merge pull request #679 from jamesbraza/modelfile-docs
...
`Modelfile` syntax highlighting
2023-10-06 12:59:45 -07:00
Michael Yang
0560b28a8d
names
2023-10-06 12:56:56 -07:00
Michael Yang
10199c5987
replace done channel with file check
2023-10-06 12:56:56 -07:00
Michael Yang
288814d3e4
fix ref counts
2023-10-06 12:56:43 -07:00
Michael Yang
04733438da
check head request response
2023-10-06 12:56:43 -07:00
Michael Yang
711e891f0f
fix resumable downloads
...
glob returns files in lexical order which is not appropriate when
rebuilding the parts list
2023-10-06 12:56:43 -07:00
Michael Yang
090d08422b
handle unexpected eofs
2023-10-06 12:56:43 -07:00
Michael Yang
5b84404c64
handle concurrent requests for the same blobs
2023-10-06 12:56:43 -07:00
Michael Yang
8544edca21
parallel chunked downloads
2023-10-06 12:56:43 -07:00
Bruce MacDonald
5d22319a2c
rename server subprocess (#700)
...
- this makes it easier to see that the subprocess is associated with ollama
2023-10-06 10:15:42 -04:00
Bruce MacDonald
2130c0708b
output type parsed from modelfile (#678)
2023-10-05 14:58:04 -04:00
Patrick Devine
61ff1946e6
revise help text (#706)
2023-10-05 11:36:07 -07:00
Bruce MacDonald
d06bc0cb6e
enable q8, q5, q5_1, and f32 for linux gpu (#699)
2023-10-05 12:53:47 -04:00
Alexander F. Rødseth
d104b7e997
Fix go test ./... issue: fmt.Println arg list ends with redundant newline (#705)
2023-10-05 11:11:04 -04:00
Bruce MacDonald
9e2de1bd2c
increase streaming buffer size (#692)
2023-10-04 14:09:00 -04:00
Jeffrey Morgan
dc87e9c9ae
update Dockerfile to pass GOFLAGS
2023-10-03 07:05:15 -07:00
Michael Yang
367cb68dc1
Merge pull request #686 from jmorganca/mxyng/starcoder
...
decode starcoder
2023-10-02 22:47:19 -07:00
Michael Yang
c02c0cd483
starcoder
2023-10-02 19:56:51 -07:00
Patrick Devine
1852755154
show a default message when license/parameters/system prompt/template aren't specified (#681)
2023-10-02 14:34:52 -07:00
James Braza
6f2ce74231
Got rid of all caps to show it can be lowercase
2023-10-02 13:54:27 -07:00
James Braza
6edcc5c79f
Using code highlighting syntax around Modelfile
2023-10-02 13:46:05 -07:00
Bruce MacDonald
b1f7123301
clean up num_gpu calculation code (#673)
2023-10-02 14:53:42 -04:00
Bruce MacDonald
1fbf3585d6
Relay default values to llama runner (#672)
...
* include seed in params for llama.cpp server and remove empty filter for temp
* relay default predict options to llama.cpp
- reorganize options to match predict request for readability
* omit empty stop
---------
Co-authored-by: hallh <hallh@users.noreply.github.com>
2023-10-02 14:53:16 -04:00
Patrick Devine
99d5161e8a
don't wordwrap when stdout is redirected or piped (#662)
2023-10-02 11:50:55 -07:00
Michael
ea8380be45
add community project: Chatbot Ollama
...
add community project: Chatbot Ollama by @ivanfioravanti
2023-10-02 09:04:31 -07:00
Jeffrey Morgan
4f25092dc1
fix build_docker.sh permissions
2023-10-01 16:42:32 -07:00
Jiayu Liu
4fc10acce9
add some missing code directives in docs (#664)
2023-10-01 11:51:01 -07:00
Michael Yang
0a4f21c0a7
fix docker build (#659)
2023-09-30 13:34:01 -07:00
Jeffrey Morgan
9abb66254a
docker: fix volume permission errors
2023-09-30 12:32:15 -07:00
Jay Nakrani
1d0ebe67e8
Document response stream chunk delimiter. (#632)
...
Document response stream chunk delimiter.
2023-09-29 21:45:52 -07:00
Bruce MacDonald
a1b2d95f96
remove unused push/pull params (#650)
2023-09-29 17:27:19 -04:00
Michael Yang
c0b1bf7537
Merge pull request #606 from jmorganca/mxyng/install.sh-2
...
ordered list of install locations
2023-09-29 11:30:46 -07:00
Michael Yang
cdfeb165ca
Merge pull request #608 from jmorganca/mxyng/build
...
update build scripts
2023-09-29 11:30:25 -07:00
Michael Yang
92d454ec5f
update build_darwin.sh
2023-09-29 11:29:23 -07:00
Michael Yang
9333b0cc82
Merge pull request #612 from jmorganca/mxyng/prune-empty-directories
...
prune empty directories
2023-09-29 11:23:39 -07:00
Bruce MacDonald
9771b1ec51
windows runner fixes (#637)
2023-09-29 11:47:55 -04:00
Patrick Devine
76db4a49cf
allow the user to cancel generating with ctrl-C (#641)
2023-09-28 17:13:01 -07:00
Luc Stepniewski
4aa0976a2e
Added missing return preventing SIGSEGV because of missing resp (#621)
...
Co-authored-by: Luc Stepniewski <luc@eclipse-fr.com>
2023-09-28 14:25:22 -07:00
Patrick Devine
92c20fdae6
fix error messages for unknown commands in the repl (#611)
2023-09-28 14:19:45 -07:00
Michael Yang
c951da7096
Merge pull request #634 from jmorganca/mxyng/int64
...
use int64 consistently
2023-09-28 14:17:47 -07:00
Bruce MacDonald
24d82a23a2
do not download updates multiple times (#633)
2023-09-28 15:29:17 -04:00
Michael Yang
f40b3de758
use int64 consistently
2023-09-28 11:07:24 -07:00
Michael
5f4008c296
Update README.md
...
adding in instruction to run mistral
2023-09-28 09:06:03 -07:00
Aaron Coffey
6ae33d8141
Update modelfile.md to reflect the usage of num_gpu. (#629)
2023-09-28 10:21:21 -04:00
Jeffrey Morgan
c5664c1fef
Update faq.md
2023-09-27 13:49:43 -07:00
Bruce MacDonald
958a5a8184
revert fedora cuda version check
2023-09-27 15:12:29 -04:00
Michael Yang
8608eb4760
prune empty directories
2023-09-27 10:58:09 -07:00
Bruce MacDonald
a2b210130f
fedora install fixes (#609)
2023-09-27 11:43:47 -04:00
Bruce MacDonald
ed20837f9a
Update modelfile.md
2023-09-27 10:38:10 -04:00
James Braza
1db2a61dd0
Added num_predict to the options table (#614)
2023-09-27 10:26:08 -04:00
Jeffrey Morgan
2ded8ab206
use 11.8.0 nvidia dockerfile base image for now
2023-09-26 21:48:41 -07:00
Michael Yang
e6b3648bbf
Merge pull request #616 from jmorganca/mxyng/fix-model-name
2023-09-26 20:54:18 -07:00
Michael Yang
0625e805f0
fix model name not matching
2023-09-26 19:50:04 -07:00
Michael Yang
c38ec5befb
Merge pull request #598 from jmorganca/mxyng/help-exit
...
add painter message for exit
2023-09-26 15:17:40 -07:00
Michael Yang
c577721a43
Merge pull request #605 from jmorganca/mxyng/install.sh
...
do not unload nouveau driver
2023-09-26 09:53:05 -07:00
Michael Yang
29c056ea39
ordered list of install locations
2023-09-26 09:38:11 -07:00
Michael Yang
9fc3bba9cf
do not unload nouveau driver
2023-09-26 09:36:54 -07:00
Michael Chiang
7774ed4ae6
Update README.md for linux + cleanup (#601)
...
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-09-25 23:44:53 -07:00
Michael Yang
11f920f209
Merge pull request #599 from jmorganca/mxyng/install.sh
...
update install.sh
2023-09-25 18:24:13 -07:00
Michael Yang
6e6b655956
update install.sh
2023-09-25 18:09:44 -07:00
Michael Yang
110ae89a6c
Merge pull request #596 from jmorganca/mxyng/install.sh
...
update install.sh
2023-09-25 17:59:13 -07:00
Michael Yang
5e388f931e
check cuda installed before installing
2023-09-25 17:56:43 -07:00
Michael Yang
d5ad41dd7b
fix path for wsl user
2023-09-25 17:56:25 -07:00
Michael Yang
d294a11bc9
start service on exit instead of immediately
2023-09-25 17:54:02 -07:00
Michael Yang
93d887e4bc
add painter message for exit
2023-09-25 16:30:22 -07:00
Jeffrey Morgan
5306b0269d
Update linux.md
2023-09-25 16:10:32 -07:00
Michael Yang
7de0c8345d
Merge pull request #595 from jmorganca/mxyng/install.sh
...
ignore systemctl is-system-running exit code
2023-09-25 15:49:47 -07:00
Michael Yang
1b9dcab3ab
ignore systemctl is-system-running exit code
2023-09-25 15:47:45 -07:00
Bruce MacDonald
86279f4ae3
unbound max num gpu layers (#591)
...
---------
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-25 18:36:46 -04:00
Michael Yang
b934bf23e6
exit on unknown distro (#594)
2023-09-25 15:30:58 -07:00
Michael Yang
2b8ef455ad
Merge pull request #593 from jmorganca/mxyng/install.sh
...
update install.sh
2023-09-25 14:09:40 -07:00
Michael Yang
0c5f47177c
update install.sh
2023-09-25 14:01:44 -07:00
Michael Yang
1210db2924
Merge pull request #592 from jmorganca/mxyng/install.sh
...
fix dkms on debian
2023-09-25 12:59:01 -07:00
Michael Yang
d0854bf1e6
fix dkms on debian
2023-09-25 12:57:25 -07:00
Michael Yang
8396463255
Merge pull request #590 from jmorganca/mxyng/install.sh
...
fix dkms install
2023-09-25 12:17:31 -07:00
Michael Yang
a027bbf4d7
fix dkms install
2023-09-25 12:16:41 -07:00
Michael Yang
ed94a3dd02
Merge pull request #589 from jmorganca/mxyng/install.sh
...
update install.sh
2023-09-25 11:08:25 -07:00
Michael Yang
f14f62ab3b
update install.sh
2023-09-25 11:05:38 -07:00
Jeffrey Morgan
0fb5268496
Update linux.md
2023-09-25 10:06:23 -07:00
Bruce MacDonald
c65edb1506
fix linux installer warning logs (#588)
2023-09-25 11:22:56 -04:00
Twan L
1605af32ec
Added a new community project (#574)
2023-09-25 10:40:59 -04:00
Jeffrey Morgan
ee3032ad89
improvements to docs/linux.md
2023-09-24 21:50:07 -07:00
Jeffrey Morgan
5b7a27281d
improvements to docs/linux.md
2023-09-24 21:38:23 -07:00
Jeffrey Morgan
d2a784e33e
add docs/linux.md
2023-09-24 21:34:44 -07:00
Jeffrey Morgan
413a2e4f91
set DEBIAN_FRONTEND=noninteractive correctly
2023-09-24 20:35:42 -07:00
Matt Williams
a92fdff620
add the example for ask the mentors
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-09-24 15:58:32 -07:00
Patrick Devine
b5614f3ebc
fix end-of-line issue with the new prompt (#582)
2023-09-23 17:20:30 -07:00
Jeffrey Morgan
8b2ba9cab8
minor improvements to install.sh
2023-09-23 11:20:39 -04:00
Jeffrey Morgan
e29662ab5c
fix minor install script issues on debian
2023-09-23 10:25:47 -04:00
Bruce MacDonald
cbc40aa996
debian installer support (#579)
...
* debian installer support
- normalize os name to lowercase
- check needed commands are available
- don't check sudo when root user
- share common install commands
- support debian cuda install
- skip aarm cuda install
- system user shared home dir
* refactor and add other platforms (#580)
---------
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-23 09:46:47 -04:00
Jeffrey Morgan
5cb82540c9
install.sh: update install url
2023-09-23 09:35:14 -04:00
Jeffrey Morgan
d7849a1dc9
add .env to .dockerignore
2023-09-23 00:53:48 -04:00
Jeffrey Morgan
01c44d687e
add multi line strings to final prompt
2023-09-23 00:27:24 -04:00
Jeffrey Morgan
9b12a511ca
check other request fields before load short circuit in /api/generate
2023-09-22 23:50:55 -04:00
Jeffrey Morgan
e20362e0d5
fix multi line input in ollama run
2023-09-22 23:49:35 -04:00
Patrick Devine
c928ceb927
add word wrapping for lines which are longer than the terminal width (#553)
2023-09-22 13:36:08 -07:00
Michael Yang
e1a0846483
Merge pull request #571 from jmorganca/mxyng/update-dockerfile
...
update dockerfile.cuda
2023-09-22 12:34:41 -07:00
Jeffrey Morgan
f997e29e45
Add Dockerfile.build for building linux binaries (#558)
...
Add `Dockerfile.build` for building linux binaries
---------
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-22 15:20:12 -04:00
Patrick Devine
87d9efb364
switch to forked readline lib which doesn't wreck the repl prompt (#578)
2023-09-22 12:17:45 -07:00
Michael Yang
93d3a2568d
replace dockerfile
2023-09-22 11:57:38 -07:00
Michael Yang
5a81390b24
update dockerfile.cuda
2023-09-22 11:57:38 -07:00
Michael Yang
a89ef99aed
Merge pull request #575 from jmorganca/mxyng/fix-ipv6-only
...
fix ipv6 parse ip
2023-09-22 11:47:11 -07:00
Bruce MacDonald
dc0c725ceb
ubuntu cuda drivers (#576)
2023-09-22 19:43:14 +01:00
Bruce MacDonald
5d71bda478
close llm on interrupt (#577)
2023-09-22 19:41:52 +01:00
Michael Yang
88897a90e4
fix ipv6 parse ip
2023-09-22 10:41:32 -07:00
Bruce MacDonald
9df31c3518
linux installer script (#534)
...
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-22 17:01:03 +01:00
Michael Yang
2044f9d4da
Merge pull request #570 from jmorganca/mxyng/head-request
...
fix HEAD request
2023-09-21 16:56:17 -07:00
Michael Yang
0d186f3b33
Merge pull request #569 from jmorganca/mxyng/update-submodules
...
silence warm up log
2023-09-21 16:52:42 -07:00
Michael Yang
82f5b66c01
register HEAD /api/tags
2023-09-21 16:38:03 -07:00
Michael Yang
c986694367
fix HEAD / request
...
HEAD requests should respond like their GET counterparts, except without a
response body.
2023-09-21 16:35:58 -07:00
Michael Yang
058d0cd04b
silence warm up log
2023-09-21 14:53:33 -07:00
Michael Yang
ee1c994d15
update submodule (#567)
2023-09-21 16:22:23 -04:00
Bruce MacDonald
4cba75efc5
remove tmp directories created by previous servers (#559)
...
* remove tmp directories created by previous servers
* clean up on server stop
* Update routes.go
* Update server/routes.go
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* create top-level temp ollama dir
* check file exists before creating
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-21 20:38:49 +01:00
Michael Yang
8c83701e9f
Merge pull request #566 from jmorganca/mxyng/api-check-model-exists
...
Use API to check if model exists and pull if necessary
2023-09-21 10:35:14 -07:00
Michael Yang
6137b12799
validate existence and pull model using api
2023-09-21 09:55:34 -07:00
Michael Yang
1fabba474b
refactor default allow origins
...
this should be less error prone
2023-09-21 09:42:25 -07:00
Michael Yang
765770efdb
Merge pull request #562 from jmorganca/mxyng/fix-ollama-host
...
fix OLLAMA_HOST parsing for ip6
2023-09-20 19:54:47 -07:00
Michael Yang
9297ff8330
fix OLLAMA_HOST parsing for ip6
2023-09-20 18:52:57 -07:00
Michael Yang
ee4fd16f2c
Merge pull request #556 from jmorganca/pack-cuda
...
pack in cuda libs
2023-09-20 15:02:36 -07:00
Michael Yang
a9ed7cc6aa
rename generate.go
2023-09-20 14:42:17 -07:00
Michael Yang
6c6a31a1e8
embed libraries using cmake
2023-09-20 14:41:57 -07:00
Bruce MacDonald
fc6ec356fc
remove libcuda.so
2023-09-20 20:36:14 +01:00
Bruce MacDonald
1255bc9b45
only package 11.8 runner
2023-09-20 20:00:41 +01:00
Michael Yang
084e4c782a
Merge pull request #557 from jmorganca/mxyng/cleanup
...
fix impossible condition
2023-09-20 11:51:01 -07:00
Michael Yang
58ffa03d8b
fix impossible condition
2023-09-20 11:27:44 -07:00
Michael Yang
637f8bc6a5
Merge pull request #536 from jmorganca/mxyng/redirect-uploads
...
explicitly follow upload redirects
2023-09-20 11:27:03 -07:00
Michael Yang
499e9007a5
pick chunksize based on location
2023-09-20 11:10:24 -07:00
Bruce MacDonald
b9bb5ca288
use cuda_version
2023-09-20 17:58:16 +01:00
Bruce MacDonald
4e8be787c7
pack in cuda libs
2023-09-20 17:40:42 +01:00
Michael Yang
aa45d7c1df
draft: explicitly follow upload redirects
2023-09-19 13:36:58 -07:00
Michael Yang
e35565c567
Merge pull request #555 from jmorganca/mxyng/fix-windows-startup
...
fix build
2023-09-19 10:51:58 -07:00
Michael Yang
a5520bfb42
fix build
2023-09-19 10:42:24 -07:00
Michael Yang
2627c464ba
Merge pull request #554 from jmorganca/mxyng/fix-windows-startup
...
fix mkdir on windows
2023-09-19 09:42:12 -07:00
Michael Yang
b58d5d16b0
fix mkdir on windows
2023-09-19 09:41:13 -07:00
Patrick Devine
24580df958
only add a layer if there is actual data (#535)
2023-09-18 13:47:45 -07:00
Patrick Devine
80dd44e80a
Cmd changes (#541)
2023-09-18 12:26:56 -07:00
James Braza
94e1d96b29
Updated README section on community projects for table (#550)
2023-09-18 15:22:50 -04:00
Bruce MacDonald
66003e1d05
subprocess improvements (#524)
...
* subprocess improvements
- increase start-up timeout
- when runner fails to start fail rather than timing out
- try runners in order rather than choosing 1 runner
- embed metal runner in metal dir rather than gpu
- refactor logging and error messages
* Update llama.go
* Update llama.go
* simplify by using glob
2023-09-18 15:16:32 -04:00
Michael Yang
c345053a8b
Merge pull request #537 from jmorganca/mxyng/upload
...
fix error on upload chunk
2023-09-15 17:48:39 -07:00
Michael Yang
08d7c2a944
fix error on upload chunk
2023-09-15 15:59:30 -07:00
Michael Yang
bc9573dcb1
Merge pull request #530 from jmorganca/mxyng/progresswriter
...
implement ProgressWriter
2023-09-15 12:43:46 -07:00
Michael Yang
e53bc57d4d
split uploadBlobChunked
2023-09-14 17:22:05 -07:00
Michael Yang
f0b398d17f
implement ProgressWriter
2023-09-14 17:22:04 -07:00
Patrick Devine
8efbc5df55
DRAFT: add a simple python client to access ollama (#522)
2023-09-14 16:37:38 -07:00
Michael Yang
ccc3e9ac6d
Merge pull request #531 from jmorganca/mxyng/content-length
...
set request.ContentLength
2023-09-14 13:33:11 -07:00
Michael Yang
daa4f096f9
set request.ContentLength
...
This informs the HTTP client that the content length is known, which disables
chunked Transfer-Encoding.
2023-09-14 13:32:44 -07:00
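Setting ContentLength matters because `http.NewRequest` only infers a length for `*bytes.Buffer`, `*bytes.Reader` and `*strings.Reader` bodies. For other readers, such as an `io.SectionReader` over a blob, the transport falls back to chunked encoding. Below is a minimal sketch in which a local test server stands in for the registry. The function and type names are illustrative.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// framing records how the server saw the request body delimited.
type framing struct {
	length   int64    // -1 when the length was not announced
	encoding []string // e.g. ["chunked"]
}

// newChunkRequest builds a PUT whose body is an io.SectionReader.
// http.NewRequest only infers lengths for *bytes.Buffer, *bytes.Reader and
// *strings.Reader, so for anything else ContentLength must be set by hand.
func newChunkRequest(url string, chunk *io.SectionReader, setLength bool) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPut, url, chunk)
	if err != nil {
		return nil, err
	}
	if setLength {
		req.ContentLength = chunk.Size()
	}
	return req, nil
}

// send uploads bytes 2 through 6 of a sample payload and reports the framing.
func send(setLength bool) (framing, error) {
	seen := make(chan framing, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.Copy(io.Discard, r.Body)
		seen <- framing{r.ContentLength, r.TransferEncoding}
	}))
	defer srv.Close()

	chunk := io.NewSectionReader(strings.NewReader("0123456789"), 2, 5)
	req, err := newChunkRequest(srv.URL, chunk, setLength)
	if err != nil {
		return framing{}, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return framing{}, err
	}
	resp.Body.Close()
	return <-seen, nil
}

func main() {
	for _, set := range []bool{true, false} {
		f, err := send(set)
		if err != nil {
			panic(err)
		}
		fmt.Printf("ContentLength set=%v: length=%d encoding=%v\n", set, f.length, f.encoding)
	}
}
```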
Michael Yang
3ee85f1c6c
Merge pull request #526 from jmorganca/mxyng/cleanup
...
remove unused
2023-09-14 13:10:59 -07:00
Bruce MacDonald
2540c9181c
support for packaging in multiple cuda runners (#509)
...
* enable packaging multiple cuda versions
* use nvcc cuda version if available
---------
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-14 15:08:13 -04:00
Michael Yang
83ffb154bc
Merge pull request #507 from jmorganca/mxyng/build
...
update docker image
2023-09-14 11:25:59 -07:00
Michael Yang
9aa192c812
update cuda docker image
2023-09-14 11:25:20 -07:00
Matt Williams
fc8707686f
Update API docs (#527)
...
* Update API docs
Signed-off-by: Matt Williams <m@technovangelist.com>
* strange TOC was getting auto generated
Signed-off-by: Matt Williams <m@technovangelist.com>
* Update docs/api.md
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* Update docs/api.md
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* Update docs/api.md
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* Update api.md
---------
Signed-off-by: Matt Williams <m@technovangelist.com>
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
Co-authored-by: Michael Chiang <mchiang0610@users.noreply.github.com>
2023-09-14 08:51:26 -07:00
Michael Yang
f89c23764b
Merge pull request #525 from jmorganca/mxyng/falcon-decode
...
fix: add falcon.go
2023-09-13 15:08:47 -07:00
Michael Yang
e6881cabd0
remove unused
2023-09-13 14:48:33 -07:00
Michael Yang
d028853879
fix: add falcon.go
2023-09-13 14:47:37 -07:00
Michael Yang
949553db23
Merge pull request #519 from jmorganca/mxyng/decode
...
Mxyng/decode
2023-09-13 12:43:57 -07:00
Michael Yang
0c5a454361
fix model type for 70b
2023-09-12 15:12:59 -07:00
Bruce MacDonald
f59c4d03f7
fix ggml arm64 cuda build (#520)
2023-09-12 17:06:48 -04:00
Michael Yang
7dee25a07f
fix falcon decode
...
get model and file type from bin file
2023-09-12 12:34:53 -07:00
Bruce MacDonald
f221637053
first pass at linux gpu support (#454)
...
* linux gpu support
* handle multiple gpus
* add cuda docker image (#488)
---------
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-12 11:04:35 -04:00
Patrick Devine
45ac07cd02
create the blobs directory correctly (#508)
2023-09-11 14:54:52 -07:00
Jeffrey Morgan
7d749cc787
fix darwin build script
2023-09-11 16:31:46 -04:00
Patrick Devine
e7e91cd71c
add autoprune to remove unused layers (#491)
2023-09-11 11:46:35 -07:00
Jeffrey Morgan
3920e15386
add model format to config layer (#497)
2023-09-09 17:53:44 -04:00
Michael Yang
41e976edde
Merge pull request #492 from jmorganca/mxyng/nil-pointer
...
fix nil pointer dereference
2023-09-07 17:25:23 -07:00
Michael Yang
de227b620f
fix nil pointer dereference
2023-09-07 17:24:31 -07:00
Michael Yang
63def6ca49
Merge pull request #487 from jmorganca/mxyng/dockerignore
...
update dockerignore
2023-09-07 14:16:17 -07:00
Michael Yang
738fe9c4aa
Merge pull request #486 from jmorganca/mxyng/fix-push
...
fix: retry push on expired token
2023-09-07 13:58:34 -07:00
Michael Yang
a8da0bacbe
update dockerignore
2023-09-07 13:36:25 -07:00
Michael Yang
bf146fb072
fix retry on unauthorized chunk
2023-09-07 12:02:04 -07:00
Michael Yang
f0f4943577
fix get auth token
2023-09-07 12:01:56 -07:00
Bruce MacDonald
09dd2aeff9
GGUF support (#441)
2023-09-07 13:55:37 -04:00
Alexander Pepper
07b4074e7b
[docs] Improve build instructions (#482)
...
Go is required and not installed by default.
2023-09-07 06:43:26 -04:00
Jeffrey Morgan
61dda6a5e0
set minimum CMAKE_OSX_DEPLOYMENT_TARGET to 11.0
2023-09-06 19:56:50 -04:00
Michael Yang
e1f9ced568
Merge pull request #479 from jmorganca/mxyng/dockerfile
...
update dockerfile
2023-09-06 15:44:24 -07:00
Michael Yang
9795b43d93
update dockerfile
2023-09-06 15:31:25 -07:00
Michael Yang
0980d5c7e3
Merge pull request #478 from jmorganca/mxyng/cleanup
...
remove unused openssh key types
2023-09-06 15:18:54 -07:00
Michael Yang
0dae34b6a7
remove unused openssh key types
2023-09-06 14:34:09 -07:00
Michael Yang
83c6be1666
fix model manifests (#477)
2023-09-06 17:30:08 -04:00
Patrick Devine
1adfa67589
tighten up the error string for ollama show flags (#476)
2023-09-06 13:38:49 -07:00
Patrick Devine
790d24eb7b
add show command (#474)
2023-09-06 11:04:17 -07:00
Jeffrey Morgan
7de300856b
use osPath in gpu check
2023-09-05 21:52:21 -04:00
Jeffrey Morgan
213ffdb548
macos amd64 compatibility fixes
2023-09-05 21:33:31 -04:00
Michael Yang
d42d88386a
Merge pull request #473 from jmorganca/mxyng/fix-manifest-path
...
create manifests directory
2023-09-05 17:37:41 -07:00
Ackermann Yuriy
154f24af91
Added missing options params to the embeddings docs (#472)
2023-09-05 20:18:49 -04:00
Michael Yang
a1ecdd36d5
create manifests directory
2023-09-05 17:10:40 -07:00
Bruce MacDonald
d18282bfda
metal: add missing barriers for mul-mat (#469)
2023-09-05 19:37:13 -04:00
Michael Yang
9ae76ba8c9
Merge pull request #471 from jmorganca/mxyng/fix-empty-response
...
fix empty response
2023-09-05 15:23:05 -07:00
Michael Yang
2bc06565c7
fix empty response
2023-09-05 15:03:24 -07:00
Michael Yang
d1c2558f7e
Merge pull request #461 from jmorganca/mxyng/fix-inherit-params
...
fix inherit params
2023-09-05 12:30:23 -07:00
Michael Yang
7b5aefb427
Merge pull request #462 from jmorganca/mxyng/rm-marshal-prompt
...
remove marshalPrompt which is no longer needed
2023-09-05 11:48:41 -07:00
Michael Yang
06ef90c051
fix parameter inheritance
...
parameters are not inherited because they are processed differently from
other layers. fix this by explicitly merging the inherited params into
the new params. parameter values defined in the new modelfile will
override those defined in the inherited modelfile. array lists are
replaced instead of appended
2023-09-05 11:40:20 -07:00
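The merge rule described above can be sketched as a map overlay: new values override inherited ones, and lists are replaced rather than appended. Representing parameters as `map[string][]string` is an assumption made for illustration. It may not match how the commit stores them.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeParams overlays the parameters from the new Modelfile on top of the
// inherited ones. A key present in current replaces the inherited value as a
// whole, so list-valued parameters (such as several stop sequences) are
// replaced rather than appended. Slices are copied so the inputs stay untouched.
func mergeParams(inherited, current map[string][]string) map[string][]string {
	merged := make(map[string][]string, len(inherited)+len(current))
	for k, v := range inherited {
		merged[k] = append([]string(nil), v...)
	}
	for k, v := range current {
		merged[k] = append([]string(nil), v...)
	}
	return merged
}

func main() {
	base := map[string][]string{"temperature": {"0.8"}, "stop": {"###", "User:"}}
	child := map[string][]string{"stop": {"</s>"}, "num_ctx": {"4096"}}
	merged := mergeParams(base, child)

	keys := make([]string, 0, len(merged))
	for k := range merged {
		keys = append(keys, k)
	}
	sort.Strings(keys) // print in a stable order
	for _, k := range keys {
		fmt.Println(k, merged[k])
	}
}
```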
Michael Yang
7efbc84320
Merge pull request #464 from jmorganca/mxyng/fix-num-keep
...
fix num_keep
2023-09-05 11:30:45 -07:00
Michael Yang
e9f6df7dca
use slices.DeleteFunc
2023-09-05 09:56:59 -07:00
Jeffrey Morgan
7fa6e51686
generate binary dependencies based on GOARCH on macos (#459)
2023-09-05 12:53:57 -04:00
Michael Yang
8dc68417e7
Merge pull request #463 from jmorganca/mxyng/fix-last-token
...
fix not forwarding last token
2023-09-05 09:01:32 -07:00
Michael Yang
681f3c4c42
fix num_keep
2023-09-03 17:47:49 -04:00
Michael Yang
59a705525c
fix not forwarding last token
2023-09-03 17:46:50 -04:00
Michael Yang
5d3f314b0b
remove marshalPrompt which is no longer needed
2023-09-03 17:01:05 -04:00
Michael Yang
adaa13088b
Merge pull request #457 from sqs/dont-html-escape-prompt
...
do not HTML-escape prompt
2023-09-01 17:41:53 -07:00
Quinn Slack
62d29b2157
do not HTML-escape prompt
...
The `html/template` package automatically HTML-escapes interpolated strings in templates. This behavior is undesirable because it causes prompts like `<h1>hello` to be escaped to `&lt;h1&gt;hello` before being passed to the LLM.
The included test case passes, but before the code change, it failed:
```
--- FAIL: TestModelPrompt
images_test.go:21: got "a&lt;h1&gt;b", want "a<h1>b"
```
2023-09-01 17:16:38 -05:00
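The sketch below shows the difference between the two standard-library template packages on the same input as the test above. `vars` is a hypothetical stand-in for the template data passed to the prompt template.

```go
package main

import (
	"fmt"
	htmltemplate "html/template"
	"strings"
	"text/template"
)

// vars is a hypothetical stand-in for the data passed to a prompt template.
type vars struct{ Prompt string }

// renderText uses text/template, which inserts values verbatim. This is what
// a prompt destined for an LLM needs.
func renderText(tmpl, prompt string) (string, error) {
	t, err := template.New("prompt").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, vars{Prompt: prompt}); err != nil {
		return "", err
	}
	return sb.String(), nil
}

// renderHTML uses html/template, which escapes values for safe HTML output
// and therefore alters the prompt text.
func renderHTML(tmpl, prompt string) (string, error) {
	t, err := htmltemplate.New("prompt").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, vars{Prompt: prompt}); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	plain, _ := renderText("a{{.Prompt}}", "<h1>b")
	escaped, _ := renderHTML("a{{.Prompt}}", "<h1>b")
	fmt.Println(plain)   // a<h1>b
	fmt.Println(escaped) // a&lt;h1&gt;b
}
```

The two packages share the same template syntax, so switching the import is usually the whole fix.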
Michael Yang
ed19d10aa5
update readme (#451)
...
* update readme
* readme: more run examples
2023-09-01 16:44:14 -04:00
Michael Yang
36c2f45c40
Merge pull request #450 from jmorganca/mxyng/update-readme
...
update readme
2023-09-01 08:21:49 -07:00
Michael Yang
742226625f
update readme
2023-09-01 10:54:31 -04:00
Matt Williams
6bb8a16ccb
Merge pull request #273 from jmorganca/matt/moreexamples
...
Create a sentiments example
2023-08-31 16:31:59 -07:00
Jeffrey Morgan
a5dbcf2e73
app: don't package ggml-metal.metal
2023-08-31 17:41:09 -04:00
Michael Yang
9304f0e7a8
Merge pull request #443 from jmorganca/mxyng/fix-list-models
...
windows: fix filepath bugs
2023-08-31 14:19:10 -07:00
Michael Yang
6578b2f8a1
Merge pull request #448 from callmephilip/patch-1
...
fix spelling errors in example prompts
2023-08-31 08:57:07 -07:00
Michael Yang
1c8fd627ad
windows: fix create modelfile
2023-08-31 09:47:10 -04:00
Michael Yang
ae950b00f1
windows: fix delete
2023-08-31 09:47:10 -04:00
Michael Yang
eeb40a672c
fix list models for windows
2023-08-31 09:47:10 -04:00
Michael Yang
0f541a0367
s/ListResponseModel/ModelResponse/
2023-08-31 09:47:10 -04:00
Philip Nuzhnyi
1363f537ce
fix spelling errors in prompt
2023-08-31 10:02:46 +01:00
Jeffrey Morgan
bc3e21fdc6
update README.md
2023-08-30 17:56:14 -04:00
Jeffrey Morgan
a82eb275ff
update docs for subprocess
2023-08-30 17:54:02 -04:00
Bruce MacDonald
f964aea9a2
remove test not applicable to subprocess
2023-08-30 16:36:11 -04:00
Bruce MacDonald
42998d797d
subprocess llama.cpp server (#401)
...
* remove c code
* pack llama.cpp
* use request context for llama_cpp
* let llama_cpp decide the number of threads to use
* stop llama runner when app stops
* remove sample count and duration metrics
* use go generate to get libraries
* tmp dir for running llm
2023-08-30 16:35:03 -04:00
Quinn Slack
f4432e1dba
treat stop as stop sequences, not exact tokens (#442)
...
The `stop` option to the generate API is a list of sequences that should cause generation to stop. Although these are commonly called "stop tokens", they do not necessarily correspond to LLM tokens (per the LLM's tokenizer). For example, if the caller sends a generate request with `"stop":["\n"]`, then generation should stop on any token containing `\n` (and trim `\n` from the output), not just if the token exactly matches `\n`. If `stop` were interpreted strictly as LLM tokens, then it would require callers of the generate API to know the LLM's tokenizer and enumerate many tokens in the `stop` list.
Fixes https://github.com/jmorganca/ollama/issues/295.
2023-08-30 11:53:42 -04:00
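The following is a minimal sketch of matching stop sequences against the generated text rather than against token IDs. The function names are hypothetical, and this is not the implementation from the commit. It only illustrates the technique the message describes.

```go
package main

import (
	"fmt"
	"strings"
)

// cutAtStop reports whether any stop sequence occurs in text and, if so,
// returns the text before the earliest occurrence.
func cutAtStop(text string, stops []string) (string, bool) {
	cut := -1
	for _, s := range stops {
		if s == "" {
			continue // an empty sequence would match everywhere
		}
		if i := strings.Index(text, s); i >= 0 && (cut < 0 || i < cut) {
			cut = i
		}
	}
	if cut < 0 {
		return text, false
	}
	return text[:cut], true
}

// collect appends tokens until a stop sequence appears anywhere in the
// accumulated output. Because matching is on text rather than token IDs, a
// stop sequence inside a larger token, or split across tokens, still matches.
func collect(tokens, stops []string) (string, bool) {
	var sb strings.Builder
	for _, tok := range tokens {
		sb.WriteString(tok)
		if out, stopped := cutAtStop(sb.String(), stops); stopped {
			return out, true
		}
	}
	return sb.String(), false
}

func main() {
	out, stopped := collect([]string{"Hello", " world", ".\nNext", " line"}, []string{"\n"})
	fmt.Printf("%q %v\n", out, stopped) // "Hello world." true
}
```

A streaming server would also need to hold back any trailing text that could be the beginning of a stop sequence before sending it to the client. That step is omitted here.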
Michael Yang
982c535428
Merge pull request #428 from jmorganca/mxyng/upload-chunks
...
update upload chunks
2023-08-30 07:47:17 -07:00
Michael Yang
7df342a6ea
Merge pull request #421 from jmorganca/mxyng/f16-metal
...
allow F16 to use metal
2023-08-29 06:32:59 -07:00
Patrick Devine
8bbff2df98
add model IDs (#439)
2023-08-28 20:50:24 -07:00
Michael Yang
16b06699fd
remove unused parameter
2023-08-28 18:35:18 -04:00
Michael Yang
246dc65417
loosen http status code checks
2023-08-28 18:34:53 -04:00
Michael Yang
865fceb73c
chunked pipe
2023-08-28 18:34:53 -04:00
Michael Yang
72266c7684
bump chunk size to 95MB
2023-08-28 18:34:53 -04:00
Jeffrey Morgan
d3b838ce60
update orca to orca-mini
2023-08-27 13:26:30 -04:00
Michael Yang
e639a12fa1
Merge pull request #412 from jmorganca/mxyng/update-readme
...
update README.md
2023-08-26 21:26:34 -07:00
Michael Yang
e82fcf30c6
Merge pull request #420 from jmorganca/mxyng/34b-mem-check
...
add 34b to mem check
2023-08-26 14:15:52 -07:00
Michael Yang
495e8b0a6a
Merge pull request #426 from jmorganca/default-template
...
set default template
2023-08-26 14:15:38 -07:00
Michael Yang
59734ca24d
set default template
2023-08-26 12:20:48 -07:00
Jeffrey Morgan
22ab7f5f88
default host to 127.0.0.1, fixes #424
2023-08-26 11:59:28 -07:00
Michael Yang
b25dd1795d
allow F16 to use metal
...
warning: F16 uses significantly more memory than quantized models, so the
standard requirements don't apply.
2023-08-26 08:38:48 -07:00
Michael Yang
304f2b6c96
add 34b to mem check
2023-08-26 08:29:21 -07:00
Quinn Slack
2ecc3a33c3
delete all models (not just 1st) in ollama rm (#415)
...
Previously, `ollama rm model1 model2 modelN` would only delete `model1`. The other model command-line arguments would be silently ignored. Now, all models mentioned are deleted.
2023-08-26 00:47:56 -07:00
Jeffrey Morgan
ee6e1df118
add codellama to model list in readme
2023-08-25 20:44:26 -07:00
Jeffrey Morgan
177b69a211
add missing entries for 34B
2023-08-25 18:35:35 -07:00
Michael Yang
dad63f0821
Merge pull request #411 from jmorganca/mxyng/34b
...
patch llama.cpp for 34B
2023-08-25 11:59:05 -07:00
Michael Yang
041f9ad1a1
update README.md
2023-08-25 11:44:25 -07:00
Michael Yang
7a378f8b66
patch llama.cpp for 34B
2023-08-25 10:06:55 -07:00
Michael Yang
de0bdd7f29
Merge pull request #405 from jmorganca/mxyng/34b
...
add 34b model type
2023-08-24 10:37:22 -07:00
Michael Yang
b1cececb8e
add 34b model type
2023-08-24 10:35:44 -07:00
Michael Yang
e0d39fa3bf
Merge pull request #398 from jmorganca/mxyng/cleanup
...
Mxyng/cleanup
2023-08-22 15:51:41 -07:00
Michael Yang
968ced2e71
Merge pull request #393 from jmorganca/mxyng/net-url
...
use url.URL
2023-08-22 15:51:33 -07:00
Michael Yang
32d1a00017
remove unused requestContextKey
2023-08-22 10:49:54 -07:00
Michael Yang
04e2128273
move upload funcs to upload.go
2023-08-22 10:49:53 -07:00
Michael Yang
2cc634689b
use url.URL
2023-08-22 10:49:07 -07:00
Michael Yang
8f827641b0
Merge pull request #397 from jmorganca/mxyng/release-mode
...
build release mode
2023-08-22 10:48:44 -07:00
Michael Yang
95187d7e1e
build release mode
2023-08-22 09:52:43 -07:00
Michael Yang
9ec7e37534
Merge pull request #392 from jmorganca/mxyng/version
...
add version
2023-08-22 09:50:25 -07:00
Michael Yang
2c7f956b38
add version
2023-08-22 09:40:58 -07:00
Jeffrey Morgan
a9f6c56652
fix FROM instruction erroring when referring to a file
2023-08-22 09:39:42 -07:00
Ryan Baker
0a892419ad
Strip protocol from model path (#377)
2023-08-21 21:56:56 -07:00
Jeffrey Morgan
e3054fc74e
add .env to .dockerignore
2023-08-21 09:32:02 -07:00
Michael Yang
23c2485044
Merge pull request #381 from jmorganca/mxyng/fix-push-chunks
...
retry on unauthorized chunk push
2023-08-18 13:49:25 -07:00
Michael Yang
386c66f285
Merge pull request #378 from jmorganca/mxyng/copy-metadata-from-source
...
copy metadata from source
2023-08-18 13:49:09 -07:00
Michael Yang
3b49315f97
retry on unauthorized chunk push
...
The token printed for authorized requests has a lifetime of 1h. If an
upload exceeds 1h, a chunk push will fail since the token is created on
a "start upload" request.
This replaces the Pipe with SectionReader which is simpler and
implements Seek, a requirement for makeRequestWithRetry. This is
slightly worse than using a Pipe since the progress update is directly
tied to the chunk size instead of controlled separately.
2023-08-18 11:23:47 -07:00
Michael Yang
5ca05c2e88
fix ModelType()
2023-08-18 11:23:38 -07:00
Michael Yang
7eda70f23b
copy metadata from source
2023-08-17 21:55:25 -07:00
Jeffrey Morgan
3d79b414d3
app: package ggml-metal.metal from correct directory
2023-08-17 23:55:45 -04:00
Michael Yang
c84bbf1dd6
Merge pull request #376 from jmorganca/mxyng/from-map-ignore-nil
...
ignore nil map values
2023-08-17 15:57:12 -07:00
Michael Yang
f723bf0879
ignore nil map values
2023-08-17 15:50:46 -07:00
Michael Yang
cbf725a9ba
Merge pull request #375 from jmorganca/mxyng/fix-push
...
fix push manifest
2023-08-17 15:33:31 -07:00
Michael Yang
086449b6c7
fmt
2023-08-17 15:32:31 -07:00
Michael Yang
3cbc6a5c01
fix push manifest
2023-08-17 15:28:12 -07:00
Jeffrey Morgan
54bb49a502
parse protocol for OLLAMA_HOST
2023-08-17 18:20:44 -04:00
Michael Yang
cabaada956
Merge pull request #372 from jmorganca/mxyng/string-types
...
model and file type as strings
2023-08-17 15:10:59 -07:00
Michael Yang
a894cc792d
model and file type as strings
2023-08-17 12:08:04 -07:00
Bruce MacDonald
519f4d98ef
add embed docs for modelfile
2023-08-17 13:37:42 -04:00
Michael Yang
b963a83559
Merge pull request #364 from jmorganca/chunked-uploads
...
reimplement chunked uploads
2023-08-17 09:58:51 -07:00
Michael Yang
bf6688abe6
Merge pull request #360 from jmorganca/fix-request-copies
...
Fix request copies
2023-08-17 09:58:42 -07:00
Bruce MacDonald
6005b157c2
retry download on network errors
2023-08-17 10:31:45 -04:00
Patrick Devine
14220d9833
set the scopes correctly (#368)
2023-08-16 21:42:02 -07:00
Michael Chiang
8ca50f24f3
fix nous-hermes model file size listing in readme (#367)
...
fix nous-hermes model file size listing in readme
2023-08-16 23:42:00 -04:00
Michael Chiang
c149fc3143
Update README.md
2023-08-16 22:54:55 -04:00
Michael Chiang
afbc763dac
adding link to models directly available on ollama (#366)
...
- adding link to models directly available on ollama
- ability to push your own models to the library will come in the future
2023-08-16 22:53:27 -04:00
Michael Yang
5dfe91be8b
reimplement chunked uploads
2023-08-16 14:50:24 -07:00
Michael Yang
9f944c00f1
push: retry on unauthorized
2023-08-16 11:35:33 -07:00
Michael Yang
56e87cecb1
images: remove body copies
2023-08-16 10:30:41 -07:00
Jeffrey Morgan
5ee6116420
set default OLLAMA_HOST to http://localhost:11434
2023-08-16 12:22:59 -04:00
Michael Yang
5d9a4cd251
Merge pull request #348 from jmorganca/cross-repo-mount
...
cross repo blob mount
2023-08-16 09:20:36 -07:00
Michael Yang
0ebec07569
Merge pull request #345 from jmorganca/exit-non-zero
...
set non-zero error code on error
2023-08-16 09:20:28 -07:00
Matt Williams
08265515b3
Merge pull request #303 from jmorganca/matt/dockerit
...
DockerIt example
2023-08-16 08:04:34 -07:00
Blake Mizerany
67e593e355
cmd: support OLLAMA_CLIENT_HOST environment variable (#262)
...
* cmd: support OLLAMA_HOST environment variable
This commit adds support for the OLLAMA_HOST environment
variable. This variable can be used to specify the host to which
the client should connect. This is useful when the client is
running somewhere other than the host where the server is running.
The new api.FromEnv function is used to configure clients from the
environment. Clients that want to honor the environment variable in the
same way as the Ollama CLI can use this new function.
* Update api/client.go
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update api/client.go
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-08-16 11:03:48 -04:00
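The sketch below shows how a client might resolve OLLAMA_HOST. It combines behaviour described elsewhere in this log: the default host 127.0.0.1 from the fix for #424, protocol parsing, and IPv6 parsing. `resolveHost` is a hypothetical helper, not the signature of `api.FromEnv`. Defaulting the port to 11434 for every scheme is also an assumption.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strings"
)

// resolveHost is a hypothetical helper (not the real api.FromEnv) that turns
// an OLLAMA_HOST-style value into a scheme and a dialable host:port.
// Assumed defaults: http, 127.0.0.1 and port 11434 for any missing piece.
func resolveHost(value string) (string, string) {
	scheme, host, port := "http", "127.0.0.1", "11434"
	value = strings.TrimSpace(value)
	if s, rest, ok := strings.Cut(value, "://"); ok {
		scheme, value = s, rest
	}
	if h, p, err := net.SplitHostPort(value); err == nil {
		if h != "" {
			host = h
		}
		if p != "" {
			port = p
		}
	} else if value != "" {
		// No port given: accept "::1" or "[::1]" as well as plain hosts.
		host = strings.Trim(value, "[]")
	}
	// JoinHostPort adds the brackets an IPv6 literal needs.
	return scheme, net.JoinHostPort(host, port)
}

func main() {
	scheme, hostport := resolveHost(os.Getenv("OLLAMA_HOST"))
	fmt.Printf("%s://%s\n", scheme, hostport)
}
```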
Jeffrey Morgan
d15c7622b9
Update orca to orca-mini in README.md
2023-08-15 21:10:28 -04:00
Bruce MacDonald
1deb35ca64
use loaded llm for generating model file embeddings
2023-08-15 16:12:02 -03:00
Bruce MacDonald
e2de886831
do not regenerate embeddings
2023-08-15 16:10:22 -03:00
Bruce MacDonald
f0d7c2f5ea
retry download on network errors
2023-08-15 15:07:19 -03:00
Bruce MacDonald
12052a7624
always remove from in progress map on download
2023-08-15 13:20:32 -03:00
Bruce MacDonald
23e1da778d
Add context to api docs
2023-08-15 11:43:22 -03:00
Bruce MacDonald
326de48930
use loaded llm for embeddings
2023-08-15 10:50:54 -03:00
Bruce MacDonald
18f2cb0472
don't log fatal
2023-08-15 10:39:59 -03:00
Bruce MacDonald
53bc36d207
Update modelfile.md
2023-08-15 09:23:36 -03:00
Michael Yang
4dcf5c3e0b
Merge pull request #349 from jmorganca/close-files
...
close open files
2023-08-14 16:15:58 -07:00
Michael Yang
d1b2f532b9
Merge pull request #350 from jmorganca/update-llama-cpp
...
update llama.cpp
2023-08-14 16:15:51 -07:00
Michael Yang
e26085b921
close open files
2023-08-14 16:08:06 -07:00
Michael Yang
f7b613332c
update llama.cpp
2023-08-14 15:47:00 -07:00
Michael Yang
f594c8eb91
cross repo mount
2023-08-14 15:07:35 -07:00
Michael Yang
76b85bc0e9
set non-zero error code on error
2023-08-14 14:09:58 -07:00
Bruce MacDonald
af98a1773f
update python example
2023-08-14 16:38:44 -03:00
Bruce MacDonald
9ae9a89883
Update modelfile.md
2023-08-14 16:26:53 -03:00
Bruce MacDonald
648f0974c6
python example
2023-08-14 15:27:13 -03:00
Bruce MacDonald
fc5230dffa
Add context to api docs
2023-08-14 15:23:24 -03:00
Bruce MacDonald
2ab20095b3
log embedding eval timing
2023-08-14 12:15:55 -04:00
Bruce MacDonald
f020e1d519
always remove from in progress map on download
2023-08-14 13:09:20 -03:00
Bruce MacDonald
4b2d366c37
Update llama.go
2023-08-14 12:55:50 -03:00
Bruce MacDonald
56fd4e4ef2
log embedding eval timing
2023-08-14 12:51:31 -03:00
Bruce MacDonald
2c8b680b03
use file info for embeddings cache
2023-08-14 12:11:04 -03:00
Bruce MacDonald
99b6b60085
use model bin digest for embed digest
2023-08-14 11:57:12 -03:00
Bruce MacDonald
74f00474e1
Merge pull request #340 from gusanmaz/main
...
Update langchainpy.md
2023-08-14 09:38:42 -04:00
Bruce MacDonald
e9a9580bdd
do not regenerate embeddings
...
- re-use previously evaluated embeddings when possible
- change embeddings digest identifier to be based on model name and embedded file path
2023-08-14 10:34:17 -03:00
Güvenç Usanmaz
4c33a9ac67
Update langchainpy.md
...
base_url value for Ollama object creation is corrected.
2023-08-14 12:12:56 +03:00
Jeffrey Morgan
22885aeaee
update llama.cpp to f64d44a
2023-08-12 22:47:15 -04:00
Jeffrey Morgan
ed969d2a06
add LiteLLM to README.md
2023-08-12 20:47:57 -04:00
Patrick Devine
d9cf18e28d
add maximum retries when pushing (#334)
2023-08-11 15:41:55 -07:00
Jeffrey Morgan
1556162c90
create .ollama directory if it doesn't exist
2023-08-11 15:35:55 -07:00
Jeffrey Morgan
148f0225c0
create .ollama directory if it doesn't exist
2023-08-11 15:33:11 -07:00
Matt Williams
4e07941b1e
Merge pull request #329 from jmorganca/matt/tutorials
...
Add tutorials for using Langchain with ollama
2023-08-11 15:19:39 -07:00
Matt Williams
202c29c21a
resolving bmacd comment
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-11 13:51:44 -07:00
Matt Williams
c1c871620a
Update docs/tutorials/langchainjs.md
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-08-11 13:48:46 -07:00
Matt Williams
a21a8bef56
Update docs/tutorials/langchainjs.md
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-08-11 13:48:35 -07:00
Matt Williams
522726228a
Update docs/tutorials.md
...
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-08-11 13:48:16 -07:00
Patrick Devine
9770e3b325
Generate private/public keypair for use w/ auth (#324)
2023-08-11 10:58:23 -07:00
Michael Yang
d617823355
Merge pull request #333 from jmorganca/off-by-one
...
ggml: fix off by one error
2023-08-11 10:51:06 -07:00
Michael Yang
6ed991c8e2
ggml: fix off by one error
...
remove unused Unknown FileType
2023-08-11 10:45:22 -07:00
Michael Chiang
e41576e768
Merge branch 'new-syntax' of https://github.com/jmorganca/ollama into new-syntax
2023-08-11 09:00:43 -07:00
Michael Chiang
155c1640f1
add demo video
2023-08-11 08:58:57 -07:00
Jeffrey Morgan
f7d4947573
update header note for privategpt example
2023-08-11 08:52:26 -07:00
Jeffrey Morgan
0d7a133b15
Update README.md for privategpt
2023-08-11 08:29:19 -07:00
Jeffrey Morgan
e863066144
clean up privategpt example
2023-08-11 00:34:52 -07:00
Jeffrey Morgan
89a92477ad
fix README.md for privategpt example
2023-08-11 00:26:33 -07:00
Jeffrey Morgan
5cda9cdd13
add instructions to privategpt example to try another model
2023-08-11 00:23:31 -07:00
Jeffrey Morgan
e5914eb320
add venv instructions to privategpt example
2023-08-11 00:20:22 -07:00
Jeffrey Morgan
ab78f48ff8
more setup instructions for privategpt example
2023-08-11 00:19:25 -07:00
Jeffrey Morgan
b1c88eb978
add privategpt example
2023-08-11 00:18:13 -07:00
Jeffrey Morgan
efae43f932
update langchain examples
2023-08-10 23:35:19 -07:00
Matt Williams
d3ee1329e9
Add tutorials for using Langchain with ollama
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-10 21:27:37 -07:00
Jeffrey Morgan
700c719422
remove document example for now
2023-08-10 20:25:01 -07:00
Jeffrey Morgan
55aa4aaf0f
add langchain examples
2023-08-10 20:23:50 -07:00
Jeffrey Morgan
820f95c4c4
add example
2023-08-10 20:13:47 -07:00
Michael Yang
3a05d3def7
Merge pull request #326 from asarturas/document-num-gqa-parameter
...
Document num_gqa parameter
2023-08-10 18:18:38 -07:00
Michael Yang
edac9c2446
Merge pull request #325 from jmorganca/mxyng/typo
...
s/parmeter/parameter/
2023-08-10 17:30:02 -07:00
Arturas Smorgun
d9c2687fd0
document default num_gqa to 1, as it's applicable to most models
...
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-08-11 01:29:40 +01:00
Michael Yang
6517bcc53c
Merge pull request #290 from jmorganca/add-adapter-layers
...
implement loading ggml lora adapters through the modelfile
2023-08-10 17:23:01 -07:00
Michael Yang
4f54f25b66
Merge pull request #272 from jmorganca/decode-ggml-2
...
Decode ggml 2: Use decoded values
2023-08-10 17:22:48 -07:00
Michael Yang
6a6828bddf
Merge pull request #167 from jmorganca/decode-ggml
...
partial decode ggml bin for more info
2023-08-10 17:22:40 -07:00
Arturas Smorgun
c0e7a3b90e
Document num_gqa parameter
...
It is required to be adjusted for some models, see https://github.com/jmorganca/ollama/issues/320 for more context
2023-08-11 00:58:09 +01:00
Michael Yang
f27bc261cf
s/parmeter/parameter/
2023-08-10 16:26:06 -07:00
Michael Yang
21e6197c0b
Merge pull request #322 from jmorganca/no-comment-warning
...
no warning on comments
2023-08-10 16:24:41 -07:00
Michael Yang
75d7d681c9
Merge pull request #323 from jmorganca/fix-convert-int
...
fix could not convert int
2023-08-10 16:24:33 -07:00
Michael Yang
81d8d7b73f
fix could not convert int
2023-08-10 16:24:17 -07:00
Michael Yang
5c0de09a07
Merge pull request #321 from jmorganca/fix-parameters
...
length check for parameters
2023-08-10 16:23:10 -07:00
Michael Yang
20bf000e55
no warning on comments
2023-08-10 16:22:38 -07:00
Michael Yang
40d0c4a1dc
length check for parameters
2023-08-10 16:09:02 -07:00
Jeffrey Morgan
be889b2f81
add docs for /api/embeddings
2023-08-10 15:56:59 -07:00
Jeffrey Morgan
7e26a8df31
cmd: use environment variables for server options
2023-08-10 14:17:53 -07:00
Jeffrey Morgan
4ab1da38ba
guard around id()
2023-08-10 14:11:54 -07:00
Patrick Devine
be989d89d1
Token auth (#314)
2023-08-10 11:34:25 -07:00
Soroush Javadi
bea683e3bf
cmd: check GetBlobsPath error (#317)
...
The error returned by `server.GetBlobsPath` in `showLayer` was never
checked. Check the error and return if not nil. Also, make newlines at
the end of error messages consistent and fix a typo.
2023-08-10 09:57:49 -07:00
Jeffrey Morgan
178237d37f
tweak README.md
2023-08-10 09:54:03 -07:00
Jeffrey Morgan
76a678af34
app: don't always show installer window on top now that it lives in the dock
2023-08-10 09:53:46 -07:00
Jeffrey Morgan
f65169b13e
clean up cli flags
2023-08-10 09:28:56 -07:00
Jeffrey Morgan
040a5b9750
clean up cli flags
2023-08-10 09:27:03 -07:00
Michael Yang
37c9a8eea9
add lora docs
2023-08-10 09:23:40 -07:00
Michael Yang
6de5d032e1
implement loading ggml lora adapters through the modelfile
2023-08-10 09:23:39 -07:00
Michael Yang
d791df75dd
check memory requirements before loading
2023-08-10 09:23:11 -07:00
Michael Yang
020a3b3530
disable gpu for q5_0, q5_1, q8_0 quants
2023-08-10 09:23:11 -07:00
Michael Yang
fccf8d179f
partial decode ggml bin for more info
2023-08-10 09:23:10 -07:00
Bruce MacDonald
5b5cc9c9f1
embeddings endpoint
2023-08-10 11:49:55 -04:00
Bruce MacDonald
4b3507f036
embeddings endpoint
...
Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>
2023-08-10 11:45:57 -04:00
Jun Tian
5ebce03c77
Add an example on multiline input (#311)
2023-08-10 08:22:28 -07:00
Bruce MacDonald
5e25f801ed
fix a typo in the tweetwriter example Modelfile
2023-08-10 10:19:53 -04:00
Bruce MacDonald
8e1234b758
fix embeddings invalid values
2023-08-10 10:17:00 -04:00
Soroush Javadi
10885986b8
fix a typo in the tweetwriter example Modelfile
2023-08-10 15:12:48 +03:30
Bruce MacDonald
984c9c628c
fix embeddings invalid values
2023-08-09 16:50:53 -04:00
Bruce MacDonald
43c40c500e
add embed docs for modelfile
2023-08-09 16:14:58 -04:00
Bruce MacDonald
c4861360ec
remove embed docs
2023-08-09 16:14:19 -04:00
Bruce MacDonald
9738ef85db
allow for concurrent pulls of the same files
2023-08-09 11:35:24 -04:00
Bruce MacDonald
ac971c56d1
Update images.go
2023-08-09 11:31:54 -04:00
Bruce MacDonald
8228d166ce
pr comments
2023-08-09 11:31:54 -04:00
Bruce MacDonald
907e6c56b3
unlock download in case of requestDownload err
2023-08-09 11:31:54 -04:00
Bruce MacDonald
868e3b31c7
allow for concurrent pulls of the same files
2023-08-09 11:31:54 -04:00
Bruce MacDonald
09d8bf6730
fix build errors
2023-08-09 10:45:57 -04:00
Bruce MacDonald
7a5f3616fd
embed text document in modelfile
2023-08-09 10:26:19 -04:00
Jeffrey Morgan
cff002b824
use content type application/x-ndjson for streaming responses
2023-08-08 21:38:10 -07:00
Jeffrey Morgan
55cf5021f0
update langchain example to include python
2023-08-08 21:03:10 -07:00
Jeffrey Morgan
f58caa5ab5
update README.md
2023-08-08 15:50:23 -07:00
Jeffrey Morgan
82df473ec9
use note syntax in README.md
2023-08-08 15:49:50 -07:00
Jeffrey Morgan
e184c1d035
Link to api.md in README.md
2023-08-08 15:48:47 -07:00
Jeffrey Morgan
371d4e5df3
docs: fix invalid json in api.md
2023-08-08 15:46:05 -07:00
Jeffrey Morgan
1f78e409b4
docs: format with prettier
2023-08-08 15:41:48 -07:00
Jeffrey Morgan
34a88cd776
docs: update api.md formatting
2023-08-08 15:41:19 -07:00
Bruce MacDonald
1bee2347be
pr feedback
...
- defer closing llm on embedding
- do not override licenses
- remove debugging print line
- reformat model file docs
2023-08-08 17:01:37 -04:00
Jeffrey Morgan
a027a7dd65
add 0.0.0.0 as an allowed origin by default
...
Fixes #282
2023-08-08 13:39:50 -07:00
Jeffrey Morgan
22986ccb38
add llama2:70b to the model library list
2023-08-08 13:08:05 -07:00
Bruce MacDonald
884d78ceb3
allow embedding from model binary
2023-08-08 14:38:57 -04:00
Bruce MacDonald
3ceac05108
Add embedding docs
2023-08-08 14:04:11 -04:00
Bruce MacDonald
21ddcaa1f1
pr comments
...
- default to embeddings enabled
- move embedding logic for loaded model to request
- allow embedding full directory
- close llm on reload
2023-08-08 13:49:37 -04:00
Michael Yang
f2074ed4c0
Merge pull request #306 from jmorganca/default-keep-system
...
automatically set num_keep if num_keep < 0
2023-08-08 09:25:34 -07:00
Bruce MacDonald
a6f6d18f83
embed text document in modelfile
2023-08-08 11:27:17 -04:00
Bruce MacDonald
34a13a9d05
pass flags to serve to allow setting allowed-origins + host and port
2023-08-08 10:41:42 -04:00
Jeffrey Morgan
8713ac23a8
allow overriding template and system in /api/generate
...
Fixes #297
Fixes #296
2023-08-08 00:55:34 -04:00
Jeffrey Morgan
5eb712f962
trim whitespace before checking stop conditions
...
Fixes #295
2023-08-08 00:29:19 -04:00
Michael Yang
4dc5b117dd
automatically set num_keep if num_keep < 0
...
num_keep defines how many tokens to keep in the context when truncating
inputs. if left to its default value of -1, the server will calculate
num_keep to be the length of the system instructions
2023-08-07 16:19:12 -07:00
Matt Williams
931a5f3cb9
Merge pull request #304 from jmorganca/matt/docs
...
missed a backtick
2023-08-07 15:14:06 -07:00
Jeffrey Morgan
639288bf2b
make ollama binary executable on build
2023-08-07 18:10:37 -04:00
Jeffrey Morgan
d112c15d58
remove old library and web directories
2023-08-07 18:09:24 -04:00
Matt Williams
1267895e44
missed a backtick
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-07 13:53:49 -07:00
Matt Williams
089d03bc8d
Merge pull request #289 from jmorganca/docs
...
First draft of API Docs
2023-08-07 13:46:22 -07:00
Matt Williams
e37f4c4f42
DockerIt example
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-07 13:45:22 -07:00
Michael Yang
ab3ced9d32
Merge pull request #276 from jmorganca/rope-freq
...
configurable rope frequency parameters
2023-08-07 13:39:38 -07:00
Matt Williams
0c52b4509b
get rid of namespace and site
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-07 13:27:58 -07:00
Matt Williams
13aace3d34
clarify some more
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-07 13:21:54 -07:00
Matt Williams
2b3bb41598
model name format added
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-07 13:17:16 -07:00
cmiller01
93492f1e18
correct precedence of serve params (args over env over default)
2023-08-07 19:55:20 +00:00
Michael Chiang
54ba3e2ceb
langchain JS integration (#302)
...
langchain JS integration
2023-08-07 12:21:36 -04:00
Matt Williams
4904cd8bcd
update simpler code samples
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-07 07:40:38 -07:00
Matt Williams
8a45359ec6
Update docs/api.md
...
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-08-07 07:33:05 -07:00
cmiller01
fb593b7bfc
pass flags to serve to allow setting allowed-origins + host and port
...
* resolves: https://github.com/jmorganca/ollama/issues/300 and
https://github.com/jmorganca/ollama/issues/282
* example usage:
```
ollama serve --port 9999 --allowed-origins "http://foo.example.com,http://192.0.0.1"
```
2023-08-07 03:34:37 +00:00
Matt Williams
2544b8afa1
update as per Mike's comments
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 17:42:24 -07:00
Matt Williams
ac1b04f271
Update docs/api.md
...
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-08-04 17:40:52 -07:00
Matt Williams
123fdeb919
Update docs/api.md
...
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-08-04 17:38:52 -07:00
Matt Williams
5c82bf95d1
Update docs/api.md
...
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-08-04 17:12:24 -07:00
Matt Williams
38a9b1618c
missed some quotes
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 16:09:07 -07:00
Matt Williams
c18be72a3b
complete 1st draft of api docs
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 16:08:11 -07:00
Matt Williams
a101fe51a7
clean up
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 12:56:41 -07:00
Bruce MacDonald
06fc48ad66
Update README.md (#285)
...
Ollama now supports Intel Macs
2023-08-04 15:45:55 -04:00
Matt Williams
d93e2f9210
fleshing out response
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 12:38:58 -07:00
Matt Williams
31edc829fc
continuing
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 12:30:23 -07:00
Matt Williams
b31104768c
filling out generate
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 12:27:47 -07:00
Matt Williams
b662d9fd8c
starting to build out some docs
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 11:55:00 -07:00
Matt Williams
da36196d79
Update the modelfile
...
needed to override the system prompt
from orca and make it easier for a downstream
user to define their system prompt
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 08:11:24 -07:00
Michael Yang
b9f4d67554
configurable rope frequency parameters
2023-08-03 22:11:58 -07:00
Matt Williams
42903973b7
Added an example to generate a list of 10 tweets
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-03 17:26:05 -07:00
Matt Williams
8f2df948ab
Create a sentiments example
...
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-03 16:38:31 -07:00
Jeffrey Morgan
e3fb1fd3f1
server: compare options correctly
2023-08-03 15:55:40 -04:00
Michael Yang
29b897f525
Merge pull request #253 from jmorganca/upload
...
use a pipe to push to registry with progress
2023-08-03 12:11:23 -07:00
Michael Yang
85aeb42869
Merge pull request #270 from jmorganca/update-llama-cpp
...
update llama.cpp
2023-08-03 12:09:00 -07:00
Michael Yang
c5bcf32823
update llama.cpp
2023-08-03 11:50:24 -07:00
Michael Yang
a71ff3f6a2
use a pipe to push to registry with progress
...
switch to a monolithic upload instead of a chunk upload through a pipe
to report progress
2023-08-03 10:37:13 -07:00
Michael Chiang
f0b365a478
Merge pull request #268 from jmorganca/mchiang0610-patch-2
...
Update README.md
2023-08-03 11:23:31 -04:00
Michael Chiang
df8048fecd
Update README.md
2023-08-03 11:22:57 -04:00
Michael Yang
da2459d519
Update README.md (#265)
2023-08-02 22:38:32 -04:00
Bruce MacDonald
bd6d741d87
tell users to check the server error logs
2023-08-02 17:08:11 -04:00
Bruce MacDonald
8b1e791820
allow specifying zero values in modelfile
2023-08-02 17:07:53 -04:00
Jeffrey Morgan
03cff3a225
server: reset digest at end of generate
2023-08-02 16:15:44 -04:00
Michael Yang
cc509a994e
Merge pull request #260 from jmorganca/embed-ggml-metal
...
override ggml-metal if the file is different
2023-08-02 13:01:46 -07:00
Michael Yang
0e79e52ddd
override ggml-metal if the file is different
2023-08-02 12:50:30 -07:00
Jeffrey Morgan
6fbb380076
hide dock icon if window closes
2023-08-02 11:05:34 -04:00
Bruce MacDonald
8f8b6288ac
check server is running before running command
2023-08-02 10:51:23 -04:00
Michael Yang
b98096389d
Merge pull request #255 from jmorganca/update-llama-cpp
...
Update llama cpp
2023-08-01 17:18:33 -07:00
Michael Yang
74a5f7e698
no gpu for 70B model
2023-08-01 17:12:50 -07:00
Michael Yang
7a1c3e62dc
update llama.cpp
2023-08-01 16:54:01 -07:00
Jeffrey Morgan
da52f5bfdd
run npm install on build
2023-08-01 17:41:25 -04:00
Bruce MacDonald
50e87c6691
read from os executable
2023-08-01 16:01:55 -04:00
Gerd
e4a970ece1
Add model update to README.md (#252)
2023-08-01 15:06:33 -04:00
Jeffrey Morgan
4ca43a694c
remove newlines between list items in README.md
2023-08-01 15:05:39 -04:00
Bruce MacDonald
765994362c
use head to check heartbeat
2023-08-01 14:50:38 -04:00
Bruce MacDonald
40a25bf8c3
pr comments
2023-08-01 13:48:48 -04:00
Bruce MacDonald
1c5a8770ee
read runner parameter options from map
...
- read runner options from map to see what was specified explicitly and overwrite zero values
2023-08-01 13:38:19 -04:00
Bruce MacDonald
daa0d1de7a
allow specifying zero values in modelfile
2023-08-01 13:37:50 -04:00
Jeffrey Morgan
58daeb962a
add llama2-uncensored to model list
2023-08-01 11:25:01 -04:00
Jeffrey Morgan
528bafa585
cache loaded model
2023-08-01 11:24:18 -04:00
Michael Chiang
81f75696e2
Merge pull request #251 from jmorganca/mchiang0610-patch-2
...
add examples of projects using Ollama
2023-08-01 11:16:14 -04:00
Michael Chiang
8bdcf894bd
Update README.md
...
add examples of projects using Ollama
2023-08-01 11:14:54 -04:00
Michael Chiang
fe530423a5
Merge pull request #249 from sestinj/main
...
Add "Awesome projects built with Ollama" section to README, including Continue
2023-08-01 08:07:50 -07:00
Michael Yang
05e390205b
Merge pull request #250 from jmorganca/fixes
...
Fixes
2023-07-31 21:47:42 -07:00
Michael Yang
872011630a
fix license
2023-07-31 21:46:48 -07:00
Michael Yang
203fdbc4b8
check err
2023-07-31 21:46:48 -07:00
Michael Yang
70e0ab6b3d
remove unnecessary fmt.Sprintf
2023-07-31 21:46:47 -07:00
Michael Yang
319f078dd9
remove -Werror
...
there are compile warnings on Linux which -Werror elevates to errors,
preventing compile
2023-07-31 21:45:56 -07:00
Jeffrey Morgan
9968153729
fix Go warnings
2023-07-31 21:37:40 -04:00
Jeffrey Morgan
7da249fcc1
only build metal for darwin,arm target
2023-07-31 21:35:23 -04:00
Bruce MacDonald
f529626c6c
log prediction failures
2023-07-31 17:39:20 -04:00
Bruce MacDonald
36d6081ed1
find symlink of mac app
2023-07-31 17:38:10 -04:00
Nate Sesti
aadedda486
Update README.md
2023-07-31 13:59:39 -07:00
Bruce MacDonald
671eec6da9
log prediction failures
2023-07-31 16:46:37 -04:00
Bruce MacDonald
e72fe7945f
check server is running before running command
2023-07-31 16:25:57 -04:00
Bruce MacDonald
d1c098b038
tell users to check the server error logs
2023-07-31 11:49:33 -04:00
Jeffrey Morgan
90ba0b80c7
fix build_darwin.sh
2023-07-29 22:36:59 -04:00
Patrick Devine
39bb25d5f6
allow multiline text using three double-quotes (#239)
2023-07-29 13:35:23 -07:00
Michael Yang
eadee46840
Merge pull request #236 from jmorganca/check-os-walk
...
check os.Walk err
2023-07-28 14:14:21 -07:00
Jeffrey Morgan
2e2e624d21
app: use notarytool for notarizing
2023-07-28 12:23:56 -07:00
Jeffrey Morgan
ed832ce3b7
darwin build script
2023-07-28 12:23:27 -07:00
Michael Yang
227da16909
Merge pull request #235 from jmorganca/rm-ioutil
...
remove io/ioutil import
2023-07-28 12:19:06 -07:00
Michael Yang
bd58528fbd
check os.Walk err
2023-07-28 12:15:31 -07:00
Michael Yang
c5e447a359
remove io/ioutil import
...
ioutil is deprecated
2023-07-28 12:06:03 -07:00
Michael Yang
fc40a4f166
Merge pull request #234 from jmorganca/fix-parse-license
...
use max scan token size to hold large objects
2023-07-28 12:03:51 -07:00
Michael Yang
9c7f30d31c
use max scan token size to hold large objects
2023-07-28 11:43:31 -07:00
Bruce MacDonald
6ed3ec0cb3
Allow specifying stop conditions in Modelfile
2023-07-28 12:31:08 -04:00
Bruce MacDonald
47bda0b860
add stop to docs
2023-07-28 12:30:27 -04:00
Jeffrey Morgan
c75cafdb58
build for universal architecture on macos
2023-07-28 12:18:11 -04:00
Bruce MacDonald
f5cbcb08e6
specify stop params separately
2023-07-28 11:29:00 -04:00
Jeffrey Morgan
67b6f8ba86
add ggml-metal.metal to .gitignore
2023-07-28 11:04:21 -04:00
Bruce MacDonald
184ad8f057
allow specifying stop conditions in modelfile
2023-07-28 11:02:04 -04:00
Jeffrey Morgan
822a0e36eb
lower batch size to 512
2023-07-28 10:56:21 -04:00
Jeffrey Morgan
18b6b601ad
app: cleanup README.md
2023-07-28 10:51:41 -04:00
Bruce MacDonald
0345070dfa
update model file docs
2023-07-28 10:33:52 -04:00
Jeffrey Morgan
dffc8b6e09
update llama.cpp to d91f3f0
2023-07-28 08:07:48 -04:00
Jeffrey Morgan
0871083776
app: fix tray icon color scheme in dark mode
2023-07-28 07:03:46 -04:00
Michael Yang
e5b26c3aa2
Merge pull request #221 from jmorganca/embed-metal
...
embed ggml-metal.metal
2023-07-27 17:24:41 -07:00
Michael Yang
3549676678
embed ggml-metal.metal
2023-07-27 17:23:29 -07:00
Michael Yang
8fa477fadb
Merge pull request #225 from jmorganca/stop-conditions
...
add stop conditions
2023-07-27 17:20:56 -07:00
Michael Yang
fadf75f99d
add stop conditions
2023-07-27 17:00:47 -07:00
Patrick Devine
01d155c969
show system/template/license layers from cmd prompt (#223)
2023-07-27 16:58:40 -07:00
Michael Yang
5685c16d4e
Merge pull request #211 from jmorganca/update-llama-cpp
...
update llama.cpp
2023-07-27 16:57:03 -07:00
Michael Yang
db77dfe01f
Merge pull request #102 from jmorganca/session-id
...
Session
2023-07-27 16:46:29 -07:00
Michael Yang
ad3a7d0e2c
add NumGQA
2023-07-27 14:05:11 -07:00
Michael Yang
18ffeeec45
update llama.cpp
2023-07-27 14:05:11 -07:00
Jeffrey Morgan
688661ab9b
increase default batch size to 1024
2023-07-27 16:51:01 -04:00
Michael Chiang
36ad90e8e3
Merge pull request #231 from jmorganca/mchiang0610-discord
...
Update discord invite link
2023-07-27 15:43:52 -04:00
Michael Chiang
6fff59c637
Update discord invite link
...
Update discord invite link
2023-07-27 15:43:15 -04:00
Bruce MacDonald
fee7687cf3
Update modelfile.md
2023-07-27 15:15:10 -04:00
Bruce MacDonald
d3bfb4889c
Update README.md
2023-07-27 15:13:50 -04:00
Bruce MacDonald
1ac38ec89c
improve modelfile docs
2023-07-27 15:13:04 -04:00
Michael Yang
1ad8266473
Merge pull request #226 from jmorganca/fix-modelfile-quotes
...
refactor scan multiline for reuse
2023-07-27 11:45:41 -07:00
Michael Yang
f5ac8ddfb4
refactor scan multiline for reuse
2023-07-27 11:30:51 -07:00
Michael Yang
cca61181cb
sample metrics
2023-07-27 09:31:44 -07:00
Michael Yang
c490416189
lock on llm.lock(); decrease batch size
2023-07-27 09:31:44 -07:00
Michael Yang
f62a882760
add session expiration
2023-07-27 09:31:44 -07:00
Michael Yang
3003fc03fc
update predict code
2023-07-27 09:31:44 -07:00
Michael Yang
32aec66e6a
add load duration
2023-07-27 09:31:44 -07:00
Michael Yang
35af37a2cb
session id
2023-07-27 09:31:44 -07:00
Jeffrey Morgan
dbb3174cbc
app: fix #218 and keep dock open on install
2023-07-27 10:53:38 -04:00
Jeffrey Morgan
31673d26d0
app: quit other instance when starting
2023-07-27 00:57:25 -04:00
Jeffrey Morgan
8ba0f328af
clobber release artifacts
2023-07-26 18:58:28 -04:00
Jeffrey Morgan
d0e934b497
app: tray cleanup
2023-07-26 14:24:56 -04:00
Jeffrey Morgan
e751e47d70
app: remove dialog, icons for updates
2023-07-26 14:04:36 -04:00
Jeffrey Morgan
19d0f2b4cc
publish as pre-release first
2023-07-26 10:48:49 -04:00
Jeffrey Morgan
c48f07f821
app: don't advance on error
2023-07-26 10:46:43 -04:00
Jeffrey Morgan
dc642aa07d
web: skip pre-releases
2023-07-25 17:11:57 -04:00
Bruce MacDonald
f1ff892fdd
pull model on make if not present locally
2023-07-25 16:53:01 -04:00
Jeffrey Morgan
3f2a100465
app: log app errors to console
2023-07-25 15:42:04 -04:00
Michael Yang
95397416f3
Merge pull request #212 from jmorganca/fix-multiline-parsing
...
fix multiline string
2023-07-25 11:53:51 -07:00
Michael Yang
8a86aae019
Merge pull request #209 from jmorganca/k-quants
...
enable k quants
2023-07-25 11:53:29 -07:00
Michael Yang
24c2c77057
fix multiline string
...
the data needs to remove the multiline quotes but include the command:
e.g.
TEMPLATE """
my template values
"""
should be
TEMPLATE
my template values
after scanning
2023-07-25 11:51:43 -07:00
Michael Yang
5614984f06
Merge pull request #189 from Mohit-Gaur/main
...
Improve command parsing and multiline string handling
2023-07-25 11:28:10 -07:00
Bruce MacDonald
4c1caa3733
download models when creating from modelfile
2023-07-25 14:25:13 -04:00
Bruce MacDonald
12ab8f8f5f
Revert "pull model on make if not present locally"
...
This reverts commit 360a10ace391a674de60aa7b9b8cb65e8074027c.
2023-07-25 14:18:46 -04:00
Bruce MacDonald
8ebbd12f21
pull model on make if not present locally
2023-07-25 14:18:46 -04:00
Eva Ho
07971759fa
fix typo
2023-07-25 13:30:52 -04:00
Mohit Gaur
f5f79049c2
Incorporate code review improvements
2023-07-25 22:52:23 +05:30
Michael Yang
726bc647b2
enable k quants
2023-07-25 08:39:58 -07:00
Bruce MacDonald
af9039a167
better error message when model not found on pull
2023-07-25 10:30:48 -04:00
Bruce MacDonald
07ed69bc37
remove redundant err var
2023-07-25 10:30:14 -04:00
Michael Yang
0deb3767fc
Merge pull request #205 from jmorganca/accelerate
...
enable accelerate
2023-07-24 20:06:05 -07:00
Michael Yang
cb55fa9270
enable accelerate
2023-07-24 17:14:45 -07:00
Michael Yang
93bc9f17a1
Merge pull request #192 from jmorganca/update-development.md
...
update development.md
2023-07-24 16:13:22 -07:00
Bruce MacDonald
536028c35a
better error message when model not found on pull
2023-07-24 17:48:17 -04:00
Michael Chiang
aedf3d1f38
Merge pull request #196 from isbkch/main
...
add devops-engineer example
2023-07-24 17:10:22 -04:00
iLyas Bakouch
91d927abc5
Update Modelfile
2023-07-24 16:43:11 -04:00
iLyas Bakouch
ba8df10a43
Update examples/devops-engineer/Modelfile
...
Co-authored-by: Jeffrey Morgan <251292+jmorganca@users.noreply.github.com>
2023-07-24 16:42:08 -04:00
Bruce MacDonald
abf614804b
remove file on digest mismatch
2023-07-24 21:59:12 +02:00
Bruce MacDonald
a0dbbb23c4
truncate file size on resume
2023-07-24 21:58:32 +02:00
Bruce MacDonald
0fd6278446
do not panic server if file cannot be opened
2023-07-24 15:24:34 -04:00
Bruce MacDonald
29fe07f0cc
make response errors unique for error trace
2023-07-24 21:21:18 +02:00
Bruce MacDonald
abfc73d31e
make response errors unique for error trace
2023-07-24 15:04:21 -04:00
Bruce MacDonald
5a5ca8e7ff
remove file on digest mismatch
2023-07-24 14:53:01 -04:00
Ilyas Bakouch
f24a6f5988
add devops-engineer example
2023-07-24 14:44:44 -04:00
Bruce MacDonald
fdbef6c95e
truncate file size on resume
2023-07-24 14:36:19 -04:00
Michael Yang
24e43e3212
update development.md
2023-07-24 09:43:57 -07:00
Patrick Devine
4cb42ca55e
add copy command (#191)
2023-07-24 11:27:28 -04:00
Michael Yang
ec5e22ac85
Merge pull request #174 from jmorganca/tokenize
...
allocate a large enough tokens slice
2023-07-24 08:22:51 -07:00
Mohit Gaur
ed89da92b4
Improve command parsing and multiline string handling
2023-07-24 18:11:13 +05:30
Jeffrey Morgan
a3297fed41
add /api/create docs to readme
2023-07-23 18:01:05 -04:00
Patrick Devine
88c55199f8
change push to chunked uploads from monolithic (#179)
2023-07-22 17:31:26 -07:00
hoyyeva
c448443813
Merge pull request #164 from jmorganca/restart-server
...
restart server more gracefully
2023-07-22 18:19:22 -04:00
Michael Yang
efacd45fc5
Merge pull request #175 from jk1jk/main
...
Update .gitignore
2023-07-22 09:40:37 -07:00
Michael Yang
fa522695c4
Merge pull request #178 from jmorganca/gin-cors
...
use gin-contrib/cors middleware
2023-07-22 09:40:01 -07:00
Michael Yang
8609db77ea
use gin-contrib/cors middleware
2023-07-22 09:39:08 -07:00
Ikko Eltociear Ashimine
65d93a86b2
Update modelfile.md (#177)
...
fix markdown.
2023-07-22 08:19:30 -07:00
jk1jk
e6c427ce4d
Update .gitignore
2023-07-22 17:00:52 +03:00
Michael Yang
b71c67b6ba
allocate a large enough tokens slice
2023-07-21 23:05:15 -07:00
Patrick Devine
6d6b0d3321
change error handler behavior and fix error when a model isn't found (#173)
2023-07-21 23:02:12 -07:00
Michael Yang
37324a0a00
Merge pull request #172 from jmorganca/set-vars-first
...
fix vars.First
2023-07-21 20:55:06 -07:00
Michael Yang
20a5d99f77
fix vars.First
2023-07-21 20:45:32 -07:00
Patrick Devine
3b43cc019a
fix extended tag names (#171)
2023-07-21 20:27:25 -07:00
Patrick Devine
b8421dce3d
get the proper path for blobs to delete (#168)
2023-07-21 17:30:40 -07:00
Patrick Devine
9f6e97865c
allow pushing/pulling to insecure registries (#157)
2023-07-21 15:42:19 -07:00
Eva Ho
9657314ae2
address comment
2023-07-21 17:29:07 -04:00
Eva Ho
3f7d2336c7
add prettier and address comments
2023-07-21 17:10:05 -04:00
Eva Ho
e0a73d7fbe
address comment
2023-07-21 16:53:56 -04:00
hoyyeva
b08c4ca2bd
Update app/src/index.ts
...
Co-authored-by: Jeffrey Morgan <251292+jmorganca@users.noreply.github.com>
2023-07-21 16:53:56 -04:00
Eva Ho
734892f1e2
address comment
2023-07-21 16:53:56 -04:00
Eva Ho
d2bfaeac63
format code
2023-07-21 16:53:56 -04:00
Eva Ho
0768b1b907
restart server with condition and timeout
2023-07-21 16:53:56 -04:00
Bruce MacDonald
f5f0da06d9
Merge pull request #166 from jmorganca/brucemacd/dev-cgo
2023-07-21 22:48:10 +02:00
Bruce MacDonald
52f04e39f2
Note that CGO must be enabled in dev docs
2023-07-21 22:36:36 +02:00
Jeffrey Morgan
3c8f4c03d7
web: tweak homepage text
2023-07-21 09:57:57 -07:00
Bruce MacDonald
7ba1308595
Merge pull request #147 from jmorganca/brucemacd/cli-err-display
...
Improve CLI error display
2023-07-21 16:10:19 +02:00
Jeffrey Morgan
91cd54016c
add basic REST api documentation
2023-07-21 00:47:17 -07:00
Patrick Devine
e7a393de54
add rm command for models (#151)
2023-07-20 16:09:23 -07:00
Jeffrey Morgan
8454f298ac
fix example Modelfiles
2023-07-20 15:46:32 -07:00
Patrick Devine
a3badaf103
add ls alias (#152)
2023-07-20 15:28:27 -07:00
Michael Yang
50e8e5bdbe
Merge pull request #148 from jmorganca/more-llama-files
...
add llama.cpp mpi, opencl files
2023-07-20 14:26:46 -07:00
Michael Yang
8526e1f5f1
add llama.cpp mpi, opencl files
2023-07-20 14:19:55 -07:00
Michael Yang
0cfdbb95cc
Merge pull request #146 from jmorganca/fix-windows-pull
...
windows: fix model pulling
2023-07-20 13:41:54 -07:00
Michael Yang
6cea2061ec
windows: fix model pulling
2023-07-20 12:35:04 -07:00
Michael Yang
2832801c2a
Merge pull request #91 from jmorganca/fix-stream-errors
...
fix stream errors
2023-07-20 12:21:59 -07:00
Jeffrey Morgan
23a37dc466
clean up README.md
2023-07-20 12:21:36 -07:00
Michael Yang
992892866b
Merge pull request #145 from jmorganca/verify-digest
...
verify blob digest
2023-07-20 12:14:21 -07:00
Michael Yang
dde880290c
Merge pull request #131 from jmorganca/update-llama-cpp
...
update llama.cpp to e782c9e735f93ab4767ffc37462c523b73a17ddc
2023-07-20 12:14:10 -07:00
Michael Yang
1f27d7f1b8
fix stream errors
2023-07-20 12:12:08 -07:00
Bruce MacDonald
00aaa05901
remove unused code
2023-07-20 20:57:30 +02:00
Michael Yang
a83eaa7a9f
update llama.cpp to e782c9e735f93ab4767ffc37462c523b73a17ddc
2023-07-20 11:55:56 -07:00
Michael Yang
5156e48c2a
add script to update llama.cpp
2023-07-20 11:54:59 -07:00
Michael Yang
bf198c3918
verify blob digest
2023-07-20 11:53:57 -07:00
Bruce MacDonald
09dc6273e3
suppress error when running list before pulling image
2023-07-20 20:53:09 +02:00
Bruce MacDonald
ebaa33ac28
display gin api errors in cli
2023-07-20 20:45:12 +02:00
Bruce MacDonald
3ec4ebc562
remove unused code
2023-07-20 20:18:00 +02:00
Jeffrey Morgan
6a19724d5f
remove colon from library modelfiles
2023-07-20 09:51:30 -07:00
Jeffrey Morgan
924ce739f9
documentation on the model format
2023-07-20 09:03:41 -07:00
Michael Chiang
e1973e6780
Update icon (#139)
2023-07-20 08:55:20 -07:00
Jeffrey Morgan
f1b08ef40e
set temperature on README.md example
2023-07-20 08:17:09 -07:00
Jeffrey Morgan
31f0cb7742
new Modelfile syntax
2023-07-20 07:52:24 -07:00
Jeffrey Morgan
e4b2ccfb23
web: clean up remaining models.json usage
2023-07-20 07:51:46 -07:00
Bruce MacDonald
a3d7bb0a30
Merge pull request #136 from jmorganca/brucemacd/remove-models
...
Delete models.json
2023-07-20 16:40:46 +02:00
Bruce MacDonald
77e49f3822
Delete models.json
2023-07-20 16:32:50 +02:00
Jeffrey Morgan
8945b25484
new modelfile syntax on branch
2023-07-20 02:24:21 -07:00
Jeffrey Morgan
99ccf0c5d3
fix broken link in README.md
2023-07-20 02:15:11 -07:00
Jeffrey Morgan
d59b164fa2
add prompt back to parser
2023-07-20 01:13:30 -07:00
Michael Yang
55b5f5dc34
ctrl+c on empty line exits (#135)
2023-07-20 00:53:08 -07:00
Jeffrey Morgan
3b135ac963
parser: fix case where multi line string termination error wouldn't show
2023-07-20 00:43:22 -07:00
Jeffrey Morgan
e6bae8d916
parser: keep seeking until eof
2023-07-20 00:37:52 -07:00
Jeffrey Morgan
d9f54300c3
library: add echo for verify progress
2023-07-19 23:58:28 -07:00
Jeffrey Morgan
1511219763
update library modelfiles with new syntax
2023-07-19 23:57:22 -07:00
Jeffrey Morgan
ada0add89b
fix llama library templates
2023-07-19 23:53:40 -07:00
Jeffrey Morgan
75e508e1d6
remove old templates
2023-07-19 23:47:13 -07:00
Michael Yang
6f046dbf18
Update images.go (#134)
2023-07-19 23:46:01 -07:00
Jeffrey Morgan
cd820c8bca
move wizard-vicuna to correct location
2023-07-19 23:44:03 -07:00
Jeffrey Morgan
88e755d7fd
Add files for library models
2023-07-19 23:40:37 -07:00
Michael Yang
6984171cfd
Merge pull request #93 from jmorganca/split-prompt
...
separate prompt into template and system
2023-07-19 23:25:33 -07:00
Michael Yang
60b4db6389
add .First
2023-07-19 23:24:32 -07:00
Michael Chiang
7c6ea2a966
fix dangling """
2023-07-19 23:24:32 -07:00
Michael Chiang
c161aef5f9
update example
2023-07-19 23:24:32 -07:00
Michael Chiang
c47786c1b0
Update docs/modelfile.md
...
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-07-19 23:24:32 -07:00
Michael Chiang
df100ce540
Update docs/modelfile.md
...
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-07-19 23:24:32 -07:00
Michael Chiang
5c5948b4e7
clean up my previous empty sentences
2023-07-19 23:24:32 -07:00
Michael Yang
1c72e46e09
update modelfile.md
2023-07-19 23:24:32 -07:00
Michael Yang
ca210ba480
handle vnd.ollama.image.prompt for compat
2023-07-19 23:24:32 -07:00
Michael Yang
df146c41e2
separate prompt into template and system
2023-07-19 23:24:31 -07:00
Jeffrey Morgan
2d305fa99a
allow relative paths in FROM instruction
2023-07-19 21:55:15 -07:00
Patrick Devine
e4d7f3e287
vendor in progress bar and change to bytes instead of bibytes (#130)
2023-07-19 17:24:03 -07:00
Jeffrey Morgan
f2044b5838
web: fix newsletter signup
2023-07-19 16:11:56 -07:00