Compare commits


1523 Commits

Author SHA1 Message Date
Jeffrey Morgan
2789ed31a7 improve scratch buffer estimates 2024-01-19 13:24:24 -05:00
Jeffrey Morgan
dc88cc3981 use gzip for runner embedding (#2067) 2024-01-19 13:23:03 -05:00
Daniel Hiltgen
62976087c6 Merge pull request #1999 from lainedfles/termux_android_cpu_only
Fix CPU-only build under Android Termux environment.
2024-01-18 17:16:53 -08:00
Self Denial
344342abdf Restore dyn_ext_server.c since RTLD_DEEPBIND has been removed 2024-01-18 17:30:42 -07:00
Self Denial
eb76f3e379 Fix CPU-only build under Android Termux environment.
Update gpu.go initGPUHandles() to declare gpuHandles variable before
reading it. This resolves an "invalid memory address or nil pointer
dereference" error.

Update dyn_ext_server.c to avoid setting the RTLD_DEEPBIND flag under
__TERMUX__ (Android).
2024-01-18 17:25:33 -07:00
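
The fix above is a common Go pattern: make sure the handles value exists before any field is read. A minimal sketch, assuming illustrative struct fields (the `gpuHandles`/`initGPUHandles` names follow the commit message; the body is not the actual ollama code):

```go
package gpu

// gpuHandles stands in for the struct that tracks loaded GPU management
// libraries; the field here is a placeholder for illustration.
type gpuHandles struct {
	cudaLoaded bool
}

// initGPUHandles declares and initializes the value up front, so later reads
// never dereference a nil pointer ("invalid memory address or nil pointer
// dereference" is the panic the commit fixes).
func initGPUHandles() *gpuHandles {
	handles := &gpuHandles{}
	// ... probe for GPU libraries and populate fields here ...
	return handles
}
```
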
Michael Yang
d017e3d0a6 Merge pull request #2060 from jmorganca/mxyng/fix-show
fix show handler
2024-01-18 16:02:27 -08:00
Michael Yang
aac9ab4db7 fix show handler 2024-01-18 15:36:50 -08:00
Michael Yang
1f5b7ff976 Merge pull request #1932 from jmorganca/mxyng/api-fields
api: add model for all requests
2024-01-18 14:56:51 -08:00
Michael Yang
e299831e2c Merge pull request #1958 from purificant/ci
ci: update setup-go action
2024-01-18 14:53:36 -08:00
Michael Yang
745b5934fa add model to ModelResponse 2024-01-18 14:32:55 -08:00
Michael Yang
a38d88d828 api: add model for all requests
prefer using req.Model and fall back to req.Name
2024-01-18 14:31:37 -08:00
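
The preference order is simple to sketch; the request type below is a hypothetical stand-in, not the actual API struct:

```go
// modelName resolves which model a request refers to: the newer Model field
// wins, and the legacy Name field is used only when Model is empty.
func modelName(req struct{ Model, Name string }) string {
	if req.Model != "" {
		return req.Model
	}
	return req.Name
}
```
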
Daniel Hiltgen
abec7f06e5 Merge pull request #2056 from dhiltgen/slog
Mechanical switch from log to slog
2024-01-18 14:27:24 -08:00
Michael Yang
e5da190bac Merge pull request #2020 from jmorganca/mxyng/install-fedora
install: pin fedora to max 37
2024-01-18 14:23:42 -08:00
Daniel Hiltgen
ecbfc0182f Go bump to v1.21 to pick up slog 2024-01-18 14:12:57 -08:00
Daniel Hiltgen
fedd705aea Mechanical switch from log to slog
A few obvious levels were adjusted, but generally everything mapped to "info" level.
2024-01-18 14:12:57 -08:00
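
As a rough illustration of what the mechanical conversion looks like (call sites here are hypothetical, not taken from the ollama tree), most `log.Printf` calls become structured `slog.Info` calls, with a handful moved to other levels:

```go
package main

import "log/slog"

func main() {
	// Before: log.Printf("loading model %s", name)
	// After: most call sites map straight to Info with key/value pairs.
	slog.Info("loading model", "name", "llama2")

	// A few obvious cases were adjusted to more appropriate levels.
	slog.Debug("scratch buffer estimate", "bytes", 1<<20)
	slog.Error("failed to load library", "path", "/tmp/libext_server.so")
}
```
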
Mike Bird
82ee019bfc add open interpreter to list of extensions (#2016) 2024-01-18 13:59:39 -08:00
Sachin Sachdeva
ad9dbc2a04 Haystack Ollama Integration (#2021)
Updated readme with the web link for haystack ollama integration
2024-01-18 13:38:32 -08:00
Daniel Hiltgen
fccdf4c635 Merge pull request #1987 from xyproto/archlinux
Let gpu.go and gen_linux.sh also find CUDA on Arch Linux
2024-01-18 13:32:10 -08:00
Daniel Hiltgen
d450fb1d1e Merge pull request #2055 from dhiltgen/cuda_docs
Refine the linux cuda/rocm developer docs
2024-01-18 12:07:31 -08:00
Daniel Hiltgen
df40b11d03 Merge pull request #2007 from dhiltgen/cpu_fallback
Add multiple CPU variants for Intel Mac
2024-01-18 11:32:29 -08:00
Daniel Hiltgen
9cd20b0ec8 Refine the linux cuda/rocm developer docs 2024-01-18 09:44:44 -08:00
Daniel Hiltgen
b992bf65fc Disable arm64 for test phase
The runners are x86 so we can only run binaries that match.
2024-01-17 19:26:13 -08:00
Daniel Hiltgen
1b249748ab Add multiple CPU variants for Intel Mac
This also refines the build process for the ext_server build.
2024-01-17 15:08:54 -08:00
Alexander F. Rødseth
cbe2adc78a Merge branch 'main' into archlinux 2024-01-17 12:50:11 +01:00
Michael Yang
d5a7353357 Merge pull request #2026 from jmorganca/mxyng/fix-windows
fix: normalize name path before splitting
2024-01-16 16:58:42 -08:00
Michael Yang
96cfb62641 fix: normalize name path before splitting 2024-01-16 16:48:29 -08:00
Daniel Hiltgen
7d00b5d110 Merge pull request #1915 from dhiltgen/bump_llama_with_new_dep
Bump llama.cpp to b1842 and add new cuda lib dep
2024-01-16 13:36:49 -08:00
Daniel Hiltgen
795674dd90 Bump llama.cpp to b1842 and add new cuda lib dep
Upstream llama.cpp has added a new dependency with the
NVIDIA CUDA Driver Libraries (libcuda.so) which is part of the
driver distribution, not the general cuda libraries, and is not
available as an archive, so we can not statically link it.  This may
introduce some additional compatibility challenges which we'll
need to keep an eye on.
2024-01-16 12:53:52 -08:00
Daniel Hiltgen
e282bdccdd Merge pull request #1990 from dhiltgen/ci_mac_cross
Add macos cross-compile CI coverage
2024-01-16 12:31:37 -08:00
Michael Yang
d9bfb2f08f install: pin fedora to max 37
repos for fedora 38 and newer do not exist as of this commit

```
$ dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/fedora38/x86_64/cuda-fedora38.repo
Adding repo from: https://developer.download.nvidia.com/compute/cuda/repos/fedora38/x86_64/cuda-fedora38.repo
Status code: 404 for https://developer.download.nvidia.com/compute/cuda/repos/fedora38/x86_64/cuda-fedora38.repo (IP: 152.195.19.142)
Error: Configuration of repo failed
```
2024-01-16 11:45:21 -08:00
Michael Yang
598d6d5572 Merge pull request #1937 from jmorganca/mxyng/remove-client-py
remove client.py
2024-01-16 11:01:41 -08:00
Bruce MacDonald
a897e833b8 do not cache prompt (#2018)
- prompt cache causes inference to hang after some time
2024-01-16 13:48:05 -05:00
Patrick Devine
eef50accb4 Fix show parameters (#2017) 2024-01-16 10:34:44 -08:00
Michael Yang
05d53de7a1 Merge pull request #1968 from jmorganca/mxyng/fix-request-retry
fix: request retry with error
2024-01-16 10:33:50 -08:00
Daniel Hiltgen
8795447dad Merge pull request #1966 from fpreiss/fpreiss/gen_linux_cuda_detection
improve cuda detection (rel. issue #1704)
2024-01-14 18:00:11 -08:00
Daniel Hiltgen
b3035112a1 Add macos cross-compile CI coverage 2024-01-14 10:38:59 -08:00
Daniel Hiltgen
95ad9a9fc8 Merge pull request #1988 from dhiltgen/fix_intel_mac
Fix typo in arm mac arch script
2024-01-14 08:45:18 -08:00
Daniel Hiltgen
3ca5f69ce8 Fix typo in arm mac arch script 2024-01-14 08:32:57 -08:00
Daniel Hiltgen
cfa6337960 Merge pull request #1982 from dhiltgen/fix_intel_mac
Fix intel mac build
2024-01-14 08:26:46 -08:00
Alexander F. Rødseth
f4bf1d514f Let gpu.go and gen_linux.sh also find CUDA on Arch Linux 2024-01-14 13:40:36 +01:00
Jeffrey Morgan
557110d0ba Disable mmap with lora layers (#1985) 2024-01-13 23:36:31 -05:00
Daniel Hiltgen
2ecb247276 Fix intel mac build
Make sure we're building an x86 ext_server lib when cross-compiling
2024-01-13 14:46:34 -08:00
Jeffrey Morgan
288ef8ff95 add gcc -lstdc++ flag for linux cpu (#1974) 2024-01-13 03:53:00 -05:00
Jeffrey Morgan
4cf17990f7 use g++ to build libext_server.so on linux (#1972) 2024-01-13 03:12:42 -05:00
Michael Yang
b6c0ef1e70 Merge pull request #1961 from jmorganca/mxyng/rm-double-newline
remove double newlines in /set parameter
2024-01-12 15:18:19 -08:00
Michael Yang
356d178f6e Merge pull request #1971 from jmorganca/mxyng/max-context-length
add max context length check
2024-01-12 15:10:25 -08:00
Michael Yang
eaed6f8c45 add max context length check 2024-01-12 14:54:07 -08:00
purificant
6a5bfc2ed6 update actions/setup-go 2024-01-12 22:27:25 +00:00
Michael Yang
cf29bd2d72 fix: request retry with error
this fixes a subtle bug with makeRequestWithRetry where an HTTP status
error on a retried request will potentially not return the right err
2024-01-12 13:32:27 -08:00
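
The bug class is easiest to see in a stripped-down retry loop; `makeRequestWithRetry` is named in the commit, but the code below is only an illustrative sketch:

```go
// doWithRetry returns the error from the *last* attempt. The subtle bug being
// fixed is the variant where a stale (or nil) error from an earlier attempt
// is returned instead, hiding the HTTP status error of the retried request.
func doWithRetry(attempts int, do func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = do(); err == nil {
			return nil
		}
	}
	return err
}
```
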
Fabian Preiss
905862e17b improve cuda detection (rel. issue #1704) 2024-01-12 21:59:19 +01:00
Patrick Devine
565f8a3c44 Convert the REPL to use /api/chat for interactive responses (#1936) 2024-01-12 12:05:52 -08:00
Michael Yang
5121b7ac9c remove double newlines in /set parameter 2024-01-12 11:21:15 -08:00
Michael Yang
a70262c6b2 Update README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-01-12 09:43:04 -08:00
Tristram Oaten
40a0a90a88 Add group delete to uninstall instructions (#1924)
After executing the `userdel ollama` command, I saw this message:

```sh
$ sudo userdel ollama
userdel: group ollama not removed because it has other members.
```

That reminded me that I had to remove the dangling group as well. For completeness, the uninstall instructions should do this too.

Thanks!
2024-01-12 00:07:00 -05:00
Michael Yang
cbe20c4375 update readme 2024-01-11 16:24:37 -08:00
Michael Yang
5ffbbea1d7 remove client.py 2024-01-11 15:53:10 -08:00
Daniel Hiltgen
3773fb6465 Merge pull request #1935 from dhiltgen/cpu_fallback
Fix up the CPU fallback selection
2024-01-11 15:52:32 -08:00
Daniel Hiltgen
7427fa1387 Fix up the CPU fallback selection
The memory changes and multi-variant change had some merge
glitches I missed.  This fixes them so we actually get the cpu llm lib
and best variant for the given system.
2024-01-11 15:27:06 -08:00
Michael Yang
f84537e0e0 Merge pull request #1934 from jmorganca/mxyng/fix-slices
fix build and lint
2024-01-11 14:36:20 -08:00
Michael Yang
d2be6387c9 fix typo 2024-01-11 14:25:21 -08:00
Michael Yang
d7af35d3d0 import fmt 2024-01-11 14:22:32 -08:00
Michael Yang
defc1dbd6e use x/exp/slices 2024-01-11 14:20:13 -08:00
Daniel Hiltgen
de2fbdec99 Merge pull request #1819 from dhiltgen/multi_variant
Support multiple LLM libs; ROCm v5 and v6; Rosetta, AVX, and AVX2 compatible CPU builds
2024-01-11 14:00:48 -08:00
Eduard van Valkenburg
f5faf79aa1 Add semantic kernel to Readme (#1931) 2024-01-11 14:40:23 -05:00
Michael Yang
f4f939de28 Merge pull request #1552 from jmorganca/mxyng/lint-test
add lint and test on pull_request
2024-01-11 09:37:45 -08:00
Daniel Hiltgen
39928a42e8 Always dynamically load the llm server library
This switches darwin to dynamic loading, and refactors the code now that no
static linking of the library is used on any platform
2024-01-11 08:42:47 -08:00
Daniel Hiltgen
d88c527be3 Build multiple CPU variants and pick the best
This reduces the built-in linux version to not use any vector extensions
which enables the resulting builds to run under Rosetta on MacOS in
Docker.  Then at runtime it checks for the actual CPU vector
extensions and loads the best CPU library available
2024-01-11 08:42:47 -08:00
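
Runtime selection of the best CPU build can be sketched with the `golang.org/x/sys/cpu` feature flags; the variant names mirror the commit description, while the selection code itself is an assumption, not the actual loader:

```go
package llm

import "golang.org/x/sys/cpu"

// bestCPUVariant picks the most capable CPU library variant the host supports.
// The baseline build uses no vector extensions, which is also what keeps it
// runnable under Rosetta emulation in Docker on macOS.
func bestCPUVariant() string {
	switch {
	case cpu.X86.HasAVX2:
		return "cpu_avx2"
	case cpu.X86.HasAVX:
		return "cpu_avx"
	default:
		return "cpu"
	}
}
```
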
Fabian Preiß
3bc8b9832b fix gpu_test.go Error (same type) uint64->uint32 (#1921) 2024-01-11 08:22:23 -05:00
Jeffrey Morgan
ab6be852c7 revisit memory allocation to account for full kv cache on main gpu 2024-01-11 01:45:31 -05:00
Daniel Hiltgen
052b33b81b DRY out the Dockerfile.build 2024-01-10 17:27:51 -08:00
Daniel Hiltgen
8da7bef05f Support multiple variants for a given llm lib type
In some cases we may want multiple variants for a given GPU type or CPU.
This adds logic to have an optional Variant which we can use to select
an optimal library, but also allows us to try multiple variants in case
some fail to load.

This can be useful for scenarios such as ROCm v5 vs v6 incompatibility
or potentially CPU features.
2024-01-10 17:27:51 -08:00
Jeffrey Morgan
b24e8d17b2 Increase minimum CUDA memory allocation overhead and fix minimum overhead for multi-gpu (#1896)
* increase minimum cuda overhead and fix minimum overhead for multi-gpu

* fix multi gpu overhead

* limit overhead to 10% of all gpus

* better wording

* allocate fixed amount before layers

* fixed only includes graph alloc
2024-01-10 19:08:51 -05:00
Jeffrey Morgan
f83881390f revert submodule back to 328b83de23b33240e28f4e74900d1d06726f5eb1 2024-01-10 18:42:39 -05:00
Daniel Hiltgen
ac70ab6761 Merge pull request #1914 from dhiltgen/smarter_cuda_detection
Smarter GPU Management library detection
2024-01-10 15:21:56 -08:00
Daniel Hiltgen
3c49c3ab0d Harden GPU mgmt library lookup
When there are multiple management libraries installed on a system
not every one will be compatible with the current driver.  This change
improves our management library algorithm to build up a set of discovered
libraries based on glob patterns, and then try all of them until we're able to
load one without error.
2024-01-10 15:06:41 -08:00
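
A minimal sketch of that strategy, assuming a hypothetical `tryLoad` helper and illustrative glob patterns:

```go
package gpu

import (
	"fmt"
	"path/filepath"
)

// findUsableLibrary globs for every candidate management library and returns
// the first one that loads without error, since not every installed library
// is compatible with the current driver.
func findUsableLibrary(patterns []string, tryLoad func(path string) error) (string, error) {
	var candidates []string
	for _, pattern := range patterns {
		matches, err := filepath.Glob(pattern) // e.g. "/usr/lib*/libnvidia-ml.so*"
		if err == nil {
			candidates = append(candidates, matches...)
		}
	}
	for _, lib := range candidates {
		if err := tryLoad(lib); err == nil {
			return lib, nil
		}
	}
	return "", fmt.Errorf("no compatible GPU management library found")
}
```
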
Daniel Hiltgen
9754ae4c89 Support optional override of the target architectures
This can help speed up incremental builds when you're only testing one
architecture, like amd64.  E.g.
BUILD_ARCH=amd64 ./scripts/build_linux.sh && scp ./dist/ollama-linux-amd64 test-system:
2024-01-10 14:43:24 -08:00
Jeffrey Morgan
224fbf2795 update submodule to commit 1fc2f265ff9377a37fd2c61eae9cd813a3491bea until its main branch is fixed 2024-01-10 17:03:15 -05:00
Jeffrey Morgan
2c6e8f5248 Update submodule to 6efb8eb30e7025b168f3fda3ff83b9b386428ad6 (#1885)
* update submodule to `6efb8eb30e7025b168f3fda3ff83b9b386428ad6`
* unblock condition variable in `update_slots` when closing server
2024-01-10 16:48:38 -05:00
Jeffrey Morgan
34344d801c clean up cmake build directory when cross compiling macOS builds 2024-01-09 17:13:56 -05:00
Robin Glauser
e868c8a5c7 Update api.md (#1878)
Fixed assistant in the example response.
2024-01-09 16:21:17 -05:00
Jeffrey Morgan
c336693f07 calculate overhead based number of gpu devices (#1875) 2024-01-09 15:53:33 -05:00
Daniel Hiltgen
e89dc1d54b Merge pull request #1874 from dhiltgen/correct_cuda_min
Set correct CUDA minimum compute capability version
2024-01-09 11:37:22 -08:00
Daniel Hiltgen
1961a81f03 Set correct CUDA minimum compute capability version
If you attempt to run the current CUDA build on compute capability 5.2
cards, you'll hit the following failure:
cuBLAS error 15 at ggml-cuda.cu:7956: the requested functionality is not supported
2024-01-09 11:28:24 -08:00
Jeffrey Morgan
8a8c7e7f8d only build for metal on arm64 2024-01-09 13:51:08 -05:00
Jeffrey Morgan
6df83e6daa update rough cuda overhead estimate to 15% + 384MiB 2024-01-09 13:51:08 -05:00
Michael Yang
f921e2696e typo 2024-01-09 09:45:42 -08:00
Michael Yang
4a33cede20 remove unused fields and functions 2024-01-09 09:37:40 -08:00
Michael Yang
f95d2f25f3 fix temporary history file permissions 2024-01-09 09:36:58 -08:00
Michael Yang
2b9892a808 fix(windows): modelpath and list 2024-01-09 09:36:58 -08:00
Michael Yang
2bb2bdd5d4 fix lint 2024-01-09 09:36:58 -08:00
Michael Yang
acfc376efd add .golangci.yaml 2024-01-09 09:36:58 -08:00
Michael Yang
997253143f add lint and test on pull_request 2024-01-09 09:36:58 -08:00
Michael Yang
62023177f6 Merge pull request #1614 from jmorganca/mxyng/fix-set-template
fix: set template without triple quotes
2024-01-09 09:36:24 -08:00
Jeffrey Morgan
6164f378f2 revert cuda overhead to 20% 2024-01-09 00:54:29 -05:00
Jeffrey Morgan
f387e9631b use runner if cuda alloc won't fit 2024-01-09 00:44:34 -05:00
Jeffrey Morgan
6566387ae3 add TODO for cuda overhead 2024-01-09 00:28:03 -05:00
Jeffrey Morgan
37708931fb update cuda overhead to 20% to fix crashes when switching between models and large context sizes 2024-01-09 00:05:23 -05:00
Jeffrey Morgan
f6cb0a553c update cuda overhead to 15% or 400MiB 2024-01-08 23:45:45 -05:00
Jeffrey Morgan
2680078c13 fix build on linux 2024-01-08 23:44:13 -05:00
Jeffrey Morgan
f1b7e5f560 update overhead to 15% 2024-01-08 23:37:45 -05:00
Jeffrey Morgan
cb534e6ac2 use 10% vram overhead for cuda 2024-01-08 23:17:44 -05:00
Jeffrey Morgan
58ce2d8273 better estimate scratch buffer size 2024-01-08 21:32:44 -05:00
Jeffrey Morgan
18ddf6d57d fix windows build 2024-01-08 20:04:01 -05:00
Michael Yang
61e6502449 Merge pull request #1818 from jmorganca/mxyng/fix-alt-prompt
fix(cmd): history in alt prompt
2024-01-08 13:48:34 -08:00
Jeffrey Morgan
08f1e18965 Offload layers to GPU based on new model size estimates (#1850)
* select layers based on estimated model memory usage

* always account for scratch vram

* dont load +1 layers

* better estimation for graph alloc

* Update gpu/gpu_darwin.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update llm/llm.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update llm/llm.go

* add overhead for cuda memory

* Update llm/llm.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* fix build error on linux

* address comments

---------

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-01-08 16:42:00 -05:00
Bruce MacDonald
7e8f7c8358 remove ggml automatic re-pull (#1856) 2024-01-08 14:41:01 -05:00
Bruce MacDonald
3f3eb19a3b document response in modelfile template variables (#1428) 2024-01-08 14:38:51 -05:00
Daniel Hiltgen
059ae4585e Merge pull request #1834 from dhiltgen/old_cuda
Detect very old CUDA GPUs and fall back to CPU
2024-01-07 10:39:49 -08:00
Daniel Hiltgen
6347f501ca Merge pull request #1828 from dhiltgen/fix_llava
Accept windows paths for image processing
2024-01-07 09:05:46 -08:00
Jeffrey Morgan
5feec959ad dont use -Wall in static build (#1833) 2024-01-07 10:39:19 -05:00
Jeffrey Morgan
dbdd50b283 add -DCMAKE_SYSTEM_NAME=Darwin cmake flag (#1832) 2024-01-07 00:46:17 -05:00
Daniel Hiltgen
d74ce6bd4f Detect very old CUDA GPUs and fall back to CPU
If we try to load the CUDA library on an old GPU, it panics and crashes
the server.  This checks the compute capability before we load the
library so we can gracefully fall back to CPU mode.
2024-01-06 21:40:29 -08:00
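
The gate can be sketched as a compute-capability check performed before the CUDA library is used; the threshold and field names below are assumptions for illustration, not the values ollama actually ships:

```go
package gpu

// computeCapability is a stand-in for the (major, minor) pair reported by the
// GPU management library for a CUDA device.
type computeCapability struct {
	Major, Minor int
}

// shouldUseCUDA returns false for cards below the assumed minimum, letting the
// caller fall back to the CPU path instead of panicking inside the library.
func shouldUseCUDA(cc computeCapability, minMajor, minMinor int) bool {
	if cc.Major != minMajor {
		return cc.Major > minMajor
	}
	return cc.Minor >= minMinor
}
```
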
Guilherme Baptista
57942b4676 Update README.md - Community Integrations - Ollama for Ruby (#1830) 2024-01-06 22:31:39 -05:00
Daniel Hiltgen
e0d05b0f1e Accept windows paths for image processing
This enhances our regex to support windows style paths.  The regex will
match invalid path specifications, but we'll still validate file
existence and filter out mismatches
2024-01-06 10:50:27 -08:00
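
The two-step filter described above (a permissive match, then an existence check) might look roughly like this; the regular expression is an assumption for the sketch, not the pattern shipped in ollama:

```go
package cmd

import (
	"os"
	"regexp"
)

// imagePathRe deliberately over-matches (including Windows drive-letter and
// backslash paths); false positives are dropped by the os.Stat check below.
var imagePathRe = regexp.MustCompile(`(?:[a-zA-Z]:)?[\\/]?[\w.\- \\/]+\.(?:png|jpg|jpeg)`)

// extractImagePaths keeps only matches that refer to files that actually exist.
func extractImagePaths(prompt string) []string {
	var paths []string
	for _, match := range imagePathRe.FindAllString(prompt, -1) {
		if _, err := os.Stat(match); err == nil {
			paths = append(paths, match)
		}
	}
	return paths
}
```
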
Daniel Hiltgen
2d9dd14f27 Merge pull request #1697 from dhiltgen/win_docs
Add windows native build instructions
2024-01-05 19:34:20 -08:00
Jeffrey Morgan
1caa56128f add cuda lib path for nvidia container toolkit 2024-01-05 21:10:37 -05:00
Michael Yang
0101e76dbe Merge pull request #1797 from sublimator/nd-allow-extension-origins-still-needs-explicit-listing-2024-01-05
fix: allow extension origins (still needs explicit listing), fixes #1686
2024-01-05 17:20:09 -08:00
Michael Yang
2ef9352b94 fix(cmd): history in alt mode 2024-01-05 16:20:02 -08:00
Michael Yang
5580ae2472 fix: set template without triple quotes 2024-01-05 15:51:33 -08:00
Bruce MacDonald
3a9f447141 only pull gguf model if already exists (#1817) 2024-01-05 18:50:00 -05:00
Patrick Devine
9c2941e61b switch api for ShowRequest to use the name field (#1816) 2024-01-05 15:06:43 -08:00
Patrick Devine
238ac5e765 Add unit tests for Parser (#1815) 2024-01-05 14:04:31 -08:00
Bruce MacDonald
4f4980b66b simplify ggml update logic (#1814)
- additional information is now available in the show response; use this to pull gguf before running
- make gguf updates cancellable
2024-01-05 15:22:32 -05:00
Patrick Devine
22e93efa41 add show info command and fix the modelfile 2024-01-05 12:20:05 -08:00
Patrick Devine
2909dce894 split up interactive generation 2024-01-05 12:20:05 -08:00
Jeffrey Morgan
df32537312 gpu: read memory info from all cuda devices (#1802)
* gpu: read memory info from all cuda devices

* add `LOOKUP_SIZE` constant

* better constant name

* address comments
2024-01-05 11:25:58 -05:00
Bruce MacDonald
3367b5f3df remove unused generate patches (#1810) 2024-01-05 11:25:45 -05:00
Matt Williams
46edbbc518 Merge pull request #1801 from jmorganca/mattw/correctdockerlink 2024-01-04 19:20:45 -08:00
Michael Yang
d2ff18cd6b Merge pull request #1791 from jmorganca/mxyng/update-build
update Dockerfile.build
2024-01-04 19:13:44 -08:00
Matt Williams
df086d3c8c fix docker doc to point to hub
Signed-off-by: Matt Williams <m@technovangelist.com>
2024-01-04 18:42:23 -08:00
Nicholas Dudfield
8baaaa39c0 Allow extension origins (still needs explicit listing), fixes #1686 2024-01-05 09:06:47 +07:00
Michael Yang
f9961c70ae update build 2024-01-04 17:34:38 -08:00
Daniel Hiltgen
cd8fad3398 Merge pull request #1790 from dhiltgen/llm_code_shuffle
Cleanup stale submodule
2024-01-04 13:47:25 -08:00
Daniel Hiltgen
9983fa5f4e Cleanup stale submodule
If the tree has a stale submodule, make sure we clean it up first
2024-01-04 13:40:16 -08:00
Daniel Hiltgen
dfda91c2ee Merge pull request #1788 from dhiltgen/llm_code_shuffle
Revamp code layout for the llm directory and llama.cpp submodule
2024-01-04 13:14:28 -08:00
Daniel Hiltgen
fac9060da5 Init submodule with new path 2024-01-04 13:00:13 -08:00
Daniel Hiltgen
a554616f8e remove old llama.cpp submodule path 2024-01-04 12:12:21 -08:00
Daniel Hiltgen
77d96da94b Code shuffle to clean up the llm dir 2024-01-04 12:12:05 -08:00
Brian Murray
0d6e3565ae Add embeddings to API (#1773) 2024-01-04 15:00:52 -05:00
Daniel Hiltgen
b5939008a1 Merge pull request #1785 from dhiltgen/win_native_cli
Load dynamic cpu lib on windows
2024-01-04 08:55:01 -08:00
Daniel Hiltgen
e9ce91e9a6 Load dynamic cpu lib on windows
On linux, we link the CPU library in to the Go app and fall back to it
when no GPU match is found. On windows we do not link in the CPU library
so that we can better control our dependencies for the CLI.  This fixes
the logic so we correctly fallback to the dynamic CPU library
on windows.
2024-01-04 08:41:41 -08:00
Bruce MacDonald
4ad6c9b11f fix: pull either original model or from model on create (#1774) 2024-01-04 01:34:38 -05:00
Jeffrey Morgan
c0285158a9 tweak memory requirements error text 2024-01-03 19:47:18 -05:00
Jeffrey Morgan
77a66df72c add macOS memory check for 47B models 2024-01-03 19:46:16 -05:00
Jeffrey Morgan
5b4837f881 remove unused filetype check 2024-01-03 19:45:39 -05:00
Jeffrey Morgan
29340c2e62 update cmake flags for amd64 macOS (#1780)
* update cmake flags for intel macOS

* remove `LLAMA_K_QUANTS`

* put back `CMAKE_OSX_DEPLOYMENT_TARGET` and disable `LLAMA_F16C`
2024-01-03 19:22:15 -05:00
Daniel Hiltgen
d5ec730354 Merge pull request #1779 from dhiltgen/refined_amd_gpu_list
Improve maintainability of Radeon card list
2024-01-03 16:18:57 -08:00
Daniel Hiltgen
8bed487aba Merge pull request #1778 from dhiltgen/wsl1
Fail fast on WSL1 while allowing on WSL2
2024-01-03 16:18:41 -08:00
Daniel Hiltgen
c1a10a6e9b Merge pull request #1781 from dhiltgen/cpu_only_build
Fix CPU only builds
2024-01-03 16:18:25 -08:00
Daniel Hiltgen
ddbfa6fe31 Fix CPU only builds
Go embed doesn't like it when there are no matching files, so put
a dummy placeholder in to allow building without any GPU support.
If no "server" library is found, it's safely ignored at runtime.
2024-01-03 16:08:34 -08:00
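
The `go:embed` constraint in question is that a pattern matching no files fails the build, so a committed placeholder keeps the pattern non-empty. A sketch, with an illustrative path:

```go
package llm

import "embed"

// The GPU "server" libraries are optional build outputs. A dummy placeholder
// file is kept in the directory so this pattern always matches something; at
// runtime, a missing real library is simply skipped.
//
//go:embed llama.cpp/build/*/lib/*
var libEmbed embed.FS
```
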
Daniel Hiltgen
2fcd41ef81 Fail fast on WSL1 while allowing on WSL2
This prevents users from accidentally installing on WSL1, with instructions
guiding them to upgrade their WSL instance to version 2.  Once running WSL2,
if you have an NVIDIA card, you can follow their instructions to set up
GPU passthrough and run models on the GPU.  This is not possible on WSL1.
2024-01-03 16:02:32 -08:00
Daniel Hiltgen
16f4603b67 Improve maintainability of Radeon card list
This moves the list of AMD GPUs to an easier to maintain list which
should make it easier to update over time.
2024-01-03 15:16:56 -08:00
Daniel Hiltgen
1184686649 Merge pull request #1776 from dhiltgen/render_group
Add ollama user to render group for Radeon support
2024-01-03 13:07:54 -08:00
Daniel Hiltgen
2588cb2daa Add ollama user to render group for Radeon support
For the ROCm libraries to access the driver, we need to add the ollama user
to the render group.
2024-01-03 12:56:31 -08:00
Jeffrey Morgan
c7ea8f237e set num_gpu to 1 only by default on darwin arm64 (#1771) 2024-01-03 14:10:29 -05:00
Bruce MacDonald
0b3118e0af fix: relay request opts to loaded llm prediction (#1761) 2024-01-03 12:01:42 -05:00
Daniel Hiltgen
05face44ef Merge pull request #1683 from dhiltgen/fix_windows_test
Fix windows system memory lookup
2024-01-03 09:00:39 -08:00
Daniel Hiltgen
a2ad952440 Fix windows system memory lookup
This refines the gpu package error handling and fixes a bug with the
system memory lookup on windows.
2024-01-03 08:50:01 -08:00
Daniel Hiltgen
5fea4410be Merge pull request #1680 from dhiltgen/better_patching
Refactor how we augment llama.cpp and refine windows native build
2024-01-03 08:10:17 -08:00
Bruce MacDonald
b846eb64d0 Fix template api doc description (#1661) 2024-01-03 11:00:59 -05:00
Cole Gillespie
3c5dd9ed1d Update README.md (#1766) 2024-01-03 10:44:22 -05:00
Jeffrey Morgan
b17ccd0542 Update import.md 2024-01-02 22:28:18 -05:00
Patrick Devine
d0409f772f keyboard shortcut help (#1764) 2024-01-02 18:04:12 -08:00
Jeffrey Morgan
ec261422af use docker build in build scripts 2024-01-02 19:32:54 -05:00
Daniel Hiltgen
0498f7ce56 Get rid of one-line llama.log
This one log line was triggering a single line llama.log to be generated
in the pwd of the server
2024-01-02 15:36:16 -08:00
Daniel Hiltgen
738a8d12eb Rename the ollama cmakefile 2024-01-02 15:36:16 -08:00
Daniel Hiltgen
d966b730ac Switch windows build to fully dynamic
Refactor where we store build outputs, and support a fully dynamic loading
model on windows so the base executable has no special dependencies thus
doesn't require a special PATH.
2024-01-02 15:36:16 -08:00
Daniel Hiltgen
9a70aecccb Refactor how we augment llama.cpp
This changes the model for llama.cpp inclusion so we're not applying a patch,
but instead have the C++ code directly in the ollama tree, which should make it
easier to refine and update over time.
2024-01-02 15:35:55 -08:00
Karim ElGhandour
22cd5eaab6 Added Ollama-SwiftUI to integrations (#1747) 2024-01-02 09:47:50 -05:00
Dane Madsen
304a8799ca Update README.md (#1757) 2024-01-02 09:47:08 -05:00
Jeffrey Morgan
2a2fa3c329 api.md cleanup & formatting 2023-12-27 14:32:35 -05:00
Jeffrey Morgan
55978c1dc9 clean up cache api option 2023-12-27 14:27:45 -05:00
Jeffrey Morgan
d4ebdadbe7 enable cache_prompt by default 2023-12-27 14:23:42 -05:00
Daniel Hiltgen
e201efa14b Add windows native build instructions 2023-12-25 08:31:34 -08:00
Icelain
c5f21f73a4 follow best practices by adding resp.Body.Close() (#1708) 2023-12-25 09:01:37 -05:00
Jeffrey Morgan
371bc73531 Update README.md 2023-12-24 11:54:08 -05:00
Jeffrey Morgan
c651d8b824 Update README.md 2023-12-23 11:18:12 -05:00
Daniel Hiltgen
cf50ef5b51 Merge pull request #1684 from dhiltgen/tag_integration_tests
Guard integration tests with a tag
2023-12-22 16:43:41 -08:00
Daniel Hiltgen
697bea6939 Guard integration tests with a tag
This should help CI avoid running the integration test logic in a
container where it's not currently possible.
2023-12-22 16:33:27 -08:00
K0IN
10da41d677 Add Cache flag to api (#1642) 2023-12-22 17:16:20 -05:00
Bruce MacDonald
db356c8519 post-response templating (#1427) 2023-12-22 17:07:05 -05:00
Jeffrey Morgan
b80081022f cache docker builds in build_linux.sh 2023-12-22 16:01:20 -05:00
Matt Williams
790457398a Merge pull request #1677 from jmorganca/mattw/docrunupdate
update where are models stored q
2023-12-22 09:56:27 -08:00
Matt Williams
511069a2a5 update where are models stored q
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-12-22 09:48:44 -08:00
Matt Williams
5a85070c22 Update readmes, requirements, packagejsons, etc for all examples (#1452)
Most of the examples needed updates to their READMEs to show how to run them. Some of the requirements.txt files had extra content that wasn't needed, or were missing altogether. Apparently some folks like to run npm start
to run TypeScript, so a start script was added to all TypeScript examples, which
hadn't been done before.

Basically just a lot of cleanup.

Signed-off-by: Matt Williams <m@technovangelist.com>
2023-12-22 09:10:41 -08:00
Matt Williams
291700c92d Clean up documentation (#1506)
* Clean up documentation

Will probably need to update with PRs for new release.

Signed-off-by: Matt Williams <m@technovangelist.com>

* Correcting to fit in 0.1.15 changes

Signed-off-by: Matt Williams <m@technovangelist.com>

* Update README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* addressing comments

Signed-off-by: Matt Williams <m@technovangelist.com>

* more api cleanup

Signed-off-by: Matt Williams <m@technovangelist.com>

* its llava not llama

Signed-off-by: Matt Williams <m@technovangelist.com>

* Update docs/troubleshooting.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Updated hosting to server and documented all env vars

Signed-off-by: Matt Williams <m@technovangelist.com>

* remove last of the cli descriptions

Signed-off-by: Matt Williams <m@technovangelist.com>

* Update README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* update further per conversation with jeff earlier today

Signed-off-by: Matt Williams <m@technovangelist.com>

* cleanup the doc readme

Signed-off-by: Matt Williams <m@technovangelist.com>

* move upgrade to faq

Signed-off-by: Matt Williams <m@technovangelist.com>

* first change

Signed-off-by: Matt Williams <m@technovangelist.com>

* updated

Signed-off-by: Matt Williams <m@technovangelist.com>

* Update docs/faq.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* examples in parent

Signed-off-by: Matt Williams <m@technovangelist.com>

* add example for create model.

Signed-off-by: Matt Williams <m@technovangelist.com>

* update faq

Signed-off-by: Matt Williams <m@technovangelist.com>

* update create model api

Signed-off-by: Matt Williams <m@technovangelist.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/faq.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/troubleshooting.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* update the readme in docs

Signed-off-by: Matt Williams <m@technovangelist.com>

* update a few more things

Signed-off-by: Matt Williams <m@technovangelist.com>

* Update docs/troubleshooting.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/faq.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/modelfile.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/troubleshooting.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

---------

Signed-off-by: Matt Williams <m@technovangelist.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-12-22 09:10:01 -08:00
Daniel Hiltgen
9db28af84e Merge pull request #1675 from dhiltgen/less_verbose
Quiet down llama.cpp logging by default
2023-12-22 08:57:17 -08:00
Daniel Hiltgen
e5202eb687 Quiet down llama.cpp logging by default
By default builds will now produce non-debug and non-verbose binaries.
To enable verbose logs in llama.cpp and debug symbols in the
native code, set `CGO_CFLAGS=-g`
2023-12-22 08:47:18 -08:00
Daniel Hiltgen
96fb441abd Merge pull request #1146 from dhiltgen/ext_server_cgo
Add cgo implementation for llama.cpp
2023-12-22 08:16:31 -08:00
Daniel Hiltgen
495c06e4a6 Fix doc glitch 2023-12-21 18:21:31 -08:00
Daniel Hiltgen
fa24e73b82 Remove CPU build, fixup linux build script 2023-12-21 18:21:31 -08:00
Daniel Hiltgen
325d74985b Fix CPU performance on hyperthreaded systems
The default thread count logic was broken and resulted in twice as many
threads as it should on a hyperthreaded CPU,
resulting in thrashing and poor performance.
2023-12-21 16:23:36 -08:00
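
The failure mode is oversubscription: `runtime.NumCPU` reports logical CPUs, which is twice the physical core count on a hyperthreaded machine. A simplified sketch of the corrected default (the halving heuristic is an assumption, not the exact detection ollama uses):

```go
package llm

import "runtime"

// defaultThreadCount aims for one inference thread per physical core rather
// than per logical CPU, which avoids thrashing on hyperthreaded systems.
func defaultThreadCount(hyperthreaded bool) int {
	n := runtime.NumCPU()
	if hyperthreaded {
		n /= 2 // logical CPUs are double the physical cores in this case
	}
	if n < 1 {
		n = 1
	}
	return n
}
```
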
Bruce MacDonald
fabf2f3467 allow for starting llava queries with filepath (#1549) 2023-12-21 13:20:59 -05:00
Daniel Hiltgen
d9cd3d9667 Revive windows build
The windows native setup still needs some more work, but this gets it building
again, and if you set the PATH properly, you can run the resulting exe on a CUDA system.
2023-12-20 17:21:54 -08:00
Patrick Devine
a607d922f0 add FAQ for slow networking in WSL2 (#1646) 2023-12-20 16:27:24 -08:00
Daniel Hiltgen
7555ea44f8 Revamp the dynamic library shim
This switches the default llama.cpp to be CPU based, and builds the GPU variants
as dynamically loaded libraries which we can select at runtime.

This also bumps the ROCm library to version 6 given 5.7 builds don't work
on the latest ROCm library that just shipped.
2023-12-20 14:45:57 -08:00
Jeffrey Morgan
df06812494 Update api.md 2023-12-20 08:47:53 -05:00
Daniel Hiltgen
1d1eb1688c Additional nvidia-ml path to check 2023-12-19 15:52:34 -08:00
Michael Yang
23dc179350 Merge pull request #1619 from jmorganca/mxyng/fix-version-test
fix(test): use real version string for comparison
2023-12-19 15:48:52 -08:00
Michael Yang
63aac0edc5 fix(test): use real version string for comparison 2023-12-19 15:03:02 -08:00
Daniel Hiltgen
6558f94ed0 Fix darwin intel build 2023-12-19 13:32:24 -08:00
Erick Ghaumez
1ca484f67e Add Langchain Dart library (#1564)
* Add Langchain Dart

* Update README.md

---------

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-19 14:04:52 -05:00
Jeffrey Morgan
72b0c32fe9 Update README.md 2023-12-19 12:59:22 -05:00
Jeffrey Morgan
68c28224f8 Update README.md 2023-12-19 12:59:03 -05:00
Daniel Hiltgen
54dbfa4c4a Carry ggml-metal.metal as payload 2023-12-19 09:05:46 -08:00
Daniel Hiltgen
5646826a79 Add WSL2 path to nvidia-ml.so library 2023-12-19 09:05:46 -08:00
Daniel Hiltgen
3269535a4c Refine handling of shim presence
This allows the CPU only builds to work on systems with Radeon cards
2023-12-19 09:05:46 -08:00
Daniel Hiltgen
1b991d0ba9 Refine build to support CPU only
If someone checks out the ollama repo and doesn't install the CUDA
library, this will ensure they can build a CPU only version
2023-12-19 09:05:46 -08:00
Daniel Hiltgen
51082535e1 Add automated test for multimodal
A simple test case that verifies llava:7b can read text in an image
2023-12-19 09:05:46 -08:00
Daniel Hiltgen
9adca7f711 Bump llama.cpp to b1662 and set n_parallel=1 2023-12-19 09:05:46 -08:00
Daniel Hiltgen
89bbaafa64 Build linux using ubuntu 20.04
This changes the container-based linux build to use an older Ubuntu
distro to improve our compatibility matrix for older user machines
2023-12-19 09:05:46 -08:00
Daniel Hiltgen
35934b2e05 Adapted rocm support to cgo based llama.cpp 2023-12-19 09:05:46 -08:00
65a
f8ef4439e9 Use build tags to generate accelerated binaries for CUDA and ROCm on Linux.
The build tags rocm or cuda must be specified to both go generate and go build.
ROCm builds should have both ROCM_PATH set (and the ROCM SDK present) as well
as CLBlast installed (for GGML) and CLBlast_DIR set in the environment to the
CLBlast cmake directory (likely /usr/lib/cmake/CLBlast). Build tags are also
used to switch VRAM detection between cuda and rocm implementations, using
added "accelerator_foo.go" files which contain architecture specific functions
and variables. accelerator_none is used when no tags are set, and a helper
function addRunner will ignore it if it is the chosen accelerator. Fix go
generate commands, thanks @deadmeu for testing.
2023-12-19 09:05:46 -08:00
Daniel Hiltgen
d4cd695759 Add cgo implementation for llama.cpp
Run the server.cpp directly inside the Go runtime via cgo
while retaining the LLM Go abstractions.
2023-12-19 09:05:46 -08:00
Bruce MacDonald
5e7fd6906f Update images.go 2023-12-19 09:05:46 -08:00
Bruce MacDonald
811b1f03c8 deprecate ggml
- remove ggml runner
- automatically pull gguf models when ggml detected
- tell users to update to gguf in the case automatic pull fails

Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>
2023-12-19 09:05:46 -08:00
Matt Williams
ed195f3562 Merge pull request #1595 from pgibler/main
Added cmdh to community section in README
2023-12-18 20:55:18 -08:00
Matt Williams
e0d0072ef1 Merge pull request #1592 from jmorganca/mattw/examplepruning
Let's get rid of these old modelfile examples
2023-12-18 20:29:48 -08:00
pgibler
620a2ffcfb Added cmdh to community section in README 2023-12-18 22:04:40 -05:00
Matt Williams
d287013f24 Let's get rid of these old modelfile examples
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-12-18 17:47:33 -08:00
Jeffrey Morgan
6b5bdfa6c9 update runner submodule 2023-12-18 17:33:46 -05:00
Jeffrey Morgan
c063ee4af0 update runner submodule to fix hipblas build 2023-12-18 15:41:13 -05:00
Bruce MacDonald
d99fa6ce0a send empty messages on last chat response (#1530) 2023-12-18 14:23:38 -05:00
Patrick Devine
3948c6ea06 add magic header for unit tests (#1558) 2023-12-18 10:41:02 -08:00
Jeffrey Morgan
b85982eb91 update runner submodule 2023-12-18 12:43:31 -05:00
Patrick Devine
86b0dd4b16 add API create/copy handlers (#1541) 2023-12-15 11:59:18 -08:00
Augustinas Malinauskas
f728738427 README with Enchanted iOS App (#1529)
* feat(docs): README with Enchanted iOS app

* Update README.md

---------

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-15 14:37:29 -05:00
Ian Purton
115048a0d8 Added Bionic GPT as a front end. (#1463)
* Added Bionic GPT as a front end.

* Update README.md

---------

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-15 14:33:04 -05:00
Bruce MacDonald
1b417a7836 use exp slices for go 1.20 compatibility (#1544) 2023-12-15 14:15:56 -05:00
Patrick Devine
0174665d0e add API tests for list handler (#1535) 2023-12-14 18:18:25 -08:00
Patrick Devine
630518f0d9 Add unit test of API routes (#1528) 2023-12-14 16:47:40 -08:00
Bruce MacDonald
6e16098a60 remove sample_count from docs (#1527)
this info has not been returned from these endpoints in some time
2023-12-14 17:49:00 -05:00
Bruce MacDonald
6ee8c80199 restore model load duration on generate response (#1524)
* restore model load duration on generate response

- set model load duration on generate and chat done response
- calculate createAt time when response created

* remove checkpoints predict opts

* Update routes.go
2023-12-14 12:15:50 -05:00
Jeffrey Morgan
31f0551dab Update runner to support mixtral and mixture of experts (MoE) (#1475) 2023-12-13 17:15:10 -05:00
Jeffrey Morgan
4a1abfe4fa fix tests 2023-12-13 14:42:30 -05:00
Jeffrey Morgan
bbd41494bf add multimodal to README.md 2023-12-13 14:38:47 -05:00
Jeffrey Morgan
fedba24a63 Docs for multimodal support (#1485)
* add multimodal docs

* add chat api docs

* consistency between `/api/generate` and `/api/chat`

* simplify docs
2023-12-13 13:59:33 -05:00
pepperoni21
e3b090dbc5 Added message format for chat api (#1488) 2023-12-13 11:21:23 -05:00
Patrick Devine
d9e60f634b add image support to the chat api (#1490) 2023-12-12 13:28:58 -08:00
Michael Yang
4251b342de Merge pull request #1469 from jmorganca/mxyng/model-types
remove per-model types
2023-12-12 12:27:03 -08:00
Jeffrey Morgan
0a9d348023 Fix issues with /set template and /set system (#1486) 2023-12-12 14:43:19 -05:00
Bruce MacDonald
3144e2a439 exponential back-off (#1484) 2023-12-12 12:33:02 -05:00
Bruce MacDonald
c0960e29b5 retry on concurrent request failure (#1483)
- remove parallel
2023-12-12 12:14:35 -05:00
ruecat
5314fc9b63 Fix Readme "Database -> MindsDB" link (#1479) 2023-12-12 10:26:13 -05:00
Jorge Torres
a36b5fef3b Update README.md (#1412) 2023-12-11 18:05:10 -05:00
Patrick Devine
910e9401d0 Multimodal support (#1216)
---------

Co-authored-by: Matt Apperson <mattapperson@Matts-MacBook-Pro.local>
2023-12-11 13:56:22 -08:00
Michael Yang
56ffc3023a remove per-model types
mostly replaced by decoding tensors, except for ggml models, which only
support llama
2023-12-11 09:40:21 -08:00
Bruce MacDonald
7a1b37ac64 os specific ctrl-z (#1420) 2023-12-11 10:48:14 -05:00
Jeffrey Morgan
5d4d2e2c60 update docs with chat completion api 2023-12-10 13:53:36 -05:00
Jeffrey Morgan
7db5bcf73b fix go-staticcheck warning 2023-12-10 11:44:27 -05:00
Jeffrey Morgan
fa2f095bd9 fix model name returned by /api/generate being different than the model name provided 2023-12-10 11:42:15 -05:00
Jeffrey Morgan
045b855db9 fix error on accumulating final chat response 2023-12-10 11:24:39 -05:00
Jeffrey Morgan
32064a0646 fix empty response when receiving runner error 2023-12-10 10:53:38 -05:00
Jeffrey Morgan
d9a250e9b5 seek to end of file when decoding older model formats 2023-12-09 21:14:35 -05:00
Jeffrey Morgan
944519ed16 seek to eof for older model binaries 2023-12-09 20:48:57 -05:00
Jeffrey Morgan
2dd040d04c do not use --parallel 2 for old runners 2023-12-09 20:17:33 -05:00
Bruce MacDonald
bbe41ce41a fix: parallel queueing race condition caused silent failure (#1445)
* fix: queued request failures

- increase parallel requests to 2 to complete queued requests; queueing is managed in ollama

* log stream errors
2023-12-09 14:14:02 -05:00
Jeffrey Morgan
9e1406e4ed Don't expose model information in /api/generate 2023-12-09 02:05:43 -08:00
Jeffrey Morgan
b74580c913 Update api.md 2023-12-08 16:02:07 -08:00
Bruce MacDonald
7e9405fd07 fix: encode full previous prompt in context (#1424) 2023-12-08 16:53:51 -05:00
Bruce MacDonald
3b0b8930d4 fix: only flush template in chat when current role encountered (#1426) 2023-12-08 16:44:24 -05:00
Bruce MacDonald
e3f925fc1b fix: restore modelfile system in prompt template (#1425) 2023-12-08 14:20:19 -05:00
Jeffrey Morgan
2a2289fb6b Update api.md 2023-12-08 09:36:45 -08:00
Matt Williams
dd427f499a Merge pull request #1419 from jmorganca/mattw/typescript-simplechat
Simple chat example for typescript
2023-12-07 14:42:24 -08:00
Michael Yang
2ae573c7ed Merge pull request #1421 from jmorganca/mxyng/fix-newline
fix redundant newline
2023-12-07 13:47:23 -08:00
Matt Williams
02fe26c44b update the readme as per bruce
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-12-07 13:46:30 -08:00
Michael Yang
16c7548460 fix redundant newline 2023-12-07 13:44:45 -08:00
Matt Williams
fa75998c0d Update examples/typescript-simplechat/readme.md
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-07 13:40:54 -08:00
Matt Williams
5344f886c8 Update examples/typescript-simplechat/client.ts
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-07 13:40:37 -08:00
Matt Williams
6cc823c9b5 Update examples/typescript-simplechat/client.ts
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-07 13:39:59 -08:00
Matt Williams
b84d34e632 Update examples/typescript-simplechat/readme.md
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-07 13:39:33 -08:00
Matt Williams
30229a913c Update examples/typescript-simplechat/client.ts
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-07 13:39:24 -08:00
Matt Williams
1ade380bd7 Simple chat example for typescript
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-12-07 11:48:25 -08:00
Jeffrey Morgan
ba264e9da8 add future version note to chat api docs 2023-12-07 09:42:15 -08:00
Matt Williams
a2405ec831 Merge pull request #1409 from jmorganca/mattw/python-simplechat
Simple chat example
2023-12-06 15:49:45 -08:00
Matt Williams
ce809bb529 Merge branch 'mattw/python-simplechat' of github.com:jmorganca/ollama into mattw/python-simplechat 2023-12-06 15:48:42 -08:00
Matt Williams
76bc4d0458 Cleanup as per Bruce
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-12-06 15:44:40 -08:00
Bruce MacDonald
4a02945a15 Update examples/python-simplechat/client.py 2023-12-06 18:36:45 -05:00
Matt Williams
aec742b6d2 Update examples/python-simplechat/readme.md
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-06 15:30:45 -08:00
Matt Williams
f337642e94 Update examples/python-simplechat/readme.md
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-06 15:30:35 -08:00
Matt Williams
51131cc6e2 Update examples/python-simplechat/client.py
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-12-06 15:30:10 -08:00
Matt Williams
43027789dc Simple chat example
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-12-06 14:35:58 -08:00
Xe Iaso
f9b7d65e2b docs/tutorials: add bit on how to use Fly GPUs on-demand with Ollama (#1406)
Signed-off-by: Xe Iaso <xe@camellia.finch-kitefin.ts.net>
2023-12-06 14:14:02 -08:00
Michael Yang
1f05d77110 Merge pull request #1244 from jmorganca/brucemacd/no-fail-template
do not fail on unsupported template variables
2023-12-06 13:23:04 -08:00
Michael Yang
c3ff36088b Merge pull request #774 from jmorganca/mxyng/server-version
add version api and show server version in cli
2023-12-06 13:22:55 -08:00
Samuel Calderon
13524b5e72 List "Send chat messages" in table of contents (#1399)
Thank you @calderonsamuel
2023-12-06 12:34:27 -08:00
Michael Yang
f1b049fed8 Merge pull request #1377 from jmorganca/mxyng/qwen
update for qwen
2023-12-06 12:31:51 -08:00
Jeffrey Morgan
97c5696945 fix base urls in chat examples 2023-12-06 12:10:20 -08:00
Bruce MacDonald
47d4e22673 use missingkey in set empty interface when missing 2023-12-05 15:49:05 -08:00
Michael Yang
32f62fbb8e Merge pull request #1334 from jmorganca/mxyng/load-projectors
load projectors
2023-12-05 14:40:53 -08:00
Michael Yang
5d75505ebd return model configuration in generate 2023-12-05 14:39:02 -08:00
Michael Yang
b9495ea162 load projectors 2023-12-05 14:36:12 -08:00
Michael Yang
409bb9674e Merge pull request #1308 from jmorganca/mxyng/split-from
split from into one or more models
2023-12-05 14:33:03 -08:00
Michael Yang
d3479c07a1 Merge pull request #1250 from jmorganca/mxyng/create-layer
refactor layer creation
2023-12-05 14:32:52 -08:00
Michael Yang
b12f1b984f Merge pull request #1393 from jmorganca/mxyng/fix-whitespace
fix: trim space in modelfile fields
2023-12-05 12:18:01 -08:00
Bruce MacDonald
195e3d9dbd chat api endpoint (#1392) 2023-12-05 14:57:33 -05:00
Michael Yang
38fe1a368b fix: trim space in modelfile fields 2023-12-05 11:57:29 -08:00
Michael Yang
4b77fcb2b9 comments 2023-12-05 09:43:50 -08:00
Michael Yang
cde13bcdea cmd: only print server version when different 2023-12-05 09:36:01 -08:00
Michael Yang
0f0cd265a7 cmd: add server version 2023-12-05 09:36:01 -08:00
Michael Yang
0db4706ec2 api: add version api handler 2023-12-05 09:36:01 -08:00
Michael Yang
1ebdbd9694 server: add version handler 2023-12-05 09:36:01 -08:00
Michael Yang
5c59455b59 cmd: use existing cmd context 2023-12-05 09:36:01 -08:00
Jeffrey Morgan
00d06619a1 Revert "chat api (#991)" while context variable is fixed
This reverts commit 7a0899d62d.
2023-12-04 21:16:27 -08:00
Matt Williams
f1ef3f9947 remove mention of gpt-neox in import (#1381)
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-12-04 20:58:10 -08:00
Michael Yang
5a5dca13b2 comments 2023-12-04 16:59:23 -08:00
Michael Yang
7232f1fa41 go mod tidy 2023-12-04 16:59:23 -08:00
Michael Yang
72e7a49aa9 seek instead of copyn 2023-12-04 16:59:23 -08:00
Michael Yang
a3737cbd33 use NewLayer for CreateBlobHandler 2023-12-04 16:59:23 -08:00
Michael Yang
998f1785b6 add modelfamilies 2023-12-04 16:59:23 -08:00
Michael Yang
70a93057cd refactor layer creation
previous layer creation was not ideal because:

1. it required reading the input file multiple times, once to calculate
   the sha256 checksum, another to write it to disk, and potentially one
   more to decode the underlying gguf
2. used io.ReadSeeker which is prone to user error. if the file isn't
   reset correctly or in the right place, it could end up reading an
   empty file

there is also some brittleness when reading existing layers; otherwise
writing the inherited layers will error reading an already closed file

this commit aims to fix these issues by restructuring layer creation.

1. it will now write the layer to a temporary file as well as the hash
   function and move it to the final location on Commit
2. layers are read only once when copied to the destination. the exception
   is raw model files, which still require a second read to decode the
   model metadata
2023-12-04 16:59:23 -08:00
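
The single-pass flow the commit describes (hash while writing to a temporary file, then move into place on commit) can be sketched like this; directory layout and naming are illustrative assumptions:

```go
package server

import (
	"crypto/sha256"
	"encoding/hex"
	"io"
	"os"
	"path/filepath"
)

// createLayer streams the input once through both a temp file and a sha256
// hasher, then renames the temp file to its digest-based name on success.
func createLayer(r io.Reader, blobDir string) (digest string, err error) {
	tmp, err := os.CreateTemp(blobDir, "layer-*")
	if err != nil {
		return "", err
	}
	defer os.Remove(tmp.Name()) // error ignored; already renamed on success

	h := sha256.New()
	if _, err := io.Copy(io.MultiWriter(tmp, h), r); err != nil {
		tmp.Close()
		return "", err
	}
	if err := tmp.Close(); err != nil {
		return "", err
	}

	digest = "sha256-" + hex.EncodeToString(h.Sum(nil))
	return digest, os.Rename(tmp.Name(), filepath.Join(blobDir, digest))
}
```
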
Michael Yang
2cb0fa7d40 split from into one or more models 2023-12-04 16:59:23 -08:00
Michael Yang
b2816bca67 unnecessary ReadSeeker for DecodeGGML 2023-12-04 16:59:23 -08:00
Patrick Devine
bf704423c5 revert cli to use /api/generate (#1383) 2023-12-04 16:35:29 -08:00
Bruce MacDonald
7a0899d62d chat api (#991)
- update chat docs
- add messages chat endpoint
- remove deprecated context and template generate parameters from docs
- context and template are still supported for the time being and will continue to work as expected
- add partial response to chat history
2023-12-04 18:01:06 -05:00
Michael Yang
0cca1486dd Merge pull request #1376 from jmorganca/mxyng/rocky-install
install: fix rocky kernel packages
2023-12-04 14:23:43 -08:00
Patrick Devine
2113c9d31a make linewrap still work when the terminal width has changed (#1350) 2023-12-04 14:14:56 -08:00
Michael Yang
6deebf2489 update for qwen 2023-12-04 11:38:05 -08:00
Michael Yang
95cb38ae47 install: fix rocky kernel packages 2023-12-04 11:10:42 -08:00
ruecat
1f126afb2d Ollama Telegram Bot (#1364)
* Add "ollama-telegram" to Extensions & Plugins

* Update README.md
2023-12-03 11:19:55 -08:00
Jeffrey Morgan
f6201a7a6c remove duplicate community integration in README.md 2023-12-02 21:18:13 -08:00
Michael Yang
b3f6c6598f Merge pull request #1349 from jmorganca/mxyng/ctrl-z
handle ctrl+z
2023-12-01 16:21:49 -08:00
Michael Yang
88620e983a handle ctrl+z 2023-12-01 16:15:20 -08:00
Michael Yang
cedae0d17a Merge pull request #1347 from jshph/adapter-hash
Fix adapter loading from SHA hash
2023-12-01 11:08:25 -08:00
Joshua Pham
bb80a597db Fix adapter loading from SHA hash 2023-12-01 13:50:55 -05:00
Patrick Devine
6681d37861 allow setting the system and template for prompts in the repl (#1335) 2023-12-01 09:28:35 -08:00
Michael Yang
0409c1fa59 docker: set PATH, LD_LIBRARY_PATH, and capabilities (#1336)
* docker: set PATH, LD_LIBRARY_PATH, and capabilities

* example: update k8s gpu manifest
2023-11-30 21:16:56 -08:00
Michael Yang
b56e92470a Merge pull request #1229 from jmorganca/mxyng/calculate-as-you-go
revert checksum calculation to calculate-as-you-go
2023-11-30 10:54:38 -08:00
Jeffrey Morgan
5687f1a0cf fix unexpected end of response errors when cancelling in ollama run 2023-11-30 00:30:21 -05:00
James Radtke
7eda3d0c55 Corrected transposed 129 to 192 for OLLAMA_ORIGINS example (#1325) 2023-11-29 22:44:17 -05:00
Bruce MacDonald
7194a07d4d Add chatd to example projects 2023-11-29 21:18:21 -05:00
Michael Yang
13efd5f218 upload: fix PUT retry 2023-11-29 16:38:35 -08:00
Michael Yang
c4bdfffd96 upload: separate progress tracking 2023-11-29 16:38:33 -08:00
Michael Yang
26c63418e0 new hasher 2023-11-29 14:52:41 -08:00
Michael Yang
2799784ac8 revert checksum calculation to calculate-as-you-go 2023-11-29 13:47:58 -08:00
Alec Hammond
91897a606f Add OllamaEmbeddings to python LangChain example (#994)
* Add OllamaEmbeddings to python LangChain example

* typo

---------

Co-authored-by: Alec Hammond <alechammond@fb.com>
2023-11-29 16:25:39 -05:00
Bruce MacDonald
96122b7271 validate model tags on copy (#1323) 2023-11-29 15:54:29 -05:00
jeremiahbuckley
39be7fdb98 fix rhel cuda install (#1321)
Co-authored-by: Cloud User <azureuser@testgpu2.hqzwom21okjenksna4y3c4ymjd.phxx.internal.cloudapp.net>
2023-11-29 14:55:15 -05:00
Timothy Jaeryang Baek
c2e3b89176 fix: disable ':' in tag names (#1280)
Co-authored-by: rootedbox
2023-11-29 13:33:45 -05:00
Patrick Devine
cde31cb220 Allow setting parameters in the REPL (#1294) 2023-11-29 09:56:42 -08:00
ToasterUwU
63097607b2 Correct MacOS Host port example (#1301) 2023-11-29 11:44:03 -05:00
Michael
2ae80e1e27 Update README.md
add new recent models as examples
2023-11-28 22:16:37 -05:00
Michael Yang
b173cfc558 Merge pull request #1195 from jmorganca/mxyng/fix-bar-rate
progress: fix bar rate
2023-11-28 11:55:23 -08:00
Michael Yang
424d53ac70 progress: fix bar rate 2023-11-28 11:44:56 -08:00
ftorto
e1a69d44c9 Update faq.md (#1299)
Fix a typo in the CA update command
2023-11-28 09:54:42 -05:00
Jason Jacobs
3d620f9462 ignore jetbrain ides (#1287) 2023-11-27 15:57:45 -05:00
Bruce MacDonald
928950fcc6 update python client create example (#1227)
* add remote create to python example client
2023-11-27 15:36:19 -05:00
Kasumi
39c6d949fc Add Amica to community integrations (#1281) 2023-11-27 10:44:37 -05:00
Jeffrey Morgan
16a9006306 add back f16c instructions on intel mac 2023-11-26 15:59:49 -05:00
Jeffrey Morgan
e9216ea459 fix readline history on linux 2023-11-26 15:59:04 -05:00
Jeffrey Morgan
9e4a316405 update submodule commit 2023-11-26 14:52:00 -05:00
Jeffrey Morgan
9fb5e8399c Fix issues with inputting and formatting multi line strings in ollama run
Co-authored-by: Wen Sun <iwendellsun@gmail.com>
2023-11-26 12:54:29 -05:00
Jing Zhang
82b9b329ff windows CUDA support (#1262)
* Support cuda build in Windows
* Enable dynamic NumGPU allocation for Windows
2023-11-24 17:16:36 -05:00
Jongwook Choi
12e8c12d2b Disable CUDA peer access as a workaround for multi-gpu inference bug (#1261)
When CUDA peer access is enabled, multi-gpu inference will produce
garbage output. This is a known bug of llama.cpp (or nvidia). Until the
upstream bug is fixed, we can disable CUDA peer access temporarily
to ensure correct output.

See #961.
2023-11-24 14:05:57 -05:00
Jeffrey Morgan
d77dde126b consistent cpu instructions on macos and linux 2023-11-22 16:26:46 -05:00
Michael Yang
c7e70cd3bb Merge pull request #1245 from jmorganca/mxyng/gguf-int
fix: gguf int type
2023-11-22 11:42:56 -08:00
Michael Yang
199941cd15 fix: gguf int type 2023-11-22 11:40:30 -08:00
Long Huynh
c9474f7f61 Update README.md - Community Integrations - Obsidian BMO Chatbot plugin (#1239) 2023-11-22 14:32:30 -05:00
Jeffrey Morgan
927e3ba4a4 tag image with correct version when building with build_docker script 2023-11-22 14:32:17 -05:00
Bruce MacDonald
37d95157df fix relative path on create (#1222) 2023-11-21 15:43:17 -05:00
Jeffrey Morgan
2eaa95b417 Update api.md 2023-11-21 15:32:05 -05:00
Kevin Cao
3cd07728f4 Make alt+backspace delete word (#1223) 2023-11-21 12:26:47 -08:00
Michael Yang
ecf8b793f0 Merge pull request #1224 from jmorganca/mxyng/update
update llama.cpp
2023-11-21 12:21:59 -08:00
Matt Williams
abf294826b Merge pull request #1221 from jmorganca/mattw/communityinstalls
add installation packages category to community
2023-11-21 12:12:23 -08:00
Steve Korshakov
ae06bb426b add Llama Coder (#1225)
* add Llama Coder
* Update README.md
2023-11-21 14:08:19 -05:00
Matt Williams
d8e0f62ebb Merge pull request #1159 from jmorganca/mattw/functioncalling
Example: Function Calling in Typescript
2023-11-21 10:06:55 -08:00
Michael Yang
a00fac4ec8 update llama.cpp 2023-11-21 09:50:02 -08:00
Jeffrey Morgan
f2113c1fc7 fix potential error in progress bar calculation 2023-11-21 12:48:20 -05:00
Jeffrey Morgan
6452e2ecb8 fix cases where progress bar would not be fixed size 2023-11-21 12:07:25 -05:00
Matt Williams
9a28e263a5 Update README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-11-21 07:25:32 -08:00
Matt Williams
0c066c9214 Update README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-11-21 07:25:26 -08:00
Jeffrey Morgan
aabd71aede fix rendering and variable width issues on progress bar 2023-11-21 10:02:37 -05:00
Matt Williams
da4d7c9f9c add installation packages category to community
Moved the arch package, and someone has added a PR for brew;
that needs to get updated to be a link.

Signed-off-by: Matt Williams <m@technovangelist.com>
2023-11-21 06:40:59 -08:00
Matt Williams
f321b13a03 Merge pull request #1178 from tusharhero/install-instructions-archlinux
Add Installation instructions for Archlinux
2023-11-21 06:33:22 -08:00
Matt Williams
5ebcde1541 Merge branch 'main' into install-instructions-archlinux 2023-11-21 06:32:50 -08:00
Matt Williams
45206cb7cc Merge pull request #1218 from danemadsen/main
Update Maid repo
2023-11-21 06:30:33 -08:00
Matt Williams
6e65b84f54 Merge pull request #1219 from dustinblackman/main
docs: Add Oatmeal to terminal integrations
2023-11-21 06:28:12 -08:00
Dustin Blackman
c00ce12e83 docs: Add Oatmeal to terminal integrations 2023-11-21 06:47:43 -05:00
tusharhero
e1cd3152c9 Move Archlinux package to Community Integrations section. 2023-11-21 16:28:50 +05:30
Dane Madsen
0bef3778c9 Update README.md 2023-11-21 21:02:13 +11:00
Dane Madsen
6ebab38b89 Merge branch 'jmorganca:main' into main 2023-11-21 20:01:13 +10:00
Dane Madsen
5d8e864d44 Update Maid repo 2023-11-21 21:00:54 +11:00
Matt Williams
5f7acd0bbd remove 'recent'
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-11-20 17:03:25 -08:00
Matt Williams
44b3a1ad42 Merge branch 'mattw/functioncalling' of github.com:jmorganca/ollama into mattw/functioncalling
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-11-20 17:01:41 -08:00
Matt Williams
0260be4414 remove 'recently'
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-11-20 16:57:07 -08:00
Jeffrey Morgan
a3fcecf943 only set main_gpu if value > 0 is provided 2023-11-20 19:54:04 -05:00
Jeffrey Morgan
df07e4a097 remove redundant filename parameter (#1213) 2023-11-20 17:05:36 -05:00
Michael Yang
0b7ade0d4c Merge pull request #1212 from jmorganca/mxyng/metal
enable metal for fp32, q5_0, q5_1
2023-11-20 13:56:39 -08:00
Michael Yang
19b7a4d715 recent llama.cpp update added kernels for fp32, q5_0, and q5_1 2023-11-20 13:44:31 -08:00
Bruce MacDonald
31ab453d37 resolve FROM path before sending modelfile (#1211) 2023-11-20 16:43:48 -05:00
Jeffrey Morgan
35c4b5ec16 calculate hash separately from http request 2023-11-20 15:45:11 -05:00
James Braza
f24741ff39 Documenting how to view Modelfiles (#723)
* Documented viewing Modelfiles in ollama.ai/library

* Moved Modelfile in ollama.ai down per request
2023-11-20 15:24:29 -05:00
Jeffrey Morgan
8c4022b06b fix initial progress stats 2023-11-20 14:33:46 -05:00
Jeffrey Morgan
433702f421 hide progress stats on completion 2023-11-20 14:22:39 -05:00
Matt Williams
48896f626c Update examples/typescript-functioncalling/extractwp.ts
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-20 10:12:10 -08:00
Matt Williams
c57aee6fba Update examples/typescript-functioncalling/readme.md
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-20 10:10:42 -08:00
Jeffrey Morgan
6066c70edd restore progress messages for older endpoints 2023-11-20 11:37:17 -05:00
Jeffrey Morgan
f10ac5de19 restore stats updated every second to progress bar 2023-11-20 10:58:19 -05:00
Jeffrey Morgan
93a108214c only show decimal points for smaller file size numbers 2023-11-20 10:58:19 -05:00
Purinda Gunasekara
be61a81758 main-gpu argument is not getting passed to llamacpp, fixed. (#1192) 2023-11-20 10:52:52 -05:00
Toni Soriano
2fdf1b5ff8 add laravel package to README.md (#1208)
Co-authored-by: Toni <cloudstudio@Tonis-Mac-mini.local>
2023-11-20 10:48:35 -05:00
Huy Le
331068b964 Adding ogpt.nvim into the list of plugins! (#1190)
* adding ollama.nvim for visibility

* adding an ogpt.nvim neovim plugin
2023-11-20 10:39:14 -05:00
Andy Brenneke
0179d8eb6b Add Rivet to Community Integrations (#1183) 2023-11-20 10:36:47 -05:00
Eli Bendersky
be48741308 README: link to LangChainGo for talking to ollama, with an example (#1206) 2023-11-20 10:35:07 -05:00
Jeffrey Morgan
6bbd6e26fb fix temporary newline created and removed with spinner in ollama run 2023-11-20 00:49:08 -05:00
Jeffrey Morgan
e6ad4813d3 dont crash when redirecting stderr 2023-11-19 23:50:45 -05:00
Jeffrey Morgan
13ba6df5ab enable cpu instructions on intel macs 2023-11-19 23:20:26 -05:00
Jeffrey Morgan
9d73d3a6b5 add back part.Reset() 2023-11-19 14:32:19 -05:00
Jeffrey Morgan
72cd336410 dont retry on upload complete context cancel 2023-11-19 14:32:19 -05:00
Jeffrey Morgan
1bd594b2fa revert to using one open file for blob uploads 2023-11-19 14:32:19 -05:00
Jeffrey Morgan
9a8c21ac3d use exponential everywhere 2023-11-19 14:32:19 -05:00
Jeffrey Morgan
f6b317e8c9 fix sending too little data in chunk upload body 2023-11-19 14:32:19 -05:00
Jeffrey Morgan
ac5076ce1e exponential backoff up to 30s 2023-11-19 14:32:19 -05:00
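The retry commits above describe exponential backoff capped at 30 seconds. Below is a minimal sketch of that pattern, assuming a simple retry loop that doubles the delay each attempt; the function and names are illustrative, not the repository's upload code.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryWithBackoff retries fn with an exponentially growing delay, capped at
// maxDelay (30s in the commits above). Illustrative sketch only.
func retryWithBackoff(attempts int, maxDelay time.Duration, fn func() error) error {
	delay := time.Second
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		time.Sleep(delay)
		delay *= 2
		if delay > maxDelay {
			delay = maxDelay
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retryWithBackoff(5, 30*time.Second, func() error {
		calls++
		if calls < 3 {
			return errors.New("transient failure")
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```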
Michael Yang
42c2e3a624 upload: retry complete upload 2023-11-19 14:32:19 -05:00
Michael Yang
cb42589792 adjust download/upload parts 2023-11-19 14:32:19 -05:00
Jeffrey Morgan
258addc799 fix comment in progress.go 2023-11-19 13:46:19 -05:00
Jeffrey Morgan
c06b9b7304 update progress rendering to be closer to v0.1.10 2023-11-19 13:43:21 -05:00
Jeffrey Morgan
95b9acd324 improve pull percentage rendering 2023-11-19 11:00:43 -05:00
Jeffrey Morgan
04cbf5ccc0 progress bar styling improvements 2023-11-19 09:54:33 -05:00
Jeffrey Morgan
e1d7056496 update progress statuses 2023-11-19 09:21:13 -05:00
Jeffrey Morgan
02524a56ff check retry for authorization error 2023-11-19 00:19:53 -05:00
Jeffrey Morgan
1657c6abc7 add note to specify JSON in the prompt when using JSON mode 2023-11-18 22:59:26 -05:00
Jeffrey Morgan
12e046f12a remove unused function 2023-11-18 22:16:51 -05:00
Jeffrey Morgan
36a3bbf65f Update llm/llama.go 2023-11-18 21:25:07 -05:00
Bruce MacDonald
43a726149d fix potentially inaccurate error message 2023-11-18 21:25:07 -05:00
Jeffrey Morgan
984714f131 update status text when transferring blob on ollama create 2023-11-18 09:40:10 -05:00
Jeffrey Morgan
bab9494176 add - separator to temp file created on ollama create 2023-11-18 09:39:52 -05:00
Jeffrey Morgan
85e4441c6a cache docker builds 2023-11-18 08:51:38 -05:00
Michael Yang
42e43736a4 Merge pull request #1186 from jmorganca/mxyng/copy-blob
fix cross device rename
2023-11-17 21:54:53 -08:00
Michael Yang
c6e6c8ee7e fix cross device rename 2023-11-17 15:22:17 -08:00
Jeffrey Morgan
a185b29719 fix install script error on linux 2023-11-17 18:00:41 -05:00
Michael Yang
dc84b20d6b Merge pull request #1104 from jmorganca/mxyng/jupyter
add jupyter notebook example
2023-11-17 14:46:26 -08:00
Michael Yang
ad8659b980 Merge pull request #1161 from jmorganca/mxyng/systemd-placeholder
placeholder environment variables
2023-11-17 14:45:38 -08:00
Michael Yang
c1bbf5ddee Merge pull request #1134 from jmorganca/mxyng/progress
progress bar
2023-11-17 14:03:35 -08:00
Bruce MacDonald
0b19e24d81 only retry once on auth failure (#1175) 2023-11-17 14:22:35 -05:00
Michael Yang
3cb07d2773 simplify StopAndClear 2023-11-17 10:26:22 -08:00
Michael Yang
976068369b stop all spinners on progress stop 2023-11-17 10:06:19 -08:00
Michael Yang
4d677ee389 no divide by zero 2023-11-17 10:06:19 -08:00
Michael Yang
7ea905871a only move cursor up if pos > 0 2023-11-17 10:06:19 -08:00
Michael Yang
d6ecaa2cbf update progress responses 2023-11-17 10:06:19 -08:00
Michael Yang
4dcf7a59b1 generate progress 2023-11-17 10:06:19 -08:00
Michael Yang
1c0e092ead progress cmd 2023-11-17 10:06:19 -08:00
Michael Yang
c4a3ccd7ac progress 2023-11-17 10:06:19 -08:00
Michael Yang
9f04e5a8ea format bytes 2023-11-17 10:06:19 -08:00
Michael Yang
f91bb2f7f0 remove progressbar 2023-11-17 10:06:19 -08:00
Michael Yang
0813387414 Merge pull request #1177 from jmorganca/mxyng/faq
faq: fix heading and add more details
2023-11-17 10:05:21 -08:00
Michael Yang
4936b5bb37 add jupyter readme 2023-11-17 10:04:52 -08:00
tusharhero
786288829e Make Archlinux a sub-heading of Linux. 2023-11-17 23:17:36 +05:30
tusharhero
72dcc952b6 Add Installation instructions for Archlinux
Pacman is the recommended installation method, and the package is in
the official repository, so it makes sense to mention it in the README.
2023-11-17 23:13:40 +05:30
Michael Yang
f7f6d6c693 Update examples/jupyter-notebook/ollama.ipynb
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-17 09:30:07 -08:00
Michael Yang
a3053b66d2 add jupyter notebook example 2023-11-17 09:30:07 -08:00
Michael Yang
c82ead4d01 faq: fix heading and add more details 2023-11-17 09:02:17 -08:00
Michael Yang
90860b6a7e update faq (#1176) 2023-11-17 11:42:58 -05:00
Jeffrey Morgan
81092147c4 remove unnecessary -X POST from example curl commands 2023-11-17 09:50:38 -05:00
Jeffrey Morgan
92656a74b7 Use llama2 as the model in api.md 2023-11-17 07:17:51 -05:00
Jeffrey Morgan
41434a7cdc build intel mac with correct binary and compile flags 2023-11-16 22:14:51 -05:00
Michael Yang
71687ab809 Merge pull request #1164 from jmorganca/mxyng/faq
update faq
2023-11-16 17:20:18 -08:00
Michael Yang
d8842b4d4b update faq 2023-11-16 17:07:36 -08:00
Michael Yang
32add8577d placeholder environment variables 2023-11-16 16:57:39 -08:00
Michael Yang
585f9c01fa Merge pull request #1160 from jmorganca/mxyng/faq
update faq
2023-11-16 16:48:51 -08:00
Michael Yang
c13bde962d Update docs/faq.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-11-16 16:48:38 -08:00
Michael Yang
ee307937fd update faq 2023-11-16 16:46:43 -08:00
Matt Williams
ab6639bc47 Merge pull request #1074 from jmorganca/mattw/loganalysisexample
Log Analysis Example
2023-11-16 16:33:07 -08:00
Matt Williams
fefae84c06 example: function calling
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-11-16 16:26:29 -08:00
Jeffrey Morgan
dbe6e77472 Update README.md 2023-11-16 16:46:38 -05:00
Bruce MacDonald
4b3f4bc7d9 return failure details when unauthorized to push (#1131)
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-11-16 16:44:18 -05:00
Michael Yang
a5ccf742c1 fix cross repo mounts 2023-11-16 16:33:30 -05:00
Michael Yang
e33ef391cd fix push scope error for inherited model 2023-11-16 16:33:30 -05:00
yanndegat
75295b9528 install: fix enable contrib on debian 12 (#1151)
On debian 12, sources definitions have moved from
/etc/apt/sources.list to /etc/apt/sources.list.d/debian.sources
2023-11-16 15:53:06 -05:00
Matt Williams
db5ef3004c Merge pull request #1079 from jmorganca/mattw/jsonexample
Add example using JSON format output
2023-11-16 09:13:34 -08:00
Michael Yang
b5f158f046 add faq for proxies (#1147) 2023-11-16 11:43:37 -05:00
Piero Savastano
30141b42e9 Add Cheshire Cat to community integrations (#1124) 2023-11-16 11:30:54 -05:00
Dane Madsen
5f301ece1d Add Maid to Community Integrations (#1120) 2023-11-16 11:27:53 -05:00
Michael Yang
77954bea0e Merge pull request #898 from jmorganca/mxyng/build-context
create remote models
2023-11-15 16:41:12 -08:00
Michael Yang
54f92f01cb update docs 2023-11-15 15:28:15 -08:00
Michael
30ae6e731e Update randomaddresses.py 2023-11-15 18:24:50 -05:00
Michael
b28a30f7ba Update examples/python-json-datagenerator/predefinedschema.py
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-15 18:23:36 -05:00
Jeffrey Morgan
ecd71347ab Update faq.md 2023-11-15 18:17:13 -05:00
Jeffrey Morgan
8ee4cbea0f Remove table of contents in faq.md 2023-11-15 18:16:27 -05:00
Michael Yang
652d90e1c7 Update server/images.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-15 15:16:23 -08:00
Michael Yang
bc22d5a38b no blob response 2023-11-15 15:16:23 -08:00
Michael Yang
71d71d0988 update docs 2023-11-15 15:16:23 -08:00
Michael Yang
1901044b07 use checksum reference 2023-11-15 15:16:23 -08:00
Michael Yang
d660eebf22 fix create from model tag 2023-11-15 15:16:23 -08:00
Michael Yang
cac11c9137 update api docs 2023-11-15 15:16:23 -08:00
Michael Yang
a07c935d34 ignore non blobs 2023-11-15 15:16:23 -08:00
Michael Yang
1552cee59f client create modelfile 2023-11-15 15:16:23 -08:00
Michael Yang
3ca56b5ada add create modelfile field 2023-11-15 15:16:23 -08:00
Michael Yang
b0d14ed51c refactor create model 2023-11-15 15:16:23 -08:00
Matt Williams
f61f340279 FAQ: answer a few faq questions (#1128)
* faq: does ollama share my prompts

Signed-off-by: Matt Williams <m@technovangelist.com>

* faq: ollama and openai

Signed-off-by: Matt Williams <m@technovangelist.com>

* faq: vscode plugins

Signed-off-by: Matt Williams <m@technovangelist.com>

* faq: send a doc to Ollama

Signed-off-by: Matt Williams <m@technovangelist.com>

* extra spacing

Signed-off-by: Matt Williams <m@technovangelist.com>

* Update faq.md

* Update faq.md

---------

Signed-off-by: Matt Williams <m@technovangelist.com>
Co-authored-by: Michael <mchiang0610@users.noreply.github.com>
2023-11-15 18:05:13 -05:00
Michael Yang
686f85d6ca Merge pull request #1132 from jmorganca/mxyng/human-bytes
replace go-humanize with format.HumanBytes
2023-11-15 09:46:21 -08:00
bnodnarb
85951d25ef Created tutorial for running Ollama on NVIDIA Jetson devices (#1098) 2023-11-15 12:32:37 -05:00
Dane Madsen
779e196ef6 Merge branch 'jmorganca:main' into main 2023-11-15 21:38:07 +10:00
Michael Yang
01ea6002c4 replace go-humanize with format.HumanBytes 2023-11-14 14:57:41 -08:00
Jeffrey Morgan
423862042a treat ollama run model < file as entire prompt, not prompt-per-line (#1126)
Previously, `ollama run` treated a non-terminal stdin (such as `ollama run model < file`) as containing one prompt per line. To run inference on a multi-line prompt, the only non-API workaround was to run `ollama run` interactively and wrap the prompt in `"""..."""`.

Now, `ollama run` treats a non-terminal stdin as containing a single prompt. For example, if `myprompt.txt` is a multi-line file, then `ollama run model < myprompt.txt` would treat `myprompt.txt`'s entire contents as the prompt.

Co-authored-by: Quinn Slack <quinn@slack.org>
2023-11-14 16:42:21 -05:00
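A small sketch of the stdin behavior change described in that commit, contrasting the old per-line reading with the new whole-input prompt; this is an illustration, not the actual `ollama run` implementation.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// promptsPerLine mirrors the old behavior: each line of stdin is a prompt.
func promptsPerLine(r io.Reader) []string {
	var prompts []string
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		prompts = append(prompts, scanner.Text())
	}
	return prompts
}

// singlePrompt mirrors the new behavior: the entire stdin is one prompt.
func singlePrompt(r io.Reader) (string, error) {
	b, err := io.ReadAll(r)
	return string(b), err
}

func main() {
	input := "line one\nline two\n"
	fmt.Println("old:", promptsPerLine(strings.NewReader(input)))
	prompt, _ := singlePrompt(strings.NewReader(input))
	fmt.Printf("new: %q\n", prompt)
}
```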
Bruce MacDonald
df18486c35 Move /generate format to optional parameters (#1127)
This field is optional and should be under the `Advanced parameters` header
2023-11-14 16:12:30 -05:00
Jeffrey Morgan
4e612a2e92 use stdout fd for terminal size (#1125) 2023-11-14 16:09:09 -05:00
Matt Williams
47ffb81db7 Update examples/python-json-datagenerator/readme.md
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-14 10:33:34 -08:00
Matt Williams
69795d2db0 Update examples/python-json-datagenerator/readme.md
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-14 10:33:16 -08:00
Matt Williams
acde0819d9 Update examples/python-json-datagenerator/randomaddresses.py
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-14 10:33:02 -08:00
Matt Williams
f748331aa3 Update examples/python-json-datagenerator/predefinedschema.py
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-14 10:32:45 -08:00
Matt Williams
f4edc302a8 Update examples/python-loganalysis/readme.md
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-14 10:31:22 -08:00
Matt Williams
64b7e0c218 Update examples/python-loganalysis/loganalysis.py
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-14 10:31:05 -08:00
Matt Williams
eced0d52ab Update examples/python-loganalysis/loganalysis.py
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-14 10:30:30 -08:00
Matt Williams
96bf9cafa7 Update examples/python-loganalysis/loganalysis.py
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-14 10:30:17 -08:00
Jeffrey Morgan
6e0f686afa --format json should work in interactive mode 2023-11-14 10:22:03 -05:00
Dane Madsen
c1a5220860 Update README.md 2023-11-14 15:31:31 +10:00
Dane Madsen
3b15175a70 Add maid to community integrations 2023-11-14 15:30:03 +10:00
Jeffrey Morgan
c1844bbee2 add json mode to cli (#1095) 2023-11-13 21:54:02 -05:00
Huy Le
cb745965ce adding ollama.nvim for visibility (#1115) 2023-11-13 17:00:17 -05:00
Enrico Ros
8d29b6a2b6 New big-AGI integration (#1078)
* New big-AGI integration

Ollama works great in big-AGI, and this document explains how to link the two projects.

* Update README.md
2023-11-13 16:59:00 -05:00
Ilya Breitburg
724aa64bee Add Dart library to README.md (#1106) 2023-11-13 14:50:42 -05:00
Michael Yang
d91c103e74 Merge pull request #1055 from dansreis/946-fix-incorrect-base-model-name
Fixed incorrect base model name
2023-11-13 08:42:55 -08:00
Kevin Hermawan
98ec7d81e3 Add OllamaKit to the community integrations (#1085) 2023-11-11 14:41:42 -08:00
Matt Williams
b6817a83d8 Add gif and finish readme
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-11-10 16:41:48 -06:00
Matt Williams
73f3448ede add example showing use of JSON format
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-11-10 16:33:56 -06:00
Daniel Reis
7c438f2c53 Replaced method 2023-11-10 20:22:03 +00:00
Daniel Reis
6e46338d44 Reverting previous changes 2023-11-10 20:21:35 +00:00
Jeffrey Morgan
cdddd3df65 add format to example python client 2023-11-10 10:22:21 -08:00
Daniel Hiltgen
afa61bdf45 Merge pull request #1075 from jmorganca/dhiltgen/unexpected-eof
Resume chunk download on UnexpectedEOF errors
2023-11-10 08:48:27 -08:00
Daniel Hiltgen
cc54a416c6 Resume chunk download on UnexpectedEOF errors
If the chunk download is interrupted, resume from where we left off
2023-11-10 08:29:42 -08:00
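A hedged sketch of the resume pattern described above: when a chunk copy ends in io.ErrUnexpectedEOF, re-request the remaining bytes with a Range header and continue from the current offset. The function and its arguments are invented for illustration; the real downloader also tracks parts and digests.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"os"
)

// downloadRange appends bytes [offset, end] of url to f, retrying from the
// current offset whenever the copy ends with io.ErrUnexpectedEOF.
func downloadRange(url string, f *os.File, offset, end int64) error {
	for {
		req, err := http.NewRequest(http.MethodGet, url, nil)
		if err != nil {
			return err
		}
		req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", offset, end))

		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			return err
		}

		n, err := io.Copy(f, resp.Body)
		resp.Body.Close()
		offset += n

		switch {
		case err == nil:
			return nil
		case errors.Is(err, io.ErrUnexpectedEOF):
			continue // resume from where we left off
		default:
			return err
		}
	}
}

func main() {
	fmt.Println("see downloadRange for the resume-on-UnexpectedEOF pattern")
}
```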
Matt Williams
c819d7f68a Merge pull request #955 from jmorganca/mattw/example-bash-compare
docs: add examples using bash to compare models
2023-11-10 08:59:32 -06:00
Matt Williams
e4f59ba073 better streaming plus gif
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-11-10 08:55:17 -06:00
Matt Williams
5de568bffe Add a simple log analysis example
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-11-10 08:28:52 -06:00
Jeffrey Morgan
5cba29b9d6 JSON mode: add `"format"` as an api parameter (#1051)
* add `"format": "json"` as an API parameter
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-09 16:44:02 -08:00
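A hedged example of the `"format": "json"` parameter added here, sent to /api/generate on a local server (the default address is assumed); per the note a few commits earlier, the prompt itself should also ask for JSON. The request struct is a sketch, not the project's API types.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// generateRequest is a minimal request body for /api/generate with JSON mode
// enabled. Field names reflect the documented parameters; this is a sketch,
// not the client library.
type generateRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
	Format string `json:"format"` // "json" constrains output to valid JSON
	Stream bool   `json:"stream"`
}

func main() {
	body, _ := json.Marshal(generateRequest{
		Model:  "llama2",
		Prompt: "List three colors. Respond using JSON.",
		Format: "json",
		Stream: false,
	})

	// Assumes a local server on the default port.
	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```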
Daniel Reis
d17730356a Removed inline parse model path 2023-11-09 22:44:26 +00:00
Daniel Reis
32d79a6eea Using 'GetShortTagname' method instead 2023-11-09 22:40:37 +00:00
Bruce MacDonald
5b39503bcd document specifying multiple stop params (#1061) 2023-11-09 13:16:26 -08:00
Bruce MacDonald
1ae84bc2a2 skip gpu if less than 2GB VRAM are available (#1059) 2023-11-09 13:16:16 -08:00
Bruce MacDonald
db8bf336fc Update README.md 2023-11-09 12:53:24 -08:00
Nick Anderson
d77e094a90 Added gptel to list of integrations (#1062) 2023-11-09 12:52:36 -08:00
Matt Williams
dd3dc47ddb Merge pull request #992 from aashish2057/aashish2057/langchainjs_doc_update 2023-11-09 05:08:31 -08:00
Michael Yang
c5e1bbabda instead of static number of parameters for each model family, get the real number from the tensors (#1022)
* parse tensor info

* refactor decoder

* return actual parameter count

* explicit rounding

* s/Human/HumanNumber/
2023-11-08 17:55:46 -08:00
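A simplified sketch of the idea in that change: sum the element counts of the parsed tensors to get the real parameter count instead of a hard-coded per-family number. The types here are invented for illustration.

```go
package main

import "fmt"

// tensorInfo is a stand-in for the tensor metadata parsed from a model file.
type tensorInfo struct {
	Name  string
	Shape []uint64
}

// parameterCount sums the number of elements across all tensors, giving the
// model's actual parameter count rather than a static per-family figure.
func parameterCount(tensors []tensorInfo) uint64 {
	var total uint64
	for _, t := range tensors {
		elems := uint64(1)
		for _, dim := range t.Shape {
			elems *= dim
		}
		total += elems
	}
	return total
}

func main() {
	tensors := []tensorInfo{
		{Name: "tok_embeddings.weight", Shape: []uint64{32000, 4096}},
		{Name: "output.weight", Shape: []uint64{32000, 4096}},
	}
	fmt.Println("parameters:", parameterCount(tensors))
}
```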
Bruce MacDonald
a49d6acc1e add a complete /generate options example (#1035) 2023-11-08 16:44:36 -08:00
Moritz Poldrack
6e9bcdb9b3 progressbar: make start and end seamless (#1042) 2023-11-08 16:42:40 -08:00
Matt Williams
13086363bd Update as per bmacd
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-11-08 18:09:05 -06:00
Bruce MacDonald
ec2a31e9b3 support raw generation requests (#952)
- add the optional `raw` generate request parameter to bypass prompt formatting and response context
- add raw request to docs
2023-11-08 14:05:02 -08:00
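A hedged illustration of the optional `raw` parameter described above: with `raw` set, the prompt is passed to the model as-is, bypassing template formatting and context handling. The struct mirrors the documented field names but is not the project's request type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rawRequest sketches a /api/generate body that bypasses prompt templating.
type rawRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
	Raw    bool   `json:"raw"` // skip prompt formatting and response context
}

func main() {
	// With raw set, the caller supplies any template markers itself.
	b, _ := json.MarshalIndent(rawRequest{
		Model:  "llama2",
		Prompt: "[INST] why is the sky blue? [/INST]",
		Raw:    true,
	}, "", "  ")
	fmt.Println(string(b))
}
```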
Amith Koujalgi
ec84c02d54 Add Ollama4j Java library to the list of community libraries (#1044) 2023-11-08 11:04:32 -08:00
Kevin Hermawan
2a88b66bc9 Add Ollamac to community integrations (#1043) 2023-11-08 11:01:09 -08:00
Jeffrey Morgan
2d0faea96c clean up README.md 2023-11-08 00:03:29 -08:00
Jeffrey Morgan
637142181a clean up README.md 2023-11-07 23:52:31 -08:00
Matt Williams
bcbff421c9 Merge pull request #1023 from jmorganca/mattw/wherearemodelsfaq 2023-11-07 17:59:54 -08:00
thealhu
1359d6cf3b Fix sudo variable in install.sh (#1034)
In one place, sudo had not been replaced with the sudo variable.
2023-11-07 09:59:57 -08:00
Omar Magdy
6e2d0224d9 Added logseq ollama plugin (#1029) 2023-11-07 09:58:13 -08:00
Ikko Eltociear Ashimine
921406f721 Update client.py (#1026)
recieve -> receive
2023-11-07 09:55:47 -08:00
Michael Yang
c7047d7353 Merge pull request #959 from jmorganca/mxyng/example-k8s 2023-11-07 10:43:21 -06:00
Matt Williams
1d155caba3 docs: clarify where the models are stored in the faq
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-11-06 14:38:49 -08:00
Michael Yang
866324b9a5 Merge pull request #943 from tjbck/patch-1
doc: categorised community integrations + added ollama-webui
2023-11-06 11:35:39 -08:00
Michael Yang
145e060855 Apply suggestions from code review
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-06 11:32:23 -08:00
Michael Yang
146072113d Merge pull request #993 from jmorganca/mxyng/cleanup
cleanup upload and download errors
2023-11-06 11:32:12 -08:00
Timothy Jaeryang Baek
33d31d1b56 Merge branch 'main' into patch-1 2023-11-06 14:27:02 -05:00
Dr. David A. Kunz
274c6cbf4c Added gen.nvim to community integrations (#996) 2023-11-06 10:51:41 -08:00
Elton Renda
7ebbd89bbf add hass-ollama-conversation (#999) 2023-11-06 10:50:35 -08:00
Lars Grammel
9079b1bb6d Add ModelFusion community integration (#1020) 2023-11-06 10:46:16 -08:00
Timothy Jaeryang Baek
6febde7200 Merge branch 'main' into patch-1 2023-11-04 19:12:18 -05:00
pepperoni21
325cfcd9ff Added ollama-rs to community integrations (#995)
Co-authored-by: pepperoni21 <pepperoni2100@gmail.com>
2023-11-04 14:51:29 -07:00
Jeffrey Morgan
639d0fd070 Update README.md 2023-11-04 12:24:24 -07:00
Jeffrey Morgan
e21579a0f1 Restore system prompt on requests 2023-11-03 17:26:45 -07:00
Jeffrey Morgan
c44b619428 remove unused fmt.Println 2023-11-03 17:24:58 -07:00
Michael Yang
434a6f9d46 return last error 2023-11-03 16:49:51 -07:00
aashish2057
b13586cc72 update langchainjs doc 2023-11-03 18:45:19 -05:00
Jeffrey Morgan
17678b7225 Restore system prompt on requests and default num_keep to 0 2023-11-03 13:25:25 -07:00
Michael Yang
84725ec7e3 refactor part reset 2023-11-03 09:20:32 -07:00
Bruce MacDonald
6109bebba6 reformat api docs for more examples (#972) 2023-11-03 10:57:00 -04:00
Noah Gitsham
8ae8c9fa8c Remove duplicate "install" in GPU support warning (#984) 2023-11-03 00:45:14 -07:00
Noah Gitsham
f39daff461 Add missing "be" to GPU support warning message (#983) 2023-11-02 18:37:12 -07:00
Jeffrey Morgan
c50b01bc21 check request.Context for initial system prompt 2023-11-02 18:17:00 -07:00
Bruce MacDonald
b9dc875401 remove modelfile context deprecated in v0.0.7 (#974) 2023-11-02 20:52:56 -04:00
Jeffrey Morgan
06589a3b30 Set NumKeep to 4 by default (#982) 2023-11-02 17:26:11 -07:00
Michael Yang
1fd511e661 Merge pull request #975 from jmorganca/mxyng/downloads
update downloads to use retry wrapper
2023-11-02 16:12:48 -07:00
Michael Yang
c01bbe94fd Merge pull request #979 from jmorganca/mxyng/num-keep
update default NumKeep
2023-11-02 15:48:44 -07:00
Jeffrey Morgan
1beb5645a9 only use system prompt if context is not provided (#978) 2023-11-02 15:48:02 -07:00
Michael Yang
6db3691b8f update default NumKeep 2023-11-02 15:47:35 -07:00
Michael Yang
fe5a872444 fix upload 2023-11-02 13:25:58 -07:00
Michael Yang
d39709260f download with retry 2023-11-02 13:16:11 -07:00
Michael Yang
60bb3c03a1 use http.Method 2023-11-02 13:12:45 -07:00
Jeffrey Morgan
2e53704685 default rope params to 0 for new models (#968) 2023-11-02 08:41:30 -07:00
Michael Yang
527f9a7975 Merge pull request #966 from jmorganca/mxyng/fix-log 2023-11-01 17:49:10 -07:00
Michael Yang
c4cc738cbf fix log 2023-11-01 17:18:11 -07:00
Michael Yang
2c6189f4fe Merge pull request #750 from jmorganca/mxyng/concurrent-uploads
concurrent uploads
2023-11-01 15:00:01 -07:00
Michael Yang
dccac8c8fa k8s example 2023-11-01 14:52:58 -07:00
Michael Yang
c05ab9a86e Merge pull request #965 from jmorganca/mxyng/go-mod-tidy
go mod tidy
2023-11-01 11:55:43 -07:00
Michael Yang
f42f3d9b27 go fmt 2023-11-01 11:55:08 -07:00
Michael Yang
341fb7e35f go mod tidy 2023-11-01 11:54:25 -07:00
Michael
f31961637f Update README.md 2023-11-01 12:20:55 -04:00
Michael Yang
ec3614812a Merge pull request #960 from jmorganca/mxyng/fix-tautology 2023-11-01 08:30:49 -07:00
Michael Yang
f14969314a Merge pull request #958 from jmorganca/mxyng/append-ld-library-path 2023-11-01 08:30:38 -07:00
Bruce MacDonald
1fb9288661 notify that the ollama api is available after linux install (#954) 2023-11-01 11:28:26 -04:00
Matt Williams
01a03caa20 Merge pull request #956 from jmorganca/mattw/apidocupdate 2023-10-31 21:43:11 -07:00
Michael Yang
bf6786bb39 fix tautology 2023-10-31 20:49:48 -07:00
Michael Yang
642128b75a append LD_LIBRARY_PATH 2023-10-31 15:54:49 -07:00
Matt Williams
f21bd6210d docs: clarify and clean up API docs
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-31 13:11:33 -07:00
Matt Williams
80362fedce better readme
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-31 12:40:46 -07:00
Matt Williams
5757925060 add a gif
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-31 11:52:01 -07:00
Michael
4512301756 Update README.md 2023-10-31 13:25:36 -04:00
Matt Williams
2236a93efc docs: add examples using bash to compare models
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-31 09:12:39 -07:00
Matt Williams
ad88799411 Merge pull request #949 from jmorganca/matt/fixPrivateGPT
fix: private gpt example was broken due to changes in chroma
2023-10-30 17:17:00 -07:00
Bruce MacDonald
0818b5e318 readline windows terminal support (#950)
- update the readline package to have basic support on windows; this is not full feature parity with the unix cli yet
2023-10-30 16:18:12 -04:00
Matt Williams
1df6100c77 Update examples/langchain-python-rag-privategpt/privateGPT.py
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-10-30 12:48:17 -07:00
Matt Williams
5c48fe1fb0 Update examples/langchain-python-rag-privategpt/constants.py
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-10-30 12:47:56 -07:00
Dirk Loss
874bb31986 Fix conversion command for gptneox (#948) 2023-10-30 14:34:29 -04:00
Matt Williams
f7856a57eb fix: private gpt example was broken due to changes in chroma
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-30 10:56:25 -07:00
Bruce MacDonald
f9a4281124 clean up: remove server functions from client (#937) 2023-10-30 11:10:18 -04:00
Timothy Jaeryang Baek
96da0792e6 doc: OllamaSharp for .NET moved to libraries 2023-10-28 16:18:38 -05:00
Timothy Jaeryang Baek
95d24262fc doc: categorised community integrations + added web-ui 2023-10-28 16:02:13 -05:00
Jeffrey Morgan
8d03bd7b54 remove +build directive in term.go 2023-10-28 09:56:03 -07:00
Jeffrey Morgan
9ec16f0f03 fix formatting when exiting ollama run 2023-10-27 21:26:23 -07:00
Jeffrey Morgan
57a58db1b0 history: update pos after compact 2023-10-27 20:38:03 -07:00
Jeffrey Morgan
2d75a4537c close input channel when receiving io.EOF 2023-10-27 20:26:04 -07:00
Jeffrey Morgan
4748609611 Don't quit ioloop on NUL character (#940)
* dont quit ioloop on 0 rune

* check for closed channel

* remove unused error on `Close()`
2023-10-27 20:01:48 -07:00
Jeffrey Morgan
c0dcea1398 Update faq.md 2023-10-27 18:29:00 -07:00
Michael Yang
115fc56eb7 calculate and verify md5 checksum 2023-10-27 17:07:33 -07:00
Michael Yang
186f685224 retry PUT 2023-10-27 17:07:33 -07:00
Michael Yang
12efcbb057 comments 2023-10-27 17:07:33 -07:00
Michael Yang
4e09aab8b9 concurrent uploads 2023-10-27 17:07:33 -07:00
Jeffrey Morgan
3a1ed9ff70 restore building runner with AVX on by default (#900) 2023-10-27 12:13:44 -07:00
Bruce MacDonald
6d283882b1 catch insufficient permissions nvidia err (#934) 2023-10-27 12:42:40 -04:00
Bruce MacDonald
5c3491f425 allow for a configurable ollama model storage directory (#897)
* allow for a configurable ollama models directory

- set OLLAMA_MODELS in the environment that ollama is running in to change where model files are stored
- update docs

Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>
Co-Authored-By: Jay Nakrani <dhananjaynakrani@gmail.com>
Co-Authored-By: Akhil Acharya <akhilcacharya@gmail.com>
Co-Authored-By: Sasha Devol <sasha.devol@protonmail.com>
2023-10-27 10:19:59 -04:00
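A small sketch of the configurable storage directory this change introduces: prefer OLLAMA_MODELS from the environment and fall back to a per-user default. The fallback path shown is an assumption for illustration; the docs are authoritative for the real default.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// modelsDir returns the model storage directory, preferring the OLLAMA_MODELS
// environment variable. The fallback below is illustrative only.
func modelsDir() (string, error) {
	if dir := os.Getenv("OLLAMA_MODELS"); dir != "" {
		return dir, nil
	}
	home, err := os.UserHomeDir()
	if err != nil {
		return "", err
	}
	return filepath.Join(home, ".ollama", "models"), nil
}

func main() {
	dir, err := modelsDir()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("models stored in:", dir)
}
```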
James Braza
e5d1ce4dde Tweaks to README.md (#906)
* Mentioned Docker Hub in docs
* Consolidated brew installs to one line
2023-10-27 00:10:23 -07:00
Bruce MacDonald
2665f3c28e offload 75% of available vram to improve stability (#921) 2023-10-26 20:49:55 -04:00
Patrick Devine
a79f030e75 add bracketed paste mode (#922) 2023-10-26 15:57:00 -07:00
Michael Yang
9bc5864a03 Merge pull request #918 from jmorganca/mxyng/fix-out-of-space
fix(download): no retry when out of space
2023-10-26 12:24:20 -07:00
Michael Yang
b88cc0fac9 Merge pull request #916 from jmorganca/mxyng/fix-client-host
fix(client): trim trailing slash
2023-10-26 12:24:12 -07:00
Patrick Devine
5b2cf16397 fix docker build annotations (#917) 2023-10-26 12:00:33 -07:00
Michael Yang
910816a532 fix(download): no retry when out of space 2023-10-26 11:34:07 -07:00
Michael Yang
28c3f288e2 client: fix trailing slash 2023-10-26 11:09:38 -07:00
Patrick Devine
deeac961bb new readline library (#847) 2023-10-25 16:41:18 -07:00
Jeffrey Morgan
49443e7da5 fix typo in README.md 2023-10-25 16:19:27 -07:00
Ajay Kemparaj
bb8464c0d2 update golang.org/x/net fixes CVE-2023-3978,CVE-2023-39325,CVE-2023-44487 (#855) 2023-10-25 16:17:24 -07:00
Michael Yang
daa5bb4473 Merge pull request #907 from jmorganca/mxyng/linux
update linux.md
2023-10-25 15:03:34 -07:00
Michael Yang
92119de9d8 update linux.md 2023-10-25 14:57:50 -07:00
Michael Yang
53b0ba8d43 Merge pull request #893 from jmorganca/mxyng/update-faq
update faq
2023-10-24 16:02:35 -07:00
Michael Yang
db342691f9 Update docs/faq.md
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-10-24 13:59:33 -07:00
Bruce MacDonald
cecf83141e Linux uninstall instructions (#894) 2023-10-24 14:07:05 -04:00
Michael Yang
a5a2adf1ec update faq 2023-10-24 10:54:16 -07:00
Jeffrey Morgan
b0c9cd0f3b fix metal assertion errors 2023-10-24 00:32:36 -07:00
Jeffrey Morgan
77f61c6301 update submodule commit 2023-10-24 00:30:27 -07:00
Jeffrey Morgan
f3604534e5 update submodule commit 2023-10-23 23:59:12 -07:00
Jeffrey Morgan
914428351a Update import.md 2023-10-23 17:44:53 -07:00
Jeffrey Morgan
9afea9e3b9 Update import.md
Separate GGUF and PyTorch guides
2023-10-23 17:42:17 -07:00
Bruce MacDonald
c039432b5c add current user to ollama group on install (#772) 2023-10-23 17:06:31 -04:00
Michael Yang
c345b4ca7c Merge pull request #884 from jmorganca/mxyng/update-submodules
bump submodules
2023-10-23 11:27:38 -07:00
Michael Yang
0c7a00a264 bump submodules
pin to 9e70cc03229df19ca2d28ce23cc817198f897278 for now since
438c2ca83045a00ef244093d27e9ed41a8cb4ea9 is breaking
2023-10-23 11:17:59 -07:00
Michael Yang
36c160f1c3 Merge pull request #881 from jmorganca/mxyng/ggufv3
ggufv3
2023-10-23 10:50:45 -07:00
Michael Yang
b66bcaa582 Merge pull request #883 from jmorganca/mxyng/logs
update default log target
2023-10-23 10:50:29 -07:00
Michael Yang
c9167494cb update default log target 2023-10-23 10:44:50 -07:00
Michael Yang
125d0a013a ggufv3
ggufv3 adds support for big endianness, mainly for s390x architecture.
while that's not currently supported for ollama, the change is simple.

loosen version check to be more forward compatible. unless specified,
gguf versions other than v1 will be decoded into v2.
2023-10-23 09:35:49 -07:00
Richard Awoyemi
ba2da6ceaa Added a minimalist React UI for Ollama models to the community contributions.md (#870) 2023-10-23 10:44:39 -04:00
Jeffrey Morgan
ccff9ca09c Update README.md 2023-10-21 11:58:10 -04:00
Jeffrey Morgan
436a5be49c Update README.md 2023-10-21 11:57:32 -04:00
Matt Williams
cc0bf96398 Merge pull request #829 from jmorganca/mattw/example-summarize-news
added python rag news summary
2023-10-20 21:03:16 -07:00
Michael Yang
386169205c update runtime options (#864) 2023-10-20 21:17:14 -04:00
Michael Yang
0d6342a882 Merge pull request #863 from jmorganca/mxyng/nil-pointer
fix: nil pointer dereference
2023-10-20 17:23:37 -07:00
Michael Yang
75bee074b6 fix: nil pointer dereference 2023-10-20 16:55:24 -07:00
Michael Yang
533d76368c Merge pull request #859 from jmorganca/mxyng/fix-hostname
fix: ollama host for hostname
2023-10-20 11:40:56 -07:00
Michael Yang
459f4a7889 fix: ollama host for hostname 2023-10-20 11:32:41 -07:00
Matt Williams
25c63c91d8 Update README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-10-19 13:52:40 -07:00
Jeffrey Morgan
cbfff4f868 update dependencies in app/ 2023-10-19 15:52:41 -04:00
Jeffrey Morgan
7ed5a39bc7 simpler check for model loading compatibility errors 2023-10-19 14:50:49 -04:00
Michael Yang
cc1d03f4ec Merge pull request #841 from jmorganca/mxyng/cleanup-cmd-args 2023-10-19 11:22:40 -07:00
Michael Yang
846f593dbf Merge pull request #828 from jmorganca/mxyng/template-parameters
image: show parameters
2023-10-19 09:31:31 -07:00
Michael Yang
0a53da03fd Merge pull request #843 from jmorganca/mxyng/request-validation
basic request validation
2023-10-19 09:30:45 -07:00
Michael Yang
2ce1793a1d go fmt 2023-10-19 09:21:51 -07:00
Michael Yang
e1c5be24e7 check json eof 2023-10-19 09:21:51 -07:00
Michael Yang
2ad8a074ac generate: set created_at
move the empty response so it's more visible
2023-10-19 09:21:51 -07:00
Michael Yang
7e547c6833 s/message/error/ 2023-10-19 09:21:04 -07:00
Michael Yang
689842b9ff request: bad request when model missing fields 2023-10-19 09:21:04 -07:00
Michael Yang
a19d47642e models: rm workDir from CreateModel
unused after removing EMBED
2023-10-19 09:21:04 -07:00
Jeffrey Morgan
a7dad24d92 add error for falcon and starcoder vocab compatibility (#844)
add error for falcon and starcoder vocab compatibility
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-10-19 12:18:31 -04:00
Jeffrey Morgan
6b213216d5 Update import.md 2023-10-19 12:17:36 -04:00
Bruce MacDonald
fe6f3b48f7 do not reload the running llm when runtime params change (#840)
- only reload the running llm if the model has changed, or the options for loading the running model have changed
- rename loaded llm to runner to differentiate from loaded model image
- remove logic which keeps the first system prompt in the generation context
2023-10-19 10:39:58 -04:00
Michael Yang
36c88cb9db cmd: set ExactArgs 2023-10-18 14:40:48 -07:00
Michael Yang
235e43d7f6 Merge pull request #833 from discovertomorrow/leadingspace
Fix Issue with Leading Whitespaces in Decoded Context
2023-10-18 13:52:48 -07:00
Arne Müller
730996e530 use TrimPrefix instead of TrimLeft 2023-10-18 22:51:30 +02:00
Arne Müller
ce6197a8e0 removed redundant strings.CutPrefix from Decode 2023-10-18 22:47:20 +02:00
Arne Müller
46b9953f32 use strings.TrimLeft to remove spaces 2023-10-18 22:41:19 +02:00
Michael Yang
4dcceeffb7 let the template do the work 2023-10-18 13:12:00 -07:00
Michael Yang
019e4a4558 image: show parameters 2023-10-18 13:12:00 -07:00
Michael Yang
627d04d927 Merge pull request #827 from jmorganca/mxyng/template-adapters
model: native gotemplate adapter template
2023-10-18 13:11:25 -07:00
Michael Yang
940e8ebec3 Merge pull request #826 from jmorganca/mxyng/template-system
show: no template system if empty
2023-10-18 13:11:09 -07:00
Bruce MacDonald
565648f3f7 relay CUDA errors to the client (#825) 2023-10-18 15:36:56 -04:00
Arne Müller
90c49bed57 moved removal of leading space into Predict 2023-10-18 20:08:26 +02:00
Michael Yang
3a2477174f Merge pull request #822 from ggozad/fix-tags-api
Fix /api/tags for no models.
2023-10-18 09:34:00 -07:00
Yiorgis Gozadinos
8c6c2cbc8c When the .ollama folder is broken or there are no models return an empty list on /api/tags 2023-10-18 08:23:20 +02:00
Arne Müller
5dc0cff459 fix whitespace removal 2023-10-18 08:15:27 +02:00
Matt Williams
c5c8b4b16a added python rag news summary
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-17 16:41:28 -07:00
Michael Yang
8299bf76ed model: native gotemplate adapter template 2023-10-17 15:28:38 -07:00
Michael Yang
ee4979e510 show: no template system if empty 2023-10-17 15:25:43 -07:00
Michael Yang
08b0e04f40 Merge pull request #813 from jmorganca/mxyng/llama
refactor llm/llama.go
2023-10-17 14:05:58 -07:00
Michael Yang
b36b0b71f8 use cut prefix 2023-10-17 14:01:39 -07:00
Michael Yang
094df37563 remove unused struct 2023-10-17 14:01:38 -07:00
Bruce MacDonald
f3648fd206 Update llama.cpp gguf to latest (#710) 2023-10-17 16:55:16 -04:00
Bruce MacDonald
bd93a94abd fix MB VRAM log output (#824) 2023-10-17 15:35:16 -04:00
Michael Yang
f55bdb6f10 Merge pull request #799 from deichbewohner/jsonmarshaling
Fix JSON Marshal Escaping for Special Characters
2023-10-17 08:46:02 -07:00
Michael Yang
2870a9bfc8 Merge pull request #812 from jmorganca/mxyng/fix-format-string
fix: wrong format string type
2023-10-17 08:40:49 -07:00
Michael Yang
c031c211d1 Merge pull request #809 from jmorganca/mxyng/fix-gpu
fix: regression unsupported metal types
2023-10-17 08:40:40 -07:00
Andreas Wäscher
68391b0055 Add OllamaSharp for .NET (#811) 2023-10-17 11:31:48 -04:00
Alexander F. Rødseth
b7e137323a Fix a typo (#818) 2023-10-17 09:00:15 -04:00
Arne Müller
8fa3f366ad Removed newline trimming and used buffer directly in POST request. 2023-10-17 08:17:35 +02:00
Michael Yang
fddb303f23 fix: format string wrong type 2023-10-16 16:14:28 -07:00
Michael Yang
ad5ee20c7b Merge pull request #794 from ggozad/add_oterm
Add oterm to community integrations
2023-10-16 15:51:55 -07:00
Michael Yang
785b4eb5bf Merge branch 'main' into add_oterm 2023-10-16 15:51:44 -07:00
Michael Yang
16ede1b30b Merge pull request #801 from s-kostyaev/add-ellama-community-integration
Add ellama community integration
2023-10-16 15:51:25 -07:00
Michael Yang
17d6bbbb2a Merge pull request #810 from vieux/patch-1
Update install.sh
2023-10-16 15:50:57 -07:00
Victor Vieux
6481b7f34c Update install.sh, avoid ARCH: unbound variable 2023-10-16 14:40:24 -07:00
Michael Yang
cb4a80b693 fix: regression unsupported metal types
omitting `--n-gpu-layers` means use metal on macos which isn't correct
since ollama uses `num_gpu=0` to explicitly disable gpu for file types
that are not implemented in metal
2023-10-16 14:37:20 -07:00
Bruce MacDonald
68d7255bd3 show request to server rather than local check (#778) 2023-10-16 17:27:25 -04:00
Michael Yang
9ef2fce33a Merge pull request #768 from jmorganca/mxyng/bytes
fix memory check
2023-10-16 12:42:41 -07:00
Michael Yang
43eaba3d60 Merge pull request #787 from jmorganca/mxyng/server-version2
server: print version on start
2023-10-16 09:59:30 -07:00
Michael Yang
1af493c5a0 server: print version on start 2023-10-16 09:59:14 -07:00
Bruce MacDonald
a0c3e989de deprecate modelfile embed command (#759) 2023-10-16 11:07:37 -04:00
Sergey Kostyaev
7af0fdce48 add ellama community integration 2023-10-16 16:39:10 +07:00
Arne Müller
ee94693b1a handling unescaped json marshaling 2023-10-16 11:15:55 +02:00
Yiorgis Gozadinos
731dbdc1a5 Add oterm to community integrations 2023-10-15 23:21:17 +02:00
Jeffrey Morgan
06bcfbd629 cleanup docker section in readme 2023-10-15 02:33:25 -04:00
Jeffrey Morgan
7d7c2510f8 add docker exec command to readme 2023-10-15 02:31:15 -04:00
Jeffrey Morgan
f9b2f999ac update readme with docker setup and link to import.md 2023-10-15 02:23:03 -04:00
Jeffrey Morgan
c416087339 import.md: formatting and spelling 2023-10-15 01:39:46 -04:00
Jeffrey Morgan
6002cebd2c import.md: convert and quantize docs 2023-10-15 00:11:51 -04:00
Jeffrey Morgan
212bdc541c import.md: model architectures spelling 2023-10-15 00:07:58 -04:00
Jeffrey Morgan
dca6686273 add steps for creating a Modelfile and more example commands to import.md 2023-10-15 00:05:50 -04:00
Jeffrey Morgan
598621afab add push script for docker images 2023-10-14 14:24:39 -04:00
Matt Williams
6479f49c09 Merge pull request #773 from jmorganca/mattw/howtoquant
add how to quantize doc
2023-10-14 08:29:39 -07:00
Matt Williams
b2974a7095 applied mikes comments
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-14 08:29:24 -07:00
Jeffrey Morgan
832b4db9d4 Use correct url for auto updates 2023-10-13 19:04:42 -04:00
Bruce MacDonald
c43873f33b check update response (#785) 2023-10-13 18:05:46 -04:00
Michael Yang
11d82d7b9b update checkvram 2023-10-13 14:47:29 -07:00
Michael Yang
36fe2deebf only check system memory on macos 2023-10-13 14:47:29 -07:00
Michael Yang
4a8931f634 check total (system + video) memory 2023-10-13 14:47:29 -07:00
Michael Yang
bd6e38fb1a refactor memory check 2023-10-13 14:47:29 -07:00
Michael Yang
92189a5855 fix memory check 2023-10-13 14:47:29 -07:00
Michael Yang
d790bf9916 Merge pull request #783 from jmorganca/mxyng/fix-gpu-offloading
fix: offloading on low end GPUs
2023-10-13 14:36:44 -07:00
Michael Yang
35afac099a do not use gpu binary when num_gpu == 0 2023-10-13 14:32:12 -07:00
Michael Yang
811c3d1900 no gpu if vram < 2GB 2023-10-13 14:32:12 -07:00
Bruce MacDonald
3553d10769 check for newer updates (#784)
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-10-13 17:29:46 -04:00
Bruce MacDonald
6fe178134d improve api error handling (#781)
- remove new lines from llama.cpp error messages relayed to client
- check api option types and return error on wrong type
- change num layers from 95% VRAM to 92% VRAM
2023-10-13 16:57:10 -04:00
Jeffrey Morgan
d890890f66 use lower glibc versions in Dockerfile.build 2023-10-13 01:06:19 -04:00
Jeffrey Morgan
89ba19feca use Go 1.21.3 in Dockerfile 2023-10-12 23:23:12 -04:00
Jeffrey Morgan
6f58c77671 update Dockerfile.build for linux binary builds 2023-10-12 22:14:20 -04:00
Matt Williams
3c975f898f update doc to refer to docker image
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-12 15:57:50 -07:00
Matt Williams
9245c8a1df add how to quantize doc
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-12 15:34:57 -07:00
Michael Yang
7a537cdca9 Merge pull request #770 from jmorganca/mxyng/fix-download
fix download
2023-10-12 12:56:43 -07:00
Michael Yang
257ffeb997 fix download 2023-10-12 12:52:43 -07:00
Matt Williams
9b513bb6b1 Merge pull request #753 from jmorganca/mattw/examplereorg
rename the examples to be more descriptive
2023-10-12 11:24:12 -07:00
Matt Williams
042100f797 final rename
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-12 11:23:41 -07:00
Bruce MacDonald
7804b8fab9 validate api options fields from map (#711) 2023-10-12 11:18:11 -04:00
Bruce MacDonald
56497663c8 relay model runner error message to client (#720)
* give direction to user when runner fails
* also relay errors from timeout
* increase timeout to 3 minutes
2023-10-12 11:16:37 -04:00
Matt Williams
e1afcb8af2 simple gen to simple
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-11 21:29:07 -07:00
Matt Williams
385eeea357 remove with
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-11 21:26:11 -07:00
Matt Williams
8a41b244e8 add golang gen
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-11 21:20:50 -07:00
Jeffrey Morgan
92578798bb fix relative links in README.md 2023-10-11 19:24:06 -04:00
Michael Yang
788637918a Merge pull request #760 from jmorganca/mxyng/more-downloads
Mxyng/more downloads
2023-10-11 14:33:10 -07:00
Michael Yang
c413a55093 download: handle inner errors 2023-10-11 14:15:30 -07:00
Michael Yang
630bb75d2a dynamically size download parts based on file size 2023-10-11 14:10:25 -07:00
Michael Yang
a2055a1e93 update download 2023-10-11 14:10:25 -07:00
Michael Yang
b599946b74 add format bytes 2023-10-11 14:08:23 -07:00
Michael Yang
aca2d65b82 Merge pull request #757 from jmorganca/mxyng/format-time
cleanup format time
2023-10-11 11:12:29 -07:00
Michael Yang
b5e08e3373 cleanup format time 2023-10-11 11:09:27 -07:00
Bruce MacDonald
274d5a5fdf optional parameter to not stream response (#639)
* update streaming request accept header
* add optional stream param to request bodies
2023-10-11 12:54:27 -04:00
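A hedged sketch of the non-streaming path added here: with `stream` set to false, the server returns one complete JSON object rather than a stream of chunks. The endpoint, default port, and the single decoded field are assumptions based on the API docs referenced elsewhere in this log.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// stream=false asks for one complete JSON object instead of a chunked stream.
	body := []byte(`{"model": "llama2", "prompt": "hello", "stream": false}`)

	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	// Only the generated text is decoded here; the real response carries
	// more fields (timings, context, etc.).
	var out struct {
		Response string `json:"response"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(out.Response)
}
```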
Matt Williams
fc6b49be32 add ts alternate to python langchain simplegen
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-11 09:50:15 -07:00
Bruce MacDonald
77295f716e prevent waiting on exited command (#752)
* prevent waiting on exited command
* close llama runner once
2023-10-11 12:32:13 -04:00
Matt Williams
615f7d1dea cleanup readme.
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-11 06:13:29 -07:00
Matt Williams
cdf5e106ae rename dirs
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-11 06:10:24 -07:00
Matt Williams
a85329f59a rename the models to be more descriptive
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-10 17:40:02 -07:00
Bruce MacDonald
f2ba1311aa improve vram safety with 5% vram memory buffer (#724)
* check free memory not total
* wait for subprocess to exit
2023-10-10 16:16:09 -04:00
Jeffrey Morgan
65dcd0ce35 always cleanup blob download (#747) 2023-10-10 13:12:29 -04:00
Michael Yang
0040f543a2 Merge pull request #743 from jmorganca/mxyng/http-proxy
handle upstream proxies
2023-10-10 09:59:06 -07:00
Matt Williams
767f9bdbbb Merge pull request #585 from jmorganca/matt/examplementors
add the example for ask the mentors
2023-10-09 13:58:14 -07:00
Costa Alexoglou
f7f5169c94 Update api.md (#741)
Avoid triple ticks in the visual editor and when copied to the clipboard.
2023-10-09 16:01:46 -04:00
Michael Yang
2cfffea02e handle client proxy 2023-10-09 12:33:47 -07:00
Michael Yang
f6e98334e4 handle upstream proxies 2023-10-09 11:42:36 -07:00
Jeffrey Morgan
ab0668293c llm: fix build on amd64 2023-10-06 14:39:54 -07:00
Bruce MacDonald
af4cf55884 not found error before pulling model (#718) 2023-10-06 16:06:20 -04:00
Bruce MacDonald
d6786f2945 add feedback for reading model metadata (#722) 2023-10-06 16:05:32 -04:00
Michael Yang
38dc2f79bc Merge pull request #626 from jmorganca/mxyng/concurrent-downloads
parallel chunked downloads
2023-10-06 13:01:29 -07:00
Michael Yang
cb961c87ca Merge pull request #679 from jamesbraza/modelfile-docs
`Modelfile` syntax highlighting
2023-10-06 12:59:45 -07:00
Michael Yang
0560b28a8d names 2023-10-06 12:56:56 -07:00
Michael Yang
10199c5987 replace done channel with file check 2023-10-06 12:56:56 -07:00
Michael Yang
288814d3e4 fix ref counts 2023-10-06 12:56:43 -07:00
Michael Yang
04733438da check head request response 2023-10-06 12:56:43 -07:00
Michael Yang
711e891f0f fix resumable downloads
glob returns files in lexical order which is not appropriate when
rebuilding the parts list
2023-10-06 12:56:43 -07:00
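A sketch of the ordering problem behind that fix: glob returns file names in lexical order, so a name ending in 10 sorts before one ending in 2, which breaks rebuilding the parts list. The part-file naming below is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// sortPartsNumerically orders part files by their numeric suffix rather than
// lexically. The "blob-partN" naming is hypothetical, used only to show why
// lexical order from a glob breaks resume logic.
func sortPartsNumerically(parts []string) {
	sort.Slice(parts, func(i, j int) bool {
		ni, _ := strconv.Atoi(strings.TrimPrefix(parts[i], "blob-part"))
		nj, _ := strconv.Atoi(strings.TrimPrefix(parts[j], "blob-part"))
		return ni < nj
	})
}

func main() {
	parts := []string{"blob-part10", "blob-part2", "blob-part1"}

	lexical := append([]string(nil), parts...)
	sort.Strings(lexical)
	fmt.Println("lexical (glob order): ", lexical) // part1, part10, part2

	sortPartsNumerically(parts)
	fmt.Println("numeric (resume order):", parts) // part1, part2, part10
}
```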
Michael Yang
090d08422b handle unexpected eofs 2023-10-06 12:56:43 -07:00
Michael Yang
5b84404c64 handle concurrent requests for the same blobs 2023-10-06 12:56:43 -07:00
Michael Yang
8544edca21 parallel chunked downloads 2023-10-06 12:56:43 -07:00
Bruce MacDonald
5d22319a2c rename server subprocess (#700)
- this makes it easier to see that the subprocess is associated with ollama
2023-10-06 10:15:42 -04:00
Bruce MacDonald
2130c0708b output type parsed from modelfile (#678) 2023-10-05 14:58:04 -04:00
Patrick Devine
61ff1946e6 revise help text (#706) 2023-10-05 11:36:07 -07:00
Bruce MacDonald
d06bc0cb6e enable q8, q5, 5_1, and f32 for linux gpu (#699) 2023-10-05 12:53:47 -04:00
Alexander F. Rødseth
d104b7e997 Fix go test./... issue: fmt.Println arg list ends with redundant newline (#705) 2023-10-05 11:11:04 -04:00
Bruce MacDonald
9e2de1bd2c increase streaming buffer size (#692) 2023-10-04 14:09:00 -04:00
Jeffrey Morgan
dc87e9c9ae update Dockerfile to pass GOFLAGS 2023-10-03 07:05:15 -07:00
Michael Yang
367cb68dc1 Merge pull request #686 from jmorganca/mxyng/starcoder
decode starcoder
2023-10-02 22:47:19 -07:00
Michael Yang
c02c0cd483 starcoder 2023-10-02 19:56:51 -07:00
Patrick Devine
1852755154 show a default message when license/parameters/system prompt/template aren't specified (#681) 2023-10-02 14:34:52 -07:00
James Braza
6f2ce74231 Got rid of all caps to show it can be lower case 2023-10-02 13:54:27 -07:00
James Braza
6edcc5c79f Using code highlighting syntax around Modelfile 2023-10-02 13:46:05 -07:00
Bruce MacDonald
b1f7123301 clean up num_gpu calculation code (#673) 2023-10-02 14:53:42 -04:00
Bruce MacDonald
1fbf3585d6 Relay default values to llama runner (#672)
* include seed in params for llama.cpp server and remove empty filter for temp

* relay default predict options to llama.cpp

- reorganize options to match predict request for readability

* omit empty stop

---------

Co-authored-by: hallh <hallh@users.noreply.github.com>
2023-10-02 14:53:16 -04:00
Patrick Devine
99d5161e8a don't wordwrap when stdout is redirected or piped (#662) 2023-10-02 11:50:55 -07:00
Michael
ea8380be45 add community project: Chatbot Ollama
add community project: Chatbot Ollama by @ivanfioravanti
2023-10-02 09:04:31 -07:00
Jeffrey Morgan
4f25092dc1 fix build_docker.sh permissions 2023-10-01 16:42:32 -07:00
Jiayu Liu
4fc10acce9 add some missing code directives in docs (#664) 2023-10-01 11:51:01 -07:00
Michael Yang
0a4f21c0a7 fix docker build (#659) 2023-09-30 13:34:01 -07:00
Jeffrey Morgan
9abb66254a docker: fix volume permission errors 2023-09-30 12:32:15 -07:00
Jay Nakrani
1d0ebe67e8 Document response stream chunk delimiter. (#632)
Document response stream chunk delimiter.
2023-09-29 21:45:52 -07:00
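A hedged reader for the streamed responses that commit documents, assuming each chunk is a JSON object on its own line; if the documented delimiter differs, the scanner split would need to change.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// readChunks decodes a stream where each chunk is a JSON object terminated by
// a newline. The delimiter is an assumption here; the API docs are authoritative.
func readChunks(stream string) []string {
	var out []string
	scanner := bufio.NewScanner(strings.NewReader(stream))
	for scanner.Scan() {
		var chunk struct {
			Response string `json:"response"`
			Done     bool   `json:"done"`
		}
		if err := json.Unmarshal(scanner.Bytes(), &chunk); err != nil {
			continue // skip malformed lines in this sketch
		}
		out = append(out, chunk.Response)
		if chunk.Done {
			break
		}
	}
	return out
}

func main() {
	stream := `{"response":"Hel","done":false}
{"response":"lo","done":false}
{"response":"","done":true}
`
	fmt.Println(strings.Join(readChunks(stream), ""))
}
```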
Bruce MacDonald
a1b2d95f96 remove unused push/pull params (#650) 2023-09-29 17:27:19 -04:00
Michael Yang
c0b1bf7537 Merge pull request #606 from jmorganca/mxyng/install.sh-2
ordered list of install locations
2023-09-29 11:30:46 -07:00
Michael Yang
cdfeb165ca Merge pull request #608 from jmorganca/mxyng/build
update build scripts
2023-09-29 11:30:25 -07:00
Michael Yang
92d454ec5f update build_darwin.sh 2023-09-29 11:29:23 -07:00
Michael Yang
9333b0cc82 Merge pull request #612 from jmorganca/mxyng/prune-empty-directories
prune empty directories
2023-09-29 11:23:39 -07:00
Bruce MacDonald
9771b1ec51 windows runner fixes (#637) 2023-09-29 11:47:55 -04:00
Patrick Devine
76db4a49cf allow the user to cancel generating with ctrl-C (#641) 2023-09-28 17:13:01 -07:00
Luc Stepniewski
4aa0976a2e Added missing return preventing SIGSEGV because of missing resp (#621)
Co-authored-by: Luc Stepniewski <luc@eclipse-fr.com>
2023-09-28 14:25:22 -07:00
Patrick Devine
92c20fdae6 fix error messages for unknown commands in the repl (#611) 2023-09-28 14:19:45 -07:00
Michael Yang
c951da7096 Merge pull request #634 from jmorganca/mxyng/int64
use int64 consistently
2023-09-28 14:17:47 -07:00
Bruce MacDonald
24d82a23a2 do not download updates multiple times (#633) 2023-09-28 15:29:17 -04:00
Michael Yang
f40b3de758 use int64 consistently 2023-09-28 11:07:24 -07:00
Michael
5f4008c296 Update README.md
adding instructions to run mistral
2023-09-28 09:06:03 -07:00
Aaron Coffey
6ae33d8141 Update modelfile.md to reflect the usage of num_gpu. (#629) 2023-09-28 10:21:21 -04:00
Jeffrey Morgan
c5664c1fef Update faq.md 2023-09-27 13:49:43 -07:00
Bruce MacDonald
958a5a8184 revert fedora cuda version check 2023-09-27 15:12:29 -04:00
Michael Yang
8608eb4760 prune empty directories 2023-09-27 10:58:09 -07:00
Bruce MacDonald
a2b210130f fedora install fixes (#609) 2023-09-27 11:43:47 -04:00
Bruce MacDonald
ed20837f9a Update modelfile.md 2023-09-27 10:38:10 -04:00
James Braza
1db2a61dd0 Added num_predict to the options table (#614) 2023-09-27 10:26:08 -04:00
Jeffrey Morgan
2ded8ab206 use 11.8.0 nvidia dockerfile base image for now 2023-09-26 21:48:41 -07:00
Michael Yang
e6b3648bbf Merge pull request #616 from jmorganca/mxyng/fix-model-name 2023-09-26 20:54:18 -07:00
Michael Yang
0625e805f0 fix model name not matching 2023-09-26 19:50:04 -07:00
Michael Yang
c38ec5befb Merge pull request #598 from jmorganca/mxyng/help-exit
add painter message for exit
2023-09-26 15:17:40 -07:00
Michael Yang
c577721a43 Merge pull request #605 from jmorganca/mxyng/install.sh
do not unload nouveau driver
2023-09-26 09:53:05 -07:00
Michael Yang
29c056ea39 ordered list of install locations 2023-09-26 09:38:11 -07:00
Michael Yang
9fc3bba9cf do not unload nouveau driver 2023-09-26 09:36:54 -07:00
Michael Chiang
7774ed4ae6 Update README.md for linux + cleanup (#601)
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-09-25 23:44:53 -07:00
Michael Yang
11f920f209 Merge pull request #599 from jmorganca/mxyng/install.sh
update install.sh
2023-09-25 18:24:13 -07:00
Michael Yang
6e6b655956 update install.sh 2023-09-25 18:09:44 -07:00
Michael Yang
110ae89a6c Merge pull request #596 from jmorganca/mxyng/install.sh
update install.sh
2023-09-25 17:59:13 -07:00
Michael Yang
5e388f931e check cuda installed before installing 2023-09-25 17:56:43 -07:00
Michael Yang
d5ad41dd7b fix path for wsl user 2023-09-25 17:56:25 -07:00
Michael Yang
d294a11bc9 start service on exit instead of immediately 2023-09-25 17:54:02 -07:00
Michael Yang
93d887e4bc add painter message for exit 2023-09-25 16:30:22 -07:00
Jeffrey Morgan
5306b0269d Update linux.md 2023-09-25 16:10:32 -07:00
Michael Yang
7de0c8345d Merge pull request #595 from jmorganca/mxyng/install.sh
ignore systemctl is-system-running exit code
2023-09-25 15:49:47 -07:00
Michael Yang
1b9dcab3ab ignore systemctl is-system-running exit code 2023-09-25 15:47:45 -07:00
Bruce MacDonald
86279f4ae3 unbound max num gpu layers (#591)
---------

Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-25 18:36:46 -04:00
Michael Yang
b934bf23e6 exit on unknown distro (#594) 2023-09-25 15:30:58 -07:00
Michael Yang
2b8ef455ad Merge pull request #593 from jmorganca/mxyng/install.sh
update install.sh
2023-09-25 14:09:40 -07:00
Michael Yang
0c5f47177c update install.sh 2023-09-25 14:01:44 -07:00
Michael Yang
1210db2924 Merge pull request #592 from jmorganca/mxyng/install.sh
fix dkms on debian
2023-09-25 12:59:01 -07:00
Michael Yang
d0854bf1e6 fix dkms on debian 2023-09-25 12:57:25 -07:00
Michael Yang
8396463255 Merge pull request #590 from jmorganca/mxyng/install.sh
fix dkms install
2023-09-25 12:17:31 -07:00
Michael Yang
a027bbf4d7 fix dkms install 2023-09-25 12:16:41 -07:00
Michael Yang
ed94a3dd02 Merge pull request #589 from jmorganca/mxyng/install.sh
update install.sh
2023-09-25 11:08:25 -07:00
Michael Yang
f14f62ab3b update install.sh 2023-09-25 11:05:38 -07:00
Jeffrey Morgan
0fb5268496 Update linux.md 2023-09-25 10:06:23 -07:00
Bruce MacDonald
c65edb1506 fix linux installer warning logs (#588) 2023-09-25 11:22:56 -04:00
Twan L
1605af32ec Added a new community project (#574) 2023-09-25 10:40:59 -04:00
Jeffrey Morgan
ee3032ad89 improvements to docs/linux.md 2023-09-24 21:50:07 -07:00
Jeffrey Morgan
5b7a27281d improvements to docs/linux.md 2023-09-24 21:38:23 -07:00
Jeffrey Morgan
d2a784e33e add docs/linux.md 2023-09-24 21:34:44 -07:00
Jeffrey Morgan
413a2e4f91 set DEBIAN_FRONTEND=noninteractive correctly 2023-09-24 20:35:42 -07:00
Matt Williams
a92fdff620 add the example for ask the mentors
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-09-24 15:58:32 -07:00
Patrick Devine
b5614f3ebc fix end-of-line issue with the new prompt (#582) 2023-09-23 17:20:30 -07:00
Jeffrey Morgan
8b2ba9cab8 minor improvements to install.sh 2023-09-23 11:20:39 -04:00
Jeffrey Morgan
e29662ab5c fix minor install script issues on debian 2023-09-23 10:25:47 -04:00
Bruce MacDonald
cbc40aa996 debian installer support (#579)
* debian installer support

- normalize os name to lowercase
- check needed commands are available
- don't check sudo when running as root
- share common install commands
- support debian cuda install
- skip aarm cuda install
- system user shared home dir

* refactor and add other platforms (#580)

---------

Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-23 09:46:47 -04:00
Jeffrey Morgan
5cb82540c9 install.sh: update install url 2023-09-23 09:35:14 -04:00
Jeffrey Morgan
d7849a1dc9 add .env to .dockerignore 2023-09-23 00:53:48 -04:00
Jeffrey Morgan
01c44d687e add multi line strings to final prompt 2023-09-23 00:27:24 -04:00
Jeffrey Morgan
9b12a511ca check other request fields before load short circuit in /api/generate 2023-09-22 23:50:55 -04:00
Jeffrey Morgan
e20362e0d5 fix multi line input in ollama run 2023-09-22 23:49:35 -04:00
Patrick Devine
c928ceb927 add word wrapping for lines which are longer than the terminal width (#553) 2023-09-22 13:36:08 -07:00
Michael Yang
e1a0846483 Merge pull request #571 from jmorganca/mxyng/update-dockerfile
update dockerfile.cuda
2023-09-22 12:34:41 -07:00
Jeffrey Morgan
f997e29e45 Add Dockerfile.build for building linux binaries (#558)
Add `Dockerfile.build` for building linux binaries

---------

Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-22 15:20:12 -04:00
Patrick Devine
87d9efb364 switch to forked readline lib which doesn't wreck the repl prompt (#578) 2023-09-22 12:17:45 -07:00
Michael Yang
93d3a2568d replace dockerfile 2023-09-22 11:57:38 -07:00
Michael Yang
5a81390b24 update dockerfile.cuda 2023-09-22 11:57:38 -07:00
Michael Yang
a89ef99aed Merge pull request #575 from jmorganca/mxyng/fix-ipv6-only
fix ipv6 parse ip
2023-09-22 11:47:11 -07:00
Bruce MacDonald
dc0c725ceb ubuntu cuda drivers (#576) 2023-09-22 19:43:14 +01:00
Bruce MacDonald
5d71bda478 close llm on interrupt (#577) 2023-09-22 19:41:52 +01:00
Michael Yang
88897a90e4 fix ipv6 parse ip 2023-09-22 10:41:32 -07:00
Bruce MacDonald
9df31c3518 linux installer script (#534)
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-22 17:01:03 +01:00
Michael Yang
2044f9d4da Merge pull request #570 from jmorganca/mxyng/head-request
fix HEAD request
2023-09-21 16:56:17 -07:00
Michael Yang
0d186f3b33 Merge pull request #569 from jmorganca/mxyng/update-submodules
silence warm up log
2023-09-21 16:52:42 -07:00
Michael Yang
82f5b66c01 register HEAD /api/tags 2023-09-21 16:38:03 -07:00
Michael Yang
c986694367 fix HEAD / request
HEAD requests should respond like their GET counterparts, except without a
response body.
2023-09-21 16:35:58 -07:00
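A minimal net/http sketch of the convention this fix enforces: HEAD goes through the same handler as GET and receives the same status and headers, but no body. The /api/tags route and its payload shape are used here only as an illustration; this is not the project's router code.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/tags", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		// net/http discards the body for HEAD responses, so one handler can
		// serve both verbs with identical status and headers.
		fmt.Fprintln(w, `{"models":[]}`)
	})

	srv := httptest.NewServer(mux)
	defer srv.Close()

	for _, method := range []string{http.MethodGet, http.MethodHead} {
		req, err := http.NewRequest(method, srv.URL+"/api/tags", nil)
		if err != nil {
			panic(err)
		}
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			panic(err)
		}
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		fmt.Printf("%s: status=%d content-type=%s body=%d bytes\n",
			method, resp.StatusCode, resp.Header.Get("Content-Type"), len(body))
	}
}
```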
Michael Yang
058d0cd04b silence warm up log 2023-09-21 14:53:33 -07:00
Michael Yang
ee1c994d15 update submodule (#567) 2023-09-21 16:22:23 -04:00
Bruce MacDonald
4cba75efc5 remove tmp directories created by previous servers (#559)
* remove tmp directories created by previous servers

* clean up on server stop

* Update routes.go

* Update server/routes.go

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* create top-level temp ollama dir

* check file exists before creating

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-21 20:38:49 +01:00
Michael Yang
8c83701e9f Merge pull request #566 from jmorganca/mxyng/api-check-model-exists
Use API to check if model exists and pull if necessary
2023-09-21 10:35:14 -07:00
Michael Yang
6137b12799 validate existence and pull model using api 2023-09-21 09:55:34 -07:00
Michael Yang
1fabba474b refactor default allow origins
this should be less error prone
2023-09-21 09:42:25 -07:00
Michael Yang
765770efdb Merge pull request #562 from jmorganca/mxyng/fix-ollama-host
fix OLLAMA_HOST parsing for ip6
2023-09-20 19:54:47 -07:00
Michael Yang
9297ff8330 fix OLLAMA_HOST parsing for ip6 2023-09-20 18:52:57 -07:00
Michael Yang
ee4fd16f2c Merge pull request #556 from jmorganca/pack-cuda
pack in cuda libs
2023-09-20 15:02:36 -07:00
Michael Yang
a9ed7cc6aa rename generate.go 2023-09-20 14:42:17 -07:00
Michael Yang
6c6a31a1e8 embed libraries using cmake 2023-09-20 14:41:57 -07:00
Bruce MacDonald
fc6ec356fc remove libcuda.so 2023-09-20 20:36:14 +01:00
Bruce MacDonald
1255bc9b45 only package 11.8 runner 2023-09-20 20:00:41 +01:00
Michael Yang
084e4c782a Merge pull request #557 from jmorganca/mxyng/cleanup
fix impossible condition
2023-09-20 11:51:01 -07:00
Michael Yang
58ffa03d8b fix impossible condition 2023-09-20 11:27:44 -07:00
Michael Yang
637f8bc6a5 Merge pull request #536 from jmorganca/mxyng/redirect-uploads
explicitly follow upload redirects
2023-09-20 11:27:03 -07:00
Michael Yang
499e9007a5 pick chunksize based on location 2023-09-20 11:10:24 -07:00
Bruce MacDonald
b9bb5ca288 use cuda_version 2023-09-20 17:58:16 +01:00
Bruce MacDonald
4e8be787c7 pack in cuda libs 2023-09-20 17:40:42 +01:00
Michael Yang
aa45d7c1df draft: explicitly follow upload redirects 2023-09-19 13:36:58 -07:00
Michael Yang
e35565c567 Merge pull request #555 from jmorganca/mxyng/fix-windows-startup
fix build
2023-09-19 10:51:58 -07:00
Michael Yang
a5520bfb42 fix build 2023-09-19 10:42:24 -07:00
Michael Yang
2627c464ba Merge pull request #554 from jmorganca/mxyng/fix-windows-startup
fix mkdir on windows
2023-09-19 09:42:12 -07:00
Michael Yang
b58d5d16b0 fix mkdir on windows 2023-09-19 09:41:13 -07:00
Patrick Devine
24580df958 only add a layer if there is actual data (#535) 2023-09-18 13:47:45 -07:00
Patrick Devine
80dd44e80a Cmd changes (#541) 2023-09-18 12:26:56 -07:00
James Braza
94e1d96b29 Updated README section on community projects for table (#550) 2023-09-18 15:22:50 -04:00
Bruce MacDonald
66003e1d05 subprocess improvements (#524)
* subprocess improvements

- increase start-up timeout
- when a runner fails to start, fail rather than timing out
- try runners in order rather than choosing 1 runner
- embed metal runner in metal dir rather than gpu
- refactor logging and error messages

* Update llama.go

* Update llama.go

* simplify by using glob
2023-09-18 15:16:32 -04:00
Michael Yang
c345053a8b Merge pull request #537 from jmorganca/mxyng/upload
fix error on upload chunk
2023-09-15 17:48:39 -07:00
Michael Yang
08d7c2a944 fix error on upload chunk 2023-09-15 15:59:30 -07:00
Michael Yang
bc9573dcb1 Merge pull request #530 from jmorganca/mxyng/progresswriter
implement ProgressWriter
2023-09-15 12:43:46 -07:00
Michael Yang
e53bc57d4d split uploadBlobChunked 2023-09-14 17:22:05 -07:00
Michael Yang
f0b398d17f implement ProgressWriter 2023-09-14 17:22:04 -07:00
Patrick Devine
8efbc5df55 DRAFT: add a simple python client to access ollama (#522) 2023-09-14 16:37:38 -07:00
Michael Yang
ccc3e9ac6d Merge pull request #531 from jmorganca/mxyng/content-length
set request.ContentLength
2023-09-14 13:33:11 -07:00
Michael Yang
daa4f096f9 set request.ContentLength
This informs the HTTP client that the content length is known and disables
chunked Transfer-Encoding
2023-09-14 13:32:44 -07:00
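A short, hypothetical Go sketch of what setting request.ContentLength buys you: net/http cannot infer the size of arbitrary body readers such as an *os.File, so without it the client falls back to chunked Transfer-Encoding. The file name and URL here are placeholders.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

func main() {
	f, err := os.Open("layer.bin") // placeholder blob file
	if err != nil {
		panic(err)
	}
	defer f.Close()

	fi, err := f.Stat()
	if err != nil {
		panic(err)
	}

	req, err := http.NewRequest(http.MethodPut, "http://registry.example/v2/blobs/upload", f)
	if err != nil {
		panic(err)
	}
	// With ContentLength set, the client sends a Content-Length header;
	// left unset for an unknown-size body, it would use chunked Transfer-Encoding.
	req.ContentLength = fi.Size()
	fmt.Println("uploading", req.ContentLength, "bytes")
	// resp, err := http.DefaultClient.Do(req) // omitted: no registry to talk to here
}
```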
Michael Yang
3ee85f1c6c Merge pull request #526 from jmorganca/mxyng/cleanup
remove unused
2023-09-14 13:10:59 -07:00
Bruce MacDonald
2540c9181c support for packaging in multiple cuda runners (#509)
* enable packaging multiple cuda versions
* use nvcc cuda version if available

---------

Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-14 15:08:13 -04:00
Michael Yang
83ffb154bc Merge pull request #507 from jmorganca/mxyng/build
update docker image
2023-09-14 11:25:59 -07:00
Michael Yang
9aa192c812 update cuda docker image 2023-09-14 11:25:20 -07:00
Matt Williams
fc8707686f Update API docs (#527)
* Update API docs

Signed-off-by: Matt Williams <m@technovangelist.com>

* strange TOC was getting auto generated

Signed-off-by: Matt Williams <m@technovangelist.com>

* Update docs/api.md

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update docs/api.md

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update docs/api.md

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update api.md

---------

Signed-off-by: Matt Williams <m@technovangelist.com>
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
Co-authored-by: Michael Chiang <mchiang0610@users.noreply.github.com>
2023-09-14 08:51:26 -07:00
Michael Yang
f89c23764b Merge pull request #525 from jmorganca/mxyng/falcon-decode
fix: add falcon.go
2023-09-13 15:08:47 -07:00
Michael Yang
e6881cabd0 remove unused 2023-09-13 14:48:33 -07:00
Michael Yang
d028853879 fix: add falcon.go 2023-09-13 14:47:37 -07:00
Michael Yang
949553db23 Merge pull request #519 from jmorganca/mxyng/decode
Mxyng/decode
2023-09-13 12:43:57 -07:00
Michael Yang
0c5a454361 fix model type for 70b 2023-09-12 15:12:59 -07:00
Bruce MacDonald
f59c4d03f7 fix ggml arm64 cuda build (#520) 2023-09-12 17:06:48 -04:00
Michael Yang
7dee25a07f fix falcon decode
get model and file type from bin file
2023-09-12 12:34:53 -07:00
Bruce MacDonald
f221637053 first pass at linux gpu support (#454)
* linux gpu support
* handle multiple gpus
* add cuda docker image (#488)
---------

Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-12 11:04:35 -04:00
Patrick Devine
45ac07cd02 create the blobs directory correctly (#508) 2023-09-11 14:54:52 -07:00
Jeffrey Morgan
7d749cc787 fix darwin build script 2023-09-11 16:31:46 -04:00
Patrick Devine
e7e91cd71c add autoprune to remove unused layers (#491) 2023-09-11 11:46:35 -07:00
Jeffrey Morgan
3920e15386 add model format to config layer (#497) 2023-09-09 17:53:44 -04:00
Michael Yang
41e976edde Merge pull request #492 from jmorganca/mxyng/nil-pointer
fix nil pointer dereference
2023-09-07 17:25:23 -07:00
Michael Yang
de227b620f fix nil pointer dereference 2023-09-07 17:24:31 -07:00
Michael Yang
63def6ca49 Merge pull request #487 from jmorganca/mxyng/dockerignore
update dockerignore
2023-09-07 14:16:17 -07:00
Michael Yang
738fe9c4aa Merge pull request #486 from jmorganca/mxyng/fix-push
fix: retry push on expired token
2023-09-07 13:58:34 -07:00
Michael Yang
a8da0bacbe update dockerignore 2023-09-07 13:36:25 -07:00
Michael Yang
bf146fb072 fix retry on unauthorized chunk 2023-09-07 12:02:04 -07:00
Michael Yang
f0f4943577 fix get auth token 2023-09-07 12:01:56 -07:00
Bruce MacDonald
09dd2aeff9 GGUF support (#441) 2023-09-07 13:55:37 -04:00
Alexander Pepper
07b4074e7b [docs] Improve build instructions (#482)
Go is required and not installed by default.
2023-09-07 06:43:26 -04:00
Jeffrey Morgan
61dda6a5e0 set minimum CMAKE_OSX_DEPLOYMENT_TARGET to 11.0 2023-09-06 19:56:50 -04:00
Michael Yang
e1f9ced568 Merge pull request #479 from jmorganca/mxyng/dockerfile
update dockerfile
2023-09-06 15:44:24 -07:00
Michael Yang
9795b43d93 update dockerfile 2023-09-06 15:31:25 -07:00
Michael Yang
0980d5c7e3 Merge pull request #478 from jmorganca/mxyng/cleanup
remove unused openssh key types
2023-09-06 15:18:54 -07:00
Michael Yang
0dae34b6a7 remove unused openssh key types 2023-09-06 14:34:09 -07:00
Michael Yang
83c6be1666 fix model manifests (#477) 2023-09-06 17:30:08 -04:00
Patrick Devine
1adfa67589 tighten up the error string for ollama show flags (#476) 2023-09-06 13:38:49 -07:00
Patrick Devine
790d24eb7b add show command (#474) 2023-09-06 11:04:17 -07:00
Jeffrey Morgan
7de300856b use osPath in gpu check 2023-09-05 21:52:21 -04:00
Jeffrey Morgan
213ffdb548 macos amd64 compatibility fixes 2023-09-05 21:33:31 -04:00
Michael Yang
d42d88386a Merge pull request #473 from jmorganca/mxyng/fix-manifest-path
create manifests directory
2023-09-05 17:37:41 -07:00
Ackermann Yuriy
154f24af91 Added missing options params to the embeddings docs (#472) 2023-09-05 20:18:49 -04:00
Michael Yang
a1ecdd36d5 create manifests directory 2023-09-05 17:10:40 -07:00
Bruce MacDonald
d18282bfda metal: add missing barriers for mul-mat (#469) 2023-09-05 19:37:13 -04:00
Michael Yang
9ae76ba8c9 Merge pull request #471 from jmorganca/mxyng/fix-empty-response
fix empty response
2023-09-05 15:23:05 -07:00
Michael Yang
2bc06565c7 fix empty response 2023-09-05 15:03:24 -07:00
Michael Yang
d1c2558f7e Merge pull request #461 from jmorganca/mxyng/fix-inherit-params
fix inherit params
2023-09-05 12:30:23 -07:00
Michael Yang
7b5aefb427 Merge pull request #462 from jmorganca/mxyng/rm-marshal-prompt
remove marshalPrompt which is no longer needed
2023-09-05 11:48:41 -07:00
Michael Yang
06ef90c051 fix parameter inheritance
parameters are not inherited because they are processed differently from
other layers. fix this by explicitly merging the inherited params into
the new params. parameter values defined in the new modelfile will
override those defined in the inherited modelfile. array lists are
replaced instead of appended
2023-09-05 11:40:20 -07:00
Michael Yang
7efbc84320 Merge pull request #464 from jmorganca/mxyng/fix-num-keep
fix num_keep
2023-09-05 11:30:45 -07:00
Michael Yang
e9f6df7dca use slices.DeleteFunc 2023-09-05 09:56:59 -07:00
Jeffrey Morgan
7fa6e51686 generate binary dependencies based on GOARCH on macos (#459) 2023-09-05 12:53:57 -04:00
Michael Yang
8dc68417e7 Merge pull request #463 from jmorganca/mxyng/fix-last-token
fix not forwarding last token
2023-09-05 09:01:32 -07:00
Michael Yang
681f3c4c42 fix num_keep 2023-09-03 17:47:49 -04:00
Michael Yang
59a705525c fix not forwarding last token 2023-09-03 17:46:50 -04:00
Michael Yang
5d3f314b0b remove marshalPrompt which is no longer needed 2023-09-03 17:01:05 -04:00
Michael Yang
adaa13088b Merge pull request #457 from sqs/dont-html-escape-prompt
do not HTML-escape prompt
2023-09-01 17:41:53 -07:00
Quinn Slack
62d29b2157 do not HTML-escape prompt
The `html/template` package automatically HTML-escapes interpolated strings in templates. This behavior is undesirable because it causes prompts like `<h1>hello` to be escaped to `&lt;h1&gt;hello` before being passed to the LLM.

The included test case passes, but before the code change, it failed:

```
--- FAIL: TestModelPrompt
    images_test.go:21: got "a&lt;h1&gt;b", want "a<h1>b"
```
2023-09-01 17:16:38 -05:00
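The difference is easy to reproduce; this small Go sketch (not the project's code) shows html/template escaping the interpolated prompt while text/template leaves it alone.

```go
package main

import (
	htmltmpl "html/template"
	"os"
	texttmpl "text/template"
)

func main() {
	const tmpl = "{{.Prompt}}\n"
	data := map[string]string{"Prompt": "<h1>hello"}

	// html/template escapes interpolated strings: prints &lt;h1&gt;hello
	htmltmpl.Must(htmltmpl.New("p").Parse(tmpl)).Execute(os.Stdout, data)

	// text/template passes the prompt through untouched: prints <h1>hello
	texttmpl.Must(texttmpl.New("p").Parse(tmpl)).Execute(os.Stdout, data)
}
```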
Michael Yang
ed19d10aa5 update readme (#451)
* update readme

* readme: more run examples
2023-09-01 16:44:14 -04:00
Michael Yang
36c2f45c40 Merge pull request #450 from jmorganca/mxyng/update-readme
update readme
2023-09-01 08:21:49 -07:00
Michael Yang
742226625f update readme 2023-09-01 10:54:31 -04:00
Matt Williams
6bb8a16ccb Merge pull request #273 from jmorganca/matt/moreexamples
Create a sentiments example
2023-08-31 16:31:59 -07:00
Jeffrey Morgan
a5dbcf2e73 app: don't package ggml-metal.metal 2023-08-31 17:41:09 -04:00
Michael Yang
9304f0e7a8 Merge pull request #443 from jmorganca/mxyng/fix-list-models
windows: fix filepath bugs
2023-08-31 14:19:10 -07:00
Michael Yang
6578b2f8a1 Merge pull request #448 from callmephilip/patch-1
fix spelling errors in example prompts
2023-08-31 08:57:07 -07:00
Michael Yang
1c8fd627ad windows: fix create modelfile 2023-08-31 09:47:10 -04:00
Michael Yang
ae950b00f1 windows: fix delete 2023-08-31 09:47:10 -04:00
Michael Yang
eeb40a672c fix list models for windows 2023-08-31 09:47:10 -04:00
Michael Yang
0f541a0367 s/ListResponseModel/ModelResponse/ 2023-08-31 09:47:10 -04:00
Philip Nuzhnyi
1363f537ce fix spelling errors in prompt 2023-08-31 10:02:46 +01:00
Jeffrey Morgan
bc3e21fdc6 update README.md 2023-08-30 17:56:14 -04:00
Jeffrey Morgan
a82eb275ff update docs for subprocess 2023-08-30 17:54:02 -04:00
Bruce MacDonald
f964aea9a2 remove test not applicable to subprocess 2023-08-30 16:36:11 -04:00
Bruce MacDonald
42998d797d subprocess llama.cpp server (#401)
* remove c code
* pack llama.cpp
* use request context for llama_cpp
* let llama_cpp decide the number of threads to use
* stop llama runner when app stops
* remove sample count and duration metrics
* use go generate to get libraries
* tmp dir for running llm
2023-08-30 16:35:03 -04:00
Quinn Slack
f4432e1dba treat stop as stop sequences, not exact tokens (#442)
The `stop` option to the generate API is a list of sequences that should cause generation to stop. Although these are commonly called "stop tokens", they do not necessarily correspond to LLM tokens (per the LLM's tokenizer). For example, if the caller sends a generate request with `"stop":["\n"]`, then generation should stop on any token containing `\n` (and trim `\n` from the output), not just if the token exactly matches `\n`. If `stop` were interpreted strictly as LLM tokens, then it would require callers of the generate API to know the LLM's tokenizer and enumerate many tokens in the `stop` list.

Fixes https://github.com/jmorganca/ollama/issues/295.
2023-08-30 11:53:42 -04:00
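A hedged sketch of the stop-sequence handling described above (the helper name is invented): output is cut at the first occurrence of any stop sequence inside the generated text, rather than requiring an exact token match.

```go
package main

import (
	"fmt"
	"strings"
)

// truncateAtStop is a hypothetical helper: it cuts generated text at the first
// occurrence of any stop sequence, instead of comparing whole tokens.
func truncateAtStop(text string, stop []string) (string, bool) {
	for _, s := range stop {
		if i := strings.Index(text, s); i >= 0 {
			return text[:i], true
		}
	}
	return text, false
}

func main() {
	out, stopped := truncateAtStop("first line\nsecond line", []string{"\n"})
	fmt.Printf("%q stopped=%v\n", out, stopped) // "first line" stopped=true
}
```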
Michael Yang
982c535428 Merge pull request #428 from jmorganca/mxyng/upload-chunks
update upload chunks
2023-08-30 07:47:17 -07:00
Michael Yang
7df342a6ea Merge pull request #421 from jmorganca/mxyng/f16-metal
allow F16 to use metal
2023-08-29 06:32:59 -07:00
Patrick Devine
8bbff2df98 add model IDs (#439) 2023-08-28 20:50:24 -07:00
Michael Yang
16b06699fd remove unused parameter 2023-08-28 18:35:18 -04:00
Michael Yang
246dc65417 loosen http status code checks 2023-08-28 18:34:53 -04:00
Michael Yang
865fceb73c chunked pipe 2023-08-28 18:34:53 -04:00
Michael Yang
72266c7684 bump chunk size to 95MB 2023-08-28 18:34:53 -04:00
Jeffrey Morgan
d3b838ce60 update orca to orca-mini 2023-08-27 13:26:30 -04:00
Michael Yang
e639a12fa1 Merge pull request #412 from jmorganca/mxyng/update-readme
update README.md
2023-08-26 21:26:34 -07:00
Michael Yang
e82fcf30c6 Merge pull request #420 from jmorganca/mxyng/34b-mem-check
add 34b to mem check
2023-08-26 14:15:52 -07:00
Michael Yang
495e8b0a6a Merge pull request #426 from jmorganca/default-template
set default template
2023-08-26 14:15:38 -07:00
Michael Yang
59734ca24d set default template 2023-08-26 12:20:48 -07:00
Jeffrey Morgan
22ab7f5f88 default host to 127.0.0.1, fixes #424 2023-08-26 11:59:28 -07:00
Michael Yang
b25dd1795d allow F16 to use metal
warning: F16 uses significantly more memory than quantized models, so the
standard requirements don't apply.
2023-08-26 08:38:48 -07:00
Michael Yang
304f2b6c96 add 34b to mem check 2023-08-26 08:29:21 -07:00
Quinn Slack
2ecc3a33c3 delete all models (not just 1st) in ollama rm (#415)
Previously, `ollama rm model1 model2 modelN` would only delete `model1`. The other model command-line arguments would be silently ignored. Now, all models mentioned are deleted.
2023-08-26 00:47:56 -07:00
Jeffrey Morgan
ee6e1df118 add codellama to model list in readme 2023-08-25 20:44:26 -07:00
Jeffrey Morgan
177b69a211 add missing entries for 34B 2023-08-25 18:35:35 -07:00
Michael Yang
dad63f0821 Merge pull request #411 from jmorganca/mxyng/34b
patch llama.cpp for 34B
2023-08-25 11:59:05 -07:00
Michael Yang
041f9ad1a1 update README.md 2023-08-25 11:44:25 -07:00
Michael Yang
7a378f8b66 patch llama.cpp for 34B 2023-08-25 10:06:55 -07:00
Michael Yang
de0bdd7f29 Merge pull request #405 from jmorganca/mxyng/34b
add 34b model type
2023-08-24 10:37:22 -07:00
Michael Yang
b1cececb8e add 34b model type 2023-08-24 10:35:44 -07:00
Michael Yang
e0d39fa3bf Merge pull request #398 from jmorganca/mxyng/cleanup
Mxyng/cleanup
2023-08-22 15:51:41 -07:00
Michael Yang
968ced2e71 Merge pull request #393 from jmorganca/mxyng/net-url
use url.URL
2023-08-22 15:51:33 -07:00
Michael Yang
32d1a00017 remove unused requestContextKey 2023-08-22 10:49:54 -07:00
Michael Yang
04e2128273 move upload funcs to upload.go 2023-08-22 10:49:53 -07:00
Michael Yang
2cc634689b use url.URL 2023-08-22 10:49:07 -07:00
Michael Yang
8f827641b0 Merge pull request #397 from jmorganca/mxyng/release-mode
build release mode
2023-08-22 10:48:44 -07:00
Michael Yang
95187d7e1e build release mode 2023-08-22 09:52:43 -07:00
Michael Yang
9ec7e37534 Merge pull request #392 from jmorganca/mxyng/version
add version
2023-08-22 09:50:25 -07:00
Michael Yang
2c7f956b38 add version 2023-08-22 09:40:58 -07:00
Jeffrey Morgan
a9f6c56652 fix FROM instruction erroring when referring to a file 2023-08-22 09:39:42 -07:00
Ryan Baker
0a892419ad Strip protocol from model path (#377) 2023-08-21 21:56:56 -07:00
Jeffrey Morgan
e3054fc74e add .env to .dockerignore 2023-08-21 09:32:02 -07:00
Michael Yang
23c2485044 Merge pull request #381 from jmorganca/mxyng/fix-push-chunks
retry on unauthorized chunk push
2023-08-18 13:49:25 -07:00
Michael Yang
386c66f285 Merge pull request #378 from jmorganca/mxyng/copy-metadata-from-source
copy metadata from source
2023-08-18 13:49:09 -07:00
Michael Yang
3b49315f97 retry on unauthorized chunk push
The token issued for authorized requests has a lifetime of 1h. If an
upload exceeds 1h, a chunk push will fail since the token is created on
a "start upload" request.

This replaces the Pipe with SectionReader which is simpler and
implements Seek, a requirement for makeRequestWithRetry. This is
slightly worse than using a Pipe since the progress update is directly
tied to the chunk size instead of controlled separately.
2023-08-18 11:23:47 -07:00
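A minimal Go sketch of the SectionReader idea, assuming a local blob file and taking the 95MB chunk size from the earlier "bump chunk size to 95MB" commit: a SectionReader implements Seek, so a failed chunk push can be rewound and retried with a fresh token.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

func main() {
	f, err := os.Open("layer.bin") // placeholder blob file
	if err != nil {
		panic(err)
	}
	defer f.Close()

	const chunkSize = 95 * 1024 * 1024 // assumed constant; see "bump chunk size to 95MB" above
	// The SectionReader covers one chunk and satisfies io.ReadSeeker, which
	// makeRequestWithRetry needs in order to rewind and resend after a 401.
	chunk := io.NewSectionReader(f, 0, chunkSize)

	n, _ := io.Copy(io.Discard, chunk) // stand-in for the first push attempt
	chunk.Seek(0, io.SeekStart)        // rewind before retrying with a new token
	fmt.Println("chunk bytes read:", n)
}
```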
Michael Yang
5ca05c2e88 fix ModelType() 2023-08-18 11:23:38 -07:00
Michael Yang
7eda70f23b copy metadata from source 2023-08-17 21:55:25 -07:00
Jeffrey Morgan
3d79b414d3 app: package ggml-metal.metal from correct directory 2023-08-17 23:55:45 -04:00
Michael Yang
c84bbf1dd6 Merge pull request #376 from jmorganca/mxyng/from-map-ignore-nil
ignore nil map values
2023-08-17 15:57:12 -07:00
Michael Yang
f723bf0879 ignore nil map values 2023-08-17 15:50:46 -07:00
Michael Yang
cbf725a9ba Merge pull request #375 from jmorganca/mxyng/fix-push
fix push manifest
2023-08-17 15:33:31 -07:00
Michael Yang
086449b6c7 fmt 2023-08-17 15:32:31 -07:00
Michael Yang
3cbc6a5c01 fix push manifest 2023-08-17 15:28:12 -07:00
Jeffrey Morgan
54bb49a502 parse protocol for OLLAMA_HOST 2023-08-17 18:20:44 -04:00
Michael Yang
cabaada956 Merge pull request #372 from jmorganca/mxyng/string-types
model and file type as strings
2023-08-17 15:10:59 -07:00
Michael Yang
a894cc792d model and file type as strings 2023-08-17 12:08:04 -07:00
Bruce MacDonald
519f4d98ef add embed docs for modelfile 2023-08-17 13:37:42 -04:00
Michael Yang
b963a83559 Merge pull request #364 from jmorganca/chunked-uploads
reimplement chunked uploads
2023-08-17 09:58:51 -07:00
Michael Yang
bf6688abe6 Merge pull request #360 from jmorganca/fix-request-copies
Fix request copies
2023-08-17 09:58:42 -07:00
Bruce MacDonald
6005b157c2 retry download on network errors 2023-08-17 10:31:45 -04:00
Patrick Devine
14220d9833 set the scopes correctly (#368) 2023-08-16 21:42:02 -07:00
Michael Chiang
8ca50f24f3 fix nous-hermes model file size listing in readme (#367)
fix nous-hermes model file size listing in readme
2023-08-16 23:42:00 -04:00
Michael Chiang
c149fc3143 Update README.md 2023-08-16 22:54:55 -04:00
Michael Chiang
afbc763dac adding link to models directly available on ollama (#366)
- adding link to models directly available on ollama

- ability to push your own models to the library will come in the future
2023-08-16 22:53:27 -04:00
Michael Yang
5dfe91be8b reimplement chunked uploads 2023-08-16 14:50:24 -07:00
Michael Yang
9f944c00f1 push: retry on unauthorized 2023-08-16 11:35:33 -07:00
Michael Yang
56e87cecb1 images: remove body copies 2023-08-16 10:30:41 -07:00
Jeffrey Morgan
5ee6116420 set default OLLAMA_HOST to http://localhost:11434 2023-08-16 12:22:59 -04:00
Michael Yang
5d9a4cd251 Merge pull request #348 from jmorganca/cross-repo-mount
cross repo blob mount
2023-08-16 09:20:36 -07:00
Michael Yang
0ebec07569 Merge pull request #345 from jmorganca/exit-non-zero
set non-zero error code on error
2023-08-16 09:20:28 -07:00
Matt Williams
08265515b3 Merge pull request #303 from jmorganca/matt/dockerit
DockerIt example
2023-08-16 08:04:34 -07:00
Blake Mizerany
67e593e355 cmd: support OLLAMA_CLIENT_HOST environment variable (#262)
* cmd: support OLLAMA_HOST environment variable

This commit adds support for the OLLAMA_HOST environment
variable. This variable can be used to specify the host to which
the client should connect. This is useful when the client is
running somewhere other than the host where the server is running.

The new api.FromEnv function is used to configure clients from the
environment. Clients that want to stay consistent with the Ollama CLI's
environment variable handling can use this new function.

* Update api/client.go

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update api/client.go

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-08-16 11:03:48 -04:00
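A rough Go sketch of reading the host from the environment as described above; it is not the real api.FromEnv implementation, and the fallback shown is an assumption based on the server defaults mentioned elsewhere in this log.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
)

// hostFromEnv mimics the described behaviour: use OLLAMA_HOST if set,
// otherwise fall back to the server's default address (assumed here).
func hostFromEnv() (*url.URL, error) {
	host := os.Getenv("OLLAMA_HOST")
	if host == "" {
		host = "http://127.0.0.1:11434"
	}
	return url.Parse(host)
}

func main() {
	u, err := hostFromEnv()
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid OLLAMA_HOST:", err)
		os.Exit(1)
	}
	fmt.Println("connecting to", u.String())
}
```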
Jeffrey Morgan
d15c7622b9 Update orca to orca-mini in README.md 2023-08-15 21:10:28 -04:00
Bruce MacDonald
1deb35ca64 use loaded llm for generating model file embeddings 2023-08-15 16:12:02 -03:00
Bruce MacDonald
e2de886831 do not regenerate embeddings 2023-08-15 16:10:22 -03:00
Bruce MacDonald
f0d7c2f5ea retry download on network errors 2023-08-15 15:07:19 -03:00
Bruce MacDonald
12052a7624 always remove from in progress map on download 2023-08-15 13:20:32 -03:00
Bruce MacDonald
23e1da778d Add context to api docs 2023-08-15 11:43:22 -03:00
Bruce MacDonald
326de48930 use loaded llm for embeddings 2023-08-15 10:50:54 -03:00
Bruce MacDonald
18f2cb0472 don't log fatal 2023-08-15 10:39:59 -03:00
Bruce MacDonald
53bc36d207 Update modelfile.md 2023-08-15 09:23:36 -03:00
Michael Yang
4dcf5c3e0b Merge pull request #349 from jmorganca/close-files
close open files
2023-08-14 16:15:58 -07:00
Michael Yang
d1b2f532b9 Merge pull request #350 from jmorganca/update-llama-cpp
update llama.cpp
2023-08-14 16:15:51 -07:00
Michael Yang
e26085b921 close open files 2023-08-14 16:08:06 -07:00
Michael Yang
f7b613332c update llama.cpp 2023-08-14 15:47:00 -07:00
Michael Yang
f594c8eb91 cross repo mount 2023-08-14 15:07:35 -07:00
Michael Yang
76b85bc0e9 set non-zero error code on error 2023-08-14 14:09:58 -07:00
Bruce MacDonald
af98a1773f update python example 2023-08-14 16:38:44 -03:00
Bruce MacDonald
9ae9a89883 Update modelfile.md 2023-08-14 16:26:53 -03:00
Bruce MacDonald
648f0974c6 python example 2023-08-14 15:27:13 -03:00
Bruce MacDonald
fc5230dffa Add context to api docs 2023-08-14 15:23:24 -03:00
Bruce MacDonald
2ab20095b3 log embedding eval timing 2023-08-14 12:15:55 -04:00
Bruce MacDonald
f020e1d519 always remove from in progress map on download 2023-08-14 13:09:20 -03:00
Bruce MacDonald
4b2d366c37 Update llama.go 2023-08-14 12:55:50 -03:00
Bruce MacDonald
56fd4e4ef2 log embedding eval timing 2023-08-14 12:51:31 -03:00
Bruce MacDonald
2c8b680b03 use file info for embeddings cache 2023-08-14 12:11:04 -03:00
Bruce MacDonald
99b6b60085 use model bin digest for embed digest 2023-08-14 11:57:12 -03:00
Bruce MacDonald
74f00474e1 Merge pull request #340 from gusanmaz/main
Update langchainpy.md
2023-08-14 09:38:42 -04:00
Bruce MacDonald
e9a9580bdd do not regenerate embeddings
- re-use previously evaluated embeddings when possible
- change embeddings digest identifier to be based on model name and embedded file path
2023-08-14 10:34:17 -03:00
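A hypothetical sketch of the digest identifier described above, derived from the model name and the embedded file's path so that re-embedding the same file for the same model can be skipped; the exact hashing scheme is an assumption, not the project's code.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// embeddingKey builds a cache identifier from the model name and the embedded
// file path (hashing scheme assumed for illustration).
func embeddingKey(model, filePath string) string {
	sum := sha256.Sum256([]byte(model + ":" + filePath))
	return fmt.Sprintf("sha256:%x", sum)
}

func main() {
	fmt.Println(embeddingKey("llama2", "docs/notes.txt"))
	// The same model and path always yield the same key, so previously
	// evaluated embeddings can be reused instead of regenerated.
}
```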
Güvenç Usanmaz
4c33a9ac67 Update langchainpy.md
base_url value for Ollama object creation is corrected.
2023-08-14 12:12:56 +03:00
Jeffrey Morgan
22885aeaee update llama.cpp to f64d44a 2023-08-12 22:47:15 -04:00
Jeffrey Morgan
ed969d2a06 add LiteLLM to README.md 2023-08-12 20:47:57 -04:00
Patrick Devine
d9cf18e28d add maximum retries when pushing (#334) 2023-08-11 15:41:55 -07:00
Jeffrey Morgan
1556162c90 create .ollama directory if it doesn't exist 2023-08-11 15:35:55 -07:00
Jeffrey Morgan
148f0225c0 create .ollama directory if it doesn't exist 2023-08-11 15:33:11 -07:00
Matt Williams
4e07941b1e Merge pull request #329 from jmorganca/matt/tutorials
Add tutorials for using Langchain with ollama
2023-08-11 15:19:39 -07:00
Matt Williams
202c29c21a resolving bmacd comment
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-11 13:51:44 -07:00
Matt Williams
c1c871620a Update docs/tutorials/langchainjs.md
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-08-11 13:48:46 -07:00
Matt Williams
a21a8bef56 Update docs/tutorials/langchainjs.md
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-08-11 13:48:35 -07:00
Matt Williams
522726228a Update docs/tutorials.md
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-08-11 13:48:16 -07:00
Patrick Devine
9770e3b325 Generate private/public keypair for use w/ auth (#324) 2023-08-11 10:58:23 -07:00
Michael Yang
d617823355 Merge pull request #333 from jmorganca/off-by-one
ggml: fix off by one error
2023-08-11 10:51:06 -07:00
Michael Yang
6ed991c8e2 ggml: fix off by one error
remove unused Unknown FileType
2023-08-11 10:45:22 -07:00
Michael Chiang
e41576e768 Merge branch 'new-syntax' of https://github.com/jmorganca/ollama into new-syntax 2023-08-11 09:00:43 -07:00
Michael Chiang
155c1640f1 add demo video 2023-08-11 08:58:57 -07:00
Jeffrey Morgan
f7d4947573 update header note for privategpt example 2023-08-11 08:52:26 -07:00
Jeffrey Morgan
0d7a133b15 Update README.md for privategpt 2023-08-11 08:29:19 -07:00
Jeffrey Morgan
e863066144 clean up privategpt example 2023-08-11 00:34:52 -07:00
Jeffrey Morgan
89a92477ad fix README.md for privategpt example 2023-08-11 00:26:33 -07:00
Jeffrey Morgan
5cda9cdd13 add instructions to privategpt example to try another model 2023-08-11 00:23:31 -07:00
Jeffrey Morgan
e5914eb320 add venv instructions to privategpt example 2023-08-11 00:20:22 -07:00
Jeffrey Morgan
ab78f48ff8 more setup instructions for privategpt example 2023-08-11 00:19:25 -07:00
Jeffrey Morgan
b1c88eb978 add privategpt example 2023-08-11 00:18:13 -07:00
Jeffrey Morgan
efae43f932 update langchain examples 2023-08-10 23:35:19 -07:00
Matt Williams
d3ee1329e9 Add tutorials for using Langchain with ollama
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-10 21:27:37 -07:00
Jeffrey Morgan
700c719422 remove document example for now 2023-08-10 20:25:01 -07:00
Jeffrey Morgan
55aa4aaf0f add langchain examples 2023-08-10 20:23:50 -07:00
Jeffrey Morgan
820f95c4c4 add example 2023-08-10 20:13:47 -07:00
Michael Yang
3a05d3def7 Merge pull request #326 from asarturas/document-num-gqa-parameter
Document num_gqa parameter
2023-08-10 18:18:38 -07:00
Michael Yang
edac9c2446 Merge pull request #325 from jmorganca/mxyng/typo
s/parmeter/parameter/
2023-08-10 17:30:02 -07:00
Arturas Smorgun
d9c2687fd0 document the default num_gqa of 1, as it's applicable to most models
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-08-11 01:29:40 +01:00
Michael Yang
6517bcc53c Merge pull request #290 from jmorganca/add-adapter-layers
implement loading ggml lora adapters through the modelfile
2023-08-10 17:23:01 -07:00
Michael Yang
4f54f25b66 Merge pull request #272 from jmorganca/decode-ggml-2
Decode ggml 2: Use decoded values
2023-08-10 17:22:48 -07:00
Michael Yang
6a6828bddf Merge pull request #167 from jmorganca/decode-ggml
partial decode ggml bin for more info
2023-08-10 17:22:40 -07:00
Arturas Smorgun
c0e7a3b90e Document num_gqa parameter
It needs to be adjusted for some models; see https://github.com/jmorganca/ollama/issues/320 for more context
2023-08-11 00:58:09 +01:00
Michael Yang
f27bc261cf s/parmeter/parameter/ 2023-08-10 16:26:06 -07:00
Michael Yang
21e6197c0b Merge pull request #322 from jmorganca/no-comment-warning
no warning on comments
2023-08-10 16:24:41 -07:00
Michael Yang
75d7d681c9 Merge pull request #323 from jmorganca/fix-convert-int
fix could not convert int
2023-08-10 16:24:33 -07:00
Michael Yang
81d8d7b73f fix could not convert int 2023-08-10 16:24:17 -07:00
Michael Yang
5c0de09a07 Merge pull request #321 from jmorganca/fix-parameters
length check for parameters
2023-08-10 16:23:10 -07:00
Michael Yang
20bf000e55 no warning on comments 2023-08-10 16:22:38 -07:00
Michael Yang
40d0c4a1dc length check for parameters 2023-08-10 16:09:02 -07:00
Jeffrey Morgan
be889b2f81 add docs for /api/embeddings 2023-08-10 15:56:59 -07:00
Jeffrey Morgan
7e26a8df31 cmd: use environment variables for server options 2023-08-10 14:17:53 -07:00
Jeffrey Morgan
4ab1da38ba guard around id() 2023-08-10 14:11:54 -07:00
Patrick Devine
be989d89d1 Token auth (#314) 2023-08-10 11:34:25 -07:00
Soroush Javadi
bea683e3bf cmd: check GetBlobsPath error (#317)
The error returned by `server.GetBlobsPath` in `showLayer` was never
checked. Check the error and return if not nil. Also, make newlines at
the end of error messages consistent and fix a typo.
2023-08-10 09:57:49 -07:00
Jeffrey Morgan
178237d37f tweak README.md 2023-08-10 09:54:03 -07:00
Jeffrey Morgan
76a678af34 app: don't always show installer window on top now that it lives in the dock 2023-08-10 09:53:46 -07:00
Jeffrey Morgan
f65169b13e clean up cli flags 2023-08-10 09:28:56 -07:00
Jeffrey Morgan
040a5b9750 clean up cli flags 2023-08-10 09:27:03 -07:00
Michael Yang
37c9a8eea9 add lora docs 2023-08-10 09:23:40 -07:00
Michael Yang
6de5d032e1 implement loading ggml lora adapters through the modelfile 2023-08-10 09:23:39 -07:00
Michael Yang
d791df75dd check memory requirements before loading 2023-08-10 09:23:11 -07:00
Michael Yang
020a3b3530 disable gpu for q5_0, q5_1, q8_0 quants 2023-08-10 09:23:11 -07:00
Michael Yang
fccf8d179f partial decode ggml bin for more info 2023-08-10 09:23:10 -07:00
Bruce MacDonald
5b5cc9c9f1 embeddings endpoint 2023-08-10 11:49:55 -04:00
Bruce MacDonald
4b3507f036 embeddings endpoint
Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>
2023-08-10 11:45:57 -04:00
Jun Tian
5ebce03c77 Add an example on multiline input (#311) 2023-08-10 08:22:28 -07:00
Bruce MacDonald
5e25f801ed fix a typo in the tweetwriter example Modelfile 2023-08-10 10:19:53 -04:00
Bruce MacDonald
8e1234b758 fix embeddings invalid values 2023-08-10 10:17:00 -04:00
Soroush Javadi
10885986b8 fix a typo in the tweetwriter example Modelfile 2023-08-10 15:12:48 +03:30
Bruce MacDonald
984c9c628c fix embeddings invalid values 2023-08-09 16:50:53 -04:00
Bruce MacDonald
43c40c500e add embed docs for modelfile 2023-08-09 16:14:58 -04:00
Bruce MacDonald
c4861360ec remove embed docs 2023-08-09 16:14:19 -04:00
Bruce MacDonald
9738ef85db allow for concurrent pulls of the same files 2023-08-09 11:35:24 -04:00
Bruce MacDonald
ac971c56d1 Update images.go 2023-08-09 11:31:54 -04:00
Bruce MacDonald
8228d166ce pr comments 2023-08-09 11:31:54 -04:00
Bruce MacDonald
907e6c56b3 unlock download in case of requestDownload err 2023-08-09 11:31:54 -04:00
Bruce MacDonald
868e3b31c7 allow for concurrent pulls of the same files 2023-08-09 11:31:54 -04:00
Bruce MacDonald
09d8bf6730 fix build errors 2023-08-09 10:45:57 -04:00
Bruce MacDonald
7a5f3616fd embed text document in modelfile 2023-08-09 10:26:19 -04:00
Jeffrey Morgan
cff002b824 use content type application/x-ndjson for streaming responses 2023-08-08 21:38:10 -07:00
Jeffrey Morgan
55cf5021f0 update langchain example to include python 2023-08-08 21:03:10 -07:00
Jeffrey Morgan
f58caa5ab5 update README.md 2023-08-08 15:50:23 -07:00
Jeffrey Morgan
82df473ec9 use note syntax in README.md 2023-08-08 15:49:50 -07:00
Jeffrey Morgan
e184c1d035 Link to api.md in README.md 2023-08-08 15:48:47 -07:00
Jeffrey Morgan
371d4e5df3 docs: fix invalid json in api.md 2023-08-08 15:46:05 -07:00
Jeffrey Morgan
1f78e409b4 docs: format with prettier 2023-08-08 15:41:48 -07:00
Jeffrey Morgan
34a88cd776 docs: update api.md formatting 2023-08-08 15:41:19 -07:00
Bruce MacDonald
1bee2347be pr feedback
- defer closing llm on embedding
- do not override licenses
- remove debugging print line
- reformat model file docs
2023-08-08 17:01:37 -04:00
Jeffrey Morgan
a027a7dd65 add 0.0.0.0 as an allowed origin by default
Fixes #282
2023-08-08 13:39:50 -07:00
Jeffrey Morgan
22986ccb38 add llama2:70b to the model library list 2023-08-08 13:08:05 -07:00
Bruce MacDonald
884d78ceb3 allow embedding from model binary 2023-08-08 14:38:57 -04:00
Bruce MacDonald
3ceac05108 Add embedding docs 2023-08-08 14:04:11 -04:00
Bruce MacDonald
21ddcaa1f1 pr comments
- default to embeddings enabled
- move embedding logic for loaded model to request
- allow embedding full directory
- close llm on reload
2023-08-08 13:49:37 -04:00
Michael Yang
f2074ed4c0 Merge pull request #306 from jmorganca/default-keep-system
automatically set num_keep if num_keep < 0
2023-08-08 09:25:34 -07:00
Bruce MacDonald
a6f6d18f83 embed text document in modelfile 2023-08-08 11:27:17 -04:00
Bruce MacDonald
34a13a9d05 pass flags to serve to allow setting allowed-origins + host and port 2023-08-08 10:41:42 -04:00
Jeffrey Morgan
8713ac23a8 allow overriding template and system in /api/generate
Fixes #297
Fixes #296
2023-08-08 00:55:34 -04:00
Jeffrey Morgan
5eb712f962 trim whitespace before checking stop conditions
Fixes #295
2023-08-08 00:29:19 -04:00
Michael Yang
4dc5b117dd automatically set num_keep if num_keep < 0
num_keep defines how many tokens to keep in the context when truncating
inputs. if left at its default value of -1, the server will calculate
num_keep so that the system instructions are kept
2023-08-07 16:19:12 -07:00
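A small sketch of the rule described above, with made-up numbers: a negative num_keep is replaced by the token count of the system instructions, otherwise the explicit value wins.

```go
package main

import "fmt"

// resolveNumKeep is a hypothetical version of the described default:
// num_keep < 0 means "keep the system instructions".
func resolveNumKeep(numKeep, systemTokens int) int {
	if numKeep < 0 {
		return systemTokens
	}
	return numKeep
}

func main() {
	fmt.Println(resolveNumKeep(-1, 42)) // 42: cover the system instructions
	fmt.Println(resolveNumKeep(10, 42)) // 10: explicit value is respected
}
```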
Matt Williams
931a5f3cb9 Merge pull request #304 from jmorganca/matt/docs
missed a backtick
2023-08-07 15:14:06 -07:00
Jeffrey Morgan
639288bf2b make ollama binary executable on build 2023-08-07 18:10:37 -04:00
Jeffrey Morgan
d112c15d58 remove old library and web directories 2023-08-07 18:09:24 -04:00
Matt Williams
1267895e44 missed a backtick
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-07 13:53:49 -07:00
Matt Williams
089d03bc8d Merge pull request #289 from jmorganca/docs
First draft of API Docs
2023-08-07 13:46:22 -07:00
Matt Williams
e37f4c4f42 DockerIt example
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-07 13:45:22 -07:00
Michael Yang
ab3ced9d32 Merge pull request #276 from jmorganca/rope-freq
configurable rope frequency parameters
2023-08-07 13:39:38 -07:00
Matt Williams
0c52b4509b get rid of namespace and site
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-07 13:27:58 -07:00
Matt Williams
13aace3d34 clarify some more
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-07 13:21:54 -07:00
Matt Williams
2b3bb41598 model name format added
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-07 13:17:16 -07:00
cmiller01
93492f1e18 correct precedence of serve params (args over env over default) 2023-08-07 19:55:20 +00:00
Michael Chiang
54ba3e2ceb langchain JS integration (#302)
langchain JS integration
2023-08-07 12:21:36 -04:00
Matt Williams
4904cd8bcd update simpler code samples
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-07 07:40:38 -07:00
Matt Williams
8a45359ec6 Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-08-07 07:33:05 -07:00
cmiller01
fb593b7bfc pass flags to serve to allow setting allowed-origins + host and port
* resolves: https://github.com/jmorganca/ollama/issues/300 and
https://github.com/jmorganca/ollama/issues/282

* example usage:
```
ollama serve --port 9999 --allowed-origins "http://foo.example.com,http://192.0.0.1"
```
2023-08-07 03:34:37 +00:00
Matt Williams
2544b8afa1 update as per Mike's comments
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 17:42:24 -07:00
Matt Williams
ac1b04f271 Update docs/api.md
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-08-04 17:40:52 -07:00
Matt Williams
123fdeb919 Update docs/api.md
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-08-04 17:38:52 -07:00
Matt Williams
5c82bf95d1 Update docs/api.md
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-08-04 17:12:24 -07:00
Matt Williams
38a9b1618c missed some quotes
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 16:09:07 -07:00
Matt Williams
c18be72a3b complete 1st draft of api docs
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 16:08:11 -07:00
Matt Williams
a101fe51a7 clean up
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 12:56:41 -07:00
Bruce MacDonald
06fc48ad66 Update README.md (#285)
Ollama now supports Intel Macs
2023-08-04 15:45:55 -04:00
Matt Williams
d93e2f9210 fleshing out response
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 12:38:58 -07:00
Matt Williams
31edc829fc continuing
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 12:30:23 -07:00
Matt Williams
b31104768c filling out generate
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 12:27:47 -07:00
Matt Williams
b662d9fd8c starting to build out some docs
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 11:55:00 -07:00
Matt Williams
da36196d79 Update the modelfile
needed to override the system prompt
from orca and make it easier for a downstream
user to define their system prompt

Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-04 08:11:24 -07:00
Michael Yang
b9f4d67554 configurable rope frequency parameters 2023-08-03 22:11:58 -07:00
Matt Williams
42903973b7 Added an example to generate a list of 10 tweets
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-03 17:26:05 -07:00
Matt Williams
8f2df948ab Create a sentiments example
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-08-03 16:38:31 -07:00
Jeffrey Morgan
e3fb1fd3f1 server: compare options correctly 2023-08-03 15:55:40 -04:00
Michael Yang
29b897f525 Merge pull request #253 from jmorganca/upload
use a pipe to push to registry with progress
2023-08-03 12:11:23 -07:00
Michael Yang
85aeb42869 Merge pull request #270 from jmorganca/update-llama-cpp
update llama.cpp
2023-08-03 12:09:00 -07:00
Michael Yang
c5bcf32823 update llama.cpp 2023-08-03 11:50:24 -07:00
Michael Yang
a71ff3f6a2 use a pipe to push to registry with progress
switch from a chunked upload to a monolithic upload streamed through a pipe
so progress can be reported
2023-08-03 10:37:13 -07:00
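A minimal Go sketch of reporting progress from a streamed monolithic upload; the wrapper type is invented for illustration and is not the project's implementation.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// progressReader wraps the upload body and reports how many bytes have been
// pushed so far (a stand-in for the pipe-based progress reporting).
type progressReader struct {
	r     io.Reader
	total int64
}

func (p *progressReader) Read(b []byte) (int, error) {
	n, err := p.r.Read(b)
	p.total += int64(n)
	fmt.Println("pushed bytes:", p.total)
	return n, err
}

func main() {
	body := &progressReader{r: strings.NewReader("model layer bytes")}
	io.Copy(io.Discard, body) // stand-in for the single monolithic registry push
}
```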
Michael Chiang
f0b365a478 Merge pull request #268 from jmorganca/mchiang0610-patch-2
Update README.md
2023-08-03 11:23:31 -04:00
Michael Chiang
df8048fecd Update README.md 2023-08-03 11:22:57 -04:00
Michael Yang
da2459d519 Update README.md (#265) 2023-08-02 22:38:32 -04:00
Bruce MacDonald
bd6d741d87 tell users to check the server error logs 2023-08-02 17:08:11 -04:00
Bruce MacDonald
8b1e791820 allow specifying zero values in modelfile 2023-08-02 17:07:53 -04:00
Jeffrey Morgan
03cff3a225 server: reset digest at end of generate 2023-08-02 16:15:44 -04:00
Michael Yang
cc509a994e Merge pull request #260 from jmorganca/embed-ggml-metal
override ggml-metal if the file is different
2023-08-02 13:01:46 -07:00
Michael Yang
0e79e52ddd override ggml-metal if the file is different 2023-08-02 12:50:30 -07:00
Jeffrey Morgan
6fbb380076 hide dock icon if window closes 2023-08-02 11:05:34 -04:00
Bruce MacDonald
8f8b6288ac check server is running before running command 2023-08-02 10:51:23 -04:00
Michael Yang
b98096389d Merge pull request #255 from jmorganca/update-llama-cpp
Update llama cpp
2023-08-01 17:18:33 -07:00
Michael Yang
74a5f7e698 no gpu for 70B model 2023-08-01 17:12:50 -07:00
Michael Yang
7a1c3e62dc update llama.cpp 2023-08-01 16:54:01 -07:00
Jeffrey Morgan
da52f5bfdd run npm install on build 2023-08-01 17:41:25 -04:00
Bruce MacDonald
50e87c6691 read from os executable 2023-08-01 16:01:55 -04:00
Gerd
e4a970ece1 Add model update to README.md (#252) 2023-08-01 15:06:33 -04:00
Jeffrey Morgan
4ca43a694c remove newlines between list items in README.md 2023-08-01 15:05:39 -04:00
Bruce MacDonald
765994362c use head to check heartbeat 2023-08-01 14:50:38 -04:00
Bruce MacDonald
40a25bf8c3 pr comments 2023-08-01 13:48:48 -04:00
Bruce MacDonald
1c5a8770ee read runner parameter options from map
- read runner options from a map to see what was specified explicitly, so explicit zero values overwrite the defaults
2023-08-01 13:38:19 -04:00
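A hedged Go sketch of the map-based approach above, with a trimmed-down, hypothetical options struct: decoding into a map first reveals which keys were actually provided, so an explicit zero can overwrite a non-zero default.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Options is a trimmed, hypothetical stand-in for the runner options.
type Options struct {
	Temperature float64 `json:"temperature"`
	NumCtx      int     `json:"num_ctx"`
}

func main() {
	opts := Options{Temperature: 0.8, NumCtx: 2048} // defaults
	raw := []byte(`{"temperature": 0}`)             // user explicitly asks for 0

	// The map records which keys were actually present in the request.
	var provided map[string]json.RawMessage
	if err := json.Unmarshal(raw, &provided); err != nil {
		panic(err)
	}
	// Unmarshalling over the defaults applies every provided value, zeros included.
	if err := json.Unmarshal(raw, &opts); err != nil {
		panic(err)
	}

	if _, ok := provided["temperature"]; ok {
		fmt.Println("temperature set explicitly:", opts.Temperature) // 0
	}
	fmt.Println("num_ctx keeps its default:", opts.NumCtx) // 2048
}
```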
Bruce MacDonald
daa0d1de7a allow specifying zero values in modelfile 2023-08-01 13:37:50 -04:00
Jeffrey Morgan
58daeb962a add llama2-uncensored to model list 2023-08-01 11:25:01 -04:00
Jeffrey Morgan
528bafa585 cache loaded model 2023-08-01 11:24:18 -04:00
Michael Chiang
81f75696e2 Merge pull request #251 from jmorganca/mchiang0610-patch-2
add examples of projects using Ollama
2023-08-01 11:16:14 -04:00
Michael Chiang
8bdcf894bd Update README.md
add examples of projects using Ollama
2023-08-01 11:14:54 -04:00
Michael Chiang
fe530423a5 Merge pull request #249 from sestinj/main
Add "Awesome projects built with Ollama" section to README, including Continue
2023-08-01 08:07:50 -07:00
Michael Yang
05e390205b Merge pull request #250 from jmorganca/fixes
Fixes
2023-07-31 21:47:42 -07:00
Michael Yang
872011630a fix license 2023-07-31 21:46:48 -07:00
Michael Yang
203fdbc4b8 check err 2023-07-31 21:46:48 -07:00
Michael Yang
70e0ab6b3d remove unnecessary fmt.Sprintf 2023-07-31 21:46:47 -07:00
Michael Yang
319f078dd9 remove -Werror
there are compile warnings on Linux which -Werror elevates to errors,
preventing compilation
2023-07-31 21:45:56 -07:00
Jeffrey Morgan
9968153729 fix Go warnings 2023-07-31 21:37:40 -04:00
Jeffrey Morgan
7da249fcc1 only build metal for darwin,arm target 2023-07-31 21:35:23 -04:00
Bruce MacDonald
f529626c6c log prediction failures 2023-07-31 17:39:20 -04:00
Bruce MacDonald
36d6081ed1 find symlink of mac app 2023-07-31 17:38:10 -04:00
Nate Sesti
aadedda486 Update README.md 2023-07-31 13:59:39 -07:00
Bruce MacDonald
671eec6da9 log prediction failures 2023-07-31 16:46:37 -04:00
Bruce MacDonald
e72fe7945f check server is running before running command 2023-07-31 16:25:57 -04:00
Bruce MacDonald
d1c098b038 tell users to check the server error logs 2023-07-31 11:49:33 -04:00
Jeffrey Morgan
90ba0b80c7 fix build_darwin.sh 2023-07-29 22:36:59 -04:00
Patrick Devine
39bb25d5f6 allow multiline text using three double-quotes (#239) 2023-07-29 13:35:23 -07:00
Michael Yang
eadee46840 Merge pull request #236 from jmorganca/check-os-walk
check os.Walk err
2023-07-28 14:14:21 -07:00
Jeffrey Morgan
2e2e624d21 app: use notarytool for notarizing 2023-07-28 12:23:56 -07:00
Jeffrey Morgan
ed832ce3b7 darwin build script 2023-07-28 12:23:27 -07:00
Michael Yang
227da16909 Merge pull request #235 from jmorganca/rm-ioutil
remove io/ioutil import
2023-07-28 12:19:06 -07:00
Michael Yang
bd58528fbd check os.Walk err 2023-07-28 12:15:31 -07:00
Michael Yang
c5e447a359 remove io/ioutil import
ioutil is deprecated
2023-07-28 12:06:03 -07:00
Michael Yang
fc40a4f166 Merge pull request #234 from jmorganca/fix-parse-license
use max scan token size to hold large objects
2023-07-28 12:03:51 -07:00
Michael Yang
9c7f30d31c use max scan token size to hold large objects 2023-07-28 11:43:31 -07:00
Bruce MacDonald
6ed3ec0cb3 Allow specifying stop conditions in Modelfile 2023-07-28 12:31:08 -04:00
Bruce MacDonald
47bda0b860 add stop to docs 2023-07-28 12:30:27 -04:00
Jeffrey Morgan
c75cafdb58 build for universal architecture on macos 2023-07-28 12:18:11 -04:00
Bruce MacDonald
f5cbcb08e6 specify stop params separately 2023-07-28 11:29:00 -04:00
Jeffrey Morgan
67b6f8ba86 add ggml-metal.metal to .gitignore 2023-07-28 11:04:21 -04:00
Bruce MacDonald
184ad8f057 allow specifying stop conditions in modelfile 2023-07-28 11:02:04 -04:00
Jeffrey Morgan
822a0e36eb lower batch size to 512 2023-07-28 10:56:21 -04:00
Jeffrey Morgan
18b6b601ad app: cleanup README.md 2023-07-28 10:51:41 -04:00
Bruce MacDonald
0345070dfa update model file docs 2023-07-28 10:33:52 -04:00
Jeffrey Morgan
dffc8b6e09 update llama.cpp to d91f3f0 2023-07-28 08:07:48 -04:00
Jeffrey Morgan
0871083776 app: fix tray icon color scheme in dark mode 2023-07-28 07:03:46 -04:00
Michael Yang
e5b26c3aa2 Merge pull request #221 from jmorganca/embed-metal
embed ggml-metal.metal
2023-07-27 17:24:41 -07:00
Michael Yang
3549676678 embed ggml-metal.metal 2023-07-27 17:23:29 -07:00
Michael Yang
8fa477fadb Merge pull request #225 from jmorganca/stop-conditions
add stop conditions
2023-07-27 17:20:56 -07:00
Michael Yang
fadf75f99d add stop conditions 2023-07-27 17:00:47 -07:00
Patrick Devine
01d155c969 show system/template/license layers from cmd prompt (#223) 2023-07-27 16:58:40 -07:00
Michael Yang
5685c16d4e Merge pull request #211 from jmorganca/update-llama-cpp
update llama.cpp
2023-07-27 16:57:03 -07:00
Michael Yang
db77dfe01f Merge pull request #102 from jmorganca/session-id
Session
2023-07-27 16:46:29 -07:00
Michael Yang
ad3a7d0e2c add NumGQA 2023-07-27 14:05:11 -07:00
Michael Yang
18ffeeec45 update llama.cpp 2023-07-27 14:05:11 -07:00
Jeffrey Morgan
688661ab9b increase default batch size to 1024 2023-07-27 16:51:01 -04:00
Michael Chiang
36ad90e8e3 Merge pull request #231 from jmorganca/mchiang0610-discord
Update discord invite link
2023-07-27 15:43:52 -04:00
Michael Chiang
6fff59c637 Update discord invite link
Update discord invite link
2023-07-27 15:43:15 -04:00
Bruce MacDonald
fee7687cf3 Update modelfile.md 2023-07-27 15:15:10 -04:00
Bruce MacDonald
d3bfb4889c Update README.md 2023-07-27 15:13:50 -04:00
Bruce MacDonald
1ac38ec89c improve modelfile docs 2023-07-27 15:13:04 -04:00
Michael Yang
1ad8266473 Merge pull request #226 from jmorganca/fix-modelfile-quotes
refactor scan multiline for reuse
2023-07-27 11:45:41 -07:00
Michael Yang
f5ac8ddfb4 refactor scan multiline for reuse 2023-07-27 11:30:51 -07:00
Michael Yang
cca61181cb sample metrics 2023-07-27 09:31:44 -07:00
Michael Yang
c490416189 lock on llm.lock(); decrease batch size 2023-07-27 09:31:44 -07:00
Michael Yang
f62a882760 add session expiration 2023-07-27 09:31:44 -07:00
Michael Yang
3003fc03fc update predict code 2023-07-27 09:31:44 -07:00
Michael Yang
32aec66e6a add load duration 2023-07-27 09:31:44 -07:00
Michael Yang
35af37a2cb session id 2023-07-27 09:31:44 -07:00
Jeffrey Morgan
dbb3174cbc app: fix #218 and keep dock open on install 2023-07-27 10:53:38 -04:00
Jeffrey Morgan
31673d26d0 app: quit other instance when starting 2023-07-27 00:57:25 -04:00
Jeffrey Morgan
8ba0f328af clobber release artifacts 2023-07-26 18:58:28 -04:00
Jeffrey Morgan
d0e934b497 app: tray cleanup 2023-07-26 14:24:56 -04:00
Jeffrey Morgan
e751e47d70 app: remove dialog, icons for updates 2023-07-26 14:04:36 -04:00
Jeffrey Morgan
19d0f2b4cc publish as pre-release first 2023-07-26 10:48:49 -04:00
Jeffrey Morgan
c48f07f821 app: don't advance on error 2023-07-26 10:46:43 -04:00
Jeffrey Morgan
dc642aa07d web: skip pre-releases 2023-07-25 17:11:57 -04:00
Bruce MacDonald
f1ff892fdd pull model on make if not present locally 2023-07-25 16:53:01 -04:00
Jeffrey Morgan
3f2a100465 app: log app errors to console 2023-07-25 15:42:04 -04:00
Michael Yang
95397416f3 Merge pull request #212 from jmorganca/fix-multiline-parsing
fix multiline string
2023-07-25 11:53:51 -07:00
Michael Yang
8a86aae019 Merge pull request #209 from jmorganca/k-quants
enable k quants
2023-07-25 11:53:29 -07:00
Michael Yang
24c2c77057 fix multiline string
the scanned data should drop the multiline quotes but keep the command:

e.g.

TEMPLATE """
my template values
"""

should be

TEMPLATE
my template values

after scanning
2023-07-25 11:51:43 -07:00
Michael Yang
5614984f06 Merge pull request #189 from Mohit-Gaur/main
Improve command parsing and multiline string handling
2023-07-25 11:28:10 -07:00
Bruce MacDonald
4c1caa3733 download models when creating from modelfile 2023-07-25 14:25:13 -04:00
Bruce MacDonald
12ab8f8f5f Revert "pull model on make if not present locally"
This reverts commit 360a10ace391a674de60aa7b9b8cb65e8074027c.
2023-07-25 14:18:46 -04:00
Bruce MacDonald
8ebbd12f21 pull model on make if not present locally 2023-07-25 14:18:46 -04:00
Eva Ho
07971759fa fix typo 2023-07-25 13:30:52 -04:00
Mohit Gaur
f5f79049c2 Incorporate code review improvements 2023-07-25 22:52:23 +05:30
Michael Yang
726bc647b2 enable k quants 2023-07-25 08:39:58 -07:00
Bruce MacDonald
af9039a167 better error message when model not found on pull 2023-07-25 10:30:48 -04:00
Bruce MacDonald
07ed69bc37 remove redundant err var 2023-07-25 10:30:14 -04:00
Michael Yang
0deb3767fc Merge pull request #205 from jmorganca/accelerate
enable accelerate
2023-07-24 20:06:05 -07:00
Michael Yang
cb55fa9270 enable accelerate 2023-07-24 17:14:45 -07:00
Michael Yang
93bc9f17a1 Merge pull request #192 from jmorganca/update-development.md
update development.md
2023-07-24 16:13:22 -07:00
Bruce MacDonald
536028c35a better error message when model not found on pull 2023-07-24 17:48:17 -04:00
Michael Chiang
aedf3d1f38 Merge pull request #196 from isbkch/main
add devops-engineer example
2023-07-24 17:10:22 -04:00
iLyas Bakouch
91d927abc5 Update Modelfile 2023-07-24 16:43:11 -04:00
iLyas Bakouch
ba8df10a43 Update examples/devops-engineer/Modelfile
Co-authored-by: Jeffrey Morgan <251292+jmorganca@users.noreply.github.com>
2023-07-24 16:42:08 -04:00
Bruce MacDonald
abf614804b remove file on digest mismatch 2023-07-24 21:59:12 +02:00
Bruce MacDonald
a0dbbb23c4 truncate file size on resume 2023-07-24 21:58:32 +02:00
Bruce MacDonald
0fd6278446 do not panic server if file cannot be opened 2023-07-24 15:24:34 -04:00
Bruce MacDonald
29fe07f0cc make response errors unique for error trace 2023-07-24 21:21:18 +02:00
Bruce MacDonald
abfc73d31e make response errors unique for error trace 2023-07-24 15:04:21 -04:00
Bruce MacDonald
5a5ca8e7ff remove file on digest mismatch 2023-07-24 14:53:01 -04:00
Ilyas Bakouch
f24a6f5988 add devops-engineer example 2023-07-24 14:44:44 -04:00
Bruce MacDonald
fdbef6c95e truncate file size on resume 2023-07-24 14:36:19 -04:00
Michael Yang
24e43e3212 update development.md 2023-07-24 09:43:57 -07:00
Patrick Devine
4cb42ca55e add copy command (#191) 2023-07-24 11:27:28 -04:00
Michael Yang
ec5e22ac85 Merge pull request #174 from jmorganca/tokenize
allocate a large enough tokens slice
2023-07-24 08:22:51 -07:00
Mohit Gaur
ed89da92b4 Improve command parsing and multiline string handling 2023-07-24 18:11:13 +05:30
Jeffrey Morgan
a3297fed41 add /api/create docs to readme 2023-07-23 18:01:05 -04:00
Patrick Devine
88c55199f8 change push to chunked uploads from monolithic (#179) 2023-07-22 17:31:26 -07:00
hoyyeva
c448443813 Merge pull request #164 from jmorganca/restart-server
restart server more gracefully
2023-07-22 18:19:22 -04:00
Michael Yang
efacd45fc5 Merge pull request #175 from jk1jk/main
Update .gitignore
2023-07-22 09:40:37 -07:00
Michael Yang
fa522695c4 Merge pull request #178 from jmorganca/gin-cors
use gin-contrib/cors middleware
2023-07-22 09:40:01 -07:00
Michael Yang
8609db77ea use gin-contrib/cors middleware 2023-07-22 09:39:08 -07:00
Ikko Eltociear Ashimine
65d93a86b2 Update modelfile.md (#177)
fix markdown.
2023-07-22 08:19:30 -07:00
jk1jk
e6c427ce4d Update .gitignore 2023-07-22 17:00:52 +03:00
Michael Yang
b71c67b6ba allocate a large enough tokens slice 2023-07-21 23:05:15 -07:00
Patrick Devine
6d6b0d3321 change error handler behavior and fix error when a model isn't found (#173) 2023-07-21 23:02:12 -07:00
Michael Yang
37324a0a00 Merge pull request #172 from jmorganca/set-vars-first
fix vars.First
2023-07-21 20:55:06 -07:00
Michael Yang
20a5d99f77 fix vars.First 2023-07-21 20:45:32 -07:00
Patrick Devine
3b43cc019a fix extended tag names (#171) 2023-07-21 20:27:25 -07:00
Patrick Devine
b8421dce3d get the proper path for blobs to delete (#168) 2023-07-21 17:30:40 -07:00
Patrick Devine
9f6e97865c allow pushing/pulling to insecure registries (#157) 2023-07-21 15:42:19 -07:00
Eva Ho
9657314ae2 address comment 2023-07-21 17:29:07 -04:00
Eva Ho
3f7d2336c7 add prettier and address comments 2023-07-21 17:10:05 -04:00
Eva Ho
e0a73d7fbe address comment 2023-07-21 16:53:56 -04:00
hoyyeva
b08c4ca2bd Update app/src/index.ts
Co-authored-by: Jeffrey Morgan <251292+jmorganca@users.noreply.github.com>
2023-07-21 16:53:56 -04:00
Eva Ho
734892f1e2 address comment 2023-07-21 16:53:56 -04:00
Eva Ho
d2bfaeac63 format code 2023-07-21 16:53:56 -04:00
Eva Ho
0768b1b907 restart server with condition and timeout 2023-07-21 16:53:56 -04:00
Bruce MacDonald
f5f0da06d9 Merge pull request #166 from jmorganca/brucemacd/dev-cgo 2023-07-21 22:48:10 +02:00
Bruce MacDonald
52f04e39f2 Note that CGO must be enabled in dev docs 2023-07-21 22:36:36 +02:00
Jeffrey Morgan
3c8f4c03d7 web: tweak homepage text 2023-07-21 09:57:57 -07:00
Bruce MacDonald
7ba1308595 Merge pull request #147 from jmorganca/brucemacd/cli-err-display
Improve CLI error display
2023-07-21 16:10:19 +02:00
Jeffrey Morgan
91cd54016c add basic REST api documentation 2023-07-21 00:47:17 -07:00
Patrick Devine
e7a393de54 add rm command for models (#151) 2023-07-20 16:09:23 -07:00
Jeffrey Morgan
8454f298ac fix example Modelfiles 2023-07-20 15:46:32 -07:00
Patrick Devine
a3badaf103 add ls alias (#152) 2023-07-20 15:28:27 -07:00
Michael Yang
50e8e5bdbe Merge pull request #148 from jmorganca/more-llama-files
add llama.cpp mpi, opencl files
2023-07-20 14:26:46 -07:00
Michael Yang
8526e1f5f1 add llama.cpp mpi, opencl files 2023-07-20 14:19:55 -07:00
Michael Yang
0cfdbb95cc Merge pull request #146 from jmorganca/fix-windows-pull
windows: fix model pulling
2023-07-20 13:41:54 -07:00
Michael Yang
6cea2061ec windows: fix model pulling 2023-07-20 12:35:04 -07:00
Michael Yang
2832801c2a Merge pull request #91 from jmorganca/fix-stream-errors
fix stream errors
2023-07-20 12:21:59 -07:00
Jeffrey Morgan
23a37dc466 clean up README.md 2023-07-20 12:21:36 -07:00
Michael Yang
992892866b Merge pull request #145 from jmorganca/verify-digest
verify blob digest
2023-07-20 12:14:21 -07:00
Michael Yang
dde880290c Merge pull request #131 from jmorganca/update-llama-cpp
update llama.cpp to e782c9e735f93ab4767ffc37462c523b73a17ddc
2023-07-20 12:14:10 -07:00
Michael Yang
1f27d7f1b8 fix stream errors 2023-07-20 12:12:08 -07:00
Bruce MacDonald
00aaa05901 remove unused code 2023-07-20 20:57:30 +02:00
Michael Yang
a83eaa7a9f update llama.cpp to e782c9e735f93ab4767ffc37462c523b73a17ddc 2023-07-20 11:55:56 -07:00
Michael Yang
5156e48c2a add script to update llama.cpp 2023-07-20 11:54:59 -07:00
Michael Yang
bf198c3918 verify blob digest 2023-07-20 11:53:57 -07:00
Bruce MacDonald
09dc6273e3 suppress error when running list before pulling image 2023-07-20 20:53:09 +02:00
Bruce MacDonald
ebaa33ac28 display gin api errors in cli 2023-07-20 20:45:12 +02:00
Bruce MacDonald
3ec4ebc562 remove unused code 2023-07-20 20:18:00 +02:00
Jeffrey Morgan
6a19724d5f remove colon from library modelfiles 2023-07-20 09:51:30 -07:00
Jeffrey Morgan
924ce739f9 documentation on the model format 2023-07-20 09:03:41 -07:00
Michael Chiang
e1973e6780 Update icon (#139) 2023-07-20 08:55:20 -07:00
Jeffrey Morgan
f1b08ef40e set temperature on README.md example 2023-07-20 08:17:09 -07:00
Jeffrey Morgan
31f0cb7742 new Modelfile syntax 2023-07-20 07:52:24 -07:00
Jeffrey Morgan
e4b2ccfb23 web: clean up remaining models.json usage 2023-07-20 07:51:46 -07:00
Bruce MacDonald
a3d7bb0a30 Merge pull request #136 from jmorganca/brucemacd/remove-models
Delete models.json
2023-07-20 16:40:46 +02:00
Bruce MacDonald
77e49f3822 Delete models.json 2023-07-20 16:32:50 +02:00
Jeffrey Morgan
8945b25484 new modelfile syntax on branch 2023-07-20 02:24:21 -07:00
Jeffrey Morgan
99ccf0c5d3 fix broken link in README.md 2023-07-20 02:15:11 -07:00
Jeffrey Morgan
d59b164fa2 add prompt back to parser 2023-07-20 01:13:30 -07:00
Michael Yang
55b5f5dc34 ctrl+c on empty line exits (#135) 2023-07-20 00:53:08 -07:00
Jeffrey Morgan
3b135ac963 parser: fix case where multiline string termination error wouldn't show 2023-07-20 00:43:22 -07:00
Jeffrey Morgan
e6bae8d916 parser: keep seeking until eof 2023-07-20 00:37:52 -07:00
Jeffrey Morgan
d9f54300c3 library: add echo for verify progress 2023-07-19 23:58:28 -07:00
Jeffrey Morgan
1511219763 update library modelfiles with new syntax 2023-07-19 23:57:22 -07:00
Jeffrey Morgan
ada0add89b fix llama library templates 2023-07-19 23:53:40 -07:00
Jeffrey Morgan
75e508e1d6 remove old templates 2023-07-19 23:47:13 -07:00
Michael Yang
6f046dbf18 Update images.go (#134) 2023-07-19 23:46:01 -07:00
Jeffrey Morgan
cd820c8bca move wizard-vicuna to correct location 2023-07-19 23:44:03 -07:00
Jeffrey Morgan
88e755d7fd Add files for library models 2023-07-19 23:40:37 -07:00
Michael Yang
6984171cfd Merge pull request #93 from jmorganca/split-prompt
separate prompt into template and system
2023-07-19 23:25:33 -07:00
Michael Yang
60b4db6389 add .First 2023-07-19 23:24:32 -07:00
Michael Chiang
7c6ea2a966 fix dangling """ 2023-07-19 23:24:32 -07:00
Michael Chiang
c161aef5f9 update example 2023-07-19 23:24:32 -07:00
Michael Chiang
c47786c1b0 Update docs/modelfile.md
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-07-19 23:24:32 -07:00
Michael Chiang
df100ce540 Update docs/modelfile.md
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-07-19 23:24:32 -07:00
Michael Chiang
5c5948b4e7 clean up my previous empty sentences 2023-07-19 23:24:32 -07:00
Michael Yang
1c72e46e09 update modelfile.md 2023-07-19 23:24:32 -07:00
Michael Yang
ca210ba480 handle vnd.ollama.image.prompt for compat 2023-07-19 23:24:32 -07:00
Michael Yang
df146c41e2 separate prompt into template and system 2023-07-19 23:24:31 -07:00
Jeffrey Morgan
2d305fa99a allow relative paths in FROM instruction 2023-07-19 21:55:15 -07:00
Patrick Devine
e4d7f3e287 vendor in progress bar and change to bytes instead of bibytes (#130) 2023-07-19 17:24:03 -07:00
Jeffrey Morgan
f2044b5838 web: fix newsletter signup 2023-07-19 16:11:56 -07:00
Michael Chiang
d53988f619 Merge pull request #128 from jmorganca/mchiang0610-patch-1
Update modelfile.md
2023-07-19 13:40:39 -07:00
Michael Chiang
ac88ab48d9 update 2023-07-19 13:37:21 -07:00
Michael Yang
84c6ee8cc6 Merge pull request #104 from jmorganca/interactive-readline
use readline
2023-07-19 13:36:24 -07:00
Michael Yang
dbc90576b8 add verbose/quiet commands 2023-07-19 13:34:56 -07:00
Michael Yang
84200dcde6 use readline 2023-07-19 13:34:56 -07:00
Michael Chiang
e54c08da89 updating prompt 2023-07-19 13:34:40 -07:00
Michael Chiang
31413857ea organizing examples 2023-07-19 13:25:14 -07:00
Michael Chiang
25f874c030 Update modelfile.md 2023-07-19 12:48:57 -07:00
Jeffrey Morgan
10d502611f fix discord link in README.md 2023-07-19 12:31:48 -07:00
Jeffrey Morgan
7fe4103b94 add discord link, remove repeated text 2023-07-19 12:28:50 -07:00
Michael Chiang
7fbdc8e2c1 Update modelfile.md 2023-07-19 11:38:06 -07:00
Eva Ho
9c5572d51f add discord link back 2023-07-19 13:03:26 -04:00
Matt Williams
75eb28f574 Merge pull request #125 from jmorganca/matt/addlicensetomodelfiledoc
Updated modelfile doc to include license
2023-07-19 08:57:06 -07:00
Patrick Devine
56b6a1720f add llama2:13b model to the readme (#126) 2023-07-19 08:21:28 -07:00
Eva Ho
dfceca48a7 update icons to have different images for bright and dark mode 2023-07-19 11:14:43 -04:00
Matt Williams
bbb67002c3 get rid of latest
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-07-19 07:40:40 -07:00
Michael Chiang
0294216ea9 Merge pull request #124 from DavidZirinsky/patch-1
Update README.md
2023-07-19 07:40:24 -07:00
Matt Williams
7a62b2d2ab Update the FROM instructions
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-07-19 07:39:40 -07:00
Eva Ho
f08c050e57 fix page transitions flickering 2023-07-19 10:19:24 -04:00
Matt Williams
67c8d49757 Updated modelfile doc to include license
and attributed midjourneyprompt

Signed-off-by: Matt Williams <m@technovangelist.com>
2023-07-19 07:16:38 -07:00
DavidZirinsky
ffcd90e8a7 Update README.md
I needed to do this to run the project
2023-07-19 08:14:44 -06:00
Jeffrey Morgan
4ca7c4be1f dont consume reader when calculating digest 2023-07-19 00:47:55 -07:00
Michael Chiang
17b7af78f0 Merge pull request #115 from jmorganca/Add-wizard-vicuna-uncensored-model-link
Add wizard vicuna uncensored model link
2023-07-18 22:58:07 -07:00
Jeffrey Morgan
4c1dc52083 app: create /usr/local/bin/ if it does not exist 2023-07-18 22:50:52 -07:00
Patrick Devine
572fc9099f add license layers to the parser (#116) 2023-07-18 22:49:38 -07:00
Michael Chiang
3020f29041 Add wizard vicuna uncensored model link 2023-07-18 22:19:12 -07:00
Michael Yang
a6d03dd510 Merge pull request #110 from jmorganca/fix-pull-0-bytes
fix pull 0 bytes on completed layer
2023-07-18 19:38:59 -07:00
Michael Yang
68df36ae50 fix pull 0 bytes on completed layer 2023-07-18 19:38:11 -07:00
Michael Yang
5540305293 Merge pull request #112 from jmorganca/fix-relative-modelfile
resolve modelfile before passing to server
2023-07-18 19:36:24 -07:00
Michael Yang
d4cfee79d5 resolve modelfile before passing to server 2023-07-18 19:34:05 -07:00
Michael Yang
6e36f948df Merge pull request #109 from jmorganca/fix-create-memory
fix memory leak in create
2023-07-18 17:25:19 -07:00
Michael Yang
553fa39fe8 fix memory leak in create 2023-07-18 17:14:17 -07:00
Jeffrey Morgan
820e581ad8 web: fix typos and add link to discord 2023-07-18 17:03:40 -07:00
Isaac McFadyen
d14785738e README typo fix (#106)
* Fixed typo in README
2023-07-18 16:24:57 -07:00
Patrick Devine
9e15635c2d attempt two for skipping files in the file walk (#105) 2023-07-18 15:37:01 -07:00
Jeffrey Morgan
3e10f902f5 add mario example 2023-07-18 14:27:36 -07:00
Jeffrey Morgan
aa6714f25c fix typo in README.md 2023-07-18 14:03:11 -07:00
Jeffrey Morgan
7f3a37aed4 fix typo 2023-07-18 13:32:06 -07:00
Jeffrey Morgan
7b08280355 move download to the top of README.md 2023-07-18 13:31:25 -07:00
Jeffrey Morgan
e3cc4d5eac update README.md with new syntax 2023-07-18 13:22:46 -07:00
Jeffrey Morgan
8c85dfb735 Add README.md for examples 2023-07-18 13:22:46 -07:00
hoyyeva
ac62a413e5 Merge pull request #103 from jmorganca/web-update
website content and design update
2023-07-18 16:18:04 -04:00
Eva Ho
d1f89778e9 fix css on smaller screen 2023-07-18 16:17:42 -04:00
Eva Ho
df67a90e64 fix css 2023-07-18 16:02:45 -04:00
Eva Ho
576ae644de enable downloader 2023-07-18 15:57:39 -04:00
Eva Ho
7e52e51db1 update website text and design 2023-07-18 15:56:43 -04:00
Michael Chiang
f12df8d79a Merge pull request #101 from jmorganca/adding-logo
add logo
2023-07-18 12:47:20 -07:00
Michael Chiang
65de730bdb Update README.md
add logo
2023-07-18 12:45:38 -07:00
Patrick Devine
9658a5043b skip files in the list if we can't get the correct model path (#100) 2023-07-18 12:39:08 -07:00
Jeffrey Morgan
280fbe8019 app: use llama2 instead of orca 2023-07-18 12:36:03 -07:00
Jeffrey Morgan
2e339c2bab flatten examples 2023-07-18 12:33:50 -07:00
Michael Yang
38f0c54c64 Merge pull request #99 from jmorganca/mkdir-blobs
fix mkdir blob path
2023-07-18 11:29:05 -07:00
Michael Yang
f20426a768 fix mkdir blob path 2023-07-18 11:24:19 -07:00
Michael Yang
885f67a471 Merge pull request #92 from jmorganca/create-model-spinner
Create model spinner
2023-07-18 11:15:45 -07:00
Eva Ho
a9cc270b4d icon update 2023-07-18 13:33:26 -04:00
Eva Ho
aa281a30e5 updating icons 2023-07-18 13:33:26 -04:00
Matt Williams
760bc3366b Merge pull request #98 from jmorganca/matt/modelfiledoc
First stab at a modelfile doc
2023-07-18 09:16:01 -07:00
Patrick Devine
5bea29f610 add new list command (#97) 2023-07-18 09:09:45 -07:00
Matt Williams
9310ee3967 First stab at a modelfile doc
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-07-18 08:22:17 -07:00
Matt Williams
da7ddbb4dc Merge pull request #95 from jmorganca/matt/examplemodelfiles 2023-07-18 05:32:38 -07:00
Patrick Devine
4a28a2f093 add modelpaths (#96) 2023-07-17 22:44:21 -07:00
Matt Williams
3d9498dc95 Some simple modelfile examples
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-07-17 17:16:59 -07:00
Jeffrey Morgan
1f45f7bb52 convert commands to uppercase in parser 2023-07-17 15:34:08 -07:00
Michael Yang
2e6c64a8f9 Merge pull request #88 from jmorganca/modelfile-params
modelfile params
2023-07-17 14:18:56 -07:00
Michael Yang
c7dd52271c remove debugging messages 2023-07-17 14:17:34 -07:00
Michael Yang
e4300e1eb7 add spinner to create 2023-07-17 14:15:42 -07:00
Michael Yang
aba706ea2d remove unused persistent pre run 2023-07-17 14:14:57 -07:00
Michael Yang
53d0052c6c unavoid unnecessary type conversion 2023-07-17 12:35:03 -07:00
Michael Yang
28a136e9a3 modelfile params 2023-07-17 12:35:03 -07:00
Jeffrey Morgan
529ff9ab6d Add note to README.md about Apple Silicon support 2023-07-17 11:22:34 -07:00
Michael Yang
41aca47d43 Merge pull request #87 from jmorganca/windows
fix file paths for windows
2023-07-17 11:21:25 -07:00
Michael Yang
3862a51a6a create directories if they do not exist 2023-07-17 11:18:48 -07:00
Michael Yang
bcb612a30a fix file paths for windows 2023-07-17 10:47:47 -07:00
hoyyeva
c05219aa0d Merge pull request #86 from jmorganca/welcome-screen-improve
welcome screen improvements
2023-07-17 13:44:53 -04:00
Eva Ho
508ffbbb15 improve the copy command experience 2023-07-17 13:17:52 -04:00
Jeffrey Morgan
59fa93cdd4 app: simpler winston settings 2023-07-16 20:26:12 -07:00
Jeffrey Morgan
952abe029b app: remove unused import 2023-07-16 20:25:50 -07:00
Jeffrey Morgan
f923855906 app: keep installer in foreground 2023-07-16 20:25:11 -07:00
Jeffrey Morgan
9386073e96 app: dont listen for disconnect events 2023-07-16 19:21:50 -07:00
Jeffrey Morgan
52ea4d4bb2 app: use app.on('before-quit') to detect app closing 2023-07-16 19:18:12 -07:00
Jeffrey Morgan
c4ba192187 app: use enum for steps 2023-07-16 18:47:23 -07:00
Jeffrey Morgan
fe758ca319 app: do not restart the server if app is closing 2023-07-16 18:41:43 -07:00
Jeffrey Morgan
08b933cc10 app: use async and `await` instead of callbacks 2023-07-16 18:38:37 -07:00
Jeffrey Morgan
6746a00af8 app: format app.tsx 2023-07-16 18:29:11 -07:00
Patrick Devine
2fb52261ad basic distribution w/ push/pull (#78)
* basic distribution w/ push/pull

* add the parser

* add create, pull, and push

* changes to the parser, FROM line, and fix commands

* mkdirp new manifest directories

* make `blobs` directory if it does not exist

* fix go warnings

* add progressbar for model pulls

* move model struct

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-07-16 17:02:22 -07:00
Jeffrey Morgan
6fdea03049 docs: remove python.md 2023-07-14 21:41:46 -07:00
Michael Yang
38021ba494 Merge pull request #83 from jmorganca/multibyte-responses
fix multibyte responses
2023-07-14 20:12:12 -07:00
Michael Yang
6c9fa573ae Merge pull request #82 from jmorganca/filepath
windows build
2023-07-14 20:11:55 -07:00
Michael Yang
40c9dc0a31 fix multibyte responses 2023-07-14 20:11:44 -07:00
Michael Yang
0142660bd4 size_t 2023-07-14 17:29:16 -07:00
Michael Yang
743e957d88 use filepath for os compat 2023-07-14 17:27:14 -07:00
Jeffrey Morgan
560f36e6c8 app: set first-time-run to true instead of false 2023-07-14 16:50:12 -07:00
hoyyeva
e88dd25bab ollama app welcome screen for first time run (#80) 2023-07-14 16:34:24 -07:00
Michael Yang
567e74e7d7 Merge pull request #81 from jmorganca/fix-race-2
fix race
2023-07-14 15:12:01 -07:00
Michael Yang
5ade3db040 fix race
block on write which only returns when the channel is closed. this is
contrary to the previous arrangement where the handler may return but
the stream hasn't finished writing. it can lead to the client receiving
unexpected responses (since the request has been handled) or worst case
a nil-pointer dereference as the stream tries to flush a nil writer
2023-07-14 15:10:46 -07:00
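The pattern this commit describes generalizes well. Below is a minimal sketch (illustrative only, not the project's actual handler code) of a streaming HTTP handler that blocks on its results channel and returns only after the producing goroutine closes it, so the stream is fully written before the request is considered handled:

```
package main

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

// generate simulates a token producer. Closing the channel signals completion.
func generate(ch chan<- string) {
	defer close(ch)
	for i := 0; i < 3; i++ {
		ch <- fmt.Sprintf("token %d\n", i)
		time.Sleep(100 * time.Millisecond)
	}
}

func handler(w http.ResponseWriter, r *http.Request) {
	ch := make(chan string)
	go generate(ch)

	flusher, _ := w.(http.Flusher)
	// Block on reads until the channel is closed, so the handler only returns
	// once the stream has finished writing; this avoids flushing a writer that
	// is no longer valid after the request has been handled.
	for tok := range ch {
		fmt.Fprint(w, tok)
		if flusher != nil {
			flusher.Flush()
		}
	}
}

func main() {
	http.HandleFunc("/api/generate", handler)
	log.Fatal(http.ListenAndServe("127.0.0.1:8080", nil))
}
```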
Michael Yang
965f9ad033 Merge pull request #77 from jmorganca/mem
continue conversation
2023-07-14 14:57:42 -07:00
Michael Yang
5d1c6b7499 Merge pull request #79 from jmorganca/fix-typo
fix typo
2023-07-14 10:50:44 -07:00
Michael Yang
5fefaa5d4d fix typo 2023-07-14 10:47:18 -07:00
Michael Yang
1775647f76 continue conversation
feed responses back into the llm
2023-07-13 17:13:00 -07:00
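The mechanism is visible in the types later in this compare: GenerateResponse.Context returns the accumulated conversation tokens, which can be passed back via GenerateRequest.Context. A minimal sketch, assuming the api client shown further down, a locally running server, and that the Generate callback has the same shape as the ChatResponseFunc in the diff:

```
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/jmorganca/ollama/api"
)

func main() {
	client, err := api.ClientFromEnvironment()
	if err != nil {
		log.Fatal(err)
	}

	var convo []int // context tokens carried over from the previous response
	for _, prompt := range []string{"My name is Ada.", "What is my name?"} {
		req := api.GenerateRequest{Model: "llama2", Prompt: prompt, Context: convo}
		err := client.Generate(context.Background(), &req, func(r api.GenerateResponse) error {
			fmt.Print(r.Response)
			if r.Done {
				convo = r.Context // feed the response back into the next request
			}
			return nil
		})
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println()
	}
}
```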
Michael Yang
77dc1a6d74 Merge pull request #74 from jmorganca/timings
Timings
2023-07-13 10:17:13 -07:00
Michael Yang
05e08d2310 return more info in generate response 2023-07-13 09:37:32 -07:00
Michael Yang
31590284a7 fix route 2023-07-12 19:21:49 -07:00
Michael Yang
f2863cc7f8 Merge pull request #76 from jmorganca/fix-pull
fix pull race
2023-07-12 19:21:13 -07:00
Jeffrey Morgan
4dd296e155 build app in publish script 2023-07-12 19:16:39 -07:00
Jeffrey Morgan
304f419429 update README.md API reference 2023-07-12 19:16:28 -07:00
Michael Yang
2666d3c206 fix pull race 2023-07-12 19:07:23 -07:00
266 changed files with 27820 additions and 42066 deletions

View File

@@ -1,7 +1,8 @@
build
llama/build
.venv
.vscode
ollama
app
web
dist
llm/llama.cpp
.env
.cache
test_data

106
.github/workflows/test.yaml vendored Normal file
View File

@@ -0,0 +1,106 @@
name: test
on:
pull_request:
jobs:
generate:
strategy:
matrix:
os: [ubuntu-latest, macos-latest, windows-latest]
arch: [amd64, arm64]
exclude:
- os: ubuntu-latest
arch: arm64
- os: windows-latest
arch: arm64
runs-on: ${{ matrix.os }}
env:
GOARCH: ${{ matrix.arch }}
steps:
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
with:
go-version: '1.21'
cache: true
- if: ${{ startsWith(matrix.os, 'windows-') }}
shell: pwsh
run: |
$path = vswhere -latest -products * -requires Microsoft.VisualStudio.Component.VC.Tools.x86.x64 -property installationPath
if ($path) {
$path = join-path $path 'Common7\Tools\vsdevcmd.bat'
if (test-path $path) {
cmd /s /c """$path"" $args && set" | where { $_ -match '(\w+)=(.*)' } | foreach {
echo "$($Matches[1])=$($Matches[2])" | Out-File -FilePath $Env:GITHUB_ENV -Encoding utf8 -Append
}
}
}
echo "C:\Program Files\Git\usr\bin" | Out-File -FilePath $Env:GITHUB_PATH -Encoding utf8 -Append
- run: go get ./...
- run: go generate -x ./...
- uses: actions/upload-artifact@v4
with:
name: ${{ matrix.os }}-${{ matrix.arch }}-libraries
path: |
llm/llama.cpp/build/**/lib/*
lint:
needs: generate
strategy:
matrix:
os: [ubuntu-latest, macos-latest, windows-latest]
arch: [amd64, arm64]
exclude:
- os: ubuntu-latest
arch: arm64
- os: windows-latest
arch: arm64
- os: macos-latest
arch: amd64
runs-on: ${{ matrix.os }}
env:
GOARCH: ${{ matrix.arch }}
CGO_ENABLED: "1"
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- uses: actions/setup-go@v5
with:
go-version: '1.21'
cache: false
- uses: actions/download-artifact@v4
with:
name: ${{ matrix.os }}-${{ matrix.arch }}-libraries
path: llm/llama.cpp/build
- uses: golangci/golangci-lint-action@v3
test:
needs: generate
strategy:
matrix:
os: [ubuntu-latest, macos-latest, windows-latest]
arch: [amd64]
exclude:
- os: ubuntu-latest
arch: arm64
- os: windows-latest
arch: arm64
runs-on: ${{ matrix.os }}
env:
GOARCH: ${{ matrix.arch }}
CGO_ENABLED: "1"
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- uses: actions/setup-go@v5
with:
go-version: '1.21'
cache: true
- run: go get
- uses: actions/download-artifact@v4
with:
name: ${{ matrix.os }}-${{ matrix.arch }}-libraries
path: llm/llama.cpp/build
- run: go build
- run: go test -v ./...

6
.gitignore vendored
View File

@@ -2,5 +2,11 @@
.vscode
.env
.venv
.swp
dist
ollama
ggml-metal.metal
.cache
*.exe
.idea
test_data

4
.gitmodules vendored Normal file
View File

@@ -0,0 +1,4 @@
[submodule "llama.cpp"]
path = llm/llama.cpp
url = https://github.com/ggerganov/llama.cpp.git
shallow = true

27
.golangci.yaml Normal file
View File

@@ -0,0 +1,27 @@
run:
timeout: 5m
linters:
enable:
- asasalint
- bidichk
- bodyclose
- containedctx
- contextcheck
- exportloopref
- gocheckcompilerdirectives
# FIXME: for some reason this errors on windows
# - gofmt
# - goimports
- misspell
- nilerr
- unused
linters-settings:
errcheck:
# exclude the following functions since we don't generally
# need to be concerned with the returned errors
exclude-functions:
- encoding/binary.Read
- (*os.File).Seek
- (*bufio.Writer).WriteString
- (*github.com/spf13/pflag.FlagSet).Set
- (*github.com/jmorganca/ollama/llm.readSeekOffset).Seek

View File

@@ -1,15 +1,29 @@
FROM golang:1.20
WORKDIR /go/src/github.com/jmorganca/ollama
COPY . .
RUN CGO_ENABLED=1 go build -ldflags '-linkmode external -extldflags "-static"' .
FROM nvidia/cuda:11.8.0-devel-ubuntu22.04
FROM alpine
ARG TARGETARCH
ARG GOFLAGS="'-ldflags=-w -s'"
WORKDIR /go/src/github.com/jmorganca/ollama
RUN apt-get update && apt-get install -y git build-essential cmake
ADD https://dl.google.com/go/go1.21.3.linux-$TARGETARCH.tar.gz /tmp/go1.21.3.tar.gz
RUN mkdir -p /usr/local && tar xz -C /usr/local </tmp/go1.21.3.tar.gz
COPY . .
ENV GOARCH=$TARGETARCH
ENV GOFLAGS=$GOFLAGS
RUN /usr/local/go/bin/go generate ./... \
&& /usr/local/go/bin/go build .
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y ca-certificates
COPY --from=0 /go/src/github.com/jmorganca/ollama/ollama /bin/ollama
EXPOSE 11434
ARG USER=ollama
ARG GROUP=ollama
RUN addgroup -g 1000 $GROUP && adduser -u 1000 -DG $GROUP $USER
USER $USER:$GROUP
ENTRYPOINT ["/bin/ollama"]
ENV OLLAMA_HOST 0.0.0.0
# set some environment variables for better NVIDIA compatibility
ENV PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
ENV LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
ENTRYPOINT ["/bin/ollama"]
CMD ["serve"]

99
Dockerfile.build Normal file
View File

@@ -0,0 +1,99 @@
ARG GOLANG_VERSION=1.21.3
ARG CMAKE_VERSION=3.22.1
ARG CUDA_VERSION=11.3.1
# Copy the minimal context we need to run the generate scripts
FROM scratch AS llm-code
COPY .git .git
COPY .gitmodules .gitmodules
COPY llm llm
FROM --platform=linux/amd64 nvidia/cuda:$CUDA_VERSION-devel-centos7 AS cuda-build-amd64
ARG CMAKE_VERSION
ARG CGO_CFLAGS
COPY ./scripts/rh_linux_deps.sh /
RUN CMAKE_VERSION=${CMAKE_VERSION} sh /rh_linux_deps.sh
ENV PATH /opt/rh/devtoolset-10/root/usr/bin:$PATH
COPY --from=llm-code / /go/src/github.com/jmorganca/ollama/
WORKDIR /go/src/github.com/jmorganca/ollama/llm/generate
RUN OLLAMA_SKIP_CPU_GENERATE=1 sh gen_linux.sh
FROM --platform=linux/arm64 nvidia/cuda:$CUDA_VERSION-devel-rockylinux8 AS cuda-build-arm64
ARG CMAKE_VERSION
ARG CGO_CFLAGS
COPY ./scripts/rh_linux_deps.sh /
RUN CMAKE_VERSION=${CMAKE_VERSION} sh /rh_linux_deps.sh
ENV PATH /opt/rh/gcc-toolset-10/root/usr/bin:$PATH
COPY --from=llm-code / /go/src/github.com/jmorganca/ollama/
WORKDIR /go/src/github.com/jmorganca/ollama/llm/generate
RUN OLLAMA_SKIP_CPU_GENERATE=1 sh gen_linux.sh
FROM --platform=linux/amd64 rocm/dev-centos-7:5.7.1-complete AS rocm-5-build-amd64
ARG CMAKE_VERSION
ARG CGO_CFLAGS
COPY ./scripts/rh_linux_deps.sh /
RUN CMAKE_VERSION=${CMAKE_VERSION} sh /rh_linux_deps.sh
ENV PATH /opt/rh/devtoolset-10/root/usr/bin:$PATH
ENV LIBRARY_PATH /opt/amdgpu/lib64
COPY --from=llm-code / /go/src/github.com/jmorganca/ollama/
WORKDIR /go/src/github.com/jmorganca/ollama/llm/generate
RUN OLLAMA_SKIP_CPU_GENERATE=1 sh gen_linux.sh
FROM --platform=linux/amd64 rocm/dev-centos-7:6.0-complete AS rocm-6-build-amd64
ARG CMAKE_VERSION
ARG CGO_CFLAGS
COPY ./scripts/rh_linux_deps.sh /
RUN CMAKE_VERSION=${CMAKE_VERSION} sh /rh_linux_deps.sh
ENV PATH /opt/rh/devtoolset-10/root/usr/bin:$PATH
ENV LIBRARY_PATH /opt/amdgpu/lib64
COPY --from=llm-code / /go/src/github.com/jmorganca/ollama/
WORKDIR /go/src/github.com/jmorganca/ollama/llm/generate
RUN OLLAMA_SKIP_CPU_GENERATE=1 sh gen_linux.sh
FROM --platform=linux/amd64 centos:7 AS cpu-build-amd64
ARG CMAKE_VERSION
ARG GOLANG_VERSION
ARG OLLAMA_CUSTOM_CPU_DEFS
ARG CGO_CFLAGS
COPY ./scripts/rh_linux_deps.sh /
RUN CMAKE_VERSION=${CMAKE_VERSION} GOLANG_VERSION=${GOLANG_VERSION} sh /rh_linux_deps.sh
ENV PATH /opt/rh/devtoolset-10/root/usr/bin:$PATH
COPY --from=llm-code / /go/src/github.com/jmorganca/ollama/
WORKDIR /go/src/github.com/jmorganca/ollama/llm/generate
RUN sh gen_linux.sh
FROM --platform=linux/arm64 centos:7 AS cpu-build-arm64
ARG CMAKE_VERSION
ARG GOLANG_VERSION
ARG OLLAMA_CUSTOM_CPU_DEFS
ARG CGO_CFLAGS
COPY ./scripts/rh_linux_deps.sh /
RUN CMAKE_VERSION=${CMAKE_VERSION} GOLANG_VERSION=${GOLANG_VERSION} sh /rh_linux_deps.sh
ENV PATH /opt/rh/devtoolset-10/root/usr/bin:$PATH
COPY --from=llm-code / /go/src/github.com/jmorganca/ollama/
WORKDIR /go/src/github.com/jmorganca/ollama/llm/generate
RUN sh gen_linux.sh
FROM --platform=linux/amd64 cpu-build-amd64 AS build-amd64
ENV CGO_ENABLED 1
ARG GOFLAGS
ARG CGO_CFLAGS
WORKDIR /go/src/github.com/jmorganca/ollama
COPY . .
COPY --from=cuda-build-amd64 /go/src/github.com/jmorganca/ollama/llm/llama.cpp/build/linux/ llm/llama.cpp/build/linux/
COPY --from=rocm-5-build-amd64 /go/src/github.com/jmorganca/ollama/llm/llama.cpp/build/linux/ llm/llama.cpp/build/linux/
COPY --from=rocm-6-build-amd64 /go/src/github.com/jmorganca/ollama/llm/llama.cpp/build/linux/ llm/llama.cpp/build/linux/
RUN go build .
FROM --platform=linux/arm64 cpu-build-arm64 AS build-arm64
ENV CGO_ENABLED 1
ARG GOLANG_VERSION
ARG GOFLAGS
ARG CGO_CFLAGS
WORKDIR /go/src/github.com/jmorganca/ollama
COPY . .
COPY --from=cuda-build-arm64 /go/src/github.com/jmorganca/ollama/llm/llama.cpp/build/linux/ llm/llama.cpp/build/linux/
RUN go build .
FROM build-$TARGETARCH

314
README.md
View File

@@ -1,108 +1,330 @@
![ollama](https://github.com/jmorganca/ollama/assets/251292/961f99bb-251a-4eec-897d-1ba99997ad0f)
<div align="center">
<picture>
<source media="(prefers-color-scheme: dark)" height="200px" srcset="https://github.com/jmorganca/ollama/assets/3325447/56ea1849-1284-4645-8970-956de6e51c3c">
<img alt="logo" height="200px" src="https://github.com/jmorganca/ollama/assets/3325447/0d0b44e2-8f4a-4e99-9b52-a5c1c741c8f7">
</picture>
</div>
# Ollama
Run large language models with `llama.cpp`.
[![Discord](https://dcbadge.vercel.app/api/server/ollama?style=flat&compact=true)](https://discord.gg/ollama)
> Note: certain models that can be run with Ollama are intended for research and/or non-commercial use only.
Get up and running with large language models locally.
### Features
### macOS
- Download and run popular large language models
- Switch between multiple models on the fly
- Hardware acceleration where available (Metal, CUDA)
- Fast inference server written in Go, powered by [llama.cpp](https://github.com/ggerganov/llama.cpp)
- REST API to use with your application (python, typescript SDKs coming soon)
[Download](https://ollama.ai/download/Ollama-darwin.zip)
## Install
### Windows
- [Download](https://ollama.ai/download) for macOS
- Download for Windows (coming soon)
Coming soon! For now, you can install Ollama on Windows via WSL2.
You can also build the [binary from source](#building).
### Linux & WSL2
```
curl https://ollama.ai/install.sh | sh
```
[Manual install instructions](https://github.com/jmorganca/ollama/blob/main/docs/linux.md)
### Docker
The official [Ollama Docker image](https://hub.docker.com/r/ollama/ollama) `ollama/ollama` is available on Docker Hub.
## Quickstart
Run a fast and simple model.
To run and chat with [Llama 2](https://ollama.ai/library/llama2):
```
ollama run orca
ollama run llama2
```
## Example models
## Model library
### 💬 Chat
Ollama supports a list of open-source models available on [ollama.ai/library](https://ollama.ai/library 'ollama model library')
Have a conversation.
Here are some example open-source models that can be downloaded:
| Model | Parameters | Size | Download |
| ------------------ | ---------- | ----- | ------------------------------ |
| Llama 2 | 7B | 3.8GB | `ollama run llama2` |
| Mistral | 7B | 4.1GB | `ollama run mistral` |
| Dolphin Phi | 2.7B | 1.6GB | `ollama run dolphin-phi` |
| Phi-2 | 2.7B | 1.7GB | `ollama run phi` |
| Neural Chat | 7B | 4.1GB | `ollama run neural-chat` |
| Starling | 7B | 4.1GB | `ollama run starling-lm` |
| Code Llama | 7B | 3.8GB | `ollama run codellama` |
| Llama 2 Uncensored | 7B | 3.8GB | `ollama run llama2-uncensored` |
| Llama 2 13B | 13B | 7.3GB | `ollama run llama2:13b` |
| Llama 2 70B | 70B | 39GB | `ollama run llama2:70b` |
| Orca Mini | 3B | 1.9GB | `ollama run orca-mini` |
| Vicuna | 7B | 3.8GB | `ollama run vicuna` |
| LLaVA | 7B | 4.5GB | `ollama run llava` |
> Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
## Customize a model
### Import from GGUF
Ollama supports importing GGUF models in the Modelfile:
1. Create a file named `Modelfile`, with a `FROM` instruction with the local filepath to the model you want to import.
```
FROM ./vicuna-33b.Q4_0.gguf
```
2. Create the model in Ollama
```
ollama create example -f Modelfile
```
3. Run the model
```
ollama run example
```
### Import from PyTorch or Safetensors
See the [guide](docs/import.md) on importing models for more information.
### Customize a prompt
Models from the Ollama library can be customized with a prompt. For example, to customize the `llama2` model:
```
ollama run vicuna "Why is the sky blue?"
ollama pull llama2
```
### 🗺️ Instructions
Get a helping hand.
Create a `Modelfile`:
```
ollama run orca "Write an email to my boss."
FROM llama2
# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# set the system message
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""
```
### 🔎 Ask questions about documents
Send the contents of a document and ask questions about it.
Next, create and run the model:
```
ollama run nous-hermes "$(cat input.txt)", please summarize this story
ollama create mario -f ./Modelfile
ollama run mario
>>> hi
Hello! It's your friend Mario.
```
### 📖 Storytelling
For more examples, see the [examples](examples) directory. For more information on working with a Modelfile, see the [Modelfile](docs/modelfile.md) documentation.
Venture into the unknown.
## CLI Reference
### Create a model
`ollama create` is used to create a model from a Modelfile.
```
ollama run nous-hermes "Once upon a time"
ollama create mymodel -f ./Modelfile
```
## Advanced usage
### Run a local model
### Pull a model
```
ollama run ~/Downloads/vicuna-7b-v1.3.ggmlv3.q4_1.bin
ollama pull llama2
```
> This command can also be used to update a local model. Only the diff will be pulled.
### Remove a model
```
ollama rm llama2
```
### Copy a model
```
ollama cp llama2 my-llama2
```
### Multiline input
For multiline input, you can wrap text with `"""`:
```
>>> """Hello,
... world!
... """
I'm a basic program that prints the famous "Hello, world!" message to the console.
```
### Multimodal models
```
>>> What's in this image? /Users/jmorgan/Desktop/smile.png
The image features a yellow smiley face, which is likely the central focus of the picture.
```
### Pass in prompt as arguments
```
$ ollama run llama2 "Summarize this file: $(cat README.md)"
Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.
```
### List models on your computer
```
ollama list
```
### Start Ollama
`ollama serve` is used when you want to start ollama without running the desktop application.
## Building
Install `cmake` and `go`:
```
brew install cmake go
```
Then generate dependencies:
```
go generate ./...
```
Then build the binary:
```
go build .
```
To run it start the server:
More detailed instructions can be found in the [developer guide](https://github.com/jmorganca/ollama/blob/main/docs/development.md)
### Running local builds
Next, start the server:
```
./ollama server &
./ollama serve
```
Finally, run a model!
Finally, in a separate shell, run a model:
```
./ollama run ~/Downloads/vicuna-7b-v1.3.ggmlv3.q4_1.bin
./ollama run llama2
```
## API Reference
## REST API
### `POST /api/pull`
Ollama has a REST API for running and managing models.
Download a model
### Generate a response
```
curl -X POST http://localhost:11343/api/pull -d '{"model": "orca"}'
curl http://localhost:11434/api/generate -d '{
"model": "llama2",
"prompt":"Why is the sky blue?"
}'
```
### `POST /api/generate`
Complete a prompt
### Chat with a model
```
curl -X POST http://localhost:11434/api/generate -d '{"model": "orca", "prompt": "hello!", "stream": true}'
curl http://localhost:11434/api/chat -d '{
"model": "mistral",
"messages": [
{ "role": "user", "content": "why is the sky blue?" }
]
}'
```
See the [API documentation](./docs/api.md) for all endpoints.
## Integrations
- [ollama-python](https://github.com/jmorganca/ollama-python)
## Community Integrations
### Web & Desktop
- [Bionic GPT](https://github.com/bionic-gpt/bionic-gpt)
- [HTML UI](https://github.com/rtcfirefly/ollama-ui)
- [Chatbot UI](https://github.com/ivanfioravanti/chatbot-ollama)
- [Typescript UI](https://github.com/ollama-interface/Ollama-Gui?tab=readme-ov-file)
- [Minimalistic React UI for Ollama Models](https://github.com/richawo/minimal-llm-ui)
- [Web UI](https://github.com/ollama-webui/ollama-webui)
- [Ollamac](https://github.com/kevinhermawan/Ollamac)
- [big-AGI](https://github.com/enricoros/big-agi/blob/main/docs/config-ollama.md)
- [Cheshire Cat assistant framework](https://github.com/cheshire-cat-ai/core)
- [Amica](https://github.com/semperai/amica)
- [chatd](https://github.com/BruceMacD/chatd)
- [Ollama-SwiftUI](https://github.com/kghandour/Ollama-SwiftUI)
### Terminal
- [oterm](https://github.com/ggozad/oterm)
- [Ellama Emacs client](https://github.com/s-kostyaev/ellama)
- [Emacs client](https://github.com/zweifisch/ollama)
- [gen.nvim](https://github.com/David-Kunz/gen.nvim)
- [ollama.nvim](https://github.com/nomnivore/ollama.nvim)
- [ogpt.nvim](https://github.com/huynle/ogpt.nvim)
- [gptel Emacs client](https://github.com/karthink/gptel)
- [Oatmeal](https://github.com/dustinblackman/oatmeal)
- [cmdh](https://github.com/pgibler/cmdh)
### Database
- [MindsDB](https://github.com/mindsdb/mindsdb/blob/staging/mindsdb/integrations/handlers/ollama_handler/README.md)
### Package managers
- [Pacman](https://archlinux.org/packages/extra/x86_64/ollama/)
### Libraries
- [LangChain](https://python.langchain.com/docs/integrations/llms/ollama) and [LangChain.js](https://js.langchain.com/docs/modules/model_io/models/llms/integrations/ollama) with [example](https://js.langchain.com/docs/use_cases/question_answering/local_retrieval_qa)
- [LangChainGo](https://github.com/tmc/langchaingo/) with [example](https://github.com/tmc/langchaingo/tree/main/examples/ollama-completion-example)
- [LlamaIndex](https://gpt-index.readthedocs.io/en/stable/examples/llm/ollama.html)
- [LiteLLM](https://github.com/BerriAI/litellm)
- [OllamaSharp for .NET](https://github.com/awaescher/OllamaSharp)
- [Ollama for Ruby](https://github.com/gbaptista/ollama-ai)
- [Ollama-rs for Rust](https://github.com/pepperoni21/ollama-rs)
- [Ollama4j for Java](https://github.com/amithkoujalgi/ollama4j)
- [ModelFusion Typescript Library](https://modelfusion.dev/integration/model-provider/ollama)
- [OllamaKit for Swift](https://github.com/kevinhermawan/OllamaKit)
- [Ollama for Dart](https://github.com/breitburg/dart-ollama)
- [Ollama for Laravel](https://github.com/cloudstudio/ollama-laravel)
- [LangChainDart](https://github.com/davidmigloz/langchain_dart)
- [Semantic Kernel - Python](https://github.com/microsoft/semantic-kernel/tree/main/python/semantic_kernel/connectors/ai/ollama)
- [Haystack](https://github.com/deepset-ai/haystack-integrations/blob/main/integrations/ollama.md)
### Mobile
- [Enchanted](https://github.com/AugustDev/enchanted)
- [Maid](https://github.com/Mobile-Artificial-Intelligence/maid)
### Extensions & Plugins
- [Raycast extension](https://github.com/MassimilianoPasquini97/raycast_ollama)
- [Discollama](https://github.com/mxyng/discollama) (Discord bot inside the Ollama discord channel)
- [Continue](https://github.com/continuedev/continue)
- [Obsidian Ollama plugin](https://github.com/hinterdupfinger/obsidian-ollama)
- [Logseq Ollama plugin](https://github.com/omagdy7/ollama-logseq)
- [Dagger Chatbot](https://github.com/samalba/dagger-chatbot)
- [Discord AI Bot](https://github.com/mekb-turtle/discord-ai-bot)
- [Ollama Telegram Bot](https://github.com/ruecat/ollama-telegram)
- [Hass Ollama Conversation](https://github.com/ej52/hass-ollama-conversation)
- [Rivet plugin](https://github.com/abrenneke/rivet-plugin-ollama)
- [Llama Coder](https://github.com/ex3ndr/llama-coder) (Copilot alternative using Ollama)
- [Obsidian BMO Chatbot plugin](https://github.com/longy2k/obsidian-bmo-chatbot)
- [Open Interpreter](https://docs.openinterpreter.com/language-model-setup/local-models/ollama)

View File

@@ -5,40 +5,148 @@ import (
"bytes"
"context"
"encoding/json"
"errors"
"fmt"
"io"
"net"
"net/http"
"net/url"
"os"
"runtime"
"strings"
"github.com/jmorganca/ollama/format"
"github.com/jmorganca/ollama/version"
)
type StatusError struct {
StatusCode int
Status string
Message string
}
func (e StatusError) Error() string {
if e.Message != "" {
return fmt.Sprintf("%s: %s", e.Status, e.Message)
}
return e.Status
}
type Client struct {
base url.URL
base *url.URL
http http.Client
}
func NewClient(hosts ...string) *Client {
host := "127.0.0.1:11434"
if len(hosts) > 0 {
host = hosts[0]
func checkError(resp *http.Response, body []byte) error {
if resp.StatusCode < http.StatusBadRequest {
return nil
}
return &Client{
base: url.URL{Scheme: "http", Host: host},
apiError := StatusError{StatusCode: resp.StatusCode}
err := json.Unmarshal(body, &apiError)
if err != nil {
// Use the full body as the message if we fail to decode a response.
apiError.ErrorMessage = string(body)
}
return apiError
}
func ClientFromEnvironment() (*Client, error) {
defaultPort := "11434"
scheme, hostport, ok := strings.Cut(os.Getenv("OLLAMA_HOST"), "://")
switch {
case !ok:
scheme, hostport = "http", os.Getenv("OLLAMA_HOST")
case scheme == "http":
defaultPort = "80"
case scheme == "https":
defaultPort = "443"
}
// trim trailing slashes
hostport = strings.TrimRight(hostport, "/")
host, port, err := net.SplitHostPort(hostport)
if err != nil {
host, port = "127.0.0.1", defaultPort
if ip := net.ParseIP(strings.Trim(hostport, "[]")); ip != nil {
host = ip.String()
} else if hostport != "" {
host = hostport
}
}
client := Client{
base: &url.URL{
Scheme: scheme,
Host: net.JoinHostPort(host, port),
},
}
mockRequest, err := http.NewRequest(http.MethodHead, client.base.String(), nil)
if err != nil {
return nil, err
}
proxyURL, err := http.ProxyFromEnvironment(mockRequest)
if err != nil {
return nil, err
}
client.http = http.Client{
Transport: &http.Transport{
Proxy: http.ProxyURL(proxyURL),
},
}
return &client, nil
}
func (c *Client) do(ctx context.Context, method, path string, reqData, respData any) error {
var reqBody io.Reader
var data []byte
var err error
switch reqData := reqData.(type) {
case io.Reader:
// reqData is already an io.Reader
reqBody = reqData
case nil:
// noop
default:
data, err = json.Marshal(reqData)
if err != nil {
return err
}
reqBody = bytes.NewReader(data)
}
requestURL := c.base.JoinPath(path)
request, err := http.NewRequestWithContext(ctx, method, requestURL.String(), reqBody)
if err != nil {
return err
}
request.Header.Set("Content-Type", "application/json")
request.Header.Set("Accept", "application/json")
request.Header.Set("User-Agent", fmt.Sprintf("ollama/%s (%s %s) Go/%s", version.Version, runtime.GOARCH, runtime.GOOS, runtime.Version()))
respObj, err := c.http.Do(request)
if err != nil {
return err
}
defer respObj.Body.Close()
respBody, err := io.ReadAll(respObj.Body)
if err != nil {
return err
}
if err := checkError(respObj, respBody); err != nil {
return err
}
if len(respBody) > 0 && respData != nil {
if err := json.Unmarshal(respBody, respData); err != nil {
return err
}
}
return nil
}
const maxBufferSize = 512 * format.KiloByte
func (c *Client) stream(ctx context.Context, method, path string, data any, fn func([]byte) error) error {
var buf *bytes.Buffer
if data != nil {
@@ -50,21 +158,26 @@ func (c *Client) stream(ctx context.Context, method, path string, data any, fn f
buf = bytes.NewBuffer(bts)
}
request, err := http.NewRequestWithContext(ctx, method, c.base.JoinPath(path).String(), buf)
requestURL := c.base.JoinPath(path)
request, err := http.NewRequestWithContext(ctx, method, requestURL.String(), buf)
if err != nil {
return err
}
request.Header.Set("Content-Type", "application/json")
request.Header.Set("Accept", "application/json")
request.Header.Set("Accept", "application/x-ndjson")
request.Header.Set("User-Agent", fmt.Sprintf("ollama/%s (%s %s) Go/%s", version.Version, runtime.GOARCH, runtime.GOOS, runtime.Version()))
response, err := http.DefaultClient.Do(request)
response, err := c.http.Do(request)
if err != nil {
return err
}
defer response.Body.Close()
scanner := bufio.NewScanner(response.Body)
// increase the buffer size to avoid running out of space
scanBuf := make([]byte, 0, maxBufferSize)
scanner.Buffer(scanBuf, maxBufferSize)
for scanner.Scan() {
var errorResponse struct {
Error string `json:"error,omitempty"`
@@ -75,11 +188,15 @@ func (c *Client) stream(ctx context.Context, method, path string, data any, fn f
return fmt.Errorf("unmarshal: %w", err)
}
if response.StatusCode >= 400 {
if errorResponse.Error != "" {
return fmt.Errorf(errorResponse.Error)
}
if response.StatusCode >= http.StatusBadRequest {
return StatusError{
StatusCode: response.StatusCode,
Status: response.Status,
Message: errorResponse.Error,
StatusCode: response.StatusCode,
Status: response.Status,
ErrorMessage: errorResponse.Error,
}
}
@@ -104,11 +221,11 @@ func (c *Client) Generate(ctx context.Context, req *GenerateRequest, fn Generate
})
}
type PullProgressFunc func(PullProgress) error
type ChatResponseFunc func(ChatResponse) error
func (c *Client) Pull(ctx context.Context, req *PullRequest, fn PullProgressFunc) error {
return c.stream(ctx, http.MethodPost, "/api/pull", req, func(bts []byte) error {
var resp PullProgress
func (c *Client) Chat(ctx context.Context, req *ChatRequest, fn ChatResponseFunc) error {
return c.stream(ctx, http.MethodPost, "/api/chat", req, func(bts []byte) error {
var resp ChatResponse
if err := json.Unmarshal(bts, &resp); err != nil {
return err
}
@@ -116,3 +233,113 @@ func (c *Client) Pull(ctx context.Context, req *PullRequest, fn PullProgressFunc
return fn(resp)
})
}
type PullProgressFunc func(ProgressResponse) error
func (c *Client) Pull(ctx context.Context, req *PullRequest, fn PullProgressFunc) error {
return c.stream(ctx, http.MethodPost, "/api/pull", req, func(bts []byte) error {
var resp ProgressResponse
if err := json.Unmarshal(bts, &resp); err != nil {
return err
}
return fn(resp)
})
}
type PushProgressFunc func(ProgressResponse) error
func (c *Client) Push(ctx context.Context, req *PushRequest, fn PushProgressFunc) error {
return c.stream(ctx, http.MethodPost, "/api/push", req, func(bts []byte) error {
var resp ProgressResponse
if err := json.Unmarshal(bts, &resp); err != nil {
return err
}
return fn(resp)
})
}
type CreateProgressFunc func(ProgressResponse) error
func (c *Client) Create(ctx context.Context, req *CreateRequest, fn CreateProgressFunc) error {
return c.stream(ctx, http.MethodPost, "/api/create", req, func(bts []byte) error {
var resp ProgressResponse
if err := json.Unmarshal(bts, &resp); err != nil {
return err
}
return fn(resp)
})
}
func (c *Client) List(ctx context.Context) (*ListResponse, error) {
var lr ListResponse
if err := c.do(ctx, http.MethodGet, "/api/tags", nil, &lr); err != nil {
return nil, err
}
return &lr, nil
}
func (c *Client) Copy(ctx context.Context, req *CopyRequest) error {
if err := c.do(ctx, http.MethodPost, "/api/copy", req, nil); err != nil {
return err
}
return nil
}
func (c *Client) Delete(ctx context.Context, req *DeleteRequest) error {
if err := c.do(ctx, http.MethodDelete, "/api/delete", req, nil); err != nil {
return err
}
return nil
}
func (c *Client) Show(ctx context.Context, req *ShowRequest) (*ShowResponse, error) {
var resp ShowResponse
if err := c.do(ctx, http.MethodPost, "/api/show", req, &resp); err != nil {
return nil, err
}
return &resp, nil
}
func (c *Client) Heartbeat(ctx context.Context) error {
if err := c.do(ctx, http.MethodHead, "/", nil, nil); err != nil {
return err
}
return nil
}
func (c *Client) Embeddings(ctx context.Context, req *EmbeddingRequest) (*EmbeddingResponse, error) {
var resp EmbeddingResponse
if err := c.do(ctx, http.MethodPost, "/api/embeddings", req, &resp); err != nil {
return nil, err
}
return &resp, nil
}
func (c *Client) CreateBlob(ctx context.Context, digest string, r io.Reader) error {
if err := c.do(ctx, http.MethodHead, fmt.Sprintf("/api/blobs/%s", digest), nil, nil); err != nil {
var statusError StatusError
if !errors.As(err, &statusError) || statusError.StatusCode != http.StatusNotFound {
return err
}
if err := c.do(ctx, http.MethodPost, fmt.Sprintf("/api/blobs/%s", digest), r, nil); err != nil {
return err
}
}
return nil
}
func (c *Client) Version(ctx context.Context) (string, error) {
var version struct {
Version string `json:"version"`
}
if err := c.do(ctx, http.MethodGet, "/api/version", nil, &version); err != nil {
return "", err
}
return version.Version, nil
}

43
api/client_test.go Normal file
View File

@@ -0,0 +1,43 @@
package api
import "testing"
func TestClientFromEnvironment(t *testing.T) {
type testCase struct {
value string
expect string
err error
}
testCases := map[string]*testCase{
"empty": {value: "", expect: "http://127.0.0.1:11434"},
"only address": {value: "1.2.3.4", expect: "http://1.2.3.4:11434"},
"only port": {value: ":1234", expect: "http://:1234"},
"address and port": {value: "1.2.3.4:1234", expect: "http://1.2.3.4:1234"},
"scheme http and address": {value: "http://1.2.3.4", expect: "http://1.2.3.4:80"},
"scheme https and address": {value: "https://1.2.3.4", expect: "https://1.2.3.4:443"},
"scheme, address, and port": {value: "https://1.2.3.4:1234", expect: "https://1.2.3.4:1234"},
"hostname": {value: "example.com", expect: "http://example.com:11434"},
"hostname and port": {value: "example.com:1234", expect: "http://example.com:1234"},
"scheme http and hostname": {value: "http://example.com", expect: "http://example.com:80"},
"scheme https and hostname": {value: "https://example.com", expect: "https://example.com:443"},
"scheme, hostname, and port": {value: "https://example.com:1234", expect: "https://example.com:1234"},
"trailing slash": {value: "example.com/", expect: "http://example.com:11434"},
"trailing slash port": {value: "example.com:1234/", expect: "http://example.com:1234"},
}
for k, v := range testCases {
t.Run(k, func(t *testing.T) {
t.Setenv("OLLAMA_HOST", v.value)
client, err := ClientFromEnvironment()
if err != v.err {
t.Fatalf("expected %s, got %s", v.err, err)
}
if client.base.String() != v.expect {
t.Fatalf("expected %s, got %s", v.expect, client.base.String())
}
})
}
}
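Taken together, the client above and the request types in api/types.go below support a streaming chat call. A minimal usage sketch (illustrative, assuming a server reachable at the default OLLAMA_HOST and the llama2 model pulled locally):

```
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/jmorganca/ollama/api"
)

func main() {
	// ClientFromEnvironment resolves OLLAMA_HOST (scheme, host, port) as
	// exercised by the test cases above, defaulting to http://127.0.0.1:11434.
	client, err := api.ClientFromEnvironment()
	if err != nil {
		log.Fatal(err)
	}

	req := &api.ChatRequest{
		Model: "llama2",
		Messages: []api.Message{
			{Role: "user", Content: "why is the sky blue?"},
		},
	}

	// Chat streams newline-delimited JSON responses; the callback is invoked
	// once per chunk until Done is true.
	err = client.Chat(context.Background(), req, func(resp api.ChatResponse) error {
		fmt.Print(resp.Message.Content)
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println()
}
```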

View File

@@ -1,91 +1,485 @@
package api
import "runtime"
import (
"encoding/json"
"fmt"
"math"
"os"
"reflect"
"strconv"
"strings"
"time"
)
type PullRequest struct {
Model string `json:"model"`
type StatusError struct {
StatusCode int
Status string
ErrorMessage string `json:"error"`
}
type PullProgress struct {
Total int64 `json:"total"`
Completed int64 `json:"completed"`
Percent float64 `json:"percent"`
func (e StatusError) Error() string {
switch {
case e.Status != "" && e.ErrorMessage != "":
return fmt.Sprintf("%s: %s", e.Status, e.ErrorMessage)
case e.Status != "":
return e.Status
case e.ErrorMessage != "":
return e.ErrorMessage
default:
// this should not happen
return "something went wrong, please see the ollama server logs for details"
}
}
type ImageData []byte
type GenerateRequest struct {
Model string `json:"model"`
Prompt string `json:"prompt"`
System string `json:"system"`
Template string `json:"template"`
Context []int `json:"context,omitempty"`
Stream *bool `json:"stream,omitempty"`
Raw bool `json:"raw,omitempty"`
Format string `json:"format"`
Images []ImageData `json:"images,omitempty"`
Options map[string]interface{} `json:"options"`
}
type ChatRequest struct {
Model string `json:"model"`
Messages []Message `json:"messages"`
Stream *bool `json:"stream,omitempty"`
Format string `json:"format"`
Options map[string]interface{} `json:"options"`
}
type Message struct {
Role string `json:"role"` // one of ["system", "user", "assistant"]
Content string `json:"content"`
Images []ImageData `json:"images,omitempty"`
}
type ChatResponse struct {
Model string `json:"model"`
CreatedAt time.Time `json:"created_at"`
Message Message `json:"message"`
Done bool `json:"done"`
Metrics
}
type Metrics struct {
TotalDuration time.Duration `json:"total_duration,omitempty"`
LoadDuration time.Duration `json:"load_duration,omitempty"`
PromptEvalCount int `json:"prompt_eval_count,omitempty"`
PromptEvalDuration time.Duration `json:"prompt_eval_duration,omitempty"`
EvalCount int `json:"eval_count,omitempty"`
EvalDuration time.Duration `json:"eval_duration,omitempty"`
}
// Options specified in GenerateRequest; if you add a new option here, add it to the API docs also
type Options struct {
Runner
// Predict options used at runtime
NumKeep int `json:"num_keep,omitempty"`
Seed int `json:"seed,omitempty"`
NumPredict int `json:"num_predict,omitempty"`
TopK int `json:"top_k,omitempty"`
TopP float32 `json:"top_p,omitempty"`
TFSZ float32 `json:"tfs_z,omitempty"`
TypicalP float32 `json:"typical_p,omitempty"`
RepeatLastN int `json:"repeat_last_n,omitempty"`
Temperature float32 `json:"temperature,omitempty"`
RepeatPenalty float32 `json:"repeat_penalty,omitempty"`
PresencePenalty float32 `json:"presence_penalty,omitempty"`
FrequencyPenalty float32 `json:"frequency_penalty,omitempty"`
Mirostat int `json:"mirostat,omitempty"`
MirostatTau float32 `json:"mirostat_tau,omitempty"`
MirostatEta float32 `json:"mirostat_eta,omitempty"`
PenalizeNewline bool `json:"penalize_newline,omitempty"`
Stop []string `json:"stop,omitempty"`
}
// Runner options which must be set when the model is loaded into memory
type Runner struct {
UseNUMA bool `json:"numa,omitempty"`
NumCtx int `json:"num_ctx,omitempty"`
NumBatch int `json:"num_batch,omitempty"`
NumGQA int `json:"num_gqa,omitempty"`
NumGPU int `json:"num_gpu,omitempty"`
MainGPU int `json:"main_gpu,omitempty"`
LowVRAM bool `json:"low_vram,omitempty"`
F16KV bool `json:"f16_kv,omitempty"`
LogitsAll bool `json:"logits_all,omitempty"`
VocabOnly bool `json:"vocab_only,omitempty"`
UseMMap bool `json:"use_mmap,omitempty"`
UseMLock bool `json:"use_mlock,omitempty"`
EmbeddingOnly bool `json:"embedding_only,omitempty"`
RopeFrequencyBase float32 `json:"rope_frequency_base,omitempty"`
RopeFrequencyScale float32 `json:"rope_frequency_scale,omitempty"`
NumThread int `json:"num_thread,omitempty"`
}
type EmbeddingRequest struct {
Model string `json:"model"`
Prompt string `json:"prompt"`
Options `json:"options"`
Options map[string]interface{} `json:"options"`
}
type EmbeddingResponse struct {
Embedding []float64 `json:"embedding"`
}
type CreateRequest struct {
Model string `json:"model"`
Path string `json:"path"`
Modelfile string `json:"modelfile"`
Stream *bool `json:"stream,omitempty"`
// Name is deprecated, see Model
Name string `json:"name"`
}
type DeleteRequest struct {
Model string `json:"model"`
// Name is deprecated, see Model
Name string `json:"name"`
}
type ShowRequest struct {
Model string `json:"model"`
System string `json:"system"`
Template string `json:"template"`
Options map[string]interface{} `json:"options"`
// Name is deprecated, see Model
Name string `json:"name"`
}
type ShowResponse struct {
License string `json:"license,omitempty"`
Modelfile string `json:"modelfile,omitempty"`
Parameters string `json:"parameters,omitempty"`
Template string `json:"template,omitempty"`
System string `json:"system,omitempty"`
Details ModelDetails `json:"details,omitempty"`
}
type CopyRequest struct {
Source string `json:"source"`
Destination string `json:"destination"`
}
type PullRequest struct {
Model string `json:"model"`
Insecure bool `json:"insecure,omitempty"`
Username string `json:"username"`
Password string `json:"password"`
Stream *bool `json:"stream,omitempty"`
// Name is deprecated, see Model
Name string `json:"name"`
}
type ProgressResponse struct {
Status string `json:"status"`
Digest string `json:"digest,omitempty"`
Total int64 `json:"total,omitempty"`
Completed int64 `json:"completed,omitempty"`
}
type PushRequest struct {
Model string `json:"model"`
Insecure bool `json:"insecure,omitempty"`
Username string `json:"username"`
Password string `json:"password"`
Stream *bool `json:"stream,omitempty"`
// Name is deprecated, see Model
Name string `json:"name"`
}
type ListResponse struct {
Models []ModelResponse `json:"models"`
}
type ModelResponse struct {
Name string `json:"name"`
Model string `json:"model"`
ModifiedAt time.Time `json:"modified_at"`
Size int64 `json:"size"`
Digest string `json:"digest"`
Details ModelDetails `json:"details,omitempty"`
}
type TokenResponse struct {
Token string `json:"token"`
}
type GenerateResponse struct {
Response string `json:"response"`
Model string `json:"model"`
CreatedAt time.Time `json:"created_at"`
Response string `json:"response"`
Done bool `json:"done"`
Context []int `json:"context,omitempty"`
Metrics
}
type Options struct {
Seed int `json:"seed,omitempty"`
type ModelDetails struct {
Format string `json:"format"`
Family string `json:"family"`
Families []string `json:"families"`
ParameterSize string `json:"parameter_size"`
QuantizationLevel string `json:"quantization_level"`
}
// Backend options
UseNUMA bool `json:"numa,omitempty"`
func (m *Metrics) Summary() {
if m.TotalDuration > 0 {
fmt.Fprintf(os.Stderr, "total duration: %v\n", m.TotalDuration)
}
// Model options
NumCtx int `json:"num_ctx,omitempty"`
NumBatch int `json:"num_batch,omitempty"`
NumGPU int `json:"num_gpu,omitempty"`
MainGPU int `json:"main_gpu,omitempty"`
LowVRAM bool `json:"low_vram,omitempty"`
F16KV bool `json:"f16_kv,omitempty"`
LogitsAll bool `json:"logits_all,omitempty"`
VocabOnly bool `json:"vocab_only,omitempty"`
UseMMap bool `json:"use_mmap,omitempty"`
UseMLock bool `json:"use_mlock,omitempty"`
EmbeddingOnly bool `json:"embedding_only,omitempty"`
if m.LoadDuration > 0 {
fmt.Fprintf(os.Stderr, "load duration: %v\n", m.LoadDuration)
}
// Predict options
RepeatLastN int `json:"repeat_last_n,omitempty"`
RepeatPenalty float32 `json:"repeat_penalty,omitempty"`
FrequencyPenalty float32 `json:"frequency_penalty,omitempty"`
PresencePenalty float32 `json:"presence_penalty,omitempty"`
Temperature float32 `json:"temperature,omitempty"`
TopK int `json:"top_k,omitempty"`
TopP float32 `json:"top_p,omitempty"`
TFSZ float32 `json:"tfs_z,omitempty"`
TypicalP float32 `json:"typical_p,omitempty"`
Mirostat int `json:"mirostat,omitempty"`
MirostatTau float32 `json:"mirostat_tau,omitempty"`
MirostatEta float32 `json:"mirostat_eta,omitempty"`
if m.PromptEvalCount > 0 {
fmt.Fprintf(os.Stderr, "prompt eval count: %d token(s)\n", m.PromptEvalCount)
}
NumThread int `json:"num_thread,omitempty"`
if m.PromptEvalDuration > 0 {
fmt.Fprintf(os.Stderr, "prompt eval duration: %s\n", m.PromptEvalDuration)
fmt.Fprintf(os.Stderr, "prompt eval rate: %.2f tokens/s\n", float64(m.PromptEvalCount)/m.PromptEvalDuration.Seconds())
}
if m.EvalCount > 0 {
fmt.Fprintf(os.Stderr, "eval count: %d token(s)\n", m.EvalCount)
}
if m.EvalDuration > 0 {
fmt.Fprintf(os.Stderr, "eval duration: %s\n", m.EvalDuration)
fmt.Fprintf(os.Stderr, "eval rate: %.2f tokens/s\n", float64(m.EvalCount)/m.EvalDuration.Seconds())
}
}
var ErrInvalidOpts = fmt.Errorf("invalid options")
func (opts *Options) FromMap(m map[string]interface{}) error {
valueOpts := reflect.ValueOf(opts).Elem() // names of the fields in the options struct
typeOpts := reflect.TypeOf(opts).Elem() // types of the fields in the options struct
// build map of json struct tags to their types
jsonOpts := make(map[string]reflect.StructField)
for _, field := range reflect.VisibleFields(typeOpts) {
jsonTag := strings.Split(field.Tag.Get("json"), ",")[0]
if jsonTag != "" {
jsonOpts[jsonTag] = field
}
}
invalidOpts := []string{}
for key, val := range m {
if opt, ok := jsonOpts[key]; ok {
field := valueOpts.FieldByName(opt.Name)
if field.IsValid() && field.CanSet() {
if val == nil {
continue
}
switch field.Kind() {
case reflect.Int:
switch t := val.(type) {
case int64:
field.SetInt(t)
case float64:
// when JSON unmarshals numbers, it uses float64, not int
field.SetInt(int64(t))
default:
return fmt.Errorf("option %q must be of type integer", key)
}
case reflect.Bool:
val, ok := val.(bool)
if !ok {
return fmt.Errorf("option %q must be of type boolean", key)
}
field.SetBool(val)
case reflect.Float32:
// JSON unmarshals to float64
val, ok := val.(float64)
if !ok {
return fmt.Errorf("option %q must be of type float32", key)
}
field.SetFloat(val)
case reflect.String:
val, ok := val.(string)
if !ok {
return fmt.Errorf("option %q must be of type string", key)
}
field.SetString(val)
case reflect.Slice:
// JSON unmarshals to []interface{}, not []string
val, ok := val.([]interface{})
if !ok {
return fmt.Errorf("option %q must be of type array", key)
}
// convert []interface{} to []string
slice := make([]string, len(val))
for i, item := range val {
str, ok := item.(string)
if !ok {
return fmt.Errorf("option %q must be of an array of strings", key)
}
slice[i] = str
}
field.Set(reflect.ValueOf(slice))
default:
return fmt.Errorf("unknown type loading config params: %v", field.Kind())
}
}
} else {
invalidOpts = append(invalidOpts, key)
}
}
if len(invalidOpts) > 0 {
return fmt.Errorf("%w: %v", ErrInvalidOpts, strings.Join(invalidOpts, ", "))
}
return nil
}
func DefaultOptions() Options {
return Options{
Seed: -1,
UseNUMA: false,
NumCtx: 512,
NumBatch: 512,
NumGPU: 1,
LowVRAM: false,
F16KV: true,
UseMMap: true,
UseMLock: false,
RepeatLastN: 512,
RepeatPenalty: 1.1,
FrequencyPenalty: 0.0,
PresencePenalty: 0.0,
// options set on request to runner
NumPredict: -1,
NumKeep: 0,
Temperature: 0.8,
TopK: 40,
TopP: 0.9,
TFSZ: 1.0,
TypicalP: 1.0,
RepeatLastN: 64,
RepeatPenalty: 1.1,
PresencePenalty: 0.0,
FrequencyPenalty: 0.0,
Mirostat: 0,
MirostatTau: 5.0,
MirostatEta: 0.1,
PenalizeNewline: true,
Seed: -1,
NumThread: runtime.NumCPU(),
Runner: Runner{
// options set when the model is loaded
NumCtx: 2048,
RopeFrequencyBase: 10000.0,
RopeFrequencyScale: 1.0,
NumBatch: 512,
NumGPU: -1, // -1 here indicates that NumGPU should be set dynamically
NumGQA: 1,
NumThread: 0, // let the runtime decide
LowVRAM: false,
F16KV: true,
UseMLock: false,
UseMMap: true,
UseNUMA: false,
EmbeddingOnly: true,
},
}
}
type Duration struct {
time.Duration
}
func (d *Duration) UnmarshalJSON(b []byte) (err error) {
var v any
if err := json.Unmarshal(b, &v); err != nil {
return err
}
d.Duration = 5 * time.Minute
switch t := v.(type) {
case float64:
if t < 0 {
t = math.MaxFloat64
}
d.Duration = time.Duration(t)
case string:
d.Duration, err = time.ParseDuration(t)
if err != nil {
return err
}
}
return nil
}
// FormatParams converts specified parameter options to their correct types
func FormatParams(params map[string][]string) (map[string]interface{}, error) {
opts := Options{}
valueOpts := reflect.ValueOf(&opts).Elem() // names of the fields in the options struct
typeOpts := reflect.TypeOf(opts) // types of the fields in the options struct
// build map of json struct tags to their types
jsonOpts := make(map[string]reflect.StructField)
for _, field := range reflect.VisibleFields(typeOpts) {
jsonTag := strings.Split(field.Tag.Get("json"), ",")[0]
if jsonTag != "" {
jsonOpts[jsonTag] = field
}
}
out := make(map[string]interface{})
// iterate params and set values based on json struct tags
for key, vals := range params {
if opt, ok := jsonOpts[key]; !ok {
return nil, fmt.Errorf("unknown parameter '%s'", key)
} else {
field := valueOpts.FieldByName(opt.Name)
if field.IsValid() && field.CanSet() {
switch field.Kind() {
case reflect.Float32:
floatVal, err := strconv.ParseFloat(vals[0], 32)
if err != nil {
return nil, fmt.Errorf("invalid float value %s", vals)
}
out[key] = float32(floatVal)
case reflect.Int:
intVal, err := strconv.ParseInt(vals[0], 10, 64)
if err != nil {
return nil, fmt.Errorf("invalid int value %s", vals)
}
out[key] = intVal
case reflect.Bool:
boolVal, err := strconv.ParseBool(vals[0])
if err != nil {
return nil, fmt.Errorf("invalid bool value %s", vals)
}
out[key] = boolVal
case reflect.String:
out[key] = vals[0]
case reflect.Slice:
// TODO: only string slices are supported right now
out[key] = vals
default:
return nil, fmt.Errorf("unknown type %s for %s", field.Kind(), key)
}
}
}
}
return out, nil
}
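A short sketch of how the typed-conversion helpers defined in this file fit together (illustrative only, with made-up parameter values; assumes a JSON-style map where numbers arrive as float64, as the FromMap comments note):

```
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"time"

	"github.com/jmorganca/ollama/api"
)

func main() {
	// Modelfile-style PARAMETER values arrive as strings; FormatParams
	// converts them to typed values based on the Options struct tags.
	params, err := api.FormatParams(map[string][]string{
		"temperature": {"0.7"},
		"num_ctx":     {"4096"},
		"stop":        {"</s>", "USER:"},
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%#v\n", params) // float32, int64 and []string values

	// FromMap applies an untyped options map (for example, the "options"
	// field of an API request) onto the defaults. Numbers are float64, as
	// they would be after JSON unmarshaling.
	opts := api.DefaultOptions()
	if err := opts.FromMap(map[string]interface{}{"temperature": 0.2, "num_predict": 128.0}); err != nil {
		log.Fatal(err)
	}
	fmt.Println(opts.Temperature, opts.NumPredict)

	// Duration accepts either a Go duration string or a bare number.
	var d api.Duration
	if err := json.Unmarshal([]byte(`"10m"`), &d); err != nil {
		log.Fatal(err)
	}
	fmt.Println(d.Duration == 10*time.Minute) // true
}
```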

View File

@@ -1,7 +1,5 @@
# Desktop
_Note: the Ollama desktop app is a work in progress and is not ready yet for general use._
This app builds upon Ollama to provide a desktop experience for running models.
## Developing
@@ -9,19 +7,15 @@ This app builds upon Ollama to provide a desktop experience for running models.
First, build the `ollama` binary:
```
make -C ..
cd ..
go build .
```
Then run the desktop app with `npm start`:
```
cd app
npm install
npm start
```
## Coming soon
- Browse the latest available models on Hugging Face and other sources
- Keep track of previous conversations with models
- Switch quickly between models
- Connect to remote Ollama servers to run models

Binary image assets changed (file contents not shown): eight icon PNGs were added, of which only app/assets/iconTemplate.png (447 B) is named in this view (the others range from 402 B to 891 B), and two previous icon PNGs (442 B and 889 B) were removed.

View File

@@ -1,4 +1,4 @@
import type { ForgeConfig, ResolvedForgeConfig, ForgeMakeResult } from '@electron-forge/shared-types'
import type { ForgeConfig } from '@electron-forge/shared-types'
import { MakerSquirrel } from '@electron-forge/maker-squirrel'
import { MakerZIP } from '@electron-forge/maker-zip'
import { PublisherGithub } from '@electron-forge/publisher-github'
@@ -18,10 +18,15 @@ const config: ForgeConfig = {
asar: true,
icon: './assets/icon.icns',
extraResource: [
'../ollama',
path.join(__dirname, './assets/ollama_icon_16x16Template.png'),
path.join(__dirname, './assets/ollama_icon_16x16Template@2x.png'),
...(process.platform === 'darwin' ? ['../llama/ggml-metal.metal'] : []),
'../dist/ollama',
path.join(__dirname, './assets/iconTemplate.png'),
path.join(__dirname, './assets/iconTemplate@2x.png'),
path.join(__dirname, './assets/iconUpdateTemplate.png'),
path.join(__dirname, './assets/iconUpdateTemplate@2x.png'),
path.join(__dirname, './assets/iconDarkTemplate.png'),
path.join(__dirname, './assets/iconDarkTemplate@2x.png'),
path.join(__dirname, './assets/iconDarkUpdateTemplate.png'),
path.join(__dirname, './assets/iconDarkUpdateTemplate@2x.png'),
],
...(process.env.SIGN
? {
@@ -36,19 +41,12 @@ const config: ForgeConfig = {
},
}
: {}),
osxUniversal: {
x64ArchFiles: '**/ollama',
},
},
rebuildConfig: {},
makers: [new MakerSquirrel({}), new MakerZIP({}, ['darwin'])],
publishers: [
new PublisherGithub({
repository: {
name: 'ollama',
owner: 'jmorganca',
},
draft: false,
prerelease: true,
}),
],
hooks: {
readPackageJson: async (_, packageJson) => {
return { ...packageJson, version: process.env.VERSION || packageJson.version }
@@ -58,7 +56,7 @@ const config: ForgeConfig = {
new AutoUnpackNativesPlugin({}),
new WebpackPlugin({
mainConfig,
devContentSecurityPolicy: `default-src * 'unsafe-eval' 'unsafe-inline'`,
devContentSecurityPolicy: `default-src * 'unsafe-eval' 'unsafe-inline'; img-src data: 'self'`,
renderer: {
config: rendererConfig,
nodeIntegration: true,

app/package-lock.json (generated, 3273 lines changed; diff suppressed because it is too large)

View File

@@ -6,12 +6,14 @@
"main": ".webpack/main",
"scripts": {
"start": "electron-forge start",
"package": "electron-forge package",
"package:sign": "SIGN=1 electron-forge package",
"make": "electron-forge make",
"make:sign": "SIGN=1 electron-forge make",
"package": "electron-forge package --arch universal",
"package:sign": "SIGN=1 electron-forge package --arch universal",
"make": "electron-forge make --arch universal",
"make:sign": "SIGN=1 electron-forge make --arch universal",
"publish": "SIGN=1 electron-forge publish",
"lint": "eslint --ext .ts,.tsx ."
"lint": "eslint --ext .ts,.tsx .",
"format": "prettier --check . --ignore-path .gitignore",
"format:fix": "prettier --write . --ignore-path .gitignore"
},
"keywords": [],
"author": {
@@ -30,6 +32,8 @@
"@electron-forge/plugin-auto-unpack-natives": "^6.2.1",
"@electron-forge/plugin-webpack": "^6.2.1",
"@electron-forge/publisher-github": "^6.2.1",
"@electron/universal": "^1.4.1",
"@svgr/webpack": "^8.0.1",
"@types/chmodr": "^1.0.0",
"@types/node": "^20.4.0",
"@types/react": "^18.2.14",
@@ -42,7 +46,7 @@
"chmodr": "^1.2.0",
"copy-webpack-plugin": "^11.0.0",
"css-loader": "^6.8.1",
"electron": "25.2.0",
"electron": "25.9.2",
"eslint": "^8.43.0",
"eslint-plugin-import": "^2.27.5",
"fork-ts-checker-webpack-plugin": "^7.3.0",
@@ -54,17 +58,21 @@
"prettier": "^2.8.8",
"prettier-plugin-tailwindcss": "^0.3.0",
"style-loader": "^3.3.3",
"svg-inline-loader": "^0.8.2",
"tailwindcss": "^3.3.2",
"ts-loader": "^9.4.3",
"ts-node": "^10.9.1",
"typescript": "~4.5.4",
"url-loader": "^4.1.1",
"webpack": "^5.88.0",
"webpack-cli": "^5.1.4",
"webpack-dev-server": "^4.15.1"
},
"dependencies": {
"@electron/remote": "^2.0.10",
"@heroicons/react": "^2.0.18",
"@segment/analytics-node": "^1.0.0",
"copy-to-clipboard": "^3.3.3",
"electron-squirrel-startup": "^1.0.0",
"electron-store": "^8.1.0",
"react": "^18.2.0",

View File

@@ -11,6 +11,10 @@ body {
-webkit-app-region: drag;
}
.no-drag {
-webkit-app-region: no-drag;
}
.blink {
-webkit-animation: 1s blink step-end infinite;
-moz-animation: 1s blink step-end infinite;

View File

@@ -1,158 +1,120 @@
import { useState } from 'react'
import path from 'path'
import os from 'os'
import { dialog, getCurrentWindow } from '@electron/remote'
import copy from 'copy-to-clipboard'
import { CheckIcon, DocumentDuplicateIcon } from '@heroicons/react/24/outline'
import Store from 'electron-store'
import { getCurrentWindow, app } from '@electron/remote'
const API_URL = 'http://127.0.0.1:7734'
import { install } from './install'
import OllamaIcon from './ollama.svg'
type Message = {
sender: 'bot' | 'human'
content: string
}
const store = new Store()
const userInfo = os.userInfo()
async function generate(prompt: string, model: string, callback: (res: string) => void) {
const result = await fetch(`${API_URL}/generate`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
prompt,
model,
}),
})
if (!result.ok) {
return
}
let reader = result.body.getReader()
while (true) {
const { done, value } = await reader.read()
if (done) {
break
}
let decoder = new TextDecoder()
let str = decoder.decode(value)
let re = /}\s*{/g
str = '[' + str.replace(re, '},{') + ']'
let messages = JSON.parse(str)
for (const message of messages) {
const choice = message.choices[0]
callback(choice.text)
if (choice.finish_reason === 'stop') {
break
}
}
}
return
enum Step {
WELCOME = 0,
CLI,
FINISH,
}
export default function () {
const [prompt, setPrompt] = useState('')
const [messages, setMessages] = useState<Message[]>([])
const [model, setModel] = useState('')
const [generating, setGenerating] = useState(false)
const [step, setStep] = useState<Step>(Step.WELCOME)
const [commandCopied, setCommandCopied] = useState<boolean>(false)
const command = 'ollama run llama2'
return (
<div className='flex min-h-screen flex-1 flex-col justify-between bg-white'>
<header className='drag sticky top-0 z-50 flex h-14 w-full flex-row items-center border-b border-black/10 bg-white/75 backdrop-blur-md'>
<div className='mx-auto w-full max-w-xl leading-none'>
<h1 className='text-sm font-medium'>{path.basename(model).replace('.bin', '')}</h1>
</div>
</header>
{model ? (
<section className='mx-auto mb-10 w-full max-w-xl flex-1 break-words'>
{messages.map((m, i) => (
<div className='my-4 flex gap-4' key={i}>
<div className='flex-none pr-1 text-lg'>
{m.sender === 'human' ? (
<div className='mt-px flex h-6 w-6 items-center justify-center rounded-md bg-neutral-200 text-sm text-neutral-700'>
{userInfo.username[0].toUpperCase()}
</div>
) : (
<div className='mt-0.5 flex h-6 w-6 items-center justify-center rounded-md bg-blue-600 text-sm text-white'>
{path.basename(model)[0].toUpperCase()}
</div>
)}
</div>
<div className='flex-1 text-gray-800'>
{m.content}
{m.sender === 'bot' && generating && i === messages.length - 1 && (
<span className='blink relative -top-[3px] left-1 text-[10px]'></span>
)}
<div className='drag'>
<div className='mx-auto flex min-h-screen w-full flex-col justify-between bg-white px-4 pt-16'>
{step === Step.WELCOME && (
<>
<div className='mx-auto text-center'>
<h1 className='mb-6 mt-4 text-2xl tracking-tight text-gray-900'>Welcome to Ollama</h1>
<p className='mx-auto w-[65%] text-sm text-gray-400'>
Let's get you up and running with your own large language models.
</p>
<button
onClick={() => setStep(Step.CLI)}
className='no-drag rounded-dm mx-auto my-8 w-[40%] rounded-md bg-black px-4 py-2 text-sm text-white hover:brightness-110'
>
Next
</button>
</div>
<div className='mx-auto'>
<OllamaIcon />
</div>
</>
)}
{step === Step.CLI && (
<>
<div className='mx-auto flex flex-col space-y-28 text-center'>
<h1 className='mt-4 text-2xl tracking-tight text-gray-900'>Install the command line</h1>
<pre className='mx-auto text-4xl text-gray-400'>&gt; ollama</pre>
<div className='mx-auto'>
<button
onClick={async () => {
try {
await install()
setStep(Step.FINISH)
} catch (e) {
console.error('could not install: ', e)
} finally {
getCurrentWindow().show()
getCurrentWindow().focus()
}
}}
className='no-drag rounded-dm mx-auto w-[60%] rounded-md bg-black px-4 py-2 text-sm text-white hover:brightness-110'
>
Install
</button>
<p className='mx-auto my-4 w-[70%] text-xs text-gray-400'>
You will be prompted for administrator access
</p>
</div>
</div>
))}
</section>
) : (
<section className='flex flex-1 select-none flex-col items-center justify-center pb-20'>
<h2 className='text-3xl font-light text-neutral-400'>No model selected</h2>
<button
onClick={async () => {
const res = await dialog.showOpenDialog(getCurrentWindow(), {
properties: ['openFile', 'multiSelections'],
})
if (res.canceled) {
return
}
setModel(res.filePaths[0])
}}
className='rounded-dm my-8 rounded-md bg-blue-600 px-4 py-2 text-sm text-white hover:brightness-110'
>
Open file...
</button>
</section>
)}
<div className='sticky bottom-0 bg-gradient-to-b from-transparent to-white'>
{model && (
<textarea
autoFocus
rows={1}
value={prompt}
placeholder='Send a message...'
onChange={e => setPrompt(e.target.value)}
className='mx-auto my-4 block w-full max-w-xl resize-none rounded-xl border border-gray-200 px-5 py-3.5 text-[15px] shadow-lg shadow-black/5 focus:outline-none'
onKeyDownCapture={async e => {
if (e.key === 'Enter' && !e.shiftKey) {
e.preventDefault()
if (generating) {
return
}
if (!prompt) {
return
}
await setMessages(messages => {
return [...messages, { sender: 'human', content: prompt }, { sender: 'bot', content: '' }]
})
setPrompt('')
setGenerating(true)
await generate(prompt, model, res => {
setMessages(messages => {
let last = messages[messages.length - 1]
return [...messages.slice(0, messages.length - 1), { ...last, content: last.content + res }]
})
})
setGenerating(false)
}
}}
></textarea>
</>
)}
{step === Step.FINISH && (
<>
<div className='mx-auto flex flex-col space-y-20 text-center'>
<h1 className='mt-4 text-2xl tracking-tight text-gray-900'>Run your first model</h1>
<div className='flex flex-col'>
<div className='group relative flex items-center'>
<pre className='language-none text-2xs w-full rounded-md bg-gray-100 px-4 py-3 text-start leading-normal'>
{command}
</pre>
<button
className={`no-drag absolute right-[5px] px-2 py-2 ${
commandCopied
? 'text-gray-900 opacity-100 hover:cursor-auto'
: 'text-gray-200 opacity-50 hover:cursor-pointer'
} hover:font-bold hover:text-gray-900 group-hover:opacity-100`}
onClick={() => {
copy(command)
setCommandCopied(true)
setTimeout(() => setCommandCopied(false), 3000)
}}
>
{commandCopied ? (
<CheckIcon className='h-4 w-4 font-bold text-gray-500' />
) : (
<DocumentDuplicateIcon className='h-4 w-4 text-gray-500' />
)}
</button>
</div>
<p className='mx-auto my-4 w-[70%] text-xs text-gray-400'>
Run this command in your favorite terminal.
</p>
</div>
<button
onClick={() => {
store.set('first-time-run', true)
window.close()
}}
className='no-drag rounded-dm mx-auto w-[60%] rounded-md bg-black px-4 py-2 text-sm text-white hover:brightness-110'
>
Finish
</button>
</div>
</>
)}
</div>
</div>

app/src/declarations.d.ts (new file, 4 lines)

@@ -0,0 +1,4 @@
declare module '*.svg' {
const content: string
export default content
}

View File

@@ -1,17 +1,24 @@
import { spawn, exec } from 'child_process'
import { app, autoUpdater, dialog, Tray, Menu } from 'electron'
import { spawn, ChildProcess } from 'child_process'
import { app, autoUpdater, dialog, Tray, Menu, BrowserWindow, MenuItemConstructorOptions, nativeTheme } from 'electron'
import Store from 'electron-store'
import winston from 'winston'
import 'winston-daily-rotate-file'
import * as path from 'path'
import * as fs from 'fs'
import { analytics, id } from './telemetry'
import { v4 as uuidv4 } from 'uuid'
import { installed } from './install'
require('@electron/remote/main').initialize()
if (require('electron-squirrel-startup')) {
app.quit()
}
const store = new Store()
let tray: Tray | null = null
let welcomeWindow: BrowserWindow | null = null
declare const MAIN_WINDOW_WEBPACK_ENTRY: string
const logger = winston.createLogger({
transports: [
@@ -22,41 +29,116 @@ const logger = winston.createLogger({
maxFiles: 5,
}),
],
format: winston.format.printf(info => `${info.message}`),
format: winston.format.printf(info => info.message),
})
const SingleInstanceLock = app.requestSingleInstanceLock()
if (!SingleInstanceLock) {
app.quit()
}
const createSystemtray = () => {
let iconPath = path.join(__dirname, '..', '..', 'assets', 'ollama_icon_16x16Template.png')
if (app.isPackaged) {
iconPath = path.join(process.resourcesPath, 'ollama_icon_16x16Template.png')
app.on('ready', () => {
const gotTheLock = app.requestSingleInstanceLock()
if (!gotTheLock) {
app.exit(0)
return
}
tray = new Tray(iconPath)
app.on('second-instance', () => {
if (app.hasSingleInstanceLock()) {
app.releaseSingleInstanceLock()
}
const contextMenu = Menu.buildFromTemplate([{ role: 'quit', label: 'Quit Ollama', accelerator: 'Command+Q' }])
if (proc) {
proc.off('exit', restart)
proc.kill()
}
tray.setContextMenu(contextMenu)
tray.setToolTip('Ollama')
app.exit(0)
})
app.focus({ steal: true })
init()
})
function firstRunWindow() {
// Create the browser window.
welcomeWindow = new BrowserWindow({
width: 400,
height: 500,
frame: false,
fullscreenable: false,
resizable: false,
movable: true,
show: false,
webPreferences: {
nodeIntegration: true,
contextIsolation: false,
},
})
require('@electron/remote/main').enable(welcomeWindow.webContents)
welcomeWindow.loadURL(MAIN_WINDOW_WEBPACK_ENTRY)
welcomeWindow.on('ready-to-show', () => welcomeWindow.show())
welcomeWindow.on('closed', () => {
if (process.platform === 'darwin') {
app.dock.hide()
}
})
}
if (require('electron-squirrel-startup')) {
app.quit()
let tray: Tray | null = null
let updateAvailable = false
const assetPath = app.isPackaged ? process.resourcesPath : path.join(__dirname, '..', '..', 'assets')
function trayIconPath() {
return nativeTheme.shouldUseDarkColors
? updateAvailable
? path.join(assetPath, 'iconDarkUpdateTemplate.png')
: path.join(assetPath, 'iconDarkTemplate.png')
: updateAvailable
? path.join(assetPath, 'iconUpdateTemplate.png')
: path.join(assetPath, 'iconTemplate.png')
}
const ollama = path.join(process.resourcesPath, 'ollama')
function updateTrayIcon() {
if (tray) {
tray.setImage(trayIconPath())
}
}
function updateTray() {
const updateItems: MenuItemConstructorOptions[] = [
{ label: 'An update is available', enabled: false },
{
label: 'Restart to update',
click: () => autoUpdater.quitAndInstall(),
},
{ type: 'separator' },
]
const menu = Menu.buildFromTemplate([
...(updateAvailable ? updateItems : []),
{ role: 'quit', label: 'Quit Ollama', accelerator: 'Command+Q' },
])
if (!tray) {
tray = new Tray(trayIconPath())
}
tray.setToolTip(updateAvailable ? 'An update is available' : 'Ollama')
tray.setContextMenu(menu)
tray.setImage(trayIconPath())
nativeTheme.off('updated', updateTrayIcon)
nativeTheme.on('updated', updateTrayIcon)
}
let proc: ChildProcess = null
function server() {
const binary = app.isPackaged
? path.join(process.resourcesPath, 'ollama')
: path.resolve(process.cwd(), '..', 'ollama')
const proc = spawn(binary, ['serve'])
proc = spawn(binary, ['serve'])
proc.stdout.on('data', data => {
logger.info(data.toString().trim())
@@ -66,66 +148,76 @@ function server() {
logger.error(data.toString().trim())
})
proc.on('exit', () => {
logger.info('Restarting the server...')
server()
})
proc.on('disconnect', () => {
logger.info('Server disconnected. Reconnecting...')
server()
})
process.on('exit', () => {
proc.kill()
})
proc.on('exit', restart)
}
function installCLI() {
const symlinkPath = '/usr/local/bin/ollama'
function restart() {
setTimeout(server, 1000)
}
if (fs.existsSync(symlinkPath) && fs.readlinkSync(symlinkPath) === ollama) {
return
app.on('before-quit', () => {
if (proc) {
proc.off('exit', restart)
proc.kill('SIGINT') // send SIGINT signal to the server, which also stops any loaded llms
}
})
dialog
.showMessageBox({
type: 'info',
title: 'Ollama CLI installation',
message: 'To make the Ollama command work in your terminal, it needs administrator privileges.',
buttons: ['OK'],
})
.then(result => {
if (result.response === 0) {
const command = `
do shell script "ln -F -s ${ollama} /usr/local/bin/ollama" with administrator privileges
`
exec(`osascript -e '${command}'`, (error: Error | null, stdout: string, stderr: string) => {
if (error) {
logger.error(`cli: failed to install cli: ${error.message}`)
return
}
const updateURL = `https://ollama.ai/api/update?os=${process.platform}&arch=${
process.arch
}&version=${app.getVersion()}&id=${id()}`
logger.info(stdout)
logger.error(stderr)
})
}
})
}
let latest = ''
async function isNewReleaseAvailable() {
try {
const response = await fetch(updateURL)
app.on('ready', () => {
if (process.platform === 'darwin') {
app.dock.hide()
if (!store.has('first-time-run')) {
// This is the first run
app.setLoginItemSettings({ openAtLogin: true })
store.set('first-time-run', false)
} else {
// The app has been run before
app.setLoginItemSettings({ openAtLogin: app.getLoginItemSettings().openAtLogin })
if (!response.ok) {
return false
}
if (response.status === 204) {
return false
}
const data = await response.json()
const url = data?.url
if (!url) {
return false
}
if (latest === url) {
return false
}
latest = url
return true
} catch (error) {
logger.error(`update check failed - ${error}`)
return false
}
}
async function checkUpdate() {
const available = await isNewReleaseAvailable()
if (available) {
logger.info('checking for update')
autoUpdater.checkForUpdates()
}
}
function init() {
if (app.isPackaged) {
checkUpdate()
setInterval(() => {
checkUpdate()
}, 60 * 60 * 1000)
}
updateTray()
if (process.platform === 'darwin') {
if (app.isPackaged) {
if (!app.isInApplicationsFolder()) {
const chosen = dialog.showMessageBoxSync({
@@ -157,14 +249,24 @@ app.on('ready', () => {
}
}
}
installCLI()
}
}
createSystemtray()
server()
})
if (store.get('first-time-run') && installed()) {
if (process.platform === 'darwin') {
app.dock.hide()
}
app.setLoginItemSettings({ openAtLogin: app.getLoginItemSettings().openAtLogin })
return
}
// This is the first run or the CLI is no longer installed
app.setLoginItemSettings({ openAtLogin: true })
firstRunWindow()
}
// Quit when all windows are closed, except on macOS. There, it's common
// for applications and their menu bar to stay active until the user quits
@@ -175,45 +277,26 @@ app.on('window-all-closed', () => {
}
})
// In this file you can include the rest of your app's specific main process
// code. You can also put them in separate files and import them here.
autoUpdater.setFeedURL({
url: `https://ollama.ai/api/update?os=${process.platform}&arch=${process.arch}&version=${app.getVersion()}`,
})
function id(): string {
const id = store.get('id') as string
async function heartbeat() {
analytics.track({
anonymousId: id(),
event: 'heartbeat',
properties: {
version: app.getVersion(),
},
})
if (id) {
return id
}
const uuid = uuidv4()
store.set('id', uuid)
return uuid
}
if (app.isPackaged) {
heartbeat()
autoUpdater.checkForUpdates()
setInterval(() => {
heartbeat()
autoUpdater.checkForUpdates()
}, 60 * 60 * 1000)
}
autoUpdater.setFeedURL({ url: updateURL })
autoUpdater.on('error', e => {
logger.error(`update check failed - ${e.message}`)
console.error(`update check failed - ${e.message}`)
})
autoUpdater.on('update-downloaded', (event, releaseNotes, releaseName) => {
dialog
.showMessageBox({
type: 'info',
buttons: ['Restart Now', 'Later'],
title: 'New update available',
message: process.platform === 'win32' ? releaseNotes : releaseName,
detail: 'A new version of Ollama is available. Restart to apply the update.',
})
.then(returnValue => {
if (returnValue.response === 0) autoUpdater.quitAndInstall()
})
autoUpdater.on('update-downloaded', () => {
updateAvailable = true
updateTray()
})

app/src/install.ts (new file, 21 lines)

@@ -0,0 +1,21 @@
import * as fs from 'fs'
import { exec as cbExec } from 'child_process'
import * as path from 'path'
import { promisify } from 'util'
const app = process && process.type === 'renderer' ? require('@electron/remote').app : require('electron').app
const ollama = app.isPackaged ? path.join(process.resourcesPath, 'ollama') : path.resolve(process.cwd(), '..', 'ollama')
const exec = promisify(cbExec)
const symlinkPath = '/usr/local/bin/ollama'
export function installed() {
return fs.existsSync(symlinkPath) && fs.readlinkSync(symlinkPath) === ollama
}
export async function install() {
const command = `do shell script "mkdir -p ${path.dirname(
symlinkPath
)} && ln -F -s \\"${ollama}\\" \\"${symlinkPath}\\"" with administrator privileges`
await exec(`osascript -e '${command}'`)
}

app/src/ollama.svg (new file, 9 lines, 17 KiB; diff suppressed because one or more lines are too long)

View File

@@ -1,19 +0,0 @@
import { Analytics } from '@segment/analytics-node'
import { v4 as uuidv4 } from 'uuid'
import Store from 'electron-store'
const store = new Store()
export const analytics = new Analytics({ writeKey: process.env.TELEMETRY_WRITE_KEY || '<empty>' })
export function id(): string {
const id = store.get('id') as string
if (id) {
return id
}
const uuid = uuidv4()
store.set('id', uuid)
return uuid
}

View File

@@ -28,4 +28,8 @@ export const rules: Required<ModuleOptions>['rules'] = [
},
},
},
{
test: /\.svg$/,
use: ['@svgr/webpack'],
},
]

File diff suppressed because it is too large.
cmd/interactive.go (new file, 555 lines)

@@ -0,0 +1,555 @@
package cmd
import (
"errors"
"fmt"
"io"
"net/http"
"os"
"regexp"
"strings"
"github.com/spf13/cobra"
"golang.org/x/exp/slices"
"github.com/jmorganca/ollama/api"
"github.com/jmorganca/ollama/readline"
)
type MultilineState int
const (
MultilineNone MultilineState = iota
MultilinePrompt
MultilineSystem
MultilineTemplate
)
func modelIsMultiModal(cmd *cobra.Command, name string) bool {
// get model details
client, err := api.ClientFromEnvironment()
if err != nil {
fmt.Println("error: couldn't connect to ollama server")
return false
}
req := api.ShowRequest{Name: name}
resp, err := client.Show(cmd.Context(), &req)
if err != nil {
return false
}
return slices.Contains(resp.Details.Families, "clip")
}
func generateInteractive(cmd *cobra.Command, opts runOptions) error {
multiModal := modelIsMultiModal(cmd, opts.Model)
// load the model
loadOpts := runOptions{
Model: opts.Model,
Prompt: "",
Messages: []api.Message{},
}
if _, err := chat(cmd, loadOpts); err != nil {
return err
}
usage := func() {
fmt.Fprintln(os.Stderr, "Available Commands:")
fmt.Fprintln(os.Stderr, " /set Set session variables")
fmt.Fprintln(os.Stderr, " /show Show model information")
fmt.Fprintln(os.Stderr, " /bye Exit")
fmt.Fprintln(os.Stderr, " /?, /help Help for a command")
fmt.Fprintln(os.Stderr, " /? shortcuts Help for keyboard shortcuts")
fmt.Fprintln(os.Stderr, "")
fmt.Fprintln(os.Stderr, "Use \"\"\" to begin a multi-line message.")
fmt.Fprintln(os.Stderr, "")
}
usageSet := func() {
fmt.Fprintln(os.Stderr, "Available Commands:")
fmt.Fprintln(os.Stderr, " /set parameter ... Set a parameter")
fmt.Fprintln(os.Stderr, " /set system <string> Set system message")
fmt.Fprintln(os.Stderr, " /set template <string> Set prompt template")
fmt.Fprintln(os.Stderr, " /set history Enable history")
fmt.Fprintln(os.Stderr, " /set nohistory Disable history")
fmt.Fprintln(os.Stderr, " /set wordwrap Enable wordwrap")
fmt.Fprintln(os.Stderr, " /set nowordwrap Disable wordwrap")
fmt.Fprintln(os.Stderr, " /set format json Enable JSON mode")
fmt.Fprintln(os.Stderr, " /set noformat Disable formatting")
fmt.Fprintln(os.Stderr, " /set verbose Show LLM stats")
fmt.Fprintln(os.Stderr, " /set quiet Disable LLM stats")
fmt.Fprintln(os.Stderr, "")
}
usageShortcuts := func() {
fmt.Fprintln(os.Stderr, "Available keyboard shortcuts:")
fmt.Fprintln(os.Stderr, " Ctrl + a Move to the beginning of the line (Home)")
fmt.Fprintln(os.Stderr, " Ctrl + e Move to the end of the line (End)")
fmt.Fprintln(os.Stderr, " Alt + b Move back (left) one word")
fmt.Fprintln(os.Stderr, " Alt + f Move forward (right) one word")
fmt.Fprintln(os.Stderr, " Ctrl + k Delete the sentence after the cursor")
fmt.Fprintln(os.Stderr, " Ctrl + u Delete the sentence before the cursor")
fmt.Fprintln(os.Stderr, "")
fmt.Fprintln(os.Stderr, " Ctrl + l Clear the screen")
fmt.Fprintln(os.Stderr, " Ctrl + c Stop the model from responding")
fmt.Fprintln(os.Stderr, " Ctrl + d Exit ollama (/bye)")
fmt.Fprintln(os.Stderr, "")
}
usageShow := func() {
fmt.Fprintln(os.Stderr, "Available Commands:")
fmt.Fprintln(os.Stderr, " /show info Show details for this model")
fmt.Fprintln(os.Stderr, " /show license Show model license")
fmt.Fprintln(os.Stderr, " /show modelfile Show Modelfile for this model")
fmt.Fprintln(os.Stderr, " /show parameters Show parameters for this model")
fmt.Fprintln(os.Stderr, " /show system Show system message")
fmt.Fprintln(os.Stderr, " /show template Show prompt template")
fmt.Fprintln(os.Stderr, "")
}
// only list out the most common parameters
usageParameters := func() {
fmt.Fprintln(os.Stderr, "Available Parameters:")
fmt.Fprintln(os.Stderr, " /set parameter seed <int> Random number seed")
fmt.Fprintln(os.Stderr, " /set parameter num_predict <int> Max number of tokens to predict")
fmt.Fprintln(os.Stderr, " /set parameter top_k <int> Pick from top k num of tokens")
fmt.Fprintln(os.Stderr, " /set parameter top_p <float> Pick token based on sum of probabilities")
fmt.Fprintln(os.Stderr, " /set parameter num_ctx <int> Set the context size")
fmt.Fprintln(os.Stderr, " /set parameter temperature <float> Set creativity level")
fmt.Fprintln(os.Stderr, " /set parameter repeat_penalty <float> How strongly to penalize repetitions")
fmt.Fprintln(os.Stderr, " /set parameter repeat_last_n <int> Set how far back to look for repetitions")
fmt.Fprintln(os.Stderr, " /set parameter num_gpu <int> The number of layers to send to the GPU")
fmt.Fprintln(os.Stderr, " /set parameter stop \"<string>\", ... Set the stop parameters")
fmt.Fprintln(os.Stderr, "")
}
scanner, err := readline.New(readline.Prompt{
Prompt: ">>> ",
AltPrompt: "... ",
Placeholder: "Send a message (/? for help)",
AltPlaceholder: `Use """ to end multi-line input`,
})
if err != nil {
return err
}
fmt.Print(readline.StartBracketedPaste)
defer fmt.Printf(readline.EndBracketedPaste)
var sb strings.Builder
var multiline MultilineState
opts.Messages = make([]api.Message, 0)
for {
line, err := scanner.Readline()
switch {
case errors.Is(err, io.EOF):
fmt.Println()
return nil
case errors.Is(err, readline.ErrInterrupt):
if line == "" {
fmt.Println("\nUse Ctrl + d or /bye to exit.")
}
scanner.Prompt.UseAlt = false
sb.Reset()
continue
case err != nil:
return err
}
switch {
case multiline != MultilineNone:
// check if there's a multiline terminating string
before, ok := strings.CutSuffix(line, `"""`)
sb.WriteString(before)
if !ok {
fmt.Fprintln(&sb)
continue
}
switch multiline {
case MultilineSystem:
opts.System = sb.String()
fmt.Println("Set system message.")
sb.Reset()
case MultilineTemplate:
opts.Template = sb.String()
fmt.Println("Set prompt template.")
sb.Reset()
}
multiline = MultilineNone
scanner.Prompt.UseAlt = false
case strings.HasPrefix(line, `"""`):
line := strings.TrimPrefix(line, `"""`)
line, ok := strings.CutSuffix(line, `"""`)
sb.WriteString(line)
if !ok {
// no multiline terminating string; need more input
fmt.Fprintln(&sb)
multiline = MultilinePrompt
scanner.Prompt.UseAlt = true
break
}
case scanner.Pasting:
fmt.Fprintln(&sb, line)
continue
case strings.HasPrefix(line, "/list"):
args := strings.Fields(line)
if err := ListHandler(cmd, args[1:]); err != nil {
return err
}
case strings.HasPrefix(line, "/set"):
args := strings.Fields(line)
if len(args) > 1 {
switch args[1] {
case "history":
scanner.HistoryEnable()
case "nohistory":
scanner.HistoryDisable()
case "wordwrap":
opts.WordWrap = true
fmt.Println("Set 'wordwrap' mode.")
case "nowordwrap":
opts.WordWrap = false
fmt.Println("Set 'nowordwrap' mode.")
case "verbose":
cmd.Flags().Set("verbose", "true")
fmt.Println("Set 'verbose' mode.")
case "quiet":
cmd.Flags().Set("verbose", "false")
fmt.Println("Set 'quiet' mode.")
case "format":
if len(args) < 3 || args[2] != "json" {
fmt.Println("Invalid or missing format. For 'json' mode use '/set format json'")
} else {
opts.Format = args[2]
fmt.Printf("Set format to '%s' mode.\n", args[2])
}
case "noformat":
opts.Format = ""
fmt.Println("Disabled format.")
case "parameter":
if len(args) < 4 {
usageParameters()
continue
}
params := args[3:]
fp, err := api.FormatParams(map[string][]string{args[2]: params})
if err != nil {
fmt.Printf("Couldn't set parameter: %q\n", err)
continue
}
fmt.Printf("Set parameter '%s' to '%s'\n", args[2], strings.Join(params, ", "))
opts.Options[args[2]] = fp[args[2]]
case "system", "template":
if len(args) < 3 {
usageSet()
continue
}
if args[1] == "system" {
multiline = MultilineSystem
} else if args[1] == "template" {
multiline = MultilineTemplate
}
line := strings.Join(args[2:], " ")
line, ok := strings.CutPrefix(line, `"""`)
if !ok {
multiline = MultilineNone
} else {
// only cut suffix if the line is multiline
line, ok = strings.CutSuffix(line, `"""`)
if ok {
multiline = MultilineNone
}
}
sb.WriteString(line)
if multiline != MultilineNone {
scanner.Prompt.UseAlt = true
continue
}
if args[1] == "system" {
opts.System = sb.String()
fmt.Println("Set system message.")
} else if args[1] == "template" {
opts.Template = sb.String()
fmt.Println("Set prompt template.")
}
sb.Reset()
continue
default:
fmt.Printf("Unknown command '/set %s'. Type /? for help\n", args[1])
}
} else {
usageSet()
}
case strings.HasPrefix(line, "/show"):
args := strings.Fields(line)
if len(args) > 1 {
client, err := api.ClientFromEnvironment()
if err != nil {
fmt.Println("error: couldn't connect to ollama server")
return err
}
req := &api.ShowRequest{
Name: opts.Model,
System: opts.System,
Template: opts.Template,
Options: opts.Options,
}
resp, err := client.Show(cmd.Context(), req)
if err != nil {
fmt.Println("error: couldn't get model")
return err
}
switch args[1] {
case "info":
fmt.Println("Model details:")
if len(resp.Details.Families) > 0 {
fmt.Printf("Family %s\n", strings.Join(resp.Details.Families, ", "))
} else if resp.Details.Family != "" {
fmt.Printf("Family %s\n", resp.Details.Family)
}
fmt.Printf("Parameter Size %s\n", resp.Details.ParameterSize)
fmt.Printf("Quantization Level %s\n", resp.Details.QuantizationLevel)
fmt.Println("")
case "license":
if resp.License == "" {
fmt.Println("No license was specified for this model.")
} else {
fmt.Println(resp.License)
}
case "modelfile":
fmt.Println(resp.Modelfile)
case "parameters":
if resp.Parameters == "" {
fmt.Println("No parameters were specified for this model.")
} else {
if len(opts.Options) > 0 {
fmt.Println("User defined parameters:")
for k, v := range opts.Options {
fmt.Printf("%-*s %v\n", 30, k, v)
}
fmt.Println()
}
fmt.Println("Model defined parameters:")
fmt.Println(resp.Parameters)
}
case "system":
switch {
case opts.System != "":
fmt.Println(opts.System + "\n")
case resp.System != "":
fmt.Println(resp.System + "\n")
default:
fmt.Println("No system message was specified for this model.")
}
case "template":
switch {
case opts.Template != "":
fmt.Println(opts.Template + "\n")
case resp.Template != "":
fmt.Println(resp.Template)
default:
fmt.Println("No prompt template was specified for this model.")
}
default:
fmt.Printf("Unknown command '/show %s'. Type /? for help\n", args[1])
}
} else {
usageShow()
}
case strings.HasPrefix(line, "/help"), strings.HasPrefix(line, "/?"):
args := strings.Fields(line)
if len(args) > 1 {
switch args[1] {
case "set", "/set":
usageSet()
case "show", "/show":
usageShow()
case "shortcut", "shortcuts":
usageShortcuts()
}
} else {
usage()
}
case line == "/exit", line == "/bye":
return nil
case strings.HasPrefix(line, "/"):
args := strings.Fields(line)
isFile := false
if multiModal {
for _, f := range extractFileNames(line) {
if strings.HasPrefix(f, args[0]) {
isFile = true
break
}
}
}
if !isFile {
fmt.Printf("Unknown command '%s'. Type /? for help\n", args[0])
continue
}
sb.WriteString(line)
default:
sb.WriteString(line)
}
if sb.Len() > 0 && multiline == MultilineNone {
newMessage := api.Message{Role: "user", Content: sb.String()}
if multiModal {
msg, images, err := extractFileData(sb.String())
if err != nil {
return err
}
newMessage.Content = msg
// reset the context if we find another image
if len(images) > 0 {
newMessage.Images = append(newMessage.Images, images...)
// reset the context for the new image
opts.Messages = []api.Message{}
} else {
if len(opts.Messages) > 1 {
newMessage.Images = append(newMessage.Images, opts.Messages[len(opts.Messages)-2].Images...)
}
}
if len(newMessage.Images) == 0 {
fmt.Println("This model requires you to add a jpeg, png, or svg image.")
fmt.Println()
sb.Reset()
continue
}
}
if opts.System != "" {
opts.Messages = append(opts.Messages, api.Message{Role: "system", Content: opts.System})
}
opts.Messages = append(opts.Messages, newMessage)
assistant, err := chat(cmd, opts)
if err != nil {
return err
}
if assistant != nil {
opts.Messages = append(opts.Messages, *assistant)
}
sb.Reset()
}
}
}
func normalizeFilePath(fp string) string {
// Define a map of escaped characters and their replacements
replacements := map[string]string{
"\\ ": " ", // Escaped space
"\\(": "(", // Escaped left parenthesis
"\\)": ")", // Escaped right parenthesis
"\\[": "[", // Escaped left square bracket
"\\]": "]", // Escaped right square bracket
"\\{": "{", // Escaped left curly brace
"\\}": "}", // Escaped right curly brace
"\\$": "$", // Escaped dollar sign
"\\&": "&", // Escaped ampersand
"\\;": ";", // Escaped semicolon
"\\'": "'", // Escaped single quote
"\\\\": "\\", // Escaped backslash
"\\*": "*", // Escaped asterisk
"\\?": "?", // Escaped question mark
}
for escaped, actual := range replacements {
fp = strings.ReplaceAll(fp, escaped, actual)
}
return fp
}
func extractFileNames(input string) []string {
// Regex to match file paths starting with optional drive letter, / ./ \ or .\ and include escaped or unescaped spaces (\ or %20)
// and followed by more characters and a file extension
// This will capture non filename strings, but we'll check for file existence to remove mismatches
regexPattern := `(?:[a-zA-Z]:)?(?:\./|/|\\)[\S\\ ]+?\.(?i:jpg|jpeg|png|svg)\b`
re := regexp.MustCompile(regexPattern)
return re.FindAllString(input, -1)
}
func extractFileData(input string) (string, []api.ImageData, error) {
filePaths := extractFileNames(input)
var imgs []api.ImageData
for _, fp := range filePaths {
nfp := normalizeFilePath(fp)
data, err := getImageData(nfp)
if err != nil {
if os.IsNotExist(err) {
continue
}
fmt.Printf("Couldn't process image: %q\n", err)
return "", imgs, err
}
fmt.Printf("Added image '%s'\n", nfp)
input = strings.ReplaceAll(input, fp, "")
imgs = append(imgs, data)
}
return input, imgs, nil
}
func getImageData(filePath string) ([]byte, error) {
file, err := os.Open(filePath)
if err != nil {
return nil, err
}
defer file.Close()
buf := make([]byte, 512)
_, err = file.Read(buf)
if err != nil {
return nil, err
}
contentType := http.DetectContentType(buf)
allowedTypes := []string{"image/jpeg", "image/jpg", "image/svg+xml", "image/png"}
if !slices.Contains(allowedTypes, contentType) {
return nil, fmt.Errorf("invalid image type: %s", contentType)
}
info, err := file.Stat()
if err != nil {
return nil, err
}
// Check if the file size exceeds 100MB
var maxSize int64 = 100 * 1024 * 1024 // 100MB in bytes
if info.Size() > maxSize {
return nil, fmt.Errorf("file size exceeds maximum limit (100MB)")
}
buf = make([]byte, info.Size())
_, err = file.Seek(0, 0)
if err != nil {
return nil, err
}
_, err = io.ReadFull(file, buf)
if err != nil {
return nil, err
}
return buf, nil
}

cmd/interactive_test.go (new file, 51 lines)

@@ -0,0 +1,51 @@
package cmd
import (
"testing"
"github.com/stretchr/testify/assert"
)
func TestExtractFilenames(t *testing.T) {
// Unix style paths
input := ` some preamble
./relative\ path/one.png inbetween1 ./not a valid two.jpg inbetween2
/unescaped space /three.jpeg inbetween3 /valid\ path/dir/four.png "./quoted with spaces/five.svg`
res := extractFileNames(input)
assert.Len(t, res, 5)
assert.Contains(t, res[0], "one.png")
assert.Contains(t, res[1], "two.jpg")
assert.Contains(t, res[2], "three.jpeg")
assert.Contains(t, res[3], "four.png")
assert.Contains(t, res[4], "five.svg")
assert.NotContains(t, res[4], '"')
assert.NotContains(t, res, "inbtween")
// Windows style paths
input = ` some preamble
c:/users/jdoe/one.png inbetween1 c:/program files/someplace/two.jpg inbetween2
/absolute/nospace/three.jpeg inbetween3 /absolute/with space/four.png inbetween4
./relative\ path/five.svg inbetween5 "./relative with/spaces/six.png inbetween6
d:\path with\spaces\seven.svg inbetween7 c:\users\jdoe\eight.png inbetween8
d:\program files\someplace\nine.png inbetween9 "E:\program files\someplace\ten.svg some ending
`
res = extractFileNames(input)
assert.Len(t, res, 10)
assert.NotContains(t, res, "inbtween")
assert.Contains(t, res[0], "one.png")
assert.Contains(t, res[0], "c:")
assert.Contains(t, res[1], "two.jpg")
assert.Contains(t, res[1], "c:")
assert.Contains(t, res[2], "three.jpeg")
assert.Contains(t, res[3], "four.png")
assert.Contains(t, res[4], "five.svg")
assert.Contains(t, res[5], "six.png")
assert.Contains(t, res[6], "seven.svg")
assert.Contains(t, res[6], "d:")
assert.Contains(t, res[7], "eight.png")
assert.Contains(t, res[7], "c:")
assert.Contains(t, res[8], "nine.png")
assert.Contains(t, res[8], "d:")
assert.Contains(t, res[9], "ten.svg")
assert.Contains(t, res[9], "E:")
}

docs/README.md (new file, 25 lines)

@@ -0,0 +1,25 @@
# Documentation
To get started, see the project's **[quickstart](../README.md#quickstart)**.
Ollama is a tool for running AI models on your hardware. Many users will choose to use the Command Line Interface (CLI) to work with Ollama. Learn more about all the commands in the CLI in the **[Main Readme](../README.md)**.
The RESTful API can be used from any language, including Python, JavaScript, TypeScript, Go, Rust, and many more. Learn more about using the API in the **[API Documentation](./api.md)**.
Create new models or modify models already in the library using the Modelfile. Learn more about the Modelfile syntax in the **[Modelfile Documentation](./modelfile.md)**.
Import models using source model weights found on Hugging Face and similar sites by referring to the **[Import Documentation](./import.md)**.
In most cases, installing on Linux is easy using the script on Ollama.ai. To get more detail about the install, including CUDA drivers, see the **[Linux Documentation](./linux.md)**.
Many of our users like the flexibility of using our official Docker Image. Learn more about using Docker with Ollama using the **[Docker Documentation](https://hub.docker.com/r/ollama/ollama)**.
It is easy to install on Linux and Mac, but many users will choose to build Ollama on their own. To do this, refer to the **[Development Documentation](./development.md)**.
If you encounter a problem with Ollama, the best place to start is the logs. Find more information about them in the **[Troubleshooting Guide](./troubleshooting.md)**.
Finally, for all the questions that don't fit anywhere else, there is the **[FAQ](./faq.md)**.
[Tutorials](./tutorials.md) apply the documentation to tasks.
For working code examples of using Ollama, see [Examples](../examples).

docs/api.md (new file, 982 lines)

@@ -0,0 +1,982 @@
# API
## Endpoints
- [Generate a completion](#generate-a-completion)
- [Generate a chat completion](#generate-a-chat-completion)
- [Create a Model](#create-a-model)
- [List Local Models](#list-local-models)
- [Show Model Information](#show-model-information)
- [Copy a Model](#copy-a-model)
- [Delete a Model](#delete-a-model)
- [Pull a Model](#pull-a-model)
- [Push a Model](#push-a-model)
- [Generate Embeddings](#generate-embeddings)
## Conventions
### Model names
Model names follow a `model:tag` format, where `model` can have an optional namespace such as `example/model`. Some examples are `orca-mini:3b-q4_1` and `llama2:70b`. The tag is optional and, if not provided, will default to `latest`. The tag is used to identify a specific version.
### Durations
All durations are returned in nanoseconds.
### Streaming responses
Certain endpoints stream responses as JSON objects and can optionally return non-streamed responses.
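For instance, a streamed response can be consumed by decoding one JSON object at a time until an object reports `"done": true`. The sketch below is a minimal Go client for the `/api/generate` endpoint described in the next section, using only the standard library; the model name is illustrative.
```go
package main

// Minimal sketch: consume a streaming response by decoding consecutive JSON
// objects from the response body until one reports "done": true.
import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	body := []byte(`{"model": "llama2", "prompt": "Why is the sky blue?"}`)
	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	dec := json.NewDecoder(resp.Body)
	for {
		var chunk struct {
			Response string `json:"response"`
			Done     bool   `json:"done"`
		}
		if err := dec.Decode(&chunk); err != nil {
			break // io.EOF once the stream ends
		}
		fmt.Print(chunk.Response) // print tokens as they arrive
		if chunk.Done {
			break
		}
	}
}
```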
## Generate a completion
```shell
POST /api/generate
```
Generate a response for a given prompt with a provided model. This is a streaming endpoint, so there will be a series of responses. The final response object will include statistics and additional data from the request.
### Parameters
- `model`: (required) the [model name](#model-names)
- `prompt`: the prompt to generate a response for
- `images`: (optional) a list of base64-encoded images (for multimodal models such as `llava`)
Advanced parameters (optional):
- `format`: the format to return a response in. Currently the only accepted value is `json`
- `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.md#valid-parameters-and-values) such as `temperature`
- `system`: system message (overrides what is defined in the `Modelfile`)
- `template`: the prompt template to use (overrides what is defined in the `Modelfile`)
- `context`: the context parameter returned from a previous request to `/generate`; this can be used to keep a short conversational memory
- `stream`: if `false` the response will be returned as a single response object, rather than a stream of objects
- `raw`: if `true` no formatting will be applied to the prompt. You may choose to use the `raw` parameter if you are specifying a full templated prompt in your request to the API.
#### JSON mode
Enable JSON mode by setting the `format` parameter to `json`. This will structure the response as a valid JSON object. See the JSON mode [example](#generate-request-json-mode) below.
> Note: it's important to instruct the model to use JSON in the `prompt`. Otherwise, the model may generate large amounts of whitespace.
### Examples
#### Generate request (Streaming)
##### Request
```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama2",
"prompt": "Why is the sky blue?"
}'
```
##### Response
A stream of JSON objects is returned:
```json
{
"model": "llama2",
"created_at": "2023-08-04T08:52:19.385406455-07:00",
"response": "The",
"done": false
}
```
The final response in the stream also includes additional data about the generation:
- `total_duration`: time spent generating the response
- `load_duration`: time spent in nanoseconds loading the model
- `prompt_eval_count`: number of tokens in the prompt
- `prompt_eval_duration`: time spent in nanoseconds evaluating the prompt
- `eval_count`: number of tokens in the response
- `eval_duration`: time in nanoseconds spent generating the response
- `context`: an encoding of the conversation used in this response; this can be sent in the next request to keep a conversational memory
- `response`: empty if the response was streamed; if not streamed, this will contain the full response
To calculate how fast the response is generated in tokens per second (token/s), divide `eval_count` by `eval_duration` and multiply by `10^9`, since durations are reported in nanoseconds (a worked example follows the response below).
```json
{
"model": "llama2",
"created_at": "2023-08-04T19:22:45.499127Z",
"response": "",
"done": true,
"context": [1, 2, 3],
"total_duration": 10706818083,
"load_duration": 6338219291,
"prompt_eval_count": 26,
"prompt_eval_duration": 130079000,
"eval_count": 259,
"eval_duration": 4232710000
}
```
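For example, using the final response above: `259 / 4232710000 * 10^9 ≈ 61.2` tokens per second.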
#### Request (No streaming)
##### Request
A response can be received in one reply when streaming is off.
```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama2",
"prompt": "Why is the sky blue?",
"stream": false
}'
```
##### Response
If `stream` is set to `false`, the response will be a single JSON object:
```json
{
"model": "llama2",
"created_at": "2023-08-04T19:22:45.499127Z",
"response": "The sky is blue because it is the color of the sky.",
"done": true,
"context": [1, 2, 3],
"total_duration": 5043500667,
"load_duration": 5025959,
"prompt_eval_count": 26,
"prompt_eval_duration": 325953000,
"eval_count": 290,
"eval_duration": 4709213000
}
```
#### Request (JSON mode)
> When `format` is set to `json`, the output will always be a well-formed JSON object. It's important to also instruct the model to respond in JSON.
##### Request
```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama2",
"prompt": "What color is the sky at different times of the day? Respond using JSON",
"format": "json",
"stream": false
}'
```
##### Response
```json
{
"model": "llama2",
"created_at": "2023-11-09T21:07:55.186497Z",
"response": "{\n\"morning\": {\n\"color\": \"blue\"\n},\n\"noon\": {\n\"color\": \"blue-gray\"\n},\n\"afternoon\": {\n\"color\": \"warm gray\"\n},\n\"evening\": {\n\"color\": \"orange\"\n}\n}\n",
"done": true,
"context": [1, 2, 3],
"total_duration": 4648158584,
"load_duration": 4071084,
"prompt_eval_count": 36,
"prompt_eval_duration": 439038000,
"eval_count": 180,
"eval_duration": 4196918000
}
```
The value of `response` will be a string containing JSON similar to:
```json
{
"morning": {
"color": "blue"
},
"noon": {
"color": "blue-gray"
},
"afternoon": {
"color": "warm gray"
},
"evening": {
"color": "orange"
}
}
```
#### Request (with images)
To submit images to multimodal models such as `llava` or `bakllava`, provide a list of base64-encoded `images`:
##### Request
```shell
curl http://localhost:11434/api/generate -d '{
"model": "llava",
"prompt":"What is in this picture?",
"stream": false,
"images": ["iVBORw0KGgoAAAANSUhEUgAAAG0AAABmCAYAAADBPx+VAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAA3VSURBVHgB7Z27r0zdG8fX743i1bi1ikMoFMQloXRpKFFIqI7LH4BEQ+NWIkjQuSWCRIEoULk0gsK1kCBI0IhrQVT7tz/7zZo888yz1r7MnDl7z5xvsjkzs2fP3uu71nNfa7lkAsm7d++Sffv2JbNmzUqcc8m0adOSzZs3Z+/XES4ZckAWJEGWPiCxjsQNLWmQsWjRIpMseaxcuTKpG/7HP27I8P79e7dq1ars/yL4/v27S0ejqwv+cUOGEGGpKHR37tzJCEpHV9tnT58+dXXCJDdECBE2Ojrqjh071hpNECjx4cMHVycM1Uhbv359B2F79+51586daxN/+pyRkRFXKyRDAqxEp4yMlDDzXG1NPnnyJKkThoK0VFd1ELZu3TrzXKxKfW7dMBQ6bcuWLW2v0VlHjx41z717927ba22U9APcw7Nnz1oGEPeL3m3p2mTAYYnFmMOMXybPPXv2bNIPpFZr1NHn4HMw0KRBjg9NuRw95s8PEcz/6DZELQd/09C9QGq5RsmSRybqkwHGjh07OsJSsYYm3ijPpyHzoiacg35MLdDSIS/O1yM778jOTwYUkKNHWUzUWaOsylE00MyI0fcnOwIdjvtNdW/HZwNLGg+sR1kMepSNJXmIwxBZiG8tDTpEZzKg0GItNsosY8USkxDhD0Rinuiko2gfL/RbiD2LZAjU9zKQJj8RDR0vJBR1/Phx9+PHj9Z7REF4nTZkxzX4LCXHrV271qXkBAPGfP/atWvu/PnzHe4C97F48eIsRLZ9+3a3f/9+87dwP1JxaF7/3r17ba+5l4EcaVo0lj3SBq5kGTJSQmLWMjgYNei2GPT1MuMqGTDEFHzeQSP2wi/jGnkmPJ/nhccs44jvDAxpVcxnq0F6eT8h4ni/iIWpR5lPyA6ETkNXoSukvpJAD3AsXLiwpZs49+fPn5ke4j10TqYvegSfn0OnafC+Tv9ooA/JPkgQysqQNBzagXY55nO/oa1F7qvIPWkRL12WRpMWUvpVDYmxAPehxWSe8ZEXL20sadYIozfmNch4QJPAfeJgW3rNsnzphBKNJM2KKODo1rVOMRYik5ETy3ix4qWNI81qAAirizgMIc+yhTytx0JWZuNI03qsrgWlGtwjoS9XwgUhWGyhUaRZZQNNIEwCiXD16tXcAHUs79co0vSD8rrJCIW98pzvxpAWyyo3HYwqS0+H0BjStClcZJT5coMm6D2LOF8TolGJtK9fvyZpyiC5ePFi9nc/oJU4eiEP0jVoAnHa9wyJycITMP78+eMeP37sXrx44d6+fdt6f82aNdkx1pg9e3Zb5W+RSRE+n+VjksQWifvVaTKFhn5O8my63K8Qabdv33b379/PiAP//vuvW7BggZszZ072/+TJk91YgkafPn166zXB1rQHFvouAWHq9z3SEevSUerqCn2/dDCeta2jxYbr69evk4MHDyY7d+7MjhMnTiTPnz9Pfv/+nfQT2ggpO2dMF8cghuoM7Ygj5iWCqRlGFml0QC/ftGmTmzt3rmsaKDsgBSPh0/8yPeLLBihLkOKJc0jp8H8vUzcxIA1k6QJ/c78tWEyj5P3o4u9+jywNPdJi5rAH9x0KHcl4Hg570eQp3+vHXGyrmEeigzQsQsjavXt38ujRo44LQuDDhw+TW7duRS1HGgMxhNXHgflaNTOsHyKvHK5Ijo2jbFjJBQK9YwFd6RVMzfgRBmEfP37suBBm/p49e1qjEP2mwTViNRo0VJWH1deMXcNK08uUjVUu7s/zRaL+oLNxz1bpANco4npUgX4G2eFbpDFyQoQxojBCpEGSytmOH8qrH5Q9vuzD6ofQylkCUmh8DBAr+q8JCyVNtWQIidKQE9wNtLSQnS4jDSsxNHogzFuQBw4cyM61UKVsjfr3ooBkPSqqQHesUPWVtzi9/vQi1T+rJj7WiTz4Pt/l3LxUkr5P2VYZaZ4URpsE+st/dujQoaBBYokbrz/8TJNQYLSonrPS9kUaSkPeZyj1AWSj+d+VBoy1pIWVNed8P0Ll/ee5HdGRhrHhR5GGN0r4LGZBaj8oFDJitBTJzIZgFcmU0Y8ytWMZMzJOaXUSrUs5RxKnrxmbb5YXO9VGUhtpXldhEUogFr3IzIsvlpmdosVcGVGXFWp2oU9kLFL3dEkSz6NHEY1sjSRdIuDFWEhd8KxFqsRi1uM/nz9/zpxnwlESONdg6dKlbsaMGS4EHFHtjFIDHwKOo46l4TxSuxgDzi+rE2jg+BaFruOX4HXa0Nnf1lwAPufZeF8/r6zD97WK2qFnGjBxTw5qNGPxT+5T/r7/7RawFC3j4vTp09koCxkeHjqbHJqArmH5UrFKKksnxrK7FuRIs8STfBZv+luugXZ2pR/pP9Ois4z+TiMzUUkUjD0iEi1fzX8GmXyuxUBRcaUfykV0YZnlJGKQpOiGB76x5GeWkWWJc3mOrK6S7xdND+W5N6XyaRgtWJFe13GkaZnKOsYqGdOVVVbGupsyA/l7emTLHi7vwTdirNEt0qxnzAvBFcnQF16xh/TMpUuXHDowhlA9vQVraQhkudRdzOnK+04ZSP3DUhVSP61YsaLtd/ks7ZgtPcXqPqEafHkdqa84X6aCeL7YWlv6edGFHb+ZFICPlljHhg0bKuk0CSvVznWsotRu433alNdFrqG45ejoaPCaUkWERpLXjzFL2Rpllp7PJU2a/v7Ab8N05/9t27Z16KUqoFGsxnI9EosS2niSYg9SpU6B4JgTrvVW1flt1sT+0ADIJU2maXzcUTraGCRaL1Wp9rUMk16PMom8QhruxzvZIegJjFU7LLCePfS8uaQdPny4jTTL0dbee5mYokQsXTIWNY46kuMbnt8Kmec+LGWtOVIl9cT1rCB0V8WqkjAsRwta93TbwNYoGKsUSChN44lgBNCoHLHzquYKrU6qZ8lolCIN0Rh6cP0Q3U6I6IXILYOQI513hJaSKAorFpuHXJNfVlpRtmYBk1Su1obZr5dnKAO+L10Hrj3WZW+E3qh6IszE37F6EB+68mGpvKm4eb9bFrlzrok7fvr0Kfv727dvWRmdVTJHw0qiiCUSZ6wCK+7XL/AcsgNyL74DQQ730sv78Su7+t/A36MdY0sW5o40ahslXr58aZ5HtZB8GH64m9EmMZ7FpYw4T6QnrZfgenrhFxaSiSGXtPnz57e9TkNZLvTjeqhr734CNtrK41L40sUQckmj1lGKQ0rC37x544r8eNXRpnVE3ZZY7zXo8NomiO0ZUCj2uHz58rbXoZ6gc0uA+F6ZeKS/jhRDUq8MKrTho9fEkihMmhxtBI1DxKFY9XLpVcSkfoi8JGnToZO5sU5aiDQIW716ddt7ZLYtMQlhECdBGXZZMWldY5BHm5xgAroWj4C0hbYkSc/jBmggIrXJWlZM6pSETsEPGqZOndr2uuuR5rF
169a2HoHPdurUKZM4CO1WTPqaDaAd+GFGKdIQkxAn9RuEWcTRyN2KSUgiSgF5aWzPTeA/lN5rZubMmR2bE4SIC4nJoltgAV/dVefZm72AtctUCJU2CMJ327hxY9t7EHbkyJFseq+EJSY16RPo3Dkq1kkr7+q0bNmyDuLQcZBEPYmHVdOBiJyIlrRDq41YPWfXOxUysi5fvtyaj+2BpcnsUV/oSoEMOk2CQGlr4ckhBwaetBhjCwH0ZHtJROPJkyc7UjcYLDjmrH7ADTEBXFfOYmB0k9oYBOjJ8b4aOYSe7QkKcYhFlq3QYLQhSidNmtS2RATwy8YOM3EQJsUjKiaWZ+vZToUQgzhkHXudb/PW5YMHD9yZM2faPsMwoc7RciYJXbGuBqJ1UIGKKLv915jsvgtJxCZDubdXr165mzdvtr1Hz5LONA8jrUwKPqsmVesKa49S3Q4WxmRPUEYdTjgiUcfUwLx589ySJUva3oMkP6IYddq6HMS4o55xBJBUeRjzfa4Zdeg56QZ43LhxoyPo7Lf1kNt7oO8wWAbNwaYjIv5lhyS7kRf96dvm5Jah8vfvX3flyhX35cuX6HfzFHOToS1H4BenCaHvO8pr8iDuwoUL7tevX+b5ZdbBair0xkFIlFDlW4ZknEClsp/TzXyAKVOmmHWFVSbDNw1l1+4f90U6IY/q4V27dpnE9bJ+v87QEydjqx/UamVVPRG+mwkNTYN+9tjkwzEx+atCm/X9WvWtDtAb68Wy9LXa1UmvCDDIpPkyOQ5ZwSzJ4jMrvFcr0rSjOUh+GcT4LSg5ugkW1Io0/SCDQBojh0hPlaJdah+tkVYrnTZowP8iq1F1TgMBBauufyB33x1v+NWFYmT5KmppgHC+NkAgbmRkpD3yn9QIseXymoTQFGQmIOKTxiZIWpvAatenVqRVXf2nTrAWMsPnKrMZHz6bJq5jvce6QK8J1cQNgKxlJapMPdZSR64/UivS9NztpkVEdKcrs5alhhWP9NeqlfWopzhZScI6QxseegZRGeg5a8C3Re1Mfl1ScP36ddcUaMuv24iOJtz7sbUjTS4qBvKmstYJoUauiuD3k5qhyr7QdUHMeCgLa1Ear9NquemdXgmum4fvJ6w1lqsuDhNrg1qSpleJK7K3TF0Q2jSd94uSZ60kK1e3qyVpQK6PVWXp2/FC3mp6jBhKKOiY2h3gtUV64TWM6wDETRPLDfSakXmH3w8g9Jlug8ZtTt4kVF0kLUYYmCCtD/DrQ5YhMGbA9L3ucdjh0y8kOHW5gU/VEEmJTcL4Pz/f7mgoAbYkAAAAAElFTkSuQmCC"]
}'
```
##### Response
```json
{
"model": "llava",
"created_at": "2023-11-03T15:36:02.583064Z",
"response": "A happy cartoon character, which is cute and cheerful.",
"done": true,
"context": [1, 2, 3],
"total_duration": 2938432250,
"load_duration": 2559292,
"prompt_eval_count": 1,
"prompt_eval_duration": 2195557000,
"eval_count": 44,
"eval_duration": 736432000
}
```
#### Request (Raw Mode)
In some cases, you may wish to bypass the templating system and provide a full prompt. In this case, you can use the `raw` parameter to disable templating. Also note that raw mode will not return a context.
##### Request
```shell
curl http://localhost:11434/api/generate -d '{
"model": "mistral",
"prompt": "[INST] why is the sky blue? [/INST]",
"raw": true,
"stream": false
}'
```
##### Response
```json
{
"model": "mistral",
"created_at": "2023-11-03T15:36:02.583064Z",
"response": " The sky appears blue because of a phenomenon called Rayleigh scattering.",
"done": true,
"total_duration": 8493852375,
"load_duration": 6589624375,
"prompt_eval_count": 14,
"prompt_eval_duration": 119039000,
"eval_count": 110,
"eval_duration": 1779061000
}
```
#### Generate request (With options)
If you want to set custom options for the model at runtime rather than in the Modelfile, you can do so with the `options` parameter. This example sets every available option, but you can set any of them individually and omit the ones you do not want to override.
##### Request
```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama2",
"prompt": "Why is the sky blue?",
"stream": false,
"options": {
"num_keep": 5,
"seed": 42,
"num_predict": 100,
"top_k": 20,
"top_p": 0.9,
"tfs_z": 0.5,
"typical_p": 0.7,
"repeat_last_n": 33,
"temperature": 0.8,
"repeat_penalty": 1.2,
"presence_penalty": 1.5,
"frequency_penalty": 1.0,
"mirostat": 1,
"mirostat_tau": 0.8,
"mirostat_eta": 0.6,
"penalize_newline": true,
"stop": ["\n", "user:"],
"numa": false,
"num_ctx": 1024,
"num_batch": 2,
"num_gqa": 1,
"num_gpu": 1,
"main_gpu": 0,
"low_vram": false,
"f16_kv": true,
"vocab_only": false,
"use_mmap": true,
"use_mlock": false,
"embedding_only": false,
"rope_frequency_base": 1.1,
"rope_frequency_scale": 0.8,
"num_thread": 8
}
}'
```
##### Response
```json
{
"model": "llama2",
"created_at": "2023-08-04T19:22:45.499127Z",
"response": "The sky is blue because it is the color of the sky.",
"done": true,
"context": [1, 2, 3],
"total_duration": 4935886791,
"load_duration": 534986708,
"prompt_eval_count": 26,
"prompt_eval_duration": 107345000,
"eval_count": 237,
"eval_duration": 4289432000
}
```
#### Load a model
If an empty prompt is provided, the model will be loaded into memory.
##### Request
```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama2"
}'
```
##### Response
A single JSON object is returned:
```json
{
"model": "llama2",
"created_at": "2023-12-18T19:52:07.071755Z",
"response": "",
"done": true
}
```
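To wrap up this endpoint, the sketch below shows a non-streaming request issued from Go using only the standard library. The field names are the ones documented above; the model name is illustrative.
```go
package main

// Minimal sketch: call /api/generate with "stream": false, read the single
// JSON response object, then report tokens per second.
import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	reqBody, _ := json.Marshal(map[string]any{
		"model":  "llama2",
		"prompt": "Why is the sky blue?",
		"stream": false,
	})
	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(reqBody))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out struct {
		Response     string `json:"response"`
		EvalCount    int    `json:"eval_count"`
		EvalDuration int64  `json:"eval_duration"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println(out.Response)
	if out.EvalDuration > 0 {
		fmt.Printf("%.1f tokens/s\n", float64(out.EvalCount)/float64(out.EvalDuration)*1e9)
	}
}
```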
## Generate a chat completion
```shell
POST /api/chat
```
Generate the next message in a chat with a provided model. This is a streaming endpoint, so there will be a series of responses. Streaming can be disabled using `"stream": false`. The final response object will include statistics and additional data from the request.
### Parameters
- `model`: (required) the [model name](#model-names)
- `messages`: the messages of the chat; this can be used to keep a chat memory
The `message` object has the following fields:
- `role`: the role of the message, either `system`, `user` or `assistant`
- `content`: the content of the message
- `images` (optional): a list of images to include in the message (for multimodal models such as `llava`)
Advanced parameters (optional):
- `format`: the format to return a response in. Currently the only accepted value is `json`
- `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.md#valid-parameters-and-values) such as `temperature`
- `template`: the prompt template to use (overrides what is defined in the `Modelfile`)
- `stream`: if `false` the response will be returned as a single response object, rather than a stream of objects
### Examples
#### Chat Request (Streaming)
##### Request
Send a chat message with a streaming response.
```shell
curl http://localhost:11434/api/chat -d '{
"model": "llama2",
"messages": [
{
"role": "user",
"content": "why is the sky blue?"
}
]
}'
```
##### Response
A stream of JSON objects is returned:
```json
{
"model": "llama2",
"created_at": "2023-08-04T08:52:19.385406455-07:00",
"message": {
"role": "assistant",
"content": "The",
"images": null
},
"done": false
}
```
Final response:
```json
{
"model": "llama2",
"created_at": "2023-08-04T19:22:45.499127Z",
"done": true,
"total_duration": 4883583458,
"load_duration": 1334875,
"prompt_eval_count": 26,
"prompt_eval_duration": 342546000,
"eval_count": 282,
"eval_duration": 4535599000
}
```
#### Chat request (No streaming)
##### Request
```shell
curl http://localhost:11434/api/chat -d '{
"model": "llama2",
"messages": [
{
"role": "user",
"content": "why is the sky blue?"
}
],
"stream": false
}'
```
##### Response
```json
{
"model": "registry.ollama.ai/library/llama2:latest",
"created_at": "2023-12-12T14:13:43.416799Z",
"message": {
"role": "assistant",
"content": "Hello! How are you today?"
},
"done": true,
"total_duration": 5191566416,
"load_duration": 2154458,
"prompt_eval_count": 26,
"prompt_eval_duration": 383809000,
"eval_count": 298,
"eval_duration": 4799921000
}
```
#### Chat request (With History)
Send a chat message with a conversation history. You can use this same approach to start the conversation using multi-shot or chain-of-thought prompting.
##### Request
```shell
curl http://localhost:11434/api/chat -d '{
"model": "llama2",
"messages": [
{
"role": "user",
"content": "why is the sky blue?"
},
{
"role": "assistant",
"content": "due to rayleigh scattering."
},
{
"role": "user",
"content": "how is that different than mie scattering?"
}
]
}'
```
##### Response
A stream of JSON objects is returned:
```json
{
"model": "llama2",
"created_at": "2023-08-04T08:52:19.385406455-07:00",
"message": {
"role": "assistant",
"content": "The"
},
"done": false
}
```
Final response:
```json
{
"model": "llama2",
"created_at": "2023-08-04T19:22:45.499127Z",
"done": true,
"total_duration": 8113331500,
"load_duration": 6396458,
"prompt_eval_count": 61,
"prompt_eval_duration": 398801000,
"eval_count": 468,
"eval_duration": 7701267000
}
```
#### Chat request (with images)
##### Request
Send a chat message with an image included in the message. Images should be provided as base64-encoded strings in the `images` field.
```shell
curl http://localhost:11434/api/chat -d '{
"model": "llava",
"messages": [
{
"role": "user",
"content": "what is in this image?",
"images": ["iVBORw0KGgoAAAANSUhEUgAAAG0AAABmCAYAAADBPx+VAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAA3VSURBVHgB7Z27r0zdG8fX743i1bi1ikMoFMQloXRpKFFIqI7LH4BEQ+NWIkjQuSWCRIEoULk0gsK1kCBI0IhrQVT7tz/7zZo888yz1r7MnDl7z5xvsjkzs2fP3uu71nNfa7lkAsm7d++Sffv2JbNmzUqcc8m0adOSzZs3Z+/XES4ZckAWJEGWPiCxjsQNLWmQsWjRIpMseaxcuTKpG/7HP27I8P79e7dq1ars/yL4/v27S0ejqwv+cUOGEGGpKHR37tzJCEpHV9tnT58+dXXCJDdECBE2Ojrqjh071hpNECjx4cMHVycM1Uhbv359B2F79+51586daxN/+pyRkRFXKyRDAqxEp4yMlDDzXG1NPnnyJKkThoK0VFd1ELZu3TrzXKxKfW7dMBQ6bcuWLW2v0VlHjx41z717927ba22U9APcw7Nnz1oGEPeL3m3p2mTAYYnFmMOMXybPPXv2bNIPpFZr1NHn4HMw0KRBjg9NuRw95s8PEcz/6DZELQd/09C9QGq5RsmSRybqkwHGjh07OsJSsYYm3ijPpyHzoiacg35MLdDSIS/O1yM778jOTwYUkKNHWUzUWaOsylE00MyI0fcnOwIdjvtNdW/HZwNLGg+sR1kMepSNJXmIwxBZiG8tDTpEZzKg0GItNsosY8USkxDhD0Rinuiko2gfL/RbiD2LZAjU9zKQJj8RDR0vJBR1/Phx9+PHj9Z7REF4nTZkxzX4LCXHrV271qXkBAPGfP/atWvu/PnzHe4C97F48eIsRLZ9+3a3f/9+87dwP1JxaF7/3r17ba+5l4EcaVo0lj3SBq5kGTJSQmLWMjgYNei2GPT1MuMqGTDEFHzeQSP2wi/jGnkmPJ/nhccs44jvDAxpVcxnq0F6eT8h4ni/iIWpR5lPyA6ETkNXoSukvpJAD3AsXLiwpZs49+fPn5ke4j10TqYvegSfn0OnafC+Tv9ooA/JPkgQysqQNBzagXY55nO/oa1F7qvIPWkRL12WRpMWUvpVDYmxAPehxWSe8ZEXL20sadYIozfmNch4QJPAfeJgW3rNsnzphBKNJM2KKODo1rVOMRYik5ETy3ix4qWNI81qAAirizgMIc+yhTytx0JWZuNI03qsrgWlGtwjoS9XwgUhWGyhUaRZZQNNIEwCiXD16tXcAHUs79co0vSD8rrJCIW98pzvxpAWyyo3HYwqS0+H0BjStClcZJT5coMm6D2LOF8TolGJtK9fvyZpyiC5ePFi9nc/oJU4eiEP0jVoAnHa9wyJycITMP78+eMeP37sXrx44d6+fdt6f82aNdkx1pg9e3Zb5W+RSRE+n+VjksQWifvVaTKFhn5O8my63K8Qabdv33b379/PiAP//vuvW7BggZszZ072/+TJk91YgkafPn166zXB1rQHFvouAWHq9z3SEevSUerqCn2/dDCeta2jxYbr69evk4MHDyY7d+7MjhMnTiTPnz9Pfv/+nfQT2ggpO2dMF8cghuoM7Ygj5iWCqRlGFml0QC/ftGmTmzt3rmsaKDsgBSPh0/8yPeLLBihLkOKJc0jp8H8vUzcxIA1k6QJ/c78tWEyj5P3o4u9+jywNPdJi5rAH9x0KHcl4Hg570eQp3+vHXGyrmEeigzQsQsjavXt38ujRo44LQuDDhw+TW7duRS1HGgMxhNXHgflaNTOsHyKvHK5Ijo2jbFjJBQK9YwFd6RVMzfgRBmEfP37suBBm/p49e1qjEP2mwTViNRo0VJWH1deMXcNK08uUjVUu7s/zRaL+oLNxz1bpANco4npUgX4G2eFbpDFyQoQxojBCpEGSytmOH8qrH5Q9vuzD6ofQylkCUmh8DBAr+q8JCyVNtWQIidKQE9wNtLSQnS4jDSsxNHogzFuQBw4cyM61UKVsjfr3ooBkPSqqQHesUPWVtzi9/vQi1T+rJj7WiTz4Pt/l3LxUkr5P2VYZaZ4URpsE+st/dujQoaBBYokbrz/8TJNQYLSonrPS9kUaSkPeZyj1AWSj+d+VBoy1pIWVNed8P0Ll/ee5HdGRhrHhR5GGN0r4LGZBaj8oFDJitBTJzIZgFcmU0Y8ytWMZMzJOaXUSrUs5RxKnrxmbb5YXO9VGUhtpXldhEUogFr3IzIsvlpmdosVcGVGXFWp2oU9kLFL3dEkSz6NHEY1sjSRdIuDFWEhd8KxFqsRi1uM/nz9/zpxnwlESONdg6dKlbsaMGS4EHFHtjFIDHwKOo46l4TxSuxgDzi+rE2jg+BaFruOX4HXa0Nnf1lwAPufZeF8/r6zD97WK2qFnGjBxTw5qNGPxT+5T/r7/7RawFC3j4vTp09koCxkeHjqbHJqArmH5UrFKKksnxrK7FuRIs8STfBZv+luugXZ2pR/pP9Ois4z+TiMzUUkUjD0iEi1fzX8GmXyuxUBRcaUfykV0YZnlJGKQpOiGB76x5GeWkWWJc3mOrK6S7xdND+W5N6XyaRgtWJFe13GkaZnKOsYqGdOVVVbGupsyA/l7emTLHi7vwTdirNEt0qxnzAvBFcnQF16xh/TMpUuXHDowhlA9vQVraQhkudRdzOnK+04ZSP3DUhVSP61YsaLtd/ks7ZgtPcXqPqEafHkdqa84X6aCeL7YWlv6edGFHb+ZFICPlljHhg0bKuk0CSvVznWsotRu433alNdFrqG45ejoaPCaUkWERpLXjzFL2Rpllp7PJU2a/v7Ab8N05/9t27Z16KUqoFGsxnI9EosS2niSYg9SpU6B4JgTrvVW1flt1sT+0ADIJU2maXzcUTraGCRaL1Wp9rUMk16PMom8QhruxzvZIegJjFU7LLCePfS8uaQdPny4jTTL0dbee5mYokQsXTIWNY46kuMbnt8Kmec+LGWtOVIl9cT1rCB0V8WqkjAsRwta93TbwNYoGKsUSChN44lgBNCoHLHzquYKrU6qZ8lolCIN0Rh6cP0Q3U6I6IXILYOQI513hJaSKAorFpuHXJNfVlpRtmYBk1Su1obZr5dnKAO+L10Hrj3WZW+E3qh6IszE37F6EB+68mGpvKm4eb9bFrlzrok7fvr0Kfv727dvWRmdVTJHw0qiiCUSZ6wCK+7XL/AcsgNyL74DQQ730sv78Su7+t/A36MdY0sW5o40ahslXr58aZ5HtZB8GH64m9EmMZ7FpYw4T6QnrZfgenrhFxaSiSGXtPnz57e9TkNZLvTjeqhr734CNtrK41L40sUQckmj1lGKQ0rC37x544r8eNXRpnVE3ZZY7zXo8NomiO0ZUCj2uHz58rbXoZ6gc0uA+F6ZeKS/jhRDUq8MKrTho9fEkihMmhxtBI1DxKFY9XLpVcSkfoi8JGnToZO5sU5aiDQIW716ddt7ZLYtMQlhECdBGXZZMWldY5BHm5xgAroWj4C0hbYkSc/jBmggIrXJWlZM6pSETsEPGqZOndr2uuuR5rF
169a2HoHPdurUKZM4CO1WTPqaDaAd+GFGKdIQkxAn9RuEWcTRyN2KSUgiSgF5aWzPTeA/lN5rZubMmR2bE4SIC4nJoltgAV/dVefZm72AtctUCJU2CMJ327hxY9t7EHbkyJFseq+EJSY16RPo3Dkq1kkr7+q0bNmyDuLQcZBEPYmHVdOBiJyIlrRDq41YPWfXOxUysi5fvtyaj+2BpcnsUV/oSoEMOk2CQGlr4ckhBwaetBhjCwH0ZHtJROPJkyc7UjcYLDjmrH7ADTEBXFfOYmB0k9oYBOjJ8b4aOYSe7QkKcYhFlq3QYLQhSidNmtS2RATwy8YOM3EQJsUjKiaWZ+vZToUQgzhkHXudb/PW5YMHD9yZM2faPsMwoc7RciYJXbGuBqJ1UIGKKLv915jsvgtJxCZDubdXr165mzdvtr1Hz5LONA8jrUwKPqsmVesKa49S3Q4WxmRPUEYdTjgiUcfUwLx589ySJUva3oMkP6IYddq6HMS4o55xBJBUeRjzfa4Zdeg56QZ43LhxoyPo7Lf1kNt7oO8wWAbNwaYjIv5lhyS7kRf96dvm5Jah8vfvX3flyhX35cuX6HfzFHOToS1H4BenCaHvO8pr8iDuwoUL7tevX+b5ZdbBair0xkFIlFDlW4ZknEClsp/TzXyAKVOmmHWFVSbDNw1l1+4f90U6IY/q4V27dpnE9bJ+v87QEydjqx/UamVVPRG+mwkNTYN+9tjkwzEx+atCm/X9WvWtDtAb68Wy9LXa1UmvCDDIpPkyOQ5ZwSzJ4jMrvFcr0rSjOUh+GcT4LSg5ugkW1Io0/SCDQBojh0hPlaJdah+tkVYrnTZowP8iq1F1TgMBBauufyB33x1v+NWFYmT5KmppgHC+NkAgbmRkpD3yn9QIseXymoTQFGQmIOKTxiZIWpvAatenVqRVXf2nTrAWMsPnKrMZHz6bJq5jvce6QK8J1cQNgKxlJapMPdZSR64/UivS9NztpkVEdKcrs5alhhWP9NeqlfWopzhZScI6QxseegZRGeg5a8C3Re1Mfl1ScP36ddcUaMuv24iOJtz7sbUjTS4qBvKmstYJoUauiuD3k5qhyr7QdUHMeCgLa1Ear9NquemdXgmum4fvJ6w1lqsuDhNrg1qSpleJK7K3TF0Q2jSd94uSZ60kK1e3qyVpQK6PVWXp2/FC3mp6jBhKKOiY2h3gtUV64TWM6wDETRPLDfSakXmH3w8g9Jlug8ZtTt4kVF0kLUYYmCCtD/DrQ5YhMGbA9L3ucdjh0y8kOHW5gU/VEEmJTcL4Pz/f7mgoAbYkAAAAAElFTkSuQmCC"]
}
]
}'
```
##### Response
```json
{
"model": "llava",
"created_at": "2023-12-13T22:42:50.203334Z",
"message": {
"role": "assistant",
"content": " The image features a cute, little pig with an angry facial expression. It's wearing a heart on its shirt and is waving in the air. This scene appears to be part of a drawing or sketching project.",
"images": null
},
"done": true,
"total_duration": 1668506709,
"load_duration": 1986209,
"prompt_eval_count": 26,
"prompt_eval_duration": 359682000,
"eval_count": 83,
"eval_duration": 1303285000
}
```
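As a hedged illustration (not from the docs), a client could base64-encode a local file and send it in the `images` field like this, again assuming Python with the `requests` library; the filename is hypothetical.
```python
import base64

import requests  # assumed third-party dependency

# Encode a local image file (hypothetical path) as base64.
with open("image.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llava",
        "messages": [
            {
                "role": "user",
                "content": "what is in this image?",
                "images": [image_b64],
            }
        ],
        "stream": False,
    },
)
print(resp.json()["message"]["content"])
```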
## Create a Model
```shell
POST /api/create
```
Create a model from a [`Modelfile`](./modelfile.md). It is recommended to set `modelfile` to the content of the Modelfile rather than just setting `path`. This is a requirement for remote create. Remote model creation must also explicitly create any file blobs referenced by fields such as `FROM` and `ADAPTER` using [Create a Blob](#create-a-blob), and set those fields to the path indicated in the blob response.
### Parameters
- `name`: name of the model to create
- `modelfile` (optional): contents of the Modelfile
- `stream`: (optional) if `false` the response will be returned as a single response object, rather than a stream of objects
- `path` (optional): path to the Modelfile
### Examples
#### Create a new model
Create a new model from a `Modelfile`.
##### Request
```shell
curl http://localhost:11434/api/create -d '{
"name": "mario",
"modelfile": "FROM llama2\nSYSTEM You are mario from Super Mario Bros."
}'
```
##### Response
A stream of JSON objects is returned. The final JSON object shows `"status": "success"`.
```json
{"status":"reading model metadata"}
{"status":"creating system layer"}
{"status":"using already created layer sha256:22f7f8ef5f4c791c1b03d7eb414399294764d7cc82c7e94aa81a1feb80a983a2"}
{"status":"using already created layer sha256:8c17c2ebb0ea011be9981cc3922db8ca8fa61e828c5d3f44cb6ae342bf80460b"}
{"status":"using already created layer sha256:7c23fb36d80141c4ab8cdbb61ee4790102ebd2bf7aeff414453177d4f2110e5d"}
{"status":"using already created layer sha256:2e0493f67d0c8c9c68a8aeacdf6a38a2151cb3c4c1d42accf296e19810527988"}
{"status":"using already created layer sha256:2759286baa875dc22de5394b4a925701b1896a7e3f8e53275c36f75a877a82c9"}
{"status":"writing layer sha256:df30045fe90f0d750db82a058109cecd6d4de9c90a3d75b19c09e5f64580bb42"}
{"status":"writing layer sha256:f18a68eb09bf925bb1b669490407c1b1251c5db98dc4d3d81f3088498ea55690"}
{"status":"writing manifest"}
{"status":"success"}
```
### Check if a Blob Exists
```shell
HEAD /api/blobs/:digest
```
Ensures that the file blob used for a `FROM` or `ADAPTER` field exists on the server. This checks your Ollama server and not Ollama.ai.
#### Query Parameters
- `digest`: the SHA256 digest of the blob
#### Examples
##### Request
```shell
curl -I http://localhost:11434/api/blobs/sha256:29fdb92e57cf0827ded04ae6461b5931d01fa595843f55d36f5b275a52087dd2
```
##### Response
Returns 200 OK if the blob exists, or 404 Not Found if it does not.
### Create a Blob
```shell
POST /api/blobs/:digest
```
Create a blob from a file on the server. Returns the server file path.
#### Query Parameters
- `digest`: the expected SHA256 digest of the file
#### Examples
##### Request
```shell
curl -T model.bin -X POST http://localhost:11434/api/blobs/sha256:29fdb92e57cf0827ded04ae6461b5931d01fa595843f55d36f5b275a52087dd2
```
##### Response
Returns 201 Created if the blob was successfully created, or 400 Bad Request if the digest used is not expected.
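Putting the two blob endpoints together, a sketch of a client-side upload flow might look like the following. This is illustrative only: it assumes Python with the `requests` library and a hypothetical local `model.bin`, computes the file's SHA256 digest, checks for the blob with `HEAD`, and uploads it with `POST` only if it is missing.
```python
import hashlib

import requests  # assumed third-party dependency

path = "model.bin"  # hypothetical file

# Compute the sha256 digest of the file in 1 MiB chunks.
h = hashlib.sha256()
with open(path, "rb") as f:
    for block in iter(lambda: f.read(1 << 20), b""):
        h.update(block)
digest = f"sha256:{h.hexdigest()}"

url = f"http://localhost:11434/api/blobs/{digest}"
if requests.head(url).status_code == 404:
    with open(path, "rb") as f:
        r = requests.post(url, data=f)
    r.raise_for_status()  # expect 201 Created on success
```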
## List Local Models
```shell
GET /api/tags
```
List models that are available locally.
### Examples
#### Request
```shell
curl http://localhost:11434/api/tags
```
#### Response
A single JSON object will be returned.
```json
{
"models": [
{
"name": "codellama:13b",
"modified_at": "2023-11-04T14:56:49.277302595-07:00",
"size": 7365960935,
"digest": "9f438cb9cd581fc025612d27f7c1a6669ff83a8bb0ed86c94fcf4c5440555697",
"details": {
"format": "gguf",
"family": "llama",
"families": null,
"parameter_size": "13B",
"quantization_level": "Q4_0"
}
},
{
"name": "llama2:latest",
"modified_at": "2023-12-07T09:32:18.757212583-08:00",
"size": 3825819519,
"digest": "fe938a131f40e6f6d40083c9f0f430a515233eb2edaa6d72eb85c50d64f2300e",
"details": {
"format": "gguf",
"family": "llama",
"families": null,
"parameter_size": "7B",
"quantization_level": "Q4_0"
}
}
]
}
```
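As a small usage sketch (assuming Python with the `requests` library), the same information can be read programmatically:
```python
import requests  # assumed third-party dependency

models = requests.get("http://localhost:11434/api/tags").json()["models"]
for m in models:
    # `size` is reported in bytes.
    print(f"{m['name']}  {m['size'] / 1e9:.1f} GB  {m['details']['quantization_level']}")
```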
## Show Model Information
```shell
POST /api/show
```
Show information about a model including details, modelfile, template, parameters, license, and system prompt.
### Parameters
- `name`: name of the model to show
### Examples
#### Request
```shell
curl http://localhost:11434/api/show -d '{
"name": "llama2"
}'
```
#### Response
```json
{
"modelfile": "# Modelfile generated by \"ollama show\"\n# To build a new Modelfile based on this one, replace the FROM line with:\n# FROM llava:latest\n\nFROM /Users/matt/.ollama/models/blobs/sha256:200765e1283640ffbd013184bf496e261032fa75b99498a9613be4e94d63ad52\nTEMPLATE \"\"\"{{ .System }}\nUSER: {{ .Prompt }}\nASSSISTANT: \"\"\"\nPARAMETER num_ctx 4096\nPARAMETER stop \"\u003c/s\u003e\"\nPARAMETER stop \"USER:\"\nPARAMETER stop \"ASSSISTANT:\"",
"parameters": "num_ctx 4096\nstop \u003c/s\u003e\nstop USER:\nstop ASSSISTANT:",
"template": "{{ .System }}\nUSER: {{ .Prompt }}\nASSSISTANT: ",
"details": {
"format": "gguf",
"family": "llama",
"families": ["llama", "clip"],
"parameter_size": "7B",
"quantization_level": "Q4_0"
}
}
```
## Copy a Model
```shell
POST /api/copy
```
Copy a model. Creates a model with another name from an existing model.
### Examples
#### Request
```shell
curl http://localhost:11434/api/copy -d '{
"source": "llama2",
"destination": "llama2-backup"
}'
```
#### Response
Returns a 200 OK if successful, or a 404 Not Found if the source model doesn't exist.
## Delete a Model
```shell
DELETE /api/delete
```
Delete a model and its data.
### Parameters
- `name`: model name to delete
### Examples
#### Request
```shell
curl -X DELETE http://localhost:11434/api/delete -d '{
"name": "llama2:13b"
}'
```
#### Response
Returns a 200 OK if successful, 404 Not Found if the model to be deleted doesn't exist.
## Pull a Model
```shell
POST /api/pull
```
Download a model from the ollama library. Cancelled pulls are resumed from where they left off, and multiple calls will share the same download progress.
### Parameters
- `name`: name of the model to pull
- `insecure`: (optional) allow insecure connections to the library. Only use this if you are pulling from your own library during development.
- `stream`: (optional) if `false` the response will be returned as a single response object, rather than a stream of objects
### Examples
#### Request
```shell
curl http://localhost:11434/api/pull -d '{
"name": "llama2"
}'
```
#### Response
If `stream` is not specified, or set to `true`, a stream of JSON objects is returned:
The first object is the manifest:
```json
{
"status": "pulling manifest"
}
```
Then there is a series of downloading responses. Until a download is completed, the `completed` key may not be included. The number of files to be downloaded depends on the number of layers specified in the manifest.
```json
{
"status": "downloading digestname",
"digest": "digestname",
"total": 2142590208,
"completed": 241970
}
```
After all the files are downloaded, the final responses are:
```json
{
"status": "verifying sha256 digest"
}
{
"status": "writing manifest"
}
{
"status": "removing any unused layers"
}
{
"status": "success"
}
```
If `stream` is set to `false`, then the response is a single JSON object:
```json
{
"status": "success"
}
```
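A rough progress display can be derived from the `completed` and `total` fields in the streamed objects. A minimal sketch, assuming Python with the `requests` library:
```python
import json

import requests  # assumed third-party dependency

resp = requests.post(
    "http://localhost:11434/api/pull",
    json={"name": "llama2"},
    stream=True,
)
for line in resp.iter_lines():
    if not line:
        continue
    update = json.loads(line)
    if "total" in update and "completed" in update:
        pct = 100 * update["completed"] / update["total"]
        print(f"{update['status']}: {pct:.1f}%")
    else:
        print(update["status"])
```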
## Push a Model
```shell
POST /api/push
```
Upload a model to a model library. Requires registering for ollama.ai and adding a public key first.
### Parameters
- `name`: name of the model to push in the form of `<namespace>/<model>:<tag>`
- `insecure`: (optional) allow insecure connections to the library. Only use this if you are pushing to your library during development.
- `stream`: (optional) if `false` the response will be returned as a single response object, rather than a stream of objects
### Examples
#### Request
```shell
curl http://localhost:11434/api/push -d '{
"name": "mattw/pygmalion:latest"
}'
```
#### Response
If `stream` is not specified, or set to `true`, a stream of JSON objects is returned:
```json
{ "status": "retrieving manifest" }
```
and then:
```json
{
"status": "starting upload",
"digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
"total": 1928429856
}
```
Then there is a series of uploading responses:
```json
{
"status": "starting upload",
"digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
"total": 1928429856
}
```
Finally, when the upload is complete:
```json
{"status":"pushing manifest"}
{"status":"success"}
```
If `stream` is set to `false`, then the response is a single JSON object:
```json
{ "status": "success" }
```
## Generate Embeddings
```shell
POST /api/embeddings
```
Generate embeddings from a model.
### Parameters
- `model`: name of model to generate embeddings from
- `prompt`: text to generate embeddings for
Advanced parameters:
- `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.md#valid-parameters-and-values) such as `temperature`
### Examples
#### Request
```shell
curl http://localhost:11434/api/embeddings -d '{
"model": "llama2",
"prompt": "Here is an article about llamas..."
}'
```
#### Response
```json
{
"embedding": [
0.5670403838157654, 0.009260174818336964, 0.23178744316101074, -0.2916173040866852, -0.8924556970596313,
0.8785552978515625, -0.34576427936553955, 0.5742510557174683, -0.04222835972905159, -0.137906014919281
]
}
```
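Embeddings are typically compared with a similarity measure such as cosine similarity. A hedged sketch of that workflow, assuming Python with the `requests` library and illustrative prompts:
```python
import math

import requests  # assumed third-party dependency


def embed(prompt: str) -> list[float]:
    r = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "llama2", "prompt": prompt},
    )
    return r.json()["embedding"]


a = embed("Here is an article about llamas...")
b = embed("Llamas are members of the camelid family.")

# Cosine similarity: dot product divided by the product of the vector norms.
dot = sum(x * y for x, y in zip(a, b))
cos = dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
print(f"cosine similarity: {cos:.3f}")
```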

docs/development.md
Install required tools:

- cmake version 3.24 or higher
- go version 1.21 or higher
- gcc version 11.4.0 or higher

```
brew install cmake go node
```

```bash
brew install go cmake gcc
```

Then run `make`:

```
make
```

Optionally enable debugging and more verbose logging:

```bash
# At build time
export CGO_CFLAGS="-g"

# At runtime
export OLLAMA_DEBUG=1
```

Get the required libraries and build the native LLM code:

```bash
go generate ./...
```

Then build ollama:

```bash
go build .
```

Now you can run `ollama`:

```bash
./ollama
```

## Releasing

To release a new version of Ollama you'll need to set some environment variables:

* `GITHUB_TOKEN`: your GitHub token
* `APPLE_IDENTITY`: the Apple signing identity (macOS only)
* `APPLE_ID`: your Apple ID
* `APPLE_PASSWORD`: your Apple ID app-specific password
* `APPLE_TEAM_ID`: the Apple team ID for the signing identity
* `TELEMETRY_WRITE_KEY`: segment write key for telemetry

Then run the publish script with the target version:

```
VERSION=0.0.2 ./scripts/publish.sh
```

### Linux

#### Linux CUDA (NVIDIA)

*Your operating system distribution may already have packages for NVIDIA CUDA. Distro packages are often preferable, but instructions are distro-specific. Please consult distro-specific docs for dependencies if available!*

Install `cmake` and `golang` as well as [NVIDIA CUDA](https://developer.nvidia.com/cuda-downloads) development and runtime packages.

Typically the build scripts will auto-detect CUDA, however, if your Linux distro or installation approach uses unusual paths, you can specify the location by specifying an environment variable `CUDA_LIB_DIR` to the location of the shared libraries, and `CUDACXX` to the location of the nvcc compiler.

Then generate dependencies:

```
go generate ./...
```

Then build the binary:

```
go build .
```
#### Linux ROCm (AMD)
*Your operating system distribution may already have packages for AMD ROCm and CLBlast. Distro packages are often preferable, but instructions are distro-specific. Please consult distro-specific docs for dependencies if available!*
Install [CLBlast](https://github.com/CNugteren/CLBlast/blob/master/doc/installation.md) and [ROCm](https://rocm.docs.amd.com/en/latest/deploy/linux/quick_start.html) development packages first, as well as `cmake` and `golang`.
Typically the build scripts will auto-detect ROCm, however, if your Linux distro
or installation approach uses unusual paths, you can specify the location by
specifying an environment variable `ROCM_PATH` to the location of the ROCm
install (typically `/opt/rocm`), and `CLBlast_DIR` to the location of the
CLBlast install (typically `/usr/lib/cmake/CLBlast`).
```
go generate ./...
```
Then build the binary:
```
go build .
```
ROCm requires elevated privileges to access the GPU at runtime. On most distros you can add your user account to the `render` group, or run as root.
#### Advanced CPU Settings
By default, running `go generate ./...` will compile a few different variations
of the LLM library based on common CPU families and vector math capabilities,
including a lowest-common-denominator which should run on almost any 64 bit CPU
somewhat slowly. At runtime, Ollama will auto-detect the optimal variation to
load. If you would like to build a CPU-based build customized for your
processor, you can set `OLLAMA_CUSTOM_CPU_DEFS` to the llama.cpp flags you would
like to use. For example, to compile an optimized binary for an Intel i9-9880H,
you might use:
```
OLLAMA_CUSTOM_CPU_DEFS="-DLLAMA_AVX=on -DLLAMA_AVX2=on -DLLAMA_F16C=on -DLLAMA_FMA=on" go generate ./...
go build .
```
#### Containerized Linux Build
If you have Docker available, you can build Linux binaries with `./scripts/build_linux.sh`, which has the CUDA and ROCm dependencies included. The resulting binary is placed in `./dist`.
### Windows
Note: The Windows build for Ollama is still under development.
Install required tools:
- MSVC toolchain - C/C++ and cmake as minimal requirements
- go version 1.21 or higher
- MinGW (pick one variant) with GCC.
- <https://www.mingw-w64.org/>
- <https://www.msys2.org/>
```powershell
$env:CGO_ENABLED="1"
go generate ./...
go build .
```
#### Windows CUDA (NVIDIA)
In addition to the common Windows development tools described above, install:
- [NVIDIA CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html)

docs/faq.md
# FAQ
## How can I upgrade Ollama?
To upgrade Ollama, run the installation process again. On the Mac, click the Ollama icon in the menubar and choose the restart option if an update is available.
## How can I view the logs?
Review the [Troubleshooting](./troubleshooting.md) docs for more about using logs.
## How do I use Ollama server environment variables on Mac?
On macOS, Ollama runs in the background and is managed by the menubar app. If adding environment variables, Ollama will need to be run manually.
1. Click the menubar icon for Ollama and choose **Quit Ollama**.
2. Open a new terminal window and run the following command (this example uses `OLLAMA_HOST` with an IP address of `123.1.1.1`):
```bash
OLLAMA_HOST=123.1.1.1 ollama serve
```
## How do I use Ollama server environment variables on Linux?
If Ollama is installed with the install script, a systemd service was created, running as the Ollama user. To add an environment variable, such as OLLAMA_HOST, follow these steps:
1. Create a `systemd` drop-in directory and add a config file. This is only needed once.
```bash
mkdir -p /etc/systemd/system/ollama.service.d
echo '[Service]' >>/etc/systemd/system/ollama.service.d/environment.conf
```
2. For each environment variable, add it to the config file:
```bash
echo 'Environment="OLLAMA_HOST=0.0.0.0:11434"' >>/etc/systemd/system/ollama.service.d/environment.conf
```
3. Reload `systemd` and restart Ollama:
```bash
systemctl daemon-reload
systemctl restart ollama
```
## How can I expose Ollama on my network?
Ollama binds to 127.0.0.1 port 11434 by default. Change the bind address with the `OLLAMA_HOST` environment variable. Refer to the section above for how to use environment variables on your platform.
## How can I allow additional web origins to access Ollama?
Ollama allows cross-origin requests from `127.0.0.1` and `0.0.0.0` by default. Add additional origins with the `OLLAMA_ORIGINS` environment variable. For example, to add all ports on 192.168.1.1 and https://example.com, use:
```shell
OLLAMA_ORIGINS=http://192.168.1.1:*,https://example.com
```
Refer to the section above for how to use environment variables on your platform.
## Where are models stored?
- macOS: `~/.ollama/models`.
- Linux: `/usr/share/ollama/.ollama/models`
## How do I set them to a different location?
If a different directory needs to be used, set the environment variable `OLLAMA_MODELS` to the chosen directory. Refer to the section above for how to use environment variables on your platform.
## Does Ollama send my prompts and answers back to Ollama.ai to use in any way?
No, Ollama runs entirely locally, and conversation data will never leave your machine.
## How can I use Ollama in Visual Studio Code?
There is already a large collection of plugins available for VSCode as well as other editors that leverage Ollama. See the list of [extensions & plugins](https://github.com/jmorganca/ollama#extensions--plugins) at the bottom of the main repository readme.
## How do I use Ollama behind a proxy?
Ollama is compatible with proxy servers if `HTTP_PROXY` or `HTTPS_PROXY` is configured. When using either variable, ensure it is set where `ollama serve` can access the value. When using `HTTPS_PROXY`, ensure the proxy certificate is installed as a system certificate. Refer to the section above for how to use environment variables on your platform.
### How do I use Ollama behind a proxy in Docker?
The Ollama Docker container image can be configured to use a proxy by passing `-e HTTPS_PROXY=https://proxy.example.com` when starting the container.
Alternatively, the Docker daemon can be configured to use a proxy. Instructions are available for Docker Desktop on [macOS](https://docs.docker.com/desktop/settings/mac/#proxies), [Windows](https://docs.docker.com/desktop/settings/windows/#proxies), and [Linux](https://docs.docker.com/desktop/settings/linux/#proxies), and Docker [daemon with systemd](https://docs.docker.com/config/daemon/systemd/#httphttps-proxy).
Ensure the certificate is installed as a system certificate when using HTTPS. This may require a new Docker image when using a self-signed certificate.
```dockerfile
FROM ollama/ollama
COPY my-ca.pem /usr/local/share/ca-certificates/my-ca.crt
RUN update-ca-certificates
```
Build and run this image:
```shell
docker build -t ollama-with-ca .
docker run -d -e HTTPS_PROXY=https://my.proxy.example.com -p 11434:11434 ollama-with-ca
```
## How do I use Ollama with GPU acceleration in Docker?
The Ollama Docker container can be configured with GPU acceleration in Linux or Windows (with WSL2). This requires the [nvidia-container-toolkit](https://github.com/NVIDIA/nvidia-container-toolkit). See [ollama/ollama](https://hub.docker.com/r/ollama/ollama) for more details.
GPU acceleration is not available for Docker Desktop in macOS due to the lack of GPU passthrough and emulation.
## Why is networking slow in WSL2 on Windows 10?
This can impact both installing Ollama and downloading models.
Open `Control Panel > Networking and Internet > View network status and tasks` and click on `Change adapter settings` on the left panel. Find the `vEthernet (WSL)` adapter, right-click and select `Properties`.
Click on `Configure` and open the `Advanced` tab. Search through each of the properties until you find `Large Send Offload Version 2 (IPv4)` and `Large Send Offload Version 2 (IPv6)`. *Disable* both of these properties.

docs/import.md
# Import a model
This guide walks through importing a GGUF, PyTorch or Safetensors model.
## Importing (GGUF)
### Step 1: Write a `Modelfile`
Start by creating a `Modelfile`. This file is the blueprint for your model, specifying weights, parameters, prompt templates and more.
```
FROM ./mistral-7b-v0.1.Q4_0.gguf
```
(Optional) many chat models require a prompt template in order to answer correctly. A default prompt template can be specified with the `TEMPLATE` instruction in the `Modelfile`:
```
FROM ./q4_0.bin
TEMPLATE "[INST] {{ .Prompt }} [/INST]"
```
### Step 2: Create the Ollama model
Finally, create a model from your `Modelfile`:
```
ollama create example -f Modelfile
```
### Step 3: Run your model
Next, test the model with `ollama run`:
```
ollama run example "What is your favourite condiment?"
```
## Importing (PyTorch & Safetensors)
### Supported models
Ollama supports a set of model architectures, with support for more coming soon:
- Llama & Mistral
- Falcon & RW
- BigCode
To view a model's architecture, check the `config.json` file in its HuggingFace repo. You should see an entry under `architectures` (e.g. `LlamaForCausalLM`).
### Step 1: Clone the HuggingFace repository (optional)
If the model is currently hosted in a HuggingFace repository, first clone that repository to download the raw model.
```
git lfs install
git clone https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1
cd Mistral-7B-Instruct-v0.1
```
### Step 2: Convert and quantize to a `.bin` file (optional, for PyTorch and Safetensors)
If the model is in PyTorch or Safetensors format, a [Docker image](https://hub.docker.com/r/ollama/quantize) with the tooling required to convert and quantize models is available.
First, Install [Docker](https://www.docker.com/get-started/).
Next, to convert and quantize your model, run:
```
docker run --rm -v .:/model ollama/quantize -q q4_0 /model
```
This will output two files into the directory:
- `f16.bin`: the model converted to GGUF
- `q4_0.bin`: the model quantized to a 4-bit quantization (Ollama will use this file to create the Ollama model)
### Step 3: Write a `Modelfile`
Next, create a `Modelfile` for your model:
```
FROM ./q4_0.bin
```
(Optional) many chat models require a prompt template in order to answer correctly. A default prompt template can be specified with the `TEMPLATE` instruction in the `Modelfile`:
```
FROM ./q4_0.bin
TEMPLATE "[INST] {{ .Prompt }} [/INST]"
```
### Step 4: Create the Ollama model
Finally, create a model from your `Modelfile`:
```
ollama create example -f Modelfile
```
### Step 5: Run your model
Next, test the model with `ollama run`:
```
ollama run example "What is your favourite condiment?"
```
## Publishing your model (optional early alpha)
Publishing models is in early alpha. If you'd like to publish your model to share with others, follow these steps:
1. Create [an account](https://ollama.ai/signup)
2. Run `cat ~/.ollama/id_ed25519.pub` to view your Ollama public key. Copy this to the clipboard.
3. Add your public key to your [Ollama account](https://ollama.ai/settings/keys)
Next, copy your model to your username's namespace:
```
ollama cp example <your username>/example
```
Then push the model:
```
ollama push <your username>/example
```
After publishing, your model will be available at `https://ollama.ai/<your username>/example`.
## Quantization reference
The quantization options are as follows (from highest to lowest level of quantization). Note: some architectures such as Falcon do not support K quants.
- `q2_K`
- `q3_K`
- `q3_K_S`
- `q3_K_M`
- `q3_K_L`
- `q4_0` (recommended)
- `q4_1`
- `q4_K`
- `q4_K_S`
- `q4_K_M`
- `q5_0`
- `q5_1`
- `q5_K`
- `q5_K_S`
- `q5_K_M`
- `q6_K`
- `q8_0`
- `f16`
## Manually converting & quantizing models
### Prerequisites
Start by cloning the `llama.cpp` repo to your machine in another directory:
```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
```
Next, install the Python dependencies:
```
pip install -r requirements.txt
```
Finally, build the `quantize` tool:
```
make quantize
```
### Convert the model
Run the correct conversion script for your model architecture:
```shell
# LlamaForCausalLM or MistralForCausalLM
python convert.py <path to model directory>
# FalconForCausalLM
python convert-falcon-hf-to-gguf.py <path to model directory>
# GPTBigCodeForCausalLM
python convert-starcoder-hf-to-gguf.py <path to model directory>
```
### Quantize the model
```
quantize <path to model dir>/ggml-model-f32.bin <path to model dir>/q4_0.bin q4_0
```

docs/linux.md
# Ollama on Linux
## Install
Install Ollama by running this one-liner:
```bash
curl https://ollama.ai/install.sh | sh
```
## Manual install
### Download the `ollama` binary
Ollama is distributed as a self-contained binary. Download it to a directory in your PATH:
```bash
sudo curl -L https://ollama.ai/download/ollama-linux-amd64 -o /usr/bin/ollama
sudo chmod +x /usr/bin/ollama
```
### Adding Ollama as a startup service (recommended)
Create a user for Ollama:
```bash
sudo useradd -r -s /bin/false -m -d /usr/share/ollama ollama
```
Create a service file in `/etc/systemd/system/ollama.service`:
```ini
[Unit]
Description=Ollama Service
After=network-online.target
[Service]
ExecStart=/usr/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
[Install]
WantedBy=default.target
```
Then start the service:
```bash
sudo systemctl daemon-reload
sudo systemctl enable ollama
```
### Install CUDA drivers (optional for Nvidia GPUs)
[Download and install](https://developer.nvidia.com/cuda-downloads) CUDA.
Verify that the drivers are installed by running the following command, which should print details about your GPU:
```bash
nvidia-smi
```
### Start Ollama
Start Ollama using `systemd`:
```bash
sudo systemctl start ollama
```
## Update
Update ollama by running the install script again:
```bash
curl https://ollama.ai/install.sh | sh
```
Or by downloading the ollama binary:
```bash
sudo curl -L https://ollama.ai/download/ollama-linux-amd64 -o /usr/bin/ollama
sudo chmod +x /usr/bin/ollama
```
## Viewing logs
To view logs of Ollama running as a startup service, run:
```bash
journalctl -u ollama
```
## Uninstall
Remove the ollama service:
```bash
sudo systemctl stop ollama
sudo systemctl disable ollama
sudo rm /etc/systemd/system/ollama.service
```
Remove the ollama binary from your bin directory (either `/usr/local/bin`, `/usr/bin`, or `/bin`):
```bash
sudo rm $(which ollama)
```
Remove the downloaded models and Ollama service user and group:
```bash
sudo rm -r /usr/share/ollama
sudo userdel ollama
sudo groupdel ollama
```

docs/modelfile.md
# Ollama Model File
> Note: `Modelfile` syntax is in development
A model file is the blueprint to create and share models with Ollama.
## Table of Contents
- [Format](#format)
- [Examples](#examples)
- [Instructions](#instructions)
- [FROM (Required)](#from-required)
- [Build from llama2](#build-from-llama2)
- [Build from a bin file](#build-from-a-bin-file)
- [PARAMETER](#parameter)
- [Valid Parameters and Values](#valid-parameters-and-values)
- [TEMPLATE](#template)
- [Template Variables](#template-variables)
- [SYSTEM](#system)
- [ADAPTER](#adapter)
- [LICENSE](#license)
- [Notes](#notes)
## Format
The format of the `Modelfile`:
```modelfile
# comment
INSTRUCTION arguments
```
| Instruction | Description |
| ----------------------------------- | -------------------------------------------------------------- |
| [`FROM`](#from-required) (required) | Defines the base model to use. |
| [`PARAMETER`](#parameter) | Sets the parameters for how Ollama will run the model. |
| [`TEMPLATE`](#template) | The full prompt template to be sent to the model. |
| [`SYSTEM`](#system) | Specifies the system message that will be set in the template. |
| [`ADAPTER`](#adapter) | Defines the (Q)LoRA adapters to apply to the model. |
| [`LICENSE`](#license) | Specifies the legal license. |
## Examples
### Basic `Modelfile`
An example of a `Modelfile` creating a mario blueprint:
```modelfile
FROM llama2
# sets the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# sets the context window size to 4096, this controls how many tokens the LLM can use as context to generate the next token
PARAMETER num_ctx 4096
# sets a custom system message to specify the behavior of the chat assistant
SYSTEM You are Mario from super mario bros, acting as an assistant.
```
To use this:
1. Save it as a file (e.g. `Modelfile`)
2. `ollama create choose-a-model-name -f <location of the file e.g. ./Modelfile>`
3. `ollama run choose-a-model-name`
4. Start using the model!
More examples are available in the [examples directory](../examples).
### `Modelfile`s in [ollama.ai/library][1]
There are two ways to view `Modelfile`s underlying the models in [ollama.ai/library][1]:
- Option 1: view a details page from a model's tags page:
1. Go to a particular model's tags (e.g. https://ollama.ai/library/llama2/tags)
2. Click on a tag (e.g. https://ollama.ai/library/llama2:13b)
3. Scroll down to "Layers"
- Note: if the [`FROM` instruction](#from-required) is not present,
it means the model was created from a local file
- Option 2: use `ollama show` to print the `Modelfile` for any local models like so:
```bash
> ollama show --modelfile llama2:13b
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
# FROM llama2:13b
FROM /root/.ollama/models/blobs/sha256:123abc
TEMPLATE """[INST] {{ if and .First .System }}<<SYS>>{{ .System }}<</SYS>>
{{ end }}{{ .Prompt }} [/INST] """
SYSTEM """"""
PARAMETER stop [INST]
PARAMETER stop [/INST]
PARAMETER stop <<SYS>>
PARAMETER stop <</SYS>>
```
## Instructions
### FROM (Required)
The `FROM` instruction defines the base model to use when creating a model.
```modelfile
FROM <model name>:<tag>
```
#### Build from llama2
```modelfile
FROM llama2
```
A list of available base models:
<https://github.com/jmorganca/ollama#model-library>
#### Build from a `bin` file
```modelfile
FROM ./ollama-model.bin
```
This bin file location should be specified as an absolute path or relative to the `Modelfile` location.
### PARAMETER
The `PARAMETER` instruction defines a parameter that can be set when the model is run.
```modelfile
PARAMETER <parameter> <parametervalue>
```
### Valid Parameters and Values
| Parameter | Description | Value Type | Example Usage |
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------- | -------------------- |
| mirostat | Enable Mirostat sampling for controlling perplexity. (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0) | int | mirostat 0 |
| mirostat_eta | Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive. (Default: 0.1) | float | mirostat_eta 0.1 |
| mirostat_tau | Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. (Default: 5.0) | float | mirostat_tau 5.0 |
| num_ctx | Sets the size of the context window used to generate the next token. (Default: 2048) | int | num_ctx 4096 |
| num_gqa | The number of GQA groups in the transformer layer. Required for some models, for example it is 8 for llama2:70b | int | num_gqa 1 |
| num_gpu | The number of layers to send to the GPU(s). On macOS it defaults to 1 to enable metal support, 0 to disable. | int | num_gpu 50 |
| num_thread | Sets the number of threads to use during computation. By default, Ollama will detect this for optimal performance. It is recommended to set this value to the number of physical CPU cores your system has (as opposed to the logical number of cores). | int | num_thread 8 |
| repeat_last_n | Sets how far back for the model to look back to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx) | int | repeat_last_n 64 |
| repeat_penalty | Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1) | float | repeat_penalty 1.1 |
| temperature | The temperature of the model. Increasing the temperature will make the model answer more creatively. (Default: 0.8) | float | temperature 0.7 |
| seed | Sets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt. (Default: 0) | int | seed 42 |
| stop | Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return. Multiple stop patterns may be set by specifying multiple separate `stop` parameters in a modelfile. | string | stop "AI assistant:" |
| tfs_z | Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting. (default: 1) | float | tfs_z 1 |
| num_predict | Maximum number of tokens to predict when generating text. (Default: 128, -1 = infinite generation, -2 = fill context) | int | num_predict 42 |
| top_k | Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40) | int | top_k 40 |
| top_p | Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9) | float | top_p 0.9 |
### TEMPLATE
`TEMPLATE` sets the full prompt template to be passed into the model. It may include (optionally) a system message and a user's prompt. This is used to create a full custom prompt, and syntax may be model specific. You can usually find the template for a given model in the readme for that model.
#### Template Variables
| Variable | Description |
| ----------------- | ------------------------------------------------------------------------------------------------------------- |
| `{{ .System }}`   | The system message used to specify custom behavior; this must also be set in the Modelfile as an instruction.  |
| `{{ .Prompt }}`   | The incoming prompt; this is not specified in the model file and will be set based on input.                   |
| `{{ .Response }}` | The response from the LLM; if not specified, the response is appended to the end of the template.              |
| `{{ .First }}` | A boolean value used to render specific template information for the first generation of a session. |
```modelfile
TEMPLATE """
{{- if .First }}
### System:
{{ .System }}
{{- end }}
### User:
{{ .Prompt }}
### Response:
"""
SYSTEM """<system message>"""
```
### SYSTEM
The `SYSTEM` instruction specifies the system message to be used in the template, if applicable.
```modelfile
SYSTEM """<system message>"""
```
### ADAPTER
The `ADAPTER` instruction specifies the LoRA adapter to apply to the base model. The value of this instruction should be an absolute path or a path relative to the Modelfile and the file must be in a GGML file format. The adapter should be tuned from the base model otherwise the behaviour is undefined.
```modelfile
ADAPTER ./ollama-lora.bin
```
### LICENSE
The `LICENSE` instruction allows you to specify the legal license under which the model used with this Modelfile is shared or distributed.
```modelfile
LICENSE """
<license text>
"""
```
## Notes
- the **`Modelfile` is not case sensitive**. In the examples, uppercase instructions are used to make them easier to distinguish from arguments.
- Instructions can be in any order. In the examples, the `FROM` instruction is first to keep it easily readable.
[1]: https://ollama.ai/library

# Python SDK
## Install
```
pip install ollama
```
## Example
```python
import ollama
ollama.generate("orca-mini-3b", "hi")
```
## Reference
### `ollama.generate(model, message)`
Generate a completion
```python
ollama.generate("./llama-7b-ggml.bin", "hi")
```
### `ollama.models()`
List available local models
```python
models = ollama.models()
```
### `ollama.load(model)`
Manually load a model for generation
```python
ollama.load("model")
```
### `ollama.unload(model)`
Unload a model
```python
ollama.unload("model")
```
### `ollama.pull(model)`
Download a model
```python
ollama.pull("huggingface.co/thebloke/llama-7b-ggml")
```
### `ollama.search(query)`
Search for compatible models that Ollama can run
```python
ollama.search("llama-7b")
```

docs/troubleshooting.md
# How to troubleshoot issues
Sometimes Ollama may not perform as expected. One of the best ways to figure out what happened is to take a look at the logs. Find the logs on Mac by running the command:
```shell
cat ~/.ollama/logs/server.log
```
On Linux systems with systemd, the logs can be found with this command:
```shell
journalctl -u ollama
```
If manually running `ollama serve` in a terminal, the logs will be on that terminal.
Join the [Discord](https://discord.gg/ollama) for help interpreting the logs.
## LLM libraries
Ollama includes multiple LLM libraries compiled for different GPUs and CPU
vector features. Ollama tries to pick the best one based on the capabilities of
your system. If this autodetection has problems, or you run into other problems
(e.g. crashes in your GPU), you can work around this by forcing a specific LLM
library. `cpu_avx2` will perform the best, followed by `cpu_avx`; the slowest
but most compatible is `cpu`. Rosetta emulation under macOS will work with the
`cpu` library.
In the server log, you will see a message that looks something like this (varies
from release to release):
```
Dynamic LLM libraries [rocm_v6 cpu cpu_avx cpu_avx2 cuda_v11 rocm_v5]
```
**Experimental LLM Library Override**
You can set OLLAMA_LLM_LIBRARY to any of the available LLM libraries to bypass
autodetection, so for example, if you have a CUDA card, but want to force the
CPU LLM library with AVX2 vector support, use:
```
OLLAMA_LLM_LIBRARY="cpu_avx2" ollama serve
```
You can see what features your CPU has with the following:
```
cat /proc/cpuinfo | grep flags | head -1
```
## Known issues
* N/A

docs/tutorials.md
# Tutorials
Here is a list of ways you can use Ollama with other tools to build interesting applications.
- [Using LangChain with Ollama in JavaScript](./tutorials/langchainjs.md)
- [Using LangChain with Ollama in Python](./tutorials/langchainpy.md)
- [Running Ollama on NVIDIA Jetson Devices](./tutorials/nvidia-jetson.md)
Also be sure to check out the [examples](../examples) directory for more ways to use Ollama.

docs/tutorials/fly-gpu.md
# Running Ollama on Fly.io GPU Instances
Ollama runs with little to no configuration on [Fly.io GPU instances](https://fly.io/docs/gpus/gpu-quickstart/). If you don't have access to GPUs yet, you'll need to [apply for access](https://fly.io/gpu/) on the waitlist. Once you're accepted, you'll get an email with instructions on how to get started.
Create a new app with `fly apps create`:
```bash
fly apps create
```
Then create a `fly.toml` file in a new folder that looks like this:
```toml
app = "sparkling-violet-709"
primary_region = "ord"
vm.size = "a100-40gb" # see https://fly.io/docs/gpus/gpu-quickstart/ for more info
[build]
image = "ollama/ollama"
[http_service]
internal_port = 11434
force_https = false
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 0
processes = ["app"]
[mounts]
source = "models"
destination = "/root/.ollama"
initial_size = "100gb"
```
Then create a [new private IPv6 address](https://fly.io/docs/reference/private-networking/#flycast-private-load-balancing) for your app:
```bash
fly ips allocate-v6 --private
```
Then deploy your app:
```bash
fly deploy
```
And finally you can access it interactively with a new Fly.io Machine:
```
fly machine run -e OLLAMA_HOST=http://your-app-name.flycast --shell ollama/ollama
```
```bash
$ ollama run openchat:7b-v3.5-fp16
>>> How do I bake chocolate chip cookies?
To bake chocolate chip cookies, follow these steps:
1. Preheat the oven to 375°F (190°C) and line a baking sheet with parchment paper or silicone baking mat.
2. In a large bowl, mix together 1 cup of unsalted butter (softened), 3/4 cup granulated sugar, and 3/4
cup packed brown sugar until light and fluffy.
3. Add 2 large eggs, one at a time, to the butter mixture, beating well after each addition. Stir in 1
teaspoon of pure vanilla extract.
4. In a separate bowl, whisk together 2 cups all-purpose flour, 1/2 teaspoon baking soda, and 1/2 teaspoon
salt. Gradually add the dry ingredients to the wet ingredients, stirring until just combined.
5. Fold in 2 cups of chocolate chips (or chunks) into the dough.
6. Drop rounded tablespoons of dough onto the prepared baking sheet, spacing them about 2 inches apart.
7. Bake for 10-12 minutes, or until the edges are golden brown. The centers should still be slightly soft.
8. Allow the cookies to cool on the baking sheet for a few minutes before transferring them to a wire rack
to cool completely.
Enjoy your homemade chocolate chip cookies!
```
When you set it up like this, it will automatically turn off when you're done using it. Then when you access it again, it will automatically turn back on. This is a great way to save money on GPU instances when you're not using them. If you want a persistent wake-on-use connection to your Ollama instance, you can set up a [connection to your Fly network using WireGuard](https://fly.io/docs/reference/private-networking/#discovering-apps-through-dns-on-a-wireguard-connection). Then you can access your Ollama instance at `http://your-app-name.flycast`.
And that's it!

docs/tutorials/langchainjs.md
# Using LangChain with Ollama using JavaScript
In this tutorial, we are going to use JavaScript with LangChain and Ollama to learn about something just a touch more recent. In August 2023, there was a series of wildfires on Maui. There is no way an LLM trained before that time can know about this, since their training data would not include anything as recent as that. So we can find the [Wikipedia article about the fires](https://en.wikipedia.org/wiki/2023_Hawaii_wildfires) and ask questions about the contents.
To get started, let's just use **LangChain** to ask a simple question to a model. To do this with JavaScript, we need to install **LangChain**:
```bash
npm install langchain
```
Now we can start building out our JavaScript:
```javascript
import { Ollama } from "langchain/llms/ollama";
const ollama = new Ollama({
baseUrl: "http://localhost:11434",
model: "llama2",
});
const answer = await ollama.call(`why is the sky blue?`);
console.log(answer);
```
That will get us the same thing as if we ran `ollama run llama2 "why is the sky blue"` in the terminal. But we want to load a document from the web to ask a question against. **Cheerio** is a great library for ingesting a webpage, and **LangChain** uses it in their **CheerioWebBaseLoader**. So let's install **Cheerio** and build that part of the app.
```bash
npm install cheerio
```
```javascript
import { CheerioWebBaseLoader } from "langchain/document_loaders/web/cheerio";
const loader = new CheerioWebBaseLoader("https://en.wikipedia.org/wiki/2023_Hawaii_wildfires");
const data = await loader.load();
```
That will load the document. Although this page is smaller than the Odyssey, it is certainly bigger than the context size for most LLMs. So we are going to need to split into smaller pieces, and then select just the pieces relevant to our question. This is a great use for a vector datastore. In this example, we will use the **MemoryVectorStore** that is part of **LangChain**. But there is one more thing we need to get the content into the datastore. We have to run an embeddings process that converts the tokens in the text into a series of vectors. And for that, we are going to use **Tensorflow**. There is a lot of stuff going on in this one. First, install the **Tensorflow** components that we need.
```bash
npm install @tensorflow/tfjs-core@3.6.0 @tensorflow/tfjs-converter@3.6.0 @tensorflow-models/universal-sentence-encoder@1.3.3 @tensorflow/tfjs-node@4.10.0
```
If you just install those components without the version numbers, it will install the latest versions, but there are conflicts within **Tensorflow**, so you need to install the compatible versions.
```javascript
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter"
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import "@tensorflow/tfjs-node";
import { TensorFlowEmbeddings } from "langchain/embeddings/tensorflow";
// Split the text into 500 character chunks. And overlap each chunk by 20 characters
const textSplitter = new RecursiveCharacterTextSplitter({
chunkSize: 500,
chunkOverlap: 20
});
const splitDocs = await textSplitter.splitDocuments(data);
// Then use the TensorFlow Embedding to store these chunks in the datastore
const vectorStore = await MemoryVectorStore.fromDocuments(splitDocs, new TensorFlowEmbeddings());
```
To connect the datastore to a question asked to an LLM, we need to use the concept at the heart of **LangChain**: the chain. Chains are a way to connect a number of activities together to accomplish a particular task. There are a number of chain types available, but for this tutorial we are using the **RetrievalQAChain**.
```javascript
import { RetrievalQAChain } from "langchain/chains";
const retriever = vectorStore.asRetriever();
const chain = RetrievalQAChain.fromLLM(ollama, retriever);
const result = await chain.call({query: "When was Hawaii's request for a major disaster declaration approved?"});
console.log(result.text)
```
So we created a retriever, which is a way to return the chunks that match a query from a datastore. And then connect the retriever and the model via a chain. Finally, we send a query to the chain, which results in an answer using our document as a source. The answer it returned was correct, August 10, 2023.
And that is a simple introduction to what you can do with **LangChain** and **Ollama.**

docs/tutorials/langchainpy.md
# Using LangChain with Ollama in Python
Let's imagine we are studying the classics, such as **the Odyssey** by **Homer**. We might have a question about Neleus and his family. If you ask llama2 for that info, you may get something like:
> I apologize, but I'm a large language model, I cannot provide information on individuals or families that do not exist in reality. Neleus is not a real person or character, and therefore does not have a family or any other personal details. My apologies for any confusion. Is there anything else I can help you with?
This sounds like a typical censored response, but even llama2-uncensored gives a mediocre answer:
> Neleus was a legendary king of Pylos and the father of Nestor, one of the Argonauts. His mother was Clymene, a sea nymph, while his father was Neptune, the god of the sea.
So let's figure out how we can use **LangChain** with Ollama to ask our question to the actual document, the Odyssey by Homer, using Python.
Let's start by asking a simple question that we can get an answer to from the **Llama2** model using **Ollama**. First, we need to install the **LangChain** package:
`pip install langchain`
Then we can create a model and ask the question:
```python
from langchain.llms import Ollama
ollama = Ollama(base_url='http://localhost:11434',
model="llama2")
print(ollama("why is the sky blue"))
```
Notice that we are defining the model and the base URL for Ollama.
Now let's load a document to ask questions against. I'll load up the Odyssey by Homer, which you can find at Project Gutenberg. We will need **WebBaseLoader** which is part of **LangChain** and loads text from any webpage. On my machine, I also needed to install **bs4** to get that to work, so run `pip install bs4`.
```python
from langchain.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://www.gutenberg.org/files/1727/1727-h/1727-h.htm")
data = loader.load()
```
This file is pretty big. Just the preface is 3000 tokens. Which means the full document won't fit into the context for the model. So we need to split it up into smaller pieces.
```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter=RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)
```
It's split up, but we have to find the relevant splits and then submit those to the model. We can do this by creating embeddings and storing them in a vector database. We can use Ollama directly to instantiate an embedding model. We will use ChromaDB in this example for a vector database. `pip install GPT4All chromadb`
```python
from langchain.embeddings import OllamaEmbeddings
from langchain.vectorstores import Chroma
oembed = OllamaEmbeddings(base_url="http://localhost:11434", model="llama2")
vectorstore = Chroma.from_documents(documents=all_splits, embedding=oembed)
```
Now let's ask a question from the document. **Who was Neleus, and who is in his family?** Neleus is a character in the Odyssey, and the answer can be found in our text.
```python
question="Who is Neleus and who is in Neleus' family?"
docs = vectorstore.similarity_search(question)
len(docs)
```
This will output the number of matches for chunks of data similar to the search.
The next thing is to send the question and the relevant parts of the docs to the model to see if we can get a good answer. But we are stitching two parts of the process together, and that is called a chain. This means we need to define a chain:
```python
from langchain.chains import RetrievalQA
qachain=RetrievalQA.from_chain_type(ollama, retriever=vectorstore.as_retriever())
qachain({"query": question})
```
The answer received from this chain was:
> Neleus is a character in Homer's "Odyssey" and is mentioned in the context of Penelope's suitors. Neleus is the father of Chloris, who is married to Neleus and bears him several children, including Nestor, Chromius, Periclymenus, and Pero. Amphinomus, the son of Nisus, is also mentioned as a suitor of Penelope and is known for his good natural disposition and agreeable conversation.
It's not a perfect answer, as it implies Neleus married his daughter when actually Chloris "was the youngest daughter to Amphion son of Iasus and king of Minyan Orchomenus, and was Queen in Pylos".
I updated the chunk_overlap for the text splitter to 20 and tried again and got a much better answer:
> Neleus is a character in Homer's epic poem "The Odyssey." He is the husband of Chloris, who is the youngest daughter of Amphion son of Iasus and king of Minyan Orchomenus. Neleus has several children with Chloris, including Nestor, Chromius, Periclymenus, and Pero.
And that is a much better answer.

docs/tutorials/nvidia-jetson.md
# Running Ollama on NVIDIA Jetson Devices
With some minor configuration, Ollama runs well on [NVIDIA Jetson Devices](https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/). The following has been tested on [JetPack 5.1.2](https://developer.nvidia.com/embedded/jetpack).
NVIDIA Jetson devices are Linux-based embedded AI computers that are purpose-built for AI applications.
Jetsons have an integrated GPU that is wired directly to the memory controller of the machine. For this reason, the `nvidia-smi` command is unrecognized, and Ollama proceeds to operate in "CPU only" mode. This can be verified by using a monitoring tool like jtop.
In order to address this, we simply pass the path to the Jetson's pre-installed CUDA libraries into `ollama serve` (while in a tmux session). We then hardcode the `num_gpu` parameter into a cloned version of our target model.
Prerequisites:
- curl
- tmux
Here are the steps:
- Install Ollama via standard Linux command (ignore the 404 error): `curl https://ollama.ai/install.sh | sh`
- Stop the Ollama service: `sudo systemctl stop ollama`
- Start Ollama serve in a tmux session called ollama_jetson and reference the CUDA libraries path: `tmux has-session -t ollama_jetson 2>/dev/null || tmux new-session -d -s ollama_jetson 'LD_LIBRARY_PATH=/usr/local/cuda/lib64 ollama serve'`
- Pull the model you want to use (e.g. mistral): `ollama pull mistral`
- Create a new Modelfile specifically for enabling GPU support on the Jetson: `touch ModelfileMistralJetson`
- In the ModelfileMistralJetson file, specify the FROM model and the num_gpu PARAMETER as shown below:
```
FROM mistral
PARAMETER num_gpu 999
```
- Create a new model from your Modelfile: `ollama create mistral-jetson -f ./ModelfileMistralJetson`
- Run the new model: `ollama run mistral-jetson`
If you run a monitoring tool like jtop you should now see that Ollama is using the Jetson's integrated GPU.
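If you don't already have jtop on the device, it is provided by the `jetson-stats` package. The exact install steps can vary by JetPack version, so treat the following as an assumption to verify against the jetson-stats documentation:
```
# assumption: jtop ships with the jetson-stats package
sudo pip3 install -U jetson-stats
jtop   # a re-login or reboot may be needed before jtop can start
```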
And that's it!

examples/.gitignore (vendored, new file, 174 lines)

@@ -0,0 +1,174 @@
node_modules
bun.lockb
.vscode
# OSX
.DS_STORE
# Models
models/
# Local Chroma db
.chroma/
db/
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

examples/README.md (new file, 3 lines)

@@ -0,0 +1,3 @@
# Examples
This directory contains different examples of using Ollama.


@@ -0,0 +1,10 @@
# Bash Shell examples
When calling `ollama`, you can pass it a file to run all the prompts in the file, one after the other:
`ollama run llama2 < sourcequestions.txt`
This concept is used in the following example.
## Compare Models
`comparemodels.sh` is a script that runs all of the questions in `sourcequestions.txt` against any 4 models of your choosing that you have already pulled from the Ollama library or created locally, and then reports the total eval duration for each model.
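To run it (the full script follows below):
```bash
chmod +x comparemodels.sh
./comparemodels.sh
```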


@@ -0,0 +1,64 @@
#! /usr/bin/env bash
# Compare multiple models by running them with the same questions

NUMBEROFCHOICES=4
SELECTIONS=()
declare -a SUMS=()
# Number of models selected so far
COUNT=0

# Get the list of models
CHOICES=$(ollama list | awk '{print $1}')

# Select which models to run as a comparison
echo "Select $NUMBEROFCHOICES models to compare:"
select ITEM in $CHOICES; do
  if [[ -n $ITEM ]]; then
    echo "You have selected $ITEM"
    SELECTIONS+=("$ITEM")
    ((COUNT++))
    if [[ $COUNT -eq $NUMBEROFCHOICES ]]; then
      break
    fi
  else
    echo "Invalid selection"
  fi
done

# Loop through each of the selected models
for ITEM in "${SELECTIONS[@]}"; do
  echo "--------------------------------------------------------------"
  echo "Loading the model $ITEM into memory"
  ollama run "$ITEM" ""

  echo "--------------------------------------------------------------"
  echo "Running the questions through the model $ITEM"
  COMMAND_OUTPUT=$(ollama run "$ITEM" --verbose < sourcequestions.txt 2>&1 | tee /dev/stderr)

  # eval duration is sometimes listed in seconds and sometimes in milliseconds.
  # Add up the values for each model
  SUM=$(echo "$COMMAND_OUTPUT" | awk '
  /eval duration:/ {
    value = $3
    if (index(value, "ms") > 0) {
      gsub("ms", "", value)
      value /= 1000
    } else {
      gsub("s", "", value)
    }
    sum += value
  }
  END { print sum }')

  SUMS+=("All questions for $ITEM completed in $SUM seconds")
done

echo ""
echo "--------------------------------------------------------------"
echo -e "Sums of eval durations for each run:"
for val in "${SUMS[@]}"; do
  echo "$val"
done
echo "--------------------------------------------------------------"
echo "Comparison complete. Now you can decide"
echo "which model is best."
echo "--------------------------------------------------------------"


@@ -0,0 +1,7 @@
Why is the sky blue
What is a black hole
Explain the big bang theory like I am 5?
What is the quickest way to win a game of Monopoly with 3 others?
Why does a vacuum bottle keep my coffee hot and my milkshake cold?
What is the difference between a meteor, a meteorite, and a meteoroid?
Create an array with 5 items and print to the console. Do this in Python, C#, Typescript, and Rust.



@@ -0,0 +1,29 @@
package main

import (
    "bytes"
    "fmt"
    "io"
    "log"
    "net/http"
    "os"
)

func main() {
    body := []byte(`{"model":"mistral"}`)

    resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewBuffer(body))
    if err != nil {
        fmt.Print(err.Error())
        os.Exit(1)
    }
    defer resp.Body.Close()

    responseData, err := io.ReadAll(resp.Body)
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println(string(responseData))
}


@@ -0,0 +1,5 @@
# Ollama Jupyter Notebook
This example downloads and installs Ollama in a Jupyter instance such as Google Colab. It will start the Ollama service and expose an endpoint using `ngrok` which can be used to communicate with the Ollama instance remotely.
For best results, use an instance with a GPU accelerator.


@@ -0,0 +1,102 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "93f59dcb-c588-41b8-a792-55d88ade739c",
"metadata": {},
"outputs": [],
"source": [
"# Download and run the Ollama Linux install script\n",
"!curl https://ollama.ai/install.sh | sh\n",
"!command -v systemctl >/dev/null && sudo systemctl stop ollama"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "658c147e-c7f8-490e-910e-62b80f577dda",
"metadata": {},
"outputs": [],
"source": [
"!pip install aiohttp pyngrok\n",
"\n",
"import os\n",
"import asyncio\n",
"from aiohttp import ClientSession\n",
"\n",
"# Set LD_LIBRARY_PATH so the system NVIDIA library becomes preferred\n",
"# over the built-in library. This is particularly important for \n",
"# Google Colab which installs older drivers\n",
"os.environ.update({'LD_LIBRARY_PATH': '/usr/lib64-nvidia'})\n",
"\n",
"async def run(cmd):\n",
" '''\n",
" run is a helper function to run subcommands asynchronously.\n",
" '''\n",
" print('>>> starting', *cmd)\n",
" p = await asyncio.subprocess.create_subprocess_exec(\n",
" *cmd,\n",
" stdout=asyncio.subprocess.PIPE,\n",
" stderr=asyncio.subprocess.PIPE,\n",
" )\n",
"\n",
" async def pipe(lines):\n",
" async for line in lines:\n",
" print(line.strip().decode('utf-8'))\n",
"\n",
" await asyncio.gather(\n",
" pipe(p.stdout),\n",
" pipe(p.stderr),\n",
" )\n",
"\n",
"\n",
"await asyncio.gather(\n",
" run(['ollama', 'serve']),\n",
" run(['ngrok', 'http', '--log', 'stderr', '11434']),\n",
")"
]
},
{
"cell_type": "markdown",
"id": "e7735a55-9aad-4caf-8683-52e2163ba53b",
"metadata": {},
"source": [
"The previous cell starts two processes, `ollama` and `ngrok`. The log output will show a line like the following which describes the external address.\n",
"\n",
"```\n",
"t=2023-11-12T22:55:56+0000 lvl=info msg=\"started tunnel\" obj=tunnels name=command_line addr=http://localhost:11434 url=https://8249-34-125-179-11.ngrok.io\n",
"```\n",
"\n",
"The external address in this case is `https://8249-34-125-179-11.ngrok.io` which can be passed into `OLLAMA_HOST` to access this instance.\n",
"\n",
"```bash\n",
"export OLLAMA_HOST=https://8249-34-125-179-11.ngrok.io\n",
"ollama list\n",
"ollama run mistral\n",
"```"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,36 @@
# Deploy Ollama to Kubernetes
## Prerequisites
- Ollama: https://ollama.ai/download
- Kubernetes cluster. This example will use Google Kubernetes Engine.
## Steps
1. Create the Ollama namespace, deployment, and service
```bash
kubectl apply -f cpu.yaml
```
1. Port forward the Ollama service to connect and use it locally
```bash
kubectl -n ollama port-forward service/ollama 11434:80
```
1. Pull and run a model, for example `orca-mini:3b`
```bash
ollama run orca-mini:3b
```
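With the port-forward from step 2 still running, you can also call the Ollama HTTP API directly instead of going through the CLI, for example:
```bash
curl http://localhost:11434/api/generate -d '{
  "model": "orca-mini:3b",
  "prompt": "Why is the sky blue?"
}'
```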
## (Optional) Hardware Acceleration
Hardware acceleration in Kubernetes requires NVIDIA's [`k8s-device-plugin`](https://github.com/NVIDIA/k8s-device-plugin). Follow the link for more details.
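As a rough sketch only (the exact version and manifest name here are assumptions; check the plugin's README for the currently recommended install method):
```bash
# assumption: installing the device plugin from its released manifest;
# replace v0.14.3 with the version the plugin's README currently recommends
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.3/nvidia-device-plugin.yml
```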
Once configured, create a GPU enabled Ollama deployment.
```bash
kubectl apply -f gpu.yaml
```


@@ -0,0 +1,42 @@
---
apiVersion: v1
kind: Namespace
metadata:
  name: ollama
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: ollama
spec:
  selector:
    matchLabels:
      name: ollama
  template:
    metadata:
      labels:
        name: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        ports:
        - name: http
          containerPort: 11434
          protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: ollama
spec:
  type: ClusterIP
  selector:
    name: ollama
  ports:
  - port: 80
    name: http
    targetPort: http
    protocol: TCP


@@ -0,0 +1,58 @@
---
apiVersion: v1
kind: Namespace
metadata:
  name: ollama
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: ollama
spec:
  strategy:
    type: Recreate
  selector:
    matchLabels:
      name: ollama
  template:
    metadata:
      labels:
        name: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        env:
        - name: PATH
          value: /usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
        - name: LD_LIBRARY_PATH
          value: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
        - name: NVIDIA_DRIVER_CAPABILITIES
          value: compute,utility
        ports:
        - name: http
          containerPort: 11434
          protocol: TCP
        resources:
          limits:
            nvidia.com/gpu: 1
      tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: ollama
spec:
  type: ClusterIP
  selector:
    name: ollama
  ports:
  - port: 80
    name: http
    targetPort: http
    protocol: TCP


@@ -0,0 +1,21 @@
# LangChain Document QA
This example provides an interface for asking questions to a PDF document.
## Setup
```
pip install -r requirements.txt
```
## Run
```
python main.py
```
A prompt will appear, where questions may be asked:
```
Query: How many locations does WeWork have?
```


@@ -0,0 +1,61 @@
from langchain.document_loaders import OnlinePDFLoader
from langchain.vectorstores import Chroma
from langchain.embeddings import GPT4AllEmbeddings
from langchain import PromptTemplate
from langchain.llms import Ollama
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import RetrievalQA
import sys
import os


class SuppressStdout:
    def __enter__(self):
        self._original_stdout = sys.stdout
        self._original_stderr = sys.stderr
        sys.stdout = open(os.devnull, 'w')
        sys.stderr = open(os.devnull, 'w')

    def __exit__(self, exc_type, exc_val, exc_tb):
        sys.stdout.close()
        sys.stdout = self._original_stdout
        sys.stderr = self._original_stderr


# load the pdf and split it into chunks
loader = OnlinePDFLoader("https://d18rn0p25nwr6d.cloudfront.net/CIK-0001813756/975b3e9b-268e-4798-a9e4-2a9a7c92dc10.pdf")
data = loader.load()

from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)

with SuppressStdout():
    vectorstore = Chroma.from_documents(documents=all_splits, embedding=GPT4AllEmbeddings())

while True:
    query = input("\nQuery: ")
    if query == "exit":
        break
    if query.strip() == "":
        continue

    # Prompt
    template = """Use the following pieces of context to answer the question at the end.
    If you don't know the answer, just say that you don't know, don't try to make up an answer.
    Use three sentences maximum and keep the answer as concise as possible.
    {context}
    Question: {question}
    Helpful Answer:"""
    QA_CHAIN_PROMPT = PromptTemplate(
        input_variables=["context", "question"],
        template=template,
    )

    llm = Ollama(model="llama2:13b", callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]))
    qa_chain = RetrievalQA.from_chain_type(
        llm,
        retriever=vectorstore.as_retriever(),
        chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
    )

    result = qa_chain({"query": query})


@@ -0,0 +1,109 @@
absl-py==1.4.0
aiohttp==3.8.5
aiosignal==1.3.1
anyio==3.7.1
astunparse==1.6.3
async-timeout==4.0.3
attrs==23.1.0
backoff==2.2.1
beautifulsoup4==4.12.2
bs4==0.0.1
cachetools==5.3.1
certifi==2023.7.22
cffi==1.15.1
chardet==5.2.0
charset-normalizer==3.2.0
Chroma==0.2.0
chroma-hnswlib==0.7.2
chromadb==0.4.5
click==8.1.6
coloredlogs==15.0.1
cryptography==41.0.3
dataclasses-json==0.5.14
fastapi==0.99.1
filetype==1.2.0
flatbuffers==23.5.26
frozenlist==1.4.0
gast==0.4.0
google-auth==2.22.0
google-auth-oauthlib==1.0.0
google-pasta==0.2.0
gpt4all==1.0.8
grpcio==1.57.0
h11==0.14.0
h5py==3.9.0
httptools==0.6.0
humanfriendly==10.0
idna==3.4
importlib-resources==6.0.1
joblib==1.3.2
keras==2.13.1
langchain==0.0.261
langsmith==0.0.21
libclang==16.0.6
lxml==4.9.3
Markdown==3.4.4
MarkupSafe==2.1.3
marshmallow==3.20.1
monotonic==1.6
mpmath==1.3.0
multidict==6.0.4
mypy-extensions==1.0.0
nltk==3.8.1
numexpr==2.8.5
numpy==1.24.3
oauthlib==3.2.2
onnxruntime==1.15.1
openapi-schema-pydantic==1.2.4
opt-einsum==3.3.0
overrides==7.4.0
packaging==23.1
pdf2image==1.16.3
pdfminer==20191125
pdfminer.six==20221105
Pillow==10.0.0
posthog==3.0.1
protobuf==4.24.0
pulsar-client==3.2.0
pyasn1==0.5.0
pyasn1-modules==0.3.0
pycparser==2.21
pycryptodome==3.18.0
pydantic==1.10.12
PyPika==0.48.9
python-dateutil==2.8.2
python-dotenv==1.0.0
python-magic==0.4.27
PyYAML==6.0.1
regex==2023.8.8
requests==2.31.0
requests-oauthlib==1.3.1
rsa==4.9
six==1.16.0
sniffio==1.3.0
soupsieve==2.4.1
SQLAlchemy==2.0.19
starlette==0.27.0
sympy==1.12
tabulate==0.9.0
tenacity==8.2.2
tensorboard==2.13.0
tensorboard-data-server==0.7.1
tensorflow==2.13.0
tensorflow-estimator==2.13.0
tensorflow-hub==0.14.0
tensorflow-macos==2.13.0
termcolor==2.3.0
tokenizers==0.13.3
tqdm==4.66.1
typing-inspect==0.9.0
typing_extensions==4.5.0
unstructured==0.9.2
urllib3==1.26.16
uvicorn==0.23.2
uvloop==0.17.0
watchfiles==0.19.0
websockets==11.0.3
Werkzeug==2.3.6
wrapt==1.15.0
yarl==1.9.2


@@ -0,0 +1,170 @@
# OSX
.DS_STORE
# Models
models/
# Local Chroma db
.chroma/
db/
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/


@@ -0,0 +1,201 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


@@ -0,0 +1,91 @@
# PrivateGPT with Llama 2 uncensored
https://github.com/jmorganca/ollama/assets/3325447/20cf8ec6-ff25-42c6-bdd8-9be594e3ce1b
> Note: this example is a slightly modified version of PrivateGPT using models such as Llama 2 Uncensored. All credit for PrivateGPT goes to Iván Martínez who is the creator of it, and you can find his GitHub repo [here](https://github.com/imartinez/privateGPT).
### Setup
Set up a virtual environment (optional):
```
python3 -m venv .venv
source .venv/bin/activate
```
Install the Python dependencies:
```shell
pip install -r requirements.txt
```
Pull the model you'd like to use:
```
ollama pull llama2-uncensored
```
### Getting WeWork's latest quarterly earnings report (10-Q)
```
mkdir source_documents
curl https://d18rn0p25nwr6d.cloudfront.net/CIK-0001813756/975b3e9b-268e-4798-a9e4-2a9a7c92dc10.pdf -o source_documents/wework.pdf
```
### Ingesting files
```shell
python ingest.py
```
Output should look like this:
```shell
Creating new vectorstore
Loading documents from source_documents
Loading new documents: 100%|██████████████████████| 1/1 [00:01<00:00, 1.73s/it]
Loaded 1 new documents from source_documents
Split into 90 chunks of text (max. 500 tokens each)
Creating embeddings. May take some minutes...
Using embedded DuckDB with persistence: data will be stored in: db
Ingestion complete! You can now run privateGPT.py to query your documents
```
### Ask questions
```shell
python privateGPT.py
Enter a query: How many locations does WeWork have?
> Answer (took 17.7 s.):
As of June 2023, WeWork has 777 locations worldwide, including 610 Consolidated Locations (as defined in the section entitled Key Performance Indicators).
```
### Try a different model:
```
ollama pull llama2:13b
MODEL=llama2:13b python privateGPT.py
```
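The script also reads a few flags and environment variables (see `privateGPT.py` further down in this set of files for where they are handled):
```shell
# hide retrieved source chunks and mute token streaming
python privateGPT.py --hide-source --mute-stream

# retrieve more chunks per question
TARGET_SOURCE_CHUNKS=6 python privateGPT.py
```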
## Adding more files
Put any and all your files into the `source_documents` directory and re-run the ingest step so they are added to the vectorstore (see the sketch after the list of extensions below).
The supported extensions are:
- `.csv`: CSV,
- `.docx`: Word Document,
- `.doc`: Word Document,
- `.enex`: EverNote,
- `.eml`: Email,
- `.epub`: EPub,
- `.html`: HTML File,
- `.md`: Markdown,
- `.msg`: Outlook Message,
- `.odt`: Open Document Text,
- `.pdf`: Portable Document Format (PDF),
- `.pptx` : PowerPoint Document,
- `.ppt` : PowerPoint Document,
- `.txt`: Text file (UTF-8),
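For example (the file paths here are just placeholders):
```shell
# hypothetical files; anything with a supported extension above will work
cp ~/Documents/notes.md ~/Documents/report.pdf source_documents/
python ingest.py   # appends the new documents to the existing db
```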


@@ -0,0 +1,11 @@
import os
from chromadb.config import Settings
# Define the folder for storing database
PERSIST_DIRECTORY = os.environ.get('PERSIST_DIRECTORY', 'db')
# Define the Chroma settings
CHROMA_SETTINGS = Settings(
persist_directory=PERSIST_DIRECTORY,
anonymized_telemetry=False
)


@@ -0,0 +1,161 @@
#!/usr/bin/env python3
import os
import glob
from typing import List
from multiprocessing import Pool
from tqdm import tqdm

from langchain.document_loaders import (
    CSVLoader,
    EverNoteLoader,
    PyMuPDFLoader,
    TextLoader,
    UnstructuredEmailLoader,
    UnstructuredEPubLoader,
    UnstructuredHTMLLoader,
    UnstructuredMarkdownLoader,
    UnstructuredODTLoader,
    UnstructuredPowerPointLoader,
    UnstructuredWordDocumentLoader,
)

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.docstore.document import Document
from constants import CHROMA_SETTINGS

# Load environment variables
persist_directory = os.environ.get('PERSIST_DIRECTORY', 'db')
source_directory = os.environ.get('SOURCE_DIRECTORY', 'source_documents')
embeddings_model_name = os.environ.get('EMBEDDINGS_MODEL_NAME', 'all-MiniLM-L6-v2')
chunk_size = 500
chunk_overlap = 50


# Custom document loaders
class MyElmLoader(UnstructuredEmailLoader):
    """Wrapper to fallback to text/plain when default does not work"""

    def load(self) -> List[Document]:
        """Wrapper adding fallback for elm without html"""
        try:
            try:
                doc = UnstructuredEmailLoader.load(self)
            except ValueError as e:
                if 'text/html content not found in email' in str(e):
                    # Try plain text
                    self.unstructured_kwargs["content_source"] = "text/plain"
                    doc = UnstructuredEmailLoader.load(self)
                else:
                    raise
        except Exception as e:
            # Add file_path to exception message
            raise type(e)(f"{self.file_path}: {e}") from e
        return doc


# Map file extensions to document loaders and their arguments
LOADER_MAPPING = {
    ".csv": (CSVLoader, {}),
    # ".docx": (Docx2txtLoader, {}),
    ".doc": (UnstructuredWordDocumentLoader, {}),
    ".docx": (UnstructuredWordDocumentLoader, {}),
    ".enex": (EverNoteLoader, {}),
    ".eml": (MyElmLoader, {}),
    ".epub": (UnstructuredEPubLoader, {}),
    ".html": (UnstructuredHTMLLoader, {}),
    ".md": (UnstructuredMarkdownLoader, {}),
    ".odt": (UnstructuredODTLoader, {}),
    ".pdf": (PyMuPDFLoader, {}),
    ".ppt": (UnstructuredPowerPointLoader, {}),
    ".pptx": (UnstructuredPowerPointLoader, {}),
    ".txt": (TextLoader, {"encoding": "utf8"}),
    # Add more mappings for other file extensions and loaders as needed
}


def load_single_document(file_path: str) -> List[Document]:
    ext = "." + file_path.rsplit(".", 1)[-1]
    if ext in LOADER_MAPPING:
        loader_class, loader_args = LOADER_MAPPING[ext]
        loader = loader_class(file_path, **loader_args)
        return loader.load()

    raise ValueError(f"Unsupported file extension '{ext}'")


def load_documents(source_dir: str, ignored_files: List[str] = []) -> List[Document]:
    """
    Loads all documents from the source documents directory, ignoring specified files
    """
    all_files = []
    for ext in LOADER_MAPPING:
        all_files.extend(
            glob.glob(os.path.join(source_dir, f"**/*{ext}"), recursive=True)
        )
    filtered_files = [file_path for file_path in all_files if file_path not in ignored_files]

    with Pool(processes=os.cpu_count()) as pool:
        results = []
        with tqdm(total=len(filtered_files), desc='Loading new documents', ncols=80) as pbar:
            for i, docs in enumerate(pool.imap_unordered(load_single_document, filtered_files)):
                results.extend(docs)
                pbar.update()

    return results


def process_documents(ignored_files: List[str] = []) -> List[Document]:
    """
    Load documents and split in chunks
    """
    print(f"Loading documents from {source_directory}")
    documents = load_documents(source_directory, ignored_files)
    if not documents:
        print("No new documents to load")
        exit(0)
    print(f"Loaded {len(documents)} new documents from {source_directory}")
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    texts = text_splitter.split_documents(documents)
    print(f"Split into {len(texts)} chunks of text (max. {chunk_size} tokens each)")
    return texts


def does_vectorstore_exist(persist_directory: str) -> bool:
    """
    Checks if vectorstore exists
    """
    if os.path.exists(os.path.join(persist_directory, 'index')):
        if os.path.exists(os.path.join(persist_directory, 'chroma-collections.parquet')) and os.path.exists(os.path.join(persist_directory, 'chroma-embeddings.parquet')):
            list_index_files = glob.glob(os.path.join(persist_directory, 'index/*.bin'))
            list_index_files += glob.glob(os.path.join(persist_directory, 'index/*.pkl'))
            # At least 3 documents are needed in a working vectorstore
            if len(list_index_files) > 3:
                return True
    return False


def main():
    # Create embeddings
    embeddings = HuggingFaceEmbeddings(model_name=embeddings_model_name)

    if does_vectorstore_exist(persist_directory):
        # Update and store locally vectorstore
        print(f"Appending to existing vectorstore at {persist_directory}")
        db = Chroma(persist_directory=persist_directory, embedding_function=embeddings, client_settings=CHROMA_SETTINGS)
        collection = db.get()
        texts = process_documents([metadata['source'] for metadata in collection['metadatas']])
        print(f"Creating embeddings. May take some minutes...")
        db.add_documents(texts)
    else:
        # Create and store locally vectorstore
        print("Creating new vectorstore")
        texts = process_documents()
        print(f"Creating embeddings. May take some minutes...")
        db = Chroma.from_documents(texts, embeddings, persist_directory=persist_directory)
    db.persist()
    db = None

    print(f"Ingestion complete! You can now run privateGPT.py to query your documents")


if __name__ == "__main__":
    main()

File diff suppressed because it is too large


@@ -0,0 +1,74 @@
#!/usr/bin/env python3
from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.vectorstores import Chroma
from langchain.llms import Ollama
import chromadb
import os
import argparse
import time

model = os.environ.get("MODEL", "llama2-uncensored")
# For embeddings model, the example uses a sentence-transformers model
# https://www.sbert.net/docs/pretrained_models.html
# "The all-mpnet-base-v2 model provides the best quality, while all-MiniLM-L6-v2 is 5 times faster and still offers good quality."
embeddings_model_name = os.environ.get("EMBEDDINGS_MODEL_NAME", "all-MiniLM-L6-v2")
persist_directory = os.environ.get("PERSIST_DIRECTORY", "db")
target_source_chunks = int(os.environ.get('TARGET_SOURCE_CHUNKS', 4))

from constants import CHROMA_SETTINGS


def main():
    # Parse the command line arguments
    args = parse_arguments()
    embeddings = HuggingFaceEmbeddings(model_name=embeddings_model_name)

    db = Chroma(persist_directory=persist_directory, embedding_function=embeddings)
    retriever = db.as_retriever(search_kwargs={"k": target_source_chunks})

    # activate/deactivate the streaming StdOut callback for LLMs
    callbacks = [] if args.mute_stream else [StreamingStdOutCallbackHandler()]

    llm = Ollama(model=model, callbacks=callbacks)
    qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=not args.hide_source)

    # Interactive questions and answers
    while True:
        query = input("\nEnter a query: ")
        if query == "exit":
            break
        if query.strip() == "":
            continue

        # Get the answer from the chain
        start = time.time()
        res = qa(query)
        answer, docs = res['result'], [] if args.hide_source else res['source_documents']
        end = time.time()

        # Print the result
        print("\n\n> Question:")
        print(query)
        print(answer)

        # Print the relevant sources used for the answer
        for document in docs:
            print("\n> " + document.metadata["source"] + ":")
            print(document.page_content)


def parse_arguments():
    parser = argparse.ArgumentParser(description='privateGPT: Ask questions to your documents without an internet connection, '
                                                 'using the power of LLMs.')
    parser.add_argument("--hide-source", "-S", action='store_true',
                        help='Use this flag to disable printing of source documents used for answers.')
    parser.add_argument("--mute-stream", "-M",
                        action='store_true',
                        help='Use this flag to disable the streaming StdOut callback for LLMs.')

    return parser.parse_args()


if __name__ == "__main__":
    main()


@@ -0,0 +1,26 @@
[tool.poetry]
name = "privategpt"
version = "0.1.0"
description = ""
authors = ["Ivan Martinez <ivanmartit@gmail.com>"]
license = "Apache Version 2.0"
readme = "README.md"
[tool.poetry.dependencies]
python = "^3.10"
langchain = "0.0.261"
gpt4all = "^1.0.3"
chromadb = "^0.3.26"
PyMuPDF = "^1.22.5"
python-dotenv = "^1.0.0"
unstructured = "^0.8.0"
extract-msg = "^0.41.5"
tabulate = "^0.9.0"
pandoc = "^2.3"
pypandoc = "^1.11"
tqdm = "^4.65.0"
sentence-transformers = "^2.2.2"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"


@@ -0,0 +1,14 @@
langchain==0.0.274
gpt4all==1.0.8
chromadb==0.4.7
llama-cpp-python==0.1.81
urllib3==2.0.4
PyMuPDF==1.23.5
python-dotenv==1.0.0
unstructured==0.10.8
extract-msg==0.45.0
tabulate==0.9.0
pandoc==2.3
pypandoc==1.11
tqdm==4.66.1
sentence_transformers==2.2.2


@@ -0,0 +1,23 @@
# LangChain Web Summarization
This example summarizes the web page at [https://ollama.ai/blog/run-llama2-uncensored-locally](https://ollama.ai/blog/run-llama2-uncensored-locally)
## Running the Example
1. Ensure you have the `llama2` model installed:
```bash
ollama pull llama2
```
2. Install the Python Requirements.
```bash
pip install -r requirements.txt
```
3. Run the example:
```bash
python main.py
```


@@ -0,0 +1,12 @@
from langchain.llms import Ollama
from langchain.document_loaders import WebBaseLoader
from langchain.chains.summarize import load_summarize_chain
loader = WebBaseLoader("https://ollama.ai/blog/run-llama2-uncensored-locally")
docs = loader.load()
llm = Ollama(model="llama2")
chain = load_summarize_chain(llm, chain_type="stuff")
result = chain.run(docs)
print(result)


@@ -0,0 +1 @@
langchain==0.0.259


@@ -0,0 +1,24 @@
# LangChain
This example is a basic "hello world" of using LangChain with Ollama.
## Running the Example
1. Ensure you have the `llama2` model installed:
```bash
ollama pull llama2
```
2. Install the Python Requirements.
```bash
pip install -r requirements.txt
```
3. Run the example:
```bash
python main.py
```


@@ -0,0 +1,6 @@
from langchain.llms import Ollama
input = input("What is your question?")
llm = Ollama(model="llama2")
res = llm.predict(input)
print(res)


@@ -0,0 +1 @@
langchain==0.0.259


@@ -0,0 +1,23 @@
# LangChain
This example is a basic "hello world" of using LangChain with Ollama, written in Node.js and TypeScript.
## Running the Example
1. Install the prerequisites:
```bash
npm install
```
2. Ensure the `mistral` model is available:
```bash
ollama pull mistral
```
3. Run the example:
```bash
npm start
```


@@ -0,0 +1,25 @@
import { Ollama } from 'langchain/llms/ollama';
import * as readline from "readline";

async function main() {
  const ollama = new Ollama({
    model: 'mistral'
    // other parameters can be found at https://js.langchain.com/docs/api/llms_ollama/classes/Ollama
  });

  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout,
  });

  rl.question("What is your question: \n", async (user_input) => {
    const stream = await ollama.stream(user_input);

    for await (const chunk of stream) {
      process.stdout.write(chunk);
    }
    rl.close();
  })
}

main();


@@ -0,0 +1,997 @@
{
"name": "langchain-typescript-simple",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"dependencies": {
"langchain": "^0.0.165"
},
"devDependencies": {
"typescript": "^5.2.2"
}
},
"node_modules/@anthropic-ai/sdk": {
"version": "0.6.2",
"resolved": "https://registry.npmjs.org/@anthropic-ai/sdk/-/sdk-0.6.2.tgz",
"integrity": "sha512-fB9PUj9RFT+XjkL+E9Ol864ZIJi+1P8WnbHspN3N3/GK2uSzjd0cbVIKTGgf4v3N8MwaQu+UWnU7C4BG/fap/g==",
"dependencies": {
"@types/node": "^18.11.18",
"@types/node-fetch": "^2.6.4",
"abort-controller": "^3.0.0",
"agentkeepalive": "^4.2.1",
"digest-fetch": "^1.3.0",
"form-data-encoder": "1.7.2",
"formdata-node": "^4.3.2",
"node-fetch": "^2.6.7"
}
},
"node_modules/@types/node": {
"version": "18.18.4",
"resolved": "https://registry.npmjs.org/@types/node/-/node-18.18.4.tgz",
"integrity": "sha512-t3rNFBgJRugIhackit2mVcLfF6IRc0JE4oeizPQL8Zrm8n2WY/0wOdpOPhdtG0V9Q2TlW/axbF1MJ6z+Yj/kKQ=="
},
"node_modules/@types/node-fetch": {
"version": "2.6.6",
"resolved": "https://registry.npmjs.org/@types/node-fetch/-/node-fetch-2.6.6.tgz",
"integrity": "sha512-95X8guJYhfqiuVVhRFxVQcf4hW/2bCuoPwDasMf/531STFoNoWTT7YDnWdXHEZKqAGUigmpG31r2FE70LwnzJw==",
"dependencies": {
"@types/node": "*",
"form-data": "^4.0.0"
}
},
"node_modules/@types/retry": {
"version": "0.12.0",
"resolved": "https://registry.npmjs.org/@types/retry/-/retry-0.12.0.tgz",
"integrity": "sha512-wWKOClTTiizcZhXnPY4wikVAwmdYHp8q6DmC+EJUzAMsycb7HB32Kh9RN4+0gExjmPmZSAQjgURXIGATPegAvA=="
},
"node_modules/@types/uuid": {
"version": "9.0.5",
"resolved": "https://registry.npmjs.org/@types/uuid/-/uuid-9.0.5.tgz",
"integrity": "sha512-xfHdwa1FMJ082prjSJpoEI57GZITiQz10r3vEJCHa2khEFQjKy91aWKz6+zybzssCvXUwE1LQWgWVwZ4nYUvHQ=="
},
"node_modules/abort-controller": {
"version": "3.0.0",
"resolved": "https://registry.npmjs.org/abort-controller/-/abort-controller-3.0.0.tgz",
"integrity": "sha512-h8lQ8tacZYnR3vNQTgibj+tODHI5/+l06Au2Pcriv/Gmet0eaj4TwWH41sO9wnHDiQsEj19q0drzdWdeAHtweg==",
"dependencies": {
"event-target-shim": "^5.0.0"
},
"engines": {
"node": ">=6.5"
}
},
"node_modules/agentkeepalive": {
"version": "4.5.0",
"resolved": "https://registry.npmjs.org/agentkeepalive/-/agentkeepalive-4.5.0.tgz",
"integrity": "sha512-5GG/5IbQQpC9FpkRGsSvZI5QYeSCzlJHdpBQntCsuTOxhKD8lqKhrleg2Yi7yvMIf82Ycmmqln9U8V9qwEiJew==",
"dependencies": {
"humanize-ms": "^1.2.1"
},
"engines": {
"node": ">= 8.0.0"
}
},
"node_modules/ansi-styles": {
"version": "5.2.0",
"resolved": "https://registry.npmjs.org/ansi-styles/-/ansi-styles-5.2.0.tgz",
"integrity": "sha512-Cxwpt2SfTzTtXcfOlzGEee8O+c+MmUgGrNiBcXnuWxuFJHe6a5Hz7qwhwe5OgaSYI0IJvkLqWX1ASG+cJOkEiA==",
"engines": {
"node": ">=10"
},
"funding": {
"url": "https://github.com/chalk/ansi-styles?sponsor=1"
}
},
"node_modules/argparse": {
"version": "2.0.1",
"resolved": "https://registry.npmjs.org/argparse/-/argparse-2.0.1.tgz",
"integrity": "sha512-8+9WqebbFzpX9OR+Wa6O29asIogeRMzcGtAINdpMHHyAg10f05aSFVBbcEqGf/PXw1EjAZ+q2/bEBg3DvurK3Q=="
},
"node_modules/asynckit": {
"version": "0.4.0",
"resolved": "https://registry.npmjs.org/asynckit/-/asynckit-0.4.0.tgz",
"integrity": "sha512-Oei9OH4tRh0YqU3GxhX79dM/mwVgvbZJaSNaRk+bshkj0S5cfHcgYakreBjrHwatXKbz+IoIdYLxrKim2MjW0Q=="
},
"node_modules/base-64": {
"version": "0.1.0",
"resolved": "https://registry.npmjs.org/base-64/-/base-64-0.1.0.tgz",
"integrity": "sha512-Y5gU45svrR5tI2Vt/X9GPd3L0HNIKzGu202EjxrXMpuc2V2CiKgemAbUUsqYmZJvPtCXoUKjNZwBJzsNScUbXA=="
},
"node_modules/base64-js": {
"version": "1.5.1",
"resolved": "https://registry.npmjs.org/base64-js/-/base64-js-1.5.1.tgz",
"integrity": "sha512-AKpaYlHn8t4SVbOHCy+b5+KKgvR4vrsD8vbvrbiQJps7fKDTkjkDry6ji0rUJjC0kzbNePLwzxq8iypo41qeWA==",
"funding": [
{
"type": "github",
"url": "https://github.com/sponsors/feross"
},
{
"type": "patreon",
"url": "https://www.patreon.com/feross"
},
{
"type": "consulting",
"url": "https://feross.org/support"
}
]
},
"node_modules/binary-extensions": {
"version": "2.2.0",
"resolved": "https://registry.npmjs.org/binary-extensions/-/binary-extensions-2.2.0.tgz",
"integrity": "sha512-jDctJ/IVQbZoJykoeHbhXpOlNBqGNcwXJKJog42E5HDPUwQTSdjCHdihjj0DlnheQ7blbT6dHOafNAiS8ooQKA==",
"engines": {
"node": ">=8"
}
},
"node_modules/binary-search": {
"version": "1.3.6",
"resolved": "https://registry.npmjs.org/binary-search/-/binary-search-1.3.6.tgz",
"integrity": "sha512-nbE1WxOTTrUWIfsfZ4aHGYu5DOuNkbxGokjV6Z2kxfJK3uaAb8zNK1muzOeipoLHZjInT4Br88BHpzevc681xA=="
},
"node_modules/camelcase": {
"version": "6.3.0",
"resolved": "https://registry.npmjs.org/camelcase/-/camelcase-6.3.0.tgz",
"integrity": "sha512-Gmy6FhYlCY7uOElZUSbxo2UCDH8owEk996gkbrpsgGtrJLM3J7jGxl9Ic7Qwwj4ivOE5AWZWRMecDdF7hqGjFA==",
"engines": {
"node": ">=10"
},
"funding": {
"url": "https://github.com/sponsors/sindresorhus"
}
},
"node_modules/charenc": {
"version": "0.0.2",
"resolved": "https://registry.npmjs.org/charenc/-/charenc-0.0.2.tgz",
"integrity": "sha512-yrLQ/yVUFXkzg7EDQsPieE/53+0RlaWTs+wBrvW36cyilJ2SaDWfl4Yj7MtLTXleV9uEKefbAGUPv2/iWSooRA==",
"engines": {
"node": "*"
}
},
"node_modules/combined-stream": {
"version": "1.0.8",
"resolved": "https://registry.npmjs.org/combined-stream/-/combined-stream-1.0.8.tgz",
"integrity": "sha512-FQN4MRfuJeHf7cBbBMJFXhKSDq+2kAArBlmRBvcvFE5BB1HZKXtSFASDhdlz9zOYwxh8lDdnvmMOe/+5cdoEdg==",
"dependencies": {
"delayed-stream": "~1.0.0"
},
"engines": {
"node": ">= 0.8"
}
},
"node_modules/commander": {
"version": "10.0.1",
"resolved": "https://registry.npmjs.org/commander/-/commander-10.0.1.tgz",
"integrity": "sha512-y4Mg2tXshplEbSGzx7amzPwKKOCGuoSRP/CjEdwwk0FOGlUbq6lKuoyDZTNZkmxHdJtp54hdfY/JUrdL7Xfdug==",
"engines": {
"node": ">=14"
}
},
"node_modules/crypt": {
"version": "0.0.2",
"resolved": "https://registry.npmjs.org/crypt/-/crypt-0.0.2.tgz",
"integrity": "sha512-mCxBlsHFYh9C+HVpiEacem8FEBnMXgU9gy4zmNC+SXAZNB/1idgp/aulFJ4FgCi7GPEVbfyng092GqL2k2rmow==",
"engines": {
"node": "*"
}
},
"node_modules/decamelize": {
"version": "1.2.0",
"resolved": "https://registry.npmjs.org/decamelize/-/decamelize-1.2.0.tgz",
"integrity": "sha512-z2S+W9X73hAUUki+N+9Za2lBlun89zigOyGrsax+KUQ6wKW4ZoWpEYBkGhQjwAjjDCkWxhY0VKEhk8wzY7F5cA==",
"engines": {
"node": ">=0.10.0"
}
},
"node_modules/delayed-stream": {
"version": "1.0.0",
"resolved": "https://registry.npmjs.org/delayed-stream/-/delayed-stream-1.0.0.tgz",
"integrity": "sha512-ZySD7Nf91aLB0RxL4KGrKHBXl7Eds1DAmEdcoVawXnLD7SDhpNgtuII2aAkg7a7QS41jxPSZ17p4VdGnMHk3MQ==",
"engines": {
"node": ">=0.4.0"
}
},
"node_modules/digest-fetch": {
"version": "1.3.0",
"resolved": "https://registry.npmjs.org/digest-fetch/-/digest-fetch-1.3.0.tgz",
"integrity": "sha512-CGJuv6iKNM7QyZlM2T3sPAdZWd/p9zQiRNS9G+9COUCwzWFTs0Xp8NF5iePx7wtvhDykReiRRrSeNb4oMmB8lA==",
"dependencies": {
"base-64": "^0.1.0",
"md5": "^2.3.0"
}
},
"node_modules/event-target-shim": {
"version": "5.0.1",
"resolved": "https://registry.npmjs.org/event-target-shim/-/event-target-shim-5.0.1.tgz",
"integrity": "sha512-i/2XbnSz/uxRCU6+NdVJgKWDTM427+MqYbkQzD321DuCQJUqOuJKIA0IM2+W2xtYHdKOmZ4dR6fExsd4SXL+WQ==",
"engines": {
"node": ">=6"
}
},
"node_modules/eventemitter3": {
"version": "4.0.7",
"resolved": "https://registry.npmjs.org/eventemitter3/-/eventemitter3-4.0.7.tgz",
"integrity": "sha512-8guHBZCwKnFhYdHr2ysuRWErTwhoN2X8XELRlrRwpmfeY2jjuUN4taQMsULKUVo1K4DvZl+0pgfyoysHxvmvEw=="
},
"node_modules/expr-eval": {
"version": "2.0.2",
"resolved": "https://registry.npmjs.org/expr-eval/-/expr-eval-2.0.2.tgz",
"integrity": "sha512-4EMSHGOPSwAfBiibw3ndnP0AvjDWLsMvGOvWEZ2F96IGk0bIVdjQisOHxReSkE13mHcfbuCiXw+G4y0zv6N8Eg=="
},
"node_modules/flat": {
"version": "5.0.2",
"resolved": "https://registry.npmjs.org/flat/-/flat-5.0.2.tgz",
"integrity": "sha512-b6suED+5/3rTpUBdG1gupIl8MPFCAMA0QXwmljLhvCUKcUvdE4gWky9zpuGCcXHOsz4J9wPGNWq6OKpmIzz3hQ==",
"bin": {
"flat": "cli.js"
}
},
"node_modules/form-data": {
"version": "4.0.0",
"resolved": "https://registry.npmjs.org/form-data/-/form-data-4.0.0.tgz",
"integrity": "sha512-ETEklSGi5t0QMZuiXoA/Q6vcnxcLQP5vdugSpuAyi6SVGi2clPPp+xgEhuMaHC+zGgn31Kd235W35f7Hykkaww==",
"dependencies": {
"asynckit": "^0.4.0",
"combined-stream": "^1.0.8",
"mime-types": "^2.1.12"
},
"engines": {
"node": ">= 6"
}
},
"node_modules/form-data-encoder": {
"version": "1.7.2",
"resolved": "https://registry.npmjs.org/form-data-encoder/-/form-data-encoder-1.7.2.tgz",
"integrity": "sha512-qfqtYan3rxrnCk1VYaA4H+Ms9xdpPqvLZa6xmMgFvhO32x7/3J/ExcTd6qpxM0vH2GdMI+poehyBZvqfMTto8A=="
},
"node_modules/formdata-node": {
"version": "4.4.1",
"resolved": "https://registry.npmjs.org/formdata-node/-/formdata-node-4.4.1.tgz",
"integrity": "sha512-0iirZp3uVDjVGt9p49aTaqjk84TrglENEDuqfdlZQ1roC9CWlPk6Avf8EEnZNcAqPonwkG35x4n3ww/1THYAeQ==",
"dependencies": {
"node-domexception": "1.0.0",
"web-streams-polyfill": "4.0.0-beta.3"
},
"engines": {
"node": ">= 12.20"
}
},
"node_modules/humanize-ms": {
"version": "1.2.1",
"resolved": "https://registry.npmjs.org/humanize-ms/-/humanize-ms-1.2.1.tgz",
"integrity": "sha512-Fl70vYtsAFb/C06PTS9dZBo7ihau+Tu/DNCk/OyHhea07S+aeMWpFFkUaXRa8fI+ScZbEI8dfSxwY7gxZ9SAVQ==",
"dependencies": {
"ms": "^2.0.0"
}
},
"node_modules/is-any-array": {
"version": "2.0.1",
"resolved": "https://registry.npmjs.org/is-any-array/-/is-any-array-2.0.1.tgz",
"integrity": "sha512-UtilS7hLRu++wb/WBAw9bNuP1Eg04Ivn1vERJck8zJthEvXCBEBpGR/33u/xLKWEQf95803oalHrVDptcAvFdQ=="
},
"node_modules/is-buffer": {
"version": "1.1.6",
"resolved": "https://registry.npmjs.org/is-buffer/-/is-buffer-1.1.6.tgz",
"integrity": "sha512-NcdALwpXkTm5Zvvbk7owOUSvVvBKDgKP5/ewfXEznmQFfs4ZRmanOeKBTjRVjka3QFoN6XJ+9F3USqfHqTaU5w=="
},
"node_modules/js-tiktoken": {
"version": "1.0.7",
"resolved": "https://registry.npmjs.org/js-tiktoken/-/js-tiktoken-1.0.7.tgz",
"integrity": "sha512-biba8u/clw7iesNEWLOLwrNGoBP2lA+hTaBLs/D45pJdUPFXyxD6nhcDVtADChghv4GgyAiMKYMiRx7x6h7Biw==",
"dependencies": {
"base64-js": "^1.5.1"
}
},
"node_modules/js-yaml": {
"version": "4.1.0",
"resolved": "https://registry.npmjs.org/js-yaml/-/js-yaml-4.1.0.tgz",
"integrity": "sha512-wpxZs9NoxZaJESJGIZTyDEaYpl0FKSA+FB9aJiyemKhMwkxQg63h4T1KJgUGHpTqPDNRcmmYLugrRjJlBtWvRA==",
"dependencies": {
"argparse": "^2.0.1"
},
"bin": {
"js-yaml": "bin/js-yaml.js"
}
},
"node_modules/jsonpointer": {
"version": "5.0.1",
"resolved": "https://registry.npmjs.org/jsonpointer/-/jsonpointer-5.0.1.tgz",
"integrity": "sha512-p/nXbhSEcu3pZRdkW1OfJhpsVtW1gd4Wa1fnQc9YLiTfAjn0312eMKimbdIQzuZl9aa9xUGaRlP9T/CJE/ditQ==",
"engines": {
"node": ">=0.10.0"
}
},
"node_modules/langchain": {
"version": "0.0.165",
"resolved": "https://registry.npmjs.org/langchain/-/langchain-0.0.165.tgz",
"integrity": "sha512-CpbNpjwaE+9lzjdw+pZz0VgnRrFivEgr7CVp9dDaAb5JpaJAA4V2v6uQ9ZPN+TSqupTQ79HFn2sfyZVEl2EG7Q==",
"dependencies": {
"@anthropic-ai/sdk": "^0.6.2",
"ansi-styles": "^5.0.0",
"binary-extensions": "^2.2.0",
"camelcase": "6",
"decamelize": "^1.2.0",
"expr-eval": "^2.0.2",
"flat": "^5.0.2",
"js-tiktoken": "^1.0.7",
"js-yaml": "^4.1.0",
"jsonpointer": "^5.0.1",
"langchainhub": "~0.0.6",
"langsmith": "~0.0.31",
"ml-distance": "^4.0.0",
"object-hash": "^3.0.0",
"openai": "~4.4.0",
"openapi-types": "^12.1.3",
"p-queue": "^6.6.2",
"p-retry": "4",
"uuid": "^9.0.0",
"yaml": "^2.2.1",
"zod": "^3.22.3",
"zod-to-json-schema": "^3.20.4"
},
"engines": {
"node": ">=18"
},
"peerDependencies": {
"@aws-crypto/sha256-js": "^5.0.0",
"@aws-sdk/client-bedrock-runtime": "^3.422.0",
"@aws-sdk/client-dynamodb": "^3.310.0",
"@aws-sdk/client-kendra": "^3.352.0",
"@aws-sdk/client-lambda": "^3.310.0",
"@aws-sdk/client-s3": "^3.310.0",
"@aws-sdk/client-sagemaker-runtime": "^3.310.0",
"@aws-sdk/client-sfn": "^3.310.0",
"@aws-sdk/credential-provider-node": "^3.388.0",
"@azure/storage-blob": "^12.15.0",
"@clickhouse/client": "^0.0.14",
"@cloudflare/ai": "^1.0.12",
"@elastic/elasticsearch": "^8.4.0",
"@getmetal/metal-sdk": "*",
"@getzep/zep-js": "^0.7.0",
"@gomomento/sdk": "^1.23.0",
"@google-ai/generativelanguage": "^0.2.1",
"@google-cloud/storage": "^6.10.1",
"@huggingface/inference": "^1.5.1",
"@mozilla/readability": "*",
"@notionhq/client": "^2.2.10",
"@opensearch-project/opensearch": "*",
"@pinecone-database/pinecone": "^1.1.0",
"@planetscale/database": "^1.8.0",
"@qdrant/js-client-rest": "^1.2.0",
"@raycast/api": "^1.55.2",
"@smithy/eventstream-codec": "^2.0.5",
"@smithy/protocol-http": "^3.0.6",
"@smithy/signature-v4": "^2.0.10",
"@smithy/util-utf8": "^2.0.0",
"@supabase/postgrest-js": "^1.1.1",
"@supabase/supabase-js": "^2.10.0",
"@tensorflow-models/universal-sentence-encoder": "*",
"@tensorflow/tfjs-converter": "*",
"@tensorflow/tfjs-core": "*",
"@upstash/redis": "^1.20.6",
"@vercel/postgres": "^0.5.0",
"@writerai/writer-sdk": "^0.40.2",
"@xata.io/client": "^0.25.1",
"@xenova/transformers": "^2.5.4",
"@zilliz/milvus2-sdk-node": ">=2.2.7",
"apify-client": "^2.7.1",
"axios": "*",
"cassandra-driver": "^4.6.4",
"cheerio": "^1.0.0-rc.12",
"chromadb": "*",
"cohere-ai": ">=6.0.0",
"d3-dsv": "^2.0.0",
"epub2": "^3.0.1",
"faiss-node": "^0.3.0",
"fast-xml-parser": "^4.2.7",
"firebase-admin": "^11.9.0",
"google-auth-library": "^8.9.0",
"googleapis": "^126.0.1",
"hnswlib-node": "^1.4.2",
"html-to-text": "^9.0.5",
"ignore": "^5.2.0",
"ioredis": "^5.3.2",
"jsdom": "*",
"llmonitor": "*",
"lodash": "^4.17.21",
"mammoth": "*",
"mongodb": "^5.2.0",
"mysql2": "^3.3.3",
"neo4j-driver": "*",
"node-llama-cpp": "*",
"notion-to-md": "^3.1.0",
"pdf-parse": "1.1.1",
"peggy": "^3.0.2",
"pg": "^8.11.0",
"pg-copy-streams": "^6.0.5",
"pickleparser": "^0.1.0",
"playwright": "^1.32.1",
"portkey-ai": "^0.1.11",
"puppeteer": "^19.7.2",
"redis": "^4.6.4",
"replicate": "^0.18.0",
"sonix-speech-recognition": "^2.1.1",
"srt-parser-2": "^1.2.2",
"typeorm": "^0.3.12",
"typesense": "^1.5.3",
"usearch": "^1.1.1",
"vectordb": "^0.1.4",
"voy-search": "0.6.2",
"weaviate-ts-client": "^1.4.0",
"web-auth-library": "^1.0.3",
"youtube-transcript": "^1.0.6",
"youtubei.js": "^5.8.0"
},
"peerDependenciesMeta": {
"@aws-crypto/sha256-js": {
"optional": true
},
"@aws-sdk/client-bedrock-runtime": {
"optional": true
},
"@aws-sdk/client-dynamodb": {
"optional": true
},
"@aws-sdk/client-kendra": {
"optional": true
},
"@aws-sdk/client-lambda": {
"optional": true
},
"@aws-sdk/client-s3": {
"optional": true
},
"@aws-sdk/client-sagemaker-runtime": {
"optional": true
},
"@aws-sdk/client-sfn": {
"optional": true
},
"@aws-sdk/credential-provider-node": {
"optional": true
},
"@azure/storage-blob": {
"optional": true
},
"@clickhouse/client": {
"optional": true
},
"@cloudflare/ai": {
"optional": true
},
"@elastic/elasticsearch": {
"optional": true
},
"@getmetal/metal-sdk": {
"optional": true
},
"@getzep/zep-js": {
"optional": true
},
"@gomomento/sdk": {
"optional": true
},
"@google-ai/generativelanguage": {
"optional": true
},
"@google-cloud/storage": {
"optional": true
},
"@huggingface/inference": {
"optional": true
},
"@mozilla/readability": {
"optional": true
},
"@notionhq/client": {
"optional": true
},
"@opensearch-project/opensearch": {
"optional": true
},
"@pinecone-database/pinecone": {
"optional": true
},
"@planetscale/database": {
"optional": true
},
"@qdrant/js-client-rest": {
"optional": true
},
"@raycast/api": {
"optional": true
},
"@smithy/eventstream-codec": {
"optional": true
},
"@smithy/protocol-http": {
"optional": true
},
"@smithy/signature-v4": {
"optional": true
},
"@smithy/util-utf8": {
"optional": true
},
"@supabase/postgrest-js": {
"optional": true
},
"@supabase/supabase-js": {
"optional": true
},
"@tensorflow-models/universal-sentence-encoder": {
"optional": true
},
"@tensorflow/tfjs-converter": {
"optional": true
},
"@tensorflow/tfjs-core": {
"optional": true
},
"@upstash/redis": {
"optional": true
},
"@vercel/postgres": {
"optional": true
},
"@writerai/writer-sdk": {
"optional": true
},
"@xata.io/client": {
"optional": true
},
"@xenova/transformers": {
"optional": true
},
"@zilliz/milvus2-sdk-node": {
"optional": true
},
"apify-client": {
"optional": true
},
"axios": {
"optional": true
},
"cassandra-driver": {
"optional": true
},
"cheerio": {
"optional": true
},
"chromadb": {
"optional": true
},
"cohere-ai": {
"optional": true
},
"d3-dsv": {
"optional": true
},
"epub2": {
"optional": true
},
"faiss-node": {
"optional": true
},
"fast-xml-parser": {
"optional": true
},
"firebase-admin": {
"optional": true
},
"google-auth-library": {
"optional": true
},
"googleapis": {
"optional": true
},
"hnswlib-node": {
"optional": true
},
"html-to-text": {
"optional": true
},
"ignore": {
"optional": true
},
"ioredis": {
"optional": true
},
"jsdom": {
"optional": true
},
"llmonitor": {
"optional": true
},
"lodash": {
"optional": true
},
"mammoth": {
"optional": true
},
"mongodb": {
"optional": true
},
"mysql2": {
"optional": true
},
"neo4j-driver": {
"optional": true
},
"node-llama-cpp": {
"optional": true
},
"notion-to-md": {
"optional": true
},
"pdf-parse": {
"optional": true
},
"peggy": {
"optional": true
},
"pg": {
"optional": true
},
"pg-copy-streams": {
"optional": true
},
"pickleparser": {
"optional": true
},
"playwright": {
"optional": true
},
"portkey-ai": {
"optional": true
},
"puppeteer": {
"optional": true
},
"redis": {
"optional": true
},
"replicate": {
"optional": true
},
"sonix-speech-recognition": {
"optional": true
},
"srt-parser-2": {
"optional": true
},
"typeorm": {
"optional": true
},
"typesense": {
"optional": true
},
"usearch": {
"optional": true
},
"vectordb": {
"optional": true
},
"voy-search": {
"optional": true
},
"weaviate-ts-client": {
"optional": true
},
"web-auth-library": {
"optional": true
},
"youtube-transcript": {
"optional": true
},
"youtubei.js": {
"optional": true
}
}
},
"node_modules/langchainhub": {
"version": "0.0.6",
"resolved": "https://registry.npmjs.org/langchainhub/-/langchainhub-0.0.6.tgz",
"integrity": "sha512-SW6105T+YP1cTe0yMf//7kyshCgvCTyFBMTgH2H3s9rTAR4e+78DA/BBrUL/Mt4Q5eMWui7iGuAYb3pgGsdQ9w=="
},
"node_modules/langsmith": {
"version": "0.0.42",
"resolved": "https://registry.npmjs.org/langsmith/-/langsmith-0.0.42.tgz",
"integrity": "sha512-sFuN+e7E+pPBIRaRgFqZh/BRBWNHTZNAwi6uj4kydQawooCZYoJmM5snOkiQrhVSvAhgu6xFhLvmfvkPcKzD7w==",
"dependencies": {
"@types/uuid": "^9.0.1",
"commander": "^10.0.1",
"p-queue": "^6.6.2",
"p-retry": "4",
"uuid": "^9.0.0"
},
"bin": {
"langsmith": "dist/cli/main.cjs"
}
},
"node_modules/md5": {
"version": "2.3.0",
"resolved": "https://registry.npmjs.org/md5/-/md5-2.3.0.tgz",
"integrity": "sha512-T1GITYmFaKuO91vxyoQMFETst+O71VUPEU3ze5GNzDm0OWdP8v1ziTaAEPUr/3kLsY3Sftgz242A1SetQiDL7g==",
"dependencies": {
"charenc": "0.0.2",
"crypt": "0.0.2",
"is-buffer": "~1.1.6"
}
},
"node_modules/mime-db": {
"version": "1.52.0",
"resolved": "https://registry.npmjs.org/mime-db/-/mime-db-1.52.0.tgz",
"integrity": "sha512-sPU4uV7dYlvtWJxwwxHD0PuihVNiE7TyAbQ5SWxDCB9mUYvOgroQOwYQQOKPJ8CIbE+1ETVlOoK1UC2nU3gYvg==",
"engines": {
"node": ">= 0.6"
}
},
"node_modules/mime-types": {
"version": "2.1.35",
"resolved": "https://registry.npmjs.org/mime-types/-/mime-types-2.1.35.tgz",
"integrity": "sha512-ZDY+bPm5zTTF+YpCrAU9nK0UgICYPT0QtT1NZWFv4s++TNkcgVaT0g6+4R2uI4MjQjzysHB1zxuWL50hzaeXiw==",
"dependencies": {
"mime-db": "1.52.0"
},
"engines": {
"node": ">= 0.6"
}
},
"node_modules/ml-array-mean": {
"version": "1.1.6",
"resolved": "https://registry.npmjs.org/ml-array-mean/-/ml-array-mean-1.1.6.tgz",
"integrity": "sha512-MIdf7Zc8HznwIisyiJGRH9tRigg3Yf4FldW8DxKxpCCv/g5CafTw0RRu51nojVEOXuCQC7DRVVu5c7XXO/5joQ==",
"dependencies": {
"ml-array-sum": "^1.1.6"
}
},
"node_modules/ml-array-sum": {
"version": "1.1.6",
"resolved": "https://registry.npmjs.org/ml-array-sum/-/ml-array-sum-1.1.6.tgz",
"integrity": "sha512-29mAh2GwH7ZmiRnup4UyibQZB9+ZLyMShvt4cH4eTK+cL2oEMIZFnSyB3SS8MlsTh6q/w/yh48KmqLxmovN4Dw==",
"dependencies": {
"is-any-array": "^2.0.0"
}
},
"node_modules/ml-distance": {
"version": "4.0.1",
"resolved": "https://registry.npmjs.org/ml-distance/-/ml-distance-4.0.1.tgz",
"integrity": "sha512-feZ5ziXs01zhyFUUUeZV5hwc0f5JW0Sh0ckU1koZe/wdVkJdGxcP06KNQuF0WBTj8FttQUzcvQcpcrOp/XrlEw==",
"dependencies": {
"ml-array-mean": "^1.1.6",
"ml-distance-euclidean": "^2.0.0",
"ml-tree-similarity": "^1.0.0"
}
},
"node_modules/ml-distance-euclidean": {
"version": "2.0.0",
"resolved": "https://registry.npmjs.org/ml-distance-euclidean/-/ml-distance-euclidean-2.0.0.tgz",
"integrity": "sha512-yC9/2o8QF0A3m/0IXqCTXCzz2pNEzvmcE/9HFKOZGnTjatvBbsn4lWYJkxENkA4Ug2fnYl7PXQxnPi21sgMy/Q=="
},
"node_modules/ml-tree-similarity": {
"version": "1.0.0",
"resolved": "https://registry.npmjs.org/ml-tree-similarity/-/ml-tree-similarity-1.0.0.tgz",
"integrity": "sha512-XJUyYqjSuUQkNQHMscr6tcjldsOoAekxADTplt40QKfwW6nd++1wHWV9AArl0Zvw/TIHgNaZZNvr8QGvE8wLRg==",
"dependencies": {
"binary-search": "^1.3.5",
"num-sort": "^2.0.0"
}
},
"node_modules/ms": {
"version": "2.1.3",
"resolved": "https://registry.npmjs.org/ms/-/ms-2.1.3.tgz",
"integrity": "sha512-6FlzubTLZG3J2a/NVCAleEhjzq5oxgHyaCU9yYXvcLsvoVaHJq/s5xXI6/XXP6tz7R9xAOtHnSO/tXtF3WRTlA=="
},
"node_modules/node-domexception": {
"version": "1.0.0",
"resolved": "https://registry.npmjs.org/node-domexception/-/node-domexception-1.0.0.tgz",
"integrity": "sha512-/jKZoMpw0F8GRwl4/eLROPA3cfcXtLApP0QzLmUT/HuPCZWyB7IY9ZrMeKw2O/nFIqPQB3PVM9aYm0F312AXDQ==",
"funding": [
{
"type": "github",
"url": "https://github.com/sponsors/jimmywarting"
},
{
"type": "github",
"url": "https://paypal.me/jimmywarting"
}
],
"engines": {
"node": ">=10.5.0"
}
},
"node_modules/node-fetch": {
"version": "2.7.0",
"resolved": "https://registry.npmjs.org/node-fetch/-/node-fetch-2.7.0.tgz",
"integrity": "sha512-c4FRfUm/dbcWZ7U+1Wq0AwCyFL+3nt2bEw05wfxSz+DWpWsitgmSgYmy2dQdWyKC1694ELPqMs/YzUSNozLt8A==",
"dependencies": {
"whatwg-url": "^5.0.0"
},
"engines": {
"node": "4.x || >=6.0.0"
},
"peerDependencies": {
"encoding": "^0.1.0"
},
"peerDependenciesMeta": {
"encoding": {
"optional": true
}
}
},
"node_modules/num-sort": {
"version": "2.1.0",
"resolved": "https://registry.npmjs.org/num-sort/-/num-sort-2.1.0.tgz",
"integrity": "sha512-1MQz1Ed8z2yckoBeSfkQHHO9K1yDRxxtotKSJ9yvcTUUxSvfvzEq5GwBrjjHEpMlq/k5gvXdmJ1SbYxWtpNoVg==",
"engines": {
"node": ">=8"
},
"funding": {
"url": "https://github.com/sponsors/sindresorhus"
}
},
"node_modules/object-hash": {
"version": "3.0.0",
"resolved": "https://registry.npmjs.org/object-hash/-/object-hash-3.0.0.tgz",
"integrity": "sha512-RSn9F68PjH9HqtltsSnqYC1XXoWe9Bju5+213R98cNGttag9q9yAOTzdbsqvIa7aNm5WffBZFpWYr2aWrklWAw==",
"engines": {
"node": ">= 6"
}
},
"node_modules/openai": {
"version": "4.4.0",
"resolved": "https://registry.npmjs.org/openai/-/openai-4.4.0.tgz",
"integrity": "sha512-JN0t628Kh95T0IrXl0HdBqnlJg+4Vq0Bnh55tio+dfCnyzHvMLiWyCM9m726MAJD2YkDU4/8RQB6rNbEq9ct2w==",
"dependencies": {
"@types/node": "^18.11.18",
"@types/node-fetch": "^2.6.4",
"abort-controller": "^3.0.0",
"agentkeepalive": "^4.2.1",
"digest-fetch": "^1.3.0",
"form-data-encoder": "1.7.2",
"formdata-node": "^4.3.2",
"node-fetch": "^2.6.7"
},
"bin": {
"openai": "bin/cli"
}
},
"node_modules/openapi-types": {
"version": "12.1.3",
"resolved": "https://registry.npmjs.org/openapi-types/-/openapi-types-12.1.3.tgz",
"integrity": "sha512-N4YtSYJqghVu4iek2ZUvcN/0aqH1kRDuNqzcycDxhOUpg7GdvLa2F3DgS6yBNhInhv2r/6I0Flkn7CqL8+nIcw=="
},
"node_modules/p-finally": {
"version": "1.0.0",
"resolved": "https://registry.npmjs.org/p-finally/-/p-finally-1.0.0.tgz",
"integrity": "sha512-LICb2p9CB7FS+0eR1oqWnHhp0FljGLZCWBE9aix0Uye9W8LTQPwMTYVGWQWIw9RdQiDg4+epXQODwIYJtSJaow==",
"engines": {
"node": ">=4"
}
},
"node_modules/p-queue": {
"version": "6.6.2",
"resolved": "https://registry.npmjs.org/p-queue/-/p-queue-6.6.2.tgz",
"integrity": "sha512-RwFpb72c/BhQLEXIZ5K2e+AhgNVmIejGlTgiB9MzZ0e93GRvqZ7uSi0dvRF7/XIXDeNkra2fNHBxTyPDGySpjQ==",
"dependencies": {
"eventemitter3": "^4.0.4",
"p-timeout": "^3.2.0"
},
"engines": {
"node": ">=8"
},
"funding": {
"url": "https://github.com/sponsors/sindresorhus"
}
},
"node_modules/p-retry": {
"version": "4.6.2",
"resolved": "https://registry.npmjs.org/p-retry/-/p-retry-4.6.2.tgz",
"integrity": "sha512-312Id396EbJdvRONlngUx0NydfrIQ5lsYu0znKVUzVvArzEIt08V1qhtyESbGVd1FGX7UKtiFp5uwKZdM8wIuQ==",
"dependencies": {
"@types/retry": "0.12.0",
"retry": "^0.13.1"
},
"engines": {
"node": ">=8"
}
},
"node_modules/p-timeout": {
"version": "3.2.0",
"resolved": "https://registry.npmjs.org/p-timeout/-/p-timeout-3.2.0.tgz",
"integrity": "sha512-rhIwUycgwwKcP9yTOOFK/AKsAopjjCakVqLHePO3CC6Mir1Z99xT+R63jZxAT5lFZLa2inS5h+ZS2GvR99/FBg==",
"dependencies": {
"p-finally": "^1.0.0"
},
"engines": {
"node": ">=8"
}
},
"node_modules/retry": {
"version": "0.13.1",
"resolved": "https://registry.npmjs.org/retry/-/retry-0.13.1.tgz",
"integrity": "sha512-XQBQ3I8W1Cge0Seh+6gjj03LbmRFWuoszgK9ooCpwYIrhhoO80pfq4cUkU5DkknwfOfFteRwlZ56PYOGYyFWdg==",
"engines": {
"node": ">= 4"
}
},
"node_modules/tr46": {
"version": "0.0.3",
"resolved": "https://registry.npmjs.org/tr46/-/tr46-0.0.3.tgz",
"integrity": "sha512-N3WMsuqV66lT30CrXNbEjx4GEwlow3v6rr4mCcv6prnfwhS01rkgyFdjPNBYd9br7LpXV1+Emh01fHnq2Gdgrw=="
},
"node_modules/typescript": {
"version": "5.2.2",
"resolved": "https://registry.npmjs.org/typescript/-/typescript-5.2.2.tgz",
"integrity": "sha512-mI4WrpHsbCIcwT9cF4FZvr80QUeKvsUsUvKDoR+X/7XHQH98xYD8YHZg7ANtz2GtZt/CBq2QJ0thkGJMHfqc1w==",
"dev": true,
"bin": {
"tsc": "bin/tsc",
"tsserver": "bin/tsserver"
},
"engines": {
"node": ">=14.17"
}
},
"node_modules/uuid": {
"version": "9.0.1",
"resolved": "https://registry.npmjs.org/uuid/-/uuid-9.0.1.tgz",
"integrity": "sha512-b+1eJOlsR9K8HJpow9Ok3fiWOWSIcIzXodvv0rQjVoOVNpWMpxf1wZNpt4y9h10odCNrqnYp1OBzRktckBe3sA==",
"funding": [
"https://github.com/sponsors/broofa",
"https://github.com/sponsors/ctavan"
],
"bin": {
"uuid": "dist/bin/uuid"
}
},
"node_modules/web-streams-polyfill": {
"version": "4.0.0-beta.3",
"resolved": "https://registry.npmjs.org/web-streams-polyfill/-/web-streams-polyfill-4.0.0-beta.3.tgz",
"integrity": "sha512-QW95TCTaHmsYfHDybGMwO5IJIM93I/6vTRk+daHTWFPhwh+C8Cg7j7XyKrwrj8Ib6vYXe0ocYNrmzY4xAAN6ug==",
"engines": {
"node": ">= 14"
}
},
"node_modules/webidl-conversions": {
"version": "3.0.1",
"resolved": "https://registry.npmjs.org/webidl-conversions/-/webidl-conversions-3.0.1.tgz",
"integrity": "sha512-2JAn3z8AR6rjK8Sm8orRC0h/bcl/DqL7tRPdGZ4I1CjdF+EaMLmYxBHyXuKL849eucPFhvBoxMsflfOb8kxaeQ=="
},
"node_modules/whatwg-url": {
"version": "5.0.0",
"resolved": "https://registry.npmjs.org/whatwg-url/-/whatwg-url-5.0.0.tgz",
"integrity": "sha512-saE57nupxk6v3HY35+jzBwYa0rKSy0XR8JSxZPwgLr7ys0IBzhGviA1/TUGJLmSVqs8pb9AnvICXEuOHLprYTw==",
"dependencies": {
"tr46": "~0.0.3",
"webidl-conversions": "^3.0.0"
}
},
"node_modules/yaml": {
"version": "2.3.2",
"resolved": "https://registry.npmjs.org/yaml/-/yaml-2.3.2.tgz",
"integrity": "sha512-N/lyzTPaJasoDmfV7YTrYCI0G/3ivm/9wdG0aHuheKowWQwGTsK0Eoiw6utmzAnI6pkJa0DUVygvp3spqqEKXg==",
"engines": {
"node": ">= 14"
}
},
"node_modules/zod": {
"version": "3.22.4",
"resolved": "https://registry.npmjs.org/zod/-/zod-3.22.4.tgz",
"integrity": "sha512-iC+8Io04lddc+mVqQ9AZ7OQ2MrUKGN+oIQyq1vemgt46jwCwLfhq7/pwnBnNXXXZb8VTVLKwp9EDkx+ryxIWmg==",
"funding": {
"url": "https://github.com/sponsors/colinhacks"
}
},
"node_modules/zod-to-json-schema": {
"version": "3.21.4",
"resolved": "https://registry.npmjs.org/zod-to-json-schema/-/zod-to-json-schema-3.21.4.tgz",
"integrity": "sha512-fjUZh4nQ1s6HMccgIeE0VP4QG/YRGPmyjO9sAh890aQKPEk3nqbfUXhMFaC+Dr5KvYBm8BCyvfpZf2jY9aGSsw==",
"peerDependencies": {
"zod": "^3.21.4"
}
}
}
}

View File

@@ -0,0 +1,13 @@
{
"scripts": {
"start": "tsx main.ts"
},
"devDependencies": {
"tsx": "^4.6.2",
"typescript": "^5.3.3"
},
"dependencies": {
"langchain": "^0.0.165",
"readline": "^1.3.0"
}
}

View File

@@ -0,0 +1,5 @@
FROM llama2
PARAMETER temperature 1
SYSTEM """
You are Mario from super mario bros, acting as an assistant.
"""

Binary file not shown (image added, 446 KiB).

View File

@@ -0,0 +1,43 @@
<img src="logo.png" alt="image of Italian plumber" height="200"/>
# Example character: Mario
This example shows how to create a basic character using Llama2 as the base model.
To run this example:
1. Download the Modelfile
2. `ollama pull llama2` to get the base model used in the model file.
3. `ollama create NAME -f ./Modelfile`
4. `ollama run NAME`
Ask it some questions like "Who are you?" or "Is Peach in trouble again?"
## Editing this file
What the model file looks like:
```
FROM llama2
PARAMETER temperature 1
SYSTEM """
You are Mario from Super Mario Bros, acting as an assistant.
"""
```
What if you want to change its behaviour?
- Try changing the prompt
- Try changing the parameters [Docs](https://github.com/jmorganca/ollama/blob/main/docs/modelfile.md)
- Try changing the model (e.g. `FROM wizard-vicuna` to use the uncensored wizard-vicuna model)
Once the changes are made,
1. `ollama create NAME -f ./Modelfile`
2. `ollama run NAME`
3. Iterate until you are happy with the results.
Notes:
- This example is for research purposes only. There is no affiliation with any entity.
- When using an uncensored model, please be aware that it may generate offensive content.
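If you would rather talk to the character from a script instead of the REPL, here is a minimal sketch against the local generate API. It assumes the model was created as `mario` (any name passed to `ollama create` works) and that the server is running on the default port.
```python
import json
import requests

# Assumes `ollama create mario -f ./Modelfile` has been run and the server is on the default port.
data = {"model": "mario", "prompt": "Who are you?"}
with requests.post("http://localhost:11434/api/generate", json=data, stream=True) as r:
    for line in r.iter_lines():
        if line:
            chunk = json.loads(line)
            if not chunk.get("done"):
                print(chunk.get("response", ""), end="", flush=True)
print()
```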

View File

@@ -0,0 +1,23 @@
# Example Modelfile - Tweetwriter
This simple example shows what you can do without any code, relying only on a Modelfile. The file has two instructions:
1. FROM - The FROM instruction defines the parent model to use for this one. If you choose a model from the library, you can enter just the model name. For all other models, you need to specify the namespace as well. You could also use a local file; just include the relative path to the converted, quantized model weights file. To learn more about creating that file, see the `import.md` file in the docs folder of this repository.
2. SYSTEM - This defines the system prompt for the model and overrides the system prompt from the parent model.
## Running the Example
1. Create the model:
```bash
ollama create tweetwriter
```
2. Run the model (`ollama run tweetwriter`) and enter a topic to generate a tweet about.
3. Show the Modelfile in the REPL.
```bash
/show modelfile
```
Notice that the FROM and SYSTEM match what was in the file. But there is also a TEMPLATE and PARAMETER. These are inherited from the parent model.
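If you want to look at those inherited values outside of the REPL, a minimal sketch like this should also work, assuming the server is running locally and the model was created as `tweetwriter`.
```python
import requests

# A minimal sketch: assumes the Ollama server is on its default port and the model
# was created with `ollama create tweetwriter`.
resp = requests.post("http://localhost:11434/api/show", json={"name": "tweetwriter"})
resp.raise_for_status()
info = resp.json()
print(info.get("template", ""))    # inherited TEMPLATE from the parent model
print(info.get("parameters", ""))  # inherited PARAMETER values
```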

View File

@@ -0,0 +1,20 @@
FROM mistral
SYSTEM """
You are an experienced Devops engineer focused on docker. When given specifications for a particular need or application you know the best way to host that within a docker container. For instance if someone tells you they want an nginx server to host files located at /web you will answer as follows
---start
FROM nginx:alpine
COPY /myweb /usr/share/nginx/html
EXPOSE 80
---end
Notice that the answer you should give is just the contents of the dockerfile with no explanation and there are three dashes and the word start at the beginning and 3 dashes and the word end. The full output can be piped into a file and run as is. Here is another example. The user will ask to launch a Postgres server with a password of abc123. And the response should be
---start
FROM postgres:latest
ENV POSTGRES_PASSWORD=abc123
EXPOSE 5432
---end
Again it's just the contents of the dockerfile and nothing else.
"""

View File

@@ -0,0 +1,31 @@
# DockerIt
DockerIt is a tool to help you build and run your application in a Docker container. It consists of a model that defines the system prompt and model weights to use, along with a Python script that builds the container and runs the image automatically.
## Running the Example
1. Ensure you have the `mattw/dockerit` model installed:
```bash
ollama pull mattw/dockerit
```
2. Make sure Docker is running on your machine.
3. Install the Python Requirements.
```bash
pip install -r requirements.txt
```
4. Run the example:
```bash
python dockerit.py "simple postgres server with admin password set to 123"
```
5. Enter the name you would like to use for your container image.
## Caveats
This is a simple example. It assumes the generated Dockerfile content will work. In many cases, even with simple web servers, it fails when trying to copy files that don't exist. It's simply an example of what you could possibly do.
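One way to soften that caveat is to sanity-check the generated Dockerfile before building it. The sketch below is only an illustration (the `check_copy_sources` helper and its simple parsing are assumptions, not part of this example); it warns when a COPY source path does not exist relative to the current directory.
```python
import os

def check_copy_sources(dockerfile_text: str) -> list[str]:
    """Return COPY source paths that do not exist relative to the current directory."""
    missing = []
    for line in dockerfile_text.splitlines():
        parts = line.strip().split()
        # Minimal parsing: COPY <src>... <dest>; flags such as --from are skipped.
        if parts and parts[0].upper() == "COPY" and len(parts) >= 3:
            for src in parts[1:-1]:
                if src.startswith("--"):
                    continue
                if not os.path.exists(src.lstrip("/")):
                    missing.append(src)
    return missing

# Usage idea: call this on the generated Dockerfile text before client.images.build(...)
# and print a warning listing any missing paths.
```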

View File

@@ -0,0 +1,17 @@
import requests, json, docker, io, sys

# The container description comes from the command line; the image name is asked for interactively.
inputDescription = " ".join(sys.argv[1:])
imageName = input("Enter the name of the image: ")
client = docker.from_env()
s = requests.Session()
output = ""

# Stream the generated Dockerfile text from the local Ollama server.
with s.post('http://localhost:11434/api/generate', json={'model': 'dockerit', 'prompt': inputDescription}, stream=True) as r:
    for line in r.iter_lines():
        if line:
            j = json.loads(line)
            if "response" in j:
                output = output + j["response"]

# Keep only the text between the ---start and ---end markers (offsets skip the markers and adjacent newlines).
output = output[output.find("---start") + 9:output.find("---end") - 1]

# Build an image from the generated Dockerfile and start a container from it.
f = io.BytesIO(bytes(output, 'utf-8'))
client.images.build(fileobj=f, tag=imageName)
container = client.containers.run(imageName, detach=True)
print("Container named", container.name, "started with id:", container.id)

View File

@@ -0,0 +1 @@
docker
requests

View File

@@ -0,0 +1,31 @@
import requests
import json

model = "llama2"

# Template the model should fill in; dumping it into the prompt keeps the prompt and the expected schema in sync.
template = {
    "firstName": "",
    "lastName": "",
    "address": {
        "street": "",
        "city": "",
        "state": "",
        "zipCode": ""
    },
    "phoneNumber": ""
}

prompt = f"generate one realistically believable sample data set of a person's first name, last name, address in the US, and phone number. \nUse the following template: {json.dumps(template)}."

# format "json" asks the model to respond with valid JSON; streaming is off so the full response arrives at once.
data = {
    "prompt": prompt,
    "model": model,
    "format": "json",
    "stream": False,
    "options": {"temperature": 2.5, "top_p": 0.99, "top_k": 100},
}

print("Generating a sample user")
response = requests.post("http://localhost:11434/api/generate", json=data, stream=False)
json_data = json.loads(response.text)

# Parse the outer API response, then pretty-print the JSON the model produced.
print(json.dumps(json.loads(json_data["response"]), indent=2))

View File

@@ -0,0 +1,31 @@
import requests
import json
import random

# Pick a random country so the model adapts the address schema to it.
countries = [
    "United States",
    "United Kingdom",
    "the Netherlands",
    "Germany",
    "Mexico",
    "Canada",
    "France",
]
country = random.choice(countries)
model = "llama2"

prompt = f"generate one realistically believable sample data set of a person's first name, last name, address in {country}, and phone number. Do not use common names. Respond using JSON. Key names should have no backslashes, values should use plain ascii with no special characters."

# format "json" asks the model to respond with valid JSON; streaming is off so the full response arrives at once.
data = {
    "prompt": prompt,
    "model": model,
    "format": "json",
    "stream": False,
    "options": {"temperature": 2.5, "top_p": 0.99, "top_k": 100},
}

print(f"Generating a sample user in {country}")
response = requests.post("http://localhost:11434/api/generate", json=data, stream=False)
json_data = json.loads(response.text)

# Parse the outer API response, then pretty-print the JSON the model produced.
print(json.dumps(json.loads(json_data["response"]), indent=2))

View File

@@ -0,0 +1,60 @@
# JSON Output Example
![llmjson 2023-11-10 15_31_31](https://github.com/jmorganca/ollama/assets/633681/e599d986-9b4a-4118-81a4-4cfe7e22da25)
There are two Python scripts in this example. `randomaddresses.py` generates random addresses from different countries. `predefinedschema.py` sets a template for the model to fill in.
## Running the Example
1. Ensure you have the `llama2` model installed:
```bash
ollama pull llama2
```
2. Install the Python Requirements.
```bash
pip install -r requirements.txt
```
3. Run the Random Addresses example:
```bash
python randomaddresses.py
```
4. Run the Predefined Schema example:
```bash
python predefinedschema.py
```
## Review the Code
Both programs are basically the same, with a different prompt for each, demonstrating two different ideas. The key part of getting JSON out of a model is to state in the prompt or system prompt that it should respond using JSON, and to specify `format` as `json` in the data body.
```python
prompt = f"generate one realistically believable sample data set of a persons first name, last name, address in {country}, and phone number. Do not use common names. Respond using JSON. Key names should with no backslashes, values should use plain ascii with no special characters."
data = {
"prompt": prompt,
"model": model,
"format": "json",
"stream": False,
"options": {"temperature": 2.5, "top_p": 0.99, "top_k": 100},
}
```
When running `randomaddresses.py` you will see that the schema changes and adapts to the chosen country.
In `predefinedschema.py`, a template has been specified in the prompt as well. It's defined as a Python dictionary and then dumped into the prompt string as JSON, which keeps the prompt and the expected schema in sync.
Both examples turn streaming off so that we end up with the completed JSON all at once. The `response.text` then needs to be parsed as JSON so that, when printing it back out as a string, the indent spacing can be set to make the output easy to read.
```python
response = requests.post("http://localhost:11434/api/generate", json=data, stream=False)
json_data = json.loads(response.text)
print(json.dumps(json.loads(json_data["response"]), indent=2))
```
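Since `format` only guarantees syntactically valid JSON, not any particular shape, a small check can confirm the expected fields actually came back. This is just a sketch: the key list mirrors the template in `predefinedschema.py`, and the helper name is made up for illustration.
```python
import json

EXPECTED_KEYS = {"firstName", "lastName", "address", "phoneNumber"}

def has_expected_keys(raw_response: str) -> bool:
    """Return True when the model's JSON response contains every key from the template."""
    record = json.loads(raw_response)
    return EXPECTED_KEYS.issubset(record.keys())

# Following on from the snippet above:
# print(has_expected_keys(json_data["response"]))
```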

View File

@@ -0,0 +1 @@
Requests==2.31.0

View File

@@ -0,0 +1,8 @@
FROM codebooga:latest
SYSTEM """
You are a log file analyzer. You will receive a set of lines from a log file for some software application, find the errors and other interesting aspects of the logs, and explain them so a new user can understand what they mean. If there are any steps they can do to resolve them, list the steps in your answer.
"""
PARAMETER temperature 0.3

View File

@@ -0,0 +1,41 @@
import sys
import requests
import json

# prelines and postlines represent the number of lines of context to include in the output around the error
prelines = 10
postlines = 10

def find_errors_in_log_file():
    if len(sys.argv) < 2:
        print("Usage: python loganalysis.py <filename>")
        sys.exit(1)
    log_file_path = sys.argv[1]
    with open(log_file_path, 'r') as log_file:
        log_lines = log_file.readlines()

    # Collect every line containing "error" along with its surrounding context.
    error_logs = []
    for i, line in enumerate(log_lines):
        if "error" in line.lower():
            start_index = max(0, i - prelines)
            end_index = min(len(log_lines), i + postlines + 1)
            error_logs.extend(log_lines[start_index:end_index])
    return error_logs

error_logs = find_errors_in_log_file()

# Send the collected context to the model and stream the analysis back.
data = {
    "prompt": "\n".join(error_logs),
    "model": "mattw/loganalyzer"
}

response = requests.post("http://localhost:11434/api/generate", json=data, stream=True)
for line in response.iter_lines():
    if line:
        json_data = json.loads(line)
        if not json_data['done']:
            print(json_data['response'], end='', flush=True)

View File

@@ -0,0 +1,32 @@
2023-11-10 07:17:40 /docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
2023-11-10 07:17:40 /docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
2023-11-10 07:17:40 /docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
2023-11-10 07:17:40 10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
2023-11-10 07:17:40 10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
2023-11-10 07:17:40 /docker-entrypoint.sh: Sourcing /docker-entrypoint.d/15-local-resolvers.envsh
2023-11-10 07:17:40 /docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
2023-11-10 07:17:40 /docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
2023-11-10 07:17:40 /docker-entrypoint.sh: Configuration complete; ready for start up
2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: using the "epoll" event method
2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: nginx/1.25.3
2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: built by gcc 12.2.0 (Debian 12.2.0-14)
2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: OS: Linux 6.4.16-linuxkit
2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: start worker processes
2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: start worker process 29
2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: start worker process 30
2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: start worker process 31
2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: start worker process 32
2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: start worker process 33
2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: start worker process 34
2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: start worker process 35
2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: start worker process 36
2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: start worker process 37
2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: start worker process 38
2023-11-10 07:17:44 192.168.65.1 - - [10/Nov/2023:13:17:43 +0000] "GET / HTTP/1.1" 200 615 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36" "-"
2023-11-10 07:17:44 2023/11/10 13:17:44 [error] 29#29: *1 open() "/usr/share/nginx/html/favicon.ico" failed (2: No such file or directory), client: 192.168.65.1, server: localhost, request: "GET /favicon.ico HTTP/1.1", host: "localhost:8080", referrer: "http://localhost:8080/"
2023-11-10 07:17:44 192.168.65.1 - - [10/Nov/2023:13:17:44 +0000] "GET /favicon.ico HTTP/1.1" 404 555 "http://localhost:8080/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36" "-"
2023-11-10 07:17:50 2023/11/10 13:17:50 [error] 29#29: *1 open() "/usr/share/nginx/html/ahstat" failed (2: No such file or directory), client: 192.168.65.1, server: localhost, request: "GET /ahstat HTTP/1.1", host: "localhost:8080"
2023-11-10 07:17:50 192.168.65.1 - - [10/Nov/2023:13:17:50 +0000] "GET /ahstat HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36" "-"
2023-11-10 07:18:53 2023/11/10 13:18:53 [error] 29#29: *1 open() "/usr/share/nginx/html/ahstat" failed (2: No such file or directory), client: 192.168.65.1, server: localhost, request: "GET /ahstat HTTP/1.1", host: "localhost:8080"
2023-11-10 07:18:53 192.168.65.1 - - [10/Nov/2023:13:18:53 +0000] "GET /ahstat HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36" "-"

View File

@@ -0,0 +1,70 @@
# Log Analysis example
![loganalyzer 2023-11-10 08_53_29](https://github.com/jmorganca/ollama/assets/633681/ad30f1fc-321f-4953-8914-e30e24db9921)
This example shows one possible way to create a log file analyzer. It uses the model **mattw/loganalyzer** which is based on **codebooga**, a 34b parameter model.
To use it, run:
`python loganalysis.py <logfile>`
You can try this with the `logtest.logfile` file included in this directory.
## Running the Example
1. Ensure you have the `mattw/loganalyzer` model installed:
```bash
ollama pull mattw/loganalyzer
```
2. Install the Python Requirements.
```bash
pip install -r requirements.txt
```
3. Run the example:
```bash
python loganalysis.py logtest.logfile
```
## Review the code
The first part of this example is a Modelfile that takes `codebooga` and applies a new System Prompt:
```plaintext
SYSTEM """
You are a log file analyzer. You will receive a set of lines from a log file for some software application, find the errors and other interesting aspects of the logs, and explain them so a new user can understand what they mean. If there are any steps they can do to resolve them, list the steps in your answer.
"""
```
This model is available at https://ollama.ai/mattw/loganalyzer. You can customize it and add it to your own namespace using the command `ollama create <namespace/modelname> -f <path-to-modelfile>`, then `ollama push <namespace/modelname>`.
Then loganalysis.py scans all the lines in the given log file and searches for the word 'error'. When the word is found, the 10 lines before and after are set as the prompt for a call to the Generate API.
```python
data = {
"prompt": "\n".join(error_logs),
"model": "mattw/loganalyzer"
}
```
Finally, the streamed output is parsed and the response field in each chunk is printed to the screen.
```python
response = requests.post("http://localhost:11434/api/generate", json=data, stream=True)
for line in response.iter_lines():
    if line:
        json_data = json.loads(line)
        if not json_data['done']:
            print(json_data['response'], end='', flush=True)
```
## Next Steps
There is a lot more that can be done here. This is a simple way to detect errors, looking for the word error. Perhaps it would be interesting to find anomalous activity in the logs. It could be interesting to create embeddings for each line and compare them, looking for similar lines. Or look into applying Levenshtein Distance algorithms to find similar lines to help identify the anomalous lines.
Try different models and different prompts to analyze the data. You could consider adding retrieval augmented generation (RAG) to this to help understand newer log formats.
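As one concrete starting point for the embeddings idea, the sketch below compares two log lines using the local embeddings API. It is only a sketch: it assumes an embedding-capable model such as `llama2` is installed, and the cosine-similarity helper is written here purely for illustration.
```python
import math
import requests

def embed(text: str, model: str = "llama2") -> list[float]:
    """Fetch an embedding for one log line from the local Ollama server."""
    resp = requests.post("http://localhost:11434/api/embeddings",
                         json={"model": model, "prompt": text})
    resp.raise_for_status()
    return resp.json()["embedding"]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Two near-duplicate nginx errors from logtest.logfile should score close to 1.0.
line_a = 'open() "/usr/share/nginx/html/favicon.ico" failed (2: No such file or directory)'
line_b = 'open() "/usr/share/nginx/html/ahstat" failed (2: No such file or directory)'
print(cosine_similarity(embed(line_a), embed(line_b)))
```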

Some files were not shown because too many files have changed in this diff.