Compare commits

..

99 Commits

Author SHA1 Message Date
Michael Yang
1c198977ec ci: fix linux archive (#8862)
the find returns intermediate directories which pulls the parent
directories. it also omits files under lib/ollama.

switch back to globbing
2025-02-05 19:45:58 -08:00
zyphixor
330b6c50b0 readme: add simple-discord-ai to community integrations (#8659) 2025-02-05 18:35:04 -08:00
Diego Pereira
928911bc68 runner: avoid buffer overwrite when generating multiple embeddings (#8714)
Shield the code processing the embedding result
from subsequent calls that may overwrite the same
buffer to process a second input when retrieving
model embeddings.
2025-02-05 16:53:33 -08:00
Michael Yang
5b446cc815 chore: update gitattributes (#8860)
* chore: update gitattributes
* chore: add build info source
2025-02-05 16:37:18 -08:00
Daniel Lok
451c1596af readme: add MLflow Tracing as an observability integration (#8811) 2025-02-05 16:04:24 -08:00
Michael Yang
932bded12f chore: add optional field for server logs 2025-02-05 15:55:32 -08:00
Michael Yang
070ad913ac ci: fix linux archive 2025-02-05 15:08:02 -08:00
Azis Alvriyanto
8d8b9f83ae format: byte formatting test coverage (#8692)
Removed redundant checks and streamlined the switch-case structure.
Added test cases for both HumanBytes and HumanBytes2 to cover a wide range of scenarios.
2025-02-05 12:23:07 -08:00
Jeffrey Morgan
f00d359a67 docs: add section in development.md on library detection (#8855) 2025-02-05 11:16:27 -08:00
Yashwanth A
291def6adb server: increase timeout in stall detection from 5s to 30s (#8831)
In some cases, downloads slow due to disk i/o or other factors,
causing the download to restart a part. This causes the download
to "reverse" in percent completion. By increasing the timeout to 30s,
this should happen less frequently.
2025-02-05 10:00:26 -08:00
Jeffrey Morgan
cd3fbf1c49 llama: use dynamic backend loading for mllama and clip (#8835) 2025-02-05 09:46:56 -08:00
Jeffrey Morgan
c852b8e021 server: always print upload/download part info (#8832) 2025-02-04 19:30:49 -08:00
William
d8932c55e7 server: fix out of bounds exception on model download (#8746) 2025-02-04 18:52:47 -08:00
Michael Yang
63f0269f7f ci: split docker build by platform
this improves build reliability and concurrency
2025-02-04 17:04:27 -08:00
Jeffrey Morgan
4759ecae19 ml/backend/ggml: fix library loading on macOS amd64 (#8827) 2025-02-04 15:05:39 -08:00
Michael Yang
65b7ecac7b fix extra quote 2025-02-04 08:35:30 -08:00
Michael Yang
f9d2d89135 fix linux archive 2025-02-03 16:12:33 -08:00
Michael Yang
669dc31cf3 fix build 2025-02-03 15:10:51 -08:00
Tilman Griesel
d4d338c224 readme: add Chipper to community integrations (#8803) 2025-02-03 14:18:19 -08:00
Melroy van den Berg
bfdeffc375 docs: use OLLAMA_VERSION=0.5.7 for install version override (#8802) 2025-02-03 13:54:08 -08:00
Michael Yang
e806184023 fix release workflow 2025-02-03 13:19:57 -08:00
Jeffrey Morgan
50566113ac llm: do not error if LibOllamaPath does not exist (#8801) 2025-02-03 12:27:48 -08:00
Davide Bertoni
ad22ace439 docs: add missing json and shell code blocks in api.md (#8766) 2025-02-02 13:12:55 -08:00
Anıl Kaynar
f4321a421c readme: add MinimalNextOllamaChat to community integrations (#8767) 2025-02-02 12:56:10 -08:00
Michael Yang
475333d533 fix docker build-args
env context is not accessible from job.*.strategy. since it's in the
environment, just tell docker to use the environment variable[1]

[1]: https://docs.docker.com/reference/cli/docker/buildx/build/#build-arg
2025-01-31 14:56:02 -08:00
Michael Yang
39fd89308c build: set CFLAGS=-O3 specifically for cpu.go 2025-01-31 10:25:39 -08:00
Michael Yang
548a9f56a6 Revert "cgo: use O3"
This reverts commit bea1f1fac6.
2025-01-31 10:25:39 -08:00
Michael Yang
3f0cb36bdb build: set goflags in linux release 2025-01-30 13:07:32 -08:00
Michael Yang
bea1f1fac6 cgo: use O3 2025-01-30 12:21:50 -08:00
Jeffrey Morgan
5d75d837ef discover: fix default LibOllamaPath value (#8702) 2025-01-30 12:21:38 -08:00
Parth Sareen
711648c9bb docs: update api.md with streaming with tools is enabled (#8676) 2025-01-29 15:14:30 -08:00
Michael Yang
dcfb7a105c next build (#8539)
* add build to .dockerignore

* test: only build one arch

* add build to .gitignore

* fix ccache path

* filter amdgpu targets

* only filter if autodetecting

* Don't clobber gpu list for default runner

This ensures the GPU specific environment variables are set properly

* explicitly set CXX compiler for HIP

* Update build_windows.ps1

This isn't complete, but is close.  Dependencies are missing, and it only builds the "default" preset.

* build: add ollama subdir

* add .git to .dockerignore

* docs: update development.md

* update build_darwin.sh

* remove unused scripts

* llm: add cwd and build/lib/ollama to library paths

* default DYLD_LIBRARY_PATH to LD_LIBRARY_PATH in runner on macOS

* add additional cmake output vars for msvc

* interim edits to make server detection logic work with dll directories like lib/ollama/cuda_v12

* remove unncessary filepath.Dir, cleanup

* add hardware-specific directory to path

* use absolute server path

* build: linux arm

* cmake install targets

* remove unused files

* ml: visit each library path once

* build: skip cpu variants on arm

* build: install cpu targets

* build: fix workflow

* shorter names

* fix rocblas install

* docs: clean up development.md

* consistent build dir removal in development.md

* silence -Wimplicit-function-declaration build warnings in ggml-cpu

* update readme

* update development readme

* llm: update library lookup logic now that there is one runner (#8587)

* tweak development.md

* update docs

* add windows cuda/rocm tests

---------

Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
2025-01-29 15:03:38 -08:00
Xiaofu Huang
2ef3c803a1 readme: add AI Toolkit for VSCode to community integrations (#8604) 2025-01-27 00:36:23 -08:00
Matěj Štágl
453e4d090b readme: add LlmTornado to community integrations (#8551) 2025-01-25 01:04:07 -08:00
Daniel Jalkut
ca2f9843c8 docs: remove reference to the deleted examples folder (#8524) 2025-01-22 22:52:15 -08:00
frob
294b6f5a22 docs: remove tfs_z option from documentation (#8515) 2025-01-21 09:28:59 -08:00
EndoTheDev
7bb356c680 docs: update suspend header in gpu.md (#8487) 2025-01-19 18:45:35 -08:00
Jannik Maierhöfer
021817e59a readme: add link to Langfuse (#8455) 2025-01-16 22:41:12 -08:00
Patrick Devine
a420a453b4 fix default modelfile for create (#8452) 2025-01-16 01:14:04 -08:00
Jeffrey Morgan
42cf4db601 parser: fix parsing Modelfiles with multiple FROM commands (#8449) 2025-01-16 00:14:04 -08:00
Josh
93a8daf285 convert: import support for command-r models from safetensors (#6063)
---------

Co-authored-by: Patrick Devine <patrick@infrahq.com>
2025-01-15 16:31:22 -08:00
Gloryjaw
a041b4df7c docs: fix path to examples (#8438) 2025-01-15 11:49:12 -08:00
Patrick Devine
2539f2dbf9 Fix absolute path names + gguf detection (#8428) 2025-01-14 19:01:24 -08:00
Jeffrey Morgan
61676fb506 llama: move grammar tests to llama_test.go (#8411) 2025-01-14 12:55:45 -08:00
Bruce MacDonald
f6f3713001 convert: qwen2 from safetensors (#8408)
Add native support for converting Qwen2 family models (including Qwen2.5)
from safetensors to gguf format so we can run it.
2025-01-14 10:34:37 -08:00
Steve Berdy
a30f347201 readme: add LangChain for .NET to community integrations (#8352) 2025-01-14 09:37:35 -08:00
Jeffrey Morgan
74ea4fb604 remove .prettierrc.json (#8413) 2025-01-14 09:30:34 -08:00
Jeffrey Morgan
6982e9cc96 readme: remove link to missing page 2025-01-13 18:56:31 -08:00
Patrick Devine
ab39872cb4 add new create api doc (#8388) 2025-01-13 17:30:24 -08:00
Parth Sareen
84a2314463 examples: remove codified examples (#8267) 2025-01-13 11:26:22 -08:00
Jeffrey Morgan
17fcdea698 readme: move discord link 2025-01-12 22:45:47 -08:00
Patrick Devine
32bd37adf8 make the modelfile path relative for ollama create (#8380) 2025-01-10 16:14:08 -08:00
Michael Yang
9446c2c902 Merge pull request #8196 from ollama/mxyng/gods-v2
chore: upgrade to gods v2
2025-01-10 13:50:11 -08:00
Jeffrey Morgan
9aa141d023 readme: remove discord badge image for now 2025-01-09 22:02:18 -08:00
Patrick Devine
8bccae4f92 show a more descriptive error in the client if it is newer than the server (#8351) 2025-01-09 10:12:30 -08:00
isamu arimoto
6ae2adc1af openai: accept additional headers to fix CORS errors (#8343) 2025-01-08 11:28:11 -08:00
Jeffrey Morgan
1deafd8254 llama: update vendored code to commit 46e3556 (#8308) 2025-01-08 11:22:01 -08:00
Michael
57f038ec7b readme: add phi4 model (#8350) 2025-01-08 11:21:39 -08:00
frob
cdf3a181dc Add CUSTOM_CPU_FLAGS to Dockerfile. (#8284)
* Add CUSTOM_CPU_FLAGS.

* fix golangci-lint error.

---------

Co-authored-by: Richard Lyons <rick@frob.com.au>
2025-01-06 09:17:19 -08:00
Ubaldo Porcheddu
3919f4ba3d llama: fix runner api example url in README.md (#8307) 2025-01-04 15:45:16 -08:00
Bruce MacDonald
2d33c4e97d discover: remove leading new-line for linter 2025-01-03 12:03:58 -08:00
Bruce MacDonald
29a8975c66 api: remove unused create fields
These fields are deprecated, but specifying them will not do anything. Removing them as the other deprecated fields will still work, but these do not, so they dont match our existing pattern.
2025-01-03 12:03:58 -08:00
Patrick Devine
86a622cbdc Update the /api/create endpoint to use JSON (#7935)
Replaces `POST /api/create` to use JSON instead of a Modelfile.

This is a breaking change.
2024-12-31 18:02:30 -08:00
Jeffrey Morgan
459d822b51 readme: link header to ollama.com 2024-12-29 17:36:07 -05:00
Simon Schampijer
844899440a examples: updated deprecated imports (#3602) 2024-12-29 14:36:25 -05:00
Anas Khan
103db4216d docs: add /api/version endpoint documentation (#8082)
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-12-29 14:33:44 -05:00
Jeffrey Morgan
6daddcde01 readme: update import header 2024-12-29 14:12:23 -05:00
Emilien Lancelot
07f7e69b36 readme: add Yacana multi-agent framework to community integrations (#7259) 2024-12-28 15:05:57 -05:00
CIIDMike
b68e8e5727 docs: add syntax highlighting on Go template code blocks (#8215) 2024-12-27 13:17:49 -05:00
Adarsh Mishra
369fb529e2 readme: add TextLLaMA to community integrations 2024-12-27 13:16:06 -05:00
Jared Donnell
023e4bca14 readme: add neollama to terminal section of community integrations (#8242) 2024-12-25 17:16:11 -05:00
aritra saha
51af455f62 readme: add alpaca client application to community integrations (#8227) 2024-12-24 23:05:35 -05:00
Emanuil Rusev
ffe3549064 readme: add IntelliBar to community integrations (#7950) 2024-12-23 12:04:18 -05:00
湛露先生
928de9050e server: reuse InvalidModelNameErrMsg type (#8163) 2024-12-23 10:38:34 -05:00
ItzCrazyKns
36aea6154a readme: add Perplexica to community-integrations (#8198) 2024-12-22 20:04:01 -05:00
Patrick Devine
dd352ab27f fix crash bug with /save when quotes are used (#8208) 2024-12-21 22:31:37 -08:00
Michael Yang
cb40d60469 chore: upgrade to gods v2
gods v2 uses go generics rather than interfaces which simplifies the
code considerably
2024-12-21 00:05:16 -08:00
Patrick Devine
d8bab8ea44 remove tutorials.md which pointed to removed tutorials (#8189) 2024-12-20 14:04:20 -08:00
Squishedmac
9ab62eb96f update golang.org/x dependencies (#8172) 2024-12-20 09:29:30 -08:00
Parth Sareen
290cf2040a llama: test key order preservation in schema_to_grammar (#8078)
This change adds a test to catch a regression in schema_to_grammar where
the order of keys in the JSON schema is not preserved in the generated
grammar, which is critical for step-by-step reasoning.
2024-12-18 19:44:50 -08:00
Jeffrey Morgan
a72f2dce45 scripts: sign renamed macOS binary (#8131) 2024-12-17 18:03:49 -08:00
Jesse Gross
08a832b482 llama: Ensure KV cache is fully defragmented.
Sometimes the KV cache requires defragmentation even without
triggering the threshold heuristic. In this case, decoding
will not being able to find a KV cache slot. This is particularly
difficult for the caller to handle if it happens in between
ubatches. To avoid this, we should immediately trigger a defrag.

In addition, a heavily fragmented cache can require more than
max_moves to defragment. Currently, we stop when we hit the limit
but this can leave a cache that still does not have adequate space
even after defragmentation is triggered. Instead, we should do
multiple batches of processing until everything is complete.

Fixes #7949
2024-12-17 14:01:19 -08:00
Blake Mizerany
2ddc32d5c5 llm: do not error on "null" format (#8139)
This fixes another regression in the previous commit that fixed other
known bugs.
2024-12-17 09:49:37 -08:00
Jascha Beste
2cde4b8817 readme: change getting started guide link for pgai (#8119) 2024-12-16 22:13:23 -08:00
Blake Mizerany
87f0a49fe6 llm: do not silently fail for supplied, but invalid formats (#8130)
Changes in #8002 introduced fixes for bugs with mangling JSON Schemas.
It also fixed a bug where the server would silently fail when clients
requested invalid formats. It also, unfortunately, introduced a bug
where the server would reject requests with an empty format, which
should be allowed.

The change in #8127 updated the code to allow the empty format, but also
reintroduced the regression where the server would silently fail when
the format was set, but invalid.

This commit fixes both regressions. The server does not reject the empty
format, but it does reject invalid formats. It also adds tests to help
us catch regressions in the future.

Also, the updated code provides a more detailed error message when a
client sends a non-empty, but invalid format, echoing the invalid format
in the response.

This commits also takes the opportunity to remove superfluous linter
checks.
2024-12-16 21:57:49 -08:00
Jeffrey Morgan
0f06a6daa7 llm: loosen format check to default to no format (#8127) 2024-12-16 18:45:46 -08:00
Daniel Hiltgen
8f805dd74b darwin: restore multiple runners for x86 (#8125)
In 0.5.2 we simplified packaging to have avx only for macos x86.  It looks like
there may still be some non-AVX systems out there, so this puts back the prior
logic of building no-AVX for the primary binary, and now 2 runners for avx and avx2.
These will be packaged in the App bundle only, so the stand-alone binary will now be
without AVX support on macos.  On arm, we'll also see these runners reported
as available in the log, but they're dormant and will never be used at runtime.
2024-12-16 18:45:02 -08:00
Michael
89d5e2f2fd readme: example/get started guide for pgai with Ollama (#8115)
readme: example/get started guide for pgai with Ollama
2024-12-16 17:14:37 +08:00
Jascha Beste
297ada6c87 readme: add pgai to readme for semantic search (#8028)
* docs: switch around database integrations order and link to quickstart

* docs: link to blog post in example readme

* chore: link to main readme

* readme: removing example to link externally

readme: removing example to link externally so we don't have to keep this example up-to-date

---------
2024-12-16 17:02:28 +08:00
Patrick Devine
8c9fb8eb73 imageproc mllama refactor (#7537)
Refactor mllama image processing code, and add pixtral and qwen2vl
2024-12-14 19:50:15 -08:00
Daniel Hiltgen
b75ccfc5ec ci: be more aggressive on parallelism in build (#8102) 2024-12-14 14:56:05 -08:00
Jeffrey Morgan
7a81daf026 llama: update vendor code to commit ba1cb19c (#8101) 2024-12-14 14:55:51 -08:00
Daniel Hiltgen
60f75560a2 runner: switch logging back to stderr (#8091)
This puts the low-level runner logging back on stderr for consistency with prior releases
2024-12-13 14:36:50 -08:00
Anuraag (Rag) Agrawal
e28f2d4900 openai: return usage as final chunk for streams (#6784)
* openai: return usage as final chunk for streams

---------

Co-authored-by: ParthSareen <parth.sareen@ollama.com>
2024-12-12 17:09:30 -08:00
Pascal Patry
c216850523 llama: parse JSON schema using nlohmann::ordered_json to maintain ordering (#8071) 2024-12-12 09:57:28 -08:00
Parth Sareen
18f6a98bd6 llama: enable JSON schema key ordering for generating grammars (#8055) 2024-12-11 17:17:36 -08:00
Blake Mizerany
b1fd7fef86 server: more support for mixed-case model names (#8017)
Fixes #7944
2024-12-11 15:29:59 -08:00
Daniel Hiltgen
36d111e788 ci: fix linux version (#8054)
Pass through the version override so the makefiles use it
2024-12-11 14:09:57 -08:00
Blake Mizerany
9039c821a2 llama: preserve field order in user-defined JSON schemas (#8002)
Previously we decoded and re-encoded JSON schemas during validation,
which served no purpose since json.RawMessage already validates JSON
syntax. Worse, the re-encoding lost field ordering from the original
schema, which affects inference quality during step-by-step reasoning.

While fixing this ordering issue by using json.RawMessage directly,
testing revealed that schema_to_grammar (from llama.cpp) also fails to
preserve field order during grammar generation. This appears to be the
root cause of inference degradation.

This change prevents us from mangling the user's original schema order,
but we still need to address the ordering issue in schema_to_grammar.
That will be a separate change.

Updates #7978
2024-12-11 14:07:30 -08:00
705 changed files with 28560 additions and 35550 deletions

View File

@@ -3,7 +3,9 @@ ollama
app
macapp
dist
build
.env
.cache
test_data
llama/build
.git

13
.gitattributes vendored
View File

@@ -7,5 +7,18 @@ llama/**/*.cuh linguist-vendored
llama/**/*.m linguist-vendored
llama/**/*.metal linguist-vendored
ml/backend/**/*.c linguist-vendored
ml/backend/**/*.h linguist-vendored
ml/backend/**/*.cpp linguist-vendored
ml/backend/**/*.hpp linguist-vendored
ml/backend/**/*.cu linguist-vendored
ml/backend/**/*.cuh linguist-vendored
ml/backend/**/*.m linguist-vendored
ml/backend/**/*.metal linguist-vendored
ml/backend/**/CMakeLists.txt linguist-vendored
llama/build-info.cpp linguist-generated
ml/backend/ggml/ggml/src/ggml-metal/ggml-metal-embed.s linguist-generated
* text=auto
*.go text eol=lf

View File

@@ -9,6 +9,14 @@ body:
description: What happened? What did you expect to happen?
validations:
required: true
- type: textarea
id: logs
attributes:
label: Relevant log output
description: Please copy and paste any relevant log output. See [Troubleshooting Guide](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) for details.
render: shell
validations:
required: false
- type: dropdown
id: os
attributes:

File diff suppressed because it is too large Load Diff

View File

@@ -1,11 +1,5 @@
name: test
env:
ROCM_WINDOWS_URL: https://download.amd.com/developer/eula/rocm-hub/AMD-Software-PRO-Edition-24.Q3-WinSvr2022-For-HIP.exe
MSYS2_URL: https://github.com/msys2/msys2-installer/releases/download/2024-07-27/msys2-x86_64-20240727.exe
CUDA_12_WINDOWS_URL: https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda_12.4.0_551.61_windows.exe
CUDA_12_WINDOWS_VER: 12.4
concurrency:
# For PRs, later CI runs preempt previous ones. e.g. a force push on a PR
# cancels running CI jobs and starts all new ones.
@@ -27,7 +21,7 @@ jobs:
changes:
runs-on: ubuntu-latest
outputs:
RUNNERS: ${{ steps.changes.outputs.RUNNERS }}
changed: ${{ steps.changes.outputs.changed }}
steps:
- uses: actions/checkout@v4
with:
@@ -35,309 +29,139 @@ jobs:
- id: changes
run: |
changed() {
git diff-tree -r --no-commit-id --name-only \
$(git merge-base ${{ github.event.pull_request.base.sha }} ${{ github.event.pull_request.head.sha }}) \
${{ github.event.pull_request.head.sha }} \
local BASE=${{ github.event.pull_request.base.sha }}
local HEAD=${{ github.event.pull_request.head.sha }}
local MERGE_BASE=$(git merge-base $BASE $HEAD)
git diff-tree -r --no-commit-id --name-only "$MERGE_BASE" "$HEAD" \
| xargs python3 -c "import sys; from pathlib import Path; print(any(Path(x).match(glob) for x in sys.argv[1:] for glob in '$*'.split(' ')))"
}
{
echo RUNNERS=$(changed 'llama/**')
} >>$GITHUB_OUTPUT
echo changed=$(changed 'llama/llama.cpp/**' 'ml/backend/ggml/ggml/**') | tee -a $GITHUB_OUTPUT
runners-linux-cuda:
linux:
needs: [changes]
if: ${{ needs.changes.outputs.RUNNERS == 'True' }}
if: needs.changes.outputs.changed == 'True'
strategy:
matrix:
cuda-version:
- '11.8.0'
include:
- preset: CPU
- preset: CUDA
container: nvidia/cuda:11.8.0-devel-ubuntu22.04
flags: '-DCMAKE_CUDA_ARCHITECTURES=87'
- preset: ROCm
container: rocm/dev-ubuntu-22.04:6.1.2
extra-packages: rocm-libs
flags: '-DAMDGPU_TARGETS=gfx1010 -DCMAKE_PREFIX_PATH=/opt/rocm'
runs-on: linux
container: nvidia/cuda:${{ matrix.cuda-version }}-devel-ubuntu20.04
container: ${{ matrix.container }}
steps:
- uses: actions/checkout@v4
- run: |
apt-get update && apt-get install -y git build-essential curl
[ -n "${{ matrix.container }}" ] || sudo=sudo
$sudo apt-get update
$sudo apt-get install -y cmake ccache ${{ matrix.extra-packages }}
env:
DEBIAN_FRONTEND: noninteractive
- uses: actions/checkout@v4
- uses: actions/setup-go@v4
- uses: actions/cache@v4
with:
go-version-file: go.mod
cache: true
- run: go get ./...
path: /github/home/.cache/ccache
key: ccache-${{ runner.os }}-${{ runner.arch }}-${{ matrix.preset }}
- run: |
git config --global --add safe.directory /__w/ollama/ollama
cores=$(grep '^core id' /proc/cpuinfo |sort -u|wc -l)
make -j $cores cuda_v11
runners-linux-rocm:
cmake --preset ${{ matrix.preset }} ${{ matrix.flags }}
cmake --build --preset ${{ matrix.preset }} --parallel
windows:
needs: [changes]
if: ${{ needs.changes.outputs.RUNNERS == 'True' }}
if: needs.changes.outputs.changed == 'True'
strategy:
matrix:
rocm-version:
- '6.1.2'
runs-on: linux
container: rocm/dev-ubuntu-20.04:${{ matrix.rocm-version }}
steps:
- run: |
apt-get update && apt-get install -y git build-essential curl rocm-libs
env:
DEBIAN_FRONTEND: noninteractive
- uses: actions/checkout@v4
- uses: actions/setup-go@v4
with:
go-version-file: go.mod
cache: true
- run: go get ./...
- run: |
git config --global --add safe.directory /__w/ollama/ollama
cores=$(grep '^core id' /proc/cpuinfo |sort -u|wc -l)
make -j $cores rocm
# ROCm generation step
runners-windows-rocm:
needs: [changes]
if: ${{ needs.changes.outputs.RUNNERS == 'True' }}
include:
- preset: CPU
- preset: CUDA
install: https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_522.06_windows.exe
flags: '-DCMAKE_CUDA_ARCHITECTURES=87'
- preset: ROCm
install: https://download.amd.com/developer/eula/rocm-hub/AMD-Software-PRO-Edition-24.Q3-WinSvr2022-For-HIP.exe
flags: '-DAMDGPU_TARGETS=gfx1010'
runs-on: windows
steps:
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
- run: |
choco install -y --no-progress ccache ninja
ccache -o cache_dir=${{ github.workspace }}\.ccache
- if: matrix.preset == 'CUDA' || matrix.preset == 'ROCm'
id: cache-install
uses: actions/cache/restore@v4
with:
go-version-file: go.mod
cache: true
- name: Set make jobs default
run: |
echo "MAKEFLAGS=--jobs=$((Get-ComputerInfo -Property CsProcessors).CsProcessors.NumberOfCores)" | Out-File -FilePath $env:GITHUB_ENV -Encoding utf8 -Append
# ROCM installation steps
- name: 'Cache ROCm installer'
id: cache-rocm
uses: actions/cache@v4
with:
path: rocm-install.exe
key: ${{ env.ROCM_WINDOWS_URL }}
- name: 'Conditionally Download ROCm'
if: steps.cache-rocm.outputs.cache-hit != 'true'
path: |
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA
C:\Program Files\AMD\ROCm
key: ${{ matrix.install }}
- if: matrix.preset == 'CUDA'
name: Install CUDA ${{ matrix.cuda-version }}
run: |
$ErrorActionPreference = "Stop"
Invoke-WebRequest -Uri "${env:ROCM_WINDOWS_URL}" -OutFile "rocm-install.exe"
- name: 'Install ROCm'
run: |
Start-Process "rocm-install.exe" -ArgumentList '-install' -NoNewWindow -Wait
- name: 'Verify ROCm'
run: |
& 'C:\Program Files\AMD\ROCm\*\bin\clang.exe' --version
echo "HIP_PATH=$(Resolve-Path 'C:\Program Files\AMD\ROCm\*\bin\clang.exe' | split-path | split-path | select -first 1)" | Out-File -FilePath $env:GITHUB_ENV -Encoding utf8 -Append
if ("${{ steps.cache-install.outputs.cache-hit }}" -ne 'true') {
Invoke-WebRequest -Uri "${{ matrix.install }}" -OutFile "install.exe"
Start-Process -FilePath .\install.exe -ArgumentList (@("-s", "cudart_11.8", "nvcc_11.8", "cublas_11.8", "cublas_dev_11.8")) -NoNewWindow -Wait
}
- name: Add msys paths
run: |
echo "c:\msys64\usr\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
echo "C:\msys64\clang64\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
- name: Install msys2 tools
run: |
Start-Process "c:\msys64\usr\bin\pacman.exe" -ArgumentList @("-S", "--noconfirm", "mingw-w64-clang-x86_64-gcc-compat", "mingw-w64-clang-x86_64-clang") -NoNewWindow -Wait
- name: make rocm runner
run: |
import-module 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\Tools\Microsoft.VisualStudio.DevShell.dll'
Enter-VsDevShell -vsinstallpath 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise' -skipautomaticlocation -DevCmdArguments '-arch=x64 -no_logo'
if (!(gcc --version | select-string -quiet clang)) { throw "wrong gcc compiler detected - must be clang" }
make -C llama print-HIP_PATH print-HIP_LIB_DIR
make rocm
# CUDA generation step
runners-windows-cuda:
needs: [changes]
if: ${{ needs.changes.outputs.RUNNERS == 'True' }}
runs-on: windows
steps:
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
with:
go-version-file: go.mod
cache: true
- name: Set make jobs default
run: |
echo "MAKEFLAGS=--jobs=$((Get-ComputerInfo -Property CsProcessors).CsProcessors.NumberOfCores)" | Out-File -FilePath $env:GITHUB_ENV -Encoding utf8 -Append
# CUDA installation steps
- name: 'Cache CUDA installer'
id: cache-cuda
uses: actions/cache@v4
with:
path: cuda-install.exe
key: ${{ env.CUDA_12_WINDOWS_URL }}
- name: 'Conditionally Download CUDA'
if: steps.cache-cuda.outputs.cache-hit != 'true'
run: |
$ErrorActionPreference = "Stop"
Invoke-WebRequest -Uri "${env:CUDA_12_WINDOWS_URL}" -OutFile "cuda-install.exe"
- name: 'Install CUDA'
run: |
$subpackages = @("cudart", "nvcc", "cublas", "cublas_dev") | foreach-object {"${_}_${{ env.CUDA_12_WINDOWS_VER }}"}
Start-Process "cuda-install.exe" -ArgumentList (@("-s") + $subpackages) -NoNewWindow -Wait
- name: 'Verify CUDA'
run: |
& (resolve-path "c:\Program Files\NVIDIA*\CUDA\v*\bin\nvcc.exe")[0] --version
$cudaPath=((resolve-path "c:\Program Files\NVIDIA*\CUDA\v*\bin\nvcc.exe")[0].path | split-path | split-path)
$cudaVer=($cudaPath | split-path -leaf ) -replace 'v(\d+).(\d+)', '$1_$2'
$cudaPath = (Resolve-Path "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\*").path
echo "$cudaPath\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
echo "CUDA_PATH=$cudaPath" | Out-File -FilePath $env:GITHUB_ENV -Encoding utf8 -Append
echo "CUDA_PATH_V${cudaVer}=$cudaPath" | Out-File -FilePath $env:GITHUB_ENV -Encoding utf8 -Append
echo "CUDA_PATH_VX_Y=CUDA_PATH_V${cudaVer}" | Out-File -FilePath $env:GITHUB_ENV -Encoding utf8 -Append
- if: matrix.preset == 'ROCm'
name: Install ROCm ${{ matrix.rocm-version }}
run: |
$ErrorActionPreference = "Stop"
if ("${{ steps.cache-install.outputs.cache-hit }}" -ne 'true') {
Invoke-WebRequest -Uri "${{ matrix.install }}" -OutFile "install.exe"
Start-Process -FilePath .\install.exe -ArgumentList '-install' -NoNewWindow -Wait
}
- name: Add msys paths
run: |
echo "c:\msys64\usr\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
echo "C:\msys64\clang64\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
- name: Install msys2 tools
run: |
Start-Process "c:\msys64\usr\bin\pacman.exe" -ArgumentList @("-S", "--noconfirm", "mingw-w64-clang-x86_64-gcc-compat", "mingw-w64-clang-x86_64-clang") -NoNewWindow -Wait
- name: make cuda runner
run: |
import-module 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\Tools\Microsoft.VisualStudio.DevShell.dll'
Enter-VsDevShell -vsinstallpath 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise' -skipautomaticlocation -DevCmdArguments '-arch=x64 -no_logo'
if (!(gcc --version | select-string -quiet clang)) { throw "wrong gcc compiler detected - must be clang" }
make cuda_v$(($env:CUDA_PATH | split-path -leaf) -replace 'v(\d+).*', '$1')
runners-cpu:
needs: [changes]
if: ${{ needs.changes.outputs.RUNNERS == 'True' }}
strategy:
matrix:
os: [ubuntu-latest, macos-latest, windows-2019]
arch: [amd64, arm64]
exclude:
- os: ubuntu-latest
arch: arm64
- os: windows-2019
arch: arm64
runs-on: ${{ matrix.os }}
env:
GOARCH: ${{ matrix.arch }}
ARCH: ${{ matrix.arch }}
CGO_ENABLED: '1'
steps:
$hipPath = (Resolve-Path "C:\Program Files\AMD\ROCm\*").path
echo "$hipPath\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
echo "CC=$hipPath\bin\clang.exe" | Out-File -FilePath $env:GITHUB_ENV -Append
echo "CXX=$hipPath\bin\clang++.exe" | Out-File -FilePath $env:GITHUB_ENV -Append
- if: ${{ !cancelled() && steps.cache-install.outputs.cache-hit != 'true' }}
uses: actions/cache/save@v4
with:
path: |
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA
C:\Program Files\AMD\ROCm
key: ${{ matrix.install }}
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
- uses: actions/cache@v4
with:
go-version-file: go.mod
cache: true
- name: Add msys paths
if: ${{ startsWith(matrix.os, 'windows-') }}
run: |
echo "c:\msys64\usr\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
echo "C:\msys64\clang64\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
- name: Install msys2 tools
if: ${{ startsWith(matrix.os, 'windows-') }}
run: |
Start-Process "c:\msys64\usr\bin\pacman.exe" -ArgumentList @("-S", "--noconfirm", "mingw-w64-clang-x86_64-gcc-compat", "mingw-w64-clang-x86_64-clang") -NoNewWindow -Wait
- name: 'Build Windows Go Runners'
if: ${{ startsWith(matrix.os, 'windows-') }}
run: |
$gopath=(get-command go).source | split-path -parent
$gccpath=(get-command gcc).source | split-path -parent
import-module 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\Tools\Microsoft.VisualStudio.DevShell.dll'
Enter-VsDevShell -vsinstallpath 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise' -skipautomaticlocation -DevCmdArguments '-arch=x64 -no_logo'
$env:CMAKE_SYSTEM_VERSION="10.0.22621.0"
$env:PATH="$gopath;$gccpath;$env:PATH"
echo $env:PATH
if (!(gcc --version | select-string -quiet clang)) { throw "wrong gcc compiler detected - must be clang" }
make -j 4
- name: 'Build Unix Go Runners'
if: ${{ ! startsWith(matrix.os, 'windows-') }}
run: make -j 4
- run: go build .
lint:
strategy:
matrix:
os: [ubuntu-latest, macos-latest, windows-2019]
arch: [amd64, arm64]
exclude:
- os: ubuntu-latest
arch: arm64
- os: windows-2019
arch: arm64
- os: macos-latest
arch: amd64
runs-on: ${{ matrix.os }}
env:
GOARCH: ${{ matrix.arch }}
CGO_ENABLED: '1'
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- name: Add msys paths
if: ${{ startsWith(matrix.os, 'windows-') }}
run: |
echo "c:\msys64\usr\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
echo "C:\msys64\clang64\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
- name: Install msys2 tools
if: ${{ startsWith(matrix.os, 'windows-') }}
run: |
Start-Process "c:\msys64\usr\bin\pacman.exe" -ArgumentList @("-S", "--noconfirm", "mingw-w64-clang-x86_64-gcc-compat", "mingw-w64-clang-x86_64-clang") -NoNewWindow -Wait
- uses: actions/setup-go@v5
with:
go-version-file: go.mod
cache: false
path: ${{ github.workspace }}\.ccache
key: ccache-${{ runner.os }}-${{ runner.arch }}-${{ matrix.preset }}
- run: |
case ${{ matrix.arch }} in
amd64) echo ARCH=x86_64 ;;
arm64) echo ARCH=arm64 ;;
esac >>$GITHUB_ENV
shell: bash
- uses: golangci/golangci-lint-action@v6
with:
args: --timeout 10m0s -v
Import-Module 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\Tools\Microsoft.VisualStudio.DevShell.dll'
Enter-VsDevShell -VsInstallPath 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise' -SkipAutomaticLocation -DevCmdArguments '-arch=x64 -no_logo'
cmake --preset "${{ matrix.preset }}" ${{ matrix.flags }}
cmake --build --parallel --preset "${{ matrix.preset }}"
env:
CMAKE_GENERATOR: Ninja
test:
strategy:
matrix:
os: [ubuntu-latest, macos-latest, windows-2019]
arch: [amd64]
exclude:
- os: ubuntu-latest
arch: arm64
- os: windows-2019
arch: arm64
os: [ubuntu-latest, macos-latest, windows-latest]
runs-on: ${{ matrix.os }}
env:
GOARCH: ${{ matrix.arch }}
CGO_ENABLED: '1'
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- name: Add msys paths
if: ${{ startsWith(matrix.os, 'windows-') }}
run: |
echo "c:\msys64\usr\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
echo "C:\msys64\clang64\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
- name: Install msys2 tools
if: ${{ startsWith(matrix.os, 'windows-') }}
run: |
Start-Process "c:\msys64\usr\bin\pacman.exe" -ArgumentList @("-S", "--noconfirm", "mingw-w64-clang-x86_64-gcc-compat", "mingw-w64-clang-x86_64-clang") -NoNewWindow -Wait
- uses: actions/setup-go@v5
with:
go-version-file: go.mod
cache: true
- run: |
case ${{ matrix.arch }} in
amd64) echo ARCH=amd64 ;;
arm64) echo ARCH=arm64 ;;
esac >>$GITHUB_ENV
shell: bash
- uses: golangci/golangci-lint-action@v6
with:
args: --timeout 10m0s -v
- run: go test ./...
patches:
needs: [changes]
if: ${{ needs.changes.outputs.RUNNERS == 'True' }}
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- name: Verify patches carry all the changes
- name: Verify patches apply cleanly and do not change files
run: |
make apply-patches sync && git diff --compact-summary --exit-code llama
make -f Makefile.sync clean sync
git diff --compact-summary --exit-code

5
.gitignore vendored
View File

@@ -4,12 +4,13 @@
.venv
.swp
dist
build
ollama
.cache
*.exe
.idea
test_data
*.crt
llama/build
__debug_bin*
llama/vendor
llama/build
llama/vendor

View File

@@ -8,8 +8,6 @@ linters:
- containedctx
- contextcheck
- errcheck
- exportloopref
- gci
- gocheckcompilerdirectives
- gofmt
- gofumpt
@@ -30,8 +28,6 @@ linters:
- wastedassign
- whitespace
linters-settings:
gci:
sections: [standard, default, localmodule]
staticcheck:
checks:
- all

View File

@@ -1,10 +0,0 @@
{
"trailingComma": "es5",
"tabWidth": 2,
"useTabs": false,
"semi": false,
"singleQuote": true,
"jsxSingleQuote": true,
"printWidth": 120,
"arrowParens": "avoid"
}

117
CMakeLists.txt Normal file
View File

@@ -0,0 +1,117 @@
cmake_minimum_required(VERSION 3.21)
project(Ollama C CXX)
include(CheckLanguage)
find_package(Threads REQUIRED)
set(CMAKE_BUILD_TYPE Release)
set(BUILD_SHARED_LIBS ON)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)
set(GGML_BUILD ON)
set(GGML_SHARED ON)
set(GGML_CCACHE ON)
set(GGML_BACKEND_DL ON)
set(GGML_BACKEND_SHARED ON)
set(GGML_SCHED_MAX_COPIES 4)
set(GGML_LLAMAFILE ON)
set(GGML_CUDA_PEER_MAX_BATCH_SIZE 128)
set(GGML_CUDA_GRAPHS ON)
if((NOT CMAKE_OSX_ARCHITECTURES MATCHES "arm64")
OR (NOT CMAKE_OSX_ARCHITECTURES AND NOT CMAKE_SYSTEM_PROCESSOR MATCHES "arm|aarch64|ARM64|ARMv[0-9]+"))
set(GGML_CPU_ALL_VARIANTS ON)
endif()
if (CMAKE_OSX_ARCHITECTURES MATCHES "x86_64")
set(CMAKE_BUILD_RPATH "@loader_path")
set(CMAKE_INSTALL_RPATH "@loader_path")
endif()
set(OLLAMA_BUILD_DIR ${CMAKE_BINARY_DIR}/lib/ollama)
set(OLLAMA_INSTALL_DIR ${CMAKE_INSTALL_PREFIX}/lib/ollama)
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${OLLAMA_BUILD_DIR})
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY_DEBUG ${OLLAMA_BUILD_DIR})
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY_RELEASE ${OLLAMA_BUILD_DIR})
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${OLLAMA_BUILD_DIR})
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY_DEBUG ${OLLAMA_BUILD_DIR})
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY_RELEASE ${OLLAMA_BUILD_DIR})
include_directories(${CMAKE_CURRENT_SOURCE_DIR}/ml/backend/ggml/ggml/src)
include_directories(${CMAKE_CURRENT_SOURCE_DIR}/ml/backend/ggml/ggml/src/include)
include_directories(${CMAKE_CURRENT_SOURCE_DIR}/ml/backend/ggml/ggml/src/ggml-cpu)
include_directories(${CMAKE_CURRENT_SOURCE_DIR}/ml/backend/ggml/ggml/src/ggml-cpu/amx)
set(GGML_CPU ON)
add_subdirectory(${CMAKE_CURRENT_SOURCE_DIR}/ml/backend/ggml/ggml/src)
set_property(TARGET ggml PROPERTY EXCLUDE_FROM_ALL TRUE)
get_target_property(CPU_VARIANTS ggml-cpu MANUALLY_ADDED_DEPENDENCIES)
if(NOT CPU_VARIANTS)
set(CPU_VARIANTS "ggml-cpu")
endif()
install(TARGETS ggml-base ${CPU_VARIANTS}
RUNTIME_DEPENDENCIES
PRE_EXCLUDE_REGEXES ".*"
RUNTIME DESTINATION ${OLLAMA_INSTALL_DIR} COMPONENT CPU
LIBRARY DESTINATION ${OLLAMA_INSTALL_DIR} COMPONENT CPU
FRAMEWORK DESTINATION ${OLLAMA_INSTALL_DIR} COMPONENT CPU
)
check_language(CUDA)
if(CMAKE_CUDA_COMPILER)
if(CMAKE_VERSION VERSION_GREATER_EQUAL "3.24" AND NOT CMAKE_CUDA_ARCHITECTURES)
set(CMAKE_CUDA_ARCHITECTURES "native")
endif()
find_package(CUDAToolkit)
add_subdirectory(${CMAKE_CURRENT_SOURCE_DIR}/ml/backend/ggml/ggml/src/ggml-cuda)
set(OLLAMA_CUDA_INSTALL_DIR ${OLLAMA_INSTALL_DIR}/cuda_v${CUDAToolkit_VERSION_MAJOR})
install(TARGETS ggml-cuda
RUNTIME_DEPENDENCIES
DIRECTORIES ${CUDAToolkit_BIN_DIR} ${CUDAToolkit_LIBRARY_DIR}
PRE_INCLUDE_REGEXES cublas cublasLt cudart
PRE_EXCLUDE_REGEXES ".*"
RUNTIME DESTINATION ${OLLAMA_CUDA_INSTALL_DIR} COMPONENT CUDA
LIBRARY DESTINATION ${OLLAMA_CUDA_INSTALL_DIR} COMPONENT CUDA
)
endif()
check_language(HIP)
if(CMAKE_HIP_COMPILER)
set(HIP_PLATFORM "amd")
find_package(hip REQUIRED)
if(NOT AMDGPU_TARGETS)
list(FILTER AMDGPU_TARGETS INCLUDE REGEX "^gfx(900|94[012]|101[02]|1030|110[012])$")
endif()
if(AMDGPU_TARGETS)
add_subdirectory(${CMAKE_CURRENT_SOURCE_DIR}/ml/backend/ggml/ggml/src/ggml-hip)
set(OLLAMA_HIP_INSTALL_DIR ${OLLAMA_INSTALL_DIR}/rocm)
install(TARGETS ggml-hip
RUNTIME_DEPENDENCIES
DIRECTORIES ${HIP_BIN_INSTALL_DIR} ${HIP_LIB_INSTALL_DIR}
PRE_INCLUDE_REGEXES amdhip64 hipblas rocblas amd_comgr hsa_runtime64 rocprofiler-register drm_amdgpu drm numa
PRE_EXCLUDE_REGEXES ".*"
POST_EXCLUDE_REGEXES "system32"
RUNTIME DESTINATION ${OLLAMA_HIP_INSTALL_DIR} COMPONENT HIP
LIBRARY DESTINATION ${OLLAMA_HIP_INSTALL_DIR} COMPONENT HIP
)
foreach(HIP_LIB_BIN_INSTALL_DIR IN ITEMS ${HIP_BIN_INSTALL_DIR} ${HIP_LIB_INSTALL_DIR})
if(EXISTS ${HIP_LIB_BIN_INSTALL_DIR}/rocblas)
install(DIRECTORY ${HIP_LIB_BIN_INSTALL_DIR}/rocblas DESTINATION ${OLLAMA_HIP_INSTALL_DIR} COMPONENT HIP)
break()
endif()
endforeach()
endif()
endif()

110
CMakePresets.json Normal file
View File

@@ -0,0 +1,110 @@
{
"version": 3,
"configurePresets": [
{
"name": "Default",
"binaryDir": "${sourceDir}/build",
"installDir": "${sourceDir}/dist",
"cacheVariables": {
"CMAKE_BUILD_TYPE": "Release"
}
},
{
"name": "CPU",
"inherits": [ "Default" ]
},
{
"name": "CUDA",
"inherits": [ "Default" ]
},
{
"name": "CUDA 11",
"inherits": [ "CUDA" ],
"cacheVariables": {
"CMAKE_CUDA_ARCHITECTURES": "50;52;53;60;61;62;70;72;75;80;86"
}
},
{
"name": "CUDA 12",
"inherits": [ "CUDA" ],
"cacheVariables": {
"CMAKE_CUDA_ARCHITECTURES": "60;61;62;70;72;75;80;86;87;89;90;90a"
}
},
{
"name": "JetPack 5",
"inherits": [ "CUDA" ],
"cacheVariables": {
"CMAKE_CUDA_ARCHITECTURES": "72;87"
}
},
{
"name": "JetPack 6",
"inherits": [ "CUDA" ],
"cacheVariables": {
"CMAKE_CUDA_ARCHITECTURES": "87"
}
},
{
"name": "ROCm",
"inherits": [ "Default" ],
"cacheVariables": {
"CMAKE_HIP_PLATFORM": "amd"
}
},
{
"name": "ROCm 6",
"inherits": [ "ROCm" ],
"cacheVariables": {
"AMDGPU_TARGETS": "gfx900;gfx940;gfx941;gfx942;gfx1010;gfx1012;gfx1030;gfx1100;gfx1101;gfx1102"
}
}
],
"buildPresets": [
{
"name": "Default",
"configurePreset": "Default",
"configuration": "Release"
},
{
"name": "CPU",
"configurePreset": "Default",
"targets": [ "ggml-cpu" ]
},
{
"name": "CUDA",
"configurePreset": "CUDA",
"targets": [ "ggml-cuda" ]
},
{
"name": "CUDA 11",
"inherits": [ "CUDA" ],
"configurePreset": "CUDA 11"
},
{
"name": "CUDA 12",
"inherits": [ "CUDA" ],
"configurePreset": "CUDA 12"
},
{
"name": "JetPack 5",
"inherits": [ "CUDA" ],
"configurePreset": "JetPack 5"
},
{
"name": "JetPack 6",
"inherits": [ "CUDA" ],
"configurePreset": "JetPack 6"
},
{
"name": "ROCm",
"configurePreset": "ROCm",
"targets": [ "ggml-hip" ]
},
{
"name": "ROCm 6",
"inherits": [ "ROCm" ],
"configurePreset": "ROCm 6"
}
]
}

View File

@@ -1,196 +1,128 @@
ARG GOLANG_VERSION=1.22.8
ARG CUDA_VERSION_11=11.3.1
ARG CUDA_VERSION_12=12.4.0
ARG ROCM_VERSION=6.1.2
ARG JETPACK_6=r36.2.0
ARG JETPACK_5=r35.4.1
# vim: filetype=dockerfile
### To create a local image for building linux binaries on mac or windows with efficient incremental builds
#
# docker build --platform linux/amd64 -t builder-amd64 -f Dockerfile --target unified-builder-amd64 .
# docker run --platform linux/amd64 --rm -it -v $(pwd):/go/src/github.com/ollama/ollama/ builder-amd64
#
### Then incremental builds will be much faster in this container
#
# make -j 10 dist
#
FROM --platform=linux/amd64 rocm/dev-centos-7:${ROCM_VERSION}-complete AS unified-builder-amd64
ARG GOLANG_VERSION
ARG CUDA_VERSION_11
ARG CUDA_VERSION_12
COPY ./scripts/rh_linux_deps.sh /
ENV PATH /opt/rh/devtoolset-10/root/usr/bin:/usr/local/cuda/bin:$PATH
ENV LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda/lib64
RUN GOLANG_VERSION=${GOLANG_VERSION} sh /rh_linux_deps.sh
RUN yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo && \
dnf clean all && \
dnf install -y \
zsh \
cuda-toolkit-$(echo ${CUDA_VERSION_11} | cut -f1-2 -d. | sed -e "s/\./-/g") \
cuda-toolkit-$(echo ${CUDA_VERSION_12} | cut -f1-2 -d. | sed -e "s/\./-/g")
# TODO intel oneapi goes here...
ENV GOARCH amd64
ENV CGO_ENABLED 1
WORKDIR /go/src/github.com/ollama/ollama/
ENTRYPOINT [ "zsh" ]
ARG FLAVOR=${TARGETARCH}
### To create a local image for building linux binaries on mac or linux/arm64 with efficient incremental builds
# Note: this does not contain jetson variants
#
# docker build --platform linux/arm64 -t builder-arm64 -f Dockerfile --target unified-builder-arm64 .
# docker run --platform linux/arm64 --rm -it -v $(pwd):/go/src/github.com/ollama/ollama/ builder-arm64
#
FROM --platform=linux/arm64 rockylinux:8 AS unified-builder-arm64
ARG GOLANG_VERSION
ARG CUDA_VERSION_11
ARG CUDA_VERSION_12
COPY ./scripts/rh_linux_deps.sh /
RUN GOLANG_VERSION=${GOLANG_VERSION} sh /rh_linux_deps.sh
RUN yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/sbsa/cuda-rhel8.repo && \
dnf config-manager --set-enabled appstream && \
dnf clean all && \
dnf install -y \
zsh \
cuda-toolkit-$(echo ${CUDA_VERSION_11} | cut -f1-2 -d. | sed -e "s/\./-/g") \
cuda-toolkit-$(echo ${CUDA_VERSION_12} | cut -f1-2 -d. | sed -e "s/\./-/g")
ENV PATH /opt/rh/gcc-toolset-10/root/usr/bin:$PATH:/usr/local/cuda/bin
ENV LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda/lib64
ENV LIBRARY_PATH=/usr/local/cuda/lib64/stubs:/opt/amdgpu/lib64
ENV GOARCH arm64
ENV CGO_ENABLED 1
WORKDIR /go/src/github.com/ollama/ollama/
ENTRYPOINT [ "zsh" ]
ARG ROCMVERSION=6.1.2
ARG JETPACK5VERSION=r35.4.1
ARG JETPACK6VERSION=r36.2.0
ARG CMAKEVERSION=3.31.2
FROM --platform=linux/amd64 unified-builder-amd64 AS build-amd64
COPY . .
ARG OLLAMA_SKIP_CUDA_GENERATE
ARG OLLAMA_SKIP_ROCM_GENERATE
ARG OLLAMA_FAST_BUILD
FROM --platform=linux/amd64 rocm/dev-centos-7:${ROCMVERSION}-complete AS base-amd64
RUN sed -i -e 's/mirror.centos.org/vault.centos.org/g' -e 's/^#.*baseurl=http/baseurl=http/g' -e 's/^mirrorlist=http/#mirrorlist=http/g' /etc/yum.repos.d/*.repo \
&& yum install -y yum-utils devtoolset-10-gcc devtoolset-10-gcc-c++ \
&& yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo \
&& curl -s -L https://github.com/ccache/ccache/releases/download/v4.10.2/ccache-4.10.2-linux-x86_64.tar.xz | tar -Jx -C /usr/local/bin --strip-components 1
ENV PATH=/opt/rh/devtoolset-10/root/usr/bin:/opt/rh/devtoolset-11/root/usr/bin:$PATH
FROM --platform=linux/arm64 rockylinux:8 AS base-arm64
# install epel-release for ccache
RUN yum install -y yum-utils epel-release \
&& yum install -y clang ccache \
&& yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/sbsa/cuda-rhel8.repo
ENV CC=clang CXX=clang++
FROM base-${TARGETARCH} AS base
ARG CMAKEVERSION
RUN curl -fsSL https://github.com/Kitware/CMake/releases/download/v${CMAKEVERSION}/cmake-${CMAKEVERSION}-linux-$(uname -m).tar.gz | tar xz -C /usr/local --strip-components 1
COPY CMakeLists.txt CMakePresets.json .
COPY ml/backend/ggml/ggml ml/backend/ggml/ggml
ENV LDFLAGS=-s
FROM base AS cpu
# amd64 uses gcc which requires devtoolset-11 for AVX extensions while arm64 uses clang
RUN if [ "$(uname -m)" = "x86_64" ]; then yum install -y devtoolset-11-gcc devtoolset-11-gcc-c++; fi
ENV PATH=/opt/rh/devtoolset-11/root/usr/bin:$PATH
RUN --mount=type=cache,target=/root/.ccache \
if grep "^flags" /proc/cpuinfo|grep avx>/dev/null; then \
make -j $(expr $(nproc) / 2 ) dist ; \
else \
make -j 5 dist ; \
fi
RUN cd dist/linux-$GOARCH && \
tar -cf - . | pigz --best > ../ollama-linux-$GOARCH.tgz
RUN if [ -z ${OLLAMA_SKIP_ROCM_GENERATE} ] ; then \
cd dist/linux-$GOARCH-rocm && \
tar -cf - . | pigz --best > ../ollama-linux-$GOARCH-rocm.tgz ;\
fi
cmake --preset 'CPU' \
&& cmake --build --parallel --preset 'CPU' \
&& cmake --install build --component CPU --strip --parallel 8
# Jetsons need to be built in discrete stages
FROM --platform=linux/arm64 nvcr.io/nvidia/l4t-jetpack:${JETPACK_5} AS runners-jetpack5-arm64
ARG GOLANG_VERSION
RUN apt-get update && apt-get install -y git curl ccache && \
curl -s -L https://dl.google.com/go/go${GOLANG_VERSION}.linux-arm64.tar.gz | tar xz -C /usr/local && \
ln -s /usr/local/go/bin/go /usr/local/bin/go && \
ln -s /usr/local/go/bin/gofmt /usr/local/bin/gofmt && \
apt-get clean && rm -rf /var/lib/apt/lists/*
WORKDIR /go/src/github.com/ollama/ollama/
COPY . .
ARG CGO_CFLAGS
ENV GOARCH arm64
FROM base AS cuda-11
ARG CUDA11VERSION=11.3
RUN yum install -y cuda-toolkit-${CUDA11VERSION//./-}
ENV PATH=/usr/local/cuda-11/bin:$PATH
RUN --mount=type=cache,target=/root/.ccache \
make -j 5 dist_cuda_v11 \
CUDA_ARCHITECTURES="72;87" \
GPU_RUNNER_VARIANT=_jetpack5 \
DIST_LIB_DIR=/go/src/github.com/ollama/ollama/dist/linux-arm64-jetpack5/lib/ollama \
DIST_GPU_RUNNER_DEPS_DIR=/go/src/github.com/ollama/ollama/dist/linux-arm64-jetpack5/lib/ollama/cuda_jetpack5
cmake --preset 'CUDA 11' \
&& cmake --build --parallel --preset 'CUDA 11' \
&& cmake --install build --component CUDA --strip --parallel 8
FROM --platform=linux/arm64 nvcr.io/nvidia/l4t-jetpack:${JETPACK_6} AS runners-jetpack6-arm64
ARG GOLANG_VERSION
RUN apt-get update && apt-get install -y git curl ccache && \
curl -s -L https://dl.google.com/go/go${GOLANG_VERSION}.linux-arm64.tar.gz | tar xz -C /usr/local && \
ln -s /usr/local/go/bin/go /usr/local/bin/go && \
ln -s /usr/local/go/bin/gofmt /usr/local/bin/gofmt && \
apt-get clean && rm -rf /var/lib/apt/lists/*
WORKDIR /go/src/github.com/ollama/ollama/
COPY . .
ARG CGO_CFLAGS
ENV GOARCH arm64
FROM base AS cuda-12
ARG CUDA12VERSION=12.4
RUN yum install -y cuda-toolkit-${CUDA12VERSION//./-}
ENV PATH=/usr/local/cuda-12/bin:$PATH
RUN --mount=type=cache,target=/root/.ccache \
make -j 5 dist_cuda_v12 \
CUDA_ARCHITECTURES="87" \
GPU_RUNNER_VARIANT=_jetpack6 \
DIST_LIB_DIR=/go/src/github.com/ollama/ollama/dist/linux-arm64-jetpack6/lib/ollama \
DIST_GPU_RUNNER_DEPS_DIR=/go/src/github.com/ollama/ollama/dist/linux-arm64-jetpack6/lib/ollama/cuda_jetpack6
cmake --preset 'CUDA 12' \
&& cmake --build --parallel --preset 'CUDA 12' \
&& cmake --install build --component CUDA --strip --parallel 8
FROM --platform=linux/arm64 unified-builder-arm64 AS build-arm64
COPY . .
ARG OLLAMA_SKIP_CUDA_GENERATE
ARG OLLAMA_FAST_BUILD
FROM base AS rocm-6
RUN --mount=type=cache,target=/root/.ccache \
make -j 5 dist
COPY --from=runners-jetpack5-arm64 /go/src/github.com/ollama/ollama/dist/ dist/
COPY --from=runners-jetpack6-arm64 /go/src/github.com/ollama/ollama/dist/ dist/
RUN cd dist/linux-$GOARCH && \
tar -cf - . | pigz --best > ../ollama-linux-$GOARCH.tgz
RUN cd dist/linux-$GOARCH-jetpack5 && \
tar -cf - . | pigz --best > ../ollama-linux-$GOARCH-jetpack5.tgz
RUN cd dist/linux-$GOARCH-jetpack6 && \
tar -cf - . | pigz --best > ../ollama-linux-$GOARCH-jetpack6.tgz
cmake --preset 'ROCm 6' \
&& cmake --build --parallel --preset 'ROCm 6' \
&& cmake --install build --component HIP --strip --parallel 8
FROM --platform=linux/amd64 scratch AS dist-amd64
COPY --from=build-amd64 /go/src/github.com/ollama/ollama/dist/ollama-linux-*.tgz /
FROM --platform=linux/arm64 scratch AS dist-arm64
COPY --from=build-arm64 /go/src/github.com/ollama/ollama/dist/ollama-linux-*.tgz /
FROM dist-$TARGETARCH AS dist
FROM --platform=linux/arm64 nvcr.io/nvidia/l4t-jetpack:${JETPACK5VERSION} AS jetpack-5
ARG CMAKEVERSION
RUN apt-get update && apt-get install -y curl ccache \
&& curl -fsSL https://github.com/Kitware/CMake/releases/download/v${CMAKEVERSION}/cmake-${CMAKEVERSION}-linux-$(uname -m).tar.gz | tar xz -C /usr/local --strip-components 1
COPY CMakeLists.txt CMakePresets.json .
COPY ml/backend/ggml/ggml ml/backend/ggml/ggml
RUN --mount=type=cache,target=/root/.ccache \
cmake --preset 'JetPack 5' \
&& cmake --build --parallel --preset 'JetPack 5' \
&& cmake --install build --component CUDA --strip --parallel 8
FROM --platform=linux/arm64 nvcr.io/nvidia/l4t-jetpack:${JETPACK6VERSION} AS jetpack-6
ARG CMAKEVERSION
RUN apt-get update && apt-get install -y curl ccache \
&& curl -fsSL https://github.com/Kitware/CMake/releases/download/v${CMAKEVERSION}/cmake-${CMAKEVERSION}-linux-$(uname -m).tar.gz | tar xz -C /usr/local --strip-components 1
COPY CMakeLists.txt CMakePresets.json .
COPY ml/backend/ggml/ggml ml/backend/ggml/ggml
RUN --mount=type=cache,target=/root/.ccache \
cmake --preset 'JetPack 6' \
&& cmake --build --parallel --preset 'JetPack 6' \
&& cmake --install build --component CUDA --strip --parallel 8
# For amd64 container images, filter out cuda/rocm to minimize size
FROM build-amd64 AS runners-cuda-amd64
RUN rm -rf \
./dist/linux-amd64/lib/ollama/libggml_hipblas.so \
./dist/linux-amd64/lib/ollama/runners/rocm*
FROM base AS build
ARG GOVERSION=1.23.4
RUN curl -fsSL https://golang.org/dl/go${GOVERSION}.linux-$(case $(uname -m) in x86_64) echo amd64 ;; aarch64) echo arm64 ;; esac).tar.gz | tar xz -C /usr/local
ENV PATH=/usr/local/go/bin:$PATH
WORKDIR /go/src/github.com/ollama/ollama
COPY . .
ARG GOFLAGS="'-ldflags=-w -s'"
ENV CGO_ENABLED=1
RUN --mount=type=cache,target=/root/.cache/go-build \
go build -trimpath -buildmode=pie -o /bin/ollama .
FROM build-amd64 AS runners-rocm-amd64
RUN rm -rf \
./dist/linux-amd64/lib/ollama/libggml_cuda*.so \
./dist/linux-amd64/lib/ollama/libcu*.so* \
./dist/linux-amd64/lib/ollama/runners/cuda*
FROM --platform=linux/amd64 scratch AS amd64
COPY --from=cuda-11 dist/lib/ollama/cuda_v11 /lib/ollama/cuda_v11
COPY --from=cuda-12 dist/lib/ollama/cuda_v12 /lib/ollama/cuda_v12
FROM --platform=linux/amd64 ubuntu:22.04 AS runtime-amd64
RUN apt-get update && \
apt-get install -y ca-certificates && \
apt-get clean && rm -rf /var/lib/apt/lists/*
COPY --from=build-amd64 /go/src/github.com/ollama/ollama/dist/linux-amd64/bin/ /bin/
COPY --from=runners-cuda-amd64 /go/src/github.com/ollama/ollama/dist/linux-amd64/lib/ /lib/
FROM --platform=linux/arm64 scratch AS arm64
COPY --from=cuda-11 dist/lib/ollama/cuda_v11 /lib/ollama/cuda_v11
COPY --from=cuda-12 dist/lib/ollama/cuda_v12 /lib/ollama/cuda_v12
COPY --from=jetpack-5 dist/lib/ollama/cuda_v11 lib/ollama/cuda_jetpack5
COPY --from=jetpack-6 dist/lib/ollama/cuda_v12 lib/ollama/cuda_jetpack6
FROM --platform=linux/arm64 ubuntu:22.04 AS runtime-arm64
RUN apt-get update && \
apt-get install -y ca-certificates && \
apt-get clean && rm -rf /var/lib/apt/lists/*
COPY --from=build-arm64 /go/src/github.com/ollama/ollama/dist/linux-arm64/bin/ /bin/
COPY --from=build-arm64 /go/src/github.com/ollama/ollama/dist/linux-arm64/lib/ /lib/
COPY --from=runners-jetpack5-arm64 /go/src/github.com/ollama/ollama/dist/linux-arm64-jetpack5/lib/ /lib/
COPY --from=runners-jetpack6-arm64 /go/src/github.com/ollama/ollama/dist/linux-arm64-jetpack6/lib/ /lib/
FROM --platform=linux/arm64 scratch AS rocm
COPY --from=rocm-6 dist/lib/ollama/rocm /lib/ollama/rocm
FROM ${FLAVOR} AS archive
COPY --from=cpu dist/lib/ollama /lib/ollama
COPY --from=build /bin/ollama /bin/ollama
# ROCm libraries larger so we keep it distinct from the CPU/CUDA image
FROM --platform=linux/amd64 ubuntu:22.04 AS runtime-rocm
# Frontload the rocm libraries which are large, and rarely change to increase chance of a common layer
# across releases
COPY --from=build-amd64 /go/src/github.com/ollama/ollama/dist/linux-amd64-rocm/lib/ /lib/
RUN apt-get update && \
apt-get install -y ca-certificates && \
apt-get clean && rm -rf /var/lib/apt/lists/*
COPY --from=build-amd64 /go/src/github.com/ollama/ollama/dist/linux-amd64/bin/ /bin/
COPY --from=runners-rocm-amd64 /go/src/github.com/ollama/ollama/dist/linux-amd64/lib/ /lib/
EXPOSE 11434
ENV OLLAMA_HOST 0.0.0.0
ENTRYPOINT ["/bin/ollama"]
CMD ["serve"]
FROM runtime-$TARGETARCH
EXPOSE 11434
ENV OLLAMA_HOST 0.0.0.0
FROM ubuntu:20.04
RUN apt-get update \
&& apt-get install -y ca-certificates \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
COPY --from=archive /bin /usr/bin
ENV PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
COPY --from=archive /lib/ollama /usr/lib/ollama
ENV LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
ENV NVIDIA_VISIBLE_DEVICES=all
ENV OLLAMA_HOST=0.0.0.0:11434
EXPOSE 11434
ENTRYPOINT ["/bin/ollama"]
CMD ["serve"]

105
Makefile
View File

@@ -1,105 +0,0 @@
# top level makefile for Ollama
include make/common-defs.make
# Determine which if any GPU runners we should build
include make/cuda-v11-defs.make
include make/cuda-v12-defs.make
include make/rocm-defs.make
ifeq ($(CUSTOM_CPU_FLAGS),)
ifneq ($(OS),darwin)
ifeq ($(ARCH),amd64)
RUNNER_TARGETS=cpu
endif
endif
# Without CUSTOM_CPU_FLAGS we default to build both v11 and v12 if present
ifeq ($(OLLAMA_SKIP_CUDA_GENERATE),)
ifneq ($(CUDA_11_COMPILER),)
RUNNER_TARGETS += cuda_v11
endif
ifneq ($(CUDA_12_COMPILER),)
RUNNER_TARGETS += cuda_v12
endif
endif
else # CUSTOM_CPU_FLAGS is set, we'll build only the latest cuda version detected
ifneq ($(CUDA_12_COMPILER),)
RUNNER_TARGETS += cuda_v12
else ifneq ($(CUDA_11_COMPILER),)
RUNNER_TARGETS += cuda_v11
endif
endif
ifeq ($(OLLAMA_SKIP_ROCM_GENERATE),)
ifneq ($(HIP_COMPILER),)
RUNNER_TARGETS += rocm
endif
endif
all: runners exe
dist: $(addprefix dist_, $(RUNNER_TARGETS)) dist_exe
dist_%:
@$(MAKE) --no-print-directory -f make/Makefile.$* dist
runners: $(RUNNER_TARGETS)
$(RUNNER_TARGETS):
@$(MAKE) --no-print-directory -f make/Makefile.$@
exe dist_exe:
@$(MAKE) --no-print-directory -f make/Makefile.ollama $@
help-sync apply-patches create-patches sync sync-clean:
@$(MAKE) --no-print-directory -f make/Makefile.sync $@
test integration lint:
@$(MAKE) --no-print-directory -f make/Makefile.test $@
clean:
rm -rf $(BUILD_DIR) $(DIST_LIB_DIR) $(OLLAMA_EXE) $(DIST_OLLAMA_EXE)
go clean -cache
help:
@echo "The following make targets will help you build Ollama"
@echo ""
@echo " make all # (default target) Build Ollama llm subprocess runners, and the primary ollama executable"
@echo " make runners # Build Ollama llm subprocess runners; after you may use 'go build .' to build the primary ollama exectuable"
@echo " make <runner> # Build specific runners. Enabled: '$(RUNNER_TARGETS)'"
@echo " make dist # Build the runners and primary ollama executable for distribution"
@echo " make help-sync # Help information on vendor update targets"
@echo " make help-runners # Help information on runner targets"
@echo ""
@echo "The following make targets will help you test Ollama"
@echo ""
@echo " make test # Run unit tests"
@echo " make integration # Run integration tests. You must 'make all' first"
@echo " make lint # Run lint and style tests"
@echo ""
@echo "For more information see 'docs/development.md'"
@echo ""
help-runners:
@echo "The following runners will be built based on discovered GPU libraries: '$(RUNNER_TARGETS)'"
@echo ""
@echo "GPU Runner CPU Flags: '$(GPU_RUNNER_CPU_FLAGS)' (Override with CUSTOM_CPU_FLAGS)"
@echo ""
@echo "# CUDA_PATH sets the location where CUDA toolkits are present"
@echo "CUDA_PATH=$(CUDA_PATH)"
@echo " CUDA_11_PATH=$(CUDA_11_PATH)"
@echo " CUDA_11_COMPILER=$(CUDA_11_COMPILER)"
@echo " CUDA_12_PATH=$(CUDA_12_PATH)"
@echo " CUDA_12_COMPILER=$(CUDA_12_COMPILER)"
@echo ""
@echo "# HIP_PATH sets the location where the ROCm toolkit is present"
@echo "HIP_PATH=$(HIP_PATH)"
@echo " HIP_COMPILER=$(HIP_COMPILER)"
.PHONY: all exe dist help help-sync help-runners test integration lint runners clean $(RUNNER_TARGETS)
# Handy debugging for make variables
print-%:
@echo '$*=$($*)'

60
Makefile.sync Normal file
View File

@@ -0,0 +1,60 @@
UPSTREAM=https://github.com/ggerganov/llama.cpp.git
WORKDIR=llama/vendor
FETCH_HEAD=46e3556e01b824e52395fb050b29804b6cff2a7c
.PHONY: help
help:
@echo "Available targets:"
@echo " sync Sync with upstream repositories"
@echo " checkout Checkout upstream repository"
@echo " apply-patches Apply patches to local repository"
@echo " format-patches Format patches from local repository"
@echo " clean Clean local repository"
@echo
@echo "Example:"
@echo " make -f $(lastword $(MAKEFILE_LIST)) clean sync"
.PHONY: sync
sync: llama/build-info.cpp llama/llama.cpp ml/backend/ggml/ggml apply-patches
.PHONY: llama/build-info.cpp
llama/build-info.cpp: llama/build-info.cpp.in
sed -e 's|@FETCH_HEAD@|$(FETCH_HEAD)|' $< > $@
.PHONY: llama/llama.cpp
llama/llama.cpp: llama/vendor/ apply-patches
rsync -arvzc -f "merge $@/.rsync-filter" $< $@
.PHONY: ml/backend/ggml/ggml apply-patches
ml/backend/ggml/ggml: llama/vendor/ggml/ apply-patches
rsync -arvzc -f "merge $@/.rsync-filter" $< $@
PATCHES=$(wildcard llama/patches/*.patch)
.PHONY: apply-patches
.NOTPARALLEL:
apply-patches: $(addsuffix ed, $(PATCHES))
%.patched: %.patch
@if git -c user.name=nobody -c 'user.email=<>' -C $(WORKDIR) am -3 $(realpath $<); then touch $@; else git -C $(WORKDIR) am --abort; exit 1; fi
.PHONY: checkout
checkout: $(WORKDIR)
git -C $(WORKDIR) fetch
git -C $(WORKDIR) checkout -f $(FETCH_HEAD)
$(WORKDIR):
git clone $(UPSTREAM) $(WORKDIR)
.PHONE: format-patches
format-patches: llama/patches
git -C $(WORKDIR) format-patch \
--no-signature \
--no-numbered \
--zero-commit \
-o $(realpath $<) \
$(FETCH_HEAD)
.PHONE: clean
clean: checkout
$(RM) $(addsuffix ed, $(PATCHES))

View File

@@ -1,11 +1,11 @@
<div align="center">
 <img alt="ollama" height="200px" src="https://github.com/ollama/ollama/assets/3325447/0d0b44e2-8f4a-4e99-9b52-a5c1c741c8f7">
  <a href="https://ollama.com" />
<img alt="ollama" height="200px" src="https://github.com/ollama/ollama/assets/3325447/0d0b44e2-8f4a-4e99-9b52-a5c1c741c8f7">
</a>
</div>
# Ollama
[![Discord](https://dcbadge.vercel.app/api/server/ollama?style=flat&compact=true)](https://discord.gg/ollama)
Get up and running with large language models.
### macOS
@@ -33,6 +33,11 @@ The official [Ollama Docker image](https://hub.docker.com/r/ollama/ollama) `olla
- [ollama-python](https://github.com/ollama/ollama-python)
- [ollama-js](https://github.com/ollama/ollama-js)
### Community
- [Discord](https://discord.gg/ollama)
- [Reddit](https://reddit.com/r/ollama)
## Quickstart
To run and chat with [Llama 3.2](https://ollama.com/library/llama3.2):
@@ -56,8 +61,8 @@ Here are some example models that can be downloaded:
| Llama 3.2 Vision | 90B | 55GB | `ollama run llama3.2-vision:90b` |
| Llama 3.1 | 8B | 4.7GB | `ollama run llama3.1` |
| Llama 3.1 | 405B | 231GB | `ollama run llama3.1:405b` |
| Phi 4 | 14B | 9.1GB | `ollama run phi4` |
| Phi 3 Mini | 3.8B | 2.3GB | `ollama run phi3` |
| Phi 3 Medium | 14B | 7.9GB | `ollama run phi3:medium` |
| Gemma 2 | 2B | 1.6GB | `ollama run gemma2:2b` |
| Gemma 2 | 9B | 5.5GB | `ollama run gemma2` |
| Gemma 2 | 27B | 16GB | `ollama run gemma2:27b` |
@@ -97,7 +102,7 @@ Ollama supports importing GGUF models in the Modelfile:
ollama run example
```
### Import from PyTorch or Safetensors
### Import from Safetensors
See the [guide](docs/import.md) on importing models for more information.
@@ -132,7 +137,7 @@ ollama run mario
Hello! It's your friend Mario.
```
For more examples, see the [examples](examples) directory. For more information on working with a Modelfile, see the [Modelfile](docs/modelfile.md) documentation.
For more information on working with a Modelfile, see the [Modelfile](docs/modelfile.md) documentation.
## CLI Reference
@@ -298,6 +303,7 @@ See the [API documentation](./docs/api.md) for all endpoints.
- [AnythingLLM (Docker + MacOs/Windows/Linux native app)](https://github.com/Mintplex-Labs/anything-llm)
- [Ollama Basic Chat: Uses HyperDiv Reactive UI](https://github.com/rapidarchitect/ollama_basic_chat)
- [Ollama-chats RPG](https://github.com/drazdra/ollama-chats)
- [IntelliBar](https://intellibar.app/) (AI-powered assistant for macOS)
- [QA-Pilot](https://github.com/reid41/QA-Pilot) (Interactive chat tool that can leverage Ollama models for rapid understanding and navigation of GitHub code repositories)
- [ChatOllama](https://github.com/sugarforever/chat-ollama) (Open Source Chatbot based on Ollama with Knowledge Bases)
- [CRAG Ollama Chat](https://github.com/Nagi-ovo/CRAG-Ollama-Chat) (Simple Web Search with Corrective RAG)
@@ -327,6 +333,7 @@ See the [API documentation](./docs/api.md) for all endpoints.
- [BoltAI for Mac](https://boltai.com) (AI Chat Client for Mac)
- [Harbor](https://github.com/av/harbor) (Containerized LLM Toolkit with Ollama as default backend)
- [PyGPT](https://github.com/szczyglis-dev/py-gpt) (AI desktop assistant for Linux, Windows and Mac)
- [Alpaca](https://github.com/Jeffser/Alpaca) (An Ollama client application for linux and macos made with GTK4 and Adwaita)
- [AutoGPT](https://github.com/Significant-Gravitas/AutoGPT/blob/master/docs/content/platform/ollama.md) (AutoGPT Ollama integration)
- [Go-CREW](https://www.jonathanhecl.com/go-crew/) (Powerful Offline RAG in Golang)
- [PartCAD](https://github.com/openvmp/partcad/) (CAD model generation with OpenSCAD and CadQuery)
@@ -361,6 +368,10 @@ See the [API documentation](./docs/api.md) for all endpoints.
- [Abbey](https://github.com/US-Artificial-Intelligence/abbey) (A configurable AI interface server with notebooks, document storage, and YouTube support)
- [Minima](https://github.com/dmayboroda/minima) (RAG with on-premises or fully local workflow)
- [aidful-ollama-model-delete](https://github.com/AidfulAI/aidful-ollama-model-delete) (User interface for simplified model cleanup)
- [Perplexica](https://github.com/ItzCrazyKns/Perplexica) (An AI-powered search engine & an open-source alternative to Perplexity AI)
- [AI Toolkit for Visual Studio Code](https://aka.ms/ai-tooklit/ollama-docs) (Microsoft-official VSCode extension to chat, test, evaluate models with Ollama support, and use them in your AI applications.)
- [MinimalNextOllamaChat](https://github.com/anilkay/MinimalNextOllamaChat) (Minimal Web UI for Chat and Model Control)
- [Chipper](https://github.com/TilmanGriesel/chipper) AI interface for tinkerers (Ollama, Haystack RAG, Python)
### Cloud
@@ -373,6 +384,7 @@ See the [API documentation](./docs/api.md) for all endpoints.
- [oterm](https://github.com/ggozad/oterm)
- [Ellama Emacs client](https://github.com/s-kostyaev/ellama)
- [Emacs client](https://github.com/zweifisch/ollama)
- [neollama](https://github.com/paradoxical-dev/neollama) UI client for interacting with models from within Neovim
- [gen.nvim](https://github.com/David-Kunz/gen.nvim)
- [ollama.nvim](https://github.com/nomnivore/ollama.nvim)
- [ollero.nvim](https://github.com/marco-souza/ollero.nvim)
@@ -407,6 +419,8 @@ See the [API documentation](./docs/api.md) for all endpoints.
### Database
- [pgai](https://github.com/timescale/pgai) - PostgreSQL as a vector database (Create and search embeddings from Ollama models using pgvector)
- [Get started guide](https://github.com/timescale/pgai/blob/main/docs/vectorizer-quick-start.md)
- [MindsDB](https://github.com/mindsdb/mindsdb/blob/staging/mindsdb/integrations/handlers/ollama_handler/README.md) (Connects Ollama models with nearly 200 data platforms and apps)
- [chromem-go](https://github.com/philippgille/chromem-go/blob/v0.5.0/embed_ollama.go) with [example](https://github.com/philippgille/chromem-go/tree/v0.5.0/examples/rag-wikipedia-ollama)
- [Kangaroo](https://github.com/dbkangaroo/kangaroo) (AI-powered SQL client and admin tool for popular databases)
@@ -425,10 +439,12 @@ See the [API documentation](./docs/api.md) for all endpoints.
- [LangChain](https://python.langchain.com/docs/integrations/llms/ollama) and [LangChain.js](https://js.langchain.com/docs/integrations/chat/ollama/) with [example](https://js.langchain.com/docs/tutorials/local_rag/)
- [Firebase Genkit](https://firebase.google.com/docs/genkit/plugins/ollama)
- [crewAI](https://github.com/crewAIInc/crewAI)
- [Yacana](https://remembersoftwares.github.io/yacana/) (User-friendly multi-agent framework for brainstorming and executing predetermined flows with built-in tool integration)
- [Spring AI](https://github.com/spring-projects/spring-ai) with [reference](https://docs.spring.io/spring-ai/reference/api/chat/ollama-chat.html) and [example](https://github.com/tzolov/ollama-tools)
- [LangChainGo](https://github.com/tmc/langchaingo/) with [example](https://github.com/tmc/langchaingo/tree/main/examples/ollama-completion-example)
- [LangChain4j](https://github.com/langchain4j/langchain4j) with [example](https://github.com/langchain4j/langchain4j-examples/tree/main/ollama-examples/src/main/java)
- [LangChainRust](https://github.com/Abraxas-365/langchain-rust) with [example](https://github.com/Abraxas-365/langchain-rust/blob/main/examples/llm_ollama.rs)
- [LangChain for .NET](https://github.com/tryAGI/LangChain) with [example](https://github.com/tryAGI/LangChain/blob/main/examples/LangChain.Samples.OpenAI/Program.cs)
- [LLPhant](https://github.com/theodo-group/LLPhant?tab=readme-ov-file#ollama)
- [LlamaIndex](https://docs.llamaindex.ai/en/stable/examples/llm/ollama/) and [LlamaIndexTS](https://ts.llamaindex.ai/modules/llms/available_llms/ollama)
- [LiteLLM](https://github.com/BerriAI/litellm)
@@ -468,6 +484,7 @@ See the [API documentation](./docs/api.md) for all endpoints.
- [GoLamify](https://github.com/prasad89/golamify)
- [Ollama for Haskell](https://github.com/tusharad/ollama-haskell)
- [multi-llm-ts](https://github.com/nbonamy/multi-llm-ts) (A Typescript/JavaScript library allowing access to different LLM in unified API)
- [LlmTornado](https://github.com/lofcz/llmtornado) (C# library providing a unified interface for major FOSS & Commercial inference APIs)
### Mobile
@@ -517,6 +534,8 @@ See the [API documentation](./docs/api.md) for all endpoints.
- [AI Summmary Helper plugin](https://github.com/philffm/ai-summary-helper)
- [TextCraft](https://github.com/suncloudsmoon/TextCraft) (Copilot in Word alternative using Ollama)
- [Alfred Ollama](https://github.com/zeitlings/alfred-ollama) (Alfred Workflow)
- [TextLLaMA](https://github.com/adarshM84/TextLLaMA) A Chrome Extension that helps you write emails, correct grammar, and translate into any language
- [Simple-Discord-AI](https://github.com/zyphixor/simple-discord-ai)
### Supported backends
@@ -525,4 +544,6 @@ See the [API documentation](./docs/api.md) for all endpoints.
### Observability
- [OpenLIT](https://github.com/openlit/openlit) is an OpenTelemetry-native tool for monitoring Ollama Applications & GPUs using traces and metrics.
- [HoneyHive](https://docs.honeyhive.ai/integrations/ollama) is an AI observability and evaluation platform for AI agents. Use HoneyHive to evaluate agent performance, interrogate failures, and monitor quality in production.
- [HoneyHive](https://docs.honeyhive.ai/integrations/ollama) is an AI observability and evaluation platform for AI agents. Use HoneyHive to evaluate agent performance, interrogate failures, and monitor quality in production.
- [Langfuse](https://langfuse.com/docs/integrations/ollama) is an open source LLM observability platform that enables teams to collaboratively monitor, evaluate and debug AI applications.
- [MLflow Tracing](https://mlflow.org/docs/latest/llms/tracing/index.html#automatic-tracing) is an open source LLM observability tool with a convenient API to log and visualize traces, making it easy to debug and evaluate GenAI applications.

17
api/examples/README.md Normal file
View File

@@ -0,0 +1,17 @@
# Ollama API Examples
Run the examples in this directory with:
```
go run example_name/main.go
```
## Chat - Chat with a model
- [chat/main.go](chat/main.go)
## Generate - Generate text from a model
- [generate/main.go](generate/main.go)
- [generate-streaming/main.go](generate-streaming/main.go)
## Pull - Pull a model
- [pull-progress/main.go](pull-progress/main.go)

View File

@@ -225,7 +225,6 @@ type Options struct {
Mirostat int `json:"mirostat,omitempty"`
MirostatTau float32 `json:"mirostat_tau,omitempty"`
MirostatEta float32 `json:"mirostat_eta,omitempty"`
PenalizeNewline bool `json:"penalize_newline,omitempty"`
Stop []string `json:"stop,omitempty"`
}
@@ -295,17 +294,21 @@ type EmbeddingResponse struct {
// CreateRequest is the request passed to [Client.Create].
type CreateRequest struct {
Model string `json:"model"`
Modelfile string `json:"modelfile"`
Stream *bool `json:"stream,omitempty"`
Quantize string `json:"quantize,omitempty"`
Model string `json:"model"`
Stream *bool `json:"stream,omitempty"`
Quantize string `json:"quantize,omitempty"`
From string `json:"from,omitempty"`
Files map[string]string `json:"files,omitempty"`
Adapters map[string]string `json:"adapters,omitempty"`
Template string `json:"template,omitempty"`
License any `json:"license,omitempty"`
System string `json:"system,omitempty"`
Parameters map[string]any `json:"parameters,omitempty"`
Messages []Message `json:"messages,omitempty"`
// Deprecated: set the model name with Model instead
Name string `json:"name"`
// Deprecated: set the file content with Modelfile instead
Path string `json:"path"`
// Deprecated: use Quantize instead
Quantization string `json:"quantization,omitempty"`
}
@@ -602,7 +605,6 @@ func DefaultOptions() Options {
Mirostat: 0,
MirostatTau: 5.0,
MirostatEta: 0.1,
PenalizeNewline: true,
Seed: -1,
Runner: Runner{

View File

@@ -1,13 +1,10 @@
package cmd
import (
"archive/zip"
"bufio"
"bytes"
"context"
"crypto/ed25519"
"crypto/rand"
"crypto/sha256"
"encoding/json"
"encoding/pem"
"errors"
@@ -46,15 +43,11 @@ import (
"github.com/ollama/ollama/version"
)
var (
errModelNotFound = errors.New("no Modelfile or safetensors files found")
errModelfileNotFound = errors.New("specified Modelfile wasn't found")
)
var errModelfileNotFound = errors.New("specified Modelfile wasn't found")
func getModelfileName(cmd *cobra.Command) (string, error) {
fn, _ := cmd.Flags().GetString("file")
filename, _ := cmd.Flags().GetString("file")
filename := fn
if filename == "" {
filename = "Modelfile"
}
@@ -66,7 +59,7 @@ func getModelfileName(cmd *cobra.Command) (string, error) {
_, err = os.Stat(absName)
if err != nil {
return fn, err
return "", err
}
return absName, nil
@@ -102,68 +95,52 @@ func CreateHandler(cmd *cobra.Command, args []string) error {
return err
}
home, err := os.UserHomeDir()
status := "gathering model components"
spinner := progress.NewSpinner(status)
p.Add(status, spinner)
req, err := modelfile.CreateRequest(filepath.Dir(filename))
if err != nil {
return err
}
spinner.Stop()
status := "transferring model data"
spinner := progress.NewSpinner(status)
p.Add(status, spinner)
defer p.Stop()
req.Name = args[0]
quantize, _ := cmd.Flags().GetString("quantize")
if quantize != "" {
req.Quantize = quantize
}
client, err := api.ClientFromEnvironment()
if err != nil {
return err
}
for i := range modelfile.Commands {
switch modelfile.Commands[i].Name {
case "model", "adapter":
path := modelfile.Commands[i].Args
if path == "~" {
path = home
} else if strings.HasPrefix(path, "~/") {
path = filepath.Join(home, path[2:])
}
if !filepath.IsAbs(path) {
path = filepath.Join(filepath.Dir(filename), path)
}
fi, err := os.Stat(path)
if errors.Is(err, os.ErrNotExist) && modelfile.Commands[i].Name == "model" {
continue
} else if err != nil {
if len(req.Files) > 0 {
fileMap := map[string]string{}
for f, digest := range req.Files {
if _, err := createBlob(cmd, client, f, digest, p); err != nil {
return err
}
if fi.IsDir() {
// this is likely a safetensors or pytorch directory
// TODO make this work w/ adapters
tempfile, err := tempZipFiles(path)
if err != nil {
return err
}
defer os.RemoveAll(tempfile)
path = tempfile
}
digest, err := createBlob(cmd, client, path, spinner)
if err != nil {
return err
}
modelfile.Commands[i].Args = "@" + digest
fileMap[filepath.Base(f)] = digest
}
req.Files = fileMap
}
if len(req.Adapters) > 0 {
fileMap := map[string]string{}
for f, digest := range req.Adapters {
if _, err := createBlob(cmd, client, f, digest, p); err != nil {
return err
}
fileMap[filepath.Base(f)] = digest
}
req.Adapters = fileMap
}
bars := make(map[string]*progress.Bar)
fn := func(resp api.ProgressResponse) error {
if resp.Digest != "" {
spinner.Stop()
bar, ok := bars[resp.Digest]
if !ok {
bar = progress.NewBar(fmt.Sprintf("pulling %s...", resp.Digest[7:19]), resp.Total, resp.Completed)
@@ -183,145 +160,23 @@ func CreateHandler(cmd *cobra.Command, args []string) error {
return nil
}
quantize, _ := cmd.Flags().GetString("quantize")
request := api.CreateRequest{Name: args[0], Modelfile: modelfile.String(), Quantize: quantize}
if err := client.Create(cmd.Context(), &request, fn); err != nil {
if err := client.Create(cmd.Context(), req, fn); err != nil {
if strings.Contains(err.Error(), "path or Modelfile are required") {
return fmt.Errorf("the ollama server must be updated to use `ollama create` with this client")
}
return err
}
return nil
}
func tempZipFiles(path string) (string, error) {
tempfile, err := os.CreateTemp("", "ollama-tf")
func createBlob(cmd *cobra.Command, client *api.Client, path string, digest string, p *progress.Progress) (string, error) {
realPath, err := filepath.EvalSymlinks(path)
if err != nil {
return "", err
}
defer tempfile.Close()
detectContentType := func(path string) (string, error) {
f, err := os.Open(path)
if err != nil {
return "", err
}
defer f.Close()
var b bytes.Buffer
b.Grow(512)
if _, err := io.CopyN(&b, f, 512); err != nil && !errors.Is(err, io.EOF) {
return "", err
}
contentType, _, _ := strings.Cut(http.DetectContentType(b.Bytes()), ";")
return contentType, nil
}
glob := func(pattern, contentType string) ([]string, error) {
matches, err := filepath.Glob(pattern)
if err != nil {
return nil, err
}
for _, safetensor := range matches {
if ct, err := detectContentType(safetensor); err != nil {
return nil, err
} else if ct != contentType {
return nil, fmt.Errorf("invalid content type: expected %s for %s", ct, safetensor)
}
}
return matches, nil
}
var files []string
if st, _ := glob(filepath.Join(path, "model*.safetensors"), "application/octet-stream"); len(st) > 0 {
// safetensors files might be unresolved git lfs references; skip if they are
// covers model-x-of-y.safetensors, model.fp32-x-of-y.safetensors, model.safetensors
files = append(files, st...)
} else if st, _ := glob(filepath.Join(path, "adapters.safetensors"), "application/octet-stream"); len(st) > 0 {
// covers adapters.safetensors
files = append(files, st...)
} else if st, _ := glob(filepath.Join(path, "adapter_model.safetensors"), "application/octet-stream"); len(st) > 0 {
// covers adapter_model.safetensors
files = append(files, st...)
} else if pt, _ := glob(filepath.Join(path, "pytorch_model*.bin"), "application/zip"); len(pt) > 0 {
// pytorch files might also be unresolved git lfs references; skip if they are
// covers pytorch_model-x-of-y.bin, pytorch_model.fp32-x-of-y.bin, pytorch_model.bin
files = append(files, pt...)
} else if pt, _ := glob(filepath.Join(path, "consolidated*.pth"), "application/zip"); len(pt) > 0 {
// pytorch files might also be unresolved git lfs references; skip if they are
// covers consolidated.x.pth, consolidated.pth
files = append(files, pt...)
} else {
return "", errModelNotFound
}
// add configuration files, json files are detected as text/plain
js, err := glob(filepath.Join(path, "*.json"), "text/plain")
if err != nil {
return "", err
}
files = append(files, js...)
// bert models require a nested config.json
// TODO(mxyng): merge this with the glob above
js, err = glob(filepath.Join(path, "**/*.json"), "text/plain")
if err != nil {
return "", err
}
files = append(files, js...)
if tks, _ := glob(filepath.Join(path, "tokenizer.model"), "application/octet-stream"); len(tks) > 0 {
// add tokenizer.model if it exists, tokenizer.json is automatically picked up by the previous glob
// tokenizer.model might be a unresolved git lfs reference; error if it is
files = append(files, tks...)
} else if tks, _ := glob(filepath.Join(path, "**/tokenizer.model"), "text/plain"); len(tks) > 0 {
// some times tokenizer.model is in a subdirectory (e.g. meta-llama/Meta-Llama-3-8B)
files = append(files, tks...)
}
zipfile := zip.NewWriter(tempfile)
defer zipfile.Close()
for _, file := range files {
f, err := os.Open(file)
if err != nil {
return "", err
}
defer f.Close()
fi, err := f.Stat()
if err != nil {
return "", err
}
zfi, err := zip.FileInfoHeader(fi)
if err != nil {
return "", err
}
zfi.Name, err = filepath.Rel(path, file)
if err != nil {
return "", err
}
zf, err := zipfile.CreateHeader(zfi)
if err != nil {
return "", err
}
if _, err := io.Copy(zf, f); err != nil {
return "", err
}
}
return tempfile.Name(), nil
}
func createBlob(cmd *cobra.Command, client *api.Client, path string, spinner *progress.Spinner) (string, error) {
bin, err := os.Open(path)
bin, err := os.Open(realPath)
if err != nil {
return "", err
}
@@ -334,18 +189,11 @@ func createBlob(cmd *cobra.Command, client *api.Client, path string, spinner *pr
}
fileSize := fileInfo.Size()
hash := sha256.New()
if _, err := io.Copy(hash, bin); err != nil {
return "", err
}
if _, err := bin.Seek(0, io.SeekStart); err != nil {
return "", err
}
var pw progressWriter
status := "transferring model data 0%"
spinner.SetMessage(status)
status := fmt.Sprintf("copying file %s 0%%", digest)
spinner := progress.NewSpinner(status)
p.Add(status, spinner)
defer spinner.Stop()
done := make(chan struct{})
defer close(done)
@@ -356,15 +204,14 @@ func createBlob(cmd *cobra.Command, client *api.Client, path string, spinner *pr
for {
select {
case <-ticker.C:
spinner.SetMessage(fmt.Sprintf("transferring model data %d%%", int(100*pw.n.Load()/fileSize)))
spinner.SetMessage(fmt.Sprintf("copying file %s %d%%", digest, int(100*pw.n.Load()/fileSize)))
case <-done:
spinner.SetMessage("transferring model data 100%")
spinner.SetMessage(fmt.Sprintf("copying file %s 100%%", digest))
return
}
}
}()
digest := fmt.Sprintf("sha256:%x", hash.Sum(nil))
if err = client.CreateBlob(cmd.Context(), digest, io.TeeReader(bin, &pw)); err != nil {
return "", err
}
@@ -601,7 +448,7 @@ func ListHandler(cmd *cobra.Command, args []string) error {
var data [][]string
for _, m := range models.Models {
if len(args) == 0 || strings.HasPrefix(m.Name, args[0]) {
if len(args) == 0 || strings.HasPrefix(strings.ToLower(m.Name), strings.ToLower(args[0])) {
data = append(data, []string{m.Name, m.Digest[:12], format.HumanBytes(m.Size), format.HumanTime(m.ModifiedAt, "Never")})
}
}

View File

@@ -293,7 +293,7 @@ func TestGetModelfileName(t *testing.T) {
name: "modelfile specified, no modelfile exists",
modelfileName: "crazyfile",
fileExists: false,
expectedName: "crazyfile",
expectedName: "",
expectedErr: os.ErrNotExist,
},
{
@@ -338,8 +338,8 @@ func TestGetModelfileName(t *testing.T) {
t.Fatalf("couldn't set file flag: %v", err)
}
} else {
expectedFilename = tt.expectedName
if tt.modelfileName != "" {
expectedFilename = tt.modelfileName
err := cmd.Flags().Set("file", tt.modelfileName)
if err != nil {
t.Fatalf("couldn't set file flag: %v", err)
@@ -489,3 +489,130 @@ func TestPushHandler(t *testing.T) {
})
}
}
func TestCreateHandler(t *testing.T) {
tests := []struct {
name string
modelName string
modelFile string
serverResponse map[string]func(w http.ResponseWriter, r *http.Request)
expectedError string
expectedOutput string
}{
{
name: "successful create",
modelName: "test-model",
modelFile: "FROM foo",
serverResponse: map[string]func(w http.ResponseWriter, r *http.Request){
"/api/create": func(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodPost {
t.Errorf("expected POST request, got %s", r.Method)
}
req := api.CreateRequest{}
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
http.Error(w, err.Error(), http.StatusBadRequest)
return
}
if req.Name != "test-model" {
t.Errorf("expected model name 'test-model', got %s", req.Name)
}
if req.From != "foo" {
t.Errorf("expected from 'foo', got %s", req.From)
}
responses := []api.ProgressResponse{
{Status: "using existing layer sha256:56bb8bd477a519ffa694fc449c2413c6f0e1d3b1c88fa7e3c9d88d3ae49d4dcb"},
{Status: "writing manifest"},
{Status: "success"},
}
for _, resp := range responses {
if err := json.NewEncoder(w).Encode(resp); err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
w.(http.Flusher).Flush()
}
},
},
expectedOutput: "",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
mockServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
handler, ok := tt.serverResponse[r.URL.Path]
if !ok {
t.Errorf("unexpected request to %s", r.URL.Path)
http.Error(w, "not found", http.StatusNotFound)
return
}
handler(w, r)
}))
t.Setenv("OLLAMA_HOST", mockServer.URL)
t.Cleanup(mockServer.Close)
tempFile, err := os.CreateTemp("", "modelfile")
if err != nil {
t.Fatal(err)
}
defer os.Remove(tempFile.Name())
if _, err := tempFile.WriteString(tt.modelFile); err != nil {
t.Fatal(err)
}
if err := tempFile.Close(); err != nil {
t.Fatal(err)
}
cmd := &cobra.Command{}
cmd.Flags().String("file", "", "")
if err := cmd.Flags().Set("file", tempFile.Name()); err != nil {
t.Fatal(err)
}
cmd.Flags().Bool("insecure", false, "")
cmd.SetContext(context.TODO())
// Redirect stderr to capture progress output
oldStderr := os.Stderr
r, w, _ := os.Pipe()
os.Stderr = w
// Capture stdout for the "Model pushed" message
oldStdout := os.Stdout
outR, outW, _ := os.Pipe()
os.Stdout = outW
err = CreateHandler(cmd, []string{tt.modelName})
// Restore stderr
w.Close()
os.Stderr = oldStderr
// drain the pipe
if _, err := io.ReadAll(r); err != nil {
t.Fatal(err)
}
// Restore stdout and get output
outW.Close()
os.Stdout = oldStdout
stdout, _ := io.ReadAll(outR)
if tt.expectedError == "" {
if err != nil {
t.Errorf("expected no error, got %v", err)
}
if tt.expectedOutput != "" {
if got := string(stdout); got != tt.expectedOutput {
t.Errorf("expected output %q, got %q", tt.expectedOutput, got)
}
}
}
})
}
}

View File

@@ -13,11 +13,9 @@ import (
"strings"
"github.com/spf13/cobra"
"golang.org/x/exp/maps"
"github.com/ollama/ollama/api"
"github.com/ollama/ollama/envconfig"
"github.com/ollama/ollama/parser"
"github.com/ollama/ollama/readline"
"github.com/ollama/ollama/types/errtypes"
)
@@ -213,10 +211,7 @@ func generateInteractive(cmd *cobra.Command, opts runOptions) error {
return err
}
req := &api.CreateRequest{
Name: args[1],
Modelfile: buildModelfile(opts),
}
req := NewCreateRequest(args[1], opts)
fn := func(resp api.ProgressResponse) error { return nil }
err = client.Create(cmd.Context(), req, fn)
if err != nil {
@@ -459,36 +454,25 @@ func generateInteractive(cmd *cobra.Command, opts runOptions) error {
}
}
func buildModelfile(opts runOptions) string {
var f parser.File
f.Commands = append(f.Commands, parser.Command{Name: "model", Args: cmp.Or(opts.ParentModel, opts.Model)})
func NewCreateRequest(name string, opts runOptions) *api.CreateRequest {
req := &api.CreateRequest{
Name: name,
From: cmp.Or(opts.ParentModel, opts.Model),
}
if opts.System != "" {
f.Commands = append(f.Commands, parser.Command{Name: "system", Args: opts.System})
req.System = opts.System
}
keys := maps.Keys(opts.Options)
slices.Sort(keys)
for _, k := range keys {
v := opts.Options[k]
var cmds []parser.Command
switch t := v.(type) {
case []string:
for _, s := range t {
cmds = append(cmds, parser.Command{Name: k, Args: s})
}
default:
cmds = append(cmds, parser.Command{Name: k, Args: fmt.Sprintf("%v", t)})
}
f.Commands = append(f.Commands, cmds...)
if len(opts.Options) > 0 {
req.Parameters = opts.Options
}
for _, msg := range opts.Messages {
f.Commands = append(f.Commands, parser.Command{Name: "message", Args: fmt.Sprintf("%s: %s", msg.Role, msg.Content)})
if len(opts.Messages) > 0 {
req.Messages = opts.Messages
}
return f.String()
return req
}
func normalizeFilePath(fp string) string {

View File

@@ -3,10 +3,7 @@ package cmd
import (
"testing"
"github.com/google/go-cmp/cmp"
"github.com/stretchr/testify/assert"
"github.com/ollama/ollama/api"
)
func TestExtractFilenames(t *testing.T) {
@@ -53,56 +50,3 @@ d:\path with\spaces\seven.JPEG inbetween7 c:\users\jdoe\eight.png inbetween8
assert.Contains(t, res[9], "ten.PNG")
assert.Contains(t, res[9], "E:")
}
func TestModelfileBuilder(t *testing.T) {
opts := runOptions{
Model: "hork",
System: "You are part horse and part shark, but all hork. Do horklike things",
Messages: []api.Message{
{Role: "user", Content: "Hey there hork!"},
{Role: "assistant", Content: "Yes it is true, I am half horse, half shark."},
},
Options: map[string]any{
"temperature": 0.9,
"seed": 42,
"penalize_newline": false,
"stop": []string{"hi", "there"},
},
}
t.Run("model", func(t *testing.T) {
expect := `FROM hork
SYSTEM You are part horse and part shark, but all hork. Do horklike things
PARAMETER penalize_newline false
PARAMETER seed 42
PARAMETER stop hi
PARAMETER stop there
PARAMETER temperature 0.9
MESSAGE user Hey there hork!
MESSAGE assistant Yes it is true, I am half horse, half shark.
`
actual := buildModelfile(opts)
if diff := cmp.Diff(expect, actual); diff != "" {
t.Errorf("mismatch (-want +got):\n%s", diff)
}
})
t.Run("parent model", func(t *testing.T) {
opts.ParentModel = "horseshark"
expect := `FROM horseshark
SYSTEM You are part horse and part shark, but all hork. Do horklike things
PARAMETER penalize_newline false
PARAMETER seed 42
PARAMETER stop hi
PARAMETER stop there
PARAMETER temperature 0.9
MESSAGE user Hey there hork!
MESSAGE assistant Yes it is true, I am half horse, half shark.
`
actual := buildModelfile(opts)
if diff := cmp.Diff(expect, actual); diff != "" {
t.Errorf("mismatch (-want +got):\n%s", diff)
}
})
}

View File

@@ -187,8 +187,12 @@ func ConvertModel(fsys fs.FS, ws io.WriteSeeker) error {
conv = &gemma2Model{}
case "Phi3ForCausalLM":
conv = &phi3Model{}
case "Qwen2ForCausalLM":
conv = &qwen2Model{}
case "BertModel":
conv = &bertModel{}
case "CohereForCausalLM":
conv = &commandrModel{}
default:
return errors.New("unsupported architecture")
}

View File

@@ -0,0 +1,76 @@
package convert
import (
"cmp"
"github.com/ollama/ollama/llm"
)
type commandrModel struct {
ModelParameters
MaxPositionEmbeddings uint32 `json:"max_position_embeddings"`
HiddenSize uint32 `json:"hidden_size"`
HiddenLayers uint32 `json:"num_hidden_layers"`
IntermediateSize uint32 `json:"intermediate_size"`
NumAttentionHeads uint32 `json:"num_attention_heads"`
NumKeyValueHeads uint32 `json:"num_key_value_heads"`
LayerNormEPS float32 `json:"layer_norm_eps"`
RopeTheta float32 `json:"rope_theta"`
UseQKNorm bool `json:"use_qk_norm"`
MaxLength uint32 `json:"model_max_length"`
LogitScale float32 `json:"logit_scale"`
NCtx uint32 `json:"n_ctx"`
}
var _ ModelConverter = (*commandrModel)(nil)
func (p *commandrModel) KV(t *Tokenizer) llm.KV {
kv := p.ModelParameters.KV(t)
kv["general.architecture"] = "command-r"
kv["general.name"] = "command-r"
kv["command-r.context_length"] = cmp.Or(p.MaxLength, p.MaxPositionEmbeddings, p.NCtx)
kv["command-r.embedding_length"] = p.HiddenSize
kv["command-r.block_count"] = p.HiddenLayers
kv["command-r.feed_forward_length"] = p.IntermediateSize
kv["command-r.attention.head_count"] = p.NumAttentionHeads
kv["command-r.attention.head_count_kv"] = p.NumKeyValueHeads
kv["command-r.attention.layer_norm_epsilon"] = p.LayerNormEPS
kv["command-r.rope.freq_base"] = p.RopeTheta
kv["command-r.max_position_embeddings"] = cmp.Or(p.MaxLength, p.MaxPositionEmbeddings)
kv["command-r.logit_scale"] = p.LogitScale
kv["command-r.rope.scaling.type"] = "none"
return kv
}
func (p *commandrModel) Tensors(ts []Tensor) []llm.Tensor {
var out []llm.Tensor
for _, t := range ts {
out = append(out, llm.Tensor{
Name: t.Name(),
Kind: t.Kind(),
Shape: t.Shape(),
WriterTo: t,
})
}
return out
}
func (p *commandrModel) Replacements() []string {
return []string{
"self_attn.q_norm", "attn_q_norm",
"self_attn.k_norm", "attn_k_norm",
"model.layers", "blk",
"input_layernorm", "attn_norm",
"mlp.down_proj", "ffn_down",
"mlp.gate_proj", "ffn_gate",
"mlp.up_proj", "ffn_up",
"self_attn.k_proj", "attn_k",
"self_attn.o_proj", "attn_output",
"self_attn.q_proj", "attn_q",
"self_attn.v_proj", "attn_v",
"model.norm", "output_norm",
"model.embed_tokens", "token_embd",
}
}

78
convert/convert_qwen2.go Normal file
View File

@@ -0,0 +1,78 @@
package convert
import "github.com/ollama/ollama/llm"
type qwen2Model struct {
ModelParameters
MaxPositionEmbeddings uint32 `json:"max_position_embeddings"`
HiddenSize uint32 `json:"hidden_size"`
HiddenLayers uint32 `json:"num_hidden_layers"`
IntermediateSize uint32 `json:"intermediate_size"`
NumAttentionHeads uint32 `json:"num_attention_heads"`
NumKeyValueHeads uint32 `json:"num_key_value_heads"`
RopeTheta float32 `json:"rope_theta"`
RopeScaling struct {
Type string `json:"type"`
Factor ropeFactor `json:"factor"`
OriginalMaxPositionEmbeddings uint32 `json:"original_max_position_embeddings"`
} `json:"rope_scaling"`
RMSNormEPS float32 `json:"rms_norm_eps"`
}
var _ ModelConverter = (*qwen2Model)(nil)
func (q *qwen2Model) KV(t *Tokenizer) llm.KV {
kv := q.ModelParameters.KV(t)
kv["general.architecture"] = "qwen2"
kv["qwen2.block_count"] = q.HiddenLayers
kv["qwen2.context_length"] = q.MaxPositionEmbeddings
kv["qwen2.embedding_length"] = q.HiddenSize
kv["qwen2.feed_forward_length"] = q.IntermediateSize
kv["qwen2.attention.head_count"] = q.NumAttentionHeads
kv["qwen2.attention.head_count_kv"] = q.NumKeyValueHeads
kv["qwen2.rope.freq_base"] = q.RopeTheta
kv["qwen2.attention.layer_norm_rms_epsilon"] = q.RMSNormEPS
switch q.RopeScaling.Type {
case "":
// no scaling
case "yarn":
kv["qwen2.rope.scaling.type"] = q.RopeScaling.Type
kv["qwen2.rope.scaling.factor"] = q.RopeScaling.Factor
default:
panic("unknown rope scaling type")
}
return kv
}
func (q *qwen2Model) Tensors(ts []Tensor) []llm.Tensor {
var out []llm.Tensor
for _, t := range ts {
out = append(out, llm.Tensor{
Name: t.Name(),
Kind: t.Kind(),
Shape: t.Shape(),
WriterTo: t,
})
}
return out
}
func (p *qwen2Model) Replacements() []string {
return []string{
"lm_head", "output",
"model.embed_tokens", "token_embd",
"model.layers", "blk",
"input_layernorm", "attn_norm",
"self_attn.k_proj", "attn_k",
"self_attn.v_proj", "attn_v",
"self_attn.q_proj", "attn_q",
"self_attn.o_proj", "attn_output",
"mlp.down_proj", "ffn_down",
"mlp.gate_proj", "ffn_gate",
"mlp.up_proj", "ffn_up",
"post_attention_layernorm", "ffn_norm",
"model.norm", "output_norm",
}
}

View File

@@ -108,6 +108,8 @@ func TestConvertModel(t *testing.T) {
"Phi-3-mini-128k-instruct",
"all-MiniLM-L6-v2",
"gemma-2-9b-it",
"Qwen2.5-0.5B-Instruct",
"c4ai-command-r-v01",
}
for i := range cases {

View File

@@ -0,0 +1,314 @@
{
"general.architecture": "qwen2",
"general.file_type": "1",
"general.parameter_count": "494032768",
"general.quantization_version": "2",
"output_norm.weight": "93a01a6db3419e85320a244bbf8ae81c43033b1d10c342bea3797ff2ce348390",
"qwen2.attention.head_count": "14",
"qwen2.attention.head_count_kv": "2",
"qwen2.attention.layer_norm_rms_epsilon": "1e-06",
"qwen2.block_count": "24",
"qwen2.context_length": "32768",
"qwen2.embedding_length": "896",
"qwen2.feed_forward_length": "4864",
"qwen2.rope.freq_base": "1e+06",
"token_embd.weight": "d74257dc547b48be5ae7b93f1c9af072c0c42dbbb85503078e25c59cd09e68d0",
"tokenizer.ggml.add_eos_token": "false",
"tokenizer.ggml.add_padding_token": "false",
"tokenizer.ggml.eos_token_id": "151645",
"tokenizer.ggml.merges": "6b1b1c58f1223d74f9095929d3e6416cdd74784440221a5507b87b8197f2bfd2",
"tokenizer.ggml.model": "gpt2",
"tokenizer.ggml.padding_token_id": "151643",
"tokenizer.ggml.pre": "qwen2",
"tokenizer.ggml.scores": "94e247e531e8b0fa3d248f3de09c9beae0c87da8106208a8edfaac0b8ec4b53d",
"tokenizer.ggml.token_type": "b178dbc9d1b2e08f84d02918e00fc2de2619a250e6c188c91a6605f701860055",
"tokenizer.ggml.tokens": "1d93f6679b23a1152b725f7f473792d54d53c1040c5250d3e46b42f81e0a1a34",
"blk.0.attn_k.bias": "5ce6617845f66c34515978d23d52e729c298d8bffa28c356a0428bef17142cf1",
"blk.0.attn_k.weight": "a960832a9e0e83e4d95402e5d1a01cc74300fcca0c381237162126330e1a7af8",
"blk.0.attn_norm.weight": "32c7d51cd0958f1f1771174192db341f9770516d7595a2f0fd18a4d78bd5aba3",
"blk.0.attn_output.weight": "c67e6e7e868354a11bf9121c70ee56c140b20eec611a8955e7dfe54a21d40a98",
"blk.0.attn_q.bias": "3e9e994eb1f03bccfc82f8bb3c324c920d42d547e07de5be83be12c428645063",
"blk.0.attn_q.weight": "dc12132f789b97cfa1e3f5775ceb835247fa67aa47400fd09c8f9f3769208583",
"blk.0.attn_v.bias": "a3fd0757b31fdc78af5ec320332d239c1a79d34e8804df06c5454e86955e8cc9",
"blk.0.attn_v.weight": "f43094a2134c7ee2dcc52aac3c8b7d9d64fb0295a8adb94cabfd49213f017b84",
"blk.0.ffn_down.weight": "18c2aec92db14f21976838a8c35d5575f80d0e4b1e05ccc0d8388d5877e80147",
"blk.0.ffn_gate.weight": "a3a1c4ef38f8f750eabadfe3d83bbb0f77941eec1cc1a388e51852e99c8691f6",
"blk.0.ffn_norm.weight": "b59b779c42d44b5c4cec41e39b4eb61e0491a07c1b3e946ccb5b8d5c657eda3f",
"blk.0.ffn_up.weight": "db64f09987ea59449e90abae5a2ffcc20efd9203f0eebec77a6aacb5809d6cff",
"blk.1.attn_k.bias": "a5c8c5671703ec0aa0143ff70a20ffdd67b5d5790ca1dfa5bba4e87e4071ed9f",
"blk.1.attn_k.weight": "835c7c7cc95b3cb2e55bd9cac585aa0760a033896621d3e06421f3378c540f7d",
"blk.1.attn_norm.weight": "f4c36fb6c14fce721fab0de78cc118d6f66e3a3d3ea0017bb14aade24c3c5434",
"blk.1.attn_output.weight": "cc1e80310c97cef068e48e40b7096f32fa2138519d6209c6a1a9994985999016",
"blk.1.attn_q.bias": "bc332780e66b0aac80ec5e63ac32344919a840db2fcc8f87bcef16a43a54138e",
"blk.1.attn_q.weight": "d766f06c925cce38d4b31b2165b3448e1fb49a7d561985f95d9cd2fcba52367a",
"blk.1.attn_v.bias": "9f486626fb6ed9ac84970a71e9b9818dd2758501fd3f61bb1c08540dcc7a8631",
"blk.1.attn_v.weight": "e873d1e5bd4f4d6abfd47c0f55119c2c111105838753ee273a03c5ccea25ce5c",
"blk.1.ffn_down.weight": "b3ce82b093f187344de04284b1783a452de1b72640914609b8f830dc81580521",
"blk.1.ffn_gate.weight": "5cd44ad237edaca525a28a3ac13975d1b565f576d6a8003237a341ae0d156f2e",
"blk.1.ffn_norm.weight": "4ac774ee8afaee119610c46aa1ff89fc6c9084a29d226075bc4aa4d2f15f746c",
"blk.1.ffn_up.weight": "042d81ab5f1983d85c81213232f3bfc05a9302d9dfaa98d931ebba326b6058b8",
"blk.10.attn_k.bias": "767ecfeacd60a2c2221ac4d76c357190849dd9cdf64ced418d9d0c7949101401",
"blk.10.attn_k.weight": "a9f3df343227537636be8202303453086375091944e498bad11e0b91e45e8c71",
"blk.10.attn_norm.weight": "01acd0e7b3e363f873dbfde6f0995ffcce83f5aaa10ff91c31dbf775035f6d5a",
"blk.10.attn_output.weight": "a531fe660769604ab869f01b203eb115e025cad4c0baeacdd1bcca99cf6d0264",
"blk.10.attn_q.bias": "356a02c9163dd660c1340fbe1e049b335ac6178891e00996131bba9ab4cb3e59",
"blk.10.attn_q.weight": "81be0cfb227339d83f954cd8dcf35828441211c6e1d184060e3eb76085041e2f",
"blk.10.attn_v.bias": "ed0450653284b62f8bf2c2db19c0ff7a6cf3cda1324d0a044c5e3db7bb692bd3",
"blk.10.attn_v.weight": "c1247ff7092babd2ed979883095b9aa022b2996cab1c77fb9e6176ddc1498d16",
"blk.10.ffn_down.weight": "fda7544965dc9af874f1062c22151c6cefc8ba08cbe15dc67aa89979e77b2de4",
"blk.10.ffn_gate.weight": "9f2632b1dee7304d10c70bd38d85bb1f148a628a8468f894f57975b8a2f1d945",
"blk.10.ffn_norm.weight": "94f8cbd6b17a4d5aabd93fa32930a687db3b11f086142f1cd71c535c11adcad4",
"blk.10.ffn_up.weight": "8dc2f8db0474939a277a3d89db34c3bcc3381cfea57bd05a8426a164634d9112",
"blk.11.attn_k.bias": "3b8e5a662b19411e3f6530714b766aad2ee41eebc8161bec9db0bc82d383a6e0",
"blk.11.attn_k.weight": "2c29f1ed1ce53ce9604e9ea3663c2c373157e909a0d6064a8920005f6d15dad9",
"blk.11.attn_norm.weight": "48f68a99c3da4ab4c9e492677b606d1b8e0e3de1fdbf6a977523f97b8c21ec31",
"blk.11.attn_output.weight": "5859f3838a94898b020c23040941ed88f4fcb132db400d0849f30a01f62c0f1c",
"blk.11.attn_q.bias": "c5ad89a5628f2bd81252ef44ef6bbcbff15c33ad16fba66435509b959c2af6d3",
"blk.11.attn_q.weight": "d102104e5d61c1e3219564f1d0149fd593db6c6daa9f3872460c84403323cfef",
"blk.11.attn_v.bias": "8653f7d48c5f75a5b55630819f99ecf01c932f12d33fd1a3ee634613e70edde8",
"blk.11.attn_v.weight": "e0a7c7d89b9f2d0d781ce85330022229126e130a8600a09d4a5f920f0bbd50b2",
"blk.11.ffn_down.weight": "4a22b3361eba8bbe1d9a6fda1812618e894c49f13bcacb505defa9badb6b96a6",
"blk.11.ffn_gate.weight": "484698b206760d3fd8df68b252a3c5bae65c8bf6392fb53a5261b021b6f39144",
"blk.11.ffn_norm.weight": "da69e96338cbe30882cf5a9544004387f5bbc0bcb6038e61ba2baabbd2623bac",
"blk.11.ffn_up.weight": "26ec74f1f504d1281715680dfbcc321db4e9900c53932fa40955daceb891b9aa",
"blk.12.attn_k.bias": "f94b49ec3e498f14f6bc3ebefe1f82018935bbe594df03253bfffae36bc20751",
"blk.12.attn_k.weight": "ae6323d0bbcfcea01f598d308993d1a7530317e78c1f64923e36d4b1649e9e73",
"blk.12.attn_norm.weight": "3784536a7611a839a42a29a5cc538c74ee4f9793092e5efe1b227b48f8c4d37f",
"blk.12.attn_output.weight": "46826c00b066829355db78293ab216e890f5eaaed3a70499ee68785189a6b0d9",
"blk.12.attn_q.bias": "b14db2d327ce0deec97beda7d3965a56c43e1e63dc9181840fb176b114cf643a",
"blk.12.attn_q.weight": "30f67df52ced06f76b6c85531657584276a454d6ec9bb7d0c7d2ca8f067f5551",
"blk.12.attn_v.bias": "57ab4b7e43f4fc5853bca7bfbb2702f8c2c391a49252a760abbb7b26330dc4aa",
"blk.12.attn_v.weight": "3ccd9da0cfe241cd33a63310f3ca6d81c5bc5a50d200bfea6612ac376166aca2",
"blk.12.ffn_down.weight": "a095774413198a83c549ce132d7c9684c0baef33145eaa889be370ef9c881c81",
"blk.12.ffn_gate.weight": "bb3b2bbdfb065d2a0a795909c53beec327781a4a7e974bf9f99c436cea459991",
"blk.12.ffn_norm.weight": "3b486c6cd97eb4b17967d9d6c0cc3821a1a6ad73d96b4d8fbf980101b32b8dab",
"blk.12.ffn_up.weight": "d020b82dd39a5d5a9d3881397bf53a567790a07f395284e6eb0f5fe0fef53de3",
"blk.13.attn_k.bias": "69381f8254586eba3623eceb18697fe79f9b4d8f2c30136acb10d5926e3ba1d0",
"blk.13.attn_k.weight": "c4d7a31495d71269f81b586203a50abea3a9e2985667faf258c9306ec6030f1d",
"blk.13.attn_norm.weight": "907da11075d16eda668dabe548af3cfd794df26b8ab53939af1344d91bec6fba",
"blk.13.attn_output.weight": "ca01cf6d2b8ece2fb3b0f56f1eb76194471ac27b54fe264f99c909f5eb7fef4a",
"blk.13.attn_q.bias": "2f5ecebafe03b1d485b93c41cff756ca57fb65b02e9d8336f14a3d26ab5d159a",
"blk.13.attn_q.weight": "f557f8acad7f0fa62da06b5da134182fe04a5bed8bdb269e316f970c9cc440fb",
"blk.13.attn_v.bias": "a492a88ae131e95714b092545a8752eaea7c7d2f9cb77852628ca8296c415525",
"blk.13.attn_v.weight": "d1220b1fe9f1cc0a5a88ee239d65fec900f5eaf6c448b6c2cbe74c81e15ed333",
"blk.13.ffn_down.weight": "53184e33440b49848a896304eb16a983efbc6b8bee0b93de8c8de716e1585fcb",
"blk.13.ffn_gate.weight": "684bf8896f148c851506c62717e45c426921b93c10d536ecdeb0fb28259a106d",
"blk.13.ffn_norm.weight": "6cb4e547ad8665eb7c174855c08afe1e5490fece66122522c1e9e8132d9064eb",
"blk.13.ffn_up.weight": "c64107897e38c06727075aba4ea7940b2cdd0e278b5c555dffb2790ef553bb57",
"blk.14.attn_k.bias": "2814ca9b160b16ae39557c9b629482fbe3a7592d372c1e1bf1ac59a2d578fde1",
"blk.14.attn_k.weight": "3377177396463afba667742972920ebb45dfdc37e9950e1f0e1d60a2f936b27d",
"blk.14.attn_norm.weight": "5cae870477d51dd35a6d22aaeacfce4dff218ffba693820ede6a4e11f02afd6d",
"blk.14.attn_output.weight": "3cfe9ccf3d48ae9e95b93a132a1c6240189a277d764f58590fb36fdbb714cad0",
"blk.14.attn_q.bias": "6a75acc2f090b2e67bfc26f7fca080ae8bd7c7aa090ec252e694be66b8b8f038",
"blk.14.attn_q.weight": "5ef45c86d7dda1df585aa1b827b89823adf679a6bb9c164bd0f97b2aa6eb96f1",
"blk.14.attn_v.bias": "5534480443e10ed72c31a917f3d104b0f49df5e6dbfa58d0eb5e7318120e3aee",
"blk.14.attn_v.weight": "58f45cf3240c4623626ec415c7d5441eaa8d2fb184f101aba973f222989422d1",
"blk.14.ffn_down.weight": "2dc82a0f20c05b77512458738130d8d05ce150cc078680ae7ee6dd7ed68d955d",
"blk.14.ffn_gate.weight": "d4a6c6f0fcccddfd1fcaa074846622f4a74cb22b9a654ab497abdc1d0dde9450",
"blk.14.ffn_norm.weight": "777e444932a0212ff3feac98442444e17bd8a98cb758ea3356697d0846d12c56",
"blk.14.ffn_up.weight": "6b75f6bd00195198447b69a417ed9d98f8ca28b3cb8be82f4bad908be0777d57",
"blk.15.attn_k.bias": "2d07211a58e6c2f23aa3a6dc03c80a7d135dfb28726b60b0e0fdd0f35ea5c37b",
"blk.15.attn_k.weight": "e77f3c0075a1810e70df956cc51fd08612f576cc09b6de8708dcae5daedb0739",
"blk.15.attn_norm.weight": "379a10d90609a5d5ba67d633803eda1424fc61ba5cca8d3bffe70c8b18b58ebf",
"blk.15.attn_output.weight": "402751c12ee9dbc9db5e3bf66a7b23ebe7d36c0500e0be67be4c8b1c4357fa62",
"blk.15.attn_q.bias": "acb37fc409ee725ceedf7a3a41b40106086abc47b76780728f781942c5120208",
"blk.15.attn_q.weight": "89cd3047a09b46ed2bb57c69dd687f67a1f0235149b30376fa31b525898e4a55",
"blk.15.attn_v.bias": "f081a37289cbe811978feb4da3ef543bdeb7355414d476f44e09b498da10cb2c",
"blk.15.attn_v.weight": "8404f242a11e6d512c9ead9b2f083cda031e9b269f8a0a83f57ee4c56934764e",
"blk.15.ffn_down.weight": "93438f43ee8cc4f1a7fd3840a6afdd5f02123e76db4f0d9474430c0100d148fc",
"blk.15.ffn_gate.weight": "ff935a2698843e87fad9dbf7125f53e460190ec71ee128b650b3fc027fe37bfc",
"blk.15.ffn_norm.weight": "4be80f199841cba831982e988451e1833c3c938a4d6ca1169319087bf0bd723e",
"blk.15.ffn_up.weight": "ee9ba63c66d71053e33551ddd519878bb30b88eeb03cfe047119c5c4000fb0a6",
"blk.16.attn_k.bias": "3f5fbabed4510c620b99d9d542739295fa6a262a7157f3a00a4889253f8341b8",
"blk.16.attn_k.weight": "8ca6eb139b281c257324cddea97a8e9aa7c048b53075cf00153123b967c27ee5",
"blk.16.attn_norm.weight": "290157f005e5aa7dddf4bd60100e7ee7b0baa7f11ec5c2cea5e0ead2aad3a4c6",
"blk.16.attn_output.weight": "b1f4d80a7447f08f1c331712527f750d00147f35c042442ade96fd029dadc5a1",
"blk.16.attn_q.bias": "e3e4e442ad4416791b468cad8de0d0d2d68c7e7df8d06002f4d49b4da9cb25e4",
"blk.16.attn_q.weight": "cc7392fa5bb1107d3816e7e7363de252d37efd4165d065e258806291ce0a147b",
"blk.16.attn_v.bias": "a7629830f2f6293e018916849614636d40b1bcd11245f75dbc34d38abae8f324",
"blk.16.attn_v.weight": "b6c7856c7d594437630929c8cf3b31d476e817875daf1095334ec08e40c5e355",
"blk.16.ffn_down.weight": "f9c0a777a00170990a4982d5a06717511bf9b0dd08aeaab64d9040d59bcbebba",
"blk.16.ffn_gate.weight": "ed88f11bc3176c9f22004e3559ccb9830a278b75edd05e11971d51c014bd5cd2",
"blk.16.ffn_norm.weight": "ab24abdcc4957895e434c6bb3a5237a71ff5044efb9f76c1a9e76e280c128410",
"blk.16.ffn_up.weight": "99f594dc8db37f554efa606e71d215fbc3907aa464a54038d6e40e9229a547ff",
"blk.17.attn_k.bias": "f236625676f9b2faa6781c7184d12d84c089c130d2a9350a6cf70210990f6bf1",
"blk.17.attn_k.weight": "c2a4f20cd3e98538308a13afe9cc5880bdd90d543449c6072dedd694b511ee1a",
"blk.17.attn_norm.weight": "5a9da4ee168311f487a79fc9d065a035432c6cafa8adb963a84954cf32f57a2a",
"blk.17.attn_output.weight": "d5df7031e354186ce65dc09d6f8a92eb721c0319816f8596b0c8a5d148ed0a2a",
"blk.17.attn_q.bias": "3212d5eeaa7ed7fac93cc99e16544de93c01bb681ae9391256ed4a8671fc6b00",
"blk.17.attn_q.weight": "d18cd9aa7ee10c551cb705549fa1ae974aea233f86471c9a19022dc29b63d0d5",
"blk.17.attn_v.bias": "a74ad11a1f8357742f80e2a0c0b3a2578fc8bbaf14c8223000767e07a5d79703",
"blk.17.attn_v.weight": "da18ac0e90884436a1cb0ad6a067f97a37f321b03c70b8b03bf481339fef5c80",
"blk.17.ffn_down.weight": "81a8a5d7a194fb53d976558e0347efbe9fdb1effffde9634c70162e1a20eff51",
"blk.17.ffn_gate.weight": "72870d83ab62f2dcd45f593924e291a45e4ae1b87f804b5b88aa34cfd76dd15e",
"blk.17.ffn_norm.weight": "cae39ac69b9bdaeefab7533796fdf11dbb7a4bdbdeed601e20f209503aafe008",
"blk.17.ffn_up.weight": "e7cb40b0842468507cec0e502bbed8a86428b51d439e3466bc12f44b2754e28f",
"blk.18.attn_k.bias": "8bfc02b94f9587aa125e2d8bbc2b15f0a5eb8f378d8b3e64a8150ae0a8ca3df2",
"blk.18.attn_k.weight": "434bc3b3332ea48afee890aa689eb458a75c50bc783492b0cbf64d42db40e8ad",
"blk.18.attn_norm.weight": "d6ffc09396c42a70d1f0e97d81113eee704d3bfc9eeae2bed022075a5dd08075",
"blk.18.attn_output.weight": "133f001f81f3b082468a7de67cb2e7a76508fce34bcc4dee7f0858e06eee082c",
"blk.18.attn_q.bias": "758d0e28bf5e660b3090aafb70e2a3191b4f3bb218d65e9139a086ceacaf599f",
"blk.18.attn_q.weight": "12d7b86fc1b09b9fa7f8b7ed43d8a410892cec8672d0c752f8346f6193343696",
"blk.18.attn_v.bias": "9efd15bab0519462431d6c6e8a5b7dd4e151dc449468097ee0ddca369c0ecc2e",
"blk.18.attn_v.weight": "f631231a79d4a2e9730fb2e386d8c18621eb3fb7900fbfdff5e6d52cc42db122",
"blk.18.ffn_down.weight": "874a2dddf456f3ab56b958b0860d71c8c680a6f89322c9bf6b2f32a113592300",
"blk.18.ffn_gate.weight": "4549ef8976c345a511df4a7133bdaf6fe387335f52dfd8a4605a8ae3f728c403",
"blk.18.ffn_norm.weight": "80c258a2536a860e19bfcbd9f29afa13214fbb4c34bde0d4da51287d354e9a59",
"blk.18.ffn_up.weight": "8b03308a581457a3c038b7a086f3cdf14941d7ad4107c4bd6d9d6b062fd00d73",
"blk.19.attn_k.bias": "e77f7b0c8e3e0a9b0d61918cd88371047752a1b02b1576936f4ec807d4d870ee",
"blk.19.attn_k.weight": "a2a318e93355230c0d0f95c441b080bf9c4914507255f363fb67a5e771d4d1e6",
"blk.19.attn_norm.weight": "9a4bdeb3970be21ac74a94c2c81eb36986533db81b78db6edec48d9802910d59",
"blk.19.attn_output.weight": "2369b103dd3947e2cef02b2669b405af5957fb3a7f9d0ff40646078c4b4317ad",
"blk.19.attn_q.bias": "e20bf427bef69059ae84a5d9f98f7d688489627f198fb6153def018ff9fd2e34",
"blk.19.attn_q.weight": "45a3bb3bdfd2f29dd76e5f78ddae73678b9a2a85dfaf609e460240ef5b7be2ad",
"blk.19.attn_v.bias": "a441f58a3e02ed86ee1819eefc9bd4e8b70d11b864a929d58a2c2ac0aeb8203d",
"blk.19.attn_v.weight": "30b0b04480c510450a7abb2ce9fa05c65b150a3cc4dc76f8916bf8d013f1b6be",
"blk.19.ffn_down.weight": "eebb9ab8fdb6a6efcfff8cf383adac9ec2d64aeeff703d16ed60d3621f86c395",
"blk.19.ffn_gate.weight": "3fef1493029298378886586478410b3d2e4e879f6aa83c07e210a7ce6481817f",
"blk.19.ffn_norm.weight": "e1be99ea1e8fb9678f7b8ba200f3f37e03878f3574d65d57bcd3a9fd796e2112",
"blk.19.ffn_up.weight": "f07cf25e09394fb69fe3ef324bdc0df9a4cecf3dc53070b8acc39e6d1689bf82",
"blk.2.attn_k.bias": "b29baa8221f125eff6b8ac1a950fa1d7cfc1bce7bdc636bf3df7d4065ab6466c",
"blk.2.attn_k.weight": "4bd0c179bced8bc37a09f5748c394e0cf50273942fb38a866e5cf50b6c96c437",
"blk.2.attn_norm.weight": "07b3edc6a6325c3428aa12f29bcae0be0de363ce61a6af487bc5c93fb8c468d9",
"blk.2.attn_output.weight": "056b5b31dbc81087c81b9d41c25960aa66c7190004c842ba343979644d7f4d88",
"blk.2.attn_q.bias": "479b6212401e097767c9d52b12a1adb8961c0fce9fcaaab81f202a9d85744376",
"blk.2.attn_q.weight": "f89196076f446a6dd8a9eee017f303504f9c03094c326449cee5a7fc0a97fade",
"blk.2.attn_v.bias": "ef9b1b986dbd9d7291027a88b67dc31434435b20e76e4f1e9d6273ebd31224f0",
"blk.2.attn_v.weight": "9322f4f00e85f8c0936845c51ca64b202a93df104f36886986a8452a8e4967a5",
"blk.2.ffn_down.weight": "7beac0d2440dc49af33ededb85a6cc3ba23ab33ad3ffa5760714b2ef84d94f6e",
"blk.2.ffn_gate.weight": "818a93864a5890c1f4dc66429004fad07645a50142350e9bff9a68fe24608a52",
"blk.2.ffn_norm.weight": "152c924d5514942ad274aafb8cc91b35c1db3627c3d973d92f60ff75f3daf9ba",
"blk.2.ffn_up.weight": "9c9579e600f209546db6015c9acfeda4f51b6d3cca6e8db4d20a04285fe61a37",
"blk.20.attn_k.bias": "fd22bfeffb63d818ce2ff1ea2ace0db5d940f7a9489b6bfc1ec4a5398848d7fe",
"blk.20.attn_k.weight": "f74439bc74c2f9252130c9c28384fd7352368b58bb7ce3f2444cf0288dfff861",
"blk.20.attn_norm.weight": "5c15d2613df87be6495fb7546b7dcedd2801d12fa5ecc02c877df889330e8f37",
"blk.20.attn_output.weight": "6731a39286a67f6859832f96695732e579e14e0c36956eccd1edce3db11595b8",
"blk.20.attn_q.bias": "04466e5a3f454a19b9b433fc2585396feac780027ece7ccb4e4bb3e406fc14d8",
"blk.20.attn_q.weight": "ead4c71daaeb17bf20d014a34c88b97f238456488e815ae0f281a5daf6fc99b8",
"blk.20.attn_v.bias": "adcc848e043025de9bd55ccb14dd8fb6343e8b5185ed07e12964be41d0faf99f",
"blk.20.attn_v.weight": "81bfc23f83526386a4761c2c16b6a93cd0bbf9d846c1a51b82c71f1474a465f1",
"blk.20.ffn_down.weight": "9bf660af3bafad919d03173c89a65fc9c89440a76c42c9e55e4d171076f3c17f",
"blk.20.ffn_gate.weight": "c04b4f3ccce44917ee228b998e2c19dd702aef10a43413afb152e808b5ac5c42",
"blk.20.ffn_norm.weight": "3d5b555d7746a71220143c6b8fff5ce4eb63283d9d9c772f1233d848f69f4ff4",
"blk.20.ffn_up.weight": "d7a196505c39e5469dfc7c6958bdbb54e93629ac1a047a6663ed96b318753094",
"blk.21.attn_k.bias": "4db1f48e5c6a3bc5720a5da813bbef08283e6269e12d83f8a9c54e52715d8011",
"blk.21.attn_k.weight": "c687b2f0e132a5e220a2a059b61aa2a537f37d8a674d7709f87880637b263b31",
"blk.21.attn_norm.weight": "ec23b0ff847a4b45585ab8e04f10fc20bb1637c5f1fbcdc4d73f336bcb5d1bd0",
"blk.21.attn_output.weight": "01255390576316c1731ef201e32c6e934eba356c28438cd06d9027ac6a3ff84f",
"blk.21.attn_q.bias": "3098f37205a15418e1681e407c82b7ce7c6fda6c6826b0590a13e1b68a38a1ea",
"blk.21.attn_q.weight": "30ea62cbb702a5359229dc96819df17ee535e2e9988d044b005c73ea536e1005",
"blk.21.attn_v.bias": "7bbedb2c22a04737f21993115701d4a06b985b7ca3b64681f53cd1be8d7ea39e",
"blk.21.attn_v.weight": "e11905e63579e36fbee978062af7599339ae29633765a4835628d79a795ec8df",
"blk.21.ffn_down.weight": "84def2ffd8aca766f9ce12ed9ac76919ab81eb34bdeae44fa4224417c38af527",
"blk.21.ffn_gate.weight": "4e99f05377b4a0b8d875045530a5c59dee6a46ac8a45597f6579f6fdfa800787",
"blk.21.ffn_norm.weight": "af48f13d03fba38ff8794a5f5005e666e501f971ca2e30bbded2777a8096f37d",
"blk.21.ffn_up.weight": "a29541c39a6acbc364be86994632a5bf55d701027cb7f23320f8c6d55ee42c91",
"blk.22.attn_k.bias": "c97f84db6c75422df6ef5768676d4e9abefaa3b8337aa2730ff260f8fc350480",
"blk.22.attn_k.weight": "af9a0c56f68779513e95be11611b7be6175ddae27d48bee9dd72fdbf05f6cbfa",
"blk.22.attn_norm.weight": "1c7518eb5bcff4a202c6f4a2827f14abd76f9bcc64ce75fe9db60b69437a5c9c",
"blk.22.attn_output.weight": "1abcf1f3caa2f59dd018646b93f9cf8fd30d49e98a473e6a8704419a751be46f",
"blk.22.attn_q.bias": "7221e01cb692faf2f7f8c2eb6e2fac38a1b751a9c9fdb6a21a0a936eb0bf4b96",
"blk.22.attn_q.weight": "faaf8fb7b6c19f343d47f3ea6b57151fb46c787e0b3bd2c292fd327d3d4d8e35",
"blk.22.attn_v.bias": "3ec05942e82d735de99dfd0d8228d8425e63e2fc584da98b3326bdef89ecb2e5",
"blk.22.attn_v.weight": "42e7b0ad06db76227837da9d4e74b2db97f3df4050ecb3a87cb9b55e08dfcb42",
"blk.22.ffn_down.weight": "87ef98ad2d0e824b0fa5ad8aa18787162922e527c9b1b721a99bc07d3bf97c82",
"blk.22.ffn_gate.weight": "562d6e5a1654b03aaa0e33864d23c10297fd4bcaa72d30fac69fb771ee1df9d6",
"blk.22.ffn_norm.weight": "f8a405dee467749d59427ce05cdd4b9c11bb18934a89258ea461f013b7d251f5",
"blk.22.ffn_up.weight": "90e1f4ae4062649d4d838399eb353e8bb8d56a49982b6a7f64aa3945377f7187",
"blk.23.attn_k.bias": "9ad22178a85f3be7e25f5aff462f31627466364f2f5e92f265cc91db0da9a8a8",
"blk.23.attn_k.weight": "d813beffb10f03278f5b58eea0f9d73cdcb7b5b4045ae025c379592e854f7dfd",
"blk.23.attn_norm.weight": "f583c9836044bdb056d6f8911088ac28add68e500043ae1f97b5d9158fe3d769",
"blk.23.attn_output.weight": "02789911ac3b97f6b761e958b7dd6dc7da61a46a1be92bd0b346039ca7ecd2b2",
"blk.23.attn_q.bias": "38c4970fb9b4f7e4a139258a45639d848653814b4bc89ea9849709b13f16414b",
"blk.23.attn_q.weight": "eb694be9a5ab5858b8dab064ee4cce247dc757424e65282989bd4d015b8580ce",
"blk.23.attn_v.bias": "0a25f6533aa7e7a152a4b198cf6c411c2408a34afa4f161bb4d5ffba2f74e33f",
"blk.23.attn_v.weight": "187e1bac6b70f74e6364de226565aa8275ee2854d09cbe5895451a689596049e",
"blk.23.ffn_down.weight": "88880dd9ba7ee80ade972927f810b5d2c30a69520c615190b27f9daabc0a8c5a",
"blk.23.ffn_gate.weight": "5abec63197935ab3eb8e6de0a5307396ec46cdb1cc5de25d87c845f3c4a3e887",
"blk.23.ffn_norm.weight": "60e1f5e6310c3a531c554a6bb7cd883aed58db1e51853f739436ea461c1843d7",
"blk.23.ffn_up.weight": "3d7f502771743f4a634188dfcd8b8a384fb07467ca8528366aee59ddb25b7bce",
"blk.3.attn_k.bias": "0b6b442ebbac29c8c4b67e8e3876d0382dd2dc52efdf4ab0ebbc6f71b6252393",
"blk.3.attn_k.weight": "480f40584fbda692c26f2cee45f5923780b236f8b4e8ec7bbee0237777a0918d",
"blk.3.attn_norm.weight": "39872be2af31bc9cd6b583ebba6fb759f621d586d66e5a2fc0b85991615a8923",
"blk.3.attn_output.weight": "924b2c80d8513bf637f8ebb3756a340d9cf2243de723fd08d7f5dccd46b3f8b6",
"blk.3.attn_q.bias": "863c9d848156847a3fe9bbc44415a4395245b5d13e95673c014fdb71e494ab0a",
"blk.3.attn_q.weight": "bff73ee5de92fba8f6c089bbb19ce57e17ab3c9c29295712804bb752711b882e",
"blk.3.attn_v.bias": "e1b6fea126e86189112fcdfee79ffc66a087461527bc9c2dc52dc80f3b7de95e",
"blk.3.attn_v.weight": "7812b7f5133636f06cdbb4dcc48ef7803206538641b6c960777b37f60a8e6752",
"blk.3.ffn_down.weight": "00b393d6a7e3ad9b5224211ccdbc54a96aae151f24ed631764ac224972a6bc82",
"blk.3.ffn_gate.weight": "cfd63fa3a038af05dc53c6eeb3c192f1602f26ff24cb840bcf1510fcb37b5513",
"blk.3.ffn_norm.weight": "7389fc240a282949580ea2f5b0d7973ac79f32f76dc0155b537bb6b751f8e27a",
"blk.3.ffn_up.weight": "2a945f47090df9cb16f92f1f06c520f156f8e232182eaaed09f257b8947a2a62",
"blk.4.attn_k.bias": "62533c31f0de498187593f238c6597503fef2a92e920cd540a96bc5311b3b2a0",
"blk.4.attn_k.weight": "93e829868bffd980a8e589b9c4566cd81e6ce4296a5f357a2ae93febe1284156",
"blk.4.attn_norm.weight": "9e0aaa4bbdd1389890f8abec20533f3ab16d61b872b1a8dbd623023921c660a9",
"blk.4.attn_output.weight": "74467d6f44357d67f452ac49da861468b38e98057017bd38bc9a449f9d3538e6",
"blk.4.attn_q.bias": "8e6d9026fd69b314c1773c5946be2e11daf806ef22a5d91d744344fd30c58c59",
"blk.4.attn_q.weight": "e5bfbafd94a4d530f3769f5edbba8cc08d9b5bee8f66ebf4cb54e69bc0b7f63b",
"blk.4.attn_v.bias": "20c570f92022d9905eb85c0e41d1fdb30db22007a9628b51f512f8268d6c34a2",
"blk.4.attn_v.weight": "9638d459d61da03c9dd34dad985e03c43b4f8a5bc9701a82153478329b0517e0",
"blk.4.ffn_down.weight": "9d91b06e89d52f4365dece7eaeec50f81e52cb2407b333248a81e6e2f84c05b8",
"blk.4.ffn_gate.weight": "bf6350a79c6a6ee9146edfd788b88d4a4c2b54db1aa0adcc1464dbba8a84b646",
"blk.4.ffn_norm.weight": "11a70a6b9f7ce336292f4e3a2c6c92d366d4ee4306ad4fdb1870fde107e9cc31",
"blk.4.ffn_up.weight": "64f23f493d02b147a72a59605e6b7dd1c4c74f6813a38a2a60818bd66f697347",
"blk.5.attn_k.bias": "f6c2c279c0ed686f298ad1e5514b5cd882199341f896abbb2c2129d4c64ce9c5",
"blk.5.attn_k.weight": "0e682f75870abf9efaca10dac5f04c580f42820ecf4e234d43af967019acb86f",
"blk.5.attn_norm.weight": "01efae7653705e741932fcd79dff3be643d7e97f4b5719b887835dffe44b3a82",
"blk.5.attn_output.weight": "69e841d00d196acc489cd70bc5ffbbb63530ac5fabb169d40c4fb3a32ebb8ed8",
"blk.5.attn_q.bias": "f3304d76ccd44fed887565857c8e513b1211d89a5d3e81782de507ab3f6fc045",
"blk.5.attn_q.weight": "98612a6b7920a247853ada95c240807d4ca8e43604279e7a2fc9bb265ae40469",
"blk.5.attn_v.bias": "39940a9b353ceed3edfd4a39b985c9520490aa1b9f11749c94fdf6d879d1a259",
"blk.5.attn_v.weight": "839f84b828cf83aecf479a0dc7bc86cce05145ef77dcf29916dc3e0680f5b665",
"blk.5.ffn_down.weight": "1f48cbb0960f15e06ab8a3754ade792995a655856389ddbca629c07e89d1b114",
"blk.5.ffn_gate.weight": "33d8219fce3189e1aab376039896eebd4ad36ebd26a8278cd19b26e4357e4f81",
"blk.5.ffn_norm.weight": "0f4a0f83d37127fa4483f2905cb4f38ef6ddc71584b6cb05632c62a9af313dda",
"blk.5.ffn_up.weight": "22a64a11e5f0a1ff45ca327bf9e1efa258f085ff6a96edc398b7474f725b4514",
"blk.6.attn_k.bias": "baa91df99d4df2d25e8d590bca4e334b97f2d9aa3df8e748fedc8a6188499111",
"blk.6.attn_k.weight": "121f3b9f4b9491996499392e2688a929cafe102a67920b4cb2a039349c43d8eb",
"blk.6.attn_norm.weight": "b4cf987e923d71f2f84c58d20ea8af7576b225bf61952145b489fdd395e3d411",
"blk.6.attn_output.weight": "a112642150a138d54b2a4038042fd33619035a35694771e966f3575856c635d6",
"blk.6.attn_q.bias": "a97ea10469cdfa3fdddf8bad6de683ef99f6170eb8d29d15dcf6bf4bce37c5a3",
"blk.6.attn_q.weight": "d80c787019317a87361de6bbc7df6701357216bdd9b404522cede34a719a5500",
"blk.6.attn_v.bias": "d846269db9cd77ae28da26ba0914cace1b6754bd5301af9c44607085dfcbd2d7",
"blk.6.attn_v.weight": "06567c433e8a391647633291b50828a076ad7c2436106bb9278c60a3f8fccb3b",
"blk.6.ffn_down.weight": "f15f66f56b3c474eac8c6315c5fff07c3e29c6e483d7efd4d303c7f43814be91",
"blk.6.ffn_gate.weight": "47768f89c6da8eefb29adb766ff4eb38c9dfd79320bbc1386248319fcbcf567f",
"blk.6.ffn_norm.weight": "7f8195e6b148212967145fc9d86ce36b699cff0de026042245c2d344f1ef8510",
"blk.6.ffn_up.weight": "53d7707ae4347aadb445289f9f87a008b72df5cb855b00080a605442fdd8edf3",
"blk.7.attn_k.bias": "63e274df3217dde25b8369a383e480fe4f6b403a74385f15ac0b5db71dce2744",
"blk.7.attn_k.weight": "f6fce88602f5945eee09767acbcad387d132614e6da39ae359f2bbf380d94b1f",
"blk.7.attn_norm.weight": "bbf5dc7336c0f9a511afef6bf5efeffd78f1b83940850c3eb7eb20c621b75656",
"blk.7.attn_output.weight": "d9fb907a138396a859cecbfcb377927308dc93c24c7fb52dba5eb59265feadec",
"blk.7.attn_q.bias": "f02ba1318346af77e309f40aee716e2de7ee8cab67e67b17636db9bf40894fb0",
"blk.7.attn_q.weight": "54a691e824be287a61c35c172edc01922ed792d2addeee029afc17ba6c7e11b9",
"blk.7.attn_v.bias": "3a4f182f51e84ce862d558fb2751b91802b65d74596bb14d624808513a8a83ec",
"blk.7.attn_v.weight": "a142fe6e106d3ab484e2dc6f9c72b8fc0a385279dde08deb1ad1fd05ac25deb1",
"blk.7.ffn_down.weight": "8daf7e8c430d183a4d6ab3eb575fafa4b5e31689f68b290c8b370411ad9d0f12",
"blk.7.ffn_gate.weight": "a2a786b45eb660994254b48e2aaf22f3e9821cfb383dee0ba04cc4350a2f8e72",
"blk.7.ffn_norm.weight": "73828bbc8c9610cc139fcf03e96272648cdc291263251fe3a67367408deb69e1",
"blk.7.ffn_up.weight": "e85dd0f63fed449ce16893c5795ea6a050a2d7a66d9534410a227e22c905dafa",
"blk.8.attn_k.bias": "91a752a6e2c364e5ee6a015770fe289aece4911ae6c6bbfe74ac52f465465f93",
"blk.8.attn_k.weight": "99c069e92c43a2efb74e23188256b3cabbbe06399878e681ce203a05d5da378a",
"blk.8.attn_norm.weight": "c76d36d3cc06aa2a9edb1abf9f602bb7ed61ac9d61f8ef7ed736a1e619abe717",
"blk.8.attn_output.weight": "ee5ff156a2625e1f203f65e69b514f9df04bd9a5e82b28e3876e16cf1c6f65c5",
"blk.8.attn_q.bias": "8fbd868a93b330c8b0418b488c5301f42a7eb0c58445a4e515d56777f1d96ed5",
"blk.8.attn_q.weight": "9f20ef86e80098ba52a3a31ebcc315bea3a614dac9cba7ac1db02f156db9b577",
"blk.8.attn_v.bias": "c4813571d5d618742183a7890c0b89cd7f18e210c758f63aad564659bc38a26d",
"blk.8.attn_v.weight": "ea88e1a4cf8bd56e9a88ada427d2b0cd352234827640757ee2a9ed594fb67a53",
"blk.8.ffn_down.weight": "b0d1a7495811580b189aaa3e20ea871d6d01ed7b6c23e59825078ef786944ff2",
"blk.8.ffn_gate.weight": "0a17c0caa0b06721c49b59b2a63a5dcbf744dd1cffa55962b404ba910c658a62",
"blk.8.ffn_norm.weight": "f15f109d4a8e9d1ff7c71fa5bc6373df7ee80c5f7d1de3fa0d4849d747e36bcb",
"blk.8.ffn_up.weight": "bbf4c5c4c5c8a0f9ae8b88e3cc8b86f81b98148722d5a350995af176c0b774f2",
"blk.9.attn_k.bias": "a7f60d962686b8ca60f69643e0e0fa8614688be738fb0b1c6bd54de35c2beb5e",
"blk.9.attn_k.weight": "dd80ce4adb00e338fc04b307e4c18a27071f4ba4397184a24d765e6e4a268ef4",
"blk.9.attn_norm.weight": "721e6487547e2b3986ab4b4e2500ceade59d908bccf4436e1e8031f246deb2bd",
"blk.9.attn_output.weight": "5a800af39107b363861e5f5173483cdcd644d8ac3b0c8a443b9c759d71285db8",
"blk.9.attn_q.bias": "0a19b4925ea8ca8067acc909b058adc327de3874cfc94cc9eb4a106d3f370123",
"blk.9.attn_q.weight": "93e84906684c0c7ede79967236d9fc8344da84a9f1daa04e8295c2c9b6b26a24",
"blk.9.attn_v.bias": "615421f812f821e230ecde4e6da35d868823248355ce7e4e51e2d650ead565f9",
"blk.9.attn_v.weight": "7f4913e289aefd9ceecbdaf9767b1e95303f5d59dd67ecb2cc15768477f4d08e",
"blk.9.ffn_down.weight": "95d1b3933221e87dc4af70dd566daec9498bf358070b8d26f1fc70766a84a152",
"blk.9.ffn_gate.weight": "530f2d04f6a1fbffaaa5f2fbc3a328ebed7b330e3af14b4fc7d8a51b13ad8d42",
"blk.9.ffn_norm.weight": "28077de416217ea1df94b96017bef4cc562ab62e51b1a03a671c70abc29ce52a",
"blk.9.ffn_up.weight": "b87b6190778aaee4695938e24ac6c90dbbee6dce7c5c2ab5bc26ba4564581822"
}

344
convert/testdata/c4ai-command-r-v01.json vendored Normal file
View File

@@ -0,0 +1,344 @@
{
"general.architecture": "command-r",
"general.name": "command-r",
"command-r.attention.head_count": "64",
"command-r.attention.head_count_kv": "64",
"command-r.attention.layer_norm_epsilon": "1e-05",
"command-r.block_count": "40",
"command-r.context_length": "131072",
"command-r.embedding_length": "8192",
"command-r.feed_forward_length": "22528",
"command-r.logit_scale": "0.0625",
"command-r.rope.freq_base": "8e+06",
"command-r.rope.scaling.type": "none",
"tokenizer.ggml.add_bos_token": "true",
"tokenizer.ggml.add_eos_token": "false",
"tokenizer.ggml.bos_token_id": "5",
"tokenizer.ggml.eos_token_id": "255001",
"tokenizer.ggml.merges": "902a060cac8884a5793d2a857dd2e53a259de46c8d08c4deb243c239671e1350",
"tokenizer.ggml.model": "gpt2",
"tokenizer.ggml.padding_token_id": "0",
"tokenizer.ggml.token_type": "b7a352ccd1c99d4413bcf452c2db707b0526d0e1216616b865560fab80296462",
"tokenizer.ggml.tokens": "815ac90ff23565081522d7258f46648c8a0619eb847a9c7c31b238a9b984e4ae",
"blk.0.attn_k.weight": "6fcfdb466f9ceb1229404ce4ec4e480751b8d00da12707a11783dad7256cb864",
"blk.0.attn_norm.weight": "6063317f731371864049c7704a70772f1eb632194201ebdc2ed0f8e483507c72",
"blk.0.attn_output.weight": "920f49716a1e2fc73b6794ec777947f1c122701e63ed302422ac89e90f06e9da",
"blk.0.attn_q.weight": "ddbcd7cde197e632564ac58e4f25d9e3a8ca52917329eeb6081eb41a797932ab",
"blk.0.attn_v.weight": "318fc02a189d87420f0cbf57f47f11e00c21ec1ed472ce0a2a895b44f7fa0fca",
"blk.0.ffn_down.weight": "aa71975b6eb1f4c77b03d2ac4a194cf8d95718efac741bb12f0f3ff79a27f9bc",
"blk.0.ffn_gate.weight": "42967702fa0bc738b88dc50007ace26dbe74a5a9e0978124dd093f818241a9e1",
"blk.0.ffn_up.weight": "5282c8788b086bd30f46525e7995a17464882a72703fd27165491afdd8bfd4af",
"blk.1.attn_k.weight": "cd248882e64fd2c3402c44790ebe12440133dc671b6893fdad0564c461973adc",
"blk.1.attn_norm.weight": "ba84e1c8fd30af6ec94208db4078befac8c921aad3acb887812887f3282ea2be",
"blk.1.attn_output.weight": "2efa3ef7c5666ccceb05e339b83ad680cc0d2c3ec78203f5da5959f23a80e14f",
"blk.1.attn_q.weight": "5106f2e255358a1303c22e8b5f0ec044852bb30a866c52cabefd30017a7a6b7d",
"blk.1.attn_v.weight": "a211a634a1a5df1d5f973645438be0461dd922210f9747c6b04e386c7f1ebe95",
"blk.1.ffn_down.weight": "37093afe48d32c578ec956c9ed85242cd000d6aa979e60526aafa10c822dbb10",
"blk.1.ffn_gate.weight": "469860819e9159caefb1aad0bc66db790f3393f05fd87b08e52256a7ed256543",
"blk.1.ffn_up.weight": "736742c97d35d1a011f9cafd3c0ce947ad559bb2fba6da73c816f6bfd0fa9aeb",
"blk.2.attn_k.weight": "92c219d92804d832ab404bd6dc7339c90877bb7cf405dd030c121f8b27757739",
"blk.2.attn_norm.weight": "61e4466069474b76b6d1e702566420eb669faf3556b00ff7b824784aca13a2d6",
"blk.2.attn_output.weight": "d2fb38a2b2171fd91caf037faa585a62225819aa232d86fd4f7f9d2c3c8a45e9",
"blk.2.attn_q.weight": "f6faf5cc6844e3daa4f9f68d90f5458c64879de68a7728860e38374e30c3429d",
"blk.2.attn_v.weight": "f340ef8f7341d987a6f37c0e9afe0aef5be67be00c0ce5f57612daf73319cce1",
"blk.2.ffn_down.weight": "c7be61a701d779860b621b143fb6365b607bf99ec7c0f153b07908ac8120885a",
"blk.2.ffn_gate.weight": "b64f0878187bd3392abfa4c3e8ad2f8b4c133903e54246747ff8f3b4639ad83e",
"blk.2.ffn_up.weight": "50b11c712652e90ee7428dbb45cffebb80662ac982bc72bd9eafff361b5eb5a8",
"blk.3.attn_k.weight": "2b7bcbe9ee5c9c630c8c8d7483887e78b73581016f4cbb6933db2a147a25f431",
"blk.3.attn_norm.weight": "0181dac7f4eee7252980323e8032cf339bef2046ce0a16c0fd72af7c98a8a37b",
"blk.3.attn_output.weight": "aef8843b636ce231da9e7c9acbee197883cc15df0e2887709324c6a50f16da7b",
"blk.3.attn_q.weight": "55404130fa10e81322d33eb378aa0de31a92990ce7730f1338c0ace0406bb1b1",
"blk.3.attn_v.weight": "76f7fb8040d82b957d689ce34fea2302a6640ad5bbaa0052ad2b7ebce270c33d",
"blk.3.ffn_down.weight": "648628933eff3b357c3729c33c5b1ae51c28e59b9c19acd1601a2ff7c5d5d9a5",
"blk.3.ffn_gate.weight": "6a588885d16e98d5f50ebed05af089154f680085ca9c97691e5b489088630a4a",
"blk.3.ffn_up.weight": "e12455a1d702f4986e1a663493e3d5102b367af74d45557522002a35d63ecac2",
"blk.4.attn_k.weight": "40d943380a8a85e4eab147934bf6e16f23cc8ab753f6636526382c074d182288",
"blk.4.attn_norm.weight": "4ab2c098983d4599fe540eef624c4df954adb7473faebda7471ef0ba4134814c",
"blk.4.attn_output.weight": "d14b91e40f58bf4a3c8c2eca0b12bb541de406574af39027d56f6c588a147082",
"blk.4.attn_q.weight": "e1224960a3562107488589f883fa32414bae41712fa8dbd47c5f3e3a7801452f",
"blk.4.attn_v.weight": "063f297bc4aa6e709fc32c4c32e35af7d07d80e83cb939b76adbba858006c03d",
"blk.4.ffn_down.weight": "f88a18020c5e1caaa29596895eb348e76ee5bfad27ed57651a86cd8cd1f9b5aa",
"blk.4.ffn_gate.weight": "48e7e1eed3fb52e92e61d3557dd0ec002418327090e034ce4322fd68542266f8",
"blk.4.ffn_up.weight": "1ca8a7aa17355b6ce0d9ad5539fdad3899fa47fd359c285fbfb31f19f47bf073",
"blk.5.attn_k.weight": "2bdf15f8e73d068d972380f25d207004cf0bf3b5bfa46946803ba6fba07d9175",
"blk.5.attn_norm.weight": "60448d7cde6e1b6467aa31bdea012e39cdb08c88081cee7d102dca4f93f766ef",
"blk.5.attn_output.weight": "f9f687d7c457537f9fca8a4087a59f1c3bebfaf5537b94e42c831a13224f7799",
"blk.5.attn_q.weight": "987db7a2ad68657a92625e1980effbb1f79697c2183f2b9f3b3a0570c51b0ab9",
"blk.5.attn_v.weight": "cf696891148f3e4783ad1d20f93462ae091eb8651c656bba9b662253b6263e02",
"blk.5.ffn_down.weight": "c0662b0bd0929136005fb9d691fdd9b2c33867d9ce9622339a6a456b720b059a",
"blk.5.ffn_gate.weight": "200bbdfab615d7a3a84719b6ced7751e3ce52757ef212d96f87798bc1de5e987",
"blk.5.ffn_up.weight": "df5d23e7e035fb1b9d163da7ddfdfe38da6a37e86e96534dc02ad20f011b55b3",
"blk.6.attn_k.weight": "c0dae2d272a7c5a2fa004bbb8475dbab362fc1f6d008e73d5a4434a9382ac6ba",
"blk.6.attn_norm.weight": "51c57ac8b55e04354d5dca6bb9c0cf4177639d3b038e80209e33036209688f64",
"blk.6.attn_output.weight": "229d97892c62f85bcdf431675250e01c976ad69ffa450b01fb543bf88f14a2fb",
"blk.6.attn_q.weight": "c20e49621821bd46ed156e6823864a5bda4f317750e71ab8dc54e44eb48cf7c2",
"blk.6.attn_v.weight": "53ceb1a2ee43fce3c7b5b33c58a9fc5ee7f44dc1c6f29bc9dbefc37582102dc9",
"blk.6.ffn_down.weight": "7923c943b7629d560a032d1efa210d1d75c6692140f1be94464ee7ed24f44ed0",
"blk.6.ffn_gate.weight": "57593d350361af753a6a39f53b066282634c0fb44f396f6f2966a574b01d8f8c",
"blk.6.ffn_up.weight": "327b6a7a387098b8899d3ded04a4d4e7c658ca61b80d4e7b17594be232721602",
"blk.7.attn_k.weight": "9ca48b87a10116fd8868e62b76f211d4bb91f166096be9061439ee2e1c3a5c20",
"blk.7.attn_norm.weight": "cd56cfcc4e2ad6b96e23ea7b0d32b4caf236107d99a0b22c56760b62e63c8cfd",
"blk.7.attn_output.weight": "7352b509a03cae2491ffc060e577d189341a0f861233f18c96f9d275dc4234bf",
"blk.7.attn_q.weight": "2b3791c8c008c33ddbe12bedba8191322ceea2dcce5cf0eb7a93d40ad254e672",
"blk.7.attn_v.weight": "3ae721d52466487a3d48150581e57f6d64ea1e83ab929f23b28c3d777422eeb6",
"blk.7.ffn_down.weight": "3b6fa8ececdb3c34af3a5363863d6f94289c1c95bf47fce3a3ddcf184c5f0848",
"blk.7.ffn_gate.weight": "dbd7df6c5ae5eb4adb859f0d36453813a4e289a359a1ba8f72d67fcbf21c3e22",
"blk.7.ffn_up.weight": "de68380a334b4c5cfd4c318b0e9854aec59bd79aa0f0c30af3f56414f83482b0",
"blk.8.attn_k.weight": "7303c4e4480abc72a7ee271811311199245fb5c2ea27a2bd3b8cad3a53a03c27",
"blk.8.attn_norm.weight": "2e3d1921898d1b943ce1a1b6818546c8b471d6d542da24f51a8b514b8c3dd4ef",
"blk.8.attn_output.weight": "30421520887b66bf97a18dbcdc283bc8d0b60590b612fd638a319a6eae923227",
"blk.8.attn_q.weight": "73e064d5433c9b500068a1c31744dbd53f4ade298fb450a0e8c97f62cf1f8a8d",
"blk.8.attn_v.weight": "27e21f8b9a9a8533e8178ca34a72aa1d786393d57302b7806dcdf3e51de511a8",
"blk.8.ffn_down.weight": "bf694bd8e00047982108000e7b3dee7b225db8b19abc595e5697b6bbefd92e7c",
"blk.8.ffn_gate.weight": "d55fdbf8606d9141b774b0500c58944fd1253b9e69d1f765eaa9a680b9f2ca40",
"blk.8.ffn_up.weight": "1ae3f580655e7c8e8dd6c34fa4ac574fdfc5e3f1a8536da0c5442d3a2976f0e7",
"blk.9.attn_k.weight": "b18080626012d8aabcf78542d6c7bf31c712bf55a70172fbfe173fcf34481036",
"blk.9.attn_norm.weight": "2e3620620dc09998c6d3063a7d5de5433fbbae8c11e5b00d13f145d39140e162",
"blk.9.attn_output.weight": "69c3c0e27ef1c0fc933eeb7b612b70909f18cde238873c0d576a2ba9714ef174",
"blk.9.attn_q.weight": "68330e5aa28a28873c9a6e67f032186ef651df2df5844e0f27094ba349fbe4ab",
"blk.9.attn_v.weight": "3df8d45a102be082d0793a51cb82aa62a43cd0e9d047ba4115ca0f2414b39325",
"blk.9.ffn_down.weight": "1d6cc162b73745b135b4f040a0aac3c06d5135a3dc5b2421e7ee2af48662fd7f",
"blk.9.ffn_gate.weight": "034a9d40fb1e32b534b45f4bccd65cbe43c4a6a3f5d01132bd245ca0005de5fc",
"blk.9.ffn_up.weight": "c838c38d0e1a0ac0da17eb2a66023ed31929f07d8fcfe1cc546df26096c91f0c",
"blk.10.attn_k.weight": "a78507cb72f744b86ceaa032596e74e5571c822d0226d334881169addb32cbd5",
"blk.10.attn_norm.weight": "35f48d0b28ee0e6b4cad4e983925737562d64824be5b168b3e26df3d6b260cf1",
"blk.10.attn_output.weight": "53712db06796de39b131323e7abf9a58551b6d52da6db66a471580386d396252",
"blk.10.attn_q.weight": "efe08429ba196026b81cd1c471e1c7418afd9e966659feb3936b674aa0803b58",
"blk.10.attn_v.weight": "7ec6055e134f89da0cbe79ec9f13ef2e442ac584b1f03c3e13e7d0cdad0078bd",
"blk.10.ffn_down.weight": "37e66af4bcd1f3079e841e892255b8255070655901864ea3a8c602a7f681a640",
"blk.10.ffn_gate.weight": "1825282bc34830d371c6edcc3c1e73e6ecc1e10f4aea0122dbb7acc1d6f7b1bc",
"blk.10.ffn_up.weight": "819b3b276a4d4c14a35ed6682d5ef18a5e8ed468e5ce3f12e8c75ec18ac20ec4",
"blk.11.attn_k.weight": "5327e6a2af82dfff0619a14971f5864a15553c36fead84e1af42c7630f2729c6",
"blk.11.attn_norm.weight": "fec363b3c4a43036d2c635fb8aa9e122dd87ee79811839f2f6cd955be3373e7b",
"blk.11.attn_output.weight": "ccf7b38f18ee8798b8a6a35018e2df3eb3e007de62876befb68025dd66c79763",
"blk.11.attn_q.weight": "da8c4a1c824ffe174e39f126cd72f7ef83c56aff1259d452a1212de80f98f5e9",
"blk.11.attn_v.weight": "d17ae6bb77f03982b55d341eb67acb5969e9ad3da5994b96eafc09793dcfe3a0",
"blk.11.ffn_down.weight": "a6bac521e2791345f22c57205fa1c2f2f687794dfd24d0e98d50ae0d0eb6088a",
"blk.11.ffn_gate.weight": "5ed902c488cb51ba5635f3df08258c5f84f31a679a00211ea5f9d8b824ef6d9d",
"blk.11.ffn_up.weight": "ee9f1437eb890d2cf9df2574afa1cecf20aafdd847cd75b152d7eb74419afd34",
"blk.12.attn_k.weight": "5a069c06e1019b0f889088e67458f7a11ec77fa190ada6069e46211f62219947",
"blk.12.attn_norm.weight": "194d7e5fcc8c49aea62daf1940532419cf3c505afdce6be377286b677db5db8f",
"blk.12.attn_output.weight": "6534995fd4d6fecb55e317add4b1723aba4d825e1e9471d0b08813dfdc247176",
"blk.12.attn_q.weight": "4ab51ca519b5995581fa34f846276feca3b907ef2b51f192f6cc0b3263c3f5a2",
"blk.12.attn_v.weight": "5652ca3fa81ef9a1ac1543d71fc6813f8517f8ec54b25c701f6f98061614830f",
"blk.12.ffn_down.weight": "4b2c263f54c88516b8eb273bb8d9615b01c5c8b484dc70358adb91b50b300edd",
"blk.12.ffn_gate.weight": "8f50c3c3e3e8568991d6c1b0e74b500cf4f208e7700bbb8e87c3f6a6d359b6b5",
"blk.12.ffn_up.weight": "1c1a581fec1fbe959e1427fa513f400100b5e1ee9d83932630be9905fb49c231",
"blk.13.attn_k.weight": "efd7a38c46f08d8376d82974f33c644e3a02220e142d63b1704718699a8a884c",
"blk.13.attn_norm.weight": "d28fa4f1bd75abbd063b0e622e08f579c89cd0c0c5ce63c1952ec9f944f8ee13",
"blk.13.attn_output.weight": "71e0068a639288718bdb70a6cfdefd50bc8b3ec3993347a65129e70001ca5827",
"blk.13.attn_q.weight": "b97077adc92cff07a2e07d80ee38f214ad8713571c69cd5c70ebd43dc501ac87",
"blk.13.attn_v.weight": "79b3e2749ab4b459c81e96e322b215f1e8af645eb346e176c326bd00cf6ed2fd",
"blk.13.ffn_down.weight": "9f8687d11effa1db7cfecf7bec5631734bcf2962aad74a9f519144491e08ec85",
"blk.13.ffn_gate.weight": "7d14dfa0543852e7777fe8fff29ca533744cbcf1ebcf10067e5adfc4eb345e65",
"blk.13.ffn_up.weight": "852b9527b97fdab211ff3f832a660ee1d93ccb56906144c50f01319a6e8ee615",
"blk.14.attn_k.weight": "79e926b20f36f66d58226cb358881f2f68ae7b468787d33cafae5110287a14a0",
"blk.14.attn_norm.weight": "97d481b63deb0df6142c2c6cd23043720c62eb609e390f47a7113751c79974ec",
"blk.14.attn_output.weight": "aa6e94d7176d5c79fbb89b96e5f13ce75702ce3dd23ee52986446da436a6c3d6",
"blk.14.attn_q.weight": "214becb6d1bb460da9fb8ace0f99b9a5afa9edf7aa7acc19606c7401b11d6305",
"blk.14.attn_v.weight": "488b0e6d7f1a7a2ed0972aaa6d10ef9c775ee5373460324efcf5b3e3da9311df",
"blk.14.ffn_down.weight": "29c7ad16cf9542e30996a1a01ab95b844533b28051f04cc7949c371afb796471",
"blk.14.ffn_gate.weight": "b7ef208f2b054803665b377f5a5980c122c026841809cf855c6ba06d1c3a885a",
"blk.14.ffn_up.weight": "76a5cc28100748d79c4398ce7b9176aab4d661548b6293a82f99144812e5b70e",
"blk.15.attn_k.weight": "a6b8f9e98ab878fa7ebc5d080978ebf2d050acc2ab2fa8ea9188eb10e27702c8",
"blk.15.attn_norm.weight": "a26d07a9752d6dccb68e3a8a2a49fd0752cdd0a415e05547819bc37d9ba63d5e",
"blk.15.attn_output.weight": "c63616c69048ccbee801e05be4f56d21fda21aa0cc470f41d57c31b4d9283a4d",
"blk.15.attn_q.weight": "fd595a67bf96c6ba16eb148a9d02fa52fa3c1d33ed10be28a08f851409fd6e64",
"blk.15.attn_v.weight": "1c5c9d33fa07c05d5f4ed0032c6c4aa83d863f0d31c94a66109d239dcd03cea3",
"blk.15.ffn_down.weight": "585ea62ab8aff7d7d212ea5c1a03226fda6b68370c890b776834af70c948dcbc",
"blk.15.ffn_gate.weight": "a13c63f86f879b03a573d5dd2a25cfd1f4dc73e8132e6454ecc23e538b4cdf6f",
"blk.15.ffn_up.weight": "f7112450f57c12fcd511f049e0dc0b541625a107a7901c3261ed9e984299f65c",
"blk.16.attn_k.weight": "2d2c8b11dd71fba6d1c106aa1673c113a5448653cca7eab897c8739212ed5003",
"blk.16.attn_norm.weight": "95c2ec7be9469690e18a9a1779684acb3e9da44b13e263a0da840305646fbf8a",
"blk.16.attn_output.weight": "31a65046e677f54dae654ded4e733479fcc0f7283d83076b7dc7cbcae8528230",
"blk.16.attn_q.weight": "bfc6292b9c6d49b7118d08060242a138182eb182d136ba5dfaf469437c16081d",
"blk.16.attn_v.weight": "68f81d037340217d87c7853ff4d6edfbc46d9e827ee6d5bff7c3f6238e3a95ad",
"blk.16.ffn_down.weight": "bbd6629691950cef4d5113e1c6670e91b216a9b872cb92cee02dfda4d6c4f7b8",
"blk.16.ffn_gate.weight": "63cb56f282b7401ed6c76e5bb6fdf1bf68a64f9af0c82c014209b55bcb5191d0",
"blk.16.ffn_up.weight": "b54f39a2541063cbfb6f713aa81c3b69a04100e999aa2ebbeec195dc382eceec",
"blk.17.attn_k.weight": "3d9ba49799cc56664ec30a002bcad61eb651294212a68c3ddb573eb042aef5a4",
"blk.17.attn_norm.weight": "42ee0db4b9d63257bca0012a30b12737ead1caafeb5ed3d93c8f48ffec4b46de",
"blk.17.attn_output.weight": "a38fd100f05c9041c592bc739e287de0b10d08ef2bda41a879225bdca9002f71",
"blk.17.attn_q.weight": "8a3bee285b0180a9eb35662e449ee4cbe16d992bdd48fb3a94bc4a347728cfa2",
"blk.17.attn_v.weight": "d7f8f1b8b863494ed4392a1656775912e9b264ad36016547b12e832a1d6757d6",
"blk.17.ffn_down.weight": "bb7ee58f61da8630972e25b621996fbe8ec06f4dc9ab1e268ab5b120c526ca28",
"blk.17.ffn_gate.weight": "6b652dbf167fee09a45ebfd78d500ff6548fb2756dbe5343ffec3f7e6207179f",
"blk.17.ffn_up.weight": "3b67f727e55e742715de978fab80457781e7a3762bc48f79d13b45dcb8de664c",
"blk.18.attn_k.weight": "ff7fe57c57b90c6fcc0aefc39ec24593c3a7d1ea1c23770480075a015450e0f5",
"blk.18.attn_norm.weight": "1d40faca082d2633ef0ccf19e121870dd6c7c3e2154607c7f3543fa96e99cb2d",
"blk.18.attn_output.weight": "9adfecaaa397a92db4687efd5fcabfa0daef9e6b0493763b7ff5ebc185c43a6c",
"blk.18.attn_q.weight": "ad1803eb9b291948639277afe981e666b07167eb3fcae903ba5b73bf86d8f50b",
"blk.18.attn_v.weight": "308cf23399adccf27401a4ab60d74dac6fb9d4cd4b9c5940d9145118d1881b34",
"blk.18.ffn_down.weight": "7de4ac9a561fb580619b745687dfd7ca8a69ef70471dee978741b80e9ff7bead",
"blk.18.ffn_gate.weight": "0c66970f696b33bd5ee8f1f2fbcb41fd78fa5ccabdc927e11a4d5a4089f19c69",
"blk.18.ffn_up.weight": "66a42e988e8a1f468fabf976c48e9e4bb045eaac6916ef16555ac101cd674abc",
"blk.19.attn_k.weight": "a928ab50390bacbcebe2e4b66922498134ce22d7b93beaa87d6cf4ab52eb7174",
"blk.19.attn_norm.weight": "b4a02c55b46c2a96aec9c64a254087cf48e6c1d4b6f31782c77a46fc4daebad1",
"blk.19.attn_output.weight": "b768319c641dff1eac5d1f8ceb960c9899c795bf2b24c1d6bf70aa24fda45f77",
"blk.19.attn_q.weight": "79ef3f57d187d3954a26362096e1b6c222d76f537dff73e034d6e9999935b8bc",
"blk.19.attn_v.weight": "ce13d6b13e24fcb2d5bc6a2662e5bd295b31b12db10a6d0307f86cf29b8d5001",
"blk.19.ffn_down.weight": "cf90d7e2137482cfd50934a8223ad774621d08554969da80a9712df5e6227eb0",
"blk.19.ffn_gate.weight": "71ce30150f003b6eeb3bf7464e05b6ae615f135110d8e47f0a47fd973e537c0f",
"blk.19.ffn_up.weight": "7f92aca0cc29866633feec701ec01a85a8ee2fd4e2b9630173a6cffb1d9d50ee",
"blk.20.attn_k.weight": "a2df23159d6fb74ef28e14b61028fe8b00a693a2fc9234a980be74f20b958682",
"blk.20.attn_norm.weight": "c6cd5f1b096fc5efa4eb59ca1c8c4bd28730f3dcedd59a63601663eccc6724ed",
"blk.20.attn_output.weight": "896a8a166d0f006d4b09867ae4345426303cbc3fb13a18d3d4e1bde00f16dbdf",
"blk.20.attn_q.weight": "01eb79588fe61baea0da43e99f4dc5939590e1bafd01e12dadb8326f102bfea2",
"blk.20.attn_v.weight": "bd39630fdd5a7c859ac1addaf53e63faf524c3f32f5f4896d86b6e746b1d5c06",
"blk.20.ffn_down.weight": "0304a5d39957a0e3f031c4bcc4549a135d396c8d97c8d276fd1c823ce86560c2",
"blk.20.ffn_gate.weight": "117b79d595b1dca0c8b37586beaecc4d84411507276212dc286cde7fc36c9bef",
"blk.20.ffn_up.weight": "6e799346db145c125f01783539749d3828fcc451cd4f10c5352f047a47e28714",
"blk.21.attn_k.weight": "1c37e4c0664147e775bb006b226b9553e3421140cd96288ea755f81731ab80ba",
"blk.21.attn_norm.weight": "00ae783a29000ccda5e4bdbff03df0752fb82805dc3f9b987500ebd80714476e",
"blk.21.attn_output.weight": "7588b84f9fb19f15095b5265c60b4a4e7ae74bcc47d4607dfa5d0bfab6f136cb",
"blk.21.attn_q.weight": "a65f1c0dd06d45bb97532d3e932689c1eecfe7359089b39174a96a149335cbc1",
"blk.21.attn_v.weight": "4220b77e7d5e8709b4eef33a679b5dad11f297085ef44c9977f9e54ef08f7a2d",
"blk.21.ffn_down.weight": "b8c082a0530d4b5328e67db0df84c5498f2af956de23c639fa0198ffea853950",
"blk.21.ffn_gate.weight": "cd1b656ee72d00e9835ef667c19ef89a88de261eb8eb7c0e936e0f9ddf83ef9f",
"blk.21.ffn_up.weight": "dc445f73e36ec7a3bd86884186b728f8e0187f32848c3b8b69d4d41f8571bf31",
"blk.22.attn_k.weight": "e37cf0b893ec8b9ee8c78dd139b8d9c45cb997a3bc0c3d93a70ca1c3f6af8859",
"blk.22.attn_norm.weight": "248a27838d3c46cc03a5c312facc84e2e0e2c990ef8401e93da25918497f88d1",
"blk.22.attn_output.weight": "fc191a18f6d18332c66761f7ab28008bfe295dd1f5c8741a2488442f9e00d0f5",
"blk.22.attn_q.weight": "4b193a2ab8bc2b085db18f2bf3eeba26e02b537b2cdd738160c8f14b165d0f5a",
"blk.22.attn_v.weight": "7a60ce5ccac7e045e55ba1e1e85bd2a0f93f8c781daee96c5223665e22f0c666",
"blk.22.ffn_down.weight": "e0a34fb4244e2c7168f3dbaa1904c15d339ec39999cdf27128bbaf619ee0a237",
"blk.22.ffn_gate.weight": "8bac872d4b8549c8812f927efa309f1792b524f33601095fff61b826de5a5615",
"blk.22.ffn_up.weight": "b67fa2b94dd901b6ec64c0853ce8ca2d86fe9cb1cc6d2f15fbbbe0e691c0c648",
"blk.23.attn_k.weight": "2c32e66ad01942b819ac09a197c71579fe66f02226a264fdd72ad1e02c67a27e",
"blk.23.attn_norm.weight": "825fdc94deb439cb93c713eeb077c1052b90ed658d6d464fc4ad3d611e911d48",
"blk.23.attn_output.weight": "95ca6707a95b8750b0c7c5d379d368f0f2e7ebef631954e7d4d8ec0f41f13a3a",
"blk.23.attn_q.weight": "6eccc84faca5fac015d1b26e2854501edcfd292a302228fe14cf99f5eb59a34b",
"blk.23.attn_v.weight": "b343ac3d226040f1033ee049668aa1d89b1774bc18431965682e5dbdce78ccdc",
"blk.23.ffn_down.weight": "9fc599befea8d3b1e342d564a110074f66d2542df406c4b90b6bdc5828fbb2b2",
"blk.23.ffn_gate.weight": "488556c1b0c9f0b20b0c99b4bac2e0f4046b81edb601d7b91e7e5b3bab47d667",
"blk.23.ffn_up.weight": "1088e291d7008dd9c7c2dd6830af686a8a84b724d123a016209bd5156d6898f1",
"blk.24.attn_k.weight": "a923fbe35e61e009a53927d7828818e0592bb737d6a1106c4b0b5a1efc367e07",
"blk.24.attn_norm.weight": "9b51aaaa939cefafdd9b13a7e5b74ac7fa2d603427e55a16a909d6f3f353750a",
"blk.24.attn_output.weight": "1beb2baba56f8409466434b037771248c2f620ec5f53e15f44c271d5a2d9ecf4",
"blk.24.attn_q.weight": "4b0194fe5bfae0c6bf6131dcf8cb6e2b994f6ea10b27cb03574f0f4f8cc0c950",
"blk.24.attn_v.weight": "6ac34b1ab0f66226d85bca1194a7c212cd93d384ecbc8b8395de48aec0970a61",
"blk.24.ffn_down.weight": "5508f74cb732a662c2936b32ac5e90742d172b9f961a747b0e5cba0e5906a89d",
"blk.24.ffn_gate.weight": "095e39b8584403835f9bb1ac33e0e81f54175575e4800273d281b845bff381e7",
"blk.24.ffn_up.weight": "2d43ec21637dda12973de367b0113ee9840b0d815bf6fce042f7c3f270b0b530",
"blk.25.attn_k.weight": "9e2aee029f3d2c7f67dfc7926e72c8228fb978382c8e5a4701bbf82c93801419",
"blk.25.attn_norm.weight": "220cd7164fb4cdbe22d26058e4153b26c27c7b5ce2bec8e95bf2c0ea08d23103",
"blk.25.attn_output.weight": "a17f4a5dc6aa51f03dbd75602d98e9491767c205cdc2c3a5f8667fc54bbf7c64",
"blk.25.attn_q.weight": "f60827496835c440c794bf57ce9780704d10a59d8229886bf75ebb18900ba4ef",
"blk.25.attn_v.weight": "9cac217e9e9f4f4c85f14ee51165a77c580165bd4a34b202389169bbe61a1ced",
"blk.25.ffn_down.weight": "a0f36949b663e80849581dfb71e7babcc73580793bbcb0c80ab26d5a6e000359",
"blk.25.ffn_gate.weight": "df4d1be4d50d6afe5ad3ef0d0e0fac76a33e85c963dea769641d612dd53e7d13",
"blk.25.ffn_up.weight": "992da76be762632e25ebc5ef4d03728eece1b43f7c4e31827df19ca724aea694",
"blk.26.attn_k.weight": "34199ff856ac32a500c754539d070258574192a34ecba87a182897cb59fdff52",
"blk.26.attn_norm.weight": "a8e9dfb2dae5d22b5c0aec5f3675991c0e3c3e6a44153db2579136b73f456e00",
"blk.26.attn_output.weight": "1c4f257ffb0d7db0f11cfb275e38b4af736917b43ad82de1badce3f1d227da4d",
"blk.26.attn_q.weight": "33d55786274c2e718cf61e8fbecf3dfa5ee0c208f0b716d42b061f55459acb3c",
"blk.26.attn_v.weight": "684b636939cd4ffcfec5a6238a0790ffa43d853c95783af9b9e8275e74071a7a",
"blk.26.ffn_down.weight": "89d0bf066db154e6d312b5433aed1714f6a28b40f4c52e3e1530ee07703303c8",
"blk.26.ffn_gate.weight": "393d649bebe5e2940e1b043649f6c860b4b8b9f380f30e9da1744a830f358156",
"blk.26.ffn_up.weight": "179edc85ababd9d8440cc6093eecd1004290aa1cb96434b26ecf7585b6cca17b",
"blk.27.attn_k.weight": "334841445a7f1e14731b08f56eb0b1f0938c63823d28bc6d078c4c5f05b36f19",
"blk.27.attn_norm.weight": "57344471bbda2e9deffdfdb2dd05a07aa47f8761e24de53525588639145bf551",
"blk.27.attn_output.weight": "506126af9ee54b535d49f97e36f630e74834f480329f098d6d62e96246d8d65a",
"blk.27.attn_q.weight": "dd984df1acb4783849e25ba7ae378bfd385cd9efc540fb798cd5bdd873f0118f",
"blk.27.attn_v.weight": "b4b3fe9a4455d34c297ff20a2f537b647cef424741d840a747b265f23d320ac0",
"blk.27.ffn_down.weight": "621fdb185ba0d35ba5476dae73d2c81ec1482a0e878d5bfd5c3b29fe837af013",
"blk.27.ffn_gate.weight": "e4fbab45f2ec506fa374103251a0bdb7baa6f576080bdd796f3e9db92098e08f",
"blk.27.ffn_up.weight": "a0c57e463e988002bbd6a6c6792baa21a65e6f89ae303a2c301951b0ae6e4bbe",
"blk.28.attn_k.weight": "bac36cbd52ec5056841663865e1291ddab4b47ef9a2544dd285d4503bfb0e4a0",
"blk.28.attn_norm.weight": "5774a9df2bbb2e86d1f70179c7b92d81e1f401160148b3328fb64db6646a5425",
"blk.28.attn_output.weight": "e8712622d1569557000c75f26c3f55fad267fd300463c2c2cfe3afbfa1c8f908",
"blk.28.attn_q.weight": "11677751fddee52cc739699c02836f7be54d96038be4240be5d4f53d00161608",
"blk.28.attn_v.weight": "e5ee459b8958d65e1445997b9aa1e90e2f5d17761ebcf5357313119a45322507",
"blk.28.ffn_down.weight": "3934518f9f85292da8475fe38a8edcbfc4e24ac56c351b472d6351f98750871e",
"blk.28.ffn_gate.weight": "6ba735d57e98d0847e487f25ffaa25256deaa8abec76f428cb70bd9774279d83",
"blk.28.ffn_up.weight": "977fae6e1e5353114fc645dd98429464749758765cbc6e6457593d596e57850c",
"blk.29.attn_k.weight": "8122a457307d580ad6f1e0acea09a2f593d97f595ba0d6737f5fea16d2433642",
"blk.29.attn_norm.weight": "d626f721e05aa1202439b01027031d4caf1adace61ed37870a277cb6297c77cc",
"blk.29.attn_output.weight": "7fb7122ab1b6b1e6615ca746897da27bc52c92cb70d3147183cdde61795b72b3",
"blk.29.attn_q.weight": "be43e94ff6b6e391024dc824101efa0ddf4005d5b002ac26cb03765c0c73c2fa",
"blk.29.attn_v.weight": "af93c85ebff908f74f9935b81bde0516ca487c84139868a1ce079c3ae20036b1",
"blk.29.ffn_down.weight": "39dae12340ed3120bd19c495fe0872b559613641e41fde69d02d8631900b84c0",
"blk.29.ffn_gate.weight": "36fd482439840ef197c9f3b8905d86acfcea49bcf018544106ca465d4bf8d5c7",
"blk.29.ffn_up.weight": "5243fbdfdc1e2a1dd84b6210a9869d18a014db9088897e345240cdc99990bd5d",
"blk.30.attn_k.weight": "948f263616bd3788b2b968baafd69b9c5bd1b77578665f096c4b7e247b4cea42",
"blk.30.attn_norm.weight": "e168df981e744874ff303faf2eb470e5f6868c2040ba5f383f6c5148669975e7",
"blk.30.attn_output.weight": "4cf0ccca04b792573b756655a24fc89cfb1f272da8305633f0bc66ef14990b93",
"blk.30.attn_q.weight": "21e07d6cba6c50d65350289258209717174a13c42be57e8141d69712cbaf32c1",
"blk.30.attn_v.weight": "65a8ca29c7237b3182ccf03e2fc94e84f9a53d0e160fb679ab401c853170dd9c",
"blk.30.ffn_down.weight": "8b00500a6d00d84058f6658ee1d6f06fb4fcae2f90d4341792259362923b3c13",
"blk.30.ffn_gate.weight": "5bc0e19ab7a31b50ac2118ad1b36e31055271a322cd8ff661d47c3ac0210703c",
"blk.30.ffn_up.weight": "f37a0561955725bd59ee2d064fa9f4e00a12a1b620b624db3bc3add5330bc321",
"blk.31.attn_k.weight": "9a5663edda227f5d87533897146764f8e8a7481b9e71fae197c39204f8463221",
"blk.31.attn_norm.weight": "060a4f438a1ee5e220b5b5278ad2f5c085a428bf38c515766781815597c87529",
"blk.31.attn_output.weight": "6ada5d3cad9dea4780ffbb43302bb6ccc2f24eddd0fc4f5f84c9ce0fc0c6e5dd",
"blk.31.attn_q.weight": "bb5d08c08603907981ad388d5d8b70fcc9b98034ba264b8474c8890cc0297af0",
"blk.31.attn_v.weight": "e01b4252ea9c6a889c32b21144b441a347464d04536ef4f6572425be55759796",
"blk.31.ffn_down.weight": "8ba4d679c36e93ba65ba03180385ef35ea86b3b7cdf2fded9df59369f1c09630",
"blk.31.ffn_gate.weight": "e5b41dc93645f8b5e8eebae3ada3ea43a18f97ce2654228655170b07b463ccb0",
"blk.31.ffn_up.weight": "25b88cdddc8b547af294ed107d3d1312e90b983cae87936fa6062ecd8ea02539",
"blk.32.attn_k.weight": "4bcf86dc0858c8ca2fbdf6aa76674d43eb698f78979fdc1a38f556a7af1facc4",
"blk.32.attn_norm.weight": "cdcc12f3b8b9773c6722736bfb748a2729230b21478cbcc4104859d3148df815",
"blk.32.attn_output.weight": "d43f1196822995ed89a9365c97054753a8b30ce20b6e273c8edcc42673a1e141",
"blk.32.attn_q.weight": "ebf2972bb3865cbc5be4840113a322089752038344beab2a0122c7cb4fb399b6",
"blk.32.attn_v.weight": "714db81704ff34fa137512903c1013acee7877467473e46600728b9240582eb7",
"blk.32.ffn_down.weight": "2cde3da1258bb170a79d5d3cdfe10c86a71eb34b77da46b74c5ed71e7f4fe274",
"blk.32.ffn_gate.weight": "c7e1ed792532613ff9d4e5834b6536e2e0f47df2303bc0fdaa90aac0c1f4e8db",
"blk.32.ffn_up.weight": "d8d6f13fe66a716e28f79101a29817f0c0d6f99969a6f017d51bafd1a16c600c",
"blk.33.attn_k.weight": "a0a28f6cbca88da00cab2ca37094d9b0503bf9defdae77b91895b911c408cbb6",
"blk.33.attn_norm.weight": "0251200c24cc8445607ace6dc8c5aa0566567997262b7cca53a11ac23cc564b2",
"blk.33.attn_output.weight": "b2423205bdf6a1096d43c44d8d12f1a84fcd4e1bb70fcf6dc8542b8b8a71a13c",
"blk.33.attn_q.weight": "00b425c3ef71065ce5e0234e702bf38143b4952da78a85f52ab2c2e3073d97ab",
"blk.33.attn_v.weight": "035edd2335df816c42c765a5e66b9d9b9e15a822a8dc1863508145499c942c14",
"blk.33.ffn_down.weight": "4894a923a3db75bae4496ba3ce5f28796ad31fe33996a066271fb8654964310e",
"blk.33.ffn_gate.weight": "8f6c819b8bbfbe3357fae89e1ac5a3d58be85b3b04be3bacf7b62775869046ff",
"blk.33.ffn_up.weight": "257c3544b5b544fd5d839665bf5caf107a329b59dbc3751efcaa24ae63c56179",
"blk.34.attn_k.weight": "b6cd8bba892e38dac4a2ebc3ba1bce49e71b967fc436fde30c6d76f54a18935f",
"blk.34.attn_norm.weight": "2b3c8e60a064cba9955752bbbbdd92c71ba5c2f1bd721097bdbe88b5abc68787",
"blk.34.attn_output.weight": "8cc272551c9aaca9db5a660c6927bab94a0243d74a30b2bc165f06bd577714ea",
"blk.34.attn_q.weight": "74b561eb4792484e6a94b58fe2583848c3ae28ff2f1bf3d02939a0cfdfa49990",
"blk.34.attn_v.weight": "dba19e24ff05154dc5a1f55c023729303a583d13d68732ce22ea74d4410dc8f0",
"blk.34.ffn_down.weight": "76eca5dfeb274c35774e0bf9f22ee420ed9085c8e99aa2cd5a236e4918b44c61",
"blk.34.ffn_gate.weight": "9af0862d5fcbc24732846488e653db8242a467765c0cdbc00332b3a40256b4a6",
"blk.34.ffn_up.weight": "2a03126bf73587eaba99ece2066103d12e47bcd4ce30ff6c17b2f383b81d40df",
"blk.35.attn_k.weight": "52513fc0cd4e997a842729af7d21dd09399bce0a339558374738be266d0fa2f0",
"blk.35.attn_norm.weight": "e5281fa911964263ccf1630b14762edbd41d0b9472d6ec695fc600fed4892c35",
"blk.35.attn_output.weight": "b391d6705d5dc6f48326b5fd16573f679edf64109d86fb729a498819676590ca",
"blk.35.attn_q.weight": "d16446921966db9b0e0539626ad22a2511ace780e59379d6a4162d8c5441440b",
"blk.35.attn_v.weight": "9d8cdf23ffdb0c5c74106843390b94b24c9f33ef0eb9998d39f78c73390101ea",
"blk.35.ffn_down.weight": "938eb6301f7bbf162d7dd965682a5ed11d0a4a530c6fedd7e5469ce80012fc17",
"blk.35.ffn_gate.weight": "5ad84f5a0c8edcfea1ecf1a3e3d21d85ceda0c4ad9e3c6ca68885eeff8ed3c2f",
"blk.35.ffn_up.weight": "1c4330d9dc71bf4c98812c34356c51f520f47610a534152aa6d29284b758090d",
"blk.36.attn_k.weight": "ef720655e5ca2465f13db2dfc4732fb4ef2c9d53acde52f514fd4f301e974081",
"blk.36.attn_norm.weight": "88f4b9310b3c8c2644e3029160cd35678c79dfa59280430e03f5c29a6fe84a58",
"blk.36.attn_output.weight": "aec6f915fffd7bb72cd783273e871b4f09605950089d45e72059d1316b6c4b01",
"blk.36.attn_q.weight": "72f9408a2405d42f8db6ce5fcf1d26a3660b6f225fc60e77d0277109cfcb82ed",
"blk.36.attn_v.weight": "0f3b3d851dc44b3893ef53f6cca5b4acc9658bacfe1cc2d13c3d704ddd409b67",
"blk.36.ffn_down.weight": "470aec48ce8c5129a6654d9fd26fcae72776f9fc1429a8bb05818072a876475d",
"blk.36.ffn_gate.weight": "7f5f296d09cf55679767b5d15de3eff489c456782119f25204be4b1647f18dcf",
"blk.36.ffn_up.weight": "b7ef74a1f7ffb4982711d93f1787be3a70edc3d2358d5203c41d8900508037d4",
"blk.37.attn_k.weight": "c4ffa5412e4ff2dcfe1aed991c1f54169fd171a4c7638e4b9f21a1ca64c5e1d6",
"blk.37.attn_norm.weight": "4eb6c888d841cccfacf5b963f8611120f6ff24b84af0b5714fd9ab36dcda422f",
"blk.37.attn_output.weight": "db2a7bbf9682f9f6eea672dae8e150738f1bf74dbc80edc7022017a3f040c8ac",
"blk.37.attn_q.weight": "e38c0462aff139afcbab289189823527e453abc9e541154adde5e7af88cacf0b",
"blk.37.attn_v.weight": "952eb2492ed452a72f96bcc12d4b2affad9dfdf46ee39ce4a5d7b57a5dc301e5",
"blk.37.ffn_down.weight": "25f23a8fbc44febf6dc4848fd7fe03a580e2822bd3b3b5a51f4990826bfe3e4e",
"blk.37.ffn_gate.weight": "707da5eb40118b035305d3262444382351f170a20a537386a70e90c5a83a7817",
"blk.37.ffn_up.weight": "d2d2ba5cfc4ef47338dd7384219e22bf030a5a2209e0354d88f5bbaaafd20e87",
"blk.38.attn_k.weight": "abc4bb189dedf7ce661e79028427623a4f91ac091c2cd60e31b58bc62b1cda71",
"blk.38.attn_norm.weight": "9f4803a7d03fd40fcb83d85f84eb1d5682ea4e5bb084f210c02850675d804c3d",
"blk.38.attn_output.weight": "77cb66007f1a41df7135d0e7f900ceb499c2f667dfc3f1a6ac01a3203bbd3ccf",
"blk.38.attn_q.weight": "d94a8b26cd375bf2bcaa76597e314aa8268ee50a479d00931e5e0e021feadb5d",
"blk.38.attn_v.weight": "660c907888bc5016dc69b7d35fe6f55c7ded697c93be0e2d332a2f17aff88758",
"blk.38.ffn_down.weight": "6f06173bae5b00ffaf88ef383619a8b9c6a8d0d5c6494695d17f6c1de1a68a13",
"blk.38.ffn_gate.weight": "89f99be149d03f116527bfcabe073c50001c874de40fb6e817f6619027f3cd05",
"blk.38.ffn_up.weight": "8d57557c8d5e2d2688b73f01dddf1ce8d5194990cda6358153320aea88aac7f8",
"blk.39.attn_k.weight": "21be09c988b46c8393e6c2ec9230f3b5136eb7607dd1953ba92d0811c2f0dd75",
"blk.39.attn_norm.weight": "ba7c1912dd1c4e2d16917201f62396fd0600e4a451137eaddff255548c209abd",
"blk.39.attn_output.weight": "acfaf4abb3fd27fd899b5563c3877f176b597d8f6cdb2f2fd3f3a0bd4da15ed6",
"blk.39.attn_q.weight": "e8adbc140d4c8f0db2a27ca584c5531d5b1e080555fe627e34d80d0814a92bed",
"blk.39.attn_v.weight": "92f96b0e1f724e73a0f90a76c145654418844c04a6d4b14c05eb5af8a62bf8dc",
"blk.39.ffn_down.weight": "4d9ee7c65fc16fe95d10c47b79ac6a525741947600a64b5fcea5d300a82c50de",
"blk.39.ffn_gate.weight": "7e18507989f39b32191133d2657c2ee3b74f42f070579204d727eb72215793d1",
"blk.39.ffn_up.weight": "22cda752269c9757ba918abede1df95bb0f83a5c772dea13c8deea3d5f2723d9",
"output_norm.weight": "2858cf0e39d32caf52b7861378ace076000241e147f10b9eb21d8a5cd149e3cb"
}

View File

@@ -100,6 +100,8 @@ func parseTokenizer(fsys fs.FS, specialTokenTypes []string) (*Tokenizer, error)
t.Pre = "deepseek-llm"
case "21cde974d587f0d54dc8d56b183cc1e6239600172035c68fbd6d4b9f8da0576e":
t.Pre = "deepseek-coder"
case "1ff7f41064896984db5d1bb6ff64fa4bc29007d08c1b439e505b7392777a319e":
t.Pre = "qwen2"
case "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855":
// noop, empty pretokenizer
default:

View File

@@ -9,8 +9,6 @@ import (
"path/filepath"
"runtime"
"strings"
"github.com/ollama/ollama/envconfig"
)
// Determine if the given ROCm lib directory is usable by checking for existence of some glob patterns
@@ -41,13 +39,10 @@ func commonAMDValidateLibDir() (string, error) {
// Favor our bundled version
// Installer payload location if we're running the installed binary
exe, err := os.Executable()
if err == nil {
rocmTargetDir := filepath.Join(filepath.Dir(exe), envconfig.LibRelativeToExe(), "lib", "ollama")
if rocmLibUsable(rocmTargetDir) {
slog.Debug("detected ROCM next to ollama executable " + rocmTargetDir)
return rocmTargetDir, nil
}
rocmTargetDir := filepath.Join(LibOllamaPath, "rocm")
if rocmLibUsable(rocmTargetDir) {
slog.Debug("detected ROCM next to ollama executable " + rocmTargetDir)
return rocmTargetDir, nil
}
// Prefer explicit HIP env var

View File

@@ -77,8 +77,7 @@ func AMDGetGPUInfo() ([]RocmGPUInfo, error) {
gfxOverride := envconfig.HsaOverrideGfxVersion()
var supported []string
depPaths := LibraryDirs()
libDir := ""
var libDir string
// The amdgpu driver always exposes the host CPU(s) first, but we have to skip them and subtract
// from the other IDs to get alignment with the HIP libraries expectations (zero is the first GPU, not the CPU)
@@ -353,9 +352,8 @@ func AMDGetGPUInfo() ([]RocmGPUInfo, error) {
})
return nil, err
}
depPaths = append(depPaths, libDir)
}
gpuInfo.DependencyPath = depPaths
gpuInfo.DependencyPath = []string{libDir}
if gfxOverride == "" {
// Only load supported list once

View File

@@ -5,7 +5,6 @@ import (
"errors"
"fmt"
"log/slog"
"os"
"path/filepath"
"slices"
"strconv"
@@ -50,14 +49,13 @@ func AMDGetGPUInfo() ([]RocmGPUInfo, error) {
slog.Info(err.Error())
return nil, err
}
depPaths := LibraryDirs()
libDir, err := AMDValidateLibDir()
if err != nil {
err = fmt.Errorf("unable to verify rocm library: %w", err)
slog.Warn(err.Error())
return nil, err
}
depPaths = append(depPaths, libDir)
var supported []string
gfxOverride := envconfig.HsaOverrideGfxVersion()
@@ -113,7 +111,7 @@ func AMDGetGPUInfo() ([]RocmGPUInfo, error) {
UnreliableFreeMemory: true,
ID: strconv.Itoa(i), // TODO this is probably wrong if we specify visible devices
DependencyPath: depPaths,
DependencyPath: []string{libDir},
MinimumMemory: rocmMinimumMemory,
Name: name,
Compute: gfx,
@@ -164,9 +162,7 @@ func AMDValidateLibDir() (string, error) {
}
// Installer payload (if we're running from some other location)
localAppData := os.Getenv("LOCALAPPDATA")
appDir := filepath.Join(localAppData, "Programs", "Ollama")
rocmTargetDir := filepath.Join(appDir, envconfig.LibRelativeToExe(), "lib", "ollama")
rocmTargetDir := filepath.Join(LibOllamaPath, "rocm")
if rocmLibUsable(rocmTargetDir) {
slog.Debug("detected ollama installed ROCm at " + rocmTargetDir)
return rocmTargetDir, nil

View File

@@ -23,7 +23,6 @@ import (
"github.com/ollama/ollama/envconfig"
"github.com/ollama/ollama/format"
"github.com/ollama/ollama/runners"
)
type cudaHandles struct {
@@ -101,15 +100,7 @@ func initCudaHandles() *cudaHandles {
// Aligned with driver, we can't carry as payloads
nvcudaMgmtPatterns := NvcudaGlobs
if runtime.GOOS == "windows" {
localAppData := os.Getenv("LOCALAPPDATA")
cudartMgmtPatterns = []string{filepath.Join(localAppData, "Programs", "Ollama", CudartMgmtName)}
}
libDirs := LibraryDirs()
for _, d := range libDirs {
cudartMgmtPatterns = append(cudartMgmtPatterns, filepath.Join(d, CudartMgmtName))
}
cudartMgmtPatterns = append(cudartMgmtPatterns, filepath.Join(LibOllamaPath, "cuda_v*", CudartMgmtName))
cudartMgmtPatterns = append(cudartMgmtPatterns, CudartGlobs...)
if len(NvmlGlobs) > 0 {
@@ -240,7 +231,7 @@ func GetGPUInfo() GpuInfoList {
if err != nil {
slog.Warn("error looking up system memory", "error", err)
}
depPaths := LibraryDirs()
details, err := GetCPUDetails()
if err != nil {
slog.Warn("failed to lookup CPU details", "error", err)
@@ -248,11 +239,9 @@ func GetGPUInfo() GpuInfoList {
cpus = []CPUInfo{
{
GpuInfo: GpuInfo{
memInfo: mem,
Library: "cpu",
Variant: runners.GetCPUCapability().String(),
ID: "0",
DependencyPath: depPaths,
memInfo: mem,
Library: "cpu",
ID: "0",
},
CPUs: details,
},
@@ -294,17 +283,13 @@ func GetGPUInfo() GpuInfoList {
gpuInfo.DriverMajor = driverMajor
gpuInfo.DriverMinor = driverMinor
variant := cudaVariant(gpuInfo)
if depPaths != nil {
gpuInfo.DependencyPath = depPaths
// Check for variant specific directory
if variant != "" {
for _, d := range depPaths {
if _, err := os.Stat(filepath.Join(d, "cuda_"+variant)); err == nil {
// Put the variant directory first in the search path to avoid runtime linking to the wrong library
gpuInfo.DependencyPath = append([]string{filepath.Join(d, "cuda_"+variant)}, gpuInfo.DependencyPath...)
break
}
}
// Start with our bundled libraries
if variant != "" {
variantPath := filepath.Join(LibOllamaPath, "cuda_"+variant)
if _, err := os.Stat(variantPath); err == nil {
// Put the variant directory first in the search path to avoid runtime linking to the wrong library
gpuInfo.DependencyPath = append([]string{variantPath}, gpuInfo.DependencyPath...)
}
}
gpuInfo.Name = C.GoString(&memInfo.gpu_name[0])
@@ -376,7 +361,7 @@ func GetGPUInfo() GpuInfoList {
gpuInfo.FreeMemory = uint64(memInfo.free)
gpuInfo.ID = C.GoString(&memInfo.gpu_id[0])
gpuInfo.Name = C.GoString(&memInfo.gpu_name[0])
gpuInfo.DependencyPath = depPaths
gpuInfo.DependencyPath = []string{LibOllamaPath}
oneapiGPUs = append(oneapiGPUs, gpuInfo)
}
}
@@ -512,37 +497,33 @@ func GetGPUInfo() GpuInfoList {
func FindGPULibs(baseLibName string, defaultPatterns []string) []string {
// Multiple GPU libraries may exist, and some may not work, so keep trying until we exhaust them
var ldPaths []string
gpuLibPaths := []string{}
slog.Debug("Searching for GPU library", "name", baseLibName)
// Start with our bundled libraries
patterns := []string{}
for _, d := range LibraryDirs() {
patterns = append(patterns, filepath.Join(d, baseLibName))
}
// search our bundled libraries first
patterns := []string{filepath.Join(LibOllamaPath, baseLibName)}
var ldPaths []string
switch runtime.GOOS {
case "windows":
ldPaths = strings.Split(os.Getenv("PATH"), ";")
ldPaths = strings.Split(os.Getenv("PATH"), string(os.PathListSeparator))
case "linux":
ldPaths = strings.Split(os.Getenv("LD_LIBRARY_PATH"), ":")
default:
return gpuLibPaths
ldPaths = strings.Split(os.Getenv("LD_LIBRARY_PATH"), string(os.PathListSeparator))
}
// Then with whatever we find in the PATH/LD_LIBRARY_PATH
for _, ldPath := range ldPaths {
d, err := filepath.Abs(ldPath)
// then search the system's LD_LIBRARY_PATH
for _, p := range ldPaths {
p, err := filepath.Abs(p)
if err != nil {
continue
}
patterns = append(patterns, filepath.Join(d, baseLibName))
patterns = append(patterns, filepath.Join(p, baseLibName))
}
// finally, search the default patterns provided by the caller
patterns = append(patterns, defaultPatterns...)
slog.Debug("gpu library search", "globs", patterns)
for _, pattern := range patterns {
// Nvidia PhysX known to return bogus results
if strings.Contains(pattern, "PhysX") {
slog.Debug("skipping PhysX cuda library path", "path", pattern)
@@ -716,28 +697,6 @@ func (l GpuInfoList) GetVisibleDevicesEnv() (string, string) {
}
}
func LibraryDirs() []string {
// dependencies can exist wherever we found the runners (e.g. build tree for developers) and relative to the executable
// This can be simplified once we no longer carry runners as payloads
paths := []string{}
appExe, err := os.Executable()
if err != nil {
slog.Warn("failed to lookup executable path", "error", err)
} else {
appRelative := filepath.Join(filepath.Dir(appExe), envconfig.LibRelativeToExe(), "lib", "ollama")
if _, err := os.Stat(appRelative); err == nil {
paths = append(paths, appRelative)
}
}
rDir := runners.Locate()
if err != nil {
slog.Warn("unable to locate gpu dependency libraries", "error", err)
} else {
paths = append(paths, filepath.Dir(rDir))
}
return paths
}
func GetSystemInfo() SystemInfo {
gpus := GetGPUInfo()
gpuMutex.Lock()

View File

@@ -15,7 +15,6 @@ import (
"syscall"
"github.com/ollama/ollama/format"
"github.com/ollama/ollama/runners"
)
const (
@@ -28,7 +27,6 @@ func GetGPUInfo() GpuInfoList {
return []GpuInfo{
{
Library: "cpu",
Variant: runners.GetCPUCapability().String(),
memInfo: mem,
},
}
@@ -51,7 +49,6 @@ func GetCPUInfo() GpuInfoList {
return []GpuInfo{
{
Library: "cpu",
Variant: runners.GetCPUCapability().String(),
memInfo: mem,
},
}

57
discover/path.go Normal file
View File

@@ -0,0 +1,57 @@
package discover
import (
"os"
"path/filepath"
"runtime"
)
// LibPath is a path to lookup dynamic libraries
// in development it's usually 'build/lib/ollama'
// in distribution builds it's 'lib/ollama' on Windows
// '../lib/ollama' on Linux and the executable's directory on macOS
// note: distribution builds, additional GPU-specific libraries are
// found in subdirectories of the returned path, such as
// 'cuda_v11', 'cuda_v12', 'rocm', etc.
var LibOllamaPath string = func() string {
exe, err := os.Executable()
if err != nil {
return ""
}
exe, err = filepath.EvalSymlinks(exe)
if err != nil {
return ""
}
var libPath string
switch runtime.GOOS {
case "windows":
libPath = filepath.Join(filepath.Dir(exe), "lib", "ollama")
case "linux":
libPath = filepath.Join(filepath.Dir(exe), "..", "lib", "ollama")
case "darwin":
libPath = filepath.Dir(exe)
}
cwd, err := os.Getwd()
if err != nil {
return ""
}
paths := []string{
libPath,
// build paths for development
filepath.Join(filepath.Dir(exe), "build", "lib", "ollama"),
filepath.Join(cwd, "build", "lib", "ollama"),
}
for _, p := range paths {
if _, err := os.Stat(p); err == nil {
return p
}
}
return filepath.Dir(exe)
}()

View File

@@ -5,7 +5,6 @@ import (
"log/slog"
"github.com/ollama/ollama/format"
"github.com/ollama/ollama/runners"
)
type memInfo struct {
@@ -107,7 +106,7 @@ func (l GpuInfoList) ByLibrary() []GpuInfoList {
for _, info := range l {
found := false
requested := info.Library
if info.Variant != runners.CPUCapabilityNone.String() {
if info.Variant != "" {
requested += "_" + info.Variant
}
for i, lib := range libs {

View File

@@ -2,7 +2,7 @@
### Getting Started
* [Quickstart](../README.md#quickstart)
* [Examples](../examples)
* [Examples](./examples.md)
* [Importing models](./import.md)
* [Linux Documentation](./linux.md)
* [Windows Documentation](./windows.md)

View File

@@ -13,6 +13,7 @@
- [Push a Model](#push-a-model)
- [Generate Embeddings](#generate-embeddings)
- [List Running Models](#list-running-models)
- [Version](#version)
## Conventions
@@ -305,7 +306,7 @@ curl http://localhost:11434/api/generate -d '{
#### Response
```
```json
{
"model": "llava",
"created_at": "2023-11-03T15:36:02.583064Z",
@@ -494,14 +495,14 @@ Generate the next message in a chat with a provided model. This is a streaming e
- `model`: (required) the [model name](#model-names)
- `messages`: the messages of the chat, this can be used to keep a chat memory
- `tools`: tools for the model to use if supported. Requires `stream` to be set to `false`
- `tools`: list of tools in JSON for the model to use if supported
The `message` object has the following fields:
- `role`: the role of the message, either `system`, `user`, `assistant`, or `tool`
- `content`: the content of the message
- `images` (optional): a list of images to include in the message (for multimodal models such as `llava`)
- `tool_calls` (optional): a list of tools the model wants to use
- `tool_calls` (optional): a list of tools in JSON that the model wants to use
Advanced parameters (optional):
@@ -794,7 +795,7 @@ curl http://localhost:11434/api/chat -d '{
##### Request
```
```shell
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
@@ -869,7 +870,7 @@ If the messages array is empty, the model will be loaded into memory.
##### Request
```
```shell
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": []
@@ -896,7 +897,7 @@ If the messages array is empty and the `keep_alive` parameter is set to `0`, a m
##### Request
```
```shell
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [],
@@ -927,14 +928,25 @@ A single JSON object is returned:
POST /api/create
```
Create a model from a [`Modelfile`](./modelfile.md). It is recommended to set `modelfile` to the content of the Modelfile rather than just set `path`. This is a requirement for remote create. Remote model creation must also create any file blobs, fields such as `FROM` and `ADAPTER`, explicitly with the server using [Create a Blob](#create-a-blob) and the value to the path indicated in the response.
Create a model from:
* another model;
* a safetensors directory; or
* a GGUF file.
If you are creating a model from a safetensors directory or from a GGUF file, you must [create a blob](#create-a-blob) for each of the files and then use the file name and SHA256 digest associated with each blob in the `files` field.
### Parameters
- `model`: name of the model to create
- `modelfile` (optional): contents of the Modelfile
- `from`: (optional) name of an existing model to create the new model from
- `files`: (optional) a dictionary of file names to SHA256 digests of blobs to create the model from
- `adapters`: (optional) a dictionary of file names to SHA256 digests of blobs for LORA adapters
- `template`: (optional) the prompt template for the model
- `license`: (optional) a string or list of strings containing the license or licenses for the model
- `system`: (optional) a string containing the system prompt for the model
- `parameters`: (optional) a dictionary of parameters for the model (see [Modelfile](./modelfile.md#valid-parameters-and-values) for a list of parameters)
- `messages`: (optional) a list of message objects used to create a conversation
- `stream`: (optional) if `false` the response will be returned as a single response object, rather than a stream of objects
- `path` (optional): path to the Modelfile
- `quantize` (optional): quantize a non-quantized (e.g. float16) model
#### Quantization types
@@ -960,14 +972,15 @@ Create a model from a [`Modelfile`](./modelfile.md). It is recommended to set `m
#### Create a new model
Create a new model from a `Modelfile`.
Create a new model from an existing model.
##### Request
```shell
curl http://localhost:11434/api/create -d '{
"model": "mario",
"modelfile": "FROM llama3\nSYSTEM You are mario from Super Mario Bros."
"from": "llama3.2",
"system": "You are Mario from Super Mario Bros."
}'
```
@@ -998,7 +1011,7 @@ Quantize a non-quantized model.
```shell
curl http://localhost:11434/api/create -d '{
"model": "llama3.1:quantized",
"modelfile": "FROM llama3.1:8b-instruct-fp16",
"from": "llama3.1:8b-instruct-fp16",
"quantize": "q4_K_M"
}'
```
@@ -1018,52 +1031,112 @@ A stream of JSON objects is returned:
{"status":"success"}
```
#### Create a model from GGUF
### Check if a Blob Exists
Create a model from a GGUF file. The `files` parameter should be filled out with the file name and SHA256 digest of the GGUF file you wish to use. Use [/api/blobs/:digest](#push-a-blob) to push the GGUF file to the server before calling this API.
##### Request
```shell
curl http://localhost:11434/api/create -d '{
"model": "my-gguf-model",
"files": {
"test.gguf": "sha256:432f310a77f4650a88d0fd59ecdd7cebed8d684bafea53cbff0473542964f0c3"
}
}'
```
##### Response
A stream of JSON objects is returned:
```
{"status":"parsing GGUF"}
{"status":"using existing layer sha256:432f310a77f4650a88d0fd59ecdd7cebed8d684bafea53cbff0473542964f0c3"}
{"status":"writing manifest"}
{"status":"success"}
```
#### Create a model from a Safetensors directory
The `files` parameter should include a dictionary of files for the safetensors model which includes the file names and SHA256 digest of each file. Use [/api/blobs/:digest](#push-a-blob) to first push each of the files to the server before calling this API. Files will remain in the cache until the Ollama server is restarted.
##### Request
```shell
curl http://localhost:11434/api/create -d '{
"model": "fred",
"files": {
"config.json": "sha256:dd3443e529fb2290423a0c65c2d633e67b419d273f170259e27297219828e389",
"generation_config.json": "sha256:88effbb63300dbbc7390143fbbdd9d9fa50587b37e8bfd16c8c90d4970a74a36",
"special_tokens_map.json": "sha256:b7455f0e8f00539108837bfa586c4fbf424e31f8717819a6798be74bef813d05",
"tokenizer.json": "sha256:bbc1904d35169c542dffbe1f7589a5994ec7426d9e5b609d07bab876f32e97ab",
"tokenizer_config.json": "sha256:24e8a6dc2547164b7002e3125f10b415105644fcf02bf9ad8b674c87b1eaaed6",
"model.safetensors": "sha256:1ff795ff6a07e6a68085d206fb84417da2f083f68391c2843cd2b8ac6df8538f"
}
}'
```
##### Response
A stream of JSON objects is returned:
```shell
{"status":"converting model"}
{"status":"creating new layer sha256:05ca5b813af4a53d2c2922933936e398958855c44ee534858fcfd830940618b6"}
{"status":"using autodetected template llama3-instruct"}
{"status":"using existing layer sha256:56bb8bd477a519ffa694fc449c2413c6f0e1d3b1c88fa7e3c9d88d3ae49d4dcb"}
{"status":"writing manifest"}
{"status":"success"}
```
## Check if a Blob Exists
```shell
HEAD /api/blobs/:digest
```
Ensures that the file blob used for a FROM or ADAPTER field exists on the server. This is checking your Ollama server and not ollama.com.
Ensures that the file blob (Binary Large Object) used with create a model exists on the server. This checks your Ollama server and not ollama.com.
#### Query Parameters
### Query Parameters
- `digest`: the SHA256 digest of the blob
#### Examples
### Examples
##### Request
#### Request
```shell
curl -I http://localhost:11434/api/blobs/sha256:29fdb92e57cf0827ded04ae6461b5931d01fa595843f55d36f5b275a52087dd2
```
##### Response
#### Response
Return 200 OK if the blob exists, 404 Not Found if it does not.
### Create a Blob
## Push a Blob
```shell
POST /api/blobs/:digest
```
Create a blob from a file on the server. Returns the server file path.
Push a file to the Ollama server to create a "blob" (Binary Large Object).
#### Query Parameters
### Query Parameters
- `digest`: the expected SHA256 digest of the file
#### Examples
### Examples
##### Request
#### Request
```shell
curl -T model.bin -X POST http://localhost:11434/api/blobs/sha256:29fdb92e57cf0827ded04ae6461b5931d01fa595843f55d36f5b275a52087dd2
curl -T model.gguf -X POST http://localhost:11434/api/blobs/sha256:29fdb92e57cf0827ded04ae6461b5931d01fa595843f55d36f5b275a52087dd2
```
##### Response
#### Response
Return 201 Created if the blob was successfully created, 400 Bad Request if the digest used is not expected.
@@ -1526,3 +1599,29 @@ curl http://localhost:11434/api/embeddings -d '{
]
}
```
## Version
```shell
GET /api/version
```
Retrieve the Ollama version
### Examples
#### Request
```shell
curl http://localhost:11434/api/version
```
#### Response
```json
{
"version": "0.5.1"
}
```

View File

@@ -1,165 +1,131 @@
# Development
Install required tools:
Install prerequisites:
- go version 1.22 or higher
- OS specific C/C++ compiler (see below)
- GNU Make
- [Go](https://go.dev/doc/install)
- C/C++ Compiler e.g. Clang on macOS, [TDM-GCC](https://jmeubank.github.io/tdm-gcc/download/) (Windows amd64) or [llvm-mingw](https://github.com/mstorsjo/llvm-mingw) (Windows arm64), GCC/Clang on Linux.
## Overview
Ollama uses a mix of Go and C/C++ code to interface with GPUs. The C/C++ code is compiled with both CGO and GPU library specific compilers. A set of GNU Makefiles are used to compile the project. GPU Libraries are auto-detected based on the typical environment variables used by the respective libraries, but can be overridden if necessary. The default make target will build the runners and primary Go Ollama application that will run within the repo directory. Throughout the examples below `-j 5` is suggested for 5 parallel jobs to speed up the build. You can adjust the job count based on your CPU Core count to reduce build times. If you want to relocate the built binaries, use the `dist` target and recursively copy the files in `./dist/$OS-$ARCH/` to your desired location. To learn more about the other make targets use `make help`
Once you have built the GPU/CPU runners, you can compile the main application with `go build .`
### MacOS
[Download Go](https://go.dev/dl/)
```bash
make -j 5
```
Now you can run `ollama`:
```bash
./ollama
```
#### Xcode 15 warnings
If you are using Xcode newer than version 14, you may see a warning during `go build` about `ld: warning: ignoring duplicate libraries: '-lobjc'` due to Golang issue https://github.com/golang/go/issues/67799 which can be safely ignored. You can suppress the warning with `export CGO_LDFLAGS="-Wl,-no_warn_duplicate_libraries"`
### Linux
#### Linux CUDA (NVIDIA)
_Your operating system distribution may already have packages for NVIDIA CUDA. Distro packages are often preferable, but instructions are distro-specific. Please consult distro-specific docs for dependencies if available!_
Install `make`, `gcc` and `golang` as well as [NVIDIA CUDA](https://developer.nvidia.com/cuda-downloads)
development and runtime packages.
Typically the makefile will auto-detect CUDA, however, if your Linux distro
or installation approach uses alternative paths, you can specify the location by
overriding `CUDA_PATH` to the location of the CUDA toolkit. You can customize
a set of target CUDA architectures by setting `CUDA_ARCHITECTURES` (e.g. `CUDA_ARCHITECTURES=50;60;70`)
Then build and run Ollama from the root directory of the repository:
```
make -j 5
go run . serve
```
If both v11 and v12 tookkits are detected, runners for both major versions will be built by default. You can build just v12 with `make cuda_v12`
## macOS (Apple Silicon)
#### Older Linux CUDA (NVIDIA)
macOS Apple Silicon supports Metal which is built-in to the Ollama binary. No additional steps are required.
To support older GPUs with Compute Capability 3.5 or 3.7, you will need to use an older version of the Driver from [Unix Driver Archive](https://www.nvidia.com/en-us/drivers/unix/) (tested with 470) and [CUDA Toolkit Archive](https://developer.nvidia.com/cuda-toolkit-archive) (tested with cuda V11). When you build Ollama, you will need to set two make variable to adjust the minimum compute capability Ollama supports via `make -j 5 CUDA_ARCHITECTURES="35;37;50;52" EXTRA_GOLDFLAGS="\"-X=github.com/ollama/ollama/discover.CudaComputeMajorMin=3\" \"-X=github.com/ollama/ollama/discover.CudaComputeMinorMin=5\""`. To find the Compute Capability of your older GPU, refer to [GPU Compute Capability](https://developer.nvidia.com/cuda-gpus).
## macOS (Intel)
#### Linux ROCm (AMD)
Install prerequisites:
_Your operating system distribution may already have packages for AMD ROCm. Distro packages are often preferable, but instructions are distro-specific. Please consult distro-specific docs for dependencies if available!_
- [CMake](https://cmake.org/download/) or `brew install cmake`
Install [ROCm](https://rocm.docs.amd.com/en/latest/) development packages first, as well as `make`, `gcc`, and `golang`.
Typically the build scripts will auto-detect ROCm, however, if your Linux distro
or installation approach uses unusual paths, you can specify the location by
specifying an environment variable `HIP_PATH` to the location of the ROCm
install (typically `/opt/rocm`). You can also customize
the AMD GPU targets by setting HIP_ARCHS (e.g. `HIP_ARCHS=gfx1101;gfx1102`)
Then, configure and build the project:
```
make -j 5
cmake -B build
cmake --build build
```
ROCm requires elevated privileges to access the GPU at runtime. On most distros you can add your user account to the `render` group, or run as root.
#### Containerized Linux Build
If you have Docker and buildx available, you can build linux binaries with `./scripts/build_linux.sh` which has the CUDA and ROCm dependencies included. The resulting artifacts are placed in `./dist` and by default the script builds both arm64 and amd64 binaries. If you want to build only amd64, you can build with `PLATFORM=linux/amd64 ./scripts/build_linux.sh`
### Windows
The following tools are required as a minimal development environment to build CPU inference support.
- Go version 1.22 or higher
- https://go.dev/dl/
- Git
- https://git-scm.com/download/win
- clang with gcc compat and Make. There are multiple options on how to go about installing these tools on Windows. We have verified the following, but others may work as well:
- [MSYS2](https://www.msys2.org/)
- After installing, from an MSYS2 terminal, run `pacman -S mingw-w64-clang-x86_64-gcc-compat mingw-w64-clang-x86_64-clang make` to install the required tools
- Assuming you used the default install prefix for msys2 above, add `C:\msys64\clang64\bin` and `c:\msys64\usr\bin` to your environment variable `PATH` where you will perform the build steps below (e.g. system-wide, account-level, powershell, cmd, etc.)
> [!NOTE]
> Due to bugs in the GCC C++ library for unicode support, Ollama should be built with clang on windows.
Lastly, run Ollama:
```
make -j 5
go run . serve
```
#### GPU Support
## Windows
The GPU tools require the Microsoft native build tools. To build either CUDA or ROCm, you must first install MSVC via Visual Studio:
Install prerequisites:
- Make sure to select `Desktop development with C++` as a Workload during the Visual Studio install
- You must complete the Visual Studio install and run it once **BEFORE** installing CUDA or ROCm for the tools to properly register
- Add the location of the **64 bit (x64)** compiler (`cl.exe`) to your `PATH`
- Note: the default Developer Shell may configure the 32 bit (x86) compiler which will lead to build failures. Ollama requires a 64 bit toolchain.
- [CMake](https://cmake.org/download/)
- [Visual Studio 2022](https://visualstudio.microsoft.com/downloads/) including the Native Desktop Workload
- (Optional) AMD GPU support
- [ROCm](https://rocm.github.io/install.html)
- [Ninja](https://github.com/ninja-build/ninja/releases)
- (Optional) NVIDIA GPU support
- [CUDA SDK](https://developer.nvidia.com/cuda-downloads?target_os=Windows&target_arch=x86_64&target_version=11&target_type=exe_network)
#### Windows CUDA (NVIDIA)
> [!IMPORTANT]
> Ensure prerequisites are in `PATH` before running CMake.
In addition to the common Windows development tools and MSVC described above:
> [!IMPORTANT]
> ROCm is not compatible with Visual Studio CMake generators. Use `-GNinja` when configuring the project.
- [NVIDIA CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html)
> [!IMPORTANT]
> CUDA is only compatible with Visual Studio CMake generators.
#### Windows ROCm (AMD Radeon)
In addition to the common Windows development tools and MSVC described above:
- [AMD HIP](https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html)
#### Windows arm64
The default `Developer PowerShell for VS 2022` may default to x86 which is not what you want. To ensure you get an arm64 development environment, start a plain PowerShell terminal and run:
```powershell
import-module 'C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\Common7\\Tools\\Microsoft.VisualStudio.DevShell.dll'
Enter-VsDevShell -Arch arm64 -vsinstallpath 'C:\\Program Files\\Microsoft Visual Studio\\2022\\Community' -skipautomaticlocation
```
You can confirm with `write-host $env:VSCMD_ARG_TGT_ARCH`
Follow the instructions at https://www.msys2.org/wiki/arm64/ to set up an arm64 msys2 environment. Ollama requires gcc and mingw32-make to compile, which is not currently available on Windows arm64, but a gcc compatibility adapter is available via `mingw-w64-clang-aarch64-gcc-compat`. At a minimum you will need to install the following:
Then, configure and build the project:
```
pacman -S mingw-w64-clang-aarch64-clang mingw-w64-clang-aarch64-gcc-compat mingw-w64-clang-aarch64-make make
cmake -B build
cmake --build build --config Release
```
You will need to ensure your PATH includes go, cmake, gcc and clang mingw32-make to build ollama from source. (typically `C:\msys64\clangarm64\bin\`)
## Advanced CPU Vector Settings
On x86, running `make` will compile several CPU runners which can run on different CPU families. At runtime, Ollama will auto-detect the best variation to load. If GPU libraries are present at build time, Ollama also compiles GPU runners with the `AVX` CPU vector feature enabled. This provides a good performance balance when loading large models that split across GPU and CPU with broad compatibility. Some users may prefer no vector extensions (e.g. older Xeon/Celeron processors, or hypervisors that mask the vector features) while other users may prefer turning on many more vector extensions to further improve performance for split model loads.
To customize the set of CPU vector features enabled for a CPU runner and all GPU runners, use CUSTOM_CPU_FLAGS during the build.
To build without any vector flags:
Lastly, run Ollama:
```
make CUSTOM_CPU_FLAGS=""
go run . serve
```
To build with both AVX and AVX2:
```
make CUSTOM_CPU_FLAGS=avx,avx2
```
## Windows (ARM)
To build with AVX512 features turned on:
Windows ARM does not support additional acceleration libraries at this time.
## Linux
Install prerequisites:
- [CMake](https://cmake.org/download/) or `sudo apt install cmake` or `sudo dnf install cmake`
- (Optional) AMD GPU support
- [ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/quick-start.html)
- (Optional) NVIDIA GPU support
- [CUDA SDK](https://developer.nvidia.com/cuda-downloads)
> [!IMPORTANT]
> Ensure prerequisites are in `PATH` before running CMake.
Then, configure and build the project:
```
make CUSTOM_CPU_FLAGS=avx,avx2,avx512,avx512vbmi,avx512vnni,avx512bf16
cmake -B build
cmake --build build
```
> [!NOTE]
> If you are experimenting with different flags, make sure to do a `make clean` between each change to ensure everything is rebuilt with the new compiler flags
Lastly, run Ollama:
```
go run . serve
```
## Docker
```
docker build .
```
### ROCm
```
docker build --build-arg FLAVOR=rocm .
```
## Running tests
To run tests, use `go test`:
```
go test ./...
```
## Library detection
Ollama looks for acceleration libraries in the following paths relative to the `ollama` executable:
* `./lib/ollama` (Windows)
* `../lib/ollama` (Linux)
* `.` (macOS)
* `build/lib/ollama` (for development)
If the libraries are not found, Ollama will not run with any acceleration libraries.

View File

@@ -12,3 +12,9 @@ Ollama JavaScript examples at [ollama-js/examples](https://github.com/ollama/oll
## OpenAI compatibility examples
Ollama OpenAI compatibility examples at [ollama/examples/openai](../docs/openai.md)
## Community examples
- [LangChain Ollama Python](https://python.langchain.com/docs/integrations/chat/ollama/)
- [LangChain Ollama JS](https://js.langchain.com/docs/integrations/chat/ollama/)

View File

@@ -38,7 +38,7 @@ Numeric IDs may be used, however ordering may vary, so UUIDs are more reliable.
You can discover the UUID of your GPUs by running `nvidia-smi -L` If you want to
ignore the GPUs and force CPU usage, use an invalid GPU ID (e.g., "-1")
### Laptop Suspend Resume
### Linux Suspend Resume
On linux, after a suspend/resume cycle, sometimes Ollama will fail to discover
your NVIDIA GPU, and fallback to running on the CPU. You can workaround this

View File

@@ -152,7 +152,7 @@ Use `OLLAMA_VERSION` environment variable with the install script to install a s
For example:
```shell
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.3.9 sh
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.5.7 sh
```
## Viewing logs

View File

@@ -67,8 +67,6 @@ To use this:
3. `ollama run choose-a-model-name`
4. Start using the model!
More examples are available in the [examples directory](../examples).
To view the Modelfile of a given model, use the `ollama show --modelfile` command.
```bash
@@ -155,7 +153,6 @@ PARAMETER <parameter> <parametervalue>
| temperature | The temperature of the model. Increasing the temperature will make the model answer more creatively. (Default: 0.8) | float | temperature 0.7 |
| seed | Sets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt. (Default: 0) | int | seed 42 |
| stop | Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return. Multiple stop patterns may be set by specifying multiple separate `stop` parameters in a modelfile. | string | stop "AI assistant:" |
| tfs_z | Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting. (default: 1) | float | tfs_z 1 |
| num_predict | Maximum number of tokens to predict when generating text. (Default: -1, infinite generation) | int | num_predict 42 |
| top_k | Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40) | int | top_k 40 |
| top_p | Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9) | float | top_p 0.9 |

View File

@@ -233,6 +233,8 @@ curl http://localhost:11434/v1/embeddings \
- [x] `seed`
- [x] `stop`
- [x] `stream`
- [x] `stream_options`
- [x] `include_usage`
- [x] `temperature`
- [x] `top_p`
- [x] `max_tokens`
@@ -261,6 +263,8 @@ curl http://localhost:11434/v1/embeddings \
- [x] `seed`
- [x] `stop`
- [x] `stream`
- [x] `stream_options`
- [x] `include_usage`
- [x] `temperature`
- [x] `top_p`
- [x] `max_tokens`

View File

@@ -111,7 +111,7 @@ Keep the following tips and best practices in mind when working with Go template
ChatML is a popular template format. It can be used for models such as Databrick's DBRX, Intel's Neural Chat, and Microsoft's Orca 2.
```gotmpl
```go
{{- range .Messages }}<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{ end }}<|im_start|>assistant
@@ -125,7 +125,7 @@ Tools support can be added to a model by adding a `{{ .Tools }}` node to the tem
Mistral v0.3 and Mixtral 8x22B supports tool calling.
```gotmpl
```go
{{- range $index, $_ := .Messages }}
{{- if eq .Role "user" }}
{{- if and (le (len (slice $.Messages $index)) 2) $.Tools }}[AVAILABLE_TOOLS] {{ json $.Tools }}[/AVAILABLE_TOOLS]
@@ -151,7 +151,7 @@ Fill-in-middle support can be added to a model by adding a `{{ .Suffix }}` node
CodeLlama [7B](https://ollama.com/library/codellama:7b-code) and [13B](https://ollama.com/library/codellama:13b-code) code completion models support fill-in-middle.
```gotmpl
```go
<PRE> {{ .Prompt }} <SUF>{{ .Suffix }} <MID>
```

View File

@@ -1,9 +0,0 @@
# Tutorials
Here is a list of ways you can use Ollama with other tools to build interesting applications.
- [Using LangChain with Ollama in JavaScript](./tutorials/langchainjs.md)
- [Using LangChain with Ollama in Python](./tutorials/langchainpy.md)
- [Running Ollama on NVIDIA Jetson Devices](./tutorials/nvidia-jetson.md)
Also be sure to check out the [examples](../examples) directory for more ways to use Ollama.

View File

@@ -288,12 +288,3 @@ func Values() map[string]string {
func Var(key string) string {
return strings.Trim(strings.TrimSpace(os.Getenv(key)), "\"'")
}
// On windows, we keep the binary at the top directory, but
// other platforms use a "bin" directory, so this returns ".."
func LibRelativeToExe() string {
if runtime.GOOS == "windows" {
return "."
}
return ".."
}

174
examples/.gitignore vendored
View File

@@ -1,174 +0,0 @@
node_modules
bun.lockb
.vscode
# OSX
.DS_STORE
# Models
models/
# Local Chroma db
.chroma/
db/
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

View File

@@ -1 +0,0 @@
fly.toml

View File

@@ -1,67 +0,0 @@
# Deploy Ollama to Fly.io
> Note: this example exposes a public endpoint and does not configure authentication. Use with care.
## Prerequisites
- Ollama: https://ollama.com/download
- Fly.io account. Sign up for a free account: https://fly.io/app/sign-up
## Steps
1. Login to Fly.io
```bash
fly auth login
```
1. Create a new Fly app
```bash
fly launch --name <name> --image ollama/ollama --internal-port 11434 --vm-size shared-cpu-8x --now
```
1. Pull and run `orca-mini:3b`
```bash
OLLAMA_HOST=https://<name>.fly.dev ollama run orca-mini:3b
```
`shared-cpu-8x` is a free-tier eligible machine type. For better performance, switch to a `performance` or `dedicated` machine type or attach a GPU for hardware acceleration (see below).
## (Optional) Persistent Volume
By default Fly Machines use ephemeral storage which is problematic if you want to use the same model across restarts without pulling it again. Create and attach a persistent volume to store the downloaded models:
1. Create the Fly Volume
```bash
fly volume create ollama
```
1. Update `fly.toml` and add `[mounts]`
```toml
[mounts]
source = "ollama"
destination = "/mnt/ollama/models"
```
1. Update `fly.toml` and add `[env]`
```toml
[env]
OLLAMA_MODELS = "/mnt/ollama/models"
```
1. Deploy your app
```bash
fly deploy
```
## (Optional) Hardware Acceleration
Fly.io GPU is currently in waitlist. Sign up for the waitlist: https://fly.io/gpu
Once you've been accepted, create the app with the additional flags `--vm-gpu-kind a100-pcie-40gb` or `--vm-gpu-kind a100-pcie-80gb`.

View File

@@ -1,29 +0,0 @@
package main
import (
"bytes"
"fmt"
"io"
"log"
"net/http"
"os"
)
func main() {
body := []byte(`{"model":"mistral"}`)
resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewBuffer(body))
if err != nil {
fmt.Print(err.Error())
os.Exit(1)
}
defer resp.Body.Close()
responseData, err := io.ReadAll(resp.Body)
if err != nil {
log.Fatal(err)
}
fmt.Println(string(responseData))
}

View File

@@ -1,5 +0,0 @@
# Ollama Jupyter Notebook
This example downloads and installs Ollama in a Jupyter instance such as Google Colab. It will start the Ollama service and expose an endpoint using `ngrok` which can be used to communicate with the Ollama instance remotely.
For best results, use an instance with GPU accelerator.

View File

@@ -1,102 +0,0 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "93f59dcb-c588-41b8-a792-55d88ade739c",
"metadata": {},
"outputs": [],
"source": [
"# Download and run the Ollama Linux install script\n",
"!curl -fsSL https://ollama.com/install.sh | sh\n",
"!command -v systemctl >/dev/null && sudo systemctl stop ollama"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "658c147e-c7f8-490e-910e-62b80f577dda",
"metadata": {},
"outputs": [],
"source": [
"!pip install aiohttp pyngrok\n",
"\n",
"import os\n",
"import asyncio\n",
"from aiohttp import ClientSession\n",
"\n",
"# Set LD_LIBRARY_PATH so the system NVIDIA library becomes preferred\n",
"# over the built-in library. This is particularly important for \n",
"# Google Colab which installs older drivers\n",
"os.environ.update({'LD_LIBRARY_PATH': '/usr/lib64-nvidia'})\n",
"\n",
"async def run(cmd):\n",
" '''\n",
" run is a helper function to run subcommands asynchronously.\n",
" '''\n",
" print('>>> starting', *cmd)\n",
" p = await asyncio.subprocess.create_subprocess_exec(\n",
" *cmd,\n",
" stdout=asyncio.subprocess.PIPE,\n",
" stderr=asyncio.subprocess.PIPE,\n",
" )\n",
"\n",
" async def pipe(lines):\n",
" async for line in lines:\n",
" print(line.strip().decode('utf-8'))\n",
"\n",
" await asyncio.gather(\n",
" pipe(p.stdout),\n",
" pipe(p.stderr),\n",
" )\n",
"\n",
"\n",
"await asyncio.gather(\n",
" run(['ollama', 'serve']),\n",
" run(['ngrok', 'http', '--log', 'stderr', '11434']),\n",
")"
]
},
{
"cell_type": "markdown",
"id": "e7735a55-9aad-4caf-8683-52e2163ba53b",
"metadata": {},
"source": [
"The previous cell starts two processes, `ollama` and `ngrok`. The log output will show a line like the following which describes the external address.\n",
"\n",
"```\n",
"t=2023-11-12T22:55:56+0000 lvl=info msg=\"started tunnel\" obj=tunnels name=command_line addr=http://localhost:11434 url=https://8249-34-125-179-11.ngrok.io\n",
"```\n",
"\n",
"The external address in this case is `https://8249-34-125-179-11.ngrok.io` which can be passed into `OLLAMA_HOST` to access this instance.\n",
"\n",
"```bash\n",
"export OLLAMA_HOST=https://8249-34-125-179-11.ngrok.io\n",
"ollama list\n",
"ollama run mistral\n",
"```"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,38 +0,0 @@
# Deploy Ollama to Kubernetes
## Prerequisites
- Ollama: https://ollama.com/download
- Kubernetes cluster. This example will use Google Kubernetes Engine.
## Steps
1. Create the Ollama namespace, deployment, and service
```bash
kubectl apply -f cpu.yaml
```
## (Optional) Hardware Acceleration
Hardware acceleration in Kubernetes requires NVIDIA's [`k8s-device-plugin`](https://github.com/NVIDIA/k8s-device-plugin) which is deployed in Kubernetes in form of daemonset. Follow the link for more details.
Once configured, create a GPU enabled Ollama deployment.
```bash
kubectl apply -f gpu.yaml
```
## Test
1. Port forward the Ollama service to connect and use it locally
```bash
kubectl -n ollama port-forward service/ollama 11434:80
```
1. Pull and run a model, for example `orca-mini:3b`
```bash
ollama run orca-mini:3b
```

View File

@@ -1,42 +0,0 @@
---
apiVersion: v1
kind: Namespace
metadata:
name: ollama
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: ollama
namespace: ollama
spec:
selector:
matchLabels:
name: ollama
template:
metadata:
labels:
name: ollama
spec:
containers:
- name: ollama
image: ollama/ollama:latest
ports:
- name: http
containerPort: 11434
protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
name: ollama
namespace: ollama
spec:
type: ClusterIP
selector:
name: ollama
ports:
- port: 80
name: http
targetPort: http
protocol: TCP

View File

@@ -1,58 +0,0 @@
---
apiVersion: v1
kind: Namespace
metadata:
name: ollama
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: ollama
namespace: ollama
spec:
strategy:
type: Recreate
selector:
matchLabels:
name: ollama
template:
metadata:
labels:
name: ollama
spec:
containers:
- name: ollama
image: ollama/ollama:latest
env:
- name: PATH
value: /usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
- name: LD_LIBRARY_PATH
value: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
- name: NVIDIA_DRIVER_CAPABILITIES
value: compute,utility
ports:
- name: http
containerPort: 11434
protocol: TCP
resources:
limits:
nvidia.com/gpu: 1
tolerations:
- key: nvidia.com/gpu
operator: Exists
effect: NoSchedule
---
apiVersion: v1
kind: Service
metadata:
name: ollama
namespace: ollama
spec:
type: ClusterIP
selector:
name: ollama
ports:
- port: 80
name: http
targetPort: http
protocol: TCP

View File

@@ -1,29 +0,0 @@
# LangChain Document QA
This example provides an interface for asking questions to a PDF document.
## Setup
1. Ensure you have the `llama3.2` model installed:
```
ollama pull llama3.2
```
2. Install the Python Requirements.
```
pip install -r requirements.txt
```
## Run
```
python main.py
```
A prompt will appear, where questions may be asked:
```
Query: How many locations does WeWork have?
```

View File

@@ -1,61 +0,0 @@
from langchain.document_loaders import OnlinePDFLoader
from langchain.vectorstores import Chroma
from langchain.embeddings import GPT4AllEmbeddings
from langchain import PromptTemplate
from langchain.llms import Ollama
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import RetrievalQA
import sys
import os
class SuppressStdout:
def __enter__(self):
self._original_stdout = sys.stdout
self._original_stderr = sys.stderr
sys.stdout = open(os.devnull, 'w')
sys.stderr = open(os.devnull, 'w')
def __exit__(self, exc_type, exc_val, exc_tb):
sys.stdout.close()
sys.stdout = self._original_stdout
sys.stderr = self._original_stderr
# load the pdf and split it into chunks
loader = OnlinePDFLoader("https://d18rn0p25nwr6d.cloudfront.net/CIK-0001813756/975b3e9b-268e-4798-a9e4-2a9a7c92dc10.pdf")
data = loader.load()
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)
with SuppressStdout():
vectorstore = Chroma.from_documents(documents=all_splits, embedding=GPT4AllEmbeddings())
while True:
query = input("\nQuery: ")
if query == "exit":
break
if query.strip() == "":
continue
# Prompt
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate(
input_variables=["context", "question"],
template=template,
)
llm = Ollama(model="llama3.2", callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]))
qa_chain = RetrievalQA.from_chain_type(
llm,
retriever=vectorstore.as_retriever(),
chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
)
result = qa_chain({"query": query})

View File

@@ -1,109 +0,0 @@
absl-py==1.4.0
aiohttp==3.8.5
aiosignal==1.3.1
anyio==3.7.1
astunparse==1.6.3
async-timeout==4.0.3
attrs==23.1.0
backoff==2.2.1
beautifulsoup4==4.12.2
bs4==0.0.1
cachetools==5.3.1
certifi==2023.7.22
cffi==1.15.1
chardet==5.2.0
charset-normalizer==3.2.0
Chroma==0.2.0
chroma-hnswlib==0.7.2
chromadb==0.4.5
click==8.1.6
coloredlogs==15.0.1
cryptography==41.0.3
dataclasses-json==0.5.14
fastapi==0.99.1
filetype==1.2.0
flatbuffers==23.5.26
frozenlist==1.4.0
gast==0.4.0
google-auth==2.22.0
google-auth-oauthlib==1.0.0
google-pasta==0.2.0
gpt4all==1.0.8
grpcio==1.57.0
h11==0.14.0
h5py==3.9.0
httptools==0.6.0
humanfriendly==10.0
idna==3.4
importlib-resources==6.0.1
joblib==1.3.2
keras==2.13.1
langchain==0.0.261
langsmith==0.0.21
libclang==16.0.6
lxml==4.9.3
Markdown==3.4.4
MarkupSafe==2.1.3
marshmallow==3.20.1
monotonic==1.6
mpmath==1.3.0
multidict==6.0.4
mypy-extensions==1.0.0
nltk==3.8.1
numexpr==2.8.5
numpy==1.24.3
oauthlib==3.2.2
onnxruntime==1.15.1
openapi-schema-pydantic==1.2.4
opt-einsum==3.3.0
overrides==7.4.0
packaging==23.1
pdf2image==1.16.3
pdfminer==20191125
pdfminer.six==20221105
Pillow==10.0.0
posthog==3.0.1
protobuf==4.24.0
pulsar-client==3.2.0
pyasn1==0.5.0
pyasn1-modules==0.3.0
pycparser==2.21
pycryptodome==3.18.0
pydantic==1.10.12
PyPika==0.48.9
python-dateutil==2.8.2
python-dotenv==1.0.0
python-magic==0.4.27
PyYAML==6.0.1
regex==2023.8.8
requests==2.31.0
requests-oauthlib==1.3.1
rsa==4.9
six==1.16.0
sniffio==1.3.0
soupsieve==2.4.1
SQLAlchemy==2.0.19
starlette==0.27.0
sympy==1.12
tabulate==0.9.0
tenacity==8.2.2
tensorboard==2.13.0
tensorboard-data-server==0.7.1
tensorflow==2.13.0
tensorflow-estimator==2.13.0
tensorflow-hub==0.14.0
tensorflow-macos==2.13.0
termcolor==2.3.0
tokenizers==0.13.3
tqdm==4.66.1
typing-inspect==0.9.0
typing_extensions==4.5.0
unstructured==0.9.2
urllib3==1.26.16
uvicorn==0.23.2
uvloop==0.17.0
watchfiles==0.19.0
websockets==11.0.3
Werkzeug==2.3.6
wrapt==1.15.0
yarl==1.9.2

View File

@@ -1,170 +0,0 @@
# OSX
.DS_STORE
# Models
models/
# Local Chroma db
.chroma/
db/
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

View File

@@ -1,201 +0,0 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

View File

@@ -1,91 +0,0 @@
# PrivateGPT with Llama 2 uncensored
https://github.com/ollama/ollama/assets/3325447/20cf8ec6-ff25-42c6-bdd8-9be594e3ce1b
> Note: this example is a slightly modified version of PrivateGPT using models such as Llama 2 Uncensored. All credit for PrivateGPT goes to Iván Martínez who is the creator of it, and you can find his GitHub repo [here](https://github.com/imartinez/privateGPT).
### Setup
Set up a virtual environment (optional):
```
python3 -m venv .venv
source .venv/bin/activate
```
Install the Python dependencies:
```shell
pip install -r requirements.txt
```
Pull the model you'd like to use:
```
ollama pull llama2-uncensored
```
### Getting WeWork's latest quarterly earnings report (10-Q)
```
mkdir source_documents
curl https://d18rn0p25nwr6d.cloudfront.net/CIK-0001813756/975b3e9b-268e-4798-a9e4-2a9a7c92dc10.pdf -o source_documents/wework.pdf
```
### Ingesting files
```shell
python ingest.py
```
Output should look like this:
```shell
Creating new vectorstore
Loading documents from source_documents
Loading new documents: 100%|██████████████████████| 1/1 [00:01<00:00, 1.73s/it]
Loaded 1 new documents from source_documents
Split into 90 chunks of text (max. 500 tokens each)
Creating embeddings. May take some minutes...
Using embedded DuckDB with persistence: data will be stored in: db
Ingestion complete! You can now run privateGPT.py to query your documents
```
### Ask questions
```shell
python privateGPT.py
Enter a query: How many locations does WeWork have?
> Answer (took 17.7 s.):
As of June 2023, WeWork has 777 locations worldwide, including 610 Consolidated Locations (as defined in the section entitled Key Performance Indicators).
```
### Try a different model:
```
ollama pull llama2:13b
MODEL=llama2:13b python privateGPT.py
```
## Adding more files
Put any and all your files into the `source_documents` directory
The supported extensions are:
- `.csv`: CSV,
- `.docx`: Word Document,
- `.doc`: Word Document,
- `.enex`: EverNote,
- `.eml`: Email,
- `.epub`: EPub,
- `.html`: HTML File,
- `.md`: Markdown,
- `.msg`: Outlook Message,
- `.odt`: Open Document Text,
- `.pdf`: Portable Document Format (PDF),
- `.pptx` : PowerPoint Document,
- `.ppt` : PowerPoint Document,
- `.txt`: Text file (UTF-8),

View File

@@ -1,11 +0,0 @@
import os
from chromadb.config import Settings
# Define the folder for storing database
PERSIST_DIRECTORY = os.environ.get('PERSIST_DIRECTORY', 'db')
# Define the Chroma settings
CHROMA_SETTINGS = Settings(
persist_directory=PERSIST_DIRECTORY,
anonymized_telemetry=False
)

View File

@@ -1,170 +0,0 @@
#!/usr/bin/env python3
import os
import glob
from typing import List
from multiprocessing import Pool
from tqdm import tqdm
from langchain.document_loaders import (
CSVLoader,
EverNoteLoader,
PyMuPDFLoader,
TextLoader,
UnstructuredEmailLoader,
UnstructuredEPubLoader,
UnstructuredHTMLLoader,
UnstructuredMarkdownLoader,
UnstructuredODTLoader,
UnstructuredPowerPointLoader,
UnstructuredWordDocumentLoader,
)
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.docstore.document import Document
from constants import CHROMA_SETTINGS
# Load environment variables
persist_directory = os.environ.get('PERSIST_DIRECTORY', 'db')
source_directory = os.environ.get('SOURCE_DIRECTORY', 'source_documents')
embeddings_model_name = os.environ.get('EMBEDDINGS_MODEL_NAME', 'all-MiniLM-L6-v2')
chunk_size = 500
chunk_overlap = 50
# Custom document loaders
class MyElmLoader(UnstructuredEmailLoader):
"""Wrapper to fallback to text/plain when default does not work"""
def load(self) -> List[Document]:
"""Wrapper adding fallback for elm without html"""
try:
try:
doc = UnstructuredEmailLoader.load(self)
except ValueError as e:
if 'text/html content not found in email' in str(e):
# Try plain text
self.unstructured_kwargs["content_source"]="text/plain"
doc = UnstructuredEmailLoader.load(self)
else:
raise
except Exception as e:
# Add file_path to exception message
raise type(e)(f"{self.file_path}: {e}") from e
return doc
# Map file extensions to document loaders and their arguments
LOADER_MAPPING = {
".csv": (CSVLoader, {}),
# ".docx": (Docx2txtLoader, {}),
".doc": (UnstructuredWordDocumentLoader, {}),
".docx": (UnstructuredWordDocumentLoader, {}),
".enex": (EverNoteLoader, {}),
".eml": (MyElmLoader, {}),
".epub": (UnstructuredEPubLoader, {}),
".html": (UnstructuredHTMLLoader, {}),
".md": (UnstructuredMarkdownLoader, {}),
".odt": (UnstructuredODTLoader, {}),
".pdf": (PyMuPDFLoader, {}),
".ppt": (UnstructuredPowerPointLoader, {}),
".pptx": (UnstructuredPowerPointLoader, {}),
".txt": (TextLoader, {"encoding": "utf8"}),
# Add more mappings for other file extensions and loaders as needed
}
def load_single_document(file_path: str) -> List[Document]:
if os.path.getsize(file_path) != 0:
filename, ext = os.path.splitext(file_path)
if ext in LOADER_MAPPING:
loader_class, loader_args = LOADER_MAPPING[ext]
try:
loader = loader_class(file_path, **loader_args)
if loader:
return loader.load()
except:
print(f"Corrupted file {file_path}. Ignoring it.")
else:
print(f"Unsupported file {file_path}. Ignoring it.")
else:
print(f"Empty file {file_path}. Ignoring it.")
def load_documents(source_dir: str, ignored_files: List[str] = []) -> List[Document]:
"""
Loads all documents from the source documents directory, ignoring specified files
"""
all_files = []
for ext in LOADER_MAPPING:
all_files.extend(
glob.glob(os.path.join(source_dir, f"**/*{ext}"), recursive=True)
)
filtered_files = [file_path for file_path in all_files if file_path not in ignored_files]
with Pool(processes=os.cpu_count()) as pool:
results = []
with tqdm(total=len(filtered_files), desc='Loading new documents', ncols=80) as pbar:
for i, docs in enumerate(pool.imap_unordered(load_single_document, filtered_files)):
if docs:
results.extend(docs)
pbar.update()
return results
def process_documents(ignored_files: List[str] = []) -> List[Document]:
"""
Load documents and split in chunks
"""
print(f"Loading documents from {source_directory}")
documents = load_documents(source_directory, ignored_files)
if not documents:
print("No new documents to load")
exit(0)
print(f"Loaded {len(documents)} new documents from {source_directory}")
text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
texts = text_splitter.split_documents(documents)
print(f"Split into {len(texts)} chunks of text (max. {chunk_size} tokens each)")
return texts
def does_vectorstore_exist(persist_directory: str) -> bool:
"""
Checks if vectorstore exists
"""
if os.path.exists(os.path.join(persist_directory, 'index')):
if os.path.exists(os.path.join(persist_directory, 'chroma-collections.parquet')) and os.path.exists(os.path.join(persist_directory, 'chroma-embeddings.parquet')):
list_index_files = glob.glob(os.path.join(persist_directory, 'index/*.bin'))
list_index_files += glob.glob(os.path.join(persist_directory, 'index/*.pkl'))
# At least 3 documents are needed in a working vectorstore
if len(list_index_files) > 3:
return True
return False
def main():
# Create embeddings
embeddings = HuggingFaceEmbeddings(model_name=embeddings_model_name)
if does_vectorstore_exist(persist_directory):
# Update and store locally vectorstore
print(f"Appending to existing vectorstore at {persist_directory}")
db = Chroma(persist_directory=persist_directory, embedding_function=embeddings, client_settings=CHROMA_SETTINGS)
collection = db.get()
texts = process_documents([metadata['source'] for metadata in collection['metadatas']])
print(f"Creating embeddings. May take some minutes...")
db.add_documents(texts)
else:
# Create and store locally vectorstore
print("Creating new vectorstore")
texts = process_documents()
print(f"Creating embeddings. May take some minutes...")
db = Chroma.from_documents(texts, embeddings, persist_directory=persist_directory)
db.persist()
db = None
print(f"Ingestion complete! You can now run privateGPT.py to query your documents")
if __name__ == "__main__":
main()

File diff suppressed because it is too large Load Diff

View File

@@ -1,74 +0,0 @@
#!/usr/bin/env python3
from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.vectorstores import Chroma
from langchain.llms import Ollama
import chromadb
import os
import argparse
import time
model = os.environ.get("MODEL", "llama2-uncensored")
# For embeddings model, the example uses a sentence-transformers model
# https://www.sbert.net/docs/pretrained_models.html
# "The all-mpnet-base-v2 model provides the best quality, while all-MiniLM-L6-v2 is 5 times faster and still offers good quality."
embeddings_model_name = os.environ.get("EMBEDDINGS_MODEL_NAME", "all-MiniLM-L6-v2")
persist_directory = os.environ.get("PERSIST_DIRECTORY", "db")
target_source_chunks = int(os.environ.get('TARGET_SOURCE_CHUNKS',4))
from constants import CHROMA_SETTINGS
def main():
# Parse the command line arguments
args = parse_arguments()
embeddings = HuggingFaceEmbeddings(model_name=embeddings_model_name)
db = Chroma(persist_directory=persist_directory, embedding_function=embeddings)
retriever = db.as_retriever(search_kwargs={"k": target_source_chunks})
# activate/deactivate the streaming StdOut callback for LLMs
callbacks = [] if args.mute_stream else [StreamingStdOutCallbackHandler()]
llm = Ollama(model=model, callbacks=callbacks)
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever, return_source_documents= not args.hide_source)
# Interactive questions and answers
while True:
query = input("\nEnter a query: ")
if query == "exit":
break
if query.strip() == "":
continue
# Get the answer from the chain
start = time.time()
res = qa(query)
answer, docs = res['result'], [] if args.hide_source else res['source_documents']
end = time.time()
# Print the result
print("\n\n> Question:")
print(query)
print(answer)
# Print the relevant sources used for the answer
for document in docs:
print("\n> " + document.metadata["source"] + ":")
print(document.page_content)
def parse_arguments():
parser = argparse.ArgumentParser(description='privateGPT: Ask questions to your documents without an internet connection, '
'using the power of LLMs.')
parser.add_argument("--hide-source", "-S", action='store_true',
help='Use this flag to disable printing of source documents used for answers.')
parser.add_argument("--mute-stream", "-M",
action='store_true',
help='Use this flag to disable the streaming StdOut callback for LLMs.')
return parser.parse_args()
if __name__ == "__main__":
main()

View File

@@ -1,26 +0,0 @@
[tool.poetry]
name = "privategpt"
version = "0.1.0"
description = ""
authors = ["Ivan Martinez <ivanmartit@gmail.com>"]
license = "Apache Version 2.0"
readme = "README.md"
[tool.poetry.dependencies]
python = "^3.10"
langchain = "0.0.261"
gpt4all = "^1.0.3"
chromadb = "^0.3.26"
PyMuPDF = "^1.22.5"
python-dotenv = "^1.0.0"
unstructured = "^0.8.0"
extract-msg = "^0.41.5"
tabulate = "^0.9.0"
pandoc = "^2.3"
pypandoc = "^1.11"
tqdm = "^4.65.0"
sentence-transformers = "^2.2.2"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

View File

@@ -1,15 +0,0 @@
langchain==0.0.274
gpt4all==1.0.8
chromadb==0.5.0
llama-cpp-python==0.1.81
urllib3==2.0.4
PyMuPDF==1.23.5
python-dotenv==1.0.0
unstructured==0.10.8
extract-msg==0.45.0
tabulate==0.9.0
pandoc==2.3
pypandoc==1.11
tqdm==4.66.1
sentence_transformers==2.2.2
numpy>=1.22.2 # not directly required, pinned by Snyk to avoid a vulnerability

View File

@@ -1,23 +0,0 @@
# LangChain Web Summarization
This example summarizes the website, [https://ollama.com/blog/run-llama2-uncensored-locally](https://ollama.com/blog/run-llama2-uncensored-locally)
## Running the Example
1. Ensure you have the `llama3.2` model installed:
```bash
ollama pull llama3.2
```
2. Install the Python Requirements.
```bash
pip install -r requirements.txt
```
3. Run the example:
```bash
python main.py
```

View File

@@ -1,12 +0,0 @@
from langchain_community.llms import Ollama
from langchain_community.document_loaders import WebBaseLoader
from langchain.chains.summarize import load_summarize_chain
loader = WebBaseLoader("https://ollama.com/blog/run-llama2-uncensored-locally")
docs = loader.load()
llm = Ollama(model="llama3.2")
chain = load_summarize_chain(llm, chain_type="stuff")
result = chain.invoke(docs)
print(result)

View File

@@ -1 +0,0 @@
langchain==0.0.259

View File

@@ -1,23 +0,0 @@
# LangChain
This example is a basic "hello world" of using LangChain with Ollama.
## Running the Example
1. Ensure you have the `llama3.2` model installed:
```bash
ollama pull llama3.2
```
2. Install the Python Requirements.
```bash
pip install -r requirements.txt
```
3. Run the example:
```bash
python main.py
```

View File

@@ -1,6 +0,0 @@
from langchain.llms import Ollama
input = input("What is your question?\n> ")
llm = Ollama(model="llama3.2")
res = llm.invoke(input)
print (res)

View File

@@ -1 +0,0 @@
langchain==0.0.259

View File

@@ -1,23 +0,0 @@
# LangChain
This example is a basic "hello world" of using LangChain with Ollama using Node.js and Typescript.
## Running the Example
1. Install the prerequisites:
```bash
npm install
```
2. Ensure the `mistral` model is available:
```bash
ollama pull mistral
```
3. Run the example:
```bash
npm start
```

View File

@@ -1,25 +0,0 @@
import { Ollama } from 'langchain/llms/ollama';
import * as readline from "readline";
async function main() {
const ollama = new Ollama({
model: 'mistral'
// other parameters can be found at https://js.langchain.com/docs/api/llms_ollama/classes/Ollama
});
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout,
});
rl.question("What is your question: \n", async (user_input) => {
const stream = await ollama.stream(user_input);
for await (const chunk of stream) {
process.stdout.write(chunk);
}
rl.close();
})
}
main();

View File

@@ -1,997 +0,0 @@
{
"name": "langchain-typescript-simple",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"dependencies": {
"langchain": "^0.0.165"
},
"devDependencies": {
"typescript": "^5.2.2"
}
},
"node_modules/@anthropic-ai/sdk": {
"version": "0.6.2",
"resolved": "https://registry.npmjs.org/@anthropic-ai/sdk/-/sdk-0.6.2.tgz",
"integrity": "sha512-fB9PUj9RFT+XjkL+E9Ol864ZIJi+1P8WnbHspN3N3/GK2uSzjd0cbVIKTGgf4v3N8MwaQu+UWnU7C4BG/fap/g==",
"dependencies": {
"@types/node": "^18.11.18",
"@types/node-fetch": "^2.6.4",
"abort-controller": "^3.0.0",
"agentkeepalive": "^4.2.1",
"digest-fetch": "^1.3.0",
"form-data-encoder": "1.7.2",
"formdata-node": "^4.3.2",
"node-fetch": "^2.6.7"
}
},
"node_modules/@types/node": {
"version": "18.18.4",
"resolved": "https://registry.npmjs.org/@types/node/-/node-18.18.4.tgz",
"integrity": "sha512-t3rNFBgJRugIhackit2mVcLfF6IRc0JE4oeizPQL8Zrm8n2WY/0wOdpOPhdtG0V9Q2TlW/axbF1MJ6z+Yj/kKQ=="
},
"node_modules/@types/node-fetch": {
"version": "2.6.6",
"resolved": "https://registry.npmjs.org/@types/node-fetch/-/node-fetch-2.6.6.tgz",
"integrity": "sha512-95X8guJYhfqiuVVhRFxVQcf4hW/2bCuoPwDasMf/531STFoNoWTT7YDnWdXHEZKqAGUigmpG31r2FE70LwnzJw==",
"dependencies": {
"@types/node": "*",
"form-data": "^4.0.0"
}
},
"node_modules/@types/retry": {
"version": "0.12.0",
"resolved": "https://registry.npmjs.org/@types/retry/-/retry-0.12.0.tgz",
"integrity": "sha512-wWKOClTTiizcZhXnPY4wikVAwmdYHp8q6DmC+EJUzAMsycb7HB32Kh9RN4+0gExjmPmZSAQjgURXIGATPegAvA=="
},
"node_modules/@types/uuid": {
"version": "9.0.5",
"resolved": "https://registry.npmjs.org/@types/uuid/-/uuid-9.0.5.tgz",
"integrity": "sha512-xfHdwa1FMJ082prjSJpoEI57GZITiQz10r3vEJCHa2khEFQjKy91aWKz6+zybzssCvXUwE1LQWgWVwZ4nYUvHQ=="
},
"node_modules/abort-controller": {
"version": "3.0.0",
"resolved": "https://registry.npmjs.org/abort-controller/-/abort-controller-3.0.0.tgz",
"integrity": "sha512-h8lQ8tacZYnR3vNQTgibj+tODHI5/+l06Au2Pcriv/Gmet0eaj4TwWH41sO9wnHDiQsEj19q0drzdWdeAHtweg==",
"dependencies": {
"event-target-shim": "^5.0.0"
},
"engines": {
"node": ">=6.5"
}
},
"node_modules/agentkeepalive": {
"version": "4.5.0",
"resolved": "https://registry.npmjs.org/agentkeepalive/-/agentkeepalive-4.5.0.tgz",
"integrity": "sha512-5GG/5IbQQpC9FpkRGsSvZI5QYeSCzlJHdpBQntCsuTOxhKD8lqKhrleg2Yi7yvMIf82Ycmmqln9U8V9qwEiJew==",
"dependencies": {
"humanize-ms": "^1.2.1"
},
"engines": {
"node": ">= 8.0.0"
}
},
"node_modules/ansi-styles": {
"version": "5.2.0",
"resolved": "https://registry.npmjs.org/ansi-styles/-/ansi-styles-5.2.0.tgz",
"integrity": "sha512-Cxwpt2SfTzTtXcfOlzGEee8O+c+MmUgGrNiBcXnuWxuFJHe6a5Hz7qwhwe5OgaSYI0IJvkLqWX1ASG+cJOkEiA==",
"engines": {
"node": ">=10"
},
"funding": {
"url": "https://github.com/chalk/ansi-styles?sponsor=1"
}
},
"node_modules/argparse": {
"version": "2.0.1",
"resolved": "https://registry.npmjs.org/argparse/-/argparse-2.0.1.tgz",
"integrity": "sha512-8+9WqebbFzpX9OR+Wa6O29asIogeRMzcGtAINdpMHHyAg10f05aSFVBbcEqGf/PXw1EjAZ+q2/bEBg3DvurK3Q=="
},
"node_modules/asynckit": {
"version": "0.4.0",
"resolved": "https://registry.npmjs.org/asynckit/-/asynckit-0.4.0.tgz",
"integrity": "sha512-Oei9OH4tRh0YqU3GxhX79dM/mwVgvbZJaSNaRk+bshkj0S5cfHcgYakreBjrHwatXKbz+IoIdYLxrKim2MjW0Q=="
},
"node_modules/base-64": {
"version": "0.1.0",
"resolved": "https://registry.npmjs.org/base-64/-/base-64-0.1.0.tgz",
"integrity": "sha512-Y5gU45svrR5tI2Vt/X9GPd3L0HNIKzGu202EjxrXMpuc2V2CiKgemAbUUsqYmZJvPtCXoUKjNZwBJzsNScUbXA=="
},
"node_modules/base64-js": {
"version": "1.5.1",
"resolved": "https://registry.npmjs.org/base64-js/-/base64-js-1.5.1.tgz",
"integrity": "sha512-AKpaYlHn8t4SVbOHCy+b5+KKgvR4vrsD8vbvrbiQJps7fKDTkjkDry6ji0rUJjC0kzbNePLwzxq8iypo41qeWA==",
"funding": [
{
"type": "github",
"url": "https://github.com/sponsors/feross"
},
{
"type": "patreon",
"url": "https://www.patreon.com/feross"
},
{
"type": "consulting",
"url": "https://feross.org/support"
}
]
},
"node_modules/binary-extensions": {
"version": "2.2.0",
"resolved": "https://registry.npmjs.org/binary-extensions/-/binary-extensions-2.2.0.tgz",
"integrity": "sha512-jDctJ/IVQbZoJykoeHbhXpOlNBqGNcwXJKJog42E5HDPUwQTSdjCHdihjj0DlnheQ7blbT6dHOafNAiS8ooQKA==",
"engines": {
"node": ">=8"
}
},
"node_modules/binary-search": {
"version": "1.3.6",
"resolved": "https://registry.npmjs.org/binary-search/-/binary-search-1.3.6.tgz",
"integrity": "sha512-nbE1WxOTTrUWIfsfZ4aHGYu5DOuNkbxGokjV6Z2kxfJK3uaAb8zNK1muzOeipoLHZjInT4Br88BHpzevc681xA=="
},
"node_modules/camelcase": {
"version": "6.3.0",
"resolved": "https://registry.npmjs.org/camelcase/-/camelcase-6.3.0.tgz",
"integrity": "sha512-Gmy6FhYlCY7uOElZUSbxo2UCDH8owEk996gkbrpsgGtrJLM3J7jGxl9Ic7Qwwj4ivOE5AWZWRMecDdF7hqGjFA==",
"engines": {
"node": ">=10"
},
"funding": {
"url": "https://github.com/sponsors/sindresorhus"
}
},
"node_modules/charenc": {
"version": "0.0.2",
"resolved": "https://registry.npmjs.org/charenc/-/charenc-0.0.2.tgz",
"integrity": "sha512-yrLQ/yVUFXkzg7EDQsPieE/53+0RlaWTs+wBrvW36cyilJ2SaDWfl4Yj7MtLTXleV9uEKefbAGUPv2/iWSooRA==",
"engines": {
"node": "*"
}
},
"node_modules/combined-stream": {
"version": "1.0.8",
"resolved": "https://registry.npmjs.org/combined-stream/-/combined-stream-1.0.8.tgz",
"integrity": "sha512-FQN4MRfuJeHf7cBbBMJFXhKSDq+2kAArBlmRBvcvFE5BB1HZKXtSFASDhdlz9zOYwxh8lDdnvmMOe/+5cdoEdg==",
"dependencies": {
"delayed-stream": "~1.0.0"
},
"engines": {
"node": ">= 0.8"
}
},
"node_modules/commander": {
"version": "10.0.1",
"resolved": "https://registry.npmjs.org/commander/-/commander-10.0.1.tgz",
"integrity": "sha512-y4Mg2tXshplEbSGzx7amzPwKKOCGuoSRP/CjEdwwk0FOGlUbq6lKuoyDZTNZkmxHdJtp54hdfY/JUrdL7Xfdug==",
"engines": {
"node": ">=14"
}
},
"node_modules/crypt": {
"version": "0.0.2",
"resolved": "https://registry.npmjs.org/crypt/-/crypt-0.0.2.tgz",
"integrity": "sha512-mCxBlsHFYh9C+HVpiEacem8FEBnMXgU9gy4zmNC+SXAZNB/1idgp/aulFJ4FgCi7GPEVbfyng092GqL2k2rmow==",
"engines": {
"node": "*"
}
},
"node_modules/decamelize": {
"version": "1.2.0",
"resolved": "https://registry.npmjs.org/decamelize/-/decamelize-1.2.0.tgz",
"integrity": "sha512-z2S+W9X73hAUUki+N+9Za2lBlun89zigOyGrsax+KUQ6wKW4ZoWpEYBkGhQjwAjjDCkWxhY0VKEhk8wzY7F5cA==",
"engines": {
"node": ">=0.10.0"
}
},
"node_modules/delayed-stream": {
"version": "1.0.0",
"resolved": "https://registry.npmjs.org/delayed-stream/-/delayed-stream-1.0.0.tgz",
"integrity": "sha512-ZySD7Nf91aLB0RxL4KGrKHBXl7Eds1DAmEdcoVawXnLD7SDhpNgtuII2aAkg7a7QS41jxPSZ17p4VdGnMHk3MQ==",
"engines": {
"node": ">=0.4.0"
}
},
"node_modules/digest-fetch": {
"version": "1.3.0",
"resolved": "https://registry.npmjs.org/digest-fetch/-/digest-fetch-1.3.0.tgz",
"integrity": "sha512-CGJuv6iKNM7QyZlM2T3sPAdZWd/p9zQiRNS9G+9COUCwzWFTs0Xp8NF5iePx7wtvhDykReiRRrSeNb4oMmB8lA==",
"dependencies": {
"base-64": "^0.1.0",
"md5": "^2.3.0"
}
},
"node_modules/event-target-shim": {
"version": "5.0.1",
"resolved": "https://registry.npmjs.org/event-target-shim/-/event-target-shim-5.0.1.tgz",
"integrity": "sha512-i/2XbnSz/uxRCU6+NdVJgKWDTM427+MqYbkQzD321DuCQJUqOuJKIA0IM2+W2xtYHdKOmZ4dR6fExsd4SXL+WQ==",
"engines": {
"node": ">=6"
}
},
"node_modules/eventemitter3": {
"version": "4.0.7",
"resolved": "https://registry.npmjs.org/eventemitter3/-/eventemitter3-4.0.7.tgz",
"integrity": "sha512-8guHBZCwKnFhYdHr2ysuRWErTwhoN2X8XELRlrRwpmfeY2jjuUN4taQMsULKUVo1K4DvZl+0pgfyoysHxvmvEw=="
},
"node_modules/expr-eval": {
"version": "2.0.2",
"resolved": "https://registry.npmjs.org/expr-eval/-/expr-eval-2.0.2.tgz",
"integrity": "sha512-4EMSHGOPSwAfBiibw3ndnP0AvjDWLsMvGOvWEZ2F96IGk0bIVdjQisOHxReSkE13mHcfbuCiXw+G4y0zv6N8Eg=="
},
"node_modules/flat": {
"version": "5.0.2",
"resolved": "https://registry.npmjs.org/flat/-/flat-5.0.2.tgz",
"integrity": "sha512-b6suED+5/3rTpUBdG1gupIl8MPFCAMA0QXwmljLhvCUKcUvdE4gWky9zpuGCcXHOsz4J9wPGNWq6OKpmIzz3hQ==",
"bin": {
"flat": "cli.js"
}
},
"node_modules/form-data": {
"version": "4.0.0",
"resolved": "https://registry.npmjs.org/form-data/-/form-data-4.0.0.tgz",
"integrity": "sha512-ETEklSGi5t0QMZuiXoA/Q6vcnxcLQP5vdugSpuAyi6SVGi2clPPp+xgEhuMaHC+zGgn31Kd235W35f7Hykkaww==",
"dependencies": {
"asynckit": "^0.4.0",
"combined-stream": "^1.0.8",
"mime-types": "^2.1.12"
},
"engines": {
"node": ">= 6"
}
},
"node_modules/form-data-encoder": {
"version": "1.7.2",
"resolved": "https://registry.npmjs.org/form-data-encoder/-/form-data-encoder-1.7.2.tgz",
"integrity": "sha512-qfqtYan3rxrnCk1VYaA4H+Ms9xdpPqvLZa6xmMgFvhO32x7/3J/ExcTd6qpxM0vH2GdMI+poehyBZvqfMTto8A=="
},
"node_modules/formdata-node": {
"version": "4.4.1",
"resolved": "https://registry.npmjs.org/formdata-node/-/formdata-node-4.4.1.tgz",
"integrity": "sha512-0iirZp3uVDjVGt9p49aTaqjk84TrglENEDuqfdlZQ1roC9CWlPk6Avf8EEnZNcAqPonwkG35x4n3ww/1THYAeQ==",
"dependencies": {
"node-domexception": "1.0.0",
"web-streams-polyfill": "4.0.0-beta.3"
},
"engines": {
"node": ">= 12.20"
}
},
"node_modules/humanize-ms": {
"version": "1.2.1",
"resolved": "https://registry.npmjs.org/humanize-ms/-/humanize-ms-1.2.1.tgz",
"integrity": "sha512-Fl70vYtsAFb/C06PTS9dZBo7ihau+Tu/DNCk/OyHhea07S+aeMWpFFkUaXRa8fI+ScZbEI8dfSxwY7gxZ9SAVQ==",
"dependencies": {
"ms": "^2.0.0"
}
},
"node_modules/is-any-array": {
"version": "2.0.1",
"resolved": "https://registry.npmjs.org/is-any-array/-/is-any-array-2.0.1.tgz",
"integrity": "sha512-UtilS7hLRu++wb/WBAw9bNuP1Eg04Ivn1vERJck8zJthEvXCBEBpGR/33u/xLKWEQf95803oalHrVDptcAvFdQ=="
},
"node_modules/is-buffer": {
"version": "1.1.6",
"resolved": "https://registry.npmjs.org/is-buffer/-/is-buffer-1.1.6.tgz",
"integrity": "sha512-NcdALwpXkTm5Zvvbk7owOUSvVvBKDgKP5/ewfXEznmQFfs4ZRmanOeKBTjRVjka3QFoN6XJ+9F3USqfHqTaU5w=="
},
"node_modules/js-tiktoken": {
"version": "1.0.7",
"resolved": "https://registry.npmjs.org/js-tiktoken/-/js-tiktoken-1.0.7.tgz",
"integrity": "sha512-biba8u/clw7iesNEWLOLwrNGoBP2lA+hTaBLs/D45pJdUPFXyxD6nhcDVtADChghv4GgyAiMKYMiRx7x6h7Biw==",
"dependencies": {
"base64-js": "^1.5.1"
}
},
"node_modules/js-yaml": {
"version": "4.1.0",
"resolved": "https://registry.npmjs.org/js-yaml/-/js-yaml-4.1.0.tgz",
"integrity": "sha512-wpxZs9NoxZaJESJGIZTyDEaYpl0FKSA+FB9aJiyemKhMwkxQg63h4T1KJgUGHpTqPDNRcmmYLugrRjJlBtWvRA==",
"dependencies": {
"argparse": "^2.0.1"
},
"bin": {
"js-yaml": "bin/js-yaml.js"
}
},
"node_modules/jsonpointer": {
"version": "5.0.1",
"resolved": "https://registry.npmjs.org/jsonpointer/-/jsonpointer-5.0.1.tgz",
"integrity": "sha512-p/nXbhSEcu3pZRdkW1OfJhpsVtW1gd4Wa1fnQc9YLiTfAjn0312eMKimbdIQzuZl9aa9xUGaRlP9T/CJE/ditQ==",
"engines": {
"node": ">=0.10.0"
}
},
"node_modules/langchain": {
"version": "0.0.165",
"resolved": "https://registry.npmjs.org/langchain/-/langchain-0.0.165.tgz",
"integrity": "sha512-CpbNpjwaE+9lzjdw+pZz0VgnRrFivEgr7CVp9dDaAb5JpaJAA4V2v6uQ9ZPN+TSqupTQ79HFn2sfyZVEl2EG7Q==",
"dependencies": {
"@anthropic-ai/sdk": "^0.6.2",
"ansi-styles": "^5.0.0",
"binary-extensions": "^2.2.0",
"camelcase": "6",
"decamelize": "^1.2.0",
"expr-eval": "^2.0.2",
"flat": "^5.0.2",
"js-tiktoken": "^1.0.7",
"js-yaml": "^4.1.0",
"jsonpointer": "^5.0.1",
"langchainhub": "~0.0.6",
"langsmith": "~0.0.31",
"ml-distance": "^4.0.0",
"object-hash": "^3.0.0",
"openai": "~4.4.0",
"openapi-types": "^12.1.3",
"p-queue": "^6.6.2",
"p-retry": "4",
"uuid": "^9.0.0",
"yaml": "^2.2.1",
"zod": "^3.22.3",
"zod-to-json-schema": "^3.20.4"
},
"engines": {
"node": ">=18"
},
"peerDependencies": {
"@aws-crypto/sha256-js": "^5.0.0",
"@aws-sdk/client-bedrock-runtime": "^3.422.0",
"@aws-sdk/client-dynamodb": "^3.310.0",
"@aws-sdk/client-kendra": "^3.352.0",
"@aws-sdk/client-lambda": "^3.310.0",
"@aws-sdk/client-s3": "^3.310.0",
"@aws-sdk/client-sagemaker-runtime": "^3.310.0",
"@aws-sdk/client-sfn": "^3.310.0",
"@aws-sdk/credential-provider-node": "^3.388.0",
"@azure/storage-blob": "^12.15.0",
"@clickhouse/client": "^0.0.14",
"@cloudflare/ai": "^1.0.12",
"@elastic/elasticsearch": "^8.4.0",
"@getmetal/metal-sdk": "*",
"@getzep/zep-js": "^0.7.0",
"@gomomento/sdk": "^1.23.0",
"@google-ai/generativelanguage": "^0.2.1",
"@google-cloud/storage": "^6.10.1",
"@huggingface/inference": "^1.5.1",
"@mozilla/readability": "*",
"@notionhq/client": "^2.2.10",
"@opensearch-project/opensearch": "*",
"@pinecone-database/pinecone": "^1.1.0",
"@planetscale/database": "^1.8.0",
"@qdrant/js-client-rest": "^1.2.0",
"@raycast/api": "^1.55.2",
"@smithy/eventstream-codec": "^2.0.5",
"@smithy/protocol-http": "^3.0.6",
"@smithy/signature-v4": "^2.0.10",
"@smithy/util-utf8": "^2.0.0",
"@supabase/postgrest-js": "^1.1.1",
"@supabase/supabase-js": "^2.10.0",
"@tensorflow-models/universal-sentence-encoder": "*",
"@tensorflow/tfjs-converter": "*",
"@tensorflow/tfjs-core": "*",
"@upstash/redis": "^1.20.6",
"@vercel/postgres": "^0.5.0",
"@writerai/writer-sdk": "^0.40.2",
"@xata.io/client": "^0.25.1",
"@xenova/transformers": "^2.5.4",
"@zilliz/milvus2-sdk-node": ">=2.2.7",
"apify-client": "^2.7.1",
"axios": "*",
"cassandra-driver": "^4.6.4",
"cheerio": "^1.0.0-rc.12",
"chromadb": "*",
"cohere-ai": ">=6.0.0",
"d3-dsv": "^2.0.0",
"epub2": "^3.0.1",
"faiss-node": "^0.3.0",
"fast-xml-parser": "^4.2.7",
"firebase-admin": "^11.9.0",
"google-auth-library": "^8.9.0",
"googleapis": "^126.0.1",
"hnswlib-node": "^1.4.2",
"html-to-text": "^9.0.5",
"ignore": "^5.2.0",
"ioredis": "^5.3.2",
"jsdom": "*",
"llmonitor": "*",
"lodash": "^4.17.21",
"mammoth": "*",
"mongodb": "^5.2.0",
"mysql2": "^3.3.3",
"neo4j-driver": "*",
"node-llama-cpp": "*",
"notion-to-md": "^3.1.0",
"pdf-parse": "1.1.1",
"peggy": "^3.0.2",
"pg": "^8.11.0",
"pg-copy-streams": "^6.0.5",
"pickleparser": "^0.1.0",
"playwright": "^1.32.1",
"portkey-ai": "^0.1.11",
"puppeteer": "^19.7.2",
"redis": "^4.6.4",
"replicate": "^0.18.0",
"sonix-speech-recognition": "^2.1.1",
"srt-parser-2": "^1.2.2",
"typeorm": "^0.3.12",
"typesense": "^1.5.3",
"usearch": "^1.1.1",
"vectordb": "^0.1.4",
"voy-search": "0.6.2",
"weaviate-ts-client": "^1.4.0",
"web-auth-library": "^1.0.3",
"youtube-transcript": "^1.0.6",
"youtubei.js": "^5.8.0"
},
"peerDependenciesMeta": {
"@aws-crypto/sha256-js": {
"optional": true
},
"@aws-sdk/client-bedrock-runtime": {
"optional": true
},
"@aws-sdk/client-dynamodb": {
"optional": true
},
"@aws-sdk/client-kendra": {
"optional": true
},
"@aws-sdk/client-lambda": {
"optional": true
},
"@aws-sdk/client-s3": {
"optional": true
},
"@aws-sdk/client-sagemaker-runtime": {
"optional": true
},
"@aws-sdk/client-sfn": {
"optional": true
},
"@aws-sdk/credential-provider-node": {
"optional": true
},
"@azure/storage-blob": {
"optional": true
},
"@clickhouse/client": {
"optional": true
},
"@cloudflare/ai": {
"optional": true
},
"@elastic/elasticsearch": {
"optional": true
},
"@getmetal/metal-sdk": {
"optional": true
},
"@getzep/zep-js": {
"optional": true
},
"@gomomento/sdk": {
"optional": true
},
"@google-ai/generativelanguage": {
"optional": true
},
"@google-cloud/storage": {
"optional": true
},
"@huggingface/inference": {
"optional": true
},
"@mozilla/readability": {
"optional": true
},
"@notionhq/client": {
"optional": true
},
"@opensearch-project/opensearch": {
"optional": true
},
"@pinecone-database/pinecone": {
"optional": true
},
"@planetscale/database": {
"optional": true
},
"@qdrant/js-client-rest": {
"optional": true
},
"@raycast/api": {
"optional": true
},
"@smithy/eventstream-codec": {
"optional": true
},
"@smithy/protocol-http": {
"optional": true
},
"@smithy/signature-v4": {
"optional": true
},
"@smithy/util-utf8": {
"optional": true
},
"@supabase/postgrest-js": {
"optional": true
},
"@supabase/supabase-js": {
"optional": true
},
"@tensorflow-models/universal-sentence-encoder": {
"optional": true
},
"@tensorflow/tfjs-converter": {
"optional": true
},
"@tensorflow/tfjs-core": {
"optional": true
},
"@upstash/redis": {
"optional": true
},
"@vercel/postgres": {
"optional": true
},
"@writerai/writer-sdk": {
"optional": true
},
"@xata.io/client": {
"optional": true
},
"@xenova/transformers": {
"optional": true
},
"@zilliz/milvus2-sdk-node": {
"optional": true
},
"apify-client": {
"optional": true
},
"axios": {
"optional": true
},
"cassandra-driver": {
"optional": true
},
"cheerio": {
"optional": true
},
"chromadb": {
"optional": true
},
"cohere-ai": {
"optional": true
},
"d3-dsv": {
"optional": true
},
"epub2": {
"optional": true
},
"faiss-node": {
"optional": true
},
"fast-xml-parser": {
"optional": true
},
"firebase-admin": {
"optional": true
},
"google-auth-library": {
"optional": true
},
"googleapis": {
"optional": true
},
"hnswlib-node": {
"optional": true
},
"html-to-text": {
"optional": true
},
"ignore": {
"optional": true
},
"ioredis": {
"optional": true
},
"jsdom": {
"optional": true
},
"llmonitor": {
"optional": true
},
"lodash": {
"optional": true
},
"mammoth": {
"optional": true
},
"mongodb": {
"optional": true
},
"mysql2": {
"optional": true
},
"neo4j-driver": {
"optional": true
},
"node-llama-cpp": {
"optional": true
},
"notion-to-md": {
"optional": true
},
"pdf-parse": {
"optional": true
},
"peggy": {
"optional": true
},
"pg": {
"optional": true
},
"pg-copy-streams": {
"optional": true
},
"pickleparser": {
"optional": true
},
"playwright": {
"optional": true
},
"portkey-ai": {
"optional": true
},
"puppeteer": {
"optional": true
},
"redis": {
"optional": true
},
"replicate": {
"optional": true
},
"sonix-speech-recognition": {
"optional": true
},
"srt-parser-2": {
"optional": true
},
"typeorm": {
"optional": true
},
"typesense": {
"optional": true
},
"usearch": {
"optional": true
},
"vectordb": {
"optional": true
},
"voy-search": {
"optional": true
},
"weaviate-ts-client": {
"optional": true
},
"web-auth-library": {
"optional": true
},
"youtube-transcript": {
"optional": true
},
"youtubei.js": {
"optional": true
}
}
},
"node_modules/langchainhub": {
"version": "0.0.6",
"resolved": "https://registry.npmjs.org/langchainhub/-/langchainhub-0.0.6.tgz",
"integrity": "sha512-SW6105T+YP1cTe0yMf//7kyshCgvCTyFBMTgH2H3s9rTAR4e+78DA/BBrUL/Mt4Q5eMWui7iGuAYb3pgGsdQ9w=="
},
"node_modules/langsmith": {
"version": "0.0.42",
"resolved": "https://registry.npmjs.org/langsmith/-/langsmith-0.0.42.tgz",
"integrity": "sha512-sFuN+e7E+pPBIRaRgFqZh/BRBWNHTZNAwi6uj4kydQawooCZYoJmM5snOkiQrhVSvAhgu6xFhLvmfvkPcKzD7w==",
"dependencies": {
"@types/uuid": "^9.0.1",
"commander": "^10.0.1",
"p-queue": "^6.6.2",
"p-retry": "4",
"uuid": "^9.0.0"
},
"bin": {
"langsmith": "dist/cli/main.cjs"
}
},
"node_modules/md5": {
"version": "2.3.0",
"resolved": "https://registry.npmjs.org/md5/-/md5-2.3.0.tgz",
"integrity": "sha512-T1GITYmFaKuO91vxyoQMFETst+O71VUPEU3ze5GNzDm0OWdP8v1ziTaAEPUr/3kLsY3Sftgz242A1SetQiDL7g==",
"dependencies": {
"charenc": "0.0.2",
"crypt": "0.0.2",
"is-buffer": "~1.1.6"
}
},
"node_modules/mime-db": {
"version": "1.52.0",
"resolved": "https://registry.npmjs.org/mime-db/-/mime-db-1.52.0.tgz",
"integrity": "sha512-sPU4uV7dYlvtWJxwwxHD0PuihVNiE7TyAbQ5SWxDCB9mUYvOgroQOwYQQOKPJ8CIbE+1ETVlOoK1UC2nU3gYvg==",
"engines": {
"node": ">= 0.6"
}
},
"node_modules/mime-types": {
"version": "2.1.35",
"resolved": "https://registry.npmjs.org/mime-types/-/mime-types-2.1.35.tgz",
"integrity": "sha512-ZDY+bPm5zTTF+YpCrAU9nK0UgICYPT0QtT1NZWFv4s++TNkcgVaT0g6+4R2uI4MjQjzysHB1zxuWL50hzaeXiw==",
"dependencies": {
"mime-db": "1.52.0"
},
"engines": {
"node": ">= 0.6"
}
},
"node_modules/ml-array-mean": {
"version": "1.1.6",
"resolved": "https://registry.npmjs.org/ml-array-mean/-/ml-array-mean-1.1.6.tgz",
"integrity": "sha512-MIdf7Zc8HznwIisyiJGRH9tRigg3Yf4FldW8DxKxpCCv/g5CafTw0RRu51nojVEOXuCQC7DRVVu5c7XXO/5joQ==",
"dependencies": {
"ml-array-sum": "^1.1.6"
}
},
"node_modules/ml-array-sum": {
"version": "1.1.6",
"resolved": "https://registry.npmjs.org/ml-array-sum/-/ml-array-sum-1.1.6.tgz",
"integrity": "sha512-29mAh2GwH7ZmiRnup4UyibQZB9+ZLyMShvt4cH4eTK+cL2oEMIZFnSyB3SS8MlsTh6q/w/yh48KmqLxmovN4Dw==",
"dependencies": {
"is-any-array": "^2.0.0"
}
},
"node_modules/ml-distance": {
"version": "4.0.1",
"resolved": "https://registry.npmjs.org/ml-distance/-/ml-distance-4.0.1.tgz",
"integrity": "sha512-feZ5ziXs01zhyFUUUeZV5hwc0f5JW0Sh0ckU1koZe/wdVkJdGxcP06KNQuF0WBTj8FttQUzcvQcpcrOp/XrlEw==",
"dependencies": {
"ml-array-mean": "^1.1.6",
"ml-distance-euclidean": "^2.0.0",
"ml-tree-similarity": "^1.0.0"
}
},
"node_modules/ml-distance-euclidean": {
"version": "2.0.0",
"resolved": "https://registry.npmjs.org/ml-distance-euclidean/-/ml-distance-euclidean-2.0.0.tgz",
"integrity": "sha512-yC9/2o8QF0A3m/0IXqCTXCzz2pNEzvmcE/9HFKOZGnTjatvBbsn4lWYJkxENkA4Ug2fnYl7PXQxnPi21sgMy/Q=="
},
"node_modules/ml-tree-similarity": {
"version": "1.0.0",
"resolved": "https://registry.npmjs.org/ml-tree-similarity/-/ml-tree-similarity-1.0.0.tgz",
"integrity": "sha512-XJUyYqjSuUQkNQHMscr6tcjldsOoAekxADTplt40QKfwW6nd++1wHWV9AArl0Zvw/TIHgNaZZNvr8QGvE8wLRg==",
"dependencies": {
"binary-search": "^1.3.5",
"num-sort": "^2.0.0"
}
},
"node_modules/ms": {
"version": "2.1.3",
"resolved": "https://registry.npmjs.org/ms/-/ms-2.1.3.tgz",
"integrity": "sha512-6FlzubTLZG3J2a/NVCAleEhjzq5oxgHyaCU9yYXvcLsvoVaHJq/s5xXI6/XXP6tz7R9xAOtHnSO/tXtF3WRTlA=="
},
"node_modules/node-domexception": {
"version": "1.0.0",
"resolved": "https://registry.npmjs.org/node-domexception/-/node-domexception-1.0.0.tgz",
"integrity": "sha512-/jKZoMpw0F8GRwl4/eLROPA3cfcXtLApP0QzLmUT/HuPCZWyB7IY9ZrMeKw2O/nFIqPQB3PVM9aYm0F312AXDQ==",
"funding": [
{
"type": "github",
"url": "https://github.com/sponsors/jimmywarting"
},
{
"type": "github",
"url": "https://paypal.me/jimmywarting"
}
],
"engines": {
"node": ">=10.5.0"
}
},
"node_modules/node-fetch": {
"version": "2.7.0",
"resolved": "https://registry.npmjs.org/node-fetch/-/node-fetch-2.7.0.tgz",
"integrity": "sha512-c4FRfUm/dbcWZ7U+1Wq0AwCyFL+3nt2bEw05wfxSz+DWpWsitgmSgYmy2dQdWyKC1694ELPqMs/YzUSNozLt8A==",
"dependencies": {
"whatwg-url": "^5.0.0"
},
"engines": {
"node": "4.x || >=6.0.0"
},
"peerDependencies": {
"encoding": "^0.1.0"
},
"peerDependenciesMeta": {
"encoding": {
"optional": true
}
}
},
"node_modules/num-sort": {
"version": "2.1.0",
"resolved": "https://registry.npmjs.org/num-sort/-/num-sort-2.1.0.tgz",
"integrity": "sha512-1MQz1Ed8z2yckoBeSfkQHHO9K1yDRxxtotKSJ9yvcTUUxSvfvzEq5GwBrjjHEpMlq/k5gvXdmJ1SbYxWtpNoVg==",
"engines": {
"node": ">=8"
},
"funding": {
"url": "https://github.com/sponsors/sindresorhus"
}
},
"node_modules/object-hash": {
"version": "3.0.0",
"resolved": "https://registry.npmjs.org/object-hash/-/object-hash-3.0.0.tgz",
"integrity": "sha512-RSn9F68PjH9HqtltsSnqYC1XXoWe9Bju5+213R98cNGttag9q9yAOTzdbsqvIa7aNm5WffBZFpWYr2aWrklWAw==",
"engines": {
"node": ">= 6"
}
},
"node_modules/openai": {
"version": "4.4.0",
"resolved": "https://registry.npmjs.org/openai/-/openai-4.4.0.tgz",
"integrity": "sha512-JN0t628Kh95T0IrXl0HdBqnlJg+4Vq0Bnh55tio+dfCnyzHvMLiWyCM9m726MAJD2YkDU4/8RQB6rNbEq9ct2w==",
"dependencies": {
"@types/node": "^18.11.18",
"@types/node-fetch": "^2.6.4",
"abort-controller": "^3.0.0",
"agentkeepalive": "^4.2.1",
"digest-fetch": "^1.3.0",
"form-data-encoder": "1.7.2",
"formdata-node": "^4.3.2",
"node-fetch": "^2.6.7"
},
"bin": {
"openai": "bin/cli"
}
},
"node_modules/openapi-types": {
"version": "12.1.3",
"resolved": "https://registry.npmjs.org/openapi-types/-/openapi-types-12.1.3.tgz",
"integrity": "sha512-N4YtSYJqghVu4iek2ZUvcN/0aqH1kRDuNqzcycDxhOUpg7GdvLa2F3DgS6yBNhInhv2r/6I0Flkn7CqL8+nIcw=="
},
"node_modules/p-finally": {
"version": "1.0.0",
"resolved": "https://registry.npmjs.org/p-finally/-/p-finally-1.0.0.tgz",
"integrity": "sha512-LICb2p9CB7FS+0eR1oqWnHhp0FljGLZCWBE9aix0Uye9W8LTQPwMTYVGWQWIw9RdQiDg4+epXQODwIYJtSJaow==",
"engines": {
"node": ">=4"
}
},
"node_modules/p-queue": {
"version": "6.6.2",
"resolved": "https://registry.npmjs.org/p-queue/-/p-queue-6.6.2.tgz",
"integrity": "sha512-RwFpb72c/BhQLEXIZ5K2e+AhgNVmIejGlTgiB9MzZ0e93GRvqZ7uSi0dvRF7/XIXDeNkra2fNHBxTyPDGySpjQ==",
"dependencies": {
"eventemitter3": "^4.0.4",
"p-timeout": "^3.2.0"
},
"engines": {
"node": ">=8"
},
"funding": {
"url": "https://github.com/sponsors/sindresorhus"
}
},
"node_modules/p-retry": {
"version": "4.6.2",
"resolved": "https://registry.npmjs.org/p-retry/-/p-retry-4.6.2.tgz",
"integrity": "sha512-312Id396EbJdvRONlngUx0NydfrIQ5lsYu0znKVUzVvArzEIt08V1qhtyESbGVd1FGX7UKtiFp5uwKZdM8wIuQ==",
"dependencies": {
"@types/retry": "0.12.0",
"retry": "^0.13.1"
},
"engines": {
"node": ">=8"
}
},
"node_modules/p-timeout": {
"version": "3.2.0",
"resolved": "https://registry.npmjs.org/p-timeout/-/p-timeout-3.2.0.tgz",
"integrity": "sha512-rhIwUycgwwKcP9yTOOFK/AKsAopjjCakVqLHePO3CC6Mir1Z99xT+R63jZxAT5lFZLa2inS5h+ZS2GvR99/FBg==",
"dependencies": {
"p-finally": "^1.0.0"
},
"engines": {
"node": ">=8"
}
},
"node_modules/retry": {
"version": "0.13.1",
"resolved": "https://registry.npmjs.org/retry/-/retry-0.13.1.tgz",
"integrity": "sha512-XQBQ3I8W1Cge0Seh+6gjj03LbmRFWuoszgK9ooCpwYIrhhoO80pfq4cUkU5DkknwfOfFteRwlZ56PYOGYyFWdg==",
"engines": {
"node": ">= 4"
}
},
"node_modules/tr46": {
"version": "0.0.3",
"resolved": "https://registry.npmjs.org/tr46/-/tr46-0.0.3.tgz",
"integrity": "sha512-N3WMsuqV66lT30CrXNbEjx4GEwlow3v6rr4mCcv6prnfwhS01rkgyFdjPNBYd9br7LpXV1+Emh01fHnq2Gdgrw=="
},
"node_modules/typescript": {
"version": "5.2.2",
"resolved": "https://registry.npmjs.org/typescript/-/typescript-5.2.2.tgz",
"integrity": "sha512-mI4WrpHsbCIcwT9cF4FZvr80QUeKvsUsUvKDoR+X/7XHQH98xYD8YHZg7ANtz2GtZt/CBq2QJ0thkGJMHfqc1w==",
"dev": true,
"bin": {
"tsc": "bin/tsc",
"tsserver": "bin/tsserver"
},
"engines": {
"node": ">=14.17"
}
},
"node_modules/uuid": {
"version": "9.0.1",
"resolved": "https://registry.npmjs.org/uuid/-/uuid-9.0.1.tgz",
"integrity": "sha512-b+1eJOlsR9K8HJpow9Ok3fiWOWSIcIzXodvv0rQjVoOVNpWMpxf1wZNpt4y9h10odCNrqnYp1OBzRktckBe3sA==",
"funding": [
"https://github.com/sponsors/broofa",
"https://github.com/sponsors/ctavan"
],
"bin": {
"uuid": "dist/bin/uuid"
}
},
"node_modules/web-streams-polyfill": {
"version": "4.0.0-beta.3",
"resolved": "https://registry.npmjs.org/web-streams-polyfill/-/web-streams-polyfill-4.0.0-beta.3.tgz",
"integrity": "sha512-QW95TCTaHmsYfHDybGMwO5IJIM93I/6vTRk+daHTWFPhwh+C8Cg7j7XyKrwrj8Ib6vYXe0ocYNrmzY4xAAN6ug==",
"engines": {
"node": ">= 14"
}
},
"node_modules/webidl-conversions": {
"version": "3.0.1",
"resolved": "https://registry.npmjs.org/webidl-conversions/-/webidl-conversions-3.0.1.tgz",
"integrity": "sha512-2JAn3z8AR6rjK8Sm8orRC0h/bcl/DqL7tRPdGZ4I1CjdF+EaMLmYxBHyXuKL849eucPFhvBoxMsflfOb8kxaeQ=="
},
"node_modules/whatwg-url": {
"version": "5.0.0",
"resolved": "https://registry.npmjs.org/whatwg-url/-/whatwg-url-5.0.0.tgz",
"integrity": "sha512-saE57nupxk6v3HY35+jzBwYa0rKSy0XR8JSxZPwgLr7ys0IBzhGviA1/TUGJLmSVqs8pb9AnvICXEuOHLprYTw==",
"dependencies": {
"tr46": "~0.0.3",
"webidl-conversions": "^3.0.0"
}
},
"node_modules/yaml": {
"version": "2.3.2",
"resolved": "https://registry.npmjs.org/yaml/-/yaml-2.3.2.tgz",
"integrity": "sha512-N/lyzTPaJasoDmfV7YTrYCI0G/3ivm/9wdG0aHuheKowWQwGTsK0Eoiw6utmzAnI6pkJa0DUVygvp3spqqEKXg==",
"engines": {
"node": ">= 14"
}
},
"node_modules/zod": {
"version": "3.22.4",
"resolved": "https://registry.npmjs.org/zod/-/zod-3.22.4.tgz",
"integrity": "sha512-iC+8Io04lddc+mVqQ9AZ7OQ2MrUKGN+oIQyq1vemgt46jwCwLfhq7/pwnBnNXXXZb8VTVLKwp9EDkx+ryxIWmg==",
"funding": {
"url": "https://github.com/sponsors/colinhacks"
}
},
"node_modules/zod-to-json-schema": {
"version": "3.21.4",
"resolved": "https://registry.npmjs.org/zod-to-json-schema/-/zod-to-json-schema-3.21.4.tgz",
"integrity": "sha512-fjUZh4nQ1s6HMccgIeE0VP4QG/YRGPmyjO9sAh890aQKPEk3nqbfUXhMFaC+Dr5KvYBm8BCyvfpZf2jY9aGSsw==",
"peerDependencies": {
"zod": "^3.21.4"
}
}
}
}

View File

@@ -1,13 +0,0 @@
{
"scripts": {
"start": "tsx main.ts"
},
"devDependencies": {
"tsx": "^4.6.2",
"typescript": "^5.3.3"
},
"dependencies": {
"langchain": "^0.0.165",
"readline": "^1.3.0"
}
}

View File

@@ -1,5 +0,0 @@
FROM llama3.2
PARAMETER temperature 1
SYSTEM """
You are Mario from super mario bros, acting as an assistant.
"""

Binary file not shown.

Before

Width:  |  Height:  |  Size: 446 KiB

View File

@@ -1,43 +0,0 @@
<img src="logo.png" alt="image of Italian plumber" height="200"/>
# Example character: Mario
This example shows how to create a basic character using Llama 3.2 as the base model.
To run this example:
1. Download the Modelfile
2. `ollama pull llama3.2` to get the base model used in the model file.
3. `ollama create NAME -f ./Modelfile`
4. `ollama run NAME`
Ask it some questions like "Who are you?" or "Is Peach in trouble again?"
## Editing this file
What the model file looks like:
```
FROM llama3.2
PARAMETER temperature 1
SYSTEM """
You are Mario from Super Mario Bros, acting as an assistant.
"""
```
What if you want to change its behaviour?
- Try changing the prompt
- Try changing the parameters [Docs](https://github.com/ollama/ollama/blob/main/docs/modelfile.md)
- Try changing the model (e.g. An uncensored model by `FROM wizard-vicuna` this is the wizard-vicuna uncensored model )
Once the changes are made,
1. `ollama create NAME -f ./Modelfile`
2. `ollama run NAME`
3. Iterate until you are happy with the results.
Notes:
- This example is for research purposes only. There is no affiliation with any entity.
- When using an uncensored model, please be aware that it may generate offensive content.

View File

@@ -1,20 +0,0 @@
FROM mistral
SYSTEM """
You are an experienced Devops engineer focused on docker. When given specifications for a particular need or application you know the best way to host that within a docker container. For instance if someone tells you they want an nginx server to host files located at /web you will answer as follows
---start
FROM nginx:alpine
COPY /myweb /usr/share/nginx/html
EXPOSE 80
---end
Notice that the answer you should give is just the contents of the dockerfile with no explanation and there are three dashes and the word start at the beginning and 3 dashes and the word end. The full output can be piped into a file and run as is. Here is another example. The user will ask to launch a Postgres server with a password of abc123. And the response should be
---start
FROM postgres:latest
ENV POSTGRES_PASSWORD=abc123
EXPOSE 5432
---end
Again it's just the contents of the dockerfile and nothing else.
"""

View File

@@ -1,31 +0,0 @@
# DockerIt
DockerIt is a tool to help you build and run your application in a Docker container. It consists of a model that defines the system prompt and model weights to use, along with a python script to then build the container and run the image automatically.
## Running the Example
1. Ensure you have the `mattw/dockerit` model installed:
```bash
ollama pull mattw/dockerit
```
2. Make sure Docker is running on your machine.
3. Install the Python Requirements.
```bash
pip install -r requirements.txt
```
4. Run the example:
```bash
python dockerit.py "simple postgres server with admin password set to 123"
```
5. Enter the name you would like to use for your container image.
## Caveats
This is a simple example. It's assuming the Dockerfile content generated is going to work. In many cases, even with simple web servers, it fails when trying to copy files that don't exist. It's simply an example of what you could possibly do.

View File

@@ -1,17 +0,0 @@
import requests, json, docker, io, sys
inputDescription = " ".join(sys.argv[1:])
imageName = input("Enter the name of the image: ")
client = docker.from_env()
s = requests.Session()
output=""
with s.post('http://localhost:11434/api/generate', json={'model': 'mattw/dockerit', 'prompt': inputDescription}, stream=True) as r:
for line in r.iter_lines():
if line:
j = json.loads(line)
if "response" in j:
output = output +j["response"]
output = output[output.find("---start")+9:output.find("---end")-1]
f = io.BytesIO(bytes(output, 'utf-8'))
client.images.build(fileobj=f, tag=imageName)
container = client.containers.run(imageName, detach=True)
print("Container named", container.name, " started with id: ",container.id)

View File

@@ -1 +0,0 @@
docker

View File

@@ -1,93 +0,0 @@
# RAG Hallucination Checker using Bespoke-Minicheck
This example allows the user to ask questions related to a document, which can be specified via an article url. Relevant chunks are retrieved from the document and given to `llama3.2` as context to answer the question. Then each sentence in the answer is checked against the retrieved chunks using `bespoke-minicheck` to ensure that the answer does not contain hallucinations.
## Running the Example
1. Ensure `all-minilm` (embedding) `llama3.2` (chat) and `bespoke-minicheck` (check) models installed:
```bash
ollama pull all-minilm
ollama pull llama3.2
ollama pull bespoke-minicheck
```
2. Install the dependencies.
```bash
pip install -r requirements.txt
```
3. Run the example:
```bash
python main.py
```
## Expected Output
```text
Enter the URL of an article you want to chat with, or press Enter for default example:
Loaded, chunked, and embedded text from https://www.theverge.com/2024/9/12/24242439/openai-o1-model-reasoning-strawberry-chatgpt.
Enter your question or type quit: Who is the CEO of openai?
Retrieved chunks:
OpenAI is releasing a new model called o1 , the first in a planned series of “ reasoning ” models that have been trained to answer more complex questions , faster than a human can . It s being released alongside o1-mini , a smaller , cheaper version . And yes , if you re steeped in AI rumors : this is , in fact , the extremely hyped Strawberry model . For OpenAI , o1 represents a step toward its broader goal of human-like artificial intelligence .
OpenAI is releasing a new model called o1 , the first in a planned series of “ reasoning ” models that have been trained to answer more complex questions , faster than a human can . It s being released alongside o1-mini , a smaller , cheaper version . And yes , if you re steeped in AI rumors : this is , in fact , the extremely hyped Strawberry model . For OpenAI , o1 represents a step toward its broader goal of human-like artificial intelligence . More practically , it does a better job at writing code and solving multistep problems than previous models . But it s also more expensive and slower to use than GPT-4o . OpenAI is calling this release of o1 a “ preview ” to emphasize how nascent it is . ChatGPT Plus and Team users get access to both o1-preview and o1-mini starting today , while Enterprise and Edu users will get access early next week .
More practically , it does a better job at writing code and solving multistep problems than previous models . But it s also more expensive and slower to use than GPT-4o . OpenAI is calling this release of o1 a “ preview ” to emphasize how nascent it is . ChatGPT Plus and Team users get access to both o1-preview and o1-mini starting today , while Enterprise and Edu users will get access early next week . OpenAI says it plans to bring o1-mini access to all the free users of ChatGPT but hasn t set a release date yet . Developer access to o1 is really expensive : In the API , o1-preview is $ 15 per 1 million input tokens , or chunks of text parsed by the model , and $ 60 per 1 million output tokens . For comparison , GPT-4o costs $ 5 per 1 million input tokens and $ 15 per 1 million output tokens .
OpenAI says it plans to bring o1-mini access to all the free users of ChatGPT but hasn t set a release date yet . Developer access to o1 is really expensive : In the API , o1-preview is $ 15 per 1 million input tokens , or chunks of text parsed by the model , and $ 60 per 1 million output tokens . For comparison , GPT-4o costs $ 5 per 1 million input tokens and $ 15 per 1 million output tokens . The training behind o1 is fundamentally different from its predecessors , OpenAI s research lead , Jerry Tworek , tells me , though the company is being vague about the exact details . He says o1 “ has been trained using a completely new optimization algorithm and a new training dataset specifically tailored for it. ” Image : OpenAI OpenAI taught previous GPT models to mimic patterns from its training data .
LLM Answer:
The text does not mention the CEO of OpenAI. It only discusses the release of a new model called o1 and some details about it, but does not provide information on the company's leadership.
LLM Claim: The text does not mention the CEO of OpenAI.
Is this claim supported by the context according to bespoke-minicheck? Yes
LLM Claim: It only discusses the release of a new model called o1 and some details about it, but does not provide information on the company's leadership.
Is this claim supported by the context according to bespoke-minicheck? No
```
The second claim is unsupported since the text mentions the research lead.
Another tricky example:
```text
Enter your question or type quit: what sets o1 apart from gpt-4o?
Retrieved chunks:
OpenAI says it plans to bring o1-mini access to all the free users of ChatGPT but hasn t set a release date yet . Developer access to o1 is really expensive : In the API , o1-preview is $ 15 per 1 million input tokens , or chunks of text parsed by the model , and $ 60 per 1 million output tokens . For comparison , GPT-4o costs $ 5 per 1 million input tokens and $ 15 per 1 million output tokens . The training behind o1 is fundamentally different from its predecessors , OpenAI s research lead , Jerry Tworek , tells me , though the company is being vague about the exact details . He says o1 “ has been trained using a completely new optimization algorithm and a new training dataset specifically tailored for it. ” Image : OpenAI OpenAI taught previous GPT models to mimic patterns from its training data .
He says OpenAI also tested o1 against a qualifying exam for the International Mathematics Olympiad , and while GPT-4o only correctly solved only 13 percent of problems , o1 scored 83 percent . “ We can t say we solved hallucinations ” In online programming contests known as Codeforces competitions , this new model reached the 89th percentile of participants , and OpenAI claims the next update of this model will perform “ similarly to PhD students on challenging benchmark tasks in physics , chemistry and biology. ” At the same time , o1 is not as capable as GPT-4o in a lot of areas . It doesn t do as well on factual knowledge about the world .
More practically , it does a better job at writing code and solving multistep problems than previous models . But it s also more expensive and slower to use than GPT-4o . OpenAI is calling this release of o1 a “ preview ” to emphasize how nascent it is . ChatGPT Plus and Team users get access to both o1-preview and o1-mini starting today , while Enterprise and Edu users will get access early next week . OpenAI says it plans to bring o1-mini access to all the free users of ChatGPT but hasn t set a release date yet . Developer access to o1 is really expensive : In the API , o1-preview is $ 15 per 1 million input tokens , or chunks of text parsed by the model , and $ 60 per 1 million output tokens . For comparison , GPT-4o costs $ 5 per 1 million input tokens and $ 15 per 1 million output tokens .
“ We can t say we solved hallucinations ” In online programming contests known as Codeforces competitions , this new model reached the 89th percentile of participants , and OpenAI claims the next update of this model will perform “ similarly to PhD students on challenging benchmark tasks in physics , chemistry and biology. ” At the same time , o1 is not as capable as GPT-4o in a lot of areas . It doesn t do as well on factual knowledge about the world . It also doesn t have the ability to browse the web or process files and images . Still , the company believes it represents a brand-new class of capabilities . It was named o1 to indicate “ resetting the counter back to 1. ” “ I m gon na be honest : I think we re terrible at naming , traditionally , ” McGrew says .
LLM Answer: According to the text, several things set o1 apart from GPT-4o:
* In online programming contests (Codeforces competitions), o1 scored 83% correct solutions compared to GPT-4o's 13%.
* The training behind o1 is "fundamentally different" from its predecessors, including a completely new optimization algorithm and a new training dataset specifically tailored for it.
* o1 has been shown to perform similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology, while GPT-4o does not have this level of performance.
* However, o1 is also less capable than GPT-4o in some areas, particularly with regard to factual knowledge about the world.
LLM Claim: According to the text, several things set o1 apart from GPT-4o:
* In online programming contests (Codeforces competitions), o1 scored 83% correct solutions compared to GPT-4o's 13%.
Is this claim supported by the context according to bespoke-minicheck? Yes
LLM Claim: * The training behind o1 is "fundamentally different" from its predecessors, including a completely new optimization algorithm and a new training dataset specifically tailored for it.
Is this claim supported by the context according to bespoke-minicheck? Yes
LLM Claim: * o1 has been shown to perform similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology, while GPT-4o does not have this level of performance.
Is this claim supported by the context according to bespoke-minicheck? No
LLM Claim: * However, o1 is also less capable than GPT-4o in some areas, particularly with regard to factual knowledge about the world.
Is this claim supported by the context according to bespoke-minicheck? Yes
```
We see that the third claim "* o1 has been shown to perform similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology, while GPT-4o does not have this level of performance." is not supported by the context. This is because the context only mentions that o1 "is claimed to perform" which is different from "has been shown to perform".

View File

@@ -1,137 +0,0 @@
import ollama
import warnings
from mattsollamatools import chunker
from newspaper import Article
import numpy as np
from sklearn.neighbors import NearestNeighbors
import nltk
warnings.filterwarnings(
"ignore", category=FutureWarning, module="transformers.tokenization_utils_base"
)
nltk.download("punkt_tab", quiet=True)
def getArticleText(url):
"""Gets the text of an article from a URL.
Often there are a bunch of ads and menus on pages for a news article.
This uses newspaper3k to get just the text of just the article.
"""
article = Article(url)
article.download()
article.parse()
return article.text
def knn_search(question_embedding, embeddings, k=5):
"""Performs K-nearest neighbors (KNN) search"""
X = np.array(
[item["embedding"] for article in embeddings for item in article["embeddings"]]
)
source_texts = [
item["source"] for article in embeddings for item in article["embeddings"]
]
# Fit a KNN model on the embeddings
knn = NearestNeighbors(n_neighbors=k, metric="cosine")
knn.fit(X)
# Find the indices and distances of the k-nearest neighbors.
_, indices = knn.kneighbors(question_embedding, n_neighbors=k)
# Get the indices and source texts of the best matches
best_matches = [(indices[0][i], source_texts[indices[0][i]]) for i in range(k)]
return best_matches
def check(document, claim):
"""Checks if the claim is supported by the document by calling bespoke-minicheck.
Returns Yes/yes if the claim is supported by the document, No/no otherwise.
Support for logits will be added in the future.
bespoke-minicheck's system prompt is defined as:
'Determine whether the provided claim is consistent with the corresponding
document. Consistency in this context implies that all information presented in the claim
is substantiated by the document. If not, it should be considered inconsistent. Please
assess the claim's consistency with the document by responding with either "Yes" or "No".'
bespoke-minicheck's user prompt is defined as:
"Document: {document}\nClaim: {claim}"
"""
prompt = f"Document: {document}\nClaim: {claim}"
response = ollama.generate(
model="bespoke-minicheck", prompt=prompt, options={"num_predict": 2, "temperature": 0.0}
)
return response["response"].strip()
if __name__ == "__main__":
allEmbeddings = []
default_url = "https://www.theverge.com/2024/9/12/24242439/openai-o1-model-reasoning-strawberry-chatgpt"
user_input = input(
"Enter the URL of an article you want to chat with, or press Enter for default example: "
)
article_url = user_input.strip() if user_input.strip() else default_url
article = {}
article["embeddings"] = []
article["url"] = article_url
text = getArticleText(article_url)
chunks = chunker(text)
# Embed (batch) chunks using ollama
embeddings = ollama.embed(model="all-minilm", input=chunks)["embeddings"]
for chunk, embedding in zip(chunks, embeddings):
item = {}
item["source"] = chunk
item["embedding"] = embedding
item["sourcelength"] = len(chunk)
article["embeddings"].append(item)
allEmbeddings.append(article)
print(f"\nLoaded, chunked, and embedded text from {article_url}.\n")
while True:
# Input a question from the user
# For example, "Who is the chief research officer?"
question = input("Enter your question or type quit: ")
if question.lower() == "quit":
break
# Embed the user's question using ollama.embed
question_embedding = ollama.embed(model="all-minilm", input=question)[
"embeddings"
]
# Perform KNN search to find the best matches (indices and source text)
best_matches = knn_search(question_embedding, allEmbeddings, k=4)
sourcetext = "\n\n".join([source_text for (_, source_text) in best_matches])
print(f"\nRetrieved chunks: \n{sourcetext}\n")
# Give the retrieved chunks and question to the chat model
system_prompt = f"Only use the following information to answer the question. Do not use anything else: {sourcetext}"
ollama_response = ollama.generate(
model="llama3.2",
prompt=question,
system=system_prompt,
options={"stream": False},
)
answer = ollama_response["response"]
print(f"LLM Answer:\n{answer}\n")
# Check each sentence in the response for grounded factuality
if answer:
for claim in nltk.sent_tokenize(answer):
print(f"LLM Claim: {claim}")
print(
f"Is this claim supported by the context according to bespoke-minicheck? {check(sourcetext, claim)}\n"
)

View File

@@ -1,8 +0,0 @@
ollama
lxml==5.3.0
lxml_html_clean==0.2.2
mattsollamatools==0.0.25
newspaper3k==0.2.8
nltk==3.9.1
numpy==1.26.4
scikit-learn==1.5.2

View File

@@ -1,53 +0,0 @@
"""Simple example to demonstrate how to use the bespoke-minicheck model."""
import ollama
# NOTE: ollama must be running for this to work, start the ollama app or run `ollama serve`
def check(document, claim):
"""Checks if the claim is supported by the document by calling bespoke-minicheck.
Returns Yes/yes if the claim is supported by the document, No/no otherwise.
Support for logits will be added in the future.
bespoke-minicheck's system prompt is defined as:
'Determine whether the provided claim is consistent with the corresponding
document. Consistency in this context implies that all information presented in the claim
is substantiated by the document. If not, it should be considered inconsistent. Please
assess the claim's consistency with the document by responding with either "Yes" or "No".'
bespoke-minicheck's user prompt is defined as:
"Document: {document}\nClaim: {claim}"
"""
prompt = f"Document: {document}\nClaim: {claim}"
response = ollama.generate(
model="bespoke-minicheck", prompt=prompt, options={"num_predict": 2, "temperature": 0.0}
)
return response["response"].strip()
def get_user_input(prompt):
user_input = input(prompt)
if not user_input:
exit()
print()
return user_input
def main():
while True:
# Get a document from the user (e.g. "Ryan likes running and biking.")
document = get_user_input("Enter a document: ")
# Get a claim from the user (e.g. "Ryan likes to run.")
claim = get_user_input("Enter a claim: ")
# Check if the claim is supported by the document
grounded_factuality_check = check(document, claim)
print(
f"Is the claim supported by the document according to bespoke-minicheck? {grounded_factuality_check}"
)
print("\n\n")
if __name__ == "__main__":
main()

View File

@@ -1,54 +0,0 @@
# Simple Bespoke-Minicheck Example
`bespoke-minicheck` is a model for checking if a claim is supported by a document. It is used through the **generate** endpoint, which is called in this example with a `prompt` that includes the expected formatting of the user input.
## Running the Example
1. Ensure you have the `bespoke-minicheck` model installed:
```bash
ollama pull bespoke-minicheck
```
2. Install the dependencies:
```bash
pip install -r requirements.txt
```
3. Run the program:
```bash
python main.py
```
4. Enter a document and a claim when prompted:
```bash
Enter a document: Roses are red.
Enter a claim: Roses are blue.
```
The claim and document are then given to the `bespoke-minicheck` as inputs, which then generates a response (Yes or No) on whether the claim is supported by the document.
```bash
Is the claim supported by the document according to bespoke-minicheck? No
```
## More Examples
Document ([source](https://en.wikipedia.org/wiki/Apple_I)):
> The Apple Computer 1 (Apple-1[a]), later known predominantly as the Apple I(written with a Roman numeral),[b] is an 8-bit motherboard-only personal computer designed by Steve Wozniak[5][6] and released by the Apple Computer Company (now Apple Inc.) in 1976. The company was initially formed to sell the Apple I its first product and would later become the world's largest technology company.[7] The idea of starting a company and selling the computer came from Wozniak's friend and Apple co-founder Steve Jobs.[8][9] One of the main innovations of the Apple I was that it included video display terminal circuitry on its circuit board, allowing it to connect to a low-cost composite video monitor or television, instead of an expensive computer terminal, compared to most existing computers at the time.
Claim:
>The Apple I is a 16-bit computer.
Expected output:
>Is the claim supported by the document according to bespoke-minicheck? **No**
Claim:
>Apple was originally called the Apple Computer Company.
Expected output:
>Is the claim supported by the document according to bespoke-minicheck? **Yes**

View File

@@ -1,31 +0,0 @@
import requests
import json
import random
model = "llama3.2"
template = {
"firstName": "",
"lastName": "",
"address": {
"street": "",
"city": "",
"state": "",
"zipCode": ""
},
"phoneNumber": ""
}
prompt = f"generate one realistically believable sample data set of a persons first name, last name, address in the US, and phone number. \nUse the following template: {json.dumps(template)}."
data = {
"prompt": prompt,
"model": model,
"format": "json",
"stream": False,
"options": {"temperature": 2.5, "top_p": 0.99, "top_k": 100},
}
print(f"Generating a sample user")
response = requests.post("http://localhost:11434/api/generate", json=data, stream=False)
json_data = json.loads(response.text)
print(json.dumps(json.loads(json_data["response"]), indent=2))

View File

@@ -1,31 +0,0 @@
import requests
import json
import random
countries = [
"United States",
"United Kingdom",
"the Netherlands",
"Germany",
"Mexico",
"Canada",
"France",
]
country = random.choice(countries)
model = "llama3.2"
prompt = f"generate one realistically believable sample data set of a persons first name, last name, address in {country}, and phone number. Do not use common names. Respond using JSON. Key names should have no backslashes, values should use plain ascii with no special characters."
data = {
"prompt": prompt,
"model": model,
"format": "json",
"stream": False,
"options": {"temperature": 2.5, "top_p": 0.99, "top_k": 100},
}
print(f"Generating a sample user in {country}")
response = requests.post("http://localhost:11434/api/generate", json=data, stream=False)
json_data = json.loads(response.text)
print(json.dumps(json.loads(json_data["response"]), indent=2))

View File

@@ -1,60 +0,0 @@
# JSON Output Example
![llmjson 2023-11-10 15_31_31](https://github.com/ollama/ollama/assets/633681/e599d986-9b4a-4118-81a4-4cfe7e22da25)
There are two python scripts in this example. `randomaddresses.py` generates random addresses from different countries. `predefinedschema.py` sets a template for the model to fill in.
## Running the Example
1. Ensure you have the `llama3.2` model installed:
```bash
ollama pull llama3.2
```
2. Install the Python Requirements.
```bash
pip install -r requirements.txt
```
3. Run the Random Addresses example:
```bash
python randomaddresses.py
```
4. Run the Predefined Schema example:
```bash
python predefinedschema.py
```
## Review the Code
Both programs are basically the same, with a different prompt for each, demonstrating two different ideas. The key part of getting JSON out of a model is to state in the prompt or system prompt that it should respond using JSON, and specifying the `format` as `json` in the data body.
```python
prompt = f"generate one realistically believable sample data set of a persons first name, last name, address in {country}, and phone number. Do not use common names. Respond using JSON. Key names should with no backslashes, values should use plain ascii with no special characters."
data = {
"prompt": prompt,
"model": model,
"format": "json",
"stream": False,
"options": {"temperature": 2.5, "top_p": 0.99, "top_k": 100},
}
```
When running `randomaddresses.py` you will see that the schema changes and adapts to the chosen country.
In `predefinedschema.py`, a template has been specified in the prompt as well. It's been defined as JSON and then dumped into the prompt string to make it easier to work with.
Both examples turn streaming off so that we end up with the completed JSON all at once. We need to convert the `response.text` to JSON so that when we output it as a string we can set the indent spacing to make the output easy to read.
```python
response = requests.post("http://localhost:11434/api/generate", json=data, stream=False)
json_data = json.loads(response.text)
print(json.dumps(json.loads(json_data["response"]), indent=2))
```

View File

@@ -1 +0,0 @@
Requests==2.31.0

View File

@@ -1,8 +0,0 @@
FROM codebooga:latest
SYSTEM """
You are a log file analyzer. You will receive a set of lines from a log file for some software application, find the errors and other interesting aspects of the logs, and explain them so a new user can understand what they mean. If there are any steps they can do to resolve them, list the steps in your answer.
"""
PARAMETER temperature 0.3

View File

@@ -1,41 +0,0 @@
import sys
import re
import requests
import json
# prelines and postlines represent the number of lines of context to include in the output around the error
prelines = 10
postlines = 10
def find_errors_in_log_file():
if len(sys.argv) < 2:
print("Usage: python loganalysis.py <filename>")
return
log_file_path = sys.argv[1]
with open(log_file_path, 'r') as log_file:
log_lines = log_file.readlines()
error_logs = []
for i, line in enumerate(log_lines):
if "error" in line.lower():
start_index = max(0, i - prelines)
end_index = min(len(log_lines), i + postlines + 1)
error_logs.extend(log_lines[start_index:end_index])
return error_logs
error_logs = find_errors_in_log_file()
data = {
"prompt": "\n".join(error_logs),
"model": "mattw/loganalyzer"
}
response = requests.post("http://localhost:11434/api/generate", json=data, stream=True)
for line in response.iter_lines():
if line:
json_data = json.loads(line)
if json_data['done'] == False:
print(json_data['response'], end='', flush=True)

Some files were not shown because too many files have changed in this diff Show More