Compare commits
170 Commits
v0.0.4
...
fix-unknow
Author | SHA1 | Date | |
---|---|---|---|
![]() |
3168f51125 | ||
![]() |
37324a0a00 | ||
![]() |
20a5d99f77 | ||
![]() |
3b43cc019a | ||
![]() |
b8421dce3d | ||
![]() |
9f6e97865c | ||
![]() |
f5f0da06d9 | ||
![]() |
52f04e39f2 | ||
![]() |
3c8f4c03d7 | ||
![]() |
7ba1308595 | ||
![]() |
91cd54016c | ||
![]() |
e7a393de54 | ||
![]() |
8454f298ac | ||
![]() |
a3badaf103 | ||
![]() |
50e8e5bdbe | ||
![]() |
8526e1f5f1 | ||
![]() |
0cfdbb95cc | ||
![]() |
6cea2061ec | ||
![]() |
2832801c2a | ||
![]() |
23a37dc466 | ||
![]() |
992892866b | ||
![]() |
dde880290c | ||
![]() |
1f27d7f1b8 | ||
![]() |
00aaa05901 | ||
![]() |
a83eaa7a9f | ||
![]() |
5156e48c2a | ||
![]() |
bf198c3918 | ||
![]() |
09dc6273e3 | ||
![]() |
ebaa33ac28 | ||
![]() |
3ec4ebc562 | ||
![]() |
6a19724d5f | ||
![]() |
924ce739f9 | ||
![]() |
e1973e6780 | ||
![]() |
f1b08ef40e | ||
![]() |
31f0cb7742 | ||
![]() |
e4b2ccfb23 | ||
![]() |
a3d7bb0a30 | ||
![]() |
77e49f3822 | ||
![]() |
8945b25484 | ||
![]() |
99ccf0c5d3 | ||
![]() |
d59b164fa2 | ||
![]() |
55b5f5dc34 | ||
![]() |
3b135ac963 | ||
![]() |
e6bae8d916 | ||
![]() |
d9f54300c3 | ||
![]() |
1511219763 | ||
![]() |
ada0add89b | ||
![]() |
75e508e1d6 | ||
![]() |
6f046dbf18 | ||
![]() |
cd820c8bca | ||
![]() |
88e755d7fd | ||
![]() |
6984171cfd | ||
![]() |
60b4db6389 | ||
![]() |
7c6ea2a966 | ||
![]() |
c161aef5f9 | ||
![]() |
c47786c1b0 | ||
![]() |
df100ce540 | ||
![]() |
5c5948b4e7 | ||
![]() |
1c72e46e09 | ||
![]() |
ca210ba480 | ||
![]() |
df146c41e2 | ||
![]() |
2d305fa99a | ||
![]() |
e4d7f3e287 | ||
![]() |
f2044b5838 | ||
![]() |
d53988f619 | ||
![]() |
ac88ab48d9 | ||
![]() |
84c6ee8cc6 | ||
![]() |
dbc90576b8 | ||
![]() |
84200dcde6 | ||
![]() |
e54c08da89 | ||
![]() |
31413857ea | ||
![]() |
25f874c030 | ||
![]() |
10d502611f | ||
![]() |
7fe4103b94 | ||
![]() |
7fbdc8e2c1 | ||
![]() |
9c5572d51f | ||
![]() |
75eb28f574 | ||
![]() |
56b6a1720f | ||
![]() |
dfceca48a7 | ||
![]() |
bbb67002c3 | ||
![]() |
0294216ea9 | ||
![]() |
7a62b2d2ab | ||
![]() |
f08c050e57 | ||
![]() |
67c8d49757 | ||
![]() |
ffcd90e8a7 | ||
![]() |
4ca7c4be1f | ||
![]() |
17b7af78f0 | ||
![]() |
4c1dc52083 | ||
![]() |
572fc9099f | ||
![]() |
3020f29041 | ||
![]() |
a6d03dd510 | ||
![]() |
68df36ae50 | ||
![]() |
5540305293 | ||
![]() |
d4cfee79d5 | ||
![]() |
6e36f948df | ||
![]() |
553fa39fe8 | ||
![]() |
820e581ad8 | ||
![]() |
d14785738e | ||
![]() |
9e15635c2d | ||
![]() |
3e10f902f5 | ||
![]() |
aa6714f25c | ||
![]() |
7f3a37aed4 | ||
![]() |
7b08280355 | ||
![]() |
e3cc4d5eac | ||
![]() |
8c85dfb735 | ||
![]() |
ac62a413e5 | ||
![]() |
d1f89778e9 | ||
![]() |
df67a90e64 | ||
![]() |
576ae644de | ||
![]() |
7e52e51db1 | ||
![]() |
f12df8d79a | ||
![]() |
65de730bdb | ||
![]() |
9658a5043b | ||
![]() |
280fbe8019 | ||
![]() |
2e339c2bab | ||
![]() |
38f0c54c64 | ||
![]() |
f20426a768 | ||
![]() |
885f67a471 | ||
![]() |
a9cc270b4d | ||
![]() |
aa281a30e5 | ||
![]() |
760bc3366b | ||
![]() |
5bea29f610 | ||
![]() |
9310ee3967 | ||
![]() |
da7ddbb4dc | ||
![]() |
4a28a2f093 | ||
![]() |
3d9498dc95 | ||
![]() |
1f45f7bb52 | ||
![]() |
2e6c64a8f9 | ||
![]() |
c7dd52271c | ||
![]() |
e4300e1eb7 | ||
![]() |
aba706ea2d | ||
![]() |
53d0052c6c | ||
![]() |
28a136e9a3 | ||
![]() |
529ff9ab6d | ||
![]() |
41aca47d43 | ||
![]() |
3862a51a6a | ||
![]() |
bcb612a30a | ||
![]() |
c05219aa0d | ||
![]() |
508ffbbb15 | ||
![]() |
59fa93cdd4 | ||
![]() |
952abe029b | ||
![]() |
f923855906 | ||
![]() |
9386073e96 | ||
![]() |
52ea4d4bb2 | ||
![]() |
c4ba192187 | ||
![]() |
fe758ca319 | ||
![]() |
08b933cc10 | ||
![]() |
6746a00af8 | ||
![]() |
2fb52261ad | ||
![]() |
6fdea03049 | ||
![]() |
38021ba494 | ||
![]() |
6c9fa573ae | ||
![]() |
40c9dc0a31 | ||
![]() |
0142660bd4 | ||
![]() |
743e957d88 | ||
![]() |
560f36e6c8 | ||
![]() |
e88dd25bab | ||
![]() |
567e74e7d7 | ||
![]() |
5ade3db040 | ||
![]() |
965f9ad033 | ||
![]() |
5d1c6b7499 | ||
![]() |
5fefaa5d4d | ||
![]() |
1775647f76 | ||
![]() |
77dc1a6d74 | ||
![]() |
05e08d2310 | ||
![]() |
31590284a7 | ||
![]() |
f2863cc7f8 | ||
![]() |
4dd296e155 | ||
![]() |
304f419429 | ||
![]() |
2666d3c206 |
125
README.md
@@ -1,76 +1,113 @@
|
||||

|
||||
<div align="center">
|
||||
<picture>
|
||||
<source media="(prefers-color-scheme: dark)" height="200px" srcset="https://github.com/jmorganca/ollama/assets/3325447/56ea1849-1284-4645-8970-956de6e51c3c">
|
||||
<img alt="logo" height="200px" src="https://github.com/jmorganca/ollama/assets/3325447/0d0b44e2-8f4a-4e99-9b52-a5c1c741c8f7">
|
||||
</picture>
|
||||
</div>
|
||||
|
||||
# Ollama
|
||||
|
||||
Run large language models with `llama.cpp`.
|
||||
[](https://discord.gg/ollama)
|
||||
|
||||
> Note: certain models that can be run with Ollama are intended for research and/or non-commercial use only.
|
||||
> Note: Ollama is in early preview. Please report any issues you find.
|
||||
|
||||
### Features
|
||||
Run, create, and share large language models (LLMs).
|
||||
|
||||
- Download and run popular large language models
|
||||
- Switch between multiple models on the fly
|
||||
- Hardware acceleration where available (Metal, CUDA)
|
||||
- Fast inference server written in Go, powered by [llama.cpp](https://github.com/ggerganov/llama.cpp)
|
||||
- REST API to use with your application (python, typescript SDKs coming soon)
|
||||
## Download
|
||||
|
||||
## Install
|
||||
|
||||
- [Download](https://ollama.ai/download) for macOS
|
||||
- Download for Windows (coming soon)
|
||||
|
||||
You can also build the [binary from source](#building).
|
||||
- [Download](https://ollama.ai/download) for macOS on Apple Silicon (Intel coming soon)
|
||||
- Download for Windows and Linux (coming soon)
|
||||
- Build [from source](#building)
|
||||
|
||||
## Quickstart
|
||||
|
||||
Run a fast and simple model.
|
||||
To run and chat with [Llama 2](https://ai.meta.com/llama), the new model by Meta:
|
||||
|
||||
```
|
||||
ollama run orca
|
||||
ollama run llama2
|
||||
```
|
||||
|
||||
## Example models
|
||||
## Model library
|
||||
|
||||
### 💬 Chat
|
||||
`ollama` includes a library of open-source models:
|
||||
|
||||
Have a conversation.
|
||||
| Model | Parameters | Size | Download |
|
||||
| ------------------------ | ---------- | ----- | --------------------------- |
|
||||
| Llama2 | 7B | 3.8GB | `ollama pull llama2` |
|
||||
| Llama2 13B | 13B | 7.3GB | `ollama pull llama2:13b` |
|
||||
| Orca Mini | 3B | 1.9GB | `ollama pull orca` |
|
||||
| Vicuna | 7B | 3.8GB | `ollama pull vicuna` |
|
||||
| Nous-Hermes | 13B | 7.3GB | `ollama pull nous-hermes` |
|
||||
| Wizard Vicuna Uncensored | 13B | 7.3GB | `ollama pull wizard-vicuna` |
|
||||
|
||||
> Note: You should have at least 8 GB of RAM to run the 3B models, 16 GB to run the 7B models, and 32 GB to run the 13B models.
|
||||
|
||||
## Examples
|
||||
|
||||
### Run a model
|
||||
|
||||
```
|
||||
ollama run vicuna "Why is the sky blue?"
|
||||
ollama run llama2
|
||||
>>> hi
|
||||
Hello! How can I help you today?
|
||||
```
|
||||
|
||||
### 🗺️ Instructions
|
||||
### Create a custom model
|
||||
|
||||
Get a helping hand.
|
||||
Pull a base model:
|
||||
|
||||
```
|
||||
ollama run orca "Write an email to my boss."
|
||||
ollama pull llama2
|
||||
```
|
||||
|
||||
### 🔎 Ask questions about documents
|
||||
|
||||
Send the contents of a document and ask questions about it.
|
||||
Create a `Modelfile`:
|
||||
|
||||
```
|
||||
ollama run nous-hermes "$(cat input.txt)", please summarize this story
|
||||
FROM llama2
|
||||
|
||||
# set the temperature to 1 [higher is more creative, lower is more coherent]
|
||||
PARAMETER temperature 1
|
||||
|
||||
# set the system prompt
|
||||
SYSTEM """
|
||||
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
|
||||
"""
|
||||
```
|
||||
|
||||
### 📖 Storytelling
|
||||
|
||||
Venture into the unknown.
|
||||
Next, create and run the model:
|
||||
|
||||
```
|
||||
ollama run nous-hermes "Once upon a time"
|
||||
ollama create mario -f ./Modelfile
|
||||
ollama run mario
|
||||
>>> hi
|
||||
Hello! It's your friend Mario.
|
||||
```
|
||||
|
||||
## Advanced usage
|
||||
For more examples, see the [examples](./examples) directory.
|
||||
|
||||
### Run a local model
|
||||
### Pull a model from the registry
|
||||
|
||||
```
|
||||
ollama run ~/Downloads/vicuna-7b-v1.3.ggmlv3.q4_1.bin
|
||||
ollama pull orca
|
||||
```
|
||||
|
||||
### Listing local models
|
||||
|
||||
```
|
||||
ollama list
|
||||
```
|
||||
|
||||
## Model packages
|
||||
|
||||
### Overview
|
||||
|
||||
Ollama bundles model weights, configuration, and data into a single package, defined by a [Modelfile](./docs/modelfile.md).
|
||||
|
||||
<picture>
|
||||
<source media="(prefers-color-scheme: dark)" height="480" srcset="https://github.com/jmorganca/ollama/assets/251292/2fd96b5f-191b-45c1-9668-941cfad4eb70">
|
||||
<img alt="logo" height="480" src="https://github.com/jmorganca/ollama/assets/251292/2fd96b5f-191b-45c1-9668-941cfad4eb70">
|
||||
</picture>
|
||||
|
||||
## Building
|
||||
|
||||
```
|
||||
@@ -80,29 +117,21 @@ go build .
|
||||
To run it start the server:
|
||||
|
||||
```
|
||||
./ollama server &
|
||||
./ollama serve &
|
||||
```
|
||||
|
||||
Finally, run a model!
|
||||
|
||||
```
|
||||
./ollama run ~/Downloads/vicuna-7b-v1.3.ggmlv3.q4_1.bin
|
||||
./ollama run llama2
|
||||
```
|
||||
|
||||
## API Reference
|
||||
|
||||
### `POST /api/pull`
|
||||
|
||||
Download a model
|
||||
|
||||
```
|
||||
curl -X POST http://localhost:11343/api/pull -d '{"model": "orca"}'
|
||||
```
|
||||
## REST API
|
||||
|
||||
### `POST /api/generate`
|
||||
|
||||
Complete a prompt
|
||||
Generate text from a model.
|
||||
|
||||
```
|
||||
curl -X POST http://localhost:11434/api/generate -d '{"model": "orca", "prompt": "hello!", "stream": true}'
|
||||
curl -X POST http://localhost:11434/api/generate -d '{"model": "llama2", "prompt":"Why is the sky blue?"}'
|
||||
```
|
||||
|
132
api/client.go
@@ -6,26 +6,31 @@ import (
|
||||
"context"
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"io"
|
||||
"net/http"
|
||||
"net/url"
|
||||
)
|
||||
|
||||
type StatusError struct {
|
||||
StatusCode int
|
||||
Status string
|
||||
Message string
|
||||
type Client struct {
|
||||
base url.URL
|
||||
HTTP http.Client
|
||||
Headers http.Header
|
||||
}
|
||||
|
||||
func (e StatusError) Error() string {
|
||||
if e.Message != "" {
|
||||
return fmt.Sprintf("%s: %s", e.Status, e.Message)
|
||||
func checkError(resp *http.Response, body []byte) error {
|
||||
if resp.StatusCode >= 200 && resp.StatusCode < 400 {
|
||||
return nil
|
||||
}
|
||||
|
||||
return e.Status
|
||||
}
|
||||
apiError := StatusError{StatusCode: resp.StatusCode}
|
||||
|
||||
type Client struct {
|
||||
base url.URL
|
||||
err := json.Unmarshal(body, &apiError)
|
||||
if err != nil {
|
||||
// Use the full body as the message if we fail to decode a response.
|
||||
apiError.ErrorMessage = string(body)
|
||||
}
|
||||
|
||||
return apiError
|
||||
}
|
||||
|
||||
func NewClient(hosts ...string) *Client {
|
||||
@@ -36,9 +41,59 @@ func NewClient(hosts ...string) *Client {
|
||||
|
||||
return &Client{
|
||||
base: url.URL{Scheme: "http", Host: host},
|
||||
HTTP: http.Client{},
|
||||
}
|
||||
}
|
||||
|
||||
func (c *Client) do(ctx context.Context, method, path string, reqData, respData any) error {
|
||||
var reqBody io.Reader
|
||||
var data []byte
|
||||
var err error
|
||||
if reqData != nil {
|
||||
data, err = json.Marshal(reqData)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
reqBody = bytes.NewReader(data)
|
||||
}
|
||||
|
||||
url := c.base.JoinPath(path).String()
|
||||
|
||||
req, err := http.NewRequestWithContext(ctx, method, url, reqBody)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
req.Header.Set("Content-Type", "application/json")
|
||||
req.Header.Set("Accept", "application/json")
|
||||
|
||||
for k, v := range c.Headers {
|
||||
req.Header[k] = v
|
||||
}
|
||||
|
||||
respObj, err := c.HTTP.Do(req)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
defer respObj.Body.Close()
|
||||
|
||||
respBody, err := io.ReadAll(respObj.Body)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
if err := checkError(respObj, respBody); err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
if len(respBody) > 0 && respData != nil {
|
||||
if err := json.Unmarshal(respBody, respData); err != nil {
|
||||
return err
|
||||
}
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
func (c *Client) stream(ctx context.Context, method, path string, data any, fn func([]byte) error) error {
|
||||
var buf *bytes.Buffer
|
||||
if data != nil {
|
||||
@@ -75,11 +130,15 @@ func (c *Client) stream(ctx context.Context, method, path string, data any, fn f
|
||||
return fmt.Errorf("unmarshal: %w", err)
|
||||
}
|
||||
|
||||
if errorResponse.Error != "" {
|
||||
return fmt.Errorf("stream: %s", errorResponse.Error)
|
||||
}
|
||||
|
||||
if response.StatusCode >= 400 {
|
||||
return StatusError{
|
||||
StatusCode: response.StatusCode,
|
||||
Status: response.Status,
|
||||
Message: errorResponse.Error,
|
||||
StatusCode: response.StatusCode,
|
||||
Status: response.Status,
|
||||
ErrorMessage: errorResponse.Error,
|
||||
}
|
||||
}
|
||||
|
||||
@@ -104,11 +163,11 @@ func (c *Client) Generate(ctx context.Context, req *GenerateRequest, fn Generate
|
||||
})
|
||||
}
|
||||
|
||||
type PullProgressFunc func(PullProgress) error
|
||||
type PullProgressFunc func(ProgressResponse) error
|
||||
|
||||
func (c *Client) Pull(ctx context.Context, req *PullRequest, fn PullProgressFunc) error {
|
||||
return c.stream(ctx, http.MethodPost, "/api/pull", req, func(bts []byte) error {
|
||||
var resp PullProgress
|
||||
var resp ProgressResponse
|
||||
if err := json.Unmarshal(bts, &resp); err != nil {
|
||||
return err
|
||||
}
|
||||
@@ -116,3 +175,44 @@ func (c *Client) Pull(ctx context.Context, req *PullRequest, fn PullProgressFunc
|
||||
return fn(resp)
|
||||
})
|
||||
}
|
||||
|
||||
type PushProgressFunc func(ProgressResponse) error
|
||||
|
||||
func (c *Client) Push(ctx context.Context, req *PushRequest, fn PushProgressFunc) error {
|
||||
return c.stream(ctx, http.MethodPost, "/api/push", req, func(bts []byte) error {
|
||||
var resp ProgressResponse
|
||||
if err := json.Unmarshal(bts, &resp); err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
return fn(resp)
|
||||
})
|
||||
}
|
||||
|
||||
type CreateProgressFunc func(CreateProgress) error
|
||||
|
||||
func (c *Client) Create(ctx context.Context, req *CreateRequest, fn CreateProgressFunc) error {
|
||||
return c.stream(ctx, http.MethodPost, "/api/create", req, func(bts []byte) error {
|
||||
var resp CreateProgress
|
||||
if err := json.Unmarshal(bts, &resp); err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
return fn(resp)
|
||||
})
|
||||
}
|
||||
|
||||
func (c *Client) List(ctx context.Context) (*ListResponse, error) {
|
||||
var lr ListResponse
|
||||
if err := c.do(ctx, http.MethodGet, "/api/tags", nil, &lr); err != nil {
|
||||
return nil, err
|
||||
}
|
||||
return &lr, nil
|
||||
}
|
||||
|
||||
func (c *Client) Delete(ctx context.Context, req *DeleteRequest) error {
|
||||
if err := c.do(ctx, http.MethodDelete, "/api/delete", req, nil); err != nil {
|
||||
return err
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
117
api/types.go
@@ -1,26 +1,121 @@
|
||||
package api
|
||||
|
||||
import "runtime"
|
||||
import (
|
||||
"fmt"
|
||||
"os"
|
||||
"runtime"
|
||||
"time"
|
||||
)
|
||||
|
||||
type PullRequest struct {
|
||||
Model string `json:"model"`
|
||||
type StatusError struct {
|
||||
StatusCode int
|
||||
Status string
|
||||
ErrorMessage string `json:"error"`
|
||||
}
|
||||
|
||||
type PullProgress struct {
|
||||
Total int64 `json:"total"`
|
||||
Completed int64 `json:"completed"`
|
||||
Percent float64 `json:"percent"`
|
||||
func (e StatusError) Error() string {
|
||||
switch {
|
||||
case e.Status != "" && e.ErrorMessage != "":
|
||||
return fmt.Sprintf("%s: %s", e.Status, e.ErrorMessage)
|
||||
case e.Status != "":
|
||||
return e.Status
|
||||
case e.ErrorMessage != "":
|
||||
return e.ErrorMessage
|
||||
default:
|
||||
// this should not happen
|
||||
return "something went wrong, please see the ollama server logs for details"
|
||||
}
|
||||
}
|
||||
|
||||
type GenerateRequest struct {
|
||||
Model string `json:"model"`
|
||||
Prompt string `json:"prompt"`
|
||||
Model string `json:"model"`
|
||||
Prompt string `json:"prompt"`
|
||||
Context []int `json:"context,omitempty"`
|
||||
|
||||
Options `json:"options"`
|
||||
}
|
||||
|
||||
type CreateRequest struct {
|
||||
Name string `json:"name"`
|
||||
Path string `json:"path"`
|
||||
}
|
||||
|
||||
type CreateProgress struct {
|
||||
Status string `json:"status"`
|
||||
}
|
||||
|
||||
type DeleteRequest struct {
|
||||
Name string `json:"name"`
|
||||
}
|
||||
|
||||
type PullRequest struct {
|
||||
Name string `json:"name"`
|
||||
Insecure bool `json:"insecure,omitempty"`
|
||||
Username string `json:"username"`
|
||||
Password string `json:"password"`
|
||||
}
|
||||
|
||||
type ProgressResponse struct {
|
||||
Status string `json:"status"`
|
||||
Digest string `json:"digest,omitempty"`
|
||||
Total int `json:"total,omitempty"`
|
||||
Completed int `json:"completed,omitempty"`
|
||||
}
|
||||
|
||||
type PushRequest struct {
|
||||
Name string `json:"name"`
|
||||
Insecure bool `json:"insecure,omitempty"`
|
||||
Username string `json:"username"`
|
||||
Password string `json:"password"`
|
||||
}
|
||||
|
||||
type ListResponse struct {
|
||||
Models []ListResponseModel `json:"models"`
|
||||
}
|
||||
|
||||
type ListResponseModel struct {
|
||||
Name string `json:"name"`
|
||||
ModifiedAt time.Time `json:"modified_at"`
|
||||
Size int `json:"size"`
|
||||
}
|
||||
|
||||
type GenerateResponse struct {
|
||||
Response string `json:"response"`
|
||||
Model string `json:"model"`
|
||||
CreatedAt time.Time `json:"created_at"`
|
||||
Response string `json:"response,omitempty"`
|
||||
|
||||
Done bool `json:"done"`
|
||||
Context []int `json:"context,omitempty"`
|
||||
|
||||
TotalDuration time.Duration `json:"total_duration,omitempty"`
|
||||
PromptEvalCount int `json:"prompt_eval_count,omitempty"`
|
||||
PromptEvalDuration time.Duration `json:"prompt_eval_duration,omitempty"`
|
||||
EvalCount int `json:"eval_count,omitempty"`
|
||||
EvalDuration time.Duration `json:"eval_duration,omitempty"`
|
||||
}
|
||||
|
||||
func (r *GenerateResponse) Summary() {
|
||||
if r.TotalDuration > 0 {
|
||||
fmt.Fprintf(os.Stderr, "total duration: %v\n", r.TotalDuration)
|
||||
}
|
||||
|
||||
if r.PromptEvalCount > 0 {
|
||||
fmt.Fprintf(os.Stderr, "prompt eval count: %d token(s)\n", r.PromptEvalCount)
|
||||
}
|
||||
|
||||
if r.PromptEvalDuration > 0 {
|
||||
fmt.Fprintf(os.Stderr, "prompt eval duration: %s\n", r.PromptEvalDuration)
|
||||
fmt.Fprintf(os.Stderr, "prompt eval rate: %.2f tokens/s\n", float64(r.PromptEvalCount)/r.PromptEvalDuration.Seconds())
|
||||
}
|
||||
|
||||
if r.EvalCount > 0 {
|
||||
fmt.Fprintf(os.Stderr, "eval count: %d token(s)\n", r.EvalCount)
|
||||
}
|
||||
|
||||
if r.EvalDuration > 0 {
|
||||
fmt.Fprintf(os.Stderr, "eval duration: %s\n", r.EvalDuration)
|
||||
fmt.Fprintf(os.Stderr, "eval rate: %.2f tokens/s\n", float64(r.EvalCount)/r.EvalDuration.Seconds())
|
||||
}
|
||||
}
|
||||
|
||||
type Options struct {
|
||||
@@ -65,7 +160,7 @@ func DefaultOptions() Options {
|
||||
|
||||
UseNUMA: false,
|
||||
|
||||
NumCtx: 512,
|
||||
NumCtx: 2048,
|
||||
NumBatch: 512,
|
||||
NumGPU: 1,
|
||||
LowVRAM: false,
|
||||
|
Before Width: | Height: | Size: 442 B After Width: | Height: | Size: 403 B |
Before Width: | Height: | Size: 889 B After Width: | Height: | Size: 741 B |
BIN
app/assets/ollama_outline_icon_16x16Template.png
Normal file
After Width: | Height: | Size: 445 B |
BIN
app/assets/ollama_outline_icon_16x16Template@2x.png
Normal file
After Width: | Height: | Size: 891 B |
@@ -1,4 +1,4 @@
|
||||
import type { ForgeConfig, ResolvedForgeConfig, ForgeMakeResult } from '@electron-forge/shared-types'
|
||||
import type { ForgeConfig } from '@electron-forge/shared-types'
|
||||
import { MakerSquirrel } from '@electron-forge/maker-squirrel'
|
||||
import { MakerZIP } from '@electron-forge/maker-zip'
|
||||
import { PublisherGithub } from '@electron-forge/publisher-github'
|
||||
@@ -21,6 +21,8 @@ const config: ForgeConfig = {
|
||||
'../ollama',
|
||||
path.join(__dirname, './assets/ollama_icon_16x16Template.png'),
|
||||
path.join(__dirname, './assets/ollama_icon_16x16Template@2x.png'),
|
||||
path.join(__dirname, './assets/ollama_outline_icon_16x16Template.png'),
|
||||
path.join(__dirname, './assets/ollama_outline_icon_16x16Template@2x.png'),
|
||||
...(process.platform === 'darwin' ? ['../llama/ggml-metal.metal'] : []),
|
||||
],
|
||||
...(process.env.SIGN
|
||||
@@ -58,7 +60,7 @@ const config: ForgeConfig = {
|
||||
new AutoUnpackNativesPlugin({}),
|
||||
new WebpackPlugin({
|
||||
mainConfig,
|
||||
devContentSecurityPolicy: `default-src * 'unsafe-eval' 'unsafe-inline'`,
|
||||
devContentSecurityPolicy: `default-src * 'unsafe-eval' 'unsafe-inline'; img-src data: 'self'`,
|
||||
renderer: {
|
||||
config: rendererConfig,
|
||||
nodeIntegration: true,
|
||||
|
2274
app/package-lock.json
generated
@@ -30,6 +30,7 @@
|
||||
"@electron-forge/plugin-auto-unpack-natives": "^6.2.1",
|
||||
"@electron-forge/plugin-webpack": "^6.2.1",
|
||||
"@electron-forge/publisher-github": "^6.2.1",
|
||||
"@svgr/webpack": "^8.0.1",
|
||||
"@types/chmodr": "^1.0.0",
|
||||
"@types/node": "^20.4.0",
|
||||
"@types/react": "^18.2.14",
|
||||
@@ -54,17 +55,21 @@
|
||||
"prettier": "^2.8.8",
|
||||
"prettier-plugin-tailwindcss": "^0.3.0",
|
||||
"style-loader": "^3.3.3",
|
||||
"svg-inline-loader": "^0.8.2",
|
||||
"tailwindcss": "^3.3.2",
|
||||
"ts-loader": "^9.4.3",
|
||||
"ts-node": "^10.9.1",
|
||||
"typescript": "~4.5.4",
|
||||
"url-loader": "^4.1.1",
|
||||
"webpack": "^5.88.0",
|
||||
"webpack-cli": "^5.1.4",
|
||||
"webpack-dev-server": "^4.15.1"
|
||||
},
|
||||
"dependencies": {
|
||||
"@electron/remote": "^2.0.10",
|
||||
"@heroicons/react": "^2.0.18",
|
||||
"@segment/analytics-node": "^1.0.0",
|
||||
"copy-to-clipboard": "^3.3.3",
|
||||
"electron-squirrel-startup": "^1.0.0",
|
||||
"electron-store": "^8.1.0",
|
||||
"react": "^18.2.0",
|
||||
|
@@ -11,6 +11,10 @@ body {
|
||||
-webkit-app-region: drag;
|
||||
}
|
||||
|
||||
.no-drag {
|
||||
-webkit-app-region: no-drag;
|
||||
}
|
||||
|
||||
.blink {
|
||||
-webkit-animation: 1s blink step-end infinite;
|
||||
-moz-animation: 1s blink step-end infinite;
|
||||
|
245
app/src/app.tsx
@@ -1,158 +1,115 @@
|
||||
import { useState } from 'react'
|
||||
import path from 'path'
|
||||
import os from 'os'
|
||||
import { dialog, getCurrentWindow } from '@electron/remote'
|
||||
import copy from 'copy-to-clipboard'
|
||||
import { CheckIcon, DocumentDuplicateIcon } from '@heroicons/react/24/outline'
|
||||
import Store from 'electron-store'
|
||||
import { getCurrentWindow } from '@electron/remote'
|
||||
|
||||
const API_URL = 'http://127.0.0.1:7734'
|
||||
import { install } from './install'
|
||||
import OllamaIcon from './ollama.svg'
|
||||
|
||||
type Message = {
|
||||
sender: 'bot' | 'human'
|
||||
content: string
|
||||
}
|
||||
const store = new Store()
|
||||
|
||||
const userInfo = os.userInfo()
|
||||
|
||||
async function generate(prompt: string, model: string, callback: (res: string) => void) {
|
||||
const result = await fetch(`${API_URL}/generate`, {
|
||||
method: 'POST',
|
||||
headers: {
|
||||
'Content-Type': 'application/json',
|
||||
},
|
||||
body: JSON.stringify({
|
||||
prompt,
|
||||
model,
|
||||
}),
|
||||
})
|
||||
|
||||
if (!result.ok) {
|
||||
return
|
||||
}
|
||||
|
||||
let reader = result.body.getReader()
|
||||
|
||||
while (true) {
|
||||
const { done, value } = await reader.read()
|
||||
|
||||
if (done) {
|
||||
break
|
||||
}
|
||||
|
||||
let decoder = new TextDecoder()
|
||||
let str = decoder.decode(value)
|
||||
|
||||
let re = /}\s*{/g
|
||||
str = '[' + str.replace(re, '},{') + ']'
|
||||
let messages = JSON.parse(str)
|
||||
|
||||
for (const message of messages) {
|
||||
const choice = message.choices[0]
|
||||
|
||||
callback(choice.text)
|
||||
|
||||
if (choice.finish_reason === 'stop') {
|
||||
break
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return
|
||||
enum Step {
|
||||
WELCOME = 0,
|
||||
CLI,
|
||||
FINISH,
|
||||
}
|
||||
|
||||
export default function () {
|
||||
const [prompt, setPrompt] = useState('')
|
||||
const [messages, setMessages] = useState<Message[]>([])
|
||||
const [model, setModel] = useState('')
|
||||
const [generating, setGenerating] = useState(false)
|
||||
const [step, setStep] = useState<Step>(Step.WELCOME)
|
||||
const [commandCopied, setCommandCopied] = useState<boolean>(false)
|
||||
|
||||
const command = 'ollama run llama2'
|
||||
|
||||
return (
|
||||
<div className='flex min-h-screen flex-1 flex-col justify-between bg-white'>
|
||||
<header className='drag sticky top-0 z-50 flex h-14 w-full flex-row items-center border-b border-black/10 bg-white/75 backdrop-blur-md'>
|
||||
<div className='mx-auto w-full max-w-xl leading-none'>
|
||||
<h1 className='text-sm font-medium'>{path.basename(model).replace('.bin', '')}</h1>
|
||||
</div>
|
||||
</header>
|
||||
{model ? (
|
||||
<section className='mx-auto mb-10 w-full max-w-xl flex-1 break-words'>
|
||||
{messages.map((m, i) => (
|
||||
<div className='my-4 flex gap-4' key={i}>
|
||||
<div className='flex-none pr-1 text-lg'>
|
||||
{m.sender === 'human' ? (
|
||||
<div className='mt-px flex h-6 w-6 items-center justify-center rounded-md bg-neutral-200 text-sm text-neutral-700'>
|
||||
{userInfo.username[0].toUpperCase()}
|
||||
</div>
|
||||
) : (
|
||||
<div className='mt-0.5 flex h-6 w-6 items-center justify-center rounded-md bg-blue-600 text-sm text-white'>
|
||||
{path.basename(model)[0].toUpperCase()}
|
||||
</div>
|
||||
)}
|
||||
</div>
|
||||
<div className='flex-1 text-gray-800'>
|
||||
{m.content}
|
||||
{m.sender === 'bot' && generating && i === messages.length - 1 && (
|
||||
<span className='blink relative -top-[3px] left-1 text-[10px]'>█</span>
|
||||
)}
|
||||
<div className='drag'>
|
||||
<div className='mx-auto flex min-h-screen w-full flex-col justify-between bg-white px-4 pt-16'>
|
||||
{step === Step.WELCOME && (
|
||||
<>
|
||||
<div className='mx-auto text-center'>
|
||||
<h1 className='mb-6 mt-4 text-2xl tracking-tight text-gray-900'>Welcome to Ollama</h1>
|
||||
<p className='mx-auto w-[65%] text-sm text-gray-400'>
|
||||
Let's get you up and running with your own large language models.
|
||||
</p>
|
||||
<button
|
||||
onClick={() => setStep(Step.CLI)}
|
||||
className='no-drag rounded-dm mx-auto my-8 w-[40%] rounded-md bg-black px-4 py-2 text-sm text-white hover:brightness-110'
|
||||
>
|
||||
Next
|
||||
</button>
|
||||
</div>
|
||||
<div className='mx-auto'>
|
||||
<OllamaIcon />
|
||||
</div>
|
||||
</>
|
||||
)}
|
||||
{step === Step.CLI && (
|
||||
<>
|
||||
<div className='mx-auto flex flex-col space-y-28 text-center'>
|
||||
<h1 className='mt-4 text-2xl tracking-tight text-gray-900'>Install the command line</h1>
|
||||
<pre className='mx-auto text-4xl text-gray-400'>> ollama</pre>
|
||||
<div className='mx-auto'>
|
||||
<button
|
||||
onClick={async () => {
|
||||
await install()
|
||||
getCurrentWindow().show()
|
||||
getCurrentWindow().focus()
|
||||
setStep(Step.FINISH)
|
||||
}}
|
||||
className='no-drag rounded-dm mx-auto w-[60%] rounded-md bg-black px-4 py-2 text-sm text-white hover:brightness-110'
|
||||
>
|
||||
Install
|
||||
</button>
|
||||
<p className='mx-auto my-4 w-[70%] text-xs text-gray-400'>
|
||||
You will be prompted for administrator access
|
||||
</p>
|
||||
</div>
|
||||
</div>
|
||||
))}
|
||||
</section>
|
||||
) : (
|
||||
<section className='flex flex-1 select-none flex-col items-center justify-center pb-20'>
|
||||
<h2 className='text-3xl font-light text-neutral-400'>No model selected</h2>
|
||||
<button
|
||||
onClick={async () => {
|
||||
const res = await dialog.showOpenDialog(getCurrentWindow(), {
|
||||
properties: ['openFile', 'multiSelections'],
|
||||
})
|
||||
if (res.canceled) {
|
||||
return
|
||||
}
|
||||
|
||||
setModel(res.filePaths[0])
|
||||
}}
|
||||
className='rounded-dm my-8 rounded-md bg-blue-600 px-4 py-2 text-sm text-white hover:brightness-110'
|
||||
>
|
||||
Open file...
|
||||
</button>
|
||||
</section>
|
||||
)}
|
||||
<div className='sticky bottom-0 bg-gradient-to-b from-transparent to-white'>
|
||||
{model && (
|
||||
<textarea
|
||||
autoFocus
|
||||
rows={1}
|
||||
value={prompt}
|
||||
placeholder='Send a message...'
|
||||
onChange={e => setPrompt(e.target.value)}
|
||||
className='mx-auto my-4 block w-full max-w-xl resize-none rounded-xl border border-gray-200 px-5 py-3.5 text-[15px] shadow-lg shadow-black/5 focus:outline-none'
|
||||
onKeyDownCapture={async e => {
|
||||
if (e.key === 'Enter' && !e.shiftKey) {
|
||||
e.preventDefault()
|
||||
|
||||
if (generating) {
|
||||
return
|
||||
}
|
||||
|
||||
if (!prompt) {
|
||||
return
|
||||
}
|
||||
|
||||
await setMessages(messages => {
|
||||
return [...messages, { sender: 'human', content: prompt }, { sender: 'bot', content: '' }]
|
||||
})
|
||||
|
||||
setPrompt('')
|
||||
|
||||
setGenerating(true)
|
||||
await generate(prompt, model, res => {
|
||||
setMessages(messages => {
|
||||
let last = messages[messages.length - 1]
|
||||
return [...messages.slice(0, messages.length - 1), { ...last, content: last.content + res }]
|
||||
})
|
||||
})
|
||||
setGenerating(false)
|
||||
}
|
||||
}}
|
||||
></textarea>
|
||||
</>
|
||||
)}
|
||||
{step === Step.FINISH && (
|
||||
<>
|
||||
<div className='mx-auto flex flex-col space-y-20 text-center'>
|
||||
<h1 className='mt-4 text-2xl tracking-tight text-gray-900'>Run your first model</h1>
|
||||
<div className='flex flex-col'>
|
||||
<div className='group relative flex items-center'>
|
||||
<pre className='language-none text-2xs w-full rounded-md bg-gray-100 px-4 py-3 text-start leading-normal'>
|
||||
{command}
|
||||
</pre>
|
||||
<button
|
||||
className={`no-drag absolute right-[5px] px-2 py-2 ${
|
||||
commandCopied
|
||||
? 'text-gray-900 opacity-100 hover:cursor-auto'
|
||||
: 'text-gray-200 opacity-50 hover:cursor-pointer'
|
||||
} hover:font-bold hover:text-gray-900 group-hover:opacity-100`}
|
||||
onClick={() => {
|
||||
copy(command)
|
||||
setCommandCopied(true)
|
||||
setTimeout(() => setCommandCopied(false), 3000)
|
||||
}}
|
||||
>
|
||||
{commandCopied ? (
|
||||
<CheckIcon className='h-4 w-4 font-bold text-gray-500' />
|
||||
) : (
|
||||
<DocumentDuplicateIcon className='h-4 w-4 text-gray-500' />
|
||||
)}
|
||||
</button>
|
||||
</div>
|
||||
<p className='mx-auto my-4 w-[70%] text-xs text-gray-400'>
|
||||
Run this command in your favorite terminal.
|
||||
</p>
|
||||
</div>
|
||||
<button
|
||||
onClick={() => {
|
||||
store.set('first-time-run', true)
|
||||
window.close()
|
||||
}}
|
||||
className='no-drag rounded-dm mx-auto w-[60%] rounded-md bg-black px-4 py-2 text-sm text-white hover:brightness-110'
|
||||
>
|
||||
Finish
|
||||
</button>
|
||||
</div>
|
||||
</>
|
||||
)}
|
||||
</div>
|
||||
</div>
|
||||
|
4
app/src/declarations.d.ts
vendored
Normal file
@@ -0,0 +1,4 @@
|
||||
declare module '*.svg' {
|
||||
const content: string;
|
||||
export default content;
|
||||
}
|
133
app/src/index.ts
@@ -1,17 +1,20 @@
|
||||
import { spawn, exec } from 'child_process'
|
||||
import { app, autoUpdater, dialog, Tray, Menu } from 'electron'
|
||||
import { spawn } from 'child_process'
|
||||
import { app, autoUpdater, dialog, Tray, Menu, BrowserWindow, nativeTheme } from 'electron'
|
||||
import Store from 'electron-store'
|
||||
import winston from 'winston'
|
||||
import 'winston-daily-rotate-file'
|
||||
import * as path from 'path'
|
||||
import * as fs from 'fs'
|
||||
|
||||
import { analytics, id } from './telemetry'
|
||||
import { installed } from './install'
|
||||
|
||||
require('@electron/remote/main').initialize()
|
||||
|
||||
const store = new Store()
|
||||
let tray: Tray | null = null
|
||||
let welcomeWindow: BrowserWindow | null = null
|
||||
|
||||
declare const MAIN_WINDOW_WEBPACK_ENTRY: string
|
||||
|
||||
const logger = winston.createLogger({
|
||||
transports: [
|
||||
@@ -22,7 +25,7 @@ const logger = winston.createLogger({
|
||||
maxFiles: 5,
|
||||
}),
|
||||
],
|
||||
format: winston.format.printf(info => `${info.message}`),
|
||||
format: winston.format.printf(info => info.message),
|
||||
})
|
||||
|
||||
const SingleInstanceLock = app.requestSingleInstanceLock()
|
||||
@@ -30,15 +33,63 @@ if (!SingleInstanceLock) {
|
||||
app.quit()
|
||||
}
|
||||
|
||||
const createSystemtray = () => {
|
||||
let iconPath = path.join(__dirname, '..', '..', 'assets', 'ollama_icon_16x16Template.png')
|
||||
function firstRunWindow() {
|
||||
// Create the browser window.
|
||||
welcomeWindow = new BrowserWindow({
|
||||
width: 400,
|
||||
height: 500,
|
||||
frame: false,
|
||||
fullscreenable: false,
|
||||
resizable: false,
|
||||
movable: true,
|
||||
show: false,
|
||||
webPreferences: {
|
||||
nodeIntegration: true,
|
||||
contextIsolation: false,
|
||||
},
|
||||
alwaysOnTop: true,
|
||||
})
|
||||
|
||||
require('@electron/remote/main').enable(welcomeWindow.webContents)
|
||||
|
||||
// and load the index.html of the app.
|
||||
welcomeWindow.loadURL(MAIN_WINDOW_WEBPACK_ENTRY)
|
||||
|
||||
welcomeWindow.on('ready-to-show', () => welcomeWindow.show())
|
||||
|
||||
// for debugging
|
||||
// welcomeWindow.webContents.openDevTools()
|
||||
|
||||
if (process.platform === 'darwin') {
|
||||
app.dock.hide()
|
||||
}
|
||||
}
|
||||
|
||||
function createSystemtray() {
|
||||
let iconPath = nativeTheme.shouldUseDarkColors
|
||||
? path.join(__dirname, '..', '..', 'assets', 'ollama_icon_16x16Template.png')
|
||||
: path.join(__dirname, '..', '..', 'assets', 'ollama_outline_icon_16x16Template.png')
|
||||
|
||||
if (app.isPackaged) {
|
||||
iconPath = path.join(process.resourcesPath, 'ollama_icon_16x16Template.png')
|
||||
iconPath = nativeTheme.shouldUseDarkColors
|
||||
? path.join(process.resourcesPath, 'ollama_icon_16x16Template.png')
|
||||
: path.join(process.resourcesPath, 'ollama_outline_icon_16x16Template.png')
|
||||
}
|
||||
|
||||
tray = new Tray(iconPath)
|
||||
|
||||
nativeTheme.on('updated', function theThemeHasChanged () {
|
||||
if (nativeTheme.shouldUseDarkColors) {
|
||||
app.isPackaged
|
||||
? tray.setImage(path.join(process.resourcesPath, 'ollama_icon_16x16Template.png'))
|
||||
: tray.setImage(path.join(__dirname, '..', '..', 'assets', 'ollama_icon_16x16Template.png'))
|
||||
} else {
|
||||
app.isPackaged
|
||||
? tray.setImage(path.join(process.resourcesPath, 'ollama_outline_icon_16x16Template.png'))
|
||||
: tray.setImage(path.join(__dirname, '..', '..', 'assets', 'ollama_outline_icon_16x16Template.png'))
|
||||
}
|
||||
})
|
||||
|
||||
const contextMenu = Menu.buildFromTemplate([{ role: 'quit', label: 'Quit Ollama', accelerator: 'Command+Q' }])
|
||||
|
||||
tray.setContextMenu(contextMenu)
|
||||
@@ -49,8 +100,6 @@ if (require('electron-squirrel-startup')) {
|
||||
app.quit()
|
||||
}
|
||||
|
||||
const ollama = path.join(process.resourcesPath, 'ollama')
|
||||
|
||||
function server() {
|
||||
const binary = app.isPackaged
|
||||
? path.join(process.resourcesPath, 'ollama')
|
||||
@@ -66,66 +115,25 @@ function server() {
|
||||
logger.error(data.toString().trim())
|
||||
})
|
||||
|
||||
proc.on('exit', () => {
|
||||
function restart() {
|
||||
logger.info('Restarting the server...')
|
||||
server()
|
||||
})
|
||||
}
|
||||
|
||||
proc.on('disconnect', () => {
|
||||
logger.info('Server disconnected. Reconnecting...')
|
||||
server()
|
||||
})
|
||||
proc.on('exit', restart)
|
||||
|
||||
process.on('exit', () => {
|
||||
app.on('before-quit', () => {
|
||||
proc.off('exit', restart)
|
||||
proc.kill()
|
||||
})
|
||||
}
|
||||
|
||||
function installCLI() {
|
||||
const symlinkPath = '/usr/local/bin/ollama'
|
||||
|
||||
if (fs.existsSync(symlinkPath) && fs.readlinkSync(symlinkPath) === ollama) {
|
||||
return
|
||||
}
|
||||
|
||||
dialog
|
||||
.showMessageBox({
|
||||
type: 'info',
|
||||
title: 'Ollama CLI installation',
|
||||
message: 'To make the Ollama command work in your terminal, it needs administrator privileges.',
|
||||
buttons: ['OK'],
|
||||
})
|
||||
.then(result => {
|
||||
if (result.response === 0) {
|
||||
const command = `
|
||||
do shell script "ln -F -s ${ollama} /usr/local/bin/ollama" with administrator privileges
|
||||
`
|
||||
exec(`osascript -e '${command}'`, (error: Error | null, stdout: string, stderr: string) => {
|
||||
if (error) {
|
||||
logger.error(`cli: failed to install cli: ${error.message}`)
|
||||
return
|
||||
}
|
||||
|
||||
logger.info(stdout)
|
||||
logger.error(stderr)
|
||||
})
|
||||
}
|
||||
})
|
||||
if (process.platform === 'darwin') {
|
||||
app.dock.hide()
|
||||
}
|
||||
|
||||
app.on('ready', () => {
|
||||
if (process.platform === 'darwin') {
|
||||
app.dock.hide()
|
||||
|
||||
if (!store.has('first-time-run')) {
|
||||
// This is the first run
|
||||
app.setLoginItemSettings({ openAtLogin: true })
|
||||
store.set('first-time-run', false)
|
||||
} else {
|
||||
// The app has been run before
|
||||
app.setLoginItemSettings({ openAtLogin: app.getLoginItemSettings().openAtLogin })
|
||||
}
|
||||
|
||||
if (app.isPackaged) {
|
||||
if (!app.isInApplicationsFolder()) {
|
||||
const chosen = dialog.showMessageBoxSync({
|
||||
@@ -157,13 +165,20 @@ app.on('ready', () => {
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
installCLI()
|
||||
}
|
||||
}
|
||||
|
||||
createSystemtray()
|
||||
server()
|
||||
|
||||
if (store.get('first-time-run') && installed()) {
|
||||
app.setLoginItemSettings({ openAtLogin: app.getLoginItemSettings().openAtLogin })
|
||||
return
|
||||
}
|
||||
|
||||
// This is the first run or the CLI is no longer installed
|
||||
app.setLoginItemSettings({ openAtLogin: true })
|
||||
firstRunWindow()
|
||||
})
|
||||
|
||||
// Quit when all windows are closed, except on macOS. There, it's common
|
||||
|
26
app/src/install.ts
Normal file
@@ -0,0 +1,26 @@
|
||||
import * as fs from 'fs'
|
||||
import { exec as cbExec } from 'child_process'
|
||||
import * as path from 'path'
|
||||
import { promisify } from 'util'
|
||||
|
||||
const app = process && process.type === 'renderer' ? require('@electron/remote').app : require('electron').app
|
||||
const ollama = app.isPackaged ? path.join(process.resourcesPath, 'ollama') : path.resolve(process.cwd(), '..', 'ollama')
|
||||
const exec = promisify(cbExec)
|
||||
const symlinkPath = '/usr/local/bin/ollama'
|
||||
|
||||
export function installed() {
|
||||
return fs.existsSync(symlinkPath) && fs.readlinkSync(symlinkPath) === ollama
|
||||
}
|
||||
|
||||
export async function install() {
|
||||
const command = `do shell script "mkdir -p ${path.dirname(
|
||||
symlinkPath
|
||||
)} && ln -F -s ${ollama} ${symlinkPath}" with administrator privileges`
|
||||
|
||||
try {
|
||||
await exec(`osascript -e '${command}'`)
|
||||
} catch (error) {
|
||||
console.error(`cli: failed to install cli: ${error.message}`)
|
||||
return
|
||||
}
|
||||
}
|
9
app/src/ollama.svg
Normal file
After Width: | Height: | Size: 17 KiB |
@@ -28,4 +28,8 @@ export const rules: Required<ModuleOptions>['rules'] = [
|
||||
},
|
||||
},
|
||||
},
|
||||
{
|
||||
test: /\.svg$/,
|
||||
use: ['@svgr/webpack'],
|
||||
},
|
||||
]
|
||||
|
391
cmd/cmd.go
@@ -5,36 +5,71 @@ import (
|
||||
"context"
|
||||
"errors"
|
||||
"fmt"
|
||||
"io"
|
||||
"log"
|
||||
"net"
|
||||
"net/http"
|
||||
"os"
|
||||
"path"
|
||||
"path/filepath"
|
||||
"strings"
|
||||
"time"
|
||||
|
||||
"github.com/schollz/progressbar/v3"
|
||||
"github.com/chzyer/readline"
|
||||
"github.com/dustin/go-humanize"
|
||||
"github.com/olekukonko/tablewriter"
|
||||
"github.com/spf13/cobra"
|
||||
"golang.org/x/term"
|
||||
|
||||
"github.com/jmorganca/ollama/api"
|
||||
"github.com/jmorganca/ollama/format"
|
||||
"github.com/jmorganca/ollama/progressbar"
|
||||
"github.com/jmorganca/ollama/server"
|
||||
)
|
||||
|
||||
func cacheDir() string {
|
||||
home, err := os.UserHomeDir()
|
||||
func CreateHandler(cmd *cobra.Command, args []string) error {
|
||||
filename, _ := cmd.Flags().GetString("file")
|
||||
filename, err := filepath.Abs(filename)
|
||||
if err != nil {
|
||||
panic(err)
|
||||
return err
|
||||
}
|
||||
|
||||
return path.Join(home, ".ollama")
|
||||
client := api.NewClient()
|
||||
|
||||
var spinner *Spinner
|
||||
|
||||
request := api.CreateRequest{Name: args[0], Path: filename}
|
||||
fn := func(resp api.CreateProgress) error {
|
||||
if spinner != nil {
|
||||
spinner.Stop()
|
||||
}
|
||||
|
||||
spinner = NewSpinner(resp.Status)
|
||||
go spinner.Spin(100 * time.Millisecond)
|
||||
|
||||
return nil
|
||||
}
|
||||
|
||||
if err := client.Create(context.Background(), &request, fn); err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
if spinner != nil {
|
||||
spinner.Stop()
|
||||
}
|
||||
|
||||
return nil
|
||||
}
|
||||
|
||||
func RunRun(cmd *cobra.Command, args []string) error {
|
||||
_, err := os.Stat(args[0])
|
||||
func RunHandler(cmd *cobra.Command, args []string) error {
|
||||
mp := server.ParseModelPath(args[0])
|
||||
fp, err := mp.GetManifestPath(false)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
_, err = os.Stat(fp)
|
||||
switch {
|
||||
case errors.Is(err, os.ErrNotExist):
|
||||
if err := pull(args[0]); err != nil {
|
||||
if err := pull(args[0], false); err != nil {
|
||||
var apiStatusError api.StatusError
|
||||
if !errors.As(err, &apiStatusError) {
|
||||
return err
|
||||
@@ -51,70 +86,147 @@ func RunRun(cmd *cobra.Command, args []string) error {
|
||||
return RunGenerate(cmd, args)
|
||||
}
|
||||
|
||||
func pull(model string) error {
|
||||
func PushHandler(cmd *cobra.Command, args []string) error {
|
||||
client := api.NewClient()
|
||||
var bar *progressbar.ProgressBar
|
||||
return client.Pull(
|
||||
context.Background(),
|
||||
&api.PullRequest{Model: model},
|
||||
func(progress api.PullProgress) error {
|
||||
if bar == nil {
|
||||
if progress.Percent >= 100 {
|
||||
// already downloaded
|
||||
return nil
|
||||
}
|
||||
|
||||
bar = progressbar.DefaultBytes(progress.Total)
|
||||
}
|
||||
insecure, err := cmd.Flags().GetBool("insecure")
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
return bar.Set64(progress.Completed)
|
||||
},
|
||||
)
|
||||
request := api.PushRequest{Name: args[0], Insecure: insecure}
|
||||
fn := func(resp api.ProgressResponse) error {
|
||||
fmt.Println(resp.Status)
|
||||
return nil
|
||||
}
|
||||
|
||||
if err := client.Push(context.Background(), &request, fn); err != nil {
|
||||
return err
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
func RunGenerate(_ *cobra.Command, args []string) error {
|
||||
func ListHandler(cmd *cobra.Command, args []string) error {
|
||||
client := api.NewClient()
|
||||
|
||||
models, err := client.List(context.Background())
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
var data [][]string
|
||||
|
||||
for _, m := range models.Models {
|
||||
if len(args) == 0 || strings.HasPrefix(m.Name, args[0]) {
|
||||
data = append(data, []string{m.Name, humanize.Bytes(uint64(m.Size)), format.HumanTime(m.ModifiedAt, "Never")})
|
||||
}
|
||||
}
|
||||
|
||||
table := tablewriter.NewWriter(os.Stdout)
|
||||
table.SetHeader([]string{"NAME", "SIZE", "MODIFIED"})
|
||||
table.SetHeaderAlignment(tablewriter.ALIGN_LEFT)
|
||||
table.SetAlignment(tablewriter.ALIGN_LEFT)
|
||||
table.SetHeaderLine(false)
|
||||
table.SetBorder(false)
|
||||
table.SetNoWhiteSpace(true)
|
||||
table.SetTablePadding("\t")
|
||||
table.AppendBulk(data)
|
||||
table.Render()
|
||||
|
||||
return nil
|
||||
}
|
||||
|
||||
func DeleteHandler(cmd *cobra.Command, args []string) error {
|
||||
client := api.NewClient()
|
||||
|
||||
request := api.DeleteRequest{Name: args[0]}
|
||||
if err := client.Delete(context.Background(), &request); err != nil {
|
||||
return err
|
||||
}
|
||||
fmt.Printf("deleted '%s'\n", args[0])
|
||||
return nil
|
||||
}
|
||||
|
||||
func PullHandler(cmd *cobra.Command, args []string) error {
|
||||
insecure, err := cmd.Flags().GetBool("insecure")
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
return pull(args[0], insecure)
|
||||
}
|
||||
|
||||
func pull(model string, insecure bool) error {
|
||||
client := api.NewClient()
|
||||
|
||||
var currentDigest string
|
||||
var bar *progressbar.ProgressBar
|
||||
|
||||
request := api.PullRequest{Name: model, Insecure: insecure}
|
||||
fn := func(resp api.ProgressResponse) error {
|
||||
if resp.Digest != currentDigest && resp.Digest != "" {
|
||||
currentDigest = resp.Digest
|
||||
bar = progressbar.DefaultBytes(
|
||||
int64(resp.Total),
|
||||
fmt.Sprintf("pulling %s...", resp.Digest[7:19]),
|
||||
)
|
||||
|
||||
bar.Set(resp.Completed)
|
||||
} else if resp.Digest == currentDigest && resp.Digest != "" {
|
||||
bar.Set(resp.Completed)
|
||||
} else {
|
||||
currentDigest = ""
|
||||
fmt.Println(resp.Status)
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
if err := client.Pull(context.Background(), &request, fn); err != nil {
|
||||
return err
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
func RunGenerate(cmd *cobra.Command, args []string) error {
|
||||
if len(args) > 1 {
|
||||
// join all args into a single prompt
|
||||
return generate(args[0], strings.Join(args[1:], " "))
|
||||
return generate(cmd, args[0], strings.Join(args[1:], " "))
|
||||
}
|
||||
|
||||
if term.IsTerminal(int(os.Stdin.Fd())) {
|
||||
return generateInteractive(args[0])
|
||||
if readline.IsTerminal(int(os.Stdin.Fd())) {
|
||||
return generateInteractive(cmd, args[0])
|
||||
}
|
||||
|
||||
return generateBatch(args[0])
|
||||
return generateBatch(cmd, args[0])
|
||||
}
|
||||
|
||||
func generate(model, prompt string) error {
|
||||
var generateContextKey struct{}
|
||||
|
||||
func generate(cmd *cobra.Command, model, prompt string) error {
|
||||
if len(strings.TrimSpace(prompt)) > 0 {
|
||||
client := api.NewClient()
|
||||
|
||||
spinner := progressbar.NewOptions(-1,
|
||||
progressbar.OptionSetWriter(os.Stderr),
|
||||
progressbar.OptionThrottle(60*time.Millisecond),
|
||||
progressbar.OptionSpinnerType(14),
|
||||
progressbar.OptionSetRenderBlankState(true),
|
||||
progressbar.OptionSetElapsedTime(false),
|
||||
progressbar.OptionClearOnFinish(),
|
||||
)
|
||||
spinner := NewSpinner("")
|
||||
go spinner.Spin(60 * time.Millisecond)
|
||||
|
||||
go func() {
|
||||
for range time.Tick(60 * time.Millisecond) {
|
||||
if spinner.IsFinished() {
|
||||
break
|
||||
}
|
||||
var latest api.GenerateResponse
|
||||
|
||||
spinner.Add(1)
|
||||
}
|
||||
}()
|
||||
generateContext, ok := cmd.Context().Value(generateContextKey).([]int)
|
||||
if !ok {
|
||||
generateContext = []int{}
|
||||
}
|
||||
|
||||
request := api.GenerateRequest{Model: model, Prompt: prompt}
|
||||
request := api.GenerateRequest{Model: model, Prompt: prompt, Context: generateContext}
|
||||
fn := func(resp api.GenerateResponse) error {
|
||||
if !spinner.IsFinished() {
|
||||
spinner.Finish()
|
||||
}
|
||||
|
||||
latest = resp
|
||||
|
||||
fmt.Print(resp.Response)
|
||||
|
||||
cmd.SetContext(context.WithValue(cmd.Context(), generateContextKey, resp.Context))
|
||||
return nil
|
||||
}
|
||||
|
||||
@@ -124,31 +236,134 @@ func generate(model, prompt string) error {
|
||||
|
||||
fmt.Println()
|
||||
fmt.Println()
|
||||
}
|
||||
|
||||
return nil
|
||||
}
|
||||
|
||||
func generateInteractive(model string) error {
|
||||
fmt.Print(">>> ")
|
||||
scanner := bufio.NewScanner(os.Stdin)
|
||||
for scanner.Scan() {
|
||||
if err := generate(model, scanner.Text()); err != nil {
|
||||
verbose, err := cmd.Flags().GetBool("verbose")
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
fmt.Print(">>> ")
|
||||
if verbose {
|
||||
latest.Summary()
|
||||
}
|
||||
}
|
||||
|
||||
return nil
|
||||
}
|
||||
|
||||
func generateBatch(model string) error {
|
||||
func generateInteractive(cmd *cobra.Command, model string) error {
|
||||
home, err := os.UserHomeDir()
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
completer := readline.NewPrefixCompleter(
|
||||
readline.PcItem("/help"),
|
||||
readline.PcItem("/list"),
|
||||
readline.PcItem("/set",
|
||||
readline.PcItem("history"),
|
||||
readline.PcItem("nohistory"),
|
||||
readline.PcItem("verbose"),
|
||||
readline.PcItem("quiet"),
|
||||
readline.PcItem("mode",
|
||||
readline.PcItem("vim"),
|
||||
readline.PcItem("emacs"),
|
||||
readline.PcItem("default"),
|
||||
),
|
||||
),
|
||||
readline.PcItem("/exit"),
|
||||
readline.PcItem("/bye"),
|
||||
)
|
||||
|
||||
usage := func() {
|
||||
fmt.Fprintln(os.Stderr, "commands:")
|
||||
fmt.Fprintln(os.Stderr, completer.Tree(" "))
|
||||
}
|
||||
|
||||
config := readline.Config{
|
||||
Prompt: ">>> ",
|
||||
HistoryFile: filepath.Join(home, ".ollama", "history"),
|
||||
AutoComplete: completer,
|
||||
}
|
||||
|
||||
scanner, err := readline.NewEx(&config)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
defer scanner.Close()
|
||||
|
||||
for {
|
||||
line, err := scanner.Readline()
|
||||
switch {
|
||||
case errors.Is(err, io.EOF):
|
||||
return nil
|
||||
case errors.Is(err, readline.ErrInterrupt):
|
||||
if line == "" {
|
||||
return nil
|
||||
}
|
||||
|
||||
continue
|
||||
case err != nil:
|
||||
return err
|
||||
}
|
||||
|
||||
line = strings.TrimSpace(line)
|
||||
|
||||
switch {
|
||||
case strings.HasPrefix(line, "/list"):
|
||||
args := strings.Fields(line)
|
||||
if err := ListHandler(cmd, args[1:]); err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
continue
|
||||
case strings.HasPrefix(line, "/set"):
|
||||
args := strings.Fields(line)
|
||||
if len(args) > 1 {
|
||||
switch args[1] {
|
||||
case "history":
|
||||
scanner.HistoryEnable()
|
||||
continue
|
||||
case "nohistory":
|
||||
scanner.HistoryDisable()
|
||||
continue
|
||||
case "verbose":
|
||||
cmd.Flags().Set("verbose", "true")
|
||||
continue
|
||||
case "quiet":
|
||||
cmd.Flags().Set("verbose", "false")
|
||||
continue
|
||||
case "mode":
|
||||
if len(args) > 2 {
|
||||
switch args[2] {
|
||||
case "vim":
|
||||
scanner.SetVimMode(true)
|
||||
continue
|
||||
case "emacs", "default":
|
||||
scanner.SetVimMode(false)
|
||||
continue
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
case line == "/help", line == "/?":
|
||||
usage()
|
||||
continue
|
||||
case line == "/exit", line == "/bye":
|
||||
return nil
|
||||
}
|
||||
|
||||
if err := generate(cmd, model, line); err != nil {
|
||||
return err
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func generateBatch(cmd *cobra.Command, model string) error {
|
||||
scanner := bufio.NewScanner(os.Stdin)
|
||||
for scanner.Scan() {
|
||||
prompt := scanner.Text()
|
||||
fmt.Printf(">>> %s\n", prompt)
|
||||
if err := generate(model, prompt); err != nil {
|
||||
if err := generate(cmd, model, prompt); err != nil {
|
||||
return err
|
||||
}
|
||||
}
|
||||
@@ -185,21 +400,28 @@ func NewCLI() *cobra.Command {
|
||||
CompletionOptions: cobra.CompletionOptions{
|
||||
DisableDefaultCmd: true,
|
||||
},
|
||||
PersistentPreRunE: func(_ *cobra.Command, args []string) error {
|
||||
// create the models directory and it's parent
|
||||
return os.MkdirAll(path.Join(cacheDir(), "models"), 0o700)
|
||||
},
|
||||
}
|
||||
|
||||
cobra.EnableCommandSorting = false
|
||||
|
||||
createCmd := &cobra.Command{
|
||||
Use: "create MODEL",
|
||||
Short: "Create a model from a Modelfile",
|
||||
Args: cobra.MinimumNArgs(1),
|
||||
RunE: CreateHandler,
|
||||
}
|
||||
|
||||
createCmd.Flags().StringP("file", "f", "Modelfile", "Name of the Modelfile (default \"Modelfile\")")
|
||||
|
||||
runCmd := &cobra.Command{
|
||||
Use: "run MODEL [PROMPT]",
|
||||
Short: "Run a model",
|
||||
Args: cobra.MinimumNArgs(1),
|
||||
RunE: RunRun,
|
||||
RunE: RunHandler,
|
||||
}
|
||||
|
||||
runCmd.Flags().Bool("verbose", false, "Show timings for response")
|
||||
|
||||
serveCmd := &cobra.Command{
|
||||
Use: "serve",
|
||||
Aliases: []string{"start"},
|
||||
@@ -207,9 +429,46 @@ func NewCLI() *cobra.Command {
|
||||
RunE: RunServer,
|
||||
}
|
||||
|
||||
pullCmd := &cobra.Command{
|
||||
Use: "pull MODEL",
|
||||
Short: "Pull a model from a registry",
|
||||
Args: cobra.MinimumNArgs(1),
|
||||
RunE: PullHandler,
|
||||
}
|
||||
|
||||
pullCmd.Flags().Bool("insecure", false, "Use an insecure registry")
|
||||
|
||||
pushCmd := &cobra.Command{
|
||||
Use: "push MODEL",
|
||||
Short: "Push a model to a registry",
|
||||
Args: cobra.MinimumNArgs(1),
|
||||
RunE: PushHandler,
|
||||
}
|
||||
|
||||
pushCmd.Flags().Bool("insecure", false, "Use an insecure registry")
|
||||
|
||||
listCmd := &cobra.Command{
|
||||
Use: "list",
|
||||
Aliases: []string{"ls"},
|
||||
Short: "List models",
|
||||
RunE: ListHandler,
|
||||
}
|
||||
|
||||
deleteCmd := &cobra.Command{
|
||||
Use: "rm",
|
||||
Short: "Remove a model",
|
||||
Args: cobra.MinimumNArgs(1),
|
||||
RunE: DeleteHandler,
|
||||
}
|
||||
|
||||
rootCmd.AddCommand(
|
||||
serveCmd,
|
||||
createCmd,
|
||||
runCmd,
|
||||
pullCmd,
|
||||
pushCmd,
|
||||
listCmd,
|
||||
deleteCmd,
|
||||
)
|
||||
|
||||
return rootCmd
|
||||
|
44
cmd/spinner.go
Normal file
@@ -0,0 +1,44 @@
|
||||
package cmd
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"os"
|
||||
"time"
|
||||
|
||||
"github.com/jmorganca/ollama/progressbar"
|
||||
)
|
||||
|
||||
type Spinner struct {
|
||||
description string
|
||||
*progressbar.ProgressBar
|
||||
}
|
||||
|
||||
func NewSpinner(description string) *Spinner {
|
||||
return &Spinner{
|
||||
description: description,
|
||||
ProgressBar: progressbar.NewOptions(-1,
|
||||
progressbar.OptionSetWriter(os.Stderr),
|
||||
progressbar.OptionThrottle(60*time.Millisecond),
|
||||
progressbar.OptionSpinnerType(14),
|
||||
progressbar.OptionSetRenderBlankState(true),
|
||||
progressbar.OptionSetElapsedTime(false),
|
||||
progressbar.OptionClearOnFinish(),
|
||||
progressbar.OptionSetDescription(description),
|
||||
),
|
||||
}
|
||||
}
|
||||
|
||||
func (s *Spinner) Spin(tick time.Duration) {
|
||||
for range time.Tick(tick) {
|
||||
if s.IsFinished() {
|
||||
break
|
||||
}
|
||||
|
||||
s.Add(1)
|
||||
}
|
||||
}
|
||||
|
||||
func (s *Spinner) Stop() {
|
||||
s.Finish()
|
||||
fmt.Println(s.description)
|
||||
}
|
@@ -3,13 +3,19 @@
|
||||
Install required tools:
|
||||
|
||||
```
|
||||
brew install cmake go node
|
||||
brew install go
|
||||
```
|
||||
|
||||
Then run `make`:
|
||||
Enable CGO:
|
||||
|
||||
```
|
||||
make
|
||||
export CGO_ENABLED=1
|
||||
```
|
||||
|
||||
Then build ollama:
|
||||
|
||||
```
|
||||
go build .
|
||||
```
|
||||
|
||||
Now you can run `ollama`:
|
||||
|
105
docs/modelfile.md
Normal file
@@ -0,0 +1,105 @@
|
||||
# Ollama Model File
|
||||
|
||||
> Note: this model file syntax is in development
|
||||
|
||||
A model file is the blueprint to create and share models with Ollama.
|
||||
|
||||
## Format
|
||||
|
||||
The format of the Modelfile:
|
||||
|
||||
```modelfile
|
||||
# comment
|
||||
INSTRUCTION arguments
|
||||
```
|
||||
|
||||
| Instruction | Description |
|
||||
| ----------------- | ----------------------------------------------------- |
|
||||
| `FROM` (required) | Defines the base model to use |
|
||||
| `PARAMETER` | Sets the parameters for how Ollama will run the model |
|
||||
| `SYSTEM` | Specifies the system prompt that will set the context |
|
||||
| `TEMPLATE` | The full prompt template to be sent to the model |
|
||||
| `LICENSE` | Specifies the legal license |
|
||||
|
||||
## Examples
|
||||
|
||||
An example of a model file creating a mario blueprint:
|
||||
|
||||
```
|
||||
FROM llama2
|
||||
# sets the temperature to 1 [higher is more creative, lower is more coherent]
|
||||
# sets the context size to 4096
|
||||
PARAMETER temperature 1
|
||||
PARAMETER num_ctx 4096
|
||||
|
||||
# Overriding the system prompt
|
||||
SYSTEM You are Mario from super mario bros, acting as an assistant.
|
||||
```
|
||||
|
||||
To use this:
|
||||
|
||||
1. Save it as a file (eg. `Modelfile``)
|
||||
2. `ollama create NAME -f <location of the file eg. ./Modelfile>'`
|
||||
3. `ollama run NAME`
|
||||
4. Start using the model!
|
||||
|
||||
## FROM (Required)
|
||||
|
||||
The FROM instruction defines the base model to use when creating a model.
|
||||
|
||||
```
|
||||
FROM <model name>:<tag>
|
||||
```
|
||||
|
||||
### Build from llama2
|
||||
|
||||
```
|
||||
FROM llama2
|
||||
```
|
||||
|
||||
A list of available base models:
|
||||
<https://github.com/jmorganca/ollama#model-library>
|
||||
|
||||
### Build from a bin file
|
||||
|
||||
```
|
||||
FROM ./ollama-model.bin
|
||||
```
|
||||
|
||||
## PARAMETER (Optional)
|
||||
|
||||
The `PARAMETER` instruction defines a parameter that can be set when the model is run.
|
||||
|
||||
```
|
||||
PARAMETER <parameter> <parametervalue>
|
||||
```
|
||||
|
||||
### Valid Parameters and Values
|
||||
|
||||
| Parameter | Description | Value Type | Example Usage |
|
||||
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------- | ------------------ |
|
||||
| num_ctx | Sets the size of the prompt context size length model. (Default: 2048) | int | num_ctx 4096 |
|
||||
| temperature | The temperature of the model. Increasing the temperature will make the model answer more creatively. (Default: 0.8) | float | temperature 0.7 |
|
||||
| top_k | Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40) | int | top_k 40 |
|
||||
| top_p | Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9) | float | top_p 0.9 |
|
||||
| num_gpu | The number of GPUs to use. On macOS it defaults to 1 to enable metal support, 0 to disable. | int | num_gpu 1 |
|
||||
| repeat_last_n | Sets how far back for the model to look back to prevent repetition. (Default: 64, 0 = disabled, -1 = ctx-size) | int | repeat_last_n 64 |
|
||||
| repeat_penalty | Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1) | float | repeat_penalty 1.1 |
|
||||
| tfs_z | Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting. (default: 1) | float | tfs_z 1 |
|
||||
| mirostat | Enable Mirostat sampling for controlling perplexity. (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0) | int | mirostat 0 |
|
||||
| mirostat_tau | Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. (Default: 5.0) | float | mirostat_tau 5.0 |
|
||||
| mirostat_eta | Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive. (Default: 0.1) | float | mirostat_eta 0.1 |
|
||||
| num_thread | Sets the number of threads to use during computation. By default, Ollama will detect this for optimal performance. It is recommended to set this value to the number of physical CPU cores your system has (as opposed to the logical number of cores). | int | num_thread 8 |
|
||||
|
||||
## Prompt
|
||||
|
||||
When building on top of the base models supplied by Ollama, it comes with the prompt template predefined. To override the supplied system prompt, simply add `SYSTEM insert system prompt` to change the system prompt.
|
||||
|
||||
### Prompt Template
|
||||
|
||||
`TEMPLATE` the full prompt template to be passed into the model. It may include (optionally) a system prompt, user prompt, and assistant prompt. This is used to create a full custom prompt, and syntax may be model specific.
|
||||
|
||||
## Notes
|
||||
|
||||
- the **modelfile is not case sensitive**. In the examples, we use uppercase for instructions to make it easier to distinguish it from arguments.
|
||||
- Instructions can be in any order. In the examples, we start with FROM instruction to keep it easily readable.
|
@@ -1,64 +0,0 @@
|
||||
# Python SDK
|
||||
|
||||
## Install
|
||||
|
||||
```
|
||||
pip install ollama
|
||||
```
|
||||
|
||||
## Example
|
||||
|
||||
```python
|
||||
import ollama
|
||||
ollama.generate("orca-mini-3b", "hi")
|
||||
```
|
||||
|
||||
## Reference
|
||||
|
||||
### `ollama.generate(model, message)`
|
||||
|
||||
Generate a completion
|
||||
|
||||
```python
|
||||
ollama.generate("./llama-7b-ggml.bin", "hi")
|
||||
```
|
||||
|
||||
### `ollama.models()`
|
||||
|
||||
List available local models
|
||||
|
||||
```python
|
||||
models = ollama.models()
|
||||
```
|
||||
|
||||
### `ollama.load(model)`
|
||||
|
||||
Manually a model for generation
|
||||
|
||||
```python
|
||||
ollama.load("model")
|
||||
```
|
||||
|
||||
### `ollama.unload(model)`
|
||||
|
||||
Unload a model
|
||||
|
||||
```python
|
||||
ollama.unload("model")
|
||||
```
|
||||
|
||||
### `ollama.pull(model)`
|
||||
|
||||
Download a model
|
||||
|
||||
```python
|
||||
ollama.pull("huggingface.co/thebloke/llama-7b-ggml")
|
||||
```
|
||||
|
||||
### `ollama.search(query)`
|
||||
|
||||
Search for compatible models that Ollama can run
|
||||
|
||||
```python
|
||||
ollama.search("llama-7b")
|
||||
```
|
15
examples/README.md
Normal file
@@ -0,0 +1,15 @@
|
||||
# Examples
|
||||
|
||||
This directory contains examples that can be created and run with `ollama`.
|
||||
|
||||
To create a model:
|
||||
|
||||
```
|
||||
ollama create example -f <example file>
|
||||
```
|
||||
|
||||
To run a model:
|
||||
|
||||
```
|
||||
ollama run example
|
||||
```
|
5
examples/mario/Modelfile
Normal file
@@ -0,0 +1,5 @@
|
||||
FROM llama2
|
||||
PARAMETER temperature 1
|
||||
SYSTEM """
|
||||
You are Mario from super mario bros, acting as an assistant.
|
||||
"""
|
BIN
examples/mario/logo.png
Normal file
After Width: | Height: | Size: 446 KiB |
43
examples/mario/readme.md
Normal file
@@ -0,0 +1,43 @@
|
||||
<img src="logo.png" alt="image of Italian plumber" height="200"/>
|
||||
|
||||
# Example character: Mario
|
||||
|
||||
This example shows how to create a basic character using Llama2 as the base model.
|
||||
|
||||
To run this example:
|
||||
|
||||
1. Download the Modelfile
|
||||
2. `ollama pull llama2` to get the base model used in the model file.
|
||||
3. `ollama create NAME -f ./Modelfile`
|
||||
4. `ollama run NAME`
|
||||
|
||||
Ask it some questions like "Who are you?" or "Is Peach in trouble again?"
|
||||
|
||||
## Editing this file
|
||||
|
||||
What the model file looks like:
|
||||
|
||||
```
|
||||
FROM llama2
|
||||
PARAMETER temperature 1
|
||||
SYSTEM """
|
||||
You are Mario from Super Mario Bros, acting as an assistant.
|
||||
"""
|
||||
```
|
||||
|
||||
What if you want to change its behaviour?
|
||||
|
||||
- Try changing the prompt
|
||||
- Try changing the parameters [Docs](https://github.com/jmorganca/ollama/blob/main/docs/modelfile.md)
|
||||
- Try changing the model (e.g. An uncensored model by `FROM wizard-vicuna` this is the wizard-vicuna uncensored model )
|
||||
|
||||
Once the changes are made,
|
||||
|
||||
1. `ollama create NAME -f ./Modelfile`
|
||||
2. `ollama run NAME`
|
||||
3. Iterate until you are happy with the results.
|
||||
|
||||
Notes:
|
||||
|
||||
- This example is for research purposes only. There is no affiliation with any entity.
|
||||
- When using an uncensored model, please be aware that it may generate offensive content.
|
8
examples/midjourney-prompter/Modelfile
Normal file
@@ -0,0 +1,8 @@
|
||||
# Modelfile for creating a Midjourney prompts from a topic
|
||||
# This prompt was adapted from the original at https://www.greataiprompts.com/guide/midjourney/best-chatgpt-prompt-for-midjourney/
|
||||
# Run `ollama create mj -f ./Modelfile` and then `ollama run mj` and enter a topic
|
||||
|
||||
FROM nous-hermes
|
||||
SYSTEM """
|
||||
Embrace your role as an AI-powered creative assistant, employing Midjourney to manifest compelling AI-generated art. I will outline a specific image concept, and in response, you must produce an exhaustive, multifaceted prompt for Midjourney, ensuring every detail of the original concept is represented in your instructions. Midjourney doesn't do well with text, so after the prompt, give me instructions that I can use to create the titles in a image editor.
|
||||
"""
|
@@ -1,15 +0,0 @@
|
||||
# Python
|
||||
|
||||
This is a simple example of calling the Ollama api from a python app.
|
||||
|
||||
First, download a model:
|
||||
|
||||
```
|
||||
curl -L https://huggingface.co/TheBloke/orca_mini_3B-GGML/resolve/main/orca-mini-3b.ggmlv3.q4_1.bin -o orca.bin
|
||||
```
|
||||
|
||||
Then run it using the example script. You'll need to have Ollama running on your machine.
|
||||
|
||||
```
|
||||
python3 main.py orca.bin
|
||||
```
|
@@ -1,32 +0,0 @@
|
||||
import http.client
|
||||
import json
|
||||
import os
|
||||
import sys
|
||||
|
||||
if len(sys.argv) < 2:
|
||||
print("Usage: python main.py <model file>")
|
||||
sys.exit(1)
|
||||
|
||||
conn = http.client.HTTPConnection('localhost', 11434)
|
||||
|
||||
headers = { 'Content-Type': 'application/json' }
|
||||
|
||||
# generate text from the model
|
||||
conn.request("POST", "/api/generate", json.dumps({
|
||||
'model': os.path.join(os.getcwd(), sys.argv[1]),
|
||||
'prompt': 'write me a short story',
|
||||
'stream': True
|
||||
}), headers)
|
||||
|
||||
response = conn.getresponse()
|
||||
|
||||
def parse_generate(data):
|
||||
for event in data.decode('utf-8').split("\n"):
|
||||
if not event:
|
||||
continue
|
||||
yield event
|
||||
|
||||
if response.status == 200:
|
||||
for chunk in response:
|
||||
for event in parse_generate(chunk):
|
||||
print(json.loads(event)['response'], end="", flush=True)
|
6
examples/recipemaker/Modelfile
Normal file
@@ -0,0 +1,6 @@
|
||||
# Modelfile for creating a recipe from a list of ingredients
|
||||
# Run `ollama create recipemaker -f ./Modelfile` and then `ollama run recipemaker` and feed it lists of ingredients to create recipes around.
|
||||
FROM nous-hermes
|
||||
SYSTEM """
|
||||
The instruction will be a list of ingredients. You should generate a recipe that can be made in less than an hour. You can also include ingredients that most people will find in their pantry every day. The recipe should be 4 people and you should include a description of what the meal will taste like
|
||||
"""
|
7
examples/tweetwriter/Modelfile
Normal file
@@ -0,0 +1,7 @@
|
||||
# Modelfile for creating a tweet from a topic
|
||||
# Run `ollama create tweetwriter -f ./Modelfile` and then `ollama run tweetwriter` and enter a topic
|
||||
|
||||
FROM nous-hermes
|
||||
SYSTEM """
|
||||
You are a content marketer who needs to come up with a short but succinct tweet. Make sure to include the appropriate hashtags and links. Sometimes when appropriate, describe a meme that can be includes as well. All answers should be in the form of a tweet which has a max size of 280 characters. Every instruction will be the topic to create a tweet about.
|
||||
"""
|
141
format/time.go
Normal file
@@ -0,0 +1,141 @@
|
||||
package format
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"math"
|
||||
"strings"
|
||||
"time"
|
||||
)
|
||||
|
||||
// HumanDuration returns a human-readable approximation of a duration
|
||||
// (eg. "About a minute", "4 hours ago", etc.).
|
||||
// Modified version of github.com/docker/go-units.HumanDuration
|
||||
func HumanDuration(d time.Duration) string {
|
||||
return HumanDurationWithCase(d, true)
|
||||
}
|
||||
|
||||
// HumanDurationWithCase returns a human-readable approximation of a
|
||||
// duration (eg. "About a minute", "4 hours ago", etc.). but allows
|
||||
// you to specify whether the first word should be capitalized
|
||||
// (eg. "About" vs. "about")
|
||||
func HumanDurationWithCase(d time.Duration, useCaps bool) string {
|
||||
seconds := int(d.Seconds())
|
||||
|
||||
switch {
|
||||
case seconds < 1:
|
||||
if useCaps {
|
||||
return "Less than a second"
|
||||
}
|
||||
return "less than a second"
|
||||
case seconds == 1:
|
||||
return "1 second"
|
||||
case seconds < 60:
|
||||
return fmt.Sprintf("%d seconds", seconds)
|
||||
}
|
||||
|
||||
minutes := int(d.Minutes())
|
||||
switch {
|
||||
case minutes == 1:
|
||||
if useCaps {
|
||||
return "About a minute"
|
||||
}
|
||||
return "about a minute"
|
||||
case minutes < 60:
|
||||
return fmt.Sprintf("%d minutes", minutes)
|
||||
}
|
||||
|
||||
hours := int(math.Round(d.Hours()))
|
||||
switch {
|
||||
case hours == 1:
|
||||
if useCaps {
|
||||
return "About an hour"
|
||||
}
|
||||
return "about an hour"
|
||||
case hours < 48:
|
||||
return fmt.Sprintf("%d hours", hours)
|
||||
case hours < 24*7*2:
|
||||
return fmt.Sprintf("%d days", hours/24)
|
||||
case hours < 24*30*2:
|
||||
return fmt.Sprintf("%d weeks", hours/24/7)
|
||||
case hours < 24*365*2:
|
||||
return fmt.Sprintf("%d months", hours/24/30)
|
||||
}
|
||||
|
||||
return fmt.Sprintf("%d years", int(d.Hours())/24/365)
|
||||
}
|
||||
|
||||
func HumanTime(t time.Time, zeroValue string) string {
|
||||
return humanTimeWithCase(t, zeroValue, true)
|
||||
}
|
||||
|
||||
func HumanTimeLower(t time.Time, zeroValue string) string {
|
||||
return humanTimeWithCase(t, zeroValue, false)
|
||||
}
|
||||
|
||||
func humanTimeWithCase(t time.Time, zeroValue string, useCaps bool) string {
|
||||
if t.IsZero() {
|
||||
return zeroValue
|
||||
}
|
||||
|
||||
delta := time.Since(t)
|
||||
if delta < 0 {
|
||||
return HumanDurationWithCase(-delta, useCaps) + " from now"
|
||||
}
|
||||
return HumanDurationWithCase(delta, useCaps) + " ago"
|
||||
}
|
||||
|
||||
// ExcatDuration returns a human readable hours/minutes/seconds or milliseconds format of a duration
|
||||
// the most precise level of duration is milliseconds
|
||||
func ExactDuration(d time.Duration) string {
|
||||
if d.Seconds() < 1 {
|
||||
if d.Milliseconds() == 1 {
|
||||
return fmt.Sprintf("%d millisecond", d.Milliseconds())
|
||||
}
|
||||
return fmt.Sprintf("%d milliseconds", d.Milliseconds())
|
||||
}
|
||||
|
||||
var readableDur strings.Builder
|
||||
|
||||
dur := d.String()
|
||||
|
||||
// split the default duration string format of 0h0m0s into something nicer to read
|
||||
h := strings.Split(dur, "h")
|
||||
if len(h) > 1 {
|
||||
hours := h[0]
|
||||
if hours == "1" {
|
||||
readableDur.WriteString(fmt.Sprintf("%s hour ", hours))
|
||||
} else {
|
||||
readableDur.WriteString(fmt.Sprintf("%s hours ", hours))
|
||||
}
|
||||
dur = h[1]
|
||||
}
|
||||
|
||||
m := strings.Split(dur, "m")
|
||||
if len(m) > 1 {
|
||||
mins := m[0]
|
||||
switch mins {
|
||||
case "0":
|
||||
// skip
|
||||
case "1":
|
||||
readableDur.WriteString(fmt.Sprintf("%s minute ", mins))
|
||||
default:
|
||||
readableDur.WriteString(fmt.Sprintf("%s minutes ", mins))
|
||||
}
|
||||
dur = m[1]
|
||||
}
|
||||
|
||||
s := strings.Split(dur, "s")
|
||||
if len(s) > 0 {
|
||||
sec := s[0]
|
||||
switch sec {
|
||||
case "0":
|
||||
// skip
|
||||
case "1":
|
||||
readableDur.WriteString(fmt.Sprintf("%s second ", sec))
|
||||
default:
|
||||
readableDur.WriteString(fmt.Sprintf("%s seconds ", sec))
|
||||
}
|
||||
}
|
||||
|
||||
return strings.TrimSpace(readableDur.String())
|
||||
}
|
102
format/time_test.go
Normal file
@@ -0,0 +1,102 @@
|
||||
package format
|
||||
|
||||
import (
|
||||
"testing"
|
||||
"time"
|
||||
)
|
||||
|
||||
func assertEqual(t *testing.T, a interface{}, b interface{}) {
|
||||
if a != b {
|
||||
t.Errorf("Assert failed, expected %v, got %v", b, a)
|
||||
}
|
||||
}
|
||||
|
||||
func TestHumanDuration(t *testing.T) {
|
||||
day := 24 * time.Hour
|
||||
week := 7 * day
|
||||
month := 30 * day
|
||||
year := 365 * day
|
||||
|
||||
assertEqual(t, "Less than a second", HumanDuration(450*time.Millisecond))
|
||||
assertEqual(t, "Less than a second", HumanDurationWithCase(450*time.Millisecond, true))
|
||||
assertEqual(t, "less than a second", HumanDurationWithCase(450*time.Millisecond, false))
|
||||
assertEqual(t, "1 second", HumanDuration(1*time.Second))
|
||||
assertEqual(t, "45 seconds", HumanDuration(45*time.Second))
|
||||
assertEqual(t, "46 seconds", HumanDuration(46*time.Second))
|
||||
assertEqual(t, "59 seconds", HumanDuration(59*time.Second))
|
||||
assertEqual(t, "About a minute", HumanDuration(60*time.Second))
|
||||
assertEqual(t, "About a minute", HumanDurationWithCase(1*time.Minute, true))
|
||||
assertEqual(t, "about a minute", HumanDurationWithCase(1*time.Minute, false))
|
||||
assertEqual(t, "3 minutes", HumanDuration(3*time.Minute))
|
||||
assertEqual(t, "35 minutes", HumanDuration(35*time.Minute))
|
||||
assertEqual(t, "35 minutes", HumanDuration(35*time.Minute+40*time.Second))
|
||||
assertEqual(t, "45 minutes", HumanDuration(45*time.Minute))
|
||||
assertEqual(t, "45 minutes", HumanDuration(45*time.Minute+40*time.Second))
|
||||
assertEqual(t, "46 minutes", HumanDuration(46*time.Minute))
|
||||
assertEqual(t, "59 minutes", HumanDuration(59*time.Minute))
|
||||
assertEqual(t, "About an hour", HumanDuration(1*time.Hour))
|
||||
assertEqual(t, "About an hour", HumanDurationWithCase(1*time.Hour+29*time.Minute, true))
|
||||
assertEqual(t, "about an hour", HumanDurationWithCase(1*time.Hour+29*time.Minute, false))
|
||||
assertEqual(t, "2 hours", HumanDuration(1*time.Hour+31*time.Minute))
|
||||
assertEqual(t, "2 hours", HumanDuration(1*time.Hour+59*time.Minute))
|
||||
assertEqual(t, "3 hours", HumanDuration(3*time.Hour))
|
||||
assertEqual(t, "3 hours", HumanDuration(3*time.Hour+29*time.Minute))
|
||||
assertEqual(t, "4 hours", HumanDuration(3*time.Hour+31*time.Minute))
|
||||
assertEqual(t, "4 hours", HumanDuration(3*time.Hour+59*time.Minute))
|
||||
assertEqual(t, "4 hours", HumanDuration(3*time.Hour+60*time.Minute))
|
||||
assertEqual(t, "24 hours", HumanDuration(24*time.Hour))
|
||||
assertEqual(t, "36 hours", HumanDuration(1*day+12*time.Hour))
|
||||
assertEqual(t, "2 days", HumanDuration(2*day))
|
||||
assertEqual(t, "7 days", HumanDuration(7*day))
|
||||
assertEqual(t, "13 days", HumanDuration(13*day+5*time.Hour))
|
||||
assertEqual(t, "2 weeks", HumanDuration(2*week))
|
||||
assertEqual(t, "2 weeks", HumanDuration(2*week+4*day))
|
||||
assertEqual(t, "3 weeks", HumanDuration(3*week))
|
||||
assertEqual(t, "4 weeks", HumanDuration(4*week))
|
||||
assertEqual(t, "4 weeks", HumanDuration(4*week+3*day))
|
||||
assertEqual(t, "4 weeks", HumanDuration(1*month))
|
||||
assertEqual(t, "6 weeks", HumanDuration(1*month+2*week))
|
||||
assertEqual(t, "2 months", HumanDuration(2*month))
|
||||
assertEqual(t, "2 months", HumanDuration(2*month+2*week))
|
||||
assertEqual(t, "3 months", HumanDuration(3*month))
|
||||
assertEqual(t, "3 months", HumanDuration(3*month+1*week))
|
||||
assertEqual(t, "5 months", HumanDuration(5*month+2*week))
|
||||
assertEqual(t, "13 months", HumanDuration(13*month))
|
||||
assertEqual(t, "23 months", HumanDuration(23*month))
|
||||
assertEqual(t, "24 months", HumanDuration(24*month))
|
||||
assertEqual(t, "2 years", HumanDuration(24*month+2*week))
|
||||
assertEqual(t, "3 years", HumanDuration(3*year+2*month))
|
||||
}
|
||||
|
||||
func TestHumanTime(t *testing.T) {
|
||||
now := time.Now()
|
||||
|
||||
t.Run("zero value", func(t *testing.T) {
|
||||
assertEqual(t, HumanTime(time.Time{}, "never"), "never")
|
||||
})
|
||||
t.Run("time in the future", func(t *testing.T) {
|
||||
v := now.Add(48 * time.Hour)
|
||||
assertEqual(t, HumanTime(v, ""), "2 days from now")
|
||||
})
|
||||
t.Run("time in the past", func(t *testing.T) {
|
||||
v := now.Add(-48 * time.Hour)
|
||||
assertEqual(t, HumanTime(v, ""), "2 days ago")
|
||||
})
|
||||
}
|
||||
|
||||
func TestExactDuration(t *testing.T) {
|
||||
assertEqual(t, "1 millisecond", ExactDuration(1*time.Millisecond))
|
||||
assertEqual(t, "10 milliseconds", ExactDuration(10*time.Millisecond))
|
||||
assertEqual(t, "1 second", ExactDuration(1*time.Second))
|
||||
assertEqual(t, "10 seconds", ExactDuration(10*time.Second))
|
||||
assertEqual(t, "1 minute", ExactDuration(1*time.Minute))
|
||||
assertEqual(t, "10 minutes", ExactDuration(10*time.Minute))
|
||||
assertEqual(t, "1 hour", ExactDuration(1*time.Hour))
|
||||
assertEqual(t, "10 hours", ExactDuration(10*time.Hour))
|
||||
assertEqual(t, "1 hour 1 second", ExactDuration(1*time.Hour+1*time.Second))
|
||||
assertEqual(t, "1 hour 10 seconds", ExactDuration(1*time.Hour+10*time.Second))
|
||||
assertEqual(t, "1 hour 1 minute", ExactDuration(1*time.Hour+1*time.Minute))
|
||||
assertEqual(t, "1 hour 10 minutes", ExactDuration(1*time.Hour+10*time.Minute))
|
||||
assertEqual(t, "1 hour 1 minute 1 second", ExactDuration(1*time.Hour+1*time.Minute+1*time.Second))
|
||||
assertEqual(t, "10 hours 10 minutes 10 seconds", ExactDuration(10*time.Hour+10*time.Minute+10*time.Second))
|
||||
}
|
14
go.mod
@@ -3,19 +3,21 @@ module github.com/jmorganca/ollama
|
||||
go 1.20
|
||||
|
||||
require (
|
||||
github.com/dustin/go-humanize v1.0.1
|
||||
github.com/gin-gonic/gin v1.9.1
|
||||
github.com/mattn/go-runewidth v0.0.14
|
||||
github.com/mitchellh/colorstring v0.0.0-20190213212951-d06e56a500db
|
||||
github.com/olekukonko/tablewriter v0.0.5
|
||||
github.com/spf13/cobra v1.7.0
|
||||
)
|
||||
|
||||
require (
|
||||
github.com/mattn/go-runewidth v0.0.14 // indirect
|
||||
github.com/mitchellh/colorstring v0.0.0-20190213212951-d06e56a500db // indirect
|
||||
github.com/rivo/uniseg v0.2.0 // indirect
|
||||
)
|
||||
require github.com/rivo/uniseg v0.2.0 // indirect
|
||||
|
||||
require (
|
||||
dario.cat/mergo v1.0.0
|
||||
github.com/bytedance/sonic v1.9.1 // indirect
|
||||
github.com/chenzhuoyu/base64x v0.0.0-20221115062448-fe3a3abad311 // indirect
|
||||
github.com/chzyer/readline v1.5.1
|
||||
github.com/gabriel-vasile/mimetype v1.4.2 // indirect
|
||||
github.com/gin-contrib/sse v0.1.0 // indirect
|
||||
github.com/go-playground/locales v0.14.1 // indirect
|
||||
@@ -27,12 +29,10 @@ require (
|
||||
github.com/json-iterator/go v1.1.12 // indirect
|
||||
github.com/klauspost/cpuid/v2 v2.2.4 // indirect
|
||||
github.com/leodido/go-urn v1.2.4 // indirect
|
||||
github.com/lithammer/fuzzysearch v1.1.8
|
||||
github.com/mattn/go-isatty v0.0.19 // indirect
|
||||
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
|
||||
github.com/modern-go/reflect2 v1.0.2 // indirect
|
||||
github.com/pelletier/go-toml/v2 v2.0.8 // indirect
|
||||
github.com/schollz/progressbar/v3 v3.13.1
|
||||
github.com/spf13/pflag v1.0.5 // indirect
|
||||
github.com/twitchyliquid64/golang-asm v0.15.1 // indirect
|
||||
github.com/ugorji/go/codec v1.2.11 // indirect
|
||||
|
53
go.sum
@@ -1,13 +1,23 @@
|
||||
dario.cat/mergo v1.0.0 h1:AGCNq9Evsj31mOgNPcLyXc+4PNABt905YmuqPYYpBWk=
|
||||
dario.cat/mergo v1.0.0/go.mod h1:uNxQE+84aUszobStD9th8a29P2fMDhsBdgRYvZOxGmk=
|
||||
github.com/bytedance/sonic v1.5.0/go.mod h1:ED5hyg4y6t3/9Ku1R6dU/4KyJ48DZ4jPhfY1O2AihPM=
|
||||
github.com/bytedance/sonic v1.9.1 h1:6iJ6NqdoxCDr6mbY8h18oSO+cShGSMRGCEo7F2h0x8s=
|
||||
github.com/bytedance/sonic v1.9.1/go.mod h1:i736AoUSYt75HyZLoJW9ERYxcy6eaN6h4BZXU064P/U=
|
||||
github.com/chenzhuoyu/base64x v0.0.0-20211019084208-fb5309c8db06/go.mod h1:DH46F32mSOjUmXrMHnKwZdA8wcEefY7UVqBKYGjpdQY=
|
||||
github.com/chenzhuoyu/base64x v0.0.0-20221115062448-fe3a3abad311 h1:qSGYFH7+jGhDF8vLC+iwCD4WpbV1EBDSzWkJODFLams=
|
||||
github.com/chenzhuoyu/base64x v0.0.0-20221115062448-fe3a3abad311/go.mod h1:b583jCggY9gE99b6G5LEC39OIiVsWj+R97kbl5odCEk=
|
||||
github.com/chzyer/logex v1.2.1 h1:XHDu3E6q+gdHgsdTPH6ImJMIp436vR6MPtH8gP05QzM=
|
||||
github.com/chzyer/logex v1.2.1/go.mod h1:JLbx6lG2kDbNRFnfkgvh4eRJRPX1QCoOIWomwysCBrQ=
|
||||
github.com/chzyer/readline v1.5.1 h1:upd/6fQk4src78LMRzh5vItIt361/o4uq553V8B5sGI=
|
||||
github.com/chzyer/readline v1.5.1/go.mod h1:Eh+b79XXUwfKfcPLepksvw2tcLE/Ct21YObkaSkeBlk=
|
||||
github.com/chzyer/test v1.0.0 h1:p3BQDXSxOhOG0P9z6/hGnII4LGiEPOYBhs8asl/fC04=
|
||||
github.com/chzyer/test v1.0.0/go.mod h1:2JlltgoNkt4TW/z9V/IzDdFaMTM2JPIi26O1pF38GC8=
|
||||
github.com/cpuguy83/go-md2man/v2 v2.0.2/go.mod h1:tgQtvFlXSQOSOSIRvRPT7W67SCa46tRHOmNcaadrF8o=
|
||||
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
|
||||
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
|
||||
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
|
||||
github.com/dustin/go-humanize v1.0.1 h1:GzkhY7T5VNhEkwH0PVJgjz+fX1rhBrR7pRT3mDkpeCY=
|
||||
github.com/dustin/go-humanize v1.0.1/go.mod h1:Mu1zIs6XwVuF/gI1OepvI0qD18qycQx+mFykh5fBlto=
|
||||
github.com/gabriel-vasile/mimetype v1.4.2 h1:w5qFW6JKBz9Y393Y4q372O9A7cUSequkh1Q7OhCmWKU=
|
||||
github.com/gabriel-vasile/mimetype v1.4.2/go.mod h1:zApsH/mKG4w07erKIaJPFiX0Tsq9BFQgN3qGY5GnNgA=
|
||||
github.com/gin-contrib/sse v0.1.0 h1:Y/yl/+YNO8GZSjAhjMsSuLt29uWRFHdHYUb5lYOV9qE=
|
||||
@@ -32,17 +42,14 @@ github.com/inconshreveable/mousetrap v1.1.0 h1:wN+x4NVGpMsO7ErUn/mUI3vEoE6Jt13X2
|
||||
github.com/inconshreveable/mousetrap v1.1.0/go.mod h1:vpF70FUmC8bwa3OWnCshd2FqLfsEA9PFc4w1p2J65bw=
|
||||
github.com/json-iterator/go v1.1.12 h1:PV8peI4a0ysnczrg+LtxykD8LfKY9ML6u2jnxaEnrnM=
|
||||
github.com/json-iterator/go v1.1.12/go.mod h1:e30LSqwooZae/UwlEbR2852Gd8hjQvJoHmT4TnhNGBo=
|
||||
github.com/k0kubun/go-ansi v0.0.0-20180517002512-3bf9e2903213/go.mod h1:vNUNkEQ1e29fT/6vq2aBdFsgNPmy8qMdSay1npru+Sw=
|
||||
github.com/klauspost/cpuid/v2 v2.0.9/go.mod h1:FInQzS24/EEf25PyTYn52gqo7WaD8xa0213Md/qVLRg=
|
||||
github.com/klauspost/cpuid/v2 v2.2.4 h1:acbojRNwl3o09bUq+yDCtZFc1aiwaAAxtcn8YkZXnvk=
|
||||
github.com/klauspost/cpuid/v2 v2.2.4/go.mod h1:RVVoqg1df56z8g3pUjL/3lE5UfnlrJX8tyFgg4nqhuY=
|
||||
github.com/leodido/go-urn v1.2.4 h1:XlAE/cm/ms7TE/VMVoduSpNBoyc2dOxHs5MZSwAN63Q=
|
||||
github.com/leodido/go-urn v1.2.4/go.mod h1:7ZrI8mTSeBSHl/UaRyKQW1qZeMgak41ANeCNaVckg+4=
|
||||
github.com/lithammer/fuzzysearch v1.1.8 h1:/HIuJnjHuXS8bKaiTMeeDlW2/AyIWk2brx1V8LFgLN4=
|
||||
github.com/lithammer/fuzzysearch v1.1.8/go.mod h1:IdqeyBClc3FFqSzYq/MXESsS4S0FsZ5ajtkr5xPLts4=
|
||||
github.com/mattn/go-isatty v0.0.17/go.mod h1:kYGgaQfpe5nmfYZH+SKPsOc2e4SrIfOl2e/yFXSvRLM=
|
||||
github.com/mattn/go-isatty v0.0.19 h1:JITubQf0MOLdlGRuRq+jtsDlekdYPia9ZFsB8h/APPA=
|
||||
github.com/mattn/go-isatty v0.0.19/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y=
|
||||
github.com/mattn/go-runewidth v0.0.9/go.mod h1:H031xJmbD/WCDINGzjvQ9THkh0rPKHF+m2gUSrubnMI=
|
||||
github.com/mattn/go-runewidth v0.0.14 h1:+xnbZSEeDbOIg5/mE6JF0w6n9duR1l3/WmbinWVwUuU=
|
||||
github.com/mattn/go-runewidth v0.0.14/go.mod h1:Jdepj2loyihRzMpdS35Xk/zdY8IAYHsh153qUoGf23w=
|
||||
github.com/mitchellh/colorstring v0.0.0-20190213212951-d06e56a500db h1:62I3jR2EmQ4l5rM/4FEfDWcRD+abF5XlKShorW5LRoQ=
|
||||
@@ -52,6 +59,8 @@ github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd h1:TRLaZ9cD/w
|
||||
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
|
||||
github.com/modern-go/reflect2 v1.0.2 h1:xBagoLtFs94CBntxluKeaWgTMpvLxC4ur3nMaC9Gz0M=
|
||||
github.com/modern-go/reflect2 v1.0.2/go.mod h1:yWuevngMOJpCy52FWWMvUC8ws7m/LJsjYzDa0/r8luk=
|
||||
github.com/olekukonko/tablewriter v0.0.5 h1:P2Ga83D34wi1o9J6Wh1mRuqd4mF/x/lgBS7N7AbDhec=
|
||||
github.com/olekukonko/tablewriter v0.0.5/go.mod h1:hPp6KlRPjbx+hW8ykQs1w3UBbZlj6HuIJcUGPhkA7kY=
|
||||
github.com/pelletier/go-toml/v2 v2.0.8 h1:0ctb6s9mE31h0/lhu+J6OPmVeDxJn+kYnJc2jZR9tGQ=
|
||||
github.com/pelletier/go-toml/v2 v2.0.8/go.mod h1:vuYfssBdrU2XDZ9bYydBu6t+6a6PYNcZljzZR9VXg+4=
|
||||
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
|
||||
@@ -59,8 +68,6 @@ github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZN
|
||||
github.com/rivo/uniseg v0.2.0 h1:S1pD9weZBuJdFmowNwbpi7BJ8TNftyUImj/0WQi72jY=
|
||||
github.com/rivo/uniseg v0.2.0/go.mod h1:J6wj4VEh+S6ZtnVlnTBMWIodfgj8LQOQFoIToxlJtxc=
|
||||
github.com/russross/blackfriday/v2 v2.1.0/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM=
|
||||
github.com/schollz/progressbar/v3 v3.13.1 h1:o8rySDYiQ59Mwzy2FELeHY5ZARXZTVJC7iHD6PEFUiE=
|
||||
github.com/schollz/progressbar/v3 v3.13.1/go.mod h1:xvrbki8kfT1fzWzBT/UZd9L6GA+jdL7HAgq2RFnO6fQ=
|
||||
github.com/spf13/cobra v1.7.0 h1:hyqWnYt1ZQShIddO5kBpj3vu05/++x6tJ6dg8EC572I=
|
||||
github.com/spf13/cobra v1.7.0/go.mod h1:uLxZILRyS/50WlhOIKD7W6V5bgeIt+4sICxh6uRMrb0=
|
||||
github.com/spf13/pflag v1.0.5 h1:iy+VFUOCP1a+8yFto/drg2CJ5u0yRoB7fZw3DKv/JXA=
|
||||
@@ -80,54 +87,22 @@ github.com/twitchyliquid64/golang-asm v0.15.1 h1:SU5vSMR7hnwNxj24w34ZyCi/FmDZTkS
|
||||
github.com/twitchyliquid64/golang-asm v0.15.1/go.mod h1:a1lVb/DtPvCB8fslRZhAngC2+aY1QWCk3Cedj/Gdt08=
|
||||
github.com/ugorji/go/codec v1.2.11 h1:BMaWp1Bb6fHwEtbplGBGJ498wD+LKlNSl25MjdZY4dU=
|
||||
github.com/ugorji/go/codec v1.2.11/go.mod h1:UNopzCgEMSXjBc6AOMqYvWC1ktqTAfzJZUZgYf6w6lg=
|
||||
github.com/yuin/goldmark v1.4.13/go.mod h1:6yULJ656Px+3vBD8DxQVa3kxgyrAnzto9xy5taEt/CY=
|
||||
golang.org/x/arch v0.0.0-20210923205945-b76863e36670/go.mod h1:5om86z9Hs0C8fWVUuoMHwpExlXzs5Tkyp9hOrfG7pp8=
|
||||
golang.org/x/arch v0.3.0 h1:02VY4/ZcO/gBOH6PUaoiptASxtXU10jazRCP865E97k=
|
||||
golang.org/x/arch v0.3.0/go.mod h1:5om86z9Hs0C8fWVUuoMHwpExlXzs5Tkyp9hOrfG7pp8=
|
||||
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
|
||||
golang.org/x/crypto v0.0.0-20210921155107-089bfa567519/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
|
||||
golang.org/x/crypto v0.10.0 h1:LKqV2xt9+kDzSTfOhx4FrkEBcMrAgHSYgzywV9zcGmM=
|
||||
golang.org/x/crypto v0.10.0/go.mod h1:o4eNf7Ede1fv+hwOwZsTHl9EsPFO6q6ZvYR8vYfY45I=
|
||||
golang.org/x/mod v0.6.0-dev.0.20220419223038-86c51ed26bb4/go.mod h1:jJ57K6gSWd91VN4djpZkiMVwK6gcyfeH4XE8wZrZaV4=
|
||||
golang.org/x/mod v0.8.0/go.mod h1:iBbtSCu2XBx23ZKBPSOrRkjjQPZFPuis4dIYUhu/chs=
|
||||
golang.org/x/net v0.0.0-20190620200207-3b0461eec859/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s=
|
||||
golang.org/x/net v0.0.0-20210226172049-e18ecbb05110/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg=
|
||||
golang.org/x/net v0.0.0-20220722155237-a158d28d115b/go.mod h1:XRhObCWvk6IyKnWLug+ECip1KBveYUHfp+8e9klMJ9c=
|
||||
golang.org/x/net v0.6.0/go.mod h1:2Tu9+aMcznHK/AK1HMvgo6xiTLG5rD5rZLDS+rp2Bjs=
|
||||
golang.org/x/net v0.10.0 h1:X2//UzNDwYmtCLn7To6G58Wr6f5ahEAQgKNzv9Y951M=
|
||||
golang.org/x/net v0.10.0/go.mod h1:0qNGK6F8kojg2nk9dLZ2mShWaEBan6FAoqfSigmmuDg=
|
||||
golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
|
||||
golang.org/x/sync v0.0.0-20220722155255-886fb9371eb4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
|
||||
golang.org/x/sync v0.1.0/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
|
||||
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
|
||||
golang.org/x/sys v0.0.0-20201119102817-f84b799fce68/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
|
||||
golang.org/x/sys v0.0.0-20210615035016-665e8c7367d1/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||
golang.org/x/sys v0.0.0-20220520151302-bc2c85ada10a/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||
golang.org/x/sys v0.0.0-20220310020820-b874c991c1a5/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||
golang.org/x/sys v0.0.0-20220704084225-05e143d24a9e/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||
golang.org/x/sys v0.0.0-20220722155257-8c9f86f7a55f/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||
golang.org/x/sys v0.0.0-20220811171246-fbc7d0a398ab/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||
golang.org/x/sys v0.5.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||
golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||
golang.org/x/sys v0.10.0 h1:SqMFp9UcQJZa+pmYuAKjd9xq1f0j5rLcDIk0mj4qAsA=
|
||||
golang.org/x/sys v0.10.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||
golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
|
||||
golang.org/x/term v0.0.0-20210927222741-03fcf44c2211/go.mod h1:jbD1KX2456YbFQfuXm/mYQcufACuNUgVhRMnK/tPxf8=
|
||||
golang.org/x/term v0.5.0/go.mod h1:jMB1sMXY+tzblOD4FWmEbocvup2/aLOaQEp7JmGp78k=
|
||||
golang.org/x/term v0.6.0/go.mod h1:m6U89DPEgQRMq3DNkDClhWw02AUbt2daBVO4cn4Hv9U=
|
||||
golang.org/x/term v0.10.0 h1:3R7pNqamzBraeqj/Tj8qt1aQ2HpmlC+Cx/qL/7hn4/c=
|
||||
golang.org/x/term v0.10.0/go.mod h1:lpqdcUyK/oCiQxvxVrppt5ggO2KCZ5QblwqPnfZ6d5o=
|
||||
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
|
||||
golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
|
||||
golang.org/x/text v0.3.7/go.mod h1:u+2+/6zg+i71rQMx5EYifcz6MCKuco9NR6JIITiCfzQ=
|
||||
golang.org/x/text v0.7.0/go.mod h1:mrYo+phRRbMaCq/xk9113O4dZlRixOauAjOtrjsXDZ8=
|
||||
golang.org/x/text v0.9.0/go.mod h1:e1OnstbJyHTd6l/uOt8jFFHp6TRDWZR/bV3emEE/zU8=
|
||||
golang.org/x/text v0.10.0 h1:UpjohKhiEgNc0CSauXmwYftY1+LlaC75SJwh0SgCX58=
|
||||
golang.org/x/text v0.10.0/go.mod h1:TvPlkZtksWOMsz7fbANvkp4WM8x/WCo/om8BMLbz+aE=
|
||||
golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
|
||||
golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo=
|
||||
golang.org/x/tools v0.1.12/go.mod h1:hNGJHUnrk76NpqgfD5Aqm5Crs+Hm0VOH/i9J2+nxYbc=
|
||||
golang.org/x/tools v0.6.0/go.mod h1:Xwgl3UAJ/d3gWutnCtw505GrjyAbvKui8lOU390QaIU=
|
||||
golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
|
||||
golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
|
||||
google.golang.org/protobuf v1.26.0-rc.1/go.mod h1:jlhhOSvTdKEhbULTjvd4ARK9grFBp09yW+WbY/TyQbw=
|
||||
google.golang.org/protobuf v1.30.0 h1:kPPoIgf3TsEvrm0PFe15JQ+570QVxYzEvvHqChK+cng=
|
||||
|
1
library/.gitignore
vendored
Normal file
@@ -0,0 +1 @@
|
||||
models
|
7
library/downloads
Normal file
@@ -0,0 +1,7 @@
|
||||
https://huggingface.co/TheBloke/orca_mini_3B-GGML/resolve/main/orca-mini-3b.ggmlv3.q4_0.bin e84705205f71dd55be7b24a778f248f0eda9999a125d313358c087e092d83148
|
||||
https://huggingface.co/TheBloke/Nous-Hermes-13B-GGML/resolve/main/nous-hermes-13b.ggmlv3.q4_0.bin d1735b93e1dc503f1045ccd6c8bd73277b18ba892befd1dc29e9b9a7822ed998
|
||||
https://huggingface.co/TheBloke/vicuna-7B-v1.3-GGML/resolve/main/vicuna-7b-v1.3.ggmlv3.q4_0.bin 23ce5ed290b56a19305178b9ada2c3d96036bd69a6c18304b6158eb6672d6c0f
|
||||
https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_0.bin 1f08b147a5bce41cfcbb3fd5d51ba765dea1786e15b5655ab69ba3a337a893b7
|
||||
https://huggingface.co/TheBloke/Llama-2-7B-GGML/resolve/main/llama-2-7b.ggmlv3.q4_0.bin bfa26d855e44629c4cf919985e90bd7fa03b77eea1676791519e39a4d45fd4d5
|
||||
https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q4_0.bin 8daa9615cce30c259a9555b1cc250d461d1bc69980a274b44d7eda0be78076d8
|
||||
https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin f79142715bc9539a2edbb4b253548db8b34fac22736593eeaa28555874476e30
|
147
library/modelfiles/llama2
Normal file
@@ -0,0 +1,147 @@
|
||||
FROM ../models/llama-2-7b-chat.ggmlv3.q4_0.bin
|
||||
|
||||
TEMPLATE """
|
||||
{{- if .First }}
|
||||
<<SYS>>
|
||||
{{ .System }}
|
||||
<</SYS>>
|
||||
{{- end }}
|
||||
|
||||
[INST] {{ .Prompt }} [/INST]
|
||||
"""
|
||||
|
||||
SYSTEM """
|
||||
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
|
||||
|
||||
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
|
||||
"""
|
||||
|
||||
LICENSE """
|
||||
Llama 2 Community License Agreement
|
||||
|
||||
Llama 2 Version Release Date: July 18, 2023
|
||||
|
||||
“Agreement” means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein.
|
||||
|
||||
“Documentation” means the specifications, manuals and documentation accompanying Llama 2 distributed by Meta at ai.meta.com/resources/models-and-libraries/llama-downloads/.
|
||||
|
||||
“Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.
|
||||
|
||||
“Llama 2” means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at ai.meta.com/resources/models-and-libraries/llama-downloads/.
|
||||
|
||||
“Llama Materials” means, collectively, Meta’s proprietary Llama 2 and Documentation (and any portion thereof) made available under this Agreement.
|
||||
|
||||
“Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland).
|
||||
|
||||
By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement.
|
||||
|
||||
1. License Rights and Redistribution.
|
||||
|
||||
a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials.
|
||||
|
||||
b. Redistribution and Use.
|
||||
|
||||
i. If you distribute or make the Llama Materials, or any derivative works thereof, available to a third party, you shall provide a copy of this Agreement to such third party.
|
||||
|
||||
ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you.
|
||||
|
||||
iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 2 is licensed under the LLAMA 2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.”
|
||||
|
||||
iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at https://ai.meta.com/llama/use-policy), which is hereby incorporated by reference into this Agreement.
|
||||
|
||||
v. You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Llama 2 or derivative works thereof).
|
||||
|
||||
2. Additional Commercial Terms. If, on the Llama 2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.
|
||||
|
||||
3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS.
|
||||
|
||||
4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
|
||||
|
||||
5. Intellectual Property.
|
||||
|
||||
a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials.
|
||||
|
||||
b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications.
|
||||
|
||||
c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 2 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials.
|
||||
|
||||
6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement.
|
||||
|
||||
7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement.
|
||||
|
||||
"""
|
||||
|
||||
LICENSE """
|
||||
Llama 2 Acceptable Use Policy
|
||||
|
||||
Meta is committed to promoting safe and fair use of its tools and features, including Llama 2. If you access or use Llama 2, you agree to this Acceptable Use Policy (“Policy”). The most recent copy of this policy can be found at ai.meta.com/llama/use-policy.
|
||||
|
||||
Prohibited Uses
|
||||
|
||||
We want everyone to use Llama 2 safely and responsibly. You agree you will not use, or allow others to use, Llama 2 to:
|
||||
|
||||
1. Violate the law or others’ rights, including to:
|
||||
|
||||
a. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as:
|
||||
|
||||
i. Violence or terrorism
|
||||
|
||||
ii. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material
|
||||
|
||||
b. Human trafficking, exploitation, and sexual violence
|
||||
|
||||
iii. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials.
|
||||
|
||||
iv. Sexual solicitation
|
||||
|
||||
vi. Any other criminal activity
|
||||
|
||||
c. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals
|
||||
|
||||
d. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services
|
||||
|
||||
e. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices
|
||||
|
||||
f. Collect, process, disclose, generate, or infer health, demographic, or other sensitive personal or private information about individuals without rights and consents required by applicable laws
|
||||
|
||||
g. Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama 2 Materials
|
||||
|
||||
h. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system
|
||||
|
||||
2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 2 related to the following:
|
||||
|
||||
a. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State
|
||||
|
||||
b. Guns and illegal weapons (including weapon development)
|
||||
|
||||
c. Illegal drugs and regulated/controlled substances
|
||||
|
||||
d. Operation of critical infrastructure, transportation technologies, or heavy machinery
|
||||
|
||||
e. Self-harm or harm to others, including suicide, cutting, and eating disorders
|
||||
|
||||
f. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual
|
||||
|
||||
3. Intentionally deceive or mislead others, including use of Llama 2 related to the following:
|
||||
|
||||
a. Generating, promoting, or furthering fraud or the creation or promotion of disinformation
|
||||
|
||||
b. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content
|
||||
|
||||
c. Generating, promoting, or further distributing spam
|
||||
|
||||
d. Impersonating another individual without consent, authorization, or legal right
|
||||
|
||||
e. Representing that the use of Llama 2 or outputs are human-generated
|
||||
|
||||
f. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement
|
||||
|
||||
4. Fail to appropriately disclose to end users any known dangers of your AI system
|
||||
|
||||
Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means:
|
||||
|
||||
Reporting issues with the model: github.com/facebookresearch/llama
|
||||
Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback
|
||||
Reporting bugs and security concerns: facebook.com/whitehat/info
|
||||
Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama: LlamaUseReport@meta.com
|
||||
"""
|
147
library/modelfiles/llama2_13b
Normal file
@@ -0,0 +1,147 @@
|
||||
FROM ../models/llama-2-13b-chat.ggmlv3.q4_0.bin
|
||||
|
||||
TEMPLATE """
|
||||
{{- if .First }}
|
||||
<<SYS>>
|
||||
{{ .System }}
|
||||
<</SYS>>
|
||||
{{- end }}
|
||||
|
||||
[INST] {{ .Prompt }} [/INST]
|
||||
"""
|
||||
|
||||
SYSTEM """
|
||||
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
|
||||
|
||||
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
|
||||
"""
|
||||
|
||||
LICENSE """
|
||||
Llama 2 Community License Agreement
|
||||
|
||||
Llama 2 Version Release Date: July 18, 2023
|
||||
|
||||
“Agreement” means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein.
|
||||
|
||||
“Documentation” means the specifications, manuals and documentation accompanying Llama 2 distributed by Meta at ai.meta.com/resources/models-and-libraries/llama-downloads/.
|
||||
|
||||
“Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.
|
||||
|
||||
“Llama 2” means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at ai.meta.com/resources/models-and-libraries/llama-downloads/.
|
||||
|
||||
“Llama Materials” means, collectively, Meta’s proprietary Llama 2 and Documentation (and any portion thereof) made available under this Agreement.
|
||||
|
||||
“Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland).
|
||||
|
||||
By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement.
|
||||
|
||||
1. License Rights and Redistribution.
|
||||
|
||||
a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials.
|
||||
|
||||
b. Redistribution and Use.
|
||||
|
||||
i. If you distribute or make the Llama Materials, or any derivative works thereof, available to a third party, you shall provide a copy of this Agreement to such third party.
|
||||
|
||||
ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you.
|
||||
|
||||
iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 2 is licensed under the LLAMA 2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.”
|
||||
|
||||
iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at https://ai.meta.com/llama/use-policy), which is hereby incorporated by reference into this Agreement.
|
||||
|
||||
v. You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Llama 2 or derivative works thereof).
|
||||
|
||||
2. Additional Commercial Terms. If, on the Llama 2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.
|
||||
|
||||
3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS.
|
||||
|
||||
4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
|
||||
|
||||
5. Intellectual Property.
|
||||
|
||||
a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials.
|
||||
|
||||
b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications.
|
||||
|
||||
c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 2 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials.
|
||||
|
||||
6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement.
|
||||
|
||||
7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement.
|
||||
|
||||
"""
|
||||
|
||||
LICENSE """
|
||||
Llama 2 Acceptable Use Policy
|
||||
|
||||
Meta is committed to promoting safe and fair use of its tools and features, including Llama 2. If you access or use Llama 2, you agree to this Acceptable Use Policy (“Policy”). The most recent copy of this policy can be found at ai.meta.com/llama/use-policy.
|
||||
|
||||
Prohibited Uses
|
||||
|
||||
We want everyone to use Llama 2 safely and responsibly. You agree you will not use, or allow others to use, Llama 2 to:
|
||||
|
||||
1. Violate the law or others’ rights, including to:
|
||||
|
||||
a. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as:
|
||||
|
||||
i. Violence or terrorism
|
||||
|
||||
ii. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material
|
||||
|
||||
b. Human trafficking, exploitation, and sexual violence
|
||||
|
||||
iii. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials.
|
||||
|
||||
iv. Sexual solicitation
|
||||
|
||||
vi. Any other criminal activity
|
||||
|
||||
c. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals
|
||||
|
||||
d. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services
|
||||
|
||||
e. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices
|
||||
|
||||
f. Collect, process, disclose, generate, or infer health, demographic, or other sensitive personal or private information about individuals without rights and consents required by applicable laws
|
||||
|
||||
g. Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama 2 Materials
|
||||
|
||||
h. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system
|
||||
|
||||
2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 2 related to the following:
|
||||
|
||||
a. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State
|
||||
|
||||
b. Guns and illegal weapons (including weapon development)
|
||||
|
||||
c. Illegal drugs and regulated/controlled substances
|
||||
|
||||
d. Operation of critical infrastructure, transportation technologies, or heavy machinery
|
||||
|
||||
e. Self-harm or harm to others, including suicide, cutting, and eating disorders
|
||||
|
||||
f. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual
|
||||
|
||||
3. Intentionally deceive or mislead others, including use of Llama 2 related to the following:
|
||||
|
||||
a. Generating, promoting, or furthering fraud or the creation or promotion of disinformation
|
||||
|
||||
b. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content
|
||||
|
||||
c. Generating, promoting, or further distributing spam
|
||||
|
||||
d. Impersonating another individual without consent, authorization, or legal right
|
||||
|
||||
e. Representing that the use of Llama 2 or outputs are human-generated
|
||||
|
||||
f. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement
|
||||
|
||||
4. Fail to appropriately disclose to end users any known dangers of your AI system
|
||||
|
||||
Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means:
|
||||
|
||||
Reporting issues with the model: github.com/facebookresearch/llama
|
||||
Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback
|
||||
Reporting bugs and security concerns: facebook.com/whitehat/info
|
||||
Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama: LlamaUseReport@meta.com
|
||||
"""
|
147
library/modelfiles/llama2_7b
Normal file
@@ -0,0 +1,147 @@
|
||||
FROM ../models/llama-2-7b-chat.ggmlv3.q4_0.bin
|
||||
|
||||
TEMPLATE """
|
||||
{{- if .First }}
|
||||
<<SYS>>
|
||||
{{ .System }}
|
||||
<</SYS>>
|
||||
{{- end }}
|
||||
|
||||
[INST] {{ .Prompt }} [/INST]
|
||||
"""
|
||||
|
||||
SYSTEM """
|
||||
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
|
||||
|
||||
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
|
||||
"""
|
||||
|
||||
LICENSE """
|
||||
Llama 2 Community License Agreement
|
||||
|
||||
Llama 2 Version Release Date: July 18, 2023
|
||||
|
||||
“Agreement” means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein.
|
||||
|
||||
“Documentation” means the specifications, manuals and documentation accompanying Llama 2 distributed by Meta at ai.meta.com/resources/models-and-libraries/llama-downloads/.
|
||||
|
||||
“Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.
|
||||
|
||||
“Llama 2” means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at ai.meta.com/resources/models-and-libraries/llama-downloads/.
|
||||
|
||||
“Llama Materials” means, collectively, Meta’s proprietary Llama 2 and Documentation (and any portion thereof) made available under this Agreement.
|
||||
|
||||
“Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland).
|
||||
|
||||
By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement.
|
||||
|
||||
1. License Rights and Redistribution.
|
||||
|
||||
a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials.
|
||||
|
||||
b. Redistribution and Use.
|
||||
|
||||
i. If you distribute or make the Llama Materials, or any derivative works thereof, available to a third party, you shall provide a copy of this Agreement to such third party.
|
||||
|
||||
ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you.
|
||||
|
||||
iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 2 is licensed under the LLAMA 2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.”
|
||||
|
||||
iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at https://ai.meta.com/llama/use-policy), which is hereby incorporated by reference into this Agreement.
|
||||
|
||||
v. You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Llama 2 or derivative works thereof).
|
||||
|
||||
2. Additional Commercial Terms. If, on the Llama 2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.
|
||||
|
||||
3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS.
|
||||
|
||||
4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
|
||||
|
||||
5. Intellectual Property.
|
||||
|
||||
a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials.
|
||||
|
||||
b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications.
|
||||
|
||||
c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 2 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials.
|
||||
|
||||
6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement.
|
||||
|
||||
7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement.
|
||||
|
||||
"""
|
||||
|
||||
LICENSE """
|
||||
Llama 2 Acceptable Use Policy
|
||||
|
||||
Meta is committed to promoting safe and fair use of its tools and features, including Llama 2. If you access or use Llama 2, you agree to this Acceptable Use Policy (“Policy”). The most recent copy of this policy can be found at ai.meta.com/llama/use-policy.
|
||||
|
||||
Prohibited Uses
|
||||
|
||||
We want everyone to use Llama 2 safely and responsibly. You agree you will not use, or allow others to use, Llama 2 to:
|
||||
|
||||
1. Violate the law or others’ rights, including to:
|
||||
|
||||
a. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as:
|
||||
|
||||
i. Violence or terrorism
|
||||
|
||||
ii. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material
|
||||
|
||||
b. Human trafficking, exploitation, and sexual violence
|
||||
|
||||
iii. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials.
|
||||
|
||||
iv. Sexual solicitation
|
||||
|
||||
vi. Any other criminal activity
|
||||
|
||||
c. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals
|
||||
|
||||
d. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services
|
||||
|
||||
e. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices
|
||||
|
||||
f. Collect, process, disclose, generate, or infer health, demographic, or other sensitive personal or private information about individuals without rights and consents required by applicable laws
|
||||
|
||||
g. Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama 2 Materials
|
||||
|
||||
h. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system
|
||||
|
||||
2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 2 related to the following:
|
||||
|
||||
a. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State
|
||||
|
||||
b. Guns and illegal weapons (including weapon development)
|
||||
|
||||
c. Illegal drugs and regulated/controlled substances
|
||||
|
||||
d. Operation of critical infrastructure, transportation technologies, or heavy machinery
|
||||
|
||||
e. Self-harm or harm to others, including suicide, cutting, and eating disorders
|
||||
|
||||
f. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual
|
||||
|
||||
3. Intentionally deceive or mislead others, including use of Llama 2 related to the following:
|
||||
|
||||
a. Generating, promoting, or furthering fraud or the creation or promotion of disinformation
|
||||
|
||||
b. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content
|
||||
|
||||
c. Generating, promoting, or further distributing spam
|
||||
|
||||
d. Impersonating another individual without consent, authorization, or legal right
|
||||
|
||||
e. Representing that the use of Llama 2 or outputs are human-generated
|
||||
|
||||
f. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement
|
||||
|
||||
4. Fail to appropriately disclose to end users any known dangers of your AI system
|
||||
|
||||
Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means:
|
||||
|
||||
Reporting issues with the model: github.com/facebookresearch/llama
|
||||
Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback
|
||||
Reporting bugs and security concerns: facebook.com/whitehat/info
|
||||
Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama: LlamaUseReport@meta.com
|
||||
"""
|
7
library/modelfiles/nous-hermes
Normal file
@@ -0,0 +1,7 @@
|
||||
FROM ../models/nous-hermes-13b.ggmlv3.q4_0.bin
|
||||
TEMPLATE """
|
||||
### Instruction:
|
||||
{{ .Prompt }}
|
||||
|
||||
### Response:
|
||||
"""
|
14
library/modelfiles/orca
Normal file
@@ -0,0 +1,14 @@
|
||||
FROM ../models/orca-mini-3b.ggmlv3.q4_0.bin
|
||||
TEMPLATE """
|
||||
{{- if .First }}
|
||||
### System:
|
||||
{{ .System }}
|
||||
{{- end }}
|
||||
|
||||
### User:
|
||||
{{ .Prompt }}
|
||||
|
||||
### Response:
|
||||
"""
|
||||
|
||||
SYSTEM """You are an AI assistant that follows instruction extremely well. Help as much as you can."""
|
11
library/modelfiles/vicuna
Normal file
@@ -0,0 +1,11 @@
|
||||
FROM ../models/vicuna-7b-v1.3.ggmlv3.q4_0.bin
|
||||
TEMPLATE """
|
||||
{{ if .First }}
|
||||
{{ .System }}
|
||||
{{- end }}
|
||||
|
||||
USER: {{ .Prompt }}
|
||||
ASSISTANT:
|
||||
"""
|
||||
|
||||
SYSTEM """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions."""
|
5
library/modelfiles/wizard-vicuna
Normal file
@@ -0,0 +1,5 @@
|
||||
FROM ../models/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_0.bin
|
||||
TEMPLATE """
|
||||
USER: {{ .Prompt }}
|
||||
ASSISTANT:
|
||||
"""
|
52
library/publish.sh
Executable file
@@ -0,0 +1,52 @@
|
||||
#!/bin/bash
|
||||
|
||||
mkdir -p models
|
||||
|
||||
# download binaries
|
||||
function process_line {
|
||||
local url=$1
|
||||
local checksum=$2
|
||||
|
||||
# Get the filename from the URL
|
||||
local filename=models/$(basename $url)
|
||||
|
||||
echo "verifying $filename..."
|
||||
|
||||
# If the file exists, compute its checksum
|
||||
if [ -f $filename ]; then
|
||||
local existing_checksum=$(shasum -a 256 $filename | cut -d ' ' -f1)
|
||||
fi
|
||||
|
||||
# If the file does not exist, or its checksum does not match, download it
|
||||
if [ ! -f $filename ] || [ $existing_checksum != $checksum ]; then
|
||||
echo "downloading $filename..."
|
||||
|
||||
# Download the file
|
||||
curl -L $url -o $filename
|
||||
|
||||
# Compute the SHA256 hash of the downloaded file
|
||||
local computed_checksum=$(shasum -a 256 $filename | cut -d ' ' -f1)
|
||||
|
||||
# Verify the checksum
|
||||
if [ $computed_checksum != $checksum ]; then
|
||||
echo "Checksum verification failed for $filename"
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
}
|
||||
|
||||
while IFS=' ' read -r url checksum
|
||||
do
|
||||
process_line $url $checksum
|
||||
done < "downloads"
|
||||
|
||||
# create and publish the models
|
||||
for file in modelfiles/*; do
|
||||
if [ -f "$file" ]; then
|
||||
filename=$(basename "$file")
|
||||
echo $filename
|
||||
ollama create "library/${filename}" -f "$file"
|
||||
ollama push "${filename}"
|
||||
fi
|
||||
done
|
||||
|
@@ -1,5 +1,5 @@
|
||||
/**
|
||||
* llama.cpp - git 5bf2a2771886ee86137e01dbc7492f78fb392066
|
||||
* llama.cpp - git e782c9e735f93ab4767ffc37462c523b73a17ddc
|
||||
*
|
||||
* MIT License
|
||||
*
|
||||
|
@@ -1,5 +1,7 @@
|
||||
//go:build darwin
|
||||
|
||||
/**
|
||||
* llama.cpp - git 5bf2a2771886ee86137e01dbc7492f78fb392066
|
||||
* llama.cpp - git e782c9e735f93ab4767ffc37462c523b73a17ddc
|
||||
*
|
||||
* MIT License
|
||||
*
|
||||
|
@@ -1,7 +1,7 @@
|
||||
// +build darwin
|
||||
//go:build darwin
|
||||
|
||||
/**
|
||||
* llama.cpp - git 5bf2a2771886ee86137e01dbc7492f78fb392066
|
||||
* llama.cpp - git e782c9e735f93ab4767ffc37462c523b73a17ddc
|
||||
*
|
||||
* MIT License
|
||||
*
|
||||
@@ -722,8 +722,8 @@ void ggml_metal_graph_compute(
|
||||
GGML_ASSERT(ne02 == 1);
|
||||
GGML_ASSERT(ne12 == 1);
|
||||
|
||||
nth0 = 4;
|
||||
nth1 = 16;
|
||||
nth0 = 2;
|
||||
nth1 = 32;
|
||||
[encoder setComputePipelineState:ctx->pipeline_mul_mat_q4_K_f32];
|
||||
} break;
|
||||
case GGML_TYPE_Q5_K:
|
||||
@@ -731,8 +731,8 @@ void ggml_metal_graph_compute(
|
||||
GGML_ASSERT(ne02 == 1);
|
||||
GGML_ASSERT(ne12 == 1);
|
||||
|
||||
nth0 = 4;
|
||||
nth1 = 16;
|
||||
nth0 = 2;
|
||||
nth1 = 32;
|
||||
[encoder setComputePipelineState:ctx->pipeline_mul_mat_q5_K_f32];
|
||||
} break;
|
||||
case GGML_TYPE_Q6_K:
|
||||
@@ -740,8 +740,8 @@ void ggml_metal_graph_compute(
|
||||
GGML_ASSERT(ne02 == 1);
|
||||
GGML_ASSERT(ne12 == 1);
|
||||
|
||||
nth0 = 4;
|
||||
nth1 = 16;
|
||||
nth0 = 2;
|
||||
nth1 = 32;
|
||||
[encoder setComputePipelineState:ctx->pipeline_mul_mat_q6_K_f32];
|
||||
} break;
|
||||
default:
|
||||
@@ -767,15 +767,18 @@ void ggml_metal_graph_compute(
|
||||
[encoder setBytes:&ne0 length:sizeof(ne0) atIndex:13];
|
||||
[encoder setBytes:&ne1 length:sizeof(ne1) atIndex:14];
|
||||
|
||||
if (src0t == GGML_TYPE_Q4_0 || src0t == GGML_TYPE_Q4_1) {
|
||||
[encoder setThreadgroupMemoryLength:nth0*nth1*sizeof(float) atIndex:0];
|
||||
[encoder dispatchThreadgroups:MTLSizeMake(ne01, ne11, 1) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)];
|
||||
if (src0t == GGML_TYPE_Q4_0 || src0t == GGML_TYPE_Q4_1 ||
|
||||
src0t == GGML_TYPE_Q4_K) {
|
||||
[encoder dispatchThreadgroups:MTLSizeMake((ne01 + 7) / 8, ne11, 1) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)];
|
||||
}
|
||||
else if (src0t == GGML_TYPE_Q5_K) {
|
||||
[encoder dispatchThreadgroups:MTLSizeMake((ne01 + 3) / 4, ne11, 1) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)];
|
||||
}
|
||||
else if (src0t == GGML_TYPE_Q6_K) {
|
||||
[encoder dispatchThreadgroups:MTLSizeMake((ne01+1)/2, ne11, 1) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)];
|
||||
}
|
||||
else if (src0t == GGML_TYPE_Q2_K ||
|
||||
src0t == GGML_TYPE_Q3_K ||
|
||||
src0t == GGML_TYPE_Q4_K ||
|
||||
src0t == GGML_TYPE_Q5_K ||
|
||||
src0t == GGML_TYPE_Q6_K) {
|
||||
src0t == GGML_TYPE_Q3_K) {
|
||||
[encoder setThreadgroupMemoryLength:nth0*nth1*sizeof(float) atIndex:0];
|
||||
[encoder dispatchThreadgroups:MTLSizeMake(ne01, 1, 1) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)];
|
||||
} else {
|
||||
@@ -821,7 +824,7 @@ void ggml_metal_graph_compute(
|
||||
|
||||
const float eps = 1e-6f;
|
||||
|
||||
const int nth = 256;
|
||||
const int nth = 512;
|
||||
|
||||
[encoder setComputePipelineState:ctx->pipeline_rms_norm];
|
||||
[encoder setBuffer:id_src0 offset:offs_src0 atIndex:0];
|
||||
@@ -829,7 +832,7 @@ void ggml_metal_graph_compute(
|
||||
[encoder setBytes:&ne00 length:sizeof( int64_t) atIndex:2];
|
||||
[encoder setBytes:&nb01 length:sizeof(uint64_t) atIndex:3];
|
||||
[encoder setBytes:&eps length:sizeof( float) atIndex:4];
|
||||
[encoder setThreadgroupMemoryLength:nth*sizeof(float) atIndex:0];
|
||||
[encoder setThreadgroupMemoryLength:nth/32*sizeof(float) atIndex:0];
|
||||
|
||||
const int64_t nrows = ggml_nrows(src0);
|
||||
|
||||
@@ -910,28 +913,35 @@ void ggml_metal_graph_compute(
|
||||
|
||||
const int n_past = ((int32_t *)(src1->data))[0];
|
||||
|
||||
float freq_base;
|
||||
float freq_scale;
|
||||
memcpy(&freq_base, (int32_t *) src1->data + 4, sizeof(float));
|
||||
memcpy(&freq_scale, (int32_t *) src1->data + 5, sizeof(float));
|
||||
|
||||
[encoder setComputePipelineState:ctx->pipeline_rope];
|
||||
[encoder setBuffer:id_src0 offset:offs_src0 atIndex:0];
|
||||
[encoder setBuffer:id_dst offset:offs_dst atIndex:1];
|
||||
[encoder setBytes:&ne00 length:sizeof( int64_t) atIndex:2];
|
||||
[encoder setBytes:&ne01 length:sizeof( int64_t) atIndex:3];
|
||||
[encoder setBytes:&ne02 length:sizeof( int64_t) atIndex:4];
|
||||
[encoder setBytes:&ne03 length:sizeof( int64_t) atIndex:5];
|
||||
[encoder setBytes:&nb00 length:sizeof(uint64_t) atIndex:6];
|
||||
[encoder setBytes:&nb01 length:sizeof(uint64_t) atIndex:7];
|
||||
[encoder setBytes:&nb02 length:sizeof(uint64_t) atIndex:8];
|
||||
[encoder setBytes:&nb03 length:sizeof(uint64_t) atIndex:9];
|
||||
[encoder setBytes:&ne0 length:sizeof( int64_t) atIndex:10];
|
||||
[encoder setBytes:&ne1 length:sizeof( int64_t) atIndex:11];
|
||||
[encoder setBytes:&ne2 length:sizeof( int64_t) atIndex:12];
|
||||
[encoder setBytes:&ne3 length:sizeof( int64_t) atIndex:13];
|
||||
[encoder setBytes:&nb0 length:sizeof(uint64_t) atIndex:14];
|
||||
[encoder setBytes:&nb1 length:sizeof(uint64_t) atIndex:15];
|
||||
[encoder setBytes:&nb2 length:sizeof(uint64_t) atIndex:16];
|
||||
[encoder setBytes:&nb3 length:sizeof(uint64_t) atIndex:17];
|
||||
[encoder setBytes:&n_past length:sizeof( int) atIndex:18];
|
||||
[encoder setBytes:&n_dims length:sizeof( int) atIndex:19];
|
||||
[encoder setBytes:&mode length:sizeof( int) atIndex:20];
|
||||
[encoder setBytes:&ne00 length:sizeof( int64_t) atIndex:2];
|
||||
[encoder setBytes:&ne01 length:sizeof( int64_t) atIndex:3];
|
||||
[encoder setBytes:&ne02 length:sizeof( int64_t) atIndex:4];
|
||||
[encoder setBytes:&ne03 length:sizeof( int64_t) atIndex:5];
|
||||
[encoder setBytes:&nb00 length:sizeof(uint64_t) atIndex:6];
|
||||
[encoder setBytes:&nb01 length:sizeof(uint64_t) atIndex:7];
|
||||
[encoder setBytes:&nb02 length:sizeof(uint64_t) atIndex:8];
|
||||
[encoder setBytes:&nb03 length:sizeof(uint64_t) atIndex:9];
|
||||
[encoder setBytes:&ne0 length:sizeof( int64_t) atIndex:10];
|
||||
[encoder setBytes:&ne1 length:sizeof( int64_t) atIndex:11];
|
||||
[encoder setBytes:&ne2 length:sizeof( int64_t) atIndex:12];
|
||||
[encoder setBytes:&ne3 length:sizeof( int64_t) atIndex:13];
|
||||
[encoder setBytes:&nb0 length:sizeof(uint64_t) atIndex:14];
|
||||
[encoder setBytes:&nb1 length:sizeof(uint64_t) atIndex:15];
|
||||
[encoder setBytes:&nb2 length:sizeof(uint64_t) atIndex:16];
|
||||
[encoder setBytes:&nb3 length:sizeof(uint64_t) atIndex:17];
|
||||
[encoder setBytes:&n_past length:sizeof( int) atIndex:18];
|
||||
[encoder setBytes:&n_dims length:sizeof( int) atIndex:19];
|
||||
[encoder setBytes:&mode length:sizeof( int) atIndex:20];
|
||||
[encoder setBytes:&freq_base length:sizeof(float) atIndex:21];
|
||||
[encoder setBytes:&freq_scale length:sizeof(float) atIndex:22];
|
||||
|
||||
[encoder dispatchThreadgroups:MTLSizeMake(ne01, ne02, ne03) threadsPerThreadgroup:MTLSizeMake(1, 1, 1)];
|
||||
} break;
|
||||
|
@@ -1,5 +1,7 @@
|
||||
//go:build darwin
|
||||
|
||||
/**
|
||||
* llama.cpp - git 5bf2a2771886ee86137e01dbc7492f78fb392066
|
||||
* llama.cpp - git e782c9e735f93ab4767ffc37462c523b73a17ddc
|
||||
*
|
||||
* MIT License
|
||||
*
|
||||
@@ -357,26 +359,33 @@ kernel void kernel_rms_norm(
|
||||
threadgroup float * sum [[threadgroup(0)]],
|
||||
uint tgpig[[threadgroup_position_in_grid]],
|
||||
uint tpitg[[thread_position_in_threadgroup]],
|
||||
uint sgitg[[simdgroup_index_in_threadgroup]],
|
||||
uint tiisg[[thread_index_in_simdgroup]],
|
||||
uint ntg[[threads_per_threadgroup]]) {
|
||||
device const float * x = (device const float *) ((device const char *) src0 + tgpig*nb01);
|
||||
device const float4 * x = (device const float4 *) ((device const char *) src0 + tgpig*nb01);
|
||||
device const float * x_scalar = (device const float *) x;
|
||||
float4 sumf=0;
|
||||
float all_sum=0;
|
||||
|
||||
// parallel sum
|
||||
sum[tpitg] = 0.0f;
|
||||
for (int i00 = tpitg; i00 < ne00; i00 += ntg) {
|
||||
sum[tpitg] += x[i00] * x[i00];
|
||||
for (int i00 = tpitg; i00 < ne00/4; i00 += ntg) {
|
||||
sumf += x[i00] * x[i00];
|
||||
}
|
||||
all_sum = sumf[0] + sumf[1] + sumf[2] + sumf[3];
|
||||
all_sum = simd_sum(all_sum);
|
||||
if (tiisg == 0) {
|
||||
sum[sgitg] = all_sum;
|
||||
}
|
||||
|
||||
// reduce
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
for (uint i = ntg/2; i > 0; i /= 2) {
|
||||
if (tpitg < i) {
|
||||
sum[tpitg] += sum[tpitg + i];
|
||||
}
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
// broadcast, simd group number is ntg / 32
|
||||
for (int i = ntg / 32 / 2; i > 0; i /= 2) {
|
||||
if (tpitg < i) {
|
||||
sum[tpitg] += sum[tpitg + i];
|
||||
}
|
||||
}
|
||||
|
||||
// broadcast
|
||||
if (tpitg == 0) {
|
||||
for (int i = 4 * (ne00 / 4); i < ne00; i++) {sum[0] += x_scalar[i];}
|
||||
sum[0] /= ne00;
|
||||
}
|
||||
|
||||
@@ -385,10 +394,99 @@ kernel void kernel_rms_norm(
|
||||
const float mean = sum[0];
|
||||
const float scale = 1.0f/sqrt(mean + eps);
|
||||
|
||||
device float * y = dst + tgpig*ne00;
|
||||
for (int i00 = tpitg; i00 < ne00; i00 += ntg) {
|
||||
device float4 * y = (device float4 *) (dst + tgpig*ne00);
|
||||
device float * y_scalar = (device float *) y;
|
||||
for (int i00 = tpitg; i00 < ne00/4; i00 += ntg) {
|
||||
y[i00] = x[i00] * scale;
|
||||
}
|
||||
if (tpitg == 0) {
|
||||
for (int i00 = 4 * (ne00 / 4); i00 < ne00; i00++) {y_scalar[i00] = x_scalar[i00] * scale;}
|
||||
}
|
||||
}
|
||||
|
||||
// function for calculate inner product between a q4_0 block and 32 floats (yl), sumy is SUM(yl[i])
|
||||
float block_q_n_dot_y(device const block_q4_0 * qb_curr, float sumy, thread float * yl) {
|
||||
float d = qb_curr->d;
|
||||
float4 acc = 0.f;
|
||||
device uint16_t * qs = ((device uint16_t *)qb_curr + 1);
|
||||
for (int i = 0; i < 16; i+=2) {
|
||||
acc[0] += yl[i] * (qs[i / 2] & 0x000F);
|
||||
acc[1] += yl[i + 16] * (qs[i / 2] & 0x00F0);
|
||||
acc[2] += yl[i + 1] * (qs[i / 2] & 0x0F00);
|
||||
acc[3] += yl[i + 17] * (qs[i / 2] & 0xF000);
|
||||
}
|
||||
return d * (sumy * -8.f + acc[0] + acc[1]/16.f + acc[2]/256.f + acc[3]/4096.f);
|
||||
}
|
||||
|
||||
// function for calculate inner product between a q4_1 block and 32 floats (yl), sumy is SUM(yl[i])
|
||||
float block_q_n_dot_y(device const block_q4_1 * qb_curr, float sumy, thread float * yl) {
|
||||
float d = qb_curr->d;
|
||||
float m = qb_curr->m;
|
||||
float4 acc = 0.f;
|
||||
device uint16_t * qs = ((device uint16_t *)qb_curr + 2);
|
||||
for (int i = 0; i < 16; i+=2) {
|
||||
acc[0] += yl[i] * (qs[i / 2] & 0x000F);
|
||||
acc[1] += yl[i + 16] * (qs[i / 2] & 0x00F0);
|
||||
acc[2] += yl[i + 1] * (qs[i / 2] & 0x0F00);
|
||||
acc[3] += yl[i + 17] * (qs[i / 2] & 0xF000);
|
||||
}
|
||||
return d * (acc[0] + acc[1]/16.f + acc[2]/256.f + acc[3]/4096.f) + sumy * m;
|
||||
}
|
||||
|
||||
// putting them in the kernel cause a significant performance penalty
|
||||
#define N_DST 4 // each SIMD group works on 4 rows
|
||||
#define N_SIMDGROUP 2 // number of SIMD groups in a thread group
|
||||
#define N_SIMDWIDTH 32 // assuming SIMD group size is 32
|
||||
template<typename block_q_type>
|
||||
void mul_vec_q_n_f32(device const void * src0, device const float * src1, device float * dst,
|
||||
int64_t ne00, int64_t ne10, int64_t ne0, int64_t ne01,
|
||||
uint2 tgpig, uint tiisg, uint sgitg) {
|
||||
const int nb = ne00/QK4_0;
|
||||
const int r0 = tgpig.x;
|
||||
const int r1 = tgpig.y;
|
||||
device const block_q_type * x = (device const block_q_type *) src0 + (r0 * N_SIMDGROUP + sgitg) * N_DST * nb;
|
||||
device const float * y = (device const float *) src1 + r1*ne10;
|
||||
float4 y_curr[8]; // src1 vector cache
|
||||
float sumf[N_DST]={0.f}, all_sum;
|
||||
thread float * yl=(thread float *)y_curr;
|
||||
|
||||
// each thread in a SIMD group deals with 1 block.
|
||||
for (int column = 0; column < nb / N_SIMDWIDTH; column++) {
|
||||
float sumy = 0;
|
||||
for (int i = 0; i < QK4_0 / 4; i++) {
|
||||
y_curr[i] = *((device float4 *)(y + N_SIMDWIDTH * (tiisg + column * QK4_0)) + i);
|
||||
sumy += y_curr[i][0] + y_curr[i][1] + y_curr[i][2] + y_curr[i][3];
|
||||
}
|
||||
|
||||
for (int row = 0; row < N_DST; row++) {
|
||||
sumf[row] += block_q_n_dot_y(x+(tiisg + row * nb + column * N_SIMDWIDTH), sumy, yl);
|
||||
}
|
||||
}
|
||||
|
||||
// from now loads two rows every time and 16 blocks per row
|
||||
int ir = tiisg / (N_SIMDWIDTH / 2);
|
||||
int ib = tiisg % (N_SIMDWIDTH / 2);
|
||||
for (int ind = 0; ind < (nb % N_SIMDWIDTH + N_SIMDWIDTH / 2 - 1)/(N_SIMDWIDTH / 2); ind++) {
|
||||
int nb_start = (nb / N_SIMDWIDTH) * N_SIMDWIDTH + ind * (N_SIMDWIDTH / 2); //where the left blocks start
|
||||
float sumy = 0;
|
||||
for (int i = 0; i < QK4_0 / 4; i++) {
|
||||
y_curr[i] = *((device float4 *)(y + (nb_start + ib) * QK4_0) + i);
|
||||
sumy += y_curr[i][0] + y_curr[i][1] + y_curr[i][2] + y_curr[i][3];
|
||||
}
|
||||
|
||||
for (int row = 0; row < N_DST; row+=2) {
|
||||
if (nb_start + ib < nb) {
|
||||
sumf[row + ir] += block_q_n_dot_y(x + (nb_start + ib + (row + ir) * nb), sumy, yl);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
for (int row = 0; row < N_DST; ++row) {
|
||||
all_sum = simd_sum(sumf[row]);
|
||||
if (tiisg == 0 && ((r0 * N_SIMDGROUP + sgitg) * N_DST + row) < ne01) {
|
||||
dst[r1*ne0 + (r0 * N_SIMDGROUP + sgitg) * N_DST + row] = all_sum;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
kernel void kernel_mul_mat_q4_0_f32(
|
||||
@@ -398,65 +496,11 @@ kernel void kernel_mul_mat_q4_0_f32(
|
||||
constant int64_t & ne00,
|
||||
constant int64_t & ne10,
|
||||
constant int64_t & ne0,
|
||||
threadgroup float * sum [[threadgroup(0)]],
|
||||
constant int64_t & ne01[[buffer(4)]],
|
||||
uint2 tgpig[[threadgroup_position_in_grid]],
|
||||
uint2 tpitg[[thread_position_in_threadgroup]],
|
||||
uint2 tptg[[threads_per_threadgroup]]) {
|
||||
const int nb = ne00/QK4_0;
|
||||
|
||||
const int64_t r0 = tgpig.x;
|
||||
const int64_t r1 = tgpig.y;
|
||||
|
||||
device const block_q4_0 * x = (device const block_q4_0 *) src0 + r0*nb;
|
||||
device const float * y = (device const float *) src1 + r1*ne10;
|
||||
|
||||
const int nth = tptg.x*tptg.y;
|
||||
const int ith = tptg.y*tpitg.x + tpitg.y;
|
||||
|
||||
const int ix = tpitg.y/4; // 0 or 1
|
||||
const int iy = tpitg.y - 4*ix; // 0...3
|
||||
|
||||
const int first = 4 * iy;
|
||||
|
||||
float sumf = 0;
|
||||
|
||||
for (int i = 2*tpitg.x + ix; i < nb; i += 2*tptg.x) {
|
||||
|
||||
const float d = (float)x[i].d;
|
||||
|
||||
device const uint8_t * xl = x[i].qs + first;
|
||||
device const float * yl = y + i * QK4_0 + first;
|
||||
|
||||
float2 acc = {0.0f, 0.0f};
|
||||
|
||||
for (int j = 0; j < 4; ++j) {
|
||||
|
||||
acc[0] += yl[j] * (xl[j] & 0xF) + yl[j+16] * (xl[j] >> 4);
|
||||
acc[1] += yl[j] + yl[j+16];
|
||||
|
||||
}
|
||||
|
||||
sumf += d * (acc[0] - 8.f*acc[1]);
|
||||
}
|
||||
|
||||
sum[ith] = sumf;
|
||||
|
||||
//
|
||||
// Accumulate the sum from all threads in the threadgroup
|
||||
//
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith%4 == 0) {
|
||||
sum[ith] += sum[ith+1] + sum[ith+2] + sum[ith+3];
|
||||
}
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith%16 == 0) {
|
||||
sum[ith] += sum[ith+4] + sum[ith+8] + sum[ith+12];
|
||||
}
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith == 0) {
|
||||
for (int i = 16; i < nth; i += 16) sum[0] += sum[i];
|
||||
dst[r1*ne0 + r0] = sum[0];
|
||||
}
|
||||
uint tiisg[[thread_index_in_simdgroup]],
|
||||
uint sgitg[[simdgroup_index_in_threadgroup]]) {
|
||||
mul_vec_q_n_f32<block_q4_0>(src0,src1,dst,ne00,ne10,ne0,ne01,tgpig,tiisg,sgitg);
|
||||
}
|
||||
|
||||
kernel void kernel_mul_mat_q4_1_f32(
|
||||
@@ -466,66 +510,11 @@ kernel void kernel_mul_mat_q4_1_f32(
|
||||
constant int64_t & ne00,
|
||||
constant int64_t & ne10,
|
||||
constant int64_t & ne0,
|
||||
threadgroup float * sum [[threadgroup(0)]],
|
||||
constant int64_t & ne01[[buffer(4)]],
|
||||
uint2 tgpig[[threadgroup_position_in_grid]],
|
||||
uint2 tpitg[[thread_position_in_threadgroup]],
|
||||
uint2 tptg[[threads_per_threadgroup]]) {
|
||||
const int nb = ne00/QK4_1;
|
||||
|
||||
const int64_t r0 = tgpig.x;
|
||||
const int64_t r1 = tgpig.y;
|
||||
|
||||
device const block_q4_1 * x = (device const block_q4_1 *) src0 + r0*nb;
|
||||
device const float * y = (device const float *) src1 + r1*ne10;
|
||||
|
||||
const uint nth = tptg.x*tptg.y;
|
||||
const uint ith = tptg.y*tpitg.x + tpitg.y;
|
||||
|
||||
const int ix = tpitg.y/4; // 0 or 1
|
||||
const int iy = tpitg.y - 4*ix; // 0...3
|
||||
|
||||
const int first = 4 * iy;
|
||||
|
||||
float sumf = 0;
|
||||
|
||||
for (int i = 2*tpitg.x + ix; i < nb; i += 2*tptg.x) {
|
||||
|
||||
const float d = (float)x[i].d;
|
||||
const float m = (float)x[i].m;
|
||||
|
||||
device const uint8_t * xl = x[i].qs + first;
|
||||
device const float * yl = y + i * QK4_1 + first;
|
||||
|
||||
float2 acc = {0.0f, 0.0f};
|
||||
|
||||
for (int j = 0; j < 4; ++j) {
|
||||
|
||||
acc[0] += yl[j+ 0] * (d * (xl[j] & 0xF) + m);
|
||||
acc[1] += yl[j+16] * (d * (xl[j] >> 4) + m);
|
||||
|
||||
}
|
||||
|
||||
sumf += acc[0] + acc[1];
|
||||
}
|
||||
|
||||
sum[ith] = sumf;
|
||||
|
||||
//
|
||||
// Accumulate the sum from all threads in the threadgroup
|
||||
//
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith%4 == 0) {
|
||||
sum[ith] += sum[ith+1] + sum[ith+2] + sum[ith+3];
|
||||
}
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith%16 == 0) {
|
||||
sum[ith] += sum[ith+4] + sum[ith+8] + sum[ith+12];
|
||||
}
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith == 0) {
|
||||
for (uint i = 16; i < nth; i += 16) sum[0] += sum[i];
|
||||
dst[r1*ne0 + r0] = sum[0];
|
||||
}
|
||||
uint tiisg[[thread_index_in_simdgroup]],
|
||||
uint sgitg[[simdgroup_index_in_threadgroup]]) {
|
||||
mul_vec_q_n_f32<block_q4_1>(src0,src1,dst,ne00,ne10,ne0,ne01,tgpig,tiisg,sgitg);
|
||||
}
|
||||
|
||||
kernel void kernel_mul_mat_f16_f32(
|
||||
@@ -641,17 +630,19 @@ kernel void kernel_rope(
|
||||
constant int & n_past,
|
||||
constant int & n_dims,
|
||||
constant int & mode,
|
||||
constant float & freq_base,
|
||||
constant float & freq_scale,
|
||||
uint3 tpig[[thread_position_in_grid]]) {
|
||||
const int64_t i3 = tpig[2];
|
||||
const int64_t i2 = tpig[1];
|
||||
const int64_t i1 = tpig[0];
|
||||
|
||||
const bool is_neox = mode & 2;
|
||||
const float theta_scale = pow(10000.0, -2.0f/n_dims);
|
||||
const float theta_scale = pow(freq_base, -2.0f/n_dims);
|
||||
|
||||
const int64_t p = ((mode & 1) == 0 ? n_past + i2 : i2);
|
||||
|
||||
float theta = (float)p;
|
||||
float theta = freq_scale * (float)p;
|
||||
|
||||
if (!is_neox) {
|
||||
for (int64_t i0 = 0; i0 < ne0; i0 += 2) {
|
||||
@@ -1489,6 +1480,7 @@ kernel void kernel_mul_mat_q3_K_f32(
|
||||
|
||||
}
|
||||
|
||||
#if QK_K == 256
|
||||
kernel void kernel_mul_mat_q4_K_f32(
|
||||
device const void * src0,
|
||||
device const float * src1,
|
||||
@@ -1496,131 +1488,180 @@ kernel void kernel_mul_mat_q4_K_f32(
|
||||
constant int64_t & ne00,
|
||||
constant int64_t & ne10,
|
||||
constant int64_t & ne0,
|
||||
threadgroup float * sum [[threadgroup(0)]],
|
||||
constant int64_t & ne01[[buffer(4)]],
|
||||
uint2 tgpig[[threadgroup_position_in_grid]],
|
||||
uint2 tpitg[[thread_position_in_threadgroup]],
|
||||
uint2 tptg[[threads_per_threadgroup]]) {
|
||||
|
||||
const int nb = ne00/QK_K;
|
||||
|
||||
const int64_t r0 = tgpig.x;
|
||||
const int64_t r1 = tgpig.y;
|
||||
|
||||
const int nth = tptg.x*tptg.y;
|
||||
const int ith = tptg.y*tpitg.x + tpitg.y;
|
||||
|
||||
device const block_q4_K * x = (device const block_q4_K *) src0 + r0*nb;
|
||||
device const float * yy = (device const float *) src1 + r1*ne10;
|
||||
|
||||
float sumf = 0;
|
||||
|
||||
#if QK_K == 256
|
||||
uint tiisg[[thread_index_in_simdgroup]],
|
||||
uint sgitg[[simdgroup_index_in_threadgroup]]) {
|
||||
|
||||
const uint16_t kmask1 = 0x3f3f;
|
||||
const uint16_t kmask2 = 0x0f0f;
|
||||
const uint16_t kmask3 = 0xc0c0;
|
||||
|
||||
const int tid = tpitg.y; // 0...16
|
||||
const int il = tid/4; // 0...3
|
||||
const int ir = tid - 4*il;// 0...3
|
||||
const int n = 4;
|
||||
const int ix = tiisg/8; // 0...3
|
||||
const int it = tiisg%8; // 0...7
|
||||
const int im = it/4; // 0 or 1
|
||||
const int ir = it%4; // 0...3
|
||||
|
||||
const int im = il/2; // 0 or 1. 0 computes 0,32 + 128,160, 1 computes 64,96 + 192,224
|
||||
const int in = il%2;
|
||||
const int nb = ne00/QK_K;
|
||||
const int r0 = tgpig.x;
|
||||
const int r1 = tgpig.y;
|
||||
const int first_row = (r0 * N_SIMDGROUP + sgitg) * N_DST;
|
||||
const int ib_row = first_row * nb;
|
||||
device const block_q4_K * x = (device const block_q4_K *) src0 + ib_row;
|
||||
device const float * y = (device const float *) src1 + r1*ne10;
|
||||
float yl[16];
|
||||
float yh[16];
|
||||
float sumf[N_DST]={0.f}, all_sum;
|
||||
|
||||
const int l0 = n*(2*ir + in);
|
||||
const int q_offset = 32*im + l0;
|
||||
const int y_offset = 64*im + l0;
|
||||
const int step = sizeof(block_q4_K) * nb / 2;
|
||||
|
||||
uchar2 sc1, sc2, sc3, sc4;
|
||||
device const float * y4 = y + ix * QK_K + 64 * im + 8 * ir;
|
||||
|
||||
for (int i = tpitg.x; i < nb; i += tptg.x) {
|
||||
uint16_t sc16[4];
|
||||
thread const uint8_t * sc8 = (thread const uint8_t *)sc16;
|
||||
|
||||
device const uint8_t * q1 = (x + i)->qs + q_offset;
|
||||
device const uint8_t * q2 = q1 + 64;
|
||||
device const float * y1 = yy + i*QK_K + y_offset;
|
||||
device const float * y2 = y1 + 128;
|
||||
|
||||
const float dall = (float)((x + i)->d);
|
||||
const float dmin = (float)((x + i)->dmin);
|
||||
|
||||
device const uint16_t * a = (device const uint16_t *)(x + i)->scales;
|
||||
sc1 = as_type<uchar2>((uint16_t)(a[im+0] & kmask1));
|
||||
sc2 = as_type<uchar2>((uint16_t)(a[im+2] & kmask1));
|
||||
sc3 = as_type<uchar2>((uint16_t)(((a[im+4] >> 0) & kmask2) | ((a[im+0] & kmask3) >> 2)));
|
||||
sc4 = as_type<uchar2>((uint16_t)(((a[im+4] >> 4) & kmask2) | ((a[im+2] & kmask3) >> 2)));
|
||||
|
||||
float4 s = {0.f, 0.f, 0.f, 0.f};
|
||||
float smin = 0;
|
||||
for (int l = 0; l < n; ++l) {
|
||||
|
||||
s[0] += y1[l] * (q1[l] & 0xF); s[1] += y1[l+32] * (q1[l] >> 4);
|
||||
s[2] += y2[l] * (q2[l] & 0xF); s[3] += y2[l+32] * (q2[l] >> 4);
|
||||
smin += y1[l] * sc2[0] + y1[l+32] * sc2[1] + y2[l] * sc4[0] + y2[l+32] * sc4[1];
|
||||
for (int ib = ix; ib < nb; ib += 4) {
|
||||
|
||||
float4 sumy = {0.f, 0.f, 0.f, 0.f};
|
||||
for (int i = 0; i < 8; ++i) {
|
||||
yl[i+0] = y4[i+ 0]; sumy[0] += yl[i+0];
|
||||
yl[i+8] = y4[i+ 32]; sumy[1] += yl[i+8];
|
||||
yh[i+0] = y4[i+128]; sumy[2] += yh[i+0];
|
||||
yh[i+8] = y4[i+160]; sumy[3] += yh[i+8];
|
||||
}
|
||||
sumf += dall * (s[0] * sc1[0] + s[1] * sc1[1] + s[2] * sc3[0] + s[3] * sc3[1]) - dmin * smin;
|
||||
|
||||
device const uint16_t * sc = (device const uint16_t *)x[ib].scales + im;
|
||||
device const uint16_t * q1 = (device const uint16_t *)x[ib].qs + 16 * im + 4 * ir;
|
||||
device const half * dh = &x[ib].d;
|
||||
|
||||
for (int row = 0; row < N_DST; row++) {
|
||||
|
||||
sc16[0] = sc[0] & kmask1;
|
||||
sc16[1] = sc[2] & kmask1;
|
||||
sc16[2] = ((sc[4] >> 0) & kmask2) | ((sc[0] & kmask3) >> 2);
|
||||
sc16[3] = ((sc[4] >> 4) & kmask2) | ((sc[2] & kmask3) >> 2);
|
||||
|
||||
device const uint16_t * q2 = q1 + 32;
|
||||
|
||||
float4 acc1 = {0.f, 0.f, 0.f, 0.f};
|
||||
float4 acc2 = {0.f, 0.f, 0.f, 0.f};
|
||||
for (int i = 0; i < 8; i += 2) {
|
||||
acc1[0] += yl[i+0] * (q1[i/2] & 0x000F);
|
||||
acc1[1] += yl[i+1] * (q1[i/2] & 0x0F00);
|
||||
acc1[2] += yl[i+8] * (q1[i/2] & 0x00F0);
|
||||
acc1[3] += yl[i+9] * (q1[i/2] & 0xF000);
|
||||
acc2[0] += yh[i+0] * (q2[i/2] & 0x000F);
|
||||
acc2[1] += yh[i+1] * (q2[i/2] & 0x0F00);
|
||||
acc2[2] += yh[i+8] * (q2[i/2] & 0x00F0);
|
||||
acc2[3] += yh[i+9] * (q2[i/2] & 0xF000);
|
||||
}
|
||||
|
||||
float dall = dh[0];
|
||||
float dmin = dh[1];
|
||||
sumf[row] += dall * ((acc1[0] + 1.f/256.f * acc1[1]) * sc8[0] +
|
||||
(acc1[2] + 1.f/256.f * acc1[3]) * sc8[1] * 1.f/16.f +
|
||||
(acc2[0] + 1.f/256.f * acc2[1]) * sc8[4] +
|
||||
(acc2[2] + 1.f/256.f * acc2[3]) * sc8[5] * 1.f/16.f) -
|
||||
dmin * (sumy[0] * sc8[2] + sumy[1] * sc8[3] + sumy[2] * sc8[6] + sumy[3] * sc8[7]);
|
||||
|
||||
q1 += step;
|
||||
sc += step;
|
||||
dh += step;
|
||||
}
|
||||
|
||||
y4 += 4 * QK_K;
|
||||
}
|
||||
#else
|
||||
uint16_t aux16[2];
|
||||
thread const uint8_t * scales = (thread const uint8_t *)aux16;
|
||||
|
||||
const int il = 4*tpitg.x;
|
||||
|
||||
for (int i = tpitg.y; i < nb; i += tptg.y) {
|
||||
|
||||
device const uint8_t * q = x[i].qs + il;
|
||||
device const float * y = yy + i * QK_K + il;
|
||||
|
||||
const float d = (float)x[i].d[0];
|
||||
const float m = (float)x[i].d[1];
|
||||
|
||||
device const uint16_t * a = (device const uint16_t *)x[i].scales;
|
||||
aux16[0] = a[0] & 0x0f0f;
|
||||
aux16[1] = (a[0] >> 4) & 0x0f0f;
|
||||
|
||||
for (int l = 0; l < 4; ++l) {
|
||||
sumf += d * scales[0] * (y[l+ 0] * (q[l] & 0xF) + y[l+16] * (q[l+16] & 0xF)) - m * scales[2] * (y[l+ 0] + y[l+16])
|
||||
+ d * scales[1] * (y[l+32] * (q[l] >> 4) + y[l+48] * (q[l+16] >> 4)) - m * scales[3] * (y[l+32] + y[l+48]);
|
||||
for (int row = 0; row < N_DST; ++row) {
|
||||
all_sum = simd_sum(sumf[row]);
|
||||
if (tiisg == 0) {
|
||||
dst[r1*ne0 + first_row + row] = all_sum;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
|
||||
sum[ith] = sumf;
|
||||
|
||||
//
|
||||
// Accumulate the sum from all threads in the threadgroup
|
||||
// This version is slightly faster than the commented out one below,
|
||||
// which I copy-pasted from ggerganov's q4_0 dot product for metal.
|
||||
//
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith%4 == 0) {
|
||||
for (int i = 1; i < 4; ++i) sum[ith] += sum[ith + i];
|
||||
}
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith%16 == 0) {
|
||||
for (int i = 4; i < 16; i += 4) sum[ith] += sum[ith + i];
|
||||
}
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith == 0) {
|
||||
for (int i = 16; i < nth; i += 16) sum[0] += sum[i];
|
||||
dst[r1*ne0 + r0] = sum[0];
|
||||
}
|
||||
|
||||
//// accumulate the sum from all threads in the threadgroup
|
||||
//threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
//for (uint i = nth/2; i > 0; i /= 2) {
|
||||
// if (ith < i) {
|
||||
// sum[ith] += sum[ith + i];
|
||||
// }
|
||||
// threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
//}
|
||||
|
||||
//if (ith == 0) {
|
||||
// dst[r1*ne0 + r0] = sum[0];
|
||||
//}
|
||||
}
|
||||
#else
|
||||
kernel void kernel_mul_mat_q4_K_f32(
|
||||
device const void * src0,
|
||||
device const float * src1,
|
||||
device float * dst,
|
||||
constant int64_t & ne00,
|
||||
constant int64_t & ne10,
|
||||
constant int64_t & ne0,
|
||||
constant int64_t & ne01[[buffer(4)]],
|
||||
uint2 tgpig[[threadgroup_position_in_grid]],
|
||||
uint tiisg[[thread_index_in_simdgroup]],
|
||||
uint sgitg[[simdgroup_index_in_threadgroup]]) {
|
||||
|
||||
const int ix = tiisg/4; // 0...7
|
||||
const int it = tiisg%4; // 0...3
|
||||
|
||||
const int nb = ne00/QK_K;
|
||||
const int r0 = tgpig.x;
|
||||
const int r1 = tgpig.y;
|
||||
const int first_row = (r0 * N_SIMDGROUP + sgitg) * N_DST;
|
||||
const int ib_row = first_row * nb;
|
||||
device const block_q4_K * x = (device const block_q4_K *) src0 + ib_row;
|
||||
device const float * y = (device const float *) src1 + r1*ne10;
|
||||
float yl[8];
|
||||
float yh[8];
|
||||
float sumf[N_DST]={0.f}, all_sum;
|
||||
|
||||
const int step = sizeof(block_q4_K) * nb / 2;
|
||||
|
||||
device const float * y4 = y + ix * QK_K + 8 * it;
|
||||
|
||||
uint16_t sc16[4];
|
||||
|
||||
for (int ib = ix; ib < nb; ib += 8) {
|
||||
|
||||
float2 sumy = {0.f, 0.f};
|
||||
for (int i = 0; i < 8; ++i) {
|
||||
yl[i] = y4[i+ 0]; sumy[0] += yl[i];
|
||||
yh[i] = y4[i+32]; sumy[1] += yh[i];
|
||||
}
|
||||
|
||||
device const uint16_t * sc = (device const uint16_t *)x[ib].scales;
|
||||
device const uint16_t * qs = (device const uint16_t *)x[ib].qs + 4 * it;
|
||||
device const half * dh = x[ib].d;
|
||||
|
||||
for (int row = 0; row < N_DST; row++) {
|
||||
|
||||
sc16[0] = sc[0] & 0x000f;
|
||||
sc16[1] = sc[0] & 0x0f00;
|
||||
sc16[2] = sc[0] & 0x00f0;
|
||||
sc16[3] = sc[0] & 0xf000;
|
||||
|
||||
float2 acc1 = {0.f, 0.f};
|
||||
float2 acc2 = {0.f, 0.f};
|
||||
for (int i = 0; i < 8; i += 2) {
|
||||
acc1[0] += yl[i+0] * (qs[i/2] & 0x000F);
|
||||
acc1[1] += yl[i+1] * (qs[i/2] & 0x0F00);
|
||||
acc2[0] += yh[i+0] * (qs[i/2] & 0x00F0);
|
||||
acc2[1] += yh[i+1] * (qs[i/2] & 0xF000);
|
||||
}
|
||||
|
||||
float dall = dh[0];
|
||||
float dmin = dh[1];
|
||||
sumf[row] += dall * ((acc1[0] + 1.f/256.f * acc1[1]) * sc16[0] +
|
||||
(acc2[0] + 1.f/256.f * acc2[1]) * sc16[1] * 1.f/4096.f) -
|
||||
dmin * 1.f/16.f * (sumy[0] * sc16[2] + sumy[1] * sc16[3] * 1.f/256.f);
|
||||
|
||||
qs += step;
|
||||
sc += step;
|
||||
dh += step;
|
||||
}
|
||||
|
||||
y4 += 8 * QK_K;
|
||||
}
|
||||
|
||||
for (int row = 0; row < N_DST; ++row) {
|
||||
all_sum = simd_sum(sumf[row]);
|
||||
if (tiisg == 0) {
|
||||
dst[r1*ne0 + first_row + row] = all_sum;
|
||||
}
|
||||
}
|
||||
}
|
||||
#endif
|
||||
|
||||
kernel void kernel_mul_mat_q5_K_f32(
|
||||
device const void * src0,
|
||||
@@ -1629,39 +1670,39 @@ kernel void kernel_mul_mat_q5_K_f32(
|
||||
constant int64_t & ne00,
|
||||
constant int64_t & ne10,
|
||||
constant int64_t & ne0,
|
||||
threadgroup float * sum [[threadgroup(0)]],
|
||||
uint2 tgpig[[threadgroup_position_in_grid]],
|
||||
uint2 tpitg[[thread_position_in_threadgroup]],
|
||||
uint2 tptg[[threads_per_threadgroup]]) {
|
||||
uint tiisg[[thread_index_in_simdgroup]],
|
||||
uint sgitg[[simdgroup_index_in_threadgroup]]) {
|
||||
|
||||
const int nb = ne00/QK_K;
|
||||
|
||||
const int64_t r0 = tgpig.x;
|
||||
const int64_t r1 = tgpig.y;
|
||||
|
||||
device const block_q5_K * x = (device const block_q5_K *) src0 + r0*nb;
|
||||
const int first_row = (r0 * N_SIMDGROUP + sgitg) * 2;
|
||||
|
||||
device const block_q5_K * x = (device const block_q5_K *) src0 + first_row*nb;
|
||||
device const float * yy = (device const float *) src1 + r1*ne10;
|
||||
|
||||
const int nth = tptg.x*tptg.y;
|
||||
const int ith = tptg.y*tpitg.x + tpitg.y;
|
||||
float sumf[2]={0.f};
|
||||
|
||||
float sumf = 0;
|
||||
const int step = sizeof(block_q5_K) * nb;
|
||||
|
||||
#if QK_K == 256
|
||||
#
|
||||
float yl[16], yh[16];
|
||||
|
||||
const uint16_t kmask1 = 0x3f3f;
|
||||
const uint16_t kmask2 = 0x0f0f;
|
||||
const uint16_t kmask3 = 0xc0c0;
|
||||
|
||||
const int tid = tpitg.y; // 0...16
|
||||
const int il = tid/4; // 0...3
|
||||
const int ir = tid - 4*il;// 0...3
|
||||
const int n = 4;
|
||||
const int tid = tiisg/4;
|
||||
const int ix = tiisg%4;
|
||||
const int im = tid/4;
|
||||
const int ir = tid%4;
|
||||
const int n = 8;
|
||||
|
||||
const int im = il/2; // 0 or 1. 0 computes 0,32 + 128,160, 1 computes 64,96 + 192,224
|
||||
const int in = il%2;
|
||||
|
||||
const int l0 = n*(2*ir + in);
|
||||
const int l0 = n*ir;
|
||||
const int q_offset = 32*im + l0;
|
||||
const int y_offset = 64*im + l0;
|
||||
|
||||
@@ -1670,78 +1711,114 @@ kernel void kernel_mul_mat_q5_K_f32(
|
||||
const uint8_t hm3 = hm1 << 4;
|
||||
const uint8_t hm4 = hm2 << 4;
|
||||
|
||||
uchar2 sc1, sc2, sc3, sc4;
|
||||
uint16_t sc16[4];
|
||||
thread const uint8_t * sc8 = (thread const uint8_t *)sc16;
|
||||
|
||||
for (int i = tpitg.x; i < nb; i += tptg.x) {
|
||||
device const float * y1 = yy + ix*QK_K + y_offset;
|
||||
|
||||
device const uint8_t * q1 = (x + i)->qs + q_offset;
|
||||
device const uint8_t * q2 = q1 + 64;
|
||||
device const uint8_t * qh = (x + i)->qh + l0;
|
||||
device const float * y1 = yy + i*QK_K + y_offset;
|
||||
device const float * y2 = y1 + 128;
|
||||
for (int i = ix; i < nb; i += 4) {
|
||||
|
||||
const float dall = (float)((x + i)->d);
|
||||
const float dmin = (float)((x + i)->dmin);
|
||||
device const uint8_t * q1 = x[i].qs + q_offset;
|
||||
device const uint8_t * qh = x[i].qh + l0;
|
||||
device const half * dh = &x[i].d;
|
||||
device const uint16_t * a = (device const uint16_t *)x[i].scales + im;
|
||||
|
||||
device const uint16_t * a = (device const uint16_t *)(x + i)->scales;
|
||||
sc1 = as_type<uchar2>((uint16_t)(a[im+0] & kmask1));
|
||||
sc2 = as_type<uchar2>((uint16_t)(a[im+2] & kmask1));
|
||||
sc3 = as_type<uchar2>((uint16_t)(((a[im+4] >> 0) & kmask2) | ((a[im+0] & kmask3) >> 2)));
|
||||
sc4 = as_type<uchar2>((uint16_t)(((a[im+4] >> 4) & kmask2) | ((a[im+2] & kmask3) >> 2)));
|
||||
device const float * y2 = y1 + 128;
|
||||
float4 sumy = {0.f, 0.f, 0.f, 0.f};
|
||||
for (int l = 0; l < 8; ++l) {
|
||||
yl[l+0] = y1[l+ 0]; sumy[0] += yl[l+0];
|
||||
yl[l+8] = y1[l+32]; sumy[1] += yl[l+8];
|
||||
yh[l+0] = y2[l+ 0]; sumy[2] += yh[l+0];
|
||||
yh[l+8] = y2[l+32]; sumy[3] += yh[l+8];
|
||||
}
|
||||
|
||||
float4 s = {0.f, 0.f, 0.f, 0.f};
|
||||
float smin = 0;
|
||||
for (int l = 0; l < n; ++l) {
|
||||
for (int row = 0; row < 2; ++row) {
|
||||
|
||||
s[0] += y1[l+ 0] * ((q1[l] & 0xF) + (qh[l] & hm1 ? 16 : 0));
|
||||
s[1] += y1[l+32] * ((q1[l] >> 4) + (qh[l] & hm2 ? 16 : 0));
|
||||
s[2] += y2[l+ 0] * ((q2[l] & 0xF) + (qh[l] & hm3 ? 16 : 0));
|
||||
s[3] += y2[l+32] * ((q2[l] >> 4) + (qh[l] & hm4 ? 16 : 0));
|
||||
smin += y1[l] * sc2[0] + y1[l+32] * sc2[1] + y2[l] * sc4[0] + y2[l+32] * sc4[1];
|
||||
device const uint8_t * q2 = q1 + 64;
|
||||
|
||||
sc16[0] = a[0] & kmask1;
|
||||
sc16[1] = a[2] & kmask1;
|
||||
sc16[2] = ((a[4] >> 0) & kmask2) | ((a[0] & kmask3) >> 2);
|
||||
sc16[3] = ((a[4] >> 4) & kmask2) | ((a[2] & kmask3) >> 2);
|
||||
|
||||
float4 acc = {0.f, 0.f, 0.f, 0.f};
|
||||
for (int l = 0; l < n; ++l) {
|
||||
uint8_t h = qh[l];
|
||||
acc[0] += yl[l+0] * ((uint16_t)(q1[l] & 0x0F) + (h & hm1 ? 16 : 0));
|
||||
acc[1] += yl[l+8] * ((uint16_t)(q1[l] & 0xF0) + (h & hm2 ? 256 : 0));
|
||||
acc[2] += yh[l+0] * ((uint16_t)(q2[l] & 0x0F) + (h & hm3 ? 16 : 0));
|
||||
acc[3] += yh[l+8] * ((uint16_t)(q2[l] & 0xF0) + (h & hm4 ? 256 : 0));
|
||||
}
|
||||
const float dall = dh[0];
|
||||
const float dmin = dh[1];
|
||||
sumf[row] += dall * (acc[0] * sc8[0] + acc[1] * sc8[1] * 1.f/16.f + acc[2] * sc8[4] + acc[3] * sc8[5] * 1.f/16.f) -
|
||||
dmin * (sumy[0] * sc8[2] + sumy[1] * sc8[3] + sumy[2] * sc8[6] + sumy[3] * sc8[7]);
|
||||
|
||||
q1 += step;
|
||||
qh += step;
|
||||
dh += step/2;
|
||||
a += step/2;
|
||||
|
||||
}
|
||||
sumf += dall * (s[0] * sc1[0] + s[1] * sc1[1] + s[2] * sc3[0] + s[3] * sc3[1]) - dmin * smin;
|
||||
|
||||
y1 += 4 * QK_K;
|
||||
|
||||
}
|
||||
#else
|
||||
const int il = 4 * tpitg.x; // 0, 4, 8, 12
|
||||
const int im = il/8; // 0, 0, 1, 1
|
||||
const int in = il%8; // 0, 4, 0, 4
|
||||
float yl[8], yh[8];
|
||||
|
||||
for (int i = tpitg.y; i < nb; i += tptg.y) {
|
||||
const int il = 4 * (tiisg/8); // 0, 4, 8, 12
|
||||
const int ix = tiisg%8;
|
||||
const int im = il/8; // 0, 0, 1, 1
|
||||
const int in = il%8; // 0, 4, 0, 4
|
||||
|
||||
const float d = (float)x[i].d;
|
||||
device const float * y = yy + ix*QK_K + il;
|
||||
|
||||
for (int i = ix; i < nb; i += 8) {
|
||||
|
||||
float4 sumy = {0.f, 0.f, 0.f, 0.f};
|
||||
for (int l = 0; l < 4; ++l) {
|
||||
yl[l+0] = y[l+ 0];
|
||||
yl[l+4] = y[l+16];
|
||||
yh[l+0] = y[l+32];
|
||||
yh[l+4] = y[l+48];
|
||||
}
|
||||
|
||||
device const half * dh = &x[i].d;
|
||||
device const uint8_t * q = x[i].qs + il;
|
||||
device const uint8_t * h = x[i].qh + in;
|
||||
device const int8_t * s = x[i].scales;
|
||||
device const float * y = yy + i*QK_K + il;
|
||||
|
||||
for (int l = 0; l < 4; ++l) {
|
||||
const uint8_t hl = h[l] >> im;
|
||||
sumf += y[l+ 0] * d * s[0] * ((q[l+ 0] & 0xF) - (hl & 0x01 ? 0 : 16))
|
||||
+ y[l+16] * d * s[1] * ((q[l+16] & 0xF) - (hl & 0x04 ? 0 : 16))
|
||||
+ y[l+32] * d * s[2] * ((q[l+ 0] >> 4) - (hl & 0x10 ? 0 : 16))
|
||||
+ y[l+48] * d * s[3] * ((q[l+16] >> 4) - (hl & 0x40 ? 0 : 16));
|
||||
for (int row = 0; row < 2; ++row) {
|
||||
|
||||
const float d = dh[0];
|
||||
|
||||
float2 acc = {0.f, 0.f};
|
||||
for (int l = 0; l < 4; ++l) {
|
||||
const uint8_t hl = h[l] >> im;
|
||||
acc[0] += yl[l+0] * s[0] * ((int16_t)(q[l+ 0] & 0x0F) - (hl & 0x01 ? 0 : 16))
|
||||
+ yl[l+4] * s[1] * ((int16_t)(q[l+16] & 0x0F) - (hl & 0x04 ? 0 : 16));
|
||||
acc[1] += yh[l+0] * s[2] * ((int16_t)(q[l+ 0] & 0xF0) - (hl & 0x10 ? 0 : 256))
|
||||
+ yh[l+4] * s[3] * ((int16_t)(q[l+16] & 0xF0) - (hl & 0x40 ? 0 : 256));
|
||||
}
|
||||
sumf[row] += d * (acc[0] + 1.f/16.f * acc[1]);
|
||||
|
||||
q += step;
|
||||
h += step;
|
||||
s += step;
|
||||
dh += step/2;
|
||||
|
||||
}
|
||||
|
||||
y += 8 * QK_K;
|
||||
}
|
||||
#endif
|
||||
sum[ith] = sumf;
|
||||
|
||||
//
|
||||
// Accumulate the sum from all threads in the threadgroup
|
||||
//
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith%4 == 0) {
|
||||
sum[ith] += sum[ith+1] + sum[ith+2] + sum[ith+3];
|
||||
}
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith%16 == 0) {
|
||||
sum[ith] += sum[ith+4] + sum[ith+8] + sum[ith+12];
|
||||
}
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith == 0) {
|
||||
for (int i = 16; i < nth; i += 16) sum[0] += sum[i];
|
||||
dst[r1*ne0 + r0] = sum[0];
|
||||
for (int row = 0; row < 2; ++row) {
|
||||
const float tot = simd_sum(sumf[row]);
|
||||
if (tiisg == 0) {
|
||||
dst[r1*ne0 + first_row + row] = tot;
|
||||
}
|
||||
}
|
||||
|
||||
}
|
||||
@@ -1753,10 +1830,9 @@ kernel void kernel_mul_mat_q6_K_f32(
|
||||
constant int64_t & ne00,
|
||||
constant int64_t & ne10,
|
||||
constant int64_t & ne0,
|
||||
threadgroup float * sum [[threadgroup(0)]],
|
||||
uint2 tgpig[[threadgroup_position_in_grid]],
|
||||
uint2 tpitg[[thread_position_in_threadgroup]],
|
||||
uint2 tptg[[threads_per_threadgroup]]) {
|
||||
uint tiisg[[thread_index_in_simdgroup]],
|
||||
uint sgitg[[simdgroup_index_in_threadgroup]]) {
|
||||
|
||||
const uint8_t kmask1 = 0x03;
|
||||
const uint8_t kmask2 = 0x0C;
|
||||
@@ -1768,19 +1844,18 @@ kernel void kernel_mul_mat_q6_K_f32(
|
||||
const int64_t r0 = tgpig.x;
|
||||
const int64_t r1 = tgpig.y;
|
||||
|
||||
device const block_q6_K * x = (device const block_q6_K *) src0 + r0*nb;
|
||||
device const float * yy = (device const float *) src1 + r1*ne10;
|
||||
const int row = 2 * r0 + sgitg;
|
||||
|
||||
const int nth = tptg.x*tptg.y;
|
||||
const int ith = tptg.y*tpitg.x + tpitg.y;
|
||||
device const block_q6_K * x = (device const block_q6_K *) src0 + row * nb; //r0*nb;
|
||||
device const float * yy = (device const float *) src1 + r1*ne10;
|
||||
|
||||
float sumf = 0;
|
||||
|
||||
#if QK_K == 256
|
||||
// Note: we absolutely assume that tptg.y = 16 and QK_K = 256!
|
||||
const int iqs = 16 * tpitg.y;
|
||||
const int ip = iqs / 128; // 0 or 1
|
||||
const int il = (iqs - 128*ip)/16; // 0...7
|
||||
const int tid = tiisg/2;
|
||||
const int ix = tiisg%2;
|
||||
const int ip = tid/8; // 0 or 1
|
||||
const int il = tid%8;
|
||||
const int n = 4;
|
||||
const int l0 = n*il;
|
||||
const int is = 8*ip + l0/16;
|
||||
@@ -1789,9 +1864,10 @@ kernel void kernel_mul_mat_q6_K_f32(
|
||||
const int q_offset_l = 64*ip + l0;
|
||||
const int q_offset_h = 32*ip + l0;
|
||||
|
||||
for (int i = tpitg.x; i < nb; i += tptg.x) {
|
||||
for (int i = ix; i < nb; i += 2) {
|
||||
|
||||
device const uint8_t * ql = x[i].ql + q_offset_l;
|
||||
device const uint8_t * q1 = x[i].ql + q_offset_l;
|
||||
device const uint8_t * q2 = q1 + 32;
|
||||
device const uint8_t * qh = x[i].qh + q_offset_h;
|
||||
device const int8_t * sc = x[i].scales + is;
|
||||
|
||||
@@ -1801,19 +1877,21 @@ kernel void kernel_mul_mat_q6_K_f32(
|
||||
|
||||
float4 sums = {0.f, 0.f, 0.f, 0.f};
|
||||
for (int l = 0; l < n; ++l) {
|
||||
sums[0] += y[l+ 0] * ((int8_t)((ql[l+ 0] & 0xF) | ((qh[l] & kmask1) << 4)) - 32);
|
||||
sums[1] += y[l+32] * ((int8_t)((ql[l+32] & 0xF) | ((qh[l] & kmask2) << 2)) - 32);
|
||||
sums[2] += y[l+64] * ((int8_t)((ql[l+ 0] >> 4) | ((qh[l] & kmask3) << 0)) - 32);
|
||||
sums[3] += y[l+96] * ((int8_t)((ql[l+32] >> 4) | ((qh[l] & kmask4) >> 2)) - 32);
|
||||
sums[0] += y[l+ 0] * ((int8_t)((q1[l] & 0xF) | ((qh[l] & kmask1) << 4)) - 32);
|
||||
sums[1] += y[l+32] * ((int8_t)((q2[l] & 0xF) | ((qh[l] & kmask2) << 2)) - 32);
|
||||
sums[2] += y[l+64] * ((int8_t)((q1[l] >> 4) | ((qh[l] & kmask3) << 0)) - 32);
|
||||
sums[3] += y[l+96] * ((int8_t)((q2[l] >> 4) | ((qh[l] & kmask4) >> 2)) - 32);
|
||||
}
|
||||
|
||||
sumf += dall * (sums[0] * sc[0] + sums[1] * sc[2] + sums[2] * sc[4] + sums[3] * sc[6]);
|
||||
|
||||
}
|
||||
#else
|
||||
const int il = 4*tpitg.x; // 0, 4, 8, 12
|
||||
|
||||
for (int i = tpitg.y; i < nb; i += tptg.y) {
|
||||
#else
|
||||
const int ix = tiisg/4;
|
||||
const int il = 4*(tiisg%4);
|
||||
|
||||
for (int i = ix; i < nb; i += 8) {
|
||||
device const float * y = yy + i * QK_K + il;
|
||||
device const uint8_t * ql = x[i].ql + il;
|
||||
device const uint8_t * qh = x[i].qh + il;
|
||||
@@ -1833,23 +1911,8 @@ kernel void kernel_mul_mat_q6_K_f32(
|
||||
|
||||
#endif
|
||||
|
||||
sum[ith] = sumf;
|
||||
|
||||
//
|
||||
// Accumulate the sum from all threads in the threadgroup
|
||||
//
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith%4 == 0) {
|
||||
for (int i = 1; i < 4; ++i) sum[ith] += sum[ith + i];
|
||||
const float tot = simd_sum(sumf);
|
||||
if (tiisg == 0) {
|
||||
dst[r1*ne0 + row] = tot;
|
||||
}
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith%16 == 0) {
|
||||
for (int i = 4; i < 16; i += 4) sum[ith] += sum[ith + i];
|
||||
}
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith == 0) {
|
||||
for (int i = 16; i < nth; i += 16) sum[0] += sum[i];
|
||||
dst[r1*ne0 + r0] = sum[0];
|
||||
}
|
||||
|
||||
}
|
||||
|
244
llama/ggml-mpi.c
Normal file
@@ -0,0 +1,244 @@
|
||||
//go:build mpi
|
||||
|
||||
/**
|
||||
* llama.cpp - git e782c9e735f93ab4767ffc37462c523b73a17ddc
|
||||
*
|
||||
* MIT License
|
||||
*
|
||||
* Copyright (c) 2023 Georgi Gerganov
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
* of this software and associated documentation files (the "Software"), to deal
|
||||
* in the Software without restriction, including without limitation the rights
|
||||
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
* copies of the Software, and to permit persons to whom the Software is
|
||||
* furnished to do so, subject to the following conditions:
|
||||
*
|
||||
* The above copyright notice and this permission notice shall be included in all
|
||||
* copies or substantial portions of the Software.
|
||||
*
|
||||
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
* SOFTWARE.
|
||||
*/
|
||||
|
||||
#include "ggml-mpi.h"
|
||||
|
||||
#include "ggml.h"
|
||||
|
||||
#include <mpi.h>
|
||||
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
|
||||
#define MIN(a, b) ((a) < (b) ? (a) : (b))
|
||||
|
||||
#define UNUSED GGML_UNUSED
|
||||
|
||||
struct ggml_mpi_context {
|
||||
int rank;
|
||||
int size;
|
||||
};
|
||||
|
||||
void ggml_mpi_backend_init(void) {
|
||||
MPI_Init(NULL, NULL);
|
||||
}
|
||||
|
||||
void ggml_mpi_backend_free(void) {
|
||||
MPI_Finalize();
|
||||
}
|
||||
|
||||
struct ggml_mpi_context * ggml_mpi_init(void) {
|
||||
struct ggml_mpi_context * ctx = calloc(1, sizeof(struct ggml_mpi_context));
|
||||
|
||||
MPI_Comm_rank(MPI_COMM_WORLD, &ctx->rank);
|
||||
MPI_Comm_size(MPI_COMM_WORLD, &ctx->size);
|
||||
|
||||
return ctx;
|
||||
}
|
||||
|
||||
void ggml_mpi_free(struct ggml_mpi_context * ctx) {
|
||||
free(ctx);
|
||||
}
|
||||
|
||||
int ggml_mpi_rank(struct ggml_mpi_context * ctx) {
|
||||
return ctx->rank;
|
||||
}
|
||||
|
||||
void ggml_mpi_eval_init(
|
||||
struct ggml_mpi_context * ctx_mpi,
|
||||
int * n_tokens,
|
||||
int * n_past,
|
||||
int * n_threads) {
|
||||
UNUSED(ctx_mpi);
|
||||
|
||||
// synchronize the worker node parameters with the root node
|
||||
MPI_Barrier(MPI_COMM_WORLD);
|
||||
|
||||
MPI_Bcast(n_tokens, 1, MPI_INT, 0, MPI_COMM_WORLD);
|
||||
MPI_Bcast(n_past, 1, MPI_INT, 0, MPI_COMM_WORLD);
|
||||
MPI_Bcast(n_threads, 1, MPI_INT, 0, MPI_COMM_WORLD);
|
||||
}
|
||||
|
||||
static int ggml_graph_get_node_idx(struct ggml_cgraph * gf, const char * name) {
|
||||
struct ggml_tensor * t = ggml_graph_get_tensor(gf, name);
|
||||
if (t == NULL) {
|
||||
fprintf(stderr, "%s: tensor %s not found\n", __func__, name);
|
||||
return -1;
|
||||
}
|
||||
|
||||
for (int i = 0; i < gf->n_nodes; i++) {
|
||||
if (gf->nodes[i] == t) {
|
||||
return i;
|
||||
}
|
||||
}
|
||||
|
||||
fprintf(stderr, "%s: tensor %s not found in graph (should not happen)\n", __func__, name);
|
||||
return -1;
|
||||
}
|
||||
|
||||
static void ggml_mpi_tensor_send(struct ggml_tensor * t, int mpi_rank_dst) {
|
||||
MPI_Datatype mpi_type;
|
||||
|
||||
switch (t->type) {
|
||||
case GGML_TYPE_I32: mpi_type = MPI_INT32_T; break;
|
||||
case GGML_TYPE_F32: mpi_type = MPI_FLOAT; break;
|
||||
default: GGML_ASSERT(false && "not implemented");
|
||||
}
|
||||
|
||||
const int retval = MPI_Send(t->data, ggml_nelements(t), mpi_type, mpi_rank_dst, 0, MPI_COMM_WORLD);
|
||||
GGML_ASSERT(retval == MPI_SUCCESS);
|
||||
}
|
||||
|
||||
static void ggml_mpi_tensor_recv(struct ggml_tensor * t, int mpi_rank_src) {
|
||||
MPI_Datatype mpi_type;
|
||||
|
||||
switch (t->type) {
|
||||
case GGML_TYPE_I32: mpi_type = MPI_INT32_T; break;
|
||||
case GGML_TYPE_F32: mpi_type = MPI_FLOAT; break;
|
||||
default: GGML_ASSERT(false && "not implemented");
|
||||
}
|
||||
|
||||
MPI_Status status; UNUSED(status);
|
||||
|
||||
const int retval = MPI_Recv(t->data, ggml_nelements(t), mpi_type, mpi_rank_src, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
|
||||
GGML_ASSERT(retval == MPI_SUCCESS);
|
||||
}
|
||||
|
||||
// TODO: there are many improvements that can be done to this implementation
|
||||
void ggml_mpi_graph_compute_pre(
|
||||
struct ggml_mpi_context * ctx_mpi,
|
||||
struct ggml_cgraph * gf,
|
||||
int n_layers) {
|
||||
const int mpi_rank = ctx_mpi->rank;
|
||||
const int mpi_size = ctx_mpi->size;
|
||||
|
||||
struct ggml_tensor * inp_tokens = ggml_graph_get_tensor(gf, "inp_tokens");
|
||||
if (inp_tokens == NULL) {
|
||||
fprintf(stderr, "%s: tensor 'inp_tokens' not found\n", __func__);
|
||||
return;
|
||||
}
|
||||
|
||||
struct ggml_tensor * inp0 = ggml_graph_get_tensor(gf, "layer_inp_0");
|
||||
if (inp0 == NULL) {
|
||||
fprintf(stderr, "%s: tensor 'inp0' not found\n", __func__);
|
||||
return;
|
||||
}
|
||||
|
||||
GGML_ASSERT(inp0 == gf->nodes[0]);
|
||||
|
||||
// distribute the compute graph into slices across the MPI nodes
|
||||
//
|
||||
// the main node (0) processes the last layers + the remainder of the compute graph
|
||||
// and is responsible to pass the input tokens to the first node (1)
|
||||
//
|
||||
// node 1: [( 0) * n_per_node, ( 1) * n_per_node)
|
||||
// node 2: [( 1) * n_per_node, ( 2) * n_per_node)
|
||||
// ...
|
||||
// node n-1: [(n-2) * n_per_node, (n-1) * n_per_node)
|
||||
// node 0: [(n-1) * n_per_node, n_nodes)
|
||||
//
|
||||
if (mpi_rank > 0) {
|
||||
if (mpi_rank == 1) {
|
||||
// the first node (1) receives the input tokens from the main node (0)
|
||||
ggml_mpi_tensor_recv(inp_tokens, 0);
|
||||
} else {
|
||||
// recv input data for each node into the "inp0" tensor (i.e. the first node in the compute graph)
|
||||
ggml_mpi_tensor_recv(inp0, mpi_rank - 1);
|
||||
}
|
||||
} else if (mpi_size > 1) {
|
||||
// node 0 sends the input tokens to node 1
|
||||
ggml_mpi_tensor_send(inp_tokens, 1);
|
||||
|
||||
// recv the output data from the last node
|
||||
ggml_mpi_tensor_recv(inp0, mpi_size - 1);
|
||||
}
|
||||
|
||||
{
|
||||
const int n_per_node = (n_layers + (mpi_size - 1)) / mpi_size;
|
||||
|
||||
const int mpi_idx = mpi_rank > 0 ? mpi_rank - 1 : mpi_size - 1;
|
||||
|
||||
const int il0 = (mpi_idx + 0) * n_per_node;
|
||||
const int il1 = MIN(n_layers, (mpi_idx + 1) * n_per_node);
|
||||
|
||||
char name_l0[GGML_MAX_NAME];
|
||||
char name_l1[GGML_MAX_NAME];
|
||||
|
||||
snprintf(name_l0, sizeof(name_l0), "layer_inp_%d", il0);
|
||||
snprintf(name_l1, sizeof(name_l1), "layer_inp_%d", il1);
|
||||
|
||||
const int idx_l0 = ggml_graph_get_node_idx(gf, name_l0);
|
||||
const int idx_l1 = mpi_rank > 0 ? ggml_graph_get_node_idx(gf, name_l1) + 1 : gf->n_nodes;
|
||||
|
||||
if (idx_l0 < 0 || idx_l1 < 0) {
|
||||
fprintf(stderr, "%s: layer input nodes not found\n", __func__);
|
||||
return;
|
||||
}
|
||||
|
||||
// attach the input data to all nodes that need it
|
||||
// TODO: not great - should be able to do this without modifying the compute graph (see next TODO below)
|
||||
for (int i = idx_l0; i < idx_l1; i++) {
|
||||
if (gf->nodes[i]->src[0] == gf->nodes[idx_l0]) {
|
||||
gf->nodes[i]->src[0] = inp0;
|
||||
}
|
||||
if (gf->nodes[i]->src[1] == gf->nodes[idx_l0]) {
|
||||
gf->nodes[i]->src[1] = inp0;
|
||||
}
|
||||
}
|
||||
|
||||
// TODO: instead of rearranging the nodes, we should be able to execute a subset of the compute graph
|
||||
for (int i = 1; i < idx_l1 - idx_l0; i++) {
|
||||
gf->nodes[i] = gf->nodes[idx_l0 + i];
|
||||
gf->grads[i] = gf->grads[idx_l0 + i];
|
||||
}
|
||||
|
||||
// the first node performs the "get_rows" operation, the rest of the nodes get the data from the previous node
|
||||
if (mpi_idx != 0) {
|
||||
gf->nodes[0]->op = GGML_OP_NONE;
|
||||
}
|
||||
|
||||
gf->n_nodes = idx_l1 - idx_l0;
|
||||
|
||||
//fprintf(stderr, "%s: node %d: processing %d nodes [%d, %d)\n", __func__, mpi_rank, gf->n_nodes, il0, il1);
|
||||
}
|
||||
}
|
||||
|
||||
void ggml_mpi_graph_compute_post(
|
||||
struct ggml_mpi_context * ctx_mpi,
|
||||
struct ggml_cgraph * gf,
|
||||
int n_layers) {
|
||||
UNUSED(n_layers);
|
||||
|
||||
const int mpi_rank = ctx_mpi->rank;
|
||||
const int mpi_size = ctx_mpi->size;
|
||||
|
||||
// send the output data to the next node
|
||||
if (mpi_rank > 0) {
|
||||
ggml_mpi_tensor_send(gf->nodes[gf->n_nodes - 1], (mpi_rank + 1) % mpi_size);
|
||||
}
|
||||
}
|
67
llama/ggml-mpi.h
Normal file
@@ -0,0 +1,67 @@
|
||||
//go:build mpi
|
||||
|
||||
/**
|
||||
* llama.cpp - git e782c9e735f93ab4767ffc37462c523b73a17ddc
|
||||
*
|
||||
* MIT License
|
||||
*
|
||||
* Copyright (c) 2023 Georgi Gerganov
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
* of this software and associated documentation files (the "Software"), to deal
|
||||
* in the Software without restriction, including without limitation the rights
|
||||
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
* copies of the Software, and to permit persons to whom the Software is
|
||||
* furnished to do so, subject to the following conditions:
|
||||
*
|
||||
* The above copyright notice and this permission notice shall be included in all
|
||||
* copies or substantial portions of the Software.
|
||||
*
|
||||
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
* SOFTWARE.
|
||||
*/
|
||||
|
||||
#pragma once
|
||||
|
||||
struct ggml_context;
|
||||
struct ggml_tensor;
|
||||
struct ggml_cgraph;
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
struct ggml_mpi_context;
|
||||
|
||||
void ggml_mpi_backend_init(void);
|
||||
void ggml_mpi_backend_free(void);
|
||||
|
||||
struct ggml_mpi_context * ggml_mpi_init(void);
|
||||
void ggml_mpi_free(struct ggml_mpi_context * ctx);
|
||||
|
||||
int ggml_mpi_rank(struct ggml_mpi_context * ctx);
|
||||
|
||||
void ggml_mpi_eval_init(
|
||||
struct ggml_mpi_context * ctx_mpi,
|
||||
int * n_tokens,
|
||||
int * n_past,
|
||||
int * n_threads);
|
||||
|
||||
void ggml_mpi_graph_compute_pre(
|
||||
struct ggml_mpi_context * ctx_mpi,
|
||||
struct ggml_cgraph * gf,
|
||||
int n_layers);
|
||||
|
||||
void ggml_mpi_graph_compute_post(
|
||||
struct ggml_mpi_context * ctx_mpi,
|
||||
struct ggml_cgraph * gf,
|
||||
int n_layers);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
1893
llama/ggml-opencl.cpp
Normal file
53
llama/ggml-opencl.h
Normal file
@@ -0,0 +1,53 @@
|
||||
//go:build opencl
|
||||
|
||||
/**
|
||||
* llama.cpp - git e782c9e735f93ab4767ffc37462c523b73a17ddc
|
||||
*
|
||||
* MIT License
|
||||
*
|
||||
* Copyright (c) 2023 Georgi Gerganov
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
* of this software and associated documentation files (the "Software"), to deal
|
||||
* in the Software without restriction, including without limitation the rights
|
||||
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
* copies of the Software, and to permit persons to whom the Software is
|
||||
* furnished to do so, subject to the following conditions:
|
||||
*
|
||||
* The above copyright notice and this permission notice shall be included in all
|
||||
* copies or substantial portions of the Software.
|
||||
*
|
||||
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
* SOFTWARE.
|
||||
*/
|
||||
|
||||
#pragma once
|
||||
|
||||
#include "ggml.h"
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
void ggml_cl_init(void);
|
||||
|
||||
void ggml_cl_mul(const struct ggml_tensor * src0, const struct ggml_tensor * src1, struct ggml_tensor * dst);
|
||||
bool ggml_cl_can_mul_mat(const struct ggml_tensor * src0, const struct ggml_tensor * src1, struct ggml_tensor * dst);
|
||||
size_t ggml_cl_mul_mat_get_wsize(const struct ggml_tensor * src0, const struct ggml_tensor * src1, struct ggml_tensor * dst);
|
||||
void ggml_cl_mul_mat(const struct ggml_tensor * src0, const struct ggml_tensor * src1, struct ggml_tensor * dst, void * wdata, size_t wsize);
|
||||
|
||||
void * ggml_cl_host_malloc(size_t size);
|
||||
void ggml_cl_host_free(void * ptr);
|
||||
|
||||
void ggml_cl_free_data(const struct ggml_tensor* tensor);
|
||||
|
||||
void ggml_cl_transform_tensor(void * data, struct ggml_tensor * tensor);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
650
llama/ggml.c
51
llama/ggml.h
@@ -1,5 +1,5 @@
|
||||
/**
|
||||
* llama.cpp - git 5bf2a2771886ee86137e01dbc7492f78fb392066
|
||||
* llama.cpp - git e782c9e735f93ab4767ffc37462c523b73a17ddc
|
||||
*
|
||||
* MIT License
|
||||
*
|
||||
@@ -227,8 +227,13 @@
|
||||
#define GGML_MAX_NAME 48
|
||||
#define GGML_DEFAULT_N_THREADS 4
|
||||
|
||||
|
||||
#define GGML_EXIT_SUCCESS 0
|
||||
#define GGML_EXIT_ABORTED 1
|
||||
|
||||
#define GGML_UNUSED(x) (void)(x)
|
||||
|
||||
|
||||
#define GGML_ASSERT(x) \
|
||||
do { \
|
||||
if (!(x)) { \
|
||||
@@ -389,6 +394,8 @@ extern "C" {
|
||||
GGML_OP_CLAMP,
|
||||
GGML_OP_CONV_1D,
|
||||
GGML_OP_CONV_2D,
|
||||
GGML_OP_POOL_1D,
|
||||
GGML_OP_POOL_2D,
|
||||
|
||||
GGML_OP_FLASH_ATTN,
|
||||
GGML_OP_FLASH_FF,
|
||||
@@ -468,6 +475,10 @@ extern "C" {
|
||||
|
||||
// the `n_tasks` of nodes, 1:1 mapping to cgraph nodes
|
||||
int n_tasks[GGML_MAX_NODES];
|
||||
|
||||
// abort ggml_graph_compute when true
|
||||
bool (*abort_callback)(void * data);
|
||||
void * abort_callback_data;
|
||||
};
|
||||
|
||||
// computation graph
|
||||
@@ -1136,6 +1147,17 @@ extern "C" {
|
||||
int mode,
|
||||
int n_ctx);
|
||||
|
||||
// custom RoPE, in-place, returns view(a)
|
||||
GGML_API struct ggml_tensor * ggml_rope_custom_inplace(
|
||||
struct ggml_context * ctx,
|
||||
struct ggml_tensor * a,
|
||||
int n_past,
|
||||
int n_dims,
|
||||
int mode,
|
||||
float freq_base,
|
||||
float freq_scale,
|
||||
int n_ctx);
|
||||
|
||||
// rotary position embedding backward, i.e compute dx from dy
|
||||
// a - dy
|
||||
GGML_API struct ggml_tensor * ggml_rope_back(
|
||||
@@ -1190,6 +1212,31 @@ extern "C" {
|
||||
int s,
|
||||
int d);
|
||||
|
||||
enum ggml_op_pool {
|
||||
GGML_OP_POOL_MAX,
|
||||
GGML_OP_POOL_AVG,
|
||||
GGML_OP_POOL_COUNT,
|
||||
};
|
||||
|
||||
GGML_API struct ggml_tensor* ggml_pool_1d(
|
||||
struct ggml_context * ctx,
|
||||
struct ggml_tensor * a,
|
||||
enum ggml_op_pool op,
|
||||
int k0, // kernel size
|
||||
int s0, // stride
|
||||
int p0); // padding
|
||||
|
||||
GGML_API struct ggml_tensor* ggml_pool_2d(
|
||||
struct ggml_context * ctx,
|
||||
struct ggml_tensor * a,
|
||||
enum ggml_op_pool op,
|
||||
int k0,
|
||||
int k1,
|
||||
int s0,
|
||||
int s1,
|
||||
int p0,
|
||||
int p1);
|
||||
|
||||
GGML_API struct ggml_tensor * ggml_flash_attn(
|
||||
struct ggml_context * ctx,
|
||||
struct ggml_tensor * q,
|
||||
@@ -1329,7 +1376,7 @@ extern "C" {
|
||||
// ggml_graph_plan() has to be called before ggml_graph_compute()
|
||||
// when plan.work_size > 0, caller must allocate memory for plan.work_data
|
||||
GGML_API struct ggml_cplan ggml_graph_plan (struct ggml_cgraph * cgraph, int n_threads /*= GGML_DEFAULT_N_THREADS*/);
|
||||
GGML_API void ggml_graph_compute(struct ggml_cgraph * cgraph, struct ggml_cplan * cplan);
|
||||
GGML_API int ggml_graph_compute(struct ggml_cgraph * cgraph, struct ggml_cplan * cplan);
|
||||
GGML_API void ggml_graph_reset (struct ggml_cgraph * cgraph);
|
||||
|
||||
// same as ggml_graph_compute() but the work data is allocated as a part of the context
|
||||
|
@@ -1,5 +1,5 @@
|
||||
/**
|
||||
* llama.cpp - git 5bf2a2771886ee86137e01dbc7492f78fb392066
|
||||
* llama.cpp - git e782c9e735f93ab4767ffc37462c523b73a17ddc
|
||||
*
|
||||
* MIT License
|
||||
*
|
||||
|
@@ -1,5 +1,5 @@
|
||||
/**
|
||||
* llama.cpp - git 5bf2a2771886ee86137e01dbc7492f78fb392066
|
||||
* llama.cpp - git e782c9e735f93ab4767ffc37462c523b73a17ddc
|
||||
*
|
||||
* MIT License
|
||||
*
|
||||
@@ -41,6 +41,14 @@
|
||||
#define K_SCALE_SIZE 12
|
||||
#endif
|
||||
|
||||
#ifndef static_assert
|
||||
#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 201100L)
|
||||
#define static_assert(cond, msg) _Static_assert(cond, msg)
|
||||
#else
|
||||
#define static_assert(cond, msg) struct global_scope_noop_trick
|
||||
#endif
|
||||
#endif
|
||||
|
||||
//
|
||||
// Super-block quantization structures
|
||||
//
|
||||
|
@@ -1,5 +1,5 @@
|
||||
/**
|
||||
* llama.cpp - git 5bf2a2771886ee86137e01dbc7492f78fb392066
|
||||
* llama.cpp - git e782c9e735f93ab4767ffc37462c523b73a17ddc
|
||||
*
|
||||
* MIT License
|
||||
*
|
||||
@@ -201,13 +201,13 @@ struct llama_mmap {
|
||||
llama_mmap(struct llama_file * file, size_t prefetch = (size_t) -1 /* -1 = max value */, bool numa = false) {
|
||||
size = file->size;
|
||||
int fd = fileno(file->fp);
|
||||
int flags = MAP_PRIVATE;
|
||||
int flags = MAP_SHARED;
|
||||
// prefetch/readahead impairs performance on NUMA systems
|
||||
if (numa) { prefetch = 0; }
|
||||
#ifdef __linux__
|
||||
if (prefetch) { flags |= MAP_POPULATE; }
|
||||
#endif
|
||||
addr = mmap(NULL, file->size, PROT_READ | PROT_WRITE, flags, fd, 0);
|
||||
addr = mmap(NULL, file->size, PROT_READ, flags, fd, 0);
|
||||
if (addr == MAP_FAILED) {
|
||||
throw std::runtime_error(format("mmap failed: %s", strerror(errno)));
|
||||
}
|
||||
@@ -249,7 +249,7 @@ struct llama_mmap {
|
||||
throw std::runtime_error(format("CreateFileMappingA failed: %s", llama_format_win_err(error).c_str()));
|
||||
}
|
||||
|
||||
addr = MapViewOfFile(hMapping, FILE_MAP_COPY, 0, 0, 0);
|
||||
addr = MapViewOfFile(hMapping, FILE_MAP_READ, 0, 0, 0);
|
||||
error = GetLastError();
|
||||
CloseHandle(hMapping);
|
||||
|
||||
|
175
llama/llama.cpp
@@ -1,5 +1,5 @@
|
||||
/**
|
||||
* llama.cpp - git 5bf2a2771886ee86137e01dbc7492f78fb392066
|
||||
* llama.cpp - git e782c9e735f93ab4767ffc37462c523b73a17ddc
|
||||
*
|
||||
* MIT License
|
||||
*
|
||||
@@ -127,14 +127,15 @@ static void ggml_graph_compute_helper(std::vector<uint8_t> & buf, ggml_cgraph *
|
||||
// memory sizes
|
||||
//
|
||||
|
||||
static const std::map<e_model, size_t> & MEM_REQ_SCRATCH0()
|
||||
static const std::map<e_model, size_t> & MEM_REQ_SCRATCH0(int n_ctx)
|
||||
{
|
||||
static std::map<e_model, size_t> k_sizes = {
|
||||
{ MODEL_3B, 256ull * MB },
|
||||
{ MODEL_7B, 512ull * MB },
|
||||
{ MODEL_13B, 512ull * MB },
|
||||
{ MODEL_30B, 512ull * MB },
|
||||
{ MODEL_65B, 1024ull * MB },
|
||||
/* empirical scaling, still a guess */
|
||||
{ MODEL_3B, ((size_t) n_ctx / 16ull + 128ull) * MB },
|
||||
{ MODEL_7B, ((size_t) n_ctx / 16ull + 256ull) * MB },
|
||||
{ MODEL_13B, ((size_t) n_ctx / 12ull + 256ull) * MB },
|
||||
{ MODEL_30B, ((size_t) n_ctx / 10ull + 256ull) * MB },
|
||||
{ MODEL_65B, ((size_t) n_ctx / 8ull + 512ull) * MB },
|
||||
};
|
||||
return k_sizes;
|
||||
}
|
||||
@@ -166,14 +167,14 @@ static const std::map<e_model, size_t> & MEM_REQ_KV_SELF()
|
||||
|
||||
// this is mostly needed for temporary mul_mat buffers to dequantize the data
|
||||
// not actually needed if BLAS is disabled
|
||||
static const std::map<e_model, size_t> & MEM_REQ_EVAL()
|
||||
static const std::map<e_model, size_t> & MEM_REQ_EVAL(int n_ctx)
|
||||
{
|
||||
static std::map<e_model, size_t> k_sizes = {
|
||||
{ MODEL_3B, 512ull * MB },
|
||||
{ MODEL_7B, 768ull * MB },
|
||||
{ MODEL_13B, 1024ull * MB },
|
||||
{ MODEL_30B, 1280ull * MB },
|
||||
{ MODEL_65B, 1536ull * MB },
|
||||
{ MODEL_3B, ((size_t) n_ctx / 256ull + 512ull) * MB },
|
||||
{ MODEL_7B, ((size_t) n_ctx / 256ull + 768ull) * MB },
|
||||
{ MODEL_13B, ((size_t) n_ctx / 256ull + 1024ull) * MB },
|
||||
{ MODEL_30B, ((size_t) n_ctx / 256ull + 1280ull) * MB },
|
||||
{ MODEL_65B, ((size_t) n_ctx / 256ull + 1536ull) * MB },
|
||||
};
|
||||
return k_sizes;
|
||||
}
|
||||
@@ -215,6 +216,10 @@ struct llama_hparams {
|
||||
uint32_t n_head = 32;
|
||||
uint32_t n_layer = 32;
|
||||
uint32_t n_rot = 64;
|
||||
|
||||
float rope_freq_base = 10000.0f;
|
||||
float rope_freq_scale = 1.0f;
|
||||
|
||||
enum llama_ftype ftype = LLAMA_FTYPE_MOSTLY_F16;
|
||||
|
||||
bool operator!=(const llama_hparams & other) const {
|
||||
@@ -329,7 +334,7 @@ struct llama_model {
|
||||
};
|
||||
|
||||
struct llama_context {
|
||||
llama_context(const llama_model & model, const llama_vocab & vocab) : model(model), vocab(vocab), t_load_us(model.t_load_us), t_start_us(model.t_start_us) {}
|
||||
llama_context(const llama_model & model) : model(model), t_load_us(model.t_load_us), t_start_us(model.t_start_us) {}
|
||||
#ifdef GGML_USE_METAL
|
||||
~llama_context() {
|
||||
if (ctx_metal) {
|
||||
@@ -350,7 +355,6 @@ struct llama_context {
|
||||
int32_t n_p_eval = 0; // number of tokens in eval calls for the prompt (with batch size > 1)
|
||||
|
||||
const llama_model & model;
|
||||
const llama_vocab & vocab;
|
||||
|
||||
bool model_owner = false;
|
||||
|
||||
@@ -577,7 +581,9 @@ struct llama_file_loader {
|
||||
}
|
||||
|
||||
// skip to the next multiple of 32 bytes
|
||||
file.seek(-static_cast<ptrdiff_t>(file.tell()) & 31, SEEK_CUR);
|
||||
if (file_version >= LLAMA_FILE_VERSION_GGJT_V1) {
|
||||
file.seek(-static_cast<ptrdiff_t>(file.tell()) & 31, SEEK_CUR);
|
||||
}
|
||||
|
||||
tensor.file_off = file.tell();
|
||||
tensor.name = name;
|
||||
@@ -674,7 +680,7 @@ struct llama_model_loader {
|
||||
*ctx_size_p = *mmapped_size_p = 0;
|
||||
for (const llama_load_tensor & lt : tensors_map.tensors) {
|
||||
*ctx_size_p += sizeof(struct ggml_tensor) + GGML_OBJECT_SIZE;
|
||||
*(use_mmap ? mmapped_size_p : ctx_size_p) += lt.size;
|
||||
*(use_mmap ? mmapped_size_p : ctx_size_p) += lt.size + 16;
|
||||
}
|
||||
}
|
||||
|
||||
@@ -870,6 +876,8 @@ struct llama_context_params llama_context_default_params() {
|
||||
/*.gpu_layers =*/ 0,
|
||||
/*.main_gpu =*/ 0,
|
||||
/*.tensor_split =*/ {0},
|
||||
/*.rope_freq_base =*/ 10000.0f,
|
||||
/*.rope_freq_scale =*/ 1.0f,
|
||||
/*.progress_callback =*/ nullptr,
|
||||
/*.progress_callback_user_data =*/ nullptr,
|
||||
/*.low_vram =*/ false,
|
||||
@@ -895,6 +903,10 @@ struct llama_model_quantize_params llama_model_quantize_default_params() {
|
||||
return result;
|
||||
}
|
||||
|
||||
int llama_max_devices() {
|
||||
return LLAMA_MAX_DEVICES;
|
||||
}
|
||||
|
||||
bool llama_mmap_supported() {
|
||||
return llama_mmap::SUPPORTED;
|
||||
}
|
||||
@@ -993,6 +1005,8 @@ static void llama_model_load_internal(
|
||||
int n_gpu_layers,
|
||||
int main_gpu,
|
||||
const float * tensor_split,
|
||||
float rope_freq_base,
|
||||
float rope_freq_scale,
|
||||
bool low_vram,
|
||||
ggml_type memory_type,
|
||||
bool use_mmap,
|
||||
@@ -1027,22 +1041,27 @@ static void llama_model_load_internal(
|
||||
}
|
||||
|
||||
hparams.n_ctx = n_ctx;
|
||||
|
||||
hparams.rope_freq_base = rope_freq_base;
|
||||
hparams.rope_freq_scale = rope_freq_scale;
|
||||
}
|
||||
|
||||
const uint32_t n_ff = ((2*(4*hparams.n_embd)/3 + hparams.n_mult - 1)/hparams.n_mult)*hparams.n_mult;
|
||||
|
||||
{
|
||||
fprintf(stderr, "%s: format = %s\n", __func__, llama_file_version_name(file_version));
|
||||
fprintf(stderr, "%s: n_vocab = %u\n", __func__, hparams.n_vocab);
|
||||
fprintf(stderr, "%s: n_ctx = %u\n", __func__, hparams.n_ctx);
|
||||
fprintf(stderr, "%s: n_embd = %u\n", __func__, hparams.n_embd);
|
||||
fprintf(stderr, "%s: n_mult = %u\n", __func__, hparams.n_mult);
|
||||
fprintf(stderr, "%s: n_head = %u\n", __func__, hparams.n_head);
|
||||
fprintf(stderr, "%s: n_layer = %u\n", __func__, hparams.n_layer);
|
||||
fprintf(stderr, "%s: n_rot = %u\n", __func__, hparams.n_rot);
|
||||
fprintf(stderr, "%s: format = %s\n", __func__, llama_file_version_name(file_version));
|
||||
fprintf(stderr, "%s: n_vocab = %u\n", __func__, hparams.n_vocab);
|
||||
fprintf(stderr, "%s: n_ctx = %u\n", __func__, hparams.n_ctx);
|
||||
fprintf(stderr, "%s: n_embd = %u\n", __func__, hparams.n_embd);
|
||||
fprintf(stderr, "%s: n_mult = %u\n", __func__, hparams.n_mult);
|
||||
fprintf(stderr, "%s: n_head = %u\n", __func__, hparams.n_head);
|
||||
fprintf(stderr, "%s: n_layer = %u\n", __func__, hparams.n_layer);
|
||||
fprintf(stderr, "%s: n_rot = %u\n", __func__, hparams.n_rot);
|
||||
fprintf(stderr, "%s: freq_base = %.1f\n", __func__, hparams.rope_freq_base);
|
||||
fprintf(stderr, "%s: freq_scale = %g\n", __func__, hparams.rope_freq_scale);
|
||||
fprintf(stderr, "%s: ftype = %u (%s)\n", __func__, hparams.ftype, llama_ftype_name(hparams.ftype));
|
||||
fprintf(stderr, "%s: n_ff = %u\n", __func__, n_ff);
|
||||
fprintf(stderr, "%s: model size = %s\n", __func__, llama_model_type_name(model.type));
|
||||
fprintf(stderr, "%s: n_ff = %u\n", __func__, n_ff);
|
||||
fprintf(stderr, "%s: model size = %s\n", __func__, llama_model_type_name(model.type));
|
||||
}
|
||||
|
||||
if (file_version < LLAMA_FILE_VERSION_GGJT_V2) {
|
||||
@@ -1191,9 +1210,9 @@ static void llama_model_load_internal(
|
||||
const size_t mem_required =
|
||||
ctx_size +
|
||||
mmapped_size - vram_weights + // weights in VRAM not in memory
|
||||
MEM_REQ_SCRATCH0().at(model.type) +
|
||||
MEM_REQ_SCRATCH0(hparams.n_ctx).at(model.type) +
|
||||
MEM_REQ_SCRATCH1().at(model.type) +
|
||||
MEM_REQ_EVAL().at (model.type);
|
||||
MEM_REQ_EVAL(hparams.n_ctx).at(model.type);
|
||||
|
||||
// this is the memory required by one llama_state
|
||||
const size_t mem_required_state =
|
||||
@@ -1297,6 +1316,8 @@ static bool llama_model_load(
|
||||
int n_gpu_layers,
|
||||
int main_gpu,
|
||||
float * tensor_split,
|
||||
float rope_freq_base,
|
||||
float rope_freq_scale,
|
||||
bool low_vram,
|
||||
ggml_type memory_type,
|
||||
bool use_mmap,
|
||||
@@ -1305,7 +1326,7 @@ static bool llama_model_load(
|
||||
llama_progress_callback progress_callback,
|
||||
void *progress_callback_user_data) {
|
||||
try {
|
||||
llama_model_load_internal(fname, model, vocab, n_ctx, n_batch, n_gpu_layers, main_gpu, tensor_split, low_vram, memory_type,
|
||||
llama_model_load_internal(fname, model, vocab, n_ctx, n_batch, n_gpu_layers, main_gpu, tensor_split, rope_freq_base, rope_freq_scale, low_vram, memory_type,
|
||||
use_mmap, use_mlock, vocab_only, progress_callback, progress_callback_user_data);
|
||||
return true;
|
||||
} catch (const std::exception & err) {
|
||||
@@ -1357,6 +1378,9 @@ static bool llama_eval_internal(
|
||||
const int n_rot = hparams.n_embd/hparams.n_head;
|
||||
const int n_gpu_layers = model.n_gpu_layers;
|
||||
|
||||
const float freq_base = hparams.rope_freq_base;
|
||||
const float freq_scale = hparams.rope_freq_scale;
|
||||
|
||||
auto & mem_per_token = lctx.mem_per_token;
|
||||
auto & buf_compute = lctx.buf_compute;
|
||||
|
||||
@@ -1454,11 +1478,11 @@ static bool llama_eval_internal(
|
||||
offload_func_kq(tmpq);
|
||||
ggml_set_name(tmpq, "tmpq");
|
||||
|
||||
struct ggml_tensor * Kcur = ggml_rope_inplace(ctx0, ggml_reshape_3d(ctx0, tmpk, n_embd/n_head, n_head, N), n_past, n_rot, 0, 0);
|
||||
struct ggml_tensor * Kcur = ggml_rope_custom_inplace(ctx0, ggml_reshape_3d(ctx0, tmpk, n_embd/n_head, n_head, N), n_past, n_rot, 0, freq_base, freq_scale, 0);
|
||||
offload_func_kq(Kcur);
|
||||
ggml_set_name(Kcur, "Kcur");
|
||||
|
||||
struct ggml_tensor * Qcur = ggml_rope_inplace(ctx0, ggml_reshape_3d(ctx0, tmpq, n_embd/n_head, n_head, N), n_past, n_rot, 0, 0);
|
||||
struct ggml_tensor * Qcur = ggml_rope_custom_inplace(ctx0, ggml_reshape_3d(ctx0, tmpq, n_embd/n_head, n_head, N), n_past, n_rot, 0, freq_base, freq_scale, 0);
|
||||
offload_func_kq(Qcur);
|
||||
ggml_set_name(Qcur, "Qcur");
|
||||
|
||||
@@ -2032,9 +2056,18 @@ void llama_sample_tail_free(struct llama_context * ctx, llama_token_data_array *
|
||||
}
|
||||
|
||||
// Normalize the second derivatives
|
||||
float second_derivatives_sum = std::accumulate(second_derivatives.begin(), second_derivatives.end(), 0.0f);
|
||||
for (float & value : second_derivatives) {
|
||||
value /= second_derivatives_sum;
|
||||
{
|
||||
const float second_derivatives_sum = std::accumulate(second_derivatives.begin(), second_derivatives.end(), 0.0f);
|
||||
|
||||
if (second_derivatives_sum > 1e-6f) {
|
||||
for (float & value : second_derivatives) {
|
||||
value /= second_derivatives_sum;
|
||||
}
|
||||
} else {
|
||||
for (float & value : second_derivatives) {
|
||||
value = 1.0f / second_derivatives.size();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
float cum_sum = 0.0f;
|
||||
@@ -2213,7 +2246,7 @@ void llama_sample_classifier_free_guidance(
|
||||
struct llama_context * guidance_ctx,
|
||||
float scale,
|
||||
float smooth_factor) {
|
||||
int64_t t_start_sample_us = t_start_sample_us = ggml_time_us();
|
||||
int64_t t_start_sample_us = ggml_time_us();
|
||||
|
||||
assert(ctx);
|
||||
auto n_vocab = llama_n_vocab(ctx);
|
||||
@@ -2701,8 +2734,9 @@ struct llama_model * llama_load_model_from_file(
|
||||
ggml_type memory_type = params.f16_kv ? GGML_TYPE_F16 : GGML_TYPE_F32;
|
||||
|
||||
if (!llama_model_load(path_model, *model, model->vocab, params.n_ctx, params.n_batch, params.n_gpu_layers,
|
||||
params.main_gpu, params.tensor_split, params.low_vram, memory_type, params.use_mmap, params.use_mlock,
|
||||
params.vocab_only, params.progress_callback, params.progress_callback_user_data)) {
|
||||
params.main_gpu, params.tensor_split, params.rope_freq_base, params.rope_freq_scale,params.low_vram,
|
||||
memory_type, params.use_mmap, params.use_mlock, params.vocab_only, params.progress_callback,
|
||||
params.progress_callback_user_data)) {
|
||||
delete model;
|
||||
fprintf(stderr, "%s: failed to load model\n", __func__);
|
||||
return nullptr;
|
||||
@@ -2723,7 +2757,7 @@ struct llama_context * llama_new_context_with_model(
|
||||
return nullptr;
|
||||
}
|
||||
|
||||
llama_context * ctx = new llama_context(*model, model->vocab);
|
||||
llama_context * ctx = new llama_context(*model);
|
||||
|
||||
if (params.seed == LLAMA_DEFAULT_SEED) {
|
||||
params.seed = time(NULL);
|
||||
@@ -2777,9 +2811,9 @@ struct llama_context * llama_new_context_with_model(
|
||||
ctx->embedding.resize(hparams.n_embd);
|
||||
}
|
||||
|
||||
ctx->buf_compute.resize(MEM_REQ_EVAL().at(ctx->model.type));
|
||||
ctx->buf_compute.resize(MEM_REQ_EVAL(hparams.n_ctx).at(ctx->model.type));
|
||||
|
||||
ctx->buf_scratch[0].resize(MEM_REQ_SCRATCH0().at(ctx->model.type));
|
||||
ctx->buf_scratch[0].resize(MEM_REQ_SCRATCH0(hparams.n_ctx).at(ctx->model.type));
|
||||
ctx->buf_scratch[1].resize(MEM_REQ_SCRATCH1().at(ctx->model.type));
|
||||
}
|
||||
|
||||
@@ -3561,13 +3595,13 @@ int llama_eval_export(struct llama_context * ctx, const char * fname) {
|
||||
return 0;
|
||||
}
|
||||
|
||||
int llama_tokenize(
|
||||
struct llama_context * ctx,
|
||||
int llama_tokenize_with_model(
|
||||
const struct llama_model * model,
|
||||
const char * text,
|
||||
llama_token * tokens,
|
||||
int n_max_tokens,
|
||||
bool add_bos) {
|
||||
auto res = llama_tokenize(ctx->vocab, text, add_bos);
|
||||
auto res = llama_tokenize(model->vocab, text, add_bos);
|
||||
|
||||
if (n_max_tokens < (int) res.size()) {
|
||||
fprintf(stderr, "%s: too many tokens\n", __func__);
|
||||
@@ -3581,8 +3615,29 @@ int llama_tokenize(
|
||||
return res.size();
|
||||
}
|
||||
|
||||
int llama_tokenize(
|
||||
struct llama_context * ctx,
|
||||
const char * text,
|
||||
llama_token * tokens,
|
||||
int n_max_tokens,
|
||||
bool add_bos) {
|
||||
return llama_tokenize_with_model(&ctx->model, text, tokens, n_max_tokens, add_bos);
|
||||
}
|
||||
|
||||
int llama_n_vocab_from_model(const struct llama_model * model) {
|
||||
return model->vocab.id_to_token.size();
|
||||
}
|
||||
|
||||
int llama_n_ctx_from_model(const struct llama_model * model) {
|
||||
return model->hparams.n_ctx;
|
||||
}
|
||||
|
||||
int llama_n_embd_from_model(const struct llama_model * model) {
|
||||
return model->hparams.n_embd;
|
||||
}
|
||||
|
||||
int llama_n_vocab(const struct llama_context * ctx) {
|
||||
return ctx->vocab.id_to_token.size();
|
||||
return ctx->model.vocab.id_to_token.size();
|
||||
}
|
||||
|
||||
int llama_n_ctx(const struct llama_context * ctx) {
|
||||
@@ -3593,17 +3648,25 @@ int llama_n_embd(const struct llama_context * ctx) {
|
||||
return ctx->model.hparams.n_embd;
|
||||
}
|
||||
|
||||
int llama_get_vocab_from_model(
|
||||
const struct llama_model * model,
|
||||
const char * * strings,
|
||||
float * scores,
|
||||
int capacity) {
|
||||
int n = std::min(capacity, (int) model->vocab.id_to_token.size());
|
||||
for (int i = 0; i<n; ++i) {
|
||||
strings[i] = model->vocab.id_to_token[i].tok.c_str();
|
||||
scores[i] = model->vocab.id_to_token[i].score;
|
||||
}
|
||||
return n;
|
||||
}
|
||||
|
||||
int llama_get_vocab(
|
||||
const struct llama_context * ctx,
|
||||
const char * * strings,
|
||||
float * scores,
|
||||
int capacity) {
|
||||
int n = std::min(capacity, (int) ctx->vocab.id_to_token.size());
|
||||
for (int i = 0; i<n; ++i) {
|
||||
strings[i] = ctx->vocab.id_to_token[i].tok.c_str();
|
||||
scores[i] = ctx->vocab.id_to_token[i].score;
|
||||
}
|
||||
return n;
|
||||
return llama_get_vocab_from_model(&ctx->model, strings, scores, capacity);
|
||||
}
|
||||
|
||||
float * llama_get_logits(struct llama_context * ctx) {
|
||||
@@ -3614,12 +3677,16 @@ float * llama_get_embeddings(struct llama_context * ctx) {
|
||||
return ctx->embedding.data();
|
||||
}
|
||||
|
||||
const char * llama_token_to_str(const struct llama_context * ctx, llama_token token) {
|
||||
if (token >= llama_n_vocab(ctx)) {
|
||||
const char * llama_token_to_str_with_model(const struct llama_model * model, llama_token token) {
|
||||
if (token >= llama_n_vocab_from_model(model)) {
|
||||
return nullptr;
|
||||
}
|
||||
|
||||
return ctx->vocab.id_to_token[token].tok.c_str();
|
||||
return model->vocab.id_to_token[token].tok.c_str();
|
||||
}
|
||||
|
||||
const char * llama_token_to_str(const struct llama_context * ctx, llama_token token) {
|
||||
return llama_token_to_str_with_model(&ctx->model, token);
|
||||
}
|
||||
|
||||
llama_token llama_token_bos() {
|
||||
|
@@ -78,10 +78,14 @@ llama_token llama_sample(
|
||||
*/
|
||||
import "C"
|
||||
import (
|
||||
"bytes"
|
||||
"errors"
|
||||
"fmt"
|
||||
"io"
|
||||
"os"
|
||||
"strings"
|
||||
"time"
|
||||
"unicode/utf8"
|
||||
"unsafe"
|
||||
|
||||
"github.com/jmorganca/ollama/api"
|
||||
@@ -147,9 +151,14 @@ func (llm *llama) Close() {
|
||||
C.llama_print_timings(llm.ctx)
|
||||
}
|
||||
|
||||
func (llm *llama) Predict(prompt string, fn func(string)) error {
|
||||
if tokens := llm.tokenize(prompt); tokens != nil {
|
||||
return llm.generate(tokens, fn)
|
||||
func (llm *llama) Predict(ctx []int, prompt string, fn func(api.GenerateResponse)) error {
|
||||
if input := llm.tokenize(prompt); input != nil {
|
||||
embd := make([]C.llama_token, len(ctx))
|
||||
for i := range ctx {
|
||||
embd[i] = C.llama_token(ctx[i])
|
||||
}
|
||||
|
||||
return llm.generate(append(embd, input...), fn)
|
||||
}
|
||||
|
||||
return errors.New("llama: tokenize")
|
||||
@@ -176,7 +185,7 @@ func (llm *llama) detokenize(tokens ...C.llama_token) string {
|
||||
return sb.String()
|
||||
}
|
||||
|
||||
func (llm *llama) generate(tokens []C.llama_token, fn func(string)) error {
|
||||
func (llm *llama) generate(input []C.llama_token, fn func(api.GenerateResponse)) error {
|
||||
var opts C.struct_llama_sample_options
|
||||
opts.repeat_penalty = C.float(llm.RepeatPenalty)
|
||||
opts.frequency_penalty = C.float(llm.FrequencyPenalty)
|
||||
@@ -190,38 +199,70 @@ func (llm *llama) generate(tokens []C.llama_token, fn func(string)) error {
|
||||
opts.mirostat_tau = C.float(llm.MirostatTau)
|
||||
opts.mirostat_eta = C.float(llm.MirostatEta)
|
||||
|
||||
pastTokens := deque[C.llama_token]{capacity: llm.RepeatLastN}
|
||||
output := deque[C.llama_token]{capacity: llm.NumCtx}
|
||||
|
||||
context := deque[int]{capacity: llm.NumCtx / 2}
|
||||
for _, in := range input {
|
||||
context.PushLeft(int(in))
|
||||
}
|
||||
|
||||
var b bytes.Buffer
|
||||
for C.llama_get_kv_cache_token_count(llm.ctx) < C.int(llm.NumCtx) {
|
||||
if retval := C.llama_eval(llm.ctx, unsafe.SliceData(tokens), C.int(len(tokens)), C.llama_get_kv_cache_token_count(llm.ctx), C.int(llm.NumThread)); retval != 0 {
|
||||
if retval := C.llama_eval(llm.ctx, unsafe.SliceData(input), C.int(len(input)), C.llama_get_kv_cache_token_count(llm.ctx), C.int(llm.NumThread)); retval != 0 {
|
||||
return errors.New("llama: eval")
|
||||
}
|
||||
|
||||
token, err := llm.sample(pastTokens, &opts)
|
||||
switch {
|
||||
case errors.Is(err, io.EOF):
|
||||
return nil
|
||||
case err != nil:
|
||||
token, err := llm.sample(output, &opts)
|
||||
if errors.Is(err, io.EOF) {
|
||||
break
|
||||
} else if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
fn(llm.detokenize(token))
|
||||
b.WriteString(llm.detokenize(token))
|
||||
if utf8.Valid(b.Bytes()) || b.Len() >= utf8.UTFMax {
|
||||
// call the callback
|
||||
fn(api.GenerateResponse{
|
||||
Response: b.String(),
|
||||
})
|
||||
|
||||
tokens = []C.llama_token{token}
|
||||
output.PushLeft(token)
|
||||
context.PushLeft(int(token))
|
||||
b.Reset()
|
||||
}
|
||||
|
||||
pastTokens.PushLeft(token)
|
||||
input = []C.llama_token{token}
|
||||
}
|
||||
|
||||
dur := func(ms float64) time.Duration {
|
||||
d, err := time.ParseDuration(fmt.Sprintf("%fms", ms))
|
||||
if err != nil {
|
||||
panic(err)
|
||||
}
|
||||
|
||||
return d
|
||||
}
|
||||
|
||||
timings := C.llama_get_timings(llm.ctx)
|
||||
fn(api.GenerateResponse{
|
||||
Done: true,
|
||||
Context: context.Data(),
|
||||
PromptEvalCount: int(timings.n_p_eval),
|
||||
PromptEvalDuration: dur(float64(timings.t_p_eval_ms)),
|
||||
EvalCount: int(timings.n_eval),
|
||||
EvalDuration: dur(float64(timings.t_eval_ms)),
|
||||
})
|
||||
|
||||
return nil
|
||||
}
|
||||
|
||||
func (llm *llama) sample(pastTokens deque[C.llama_token], opts *C.struct_llama_sample_options) (C.llama_token, error) {
|
||||
func (llm *llama) sample(output deque[C.llama_token], opts *C.struct_llama_sample_options) (C.llama_token, error) {
|
||||
numVocab := int(C.llama_n_vocab(llm.ctx))
|
||||
logits := unsafe.Slice(C.llama_get_logits(llm.ctx), numVocab)
|
||||
|
||||
candidates := make([]C.struct_llama_token_data, 0, numVocab)
|
||||
for i := 0; i < numVocab; i++ {
|
||||
candidates = append(candidates, C.llama_token_data{
|
||||
candidates := deque[C.struct_llama_token_data]{capacity: numVocab}
|
||||
for i := 0; i < candidates.Cap(); i++ {
|
||||
candidates.PushLeft(C.struct_llama_token_data{
|
||||
id: C.int(i),
|
||||
logit: logits[i],
|
||||
p: 0,
|
||||
@@ -230,8 +271,8 @@ func (llm *llama) sample(pastTokens deque[C.llama_token], opts *C.struct_llama_s
|
||||
|
||||
token := C.llama_sample(
|
||||
llm.ctx,
|
||||
unsafe.SliceData(candidates), C.ulong(len(candidates)),
|
||||
unsafe.SliceData(pastTokens.Data()), C.ulong(pastTokens.Len()),
|
||||
unsafe.SliceData(candidates.Data()), C.size_t(candidates.Len()),
|
||||
unsafe.SliceData(output.Data()), C.size_t(output.Len()),
|
||||
opts)
|
||||
if token != C.llama_token_eos() {
|
||||
return token, nil
|
||||
|
@@ -1,5 +1,5 @@
|
||||
/**
|
||||
* llama.cpp - git 5bf2a2771886ee86137e01dbc7492f78fb392066
|
||||
* llama.cpp - git e782c9e735f93ab4767ffc37462c523b73a17ddc
|
||||
*
|
||||
* MIT License
|
||||
*
|
||||
@@ -115,6 +115,11 @@ extern "C" {
|
||||
int32_t n_gpu_layers; // number of layers to store in VRAM
|
||||
int32_t main_gpu; // the GPU that is used for scratch and small tensors
|
||||
float tensor_split[LLAMA_MAX_DEVICES]; // how to split layers across multiple GPUs
|
||||
|
||||
// ref: https://github.com/ggerganov/llama.cpp/pull/2054
|
||||
float rope_freq_base; // RoPE base frequency
|
||||
float rope_freq_scale; // RoPE frequency scaling factor
|
||||
|
||||
// called with a progress value between 0 and 1, pass NULL to disable
|
||||
llama_progress_callback progress_callback;
|
||||
// context pointer passed to the progress callback
|
||||
@@ -174,6 +179,8 @@ extern "C" {
|
||||
int32_t n_eval;
|
||||
};
|
||||
|
||||
LLAMA_API int llama_max_devices();
|
||||
|
||||
LLAMA_API struct llama_context_params llama_context_default_params();
|
||||
LLAMA_API struct llama_model_quantize_params llama_model_quantize_default_params();
|
||||
|
||||
@@ -296,10 +303,21 @@ extern "C" {
|
||||
int n_max_tokens,
|
||||
bool add_bos);
|
||||
|
||||
LLAMA_API int llama_tokenize_with_model(
|
||||
const struct llama_model * model,
|
||||
const char * text,
|
||||
llama_token * tokens,
|
||||
int n_max_tokens,
|
||||
bool add_bos);
|
||||
|
||||
LLAMA_API int llama_n_vocab(const struct llama_context * ctx);
|
||||
LLAMA_API int llama_n_ctx (const struct llama_context * ctx);
|
||||
LLAMA_API int llama_n_embd (const struct llama_context * ctx);
|
||||
|
||||
LLAMA_API int llama_n_vocab_from_model(const struct llama_model * model);
|
||||
LLAMA_API int llama_n_ctx_from_model (const struct llama_model * model);
|
||||
LLAMA_API int llama_n_embd_from_model (const struct llama_model * model);
|
||||
|
||||
// Get the vocabulary as output parameters.
|
||||
// Returns number of results.
|
||||
LLAMA_API int llama_get_vocab(
|
||||
@@ -308,6 +326,12 @@ extern "C" {
|
||||
float * scores,
|
||||
int capacity);
|
||||
|
||||
LLAMA_API int llama_get_vocab_from_model(
|
||||
const struct llama_model * model,
|
||||
const char * * strings,
|
||||
float * scores,
|
||||
int capacity);
|
||||
|
||||
// Token logits obtained from the last call to llama_eval()
|
||||
// The logits for the last token are stored in the last row
|
||||
// Can be mutated in order to change the probabilities of the next token
|
||||
@@ -320,7 +344,13 @@ extern "C" {
|
||||
LLAMA_API float * llama_get_embeddings(struct llama_context * ctx);
|
||||
|
||||
// Token Id -> String. Uses the vocabulary in the provided context
|
||||
LLAMA_API const char * llama_token_to_str(const struct llama_context * ctx, llama_token token);
|
||||
LLAMA_API const char * llama_token_to_str(
|
||||
const struct llama_context * ctx,
|
||||
llama_token token);
|
||||
|
||||
LLAMA_API const char * llama_token_to_str_with_model(
|
||||
const struct llama_model * model,
|
||||
llama_token token);
|
||||
|
||||
// Special tokens
|
||||
LLAMA_API llama_token llama_token_bos(); // beginning-of-sentence
|
||||
|
70
llama/update-llama-cpp.sh
Normal file
@@ -0,0 +1,70 @@
|
||||
#!/bin/sh
|
||||
|
||||
set -eu
|
||||
|
||||
|
||||
status() { echo >&2 ">>> $*"; }
|
||||
error() { status "ERROR $*"; }
|
||||
usage() {
|
||||
echo "usage: $(basename $0) /path/to/repo"
|
||||
exit 1
|
||||
}
|
||||
|
||||
OUT=$(dirname $0)
|
||||
while getopts "hC:" OPTION; do
|
||||
case $OPTION in
|
||||
C) OUT=$OPTARG ;;
|
||||
*) usage ;;
|
||||
esac
|
||||
done
|
||||
|
||||
shift $(( $OPTIND - 1 ))
|
||||
[ $# -eq 1 ] || usage
|
||||
|
||||
status "updating source..."
|
||||
cp -a "$1"/*.{c,h,cpp,m,metal,cu} "$OUT"
|
||||
|
||||
status "removing incompatible files..."
|
||||
rm -f "$OUT"/build-info.h
|
||||
|
||||
SHA1=$(git -C $1 rev-parse @)
|
||||
|
||||
LICENSE=$(mktemp)
|
||||
cleanup() {
|
||||
rm -f $LICENSE
|
||||
}
|
||||
trap cleanup 0
|
||||
|
||||
cat <<EOF | sed 's/ *$//' >$LICENSE
|
||||
/**
|
||||
* llama.cpp - git $SHA1
|
||||
*
|
||||
$(sed 's/^/ * /' <$1/LICENSE)
|
||||
*/
|
||||
|
||||
EOF
|
||||
|
||||
for IN in $OUT/*.{c,h,cpp,m,metal,cu}; do
|
||||
TMP=$(mktemp)
|
||||
status "updating license $IN"
|
||||
cat $LICENSE $IN >$TMP
|
||||
mv $TMP $IN
|
||||
done
|
||||
|
||||
touchup() {
|
||||
local CONSTRAINT=$1 && shift
|
||||
|
||||
for IN in $*; do
|
||||
status "touching up $IN..."
|
||||
TMP=$(mktemp)
|
||||
{
|
||||
echo "//go:build $CONSTRAINT"
|
||||
echo
|
||||
} | cat - $IN >$TMP
|
||||
mv $TMP $IN
|
||||
done
|
||||
}
|
||||
|
||||
touchup darwin $OUT/ggml-metal.*
|
||||
touchup mpi $OUT/ggml-mpi.*
|
||||
touchup opencl $OUT/ggml-opencl.*
|
4
main.go
@@ -1,9 +1,11 @@
|
||||
package main
|
||||
|
||||
import (
|
||||
"context"
|
||||
|
||||
"github.com/jmorganca/ollama/cmd"
|
||||
)
|
||||
|
||||
func main() {
|
||||
cmd.NewCLI().Execute()
|
||||
cmd.NewCLI().ExecuteContext(context.Background())
|
||||
}
|
||||
|
38
models.json
@@ -1,38 +0,0 @@
|
||||
[
|
||||
{
|
||||
"name": "orca",
|
||||
"display_name": "Orca Mini",
|
||||
"parameters": "3B",
|
||||
"url": "https://huggingface.co/TheBloke/orca_mini_3B-GGML/resolve/main/orca-mini-3b.ggmlv3.q4_1.bin",
|
||||
"short_description": "Follow instructions. Great small model that runs fast even without GPU support.",
|
||||
"description": "An OpenLLaMa-3B model trained on explain tuned datasets, created using Instructions and Input from WizardLM, Alpaca & Dolly-V2 datasets and applying Orca Research Paper dataset construction approaches.",
|
||||
"published_by": "TheBloke",
|
||||
"original_author": "psmathur",
|
||||
"original_url": "https://huggingface.co/psmathur/orca_mini_3b",
|
||||
"license": "CC-BY-SA-4.0"
|
||||
},
|
||||
{
|
||||
"name": "nous-hermes",
|
||||
"display_name": "Nous Hermes",
|
||||
"parameters": "13B",
|
||||
"url": "https://huggingface.co/TheBloke/Nous-Hermes-13B-GGML/resolve/main/nous-hermes-13b.ggmlv3.q2_K.bin",
|
||||
"short_description": "Currently one of the best 13B general model.",
|
||||
"description": "It is suitable for a wide range of language tasks, from generating creative text to understanding and following complex instructions. This model was fine-tuned by Nous Research, with Teknium and Karan4D leading the fine tuning process and dataset curation, Redmond AI sponsoring the compute, and several other contributors. The result is an enhanced Llama 13b model that rivals GPT-3.5-turbo in performance across a variety of tasks. \n \n This model stands out for its long responses, low hallucination rate, and absence of OpenAI censorship mechanisms. The fine-tuning process was performed with a 2000 sequence length on an 8x a100 80GB DGX machine for over 50 hours.",
|
||||
"published_by": "TheBloke",
|
||||
"original_author": "NousResearch",
|
||||
"original_url": "https://huggingface.co/NousResearch/Nous-Hermes-13b",
|
||||
"license": "GPL"
|
||||
},
|
||||
{
|
||||
"name": "vicuna",
|
||||
"display_name": "Vicuna",
|
||||
"parameters": "7B",
|
||||
"url": "https://huggingface.co/TheBloke/vicuna-7B-v1.3-GGML/resolve/main/vicuna-7b-v1.3.ggmlv3.q4_0.bin",
|
||||
"short_description": "Vicuna is a chat assistant trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.",
|
||||
"description": "The primary use of Vicuna is research on large language models and chatbots. The primary intended users of the model are researchers and hobbyists in natural language processing, machine learning, and artificial intelligence.",
|
||||
"published_by": "TheBloke",
|
||||
"original_author": "LMSYS",
|
||||
"original_url": "https://huggingface.co/lmsys/vicuna-7b-v1.3",
|
||||
"license:": "Non-commercial"
|
||||
}
|
||||
]
|
82
parser/parser.go
Normal file
@@ -0,0 +1,82 @@
|
||||
package parser
|
||||
|
||||
import (
|
||||
"bufio"
|
||||
"bytes"
|
||||
"errors"
|
||||
"fmt"
|
||||
"io"
|
||||
)
|
||||
|
||||
type Command struct {
|
||||
Name string
|
||||
Args string
|
||||
}
|
||||
|
||||
func (c *Command) Reset() {
|
||||
c.Name = ""
|
||||
c.Args = ""
|
||||
}
|
||||
|
||||
func Parse(reader io.Reader) ([]Command, error) {
|
||||
var commands []Command
|
||||
|
||||
var command, modelCommand Command
|
||||
|
||||
scanner := bufio.NewScanner(reader)
|
||||
scanner.Split(scanModelfile)
|
||||
for scanner.Scan() {
|
||||
line := scanner.Bytes()
|
||||
|
||||
fields := bytes.SplitN(line, []byte(" "), 2)
|
||||
if len(fields) == 0 {
|
||||
continue
|
||||
}
|
||||
|
||||
switch string(bytes.ToUpper(fields[0])) {
|
||||
case "FROM":
|
||||
command.Name = "model"
|
||||
command.Args = string(fields[1])
|
||||
// copy command for validation
|
||||
modelCommand = command
|
||||
case "LICENSE", "TEMPLATE", "SYSTEM", "PROMPT":
|
||||
command.Name = string(bytes.ToLower(fields[0]))
|
||||
command.Args = string(fields[1])
|
||||
case "PARAMETER":
|
||||
fields = bytes.SplitN(fields[1], []byte(" "), 2)
|
||||
command.Name = string(fields[0])
|
||||
command.Args = string(fields[1])
|
||||
default:
|
||||
continue
|
||||
}
|
||||
|
||||
commands = append(commands, command)
|
||||
command.Reset()
|
||||
}
|
||||
|
||||
if modelCommand.Args == "" {
|
||||
return nil, fmt.Errorf("no FROM line for the model was specified")
|
||||
}
|
||||
|
||||
return commands, scanner.Err()
|
||||
}
|
||||
|
||||
func scanModelfile(data []byte, atEOF bool) (advance int, token []byte, err error) {
|
||||
newline := bytes.IndexByte(data, '\n')
|
||||
|
||||
if start := bytes.Index(data, []byte(`"""`)); start >= 0 && start < newline {
|
||||
end := bytes.Index(data[start+3:], []byte(`"""`))
|
||||
if end < 0 {
|
||||
if atEOF {
|
||||
return 0, nil, errors.New(`unterminated multiline string: """`)
|
||||
} else {
|
||||
return 0, nil, nil
|
||||
}
|
||||
}
|
||||
|
||||
n := start + 3 + end + 3
|
||||
return n, bytes.Replace(data[:n], []byte(`"""`), []byte(""), 2), nil
|
||||
}
|
||||
|
||||
return bufio.ScanLines(data, atEOF)
|
||||
}
|
21
progressbar/LICENSE
Normal file
@@ -0,0 +1,21 @@
|
||||
MIT License
|
||||
|
||||
Copyright (c) 2017 Zack
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to deal
|
||||
in the Software without restriction, including without limitation the rights
|
||||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
copies of the Software, and to permit persons to whom the Software is
|
||||
furnished to do so, subject to the following conditions:
|
||||
|
||||
The above copyright notice and this permission notice shall be included in all
|
||||
copies or substantial portions of the Software.
|
||||
|
||||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
SOFTWARE.
|
121
progressbar/README.md
Normal file
@@ -0,0 +1,121 @@
|
||||
# progressbar
|
||||
|
||||
[](https://github.com/schollz/progressbar/actions/workflows/ci.yml)
|
||||
[](https://goreportcard.com/report/github.com/schollz/progressbar)
|
||||
[](https://gocover.io/github.com/schollz/progressbar)
|
||||
[](https://godoc.org/github.com/schollz/progressbar/v3)
|
||||
|
||||
A very simple thread-safe progress bar which should work on every OS without problems. I needed a progressbar for [croc](https://github.com/schollz/croc) and everything I tried had problems, so I made another one. In order to be OS agnostic I do not plan to support [multi-line outputs](https://github.com/schollz/progressbar/issues/6).
|
||||
|
||||
|
||||
## Install
|
||||
|
||||
```
|
||||
go get -u github.com/schollz/progressbar/v3
|
||||
```
|
||||
|
||||
## Usage
|
||||
|
||||
### Basic usage
|
||||
|
||||
```golang
|
||||
bar := progressbar.Default(100)
|
||||
for i := 0; i < 100; i++ {
|
||||
bar.Add(1)
|
||||
time.Sleep(40 * time.Millisecond)
|
||||
}
|
||||
```
|
||||
|
||||
which looks like:
|
||||
|
||||

|
||||
|
||||
|
||||
### I/O operations
|
||||
|
||||
The `progressbar` implements an `io.Writer` so it can automatically detect the number of bytes written to a stream, so you can use it as a progressbar for an `io.Reader`.
|
||||
|
||||
```golang
|
||||
req, _ := http.NewRequest("GET", "https://dl.google.com/go/go1.14.2.src.tar.gz", nil)
|
||||
resp, _ := http.DefaultClient.Do(req)
|
||||
defer resp.Body.Close()
|
||||
|
||||
f, _ := os.OpenFile("go1.14.2.src.tar.gz", os.O_CREATE|os.O_WRONLY, 0644)
|
||||
defer f.Close()
|
||||
|
||||
bar := progressbar.DefaultBytes(
|
||||
resp.ContentLength,
|
||||
"downloading",
|
||||
)
|
||||
io.Copy(io.MultiWriter(f, bar), resp.Body)
|
||||
```
|
||||
|
||||
which looks like:
|
||||
|
||||

|
||||
|
||||
|
||||
### Progress bar with unknown length
|
||||
|
||||
A progressbar with unknown length is a spinner. Any bar with -1 length will automatically convert it to a spinner with a customizable spinner type. For example, the above code can be run and set the `resp.ContentLength` to `-1`.
|
||||
|
||||
which looks like:
|
||||
|
||||

|
||||
|
||||
|
||||
### Customization
|
||||
|
||||
There is a lot of customization that you can do - change the writer, the color, the width, description, theme, etc. See [all the options](https://pkg.go.dev/github.com/schollz/progressbar/v3?tab=doc#Option).
|
||||
|
||||
```golang
|
||||
bar := progressbar.NewOptions(1000,
|
||||
progressbar.OptionSetWriter(ansi.NewAnsiStdout()),
|
||||
progressbar.OptionEnableColorCodes(true),
|
||||
progressbar.OptionShowBytes(true),
|
||||
progressbar.OptionSetWidth(15),
|
||||
progressbar.OptionSetDescription("[cyan][1/3][reset] Writing moshable file..."),
|
||||
progressbar.OptionSetTheme(progressbar.Theme{
|
||||
Saucer: "[green]=[reset]",
|
||||
SaucerHead: "[green]>[reset]",
|
||||
SaucerPadding: " ",
|
||||
BarStart: "[",
|
||||
BarEnd: "]",
|
||||
}))
|
||||
for i := 0; i < 1000; i++ {
|
||||
bar.Add(1)
|
||||
time.Sleep(5 * time.Millisecond)
|
||||
}
|
||||
```
|
||||
|
||||
which looks like:
|
||||
|
||||

|
||||
|
||||
|
||||
## Contributing
|
||||
|
||||
Pull requests are welcome. Feel free to...
|
||||
|
||||
- Revise documentation
|
||||
- Add new features
|
||||
- Fix bugs
|
||||
- Suggest improvements
|
||||
|
||||
## Thanks
|
||||
|
||||
Thanks [@Dynom](https://github.com/dynom) for massive improvements in version 2.0!
|
||||
|
||||
Thanks [@CrushedPixel](https://github.com/CrushedPixel) for adding descriptions and color code support!
|
||||
|
||||
Thanks [@MrMe42](https://github.com/MrMe42) for adding some minor features!
|
||||
|
||||
Thanks [@tehstun](https://github.com/tehstun) for some great PRs!
|
||||
|
||||
Thanks [@Benzammour](https://github.com/Benzammour) and [@haseth](https://github.com/haseth) for helping create v3!
|
||||
|
||||
Thanks [@briandowns](https://github.com/briandowns) for compiling the list of spinners.
|
||||
|
||||
## License
|
||||
|
||||
MIT
|
1098
progressbar/progressbar.go
Normal file
80
progressbar/spinners.go
Normal file
@@ -0,0 +1,80 @@
|
||||
package progressbar
|
||||
|
||||
var spinners = map[int][]string{
|
||||
0: {"←", "↖", "↑", "↗", "→", "↘", "↓", "↙"},
|
||||
1: {"▁", "▃", "▄", "▅", "▆", "▇", "█", "▇", "▆", "▅", "▄", "▃", "▁"},
|
||||
2: {"▖", "▘", "▝", "▗"},
|
||||
3: {"┤", "┘", "┴", "└", "├", "┌", "┬", "┐"},
|
||||
4: {"◢", "◣", "◤", "◥"},
|
||||
5: {"◰", "◳", "◲", "◱"},
|
||||
6: {"◴", "◷", "◶", "◵"},
|
||||
7: {"◐", "◓", "◑", "◒"},
|
||||
8: {".", "o", "O", "@", "*"},
|
||||
9: {"|", "/", "-", "\\"},
|
||||
10: {"◡◡", "⊙⊙", "◠◠"},
|
||||
11: {"⣾", "⣽", "⣻", "⢿", "⡿", "⣟", "⣯", "⣷"},
|
||||
12: {">))'>", " >))'>", " >))'>", " >))'>", " >))'>", " <'((<", " <'((<", " <'((<"},
|
||||
13: {"⠁", "⠂", "⠄", "⡀", "⢀", "⠠", "⠐", "⠈"},
|
||||
14: {"⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧", "⠇", "⠏"},
|
||||
15: {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"},
|
||||
16: {"▉", "▊", "▋", "▌", "▍", "▎", "▏", "▎", "▍", "▌", "▋", "▊", "▉"},
|
||||
17: {"■", "□", "▪", "▫"},
|
||||
18: {"←", "↑", "→", "↓"},
|
||||
19: {"╫", "╪"},
|
||||
20: {"⇐", "⇖", "⇑", "⇗", "⇒", "⇘", "⇓", "⇙"},
|
||||
21: {"⠁", "⠁", "⠉", "⠙", "⠚", "⠒", "⠂", "⠂", "⠒", "⠲", "⠴", "⠤", "⠄", "⠄", "⠤", "⠠", "⠠", "⠤", "⠦", "⠖", "⠒", "⠐", "⠐", "⠒", "⠓", "⠋", "⠉", "⠈", "⠈"},
|
||||
22: {"⠈", "⠉", "⠋", "⠓", "⠒", "⠐", "⠐", "⠒", "⠖", "⠦", "⠤", "⠠", "⠠", "⠤", "⠦", "⠖", "⠒", "⠐", "⠐", "⠒", "⠓", "⠋", "⠉", "⠈"},
|
||||
23: {"⠁", "⠉", "⠙", "⠚", "⠒", "⠂", "⠂", "⠒", "⠲", "⠴", "⠤", "⠄", "⠄", "⠤", "⠴", "⠲", "⠒", "⠂", "⠂", "⠒", "⠚", "⠙", "⠉", "⠁"},
|
||||
24: {"⠋", "⠙", "⠚", "⠒", "⠂", "⠂", "⠒", "⠲", "⠴", "⠦", "⠖", "⠒", "⠐", "⠐", "⠒", "⠓", "⠋"},
|
||||
25: {"ヲ", "ァ", "ィ", "ゥ", "ェ", "ォ", "ャ", "ュ", "ョ", "ッ", "ア", "イ", "ウ", "エ", "オ", "カ", "キ", "ク", "ケ", "コ", "サ", "シ", "ス", "セ", "ソ", "タ", "チ", "ツ", "テ", "ト", "ナ", "ニ", "ヌ", "ネ", "ノ", "ハ", "ヒ", "フ", "ヘ", "ホ", "マ", "ミ", "ム", "メ", "モ", "ヤ", "ユ", "ヨ", "ラ", "リ", "ル", "レ", "ロ", "ワ", "ン"},
|
||||
26: {".", "..", "..."},
|
||||
27: {"▁", "▂", "▃", "▄", "▅", "▆", "▇", "█", "▉", "▊", "▋", "▌", "▍", "▎", "▏", "▏", "▎", "▍", "▌", "▋", "▊", "▉", "█", "▇", "▆", "▅", "▄", "▃", "▂", "▁"},
|
||||
28: {".", "o", "O", "°", "O", "o", "."},
|
||||
29: {"+", "x"},
|
||||
30: {"v", "<", "^", ">"},
|
||||
31: {">>--->", " >>--->", " >>--->", " >>--->", " >>--->", " <---<<", " <---<<", " <---<<", " <---<<", "<---<<"},
|
||||
32: {"|", "||", "|||", "||||", "|||||", "|||||||", "||||||||", "|||||||", "||||||", "|||||", "||||", "|||", "||", "|"},
|
||||
33: {"[ ]", "[= ]", "[== ]", "[=== ]", "[==== ]", "[===== ]", "[====== ]", "[======= ]", "[======== ]", "[========= ]", "[==========]"},
|
||||
34: {"(*---------)", "(-*--------)", "(--*-------)", "(---*------)", "(----*-----)", "(-----*----)", "(------*---)", "(-------*--)", "(--------*-)", "(---------*)"},
|
||||
35: {"█▒▒▒▒▒▒▒▒▒", "███▒▒▒▒▒▒▒", "█████▒▒▒▒▒", "███████▒▒▒", "██████████"},
|
||||
36: {"[ ]", "[=> ]", "[===> ]", "[=====> ]", "[======> ]", "[========> ]", "[==========> ]", "[============> ]", "[==============> ]", "[================> ]", "[==================> ]", "[===================>]"},
|
||||
37: {"ဝ", "၀"},
|
||||
38: {"▌", "▀", "▐▄"},
|
||||
39: {"🌍", "🌎", "🌏"},
|
||||
40: {"◜", "◝", "◞", "◟"},
|
||||
41: {"⬒", "⬔", "⬓", "⬕"},
|
||||
42: {"⬖", "⬘", "⬗", "⬙"},
|
||||
43: {"[>>> >]", "[]>>>> []", "[] >>>> []", "[] >>>> []", "[] >>>> []", "[] >>>>[]", "[>> >>]"},
|
||||
44: {"♠", "♣", "♥", "♦"},
|
||||
45: {"➞", "➟", "➠", "➡", "➠", "➟"},
|
||||
46: {" | ", ` \ `, "_ ", ` \ `, " | ", " / ", " _", " / "},
|
||||
47: {" . . . .", ". . . .", ". . . .", ". . . .", ". . . . ", ". . . . ."},
|
||||
48: {" | ", " / ", " _ ", ` \ `, " | ", ` \ `, " _ ", " / "},
|
||||
49: {"⎺", "⎻", "⎼", "⎽", "⎼", "⎻"},
|
||||
50: {"▹▹▹▹▹", "▸▹▹▹▹", "▹▸▹▹▹", "▹▹▸▹▹", "▹▹▹▸▹", "▹▹▹▹▸"},
|
||||
51: {"[ ]", "[ =]", "[ ==]", "[ ===]", "[====]", "[=== ]", "[== ]", "[= ]"},
|
||||
52: {"( ● )", "( ● )", "( ● )", "( ● )", "( ●)", "( ● )", "( ● )", "( ● )", "( ● )"},
|
||||
53: {"✶", "✸", "✹", "✺", "✹", "✷"},
|
||||
54: {"▐|\\____________▌", "▐_|\\___________▌", "▐__|\\__________▌", "▐___|\\_________▌", "▐____|\\________▌", "▐_____|\\_______▌", "▐______|\\______▌", "▐_______|\\_____▌", "▐________|\\____▌", "▐_________|\\___▌", "▐__________|\\__▌", "▐___________|\\_▌", "▐____________|\\▌", "▐____________/|▌", "▐___________/|_▌", "▐__________/|__▌", "▐_________/|___▌", "▐________/|____▌", "▐_______/|_____▌", "▐______/|______▌", "▐_____/|_______▌", "▐____/|________▌", "▐___/|_________▌", "▐__/|__________▌", "▐_/|___________▌", "▐/|____________▌"},
|
||||
55: {"▐⠂ ▌", "▐⠈ ▌", "▐ ⠂ ▌", "▐ ⠠ ▌", "▐ ⡀ ▌", "▐ ⠠ ▌", "▐ ⠂ ▌", "▐ ⠈ ▌", "▐ ⠂ ▌", "▐ ⠠ ▌", "▐ ⡀ ▌", "▐ ⠠ ▌", "▐ ⠂ ▌", "▐ ⠈ ▌", "▐ ⠂▌", "▐ ⠠▌", "▐ ⡀▌", "▐ ⠠ ▌", "▐ ⠂ ▌", "▐ ⠈ ▌", "▐ ⠂ ▌", "▐ ⠠ ▌", "▐ ⡀ ▌", "▐ ⠠ ▌", "▐ ⠂ ▌", "▐ ⠈ ▌", "▐ ⠂ ▌", "▐ ⠠ ▌", "▐ ⡀ ▌", "▐⠠ ▌"},
|
||||
56: {"¿", "?"},
|
||||
57: {"⢹", "⢺", "⢼", "⣸", "⣇", "⡧", "⡗", "⡏"},
|
||||
58: {"⢄", "⢂", "⢁", "⡁", "⡈", "⡐", "⡠"},
|
||||
59: {". ", ".. ", "...", " ..", " .", " "},
|
||||
60: {".", "o", "O", "°", "O", "o", "."},
|
||||
61: {"▓", "▒", "░"},
|
||||
62: {"▌", "▀", "▐", "▄"},
|
||||
63: {"⊶", "⊷"},
|
||||
64: {"▪", "▫"},
|
||||
65: {"□", "■"},
|
||||
66: {"▮", "▯"},
|
||||
67: {"-", "=", "≡"},
|
||||
68: {"d", "q", "p", "b"},
|
||||
69: {"∙∙∙", "●∙∙", "∙●∙", "∙∙●", "∙∙∙"},
|
||||
70: {"🌑 ", "🌒 ", "🌓 ", "🌔 ", "🌕 ", "🌖 ", "🌗 ", "🌘 "},
|
||||
71: {"☗", "☖"},
|
||||
72: {"⧇", "⧆"},
|
||||
73: {"◉", "◎"},
|
||||
74: {"㊂", "㊀", "㊁"},
|
||||
75: {"⦾", "⦿"},
|
||||
}
|
@@ -12,13 +12,15 @@ ARCH=$(go env GOARCH)
|
||||
|
||||
go build .
|
||||
|
||||
npm --prefix app run make:sign
|
||||
|
||||
# Create a new tag if it doesn't exist.
|
||||
if ! git rev-parse v$VERSION >/dev/null 2>&1; then
|
||||
git tag v$VERSION
|
||||
git push origin v$VERSION
|
||||
fi
|
||||
|
||||
mkdir dist
|
||||
mkdir -p dist
|
||||
cp app/out/make/zip/${OS}/${ARCH}/Ollama-${OS}-${ARCH}-${VERSION}.zip dist/Ollama-${OS}-${ARCH}.zip
|
||||
cp ./ollama dist/ollama-${OS}-${ARCH}
|
||||
|
||||
|
1028
server/images.go
Normal file
123
server/modelpath.go
Normal file
@@ -0,0 +1,123 @@
|
||||
package server
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"os"
|
||||
"path/filepath"
|
||||
"runtime"
|
||||
"strings"
|
||||
)
|
||||
|
||||
type ModelPath struct {
|
||||
ProtocolScheme string
|
||||
Registry string
|
||||
Namespace string
|
||||
Repository string
|
||||
Tag string
|
||||
}
|
||||
|
||||
const (
|
||||
DefaultRegistry = "registry.ollama.ai"
|
||||
DefaultNamespace = "library"
|
||||
DefaultTag = "latest"
|
||||
DefaultProtocolScheme = "https"
|
||||
)
|
||||
|
||||
func ParseModelPath(name string) ModelPath {
|
||||
slashParts := strings.Split(name, "/")
|
||||
var registry, namespace, repository, tag string
|
||||
|
||||
switch len(slashParts) {
|
||||
case 3:
|
||||
registry = slashParts[0]
|
||||
namespace = slashParts[1]
|
||||
repository = strings.Split(slashParts[2], ":")[0]
|
||||
case 2:
|
||||
registry = DefaultRegistry
|
||||
namespace = slashParts[0]
|
||||
repository = strings.Split(slashParts[1], ":")[0]
|
||||
case 1:
|
||||
registry = DefaultRegistry
|
||||
namespace = DefaultNamespace
|
||||
repository = strings.Split(slashParts[0], ":")[0]
|
||||
default:
|
||||
fmt.Println("Invalid image format.")
|
||||
return ModelPath{}
|
||||
}
|
||||
|
||||
colonParts := strings.Split(slashParts[len(slashParts)-1], ":")
|
||||
if len(colonParts) == 2 {
|
||||
tag = colonParts[1]
|
||||
} else {
|
||||
tag = DefaultTag
|
||||
}
|
||||
|
||||
return ModelPath{
|
||||
ProtocolScheme: DefaultProtocolScheme,
|
||||
Registry: registry,
|
||||
Namespace: namespace,
|
||||
Repository: repository,
|
||||
Tag: tag,
|
||||
}
|
||||
}
|
||||
|
||||
func (mp ModelPath) GetNamespaceRepository() string {
|
||||
return fmt.Sprintf("%s/%s", mp.Namespace, mp.Repository)
|
||||
}
|
||||
|
||||
func (mp ModelPath) GetFullTagname() string {
|
||||
return fmt.Sprintf("%s/%s/%s:%s", mp.Registry, mp.Namespace, mp.Repository, mp.Tag)
|
||||
}
|
||||
|
||||
func (mp ModelPath) GetShortTagname() string {
|
||||
if mp.Registry == DefaultRegistry {
|
||||
if mp.Namespace == DefaultNamespace {
|
||||
return fmt.Sprintf("%s:%s", mp.Repository, mp.Tag)
|
||||
}
|
||||
return fmt.Sprintf("%s/%s:%s", mp.Namespace, mp.Repository, mp.Tag)
|
||||
}
|
||||
return fmt.Sprintf("%s/%s/%s:%s", mp.Registry, mp.Namespace, mp.Repository, mp.Tag)
|
||||
}
|
||||
|
||||
func (mp ModelPath) GetManifestPath(createDir bool) (string, error) {
|
||||
home, err := os.UserHomeDir()
|
||||
if err != nil {
|
||||
return "", err
|
||||
}
|
||||
|
||||
path := filepath.Join(home, ".ollama", "models", "manifests", mp.Registry, mp.Namespace, mp.Repository, mp.Tag)
|
||||
if createDir {
|
||||
if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
|
||||
return "", err
|
||||
}
|
||||
}
|
||||
|
||||
return path, nil
|
||||
}
|
||||
|
||||
func GetManifestPath() (string, error) {
|
||||
home, err := os.UserHomeDir()
|
||||
if err != nil {
|
||||
return "", err
|
||||
}
|
||||
|
||||
return filepath.Join(home, ".ollama", "models", "manifests"), nil
|
||||
}
|
||||
|
||||
func GetBlobsPath(digest string) (string, error) {
|
||||
home, err := os.UserHomeDir()
|
||||
if err != nil {
|
||||
return "", err
|
||||
}
|
||||
|
||||
if runtime.GOOS == "windows" {
|
||||
digest = strings.ReplaceAll(digest, ":", "-")
|
||||
}
|
||||
|
||||
path := filepath.Join(home, ".ollama", "models", "blobs", digest)
|
||||
if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
|
||||
return "", err
|
||||
}
|
||||
|
||||
return path, nil
|
||||
}
|
140
server/models.go
@@ -1,140 +0,0 @@
|
||||
package server
|
||||
|
||||
import (
|
||||
"encoding/json"
|
||||
"errors"
|
||||
"fmt"
|
||||
"io"
|
||||
"net/http"
|
||||
"os"
|
||||
"path"
|
||||
"strconv"
|
||||
)
|
||||
|
||||
const directoryURL = "https://ollama.ai/api/models"
|
||||
|
||||
type Model struct {
|
||||
Name string `json:"name"`
|
||||
DisplayName string `json:"display_name"`
|
||||
Parameters string `json:"parameters"`
|
||||
URL string `json:"url"`
|
||||
ShortDescription string `json:"short_description"`
|
||||
Description string `json:"description"`
|
||||
PublishedBy string `json:"published_by"`
|
||||
OriginalAuthor string `json:"original_author"`
|
||||
OriginalURL string `json:"original_url"`
|
||||
License string `json:"license"`
|
||||
}
|
||||
|
||||
func (m *Model) FullName() string {
|
||||
home, err := os.UserHomeDir()
|
||||
if err != nil {
|
||||
panic(err)
|
||||
}
|
||||
|
||||
return path.Join(home, ".ollama", "models", m.Name+".bin")
|
||||
}
|
||||
|
||||
func (m *Model) TempFile() string {
|
||||
fullName := m.FullName()
|
||||
return path.Join(
|
||||
path.Dir(fullName),
|
||||
fmt.Sprintf(".%s.part", path.Base(fullName)),
|
||||
)
|
||||
}
|
||||
|
||||
func getRemote(model string) (*Model, error) {
|
||||
// resolve the model download from our directory
|
||||
resp, err := http.Get(directoryURL)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("failed to get directory: %w", err)
|
||||
}
|
||||
defer resp.Body.Close()
|
||||
body, err := io.ReadAll(resp.Body)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("failed to read directory: %w", err)
|
||||
}
|
||||
var models []Model
|
||||
err = json.Unmarshal(body, &models)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("failed to parse directory: %w", err)
|
||||
}
|
||||
for _, m := range models {
|
||||
if m.Name == model {
|
||||
return &m, nil
|
||||
}
|
||||
}
|
||||
return nil, fmt.Errorf("model not found in directory: %s", model)
|
||||
}
|
||||
|
||||
func saveModel(model *Model, fn func(total, completed int64)) error {
|
||||
// this models cache directory is created by the server on startup
|
||||
|
||||
client := &http.Client{}
|
||||
req, err := http.NewRequest("GET", model.URL, nil)
|
||||
if err != nil {
|
||||
return fmt.Errorf("failed to download model: %w", err)
|
||||
}
|
||||
|
||||
// check if completed file exists
|
||||
fi, err := os.Stat(model.FullName())
|
||||
switch {
|
||||
case errors.Is(err, os.ErrNotExist):
|
||||
// noop, file doesn't exist so create it
|
||||
case err != nil:
|
||||
return fmt.Errorf("stat: %w", err)
|
||||
default:
|
||||
fn(fi.Size(), fi.Size())
|
||||
return nil
|
||||
}
|
||||
|
||||
var size int64
|
||||
|
||||
// completed file doesn't exist, check partial file
|
||||
fi, err = os.Stat(model.TempFile())
|
||||
switch {
|
||||
case errors.Is(err, os.ErrNotExist):
|
||||
// noop, file doesn't exist so create it
|
||||
case err != nil:
|
||||
return fmt.Errorf("stat: %w", err)
|
||||
default:
|
||||
size = fi.Size()
|
||||
}
|
||||
|
||||
req.Header.Add("Range", fmt.Sprintf("bytes=%d-", size))
|
||||
|
||||
resp, err := client.Do(req)
|
||||
if err != nil {
|
||||
return fmt.Errorf("failed to download model: %w", err)
|
||||
}
|
||||
defer resp.Body.Close()
|
||||
|
||||
if resp.StatusCode >= 400 {
|
||||
return fmt.Errorf("failed to download model: %s", resp.Status)
|
||||
}
|
||||
|
||||
out, err := os.OpenFile(model.TempFile(), os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0o644)
|
||||
if err != nil {
|
||||
panic(err)
|
||||
}
|
||||
defer out.Close()
|
||||
|
||||
remaining, _ := strconv.ParseInt(resp.Header.Get("Content-Length"), 10, 64)
|
||||
completed := size
|
||||
|
||||
total := remaining + completed
|
||||
|
||||
for {
|
||||
fn(total, completed)
|
||||
if completed >= total {
|
||||
return os.Rename(model.TempFile(), model.FullName())
|
||||
}
|
||||
|
||||
n , err := io.CopyN(out, resp.Body, 8192)
|
||||
if err != nil && !errors.Is(err, io.EOF) {
|
||||
return err
|
||||
}
|
||||
|
||||
completed += n
|
||||
}
|
||||
}
|
259
server/routes.go
@@ -1,125 +1,230 @@
|
||||
package server
|
||||
|
||||
import (
|
||||
"embed"
|
||||
"encoding/json"
|
||||
"errors"
|
||||
"fmt"
|
||||
"io"
|
||||
"log"
|
||||
"math"
|
||||
"net"
|
||||
"net/http"
|
||||
"os"
|
||||
"path"
|
||||
"path/filepath"
|
||||
"strings"
|
||||
"text/template"
|
||||
"time"
|
||||
|
||||
"dario.cat/mergo"
|
||||
"github.com/gin-gonic/gin"
|
||||
"github.com/lithammer/fuzzysearch/fuzzy"
|
||||
|
||||
"github.com/jmorganca/ollama/api"
|
||||
"github.com/jmorganca/ollama/llama"
|
||||
)
|
||||
|
||||
//go:embed templates/*
|
||||
var templatesFS embed.FS
|
||||
var templates = template.Must(template.ParseFS(templatesFS, "templates/*.prompt"))
|
||||
|
||||
func cacheDir() string {
|
||||
home, err := os.UserHomeDir()
|
||||
if err != nil {
|
||||
panic(err)
|
||||
}
|
||||
|
||||
return path.Join(home, ".ollama")
|
||||
}
|
||||
|
||||
func generate(c *gin.Context) {
|
||||
req := api.GenerateRequest{
|
||||
Options: api.DefaultOptions(),
|
||||
}
|
||||
func GenerateHandler(c *gin.Context) {
|
||||
start := time.Now()
|
||||
|
||||
var req api.GenerateRequest
|
||||
if err := c.ShouldBindJSON(&req); err != nil {
|
||||
c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
|
||||
return
|
||||
}
|
||||
|
||||
if remoteModel, _ := getRemote(req.Model); remoteModel != nil {
|
||||
req.Model = remoteModel.FullName()
|
||||
}
|
||||
if _, err := os.Stat(req.Model); err != nil {
|
||||
if !errors.Is(err, os.ErrNotExist) {
|
||||
c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
|
||||
return
|
||||
}
|
||||
req.Model = path.Join(cacheDir(), "models", req.Model+".bin")
|
||||
model, err := GetModel(req.Model)
|
||||
if err != nil {
|
||||
c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
|
||||
return
|
||||
}
|
||||
|
||||
ch := make(chan any)
|
||||
go stream(c, ch)
|
||||
|
||||
templateNames := make([]string, 0, len(templates.Templates()))
|
||||
for _, template := range templates.Templates() {
|
||||
templateNames = append(templateNames, template.Name())
|
||||
opts := api.DefaultOptions()
|
||||
if err := mergo.Merge(&opts, model.Options, mergo.WithOverride); err != nil {
|
||||
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
|
||||
return
|
||||
}
|
||||
|
||||
match, _ := matchRankOne(path.Base(req.Model), templateNames)
|
||||
if template := templates.Lookup(match); template != nil {
|
||||
var sb strings.Builder
|
||||
if err := template.Execute(&sb, req); err != nil {
|
||||
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
|
||||
return
|
||||
}
|
||||
|
||||
req.Prompt = sb.String()
|
||||
if err := mergo.Merge(&opts, req.Options, mergo.WithOverride); err != nil {
|
||||
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
|
||||
return
|
||||
}
|
||||
|
||||
llm, err := llama.New(req.Model, req.Options)
|
||||
prompt, err := model.Prompt(req)
|
||||
if err != nil {
|
||||
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
|
||||
return
|
||||
}
|
||||
|
||||
llm, err := llama.New(model.ModelPath, opts)
|
||||
if err != nil {
|
||||
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
|
||||
return
|
||||
}
|
||||
defer llm.Close()
|
||||
|
||||
fn := func(s string) {
|
||||
ch <- api.GenerateResponse{Response: s}
|
||||
}
|
||||
ch := make(chan any)
|
||||
go func() {
|
||||
defer close(ch)
|
||||
fn := func(r api.GenerateResponse) {
|
||||
r.Model = req.Model
|
||||
r.CreatedAt = time.Now().UTC()
|
||||
if r.Done {
|
||||
r.TotalDuration = time.Since(start)
|
||||
}
|
||||
|
||||
if err := llm.Predict(req.Prompt, fn); err != nil {
|
||||
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
|
||||
return
|
||||
}
|
||||
ch <- r
|
||||
}
|
||||
|
||||
if err := llm.Predict(req.Context, prompt, fn); err != nil {
|
||||
ch <- gin.H{"error": err.Error()}
|
||||
}
|
||||
}()
|
||||
|
||||
streamResponse(c, ch)
|
||||
}
|
||||
|
||||
func pull(c *gin.Context) {
|
||||
func PullModelHandler(c *gin.Context) {
|
||||
var req api.PullRequest
|
||||
if err := c.ShouldBindJSON(&req); err != nil {
|
||||
c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
|
||||
return
|
||||
}
|
||||
|
||||
remote, err := getRemote(req.Model)
|
||||
if err != nil {
|
||||
c.JSON(http.StatusBadGateway, gin.H{"error": err.Error()})
|
||||
ch := make(chan any)
|
||||
go func() {
|
||||
defer close(ch)
|
||||
fn := func(r api.ProgressResponse) {
|
||||
ch <- r
|
||||
}
|
||||
|
||||
regOpts := &RegistryOptions{
|
||||
Insecure: req.Insecure,
|
||||
Username: req.Username,
|
||||
Password: req.Password,
|
||||
}
|
||||
|
||||
if err := PullModel(req.Name, regOpts, fn); err != nil {
|
||||
ch <- gin.H{"error": err.Error()}
|
||||
}
|
||||
}()
|
||||
|
||||
streamResponse(c, ch)
|
||||
}
|
||||
|
||||
func PushModelHandler(c *gin.Context) {
|
||||
var req api.PushRequest
|
||||
if err := c.ShouldBindJSON(&req); err != nil {
|
||||
c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
|
||||
return
|
||||
}
|
||||
|
||||
ch := make(chan any)
|
||||
go stream(c, ch)
|
||||
|
||||
fn := func(total, completed int64) {
|
||||
ch <- api.PullProgress{
|
||||
Total: total,
|
||||
Completed: completed,
|
||||
Percent: float64(completed) / float64(total) * 100,
|
||||
go func() {
|
||||
defer close(ch)
|
||||
fn := func(r api.ProgressResponse) {
|
||||
ch <- r
|
||||
}
|
||||
|
||||
regOpts := &RegistryOptions{
|
||||
Insecure: req.Insecure,
|
||||
Username: req.Username,
|
||||
Password: req.Password,
|
||||
}
|
||||
|
||||
if err := PushModel(req.Name, regOpts, fn); err != nil {
|
||||
ch <- gin.H{"error": err.Error()}
|
||||
}
|
||||
}()
|
||||
|
||||
streamResponse(c, ch)
|
||||
}
|
||||
|
||||
func CreateModelHandler(c *gin.Context) {
|
||||
var req api.CreateRequest
|
||||
if err := c.ShouldBindJSON(&req); err != nil {
|
||||
c.JSON(http.StatusBadRequest, gin.H{"message": err.Error()})
|
||||
return
|
||||
}
|
||||
|
||||
if err := saveModel(remote, fn); err != nil {
|
||||
ch := make(chan any)
|
||||
go func() {
|
||||
defer close(ch)
|
||||
fn := func(status string) {
|
||||
ch <- api.CreateProgress{
|
||||
Status: status,
|
||||
}
|
||||
}
|
||||
|
||||
if err := CreateModel(req.Name, req.Path, fn); err != nil {
|
||||
ch <- gin.H{"error": err.Error()}
|
||||
}
|
||||
}()
|
||||
|
||||
streamResponse(c, ch)
|
||||
}
|
||||
|
||||
func DeleteModelHandler(c *gin.Context) {
|
||||
var req api.DeleteRequest
|
||||
if err := c.ShouldBindJSON(&req); err != nil {
|
||||
c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
|
||||
return
|
||||
}
|
||||
|
||||
if err := DeleteModel(req.Name); err != nil {
|
||||
if os.IsNotExist(err) {
|
||||
c.JSON(http.StatusNotFound, gin.H{"error": fmt.Sprintf("model '%s' not found", req.Name)})
|
||||
} else {
|
||||
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
|
||||
}
|
||||
return
|
||||
}
|
||||
}
|
||||
|
||||
func ListModelsHandler(c *gin.Context) {
|
||||
var models []api.ListResponseModel
|
||||
fp, err := GetManifestPath()
|
||||
if err != nil {
|
||||
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
|
||||
return
|
||||
}
|
||||
err = filepath.Walk(fp, func(path string, info os.FileInfo, err error) error {
|
||||
if err != nil {
|
||||
if errors.Is(err, os.ErrNotExist) {
|
||||
log.Printf("manifest file does not exist: %s", fp)
|
||||
return nil
|
||||
}
|
||||
return err
|
||||
}
|
||||
if !info.IsDir() {
|
||||
fi, err := os.Stat(path)
|
||||
if err != nil {
|
||||
log.Printf("skipping file: %s", fp)
|
||||
return nil
|
||||
}
|
||||
path := path[len(fp)+1:]
|
||||
slashIndex := strings.LastIndex(path, "/")
|
||||
if slashIndex == -1 {
|
||||
return nil
|
||||
}
|
||||
tag := path[:slashIndex] + ":" + path[slashIndex+1:]
|
||||
mp := ParseModelPath(tag)
|
||||
manifest, err := GetManifest(mp)
|
||||
if err != nil {
|
||||
log.Printf("skipping file: %s", fp)
|
||||
return nil
|
||||
}
|
||||
model := api.ListResponseModel{
|
||||
Name: mp.GetShortTagname(),
|
||||
Size: manifest.GetTotalSize(),
|
||||
ModifiedAt: fi.ModTime(),
|
||||
}
|
||||
models = append(models, model)
|
||||
}
|
||||
return nil
|
||||
})
|
||||
if err != nil {
|
||||
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
|
||||
return
|
||||
}
|
||||
|
||||
c.JSON(http.StatusOK, api.ListResponse{models})
|
||||
}
|
||||
|
||||
func Serve(ln net.Listener) error {
|
||||
@@ -129,8 +234,12 @@ func Serve(ln net.Listener) error {
|
||||
c.String(http.StatusOK, "Ollama is running")
|
||||
})
|
||||
|
||||
r.POST("api/pull", pull)
|
||||
r.POST("/api/generate", generate)
|
||||
r.POST("/api/pull", PullModelHandler)
|
||||
r.POST("/api/generate", GenerateHandler)
|
||||
r.POST("/api/create", CreateModelHandler)
|
||||
r.POST("/api/push", PushModelHandler)
|
||||
r.GET("/api/tags", ListModelsHandler)
|
||||
r.DELETE("/api/delete", DeleteModelHandler)
|
||||
|
||||
log.Printf("Listening on %s", ln.Addr())
|
||||
s := &http.Server{
|
||||
@@ -140,19 +249,7 @@ func Serve(ln net.Listener) error {
|
||||
return s.Serve(ln)
|
||||
}
|
||||
|
||||
func matchRankOne(source string, targets []string) (bestMatch string, bestRank int) {
|
||||
bestRank = math.MaxInt
|
||||
for _, target := range targets {
|
||||
if rank := fuzzy.LevenshteinDistance(source, target); bestRank > rank {
|
||||
bestRank = rank
|
||||
bestMatch = target
|
||||
}
|
||||
}
|
||||
|
||||
return
|
||||
}
|
||||
|
||||
func stream(c *gin.Context, ch chan any) {
|
||||
func streamResponse(c *gin.Context, ch chan any) {
|
||||
c.Stream(func(w io.Writer) bool {
|
||||
val, ok := <-ch
|
||||
if !ok {
|
||||
|
@@ -1,8 +0,0 @@
|
||||
Below is an instruction that describes a task. Write a response that appropriately completes the request.
|
||||
|
||||
### Instruction:
|
||||
{{ .Prompt }}
|
||||
|
||||
### Response:
|
||||
|
||||
|
@@ -1,3 +0,0 @@
|
||||
A helpful assistant who helps the user with any questions asked.
|
||||
User: {{ .Prompt }}
|
||||
Assistant:
|
@@ -1,5 +0,0 @@
|
||||
### Instruction:
|
||||
{{ .Prompt }}
|
||||
|
||||
### Response:
|
||||
|
@@ -1,5 +0,0 @@
|
||||
### Instruction:
|
||||
{{ .Prompt }}
|
||||
|
||||
### Response:
|
||||
|
@@ -1,4 +0,0 @@
|
||||
Below is an instruction that describes a task. Write a response that appropriately completes the request. Be concise. Once the request is completed, include no other text.
|
||||
### Instruction:
|
||||
{{ .Prompt }}
|
||||
### Response:
|
@@ -1 +0,0 @@
|
||||
{{ .Prompt }}
|
@@ -1,7 +0,0 @@
|
||||
### System:
|
||||
You are an AI assistant that follows instruction extremely well. Help as much as you can.
|
||||
|
||||
### User:
|
||||
{{ .Prompt }}
|
||||
|
||||
### Response:
|
@@ -1,2 +0,0 @@
|
||||
### Human: {{ .Prompt }}
|
||||
### Assistant:
|
@@ -1,4 +0,0 @@
|
||||
|
||||
{{ .Prompt }}
|
||||
|
||||
|
@@ -1,2 +0,0 @@
|
||||
USER: {{ .Prompt }}
|
||||
ASSISTANT:
|
@@ -1,4 +0,0 @@
|
||||
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
|
||||
|
||||
USER: {{ .Prompt }}
|
||||
ASSISTANT:
|
@@ -1,5 +0,0 @@
|
||||
Below is an instruction that describes a task. Write a response that appropriately completes the request
|
||||
|
||||
### Instruction: {{ .Prompt }}
|
||||
|
||||
### Response:
|
@@ -1,3 +0,0 @@
|
||||
{{ .Prompt }}
|
||||
|
||||
### Response:
|
@@ -1,6 +0,0 @@
|
||||
import models from '../../../../models.json'
|
||||
import { NextResponse } from 'next/server'
|
||||
|
||||
export async function GET() {
|
||||
return NextResponse.json(models)
|
||||
}
|
@@ -6,12 +6,22 @@ const analytics = new Analytics({ writeKey: process.env.TELEMETRY_WRITE_KEY || '
|
||||
export async function POST(req: Request) {
|
||||
const { email } = await req.json()
|
||||
|
||||
analytics.identify({
|
||||
anonymousId: uuid(),
|
||||
const id = uuid()
|
||||
|
||||
await analytics.identify({
|
||||
anonymousId: id,
|
||||
traits: {
|
||||
email,
|
||||
},
|
||||
})
|
||||
|
||||
await analytics.track({
|
||||
anonymousId: id,
|
||||
event: 'signup',
|
||||
properties: {
|
||||
email,
|
||||
},
|
||||
})
|
||||
|
||||
return new Response(null, { status: 200 })
|
||||
}
|
||||
|
@@ -1,3 +1,6 @@
|
||||
import Image from 'next/image'
|
||||
|
||||
import Header from '../header'
|
||||
import Downloader from './downloader'
|
||||
import Signup from './signup'
|
||||
|
||||
@@ -26,22 +29,19 @@ export default async function Download() {
|
||||
}
|
||||
|
||||
return (
|
||||
<main className='flex min-h-screen max-w-2xl flex-col p-4 lg:p-24 items-center mx-auto'>
|
||||
<img src='/ollama.png' className='w-16 h-auto' />
|
||||
<section className='my-12 text-center'>
|
||||
<h2 className='my-2 max-w-md text-3xl tracking-tight'>Downloading Ollama</h2>
|
||||
<h3 className='text-sm text-neutral-500'>
|
||||
Problems downloading?{' '}
|
||||
<a href={asset.browser_download_url} className='underline'>
|
||||
Try again
|
||||
</a>
|
||||
</h3>
|
||||
<Downloader url={asset.browser_download_url} />
|
||||
</section>
|
||||
<section className='max-w-sm flex flex-col w-full items-center border border-neutral-200 rounded-xl px-8 pt-8 pb-2'>
|
||||
<p className='text-lg leading-tight text-center mb-6 max-w-[260px]'>Sign up for updates</p>
|
||||
<>
|
||||
<Header />
|
||||
<main className='flex min-h-screen max-w-6xl flex-col py-20 px-16 lg:p-32 items-center mx-auto'>
|
||||
<Image src='/ollama.png' width={64} height={64} alt='ollamaIcon' />
|
||||
<section className='mt-12 mb-8 text-center'>
|
||||
<h2 className='my-2 max-w-md text-3xl tracking-tight'>Downloading...</h2>
|
||||
<h3 className='text-base text-neutral-500 mt-12 max-w-[16rem]'>
|
||||
While Ollama downloads, sign up to get notified of new updates.
|
||||
</h3>
|
||||
<Downloader url={asset.browser_download_url} />
|
||||
</section>
|
||||
<Signup />
|
||||
</section>
|
||||
</main>
|
||||
</main>
|
||||
</>
|
||||
)
|
||||
}
|
||||
|
@@ -28,7 +28,7 @@ export default function Signup() {
|
||||
|
||||
return false
|
||||
}}
|
||||
className='flex self-stretch flex-col gap-3 h-32'
|
||||
className='flex self-stretch flex-col gap-3 h-32 md:mx-40 lg:mx-72'
|
||||
>
|
||||
<input
|
||||
required
|
||||
@@ -37,13 +37,13 @@ export default function Signup() {
|
||||
onChange={e => setEmail(e.target.value)}
|
||||
type='email'
|
||||
placeholder='your@email.com'
|
||||
className='bg-neutral-100 rounded-lg px-4 py-2 focus:outline-none placeholder-neutral-500'
|
||||
className='border border-neutral-200 rounded-lg px-4 py-2 focus:outline-none placeholder-neutral-300'
|
||||
/>
|
||||
<input
|
||||
type='submit'
|
||||
value='Get updates'
|
||||
disabled={submitting}
|
||||
className='bg-black text-white disabled:text-neutral-200 disabled:bg-neutral-700 rounded-lg px-4 py-2 focus:outline-none cursor-pointer'
|
||||
className='bg-black text-white disabled:text-neutral-200 disabled:bg-neutral-700 rounded-full px-4 py-2 focus:outline-none cursor-pointer'
|
||||
/>
|
||||
{success && <p className='text-center text-sm'>You're signed up for updates</p>}
|
||||
</form>
|
||||
|
26
web/app/header.tsx
Normal file
@@ -0,0 +1,26 @@
|
||||
import Link from "next/link"
|
||||
|
||||
const navigation = [
|
||||
{ name: 'Discord', href: 'https://discord.gg/MrfB5FbNWN' },
|
||||
{ name: 'Github', href: 'https://github.com/jmorganca/ollama' },
|
||||
{ name: 'Download', href: '/download' },
|
||||
]
|
||||
|
||||
export default function Header() {
|
||||
return (
|
||||
<header className="absolute inset-x-0 top-0 z-50">
|
||||
<nav className="mx-auto flex items-center justify-between px-10 py-4">
|
||||
<Link className="flex-1 font-bold" href="/">
|
||||
Ollama
|
||||
</Link>
|
||||
<div className="flex space-x-8">
|
||||
{navigation.map((item) => (
|
||||
<Link key={item.name} href={item.href} className="text-sm leading-6 text-gray-900">
|
||||
{item.name}
|
||||
</Link>
|
||||
))}
|
||||
</div>
|
||||
</nav>
|
||||
</header >
|
||||
)
|
||||
}
|
@@ -1,34 +1,37 @@
|
||||
import { AiFillApple } from 'react-icons/ai'
|
||||
import Image from 'next/image'
|
||||
import Link from 'next/link'
|
||||
|
||||
import models from '../../models.json'
|
||||
import Header from './header'
|
||||
|
||||
export default async function Home() {
|
||||
return (
|
||||
<main className='flex min-h-screen max-w-2xl flex-col p-4 lg:p-24'>
|
||||
<img src='/ollama.png' className='w-16 h-auto' />
|
||||
<section className='my-4'>
|
||||
<p className='my-3 max-w-md'>
|
||||
<a className='underline' href='https://github.com/jmorganca/ollama'>
|
||||
Ollama
|
||||
</a>{' '}
|
||||
is a tool for running large language models, currently for macOS with Windows and Linux coming soon.
|
||||
<br />
|
||||
<br />
|
||||
<a href='/download'>
|
||||
<button className='bg-black text-white text-sm py-2 px-3 rounded-lg flex items-center gap-2'>
|
||||
<AiFillApple className='h-auto w-5 relative -top-px' /> Download for macOS
|
||||
</button>
|
||||
</a>
|
||||
</p>
|
||||
</section>
|
||||
<section className='my-4'>
|
||||
<h2 className='mb-4 text-lg'>Example models you can try running:</h2>
|
||||
{models.map(m => (
|
||||
<div className='my-2 grid font-mono' key={m.name}>
|
||||
<code className='py-0.5'>ollama run {m.name}</code>
|
||||
<>
|
||||
<Header />
|
||||
<main className='flex min-h-screen max-w-6xl flex-col py-20 px-16 md:p-32 items-center mx-auto'>
|
||||
<Image src='/ollama.png' width={64} height={64} alt='ollamaIcon' />
|
||||
<section className='my-12 text-center'>
|
||||
<div className='flex flex-col space-y-2'>
|
||||
<h2 className='md:max-w-md mx-auto my-2 text-3xl tracking-tight'>
|
||||
Get up and running with large language models, locally.
|
||||
</h2>
|
||||
<h3 className='md:max-w-xs mx-auto text-base text-neutral-500'>
|
||||
Run Llama 2 and other models on macOS. Customize and create your own.
|
||||
</h3>
|
||||
</div>
|
||||
))}
|
||||
</section>
|
||||
</main>
|
||||
<div className='mx-auto max-w-xs flex flex-col space-y-4 mt-12'>
|
||||
<Link
|
||||
href='/download'
|
||||
className='md:mx-10 lg:mx-14 bg-black text-white rounded-full px-4 py-2 focus:outline-none cursor-pointer'
|
||||
>
|
||||
Download
|
||||
</Link>
|
||||
<p className='text-neutral-500 text-sm '>
|
||||
Available for macOS with Apple Silicon <br />
|
||||
Windows & Linux support coming soon.
|
||||
</p>
|
||||
</div>
|
||||
</section>
|
||||
</main>
|
||||
</>
|
||||
)
|
||||
}
|
||||
|