Compare commits
105 Commits
| Author | SHA1 | Date |
|---|---|---|
| | 11b844e1bb | |
| | 88c55199f8 | |
| | c448443813 | |
| | efacd45fc5 | |
| | fa522695c4 | |
| | 8609db77ea | |
| | 65d93a86b2 | |
| | e6c427ce4d | |
| | 6d6b0d3321 | |
| | 37324a0a00 | |
| | 20a5d99f77 | |
| | 3b43cc019a | |
| | b8421dce3d | |
| | 9f6e97865c | |
| | 9657314ae2 | |
| | 3f7d2336c7 | |
| | e0a73d7fbe | |
| | b08c4ca2bd | |
| | 734892f1e2 | |
| | d2bfaeac63 | |
| | 0768b1b907 | |
| | f5f0da06d9 | |
| | 52f04e39f2 | |
| | 3c8f4c03d7 | |
| | 7ba1308595 | |
| | 91cd54016c | |
| | e7a393de54 | |
| | 8454f298ac | |
| | a3badaf103 | |
| | 50e8e5bdbe | |
| | 8526e1f5f1 | |
| | 0cfdbb95cc | |
| | 6cea2061ec | |
| | 2832801c2a | |
| | 23a37dc466 | |
| | 992892866b | |
| | dde880290c | |
| | 1f27d7f1b8 | |
| | 00aaa05901 | |
| | a83eaa7a9f | |
| | 5156e48c2a | |
| | bf198c3918 | |
| | 09dc6273e3 | |
| | ebaa33ac28 | |
| | 3ec4ebc562 | |
| | 6a19724d5f | |
| | 924ce739f9 | |
| | e1973e6780 | |
| | f1b08ef40e | |
| | 31f0cb7742 | |
| | e4b2ccfb23 | |
| | a3d7bb0a30 | |
| | 77e49f3822 | |
| | 8945b25484 | |
| | 99ccf0c5d3 | |
| | d59b164fa2 | |
| | 55b5f5dc34 | |
| | 3b135ac963 | |
| | e6bae8d916 | |
| | d9f54300c3 | |
| | 1511219763 | |
| | ada0add89b | |
| | 75e508e1d6 | |
| | 6f046dbf18 | |
| | cd820c8bca | |
| | 88e755d7fd | |
| | 6984171cfd | |
| | 60b4db6389 | |
| | 7c6ea2a966 | |
| | c161aef5f9 | |
| | c47786c1b0 | |
| | df100ce540 | |
| | 5c5948b4e7 | |
| | 1c72e46e09 | |
| | ca210ba480 | |
| | df146c41e2 | |
| | 2d305fa99a | |
| | e4d7f3e287 | |
| | f2044b5838 | |
| | d53988f619 | |
| | ac88ab48d9 | |
| | 84c6ee8cc6 | |
| | dbc90576b8 | |
| | 84200dcde6 | |
| | e54c08da89 | |
| | 31413857ea | |
| | 25f874c030 | |
| | 10d502611f | |
| | 7fe4103b94 | |
| | 7fbdc8e2c1 | |
| | 9c5572d51f | |
| | 75eb28f574 | |
| | 56b6a1720f | |
| | dfceca48a7 | |
| | bbb67002c3 | |
| | 0294216ea9 | |
| | 7a62b2d2ab | |
| | f08c050e57 | |
| | 67c8d49757 | |
| | ffcd90e8a7 | |
| | 4ca7c4be1f | |
| | 17b7af78f0 | |
| | 4c1dc52083 | |
| | 572fc9099f | |
| | 3020f29041 | |
1
.gitignore
vendored
@@ -2,5 +2,6 @@
|
||||
.vscode
|
||||
.env
|
||||
.venv
|
||||
.swp
|
||||
dist
|
||||
ollama
|
||||
|
93
README.md
@@ -1,25 +1,50 @@
|
||||
<div align="center">
|
||||
<picture>
|
||||
<source media="(prefers-color-scheme: dark)" height="200px" srcset="https://github.com/jmorganca/ollama/assets/3325447/318048d2-b2dd-459c-925a-ac8449d5f02c">
|
||||
<img alt="logo" height="200px" src="https://github.com/jmorganca/ollama/assets/3325447/c7d6e15f-7f4d-4776-b568-c084afa297c2">
|
||||
<source media="(prefers-color-scheme: dark)" height="200px" srcset="https://github.com/jmorganca/ollama/assets/3325447/56ea1849-1284-4645-8970-956de6e51c3c">
|
||||
<img alt="logo" height="200px" src="https://github.com/jmorganca/ollama/assets/3325447/0d0b44e2-8f4a-4e99-9b52-a5c1c741c8f7">
|
||||
</picture>
|
||||
</div>
|
||||
|
||||
# Ollama
|
||||
|
||||
Create, run, and share self-contained large language models (LLMs). Ollama bundles a model’s weights, configuration, prompts, and more into self-contained packages that run anywhere.
|
||||
[](https://discord.gg/ollama)
|
||||
|
||||
> Note: Ollama is in early preview. Please report any issues you find.
|
||||
|
||||
Run, create, and share large language models (LLMs).
|
||||
|
||||
## Download
|
||||
|
||||
- [Download](https://ollama.ai/download) for macOS on Apple Silicon (Intel coming soon)
|
||||
- Download for Windows and Linux (coming soon)
|
||||
- Build [from source](#building)
|
||||
|
||||
## Quickstart
|
||||
|
||||
To run and chat with [Llama 2](https://ai.meta.com/llama), the new model by Meta:
|
||||
|
||||
```
|
||||
ollama run llama2
|
||||
```
|
||||
|
||||
## Model library
|
||||
|
||||
`ollama` includes a library of open-source models:
|
||||
|
||||
| Model | Parameters | Size | Download |
|
||||
| ------------------------ | ---------- | ----- | --------------------------- |
|
||||
| Llama2 | 7B | 3.8GB | `ollama pull llama2` |
|
||||
| Llama2 13B | 13B | 7.3GB | `ollama pull llama2:13b` |
|
||||
| Orca Mini | 3B | 1.9GB | `ollama pull orca` |
|
||||
| Vicuna | 7B | 3.8GB | `ollama pull vicuna` |
|
||||
| Nous-Hermes | 13B | 7.3GB | `ollama pull nous-hermes` |
|
||||
| Wizard Vicuna Uncensored | 13B | 7.3GB | `ollama pull wizard-vicuna` |
|
||||
|
||||
> Note: You should have at least 8 GB of RAM to run the 3B models, 16 GB to run the 7B models, and 32 GB to run the 13B models.
|
||||
|
||||
## Examples
|
||||
|
||||
### Quickstart
|
||||
### Run a model
|
||||
|
||||
```
|
||||
ollama run llama2
|
||||
@@ -27,17 +52,25 @@ ollama run llama2
|
||||
Hello! How can I help you today?
|
||||
```
|
||||
|
||||
### Creating a custom model
|
||||
### Create a custom model
|
||||
|
||||
Pull a base model:
|
||||
|
||||
```
|
||||
ollama pull llama2
|
||||
```
|
||||
|
||||
Create a `Modelfile`:
|
||||
|
||||
```
|
||||
FROM llama2
|
||||
PROMPT """
|
||||
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
|
||||
|
||||
User: {{ .Prompt }}
|
||||
Mario:
|
||||
# set the temperature to 1 [higher is more creative, lower is more coherent]
|
||||
PARAMETER temperature 1
|
||||
|
||||
# set the system prompt
|
||||
SYSTEM """
|
||||
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
|
||||
"""
|
||||
```
|
||||
|
||||
@@ -50,16 +83,30 @@ ollama run mario
|
||||
Hello! It's your friend Mario.
|
||||
```
|
||||
|
||||
## Model library
|
||||
For more examples, see the [examples](./examples) directory.
|
||||
|
||||
Ollama includes a library of open-source, pre-trained models. More models are coming soon.
|
||||
### Pull a model from the registry
|
||||
|
||||
| Model | Parameters | Size | Download |
|
||||
| ----------- | ---------- | ----- | ------------------------- |
|
||||
| Llama2 | 7B | 3.8GB | `ollama pull llama2` |
|
||||
| Orca Mini | 3B | 1.9GB | `ollama pull orca` |
|
||||
| Vicuna | 7B | 3.8GB | `ollama pull vicuna` |
|
||||
| Nous-Hermes | 13B | 7.3GB | `ollama pull nous-hermes` |
|
||||
```
|
||||
ollama pull orca
|
||||
```
|
||||
|
||||
### Listing local models
|
||||
|
||||
```
|
||||
ollama list
|
||||
```
|
||||
|
||||
## Model packages
|
||||
|
||||
### Overview
|
||||
|
||||
Ollama bundles model weights, configuration, and data into a single package, defined by a [Modelfile](./docs/modelfile.md).
|
||||
|
||||
<picture>
|
||||
<source media="(prefers-color-scheme: dark)" height="480" srcset="https://github.com/jmorganca/ollama/assets/251292/2fd96b5f-191b-45c1-9668-941cfad4eb70">
|
||||
<img alt="logo" height="480" src="https://github.com/jmorganca/ollama/assets/251292/2fd96b5f-191b-45c1-9668-941cfad4eb70">
|
||||
</picture>
|
||||
|
||||
## Building
|
||||
|
||||
@@ -70,7 +117,7 @@ go build .
|
||||
To run it start the server:
|
||||
|
||||
```
|
||||
./ollama server &
|
||||
./ollama serve &
|
||||
```
|
||||
|
||||
Finally, run a model!
|
||||
@@ -78,3 +125,13 @@ Finally, run a model!
|
||||
```
|
||||
./ollama run llama2
|
||||
```
|
||||
|
||||
## REST API
|
||||
|
||||
### `POST /api/generate`
|
||||
|
||||
Generate text from a model.
|
||||
|
||||
```
|
||||
curl -X POST http://localhost:11434/api/generate -d '{"model": "llama2", "prompt":"Why is the sky blue?"}'
|
||||
```
|
||||
|
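The `curl` call in the new REST API section can also be issued from Go. The following is a minimal sketch, not code from this change: the helper `newGenerateRequest` and the `generateRequest` struct are hypothetical names, the `Content-Type` header is an added assumption (the `curl` example does not set one), and the host and port are taken from the README example. The request is built but not sent, because the response format is not shown in this diff.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// generateRequest mirrors the two JSON fields used in the README curl example.
// The struct itself is illustrative and is not the type defined in api/types.go.
type generateRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
}

// newGenerateRequest builds (but does not send) a POST /api/generate request.
func newGenerateRequest(baseURL, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(generateRequest{Model: model, Prompt: prompt})
	if err != nil {
		return nil, err
	}

	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/generate", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}

	// Assumption: the server accepts a JSON content type.
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newGenerateRequest("http://localhost:11434", "llama2", "Why is the sky blue?")
	if err != nil {
		panic(err)
	}

	// Print what would be sent; pass req to http.DefaultClient.Do to send it.
	fmt.Println(req.Method, req.URL.String())
}
```

To send the request, pass it to `http.DefaultClient.Do` while a local server started with `./ollama serve` is running.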
@@ -27,7 +27,7 @@ func checkError(resp *http.Response, body []byte) error {
|
||||
err := json.Unmarshal(body, &apiError)
|
||||
if err != nil {
|
||||
// Use the full body as the message if we fail to decode a response.
|
||||
apiError.Message = string(body)
|
||||
apiError.ErrorMessage = string(body)
|
||||
}
|
||||
|
||||
return apiError
|
||||
@@ -92,7 +92,6 @@ func (c *Client) do(ctx context.Context, method, path string, reqData, respData
|
||||
}
|
||||
}
|
||||
return nil
|
||||
|
||||
}
|
||||
|
||||
func (c *Client) stream(ctx context.Context, method, path string, data any, fn func([]byte) error) error {
|
||||
@@ -131,11 +130,15 @@ func (c *Client) stream(ctx context.Context, method, path string, data any, fn f
|
||||
return fmt.Errorf("unmarshal: %w", err)
|
||||
}
|
||||
|
||||
if errorResponse.Error != "" {
|
||||
return fmt.Errorf("stream: %s", errorResponse.Error)
|
||||
}
|
||||
|
||||
if response.StatusCode >= 400 {
|
||||
return StatusError{
|
||||
StatusCode: response.StatusCode,
|
||||
Status: response.Status,
|
||||
Message: errorResponse.Error,
|
||||
StatusCode: response.StatusCode,
|
||||
Status: response.Status,
|
||||
ErrorMessage: errorResponse.Error,
|
||||
}
|
||||
}
|
||||
|
||||
@@ -206,3 +209,17 @@ func (c *Client) List(ctx context.Context) (*ListResponse, error) {
|
||||
}
|
||||
return &lr, nil
|
||||
}
|
||||
|
||||
func (c *Client) Copy(ctx context.Context, req *CopyRequest) error {
|
||||
if err := c.do(ctx, http.MethodPost, "/api/copy", req, nil); err != nil {
|
||||
return err
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
func (c *Client) Delete(ctx context.Context, req *DeleteRequest) error {
|
||||
if err := c.do(ctx, http.MethodDelete, "/api/delete", req, nil); err != nil {
|
||||
return err
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
38
api/types.go
@@ -8,16 +8,23 @@ import (
|
||||
)
|
||||
|
||||
type StatusError struct {
|
||||
StatusCode int
|
||||
Status string
|
||||
Message string
|
||||
StatusCode int
|
||||
Status string
|
||||
ErrorMessage string `json:"error"`
|
||||
}
|
||||
|
||||
func (e StatusError) Error() string {
|
||||
if e.Message != "" {
|
||||
return fmt.Sprintf("%s: %s", e.Status, e.Message)
|
||||
switch {
|
||||
case e.Status != "" && e.ErrorMessage != "":
|
||||
return fmt.Sprintf("%s: %s", e.Status, e.ErrorMessage)
|
||||
case e.Status != "":
|
||||
return e.Status
|
||||
case e.ErrorMessage != "":
|
||||
return e.ErrorMessage
|
||||
default:
|
||||
// this should not happen
|
||||
return "something went wrong, please see the ollama server logs for details"
|
||||
}
|
||||
return e.Status
|
||||
}
|
||||
|
||||
type GenerateRequest struct {
|
||||
@@ -37,21 +44,32 @@ type CreateProgress struct {
|
||||
Status string `json:"status"`
|
||||
}
|
||||
|
||||
type DeleteRequest struct {
|
||||
Name string `json:"name"`
|
||||
}
|
||||
|
||||
type CopyRequest struct {
|
||||
Source string `json:"source"`
|
||||
Destination string `json:"destination"`
|
||||
}
|
||||
|
||||
type PullRequest struct {
|
||||
Name string `json:"name"`
|
||||
Insecure bool `json:"insecure,omitempty"`
|
||||
Username string `json:"username"`
|
||||
Password string `json:"password"`
|
||||
}
|
||||
|
||||
type ProgressResponse struct {
|
||||
Status string `json:"status"`
|
||||
Digest string `json:"digest,omitempty"`
|
||||
Total int `json:"total,omitempty"`
|
||||
Completed int `json:"completed,omitempty"`
|
||||
Status string `json:"status"`
|
||||
Digest string `json:"digest,omitempty"`
|
||||
Total int `json:"total,omitempty"`
|
||||
Completed int `json:"completed,omitempty"`
|
||||
}
|
||||
|
||||
type PushRequest struct {
|
||||
Name string `json:"name"`
|
||||
Insecure bool `json:"insecure,omitempty"`
|
||||
Username string `json:"username"`
|
||||
Password string `json:"password"`
|
||||
}
|
||||
|
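The reworked `StatusError.Error()` now handles four cases instead of two. The standalone sketch below copies the new struct and method from the hunk above so the branches can be exercised in isolation; the sample status strings and messages in `main` are made-up values for illustration only, and the `errors.As` usage mirrors the pattern visible in `RunHandler`.

```go
package main

import (
	"errors"
	"fmt"
)

// StatusError is copied from the new side of the api/types.go hunk.
type StatusError struct {
	StatusCode   int
	Status       string
	ErrorMessage string `json:"error"`
}

// Error prefers "status: message", then falls back to whichever field is set.
func (e StatusError) Error() string {
	switch {
	case e.Status != "" && e.ErrorMessage != "":
		return fmt.Sprintf("%s: %s", e.Status, e.ErrorMessage)
	case e.Status != "":
		return e.Status
	case e.ErrorMessage != "":
		return e.ErrorMessage
	default:
		// this should not happen
		return "something went wrong, please see the ollama server logs for details"
	}
}

func main() {
	// Hypothetical values, chosen only to reach each branch of the switch.
	samples := []StatusError{
		{StatusCode: 404, Status: "404 Not Found", ErrorMessage: "example message"},
		{StatusCode: 500, Status: "500 Internal Server Error"},
		{ErrorMessage: "example message"},
		{},
	}
	for _, s := range samples {
		fmt.Println(s.Error())
	}

	// A wrapped StatusError can still be recovered with errors.As.
	var err error = fmt.Errorf("pull: %w", samples[0])
	var statusErr StatusError
	if errors.As(err, &statusErr) {
		fmt.Println("status code:", statusErr.StatusCode)
	}
}
```

Because the method uses a value receiver, both `StatusError` and `*StatusError` satisfy `error`, which is why the `errors.As` target can be a plain `StatusError` variable.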
BIN
app/assets/ollama_outline_icon_16x16Template.png
Normal file
Binary file not shown.
After: Size: 445 B
BIN
app/assets/ollama_outline_icon_16x16Template@2x.png
Normal file
Binary file not shown.
After: Size: 891 B
@@ -21,6 +21,8 @@ const config: ForgeConfig = {
|
||||
'../ollama',
|
||||
path.join(__dirname, './assets/ollama_icon_16x16Template.png'),
|
||||
path.join(__dirname, './assets/ollama_icon_16x16Template@2x.png'),
|
||||
path.join(__dirname, './assets/ollama_outline_icon_16x16Template.png'),
|
||||
path.join(__dirname, './assets/ollama_outline_icon_16x16Template@2x.png'),
|
||||
...(process.platform === 'darwin' ? ['../llama/ggml-metal.metal'] : []),
|
||||
],
|
||||
...(process.env.SIGN
|
||||
|
@@ -11,7 +11,9 @@
|
||||
"make": "electron-forge make",
|
||||
"make:sign": "SIGN=1 electron-forge make",
|
||||
"publish": "SIGN=1 electron-forge publish",
|
||||
"lint": "eslint --ext .ts,.tsx ."
|
||||
"lint": "eslint --ext .ts,.tsx .",
|
||||
"format": "prettier --check . --ignore-path .gitignore",
|
||||
"format:fix": "prettier --write . --ignore-path .gitignore"
|
||||
},
|
||||
"keywords": [],
|
||||
"author": {
|
||||
|
6
app/src/declarations.d.ts
vendored
@@ -1,4 +1,4 @@
|
||||
declare module '*.svg' {
|
||||
const content: string;
|
||||
export default content;
|
||||
}
|
||||
const content: string
|
||||
export default content
|
||||
}
|
||||
|
@@ -1,5 +1,5 @@
|
||||
import { spawn } from 'child_process'
|
||||
import { app, autoUpdater, dialog, Tray, Menu, BrowserWindow } from 'electron'
|
||||
import { app, autoUpdater, dialog, Tray, Menu, BrowserWindow, nativeTheme } from 'electron'
|
||||
import Store from 'electron-store'
|
||||
import winston from 'winston'
|
||||
import 'winston-daily-rotate-file'
|
||||
@@ -66,14 +66,30 @@ function firstRunWindow() {
|
||||
}
|
||||
|
||||
function createSystemtray() {
|
||||
let iconPath = path.join(__dirname, '..', '..', 'assets', 'ollama_icon_16x16Template.png')
|
||||
let iconPath = nativeTheme.shouldUseDarkColors
|
||||
? path.join(__dirname, '..', '..', 'assets', 'ollama_icon_16x16Template.png')
|
||||
: path.join(__dirname, '..', '..', 'assets', 'ollama_outline_icon_16x16Template.png')
|
||||
|
||||
if (app.isPackaged) {
|
||||
iconPath = path.join(process.resourcesPath, 'ollama_icon_16x16Template.png')
|
||||
iconPath = nativeTheme.shouldUseDarkColors
|
||||
? path.join(process.resourcesPath, 'ollama_icon_16x16Template.png')
|
||||
: path.join(process.resourcesPath, 'ollama_outline_icon_16x16Template.png')
|
||||
}
|
||||
|
||||
tray = new Tray(iconPath)
|
||||
|
||||
nativeTheme.on('updated', function theThemeHasChanged() {
|
||||
if (nativeTheme.shouldUseDarkColors) {
|
||||
app.isPackaged
|
||||
? tray.setImage(path.join(process.resourcesPath, 'ollama_icon_16x16Template.png'))
|
||||
: tray.setImage(path.join(__dirname, '..', '..', 'assets', 'ollama_icon_16x16Template.png'))
|
||||
} else {
|
||||
app.isPackaged
|
||||
? tray.setImage(path.join(process.resourcesPath, 'ollama_outline_icon_16x16Template.png'))
|
||||
: tray.setImage(path.join(__dirname, '..', '..', 'assets', 'ollama_outline_icon_16x16Template.png'))
|
||||
}
|
||||
})
|
||||
|
||||
const contextMenu = Menu.buildFromTemplate([{ role: 'quit', label: 'Quit Ollama', accelerator: 'Command+Q' }])
|
||||
|
||||
tray.setContextMenu(contextMenu)
|
||||
@@ -100,8 +116,7 @@ function server() {
|
||||
})
|
||||
|
||||
function restart() {
|
||||
logger.info('Restarting the server...')
|
||||
server()
|
||||
setTimeout(server, 3000)
|
||||
}
|
||||
|
||||
proc.on('exit', restart)
|
||||
|
@@ -13,7 +13,9 @@ export function installed() {
|
||||
}
|
||||
|
||||
export async function install() {
|
||||
const command = `do shell script "ln -F -s ${ollama} ${symlinkPath}" with administrator privileges`
|
||||
const command = `do shell script "mkdir -p ${path.dirname(
|
||||
symlinkPath
|
||||
)} && ln -F -s ${ollama} ${symlinkPath}" with administrator privileges`
|
||||
|
||||
try {
|
||||
await exec(`osascript -e '${command}'`)
|
||||
|
224
cmd/cmd.go
@@ -5,6 +5,7 @@ import (
|
||||
"context"
|
||||
"errors"
|
||||
"fmt"
|
||||
"io"
|
||||
"log"
|
||||
"net"
|
||||
"net/http"
|
||||
@@ -13,18 +14,18 @@ import (
|
||||
"strings"
|
||||
"time"
|
||||
|
||||
"github.com/chzyer/readline"
|
||||
"github.com/dustin/go-humanize"
|
||||
"github.com/olekukonko/tablewriter"
|
||||
"github.com/schollz/progressbar/v3"
|
||||
"github.com/spf13/cobra"
|
||||
"golang.org/x/term"
|
||||
|
||||
"github.com/jmorganca/ollama/api"
|
||||
"github.com/jmorganca/ollama/format"
|
||||
"github.com/jmorganca/ollama/progressbar"
|
||||
"github.com/jmorganca/ollama/server"
|
||||
)
|
||||
|
||||
func create(cmd *cobra.Command, args []string) error {
|
||||
func CreateHandler(cmd *cobra.Command, args []string) error {
|
||||
filename, _ := cmd.Flags().GetString("file")
|
||||
filename, err := filepath.Abs(filename)
|
||||
if err != nil {
|
||||
@@ -58,7 +59,7 @@ func create(cmd *cobra.Command, args []string) error {
|
||||
return nil
|
||||
}
|
||||
|
||||
func RunRun(cmd *cobra.Command, args []string) error {
|
||||
func RunHandler(cmd *cobra.Command, args []string) error {
|
||||
mp := server.ParseModelPath(args[0])
|
||||
fp, err := mp.GetManifestPath(false)
|
||||
if err != nil {
|
||||
@@ -68,7 +69,7 @@ func RunRun(cmd *cobra.Command, args []string) error {
|
||||
_, err = os.Stat(fp)
|
||||
switch {
|
||||
case errors.Is(err, os.ErrNotExist):
|
||||
if err := pull(args[0]); err != nil {
|
||||
if err := pull(args[0], false); err != nil {
|
||||
var apiStatusError api.StatusError
|
||||
if !errors.As(err, &apiStatusError) {
|
||||
return err
|
||||
@@ -85,12 +86,33 @@ func RunRun(cmd *cobra.Command, args []string) error {
|
||||
return RunGenerate(cmd, args)
|
||||
}
|
||||
|
||||
func push(cmd *cobra.Command, args []string) error {
|
||||
func PushHandler(cmd *cobra.Command, args []string) error {
|
||||
client := api.NewClient()
|
||||
|
||||
request := api.PushRequest{Name: args[0]}
|
||||
insecure, err := cmd.Flags().GetBool("insecure")
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
var currentDigest string
|
||||
var bar *progressbar.ProgressBar
|
||||
|
||||
request := api.PushRequest{Name: args[0], Insecure: insecure}
|
||||
fn := func(resp api.ProgressResponse) error {
|
||||
fmt.Println(resp.Status)
|
||||
if resp.Digest != currentDigest && resp.Digest != "" {
|
||||
currentDigest = resp.Digest
|
||||
bar = progressbar.DefaultBytes(
|
||||
int64(resp.Total),
|
||||
fmt.Sprintf("pushing %s...", resp.Digest[7:19]),
|
||||
)
|
||||
|
||||
bar.Set(resp.Completed)
|
||||
} else if resp.Digest == currentDigest && resp.Digest != "" {
|
||||
bar.Set(resp.Completed)
|
||||
} else {
|
||||
currentDigest = ""
|
||||
fmt.Println(resp.Status)
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
@@ -100,7 +122,7 @@ func push(cmd *cobra.Command, args []string) error {
|
||||
return nil
|
||||
}
|
||||
|
||||
func list(cmd *cobra.Command, args []string) error {
|
||||
func ListHandler(cmd *cobra.Command, args []string) error {
|
||||
client := api.NewClient()
|
||||
|
||||
models, err := client.List(context.Background())
|
||||
@@ -111,7 +133,9 @@ func list(cmd *cobra.Command, args []string) error {
|
||||
var data [][]string
|
||||
|
||||
for _, m := range models.Models {
|
||||
data = append(data, []string{m.Name, humanize.Bytes(uint64(m.Size)), format.HumanTime(m.ModifiedAt, "Never")})
|
||||
if len(args) == 0 || strings.HasPrefix(m.Name, args[0]) {
|
||||
data = append(data, []string{m.Name, humanize.Bytes(uint64(m.Size)), format.HumanTime(m.ModifiedAt, "Never")})
|
||||
}
|
||||
}
|
||||
|
||||
table := tablewriter.NewWriter(os.Stdout)
|
||||
@@ -128,17 +152,44 @@ func list(cmd *cobra.Command, args []string) error {
|
||||
return nil
|
||||
}
|
||||
|
||||
func RunPull(cmd *cobra.Command, args []string) error {
|
||||
return pull(args[0])
|
||||
func DeleteHandler(cmd *cobra.Command, args []string) error {
|
||||
client := api.NewClient()
|
||||
|
||||
req := api.DeleteRequest{Name: args[0]}
|
||||
if err := client.Delete(context.Background(), &req); err != nil {
|
||||
return err
|
||||
}
|
||||
fmt.Printf("deleted '%s'\n", args[0])
|
||||
return nil
|
||||
}
|
||||
|
||||
func pull(model string) error {
|
||||
func CopyHandler(cmd *cobra.Command, args []string) error {
|
||||
client := api.NewClient()
|
||||
|
||||
req := api.CopyRequest{Source: args[0], Destination: args[1]}
|
||||
if err := client.Copy(context.Background(), &req); err != nil {
|
||||
return err
|
||||
}
|
||||
fmt.Printf("copied '%s' to '%s'\n", args[0], args[1])
|
||||
return nil
|
||||
}
|
||||
|
||||
func PullHandler(cmd *cobra.Command, args []string) error {
|
||||
insecure, err := cmd.Flags().GetBool("insecure")
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
return pull(args[0], insecure)
|
||||
}
|
||||
|
||||
func pull(model string, insecure bool) error {
|
||||
client := api.NewClient()
|
||||
|
||||
var currentDigest string
|
||||
var bar *progressbar.ProgressBar
|
||||
|
||||
request := api.PullRequest{Name: model}
|
||||
request := api.PullRequest{Name: model, Insecure: insecure}
|
||||
fn := func(resp api.ProgressResponse) error {
|
||||
if resp.Digest != currentDigest && resp.Digest != "" {
|
||||
currentDigest = resp.Digest
|
||||
@@ -169,7 +220,7 @@ func RunGenerate(cmd *cobra.Command, args []string) error {
|
||||
return generate(cmd, args[0], strings.Join(args[1:], " "))
|
||||
}
|
||||
|
||||
if term.IsTerminal(int(os.Stdin.Fd())) {
|
||||
if readline.IsTerminal(int(os.Stdin.Fd())) {
|
||||
return generateInteractive(cmd, args[0])
|
||||
}
|
||||
|
||||
@@ -227,17 +278,111 @@ func generate(cmd *cobra.Command, model, prompt string) error {
|
||||
}
|
||||
|
||||
func generateInteractive(cmd *cobra.Command, model string) error {
|
||||
fmt.Print(">>> ")
|
||||
scanner := bufio.NewScanner(os.Stdin)
|
||||
for scanner.Scan() {
|
||||
if err := generate(cmd, model, scanner.Text()); err != nil {
|
||||
home, err := os.UserHomeDir()
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
completer := readline.NewPrefixCompleter(
|
||||
readline.PcItem("/help"),
|
||||
readline.PcItem("/list"),
|
||||
readline.PcItem("/set",
|
||||
readline.PcItem("history"),
|
||||
readline.PcItem("nohistory"),
|
||||
readline.PcItem("verbose"),
|
||||
readline.PcItem("quiet"),
|
||||
readline.PcItem("mode",
|
||||
readline.PcItem("vim"),
|
||||
readline.PcItem("emacs"),
|
||||
readline.PcItem("default"),
|
||||
),
|
||||
),
|
||||
readline.PcItem("/exit"),
|
||||
readline.PcItem("/bye"),
|
||||
)
|
||||
|
||||
usage := func() {
|
||||
fmt.Fprintln(os.Stderr, "commands:")
|
||||
fmt.Fprintln(os.Stderr, completer.Tree(" "))
|
||||
}
|
||||
|
||||
config := readline.Config{
|
||||
Prompt: ">>> ",
|
||||
HistoryFile: filepath.Join(home, ".ollama", "history"),
|
||||
AutoComplete: completer,
|
||||
}
|
||||
|
||||
scanner, err := readline.NewEx(&config)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
defer scanner.Close()
|
||||
|
||||
for {
|
||||
line, err := scanner.Readline()
|
||||
switch {
|
||||
case errors.Is(err, io.EOF):
|
||||
return nil
|
||||
case errors.Is(err, readline.ErrInterrupt):
|
||||
if line == "" {
|
||||
return nil
|
||||
}
|
||||
|
||||
continue
|
||||
case err != nil:
|
||||
return err
|
||||
}
|
||||
|
||||
fmt.Print(">>> ")
|
||||
}
|
||||
line = strings.TrimSpace(line)
|
||||
|
||||
return nil
|
||||
switch {
|
||||
case strings.HasPrefix(line, "/list"):
|
||||
args := strings.Fields(line)
|
||||
if err := ListHandler(cmd, args[1:]); err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
continue
|
||||
case strings.HasPrefix(line, "/set"):
|
||||
args := strings.Fields(line)
|
||||
if len(args) > 1 {
|
||||
switch args[1] {
|
||||
case "history":
|
||||
scanner.HistoryEnable()
|
||||
continue
|
||||
case "nohistory":
|
||||
scanner.HistoryDisable()
|
||||
continue
|
||||
case "verbose":
|
||||
cmd.Flags().Set("verbose", "true")
|
||||
continue
|
||||
case "quiet":
|
||||
cmd.Flags().Set("verbose", "false")
|
||||
continue
|
||||
case "mode":
|
||||
if len(args) > 2 {
|
||||
switch args[2] {
|
||||
case "vim":
|
||||
scanner.SetVimMode(true)
|
||||
continue
|
||||
case "emacs", "default":
|
||||
scanner.SetVimMode(false)
|
||||
continue
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
case line == "/help", line == "/?":
|
||||
usage()
|
||||
continue
|
||||
case line == "/exit", line == "/bye":
|
||||
return nil
|
||||
}
|
||||
|
||||
if err := generate(cmd, model, line); err != nil {
|
||||
return err
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func generateBatch(cmd *cobra.Command, model string) error {
|
||||
@@ -290,7 +435,7 @@ func NewCLI() *cobra.Command {
|
||||
Use: "create MODEL",
|
||||
Short: "Create a model from a Modelfile",
|
||||
Args: cobra.MinimumNArgs(1),
|
||||
RunE: create,
|
||||
RunE: CreateHandler,
|
||||
}
|
||||
|
||||
createCmd.Flags().StringP("file", "f", "Modelfile", "Name of the Modelfile (default \"Modelfile\")")
|
||||
@@ -299,7 +444,7 @@ func NewCLI() *cobra.Command {
|
||||
Use: "run MODEL [PROMPT]",
|
||||
Short: "Run a model",
|
||||
Args: cobra.MinimumNArgs(1),
|
||||
RunE: RunRun,
|
||||
RunE: RunHandler,
|
||||
}
|
||||
|
||||
runCmd.Flags().Bool("verbose", false, "Show timings for response")
|
||||
@@ -315,20 +460,39 @@ func NewCLI() *cobra.Command {
|
||||
Use: "pull MODEL",
|
||||
Short: "Pull a model from a registry",
|
||||
Args: cobra.MinimumNArgs(1),
|
||||
RunE: RunPull,
|
||||
RunE: PullHandler,
|
||||
}
|
||||
|
||||
pullCmd.Flags().Bool("insecure", false, "Use an insecure registry")
|
||||
|
||||
pushCmd := &cobra.Command{
|
||||
Use: "push MODEL",
|
||||
Short: "Push a model to a registry",
|
||||
Args: cobra.MinimumNArgs(1),
|
||||
RunE: push,
|
||||
RunE: PushHandler,
|
||||
}
|
||||
|
||||
pushCmd.Flags().Bool("insecure", false, "Use an insecure registry")
|
||||
|
||||
listCmd := &cobra.Command{
|
||||
Use: "list",
|
||||
Short: "List models",
|
||||
RunE: list,
|
||||
Use: "list",
|
||||
Aliases: []string{"ls"},
|
||||
Short: "List models",
|
||||
RunE: ListHandler,
|
||||
}
|
||||
|
||||
copyCmd := &cobra.Command{
|
||||
Use: "cp",
|
||||
Short: "Copy a model",
|
||||
Args: cobra.MinimumNArgs(2),
|
||||
RunE: CopyHandler,
|
||||
}
|
||||
|
||||
deleteCmd := &cobra.Command{
|
||||
Use: "rm",
|
||||
Short: "Remove a model",
|
||||
Args: cobra.MinimumNArgs(1),
|
||||
RunE: DeleteHandler,
|
||||
}
|
||||
|
||||
rootCmd.AddCommand(
|
||||
@@ -338,6 +502,8 @@ func NewCLI() *cobra.Command {
|
||||
pullCmd,
|
||||
pushCmd,
|
||||
listCmd,
|
||||
copyCmd,
|
||||
deleteCmd,
|
||||
)
|
||||
|
||||
return rootCmd
|
||||
|
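The command dispatch added to `generateInteractive` is easier to follow when separated from the readline loop. The sketch below is a simplified reading of that `switch`, not code from this change: `classify` is a hypothetical helper that only names the action the loop would take. As far as the diff shows, a `/set` line with an unrecognized argument does not `continue`, so it falls out of the `switch` and is sent to the model as a prompt; the sketch reproduces that behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// classify names the action the interactive loop would take for a line.
// It is a hypothetical helper written to mirror the switch in cmd/cmd.go.
func classify(line string) string {
	line = strings.TrimSpace(line)

	switch {
	case strings.HasPrefix(line, "/list"):
		return "list"
	case strings.HasPrefix(line, "/set"):
		args := strings.Fields(line)
		if len(args) > 1 {
			switch args[1] {
			case "history", "nohistory", "verbose", "quiet":
				return "set " + args[1]
			case "mode":
				if len(args) > 2 {
					switch args[2] {
					case "vim":
						return "set mode vim"
					case "emacs", "default":
						return "set mode emacs"
					}
				}
			}
		}
		// Unrecognized /set arguments fall through to "generate" below.
	case line == "/help", line == "/?":
		return "help"
	case line == "/exit", line == "/bye":
		return "exit"
	}

	return "generate"
}

func main() {
	for _, line := range []string{"/list", "/set verbose", "/set mode default", "/set bogus", "/?", "/bye", "Why is the sky blue?"} {
		fmt.Printf("%q -> %s\n", line, classify(line))
	}
}
```

If sending unrecognized `/set` input to the model is not intended, one option would be to print the usage text and `continue` at the end of the `/set` case.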
@@ -5,7 +5,7 @@ import (
|
||||
"os"
|
||||
"time"
|
||||
|
||||
"github.com/schollz/progressbar/v3"
|
||||
"github.com/jmorganca/ollama/progressbar"
|
||||
)
|
||||
|
||||
type Spinner struct {
|
||||
|
@@ -6,6 +6,12 @@ Install required tools:
|
||||
brew install go
|
||||
```
|
||||
|
||||
Enable CGO:
|
||||
|
||||
```
|
||||
export CGO_ENABLED=1
|
||||
```
|
||||
|
||||
Then build ollama:
|
||||
|
||||
```
|
||||
|
@@ -1,80 +1,105 @@
|
||||
# Ollama Model File Reference
|
||||
# Ollama Model File
|
||||
|
||||
Ollama can build models automatically by reading the instructions from a Modelfile. A Modelfile is a text document that represents the complete configuration of the Model. You can see that a Modelfile is very similar to a Dockerfile.
|
||||
> Note: this model file syntax is in development
|
||||
|
||||
A model file is the blueprint to create and share models with Ollama.
|
||||
|
||||
## Format
|
||||
|
||||
Here is the format of the Modelfile:
|
||||
The format of the Modelfile:
|
||||
|
||||
```modelfile
|
||||
# comment
|
||||
INSTRUCTION arguments
|
||||
```
|
||||
|
||||
Nothing in the file is case-sensitive. However, the convention is for instructions to be uppercase to make it easier to distinguish from the arguments.
|
||||
| Instruction | Description |
|
||||
| ----------------- | ----------------------------------------------------- |
|
||||
| `FROM` (required) | Defines the base model to use |
|
||||
| `PARAMETER` | Sets the parameters for how Ollama will run the model |
|
||||
| `SYSTEM` | Specifies the system prompt that will set the context |
|
||||
| `TEMPLATE` | The full prompt template to be sent to the model |
|
||||
| `LICENSE` | Specifies the legal license |
|
||||
|
||||
A Modelfile can include instructions in any order. But the convention is to start the Modelfile with the FROM instruction.
|
||||
## Examples
|
||||
|
||||
Although the example above shows a comment starting with a hash character, any instruction that is not recognized is seen as a comment.
|
||||
An example of a model file that creates a Mario blueprint:
|
||||
|
||||
## FROM
|
||||
```
|
||||
FROM llama2
|
||||
# sets the temperature to 1 [higher is more creative, lower is more coherent]
|
||||
# sets the context size to 4096
|
||||
PARAMETER temperature 1
|
||||
PARAMETER num_ctx 4096
|
||||
|
||||
```modelfile
|
||||
FROM <image>[:<tag>]
|
||||
# Overriding the system prompt
|
||||
SYSTEM You are Mario from super mario bros, acting as an assistant.
|
||||
```
|
||||
|
||||
This defines the base model to be used. An image can be a known image on the Ollama Hub, or a fully-qualified path to a model file on your system
|
||||
To use this:
|
||||
|
||||
## PARAMETER
|
||||
1. Save it as a file (e.g. `Modelfile`)
|
||||
2. `ollama create NAME -f <location of the file, e.g. ./Modelfile>`
|
||||
3. `ollama run NAME`
|
||||
4. Start using the model!
|
||||
|
||||
The PARAMETER instruction defines a parameter that can be set when the model is run.
|
||||
## FROM (Required)
|
||||
|
||||
```modelfile
|
||||
The FROM instruction defines the base model to use when creating a model.
|
||||
|
||||
```
|
||||
FROM <model name>:<tag>
|
||||
```
|
||||
|
||||
### Build from llama2
|
||||
|
||||
```
|
||||
FROM llama2
|
||||
```
|
||||
|
||||
A list of available base models:
|
||||
<https://github.com/jmorganca/ollama#model-library>
|
||||
|
||||
### Build from a bin file
|
||||
|
||||
```
|
||||
FROM ./ollama-model.bin
|
||||
```
|
||||
|
||||
## PARAMETER (Optional)
|
||||
|
||||
The `PARAMETER` instruction defines a parameter that can be set when the model is run.
|
||||
|
||||
```
|
||||
PARAMETER <parameter> <parametervalue>
|
||||
```
|
||||
|
||||
### Valid Parameters and Values
|
||||
|
||||
| Parameter | Description | Value Type | Value Range |
|
||||
| ---------------- | ------------------------------------------------------------------------------------------- | ---------- | ----------- |
|
||||
| NumCtx | | int | |
|
||||
| NumGPU | | int | |
|
||||
| MainGPU | | int | |
|
||||
| LowVRAM | | bool | |
|
||||
| F16KV | | bool | |
|
||||
| LogitsAll | | bool | |
|
||||
| VocabOnly | | bool | |
|
||||
| UseMMap | | bool | |
|
||||
| EmbeddingOnly | | bool | |
|
||||
| RepeatLastN | | int | |
|
||||
| RepeatPenalty | | float | |
|
||||
| FrequencyPenalty | | float | |
|
||||
| PresencePenalty | | float | |
|
||||
| temperature | The temperature of the model. Higher temperatures result in more creativity in the response | float | 0 - 1 |
|
||||
| TopK | | int | |
|
||||
| TopP | | float | |
|
||||
| TFSZ | | float | |
|
||||
| TypicalP | | float | |
|
||||
| Mirostat | | int | |
|
||||
| MirostatTau | | float | |
|
||||
| MirostatEta | | float | |
|
||||
| NumThread | | int | |
|
||||
| Parameter      | Description | Value Type | Example Usage      |
| -------------- | ----------- | ---------- | ------------------ |
| num_ctx        | Sets the size of the prompt context window, in tokens. (Default: 2048) | int | num_ctx 4096 |
| temperature    | The temperature of the model. Increasing the temperature will make the model answer more creatively. (Default: 0.8) | float | temperature 0.7 |
| top_k          | Reduces the probability of generating nonsense. A higher value (e.g., 100) will give more diverse answers, while a lower value (e.g., 10) will be more conservative. (Default: 40) | int | top_k 40 |
| top_p          | Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9) | float | top_p 0.9 |
| num_gpu        | The number of GPUs to use. On macOS it defaults to 1 to enable Metal support; set it to 0 to disable. | int | num_gpu 1 |
| repeat_last_n  | Sets how far back the model looks to prevent repetition. (Default: 64, 0 = disabled, -1 = ctx-size) | int | repeat_last_n 64 |
| repeat_penalty | Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1) | float | repeat_penalty 1.1 |
| tfs_z          | Tail free sampling is used to reduce the impact of less probable tokens on the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting. (Default: 1) | float | tfs_z 1 |
| mirostat       | Enables Mirostat sampling for controlling perplexity. (Default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0) | int | mirostat 0 |
| mirostat_tau   | Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. (Default: 5.0) | float | mirostat_tau 5.0 |
| mirostat_eta   | Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive. (Default: 0.1) | float | mirostat_eta 0.1 |
| num_thread     | Sets the number of threads to use during computation. By default, Ollama will detect this for optimal performance. It is recommended to set this value to the number of physical CPU cores your system has (as opposed to the logical number of cores). | int | num_thread 8 |
|
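To make the `PARAMETER <parameter> <parametervalue>` syntax and the value types in the table concrete, here is a minimal Python sketch that reads `PARAMETER` lines and converts each value to the type listed above. It is an illustration only, not Ollama's parser: `PARAM_TYPES` and `parse_parameters` are hypothetical names, and the sketch assumes each instruction fits on one line with a single-token value.

```python
# Hypothetical type map, copied from the "Value Type" column of the table.
PARAM_TYPES = {
    "num_ctx": int, "temperature": float, "top_k": int, "top_p": float,
    "num_gpu": int, "repeat_last_n": int, "repeat_penalty": float,
    "tfs_z": float, "mirostat": int, "mirostat_tau": float,
    "mirostat_eta": float, "num_thread": int,
}


def parse_parameters(modelfile_text):
    """Collect PARAMETER lines into a dict of typed values."""
    params = {}
    for line in modelfile_text.splitlines():
        parts = line.split()
        # Instruction keywords are matched case-insensitively.
        if len(parts) != 3 or parts[0].upper() != "PARAMETER":
            continue
        name, raw = parts[1], parts[2]
        if name not in PARAM_TYPES:
            raise ValueError(f"unknown parameter: {name}")
        # int("0.7") or float("abc") raises ValueError, which flags a bad value.
        params[name] = PARAM_TYPES[name](raw)
    return params


example = """
FROM llama2
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
"""
print(parse_parameters(example))
```

Lines that are not `PARAMETER` instructions (such as `FROM`) are skipped rather than rejected, so the sketch can be pointed at a whole Modelfile.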
||||
|
||||
## Prompt
|
||||
|
||||
## PROMPT
|
||||
The base models supplied by Ollama come with a predefined prompt template. To override the supplied system prompt, add a `SYSTEM` instruction to the Modelfile, for example `SYSTEM insert system prompt`.
|
||||
|
||||
`PROMPT` is a multiline instruction that defines the prompt to be used when the model is run. A prompt typically has 3-4 components: system, context, user, and response.
|
||||
### Prompt Template
|
||||
|
||||
```modelfile
|
||||
PROMPT """
|
||||
{{- if not .Context }}
|
||||
### System:
|
||||
You are a content marketer who needs to come up with a short but succinct tweet. Make sure to include the appropriate hashtags and links. Sometimes when appropriate, describe a meme that can be included as well. All answers should be in the form of a tweet which has a max size of 280 characters. Every instruction will be the topic to create a tweet about.
|
||||
{{- end }}
|
||||
### Instruction:
|
||||
{{ .Prompt }}
|
||||
`TEMPLATE`: the full prompt template to be passed into the model. It may optionally include a system prompt, user prompt, and assistant prompt. Use it to create a fully custom prompt; the syntax may be model-specific.
|
||||
|
||||
### Response:
|
||||
"""
|
||||
## Notes
|
||||
|
||||
```
|
||||
- The **Modelfile is not case sensitive**. In the examples, we use uppercase for instructions to make them easier to distinguish from arguments.
- Instructions can be in any order. In the examples, we start with the `FROM` instruction to keep them easily readable.
|
||||
|
@@ -1,7 +0,0 @@
|
||||
FROM llama2
|
||||
PARAMETER temperature 1
|
||||
PROMPT """
|
||||
System: You are Mario from super mario bros, acting as an assistant.
|
||||
User: {{ .Prompt }}
|
||||
Assistant:
|
||||
"""
|
5
examples/mario/Modelfile
Normal file
@@ -0,0 +1,5 @@
|
||||
FROM llama2
|
||||
PARAMETER temperature 1
|
||||
SYSTEM """
|
||||
You are Mario from super mario bros, acting as an assistant.
|
||||
"""
|
BIN
examples/mario/logo.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 446 KiB |
43
examples/mario/readme.md
Normal file
@@ -0,0 +1,43 @@
|
||||
<img src="logo.png" alt="image of Italian plumber" height="200"/>
|
||||
|
||||
# Example character: Mario
|
||||
|
||||
This example shows how to create a basic character using Llama2 as the base model.
|
||||
|
||||
To run this example:
|
||||
|
||||
1. Download the Modelfile
|
||||
2. `ollama pull llama2` to get the base model used in the model file.
|
||||
3. `ollama create NAME -f ./Modelfile`
|
||||
4. `ollama run NAME`
|
||||
|
||||
Ask it some questions like "Who are you?" or "Is Peach in trouble again?"
|
||||
|
||||
## Editing this file
|
||||
|
||||
What the model file looks like:
|
||||
|
||||
```
|
||||
FROM llama2
|
||||
PARAMETER temperature 1
|
||||
SYSTEM """
|
||||
You are Mario from Super Mario Bros, acting as an assistant.
|
||||
"""
|
||||
```
|
||||
|
||||
What if you want to change its behaviour?
|
||||
|
||||
- Try changing the prompt
|
||||
- Try changing the parameters (see the [docs](https://github.com/jmorganca/ollama/blob/main/docs/modelfile.md))
- Try changing the model (e.g. `FROM wizard-vicuna` switches to the uncensored wizard-vicuna model)
|
||||
|
||||
Once the changes are made:
|
||||
|
||||
1. `ollama create NAME -f ./Modelfile`
|
||||
2. `ollama run NAME`
|
||||
3. Iterate until you are happy with the results.
|
||||
|
||||
Notes:
|
||||
|
||||
- This example is for research purposes only. There is no affiliation with any entity.
|
||||
- When using an uncensored model, please be aware that it may generate offensive content.
|
@@ -1,14 +1,8 @@
|
||||
# Modelfile for creating Midjourney prompts from a topic
|
||||
# Run `ollama create mj -f pathtofile` and then `ollama run mj` and enter a topic
|
||||
# This prompt was adapted from the original at https://www.greataiprompts.com/guide/midjourney/best-chatgpt-prompt-for-midjourney/
|
||||
# Run `ollama create mj -f ./Modelfile` and then `ollama run mj` and enter a topic
|
||||
|
||||
FROM library/nous-hermes:latest
|
||||
PROMPT """
|
||||
{{- if not .Context }}
|
||||
### System:
|
||||
FROM nous-hermes
|
||||
SYSTEM """
|
||||
Embrace your role as an AI-powered creative assistant, employing Midjourney to manifest compelling AI-generated art. I will outline a specific image concept, and in response, you must produce an exhaustive, multifaceted prompt for Midjourney, ensuring every detail of the original concept is represented in your instructions. Midjourney doesn't do well with text, so after the prompt, give me instructions that I can use to create the titles in an image editor.
|
||||
{{- end }}
|
||||
### Instruction:
|
||||
{{ .Prompt }}
|
||||
|
||||
### Response:
|
||||
"""
|
||||
"""
|
@@ -1,13 +1,6 @@
|
||||
# Modelfile for creating a recipe from a list of ingredients
|
||||
# Run `ollama create recipemaker -f pathtofile` and then `ollama run recipemaker` and feed it lists of ingredients to create recipes around.
|
||||
FROM library/nous-hermes:latest
|
||||
PROMPT """
|
||||
{{- if not .Context }}
|
||||
### System:
|
||||
# Run `ollama create recipemaker -f ./Modelfile` and then `ollama run recipemaker` and feed it lists of ingredients to create recipes around.
|
||||
FROM nous-hermes
|
||||
SYSTEM """
|
||||
The instruction will be a list of ingredients. You should generate a recipe that can be made in less than an hour. You can also include ingredients that most people will find in their pantry every day. The recipe should serve 4 people, and you should include a description of what the meal will taste like.
|
||||
{{- end }}
|
||||
### Instruction:
|
||||
{{ .Prompt }}
|
||||
|
||||
### Response:
|
||||
"""
|
@@ -1,14 +1,7 @@
|
||||
# Modelfile for creating a tweet from a topic
|
||||
# Run `ollama create tweetwriter -f pathtofile` and then `ollama run tweetwriter` and enter a topic
|
||||
# Run `ollama create tweetwriter -f ./Modelfile` and then `ollama run tweetwriter` and enter a topic
|
||||
|
||||
FROM library/nous-hermes:latest
|
||||
PROMPT """
|
||||
{{- if not .Context }}
|
||||
### System:
|
||||
FROM nous-hermes
|
||||
SYSTEM """
|
||||
You are a content marketer who needs to come up with a short but succinct tweet. Make sure to include the appropriate hashtags and links. Sometimes when appropriate, describe a meme that can be included as well. All answers should be in the form of a tweet which has a max size of 280 characters. Every instruction will be the topic to create a tweet about.
|
||||
{{- end }}
|
||||
### Instruction:
|
||||
{{ .Prompt }}
|
||||
|
||||
### Response:
|
||||
"""
|
||||
"""
|
11
go.mod
@@ -5,21 +5,21 @@ go 1.20
|
||||
require (
|
||||
github.com/dustin/go-humanize v1.0.1
|
||||
github.com/gin-gonic/gin v1.9.1
|
||||
github.com/mattn/go-runewidth v0.0.14
|
||||
github.com/mitchellh/colorstring v0.0.0-20190213212951-d06e56a500db
|
||||
github.com/olekukonko/tablewriter v0.0.5
|
||||
github.com/spf13/cobra v1.7.0
|
||||
)
|
||||
|
||||
require (
|
||||
github.com/mattn/go-runewidth v0.0.14 // indirect
|
||||
github.com/mitchellh/colorstring v0.0.0-20190213212951-d06e56a500db // indirect
|
||||
github.com/rivo/uniseg v0.2.0 // indirect
|
||||
)
|
||||
require github.com/rivo/uniseg v0.2.0 // indirect
|
||||
|
||||
require (
|
||||
dario.cat/mergo v1.0.0
|
||||
github.com/bytedance/sonic v1.9.1 // indirect
|
||||
github.com/chenzhuoyu/base64x v0.0.0-20221115062448-fe3a3abad311 // indirect
|
||||
github.com/chzyer/readline v1.5.1
|
||||
github.com/gabriel-vasile/mimetype v1.4.2 // indirect
|
||||
github.com/gin-contrib/cors v1.4.0
|
||||
github.com/gin-contrib/sse v0.1.0 // indirect
|
||||
github.com/go-playground/locales v0.14.1 // indirect
|
||||
github.com/go-playground/universal-translator v0.18.1 // indirect
|
||||
@@ -34,7 +34,6 @@ require (
|
||||
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
|
||||
github.com/modern-go/reflect2 v1.0.2 // indirect
|
||||
github.com/pelletier/go-toml/v2 v2.0.8 // indirect
|
||||
github.com/schollz/progressbar/v3 v3.13.1
|
||||
github.com/spf13/pflag v1.0.5 // indirect
|
||||
github.com/twitchyliquid64/golang-asm v0.15.1 // indirect
|
||||
github.com/ugorji/go/codec v1.2.11 // indirect
|
||||
|
58
go.sum
@@ -6,7 +6,14 @@ github.com/bytedance/sonic v1.9.1/go.mod h1:i736AoUSYt75HyZLoJW9ERYxcy6eaN6h4BZX
|
||||
github.com/chenzhuoyu/base64x v0.0.0-20211019084208-fb5309c8db06/go.mod h1:DH46F32mSOjUmXrMHnKwZdA8wcEefY7UVqBKYGjpdQY=
|
||||
github.com/chenzhuoyu/base64x v0.0.0-20221115062448-fe3a3abad311 h1:qSGYFH7+jGhDF8vLC+iwCD4WpbV1EBDSzWkJODFLams=
|
||||
github.com/chenzhuoyu/base64x v0.0.0-20221115062448-fe3a3abad311/go.mod h1:b583jCggY9gE99b6G5LEC39OIiVsWj+R97kbl5odCEk=
|
||||
github.com/chzyer/logex v1.2.1 h1:XHDu3E6q+gdHgsdTPH6ImJMIp436vR6MPtH8gP05QzM=
|
||||
github.com/chzyer/logex v1.2.1/go.mod h1:JLbx6lG2kDbNRFnfkgvh4eRJRPX1QCoOIWomwysCBrQ=
|
||||
github.com/chzyer/readline v1.5.1 h1:upd/6fQk4src78LMRzh5vItIt361/o4uq553V8B5sGI=
|
||||
github.com/chzyer/readline v1.5.1/go.mod h1:Eh+b79XXUwfKfcPLepksvw2tcLE/Ct21YObkaSkeBlk=
|
||||
github.com/chzyer/test v1.0.0 h1:p3BQDXSxOhOG0P9z6/hGnII4LGiEPOYBhs8asl/fC04=
|
||||
github.com/chzyer/test v1.0.0/go.mod h1:2JlltgoNkt4TW/z9V/IzDdFaMTM2JPIi26O1pF38GC8=
|
||||
github.com/cpuguy83/go-md2man/v2 v2.0.2/go.mod h1:tgQtvFlXSQOSOSIRvRPT7W67SCa46tRHOmNcaadrF8o=
|
||||
github.com/creack/pty v1.1.9/go.mod h1:oKZEueFk5CKHvIhNR5MUki03XCEU+Q6VDXinZuGJ33E=
|
||||
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
|
||||
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
|
||||
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
|
||||
@@ -14,17 +21,25 @@ github.com/dustin/go-humanize v1.0.1 h1:GzkhY7T5VNhEkwH0PVJgjz+fX1rhBrR7pRT3mDkp
|
||||
github.com/dustin/go-humanize v1.0.1/go.mod h1:Mu1zIs6XwVuF/gI1OepvI0qD18qycQx+mFykh5fBlto=
|
||||
github.com/gabriel-vasile/mimetype v1.4.2 h1:w5qFW6JKBz9Y393Y4q372O9A7cUSequkh1Q7OhCmWKU=
|
||||
github.com/gabriel-vasile/mimetype v1.4.2/go.mod h1:zApsH/mKG4w07erKIaJPFiX0Tsq9BFQgN3qGY5GnNgA=
|
||||
github.com/gin-contrib/cors v1.4.0 h1:oJ6gwtUl3lqV0WEIwM/LxPF1QZ5qe2lGWdY2+bz7y0g=
|
||||
github.com/gin-contrib/cors v1.4.0/go.mod h1:bs9pNM0x/UsmHPBWT2xZz9ROh8xYjYkiURUfmBoMlcs=
|
||||
github.com/gin-contrib/sse v0.1.0 h1:Y/yl/+YNO8GZSjAhjMsSuLt29uWRFHdHYUb5lYOV9qE=
|
||||
github.com/gin-contrib/sse v0.1.0/go.mod h1:RHrZQHXnP2xjPF+u1gW/2HnVO7nvIa9PG3Gm+fLHvGI=
|
||||
github.com/gin-gonic/gin v1.8.1/go.mod h1:ji8BvRH1azfM+SYow9zQ6SZMvR8qOMZHmsCuWR9tTTk=
|
||||
github.com/gin-gonic/gin v1.9.1 h1:4idEAncQnU5cB7BeOkPtxjfCSye0AAm1R0RVIqJ+Jmg=
|
||||
github.com/gin-gonic/gin v1.9.1/go.mod h1:hPrL7YrpYKXt5YId3A/Tnip5kqbEAP+KLuI3SUcPTeU=
|
||||
github.com/go-playground/assert/v2 v2.0.1/go.mod h1:VDjEfimB/XKnb+ZQfWdccd7VUvScMdVu0Titje2rxJ4=
|
||||
github.com/go-playground/assert/v2 v2.2.0 h1:JvknZsQTYeFEAhQwI4qEt9cyV5ONwRHC+lYKSsYSR8s=
|
||||
github.com/go-playground/locales v0.14.0/go.mod h1:sawfccIbzZTqEDETgFXqTho0QybSa7l++s0DH+LDiLs=
|
||||
github.com/go-playground/locales v0.14.1 h1:EWaQ/wswjilfKLTECiXz7Rh+3BjFhfDFKv/oXslEjJA=
|
||||
github.com/go-playground/locales v0.14.1/go.mod h1:hxrqLVvrK65+Rwrd5Fc6F2O76J/NuW9t0sjnWqG1slY=
|
||||
github.com/go-playground/universal-translator v0.18.0/go.mod h1:UvRDBj+xPUEGrFYl+lu/H90nyDXpg0fqeB/AQUGNTVA=
|
||||
github.com/go-playground/universal-translator v0.18.1 h1:Bcnm0ZwsGyWbCzImXv+pAJnYK9S473LQFuzCbDbfSFY=
|
||||
github.com/go-playground/universal-translator v0.18.1/go.mod h1:xekY+UJKNuX9WP91TpwSH2VMlDf28Uj24BCp08ZFTUY=
|
||||
github.com/go-playground/validator/v10 v10.10.0/go.mod h1:74x4gJWsvQexRdW8Pn3dXSGrTK4nAUsbPlLADvpJkos=
|
||||
github.com/go-playground/validator/v10 v10.14.0 h1:vgvQWe3XCz3gIeFDm/HnTIbj6UGmg/+t63MyGU2n5js=
|
||||
github.com/go-playground/validator/v10 v10.14.0/go.mod h1:9iXMNT7sEkjXb0I+enO7QXmzG6QCsPWY4zveKFVRSyU=
|
||||
github.com/goccy/go-json v0.9.7/go.mod h1:6MelG93GURQebXPDq3khkgXZkazVtN9CRI+MGFi0w8I=
|
||||
github.com/goccy/go-json v0.10.2 h1:CrxCmQqYDkv1z7lO7Wbh2HN93uovUHgrECaO5ZrCXAU=
|
||||
github.com/goccy/go-json v0.10.2/go.mod h1:6MelG93GURQebXPDq3khkgXZkazVtN9CRI+MGFi0w8I=
|
||||
github.com/golang/protobuf v1.5.0/go.mod h1:FsONVRAS9T7sI+LIUmWTfcYkHO4aIWwzhcaSAoJOfIk=
|
||||
@@ -36,13 +51,21 @@ github.com/inconshreveable/mousetrap v1.1.0 h1:wN+x4NVGpMsO7ErUn/mUI3vEoE6Jt13X2
|
||||
github.com/inconshreveable/mousetrap v1.1.0/go.mod h1:vpF70FUmC8bwa3OWnCshd2FqLfsEA9PFc4w1p2J65bw=
|
||||
github.com/json-iterator/go v1.1.12 h1:PV8peI4a0ysnczrg+LtxykD8LfKY9ML6u2jnxaEnrnM=
|
||||
github.com/json-iterator/go v1.1.12/go.mod h1:e30LSqwooZae/UwlEbR2852Gd8hjQvJoHmT4TnhNGBo=
|
||||
github.com/k0kubun/go-ansi v0.0.0-20180517002512-3bf9e2903213/go.mod h1:vNUNkEQ1e29fT/6vq2aBdFsgNPmy8qMdSay1npru+Sw=
|
||||
github.com/klauspost/cpuid/v2 v2.0.9/go.mod h1:FInQzS24/EEf25PyTYn52gqo7WaD8xa0213Md/qVLRg=
|
||||
github.com/klauspost/cpuid/v2 v2.2.4 h1:acbojRNwl3o09bUq+yDCtZFc1aiwaAAxtcn8YkZXnvk=
|
||||
github.com/klauspost/cpuid/v2 v2.2.4/go.mod h1:RVVoqg1df56z8g3pUjL/3lE5UfnlrJX8tyFgg4nqhuY=
|
||||
github.com/kr/pretty v0.1.0/go.mod h1:dAy3ld7l9f0ibDNOQOHHMYYIIbhfbHSm3C4ZsoJORNo=
|
||||
github.com/kr/pretty v0.2.1/go.mod h1:ipq/a2n7PKx3OHsz4KJII5eveXtPO4qwEXGdVfWzfnI=
|
||||
github.com/kr/pretty v0.3.0 h1:WgNl7dwNpEZ6jJ9k1snq4pZsg7DOEN8hP9Xw0Tsjwk0=
|
||||
github.com/kr/pretty v0.3.0/go.mod h1:640gp4NfQd8pI5XOwp5fnNeVWj67G7CFk/SaSQn7NBk=
|
||||
github.com/kr/pty v1.1.1/go.mod h1:pFQYn66WHrOpPYNljwOMqo10TkYh1fy3cYio2l3bCsQ=
|
||||
github.com/kr/text v0.1.0/go.mod h1:4Jbv+DJW3UT/LiOwJeYQe1efqtUx/iVham/4vfdArNI=
|
||||
github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY=
|
||||
github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE=
|
||||
github.com/leodido/go-urn v1.2.1/go.mod h1:zt4jvISO2HfUBqxjfIshjdMTYS56ZS/qv49ictyFfxY=
|
||||
github.com/leodido/go-urn v1.2.4 h1:XlAE/cm/ms7TE/VMVoduSpNBoyc2dOxHs5MZSwAN63Q=
|
||||
github.com/leodido/go-urn v1.2.4/go.mod h1:7ZrI8mTSeBSHl/UaRyKQW1qZeMgak41ANeCNaVckg+4=
|
||||
github.com/mattn/go-isatty v0.0.17/go.mod h1:kYGgaQfpe5nmfYZH+SKPsOc2e4SrIfOl2e/yFXSvRLM=
|
||||
github.com/mattn/go-isatty v0.0.14/go.mod h1:7GGIvUiUoEMVVmxf/4nioHXj79iQHKdU27kJ6hsGG94=
|
||||
github.com/mattn/go-isatty v0.0.19 h1:JITubQf0MOLdlGRuRq+jtsDlekdYPia9ZFsB8h/APPA=
|
||||
github.com/mattn/go-isatty v0.0.19/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y=
|
||||
github.com/mattn/go-runewidth v0.0.9/go.mod h1:H031xJmbD/WCDINGzjvQ9THkh0rPKHF+m2gUSrubnMI=
|
||||
@@ -57,15 +80,18 @@ github.com/modern-go/reflect2 v1.0.2 h1:xBagoLtFs94CBntxluKeaWgTMpvLxC4ur3nMaC9G
|
||||
github.com/modern-go/reflect2 v1.0.2/go.mod h1:yWuevngMOJpCy52FWWMvUC8ws7m/LJsjYzDa0/r8luk=
|
||||
github.com/olekukonko/tablewriter v0.0.5 h1:P2Ga83D34wi1o9J6Wh1mRuqd4mF/x/lgBS7N7AbDhec=
|
||||
github.com/olekukonko/tablewriter v0.0.5/go.mod h1:hPp6KlRPjbx+hW8ykQs1w3UBbZlj6HuIJcUGPhkA7kY=
|
||||
github.com/pelletier/go-toml/v2 v2.0.1/go.mod h1:r9LEWfGN8R5k0VXJ+0BkIe7MYkRdwZOjgMj2KwnJFUo=
|
||||
github.com/pelletier/go-toml/v2 v2.0.8 h1:0ctb6s9mE31h0/lhu+J6OPmVeDxJn+kYnJc2jZR9tGQ=
|
||||
github.com/pelletier/go-toml/v2 v2.0.8/go.mod h1:vuYfssBdrU2XDZ9bYydBu6t+6a6PYNcZljzZR9VXg+4=
|
||||
github.com/pkg/diff v0.0.0-20210226163009-20ebb0f2a09e/go.mod h1:pJLUxLENpZxwdsKMEsNbx1VGcRFpLqf3715MtcvvzbA=
|
||||
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
|
||||
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
|
||||
github.com/rivo/uniseg v0.2.0 h1:S1pD9weZBuJdFmowNwbpi7BJ8TNftyUImj/0WQi72jY=
|
||||
github.com/rivo/uniseg v0.2.0/go.mod h1:J6wj4VEh+S6ZtnVlnTBMWIodfgj8LQOQFoIToxlJtxc=
|
||||
github.com/rogpeppe/go-internal v1.6.1/go.mod h1:xXDCJY+GAPziupqXw64V24skbSoqbTEfhy4qGm1nDQc=
|
||||
github.com/rogpeppe/go-internal v1.8.0 h1:FCbCCtXNOY3UtUuHUYaghJg4y7Fd14rXifAYUAtL9R8=
|
||||
github.com/rogpeppe/go-internal v1.8.0/go.mod h1:WmiCO8CzOY8rg0OYDC4/i/2WRWAB6poM+XZ2dLUbcbE=
|
||||
github.com/russross/blackfriday/v2 v2.1.0/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM=
|
||||
github.com/schollz/progressbar/v3 v3.13.1 h1:o8rySDYiQ59Mwzy2FELeHY5ZARXZTVJC7iHD6PEFUiE=
|
||||
github.com/schollz/progressbar/v3 v3.13.1/go.mod h1:xvrbki8kfT1fzWzBT/UZd9L6GA+jdL7HAgq2RFnO6fQ=
|
||||
github.com/spf13/cobra v1.7.0 h1:hyqWnYt1ZQShIddO5kBpj3vu05/++x6tJ6dg8EC572I=
|
||||
github.com/spf13/cobra v1.7.0/go.mod h1:uLxZILRyS/50WlhOIKD7W6V5bgeIt+4sICxh6uRMrb0=
|
||||
github.com/spf13/pflag v1.0.5 h1:iy+VFUOCP1a+8yFto/drg2CJ5u0yRoB7fZw3DKv/JXA=
|
||||
@@ -74,6 +100,7 @@ github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+
|
||||
github.com/stretchr/objx v0.4.0/go.mod h1:YvHI0jy2hoMjB+UWwv71VJQ9isScKT/TqJzVSSt89Yw=
|
||||
github.com/stretchr/objx v0.5.0/go.mod h1:Yh+to48EsGEfYuaHDzXPcE3xhTkx73EhmCGUpEOglKo=
|
||||
github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI=
|
||||
github.com/stretchr/testify v1.6.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
|
||||
github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
|
||||
github.com/stretchr/testify v1.7.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
|
||||
github.com/stretchr/testify v1.8.0/go.mod h1:yNjHg4UonilssWZ8iaSj1OCr/vHnekPRkoO+kdMU+MU=
|
||||
@@ -83,32 +110,49 @@ github.com/stretchr/testify v1.8.3 h1:RP3t2pwF7cMEbC1dqtB6poj3niw/9gnV4Cjg5oW5gt
|
||||
github.com/stretchr/testify v1.8.3/go.mod h1:sz/lmYIOXD/1dqDmKjjqLyZ2RngseejIcXlSw2iwfAo=
|
||||
github.com/twitchyliquid64/golang-asm v0.15.1 h1:SU5vSMR7hnwNxj24w34ZyCi/FmDZTkS4MhqMhdFk5YI=
|
||||
github.com/twitchyliquid64/golang-asm v0.15.1/go.mod h1:a1lVb/DtPvCB8fslRZhAngC2+aY1QWCk3Cedj/Gdt08=
|
||||
github.com/ugorji/go v1.2.7/go.mod h1:nF9osbDWLy6bDVv/Rtoh6QgnvNDpmCalQV5urGCCS6M=
|
||||
github.com/ugorji/go/codec v1.2.7/go.mod h1:WGN1fab3R1fzQlVQTkfxVtIBhWDRqOviHU95kRgeqEY=
|
||||
github.com/ugorji/go/codec v1.2.11 h1:BMaWp1Bb6fHwEtbplGBGJ498wD+LKlNSl25MjdZY4dU=
|
||||
github.com/ugorji/go/codec v1.2.11/go.mod h1:UNopzCgEMSXjBc6AOMqYvWC1ktqTAfzJZUZgYf6w6lg=
|
||||
golang.org/x/arch v0.0.0-20210923205945-b76863e36670/go.mod h1:5om86z9Hs0C8fWVUuoMHwpExlXzs5Tkyp9hOrfG7pp8=
|
||||
golang.org/x/arch v0.3.0 h1:02VY4/ZcO/gBOH6PUaoiptASxtXU10jazRCP865E97k=
|
||||
golang.org/x/arch v0.3.0/go.mod h1:5om86z9Hs0C8fWVUuoMHwpExlXzs5Tkyp9hOrfG7pp8=
|
||||
golang.org/x/crypto v0.0.0-20210711020723-a769d52b0f97/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
|
||||
golang.org/x/crypto v0.10.0 h1:LKqV2xt9+kDzSTfOhx4FrkEBcMrAgHSYgzywV9zcGmM=
|
||||
golang.org/x/crypto v0.10.0/go.mod h1:o4eNf7Ede1fv+hwOwZsTHl9EsPFO6q6ZvYR8vYfY45I=
|
||||
golang.org/x/net v0.0.0-20210226172049-e18ecbb05110/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg=
|
||||
golang.org/x/net v0.10.0 h1:X2//UzNDwYmtCLn7To6G58Wr6f5ahEAQgKNzv9Y951M=
|
||||
golang.org/x/net v0.10.0/go.mod h1:0qNGK6F8kojg2nk9dLZ2mShWaEBan6FAoqfSigmmuDg=
|
||||
golang.org/x/sys v0.0.0-20201119102817-f84b799fce68/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
|
||||
golang.org/x/sys v0.0.0-20210615035016-665e8c7367d1/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||
golang.org/x/sys v0.0.0-20210630005230-0f9fa26af87c/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||
golang.org/x/sys v0.0.0-20210806184541-e5e7981a1069/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||
golang.org/x/sys v0.0.0-20220310020820-b874c991c1a5/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||
golang.org/x/sys v0.0.0-20220704084225-05e143d24a9e/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||
golang.org/x/sys v0.0.0-20220811171246-fbc7d0a398ab/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||
golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||
golang.org/x/sys v0.10.0 h1:SqMFp9UcQJZa+pmYuAKjd9xq1f0j5rLcDIk0mj4qAsA=
|
||||
golang.org/x/sys v0.10.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||
golang.org/x/term v0.6.0/go.mod h1:m6U89DPEgQRMq3DNkDClhWw02AUbt2daBVO4cn4Hv9U=
|
||||
golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
|
||||
golang.org/x/term v0.10.0 h1:3R7pNqamzBraeqj/Tj8qt1aQ2HpmlC+Cx/qL/7hn4/c=
|
||||
golang.org/x/term v0.10.0/go.mod h1:lpqdcUyK/oCiQxvxVrppt5ggO2KCZ5QblwqPnfZ6d5o=
|
||||
golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
|
||||
golang.org/x/text v0.3.6/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
|
||||
golang.org/x/text v0.10.0 h1:UpjohKhiEgNc0CSauXmwYftY1+LlaC75SJwh0SgCX58=
|
||||
golang.org/x/text v0.10.0/go.mod h1:TvPlkZtksWOMsz7fbANvkp4WM8x/WCo/om8BMLbz+aE=
|
||||
golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
|
||||
golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
|
||||
google.golang.org/protobuf v1.26.0-rc.1/go.mod h1:jlhhOSvTdKEhbULTjvd4ARK9grFBp09yW+WbY/TyQbw=
|
||||
google.golang.org/protobuf v1.28.0/go.mod h1:HV8QOd/L58Z+nl8r43ehVNZIU/HEI6OcFqwMG9pJV4I=
|
||||
google.golang.org/protobuf v1.30.0 h1:kPPoIgf3TsEvrm0PFe15JQ+570QVxYzEvvHqChK+cng=
|
||||
google.golang.org/protobuf v1.30.0/go.mod h1:HV8QOd/L58Z+nl8r43ehVNZIU/HEI6OcFqwMG9pJV4I=
|
||||
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
|
||||
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
|
||||
gopkg.in/check.v1 v1.0.0-20180628173108-788fd7840127/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
|
||||
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk=
|
||||
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EVd6muEfDQjcINNoR0C8j2r3qZ4Q=
|
||||
gopkg.in/errgo.v2 v2.1.0/go.mod h1:hNsd1EY+bozCKY1Ytp96fpM3vjJbqLJn88ws8XvfDNI=
|
||||
gopkg.in/yaml.v2 v2.4.0/go.mod h1:RDklbk79AGWmwhnvt/jBztapEOGDOx6ZbXqjP6csGnQ=
|
||||
gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
|
||||
gopkg.in/yaml.v3 v3.0.0-20210107192922-496545a6307b/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
|
||||
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
|
||||
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
|
||||
rsc.io/pdf v0.1.1/go.mod h1:n8OzWcQ6Sp37PL01nO98y4iUCRdTGarVfzxY20ICaU4=
|
||||
|
1
library/.gitignore
vendored
Normal file
@@ -0,0 +1 @@
|
||||
models
|
7
library/downloads
Normal file
@@ -0,0 +1,7 @@
|
||||
https://huggingface.co/TheBloke/orca_mini_3B-GGML/resolve/main/orca-mini-3b.ggmlv3.q4_0.bin e84705205f71dd55be7b24a778f248f0eda9999a125d313358c087e092d83148
|
||||
https://huggingface.co/TheBloke/Nous-Hermes-13B-GGML/resolve/main/nous-hermes-13b.ggmlv3.q4_0.bin d1735b93e1dc503f1045ccd6c8bd73277b18ba892befd1dc29e9b9a7822ed998
|
||||
https://huggingface.co/TheBloke/vicuna-7B-v1.3-GGML/resolve/main/vicuna-7b-v1.3.ggmlv3.q4_0.bin 23ce5ed290b56a19305178b9ada2c3d96036bd69a6c18304b6158eb6672d6c0f
|
||||
https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/resolve/main/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_0.bin 1f08b147a5bce41cfcbb3fd5d51ba765dea1786e15b5655ab69ba3a337a893b7
|
||||
https://huggingface.co/TheBloke/Llama-2-7B-GGML/resolve/main/llama-2-7b.ggmlv3.q4_0.bin bfa26d855e44629c4cf919985e90bd7fa03b77eea1676791519e39a4d45fd4d5
|
||||
https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q4_0.bin 8daa9615cce30c259a9555b1cc250d461d1bc69980a274b44d7eda0be78076d8
|
||||
https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin f79142715bc9539a2edbb4b253548db8b34fac22736593eeaa28555874476e30
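Each line of `library/downloads` pairs a model URL with a 64-character hex string. Assuming that second column is the SHA-256 digest of the downloaded file (the diff does not say so explicitly), a downloaded model could be checked with a sketch like the following. The function names `parse_downloads`, `sha256_of`, and `verify` are hypothetical and are not part of the repository.

```python
import hashlib


def parse_downloads(text):
    """Split each non-empty line into a (url, expected_digest) pair."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            url, digest = line.split()
            entries.append((url, digest.lower()))
    return entries


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so a multi-gigabyte model is never fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected_digest):
    """Return True when the file at `path` matches the expected hex digest."""
    return sha256_of(path) == expected_digest.lower()
```

A caller would download each URL returned by `parse_downloads` to a local path and then pass that path and the paired digest to `verify`, rejecting the file on a mismatch.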
|
147
library/modelfiles/llama2
Normal file
@@ -0,0 +1,147 @@
|
||||
FROM ../models/llama-2-7b-chat.ggmlv3.q4_0.bin
|
||||
|
||||
TEMPLATE """
|
||||
{{- if .First }}
|
||||
<<SYS>>
|
||||
{{ .System }}
|
||||
<</SYS>>
|
||||
{{- end }}
|
||||
|
||||
[INST] {{ .Prompt }} [/INST]
|
||||
"""
|
||||
|
||||
SYSTEM """
|
||||
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
|
||||
|
||||
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
|
||||
"""
|
||||
|
||||
LICENSE """
|
||||
Llama 2 Community License Agreement
|
||||
|
||||
Llama 2 Version Release Date: July 18, 2023
|
||||
|
||||
“Agreement” means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein.
|
||||
|
||||
“Documentation” means the specifications, manuals and documentation accompanying Llama 2 distributed by Meta at ai.meta.com/resources/models-and-libraries/llama-downloads/.
|
||||
|
||||
“Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.
|
||||
|
||||
“Llama 2” means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at ai.meta.com/resources/models-and-libraries/llama-downloads/.
|
||||
|
||||
“Llama Materials” means, collectively, Meta’s proprietary Llama 2 and Documentation (and any portion thereof) made available under this Agreement.
|
||||
|
||||
“Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland).
|
||||
|
||||
By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement.
|
||||
|
||||
1. License Rights and Redistribution.
|
||||
|
||||
a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials.
|
||||
|
||||
b. Redistribution and Use.
|
||||
|
||||
i. If you distribute or make the Llama Materials, or any derivative works thereof, available to a third party, you shall provide a copy of this Agreement to such third party.
|
||||
|
||||
ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you.
|
||||
|
||||
iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 2 is licensed under the LLAMA 2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.”
|
||||
|
||||
iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at https://ai.meta.com/llama/use-policy), which is hereby incorporated by reference into this Agreement.

v. You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Llama 2 or derivative works thereof).

2. Additional Commercial Terms. If, on the Llama 2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.

3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS.

4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.

5. Intellectual Property.

a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials.

b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications.

c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 2 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials.

6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement.

7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement.

"""

LICENSE """
Llama 2 Acceptable Use Policy

Meta is committed to promoting safe and fair use of its tools and features, including Llama 2. If you access or use Llama 2, you agree to this Acceptable Use Policy (“Policy”). The most recent copy of this policy can be found at ai.meta.com/llama/use-policy.

Prohibited Uses

We want everyone to use Llama 2 safely and responsibly. You agree you will not use, or allow others to use, Llama 2 to:

1. Violate the law or others’ rights, including to:

a. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as:

i. Violence or terrorism

ii. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material

iii. Human trafficking, exploitation, and sexual violence

iv. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials.

v. Sexual solicitation

vi. Any other criminal activity

b. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals

c. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services

d. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices

e. Collect, process, disclose, generate, or infer health, demographic, or other sensitive personal or private information about individuals without rights and consents required by applicable laws

f. Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama 2 Materials

g. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system

2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 2 related to the following:

a. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State

b. Guns and illegal weapons (including weapon development)

c. Illegal drugs and regulated/controlled substances

d. Operation of critical infrastructure, transportation technologies, or heavy machinery

e. Self-harm or harm to others, including suicide, cutting, and eating disorders

f. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual

3. Intentionally deceive or mislead others, including use of Llama 2 related to the following:

a. Generating, promoting, or furthering fraud or the creation or promotion of disinformation

b. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content

c. Generating, promoting, or further distributing spam

d. Impersonating another individual without consent, authorization, or legal right

e. Representing that the use of Llama 2 or outputs are human-generated

f. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement

4. Fail to appropriately disclose to end users any known dangers of your AI system

Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means:

Reporting issues with the model: github.com/facebookresearch/llama
Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback
Reporting bugs and security concerns: facebook.com/whitehat/info
Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama: LlamaUseReport@meta.com
"""
147
library/modelfiles/llama2_13b
Normal file
@@ -0,0 +1,147 @@
FROM ../models/llama-2-13b-chat.ggmlv3.q4_0.bin

TEMPLATE """
{{- if .First }}
<<SYS>>
{{ .System }}
<</SYS>>
{{- end }}

[INST] {{ .Prompt }} [/INST]
"""

SYSTEM """
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
"""

LICENSE """
Llama 2 Community License Agreement

Llama 2 Version Release Date: July 18, 2023

“Agreement” means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein.

“Documentation” means the specifications, manuals and documentation accompanying Llama 2 distributed by Meta at ai.meta.com/resources/models-and-libraries/llama-downloads/.

“Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.

“Llama 2” means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at ai.meta.com/resources/models-and-libraries/llama-downloads/.

“Llama Materials” means, collectively, Meta’s proprietary Llama 2 and Documentation (and any portion thereof) made available under this Agreement.

“Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland).

By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement.

1. License Rights and Redistribution.

a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials.

b. Redistribution and Use.

i. If you distribute or make the Llama Materials, or any derivative works thereof, available to a third party, you shall provide a copy of this Agreement to such third party.

ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you.

iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 2 is licensed under the LLAMA 2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.”

iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at https://ai.meta.com/llama/use-policy), which is hereby incorporated by reference into this Agreement.

v. You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Llama 2 or derivative works thereof).

2. Additional Commercial Terms. If, on the Llama 2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.

3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS.

4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.

5. Intellectual Property.

a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials.

b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications.

c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 2 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials.

6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement.

7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement.

"""

LICENSE """
Llama 2 Acceptable Use Policy

Meta is committed to promoting safe and fair use of its tools and features, including Llama 2. If you access or use Llama 2, you agree to this Acceptable Use Policy (“Policy”). The most recent copy of this policy can be found at ai.meta.com/llama/use-policy.

Prohibited Uses

We want everyone to use Llama 2 safely and responsibly. You agree you will not use, or allow others to use, Llama 2 to:

1. Violate the law or others’ rights, including to:

a. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as:

i. Violence or terrorism

ii. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material

iii. Human trafficking, exploitation, and sexual violence

iv. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials.

v. Sexual solicitation

vi. Any other criminal activity

b. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals

c. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services

d. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices

e. Collect, process, disclose, generate, or infer health, demographic, or other sensitive personal or private information about individuals without rights and consents required by applicable laws

f. Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama 2 Materials

g. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system

2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 2 related to the following:

a. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State

b. Guns and illegal weapons (including weapon development)

c. Illegal drugs and regulated/controlled substances

d. Operation of critical infrastructure, transportation technologies, or heavy machinery

e. Self-harm or harm to others, including suicide, cutting, and eating disorders

f. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual

3. Intentionally deceive or mislead others, including use of Llama 2 related to the following:

a. Generating, promoting, or furthering fraud or the creation or promotion of disinformation

b. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content

c. Generating, promoting, or further distributing spam

d. Impersonating another individual without consent, authorization, or legal right

e. Representing that the use of Llama 2 or outputs are human-generated

f. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement

4. Fail to appropriately disclose to end users any known dangers of your AI system

Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means:

Reporting issues with the model: github.com/facebookresearch/llama
Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback
Reporting bugs and security concerns: facebook.com/whitehat/info
Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama: LlamaUseReport@meta.com
"""
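The TEMPLATE block in the Llama 2 modelfiles above appears to use Go's `text/template` syntax. As a rough sketch under that assumption, the standalone program below renders the same template text for a first turn and for a later turn. The `promptVars` struct and the `render` helper are hypothetical names introduced only for illustration; how Ollama itself fills in `.First`, `.System`, and `.Prompt` is not shown in this diff.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// llama2Template mirrors the TEMPLATE body from the modelfile above.
const llama2Template = `
{{- if .First }}
<<SYS>>
{{ .System }}
<</SYS>>
{{- end }}

[INST] {{ .Prompt }} [/INST]
`

// promptVars is a hypothetical struct; its field names match the
// variables the template references (.First, .System, .Prompt).
type promptVars struct {
	First  bool
	System string
	Prompt string
}

// render parses the template and executes it against vars.
func render(vars promptVars) (string, error) {
	tmpl, err := template.New("llama2").Parse(llama2Template)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, vars); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	first, err := render(promptVars{First: true, System: "Be brief.", Prompt: "Hi"})
	if err != nil {
		panic(err)
	}
	later, err := render(promptVars{First: false, System: "Be brief.", Prompt: "And then?"})
	if err != nil {
		panic(err)
	}
	// %q makes the newlines in each rendered prompt visible.
	fmt.Printf("%q\n%q\n", first, later)
}
```

On the first turn the system prompt is wrapped in `<<SYS>>` and `<</SYS>>`; on later turns only the `[INST]` line is emitted. The `{{-` marker trims whitespace before an action, so the newline preceding `{{- if .First }}` and the one preceding `{{- end }}` are dropped from the output.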
147
library/modelfiles/llama2_7b
Normal file
@@ -0,0 +1,147 @@
FROM ../models/llama-2-7b-chat.ggmlv3.q4_0.bin

TEMPLATE """
{{- if .First }}
<<SYS>>
{{ .System }}
<</SYS>>
{{- end }}

[INST] {{ .Prompt }} [/INST]
"""

SYSTEM """
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
"""

LICENSE """
Llama 2 Community License Agreement

Llama 2 Version Release Date: July 18, 2023

“Agreement” means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein.

“Documentation” means the specifications, manuals and documentation accompanying Llama 2 distributed by Meta at ai.meta.com/resources/models-and-libraries/llama-downloads/.

“Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.

“Llama 2” means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at ai.meta.com/resources/models-and-libraries/llama-downloads/.

“Llama Materials” means, collectively, Meta’s proprietary Llama 2 and Documentation (and any portion thereof) made available under this Agreement.

“Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland).

By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement.

1. License Rights and Redistribution.

a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials.

b. Redistribution and Use.

i. If you distribute or make the Llama Materials, or any derivative works thereof, available to a third party, you shall provide a copy of this Agreement to such third party.

ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you.

iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 2 is licensed under the LLAMA 2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.”

iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at https://ai.meta.com/llama/use-policy), which is hereby incorporated by reference into this Agreement.

v. You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Llama 2 or derivative works thereof).

2. Additional Commercial Terms. If, on the Llama 2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.

3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS.

4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.

5. Intellectual Property.

a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials.

b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications.

c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 2 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials.

6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement.

7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement.

"""

LICENSE """
Llama 2 Acceptable Use Policy

Meta is committed to promoting safe and fair use of its tools and features, including Llama 2. If you access or use Llama 2, you agree to this Acceptable Use Policy (“Policy”). The most recent copy of this policy can be found at ai.meta.com/llama/use-policy.

Prohibited Uses

We want everyone to use Llama 2 safely and responsibly. You agree you will not use, or allow others to use, Llama 2 to:

1. Violate the law or others’ rights, including to:

a. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as:

i. Violence or terrorism

ii. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material

iii. Human trafficking, exploitation, and sexual violence

iv. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials.

v. Sexual solicitation

vi. Any other criminal activity

b. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals

c. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services

d. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices

e. Collect, process, disclose, generate, or infer health, demographic, or other sensitive personal or private information about individuals without rights and consents required by applicable laws

f. Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama 2 Materials

g. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system

2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 2 related to the following:

a. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State

b. Guns and illegal weapons (including weapon development)

c. Illegal drugs and regulated/controlled substances

d. Operation of critical infrastructure, transportation technologies, or heavy machinery

e. Self-harm or harm to others, including suicide, cutting, and eating disorders

f. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual

3. Intentionally deceive or mislead others, including use of Llama 2 related to the following:

a. Generating, promoting, or furthering fraud or the creation or promotion of disinformation

b. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content

c. Generating, promoting, or further distributing spam

d. Impersonating another individual without consent, authorization, or legal right

e. Representing that the use of Llama 2 or outputs are human-generated

f. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement

4. Fail to appropriately disclose to end users any known dangers of your AI system

Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means:

Reporting issues with the model: github.com/facebookresearch/llama
Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback
Reporting bugs and security concerns: facebook.com/whitehat/info
Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama: LlamaUseReport@meta.com
"""
7
library/modelfiles/nous-hermes
Normal file
@@ -0,0 +1,7 @@
FROM ../models/nous-hermes-13b.ggmlv3.q4_0.bin
TEMPLATE """
### Instruction:
{{ .Prompt }}

### Response:
"""
|
14
library/modelfiles/orca
Normal file
@@ -0,0 +1,14 @@
FROM ../models/orca-mini-3b.ggmlv3.q4_0.bin
TEMPLATE """
{{- if .First }}
### System:
{{ .System }}
{{- end }}

### User:
{{ .Prompt }}

### Response:
"""

SYSTEM """You are an AI assistant that follows instruction extremely well. Help as much as you can."""
|
11
library/modelfiles/vicuna
Normal file
@@ -0,0 +1,11 @@
FROM ../models/vicuna-7b-v1.3.ggmlv3.q4_0.bin
TEMPLATE """
{{ if .First }}
{{ .System }}
{{- end }}

USER: {{ .Prompt }}
ASSISTANT:
"""

SYSTEM """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions."""
|
5
library/modelfiles/wizard-vicuna
Normal file
@@ -0,0 +1,5 @@
FROM ../models/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_0.bin
TEMPLATE """
USER: {{ .Prompt }}
ASSISTANT:
"""
|
52
library/publish.sh
Executable file
@@ -0,0 +1,52 @@
#!/bin/bash

mkdir -p models

# download binaries
function process_line {
  local url=$1
  local checksum=$2

  # Get the filename from the URL
  local filename=models/$(basename $url)

  echo "verifying $filename..."

  # If the file exists, compute its checksum
  if [ -f $filename ]; then
    local existing_checksum=$(shasum -a 256 $filename | cut -d ' ' -f1)
  fi

  # If the file does not exist, or its checksum does not match, download it
  if [ ! -f $filename ] || [ $existing_checksum != $checksum ]; then
    echo "downloading $filename..."

    # Download the file
    curl -L $url -o $filename

    # Compute the SHA256 hash of the downloaded file
    local computed_checksum=$(shasum -a 256 $filename | cut -d ' ' -f1)

    # Verify the checksum
    if [ $computed_checksum != $checksum ]; then
      echo "Checksum verification failed for $filename"
      exit 1
    fi
  fi
}

while IFS=' ' read -r url checksum
do
  process_line $url $checksum
done < "downloads"

# create and publish the models
for file in modelfiles/*; do
  if [ -f "$file" ]; then
    filename=$(basename "$file")
    echo $filename
    ollama create "library/${filename}" -f "$file"
    ollama push "${filename}"
  fi
done
|
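The verify-then-download logic in publish.sh can be sketched in Python as follows. This is only an illustration of the same check, not part of the commit: the helper names `sha256_of_file` and `needs_download` are hypothetical, and the sketch assumes the expected checksum is a lowercase hex SHA-256 string, as `shasum -a 256` prints it.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(path, expected_checksum):
    """True when the file is missing or its digest differs from the expected one.

    Mirrors the shell condition: [ ! -f file ] || [ existing != expected ].
    """
    if not os.path.isfile(path):
        return True
    return sha256_of_file(path) != expected_checksum
```

A caller would download the file when `needs_download` returns true and then compare `sha256_of_file` against the expected value once more, which is the second check the script performs before exiting with status 1.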
File diff suppressed because it is too large
@@ -1,5 +1,5 @@
|
||||
/**
|
||||
* llama.cpp - git 5bf2a2771886ee86137e01dbc7492f78fb392066
|
||||
* llama.cpp - git e782c9e735f93ab4767ffc37462c523b73a17ddc
|
||||
*
|
||||
* MIT License
|
||||
*
|
||||
|
@@ -1,5 +1,7 @@
|
||||
//go:build darwin
|
||||
|
||||
/**
|
||||
* llama.cpp - git 5bf2a2771886ee86137e01dbc7492f78fb392066
|
||||
* llama.cpp - git e782c9e735f93ab4767ffc37462c523b73a17ddc
|
||||
*
|
||||
* MIT License
|
||||
*
|
||||
|
@@ -1,7 +1,7 @@
|
||||
// +build darwin
|
||||
//go:build darwin
|
||||
|
||||
/**
|
||||
* llama.cpp - git 5bf2a2771886ee86137e01dbc7492f78fb392066
|
||||
* llama.cpp - git e782c9e735f93ab4767ffc37462c523b73a17ddc
|
||||
*
|
||||
* MIT License
|
||||
*
|
||||
@@ -722,8 +722,8 @@ void ggml_metal_graph_compute(
|
||||
GGML_ASSERT(ne02 == 1);
|
||||
GGML_ASSERT(ne12 == 1);
|
||||
|
||||
nth0 = 4;
|
||||
nth1 = 16;
|
||||
nth0 = 2;
|
||||
nth1 = 32;
|
||||
[encoder setComputePipelineState:ctx->pipeline_mul_mat_q4_K_f32];
|
||||
} break;
|
||||
case GGML_TYPE_Q5_K:
|
||||
@@ -731,8 +731,8 @@ void ggml_metal_graph_compute(
|
||||
GGML_ASSERT(ne02 == 1);
|
||||
GGML_ASSERT(ne12 == 1);
|
||||
|
||||
nth0 = 4;
|
||||
nth1 = 16;
|
||||
nth0 = 2;
|
||||
nth1 = 32;
|
||||
[encoder setComputePipelineState:ctx->pipeline_mul_mat_q5_K_f32];
|
||||
} break;
|
||||
case GGML_TYPE_Q6_K:
|
||||
@@ -740,8 +740,8 @@ void ggml_metal_graph_compute(
|
||||
GGML_ASSERT(ne02 == 1);
|
||||
GGML_ASSERT(ne12 == 1);
|
||||
|
||||
nth0 = 4;
|
||||
nth1 = 16;
|
||||
nth0 = 2;
|
||||
nth1 = 32;
|
||||
[encoder setComputePipelineState:ctx->pipeline_mul_mat_q6_K_f32];
|
||||
} break;
|
||||
default:
|
||||
@@ -767,15 +767,18 @@ void ggml_metal_graph_compute(
|
||||
[encoder setBytes:&ne0 length:sizeof(ne0) atIndex:13];
|
||||
[encoder setBytes:&ne1 length:sizeof(ne1) atIndex:14];
|
||||
|
||||
if (src0t == GGML_TYPE_Q4_0 || src0t == GGML_TYPE_Q4_1) {
|
||||
[encoder setThreadgroupMemoryLength:nth0*nth1*sizeof(float) atIndex:0];
|
||||
[encoder dispatchThreadgroups:MTLSizeMake(ne01, ne11, 1) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)];
|
||||
if (src0t == GGML_TYPE_Q4_0 || src0t == GGML_TYPE_Q4_1 ||
|
||||
src0t == GGML_TYPE_Q4_K) {
|
||||
[encoder dispatchThreadgroups:MTLSizeMake((ne01 + 7) / 8, ne11, 1) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)];
|
||||
}
|
||||
else if (src0t == GGML_TYPE_Q5_K) {
|
||||
[encoder dispatchThreadgroups:MTLSizeMake((ne01 + 3) / 4, ne11, 1) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)];
|
||||
}
|
||||
else if (src0t == GGML_TYPE_Q6_K) {
|
||||
[encoder dispatchThreadgroups:MTLSizeMake((ne01+1)/2, ne11, 1) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)];
|
||||
}
|
||||
else if (src0t == GGML_TYPE_Q2_K ||
|
||||
src0t == GGML_TYPE_Q3_K ||
|
||||
src0t == GGML_TYPE_Q4_K ||
|
||||
src0t == GGML_TYPE_Q5_K ||
|
||||
src0t == GGML_TYPE_Q6_K) {
|
||||
src0t == GGML_TYPE_Q3_K) {
|
||||
[encoder setThreadgroupMemoryLength:nth0*nth1*sizeof(float) atIndex:0];
|
||||
[encoder dispatchThreadgroups:MTLSizeMake(ne01, 1, 1) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)];
|
||||
} else {
|
||||
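The new dispatch sizes in this hunk, such as `(ne01 + 7) / 8`, are integer ceiling divisions: each threadgroup now covers several output rows, so the grid needs one threadgroup per started group of rows. A small Python sketch of that arithmetic follows. The figure of 8 rows per threadgroup for Q4_0, Q4_1 and Q4_K matches `N_SIMDGROUP * N_DST` (2 * 4) in the shader; the values 4 and 2 for Q5_K and Q6_K are inferred here from the divisors alone, and the function and table names are hypothetical.

```python
# Rows of src0 handled by one threadgroup, as implied by the dispatch divisors.
# Q5_K and Q6_K values are an inference from (ne01 + 3) / 4 and (ne01 + 1) / 2.
ROWS_PER_THREADGROUP = {"Q4_0": 8, "Q4_1": 8, "Q4_K": 8, "Q5_K": 4, "Q6_K": 2}


def threadgroups_for_rows(n_rows, rows_per_threadgroup):
    """Ceiling division: number of threadgroups needed to cover n_rows."""
    return (n_rows + rows_per_threadgroup - 1) // rows_per_threadgroup
```

When the row count is not a multiple of the group size, the last threadgroup is only partly used, which is why the templated q4_0/q4_1 kernel guards its final store with a `< ne01` check.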
@@ -821,7 +824,7 @@ void ggml_metal_graph_compute(
|
||||
|
||||
const float eps = 1e-6f;
|
||||
|
||||
const int nth = 256;
|
||||
const int nth = 512;
|
||||
|
||||
[encoder setComputePipelineState:ctx->pipeline_rms_norm];
|
||||
[encoder setBuffer:id_src0 offset:offs_src0 atIndex:0];
|
||||
@@ -829,7 +832,7 @@ void ggml_metal_graph_compute(
|
||||
[encoder setBytes:&ne00 length:sizeof( int64_t) atIndex:2];
|
||||
[encoder setBytes:&nb01 length:sizeof(uint64_t) atIndex:3];
|
||||
[encoder setBytes:&eps length:sizeof( float) atIndex:4];
|
||||
[encoder setThreadgroupMemoryLength:nth*sizeof(float) atIndex:0];
|
||||
[encoder setThreadgroupMemoryLength:nth/32*sizeof(float) atIndex:0];
|
||||
|
||||
const int64_t nrows = ggml_nrows(src0);
|
||||
|
||||
@@ -910,28 +913,35 @@ void ggml_metal_graph_compute(
|
||||
|
||||
const int n_past = ((int32_t *)(src1->data))[0];
|
||||
|
||||
float freq_base;
|
||||
float freq_scale;
|
||||
memcpy(&freq_base, (int32_t *) src1->data + 4, sizeof(float));
|
||||
memcpy(&freq_scale, (int32_t *) src1->data + 5, sizeof(float));
|
||||
|
||||
[encoder setComputePipelineState:ctx->pipeline_rope];
|
||||
[encoder setBuffer:id_src0 offset:offs_src0 atIndex:0];
|
||||
[encoder setBuffer:id_dst offset:offs_dst atIndex:1];
|
||||
[encoder setBytes:&ne00 length:sizeof( int64_t) atIndex:2];
|
||||
[encoder setBytes:&ne01 length:sizeof( int64_t) atIndex:3];
|
||||
[encoder setBytes:&ne02 length:sizeof( int64_t) atIndex:4];
|
||||
[encoder setBytes:&ne03 length:sizeof( int64_t) atIndex:5];
|
||||
[encoder setBytes:&nb00 length:sizeof(uint64_t) atIndex:6];
|
||||
[encoder setBytes:&nb01 length:sizeof(uint64_t) atIndex:7];
|
||||
[encoder setBytes:&nb02 length:sizeof(uint64_t) atIndex:8];
|
||||
[encoder setBytes:&nb03 length:sizeof(uint64_t) atIndex:9];
|
||||
[encoder setBytes:&ne0 length:sizeof( int64_t) atIndex:10];
|
||||
[encoder setBytes:&ne1 length:sizeof( int64_t) atIndex:11];
|
||||
[encoder setBytes:&ne2 length:sizeof( int64_t) atIndex:12];
|
||||
[encoder setBytes:&ne3 length:sizeof( int64_t) atIndex:13];
|
||||
[encoder setBytes:&nb0 length:sizeof(uint64_t) atIndex:14];
|
||||
[encoder setBytes:&nb1 length:sizeof(uint64_t) atIndex:15];
|
||||
[encoder setBytes:&nb2 length:sizeof(uint64_t) atIndex:16];
|
||||
[encoder setBytes:&nb3 length:sizeof(uint64_t) atIndex:17];
|
||||
[encoder setBytes:&n_past length:sizeof( int) atIndex:18];
|
||||
[encoder setBytes:&n_dims length:sizeof( int) atIndex:19];
|
||||
[encoder setBytes:&mode length:sizeof( int) atIndex:20];
|
||||
[encoder setBytes:&ne00 length:sizeof( int64_t) atIndex:2];
|
||||
[encoder setBytes:&ne01 length:sizeof( int64_t) atIndex:3];
|
||||
[encoder setBytes:&ne02 length:sizeof( int64_t) atIndex:4];
|
||||
[encoder setBytes:&ne03 length:sizeof( int64_t) atIndex:5];
|
||||
[encoder setBytes:&nb00 length:sizeof(uint64_t) atIndex:6];
|
||||
[encoder setBytes:&nb01 length:sizeof(uint64_t) atIndex:7];
|
||||
[encoder setBytes:&nb02 length:sizeof(uint64_t) atIndex:8];
|
||||
[encoder setBytes:&nb03 length:sizeof(uint64_t) atIndex:9];
|
||||
[encoder setBytes:&ne0 length:sizeof( int64_t) atIndex:10];
|
||||
[encoder setBytes:&ne1 length:sizeof( int64_t) atIndex:11];
|
||||
[encoder setBytes:&ne2 length:sizeof( int64_t) atIndex:12];
|
||||
[encoder setBytes:&ne3 length:sizeof( int64_t) atIndex:13];
|
||||
[encoder setBytes:&nb0 length:sizeof(uint64_t) atIndex:14];
|
||||
[encoder setBytes:&nb1 length:sizeof(uint64_t) atIndex:15];
|
||||
[encoder setBytes:&nb2 length:sizeof(uint64_t) atIndex:16];
|
||||
[encoder setBytes:&nb3 length:sizeof(uint64_t) atIndex:17];
|
||||
[encoder setBytes:&n_past length:sizeof( int) atIndex:18];
|
||||
[encoder setBytes:&n_dims length:sizeof( int) atIndex:19];
|
||||
[encoder setBytes:&mode length:sizeof( int) atIndex:20];
|
||||
[encoder setBytes:&freq_base length:sizeof(float) atIndex:21];
|
||||
[encoder setBytes:&freq_scale length:sizeof(float) atIndex:22];
|
||||
|
||||
[encoder dispatchThreadgroups:MTLSizeMake(ne01, ne02, ne03) threadsPerThreadgroup:MTLSizeMake(1, 1, 1)];
|
||||
} break;
|
||||
|
@@ -1,5 +1,7 @@
|
||||
//go:build darwin
|
||||
|
||||
/**
|
||||
* llama.cpp - git 5bf2a2771886ee86137e01dbc7492f78fb392066
|
||||
* llama.cpp - git e782c9e735f93ab4767ffc37462c523b73a17ddc
|
||||
*
|
||||
* MIT License
|
||||
*
|
||||
@@ -357,26 +359,33 @@ kernel void kernel_rms_norm(
|
||||
threadgroup float * sum [[threadgroup(0)]],
|
||||
uint tgpig[[threadgroup_position_in_grid]],
|
||||
uint tpitg[[thread_position_in_threadgroup]],
|
||||
uint sgitg[[simdgroup_index_in_threadgroup]],
|
||||
uint tiisg[[thread_index_in_simdgroup]],
|
||||
uint ntg[[threads_per_threadgroup]]) {
|
||||
device const float * x = (device const float *) ((device const char *) src0 + tgpig*nb01);
|
||||
device const float4 * x = (device const float4 *) ((device const char *) src0 + tgpig*nb01);
|
||||
device const float * x_scalar = (device const float *) x;
|
||||
float4 sumf=0;
|
||||
float all_sum=0;
|
||||
|
||||
// parallel sum
|
||||
sum[tpitg] = 0.0f;
|
||||
for (int i00 = tpitg; i00 < ne00; i00 += ntg) {
|
||||
sum[tpitg] += x[i00] * x[i00];
|
||||
for (int i00 = tpitg; i00 < ne00/4; i00 += ntg) {
|
||||
sumf += x[i00] * x[i00];
|
||||
}
|
||||
all_sum = sumf[0] + sumf[1] + sumf[2] + sumf[3];
|
||||
all_sum = simd_sum(all_sum);
|
||||
if (tiisg == 0) {
|
||||
sum[sgitg] = all_sum;
|
||||
}
|
||||
|
||||
// reduce
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
for (uint i = ntg/2; i > 0; i /= 2) {
|
||||
if (tpitg < i) {
|
||||
sum[tpitg] += sum[tpitg + i];
|
||||
}
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
// broadcast, simd group number is ntg / 32
|
||||
for (int i = ntg / 32 / 2; i > 0; i /= 2) {
|
||||
if (tpitg < i) {
|
||||
sum[tpitg] += sum[tpitg + i];
|
||||
}
|
||||
}
|
||||
|
||||
// broadcast
|
||||
if (tpitg == 0) {
|
||||
for (int i = 4 * (ne00 / 4); i < ne00; i++) {sum[0] += x_scalar[i];}
|
||||
sum[0] /= ne00;
|
||||
}
|
||||
|
||||
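As a reference for what the rewritten kernel_rms_norm computes, here is a minimal CPU sketch in Python, assuming one row of floats and the same `eps`. It follows the kernel's split into a four-wide main loop and a scalar tail, but runs sequentially and does not model the SIMD-group reduction. Note that the tail loop in the hunk above adds `x_scalar[i]` without squaring it; the sketch squares the tail elements, which is what a mean of squares calls for when `ne00` is not a multiple of 4. The function name `rms_norm_row` is hypothetical.

```python
import math


def rms_norm_row(x, eps=1e-6):
    """Scale a row by 1 / sqrt(mean(x^2) + eps)."""
    n = len(x)
    n4 = 4 * (n // 4)

    # Main part: accumulate squares in four lanes, like the float4 loop.
    lanes = [0.0, 0.0, 0.0, 0.0]
    for i in range(0, n4, 4):
        for k in range(4):
            lanes[k] += x[i + k] * x[i + k]
    total = sum(lanes)

    # Tail: the last n % 4 elements, handled one at a time.
    for i in range(n4, n):
        total += x[i] * x[i]

    scale = 1.0 / math.sqrt(total / n + eps)
    return [v * scale for v in x]
```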
@@ -385,10 +394,99 @@ kernel void kernel_rms_norm(
|
||||
const float mean = sum[0];
|
||||
const float scale = 1.0f/sqrt(mean + eps);
|
||||
|
||||
device float * y = dst + tgpig*ne00;
|
||||
for (int i00 = tpitg; i00 < ne00; i00 += ntg) {
|
||||
device float4 * y = (device float4 *) (dst + tgpig*ne00);
|
||||
device float * y_scalar = (device float *) y;
|
||||
for (int i00 = tpitg; i00 < ne00/4; i00 += ntg) {
|
||||
y[i00] = x[i00] * scale;
|
||||
}
|
||||
if (tpitg == 0) {
|
||||
for (int i00 = 4 * (ne00 / 4); i00 < ne00; i00++) {y_scalar[i00] = x_scalar[i00] * scale;}
|
||||
}
|
||||
}
|
||||
|
||||
// computes the inner product between a q4_0 block and 32 floats (yl); sumy is SUM(yl[i])
|
||||
float block_q_n_dot_y(device const block_q4_0 * qb_curr, float sumy, thread float * yl) {
|
||||
float d = qb_curr->d;
|
||||
float4 acc = 0.f;
|
||||
device uint16_t * qs = ((device uint16_t *)qb_curr + 1);
|
||||
for (int i = 0; i < 16; i+=2) {
|
||||
acc[0] += yl[i] * (qs[i / 2] & 0x000F);
|
||||
acc[1] += yl[i + 16] * (qs[i / 2] & 0x00F0);
|
||||
acc[2] += yl[i + 1] * (qs[i / 2] & 0x0F00);
|
||||
acc[3] += yl[i + 17] * (qs[i / 2] & 0xF000);
|
||||
}
|
||||
return d * (sumy * -8.f + acc[0] + acc[1]/16.f + acc[2]/256.f + acc[3]/4096.f);
|
||||
}
|
||||
|
||||
// computes the inner product between a q4_1 block and 32 floats (yl); sumy is SUM(yl[i])
|
||||
float block_q_n_dot_y(device const block_q4_1 * qb_curr, float sumy, thread float * yl) {
|
||||
float d = qb_curr->d;
|
||||
float m = qb_curr->m;
|
||||
float4 acc = 0.f;
|
||||
device uint16_t * qs = ((device uint16_t *)qb_curr + 2);
|
||||
for (int i = 0; i < 16; i+=2) {
|
||||
acc[0] += yl[i] * (qs[i / 2] & 0x000F);
|
||||
acc[1] += yl[i + 16] * (qs[i / 2] & 0x00F0);
|
||||
acc[2] += yl[i + 1] * (qs[i / 2] & 0x0F00);
|
||||
acc[3] += yl[i + 17] * (qs[i / 2] & 0xF000);
|
||||
}
|
||||
return d * (acc[0] + acc[1]/16.f + acc[2]/256.f + acc[3]/4096.f) + sumy * m;
|
||||
}
|
||||
|
||||
// putting them in the kernel causes a significant performance penalty
|
||||
#define N_DST 4 // each SIMD group works on 4 rows
|
||||
#define N_SIMDGROUP 2 // number of SIMD groups in a thread group
|
||||
#define N_SIMDWIDTH 32 // assuming SIMD group size is 32
|
||||
template<typename block_q_type>
|
||||
void mul_vec_q_n_f32(device const void * src0, device const float * src1, device float * dst,
|
||||
int64_t ne00, int64_t ne10, int64_t ne0, int64_t ne01,
|
||||
uint2 tgpig, uint tiisg, uint sgitg) {
|
||||
const int nb = ne00/QK4_0;
|
||||
const int r0 = tgpig.x;
|
||||
const int r1 = tgpig.y;
|
||||
device const block_q_type * x = (device const block_q_type *) src0 + (r0 * N_SIMDGROUP + sgitg) * N_DST * nb;
|
||||
device const float * y = (device const float *) src1 + r1*ne10;
|
||||
float4 y_curr[8]; // src1 vector cache
|
||||
float sumf[N_DST]={0.f}, all_sum;
|
||||
thread float * yl=(thread float *)y_curr;
|
||||
|
||||
// each thread in a SIMD group deals with 1 block.
|
||||
for (int column = 0; column < nb / N_SIMDWIDTH; column++) {
|
||||
float sumy = 0;
|
||||
for (int i = 0; i < QK4_0 / 4; i++) {
|
||||
y_curr[i] = *((device float4 *)(y + N_SIMDWIDTH * (tiisg + column * QK4_0)) + i);
|
||||
sumy += y_curr[i][0] + y_curr[i][1] + y_curr[i][2] + y_curr[i][3];
|
||||
}
|
||||
|
||||
for (int row = 0; row < N_DST; row++) {
|
||||
sumf[row] += block_q_n_dot_y(x+(tiisg + row * nb + column * N_SIMDWIDTH), sumy, yl);
|
||||
}
|
||||
}
|
||||
|
||||
// from here on, load two rows at a time and 16 blocks per row
|
||||
int ir = tiisg / (N_SIMDWIDTH / 2);
|
||||
int ib = tiisg % (N_SIMDWIDTH / 2);
|
||||
for (int ind = 0; ind < (nb % N_SIMDWIDTH + N_SIMDWIDTH / 2 - 1)/(N_SIMDWIDTH / 2); ind++) {
|
||||
int nb_start = (nb / N_SIMDWIDTH) * N_SIMDWIDTH + ind * (N_SIMDWIDTH / 2); //where the left blocks start
|
||||
float sumy = 0;
|
||||
for (int i = 0; i < QK4_0 / 4; i++) {
|
||||
y_curr[i] = *((device float4 *)(y + (nb_start + ib) * QK4_0) + i);
|
||||
sumy += y_curr[i][0] + y_curr[i][1] + y_curr[i][2] + y_curr[i][3];
|
||||
}
|
||||
|
||||
for (int row = 0; row < N_DST; row+=2) {
|
||||
if (nb_start + ib < nb) {
|
||||
sumf[row + ir] += block_q_n_dot_y(x + (nb_start + ib + (row + ir) * nb), sumy, yl);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
for (int row = 0; row < N_DST; ++row) {
|
||||
all_sum = simd_sum(sumf[row]);
|
||||
if (tiisg == 0 && ((r0 * N_SIMDGROUP + sgitg) * N_DST + row) < ne01) {
|
||||
dst[r1*ne0 + (r0 * N_SIMDGROUP + sgitg) * N_DST + row] = all_sum;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
kernel void kernel_mul_mat_q4_0_f32(
|
||||
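The q4_0 version of `block_q_n_dot_y` reads the packed nibbles as 16-bit words and masks them in place instead of shifting, then divides the partial sums by 16, 256 and 4096 at the end; the `-8` offset of q4_0 is folded into a single `sumy * -8` term. The Python sketch below checks that this gives the same result as the straightforward per-nibble formula. It assumes the usual q4_0 layout (byte `j` holds element `j` in its low nibble and element `j + 16` in its high nibble) and little-endian 16-bit loads; both function names are hypothetical.

```python
def dot_q4_0_reference(d, qs, y):
    """Plain dot product: unpack each nibble, subtract 8, multiply by y."""
    total = 0.0
    for j in range(16):
        lo = qs[j] & 0x0F
        hi = qs[j] >> 4
        total += (lo - 8) * y[j] + (hi - 8) * y[j + 16]
    return d * total


def dot_q4_0_masked(d, qs, y):
    """Same product using in-place masks on 16-bit words, as in the shader."""
    sumy = sum(y)
    acc = [0.0, 0.0, 0.0, 0.0]
    for i in range(0, 16, 2):
        q = qs[i] | (qs[i + 1] << 8)        # little-endian uint16 load
        acc[0] += y[i] * (q & 0x000F)       # element i
        acc[1] += y[i + 16] * (q & 0x00F0)  # element i + 16, scaled by 16
        acc[2] += y[i + 1] * (q & 0x0F00)   # element i + 1, scaled by 256
        acc[3] += y[i + 17] * (q & 0xF000)  # element i + 17, scaled by 4096
    return d * (sumy * -8.0 + acc[0] + acc[1] / 16.0
                + acc[2] / 256.0 + acc[3] / 4096.0)
```

The q4_1 overload uses the same masking but replaces the `-8` term with `sumy * m`, since q4_1 stores an explicit minimum per block.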
@@ -398,65 +496,11 @@ kernel void kernel_mul_mat_q4_0_f32(
|
||||
constant int64_t & ne00,
|
||||
constant int64_t & ne10,
|
||||
constant int64_t & ne0,
|
||||
threadgroup float * sum [[threadgroup(0)]],
|
||||
constant int64_t & ne01[[buffer(4)]],
|
||||
uint2 tgpig[[threadgroup_position_in_grid]],
|
||||
uint2 tpitg[[thread_position_in_threadgroup]],
|
||||
uint2 tptg[[threads_per_threadgroup]]) {
|
||||
const int nb = ne00/QK4_0;
|
||||
|
||||
const int64_t r0 = tgpig.x;
|
||||
const int64_t r1 = tgpig.y;
|
||||
|
||||
device const block_q4_0 * x = (device const block_q4_0 *) src0 + r0*nb;
|
||||
device const float * y = (device const float *) src1 + r1*ne10;
|
||||
|
||||
const int nth = tptg.x*tptg.y;
|
||||
const int ith = tptg.y*tpitg.x + tpitg.y;
|
||||
|
||||
const int ix = tpitg.y/4; // 0 or 1
|
||||
const int iy = tpitg.y - 4*ix; // 0...3
|
||||
|
||||
const int first = 4 * iy;
|
||||
|
||||
float sumf = 0;
|
||||
|
||||
for (int i = 2*tpitg.x + ix; i < nb; i += 2*tptg.x) {
|
||||
|
||||
const float d = (float)x[i].d;
|
||||
|
||||
device const uint8_t * xl = x[i].qs + first;
|
||||
device const float * yl = y + i * QK4_0 + first;
|
||||
|
||||
float2 acc = {0.0f, 0.0f};
|
||||
|
||||
for (int j = 0; j < 4; ++j) {
|
||||
|
||||
acc[0] += yl[j] * (xl[j] & 0xF) + yl[j+16] * (xl[j] >> 4);
|
||||
acc[1] += yl[j] + yl[j+16];
|
||||
|
||||
}
|
||||
|
||||
sumf += d * (acc[0] - 8.f*acc[1]);
|
||||
}
|
||||
|
||||
sum[ith] = sumf;
|
||||
|
||||
//
|
||||
// Accumulate the sum from all threads in the threadgroup
|
||||
//
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith%4 == 0) {
|
||||
sum[ith] += sum[ith+1] + sum[ith+2] + sum[ith+3];
|
||||
}
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith%16 == 0) {
|
||||
sum[ith] += sum[ith+4] + sum[ith+8] + sum[ith+12];
|
||||
}
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith == 0) {
|
||||
for (int i = 16; i < nth; i += 16) sum[0] += sum[i];
|
||||
dst[r1*ne0 + r0] = sum[0];
|
||||
}
|
||||
uint tiisg[[thread_index_in_simdgroup]],
|
||||
uint sgitg[[simdgroup_index_in_threadgroup]]) {
|
||||
mul_vec_q_n_f32<block_q4_0>(src0,src1,dst,ne00,ne10,ne0,ne01,tgpig,tiisg,sgitg);
|
||||
}
|
||||
|
||||
kernel void kernel_mul_mat_q4_1_f32(
|
||||
@@ -466,66 +510,11 @@ kernel void kernel_mul_mat_q4_1_f32(
|
||||
constant int64_t & ne00,
|
||||
constant int64_t & ne10,
|
||||
constant int64_t & ne0,
|
||||
threadgroup float * sum [[threadgroup(0)]],
|
||||
constant int64_t & ne01[[buffer(4)]],
|
||||
uint2 tgpig[[threadgroup_position_in_grid]],
|
||||
uint2 tpitg[[thread_position_in_threadgroup]],
|
||||
uint2 tptg[[threads_per_threadgroup]]) {
|
||||
const int nb = ne00/QK4_1;
|
||||
|
||||
const int64_t r0 = tgpig.x;
|
||||
const int64_t r1 = tgpig.y;
|
||||
|
||||
device const block_q4_1 * x = (device const block_q4_1 *) src0 + r0*nb;
|
||||
device const float * y = (device const float *) src1 + r1*ne10;
|
||||
|
||||
const uint nth = tptg.x*tptg.y;
|
||||
const uint ith = tptg.y*tpitg.x + tpitg.y;
|
||||
|
||||
const int ix = tpitg.y/4; // 0 or 1
|
||||
const int iy = tpitg.y - 4*ix; // 0...3
|
||||
|
||||
const int first = 4 * iy;
|
||||
|
||||
float sumf = 0;
|
||||
|
||||
for (int i = 2*tpitg.x + ix; i < nb; i += 2*tptg.x) {
|
||||
|
||||
const float d = (float)x[i].d;
|
||||
const float m = (float)x[i].m;
|
||||
|
||||
device const uint8_t * xl = x[i].qs + first;
|
||||
device const float * yl = y + i * QK4_1 + first;
|
||||
|
||||
float2 acc = {0.0f, 0.0f};
|
||||
|
||||
for (int j = 0; j < 4; ++j) {
|
||||
|
||||
acc[0] += yl[j+ 0] * (d * (xl[j] & 0xF) + m);
|
||||
acc[1] += yl[j+16] * (d * (xl[j] >> 4) + m);
|
||||
|
||||
}
|
||||
|
||||
sumf += acc[0] + acc[1];
|
||||
}
|
||||
|
||||
sum[ith] = sumf;
|
||||
|
||||
//
|
||||
// Accumulate the sum from all threads in the threadgroup
|
||||
//
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith%4 == 0) {
|
||||
sum[ith] += sum[ith+1] + sum[ith+2] + sum[ith+3];
|
||||
}
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith%16 == 0) {
|
||||
sum[ith] += sum[ith+4] + sum[ith+8] + sum[ith+12];
|
||||
}
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith == 0) {
|
||||
for (uint i = 16; i < nth; i += 16) sum[0] += sum[i];
|
||||
dst[r1*ne0 + r0] = sum[0];
|
||||
}
|
||||
uint tiisg[[thread_index_in_simdgroup]],
|
||||
uint sgitg[[simdgroup_index_in_threadgroup]]) {
|
||||
mul_vec_q_n_f32<block_q4_1>(src0,src1,dst,ne00,ne10,ne0,ne01,tgpig,tiisg,sgitg);
|
||||
}
|
||||
|
||||
kernel void kernel_mul_mat_f16_f32(
|
||||
@@ -641,17 +630,19 @@ kernel void kernel_rope(
|
||||
constant int & n_past,
|
||||
constant int & n_dims,
|
||||
constant int & mode,
|
||||
constant float & freq_base,
|
||||
constant float & freq_scale,
|
||||
uint3 tpig[[thread_position_in_grid]]) {
|
||||
const int64_t i3 = tpig[2];
|
||||
const int64_t i2 = tpig[1];
|
||||
const int64_t i1 = tpig[0];
|
||||
|
||||
const bool is_neox = mode & 2;
|
||||
const float theta_scale = pow(10000.0, -2.0f/n_dims);
|
||||
const float theta_scale = pow(freq_base, -2.0f/n_dims);
|
||||
|
||||
const int64_t p = ((mode & 1) == 0 ? n_past + i2 : i2);
|
||||
|
||||
float theta = (float)p;
|
||||
float theta = freq_scale * (float)p;
|
||||
|
||||
if (!is_neox) {
|
||||
for (int64_t i0 = 0; i0 < ne0; i0 += 2) {
|
||||
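The kernel_rope change replaces the hard-coded base of 10000 with `freq_base` and multiplies the position by `freq_scale`. A minimal Python sketch of the resulting rotation angles follows, assuming (as in the non-NeoX branch of the kernel) that the angle starts at `freq_scale * p` and is multiplied by `theta_scale` after each pair of dimensions. The function name `rope_angles` is hypothetical, and the defaults reproduce the previous behaviour.

```python
import math


def rope_angles(p, n_dims, freq_base=10000.0, freq_scale=1.0):
    """Return the rotation angle for each of the n_dims / 2 dimension pairs."""
    theta_scale = math.pow(freq_base, -2.0 / n_dims)
    theta = freq_scale * float(p)
    angles = []
    for _ in range(0, n_dims, 2):
        angles.append(theta)
        theta *= theta_scale
    return angles
```

With `freq_scale` below 1, positions are compressed linearly, so position 8 at scale 0.5 yields the same angles as position 4 at scale 1.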
@@ -1489,6 +1480,7 @@ kernel void kernel_mul_mat_q3_K_f32(
|
||||
|
||||
}
|
||||
|
||||
#if QK_K == 256
|
||||
kernel void kernel_mul_mat_q4_K_f32(
|
||||
device const void * src0,
|
||||
device const float * src1,
|
||||
@@ -1496,131 +1488,180 @@ kernel void kernel_mul_mat_q4_K_f32(
|
||||
constant int64_t & ne00,
|
||||
constant int64_t & ne10,
|
||||
constant int64_t & ne0,
|
||||
threadgroup float * sum [[threadgroup(0)]],
|
||||
constant int64_t & ne01[[buffer(4)]],
|
||||
uint2 tgpig[[threadgroup_position_in_grid]],
|
||||
uint2 tpitg[[thread_position_in_threadgroup]],
|
||||
uint2 tptg[[threads_per_threadgroup]]) {
|
||||
|
||||
const int nb = ne00/QK_K;
|
||||
|
||||
const int64_t r0 = tgpig.x;
|
||||
const int64_t r1 = tgpig.y;
|
||||
|
||||
const int nth = tptg.x*tptg.y;
|
||||
const int ith = tptg.y*tpitg.x + tpitg.y;
|
||||
|
||||
device const block_q4_K * x = (device const block_q4_K *) src0 + r0*nb;
|
||||
device const float * yy = (device const float *) src1 + r1*ne10;
|
||||
|
||||
float sumf = 0;
|
||||
|
||||
#if QK_K == 256
|
||||
uint tiisg[[thread_index_in_simdgroup]],
|
||||
uint sgitg[[simdgroup_index_in_threadgroup]]) {
|
||||
|
||||
const uint16_t kmask1 = 0x3f3f;
|
||||
const uint16_t kmask2 = 0x0f0f;
|
||||
const uint16_t kmask3 = 0xc0c0;
|
||||
|
||||
const int tid = tpitg.y; // 0...16
|
||||
const int il = tid/4; // 0...3
|
||||
const int ir = tid - 4*il;// 0...3
|
||||
const int n = 4;
|
||||
const int ix = tiisg/8; // 0...3
|
||||
const int it = tiisg%8; // 0...7
|
||||
const int im = it/4; // 0 or 1
|
||||
const int ir = it%4; // 0...3
|
||||
|
||||
const int im = il/2; // 0 or 1. 0 computes 0,32 + 128,160, 1 computes 64,96 + 192,224
|
||||
const int in = il%2;
|
||||
const int nb = ne00/QK_K;
|
||||
const int r0 = tgpig.x;
|
||||
const int r1 = tgpig.y;
|
||||
const int first_row = (r0 * N_SIMDGROUP + sgitg) * N_DST;
|
||||
const int ib_row = first_row * nb;
|
||||
device const block_q4_K * x = (device const block_q4_K *) src0 + ib_row;
|
||||
device const float * y = (device const float *) src1 + r1*ne10;
|
||||
float yl[16];
|
||||
float yh[16];
|
||||
float sumf[N_DST]={0.f}, all_sum;
|
||||
|
||||
const int l0 = n*(2*ir + in);
|
||||
const int q_offset = 32*im + l0;
|
||||
const int y_offset = 64*im + l0;
|
||||
const int step = sizeof(block_q4_K) * nb / 2;
|
||||
|
||||
uchar2 sc1, sc2, sc3, sc4;
|
||||
device const float * y4 = y + ix * QK_K + 64 * im + 8 * ir;
|
||||
|
||||
for (int i = tpitg.x; i < nb; i += tptg.x) {
|
||||
uint16_t sc16[4];
|
||||
thread const uint8_t * sc8 = (thread const uint8_t *)sc16;
|
||||
|
||||
device const uint8_t * q1 = (x + i)->qs + q_offset;
|
||||
device const uint8_t * q2 = q1 + 64;
|
||||
device const float * y1 = yy + i*QK_K + y_offset;
|
||||
device const float * y2 = y1 + 128;
|
||||
|
||||
const float dall = (float)((x + i)->d);
|
||||
const float dmin = (float)((x + i)->dmin);
|
||||
|
||||
device const uint16_t * a = (device const uint16_t *)(x + i)->scales;
|
||||
sc1 = as_type<uchar2>((uint16_t)(a[im+0] & kmask1));
|
||||
sc2 = as_type<uchar2>((uint16_t)(a[im+2] & kmask1));
|
||||
sc3 = as_type<uchar2>((uint16_t)(((a[im+4] >> 0) & kmask2) | ((a[im+0] & kmask3) >> 2)));
|
||||
sc4 = as_type<uchar2>((uint16_t)(((a[im+4] >> 4) & kmask2) | ((a[im+2] & kmask3) >> 2)));
|
||||
|
||||
float4 s = {0.f, 0.f, 0.f, 0.f};
|
||||
float smin = 0;
|
||||
for (int l = 0; l < n; ++l) {
|
||||
|
||||
s[0] += y1[l] * (q1[l] & 0xF); s[1] += y1[l+32] * (q1[l] >> 4);
|
||||
s[2] += y2[l] * (q2[l] & 0xF); s[3] += y2[l+32] * (q2[l] >> 4);
|
||||
smin += y1[l] * sc2[0] + y1[l+32] * sc2[1] + y2[l] * sc4[0] + y2[l+32] * sc4[1];
|
||||
for (int ib = ix; ib < nb; ib += 4) {
|
||||
|
||||
float4 sumy = {0.f, 0.f, 0.f, 0.f};
|
||||
for (int i = 0; i < 8; ++i) {
|
||||
yl[i+0] = y4[i+ 0]; sumy[0] += yl[i+0];
|
||||
yl[i+8] = y4[i+ 32]; sumy[1] += yl[i+8];
|
||||
yh[i+0] = y4[i+128]; sumy[2] += yh[i+0];
|
||||
yh[i+8] = y4[i+160]; sumy[3] += yh[i+8];
|
||||
}
|
||||
sumf += dall * (s[0] * sc1[0] + s[1] * sc1[1] + s[2] * sc3[0] + s[3] * sc3[1]) - dmin * smin;
|
||||
|
||||
device const uint16_t * sc = (device const uint16_t *)x[ib].scales + im;
|
||||
device const uint16_t * q1 = (device const uint16_t *)x[ib].qs + 16 * im + 4 * ir;
|
||||
device const half * dh = &x[ib].d;
|
||||
|
||||
for (int row = 0; row < N_DST; row++) {
|
||||
|
||||
sc16[0] = sc[0] & kmask1;
|
||||
sc16[1] = sc[2] & kmask1;
|
||||
sc16[2] = ((sc[4] >> 0) & kmask2) | ((sc[0] & kmask3) >> 2);
|
||||
sc16[3] = ((sc[4] >> 4) & kmask2) | ((sc[2] & kmask3) >> 2);
|
||||
|
||||
device const uint16_t * q2 = q1 + 32;
|
||||
|
||||
float4 acc1 = {0.f, 0.f, 0.f, 0.f};
|
||||
float4 acc2 = {0.f, 0.f, 0.f, 0.f};
|
||||
for (int i = 0; i < 8; i += 2) {
|
||||
acc1[0] += yl[i+0] * (q1[i/2] & 0x000F);
|
||||
acc1[1] += yl[i+1] * (q1[i/2] & 0x0F00);
|
||||
acc1[2] += yl[i+8] * (q1[i/2] & 0x00F0);
|
||||
acc1[3] += yl[i+9] * (q1[i/2] & 0xF000);
|
||||
acc2[0] += yh[i+0] * (q2[i/2] & 0x000F);
|
||||
acc2[1] += yh[i+1] * (q2[i/2] & 0x0F00);
|
||||
acc2[2] += yh[i+8] * (q2[i/2] & 0x00F0);
|
||||
acc2[3] += yh[i+9] * (q2[i/2] & 0xF000);
|
||||
}
|
||||
|
||||
float dall = dh[0];
|
||||
float dmin = dh[1];
|
||||
sumf[row] += dall * ((acc1[0] + 1.f/256.f * acc1[1]) * sc8[0] +
|
||||
(acc1[2] + 1.f/256.f * acc1[3]) * sc8[1] * 1.f/16.f +
|
||||
(acc2[0] + 1.f/256.f * acc2[1]) * sc8[4] +
|
||||
(acc2[2] + 1.f/256.f * acc2[3]) * sc8[5] * 1.f/16.f) -
|
||||
dmin * (sumy[0] * sc8[2] + sumy[1] * sc8[3] + sumy[2] * sc8[6] + sumy[3] * sc8[7]);
|
||||
|
||||
q1 += step;
|
||||
sc += step;
|
||||
dh += step;
|
||||
}
|
||||
|
||||
y4 += 4 * QK_K;
|
||||
}
|
||||
#else
|
||||
uint16_t aux16[2];
|
||||
thread const uint8_t * scales = (thread const uint8_t *)aux16;
|
||||
|
||||
const int il = 4*tpitg.x;
|
||||
|
||||
for (int i = tpitg.y; i < nb; i += tptg.y) {
|
||||
|
||||
device const uint8_t * q = x[i].qs + il;
|
||||
device const float * y = yy + i * QK_K + il;
|
||||
|
||||
const float d = (float)x[i].d[0];
|
||||
const float m = (float)x[i].d[1];
|
||||
|
||||
device const uint16_t * a = (device const uint16_t *)x[i].scales;
|
||||
aux16[0] = a[0] & 0x0f0f;
|
||||
aux16[1] = (a[0] >> 4) & 0x0f0f;
|
||||
|
||||
for (int l = 0; l < 4; ++l) {
|
||||
sumf += d * scales[0] * (y[l+ 0] * (q[l] & 0xF) + y[l+16] * (q[l+16] & 0xF)) - m * scales[2] * (y[l+ 0] + y[l+16])
|
||||
+ d * scales[1] * (y[l+32] * (q[l] >> 4) + y[l+48] * (q[l+16] >> 4)) - m * scales[3] * (y[l+32] + y[l+48]);
|
||||
for (int row = 0; row < N_DST; ++row) {
|
||||
all_sum = simd_sum(sumf[row]);
|
||||
if (tiisg == 0) {
|
||||
dst[r1*ne0 + first_row + row] = all_sum;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
|
||||
sum[ith] = sumf;
|
||||
|
||||
//
|
||||
// Accumulate the sum from all threads in the threadgroup
|
||||
// This version is slightly faster than the commented out one below,
|
||||
// which I copy-pasted from ggerganov's q4_0 dot product for metal.
|
||||
//
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith%4 == 0) {
|
||||
for (int i = 1; i < 4; ++i) sum[ith] += sum[ith + i];
|
||||
}
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith%16 == 0) {
|
||||
for (int i = 4; i < 16; i += 4) sum[ith] += sum[ith + i];
|
||||
}
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith == 0) {
|
||||
for (int i = 16; i < nth; i += 16) sum[0] += sum[i];
|
||||
dst[r1*ne0 + r0] = sum[0];
|
||||
}
|
||||
|
||||
//// accumulate the sum from all threads in the threadgroup
|
||||
//threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
//for (uint i = nth/2; i > 0; i /= 2) {
|
||||
// if (ith < i) {
|
||||
// sum[ith] += sum[ith + i];
|
||||
// }
|
||||
// threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
//}
|
||||
|
||||
//if (ith == 0) {
|
||||
// dst[r1*ne0 + r0] = sum[0];
|
||||
//}
|
||||
}
|
||||
#else
|
||||
kernel void kernel_mul_mat_q4_K_f32(
|
||||
device const void * src0,
|
||||
device const float * src1,
|
||||
device float * dst,
|
||||
constant int64_t & ne00,
|
||||
constant int64_t & ne10,
|
||||
constant int64_t & ne0,
|
||||
constant int64_t & ne01[[buffer(4)]],
|
||||
uint2 tgpig[[threadgroup_position_in_grid]],
|
||||
uint tiisg[[thread_index_in_simdgroup]],
|
||||
uint sgitg[[simdgroup_index_in_threadgroup]]) {
|
||||
|
||||
const int ix = tiisg/4; // 0...7
|
||||
const int it = tiisg%4; // 0...3
|
||||
|
||||
const int nb = ne00/QK_K;
|
||||
const int r0 = tgpig.x;
|
||||
const int r1 = tgpig.y;
|
||||
const int first_row = (r0 * N_SIMDGROUP + sgitg) * N_DST;
|
||||
const int ib_row = first_row * nb;
|
||||
device const block_q4_K * x = (device const block_q4_K *) src0 + ib_row;
|
||||
device const float * y = (device const float *) src1 + r1*ne10;
|
||||
float yl[8];
|
||||
float yh[8];
|
||||
float sumf[N_DST]={0.f}, all_sum;
|
||||
|
||||
const int step = sizeof(block_q4_K) * nb / 2;
|
||||
|
||||
device const float * y4 = y + ix * QK_K + 8 * it;
|
||||
|
||||
uint16_t sc16[4];
|
||||
|
||||
for (int ib = ix; ib < nb; ib += 8) {
|
||||
|
||||
float2 sumy = {0.f, 0.f};
|
||||
for (int i = 0; i < 8; ++i) {
|
||||
yl[i] = y4[i+ 0]; sumy[0] += yl[i];
|
||||
yh[i] = y4[i+32]; sumy[1] += yh[i];
|
||||
}
|
||||
|
||||
device const uint16_t * sc = (device const uint16_t *)x[ib].scales;
|
||||
device const uint16_t * qs = (device const uint16_t *)x[ib].qs + 4 * it;
|
||||
device const half * dh = x[ib].d;
|
||||
|
||||
for (int row = 0; row < N_DST; row++) {
|
||||
|
||||
sc16[0] = sc[0] & 0x000f;
|
||||
sc16[1] = sc[0] & 0x0f00;
|
||||
sc16[2] = sc[0] & 0x00f0;
|
||||
sc16[3] = sc[0] & 0xf000;
|
||||
|
||||
float2 acc1 = {0.f, 0.f};
|
||||
float2 acc2 = {0.f, 0.f};
|
||||
for (int i = 0; i < 8; i += 2) {
|
||||
acc1[0] += yl[i+0] * (qs[i/2] & 0x000F);
|
||||
acc1[1] += yl[i+1] * (qs[i/2] & 0x0F00);
|
||||
acc2[0] += yh[i+0] * (qs[i/2] & 0x00F0);
|
||||
acc2[1] += yh[i+1] * (qs[i/2] & 0xF000);
|
||||
}
|
||||
|
||||
float dall = dh[0];
|
||||
float dmin = dh[1];
|
||||
sumf[row] += dall * ((acc1[0] + 1.f/256.f * acc1[1]) * sc16[0] +
|
||||
(acc2[0] + 1.f/256.f * acc2[1]) * sc16[1] * 1.f/4096.f) -
|
||||
dmin * 1.f/16.f * (sumy[0] * sc16[2] + sumy[1] * sc16[3] * 1.f/256.f);
|
||||
|
||||
qs += step;
|
||||
sc += step;
|
||||
dh += step;
|
||||
}
|
||||
|
||||
y4 += 8 * QK_K;
|
||||
}
|
||||
|
||||
for (int row = 0; row < N_DST; ++row) {
|
||||
all_sum = simd_sum(sumf[row]);
|
||||
if (tiisg == 0) {
|
||||
dst[r1*ne0 + first_row + row] = all_sum;
|
||||
}
|
||||
}
|
||||
}
|
||||
#endif
|
||||
|
||||
kernel void kernel_mul_mat_q5_K_f32(
|
||||
device const void * src0,
|
||||
@@ -1629,39 +1670,39 @@ kernel void kernel_mul_mat_q5_K_f32(
|
||||
constant int64_t & ne00,
|
||||
constant int64_t & ne10,
|
||||
constant int64_t & ne0,
|
||||
threadgroup float * sum [[threadgroup(0)]],
|
||||
uint2 tgpig[[threadgroup_position_in_grid]],
|
||||
uint2 tpitg[[thread_position_in_threadgroup]],
|
||||
uint2 tptg[[threads_per_threadgroup]]) {
|
||||
uint tiisg[[thread_index_in_simdgroup]],
|
||||
uint sgitg[[simdgroup_index_in_threadgroup]]) {
|
||||
|
||||
const int nb = ne00/QK_K;
|
||||
|
||||
const int64_t r0 = tgpig.x;
|
||||
const int64_t r1 = tgpig.y;
|
||||
|
||||
device const block_q5_K * x = (device const block_q5_K *) src0 + r0*nb;
|
||||
const int first_row = (r0 * N_SIMDGROUP + sgitg) * 2;
|
||||
|
||||
device const block_q5_K * x = (device const block_q5_K *) src0 + first_row*nb;
|
||||
device const float * yy = (device const float *) src1 + r1*ne10;
|
||||
|
||||
const int nth = tptg.x*tptg.y;
|
||||
const int ith = tptg.y*tpitg.x + tpitg.y;
|
||||
float sumf[2]={0.f};
|
||||
|
||||
float sumf = 0;
|
||||
const int step = sizeof(block_q5_K) * nb;
|
||||
|
||||
#if QK_K == 256
|
||||
#
|
||||
float yl[16], yh[16];
|
||||
|
||||
const uint16_t kmask1 = 0x3f3f;
|
||||
const uint16_t kmask2 = 0x0f0f;
|
||||
const uint16_t kmask3 = 0xc0c0;
|
||||
|
||||
const int tid = tpitg.y; // 0...16
|
||||
const int il = tid/4; // 0...3
|
||||
const int ir = tid - 4*il;// 0...3
|
||||
const int n = 4;
|
||||
const int tid = tiisg/4;
|
||||
const int ix = tiisg%4;
|
||||
const int im = tid/4;
|
||||
const int ir = tid%4;
|
||||
const int n = 8;
|
||||
|
||||
const int im = il/2; // 0 or 1. 0 computes 0,32 + 128,160, 1 computes 64,96 + 192,224
|
||||
const int in = il%2;
|
||||
|
||||
const int l0 = n*(2*ir + in);
|
||||
const int l0 = n*ir;
|
||||
const int q_offset = 32*im + l0;
|
||||
const int y_offset = 64*im + l0;
|
||||
|
||||
@@ -1670,78 +1711,114 @@ kernel void kernel_mul_mat_q5_K_f32(
|
||||
const uint8_t hm3 = hm1 << 4;
|
||||
const uint8_t hm4 = hm2 << 4;
|
||||
|
||||
uchar2 sc1, sc2, sc3, sc4;
|
||||
uint16_t sc16[4];
|
||||
thread const uint8_t * sc8 = (thread const uint8_t *)sc16;
|
||||
|
||||
for (int i = tpitg.x; i < nb; i += tptg.x) {
|
||||
device const float * y1 = yy + ix*QK_K + y_offset;
|
||||
|
||||
device const uint8_t * q1 = (x + i)->qs + q_offset;
|
||||
device const uint8_t * q2 = q1 + 64;
|
||||
device const uint8_t * qh = (x + i)->qh + l0;
|
||||
device const float * y1 = yy + i*QK_K + y_offset;
|
||||
device const float * y2 = y1 + 128;
|
||||
for (int i = ix; i < nb; i += 4) {
|
||||
|
||||
const float dall = (float)((x + i)->d);
|
||||
const float dmin = (float)((x + i)->dmin);
|
||||
device const uint8_t * q1 = x[i].qs + q_offset;
|
||||
device const uint8_t * qh = x[i].qh + l0;
|
||||
device const half * dh = &x[i].d;
|
||||
device const uint16_t * a = (device const uint16_t *)x[i].scales + im;
|
||||
|
||||
device const uint16_t * a = (device const uint16_t *)(x + i)->scales;
|
||||
sc1 = as_type<uchar2>((uint16_t)(a[im+0] & kmask1));
|
||||
sc2 = as_type<uchar2>((uint16_t)(a[im+2] & kmask1));
|
||||
sc3 = as_type<uchar2>((uint16_t)(((a[im+4] >> 0) & kmask2) | ((a[im+0] & kmask3) >> 2)));
|
||||
sc4 = as_type<uchar2>((uint16_t)(((a[im+4] >> 4) & kmask2) | ((a[im+2] & kmask3) >> 2)));
|
||||
device const float * y2 = y1 + 128;
|
||||
float4 sumy = {0.f, 0.f, 0.f, 0.f};
|
||||
for (int l = 0; l < 8; ++l) {
|
||||
yl[l+0] = y1[l+ 0]; sumy[0] += yl[l+0];
|
||||
yl[l+8] = y1[l+32]; sumy[1] += yl[l+8];
|
||||
yh[l+0] = y2[l+ 0]; sumy[2] += yh[l+0];
|
||||
yh[l+8] = y2[l+32]; sumy[3] += yh[l+8];
|
||||
}
|
||||
|
||||
float4 s = {0.f, 0.f, 0.f, 0.f};
|
||||
float smin = 0;
|
||||
for (int l = 0; l < n; ++l) {
|
||||
for (int row = 0; row < 2; ++row) {
|
||||
|
||||
s[0] += y1[l+ 0] * ((q1[l] & 0xF) + (qh[l] & hm1 ? 16 : 0));
|
||||
s[1] += y1[l+32] * ((q1[l] >> 4) + (qh[l] & hm2 ? 16 : 0));
|
||||
s[2] += y2[l+ 0] * ((q2[l] & 0xF) + (qh[l] & hm3 ? 16 : 0));
|
||||
s[3] += y2[l+32] * ((q2[l] >> 4) + (qh[l] & hm4 ? 16 : 0));
|
||||
smin += y1[l] * sc2[0] + y1[l+32] * sc2[1] + y2[l] * sc4[0] + y2[l+32] * sc4[1];
|
||||
device const uint8_t * q2 = q1 + 64;
|
||||
|
||||
sc16[0] = a[0] & kmask1;
|
||||
sc16[1] = a[2] & kmask1;
|
||||
sc16[2] = ((a[4] >> 0) & kmask2) | ((a[0] & kmask3) >> 2);
|
||||
sc16[3] = ((a[4] >> 4) & kmask2) | ((a[2] & kmask3) >> 2);
|
||||
|
||||
float4 acc = {0.f, 0.f, 0.f, 0.f};
|
||||
for (int l = 0; l < n; ++l) {
|
||||
uint8_t h = qh[l];
|
||||
acc[0] += yl[l+0] * ((uint16_t)(q1[l] & 0x0F) + (h & hm1 ? 16 : 0));
|
||||
acc[1] += yl[l+8] * ((uint16_t)(q1[l] & 0xF0) + (h & hm2 ? 256 : 0));
|
||||
acc[2] += yh[l+0] * ((uint16_t)(q2[l] & 0x0F) + (h & hm3 ? 16 : 0));
|
||||
acc[3] += yh[l+8] * ((uint16_t)(q2[l] & 0xF0) + (h & hm4 ? 256 : 0));
|
||||
}
|
||||
const float dall = dh[0];
|
||||
const float dmin = dh[1];
|
||||
sumf[row] += dall * (acc[0] * sc8[0] + acc[1] * sc8[1] * 1.f/16.f + acc[2] * sc8[4] + acc[3] * sc8[5] * 1.f/16.f) -
|
||||
dmin * (sumy[0] * sc8[2] + sumy[1] * sc8[3] + sumy[2] * sc8[6] + sumy[3] * sc8[7]);
|
||||
|
||||
q1 += step;
|
||||
qh += step;
|
||||
dh += step/2;
|
||||
a += step/2;
|
||||
|
||||
}
|
||||
sumf += dall * (s[0] * sc1[0] + s[1] * sc1[1] + s[2] * sc3[0] + s[3] * sc3[1]) - dmin * smin;
|
||||
|
||||
y1 += 4 * QK_K;
|
||||
|
||||
}
|
||||
#else
|
||||
const int il = 4 * tpitg.x; // 0, 4, 8, 12
|
||||
const int im = il/8; // 0, 0, 1, 1
|
||||
const int in = il%8; // 0, 4, 0, 4
|
||||
float yl[8], yh[8];
|
||||
|
||||
for (int i = tpitg.y; i < nb; i += tptg.y) {
|
||||
const int il = 4 * (tiisg/8); // 0, 4, 8, 12
|
||||
const int ix = tiisg%8;
|
||||
const int im = il/8; // 0, 0, 1, 1
|
||||
const int in = il%8; // 0, 4, 0, 4
|
||||
|
||||
const float d = (float)x[i].d;
|
||||
device const float * y = yy + ix*QK_K + il;
|
||||
|
||||
for (int i = ix; i < nb; i += 8) {
|
||||
|
||||
float4 sumy = {0.f, 0.f, 0.f, 0.f};
|
||||
for (int l = 0; l < 4; ++l) {
|
||||
yl[l+0] = y[l+ 0];
|
||||
yl[l+4] = y[l+16];
|
||||
yh[l+0] = y[l+32];
|
||||
yh[l+4] = y[l+48];
|
||||
}
|
||||
|
||||
device const half * dh = &x[i].d;
|
||||
device const uint8_t * q = x[i].qs + il;
|
||||
device const uint8_t * h = x[i].qh + in;
|
||||
device const int8_t * s = x[i].scales;
|
||||
device const float * y = yy + i*QK_K + il;
|
||||
|
||||
for (int l = 0; l < 4; ++l) {
|
||||
const uint8_t hl = h[l] >> im;
|
||||
sumf += y[l+ 0] * d * s[0] * ((q[l+ 0] & 0xF) - (hl & 0x01 ? 0 : 16))
|
||||
+ y[l+16] * d * s[1] * ((q[l+16] & 0xF) - (hl & 0x04 ? 0 : 16))
|
||||
+ y[l+32] * d * s[2] * ((q[l+ 0] >> 4) - (hl & 0x10 ? 0 : 16))
|
||||
+ y[l+48] * d * s[3] * ((q[l+16] >> 4) - (hl & 0x40 ? 0 : 16));
|
||||
for (int row = 0; row < 2; ++row) {
|
||||
|
||||
const float d = dh[0];
|
||||
|
||||
float2 acc = {0.f, 0.f};
|
||||
for (int l = 0; l < 4; ++l) {
|
||||
const uint8_t hl = h[l] >> im;
|
||||
acc[0] += yl[l+0] * s[0] * ((int16_t)(q[l+ 0] & 0x0F) - (hl & 0x01 ? 0 : 16))
|
||||
+ yl[l+4] * s[1] * ((int16_t)(q[l+16] & 0x0F) - (hl & 0x04 ? 0 : 16));
|
||||
acc[1] += yh[l+0] * s[2] * ((int16_t)(q[l+ 0] & 0xF0) - (hl & 0x10 ? 0 : 256))
|
||||
+ yh[l+4] * s[3] * ((int16_t)(q[l+16] & 0xF0) - (hl & 0x40 ? 0 : 256));
|
||||
}
|
||||
sumf[row] += d * (acc[0] + 1.f/16.f * acc[1]);
|
||||
|
||||
q += step;
|
||||
h += step;
|
||||
s += step;
|
||||
dh += step/2;
|
||||
|
||||
}
|
||||
|
||||
y += 8 * QK_K;
|
||||
}
|
||||
#endif
|
||||
sum[ith] = sumf;
|
||||
|
||||
//
|
||||
// Accumulate the sum from all threads in the threadgroup
|
||||
//
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith%4 == 0) {
|
||||
sum[ith] += sum[ith+1] + sum[ith+2] + sum[ith+3];
|
||||
}
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith%16 == 0) {
|
||||
sum[ith] += sum[ith+4] + sum[ith+8] + sum[ith+12];
|
||||
}
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith == 0) {
|
||||
for (int i = 16; i < nth; i += 16) sum[0] += sum[i];
|
||||
dst[r1*ne0 + r0] = sum[0];
|
||||
for (int row = 0; row < 2; ++row) {
|
||||
const float tot = simd_sum(sumf[row]);
|
||||
if (tiisg == 0) {
|
||||
dst[r1*ne0 + first_row + row] = tot;
|
||||
}
|
||||
}
|
||||
|
||||
}
|
||||
@@ -1753,10 +1830,9 @@ kernel void kernel_mul_mat_q6_K_f32(
|
||||
constant int64_t & ne00,
|
||||
constant int64_t & ne10,
|
||||
constant int64_t & ne0,
|
||||
threadgroup float * sum [[threadgroup(0)]],
|
||||
uint2 tgpig[[threadgroup_position_in_grid]],
|
||||
uint2 tpitg[[thread_position_in_threadgroup]],
|
||||
uint2 tptg[[threads_per_threadgroup]]) {
|
||||
uint tiisg[[thread_index_in_simdgroup]],
|
||||
uint sgitg[[simdgroup_index_in_threadgroup]]) {
|
||||
|
||||
const uint8_t kmask1 = 0x03;
|
||||
const uint8_t kmask2 = 0x0C;
|
||||
@@ -1768,19 +1844,18 @@ kernel void kernel_mul_mat_q6_K_f32(
|
||||
const int64_t r0 = tgpig.x;
|
||||
const int64_t r1 = tgpig.y;
|
||||
|
||||
device const block_q6_K * x = (device const block_q6_K *) src0 + r0*nb;
|
||||
device const float * yy = (device const float *) src1 + r1*ne10;
|
||||
const int row = 2 * r0 + sgitg;
|
||||
|
||||
const int nth = tptg.x*tptg.y;
|
||||
const int ith = tptg.y*tpitg.x + tpitg.y;
|
||||
device const block_q6_K * x = (device const block_q6_K *) src0 + row * nb; //r0*nb;
|
||||
device const float * yy = (device const float *) src1 + r1*ne10;
|
||||
|
||||
float sumf = 0;
|
||||
|
||||
#if QK_K == 256
|
||||
// Note: we absolutely assume that tptg.y = 16 and QK_K = 256!
|
||||
const int iqs = 16 * tpitg.y;
|
||||
const int ip = iqs / 128; // 0 or 1
|
||||
const int il = (iqs - 128*ip)/16; // 0...7
|
||||
const int tid = tiisg/2;
|
||||
const int ix = tiisg%2;
|
||||
const int ip = tid/8; // 0 or 1
|
||||
const int il = tid%8;
|
||||
const int n = 4;
|
||||
const int l0 = n*il;
|
||||
const int is = 8*ip + l0/16;
|
||||
@@ -1789,9 +1864,10 @@ kernel void kernel_mul_mat_q6_K_f32(
|
||||
const int q_offset_l = 64*ip + l0;
|
||||
const int q_offset_h = 32*ip + l0;
|
||||
|
||||
for (int i = tpitg.x; i < nb; i += tptg.x) {
|
||||
for (int i = ix; i < nb; i += 2) {
|
||||
|
||||
device const uint8_t * ql = x[i].ql + q_offset_l;
|
||||
device const uint8_t * q1 = x[i].ql + q_offset_l;
|
||||
device const uint8_t * q2 = q1 + 32;
|
||||
device const uint8_t * qh = x[i].qh + q_offset_h;
|
||||
device const int8_t * sc = x[i].scales + is;
|
||||
|
||||
@@ -1801,19 +1877,21 @@ kernel void kernel_mul_mat_q6_K_f32(
|
||||
|
||||
float4 sums = {0.f, 0.f, 0.f, 0.f};
|
||||
for (int l = 0; l < n; ++l) {
|
||||
sums[0] += y[l+ 0] * ((int8_t)((ql[l+ 0] & 0xF) | ((qh[l] & kmask1) << 4)) - 32);
|
||||
sums[1] += y[l+32] * ((int8_t)((ql[l+32] & 0xF) | ((qh[l] & kmask2) << 2)) - 32);
|
||||
sums[2] += y[l+64] * ((int8_t)((ql[l+ 0] >> 4) | ((qh[l] & kmask3) << 0)) - 32);
|
||||
sums[3] += y[l+96] * ((int8_t)((ql[l+32] >> 4) | ((qh[l] & kmask4) >> 2)) - 32);
|
||||
sums[0] += y[l+ 0] * ((int8_t)((q1[l] & 0xF) | ((qh[l] & kmask1) << 4)) - 32);
|
||||
sums[1] += y[l+32] * ((int8_t)((q2[l] & 0xF) | ((qh[l] & kmask2) << 2)) - 32);
|
||||
sums[2] += y[l+64] * ((int8_t)((q1[l] >> 4) | ((qh[l] & kmask3) << 0)) - 32);
|
||||
sums[3] += y[l+96] * ((int8_t)((q2[l] >> 4) | ((qh[l] & kmask4) >> 2)) - 32);
|
||||
}
|
||||
|
||||
sumf += dall * (sums[0] * sc[0] + sums[1] * sc[2] + sums[2] * sc[4] + sums[3] * sc[6]);
|
||||
|
||||
}
|
||||
#else
|
||||
const int il = 4*tpitg.x; // 0, 4, 8, 12
|
||||
|
||||
for (int i = tpitg.y; i < nb; i += tptg.y) {
|
||||
#else
|
||||
const int ix = tiisg/4;
|
||||
const int il = 4*(tiisg%4);
|
||||
|
||||
for (int i = ix; i < nb; i += 8) {
|
||||
device const float * y = yy + i * QK_K + il;
|
||||
device const uint8_t * ql = x[i].ql + il;
|
||||
device const uint8_t * qh = x[i].qh + il;
|
||||
@@ -1833,23 +1911,8 @@ kernel void kernel_mul_mat_q6_K_f32(
|
||||
|
||||
#endif
|
||||
|
||||
sum[ith] = sumf;
|
||||
|
||||
//
|
||||
// Accumulate the sum from all threads in the threadgroup
|
||||
//
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith%4 == 0) {
|
||||
for (int i = 1; i < 4; ++i) sum[ith] += sum[ith + i];
|
||||
const float tot = simd_sum(sumf);
|
||||
if (tiisg == 0) {
|
||||
dst[r1*ne0 + row] = tot;
|
||||
}
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith%16 == 0) {
|
||||
for (int i = 4; i < 16; i += 4) sum[ith] += sum[ith + i];
|
||||
}
|
||||
threadgroup_barrier(mem_flags::mem_threadgroup);
|
||||
if (ith == 0) {
|
||||
for (int i = 16; i < nth; i += 16) sum[0] += sum[i];
|
||||
dst[r1*ne0 + r0] = sum[0];
|
||||
}
|
||||
|
||||
}
|
||||
|
244
llama/ggml-mpi.c
Normal file
@@ -0,0 +1,244 @@
|
||||
//go:build mpi
|
||||
|
||||
/**
|
||||
* llama.cpp - git e782c9e735f93ab4767ffc37462c523b73a17ddc
|
||||
*
|
||||
* MIT License
|
||||
*
|
||||
* Copyright (c) 2023 Georgi Gerganov
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
* of this software and associated documentation files (the "Software"), to deal
|
||||
* in the Software without restriction, including without limitation the rights
|
||||
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
* copies of the Software, and to permit persons to whom the Software is
|
||||
* furnished to do so, subject to the following conditions:
|
||||
*
|
||||
* The above copyright notice and this permission notice shall be included in all
|
||||
* copies or substantial portions of the Software.
|
||||
*
|
||||
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
* SOFTWARE.
|
||||
*/
|
||||
|
||||
#include "ggml-mpi.h"
|
||||
|
||||
#include "ggml.h"
|
||||
|
||||
#include <mpi.h>
|
||||
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
|
||||
#define MIN(a, b) ((a) < (b) ? (a) : (b))
|
||||
|
||||
#define UNUSED GGML_UNUSED
|
||||
|
||||
struct ggml_mpi_context {
|
||||
int rank;
|
||||
int size;
|
||||
};
|
||||
|
||||
void ggml_mpi_backend_init(void) {
|
||||
MPI_Init(NULL, NULL);
|
||||
}
|
||||
|
||||
void ggml_mpi_backend_free(void) {
|
||||
MPI_Finalize();
|
||||
}
|
||||
|
||||
struct ggml_mpi_context * ggml_mpi_init(void) {
|
||||
struct ggml_mpi_context * ctx = calloc(1, sizeof(struct ggml_mpi_context));
|
||||
|
||||
MPI_Comm_rank(MPI_COMM_WORLD, &ctx->rank);
|
||||
MPI_Comm_size(MPI_COMM_WORLD, &ctx->size);
|
||||
|
||||
return ctx;
|
||||
}
|
||||
|
||||
void ggml_mpi_free(struct ggml_mpi_context * ctx) {
|
||||
free(ctx);
|
||||
}
|
||||
|
||||
int ggml_mpi_rank(struct ggml_mpi_context * ctx) {
|
||||
return ctx->rank;
|
||||
}
|
||||
|
||||
void ggml_mpi_eval_init(
|
||||
struct ggml_mpi_context * ctx_mpi,
|
||||
int * n_tokens,
|
||||
int * n_past,
|
||||
int * n_threads) {
|
||||
UNUSED(ctx_mpi);
|
||||
|
||||
// synchronize the worker node parameters with the root node
|
||||
MPI_Barrier(MPI_COMM_WORLD);
|
||||
|
||||
MPI_Bcast(n_tokens, 1, MPI_INT, 0, MPI_COMM_WORLD);
|
||||
MPI_Bcast(n_past, 1, MPI_INT, 0, MPI_COMM_WORLD);
|
||||
MPI_Bcast(n_threads, 1, MPI_INT, 0, MPI_COMM_WORLD);
|
||||
}
|
||||
|
||||
static int ggml_graph_get_node_idx(struct ggml_cgraph * gf, const char * name) {
|
||||
struct ggml_tensor * t = ggml_graph_get_tensor(gf, name);
|
||||
if (t == NULL) {
|
||||
fprintf(stderr, "%s: tensor %s not found\n", __func__, name);
|
||||
return -1;
|
||||
}
|
||||
|
||||
for (int i = 0; i < gf->n_nodes; i++) {
|
||||
if (gf->nodes[i] == t) {
|
||||
return i;
|
||||
}
|
||||
}
|
||||
|
||||
fprintf(stderr, "%s: tensor %s not found in graph (should not happen)\n", __func__, name);
|
||||
return -1;
|
||||
}
|
||||
|
||||
static void ggml_mpi_tensor_send(struct ggml_tensor * t, int mpi_rank_dst) {
|
||||
MPI_Datatype mpi_type;
|
||||
|
||||
switch (t->type) {
|
||||
case GGML_TYPE_I32: mpi_type = MPI_INT32_T; break;
|
||||
case GGML_TYPE_F32: mpi_type = MPI_FLOAT; break;
|
||||
default: GGML_ASSERT(false && "not implemented");
|
||||
}
|
||||
|
||||
const int retval = MPI_Send(t->data, ggml_nelements(t), mpi_type, mpi_rank_dst, 0, MPI_COMM_WORLD);
|
||||
GGML_ASSERT(retval == MPI_SUCCESS);
|
||||
}
|
||||
|
||||
static void ggml_mpi_tensor_recv(struct ggml_tensor * t, int mpi_rank_src) {
|
||||
MPI_Datatype mpi_type;
|
||||
|
||||
switch (t->type) {
|
||||
case GGML_TYPE_I32: mpi_type = MPI_INT32_T; break;
|
||||
case GGML_TYPE_F32: mpi_type = MPI_FLOAT; break;
|
||||
default: GGML_ASSERT(false && "not implemented");
|
||||
}
|
||||
|
||||
MPI_Status status; UNUSED(status);
|
||||
|
||||
const int retval = MPI_Recv(t->data, ggml_nelements(t), mpi_type, mpi_rank_src, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
|
||||
GGML_ASSERT(retval == MPI_SUCCESS);
|
||||
}
|
||||
|
||||
// TODO: there are many improvements that can be done to this implementation
|
||||
void ggml_mpi_graph_compute_pre(
|
||||
struct ggml_mpi_context * ctx_mpi,
|
||||
struct ggml_cgraph * gf,
|
||||
int n_layers) {
|
||||
const int mpi_rank = ctx_mpi->rank;
|
||||
const int mpi_size = ctx_mpi->size;
|
||||
|
||||
struct ggml_tensor * inp_tokens = ggml_graph_get_tensor(gf, "inp_tokens");
|
||||
if (inp_tokens == NULL) {
|
||||
fprintf(stderr, "%s: tensor 'inp_tokens' not found\n", __func__);
|
||||
return;
|
||||
}
|
||||
|
||||
struct ggml_tensor * inp0 = ggml_graph_get_tensor(gf, "layer_inp_0");
|
||||
if (inp0 == NULL) {
|
||||
fprintf(stderr, "%s: tensor 'layer_inp_0' not found\n", __func__);
|
||||
return;
|
||||
}
|
||||
|
||||
GGML_ASSERT(inp0 == gf->nodes[0]);
|
||||
|
||||
// distribute the compute graph into slices across the MPI nodes
|
||||
//
|
||||
// the main node (0) processes the last layers + the remainder of the compute graph
|
||||
// and is responsible for passing the input tokens to the first node (1)
|
||||
//
|
||||
// node 1: [( 0) * n_per_node, ( 1) * n_per_node)
|
||||
// node 2: [( 1) * n_per_node, ( 2) * n_per_node)
|
||||
// ...
|
||||
// node n-1: [(n-2) * n_per_node, (n-1) * n_per_node)
|
||||
// node 0: [(n-1) * n_per_node, n_nodes)
|
||||
//
|
||||
if (mpi_rank > 0) {
|
||||
if (mpi_rank == 1) {
|
||||
// the first node (1) receives the input tokens from the main node (0)
|
||||
ggml_mpi_tensor_recv(inp_tokens, 0);
|
||||
} else {
|
||||
// recv input data for each node into the "inp0" tensor (i.e. the first node in the compute graph)
|
||||
ggml_mpi_tensor_recv(inp0, mpi_rank - 1);
|
||||
}
|
||||
} else if (mpi_size > 1) {
|
||||
// node 0 sends the input tokens to node 1
|
||||
ggml_mpi_tensor_send(inp_tokens, 1);
|
||||
|
||||
// recv the output data from the last node
|
||||
ggml_mpi_tensor_recv(inp0, mpi_size - 1);
|
||||
}
|
||||
|
||||
{
|
||||
const int n_per_node = (n_layers + (mpi_size - 1)) / mpi_size;
|
||||
|
||||
const int mpi_idx = mpi_rank > 0 ? mpi_rank - 1 : mpi_size - 1;
|
||||
|
||||
const int il0 = (mpi_idx + 0) * n_per_node;
|
||||
const int il1 = MIN(n_layers, (mpi_idx + 1) * n_per_node);
|
||||
|
||||
char name_l0[GGML_MAX_NAME];
|
||||
char name_l1[GGML_MAX_NAME];
|
||||
|
||||
snprintf(name_l0, sizeof(name_l0), "layer_inp_%d", il0);
|
||||
snprintf(name_l1, sizeof(name_l1), "layer_inp_%d", il1);
|
||||
|
||||
const int idx_l0 = ggml_graph_get_node_idx(gf, name_l0);
|
||||
const int idx_l1 = mpi_rank > 0 ? ggml_graph_get_node_idx(gf, name_l1) + 1 : gf->n_nodes;
|
||||
|
||||
if (idx_l0 < 0 || idx_l1 < 0) {
|
||||
fprintf(stderr, "%s: layer input nodes not found\n", __func__);
|
||||
return;
|
||||
}
|
||||
|
||||
// attach the input data to all nodes that need it
|
||||
// TODO: not great - should be able to do this without modifying the compute graph (see next TODO below)
|
||||
for (int i = idx_l0; i < idx_l1; i++) {
|
||||
if (gf->nodes[i]->src[0] == gf->nodes[idx_l0]) {
|
||||
gf->nodes[i]->src[0] = inp0;
|
||||
}
|
||||
if (gf->nodes[i]->src[1] == gf->nodes[idx_l0]) {
|
||||
gf->nodes[i]->src[1] = inp0;
|
||||
}
|
||||
}
|
||||
|
||||
// TODO: instead of rearranging the nodes, we should be able to execute a subset of the compute graph
|
||||
for (int i = 1; i < idx_l1 - idx_l0; i++) {
|
||||
gf->nodes[i] = gf->nodes[idx_l0 + i];
|
||||
gf->grads[i] = gf->grads[idx_l0 + i];
|
||||
}
|
||||
|
||||
// the first node performs the "get_rows" operation, the rest of the nodes get the data from the previous node
|
||||
if (mpi_idx != 0) {
|
||||
gf->nodes[0]->op = GGML_OP_NONE;
|
||||
}
|
||||
|
||||
gf->n_nodes = idx_l1 - idx_l0;
|
||||
|
||||
//fprintf(stderr, "%s: node %d: processing %d nodes [%d, %d)\n", __func__, mpi_rank, gf->n_nodes, il0, il1);
|
||||
}
|
||||
}
|
||||
|
||||
void ggml_mpi_graph_compute_post(
|
||||
struct ggml_mpi_context * ctx_mpi,
|
||||
struct ggml_cgraph * gf,
|
||||
int n_layers) {
|
||||
UNUSED(n_layers);
|
||||
|
||||
const int mpi_rank = ctx_mpi->rank;
|
||||
const int mpi_size = ctx_mpi->size;
|
||||
|
||||
// send the output data to the next node
|
||||
if (mpi_rank > 0) {
|
||||
ggml_mpi_tensor_send(gf->nodes[gf->n_nodes - 1], (mpi_rank + 1) % mpi_size);
|
||||
}
|
||||
}
|
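The layer slicing described in the comment of `ggml_mpi_graph_compute_pre` can be illustrated in isolation. The sketch below is a minimal, standalone restatement of the `n_per_node` / `mpi_idx` / `il0` / `il1` arithmetic from that function. The helper name `layer_range` and its signature are hypothetical and do not exist in `ggml-mpi.c`; no MPI calls are made.

```c
#include <assert.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical helper, not part of ggml-mpi.c.
 * Computes the half-open layer range [*il0, *il1) assigned to `mpi_rank`,
 * mirroring the arithmetic in ggml_mpi_graph_compute_pre:
 *   - ranks 1..size-1 take consecutive slices starting at layer 0
 *   - rank 0 (the main node) takes the last slice */
void layer_range(int mpi_rank, int mpi_size, int n_layers, int *il0, int *il1) {
    assert(mpi_size > 0 && mpi_rank >= 0 && mpi_rank < mpi_size);

    /* ceil(n_layers / mpi_size) using integer arithmetic */
    const int n_per_node = (n_layers + (mpi_size - 1)) / mpi_size;

    /* rank 0 is mapped to the last slice, rank r > 0 to slice r - 1 */
    const int mpi_idx = mpi_rank > 0 ? mpi_rank - 1 : mpi_size - 1;

    *il0 = mpi_idx * n_per_node;
    *il1 = MIN(n_layers, (mpi_idx + 1) * n_per_node);
}
```

With 32 layers and 3 ranks, `n_per_node` is 11, so rank 1 gets layers [0, 11), rank 2 gets [11, 22), and rank 0 gets [22, 32). In the diffed code, rank 0 additionally processes everything after its last layer, since its node range ends at `gf->n_nodes` rather than at the next `layer_inp_%d` tensor.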
67
llama/ggml-mpi.h
Normal file
@@ -0,0 +1,67 @@
|
||||
//go:build mpi
|
||||
|
||||
/**
|
||||
* llama.cpp - git e782c9e735f93ab4767ffc37462c523b73a17ddc
|
||||
*
|
||||
* MIT License
|
||||
*
|
||||
* Copyright (c) 2023 Georgi Gerganov
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
* of this software and associated documentation files (the "Software"), to deal
|
||||
* in the Software without restriction, including without limitation the rights
|
||||
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
* copies of the Software, and to permit persons to whom the Software is
|
||||
* furnished to do so, subject to the following conditions:
|
||||
*
|
||||
* The above copyright notice and this permission notice shall be included in all
|
||||
* copies or substantial portions of the Software.
|
||||
*
|
||||
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
* SOFTWARE.
|
||||
*/
|
||||
|
||||
#pragma once
|
||||
|
||||
struct ggml_context;
|
||||
struct ggml_tensor;
|
||||
struct ggml_cgraph;
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
struct ggml_mpi_context;
|
||||
|
||||
void ggml_mpi_backend_init(void);
|
||||
void ggml_mpi_backend_free(void);
|
||||
|
||||
struct ggml_mpi_context * ggml_mpi_init(void);
|
||||
void ggml_mpi_free(struct ggml_mpi_context * ctx);
|
||||
|
||||
int ggml_mpi_rank(struct ggml_mpi_context * ctx);
|
||||
|
||||
void ggml_mpi_eval_init(
|
||||
struct ggml_mpi_context * ctx_mpi,
|
||||
int * n_tokens,
|
||||
int * n_past,
|
||||
int * n_threads);
|
||||
|
||||
void ggml_mpi_graph_compute_pre(
|
||||
struct ggml_mpi_context * ctx_mpi,
|
||||
struct ggml_cgraph * gf,
|
||||
int n_layers);
|
||||
|
||||
void ggml_mpi_graph_compute_post(
|
||||
struct ggml_mpi_context * ctx_mpi,
|
||||
struct ggml_cgraph * gf,
|
||||
int n_layers);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
1893
llama/ggml-opencl.cpp
Normal file
File diff suppressed because it is too large
53
llama/ggml-opencl.h
Normal file
@@ -0,0 +1,53 @@
|
||||
//go:build opencl
|
||||
|
||||
/**
|
||||
* llama.cpp - git e782c9e735f93ab4767ffc37462c523b73a17ddc
|
||||
*
|
||||
* MIT License
|
||||
*
|
||||
* Copyright (c) 2023 Georgi Gerganov
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
* of this software and associated documentation files (the "Software"), to deal
|
||||
* in the Software without restriction, including without limitation the rights
|
||||
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
* copies of the Software, and to permit persons to whom the Software is
|
||||
* furnished to do so, subject to the following conditions:
|
||||
*
|
||||
* The above copyright notice and this permission notice shall be included in all
|
||||
* copies or substantial portions of the Software.
|
||||
*
|
||||
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
* SOFTWARE.
|
||||
*/
|
||||
|
||||
#pragma once
|
||||
|
||||
#include "ggml.h"
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
void ggml_cl_init(void);
|
||||
|
||||
void ggml_cl_mul(const struct ggml_tensor * src0, const struct ggml_tensor * src1, struct ggml_tensor * dst);
|
||||
bool ggml_cl_can_mul_mat(const struct ggml_tensor * src0, const struct ggml_tensor * src1, struct ggml_tensor * dst);
|
||||
size_t ggml_cl_mul_mat_get_wsize(const struct ggml_tensor * src0, const struct ggml_tensor * src1, struct ggml_tensor * dst);
|
||||
void ggml_cl_mul_mat(const struct ggml_tensor * src0, const struct ggml_tensor * src1, struct ggml_tensor * dst, void * wdata, size_t wsize);
|
||||
|
||||
void * ggml_cl_host_malloc(size_t size);
|
||||
void ggml_cl_host_free(void * ptr);
|
||||
|
||||
void ggml_cl_free_data(const struct ggml_tensor* tensor);
|
||||
|
||||
void ggml_cl_transform_tensor(void * data, struct ggml_tensor * tensor);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
650
llama/ggml.c
File diff suppressed because it is too large
51
llama/ggml.h
@@ -1,5 +1,5 @@
|
||||
/**
|
||||
* llama.cpp - git 5bf2a2771886ee86137e01dbc7492f78fb392066
|
||||
* llama.cpp - git e782c9e735f93ab4767ffc37462c523b73a17ddc
|
||||
*
|
||||
* MIT License
|
||||
*
|
||||
@@ -227,8 +227,13 @@
|
||||
#define GGML_MAX_NAME 48
|
||||
#define GGML_DEFAULT_N_THREADS 4
|
||||
|
||||
|
||||
#define GGML_EXIT_SUCCESS 0
|
||||
#define GGML_EXIT_ABORTED 1
|
||||
|
||||
#define GGML_UNUSED(x) (void)(x)
|
||||
|
||||
|
||||
#define GGML_ASSERT(x) \
|
||||
do { \
|
||||
if (!(x)) { \
|
||||
@@ -389,6 +394,8 @@ extern "C" {
|
||||
GGML_OP_CLAMP,
|
||||
GGML_OP_CONV_1D,
|
||||
GGML_OP_CONV_2D,
|
||||
GGML_OP_POOL_1D,
|
||||
GGML_OP_POOL_2D,
|
||||
|
||||
GGML_OP_FLASH_ATTN,
|
||||
GGML_OP_FLASH_FF,
|
||||
@@ -468,6 +475,10 @@ extern "C" {
|
||||
|
||||
// the `n_tasks` of nodes, 1:1 mapping to cgraph nodes
|
||||
int n_tasks[GGML_MAX_NODES];
|
||||
|
||||
// abort ggml_graph_compute when true
|
||||
bool (*abort_callback)(void * data);
|
||||
void * abort_callback_data;
|
||||
};
|
||||
|
||||
// computation graph
|
||||
@@ -1136,6 +1147,17 @@ extern "C" {
|
||||
int mode,
|
||||
int n_ctx);
|
||||
|
||||
// custom RoPE, in-place, returns view(a)
|
||||
GGML_API struct ggml_tensor * ggml_rope_custom_inplace(
|
||||
struct ggml_context * ctx,
|
||||
struct ggml_tensor * a,
|
||||
int n_past,
|
||||
int n_dims,
|
||||
int mode,
|
||||
float freq_base,
|
||||
float freq_scale,
|
||||
int n_ctx);
|
||||
|
||||
// rotary position embedding backward, i.e. compute dx from dy
|
||||
// a - dy
|
||||
GGML_API struct ggml_tensor * ggml_rope_back(
|
||||
@@ -1190,6 +1212,31 @@ extern "C" {
|
||||
int s,
|
||||
int d);
|
||||
|
||||
enum ggml_op_pool {
|
||||
GGML_OP_POOL_MAX,
|
||||
GGML_OP_POOL_AVG,
|
||||
GGML_OP_POOL_COUNT,
|
||||
};
|
||||
|
||||
GGML_API struct ggml_tensor* ggml_pool_1d(
|
||||
struct ggml_context * ctx,
|
||||
struct ggml_tensor * a,
|
||||
enum ggml_op_pool op,
|
||||
int k0, // kernel size
|
||||
int s0, // stride
|
||||
int p0); // padding
|
||||
|
||||
GGML_API struct ggml_tensor* ggml_pool_2d(
|
||||
struct ggml_context * ctx,
|
||||
struct ggml_tensor * a,
|
||||
enum ggml_op_pool op,
|
||||
int k0,
|
||||
int k1,
|
||||
int s0,
|
||||
int s1,
|
||||
int p0,
|
||||
int p1);
|
||||
|
||||
GGML_API struct ggml_tensor * ggml_flash_attn(
|
||||
struct ggml_context * ctx,
|
||||
struct ggml_tensor * q,
|
||||
@@ -1329,7 +1376,7 @@ extern "C" {
|
||||
// ggml_graph_plan() has to be called before ggml_graph_compute()
|
||||
// when plan.work_size > 0, caller must allocate memory for plan.work_data
|
||||
GGML_API struct ggml_cplan ggml_graph_plan (struct ggml_cgraph * cgraph, int n_threads /*= GGML_DEFAULT_N_THREADS*/);
|
||||
GGML_API void ggml_graph_compute(struct ggml_cgraph * cgraph, struct ggml_cplan * cplan);
|
||||
GGML_API int ggml_graph_compute(struct ggml_cgraph * cgraph, struct ggml_cplan * cplan);
|
||||
GGML_API void ggml_graph_reset (struct ggml_cgraph * cgraph);
|
||||
|
||||
// same as ggml_graph_compute() but the work data is allocated as a part of the context
|
||||
|
@@ -1,5 +1,5 @@
|
||||
/**
|
||||
* llama.cpp - git 5bf2a2771886ee86137e01dbc7492f78fb392066
|
||||
* llama.cpp - git e782c9e735f93ab4767ffc37462c523b73a17ddc
|
||||
*
|
||||
* MIT License
|
||||
*
|
||||
|
@@ -1,5 +1,5 @@
|
||||
/**
|
||||
* llama.cpp - git 5bf2a2771886ee86137e01dbc7492f78fb392066
|
||||
* llama.cpp - git e782c9e735f93ab4767ffc37462c523b73a17ddc
|
||||
*
|
||||
* MIT License
|
||||
*
|
||||
@@ -41,6 +41,14 @@
|
||||
#define K_SCALE_SIZE 12
|
||||
#endif
|
||||
|
||||
#ifndef static_assert
|
||||
#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 201100L)
|
||||
#define static_assert(cond, msg) _Static_assert(cond, msg)
|
||||
#else
|
||||
#define static_assert(cond, msg) struct global_scope_noop_trick
|
||||
#endif
|
||||
#endif
|
||||
|
||||
//
|
||||
// Super-block quantization structures
|
||||
//
|
||||
|
@@ -1,5 +1,5 @@
|
||||
/**
|
||||
* llama.cpp - git 5bf2a2771886ee86137e01dbc7492f78fb392066
|
||||
* llama.cpp - git e782c9e735f93ab4767ffc37462c523b73a17ddc
|
||||
*
|
||||
* MIT License
|
||||
*
|
||||
@@ -201,13 +201,13 @@ struct llama_mmap {
|
||||
llama_mmap(struct llama_file * file, size_t prefetch = (size_t) -1 /* -1 = max value */, bool numa = false) {
|
||||
size = file->size;
|
||||
int fd = fileno(file->fp);
|
||||
int flags = MAP_PRIVATE;
|
||||
int flags = MAP_SHARED;
|
||||
// prefetch/readahead impairs performance on NUMA systems
|
||||
if (numa) { prefetch = 0; }
|
||||
#ifdef __linux__
|
||||
if (prefetch) { flags |= MAP_POPULATE; }
|
||||
#endif
|
||||
addr = mmap(NULL, file->size, PROT_READ | PROT_WRITE, flags, fd, 0);
|
||||
addr = mmap(NULL, file->size, PROT_READ, flags, fd, 0);
|
||||
if (addr == MAP_FAILED) {
|
||||
throw std::runtime_error(format("mmap failed: %s", strerror(errno)));
|
||||
}
|
||||
@@ -249,7 +249,7 @@ struct llama_mmap {
|
||||
throw std::runtime_error(format("CreateFileMappingA failed: %s", llama_format_win_err(error).c_str()));
|
||||
}
|
||||
|
||||
addr = MapViewOfFile(hMapping, FILE_MAP_COPY, 0, 0, 0);
|
||||
addr = MapViewOfFile(hMapping, FILE_MAP_READ, 0, 0, 0);
|
||||
error = GetLastError();
|
||||
CloseHandle(hMapping);
|
||||
|
||||
|
175
llama/llama.cpp
175
llama/llama.cpp
@@ -1,5 +1,5 @@
|
||||
/**
|
||||
* llama.cpp - git 5bf2a2771886ee86137e01dbc7492f78fb392066
|
||||
* llama.cpp - git e782c9e735f93ab4767ffc37462c523b73a17ddc
|
||||
*
|
||||
* MIT License
|
||||
*
|
||||
@@ -127,14 +127,15 @@ static void ggml_graph_compute_helper(std::vector<uint8_t> & buf, ggml_cgraph *
|
||||
// memory sizes
|
||||
//
|
||||
|
||||
static const std::map<e_model, size_t> & MEM_REQ_SCRATCH0()
|
||||
static const std::map<e_model, size_t> & MEM_REQ_SCRATCH0(int n_ctx)
|
||||
{
|
||||
static std::map<e_model, size_t> k_sizes = {
|
||||
{ MODEL_3B, 256ull * MB },
|
||||
{ MODEL_7B, 512ull * MB },
|
||||
{ MODEL_13B, 512ull * MB },
|
||||
{ MODEL_30B, 512ull * MB },
|
||||
{ MODEL_65B, 1024ull * MB },
|
||||
/* empirical scaling, still a guess */
|
||||
{ MODEL_3B, ((size_t) n_ctx / 16ull + 128ull) * MB },
|
||||
{ MODEL_7B, ((size_t) n_ctx / 16ull + 256ull) * MB },
|
||||
{ MODEL_13B, ((size_t) n_ctx / 12ull + 256ull) * MB },
|
||||
{ MODEL_30B, ((size_t) n_ctx / 10ull + 256ull) * MB },
|
||||
{ MODEL_65B, ((size_t) n_ctx / 8ull + 512ull) * MB },
|
||||
};
|
||||
return k_sizes;
|
||||
}
|
||||
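A worked example of the new context-dependent scratch sizes may help when reading this hunk. The Go sketch below copies the divisors and base values from the new `MEM_REQ_SCRATCH0` rows. The names `scratchRule`, `scratchRules`, and `scratch0MB` are hypothetical and exist only for illustration; they are not part of llama.cpp.

```go
package main

import "fmt"

// scratchRule holds the divisor and base size (in MB) from one table row.
type scratchRule struct {
	div    uint64
	baseMB uint64
}

// scratchRules copies the new MEM_REQ_SCRATCH0 rows from the hunk above.
var scratchRules = map[string]scratchRule{
	"3B":  {16, 128},
	"7B":  {16, 256},
	"13B": {12, 256},
	"30B": {10, 256},
	"65B": {8, 512},
}

// scratch0MB returns n_ctx/div + base in MB, using integer division
// as the C++ expression does. The bool is false for an unknown model.
func scratch0MB(model string, nCtx uint64) (uint64, bool) {
	r, ok := scratchRules[model]
	if !ok {
		return 0, false
	}
	return nCtx/r.div + r.baseMB, true
}

func main() {
	for _, m := range []string{"3B", "7B", "13B", "30B", "65B"} {
		mb, _ := scratch0MB(m, 2048)
		fmt.Printf("%-3s n_ctx=2048 -> %d MB\n", m, mb)
	}
}
```

With `n_ctx = 2048`, the 7B row gives 2048/16 + 256 = 384 MB, compared with the fixed 512 MB in the removed table.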
@@ -166,14 +167,14 @@ static const std::map<e_model, size_t> & MEM_REQ_KV_SELF()
|
||||
|
||||
// this is mostly needed for temporary mul_mat buffers to dequantize the data
|
||||
// not actually needed if BLAS is disabled
|
||||
static const std::map<e_model, size_t> & MEM_REQ_EVAL()
|
||||
static const std::map<e_model, size_t> & MEM_REQ_EVAL(int n_ctx)
|
||||
{
|
||||
static std::map<e_model, size_t> k_sizes = {
|
||||
{ MODEL_3B, 512ull * MB },
|
||||
{ MODEL_7B, 768ull * MB },
|
||||
{ MODEL_13B, 1024ull * MB },
|
||||
{ MODEL_30B, 1280ull * MB },
|
||||
{ MODEL_65B, 1536ull * MB },
|
||||
{ MODEL_3B, ((size_t) n_ctx / 256ull + 512ull) * MB },
|
||||
{ MODEL_7B, ((size_t) n_ctx / 256ull + 768ull) * MB },
|
||||
{ MODEL_13B, ((size_t) n_ctx / 256ull + 1024ull) * MB },
|
||||
{ MODEL_30B, ((size_t) n_ctx / 256ull + 1280ull) * MB },
|
||||
{ MODEL_65B, ((size_t) n_ctx / 256ull + 1536ull) * MB },
|
||||
};
|
||||
return k_sizes;
|
||||
}
|
||||
@@ -215,6 +216,10 @@ struct llama_hparams {
|
||||
uint32_t n_head = 32;
|
||||
uint32_t n_layer = 32;
|
||||
uint32_t n_rot = 64;
|
||||
|
||||
float rope_freq_base = 10000.0f;
|
||||
float rope_freq_scale = 1.0f;
|
||||
|
||||
enum llama_ftype ftype = LLAMA_FTYPE_MOSTLY_F16;
|
||||
|
||||
bool operator!=(const llama_hparams & other) const {
|
||||
@@ -329,7 +334,7 @@ struct llama_model {
|
||||
};
|
||||
|
||||
struct llama_context {
|
||||
llama_context(const llama_model & model, const llama_vocab & vocab) : model(model), vocab(vocab), t_load_us(model.t_load_us), t_start_us(model.t_start_us) {}
|
||||
llama_context(const llama_model & model) : model(model), t_load_us(model.t_load_us), t_start_us(model.t_start_us) {}
|
||||
#ifdef GGML_USE_METAL
|
||||
~llama_context() {
|
||||
if (ctx_metal) {
|
||||
@@ -350,7 +355,6 @@ struct llama_context {
|
||||
int32_t n_p_eval = 0; // number of tokens in eval calls for the prompt (with batch size > 1)
|
||||
|
||||
const llama_model & model;
|
||||
const llama_vocab & vocab;
|
||||
|
||||
bool model_owner = false;
|
||||
|
||||
@@ -577,7 +581,9 @@ struct llama_file_loader {
|
||||
}
|
||||
|
||||
// skip to the next multiple of 32 bytes
|
||||
file.seek(-static_cast<ptrdiff_t>(file.tell()) & 31, SEEK_CUR);
|
||||
if (file_version >= LLAMA_FILE_VERSION_GGJT_V1) {
|
||||
file.seek(-static_cast<ptrdiff_t>(file.tell()) & 31, SEEK_CUR);
|
||||
}
|
||||
|
||||
tensor.file_off = file.tell();
|
||||
tensor.name = name;
|
||||
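The "skip to the next multiple of 32 bytes" seek relies on a two's-complement identity: `-offset & 31` is the number of padding bytes needed to reach the next 32-byte boundary. The Go sketch below checks that arithmetic in isolation. `padTo32` is a hypothetical helper name, not a function in llama.cpp.

```go
package main

import "fmt"

// padTo32 returns how many bytes must be skipped from offset to reach
// the next multiple of 32. It mirrors the `-tell() & 31` expression in
// the hunk above and returns 0 when offset is already aligned.
func padTo32(offset int64) int64 {
	return (-offset) & 31
}

func main() {
	for _, off := range []int64{0, 1, 31, 32, 40} {
		fmt.Printf("offset %2d -> skip %2d -> aligned at %d\n",
			off, padTo32(off), off+padTo32(off))
	}
}
```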
@@ -674,7 +680,7 @@ struct llama_model_loader {
|
||||
*ctx_size_p = *mmapped_size_p = 0;
|
||||
for (const llama_load_tensor & lt : tensors_map.tensors) {
|
||||
*ctx_size_p += sizeof(struct ggml_tensor) + GGML_OBJECT_SIZE;
|
||||
*(use_mmap ? mmapped_size_p : ctx_size_p) += lt.size;
|
||||
*(use_mmap ? mmapped_size_p : ctx_size_p) += lt.size + 16;
|
||||
}
|
||||
}
|
||||
|
||||
@@ -870,6 +876,8 @@ struct llama_context_params llama_context_default_params() {
|
||||
/*.gpu_layers =*/ 0,
|
||||
/*.main_gpu =*/ 0,
|
||||
/*.tensor_split =*/ {0},
|
||||
/*.rope_freq_base =*/ 10000.0f,
|
||||
/*.rope_freq_scale =*/ 1.0f,
|
||||
/*.progress_callback =*/ nullptr,
|
||||
/*.progress_callback_user_data =*/ nullptr,
|
||||
/*.low_vram =*/ false,
|
||||
@@ -895,6 +903,10 @@ struct llama_model_quantize_params llama_model_quantize_default_params() {
|
||||
return result;
|
||||
}
|
||||
|
||||
int llama_max_devices() {
|
||||
return LLAMA_MAX_DEVICES;
|
||||
}
|
||||
|
||||
bool llama_mmap_supported() {
|
||||
return llama_mmap::SUPPORTED;
|
||||
}
|
||||
@@ -993,6 +1005,8 @@ static void llama_model_load_internal(
|
||||
int n_gpu_layers,
|
||||
int main_gpu,
|
||||
const float * tensor_split,
|
||||
float rope_freq_base,
|
||||
float rope_freq_scale,
|
||||
bool low_vram,
|
||||
ggml_type memory_type,
|
||||
bool use_mmap,
|
||||
@@ -1027,22 +1041,27 @@ static void llama_model_load_internal(
|
||||
}
|
||||
|
||||
hparams.n_ctx = n_ctx;
|
||||
|
||||
hparams.rope_freq_base = rope_freq_base;
|
||||
hparams.rope_freq_scale = rope_freq_scale;
|
||||
}
|
||||
|
||||
const uint32_t n_ff = ((2*(4*hparams.n_embd)/3 + hparams.n_mult - 1)/hparams.n_mult)*hparams.n_mult;
|
||||
|
||||
{
|
||||
fprintf(stderr, "%s: format = %s\n", __func__, llama_file_version_name(file_version));
|
||||
fprintf(stderr, "%s: n_vocab = %u\n", __func__, hparams.n_vocab);
|
||||
fprintf(stderr, "%s: n_ctx = %u\n", __func__, hparams.n_ctx);
|
||||
fprintf(stderr, "%s: n_embd = %u\n", __func__, hparams.n_embd);
|
||||
fprintf(stderr, "%s: n_mult = %u\n", __func__, hparams.n_mult);
|
||||
fprintf(stderr, "%s: n_head = %u\n", __func__, hparams.n_head);
|
||||
fprintf(stderr, "%s: n_layer = %u\n", __func__, hparams.n_layer);
|
||||
fprintf(stderr, "%s: n_rot = %u\n", __func__, hparams.n_rot);
|
||||
fprintf(stderr, "%s: format = %s\n", __func__, llama_file_version_name(file_version));
|
||||
fprintf(stderr, "%s: n_vocab = %u\n", __func__, hparams.n_vocab);
|
||||
fprintf(stderr, "%s: n_ctx = %u\n", __func__, hparams.n_ctx);
|
||||
fprintf(stderr, "%s: n_embd = %u\n", __func__, hparams.n_embd);
|
||||
fprintf(stderr, "%s: n_mult = %u\n", __func__, hparams.n_mult);
|
||||
fprintf(stderr, "%s: n_head = %u\n", __func__, hparams.n_head);
|
||||
fprintf(stderr, "%s: n_layer = %u\n", __func__, hparams.n_layer);
|
||||
fprintf(stderr, "%s: n_rot = %u\n", __func__, hparams.n_rot);
|
||||
fprintf(stderr, "%s: freq_base = %.1f\n", __func__, hparams.rope_freq_base);
|
||||
fprintf(stderr, "%s: freq_scale = %g\n", __func__, hparams.rope_freq_scale);
|
||||
fprintf(stderr, "%s: ftype = %u (%s)\n", __func__, hparams.ftype, llama_ftype_name(hparams.ftype));
|
||||
fprintf(stderr, "%s: n_ff = %u\n", __func__, n_ff);
|
||||
fprintf(stderr, "%s: model size = %s\n", __func__, llama_model_type_name(model.type));
|
||||
fprintf(stderr, "%s: n_ff = %u\n", __func__, n_ff);
|
||||
fprintf(stderr, "%s: model size = %s\n", __func__, llama_model_type_name(model.type));
|
||||
}
|
||||
|
||||
if (file_version < LLAMA_FILE_VERSION_GGJT_V2) {
|
||||
@@ -1191,9 +1210,9 @@ static void llama_model_load_internal(
|
||||
const size_t mem_required =
|
||||
ctx_size +
|
||||
mmapped_size - vram_weights + // weights in VRAM not in memory
|
||||
MEM_REQ_SCRATCH0().at(model.type) +
|
||||
MEM_REQ_SCRATCH0(hparams.n_ctx).at(model.type) +
|
||||
MEM_REQ_SCRATCH1().at(model.type) +
|
||||
MEM_REQ_EVAL().at (model.type);
|
||||
MEM_REQ_EVAL(hparams.n_ctx).at(model.type);
|
||||
|
||||
// this is the memory required by one llama_state
|
||||
const size_t mem_required_state =
|
||||
@@ -1297,6 +1316,8 @@ static bool llama_model_load(
|
||||
int n_gpu_layers,
|
||||
int main_gpu,
|
||||
float * tensor_split,
|
||||
float rope_freq_base,
|
||||
float rope_freq_scale,
|
||||
bool low_vram,
|
||||
ggml_type memory_type,
|
||||
bool use_mmap,
|
||||
@@ -1305,7 +1326,7 @@ static bool llama_model_load(
|
||||
llama_progress_callback progress_callback,
|
||||
void *progress_callback_user_data) {
|
||||
try {
|
||||
llama_model_load_internal(fname, model, vocab, n_ctx, n_batch, n_gpu_layers, main_gpu, tensor_split, low_vram, memory_type,
|
||||
llama_model_load_internal(fname, model, vocab, n_ctx, n_batch, n_gpu_layers, main_gpu, tensor_split, rope_freq_base, rope_freq_scale, low_vram, memory_type,
|
||||
use_mmap, use_mlock, vocab_only, progress_callback, progress_callback_user_data);
|
||||
return true;
|
||||
} catch (const std::exception & err) {
|
||||
@@ -1357,6 +1378,9 @@ static bool llama_eval_internal(
|
||||
const int n_rot = hparams.n_embd/hparams.n_head;
|
||||
const int n_gpu_layers = model.n_gpu_layers;
|
||||
|
||||
const float freq_base = hparams.rope_freq_base;
|
||||
const float freq_scale = hparams.rope_freq_scale;
|
||||
|
||||
auto & mem_per_token = lctx.mem_per_token;
|
||||
auto & buf_compute = lctx.buf_compute;
|
||||
|
||||
@@ -1454,11 +1478,11 @@ static bool llama_eval_internal(
|
||||
offload_func_kq(tmpq);
|
||||
ggml_set_name(tmpq, "tmpq");
|
||||
|
||||
struct ggml_tensor * Kcur = ggml_rope_inplace(ctx0, ggml_reshape_3d(ctx0, tmpk, n_embd/n_head, n_head, N), n_past, n_rot, 0, 0);
|
||||
struct ggml_tensor * Kcur = ggml_rope_custom_inplace(ctx0, ggml_reshape_3d(ctx0, tmpk, n_embd/n_head, n_head, N), n_past, n_rot, 0, freq_base, freq_scale, 0);
|
||||
offload_func_kq(Kcur);
|
||||
ggml_set_name(Kcur, "Kcur");
|
||||
|
||||
struct ggml_tensor * Qcur = ggml_rope_inplace(ctx0, ggml_reshape_3d(ctx0, tmpq, n_embd/n_head, n_head, N), n_past, n_rot, 0, 0);
|
||||
struct ggml_tensor * Qcur = ggml_rope_custom_inplace(ctx0, ggml_reshape_3d(ctx0, tmpq, n_embd/n_head, n_head, N), n_past, n_rot, 0, freq_base, freq_scale, 0);
|
||||
offload_func_kq(Qcur);
|
||||
ggml_set_name(Qcur, "Qcur");
|
||||
|
||||
@@ -2032,9 +2056,18 @@ void llama_sample_tail_free(struct llama_context * ctx, llama_token_data_array *
|
||||
}
|
||||
|
||||
// Normalize the second derivatives
|
||||
float second_derivatives_sum = std::accumulate(second_derivatives.begin(), second_derivatives.end(), 0.0f);
|
||||
for (float & value : second_derivatives) {
|
||||
value /= second_derivatives_sum;
|
||||
{
|
||||
const float second_derivatives_sum = std::accumulate(second_derivatives.begin(), second_derivatives.end(), 0.0f);
|
||||
|
||||
if (second_derivatives_sum > 1e-6f) {
|
||||
for (float & value : second_derivatives) {
|
||||
value /= second_derivatives_sum;
|
||||
}
|
||||
} else {
|
||||
for (float & value : second_derivatives) {
|
||||
value = 1.0f / second_derivatives.size();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
float cum_sum = 0.0f;
|
||||
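The guard added in this hunk avoids dividing by a near-zero sum by falling back to a uniform distribution. A minimal Go sketch of the same idea follows. The function name `normalize` and the use of a plain slice are illustrative assumptions; the threshold `1e-6` is taken from the hunk.

```go
package main

import "fmt"

// normalize scales values in place so they sum to 1. If the sum is at
// or below 1e-6 (the threshold used in the hunk above), every entry is
// set to 1/len(values) instead, which avoids a division by ~zero.
func normalize(values []float32) {
	var sum float32
	for _, v := range values {
		sum += v
	}
	if sum > 1e-6 {
		for i := range values {
			values[i] /= sum
		}
		return
	}
	for i := range values {
		values[i] = 1 / float32(len(values))
	}
}

func main() {
	a := []float32{1, 3}
	b := []float32{0, 0}
	normalize(a)
	normalize(b)
	fmt.Println(a, b)
}
```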
@@ -2213,7 +2246,7 @@ void llama_sample_classifier_free_guidance(
|
||||
struct llama_context * guidance_ctx,
|
||||
float scale,
|
||||
float smooth_factor) {
|
||||
int64_t t_start_sample_us = t_start_sample_us = ggml_time_us();
|
||||
int64_t t_start_sample_us = ggml_time_us();
|
||||
|
||||
assert(ctx);
|
||||
auto n_vocab = llama_n_vocab(ctx);
|
||||
@@ -2701,8 +2734,9 @@ struct llama_model * llama_load_model_from_file(
|
||||
ggml_type memory_type = params.f16_kv ? GGML_TYPE_F16 : GGML_TYPE_F32;
|
||||
|
||||
if (!llama_model_load(path_model, *model, model->vocab, params.n_ctx, params.n_batch, params.n_gpu_layers,
|
||||
params.main_gpu, params.tensor_split, params.low_vram, memory_type, params.use_mmap, params.use_mlock,
|
||||
params.vocab_only, params.progress_callback, params.progress_callback_user_data)) {
|
||||
params.main_gpu, params.tensor_split, params.rope_freq_base, params.rope_freq_scale,params.low_vram,
|
||||
memory_type, params.use_mmap, params.use_mlock, params.vocab_only, params.progress_callback,
|
||||
params.progress_callback_user_data)) {
|
||||
delete model;
|
||||
fprintf(stderr, "%s: failed to load model\n", __func__);
|
||||
return nullptr;
|
||||
@@ -2723,7 +2757,7 @@ struct llama_context * llama_new_context_with_model(
|
||||
return nullptr;
|
||||
}
|
||||
|
||||
llama_context * ctx = new llama_context(*model, model->vocab);
|
||||
llama_context * ctx = new llama_context(*model);
|
||||
|
||||
if (params.seed == LLAMA_DEFAULT_SEED) {
|
||||
params.seed = time(NULL);
|
||||
@@ -2777,9 +2811,9 @@ struct llama_context * llama_new_context_with_model(
|
||||
ctx->embedding.resize(hparams.n_embd);
|
||||
}
|
||||
|
||||
ctx->buf_compute.resize(MEM_REQ_EVAL().at(ctx->model.type));
|
||||
ctx->buf_compute.resize(MEM_REQ_EVAL(hparams.n_ctx).at(ctx->model.type));
|
||||
|
||||
ctx->buf_scratch[0].resize(MEM_REQ_SCRATCH0().at(ctx->model.type));
|
||||
ctx->buf_scratch[0].resize(MEM_REQ_SCRATCH0(hparams.n_ctx).at(ctx->model.type));
|
||||
ctx->buf_scratch[1].resize(MEM_REQ_SCRATCH1().at(ctx->model.type));
|
||||
}
|
||||
|
||||
@@ -3561,13 +3595,13 @@ int llama_eval_export(struct llama_context * ctx, const char * fname) {
|
||||
return 0;
|
||||
}
|
||||
|
||||
int llama_tokenize(
|
||||
struct llama_context * ctx,
|
||||
int llama_tokenize_with_model(
|
||||
const struct llama_model * model,
|
||||
const char * text,
|
||||
llama_token * tokens,
|
||||
int n_max_tokens,
|
||||
bool add_bos) {
|
||||
auto res = llama_tokenize(ctx->vocab, text, add_bos);
|
||||
auto res = llama_tokenize(model->vocab, text, add_bos);
|
||||
|
||||
if (n_max_tokens < (int) res.size()) {
|
||||
fprintf(stderr, "%s: too many tokens\n", __func__);
|
||||
@@ -3581,8 +3615,29 @@ int llama_tokenize(
|
||||
return res.size();
|
||||
}
|
||||
|
||||
int llama_tokenize(
|
||||
struct llama_context * ctx,
|
||||
const char * text,
|
||||
llama_token * tokens,
|
||||
int n_max_tokens,
|
||||
bool add_bos) {
|
||||
return llama_tokenize_with_model(&ctx->model, text, tokens, n_max_tokens, add_bos);
|
||||
}
|
||||
|
||||
int llama_n_vocab_from_model(const struct llama_model * model) {
|
||||
return model->vocab.id_to_token.size();
|
||||
}
|
||||
|
||||
int llama_n_ctx_from_model(const struct llama_model * model) {
|
||||
return model->hparams.n_ctx;
|
||||
}
|
||||
|
||||
int llama_n_embd_from_model(const struct llama_model * model) {
|
||||
return model->hparams.n_embd;
|
||||
}
|
||||
|
||||
int llama_n_vocab(const struct llama_context * ctx) {
|
||||
return ctx->vocab.id_to_token.size();
|
||||
return ctx->model.vocab.id_to_token.size();
|
||||
}
|
||||
|
||||
int llama_n_ctx(const struct llama_context * ctx) {
|
||||
@@ -3593,17 +3648,25 @@ int llama_n_embd(const struct llama_context * ctx) {
|
||||
return ctx->model.hparams.n_embd;
|
||||
}
|
||||
|
||||
int llama_get_vocab_from_model(
|
||||
const struct llama_model * model,
|
||||
const char * * strings,
|
||||
float * scores,
|
||||
int capacity) {
|
||||
int n = std::min(capacity, (int) model->vocab.id_to_token.size());
|
||||
for (int i = 0; i<n; ++i) {
|
||||
strings[i] = model->vocab.id_to_token[i].tok.c_str();
|
||||
scores[i] = model->vocab.id_to_token[i].score;
|
||||
}
|
||||
return n;
|
||||
}
|
||||
|
||||
int llama_get_vocab(
|
||||
const struct llama_context * ctx,
|
||||
const char * * strings,
|
||||
float * scores,
|
||||
int capacity) {
|
||||
int n = std::min(capacity, (int) ctx->vocab.id_to_token.size());
|
||||
for (int i = 0; i<n; ++i) {
|
||||
strings[i] = ctx->vocab.id_to_token[i].tok.c_str();
|
||||
scores[i] = ctx->vocab.id_to_token[i].score;
|
||||
}
|
||||
return n;
|
||||
return llama_get_vocab_from_model(&ctx->model, strings, scores, capacity);
|
||||
}
|
||||
|
||||
float * llama_get_logits(struct llama_context * ctx) {
|
||||
@@ -3614,12 +3677,16 @@ float * llama_get_embeddings(struct llama_context * ctx) {
|
||||
return ctx->embedding.data();
|
||||
}
|
||||
|
||||
const char * llama_token_to_str(const struct llama_context * ctx, llama_token token) {
|
||||
if (token >= llama_n_vocab(ctx)) {
|
||||
const char * llama_token_to_str_with_model(const struct llama_model * model, llama_token token) {
|
||||
if (token >= llama_n_vocab_from_model(model)) {
|
||||
return nullptr;
|
||||
}
|
||||
|
||||
return ctx->vocab.id_to_token[token].tok.c_str();
|
||||
return model->vocab.id_to_token[token].tok.c_str();
|
||||
}
|
||||
|
||||
const char * llama_token_to_str(const struct llama_context * ctx, llama_token token) {
|
||||
return llama_token_to_str_with_model(&ctx->model, token);
|
||||
}
|
||||
|
||||
llama_token llama_token_bos() {
|
||||
|
@@ -1,5 +1,5 @@
|
||||
/**
|
||||
* llama.cpp - git 5bf2a2771886ee86137e01dbc7492f78fb392066
|
||||
* llama.cpp - git e782c9e735f93ab4767ffc37462c523b73a17ddc
|
||||
*
|
||||
* MIT License
|
||||
*
|
||||
@@ -115,6 +115,11 @@ extern "C" {
|
||||
int32_t n_gpu_layers; // number of layers to store in VRAM
|
||||
int32_t main_gpu; // the GPU that is used for scratch and small tensors
|
||||
float tensor_split[LLAMA_MAX_DEVICES]; // how to split layers across multiple GPUs
|
||||
|
||||
// ref: https://github.com/ggerganov/llama.cpp/pull/2054
|
||||
float rope_freq_base; // RoPE base frequency
|
||||
float rope_freq_scale; // RoPE frequency scaling factor
|
||||
|
||||
// called with a progress value between 0 and 1, pass NULL to disable
|
||||
llama_progress_callback progress_callback;
|
||||
// context pointer passed to the progress callback
|
||||
@@ -174,6 +179,8 @@ extern "C" {
|
||||
int32_t n_eval;
|
||||
};
|
||||
|
||||
LLAMA_API int llama_max_devices();
|
||||
|
||||
LLAMA_API struct llama_context_params llama_context_default_params();
|
||||
LLAMA_API struct llama_model_quantize_params llama_model_quantize_default_params();
|
||||
|
||||
@@ -296,10 +303,21 @@ extern "C" {
|
||||
int n_max_tokens,
|
||||
bool add_bos);
|
||||
|
||||
LLAMA_API int llama_tokenize_with_model(
|
||||
const struct llama_model * model,
|
||||
const char * text,
|
||||
llama_token * tokens,
|
||||
int n_max_tokens,
|
||||
bool add_bos);
|
||||
|
||||
LLAMA_API int llama_n_vocab(const struct llama_context * ctx);
|
||||
LLAMA_API int llama_n_ctx (const struct llama_context * ctx);
|
||||
LLAMA_API int llama_n_embd (const struct llama_context * ctx);
|
||||
|
||||
LLAMA_API int llama_n_vocab_from_model(const struct llama_model * model);
|
||||
LLAMA_API int llama_n_ctx_from_model (const struct llama_model * model);
|
||||
LLAMA_API int llama_n_embd_from_model (const struct llama_model * model);
|
||||
|
||||
// Get the vocabulary as output parameters.
|
||||
// Returns number of results.
|
||||
LLAMA_API int llama_get_vocab(
|
||||
@@ -308,6 +326,12 @@ extern "C" {
|
||||
float * scores,
|
||||
int capacity);
|
||||
|
||||
LLAMA_API int llama_get_vocab_from_model(
|
||||
const struct llama_model * model,
|
||||
const char * * strings,
|
||||
float * scores,
|
||||
int capacity);
|
||||
|
||||
// Token logits obtained from the last call to llama_eval()
|
||||
// The logits for the last token are stored in the last row
|
||||
// Can be mutated in order to change the probabilities of the next token
|
||||
@@ -320,7 +344,13 @@ extern "C" {
|
||||
LLAMA_API float * llama_get_embeddings(struct llama_context * ctx);
|
||||
|
||||
// Token Id -> String. Uses the vocabulary in the provided context
|
||||
LLAMA_API const char * llama_token_to_str(const struct llama_context * ctx, llama_token token);
|
||||
LLAMA_API const char * llama_token_to_str(
|
||||
const struct llama_context * ctx,
|
||||
llama_token token);
|
||||
|
||||
LLAMA_API const char * llama_token_to_str_with_model(
|
||||
const struct llama_model * model,
|
||||
llama_token token);
|
||||
|
||||
// Special tokens
|
||||
LLAMA_API llama_token llama_token_bos(); // beginning-of-sentence
|
||||
|
70
llama/update-llama-cpp.sh
Normal file
@@ -0,0 +1,70 @@
|
||||
#!/bin/sh
|
||||
|
||||
set -eu
|
||||
|
||||
|
||||
status() { echo >&2 ">>> $*"; }
|
||||
error() { status "ERROR $*"; }
|
||||
usage() {
|
||||
echo "usage: $(basename $0) /path/to/repo"
|
||||
exit 1
|
||||
}
|
||||
|
||||
OUT=$(dirname $0)
|
||||
while getopts "hC:" OPTION; do
|
||||
case $OPTION in
|
||||
C) OUT=$OPTARG ;;
|
||||
*) usage ;;
|
||||
esac
|
||||
done
|
||||
|
||||
shift $(( $OPTIND - 1 ))
|
||||
[ $# -eq 1 ] || usage
|
||||
|
||||
status "updating source..."
|
||||
cp -a "$1"/*.{c,h,cpp,m,metal,cu} "$OUT"
|
||||
|
||||
status "removing incompatible files..."
|
||||
rm -f "$OUT"/build-info.h
|
||||
|
||||
SHA1=$(git -C $1 rev-parse @)
|
||||
|
||||
LICENSE=$(mktemp)
|
||||
cleanup() {
|
||||
rm -f $LICENSE
|
||||
}
|
||||
trap cleanup 0
|
||||
|
||||
cat <<EOF | sed 's/ *$//' >$LICENSE
|
||||
/**
|
||||
* llama.cpp - git $SHA1
|
||||
*
|
||||
$(sed 's/^/ * /' <$1/LICENSE)
|
||||
*/
|
||||
|
||||
EOF
|
||||
|
||||
for IN in $OUT/*.{c,h,cpp,m,metal,cu}; do
|
||||
TMP=$(mktemp)
|
||||
status "updating license $IN"
|
||||
cat $LICENSE $IN >$TMP
|
||||
mv $TMP $IN
|
||||
done
|
||||
|
||||
touchup() {
|
||||
local CONSTRAINT=$1 && shift
|
||||
|
||||
for IN in $*; do
|
||||
status "touching up $IN..."
|
||||
TMP=$(mktemp)
|
||||
{
|
||||
echo "//go:build $CONSTRAINT"
|
||||
echo
|
||||
} | cat - $IN >$TMP
|
||||
mv $TMP $IN
|
||||
done
|
||||
}
|
||||
|
||||
touchup darwin $OUT/ggml-metal.*
|
||||
touchup mpi $OUT/ggml-mpi.*
|
||||
touchup opencl $OUT/ggml-opencl.*
|
38
models.json
@@ -1,38 +0,0 @@
|
||||
[
|
||||
{
|
||||
"name": "orca",
|
||||
"display_name": "Orca Mini",
|
||||
"parameters": "3B",
|
||||
"url": "https://huggingface.co/TheBloke/orca_mini_3B-GGML/resolve/main/orca-mini-3b.ggmlv3.q4_1.bin",
|
||||
"short_description": "Follow instructions. Great small model that runs fast even without GPU support.",
|
||||
"description": "An OpenLLaMa-3B model trained on explain tuned datasets, created using Instructions and Input from WizardLM, Alpaca & Dolly-V2 datasets and applying Orca Research Paper dataset construction approaches.",
|
||||
"published_by": "TheBloke",
|
||||
"original_author": "psmathur",
|
||||
"original_url": "https://huggingface.co/psmathur/orca_mini_3b",
|
||||
"license": "CC-BY-SA-4.0"
|
||||
},
|
||||
{
|
||||
"name": "nous-hermes",
|
||||
"display_name": "Nous Hermes",
|
||||
"parameters": "13B",
|
||||
"url": "https://huggingface.co/TheBloke/Nous-Hermes-13B-GGML/resolve/main/nous-hermes-13b.ggmlv3.q2_K.bin",
|
||||
"short_description": "Currently one of the best 13B general model.",
|
||||
"description": "It is suitable for a wide range of language tasks, from generating creative text to understanding and following complex instructions. This model was fine-tuned by Nous Research, with Teknium and Karan4D leading the fine tuning process and dataset curation, Redmond AI sponsoring the compute, and several other contributors. The result is an enhanced Llama 13b model that rivals GPT-3.5-turbo in performance across a variety of tasks. \n \n This model stands out for its long responses, low hallucination rate, and absence of OpenAI censorship mechanisms. The fine-tuning process was performed with a 2000 sequence length on an 8x a100 80GB DGX machine for over 50 hours.",
|
||||
"published_by": "TheBloke",
|
||||
"original_author": "NousResearch",
|
||||
"original_url": "https://huggingface.co/NousResearch/Nous-Hermes-13b",
|
||||
"license": "GPL"
|
||||
},
|
||||
{
|
||||
"name": "vicuna",
|
||||
"display_name": "Vicuna",
|
||||
"parameters": "7B",
|
||||
"url": "https://huggingface.co/TheBloke/vicuna-7B-v1.3-GGML/resolve/main/vicuna-7b-v1.3.ggmlv3.q4_0.bin",
|
||||
"short_description": "Vicuna is a chat assistant trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.",
|
||||
"description": "The primary use of Vicuna is research on large language models and chatbots. The primary intended users of the model are researchers and hobbyists in natural language processing, machine learning, and artificial intelligence.",
|
||||
"published_by": "TheBloke",
|
||||
"original_author": "LMSYS",
|
||||
"original_url": "https://huggingface.co/lmsys/vicuna-7b-v1.3",
|
||||
"license:": "Non-commercial"
|
||||
}
|
||||
]
|
@@ -2,76 +2,81 @@ package parser
|
||||
|
||||
import (
|
||||
"bufio"
|
||||
"bytes"
|
||||
"errors"
|
||||
"fmt"
|
||||
"io"
|
||||
"strings"
|
||||
)
|
||||
|
||||
type Command struct {
|
||||
Name string
|
||||
Arg string
|
||||
Args string
|
||||
}
|
||||
|
||||
func (c *Command) Reset() {
|
||||
c.Name = ""
|
||||
c.Args = ""
|
||||
}
|
||||
|
||||
func Parse(reader io.Reader) ([]Command, error) {
|
||||
var commands []Command
|
||||
var foundModel bool
|
||||
|
||||
var command, modelCommand Command
|
||||
|
||||
scanner := bufio.NewScanner(reader)
|
||||
multiline := false
|
||||
var multilineCommand *Command
|
||||
scanner.Split(scanModelfile)
|
||||
for scanner.Scan() {
|
||||
line := scanner.Text()
|
||||
if multiline {
|
||||
// If we're in a multiline string and the line is """, end the multiline string.
|
||||
if strings.TrimSpace(line) == `"""` {
|
||||
multiline = false
|
||||
commands = append(commands, *multilineCommand)
|
||||
} else {
|
||||
// Otherwise, append the line to the multiline string.
|
||||
multilineCommand.Arg += "\n" + line
|
||||
}
|
||||
continue
|
||||
}
|
||||
fields := strings.Fields(line)
|
||||
line := scanner.Bytes()
|
||||
|
||||
fields := bytes.SplitN(line, []byte(" "), 2)
|
||||
if len(fields) == 0 {
|
||||
continue
|
||||
}
|
||||
|
||||
command := Command{}
|
||||
switch strings.ToUpper(fields[0]) {
|
||||
switch string(bytes.ToUpper(fields[0])) {
|
||||
case "FROM":
|
||||
command.Name = "model"
|
||||
command.Arg = fields[1]
|
||||
if command.Arg == "" {
|
||||
return nil, fmt.Errorf("no model specified in FROM line")
|
||||
}
|
||||
foundModel = true
|
||||
case "PROMPT":
|
||||
command.Name = "prompt"
|
||||
if fields[1] == `"""` {
|
||||
multiline = true
|
||||
multilineCommand = &command
|
||||
multilineCommand.Arg = ""
|
||||
} else {
|
||||
command.Arg = strings.Join(fields[1:], " ")
|
||||
}
|
||||
command.Args = string(fields[1])
|
||||
// copy command for validation
|
||||
modelCommand = command
|
||||
case "LICENSE", "TEMPLATE", "SYSTEM", "PROMPT":
|
||||
command.Name = string(bytes.ToLower(fields[0]))
|
||||
command.Args = string(fields[1])
|
||||
case "PARAMETER":
|
||||
command.Name = fields[1]
|
||||
command.Arg = strings.Join(fields[2:], " ")
|
||||
fields = bytes.SplitN(fields[1], []byte(" "), 2)
|
||||
command.Name = string(fields[0])
|
||||
command.Args = string(fields[1])
|
||||
default:
|
||||
continue
|
||||
}
|
||||
if !multiline {
|
||||
commands = append(commands, command)
|
||||
}
|
||||
|
||||
commands = append(commands, command)
|
||||
command.Reset()
|
||||
}
|
||||
|
||||
if !foundModel {
|
||||
if modelCommand.Args == "" {
|
||||
return nil, fmt.Errorf("no FROM line for the model was specified")
|
||||
}
|
||||
|
||||
if multiline {
|
||||
return nil, fmt.Errorf("unclosed multiline string")
|
||||
}
|
||||
return commands, scanner.Err()
|
||||
}
|
||||
|
||||
func scanModelfile(data []byte, atEOF bool) (advance int, token []byte, err error) {
|
||||
newline := bytes.IndexByte(data, '\n')
|
||||
|
||||
if start := bytes.Index(data, []byte(`"""`)); start >= 0 && start < newline {
|
||||
end := bytes.Index(data[start+3:], []byte(`"""`))
|
||||
if end < 0 {
|
||||
if atEOF {
|
||||
return 0, nil, errors.New(`unterminated multiline string: """`)
|
||||
} else {
|
||||
return 0, nil, nil
|
||||
}
|
||||
}
|
||||
|
||||
n := start + 3 + end + 3
|
||||
return n, bytes.Replace(data[:n], []byte(`"""`), []byte(""), 2), nil
|
||||
}
|
||||
|
||||
return bufio.ScanLines(data, atEOF)
|
||||
}
|
||||
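To see how a custom split function lets `bufio.Scanner` return a triple-quoted block as a single token, here is a self-contained Go sketch adapted from `scanModelfile` above. The names `splitTripleQuoted` and `tokens` are hypothetical, and skipping empty tokens is a choice made for this example, not something the parser in the diff is shown to do.

```go
package main

import (
	"bufio"
	"bytes"
	"errors"
	"fmt"
	"strings"
)

// splitTripleQuoted is a bufio.SplitFunc. When the first line of data
// opens a """ block, the token runs to the closing """ with both quote
// markers removed; otherwise it falls back to line-by-line scanning.
func splitTripleQuoted(data []byte, atEOF bool) (int, []byte, error) {
	quote := []byte(`"""`)
	newline := bytes.IndexByte(data, '\n')

	if start := bytes.Index(data, quote); start >= 0 && start < newline {
		end := bytes.Index(data[start+3:], quote)
		if end < 0 {
			if atEOF {
				return 0, nil, errors.New(`unterminated multiline string: """`)
			}
			return 0, nil, nil // ask the scanner for more data
		}
		n := start + 3 + end + 3
		return n, bytes.Replace(data[:n], quote, nil, 2), nil
	}
	return bufio.ScanLines(data, atEOF)
}

// tokens scans input with splitTripleQuoted and drops empty tokens.
func tokens(input string) ([]string, error) {
	scanner := bufio.NewScanner(strings.NewReader(input))
	scanner.Split(splitTripleQuoted)
	var out []string
	for scanner.Scan() {
		if t := scanner.Text(); t != "" {
			out = append(out, t)
		}
	}
	return out, scanner.Err()
}

func main() {
	toks, err := tokens("FROM llama\nSYSTEM \"\"\"\nYou are helpful.\n\"\"\"\n")
	fmt.Printf("%q %v\n", toks, err)
}
```

Returning `0, nil, nil` when the closing quotes have not been seen yet is what tells the scanner to read more input before trying again.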
|
21
progressbar/LICENSE
Normal file
@@ -0,0 +1,21 @@
|
||||
MIT License
|
||||
|
||||
Copyright (c) 2017 Zack
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to deal
|
||||
in the Software without restriction, including without limitation the rights
|
||||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
copies of the Software, and to permit persons to whom the Software is
|
||||
furnished to do so, subject to the following conditions:
|
||||
|
||||
The above copyright notice and this permission notice shall be included in all
|
||||
copies or substantial portions of the Software.
|
||||
|
||||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
SOFTWARE.
|
121
progressbar/README.md
Normal file
@@ -0,0 +1,121 @@
|
||||
# progressbar
|
||||
|
||||
[CI](https://github.com/schollz/progressbar/actions/workflows/ci.yml)
|
||||
[Go Report Card](https://goreportcard.com/report/github.com/schollz/progressbar)
|
||||
[Coverage](https://gocover.io/github.com/schollz/progressbar)
|
||||
[GoDoc](https://godoc.org/github.com/schollz/progressbar/v3)
|
||||
|
||||
A very simple thread-safe progress bar which should work on every OS without problems. I needed a progressbar for [croc](https://github.com/schollz/croc) and everything I tried had problems, so I made another one. In order to be OS agnostic I do not plan to support [multi-line outputs](https://github.com/schollz/progressbar/issues/6).
|
||||
|
||||
|
||||
## Install
|
||||
|
||||
```
|
||||
go get -u github.com/schollz/progressbar/v3
|
||||
```
|
||||
|
||||
## Usage
|
||||
|
||||
### Basic usage
|
||||
|
||||
```golang
|
||||
bar := progressbar.Default(100)
|
||||
for i := 0; i < 100; i++ {
|
||||
bar.Add(1)
|
||||
time.Sleep(40 * time.Millisecond)
|
||||
}
|
||||
```
|
||||
|
||||
which looks like:
|
||||
|
||||

|
||||
|
||||
|
||||
### I/O operations
|
||||
|
||||
The `progressbar` implements `io.Writer`, so it can automatically count the bytes written to a stream. This lets you use it as a progress bar for an `io.Reader`.
|
||||
|
||||
```golang
req, _ := http.NewRequest("GET", "https://dl.google.com/go/go1.14.2.src.tar.gz", nil)
resp, _ := http.DefaultClient.Do(req)
defer resp.Body.Close()

f, _ := os.OpenFile("go1.14.2.src.tar.gz", os.O_CREATE|os.O_WRONLY, 0644)
defer f.Close()

bar := progressbar.DefaultBytes(
	resp.ContentLength,
	"downloading",
)
io.Copy(io.MultiWriter(f, bar), resp.Body)
```
|
||||
|
||||
which looks like:
|
||||
|
||||

|
||||
|
||||
|
||||
### Progress bar with unknown length
|
||||
|
||||
A progressbar with unknown length is a spinner. Any bar with a length of `-1` is automatically converted to a spinner with a customizable spinner type. For example, run the code above with `-1` in place of `resp.ContentLength`.
|
||||
|
||||
which looks like:
|
||||
|
||||

|
||||
|
||||
|
||||
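As a rough illustration of what a spinner does when the total is unknown, the sketch below cycles through a fixed set of frames using only the standard library. The `frames` slice and the `frameAt` helper are assumptions made for this example and do not reflect how the library selects or draws its spinner frames.

```go
package main

import (
	"fmt"
	"time"
)

// frames is an illustrative ASCII frame set; the name is hypothetical and
// not part of the progressbar API.
var frames = []string{"|", "/", "-", "\\"}

// frameAt returns the frame to display for a given tick, wrapping around
// once the end of the frame set is reached.
func frameAt(frames []string, tick int) string {
	return frames[tick%len(frames)]
}

func main() {
	for tick := 0; tick < 12; tick++ {
		// "\r" moves the cursor back to the start of the line so that each
		// frame overwrites the previous one.
		fmt.Printf("\r%s downloading", frameAt(frames, tick))
		time.Sleep(50 * time.Millisecond)
	}
	fmt.Println()
}
```

Because nothing is known about the total, the only state needed is a tick counter, which is why a length of `-1` is enough to switch from a bar to a spinner.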
### Customization
|
||||
|
||||
You can customize many aspects of the bar: the writer, the color, the width, the description, the theme, and more. See [all the options](https://pkg.go.dev/github.com/schollz/progressbar/v3?tab=doc#Option).
|
||||
|
||||
```golang
bar := progressbar.NewOptions(1000,
	progressbar.OptionSetWriter(ansi.NewAnsiStdout()),
	progressbar.OptionEnableColorCodes(true),
	progressbar.OptionShowBytes(true),
	progressbar.OptionSetWidth(15),
	progressbar.OptionSetDescription("[cyan][1/3][reset] Writing moshable file..."),
	progressbar.OptionSetTheme(progressbar.Theme{
		Saucer:        "[green]=[reset]",
		SaucerHead:    "[green]>[reset]",
		SaucerPadding: " ",
		BarStart:      "[",
		BarEnd:        "]",
	}))
for i := 0; i < 1000; i++ {
	bar.Add(1)
	time.Sleep(5 * time.Millisecond)
}
```
|
||||
|
||||
which looks like:
|
||||
|
||||

|
||||
|
||||
|
||||
## Contributing
|
||||
|
||||
Pull requests are welcome. Feel free to...
|
||||
|
||||
- Revise documentation
|
||||
- Add new features
|
||||
- Fix bugs
|
||||
- Suggest improvements
|
||||
|
||||
## Thanks
|
||||
|
||||
Thanks [@Dynom](https://github.com/dynom) for massive improvements in version 2.0!
|
||||
|
||||
Thanks [@CrushedPixel](https://github.com/CrushedPixel) for adding descriptions and color code support!
|
||||
|
||||
Thanks [@MrMe42](https://github.com/MrMe42) for adding some minor features!
|
||||
|
||||
Thanks [@tehstun](https://github.com/tehstun) for some great PRs!
|
||||
|
||||
Thanks [@Benzammour](https://github.com/Benzammour) and [@haseth](https://github.com/haseth) for helping create v3!
|
||||
|
||||
Thanks [@briandowns](https://github.com/briandowns) for compiling the list of spinners.
|
||||
|
||||
## License
|
||||
|
||||
MIT
|
1098
progressbar/progressbar.go
Normal file
File diff suppressed because it is too large
80
progressbar/spinners.go
Normal file
@@ -0,0 +1,80 @@
|
||||
package progressbar
|
||||
|
||||
var spinners = map[int][]string{
|
||||
0: {"←", "↖", "↑", "↗", "→", "↘", "↓", "↙"},
|
||||
1: {"▁", "▃", "▄", "▅", "▆", "▇", "█", "▇", "▆", "▅", "▄", "▃", "▁"},
|
||||
2: {"▖", "▘", "▝", "▗"},
|
||||
3: {"┤", "┘", "┴", "└", "├", "┌", "┬", "┐"},
|
||||
4: {"◢", "◣", "◤", "◥"},
|
||||
5: {"◰", "◳", "◲", "◱"},
|
||||
6: {"◴", "◷", "◶", "◵"},
|
||||
7: {"◐", "◓", "◑", "◒"},
|
||||
8: {".", "o", "O", "@", "*"},
|
||||
9: {"|", "/", "-", "\\"},
|
||||
10: {"◡◡", "⊙⊙", "◠◠"},
|
||||
11: {"⣾", "⣽", "⣻", "⢿", "⡿", "⣟", "⣯", "⣷"},
|
||||
12: {">))'>", " >))'>", " >))'>", " >))'>", " >))'>", " <'((<", " <'((<", " <'((<"},
|
||||
13: {"⠁", "⠂", "⠄", "⡀", "⢀", "⠠", "⠐", "⠈"},
|
||||
14: {"⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧", "⠇", "⠏"},
|
||||
15: {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"},
|
||||
16: {"▉", "▊", "▋", "▌", "▍", "▎", "▏", "▎", "▍", "▌", "▋", "▊", "▉"},
|
||||
17: {"■", "□", "▪", "▫"},
|
||||
18: {"←", "↑", "→", "↓"},
|
||||
19: {"╫", "╪"},
|
||||
20: {"⇐", "⇖", "⇑", "⇗", "⇒", "⇘", "⇓", "⇙"},
|
||||
21: {"⠁", "⠁", "⠉", "⠙", "⠚", "⠒", "⠂", "⠂", "⠒", "⠲", "⠴", "⠤", "⠄", "⠄", "⠤", "⠠", "⠠", "⠤", "⠦", "⠖", "⠒", "⠐", "⠐", "⠒", "⠓", "⠋", "⠉", "⠈", "⠈"},
|
||||
22: {"⠈", "⠉", "⠋", "⠓", "⠒", "⠐", "⠐", "⠒", "⠖", "⠦", "⠤", "⠠", "⠠", "⠤", "⠦", "⠖", "⠒", "⠐", "⠐", "⠒", "⠓", "⠋", "⠉", "⠈"},
|
||||
23: {"⠁", "⠉", "⠙", "⠚", "⠒", "⠂", "⠂", "⠒", "⠲", "⠴", "⠤", "⠄", "⠄", "⠤", "⠴", "⠲", "⠒", "⠂", "⠂", "⠒", "⠚", "⠙", "⠉", "⠁"},
|
||||
24: {"⠋", "⠙", "⠚", "⠒", "⠂", "⠂", "⠒", "⠲", "⠴", "⠦", "⠖", "⠒", "⠐", "⠐", "⠒", "⠓", "⠋"},
|
||||
25: {"ヲ", "ァ", "ィ", "ゥ", "ェ", "ォ", "ャ", "ュ", "ョ", "ッ", "ア", "イ", "ウ", "エ", "オ", "カ", "キ", "ク", "ケ", "コ", "サ", "シ", "ス", "セ", "ソ", "タ", "チ", "ツ", "テ", "ト", "ナ", "ニ", "ヌ", "ネ", "ノ", "ハ", "ヒ", "フ", "ヘ", "ホ", "マ", "ミ", "ム", "メ", "モ", "ヤ", "ユ", "ヨ", "ラ", "リ", "ル", "レ", "ロ", "ワ", "ン"},
|
||||
26: {".", "..", "..."},
|
||||
27: {"▁", "▂", "▃", "▄", "▅", "▆", "▇", "█", "▉", "▊", "▋", "▌", "▍", "▎", "▏", "▏", "▎", "▍", "▌", "▋", "▊", "▉", "█", "▇", "▆", "▅", "▄", "▃", "▂", "▁"},
|
||||
28: {".", "o", "O", "°", "O", "o", "."},
|
||||
29: {"+", "x"},
|
||||
30: {"v", "<", "^", ">"},
|
||||
31: {">>--->", " >>--->", " >>--->", " >>--->", " >>--->", " <---<<", " <---<<", " <---<<", " <---<<", "<---<<"},
|
||||
32: {"|", "||", "|||", "||||", "|||||", "|||||||", "||||||||", "|||||||", "||||||", "|||||", "||||", "|||", "||", "|"},
|
||||
33: {"[ ]", "[= ]", "[== ]", "[=== ]", "[==== ]", "[===== ]", "[====== ]", "[======= ]", "[======== ]", "[========= ]", "[==========]"},
|
||||
34: {"(*---------)", "(-*--------)", "(--*-------)", "(---*------)", "(----*-----)", "(-----*----)", "(------*---)", "(-------*--)", "(--------*-)", "(---------*)"},
|
||||
35: {"█▒▒▒▒▒▒▒▒▒", "███▒▒▒▒▒▒▒", "█████▒▒▒▒▒", "███████▒▒▒", "██████████"},
|
||||
36: {"[ ]", "[=> ]", "[===> ]", "[=====> ]", "[======> ]", "[========> ]", "[==========> ]", "[============> ]", "[==============> ]", "[================> ]", "[==================> ]", "[===================>]"},
|
||||
37: {"ဝ", "၀"},
|
||||
38: {"▌", "▀", "▐▄"},
|
||||
39: {"🌍", "🌎", "🌏"},
|
||||
40: {"◜", "◝", "◞", "◟"},
|
||||
41: {"⬒", "⬔", "⬓", "⬕"},
|
||||
42: {"⬖", "⬘", "⬗", "⬙"},
|
||||
43: {"[>>> >]", "[]>>>> []", "[] >>>> []", "[] >>>> []", "[] >>>> []", "[] >>>>[]", "[>> >>]"},
|
||||
44: {"♠", "♣", "♥", "♦"},
|
||||
45: {"➞", "➟", "➠", "➡", "➠", "➟"},
|
||||
46: {" | ", ` \ `, "_ ", ` \ `, " | ", " / ", " _", " / "},
|
||||
47: {" . . . .", ". . . .", ". . . .", ". . . .", ". . . . ", ". . . . ."},
|
||||
48: {" | ", " / ", " _ ", ` \ `, " | ", ` \ `, " _ ", " / "},
|
||||
49: {"⎺", "⎻", "⎼", "⎽", "⎼", "⎻"},
|
||||
50: {"▹▹▹▹▹", "▸▹▹▹▹", "▹▸▹▹▹", "▹▹▸▹▹", "▹▹▹▸▹", "▹▹▹▹▸"},
|
||||
51: {"[ ]", "[ =]", "[ ==]", "[ ===]", "[====]", "[=== ]", "[== ]", "[= ]"},
|
||||
52: {"( ● )", "( ● )", "( ● )", "( ● )", "( ●)", "( ● )", "( ● )", "( ● )", "( ● )"},
|
||||
53: {"✶", "✸", "✹", "✺", "✹", "✷"},
|
||||
54: {"▐|\\____________▌", "▐_|\\___________▌", "▐__|\\__________▌", "▐___|\\_________▌", "▐____|\\________▌", "▐_____|\\_______▌", "▐______|\\______▌", "▐_______|\\_____▌", "▐________|\\____▌", "▐_________|\\___▌", "▐__________|\\__▌", "▐___________|\\_▌", "▐____________|\\▌", "▐____________/|▌", "▐___________/|_▌", "▐__________/|__▌", "▐_________/|___▌", "▐________/|____▌", "▐_______/|_____▌", "▐______/|______▌", "▐_____/|_______▌", "▐____/|________▌", "▐___/|_________▌", "▐__/|__________▌", "▐_/|___________▌", "▐/|____________▌"},
|
||||
55: {"▐⠂ ▌", "▐⠈ ▌", "▐ ⠂ ▌", "▐ ⠠ ▌", "▐ ⡀ ▌", "▐ ⠠ ▌", "▐ ⠂ ▌", "▐ ⠈ ▌", "▐ ⠂ ▌", "▐ ⠠ ▌", "▐ ⡀ ▌", "▐ ⠠ ▌", "▐ ⠂ ▌", "▐ ⠈ ▌", "▐ ⠂▌", "▐ ⠠▌", "▐ ⡀▌", "▐ ⠠ ▌", "▐ ⠂ ▌", "▐ ⠈ ▌", "▐ ⠂ ▌", "▐ ⠠ ▌", "▐ ⡀ ▌", "▐ ⠠ ▌", "▐ ⠂ ▌", "▐ ⠈ ▌", "▐ ⠂ ▌", "▐ ⠠ ▌", "▐ ⡀ ▌", "▐⠠ ▌"},
|
||||
56: {"¿", "?"},
|
||||
57: {"⢹", "⢺", "⢼", "⣸", "⣇", "⡧", "⡗", "⡏"},
|
||||
58: {"⢄", "⢂", "⢁", "⡁", "⡈", "⡐", "⡠"},
|
||||
59: {". ", ".. ", "...", " ..", " .", " "},
|
||||
60: {".", "o", "O", "°", "O", "o", "."},
|
||||
61: {"▓", "▒", "░"},
|
||||
62: {"▌", "▀", "▐", "▄"},
|
||||
63: {"⊶", "⊷"},
|
||||
64: {"▪", "▫"},
|
||||
65: {"□", "■"},
|
||||
66: {"▮", "▯"},
|
||||
67: {"-", "=", "≡"},
|
||||
68: {"d", "q", "p", "b"},
|
||||
69: {"∙∙∙", "●∙∙", "∙●∙", "∙∙●", "∙∙∙"},
|
||||
70: {"🌑 ", "🌒 ", "🌓 ", "🌔 ", "🌕 ", "🌖 ", "🌗 ", "🌘 "},
|
||||
71: {"☗", "☖"},
|
||||
72: {"⧇", "⧆"},
|
||||
73: {"◉", "◎"},
|
||||
74: {"㊂", "㊀", "㊁"},
|
||||
75: {"⦾", "⦿"},
|
||||
}
|
470
server/images.go
@@ -16,18 +16,54 @@ import (
|
||||
"reflect"
|
||||
"strconv"
|
||||
"strings"
|
||||
"text/template"
|
||||
|
||||
"github.com/jmorganca/ollama/api"
|
||||
"github.com/jmorganca/ollama/parser"
|
||||
)
|
||||
|
||||
type RegistryOptions struct {
|
||||
Insecure bool
|
||||
Username string
|
||||
Password string
|
||||
}
|
||||
|
||||
type Model struct {
|
||||
Name string `json:"name"`
|
||||
ModelPath string
|
||||
Prompt string
|
||||
Template string
|
||||
System string
|
||||
Options api.Options
|
||||
}
|
||||
|
||||
func (m *Model) Prompt(request api.GenerateRequest) (string, error) {
|
||||
tmpl, err := template.New("").Parse(m.Template)
|
||||
if err != nil {
|
||||
return "", err
|
||||
}
|
||||
|
||||
var vars struct {
|
||||
First bool
|
||||
System string
|
||||
Prompt string
|
||||
|
||||
// deprecated: versions <= 0.0.7 used this to omit the system prompt
|
||||
Context []int
|
||||
}
|
||||
|
||||
vars.First = len(request.Context) == 0
|
||||
vars.System = m.System
|
||||
vars.Prompt = request.Prompt
|
||||
vars.Context = request.Context
|
||||
|
||||
var sb strings.Builder
|
||||
if err := tmpl.Execute(&sb, vars); err != nil {
|
||||
return "", err
|
||||
}
|
||||
|
||||
return sb.String(), nil
|
||||
}
|
||||
|
||||
type ManifestV2 struct {
|
||||
SchemaVersion int `json:"schemaVersion"`
|
||||
MediaType string `json:"mediaType"`
|
||||
@@ -71,20 +107,19 @@ func GetManifest(mp ModelPath) (*ManifestV2, error) {
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
if _, err = os.Stat(fp); err != nil && !errors.Is(err, os.ErrNotExist) {
|
||||
return nil, fmt.Errorf("couldn't find model '%s'", mp.GetShortTagname())
|
||||
|
||||
if _, err = os.Stat(fp); err != nil {
|
||||
return nil, err
|
||||
}
|
||||
|
||||
var manifest *ManifestV2
|
||||
|
||||
f, err := os.Open(fp)
|
||||
bts, err := os.ReadFile(fp)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("couldn't open file '%s'", fp)
|
||||
}
|
||||
|
||||
decoder := json.NewDecoder(f)
|
||||
err = decoder.Decode(&manifest)
|
||||
if err != nil {
|
||||
if err := json.Unmarshal(bts, &manifest); err != nil {
|
||||
return nil, err
|
||||
}
|
||||
|
||||
@@ -112,12 +147,27 @@ func GetModel(name string) (*Model, error) {
|
||||
switch layer.MediaType {
|
||||
case "application/vnd.ollama.image.model":
|
||||
model.ModelPath = filename
|
||||
case "application/vnd.ollama.image.prompt":
|
||||
data, err := os.ReadFile(filename)
|
||||
case "application/vnd.ollama.image.template":
|
||||
bts, err := os.ReadFile(filename)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
model.Prompt = string(data)
|
||||
|
||||
model.Template = string(bts)
|
||||
case "application/vnd.ollama.image.system":
|
||||
bts, err := os.ReadFile(filename)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
|
||||
model.System = string(bts)
|
||||
case "application/vnd.ollama.image.prompt":
|
||||
bts, err := os.ReadFile(filename)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
|
||||
model.Template = string(bts)
|
||||
case "application/vnd.ollama.image.params":
|
||||
params, err := os.Open(filename)
|
||||
if err != nil {
|
||||
@@ -137,25 +187,17 @@ func GetModel(name string) (*Model, error) {
|
||||
return model, nil
|
||||
}
|
||||
|
||||
func getAbsPath(fp string) (string, error) {
|
||||
if strings.HasPrefix(fp, "~/") {
|
||||
parts := strings.Split(fp, "/")
|
||||
home, err := os.UserHomeDir()
|
||||
if err != nil {
|
||||
return "", err
|
||||
}
|
||||
|
||||
fp = filepath.Join(home, filepath.Join(parts[1:]...))
|
||||
func CreateModel(name string, path string, fn func(status string)) error {
|
||||
mf, err := os.Open(path)
|
||||
if err != nil {
|
||||
fn(fmt.Sprintf("couldn't open modelfile '%s'", path))
|
||||
return fmt.Errorf("failed to open file: %w", err)
|
||||
}
|
||||
defer mf.Close()
|
||||
|
||||
return os.ExpandEnv(fp), nil
|
||||
}
|
||||
|
||||
func CreateModel(name string, mf io.Reader, fn func(status string)) error {
|
||||
fn("parsing modelfile")
|
||||
commands, err := parser.Parse(mf)
|
||||
if err != nil {
|
||||
fn(fmt.Sprintf("error: %v", err))
|
||||
return err
|
||||
}
|
||||
|
||||
@@ -163,30 +205,39 @@ func CreateModel(name string, mf io.Reader, fn func(status string)) error {
|
||||
params := make(map[string]string)
|
||||
|
||||
for _, c := range commands {
|
||||
log.Printf("[%s] - %s\n", c.Name, c.Arg)
|
||||
log.Printf("[%s] - %s\n", c.Name, c.Args)
|
||||
switch c.Name {
|
||||
case "model":
|
||||
fn("looking for model")
|
||||
mf, err := GetManifest(ParseModelPath(c.Arg))
|
||||
mf, err := GetManifest(ParseModelPath(c.Args))
|
||||
if err != nil {
|
||||
// if we couldn't read the manifest, try getting the bin file
|
||||
fp, err := getAbsPath(c.Arg)
|
||||
if err != nil {
|
||||
fn("error determining path. exiting.")
|
||||
return err
|
||||
fp := c.Args
|
||||
|
||||
// If filePath starts with ~/, replace it with the user's home directory.
|
||||
if strings.HasPrefix(fp, "~/") {
|
||||
parts := strings.Split(fp, "/")
|
||||
home, err := os.UserHomeDir()
|
||||
if err != nil {
|
||||
return fmt.Errorf("failed to open file: %v", err)
|
||||
}
|
||||
|
||||
fp = filepath.Join(home, filepath.Join(parts[1:]...))
|
||||
}
|
||||
|
||||
// If filePath is not an absolute path, make it relative to the modelfile path
|
||||
if !filepath.IsAbs(fp) {
|
||||
fp = filepath.Join(filepath.Dir(path), fp)
|
||||
}
|
||||
|
||||
fn("creating model layer")
|
||||
file, err := os.Open(fp)
|
||||
if err != nil {
|
||||
fn(fmt.Sprintf("couldn't find model '%s'", c.Arg))
|
||||
return fmt.Errorf("failed to open file: %v", err)
|
||||
}
|
||||
defer file.Close()
|
||||
|
||||
l, err := CreateLayer(file)
|
||||
if err != nil {
|
||||
fn(fmt.Sprintf("couldn't create model layer: %v", err))
|
||||
return fmt.Errorf("failed to create layer: %v", err)
|
||||
}
|
||||
l.MediaType = "application/vnd.ollama.image.model"
|
||||
@@ -196,27 +247,26 @@ func CreateModel(name string, mf io.Reader, fn func(status string)) error {
|
||||
for _, l := range mf.Layers {
|
||||
newLayer, err := GetLayerWithBufferFromLayer(l)
|
||||
if err != nil {
|
||||
fn(fmt.Sprintf("couldn't read layer: %v", err))
|
||||
return err
|
||||
}
|
||||
layers = append(layers, newLayer)
|
||||
}
|
||||
}
|
||||
case "prompt":
|
||||
fn("creating prompt layer")
|
||||
case "license", "template", "system", "prompt":
|
||||
fn(fmt.Sprintf("creating %s layer", c.Name))
|
||||
// remove the prompt layer if one exists
|
||||
layers = removeLayerFromLayers(layers, "application/vnd.ollama.image.prompt")
|
||||
mediaType := fmt.Sprintf("application/vnd.ollama.image.%s", c.Name)
|
||||
layers = removeLayerFromLayers(layers, mediaType)
|
||||
|
||||
prompt := strings.NewReader(c.Arg)
|
||||
l, err := CreateLayer(prompt)
|
||||
layer, err := CreateLayer(strings.NewReader(c.Args))
|
||||
if err != nil {
|
||||
fn(fmt.Sprintf("couldn't create prompt layer: %v", err))
|
||||
return fmt.Errorf("failed to create layer: %v", err)
|
||||
return err
|
||||
}
|
||||
l.MediaType = "application/vnd.ollama.image.prompt"
|
||||
layers = append(layers, l)
|
||||
|
||||
layer.MediaType = mediaType
|
||||
layers = append(layers, layer)
|
||||
default:
|
||||
params[c.Name] = c.Arg
|
||||
params[c.Name] = c.Args
|
||||
}
|
||||
}
|
||||
|
||||
@@ -256,7 +306,6 @@ func CreateModel(name string, mf io.Reader, fn func(status string)) error {
|
||||
|
||||
err = SaveLayers(layers, fn, false)
|
||||
if err != nil {
|
||||
fn(fmt.Sprintf("error saving layers: %v", err))
|
||||
return err
|
||||
}
|
||||
|
||||
@@ -264,7 +313,6 @@ func CreateModel(name string, mf io.Reader, fn func(status string)) error {
|
||||
fn("writing manifest")
|
||||
err = CreateManifest(name, cfg, manifestLayers)
|
||||
if err != nil {
|
||||
fn(fmt.Sprintf("error creating manifest: %v", err))
|
||||
return err
|
||||
}
|
||||
|
||||
@@ -445,7 +493,110 @@ func CreateLayer(f io.ReadSeeker) (*LayerReader, error) {
|
||||
return layer, nil
|
||||
}
|
||||
|
||||
func PushModel(name, username, password string, fn func(api.ProgressResponse)) error {
|
||||
func CopyModel(src, dest string) error {
|
||||
srcPath, err := ParseModelPath(src).GetManifestPath(false)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
destPath, err := ParseModelPath(dest).GetManifestPath(true)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
// copy the file
|
||||
input, err := ioutil.ReadFile(srcPath)
|
||||
if err != nil {
|
||||
fmt.Println("Error reading file:", err)
|
||||
return err
|
||||
}
|
||||
|
||||
err = ioutil.WriteFile(destPath, input, 0644)
|
||||
if err != nil {
|
||||
fmt.Println("Error writing file:", err)
|
||||
return err
|
||||
}
|
||||
|
||||
return nil
|
||||
}
|
||||
|
||||
func DeleteModel(name string) error {
|
||||
mp := ParseModelPath(name)
|
||||
|
||||
manifest, err := GetManifest(mp)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
deleteMap := make(map[string]bool)
|
||||
for _, layer := range manifest.Layers {
|
||||
deleteMap[layer.Digest] = true
|
||||
}
|
||||
deleteMap[manifest.Config.Digest] = true
|
||||
|
||||
fp, err := GetManifestPath()
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
err = filepath.Walk(fp, func(path string, info os.FileInfo, err error) error {
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
if !info.IsDir() {
|
||||
path := path[len(fp)+1:]
|
||||
slashIndex := strings.LastIndex(path, "/")
|
||||
if slashIndex == -1 {
|
||||
return nil
|
||||
}
|
||||
tag := path[:slashIndex] + ":" + path[slashIndex+1:]
|
||||
fmp := ParseModelPath(tag)
|
||||
|
||||
// skip the manifest we're trying to delete
|
||||
if mp.GetFullTagname() == fmp.GetFullTagname() {
|
||||
return nil
|
||||
}
|
||||
|
||||
// save (i.e. delete from the deleteMap) any files used in other manifests
|
||||
manifest, err := GetManifest(fmp)
|
||||
if err != nil {
|
||||
log.Printf("skipping file: %s", path)
|
||||
return nil
|
||||
}
|
||||
for _, layer := range manifest.Layers {
|
||||
delete(deleteMap, layer.Digest)
|
||||
}
|
||||
delete(deleteMap, manifest.Config.Digest)
|
||||
}
|
||||
return nil
|
||||
})
|
||||
|
||||
// only delete the files which are still in the deleteMap
|
||||
for k, v := range deleteMap {
|
||||
if v {
|
||||
fp, err := GetBlobsPath(k)
|
||||
if err != nil {
|
||||
log.Printf("couldn't get file path for '%s': %v", k, err)
|
||||
continue
|
||||
}
|
||||
if err := os.Remove(fp); err != nil {
|
||||
log.Printf("couldn't remove file '%s': %v", fp, err)
|
||||
continue
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fp, err = mp.GetManifestPath(false)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
err = os.Remove(fp)
|
||||
if err != nil {
|
||||
log.Printf("couldn't remove manifest file '%s': %v", fp, err)
|
||||
return err
|
||||
}
|
||||
|
||||
return nil
|
||||
}
|
||||
|
||||
func PushModel(name string, regOpts *RegistryOptions, fn func(api.ProgressResponse)) error {
|
||||
mp := ParseModelPath(name)
|
||||
|
||||
fn(api.ProgressResponse{Status: "retrieving manifest"})
|
||||
@@ -457,65 +608,49 @@ func PushModel(name, username, password string, fn func(api.ProgressResponse)) e
|
||||
}
|
||||
|
||||
var layers []*Layer
|
||||
var total int
|
||||
var completed int
|
||||
for _, layer := range manifest.Layers {
|
||||
layers = append(layers, layer)
|
||||
total += layer.Size
|
||||
}
|
||||
layers = append(layers, &manifest.Config)
|
||||
total += manifest.Config.Size
|
||||
|
||||
for _, layer := range layers {
|
||||
exists, err := checkBlobExistence(mp, layer.Digest, username, password)
|
||||
exists, err := checkBlobExistence(mp, layer.Digest, regOpts)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
if exists {
|
||||
completed += layer.Size
|
||||
fn(api.ProgressResponse{
|
||||
Status: "using existing layer",
|
||||
Digest: layer.Digest,
|
||||
Total: total,
|
||||
Completed: completed,
|
||||
Total: layer.Size,
|
||||
Completed: layer.Size,
|
||||
})
|
||||
log.Printf("Layer %s already exists", layer.Digest)
|
||||
continue
|
||||
}
|
||||
|
||||
fn(api.ProgressResponse{
|
||||
Status: "starting upload",
|
||||
Digest: layer.Digest,
|
||||
Total: total,
|
||||
Completed: completed,
|
||||
Status: "starting upload",
|
||||
Digest: layer.Digest,
|
||||
Total: layer.Size,
|
||||
})
|
||||
|
||||
location, err := startUpload(mp, username, password)
|
||||
location, err := startUpload(mp, regOpts)
|
||||
if err != nil {
|
||||
log.Printf("couldn't start upload: %v", err)
|
||||
return err
|
||||
}
|
||||
|
||||
err = uploadBlob(location, layer, username, password)
|
||||
err = uploadBlobChunked(mp, location, layer, regOpts, fn)
|
||||
if err != nil {
|
||||
log.Printf("error uploading blob: %v", err)
|
||||
return err
|
||||
}
|
||||
completed += layer.Size
|
||||
fn(api.ProgressResponse{
|
||||
Status: "upload complete",
|
||||
Digest: layer.Digest,
|
||||
Total: total,
|
||||
Completed: completed,
|
||||
})
|
||||
}
|
||||
|
||||
fn(api.ProgressResponse{
|
||||
Status: "pushing manifest",
|
||||
Total: total,
|
||||
Completed: completed,
|
||||
})
|
||||
url := fmt.Sprintf("%s://%s/v2/%s/manifests/%s", mp.ProtocolScheme, mp.Registry, mp.GetNamespaceRepository(), mp.Tag)
|
||||
fn(api.ProgressResponse{Status: "pushing manifest"})
|
||||
url := fmt.Sprintf("%s/v2/%s/manifests/%s", mp.Registry, mp.GetNamespaceRepository(), mp.Tag)
|
||||
headers := map[string]string{
|
||||
"Content-Type": "application/vnd.docker.distribution.manifest.v2+json",
|
||||
}
|
||||
@@ -525,7 +660,7 @@ func PushModel(name, username, password string, fn func(api.ProgressResponse)) e
|
||||
return err
|
||||
}
|
||||
|
||||
resp, err := makeRequest("PUT", url, headers, bytes.NewReader(manifestJSON), username, password)
|
||||
resp, err := makeRequest("PUT", url, headers, bytes.NewReader(manifestJSON), regOpts)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
@@ -537,42 +672,36 @@ func PushModel(name, username, password string, fn func(api.ProgressResponse)) e
|
||||
return fmt.Errorf("registry responded with code %d: %v", resp.StatusCode, string(body))
|
||||
}
|
||||
|
||||
fn(api.ProgressResponse{
|
||||
Status: "success",
|
||||
Total: total,
|
||||
Completed: completed,
|
||||
})
|
||||
fn(api.ProgressResponse{Status: "success"})
|
||||
|
||||
return nil
|
||||
}
|
||||
|
||||
func PullModel(name, username, password string, fn func(api.ProgressResponse)) error {
|
||||
func PullModel(name string, regOpts *RegistryOptions, fn func(api.ProgressResponse)) error {
|
||||
mp := ParseModelPath(name)
|
||||
|
||||
fn(api.ProgressResponse{Status: "pulling manifest"})
|
||||
|
||||
manifest, err := pullModelManifest(mp, username, password)
|
||||
manifest, err := pullModelManifest(mp, regOpts)
|
||||
if err != nil {
|
||||
return fmt.Errorf("pull model manifest: %q", err)
|
||||
}
|
||||
|
||||
var layers []*Layer
|
||||
var total int
|
||||
var completed int
|
||||
for _, layer := range manifest.Layers {
|
||||
layers = append(layers, layer)
|
||||
total += layer.Size
|
||||
}
|
||||
layers = append(layers, manifest.Layers...)
|
||||
layers = append(layers, &manifest.Config)
|
||||
total += manifest.Config.Size
|
||||
|
||||
for _, layer := range layers {
|
||||
if err := downloadBlob(mp, layer.Digest, username, password, fn); err != nil {
|
||||
fn(api.ProgressResponse{Status: fmt.Sprintf("error downloading: %v", err), Digest: layer.Digest})
|
||||
if err := downloadBlob(mp, layer.Digest, regOpts, fn); err != nil {
|
||||
return err
|
||||
}
|
||||
}
|
||||
|
||||
completed += layer.Size
|
||||
fn(api.ProgressResponse{Status: "verifying sha256 digest"})
|
||||
for _, layer := range layers {
|
||||
if err := verifyBlob(layer.Digest); err != nil {
|
||||
return err
|
||||
}
|
||||
}
|
||||
|
||||
fn(api.ProgressResponse{Status: "writing manifest"})
|
||||
@@ -587,7 +716,7 @@ func PullModel(name, username, password string, fn func(api.ProgressResponse)) e
|
||||
return err
|
||||
}
|
||||
|
||||
err = os.WriteFile(fp, manifestJSON, 0644)
|
||||
err = os.WriteFile(fp, manifestJSON, 0o644)
|
||||
if err != nil {
|
||||
log.Printf("couldn't write to %s", fp)
|
||||
return err
|
||||
@@ -598,13 +727,13 @@ func PullModel(name, username, password string, fn func(api.ProgressResponse)) e
|
||||
return nil
|
||||
}
|
||||
|
||||
func pullModelManifest(mp ModelPath, username, password string) (*ManifestV2, error) {
|
||||
url := fmt.Sprintf("%s://%s/v2/%s/manifests/%s", mp.ProtocolScheme, mp.Registry, mp.GetNamespaceRepository(), mp.Tag)
|
||||
func pullModelManifest(mp ModelPath, regOpts *RegistryOptions) (*ManifestV2, error) {
|
||||
url := fmt.Sprintf("%s/v2/%s/manifests/%s", mp.Registry, mp.GetNamespaceRepository(), mp.Tag)
|
||||
headers := map[string]string{
|
||||
"Accept": "application/vnd.docker.distribution.manifest.v2+json",
|
||||
}
|
||||
|
||||
resp, err := makeRequest("GET", url, headers, nil, username, password)
|
||||
resp, err := makeRequest("GET", url, headers, nil, regOpts)
|
||||
if err != nil {
|
||||
log.Printf("couldn't get manifest: %v", err)
|
||||
return nil, err
|
||||
@@ -641,8 +770,7 @@ func createConfigLayer(layers []string) (*LayerReader, error) {
|
||||
return nil, err
|
||||
}
|
||||
|
||||
buf := bytes.NewBuffer(configJSON)
|
||||
digest, size := GetSHA256Digest(buf)
|
||||
digest, size := GetSHA256Digest(bytes.NewBuffer(configJSON))
|
||||
|
||||
layer := &LayerReader{
|
||||
Layer: Layer{
|
||||
@@ -650,7 +778,7 @@ func createConfigLayer(layers []string) (*LayerReader, error) {
|
||||
Digest: digest,
|
||||
Size: size,
|
||||
},
|
||||
Reader: buf,
|
||||
Reader: bytes.NewBuffer(configJSON),
|
||||
}
|
||||
return layer, nil
|
||||
}
|
||||
@@ -666,10 +794,10 @@ func GetSHA256Digest(r io.Reader) (string, int) {
|
||||
return fmt.Sprintf("sha256:%x", h.Sum(nil)), int(n)
|
||||
}
|
||||
|
||||
func startUpload(mp ModelPath, username string, password string) (string, error) {
|
||||
url := fmt.Sprintf("%s://%s/v2/%s/blobs/uploads/", mp.ProtocolScheme, mp.Registry, mp.GetNamespaceRepository())
|
||||
func startUpload(mp ModelPath, regOpts *RegistryOptions) (string, error) {
|
||||
url := fmt.Sprintf("%s/v2/%s/blobs/uploads/", mp.Registry, mp.GetNamespaceRepository())
|
||||
|
||||
resp, err := makeRequest("POST", url, nil, nil, username, password)
|
||||
resp, err := makeRequest("POST", url, nil, nil, regOpts)
|
||||
if err != nil {
|
||||
log.Printf("couldn't start upload: %v", err)
|
||||
return "", err
|
||||
@@ -692,10 +820,10 @@ func startUpload(mp ModelPath, username string, password string) (string, error)
|
||||
}
|
||||
|
||||
// Function to check if a blob already exists in the Docker registry
|
||||
func checkBlobExistence(mp ModelPath, digest string, username string, password string) (bool, error) {
|
||||
url := fmt.Sprintf("%s://%s/v2/%s/blobs/%s", mp.ProtocolScheme, mp.Registry, mp.GetNamespaceRepository(), digest)
|
||||
func checkBlobExistence(mp ModelPath, digest string, regOpts *RegistryOptions) (bool, error) {
|
||||
url := fmt.Sprintf("%s/v2/%s/blobs/%s", mp.Registry, mp.GetNamespaceRepository(), digest)
|
||||
|
||||
resp, err := makeRequest("HEAD", url, nil, nil, username, password)
|
||||
resp, err := makeRequest("HEAD", url, nil, nil, regOpts)
|
||||
if err != nil {
|
||||
log.Printf("couldn't check for blob: %v", err)
|
||||
return false, err
|
||||
@@ -706,19 +834,14 @@ func checkBlobExistence(mp ModelPath, digest string, username string, password s
|
||||
return resp.StatusCode == http.StatusOK, nil
|
||||
}
|
||||
|
||||
func uploadBlob(location string, layer *Layer, username string, password string) error {
|
||||
// Create URL
|
||||
url := fmt.Sprintf("%s&digest=%s", location, layer.Digest)
|
||||
|
||||
headers := make(map[string]string)
|
||||
headers["Content-Length"] = fmt.Sprintf("%d", layer.Size)
|
||||
headers["Content-Type"] = "application/octet-stream"
|
||||
|
||||
// TODO change from monolithic uploads to chunked uploads
|
||||
func uploadBlobChunked(mp ModelPath, location string, layer *Layer, regOpts *RegistryOptions, fn func(api.ProgressResponse)) error {
|
||||
// TODO allow resumability
|
||||
// TODO allow canceling uploads via DELETE
|
||||
// TODO allow cross repo blob mount
|
||||
|
||||
// Create URL
|
||||
url := location
|
||||
|
||||
fp, err := GetBlobsPath(layer.Digest)
|
||||
if err != nil {
|
||||
return err
|
||||
@@ -729,23 +852,76 @@ func uploadBlob(location string, layer *Layer, username string, password string)
|
||||
return err
|
||||
}
|
||||
|
||||
resp, err := makeRequest("PUT", url, headers, f, username, password)
|
||||
if err != nil {
|
||||
log.Printf("couldn't upload blob: %v", err)
|
||||
return err
|
||||
}
|
||||
defer resp.Body.Close()
|
||||
headers := make(map[string]string)
|
||||
headers["Content-Type"] = "application/octet-stream"
|
||||
|
||||
// Check for success: For a successful upload, the Docker registry will respond with a 201 Created
|
||||
if resp.StatusCode != http.StatusCreated {
|
||||
body, _ := io.ReadAll(resp.Body)
|
||||
return fmt.Errorf("registry responded with code %d: %v", resp.StatusCode, string(body))
|
||||
}
|
||||
chunkSize := 1 << 20
|
||||
buf := make([]byte, chunkSize)
|
||||
var totalUploaded int
|
||||
|
||||
for {
|
||||
n, err := f.Read(buf)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
headers["Content-Length"] = fmt.Sprintf("%d", n)
|
||||
headers["Content-Range"] = fmt.Sprintf("%d-%d", totalUploaded, totalUploaded+n-1)
|
||||
|
||||
fn(api.ProgressResponse{
|
||||
Status: fmt.Sprintf("uploading %s", layer.Digest),
|
||||
Digest: layer.Digest,
|
||||
Total: int(layer.Size),
|
||||
Completed: int(totalUploaded),
|
||||
})
|
||||
|
||||
// change the buffersize for the last chunk
|
||||
if n < chunkSize {
|
||||
buf = buf[:n]
|
||||
}
|
||||
resp, err := makeRequest("PATCH", url, headers, bytes.NewReader(buf), regOpts)
|
||||
if err != nil {
|
||||
log.Printf("couldn't upload blob: %v", err)
|
||||
return err
|
||||
}
|
||||
defer resp.Body.Close()
|
||||
url = resp.Header.Get("Location")
|
||||
|
||||
// Check for success: for each accepted chunk, the Docker registry responds with a 202 Accepted
|
||||
if resp.StatusCode != http.StatusAccepted {
|
||||
fn(api.ProgressResponse{
|
||||
Status: "error uploading layer",
|
||||
Digest: layer.Digest,
|
||||
Total: int(layer.Size),
|
||||
Completed: int(totalUploaded),
|
||||
})
|
||||
body, _ := io.ReadAll(resp.Body)
|
||||
return fmt.Errorf("registry responded with code %d: %v", resp.StatusCode, string(body))
|
||||
}
|
||||
|
||||
totalUploaded += n
|
||||
if totalUploaded >= layer.Size {
|
||||
url = fmt.Sprintf("%s&digest=%s", url, layer.Digest)
|
||||
|
||||
// finish the upload
|
||||
resp, err := makeRequest("PUT", url, nil, nil, regOpts)
|
||||
if err != nil {
|
||||
log.Printf("couldn't finish upload: %v", err)
|
||||
return err
|
||||
}
|
||||
defer resp.Body.Close()
|
||||
|
||||
if resp.StatusCode != http.StatusCreated {
|
||||
body, _ := io.ReadAll(resp.Body)
|
||||
return fmt.Errorf("registry responded with code %d: %v", resp.StatusCode, string(body))
|
||||
}
|
||||
break
|
||||
}
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
func downloadBlob(mp ModelPath, digest string, username, password string, fn func(api.ProgressResponse)) error {
|
||||
func downloadBlob(mp ModelPath, digest string, regOpts *RegistryOptions, fn func(api.ProgressResponse)) error {
|
||||
fp, err := GetBlobsPath(digest)
|
||||
if err != nil {
|
||||
return err
|
||||
@@ -774,12 +950,12 @@ func downloadBlob(mp ModelPath, digest string, username, password string, fn fun
|
||||
size = fi.Size()
|
||||
}
|
||||
|
||||
url := fmt.Sprintf("%s://%s/v2/%s/blobs/%s", mp.ProtocolScheme, mp.Registry, mp.GetNamespaceRepository(), digest)
|
||||
url := fmt.Sprintf("%s/v2/%s/blobs/%s", mp.Registry, mp.GetNamespaceRepository(), digest)
|
||||
headers := map[string]string{
|
||||
"Range": fmt.Sprintf("bytes=%d-", size),
|
||||
}
|
||||
|
||||
resp, err := makeRequest("GET", url, headers, nil, username, password)
|
||||
resp, err := makeRequest("GET", url, headers, nil, regOpts)
|
||||
if err != nil {
|
||||
log.Printf("couldn't download blob: %v", err)
|
||||
return err
|
||||
@@ -815,6 +991,10 @@ func downloadBlob(mp ModelPath, digest string, username, password string, fn fun
|
||||
})
|
||||
|
||||
if completed >= total {
|
||||
if err := out.Close(); err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
if err := os.Rename(fp+"-partial", fp); err != nil {
|
||||
fn(api.ProgressResponse{
|
||||
Status: fmt.Sprintf("error renaming file: %v", err),
|
||||
@@ -839,7 +1019,15 @@ func downloadBlob(mp ModelPath, digest string, username, password string, fn fun
|
||||
return nil
|
||||
}
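Review note: `downloadBlob` resumes an interrupted transfer by sending `Range: bytes=<size>-`, where `size` is the length of the existing `-partial` file. The sketch below illustrates that request shape against an in-memory handler rather than a real registry. `fetchFrom` is a hypothetical helper, and the handler uses `http.ServeContent` only as a stand-in for a server that honours Range requests.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// fetchFrom (hypothetical) requests the bytes of content starting at offset,
// the way a resumed download would, and returns the status code and body.
func fetchFrom(content string, offset int64) (int, string) {
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// http.ServeContent handles Range headers for seekable content.
		http.ServeContent(w, r, "blob", time.Time{}, strings.NewReader(content))
	})

	req := httptest.NewRequest(http.MethodGet, "/blob", nil)
	// Same header format as the diff: an open-ended range from the bytes already on disk.
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-", offset))

	rec := httptest.NewRecorder()
	handler.ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	// Pretend 6 bytes of an 11-byte blob are already in the partial file.
	code, rest := fetchFrom("hello world", 6)
	fmt.Println(code, rest)
}
```

A server that supports ranges answers with 206 Partial Content and only the missing tail. One case worth checking in the real code is an offset equal to the full blob size, which servers typically reject as unsatisfiable.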
|
||||
|
||||
func makeRequest(method, url string, headers map[string]string, body io.Reader, username, password string) (*http.Response, error) {
|
||||
func makeRequest(method, url string, headers map[string]string, body io.Reader, regOpts *RegistryOptions) (*http.Response, error) {
|
||||
if !strings.HasPrefix(url, "http") {
|
||||
if regOpts.Insecure {
|
||||
url = "http://" + url
|
||||
} else {
|
||||
url = "https://" + url
|
||||
}
|
||||
}
|
||||
|
||||
req, err := http.NewRequest(method, url, body)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
@@ -850,8 +1038,8 @@ func makeRequest(method, url string, headers map[string]string, body io.Reader,
|
||||
}
|
||||
|
||||
// TODO: better auth
|
||||
if username != "" && password != "" {
|
||||
req.SetBasicAuth(username, password)
|
||||
if regOpts.Username != "" && regOpts.Password != "" {
|
||||
req.SetBasicAuth(regOpts.Username, regOpts.Password)
|
||||
}
|
||||
|
||||
client := &http.Client{
|
||||
@@ -870,3 +1058,23 @@ func makeRequest(method, url string, headers map[string]string, body io.Reader,
|
||||
|
||||
return resp, nil
|
||||
}
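Review note: with this change `makeRequest` picks the scheme itself, using `http://` when `regOpts.Insecure` is set and `https://` otherwise. A minimal sketch of that rule in isolation follows. `withScheme` is a hypothetical name, and unlike the diff (which checks for the bare prefix `http`) this variant checks for the full `http://` or `https://` prefix so that a host whose name merely begins with "http" still gets a scheme.

```go
package main

import (
	"fmt"
	"strings"
)

// withScheme (hypothetical) returns url unchanged if it already carries an
// http or https scheme, and otherwise prepends one based on insecure.
func withScheme(url string, insecure bool) string {
	if strings.HasPrefix(url, "http://") || strings.HasPrefix(url, "https://") {
		return url
	}
	if insecure {
		return "http://" + url
	}
	return "https://" + url
}

func main() {
	fmt.Println(withScheme("registry.example.com/v2/library/model/blobs/sha256:abc", false))
	fmt.Println(withScheme("localhost:5000/v2/library/model/blobs/sha256:abc", true))
	// A Location header returned by the registry is usually absolute already.
	fmt.Println(withScheme("https://registry.example.com/v2/uploads/1", true))
}
```

Keeping the rule in one place is what lets `downloadBlob` drop `mp.ProtocolScheme` from its URL format string.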
|
||||
|
||||
func verifyBlob(digest string) error {
	fp, err := GetBlobsPath(digest)
	if err != nil {
		return err
	}

	f, err := os.Open(fp)
	if err != nil {
		return err
	}
	defer f.Close()

	fileDigest, _ := GetSHA256Digest(f)
	if digest != fileDigest {
		return fmt.Errorf("digest mismatch: want %s, got %s", digest, fileDigest)
	}

	return nil
}
|
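Review note: `verifyBlob` recomputes the digest of the file on disk and compares it with the digest the blob was requested under. The sketch below shows the same check on in-memory bytes. `digestOf` and `verifyBytes` are hypothetical stand-ins, and the `sha256:<hex>` digest format is an assumption based on how digests appear elsewhere in this diff, not a confirmed detail of `GetSHA256Digest`.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// digestOf (hypothetical) returns the SHA-256 of data in "sha256:<hex>" form.
func digestOf(data []byte) string {
	return fmt.Sprintf("sha256:%x", sha256.Sum256(data))
}

// verifyBytes (hypothetical) mirrors verifyBlob: it fails when the content
// does not hash to the expected digest.
func verifyBytes(want string, data []byte) error {
	got := digestOf(data)
	if got != want {
		return fmt.Errorf("digest mismatch: want %s, got %s", want, got)
	}
	return nil
}

func main() {
	blob := []byte("model weights")
	want := digestOf(blob)

	fmt.Println(verifyBytes(want, blob))                     // matching content
	fmt.Println(verifyBytes(want, []byte("truncated blob"))) // corrupted content
}
```

For large blobs the real code would stream the file through the hash instead of holding it in memory, which is presumably why `verifyBlob` passes the open file to `GetSHA256Digest`.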
@@ -4,6 +4,7 @@ import (
|
||||
"fmt"
|
||||
"os"
|
||||
"path/filepath"
|
||||
"runtime"
|
||||
"strings"
|
||||
)
|
||||
|
||||
@@ -44,7 +45,7 @@ func ParseModelPath(name string) ModelPath {
|
||||
return ModelPath{}
|
||||
}
|
||||
|
||||
colonParts := strings.Split(name, ":")
|
||||
colonParts := strings.Split(slashParts[len(slashParts)-1], ":")
|
||||
if len(colonParts) == 2 {
|
||||
tag = colonParts[1]
|
||||
} else {
|
||||
@@ -69,10 +70,13 @@ func (mp ModelPath) GetFullTagname() string {
|
||||
}
|
||||
|
||||
func (mp ModelPath) GetShortTagname() string {
|
||||
if mp.Registry == DefaultRegistry && mp.Namespace == DefaultNamespace {
|
||||
return fmt.Sprintf("%s:%s", mp.Repository, mp.Tag)
|
||||
if mp.Registry == DefaultRegistry {
|
||||
if mp.Namespace == DefaultNamespace {
|
||||
return fmt.Sprintf("%s:%s", mp.Repository, mp.Tag)
|
||||
}
|
||||
return fmt.Sprintf("%s/%s:%s", mp.Namespace, mp.Repository, mp.Tag)
|
||||
}
|
||||
return fmt.Sprintf("%s/%s:%s", mp.Namespace, mp.Repository, mp.Tag)
|
||||
return fmt.Sprintf("%s/%s/%s:%s", mp.Registry, mp.Namespace, mp.Repository, mp.Tag)
|
||||
}
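Review note: the new `GetShortTagname` has three outcomes depending on how much of the path matches the defaults. The sketch below reproduces that branching on a stripped-down struct so the cases can be compared side by side. The constant values `registry.example.com` and `library` are placeholders chosen for illustration, not the project's actual `DefaultRegistry` and `DefaultNamespace`.

```go
package main

import "fmt"

// Placeholder defaults for illustration only.
const (
	defaultRegistry  = "registry.example.com"
	defaultNamespace = "library"
)

// ModelPath is a reduced copy of the fields the method reads.
type ModelPath struct {
	Registry   string
	Namespace  string
	Repository string
	Tag        string
}

// shortTagname follows the branching added in this diff: drop the registry
// when it is the default, and drop the namespace too when both are defaults.
func shortTagname(mp ModelPath) string {
	if mp.Registry == defaultRegistry {
		if mp.Namespace == defaultNamespace {
			return fmt.Sprintf("%s:%s", mp.Repository, mp.Tag)
		}
		return fmt.Sprintf("%s/%s:%s", mp.Namespace, mp.Repository, mp.Tag)
	}
	return fmt.Sprintf("%s/%s/%s:%s", mp.Registry, mp.Namespace, mp.Repository, mp.Tag)
}

func main() {
	fmt.Println(shortTagname(ModelPath{defaultRegistry, defaultNamespace, "llama2", "latest"}))
	fmt.Println(shortTagname(ModelPath{defaultRegistry, "alice", "llama2", "latest"}))
	fmt.Println(shortTagname(ModelPath{"localhost:5000", "alice", "llama2", "latest"}))
}
```

The last case is the behavioural change: before this diff a model from a non-default registry was shortened to `namespace/repository:tag`, which hid where it came from.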
|
||||
|
||||
func (mp ModelPath) GetManifestPath(createDir bool) (string, error) {
|
||||
@@ -106,6 +110,10 @@ func GetBlobsPath(digest string) (string, error) {
|
||||
return "", err
|
||||
}
|
||||
|
||||
if runtime.GOOS == "windows" {
|
||||
digest = strings.ReplaceAll(digest, ":", "-")
|
||||
}
|
||||
|
||||
path := filepath.Join(home, ".ollama", "models", "blobs", digest)
|
||||
if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
|
||||
return "", err
|
||||
|
141
server/routes.go
@@ -2,6 +2,8 @@ package server
|
||||
|
||||
import (
|
||||
"encoding/json"
|
||||
"errors"
|
||||
"fmt"
|
||||
"io"
|
||||
"log"
|
||||
"net"
|
||||
@@ -9,26 +11,17 @@ import (
|
||||
"os"
|
||||
"path/filepath"
|
||||
"strings"
|
||||
"text/template"
|
||||
"time"
|
||||
|
||||
"dario.cat/mergo"
|
||||
"github.com/gin-contrib/cors"
|
||||
"github.com/gin-gonic/gin"
|
||||
|
||||
"github.com/jmorganca/ollama/api"
|
||||
"github.com/jmorganca/ollama/llama"
|
||||
)
|
||||
|
||||
func cacheDir() string {
|
||||
home, err := os.UserHomeDir()
|
||||
if err != nil {
|
||||
panic(err)
|
||||
}
|
||||
|
||||
return filepath.Join(home, ".ollama")
|
||||
}
|
||||
|
||||
func generate(c *gin.Context) {
|
||||
func GenerateHandler(c *gin.Context) {
|
||||
start := time.Now()
|
||||
|
||||
var req api.GenerateRequest
|
||||
@@ -54,19 +47,12 @@ func generate(c *gin.Context) {
|
||||
return
|
||||
}
|
||||
|
||||
templ, err := template.New("").Parse(model.Prompt)
|
||||
prompt, err := model.Prompt(req)
|
||||
if err != nil {
|
||||
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
|
||||
return
|
||||
}
|
||||
|
||||
var sb strings.Builder
|
||||
if err = templ.Execute(&sb, req); err != nil {
|
||||
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
|
||||
return
|
||||
}
|
||||
req.Prompt = sb.String()
|
||||
|
||||
llm, err := llama.New(model.ModelPath, opts)
|
||||
if err != nil {
|
||||
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
|
||||
@@ -77,7 +63,7 @@ func generate(c *gin.Context) {
|
||||
ch := make(chan any)
|
||||
go func() {
|
||||
defer close(ch)
|
||||
llm.Predict(req.Context, req.Prompt, func(r api.GenerateResponse) {
|
||||
fn := func(r api.GenerateResponse) {
|
||||
r.Model = req.Model
|
||||
r.CreatedAt = time.Now().UTC()
|
||||
if r.Done {
|
||||
@@ -85,13 +71,17 @@ func generate(c *gin.Context) {
|
||||
}
|
||||
|
||||
ch <- r
|
||||
})
|
||||
}
|
||||
|
||||
if err := llm.Predict(req.Context, prompt, fn); err != nil {
|
||||
ch <- gin.H{"error": err.Error()}
|
||||
}
|
||||
}()
|
||||
|
||||
streamResponse(c, ch)
|
||||
}
|
||||
|
||||
func pull(c *gin.Context) {
|
||||
func PullModelHandler(c *gin.Context) {
|
||||
var req api.PullRequest
|
||||
if err := c.ShouldBindJSON(&req); err != nil {
|
||||
c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
|
||||
@@ -105,16 +95,21 @@ func pull(c *gin.Context) {
|
||||
ch <- r
|
||||
}
|
||||
|
||||
if err := PullModel(req.Name, req.Username, req.Password, fn); err != nil {
|
||||
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
|
||||
return
|
||||
regOpts := &RegistryOptions{
|
||||
Insecure: req.Insecure,
|
||||
Username: req.Username,
|
||||
Password: req.Password,
|
||||
}
|
||||
|
||||
if err := PullModel(req.Name, regOpts, fn); err != nil {
|
||||
ch <- gin.H{"error": err.Error()}
|
||||
}
|
||||
}()
|
||||
|
||||
streamResponse(c, ch)
|
||||
}
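Review note: the handlers now report failures by sending `gin.H{"error": ...}` on the same channel as progress updates instead of calling `c.JSON` from inside the goroutine. The sketch below shows that pattern with only the standard library. `progress`, `stream`, `collect`, and `demoWork` are hypothetical names, and a plain map stands in for `gin.H`.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// progress is a hypothetical stand-in for api.ProgressResponse.
type progress struct {
	Status string `json:"status"`
}

// stream runs work in a goroutine and writes every value it emits, including
// a trailing error object, to w as one JSON document per line.
func stream(w io.Writer, work func(fn func(progress)) error) error {
	ch := make(chan any)
	go func() {
		defer close(ch)
		fn := func(p progress) { ch <- p }
		if err := work(fn); err != nil {
			// The error travels in-band, after any progress already sent.
			ch <- map[string]string{"error": err.Error()}
		}
	}()

	enc := json.NewEncoder(w)
	var firstErr error
	for v := range ch {
		if firstErr != nil {
			continue // keep draining so the goroutine can finish
		}
		if err := enc.Encode(v); err != nil {
			firstErr = err
		}
	}
	return firstErr
}

// collect runs stream against an in-memory buffer and returns the output.
func collect(work func(fn func(progress)) error) (string, error) {
	var sb strings.Builder
	err := stream(&sb, work)
	return sb.String(), err
}

// demoWork reports one progress step and then fails.
func demoWork(fn func(progress)) error {
	fn(progress{Status: "pulling manifest"})
	return errors.New("registry unreachable")
}

func main() {
	out, err := collect(demoWork)
	fmt.Print(out)
	fmt.Println("write error:", err)
}
```

The design reason is that once the first progress line has been written, the response status is already committed, so a later `c.JSON(http.StatusInternalServerError, ...)` cannot change it. Clients therefore need to inspect each streamed object for an `error` field.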
|
||||
|
||||
func push(c *gin.Context) {
|
||||
func PushModelHandler(c *gin.Context) {
|
||||
var req api.PushRequest
|
||||
if err := c.ShouldBindJSON(&req); err != nil {
|
||||
c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
|
||||
@@ -128,31 +123,27 @@ func push(c *gin.Context) {
|
||||
ch <- r
|
||||
}
|
||||
|
||||
if err := PushModel(req.Name, req.Username, req.Password, fn); err != nil {
|
||||
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
|
||||
return
|
||||
regOpts := &RegistryOptions{
|
||||
Insecure: req.Insecure,
|
||||
Username: req.Username,
|
||||
Password: req.Password,
|
||||
}
|
||||
|
||||
if err := PushModel(req.Name, regOpts, fn); err != nil {
|
||||
ch <- gin.H{"error": err.Error()}
|
||||
}
|
||||
}()
|
||||
|
||||
streamResponse(c, ch)
|
||||
}
|
||||
|
||||
func create(c *gin.Context) {
|
||||
func CreateModelHandler(c *gin.Context) {
|
||||
var req api.CreateRequest
|
||||
if err := c.ShouldBindJSON(&req); err != nil {
|
||||
c.JSON(http.StatusBadRequest, gin.H{"message": err.Error()})
|
||||
return
|
||||
}
|
||||
|
||||
// NOTE consider passing the entire Modelfile in the json instead of the path to it
|
||||
|
||||
file, err := os.Open(req.Path)
|
||||
if err != nil {
|
||||
c.JSON(http.StatusBadRequest, gin.H{"message": err.Error()})
|
||||
return
|
||||
}
|
||||
defer file.Close()
|
||||
|
||||
ch := make(chan any)
|
||||
go func() {
|
||||
defer close(ch)
|
||||
@@ -162,16 +153,32 @@ func create(c *gin.Context) {
|
||||
}
|
||||
}
|
||||
|
||||
if err := CreateModel(req.Name, file, fn); err != nil {
|
||||
c.JSON(http.StatusBadRequest, gin.H{"message": err.Error()})
|
||||
return
|
||||
if err := CreateModel(req.Name, req.Path, fn); err != nil {
|
||||
ch <- gin.H{"error": err.Error()}
|
||||
}
|
||||
}()
|
||||
|
||||
streamResponse(c, ch)
|
||||
}
|
||||
|
||||
func list(c *gin.Context) {
|
||||
func DeleteModelHandler(c *gin.Context) {
	var req api.DeleteRequest
	if err := c.ShouldBindJSON(&req); err != nil {
		c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
		return
	}

	if err := DeleteModel(req.Name); err != nil {
		if os.IsNotExist(err) {
			c.JSON(http.StatusNotFound, gin.H{"error": fmt.Sprintf("model '%s' not found", req.Name)})
		} else {
			c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
		}
		return
	}
}
|
||||
func ListModelsHandler(c *gin.Context) {
|
||||
var models []api.ListResponseModel
|
||||
fp, err := GetManifestPath()
|
||||
if err != nil {
|
||||
@@ -180,6 +187,10 @@ func list(c *gin.Context) {
|
||||
}
|
||||
err = filepath.Walk(fp, func(path string, info os.FileInfo, err error) error {
|
||||
if err != nil {
|
||||
if errors.Is(err, os.ErrNotExist) {
|
||||
log.Printf("manifest file does not exist: %s", fp)
|
||||
return nil
|
||||
}
|
||||
return err
|
||||
}
|
||||
if !info.IsDir() {
|
||||
@@ -217,18 +228,52 @@ func list(c *gin.Context) {
|
||||
c.JSON(http.StatusOK, api.ListResponse{models})
|
||||
}
|
||||
|
||||
func CopyModelHandler(c *gin.Context) {
	var req api.CopyRequest
	if err := c.ShouldBindJSON(&req); err != nil {
		c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
		return
	}

	if err := CopyModel(req.Source, req.Destination); err != nil {
		if os.IsNotExist(err) {
			c.JSON(http.StatusNotFound, gin.H{"error": fmt.Sprintf("model '%s' not found", req.Source)})
		} else {
			c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
		}
		return
	}
}
|
||||
func Serve(ln net.Listener) error {
|
||||
config := cors.DefaultConfig()
|
||||
config.AllowWildcard = true
|
||||
// only allow http/https from localhost
|
||||
config.AllowOrigins = []string{
|
||||
"http://localhost",
|
||||
"http://localhost:*",
|
||||
"https://localhost",
|
||||
"https://localhost:*",
|
||||
"http://127.0.0.1",
|
||||
"http://127.0.0.1:*",
|
||||
"https://127.0.0.1",
|
||||
"https://127.0.0.1:*",
|
||||
}
|
||||
|
||||
r := gin.Default()
|
||||
r.Use(cors.New(config))
|
||||
|
||||
r.GET("/", func(c *gin.Context) {
|
||||
c.String(http.StatusOK, "Ollama is running")
|
||||
})
|
||||
|
||||
r.POST("/api/pull", pull)
|
||||
r.POST("/api/generate", generate)
|
||||
r.POST("/api/create", create)
|
||||
r.POST("/api/push", push)
|
||||
r.GET("/api/tags", list)
|
||||
r.POST("/api/pull", PullModelHandler)
|
||||
r.POST("/api/generate", GenerateHandler)
|
||||
r.POST("/api/create", CreateModelHandler)
|
||||
r.POST("/api/push", PushModelHandler)
|
||||
r.POST("/api/copy", CopyModelHandler)
|
||||
r.GET("/api/tags", ListModelsHandler)
|
||||
r.DELETE("/api/delete", DeleteModelHandler)
|
||||
|
||||
log.Printf("Listening on %s", ln.Addr())
|
||||
s := &http.Server{
|
||||
|
@@ -1,10 +0,0 @@
|
||||
{{- if not .Context }}
|
||||
Below is an instruction that describes a task. Write a response that appropriately completes the request.
|
||||
{{- end }}
|
||||
|
||||
### Instruction:
|
||||
{{ .Prompt }}
|
||||
|
||||
### Response:
|
||||
|
||||
|
@@ -1,5 +0,0 @@
|
||||
{{- if not .Context }}
|
||||
A helpful assistant who helps the user with any questions asked.
|
||||
{{- end }}
|
||||
User: {{ .Prompt }}
|
||||
Assistant:
|
@@ -1,5 +0,0 @@
|
||||
### Instruction:
|
||||
{{ .Prompt }}
|
||||
|
||||
### Response:
|
||||
|
@@ -1,5 +0,0 @@
|
||||
### Instruction:
|
||||
{{ .Prompt }}
|
||||
|
||||
### Response:
|
||||
|
@@ -1,6 +0,0 @@
|
||||
{{- if not .Context }}
|
||||
Below is an instruction that describes a task. Write a response that appropriately completes the request. Be concise. Once the request is completed, include no other text.
|
||||
{{- end }}
|
||||
### Instruction:
|
||||
{{ .Prompt }}
|
||||
### Response:
|
@@ -1 +0,0 @@
|
||||
{{ .Prompt }}
|
@@ -1,9 +0,0 @@
|
||||
{{- if not .Context }}
|
||||
### System:
|
||||
You are an AI assistant that follows instruction extremely well. Help as much as you can.
|
||||
{{- end }}
|
||||
|
||||
### User:
|
||||
{{ .Prompt }}
|
||||
|
||||
### Response:
|
@@ -1,2 +0,0 @@
|
||||
### Human: {{ .Prompt }}
|
||||
### Assistant:
|
@@ -1,4 +0,0 @@
|
||||
|
||||
{{ .Prompt }}
|
||||
|
||||
|
@@ -1,2 +0,0 @@
|
||||
USER: {{ .Prompt }}
|
||||
ASSISTANT:
|
@@ -1,6 +0,0 @@
|
||||
{{ if not .Context }}
|
||||
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
|
||||
{{- end }}
|
||||
|
||||
USER: {{ .Prompt }}
|
||||
ASSISTANT:
|
@@ -1,7 +0,0 @@
|
||||
{{- if not .Context }}
|
||||
Below is an instruction that describes a task. Write a response that appropriately completes the request
|
||||
{{- end }}
|
||||
|
||||
### Instruction: {{ .Prompt }}
|
||||
|
||||
### Response:
|
@@ -1,3 +0,0 @@
|
||||
{{ .Prompt }}
|
||||
|
||||
### Response:
|
@@ -1,6 +0,0 @@
|
||||
import models from '../../../../models.json'
|
||||
import { NextResponse } from 'next/server'
|
||||
|
||||
export async function GET() {
|
||||
return NextResponse.json(models)
|
||||
}
|
@@ -6,12 +6,22 @@ const analytics = new Analytics({ writeKey: process.env.TELEMETRY_WRITE_KEY || '
|
||||
export async function POST(req: Request) {
|
||||
const { email } = await req.json()
|
||||
|
||||
analytics.identify({
|
||||
anonymousId: uuid(),
|
||||
const id = uuid()
|
||||
|
||||
await analytics.identify({
|
||||
anonymousId: id,
|
||||
traits: {
|
||||
email,
|
||||
},
|
||||
})
|
||||
|
||||
await analytics.track({
|
||||
anonymousId: id,
|
||||
event: 'signup',
|
||||
properties: {
|
||||
email,
|
||||
},
|
||||
})
|
||||
|
||||
return new Response(null, { status: 200 })
|
||||
}
|
||||
|
@@ -1,3 +1,5 @@
|
||||
import Image from 'next/image'
|
||||
|
||||
import Header from '../header'
|
||||
import Downloader from './downloader'
|
||||
import Signup from './signup'
|
||||
@@ -30,7 +32,7 @@ export default async function Download() {
|
||||
<>
|
||||
<Header />
|
||||
<main className='flex min-h-screen max-w-6xl flex-col py-20 px-16 lg:p-32 items-center mx-auto'>
|
||||
<img src='/ollama.png' className='w-16 h-auto' />
|
||||
<Image src='/ollama.png' width={64} height={64} alt='ollamaIcon' />
|
||||
<section className='mt-12 mb-8 text-center'>
|
||||
<h2 className='my-2 max-w-md text-3xl tracking-tight'>Downloading...</h2>
|
||||
<h3 className='text-base text-neutral-500 mt-12 max-w-[16rem]'>
|
||||
|
@@ -1,24 +1,26 @@
|
||||
import Link from "next/link"
|
||||
|
||||
const navigation = [
|
||||
{ name: 'Discord', href: 'https://discord.gg/MrfB5FbNWN' },
|
||||
{ name: 'GitHub', href: 'https://github.com/jmorganca/ollama' },
|
||||
{ name: 'Github', href: 'https://github.com/jmorganca/ollama' },
|
||||
{ name: 'Download', href: '/download' },
|
||||
]
|
||||
|
||||
export default function Header() {
|
||||
export default function Header() {
|
||||
return (
|
||||
<header className='absolute inset-x-0 top-0 z-50'>
|
||||
<nav className='mx-auto flex items-center justify-between px-10 py-4'>
|
||||
<a className='flex-1 font-bold' href='/'>
|
||||
<header className="absolute inset-x-0 top-0 z-50">
|
||||
<nav className="mx-auto flex items-center justify-between px-10 py-4">
|
||||
<Link className="flex-1 font-bold" href="/">
|
||||
Ollama
|
||||
</a>
|
||||
<div className='flex space-x-8'>
|
||||
{navigation.map(item => (
|
||||
<a key={item.name} href={item.href} className='text-sm leading-6 text-gray-900'>
|
||||
</Link>
|
||||
<div className="flex space-x-8">
|
||||
{navigation.map((item) => (
|
||||
<Link key={item.name} href={item.href} className="text-sm leading-6 text-gray-900">
|
||||
{item.name}
|
||||
</a>
|
||||
</Link>
|
||||
))}
|
||||
</div>
|
||||
</nav>
|
||||
</header>
|
||||
</header >
|
||||
)
|
||||
}
|
||||
}
|
@@ -1,6 +1,6 @@
|
||||
import { AiFillApple } from 'react-icons/ai'
|
||||
import Image from 'next/image'
|
||||
import Link from 'next/link'
|
||||
|
||||
import models from '../../models.json'
|
||||
import Header from './header'
|
||||
|
||||
export default async function Home() {
|
||||
@@ -8,21 +8,26 @@ export default async function Home() {
|
||||
<>
|
||||
<Header />
|
||||
<main className='flex min-h-screen max-w-6xl flex-col py-20 px-16 md:p-32 items-center mx-auto'>
|
||||
<img src='/ollama.png' className='w-16 h-auto' />
|
||||
<Image src='/ollama.png' width={64} height={64} alt='ollamaIcon' />
|
||||
<section className='my-12 text-center'>
|
||||
<div className='flex flex-col space-y-2'>
|
||||
<h2 className='md:max-w-[18rem] mx-auto my-2 text-3xl tracking-tight'>Portable large language models</h2>
|
||||
<h2 className='md:max-w-md mx-auto my-2 text-3xl tracking-tight'>
|
||||
Get up and running with large language models, locally.
|
||||
</h2>
|
||||
<h3 className='md:max-w-xs mx-auto text-base text-neutral-500'>
|
||||
Bundle a model’s weights, configuration, prompts, data and more into self-contained packages that run anywhere.
|
||||
Run Llama 2 and other models on macOS. Customize and create your own.
|
||||
</h3>
|
||||
</div>
|
||||
<div className='mx-auto flex flex-col space-y-4 mt-12'>
|
||||
<a href='/download' className='md:mx-10 lg:mx-14 bg-black text-white rounded-full px-4 py-2 focus:outline-none cursor-pointer'>
|
||||
<div className='mx-auto max-w-xs flex flex-col space-y-4 mt-12'>
|
||||
<Link
|
||||
href='/download'
|
||||
className='md:mx-10 lg:mx-14 bg-black text-white rounded-full px-4 py-2 focus:outline-none cursor-pointer'
|
||||
>
|
||||
Download
|
||||
</a>
|
||||
</Link>
|
||||
<p className='text-neutral-500 text-sm '>
|
||||
Available for macOS with Apple Silicon <br />
|
||||
Windows & Linux support coming soon.
|
||||
Available for macOS with Apple Silicon <br />
|
||||
Windows & Linux support coming soon.
|
||||
</p>
|
||||
</div>
|
||||
</section>
|
||||
|