Commit Graph

874 Commits

Author SHA1 Message Date
HanishKVC 668b98700c SimpleChat: Add a simple readme file 2024-05-18 01:06:54 +05:30
HanishKVC b3644172e0 SimpleChat:JS: Force completion mode to be single message by default 2024-05-18 00:36:23 +05:30
HanishKVC aef32d9cc0 SimpleChat:JS: Handle difference in response
Try to read the assistant response from the appropriate field in the
response received.

Also, examples/server seems to return the response in a slightly
different field, so try to account for that as well.
2024-05-18 00:36:23 +05:30
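A minimal sketch of handling this field difference, assuming an OpenAI-style chat body with choices[0].message.content on one side and the examples/server style top-level content field on the other; the function name and the exact fallback order are illustrative, not the actual SimpleChat code:

    // Pick the assistant text out of whichever field the server used.
    // OpenAI-compatible endpoints nest it under choices[0].message.content
    // (or choices[0].text in completion mode), while examples/server's
    // /completion endpoint returns a top-level "content" field.
    function extractAssistantResponse(respBody) {
        if (respBody.choices !== undefined && respBody.choices.length > 0) {
            const choice = respBody.choices[0];
            if (choice.message !== undefined) {
                return choice.message.content; // chat mode
            }
            return choice.text;                // completion mode
        }
        return respBody.content;               // examples/server style
    }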
HanishKVC 3e5edbacd6 SimpleChat: Don't submit if already submitted and waiting
Also make chat the default mode selection
2024-05-18 00:36:23 +05:30
HanishKVC 9feb58eaa5 SimpleChat: Allow user to select chat or completion mode 2024-05-18 00:36:23 +05:30
HanishKVC e62087bf3f SimpleChat:JS: Try to trap Enter key press in the input text field
So the user can either press the submit button or press the Enter key
2024-05-18 00:36:23 +05:30
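The kind of Enter-key trap described above could look like this sketch; the element id and the onSubmitClick handler name are assumptions, not the actual identifiers in the script:

    // Treat Enter in the text field like a click on the submit button.
    const inputEl = document.getElementById("user-in"); // assumed id
    inputEl.addEventListener("keydown", (ev) => {
        if (ev.key === "Enter") {
            ev.preventDefault();  // avoid any default form submission
            onSubmitClick();      // same path as the submit button
        }
    });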
HanishKVC 29d2d22c02 SimpleChat:sh: Add simple shell script to run python3 http.server
One needs to run the llm server locally,
then run this script and access the page using a local browser
2024-05-18 00:36:23 +05:30
HanishKVC ebe330d098 SimpleChat: Move into its own subdirectory to avoid confusion 2024-05-18 00:36:23 +05:30
HanishKVC 9942851273 SimpleChat: Differentiate user/assistant msgs, make input wider
Also show a default message to the user

Also add some meta tags
2024-05-18 00:36:23 +05:30
HanishKVC 7d772f6b9a SimpleChat: Try to keep the input element in view 2024-05-18 00:36:23 +05:30
HanishKVC 564469e4f6 SimpleChat:JS: Messages/Prompt, indicate working to end user 2024-05-18 00:36:23 +05:30
HanishKVC c6653479fc SimpleChat:JS: Extract model response and show to user 2024-05-18 00:36:23 +05:30
HanishKVC 33bc67baa6 SimpleChat: Try handshaking with the llm over its web service endpoint 2024-05-18 00:36:23 +05:30
HanishKVC 27268a6067 SimpleChat: Move handling of submit request into its own func 2024-05-18 00:36:23 +05:30
HanishKVC ce4aaeb692 SimpleChat: Use common helper logic wrt json data 2024-05-18 00:36:23 +05:30
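One plausible shape for the common JSON helper mentioned above: POST a JS object as JSON and parse the JSON reply. The function name and the endpoint in the usage line are assumptions for illustration:

    // Shared helper: POST `obj` as JSON to `url`, return the parsed reply.
    async function postJson(url, obj) {
        const resp = await fetch(url, {
            method: "POST",
            headers: { "Content-Type": "application/json" },
            body: JSON.stringify(obj),
        });
        return await resp.json();
    }

    // usage: const data = await postJson("/chat/completions", { messages: xchats });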
HanishKVC 639d647ebf SimpleChat: Also add completions related prompt 2024-05-18 00:36:23 +05:30
HanishKVC 256e02c7c9 SimpleChat: Rather value wrt input text element 2024-05-18 00:36:23 +05:30
HanishKVC 24d348ab97 SimpleChat:HTML: Bring in the js file 2024-05-18 00:36:23 +05:30
HanishKVC 70e5860264 SimpleChatJS: Roles Class, submitClick
Define Role class with static members corresponding to the roles.

Update startme to:

* Get hold of the ui elements.

* Attach a click handler to the submit button, which adds the user input
  to the xchats array and shows the chat messages so far in the chat div
  element.

Trap DOMContentLoaded to trigger startme.
2024-05-18 00:36:23 +05:30
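Putting the pieces of this commit message together, a minimal sketch might look like the following; the element ids and the showChats helper are assumed for illustration, not copied from the actual file:

    // Role class with static members corresponding to the roles.
    class Roles {
        static System = "system";
        static User = "user";
        static Assistant = "assistant";
    }

    let xchats = []; // the chat messages so far

    // Render the chat messages so far into the given div element.
    function showChats(div) {
        div.textContent = xchats.map((m) => `${m.role}: ${m.content}`).join("\n");
    }

    function startme() {
        // Get hold of the ui elements.
        const chatDiv = document.getElementById("chat-div");          // assumed id
        const inputEl = document.getElementById("user-in");           // assumed id
        const btnSubmit = document.getElementById("user-btn-submit"); // assumed id
        // Click handler: record the user input, then redraw the chat.
        btnSubmit.addEventListener("click", () => {
            xchats.push({ role: Roles.User, content: inputEl.value });
            showChats(chatDiv);
        });
    }

    // Trap DOMContentLoaded to trigger startme.
    document.addEventListener("DOMContentLoaded", startme);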
HanishKVC 1d3cc9353a SimpleChat: request_json, globals, startme 2024-05-18 00:36:23 +05:30
HanishKVC 0402a4b60e SimpleChat: A js skeleton with SimpleChat class
Allows maintaining an array of chat messages.

Allows adding a chat message (from any of the roles, be it system,
user, assistant, ...)

Allows showing the chat messages so far in a given div element.
2024-05-18 00:36:23 +05:30
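Per the three capabilities listed above, a sketch of such a skeleton; the method names are guesses for illustration:

    class SimpleChat {
        constructor() {
            this.xchat = []; // the array of chat messages
        }
        // Add a chat message from any of the roles (system, user, assistant, ...).
        add(role, content) {
            this.xchat.push({ role: role, content: content });
        }
        // Show the chat messages so far inside the given div element.
        show(div) {
            div.innerHTML = "";
            for (const msg of this.xchat) {
                const p = document.createElement("p");
                p.textContent = `${msg.role}: ${msg.content}`;
                div.appendChild(p);
            }
        }
    }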
HanishKVC 69ecad21e7 SimpleChat: Add a skeletal html page
Contains a div placeholder for showing the chat messages so far,

a text input allowing the user to enter the next chat message/query
for the model,

and a submit button for sending the user-entered message and the
chat so far to the model.
2024-05-18 00:36:22 +05:30
Radoslav Gerganov f4bd8b3d26
rpc : set SO_REUSEADDR for the server socket (#7320)
ref: #7293
2024-05-17 17:25:44 +03:00
Radoslav Gerganov ee94172d33
server : add support for the RPC backend (#7305)
ref: #7292
2024-05-17 10:00:17 +03:00
Leon Knauer 9c4fdcbec8
[Server] Added --verbose option to README [no ci] (#7335) 2024-05-17 10:11:03 +10:00
Pierrick Hymbert 24ecb58168
Revert "server bench: fix bench not waiting for model load (#7284)" (#7334)
This reverts commit 583fd6b000.
2024-05-16 20:43:45 +02:00
Radoslav Gerganov 9afdffe70e rpc : get available mem for the CPU backend
This can be overridden with the -m command line option

ref: #7293
2024-05-16 12:04:08 +03:00
Radoslav Gerganov 3b3963c55c rpc : add command line arg for specifying backend memory
ref: #7293
2024-05-16 09:58:29 +03:00
Vaibhav Srivastav ad52d5c259
doc: add references to Hugging Face GGUF-my-repo quantisation web tool. (#7288)
* chore: add references to the quantisation space.

* fix grammar lol.

* Update README.md

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Update README.md

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-16 15:38:43 +10:00
slaren 344f9126cc
ggml : tag ggml_tensor::backend as deprecated (#7290) 2024-05-15 15:08:48 +02:00
dm4 ea3b0590ee
embedding : free the batch after execution (#7297) 2024-05-15 15:01:12 +03:00
Johannes Gäßler 583fd6b000
server bench: fix bench not waiting for model load (#7284) 2024-05-15 08:44:16 +02:00
Steve Grubb 4f0263633b
server: free sampling contexts on exit (#7264)
* server: free sampling contexts on exit

This cleans up the last leak found by the address sanitizer.

* fix whitespace

* fix whitespace
2024-05-14 16:11:24 +02:00
Brian 1265c670fd
Revert "move ndk code to a new library (#6951)" (#7282)
This reverts commit efc8f767c8.
2024-05-14 16:10:39 +03:00
Radoslav Gerganov 5e31828d3e
ggml : add RPC backend (#6829)
* ggml : add RPC backend

The RPC backend proxies all operations to a remote server which runs a
regular backend (CPU, CUDA, Metal, etc).

* set TCP_NODELAY

* add CI workflows

* Address review comments

* fix warning

* implement llama_max_devices() for RPC

* Address review comments

* Address review comments

* wrap sockfd into a struct

* implement get_alignment and get_max_size

* add get_device_memory

* fix warning

* win32 support

* add README

* readme : trim trailing whitespace

* Address review comments

* win32 fix

* Address review comments

* fix compile warnings on macos
2024-05-14 14:27:19 +03:00
Elton Kola efc8f767c8
move ndk code to a new library (#6951) 2024-05-14 17:30:30 +10:00
Ryuei 27f65d6267
docs: Fix typo and update description for --embeddings flag (#7026)
- Change '--embedding' to '--embeddings' in the README
- Update the description to match the latest --help output
- Add a caution about defining the physical batch size
2024-05-14 15:20:47 +10:00
k.h.lai 30e70334f7
llava-cli: fix base64 prompt (#7248) 2024-05-14 00:02:36 +10:00
Johannes Gäßler 1c570d8bee
perplexity: add BF16 vs. FP16 results (#7150) 2024-05-13 13:03:27 +02:00
Benjamin Findley e586ee4259
change default temperature of OAI compat API from 0 to 1 (#7226)
* change default temperature of OAI compat API from 0 to 1

* make tests explicitly send temperature to OAI API
2024-05-13 12:40:08 +10:00
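Clients that depended on the old deterministic default can pin it by sending temperature explicitly, as the updated tests do; a sketch, with the endpoint, port, and model name as placeholder assumptions:

    // Send temperature explicitly instead of relying on the server default,
    // which this change moves from 0 to 1 for the OAI-compatible API.
    const reqBody = {
        model: "local-model", // placeholder
        messages: [{ role: "user", content: "Hello" }],
        temperature: 0,       // pin the old behaviour
    };
    fetch("http://localhost:8080/v1/chat/completions", { // assumed host/port
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(reqBody),
    });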
Xuan Son Nguyen 72c177c1f6
fix system prompt handling (#7153) 2024-05-11 17:28:10 +02:00
Steve Grubb 988631335a
server : free llama_batch on exit (#7212)
* [server] Cleanup a memory leak on exit

There are a couple of memory leaks on server exit, and this one hides
the others. After cleaning this up, you can see leaks in the slots. But
that is another patch, to be sent after this one.

* make tab into spaces
2024-05-11 11:13:02 +03:00
Johannes Gäßler 5ae3426b0b
server: fix reported top tokens for temperature 0 (#7203) 2024-05-11 10:11:28 +02:00
Joan Fontanals b83cc3f5b3
llama : add Jina Embeddings architecture (#6826)
* feat: first things to do

* feat: create tensors for Jina architecture

* fix: use other tensors

* feat: embedding gets results

* fix: fix usage of ALIBI

* fix: clean prints

* fix: do some cleanup unused vars

* fix: revert changes to Makefile and CMakeLists

* fix: revert some changes

* fix: fix small detail

* fix: fix convert formatting

* fix: fix linting and editor

* feat: set proper vocab settings

* fix: JinaBertForMaskedLM registration

* feat: support q_normalization and k_normalization in Jina arch

* feat: handle gpt2 tokenizer with Jina architecture

* feat: example comments in embedding

* feat: rename Jina Bert to Jina Bert V2

* fix: add some changes as per review

* feat: proper KQ_pos for Jina embeddings

* feat: add capacity to load models ES and DE for Spanish

* llama : fix pre-tokenizers

* ggml : full ALiBi support

* ggml : update ggml_soft_max_ext() CUDA, SYCL

* ggml : ggml_flash_attn_ext() support ALiBi (CPU)

* ggml : ggml_flash_attn_ext() support ALiBi (Metal)

* ggml : fix warning

* ggml : ggml_flash_attn_ext() support ALiBi (CUDA)

ggml-ci

* minor : clean-up

* embedding : add warning about missing SEP

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-11 10:46:09 +03:00
slaren e849648888
llama-bench : add pp+tg test type (#7199) 2024-05-10 18:03:54 +02:00
Justine Tunney 4e3880978f
Fix memory bug in grammar parser (#7194)
The llama.cpp grammar parser had a bug where forgetting to add a closing
quotation mark to strings would cause parsing to crash. Anyone running a
server on a public endpoint is advised to upgrade. To reproduce this bug:

    ./llamafile -m foo.gguf -p bar --grammar 'root::="'

Credit for discovering and reporting this issue goes to Eclypsium
Security Researcher Richard Johnson <Richard.johnson@eclypsium.com>.
2024-05-10 21:01:08 +10:00
HanishKVC f89fe2732c
Main+: optionally allow special tokens from user in interactive mode (#7097)
@hanishkvc added a new `--interactive-specials` flag which allows inserting special tokens from the user side into the embedding stream.
2024-05-10 20:21:58 +10:00
Andrei d11afd6652
llava : fix moondream support (#7163)
* Revert "Revert "llava : add support for moondream vision language model (#6899)""

This reverts commit 9da243b36a.

* Fix num_positions and embeddings initialization
2024-05-10 09:41:10 +03:00
slaren eaf4bd8b39
eval-callback : fix conversion to float (#7184) 2024-05-10 01:04:12 +02:00
Ahmet Zeer 07cd41d096
TypoFix (#7162) 2024-05-09 10:16:45 +02:00