llama.cpp

Commit Graph

Author	SHA1	Message	Date
HanishKVC	72151aa634	SimpleChat:Alert user if they provide sysprompt late or change it	2024-05-18 03:16:30 +05:30
HanishKVC	884adfd739	SimpleChat: Ignore empty user input, without trimming	2024-05-18 03:07:40 +05:30
HanishKVC	ae52ad1675	SimpleChat:Allow system prompt to be set, if provided before user	2024-05-18 02:59:42 +05:30
HanishKVC	69817fe1de	SimpleChat:HTML: Cleanup/structure UI a bit, Add input for system	2024-05-18 01:40:57 +05:30
HanishKVC	668b98700c	SimpleChat: Add a simple readme file	2024-05-18 01:06:54 +05:30
HanishKVC	b3644172e0	SimpleChat:JS: Force completion mode be single message by default	2024-05-18 00:36:23 +05:30
HanishKVC	aef32d9cc0	SimpleChat:JS: Handle difference in response Try read the assistance response from appropriate field in the response got. Also examples/server seems to return the response in a slightly different field, so try account for that also.	2024-05-18 00:36:23 +05:30
HanishKVC	3e5edbacd6	SimpleChat: Dont submit if already submitted and waiting Also make chat the default selection wrt mode	2024-05-18 00:36:23 +05:30
HanishKVC	9feb58eaa5	SimpleChat: Allow user to select chat or completion mode	2024-05-18 00:36:23 +05:30
HanishKVC	e62087bf3f	SimpleChat:JS: Try trap enter key press wrt input text field So user can either press submit button or press enter key	2024-05-18 00:36:23 +05:30
HanishKVC	29d2d22c02	SimpleChat:sh: Add simple shell script to run python3 http.server So one needs to run the llm server locally then run this script and access it using a local browser	2024-05-18 00:36:23 +05:30
HanishKVC	ebe330d098	SimpleChat: Move into its own sub directory to avoid confusion	2024-05-18 00:36:23 +05:30
HanishKVC	9942851273	SimpleChat: Diff user/assistant msgs, Make input wider Also show a default message to user Also add some metas	2024-05-18 00:36:23 +05:30
HanishKVC	7d772f6b9a	SimpleChat: Try keep input element in view	2024-05-18 00:36:23 +05:30
HanishKVC	564469e4f6	SimpleChat:JS: Messages/Prompt, indicate working to end user	2024-05-18 00:36:23 +05:30
HanishKVC	c6653479fc	SimpleChat:JS: Extract model response and show to user	2024-05-18 00:36:23 +05:30
HanishKVC	33bc67baa6	SimpleChat: Try handshake with llm over its web service endpoint	2024-05-18 00:36:23 +05:30
HanishKVC	27268a6067	SimpleChat: Move handling of submit request into its own func	2024-05-18 00:36:23 +05:30
HanishKVC	ce4aaeb692	SimpleChat: Use common helper logic wrt json data	2024-05-18 00:36:23 +05:30
HanishKVC	639d647ebf	SimpleChat: Also add completions related prompt	2024-05-18 00:36:23 +05:30
HanishKVC	256e02c7c9	SimpleChat: Rather value wrt input text element	2024-05-18 00:36:23 +05:30
HanishKVC	24d348ab97	SimpleChat:HTML: Bring in the js file	2024-05-18 00:36:23 +05:30
HanishKVC	70e5860264	SimpleChatJS: Roles Class, submitClick Define Role class with static members corresponding to the roles. Update startme to * Get hold of the ui elements. * Attach a click handler to submit button, which adds the user input to xchats array and shows the chat messages till now in chat div element. Trap DOMContentLoaded to trigger startme	2024-05-18 00:36:23 +05:30
HanishKVC	1d3cc9353a	SimpleChat: request_json, globals, startme	2024-05-18 00:36:23 +05:30
HanishKVC	0402a4b60e	SimpleChat: A js skeleton with SimpleChat class Allows maintaining an array of chat message. Allows adding chat message (from any of the roles be it system, user, assistant, ...) Allows showing chat messages till now, in a given div element.	2024-05-18 00:36:23 +05:30
HanishKVC	69ecad21e7	SimpleChat: Add a skeletal html page Contains a div placeholder for showing chat messages till now a text-input for allowing user to enter next chat message/query to the model. a submit button to allow sending of the user entered message and chat till now to the model.	2024-05-18 00:36:22 +05:30
Radoslav Gerganov	f4bd8b3d26	rpc : set SO_REUSEADDR for the server socket (#7320) ref: #7293	2024-05-17 17:25:44 +03:00
Radoslav Gerganov	ee94172d33	server : add support for the RPC backend (#7305) ref: #7292	2024-05-17 10:00:17 +03:00
Leon Knauer	9c4fdcbec8	[Server] Added --verbose option to README [no ci] (#7335)	2024-05-17 10:11:03 +10:00
Pierrick Hymbert	24ecb58168	Revert "server bench: fix bench not waiting for model load (#7284)" (#7334) This reverts commit `583fd6b000`.	2024-05-16 20:43:45 +02:00
Radoslav Gerganov	9afdffe70e	rpc : get available mem for the CPU backend This can be overridden with the -m command line option ref: #7293	2024-05-16 12:04:08 +03:00
Radoslav Gerganov	3b3963c55c	rpc : add command line arg for specifying backend memory ref: #7293	2024-05-16 09:58:29 +03:00
Vaibhav Srivastav	ad52d5c259	doc: add references to hugging face GGUF-my-repo quantisation web tool. (#7288) * chore: add references to the quantisation space. * fix grammer lol. * Update README.md Co-authored-by: Julien Chaumond <julien@huggingface.co> * Update README.md Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Julien Chaumond <julien@huggingface.co> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-05-16 15:38:43 +10:00
slaren	344f9126cc	ggml : tag ggml_tensor::backend as deprecated (#7290)	2024-05-15 15:08:48 +02:00
dm4	ea3b0590ee	embedding : free the batch after execution (#7297)	2024-05-15 15:01:12 +03:00
Johannes Gäßler	583fd6b000	server bench: fix bench not waiting for model load (#7284)	2024-05-15 08:44:16 +02:00
Steve Grubb	4f0263633b	server: free sampling contexts on exit (#7264) * server: free sampling contexts on exit This cleans up last leak found by the address sanitizer. * fix whitespace * fix whitespace	2024-05-14 16:11:24 +02:00
Brian	1265c670fd	Revert "move ndk code to a new library (#6951)" (#7282) This reverts commit `efc8f767c8`.	2024-05-14 16:10:39 +03:00
Radoslav Gerganov	5e31828d3e	ggml : add RPC backend (#6829) * ggml : add RPC backend The RPC backend proxies all operations to a remote server which runs a regular backend (CPU, CUDA, Metal, etc). * set TCP_NODELAY * add CI workflows * Address review comments * fix warning * implement llama_max_devices() for RPC * Address review comments * Address review comments * wrap sockfd into a struct * implement get_alignment and get_max_size * add get_device_memory * fix warning * win32 support * add README * readme : trim trailing whitespace * Address review comments * win32 fix * Address review comments * fix compile warnings on macos	2024-05-14 14:27:19 +03:00
Elton Kola	efc8f767c8	move ndk code to a new library (#6951)	2024-05-14 17:30:30 +10:00
Ryuei	27f65d6267	docs: Fix typo and update description for --embeddings flag (#7026) - Change '--embedding' to '--embeddings' in the README - Update the description to match the latest --help output - Added a caution about defining physical batch size	2024-05-14 15:20:47 +10:00
k.h.lai	30e70334f7	llava-cli: fix base64 prompt (#7248)	2024-05-14 00:02:36 +10:00
Johannes Gäßler	1c570d8bee	perplexity: add BF16 vs. FP16 results (#7150)	2024-05-13 13:03:27 +02:00
Benjamin Findley	e586ee4259	change default temperature of OAI compat API from 0 to 1 (#7226) * change default temperature of OAI compat API from 0 to 1 * make tests explicitly send temperature to OAI API	2024-05-13 12:40:08 +10:00
Xuan Son Nguyen	72c177c1f6	fix system prompt handling (#7153)	2024-05-11 17:28:10 +02:00
Steve Grubb	988631335a	server : free llama_batch on exit (#7212) * [server] Cleanup a memory leak on exit There are a couple memory leaks on exit of the server. This hides others. After cleaning this up, you can see leaks on slots. But that is another patch to be sent after this. * make tab into spaces	2024-05-11 11:13:02 +03:00
Johannes Gäßler	5ae3426b0b	server: fix reported top tokens for temperature 0 (#7203)	2024-05-11 10:11:28 +02:00
Joan Fontanals	b83cc3f5b3	llama : add Jina Embeddings architecture (#6826) * feat: first things to do * feat: create tensors for Jina architecture * fix: use other tensors * feat: embedding gets results * fix: fix usage of ALIBI * fix: clean prints * fix: do some cleanup unused vars * fix: revert changes to Makefile and CMakeLists * fix: revert some changes * fix: fix small detail * fix: fix convert formatting * fix: fix linting and editor * feat: set proper vocab settings * fix: JinaBertForMaskedLM registration * feat: support q_normalization and k_normalization in Jina arch * feat: handle gpt2 tokenizer with Jina architecture * feat: example comments in embedding * feat: rename Jina Bert to Jina Bert V2 * fix: add some changes as per review * feat: proper KQ_pos for Jina embeddings * feat: add capacity to load models ES and DE for Spanish * llama : fix pre-tokenizers * ggml : full ALiBi support * ggml : update ggml_soft_max_ext() CUDA, SYCL * ggml : ggml_flash_attn_ext() support ALiBi (CPU) * ggml : ggml_flash_attn_ext() support ALiBi (Metal) * ggml : fix warning * ggml : ggml_flash_attn_ext() support ALiBi (CUDA) ggml-ci * minor : clean-up * embedding : add warning about missing SEP --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-05-11 10:46:09 +03:00
slaren	e849648888	llama-bench : add pp+tg test type (#7199)	2024-05-10 18:03:54 +02:00
Justine Tunney	4e3880978f	Fix memory bug in grammar parser (#7194) The llama.cpp grammar parser had a bug where forgetting to add a closing quotation mark to strings would cause parsing to crash. Anyone running a server on a public endpoint is advised to upgrade. To reproduce this bug ./llamafile -m foo.gguf -p bar --grammar 'root::="' Credit for discovering and reporting this issue goes to Eclypsium Security Researcher Richard Johnson <Richard.johnson@eclypsium.com>.	2024-05-10 21:01:08 +10:00

878 Commits