llama.cpp

Commit Graph

Author	SHA1	Message	Date
HanishKVC	c88088c7a1	SimpleChat:HtmlCss: Cleanup UI flow set margin wrt vmin rather than vw or vh so portrait/landscape ok. Use flex and flex-grow to put things on the same line as well as distribute available space as needed. Given two main elements/line so it remains simple. In each line have one element with grows and one sits with a basic comfortably fixed size.	2024-05-20 10:40:50 +05:30
HanishKVC	c191e475d5	SimpleChat:HTML: Add viewport meta for better mobile friendliness Without this the page content may look too small.	2024-05-19 15:51:07 +05:30
HanishKVC	5976126c26	SimpleChat:Readme: Note about handle_systemprompt begin/anytime	2024-05-19 03:42:44 +05:30
HanishKVC	7905f2fcbe	SimpleChat:JS: Allow for changing system prompt anytime for future	2024-05-19 03:20:30 +05:30
HanishKVC	676053fc7f	SimpleChat:HTML:Group user input+btn together; Note about multichat	2024-05-19 02:52:33 +05:30
HanishKVC	5a5f6ab848	SimpleChat: Update notes a bit. Try keep browser happy Avoid browser quirk mode with DOCTYPE. Help with accessibility a bit by specifying the language explicitly. Specify the char encoding explicitly, inturn utf-8 is a safe bet, even with intermixing of languages if reqd in future. Add a cache-control http-equiv meta tag, which in all probability will be ignored. Defer js loading and execution, just for fun and future, not that critical here as it stands now.	2024-05-19 01:59:25 +05:30
HanishKVC	6eb1e0fbde	SimpleChat:JS: bottom of element visible, Set focus to user input As the generated text could be multiple lines and occupy more space that the full scrollable div's vertical space, make the bottom of the last element (which can be such a generated text) in the div visible by scrolling. Ensure that the user input box has focus	2024-05-18 22:59:21 +05:30
HanishKVC	a944ce7cbe	SimpleChat:JS: Try ensure the last entry in chat is visible Needed because now only the chat div is scrollable and not the full page. In last commit the chat div size was fixed to 75% vertical height, so the full page no longer scrolls, so the old bring user-input element to view wont work, instead now the last element in the chat div should be brought into view.	2024-05-18 22:23:34 +05:30
HanishKVC	a1a2f36a45	SimpleChat:CSS: Allow for chat div to be scrollable	2024-05-18 22:11:59 +05:30
HanishKVC	ebd5e71295	SimpleChat:CSS: Move style info into its own css file To keep it simple, clean and seperate so that things are not unnecessarily cluttered.	2024-05-18 17:09:47 +05:30
HanishKVC	65a56e6fdb	SimpleChat: Update the readme file	2024-05-18 03:37:15 +05:30
HanishKVC	0d0a28b4ab	SimpleChat:HTML: Add a style for system role message	2024-05-18 03:31:37 +05:30
HanishKVC	601fedf8c1	SimpleChat: Move handling systemprompt into its own func	2024-05-18 03:19:59 +05:30
HanishKVC	72151aa634	SimpleChat:Alert user if they provide sysprompt late or change it	2024-05-18 03:16:30 +05:30
HanishKVC	884adfd739	SimpleChat: Ignore empty user input, without trimming	2024-05-18 03:07:40 +05:30
HanishKVC	ae52ad1675	SimpleChat:Allow system prompt to be set, if provided before user	2024-05-18 02:59:42 +05:30
HanishKVC	69817fe1de	SimpleChat:HTML: Cleanup/structure UI a bit, Add input for system	2024-05-18 01:40:57 +05:30
HanishKVC	668b98700c	SimpleChat: Add a simple readme file	2024-05-18 01:06:54 +05:30
HanishKVC	b3644172e0	SimpleChat:JS: Force completion mode be single message by default	2024-05-18 00:36:23 +05:30
HanishKVC	aef32d9cc0	SimpleChat:JS: Handle difference in response Try read the assistance response from appropriate field in the response got. Also examples/server seems to return the response in a slightly different field, so try account for that also.	2024-05-18 00:36:23 +05:30
HanishKVC	3e5edbacd6	SimpleChat: Dont submit if already submitted and waiting Also make chat the default selection wrt mode	2024-05-18 00:36:23 +05:30
HanishKVC	9feb58eaa5	SimpleChat: Allow user to select chat or completion mode	2024-05-18 00:36:23 +05:30
HanishKVC	e62087bf3f	SimpleChat:JS: Try trap enter key press wrt input text field So user can either press submit button or press enter key	2024-05-18 00:36:23 +05:30
HanishKVC	29d2d22c02	SimpleChat:sh: Add simple shell script to run python3 http.server So one needs to run the llm server locally then run this script and access it using a local browser	2024-05-18 00:36:23 +05:30
HanishKVC	ebe330d098	SimpleChat: Move into its own sub directory to avoid confusion	2024-05-18 00:36:23 +05:30
HanishKVC	9942851273	SimpleChat: Diff user/assistant msgs, Make input wider Also show a default message to user Also add some metas	2024-05-18 00:36:23 +05:30
HanishKVC	7d772f6b9a	SimpleChat: Try keep input element in view	2024-05-18 00:36:23 +05:30
HanishKVC	564469e4f6	SimpleChat:JS: Messages/Prompt, indicate working to end user	2024-05-18 00:36:23 +05:30
HanishKVC	c6653479fc	SimpleChat:JS: Extract model response and show to user	2024-05-18 00:36:23 +05:30
HanishKVC	33bc67baa6	SimpleChat: Try handshake with llm over its web service endpoint	2024-05-18 00:36:23 +05:30
HanishKVC	27268a6067	SimpleChat: Move handling of submit request into its own func	2024-05-18 00:36:23 +05:30
HanishKVC	ce4aaeb692	SimpleChat: Use common helper logic wrt json data	2024-05-18 00:36:23 +05:30
HanishKVC	639d647ebf	SimpleChat: Also add completions related prompt	2024-05-18 00:36:23 +05:30
HanishKVC	256e02c7c9	SimpleChat: Rather value wrt input text element	2024-05-18 00:36:23 +05:30
HanishKVC	24d348ab97	SimpleChat:HTML: Bring in the js file	2024-05-18 00:36:23 +05:30
HanishKVC	70e5860264	SimpleChatJS: Roles Class, submitClick Define Role class with static members corresponding to the roles. Update startme to * Get hold of the ui elements. * Attach a click handler to submit button, which adds the user input to xchats array and shows the chat messages till now in chat div element. Trap DOMContentLoaded to trigger startme	2024-05-18 00:36:23 +05:30
HanishKVC	1d3cc9353a	SimpleChat: request_json, globals, startme	2024-05-18 00:36:23 +05:30
HanishKVC	0402a4b60e	SimpleChat: A js skeleton with SimpleChat class Allows maintaining an array of chat message. Allows adding chat message (from any of the roles be it system, user, assistant, ...) Allows showing chat messages till now, in a given div element.	2024-05-18 00:36:23 +05:30
HanishKVC	69ecad21e7	SimpleChat: Add a skeletal html page Contains a div placeholder for showing chat messages till now a text-input for allowing user to enter next chat message/query to the model. a submit button to allow sending of the user entered message and chat till now to the model.	2024-05-18 00:36:22 +05:30
Radoslav Gerganov	ee94172d33	server : add support for the RPC backend (#7305) ref: #7292	2024-05-17 10:00:17 +03:00
Leon Knauer	9c4fdcbec8	[Server] Added --verbose option to README [no ci] (#7335)	2024-05-17 10:11:03 +10:00
Pierrick Hymbert	24ecb58168	Revert "server bench: fix bench not waiting for model load (#7284)" (#7334) This reverts commit `583fd6b000`.	2024-05-16 20:43:45 +02:00
Johannes Gäßler	583fd6b000	server bench: fix bench not waiting for model load (#7284)	2024-05-15 08:44:16 +02:00
Steve Grubb	4f0263633b	server: free sampling contexts on exit (#7264) * server: free sampling contexts on exit This cleans up last leak found by the address sanitizer. * fix whitespace * fix whitespace	2024-05-14 16:11:24 +02:00
Ryuei	27f65d6267	docs: Fix typo and update description for --embeddings flag (#7026) - Change '--embedding' to '--embeddings' in the README - Update the description to match the latest --help output - Added a caution about defining physical batch size	2024-05-14 15:20:47 +10:00
Benjamin Findley	e586ee4259	change default temperature of OAI compat API from 0 to 1 (#7226) * change default temperature of OAI compat API from 0 to 1 * make tests explicitly send temperature to OAI API	2024-05-13 12:40:08 +10:00
Xuan Son Nguyen	72c177c1f6	fix system prompt handling (#7153)	2024-05-11 17:28:10 +02:00
Steve Grubb	988631335a	server : free llama_batch on exit (#7212) * [server] Cleanup a memory leak on exit There are a couple memory leaks on exit of the server. This hides others. After cleaning this up, you can see leaks on slots. But that is another patch to be sent after this. * make tab into spaces	2024-05-11 11:13:02 +03:00
Johannes Gäßler	5ae3426b0b	server: fix reported top tokens for temperature 0 (#7203)	2024-05-11 10:11:28 +02:00
compilade	f98eb31c51	convert-hf : save memory with lazy evaluation (#7075) * convert-hf : begin refactoring write_tensor * convert : upgrade to sentencepiece v0.2.0 * convert-hf : remove unused n_dims in extra__tensors convert-hf : simplify MoE weights stacking * convert-hf : flake8 linter doesn't like semicolons * convert-hf : allow unusual model part names For example, loading `model-00001-of-00001.safetensors` now works. * convert-hf : fix stacking MoE expert tensors `torch.stack` and `torch.cat` don't do the same thing. * convert-hf : fix Mamba conversion Tested to work even with a SentencePiece-based tokenizer. * convert : use a string for the SentencePiece tokenizer path * convert-hf : display tensor shape * convert-hf : convert norms to f32 by default * convert-hf : sort model part names `os.listdir` is said to list files in arbitrary order. Sorting the file names should let "model-00009-of-00042.safetensors" be loaded before "model-00010-of-00042.safetensors". * convert-hf : use an ABC for Model again It seems Protocol can't be used as a statically type-checked ABC, because its subclasses also can't be instantiated. (why did it seem to work?) At least there's still a way to throw an error when forgetting to define the `model_arch` property of any registered Model subclasses. * convert-hf : use a plain class for Model, and forbid direct instantiation There are no abstract methods used anyway, so using ABC isn't really necessary. * convert-hf : more consistent formatting of cmdline args * convert-hf : align the message logged for converted tensors * convert-hf : fix Refact conversion * convert-hf : save memory with lazy evaluation * convert-hf : flake8 doesn't like lowercase L as a variable name * convert-hf : remove einops requirement for InternLM2 * convert-hf : faster model parts loading Instead of pre-loading them all into a dict, iterate on the tensors in the model parts progressively as needed in Model.write_tensors Conversion for some architectures relies on checking for the presence of specific tensor names, so for multi-part models, the weight map is read from the relevant json file to quickly get these names up-front. * convert-hf : minor changes for consistency * gguf-py : add tqdm as a dependency It's small, and used for a progress bar in GGUFWriter.write_tensors_to_file	2024-05-08 18:16:38 -04:00
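The "sort model part names" change in commit f98eb31c51 relies on the part indices being zero-padded, so that a plain lexicographic sort of the file names also puts them in numeric order. The following is a minimal Python sketch of that idea only; the helper `sorted_part_names` is hypothetical and is not the actual convert-hf code.

```python
import os
import tempfile


def sorted_part_names(directory: str, suffix: str = ".safetensors") -> list[str]:
    """Return model part file names in a deterministic order (hypothetical helper).

    os.listdir() makes no ordering guarantee, so the names are sorted
    explicitly. Because the part indices are zero-padded (00009, 00010, ...),
    lexicographic order matches numeric order.
    """
    return sorted(name for name in os.listdir(directory) if name.endswith(suffix))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Create empty files in a deliberately shuffled order, plus a non-part file.
        for name in ("model-00010-of-00042.safetensors",
                     "model-00009-of-00042.safetensors",
                     "config.json"):
            open(os.path.join(tmp, name), "w").close()
        print(sorted_part_names(tmp))
```

Without the explicit `sorted`, the order returned by `os.listdir` depends on the filesystem, so part 00010 could be visited before part 00009.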

344 Commits