Commit Graph

339 Commits

Author SHA1 Message Date
HanishKVC 5a5f6ab848 SimpleChat: Update notes a bit. Try keep browser happy
Avoid browser quirks mode with DOCTYPE.

Help with accessibility a bit by specifying the language explicitly.

Specify the char encoding explicitly; utf-8 is a safe bet,
even if languages need to be intermixed in future.

Add a cache-control http-equiv meta tag, which in all probability
will be ignored.

Defer js loading and execution, just for fun and future, not that
critical here as it stands now.
2024-05-19 01:59:25 +05:30
HanishKVC 6eb1e0fbde SimpleChat:JS: bottom of element visible, Set focus to user input
As the generated text could be multiple lines and occupy more space
than the full scrollable div's vertical space, make the bottom of
the last element (which can be such a generated text) in the div
visible by scrolling.

Ensure that the user input box has focus
2024-05-18 22:59:21 +05:30
HanishKVC a944ce7cbe SimpleChat:JS: Try ensure the last entry in chat is visible
Needed because now only the chat div is scrollable and not the full
page.

In the last commit the chat div size was fixed to 75% vertical height,
so the full page no longer scrolls and the old approach of bringing
the user-input element into view won't work. Instead, the last
element in the chat div should now be brought into view.
2024-05-18 22:23:34 +05:30
HanishKVC a1a2f36a45 SimpleChat:CSS: Allow for chat div to be scrollable 2024-05-18 22:11:59 +05:30
HanishKVC ebd5e71295 SimpleChat:CSS: Move style info into its own css file
To keep it simple, clean and separate so that things are not
unnecessarily cluttered.
2024-05-18 17:09:47 +05:30
HanishKVC 65a56e6fdb SimpleChat: Update the readme file 2024-05-18 03:37:15 +05:30
HanishKVC 0d0a28b4ab SimpleChat:HTML: Add a style for system role message 2024-05-18 03:31:37 +05:30
HanishKVC 601fedf8c1 SimpleChat: Move handling systemprompt into its own func 2024-05-18 03:19:59 +05:30
HanishKVC 72151aa634 SimpleChat:Alert user if they provide sysprompt late or change it 2024-05-18 03:16:30 +05:30
HanishKVC 884adfd739 SimpleChat: Ignore empty user input, without trimming 2024-05-18 03:07:40 +05:30
HanishKVC ae52ad1675 SimpleChat:Allow system prompt to be set, if provided before user 2024-05-18 02:59:42 +05:30
HanishKVC 69817fe1de SimpleChat:HTML: Cleanup/structure UI a bit, Add input for system 2024-05-18 01:40:57 +05:30
HanishKVC 668b98700c SimpleChat: Add a simple readme file 2024-05-18 01:06:54 +05:30
HanishKVC b3644172e0 SimpleChat:JS: Force completion mode be single message by default 2024-05-18 00:36:23 +05:30
HanishKVC aef32d9cc0 SimpleChat:JS: Handle difference in response
Try to read the assistant response from the appropriate field in the
response received.

Also examples/server seems to return the response in a slightly
different field, so try to account for that as well.
2024-05-18 00:36:23 +05:30
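The response handling described in commit aef32d9cc0 might be sketched as follows. SimpleChat itself is JavaScript; this is a Python illustration only, not the commit's code. The function name `extract_assistant_text` is hypothetical, and the field names (`choices[0].message.content` for chat mode, `choices[0].text` for completion mode, a top-level `content` for examples/server) are assumptions about the response shapes the commit message alludes to.

```python
def extract_assistant_text(response):
    """Return the generated text from a decoded JSON response, or None.

    Assumed shapes (not confirmed by the commit log):
      chat mode:        {"choices": [{"message": {"content": "..."}}]}
      completion mode:  {"choices": [{"text": "..."}]}
      examples/server:  {"content": "..."}
    """
    choices = response.get("choices")
    if choices:
        first = choices[0]
        message = first.get("message")
        # Chat-style responses nest the text under message.content.
        if isinstance(message, dict) and "content" in message:
            return message["content"]
        # Completion-style responses carry the text directly.
        if "text" in first:
            return first["text"]
    # Fallback: the text sits in a top-level field.
    if "content" in response:
        return response["content"]
    return None
```

Checking the most specific shape first and falling back to the flat field keeps the caller free of per-mode branches.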
HanishKVC 3e5edbacd6 SimpleChat: Dont submit if already submitted and waiting
Also make chat the default selection wrt mode
2024-05-18 00:36:23 +05:30
HanishKVC 9feb58eaa5 SimpleChat: Allow user to select chat or completion mode 2024-05-18 00:36:23 +05:30
HanishKVC e62087bf3f SimpleChat:JS: Try trap enter key press wrt input text field
So user can either press submit button or press enter key
2024-05-18 00:36:23 +05:30
HanishKVC 29d2d22c02 SimpleChat:sh: Add simple shell script to run python3 http.server
So one needs to run the llm server locally
then run this script and access it using a local browser
2024-05-18 00:36:23 +05:30
HanishKVC ebe330d098 SimpleChat: Move into its own sub directory to avoid confusion 2024-05-18 00:36:23 +05:30
HanishKVC 9942851273 SimpleChat: Diff user/assistant msgs, Make input wider
Also show a default message to user

Also add some metas
2024-05-18 00:36:23 +05:30
HanishKVC 7d772f6b9a SimpleChat: Try keep input element in view 2024-05-18 00:36:23 +05:30
HanishKVC 564469e4f6 SimpleChat:JS: Messages/Prompt, indicate working to end user 2024-05-18 00:36:23 +05:30
HanishKVC c6653479fc SimpleChat:JS: Extract model response and show to user 2024-05-18 00:36:23 +05:30
HanishKVC 33bc67baa6 SimpleChat: Try handshake with llm over its web service endpoint 2024-05-18 00:36:23 +05:30
HanishKVC 27268a6067 SimpleChat: Move handling of submit request into its own func 2024-05-18 00:36:23 +05:30
HanishKVC ce4aaeb692 SimpleChat: Use common helper logic wrt json data 2024-05-18 00:36:23 +05:30
HanishKVC 639d647ebf SimpleChat: Also add completions related prompt 2024-05-18 00:36:23 +05:30
HanishKVC 256e02c7c9 SimpleChat: Rather value wrt input text element 2024-05-18 00:36:23 +05:30
HanishKVC 24d348ab97 SimpleChat:HTML: Bring in the js file 2024-05-18 00:36:23 +05:30
HanishKVC 70e5860264 SimpleChatJS: Roles Class, submitClick
Define Role class with static members corresponding to the roles.

Update startme to

* Get hold of the ui elements.

* Attach a click handler to submit button, which adds the user input
  to xchats array and shows the chat messages till now in chat div
  element.

Trap DOMContentLoaded to trigger startme
2024-05-18 00:36:23 +05:30
HanishKVC 1d3cc9353a SimpleChat: request_json, globals, startme 2024-05-18 00:36:23 +05:30
HanishKVC 0402a4b60e SimpleChat: A js skeleton with SimpleChat class
Allows maintaining an array of chat messages.

Allows adding a chat message (from any of the roles, be it system,
user, assistant, ...)

Allows showing chat messages till now, in a given div element.
2024-05-18 00:36:23 +05:30
HanishKVC 69ecad21e7 SimpleChat: Add a skeletal html page
Contains a div placeholder for showing chat messages till now

a text-input for allowing user to enter next chat message/query
to the model.

a submit button to allow sending of the user entered message and
chat till now to the model.
2024-05-18 00:36:22 +05:30
Radoslav Gerganov ee94172d33
server : add support for the RPC backend (#7305)
ref: #7292
2024-05-17 10:00:17 +03:00
Leon Knauer 9c4fdcbec8
[Server] Added --verbose option to README [no ci] (#7335) 2024-05-17 10:11:03 +10:00
Pierrick Hymbert 24ecb58168
Revert "server bench: fix bench not waiting for model load (#7284)" (#7334)
This reverts commit 583fd6b000.
2024-05-16 20:43:45 +02:00
Johannes Gäßler 583fd6b000
server bench: fix bench not waiting for model load (#7284) 2024-05-15 08:44:16 +02:00
Steve Grubb 4f0263633b
server: free sampling contexts on exit (#7264)
* server: free sampling contexts on exit

This cleans up last leak found by the address sanitizer.

* fix whitespace

* fix whitespace
2024-05-14 16:11:24 +02:00
Ryuei 27f65d6267
docs: Fix typo and update description for --embeddings flag (#7026)
- Change '--embedding' to '--embeddings' in the README
- Update the description to match the latest --help output
- Added a caution about defining physical batch size
2024-05-14 15:20:47 +10:00
Benjamin Findley e586ee4259
change default temperature of OAI compat API from 0 to 1 (#7226)
* change default temperature of OAI compat API from 0 to 1

* make tests explicitly send temperature to OAI API
2024-05-13 12:40:08 +10:00
Xuan Son Nguyen 72c177c1f6
fix system prompt handling (#7153) 2024-05-11 17:28:10 +02:00
Steve Grubb 988631335a
server : free llama_batch on exit (#7212)
* [server] Cleanup a memory leak on exit

There are a couple memory leaks on exit of the server. This hides others.
After cleaning this up, you can see leaks on slots. But that is another
patch to be sent after this.

* make tab into spaces
2024-05-11 11:13:02 +03:00
Johannes Gäßler 5ae3426b0b
server: fix reported top tokens for temperature 0 (#7203) 2024-05-11 10:11:28 +02:00
compilade f98eb31c51
convert-hf : save memory with lazy evaluation (#7075)
* convert-hf : begin refactoring write_tensor

* convert : upgrade to sentencepiece v0.2.0

* convert-hf : remove unused n_dims in extra_*_tensors

* convert-hf : simplify MoE weights stacking

* convert-hf : flake8 linter doesn't like semicolons

* convert-hf : allow unusual model part names

For example, loading `model-00001-of-00001.safetensors` now works.

* convert-hf : fix stacking MoE expert tensors

`torch.stack` and `torch.cat` don't do the same thing.

* convert-hf : fix Mamba conversion

Tested to work even with a SentencePiece-based tokenizer.

* convert : use a string for the SentencePiece tokenizer path

* convert-hf : display tensor shape

* convert-hf : convert norms to f32 by default

* convert-hf : sort model part names

`os.listdir` is said to list files in arbitrary order.
Sorting the file names should let "model-00009-of-00042.safetensors"
be loaded before "model-00010-of-00042.safetensors".

* convert-hf : use an ABC for Model again

It seems Protocol can't be used as a statically type-checked ABC,
because its subclasses also can't be instantiated. (why did it seem to work?)

At least there's still a way to throw an error when forgetting to define
the `model_arch` property of any registered Model subclasses.

* convert-hf : use a plain class for Model, and forbid direct instantiation

There are no abstract methods used anyway,
so using ABC isn't really necessary.

* convert-hf : more consistent formatting of cmdline args

* convert-hf : align the message logged for converted tensors

* convert-hf : fix Refact conversion

* convert-hf : save memory with lazy evaluation

* convert-hf : flake8 doesn't like lowercase L as a variable name

* convert-hf : remove einops requirement for InternLM2

* convert-hf : faster model parts loading

Instead of pre-loading them all into a dict, iterate on the tensors
in the model parts progressively as needed in Model.write_tensors

Conversion for some architectures relies on checking for the presence
of specific tensor names, so for multi-part models, the weight map is read
from the relevant json file to quickly get these names up-front.

* convert-hf : minor changes for consistency

* gguf-py : add tqdm as a dependency

It's small, and used for a progress bar
in GGUFWriter.write_tensors_to_file
2024-05-08 18:16:38 -04:00
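The "sort model part names" item in commit f98eb31c51 relies on zero-padded part numbers sorting correctly as plain strings. A minimal sketch of that idea, assuming a hypothetical helper `sorted_model_parts` and a simple prefix/suffix filter (neither is taken from the actual conversion script):

```python
def sorted_model_parts(filenames, prefix="model", suffix=".safetensors"):
    """Return the model part files among `filenames` in ascending name order.

    `filenames` stands in for the result of os.listdir(), whose order is
    arbitrary. The prefix/suffix filter is an assumption for illustration.
    """
    parts = [
        name for name in filenames
        if name.startswith(prefix) and name.endswith(suffix)
    ]
    # Zero-padded part numbers make lexicographic order match numeric order.
    return sorted(parts)


listing = [
    "model-00010-of-00042.safetensors",
    "config.json",
    "model-00009-of-00042.safetensors",
]
ordered = sorted_model_parts(listing)
```

Here `ordered` places "model-00009-of-00042.safetensors" ahead of "model-00010-of-00042.safetensors", matching the order the commit message asks for.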
Johannes Gäßler c12452c7ae
JSON: [key] -> .at(key), assert() -> GGML_ASSERT (#7143) 2024-05-08 21:53:08 +02:00
JohnnyB bd1871fa2b
server : add themes + favicon (#6848)
* Added themes support with two sample themes and a favicon.

* Newline

* Newline

* Newline

* Trailing whitespace

* Increased opacity for contrast

* Increase opacity.

Check actions cancelled for some other priority job and I can't seem to manually re-run them, so MOAR OPACITY

* Opacity action trigger.

Trying to re-trigger the cancelled action.

* One more opacity adjustment

This Actions pipeline is failing for random issues.

* Delete examples/server/themes/buttons_top/completion.js

This will be served from the static string built-in to server.

* Delete examples/server/themes/buttons_top/index.js

This will be served from the static string built-in to server.

* Delete examples/server/themes/wild/completion.js

This will be served from the static string built-in to server.

* Delete examples/server/themes/buttons_top/json-schema-to-grammar.mjs

This will be served from the static string built-in to server.

* Delete examples/server/themes/wild/index.js

This will be served from the static string built-in to server.

* Delete examples/server/themes/wild/json-schema-to-grammar.mjs

This will be served from the static string built-in to server.

* Replaced underscore.
2024-05-08 22:12:06 +03:00
Johan 911b3900dd
server : add_special option for tokenize endpoint (#7059) 2024-05-08 15:27:58 +03:00
Xuan Son Nguyen 1fd9c1741d
clean up json_value & server_log (#7142) 2024-05-08 13:24:14 +02:00
Johannes Gäßler af0a5b6163
server: fix incorrectly reported token probabilities (#7125)
* server: normalize token probabilities

* fix temperature == 0.0f
2024-05-07 23:07:58 +02:00
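One plausible reading of the two items in commit af0a5b6163 (normalize probabilities, and treat temperature == 0.0 specially) is sketched below. This is not the server's C++ code; the function name `token_probabilities` and the choice to assign probability 1 to the top candidate at temperature 0 are assumptions made for illustration.

```python
import math


def token_probabilities(logits, temperature):
    """Turn candidate logits into probabilities that sum to 1.

    Assumption: at temperature 0 sampling is greedy, so the highest-logit
    candidate gets probability 1 and every other candidate gets 0.
    Expects a non-empty list of logits.
    """
    if temperature <= 0.0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]
```

Dividing by zero is avoided by handling temperature 0 before scaling, which is the kind of edge case the "fix temperature == 0.0f" item suggests.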