From 8c8ddb1e5991d5be02fd57be9792855ce5c2677d Mon Sep 17 00:00:00 2001
From: hanishkvc
Date: Sun, 26 Oct 2025 12:01:56 +0530
Subject: [PATCH] SimpleChatTC: Update and clean up the readme a bit

include info about the auto option within tools.

use non-wrapped text wrt certain sections, so that the markdown readme can be viewed properly
wrt the structure of the content in it.
---
 tools/server/public_simplechat/readme.md | 232 ++++++++++++++---------
 1 file changed, 138 insertions(+), 94 deletions(-)

diff --git a/tools/server/public_simplechat/readme.md b/tools/server/public_simplechat/readme.md
index 83781a31ea..6576e26914 100644
--- a/tools/server/public_simplechat/readme.md
+++ b/tools/server/public_simplechat/readme.md
@@ -21,7 +21,7 @@ own system prompts.
This allows seeing the generated text / ai-model response in oneshot at the end, after it is fully
generated, or potentially as it is being generated, in a streamed manner from the server/ai-model.

-![Chat and Settings screens](./simplechat_screens.webp "Chat and Settings screens")
+![Chat and Settings (old) screens](./simplechat_screens.webp "Chat and Settings (old) screens")

Auto saves the chat session locally as and when the chat is progressing and inturn at a later time
when you open SimpleChat, option is provided to restore the old chat session, if a matching one
exists.
@@ -80,7 +80,7 @@ remember to

* use a GenAi/LLM model which supports tool calling.
* if fetch web url / page tool call is needed remember to run the bundled local.tools/simpleproxy.py
-  helper along with its config file
+  helper along with its config file, before using/loading this client ui through a browser
  * cd tools/server/public_simplechat/local.tools; python3 ./simpleproxy.py --config simpleproxy.json

@@ -154,6 +154,9 @@ Once inside
  User can even modify the response generated by the tool, if required, before submitting.

* just refresh the page, to reset wrt the chat history and or system prompt and start afresh.
+  This also helps if you had forgotten to start the bundled simpleproxy.py server beforehand.
+  Start the simpleproxy.py server and refresh the client ui page, to get access to the web access
+  related tool calls.

* Using NewChat one can start independent chat sessions.
  * two independent chat sessions are setup by default.
@@ -181,91 +184,71 @@ Me/gMe consolidates the settings which control the behaviour into one object.
One can see the current settings, as well as change/update them using browsers devel-tool/console.
It is attached to the document object. Some of these can also be updated using the Settings UI.

-  baseURL - the domain-name/ip-address and inturn the port to send the request.
+* baseURL - the domain-name/ip-address and in turn the port to send the request.

-  chatProps - maintain a set of properties which manipulate chatting with ai engine
+* chatProps - maintains a set of properties which control chatting with the ai engine

-    apiEP - select between /completions and /chat/completions endpoint provided by the server/ai-model.
+  * apiEP - select between /completions and /chat/completions endpoint provided by the server/ai-model.

-    stream - control between oneshot-at-end and live-stream-as-its-generated collating and showing
-    of the generated response.
+  * stream - control between oneshot-at-end and live-stream-as-its-generated collating and showing of the generated response.

      the logic assumes that the text sent from the server follows utf-8 encoding.
-      in streaming mode - if there is any exception, the logic traps the same and tries to ensure
-      that text generated till then is not lost.
+      in streaming mode - if there is any exception, the logic traps the same and tries to ensure that text generated till then is not lost.

-      if a very long text is being generated, which leads to no user interaction for sometime and
-      inturn the machine goes into power saving mode or so, the platform may stop network connection,
-      leading to exception.
+      * if a very long text is being generated, which leads to no user interaction for some time and in turn the machine goes into power saving mode or so, the platform may stop the network connection, leading to an exception.

-    iRecentUserMsgCnt - a simple minded SlidingWindow to limit context window load at Ai Model end.
-    This is set to 10 by default. So in addition to latest system message, last/latest iRecentUserMsgCnt
-    user messages after the latest system prompt and its responses from the ai model will be sent
-    to the ai-model, when querying for a new response. Note that if enabled, only user messages after
-    the latest system message/prompt will be considered.
+  * iRecentUserMsgCnt - a simple minded SlidingWindow to limit the context window load at the ai model end. This is set to 10 by default. So in addition to the latest system message, the last/latest iRecentUserMsgCnt user messages after the latest system prompt and their responses from the ai model will be sent to the ai-model, when querying for a new response. Note that if enabled, only user messages after the latest system message/prompt will be considered.

      This specified sliding window user message count also includes the latest user query.

-      <0 : Send entire chat history to server
-      0 : Send only the system message if any to the server
-      >0 : Send the latest chat history from the latest system prompt, limited to specified cnt.

-    bCompletionFreshChatAlways - whether Completion mode collates complete/sliding-window history when
-    communicating with the server or only sends the latest user query/message.
+      * less than 0 : Send entire chat history to server

-    bCompletionInsertStandardRolePrefix - whether Completion mode inserts role related prefix wrt the
-    messages that get inserted into prompt field wrt /Completion endpoint.
+      * 0 : Send only the system message if any to the server

-    bTrimGarbage - whether garbage repeatation at the end of the generated ai response, should be
-    trimmed or left as is. If enabled, it will be trimmed so that it wont be sent back as part of
-    subsequent chat history. At the same time the actual trimmed text is shown to the user, once
-    when it was generated, so user can check if any useful info/data was there in the response.
+      * greater than 0 : Send the latest chat history from the latest system prompt, limited to specified cnt.

-    One may be able to request the ai-model to continue (wrt the last response) (if chat-history
-    is enabled as part of the chat-history-in-context setting), and chances are the ai-model will
-    continue starting from the trimmed part, thus allows long response to be recovered/continued
-    indirectly, in many cases.
+  * bCompletionFreshChatAlways - whether Completion mode collates complete/sliding-window history when communicating with the server or only sends the latest user query/message.

-    The histogram/freq based trimming logic is currently tuned for english language wrt its
-    is-it-a-alpabetic|numeral-char regex match logic.
+  * bCompletionInsertStandardRolePrefix - whether Completion mode inserts role related prefix wrt the messages that get inserted into the prompt field wrt the /Completion endpoint.

-    tools - contains controls related to tool calling
+  * bTrimGarbage - whether garbage repetition at the end of the generated ai response should be trimmed or left as is. If enabled, it will be trimmed so that it won't be sent back as part of subsequent chat history. At the same time the actual trimmed text is shown to the user, once when it was generated, so the user can check if any useful info/data was there in the response.

-      enabled - control whether tool calling is enabled or not
+    One may be able to request the ai-model to continue (wrt the last response) (if chat-history is enabled as part of the chat-history-in-context setting), and chances are the ai-model will continue starting from the trimmed part, thus allowing a long response to be recovered/continued indirectly, in many cases.
+
+    The histogram/freq based trimming logic is currently tuned for the English language wrt its is-it-an-alphabetic|numeral-char regex match logic.
+
+* tools - contains controls related to tool calling
+
+  * enabled - control whether tool calling is enabled or not

      remember to enable this only for GenAi/LLM models which support tool/function calling.

-      fetchProxyUrl - specify the address for the running instance of bundled local.tools/simpleproxy.py
+  * fetchProxyUrl - specify the address for the running instance of the bundled local.tools/simpleproxy.py
+
+  * auto - the amount of time in seconds to wait before the tool call request is auto triggered and the generated response is auto submitted back.
+
+    Setting this value to 0 (the default) disables the auto logic, so that the end user can review the tool calls requested by the ai and if needed even modify them before triggering/executing them, as well as review and modify the results generated by the tool call before submitting them back to the ai.

    the builtin tools' meta data is sent to the ai model in the requests sent to it.
-    inturn if the ai model requests a tool call to be made, the same will be done and the response
-    sent back to the ai model, under user control.
+    in turn if the ai model requests a tool call to be made, the same will be done and the response sent back to the ai model, under user control, by default.

-    as tool calling will involve a bit of back and forth between ai assistant and end user, it is
-    recommended to set iRecentUserMsgCnt to 10 or more, so that enough context is retained during
-    chatting with ai models with tool support.
+    as tool calling will involve a bit of back and forth between the ai assistant and the end user, it is recommended to set iRecentUserMsgCnt to 10 or more, so that enough context is retained during chatting with ai models with tool support.

-  apiRequestOptions - maintains the list of options/fields to send along with api request,
-  irrespective of whether /chat/completions or /completions endpoint.
+* apiRequestOptions - maintains the list of options/fields to send along with the api request, irrespective of whether the /chat/completions or /completions endpoint is used.

-  If you want to add additional options/fields to send to the server/ai-model, and or
-  modify the existing options value or remove them, for now you can update this global var
-  using browser's development-tools/console.
+    If you want to add additional options/fields to send to the server/ai-model, and/or modify the existing options' values or remove them, for now you can update this global var using the browser's development-tools/console.

-  For string, numeric and boolean fields in apiRequestOptions, including even those added by a
-  user at runtime by directly modifying gMe.apiRequestOptions, setting ui entries will be auto
-  created.
+    For string, numeric and boolean fields in apiRequestOptions, including even those added by a user at runtime by directly modifying gMe.apiRequestOptions, settings ui entries will be auto created.

-  cache_prompt option supported by example/server is allowed to be controlled by user, so that
-  any caching supported wrt system-prompt and chat history, if usable can get used. When chat
-  history sliding window is enabled, cache_prompt logic may or may not kick in at the backend
-  wrt same, based on aspects related to model, positional encoding, attention mechanism etal.
-  However system prompt should ideally get the benefit of caching.
+    The cache_prompt option supported by example/server is allowed to be controlled by the user, so that any caching supported wrt the system-prompt and chat history, if usable, can get used. When the chat history sliding window is enabled, cache_prompt logic may or may not kick in at the backend wrt the same, based on aspects related to the model, positional encoding, attention mechanism and so on. However the system prompt should ideally get the benefit of caching.

-  headers - maintains the list of http headers sent when request is made to the server. By default
-  Content-Type is set to application/json. Additionally Authorization entry is provided, which can
-  be set if needed using the settings ui.
+* headers - maintains the list of http headers sent when a request is made to the server. By default
+
+  * Content-Type is set to application/json.
+
+  * Additionally an Authorization entry is provided, which can be set if needed using the settings ui.


By using gMe's chatProps.iRecentUserMsgCnt and apiRequestOptions.max_tokens/n_predict one can try to
@@ -298,14 +281,14 @@ However a developer when testing the server of ai-model may want to change these

Using chatProps.iRecentUserMsgCnt reduce chat history context sent to the server/ai-model to be
just the system-prompt, few prev-user-requests-and-ai-responses and cur-user-request, instead of
full chat history. This way if there is any response with garbage/repeatation, it doesnt
-mess with things beyond the next question/request/query, in some ways. The trim garbage
+mess with things beyond the next few questions/requests/queries, in some ways. The trim garbage
option also tries to help avoid issues with garbage in the context to an extent.

-Set max_tokens to 2048, so that a relatively large previous reponse doesnt eat up the space
-available wrt next query-response. While parallely allowing a good enough context size for
-some amount of the chat history in the current session to influence future answers. However
+Set max_tokens to 2048 or as needed, so that a relatively large previous response doesn't eat up
+the space available wrt the next query-response, while in parallel allowing a good enough context size
+for some amount of the chat history in the current session to influence future answers. However
dont forget that the server when started should also be started with a model context size of
-2k or more, to be on safe side.
+2k or more, as needed.
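+
+As a rough illustration (a sketch only, assuming the gMe layout described above; it is not an
+exact transcript of the actual object), the above settings could be tweaked from the browser's
+devel-tool/console along these lines:
+
+```javascript
+// sketch: tune chat history sliding window, response length and an optional auth header
+gMe.chatProps.iRecentUserMsgCnt = 10;       // latest user messages (plus their responses) kept in context
+gMe.apiRequestOptions.max_tokens = 2048;    // cap the size of each generated response
+gMe.headers["Authorization"] = "Bearer YOUR_TOKEN";  // hypothetical token, only if the server expects one
+```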
The /completions endpoint of tools/server doesnt take max_tokens, instead it takes the internal
n_predict, for now add the same here on the client side, maybe later add max_tokens
@@ -346,56 +329,115 @@ work.

### Tool Calling

-ALERT: The simple minded way in which this is implemented, it can be dangerous in the worst case,
-Always remember to verify all the tool calls requested and the responses generated manually to
-ensure everything is fine, during interaction with ai models with tools support.
+Given that browsers provide an implicit env for not only showing ui but also running logic, the
+simplechat client ui lets end users of llama.cpp's server make use of the tool calling support
+provided by the newer ai models in a simple way, without needing to worry about a separate mcp
+host / router, tools and the like, for basic useful tools/functions like calculator and code
+execution (javascript in this case).
+
+Additionally, if users want to work with web content as part of their ai chat session, a few
+functions related to web access, which work with an included python based simple proxy server,
+have been implemented.
+
+This allows end users to use some basic yet useful tool calls to enhance their ai chat
+sessions to some extent. It also provides for a simple minded exploration of tool calling
+support in newer ai models and some fun along the way, as well as occasional practical use
+like
+
+* verifying mathematical or logical statements/reasoning made by the ai model during chat
+sessions, by getting it to also create and execute mathematical expressions or code to verify
+the same.
+
+* accessing content from the internet and augmenting the ai model's context with additional data
+as needed to help generate better responses. This can also be used for
+  * generating the latest news summary by fetching from news aggregator sites and collating,
+    organising and summarising the same
+  * searching for specific topics and summarising the results
+  * and so on
+
+The tool calling feature has been tested with Gemma3N, Granite4 and GptOss (note that
+reasoning is currently unsupported by this client ui, which can mess with things).
+
+ALERT: This is implemented in a simple minded way. It provides some minimal safety mechanisms,
+like running ai generated code in web workers and restricting web access to a user specified
+whitelist, but it can still be dangerous in the worst case. So remember to manually verify all
+the tool calls requested and the responses generated, to ensure everything is fine, during
+interaction with ai models with tools support.

#### Builtin Tools

The following tools/functions are currently provided by default
+
+##### directly in the browser
+
* simple_calculator - which can solve simple arithmatic expressions
+
* run_javascript_function_code - which can be used to run some javascript code in the browser context.
-* fetch_web_url_raw - fetch requested url through a proxy server
-* fetch_web_url_text - fetch requested url through a proxy server
-  and also try strip the html respose of html tags and also head, script, style, header,footer,... blocks.

-Currently the generated code / expression is run through a simple minded eval inside a web worker
+Currently the ai generated code / expression is run through a simple minded eval inside a web worker
mechanism. Use of WebWorker helps avoid exposing browser global scope to the generated code
directly. However any shared web worker scope isnt isolated.
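+
+As a rough sketch of the general idea (this is not the actual implementation, and the names used
+here are illustrative only): the ai generated code is evaluated inside a dedicated worker, and
+whatever it prints via console.log is collected and used as the tool response.
+
+```javascript
+// sketch: evaluate ai generated code in a Worker so it cannot directly touch the page's global scope
+const workerSrc = `
+  onmessage = (ev) => {
+    const logs = [];
+    console.log = (...args) => logs.push(args.join(" "));  // capture console.log output
+    try {
+      eval(ev.data.code);                                   // the ai generated code / expression
+    } catch (err) {
+      logs.push("exception: " + err.message);
+    }
+    postMessage({ id: ev.data.id, logs: logs });
+  };
+`;
+const blobUrl = URL.createObjectURL(new Blob([workerSrc], { type: "text/javascript" }));
+const toolWorker = new Worker(blobUrl);
+toolWorker.onmessage = (ev) => {
+  // the collected console.log output would become the tool call response text
+  console.log("tool result for", ev.data.id, ":", ev.data.logs.join("\n"));
+};
+toolWorker.postMessage({ id: "tc1", code: "console.log(6 * 7)" });  // example tool call payload
+```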
Either way always remember to cross check the tool requests and generated responses when using
tool calling.

-fetch_web_url_raw/text and family works along with a corresponding simple local web proxy/caching
-server logic, this helps bypass the CORS restrictions applied if trying to directly fetch from the
-browser js runtime environment. Depending on the path specified wrt the proxy server, if urltext
-(and not urlraw), it additionally tries to convert html content into equivalent text to some extent
-in a simple minded manner by dropping head block as well as all scripts/styles/footers/headers/nav.
-* the logic does a simple check to see if the bundled simpleproxy is running at specified fetchProxyUrl
-  before enabling fetch web related tool calls.
-* The bundled simple proxy can be found at
-  * tools/server/public_simplechat/local.tools/simpleproxy.py
-  * it provides for a basic white list of allowed domains to access, to an extent
-  * it tries to mimic the client/browser making the request to it by propogating header entries like
-    user-agent, accept and accept-language from the got request to the generated request during proxying
-    so that websites will hopefully respect the request rather than blindly rejecting it as coming from
-    a non-browser entity.
+##### using the bundled simpleproxy.py (helps bypass browser cors restriction, ...)

+* fetch_web_url_raw - fetch contents of the requested url through a proxy server
+
+* fetch_web_url_text - fetch text parts of the content from the requested url through a proxy server.
+  Related logic tries to strip the html response of html tags and also head, script, style, header,
+  footer, nav, ... blocks.
+
+fetch_web_url_raw/text and family work along with a corresponding simple local web proxy (/caching
+in future) server logic; this helps bypass the CORS restrictions applied if trying to directly fetch
+from the browser js runtime environment.
+
+Depending on the path specified wrt the proxy server, it executes the corresponding logic. For example
+if the urltext path is used (and not urlraw), then in addition to fetching the content from the given
+url, the logic tries to convert html content into equivalent text content to some extent in a simple
+minded manner, by dropping the head block as well as all scripts/styles/footers/headers/nav blocks and
+in turn dropping the html tags.
+
+The client ui logic does a simple check to see if the bundled simpleproxy is running at the specified
+fetchProxyUrl before enabling these web access related tool calls.
+
+The bundled simple proxy
+
+* can be found at
+  * tools/server/public_simplechat/local.tools/simpleproxy.py
+
+* provides for a basic whitelist of allowed domains to access, to be specified by the end user.
+  This should help limit web access to a safe set of sites determined by the end user.
+
+* tries to mimic the client/browser making the request to it by propagating header entries like
+  user-agent, accept and accept-language from the received request to the generated request during
+  proxying, so that websites will hopefully respect the request rather than blindly rejecting it as
+  coming from a non-browser entity.
+
+In the future it can be extended to help with other relatively simple yet useful tool calls like
+search_web, data/documents_store and so on.
+
+  * for now, search_web can be indirectly achieved using fetch_web_url_text/raw.

#### Extending with new tools

+This client ui implements the json schema based function calling convention supported by gen ai
+engines over http.
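+
+For orientation, tool meta data in this json schema based convention generally looks something like
+the sketch below (shown here for the bundled simple_calculator; the exact wording and parameter
+names used by the client ui may differ):
+
+```javascript
+// sketch of json schema style tool/function meta data, as sent to the ai model
+let calculatorMeta = {
+    type: "function",
+    function: {
+        name: "simple_calculator",
+        description: "evaluate a simple arithmetic expression and return its result",
+        parameters: {
+            type: "object",
+            properties: {
+                expression: {
+                    type: "string",
+                    description: "the arithmetic expression to evaluate"
+                }
+            },
+            required: ["expression"]
+        }
+    }
+};
+```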
+
Provide a descriptive meta data explaining the tool / function being provided for tool calling,
as well as its arguments.

-Provide a handler which should implement the specified tool / function call or rather for many
-cases constructs the code to be run to get the tool / function call job done, and inturn pass
-the same to the provided web worker to get it executed. Remember to use console.log while
-generating any response that should be sent back to the ai model, in your constructed code.
+Provide a handler which
+* implements the specified tool / function call or
+* rather in some cases constructs the code to be run to get the tool / function call job done,
+  and in turn passes the same to the provided web worker to get it executed. Use console.log while
+  generating any response that should be sent back to the ai model, in your constructed code.
+* once the job is done, returns the generated result as needed.

Update the tc_switch to include a object entry for the tool, which inturn includes
-* the meta data as well as
-* a reference to the handler and also
-  the handler should take toolCallId, toolName and toolArgs and pass these along to
-  web worker as needed.
+* the meta data wrt the tool call
+* a reference to the handler - the handler should take toolCallId, toolName and toolArgs.
+  It should pass these along to the tools web worker, if used.
* the result key (was used previously, may use in future, but for now left as is)

#### OLD: Mapping tool calls and responses to normal assistant - user chat flow
@@ -406,7 +448,7 @@
the SimpleChatTC pushes it into the normal assistant - user chat flow itself, by
tool call and response as a pair of tagged request with details in the assistant block and
inturn tagged response in the subsequent user block.

-This allows the GenAi/LLM to be aware of the tool calls it made as well as the responses it got,
+This allows the GenAi/LLM to still be aware of the tool calls it made as well as the responses it got,
so that it can incorporate the results of the same in the subsequent chat / interactions.

NOTE: This flow tested to be ok enough with Gemma-3N-E4B-it-Q8_0 LLM ai model for now. Logically
@@ -418,7 +460,7 @@
the tool call responses or even go further and have the logically seperate tool_
structures also.

DONE: rather both tool_calls structure wrt assistant messages and tool role based tool call
-result messages are generated as needed.
+result messages are generated as needed now.

#### Related stuff
@@ -438,6 +480,8 @@
Handle reasoning/thinking responses from ai models.

Handle multimodal handshaking with ai models.

+Add search_web and documents|data_store tool calling, through the simpleproxy.py if and where needed.
+
### Debuging the handshake