SimpleChatTC:Reasoning+: Update readme wrt reasoning, flow cleanup
Also clean up the minimal based showing of chat messages a bit, and add github.com to the allowed list
parent 937aa57528
commit cf06c8682b
@@ -37,7 +37,8 @@
 "^theprint\\.in$",
 ".*\\.ndtv\\.com$",
 "^lwn\\.net$",
-"^arstechnica\\.com$"
+"^arstechnica\\.com$",
+".*\\.github\\.com$"
 ],
 "bearer.insecure": "NeverSecure"
 }
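For context, the allowed list above, once assembled, would look something like the sketch below, presumably as part of the bundled simpleproxy's config; the `allowed.domains` key name and the overall file layout are assumptions for illustration, only the regex entries and the `bearer.insecure` key are confirmed by this hunk:

```json
{
    "allowed.domains": [
        "^theprint\\.in$",
        ".*\\.ndtv\\.com$",
        "^lwn\\.net$",
        "^arstechnica\\.com$",
        ".*\\.github\\.com$"
    ],
    "bearer.insecure": "NeverSecure"
}
```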
@@ -14,9 +14,9 @@ Continue reading for the details.
 ## overview
 
 This simple web frontend, allows triggering/testing the server's /completions or /chat/completions endpoints
-in a simple way with minimal code from a common code base. Inturn additionally it tries to allow single or
-multiple independent back and forth chatting to an extent, with the ai llm model at a basic level, with their
-own system prompts.
+in a simple way with minimal code from a common code base. Additionally it allows end users to have
+single or multiple independent chat sessions, with back and forth chatting to an extent with the ai llm model
+at a basic level, each with their own system prompt.
 
 This allows seeing the generated text / ai-model response in oneshot at the end, after it is fully generated,
 or potentially as it is being generated, in a streamed manner from the server/ai-model.
@@ -24,7 +24,10 @@ or potentially as it is being generated, in a streamed manner from the server/ai-model.
  screens")
 
 Auto saves the chat session locally as and when the chat is progressing and inturn at a later time when you
-open SimpleChat, option is provided to restore the old chat session, if a matching one exists.
+open SimpleChat, option is provided to restore the old chat session, if a matching one exists. In turn if
+any of those chat sessions were pending wrt the user triggering a tool call or submitting a tool call response,
+the ui is set up as needed for the end user to continue with those previously saved sessions, from where they
+left off.
 
 The UI follows a responsive web design so that the layout can adapt to available display space in a usable
 enough manner, in general.
@@ -36,12 +39,17 @@ settings ui.
 For GenAi/LLM models supporting tool / function calling, allows one to interact with them and explore use of
 ai driven augmenting of the knowledge used for generating answers as well as for cross checking ai generated
 answers logically / programatically and by checking with other sources and lot more by making using of the
-predefined tools / functions. The end user is provided control over tool calling and response submitting.
+simple yet useful predefined tools / functions provided by this client web ui. The end user is provided full
+control over tool calling and response submitting.
 
-NOTE: Current web service api doesnt expose the model context length directly, so client logic doesnt provide
-any adaptive culling of old messages nor of replacing them with summary of their content etal. However there
-is a optional sliding window based chat logic, which provides a simple minded culling of old messages from
-the chat history before sending to the ai model.
+For GenAi/LLM models which support reasoning, the thinking of the model is shown to the end user as the
+model is running through its reasoning.
+
+NOTE: As genai/llm web service apis may or may not expose the model context length directly, and also as
+using ai out of band for additional parallel work may not be efficient given the load genai/llm models place
+on current systems, the client logic doesnt provide any adaptive culling of old messages nor replacing them
+with a summary of their content et al. However there is an optional sliding window based chat logic, which
+provides a simple minded culling of old messages from the chat history before sending to the ai model.
 
 NOTE: Wrt options sent with the request, it mainly sets temperature, max_tokens and optionaly stream as well
 as tool_calls mainly for now. However if someone wants they can update the js file or equivalent member in
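To make the two NOTEs above concrete, here is a minimal sketch of a sliding window cull plus the handful of request options mentioned; the function and option names are illustrative assumptions, not the actual simplechat.js internals (tool_calls handling is omitted for brevity):

```javascript
// Keep the system prompt (if any) plus only the last few user/assistant
// messages, a simple minded stand-in for context-length aware culling.
function slidingWindowCull(messages, keepLastN) {
    const system = messages.filter((m) => m.role === "system");
    const rest = messages.filter((m) => m.role !== "system");
    return [...system, ...rest.slice(-keepLastN)];
}

// Build the chat/completions request body with the few options the client
// mainly sets for now: temperature, max_tokens and optionally stream.
function buildRequestBody(messages, opts) {
    return JSON.stringify({
        messages: slidingWindowCull(messages, opts.keepLastN ?? 8),
        temperature: opts.temperature ?? 0.7,
        max_tokens: opts.maxTokens ?? 1024,
        stream: opts.stream ?? false,
    });
}
```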
@@ -110,7 +118,7 @@ Once inside
 * try trim garbage in response or not
 * amount of chat history in the context sent to server/ai-model
 * oneshot or streamed mode.
-* use built in tool calling or not
+* use built in tool calling or not and its related params.
 
 * In completion mode >> note: most recent work has been in chat mode <<
 * one normally doesnt use a system prompt in completion mode.
@@ -149,6 +157,9 @@ Once inside
 * the user input box will be disabled and a working message will be shown in it.
 * if trim garbage is enabled, the logic will try to trim repeating text kind of garbage to some extent.
 
+* any reasoning / thinking by the model is shown to the end user, as it is occurring, if the ai model
+shares the same over the http interface.
+
 * tool calling flow when working with ai models which support tool / function calling
 * if tool calling is enabled and the user query results in need for one of the builtin tools to be
 called, then the ai response might include request for tool call.
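As a rough sketch of how such streamed reasoning can be picked up, the below accumulates reasoning and regular content deltas separately; the `reasoning_content` field name matches the code touched by this commit, while the helper itself is an illustrative assumption:

```javascript
// Accumulate one parsed streaming chunk of a chat/completions response,
// keeping the model's thinking separate from its regular answer text,
// so both can be shown to the user as they arrive.
function accumulateDelta(state, chunk) {
    const delta = chunk.choices?.[0]?.delta ?? {};
    if (delta.reasoning_content) {
        state.reasoning += delta.reasoning_content;
    }
    if (delta.content) {
        state.content += delta.content;
    }
    return state;
}

// usage: let state = { reasoning: "", content: "" };
// then for each parsed SSE chunk: state = accumulateDelta(state, chunk);
```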
@@ -159,6 +170,9 @@ Once inside
 ie <tool_response> generated result with meta data </tool_response>
 * if user is ok with the tool response, they can click submit to send the same to the GenAi/LLM.
 User can even modify the response generated by the tool, if required, before submitting.
+* ALERT: Sometimes the reasoning or chat from the ai model may indicate a tool call, but you may not
+actually get/see a tool call; in such situations, dont forget to cross check that tool calling is
+enabled in the settings.
 
 * just refresh the page, to reset wrt the chat history and or system prompt and start afresh.
 This also helps if you had forgotten to start the bundled simpleproxy.py server before hand.
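For reference, when the user clicks submit, the tool result typically goes back as a tool role message along the lines of the sketch below; the exact wire shape is an assumption modelled on the common OpenAI-style tool handshake, only the `<tool_response>` wrapping shown in the ui is confirmed above:

```javascript
// Illustrative shape of a submitted tool response message; the id echoes
// the model's tool call request so request and response can be paired.
const toolResponseMessage = {
    role: "tool",
    tool_call_id: "call_123",   // assumed: id from the ai model's tool call request
    content: "<tool_response> generated result with meta data </tool_response>",
};
```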
@@ -372,8 +386,7 @@ needed to help generate better responses. this can also be used for
 * searching for specific topics and summarising the results
 * or so
 
-The tool calling feature has been tested with Gemma3N, Granite4 and GptOss (given that
-reasoning is currently unsupported by this client ui, it can mess with things)
+The tool calling feature has been tested with Gemma3N, Granite4 and GptOss.
 
 ALERT: The simple minded way in which this is implemented, it provides some minimal safety
 mechanism like running ai generated code in web workers and restricting web access to user
@@ -454,7 +467,8 @@ Provide a handler which
 * rather in some cases constructs the code to be run to get the tool / function call job done,
 and inturn pass the same to the provided web worker to get it executed. Use console.log while
 generating any response that should be sent back to the ai model, in your constructed code.
-* once the job is done, return the generated result as needed.
+* once the job is done, return the generated result as needed, along with tool call related meta
+data like chatSessionId, toolCallId, toolName which was passed along with the tool call.
 
 Update the tc_switch to include a object entry for the tool, which inturn includes
 * the meta data wrt the tool call
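Tying the above together, a sketch of such a handler and its tc_switch entry; apart from chatSessionId, toolCallId and toolName, which the text itself names, the identifiers here are illustrative assumptions rather than the actual client code:

```javascript
// A handler which takes the code to run, gets it executed in the provided
// web worker, and returns the result along with the tool call related
// meta data that was passed in with the tool call.
async function runJavascriptTool(meta, args, worker) {
    const result = await worker.run(args.code); // assumed worker helper capturing console.log output
    return {
        chatSessionId: meta.chatSessionId,
        toolCallId: meta.toolCallId,
        toolName: meta.toolName,
        result: result,
    };
}

// Matching tc_switch entry, pairing the tool's meta data with its handler.
const tc_switch = {
    run_javascript: {
        meta: { description: "run the given javascript code and return its console.log output" },
        handler: runJavascriptTool,
    },
};
```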
@@ -495,24 +509,63 @@ gets executed, before tool calling returns and thus data / error generated by the tool doesnt
 get incorporated in result sent to ai engine on the server side.
 
 
-### ToDo
+### Progress
+
+#### Done
+
+Tool Calling support added, along with a bunch of useful tool calls as well as a bundled simple proxy
+if one wants to access the web as part of tool call usage.
+
+Reasoning / thinking responses from ai models are shown to the user, as they are being generated/shared.
+
+Chat Messages/Session and UI handling have been moved into corresponding Classes to an extent; this
+helps ensure that
+* switching chat sessions or loading a previously auto saved chat session will restore state including
+ui such that the end user can continue the chat session from where they left it, even if in the middle
+of a tool call handshake.
+* new fields added to the http handshake in oneshot or streaming mode can be handled in a structured way
+to an extent.
+
+#### ToDo
 
 Is the tool call promise land trap deep enough, need to think through and explore around this once later.
 
 Trap error responses.
 
-Handle reasoning/thinking responses from ai models.
-
 Handle multimodal handshaking with ai models.
 
 Add fetch_rss and documents|data_store tool calling, through the simpleproxy.py if and where needed.
 
 Save used config entries along with the auto saved chat sessions and inturn give option to reload the
 same when saved chat is loaded.
 
-### Debuging the handshake
+MAYBE make the settings in general chat session specific, rather than the current global config flow.
 
-When working with llama.cpp server based GenAi/LLM running locally
-
-sudo tcpdump -i lo -s 0 -vvv -A host 127.0.0.1 and port 8080 | tee /tmp/td.log
+
+### Debugging the handshake and beyond
+
+When working with a llama.cpp server based GenAi/LLM running locally, to look at the handshake directly
+from the commandline, you could run something like below
+
+* sudo tcpdump -i lo -s 0 -vvv -A host 127.0.0.1 and port 8080 | tee /tmp/td.log
+* or one could also try to look at the network tab in the browser developer console
+
+One could always remove message entries or manipulate chat sessions by accessing document['gMe']
+in the devel console of the browser
+
+* if you want the last tool call response you submitted to be re-available for tool call execution and
+resubmitting of the response afresh, for any reason, follow the below steps
+  * remove the assistant response from the end of the chat session, if any, using
+    * document['gMe'].multiChat.simpleChats['SessionId'].xchat.pop()
+  * reset the role of the tool response chat message to TOOL.TEMP from tool
+    * toolMessageIndex = document['gMe'].multiChat.simpleChats['SessionId'].xchat.length - 1
+    * document['gMe'].multiChat.simpleChats['SessionId'].xchat[toolMessageIndex].role = "TOOL.TEMP"
+  * clicking on the SessionId at the top in the UI should refresh the chat ui and inturn give
+the option to control that tool call again
+* this can also help in the case where the chat session fails with context window exceeded
+  * you restart the GenAi/LLM server after increasing the context window as needed
+  * edit the chat session history as mentioned above, to the extent needed
+  * resubmit the last needed user/tool response as needed
 
 
 ## At the end
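The tool call reset steps above, collected into a single pasteable devel console snippet; it assumes a session named 'SessionId' just like the steps do, so adjust to your actual session id:

```javascript
// Re-arm the last tool call of a session, following the steps above:
// drop the trailing assistant message (only if one is present) and flip
// the tool response message back to the transient TOOL.TEMP role.
let chat = document['gMe'].multiChat.simpleChats['SessionId'];
chat.xchat.pop(); // step 1: remove the assistant response from the end, if any
// step 2: reset the tool response message role to TOOL.TEMP
// (the steps above set .role directly; elsewhere the code accesses
// x.ns.role, so adjust the member access if your build differs)
let toolMessageIndex = chat.xchat.length - 1;
chat.xchat[toolMessageIndex].role = "TOOL.TEMP";
// step 3: click the SessionId at the top of the UI to refresh the chat ui
```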
@@ -261,7 +261,7 @@ class ChatMessageEx {
         let content = ""
         let toolcall = ""
         if (this.ns.reasoning_content.trim() !== "") {
-            reasoning = `!!!Reasoning: ${this.ns.reasoning_content.trim()} !!!\n`;
+            reasoning = `!!!Reasoning: ${this.ns.reasoning_content.trim()} !!!\n\n`;
         }
         if (this.ns.content !== "") {
             content = this.ns.content;
@@ -898,7 +898,7 @@ class MultiChatUI {
                 }
                 continue
             }
-            let entry = ui.el_create_append_p(`${x.ns.role}: ${x.content_equiv()}`, this.elDivChat);
+            let entry = ui.el_create_append_p(`[[ ${x.ns.role} ]]: ${x.content_equiv()}`, this.elDivChat);
             entry.className = `role-${x.ns.role}`;
             last = entry;
             if (x.ns.role === Roles.Assistant) {