SimpleChatTC: Update readme.md wrt latest updates. 2k maxtokens

hanishkvc 2025-10-14 02:28:42 +05:30
parent 1789f5f1e2
commit c2112618c0
2 changed files with 29 additions and 26 deletions


@@ -239,10 +239,10 @@ It is attached to the document object. Some of these can also be updated using t
be set if needed using the settings ui.
iRecentUserMsgCnt - a simple minded SlidingWindow to limit context window load at Ai Model end.
This is disabled by default. However if enabled, then in addition to the latest system message, only
the last/latest iRecentUserMsgCnt user messages after the latest system prompt, along with the responses
from the ai model to them, will be sent to the ai model when querying for a new response. That is, if enabled,
only user messages after the latest system message/prompt are considered.
This is set to 5 by default. So in addition to the latest system message, the last/latest iRecentUserMsgCnt
user messages after the latest system prompt, along with the responses from the ai model to them, will be sent
to the ai model when querying for a new response. Note that when this is enabled, only user messages after
the latest system message/prompt are considered (a small illustrative sketch follows the value list below).
This specified sliding window user message count also includes the latest user query.
<0 : Send entire chat history to server
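To make this a bit more concrete, below is a minimal illustrative sketch (not the actual simplechat.js logic) of how such a sliding window selection could work, assuming the chat history is a simple array of {role, content} messages; the function name and its corner case handling are assumptions of the sketch.

```javascript
// Illustrative sketch only, not the actual simplechat.js implementation.
// Pick the subset of the chat history that would be sent to the ai model:
// the latest system message plus the last iRecentUserMsgCnt user messages
// after it, along with the ai model responses that follow them.
function recentChatSubset(messages, iRecentUserMsgCnt) {
    if (iRecentUserMsgCnt < 1) {
        // <0 : send the entire chat history (0 is not handled in this sketch)
        return messages;
    }
    // locate the latest system message, if any
    let iSys = -1;
    for (let i = messages.length - 1; i >= 0; i--) {
        if (messages[i].role === "system") { iSys = i; break; }
    }
    const tail = messages.slice(iSys + 1);
    // positions of user messages after the latest system message
    const userPos = [];
    tail.forEach((m, i) => { if (m.role === "user") userPos.push(i); });
    // keep only the last iRecentUserMsgCnt user messages (incl the latest query)
    let keepFrom = 0;
    if (userPos.length > iRecentUserMsgCnt) {
        keepFrom = userPos[userPos.length - iRecentUserMsgCnt];
    }
    const kept = tail.slice(keepFrom);
    return (iSys >= 0) ? [messages[iSys], ...kept] : kept;
}
```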
@@ -282,9 +282,11 @@ full chat history. This way if there is any response with garbage/repetition, i
mess with things beyond the next question/request/query, in some ways. The trim garbage
option also tries to help avoid issues with garbage in the context to an extent.
Set max_tokens to 1024, so that a relatively large previous response doesn't eat up the space
available wrt the next query-response. However don't forget that the server should also be
started with a model context size of 1k or more, to be on the safe side.
Set max_tokens to 2048, so that a relatively large previous response doesn't eat up the space
available wrt the next query-response, while in parallel allowing a good enough context size for
some amount of the chat history in the current session to influence future answers. However
don't forget that the server should also be started with a model context size of
2k or more, to be on the safe side.
The /completions endpoint of tools/server doesn't take max_tokens; instead it takes the
internal n_predict. For now the same is added here on the client side, maybe later add max_tokens
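For reference, a rough sketch of what the client side request options look like with these values, and one illustrative way they could be merged into a request body, is shown below. Only the option values come from this change; the endpoint path and the fetch() wiring are assumptions of the sketch, not the actual simplechat.js code.

```javascript
// Option values as per this change; the rest of this snippet is illustrative.
const apiRequestOptions = {
    "model": "gpt-3.5-turbo",
    "temperature": 0.7,
    "max_tokens": 2048,  // honoured by chat/completions style endpoints
    "n_predict": 2048,   // honoured by the tools/server /completions endpoint
    "cache_prompt": false,
};

async function queryModel(baseUrl, messages) {
    // merge the request options with the chat messages and post to the server
    const resp = await fetch(`${baseUrl}/chat/completions`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ ...apiRequestOptions, messages: messages }),
    });
    return await resp.json();
}
```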
@@ -321,9 +323,9 @@ work.
### Tool Calling
ALERT: Given the way this is currently implemented, it is dangerous to use unless one manually verifies
all the tool calls requested and the responses generated, to ensure everything is fine,
when interacting with ai models with tools support.
ALERT: Given the simple minded way in which this is implemented, it can be dangerous in the worst case.
Always remember to manually verify all the tool calls requested and the responses generated, to
ensure everything is fine, when interacting with ai models with tools support.
#### Builtin Tools
@@ -332,10 +334,10 @@ The following tools/functions are currently provided by default
* run_javascript_function_code - which can be used to run some javascript code in the browser
context.
Currently the generated code / expression is run through a simple dynamic function mechanism.
Things may be updated in future so that a WebWorker is used, to avoid exposing the browser global scope
to the generated code directly. Either way, always remember to cross check the tool requests and
generated responses when using tool calling.
Currently the generated code / expression is run through a simple minded eval inside a web worker
mechanism. Use of a WebWorker helps avoid exposing the browser global scope to the generated code directly.
However any shared web worker scope isn't isolated. Either way, always remember to cross check the tool
requests and generated responses when using tool calling.
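For illustration, below is a minimal sketch of this kind of web worker based eval. The worker file name, the message shape and the console.log capture are assumptions made for the sketch, not the actual tools/server client code; note also that promise/async based generated code would need extra handling (see the ToDo section).

```javascript
// --- toolworker.js (hypothetical worker file) ---
// Capture console.log output so it can be returned as the tool call result.
const capturedLines = [];
console.log = (...args) => capturedLines.push(args.join(" "));

onmessage = (ev) => {
    let error = null;
    try {
        // runs in the worker's scope, not the page's global scope
        eval(ev.data.code);
    } catch (exc) {
        error = String(exc);
    }
    postMessage({ id: ev.data.id, output: capturedLines.join("\n"), error: error });
    capturedLines.length = 0;
};

// --- main thread side (hypothetical helper) ---
// Hand the ai generated code to the worker and wait for its result.
const toolWorker = new Worker("toolworker.js");
function runInToolWorker(code, id) {
    return new Promise((resolve) => {
        const onMsg = (ev) => {
            if (ev.data.id !== id) return;
            toolWorker.removeEventListener("message", onMsg);
            resolve(ev.data.error ?? ev.data.output);
        };
        toolWorker.addEventListener("message", onMsg);
        toolWorker.postMessage({ id: id, code: code });
    });
}
```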
May add
* web_fetch along with a corresponding simple local web proxy/caching server logic that can bypass
@@ -343,19 +345,20 @@ May add
In turn, maybe with a white list of allowed sites to access or so.
#### Extending wiht new tools
#### Extending with new tools
Provide descriptive meta data explaining the tool / function being provided for tool calling,
as well as its arguments.
Provide a handler which should implement the specified tool / function call. It should place
the result to be sent back to the ai model in the result key of the tc_switch entry for the
corresponding tool.
Provide a handler which implements the specified tool / function call, or rather constructs
the code to be run to get the tool / function call job done, and in turn passes the same to the
provided web worker to get it executed. In your constructed code, remember to use console.log to emit
any response that should be sent back to the ai model.
Update the tc_switch to include a object entry for the tool, which inturn icnludes
Update the tc_switch to include an object entry for the tool, which in turn includes
* the meta data as well as
* a reference to the handler and also
* the result key
* the result key (used previously, may be used in future, but for now left as is); an illustrative sketch of such an entry follows
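To tie the above steps together, here is a hypothetical example of what such a tc_switch entry could look like. The field names (meta, handler, result) and the runInToolWorker helper are assumptions based on the description above; the exact shapes used in simplechat.js may differ.

```javascript
// Hypothetical tc_switch entry for an illustrative get_current_time tool.
// Field names and the runInToolWorker helper are assumptions of this sketch.
tc_switch["get_current_time"] = {
    meta: {
        type: "function",
        function: {
            name: "get_current_time",
            description: "Get the current local date and time as a string",
            parameters: { type: "object", properties: {}, required: [] },
        },
    },
    handler: (args, id) => {
        // construct the code to be run and pass it to the provided web worker;
        // whatever the code console.log's becomes the response for the ai model
        const code = "console.log(new Date().toString());";
        return runInToolWorker(code, id);
    },
    result: "", // kept around for now, may be used again in future
};
```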
#### Mapping tool calls and responses to normal assistant - user chat flow
@@ -368,16 +371,16 @@ tagged response in the subsequent user block.
This allows the GenAi/LLM to be aware of the tool calls it made as well as the responses it got,
so that it can incorporate the results of the same in the subsequent chat / interactions.
NOTE: This flow has been tested to be ok enough with the Gemma-3N-E4B-it-Q8_0 LLM ai model for now.
NOTE: This flow has been tested to be ok enough with the Gemma-3N-E4B-it-Q8_0 LLM ai model for now. Logically,
given the way current ai models work, most of them should understand things as needed, but this needs
to be tested with other ai models later.
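As a concrete illustration of this mapping, the sketch below shows one hypothetical way a tool call and its result could be folded back into the plain assistant / user message flow. The tag strings and the helper name are made up for the sketch; they are not the actual client side format.

```javascript
// Hypothetical sketch: fold a tool call and its result back into the normal
// assistant / user chat flow, so the ai model sees both in later turns.
function appendToolRoundtrip(messages, toolName, toolArgs, toolResult) {
    // the tool call made by the ai model is kept in the assistant block
    messages.push({
        role: "assistant",
        content: `[TOOLCALL] ${toolName}(${JSON.stringify(toolArgs)})`,
    });
    // the tagged response goes into the subsequent user block
    messages.push({
        role: "user",
        content: `[TOOLRESULT ${toolName}] ${toolResult}`,
    });
    return messages;
}

// usage, after the requested tool call has been run in the web worker:
// appendToolRoundtrip(chat, "run_javascript_function_code",
//                     { code: "console.log(1+1)" }, "2");
```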
TODO: Need to think later whether to continue this simple flow, or at least use the tool role wrt
the tool call responses, or even go further and have the logically separate tool_call request
the tool call responses, or even go further and have the logically separate tool_calls request
structures also.
#### ToDo
Update to use web worker.
WebFetch and Local web proxy/caching server
Try and trap promise based flows to ensure all generated results, or errors if any, are caught


@@ -907,8 +907,8 @@ class Me {
this.apiRequestOptions = {
    "model": "gpt-3.5-turbo",
    "temperature": 0.7,
    "max_tokens": 1024,
    "n_predict": 1024,
    "max_tokens": 2048,
    "n_predict": 2048,
    "cache_prompt": false,
    //"frequency_penalty": 1.2,
    //"presence_penalty": 1.2,