SimpleChatTC:WebSearchPlus: Update readme, Wikipedia in allowed

If using Wikipedia or similar sites, remember to have a sufficiently
large context window, both for the ai engine in general and for the
handshake / chat end point.
hanishkvc 2025-10-26 23:12:06 +05:30
parent 221b5a9228
commit 252fb91e95
2 changed files with 28 additions and 16 deletions


@@ -1,5 +1,6 @@
{
"allowed.domains": [
".*\\.wikipedia\\.org$",
".*\\.bing\\.com$",
"^www\\.bing\\.com$",
".*\\.yahoo\\.com$",


@@ -34,8 +34,9 @@ console. Parallely some of the directly useful to end-user settings can also be
settings ui.
For GenAi/LLM models supporting tool / function calling, allows one to interact with them and explore use of
ai driven augmenting of the knowledge used for generating answers by using the predefined tools/functions.
The end user is provided control over tool calling and response submitting.
ai driven augmenting of the knowledge used for generating answers as well as for cross checking ai generated
answers logically / programmatically and by checking with other sources and a lot more by making use of the
predefined tools / functions. The end user is provided control over tool calling and response submitting.
NOTE: Current web service api doesn't expose the model context length directly, so client logic doesn't provide
any adaptive culling of old messages nor replacing them with a summary of their content etc. However there
@@ -79,13 +80,14 @@ remember to
* use a GenAi/LLM model which supports tool calling.
* if fetch web url / page tool call is needed remember to run the bundled local.tools/simpleproxy.py
* if fetch web page or web search tool call is needed remember to run bundled local.tools/simpleproxy.py
helper along with its config file, before using/loading this client ui through a browser
* cd tools/server/public_simplechat/local.tools; python3 ./simpleproxy.py --config simpleproxy.json
* remember that this is a relatively dumb proxy logic along with optional stripping of scripts / styles
/ headers / footers /..., Be careful if trying to fetch web pages, and use it only with known safe sites.
* remember that this is a relatively minimal dumb proxy logic along with optional stripping of non-textual
content like head, scripts, styles, headers, footers, ... Be careful when accessing the web through this and
use it only with known safe sites.
* it allows one to specify a whitelist of allowed.domains; look into local.tools/simpleproxy.json
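
As a rough illustration of how such an allowed.domains whitelist could be applied, here is a minimal Python sketch. The patterns are the ones visible in the simpleproxy.json hunk of this commit; the helper name `is_allowed` and the use of `re.match` against the hostname are assumptions for illustration, and the actual simpleproxy.py logic may differ.

```python
import re

# Patterns as they appear in the simpleproxy.json hunk (JSON escapes decoded).
ALLOWED_DOMAINS = [
    r".*\.wikipedia\.org$",
    r".*\.bing\.com$",
    r"^www\.bing\.com$",
    r".*\.yahoo\.com$",
]


def is_allowed(hostname: str, patterns=ALLOWED_DOMAINS) -> bool:
    """Hypothetical helper: True if the hostname matches any allowed pattern.

    Assumption: each pattern is matched from the start of the hostname
    using re.match; the real proxy may match differently.
    """
    return any(re.match(pattern, hostname) for pattern in patterns)


print(is_allowed("en.wikipedia.org"))   # subdomain of wikipedia.org
print(is_allowed("example.com"))        # not in the whitelist
```

Under these assumptions a pattern like `.*\.wikipedia\.org$` accepts subdomains such as en.wikipedia.org but not the bare wikipedia.org, since it requires a dot before `wikipedia`.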
@@ -226,6 +228,8 @@ It is attached to the document object. Some of these can also be updated using t
* fetchProxyUrl - specify the address for the running instance of bundled local.tools/simpleproxy.py
* searchUrl - specify the search engine's search url template, with the tag SEARCHWORDS placed where the search words should be substituted at runtime.
* auto - the amount of time in seconds to wait before the tool call request is auto triggered and generated response is auto submitted back.
setting this value to 0 (default) disables the auto logic, so that the end user can review the tool calls requested by the ai and if needed even modify them before triggering/executing them, as well as review and modify the results generated by the tool call before submitting them back to the ai.
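
The SEARCHWORDS substitution described for searchUrl can be sketched as follows. This is only an illustration in Python: the helper name `build_search_url`, the example template, and the choice to URL-encode the words with `quote_plus` are all assumptions, not the client's actual code.

```python
from urllib.parse import quote_plus


def build_search_url(template: str, words: str) -> str:
    """Hypothetical helper: put the URL-encoded search words in place of SEARCHWORDS."""
    # Assumption: the words are URL-encoded before substitution.
    return template.replace("SEARCHWORDS", quote_plus(words))


# Example template for illustration only; the configured default is not shown here.
url = build_search_url("https://www.bing.com/search?q=SEARCHWORDS", "llama.cpp tool calling")
print(url)
```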
@@ -362,7 +366,8 @@ ALERT: The simple minded way in which this is implemented, it provides some mini
mechanism like running ai generated code in web workers and restricting web access to user
specified whitelist and so on, but it can still be dangerous in the worst case. So remember
to verify all the tool calls requested and the responses generated manually to ensure
everything is fine, during interaction with ai models with tools support.
everything is fine, during interaction with ai models with tools support. One could also
always run this from a discardable vm, in case one wants to be extra cautious.
#### Builtin Tools
@@ -388,15 +393,18 @@ requests and generated responses when using tool calling.
Related logic tries to strip the html response of html tags and also head, script, style, header, footer,
nav, ... blocks.
fetch_web_url_raw/text and family works along with a corresponding simple local web proxy (/caching
in future) server logic, this helps bypass the CORS restrictions applied if trying to directly fetch
from the browser js runtime environment.
* search_web_text - search for the specified words using the configured search engine and return the
plain textual content from the search result page.
The above set of web related tool calls works by handshaking with a bundled simple local web proxy
(/caching in future) server logic; this helps bypass the CORS restrictions applied if trying to
directly fetch from the browser js runtime environment.
Depending on the path specified wrt the proxy server, it executes the corresponding logic. For example, if
the urltext path is used (and not urlraw), the logic, in addition to fetching content from the given url, also
tries to convert html content into equivalent text content to some extent in a simple minded manner
by dropping head block as well as all scripts/styles/footers/headers/nav blocks and in turn dropping
the html tags.
tries to convert html content into equivalent plain text content to some extent in a simple minded
manner by dropping head block as well as all scripts/styles/footers/headers/nav blocks and in turn
dropping the html tags.
The client ui logic does a simple check to see if the bundled simpleproxy is running at specified
fetchProxyUrl before enabling these web and related tool calls.
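
The html to plain text conversion described above can be sketched roughly like this, using only Python's standard `html.parser`. The class name `TextExtractor`, the function `html_to_text`, and the exact set of dropped blocks are assumptions for illustration; this is not the code from simpleproxy.py.

```python
from html.parser import HTMLParser

# Blocks whose whole content is dropped (assumed set, based on the description above).
DROP_BLOCKS = {"head", "script", "style", "header", "footer", "nav"}


class TextExtractor(HTMLParser):
    """Hypothetical sketch: collect text found outside the dropped blocks."""

    def __init__(self):
        super().__init__()
        self.drop_depth = 0  # > 0 while inside any dropped block
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in DROP_BLOCKS:
            self.drop_depth += 1

    def handle_endtag(self, tag):
        if tag in DROP_BLOCKS and self.drop_depth > 0:
            self.drop_depth -= 1

    def handle_data(self, data):
        # Tags themselves never reach here; only text nodes do.
        if self.drop_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


page = "<html><head><title>T</title></head><body><nav>menu</nav><p>Hello</p><script>var x=1;</script><p>World</p></body></html>"
print(html_to_text(page))
```

A depth counter is used rather than a single flag so that nested dropped blocks (say a nav inside a header) do not re-enable text collection too early.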
@@ -414,10 +422,10 @@ The bundled simple proxy
so that websites will hopefully respect the request rather than blindly rejecting it as coming from
a non-browser entity.
In future it can be extended to help with other relatively simple yet useful tool calls like search_web,
data/documents_store and so.
In future it can be further extended to help with other relatively simple yet useful tool calls like
data / documents_store, fetch_rss and so on.
* for now search_web can be indirectly achieved using fetch_web_url_text/raw.
* for now fetch_rss can be indirectly achieved using fetch_web_url_raw.
#### Extending with new tools
@@ -440,6 +448,9 @@ Update the tc_switch to include an object entry for the tool, which in turn includ
It should pass these along to the tools web worker, if used.
* the result key (was used previously, may use in future, but for now left as is)
Look into tooljs.mjs for the javascript and in turn web worker based tool calls, and toolweb.mjs
for the simpleproxy.py based tool calls.
#### OLD: Mapping tool calls and responses to normal assistant - user chat flow
Instead of maintaining tool_call request and resultant response in logically separate parallel
@@ -480,7 +491,7 @@ Handle reasoning/thinking responses from ai models.
Handle multimodal handshaking with ai models.
Add search_web and documents|data_store tool calling, through the simpleproxy.py if and where needed.
Add fetch_rss and documents|data_store tool calling, through the simpleproxy.py if and where needed.
### Debugging the handshake