From 252fb91e95857335710c861ae7a4c8da97eadc0f Mon Sep 17 00:00:00 2001
From: hanishkvc <hanishkvc@gmail.com>
Date: Sun, 26 Oct 2025 23:12:06 +0530
Subject: [PATCH] SimpleChatTC:WebSearchPlus: Update readme, Wikipedia in
 allowed

When using Wikipedia or similar sites, remember to configure a
sufficiently large context window, both for the ai engine in general
and for the handshake / chat end point.
---
 .../local.tools/simpleproxy.json              |  1 +
 tools/server/public_simplechat/readme.md      | 43 ++++++++++++-------
 2 files changed, 28 insertions(+), 16 deletions(-)
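
Note (illustrative only, not part of the change): the allowed.domains
entries read as regular expressions over the host name. The sketch below
assumes each pattern is tried with Python's re.match against the host name
of the requested url; the actual matching done by simpleproxy.py may differ,
and host_allowed is a hypothetical helper used only for this illustration.

```python
import re
from urllib.parse import urlparse

# Patterns as they look after JSON decoding of simpleproxy.json
# (the JSON source writes each backslash as "\\").
ALLOWED_DOMAINS = [
    r".*\.wikipedia\.org$",
    r".*\.bing\.com$",
    r"^www\.bing\.com$",
]


def host_allowed(url, patterns=ALLOWED_DOMAINS):
    """Hypothetical check: True if the url's host name matches any pattern."""
    host = urlparse(url).hostname or ""
    return any(re.match(pattern, host) for pattern in patterns)


if __name__ == "__main__":
    for url in (
        "https://en.wikipedia.org/wiki/Llama",
        "https://wikipedia.org/",
        "https://en.wikipedia.org.example.com/",
    ):
        print(url, host_allowed(url))
```

Under this assumption the new entry admits subdomains such as
en.wikipedia.org, but not the bare wikipedia.org host, since the pattern
requires a dot before "wikipedia"; the trailing $ keeps look-alike hosts
that merely contain ".wikipedia.org" from matching.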

diff --git a/tools/server/public_simplechat/local.tools/simpleproxy.json b/tools/server/public_simplechat/local.tools/simpleproxy.json
index 949b7e014d..d68878199a 100644
--- a/tools/server/public_simplechat/local.tools/simpleproxy.json
+++ b/tools/server/public_simplechat/local.tools/simpleproxy.json
@@ -1,5 +1,6 @@
 {
     "allowed.domains": [
+        ".*\\.wikipedia\\.org$",
         ".*\\.bing\\.com$",
         "^www\\.bing\\.com$",
         ".*\\.yahoo\\.com$",
diff --git a/tools/server/public_simplechat/readme.md b/tools/server/public_simplechat/readme.md
index 6576e26914..30c4a2271e 100644
--- a/tools/server/public_simplechat/readme.md
+++ b/tools/server/public_simplechat/readme.md
@@ -34,8 +34,9 @@ console. Parallely some of the directly useful to end-user settings can also be
 settings ui.
 
 For GenAi/LLM models supporting tool / function calling, allows one to interact with them and explore use of
-ai driven augmenting of the knowledge used for generating answers by using the predefined tools/functions.
-The end user is provided control over tool calling and response submitting.
+ai driven augmenting of the knowledge used for generating answers, as well as for cross checking ai generated
+answers logically / programmatically, by checking with other sources, and a lot more, by making use of the
+predefined tools / functions. The end user is provided control over tool calling and response submitting.
 
 NOTE: Current web service api doesnt expose the model context length directly, so client logic doesnt provide
 any adaptive culling of old messages nor of replacing them with summary of their content etal. However there
@@ -79,13 +80,14 @@ remember to
 
 * use a GenAi/LLM model which supports tool calling.
 
-* if fetch web url / page tool call is needed remember to run the bundled local.tools/simpleproxy.py
+* if the fetch web page or web search tool call is needed, remember to run the bundled local.tools/simpleproxy.py
   helper along with its config file, before using/loading this client ui through a browser
 
   * cd tools/server/public_simplechat/local.tools; python3 ./simpleproxy.py --config simpleproxy.json
 
-  * remember that this is a relatively dumb proxy logic along with optional stripping of scripts / styles
-    / headers / footers /..., Be careful if trying to fetch web pages, and use it only with known safe sites.
+  * remember that this is a relatively minimal, dumb proxy logic, along with optional stripping of non-textual
+    content like head, scripts, styles, headers, footers, ... Be careful when accessing the web through this
+    and use it only with known safe sites.
 
   * it allows one to specify a white list of allowed.domains, look into local.tools/simpleproxy.json
 
@@ -226,6 +228,8 @@ It is attached to the document object. Some of these can also be updated using t
 
     * fetchProxyUrl - specify the address for the running instance of bundled local.tools/simpleproxy.py
 
+    * searchUrl - specify the search engine's search url template, with the tag SEARCHWORDS placed where the search words should be substituted at runtime.
+
     * auto - the amount of time in seconds to wait before the tool call request is auto triggered and generated response is auto submitted back.
 
       setting this value to 0 (default), disables auto logic, so that end user can review the tool calls requested by ai and if needed even modify them, before triggering/executing them as well as review and modify results generated by the tool call, before submitting them back to the ai.
@@ -362,7 +366,8 @@ ALERT: The simple minded way in which this is implemented, it provides some mini
 mechanism like running ai generated code in web workers and restricting web access to user
 specified whitelist and so, but it can still be dangerous in the worst case, So remember
 to verify all the tool calls requested and the responses generated manually to ensure
-everything is fine, during interaction with ai models with tools support.
+everything is fine, during interaction with ai models with tools support. To be extra
+cautious, one could also always run this from a discardable vm.
 
 #### Builtin Tools
 
@@ -388,15 +393,18 @@ requests and generated responses when using tool calling.
   Related logic tries to strip html response of html tags and also head, script, style, header,footer,
   nav, ... blocks.
 
-fetch_web_url_raw/text and family works along with a corresponding simple local web proxy (/caching
-in future) server logic, this helps bypass the CORS restrictions applied if trying to directly fetch
-from the browser js runtime environment.
+* search_web_text - search for the specified words using the configured search engine and return the
+  plain textual content from the search result page.
+
+The above set of web related tool calls works by handshaking with a bundled simple local web proxy
+(/caching in future) server logic. This helps bypass the CORS restrictions applied if trying to
+directly fetch from the browser js runtime environment.
 
 Depending on the path specified wrt the proxy server, it executes the corresponding logic. Like if
 urltext path is used (and not urlraw), the logic in addition to fetching content from given url, it
-tries to convert html content into equivalent text content to some extent in a simple minded manner
-by dropping head block as well as all scripts/styles/footers/headers/nav blocks and inturn dropping
-the html tags.
+tries to convert html content into equivalent plain text content to some extent in a simple minded
+manner, by dropping the head block as well as all scripts/styles/footers/headers/nav blocks and in turn
+dropping the html tags.
 
 The client ui logic does a simple check to see if the bundled simpleproxy is running at specified
 fetchProxyUrl before enabling these web and related tool calls.
@@ -414,10 +422,10 @@ The bundled simple proxy
   so that websites will hopefully respect the request rather than blindly rejecting it as coming from
   a non-browser entity.
 
-In future it can be extended to help with other relatively simple yet useful tool calls like search_web,
-data/documents_store and so.
+In future it can be further extended to help with other relatively simple yet useful tool calls like
+data / documents_store, fetch_rss and so on.
 
-  * for now search_web can be indirectly achieved using fetch_web_url_text/raw.
+  * for now fetch_rss can be indirectly achieved using fetch_web_url_raw.
 
 #### Extending with new tools
 
@@ -440,6 +448,9 @@ Update the tc_switch to include a object entry for the tool, which inturn includ
   It should pass these along to the tools web worker, if used.
 * the result key (was used previously, may use in future, but for now left as is)
 
+Look into tooljs.mjs for the javascript (and in turn web worker) based tool calls, and toolweb.mjs
+for the simpleproxy.py based tool calls.
+
 #### OLD: Mapping tool calls and responses to normal assistant - user chat flow
 
 Instead of maintaining tool_call request and resultant response in logically seperate parallel
@@ -480,7 +491,7 @@ Handle reasoning/thinking responses from ai models.
 
 Handle multimodal handshaking with ai models.
 
-Add search_web and documents|data_store tool calling, through the simpleproxy.py if and where needed.
+Add fetch_rss and documents|data_store tool calling, through the simpleproxy.py if and where needed.
 
 
 ### Debuging the handshake