Rename search-drops to urltext-tag-drops, to indicate its more
generic semantics. The search drops specified by the user in the UI
will instead be mapped to the urltext-tag-drops header entry of a
urltext web fetch request.
Implement crude urltext-tag-drops logic in TextHtmlParser.
If the opening and closing tags in the html being parsed are
mismatched, especially for the type of tag targeted for dropping,
things can get messed up.
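A crude tag drop filter of this kind can be sketched with Python's
html.parser. This is a minimal illustration, not the actual
TextHtmlParser: the class name TagDropTextParser and the
{"tag", "id"} entry shape are assumptions. It only counts nested
tags of the same type as the one being dropped, so mismatched
opening and closing tags can throw it off, which is the caveat
noted above.

```python
from html.parser import HTMLParser


class TagDropTextParser(HTMLParser):
    """Hypothetical sketch: collect text, skipping elements whose (tag, id) match a drop entry."""

    def __init__(self, tag_drops):
        super().__init__()
        # Set of (tag, id) pairs to drop, e.g. {("div", "ads")}
        self.drops = {(d["tag"], d["id"]) for d in tag_drops}
        self.drop_tag = None   # tag type currently being dropped, if any
        self.drop_depth = 0    # nesting depth of that tag type inside the dropped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.drop_tag is not None:
            # Crude: only same-type tags are counted, so mismatched html can confuse this
            if tag == self.drop_tag:
                self.drop_depth += 1
            return
        if (tag, dict(attrs).get("id")) in self.drops:
            self.drop_tag = tag
            self.drop_depth = 1

    def handle_endtag(self, tag):
        if self.drop_tag is not None and tag == self.drop_tag:
            self.drop_depth -= 1
            if self.drop_depth == 0:
                self.drop_tag = None  # left the dropped element

    def handle_data(self, data):
        # Keep only non-empty text that is outside any dropped element
        if self.drop_tag is None and data.strip():
            self.parts.append(data.strip())

    def get_text(self):
        return "\n".join(self.parts)
```

Usage: feed `'<div id="ads"><div>buy</div>now</div><p>Hello</p>'` to
`TagDropTextParser([{"tag": "div", "id": "ads"}])`, call `close()`,
and `get_text()` keeps only `Hello`.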
Allow the web tools handshake helper to pass additional header
entries provided by its caller.
Make use of this to send a list of tag and id pairs for the web
search tool, which will be used to drop divs matching the
specified ids.
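One way the caller-provided extra headers and the tag/id list could
travel is sketched below, assuming the urltext-tag-drops header
carries a JSON list of {"tag", "id"} objects. The helper names
build_headers and parse_tag_drops and the JSON encoding are
hypothetical; the real toolweb helper is not shown here.

```python
import json

TAG_DROPS_HEADER = "urltext-tag-drops"  # header name from the commit message


def build_headers(extra_headers=None, tag_drops=None):
    """Caller side (hypothetical): copy caller-supplied headers, then add the tag drops."""
    headers = dict(extra_headers or {})
    if tag_drops:
        # Assumed encoding: JSON list of {"tag": ..., "id": ...} objects
        headers[TAG_DROPS_HEADER] = json.dumps(
            [{"tag": tag, "id": id_} for tag, id_ in tag_drops])
    return headers


def parse_tag_drops(value):
    """Proxy side (hypothetical): decode the header, skipping malformed entries."""
    if not value:
        return []
    try:
        entries = json.loads(value)
    except json.JSONDecodeError:
        return []
    if not isinstance(entries, list):
        return []
    return [e for e in entries
            if isinstance(e, dict)
            and isinstance(e.get("tag"), str)
            and isinstance(e.get("id"), str)]
```

Skipping malformed entries rather than failing keeps a bad header
from breaking the whole fetch; this is a design choice of the sketch.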
Chances are that ai models which don't support tool calling will
silently ignore the shared tool call metadata without much issue.
So enable the tool calling feature by default, so that anyone
using an ai model with tool calling has the feature readily
available.
By default, revert SlidingWindow ChatHistory in Context from the
last 10 to the last 5 (2 more than the original, given the larger
context support in today's models), since tool handshakes now go
through the tools related side channel in the http handshake and
aren't morphed into the normal user-assistant channel of the
handshake.
Rename path and tags/identifiers from Pdf2Text to PdfText
Rename the function call to pdf_to_text; this should also help
indicate the semantics more unambiguously, just in case,
especially for smaller models.
Usage Note
* Cleanup / fix some wording.
* Pick the handshaked chat history length from config
Ensure the settings info is up to date wrt the available tool
names by chaining a reshow of it onto the tools manager
initialisation.
Add logic to get a file from either the local file system or the
web, based on the url specified.
Update the pdfmagic module to use the same, so that it can support
both local and web based pdfs.
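Picking local versus web access based on the url scheme might look
roughly like this. get_file is a hypothetical helper name, not the
pdfmagic API; it only uses urllib from the standard library.

```python
import urllib.parse
import urllib.request


def get_file(url, timeout=30):
    """Hypothetical sketch: return (content_bytes, content_type) for a file:// or http(s):// url."""
    parsed = urllib.parse.urlparse(url)
    if parsed.scheme == "file":
        # Local file system: map the url path back to an os path
        path = urllib.request.url2pathname(parsed.path)
        with open(path, "rb") as f:
            return f.read(), None  # no content type for local files
    if parsed.scheme in ("http", "https"):
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read(), resp.headers.get_content_type()
    raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
```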
Bring in the debug module, which I had forgotten to commit after
moving the debug helper code from simpleproxy.py into it.
Also move the debug dump helper to its own module.
Also remember to specify the class name in quotes, similar to
referring to a class within a member of the same class, for python
type checking.
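Quoting a class name in a type hint is Python's forward reference
mechanism. The sketch shows both cases: a debug helper annotating a
class from another module (imported only under TYPE_CHECKING to
avoid a circular import), and a class referring to itself inside
its own body. simpleproxy.ProxyHandler, dump_headers and DumpNode
are hypothetical names.

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Only evaluated by type checkers, so there is no circular import at runtime.
    # "simpleproxy.ProxyHandler" is a hypothetical name for illustration.
    from simpleproxy import ProxyHandler


def dump_headers(handler: "ProxyHandler") -> str:
    """Debug helper: the quoted annotation is never resolved at runtime."""
    return "\n".join(f"{k}: {v}" for k, v in handler.headers.items())


class DumpNode:
    """Same idea within a class: its own name must be quoted inside its body."""

    def __init__(self, name: str, parent: Optional["DumpNode"] = None):
        self.name = name
        self.parent = parent

    def child(self, name: str) -> "DumpNode":
        return DumpNode(name, self)
```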
Add the --allowed.schemes config entry as a required config.
Set up the url validator.
Use it for urltext, urlraw and pdf2text.
This allows the user to control whether local file access is
enabled or not. By default, the sample simpleproxy.json config
file allows local file access.
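A url validator driven by an allowed schemes list can be sketched
as below, assuming the allowed.schemes config entry is loaded as a
list of scheme strings such as ["file", "http", "https"].
make_url_validator is a hypothetical name.

```python
import urllib.parse


def make_url_validator(allowed_schemes):
    """Hypothetical sketch: return a checker accepting only urls with an allowed scheme."""
    allowed = {s.lower() for s in allowed_schemes}

    def validate(url):
        parsed = urllib.parse.urlparse(url)
        scheme = parsed.scheme.lower()
        if scheme not in allowed:
            return False, f"scheme {scheme!r} not allowed"
        if scheme in ("http", "https") and not parsed.netloc:
            return False, "missing host"
        return True, "ok"

    return validate
```

Dropping "file" from the list is then enough to switch off local
file access for every tool that shares the validator.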
As I was seeing the truncated message even for stripped plain text
web access, relooking at that initial go at truncating revealed an
oversight: the truncation logic triggered any time
iResultMaxDataLength was greater than 0, irrespective of whether
the actual result was smaller than the allowed limit, thus
appending the truncated message to the end of the result
unnecessarily.
Have fixed that oversight.
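The corrected condition boils down to checking both the limit and
the actual length. A minimal sketch, where limit_result and the
marker text are hypothetical and max_len plays the role of
iResultMaxDataLength:

```python
TRUNCATED_MSG = "\n[...truncated]"  # hypothetical marker text


def limit_result(result, max_len=2048):  # 2K default, assumed here to mean 2048
    """Truncate only when a limit is set AND the result actually exceeds it.

    The earlier oversight amounted to checking only `max_len > 0`,
    which appended the marker even to short results.
    """
    if max_len > 0 and len(result) > max_len:
        return result[:max_len] + TRUNCATED_MSG
    return result
```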
Also, the recent any number of args based simpleproxy handshake
helper in toolweb seems to be working (at least for the existing
single arg based calls).
This makes the logic more generic, and also prepares for
additional parameters to be passed to the simpleproxy.py helper
handshakes, e.g. restricting the extracted contents of a pdf to
specified start and end page numbers or so.
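Passing any number of args usually means encoding them generically
as query parameters. A sketch, assuming the proxy takes its args
from the query string; build_proxy_url, the base url and the
startPage/endPage names in the usage note are hypothetical.

```python
import urllib.parse


def build_proxy_url(base, path, **args):
    """Hypothetical sketch: encode any number of tool call args as query parameters."""
    # Args that are None are left out, so optional params can simply be omitted
    query = urllib.parse.urlencode({k: v for k, v in args.items() if v is not None})
    url = f"{base.rstrip('/')}/{path.lstrip('/')}"
    return f"{url}?{query}" if query else url
```

For example, `build_proxy_url("http://localhost:8000", "pdftext",
url="file:///tmp/a.pdf", startPage=2, endPage=5)` adds the page
range without any change to the helper itself.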
Needed to tweak the description further for the ai model to
understand that it's ok to pass file:// scheme based urls.
Had forgotten how big web site pages have become, as well as the
need for a larger ResultDataLength for a one shot PDF read, to get
at least a good enough amount of content from large pdfs.
Allow the user to limit the max amount of result data returned to
the ai after a tool call.
It is set to 2K by default.
Update the pdf2text tool description to try to make the local
file path support more explicit.
Make the description a bit more explicit about supporting local
file paths as part of the url scheme, as the tested ai model was
cribbing about not supporting the file url scheme. Need to check
if this new description makes things better.
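For reference, a tool description spelling out file:// support
might look like the following, written in the OpenAI-style function
calling format. The description text is illustrative, not the
wording actually used.

```python
# Hypothetical sketch of the tool metadata in the OpenAI-style function calling format
PDF_TO_TEXT_TOOL = {
    "type": "function",
    "function": {
        "name": "pdf_to_text",
        "description": (
            "Read a pdf and return its contents as plain text. The url can be a web url "
            "(http:// or https://) or a local file path given as a file:// url, "
            "e.g. file:///home/user/doc.pdf"
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "http(s):// url or file:// url of the pdf",
                },
            },
            "required": ["url"],
        },
    },
}
```

Giving a concrete file:// example inside the description is one
way to nudge a model that otherwise assumes only web urls are valid.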
Convert the text to bytes for writing to the http pipe.
Ensure CORS is kept happy by passing Access-Control-Allow-Origin
in the header.
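Both points map onto Python's http.server: wfile.write() needs
bytes, and the CORS header is just another send_header() call.
TextHandler and the permissive "*" origin are assumptions for the
sketch.

```python
from http.server import BaseHTTPRequestHandler


class TextHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: send a text result back with a CORS header."""

    def do_GET(self):
        text = "hello from the proxy"
        body = text.encode("utf-8")  # wfile expects bytes, not str
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        # Without this, a page served from another origin cannot read the response
        self.send_header("Access-Control-Allow-Origin", "*")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet
```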
Move the actual chat handshake with the ai server into separate
code, to an extent.
Also add an initial anchor to trap handshake http error responses.
Come to think of it, it's better to move this into the SimpleChat
class.
Use finally to ensure any needed cleanup for handle_user_submit
occurs within itself.
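Separating the handshake, trapping http error responses, and
guaranteeing cleanup with finally can be sketched as follows. The
original code is JavaScript; this is a Python analogue where send
is a hypothetical callable returning (status, body), and
HandshakeError and the busy flag are illustrative.

```python
class HandshakeError(Exception):
    """Raised when the ai server answers with a non-2xx http status."""


def chat_handshake(send, payload):
    """Send one chat request via `send` (hypothetical callable returning (status, body))."""
    status, body = send(payload)
    if not 200 <= status < 300:
        # Trap http error responses here instead of treating them as chat replies
        raise HandshakeError(f"http {status}: {body}")
    return body


def handle_user_submit(ui_state, send, payload):
    """Hypothetical analogue of handle_user_submit: cleanup always runs via finally."""
    ui_state["busy"] = True
    try:
        return chat_handshake(send, payload)
    finally:
        # Runs on success, on HandshakeError, and on any other exception
        ui_state["busy"] = False
```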
Update the descriptions of set and get to indicate the possible
corner cases, or rather the semantics in such situations.
Also update the readme a bit. The auto save and restore mentioned
there has nothing to do with the new data store mechanism.
In the eagerness of the initial skeleton, had forgotten that the
root/generic tool call router takes care of parsing the json string
into an object before calling the tool, so there is no need to
parse it again. Fixed the same.
Hadn't converted the object based responses from the data store
related calls in the db web worker into json strings before
passing them to the generic tool response callback; fixed the same.
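The parse-once, stringify-on-return contract described above can be
sketched like this. The tool names, the dict-based store and
route_tool_call are hypothetical; the point is that tools receive
an already parsed object and the callback always receives a json
string.

```python
import json


def data_store_set(store, args):
    """Hypothetical data store tool: args is already a dict, parsed by the router."""
    store[args["key"]] = args["value"]
    return {"key": args["key"], "value": args["value"]}


def data_store_get(store, args):
    return {"key": args["key"], "value": store.get(args["key"])}


TOOLS = {"data_store_set": data_store_set, "data_store_get": data_store_get}


def route_tool_call(store, name, args_json, on_response):
    """Generic router: parse the json args once, call the tool, hand back a json string."""
    args = json.loads(args_json)        # parsed here, so tools must not parse again
    result = TOOLS[name](store, args)   # tools may return plain dicts
    on_response(json.dumps(result))     # callback always receives a json string
```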
- The thought of making ChatMsgEx.createAllInOne handle either a
string or an object is set aside for now, to keep things simple
and consistent to the greatest extent possible across different
flows.
And good news: the flow is working, at least for the overall happy
path. Need to check what corner cases are lurking; e.g. calling set
on the same key more than once seemed to show some flow oddity,
which I need to check later.
Also, maybe rename the data field in the response to get to value,
to match the field name convention of set. GPT-OSS is fine with it,
but micro / nano / pico models may trip up in the worst case, so
better to keep things consistent.
So mention that the ai can send complex objects in stringified
form. Once the type of value is set to string, the ai should
normally do so anyway, but there's no harm in hinting.
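A sketch of the consistent naming and the stringified value hint,
assuming a simple in-memory dict store. kv_set, kv_get and the
schema wording are hypothetical; overwriting on a repeated set is
just this sketch's choice and says nothing about the oddity noted
above.

```python
import json

# Hypothetical parameter schema: "value" is typed as a string, and the
# description hints that complex objects should be sent as json strings.
VALUE_PARAM = {
    "type": "string",
    "description": "Value to store. For complex objects, pass them as a json string.",
}

SET_PARAMS = {
    "type": "object",
    "properties": {"key": {"type": "string"}, "value": VALUE_PARAM},
    "required": ["key", "value"],
}


def kv_set(store, key, value):
    """In this sketch, setting the same key again simply overwrites the old value."""
    store[key] = value
    return {"key": key, "value": value}


def kv_get(store, key):
    """Use the same field name ("value") as set, so small models see one convention."""
    return {"key": key, "value": store.get(key)}
```

A complex object then round-trips as
`json.loads(kv_get(store, key)["value"])` after being stored with
`json.dumps(...)`.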