Commit Graph

12 Commits

Author SHA1 Message Date
hanishkvc 0628226ea1 SimpleChatTC:XmlFiltered: Avoid showing skipped tags as no content
Dont even insert skipped tags as tag blocks with empty content.

This should make the resultant xml cleaner and make it use less
space.
2025-12-04 19:41:39 +05:30
hanishkvc 143f9c0b1a SimpleChatTC:Rename fetch_web_url_text to fetch_html_text
To make it easier for the ai model to understand that this works
mainly for html pages and not say xml or pdf or so. For those
one needs to use other explict tool calls provided like fetchpdftext
or fetchxmltext or so

The server service path renamed from urltext to htmltext.

SearchWebText also updated to use htmltext now
2025-12-04 19:41:39 +05:30
hanishkvc 9f5c3d7776 SimpleChatTC:XmlFiltered: Use re with heirarchy of tags to filter
Rename xmltext to xmlfiltered.

This simplifies the filtering related logic as well as gives more
fine grained flexibility wrt filtering bcas of re.
2025-12-04 19:41:39 +05:30
hanishkvc 9ed1cf9886 SimpleChatTC:XMLFiltered: Retain xml tags with selective dropping
instead of the prefixing of tag heirarchy retain the xml structure
while parallely allowing unwanted tags and their contents to be
dropped.
2025-12-04 19:41:39 +05:30
hanishkvc b8bb258dd5 SimpleChatTC:XmlText: Cleanup initial go
At simpleproxy end

* Add the tag names hierarchy before contents of a tag

* Remember to convert the tagDrops to small case as HTMLParser base
  class seems to do that by default.

At the client ui end

* if undefined remember to pass a empty list wrt tagDrops.

* cleanup the func description and also mention possible tagDrops
  for RSS feeds in the tool meta
2025-12-04 19:41:39 +05:30
hanishkvc 92b0dd7d36 SimpleChatTC:SimpleProxy:XMLText: initial go
Take the existing urltext logic including its html parser and
strip it out to be simpler.
2025-12-04 19:41:39 +05:30
hanishkvc 2394d38d58 SimpleChatTC:Cleanup: General T2
Pretty print SimpleProxy gMe config

Dont ignore the got http response status text.

Update readme wrt why autoSecs
2025-12-04 19:41:39 +05:30
hanishkvc c316f5a2bd SimpleChatTC:WebTools:UrlText:HtmlParser: tag drops - refine
Update the initial skeleton wrt the tag drops logic

* had forgotten to convert object to json string at the client end
* had confused between js and python and tried accessing the dict
  elements using . notation rather than [] notation in python.
* if the id filtered tag to be dropped is found, from then on
  track all other tags of the same type (independent of id),
  so that start and end tags can be matched. bcas end tag call
  wont have attribute, so all other tags of same type need to
  be tracked, for proper winding and unwinding to try find
  matching end tag
* remember to reset the tracked drop tag type to None once matching
  end tag at same depth is found. should avoid some unnecessary
  unwinding.
* set/fix the type wrt tagDrops explicitly to needed depth and
  ensure the dummy one and any explicitly got one is of right type.

Tested with duckduckgo search engine and now the div based unneeded
header is avoided in returned search result.
2025-12-04 19:41:39 +05:30
hanishkvc 06fd41a88e SimpleChatTC:WebTools: urltext-tag-drops python side - skel
Rename search-drops to urltext-tag-drops, to indicate its more
generic semantic. Rather search drops specified in UI by user
will be mapped to urltext-tag-drops header entry of a urltext
web fetch request.

Implement a crude urltext-tag-drops logic in TextHtmlParser.
If there is any mismatch with opening and closing tags in the
html being parsed and inturn wrt the type of tag being targetted
for dropping, things can mess up.
2025-12-04 19:41:39 +05:30
hanishkvc 3b929f934f SimpleChatTC:SimpleProxy:Switch web flow to use file helpers
This also indirectly adds support for local file system access
through the web / fetch (ie urlraw and urltext) service request paths.
2025-12-04 19:41:39 +05:30
hanishkvc d012d127bf SimpleChatTC:SimpleProxy: Avoid circular deps wrt Type Checking
also move debug dump helper to its own module

also remember to specify the Class name in quotes, similar to
refering to a class within a member of th class wrt python type
checking.
2025-12-04 19:41:39 +05:30
hanishkvc 350d7d77e0 SimpleChatTC:SimpleProxy: Move web requests to its own module 2025-12-04 19:41:39 +05:30