Commit Graph

99 Commits

Author SHA1 Message Date
hanishkvc e9dbe21c67 SimpleSallap:SimpleMCP:Cleanup initial go by running and seeing
As expected dataclass field member mutable default values needing
default_factory.

Dont forget returning after sending error response.

TypeAlias type hinting flow seems to go beyond TYPE_CHECKING.
Also email.message.Message[str,str] not accepted, so keep things
simple wrt HttpHeaders for now.
2025-12-07 03:19:22 +05:30
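
The default_factory point above can be illustrated with a minimal sketch. The class and field names here (Sec, Config, allowed) are hypothetical and not taken from the actual config module; only the dataclass behaviour itself is standard Python.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sec:
    # Hypothetical security sub-config; field names are illustrative only.
    bAuthAlways: bool = True
    # Writing `allowed: List[str] = []` raises ValueError at class creation,
    # so a per-instance list is requested through default_factory instead.
    allowed: List[str] = field(default_factory=list)


@dataclass
class Config:
    # A nested dataclass is also built per instance through default_factory,
    # so two Config objects never share the same Sec object.
    sec: Sec = field(default_factory=Sec)


a, b = Config(), Config()
a.sec.allowed.append("example.com")
print(a.sec.allowed, b.sec.allowed)
```

Each instance gets its own list and its own Sec, so mutating one config does not leak into another.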
hanishkvc 79cfbbfc8a SimpleSallap:SimpleMCP:Allow auth check to be bypassed, if needed
By default bearer based auth check is done always whether in https
or http mode. However by updating the sec.bAuthAlways config entry
to false, the bearer auth check will be carried out only in https
mode.
2025-12-07 02:34:05 +05:30
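
A rough sketch of the decision described above, assuming the check reduces to "always, unless bAuthAlways is false and the server is in http mode". The function name auth_ok, its parameters and the plain dict of headers are assumptions made for illustration, not the project's actual code.

```python
import hmac
from typing import Mapping


def auth_ok(headers: Mapping[str, str], expected_token: str,
            b_auth_always: bool, is_https: bool) -> bool:
    """Return True if the request may proceed (hypothetical helper)."""
    # With bAuthAlways false, the bearer check is skipped in plain http mode.
    if not (b_auth_always or is_https):
        return True
    got = headers.get("Authorization", "")
    want = f"Bearer {expected_token}"
    # Constant-time comparison avoids leaking how much of the token matched.
    return hmac.compare_digest(got.encode(), want.encode())


print(auth_ok({}, "secret", b_auth_always=False, is_https=False))
print(auth_ok({}, "secret", b_auth_always=True, is_https=False))
```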
hanishkvc 9d6daaed8c SimpleSallap:SimpleMCP:Body Bytes to Json within mcp_run
Given that there could be other service paths beyond /mcp exposed
in future, and given that it is not necessary that their post body
contain json data, so move conversion to json to within mcp_run
handler.

While retaining reading of the body in the generic do_POST ensures
that the read size limit is implicitly enforced, whether /mcp now or
any other path in future.
2025-12-07 02:28:49 +05:30
hanishkvc fac947f9cd SimpleSallap:SimpleMCP:tools/list
Fix an oversight wrt ToolManager.meta, where I had created a dict
of name-keyed toolcall metas, instead of a simple list of toolcall
metas. Rather I blindly duplicated structure I used for storing the
tool calls in the tc_switch in the anveshika sallap client side
code.

Add dataclasses to mimic the MCP tools/list response. However wrt
the 2 odd differences between the MCP structure and OpenAi tools
handshake structure, for now I have retained the OpenAi tools hs
structure.

Add a common helper send_mcp to ProxyHandler given that both
mcp_toolscall and mcp_toolslist and even others like mcp_initialise
in future require a common response mechanism.

With above and bit more implement initial go at tools/list response.
2025-12-07 00:29:37 +05:30
hanishkvc 69be7f2029 SimpleSallap:SimpleMCP: Use ToolManager for some of needed logics
Build the list of tool calls

Trap some of the MCP post json based requests and map to related
handlers. Inturn implement the tool call execution handler.

Add some helper dataclasses wrt expected MCP response structure

TOTHINK: For now maintain id as a string and not int, with idea
to map it directly to callid wrt tool call handshake by ai model.

TOCHECK: For now shuffle the order of fields wrt jsonrpc and type
wrt MCP response related structures, assuming the order shouldnt
matter. Need to cross check.
2025-12-06 23:48:58 +05:30
hanishkvc 8700d522a5 SimpleSallap:SimpleMCP:ToolCalls beyond initial go
Define a typealias for HttpHeaders and use it wherever needed.
Inturn map this to email.message.Message and dict for now.
If and when python evolves Http Headers type into better one,
need to replace in only one place.

Add a ToolManager class which
* maintains the list of tool calls and inturn allows any given
  tool call to be executed and response returned along with needed
  meta data
* generate the overall tool calls meta data
* add ToolCallResponseEx which maintains full TCOutResponse for
  use by tc_handle callers

Avoid duplicating handling of some of the basic needed http header
entries.

Move checking for any dependencies before enabling a tool call into
respective tc??? module.
* for now this also demotes the logic from the previous fine grained
  per tool call based dependency check to a more global dep check at
  the respective module level
2025-12-06 22:38:10 +05:30
hanishkvc 0a445c875b SimpleSallap:SimpleMCP:Move towards post json based flow 2025-12-06 19:46:48 +05:30
hanishkvc 1db3a80f40 SimpleSallap:SimpleMCP:Duplicate simpleproxy for mcpish handshake
Will be looking at changing the handshake between AnveshikaSallap
web tech based client logic and this tool calls server to follow
the emerging interoperable MCP standard
2025-12-06 18:29:39 +05:30
hanishkvc cb1d91999d SimpleSallap:SimpleMCP:FileMagic switch to TCOutResponse 2025-12-06 18:21:52 +05:30
hanishkvc 4ce55eb0af SimpleSallap:SimpleMCP:TCPdf: update
Implement pdftext around toolcall class++ flow
2025-12-06 18:00:48 +05:30
hanishkvc 01a7800f51 SimpleSallap:SimpleMCP:TCPdf: Duplicate pdfmagic 2025-12-06 17:57:26 +05:30
hanishkvc 66038f99cf SimpleSallap:SimpleMCP:TCWeb:XMLFiltered initial go wrt new flow
Also remember to pick the tagDropREs from passed args object and
not from got http header.

Even TCHtmlText updated to get the tags to drop from passed args
object and not got http header. And inturn allow ai to pass this
optional arg, as it sees fit in co-ordination with user.
2025-12-06 17:49:08 +05:30
hanishkvc 5bf608dedd SimpleSallap:SimpleMCP:TCWeb:HtmlText updated for new flow
Rather initial go at the new flow, things require to be tweaked
later wrt final valid runnable flow
2025-12-06 17:47:07 +05:30
hanishkvc b17cd18bc5 SimpleSallap:SimpleMCP:TCWeb:Update TCUrlRaw + Helper
Now generic handle_urlreq and handle_urlraw updated to work with the
new ToolCall flow
2025-12-06 17:45:52 +05:30
hanishkvc cbb632eec0 SimpleSallap:SimpleMCP:TCWeb: Duplicate webmagic starting point
To help with switching to tool call class++ based flow
2025-12-06 17:31:23 +05:30
hanishkvc 47bd2bbc90 SimpleSallap:SimpleMCP:Update toolcall to suit calls needed 2025-12-06 01:47:38 +05:30
hanishkvc 452e610095 SimpleSallap:SimpleMCP:Skeletons for a toolcall class 2025-12-06 00:48:48 +05:30
hanishkvc a52ac5ddde SimpleSallap:SimpleProxy:use RequestHandler's setup after ssl hs
Instead of manually setting up rfile and wfile after switching to
ssl mode wrt a client request, now use the builtin setup provided
by the RequestHandler logic, so that these and any other needed
things will be setup as needed after the ssl hs based new socket,
just in case new things are needed in future.
2025-12-05 22:28:55 +05:30
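
One way to get this effect, sketched under assumptions: wrap the accepted socket inside the handler's setup() and then let the base class setup() build rfile and wfile from the wrapped socket. Because ThreadingHTTPServer constructs the handler in the per-request thread, the tls handshake then runs there and not in the main thread. The names make_server and the ssl_context attribute hung on the server object are hypothetical; the actual simpleproxy code may arrange this differently.

```python
import ssl
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Optional


class ProxyHandler(BaseHTTPRequestHandler):
    """Hypothetical handler; only the ssl related part is sketched."""

    def setup(self) -> None:
        # setup() is called from the handler's __init__, which
        # ThreadingHTTPServer runs in the per-request thread.
        ctx: Optional[ssl.SSLContext] = getattr(self.server, "ssl_context", None)
        if ctx is not None:
            # The blocking tls handshake happens here, off the main thread.
            self.request = ctx.wrap_socket(self.request, server_side=True)
        # The base class setup creates rfile and wfile from self.request.
        super().setup()

    def do_GET(self) -> None:
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(addr: str, port: int, certfile: Optional[str] = None,
                keyfile: Optional[str] = None) -> ThreadingHTTPServer:
    # The listening socket itself is left unwrapped on purpose.
    server = ThreadingHTTPServer((addr, port), ProxyHandler)
    server.ssl_context = None
    if certfile is not None:
        ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
        ctx.load_cert_chain(certfile, keyfile)
        server.ssl_context = ctx
    return server
```

Calling make_server("127.0.0.1", 3128, "cert.pem", "key.pem").serve_forever() would serve https, given real certificate files; the port and file names are placeholders.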
hanishkvc d470d7e47d SimpleSallap:SimpleProxy:DataClass Config simpleproxy updated 2025-12-05 21:42:41 +05:30
hanishkvc 277225dddd SimpleSallap:SimpleProxy:DataClass Config - p4
Minimal skeleton to allow dict [] style access to dataclass based
class's attributes/fields. Also get member function similar to dict.

This simplifies the flow and avoids duplicating data between
attribute and dict data related name and data spaces.
2025-12-05 21:10:01 +05:30
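
A minimal sketch of such dict [] style access, where the attributes stay the single source of truth and nothing is copied into a separate dict. The names DictyDataclass and Network, and the default values, are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Any


class DictyDataclass:
    """Mixin giving dataclass instances obj["name"] and obj.get("name")."""

    def _is_field(self, key: str) -> bool:
        return any(f.name == key for f in fields(self))

    def __getitem__(self, key: str) -> Any:
        if not self._is_field(key):
            raise KeyError(key)
        return getattr(self, key)

    def __setitem__(self, key: str, value: Any) -> None:
        if not self._is_field(key):
            raise KeyError(key)
        setattr(self, key, value)

    def get(self, key: str, default: Any = None) -> Any:
        return getattr(self, key) if self._is_field(key) else default


@dataclass
class Network(DictyDataclass):
    # Illustrative fields and defaults only.
    addr: str = ""
    port: int = 3128


net = Network()
net["port"] = 8080
print(net.port, net.get("missing", "n/a"))
```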
hanishkvc 4f790cb646 SimpleSallap:SimpleProxy:DataClass Config - P3
Add a helper base class to try map data class's attributes into
underlying dict.

TODO: this potentially duplicates data in both normal attribute
space as well as dict items space. And will require additional
standard helper logics to be overridden to ensure sync between
both space etal. Rather given distance from python internals for
a long time now, on pausing and thinking a bit, better to move
into a simpler arch where attributes are directly worked on for
dict [] style access.
2025-12-05 20:47:33 +05:30
hanishkvc 5560840099 SimpleSallap:SimpleProxy:DataclassDict driven Config - p2
Assigning default values wrt compound type class members
2025-12-05 18:36:43 +05:30
hanishkvc 4e7c7374d7 SimpleSallap:SimpleProxy:Make Config dataclass driven - p1
Instead of maintaining the config and some of the runtime states
identified as gMe as a generic literal dictionary which grows at
runtime with fields as required, try create it as a class of classes.

Inturn use dataclass annotation to let boilerplate code get auto
generated.

A config module created with above, however remaining part of the
code not yet updated to work with this new structure.

process_args and load_config moved into the new Config class.
2025-12-05 17:59:20 +05:30
hanishkvc 05697afc15 SimpleSallap:SimpleProxy:Trap all GET request handling
otherwise aum path was not handled immediately wrt exceptions.

this also ensures any future changes wrt get request handling
also get handled immediately wrt exceptions, that may be missed
by any targeted exception handling.
2025-12-05 03:04:52 +05:30
hanishkvc e52a7aa304 SimpleSallap:SimpleProxy: MultiThreading
Given that default HTTPServer handles only one connection and inturn
request at any given time, so if a client opens connection and then
doesnt do anything with it, it will block other clients by putting their
requests into network queue for long.

So to overcome the above issue switch to ThreadingHTTPServer, which
starts a new thread for each request.

Given that previously ssl wrapping was done wrt the main server socket,
even with switching to ThreadingHTTPServer, the handshake for ssl/tls
still occurs in the main thread before a child thread is started for
parallel request handling, thus the ssl handshake phase blocking other
client requests.

So now avoid wrapping ssl wrt the main server socket, instead wait for
ThreadingHTTPServer to start the new thread for a client request ie
after a connection is accepted for the client, before trying to wrap
the connection in ssl. This ensures that the ssl handshake occurs in
this child (ie client request related) thread. So some rogue entity
opening a http connection and not doing ssl handshake wont block.

Inturn in this case the rfile and wfile instances within the proxy
handler need to be remapped to the new ssl wrapped socket.
2025-12-05 01:53:12 +05:30
hanishkvc c4e0c03107 SimpleSallap:SimpleProxy: Enable https mode 2025-12-04 20:51:00 +05:30
hanishkvc efcd81aa2f SimpleChatTCRV:SimpleProxy:DumpHeaders 2025-12-04 19:41:39 +05:30
hanishkvc 9484bea71a SimpleChatTC:PdfText:Basic Outline and its Numbering done
Pass a list to keep track of the numbering at different depths
as well as to delay incrementing the numbering to the last min

Dont let recursion go beyond a predefined limit
2025-12-04 19:41:39 +05:30
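
A simplified sketch of numbering with a shared list of per-depth counters and a recursion limit. It assumes the outline arrives as a nested list where a child list follows its parent entry (the shape pypdf uses), but works on plain strings here to stay self-contained. The names number_outline and MAX_DEPTH are hypothetical, and the actual pdftext logic may differ in when it increments.

```python
from typing import List, Optional, Union

Outline = List[Union[str, "Outline"]]
MAX_DEPTH = 8  # assumed limit; the real value is not stated in the commit


def number_outline(items: Outline, counters: Optional[List[int]] = None,
                   depth: int = 0, out: Optional[List[str]] = None) -> List[str]:
    counters = [] if counters is None else counters
    out = [] if out is None else out
    if depth >= MAX_DEPTH:
        return out  # refuse to recurse any deeper
    # Make sure a counter exists for this depth and restart it.
    while len(counters) <= depth:
        counters.append(0)
    counters[depth] = 0
    for item in items:
        if isinstance(item, list):
            # A child list belongs to the previous entry: no increment here.
            number_outline(item, counters, depth + 1, out)
        else:
            # Increment only at the moment an entry is actually emitted.
            counters[depth] += 1
            label = ".".join(str(n) for n in counters[:depth + 1])
            out.append(f"{label} {item}")
    return out


outline = ["Intro", ["Scope", "Terms"], "Design", ["API", ["Auth"]]]
for line in number_outline(outline):
    print(line)
```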
hanishkvc 15e99843db SimpleChatTC:PdfText:Numbering T2 - Need diff scheme
This increments before itself, but we need to increment after
2025-12-04 19:41:39 +05:30
hanishkvc bd60437cc6 SimpleChatTC:PdfText: Numbering T1 - Diff Scheme needed
This simple scheme doesnt work. Rather the pdf outline seems
to follow below logic

If a child list is found when processing the current list, dont
increment the numbering.
2025-12-04 19:41:39 +05:30
hanishkvc 51707b5169 SimpleChatTC:PdfText:Add initial skeleton for outline 2025-12-04 19:41:39 +05:30
hanishkvc 0628226ea1 SimpleChatTC:XmlFiltered: Avoid showing skipped tags as no content
Dont even insert skipped tags as tag blocks with empty content.

This should make the resultant xml cleaner and make it use less
space.
2025-12-04 19:41:39 +05:30
hanishkvc 143f9c0b1a SimpleChatTC:Rename fetch_web_url_text to fetch_html_text
To make it easier for the ai model to understand that this works
mainly for html pages and not say xml or pdf or so. For those
one needs to use other explicit tool calls provided like fetchpdftext
or fetchxmltext or so

The server service path renamed from urltext to htmltext.

SearchWebText also updated to use htmltext now
2025-12-04 19:41:39 +05:30
hanishkvc 9f5c3d7776 SimpleChatTC:XmlFiltered: Use re with hierarchy of tags to filter
Rename xmltext to xmlfiltered.

This simplifies the filtering related logic as well as gives more
fine grained flexibility wrt filtering bcas of re.
2025-12-04 19:41:39 +05:30
hanishkvc 9ed1cf9886 SimpleChatTC:XMLFiltered: Retain xml tags with selective dropping
instead of the prefixing of tag hierarchy retain the xml structure
while in parallel allowing unwanted tags and their contents to be
dropped.
2025-12-04 19:41:39 +05:30
hanishkvc b8bb258dd5 SimpleChatTC:XmlText: Cleanup initial go
At simpleproxy end

* Add the tag names hierarchy before contents of a tag

* Remember to convert the tagDrops to lower case as HTMLParser base
  class seems to do that by default.

At the client ui end

* if undefined remember to pass an empty list wrt tagDrops.

* cleanup the func description and also mention possible tagDrops
  for RSS feeds in the tool meta
2025-12-04 19:41:39 +05:30
hanishkvc 92b0dd7d36 SimpleChatTC:SimpleProxy:XMLText: initial go
Take the existing urltext logic including its html parser and
strip it out to be simpler.
2025-12-04 19:41:39 +05:30
hanishkvc 2394d38d58 SimpleChatTC:Cleanup: General T2
Pretty print SimpleProxy gMe config

Dont ignore the got http response status text.

Update readme wrt why autoSecs
2025-12-04 19:41:39 +05:30
hanishkvc c316f5a2bd SimpleChatTC:WebTools:UrlText:HtmlParser: tag drops - refine
Update the initial skeleton wrt the tag drops logic

* had forgotten to convert object to json string at the client end
* had confused between js and python and tried accessing the dict
  elements using . notation rather than [] notation in python.
* if the id filtered tag to be dropped is found, from then on
  track all other tags of the same type (independent of id),
  so that start and end tags can be matched. bcas end tag call
  wont have attribute, so all other tags of same type need to
  be tracked, for proper winding and unwinding to try find
  matching end tag
* remember to reset the tracked drop tag type to None once matching
  end tag at same depth is found. should avoid some unnecessary
  unwinding.
* set/fix the type wrt tagDrops explicitly to needed depth and
  ensure the dummy one and any explicitly got one is of right type.

Tested with duckduckgo search engine and now the div based unneeded
header is avoided in returned search result.
2025-12-04 19:41:39 +05:30
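
The winding and unwinding described above can be sketched roughly as below. This assumes tagDrops is a list of (tag, id) pairs, which is a guess at its shape; the attribute names are hypothetical and the real TextHtmlParser likely does more.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class TextHtmlParser(HTMLParser):
    """Collect text, skipping everything inside tags listed in tag_drops."""

    def __init__(self, tag_drops: List[Tuple[str, str]]) -> None:
        super().__init__()
        # HTMLParser reports tag names in lower case, so normalise the drops.
        self.tag_drops = [(tag.lower(), tag_id) for tag, tag_id in tag_drops]
        self.drop_tag: Optional[str] = None  # tag type currently being dropped
        self.drop_depth = 0                  # open tags of that type so far
        self.text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self.drop_tag is None:
            if (tag, dict(attrs).get("id")) in self.tag_drops:
                self.drop_tag, self.drop_depth = tag, 1
        elif tag == self.drop_tag:
            # End tags carry no attributes, so every tag of the same type
            # is counted, independent of id, until the match is found.
            self.drop_depth += 1

    def handle_endtag(self, tag):
        if self.drop_tag is not None and tag == self.drop_tag:
            self.drop_depth -= 1
            if self.drop_depth == 0:
                self.drop_tag = None  # matching end tag found, stop dropping

    def handle_data(self, data):
        if self.drop_tag is None and data.strip():
            self.text.append(data.strip())


parser = TextHtmlParser([("DIV", "header")])
parser.feed('<div id="header"><div>menu</div>logo</div><div id="main">result</div>')
parser.close()
print(parser.text)
```

As the later skeleton commit notes, mismatched opening and closing tags in the page can throw the depth count off.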
hanishkvc 06fd41a88e SimpleChatTC:WebTools: urltext-tag-drops python side - skel
Rename search-drops to urltext-tag-drops, to indicate its more
generic semantic. Rather search drops specified in UI by user
will be mapped to urltext-tag-drops header entry of a urltext
web fetch request.

Implement a crude urltext-tag-drops logic in TextHtmlParser.
If there is any mismatch with opening and closing tags in the
html being parsed and inturn wrt the type of tag being targeted
for dropping, things can mess up.
2025-12-04 19:41:39 +05:30
hanishkvc 2cdf3f574c SimpleChatTC:SimpleProxy: Validate deps wrt enabled service paths
helps ensure only service paths that can be serviced are enabled

Use same to check for pypdf wrt pdftext
2025-12-04 19:41:39 +05:30
hanishkvc 1d1894ad14 SimpleChatTC:PdfText:Cleanup rename to follow a common convention
Rename path and tags/identifiers from Pdf2Text to PdfText

Rename the function call to pdf_to_text, this should also help
indicate semantic more unambiguously, just in case, especially
for smaller models.
2025-12-04 19:41:39 +05:30
hanishkvc 9efab62702 SimpleChatTC:SimpleProxy:Add generic arxiv.org entry to allowed 2025-12-04 19:41:39 +05:30
hanishkvc 3b929f934f SimpleChatTC:SimpleProxy:Switch web flow to use file helpers
This also indirectly adds support for local file system access
through the web / fetch (ie urlraw and urltext) service request paths.
2025-12-04 19:41:39 +05:30
hanishkvc 494d063657 SimpleChatTC:SimpleProxy: getting local / web file module ++
Added logic to help get a file from either the local file system
or from the web, based on the url specified.

Update pdfmagic module to use the same, so that it can support
both local as well as web based pdf.

Bring in the debug module, which I had forgotten to commit, after
moving debug helper code from simpleproxy.py to the debug module
2025-12-04 19:41:39 +05:30
hanishkvc a3beacf16a SimpleChatTC:SimpleProxy:Pdf2Text cleanup page number handling
Its not necessary to request a page number range always.

Take care of page number starting from 1 and underlying data having
0 as the starting index
2025-12-04 19:41:39 +05:30
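
A small sketch of the page number mapping, assuming an optional 1-based inclusive range from the caller and a 0-based page list underneath. The helper name page_indices and its parameters are hypothetical.

```python
from typing import Optional


def page_indices(num_pages: int, start: Optional[int] = None,
                 end: Optional[int] = None) -> range:
    """Map an optional 1-based inclusive page range to 0-based indices."""
    # A missing bound means "from the first page" or "to the last page".
    first = 1 if start is None else max(1, start)
    last = num_pages if end is None else min(num_pages, end)
    # Page N lives at index N-1; range() excludes its upper bound.
    return range(first - 1, last)


print(list(page_indices(5)))
print(list(page_indices(5, 2, 3)))
```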
hanishkvc d012d127bf SimpleChatTC:SimpleProxy: Avoid circular deps wrt Type Checking
also move debug dump helper to its own module

also remember to specify the Class name in quotes, similar to
referring to a class within a member of the class wrt python type
checking.
2025-12-04 19:41:39 +05:30
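
The pattern referred to above looks roughly like this. The module name simpleproxy and class name ProxyHandler in the guarded import are used only for illustration; the import is never executed at runtime, so no cycle arises.

```python
from typing import TYPE_CHECKING, List

if TYPE_CHECKING:
    # Seen only by type checkers; skipped at runtime, so the debug module
    # does not import the proxy module back (illustrative names).
    from simpleproxy import ProxyHandler


def dump_headers(ph: "ProxyHandler") -> List[str]:
    # The quoted annotation is not resolved at runtime.
    return [f"{key}: {value}" for key, value in ph.headers.items()]


class Node:
    # Same quoting trick as when a method refers to its own class.
    def child(self) -> "Node":
        return Node()
```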
hanishkvc 350d7d77e0 SimpleChatTC:SimpleProxy: Move web requests to its own module 2025-12-04 19:41:39 +05:30
hanishkvc a7de002fd0 SimpleChatTC:SimpleProxy:Move pdf logic into its own module 2025-12-04 19:41:39 +05:30
hanishkvc b18aed4449 SimpleChatTC:SimpleProxy: AuthAndRun hlpr for paths that check auth
Also trap any exceptions while handling and send exception info
to the client requesting service
2025-12-04 19:41:39 +05:30