Pass a list to keep track of the numbering at different depths,
as well as to delay incrementing the numbering till the last minute.
Don't let recursion go beyond a predefined limit.
This simple scheme doesn't work as-is. Rather, the pdf outline seems
to follow the logic below:
If a child list is found when processing the current list, don't
increment the numbering.
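A minimal sketch of the above, assuming the outline arrives as a
nested python list where each entry is either a title string or a
child list; names and the limit value here are illustrative:

    MAX_DEPTH = 8  # assumed predefined recursion limit

    def outline_to_text(entries, numbering=None, depth=0, out=None):
        """Flatten a nested outline into numbered lines like '1.2 Title'."""
        numbering = [] if numbering is None else numbering
        out = [] if out is None else out
        if depth >= MAX_DEPTH:
            return out
        numbering.append(0)
        for entry in entries:
            if isinstance(entry, list):
                # child list found: recurse without incrementing the
                # current numbering, per the observed pdf outline logic
                outline_to_text(entry, numbering, depth + 1, out)
            else:
                numbering[-1] += 1  # increment delayed till actual use
                out.append("{} {}".format(".".join(map(str, numbering)), entry))
        numbering.pop()
        return out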
To make it easier for the ai model to understand that this works
mainly for html pages and not, say, xml or pdf, the server service
path is renamed from urltext to htmltext. For those other formats,
one needs to use the other explicit tool calls provided, like
fetchpdftext or fetchxmltext.
SearchWebText is also updated to use htmltext now.
At the simpleproxy end
* Add the tag names hierarchy before the contents of a tag.
* Remember to convert the tagDrops to lower case, as the HTMLParser
  base class seems to do that by default (a rough sketch of these
  simpleproxy-end bits follows below).
At the client ui end
* If undefined, remember to pass an empty list wrt tagDrops.
* Clean up the func description and also mention possible tagDrops
  for RSS feeds in the tool meta.
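The sketch; the class and member names here are illustrative, not
the repo's exact ones:

    from html.parser import HTMLParser

    class TagHierarchyParser(HTMLParser):

        def __init__(self, tagDrops):
            super().__init__()
            # HTMLParser lowercases tag names before invoking the
            # handlers, so normalise the drop list up front
            self.tagDrops = [t.lower() for t in (tagDrops or [])]
            self.tagStack = []
            self.text = []

        def handle_starttag(self, tag, attrs):
            self.tagStack.append(tag)

        def handle_endtag(self, tag):
            if self.tagStack and self.tagStack[-1] == tag:
                self.tagStack.pop()

        def handle_data(self, data):
            if data.strip() and not any(t in self.tagDrops for t in self.tagStack):
                # tag names hierarchy placed before the tag's contents
                self.text.append("{}: {}".format(
                    ".".join(self.tagStack), data.strip()))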
Update the initial skeleton wrt the tag drops logic
* had forgotten to convert the object to a json string at the
  client end
* had confused js with python and tried accessing the dict
  elements using . notation rather than [] notation in python.
* if the id-filtered tag to be dropped is found, from then on
  track all other tags of the same type (independent of id),
  so that start and end tags can be matched. Because the end tag
  callback won't have attributes, all other tags of the same type
  need to be tracked, for proper winding and unwinding to find the
  matching end tag (a rough sketch follows this list).
* remember to reset the tracked drop tag type to None once the
  matching end tag at the same depth is found. This should avoid
  some unnecessary unwinding.
* set/fix the type wrt tagDrops explicitly to the needed depth and
  ensure the dummy one and any explicitly got one are of the right
  type.
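The sketch of the drop tracking described above, assuming tagDrops
entries of the form "div#someid" (tag plus optional id); names are
hypothetical:

    from html.parser import HTMLParser

    class DropTrackingParser(HTMLParser):

        def __init__(self, tagDrops):
            super().__init__()
            self.tagDrops = [t.lower() for t in (tagDrops or [])]
            self.dropTag = None  # tag type currently being dropped
            self.dropDepth = 0   # nesting count of that tag type
            self.text = []

        def _matches_drop(self, tag, attrs):
            tagId = dict(attrs).get("id") or ""
            return tag in self.tagDrops or "{}#{}".format(tag, tagId) in self.tagDrops

        def handle_starttag(self, tag, attrs):
            if self.dropTag is not None:
                # already dropping: end tag callbacks carry no attributes,
                # so every tag of the same type (independent of id) has to
                # be tracked to wind/unwind to the matching end tag
                if tag == self.dropTag:
                    self.dropDepth += 1
                return
            if self._matches_drop(tag, attrs):
                self.dropTag = tag
                self.dropDepth = 1

        def handle_endtag(self, tag):
            if self.dropTag is not None and tag == self.dropTag:
                self.dropDepth -= 1
                if self.dropDepth == 0:
                    # matching end tag at the same depth found: reset,
                    # avoiding unnecessary unwinding later
                    self.dropTag = None

        def handle_data(self, data):
            if self.dropTag is None and data.strip():
                self.text.append(data.strip())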
Tested with the duckduckgo search engine; the unneeded div based
header is now avoided in the returned search result.
Rename search-drops to urltext-tag-drops, to indicate its more
generic semantics. Search drops specified in the UI by the user
will be mapped to the urltext-tag-drops header entry of a urltext
web fetch request.
Implement a crude urltext-tag-drops logic in TextHtmlParser.
If there is any mismatch between opening and closing tags in the
html being parsed, and in turn wrt the type of tag being targeted
for dropping, things can mess up.
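For illustration, the client-to-server mapping might look as below;
the endpoint path, port and the json encoding of the header value
are assumptions, not the repo's confirmed wire format:

    import json
    import urllib.request

    searchDrops = ["div#header", "nav"]  # drops specified by the user in the UI
    req = urllib.request.Request("http://127.0.0.1:8088/urltext?url=https%3A%2F%2Fexample.com")
    # UI search drops land in the urltext-tag-drops header entry
    req.add_header("urltext-tag-drops", json.dumps(searchDrops))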
Rename the path and tags/identifiers from Pdf2Text to PdfText.
Rename the function call to pdf_to_text; this should also help
indicate the semantics more unambiguously, just in case, especially
for smaller models.
Added logic to help get a file from either the local file system
or from the web, based on the url specified.
Update the pdfmagic module to use the same, so that it can support
both local as well as web based pdfs.
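A minimal sketch of that logic, assuming file:// urls (or plain
paths) map to local reads and http(s) urls to web fetches; the
helper name is hypothetical:

    import urllib.parse
    import urllib.request

    def get_file_data(url):
        """Return the raw bytes for url, from the local fs or the web."""
        parts = urllib.parse.urlparse(url)
        if parts.scheme in ("", "file"):
            with open(parts.path or url, "rb") as f:
                return f.read()
        with urllib.request.urlopen(url) as resp:
            return resp.read()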
Bring in the debug module, which I had forgotten to commit after
moving the debug helper code from simpleproxy.py to it.
Also move the debug dump helper to its own module.
Also remember to specify the class name in quotes, similar to
referring to a class within a member of the class, wrt python type
checking.
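i.e. the usual python forward-reference quoting (the class name
here is illustrative):

    class DebugDump:
        def clone(self) -> "DebugDump":
            # quotes needed: the name DebugDump is not yet bound while
            # the class body itself is being evaluated
            ...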
Add the --allowed.schemes config entry as a needed config.
Set up the url validator.
Use this wrt urltext, urlraw and pdf2text.
This allows the user to control whether local file access is
enabled or not. By default, in the sample simpleproxy.json config
file, local file access is allowed.
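A minimal sketch of the validator, assuming allowed.schemes is a
list like ["http", "https", "file"]; names and the error handling
style are illustrative:

    import urllib.parse

    def validate_url(url, allowedSchemes, allowedDomains):
        parts = urllib.parse.urlparse(url)
        if parts.scheme not in allowedSchemes:
            raise ValueError("scheme not allowed: {}".format(parts.scheme))
        # file urls carry no network location, so the domain check
        # applies only to web schemes
        if parts.scheme != "file" and parts.hostname not in allowedDomains:
            raise ValueError("domain not allowed: {}".format(parts.hostname))
        return parts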
Make the description a bit more explicit about supporting local
file paths as part of the url scheme, as the tested ai model was
cribbing about the file url scheme not being supported. Need to
check if this new description makes things better.
Convert the text to bytes for writing to the http pipe.
Ensure CORS is kept happy by passing Access-Control-Allow-Origin
in the header.
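Along these lines, assuming a http.server BaseHTTPRequestHandler;
the wildcard origin is an assumption:

    def send_text(handler, text):
        data = text.encode("utf-8")  # the http pipe wants bytes, not str
        handler.send_response(200)
        handler.send_header("Content-Type", "text/plain; charset=utf-8")
        # keep CORS happy for the browser based client ui
        handler.send_header("Access-Control-Allow-Origin", "*")
        handler.send_header("Content-Length", str(len(data)))
        handler.end_headers()
        handler.wfile.write(data)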
Add a newline between name and content in the xml representation
of the tool response, so that it is easier to distinguish things.
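Purely to illustrate the newline placement; the actual tag and
variable names of the tool response are not shown here:

    name = "urltext"
    content = "the fetched page text ..."
    xmlResp = "<result>\n{}\n{}\n</result>".format(name, content)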
Add github, linkedin and apnews domains to allowed.domains for
simpleproxy.py
Instead of using the shared bearer token as is, hash it with the
current year and use the hash.
Keep the /aum path out of the auth check.
In future the bearer token could be transformed more often, as well
as with an additional nonce/dynamic token got from the server during
the initial /aum handshake, as also a running counter and so on ...
NOTE: All this circus is not good enough, given that currently the
simpleproxy.py handshakes work over http. However these skeletons
are put in place for the future, if needed.
TODO: There is a once in a blue moon race when the year transitions
between the client generating the request and the server handling it.
But otherwise year transitions don't matter, because the client
always creates a fresh token, and the server checks for a year
change to generate a fresh token if required.
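A minimal sketch of the year-salted token, assuming sha256 over a
simple concatenation; the actual transform in simpleproxy.py may
differ:

    import datetime
    import hashlib

    def bearer_token(sharedToken, year=None):
        # client always computes this fresh; server caches and
        # regenerates on a year change, hence the rare boundary
        # race noted in the TODO above
        if year is None:
            year = datetime.datetime.now().year
        raw = "{}{}".format(sharedToken, year).encode("utf-8")
        return hashlib.sha256(raw).hexdigest()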
Add a config entry called bearer.insecure which will contain a
token used for bearer auth of http requests.
Make bearer.insecure and allowed.domains needed configs, and exit
the program if they aren't got through the cmdline or config file.
Ensure load_config gets called on encountering --config in the
cmdline, so that the user has control over whether the cmdline or
the config file decides the final value of any given parameter.
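A sketch of the ordering: --config is acted upon at the position it
appears, so whichever comes later on the cmdline wins; the json
config format follows the sample simpleproxy.json, other names are
illustrative:

    import json

    def load_config(path, config):
        with open(path) as f:
            config.update(json.load(f))

    def process_cmdline(args, config):
        i = 0
        while i < len(args):
            cmd, val = args[i], args[i + 1]
            if cmd == "--config":
                load_config(val, config)  # applied right here, not deferred
            else:
                config[cmd[2:]] = val  # strip the leading --
            i += 2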
Ensure that str type values in the cmdline are picked up directly,
without running them through ast.literal_eval, because otherwise one
would have to ensure, through the cmdline arg mechanism, that the
string quotes are retained for literal_eval.
Have the """ function note/description immediately below the def
line, so that it is interpreted as a function docstring.
Now both follow a similar mechanism and do the following
(a rough sketch follows this list):
* exit on finding any issue, so that things are in a known
  state from a usage perspective, without any confusion/overlook
* check whether the cmdlineArgCmd/configCmd being processed is a
  known one or not
* check that the value of the cmd is of the expected type
* have a generic flow which can accommodate more cmds in future
  in a simple way
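A generic flow along the above lines, with the docstring immediately
below the def line as noted earlier; the knownCmds table and its
entries are assumptions:

    import ast
    import sys

    knownCmds = {"bearer.insecure": str, "allowed.domains": list, "debug": bool}

    def process_cmd(cmd, value):
        """Validate one cmdlineArgCmd/configCmd entry; exit on any issue."""
        if cmd not in knownCmds:
            print("ERRR: unknown cmd", cmd)
            sys.exit(1)  # exit so things remain in a known state
        expected = knownCmds[cmd]
        if expected is str:
            # str values are used as-is; ast.literal_eval would force one
            # to retain quotes through the cmdline arg mechanism
            parsed = value
        else:
            parsed = ast.literal_eval(value) if isinstance(value, str) else value
        if not isinstance(parsed, expected):
            print("ERRR: {} expects a {}".format(cmd, expected.__name__))
            sys.exit(1)
        return parsed

Accommodating a future cmd then only needs a new knownCmds entry.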
Bing raised a challenge for chrome triggered search requests after
a few requests, which were spread a few minutes apart, while still
seemingly allowing wget based search to continue (again spread a
few minutes apart).
Added a simple helper to trace this; use --debug True to enable
the same.
Mimicking the got req in the generated req helps with duckduckgo
also, and not just yahoo.
Also update allowed.domains to allow a url generated by the ai when
trying to access bing's news aggregation url.
Tag the messages wrt ValidateUrl and UrlReq.
Also dump the req.
Move the check for --allowed.domains to ValidateUrl.
NOTE: Also, with mimicking of the user agent et al from the got
request to the generated request, yahoo search/news is returning
results now, instead of the bland error before (a rough sketch of
the mimicking follows).
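The sketch, assuming the got request comes via http.server (so
gotHeaders is a mapping-like object) and urllib drives the generated
request; the copied header subset is an assumption:

    import urllib.request

    def url_req(gotHeaders, url):
        req = urllib.request.Request(url)
        # mimic the browser-ish identity of the got request in the
        # generated request; helps yahoo and duckduckgo respond properly
        for key in ("User-Agent", "Accept", "Accept-Language"):
            if gotHeaders.get(key):
                req.add_header(key, gotHeaders[key])
        return urllib.request.urlopen(req)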