Commit Graph

25 Commits

Author SHA1 Message Date
hanishkvc d0b9103176 SimpleChatTC:SimpleProxy:Try mimic real client using got req info
ie include User-Agent, Accept-Language and Accept in the generated
request using equivalent values got in the request being proxied.
2025-12-04 19:41:39 +05:30
hanishkvc e6e0adbe90 SimpleChatTC:SimpleProxy: Some debug prints which give info 2025-12-04 19:41:39 +05:30
hanishkvc 840cab0b1c SimpleChatTC:SimpleProxy: Include a sample config file
with allowed domains set to few sites in general to show its use

this includes some sites which allow search to be carried out
through them as well as provide news aggregation
2025-12-04 19:41:39 +05:30
hanishkvc 370326b1ec SimpleChatTC:SimpleProxy: Cleanup domain filtering and general
Had confused between js and python wrt accessing dictionary
contents and its consequence on non existent key. Fixed it.

Use different error ids to distinguish between failure in common
urlreq and the specific urltext and urlraw helpers.
2025-12-04 19:41:39 +05:30
hanishkvc 71ad609db6 SimpleChatTC:SimpleProxy: AllowedDomains based filtering
Allow fetching from only specified allowed.domains
2025-12-04 19:41:39 +05:30
hanishkvc 58954c8814 SimpleChatTC:SimpleProxy: Update doc following python convention 2025-12-04 19:41:39 +05:30
hanishkvc 62dcd506e3 SimpleChatTC:SimpleProxy:Allow for loading json based config file
The config entries should be named same as their equivalent cmdline
argument entries but without the -- prefix
2025-12-04 19:41:39 +05:30
hanishkvc 98d43fac7f SimpleChatTC:WebFetch: Try confirm simpleproxy before enabling 2025-12-04 19:41:39 +05:30
hanishkvc d04c8cd38d SimpleChatTC:SimpleProxy: Ensure CORS related headers sent always
Add a new send headers common helper and use the same wrt the
overridden send_error as well as do_OPTIONS

This ensures that if there is any error during proxy opertions,
the send_error propogates to the fetch from any browser properly
without browser intercepting it with a CORS error
2025-12-04 19:41:39 +05:30
hanishkvc 73a144c44d SimpleChatTC:SimpleProxy:HtmlParser more generic and flexible
also now track header, footer and nav so that they arent captured
2025-12-04 19:41:39 +05:30
hanishkvc 9ff2c596ee SimpleChatTC:SimpleProxy:Options just in case 2025-12-04 19:41:39 +05:30
hanishkvc bf63b8f45a SimpleChatTC:SimpleProxy:UrlText: Slightly better trimming
First identify lines which have only whitespace and replace them
with lines with only newline char in them.

Next strip out adjacent lines, if they have only newlines
2025-12-04 19:41:39 +05:30
hanishkvc 266e825c68 SimpleChatTC:SimpleProxy:UrlText: Try strip empty lines some what 2025-12-04 19:41:39 +05:30
hanishkvc b46bbc542a SimpleChatTC:SimpleProxy:UrlText: Avoid style blocks also 2025-12-04 19:41:39 +05:30
hanishkvc f493e1af59 SimpleChatTC:SimpleProxy:UrlText: Capture body except for scripts 2025-12-04 19:41:39 +05:30
hanishkvc 45b05df21b SimpleChatTC:SimpleProxy: Switch to html.parser
As html can be malformed, xml ElementTree XMLParser cant handle
the same properly, so switch to the HtmlParser helper class that is
provided by python and try extend it.

Currently a minimal skeleton to just start it out, which captures
only the body contents.
2025-12-04 19:41:39 +05:30
hanishkvc d5f4183f7c SimpleChatTC:SimpleProxy: ElementTree, No _UrlopenRet
As _UrlopenRet not exposed for use outside urllib, so decode and
encode the data.

Add skeleton to try get the html/xml tree top elements
2025-12-04 19:41:39 +05:30
hanishkvc 6537559360 SimpleChatTC:SimpleProxy:Common UrlReq helper for UrlRaw & UrlText
Declare the result of UrlReq as a DataClass, so that one doesnt
goof up wrt updating and accessing members.

Duplicate UrlRaw into UrlText, need to add Text extracting from
html next for UrlText
2025-12-04 19:41:39 +05:30
hanishkvc e600e62e86 SimpleChatTC:SimpleProxy: Cleanup few messages 2025-12-04 19:41:39 +05:30
hanishkvc 3bab4de0e8 SimpleChatTC:SimpleProxy:UrlRaw: Fixup basic oversight wrt 1st go 2025-12-04 19:41:39 +05:30
hanishkvc 73ef9f7d46 SimpleChatTC:SimpleProxy:implement handle_urlraw
A basic go at it
2025-12-04 19:41:39 +05:30
hanishkvc 73054a5832 SimpleChatTC:SimpleProxy: Extract and check path, route to handlers 2025-12-04 19:41:39 +05:30
hanishkvc c99788e290 SimpleChatTC:SimpleProxy: Cleanup for basic run 2025-12-04 19:41:39 +05:30
hanishkvc 80fd065993 SimpleChatTC:SimpleProxy: Start server, Show requested path 2025-12-04 19:41:39 +05:30
hanishkvc 05c0ade8be SimpleChatTC:SimpleProxy:Process args --port 2025-12-04 19:41:39 +05:30