hanishkvc
73a144c44d
SimpleChatTC:SimpleProxy:HtmlParser more generic and flexible
...
Also now track header, footer and nav so that they aren't captured.
2025-12-04 19:41:39 +05:30
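A rough sketch of what commit 73a144c44d describes: a parser built on Python's `html.parser.HTMLParser` that captures body text while skipping a configurable set of tags. The class name `TextHtmlParser`, the helper `html_to_text` and the exact skip list are assumptions made for illustration; the actual code in the commit may be organised differently.

```python
from html.parser import HTMLParser


class TextHtmlParser(HTMLParser):
    """Hypothetical sketch: collect body text, ignoring selected tags."""

    # Assumed skip list, pieced together from the commit messages.
    SKIP_TAGS = {"script", "style", "header", "footer", "nav"}

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0  # greater than 0 while inside any skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when inside <body> and outside every skipped tag.
        if self.in_body and self.skip_depth == 0:
            self.chunks.append(data)

    def get_text(self):
        return "".join(self.chunks)


def html_to_text(html):
    parser = TextHtmlParser()
    parser.feed(html)
    parser.close()
    return parser.get_text()
```

Keeping the skipped tags in a single set means a new tag can be excluded by adding one entry, which is one way to read "more generic and flexible".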
hanishkvc
9ff2c596ee
SimpleChatTC:SimpleProxy:Options just in case
2025-12-04 19:41:39 +05:30
hanishkvc
bf63b8f45a
SimpleChatTC:SimpleProxy:UrlText: Slightly better trimming
...
First identify lines that contain only whitespace and replace them
with lines containing only a newline char.
Next strip out adjacent lines if they contain only newlines.
2025-12-04 19:41:39 +05:30
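The two trimming steps from commit bf63b8f45a might look roughly like the sketch below. The function name `trim_blank_lines` is hypothetical, and the choice to keep one empty line per run (rather than dropping the run entirely) is an assumption; the commit message does not say which.

```python
def trim_blank_lines(text):
    # Step 1: a line holding only whitespace becomes an empty line.
    lines = ["" if line.strip() == "" else line for line in text.split("\n")]
    # Step 2: drop an empty line when the line kept just before it is
    # also empty, so each run of empty lines shrinks to a single one.
    out = []
    for line in lines:
        if line == "" and out and out[-1] == "":
            continue
        out.append(line)
    return "\n".join(out)
```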
hanishkvc
266e825c68
SimpleChatTC:SimpleProxy:UrlText: Try to strip empty lines somewhat
2025-12-04 19:41:39 +05:30
hanishkvc
b46bbc542a
SimpleChatTC:SimpleProxy:UrlText: Avoid style blocks also
2025-12-04 19:41:39 +05:30
hanishkvc
f493e1af59
SimpleChatTC:SimpleProxy:UrlText: Capture body except for scripts
2025-12-04 19:41:39 +05:30
hanishkvc
45b05df21b
SimpleChatTC:SimpleProxy: Switch to html.parser
...
As html can be malformed, the xml ElementTree XMLParser can't handle
it properly, so switch to the HTMLParser helper class that is
provided by python and try to extend it.
Currently a minimal skeleton just to get started, which captures
only the body contents.
2025-12-04 19:41:39 +05:30
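To illustrate the motivation given in commit 45b05df21b, the sketch below feeds the same malformed markup to `xml.etree.ElementTree` and to a minimal `HTMLParser` subclass that captures only body contents. The names `MALFORMED`, `xml_parse_ok`, `BodyOnlyParser` and `body_text` are made up for this example and are not taken from the commit.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Deliberately malformed: <p> and <br> are never closed.
MALFORMED = "<html><body><p>one<br>two</body></html>"


def xml_parse_ok(doc):
    """Return True if the strict XML parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


class BodyOnlyParser(HTMLParser):
    """Minimal skeleton: collect text that appears inside <body>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        if self.in_body:
            self.text += data


def body_text(doc):
    parser = BodyOnlyParser()
    parser.feed(doc)
    parser.close()
    return parser.text
```

Here the XML parser rejects the input because of the unclosed tags, while the event-based `HTMLParser` simply reports the tags and data it sees.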
hanishkvc
d5f4183f7c
SimpleChatTC:SimpleProxy: ElementTree, No _UrlopenRet
...
As _UrlopenRet is not exposed for use outside urllib, decode and
encode the data instead.
Add skeleton to try to get the html/xml tree top elements.
2025-12-04 19:41:39 +05:30
hanishkvc
6537559360
SimpleChatTC:SimpleProxy:Common UrlReq helper for UrlRaw & UrlText
...
Declare the result of UrlReq as a DataClass, so that one doesn't
goof up when updating and accessing members.
Duplicate UrlRaw into UrlText; text extraction from html still
needs to be added to UrlText next.
2025-12-04 19:41:39 +05:30
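One possible shape for the shared helper from commit 6537559360 is sketched below. The dataclass name `UrlReqResp`, its field names, the `tag` parameter, the 10 second timeout and the 502 fallback status are all assumptions for illustration; only the idea of returning a dataclass from a common request helper comes from the commit message.

```python
from dataclasses import dataclass
import urllib.request


@dataclass
class UrlReqResp:
    """Hypothetical result type shared by the UrlRaw and UrlText paths."""
    call_ok: bool
    http_status: int
    http_status_msg: str = ""
    content_type: str = ""
    content_data: str = ""


def url_req(url, tag):
    """Fetch url and return a UrlReqResp; never raises to the caller."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            # Decode to str here so callers never touch urllib's own
            # response object.
            data = response.read().decode("utf-8", errors="replace")
            ctype = response.headers.get("Content-Type", "")
            return UrlReqResp(True, response.status, "", ctype, data)
    except Exception as exc:
        return UrlReqResp(False, 502, f"WARN:{tag}:Failed:{exc}")
```

With named fields, a mistyped keyword at construction time fails immediately with a `TypeError` instead of silently creating a stray dictionary key.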
hanishkvc
e600e62e86
SimpleChatTC:SimpleProxy: Clean up a few messages
2025-12-04 19:41:39 +05:30
hanishkvc
3bab4de0e8
SimpleChatTC:SimpleProxy:UrlRaw: Fixup basic oversight wrt 1st go
2025-12-04 19:41:39 +05:30
hanishkvc
73ef9f7d46
SimpleChatTC:SimpleProxy:implement handle_urlraw
...
A basic go at it
2025-12-04 19:41:39 +05:30
hanishkvc
73054a5832
SimpleChatTC:SimpleProxy: Extract and check path, route to handlers
2025-12-04 19:41:39 +05:30
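The path extraction and routing named in commit 73054a5832 could be approached along these lines with the standard `http.server` and `urllib.parse` modules. The route names `/urlraw` and `/urltext`, the `url` query key, the helper `split_request_path` and the placeholder handler bodies are assumptions, not details confirmed by the log.

```python
import http.server
import urllib.parse


def split_request_path(raw_path):
    """Return (route, target url or None) for a request path."""
    parsed = urllib.parse.urlparse(raw_path)
    query = urllib.parse.parse_qs(parsed.query)
    url = query.get("url", [None])[0]  # 'url' query key is an assumption
    return parsed.path, url


class ProxyHandler(http.server.BaseHTTPRequestHandler):
    # Assumed mapping from request path to handler method name.
    ROUTES = {"/urlraw": "handle_urlraw", "/urltext": "handle_urltext"}

    def do_GET(self):
        route, url = split_request_path(self.path)
        name = self.ROUTES.get(route)
        if name is None or not url:
            self.send_error(400, "WARN:UnknownPathOrMissingUrl")
            return
        getattr(self, name)(url)

    def _reply(self, text):
        body = text.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def handle_urlraw(self, url):
        # Placeholder: a real handler would fetch and return the content.
        self._reply(f"urlraw requested for {url}")

    def handle_urltext(self, url):
        # Placeholder: a real handler would fetch and extract text.
        self._reply(f"urltext requested for {url}")
```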
hanishkvc
c99788e290
SimpleChatTC:SimpleProxy: Cleanup for basic run
2025-12-04 19:41:39 +05:30
hanishkvc
80fd065993
SimpleChatTC:SimpleProxy: Start server, Show requested path
2025-12-04 19:41:39 +05:30
hanishkvc
05c0ade8be
SimpleChatTC:SimpleProxy:Process args --port
2025-12-04 19:41:39 +05:30
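For the `--port` handling in commit 05c0ade8be, a minimal sketch using `argparse` is shown below. The function name `process_args` and the default port value are assumptions; the log does not say how the argument is parsed or what the default is.

```python
import argparse


def process_args(argv=None):
    """Parse command line arguments; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(description="Simple proxy sketch")
    # The default of 3128 is an assumed value for illustration only.
    parser.add_argument("--port", type=int, default=3128,
                        help="port on which the proxy server listens")
    return parser.parse_args(argv)
```

The parsed value could then be passed to something like `http.server.HTTPServer(("", args.port), handler_class)` to start listening.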