Commit Graph

8 Commits

Author SHA1 Message Date
hanishkvc 15e99843db SimpleChatTC:PdfText:Numbering T2 - Need diff scheme
This increaments before itself, but we need to increment after
2025-12-04 19:41:39 +05:30
hanishkvc bd60437cc6 SimpleChatTC:PdfText: Numbering T1 - Diff Scheme needed
This simple scheme doesnt work. Rather the pdf outline seems
to follow below logic

If a child list is found when processing the current list, dont
increment the numbering.
2025-12-04 19:41:39 +05:30
hanishkvc 51707b5169 SimpleChatTC:PdfText:Add initial skeleton for outline 2025-12-04 19:41:39 +05:30
hanishkvc 1d1894ad14 SimpleChatTC:PdfText:Cleanup rename to follow a common convention
Rename path and tags/identifiers from Pdf2Text to PdfText

Rename the function call to pdf_to_text, this should also help
indicate semantic more unambiguously, just in case, especially
for smaller models.
2025-12-04 19:41:39 +05:30
hanishkvc 494d063657 SimpleChatTC:SimpleProxy: getting local / web file module ++
Added logic to help get a file from either the local file system
or from the web, based on the url specified.

Update pdfmagic module to use the same, so that it can support
both local as well as web based pdf.

Bring in the debug module, which I had forgotten to commit, after
moving debug helper code from simpleproxy.py to the debug module
2025-12-04 19:41:39 +05:30
hanishkvc a3beacf16a SimpleChatTC:SimpleProxy:Pdf2Text cleanup page number handling
Its not necessary to request a page number range always.

Take care of page number starting from 1 and underlying data having
0 as the starting index
2025-12-04 19:41:39 +05:30
hanishkvc d012d127bf SimpleChatTC:SimpleProxy: Avoid circular deps wrt Type Checking
also move debug dump helper to its own module

also remember to specify the Class name in quotes, similar to
refering to a class within a member of th class wrt python type
checking.
2025-12-04 19:41:39 +05:30
hanishkvc a7de002fd0 SimpleChatTC:SimpleProxy:Move pdf logic into its own module 2025-12-04 19:41:39 +05:30