llama.cpp

Commit Graph

Author	SHA1	Message	Date
hanishkvc	9e97880dde	SimpleChatTC:SimpleProxy:Cleanup avoid logically duplicate debug log	2025-12-04 19:41:39 +05:30
hanishkvc	4c1c363504	SimpleChatTC:SimpleProxy: debug dumps to identify funny bing bing raised a challenge for chrome triggered search requests after few requests, which were spread few minutes apart, while still seemingly allowing wget based search to continue (again spread few minutes apart). Added a simple helper to trace this, use --debug True to enable same.	2025-12-04 19:41:39 +05:30
hanishkvc	dbb24fec77	SimpleChatTC:ToolResponse: Use browser dom for xml/html safe Instead of simple concatenating of tool call id, name and result now use browser's dom logic to create the xml structure used for now to store these within content field. This should take care of transforming / escaping any xml special chars in the result, so that extracting them later for putting into different fields in the server handshake doesnt have any problem.	2025-12-04 19:41:39 +05:30
hanishkvc	90d232dc4a	SimpleChatTC:SimpleProxy: Update readme wrt mimicking client req ie during proxying	2025-12-04 19:41:39 +05:30
hanishkvc	74226a0992	SimpleChatTC:ToolCall response relaxed handling Use DOMParser parseFromString in text/html mode rather than text/xml as it makes it more relaxed without worrying about special chars of xml like & etal	2025-12-04 19:41:39 +05:30
hanishkvc	c109da870f	SimpleChatTC:SimpleProxy: mimicking got req helps wrt duckduckgo mimicking got req in generated req helps with duckduckgo also and not just yahoo. also update allowed.domains to allow a url generated by ai when trying to access the bing's news aggregation url	2025-12-04 19:41:39 +05:30
hanishkvc	bebf846157	SimpleChatTC:SimpleProxy:Cleanup a bit The tagging of messages wrt ValidateUrl and UrlReq Also dump req Move check for --allowed.domains to ValidateUrl NOTE: Also with mimicking of user agent etal from got request to the generated request, yahoo search/news is returning results now, instead of the bland error before.	2025-12-04 19:41:39 +05:30
hanishkvc	d0b9103176	SimpleChatTC:SimpleProxy:Try mimic real client using got req info ie include User-Agent, Accept-Language and Accept in the generated request using equivalent values got in the request being proxied.	2025-12-04 19:41:39 +05:30
hanishkvc	e6e0adbe90	SimpleChatTC:SimpleProxy: Some debug prints which give info	2025-12-04 19:41:39 +05:30
hanishkvc	17365ed4b9	SimpleChatTC: Update readme a bit	2025-12-04 19:41:39 +05:30
hanishkvc	840cab0b1c	SimpleChatTC:SimpleProxy: Include a sample config file with allowed domains set to few sites in general to show its use this includes some sites which allow search to be carried out through them as well as provide news aggregation	2025-12-04 19:41:39 +05:30
hanishkvc	370326b1ec	SimpleChatTC:SimpleProxy: Cleanup domain filtering and general Had confused between js and python wrt accessing dictionary contents and its consequence on non existent key. Fixed it. Use different error ids to distinguish between failure in common urlreq and the specific urltext and urlraw helpers.	2025-12-04 19:41:39 +05:30
hanishkvc	71ad609db6	SimpleChatTC:SimpleProxy: AllowedDomains based filtering Allow fetching from only specified allowed.domains	2025-12-04 19:41:39 +05:30
hanishkvc	58954c8814	SimpleChatTC:SimpleProxy: Update doc following python convention	2025-12-04 19:41:39 +05:30
hanishkvc	62dcd506e3	SimpleChatTC:SimpleProxy:Allow for loading json based config file The config entries should be named same as their equivalent cmdline argument entries but without the -- prefix	2025-12-04 19:41:39 +05:30
hanishkvc	aac5213104	SimpleChatTC:Tools: Show available tool names Dont allow tool names to be changed in settings page	2025-12-04 19:41:39 +05:30
hanishkvc	aa8c8040cf	SimpleChatTC:Cleanup:ChatProps: apiEP	2025-12-04 19:41:39 +05:30
hanishkvc	ad65659a63	SimpleChatTC:Cleanup:ChatProps: bTrimGarbage Also remove more inner/detailed stuff from show info in not bAll mode, given that many of the previous differentiated stuff have been moved into chatProps and inturn shown for now	2025-12-04 19:41:39 +05:30
hanishkvc	82be13aa33	SimpleChatTC:Cleanup:ChatProps: bCompletionInsertStandardRolePrefix	2025-12-04 19:41:39 +05:30
hanishkvc	734f74c908	SimpleChatTC:Cleanup:ChatProps: bCompletionFreshChatAlways Moved into Me.chatProps	2025-12-04 19:41:39 +05:30
hanishkvc	78ccca056f	SimpleChatTC:Cleanup:ChatProps: iRecentUserMsgCnt Update Me class Update show settings Update show props info Update readme	2025-12-04 19:41:39 +05:30
hanishkvc	7409b29862	SimpleChatTC:Cleanup:ChatProps: Move bStream into it	2025-12-04 19:41:39 +05:30
hanishkvc	a54fa472dd	SimpleChatTC:ShowObjPropsEdit:Any depth trapping of ui setup - t2 Fix up the oversights wrt any depth trapping flow Remember to start the propWithTree being checked/trapped with : to indicate the root of the prop hierarchy and also use : as sep between the elements of the props hierarchy tree Also had forgotten about the goof up possible with using in in a condition statement to check for array to contain a entry of interest in JS, fixed it now.	2025-12-04 19:41:39 +05:30
hanishkvc	8d7eb68712	SimpleChatTC:ShowObjPropsEdit:Any depth trapping of ui setup Maintain the current property hierarchy to its root over recursive calls. Allow callers to specify the props to be trapped using the prop hierarchy. Pass the prop hierarchy to the fTrapper. This should allow one to trap any prop wrt its editing ui setup, irrespective of whether it is a prop of the main object passed, or a member of a child prop of the main object passed or so ... Update the setting up of ChatHistoryInCtxt and ApiEndPoint to follow the new semantic/flow.	2025-12-04 19:41:39 +05:30
hanishkvc	b19e754322	SimpleChatTC:Cleanup:Rename func arg to match semantic better	2025-12-04 19:41:39 +05:30
hanishkvc	03426f0276	SimpleChatTC:Cleanup:EditObjProps: rename vars followingConvention Part 1 - add el prefix wrt the element handle related vars	2025-12-04 19:41:39 +05:30
hanishkvc	3e490cefc5	SimpleChatTC:Cleanup: Move bTools and toolFetchProxyUrl into tools Also update the readme wrt same and related	2025-12-04 19:41:39 +05:30
hanishkvc	303af1800e	SimpleChatTC:ShowInfo:Clean up layout of showing of props data Also ensure when switching between sessions, the full set of props info is shown.	2025-12-04 19:41:39 +05:30
hanishkvc	0e21d67e8a	SimpleChatTC:ShowInfo: Allow showing minimal info set, if needed	2025-12-04 19:41:39 +05:30
hanishkvc	fc26e47222	SimpleChatTC:ShowObjPropsInfo: Use sections to indicate relations Also create a top level div wrt whole. And allow class to be specified for the same as well as the top level legend, optionally	2025-12-04 19:41:39 +05:30
hanishkvc	24ba85026e	SimpleChatTC:ShowInfo: Make logic recursive, avoid JSON.stringify	2025-12-04 19:41:39 +05:30
hanishkvc	34b2beea1a	SimpleChatTC:ShowInfo: Create and use common automated info show Also fetch info from ai-server, and place path and ctx size into current Me instance and include in show info.	2025-12-04 19:41:39 +05:30
hanishkvc	2a94cb3786	SimpleChatTC:Fetch:Proxy URL rename and in settings	2025-12-04 19:41:39 +05:30
hanishkvc	98d43fac7f	SimpleChatTC:WebFetch: Try confirm simpleproxy before enabling	2025-12-04 19:41:39 +05:30
hanishkvc	a6aa563a18	SimpleChatTC:WebFetch: Check for the specific proxy paths	2025-12-04 19:41:39 +05:30
hanishkvc	80dbbb89a5	SimpleChatTC:WebFetch: Enable only if something at proxyUrl NOTE: not a robust check, just tries to establish a http connection for now and doesnt really check if it is the specific proxy srvr of interest or not.	2025-12-04 19:41:39 +05:30
hanishkvc	fa0a6919cb	SimpleChatTC: Update/Cleanup readme	2025-12-04 19:41:39 +05:30
hanishkvc	8ca77e455a	SimpleChatTC:NonStreaming: Update oneshot mode wrt tool calls Take care of the possibility of content not being there as well as take care of retrieving the tool calls for further processing. With this tool calls should work in non streaming mode also	2025-12-04 19:41:39 +05:30
hanishkvc	3e0cf2a2df	SimpleChatTC:ObjPropsEdit: Obj within Obj aware fRefiner Use same to set a placeholder for Authorization entry in headers	2025-12-04 19:41:39 +05:30
hanishkvc	f874c69983	SimpleChatTC:UiShowObjPropsEdit allow refining	2025-12-04 19:41:39 +05:30
hanishkvc	6253c717b3	SimpleChatTC:Trappable UiShowObjPropsEdit for custom handling Use it to handle apiEP and iRecentUserMsgCnt in more user friendly way, where they get a selection to choose from.	2025-12-04 19:41:39 +05:30
hanishkvc	3718a39c06	SimpleChatTC:Use generic obj props edit for settings in general Bring more user controllable properties into this new settings ui	2025-12-04 19:41:39 +05:30
hanishkvc	756b128539	SimpleChatTC:UI:ObjPropEdits handle objects, use for gMe	2025-12-04 19:41:39 +05:30
hanishkvc	b771e42dc1	SimpleChatTC:UI:Common helper to edit obj members of few types Make the previously relatively generic flow wrt apiRequestOptions settings into a fully generic reusable by others flow. Rather had stopped short of it, when previously moved onto other things at that time.	2025-12-04 19:41:39 +05:30
hanishkvc	6e5b532313	SimpleChatTC:UI: el_get/el_set to avoid warnings	2025-12-04 19:41:39 +05:30
hanishkvc	04644761e6	SimpleChatTC:Tools: Pick proxy server address from document[gMe]	2025-12-04 19:41:39 +05:30
hanishkvc	9b55775e8a	SimpleChatTC:WebFetch: Update readme to reflect the new names	2025-12-04 19:41:39 +05:30
hanishkvc	42f91df261	SimpleChatTC:WebFetch:Trap Non Ok status and raise error So that the same error path is used for logical error wrt http req also, without needing a different path for it. Dont forget to return the resp text/json/..., so that the contents are passed along the promise then chain	2025-12-04 19:41:39 +05:30
hanishkvc	d04c8cd38d	SimpleChatTC:SimpleProxy: Ensure CORS related headers sent always Add a new send headers common helper and use the same wrt the overridden send_error as well as do_OPTIONS This ensures that if there is any error during proxy operations, the send_error propagates to the fetch from any browser properly without browser intercepting it with a CORS error	2025-12-04 19:41:39 +05:30
hanishkvc	c2fb0cd241	SimpleChatTC:WebFetch: Cleanup the names and descriptions a bit	2025-12-04 19:41:39 +05:30
hanishkvc	73a144c44d	SimpleChatTC:SimpleProxy:HtmlParser more generic and flexible also now track header, footer and nav so that they arent captured	2025-12-04 19:41:39 +05:30
hanishkvc	cd226e8dae	SimpleChatTC: Update readme wrt web fetch and related simple proxy	2025-12-04 19:41:39 +05:30
hanishkvc	8b950fd348	SimpleChatTC:WebFetch:UrlEnc url2fetch b4Passing toProxy asQuery Ensures that if the url being requested has any query strings in it then things dont get messed up, when the url to get inc its query is extracted from the proxy request's query string	2025-12-04 19:41:39 +05:30
hanishkvc	9ff2c596ee	SimpleChatTC:SimpleProxy:Options just in case	2025-12-04 19:41:39 +05:30
hanishkvc	9c7d6cc0e4	SimpleChatTC:WebUrlText:Update name and desc to see if preferred	2025-12-04 19:41:39 +05:30
hanishkvc	bf63b8f45a	SimpleChatTC:SimpleProxy:UrlText: Slightly better trimming First identify lines which have only whitespace and replace them with lines with only newline char in them. Next strip out adjacent lines, if they have only newlines	2025-12-04 19:41:39 +05:30
hanishkvc	266e825c68	SimpleChatTC:SimpleProxy:UrlText: Try strip empty lines some what	2025-12-04 19:41:39 +05:30
hanishkvc	82ab08ec1a	SimpleChatTC:WebUrl FetchStrip through simple proxy	2025-12-04 19:41:39 +05:30
hanishkvc	b46bbc542a	SimpleChatTC:SimpleProxy:UrlText: Avoid style blocks also	2025-12-04 19:41:39 +05:30
hanishkvc	f493e1af59	SimpleChatTC:SimpleProxy:UrlText: Capture body except for scripts	2025-12-04 19:41:39 +05:30
hanishkvc	45b05df21b	SimpleChatTC:SimpleProxy: Switch to html.parser As html can be malformed, xml ElementTree XMLParser cant handle the same properly, so switch to the HtmlParser helper class that is provided by python and try extend it. Currently a minimal skeleton to just start it out, which captures only the body contents.	2025-12-04 19:41:39 +05:30
hanishkvc	d5f4183f7c	SimpleChatTC:SimpleProxy: ElementTree, No _UrlopenRet As _UrlopenRet not exposed for use outside urllib, so decode and encode the data. Add skeleton to try get the html/xml tree top elements	2025-12-04 19:41:39 +05:30
hanishkvc	6537559360	SimpleChatTC:SimpleProxy:Common UrlReq helper for UrlRaw & UrlText Declare the result of UrlReq as a DataClass, so that one doesnt goof up wrt updating and accessing members. Duplicate UrlRaw into UrlText, need to add Text extracting from html next for UrlText	2025-12-04 19:41:39 +05:30
hanishkvc	e600e62e86	SimpleChatTC:SimpleProxy: Cleanup few messages	2025-12-04 19:41:39 +05:30
hanishkvc	c25b1968cd	SimpleChatTC:WebFetch: Update to use internal SimpleProxy.py	2025-12-04 19:41:39 +05:30
hanishkvc	3bab4de0e8	SimpleChatTC:SimpleProxy:UrlRaw: Fixup basic oversight wrt 1st go	2025-12-04 19:41:39 +05:30
hanishkvc	73ef9f7d46	SimpleChatTC:SimpleProxy:implement handle_urlraw A basic go at it	2025-12-04 19:41:39 +05:30
hanishkvc	73054a5832	SimpleChatTC:SimpleProxy: Extract and check path, route to handlers	2025-12-04 19:41:39 +05:30
hanishkvc	c99788e290	SimpleChatTC:SimpleProxy: Cleanup for basic run	2025-12-04 19:41:39 +05:30
hanishkvc	80fd065993	SimpleChatTC:SimpleProxy: Start server, Show requested path	2025-12-04 19:41:39 +05:30
hanishkvc	05c0ade8be	SimpleChatTC:SimpleProxy:Process args --port	2025-12-04 19:41:39 +05:30
hanishkvc	8fc74ef923	SimpleChatTC:WebFetchThroughProxy:Initial go creating request	2025-12-04 19:41:39 +05:30
hanishkvc	09ce19a95a	SimpleChatTC: update readme wrt promise related trapping	2025-12-04 19:41:39 +05:30
hanishkvc	f0a3886d1e	SimpleChatTC:Ensure fetch's promise chain is also trapped Dont forget to map members of got entity from fetch to things from saved original promise, bcas remember what is got is a promise. also add some comments around certain decisions and needed exploration	2025-12-04 19:41:39 +05:30
hanishkvc	77d3e43cb4	SimpleChatTC: Allow await in generated code that will be evald	2025-12-04 19:41:39 +05:30
hanishkvc	92e5b2133e	SimpleChatTC:Promises: trap normal fetch (dont care await or not)	2025-12-04 19:41:39 +05:30
hanishkvc	0241b7b469	SimpleChatTC:TrapPromise: log the trapping also possible refinement wrt trapping, if needed, added as comment all or allSettled to use or not is the question. whether to wait for a round trip through the related event loop or not is also a question.	2025-12-04 19:41:39 +05:30
hanishkvc	3d661793ef	SimpleChatTC:ChatMessageEx: 1st go at trying to track promises	2025-12-04 19:41:39 +05:30
hanishkvc	7dbbc46390	SimpleChatTC:ChatMessageEx: Better tool result extractor	2025-12-04 19:41:39 +05:30
hanishkvc	61b70bfa5d	SimpleChatTC:Readme: Updated wrt new relativelyProper toolCallsHS Also update the sliding window context size to last 9 chat messages so that there is a sufficiently large context for multi turn tool calls based adjusting by ai and user, without needing to go full hog, which has the issue of overflowing the currently set context window wrt the loaded ai model.	2025-12-04 19:41:39 +05:30
hanishkvc	152deb5d5a	SimpleChatTC:ChatMessageEx:While at it also ns_delete these common helpers avoid needing ignore tagging to ts-check, in places where valid constructs have been used which go beyond strict structured js handling that is tried to be achieved using it, but are still valid and legal.	2025-12-04 19:41:39 +05:30
hanishkvc	cc65a2f7a3	SimpleChatTC:ChatMessageEx: Build tool role result fully Expand the xml format id, name and content in content field of tool result into appropriate fields in the tool result message sent to the genai/llm engine on the server.	2025-12-04 19:41:39 +05:30
hanishkvc	ebc7f88b53	SimpleChatTC:Propagate toolcall id through tool call chain Use HTMLElement's dataset to maintain tool call id along with the element which maintains the toolname. Pass it along to the tools manager and inturn the actual tool calls and through them to the web worker handling the tool call related code and inturn returning it back as part of the obj which is used to return the tool call result. Embed the tool call id, function name and function result into the content field of chat message in terms of a xml structure Also make use of tool role to send back the tool call result. Do note that currently the id, name and content are all embedded into the content field of the tool role message sent to the ai engine on the server. NOTE: Use the user query entry area for showing tool call result in the above mentioned xml form, as well as for user to enter their own queries. Based on presence of the xml format data at beginning the logic will treat it as a tool result and if not then as a normal user query. The css has been updated to help show tool results/msgs in a lightyellow background	2025-12-04 19:41:39 +05:30
hanishkvc	2bb3d747e6	SimpleChatTC:ChatMessageEx: send tool_calls, only if needed	2025-12-04 19:41:39 +05:30
hanishkvc	2ef201ff8d	SimpleChatTC:Load allows old and new ChatMessage(Ex) formats	2025-12-04 19:41:39 +05:30
hanishkvc	475858a4b3	SimpleChatTC:ChatMessageEx: Cleanup remaining stuff wrt ChatMessageEx related required flow as well as avoid warnings	2025-12-04 19:41:39 +05:30
hanishkvc	963b9f4661	SimpleChatTC:ChatMessageEx: Recent chat users upd Users of recent_chat updated to work with ChatMessageEx As part of same recent_chat_ns also added, for the case where the array of chat messages can be passed as is ie in the chat mode, provided it has only the network handshake representation of the messages.	2025-12-04 19:41:39 +05:30
hanishkvc	4d9e3d1566	SimpleChatTC:ChatMessageEx: Upd Add, rm sysPromptAtBeginOnly hlpr Simplify Add semantic by expecting any validation of stuff before adding to be done by the callers of Add and not by add itself. Also update it to expect ChatMessageEx object Update all users of add to follow the new syntax and semantic. Remove the old and unused AddSysPromptOnlyAtBegin helper	2025-12-04 19:41:39 +05:30
hanishkvc	c65c1d5f0f	SimpleChatTC:ChatMessageEx: RecentChat, GetSystemLatest GetSystemLatest and its users updated wrt ChatMessageEx. RecentChat updated wrt ChatMessageEx. Also now irrespective of whether full history is being retrieved or only a subset, both cases refer to the ChatMessageEx instances in SimpleChat.xchat without creating new instances of anything.	2025-12-04 19:41:39 +05:30
hanishkvc	343d414dd3	SimpleChatTC:ChatMessageEx: ods load, system prompt related these have been updated to work with ChatMessageEx to an extent	2025-12-04 19:41:39 +05:30
hanishkvc	abbf927557	SimpleChatTC:ChatMessageEx: add update_oneshot response_extract logic moved directly into ChatMessageEx as update oneshot, with suitable adjustments. Inturn use the same directly.	2025-12-04 19:41:39 +05:30
hanishkvc	361f6968d1	SimpleChatTC:ChatMessage: remove ResponseExtractStream Use the equivalent update_stream directly added to ChatMessageEx. update_stream is also more generic to some extent and also directly implemented by the ChatMessageEx class.	2025-12-04 19:41:39 +05:30
hanishkvc	32dd63ee1d	SimpleChatTC:ChatMessageEx:cleanup, HasToolCalls, ContentEquiv Update HasToolCalls and ContentEquiv to work with new structure	2025-12-04 19:41:39 +05:30
hanishkvc	aa229a1f99	SimpleChatTC:ChatMessageEx: UpdateStream logic Rename ChatMessage to ChatMessageEx. Add typedefs for NSToolCall and NSChatMessage, they represent the way the corresponding data is structured in network hs. Add logic to build the ChatMessageEx from data got over network in streaming mode.	2025-12-04 19:41:39 +05:30
hanishkvc	2c29c2d589	SimpleChatTC:ChatMessage: AssistantResponse into chat message class Modify the constructor, newFrom and clear towards this goal.	2025-12-04 19:41:39 +05:30
hanishkvc	37faf8611a	SimpleChatTC: update descs to indicate use of web workers ie wrt the tool calls provided.	2025-12-04 19:41:39 +05:30
hanishkvc	c2112618c0	SimpleChatTC: Update readme.md wrt latest updates. 2k maxtokens	2025-12-04 19:41:39 +05:30
hanishkvc	1789f5f1e2	SimpleChatTC: Increase the sliding window context to Last4 QA As the tool calling, if enabled, will need access to last few user query and ai assistant responses (which will also include in them the tool call requests and the corresponding results), so that the model can build answers based on its tool call reqs and got responses, and also given that most of the models these days have sufficiently large context windows, so the sliding window context implemented by SimpleChat logic has been increased by default to include last 4 query and their responses roughly.	2025-12-04 19:41:39 +05:30
hanishkvc	a0f6762fda	SimpleChatTC: Web worker flow initial go cleanup Had forgotten to specify type as module wrt web worker, in order to allow it to import the toolsconsole module. Had forgotten to maintain the id of the timeout handler, which is needed to clear/stop the timeout handler from triggering, if tool call response is got well in time. As I am currently reverting the console redirection at end of handling a tool call code in the web worker message handler, I need to setup the redirection each time. Also I had forgotten to clear the console.log capture data space, before a new tool call code is executed, this is also fixed by this change. TODO: Need to abort the tool call code execution in the web worker if possible in future, if the client / browser side times out waiting for tool call response, ie if the tool call code is taking up too much time.	2025-12-04 19:41:39 +05:30
hanishkvc	148ec1c41a	SimpleChatTC: Get ready for decoupled tool call response tools manager/module * setup the web worker that will help execute the tool call related codes in a js environment that is isolated from the browsers main js environment * pass the web worker to the tool call providers, for them to use * dont wait for the result from the tool call, as it will be got later asynchronously through a message * allow users of the tools manager to register a call back, which will be called when ever a message is got from the web worker containing response wrt previously requested tool call execution. simplechat * decouple toolcall response handling and toolcall requesting logic * setup a timeout to take back control if tool call takes up too much time. Inturn help alert the ai model, that the tool call took up too much time and so was aborted, by placing a appropriate tagged tool response into user query area. * register a call back that will be called when response is got asynchronously wrt any requested tool calls. In turn take care of updating the user query area with response got wrt the tool call, along with tool response tag around it.	2025-12-04 19:41:39 +05:30
hanishkvc	2a8bd1c9e7	SimpleChatTC: Actual tool call implementations simplified These no longer need to worry about * setting up the console.log related redirection to capture the generated outputs, nor about * setting up a dynamic function for executing the needed tool call related code The web worker setup to help run tool calls in a relatively isolated environment independent of the main browser env, takes care of these. One needs to only worry about getting the handle to the web worker to use and inturn pass the need code wrt the tool call to it.	2025-12-04 19:41:39 +05:30
hanishkvc	14d67f6c3c	SimpleChatTC: Pass around structured objects wrt tool worker The request for code to run as well as the resultant response data both need to follow a structured object convention, so that it is easy to map a request and the corresponding response to some extent.	2025-12-04 19:41:39 +05:30
hanishkvc	510c65c721	SimpleChatTC: Initial skeleton of a simple toolsworker	2025-12-04 19:41:39 +05:30
hanishkvc	a6bccf934e	SimpleChatTC:ToolsConsole:Cleanup a bit, add basic set of notes Try ensure as well as verify that original console.log is saved and not overwritten. Throw an exception if things seem off wrt same. Also ensure to add a newline at end of console.log messages	2025-12-04 19:41:39 +05:30
hanishkvc	2701cb3a1e	SimpleChatTC: Move console.log trapping into its own module So that it can be used from different modules, if required.	2025-12-04 19:41:39 +05:30
hanishkvc	45d8a00738	SimpleChatTC: Update readme wrt --jinja argument and bit more	2025-12-04 19:41:39 +05:30
hanishkvc	a8c8176d09	SimpleChatTC: Tool Calling UI elements use up horizontal space	2025-12-04 19:41:39 +05:30
hanishkvc	1e5b638beb	SimpleChatTC: Update readme with bit more details, Cleaner UI Also avoid showing Tool calling UI elements, when not needed to be shown.	2025-12-04 19:41:39 +05:30
hanishkvc	bfe789706e	SimpleChatTC: Let user trigger tool call, instead of automatic Instead of automatically calling any requested tool by the GenAi / llm, that is from the tail end of the handle user submit btn click, Now if the GenAi/LLM has requested any tool to be called, then enable the Tool Run related UI elements and fill them with the tool name and tool args. In turn the user can verify if they are ok with the tool being called and the arguments being passed to it. Rather they can even fix any errors in the tool usage like the arithmetic expr to calculate that is being passed to simple_calculator or the javascript code being passed to run_javascript_function_code If user is ok with the tool call being requested, then trigger the same. The results if any will be automatically placed into the user query text area. User can cross verify if they are ok with the result and or modify it suitably if required and inturn submit the same to the GenAi/LLM.	2025-12-04 19:41:39 +05:30
hanishkvc	1fc44c971d	SimpleChatTC: Add ui elements for tool call verify and trigger Instead of automatically calling the requested tool with supplied arguments, rather allow user to verify things before triggering the tool. NOTE: User already provided control over tool_response before submitting it to the ai assistant.	2025-12-04 19:41:38 +05:30
hanishkvc	fd662b4b0b	SimpleChatTC: ToolCall hs info in normal assistant-user chat flow Also as part of same, wrap the request details in the assistant block using a similar tagging format as the tool_response in user block.	2025-12-04 19:41:38 +05:30
hanishkvc	30aa2f4c6b	SimpleChatTC: Update the readme.md wrt tool calling a bit	2025-12-04 19:41:38 +05:30
hanishkvc	63b5c6d76d	SimpleChatTC: Cleanup the function description a bit to better describe how it will be run, so that genai/llm while creating the code to run, will hopefully take care of any nuances required.	2025-12-04 19:41:38 +05:30
hanishkvc	a80da9a652	SimpleChatTC: Pass toolname to the tool handler So that when tool handler writes the result to the tc_switch, it can make use of the same, to write to the right location. NOTE: This also fixes the issue with my forgetting to rename the key in js_run wrt writing of result.	2025-12-04 19:41:38 +05:30
hanishkvc	f7284a8b89	SimpleChatTC: Move tool calling to tools, try trap async failures Move tool calling logic into tools module. Try trap async promise failures by awaiting results of tool calling and putting full thing in an outer try catch. Have forgotten the nitty gritties of JS flow, this might help, need to check.	2025-12-04 19:41:38 +05:30
hanishkvc	ef85ed41d4	SimpleChatTC: Clarify some type definitions to avoid warnings ie in vs code with ts-check	2025-12-04 19:41:38 +05:30
hanishkvc	a408e5e017	SimpleChatTC: Clearer description of toolcalls execution env Should hopefully ensure that the GenAi/LLM will generate appropriate code/expression as the argument to pass to these tool calls, to some extent.	2025-12-04 19:41:38 +05:30
hanishkvc	b4776da670	SimpleChatTC: Trap any exception raised during tool call and inform the GenAi/LLM about the same	2025-12-04 19:41:38 +05:30
hanishkvc	17c5daa52c	SimpleChatTC: Cleanup initial/1st go toolcall flow As output generated by any tool/function call is currently placed into the TextArea provided for End user (for their queries), bcas the GenAi (engine/LLM) may be expecting the tool response to be sent as a user role data with tool_response tag surrounding the results from the tool call. So also now at the end of submit btn click handling, the end user input text area is not cleared, if there was a tool call handled, for above reasons. Also given that running a simple arithmetic expression in itself doesnt generate any output, so wrap them in a console.log, to help capture the result using the console.log trapping flow that is already setup.	2025-12-04 19:41:38 +05:30
hanishkvc	301910c3a1	SimpleChatTC: Implement a simple toolcall handling flow Checks for toolname to be defined or not in the GenAi's response If toolname is set, then check if a corresponding tool/func exists, and if so call the same by passing it the GenAi provided toolargs as a object. Inturn the text generated by the tool/func is captured and put into the user input entry text box, with tool_response tag around it.	2025-12-04 19:41:38 +05:30
hanishkvc	fa63a86c71	SimpleChatTC:tooljs: Trap console.log and store in new result key The implementations of javascript and simple_calculator now use provided helpers to trap console.log messages when they execute the code / expression provided by GenAi and inturn store the captured log messages in the newly added result key in tc_switch This should help trap the output generated by the provided code or expression as the case maybe and inturn return the same to the GenAi, for its further processing.	2025-12-04 19:41:38 +05:30
hanishkvc	6d43011003	SimpleChatTC: Saner/Robust AssistantResponse content_equiv Previously if content was empty, it would have always sent the toolcall info related version even if there was no toolcall info in it. Fixed now to return empty string, if both content and toolname are empty.	2025-12-04 19:41:38 +05:30
hanishkvc	383c19c99b	SimpleChatTC: twins wrt streamed response handling As there could be failure wrt getting the response from the ai server some where in between a long response spread over multiple parts, the logic uses the latestResponse to cache the response as it is being received. However once the full response is got, one needs to transfer it to a new instance of AssistantResponse class, so that latestResponse can be cleared, while the new instance can be used in other locations in the flow as needed. Achieve the same now.	2025-12-04 19:41:38 +05:30
hanishkvc	53f85d09be	SimpleChatTC: AssistantResponse everywhere initial go Switch oneshot handler to use AssistantResponse, inturn currently only handle the normal content in the response. TODO: If any tool_calls in the oneshot response, it is currently not handled. Inturn switch the generic/toplevel handle response logic to use AssistantResponse class, given that both oneshot and the multipart/streaming flows use/return it. Inturn add trimmedContent member to AssistantResponse class and make the generic handle response logic to save the trimmed content into this. Update users of trimmed to work with this structure.	2025-12-04 19:41:38 +05:30
hanishkvc	3f3aa8d043	SimpleChatTC: AssistantResponse class initial go Make latestResponse into a new class based type instance wrt ai assistant response, which is what it represents. Move clearing, appending fields' values and getting assistant's response info (irrespective of a content or toolcall response) into this new class and inturn use the same.	2025-12-04 19:41:38 +05:30
hanishkvc	5a26831ad2	SimpleChatTC: Show toolcall being generated by ai - Temp	2025-12-04 19:41:38 +05:30
hanishkvc	e73bc4550b	SimpleChatTC: Avoid null content, Fix oversight wrt finish_reason I was wrongly checking for finish_reason to be non null, before trying to extract the genai content/toolcalls, have fixed this oversight with the new flow in progress. I had added a few debug logs to identify the above issue, need to remove them later. Note: given that debug logs are disabled by replacing the debug function during this program's initialisation, which I had forgotten about, I didn't get the debug messages and had to scratch my head a bit, before realising this and the other issue ;) Also either when I had originally implemented simplechat 1+ years back, or later due to changes on the server end, the streaming flow sends an initial null wrt the content, where it only sets the role. This was not handled in my flow on the client side, so a null was getting prepended to the chat messages/responses from the server. This has been fixed now in the new generic flow.	2025-12-04 19:41:38 +05:30
hanishkvc	63430dc9f7	SimpleChatTC: Extract streamed field - assume only 1f at any time Update response_extract_stream to check for which field is being currently streamed ie is it normal content or tool call func name or tool call func args and then return the field name and extracted value. Previously it was always assumed that only normal content will be returned. Currently it is assumed that the server will only stream one of the 3 supported fields at any time and not more than one of them at the same time. TODO: Have to also add logic to extract the reasoning field later, ie wrt gen ai models which give out their thinking. Have updated append_response to expect both the key and the value wrt the latestResponse object, which it will manipulate. Previously it was always assumed that content is what will be got and in turn appended.	2025-12-04 19:41:38 +05:30
hanishkvc	bfe7ef69fa	SimpleChatTC: Skeleton to handle diff fields when streaming Changed latestResponse type to an object instead of a string. In turn it contains entries for content, toolname and toolargs. Added a custom clear logic due to the same and used it to replace the previously simple assigning of empty string to latestResponse. For now in all places where latestResponse is used, I have replaced it with latestResponse.content. Next need to handle identifying the field being streamed and in turn append to it. Also need to add logic to call tool, when tool_call triggered by genai.	2025-12-04 19:41:38 +05:30
hanishkvc	32f5278e8c	SimpleChatTC: use tcpdump to dbg hs; check if ai aware of tools	2025-12-04 19:41:38 +05:30
hanishkvc	6167cdff9f	SimpleChatTC: Bring in the tools meta into the main flow	2025-12-04 19:41:38 +05:30
hanishkvc	46f0304105	SimpleChatTC: More generic tooljs, SimpCalc, some main skeleton Make tooljs structure and flow more generic Add a simple_calculator tool/function call logic Add initial skeleton wrt the main tools.mjs file.	2025-12-04 19:41:38 +05:30
hanishkvc	f1aa0ee778	SimpleChatTC: Add skeleton for a javascript interpreter tool call Define the meta that needs to be passed to the GenAi Engine. Define the logic that implements the tool call, if called. Implement the flow/structure such that a single tool calls implementation file can define multiple tool calls.	2025-12-04 19:41:38 +05:30
hanishkvc	48c9f07982	SimpleChatTC: Update test shell script a bit Enable streaming by default, to check the handshake before going on to change the code, given that I haven't looked into this for more than a year now and have been busy with totally different stuff. Also updated the user messages used for testing a bit	2025-12-04 19:41:38 +05:30
hanishkvc	9341c507f2	SimpleChatTools: Add boolean to allow user control of tools use	2025-12-04 19:41:38 +05:30
hanishkvc	4282a4277a	SimpleChatToolCalling: Test/Explore srvr initial hs using cmdline	2025-12-04 19:41:38 +05:30
Adrien Gallouët	ef75a89fdb	build : move _WIN32_WINNT definition to headers (#17736 ) Previously, cmake was forcing `_WIN32_WINNT=0x0A00` for MinGW builds. This caused "macro redefined" warnings with toolchains that define the version. This also removes the `GGML_WIN_VER` variable as it is no longer needed. Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-12-04 07:04:02 +01:00
Piotr Wilkin (ilintar)	c6d1a00aa7	Add a couple of file types to the text section (#17670 ) * Add a couple of file types to the text section * Format + regenerate index * Rebuild after rebase	2025-12-03 21:45:06 +01:00
Aleksander Grygier	e9f9483464	Use OpenAI-compatible `/v1/models` endpoint by default (#17689 ) * refactor: Data fetching via stores * chore: update webui build output * refactor: Use OpenAI compat `/v1/models` endpoint by default to list models * chore: update webui build output * chore: update webui build output	2025-12-03 20:49:09 +01:00
Andika Wasisto	41c5e02f42	webui: Fix zero pasteLongTextToFileLen to disable conversion being overridden (#17445 ) * webui: Fix zero pasteLongTextToFileLen to disable conversion being overridden Zero pasteLongTextToFileLen should disable the conversion, but it was overwritten with 2500. * Apply suggestions from code review * Update webui build	2025-12-03 20:45:17 +01:00
Pascal	e7c2cf1356	server: add router multi-model tests (#17704 ) (#17722 ) * llama-server: add router multi-model tests (#17704) Add 4 test cases for model router: - test_router_unload_model: explicit model unloading - test_router_models_max_evicts_lru: LRU eviction with --models-max - test_router_no_models_autoload: --no-models-autoload flag behavior - test_router_api_key_required: API key authentication Tests use async model loading with polling and graceful skip when insufficient models available for eviction testing. utils.py changes: - Add models_max, models_dir, no_models_autoload attributes to ServerProcess - Handle JSONDecodeError for non-JSON error responses (fallback to text) * llama-server: update test models to new HF repos * add offline * llama-server: fix router LRU eviction test and add preloading Fix eviction test: load 2 models first, verify state, then load 3rd to trigger eviction. Previous logic loaded all 3 at once, causing first model to be evicted before verification could occur. Add module fixture to preload models via ServerPreset.load_all() and mark test presets as offline to use cached models * llama-server: fix split model download on Windows --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-12-03 15:10:37 +01:00
Adrien Gallouët	1257491047	server : fix bad fmt, size() is a size_type (#17735 ) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-12-03 15:47:22 +02:00
Aldehir Rojas	0a8026e768	common : introduce composable PEG parser combinators for chat parsing (#17136 ) * common : implement parser combinators to simplify chat parsing * add virtual destructor to parser_base * fix memory leak from circular references of rules * implement gbnf grammar building * remove unused private variable * create a base visitor and implement id assignment as a visitor * fix const ref for grammar builder * clean up types, friend classes, and class declarations * remove builder usage from until_parser * Use a counter class to help assign rule ids * cache everything * add short description for each parser * create a type for the root parser * implement repetition parser * Make optional, one_or_more, and zero_or_more subclasses of repetition * improve context constructor * improve until parsing and add benchmarks * remove cached() pattern, cache in parser_base with specialized parsing functions for each parser * improve json parsing performance to better match legacy parsing * fix const auto * it for windows * move id assignment to classes instead of using a visitor * create named rules in the command r7b example * use '.' 
for any in GBNF * fix parens around choices in gbnf grammar * add convenience operators to turn strings to literals * add free-form operators for const char * to simplify defining literals * simplify test case parser * implement semantic actions * remove groups in favor of actions and a scratchpad * add built in actions for common operations * add actions to command r7b example * use std::default_searcher for platforms that don't have bm * improve parser_type handling and add cast helper * add partial result type to better control when to run actions * fix bug in until() * run actions on partial results by default * use common_chat_msg for result * add qwen3 example wip * trash partial idea and simplify * move action arguments to a struct * implement aho-corasick matcher for until_parser and to build exclusion grammars * use std::string for input, since std::string_view is incompatible with std::regex * Refactor tests * improve qwen3 example * implement sax-style parsing and refactor * fix json string in test * rename classes to use common_chat_ prefix * remove is_ suffix from functions * rename from id_counter to just counter * Final refactored tests * Fix executable name and editorconfig-checker * Third time's the charm... 
* add trigger parser to begin lazy grammar rule generation * working lazy grammar * refactor json rules now that we check for reachability * reduce pointer usage * print out grammars in example * rename to chat-peg-parser* and common_chat_peg_parser* * Revert unrelated changes * New macros for CMakeLists to enable multi-file compilations * starting unicode support * add unicode support to char_parser * use unparsed args as additional sources * Refactor tests to new harness * Fix CMakeLists * fix rate calculation * add unicode tests * fix trailing whitespace and line endings skip-checks: true * Helpers + rewrite qwen3 with helpers * Fix whitespace * extract unicode functions to separate file * refactor parse unicode function * fix compiler error * improve construction of sequence/choice parsers * be less clever * add make_parser helper function * expand usage of make_parser, alias common_chat_msg_peg_parser_builder to builder in source * lower bench iterations * add unicode support to until_parser * add unicode support to json_string_parser * clean up unicode tests * reduce unicode details to match src/unicode.cpp * simplify even further * remove unused functions * fix type * reformat char class parsing * clean up json string parser * clean up + fix diagnostics * reorder includes * compact builder functions * replace action_parser with capture_parser, rename env to semantics * rename env to semantics * clean up common_chat_parse_context * move type() to below constant * use default constructor for common_chat_peg_parser * make all operators functions for consistency * fix compilation errors in test-optional.cpp * simplify result values * rename json_string_unquoted to json_string_content * Move helper to separate class, add separate explicit and helper classes * Whitespace * Change + to append() * Reformat * Add extra helpers, tests and Minimax example * Add some extra optional debugging prints + real example of how to use them * fix bug in repetitions when 
min_count = 0 reports failures * dump rule in debug * fix token accumulation and assert parsing never fails * indent debug by depth * use LOG_* in tests so logs sync up with test logs * - Add selective testing - Refactor all messaging to use LOG_ERR - Fix lack of argument / tool name capturing - Temporary fix for double event capture * refactor rule() and introduce ref() * clean up visitor * clean up indirection in root parser w.r.t rules * store shared ptr directly in parser classes * replace aho-corasick automation with a simple trie * Reset prev for qwen3 helper example variant * refactor to use value semantics with std::variant/std::visit * simplify trie_matcher result * fix linting issues * add annotations to rules * revert test workaround * implement serializing the parser * remove redundant parsers * remove tests * gbnf generation fixes * remove LOG_* use in tests * update gbnf tests to test entire grammar * clean up gbnf generation and fix a few bugs * fix typo in test output * remove implicit conversion rules * improve test output * rename trie_matcher to trie * simplify trie to just know if a node is the end of a word * remove common_chat_ prefix and ensure a common_peg_ prefix to all types * rename chat-peg-parser -> peg-parser * promote chat-peg-parser-helper to chat-peg-parser * checkpoint * use a static_assert to ensure we handle every branch * inline trivial peg parser builders * use json strings for now * implement basic and native chat peg parser builders/extractors * resolve refs to their rules * remove packrat caching (for now) * update tests * compare parsers with incremental input * benchmark both complete and incremental parsing * add raw string generation from json schema * add support for string schemas in gbnf generation * fix qwen example to include \n * tidy up example * rename extractor to mapper * rename ast_arena to ast * place basic tests into one * use gbnf_format_literal from json-schema-to-grammar * integrate parser with 
common/chat and server * clean up schema and serialization * add json-schema raw string tests * clean up json creation and remove capture parser * trim spaces from reasoning and content * clean up redundant rules and comments * rename input_is_complete to is_partial to match rest of project * simplify json rules * remove extraneous file * remove comment * implement += and \|= operators * add comments to qwen3 implementation * reorder arguments to common_chat_peg_parse * remove commented outdated tests * add explicit copy constructor * fix operators and constness * wip: update test-chat for qwen3-coder * bring json parser closer to json-schema-to-grammar rules * trim trailing space for most things * fix qwen3 coder rules w.r.t. trailing spaces * group rules * do not trim trailing space from string args * tweak spacing of qwen3 grammar * update qwen3-coder tests * qwen3-coder small fixes * place parser in common_chat_syntax to simplify invocation * use std::set to collect rules to keep order predictable for tests * initialize parser to make certain platforms happy * revert back to std::unordered_set, sort rule names at the end instead * uncomment rest of chat tests * define explicit default constructor * improve arena init and server integration * fix chat test * add json_member() * add a comprehensive native example * clean up example qwen test and add response_format example to native test * make build_peg_parser accept std::function instead of template * change peg parser parameters into const ref * push tool call on tool open for constructed parser * add parsing documentation * clean up some comments * add json schema support to qwen3-coder * add id initializer in tests * remove grammar debug line from qwen3-coder * refactor qwen3-coder to use sequence over operators * only call common_chat_peg_parse if appropriate format * simplify qwen3-coder space handling * revert qwen3-coder implementation * revert json-schema-to-grammar changes * remove unnecessary forward 
declaration * small adjustment to until_parser * rename C/C++ files to use dashes * codeowners : add aldehir to peg-parser and related files --------- Co-authored-by: Piotr Wilkin <piotr.wilkin@syndatis.com>	2025-12-03 12:45:32 +02:00
Pascal	5ceed62421	server: fix duplicate HTTP headers in multiple models mode (#17698 ) * llama-server: fix duplicate HTTP headers in multiple models mode (#17693) * llama-server: address review feedback from ngxson - restrict scope of header after std::move - simplify header check (remove unordered_set)	2025-12-03 10:28:43 +01:00
Xuan-Son Nguyen	13628d8bdb	server: add --media-path for local media files (#17697 ) * server: add --media-path for local media files * remove unused fn	2025-12-02 22:49:20 +01:00
Xuan-Son Nguyen	a96283adc4	mtmd: fix --no-warmup (#17695 )	2025-12-02 22:48:08 +01:00
Chad Voegele	c4357dcc35	Server: Change Invalid Schema from Server Error (500) to User Error (400) (#17572 ) * Make invalid schema a user error (400) * Move invalid_argument exception handler to ex_wrapper * Fix test * Simplify test back to original pattern	2025-12-02 17:33:50 +01:00
Xuan-Son Nguyen	5d6bd842ea	server: remove default "gpt-3.5-turbo" model name (#17668 ) * server: remove default "gpt-3.5-turbo" model name * do not reflect back model name from request * fix test	2025-12-02 11:38:57 +01:00
senhtry	fd3abe849e	server: fixing naming conflict res_error in server-models.cpp (#17679 )	2025-12-02 11:18:39 +01:00
Xuan-Son Nguyen	682e6658bb	server: explicitly set exec path when create new instance (#17669 ) * Revert "rm unused fn" This reverts commit `f2dbe9c087`. * server: explicitly set exec path when create new instance * put back TODO * only call get_server_exec_path() once * add fallback logic	2025-12-02 10:25:11 +01:00
Aleksander Grygier	cee92af553	Add context info to server error (#17663 ) * fix: Add context info to server error * chore: update webui build output	2025-12-02 09:20:57 +01:00
Xuan-Son Nguyen	ecf74a8417	mtmd: add mtmd_context_params::warmup option (#17652 ) * mtmd: add mtmd_context_params::warmup option * reuse the common_params::warmup	2025-12-01 21:32:25 +01:00
Xuan-Son Nguyen	ec18edfcba	server: introduce API for serving / loading / unloading multiple models (#17470 ) * server: add model management and proxy * fix compile error * does this fix windows? * fix windows build * use subprocess.h, better logging * add test * fix windows * feat: Model/Router server architecture WIP * more stable * fix unsafe pointer * also allow terminate loading model * add is_active() * refactor: Architecture improvements * tmp apply upstream fix * address most problems * address thread safety issue * address review comment * add docs (first version) * address review comment * feat: Improved UX for model information, modality interactions etc * chore: update webui build output * refactor: Use only the message data `model` property for displaying model used info * chore: update webui build output * add --models-dir param * feat: New Model Selection UX WIP * chore: update webui build output * feat: Add auto-mic setting * feat: Attachments UX improvements * implement LRU * remove default model path * better --models-dir * add env for args * address review comments * fix compile * refactor: Chat Form Submit component * ad endpoint docs * Merge remote-tracking branch 'webui/allozaur/server_model_management_v1_2' into xsn/server_model_maagement_v1_2 Co-authored-by: Aleksander <aleksander.grygier@gmail.com> * feat: Add copy to clipboard to model name in model info dialog * feat: Model unavailable UI state for model selector * feat: Chat Form Actions UI logic improvements * feat: Auto-select model from last assistant response * chore: update webui build output * expose args and exit_code in API * add note * support extra_args on loading model * allow reusing args if auto_load * typo docs * oai-compat /models endpoint * cleaner * address review comments * feat: Use `model` property for displaying the `repo/model-name` naming format * refactor: Attachments data * chore: update webui build output * refactor: Enum imports * feat: Improve Model Selector 
responsiveness * chore: update webui build output * refactor: Cleanup * refactor: Cleanup * refactor: Formatters * chore: update webui build output * refactor: Copy To Clipboard Icon component * chore: update webui build output * refactor: Cleanup * chore: update webui build output * refactor: UI badges * chore: update webui build output * refactor: Cleanup * refactor: Cleanup * chore: update webui build output * add --models-allow-extra-args for security * nits * add stdin_file * fix merge * fix: Retrieve lost setting after resolving merge conflict * refactor: DatabaseStore -> DatabaseService * refactor: Database, Conversations & Chat services + stores architecture improvements (WIP) * refactor: Remove redundant settings * refactor: Multi-model business logic WIP * chore: update webui build output * feat: Switching models logic for ChatForm or when regenerating messges + modality detection logic * chore: update webui build output * fix: Add `untrack` inside chat processing info data logic to prevent infinite effect * fix: Regenerate * feat: Remove redundant settigns + rearrange * fix: Audio attachments * refactor: Icons * chore: update webui build output * feat: Model management and selection features WIP * chore: update webui build output * refactor: Improve server properties management * refactor: Icons * chore: update webui build output * feat: Improve model loading/unloading status updates * chore: update webui build output * refactor: Improve API header management via utility functions * remove support for extra args * set hf_repo/docker_repo as model alias when posible * refactor: Remove ConversationsService * refactor: Chat requests abort handling * refactor: Server store * tmp webui build * refactor: Model modality handling * chore: update webui build output * refactor: Processing state reactivity * fix: UI * refactor: Services/Stores syntax + logic improvements Refactors components to access stores directly instead of using exported getter functions. 
This change centralizes store access and logic, simplifying component code and improving maintainability by reducing the number of exported functions and promoting direct store interaction. Removes exported getter functions from `chat.svelte.ts`, `conversations.svelte.ts`, `models.svelte.ts` and `settings.svelte.ts`. * refactor: Architecture cleanup * feat: Improve statistic badges * feat: Condition available models based on modality + better model loading strategy & UX * docs: Architecture documentation * feat: Update logic for PDF as Image * add TODO for http client * refactor: Enhance model info and attachment handling * chore: update webui build output * refactor: Components naming * chore: update webui build output * refactor: Cleanup * refactor: DRY `getAttachmentDisplayItems` function + fix UI * chore: update webui build output * fix: Modality detection improvement for text-based PDF attachments * refactor: Cleanup * docs: Add info comment * refactor: Cleanup * re * refactor: Cleanup * refactor: Cleanup * feat: Attachment logic & UI improvements * refactor: Constants * feat: Improve UI sidebar background color * chore: update webui build output * refactor: Utils imports + move types to `app.d.ts` * test: Fix Storybook mocks * chore: update webui build output * test: Update Chat Form UI tests * refactor: Tooltip Provider from core layout * refactor: Tests to separate location * decouple server_models from server_routes * test: Move demo test to tests/server * refactor: Remove redundant method * chore: update webui build output * also route anthropic endpoints * fix duplicated arg * fix invalid ptr to shutdown_handler * server : minor * rm unused fn * add ?autoload=true\|false query param * refactor: Remove redundant code * docs: Update README documentations + architecture & data flow diagrams * fix: Disable autoload on calling server props for the model * chore: update webui build output * fix ubuntu build * fix: Model status reactivity * fix: Modality 
detection for MODEL mode * chore: update webui build output --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-12-01 19:41:04 +01:00
Xuan-Son Nguyen	7733409734	common: improve verbosity level definitions (#17630 ) * common: improve verbosity level definitions * string_format * update autogen docs	2025-12-01 14:38:13 +01:00
Tarek Dakhran	2ba719519d	model: LFM2-VL fixes (#17577 ) * Adjust to pytorch * Add antialiasing upscale * Increase number of patches to 1024 * Handle default marker insertion for LFM2 * Switch to flag * Reformat * Cuda implementation of antialias kernel * Change placement in ops.cpp * consistent float literals * Pad only for LFM2 * Address PR feedback * Rollback default marker placement changes * Fallback to CPU implementation for antialias implementation of upscale	2025-11-30 21:57:31 +01:00
Xuan-Son Nguyen	7f8ef50cce	clip: fix nb calculation for qwen3-vl (#17594 )	2025-11-30 15:33:55 +01:00
Xuan-Son Nguyen	3c136b21a3	cli: add migration warning (#17620 )	2025-11-30 15:32:43 +01:00
Xuan-Son Nguyen	ab49f094d2	server: move server-context to its own cpp\|h (#17595 ) * git mv * add server-context.h * add server-context.h * clean up headers * cont : cleanup * also expose server_response_reader (to be used by CLI) * fix windows build * decouple server_routes and server_http --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-11-29 22:04:44 +01:00
Haiyue Wang	8c32d9d96d	server: explicitly set the function name in lambda (#17538 ) As [1] explained, the real debug message will be like: "res operator(): operator() : queue result stop" Set the name explicitly, the message is easy for debugging: "res operator(): recv : queue result stop" The left "operator()" is generated by 'RES_DBG() ... __func__' [1]: https://clang.llvm.org/extra/clang-tidy/checks/bugprone/lambda-function-name.html Signed-off-by: Haiyue Wang <haiyuewa@163.com>	2025-11-29 18:43:29 +01:00
Igor Smirnov	0874693b44	common : fix json schema with '\' in literals (#17307 ) * Fix json schema with '\' in literals * Add "literal string with escapes" test	2025-11-29 17:06:32 +01:00
o7si	3ce7a65c2f	server: fix: /metrics endpoint returning JSON-escaped Prometheus format (#17386 ) * fix: /metrics endpoint returning JSON-escaped Prometheus format * mod: remove string overload from ok() method	2025-11-28 19:14:00 +01:00
Fredrik Hultin	ddf9f94389	server : add Anthropic Messages API support (#17570 ) * server : add Anthropic Messages API support * remove -@pytest.mark.slow from tool calling/jinja tests * server : remove unused code and slow/skip on test_anthropic_vision_base64_with_multimodal_model in test_anthropic_api.py * server : removed redundant n field logic in anthropic_params_from_json * server : use single error object instead of error_array in streaming response handler for /v1/chat/completions and use unordered_set instead of set in to_json_anthropic_stream() * server : refactor Anthropic API to use OAI conversion * make sure basic test always go first * clean up * clean up api key check, add test --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-11-28 12:57:04 +01:00
Xuan-Son Nguyen	e509411cf1	server: enable jinja by default, update docs (#17524 ) * server: enable jinja by default, update docs * fix tests	2025-11-27 01:02:50 +01:00
Han Qingzhe	1d594c295c	clip: (minicpmv) fix resampler kq_scale (#17516 ) * debug:"solve minicpmv precision problem" * “debug minicpmv” * Apply suggestion from @ngxson --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-11-26 21:44:07 +01:00
Pascal	b1846f1c8e	webui: add rehype plugin to restore HTML in Markdown table cells (#17477 ) * webui: add rehype plugin to restore HTML in Markdown table cells The remark/rehype pipeline neutralizes inline HTML as literal text (remarkLiteralHtml) so that XML/HTML snippets in LLM responses display as-is instead of being rendered. This causes <br> and <ul> markup in table cells to show as plain text. This plugin traverses the HAST post-conversion, parses whitelisted HTML patterns (<br>, <ul><li>) from text nodes, and replaces them with actual HAST element nodes. For lists, adjacent siblings must be combined first as the AST fragmentation breaks pattern matching. Strict validation rejects malformed markup, keeping it as raw text. * chore: update webui build output	2025-11-25 08:01:02 +01:00
Xuan-Son Nguyen	b8372eecd9	server: split server.cpp code into server/common/task/queue (#17362 ) * add server-task, server-common * add server-queue * rm redundant includes * move enum stop_type to server-task * server : headers cleanup --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-11-24 14:41:53 +01:00
Pascal	0c7220db56	webui: minor settings reorganization and add disable autoscroll option (#17452 ) * webui: added a dedicated 'Display' settings section that groups visualization options * webui: added a Display setting to toggle automatic chat scrolling * chore: update webui build output	2025-11-23 18:42:00 +01:00
Aleksander Grygier	4c91f2633f	Improved file naming & structure for UI components (#17405 ) * refactor: Component files naming & structure * chore: update webui build output * refactor: Dialog titles + components naming * chore: update webui build output * refactor: Imports * chore: update webui build output	2025-11-20 14:07:31 +01:00
Georgi Gerganov	196f5083ef	common : more accurate sampling timing (#17382 ) * common : more accurate sampling timing * eval-callback : minor fixes * cont : add time_meas impl * cont : fix log msg [no ci] * cont : fix multiple definitions of time_meas * llama-cli : exclude chat template init from time measurement * cont : print percentage of unaccounted time * cont : do not reset timings	2025-11-20 13:40:10 +02:00
Aleksander Grygier	99c53d6558	webui: Add a "Continue" Action for Assistant Message (#16971 ) * feat: Add "Continue" action for assistant messages * feat: Continuation logic & prompt improvements * chore: update webui build output * feat: Improve logic for continuing the assistant message * chore: update webui build output * chore: Linting * chore: update webui build output * fix: Remove synthetic prompt logic, use the prefill feature by sending the conversation payload ending with assistant message * chore: update webui build output * feat: Enable "Continue" button based on config & non-reasoning model type * chore: update webui build output * chore: Update packages with `npm audit fix` * fix: Remove redundant error * chore: update webui build output * chore: Update `.gitignore` * fix: Add missing change * feat: Add auto-resizing for Edit Assistant/User Message textareas * chore: update webui build output	2025-11-19 14:39:50 +01:00
o7si	97cb3fd5ae	fix: resolve undefined variable 'svr' compilation error (#17348 )	2025-11-18 10:10:47 +02:00
Xuan-Son Nguyen	0de8878c96	server: split HTTP into its own interface (#17216 ) * server: split HTTP into its own interface * move server-http and httplib to its own file * add the remaining endpoints * fix exception/error handling * renaming * missing header * fix missing windows header * fix error responses from http layer * fix slot save/restore handler * fix case where only one stream chunk is returned * add NOMINMAX * do not call sink.write on empty data * use safe_json_to_str for SSE * clean up * add some comments * improve usage of next() * bring back the "server is listening on" message * more generic handler * add req.headers * move the chat template print to init() * add req.path * cont : minor --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-11-17 22:05:44 +01:00
Georgi Gerganov	5b2093becc	server : handle context overflow during decode (#17267 ) * server : handle context overflow during decode * server : minor refactor	2025-11-16 09:23:37 +02:00
Aleksander Grygier	22e1ce2f81	webui: Fix clickability around chat processing statistics UI (#17278 ) * fix: Better pointer events handling in chat processing info elements * chore: update webui build output	2025-11-15 22:41:41 +01:00
Pascal	1411d9275a	webui: add OAI-Compat Harmony tool-call streaming visualization and persistence in chat UI (#16618 ) * webui: add OAI-Compat Harmony tool-call live streaming visualization and persistence in chat UI - Purely visual and diagnostic change, no effect on model context, prompt construction, or inference behavior - Captured assistant tool call payloads during streaming and non-streaming completions, and persisted them in chat state and storage for downstream use - Exposed parsed tool call labels beneath the assistant's model info line with graceful fallback when parsing fails - Added tool call badges beneath assistant responses that expose JSON tooltips and copy their payloads when clicked, matching the existing model badge styling - Added a user-facing setting to toggle tool call visibility to the Developer settings section directly under the model selector option * webui: remove scroll listener causing unnecessary layout updates (model selector) * Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * chore: npm run format & update webui build output * chore: update webui build output --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>	2025-11-15 21:09:32 +01:00
Ankur Verma	c7b7db0445	mtmd-cli: Avoid logging to stdout for model loading messages in mtmd-cli (#17277 )	2025-11-15 12:41:16 +01:00
Xuan-Son Nguyen	9b17d74ab7	mtmd: add mtmd_log_set (#17268 )	2025-11-14 15:56:19 +01:00
Georgi Gerganov	d396b43748	server : fix "can batch with" bug (#17263 )	2025-11-14 14:03:45 +02:00
Aleksander Grygier	f1bad23f88	Better UX for handling multiple attachments in WebUI (#17246 )	2025-11-14 01:19:08 +01:00
Xuan-Son Nguyen	c4abcb2457	server: fixing naming conflict res_error (#17243 )	2025-11-13 20:53:47 +01:00
Aleksander Grygier	8e878f0cb4	Update packages + upgrade Storybook to v10 (#17201 ) * chore: Update packages + upgrade Storybook to v10 * fix: Increase timeout for UI tests	2025-11-12 19:01:48 +01:00
Xuan-Son Nguyen	00c94083b3	server: (refactor) implement generator-based API for task results (#17174 ) * server: (refactor) implement generator-based API for task results * improve * moving some code * fix "Response ended prematurely" * add sink.done before return false * rm redundant check * rm unused var * rename generator --> reader	2025-11-12 18:50:52 +01:00
Xuan-Son Nguyen	ee8dd5c658	server: move res_error/res_ok to static function (#17167 )	2025-11-12 14:17:24 +01:00
Adrien Gallouët	78010a0d52	cmake : move OpenSSL linking to vendor/cpp-httplib (#17177 ) * cmake : move OpenSSL linking to vendor/cpp-httplib Signed-off-by: Adrien Gallouët <angt@huggingface.co> * bring back httplib 0.27.0 * add -DLLAMA_HTTPLIB * update cmake config for visionos --------- Signed-off-by: Adrien Gallouët <angt@huggingface.co> Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-11-12 12:32:50 +01:00
Xuan-Son Nguyen	1d45b4228f	vendor: split httplib to cpp/h files (#17150 ) * vendor: split httplib to cpp/h files * move defines * include httplib if curl is not used * add TODO * fix build ios * fix build visionos instead	2025-11-11 13:32:58 +01:00
Mike Abbott	4a5b8aff40	cmake : add version to all shared object files (#17091 ) When compiling llama.cpp in Yocto, it fails QA checks because the generated so files aren't versioned. This applies a version to all generated so files, allowing the package to build without errors.	2025-11-11 13:19:50 +02:00
Nicolas B. Pierron	d2d626938a	Install rpc-server when GGML_RPC is ON. (#17149 )	2025-11-11 10:53:59 +00:00
Gabe Goodhart	0c74f32632	memory: Hybrid context shift (#17009 ) * feat(memory): Only fail partial erasure of recurrent tail The recurrent state is always assumed to be the state as of the last update from the final token in the sequence. When doing a partial erasure, if the range does not include the final token, the erasure can be considered a success since any memory used for the sequence prior to the final token (which is no memory) has been successfully removed. There is one potential case that this doesn't address which is the pruning of cache to remove sensitive data from the context. This wouldn't work for attention cache partial removal (in the middle) either since the KV state is linearly-dependent and states in later sequence positions would still be based on the state from the sensitive data, even if that data is no longer cached, so I don't think this is relevant, but it is worth noting that the semantics of this change for a partial erasure in the middle of the cache are essentially "my context is already compressed" and not "all trace of the removed tokens has been removed." https://github.com/ggml-org/llama.cpp/issues/16768 Branch: HybridContextShift-16768 Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix(main): Check the output of seq_rm for prefix matching This prefix matching is explicitly attempting to remove the tokens at the end of the sequence that don't match. This is the operation that can't be performed on a recurrent cache due to the state being updated in place, so if this removal fails, we need to clear the whole cache. 
https://github.com/ggml-org/llama.cpp/issues/16768 Branch: HybridContextShift-16768 Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix(memory): Fix condition for partial erasure failure if p0 > pos Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: compilade <git@compilade.net> * style: Fix extra parens Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * fix(main.cpp): Set n_matching_session_tokens to 0 on cache clear https://github.com/ggml-org/llama.cpp/issues/16768 Branch: HybridContextShift-16768 Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> --------- Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: compilade <git@compilade.net> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-11-10 17:14:23 +02:00
Georgi Gerganov	f914544b16	batched-bench : add "separate text gen" mode (#17103 )	2025-11-10 12:59:29 +02:00
Xuan-Son Nguyen	4b13a684c5	mtmd: fix patch_size initialized to random value in audio models (#17128 ) * mtmd: fix patch_size initialized to random value in audio models * add default hparams	2025-11-10 11:41:05 +01:00
Georgi Gerganov	b8595b16e6	mtmd : fix embedding size for image input (#17123 )	2025-11-09 18:31:02 +02:00
Georgi Gerganov	cb1adf8851	server : handle failures to restore host cache (#17078 ) * server : handle failures to restore host cache * server : add tests for the prompt cache	2025-11-09 14:27:05 +02:00
chansikpark	333f2595a3	webui: fix keyboard shortcuts for new chat & edit chat title (#17007 )	2025-11-08 20:52:35 +01:00
Aidan	eeee367de5	server: fix correct time_ms calculation in prompt_progress (#17093 ) * fix: correct time_ms calculation in send_partial_response The time_ms field was incorrectly calculated. The division was happening before the subtraction leading to incorrect values. Before: (ggml_time_us() - slot.t_start_process_prompt / 1000) After: (ggml_time_us() - slot.t_start_process_prompt) / 1000 * docs : document time_ms field in prompt_progress	2025-11-08 15:12:11 +02:00
Georgi Gerganov	7956bb4d7f	bench : cache the llama_context state at computed depth (#16944 ) * bench : cache llama_context state at depth * cont : handle failures to restore the old state * cont : print information when the state is being reused	2025-11-07 21:23:11 +02:00
Sigbjørn Skjæret	9008027aa3	hparams : add n_embd_inp() to support extended embed (#16928 ) * add n_embd_full to support extended embed * don't change output * rename to n_embd_inp * restore n_embd where applicable	2025-11-07 19:27:58 +01:00
Georgi Gerganov	16bcc1259d	kv-cache : pad the cache size to 256 for performance (#17046 ) * kv-cache : pad the size of the small SWA cache for performance * context : pad the total context to 256 * cont : future-proof the swa pad * server : adjust test params to new logic	2025-11-07 20:03:25 +02:00
Georgi Gerganov	8c0d6bb455	server : print the samplers chain for each request (#17070 )	2025-11-07 12:24:47 +02:00
Georgi Gerganov	b7f9010d24	server : disable checkpoints with mtmd (#17045 )	2025-11-06 12:09:29 +02:00
Xuan-Son Nguyen	4882f0ff78	clip: implement minicpm-v sinusoidal embd using GGML (#17036 ) * clip: implement minicpm-v sinusoidal embd using GGML * fix repeat op	2025-11-06 11:02:54 +01:00
Xuan-Son Nguyen	92bb84f775	mtmd: allow QwenVL to process larger image by default (#17020 )	2025-11-05 14:26:49 +01:00
Georgi Gerganov	13b339bcd9	server : do not default to multiple slots with speculative decoding (#17017 ) * server : do not default to multiple slots with speculative decoding * cont : fix	2025-11-05 14:32:55 +02:00
Xuan-Son Nguyen	2f0c2db43e	mtmd: improve struct initialization (#16981 )	2025-11-05 11:26:37 +01:00
손희준	fd2f84f468	docs: Clarify the endpoint that webui uses (#17001 )	2025-11-05 11:20:28 +01:00
Georgi Gerganov	66d8eccd42	server : do context shift only while generating (#17000 )	2025-11-04 19:21:36 +02:00
Aleksander Grygier	e7da30b584	fix: Viewing multiple PDF attachments (#16974 )	2025-11-03 18:53:26 +01:00
Georgi Gerganov	48bd26501b	server : add props.model_alias (#16943 ) * server : add props.model_alias * webui : npm run format	2025-11-03 14:38:23 +01:00
Xuan-Son Nguyen	070ff4d535	mtmd: add --image-min/max-tokens (#16921 )	2025-11-03 11:11:18 +01:00
Xuan-Son Nguyen	bf7b0c9725	mtmd: pad mask for qwen2.5vl (#16954 ) * mtmd: pad mask for qwen2.5vl * improve	2025-11-03 10:25:55 +01:00
Sascha Rogmann	bcfa87622a	feat(webui): improve LaTeX rendering with currency detection (#16508 ) * webui : Revised LaTeX formula recognition * webui : Further examples containg amounts * webui : vitest for maskInlineLaTeX * webui: Moved preprocessLaTeX to lib/utils * webui: LaTeX in table-cells * chore: update webui build output (use theirs) * webui: backslash in LaTeX-preprocessing * chore: update webui build output * webui: look-behind backslash-check * chore: update webui build output * Apply suggestions from code review Code maintenance (variable names, code formatting, string handling) Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * webui: Moved constants to lib/constants. * webui: package woff2 inside base64 data * webui: LaTeX-line-break in display formula * chore: update webui build output * webui: Bugfix (font embedding) * webui: Bugfix (font embedding) * webui: vite embeds assets * webui: don't suppress 404 (fonts) * refactor: KaTeX integration with SCSS Moves KaTeX styling to SCSS for better customization and font embedding. This change includes: - Adding `sass` as a dev dependency. - Introducing a custom SCSS file to override KaTeX variables and disable TTF/WOFF fonts, relying solely on WOFF2 for embedding. - Adjusting the Vite configuration to resolve `katex-fonts` alias and inject SCSS variables. * fix: LaTeX processing within blockquotes * webui: update webui build output --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>	2025-11-03 00:41:08 +01:00
Zhiyong Wang	6b9a52422b	model: add Janus Pro for image understanding (#16906 ) * Add support for Janus Pro * Update gguf-py/gguf/tensor_mapping.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update gguf-py/gguf/tensor_mapping.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Address reviewer suggestions Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Add JANUS_PRO constant * Update clip model handling Co-authored-by: Xuan-Son Nguyen <son@huggingface.co> * Update tools/mtmd/clip.cpp Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> * Refactor JANUS_PRO handling in clip.cpp Co-authored-by: Xuan-Son Nguyen <son@huggingface.co> * Update tools/mtmd/clip.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * em whitespace --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> Co-authored-by: Xuan-Son Nguyen <son@huggingface.co> Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-11-02 22:08:04 +01:00
Georgi Gerganov	2f966b8ed8	clip : use FA (#16837 ) * clip : use FA * cont : add warning about unsupported ops * implement "auto" mode for clip flash attn * clip : print more detailed op support info during warmup * cont : remove obsolete comment [no ci] * improve debugging message * trailing space * metal : remove stray return --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-11-02 21:21:48 +01:00
Georgi Gerganov	cd5e3b5754	server : support unified cache across slots (#16736 ) * server : support unified context across slots * cont : fix speculative decoding initialization * context : fix n_ctx_per_seq computation * server : purge slots one by one * tests : add unified cache server tests * llama : update per-seq context computation * test-thread-safety : handle tiny training context of the input model * server : fix server_tokens clear() * server : use 4 slots + unified KV by default * llama : add note about context size queries * cont : update todos [no ci] * context : do not cap the size of the context * tests : adjust parameters to be CI friendlier * context : add warning	2025-11-02 18:14:04 +02:00
Georgi Gerganov	7fd205a8e8	scripts : add script to bench models (#16894 )	2025-11-02 00:15:31 +02:00
Pascal	2f68ce7cfd	webui: auto-refresh /props on inference start to resync model metadata (#16784 ) * webui: auto-refresh /props on inference start to resync model metadata - Add no-cache headers to /props and /slots - Throttle slot checks to 30s - Prevent concurrent fetches with promise guard - Trigger refresh from chat streaming for legacy and ModelSelector - Show dynamic serverWarning when using cached data * fix: restore proper legacy behavior in webui by using unified /props refresh Updated assistant message bubbles to show each message's stored model when available, falling back to the current server model only when the per-message value is missing When the model selector is disabled, now fetches /props and prioritizes that model name over chunk metadata, then persists it with the streamed message so legacy mode properly reflects the backend configuration * fix: detect first valid SSE chunk and refresh server props once * fix: removed the slots availability throttle constant and state * webui: purge ai-generated cruft * chore: update webui static build	2025-11-01 19:49:51 +01:00
Pascal	e4a71599e5	webui: add HTML/JS preview support to MarkdownContent with sandboxed iframe (#16757 ) * webui: add HTML/JS preview support to MarkdownContent with sandboxed iframe dialog Extended MarkdownContent to flag previewable code languages, add a preview button alongside copy controls, manage preview dialog state, and share styling for the new button group Introduced CodePreviewDialog.svelte, a sandboxed iframe modal for rendering HTML/JS previews with consistent dialog controls * webui: fullscreen HTML preview dialog using bits-ui * Update tools/server/webui/src/lib/components/app/misc/CodePreviewDialog.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * Update tools/server/webui/src/lib/components/app/misc/MarkdownContent.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * webui: pedantic style tweak for CodePreviewDialog close button * webui: remove overengineered preview language logic * chore: update webui static build --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>	2025-11-01 17:14:54 +01:00
Xuan-Son Nguyen	cf659bbb8e	mtmd: refactor preprocessing + support max/min pixels (#16878 ) * mtmd: refactor preprocessing + support max/min pixels * fix mlp type * implement mix/max pixels * improve hparams * better image preproc for qwen * fix * fix out of bound composite * fix (2) * fix token calculation * get_merge_kernel_size() * fix llama4 and lfm2 * gonna fix them all * use simple resize for qwen * qwen: increase min tokens * no resize if dst size == src size * restore to initial min/max tokens value for qwen	2025-11-01 15:51:36 +01:00
Aleksander Grygier	d8b860a219	Add a setting to display message generation statistics (#16901 ) * feat: Add setting to display message generation statistics * chore: build static webui output	2025-11-01 15:35:57 +01:00
Jaromír Hradílek	1ae74882f8	webui: recognize AsciiDoc files as valid text files (#16850 ) * webui: recognize AsciiDoc files as valid text files * webui: add an updated static webui build * webui: add the updated dependency list * webui: re-add an updated static webui build This also reverts commit `742dbb8379`.	2025-11-01 15:02:57 +01:00
Georgi Gerganov	c22473b580	server : don't print user inputs to console (#16871 )	2025-10-31 10:54:19 +02:00
Daniel Bevenius	0f715b4e75	server : fix typos in server.cpp comments [no ci] (#16883 )	2025-10-31 09:51:26 +01:00
chansikpark	16724b5b68	server : bump request URI max length to 32768 (#16862 )	2025-10-30 20:22:23 +02:00
Georgi Gerganov	b52edd2558	server : remove n_past (#16818 ) * server : remove n_past * server : replace slot.n_prompt_tokens() with slot.task->n_tokens() * server : fixes + clean-up * cont : fix context shift * server : add server_tokens::pos_next() Co-authored-by: Xuan-Son Nguyen <son@huggingface.co> * server : fix pos_next() usage Co-authored-by: Xuan-Son Nguyen <son@huggingface.co> --------- Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>	2025-10-30 18:42:57 +02:00
JJJYmmm	d261223d24	model: add support for qwen3vl series (#16780 ) * support qwen3vl series. Co-authored-by: Thireus ☠ <Thireus@users.noreply.github.com> Co-authored-by: yairpatch <yairpatch@users.noreply.github.com> Co-authored-by: LETS-BEE <LETS-BEE@users.noreply.github.com> * bugfix: fix the arch check for qwen3vl-moe. * use build_ffn * optimize deepstack structure * optimize deepstack feature saving * Revert "optimize deepstack feature saving" for temporal fix This reverts commit `f321b9fdf1`. * code clean * use fused qkv in clip * clean up / rm is_deepstack_layers for simplification * add test model * move test model to "big" section * fix imrope check * remove trailing whitespace * fix rope fail * metal : add imrope support * add imrope support for sycl * vulkan: add imrope w/o check * fix vulkan * webgpu: add imrope w/o check * Update gguf-py/gguf/tensor_mapping.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * fix tensor mapping --------- Co-authored-by: Thireus ☠ <Thireus@users.noreply.github.com> Co-authored-by: yairpatch <yairpatch@users.noreply.github.com> Co-authored-by: LETS-BEE <LETS-BEE@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <son@huggingface.co> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-10-30 16:19:14 +01:00
Tianyue-Zhao	bacddc049a	model: Add support for CogVLM model (#15002 ) * Added GGUF mappings for CogVLM model * Add tensor mapping for CogVLM visual encoder * Add CogVLM to conversion script, no vision part yet * Added CogVLM vision model to conversion script * Add graph for CogVLM CLIP model * Add graph for CogVLM * Fixes for CogVLM. Now compiles. * Model now runs * Fixes for cogvlm graph * Account for graph context change after rebase * Changes for whitespace * Changes in convert script according to comments * Switch CogVLM LLM graph to merged QKV tensor * Use rope_type variable instead of direct definition * Change CogVLM CLIP encoder to use SWIGLU * Switch CogVLM CLIP to use merged QKV * Apply rebase edits and remove ggml_cont call that is now unnecessary * clean up --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-10-30 12:18:50 +01:00
Xuan-Son Nguyen	e3af5563bd	llama: store mrope data in KV cell (#16825 ) * llama: store mrope data in KV cell * correct x,y ordering * address review comments * add consistency checks * Update src/llama-kv-cache.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * add TODO * fix asan error * kv-cells : improve ext handling * cont : fix headers --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-10-29 18:09:18 +01:00
Georgi Gerganov	85a7d8677b	memory : remove KV cache size padding (#16812 ) * memory : remove KV cache size padding * cont : restore padding for n_kv tensor shape * server : use slot context size instead of training context size * server : simplify context limit logic	2025-10-28 20:19:44 +02:00
Georgi Gerganov	a8ca18b4b8	llama-bench : clarify benchmarked parts of the computation (#16823 )	2025-10-28 19:41:43 +02:00
Aldehir Rojas	280d97be96	grammar : support array references in json schema (#16792 ) * grammar : support array references in json schema * Update json-schema-to-grammar.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * grammar : improve regex when naming ref derived rules * grammar : replace non-conformant definitions array with anyOf test case --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-10-28 09:37:52 +01:00
Xuan-Son Nguyen	e1ab084803	mtmd : fix idefics3 preprocessing (#16806 ) * mtmd : fix idefics3 preprocessing * disable granite test * fix test for granite	2025-10-27 23:12:16 +01:00
Xuan-Son Nguyen	c55d53acec	model : add LightOnOCR-1B model (#16764 ) * model : add LightOnOCR-1B model * add test	2025-10-27 16:02:58 +01:00
Florian Badie	69e9ff0103	webui: support q URL parameter (#16728 ) * webui: support q URL parameter Fixes #16722 I’ve checked that it works with Firefox’s AI tools * webui: apply suggestions from code review Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * chore: update webui static build --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>	2025-10-24 14:10:29 +02:00
Johannes Gäßler	0bf47a1dbb	server: add memory breakdown print (#16740 )	2025-10-23 21:30:17 +02:00
Xuan-Son Nguyen	d0660f237a	mtmd-cli : allow using --jinja (#16718 ) * mtmd-cli : allow using --jinja * support -sys * implement chat_history * fix clear memory * rm -sys support, added TODO	2025-10-23 15:00:49 +02:00
Prajwal B Mehendarkar	fe6a9882ac	Manually link -lbsd to resolve flock symbol on AIX (#16610 )	2025-10-23 19:37:31 +08:00
matteo	8cf6b42d46	server : send partial stop string when <EOG> is reached (#15007 )	2025-10-23 12:32:24 +03:00
Pascal	9b9201f65a	webui: introduce OpenAI-compatible model selector in JSON payload (#16562 ) * webui: introduce OpenAI-compatible model selector in JSON payload * webui: restore OpenAI-Compatible model source of truth and unify metadata capture This change re-establishes a single, reliable source of truth for the active model: fully aligned with the OpenAI-Compat API behavior It introduces a unified metadata flow that captures the model field from both streaming and non-streaming responses, wiring a new onModel callback through ChatService The model name is now resolved directly from the API payload rather than relying on server /props or UI assumptions ChatStore records and persists the resolved model for each assistant message during streaming, ensuring consistency across the UI and database Type definitions for API and settings were also extended to include model metadata and the onModel callback, completing the alignment with OpenAI-Compat semantics * webui: address review feedback from allozaur * webui: move model selector into ChatForm (idea by @allozaur) * webui: make model selector more subtle and integrated into ChatForm * webui: replaced the Flowbite selector with a native Svelte dropdown * webui: add developer setting to toggle the chat model selector * webui: address review feedback from allozaur Normalized streamed model names during chat updates by trimming input and removing directory components before saving or persisting them, so the conversation UI shows only the filename Forced model names within the chat form selector dropdown to render as a single-line, truncated entry with a tooltip revealing the full name * webui: toggle displayed model source for legacy vs OpenAI-Compat modes When the selector is disabled, it falls back to the active server model name from /props When the model selector is enabled, the displayed model comes from the message metadata (the one explicitly selected and sent in the request) * Update 
tools/server/webui/src/lib/components/app/chat/ChatForm/ChatFormActions.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * Update tools/server/webui/src/lib/constants/localstorage-keys.ts Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * Update tools/server/webui/src/lib/components/app/chat/ChatForm/ChatFormModelSelector.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * Update tools/server/webui/src/lib/services/chat.ts Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * Update tools/server/webui/src/lib/services/chat.ts Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * webui: refactor model selector and persistence helpers - Replace inline portal and event listeners with proper Svelte bindings - Introduce 'persisted' store helper for localStorage sync without runes - Extract 'normalizeModelName' utils + Vitest coverage - Simplify ChatFormModelSelector structure and cleanup logic Replaced the persisted store helper's use of '$state/$effect' runes with a plain TS implementation to prevent orphaned effect runtime errors outside component context Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * webui: document normalizeModelName usage with inline examples * Update tools/server/webui/src/lib/components/app/chat/ChatForm/ChatFormModelSelector.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * Update tools/server/webui/src/lib/stores/models.svelte.ts Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * Update tools/server/webui/src/lib/stores/models.svelte.ts Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * webui: extract ModelOption type into dedicated models.d.ts Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * 
webui: refine ChatMessageAssistant displayedModel source logic * webui: stabilize dropdown, simplify model extraction, and init assistant model field * chore: update webui static build * Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * chore: npm format, update webui static build * webui: align sidebar trigger position, remove z-index glitch * chore: update webui build output --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>	2025-10-22 16:58:23 +02:00
Aleksander Grygier	c9c1972e2c	Handle legacy 'context' attachments (#16687 )	2025-10-20 19:49:02 +02:00
Aleksander Grygier	79068501fa	Prevent premature submission on IME input (#16673 ) * fix: Prevent premature submission on IME input * chore: update webui static build * refactor: Put IME completion checker in a helper function and add checking for `KeyboardEvent.eventKey === 229` * chore: update webui static build * chore: update webui static build * chore: update webui static build	2025-10-20 14:21:12 +02:00
Aleksander Grygier	0e4a0cf2fa	Import/Export UX improvements (#16619 ) * webui : added download action (#13552) * webui : import and export (for all conversations) * webui : fixed download-format, import of one conversation * webui : add ExportedConversations type for chat import/export * feat: Update naming & order * chore: Linting * feat: Import/Export UX improvements * chore: update webui build output * feat: Update UI placement of Import/Export tab in Chat Settings Dialog * refactor: Cleanup chore: update webui build output * feat: Enable shift-click multiple conversation items selection * chore: update webui static build * chore: update webui static build --------- Co-authored-by: Sascha Rogmann <github@rogmann.org>	2025-10-20 13:29:14 +02:00
Aleksander Grygier	13f2cfad41	Enable per-conversation loading states to allow having parallel conversations (#16327 ) * feat: Per-conversation loading states and tracking streaming stats * chore: update webui build output * refactor: Chat state management Consolidates loading state management by using a global `isLoading` store synchronized with individual conversation states. This change ensures proper reactivity and avoids potential race conditions when updating the UI based on the loading status of different conversations. It also improves the accuracy of statistics displayed. Additionally, slots service methods are updated to use conversation IDs for per-conversation state management, avoiding global state pollution. * feat: Adds loading indicator to conversation items * chore: update webui build output * fix: Fix aborting chat streaming Improves the chat stream abortion process by ensuring that partial responses are saved before the abort signal is sent. This avoids a race condition where the onError callback could clear the streaming state before the partial response is saved. Additionally, the stream reading loop and callbacks are now checked for abort signals to prevent further processing after abortion. * refactor: Remove redundant comments * chore: build webui static output * refactor: Cleanup * chore: update webui build output * chore: update webui build output * fix: Conversation loading indicator for regenerating messages * chore: update webui static build * feat: Improve configuration * feat: Install `http-server` as dev dependency to not need to rely on `npx` in CI	2025-10-20 12:41:13 +02:00
Radoslav Gerganov	41386cf365	rpc : report actual free memory (#16616 ) * rpc : report actual free memory Start reporting the free memory on every device instead of using fixed values. Now llama-cli users can get a nice memory breakdown when using RPC devices. * drop --mem in rpc-server	2025-10-17 18:02:52 +03:00
Pascal	ababae7e1e	webui: reorganize settings layout (#16607 ) * webui: reorganize settings layout * chore: update webui build output * fix: remove unused variable * chore: update webui build output	2025-10-17 10:35:03 +02:00
Xuan-Son Nguyen	1bb4f43380	mtmd : support home-cooked Mistral Small Omni (#14928 )	2025-10-16 19:00:31 +02:00
Pascal	683fa6ba4e	fix: added a normalization step for MathJax-style \[\] and delimiters (#16599 ) * fix: added a normalization step for MathJax-style \[\] and delimiters So inline and block equations are converted before KaTeX rendering, enabling proper display of model-generated LaTeX in the WebUI * chore: update webui build output	2025-10-16 16:28:41 +02:00
Aleksander Grygier	f9fb33f263	Add server-driven parameter defaults and syncing (#16515 )	2025-10-15 16:22:20 +02:00
Georgi Gerganov	17304cbcc1	server : fix img token logs (#16595 )	2025-10-15 16:53:12 +03:00
Georgi Gerganov	554fd578a5	server : fix mtmd checkpoints (#16591 )	2025-10-15 11:51:27 +02:00
Georgi Gerganov	bc07349a7f	server : dynamic token limit for prompt cache (#16560 ) * server : dynamic token limit for prompt cache * cont : print estimated token limit	2025-10-14 08:48:50 +03:00
Pascal	1fb9504eb7	fix: add remark plugin to render raw HTML as literal text (#16505 ) * fix: add remark plugin to render raw HTML as literal text Implemented a missing MDAST stage to neutralize raw HTML like major LLM WebUIs do ensuring consistent and safe Markdown rendering Introduced 'remarkLiteralHtml', a plugin that converts raw HTML nodes in the Markdown AST into plain-text equivalents while preserving indentation and line breaks. This ensures consistent rendering and prevents unintended HTML execution, without altering valid Markdown structure Kept 'remarkRehype' in the pipeline since it performs the required conversion from MDAST to HAST for KaTeX, syntax highlighting, and HTML serialization Refined the link-enhancement logic to skip unnecessary DOM rewrites, fixing a subtle bug where extra paragraphs were injected after the first line due to full innerHTML reconstruction, and ensuring links open in new tabs only when required Final pipeline: remarkGfm -> remarkMath -> remarkBreaks -> remarkLiteralHtml -> remarkRehype -> rehypeKatex -> rehypeHighlight -> rehypeStringify * fix: address review feedback from allozaur * chore: update webui build output	2025-10-13 10:55:32 +02:00
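
Note on eeee367de5 (#17093): the message for that commit describes an operator-precedence bug, where the division by 1000 was applied only to the start timestamp instead of to the elapsed interval. The sketch below reproduces the arithmetic in Python under stated assumptions: the function names and sample timestamps are hypothetical, and plain integers stand in for `ggml_time_us()` and `slot.t_start_process_prompt`. It is an illustration of the two expressions quoted in the commit message, not the server code itself.

```python
# Illustration only: integer microsecond timestamps stand in for
# ggml_time_us() and slot.t_start_process_prompt (names are hypothetical).

def elapsed_ms_buggy(now_us: int, start_us: int) -> int:
    # Mirrors "(now - start / 1000)": division binds tighter than
    # subtraction, so only start_us is scaled to milliseconds.
    return now_us - start_us // 1000


def elapsed_ms_fixed(now_us: int, start_us: int) -> int:
    # Mirrors "(now - start) / 1000": subtract first, then convert
    # the elapsed microseconds to milliseconds.
    return (now_us - start_us) // 1000


if __name__ == "__main__":
    now_us = 5_000_000    # assumed sample: 5 s after the clock origin
    start_us = 2_000_000  # assumed sample: prompt processing began at 2 s

    print(elapsed_ms_buggy(now_us, start_us))  # 4998000
    print(elapsed_ms_fixed(now_us, start_us))  # 3000
```

With these sample values the elapsed time is 3 s, so only the second form yields the expected 3000 ms; the first returns a value that is still essentially a microsecond timestamp.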

713 Commits