* Fix kimi-k2 parsing
* fix template & add more tests for kimi-k2
* Another fix for Kimi-K2 chat template.
* enable allow_toolcall_in_think for Kimi-K2
* Refine key-value separator and value end format
* Enable tool call in think for kimi-k2
* allow_toolcall_in_think is now tested with Kimi-K2
* Remove outdated TODO comment in XML tool call parser
Removed TODO comment about untested tool call feature.
* Rename function from "utf8_truncate_safe" to "utf8_truncate_safe_len"
* common : implement parser combinators to simplify chat parsing
* add virtual destructor to parser_base
* fix memory leak from circular references of rules
* implement gbnf grammar building
* remove unused private variable
* create a base visitor and implement id assignment as a visitor
* fix const ref for grammar builder
* clean up types, friend classes, and class declarations
* remove builder usage from until_parser
* Use a counter class to help assign rule ids
* cache everything
* add short description for each parser
* create a type for the root parser
* implement repetition parser
* Make optional, one_or_more, and zero_or_more subclasses of repetition
* improve context constructor
* improve until parsing and add benchmarks
* remove cached() pattern, cache in parser_base with specialized parsing functions for each parser
* improve json parsing performance to better match legacy parsing
* fix const auto * it for windows
* move id assignment to classes instead of using a visitor
* create named rules in the command r7b example
* use '.' for any in GBNF
* fix parens around choices in gbnf grammar
* add convenience operators to turn strings to literals
* add free-form operators for const char * to simplify defining literals
* simplify test case parser
* implement semantic actions
* remove groups in favor of actions and a scratchpad
* add built in actions for common operations
* add actions to command r7b example
* use std::default_searcher for platforms that don't have bm
* improve parser_type handling and add cast helper
* add partial result type to better control when to run actions
* fix bug in until()
* run actions on partial results by default
* use common_chat_msg for result
* add qwen3 example wip
* trash partial idea and simplify
* move action arguments to a struct
* implement aho-corasick matcher for until_parser and to build exclusion grammars
* use std::string for input, since std::string_view is incompatible with std::regex
* Refactor tests
* improve qwen3 example
* implement sax-style parsing and refactor
* fix json string in test
* rename classes to use common_chat_ prefix
* remove is_ suffix from functions
* rename from id_counter to just counter
* Final refactored tests
* Fix executable name and editorconfig-checker
* Third time's the charm...
* add trigger parser to begin lazy grammar rule generation
* working lazy grammar
* refactor json rules now that we check for reachability
* reduce pointer usage
* print out grammars in example
* rename to chat-peg-parser* and common_chat_peg_parser*
* Revert unrelated changes
* New macros for CMakeLists to enable multi-file compilations
* starting unicode support
* add unicode support to char_parser
* use unparsed args as additional sources
* Refactor tests to new harness
* Fix CMakeLists
* fix rate calculation
* add unicode tests
* fix trailing whitespace and line endings
skip-checks: true
* Helpers + rewrite qwen3 with helpers
* Fix whitespace
* extract unicode functions to separate file
* refactor parse unicode function
* fix compiler error
* improve construction of sequence/choice parsers
* be less clever
* add make_parser helper function
* expand usage of make_parser, alias common_chat_msg_peg_parser_builder to builder in source
* lower bench iterations
* add unicode support to until_parser
* add unicode support to json_string_parser
* clean up unicode tests
* reduce unicode details to match src/unicode.cpp
* simplify even further
* remove unused functions
* fix type
* reformat char class parsing
* clean up json string parser
* clean up + fix diagnostics
* reorder includes
* compact builder functions
* replace action_parser with capture_parser, rename env to semantics
* rename env to semantics
* clean up common_chat_parse_context
* move type() to below constant
* use default constructor for common_chat_peg_parser
* make all operators functions for consistency
* fix compilation errors in test-optional.cpp
* simplify result values
* rename json_string_unquoted to json_string_content
* Move helper to separate class, add separate explicit and helper classes
* Whitespace
* Change + to append()
* Reformat
* Add extra helpers, tests and Minimax example
* Add some extra optional debugging prints + real example of how to use them
* fix bug in repetitions when min_count = 0 reports failures
* dump rule in debug
* fix token accumulation and assert parsing never fails
* indent debug by depth
* use LOG_* in tests so logs sync up with test logs
* - Add selective testing
- Refactor all messaging to use LOG_ERR
- Fix lack of argument / tool name capturing
- Temporary fix for double event capture
* refactor rule() and introduce ref()
* clean up visitor
* clean up indirection in root parser w.r.t rules
* store shared ptr directly in parser classes
* replace aho-corasick automation with a simple trie
* Reset prev for qwen3 helper example variant
* refactor to use value semantics with std::variant/std::visit
* simplify trie_matcher result
* fix linting issues
* add annotations to rules
* revert test workaround
* implement serializing the parser
* remove redundant parsers
* remove tests
* gbnf generation fixes
* remove LOG_* use in tests
* update gbnf tests to test entire grammar
* clean up gbnf generation and fix a few bugs
* fix typo in test output
* remove implicit conversion rules
* improve test output
* rename trie_matcher to trie
* simplify trie to just know if a node is the end of a word
* remove common_chat_ prefix and ensure a common_peg_ prefix to all types
* rename chat-peg-parser -> peg-parser
* promote chat-peg-parser-helper to chat-peg-parser
* checkpoint
* use a static_assert to ensure we handle every branch
* inline trivial peg parser builders
* use json strings for now
* implement basic and native chat peg parser builders/extractors
* resolve refs to their rules
* remove packrat caching (for now)
* update tests
* compare parsers with incremental input
* benchmark both complete and incremental parsing
* add raw string generation from json schema
* add support for string schemas in gbnf generation
* fix qwen example to include \n
* tidy up example
* rename extractor to mapper
* rename ast_arena to ast
* place basic tests into one
* use gbnf_format_literal from json-schema-to-grammar
* integrate parser with common/chat and server
* clean up schema and serialization
* add json-schema raw string tests
* clean up json creation and remove capture parser
* trim spaces from reasoning and content
* clean up redundant rules and comments
* rename input_is_complete to is_partial to match rest of project
* simplify json rules
* remove extraneous file
* remove comment
* implement += and |= operators
* add comments to qwen3 implementation
* reorder arguments to common_chat_peg_parse
* remove commented outdated tests
* add explicit copy constructor
* fix operators and constness
* wip: update test-chat for qwen3-coder
* bring json parser closer to json-schema-to-grammar rules
* trim trailing space for most things
* fix qwen3 coder rules w.r.t. trailing spaces
* group rules
* do not trim trailing space from string args
* tweak spacing of qwen3 grammar
* update qwen3-coder tests
* qwen3-coder small fixes
* place parser in common_chat_syntax to simplify invocation
* use std::set to collect rules to keep order predictable for tests
* initialize parser to make certain platforms happy
* revert back to std::unordered_set, sort rule names at the end instead
* uncomment rest of chat tests
* define explicit default constructor
* improve arena init and server integration
* fix chat test
* add json_member()
* add a comprehensive native example
* clean up example qwen test and add response_format example to native test
* make build_peg_parser accept std::function instead of template
* change peg parser parameters into const ref
* push tool call on tool open for constructed parser
* add parsing documentation
* clean up some comments
* add json schema support to qwen3-coder
* add id initializer in tests
* remove grammar debug line from qwen3-coder
* refactor qwen3-coder to use sequence over operators
* only call common_chat_peg_parse if appropriate format
* simplify qwen3-coder space handling
* revert qwen3-coder implementation
* revert json-schema-to-grammar changes
* remove unnecessary forward declaration
* small adjustment to until_parser
* rename C/C++ files to use dashes
* codeowners : add aldehir to peg-parser and related files
---------
Co-authored-by: Piotr Wilkin <piotr.wilkin@syndatis.com>
* refactor: unify reasoning handling via backend reasoning_content, drop frontend tag parsing
- Updated the chat message component to surface backend-supplied reasoning via message.thinking while showing the raw assistant content without inline tag scrubbing
- Simplified chat streaming to append content chunks directly, stream reasoning into the message model, and persist any partial reasoning when generation stops
- Refactored the chat service SSE handler to rely on server-provided reasoning_content, removing legacy <think> parsing logic
- Refreshed Storybook data and streaming flows to populate the thinking field explicitly for static and streaming assistant messages
* refactor: implement streaming-aware universal reasoning parser
Remove the streaming mode limitation from --reasoning-format by refactoring
try_parse_reasoning() to handle incremental parsing of <think> tags across
all formats.
- Rework try_parse_reasoning() to track whitespace, partial tags, and
multiple reasoning segments, allowing proper separation of reasoning_content
and content in streaming mode
- Parse reasoning tags before tool call handling in content-only and Llama 3.x
formats to ensure inline <think> blocks are captured correctly
- Change default reasoning_format from 'auto' to 'deepseek' for consistent
behavior
- Add 'deepseek-legacy' option to preserve old inline behavior when needed
- Update CLI help and documentation to reflect streaming support
- Add parser tests for inline <think>...</think> segments
The parser now continues processing content after </think> closes instead of
stopping, enabling proper message.reasoning_content and message.content
separation in both streaming and non-streaming modes.
Fixes the issue where streaming responses would dump everything (including
post-thinking content) into reasoning_content while leaving content empty.
* refactor: address review feedback from allozaur
- Passed the assistant message content directly to ChatMessageAssistant to drop the redundant derived state in the chat message component
- Simplified chat streaming updates by removing unused partial-thinking handling and persisting partial responses straight from currentResponse
- Refreshed the ChatMessage stories to cover standard and reasoning scenarios without the old THINK-tag parsing examples
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
* refactor: restore forced reasoning prefix to pass test-chat ([chat] All tests passed)
- store the exact sequence seen on input when 'thinking_forced_open' enforces a reasoning block
- inject this prefix before the first accumulated segment in 'reasoning_content', then clear it to avoid duplication
- repeat the capture on every new 'start_think' detection to properly handle partial/streaming flows
* refactor: address review feedback from ngxson
* debug: say goodbye to curl -N, hello one-click raw stream
- adds a new checkbox in the WebUI to display raw LLM output without backend parsing or frontend Markdown rendering
* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessage.svelte
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
* webui: add Storybook example for raw LLM output and scope reasoning format toggle per story
- Added a Storybook example that showcases the chat message component in raw LLM output mode with the provided trace sample
- Updated every ChatMessage story to toggle the disableReasoningFormat setting so the raw-output rendering remains scoped to its own example
* npm run format
* chat-parser: address review feedback from ngxson
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
---------
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* First attempt
* No permute during convert (fixes qk tensors), proper norm application.
* RoPE = NeoX
* Coherence!
* Migrate xielu params from tensors to hyperparameters
* Simple CUDA kernel
* Revert stupid LLM refactorings
* Chat template support
* configchecker / flake8 errors
* Reorder unary.cu
* I do conclude that LLMs are, in fact, stupid.
* Fix after merge
* Final newline
* Make xIELU an UNARY_OP
* Final newline
* Correctly account for parameter shift
* Argh.
* Update ggml/src/ggml-cpu/unary-ops.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Refactor: remove unused methods, inline and factorize softplus, add const modifiers
* Revert CUDA changes, implement xIELU as a separate OP
* Pesky newline
* Add float2half / half2float for F16 inputs/outputs
* CUDA variants, attempt 2
* Actually, attempt 3
* Update ggml/src/ggml-cuda/unary.cu
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Missing convert header
* Proper formula and reference for xIELU in the comments.
* Modify unary-ops.cpp to add the functor-based logic besides the template system to retain optimizations
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Add tensor mappings for Apertus to global list instead
* Fix lazy on scalars
* Update ggml/src/ggml-cuda/unary.cu
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Add comment about the constraints on positive/negative alpha
* Change `softplus` to `ggml_softplus`
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Currently when a model generates output which looks like a tool call,
but is invalid an exception is thrown and not handled, causing the cli
or llama-server to bail. Instead, handle the chat parser exception and
simply return the generated text in such cases.
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
* add preludes to content on partial regex match
* allow all parsers to parse non-tool-call content.
* tweak order of <|python_tag|> vs <function= parsing for functionary v3.1 format. still not ideal but hopefully less prone to crash