Commit Graph

512 Commits

Author SHA1 Message Date
pockers21 6d71ec0e16
Merge 6e89a8b296 into 3bc8d2cf23 2026-02-02 00:18:18 +02:00
EugeoSynthesisThirtyTwo 3dd95914d0
quantize: add option --tensor-type-file to llama-quantize (#18572)
* add option --tensor-type-file to llama-quantize, but it raises an error.

* add error message when file not found

* quantize: update help menu, fix CI

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Aaron Teo <aaron.teo1@ibm.com>
2026-01-31 11:39:21 +08:00
tc-mb ec6c7421e4
mtmd: support MiniCPM-o 4.5(vision only) (#19211)
Signed-off-by: tc-mb <caitianchi@modelbest.cn>
2026-01-30 23:19:30 +01:00
Georgi Gerganov bbada8bfb9
server : wrap around the "id_slot" parameter (#19207)
* server : wrap around the "id_slot" parameter

* cont : minor
2026-01-30 19:46:10 +02:00
Georgi Gerganov dabaa2e77a
spec : add ngram-mod (#19164)
* spec : add ngram-mod

* cont : simplify + keep track of occupancy

* cont : cleanup

* cont : move initialization to common/speculative

* cont : cleanup

* cont : cleanup

* cont : fix
2026-01-30 18:21:48 +02:00
Andrew Marshall 84b0a98319
webui: Update Svelte to fix effect_update_depth_exceeded errors (#19144)
The upstream fix is first available in 5.38.2, so constrain to at least
that version.

Rebuild pre-compiled webui index.html.gz based on these changes.

See also:
https://github.com/ggml-org/llama.cpp/issues/16347
https://github.com/huntabyte/bits-ui/issues/1687
https://github.com/sveltejs/svelte/issues/16548
2026-01-29 15:56:39 +01:00
Sascha Rogmann 72d3b1898a
spec : add self‑speculative decoding (no draft model required) + refactor (#18471)
* server: introduce self-speculative decoding

* server: moved self-call into speculative.cpp

* can_speculate() includes self-speculation

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* server: can_speculate() tests self-spec

* server: replace can_speculate() with slot.can_speculate()

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* common: use %zu format specifier for size_t in logging

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* server: can_speculate() requires a task instance

* common: ngram map, config self-speculative decoding

* common: add enum common_speculative_type

* common: add vector of speculative states

* common: add option --spec-draftless

* server: cleanup (remove slot.batch_spec, rename)

* common: moved self-spec impl to ngram-map

* common: cleanup (use common_speculative_state_draft)

* spec : refactor

* cont : naming

* spec: remove --spec-config

* doc: (draftless) speculative decoding

* common: print performance in spec decoding

* minor : cleanup

* common : better names

* minor : cleanup + fix build

* minor: comments

* CODEOWNERS: add common/ngram-map.* (#18471)

* common : rename speculative.draftless_type -> speculative.type

* ngram-map : fix uninitialized values

* ngram-map : take into account the input can become shorter

* ngram-map : revert len check for now

* arg : change `--spec-draftless` -> `--spec-type`

* spec : add common_speculative_state::accept()

* spec : refactor + add common_speculative_begin()

* spec : fix begin() call with mtmd

* spec : additional refactor + remove common_speculative_params

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-01-28 19:42:42 +02:00
liyang 6e89a8b296 Refactor JinaCLIP vision mmproj mapping to use tensor_mapping table 2026-01-28 13:06:10 +00:00
Georgi Gerganov b931f81b5a
server : adjust spec tests to generate up to 16 tokens (#19093) 2026-01-28 09:11:40 +02:00
liyang bb7a5561ce remove unused fused QKV mapping 2026-01-26 08:44:46 +00:00
liyang 463f536695 address #16574; fold CLI into mtmd-cli; use ggml_rope_ext + bicubic;switch to 'jinaclip2'; fix converter constants 2026-01-26 08:44:46 +00:00
Georgi Gerganov 080b161995
completion : fix prompt cache for recurrent models (#19045) 2026-01-25 09:12:50 +02:00
Daniel Bevenius 16639ba217
common : use two decimal places for float arg help messages (#19048)
* common : use two decimal places for float arg help messages

This commit updates the help messages for various command-line arguments
in arg.cpp to display floating-point default values with two decimal
places instead of one.

The motivation for this changes is that currently only having one decimal
place means that values generated using --help or llama-gen-docs will not
display the correct values.

For example, currently the value of top-p in tools/server/README.md is
`0.9`, but the default value is actually '0.95'. And running
llama-gen-docs does not update this value as it uses the output from the
help message, which shows only one decimal place, so the values look
like they are unchanged.

* docs : run llama-gen-docs to update docs
2026-01-25 07:31:42 +01:00
Johannes Gäßler e9fd8dcab4
llama-fit-params: keep explicit --ctx-size 0 (#19070) 2026-01-24 22:13:08 +01:00
Aldehir Rojas a3e812811d
cli : load parser definition (#19031)
* cli : load parser definition

* cont : only unload if a parser is defined
2026-01-22 20:31:22 -06:00
Xuan-Son Nguyen 51fa458a92
server : support preserving reasoning_content in assistant message (#18994)
* support reasoning_content input

* report template caps to webui

* add docs

* rm commented code
2026-01-22 21:30:06 +01:00
Xuan-Son Nguyen 4e595b250a
server: do not log certain endpoints (avoid log spam) (#19028) 2026-01-22 19:24:37 +01:00
Xuan-Son Nguyen 9eb5bfec1a
mtmd : update docs to use llama_model_n_embd_inp (#18999) 2026-01-22 14:36:32 +01:00
손희준 c6926d1d95
server: Reorder methods in `server-task.cpp` (#19016)
* Move `task_result_state::update_chat_msg` to match with header

* Move `server_task_result_cmpl_partial::to_json_anthropic()` to match with header

---------

Co-authored-by: openingnow <>
2026-01-22 14:36:04 +01:00
Hendrik Erz 3802d3c78f
fix: Use `tabular-nums` for chat message statistics (#18915)
* fix: Use `tabular-nums` for chat message statistics

* fix: Rebuild WebUI
2026-01-21 18:46:01 +01:00
손희준 fbbf3ad190
server: /v1/responses (partial) (#18486)
* from previous PR

* Make instruction(system) as first message

* Convert [input_message] (text/image/file)

* Rename convert_responses_to_chatcmpl(body) -> response_body

* Initial tool call support

* Erase instructions field from chatcmpl body

* Feed reasoning texts to chat template

* Use std::vector instead of opaque json array

* Make output_item.added events consistent

* Move `server_task_result_cmpl_partial::update` from header to source

* Match ID of output_item.added and .done events

* Add function_call only if there is no "fc_" prefix

* Add function call output at non-streaming API

* Test if ID is persistent

* Add doc

* Fix style - use trailing comma

* Rewrite state management

* catch up with upstream/master

* Fix style - "type" is the first item of SSE data

* Explicitly check "instructions" from response_body

* Make lambdas static

* Check if reasoning content exists

* Add `oai_resp_id` to task_result_state(also initialized at ctor), server_task_result_cmpl_partial, and server_task_result_cmpl_final

* Reject `input_file` since it is not supported by chatcmpl

* Add "fc_" prefix to non-straming function call id as coderabbit pointed out

---------

Co-authored-by: openingnow <>
2026-01-21 17:47:23 +01:00
Adrien Gallouët 1c7cf94b22
common, server : use the same User-Agent by default (#18957)
This commit also ensures that if a custom User-Agent is used, it will be
the only one sent.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-01-20 18:28:43 +01:00
Xuan-Son Nguyen 2c1f199653
cli : fix reasoning responses in CLI (#18961)
* cli : fix reasoning responses in CLI

* fix build

* fix build (2)
2026-01-20 18:23:25 +01:00
Xuan-Son Nguyen 6df686bee6
server : refactor oai_parser_opt, move it to server_chat_params (#18937)
* server_chat_params

* move chat format into CLI

* use meta whenever possible

* clean up, no more chatml fallback
2026-01-19 23:28:01 +01:00
Lennart Austenfeld 18361c579c
server: fix memory reservations in populate_token_probs (#18787) 2026-01-19 19:13:31 +01:00
Tarek Dakhran c945aaaef2
mtmd : Fix ASR for LFM2.5-Audio-1.5B (#18876) 2026-01-16 11:23:08 +01:00
Xuan-Son Nguyen c15395f73c
common : implement new jinja template engine (#18462)
* jinja vm

* lexer

* add vm types

* demo

* clean up

* parser ok

* binary_expression::execute

* shadow naming

* bin ops works!

* fix map object

* add string builtins

* add more builtins

* wip

* use mk_val

* eval with is_user_input

* render gemma tmpl ok

* track input string even after transformations

* support binded functions

* keyword arguments and slicing array

* use shared_ptr for values

* add mk_stmt

* allow print source on exception

* fix negate test

* testing more templates

* mostly works

* add filter_statement

* allow func to access ctx

* add jinja-value.cpp

* impl global_from_json

* a lot of fixes

* more tests

* more fix, more tests

* more fixes

* rm workarounds

* demo: type inferrence

* add placeholder for tojson

* improve function args handling

* rm type inference

* no more std::regex

* trailing spaces

* make testing more flexible

* make output a bit cleaner

* (wip) redirect minja calls

* test: add --output

* fix crash on macro kwargs

* add minimal caps system

* add some workarounds

* rm caps_apply_workarounds

* get rid of preprocessing

* more fixes

* fix test-chat-template

* move test-chat-jinja into test-chat-template

* rm test-chat-jinja from cmake

* test-chat-template: use common

* fix build

* fix build (2)

* rename vm --> interpreter

* improve error reporting

* correct lstrip behavior

* add tojson

* more fixes

* disable tests for COMMON_CHAT_FORMAT_GENERIC

* make sure tojson output correct order

* add object.length

* fully functional selectattr / rejectattr

* improve error reporting

* more builtins added, more fixes

* create jinja rendering tests

* fix testing.h path

* adjust whitespace rules

* more fixes

* temporary disable test for ibm-granite

* r/lstrip behavior matched with hf.js

* minimax, glm4.5 ok

* add append and pop

* kimi-k2 ok

* test-chat passed

* fix lstrip_block

* add more jinja tests

* cast to unsigned char

* allow dict key to be numeric

* nemotron: rm windows newline

* tests ok

* fix test

* rename interpreter --> runtime

* fix build

* add more checks

* bring back generic format support

* fix Apertus

* [json.exception.out_of_range.403] key 'content' not found

* rm generic test

* refactor input marking

* add docs

* fix windows build

* clarify error message

* improved tests

* split/rsplit with maxsplit

* non-inverse maxsplit

forgot to change after simplifying

* implement separators for tojson and fix indent

* i like to move it move it

* rename null -- > none

* token::eof

* some nits + comments

* add exception classes for lexer and parser

* null -> none

* rename global -> env

* rm minja

* update docs

* docs: add input marking caveats

* imlement missing jinja-tests functions

* oops

* support trim filter with args, remove bogus to_json reference

* numerous argument fixes

* updated tests

* implement optional strip chars parameter

* use new chars parameter

* float filter also has default

* always leave at least one decimal in float string

* jinja : static analysis + header cleanup + minor fixes

* add fuzz test

* add string.cpp

* fix chat_template_kwargs

* nits

* fix build

* revert

* unrevert

sorry :)

* add fuzz func_args, refactor to be safer

* fix array.map()

* loosen ensure_vals max count condition, add not impl for map(int)

* hopefully fix windows

* check if empty first

* normalize newlines

---------

Co-authored-by: Alde Rojas <hello@alde.dev>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-01-16 11:22:06 +01:00
Julius Tischbein aa1dc3770a
Setting mmap and direct_io to false as default in llama-bench.cpp (#18841) 2026-01-16 09:46:51 +01:00
ddh0 13f1e4a9ca
llama : add adaptive-p sampler (#17927)
* initial commit for branch

* simplify constants

* add params to `struct common_params_sampling`, add reference to PR

* explicitly clamp `min_target` and `max_target` to `[0.0, 1.0]`

* add args, rename `queue_size` -> `window_size`

* improved comments

* minor

* remove old unused code from algorithm

* minor

* add power law case to `common_sampler_init`, add sampler name mappings

* clarify behaviour when `window_size = 0`

* add missing enums

* remove `target_range` param, make `target == 1` no-op, cleanup code

* oops, straggler

* add missing parameters in `server-task.cpp`

* copy from author

ref:
https://gist.github.com/MrJackSpade/9be99c7efbba7b95a41377e123b7b069

* remove old debug log, style nit

* fix compiler warning, add commented-out logging per token

* re-write + change parameters + simplify

* oops forgot args.cpp

* fix leftover `window_size`

* add missing values to `common_params_sampling::print()`

* with logging

* does this fix it?

* no, but does this?

* update default decay

* optimize

* fix bad merge

my git skills are lacking

* silence `missing initializer for member`

* update default decay to 0.9

* fix logging

* format (double)

* add power law to the new `samplers` vector

* log sampler init values

* improve logging messages in llama_sampler_power_law

* remove extraneous logging

* simplify target computation

last commit with debug logging!

* remove debug logging, explicitly clamp params at init

* add `use_power_law` flag + logic, minor cleanup

* update `power-law` -> `adaptive-p`

* fix cold start EMA

- `ctx->weighted_sum` is now initialized and reset to `target / (1.0f -
clamped_decay)`
- `ctx->total_weight` is now initialized and reset to `1.0f / (1.0f -
clamped_decay)`

this fixes a "cold start" problem with the moving average

* update `SHARPNESS` constant to `10.0f`

* minor style fixes

no functional changes

* minor style fixes cont.

* update `llama_sampler_adaptive_p_i` for backend sampling (ref: #17004)

* separate into `apply` + `accept` functions

* `pending_token_idx`: switch from `llama_token` to `int32`

functionally identical (`llama.h` has `typedef int32_t llama_token;`),
but its more correct now

* don't transform logits <= -1e9f

* fix masking in backend top-p, min-p

* address review comments

* typo in comments `RND` -> `RNG`

* add docs

* add recommended values in completion docs

* address PR feedback

* remove trailing whitespace (for CI `editorconfig`)

* add to adaptive-p to `common_sampler_types_from_chars`
2026-01-15 19:16:29 +02:00
Xuan-Son Nguyen a04c2b06a3
server: improve slots scheduling for n_cmpl (#18789)
* server : make sure children tasks are scheduled to launch with parent

* fix

* add comment pointing to this PR

* fix

* clean up

* more debug messages

* add pop_deferred_task with specific ID version

* improve the logic

* simple approach

* no double move

* correct return type of launch_slots_with_parent_task
2026-01-15 17:10:28 +01:00
Georgi Gerganov 39173bcacb
context : reserve new scheduler when graph topology changes (#18547)
* context : reserve new scheduler when graph topology changes

* cont : fix

* cont : fix reserve

* cont : reserve only when changes occur + timing

* context : add comments

* llama : reserve on sampler changes

* common : allow null common_sampler

* server : task declares needs (embd, logits, sampling)

* server : do not init sampler if not needed

* llama : fix need_reserve when unsetting a sampler

* server : consolidate slot reset/clear logic
2026-01-15 16:39:17 +02:00
Piotr Wilkin (ilintar) d98b548120
Restore clip's cb() to its rightful glory - extract common debugging elements in llama (#17914)
* Extract common debugging functions; plug eval-callback and mtmd's MTMD_DEBUG_GRAPH with same functionality

* Move to common

* Remove unneeded header

* Unlink from common

* chore: update webui build output

* Cleanup; properly pass params to mtmd without depending on common; factorize debug.cpp to use common debug code.

* Revert change to webapp

* Post-merge adjust

* Apply suggestions from code review

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* Apply code review changes

* Remove changes to server-context

* Remove mtmd.h include

* Remove utility functions from header

* Apply suggestions from code review

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* Rename functions

* Update tools/mtmd/clip.cpp

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* Update tools/mtmd/clip.cpp

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* Update tools/mtmd/clip.cpp

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2026-01-14 20:29:35 +01:00
Adrien Gallouët 516a4ca9b5
refactor : remove libcurl, use OpenSSL when available (#18828) 2026-01-14 18:02:47 +01:00
Xuan-Son Nguyen e047f9ee9d
mtmd: fix use_non_causal being reported incorrectly (#18793)
* mtmd: fix use_non_causal being reported incorrectly

* move clip_is_mrope to mtmd_decode_use_mrope

* fix sloppy code ggml_cpy
2026-01-13 12:19:38 +01:00
Ruben Ortlam db79dc06b1
llama-bench: add direct_io parameter (#18778) 2026-01-13 08:49:10 +01:00
Radoslav Gerganov bcf7546160
server : add arg for disabling prompt caching (#18776)
* server : add arg for disabling prompt caching

Disabling prompt caching is useful for clients who are restricted to
sending only OpenAI-compat requests and want deterministic
responses.

* address review comments

* address review comments
2026-01-12 19:21:34 +02:00
Xuan-Son Nguyen ce3bf9b1a4
server: update docs for sleeping [no ci] (#18777) 2026-01-12 13:01:24 +01:00
Georgi Gerganov f307926482
server : adjust unified KV cache tests (#18716) 2026-01-10 17:51:56 +02:00
Xuan-Son Nguyen 9ac2693a30
server: fix n_cmpl not skipping processing prompt (#18663)
* server: fix n_cmpl not skipping processing

* fix infinite loop on empty batch

* cont : init child samplers + modify child logic

* cont : cleanup

* cont : improve n_cmpl logic

- launch the parent task first so it finds the slot with best cache
- parent task waits for child tasks to be launched
- when a child task finishes - remove its cache

* cont : remove redundant function

* cont : reduce parent checks

* fix : nullptr task dereference

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-01-10 00:00:41 +01:00
Simranjeet Singh a61c8bc3bf
mtmd: Add Gemma3n multimodal support with MobileNetV5 vision encoder (#18256)
* Add Gemma3nVisionModel - MobileNetV5 vision encoder convertor to convert_hf_to_gguf.py. Add gemma3n to vision projectors in gguf-py/gguf/constants.py.

* Add mobilenetv5 impl

* Fix comments, remove unused vars

* Fix permute and remove transpose of projection weights

* Fix comments, remove debugging prints from hf_to_gguf

* 1. Hard-code image_mean = 0 and image_std = 1
2. Use available tensor mapping logic
3. Remove redundant chat template replacement of soft tokens placeholder with media placeholder

* 1. Move mobilenetv5 helpers declarations to `clip_graph_mobilenetv5` struct and definitions to mobilenetv5.cpp
2.Remove unused `clip_is_gemma3n` func declarations and definitions
3. Remove redundant `rescale_image_u8_to_f32` func and use `normalize_image_u8_to_f32` with zero mean and unit std
4. Calculate n_patches using image_size / patch_size

* Remove obsolete comments

* - convert_hf_to_gguf.py & constants.py & tensor_mapping.py: Use explicit mapping: Custom map for double indexed blocks and tensor_mapping.py for rest
- convert_hf_to_gguf.py: Unsqueeze Stem Bias and Layer scale tensors to correct shape while converting to gguf
- mobilenetv5.cpp: Remove explicit reshaping of Stem Bias and Layer scale which are now handled while converting to gguf, replace fprintf with LOG_*
- clip.cpp: Remove unused embedding and hard_emb_norm tensor loading

* - Rename tensors to v.conv..., v.blk..., v.msfa... to better align with already existing terminology

* Fix stem conv bias name

* Remove explicit handling of bias term for stem conv

* - Change order of addition in "project_per_layer_inputs" to support broadcasting of vision inp_per_layer
- Simplify the vision embeddings path of "get_per_layer_inputs" to output [n_embd_altup, n_layer, 1], broadcastable

* clean up conversion script

* fix code style

* also preserve audio tensors

* trailing space

* split arch A and V

* rm unused gemma3 func

* fix alignment

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-01-09 23:42:38 +01:00
Pascal ec8fd7876b
Webui/file upload (#18694)
* webui: fix restrictive file type validation

* webui: simplify file processing logic

* chore: update webui build output

* webui: remove file picker extension whitelist (1/2)

* webui: remove file picker extension whitelist (2/2)

* chore: update webui build output

* refactor: Cleanup

* chore: update webui build output

* fix: update ChatForm storybook test after removing accept attribute

* chore: update webui build output

* refactor: more cleanup

* chore: update webui build output
2026-01-09 16:45:32 +01:00
Asbjørn Olling a180ba78c7
cmake: only build cli when server is enabled (#18670) 2026-01-09 16:43:26 +01:00
Georgi Gerganov 53eb9435da
server : fix timing of prompt/generation (#18713) 2026-01-09 12:59:50 +02:00
Georgi Gerganov f5f8812f7c
server : use different seeds for child completions (#18700)
* server : use different seeds for child completions

* cont : handle default seed

* cont : note
2026-01-09 09:33:50 +02:00
Adrien Gallouët 55abc39355
vendor : update cpp-httplib to 0.30.0 (#18660)
* vendor : update cpp-httplib to 0.30.0
* common : allow custom headers when downloading
2026-01-08 13:53:54 +01:00
Johannes Gäßler 64848deb18
llama-fit-params: free memory target per device (#18679) 2026-01-08 10:07:58 +01:00
Adrien Gallouët 56d2fed2b3
tools : remove llama-run (#18661)
* tools : remove llama-run
* Remove licenses/LICENSE-linenoise

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-01-07 16:18:26 +01:00
Tarek Dakhran ccbc84a537
mtmd: mtmd_audio_streaming_istft (#18645)
Change is decoupled from https://github.com/ggml-org/llama.cpp/pull/18641.

[LFM2.5-Audio-1.5B](https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B)
needs streaming istft for generating output audio.

* add streaming ISTFT class (`mtmd_audio_streaming_istft`) with overlap-add for audio reconstruction
* replace global audio cache with per-instance cache, the model requires
  two independent caches, for preprocessing (audio input) and for istft
  (audio output).
* unified templated FFT/IFFT implementation supporting both forward and inverse transforms
2026-01-06 21:00:29 +01:00
R 3d26a09dc7
server : add thinking content blocks to Anthropic Messages API (#18551)
* server : add thinking content blocks to Anthropic Messages API

Add support for returning reasoning/thinking content in Anthropic API
responses when using models with --reasoning-format deepseek and the
thinking parameter enabled.

- Non-streaming: adds thinking block before text in content array
- Streaming: emits thinking_delta events with correct block indices
- Partial streaming: tracks reasoning state across chunks via
  anthropic_has_reasoning member variable

Tested with bartowski/DeepSeek-R1-Distill-Qwen-7B-GGUF model.

* server : fix Anthropic API streaming for thinking content blocks

Add signature field and fix duplicate content_block_start events in
Anthropic Messages API streaming responses for reasoning models.

* server: refactor Anthropic streaming state to avoid raw pointer

Replace raw pointer to task_result_state with direct field copies:
- Copy state fields in update() before processing chunk
- Use local copies in to_json_anthropic() instead of dereferencing
- Pre-compute state updates for next chunk in update()

This makes the data flow clearer and avoids unsafe pointer patterns.
2026-01-06 16:17:13 +01:00
Tarek Dakhran 73d284a250
model : add LFM2-ColBert-350M (#18607)
* model : add LFM2-ColBert-350M

* llama_model_n_embd_out() - returns `hparams.n_embd_out` if set and fallbacks to `hparams.n_embd`
2026-01-05 19:52:56 +01:00