llama.cpp

Commit Graph

Author	SHA1	Message	Date
Srihari-mcw	2913ac95dc	Remove trailing whitespaces	2025-12-16 22:50:29 -08:00
Manogna-Sree	9976c21bd3	Remove empty line	2025-12-16 22:50:29 -08:00
Manogna-Sree	75712bc6b1	Fix CI/CD issues	2025-12-16 22:50:29 -08:00
Manogna-Sree	b407f188b2	Fix for inaccuracies in the scalar version	2025-12-16 22:50:29 -08:00
Srihari-mcw	ac42365ca5	Remove print	2025-12-16 22:50:29 -08:00
Srihari-mcw	a3957d1173	Rename variables to maintain convention in other functions	2025-12-16 22:50:29 -08:00
Srihari-mcw	be80640fea	Fix issues with scalar version	2025-12-16 22:50:29 -08:00
Srihari-mcw	5c851ca7bd	Cleanup GEMV Code	2025-12-16 22:50:29 -08:00
Srihari-mcw	4806d6a8fe	Add further fixes and updates to scalar code	2025-12-16 22:50:29 -08:00
Srihari-mcw	c29ac56955	Further cleanup	2025-12-16 22:50:29 -08:00
Srihari-mcw	266fa80020	Cleanup of smaller loop of AVX2	2025-12-16 22:50:28 -08:00
Srihari-mcw	d6fb079cb5	Cleanup commit for AVX2 GEMM bigger loop	2025-12-16 22:50:28 -08:00
Srihari-mcw	e1c3c053c0	Further cleanup of GEMM	2025-12-16 22:50:28 -08:00
Srihari-mcw	56b1f7d648	Initial cleanup of GEMM	2025-12-16 22:50:28 -08:00
Srihari-mcw	6e46dc1108	GEMM scalar implementation	2025-12-16 22:50:28 -08:00
Srihari-mcw	61a8c046dd	GEMV scalar implementation	2025-12-16 22:50:28 -08:00
Manogna-Sree	ed662687cf	Avx512 implementation of GEMM Q6K for edge handling case	2025-12-16 22:50:28 -08:00
Manogna-Sree	684c4cad9e	Avx512 implementation of GEMM Q6K	2025-12-16 22:50:28 -08:00
Manogna-Sree	5311e5217c	Initial implementation of GEMM Q6_K for edge handling case	2025-12-16 22:50:28 -08:00
Manogna-Sree	4630b5187e	Fix for inaccuracy of GEMM Q6K	2025-12-16 22:50:28 -08:00
Manogna-Sree	aedac0d7bc	Initial interleaving support for Q6_K Block Interleaving	2025-12-16 22:50:24 -08:00
TrevorS	4b2a4778f8	arg: allow -kvu flag for llama-perplexity (#18117) The -kvu (--kv-unified) flag is required for hellaswag and winogrande benchmarks which use coupled sequences. Without unified KV cache, these benchmarks fail with: split_equal: sequential split is not supported when there are coupled sequences in the input batch (you may need to use the -kvu flag) This change adds LLAMA_EXAMPLE_PERPLEXITY to the allowed examples for the -kvu argument, enabling its use with llama-perplexity.	2025-12-17 08:33:02 +02:00
Aadeshveer Singh	58062860af	ggml : use WARP_SIZE/2 for argmax reduction offset (#18092)	2025-12-17 11:47:01 +08:00
Yuri Khrustalev	2973a65ecb	gguf-py : allow converting multi-tensor models from read-only locations (#18100)	2025-12-17 02:27:03 +01:00
Johannes Gäßler	d0794e89d9	llama-fit-params: force disable mlock (#18103)	2025-12-17 00:50:12 +01:00
Johannes Gäßler	9dcac6cf9f	llama-fit-params: lower ctx size for multi GPU (#18101)	2025-12-17 00:49:34 +01:00
Johannes Gäßler	0e49a7b8b4	llama-fit-params: fix underflow for dense models (#18095)	2025-12-17 00:47:37 +01:00
Johannes Gäßler	4164596c76	llama-fit-params: QoL impr. for prints/errors (#18089)	2025-12-17 00:03:19 +01:00
Xuan-Son Nguyen	ef83fb8601	model: fix LFM2 missing tensors (#18105)	2025-12-16 19:07:43 +01:00
Johannes Gäßler	ec98e20021	llama: fix early stop in params_fit if ctx is set (#18070)	2025-12-16 14:24:00 +01:00
yifant-code	59977eba7b	server: fix crash when batch > ubatch with embeddings (#17912) * server: fix crash when batch > ubatch with embeddings (#12836) Fixes #12836 where the server crashes with GGML_ASSERT failure when running with embeddings enabled and n_batch > n_ubatch. Root cause: Embeddings use non-causal attention which requires all tokens to be processed within a single ubatch. When n_batch > n_ubatch, the server attempts to split processing, causing assertion failure. Solution: - Add parameter validation in main() after common_params_parse() - When embeddings enabled and n_batch > n_ubatch: * Log warnings explaining the issue * Automatically set n_batch = n_ubatch * Prevent server crash This follows the approach suggested by @ggerganov in issue #12836. Note: This supersedes stalled PR #12940 which attempted a runtime fix in the old examples/server/server.cpp location. This implementation validates at startup in tools/server/server.cpp (current location). Testing: - Build: Compiles successfully - Validation triggers: Warns when -b > -ub with --embedding - Auto-correction works: Adjusts n_batch = n_ubatch - No false positives: Valid params don't trigger warnings - Verified on macOS M3 Pro with embedding model * Update tools/server/server.cpp --------- Co-authored-by: ytian218 <ytian218@bloomberg.net> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-12-16 14:27:36 +02:00
Daniel Bevenius	79dbae034a	model-conversion : remove -fa option in model card template [no ci] (#18088) This commit updates the causal model card template and removes the -fa option as it is no longer required (fa is auto detected).	2025-12-16 13:25:09 +01:00
Xuan-Son Nguyen	7f2b2f3c77	arch: refactor LLM_TENSOR_NAMES (#18051) * arch: refactor LLM_TENSOR_NAMES * update docs * typo * fix LLM_ARCH_NEMOTRON_H_MOE * show more meaningful error message on missing tensor * fix and tested LLM_ARCH_NEMOTRON_H_MOE	2025-12-16 13:22:30 +01:00
Xuan-Son Nguyen	7b1db3d3b7	arg: clarify auto kvu/np being set on server (#17997) * arg: clarify auto kvu/np being set on server * improve docs * use invalid_argument	2025-12-16 12:01:27 +01:00
Piotr Wilkin (ilintar)	a5251ca11d	Optimization: Qwen3 next autoregressive pass (#17996) * It's Qwen3 Next, the lean mean token generation machine! * Apply patches from thread * Remove recurrent version, only keep chunked and autoregressive * Remove unnecessary conts and asserts * Remove more extra conts and asserts * Cleanup masking	2025-12-16 11:59:53 +01:00
Andrew Aladjev	fb644247de	CLI: fixed adding cli and completion into docker containers, improved docs (#18003) Co-authored-by: Andrew Aladjev <andrew.aladjev@gmail.com>	2025-12-16 11:52:23 +01:00
2114L3	5f5f9b4637	server: Update README.md incorrect argument (#18073) n-gpu-layer is an incorrect argument; the correct one is n-gpu-layers, with the 's'	2025-12-16 11:50:43 +01:00
Xuan-Son Nguyen	3d86c6c2b5	model: support GLM4V vision encoder (#18042) * convert ok * no deepstack * less new tensors * cgraph ok * add mrope for text model * faster patch merger * add GGML_ROPE_TYPE_MRNORM * add support for metal * move glm4v to dedicated graph * convert: add norm_embd * clip: add debugging fn * working correctly * fix style * use bicubic * fix mrope metal * improve cpu * convert to neox ordering on conversion * revert backend changes * force stop if using old weight * support moe variant * fix conversion * fix convert (2) * Update tools/mtmd/clip-graph.h Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * process mrope_section on TextModel base class * resolve conflict merge --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-12-16 11:25:26 +01:00
Daniel Bevenius	9963b81f63	model-conversion : add note about verifying previous models (#18082) This commit adds a note to the README in the model-conversion examples, advising developers to verify that previous versions of models pass logits verification before adding new models from the same family.	2025-12-16 11:17:40 +01:00
Daniel Bevenius	db81d5ec4b	model-conversion : use CONVERTED_EMBEDDING_MODEL for embedding_verify_logits (#18079) This commit updates the embedding model verification script to use the CONVERTED_EMBEDDING_MODEL environment variable instead of using the EMBEDDING_MODEL_PATH (the original embedding model path) as the basis for the converted model file name. The motivation for this is that currently, if the converted embedding model file name differs from the original embedding model directory/name, the verification script will look for the wrong .bin files that were generated when running the models.	2025-12-16 11:17:20 +01:00
Aldehir Rojas	c05aa69f32	common : add nemotron 3 parsing (#18077) * common : expose json-schema functionality to extract type info * common : fix peg parser negation during needs_more_input * common : add some defensive measures in constructed peg parser * common : add nemotron nano 3 support * common : add nemotron nano 3 tests * remove debug line	2025-12-16 04:05:23 -06:00
Francisco Herrera	279cef27c2	added note for old Intel hardware pre sycl (#18017) * added note for old Intel hardware pre sycl Older hardware used opencl * typo * use consistent terms	2025-12-16 17:45:09 +08:00
Georgi Gerganov	5ba95754ee	security : add collaborator guidance (#18081)	2025-12-16 11:17:11 +02:00
Chris Peterson	2aa45ef9e3	llama: Include algorithm header needed for C++23 (#18078)	2025-12-16 09:37:55 +02:00
Georgi Gerganov	c560316440	graph : reuse SSM graphs (#16490) * graph : reuse hybrid graphs * graph : reuse recurrent graphs * graph : fix reuse check for recurrent inputs * memory : move the recurrent state into the memory context * Revert "memory : move the recurrent state into the memory context" This reverts commit 00f115fe810815d4a22a6dee0acc346131e970e1. * cont : fix build	2025-12-16 09:36:21 +02:00
Sigbjørn Skjæret	d6742125c3	ci : separate webui from server (#18072) * separate webui from server * add public to path	2025-12-16 08:17:26 +01:00
Aleksander Grygier	3034836d36	webui: Improve copy to clipboard with text attachments (#17969) * feat: Create copy/paste user message including "pasted text" attachments * chore: update webui build output * chore: update webui static output * fix: UI issues * chore: update webui static output * fix: Decode HTML entities using `DOMParser` * chore: update webui build output * chore: update webui static output	2025-12-16 07:38:46 +01:00
Aleksander Grygier	a20979d433	webui: Add setting to always show sidebar on Desktop (#17809) * feat: Add setting to always show Sidebar on Desktop * chore: update webui build output * feat: Add auto-show sidebar setting * fix: Mobile settings dialog UI * chore: update webui build output * feat: UI label update * chore: update webui build output * chore: update webui build output * chore: update webui build output * refactor: Cleanup * chore: update webui build output	2025-12-16 07:31:37 +01:00
Daniel Bevenius	2995341730	llama : add support for NVIDIA Nemotron 3 Nano (#18058) * llama : add support for NVIDIA Nemotron Nano 3 This commit adds support for the NVIDIA Nemotron Nano 3 model, enabling the conversion and running of this model. Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-12-16 07:19:26 +01:00
Darius Lukas	40d9c394f4	Webui: Disable attachment button and model selector button when prompt textbox is disabled. (#17925) * Pass disabled state to the file attachments button and the model selector button. * Update index.html.gz * Fix model info card in non-router mode. * Update index.html.gz	2025-12-16 07:15:49 +01:00
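
Commit 59977eba7b (#17912) describes a startup check for the server: when embeddings are enabled and n_batch > n_ubatch, it logs a warning and sets n_batch = n_ubatch, because non-causal attention needs all tokens of a batch in a single ubatch. A minimal Python sketch of that rule follows. It is an illustration, not the actual tools/server/server.cpp code; `ServerParams` and `validate_embedding_batch` are hypothetical names.

```python
from dataclasses import dataclass
import logging

log = logging.getLogger("server")


@dataclass
class ServerParams:
    """Hypothetical stand-in for the parsed server options (not llama.cpp's real struct)."""
    embedding: bool = False  # corresponds to running with --embedding
    n_batch: int = 2048      # logical batch size (-b)
    n_ubatch: int = 512      # physical micro-batch size (-ub)


def validate_embedding_batch(params: ServerParams) -> bool:
    """Clamp n_batch to n_ubatch when embeddings are enabled.

    Embeddings use non-causal attention, so every token of a batch must be
    processed within one ubatch; a larger n_batch would force a split.
    Returns True if params were adjusted, False otherwise.
    """
    if params.embedding and params.n_batch > params.n_ubatch:
        log.warning(
            "embeddings require n_batch <= n_ubatch (%d > %d); setting n_batch = n_ubatch",
            params.n_batch, params.n_ubatch,
        )
        params.n_batch = params.n_ubatch
        return True
    return False


# Usage: like passing --embedding -b 2048 -ub 512
p = ServerParams(embedding=True, n_batch=2048, n_ubatch=512)
validate_embedding_batch(p)  # returns True; p.n_batch is now 512
```

Validating once at startup, rather than at request time, matches the commit's stated approach: valid parameter combinations pass through untouched and never trigger the warning.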

7466 Commits