llama.cpp

Commit Graph

Author	SHA1	Message	Date
Ed Addario	b0b33b7ccb	Optimise tensor sampling	2025-08-20 20:58:26 +01:00
Ed Addario	3f0118d602	Fix bias lambda bug	2025-08-20 17:26:37 +01:00
Ed Addario	52da4a4f8c	Skip if output.weight or type is COPY	2025-08-20 17:26:05 +01:00
Ed Addario	43caadf783	Add better fallbacks for IQ mixes	2025-08-20 17:24:48 +01:00
Johannes Gäßler	7a6e91ad26	CUDA: replace GGML_CUDA_F16 with CUDA arch checks (#15433 )	2025-08-20 16:58:49 +02:00
Jeff Bolz	fec9519802	vulkan: shorten pipeline name strings (#15431 ) These detailed strings were causing increased build time on gcc.	2025-08-20 16:33:14 +02:00
Ed Addario	29b2dc3ec0	Do not mix K and IQ quants	2025-08-20 13:27:01 +01:00
Daniel Bevenius	657b8a77bd	chat: handle gpt-oss return/end token inconsistency (#15421 ) This commit addresses an inconsistency during inference by adding a new member to the `templates_params` struct to indicate whether the chat is in inference mode. This allows the gpt-oss specific function `common_chat_params_init_gpt_oss` to check this flag and the `add_generation_prompt` flag to determine if it should replace the `<\|return\|>` token with the `<\|end\|>` token in the prompt. The motivation for this change is to ensure that the formatted prompt of past messages in `common_chat_format_single` matches the output of the formatted new message. The issue is that the gpt-oss template returns different end tags: `<\|return\|>` when `add_generation_prompt` is false, and `<\|end\|>` when `add_generation_prompt` is true. This causes the substring function to start at an incorrect position, resulting in tokenization starting with 'tart\|>' instead of '<\|start\|>'. Resolves: https://github.com/ggml-org/llama.cpp/issues/15417	2025-08-20 14:26:01 +02:00
Ed Addario	69586e212e	Add F16/BF16 type	2025-08-20 13:23:11 +01:00
Jie Fu (傅杰)	ec5ab1a36c	common : fix context shift help message (#15448 ) Signed-off-by: Jie Fu <jiefu@tencent.com>	2025-08-20 13:33:30 +03:00
xiaobing318	1a99c2d948	cmake : fix target include directories (#15450 ) * Update docker.yml 修改docker.yml文件中的内容使其停止周期性的运行该workflow，如果想要运行该workflow可以手动启动 * feat:Modify the header file include path 1. There's no llava directory in the tools directory. 2. Because the command `target_include_directories(mtmd PUBLIC .)` is used in the `mtmd` CMakeLists.txt file, other targets that link against `mtmd` automatically include the `mtmd` directory as a search path for header files. Therefore, you can remove `target_include_directories(${TARGET} PRIVATE ../llava`` or use `target_include_directories(${TARGET} PRIVATE ../mtmd`` to explicitly require the `llama-server` target to use header files from `mtmd`. * Restore the docker.yml file	2025-08-20 13:32:05 +03:00
Daniel Bevenius	37f10f955f	make : remove make in favor of CMake (#15449 ) This commit removes the content from the Makefile and updates the current deprecation message to information that `make` has been replaced by CMake instead. The message when `make` is invoked will now be the following: ```console $ make Makefile:6: *** Build system changed: The Makefile build has been replaced by CMake. For build instructions see: https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md . Stop. ``` The motivation for this is that many, if not all targets fail to build now, after changes to the system, and `make` has also been deprected for some time now.	2025-08-20 13:31:16 +03:00
Georgi Gerganov	2f37014073	lookahead : add sample command to readme (#15447 ) * lookahead : add sample command to readme * cont : build-agnostic command	2025-08-20 13:30:46 +03:00
Ed Addario	5cd69a6809	Add F16/BF16 type	2025-08-20 09:41:39 +01:00
R0CKSTAR	a094f38143	musa: fix build warnings (#15258 ) * musa: fix build warnings Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * fix warning: comparison of integers of different signs: 'const int' and 'unsigned int' [-Wsign-compare] Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-08-20 10:17:37 +08:00
Ed Addario	b33abae231	Merge branch 'master' into quantize	2025-08-19 23:39:07 +01:00
Ed Addario	936294f6af	Increase precision for error calculation	2025-08-19 23:31:22 +01:00
Ed Addario	f22b3097eb	Avoid division by zero if truncation occurs	2025-08-19 22:34:01 +01:00
Ed Addario	ee05d6bc0b	Update comments	2025-08-19 22:32:53 +01:00
Ed Addario	5aceb9e3ae	Refactor variable names	2025-08-19 22:29:27 +01:00
lhez	fb22dd07a6	opencl: mark `argsort` unsupported if cols exceed workgroup limit (#15375 )	2025-08-19 11:25:51 -07:00
Georgi Gerganov	9ef6b0b835	model : add gpt-oss type strings (#15424 )	2025-08-19 19:58:28 +03:00
Gian-Carlo Pascutto	1e19f5d462	common : Add top-nsigma sampler to help globally (#15428 ) Fixes #15423.	2025-08-19 19:58:14 +03:00
Georgi Gerganov	d2fcd91cf9	server : disable context shift by default (#15416 ) * server : disable context shift by default ggml-ci * server : make scopr of test parameters local	2025-08-19 16:46:37 +03:00
SHUAI YANG	a6d3cfe7fa	CANN: optimize rope operator (#15335 ) * optimize rope ops * amendment * delete trailing whitespace * change the variable name	2025-08-19 21:28:22 +08:00
R0CKSTAR	67f09a3a27	musa: handle __hgt2_mask, available starting from MUSA SDK rc4.3.0 (#15413 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-08-19 12:33:47 +02:00
Ed Addario	1187f6aa9e	Implement bpw_overrides call	2025-08-19 11:07:03 +01:00
Ed Addario	92f49ab399	Add target_bpw_type() logic	2025-08-19 11:05:01 +01:00
Ed Addario	017945a3b2	Validate if imatrix contains activations	2025-08-19 11:03:52 +01:00
Ed Addario	9adae08789	Add is_iq()	2025-08-19 11:00:50 +01:00
Ed Addario	c96b8eef94	Add fallback_type enum	2025-08-19 11:00:05 +01:00
Ed Addario	a22a9deeee	Refactor variable and add target_bpw	2025-08-19 10:57:44 +01:00
Ed Addario	1b3d5b5744	Populate params	2025-08-19 10:56:02 +01:00
Ed Addario	e877474458	Process target_bpw parameter	2025-08-19 10:54:02 +01:00
Ed Addario	0edbf0c176	Process activations	2025-08-19 10:51:58 +01:00
Ed Addario	77b818c040	Populate activations_data with imatrix activations if present	2025-08-19 10:50:37 +01:00
Ed Addario	e6d55dc47b	Load activations	2025-08-19 10:49:01 +01:00
Ed Addario	5e85fb3ff3	Add parse_target_bpw()	2025-08-19 10:46:36 +01:00
Ed Addario	cfec4048ab	Update usage	2025-08-19 10:43:51 +01:00
Ed Addario	4d9491141b	Add target_bpw parameter	2025-08-19 10:43:21 +01:00
Marvin Gießing	6424594c56	ggml-cpu: add mxfp4 VSX intrinsics for Power9+ (ppc64le) hardware (#15385 ) * Added VSX intrinsics for Power9+ systems Signed-off-by: mgiessing <marvin.giessing@gmail.com> * Manual unrolling for minor perf improvement Signed-off-by: mgiessing <marvin.giessing@gmail.com> * Update ggml/src/ggml-cpu/arch/powerpc/quants.c Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Signed-off-by: mgiessing <marvin.giessing@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-08-19 11:54:31 +03:00
Ed Addario	ba7335efb3	Refactor variable name	2025-08-19 09:54:29 +01:00
Xuan-Son Nguyen	e9288e8869	chat : clarify the meaning of reasoning_format (#15408 ) * chat : clarify the meaning of reasoning_format * add link to this PR	2025-08-19 10:29:36 +02:00
Georgi Gerganov	9d262f4bad	server : remove swa_full warning (#15399 )	2025-08-19 08:45:26 +03:00
Georgi Gerganov	f0d3c7405c	batched-bench : use rand tokens (#15398 )	2025-08-19 08:45:12 +03:00
Xuan-Son Nguyen	f08c4c0d8d	mtmd : clean up clip_n_output_tokens (#15391 )	2025-08-18 22:53:52 +02:00
Georgi Gerganov	6d7f1117e3	codeowners : remove mmv.*	2025-08-18 22:06:44 +03:00
Georgi Gerganov	60212f1ead	sync : ggml	2025-08-18 22:06:44 +03:00
Georgi Gerganov	f0c541d315	scripts : update sync scripts	2025-08-18 22:06:44 +03:00
Sigbjørn Skjæret	baa9255a45	llama : merge conts and reshapes and remove unnecessary cont (#15380 ) * remove unnecessary conts and merge reshapes * restore necessary conts * merge more conts and reshapes * merge even more conts and reshapes	2025-08-18 19:30:17 +02:00

... 19 20 21 22 23 ...

7244 Commits All Branches Search

7244 Commits

All Branches