Commit Graph

3021 Commits

Author SHA1 Message Date
ngxson d41c719980 bring back n_completions 2024-06-11 14:31:45 +02:00
Christian Zhou-Zheng 446da906d9 fix n_completions 2024-06-11 08:22:38 -04:00
ngxson 163916864c remember to copy back the last_eigenvector 2024-06-11 12:40:07 +02:00
ngxson 1a088fb0a5 working version 2024-06-11 12:37:05 +02:00
ngxson 9e39571fc2 add n_batch for pca 2024-06-11 11:45:16 +02:00
ngxson 6a5adf3d7c fix shape of v_diff_original 2024-06-11 01:33:16 +02:00
ngxson c241b500a1 clean up PCA ggml implementation 2024-06-11 01:13:10 +02:00
ngxson a710df749c (wip) refactor 2024-06-07 15:37:58 +02:00
Christian Zhou-Zheng a42e783d75 update comments 2024-06-03 21:33:46 -04:00
Christian Zhou-Zheng 3815a0c306 pre-tokenize so we can allocate correct memory to ctx_diffs_wrapped 2024-06-03 21:26:13 -04:00
Christian Zhou-Zheng 23fd1b587c update debug statements 2024-06-03 21:14:43 -04:00
Christian Zhou-Zheng 07dba13ab6 temporary commit while I move dev environments
it finally outputs a functioning control vector - "functioning" in the sense that it can be loaded and it clearly has the right idea, but makes the model incoherent
2024-06-03 17:40:19 -04:00
ngxson 15d5c257a0 fix cb_eval 2024-06-02 10:58:11 +02:00
Christian Zhou-Zheng a23c72e4c0 fix ggml errors and make new ones
at least it compiles and runs
2024-06-01 22:19:33 -04:00
Christian Zhou-Zheng b67ea65983 tentatively translate the rest 2024-06-01 20:47:28 -04:00
Christian Zhou-Zheng 0e1f9734de translated everything but PCA (I think) 2024-06-01 19:50:46 -04:00
Christian Zhou-Zheng df623fffe8 interim fix memory leak 2024-06-01 18:36:54 -04:00
Christian Zhou-Zheng 3090c485b6 remove unnecessary multithreading 2024-06-01 18:32:14 -04:00
Christian Zhou-Zheng 544268888b in-series multithreading for prompt embedding?
added commented-out code to attempt to start implementing multithreading for embedding in main
2024-06-01 17:25:21 -04:00
Christian Zhou-Zheng 86842b20e5 fix compiler warnings 2024-05-31 22:25:46 -04:00
Christian Zhou-Zheng db3ba108e7 code aestheticization 2024-05-31 21:38:02 -04:00
Christian Zhou-Zheng 62560367aa add command-line args for num threads, num completions file lines, always reload model
refactored a few things and did what the commit message says on the tin
2024-05-31 21:27:14 -04:00
Christian Zhou-Zheng 4d7d71bc43 fix square_diff matmul index range and CRLF->LF line endings
fixed a logic error where square_diff would not multiply all rows

fixed a formatting error where the provided completions.txt had CRLF line endings
2024-05-31 21:08:25 -04:00
Christian Zhou-Zheng 4d88cd1af1 fix zero output & param parsing, functional templating
fixed a bug where the output file had no tensor data/was all zero

fixed a bug where single hyphen flags were not being correctly parsed

implements creation of templated prompts from input (still need to adapt based on model)
2024-05-31 12:40:35 -04:00
Christian Zhou-Zheng fa85ba6ae3 preliminary template/multiprompt support
the model is running out of context and that ought to be fixed (it segfaults), but other than that it looks goodish
2024-05-30 23:39:59 -04:00
Christian Zhou-Zheng 31f153fe9c fix matrix transpose multiplication
you have got to be kidding me
2024-05-30 21:36:17 -04:00
ngxson d446c6d887 add debugs 2024-05-31 00:41:12 +02:00
ngxson 287da25f48 fix mem error 2024-05-31 00:06:45 +02:00
ngxson 447023fc43 add multi prompts, multi-thread for PCA 2024-05-30 23:58:32 +02:00
Christian Zhou-Zheng dc46264ff0 example template completions
Implements an example template set built from the positive/negative prompts like the control vector Python implementation.
2024-05-30 13:12:54 -04:00
Christian Zhou-Zheng f58f6af133 param parsing, refactor, comments
Added basic command-line parameters for outfile and one each positive/negative prompt.

Refactored some messy code in PCA computation and GGUF exporting.

Left a bunch of comments regarding further work needed.
2024-05-30 11:31:45 -04:00
Christian Zhou-Zheng 73747fe8eb proof-of-concept stdlib implementation
Implements PCA and file writing using mostly standard libraries. The output is recognized as a functional control vector, but produces gibberish.
2024-05-30 00:31:29 -04:00
ngxson b30bea3257 add comments 2024-05-24 22:50:03 +02:00
ngxson c31c118d86 calc diff 2024-05-24 11:46:47 +02:00
ngxson 0a46d73056 add control-vector-generator 2024-05-24 11:11:55 +02:00
Georgi Gerganov 74f33adf5f
readme : remove trailing space (#7469) 2024-05-23 17:43:18 +03:00
Georgi Gerganov 1debe72737
ggml : silence UB sanitizer error during iq2_xxs quantization (#0) 2024-05-23 17:25:38 +03:00
Tristan Druyen 007489e895
Fix phi3 chat template confusion with zephyr (#7449)
* Fix phi3 template matching vs zephyr

* Add regression test for new phi3 chat template

* Implement review suggestions

* Fix phi3 jinja test templates & match by <|end|>

* Apply suggestion

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* Add all phi3 template variants in tests

* Remove unneeded message trimming

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* Fix tests to not expect trimmed messages

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-05-23 16:15:15 +02:00
Raj Hammeer Singh Hada 8b94e799df
readme : add Bunny in supported models [no ci] (#7469) 2024-05-23 15:30:13 +03:00
Daniel Bevenius 3015851c5a
llama : add getters for n_threads/n_threads_batch (#7464)
* llama : add getters for n_threads/n_threads_batch

This commit adds two new functions to the llama API. The functions
can be used to get the number of threads used for generating a single
token and the number of threads used for prompt and batch processing
(multiple tokens).

The motivation for this is that we want to be able to get the number of
threads that a context is using. The main use case is
testing/verification that the number of threads is set correctly.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! llama : add getters for n_threads/n_threads_batch

Rename the getters to llama_n_threads and llama_n_threads_batch.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-05-23 15:29:26 +03:00
Georgi Gerganov 55ac3b7aea
ci : use Pythia models instead of OpenLlama (#7470)
* ci : start using Pythia models over OpenLlama

ggml-ci

* ci : disable q2_k ppl tests

* ci : use convert-hf-to-gguf.py

* ci : update gg_get_model

* ci : fix convert outfile name

ggml-ci

* llama : gptneox arch use F32 attn prec

ggml-ci
2024-05-23 15:28:14 +03:00
Victor Nogueira dacfcebd60
readme : add GPT-NeoX + Pythia to the list of supported models (#7491) 2024-05-23 15:12:43 +03:00
fairydreaming 9b82476ee9
Add missing inference support for GPTNeoXForCausalLM (Pythia and GPT-NeoX base models) (#7461)
* convert-hf : add conversion of bloom-style qkv tensor to gpt-style qkv (code borrowed from BloomModel)

* llama : add inference support for LLM_ARCH_GPTNEOX

* llama : add model types for every Pythia variant and GPT-NeoX

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2024-05-23 11:49:53 +02:00
Georgi Gerganov a61a94e543
llama : rename n_ctx -> cache.size, less confusing (#0) 2024-05-23 12:38:18 +03:00
Brian 152da28ae5
labeler.yml: add embedding label detector [no ci] (#7482) 2024-05-23 17:40:43 +10:00
Georgi Gerganov d48c88cbd5
ggml : remove ggml_flash_attn and ggml_flash_ff (#7463)
ggml-ci
2024-05-23 10:00:44 +03:00
Georgi Gerganov e84b71c2c6
ggml : drop support for QK_K=64 (#7473)
* ggml : drop support for QK_K=64

ggml-ci

* opencl : restore QK_K=256 define
2024-05-23 10:00:21 +03:00
0cc4m 1b1e27cb49
Update vulkan rope implementation to support frequency factors (#7475) 2024-05-23 08:59:59 +02:00
Georgi Gerganov fbf777d2b9
main : minor (#7462) 2024-05-23 09:43:49 +03:00
Johannes Gäßler cd93a28cb1
CUDA: fix FA out-of-bounds reads (#7479) 2024-05-23 00:31:20 +02:00