Commit Graph

3021 Commits

Author SHA1 Message Date
ngxson d41c719980 bring back n_completions 2024-06-11 14:31:45 +02:00
Christian Zhou-Zheng 446da906d9 fix n_completions 2024-06-11 08:22:38 -04:00
ngxson 163916864c remember to copy back the last_eigenvector 2024-06-11 12:40:07 +02:00
ngxson 1a088fb0a5 working version 2024-06-11 12:37:05 +02:00
ngxson 9e39571fc2 add n_batch for pca 2024-06-11 11:45:16 +02:00
ngxson 6a5adf3d7c fix shape of v_diff_original 2024-06-11 01:33:16 +02:00
ngxson c241b500a1 clean up PCA ggml implementation 2024-06-11 01:13:10 +02:00
ngxson a710df749c (wip) refactor 2024-06-07 15:37:58 +02:00
Christian Zhou-Zheng a42e783d75 update comments 2024-06-03 21:33:46 -04:00
Christian Zhou-Zheng 3815a0c306 pre-tokenize so we can allocate correct memory to ctx_diffs_wrapped 2024-06-03 21:26:13 -04:00
Christian Zhou-Zheng 23fd1b587c update debug statements 2024-06-03 21:14:43 -04:00
Christian Zhou-Zheng 07dba13ab6 temporary commit while I move dev environments
it finally outputs a functioning control vector - "functioning" in the sense that it can be loaded and it clearly has the right idea, but makes the model incoherent
2024-06-03 17:40:19 -04:00
ngxson 15d5c257a0 fix cb_eval 2024-06-02 10:58:11 +02:00
Christian Zhou-Zheng a23c72e4c0 fix ggml errors and make new ones
at least it compiles and runs
2024-06-01 22:19:33 -04:00
Christian Zhou-Zheng b67ea65983 tentatively translate the rest 2024-06-01 20:47:28 -04:00
Christian Zhou-Zheng 0e1f9734de translated everything but PCA (I think) 2024-06-01 19:50:46 -04:00
Christian Zhou-Zheng df623fffe8 interim fix memory leak 2024-06-01 18:36:54 -04:00
Christian Zhou-Zheng 3090c485b6 remove unnecessary multithreading 2024-06-01 18:32:14 -04:00
Christian Zhou-Zheng 544268888b in-series multithreading for prompt embedding?
added commented-out code to attempt to start implementing multithreading for embedding in main
2024-06-01 17:25:21 -04:00
Christian Zhou-Zheng 86842b20e5 fix compiler warnings 2024-05-31 22:25:46 -04:00
Christian Zhou-Zheng db3ba108e7 code aestheticization 2024-05-31 21:38:02 -04:00
Christian Zhou-Zheng 62560367aa add command-line args for num threads, num completions file lines, always reload model
refactored a few things and did what the commit message says on the tin
2024-05-31 21:27:14 -04:00
Christian Zhou-Zheng 4d7d71bc43 fix square_diff matmul index range and CRLF->LF line endings
fixed a logic error where square_diff would not multiply all rows

fixed a formatting error where the provided completions.txt had CRLF line endings
2024-05-31 21:08:25 -04:00
Christian Zhou-Zheng 4d88cd1af1 fix zero output & param parsing, functional templating
fixed a bug where the output file had no tensor data/was all zero

fixed a bug where single hyphen flags were not being correctly parsed

implements creation of templated prompts from input (still need to adapt based on model)
2024-05-31 12:40:35 -04:00
Christian Zhou-Zheng fa85ba6ae3 preliminary template/multiprompt support
the model is running out of context and that ought to be fixed (it segfaults), but other than that it looks goodish
2024-05-30 23:39:59 -04:00
Christian Zhou-Zheng 31f153fe9c fix matrix transpose multiplication
you have got to be kidding me
2024-05-30 21:36:17 -04:00
ngxson d446c6d887 add debugs 2024-05-31 00:41:12 +02:00
ngxson 287da25f48 fix mem error 2024-05-31 00:06:45 +02:00
ngxson 447023fc43 add multi prompts, multi-thread for PCA 2024-05-30 23:58:32 +02:00
Christian Zhou-Zheng dc46264ff0 example template completions
Implements an example template set built from the positive/negative prompts like the control vector Python implementation.
2024-05-30 13:12:54 -04:00
Christian Zhou-Zheng f58f6af133 param parsing, refactor, comments
Added basic command-line parameters for outfile and one each positive/negative prompt.

Refactored some messy code in PCA computation and GGUF exporting.

Left a bunch of comments regarding further work needed.
2024-05-30 11:31:45 -04:00
Christian Zhou-Zheng 73747fe8eb proof-of-concept stdlib implementation
Implements PCA and file writing using mostly standard libraries. The output is recognized as a functional control vector, but produces gibberish.
2024-05-30 00:31:29 -04:00
ngxson b30bea3257 add comments 2024-05-24 22:50:03 +02:00
ngxson c31c118d86 calc diff 2024-05-24 11:46:47 +02:00
ngxson 0a46d73056 add control-vector-generator 2024-05-24 11:11:55 +02:00
Georgi Gerganov 74f33adf5f
readme : remove trailing space (#7469) 2024-05-23 17:43:18 +03:00
Georgi Gerganov 1debe72737
ggml : silence UB sanitizer error during iq2_xxs quantization (#0) 2024-05-23 17:25:38 +03:00
Tristan Druyen 007489e895
Fix phi3 chat template confusion with zephyr (#7449)
* Fix phi3 template matching vs zephyr

* Add regression test for new phi3 chat template

* Implement review suggestions

* Fix phi3 jinja test templates & match by <|end|>

* Apply suggestion

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* Add all phi3 template variants in tests

* Remove unneeded message trimming

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* Fix tests to not expect trimmed messages

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-05-23 16:15:15 +02:00
Raj Hammeer Singh Hada 8b94e799df
readme : add Bunny in supported models [no ci] (#7469) 2024-05-23 15:30:13 +03:00
Daniel Bevenius 3015851c5a
llama : add getters for n_threads/n_threads_batch (#7464)
* llama : add getters for n_threads/n_threads_batch

This commit adds two new functions to the llama API. The functions
can be used to get the number of threads used for generating a single
token and the number of threads used for prompt and batch processing
(multiple tokens).

The motivation for this is that we want to be able to get the number of
threads that a context is using. The main use case is
testing/verification that the number of threads is set correctly.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! llama : add getters for n_threads/n_threads_batch

Rename the getters to llama_n_threads and llama_n_threads_batch.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-05-23 15:29:26 +03:00
Georgi Gerganov 55ac3b7aea
ci : use Pythia models instead of OpenLlama (#7470)
* ci : start using Pythia models over OpenLlama

ggml-ci

* ci : disable q2_k ppl tests

* ci : use convert-hf-to-gguf.py

* ci : update gg_get_model

* ci : fix convert outfile name

ggml-ci

* llama : gptneox arch use F32 attn prec

ggml-ci
2024-05-23 15:28:14 +03:00
Victor Nogueira dacfcebd60
readme : add GPT-NeoX + Pythia to the list of supported models (#7491) 2024-05-23 15:12:43 +03:00
fairydreaming 9b82476ee9
Add missing inference support for GPTNeoXForCausalLM (Pythia and GPT-NeoX base models) (#7461)
* convert-hf : add conversion of bloom-style qkv tensor to gpt-style qkv (code borrowed from BloomModel)

* llama : add inference support for LLM_ARCH_GPTNEOX

* llama : add model types for every Pythia variant and GPT-NeoX

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2024-05-23 11:49:53 +02:00
Georgi Gerganov a61a94e543
llama : rename n_ctx -> cache.size, less confusing (#0) 2024-05-23 12:38:18 +03:00
Brian 152da28ae5
labeler.yml: add embedding label detector [no ci] (#7482) 2024-05-23 17:40:43 +10:00
Georgi Gerganov d48c88cbd5
ggml : remove ggml_flash_attn and ggml_flash_ff (#7463)
ggml-ci
2024-05-23 10:00:44 +03:00
Georgi Gerganov e84b71c2c6
ggml : drop support for QK_K=64 (#7473)
* ggml : drop support for QK_K=64

ggml-ci

* opencl : restore QK_K=256 define
2024-05-23 10:00:21 +03:00
0cc4m 1b1e27cb49
Update vulkan rope implementation to support frequency factors (#7475) 2024-05-23 08:59:59 +02:00
Georgi Gerganov fbf777d2b9
main : minor (#7462) 2024-05-23 09:43:49 +03:00
Johannes Gäßler cd93a28cb1
CUDA: fix FA out-of-bounds reads (#7479) 2024-05-23 00:31:20 +02:00