Commit Graph

3111 Commits

Author SHA1 Message Date
HanishKVC 184ac322e3 ChatON: Make json_get efficient and flexible wrt its calling
Also explicitly indicate that we are looking at a chain of keys
2024-05-13 16:21:02 +05:30
Neo Zhang 948f4ec7c5
[SYCL] rm wait() (#7233) 2024-05-13 18:11:26 +08:00
Joan Fontanals 9aa672490c
llama : rename jina tokenizers to v2 (#7249)
* refactor: rename jina tokenizers to v2

* refactor: keep refactoring non-breaking
2024-05-13 11:35:14 +03:00
HanishKVC eb7554ca3b ChatON: Avoid -> to match simpcfg as well as corresponding keys 2024-05-13 10:37:14 +05:30
Brian b1f8af1886
convert.py: Outfile default name change and additional metadata support (#4858)
* convert.py: Outfile default name change and additional metadata support

* convert.py: don't stringify Metadata load method output

* convert.py: typo fix

* convert.py: fix metadata format to sync with LLM_KV_NAMES in llama.cpp
2024-05-13 12:56:47 +10:00
Benjamin Findley e586ee4259
change default temperature of OAI compat API from 0 to 1 (#7226)
* change default temperature of OAI compat API from 0 to 1

* make tests explicitly send temperature to OAI API
2024-05-13 12:40:08 +10:00
Neo Zhang cbf75894d2
[SYCL] Add oneapi runtime dll files to win release package (#7241)
* add oneapi runtime dlls to release package

* fix path

* fix path

* fix path

* fix path

* fix path

---------

Co-authored-by: Zhang <jianyu.zhang@intel.com>
2024-05-13 08:04:29 +08:00
Neo Zhang 0d5cef78ae
[SYCL] update CI with oneapi 2024.1 (#7235)
Co-authored-by: Zhang <jianyu.zhang@intel.com>
2024-05-13 08:02:55 +08:00
HanishKVC d5b0bfbaec SimpCfg: Remove now-unused SC_DEBUG; GroupKV uses its equivalent
The code which was using SC_DEBUG has moved to GroupKV, which in turn
uses GKV_DEBUG
2024-05-13 00:33:36 +05:30
HanishKVC 857570f8f8 SimpCfgTest: Update dump usage to GKV return string semantic 2024-05-13 00:20:58 +05:30
HanishKVC 9249649fb3 ChatON+TestPrgs: Use specific log files 2024-05-12 23:59:48 +05:30
Johannes Gäßler dc685be466
CUDA: add FP32 FlashAttention vector kernel (#7188)
* CUDA: add FP32 FlashAttention vector kernel

* fixup! CUDA: add FP32 FlashAttention vector kernel

* fixup! fixup! CUDA: add FP32 FlashAttention vector kernel

* fixup! fixup! fixup! CUDA: add FP32 FlashAttention vector kernel
2024-05-12 19:40:45 +02:00
HanishKVC 3d33d62924 SimpCfg: Move testing code into its own file in tests
Also set functions to inline or static as appropriate
2024-05-12 22:53:48 +05:30
HanishKVC f2dd1263fd GroupKV: Move test code into its own file in tests 2024-05-12 22:33:48 +05:30
HanishKVC 6048218383 SimpCfg: Convert to GroupKV extended version
Reuse the code already moved into GroupKV

Add explicit get and set wrt int32_t, which was added after move
to GroupKV wrt basic MapOfMapOfVariant logic.
2024-05-12 21:58:59 +05:30
Georgi Gerganov 6f1b63606f
cmake : fix version cmp (#7227) 2024-05-12 18:30:23 +03:00
HanishKVC db2ffabb18 ChatON: use templated json_get when loading bool key-value fields
With this now even loading chaton_meta.json file will generate
more informative exception, so that user can know which field
is missing, if any.
2024-05-12 18:26:58 +05:30
HanishKVC 470b8885f3 ChatON: Switch to templated json_get for str/bool/etal 2024-05-12 18:19:18 +05:30
HanishKVC 0249c07e6b ChatON: Switch to json_get_str to help identify missing keys better
The json library generates a less informative exception message,
which doesn't help one identify which key is missing, so switch to
the new json_get_str helper added in the last commit. It generates
a more informative exception message.
2024-05-12 17:44:13 +05:30
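A minimal Python sketch of the idea behind these helper commits: look up a chain of keys and, when one is missing, raise an exception that names it. The actual helper is C++ and its signature is not shown in this log, so the function below, its parameters, and the sample data are hypothetical and only illustrate the technique.

```python
import json
from typing import Any, Sequence


def json_get(obj: Any, keys: Sequence[str]) -> Any:
    """Follow a chain of keys; raise a KeyError that names the missing key.

    Hypothetical Python analogue, not the C++ helper from the commit.
    """
    current = obj
    for depth, key in enumerate(keys):
        if not isinstance(current, dict) or key not in current:
            # Report both the missing key and the path walked so far.
            walked = "->".join(keys[: depth + 1])
            raise KeyError(f"json_get: missing key '{key}' (path: {walked})")
        current = current[key]
    return current


# Made-up sample data, not the contents of chaton_meta.json.
meta = json.loads('{"tmpl": {"user": {"prefix": "U: "}}}')
prefix = json_get(meta, ["tmpl", "user", "prefix"])
```

A plain `meta["tmpl"]["system"]["prefix"]` would report only `'system'`; the helper adds the path that was being followed.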
HanishKVC 4eae05a6b7 ChatON: json access helper which raises exception if key missing 2024-05-12 17:34:04 +05:30
HanishKVC f94fed92d3 ChatON+MetaHpp: Had forgotten to conv reverse-prompt
Also, as dump was using get_value calls with fallback to default,
it wasn't identifying the missed field.

Have fixed both of those. Also reconverted meta json file.

Misc: interesting avesham and aattam
2024-05-12 16:20:28 +05:30
HanishKVC 4232ec1fb9 Main: Load json meta file only if specified
This should be ok, given that there is a version of the chat tmpl
meta data already included with the library.

So only if the user wants to change the chat template info wrt an existing
model/template-standard, or add a new one, is there a need to
pass a json file with info for that model/standard.
2024-05-12 14:53:37 +05:30
HanishKVC a3285e8e25 ChatON:Include auto converted ChatONMeta.hpp chat template data
This should allow for using this generic chat templating code flow
along with the included chat template data, without needing to
load any json file at runtime.

However, if the user wants to change the already included chat template
data, or add new chat template standard/model related data, one can
explicitly load a json file.

TODO: Need to cross check this flow once, but logically should work
2024-05-12 14:08:09 +05:30
HanishKVC b8590e3e57 ChatON:P5:meta json to hpp: Add required c++ inc and global var
Also comment to indicate that the hpp file is auto converted from
the chaton_meta.json file
2024-05-12 14:06:24 +05:30
HanishKVC b5b274a44b ChatON:P4:meta json to hpp: Insert kv bool
Rename kv helpers to match their semantic.
* whether working with string or bool value
* whether two keys or a single key

Add support for kv with bool value

In turn, add the kv boolean pairs used in the chaton_meta.json file

Add the closing bracket
2024-05-12 13:33:54 +05:30
HanishKVC 7b5fb0a2fa ChatON:P3:meta json to hpp: Retain esc seqs and more kv pairs
Use repr to retain the escape sequences in the read string.
In parallel, skip the single quotes that repr puts around strings.

Bring in more k-v pairs wrt chaton_meta.json
2024-05-12 13:06:22 +05:30
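The repr approach described in this commit can be sketched as follows. This is an assumption-laden illustration, not the conversion script itself: the function name is hypothetical, and the extra step of escaping double quotes is added here so that the result is usable as a C++ string literal.

```python
def to_cpp_string_literal(text: str) -> str:
    """Render text as a double-quoted literal with escape sequences kept.

    Hypothetical helper; the real converter is not shown in this log.
    """
    # repr() turns a real newline into the two characters '\' and 'n';
    # [1:-1] drops the quote characters repr() wraps around the string.
    body = repr(text)[1:-1]
    # repr() never escapes double quotes, so escape them here.
    body = body.replace('"', '\\"')
    return f'"{body}"'


literal = to_cpp_string_literal("<s>\n")
```

Note that C++ hex escapes are greedy, so control characters that repr renders as `\x..` may need extra care; the sketch does not handle that case.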
HanishKVC 078e04d32b ChatON:P2:meta json to hpp conversion - add k-v pairs skeleton 2024-05-12 12:42:53 +05:30
HanishKVC 0c21a0084f ChatON:p1: meta json to hpp conversion - Initial skeleton
load the json file and put the template ids
2024-05-12 12:42:23 +05:30
slaren b228aba91a
remove convert-lora-to-ggml.py (#7204) 2024-05-12 02:29:33 +02:00
HanishKVC 1574201f71 ChatON:LoadJSon:ChatTemplates: revPrompt, system-user flags
WIP:NOTE:

Initial go converting from json driven flow to ChatTemplatesGroupKV
related flow done. Needs to be tested.

An optional helper added to load ChatTemplates from a specified
json file.

Need to add a compile-time initialized MapOfMapOfVariants wrt
the chat template details of models/standards already known
to the program, so that one can use llama.cpp and this new
chat template logic even without the json dependency, if one
doesn't want it.
2024-05-12 01:45:19 +05:30
HanishKVC 444d2ccf9c ChatON:LoadJSON: ChatTemplates - global/system/user/assistant
Manually iterate the json object items using begin-end explicitly,
because the implicit iteration for loop related helpers for the
used json lib gives only the values and not a key-value pair.
2024-05-12 01:35:31 +05:30
HanishKVC 2efc09f2d0 ChatON: Unnecessarily indirect nlohmann json
code used for exploring/testing, committed just for future reference
2024-05-12 00:42:17 +05:30
Georgi Gerganov 7bd4ffb780
metal : fix warnings (skipme) (#0) 2024-05-11 21:38:13 +03:00
Georgi Gerganov 1622ac023f
sync : ggml 2024-05-11 21:35:05 +03:00
Georgi Gerganov 6aeff24f8b
metal : fix indent (ggml/0) 2024-05-11 21:34:21 +03:00
Georgi Gerganov 325756d28d
ggml : resolve merge (ggml/0)
ggml-ci
2024-05-11 21:33:08 +03:00
HanishKVC b9d9700de3 CMakeLists.txt: Compile C++ code for -std=c++20 2024-05-11 23:42:08 +05:30
HanishKVC b944d04d08 ChatON: Add constructor for ChatTemplates which chains into GKV 2024-05-11 23:42:08 +05:30
HanishKVC d9959b74e7 GroupKV: Get ready for use in llama.cpp ++
Avoid defining GKV_TEST_PRG, used for self testing, by default

Add it to common library
2024-05-11 23:40:03 +05:30
Josh Ramer fed0108491
Scripting & documenting debugging one test without anything else in the loop. (#7096)
* A little documentation that shares my quick tips for working in the repository.

* Update startup-testing-debugging.md

* script that shows a menu of tests to pick from & run the debugger on

* debug-test.sh: Refactor CLI help message

* debug-test.sh: documentation update

* debug-test.sh: CLI Help output corrections

* debug-test.sh: minor doc fix

---------

authored-by: Josh Ramer <ubuntu@ip-172-31-32-53.ec2.internal>
Assisted-by: brian khuu <mofosyne@gmail.com>
2024-05-12 03:26:35 +10:00
HanishKVC 4a9a6ce256 ChatON: ChatONMetaDump switch to GKV/ChatTemplates based flow 2024-05-11 22:53:45 +05:30
Xuan Son Nguyen 72c177c1f6
fix system prompt handling (#7153) 2024-05-11 17:28:10 +02:00
HanishKVC 484c710eab GroupKV:Add GetValue which throws exception 2024-05-11 20:49:51 +05:30
compilade 5a419926b0
convert-hf : support bfloat16 conversion (#7158)
* convert-hf : support bfloat16 conversion

* gguf-py : flake8 fixes

* convert-hf : add missing space after comma

* convert-hf : get bit-exact same output as ./quantize

The quantization version was missing.

* convert-hf : don't round bf16 NANs

* convert-hf : save some memory with np.int16 intermediate bf16 weights

* convert-hf : more closely match llama.cpp with which weights to keep in f32

* convert-hf : add --outtype auto-f16

A reason for this to exist is for model quantizers who want an initial
GGUF with the most fidelity to the original model while still using
a 16-bit float type instead of 32-bit floats.

* convert-hf : remove a semicolon because flake8 doesn't like it

It's a reflex from when programming in C/C++, I guess.

* convert-hf : support outtype templating in outfile name

* convert-hf : rename --outtype auto-f16 to --outtype auto
2024-05-11 11:06:26 -04:00
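For readers unfamiliar with bfloat16, the sketch below shows one common way to convert a 32-bit float to bf16 bits: round to nearest even on the upper 16 bits, and truncate NaNs instead of rounding them. It is pure Python for illustration; the function name is hypothetical and this is not the code from the commit, which works on NumPy arrays.

```python
import struct


def fp32_to_bf16_bits(value: float) -> int:
    """Return the 16-bit bfloat16 pattern for a float32 value (sketch)."""
    # Reinterpret the float32 encoding as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    if (bits & 0x7FFFFFFF) > 0x7F800000:
        # NaN: do not round (rounding could carry into the exponent and
        # sign bits); truncate and force the quiet bit instead.
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Round to nearest, ties to even, then keep the upper 16 bits.
    rounded = bits + 0x7FFF + ((bits >> 16) & 1)
    return (rounded >> 16) & 0xFFFF


one = fp32_to_bf16_bits(1.0)
```

bf16 keeps the float32 exponent range and drops 16 mantissa bits, which is why the result is the rounded upper half of the float32 bit pattern.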
HanishKVC 9d4450d51a GroupKV: Let dump return a string, rather than printing/logging 2024-05-11 19:43:34 +05:30
HanishKVC e999934e91 ChatON:WIP: initial go at GroupKV based flow, instead of json 2024-05-11 19:41:58 +05:30
HanishKVC f294fddf43 GroupKV: Add group_exists checker 2024-05-11 19:18:19 +05:30
HanishKVC dde72df9d3 GroupKV: Rename the internal map 2024-05-11 18:23:06 +05:30
Georgi Gerganov fae9d234b6 sync : ggml
ggml-ci
2024-05-11 15:38:34 +03:00
Justina Cho f5ef34e428 feat: implemented sigmoid function (ggml/806)
* added sigmoid function

* implemented metal kernel for sigmoid

* implemented cuda kernel for sigmoid

* added sigmoid unary op and incremented count
2024-05-11 15:38:34 +03:00