Commit Graph

872 Commits

Author SHA1 Message Date
HanishKVC 3fcaf19967 ChatON+:Multi4Single: applyGlobalIfAny flag wrt templating api
Now that the multi-chat templating logic itself is used to apply
chat templating/tagging to a single chat message, give the
flexibility of deciding whether global tags, if any, should be
applied wrt the core tagging logic.

examples/main is in turn updated to not apply global tags, if any,
wrt the system message. The user messages already don't apply
global tags, as the flow is currently implemented to build on the
existing in-prefix/suffix and antiprompt flow.
2024-05-14 01:00:17 +05:30
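
A minimal sketch of how such an applyGlobalIfAny flag could look on the
single message tagging path; the names and structure below are
illustrative assumptions, not the actual chaton API:

    #include <string>

    // Illustrative stand-in for a chat template's tag set.
    struct TmplParts {
        std::string global_begin, global_end;  // template-wide tags, may be empty
        std::string role_prefix, role_suffix;  // per-role tags
    };

    // Tag a single message; global tags are applied only when requested.
    // A caller like examples/main could pass false for the system message,
    // since its flow builds on the in-prefix/suffix and antiprompt handling.
    std::string tag_single(const TmplParts &t, const std::string &msg,
                           bool applyGlobalIfAny) {
        std::string tagged = t.role_prefix + msg + t.role_suffix;
        if (applyGlobalIfAny) {
            tagged = t.global_begin + tagged + t.global_end;
        }
        return tagged;
    }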
HanishKVC 0cfe99076d ChatON:ChatTemplates: TmplExists, TmplGetKey, TmplRoleGetKeys
ChatTemplates now supports these directly, and the existing
corresponding helpers based on the global instance depend on the same.
2024-05-13 17:30:47 +05:30
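
Conceptually the split looks something like the sketch below: the
ChatTemplates class owns the lookups, and the global-instance helpers
forward to it (illustrative signatures, not the exact ones in the source):

    #include <map>
    #include <string>

    // Illustrative: per-template key/value data, keyed by template id.
    struct ChatTemplates {
        std::map<std::string, std::map<std::string, std::string>> templates;

        bool tmpl_exists(const std::string &tmpl) const {
            return templates.count(tmpl) != 0;
        }

        std::string tmpl_get_key(const std::string &tmpl, const std::string &key) const {
            auto t = templates.find(tmpl);
            if (t == templates.end()) return "";
            auto k = t->second.find(key);
            return (k == t->second.end()) ? "" : k->second;
        }
    };

    static ChatTemplates gCT;  // stand-in for the existing global instance

    // The existing global helpers now depend on the ChatTemplates members.
    bool chaton_tmpl_exists(const std::string &tmpl) {
        return gCT.tmpl_exists(tmpl);
    }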
HanishKVC 4232ec1fb9 Main: Load json meta file only if specified
This should be ok, given that a version of the chat template
metadata is already included with the library.

So a json file with info for a model/standard needs to be passed
only if the user wants to change the chat template info of an
existing model/template-standard or add a new one.
2024-05-12 14:53:37 +05:30
HanishKVC abb406b888 Merge branch 'master' into hkvc_chaton_v3
Have merged the master branch as of 20240510IST12XY with the chaton_v3
branch.

As part of the same, had to update the flow in examples/main/main.cpp
wrt the conversation-related commit in the master branch and my chaton
related commits in this branch.
2024-05-10 13:14:26 +05:30
Andrei d11afd6652
llava : fix moondream support (#7163)
* Revert "Revert "llava : add support for moondream vision language model (#6899)""

This reverts commit 9da243b36a.

* Fix num_positions and embeddings initialization
2024-05-10 09:41:10 +03:00
slaren eaf4bd8b39
eval-callback : fix conversion to float (#7184) 2024-05-10 01:04:12 +02:00
Ahmet Zeer 07cd41d096
TypoFix (#7162) 2024-05-09 10:16:45 +02:00
compilade f98eb31c51
convert-hf : save memory with lazy evaluation (#7075)
* convert-hf : begin refactoring write_tensor

* convert : upgrade to sentencepiece v0.2.0

* convert-hf : remove unused n_dims in extra_*_tensors

* convert-hf : simplify MoE weights stacking

* convert-hf : flake8 linter doesn't like semicolons

* convert-hf : allow unusual model part names

For example, loading `model-00001-of-00001.safetensors` now works.

* convert-hf : fix stacking MoE expert tensors

`torch.stack` and `torch.cat` don't do the same thing.

* convert-hf : fix Mamba conversion

Tested to work even with a SentencePiece-based tokenizer.

* convert : use a string for the SentencePiece tokenizer path

* convert-hf : display tensor shape

* convert-hf : convert norms to f32 by default

* convert-hf : sort model part names

`os.listdir` is said to list files in arbitrary order.
Sorting the file names should let "model-00009-of-00042.safetensors"
be loaded before "model-00010-of-00042.safetensors".

* convert-hf : use an ABC for Model again

It seems Protocol can't be used as a statically type-checked ABC,
because its subclasses also can't be instantiated. (why did it seem to work?)

At least there's still a way to throw an error when forgetting to define
the `model_arch` property of any registered Model subclasses.

* convert-hf : use a plain class for Model, and forbid direct instantiation

There are no abstract methods used anyway,
so using ABC isn't really necessary.

* convert-hf : more consistent formatting of cmdline args

* convert-hf : align the message logged for converted tensors

* convert-hf : fix Refact conversion

* convert-hf : save memory with lazy evaluation

* convert-hf : flake8 doesn't like lowercase L as a variable name

* convert-hf : remove einops requirement for InternLM2

* convert-hf : faster model parts loading

Instead of pre-loading them all into a dict, iterate on the tensors
in the model parts progressively as needed in Model.write_tensors

Conversion for some architectures relies on checking for the presence
of specific tensor names, so for multi-part models, the weight map is read
from the relevant json file to quickly get these names up-front.

* convert-hf : minor changes for consistency

* gguf-py : add tqdm as a dependency

It's small, and used for a progress bar
in GGUFWriter.write_tensors_to_file
2024-05-08 18:16:38 -04:00
Johannes Gäßler c12452c7ae
JSON: [key] -> .at(key), assert() -> GGML_ASSERT (#7143) 2024-05-08 21:53:08 +02:00
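
For nlohmann::json, which the server code uses, this distinction matters:
operator[] with a missing key can silently insert a null value, while
.at() throws json::out_of_range. A small sketch of the pattern
(the keys shown are illustrative):

    #include <iostream>
    #include <nlohmann/json.hpp>

    int main() {
        nlohmann::json j = {{"model", "llama"}};

        // j["n_ctx"] on a non-const object would quietly insert null;
        // j.at("n_ctx") fails loudly instead, which is easier to debug.
        try {
            int n_ctx = j.at("n_ctx").get<int>();
            std::cout << n_ctx << "\n";
        } catch (const nlohmann::json::out_of_range &e) {
            std::cerr << "missing key: " << e.what() << "\n";
        }
        return 0;
    }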
Georgi Gerganov 9da243b36a
Revert "llava : add support for moondream vision language model (#6899)"
This reverts commit 46e12c4692.
2024-05-08 22:14:39 +03:00
JohnnyB bd1871fa2b
server : add themes + favicon (#6848)
* Added themes support with two sample themes and a favicon.

* Newline

* Newline

* Newline

* Trailing whitespace

* Increased opacity for contrast

* Increase opacity.

The check actions were cancelled for some other priority job and I can't seem to manually re-run them, so MOAR OPACITY

* Opacity action trigger.

Trying to re-trigger the cancelled action.

* One more opacity adjustment

This Actions pipeline is failing for random issues.

* Delete examples/server/themes/buttons_top/completion.js

This will be served from the static string built into the server.

* Delete examples/server/themes/buttons_top/index.js

This will be served from the static string built into the server.

* Delete examples/server/themes/wild/completion.js

This will be served from the static string built into the server.

* Delete examples/server/themes/buttons_top/json-schema-to-grammar.mjs

This will be served from the static string built into the server.

* Delete examples/server/themes/wild/index.js

This will be served from the static string built into the server.

* Delete examples/server/themes/wild/json-schema-to-grammar.mjs

This will be served from the static string built into the server.

* Replaced underscore.
2024-05-08 22:12:06 +03:00
Dawid Potocki 83330d8cd6
main : add --conversation / -cnv flag (#7108) 2024-05-08 17:32:32 +03:00
Johan 911b3900dd
server : add_special option for tokenize endpoint (#7059) 2024-05-08 15:27:58 +03:00
Xuan Son Nguyen 1fd9c1741d
clean up json_value & server_log (#7142) 2024-05-08 13:24:14 +02:00
Justine Tunney 3855416027
ggml : introduce bfloat16 support (#6412)
* Introduce bfloat16 support

Many models on Hugging Face (e.g. Mistral, TinyLLaMA) use bfloat16 as
their canonical floating point format.

      ┌sign
      │
      │   ┌exponent
      │   │
      │   │      ┌mantissa
      │   │      │
      │┌──┴───┐┌─┴───┐
    0b0000000000000000 brain16

This encoding has the same number of exponent bits as float32. That
makes conversion relatively straightforward, even in the absence of
hardware support. For example, converting brain16 to binary32 means
simply shifting 16 bits to the left.

      ┌sign
      │
      │   ┌exponent
      │   │
      │   │      ┌mantissa
      │   │      │
      │┌──┴───┐┌─┴───────────────────┐
    0b00000000000000000000000000000000 IEEE binary32
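
In C++, that widening is a shift followed by a bit-exact copy; a minimal
sketch of the idea (not the ggml implementation itself):

    #include <cstdint>
    #include <cstring>

    // bfloat16 is the top 16 bits of a float32, so widening it is just a
    // left shift; the dropped mantissa bits are zero-filled.
    float bf16_to_f32(uint16_t bf16_bits) {
        uint32_t f32_bits = (uint32_t)bf16_bits << 16;
        float f;
        std::memcpy(&f, &f32_bits, sizeof(f));  // bit-exact reinterpretation
        return f;
    }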

The issue is that converting bf16 to fp16 can result in information
loss. Only 13% of bf16 numbers can be precisely represented in fp16,
which in practice ends up covering 99.71% of Mistral 7b v0.2's weights;
however, there is currently no way other than fp32 to represent the others.

      ┌sign
      │
      │  ┌exponent
      │  │
      │  │    ┌mantissa
      │  │    │
      │┌─┴─┐┌─┴──────┐
    0b0000000000000000 IEEE binary16

This change fixes that by adding a bf16 data type to GGML. Support
for CPU inference has been implemented, along with optimizations for
the AVX2, AVX512, and AVX512BF16 ISAs. Perplexity on Mistral 7b 0.2
improves by somewhere around -0.0024 to -0.0046 compared to using fp16.

* Remove GGML code that's not needed

* Minimize the GGML API surface area for BF16

* Remove bf16 luts

* Make the GGML header look nicer

* Fix documentation

* Apply ggerganov's fixes for test-backend-ops

* Add BF16 code for new ggml_validate_row_data() function
2024-05-08 09:30:09 +03:00
jukofyork 48b2f9c1fc
Fixed save_imatrix to match old behaviour for MoE (#7099)
* Fixed save_imatrix to match old behaviour for MoE

This fix is simple and clear, but unnecessarily doubles the memory overhead.

* Fixed missing idx variable

* Unconditionally increment ncall

Co-authored-by: slaren <slarengh@gmail.com>

* Fixed 2 bugs in save_imatrix()

- Fixed a segfault caused by the counts vector not being created.
- Fixed a pre-existing bug that didn't actually add to the counts for the "--combine" option.

* ncall needs summing too

* Trailing whitespace

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-05-08 02:24:16 +02:00
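
The segfault part of the fix comes down to a common pattern: the counts
vector must be created (sized) before indexed writes into it. A hedged
sketch of the idea, not the actual save_imatrix code:

    #include <vector>

    // Accumulate per-expert counts; without the resize, counts[idx]
    // would write out of bounds on the first call.
    void add_count(std::vector<int> &counts, int idx, int n_experts) {
        if (counts.empty()) counts.resize(n_experts, 0);  // the missing step
        counts[idx] += 1;
    }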
Johannes Gäßler af0a5b6163
server: fix incorrectly reported token probabilities (#7125)
* server: normalize token probabilities

* fix temperature == 0.0f
2024-05-07 23:07:58 +02:00
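
The gist as described: per-token probabilities should be normalized so
they sum to 1, and temperature == 0.0f (greedy sampling) needs its own
case. A hedged sketch of the idea, not the server's actual code:

    #include <algorithm>
    #include <cmath>
    #include <vector>

    // Convert logits to normalized probabilities. For temp == 0.0f the
    // choice is greedy, so all probability mass goes to the arg-max token.
    std::vector<float> to_probs(const std::vector<float> &logits, float temp) {
        std::vector<float> probs(logits.size(), 0.0f);
        if (logits.empty()) return probs;
        size_t best = std::max_element(logits.begin(), logits.end()) - logits.begin();
        if (temp == 0.0f) {
            probs[best] = 1.0f;
            return probs;
        }
        float sum = 0.0f;
        for (size_t i = 0; i < logits.size(); ++i) {
            // subtract the max logit for numerical stability
            probs[i] = std::exp((logits[i] - logits[best]) / temp);
            sum += probs[i];
        }
        for (float &p : probs) p /= sum;  // normalize so the probs sum to 1
        return probs;
    }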
Kyle Mistele 260b7c6529
server : update readme with undocumented options (#7013) 2024-05-07 21:44:29 +03:00
RhinoDevel 3af34c1d1b
main : update log text (EOS to EOG) (#7104)
* Update log text (EOS to EOG)

The log text "found EOS" is no longer always correct, here, because there is now an is-EOG check that also returns true for EOT.

* Improve the log msg further by using "an" instead of "some".

As suggested, to avoid misunderstanding (not multiple EOG tokens found, just one).
2024-05-07 20:51:31 +03:00
omahs 04976db7a8
docs: fix typos (#7124)
* fix typo

* fix typos

* fix typo

* fix typos

* fix typo

* fix typos
2024-05-07 18:20:33 +03:00
HanishKVC b875b02979 ChatON:Initial go at vicuna chat template in meta.json
Have looked at tokenizer_config.json, the jinja file, and the default
hardcoded template in llama.cpp.

This is also one of the models where a Global BoS is needed.

NOTE: Have taken the liberty to also add a SYSTEM: prefix to the
system message; even though default vicuna doesn't seem to need it,
vicuna-orca seems to, so that both models can be driven from the
same chat template config. I am assuming the system prefix should
not create any problem even in default vicuna; however, if it does
create a problem, one can duplicate the existing vicuna block in
chaton_meta.json and make the system prefix empty in it.
2024-05-06 11:27:56 +05:30
HanishKVC 0f8f2a18c2 ChatON:chat template for OpenChat in meta.json initial go
The first model seen, among the templates added so far to the meta
json file, that needs a Global Begin.

From the tokenizer_config json file, it appears that even the system
role should have an appropriate prefix, unlike what is seen in
llama.cpp's hardcoded default chat apply template and the chat jinja template.
2024-05-06 11:27:56 +05:30
HanishKVC 93115a9733 ChatON: initial go at OrionStar Ai chat model template
Got it from the model's tokenizer config json. The same is also found in
the existing hardcoded template in llama.cpp's default chat apply template logic.
2024-05-06 11:27:56 +05:30
HanishKVC 5380b1e86e ChatON:Update meta.json wrt command-r models
Template info picked from the tokenizer config's default entry.

Verified that the same is used in the existing hardcoded chat apply
template flow.
2024-05-06 11:27:56 +05:30
HanishKVC ef5a2cf391 SimpCfg:Dbug why bool is not setting properly 2024-05-06 11:27:56 +05:30
HanishKVC a108000cfd ChatON:Phi3: Add template details to detailed meta.json 2024-05-06 11:27:56 +05:30
HanishKVC 000245b8e8 SimpCfg:Warn possible nonstring strings, some invalid floats
Warn if something not starting with a double quote is being treated
as a string.

Show some examples of floating point values that are invalid wrt this
logic's floating point determination code.
2024-05-06 11:27:56 +05:30
HanishKVC a6648b02f2 SimpCfg:Show floating point values in normal and exponential form 2024-05-06 11:27:56 +05:30
HanishKVC 4181164217 SimpCfg:Implement set_int64 and set_double
Also update the sample simpcfg file, to test for int and float
values.
2024-05-06 11:27:56 +05:30
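
A rough idea of what typed setters in such a simple config store could
look like (an illustrative sketch, not SimpCfg's actual code):

    #include <cstdint>
    #include <map>
    #include <string>
    #include <variant>

    // Each (group, key) maps to one of a few known value types: a simple
    // text-config stand-in for a json dependency.
    struct SimpCfgSketch {
        using Value = std::variant<std::string, bool, int64_t, double>;
        std::map<std::string, std::map<std::string, Value>> data;

        void set_int64(const std::string &grp, const std::string &key, int64_t v) {
            data[grp][key] = v;
        }
        void set_double(const std::string &grp, const std::string &key, double v) {
            data[grp][key] = v;
        }
    };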
HanishKVC f728dbddd0 ChatON: Add simpcfg based config file matching chaton_meta.json
Add the missing begin and end fields wrt the deepseek-coder assistant
in chaton_meta.json.

The idea is to avoid the json library dependency by adding simple
text-based config file support.
2024-05-06 11:27:56 +05:30
HanishKVC c4e829d492 ChatON:Mistral: Decouple \n from suffix, use wrt sys msg 2024-05-06 11:27:56 +05:30
HanishKVC 55e3d63f13 ChatON:Mistral: Update to match jinja file 2024-05-06 11:27:56 +05:30
HanishKVC ad5e5216ce ChatON:Mistral: Add detailed meta json entries 2024-05-06 11:27:56 +05:30
HanishKVC 368fbf17a1 ChatON:ChatML: Update wrt detailed meta json 2024-05-06 11:27:56 +05:30
HanishKVC a64dcd7796 ChatON:Zephyr: Update wrt detailed meta json, also update eos
Pick eos from zephyr's tokenizer_config, which is different from
what was hardcoded in the existing llama_chat_apply_template.
2024-05-06 11:27:56 +05:30
HanishKVC 18cd12524f ChatON:Monarch:Update wrt detailed meta json 2024-05-06 11:27:56 +05:30
HanishKVC 006a398ebf ChatON:DeepSeekCoder: Update tmplid and wrt detailed meta json 2024-05-06 11:27:56 +05:30
HanishKVC 1b2e921186 ChatON:DeepSeek: Update support wrt detailed meta json 2024-05-06 11:27:56 +05:30
HanishKVC 403a6c4323 ChatON:Gemma: update for detailed meta json
As part of the same, also add the user role entry for the system role.
2024-05-06 11:27:56 +05:30
HanishKVC 01c8db70f7 ChatON+Main: Add C_API wrapper for single
Add a C API wrapper for the single message tagging scenario.

In turn, to match the convention followed by the existing
chat_apply_template code, make it return the expected size of the
tagged message string buffer. Update the internal single-message
logic to help with the same.

Explicitly check whether the specified tmpl is available in the
loaded json, and return an error if it is not found.
2024-05-06 11:27:56 +05:30
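
The size-returning convention mirrors the two-call pattern of
llama_chat_apply_template: probe once for the required size, then call
again with a big-enough buffer. A hypothetical sketch of that pattern
(the wrapper's real name, signature, and template data may differ):

    #include <cstdint>
    #include <cstring>
    #include <string>
    #include <vector>

    // Toy stand-in: returns the size the tagged message needs, fills buf
    // only if it is large enough, and returns -1 for an unknown template.
    static int32_t chaton_tag_single(const char *tmpl, const char *msg,
                                     char *buf, int32_t buf_len) {
        if (std::strcmp(tmpl, "chatml") != 0) return -1;  // tmpl not in loaded json
        std::string tagged = std::string("<|im_start|>user\n") + msg + "<|im_end|>\n";
        int32_t needed = (int32_t)tagged.size();
        if (buf && buf_len >= needed) std::memcpy(buf, tagged.data(), needed);
        return needed;
    }

    int main() {
        int32_t needed = chaton_tag_single("chatml", "hello", nullptr, 0);
        if (needed < 0) return 1;  // unknown template -> error
        std::vector<char> buf(needed);
        chaton_tag_single("chatml", "hello", buf.data(), (int32_t)buf.size());
        return 0;
    }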
HanishKVC 13857f29d6 ChatON+Main: Updates wrt detailed meta json
Fix an oversight wrt a key name.

Add an alert in case the passed meta json file contains begin (BoS)
wrt the assistant role, similar to the check for end (EoS) wrt the
user role, because normally both (i.e. EoS wrt User and BoS wrt
Assistant) shouldn't be needed.

Update main wrt begin & prefix and suffix & end addition.
2024-05-06 11:27:56 +05:30
HanishKVC b9e31304a5 ChatON: Update to new detailed format wrt llama2 and llama3
Wrt llama2
* add bos wrt llama2 system and user begins, but not assistant
* split system suffix into suffix and end, and add systemuser-system
  flags so that end can be avoided wrt system+user message combo
* add eos wrt assistant end
* With these, this should potentially work with the main and server flows

Wrt llama3
* add empty begin, end fields and systemuser-system flags
* This should potentially work with main and server flows
2024-05-06 11:27:56 +05:30
HanishKVC bf1167bfdb ChatON: Backup the current simple meta json file 2024-05-06 11:27:56 +05:30
HanishKVC 6b23f15ffe ChatON:ChatOnMetaJSon: Add suffix wrt assistant messages 2024-05-06 11:27:56 +05:30
HanishKVC 3064a36e74 ChatON+:Update tmpl_role_kv to retrieve wrt multiple keys
Use the same for user role's begin and prefix entries.
2024-05-06 11:27:56 +05:30
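
One way such a multi-key lookup could read, trying each candidate key
in order and returning the first hit (illustrative, not the actual helper):

    #include <map>
    #include <string>
    #include <vector>

    // Return the value of the first key from `keys` present in the role's
    // entries, e.g. so "begin" and "prefix" can share one retrieval path.
    std::string tmpl_role_kv(const std::map<std::string, std::string> &role,
                             const std::vector<std::string> &keys) {
        for (const auto &k : keys) {
            auto it = role.find(k);
            if (it != role.end()) {
                return it->second;
            }
        }
        return "";  // none of the candidate keys present
    }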
HanishKVC f1f39c5256 ChatON:Add Monarch model template, which uses Begin + Prefix
In turn, Begin/BoS is added only for non-first user messages in a
system+user prompts chain.
2024-05-06 11:27:56 +05:30
HanishKVC 0f713d4c4f ChatOn: meta json update wrt the new begin related fields 2024-05-06 11:27:56 +05:30
HanishKVC 84367b9fd1 ChatON: Add template for DeepSeek
Was looking at the tokenized vector and noticed that the EOS
mentioned by llama.cpp's existing chat_apply_template is different
from what I noticed in the tokenizer_config.json of deepseek llm, so
I have added two entries:

* "deepseek-alt", which matches llama.cpp's chat_apply_template, and
* "deepseek", which matches that in tokenizer_config.json.

This impacts the assistant suffix and reverse prompt entries.

Because of this: need to look into the other entries which I added
previously, at a later time. However, as the default logic should be
picking the EOS from the model file, I assume the reverse-prompt being
out of sync may not matter beyond a limit, potentially.
2024-05-06 11:27:56 +05:30
HanishKVC f4b54069f6 ChatON: Add template for Gemma 2024-05-06 11:27:56 +05:30
HanishKVC 2a8028fba8 ChatON: Add Zephyr template to meta-json file 2024-05-06 11:27:56 +05:30