Olivier Chafik
f03e9b935b
Merge remote-tracking branch 'origin/master' into json-bounds2
2024-06-13 00:45:11 +01:00
Georgi Gerganov
a9cae48003
tests : add non-cont unary tests ( #7857 )
...
* tests : add non-cont unary tests
* ggml : update unary asserts and "supports_op"
ggml-ci
2024-06-12 16:00:22 +03:00
Georgi Gerganov
4bfe50f741
tests : check the Python version ( #7872 )
...
ggml-ci
2024-06-11 10:10:20 +03:00
Olivier Chafik
b61eb9644d
json: refine constraint for whitespace to avoid runaways yet allow pretty print ( #7866 )
2024-06-11 02:22:57 +01:00
Olivier Chafik
396b18dfec
`json`: document schema conversion in GBNF readme, align manual grammar examples & converters ( #7841 )
...
* json: fix char pattern in grammar converters
* json: prevent number precision & whitespace runaways in example grammars
* json: add doc to grammar readme
2024-06-11 01:00:30 +01:00
ochafik
d1f679125f
Update test-grammar-integration.cpp
2024-06-09 13:22:41 +01:00
Olivier Chafik
dcc27d1a93
fix min in [1, 9]
2024-06-09 09:42:19 +01:00
Olivier Chafik
ac2a8f8930
Update test-grammar-integration.cpp
2024-06-08 20:39:42 +01:00
Olivier Chafik
4c1c29361e
json: fix negative min (w/ more than 1 digit)
2024-06-08 20:33:40 +01:00
Olivier Chafik
931b543607
json: fix negative max
2024-06-08 20:33:12 +01:00
Olivier Chafik
a786c0381b
Merge remote-tracking branch 'origin/master' into json-bounds2
2024-06-08 20:05:17 +01:00
Clint Herron
ad675e1c67
Added support for . (any character) token in grammar engine. ( #6467 )
...
* Added support for . (any characer) token in grammar engine.
* Add integration tests for any-character symbol.
2024-06-06 06:08:52 -07:00
ochafik
431edb8e7b
json: fix bounds tests
2024-06-06 10:14:28 +01:00
ochafik
5a86c6f0e2
json: integration test for schemas
2024-06-06 10:14:28 +01:00
ochafik
f8db47814b
json: proper paren fix
2024-06-06 10:14:28 +01:00
ochafik
a381deb1b6
json: fix missing paren min/max bug
2024-06-06 10:14:28 +01:00
ochafik
af63f4fb27
json: handle negative min / max integer bounds
2024-06-06 10:14:28 +01:00
ochafik
c37c484029
json: min + max integer constraints
2024-06-06 10:14:28 +01:00
ochafik
d69ccb06a4
json: fix min 0
2024-06-06 10:14:28 +01:00
ochafik
057bbdc1f3
json: support minimum for positive integer values
2024-06-06 10:14:28 +01:00
Olivier Chafik
55b2d0849d
grammars: x{min,max} repetition operator ( #6640 )
...
* grammars: x{min,max} repetition operator + tweak +/*/? to avoid duplication of original over alternates
* grammars: handle `x{n}` and fix `x{n,n}`
* grammars: document new repetition operators
* grammars: uniform use of int for min & max
* grammars: refactor parser test
* grammar: parsing tests w/ natural pretty print of updated expectations
* grammars: much prettier print of expectations (+ TEST_GRAMMAR_PARSER_PRINT_ALL=1 to force all)
* grammars: improve test pretty print again
* grammars: pretty print rules and chars
* grammars: fix copy rule skipping
* grammars: disallow `a{,}` (not allowed in regexps)
* Update common/grammar-parser.cpp
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* grammars: fix copy rule skipping (again) & display of expectations
* grammars: more test cases
* grammars: update reps parsing to bring ? / * / + closer to before
* json: use new GBNF repetitions{m,n} syntax
* grammars: update performance gotchas w/ repetition advice
* Update examples/json_schema_to_grammar.py
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* Update examples/server/public/json-schema-to-grammar.mjs
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* grammars: comment on rule repetitions
* grammars: ensure unambiguous number alternatives
* grammar: nit typo switched error msgs
* grammar: nit numbering in comment
* json: update numeric rule to be unambiguous
* Apply suggestions from code review
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* Update examples/server/public/json-schema-to-grammar.mjs
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* json: fix integral-part
* grammar: add repetition tests
---------
Co-authored-by: Clint Herron <hanclinto@gmail.com>
2024-06-06 10:07:06 +01:00
Georgi Gerganov
2b3389677a
ggml : refactor rope norm/neox ( #7634 )
...
* ggml : unify rope norm/neox (CPU)
* ggml : fix compile warning
* ggml : remove GLM rope mode
ggml-ci
* metal : better rope implementation
ggml-ci
* cuda : better rope implementation
ggml-ci
* naming : n_orig_ctx -> n_ctx_orig
ggml-ci
* dev : add reminders to update backends
ggml-ci
* vulkan : fix ggml_rope_ext() usage
* cuda : fix array size + indents
ggml-ci
2024-06-05 11:29:20 +03:00
jaime-m-p
3b38d48609
Per token attributes ( #7685 )
...
* Add per token attributes enum
* Using phi-3 for testing 'rstrip'
* Using jina-v2 for testing 'lstrip'
* Brute force test for 'lstrip' and 'rstrip'
* Implement 'rstrip' and 'lstrip'
* Update phi-3 GGUF file (obsolete since 917dc8c )
* Replace llama_token_type with llama_token_attribs
2024-06-04 09:17:17 +02:00
Johannes Gäßler
e141ce624a
Fix FlashAttention debug test, FP32 assert ( #7684 )
2024-06-01 23:26:10 +02:00
Johannes Gäßler
9b596417af
CUDA: quantized KV support for FA vec ( #7527 )
...
* CUDA: quantized KV support for FA vec
* try CI fix
* fix commented-out kernel variants
* add q8_0 q4_0 tests
* fix nwarps > batch size
* split fattn compile via extern templates
* fix flake8
* fix metal tests
* fix cmake
* make generate_cu_files.py executable
* add autogenerated .cu files
* fix AMD
* error if type_v != FP16 and not flash_attn
* remove obsolete code
2024-06-01 08:44:14 +02:00
Georgi Gerganov
0c27e6f62e
ggml : fix loongson compile warnings ( #7537 )
...
* ggml : fix loongson compile warnings
ggml-ci
* Fix loongarch quantize test fail.
Fix unexpected error introduced during rebase code.
* tests : disable json test due to lack of python on the CI node
ggml-ci
---------
Co-authored-by: junchao-loongson <zhaojunchao@loongson.cn>
2024-05-31 14:17:10 +03:00
Georgi Gerganov
fb76ec31a9
ggml : fix YARN + add tests + add asserts ( #7617 )
...
* tests : add rope tests
ggml-ci
* ggml : fixes (hopefully)
ggml-ci
* tests : add non-cont tests
ggml-ci
* cuda : add asserts for rope/norm + fix DS2
ggml-ci
* ggml : assert contiguousness
* tests : reduce RoPE tests
ggml-ci
2024-05-29 20:17:31 +03:00
Georgi Gerganov
cce3dcffc5
cuda : non-cont concat support ( #7610 )
...
* tests : add non-cont concat tests
* cuda : non-cont concat support
ggml-ci
2024-05-29 15:38:26 +03:00
jaime-m-p
02c1ecad07
Tokenizer WPM fixes ( #7500 )
...
* Update random test: add_bos_token.
* Update random test: add WPM models for testing.
* Build vocab.special_tokens_cache using vocab token types.
* Fix and improve WPM preprocessing.
- Fix unicode edge case combinations.
- Split by whitspace in the same pass.
* Discard all tokens when no matching found.
2024-05-28 21:46:34 +02:00
Georgi Gerganov
edc29433fa
tests : fix test-tokenizer-0.sh
2024-05-28 15:04:09 +03:00
Georgi Gerganov
0548a4187f
ggml : generalize GGML_OP_CONCAT ( #7563 )
...
* ggml : generalize GGML_OP_CONCAT (WIP)
ggml-ci
* tests : add dim != 2 tests
* metal : generalize concat kernel
* tests : naming
* cuda : generalize concat kernel
ggml-ci
* sycl : add warning and assert
* ggml : fix op params handling
* metal : bugfix kernel
ggml-ci
* ggml : reimplement CPU and Metal
* cuda : add asserts
ggml-ci
* ggml : fix ptrs
ggml-ci
2024-05-28 11:04:19 +03:00
Tristan Druyen
007489e895
Fix phi3 chat template confusion with zephyr ( #7449 )
...
* Fix phi3 template matching vs zephyr
* Add regression test for new phi3 chat template
* Implement review suggestions
* Fix phi3 jinja test templates & match by <|end|>
* Apply suggestion
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* Add all phi3 template variants in tests
* Remove unneeded message trimming
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* Fix tests to not expect trimmed messages
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-05-23 16:15:15 +02:00
Georgi Gerganov
d48c88cbd5
ggml : remove ggml_flash_attn and ggml_flash_ff ( #7463 )
...
ggml-ci
2024-05-23 10:00:44 +03:00
Georgi Gerganov
3e5faa8503
cuda : fix rope + add tests ( #7452 )
...
* cuda : fix rope pos data
ggml-ci
* ggml : drop mode & 1 == 1 support for ggml_rope
ggml-ci
* ggml : support freq_factors for f16 rope (CPU)
ggml-ci
* tests : add rope tests using frequency factors
ggml-ci
2024-05-22 11:01:35 +03:00
liuwei-git
201cc11afa
llama : add phi3 128K model support ( #7225 )
...
* add phi3 128k support in convert-hf-to-gguf
* add phi3 128k support in cuda
* address build warnings on llama.cpp
* adjust index value in cuda long rope freq factors
* add long rope support in ggml cpu backend
* make freq factors only depend on ctx size
* remove unused rope scaling type 'su' frin gguf converter
* fix flint warnings on convert-hf-to-gguf.py
* set to the short freq factor when context size is small than trained context size
* add one line of comments
* metal : support rope freq_factors
* ggml : update ggml_rope_ext API to support freq. factors
* backends : add dev messages to support rope freq. factors
* minor : style
* tests : update to use new rope API
* backends : fix pragma semicolons
* minor : cleanup
* llama : move rope factors from KV header to tensors
* llama : remove tmp assert
* cuda : fix compile warning
* convert : read/write n_head_kv
* llama : fix uninitialized tensors
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-21 23:28:32 +03:00
Georgi Gerganov
c3f8d58356
tests : test-tokenizer-0.sh print more info ( #7402 )
2024-05-21 19:53:48 +03:00
jaime-m-p
d7e852c1bc
Tokenizer SPM fixes for phi-3 and llama-spm (bugfix) ( #7425 )
...
* Update brute force test: add_special
* Update brute force test: default values for add_bos_token and add_eos_token
* Enable rtrim when pre-inserting BOS
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Revert "server : fix test regexes"
2024-05-21 14:39:48 +02:00
jaime-m-p
917dc8cfa6
Tokenizer SPM fixes for phi-3 and llama-spm ( #7375 )
...
* Update brute force test: special tokens
* Fix added tokens
- Try to read 'added_tokens.json'.
- Try to read 'tokenizer_config.json'.
- Try to read 'tokenizer.json'.
* Fix special tokens rtrim
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server : fix test regexes
2024-05-20 20:15:57 +02:00
slaren
05834841dc
ggml : fix quants nans when all the group weights are very close to zero ( #7313 )
2024-05-18 02:39:54 +02:00
jaime-m-p
b43272afa2
Unicode codepoint flags for custom regexs ( #7245 )
...
* Replace CODEPOINT_TYPE_* with codepoint_flags
* Update and bugfix brute force random test
* Deterministic brute force random test
* Unicode normalization NFD
* Get rid of BOM
2024-05-18 01:09:13 +02:00
John Balis
48aa8fd1f2
ggml : add `ggml_upscale_ext` (ggml/814)
...
* initial commit with CPU implementation of upscale to shape and test, cuda implementation next
* experimental commit to see if dst shape is correct
* test version
* test
* removed unnecessary params
* refactor
* fixed tests
* ggml : metal impl + cleanup + sycl dev warnings
* patched ggml_upscale cuda op to handle non-contiguous tensors, added test for non-contiguous behavior
* metal : fix upsacle op to support nb00 + style
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-15 13:23:33 +03:00
Georgi Gerganov
e8a7fd4fb0
metal : support FA without mask + add asserts ( #7278 )
...
* ggml : fa without mask + add asserts
ggml-ci
* metal : support non-contiguous KV
ggml-ci
2024-05-14 19:09:30 +03:00
Haggai Nuchi
e0f556186b
Add left recursion check: quit early instead of going into an infinite loop ( #7083 )
...
* Add left recursion check: quit early instead of going into an infinite loop
* Remove custom enum, rename left recursion check and move to "grammar internal" section, add handling for edge case where a leftmost nonterminal may be empty
* Remove unnecessary declaration
2024-05-14 15:25:56 +10:00
Johannes Gäßler
dc685be466
CUDA: add FP32 FlashAttention vector kernel ( #7188 )
...
* CUDA: add FP32 FlashAttention vector kernel
* fixup! CUDA: add FP32 FlashAttention vector kernel
* fixup! fixup! CUDA: add FP32 FlashAttention vector kernel
* fixup! fixup! fixup! CUDA: add FP32 FlashAttention vector kernel
2024-05-12 19:40:45 +02:00
Haoxiang Fei
f99e1e456e
llama : lookup word in vocab before doing BPE merges ( #7193 )
...
* fix: llama-3 ignore_merges
* test: add test for llama-3 bpe ignore_merges
* fix: set ignore_merges only for llama-3
* fix: test-tokenizer-1-bpe --ingore-merges detection
* fix: copy to fix fallthrough
* fix: change ignore_merges to bool
* fix: add ignore merges tests to cmake
* llama : alternative merge ignore logic
---------
Co-authored-by: Haoxiang Fei <feihaoxiang@idea.edu.cn>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-11 11:12:06 +03:00
Georgi Gerganov
9cb317f77e
ggml : full ALiBi support ( #7192 )
...
* ggml : full ALiBi support
* ggml : update ggml_soft_max_ext() CUDA, SYCL
* ggml : ggml_flash_attn_ext() support ALiBi (CPU)
* ggml : ggml_flash_attn_ext() support ALiBi (Metal)
* ggml : fix warning
* ggml : ggml_flash_attn_ext() support ALiBi (CUDA)
ggml-ci
* ggml : fix assert message
* vulkan : add dev notes
* ggml : require mask when using ALiBi
ggml-ci
* convert : fix convert for refact models
2024-05-11 10:32:41 +03:00
jaime-m-p
43248e5594
llama3 custom regex split ( #6965 )
...
* merged the changes from deepseeker models to main branch
* Moved regex patterns to unicode.cpp and updated unicode.h
* Moved header files
* Resolved issues
* added and refactored unicode_regex_split and related functions
* Updated/merged the deepseek coder pr
* Refactored code
* Adding unicode regex mappings
* Adding unicode regex function
* Added needed functionality, testing remains
* Fixed issues
* Fixed issue with gpt2 regex custom preprocessor
* unicode : fix? unicode_wstring_to_utf8
* lint : fix whitespaces
* tests : add tokenizer tests for numbers
* unicode : remove redundant headers
* tests : remove and rename tokenizer test scripts
* tests : add sample usage
* gguf-py : reader prints warnings on duplicate keys
* llama : towards llama3 tokenization support (wip)
* unicode : shot in the dark to fix tests on Windows
* unicode : first try custom implementations
* convert : add "tokenizer.ggml.pre" GGUF KV (wip)
* llama : use new pre-tokenizer type
* convert : fix pre-tokenizer type writing
* lint : fix
* make : add test-tokenizer-0-llama-v3
* wip
* models : add llama v3 vocab file
* llama : adapt punctuation regex + add llama 3 regex
* minor
* unicode : set bomb
* unicode : set bomb
* unicode : always use std::wregex
* unicode : support \p{N}, \p{L} and \p{P} natively
* unicode : try fix windows
* unicode : category support via std::regex
* unicode : clean-up
* unicode : simplify
* llama3 custom regex split
* convert : add convert-hf-to-gguf-update.py
ggml-ci
* lint : update
* convert : add falcon
ggml-ci
* unicode : normalize signatures
* lint : fix
* lint : fix
* convert : remove unused functions
* convert : add comments
* convert : exercise contractions
ggml-ci
* Using char32_t for codepoints
* lint : fix
* already exists unicode_tolower()
* Typing
* Restore BOM
* cmake : refactor test targets
* tests : refactor vocab tests
ggml-ci
* tests : add more vocabs and tests
ggml-ci
* unicode : cleanup
* scripts : ignore new update script in check-requirements.sh
* Fix merge
* models : add phi-3, mpt, gpt-2, starcoder
* tests : disable obsolete
ggml-ci
* tests : use faster bpe test
ggml-ci
* llama : more prominent warning for old BPE models
* tests : disable test-tokenizer-1-bpe due to slowness
ggml-ci
* Move unused variable value
* GPT2 custom regex split
* Add alternative regex for custom aplit llama3
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Style
* Add bruteforce random tests for token encoding
* wip: fixing unicode codepoint ranges
* Fix merge
* Unicode tables: separator, lowercase, uppercase and whitespace
* llama3 custom regex split: fix \s
* Restore BOM
* Style
* wip: generate NDF table
* Ignore special tokens for testing
* Clean gen-unicode-data.py
* Refactor random tokenizer test
* lint : fix
* tests : add fail test for llama-bpe
---------
Co-authored-by: Jaggzh <jaggz.h@gmail.com>
Co-authored-by: Kazim Abrar Mahi <kazimabrarmahi135@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: jaime-m-p <>
2024-05-09 23:30:44 +10:00
Johannes Gäßler
a743d76a01
CUDA: generalize FP16 fattn vec kernel ( #7061 )
...
* CUDA: generalize FP16 fattn vec kernel
* disable unsupported head sizes for AMD in test
* try AMD fix
* fix batch size 2-8
* partially revert changes
2024-05-09 14:32:02 +02:00
Johannes Gäßler
c12452c7ae
JSON: [key] -> .at(key), assert() -> GGML_ASSERT ( #7143 )
2024-05-08 21:53:08 +02:00
Ren Xuancheng
229ffff872
llama : add BPE pre-tokenization for Qwen2 ( #7114 )
...
* Add BPE pre-tokenization for Qwen2.
* minor : fixes
---------
Co-authored-by: Ren Xuancheng <17811943+jklj077@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-08 15:06:43 +03:00