hongruichen
0301b500cd
refactoring: prevent leaking QNN_INTERFACE_VER_TYPE and QNN_SYSTEM_INTERFACE_VER_TYPE outside of qnn.hpp
2024-07-17 00:18:38 +08:00
hongruichen
ff601abc1c
add todo
2024-07-16 00:05:40 +08:00
hongruichen
f32327e2b2
remove multiple declarations of log in unit test
2024-07-15 12:06:12 +08:00
hongruichen
cd5a7331f7
add cpu backend as cross reference
2024-07-15 10:55:17 +08:00
hongruichen
4410fd6563
format with clang-format
2024-07-15 10:30:57 +08:00
hongruichen
c46b4deea9
[unit test] init all tensor by one function
2024-07-15 10:23:19 +08:00
hongruichen
148ceab70c
add log op
2024-07-14 23:00:50 +08:00
hongruichen
c1e2283887
expose op at unit test
2024-07-13 11:07:06 +08:00
hongruichen
7cbc4fbd8c
add mul
2024-07-12 23:26:38 +08:00
hongruichen
0eb595cc6e
use table to simplify the op mapping
2024-07-12 23:22:29 +08:00
hongruichen
8b677d1b2f
move qnn backend into sub folder
2024-07-02 19:42:14 +08:00
hongruichen
3808a4c1e0
Merge branch 'master' into dev-refactoring
2024-07-01 22:52:08 +08:00
Xuan Son Nguyen
9ef0780062
Fix new line issue with chat template, disable template when in-prefix/suffix is set ( #8203 )
...
* preserve new line llama_chat_format_single
* disable chat template if in-prefix/suffix is set
* remove redundant change
2024-06-30 20:27:13 +02:00
Olivier Chafik
8748d8ac6f
json: attempt to skip slow tests when running under emulator ( #8189 )
2024-06-28 18:02:05 +01:00
Xuan Son Nguyen
26a39bbd6b
Add MiniCPM, Deepseek V2 chat template + clean up `llama_chat_apply_template_internal` ( #8172 )
...
* tmp_contains
* minicpm chat template
* add DeepSeek Lite template
* change deepseek-lite to deepseek2
* correct code comment
* correct code from master branch
2024-06-28 15:11:44 +02:00
Olivier Chafik
139cc621e9
`json`: restore default additionalProperties to false, fix some pattern escapes ( #8180 )
...
* json: expand ESCAPED_IN_REGEXPS_BUT_NOT_IN_LITERALS charset
* json: revert default of additionalProperties to false
* Update README.md
2024-06-28 09:26:45 +01:00
Georgi Gerganov
f3f65429c4
llama : reorganize source code + improve CMake ( #8006 )
...
* scripts : update sync [no ci]
* files : relocate [no ci]
* ci : disable kompute build [no ci]
* cmake : fixes [no ci]
* server : fix mingw build
ggml-ci
* cmake : minor [no ci]
* cmake : link math library [no ci]
* cmake : build normal ggml library (not object library) [no ci]
* cmake : fix kompute build
ggml-ci
* make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE
ggml-ci
* move public backend headers to the public include directory (#8122 )
* move public backend headers to the public include directory
* nix test
* spm : fix metal header
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* scripts : fix sync paths [no ci]
* scripts : sync ggml-blas.h [no ci]
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-26 18:33:02 +03:00
Olivier Chafik
9b2f16f805
`json`: better support for "type" unions (e.g. nullable arrays w/ typed items) ( #7863 )
...
* json: better support for "type" arrays (e.g. `{"type": ["array", "null"], "items": {"type": "string"}}`)
* json: add test for type: [array, null] fix
* update tests
2024-06-26 01:46:35 +01:00
Olivier Chafik
6777c544bd
`json`: fix additionalProperties, allow space after enum/const ( #7840 )
...
* json: default additionalProperty to true
* json: don't force additional props after normal properties!
* json: allow space after enum/const
* json: update pydantic example to set additionalProperties: false
* json: prevent additional props to redefine a typed prop
* port not_strings to python, add trailing space
* fix not_strings & port to js+py
* Update json-schema-to-grammar.cpp
* fix _not_strings for substring overlaps
* json: fix additionalProperties default, uncomment tests
* json: add integ. test case for additionalProperties
* json: nit: simplify condition
* reformat grammar integ tests w/ R"""()""" strings where there's escapes
* update # tokens in server test: consts can now have trailing space
2024-06-26 01:45:58 +01:00
Daniel Bevenius
e6bf007744
llama : return nullptr from llama_grammar_init ( #8093 )
...
* llama : return nullptr from llama_grammar_init
This commit updates llama_grammar_init to return nullptr instead of
throwing an exception.
The motivation for this is that this function is declared inside an
extern "C" block and is intended/may be used from C code which will not
be able to handle exceptions thrown, and results in undefined behavior.
On Windows and using MSVC the following warning is currently generated:
```console
C:\llama.cpp\llama.cpp(13998,1): warning C4297: 'llama_grammar_init':
function assumed not to throw an exception but does
C:\llama.cpp\llama.cpp(13998,1): message :
__declspec(nothrow), throw(), noexcept(true), or noexcept was specified
on the function
```
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
* squash! llama : return nullptr from llama_grammar_init
Add checks for nullptr when calling llama_grammar_init.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
---------
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
Co-authored-by: Clint Herron <hanclinto@gmail.com>
2024-06-25 15:07:28 -04:00
Olivier Chafik
84631fe150
`json`: support integer minimum, maximum, exclusiveMinimum, exclusiveMaximum ( #7797 )
...
* json: support minimum for positive integer values
* json: fix min 0
* json: min + max integer constraints
* json: handle negative min / max integer bounds
* json: fix missing paren min/max bug
* json: proper paren fix
* json: integration test for schemas
* json: fix bounds tests
* Update json-schema-to-grammar.cpp
* json: fix negative max
* json: fix negative min (w/ more than 1 digit)
* Update test-grammar-integration.cpp
* json: nit: move string rules together
* json: port min/max integer support to Python & JS
* nit: move + rename _build_min_max_int
* fix min in [1, 9]
* Update test-grammar-integration.cpp
* add C++11-compatible replacement for std::string_view
* add min/max constrained int field to pydantic json schema example
* fix merge
* json: add integration tests for min/max bounds
* reshuffle/merge min/max integ test cases
* nits / cleanups
* defensive code against string out of bounds (apparently different behaviour of libstdc++ vs. clang's libc++, can't read final NULL char w/ former)
2024-06-25 20:06:20 +01:00
Xuan Son Nguyen
48e6b92cc3
Add chat template support for llama-cli ( #8068 )
...
* add chat template support for llama-cli
* add help message
* server: simplify format_chat
* more consistent naming
* improve
* add llama_chat_format_example
* fix server
* code style
* code style
* Update examples/main/main.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-06-25 21:56:49 +10:00
hongruichen
c9e99bd603
split qnn ops into file
2024-06-24 23:32:29 +08:00
slaren
b6b9a8e606
fix CI failures ( #8066 )
...
* test-backend-ops : increase cpy max nmse
* server ci : disable thread sanitizer
2024-06-23 13:14:45 +02:00
Clint Herron
c5a8d4b749
JSON Schema to GBNF integration tests ( #7790 )
...
* Adding simple bare-bones test for end-to-end integration test for json validation against auto-generated JSON-schema grammars.
* Adding additional examples as documented in #7789 . Also adding the ability to automatically output improperly failing grammars to debug output files so they can more easily be examined in the gbnf-validator program.
* Uncommenting formerly commented tests so that they fail for others who are attempting to reproduce the bugs.
* Merging improved schema test methods added by @ochafik in #7797
* Adding #define to temporarily remove failing tests so that this PR can pass CI, but still be useful for other PRs that want to leverage the framework.
* Fixing nits from ochafik. Removing escape slashes, adding additional failing cases, fixing some other strings.
* Fixing grammar indentation to be consistent throughout file.
2024-06-21 23:18:36 -04:00
hongruichen
99320620b0
split logger function, tensors and backend from main qnn source
2024-06-19 14:39:16 +08:00
jaime-m-p
37bef89433
tokenizer : BPE fixes ( #7530 )
...
* Random test: add_bos_token, add_eos_token
* Random test: add BPE models for testing
* Custom regex split fails with codepoint 0
* Fix falcon punctuation regex
* Refactor llm_tokenizer_bpe: move code to constructor
* Move 'add_special_bos/eos' logic to llm_tokenizer_bpe
* Move tokenizer flags to vocab structure.
* Default values for special_add_bos/eos
* Build vocab.special_tokens_cache using vocab token types
* Generalize 'jina-v2' per token attributes
* Fix unicode whitespaces (deepseek-coder, deepseek-llm)
* Skip missing byte tokens (falcon)
* Better unicode data generation
* Replace char32_t with uint32_t
2024-06-18 18:40:52 +02:00
Calvin Laurenson
43b35e38ba
Add support for sqrt on CUDA ( #7953 )
...
* cuda sqrt support
* enable cuda in pca
* fix comments in pca
* add test
* add sqrt to ggml_backend_cuda_supports_op
* fix test
* new line
* Use F32 sqrtf instead of F64 sqrt
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-06-17 00:23:04 +02:00
hongruichen
5e18cdc268
init the test array with const values
2024-06-16 22:28:10 +08:00
zhou.weiguo
5598fbd15d
review: make an MVP (Minimum Viable PR) style PR in upstream
2024-06-13 15:41:53 +08:00
Georgi Gerganov
a9cae48003
tests : add non-cont unary tests ( #7857 )
...
* tests : add non-cont unary tests
* ggml : update unary asserts and "supports_op"
ggml-ci
2024-06-12 16:00:22 +03:00
zhou.weiguo
faaa86b7e4
ggml-qnn: refine ggml inference using QNN NPU
2024-06-12 16:30:50 +08:00
zhou.weiguo
5269e082aa
ggml-qnn: refine ggml inference using QNN NPU
2024-06-11 23:05:00 +08:00
Georgi Gerganov
4bfe50f741
tests : check the Python version ( #7872 )
...
ggml-ci
2024-06-11 10:10:20 +03:00
Olivier Chafik
b61eb9644d
json: refine constraint for whitespace to avoid runaways yet allow pretty print ( #7866 )
2024-06-11 02:22:57 +01:00
Olivier Chafik
396b18dfec
`json`: document schema conversion in GBNF readme, align manual grammar examples & converters ( #7841 )
...
* json: fix char pattern in grammar converters
* json: prevent number precision & whitespace runaways in example grammars
* json: add doc to grammar readme
2024-06-11 01:00:30 +01:00
zhou.weiguo
2fab33d825
ggml-qnn: remove static global vars to support multi-instance simultaneously
2024-06-07 12:51:04 +08:00
Clint Herron
ad675e1c67
Added support for . (any character) token in grammar engine. ( #6467 )
...
* Added support for . (any character) token in grammar engine.
* Add integration tests for any-character symbol.
2024-06-06 06:08:52 -07:00
zhou.weiguo
dd29834c11
add support for quantized data type Q8_0
2024-06-06 17:12:28 +08:00
Olivier Chafik
55b2d0849d
grammars: x{min,max} repetition operator ( #6640 )
...
* grammars: x{min,max} repetition operator + tweak +/*/? to avoid duplication of original over alternates
* grammars: handle `x{n}` and fix `x{n,n}`
* grammars: document new repetition operators
* grammars: uniform use of int for min & max
* grammars: refactor parser test
* grammar: parsing tests w/ natural pretty print of updated expectations
* grammars: much prettier print of expectations (+ TEST_GRAMMAR_PARSER_PRINT_ALL=1 to force all)
* grammars: improve test pretty print again
* grammars: pretty print rules and chars
* grammars: fix copy rule skipping
* grammars: disallow `a{,}` (not allowed in regexps)
* Update common/grammar-parser.cpp
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* grammars: fix copy rule skipping (again) & display of expectations
* grammars: more test cases
* grammars: update reps parsing to bring ? / * / + closer to before
* json: use new GBNF repetitions{m,n} syntax
* grammars: update performance gotchas w/ repetition advice
* Update examples/json_schema_to_grammar.py
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* Update examples/server/public/json-schema-to-grammar.mjs
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* grammars: comment on rule repetitions
* grammars: ensure unambiguous number alternatives
* grammar: nit typo switched error msgs
* grammar: nit numbering in comment
* json: update numeric rule to be unambiguous
* Apply suggestions from code review
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* Update examples/server/public/json-schema-to-grammar.mjs
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* json: fix integral-part
* grammar: add repetition tests
---------
Co-authored-by: Clint Herron <hanclinto@gmail.com>
2024-06-06 10:07:06 +01:00
Georgi Gerganov
2b3389677a
ggml : refactor rope norm/neox ( #7634 )
...
* ggml : unify rope norm/neox (CPU)
* ggml : fix compile warning
* ggml : remove GLM rope mode
ggml-ci
* metal : better rope implementation
ggml-ci
* cuda : better rope implementation
ggml-ci
* naming : n_orig_ctx -> n_ctx_orig
ggml-ci
* dev : add reminders to update backends
ggml-ci
* vulkan : fix ggml_rope_ext() usage
* cuda : fix array size + indents
ggml-ci
2024-06-05 11:29:20 +03:00
zhou.weiguo
9c872cbbce
refine ggml-qnn-ut program and script to make reviewers happy
2024-06-05 12:06:17 +08:00
zhou.weiguo
d325088dbf
ggml: add Qualcomm QNN (Qualcomm Neural Network, aka Qualcomm AI Engine Direct) backend
2024-06-05 10:55:45 +08:00
jaime-m-p
3b38d48609
Per token attributes ( #7685 )
...
* Add per token attributes enum
* Using phi-3 for testing 'rstrip'
* Using jina-v2 for testing 'lstrip'
* Brute force test for 'lstrip' and 'rstrip'
* Implement 'rstrip' and 'lstrip'
* Update phi-3 GGUF file (obsolete since 917dc8c )
* Replace llama_token_type with llama_token_attribs
2024-06-04 09:17:17 +02:00
Johannes Gäßler
e141ce624a
Fix FlashAttention debug test, FP32 assert ( #7684 )
2024-06-01 23:26:10 +02:00
Johannes Gäßler
9b596417af
CUDA: quantized KV support for FA vec ( #7527 )
...
* CUDA: quantized KV support for FA vec
* try CI fix
* fix commented-out kernel variants
* add q8_0 q4_0 tests
* fix nwarps > batch size
* split fattn compile via extern templates
* fix flake8
* fix metal tests
* fix cmake
* make generate_cu_files.py executable
* add autogenerated .cu files
* fix AMD
* error if type_v != FP16 and not flash_attn
* remove obsolete code
2024-06-01 08:44:14 +02:00
Georgi Gerganov
0c27e6f62e
ggml : fix loongson compile warnings ( #7537 )
...
* ggml : fix loongson compile warnings
ggml-ci
* Fix loongarch quantize test failure.
Fix unexpected error introduced while rebasing code.
* tests : disable json test due to lack of python on the CI node
ggml-ci
---------
Co-authored-by: junchao-loongson <zhaojunchao@loongson.cn>
2024-05-31 14:17:10 +03:00
Georgi Gerganov
fb76ec31a9
ggml : fix YARN + add tests + add asserts ( #7617 )
...
* tests : add rope tests
ggml-ci
* ggml : fixes (hopefully)
ggml-ci
* tests : add non-cont tests
ggml-ci
* cuda : add asserts for rope/norm + fix DS2
ggml-ci
* ggml : assert contiguousness
* tests : reduce RoPE tests
ggml-ci
2024-05-29 20:17:31 +03:00
Georgi Gerganov
cce3dcffc5
cuda : non-cont concat support ( #7610 )
...
* tests : add non-cont concat tests
* cuda : non-cont concat support
ggml-ci
2024-05-29 15:38:26 +03:00
jaime-m-p
02c1ecad07
Tokenizer WPM fixes ( #7500 )
...
* Update random test: add_bos_token.
* Update random test: add WPM models for testing.
* Build vocab.special_tokens_cache using vocab token types.
* Fix and improve WPM preprocessing.
- Fix unicode edge case combinations.
- Split by whitespace in the same pass.
* Discard all tokens when no match is found.
2024-05-28 21:46:34 +02:00