Commit Graph

4123 Commits

Author SHA1 Message Date
hongruichen b173c4e061 feat: update tensor name when bind to graph 2024-07-20 17:31:40 +08:00
hongruichen 5f3b1ae3b0 fix: try fix graph cache with append the tensors name 2024-07-20 16:39:06 +08:00
hongruichen 51f95d6980 fix: dimension could be wrong for tensor liked 1x1x8 2024-07-20 16:11:35 +08:00
Brian c3776cacab
gguf_dump.py: fix markddown kv array print (#8588)
* gguf_dump.py: fix markddown kv array print

* Update gguf-py/scripts/gguf_dump.py

Co-authored-by: compilade <git@compilade.net>

* gguf_dump.py: refactor kv array string handling

* gguf_dump.py: escape backticks inside of strings

* gguf_dump.py: inline code markdown escape handler added

>>> escape_markdown_inline_code("hello world")
'`hello world`'
>>> escape_markdown_inline_code("hello ` world")
'``hello ` world``'

* gguf_dump.py: handle edge case about backticks on start or end of a string

---------

Co-authored-by: compilade <git@compilade.net>
2024-07-20 17:35:25 +10:00
hongruichen 27299463ae fix: try fix tensor type error 2024-07-20 15:13:10 +08:00
hongruichen 28a00e5e6c fix: try fix QNN_GRAPH_ERROR_INVALID_OP_CONFIG 2024-07-20 14:11:58 +08:00
hongruichen 1679dcf47e fix: check all dimentions in `can offload` 2024-07-20 13:29:01 +08:00
slaren 87e397d00b
ggml : fix quant dot product with odd number of blocks (#8549)
* ggml : fix iq4_nl dot product with odd number of blocks

* ggml : fix odd blocks for ARM_NEON (#8556)

* ggml : fix iq4_nl dot product with odd number of blocks

* ggml : fix q4_1

* ggml : fix q5_0

* ggml : fix q5_1

* ggml : fix iq4_nl metal

ggml-ci

* ggml : fix q4_0

* ggml : fix q8_0

ggml-ci

* ggml : remove special Q4_0 code for first 2 blocks

* ggml : fix sumf redefinition

---------

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-19 17:17:27 +02:00
hongruichen b1b5cc10b1 add function to convert qnn error into string 2024-07-19 22:51:17 +08:00
Brian 57b1d4f9eb
convert-*.py: remove add_name from ChatGLMModel class (#8590) 2024-07-20 00:04:38 +10:00
Georgi Gerganov d197545530
llama : bump max layers from 256 to 512 (#8530)
* llama : bump max layers from 256 to 512

* llama : replace asserts with exceptions
2024-07-19 16:50:47 +03:00
Georgi Gerganov be0cfb4175
readme : fix server badge 2024-07-19 14:34:55 +03:00
Clint Herron b57eb9ca4f
ggml : add friendlier error message to fopen errors (#8575)
* Add additional error information when model files fail to load.

* Adding additional error information to most instances of fopen.
2024-07-19 14:05:45 +03:00
Frank Mai f299aa98ec
fix: typo of chatglm4 chat tmpl (#8586)
Signed-off-by: thxCode <thxcode0824@gmail.com>
2024-07-19 11:44:41 +02:00
Brian 3d0e4367d9
convert-*.py: add general.name kv override (#8571) 2024-07-19 17:51:51 +10:00
hongruichen a607995f95 Reapply "tried fix the add node error 6005"
This reverts commit f45fbec8f4.
2024-07-19 15:35:55 +08:00
hongruichen 0153a23d3f fix support ops
This reverts commit f45fbec8f4.
2024-07-19 15:31:29 +08:00
hongruichen f45fbec8f4 Revert "tried fix the add node error 6005"
This reverts commit ce3d09e5f2.
2024-07-19 12:59:38 +08:00
hongruichen ce3d09e5f2 tried fix the add node error 6005 2024-07-19 12:59:21 +08:00
Johannes Gäßler a15ef8f8a0
CUDA: fix partial offloading for ne0 % 256 != 0 (#8572) 2024-07-18 23:48:47 +02:00
65a 705b7ecf60
cmake : install all ggml public headers (#8480)
Co-authored-by: 65a <65a@65a.invalid>
2024-07-18 17:47:12 +03:00
hongruichen 665f823748 fix op checker 2024-07-18 22:26:53 +08:00
hongruichen 15f5cc450c bug: fix allocation size overflow at log 2024-07-18 19:44:05 +08:00
Eric Zhang 0d2c7321e9
server: use relative routes for static files in new UI (#8552)
* server: public: fix api_url on non-index pages

* server: public: use relative routes for static files in new UI
2024-07-18 12:43:49 +02:00
Brian 672a6f1018
convert-*.py: GGUF Naming Convention Refactor and Metadata Override Refactor (#7499)
Main thing is that the default output filename will take this form

{name}{parameters}{finetune}{version}{encoding}{kind}

In addition this add and remove some entries in the KV store and adds a metadata class with automatic heuristics capability to derive some values based on model card content

* No Change:
  - Internal GGUF Spec
    - `general.architecture`
    - `general.quantization_version`
    - `general.alignment`
    - `general.file_type`
  - General Model Details
    - `general.name`
    - `general.author`
    - `general.version`
    - `general.description`
  - Licensing details
    - `general.license`
  - Typically represents the converted GGUF repo (Unless made from scratch)
    - `general.url`
  - Model Source during conversion
    - `general.source.url`

* Removed:
  - Model Source during conversion
    - `general.source.huggingface.repository`

* Added:
  - General Model Details
    - `general.organization`
    - `general.finetune`
    - `general.basename`
    - `general.quantized_by`
    - `general.size_label`
  - Licensing details
    - `general.license.name`
    - `general.license.link`
  - Typically represents the converted GGUF repo (Unless made from scratch)
    - `general.doi`
    - `general.uuid`
    - `general.repo_url`
  - Model Source during conversion
    - `general.source.doi`
    - `general.source.uuid`
    - `general.source.repo_url`
  - Base Model Source
    - `general.base_model.count`
    - `general.base_model.{id}.name`
    - `general.base_model.{id}.author`
    - `general.base_model.{id}.version`
    - `general.base_model.{id}.organization`
    - `general.base_model.{id}.url` (Model Website/Paper)
    - `general.base_model.{id}.doi`
    - `general.base_model.{id}.uuid`
    - `general.base_model.{id}.repo_url` (Model Source Repository (git/svn/etc...))
  - Array based KV stores
    - `general.tags`
    - `general.languages`
    - `general.datasets`

---------

Co-authored-by: compilade <git@compilade.net>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-07-18 20:40:15 +10:00
RunningLeon 3807c3de04
server : respect `--special` cli arg (#8553) 2024-07-18 11:06:22 +03:00
hongruichen d82b3a0bdb feat: add GGML_UNARY_OP_GELU 2024-07-18 11:15:48 +08:00
Johannes Gäßler e02b597be3
lookup: fibonacci hashing, fix crashes (#8548) 2024-07-17 23:35:44 +02:00
Al Mochkin b3283448ce
build : Fix docker build warnings (#8535) (#8537) 2024-07-17 20:21:55 +02:00
hongruichen ce199b2de7 refactoring: downgrade some log to debug level 2024-07-17 23:49:47 +08:00
hongruichen c76fc9aa2f fix warnings 2024-07-17 23:32:13 +08:00
hongruichen 6457a68bd7 disable qnn profiling in release build 2024-07-17 23:24:29 +08:00
hongruichen b7d781ec81 remove qnn dedicated unit tests since we're now using the `test-backend-ops` to cross-validate backend ops 2024-07-17 23:08:16 +08:00
Brian 30f80ca0bc
CONTRIBUTING.md : remove mention of noci (#8541) 2024-07-17 17:57:06 +03:00
hongruichen 2502b57203 fix warnings 2024-07-17 22:10:12 +08:00
hongruichen 454deef83c register qnn backend 2024-07-17 21:25:55 +08:00
hongruichen eed960575f add build step of QNN backend at ggml 2024-07-17 19:43:01 +08:00
hipudding 1bdd8ae19f
[CANN] Add Ascend NPU backend (#6035)
* [CANN] Add Ascend NPU backend

Ascend is a full-stack AI computing infrastructure for industry
applications and services based on Huawei Ascend processors and
software.

CANN (Compute Architecture of Neural Networks), developped by
Huawei, is a heterogeneous computing architecture for AI.

Co-authored-by: wangshuai09 <391746016@qq.com>

* delete trailing whitespaces

* Modify the code based on review comment

* Rename LLAMA_CANN to GGML_CANN

* Make ggml-common.h private

* add ggml_cann prefix for acl funcs

* Add logging for CANN backend

* Delete Trailing whitespace

---------

Co-authored-by: wangshuai09 <391746016@qq.com>
2024-07-17 14:23:50 +03:00
hongruichen 861bb9c580 Merge tag 'b3405' into dev-refactoring 2024-07-17 17:13:55 +08:00
Masaya, Kato da3913d8f9
batched: fix n_predict parameter (#8527) 2024-07-17 10:34:28 +03:00
Georgi Gerganov d65a8361fe
llama : disable context-shift for DeepSeek v2 (#8501) 2024-07-17 10:32:59 +03:00
hongruichen bb13795dce refactoring: remove unused functions and variables 2024-07-17 14:17:35 +08:00
hongruichen 63dc587dff refactoring: make the buffer alloc and free stay in same class 2024-07-17 14:08:31 +08:00
hongruichen b1ef302991 refactoring: remove depend of dlsym at utils.hpp 2024-07-17 12:21:33 +08:00
Johannes Gäßler 5e116e8dd5
make/cmake: add missing force MMQ/cuBLAS for HIP (#8515) 2024-07-16 21:20:59 +02:00
hongruichen 0301b500cd refactoring: prevent leak the QNN_INTERFACE_VER_TYPE and QNN_SYSTEM_INTERFACE_VER_TYPE outside of qnn.hpp 2024-07-17 00:18:38 +08:00
Brian 1666f92dcd
gguf-hash : update clib.json to point to original xxhash repo (#8491)
* Update clib.json to point to Cyan4973 original xxhash

Convinced Cyan4973 to add clib.json directly to his repo, so can now point the clib package directly to him now. Previously pointed to my fork with the clib.json package metadata

https://github.com/Cyan4973/xxHash/pull/954

* gguf-hash: readme update to point to Cyan4973 xxHash repo [no ci]
2024-07-16 10:14:16 +03:00
Steve Bonds 37b12f92ab
export-lora : handle help argument (#8497)
The --help option on export-lora isn't accepted as valid. The help still gets displayed by default, but the script exits with an error message and nonzero status.
2024-07-16 10:04:45 +03:00
Georgi Gerganov 0efec57787
llama : valign + remove unused ftype (#8502) 2024-07-16 10:00:30 +03:00
compilade 7acfd4e8d5
convert_hf : faster lazy safetensors (#8482)
* convert_hf : faster lazy safetensors

This makes '--dry-run' much, much faster.

* convert_hf : fix memory leak in lazy MoE conversion

The '_lazy' queue was sometimes self-referential,
which caused reference cycles of objects old enough
to avoid garbage collection until potential memory exhaustion.
2024-07-15 23:13:10 -04:00