Xuejun Zhai
95c3071906
Removed API GgmlOvDecoder::get_input_names()
2026-01-15 11:39:08 -08:00
Xuejun Zhai
8ff73e5d53
Removed API m_outputs
2026-01-15 11:39:08 -08:00
Xuejun Zhai
ba852f2a60
Removed API GgmlOvDecoder::get_output_op_params(const std::string & name)
2026-01-15 11:39:08 -08:00
Xuejun Zhai
6d7a0d6047
Modified API GgmlOvDecoder::get_output_type(const std::string & name)
2026-01-15 11:39:08 -08:00
Xuejun Zhai
f516db1db5
Remove unused API get_output_shape(const std::string & name)
2026-01-15 11:39:08 -08:00
Xuejun Zhai
497964afbb
Remove unused API GgmlOvDecoder::get_output_names()
2026-01-15 11:39:08 -08:00
Xuejun Zhai
0ea8238ad0
Remove unused API GgmlOvDecoder::get_output_stride(const std::string & name)
2026-01-15 11:39:08 -08:00
Yu, Zijun
808619e274
NPU support llama-perplexity -b 512 --no-warmup
2026-01-15 11:39:08 -08:00
Yu, Zijun
65348b5d20
Fall back to naive run with accuracy issue
2026-01-15 11:39:08 -08:00
Yu, Zijun
59e7e7c47d
NPU fix llama-bench
2026-01-15 11:39:08 -08:00
Yu, Zijun
38254cf592
NPU prefill chunking
2026-01-15 11:39:08 -08:00
XuejunZhai
992dea73fd
Fix error for naive
2026-01-15 11:39:08 -08:00
XuejunZhai
ae936519d2
Remove the second decoder for node; move the function into the model decoder
2026-01-15 11:39:05 -08:00
Arshath
4400b5cb4b
Update ggml-decoder.cpp
2026-01-15 11:38:13 -08:00
Arshath
98396b275a
Update ggml-decoder.cpp
2026-01-15 11:38:13 -08:00
Arshath
4a57b37d4d
Update ggml-decoder.cpp
2026-01-15 11:38:13 -08:00
Arshath
bed495226d
Update ggml-decoder.cpp
2026-01-15 11:38:13 -08:00
Arshath
11b4cc5a67
Update ggml-decoder.cpp
2026-01-15 11:38:13 -08:00
Arshath
047bfb5c90
Update ggml-decoder.cpp
...
Hitting error while compiling on windows:
error C3861: 'unsetenv': identifier not found
Reason: unsetenv() is a POSIX function; it does not exist on Windows, so Visual Studio (MSVC) won't recognize it.
Proposed fix: use _putenv_s() (the Windows equivalent).
This is supported by MSVC and achieves the same effect: it removes the environment variable from the process environment.
This keeps cross-platform compatibility.
2026-01-15 11:38:07 -08:00
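The fix proposed above can be sketched as a small portable wrapper. This is a hypothetical illustration, not the actual patch; the wrapper name `unset_env_var` is assumed. On Windows, `_putenv_s(name, "")` removes the variable; elsewhere, POSIX `unsetenv()` is used directly.

```cpp
#include <cstdlib>

// Hypothetical portable wrapper (name assumed, not from the actual patch).
// POSIX unsetenv() does not exist under MSVC; _putenv_s(name, "") removes
// the variable from the process environment on Windows.
static int unset_env_var(const char* name) {
#ifdef _WIN32
    return _putenv_s(name, "");
#else
    return unsetenv(name);
#endif
}
```

Returning the underlying call's result keeps the wrapper a drop-in replacement at each `unsetenv()` call site.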
Yu, Zijun
531941b348
Fix NPU
2026-01-15 11:28:48 -08:00
Yu, Zijun
ae404f7cbb
Fix llama-bench
2026-01-15 11:28:48 -08:00
Yu, Zijun
072dde0b2b
change graph to 4d, support multi sequences
2026-01-15 11:28:48 -08:00
Yu, Zijun
ea2c99be1c
NPU unify PD (handled internally)
2026-01-15 11:28:48 -08:00
Zijun Yu
b8690bc055
NPU Unify PD (#14)
...
* Stateless. Fix llama-cli llama-server
* Simplify broadcast op in attention
* Replace get_output_tensor+memcpy with set_output_tensor
* NPU unify PD. Unify dynamic and static dims
2026-01-15 11:27:30 -08:00
Yu, Zijun
eba8113dc4
Style: middle ptr and ref align, omit optional struct keyword
2026-01-15 11:27:30 -08:00
Yu, Zijun
8b82d1153b
Fix add_sliced_mask; Revert mulmat, softmax; Remove input attention_size, iSWA model not working
2026-01-15 11:26:00 -08:00
cavusmustafa
c112bc4e73
KV cache fusion support
2026-01-15 11:26:00 -08:00
Yu, Zijun
fdadca1e89
Fix after rebasing
2026-01-15 11:26:00 -08:00
Yu, Zijun
e4bfe5a20d
Add Q5_K to support phi-3-q4_k_m
2026-01-15 11:26:00 -08:00
Yu, Zijun
2f1d50fb07
Minor refactor
2026-01-15 11:26:00 -08:00
Yu, Zijun
67e178a2f6
Minor: not add attention_size_swa for non-swa model
2026-01-15 11:26:00 -08:00
Yu, Zijun
9de874cb7b
Support iSWA
2026-01-15 11:25:58 -08:00
Yu, Zijun
7d81861a18
Fix Hunyuan
2026-01-15 11:20:31 -08:00
Yu, Zijun
bcc343af00
Support BF16 model
2026-01-15 11:20:31 -08:00
Yu, Zijun
2ad1147b9b
Improve debug util; Eliminate nop ReshapeReshape
2026-01-15 11:20:31 -08:00
Yu, Zijun
6926655f5b
Add custom quant type: q8_1_c, q4_0_128
2026-01-15 11:20:31 -08:00
Yu, Zijun
b593428eb3
Dequantize q4_1 q4_k q6_k for NPU
2026-01-15 11:20:31 -08:00
Yu, Zijun
9900245e0b
Fix test-backend-ops: Treat quantized tensors as weights
2026-01-15 11:20:31 -08:00
Yu, Zijun
dd80b04235
Fix CI; Disable test-backend-ops
2026-01-15 11:19:15 -08:00
Yu, Zijun
6ab76ed10a
Fix accuracy: disable cpu_repack
2026-01-15 11:19:15 -08:00
Yu, Zijun
663a0b8cce
Quant models run with accuracy issue
2026-01-15 11:19:15 -08:00
Yu, Zijun
d4ca760da8
Add quant weight conversion functions from genai gguf reader
2026-01-15 11:19:15 -08:00
Yu, Zijun
56d596775d
Change openvino device_type to GPU; Enable flash_attn
2026-01-15 11:19:15 -08:00
Yu, Zijun
65e1b1af6d
Fix after rebasing
...
- Layout of cache k and cache v is unified: [seq, n_head, head_size]
- Add CPY and FLASH_ATTN_EXT, flash attn is not used yet
- Skip test-backend-ops due to flash attn test crash
- Add mutex around graph conversion to avoid test-thread-safety failures in the future
- Update NPU config
- Update GPU config to disable SDPA opt to make phi-3 run
2026-01-15 11:19:15 -08:00
Yu, Zijun
a7b611bc93
Minor updates for raising PR
2026-01-15 11:19:15 -08:00
Yu, Zijun
f4123be967
Fix test-backend-ops
2026-01-15 11:19:15 -08:00
Yu, Zijun
839f8c66a0
Remove CPY
2026-01-15 11:19:15 -08:00
Yu, Zijun
7bda5021f9
Fix NPU
2026-01-15 11:19:15 -08:00
Yu, Zijun
63d000ba40
Support op SET_ROWS
2026-01-15 11:19:15 -08:00
Yu, Zijun
9a91ca6ef9
Optimize tensor conversion, improve TTFT
2026-01-15 11:19:15 -08:00