Commit Graph

44 Commits

Author SHA1 Message Date
Yu, Zijun 5525bac078 Use bias instead of zp in test-backend-ops 2026-02-13 17:38:21 +08:00
Yu, Zijun 2a6a95eb77 Fix MUL_MAT with broadcast; Add unsupported MUL_MAT FLASH_ATTN cases 2026-02-13 15:36:51 +08:00
Yu, Zijun 1a54965c43 Suppress logging and add error handling to allow test-backend-ops to complete 2026-02-13 10:43:52 +08:00
Yu, Zijun d5d673cde3 Fix test-backend-ops crashes in glu, get_rows, scale, rms_norm, add 2026-02-12 17:32:19 +08:00
Yu, Zijun 900dd76c24 Refactor weight tensor processing 2026-02-11 10:15:09 +08:00
Yu, Zijun 0ee7e05485 Extract zp directly instead of bias 2026-02-11 10:15:09 +08:00
Yu, Zijun b6c0697d10 Avoid re-compilation in llama-bench 2026-02-11 10:15:09 +08:00
Yu, Zijun 18ab0f562b Remove hardcode names 2026-02-11 10:15:09 +08:00
Yu, Zijun c840210213 Don't put kvcache on GPU in stateful mode 2026-02-11 10:15:09 +08:00
Yu, Zijun 1c0a47a485 Fix --direct-io 0 2026-02-11 10:15:09 +08:00
Yu, Zijun 9a15c8b0cf Change ov backend buffer is_host to false 2026-01-21 15:23:12 +08:00
Yu, Zijun cfc471353d FIX: use remote tensor from singleton 2026-01-15 11:39:08 -08:00
Yu, Zijun a356b44477 only use remote tensor for kvcache for GPU 2026-01-15 11:39:08 -08:00
Yu, Zijun 88d1d17eac only use remote tensor for kvcache 2026-01-15 11:39:08 -08:00
Yu, Zijun 8273a7c2f4 Use ggml_aligned_malloc 2026-01-15 11:39:08 -08:00
Yu, Zijun d757849741 Put kvcache on GPU 2026-01-15 11:39:08 -08:00
Yu, Zijun 3fdcb6ab72 Add ov_backend_host_buffer; Use cached remote context 2026-01-15 11:39:08 -08:00
Yu, Zijun 72bba828df Use shared_buffer for GPU NPU; Refactor 2026-01-15 11:39:08 -08:00
Yu, Zijun 22d9c17a6f backend buffer: allocate on host 2026-01-15 11:39:08 -08:00
Yu, Zijun 2a9d4ca836 Refactor: split ov_graph_compute for dynamic and static 2026-01-15 11:39:08 -08:00
Yu, Zijun 072dde0b2b change graph to 4d, support multi sequences 2026-01-15 11:28:48 -08:00
Yu, Zijun 303923aba7 Clean placeholders in ggml-openvino.cpp 2026-01-15 11:27:30 -08:00
Yu, Zijun eba8113dc4 Style: middle ptr and ref align, omit optional struct keyword 2026-01-15 11:27:30 -08:00
Yu, Zijun bd3093f90c Style: use switch in supports_ops 2026-01-15 11:27:30 -08:00
Yu, Zijun 299f4923bb fix after rebasing 2026-01-15 11:26:00 -08:00
Yu, Zijun e4bfe5a20d Add Q5_K to support phi-3-q4_k_m 2026-01-15 11:26:00 -08:00
Yu, Zijun 1a38339cea Fix ROPE accuracy when freq_scale != 1 2026-01-15 11:26:00 -08:00
Yu, Zijun 597561242f Add GeGLU 2026-01-15 11:20:31 -08:00
Yu, Zijun b593428eb3 Dequantize q4_1 q4_k q6_k for NPU 2026-01-15 11:20:31 -08:00
Yu, Zijun 9ca53c7991 Add NPU Q4_0 support 2026-01-15 11:20:31 -08:00
Yu, Zijun 9900245e0b Fix test-backend-ops: Treat quantized tensors as weights 2026-01-15 11:20:31 -08:00
Yu, Zijun 6ab76ed10a Fix accuracy: disable cpu_repack 2026-01-15 11:19:15 -08:00
Yu, Zijun 3e897df51c Update supports_buft and supports_op for quantized models 2026-01-15 11:19:15 -08:00
Yu, Zijun 56d596775d Change openvino device_type to GPU; Enable flash_attn 2026-01-15 11:19:15 -08:00
Yu, Zijun 65e1b1af6d Fix after rebasing 2026-01-15 11:19:15 -08:00
- Layouts of cache k and cache v are unified: [seq, n_head, head_size]
- Add CPY and FLASH_ATTN_EXT; flash attn is not used yet
- Skip test-backend-ops due to flash attn test crash
- Add mutex around graph conversion to avoid test-thread-safety failures in the future
- Update NPU config
- Update GPU config to disable SDPA opt to make phi-3 run
Yu, Zijun f4123be967 Fix test-backend-ops 2026-01-15 11:19:15 -08:00
Yu, Zijun 839f8c66a0 Remove CPY 2026-01-15 11:19:15 -08:00
Yu, Zijun 63d000ba40 Support op SET_ROWS 2026-01-15 11:19:15 -08:00
Yu, Zijun fc865340d5 Fix test-backend-ops 2026-01-15 10:26:28 -08:00
Yu, Zijun 75eec6265f Fix llama-bench; Clang-format 2026-01-15 10:26:28 -08:00
Yu, Zijun 44f4cf34b1 Fix Phi3 ROPE; Add test-backend-ops 2026-01-15 10:26:28 -08:00
Yu, Zijun 73ee84fffe Add SwiGLU 2026-01-15 10:20:18 -08:00
Yu, Zijun a0b30529bf FIX: backend buffer type issue 2026-01-15 10:10:00 -08:00
Zijun Yu 4c905b2b25 fix build error 2026-01-15 10:10:00 -08:00
Renamed from ggml/src/ggml-openvino.cpp