Commit Graph

7567 Commits

Author SHA1 Message Date
Xuan Son Nguyen 9e9a70f72f more fixes 2025-12-29 15:07:18 +01:00
Xuan Son Nguyen 026730e8e3 more fix, more tests 2025-12-29 12:53:31 +01:00
Xuan Son Nguyen 1cf25734a9 more tests 2025-12-29 10:53:32 +01:00
Xuan Son Nguyen 2a31c9a30c a lot of fixes 2025-12-29 00:38:29 +01:00
Xuan Son Nguyen 1784a57e7b impl global_from_json 2025-12-28 23:15:48 +01:00
Xuan Son Nguyen 55fe96a9df add jinja-value.cpp 2025-12-28 22:49:31 +01:00
Xuan Son Nguyen c7f246e7a5 allow func to access ctx 2025-12-28 22:15:10 +01:00
Xuan Son Nguyen adad34f64d add filter_statement 2025-12-28 22:02:22 +01:00
Xuan Son Nguyen 9a8a45ff3b mostly works 2025-12-28 21:32:55 +01:00
Xuan Son Nguyen 45df0c91e7 testing more templates 2025-12-28 19:50:09 +01:00
Xuan Son Nguyen db09a7468d fix negate test 2025-12-28 19:07:01 +01:00
Xuan Son Nguyen acb0effa25 allow print source on exception 2025-12-28 18:45:41 +01:00
Xuan Son Nguyen 64e29a5848 add mk_stmt 2025-12-28 17:48:14 +01:00
Xuan Son Nguyen 7f17608ea4 use shared_ptr for values 2025-12-28 17:46:25 +01:00
Xuan Son Nguyen 4331e9c8e9 keyword arguments and slicing array 2025-12-28 17:23:29 +01:00
Xuan Son Nguyen 45c194622e support binded functions 2025-12-28 15:33:14 +01:00
Xuan Son Nguyen 4ca114b095 track input string even after transformations 2025-12-28 12:48:35 +01:00
Xuan Son Nguyen 81310d29c1 render gemma tmpl ok 2025-12-28 12:04:23 +01:00
Xuan Son Nguyen 10835f2720 eval with is_user_input 2025-12-27 23:25:20 +01:00
Xuan Son Nguyen c08f4ddf01 use mk_val 2025-12-27 22:28:54 +01:00
Xuan Son Nguyen da7bbe5813 wip 2025-12-27 22:25:19 +01:00
Xuan Son Nguyen 7ed11f78f9 add more builtins 2025-12-27 22:10:45 +01:00
Xuan Son Nguyen 15b3dbab05 add string builtins 2025-12-27 21:52:50 +01:00
Xuan Son Nguyen 5a041e65b8 fix map object 2025-12-27 20:38:06 +01:00
Xuan Son Nguyen d8ef00e610 bin ops works! 2025-12-27 20:16:46 +01:00
Xuan Son Nguyen 8d1e9a0d12 shadow naming 2025-12-27 16:06:23 +01:00
Xuan Son Nguyen 7ad6eb39ca binary_expression::execute 2025-12-27 16:00:07 +01:00
Xuan Son Nguyen 8cea1ed6b0 parser ok 2025-12-27 12:55:01 +01:00
Xuan Son Nguyen 7ac8e98b28 clean up 2025-12-27 12:35:19 +01:00
Xuan Son Nguyen a6e0ae7a85 demo 2025-12-27 12:22:34 +01:00
Xuan Son Nguyen a35fcb00b5 add vm types 2025-12-27 12:12:07 +01:00
Xuan Son Nguyen 15b7c50e95 lexer 2025-12-25 21:08:51 +01:00
Xuan Son Nguyen 8d8030142e jinja vm 2025-12-25 00:19:23 +01:00
Xuan-Son Nguyen 4cbafad4f0
model: support MiMo-V2-Flash (#18328)
* mimov2: convert ok

* rename mimov2 --> mimo2

* fix conversion

* runnable not incorrect

* use sink

* add_sliding_window_pattern

* add swa and per-layer n_head_kv

* correct params

* somewhat working

* correct gating func

* nits

* mimo2: wire RMS eps + MoE bias + converter guards

* add co-author

Co-authored-by: Aaryan-Kapoor <Aaryan-Kapoor@users.noreply.github.com>

* use add_rope_freq_base_swa

---------

Co-authored-by: Aaryan Kapoor <aaryankapoor2006@gmail.com>
Co-authored-by: Aaryan-Kapoor <Aaryan-Kapoor@users.noreply.github.com>
2025-12-24 23:07:08 +01:00
Aadeshveer Singh c184284230
fit-params : fix race condition in fit-params output (#18276) 2025-12-24 15:57:38 +01:00
Aman Gupta c8a2417d7b
CUDA: experimental native mxfp4 support for blackwell (#17906)
* CUDA: experimental native mxfp4 support for blackwell

* optimize load_tiles

* optimize quantize_mxfp4

* cleanup

* first pass review: formatting

* use interleaved layout for mma

* mmq: add assert for size

* use __nv_fp4x4_e2m1

* use iter_k as 512, cleanup

* Use 1200 as blackwell instead of 1000

* address review comments

* mmq: fix stride

* quantize.cu: use reference impl of e8m0 scale

* address review comments

* add 120f-virtual + minor fixes

---------

Co-authored-by: Aman Gupta <aman>
2025-12-24 22:28:26 +08:00
Saba Fallah 54132f1b1f
model : support for LlamaBidirectionalModel architecture (#18220)
* model: llama-embed-nemotron

* minor: python lint

* changed arch-name

* templated llm_build_llama to be used for both llama and llama-embed arch
2025-12-24 14:02:36 +01:00
Jeff Bolz 2a9ea2020c
vulkan: fix command buffer corruption in ggml_backend_vk_event_wait (#18302) 2025-12-24 12:36:34 +01:00
Wang Weixuan ce7a6dc0fc
CANN : refactor ACL graph cache (#17752)
Move the graph property checking code into methods of LRU cache.

Signed-off-by: Wang Weixuan <wangweixvan@gmail.com>
2025-12-24 17:50:24 +08:00
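The refactor described in the CANN commit above (property checks moved into methods of the LRU cache) can be pictured with a small sketch. This is not the CANN backend code: the class name `GraphLRUCache`, the `put` and `find_matching` methods, and the use of plain tuples as "graph properties" are all hypothetical, chosen only to show a cache that owns its own matching logic.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class GraphLRUCache:
    """Hypothetical LRU cache that owns the 'does this graph match?' check."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        # Oldest entry first, most recently used entry last.
        self._entries: "OrderedDict[Hashable, tuple]" = OrderedDict()

    def put(self, key: Hashable, props: tuple) -> None:
        """Insert or refresh an entry, evicting the least recently used one."""
        self._entries[key] = props
        self._entries.move_to_end(key)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)

    def find_matching(self, props: tuple) -> Optional[Hashable]:
        """Return the key of a cached graph with equal properties, if any.

        The comparison lives inside the cache, so callers do not need to
        walk the entries themselves.
        """
        match = next(
            (key for key, cached in self._entries.items() if cached == props),
            None,
        )
        if match is not None:
            # A hit counts as a use, so the entry becomes the most recent.
            self._entries.move_to_end(match)
        return match
```

With a capacity of 2, looking up an entry before inserting a third one keeps that entry alive and evicts the other.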
Jesse Ikonen 1ce0126b18
docs: Fix typos in SYCL documentation (#18269) 2025-12-24 17:19:47 +08:00
Ruben Ortlam 7f459c98e7
vulkan: use fewer FA rows for small cache runs (#18280) 2025-12-24 08:59:14 +01:00
TianHao324 cf2ffc02bc
CANN: Uses yarn_ramp cache in ROPE (#17725) 2025-12-24 14:55:33 +08:00
ddh0 10355dc7d0
common: add `LLAMA_ARG_OVERRIDE_TENSOR` env var for `-ot` arg (#18267) 2025-12-24 14:19:12 +08:00
Xuan-Son Nguyen 5ee4e43f26
server: return_progress to also report 0% processing state (#18305) 2025-12-23 21:49:05 +01:00
Pascal 5b6c9bc0f3
webui: apply webui_settings on first load (#18223)
* webui: apply webui_settings on first load

The webui_settings from /props were not applied on initial load
when default_generation_settings.params was null

Now syncs whenever serverProps is available, regardless of params,
works for both single-model and router modes

* chore: update webui build output
2025-12-23 15:48:03 +01:00
Xuan-Son Nguyen 849d021104
server: fix crash with model not having BOS/EOS (#18321) 2025-12-23 14:39:36 +01:00
Daniel Bevenius 8e3ead6e4d
model-conversion : add device option to run-org-model.py (#18318)
* model-conversion : add device option to run-org-model.py

This commit refactors the `run-org-model.py` script to include a
`--device` argument, to allow users to specify the device on which to
run the model (e.g., cpu, cuda, mps, auto).
It also extracts a few common functions to prepare for future changes
that will remove some code duplication that currently exists in the
embedding scripts.

The Makefile has also been updated to pass the device argument, for
example:
```console
(venv) $ make causal-verify-logits DEVICE=cpu
```

* fix error handling and remove parser reference

This commit fixes the error handling which previously referenced an
undefined 'parser' variable.
2025-12-23 14:07:25 +01:00
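The `--device` option described in the run-org-model.py commit above could be wired up roughly as follows. This is a minimal sketch, not the script's actual code: the helper names `build_parser` and `parse_device` are hypothetical, and the default of `auto` is an assumption (the commit message only lists `cpu`, `cuda`, `mps`, and `auto` as example values).

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a --device option (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Run the original model (illustrative sketch)"
    )
    parser.add_argument(
        "--device",
        default="auto",  # assumption: the real default is not stated in the commit
        help="device to run the model on, e.g. cpu, cuda, mps, auto",
    )
    return parser


def parse_device(argv: Optional[List[str]] = None) -> str:
    """Return the requested device string from a list of CLI arguments."""
    return build_parser().parse_args(argv).device
```

Keeping the parser in its own function means error paths can reach it without relying on a module-level `parser` variable.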
Chris Rohlf 12ee1763a6
rpc : add check for rpc buffer type (#18242) 2025-12-23 11:56:49 +02:00
nullname ed75977717
ggml-hexagon: create generalized functions for cpu side op (#17500)
* refactor: replace ggml_hexagon_mul_mat with template-based binary operation for improved flexibility

* refactor: replace ggml_hexagon_mul_mat_id with template-based binary operation for improved flexibility

* refactor: initialize buffer types and streamline dspqueue_buffers_init calls for clarity

* add comment

* refactor: remove redundant buffer checks in hexagon supported operations

* wip

* add missing include to fix weak symbol warning

* add ggml_hexagon_op_generic

* refactor: simplify tensor operation initialization and buffer management in hexagon implementation

* refactor: streamline hexagon operation initialization and buffer management

* refactor: update function signatures and streamline request handling in hexagon operations

* wip

* ggml-hexagon: clean up code formatting and improve unary operation handling

* wip

* rename

* fix: add support for permuted F16 tensors and enhance quantization checks in matrix operations

* hexagon: fix merge conflicts

* hexagon: minor cleanup for buffer support checks

* hexagon: factor out op_desc and the overall op logging

* hexagon: further simplify and cleanup op dispatch logic

* snapdragon: update adb scripts to use llama-cli and llama-completion

* fix pipeline failure

---------

Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>
2025-12-22 23:13:24 -08:00
Daniel Bevenius 847c35f7d5
model-conversion : add trust_remote_code for embedding scripts (#18288)
This commit adds the trust_remote_code=True parameter when loading
models and configurations in the embedding model conversion scripts.
It also adds a cast to float for models that might use a data type that
is not supported by Python, for example bfloat16.

The motivation for this is that some models may require custom code to
be executed during loading, and setting trust_remote_code to True avoids
getting prompted for confirmation.

Future work will consolidate the embedding conversion scripts with the
causal conversion scripts to avoid code duplication. But in the
meantime it would be nice to have this fix in place.
2025-12-23 07:27:37 +01:00
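The float cast mentioned in the commit above is needed because bfloat16 has no native Python representation. As background rather than the conversion scripts' own code, the sketch below shows that a bfloat16 value is the upper 16 bits of an IEEE 754 float32, so widening it to a regular float loses nothing. The helper name `bfloat16_bits_to_float` is hypothetical.

```python
import struct


def bfloat16_bits_to_float(bits: int) -> float:
    """Widen a raw 16-bit bfloat16 pattern to a Python float."""
    if not 0 <= bits <= 0xFFFF:
        raise ValueError("expected a 16-bit pattern")
    # bfloat16 is the upper half of a float32, so padding the low 16 bits
    # with zeros yields the exactly equivalent float32 value.
    (value,) = struct.unpack("<f", struct.pack("<I", bits << 16))
    return value
```

For example, the pattern `0x3F80` widens to `1.0` and `0xC000` widens to `-2.0`.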