Xuan Son Nguyen
4479c382ce
demo: type inference
2025-12-30 17:26:23 +01:00
Xuan Son Nguyen
9c0fa6f810
rm workarounds
2025-12-30 16:07:23 +01:00
Xuan Son Nguyen
9e9a70f72f
more fixes
2025-12-29 15:07:18 +01:00
Xuan Son Nguyen
026730e8e3
more fix, more tests
2025-12-29 12:53:31 +01:00
Xuan Son Nguyen
1cf25734a9
more tests
2025-12-29 10:53:32 +01:00
Xuan Son Nguyen
2a31c9a30c
a lot of fixes
2025-12-29 00:38:29 +01:00
Xuan Son Nguyen
1784a57e7b
impl global_from_json
2025-12-28 23:15:48 +01:00
Xuan Son Nguyen
55fe96a9df
add jinja-value.cpp
2025-12-28 22:49:31 +01:00
Xuan Son Nguyen
c7f246e7a5
allow func to access ctx
2025-12-28 22:15:10 +01:00
Xuan Son Nguyen
adad34f64d
add filter_statement
2025-12-28 22:02:22 +01:00
Xuan Son Nguyen
9a8a45ff3b
mostly works
2025-12-28 21:32:55 +01:00
Xuan Son Nguyen
45df0c91e7
testing more templates
2025-12-28 19:50:09 +01:00
Xuan Son Nguyen
db09a7468d
fix negate test
2025-12-28 19:07:01 +01:00
Xuan Son Nguyen
acb0effa25
allow print source on exception
2025-12-28 18:45:41 +01:00
Xuan Son Nguyen
64e29a5848
add mk_stmt
2025-12-28 17:48:14 +01:00
Xuan Son Nguyen
7f17608ea4
use shared_ptr for values
2025-12-28 17:46:25 +01:00
Xuan Son Nguyen
4331e9c8e9
keyword arguments and slicing array
2025-12-28 17:23:29 +01:00
Xuan Son Nguyen
45c194622e
support bound functions
2025-12-28 15:33:14 +01:00
Xuan Son Nguyen
4ca114b095
track input string even after transformations
2025-12-28 12:48:35 +01:00
Xuan Son Nguyen
81310d29c1
render gemma tmpl ok
2025-12-28 12:04:23 +01:00
Xuan Son Nguyen
10835f2720
eval with is_user_input
2025-12-27 23:25:20 +01:00
Xuan Son Nguyen
c08f4ddf01
use mk_val
2025-12-27 22:28:54 +01:00
Xuan Son Nguyen
da7bbe5813
wip
2025-12-27 22:25:19 +01:00
Xuan Son Nguyen
7ed11f78f9
add more builtins
2025-12-27 22:10:45 +01:00
Xuan Son Nguyen
15b3dbab05
add string builtins
2025-12-27 21:52:50 +01:00
Xuan Son Nguyen
5a041e65b8
fix map object
2025-12-27 20:38:06 +01:00
Xuan Son Nguyen
d8ef00e610
bin ops works!
2025-12-27 20:16:46 +01:00
Xuan Son Nguyen
8d1e9a0d12
shadow naming
2025-12-27 16:06:23 +01:00
Xuan Son Nguyen
7ad6eb39ca
binary_expression::execute
2025-12-27 16:00:07 +01:00
Xuan Son Nguyen
8cea1ed6b0
parser ok
2025-12-27 12:55:01 +01:00
Xuan Son Nguyen
7ac8e98b28
clean up
2025-12-27 12:35:19 +01:00
Xuan Son Nguyen
a6e0ae7a85
demo
2025-12-27 12:22:34 +01:00
Xuan Son Nguyen
a35fcb00b5
add vm types
2025-12-27 12:12:07 +01:00
Xuan Son Nguyen
15b7c50e95
lexer
2025-12-25 21:08:51 +01:00
Xuan Son Nguyen
8d8030142e
jinja vm
2025-12-25 00:19:23 +01:00
Xuan-Son Nguyen
4cbafad4f0
model: support MiMo-V2-Flash ( #18328 )
* mimov2: convert ok
* rename mimov2 --> mimo2
* fix conversion
* runnable not incorrect
* use sink
* add_sliding_window_pattern
* add swa and per-layer n_head_kv
* correct params
* somewhat working
* correct gating func
* nits
* mimo2: wire RMS eps + MoE bias + converter guards
* add co-author
Co-authored-by: Aaryan-Kapoor <Aaryan-Kapoor@users.noreply.github.com>
* use add_rope_freq_base_swa
---------
Co-authored-by: Aaryan Kapoor <aaryankapoor2006@gmail.com>
Co-authored-by: Aaryan-Kapoor <Aaryan-Kapoor@users.noreply.github.com>
2025-12-24 23:07:08 +01:00
Aadeshveer Singh
c184284230
fit-params : fix race condition in fit-params output ( #18276 )
2025-12-24 15:57:38 +01:00
Aman Gupta
c8a2417d7b
CUDA: experimental native mxfp4 support for blackwell ( #17906 )
* CUDA: experimental native mxfp4 support for blackwell
* optimize load_tiles
* optimize quantize_mxfp4
* cleanup
* first pass review: formatting
* use interleaved layout for mma
* mmq: add assert for size
* use __nv_fp4x4_e2m1
* use iter_k as 512, cleanup
* Use 1200 as blackwell instead of 1000
* address review comments
* mmq: fix stride
* quantize.cu: use reference impl of e8m0 scale
* address review comments
* add 120f-virtual + minor fixes
---------
Co-authored-by: Aman Gupta <aman>
2025-12-24 22:28:26 +08:00
Saba Fallah
54132f1b1f
model : support for LlamaBidirectionalModel architecture ( #18220 )
* model: llama-embed-nemotron
* minor: python lint
* changed arch-name
* templated llm_build_llama to be used for both llama and llama-embed arch
2025-12-24 14:02:36 +01:00
Jeff Bolz
2a9ea2020c
vulkan: fix command buffer corruption in ggml_backend_vk_event_wait ( #18302 )
2025-12-24 12:36:34 +01:00
Wang Weixuan
ce7a6dc0fc
CANN : refactor ACL graph cache ( #17752 )
Move the graph property checking code into methods of LRU cache.
Signed-off-by: Wang Weixuan <wangweixvan@gmail.com>
2025-12-24 17:50:24 +08:00
Jesse Ikonen
1ce0126b18
docs: Fix typos in SYCL documentation ( #18269 )
2025-12-24 17:19:47 +08:00
Ruben Ortlam
7f459c98e7
vulkan: use fewer FA rows for small cache runs ( #18280 )
2025-12-24 08:59:14 +01:00
TianHao324
cf2ffc02bc
CANN: Uses yarn_ramp cache in ROPE ( #17725 )
2025-12-24 14:55:33 +08:00
ddh0
10355dc7d0
common: add `LLAMA_ARG_OVERRIDE_TENSOR` env var for `-ot` arg ( #18267 )
2025-12-24 14:19:12 +08:00
Xuan-Son Nguyen
5ee4e43f26
server: return_progress to also report 0% processing state ( #18305 )
2025-12-23 21:49:05 +01:00
Pascal
5b6c9bc0f3
webui: apply webui_settings on first load ( #18223 )
* webui: apply webui_settings on first load
The webui_settings from /props were not applied on initial load
when default_generation_settings.params was null
Now syncs whenever serverProps is available, regardless of params,
works for both single-model and router modes
* chore: update webui build output
2025-12-23 15:48:03 +01:00
Xuan-Son Nguyen
849d021104
server: fix crash with model not having BOS/EOS ( #18321 )
2025-12-23 14:39:36 +01:00
Daniel Bevenius
8e3ead6e4d
model-conversion : add device option to run-org-model.py ( #18318 )
* model-conversion : add device option to run-org-model.py
This commit refactors the `run-org-model.py` script to include a
`--device` argument, to allow users to specify the device on which to
run the model (e.g., cpu, cuda, mps, auto).
It also extracts a few common functions to prepare for future changes
that will remove code duplication currently present in the embedding
scripts.
The Makefile has also been updated to pass the device argument, for
example:
```console
(venv) $ make causal-verify-logits DEVICE=cpu
```
* fix error handling and remove parser reference
This commit fixes the error handling which previously referenced an
undefined 'parser' variable.
2025-12-23 14:07:25 +01:00
Chris Rohlf
12ee1763a6
rpc : add check for rpc buffer type ( #18242 )
2025-12-23 11:56:49 +02:00