Commit Graph

338 Commits

Author SHA1 Message Date
HanishKVC cae0fff715 SimpCfg: Update notes; Try add a better trimming logic 2024-05-06 11:27:56 +05:30
HanishKVC d1156cc055 SimpCfg: As locale manipulation reqd for better processing 2024-05-06 11:27:56 +05:30
HanishKVC 2325764180 SimpCfg:CheckStrings: Switch Mbs2Wcs to multithread safe calls 2024-05-06 11:27:56 +05:30
HanishKVC 23acf07bb2 SimpCfg:CheckStrings: Cleanup wstring flow to needed parts 2024-05-06 11:27:56 +05:30
HanishKVC 2cda78f1ad SimpCfg:CheckStrings: WString2String finally
The constructor method doesnt convert wstring to string, when it
involves non-english chars which will encode to multibyte chars
in utf8. even thou it does work for the already utf8 u8string.

wcstombs doesnt seem to work for non english chars, when the
locale is set to the default c, need to change to something like
en_US.UTF-8, to allow it to do the conversion properly.
2024-05-06 11:27:56 +05:30
HanishKVC 7607dbc8c7 SimpCfg:CheckStrings: Try fixup wstring handling 2024-05-06 11:27:56 +05:30
HanishKVC 1a618a42f8 SimpCfg: Update the func notes with alert 2024-05-06 11:27:56 +05:30
HanishKVC 66d6fa62b7 SimpCfg: C++ and strings is a mess even after decades
Seperate out the checks wrt different string types.

Add a wstring_basic, which verifies that wstring iterator handles
non english chars propery or atleast better.
2024-05-06 11:27:56 +05:30
HanishKVC 3ad5cec47e SimpCfg:CheckStrings:MacOS, wstring and wcout
Without using imbue, I couldnt get non-english wstrings to print
on mac. Need to check on linux also.

Also avoid the uint8_t typecasting, given that wchar isnt 8bit
2024-05-06 11:27:56 +05:30
HanishKVC a448fec486 SimpCfg:CheckString: organise and probe - p3 wstring
wcouts' involving 2nd wstring with non english char in it not
showing up?
2024-05-06 11:27:56 +05:30
HanishKVC 691d0d43b5 SimpCfg:CheckStrings: Organise and Probe - P2 - std::u8string 2024-05-06 11:27:56 +05:30
HanishKVC 713520caac SimpCfg:CheckStrings: Organise and Probe - p1 std::string 2024-05-06 11:27:56 +05:30
HanishKVC 56f19c7a68 SimpCfg: Test c++ string handling 2024-05-06 11:27:56 +05:30
HanishKVC 86e776c857 SimpCfg: Rename to get_vector, add some test code 2024-05-06 11:27:56 +05:30
HanishKVC 1e1f54ec1d SimpCfg:GetArray flesh out, helpers to convert to string
Good: Forgotten love, All the light we cant see, laapataa ladies
2024-05-06 11:27:56 +05:30
HanishKVC 561f50930e SimpCfg: initial go at adding support for spreadout arrays
By having a seperate entry for each element of the array, the
existing logic itself can be repurposed with minimal additions.
2024-05-06 11:27:56 +05:30
HanishKVC 8fdc80533f SimpCfg:Cleanup:Cmdline Arg, GetValueCallerLogging, StringCmp
Bring the direct string comparison also back, as the issue with
cmp was more to do with string.transform.
2024-05-06 11:27:56 +05:30
HanishKVC 08b9711d68 SimpCfg:Remove dbug logs wrt str_tolower and set_bool
Also the warning wrt is it string, now also logs the line number,
group and key, to help user identify the line better.

Misc: pass time last week Another life, Anchakkallakokkan, Deadloch
2024-05-06 11:27:56 +05:30
HanishKVC 0e0d7da18f SimpCfg:Found issue with str_tolower, transform doesnt resize dst 2024-05-06 11:27:56 +05:30
HanishKVC eb56517f20 SimpCfg:Bools:Make lowercase b4 checking true/false for bool path
TODO: string check wrt true/false, doesnt seem to be working after
str_tolower was introduced. I seem to be doing some silly mistake
not able to make out, moving in and out of sleep, need to check
tomorrow.

string == string-literal failed
string == string-view failed
string.compare(string-literal) failed

Bit strange
2024-05-06 11:27:56 +05:30
HanishKVC ef5a2cf391 SimpCfg:Dbug why bool is not setting properly 2024-05-06 11:27:56 +05:30
HanishKVC 5aa1072aac SimpCfg: Move dump into its own func, Avoid KV iter wrt Get 2024-05-06 11:27:56 +05:30
HanishKVC 6b475e444f SimpCfg: Log Caller of Set/GetValue 2024-05-06 11:27:56 +05:30
HanishKVC f05f71bdc4 SimpCfg:SetBool string value, str_tolower, SetValueTypeLogging 2024-05-06 11:27:56 +05:30
HanishKVC 1dc7fd0e85 SimpCfg:WIP:Variant TypeDef, to_str and std::get
Cleanup the use of the variant

Initialize and << op stringstream seperately.
2024-05-06 11:27:56 +05:30
HanishKVC ee1a62c876 SimpCfg:WIP:Switch to C++ Variant type - initial skeleton 2024-05-06 11:27:56 +05:30
HanishKVC 7302b3ab36 SimpCfg: Use stderr wrt internal Log messaging helpers 2024-05-06 11:27:56 +05:30
HanishKVC a09571318a ChatON: meta-dump returns flag inturn returned by meta-ok
test-chat-template-chaton now tries to check if meta-ok is ok wrt
the template-id being looked into.

Log template-id info also, where it was previously missed out.
2024-05-06 11:27:56 +05:30
HanishKVC 44c05305d0 SimpCfg: Add support for get_double 2024-05-06 11:27:56 +05:30
HanishKVC 8ad2c17e5d SimpCfg: get_int64 logic 2024-05-06 11:27:56 +05:30
HanishKVC 000245b8e8 SimpCfg:Warn possible nonstring strings, some invalid floats
Warn if something not starting with double quote is being treated
as a string.

Show some examples of invalid floating point values wrt this
logics floating point determination code
2024-05-06 11:27:56 +05:30
HanishKVC a6648b02f2 SimpCfg:Show floating point values in normal and exponential form 2024-05-06 11:27:56 +05:30
HanishKVC 4181164217 SimpCfg:Implement set_int64 and set_double
Also update the sample simpcfg file, to test for int and float
values.
2024-05-06 11:27:56 +05:30
HanishKVC fb9a7dc7fe SimpCfg:Initial skeleton towards supporting int and floating point 2024-05-06 11:27:56 +05:30
HanishKVC d0b3ebf32e SimpCfg: Use & wrt destination of [] operation
so that one can update the value-item's content, without needing
to explicitly update/store the value-item back into map after the
content has been updated.

This should make these setting operations/helpers more efficient.
2024-05-06 11:27:56 +05:30
HanishKVC 0a534e6897 SimpCfg: Rename test program related #define 2024-05-06 11:27:56 +05:30
HanishKVC ca5a04d607 SimpCfg: Remove double quotes around group, key or string value 2024-05-06 11:27:56 +05:30
HanishKVC 82348e2840 SimpCfg: Put GroupMap back into Map, Iterate during get if DBUG
TODO: Have to look into C++ a bit later including these default
container types. Not sure my current flow is efficient.
2024-05-06 11:27:56 +05:30
HanishKVC 9940bd8ed7 SimpCfg: Allow default values wrt set string and set bool 2024-05-06 11:27:56 +05:30
HanishKVC 951fbc3396 SimpCfg: Change logging to LDBUG and LERRR helpers 2024-05-06 11:27:56 +05:30
HanishKVC d514c81829 SimpCfg: Add the const which I had forgotten wrt args 2024-05-06 11:27:56 +05:30
HanishKVC 6de8a14f32 SimpCfg: Rename member functions to avoid sc_ prefix
now that logic has been converted into a class, no need for this
prefix
2024-05-06 11:27:56 +05:30
HanishKVC 1ecca5a7ec SimpCfg: Convert to a class 2024-05-06 11:27:56 +05:30
HanishKVC 28ae0c5b02 SimpCfg:Make str_trim flexible, use to trim , wrt value
Now one can pass the list/string of chars to trim at either end.
2024-05-06 11:27:56 +05:30
HanishKVC 2cbb00c340 SimpCfg: Add support for boolean fields wrt key-value 2024-05-06 11:27:56 +05:30
HanishKVC aea6850131 SimpCfg: Keep compiler happy, also add newline wrt alt logging def 2024-05-06 11:27:56 +05:30
HanishKVC f4687fa5d4 SimpCfg:Parse config file and load string key-value fields 2024-05-06 11:27:56 +05:30
HanishKVC ce75d434dc SimpCfg: Initial skeleton : get and set string and bool values 2024-05-06 11:27:56 +05:30
HanishKVC af9a0a211b ChatON:ChatTmplApply: Avoid the stringstream 2024-05-06 11:27:56 +05:30
HanishKVC 889a45ff28 ChatON:ChatTmplApply:Update the function notes 2024-05-06 11:27:56 +05:30
HanishKVC ff5f68826b ChatON:ChatTmplApplySingle: Avoid streamstring, update func notes 2024-05-06 11:27:56 +05:30
HanishKVC 32e672c5dd ChatON: Dont log final tagged message string to screen 2024-05-06 11:27:56 +05:30
HanishKVC cad50c527e ChatON: Update the note to match current logic 2024-05-06 11:27:56 +05:30
HanishKVC a4b3285034 ChatON:Show Log on screen when template is applied 2024-05-06 11:27:56 +05:30
HanishKVC d61b071b8d Chaton:Common:Add missing newline wrt cmdline arg usage 2024-05-06 11:27:56 +05:30
HanishKVC fee887fe31 ChatON:Common:Update the cmdline argument name used
Had forgotten to update it before
2024-05-06 11:27:56 +05:30
HanishKVC 58e1ff16bc ChatON: switch to ordered_json from json library
to be in sync with the json namespace in server.
2024-05-06 11:27:56 +05:30
HanishKVC a630564c48 ChatON:ChatTemplateApplyCAPI remaining base logic
As c doesnt have the concept of pass by reference, and inturn the
existing c api uses pointers wrt llama chat message structure, so
switching to same wrt chat_tmpl_apply logics.

Also fix a oversight in previous commit and add the remaining logic.
2024-05-06 11:27:56 +05:30
HanishKVC 308d3bf3ff ChatON:WIP:Add c api wrapper for chat_template_apply
Initial skeletons

Update existing logics to help with same. Also the inbetween helper
was having a bad signature wrt returning status and data, thats also
fixed.
2024-05-06 11:27:56 +05:30
HanishKVC e62699f923 ChatON: Add alertAssistantAtEnd flag & logic wrt MultiMsgs Apply
While sending the current chat session along with new user query
to the model, many models expect that a tag be added at the end
to indicate that user is expecting the model to respond, this
flags allows for the same.
2024-05-06 11:27:56 +05:30
HanishKVC ea3a0f19cc ChatON: Rather check for tmpl existance in single_ex 2024-05-06 11:27:56 +05:30
HanishKVC 01c8db70f7 ChatON+Main: Add C_API wrapper for single
Add a c api wrapper for a single message tagging scenario.

Inturn to match convention followed by existing chat_apply_template
code, make it return the size expected of the tagged message string
buffer. Update internal single logic to help with same.

Explicitly check if tmpl specified is available in the loaded json
or not and then return a error if not found.
2024-05-06 11:27:56 +05:30
HanishKVC 13857f29d6 ChatON+Main: Updates wrt detailed meta json
Fix a oversight wrt key name.

Add a alert in case if passed meta json file contains begin(BoS)
wrt assistant role, similar to check for end (EoS) wrt user role.
Bcas normally both (ie EoS wrt User and BoS wrt Assistant) shouldnt
be needed.

Update main wrt begin & prefix and suffix & end addition.
2024-05-06 11:27:56 +05:30
HanishKVC 0cd7c62706 ChatON: Keep compiler happy
Move helpers to the begining, so can avoid adding prototype
declerations/function signatures to the begining

Get the char * wrt string data in the c++ string.
2024-05-06 11:27:56 +05:30
HanishKVC 6a0214c067 ChatON:MetaOK->MetaDump: Alert if user->end is needed or not
Because user messages dont normally need a EoS token.
2024-05-06 11:27:56 +05:30
HanishKVC 344857b6cb ChatOn:ChatOnTemplateApply: suffix,end flag based control
Also fix a oversight wrt begin, when flag based begin adding control
was introduced.

NOTE: Currently system role suffix/end conditional adding always
triggered, if 1st system prompt seen or additional system prompt
is seen.
2024-05-06 11:27:56 +05:30
HanishKVC f8ae21cec7 ChatON:ChatTemplateApplySingle: update begin+prefix, suffix+end 2024-05-06 11:27:56 +05:30
HanishKVC 5d76f08d37 ChatON: Need to explicitly specify string to use c_str 2024-05-06 11:27:56 +05:30
HanishKVC 7ba0144e42 ChatOn:chaton_tmpl_role_kv: try except to ignore missing ifany
Cas of above reason, switch to directly accessing the keys in
dump helper, which is inturn used by meta_ok check
2024-05-06 11:27:56 +05:30
HanishKVC adab5775bf ChatON: more detailed/spreadout json fields 2024-05-06 11:27:56 +05:30
HanishKVC 3f09eb5dea ChatOn: ChatTemplateApply[Ex] return tagged msgs parts detail
Now there is a simple and extended version of returning tagged
messages.

The extended version returns the tagged string, as well as the
details of the parts that make up that tagged message interms of
the type of parts and the lengths of the parts.
2024-05-06 11:27:56 +05:30
HanishKVC 825a78abaa ChatOn: ChatTemplateApplySingle[Ex] return parts detail
Now there is a simple and extended version of returning tagged
message wrt a single role and its content.

The extended version returns the tagged string, as well as the
details of the parts that make up that tagged message interms of
the type of parts and the lengths of the parts.
2024-05-06 11:27:56 +05:30
HanishKVC 92e780fb1a ChatON:ChatParts: Allow flexibility for more refined tokenization 2024-05-06 11:27:56 +05:30
HanishKVC d1899728aa ChatON: Test ChatParts in chat-template-apply 2024-05-06 11:27:56 +05:30
HanishKVC 9de1d6017f ChatON:ChatParts class initial go
Helps keep user prompt and chat-hs-template tag parts seperate,
but in sequence
2024-05-06 11:27:56 +05:30
HanishKVC 3064a36e74 ChatON+:Update tmpl_role_kv to retrieve wrt multiple keys
Use the same for user role's begin and prefix entries.
2024-05-06 11:27:56 +05:30
HanishKVC f1f39c5256 ChatON:Add Monarch model template, which uses Begin + Prefix
Inturn Begin/BoS is added only for non 1st user messages in a
system+user prompts chain.
2024-05-06 11:27:56 +05:30
HanishKVC 724ff38345 ChatOn: Wrap getting begin in try-catch,
so that even if a role doesnt contain begin, the logic will work
fine.
2024-05-06 11:27:56 +05:30
HanishKVC d70fca7a45 ChatOn: Add begin to the mix along with prefix
Dump shows user->begin.

chat-template-apply[-single] updated to work with begin and prefix

TODO: need to wrap begin in a try-catch, so that irrespective of
role, begin+prefix will work, irrespoective of whether that role
has a begin entry or not.
2024-05-06 11:27:56 +05:30
HanishKVC bdd279c0c9 ChatOn:User Begin+Prefix note update, keep things simple consistent 2024-05-06 11:27:56 +05:30
HanishKVC 84367b9fd1 ChatON: Add template for DeepSeek
Was looking at the tokenized vector, and noticed that the EOS
mentioned by existing chat_apply_template of llama.cpp, is different
from what I noticed in tokenizer_config.json of deepseek llm, so
I have added two entries

* "deepseek-alt" which matches llama.cpp's chat_apply_template and
* "deepseek" which matches that in tokenizer_config.json.

This impacts the assistant suffix and reverse prompt entries.

CasOfThis: Need to look into other entries which I added previously
at a later time. However as the default logic should be picking the
EOS from model file, so I assume reverse-prompt being outofsync,
may not matter beyond a limit, potentially.
2024-05-06 11:27:56 +05:30
HanishKVC 57bd772bfd ChatON: Cleanup logging
Avoid showing on screen the debug messages.

meta-dump can either show on screen or not, based on how LOGXLN
is defined.
2024-05-06 11:27:56 +05:30
HanishKVC 217544e5ff ChatON: Keep compiler happy
Order the functions so that no need for seperate prototypes

Also use kv_bool wrt boolean entries.

Convert string to c char *
2024-05-06 11:27:56 +05:30
HanishKVC 3f9dfc240c ChatON: Check for the boolean entries in meta-json 2024-05-06 11:27:56 +05:30
HanishKVC 42f6b45547 ChatON: Use the constants defined for the keys 2024-05-06 11:27:56 +05:30
HanishKVC efb758ba7d ChatON: Rename helpers to kv suffix, updated wrt metaok
rename because they return value of specified key.

[main] update metaok to take template-id, so that one can cross
check that all needed entries are there wrt that template-id in
the chaton-meta-json file
2024-05-06 11:27:56 +05:30
HanishKVC e8c24c0767 ChatOn:MetaOk: Allows template-id based cross check
For a given template-id, cross check, all needed entries are there
in the json.
2024-05-06 11:27:56 +05:30
HanishKVC b1055641e9 ChatON: Update the notes a bit 2024-05-06 11:27:56 +05:30
HanishKVC 11b47fbcfc ChatON:MetaJson: Add key constants, check metaJson loaded ifNeeded 2024-05-06 11:27:56 +05:30
HanishKVC 221ccd6462 ChatOn: Add SystemUser-1st-User-Has-Prefix flag support
Llama2 seems to need it, so chaton-meta-json sample file updated
to use same.
2024-05-06 11:27:56 +05:30
HanishKVC f03dd2439f ChatOn:No global-begin/end in ChatApplyTmplSingle, ChatApplyTmpl
Avoid adding global begin/end markers wrt ChatApplyTmplSingle.

Add ChatApplyTmpl which goes through a vector of messages.
2024-05-06 11:27:56 +05:30
HanishKVC c4cf0e9075 ChatON:Cleanup: BeginEnd, Debug log
Update the note

Rename global-prefix|suffix to global-begin|end.

Rename chat-apply-template to chat-apply-template-single, cas it
handles only a single message.

Add some debug log messages to the helper functions
2024-05-06 11:27:56 +05:30
HanishKVC 050d329e7e ChatOn+Main: Initial go at chaton in main interactive flow 2024-05-06 11:27:55 +05:30
HanishKVC dc56be951d ChatOn:Main: Load and dump any specified chaton meta file 2024-05-06 11:27:55 +05:30
HanishKVC 35f25196a0 ChatOn:Common: Add the needed cmdline arg params and its parsing 2024-05-06 11:27:55 +05:30
HanishKVC 2146a253e8 ChatOn: Capture the idea 2024-05-06 11:27:55 +05:30
viric fcd84a0f5a
Fix Linux /sys cpu path to guess number of cores (#7064) 2024-05-04 15:26:53 +02:00
Andrew Downing b0d943de17
Update LOG_IMPL and LOG_TEE_IMPL (#7029)
ROCm clang defines _MSC_VER which results in the wrong implementation of LOG_IMPL and LOG_TEE_IMPL being compiled.

This fixes https://github.com/ggerganov/llama.cpp/issues/6972
2024-05-01 23:31:30 +02:00
Johannes Gäßler a8f9b07631
perplexity: more statistics, added documentation (#6936)
* perplexity: more statistics, added documentation

* add LLaMA 3 8b scoreboard
2024-04-30 23:36:27 +02:00
Georgi Gerganov 9c67c2773d
ggml : add Flash Attention (#5021)
* ggml : add ggml_flash_attn_ext API

* ggml : fix GQA support in ggml_flash_attn_ext

* ggml : online attention (CPU)

* metal : initial implementation

* metal : f16 precision

* metal : reduce branches

* metal : specialize for head size

* wip : 8 rows per simd group

* wip : 4 rows per simd group

* wip : template for rows per warp

* metal : parallelize across KV size

* metal : parallel reduce across heads

* metal : efficient flash_attn_f16 implementation

* metal : avoid redundant loads of the attention

* metal : scale and mask in matrix form

* metal : fix comment

* llama : avoid ggml_cast, use F32 query

* metal : add parallel reduce version (disabled)

* metal : move output into local memory + optimize

- the result from each simdgroup now stays in the registers
- significantly reduced SRAM usage
- more efficient skipping of -INF blocks
- avoid simdgroup barrier in hot loop
- add comments

* metal : add tests, fix scaling, support C > 32

* metal : improve precision

* ggml : fix f16 mad

* metal : minor

* metal : support Q > 8

* tests : add ATTN tests

* metal : disable buffer allocation logs

* tests : more

* metal : faster inner loop for C == 32

* metal : fix array initialization

* tests : ifdef

* ggml : switch to padded F16 mask for ggml_soft_max, ggml_flash_attn_ext

* ggml : fix ggml_soft_max mask requirement

* cuda : fix soft_max to use correct mask size

* cuda : add flash_attn kernel (wip)

* metal : optimize softmax for C > 32

* metal : optimize softmax

* tests : minor fix

* cuda : avoid zeroing fragments

* tests : update dims

* cuda : fix __hisinf() result check

* cuda : avoid warp_reduce for smax

* cuda : use int instead of int64_t

Noticeably improves performance (thanks to Johannes)

* cuda : make loops use the same loop values

Thanks Johannes again for the tip

* cuda : unroll some of the loops

* cuda : avoid __hisinf branches

* cuda : use half2 in softmax

* cuda : switch to 1 warp for bs > 16

* cuda : speed-up reduce part of the kernel

* cuda : unroll Q*K^T loop

* cuda : fix -INF block check

* cuda : simplify softmax

* cuda : fix matrix names

* cuda : minor

* llama : adapt to F16 KQ_pos

* llama : adapt new models to F16 KQ_mask

* ggml : fix F16 store (ARM NEON)

* llama : fix type of KQ_mask and KQ_pos

* ggml : fix CPU soft_max

* tests : add hs=256

* cuda : fix build

* metal : improve perf via smaller int registers

* cuda : adapt soft_max to F16 mask and pos

* CUDA: faster FlashAttention, kernel for bs == 1

* 16 cols for Phi-2

* no vec for hs, no hs==256 ncols==32 for Volta

* adjust kernel selection logic

* 4 warps, 256 stride for all D

* no ncols == 64

* Multiple parallel blocks for batch size 1

* fix compile warnings

* fix excessive KQ_b loads

* fix cmake build

* fix KV cache padding, NaN from INFINITY (#6438)

* llama : flash_attn cparam + fix defrag

* server: support flash_attn param

* server: bench: enable flash_attn param

* CUDA: refactor host code, dyn. par. blocks

* fix flash_attn_vec_f16 race condition

* flush softmax exp below threshold to 0

* store temp KQ in registers

* Calculate KQ as FP32 if KQV has GGML_PREC_F32

* Add __hgt2_mask implementation for CUDA 11

* fix KQ FP32 precision fpr parallel_blocks > 1

* llama-bench : add -fa,--flash-attn arg

* metal : add BS=1 kernel for flash attention (#6508)

* metal : add BS=1 kernel for flash attention (wip)

* metal : support more than 1 warps

* metal : opts

* metal : opt

* metal : switch to parallel reduce

* metal : reduce registers

* metal : simplify

* metal : initial FA vec kernel

* metal : use F32 attention accumulators

* batched-bench : add fattn arg

* llama : simplify llama_build_kv_store

ggml-ci

* llama : adapt build_olmo to changes

* ggml : fix arm fp16 store on windows

* metal : clean-up

* metal : clean-up kernel code

* metal : minor

* tests : remove benchmarks

ggml-ci

* ggml : fix avx512 const correctness

ggml-ci

* ggml : fix soft_max with bias on CPU

ggml-ci

* common : print --flash-attn in help

* ggml : fix num dimensions in ggml_flash_attn_ext

* llama : force disable flash attention for incompatible models

* ggml : ggml_soft_max support F16/F32 mask/pos

ggml-ci

* cuda : uint -> uint32_t

* cuda : "constexpr dim3" -> "const dim3"

ggml-ci

* cuda : try to fix __hgt2_mask

ggml-ci

* ggml : add TODO's for F16/F32 mask/pos support in other backends

* llama : replace bool need_kq_pos with use_alibi

* llama : prep ALiBi support for BERT models

ggml-ci

* llama : fix n_batch requirements

ggml-ci

* cont

* server : add help for --flash-attn arg

* llama : disable FA for AMD

* tests : remove TMP_ATTN_BENCH

ggml-ci

* llama : support save/load state with FA enabled

ggml-ci

* ci : add CUDA save-load-state tests

ggml-ci

* llama : llama_kv_cache_clear zeroes data + fix save-load seq

ggml-ci

* llama : fix copy-paste errors, add TODO

* llama : disallow incompatible states

* llama : update llama_state_get_size after v_trans field

* metal : remove tmp log

* llama : add static reminder for llama_state_get_size

* metal : fix max nsg

ggml-ci

* ci : fix arg order

ggml-ci

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Co-authored-by: Pierrick HYMBERT <pierrick.hymbert@gmail.com>
2024-04-30 12:16:08 +03:00