

llama.cpp/examples/parallel

Simplified simulation of serving incoming requests in parallel
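A typical invocation might look like the sketch below. The flag names (`-np` for the number of parallel decoding sequences, `-ns` for the total number of simulated requests, `-cb` to enable continuous batching) are assumed from llama.cpp's common options, and the model path is a placeholder; verify both against `./parallel --help` for your build.

```shell
# Simulate 64 incoming requests served across 8 parallel sequences,
# with continuous batching enabled.
# Flag names and model path are assumptions -- check ./parallel --help.
./parallel -m models/llama-2-7b.Q4_0.gguf -np 8 -ns 64 -n 128 -cb
```

With `-cb`, new requests are injected into the shared batch as earlier sequences finish, rather than waiting for the whole batch to drain.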