#pragma once

/**
 *
 * Generic tagging logic + text config file based chat templates handling
 * by Humans for All
 *
 * ## Overview
 *
 * Helps chat with models, by tagging chat messages based on the specified
 * chat-handshake-template-standard. This uses generic tagging code driven
 * by a json meta data file, which specifies the handshake template details.
 *
 * This can be used by
 *
 * * main, to build on its existing interactive flow and its in-prefix, in-suffix
 *   and antiprompt/reverse-prompt
 *
 * * server, by replacing its existing llama_chat_apply_template with the
 *   equivalent helper here.
 *
 *
 * ## The common pattern
 *
 * As a convention, the tagging used by LLMs to differentiate between the
 * different parts when chatting with them normally follows a general pattern of
 *
 * * <BeginOfSentenceTag> <RolePrefixTag> the-actual-message <RoleSuffixTag> <EndOfSentenceTag>
 *
 * * The Roles could include System, User and Assistant (i.e. the Model)
 *
 * * A chat normally consists of
 *
 *   * a System message/prompt followed by
 *
 *   * multiple user message/query - model message/response pairs
 *
 * The different models will normally have all or some subset of the tagging mentioned above.
 *
 * You may also notice some common patterns like
 *
 * * Because a user message is normally followed by a model/assistant response, in most models
 *
 *   * user messages won't have an EndOfSentenceTag and
 *
 *   * the following model response won't have a BeginOfSentenceTag
 *
 * * Because a system message will normally be immediately followed by a user query,
 *
 *   * in many models, there won't be an EndOfSentenceTag following the system message, nor a
 *     BeginOfSentenceTag wrt the 1st user message following the system message.
 *
 *   * in some models there won't even be a RoleSuffixTag following the system message,
 *     nor a RolePrefixTag wrt the 1st user message following the system message.
 *
 *   * however in many of these models, the subsequent user messages will have the
 *     BeginOfSentenceTag and/or RolePrefixTag.
 *
 * * Some models may require a BoS for a group of messages, independent of BoS (if any)
 *   wrt individual roles.
 *
 *
 * ## The Strategy
 *
 * The template meta data json file allows the user to specify the above mentioned tags wrt
 * each of the Roles, as well as any global tag for a group of messages. Depending on whether
 * a given model uses/needs a given tag or not, you either specify the required tag or else
 * you specify an empty string.
 *
 * A tag could be a single word or multiple words, and may include a newline char specified
 * using \n and so on. The tag is always demarcated using double quotes and thus also allows
 * spaces at the beginning or end of the tag, if needed.
 *
 * In order to account for the conditionality of tags between the system message and the 1st
 * user message, flags are provided to explicitly control whether each of these possible tags
 * is used by a specific model or not, as part of its template info.
 *
 * The Roles are identified in the json file using "system", "user" and "assistant". However
 * the model may use different words to identify these roles, in which case set up RolePrefix
 * and/or RoleSuffix appropriately.
 *
 * To identify that the model is finished with generating its response to a user query, depending
 * on the model's handshake template standard, one will need to set the reverse-prompt to either
 * the assistant's suffix or end tag, or to the user's begin or prefix tag, depending on what
 * is generated by the model at the end of its response.
 *
 * Currently, flags for trimming wrt user text (be it wrt the system or user role) are not provided.
 *
 *
 * ## The JSON File
 *
 * It can contain the template info wrt multiple models/handshake-standards. In turn, each
 * unique template is identified by a unique template id string.
 *
 * The fields that make up a given chat-handshake-template-standard include
 *
 * * global -> begin & end
 *
 * * system -> begin, prefix, suffix & end
 *
 * * user -> begin, prefix, suffix & end
 *
 * * assistant -> begin, prefix, suffix & end
 *
 * * reverse-prompt
 *
 * * systemuser-system-has-suffix, systemuser-system-has-end,
 *   systemuser-1st-user-has-begin and systemuser-1st-user-has-prefix
 *
 *
 * ## Usage
 *
 * One needs to load the json file containing the template meta data and in turn call the
 * other helper functions as needed.
 *
 * In turn one can use the helper functions to either extract a given tag, or to apply all
 * tags specified wrt a given role to the passed message, or to apply tags as needed for
 * a bunch of messages in one go.
 *
 * The individual message tagging helper will apply all tags specified wrt that role.
 *
 * The multiple messages tagging helper chaton-tmpl-apply will look at the boolean flags
 * when tagging the passed messages. Here the system suffix, system end, user begin and
 * user prefix get included only if the corresponding flag is set.
 *
 * Both the single and multi messages tagging helpers provide two versions:
 *
 * * one which returns a single string which contains the tagged message(s)
 *
 * * one which returns
 *   * [tagged msg] the string containing the tagged message(s)
 *   * [parts lengths] an array of integers, which specifies the part lengths
 *     that divide the returned string into parts.
 *   * [parts types] a string where each character indicates whether the corresponding
 *     part is a normal part which needs to be tokenized without parse_special,
 *     or is a special part which needs to be tokenized with parse_special.
 *
 *
 * ## example/main
 *
 * The interactive commandline program under example/main uses
 *
 * * the system role related tags to tag the system prompt
 *   * the system prompt includes contents of -p if any,
 *     followed by contents of the file specified using -f if any
 * * the user begin+prefix to map to in-prefix
 * * the user suffix+end to map to in-suffix
 * * the reverse-prompt to map to antiprompt
 * * wrt tokenization
 *   * the user specified system prompt is tokenized with the parse_special flag.
 *   * however the user messages are tokenized without the parse_special flag.
 *
 * Currently main doesn't use chaton-tmpl-apply, but only
 *
 * * chaton-tmpl-apply-single (for the system prompt) and
 *
 * * chaton-tmpl-role-kv, which maps the user prefix, suffix and reverse-prompt
 *   to in-prefix, in-suffix and antiprompt of main.
 *
 * These always add any role specific begin+prefix and suffix+end around
 * the passed message.
 *
 *
 * ## other uses, be it wrt llama.cpp-as-library or examples/server or ...
 *
 * This module exposes a c-api which is equivalent to the current hardcoded
 * templating logic's llama_chat_apply_template. So any program using llama.cpp's
 * chat templating logic can easily be migrated to make use of this generic code
 * with its text config file driven flow.
 *
 * If a program doesn't want to bring in a json dependency into its project,
 * there is also common/simpcfg.hpp, which provides a simple text based config
 * file format, along with the corresponding parser for the same. This module can be
 * modified to work with simpcfg easily, if needed.
 *
 *
 * ## Adding support for a new model / chat-handshake-template-standard
 *
 * 1. Add suitable entries in the json for that model/standard.
 *    This in itself should work for most of the models.
 *
 * 2. If some new model introduces a totally different kind of chat-templating
 *    tag inter/intra mixing, try to reuse and update the generic flow in
 *    chaton-tmpl-apply, as much as possible, before trying to add any custom logic.
 *
 *    If you update the generic flow, cross check whether existing json files will
 *    need to be updated or not.
 *
 *
 * ## Notes
 *
 * Look at the sample chaton_meta.json in the examples folder for how the above may apply to
 * the different llm's out there, like
 *
 * * llama2, llama3, gemma, zephyr, deepseek(normal and coder), monarch, mistral, phi3
 *
 * * chatml, command-r, orion, openchat, vicuna
 *
 */

#include <string>
#include <vector>
#include <sstream>
#include <fstream>
#include <cstring>
#include <algorithm>
#include <stdexcept>
#include <typeinfo>

#include "log.h"
#include "llama.h"
#include "groupkv.hpp"  // GroupKV / GroupKVMapMapVariant, used as the base of ChatTemplates

#define LOGXLN LOG_TEELN

const auto K_SYSTEM = "system";
const auto K_USER = "user";
const auto K_ASSISTANT = "assistant";
const auto K_PREFIX = "prefix";
const auto K_SUFFIX = "suffix";
const auto K_BEGIN = "begin";
const auto K_END = "end";
const auto K_GLOBAL = "global";
const auto K_SYSTEMUSER_SYSTEM_HAS_SUFFIX = "systemuser-system-has-suffix";
const auto K_SYSTEMUSER_SYSTEM_HAS_END = "systemuser-system-has-end";
const auto K_SYSTEMUSER_1ST_USER_HAS_BEGIN = "systemuser-1st-user-has-begin";
const auto K_SYSTEMUSER_1ST_USER_HAS_PREFIX = "systemuser-1st-user-has-prefix";
const auto K_REVERSE_PROMPT = "reverse-prompt";


#define CHATON_JSON

#ifdef CHATON_JSON
#include "json.hpp"
using json = nlohmann::ordered_json;
#endif


/**
 * Helps keep the user prompt and chat-hs-template tag parts separate, but in sequence.
 * In turn this gives the flexibility to tokenize with or without the parse_special flag,
 * wrt the different parts of the chat msg(s).
 * One could use the triplet of str, get_partstypes and get_partslens to achieve the
 * above mentioned flexibility.
 */
class ChatParts {

    std::vector<std::string> parts = {};
    std::string types = {""};

public:
    // Identify string with special tokens that need to be processed.
    static constexpr char S = 's';
    // Identify string which shouldn't have special token processing done.
    static constexpr char N = 'n';
    // Identify no string condition and or ignore string.
    static constexpr char X = '?';

    ChatParts() : parts{}, types{""} {}

    char last_type() {
        if (types.length() == 0) {
            return ChatParts::X;
        }
        return types[types.length()-1];
    }

    void add_part(char type, const std::string &part) {
        if (last_type() == type) {
            parts[parts.size()-1] += part;
        } else {
            parts.emplace_back(part);
            types += type;
        }
    }

    std::string str() {
        std::string allin = "";
        for(auto part: parts) {
            allin += part;
        }
        return allin;
    }

    std::string get_partstypes() {
        return types;
    }

    std::vector<int32_t> get_partslens() {
        std::vector<int32_t> lens = {};
        for(auto part: parts) {
            lens.push_back(part.length());
        }
        return lens;
    }

    std::string name() {
        return typeid(*this).name();
    }

    void dump() {
        std::string me = name() + ":" + __func__;
        LOGXLN("INFO:%s:NumTypes:%zu", me.c_str(), types.length());
        LOGXLN("INFO:%s:NumParts:%zu", me.c_str(), parts.size());
        LOGXLN("INFO:%s:StrLength:%zu", me.c_str(), str().length());
        if (parts.size() != types.length()) {
            LOG_TEELN("DBUG:%s:Mismatch between parts and types", me.c_str());
        }
        int i = 0;
        for(auto part: parts) {
            LOGXLN("INFO:%s:%c:%s", me.c_str(), types[i], part.c_str());
            i += 1;
        }
    }

};

class ChatTemplates : public GroupKV {

public:

    ChatTemplates(GroupKVMapMapVariant defaultMap) : GroupKV(defaultMap) {}

    /**
     * Check if the specified chat-template exists or not.
     * NOTE: This doesn't cross check whether the template in turn contains all the required fields or not.
     */
    bool tmpl_exists(const std::string &tmpl) {
        if (!group_exists(tmpl)) {
            LOG_TEELN("WARN:CT:%s: tmpl[%s] not found...", __func__, tmpl.c_str());
            return false;
        }
        return true;
    }

    /**
     * Check if all expected keys/fields are present wrt the specified chat-template.
     * If any key/field is missing, expect an exception.
     */
    bool tmpl_basiccheck(const std::string &tmpl) {

        std::string globalBegin = get_value<std::string>(tmpl, { K_GLOBAL, K_BEGIN });
        std::string globalEnd = get_value<std::string>(tmpl, { K_GLOBAL, K_END });

        std::string systemBegin = get_value<std::string>(tmpl, { K_SYSTEM, K_BEGIN });
        std::string systemPrefix = get_value<std::string>(tmpl, { K_SYSTEM, K_PREFIX });
        std::string systemSuffix = get_value<std::string>(tmpl, { K_SYSTEM, K_SUFFIX });
        std::string systemEnd = get_value<std::string>(tmpl, { K_SYSTEM, K_END });

        std::string userBegin = get_value<std::string>(tmpl, { K_USER, K_BEGIN });
        std::string userPrefix = get_value<std::string>(tmpl, { K_USER, K_PREFIX });
        std::string userSuffix = get_value<std::string>(tmpl, { K_USER, K_SUFFIX });
        std::string userEnd = get_value<std::string>(tmpl, { K_USER, K_END });

        std::string assistantBegin = get_value<std::string>(tmpl, { K_ASSISTANT, K_BEGIN });
        std::string assistantPrefix = get_value<std::string>(tmpl, { K_ASSISTANT, K_PREFIX });
        std::string assistantSuffix = get_value<std::string>(tmpl, { K_ASSISTANT, K_SUFFIX });
        std::string assistantEnd = get_value<std::string>(tmpl, { K_ASSISTANT, K_END });

        std::string reversePrompt = get_value<std::string>(tmpl, { K_REVERSE_PROMPT });

        bool systemHasSuffix = get_value<bool>(tmpl, { K_SYSTEMUSER_SYSTEM_HAS_SUFFIX });
        bool systemHasEnd = get_value<bool>(tmpl, { K_SYSTEMUSER_SYSTEM_HAS_END });
        bool userHasBegin = get_value<bool>(tmpl, { K_SYSTEMUSER_1ST_USER_HAS_BEGIN });
        bool userHasPrefix = get_value<bool>(tmpl, { K_SYSTEMUSER_1ST_USER_HAS_PREFIX });

        LOGXLN("INFO:%s:%s:%s", __func__, "global-begin", globalBegin.c_str());
        LOGXLN("INFO:%s:%s:%s", __func__, "global-end", globalEnd.c_str());
        LOGXLN("INFO:%s:%s:%s", __func__, "system-begin", systemBegin.c_str());
        LOGXLN("INFO:%s:%s:%s", __func__, "system-prefix", systemPrefix.c_str());
        LOGXLN("INFO:%s:%s:%s", __func__, "system-suffix", systemSuffix.c_str());
        LOGXLN("INFO:%s:%s:%s", __func__, "system-end", systemEnd.c_str());
        LOGXLN("INFO:%s:%s:%s", __func__, "user-begin", userBegin.c_str());
        LOGXLN("INFO:%s:%s:%s", __func__, "user-prefix", userPrefix.c_str());
        LOGXLN("INFO:%s:%s:%s", __func__, "user-suffix",
                userSuffix.c_str());
        LOGXLN("INFO:%s:%s:%s", __func__, "user-end", userEnd.c_str());
        LOGXLN("INFO:%s:%s:%s", __func__, "assistant-begin", assistantBegin.c_str());
        LOGXLN("INFO:%s:%s:%s", __func__, "assistant-prefix", assistantPrefix.c_str());
        LOGXLN("INFO:%s:%s:%s", __func__, "assistant-suffix", assistantSuffix.c_str());
        LOGXLN("INFO:%s:%s:%s", __func__, "assistant-end", assistantEnd.c_str());
        LOGXLN("INFO:%s:%s:%s", __func__, K_REVERSE_PROMPT, reversePrompt.c_str());
        LOGXLN("INFO:%s:%s:%d", __func__, K_SYSTEMUSER_SYSTEM_HAS_SUFFIX, systemHasSuffix);
        LOGXLN("INFO:%s:%s:%d", __func__, K_SYSTEMUSER_SYSTEM_HAS_END, systemHasEnd);
        LOGXLN("INFO:%s:%s:%d", __func__, K_SYSTEMUSER_1ST_USER_HAS_BEGIN, userHasBegin);
        LOGXLN("INFO:%s:%s:%d", __func__, K_SYSTEMUSER_1ST_USER_HAS_PREFIX, userHasPrefix);

        if (!userEnd.empty()) {
            LOG_TEELN("WARN:%s:User-End seems to be set to [%s], do cross check if this is proper and needed", __func__, userEnd.c_str());
        }
        if (!assistantBegin.empty()) {
            LOG_TEELN("WARN:%s:Assistant-Begin seems to be set to [%s], do cross check if this is proper and needed", __func__, assistantBegin.c_str());
        }
        return true;
    }

    /**
     * For the specified chat-template, get the value associated with the specified key/field.
     */
    template<typename SupportedDataType>
    SupportedDataType tmpl_getkey(const std::string &tmpl, const std::string &key, const SupportedDataType &defaultValue) {
        return get_value(tmpl, {key}, defaultValue, "CTTmplGetKey");
    }

    /**
     * For the specified chat-template and the role within, cumulate the values of the specified keys/fields
     * and return the same.
     */
    std::string tmpl_role_getkeys(const std::string &tmpl, const std::string &role, const std::vector<std::string> &keys) {
        std::string got = "";
        std::string sKeys = "";
        for(auto key: keys) {
            got += get_value<std::string>(tmpl, {role, key}, "", "CTTmplRoleGetKeys");
            sKeys += "+";
            sKeys += key;
        }
        LDBUG_LN("DBUG:CT:%s:%s:%s:%s:%s", __func__, tmpl.c_str(), role.c_str(), sKeys.c_str(), got.c_str());
        return got;
    }

};

#include "chaton_meta.hpp"
//ChatTemplates gCT = {{}};

#ifdef CHATON_JSON

template<typename SupportedType>
inline SupportedType json_get(json &j, const std::vector<std::string> &keys, const std::string &msgTag) {
    json curJ = j;
    std::stringstream skey;
    int i = 0;
    for(auto key: keys) {
        if (i != 0) skey << "-";
        i += 1;
        skey << key;
        if (curJ.contains(key)) {
            curJ = curJ[key];
        } else {
            std::stringstream ss;
            ss << "ERRR:ChatON:" << __func__ << ":" << msgTag << ":KeyChain [" << skey.str() << "] is missing";
            throw std::runtime_error(ss.str());
        }
    }
    return curJ;
}

inline bool chaton_meta_load(const std::string &fname) {
    std::ifstream f(fname);
    json conMeta = json::parse(f);
    for(auto it=conMeta.begin(); it != conMeta.end(); ++it) {
        auto group = it.key();
        auto curTmpl = conMeta[group];

        std::string globalBegin = json_get<std::string>(curTmpl, { K_GLOBAL, K_BEGIN }, group);
        gCT.set_value(group, { K_GLOBAL, K_BEGIN }, globalBegin);
        std::string globalEnd = json_get<std::string>(curTmpl, { K_GLOBAL, K_END }, group);
        gCT.set_value(group, { K_GLOBAL, K_END }, globalEnd);

        std::string systemBegin = json_get<std::string>(curTmpl, { K_SYSTEM, K_BEGIN }, group);
        gCT.set_value(group, { K_SYSTEM, K_BEGIN }, systemBegin);
        std::string systemPrefix = json_get<std::string>(curTmpl, { K_SYSTEM, K_PREFIX }, group);
        gCT.set_value(group, { K_SYSTEM, K_PREFIX }, systemPrefix);
        std::string systemSuffix = json_get<std::string>(curTmpl, { K_SYSTEM, K_SUFFIX }, group);
        gCT.set_value(group, { K_SYSTEM, K_SUFFIX }, systemSuffix);
        std::string systemEnd = json_get<std::string>(curTmpl, { K_SYSTEM, K_END }, group);
        gCT.set_value(group, { K_SYSTEM, K_END }, systemEnd);

        std::string userBegin = json_get<std::string>(curTmpl, { K_USER, K_BEGIN }, group);
        gCT.set_value(group, { K_USER, K_BEGIN }, userBegin);
        std::string userPrefix = json_get<std::string>(curTmpl, { K_USER, K_PREFIX }, group);
        gCT.set_value(group, { K_USER, K_PREFIX }, userPrefix);
        std::string userSuffix = json_get<std::string>(curTmpl, { K_USER, K_SUFFIX }, group);
        gCT.set_value(group, { K_USER, K_SUFFIX }, userSuffix);
        std::string userEnd = json_get<std::string>(curTmpl, { K_USER, K_END }, group);
        gCT.set_value(group, { K_USER, K_END }, userEnd);

        std::string assistantBegin = json_get<std::string>(curTmpl, { K_ASSISTANT, K_BEGIN }, group);
        gCT.set_value(group, { K_ASSISTANT, K_BEGIN }, assistantBegin);
        std::string assistantPrefix = json_get<std::string>(curTmpl, { K_ASSISTANT, K_PREFIX }, group);
        gCT.set_value(group, { K_ASSISTANT, K_PREFIX }, assistantPrefix);
        std::string assistantSuffix = json_get<std::string>(curTmpl, { K_ASSISTANT, K_SUFFIX }, group);
        gCT.set_value(group, { K_ASSISTANT, K_SUFFIX }, assistantSuffix);
        std::string assistantEnd = json_get<std::string>(curTmpl, { K_ASSISTANT, K_END }, group);
        gCT.set_value(group, { K_ASSISTANT, K_END }, assistantEnd);

        std::string reversePrompt = json_get<std::string>(curTmpl, { K_REVERSE_PROMPT }, group);
        gCT.set_value(group, { K_REVERSE_PROMPT }, reversePrompt);

        bool systemHasSuffix = json_get<bool>(curTmpl, { K_SYSTEMUSER_SYSTEM_HAS_SUFFIX }, group);
        gCT.set_value(group, { K_SYSTEMUSER_SYSTEM_HAS_SUFFIX }, systemHasSuffix);
        bool systemHasEnd = json_get<bool>(curTmpl, { K_SYSTEMUSER_SYSTEM_HAS_END }, group);
        gCT.set_value(group, { K_SYSTEMUSER_SYSTEM_HAS_END }, systemHasEnd);
        bool userHasBegin = json_get<bool>(curTmpl, { K_SYSTEMUSER_1ST_USER_HAS_BEGIN }, group);
        gCT.set_value(group, { K_SYSTEMUSER_1ST_USER_HAS_BEGIN }, userHasBegin);
        bool userHasPrefix = json_get<bool>(curTmpl, { K_SYSTEMUSER_1ST_USER_HAS_PREFIX }, group);
        gCT.set_value(group, { K_SYSTEMUSER_1ST_USER_HAS_PREFIX }, userHasPrefix);
    }
    LOGXLN("%s", gCT.dump("", "DBUG:ChatONMetaLoad:ChatTemplates:").c_str());
    return true;
}

#endif

inline bool chaton_tmpl_exists(const std::string &tmpl) {
    return gCT.tmpl_exists(tmpl);
}

inline std::string
chaton_tmpl_role_getkeys(const std::string &tmpl, const std::string &role, const std::vector<std::string> &keys) {
    return gCT.tmpl_role_getkeys(tmpl, role, keys);
}

inline std::string chaton_tmpl_getkey_str(const std::string &tmpl, const std::string &key) {
    return gCT.tmpl_getkey<std::string>(tmpl, key, "");
}

inline bool chaton_tmpl_getkey_bool(const std::string &tmpl, const std::string &key) {
    return gCT.tmpl_getkey<bool>(tmpl, key, false);
}

// Given the template standard, role and a message, this returns
// a tagged message, types string and lens vector wrt the parts that make up the returned string
//
// * a string containing the tagged message
//   * role-(begin+prefix) + msg + role-(suffix+end)
// * a string where the chars contain info about the
//   type of sub-strings/parts that make up the tagged message.
// * a vector of ints, which give the length of each part in the tagged message.
inline bool chaton_tmpl_apply_single_ex(
        const std::string &tmpl,
        const std::string &role,
        const std::string &content,
        std::string &tagged,
        std::string &types,
        std::vector<int32_t> &lens
        ) {
    if (!chaton_tmpl_exists(tmpl)) {
        return false;
    }
    ChatParts cp = {};
    std::string beginPrefix = chaton_tmpl_role_getkeys(tmpl, role, {K_BEGIN, K_PREFIX});
    std::string suffixEnd = chaton_tmpl_role_getkeys(tmpl, role, {K_SUFFIX, K_END});
    cp.add_part(ChatParts::S, beginPrefix);
    cp.add_part(ChatParts::N, content);
    cp.add_part(ChatParts::S, suffixEnd);
    cp.dump();
    tagged = cp.str();
    LOGLN("DBUG:%s:%s:%s:%s", __func__, tmpl.c_str(), role.c_str(), tagged.c_str());
    types = cp.get_partstypes();
    lens = cp.get_partslens();
    return true;
}

// Given the template standard, role and a message, this returns the tagged message.
//
// * a string containing the tagged message
//   * role-(begin+prefix) + msg + role-(suffix+end)
inline int32_t chaton_tmpl_apply_single(
        const std::string &tmpl,
        const std::string &role,
        const std::string &content,
        std::string &tagged
        ) {
    std::string types;
    std::vector<int32_t> lens;
    if (!chaton_tmpl_apply_single_ex(tmpl, role, content, tagged, types, lens)) {
        return -1;
    }
    return tagged.size();
}

/**
 * Apply chat-handshake-template for the specified template standard and role.
 * If the passed char array is smaller than that required for the tagged message,
 * * the part of the tagged message which fits within the dest buffer is copied
 * * the returned value indicates the size of the actual tagged message
 * NOTE:
 * * ideally the passed char array should be able to fit the tagged message + 0|null char.
 * * if the return value from this function is larger than or equal to destLength,
 *   then you will have to increase the size of the dest buffer, and call this
 *   function a second time, to ensure that one gets the full tagged message.
 */
inline int32_t chat_tmpl_apply_single_capi(
        const char *tmpl,
        const char *role,
        const char *content,
        char *dest,
        const size_t destLength
        ) {
    std::string tagged;
    auto taggedLength = chaton_tmpl_apply_single(tmpl, role, content, tagged);
    if (taggedLength <= 0) {
        return taggedLength;
    }
    if (dest && (destLength > 0)) {
        strlcpy(dest, tagged.c_str(), destLength);
    }
    return taggedLength;
}

// Given the template standard and a bunch of messages including their roles, this returns
// tagged messages, a types string and a lens vector. The returned types string and lens vector
// help identify the parts of the tagged msgs string, which relate to the passed msgs and added tags.
//
// * a string containing the tagged messages
//   * global-begin + 1 or more [[role-begin] + [role-prefix] + msg + [role-suffix] + [role-end]] + global-end
// * a string where the chars contain info about the
//   type of sub-strings/parts that make up the tagged messages string.
// * a vector of ints, which give the length of each part in the tagged messages string.
//
// If a combination of system-user messages is passed, then the tags between the system
// and the 1st user message are based on the flags set wrt the corresponding template standard.
inline bool chaton_tmpl_apply_ex(
        const std::string &tmpl,
        const std::vector<const llama_chat_message *> &msgs,
        bool alertAssistantAtEnd,
        std::string &tagged,
        std::string &types,
        std::vector<int32_t> &lens
        ) {
    if (!chaton_tmpl_exists(tmpl)) {
        return false;
    }
    ChatParts cp = {};
    std::string globalBegin = chaton_tmpl_role_getkeys(tmpl, K_GLOBAL, {K_BEGIN});
    cp.add_part(ChatParts::S, globalBegin);
    int cntSystem = 0;
    int cntUser = 0;
    int cntOthers = 0;
    for(const auto msg: msgs) {
        std::string role = msg->role; // msg->role is a c-string; use std::string for the comparisons below
        auto content = msg->content;
        std::string begin = chaton_tmpl_role_getkeys(tmpl, role, {K_BEGIN});
        auto prefix = chaton_tmpl_role_getkeys(tmpl, role, {K_PREFIX});
        auto suffix = chaton_tmpl_role_getkeys(tmpl, role, {K_SUFFIX});
        auto end = chaton_tmpl_role_getkeys(tmpl, role, {K_END});
        if (role == K_SYSTEM) {
            cntSystem += 1;
            cp.add_part(ChatParts::S, begin);
            cp.add_part(ChatParts::S, prefix);
        } else if (role == K_USER) {
            cntUser += 1;
            if ((cntSystem == 1) && (cntUser == 1)) {
                if (chaton_tmpl_getkey_bool(tmpl, K_SYSTEMUSER_1ST_USER_HAS_BEGIN)) {
                    cp.add_part(ChatParts::S, begin);
                }
                if (chaton_tmpl_getkey_bool(tmpl, K_SYSTEMUSER_1ST_USER_HAS_PREFIX)) {
                    cp.add_part(ChatParts::S, prefix);
                }
            } else {
                cp.add_part(ChatParts::S, begin);
                cp.add_part(ChatParts::S, prefix);
            }
        } else {
            cntOthers += 1;
            cp.add_part(ChatParts::S, begin);
            cp.add_part(ChatParts::S, prefix);
        }
        cp.add_part(ChatParts::N, content);
        if (role == K_SYSTEM) {
            if (chaton_tmpl_getkey_bool(tmpl, K_SYSTEMUSER_SYSTEM_HAS_SUFFIX)) {
                cp.add_part(ChatParts::S, suffix);
            }
            if (chaton_tmpl_getkey_bool(tmpl, K_SYSTEMUSER_SYSTEM_HAS_END)) {
                cp.add_part(ChatParts::S, end);
            }
        } else {
            cp.add_part(ChatParts::S, suffix);
            cp.add_part(ChatParts::S, end);
        }
    }
    if (alertAssistantAtEnd) {
        auto
             assistantBeginPrefix = chaton_tmpl_role_getkeys(tmpl, K_ASSISTANT, {K_BEGIN, K_PREFIX});
        cp.add_part(ChatParts::S, assistantBeginPrefix);
    }
    auto globalEnd = chaton_tmpl_role_getkeys(tmpl, K_GLOBAL, {K_END});
    cp.add_part(ChatParts::S, globalEnd);
    cp.dump();
    tagged = cp.str();
    LOGLN("DBUG:%s:%s:%s", __func__, tmpl.c_str(), tagged.c_str());
    LOGLN("DBUG:%s:%s:CntSys[%d]:CntUsr[%d]:CntOthers[%d]", __func__, tmpl.c_str(), cntSystem, cntUser, cntOthers);
    types = cp.get_partstypes();
    lens = cp.get_partslens();
    return true;
}

// Given the template standard and a bunch of messages including their roles, this returns
// the tagged messages as a string.
// global-begin + 1 or more [[role-begin] + [role-prefix] + msg + [role-suffix] + [role-end]] + global-end
inline int32_t chaton_tmpl_apply(
        const std::string &tmpl,
        const std::vector<const llama_chat_message *> &msgs,
        bool alertAssistantAtEnd,
        std::string &tagged
        ) {
    std::string types;
    std::vector<int32_t> lens;
    if (!chaton_tmpl_apply_ex(tmpl, msgs, alertAssistantAtEnd, tagged, types, lens)) {
        return -1;
    }
    return tagged.size();
}

// Given the template standard and a bunch of messages including their roles, this returns
// the tagged messages as a string.
// global-begin + 1 or more [[role-begin] + [role-prefix] + msg + [role-suffix] + [role-end]] + global-end
//
// If the passed char array is smaller than that required for the tagged messages string,
// * the part of the tagged messages string which fits within the dest buffer is copied
// * the returned value indicates the size of the actual tagged messages string
//
// NOTE:
// * ideally the passed char array should be able to fit the tagged messages string + 0|null char.
// * if the return value from this function is larger than or equal to destLength,
//   then you will have to increase the size of the dest buffer, and call this
//   function a second time, to ensure that one gets the full tagged messages string.
inline int32_t chaton_tmpl_apply_capi(
        const char *tmpl,
        const struct llama_chat_message *msgs,
        const size_t numMsgs,
        bool alertAssistantAtEnd,
        char *dest,
        int32_t destLength
        ) {
    if ((tmpl == nullptr) || (dest == nullptr)) {
        return -1;
    }
    std::vector<const llama_chat_message *> vMsgs;
    for(size_t i=0; i < numMsgs; i++) {
        vMsgs.push_back(&msgs[i]);
    }
    std::string taggedMsgs;
    int32_t taggedLength = chaton_tmpl_apply(tmpl, vMsgs, alertAssistantAtEnd, taggedMsgs);
    if (taggedLength <= 0) {
        return taggedLength;
    }
    if (destLength > 0) {
        strlcpy(dest, taggedMsgs.c_str(), destLength);
    }
    return taggedLength;
}

//
// In addition to the semantic provided by chaton_tmpl_apply_capi,
// this additionally also returns info about the parts that make up
// the returned tagged message.
//
// partsTypes and partsLengths should be arrays that can accommodate the
// same number of elements belonging to their respective type.
// In turn, pNumParts should point to an int which specifies the
// number of elements.
// If the generated tagged message has more parts than the specified
// *pNumParts, then the logic copies partsTypes and partsLengths to the
// specified length/NumOfParts only. In parallel, it updates *pNumParts
// to the actual needed length (not including any terminating null char or so).
//
inline int32_t chaton_tmpl_apply_ex_capi(
        const char *tmpl,
        const struct llama_chat_message *msgs,
        const size_t numMsgs,
        bool alertAssistantAtEnd,
        char *dest,
        int32_t destLength,
        char *partsTypes,
        int32_t *partsLengths,
        int32_t *pNumParts
        ) {
    if ((tmpl == nullptr) || (dest == nullptr) || (pNumParts == nullptr)) {
        return -1;
    }
    std::vector<const llama_chat_message *> vMsgs;
    for(size_t i=0; i < numMsgs; i++) {
        vMsgs.push_back(&msgs[i]);
    }
    std::string taggedMsgs;
    std::string types;
    std::vector<int32_t> lens;
    if (!chaton_tmpl_apply_ex(tmpl, vMsgs, alertAssistantAtEnd, taggedMsgs, types, lens)) {
        return -1;
    }
    int32_t taggedLength = taggedMsgs.size();
    if (taggedLength < 0) {
        return taggedLength;
    }
    if (destLength > 0) {
        strlcpy(dest, taggedMsgs.c_str(), destLength);
    }
    if (*pNumParts > 0) {
        if (partsTypes != nullptr) {
            strlcpy(partsTypes, types.c_str(), *pNumParts);
        }
        if (partsLengths != nullptr) {
            // copy only as many lengths as both the dest array and lens actually have
            size_t cnt = std::min((size_t)*pNumParts, lens.size());
            memcpy(partsLengths, lens.data(), cnt*sizeof(int32_t));
        }
    }
    *pNumParts = types.length();
    return taggedLength;
}

// Copied from common.cpp
inline std::vector<llama_token> chaton_llama_tokenize(
        const struct llama_model * model,
        const std::string & text,
        bool add_special,
        bool parse_special) {
    LOGLN("DBUG:%s:%s:special[add:%d, parse:%d]", __func__, text.c_str(), add_special, parse_special);
    if (model == nullptr) {
        LOG_TEELN("ERRR:%s:Model NOT Provided:%s:special[add:%d, parse:%d]", __func__, text.c_str(), add_special, parse_special);
        return std::vector<llama_token>{};
    }
    // upper limit for the number of tokens
    int n_tokens = text.length() + 2 * add_special;
    std::vector<llama_token> result(n_tokens);
    n_tokens = llama_tokenize(model, text.data(), text.length(), result.data(), result.size(), add_special, parse_special);
    if (n_tokens < 0) {
        result.resize(-n_tokens);
        int check = llama_tokenize(model, text.data(), text.length(), result.data(), result.size(), add_special, parse_special);
        GGML_ASSERT(check == -n_tokens);
    } else {
        result.resize(n_tokens);
    }
    return result;
}

// Tokenize the passed taggedText, keeping in mind the subparts within and
// in turn whether to parse special tokens in them or not (partsTypes).
// If you want to parse special tokens in the taggedText, independent of what
// partsTypes specifies, then set forceParseSpecial to true.
inline std::vector<llama_token> chaton_llama_tokenize_ex(
        const struct llama_model *model,
        const std::string &taggedText,
        const std::string &partsTypes,
        const std::vector<int32_t> &partsLengths,
        bool addSpecial,
        bool forceParseSpecial
        ) {
    std::vector<llama_token> tokens;
    int iPart = 0;
    int iStart = 0;
    for(auto partLen: partsLengths) {
        auto partType = partsTypes[iPart];
        iPart += 1;
        auto msgPart = taggedText.substr(iStart, partLen);
        iStart += partLen;
        bool parseSpecial = (partType == ChatParts::S);
        parseSpecial |= forceParseSpecial;
        auto curTokens = chaton_llama_tokenize(model, msgPart, addSpecial, parseSpecial);
        tokens.insert(tokens.end(), curTokens.begin(), curTokens.end());
    }
    return tokens;
}

/**
 * If tmpl is
 * * an empty string, then dump the full loaded chaton-meta
 * * a chaton-template-id, then dump contents related to that specific chat-handshake-template-standard
 * NOTE: It uses the exception raising get_value to check if the tags related keys are present
 * wrt the specified template-standard/model-id or not.
 */
inline bool _chaton_meta_dump(std::string &tmpl) {
    if (!tmpl.empty()) {
        if (!gCT.tmpl_exists(tmpl)) {
            LOGXLN("ERRR:%s:Specified template-id [%s] not found", __func__, tmpl.c_str());
            return false;
        }
    }
    LOGXLN("\n\nINFO:%s:%s:\n%s", __func__, tmpl.c_str(), gCT.dump(tmpl, "INFO:ChatOnMetaDump:").c_str());
    if (!tmpl.empty()) {
        gCT.tmpl_basiccheck(tmpl);
    }
    return true;
}

/**
 * Verify that the specified chaton-template-id contains the required fields, using meta-dump.
 */
inline bool chaton_meta_ok(std::string &tmpl) {
    return _chaton_meta_dump(tmpl);
}