llama.cpp/tools/server/public_simplechat
hanishkvc 9039a917d0 SimpleSallap:SimpleMCP: Cleanup, Readme updates
Update the documentation a bit wrt switch from simpleproxy to
simplemcp with mcp-ish kind of handshake between the chat client
and simplemcp.

Rename proxyUrl and related to mcpServerUrl and mcpServerAuth.
Now include the path in the url itself, given that in future,
we may want to allow the chat client logic to handshake with
other mcp servers, which may expose their services through a
different path or so.

Drop SearchEngine related config entries from chat session
settings, given that its now controlled directly in SimpleMCP.
2025-12-08 06:21:24 +05:30

readme.md

SimpleChat / AnveshikaSallap

by Humans for All.

A lightweight, simple minded ai chat client that runs in the browser, with a web front-end supporting multiple chat sessions, vision models, reasoning and tool calling (including bundled tool calls, some browser native and some based on the bundled simplemcp).

Quickstart

Server

From the root directory of the llama.cpp source code repo, i.e. the one containing the build / tools / ... sub directories

Start ai engine / server using

build/bin/llama-server -m <path/to/model.gguf> \
  --path tools/server/public_simplechat --jinja
  • --jinja enables tool calling support
  • --mmproj <path/to/mmproj.gguf> enables vision support
  • --port <port number> use if a custom port is needed
    • llama-server defaults to port 8080

If web related access / tool calls are needed, don't forget to run

cd tools/server/public_simplechat/local.tools; python3 ./simplemcp.py --config simplemcp.json
  • --debug True enables debug mode which captures internet handshake data
  • port defaults to 3128, can be changed from simplemcp.json, if needed
  • add sec.keyFile and sec.certFile to simplemcp.json, for https mode;
    • also don't forget to change mcpServerUrl to use the https scheme
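
The simplemcp.json layout is only hinted at above (port, sec.keyFile, sec.certFile, acl entries); the nesting below is a guess from those option names, so check the bundled simplemcp.json for the actual schema:

```json
{
  "port": 3128,
  "sec": {
    "keyFile": "/path/to/key.pem",
    "certFile": "/path/to/cert.pem"
  },
  "acl": {
    "schemes": ["https"],
    "domains": ["en.wikipedia.org"]
  }
}
```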

Client

  1. Open http://127.0.0.1:8080/index.html in a browser

    • assuming one is running the llama-server locally with its default port
  2. Select / Create a chat session

    • set a suitable system prompt, if needed
    • modify settings, if needed
      • modifying mcpServerUrl won't reload the supported tool calls list until the next app page refresh
    • Restore loads last autosaved session with same name
  3. Enter query/response into user input area at the bottom, press Enter

    • use Shift+Enter for a newline
    • include images if required (ai vision models)
  4. View any streamed ai response (if enabled and supported)

  5. If a tool call is requested

    • verify / edit the tool call details before triggering the same
      • one can even ask the ai to rethink the requested tool call, by sending an appropriate user response instead of a tool call response
    • tool call is executed using Browser's web worker or included SimpleMCP.py
    • tool call response is placed in user input area
      • the user input area is color coded to distinguish between user and tool responses
    • verify / edit the tool call response, before submitting the same back to the ai
      • tool response initially assigned TOOL-TEMP role, promoted to TOOL upon submit
    • based on the response received, one can rerun the tool call with modified arguments, if needed
    • at any time there can be at most one pending tool call per chat session
  6. Delete & Copy available via popover menu for each message

  7. Clear / + New chat with provided buttons, as needed

Overview

A browser based ai chat client supporting multiple chat sessions, vision models, reasoning and tool calling, as summarised below.

  • Supports multiple independent chat sessions with

    • Oneshot or Streamed (default) responses
    • Custom settings and system prompts per session
    • Automatic local autosave (restorable on next load)
    • can handshake with /completions or /chat/completions (default) endpoints
  • Supports peeking at model's reasoning live

    • if model streams the same and
    • streaming mode is enabled in settings (default)
  • Supports vision / image / multimodal ai models

    • attach image files as part of user chat messages
      • handshaked as image_urls in chat message content array along with text
      • supports multiple image uploads per message
      • images displayed inline in the chat history
    • specify the mmproj file via --mmproj or via -hf
    • specify --batch-size and --ubatch-size if needed
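
The image handshake described above can be sketched as follows; the field names follow the common OpenAI-style chat/completions shape, which may differ in detail from what this client actually emits:

```python
import base64


def image_message(text, image_bytes, mime="image/png"):
    """Build a user chat message whose content array carries text plus an
    inline (data: URL) image, in the OpenAI-style chat/completions shape.
    Hypothetical helper for illustration, not the actual client code."""
    data_url = "data:%s;base64,%s" % (mime, base64.b64encode(image_bytes).decode())
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }
```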
  • Built-in support for GenAI/LLM models that support tool calling

    • includes a bunch of useful builtin tool calls, without needing any additional setup

    • building on modern browsers' flexibility, following tool calls are directly supported by default

      • sys_date_time, simple_calculator, run_javascript_function_code, data_store_*, external_ai
      • except for external_ai, these run within a web worker context, to isolate the main context from them
      • data_store brings in browser IndexedDB based persistent key/value storage across sessions
    • in collaboration with included python based simplemcp.py, these additional tool calls are supported

      • search_web_text, fetch_url_raw, fetch_html_text, fetch_pdf_as_text, fetch_xml_filtered
        • these builtin tool calls (via SimpleMCP) help fetch PDFs, HTML, XML or perform web search
        • PDF tool also returns an outline with numbering, if available
        • result is truncated to iResultMaxDataLength (default 128 kB)
      • helps isolate the core of this functionality into a separate vm, running locally or elsewhere, if needed
      • supports whitelisting of acl.schemes and acl.domains through simplemcp.json
      • supports a bearer token shared between server and client for auth
        • needs https mode to be enabled, for better security wrt this flow
        • by default simplemcp.py runs in http mode, however if sec.keyFile and sec.certFile are specified, the logic switches to https mode
      • this handshake is loosely based on the MCP standard; it doesn't follow the standard fully
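
Since the handshake is only loosely MCP-based, the exact wire format is not normative; still, an MCP-style tools/call request with the shared bearer token could look roughly like this (hypothetical helper, not the actual client code):

```python
import json


def mcp_call_request(tool, arguments, auth_token=None):
    """Construct an MCP-style tools/call request body plus HTTP headers.
    The tool names (e.g. search_web_text) come from the readme; the
    JSON-RPC shape is illustrative only."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    headers = {"Content-Type": "application/json"}
    if auth_token:
        # bearer token shared between simplemcp server and chat client
        headers["Authorization"] = "Bearer " + auth_token
    return json.dumps(body), headers
```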
    • follows a safety first design and lets the user

      • verify and optionally edit the tool call requests, before executing the same
      • verify and optionally edit the tool call response, before submitting the same
      • user can update the settings for auto executing these actions, if needed
    • external_ai allows invoking a separate ai instance (fresh by default)

      • by default in such a instance
        • tool calling is kept disabled along with
        • client side sliding window of 1, i.e. only the system prompt and the latest user message are sent to the ai server.
      • TCExternalAI is the special chat session used internally for this; the default behaviour will get impacted if you modify the settings of this special chat session.
        • Restarting this chat client will force reset things to the default behaviour; however, any other changed settings wrt TCExternalAI will persist across restarts.
        • this instance maps to the current ai server itself by default, but can be changed by the user if needed.
      • could help with handling specific tasks using targeted personas or models
        • ai could run self modified targeted versions of itself/... using custom system prompts and user messages as needed
        • user can setup an ai instance with additional compute, to be used only when needed, to keep costs in control
        • can enable a modular pipeline with task-type and/or job-instance specific decoupling, if needed
      • tasks offloaded could include
        • summarising, data extraction, formatted output, translation, ...
        • creative writing, task breakdown, ...
  • Client side Sliding window Context control, using iRecentUserMsgCnt, helps limit context sent to ai model

  • Optional

    • simple minded markdown parsing of chat message text contents (default wrt assistant messages/responses)
      • user can override, if needed, globally or at an individual message level
    • auto trimming of trailing garbage from model outputs
  • Follows responsive design to try to adapt to any screen size

  • built using plain html + css + javascript and python

    • no additional dependencies that one needs to worry about and in turn keep track of
      • except for pypdf, if pdf support is needed; pdf tool call support is automatically dropped if pypdf is missing
    • fits within ~50KB compressed or ~300KB uncompressed source form (both including simplemcp.py)
    • easily extended with additional tool calls, using either javascript or python, as you see fit

Start exploring / experimenting with your favorite ai models and their capabilities.

Configuration / Settings

One can modify the session configuration using the Settings UI. All the settings and more are also exposed in the browser console via document['gMe'].

Settings Groups

Group              Purpose
chatProps          ApiEndpoint, streaming, sliding window, markdown, ...
tools              enabled, mcpServerUrl, mcpServerAuth, search URL/template & drop rules, max data length, timeouts
apiRequestOptions  temperature, max_tokens, frequency_penalty, presence_penalty, cache_prompt, ...
headers            Content-Type, Authorization, ...
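
Assuming (as the option names suggest) that apiRequestOptions entries are passed through as top-level fields of the /chat/completions request body, a minimal sketch:

```python
import json


def build_chat_request(messages, api_request_options, headers):
    """Sketch of how the apiRequestOptions settings group could map onto
    a /chat/completions request: each option becomes a top-level field
    alongside messages. Hypothetical helper, not the actual client code."""
    body = {"messages": messages}
    body.update(api_request_options)  # temperature, max_tokens, cache_prompt, ...
    return json.dumps(body), dict(headers)


body, hdrs = build_chat_request(
    [{"role": "user", "content": "hello"}],
    {"temperature": 0.7, "max_tokens": 1024, "cache_prompt": True},
    {"Content-Type": "application/json"},
)
```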

Some specific settings

  • Ai Server (baseURL)
    • ai server (llama-server) address
    • default is http://127.0.0.1:8080
  • SimpleMCP Server (mcpServerUrl)
    • the simplemcp.py server address
    • default is http://127.0.0.1:3128
  • Stream (stream)
    • true for live streaming, false for oneshot
  • Client side Sliding Window (iRecentUserMsgCnt)
    • -1 : send full history
    • 0 : only system prompt
    • >0 : last N user messages after the most recent system prompt
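
The iRecentUserMsgCnt rules above can be sketched as a small trimming function (a hypothetical helper, not the actual client code):

```python
def sliding_window(messages, n_recent):
    """Trim chat history per iRecentUserMsgCnt-style semantics.

    messages: list of {"role": ..., "content": ...} dicts.
    n_recent: -1 = full history, 0 = system prompt only,
              >0 = last N user messages (and their replies)
              after the most recent system prompt.
    """
    if n_recent < 0:
        return list(messages)
    # locate the most recent system prompt
    sys_idx = None
    for i in range(len(messages) - 1, -1, -1):
        if messages[i]["role"] == "system":
            sys_idx = i
            break
    system = [messages[sys_idx]] if sys_idx is not None else []
    if n_recent == 0:
        return system
    tail = messages[sys_idx + 1:] if sys_idx is not None else list(messages)
    # keep everything from the N-th most recent user message onward
    user_positions = [i for i, m in enumerate(tail) if m["role"] == "user"]
    if len(user_positions) > n_recent:
        tail = tail[user_positions[-n_recent]:]
    return system + tail
```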
  • Cache Prompt (cache_prompt)
    • enables server-side caching of the system prompt and history, to an extent
  • Tool Call Timeout (toolCallResponseTimeoutMS)
    • 200s by default
  • Tool call Auto (autoSecs)
    • seconds to wait before auto-triggering tool calls and auto-submitting tool responses
    • default is 0, i.e. manual
  • Trim Garbage (bTrimGarbage)
    • tries to remove repeating trailing text

Debugging Tips

  • Local TCPdump
    • sudo tcpdump -i lo -s 0 -vvv -A host 127.0.0.1 and port 8080
  • Browser DevTools
    • inspect document['gMe'] for session state
  • Reset Tool Call
    • delete any assistant response after the tool call handshake
    • next, wrt the last tool message
      • set role back to TOOL-TEMP
        • edit the response as needed
      • or delete the same
        • user will be given option to edit and retrigger the tool call
      • submit the new response

At the end

A thank you to all open source and open model developers, who strive for the common good.