Documentation

Documentation

Quickstart

TokenSaver exposes a native REST API for the Python SDK and console, and a chat-completions compatible HTTP API for third-party clients. Every call runs the same governed pipeline — cache, RAG, compression, PII, then LLM.

Choose an integration path

Pick the surface that matches your stack. All routes authenticate with your TokenSaver API key (ts_…).

ApproachBest for
Native APIFull control, Python SDK, RAG uploads, server-side chat
OpenAI-compatible HTTP APILibreChat, Open WebUI, LangChain, any OpenAI-shaped client
Python SDKType-safe helpers — ask(), RAG, chat sessions with minimal boilerplate
GuidesCopy-paste recipes for cache, RAG, PII, and metrics

Model catalogue

Pipeline calls resolve against the platform LLM catalog — OpenAI, Anthropic, Google (Gemini), Mistral, Grok (grok), DeepSeek — on the order of 100+ chat and embedding models via GET /api/v1/llm-reference/models. Browse the live model catalogue page for counts and ids.

Python SDK

Thin client over the native API — install from PyPI, then use ask(), RAG helpers, and server-side chat sessions.

Guides