Documentation
Quickstart
TokenSaver exposes a native REST API for the Python SDK and console, and a chat-completions compatible HTTP API for third-party clients. Every call runs the same governed pipeline — cache, RAG, compression, PII, then LLM.
Choose an integration path
Pick the surface that matches your stack. All routes authenticate with your TokenSaver API key (ts_…).
| Approach | Best for |
|---|---|
| Native API | Full control, Python SDK, RAG uploads, server-side chat |
| OpenAI-compatible HTTP API | LibreChat, Open WebUI, LangChain, any OpenAI-shaped client |
| Python SDK | Type-safe helpers — ask(), RAG, chat sessions with minimal boilerplate |
| Guides | Copy-paste recipes for cache, RAG, PII, and metrics |
Model catalogue
grok), DeepSeek — on the order of 100+ chat and embedding models via GET /api/v1/llm-reference/models. Browse the live model catalogue page for counts and ids.POST /api/v1/pipelines/run, RAG, chats, pricing
Native API
Bearer TokenSaver key (ts_…). Same JSON body as the in-app pipeline. Use from the Python SDK or any HTTP client.
Open native docs→/openai/v1/chat/completions, responses, models, embeddings
Chat-completions HTTP API
Drop-in HTTP shape for chat UIs and agent frameworks. Still authenticated with your TokenSaver key—not the LLM vendor key alone.
Open HTTP API docs→Python SDK
Thin client over the native API — install from PyPI, then use ask(), RAG helpers, and server-side chat sessions.
