TokenSaver SDK

API Reference

v0.1.10 · Python ≥ 3.10

Build governed LLM workflows with the open-source tokensaver-sdk package. Start with LLM provider keys to choose between ephemeral keys passed through the SDK and encrypted keys stored in Settings (a LangChain / LangGraph–friendly split). All governance (workspace, quotas, pipeline entitlements, metering, console visibility) is enforced through the TokenSaver API key (ts_...), not the LLM vendor key. When you omit base_url, the client defaults to https://api.tokensaver.fr/api/v1; pass base_url only to target another deployment. Provider and model must match an active row in the platform LLM catalogue (GET /api/v1/llm-reference/models); otherwise the API returns 400 with LLM_MODEL_NOT_SUPPORTED. On the default hosted API URL, the Python SDK also restricts provider to openai before any request (HOSTED_LLM_PROVIDER), the same product rule as LLM keys in Settings. The Python SDK reference section documents every method; Pipeline options lists the same request fields as the console (thresholds, RAG, cache, PII, compression). For integrators using the OpenAI-compatible prefix /openai/v1 (chat/completions, responses, models, embeddings), see OpenAI-compatible HTTP. The examples below cover common Python SDK patterns.

LLM provider keys

TokenSaver needs a valid key for the LLM vendor you select (provider / model). You can supply that secret in two ways—matching how teams already work with LangChain and LangGraph: pass keys at integration time for ephemeral runs, or centralise them for governance and compliance.

Governance is always tied to the TokenSaver API key. Every authenticated request is scoped with Authorization: Bearer <ts_...>. Quotas, plan limits, workspace binding, pipeline module flags, cost and token accounting, and what appears in the platform console all flow from that key. The LLM provider secret (ephemeral or stored) only authorises the outbound call to the vendor—it does not replace TokenSaver for policy or audit.

Ephemeral (SDK, stateless-style)

The Python SDK can send provider_api_key on each POST /pipelines/run (constructor default or per ask / run_pipeline). The backend uses it only for that request and never persists it—similar to setting api_key=... on official LangChain chat models or wiring secrets through LangGraph runtime config: the secret stays in your process or secret manager and transits over HTTPS for the call.

Best when you already inject provider keys from env/Vault in CI, agents, or notebooks and want TokenSaver governance without storing vendor keys on the platform.

Stored, encrypted (console)

In the web console, Settings → LLM provider keys lets you register keys per organisation. They are encrypted at rest and reused for every pipeline run until you rotate or remove them—no provider secret in each JSON payload. This fits stricter security and compliance programmes: fewer moving parts in client code, central rotation, and alignment with “secrets in a vault, not in repos” policies while LangChain/LangGraph apps still call TokenSaver with only the TokenSaver API key.

The console UI does not send provider_api_key; it relies on stored keys.

Hosted console (demo / Free plan)

Only OpenAI keys can be configured in Settings today. The Python SDK uses the same rule when base_url is the default public API. Additional vendors will follow as the product expands; use a custom base_url for self-hosted stacks that already enable other providers.

If both are available, an explicit provider_api_key from the SDK takes precedence for that run. Omit it to fall back to encrypted keys from Settings. See Pipeline options in the Python SDK reference for the parameter name and Best practices for transport and key hygiene.

import os
from tokensaver_sdk import TokenSaver

# Ephemeral provider key (optional)—same idea as LangChain ChatOpenAI(api_key=os.environ["OPENAI_API_KEY"])
ts = TokenSaver(api_key="ts_...", provider_api_key=os.environ["OPENAI_API_KEY"])
ts.ask("Hello", provider="openai", model="gpt-4o")

Overview

The SDK is an API client, not a local orchestration engine. All governance logic (quotas, module entitlements, cost and token accounting, security checks) is enforced server-side by TokenSaver and scoped to your TokenSaver API key (ts_...).

Every activity executed through the SDK is governed and surfaced in the TokenSaver platform console, giving teams a 360-degree view of LLM usage, costs, safeguards, and operational decisions in a single place for consistent governance.

One API shape

Same provider / model parameters across vendors. The catalogue and active flags live in the platform database; on the hosted default URL, the Python SDK is OpenAI-only.

Three history modes

Stateless, local memory, or server-side persisted sessions.

Console parity

Tune cache, RAG, compression, and PII with the same JSON body as POST /pipelines/run in the UI.

LLM model catalogue

Every POST /api/v1/pipelines/run and POST /api/v1/pricing/estimate must use a provider + model pair that exists in the TokenSaver reference tables (llm_providers, llm_models) with is_active. Otherwise the API responds with 400 and error_code: LLM_MODEL_NOT_SUPPORTED (details.provider, details.model). Chats created with both fields set are validated the same way (web UI and POST /api/v1/sdk/chats).

Discover allowed pairs (no auth required):

GET https://api.tokensaver.fr/api/v1/llm-reference/models?active_only=true
# Optional: ?provider=openai

The Python SDK maps this error to ValidationError with code=LLM_MODEL_NOT_SUPPORTED (ERROR_LLM_MODEL_NOT_SUPPORTED). Maintainer doc: docs/REFERENTIEL-LLM.md.
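As a sketch of that check on the client side, you can pre-validate a provider / model pair against the catalogue before calling the pipeline. The response shape assumed below ({"models": [{"provider": ..., "model": ..., "is_active": ...}]}) is an illustration, not the documented contract — verify it against the live endpoint:

```python
# Sketch: pre-validate a provider/model pair against the LLM catalogue.
# The payload shape here is an assumption -- check the real response of
# GET /api/v1/llm-reference/models before relying on it.

def is_supported(catalogue: dict, provider: str, model: str) -> bool:
    """Return True if the pair appears as an active row in the catalogue payload."""
    return any(
        row.get("provider") == provider
        and row.get("model") == model
        and row.get("is_active", True)
        for row in catalogue.get("models", [])
    )

# In practice, fetch the payload first (no auth required), e.g. with httpx:
#   import httpx
#   catalogue = httpx.get(
#       "https://api.tokensaver.fr/api/v1/llm-reference/models",
#       params={"active_only": "true"},
#   ).json()
```

Running the check before ask avoids a round-trip that would end in a 400 LLM_MODEL_NOT_SUPPORTED.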

Authentication

Authenticate every request with your TokenSaver API key. That key is the sole anchor for governance on the platform (workspace, quotas, usage, console reporting). Keep it server-side and never expose it in browser code.

Authorization: Bearer <TOKEN_SAVER_KEY>

Installation

PyPI package tokensaver-sdk. For editable installs from the monorepo or extra dev deps (e.g. python-dotenv), use pip install "tokensaver-sdk[dev]".

pip install tokensaver-sdk
# optional dev extras (dotenv, etc.)
pip install "tokensaver-sdk[dev]"

Base URL & environment

The client defaults to https://api.tokensaver.fr/api/v1 when base_url is omitted. Override the host for staging, self-hosted, or local APIs.

export TOKENSAVER_API_KEY=ts_...        # or TS_API_KEY
# Optional — if unset, SDK uses the public production API root
export TOKENSAVER_API_URL=https://api.example.com/api/v1   # alias: TS_API_URL

from tokensaver_sdk import TokenSaver

ts = TokenSaver(api_key="ts_...")  # default cloud API
ts = TokenSaver(api_key="ts_...", base_url="http://localhost:8000/api/v1")

In code, TokenSaver(api_key=..., base_url=...) always wins over process env for the base URL.

OpenAI-compatible HTTP

Third-party clients (OpenAI Python/JS SDK with a custom base_url, LangChain ChatOpenAI, chat UIs, etc.) can call the same TokenSaver pipeline on a dedicated prefix /openai/v1 on the API host (no trailing slash). Use your TokenSaver API key in Authorization: Bearer ts_... — not the LLM vendor key alone, and not the console session JWT.

Routes (same governance and module headers as documented in the console Docs):

  • POST .../openai/v1/chat/completions — Chat Completions shape; SSE with chat.completion.chunk + [DONE] when stream: true.
  • POST .../openai/v1/responses — OpenAI Responses API shape (input, instructions, …); SSE uses event: lines (response.output_text.delta, …).
  • GET .../openai/v1/models and POST .../openai/v1/embeddings.

Machine-readable contract: GET https://api.tokensaver.fr/api/v1/openapi.json (paths under /openai/v1/...). Human-readable guide: workspace Docs → OpenAI-compatible API.
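As a sketch, a Chat Completions call on this prefix reduces to a standard OpenAI-shaped POST; the helper below only builds the URL, headers, and body (sending is left to whatever HTTP client or OpenAI-compatible SDK you already use):

```python
# Sketch: assemble an OpenAI-compatible Chat Completions request for TokenSaver.
API_HOST = "https://api.tokensaver.fr"

def build_chat_completions_request(api_key: str, model: str, user_prompt: str):
    url = f"{API_HOST}/openai/v1/chat/completions"  # dedicated prefix, no trailing slash
    headers = {
        # TokenSaver key (ts_...), not the LLM vendor key, not the console JWT
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
        # "stream": True,  # SSE with chat.completion.chunk events + [DONE]
    }
    return url, headers, body

# e.g. with httpx:
#   import httpx
#   url, headers, body = build_chat_completions_request("ts_...", "gpt-4o", "Hello")
#   resp = httpx.post(url, headers=headers, json=body)
```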

Constants

Prefer SDK constants for chat history modes, error-code comparisons, and documented API roots / provider sets.

from tokensaver_sdk import (
    HISTORY_NONE,
    HISTORY_LOCAL,
    HISTORY_SERVER,
    DEFAULT_PUBLIC_API_BASE_URL,
    HOSTED_SAAS_LLM_PROVIDERS,
    API_PIPELINE_LLM_PROVIDERS,
    RAG_UPLOAD_EXTENSIONS,
    ERROR_RAG_FILE_NOT_FOUND,
    ERROR_RAG_UNSUPPORTED_FILE_TYPE,
    ERROR_HOSTED_LLM_PROVIDER,
    ERROR_LLM_MODEL_NOT_SUPPORTED,
)

See Errors in the Python SDK reference for how HTTP detail payloads are normalized.

Python SDK reference

The tokensaver-sdk package is a thin client over the TokenSaver HTTP API. For ephemeral provider_api_key vs keys stored in Settings, read LLM provider keys at the top of this page. Then use the snippet below and jump to any method for its purpose, return type, and signature.

Test a basic request

Create a client with your API key and call ask — the smallest path from install to a model response. You do not need base_url unless you target a non-default host (see Client).

from tokensaver_sdk import TokenSaver
 
ts = TokenSaver(api_key="ts_...")
out = ts.ask(
    "Say hello in one sentence.",
    provider="openai",
    model="gpt-4o",
)
print(out.text)

Run with: python main.py (after pip install tokensaver-sdk).

Client

TokenSaver (alias TokenSaverClient) holds your API key and base URL, and exposes chat for session helpers.

TokenSaver.__init__(base_url=None, api_key=None, *, provider_api_key=None, timeout_total=30, connect_timeout=5, read_timeout=25, max_retries=2, headers=None)

Builds the client. Omit base_url to use the built-in production API root (https://api.tokensaver.fr/api/v1). Optional provider_api_key: default LLM provider secret sent on every pipeline run (overrides org keys for that run only; never stored). Raises ValueError if api_key is missing.

Returns
TokenSaver
ts = TokenSaver(api_key="ts_...")
ts = TokenSaver(api_key="ts_...", provider_api_key="sk-...")
ts = TokenSaver("https://api.example.com/api/v1", "ts_...")
ts = TokenSaver(base_url="http://localhost:8000/api/v1", api_key="ts_...")

Attributes

  • base_url — Normalized API root (no trailing slash).
  • api_key — Same string you passed in (sent as Bearer).
  • chat — Use ts.chat.session(...) for chat flows (see Chat sessions).

Pipeline & responses

These methods call the governed pipeline (cache, RAG, compression, PII, etc.) according to your flags and org settings.

Hosted API (default base URL)

On https://api.tokensaver.fr/api/v1, use provider="openai" only — same as the console (demo / Free): Anthropic, Google (Gemini), and Mistral keys are not enabled in the UI yet. The SDK rejects other provider codes on that URL with ValidationError (HOSTED_LLM_PROVIDER). Point base_url at a self-hosted stack to use additional providers where the backend allows them.

ask(...)

Recommended entry point: same JSON body as run_pipeline. Returns RunResult (.text, metrics, trace). Optional provider_api_key per call (overrides client default and org DB keys for that run only). History: pass chat_id with HISTORY_SERVER or chat_history with HISTORY_LOCAL. Module tuning (same as console): temperature; rag_similarity_threshold; cache_similarity_threshold; compression_level (1–5); rag_options { document_ids, top_k, query_image_url }; pii_options { engine, strategy, confidence_threshold, entity_types, language, regex_fallback }; optional context_layers or legacy system_prompt / profile_context / workspace_instructions.

HTTP
POST /pipelines/run
Returns
RunResult
result = ts.ask(
    "Summarize this in 3 bullets.",
    provider="openai",
    model="gpt-4o",
    use_rag=True,
    rag_similarity_threshold=0.55,
    rag_options={"document_ids": ["<document_id>"], "top_k": 8},
)
print(result.text, result.metrics.cost_usd)

run_pipeline(...)

Lower-level: raw API JSON. Accepts the same optional module fields as ask (temperature, thresholds, compression_level, rag_options, pii_options, context_layers, legacy instruction fields, provider_api_key).

HTTP
POST /pipelines/run
Returns
dict

estimate_cost(prompt_tokens, completion_tokens, provider, model)

Ask TokenSaver for an estimated cost from token counts — no LLM call.

HTTP
POST /pricing/estimate
Returns
dict

Pipeline request options (console parity)

ask and run_pipeline accept the same optional JSON fields as the TokenSaver console sends to POST /pipelines/run. Omit a field to use server defaults. The console does not send provider_api_key; the SDK can (see the provider_api_key entry below).

  • temperature — LLM temperature (0–2).
  • use_cache, use_rag, use_compression, use_pii_filter, stream — Enable pipeline modules (booleans).
  • rag_similarity_threshold — RAG retrieval similarity floor (0–1).
  • cache_similarity_threshold — Semantic cache similarity threshold (0–1).
  • compression_level — Compression strength 1–5 when compression is on.
  • provider_api_key — SDK only. Per-request LLM provider secret; takes precedence over organisation keys in the database for that run; never stored. Set on TokenSaver(..., provider_api_key=...) or pass to ask / run_pipeline.
  • rag_options — Dict: document_ids, top_k, query_image_url.
  • pii_options — Dict: engine, strategy, confidence_threshold, entity_types, language, regex_fallback.
  • context_layers — Structured instruction / knowledge / interaction layers (canonical API).
  • system_prompt, profile_context, workspace_instructions — Legacy flat instruction fields (when not using context_layers).
  • chat_id, chat_history — Session routing (see history modes).

Types RagOptions and PiiOptions (TypedDict) are exported from tokensaver_sdk for editor hints.
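At runtime these options are plain dicts; RagOptions / PiiOptions only add editor hints. A minimal sketch of one options dict covering the fields above (the specific values are illustrative, not recommended defaults):

```python
# Sketch: console-parity options for POST /pipelines/run as a single dict.
# Values below are illustrative examples, not recommended defaults.

def pipeline_options(doc_ids: list[str]) -> dict:
    return {
        "temperature": 0.2,
        "use_cache": True,
        "cache_similarity_threshold": 0.85,
        "use_rag": True,
        "rag_similarity_threshold": 0.55,
        "rag_options": {"document_ids": doc_ids, "top_k": 8},
        "use_compression": True,
        "compression_level": 3,  # 1-5
        "use_pii_filter": True,
        "pii_options": {"engine": "gliner", "strategy": "mask", "language": "en"},
    }

# Then spread into a run:
#   ts.ask("...", provider="openai", model="gpt-4o", **pipeline_options(["<document_id>"]))
```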

HTTP helpers

Authenticated httpx calls with retries on transient errors. Paths are relative to base_url (e.g. "rag/documents").

get(path, **kwargs) → Response

GET with Authorization header and JSON Accept.

post(path, **kwargs) → Response

POST JSON by default; use httpx kwargs for custom bodies.

delete(path, **kwargs) → Response

Used by ChatSession.close() for server chats.

RAG documents

Upload supported files to your workspace, wait for ingestion, then pass document_ids in rag_options on ask(..., use_rag=True) — or use ChatSession.attach_knowledge on server chats. Allowed extensions match the console and RAG_UPLOAD_EXTENSIONS (pdf, txt, md, csv, json, docx). Wrong extension → ValidationError (RAG_UNSUPPORTED_FILE_TYPE) before any HTTP call.
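The pre-upload validation described above can be sketched as two local checks; the check order and exception type here are an approximation of the SDK's behaviour (the real client raises ValidationError with these codes):

```python
from pathlib import Path

# Mirrors the documented RAG_UPLOAD_EXTENSIONS constant.
RAG_UPLOAD_EXTENSIONS = {"pdf", "txt", "md", "csv", "json", "docx"}

def check_rag_path(file_path: str) -> Path:
    """Sketch of the SDK's client-side checks before any HTTP call:
    missing path -> RAG_FILE_NOT_FOUND, bad extension -> RAG_UNSUPPORTED_FILE_TYPE.
    The real SDK raises ValidationError; ValueError stands in here."""
    path = Path(file_path)
    if not path.is_file():
        raise ValueError("RAG_FILE_NOT_FOUND")
    if path.suffix.lstrip(".").lower() not in RAG_UPLOAD_EXTENSIONS:
        raise ValueError("RAG_UNSUPPORTED_FILE_TYPE")
    return path
```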

rag_list_documents()

Lists ingested documents for the current API key / workspace (newest first).

HTTP
GET /rag/documents
Returns
dict with key documents

rag_upload_document(file_path, *, name=None, description=None)

Multipart upload only; does not wait for chunking. Raises ValidationError (RAG_FILE_NOT_FOUND) if the path is missing or not a file; RAG_UNSUPPORTED_FILE_TYPE if the extension is not allowed.

HTTP
POST /rag/documents
Returns
dict (includes document_id)

rag_get_document(document_id)

Fetch status, chunk counts, and metadata for one document.

HTTP
GET /rag/documents/{id}
Returns
dict

rag_wait_document_ready(document_id, *, timeout_seconds=90, poll_interval_seconds=2)

Polls until status is done or ingested, or raises on error / timeout.

Returns
dict
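Conceptually, the polling loop looks like the sketch below (the real method lives in the SDK; fetch_status is an injected stand-in for ts.rag_get_document, and the status values come from this section):

```python
import time

def wait_ready(fetch_status, *, timeout_seconds=90, poll_interval_seconds=2,
               _sleep=time.sleep):
    """Sketch of rag_wait_document_ready: poll until status is 'done' or
    'ingested', raise on 'error' or timeout. fetch_status() returns the
    document dict (e.g. lambda: ts.rag_get_document(doc_id))."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        doc = fetch_status()
        status = doc.get("status")
        if status in ("done", "ingested"):
            return doc
        if status == "error":
            raise RuntimeError(f"ingestion failed: {doc}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"document not ready after {timeout_seconds}s")
        _sleep(poll_interval_seconds)
```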

rag_upload_and_wait(file_path, *, name=None, description=None, timeout_seconds=90, poll_interval_seconds=2)

Upload then block until ingestion completes. Same ValidationError (RAG_FILE_NOT_FOUND) as rag_upload_document if the local file is missing.

Returns
dict

rag_ensure_document(file_path, *, reuse_existing=True, name=None, description=None, timeout_seconds=90, poll_interval_seconds=2)

If a document with the same filename already exists and is ready, returns it without re-uploading. If pending, waits. Otherwise uploads and waits (missing local file → RAG_FILE_NOT_FOUND; bad extension → RAG_UNSUPPORTED_FILE_TYPE). Set reuse_existing=False to always send a new file.

Returns
dict

ChatSession.attach_knowledge

On a HISTORY_SERVER session, remember one or more RAG document_id strings. Each ask merges them into rag_options["document_ids"] (deduped with any IDs you pass explicitly). Same workflow as attaching knowledge from the console chat "+" menu. Available in SDK 0.1.9+.

from tokensaver_sdk import HISTORY_SERVER, TokenSaver
 
ts = TokenSaver(api_key="ts_...")
doc_id = str(ts.rag_ensure_document("./handbook.pdf")["document_id"])
 
session = ts.chat.session(history=HISTORY_SERVER, name="Docs Q&A")
session.attach_knowledge(doc_id)
out = session.ask(
    "What is the refund policy?",
    provider="openai",
    model="gpt-4o",
    use_rag=True,
    rag_options={"top_k": 6},
)
print(out.text)

ChatSession.attach_knowledge(*document_ids: str) → None

Append non-empty document IDs to the session list (no HTTP call). IDs are sent on subsequent ask() calls via merged rag_options.

ChatSession.clear_knowledge() → None

Remove all IDs added with attach_knowledge (in-memory only).

Chat sessions

Use HISTORY_SERVER for chats stored in TokenSaver, or HISTORY_LOCAL for in-process memory.

ts.chat.session(*, history=HISTORY_NONE, name='New Chat', provider=None, model=None) → ChatSession

When history is HISTORY_SERVER, creates a chat via POST /sdk/chats (Idempotency-Key set automatically) and returns a ChatSession with chat_id.

HTTP
POST /sdk/chats (server mode)
Returns
ChatSession

ChatSession

  • ask(prompt, **kwargs) → RunResult — forwards to TokenSaver.ask; in HISTORY_LOCAL, appends turns to memory. Merges attach_knowledge IDs into rag_options before each call.
  • attach_knowledge(*document_ids), clear_knowledge() — RAG document IDs for server/local/none sessions (merge behaviour applies whenever ask runs).
  • messages(limit=50, cursor=None, order="asc") → dict — server transcript when HISTORY_SERVER.
  • close() — deletes the server chat when applicable.

Return types

Dataclasses returned by ask and attached to session calls.

RunResult

text, raw, metrics (MetricsView), trace (TraceView), context (ContextView). Method to_dict().

MetricsView

cost_usd, latency_ms, cache_hit, tokens_*, savings_ratio (all optional).

TraceView

request_id, provider, model.

ContextView

history_mode, chat_id, layers_used.

Errors

Import from tokensaver_sdk.errors. All subclasses expose code, message, status_code, request_id, raw.

  • TokenSaverError — base class.
  • AuthenticationError, ValidationError, ServerError, TimeoutError
  • Transport / DNS ServerError with code=NETWORK_ERROR when the host cannot be reached (wrong base_url, DNS failure, TLS, connection refused). Not an HTTP response from TokenSaver.
  • ProviderKeyMissingError — exposes an extra provider field.
  • Hosted default URL ValidationError with code=HOSTED_LLM_PROVIDER (ERROR_HOSTED_LLM_PROVIDER) if provider is not OpenAI on the public API root (see Pipeline above).
  • Model catalogue ValidationError with code=LLM_MODEL_NOT_SUPPORTED (ERROR_LLM_MODEL_NOT_SUPPORTED) if provider / model are not an active row in the platform LLM reference (same rule as GET /api/v1/llm-reference/models).
  • QuotaExceededError — quota_dimension, limit, current_usage, retry_after_seconds
  • RateLimitError — retry_after_seconds
  • Client-side RAG path checks — rag_upload_document / rag_upload_and_wait / rag_ensure_document raise ValidationError with code=RAG_FILE_NOT_FOUND if the path is missing, or code=RAG_UNSUPPORTED_FILE_TYPE (ERROR_RAG_UNSUPPORTED_FILE_TYPE) if the extension is not in RAG_UPLOAD_EXTENSIONS. Compare against ERROR_RAG_FILE_NOT_FOUND; raw["path"] may hold the resolved path.

Package imports

Public surface matches tokensaver_sdk.__all__ (stable imports for docs and IDEs).

from tokensaver_sdk import (
    TokenSaver,
    TokenSaverClient,
    API_PIPELINE_LLM_PROVIDERS,
    DEFAULT_PUBLIC_API_BASE_URL,
    HOSTED_SAAS_LLM_PROVIDERS,
    RAG_UPLOAD_EXTENSIONS,
    mime_type_for_rag_filename,
    ERROR_HOSTED_LLM_PROVIDER,
    ERROR_LLM_MODEL_NOT_SUPPORTED,
    ERROR_RAG_FILE_NOT_FOUND,
    ERROR_RAG_UNSUPPORTED_FILE_TYPE,
    HISTORY_NONE,
    HISTORY_LOCAL,
    HISTORY_SERVER,
    HistoryMode,
    RunResult,
    MetricsView,
    TraceView,
    ContextView,
    RagOptions,
    PiiOptions,
    TokenSaverError,
    AuthenticationError,
    ProviderKeyMissingError,
    QuotaExceededError,
    RateLimitError,
    ValidationError,
    ServerError,
    TimeoutError,
)
import tokensaver_sdk
 
print(tokensaver_sdk.__version__)

History constants

HISTORY_NONE    # "none"   — no memory
HISTORY_LOCAL   # "local"  — SDK process memory
HISTORY_SERVER  # "server" — persisted on TokenSaver

Stateless request (no history)

Use this for one-shot prompts where you do not want conversation memory.

from tokensaver_sdk import HISTORY_NONE, TokenSaver

ts = TokenSaver(api_key="ts_...")
result = ts.ask(
    "Write 3 release-note bullets for v2.4.0.",
    provider="openai",
    model="gpt-4o",
    history=HISTORY_NONE,
)

print(result.text)

RAG document upload

Ingest a file into your workspace knowledge base, then pass its document_id in rag_options with use_rag=True (same contract as the console pipeline). The SDK validates extensions before upload; unsupported paths raise ValidationError (RAG_UNSUPPORTED_FILE_TYPE).

Supported extensions (also available as RAG_UPLOAD_EXTENSIONS): pdf, txt, md, csv, json, docx.

rag_ensure_document(path) uploads and waits for ingestion, or returns an existing ready document with the same filename when reuse_existing=True (default).

from tokensaver_sdk import TokenSaver

ts = TokenSaver(api_key="ts_...")

# PDF, DOCX, TXT, MD, CSV, JSON — same as console "+"
doc = ts.rag_ensure_document("./path/to/example_document.docx")
doc_id = str(doc["document_id"])

q1 = ts.ask(
    "What is this document about? Answer in one short paragraph.",
    provider="openai",
    model="gpt-4o",
    use_rag=True,
    rag_similarity_threshold=0.55,
    rag_options={"document_ids": [doc_id]},
)
print(q1.text)

q2 = ts.ask(
    "List three key points from the document.",
    provider="openai",
    model="gpt-4o",
    use_rag=True,
    use_compression=True,
    compression_level=4,
    rag_options={"document_ids": [doc_id], "top_k": 8},
)
print(q2.text)

Lower-level upload (no wait)

meta = ts.rag_upload_document("./notes.md", name="notes.md")
ts.rag_wait_document_ready(meta["document_id"])

# Or one call:
# ready = ts.rag_upload_and_wait("./data/report.pdf")

Server chat + RAG (attach_knowledge)

For a persisted chat on TokenSaver (like the web console), create a server session and attach one or more document_id values. Every session.ask(...) then merges those IDs into rag_options automatically — the same pattern as attaching knowledge with "+" in the UI. Requires SDK ≥ 0.1.9.

from tokensaver_sdk import HISTORY_SERVER, TokenSaver

ts = TokenSaver(api_key="ts_...")
doc = ts.rag_ensure_document("./specs/api_overview.pdf")
doc_id = str(doc["document_id"])

session = ts.chat.session(history=HISTORY_SERVER, name="Support bot")
session.attach_knowledge(doc_id)

answer = session.ask(
    "What authentication scheme does the API use?",
    provider="openai",
    model="gpt-4o",
    use_rag=True,
    rag_similarity_threshold=0.55,
    rag_options={"top_k": 8},  # document_ids come from attach_knowledge
)
print(answer.text)

# Optional: stop merging attached IDs for the next turns
session.clear_knowledge()

You can still pass rag_options["document_ids"] explicitly on a single ask; they are merged with attached IDs (attached first, then deduplicated).
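That merge (attached IDs first, then explicit IDs, order-preserving dedupe) can be sketched as a small helper; the real ChatSession internals may differ, but the behaviour matches what is described above:

```python
def merge_document_ids(attached: list[str], explicit: list[str]) -> list[str]:
    """Attached-first, order-preserving dedupe of RAG document IDs.
    Empty strings are dropped."""
    seen: set[str] = set()
    merged: list[str] = []
    for doc_id in [*attached, *explicit]:
        if doc_id and doc_id not in seen:
            seen.add(doc_id)
            merged.append(doc_id)
    return merged
```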

History modes

Use SDK constants (`HISTORY_NONE`, `HISTORY_LOCAL`, `HISTORY_SERVER`) for safer and more maintainable code.

`HISTORY_NONE`: stateless call, no memory.

`HISTORY_LOCAL`: memory kept in SDK process.

`HISTORY_SERVER`: persisted chat on TokenSaver backend.

from tokensaver_sdk import HISTORY_LOCAL, HISTORY_NONE, HISTORY_SERVER, TokenSaver

ts = TokenSaver(api_key="ts_...")

# Stateless — no memory (same as ts.ask(..., history=HISTORY_NONE))
s0 = ts.chat.session(history=HISTORY_NONE)
out = s0.ask("Ping", provider="openai", model="gpt-4o")

# Server-side session — persisted transcript + chat_id
session = ts.chat.session(history=HISTORY_SERVER, name="Onboarding assistant")
first = session.ask("Draft a 5-step fintech onboarding checklist.", provider="openai", model="gpt-4o")
second = session.ask("Rewrite it for a non-technical operations manager.", provider="openai", model="gpt-4o")
print(first.text)
print(second.text)

page = session.messages(limit=50)
print(page["items"], page["next_cursor"])

# Local memory — history kept in the Python process only
local = ts.chat.session(history=HISTORY_LOCAL)
local.ask("My name is Alex.", provider="openai", model="gpt-4o")
local.ask("What is my name?", provider="openai", model="gpt-4o")

Cache-first request

Enable cache for repeated prompts. Optionally set cache_similarity_threshold (0–1) to match the semantic cache behavior you use in the console.

from tokensaver_sdk import TokenSaver

ts = TokenSaver(api_key="ts_...")
prompt = "Summarize the Q1 support incidents in exactly 4 bullets."

first = ts.ask(
    prompt,
    provider="openai",
    model="gpt-4o",
    use_cache=True,
    cache_similarity_threshold=0.85,
)
second = ts.ask(
    prompt,
    provider="openai",
    model="gpt-4o",
    use_cache=True,
    cache_similarity_threshold=0.85,
)

print("first cache_hit:", first.metrics.cache_hit)
print("second cache_hit:", second.metrics.cache_hit)
print("second cost:", second.metrics.cost_usd)

PII anonymization

Set use_pii_filter=True and pass pii_options for engine, strategy, confidence, and entities — same shape as the Pipeline settings in the console.

from tokensaver_sdk import TokenSaver

ts = TokenSaver(api_key="ts_...")
question = """
Draft a follow-up email for customer Jane Doe.
Her SSN is 123-45-6789, phone is +1 415 555 0199,
and card number is 4111 1111 1111 1111.
"""

masked = ts.ask(
    question,
    provider="openai",
    model="gpt-4o",
    use_pii_filter=True,
    pii_options={
        "engine": "gliner",
        "strategy": "mask",
        "confidence_threshold": 0.5,
        "language": "en",
        "regex_fallback": True,
    },
)

print(masked.text)
# Example output (illustrative):
# "Hello [REDACTED_NAME], we called you at [REDACTED_PHONE].
# Your verification token is linked to [REDACTED_SSN]."

Errors

The SDK maps backend error codes to typed exceptions. FastAPI often nests machine-readable fields under detail; the client unwraps error_code / message / details automatically. For RAG uploads, a missing local PDF raises ValidationError with RAG_FILE_NOT_FOUND (no HTTP round-trip). Wrong provider on the default hosted URL → HOSTED_LLM_PROVIDER. Unknown catalogue pair → LLM_MODEL_NOT_SUPPORTED (both as ValidationError). Connection issues surface as ServerError (NETWORK_ERROR).

import os
from tokensaver_sdk import (
    ERROR_HOSTED_LLM_PROVIDER,
    ERROR_LLM_MODEL_NOT_SUPPORTED,
    ERROR_RAG_FILE_NOT_FOUND,
    ERROR_RAG_UNSUPPORTED_FILE_TYPE,
    TokenSaver,
)
from tokensaver_sdk.errors import (
    AuthenticationError,
    ProviderKeyMissingError,
    QuotaExceededError,
    RateLimitError,
    ServerError,
    TokenSaverError,
    ValidationError,
)

ts = TokenSaver(api_key=os.environ["TOKENSAVER_API_KEY"])

try:
    ts.ask("Hi", provider="openai", model="gpt-4o")
except AuthenticationError:
    print("Invalid or revoked TokenSaver API key.")
except ProviderKeyMissingError:
    print("Configure provider key in TokenSaver settings.")
except QuotaExceededError as e:
    print(e.quota_dimension, e.limit, e.current_usage)
except RateLimitError as e:
    print("Retry after:", e.retry_after_seconds)
except ValidationError as e:
    if e.code == ERROR_RAG_FILE_NOT_FOUND:
        print("RAG file path invalid:", e.raw)
    elif e.code == ERROR_RAG_UNSUPPORTED_FILE_TYPE:
        print("RAG extension not allowed (see RAG_UPLOAD_EXTENSIONS):", e.raw)
    elif e.code == ERROR_HOSTED_LLM_PROVIDER:
        print("Use provider='openai' on the default API URL, or set base_url for self-hosted.")
    elif e.code == ERROR_LLM_MODEL_NOT_SUPPORTED:
        print("Pick provider/model from GET /api/v1/llm-reference/models:", e.raw)
except ServerError as e:
    if e.code == "NETWORK_ERROR":
        print("Cannot reach API host — check base_url and network:", e.message)
except TokenSaverError as e:
    print(e.code, e.message, e.status_code, e.request_id)

Full list: Python SDK reference → Errors.

Detailed metrics

Every `ask()` returns normalized metrics so you can track quality, performance, and savings per request.

result = ts.ask("Generate a one-paragraph incident summary.", provider="openai", model="gpt-4o")

print(result.text)
print("cost_usd:", result.metrics.cost_usd)
print("latency_ms:", result.metrics.latency_ms)
print("tokens_input:", result.metrics.tokens_input)
print("tokens_output:", result.metrics.tokens_output)
print("tokens_total:", result.metrics.tokens_total)
print("tokens_saved:", result.metrics.tokens_saved)
print("savings_ratio:", result.metrics.savings_ratio)
print("request_id:", result.trace.request_id)
print("history_mode:", result.context.history_mode)

Best practices

  • Keep API keys in secure server environments only.
  • Log `request_id` for production troubleshooting.
  • Use explicit module flags for predictable behavior.
  • Align thresholds and pii_options / rag_options with what you validate in the console.
  • Use server-side chat sessions (HISTORY_SERVER) when several workers must share the same transcript; pair with attach_knowledge to mirror console “+” document attachment (SDK ≥ 0.1.9).
  • Catch typed SDK exceptions instead of relying on raw HTTP status.
  • Before shipping, sync provider / model with GET /api/v1/llm-reference/models (or this workspace’s console selectors) so you do not hit LLM_MODEL_NOT_SUPPORTED.
  • Omit base_url for the default production API; set it only for staging, self-hosted, or local backends. A NETWORK_ERROR usually means the host is wrong or unreachable.
  • Optional provider_api_key on ask / run_pipeline (or on TokenSaver(...)) sends a per-request LLM secret that overrides organisation keys for that run only; it is never stored. The console UI does not use this field.
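As an illustration of honouring retry_after_seconds on rate limits, a retry wrapper might look like the sketch below. The exception class is injected so the snippet stays self-contained; in real code pass RateLimitError from tokensaver_sdk.errors, whose retry_after_seconds attribute is documented in the Errors section:

```python
import time

def call_with_retry(fn, *, rate_limit_exc, max_attempts=3, _sleep=time.sleep):
    """Retry fn() on rate limits, sleeping for the server-provided
    retry_after_seconds (falling back to 1s). Re-raises on the last attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except rate_limit_exc as e:
            if attempt == max_attempts:
                raise
            _sleep(getattr(e, "retry_after_seconds", None) or 1)

# Real usage (sketch):
#   from tokensaver_sdk.errors import RateLimitError
#   out = call_with_retry(
#       lambda: ts.ask("Hello", provider="openai", model="gpt-4o"),
#       rate_limit_exc=RateLimitError,
#   )
```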