TokenSaver SDK
API Reference
v0.1.10 · Python ≥ 3.10
Build governed LLM workflows with the open-source tokensaver-sdk package. Start with LLM provider keys to choose between ephemeral SDK keys and encrypted keys stored in Settings (LangChain / LangGraph friendly). All governance (workspace, quotas, pipeline entitlements, metering, console visibility) is enforced through the TokenSaver API key (ts_...), not the LLM vendor key. The default API base (when you omit base_url) is https://api.tokensaver.fr/api/v1; pass base_url only for another deployment. Provider and model must match an active row in the platform LLM catalogue (GET /api/v1/llm-reference/models); otherwise the API returns 400 with LLM_MODEL_NOT_SUPPORTED. On the default hosted API URL, the Python SDK also restricts provider to openai before any request (HOSTED_LLM_PROVIDER), the same product rule as LLM keys in Settings. The Python SDK reference section documents every method; Pipeline options lists the same request fields as the console (thresholds, RAG, cache, PII, compression). For integrators using the OpenAI-compatible prefix /openai/v1 (chat/completions, responses, models, embeddings), see OpenAI-compatible HTTP. The examples below cover common Python SDK patterns.
LLM provider keys
TokenSaver needs a valid key for the LLM vendor you select (provider / model). You can supply that secret in two ways—matching how teams already work with LangChain and LangGraph: pass keys at integration time for ephemeral runs, or centralise them for governance and compliance.
Governance is always tied to the TokenSaver API key. Every authenticated request is scoped with Authorization: Bearer <ts_...>. Quotas, plan limits, workspace binding, pipeline module flags, cost and token accounting, and what appears in the platform console all flow from that key. The LLM provider secret (ephemeral or stored) only authorises the outbound call to the vendor—it does not replace TokenSaver for policy or audit.
Ephemeral (SDK, stateless-style)
The Python SDK can send provider_api_key on each POST /pipelines/run (constructor default or per ask / run_pipeline). The backend uses it only for that request and never persists it—similar to setting api_key=... on official LangChain chat models or wiring secrets through LangGraph runtime config: the secret stays in your process or secret manager and transits over HTTPS for the call.
Best when you already inject provider keys from env/Vault in CI, agents, or notebooks and want TokenSaver governance without storing vendor keys on the platform.
Stored, encrypted (console)
In the web console, Settings → LLM provider keys lets you register keys per organisation. They are encrypted at rest and reused for every pipeline run until you rotate or remove them—no provider secret in each JSON payload. This fits stricter security and compliance programmes: fewer moving parts in client code, central rotation, and alignment with “secrets in a vault, not in repos” policies while LangChain/LangGraph apps still call TokenSaver with only the TokenSaver API key.
The console UI does not send provider_api_key; it relies on stored keys.
Hosted console (demo / Free plan)
Only OpenAI keys can be configured in Settings today. The Python SDK uses the same rule when base_url is the default public API. Additional vendors will follow as the product expands; use a custom base_url for self-hosted stacks that already enable other providers.
If both are available, an explicit provider_api_key from the SDK takes precedence for that run. Omit it to fall back to encrypted keys from Settings. See Pipeline options in the Python SDK reference for the parameter name and Best practices for transport and key hygiene.
import os
from tokensaver_sdk import TokenSaver
# Ephemeral provider key (optional)—same idea as LangChain ChatOpenAI(api_key=os.environ["OPENAI_API_KEY"])
ts = TokenSaver(api_key="ts_...", provider_api_key=os.environ["OPENAI_API_KEY"])
ts.ask("Hello", provider="openai", model="gpt-4o")
Overview
The SDK is an API client, not a local orchestration engine. All governance logic (quotas, module entitlements, cost and token accounting, security checks) is enforced server-side by TokenSaver and scoped to your TokenSaver API key (ts_...).
Every activity executed through the SDK is governed and surfaced in the TokenSaver platform console. This gives teams a 360-degree view of LLM usage, costs, safeguards, and operational decisions in one single place for consistent governance.
One API shape
Same provider / model parameters across vendors. The catalogue and active flags live in the database; on the hosted default URL, the Python SDK is OpenAI-only.
Three history modes
Stateless, local memory, or server-side persisted sessions.
Console parity
Tune cache, RAG, compression, and PII with the same JSON body as POST /pipelines/run in the UI.
LLM model catalogue
Every POST /api/v1/pipelines/run and POST /api/v1/pricing/estimate must use a provider + model pair that exists in the TokenSaver reference tables (llm_providers, llm_models) with is_active. Otherwise the API responds with 400 and error_code: LLM_MODEL_NOT_SUPPORTED (details.provider, details.model). Chats created with both fields set are validated the same way (web UI and POST /api/v1/sdk/chats).
Discover allowed pairs (no auth required):
GET https://api.tokensaver.fr/api/v1/llm-reference/models?active_only=true
# Optional: ?provider=openai
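Client-side, you can pre-validate a pair against the catalogue before issuing a run. A minimal sketch, assuming rows carry provider, model, and is_active fields (the exact field names in the endpoint's JSON are an assumption here; the helper is illustrative and not part of the SDK):

```python
def is_supported(provider: str, model: str, catalogue: list[dict]) -> bool:
    # True only when the pair exists in the catalogue rows with is_active set.
    return any(
        row["provider"] == provider
        and row["model"] == model
        and row.get("is_active", True)
        for row in catalogue
    )

# Illustrative rows, shaped like GET .../llm-reference/models?active_only=false
rows = [
    {"provider": "openai", "model": "gpt-4o", "is_active": True},
    {"provider": "openai", "model": "gpt-3.5-turbo", "is_active": False},
]
is_supported("openai", "gpt-4o", rows)         # True
is_supported("openai", "gpt-3.5-turbo", rows)  # False -> the API would answer LLM_MODEL_NOT_SUPPORTED
```

Checking locally avoids a round-trip that would end in a 400, but the server-side catalogue remains the source of truth.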
The Python SDK maps this error to ValidationError with code=LLM_MODEL_NOT_SUPPORTED (ERROR_LLM_MODEL_NOT_SUPPORTED). Maintainer doc: docs/REFERENTIEL-LLM.md.
Authentication
Authenticate every request with your TokenSaver API key. That key is the sole anchor for governance on the platform (workspace, quotas, usage, console reporting). Keep it server-side and never expose it in browser code.
Authorization: Bearer <TOKEN_SAVER_KEY>
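The header above is the only credential the API needs per request. A tiny sketch of building it, with a guard against the common mistake of sending a vendor key instead (the helper name is illustrative; the SDK constructs this header for you):

```python
def tokensaver_headers(ts_api_key: str) -> dict[str, str]:
    # All governance (workspace, quotas, metering, console visibility)
    # is scoped to this key, so reject anything that is not a ts_... key.
    if not ts_api_key.startswith("ts_"):
        raise ValueError("Expected a TokenSaver API key (ts_...), not an LLM vendor key")
    return {"Authorization": f"Bearer {ts_api_key}"}

headers = tokensaver_headers("ts_example_key")
```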
Installation
PyPI package tokensaver-sdk. For editable installs from the monorepo or extra dev deps (e.g. python-dotenv), use pip install "tokensaver-sdk[dev]".
pip install tokensaver-sdk
# optional dev extras (dotenv, etc.)
pip install "tokensaver-sdk[dev]"
Base URL & environment
The client defaults to https://api.tokensaver.fr/api/v1 when base_url is omitted. Override the host for staging, self-hosted, or local APIs.
export TOKENSAVER_API_KEY=ts_... # or TS_API_KEY
# Optional — if unset, SDK uses the public production API root
export TOKENSAVER_API_URL=https://api.example.com/api/v1 # alias: TS_API_URL
from tokensaver_sdk import TokenSaver
ts = TokenSaver(api_key="ts_...") # default cloud API
ts = TokenSaver(api_key="ts_...", base_url="http://localhost:8000/api/v1")
In code, TokenSaver(api_key=..., base_url=...) always wins over process env for the base URL.
OpenAI-compatible HTTP
Third-party clients (OpenAI Python/JS SDK with a custom base_url, LangChain ChatOpenAI, chat UIs, etc.) can call the same TokenSaver pipeline on a dedicated prefix /openai/v1 on the API host (no trailing slash). Use your TokenSaver API key in Authorization: Bearer ts_... — not the LLM vendor key alone, and not the console session JWT.
Routes (same governance and module headers as documented in the console Docs):
- POST .../openai/v1/chat/completions — Chat Completions shape; SSE with chat.completion.chunk + [DONE] when stream: true.
- POST .../openai/v1/responses — OpenAI Responses API shape (input, instructions, …); SSE uses event: lines (response.output_text.delta, …).
- GET .../openai/v1/models and POST .../openai/v1/embeddings.
Machine-readable contract: GET https://api.tokensaver.fr/api/v1/openapi.json (paths under /openai/v1/...). Human-readable guide: workspace Docs → OpenAI-compatible API.
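A minimal sketch of the request shape for the first route, built as plain data so it works with any HTTP client (or with the official OpenAI SDK by pointing its base_url at the /openai/v1 prefix). The helper name is illustrative:

```python
def chat_completions_request(
    ts_api_key: str,
    model: str,
    user_text: str,
    api_host: str = "https://api.tokensaver.fr",
) -> tuple[str, dict, dict]:
    # Shape for POST {host}/openai/v1/chat/completions -- note: no trailing
    # slash on the prefix, and the TokenSaver key (ts_...) in Authorization,
    # not the LLM vendor key.
    url = f"{api_host}/openai/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {ts_api_key}",
        "Content-Type": "application/json",
    }
    body = {"model": model, "messages": [{"role": "user", "content": user_text}]}
    return url, headers, body

url, headers, body = chat_completions_request("ts_example", "gpt-4o", "Hello")
```

Add "stream": true to the body to receive chat.completion.chunk SSE events terminated by [DONE].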
Constants
Prefer SDK constants for chat history modes, error-code comparisons, and documented API roots / provider sets.
from tokensaver_sdk import (
HISTORY_NONE,
HISTORY_LOCAL,
HISTORY_SERVER,
DEFAULT_PUBLIC_API_BASE_URL,
HOSTED_SAAS_LLM_PROVIDERS,
API_PIPELINE_LLM_PROVIDERS,
RAG_UPLOAD_EXTENSIONS,
ERROR_RAG_FILE_NOT_FOUND,
ERROR_RAG_UNSUPPORTED_FILE_TYPE,
ERROR_HOSTED_LLM_PROVIDER,
ERROR_LLM_MODEL_NOT_SUPPORTED,
)
See Errors in the Python SDK reference for how HTTP detail payloads are normalized.
Python SDK reference
The tokensaver-sdk package is a thin client over the TokenSaver HTTP API. For ephemeral provider_api_key vs keys stored in Settings, read LLM provider keys at the top of this page. Then use the snippet below and jump to any method for its purpose, return type, and signature.
Stateless request (no history)
Use this for one-shot prompts where you do not want conversation memory.
from tokensaver_sdk import HISTORY_NONE, TokenSaver
ts = TokenSaver(api_key="ts_...")
result = ts.ask(
"Write 3 release-note bullets for v2.4.0.",
provider="openai",
model="gpt-4o",
history=HISTORY_NONE,
)
print(result.text)
RAG document upload
Ingest a file into your workspace knowledge base, then pass its document_id in rag_options with use_rag=True (same contract as the console pipeline). The SDK validates extensions before upload; unsupported paths raise ValidationError (RAG_UNSUPPORTED_FILE_TYPE).
Supported extensions (also available as RAG_UPLOAD_EXTENSIONS): pdf, txt, md, csv, json, docx.
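The pre-upload check can be sketched locally like this. It is an illustrative re-implementation for clarity, not the SDK's own code; the real client uses the RAG_UPLOAD_EXTENSIONS constant and raises ValidationError:

```python
from pathlib import Path

# Mirrors the documented set; in real code import RAG_UPLOAD_EXTENSIONS instead.
ALLOWED_EXTENSIONS = {"pdf", "txt", "md", "csv", "json", "docx"}

def check_upload_path(path: str) -> str:
    # Extension check is case-insensitive; unsupported paths fail before
    # any HTTP round-trip, matching RAG_UNSUPPORTED_FILE_TYPE behaviour.
    ext = Path(path).suffix.lstrip(".").lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"RAG_UNSUPPORTED_FILE_TYPE: .{ext}")
    return ext

check_upload_path("./report.PDF")  # "pdf"
```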
rag_ensure_document(path) uploads and waits for ingestion, or returns an existing ready document with the same filename when reuse_existing=True (default).
from tokensaver_sdk import TokenSaver
ts = TokenSaver(api_key="ts_...")
# PDF, DOCX, TXT, MD, CSV, JSON — same as console "+"
doc = ts.rag_ensure_document("./path/to/example_document.docx")
doc_id = str(doc["document_id"])
q1 = ts.ask(
"What is this document about? Answer in one short paragraph.",
provider="openai",
model="gpt-4o",
use_rag=True,
rag_similarity_threshold=0.55,
rag_options={"document_ids": [doc_id]},
)
print(q1.text)
q2 = ts.ask(
"List three key points from the document.",
provider="openai",
model="gpt-4o",
use_rag=True,
use_compression=True,
compression_level=4,
rag_options={"document_ids": [doc_id], "top_k": 8},
)
print(q2.text)
Lower-level upload (no wait)
meta = ts.rag_upload_document("./notes.md", name="notes.md")
ts.rag_wait_document_ready(meta["document_id"])
# Or one call:
# ready = ts.rag_upload_and_wait("./data/report.pdf")
Server chat + RAG (attach_knowledge)
For a persisted chat on TokenSaver (like the web console), create a server session and attach one or more document_id values. Every session.ask(...) then merges those IDs into rag_options automatically — the same pattern as attaching knowledge with "+" in the UI. Requires SDK ≥ 0.1.9.
from tokensaver_sdk import HISTORY_SERVER, TokenSaver
ts = TokenSaver(api_key="ts_...")
doc = ts.rag_ensure_document("./specs/api_overview.pdf")
doc_id = str(doc["document_id"])
session = ts.chat.session(history=HISTORY_SERVER, name="Support bot")
session.attach_knowledge(doc_id)
answer = session.ask(
"What authentication scheme does the API use?",
provider="openai",
model="gpt-4o",
use_rag=True,
rag_similarity_threshold=0.55,
rag_options={"top_k": 8}, # document_ids come from attach_knowledge
)
print(answer.text)
# Optional: stop merging attached IDs for the next turns
session.clear_knowledge()
You can still pass rag_options["document_ids"] explicitly on a single ask; they are merged with attached IDs (attached first, then deduplicated).
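The merge rule can be sketched as an order-preserving deduplication, attached IDs first. A local illustration of the behaviour described above (the helper is not part of the SDK):

```python
def merge_document_ids(attached: list[str], explicit: list[str]) -> list[str]:
    # Attached IDs come first, then the per-call IDs; duplicates are dropped
    # while keeping first-seen order.
    seen: set[str] = set()
    merged: list[str] = []
    for doc_id in [*attached, *explicit]:
        if doc_id not in seen:
            seen.add(doc_id)
            merged.append(doc_id)
    return merged

merge_document_ids(["doc-a", "doc-b"], ["doc-b", "doc-c"])  # ["doc-a", "doc-b", "doc-c"]
```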
History mode (server session)
Use SDK constants (`HISTORY_NONE`, `HISTORY_LOCAL`, `HISTORY_SERVER`) for safer and more maintainable code.
`HISTORY_NONE`: stateless call, no memory.
`HISTORY_LOCAL`: memory kept in SDK process.
`HISTORY_SERVER`: persisted chat on TokenSaver backend.
from tokensaver_sdk import HISTORY_LOCAL, HISTORY_NONE, HISTORY_SERVER, TokenSaver
ts = TokenSaver(api_key="ts_...")
# Stateless — no memory (same as ts.ask(..., history=HISTORY_NONE))
s0 = ts.chat.session(history=HISTORY_NONE)
out = s0.ask("Ping", provider="openai", model="gpt-4o")
# Server-side session — persisted transcript + chat_id
session = ts.chat.session(history=HISTORY_SERVER, name="Onboarding assistant")
first = session.ask("Draft a 5-step fintech onboarding checklist.", provider="openai", model="gpt-4o")
second = session.ask("Rewrite it for a non-technical operations manager.", provider="openai", model="gpt-4o")
print(first.text)
print(second.text)
page = session.messages(limit=50)
print(page["items"], page["next_cursor"])
# Local memory — history kept in the Python process only
local = ts.chat.session(history=HISTORY_LOCAL)
local.ask("My name is Alex.", provider="openai", model="gpt-4o")
local.ask("What is my name?", provider="openai", model="gpt-4o")
Cache-first request
Enable cache for repeated prompts. Optionally set cache_similarity_threshold (0–1) to match the semantic cache behavior you use in the console.
from tokensaver_sdk import TokenSaver
ts = TokenSaver(api_key="ts_...")
prompt = "Summarize the Q1 support incidents in exactly 4 bullets."
first = ts.ask(
prompt,
provider="openai",
model="gpt-4o",
use_cache=True,
cache_similarity_threshold=0.85,
)
second = ts.ask(
prompt,
provider="openai",
model="gpt-4o",
use_cache=True,
cache_similarity_threshold=0.85,
)
print("first cache_hit:", first.metrics.cache_hit)
print("second cache_hit:", second.metrics.cache_hit)
print("second cost:", second.metrics.cost_usd)
PII anonymization
Set use_pii_filter=True and pass pii_options for engine, strategy, confidence, and entities — same shape as the Pipeline settings in the console.
from tokensaver_sdk import TokenSaver
ts = TokenSaver(api_key="ts_...")
question = """
Draft a follow-up email for customer Jane Doe.
Her SSN is 123-45-6789, phone is +1 415 555 0199,
and card number is 4111 1111 1111 1111.
"""
masked = ts.ask(
question,
provider="openai",
model="gpt-4o",
use_pii_filter=True,
pii_options={
"engine": "gliner",
"strategy": "mask",
"confidence_threshold": 0.5,
"language": "en",
"regex_fallback": True,
},
)
print(masked.text)
# Example output:
# "Hello [REDACTED_NAME], we called you at [REDACTED_PHONE].
# Your verification token is linked to [REDACTED_SSN]."
Errors
The SDK maps backend error codes to typed exceptions. FastAPI often nests machine-readable fields under detail; the client unwraps error_code / message / details automatically. For RAG uploads, a missing local PDF raises ValidationError with RAG_FILE_NOT_FOUND (no HTTP round-trip). Wrong provider on the default hosted URL → HOSTED_LLM_PROVIDER. Unknown catalogue pair → LLM_MODEL_NOT_SUPPORTED (both as ValidationError). Connection issues surface as ServerError (NETWORK_ERROR).
import os
from tokensaver_sdk import (
ERROR_HOSTED_LLM_PROVIDER,
ERROR_LLM_MODEL_NOT_SUPPORTED,
ERROR_RAG_FILE_NOT_FOUND,
ERROR_RAG_UNSUPPORTED_FILE_TYPE,
TokenSaver,
)
from tokensaver_sdk.errors import (
AuthenticationError,
ProviderKeyMissingError,
QuotaExceededError,
RateLimitError,
ServerError,
TokenSaverError,
ValidationError,
)
ts = TokenSaver(api_key=os.environ["TOKENSAVER_API_KEY"])
try:
ts.ask("Hi", provider="openai", model="gpt-4o")
except AuthenticationError:
print("Invalid or revoked TokenSaver API key.")
except ProviderKeyMissingError:
print("Configure provider key in TokenSaver settings.")
except QuotaExceededError as e:
print(e.quota_dimension, e.limit, e.current_usage)
except RateLimitError as e:
print("Retry after:", e.retry_after_seconds)
except ValidationError as e:
if e.code == ERROR_RAG_FILE_NOT_FOUND:
print("RAG file path invalid:", e.raw)
elif e.code == ERROR_RAG_UNSUPPORTED_FILE_TYPE:
print("RAG extension not allowed (see RAG_UPLOAD_EXTENSIONS):", e.raw)
elif e.code == ERROR_HOSTED_LLM_PROVIDER:
print("Use provider='openai' on the default API URL, or set base_url for self-hosted.")
elif e.code == ERROR_LLM_MODEL_NOT_SUPPORTED:
print("Pick provider/model from GET /api/v1/llm-reference/models:", e.raw)
except ServerError as e:
if e.code == "NETWORK_ERROR":
print("Cannot reach API host — check base_url and network:", e.message)
except TokenSaverError as e:
print(e.code, e.message, e.status_code, e.request_id)
Full list: Python SDK reference → Errors.
Detailed metrics
Every `ask()` returns normalized metrics so you can track quality, performance, and savings per request.
result = ts.ask("Generate a one-paragraph incident summary.", provider="openai", model="gpt-4o")
print(result.text)
print("cost_usd:", result.metrics.cost_usd)
print("latency_ms:", result.metrics.latency_ms)
print("tokens_input:", result.metrics.tokens_input)
print("tokens_output:", result.metrics.tokens_output)
print("tokens_total:", result.metrics.tokens_total)
print("tokens_saved:", result.metrics.tokens_saved)
print("savings_ratio:", result.metrics.savings_ratio)
print("request_id:", result.trace.request_id)
print("history_mode:", result.context.history_mode)
Best practices
- Keep API keys in secure server environments only.
- Log `request_id` for production troubleshooting.
- Use explicit module flags for predictable behavior.
- Align thresholds and pii_options / rag_options with what you validate in the console.
- Use server-side chat sessions (HISTORY_SERVER) when several workers must share the same transcript; pair with attach_knowledge to mirror console “+” document attachment (SDK ≥ 0.1.9).
- Catch typed SDK exceptions instead of relying on raw HTTP status.
- Before shipping, sync provider / model with GET /api/v1/llm-reference/models (or this workspace’s console selectors) so you do not hit LLM_MODEL_NOT_SUPPORTED.
- Omit base_url for the default production API; set it only for staging, self-hosted, or local backends. A NETWORK_ERROR usually means the host is wrong or unreachable.
- An optional provider_api_key on ask / run_pipeline (or on TokenSaver(...)) sends a per-request LLM secret that overrides organisation keys for that run only; it is never stored. The console UI does not use this field.