TokenSaver SDK

API Reference

v0.1.10 · Python ≥ 3.10

Build governed LLM workflows with the open-source tokensaver-sdk package. Start with LLM provider keys to choose between ephemeral keys passed through the SDK and encrypted keys stored in Settings (a LangChain / LangGraph–friendly split). All governance (workspace, quotas, pipeline entitlements, metering, console visibility) is enforced through the TokenSaver API key (ts_...), not the LLM vendor key. When you omit base_url, the client defaults to https://api.tokensaver.fr/api/v1; pass base_url only to target another deployment. Provider and model must match an active row in the platform LLM catalogue (GET /api/v1/llm-reference/models); otherwise the API returns 400 with LLM_MODEL_NOT_SUPPORTED. On the default hosted API URL, the Python SDK also restricts provider to openai before any request (HOSTED_LLM_PROVIDER), the same product rule as LLM keys in Settings. The Python SDK reference section documents every method; Pipeline options lists the same request fields as the console (thresholds, RAG, cache, PII, compression). For integrators using the OpenAI-compatible prefix /openai/v1 (chat/completions, responses, models, embeddings), see OpenAI-compatible HTTP. The examples below cover common Python SDK patterns.

LLM provider keys

TokenSaver needs a valid key for the LLM vendor you select (provider / model). You can supply that secret in two ways—matching how teams already work with LangChain and LangGraph: pass keys at integration time for ephemeral runs, or centralise them for governance and compliance.

Governance is always tied to the TokenSaver API key. Every authenticated request is scoped with Authorization: Bearer <ts_...>. Quotas, plan limits, workspace binding, pipeline module flags, cost and token accounting, and what appears in the platform console all flow from that key. The LLM provider secret (ephemeral or stored) only authorises the outbound call to the vendor—it does not replace TokenSaver for policy or audit.

Ephemeral (SDK, stateless-style)

The Python SDK can send provider_api_key on each POST /pipelines/run (constructor default or per ask / run_pipeline). The backend uses it only for that request and never persists it—similar to setting api_key=... on official LangChain chat models or wiring secrets through LangGraph runtime config: the secret stays in your process or secret manager and transits over HTTPS for the call.

Best when you already inject provider keys from env/Vault in CI, agents, or notebooks and want TokenSaver governance without storing vendor keys on the platform.

Stored, encrypted (console)

In the web console, Settings → LLM provider keys lets you register keys per organisation. They are encrypted at rest and reused for every pipeline run until you rotate or remove them—no provider secret in each JSON payload. This fits stricter security and compliance programmes: fewer moving parts in client code, central rotation, and alignment with “secrets in a vault, not in repos” policies while LangChain/LangGraph apps still call TokenSaver with only the TokenSaver API key.

The console UI does not send provider_api_key; it relies on stored keys.

Hosted console (demo / Free plan)

Only OpenAI keys can be configured in Settings today. The Python SDK uses the same rule when base_url is the default public API. Additional vendors will follow as the product expands; use a custom base_url for self-hosted stacks that already enable other providers.

If both are available, an explicit provider_api_key from the SDK takes precedence for that run. Omit it to fall back to encrypted keys from Settings. See Pipeline options in the Python SDK reference for the parameter name and Best practices for transport and key hygiene.

import os
from tokensaver_sdk import TokenSaver

# Ephemeral provider key (optional)—same idea as LangChain ChatOpenAI(api_key=os.environ["OPENAI_API_KEY"])
ts = TokenSaver(api_key="ts_...", provider_api_key=os.environ["OPENAI_API_KEY"])
ts.ask("Hello", provider="openai", model="gpt-4o")

Overview

The SDK is an API client, not a local orchestration engine. All governance logic (quotas, module entitlements, cost and token accounting, security checks) is enforced server-side by TokenSaver and scoped to your TokenSaver API key (ts_...).

Every activity executed through the SDK is governed and surfaced in the TokenSaver platform console, giving teams a 360-degree view of LLM usage, costs, safeguards, and operational decisions in a single place for consistent governance.

One API shape

Same provider / model parameters across vendors. The catalogue and active flags live in the platform database; on the hosted default URL, the Python SDK is OpenAI-only.

Three history modes

Stateless, local memory, or server-side persisted sessions.

Console parity

Tune cache, RAG, compression, and PII with the same JSON body as POST /pipelines/run in the UI.

LLM model catalogue

Every POST /api/v1/pipelines/run and POST /api/v1/pricing/estimate must use a provider + model pair that exists in the TokenSaver reference tables (llm_providers, llm_models) with is_active. Otherwise the API responds with 400 and error_code: LLM_MODEL_NOT_SUPPORTED (details.provider, details.model). Chats created with both fields set are validated the same way (web UI and POST /api/v1/sdk/chats).

Discover allowed pairs (no auth required):

GET https://api.tokensaver.fr/api/v1/llm-reference/models?active_only=true
# Optional: ?provider=openai

The Python SDK maps this error to ValidationError with code=LLM_MODEL_NOT_SUPPORTED (ERROR_LLM_MODEL_NOT_SUPPORTED). Maintainer doc: docs/REFERENTIEL-LLM.md.
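As a sketch of that check on the client side, you can pre-validate a provider / model pair against the catalogue before calling the pipeline. The response shape assumed below ({"models": [{"provider": ..., "model": ..., "is_active": ...}]}) is an illustration, not the documented contract — verify it against the live endpoint:

```python
# Sketch: pre-validate a provider/model pair against the LLM catalogue.
# The payload shape here is an assumption -- check the real response of
# GET /api/v1/llm-reference/models before relying on it.

def is_supported(catalogue: dict, provider: str, model: str) -> bool:
    """Return True if the pair appears as an active row in the catalogue payload."""
    return any(
        row.get("provider") == provider
        and row.get("model") == model
        and row.get("is_active", True)
        for row in catalogue.get("models", [])
    )

# In practice, fetch the payload first (no auth required), e.g. with httpx:
#   import httpx
#   catalogue = httpx.get(
#       "https://api.tokensaver.fr/api/v1/llm-reference/models",
#       params={"active_only": "true"},
#   ).json()
```

Running the check before ask avoids a round-trip that would end in a 400 LLM_MODEL_NOT_SUPPORTED.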

Authentication

Authenticate every request with your TokenSaver API key. That key is the sole anchor for governance on the platform (workspace, quotas, usage, console reporting). Keep it server-side and never expose it in browser code.

Authorization: Bearer <TOKEN_SAVER_KEY>

Installation

PyPI package tokensaver-sdk. For editable installs from the monorepo or extra dev deps (e.g. python-dotenv), use pip install "tokensaver-sdk[dev]".

pip install tokensaver-sdk
# optional dev extras (dotenv, etc.)
pip install "tokensaver-sdk[dev]"

Base URL & environment

The client defaults to https://api.tokensaver.fr/api/v1 when base_url is omitted. Override the host for staging, self-hosted, or local APIs.

export TOKENSAVER_API_KEY=ts_...        # or TS_API_KEY
# Optional — if unset, SDK uses the public production API root
export TOKENSAVER_API_URL=https://api.example.com/api/v1   # alias: TS_API_URL

from tokensaver_sdk import TokenSaver

ts = TokenSaver(api_key="ts_...")  # default cloud API
ts = TokenSaver(api_key="ts_...", base_url="http://localhost:8000/api/v1")

In code, TokenSaver(api_key=..., base_url=...) always wins over process env for the base URL.

OpenAI-compatible HTTP

Third-party clients (OpenAI Python/JS SDK with a custom base_url, LangChain ChatOpenAI, chat UIs, etc.) can call the same TokenSaver pipeline on a dedicated prefix /openai/v1 on the API host (no trailing slash). Use your TokenSaver API key in Authorization: Bearer ts_... — not the LLM vendor key alone, and not the console session JWT.

Routes (same governance and module headers as documented in the console Docs):

  • POST .../openai/v1/chat/completions — Chat Completions shape; SSE with chat.completion.chunk + [DONE] when stream: true.
  • POST .../openai/v1/responses — OpenAI Responses API shape (input, instructions, …); SSE uses event: lines (response.output_text.delta, …).
  • GET .../openai/v1/models and POST .../openai/v1/embeddings.

Machine-readable contract: GET https://api.tokensaver.fr/api/v1/openapi.json (paths under /openai/v1/...). Human-readable guide: workspace Docs → OpenAI-compatible API.
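As a sketch, a Chat Completions call on this prefix reduces to a standard OpenAI-shaped POST; the helper below only builds the URL, headers, and body (sending is left to whatever HTTP client or OpenAI-compatible SDK you already use):

```python
# Sketch: assemble an OpenAI-compatible Chat Completions request for TokenSaver.
API_HOST = "https://api.tokensaver.fr"

def build_chat_completions_request(api_key: str, model: str, user_prompt: str):
    url = f"{API_HOST}/openai/v1/chat/completions"  # dedicated prefix, no trailing slash
    headers = {
        # TokenSaver key (ts_...), not the LLM vendor key, not the console JWT
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
        # "stream": True,  # SSE with chat.completion.chunk events + [DONE]
    }
    return url, headers, body

# e.g. with httpx:
#   import httpx
#   url, headers, body = build_chat_completions_request("ts_...", "gpt-4o", "Hello")
#   resp = httpx.post(url, headers=headers, json=body)
```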

Constants

Prefer SDK constants for chat history modes, error-code comparisons, and documented API roots / provider sets.

from tokensaver_sdk import (
    HISTORY_NONE,
    HISTORY_LOCAL,
    HISTORY_SERVER,
    DEFAULT_PUBLIC_API_BASE_URL,
    HOSTED_SAAS_LLM_PROVIDERS,
    API_PIPELINE_LLM_PROVIDERS,
    RAG_UPLOAD_EXTENSIONS,
    ERROR_RAG_FILE_NOT_FOUND,
    ERROR_RAG_UNSUPPORTED_FILE_TYPE,
    ERROR_HOSTED_LLM_PROVIDER,
    ERROR_LLM_MODEL_NOT_SUPPORTED,
)

See Errors in the Python SDK reference for how HTTP detail payloads are normalized.

Python SDK reference

The tokensaver-sdk package is a thin client over the TokenSaver HTTP API. For ephemeral provider_api_key vs keys stored in Settings, read LLM provider keys at the top of this page. Then use the snippet below and jump to any method for its purpose, return type, and signature.

Test a basic request

Create a client with your API key and call ask — the smallest path from install to a model response. You do not need base_url unless you target a non-default host (see Client).

from tokensaver_sdk import TokenSaver
 
ts = TokenSaver(api_key="ts_...")
out = ts.ask(
    "Say hello in one sentence.",
    provider="openai",
    model="gpt-4o",
)
print(out.text)

Run with: python main.py (after pip install tokensaver-sdk).

Client

TokenSaver (alias TokenSaverClient) holds your API key and base URL, and exposes chat for session helpers.

TokenSaver.__init__(base_url=None, api_key=None, *, provider_api_key=None, timeout_total=30, connect_timeout=5, read_timeout=25, max_retries=2, headers=None)

Builds the client. Omit base_url to use the built-in production API root (https://api.tokensaver.fr/api/v1). Optional provider_api_key: default LLM provider secret sent on every pipeline run (overrides org keys for that run only; never stored). Raises ValueError if api_key is missing.

Returns
TokenSaver
ts = TokenSaver(api_key="ts_...")
ts = TokenSaver(api_key="ts_...", provider_api_key="sk-...")
ts = TokenSaver("https://api.example.com/api/v1", "ts_...")
ts = TokenSaver(base_url="http://localhost:8000/api/v1", api_key="ts_...")

Attributes

  • base_url — Normalized API root (no trailing slash).
  • api_key — Same string you passed in (sent as Bearer).
  • chat — Use ts.chat.session(...) for chat flows (see Chat sessions).

Pipeline & responses

These methods call the governed pipeline (cache, RAG, compression, PII, etc.) according to your flags and org settings.

Hosted API (default base URL)

On https://api.tokensaver.fr/api/v1, use provider="openai" only — same as the console (demo / Free): Anthropic, Google (Gemini), and Mistral keys are not enabled in the UI yet. The SDK rejects other provider codes on that URL with ValidationError (HOSTED_LLM_PROVIDER). Point base_url at a self-hosted stack to use additional providers where the backend allows them.

ask(...)

Recommended entry point: same JSON body as run_pipeline. Returns RunResult (.text, metrics, trace). Optional provider_api_key per call (overrides client default and org DB keys for that run only). History: pass chat_id with HISTORY_SERVER or chat_history with HISTORY_LOCAL. Module tuning (same as console): temperature; rag_similarity_threshold; cache_similarity_threshold; compression_level (1–5); rag_options { document_ids, top_k, query_image_url }; pii_options { engine, strategy, confidence_threshold, entity_types, language, regex_fallback }; optional context_layers or legacy system_prompt / profile_context / workspace_instructions.

HTTP
POST /pipelines/run
Returns
RunResult
result = ts.ask(
    "Summarize this in 3 bullets.",
    provider="openai",
    model="gpt-4o",
    use_rag=True,
    rag_similarity_threshold=0.55,
    rag_options={"document_ids": ["<document_id>"], "top_k": 8},
)
print(result.text, result.metrics.cost_usd)

run_pipeline(...)

Lower-level: raw API JSON. Accepts the same optional module fields as ask (temperature, thresholds, compression_level, rag_options, pii_options, context_layers, legacy instruction fields, provider_api_key).

HTTP
POST /pipelines/run
Returns
dict

estimate_cost(prompt_tokens, completion_tokens, provider, model)

Ask TokenSaver for an estimated cost from token counts — no LLM call.

HTTP
POST /pricing/estimate
Returns
dict

Pipeline request options (console parity)

ask and run_pipeline accept the same optional JSON fields as the TokenSaver console sends to POST /pipelines/run. Omit a field to use server defaults. The console does not send provider_api_key; the SDK can (see the provider_api_key entry below).

  • temperature — LLM temperature (0–2).
  • use_cache, use_rag, use_compression, use_pii_filter, stream — Enable pipeline modules (booleans).
  • rag_similarity_threshold — RAG retrieval similarity floor (0–1).
  • cache_similarity_threshold — Semantic cache similarity threshold (0–1).
  • compression_level — Compression strength 1–5 when compression is on.
  • provider_api_key — SDK only. Per-request LLM provider secret; takes precedence over organisation keys in the database for that run; never stored. Set on TokenSaver(..., provider_api_key=...) or pass to ask / run_pipeline.
  • rag_options — Dict: document_ids, top_k, query_image_url.
  • pii_options — Dict: engine, strategy, confidence_threshold, entity_types, language, regex_fallback.
  • context_layers — Structured instruction / knowledge / interaction layers (canonical API).
  • system_prompt, profile_context, workspace_instructions — Legacy flat instruction fields (when not using context_layers).
  • chat_id, chat_history — Session routing (see history modes).

Types RagOptions and PiiOptions (TypedDict) are exported from tokensaver_sdk for editor hints.
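At runtime these options are plain dicts; RagOptions / PiiOptions only add editor hints. A minimal sketch of one options dict covering the fields above (the specific values are illustrative, not recommended defaults):

```python
# Sketch: console-parity options for POST /pipelines/run as a single dict.
# Values below are illustrative examples, not recommended defaults.

def pipeline_options(doc_ids: list[str]) -> dict:
    return {
        "temperature": 0.2,
        "use_cache": True,
        "cache_similarity_threshold": 0.85,
        "use_rag": True,
        "rag_similarity_threshold": 0.55,
        "rag_options": {"document_ids": doc_ids, "top_k": 8},
        "use_compression": True,
        "compression_level": 3,  # 1-5
        "use_pii_filter": True,
        "pii_options": {"engine": "gliner", "strategy": "mask", "language": "en"},
    }

# Then spread into a run:
#   ts.ask("...", provider="openai", model="gpt-4o", **pipeline_options(["<document_id>"]))
```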

HTTP helpers

Authenticated httpx calls with retries on transient errors. Paths are relative to base_url (e.g. "rag/documents").

get(path, **kwargs) → Response

GET with Authorization header and JSON Accept.

post(path, **kwargs) → Response

POST JSON by default; use httpx kwargs for custom bodies.

delete(path, **kwargs) → Response

Used by ChatSession.close() for server chats.

RAG documents

Upload supported files to your workspace, wait for ingestion, then pass document_ids in rag_options on ask(..., use_rag=True) — or use ChatSession.attach_knowledge on server chats. Allowed extensions match the console and RAG_UPLOAD_EXTENSIONS (pdf, txt, md, csv, json, docx). Wrong extension → ValidationError (RAG_UNSUPPORTED_FILE_TYPE) before any HTTP call.
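The pre-upload validation described above can be sketched as two local checks; the check order and exception type here are an approximation of the SDK's behaviour (the real client raises ValidationError with these codes):

```python
from pathlib import Path

# Mirrors the documented RAG_UPLOAD_EXTENSIONS constant.
RAG_UPLOAD_EXTENSIONS = {"pdf", "txt", "md", "csv", "json", "docx"}

def check_rag_path(file_path: str) -> Path:
    """Sketch of the SDK's client-side checks before any HTTP call:
    missing path -> RAG_FILE_NOT_FOUND, bad extension -> RAG_UNSUPPORTED_FILE_TYPE.
    The real SDK raises ValidationError; ValueError stands in here."""
    path = Path(file_path)
    if not path.is_file():
        raise ValueError("RAG_FILE_NOT_FOUND")
    if path.suffix.lstrip(".").lower() not in RAG_UPLOAD_EXTENSIONS:
        raise ValueError("RAG_UNSUPPORTED_FILE_TYPE")
    return path
```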

rag_list_documents()

Lists ingested documents for the current API key / workspace (newest first).

HTTP
GET /rag/documents
Returns
dict with key documents

rag_upload_document(file_path, *, name=None, description=None)

Multipart upload only; does not wait for chunking. Raises ValidationError (RAG_FILE_NOT_FOUND) if the path is missing or not a file; RAG_UNSUPPORTED_FILE_TYPE if the extension is not allowed.

HTTP
POST /rag/documents
Returns
dict (includes document_id)

rag_get_document(document_id)

Fetch status, chunk counts, and metadata for one document.

HTTP
GET /rag/documents/{id}
Returns
dict

rag_wait_document_ready(document_id, *, timeout_seconds=90, poll_interval_seconds=2)

Polls until status is done or ingested, or raises on error / timeout.

Returns
dict
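Conceptually, the polling loop looks like the sketch below (the real method lives in the SDK; fetch_status is an injected stand-in for ts.rag_get_document, and the status values come from this section):

```python
import time

def wait_ready(fetch_status, *, timeout_seconds=90, poll_interval_seconds=2,
               _sleep=time.sleep):
    """Sketch of rag_wait_document_ready: poll until status is 'done' or
    'ingested', raise on 'error' or timeout. fetch_status() returns the
    document dict (e.g. lambda: ts.rag_get_document(doc_id))."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        doc = fetch_status()
        status = doc.get("status")
        if status in ("done", "ingested"):
            return doc
        if status == "error":
            raise RuntimeError(f"ingestion failed: {doc}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"document not ready after {timeout_seconds}s")
        _sleep(poll_interval_seconds)
```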

rag_upload_and_wait(file_path, *, name=None, description=None, timeout_seconds=90, poll_interval_seconds=2)

Upload then block until ingestion completes. Same ValidationError (RAG_FILE_NOT_FOUND) as rag_upload_document if the local file is missing.

Returns
dict

rag_ensure_document(file_path, *, reuse_existing=True, name=None, description=None, timeout_seconds=90, poll_interval_seconds=2)

If a document with the same filename already exists and is ready, returns it without re-uploading. If pending, waits. Otherwise uploads and waits (missing local file → RAG_FILE_NOT_FOUND; bad extension → RAG_UNSUPPORTED_FILE_TYPE). Set reuse_existing=False to always send a new file.

Returns
dict

ChatSession.attach_knowledge

On a HISTORY_SERVER session, remember one or more RAG document_id strings. Each ask merges them into rag_options["document_ids"] (deduped with any IDs you pass explicitly). Same workflow as attaching knowledge from the console chat "+" menu. Available in SDK 0.1.9+.

from tokensaver_sdk import HISTORY_SERVER, TokenSaver
 
ts = TokenSaver(api_key="ts_...")
doc_id = str(ts.rag_ensure_document("./handbook.pdf")["document_id"])
 
session = ts.chat.session(history=HISTORY_SERVER, name="Docs Q&A")
session.attach_knowledge(doc_id)
out = session.ask(
    "What is the refund policy?",
    provider="openai",
    model="gpt-4o",
    use_rag=True,
    rag_options={"top_k": 6},
)
print(out.text)

ChatSession.attach_knowledge(*document_ids: str) → None

Append non-empty document IDs to the session list (no HTTP call). IDs are sent on subsequent ask() calls via merged rag_options.

ChatSession.clear_knowledge() → None

Remove all IDs added with attach_knowledge (in-memory only).

Chat sessions

Use HISTORY_SERVER for chats stored in TokenSaver, or HISTORY_LOCAL for in-process memory.

ts.chat.session(*, history=HISTORY_NONE, name='New Chat', provider=None, model=None) → ChatSession

When history is HISTORY_SERVER, creates a chat via POST /sdk/chats (Idempotency-Key set automatically) and returns a ChatSession with chat_id.

HTTP
POST /sdk/chats (server mode)
Returns
ChatSession

ChatSession

  • ask(prompt, **kwargs) → RunResult — forwards to TokenSaver.ask; in HISTORY_LOCAL, appends turns to memory. Merges attach_knowledge IDs into rag_options before each call.
  • attach_knowledge(*document_ids), clear_knowledge() — RAG document IDs for server/local/none sessions (merge behaviour applies whenever ask runs).
  • messages(limit=50, cursor=None, order="asc") → dict — server transcript when HISTORY_SERVER.
  • close() — deletes the server chat when applicable.

Return types

Dataclasses returned by ask and attached to session calls.

RunResult

text, raw, metrics (MetricsView), trace (TraceView), context (ContextView). Method to_dict().

MetricsView

cost_usd, latency_ms, cache_hit, tokens_*, savings_ratio (all optional).

TraceView

request_id, provider, model.

ContextView

history_mode, chat_id, layers_used.

Errors

Import from tokensaver_sdk.errors. All subclasses expose code, message, status_code, request_id, raw.

  • TokenSaverError — base class.
  • AuthenticationError, ValidationError, ServerError, TimeoutError
  • Transport / DNS ServerError with code=NETWORK_ERROR when the host cannot be reached (wrong base_url, DNS failure, TLS, connection refused). Not an HTTP response from TokenSaver.
  • ProviderKeyMissingError — exposes an extra provider field.
  • Hosted default URL ValidationError with code=HOSTED_LLM_PROVIDER (ERROR_HOSTED_LLM_PROVIDER) if provider is not OpenAI on the public API root (see Pipeline above).
  • Model catalogue ValidationError with code=LLM_MODEL_NOT_SUPPORTED (ERROR_LLM_MODEL_NOT_SUPPORTED) if provider / model are not an active row in the platform LLM reference (same rule as GET /api/v1/llm-reference/models).
  • QuotaExceededError — quota_dimension, limit, current_usage, retry_after_seconds
  • RateLimitError — retry_after_seconds
  • Client-side RAG path checks — rag_upload_document / rag_upload_and_wait / rag_ensure_document raise ValidationError with code=RAG_FILE_NOT_FOUND if the path is missing, or code=RAG_UNSUPPORTED_FILE_TYPE (ERROR_RAG_UNSUPPORTED_FILE_TYPE) if the extension is not in RAG_UPLOAD_EXTENSIONS. Compare against ERROR_RAG_FILE_NOT_FOUND; raw["path"] may hold the resolved path.

Package imports

Public surface matches tokensaver_sdk.__all__ (stable imports for docs and IDEs).

from tokensaver_sdk import (
    TokenSaver,
    TokenSaverClient,
    API_PIPELINE_LLM_PROVIDERS,
    DEFAULT_PUBLIC_API_BASE_URL,
    HOSTED_SAAS_LLM_PROVIDERS,
    RAG_UPLOAD_EXTENSIONS,
    mime_type_for_rag_filename,
    ERROR_HOSTED_LLM_PROVIDER,
    ERROR_LLM_MODEL_NOT_SUPPORTED,
    ERROR_RAG_FILE_NOT_FOUND,
    ERROR_RAG_UNSUPPORTED_FILE_TYPE,
    HISTORY_NONE,
    HISTORY_LOCAL,
    HISTORY_SERVER,
    HistoryMode,
    RunResult,
    MetricsView,
    TraceView,
    ContextView,
    RagOptions,
    PiiOptions,
    TokenSaverError,
    AuthenticationError,
    ProviderKeyMissingError,
    QuotaExceededError,
    RateLimitError,
    ValidationError,
    ServerError,
    TimeoutError,
)
import tokensaver_sdk
 
print(tokensaver_sdk.__version__)

History constants

HISTORY_NONE    # "none"   — no memory
HISTORY_LOCAL   # "local"  — SDK process memory
HISTORY_SERVER  # "server" — persisted on TokenSaver

Stateless request (no history)

Use this for one-shot prompts where you do not want conversation memory.

from tokensaver_sdk import HISTORY_NONE, TokenSaver

ts = TokenSaver(api_key="ts_...")
result = ts.ask(
    "Write 3 release-note bullets for v2.4.0.",
    provider="openai",
    model="gpt-4o",
    history=HISTORY_NONE,
)

print(result.text)

RAG document upload

Ingest a file into your workspace knowledge base, then pass its document_id in rag_options with use_rag=True (same contract as the console pipeline). The SDK validates extensions before upload; unsupported paths raise ValidationError (RAG_UNSUPPORTED_FILE_TYPE).

Supported extensions (also available as RAG_UPLOAD_EXTENSIONS): pdf, txt, md, csv, json, docx.

rag_ensure_document(path) uploads and waits for ingestion, or returns an existing ready document with the same filename when reuse_existing=True (default).

from tokensaver_sdk import TokenSaver

ts = TokenSaver(api_key="ts_...")

# PDF, DOCX, TXT, MD, CSV, JSON — same as console "+"
doc = ts.rag_ensure_document("./path/to/example_document.docx")
doc_id = str(doc["document_id"])

q1 = ts.ask(
    "What is this document about? Answer in one short paragraph.",
    provider="openai",
    model="gpt-4o",
    use_rag=True,
    rag_similarity_threshold=0.55,
    rag_options={"document_ids": [doc_id]},
)
print(q1.text)

q2 = ts.ask(
    "List three key points from the document.",
    provider="openai",
    model="gpt-4o",
    use_rag=True,
    use_compression=True,
    compression_level=4,
    rag_options={"document_ids": [doc_id], "top_k": 8},
)
print(q2.text)

Lower-level upload (no wait)

meta = ts.rag_upload_document("./notes.md", name="notes.md")
ts.rag_wait_document_ready(meta["document_id"])

# Or one call:
# ready = ts.rag_upload_and_wait("./data/report.pdf")

Server chat + RAG (attach_knowledge)

For a persisted chat on TokenSaver (like the web console), create a server session and attach one or more document_id values. Every session.ask(...) then merges those IDs into rag_options automatically — the same pattern as attaching knowledge with "+" in the UI. Requires SDK ≥ 0.1.9.

from tokensaver_sdk import HISTORY_SERVER, TokenSaver

ts = TokenSaver(api_key="ts_...")
doc = ts.rag_ensure_document("./specs/api_overview.pdf")
doc_id = str(doc["document_id"])

session = ts.chat.session(history=HISTORY_SERVER, name="Support bot")
session.attach_knowledge(doc_id)

answer = session.ask(
    "What authentication scheme does the API use?",
    provider="openai",
    model="gpt-4o",
    use_rag=True,
    rag_similarity_threshold=0.55,
    rag_options={"top_k": 8},  # document_ids come from attach_knowledge
)
print(answer.text)

# Optional: stop merging attached IDs for the next turns
session.clear_knowledge()

You can still pass rag_options["document_ids"] explicitly on a single ask; they are merged with attached IDs (attached first, then deduplicated).
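That merge (attached IDs first, then explicit IDs, order-preserving dedupe) can be sketched as a small helper; the real ChatSession internals may differ, but the behaviour matches what is described above:

```python
def merge_document_ids(attached: list[str], explicit: list[str]) -> list[str]:
    """Attached-first, order-preserving dedupe of RAG document IDs.
    Empty strings are dropped."""
    seen: set[str] = set()
    merged: list[str] = []
    for doc_id in [*attached, *explicit]:
        if doc_id and doc_id not in seen:
            seen.add(doc_id)
            merged.append(doc_id)
    return merged
```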

History modes

Use SDK constants (`HISTORY_NONE`, `HISTORY_LOCAL`, `HISTORY_SERVER`) for safer and more maintainable code.

`HISTORY_NONE`: stateless call, no memory.

`HISTORY_LOCAL`: memory kept in SDK process.

`HISTORY_SERVER`: persisted chat on TokenSaver backend.

from tokensaver_sdk import HISTORY_LOCAL, HISTORY_NONE, HISTORY_SERVER, TokenSaver

ts = TokenSaver(api_key="ts_...")

# Stateless — no memory (same as ts.ask(..., history=HISTORY_NONE))
s0 = ts.chat.session(history=HISTORY_NONE)
out = s0.ask("Ping", provider="openai", model="gpt-4o")

# Server-side session — persisted transcript + chat_id
session = ts.chat.session(history=HISTORY_SERVER, name="Onboarding assistant")
first = session.ask("Draft a 5-step fintech onboarding checklist.", provider="openai", model="gpt-4o")
second = session.ask("Rewrite it for a non-technical operations manager.", provider="openai", model="gpt-4o")
print(first.text)
print(second.text)

page = session.messages(limit=50)
print(page["items"], page["next_cursor"])

# Local memory — history kept in the Python process only
local = ts.chat.session(history=HISTORY_LOCAL)
local.ask("My name is Alex.", provider="openai", model="gpt-4o")
local.ask("What is my name?", provider="openai", model="gpt-4o")

Cache-first request

Enable cache for repeated prompts. Optionally set cache_similarity_threshold (0–1) to match the semantic cache behavior you use in the console.

from tokensaver_sdk import TokenSaver

ts = TokenSaver(api_key="ts_...")
prompt = "Summarize the Q1 support incidents in exactly 4 bullets."

first = ts.ask(
    prompt,
    provider="openai",
    model="gpt-4o",
    use_cache=True,
    cache_similarity_threshold=0.85,
)
second = ts.ask(
    prompt,
    provider="openai",
    model="gpt-4o",
    use_cache=True,
    cache_similarity_threshold=0.85,
)

print("first cache_hit:", first.metrics.cache_hit)
print("second cache_hit:", second.metrics.cache_hit)
print("second cost:", second.metrics.cost_usd)

PII anonymization

Set use_pii_filter=True and pass pii_options for engine, strategy, confidence, and entities — same shape as the Pipeline settings in the console.

from tokensaver_sdk import TokenSaver

ts = TokenSaver(api_key="ts_...")
question = """
Draft a follow-up email for customer Jane Doe.
Her SSN is 123-45-6789, phone is +1 415 555 0199,
and card number is 4111 1111 1111 1111.
"""

masked = ts.ask(
    question,
    provider="openai",
    model="gpt-4o",
    use_pii_filter=True,
    pii_options={
        "engine": "gliner",
        "strategy": "mask",
        "confidence_threshold": 0.5,
        "language": "en",
        "regex_fallback": True,
    },
)

print(masked.text)
# Example output (illustrative):
# "Hello [REDACTED_NAME], we called you at [REDACTED_PHONE].
# Your verification token is linked to [REDACTED_SSN]."

Errors

The SDK maps backend error codes to typed exceptions. FastAPI often nests machine-readable fields under detail; the client unwraps error_code / message / details automatically. For RAG uploads, a missing local PDF raises ValidationError with RAG_FILE_NOT_FOUND (no HTTP round-trip). Wrong provider on the default hosted URL → HOSTED_LLM_PROVIDER. Unknown catalogue pair → LLM_MODEL_NOT_SUPPORTED (both as ValidationError). Connection issues surface as ServerError (NETWORK_ERROR).

import os
from tokensaver_sdk import (
    ERROR_HOSTED_LLM_PROVIDER,
    ERROR_LLM_MODEL_NOT_SUPPORTED,
    ERROR_RAG_FILE_NOT_FOUND,
    ERROR_RAG_UNSUPPORTED_FILE_TYPE,
    TokenSaver,
)
from tokensaver_sdk.errors import (
    AuthenticationError,
    ProviderKeyMissingError,
    QuotaExceededError,
    RateLimitError,
    ServerError,
    TokenSaverError,
    ValidationError,
)

ts = TokenSaver(api_key=os.environ["TOKENSAVER_API_KEY"])

try:
    ts.ask("Hi", provider="openai", model="gpt-4o")
except AuthenticationError:
    print("Invalid or revoked TokenSaver API key.")
except ProviderKeyMissingError:
    print("Configure provider key in TokenSaver settings.")
except QuotaExceededError as e:
    print(e.quota_dimension, e.limit, e.current_usage)
except RateLimitError as e:
    print("Retry after:", e.retry_after_seconds)
except ValidationError as e:
    if e.code == ERROR_RAG_FILE_NOT_FOUND:
        print("RAG file path invalid:", e.raw)
    elif e.code == ERROR_RAG_UNSUPPORTED_FILE_TYPE:
        print("RAG extension not allowed (see RAG_UPLOAD_EXTENSIONS):", e.raw)
    elif e.code == ERROR_HOSTED_LLM_PROVIDER:
        print("Use provider='openai' on the default API URL, or set base_url for self-hosted.")
    elif e.code == ERROR_LLM_MODEL_NOT_SUPPORTED:
        print("Pick provider/model from GET /api/v1/llm-reference/models:", e.raw)
except ServerError as e:
    if e.code == "NETWORK_ERROR":
        print("Cannot reach API host — check base_url and network:", e.message)
except TokenSaverError as e:
    print(e.code, e.message, e.status_code, e.request_id)

Full list: Python SDK reference → Errors.

Detailed metrics

Every `ask()` returns normalized metrics so you can track quality, performance, and savings per request.

result = ts.ask("Generate a one-paragraph incident summary.", provider="openai", model="gpt-4o")

print(result.text)
print("cost_usd:", result.metrics.cost_usd)
print("latency_ms:", result.metrics.latency_ms)
print("tokens_input:", result.metrics.tokens_input)
print("tokens_output:", result.metrics.tokens_output)
print("tokens_total:", result.metrics.tokens_total)
print("tokens_saved:", result.metrics.tokens_saved)
print("savings_ratio:", result.metrics.savings_ratio)
print("request_id:", result.trace.request_id)
print("history_mode:", result.context.history_mode)

Best practices

  • Keep API keys in secure server environments only.
  • Log `request_id` for production troubleshooting.
  • Use explicit module flags for predictable behavior.
  • Align thresholds and pii_options / rag_options with what you validate in the console.
  • Use server-side chat sessions (HISTORY_SERVER) when several workers must share the same transcript; pair with attach_knowledge to mirror console “+” document attachment (SDK ≥ 0.1.9).
  • Catch typed SDK exceptions instead of relying on raw HTTP status.
  • Before shipping, sync provider / model with GET /api/v1/llm-reference/models (or this workspace’s console selectors) so you do not hit LLM_MODEL_NOT_SUPPORTED.
  • Omit base_url for the default production API; set it only for staging, self-hosted, or local backends. A NETWORK_ERROR usually means the host is wrong or unreachable.
  • Optional provider_api_key on ask / run_pipeline (or on TokenSaver(...)) sends a per-request LLM secret that overrides organisation keys for that run only; it is never stored. The console UI does not use this field.
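As an illustration of honouring retry_after_seconds on rate limits, a retry wrapper might look like the sketch below. The exception class is injected so the snippet stays self-contained; in real code pass RateLimitError from tokensaver_sdk.errors, whose retry_after_seconds attribute is documented in the Errors section:

```python
import time

def call_with_retry(fn, *, rate_limit_exc, max_attempts=3, _sleep=time.sleep):
    """Retry fn() on rate limits, sleeping for the server-provided
    retry_after_seconds (falling back to 1s). Re-raises on the last attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except rate_limit_exc as e:
            if attempt == max_attempts:
                raise
            _sleep(getattr(e, "retry_after_seconds", None) or 1)

# Real usage (sketch):
#   from tokensaver_sdk.errors import RateLimitError
#   out = call_with_retry(
#       lambda: ts.ask("Hello", provider="openai", model="gpt-4o"),
#       rate_limit_exc=RateLimitError,
#   )
```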