API Reference

Python SDK — full method reference

Single scrollable page with every method card, pipeline options, and error tables. Prefer the split SDK pages in the sidebar for quicker navigation.

LLM provider keys

Ephemeral vs stored keys are documented on Native → LLM provider keys. Links below that point here scroll to this note.

Python SDK reference

The tokensaver-sdk package is a thin client over the TokenSaver HTTP API. For ephemeral provider_api_key vs keys stored in Settings, read LLM provider keys at the top of this page. Then use the snippet below and jump to any method for its purpose, return type, and signature.

Test a basic request

Create a client with your API key and call ask — the smallest path from install to a model response. You do not need base_url unless you target a non-default host (see Client).

python
from tokensaver_sdk import TokenSaver
 
ts = TokenSaver(api_key="ts_...")
out = ts.ask(
    "Say hello in one sentence.",
    provider="openai",
    model="gpt-4o",
)
print(out.text)

Save the snippet as main.py, install with pip install tokensaver-sdk, then run python main.py.

Client

TokenSaver (alias TokenSaverClient) holds your API key and base URL, and exposes a chat attribute with session helpers.

TokenSaver.__init__(base_url=None, api_key=None, *, provider_api_key=None, timeout_total=30, connect_timeout=5, read_timeout=25, max_retries=2, headers=None)

Builds the client. Omit base_url to use the built-in production API root (https://api.tokensaver.fr/api/v1). Optional provider_api_key: default LLM provider secret sent on every pipeline run (overrides org keys for that run only; never stored). Raises ValueError if api_key is missing.

Returns
TokenSaver
python
ts = TokenSaver(api_key="ts_...")
ts = TokenSaver(api_key="ts_...", provider_api_key="sk-...")
ts = TokenSaver("https://api.example.com/api/v1", "ts_...")
ts = TokenSaver(base_url="http://localhost:8000/api/v1", api_key="ts_...")

Attributes

  • base_url — Normalized API root (no trailing slash).
  • api_key — Same string you passed in (sent as Bearer).
  • chat — Use ts.chat.session(...) for chat flows (see Chat sessions).

Pipeline & responses

These methods call the governed pipeline (cache, RAG, compression, PII, etc.) according to your flags and org settings.

Hosted API (default base URL)

On https://api.tokensaver.fr/api/v1, use provider in HOSTED_SAAS_LLM_PROVIDERS (OpenAI, Anthropic, Google, Mistral, Grok, DeepSeek) — same as the hosted console. Other codes raise ValidationError (HOSTED_LLM_PROVIDER). Point base_url at a self-hosted stack for additional providers where the backend allows them.

ask(...)

Recommended entry point: sends the same JSON body as run_pipeline and returns a RunResult (.text, metrics, trace). Optional provider_api_key per call (overrides the client default and org DB keys for that run only). History: pass chat_id with HISTORY_SERVER or chat_history with HISTORY_LOCAL. Module tuning matches the console:

  • temperature; rag_similarity_threshold; cache_similarity_threshold; compression_level (1–5)
  • rag_options { document_ids, top_k, query_image_url }
  • pii_options { engine, strategy, confidence_threshold, entity_types, language, regex_fallback }
  • context_layers, or the legacy system_prompt / profile_context / workspace_instructions

HTTP
POST /pipelines/run
Returns
RunResult
python
result = ts.ask(
    "Summarize this in 3 bullets.",
    provider="openai",
    model="gpt-4o",
    use_rag=True,
    rag_similarity_threshold=0.55,
    rag_options={"document_ids": ["<document_id>"], "top_k": 8},
)
print(result.text, result.metrics.cost_usd)

run_pipeline(...)

Lower-level: raw API JSON. Accepts the same optional module fields as ask (temperature, thresholds, compression_level, rag_options, pii_options, context_layers, legacy instruction fields, provider_api_key).

HTTP
POST /pipelines/run
Returns
dict
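
A minimal sketch of the raw path — this assumes run_pipeline takes the prompt first, like ask, and returns the response dict unchanged (the exact keys depend on the server response):

python
# Same request shape as ask, but you get the raw pipeline JSON back.
raw = ts.run_pipeline(
    "Summarize this in 3 bullets.",
    provider="openai",
    model="gpt-4o",
    use_cache=True,
)
print(sorted(raw.keys()))  # inspect the raw payload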

estimate_cost(prompt_tokens, completion_tokens, provider, model)

Ask TokenSaver for an estimated cost from token counts — no LLM call.

HTTP
POST /pricing/estimate
Returns
dict
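
For example — token counts in, a cost estimate out, with no LLM call (the shape of the returned dict depends on the server response):

python
estimate = ts.estimate_cost(
    prompt_tokens=1200,
    completion_tokens=300,
    provider="openai",
    model="gpt-4o",
)
print(estimate)  # inspect the estimated cost fields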

Pipeline request options (console parity)

ask and run_pipeline send the same optional JSON fields as the TokenSaver console for POST /pipelines/run. Omit a field to use server defaults. The console does not send provider_api_key; the SDK can (see the provider_api_key entry below).

  • temperature — LLM temperature (0–2).
  • use_cache, use_rag, use_compression, use_pii_filter, stream — Enable pipeline modules (booleans).
  • rag_similarity_threshold — RAG retrieval similarity floor (0–1).
  • cache_similarity_threshold — Semantic cache similarity threshold (0–1).
  • compression_level — Compression strength 1–5 when compression is on.
  • provider_api_key — SDK only. Per-request LLM provider secret; takes precedence over organisation keys in the database for that run; never stored. Set on TokenSaver(..., provider_api_key=...) or pass to ask / run_pipeline.
  • rag_options — Dict: document_ids, top_k, query_image_url.
  • pii_options — Dict: engine, strategy, confidence_threshold, entity_types, language, regex_fallback.
  • context_layers — Structured instruction / knowledge / interaction layers (canonical API).
  • system_prompt, profile_context, workspace_instructions — Legacy flat instruction fields (if not using context_layers).
  • chat_id, chat_history — Session routing (see history modes).

Types RagOptions and PiiOptions (TypedDict) are exported from tokensaver_sdk for editor hints.
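
A short sketch of the editor-hint workflow; the pii_options values shown are illustrative — check your org settings for the engines and strategies your stack supports:

python
from tokensaver_sdk import PiiOptions, RagOptions

rag_opts: RagOptions = {"document_ids": ["<document_id>"], "top_k": 8}
pii_opts: PiiOptions = {"confidence_threshold": 0.8, "regex_fallback": True}

result = ts.ask(
    "What does the handbook say about refunds?",
    provider="openai",
    model="gpt-4o",
    use_rag=True,
    use_pii_filter=True,
    rag_options=rag_opts,
    pii_options=pii_opts,
)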

HTTP helpers

Authenticated httpx calls with retries on transient errors. Paths are relative to base_url (e.g. "rag/documents").

get(path, **kwargs) → Response

GET with Authorization header and JSON Accept.

post(path, **kwargs) → Response

POST JSON by default; use httpx kwargs for custom bodies.

delete(path, **kwargs) → Response

Used by ChatSession.close() for server chats.
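
A sketch of the raw escape hatch. The GET path comes from the RAG section below; the POST body here assumes /pricing/estimate accepts the same field names as estimate_cost's parameters:

python
# GET: list RAG documents via the raw endpoint.
docs = ts.get("rag/documents").json()

# POST: JSON body by default; any httpx kwargs pass through.
resp = ts.post("pricing/estimate", json={
    "prompt_tokens": 1200,
    "completion_tokens": 300,
    "provider": "openai",
    "model": "gpt-4o",
})
resp.raise_for_status()
print(resp.json())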

RAG documents

Upload supported files to your workspace, wait for ingestion, then pass document_ids in rag_options on ask(..., use_rag=True) — or use ChatSession.attach_knowledge on server chats. Allowed extensions match the console and RAG_UPLOAD_EXTENSIONS (pdf, txt, md, csv, json, docx). Wrong extension → ValidationError (RAG_UNSUPPORTED_FILE_TYPE) before any HTTP call.
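
End to end, that workflow looks like this (a sketch; the document_id key is taken from the upload response):

python
doc = ts.rag_upload_and_wait("./handbook.pdf")
out = ts.ask(
    "What is the refund policy?",
    provider="openai",
    model="gpt-4o",
    use_rag=True,
    rag_options={"document_ids": [str(doc["document_id"])], "top_k": 6},
)
print(out.text)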

rag_list_documents()

Lists ingested documents for the current API key / workspace (newest first).

HTTP
GET /rag/documents
Returns
dict with key documents
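
For example (per-document fields depend on the server response):

python
for doc in ts.rag_list_documents()["documents"]:
    print(doc)  # newest first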

rag_upload_document(file_path, *, name=None, description=None)

Multipart upload only; does not wait for chunking. Raises ValidationError (RAG_FILE_NOT_FOUND) if the path is missing or not a file; RAG_UNSUPPORTED_FILE_TYPE if the extension is not allowed.

HTTP
POST /rag/documents
Returns
dict (includes document_id)

rag_get_document(document_id)

Fetch status, chunk counts, and metadata for one document.

HTTP
GET /rag/documents/{id}
Returns
dict

rag_wait_document_ready(document_id, *, timeout_seconds=90, poll_interval_seconds=2)

Polls until status is done or ingested, or raises on error / timeout.

Returns
dict
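
A manual two-step sketch — rag_upload_and_wait below wraps the same flow; the document_id comes from the upload response:

python
up = ts.rag_upload_document("./handbook.pdf", name="Handbook")
doc = ts.rag_wait_document_ready(
    up["document_id"],
    timeout_seconds=120,
    poll_interval_seconds=2,
)
# At this point the document status is done or ingested.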

rag_upload_and_wait(file_path, *, name=None, description=None, timeout_seconds=90, poll_interval_seconds=2)

Upload then block until ingestion completes. Same ValidationError (RAG_FILE_NOT_FOUND) as rag_upload_document if the local file is missing.

Returns
dict

rag_ensure_document(file_path, *, reuse_existing=True, name=None, description=None, timeout_seconds=90, poll_interval_seconds=2)

If a document with the same filename already exists and is ready, returns it without re-uploading. If pending, waits. Otherwise uploads and waits (missing local file → RAG_FILE_NOT_FOUND; bad extension → RAG_UNSUPPORTED_FILE_TYPE). Set reuse_existing=False to always send a new file.

Returns
dict
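
For example — the default reuses a ready document with the same filename; reuse_existing=False forces a fresh upload:

python
doc = ts.rag_ensure_document("./handbook.pdf")  # reuses if already ready
fresh = ts.rag_ensure_document("./handbook.pdf", reuse_existing=False)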

ChatSession.attach_knowledge

On a HISTORY_SERVER session, remember one or more RAG document_id strings. Each ask merges them into rag_options["document_ids"] (deduped with any IDs you pass explicitly). Same workflow as attaching knowledge from the console chat "+" menu. Available in SDK 0.1.9+.

python
from tokensaver_sdk import HISTORY_SERVER, TokenSaver
 
ts = TokenSaver(api_key="ts_...")
doc_id = str(ts.rag_ensure_document("./handbook.pdf")["document_id"])
 
session = ts.chat.session(history=HISTORY_SERVER, name="Docs Q&A")
session.attach_knowledge(doc_id)
out = session.ask(
    "What is the refund policy?",
    provider="openai",
    model="gpt-4o",
    use_rag=True,
    rag_options={"top_k": 6},
)
print(out.text)

ChatSession.attach_knowledge(*document_ids: str) → None

Append non-empty document IDs to the session list (no HTTP call). IDs are sent on subsequent ask() calls via merged rag_options.

ChatSession.clear_knowledge() → None

Remove all IDs added with attach_knowledge (in-memory only).

Chat sessions

Use HISTORY_SERVER for chats stored in TokenSaver, or HISTORY_LOCAL for in-process memory.

ts.chat.session(*, history=HISTORY_NONE, name='New Chat', provider=None, model=None) → ChatSession

When history is HISTORY_SERVER, creates a chat via POST /sdk/chats (Idempotency-Key set automatically) and returns a ChatSession with chat_id.

HTTP
POST /sdk/chats (server mode)
Returns
ChatSession
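
A local-memory counterpart to the server example above — no chat is created on TokenSaver, and turns live only in this process:

python
from tokensaver_sdk import HISTORY_LOCAL, TokenSaver

ts = TokenSaver(api_key="ts_...")
local = ts.chat.session(history=HISTORY_LOCAL)
local.ask("My name is Ada.", provider="openai", model="gpt-4o")
out = local.ask("What is my name?", provider="openai", model="gpt-4o")
print(out.text)  # prior turns come from SDK process memory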

ChatSession

  • ask(prompt, **kwargs) → RunResult — forwards to TokenSaver.ask; in HISTORY_LOCAL, appends turns to memory. Merges attach_knowledge IDs into rag_options before each call.
  • attach_knowledge(*document_ids), clear_knowledge() — RAG document IDs for server/local/none sessions (merge behaviour applies whenever ask runs).
  • messages(limit=50, cursor=None, order="asc") → dict — server transcript when HISTORY_SERVER.
  • close() — deletes the server chat when applicable.
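
Putting the pieces together for a server-backed session (a sketch; the transcript is the dict described in the messages bullet):

python
from tokensaver_sdk import HISTORY_SERVER, TokenSaver

ts = TokenSaver(api_key="ts_...")
session = ts.chat.session(history=HISTORY_SERVER, name="Support")
session.ask("Hello!", provider="openai", model="gpt-4o")
print(session.messages(limit=20, order="asc"))  # server transcript (dict)
session.close()  # deletes the server chat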

Return types

Dataclasses returned by ask and attached to session calls.

RunResult

text, raw, metrics (MetricsView), trace (TraceView), context (ContextView). Method to_dict().

MetricsView

cost_usd, latency_ms, cache_hit, tokens_*, savings_ratio (all optional).

TraceView

request_id, provider, model.

ContextView

history_mode, chat_id, layers_used.
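
Reading the views off a result (metrics fields are optional and may be None):

python
result = ts.ask("Say hello.", provider="openai", model="gpt-4o")
print(result.text)
print(result.metrics.cost_usd, result.metrics.latency_ms, result.metrics.cache_hit)
print(result.trace.request_id, result.trace.provider, result.trace.model)
print(result.context.history_mode, result.context.chat_id)
payload = result.to_dict()  # plain-dict form for logging or storage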

Errors

Import from tokensaver_sdk.errors. All subclasses expose code, message, status_code, request_id, raw.

  • TokenSaverError — base class.
  • AuthenticationError, ValidationError, ServerError, TimeoutError
  • ServerError with code=NETWORK_ERROR — transport / DNS failures when the host cannot be reached (wrong base_url, DNS failure, TLS, connection refused); this is not an HTTP response from TokenSaver.
  • ProviderKeyMissingError — carries an extra provider field.
  • ValidationError with code=HOSTED_LLM_PROVIDER (ERROR_HOSTED_LLM_PROVIDER) — raised on the hosted default URL when provider is not in HOSTED_SAAS_LLM_PROVIDERS (see Hosted API above).
  • ValidationError with code=LLM_MODEL_NOT_SUPPORTED (ERROR_LLM_MODEL_NOT_SUPPORTED) — raised when provider / model is not an active row in the platform LLM reference (same rule as GET /api/v1/llm-reference/models).
  • QuotaExceededError — quota_dimension, limit, current_usage, retry_after_seconds
  • RateLimitError — retry_after_seconds
  • rag_upload_document / rag_upload_and_wait / rag_ensure_document raise ValidationError client-side with code=RAG_FILE_NOT_FOUND (ERROR_RAG_FILE_NOT_FOUND) if the path is missing, or code=RAG_UNSUPPORTED_FILE_TYPE (ERROR_RAG_UNSUPPORTED_FILE_TYPE) if the extension is not in RAG_UPLOAD_EXTENSIONS; raw["path"] may hold the resolved path.
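
A defensive sketch that exercises the hierarchy (catch specific subclasses before the TokenSaverError base):

python
from tokensaver_sdk import (
    AuthenticationError,
    QuotaExceededError,
    RateLimitError,
    TokenSaverError,
    ValidationError,
)

try:
    out = ts.ask("Hello", provider="openai", model="gpt-4o")
except RateLimitError as e:
    print("rate limited; retry after", e.retry_after_seconds)
except QuotaExceededError as e:
    print("quota hit:", e.quota_dimension, e.limit, e.current_usage)
except ValidationError as e:
    print("validation failed:", e.code, e.message)
except AuthenticationError as e:
    print("check the ts_... API key:", e.status_code)
except TokenSaverError as e:
    print("other error:", e.code, e.status_code, e.request_id)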

Package imports

Public surface matches tokensaver_sdk.__all__ (stable imports for docs and IDEs).

python
from tokensaver_sdk import (
    TokenSaver,
    TokenSaverClient,
    API_PIPELINE_LLM_PROVIDERS,
    DEFAULT_PUBLIC_API_BASE_URL,
    HOSTED_SAAS_LLM_PROVIDERS,
    RAG_UPLOAD_EXTENSIONS,
    mime_type_for_rag_filename,
    ERROR_HOSTED_LLM_PROVIDER,
    ERROR_LLM_MODEL_NOT_SUPPORTED,
    ERROR_RAG_FILE_NOT_FOUND,
    ERROR_RAG_UNSUPPORTED_FILE_TYPE,
    HISTORY_NONE,
    HISTORY_LOCAL,
    HISTORY_SERVER,
    HistoryMode,
    RunResult,
    MetricsView,
    TraceView,
    ContextView,
    RagOptions,
    PiiOptions,
    TokenSaverError,
    AuthenticationError,
    ProviderKeyMissingError,
    QuotaExceededError,
    RateLimitError,
    ValidationError,
    ServerError,
    TimeoutError,
)
import tokensaver_sdk
 
print(tokensaver_sdk.__version__)

History constants

python
HISTORY_NONE    # "none"   — no memory
HISTORY_LOCAL   # "local"  — SDK process memory
HISTORY_SERVER  # "server" — persisted on TokenSaver