API Reference

Python SDK — full method reference

Single scrollable page with every method card, pipeline options, and error tables. Prefer the split SDK pages in the sidebar for quicker navigation.

LLM provider keys

Ephemeral vs stored keys are documented on Native → LLM provider keys. Links below that point here scroll to this note.

Python SDK reference

The tokensaver-sdk package is a thin client over the TokenSaver HTTP API. For ephemeral provider_api_key vs keys stored in Settings, read LLM provider keys at the top of this page. Then use the snippet below and jump to any method for its purpose, return type, and signature.

Test a basic request

Create a client with your API key and call ask — the smallest path from install to a model response. You do not need base_url unless you target a non-default host (see Client).

python
from tokensaver_sdk import TokenSaver
 
ts = TokenSaver(api_key="ts_...")
out = ts.ask(
    "Say hello in one sentence.",
    provider="openai",
    model="gpt-4o",
)
print(out.text)

Save the snippet as main.py, install with pip install tokensaver-sdk, then run python main.py.

Client

TokenSaver (alias TokenSaverClient) holds your API key and base URL, and exposes a chat attribute with session helpers.

TokenSaver.__init__(base_url=None, api_key=None, *, provider_api_key=None, timeout_total=30, connect_timeout=5, read_timeout=25, max_retries=2, headers=None)

Builds the client. Omit base_url to use the built-in production API root (https://api.tokensaver.fr/api/v1). Optional provider_api_key: default LLM provider secret sent on every pipeline run (overrides org keys for that run only; never stored). Raises ValueError if api_key is missing.

Returns
TokenSaver
python
ts = TokenSaver(api_key="ts_...")
ts = TokenSaver(api_key="ts_...", provider_api_key="sk-...")
ts = TokenSaver("https://api.example.com/api/v1", "ts_...")
ts = TokenSaver(base_url="http://localhost:8000/api/v1", api_key="ts_...")

Attributes

  • base_url — Normalized API root (no trailing slash).
  • api_key — Same string you passed in (sent as Bearer).
  • chat — Use ts.chat.session(...) for chat flows (see Chat sessions).

Pipeline & responses

These methods call the governed pipeline (cache, RAG, compression, PII, etc.) according to your flags and org settings.

Hosted API (default base URL)

On https://api.tokensaver.fr/api/v1, use provider in HOSTED_SAAS_LLM_PROVIDERS (OpenAI, Anthropic, Google, Mistral, Grok, DeepSeek) — same as the hosted console. Other codes raise ValidationError (HOSTED_LLM_PROVIDER). Point base_url at a self-hosted stack for additional providers where the backend allows them.

ask(...)

Recommended entry point: sends the same JSON body as run_pipeline and returns a RunResult (.text, metrics, trace). Optional provider_api_key per call (overrides the client default and org DB keys for that run only). History: pass chat_id with HISTORY_SERVER or chat_history with HISTORY_LOCAL. Module tuning matches the console:

  • temperature; rag_similarity_threshold; cache_similarity_threshold; compression_level (1–5)
  • rag_options { document_ids, top_k, query_image_url }
  • pii_options { engine, strategy, confidence_threshold, entity_types, language, regex_fallback }
  • context_layers, or the legacy system_prompt / profile_context / workspace_instructions

HTTP
POST /pipelines/run
Returns
RunResult
python
result = ts.ask(
    "Summarize this in 3 bullets.",
    provider="openai",
    model="gpt-4o",
    use_rag=True,
    rag_similarity_threshold=0.55,
    rag_options={"document_ids": ["<document_id>"], "top_k": 8},
)
print(result.text, result.metrics.cost_usd)

run_pipeline(...)

Lower-level: raw API JSON. Accepts the same optional module fields as ask (temperature, thresholds, compression_level, rag_options, pii_options, context_layers, legacy instruction fields, provider_api_key).

HTTP
POST /pipelines/run
Returns
dict
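
A minimal sketch of the raw path — this assumes run_pipeline takes the prompt first, like ask, and returns the response dict unchanged (the exact keys depend on the server response):

python
# Same request shape as ask, but you get the raw pipeline JSON back.
raw = ts.run_pipeline(
    "Summarize this in 3 bullets.",
    provider="openai",
    model="gpt-4o",
    use_cache=True,
)
print(sorted(raw.keys()))  # inspect the raw payload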

estimate_cost(prompt_tokens, completion_tokens, provider, model)

Ask TokenSaver for an estimated cost from token counts — no LLM call.

HTTP
POST /pricing/estimate
Returns
dict
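
For example — token counts in, a cost estimate out, with no LLM call (the shape of the returned dict depends on the server response):

python
estimate = ts.estimate_cost(
    prompt_tokens=1200,
    completion_tokens=300,
    provider="openai",
    model="gpt-4o",
)
print(estimate)  # inspect the estimated cost fields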

Pipeline request options (console parity)

ask and run_pipeline send the same optional JSON fields as the TokenSaver console for POST /pipelines/run. Omit a field to use server defaults. The console does not send provider_api_key; the SDK can (see the provider_api_key entry below).

  • temperature — LLM temperature (0–2).
  • use_cache, use_rag, use_compression, use_pii_filter, stream — Enable pipeline modules (booleans).
  • rag_similarity_threshold — RAG retrieval similarity floor (0–1).
  • cache_similarity_threshold — Semantic cache similarity threshold (0–1).
  • compression_level — Compression strength 1–5 when compression is on.
  • provider_api_key — SDK only. Per-request LLM provider secret; takes precedence over organisation keys in the database for that run; never stored. Set on TokenSaver(..., provider_api_key=...) or pass to ask / run_pipeline.
  • rag_options — Dict: document_ids, top_k, query_image_url.
  • pii_options — Dict: engine, strategy, confidence_threshold, entity_types, language, regex_fallback.
  • context_layers — Structured instruction / knowledge / interaction layers (canonical API).
  • system_prompt, profile_context, workspace_instructions — Legacy flat instruction fields (if not using context_layers).
  • chat_id, chat_history — Session routing (see history modes).

Types RagOptions and PiiOptions (TypedDict) are exported from tokensaver_sdk for editor hints.
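
A short sketch of the editor-hint workflow; the pii_options values shown are illustrative — check your org settings for the engines and strategies your stack supports:

python
from tokensaver_sdk import PiiOptions, RagOptions

rag_opts: RagOptions = {"document_ids": ["<document_id>"], "top_k": 8}
pii_opts: PiiOptions = {"confidence_threshold": 0.8, "regex_fallback": True}

result = ts.ask(
    "What does the handbook say about refunds?",
    provider="openai",
    model="gpt-4o",
    use_rag=True,
    use_pii_filter=True,
    rag_options=rag_opts,
    pii_options=pii_opts,
)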

HTTP helpers

Authenticated httpx calls with retries on transient errors. Paths are relative to base_url (e.g. "rag/documents").

get(path, **kwargs) → Response

GET with Authorization header and JSON Accept.

post(path, **kwargs) → Response

POST JSON by default; use httpx kwargs for custom bodies.

delete(path, **kwargs) → Response

Used by ChatSession.close() for server chats.
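
A sketch of the raw escape hatch. The GET path comes from the RAG section below; the POST body here assumes /pricing/estimate accepts the same field names as estimate_cost's parameters:

python
# GET: list RAG documents via the raw endpoint.
docs = ts.get("rag/documents").json()

# POST: JSON body by default; any httpx kwargs pass through.
resp = ts.post("pricing/estimate", json={
    "prompt_tokens": 1200,
    "completion_tokens": 300,
    "provider": "openai",
    "model": "gpt-4o",
})
resp.raise_for_status()
print(resp.json())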

RAG documents

Upload supported files to your workspace, wait for ingestion, then pass document_ids in rag_options on ask(..., use_rag=True) — or use ChatSession.attach_knowledge on server chats. Allowed extensions match the console and RAG_UPLOAD_EXTENSIONS (pdf, txt, md, csv, json, docx). Wrong extension → ValidationError (RAG_UNSUPPORTED_FILE_TYPE) before any HTTP call.
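
End to end, that workflow looks like this (a sketch; the document_id key is taken from the upload response):

python
doc = ts.rag_upload_and_wait("./handbook.pdf")
out = ts.ask(
    "What is the refund policy?",
    provider="openai",
    model="gpt-4o",
    use_rag=True,
    rag_options={"document_ids": [str(doc["document_id"])], "top_k": 6},
)
print(out.text)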

rag_list_documents()

Lists ingested documents for the current API key / workspace (newest first).

HTTP
GET /rag/documents
Returns
dict with key documents
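
For example (per-document fields depend on the server response):

python
for doc in ts.rag_list_documents()["documents"]:
    print(doc)  # newest first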

rag_upload_document(file_path, *, name=None, description=None)

Multipart upload only; does not wait for chunking. Raises ValidationError (RAG_FILE_NOT_FOUND) if the path is missing or not a file; RAG_UNSUPPORTED_FILE_TYPE if the extension is not allowed.

HTTP
POST /rag/documents
Returns
dict (includes document_id)

rag_get_document(document_id)

Fetch status, chunk counts, and metadata for one document.

HTTP
GET /rag/documents/{id}
Returns
dict

rag_wait_document_ready(document_id, *, timeout_seconds=90, poll_interval_seconds=2)

Polls until status is done or ingested, or raises on error / timeout.

Returns
dict
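
A manual two-step sketch — rag_upload_and_wait below wraps the same flow; the document_id comes from the upload response:

python
up = ts.rag_upload_document("./handbook.pdf", name="Handbook")
doc = ts.rag_wait_document_ready(
    up["document_id"],
    timeout_seconds=120,
    poll_interval_seconds=2,
)
# At this point the document status is done or ingested.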

rag_upload_and_wait(file_path, *, name=None, description=None, timeout_seconds=90, poll_interval_seconds=2)

Upload then block until ingestion completes. Same ValidationError (RAG_FILE_NOT_FOUND) as rag_upload_document if the local file is missing.

Returns
dict

rag_ensure_document(file_path, *, reuse_existing=True, name=None, description=None, timeout_seconds=90, poll_interval_seconds=2)

If a document with the same filename already exists and is ready, returns it without re-uploading. If pending, waits. Otherwise uploads and waits (missing local file → RAG_FILE_NOT_FOUND; bad extension → RAG_UNSUPPORTED_FILE_TYPE). Set reuse_existing=False to always send a new file.

Returns
dict
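
For example — the default reuses a ready document with the same filename; reuse_existing=False forces a fresh upload:

python
doc = ts.rag_ensure_document("./handbook.pdf")  # reuses if already ready
fresh = ts.rag_ensure_document("./handbook.pdf", reuse_existing=False)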

ChatSession.attach_knowledge

On a HISTORY_SERVER session, remember one or more RAG document_id strings. Each ask merges them into rag_options["document_ids"] (deduped with any IDs you pass explicitly). Same workflow as attaching knowledge from the console chat "+" menu. Available in SDK 0.1.9+.

python
from tokensaver_sdk import HISTORY_SERVER, TokenSaver
 
ts = TokenSaver(api_key="ts_...")
doc_id = str(ts.rag_ensure_document("./handbook.pdf")["document_id"])
 
session = ts.chat.session(history=HISTORY_SERVER, name="Docs Q&A")
session.attach_knowledge(doc_id)
out = session.ask(
    "What is the refund policy?",
    provider="openai",
    model="gpt-4o",
    use_rag=True,
    rag_options={"top_k": 6},
)
print(out.text)

ChatSession.attach_knowledge(*document_ids: str) → None

Append non-empty document IDs to the session list (no HTTP call). IDs are sent on subsequent ask() calls via merged rag_options.

ChatSession.clear_knowledge() → None

Remove all IDs added with attach_knowledge (in-memory only).

Chat sessions

Use HISTORY_SERVER for chats stored in TokenSaver, or HISTORY_LOCAL for in-process memory.

ts.chat.session(*, history=HISTORY_NONE, name='New Chat', provider=None, model=None) → ChatSession

When history is HISTORY_SERVER, creates a chat via POST /sdk/chats (Idempotency-Key set automatically) and returns a ChatSession with chat_id.

HTTP
POST /sdk/chats (server mode)
Returns
ChatSession
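
A local-memory counterpart to the server example above — no chat is created on TokenSaver, and turns live only in this process:

python
from tokensaver_sdk import HISTORY_LOCAL, TokenSaver

ts = TokenSaver(api_key="ts_...")
local = ts.chat.session(history=HISTORY_LOCAL)
local.ask("My name is Ada.", provider="openai", model="gpt-4o")
out = local.ask("What is my name?", provider="openai", model="gpt-4o")
print(out.text)  # prior turns come from SDK process memory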

ChatSession

  • ask(prompt, **kwargs) → RunResult — forwards to TokenSaver.ask; in HISTORY_LOCAL, appends turns to memory. Merges attach_knowledge IDs into rag_options before each call.
  • attach_knowledge(*document_ids), clear_knowledge() — RAG document IDs for server/local/none sessions (merge behaviour applies whenever ask runs).
  • messages(limit=50, cursor=None, order="asc") → dict — server transcript when HISTORY_SERVER.
  • close() — deletes the server chat when applicable.
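
Putting the pieces together for a server-backed session (a sketch; the transcript is the dict described in the messages bullet):

python
from tokensaver_sdk import HISTORY_SERVER, TokenSaver

ts = TokenSaver(api_key="ts_...")
session = ts.chat.session(history=HISTORY_SERVER, name="Support")
session.ask("Hello!", provider="openai", model="gpt-4o")
print(session.messages(limit=20, order="asc"))  # server transcript (dict)
session.close()  # deletes the server chat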

Return types

Dataclasses returned by ask and attached to session calls.

RunResult

text, raw, metrics (MetricsView), trace (TraceView), context (ContextView). Method to_dict().

MetricsView

cost_usd, latency_ms, cache_hit, tokens_*, savings_ratio (all optional).

TraceView

request_id, provider, model.

ContextView

history_mode, chat_id, layers_used.
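
Reading the views off a result (metrics fields are optional and may be None):

python
result = ts.ask("Say hello.", provider="openai", model="gpt-4o")
print(result.text)
print(result.metrics.cost_usd, result.metrics.latency_ms, result.metrics.cache_hit)
print(result.trace.request_id, result.trace.provider, result.trace.model)
print(result.context.history_mode, result.context.chat_id)
payload = result.to_dict()  # plain-dict form for logging or storage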

Errors

Import from tokensaver_sdk.errors. All subclasses expose code, message, status_code, request_id, raw.

  • TokenSaverError — base class.
  • AuthenticationError, ValidationError, ServerError, TimeoutError
  • ServerError with code=NETWORK_ERROR — transport / DNS failures when the host cannot be reached (wrong base_url, DNS failure, TLS, connection refused); this is not an HTTP response from TokenSaver.
  • ProviderKeyMissingError — carries an extra provider field.
  • ValidationError with code=HOSTED_LLM_PROVIDER (ERROR_HOSTED_LLM_PROVIDER) — raised on the hosted default URL when provider is not in HOSTED_SAAS_LLM_PROVIDERS (see Hosted API above).
  • ValidationError with code=LLM_MODEL_NOT_SUPPORTED (ERROR_LLM_MODEL_NOT_SUPPORTED) — raised when provider / model is not an active row in the platform LLM reference (same rule as GET /api/v1/llm-reference/models).
  • QuotaExceededError — quota_dimension, limit, current_usage, retry_after_seconds
  • RateLimitError — retry_after_seconds
  • rag_upload_document / rag_upload_and_wait / rag_ensure_document raise ValidationError client-side with code=RAG_FILE_NOT_FOUND (ERROR_RAG_FILE_NOT_FOUND) if the path is missing, or code=RAG_UNSUPPORTED_FILE_TYPE (ERROR_RAG_UNSUPPORTED_FILE_TYPE) if the extension is not in RAG_UPLOAD_EXTENSIONS; raw["path"] may hold the resolved path.
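
A defensive sketch that exercises the hierarchy (catch specific subclasses before the TokenSaverError base):

python
from tokensaver_sdk import (
    AuthenticationError,
    QuotaExceededError,
    RateLimitError,
    TokenSaverError,
    ValidationError,
)

try:
    out = ts.ask("Hello", provider="openai", model="gpt-4o")
except RateLimitError as e:
    print("rate limited; retry after", e.retry_after_seconds)
except QuotaExceededError as e:
    print("quota hit:", e.quota_dimension, e.limit, e.current_usage)
except ValidationError as e:
    print("validation failed:", e.code, e.message)
except AuthenticationError as e:
    print("check the ts_... API key:", e.status_code)
except TokenSaverError as e:
    print("other error:", e.code, e.status_code, e.request_id)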

Package imports

Public surface matches tokensaver_sdk.__all__ (stable imports for docs and IDEs).

python
from tokensaver_sdk import (
    TokenSaver,
    TokenSaverClient,
    API_PIPELINE_LLM_PROVIDERS,
    DEFAULT_PUBLIC_API_BASE_URL,
    HOSTED_SAAS_LLM_PROVIDERS,
    RAG_UPLOAD_EXTENSIONS,
    mime_type_for_rag_filename,
    ERROR_HOSTED_LLM_PROVIDER,
    ERROR_LLM_MODEL_NOT_SUPPORTED,
    ERROR_RAG_FILE_NOT_FOUND,
    ERROR_RAG_UNSUPPORTED_FILE_TYPE,
    HISTORY_NONE,
    HISTORY_LOCAL,
    HISTORY_SERVER,
    HistoryMode,
    RunResult,
    MetricsView,
    TraceView,
    ContextView,
    RagOptions,
    PiiOptions,
    TokenSaverError,
    AuthenticationError,
    ProviderKeyMissingError,
    QuotaExceededError,
    RateLimitError,
    ValidationError,
    ServerError,
    TimeoutError,
)
import tokensaver_sdk
 
print(tokensaver_sdk.__version__)

History constants

python
HISTORY_NONE    # "none"   — no memory
HISTORY_LOCAL   # "local"  — SDK process memory
HISTORY_SERVER  # "server" — persisted on TokenSaver