Create chat completion
Standard messages array, with optional tools / tool_choice and stream for SSE. Activate cache, RAG, compression, and PII filtering via the tokensaver object and/or X-Tokensaver-* headers; optionally reuse the API key's pipeline_settings with X-Tokensaver-Apply-Key-Pipeline-Defaults (see Headers for merge order).
Request highlights
- model — prefer provider/model_id (e.g. openai/gpt-4o) from GET /openai/v1/models
- messages — user / assistant / system / tool; the last user text becomes the pipeline prompt; prior turns feed context layers
- stream: true — text/event-stream, chunks with chat.completion.chunk, then data: [DONE] (see the streaming sketch below)
- tools, tool_choice — function calling; plan quotas may apply — see Tools & streaming
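A minimal streaming sketch with the OpenAI Python SDK pointed at the TokenSaver base URL (client setup repeated from the SDK example below; the prompt is illustrative):

from openai import OpenAI

client = OpenAI(api_key="ts_...", base_url="https://api.tokensaver.fr/openai/v1")

# stream=True: the server responds with text/event-stream; the SDK parses each
# chat.completion.chunk event and stops at data: [DONE].
stream = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hi"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)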
Minimal curl (modules off)
curl -sS "https://api.tokensaver.fr/openai/v1/chat/completions" \
-H "Authorization: Bearer $TS_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"openai/gpt-4o","messages":[{"role":"user","content":"Hi"}]}'Minimal curl (modules from API key)
When the TokenSaver key has pipeline_settings configured in the console and you send no explicit module control, X-Tokensaver-Apply-Key-Pipeline-Defaults: true merges them the same way POST /api/v1/pipelines/run does. The same reserved key is available in the JSON body: apply_key_pipeline_defaults (see the sketch after the curl example).
curl -sS "https://api.tokensaver.fr/openai/v1/chat/completions" \
-H "Authorization: Bearer $TS_KEY" \
-H "Content-Type: application/json" \
-H "X-Tokensaver-Apply-Key-Pipeline-Defaults: true" \
-d '{"model":"openai/gpt-4o","messages":[{"role":"user","content":"Hi"}]}'OpenAI Python SDK
OpenAI Python SDK

from openai import OpenAI

client = OpenAI(
    api_key="ts_...",  # TokenSaver key, not the vendor key
    base_url="https://api.tokensaver.fr/openai/v1",
)
r = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(r.choices[0].message.content)
LangChain (ChatOpenAI)
import json
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="openai/gpt-4o",
    api_key="ts_...",
    base_url="https://api.tokensaver.fr/openai/v1",
    default_headers={
        "X-Tokensaver-Options": json.dumps({
            "use_cache": True,
            "cache_similarity_threshold": 0.85,
        }),
    },
)
print(llm.invoke("Hello").content)

Use default_headers for X-Tokensaver-*. To send a tokensaver object in the JSON body, use the OpenAI Python SDK directly (extra_body) or httpx.
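For the httpx route, a minimal sketch that puts the tokensaver object directly in the JSON body (endpoint and fields as documented above; the timeout value is an arbitrary choice):

import httpx

resp = httpx.post(
    "https://api.tokensaver.fr/openai/v1/chat/completions",
    headers={"Authorization": "Bearer ts_..."},
    json={
        "model": "openai/gpt-4o",
        "messages": [{"role": "user", "content": "Hello"}],
        "tokensaver": {"use_cache": True, "cache_similarity_threshold": 0.85},
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])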
Enable cache (threshold)
Similarity is 0–1; a higher threshold means a stricter match. If both are sent, headers override the JSON booleans (see Headers for merge order).
curl -sS "https://api.tokensaver.fr/openai/v1/chat/completions" \
-H "Authorization: Bearer $TS_KEY" -H "Content-Type: application/json" \
-H 'X-Tokensaver-Use-Cache: true' \
-H 'X-Tokensaver-Options: {"cache_similarity_threshold":0.85}' \ -d '{"model":"openai/gpt-4o","messages":[{"role":"user","content":"Same prompt as before"}]}'# Body-only (also opts in to explicit module control)
payload = {"model": "openai/gpt-4o",
"messages": [{"role": "user", "content": "Explain cache in one line."}], "tokensaver": {"use_cache": True,
"cache_similarity_threshold": 0.85,
},
}
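To actually send this payload, one option is the OpenAI SDK's extra_body escape hatch; a sketch reusing the client from the SDK example above:

r = client.chat.completions.create(
    model=payload["model"],
    messages=payload["messages"],
    extra_body={"tokensaver": payload["tokensaver"]},  # extra keys merge into the JSON body
)
print(r.choices[0].message.content)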
Enable RAG (document_ids, top_k, threshold)
curl -sS "https://api.tokensaver.fr/openai/v1/chat/completions" \
-H "Authorization: Bearer $TS_KEY" -H "Content-Type: application/json" \
-H 'X-Tokensaver-Use-Rag: true' \
-H 'X-Tokensaver-Options: {"rag_similarity_threshold":0.55,"rag_options":{"document_ids":["<uuid>"],"top_k":8}}' \ -d '{"model":"openai/gpt-4o","messages":[{"role":"user","content":"What does our handbook say about refunds?"}]}'client.chat.completions.create(
model="openai/gpt-4o",
messages=[{"role": "user", "content": "Summarize the uploaded policy."}], extra_body={ "tokensaver": {"use_rag": True,
"rag_similarity_threshold": 0.55,
"rag_options": {"document_ids": ["<document_uuid>"], "top_k": 8},}
},
)
Compression + PII
{"model": "openai/gpt-4o",
"messages": [{"role": "user", "content": "…"}], "tokensaver": {"use_compression": true,
"compression_level": 4,
"use_pii_filter": true,
"pii_options": {"engine": "gliner",
"strategy": "mask",
"confidence_threshold": 0.5,
"language": "en",
"regex_fallback": true
}
}
}
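A sketch sending this body with curl; body.json is a hypothetical filename holding the JSON above:

curl -sS "https://api.tokensaver.fr/openai/v1/chat/completions" \
  -H "Authorization: Bearer $TS_KEY" -H "Content-Type: application/json" \
  -d @body.json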
Server chat persistence (chat_id)
Optional: pass chat_id inside tokensaver (or in X-Tokensaver-Options / X-Tokensaver-Extensions) to bind the run to a server-side chat, same as the native API.
"tokensaver": {"chat_id": "<existing_server_chat_uuid>",
"use_rag": true,
"rag_options": {"top_k": 6}}
Ephemeral LLM vendor key
Same as native: tokensaver.provider_api_key overrides organisation-stored keys for that request only; it is not persisted.
"tokensaver": {"provider_api_key": "sk-..."}Response shape
Response shape

Standard chat.completion with choices and usage. When metadata is present, an extra tokensaver object may include model_resolved and metadata. Some integrations also surface metrics via response headers (see Headers).
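A sketch for reading the extension object from the SDK response; it assumes the OpenAI Python SDK preserves unknown fields on its Pydantic models (exposed via model_extra), which is worth verifying for your SDK version:

r = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hi"}],
)
print(r.usage)
# Vendor extension: present only when metadata is available.
extra = (r.model_extra or {}).get("tokensaver")
if extra:
    print(extra.get("model_resolved"), extra.get("metadata"))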
