Developer documentation
OpenAI-compatible API & LangChain
Clients that speak the OpenAI HTTP contract (chat completions, Responses API, models, embeddings) can point at TokenSaver's /openai/v1 prefix on the same host as the native API. Behind the scenes it is the same pipeline: cache → RAG → compression → PII → LLM → metrics. This page mirrors the OpenAI-compatible guidance from the in-app Docs on the TokenSaver console — formatted for the public site.
Quick start
- Create a TokenSaver API key in your workspace (Settings → API keys). Routes under /openai/v1 accept Authorization: Bearer ts_… only — not the console session JWT.
- Set base_url / baseURL to https://api.tokensaver.fr/openai/v1 (no trailing slash).
- Prefer model ids as provider/model (e.g. openai/gpt-4o) for unambiguous resolution.
- Tune cache, RAG, compression, and PII per request with X-Tokensaver-Options, X-Tokensaver-Use-* headers, or a tokensaver object in the JSON body — applies to both /chat/completions and /responses.
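The tokensaver body object mentioned in the last bullet can be sketched as below. The keys inside it (use_cache, use_rag, cache_similarity_threshold) are assumptions mirroring the Use-* headers and the Options JSON; confirm the exact names against the OpenAPI schema before relying on them.

```python
import json

# Hedged sketch: a /chat/completions request body carrying an inline
# `tokensaver` object. Key names inside `tokensaver` are assumed to
# mirror X-Tokensaver-Use-* / X-Tokensaver-Options; verify against
# the OpenAPI schema.
payload = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
    "tokensaver": {
        "use_cache": True,
        "use_rag": False,
        "cache_similarity_threshold": 0.85,
    },
}
body = json.dumps(payload)  # what you would POST to .../chat/completions
print(body)
```

The same object shape should apply to /responses, since the bullet above states the body extension covers both endpoints.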
Chat Completions vs Responses API
TokenSaver exposes two OpenAI-style chat surfaces on the same prefix; both run the identical pipeline and accept the same TokenSaver headers and tokensaver body extensions.
- POST …/chat/completions — classic messages array. Non-stream JSON: chat.completion. Stream: SSE data: lines with chat.completion.chunk, then data: [DONE].
- POST …/responses — OpenAI Responses API shape: input (text or structured items), optional instructions. Non-stream: object: "response" with output_text-style content. Stream: SSE with event: + data: (e.g. response.output_text.delta). Text-only inputs in phase 1; see the repo spec for limits.
Full HTTP schema: GET https://api.tokensaver.fr/api/v1/openapi.json (paths under /openai/v1/…).
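As a concrete illustration of the /chat/completions stream framing described above (data: lines carrying chat.completion.chunk, terminated by data: [DONE]), here is a minimal parser. The raw stream is a canned example of the wire format, not captured TokenSaver output.

```python
import json

# Canned example of the chat.completion.chunk SSE wire format.
raw = (
    'data: {"object":"chat.completion.chunk","choices":[{"delta":{"content":"Hel"}}]}\n\n'
    'data: {"object":"chat.completion.chunk","choices":[{"delta":{"content":"lo"}}]}\n\n'
    "data: [DONE]\n\n"
)

text = ""
for line in raw.splitlines():
    if not line.startswith("data: "):
        continue  # skip blank separator lines
    chunk = line[len("data: "):]
    if chunk == "[DONE]":
        break  # end-of-stream sentinel
    delta = json.loads(chunk)["choices"][0]["delta"]
    text += delta.get("content", "")
print(text)  # prints "Hello"
```

Note that /responses streams differ: each frame carries an event: line (e.g. response.output_text.delta) before its data: line, so a parser for that surface must dispatch on the event name.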
OpenAI-compatible base URL
https://api.tokensaver.fr/openai/v1
Append /chat/completions, /responses, /models, or /embeddings. Native API: https://api.tokensaver.fr/api/v1.
Minimal examples
OpenAI Python SDK
from openai import OpenAI
client = OpenAI(
base_url="https://api.tokensaver.fr/openai/v1",
api_key="ts_your_tokensaver_api_key",
)
completion = client.chat.completions.create(
model="openai/gpt-4o",
messages=[{"role": "user", "content": "Hello"}],
)
print(completion.choices[0].message.content)
OpenAI JavaScript / TypeScript
import OpenAI from "openai";
const openai = new OpenAI({
baseURL: "https://api.tokensaver.fr/openai/v1",
apiKey: process.env.TOKENSAVER_API_KEY!,
});
const res = await openai.chat.completions.create({
model: "openai/gpt-4o",
messages: [{ role: "user", content: "Hello" }],
});
console.log(res.choices[0]?.message?.content);
cURL
curl -sS "https://api.tokensaver.fr/openai/v1/chat/completions" \
-H "Authorization: Bearer ts_your_tokensaver_api_key" \
-H "Content-Type: application/json" \
-d '{"model":"openai/gpt-4o","messages":[{"role":"user","content":"Hello"}]}'
cURL — Responses API
curl -sS "https://api.tokensaver.fr/openai/v1/responses" \
-H "Authorization: Bearer ts_your_tokensaver_api_key" \
-H "Content-Type: application/json" \
-d '{"model":"openai/gpt-4o","input":"Hello"}'
LangChain (Python) — ChatOpenAI
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
model="openai/gpt-4o",
base_url="https://api.tokensaver.fr/openai/v1",
api_key="ts_your_tokensaver_api_key",
)
print(llm.invoke("Hello").content)
Pipeline headers (cache, RAG, thresholds)
Use X-Tokensaver-Use-* for the four module switches and a single-line JSON string on X-Tokensaver-Options for thresholds (e.g. cache_similarity_threshold). If both styles are sent, the four Use-* headers override the matching booleans in the JSON.
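That precedence rule can be sketched as follows. This is an illustrative client-side model of the behavior, not TokenSaver internals; the function and the derived use_* key names are assumptions.

```python
import json

# Hedged sketch of the precedence rule: each X-Tokensaver-Use-* header,
# when present, overrides the matching boolean in X-Tokensaver-Options.
def effective_flags(headers: dict) -> dict:
    opts = json.loads(headers.get("X-Tokensaver-Options", "{}"))
    for module in ("Cache", "Rag", "Compression", "Pii-Filter"):
        value = headers.get(f"X-Tokensaver-Use-{module}")
        if value is not None:
            key = "use_" + module.lower().replace("-", "_")
            opts[key] = value.lower() == "true"
    return opts

flags = effective_flags({
    "X-Tokensaver-Options": '{"use_cache": false, "cache_similarity_threshold": 0.85}',
    "X-Tokensaver-Use-Cache": "true",  # wins over the JSON boolean
})
print(flags)
```

Thresholds such as cache_similarity_threshold pass through untouched, since only the four module booleans have header counterparts.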
Header map (illustration)
{
"X-Tokensaver-Use-Cache": "true",
"X-Tokensaver-Use-Rag": "false",
"X-Tokensaver-Use-Compression": "false",
"X-Tokensaver-Use-Pii-Filter": "false",
"X-Tokensaver-Options": "{\"cache_similarity_threshold\":0.85}"
}

| Header | Role |
|---|---|
| X-Tokensaver-Options | Single-line JSON: flags, thresholds, rag_options, pii_options, etc. |
| X-Tokensaver-Extensions | Same shape; merged first — keys in Options win. |
| X-Tokensaver-Use-* | Per-module booleans: Cache, Rag, Compression, Pii-Filter (alias Pii). |
| X-Tokensaver-Provider | Optional hint if model has no provider prefix. |
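The Extensions-then-Options merge order noted above amounts to a plain dict merge; the helper name here is illustrative:

```python
import json

# Hedged sketch: X-Tokensaver-Extensions is parsed first, then
# X-Tokensaver-Options is merged on top, so keys in Options win.
def merged_options(headers: dict) -> dict:
    merged = json.loads(headers.get("X-Tokensaver-Extensions", "{}"))
    merged.update(json.loads(headers.get("X-Tokensaver-Options", "{}")))
    return merged

m = merged_options({
    "X-Tokensaver-Extensions": '{"cache_similarity_threshold": 0.9, "use_rag": true}',
    "X-Tokensaver-Options": '{"cache_similarity_threshold": 0.85}',
})
print(m)  # Options' 0.85 wins; use_rag from Extensions survives
```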
OpenAI Python SDK — default_headers (Use-* + Options)
import json
from openai import OpenAI
client = OpenAI(
base_url="https://api.tokensaver.fr/openai/v1",
api_key="ts_your_tokensaver_api_key",
default_headers={
"X-Tokensaver-Use-Cache": "true",
"X-Tokensaver-Use-Rag": "false",
"X-Tokensaver-Use-Compression": "false",
"X-Tokensaver-Use-Pii-Filter": "false",
"X-Tokensaver-Options": json.dumps({"cache_similarity_threshold": 0.85}),
},
)
r = client.chat.completions.create(
model="openai/gpt-4o",
messages=[{"role": "user", "content": "Hello"}],
)
print(r.choices[0].message.content)
LangChain — default_headers (Use-* + Options)
import json
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
model="openai/gpt-4o",
base_url="https://api.tokensaver.fr/openai/v1",
api_key="ts_your_tokensaver_api_key",
default_headers={
"X-Tokensaver-Use-Cache": "true",
"X-Tokensaver-Use-Rag": "true",
"X-Tokensaver-Use-Compression": "false",
"X-Tokensaver-Use-Pii-Filter": "false",
"X-Tokensaver-Options": json.dumps({
"cache_similarity_threshold": 0.85,
"rag_similarity_threshold": 0.75,
}),
},
)
print(llm.invoke("Hello").content)
cURL — Use-* + Options
curl -sS "https://api.tokensaver.fr/openai/v1/chat/completions" \
-H "Authorization: Bearer ts_your_tokensaver_api_key" \
-H "Content-Type: application/json" \
-H "X-Tokensaver-Use-Cache: true" \
-H "X-Tokensaver-Use-Rag: true" \
-H "X-Tokensaver-Use-Compression: false" \
-H "X-Tokensaver-Use-Pii-Filter: false" \
-H 'X-Tokensaver-Options: {"cache_similarity_threshold":0.85,"rag_similarity_threshold":0.72}' \
-d '{"model":"openai/gpt-4o","messages":[{"role":"user","content":"Hello"}]}'
Chat UIs and localhost
Some third-party front-ends refuse localhost as an API base (SSRF rules on their side). Use a public HTTPS API host or a tunnel; the rejection comes from the front-end, not from TokenSaver.
Non-streaming /chat/completions returns usage as prompt_tokens / completion_tokens / total_tokens. POST …/responses uses the Responses-style usage (input_tokens / output_tokens / total_tokens). Product spec: docs/OPENAI-API-COMPATIBILITE.md (§5bis for Responses).
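Because the two endpoints report usage under different key names, a client that logs both may want a small normalizer. This helper is illustrative, not part of any SDK:

```python
# Map both usage shapes onto one dict: chat.completion reports
# prompt_tokens/completion_tokens, the Responses API reports
# input_tokens/output_tokens; both carry total_tokens.
def normalize_usage(usage: dict) -> dict:
    if "prompt_tokens" in usage:  # chat.completion shape
        return {"in": usage["prompt_tokens"],
                "out": usage["completion_tokens"],
                "total": usage["total_tokens"]}
    # Responses API shape
    return {"in": usage["input_tokens"],
            "out": usage["output_tokens"],
            "total": usage["total_tokens"]}

print(normalize_usage({"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15}))
print(normalize_usage({"input_tokens": 8, "output_tokens": 4, "total_tokens": 12}))
```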
