Developer documentation

OpenAI-compatible API & LangChain

Clients that speak the OpenAI HTTP contract (chat completions, Responses API, models, embeddings) can point at TokenSaver's /openai/v1 prefix on the same host as the native API. Behind the scenes it is the same pipeline: cache → RAG → compression → PII → LLM → metrics. This page mirrors the OpenAI-compatible guidance from the in-app Docs on the TokenSaver console — formatted for the public site.

Quick start

  1. Create a TokenSaver API key in your workspace (Settings → API keys). Routes under /openai/v1 accept Authorization: Bearer ts_… only — not the console session JWT.
  2. Set base_url / baseURL to https://api.tokensaver.fr/openai/v1 (no trailing slash).
  3. Prefer model ids as provider/model (e.g. openai/gpt-4o) for unambiguous resolution.
  4. Tune cache, RAG, compression, and PII per request with X-Tokensaver-Options, X-Tokensaver-Use-* headers, or a tokensaver object in the JSON body — applies to both /chat/completions and /responses.
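Step 4 boils down to plain header construction. A minimal sketch, assuming the option keys shown on this page; note that X-Tokensaver-Options must stay a single-line JSON string, which json.dumps guarantees:

```python
import json

# Illustrative option keys; see the header map below for the full list.
options = {"cache_similarity_threshold": 0.85}

headers = {
    "Authorization": "Bearer ts_your_tokensaver_api_key",
    "X-Tokensaver-Use-Cache": "true",
    "X-Tokensaver-Use-Rag": "false",
    # json.dumps emits a single-line string, as the header requires.
    "X-Tokensaver-Options": json.dumps(options),
}
```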

Chat Completions vs Responses API

TokenSaver exposes two OpenAI-style chat surfaces on the same prefix; both run the identical pipeline and accept the same TokenSaver headers and tokensaver body extensions.

  1. POST …/chat/completions — classic messages array. Non-stream JSON: chat.completion. Stream: SSE data: lines carrying chat.completion.chunk objects, then a final data: [DONE].
  2. POST …/responses — OpenAI Responses API shape: input (text or structured items), optional instructions. Non-stream: object: "response" with output_text-style content. Stream: SSE with event: + data: (e.g. response.output_text.delta). Text-only inputs in phase 1; see repo spec for limits.
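A Responses-style stream from the second surface can be consumed with a small line parser. This is a sketch: the only event name it assumes is the response.output_text.delta quoted above, and the full event set may differ (see the repo spec):

```python
import json

def collect_output_text(sse_lines):
    """Accumulate text deltas from Responses-style SSE lines (event: + data: pairs)."""
    event, parts = None, []
    for line in sse_lines:
        if line.startswith("event: "):
            event = line[len("event: "):]
        elif line.startswith("data: ") and event == "response.output_text.delta":
            # Each delta payload is assumed to carry the text under a "delta" key.
            parts.append(json.loads(line[len("data: "):]).get("delta", ""))
    return "".join(parts)
```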

Full HTTP schema: GET https://api.tokensaver.fr/api/v1/openapi.json (paths under /openai/v1/…).

OpenAI-compatible base URL

https://api.tokensaver.fr/openai/v1

Append /chat/completions, /responses, /models, or /embeddings. Native API: https://api.tokensaver.fr/api/v1.

Minimal examples

OpenAI Python SDK

from openai import OpenAI

client = OpenAI(
    base_url="https://api.tokensaver.fr/openai/v1",
    api_key="ts_your_tokensaver_api_key",
)
completion = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(completion.choices[0].message.content)

OpenAI JavaScript / TypeScript

import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://api.tokensaver.fr/openai/v1",
  apiKey: process.env.TOKENSAVER_API_KEY!,
});
const res = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(res.choices[0]?.message?.content);

cURL

curl -sS "https://api.tokensaver.fr/openai/v1/chat/completions" \
  -H "Authorization: Bearer ts_your_tokensaver_api_key" \
  -H "Content-Type: application/json" \
  -d '{"model":"openai/gpt-4o","messages":[{"role":"user","content":"Hello"}]}'

cURL — Responses API

curl -sS "https://api.tokensaver.fr/openai/v1/responses" \
  -H "Authorization: Bearer ts_your_tokensaver_api_key" \
  -H "Content-Type: application/json" \
  -d '{"model":"openai/gpt-4o","input":"Hello"}'

LangChain (Python) — ChatOpenAI

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="openai/gpt-4o",
    base_url="https://api.tokensaver.fr/openai/v1",
    api_key="ts_your_tokensaver_api_key",
)
print(llm.invoke("Hello").content)

Pipeline headers (cache, RAG, thresholds)

Use X-Tokensaver-Use-* for the four module switches and a single-line JSON string on X-Tokensaver-Options for thresholds (e.g. cache_similarity_threshold). If both styles are sent, the four Use-* headers override the matching booleans in the JSON.
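The precedence rule can be sketched as a three-step merge: Extensions first, Options keys win over Extensions, and Use-* booleans win last. The internal flag names (use_cache, etc.) are assumptions for illustration, not the server's actual field names:

```python
import json

def effective_options(extensions_header, options_header, use_headers):
    """Merge order sketch: Extensions, then Options (its keys win), then Use-* booleans."""
    merged = {
        **json.loads(extensions_header or "{}"),
        **json.loads(options_header or "{}"),
    }
    for flag, raw in use_headers.items():  # e.g. {"use_cache": "true"}
        merged[flag] = raw.strip().lower() == "true"
    return merged
```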

Header map (illustration)

{
  "X-Tokensaver-Use-Cache": "true",
  "X-Tokensaver-Use-Rag": "false",
  "X-Tokensaver-Use-Compression": "false",
  "X-Tokensaver-Use-Pii-Filter": "false",
  "X-Tokensaver-Options": "{\"cache_similarity_threshold\":0.85}"
}
Header                     Role
X-Tokensaver-Options       Single-line JSON: flags, thresholds, rag_options, pii_options, etc.
X-Tokensaver-Extensions    Same shape; merged first, but keys in Options win.
X-Tokensaver-Use-*         Per-module booleans: Cache, Rag, Compression, Pii-Filter (alias Pii).
X-Tokensaver-Provider      Optional hint when the model id has no provider prefix.
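Model resolution can be pictured as splitting the id on the first slash, falling back to the X-Tokensaver-Provider hint when there is no prefix. A sketch under that assumption:

```python
def resolve_model(model_id, provider_header=None):
    """Split a provider/model id; fall back to the X-Tokensaver-Provider hint."""
    if "/" in model_id:
        provider, name = model_id.split("/", 1)
        return provider, name
    return provider_header, model_id
```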

OpenAI Python SDK — default_headers (Use-* + Options)

import json
from openai import OpenAI

client = OpenAI(
    base_url="https://api.tokensaver.fr/openai/v1",
    api_key="ts_your_tokensaver_api_key",
    default_headers={
        "X-Tokensaver-Use-Cache": "true",
        "X-Tokensaver-Use-Rag": "false",
        "X-Tokensaver-Use-Compression": "false",
        "X-Tokensaver-Use-Pii-Filter": "false",
        "X-Tokensaver-Options": json.dumps({"cache_similarity_threshold": 0.85}),
    },
)
r = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(r.choices[0].message.content)

LangChain — default_headers (Use-* + Options)

import json
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="openai/gpt-4o",
    base_url="https://api.tokensaver.fr/openai/v1",
    api_key="ts_your_tokensaver_api_key",
    default_headers={
        "X-Tokensaver-Use-Cache": "true",
        "X-Tokensaver-Use-Rag": "true",
        "X-Tokensaver-Use-Compression": "false",
        "X-Tokensaver-Use-Pii-Filter": "false",
        "X-Tokensaver-Options": json.dumps({
            "cache_similarity_threshold": 0.85,
            "rag_similarity_threshold": 0.75,
        }),
    },
)
print(llm.invoke("Hello").content)

cURL — Use-* + Options

curl -sS "https://api.tokensaver.fr/openai/v1/chat/completions" \
  -H "Authorization: Bearer ts_your_tokensaver_api_key" \
  -H "Content-Type: application/json" \
  -H "X-Tokensaver-Use-Cache: true" \
  -H "X-Tokensaver-Use-Rag: true" \
  -H "X-Tokensaver-Use-Compression: false" \
  -H "X-Tokensaver-Use-Pii-Filter: false" \
  -H 'X-Tokensaver-Options: {"cache_similarity_threshold":0.85,"rag_similarity_threshold":0.72}' \
  -d '{"model":"openai/gpt-4o","messages":[{"role":"user","content":"Hello"}]}'

Chat UIs and localhost

Some third-party front-ends refuse localhost as an API base (SSRF rules on their side). This is not a TokenSaver error; point the front-end at a public HTTPS host or expose TokenSaver through a tunnel.

Usage fields

Non-streaming /chat/completions returns usage as prompt_tokens / completion_tokens / total_tokens. POST …/responses uses the Responses-style usage shape (input_tokens / output_tokens / total_tokens). Product spec: docs/OPENAI-API-COMPATIBILITE.md (§5bis for Responses).
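Since the two surfaces report usage under different key names, a client that handles both can normalize them into one shape; a small sketch:

```python
def normalize_usage(usage):
    """Map chat-style and Responses-style usage dicts to one common shape."""
    return {
        "prompt_tokens": usage.get("prompt_tokens", usage.get("input_tokens", 0)),
        "completion_tokens": usage.get("completion_tokens", usage.get("output_tokens", 0)),
        "total_tokens": usage.get("total_tokens", 0),
    }
```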