API Reference
POST /openai/v1/chat/completions

Create chat completion

Standard messages array, optional tools / tool_choice, and stream for SSE. Activate cache, RAG, compression, and PII filtering via the tokensaver object and/or X-Tokensaver-* headers; optionally reuse the API key's pipeline_settings with X-Tokensaver-Apply-Key-Pipeline-Defaults (see Headers for merge order).

Request highlights

  • model — prefer provider/model_id (e.g. openai/gpt-4o) from GET /openai/v1/models
  • messages — user / assistant / system / tool; the last user text becomes the pipeline prompt; prior turns feed context layers
  • stream: true → responses arrive as text/event-stream; chunks are typed chat.completion.chunk, ending with data: [DONE] (see the streaming sketch after this list)
  • tools, tool_choice — function calling; plan quotas may apply — see Tools & streaming
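
A minimal streaming sketch with the OpenAI Python SDK, using the same TokenSaver key and base_url as the SDK section below; standard OpenAI streaming semantics are assumed:

python
from openai import OpenAI

client = OpenAI(
    api_key="ts_...",  # TokenSaver key, not the vendor key
    base_url="https://api.tokensaver.fr/openai/v1",
)

# stream=True switches the response to text/event-stream
stream = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hi"}],
    stream=True,
)
for chunk in stream:  # each event is a chat.completion.chunk
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()  # iteration stops after the data: [DONE] sentinel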

Minimal curl (modules off)

bash
curl -sS "https://api.tokensaver.fr/openai/v1/chat/completions" \
  -H "Authorization: Bearer $TS_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"openai/gpt-4o","messages":[{"role":"user","content":"Hi"}]}'

Minimal curl (modules from API key)

When the TokenSaver key has pipeline_settings configured in the console and the request carries no explicit module control, X-Tokensaver-Apply-Key-Pipeline-Defaults: true merges those defaults in, exactly as POST /api/v1/pipelines/run does. The same opt-in exists as a reserved JSON key: apply_key_pipeline_defaults (sketched after the curl below).

bash
curl -sS "https://api.tokensaver.fr/openai/v1/chat/completions" \
  -H "Authorization: Bearer $TS_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Tokensaver-Apply-Key-Pipeline-Defaults: true" \
  -d '{"model":"openai/gpt-4o","messages":[{"role":"user","content":"Hi"}]}'

OpenAI Python SDK

python
from openai import OpenAI
 
client = OpenAI(
    api_key="ts_...",  # TokenSaver key, not the vendor key
    base_url="https://api.tokensaver.fr/openai/v1",
)
r = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(r.choices[0].message.content)

LangChain (ChatOpenAI)

python
import json
from langchain_openai import ChatOpenAI
 
llm = ChatOpenAI(
    model="openai/gpt-4o",
    api_key="ts_...",
    base_url="https://api.tokensaver.fr/openai/v1",
    default_headers={
        "X-Tokensaver-Options": json.dumps({
            "use_cache": True,
            "cache_similarity_threshold": 0.85,
        }),
    },
)
print(llm.invoke("Hello").content)

Use default_headers for X-Tokensaver-*. To send a tokensaver object in the JSON body, use the OpenAI Python SDK directly (extra_body) or httpx.
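
A minimal httpx sketch for the body route, posting the tokensaver object directly (endpoint and key handling as in the curl examples above; TS_KEY read from the environment):

python
import os

import httpx

resp = httpx.post(
    "https://api.tokensaver.fr/openai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['TS_KEY']}"},
    json={
        "model": "openai/gpt-4o",
        "messages": [{"role": "user", "content": "Hello"}],
        "tokensaver": {"use_cache": True},
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])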

Enable cache (threshold)

Similarity is scored 0–1; a higher threshold means a stricter match. If both are sent, headers override JSON booleans (see Headers for the merge order).

bash
curl -sS "https://api.tokensaver.fr/openai/v1/chat/completions" \
  -H "Authorization: Bearer $TS_KEY" -H "Content-Type: application/json" \
  -H 'X-Tokensaver-Use-Cache: true' \
  -H 'X-Tokensaver-Options: {"cache_similarity_threshold":0.85}' \
  -d '{"model":"openai/gpt-4o","messages":[{"role":"user","content":"Same prompt as before"}]}'

python
# Body-only (also opts in to explicit module control); sent via extra_body
payload = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Explain cache in one line."}],
    "tokensaver": {
        "use_cache": True,
        "cache_similarity_threshold": 0.85,
    },
}
r = client.chat.completions.create(  # client from the OpenAI Python SDK section
    model=payload["model"],
    messages=payload["messages"],
    extra_body={"tokensaver": payload["tokensaver"]},
)
print(r.choices[0].message.content)

Enable RAG (document_ids, top_k, threshold)

bash
curl -sS "https://api.tokensaver.fr/openai/v1/chat/completions" \
  -H "Authorization: Bearer $TS_KEY" -H "Content-Type: application/json" \
  -H 'X-Tokensaver-Use-Rag: true' \
  -H 'X-Tokensaver-Options: {"rag_similarity_threshold":0.55,"rag_options":{"document_ids":["<uuid>"],"top_k":8}}' \
  -d '{"model":"openai/gpt-4o","messages":[{"role":"user","content":"What does our handbook say about refunds?"}]}'

python
client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Summarize the uploaded policy."}],
    extra_body={
        "tokensaver": {
            "use_rag": True,
            "rag_similarity_threshold": 0.55,
            "rag_options": {"document_ids": ["<document_uuid>"], "top_k": 8},
        }
    },
)

Compression + PII

json
{
  "model": "openai/gpt-4o",
  "messages": [{"role": "user", "content": "…"}],
  "tokensaver": {
    "use_compression": true,
    "compression_level": 4,
    "use_pii_filter": true,
    "pii_options": {
      "engine": "gliner",
      "strategy": "mask",
      "confidence_threshold": 0.5,
      "language": "en",
      "regex_fallback": true
    }
  }
}
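
The same settings sent with the Python SDK via extra_body (client as in the SDK section above); the values mirror the JSON body above:

python
r = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Summarize this support ticket."}],
    extra_body={
        "tokensaver": {
            "use_compression": True,
            "compression_level": 4,
            "use_pii_filter": True,
            "pii_options": {
                "engine": "gliner",
                "strategy": "mask",
                "confidence_threshold": 0.5,
                "language": "en",
                "regex_fallback": True,
            },
        }
    },
)
print(r.choices[0].message.content)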

Server chat persistence (chat_id)

Optional: pass chat_id inside tokensaver (or in X-Tokensaver-Options / X-Tokensaver-Extensions) to bind the run to a server-side chat, same as the native API.

json
"tokensaver": {
  "chat_id": "<existing_server_chat_uuid>",
  "use_rag": true,
  "rag_options": {"top_k": 6}
}
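
A sketch binding the run to an existing server-side chat via extra_body (client as above; the UUID is a placeholder):

python
client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Continue from where we left off."}],
    extra_body={
        "tokensaver": {
            "chat_id": "<existing_server_chat_uuid>",
            "use_rag": True,
            "rag_options": {"top_k": 6},
        }
    },
)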

Ephemeral LLM vendor key

Same as native: tokensaver.provider_api_key overrides organisation-stored keys for that request only; it is not persisted.

json
"tokensaver": {"provider_api_key": "sk-..."}

Response shape

Standard chat.completion with choices and usage. When metadata is present, the response carries an extra tokensaver object that may include model_resolved and metadata. Some integrations also surface metrics via response headers (see Headers).
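
Since the tokensaver object is a non-standard field, the raw JSON body is the easiest place to inspect it; a sketch with httpx:

python
import os

import httpx

resp = httpx.post(
    "https://api.tokensaver.fr/openai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['TS_KEY']}"},
    json={
        "model": "openai/gpt-4o",
        "messages": [{"role": "user", "content": "Hi"}],
    },
    timeout=60,
)
body = resp.json()
print(body["choices"][0]["message"]["content"])
print(body["usage"])           # standard token usage
print(body.get("tokensaver"))  # model_resolved / metadata when present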