API Reference
POST /openai/v1/chat/completions

Create chat completion

Standard messages array, optional tools / tool_choice, and stream for SSE. Activate cache, RAG, compression, and PII filtering via the tokensaver object and/or X-Tokensaver-* headers; optionally reuse the API key's pipeline_settings with X-Tokensaver-Apply-Key-Pipeline-Defaults (see Headers for merge order).

Request highlights

  • model — prefer provider/model_id (e.g. openai/gpt-4o) from GET /openai/v1/models
  • messages — user / assistant / system / tool; the last user text becomes the pipeline prompt; prior turns feed context layers
  • stream: true → responses arrive as text/event-stream; chunks are typed chat.completion.chunk, ending with data: [DONE] (see the streaming sketch after this list)
  • tools, tool_choice — function calling; plan quotas may apply — see Tools & streaming
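
A minimal streaming sketch with the OpenAI Python SDK, using the same TokenSaver key and base_url as the SDK section below; standard OpenAI streaming semantics are assumed:

python
from openai import OpenAI

client = OpenAI(
    api_key="ts_...",  # TokenSaver key, not the vendor key
    base_url="https://api.tokensaver.fr/openai/v1",
)

# stream=True switches the response to text/event-stream
stream = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hi"}],
    stream=True,
)
for chunk in stream:  # each event is a chat.completion.chunk
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()  # iteration stops after the data: [DONE] sentinel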

Minimal curl (modules off)

bash
curl -sS "https://api.tokensaver.fr/openai/v1/chat/completions" \
  -H "Authorization: Bearer $TS_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"openai/gpt-4o","messages":[{"role":"user","content":"Hi"}]}'

Minimal curl (modules from API key)

When the TokenSaver key has pipeline_settings configured in the console and the request carries no explicit module control, X-Tokensaver-Apply-Key-Pipeline-Defaults: true merges those defaults in, exactly as POST /api/v1/pipelines/run does. The same opt-in exists as a reserved JSON key: apply_key_pipeline_defaults (sketched after the curl below).

bash
curl -sS "https://api.tokensaver.fr/openai/v1/chat/completions" \
  -H "Authorization: Bearer $TS_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Tokensaver-Apply-Key-Pipeline-Defaults: true" \
  -d '{"model":"openai/gpt-4o","messages":[{"role":"user","content":"Hi"}]}'

OpenAI Python SDK

python
from openai import OpenAI
 
client = OpenAI(
    api_key="ts_...",  # TokenSaver key, not the vendor key
    base_url="https://api.tokensaver.fr/openai/v1",
)
r = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(r.choices[0].message.content)

LangChain (ChatOpenAI)

python
import json
from langchain_openai import ChatOpenAI
 
llm = ChatOpenAI(
    model="openai/gpt-4o",
    api_key="ts_...",
    base_url="https://api.tokensaver.fr/openai/v1",
    default_headers={
        "X-Tokensaver-Options": json.dumps({
            "use_cache": True,
            "cache_similarity_threshold": 0.85,
        }),
    },
)
print(llm.invoke("Hello").content)

Use default_headers for X-Tokensaver-*. To send a tokensaver object in the JSON body, use the OpenAI Python SDK directly (extra_body) or httpx.
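
A minimal httpx sketch for the body route, posting the tokensaver object directly (endpoint and key handling as in the curl examples above; TS_KEY read from the environment):

python
import os

import httpx

resp = httpx.post(
    "https://api.tokensaver.fr/openai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['TS_KEY']}"},
    json={
        "model": "openai/gpt-4o",
        "messages": [{"role": "user", "content": "Hello"}],
        "tokensaver": {"use_cache": True},
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])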

Enable cache (threshold)

Similarity is scored 0–1; a higher threshold means a stricter match. If both are sent, headers override JSON booleans (see Headers for the merge order).

bash
curl -sS "https://api.tokensaver.fr/openai/v1/chat/completions" \
  -H "Authorization: Bearer $TS_KEY" -H "Content-Type: application/json" \
  -H 'X-Tokensaver-Use-Cache: true' \
  -H 'X-Tokensaver-Options: {"cache_similarity_threshold":0.85}' \
  -d '{"model":"openai/gpt-4o","messages":[{"role":"user","content":"Same prompt as before"}]}'

python
# Body-only (also opts in to explicit module control); sent via extra_body
payload = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Explain cache in one line."}],
    "tokensaver": {
        "use_cache": True,
        "cache_similarity_threshold": 0.85,
    },
}
r = client.chat.completions.create(  # client from the OpenAI Python SDK section
    model=payload["model"],
    messages=payload["messages"],
    extra_body={"tokensaver": payload["tokensaver"]},
)
print(r.choices[0].message.content)

Enable RAG (document_ids, top_k, threshold)

bash
curl -sS "https://api.tokensaver.fr/openai/v1/chat/completions" \
  -H "Authorization: Bearer $TS_KEY" -H "Content-Type: application/json" \
  -H 'X-Tokensaver-Use-Rag: true' \
  -H 'X-Tokensaver-Options: {"rag_similarity_threshold":0.55,"rag_options":{"document_ids":["<uuid>"],"top_k":8}}' \
  -d '{"model":"openai/gpt-4o","messages":[{"role":"user","content":"What does our handbook say about refunds?"}]}'

python
client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Summarize the uploaded policy."}],
    extra_body={
        "tokensaver": {
            "use_rag": True,
            "rag_similarity_threshold": 0.55,
            "rag_options": {"document_ids": ["<document_uuid>"], "top_k": 8},
        }
    },
)

Compression + PII

json
{
  "model": "openai/gpt-4o",
  "messages": [{"role": "user", "content": "…"}],
  "tokensaver": {
    "use_compression": true,
    "compression_level": 4,
    "use_pii_filter": true,
    "pii_options": {
      "engine": "gliner",
      "strategy": "mask",
      "confidence_threshold": 0.5,
      "language": "en",
      "regex_fallback": true
    }
  }
}
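
The same settings sent with the Python SDK via extra_body (client as in the SDK section above); the values mirror the JSON body above:

python
r = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Summarize this support ticket."}],
    extra_body={
        "tokensaver": {
            "use_compression": True,
            "compression_level": 4,
            "use_pii_filter": True,
            "pii_options": {
                "engine": "gliner",
                "strategy": "mask",
                "confidence_threshold": 0.5,
                "language": "en",
                "regex_fallback": True,
            },
        }
    },
)
print(r.choices[0].message.content)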

Server chat persistence (chat_id)

Optional: pass chat_id inside tokensaver (or in X-Tokensaver-Options / X-Tokensaver-Extensions) to bind the run to a server-side chat, same as the native API.

json
"tokensaver": {
  "chat_id": "<existing_server_chat_uuid>",
  "use_rag": true,
  "rag_options": {"top_k": 6}
}
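
A sketch binding the run to an existing server-side chat via extra_body (client as above; the UUID is a placeholder):

python
client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Continue from where we left off."}],
    extra_body={
        "tokensaver": {
            "chat_id": "<existing_server_chat_uuid>",
            "use_rag": True,
            "rag_options": {"top_k": 6},
        }
    },
)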

Ephemeral LLM vendor key

Same as native: tokensaver.provider_api_key overrides organisation-stored keys for that request only; it is not persisted.

json
"tokensaver": {"provider_api_key": "sk-..."}

Response shape

Standard chat.completion with choices and usage. When metadata is present, the response carries an extra tokensaver object that may include model_resolved and metadata. Some integrations also surface metrics via response headers (see Headers).
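
Since the tokensaver object is a non-standard field, the raw JSON body is the easiest place to inspect it; a sketch with httpx:

python
import os

import httpx

resp = httpx.post(
    "https://api.tokensaver.fr/openai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['TS_KEY']}"},
    json={
        "model": "openai/gpt-4o",
        "messages": [{"role": "user", "content": "Hi"}],
    },
    timeout=60,
)
body = resp.json()
print(body["choices"][0]["message"]["content"])
print(body["usage"])           # standard token usage
print(body.get("tokensaver"))  # model_resolved / metadata when present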