API Reference

Headers & body extensions

Authentication, TokenSaver pipeline toggles, and JSON merges. Chat and responses (POST /openai/v1/chat/completions and POST /openai/v1/responses): key pipeline_settings from the console apply only after X-Tokensaver-Apply-Key-Pipeline-Defaults / apply_key_pipeline_defaults (same merge as native pipelines/run); otherwise the four LLM modules default to off. Embeddings (POST /openai/v1/embeddings): uses X-Tokensaver-Provider, an optional tokensaver object / X-Tokensaver-Options, and X-Tokensaver-Use-Embedding-Cache; key defaults follow the embeddings merge order below. Per-request headers and body always win over merged defaults.

POST /openai/v1/chat/completions
POST /openai/v1/responses
POST /openai/v1/embeddings

Authentication

| Header | Role |
| --- | --- |
| Authorization: Bearer <ts_…> | Primary: TokenSaver API key (not the LLM vendor secret alone). |
| X-TokenSaver-Key: <ts_…> | Alternative if you cannot use Bearer (same key). |
bash
curl -H "Authorization: Bearer ts_..." "https://api.tokensaver.fr/openai/v1/models"
# or
curl -H "X-TokenSaver-Key: ts_..." "https://api.tokensaver.fr/openai/v1/models"

Request headers — module toggles (boolean)

Truthy values: true, 1, yes, on. Falsy: false, 0, no, off (case-insensitive). Any other value → 400.
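As an illustrative sketch (not the server's actual parser), the grammar behaves like this:

```python
TRUTHY = {"true", "1", "yes", "on"}
FALSY = {"false", "0", "no", "off"}

def parse_bool_header(value: str) -> bool:
    """Case-insensitive TokenSaver boolean header grammar; anything else is a 400."""
    v = value.strip().lower()
    if v in TRUTHY:
        return True
    if v in FALSY:
        return False
    raise ValueError("unrecognised boolean header value (HTTP 400)")
```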

On chat completions and responses, the first four headers map to PipelineRunRequest flags. When set, they override the matching boolean produced by the tokensaver and X-Tokensaver-Extensions / X-Tokensaver-Options merges.

Two different “caches”

X-Tokensaver-Use-Cache / use_cache control the LLM response cache (reuse stored assistant answers — exact match, then semantic similarity). They do not turn on Redis caching for POST /openai/v1/embeddings.

Embedding vector cache uses use_embedding_cache in the embeddings tokensaver merge, or X-Tokensaver-Use-Embedding-Cache, or per-key defaults — see the embeddings section below.
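To contrast the two toggles, here are two illustrative header/body pairs (plain dicts; send them with any HTTP client to the endpoints documented above; the model ids are placeholders):

```python
# LLM response cache: chat/responses only.
chat_headers = {"Authorization": "Bearer ts_...",
                "X-Tokensaver-Use-Cache": "true"}
chat_body = {"model": "openai/gpt-4o",
             "messages": [{"role": "user", "content": "Hi"}]}

# Embedding vector cache: /openai/v1/embeddings only.
emb_headers = {"Authorization": "Bearer ts_...",
               "X-Tokensaver-Use-Embedding-Cache": "true"}
emb_body = {"model": "openai/text-embedding-3-small", "input": "hello"}
```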

| HTTP header | Request field | Routes | Module | When true |
| --- | --- | --- | --- | --- |
| X-Tokensaver-Use-Cache | use_cache | /chat/completions, /responses | LLM response cache | Skip the LLM when a prior answer matches (exact prompt first, then similar prompts). Tune similarity with cache_similarity_threshold in JSON. Question embeddings for "near match" use cache_embedding_compute / cache_embedding_model when set. |
| X-Tokensaver-Use-Rag | use_rag | /chat/completions, /responses | RAG | Retrieve workspace chunks and inject them into context. Pass rag_options (document_ids, top_k, …) in tokensaver or JSON headers; tune with rag_similarity_threshold. |
| X-Tokensaver-Use-Compression | use_compression | /chat/completions, /responses | Compression | Semantic compression before the LLM. Set compression_level (1–5) in JSON. |
| X-Tokensaver-Use-Pii-Filter | use_pii_filter | /chat/completions, /responses | PII filter | Run PII detection/masking before the model. Configure with pii_options in JSON. |
| X-Tokensaver-Use-Pii | use_pii_filter | /chat/completions, /responses | PII (alias) | Same as X-Tokensaver-Use-Pii-Filter. |
| X-Tokensaver-Use-Embedding-Cache | use_embedding_cache | /embeddings only | Embeddings vector cache | Enable the Redis exact cache of embedding vectors for this call. Evaluated after merging key defaults, JSON headers, and body tokensaver; this header wins if present. Independent of use_cache (LLM responses). |

Request headers — apply key pipeline defaults (chat / responses)

Secondary opt-in for minimal OpenAI clients. When the client sends no explicit module control (same definition as the opt-in rule below), you may set X-Tokensaver-Apply-Key-Pipeline-Defaults: true (same truthy grammar as the other TokenSaver boolean headers) so the server merges api_keys.pipeline_settings from the TokenSaver key, with merge behaviour identical to POST /api/v1/pipelines/run. If the key has no stored settings, the four module flags still end up false. The reserved JSON key apply_key_pipeline_defaults in tokensaver, X-Tokensaver-Options, or X-Tokensaver-Extensions is equivalent; it is not a PipelineRunRequest field and is stripped after evaluation.

| HTTP header | Routes | Role |
| --- | --- | --- |
| X-Tokensaver-Apply-Key-Pipeline-Defaults | /chat/completions, /responses | When true and no explicit module control: merge key pipeline_settings. Ignored if the client already fixed module config via body or headers. |
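A minimal sketch of such a client call (the model id and key are placeholders; POST the payload with any HTTP client). No module keys appear anywhere, so this header alone decides between "merge key pipeline_settings" and "force the four module flags off":

```python
import json

headers = {
    "Authorization": "Bearer ts_...",
    "Content-Type": "application/json",
    "X-Tokensaver-Apply-Key-Pipeline-Defaults": "true",
}
payload = json.dumps({
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Hi"}],
})  # POST to https://api.tokensaver.fr/openai/v1/chat/completions
```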

tokensaver & JSON headers — module options (chat / responses)

Put these keys in the JSON body tokensaver object, or in X-Tokensaver-Extensions then X-Tokensaver-Options (same keys; Options overwrites Extensions on duplicates). Only keys that exist on PipelineRunRequest (other than prompt, provider, model) are merged. Invalid JSON in the headers → 400.

| Key | Type | Module | Description |
| --- | --- | --- | --- |
| use_cache | boolean | LLM response cache | Same meaning as X-Tokensaver-Use-Cache. Counts toward OpenAI-compat opt-in when set. |
| cache_similarity_threshold | float 0–1 | LLM response cache | Minimum similarity for "near" questions. Identical prompts hit without this threshold. |
| cache_embedding_compute | provider \| local | Embeddings for cache | Where to compute question embeddings for the semantic response cache: OpenAI catalogue vs internal EMBEDDING_SERVICE_URL. Pair with cache_embedding_model. |
| cache_embedding_model | string | Embeddings for cache | Catalogue embedding id (e.g. openai/text-embedding-3-small) when compute is provider; fixed/local model id when local. |
| use_rag | boolean | RAG | Same as X-Tokensaver-Use-Rag. |
| rag_similarity_threshold | float 0–1 | RAG | Minimum chunk similarity to the query. |
| rag_options | object | RAG | document_ids, top_k (1–100), optional query_image_url. See the dedicated table below. |
| use_compression | boolean | Compression | Same as X-Tokensaver-Use-Compression. |
| compression_level | int 1–5 | Compression | Higher = stronger compression (and optional model phase when a budget is configured server-side). |
| use_pii_filter | boolean | PII | Same as X-Tokensaver-Use-Pii-Filter. |
| pii_options | object | PII | engine, strategy, confidence_threshold, entity_types, language, regex_fallback; see the table below. |
| chat_id | string | Session | Bind the run to a server-side chat (history + instructions loaded by the backend). |
| context_layers | object | Context | Structured instruction / knowledge / interaction layers (overrides ad-hoc layers when provided). |
| temperature | float 0–2 | LLM | Also sent as the top-level OpenAI temperature; the JSON merge can override it. |
| provider_api_key | string | Vendor | Per-request LLM provider secret (not persisted); overrides org keys for that call only. |
| pipeline_id | string | Pipeline | Select a named pipeline when your workspace defines several. |
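As an illustration, the same option can travel either in the body tokensaver object or in the JSON headers; this sketch shows the duplicate-key rule (Options overwrites Extensions), with illustrative values:

```python
import json

# Body route: a "tokensaver" object next to the standard OpenAI fields.
body = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Hi"}],
    "tokensaver": {"use_compression": True, "compression_level": 2},
}

# Header route: same allowed keys; Options overwrites Extensions on
# duplicates, so compression_level resolves to 4 here.
headers = {
    "X-Tokensaver-Extensions": json.dumps({"compression_level": 2}),
    "X-Tokensaver-Options": json.dumps({"compression_level": 4}),
}
merged = {**json.loads(headers["X-Tokensaver-Extensions"]),
          **json.loads(headers["X-Tokensaver-Options"])}
```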

Request headers — JSON & provider

X-Tokensaver-Extensions and X-Tokensaver-Options must be a single-line JSON object. Invalid JSON → 400.

| Header | Format | Role |
| --- | --- | --- |
| X-Tokensaver-Provider | Plain string | Provider code (openai, anthropic, …) when model is ambiguous. Chat, responses, embeddings. |
| X-Tokensaver-Extensions | JSON object | Merges into the pipeline request: any PipelineRunRequest key except prompt, provider, model. |
| X-Tokensaver-Options | JSON object | Same allowed keys as Extensions. Merged after Extensions; duplicate keys are overwritten by Options. |
| X-Tokensaver-Rag-Options | JSON (CORS) | Listed for browser preflight. On /openai/v1/*, put RAG parameters in rag_options inside Options or tokensaver so they merge automatically. |
| X-Tokensaver-Context-Layers | JSON (CORS) | Listed for preflight. Prefer context_layers inside Options or tokensaver. |

OpenAI-compat: keys that count as “pipeline control” (opt-in)

If none of these appear in tokensaver, X-Tokensaver-Extensions, or X-Tokensaver-Options, and no X-Tokensaver-Use-* boolean header is set, the API forces use_cache, use_rag, use_compression, use_pii_filter to false.

| Key | Meaning |
| --- | --- |
| use_cache | Opt-in LLM response cache (not the embeddings vector cache). |
| use_rag | Opt-in RAG module. |
| use_compression | Opt-in compression module. |
| use_pii_filter | Opt-in PII module. |
| pii_options | Counts as explicit PII configuration. |
| rag_options | Counts as explicit RAG configuration. |
| cache_similarity_threshold | Counts as explicit cache tuning. |
| rag_similarity_threshold | Counts as explicit RAG tuning. |
| compression_level | Counts as explicit compression tuning. |
| context_layers | Counts as explicit pipeline context control. |
| pipeline_id | Counts as explicit pipeline selection. |
| apply_key_pipeline_defaults | Reserved control flag only: does not count as explicit module control; when true (and no explicit control), triggers the merge of api_keys.pipeline_settings. |

Merge order (end state on PipelineRunRequest)

  1. Body built from OpenAI fields (messages → prompt, temperature, max tokens, tools, …).
  2. tokensaver object in the JSON body (if present) merged into the pipeline request.
  3. X-Tokensaver-Extensions JSON, then X-Tokensaver-Options JSON (Options wins on duplicate keys). Invalid JSON → 400. Only keys present on PipelineRunRequest are applied from these payloads; apply_key_pipeline_defaults is read separately for the step below.
  4. X-Tokensaver-Use-* booleans — if set, they override use_cache, use_rag, use_compression, use_pii_filter from the steps above.
  5. OpenAI-compat module defaults (chat / responses only): if the client still has no explicit module control (same definition as the table above — including no parseable Use-* header), then either merge api_keys.pipeline_settings when X-Tokensaver-Apply-Key-Pipeline-Defaults / apply_key_pipeline_defaults requests it and the key has non-empty settings, or set all four module flags to false.
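The five steps above can be sketched as a merge function (illustrative only; server-side field filtering and validation are simplified):

```python
MODULE_FLAGS = ("use_cache", "use_rag", "use_compression", "use_pii_filter")
EXPLICIT_KEYS = MODULE_FLAGS + (
    "pii_options", "rag_options", "cache_similarity_threshold",
    "rag_similarity_threshold", "compression_level",
    "context_layers", "pipeline_id",
)

def merge_pipeline_request(base, tokensaver, extensions, options,
                           use_headers, apply_defaults, key_settings):
    """Sketch of the documented merge order; real validation is omitted."""
    req = dict(base)                                   # step 1: OpenAI body
    for payload in (tokensaver, extensions, options):  # steps 2-3, later wins
        req.update(payload)
    explicit = bool(use_headers) or any(
        k in payload
        for payload in (tokensaver, extensions, options)
        for k in EXPLICIT_KEYS)
    req.update(use_headers)                            # step 4: Use-* headers win
    req.pop("apply_key_pipeline_defaults", None)       # reserved flag, stripped
    if not explicit:                                   # step 5: compat defaults
        if apply_defaults and key_settings:
            req.update(key_settings)                   # merge key pipeline_settings
        else:
            req.update(dict.fromkeys(MODULE_FLAGS, False))
    return req
```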

Temperature-only Options does not enable modules

An X-Tokensaver-Options payload that only contains e.g. temperature does not count as opting in to cache/RAG/compression/PII; those stay off unless you also set explicit module keys or Use-* headers.

chat_id and provider_api_key alone do not opt in modules

On /openai/v1/*, the server only treats certain keys as “explicit pipeline control” for the opt-in rule (use_*, pii_options, rag_options, thresholds, compression_level, context_layers, pipeline_id). Sending only chat_id or provider_api_key without any of those still leaves all four module flags forced to false unless you add Use-* headers or the keys above. The reserved flag apply_key_pipeline_defaults (or the homonymous header) does not count as explicit module control; when true it requests key pipeline_settings instead of the force-off branch.

rag_options (object)

| Key | Type | Description |
| --- | --- | --- |
| document_ids | string[] \| null | Restrict retrieval to these workspace document UUIDs. |
| top_k | int \| null | Max chunks (1–100); overrides the server default when set. |
| query_image_url | string \| null | Optional image URL for multimodal RAG queries when supported. |

pii_options (object)

| Key | Type / values | Description |
| --- | --- | --- |
| engine | gliner \| spacy | Default gliner. |
| strategy | mask \| replace \| remove | How to apply detections to the text. |
| confidence_threshold | float 0–1 | Default 0.5. |
| entity_types | string[] | Presidio entity types to keep; empty = all supported. |
| language | fr \| en | For the spaCy engine; default fr. |
| regex_fallback | boolean | Default true; extra regexes for emails, phones, etc. |
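For instance, enabling the filter with explicit options might look like this (the entity type names are illustrative; use the Presidio types your workspace supports):

```python
import json

headers = {
    "X-Tokensaver-Use-Pii-Filter": "true",
    "X-Tokensaver-Options": json.dumps({
        "pii_options": {
            "engine": "gliner",
            "strategy": "mask",
            "confidence_threshold": 0.6,
            "entity_types": ["EMAIL_ADDRESS", "PHONE_NUMBER"],  # illustrative
            "regex_fallback": True,
        }
    }),
}
```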

context_layers (object)

Prefer OpenAI system + messages for the common case; use this when you need explicit layer control from integrations.

| Key | Description |
| --- | --- |
| instruction_context | Object with workspace_instruction, user_profile_instruction, chat_instruction (strings). |
| knowledge_context | rag_documents (string[]), tool_outputs (string[]); usually filled by the pipeline, optional input for advanced flows. |
| interaction_context | chat_history: array of messages with role user \| assistant \| tool, content, optional tool_calls / tool_call_id / name. |
| token_budget | Optional caps: instructions, rag, history (ints ≥ 0). |
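A sketch of the shape, with illustrative values (pass it as context_layers in Options or tokensaver):

```python
context_layers = {
    "instruction_context": {
        "workspace_instruction": "Answer concisely.",
        "chat_instruction": "Stay on the current document.",
    },
    "interaction_context": {
        "chat_history": [
            {"role": "user", "content": "Hello"},
            {"role": "assistant", "content": "Hi, how can I help?"},
        ]
    },
    "token_budget": {"instructions": 512, "history": 1024},
}
```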

OpenAI JSON body (outside tokensaver)

Standard fields on POST /openai/v1/chat/completions that the adapter maps before merges:

| Field | Role |
| --- | --- |
| model | Resolved to provider + catalogue model (prefer provider/model_id). |
| messages | Last user text → prompt; system → instructions; prior turns → history / context layers. |
| temperature | Mapped to pipeline temperature (0–2). |
| stream | If true → SSE chat.completion.chunk stream. |
| tools | OpenAI function definitions → openai_tools (OpenAI provider only). |
| tool_choice | Mapped to openai_tool_choice. |
| parallel_tool_calls | Mapped to openai_parallel_tool_calls. |
| user | Optional end-user id for logging (OpenAI field). |

Snippets: headers vs body

httpx (Python)

python
import httpx, json
 
url = "https://api.tokensaver.fr/openai/v1/chat/completions"
headers = {
    "Authorization": "Bearer ts_...",
    "Content-Type": "application/json",
    "X-Tokensaver-Use-Cache": "true",
    "X-Tokensaver-Options": json.dumps({"cache_similarity_threshold": 0.9}),
}
body = {"model": "openai/gpt-4o", "messages": [{"role": "user", "content": "Hi"}]}
r = httpx.post(url, headers=headers, json=body, timeout=120.0)
r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])

OpenAI SDK + default_headers

python
import json
from openai import OpenAI
 
client = OpenAI(
    api_key="ts_...",
    base_url="https://api.tokensaver.fr/openai/v1",
    default_headers={
        "X-Tokensaver-Use-Rag": "true",
        "X-Tokensaver-Options": json.dumps({
            "rag_similarity_threshold": 0.55,
            "rag_options": {"document_ids": ["<uuid>"], "top_k": 8},
        }),
    },
)
print(client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "What does the doc say?"}],
).choices[0].message.content)

Node (fetch)

javascript
const opts = JSON.stringify({
  use_compression: true,
  compression_level: 4,
});
const r = await fetch("https://api.tokensaver.fr/openai/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: "Bearer " + process.env.TS_KEY,
    "Content-Type": "application/json",
    "X-Tokensaver-Options": opts,
  },
  body: JSON.stringify({
    model: "openai/gpt-4o",
    messages: [{ role: "user", content: "Short summary." }],
  }),
});
console.log(await r.json());

LangChain (default_headers)

python
import json
from langchain_openai import ChatOpenAI
 
llm = ChatOpenAI(
    model="openai/gpt-4o",
    api_key="ts_...",
    base_url="https://api.tokensaver.fr/openai/v1",
    default_headers={
        "X-Tokensaver-Use-Cache": "true",
        "X-Tokensaver-Use-Rag": "true",
        "X-Tokensaver-Options": json.dumps({
            "cache_similarity_threshold": 0.88,
            "rag_similarity_threshold": 0.55,
            "rag_options": {"document_ids": ["<uuid>"], "top_k": 6},
        }),
    },
)

Response headers

Typical HTTP API responses (including /openai/v1/* JSON) include correlation and version headers. Some proxy or chat paths also forward pipeline diagnostics for UIs.

| Header | Typical use |
| --- | --- |
| X-Tokensaver-Api-Version | Backend API version string. |
| X-Request-Id | Correlation id for support and logs (also on errors). |
| X-Tokensaver-Cache-Hit | true / false when the response path exposes the cache outcome (e.g. some console proxies). |
| X-Tokensaver-Token-Metrics | Structured token / cost metrics (optional encoding via X-Tokensaver-Token-Metrics-Encoding). |
| X-Tokensaver-RAG-Sources | JSON list of RAG source snippets when exposed by the integration path. |

JSON chat.completion may include a tokensaver object (e.g. model_resolved, metadata) when the pipeline returns metadata.
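A small helper for reading these diagnostics; headers are shown as a plain dict here (with httpx or requests, pass response.headers and response.json()):

```python
def extract_diagnostics(headers, body):
    """Pull TokenSaver correlation / cache / metadata fields from a response."""
    return {
        "request_id": headers.get("X-Request-Id"),
        "api_version": headers.get("X-Tokensaver-Api-Version"),
        "cache_hit": headers.get("X-Tokensaver-Cache-Hit") == "true",
        "tokensaver": body.get("tokensaver", {}),  # e.g. model_resolved, metadata
    }
```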

POST /openai/v1/embeddings (body + TokenSaver options)

No chat pipeline. Merge order for TokenSaver-specific options: defaults from the API key's pipeline_settings (console), then X-Tokensaver-Extensions, then X-Tokensaver-Options, then body tokensaver (later wins). Header X-Tokensaver-Use-Embedding-Cache, when present, overrides the resolved use_embedding_cache boolean for this request.

| tokensaver / JSON key | Type | Description |
| --- | --- | --- |
| use_embedding_cache | boolean | Enable the Redis exact cache of embedding vectors. Same effect as X-Tokensaver-Use-Embedding-Cache when the header is set (the header wins if both are sent). |
| embedding_compute | local \| other | local → internal embedding service (EMBEDDING_SERVICE_URL); otherwise the default OpenAI (or org) path. Aliases such as embedding_compute_backend / use_internal_embedding_service are also recognised by the server. |
| (from key defaults) |  | Per-key cache_embedding_compute and use_embedding_cache from the console apply when the request does not override them. |

OpenAI-shaped body (required fields):

| Field | Description |
| --- | --- |
| model | Embedding catalogue id (provider/model_id). |
| input | String or array of strings to embed. |
| tokensaver | Optional object; merged last with the rules above. |
| encoding_format | Optional; default float (extra fields are ignored by the schema). |
| dimensions | Optional; ignored if not applicable to the internal embedding service. |
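Putting it together, an embeddings payload with TokenSaver options might look like this (model id and key are placeholders; POST with any HTTP client):

```python
import json

headers = {
    "Authorization": "Bearer ts_...",
    "Content-Type": "application/json",
    # Header override for the vector cache, if you prefer it over the body key:
    # "X-Tokensaver-Use-Embedding-Cache": "true",
}
payload = json.dumps({
    "model": "openai/text-embedding-3-small",
    "input": ["first text", "second text"],
    "tokensaver": {"use_embedding_cache": True, "embedding_compute": "local"},
})  # POST to https://api.tokensaver.fr/openai/v1/embeddings
```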

CORS

Preflight allows the X-Tokensaver-* headers listed above, including X-Tokensaver-Apply-Key-Pipeline-Defaults (plus X-TokenSaver-Key), so browser apps can send these extensions from another origin when the API CORS policy permits.