Headers & body extensions
Authentication, TokenSaver pipeline toggles, and JSON merges. Chat and responses: POST /openai/v1/chat/completions and POST /openai/v1/responses — the key's pipeline_settings from the console apply only after X-Tokensaver-Apply-Key-Pipeline-Defaults / apply_key_pipeline_defaults (same merge as native pipelines/run); otherwise the four LLM modules default to off. Embeddings: POST /openai/v1/embeddings uses X-Tokensaver-Provider, an optional tokensaver object / X-Tokensaver-Options, and X-Tokensaver-Use-Embedding-Cache; key defaults follow the embeddings merge order below. Per-request headers and body values win over merged defaults.
Authentication
| Header | Role |
|---|---|
| Authorization: Bearer <ts_…> | Primary: TokenSaver API key (not the LLM vendor secret alone). |
| X-TokenSaver-Key: <ts_…> | Alternative if you cannot use Bearer (same key). |
```shell
curl -H "Authorization: Bearer ts_..." "https://api.tokensaver.fr/openai/v1/models"
# or
curl -H "X-TokenSaver-Key: ts_..." "https://api.tokensaver.fr/openai/v1/models"
```
Request headers — module toggles (boolean)
Truthy values: true, 1, yes, on. Falsy: false, 0, no, off (case-insensitive). Any other value → 400.
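The boolean grammar above can be sketched as a small parser (a hypothetical helper for illustration, not the server's actual code):

```python
def parse_bool_header(value: str) -> bool:
    """Parse a TokenSaver boolean header value (case-insensitive)."""
    v = value.strip().lower()
    if v in {"true", "1", "yes", "on"}:
        return True
    if v in {"false", "0", "no", "off"}:
        return False
    # Any other value is rejected by the API with HTTP 400.
    raise ValueError(f"invalid boolean header value: {value!r}")

print(parse_bool_header("YES"))  # True
print(parse_bool_header("off"))  # False
```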
On chat completions and responses, the first four headers map to PipelineRunRequest flags. If set, each header overrides the same-named boolean after the tokensaver body and X-Tokensaver-Extensions / X-Tokensaver-Options merges.
Two different “caches”
X-Tokensaver-Use-Cache / use_cache control the LLM response cache (reuse stored assistant answers — exact match, then semantic similarity). They do not turn on Redis caching for POST /openai/v1/embeddings.
Embedding vector cache uses use_embedding_cache in the embeddings tokensaver merge, or X-Tokensaver-Use-Embedding-Cache, or per-key defaults — see the embeddings section below.
| HTTP header | Request field | Routes | Module | When true |
|---|---|---|---|---|
| X-Tokensaver-Use-Cache | use_cache | /chat/completions, /responses | LLM response cache | Skip the LLM when a prior answer matches (exact prompt first, then similar prompts). Tune similarity with cache_similarity_threshold in JSON. Question embeddings for “near match” use cache_embedding_compute / cache_embedding_model when set. |
| X-Tokensaver-Use-Rag | use_rag | /chat/completions, /responses | RAG | Retrieve workspace chunks and inject into context. Pass rag_options (document_ids, top_k, …) in tokensaver or JSON headers; tune with rag_similarity_threshold. |
| X-Tokensaver-Use-Compression | use_compression | /chat/completions, /responses | Compression | Semantic compression before the LLM. Set compression_level (1–5) in JSON. |
| X-Tokensaver-Use-Pii-Filter | use_pii_filter | /chat/completions, /responses | PII filter | Run PII detection/masking before the model. Configure with pii_options in JSON. |
| X-Tokensaver-Use-Pii | use_pii_filter | /chat/completions, /responses | PII (alias) | Same as X-Tokensaver-Use-Pii-Filter. |
| X-Tokensaver-Use-Embedding-Cache | use_embedding_cache | /embeddings only | Embeddings vector cache | Enable Redis exact cache of embedding vectors for this call. Evaluated after merging key defaults, JSON headers, and body tokensaver; this header wins if present. Independent of use_cache (LLM responses). |
Request headers — apply key pipeline defaults (chat / responses)
Secondary opt-in for minimal OpenAI clients. When the client sends no explicit module control (same definition as the opt-in rule below), you may set X-Tokensaver-Apply-Key-Pipeline-Defaults: true (same truthy grammar as other TokenSaver boolean headers) so the server merges api_keys.pipeline_settings from the TokenSaver key — identical merge behaviour to POST /api/v1/pipelines/run. If the key has no stored settings, the four module flags still end up false. The reserved JSON key apply_key_pipeline_defaults in tokensaver, X-Tokensaver-Options, or X-Tokensaver-Extensions is equivalent; it is not a PipelineRunRequest field and is stripped after evaluation.
| HTTP header | Routes | Role |
|---|---|---|
| X-Tokensaver-Apply-Key-Pipeline-Defaults | /chat/completions, /responses | When true and no explicit module control: merge key pipeline_settings. Ignored if the client already fixed module config via body or headers. |
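For a minimal OpenAI client that sends no module controls, opting in to key defaults only takes the one header. A sketch of such a request (the key and model id are placeholders):

```python
import json

# Bare OpenAI-shaped body: no tokensaver object, no module keys or Use-* headers,
# so the key's stored pipeline_settings are eligible to apply.
body = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Hi"}],
}

headers = {
    "Authorization": "Bearer ts_...",
    "Content-Type": "application/json",
    # Same truthy grammar as the other TokenSaver boolean headers.
    "X-Tokensaver-Apply-Key-Pipeline-Defaults": "true",
}

print(json.dumps({"headers": headers, "body": body}, indent=2))
```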
tokensaver & JSON headers — module options (chat / responses)
Put these keys in the JSON body tokensaver object, or in X-Tokensaver-Extensions then X-Tokensaver-Options (same keys; Options overwrites Extensions on duplicates). Only keys that exist on PipelineRunRequest (other than prompt, provider, model) are merged. Invalid JSON in the headers → 400.
| Key | Type | Module | Description |
|---|---|---|---|
| use_cache | boolean | LLM response cache | Same meaning as X-Tokensaver-Use-Cache. Counts toward OpenAI-compat opt-in when set. |
| cache_similarity_threshold | float 0–1 | LLM response cache | Minimum similarity for “near” questions. Identical prompts hit without this threshold. |
| cache_embedding_compute | provider \| local | Embeddings for cache | Where to compute question embeddings for semantic response cache: OpenAI catalogue vs internal EMBEDDING_SERVICE_URL. Pair with cache_embedding_model. |
| cache_embedding_model | string | Embeddings for cache | Catalogue embedding id (e.g. openai/text-embedding-3-small) when compute is provider; fixed/local model id when local. |
| use_rag | boolean | RAG | Same as X-Tokensaver-Use-Rag. |
| rag_similarity_threshold | float 0–1 | RAG | Minimum chunk similarity to the query. |
| rag_options | object | RAG | document_ids, top_k (1–100), optional query_image_url. See the dedicated table below. |
| use_compression | boolean | Compression | Same as X-Tokensaver-Use-Compression. |
| compression_level | int 1–5 | Compression | Higher = stronger compression (and optional model phase when budget is configured server-side). |
| use_pii_filter | boolean | PII | Same as X-Tokensaver-Use-Pii-Filter. |
| pii_options | object | PII | engine, strategy, confidence_threshold, entity_types, language, regex_fallback — see table below. |
| chat_id | string | Session | Bind the run to a server-side chat (history + instructions loaded by the backend). |
| context_layers | object | Context | Structured instruction / knowledge / interaction layers (overrides ad-hoc when provided). |
| temperature | float 0–2 | LLM | Also sent as top-level OpenAI temperature; JSON merge can override. |
| provider_api_key | string | Vendor | Per-request LLM provider secret (not persisted); overrides org keys for that call only. |
| pipeline_id | string | Pipeline | Select a named pipeline when your workspace defines several. |
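Putting several of the keys above together, a chat body carrying its options in the tokensaver object might look like this (values and the document UUID are illustrative):

```python
import json

body = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Summarise the onboarding doc."}],
    "tokensaver": {
        "use_cache": True,
        "cache_similarity_threshold": 0.9,   # "near match" floor for the response cache
        "use_rag": True,
        "rag_options": {"document_ids": ["<uuid>"], "top_k": 8},
        "use_compression": True,
        "compression_level": 3,              # int 1–5
    },
}
print(json.dumps(body, indent=2))
```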
Request headers — JSON & provider
X-Tokensaver-Extensions and X-Tokensaver-Options must be a single-line JSON object. Invalid JSON → 400.
| Header | Format | Role |
|---|---|---|
| X-Tokensaver-Provider | Plain string | Provider code (openai, anthropic, …) when model is ambiguous. Chat, responses, embeddings. |
| X-Tokensaver-Extensions | JSON object | Merges into the pipeline request: any PipelineRunRequest key except prompt, provider, model. |
| X-Tokensaver-Options | JSON object | Same allowed keys as Extensions. Merged after Extensions; duplicate keys are overwritten by Options. |
| X-Tokensaver-Rag-Options | JSON (CORS) | Listed for browser preflight. On /openai/v1/*, put RAG parameters in rag_options inside Options or tokensaver so they merge automatically. |
| X-Tokensaver-Context-Layers | JSON (CORS) | Listed for preflight. Prefer context_layers inside Options or tokensaver. |
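The Extensions-then-Options precedence can be illustrated with a plain dict merge (a sketch of the documented rule, not the server implementation):

```python
import json

# Both headers must be single-line JSON objects.
extensions = json.loads('{"use_rag": true, "rag_similarity_threshold": 0.5}')
options    = json.loads('{"rag_similarity_threshold": 0.7}')

# Options is merged after Extensions, so it wins on duplicate keys.
merged = {**extensions, **options}
print(merged)  # {'use_rag': True, 'rag_similarity_threshold': 0.7}
```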
OpenAI-compat: keys that count as “pipeline control” (opt-in)
If none of these appear in tokensaver, X-Tokensaver-Extensions, or X-Tokensaver-Options, and no X-Tokensaver-Use-* boolean header is set, the API forces use_cache, use_rag, use_compression, use_pii_filter to false.
| Key | Meaning |
|---|---|
| use_cache | Opt-in LLM response cache (not embeddings vector cache). |
| use_rag | Opt-in RAG module. |
| use_compression | Opt-in compression module. |
| use_pii_filter | Opt-in PII module. |
| pii_options | Counts as explicit PII configuration. |
| rag_options | Counts as explicit RAG configuration. |
| cache_similarity_threshold | Counts as explicit cache tuning. |
| rag_similarity_threshold | Counts as explicit RAG tuning. |
| compression_level | Counts as explicit compression tuning. |
| context_layers | Counts as explicit pipeline context control. |
| pipeline_id | Counts as explicit pipeline selection. |
| apply_key_pipeline_defaults | Reserved control flag only — does not count as explicit module control; when true (and no explicit control), triggers merge of api_keys.pipeline_settings. |
Merge order (end state on PipelineRunRequest)
- Body built from OpenAI fields (messages → prompt, temperature, max tokens, tools, …).
- tokensaver object in the JSON body (if present) merged into the pipeline request.
- X-Tokensaver-Extensions JSON, then X-Tokensaver-Options JSON (Options wins on duplicate keys). Invalid JSON → 400. Only keys present on PipelineRunRequest are applied from these payloads; apply_key_pipeline_defaults is read separately for the step below.
- X-Tokensaver-Use-* booleans — if set, they override use_cache, use_rag, use_compression, use_pii_filter from the steps above.
- OpenAI-compat module defaults (chat / responses only): if the client still has no explicit module control (same definition as the table above — including no parseable Use-* header), then either merge api_keys.pipeline_settings when X-Tokensaver-Apply-Key-Pipeline-Defaults / apply_key_pipeline_defaults requests it and the key has non-empty settings, or set all four module flags to false.
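The merge order can be sketched roughly as follows (hypothetical helper; the real server operates on PipelineRunRequest objects, and its opt-in rule also counts option keys such as rag_options or thresholds as explicit control — only the four module booleans are shown here):

```python
def resolve_flags(body_tokensaver, extensions, options, use_headers,
                  key_defaults=None, apply_key_defaults=False):
    """Sketch of the documented merge order for the four module booleans."""
    flags = {"use_cache": None, "use_rag": None,
             "use_compression": None, "use_pii_filter": None}
    # 1–3: body tokensaver, then Extensions, then Options (later sources win).
    for source in (body_tokensaver, extensions, options):
        for k in flags:
            if k in source:
                flags[k] = source[k]
    # 4: Use-* headers override anything from the JSON merges.
    flags.update(use_headers)
    # 5: no explicit control anywhere -> key defaults (if requested) or force off.
    if all(v is None for v in flags.values()) and not use_headers:
        if apply_key_defaults and key_defaults:
            flags = {k: bool(key_defaults.get(k, False)) for k in flags}
        else:
            flags = {k: False for k in flags}
    return {k: bool(v) for k, v in flags.items()}
```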
Temperature-only Options does not enable modules
An X-Tokensaver-Options payload that only contains e.g. temperature does not count as opting in to cache/RAG/compression/PII; those stay off unless you also set explicit module keys or Use-* headers.
chat_id and provider_api_key alone do not opt in modules
On /openai/v1/*, the server only treats certain keys as “explicit pipeline control” for the opt-in rule (use_*, pii_options, rag_options, thresholds, compression_level, context_layers, pipeline_id). Sending only chat_id or provider_api_key without any of those still leaves all four module flags forced to false unless you add Use-* headers or the keys above. The reserved flag apply_key_pipeline_defaults (or the homonymous header) does not count as explicit module control; when true it requests key pipeline_settings instead of the force-off branch.
rag_options (object)
| Key | Type | Description |
|---|---|---|
| document_ids | string[] \| null | Restrict retrieval to these workspace document UUIDs. |
| top_k | int \| null | Max chunks (1–100); overrides server default when set. |
| query_image_url | string \| null | Optional image URL for multimodal RAG queries when supported. |
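For example, a rag_options object restricting retrieval to two documents (the UUIDs are placeholders):

```python
import json

rag_options = {
    "document_ids": ["<doc-uuid-1>", "<doc-uuid-2>"],  # workspace document UUIDs
    "top_k": 12,                                       # must stay within 1–100
    # "query_image_url": "https://example.test/photo.png",  # optional, multimodal RAG
}
print(json.dumps(rag_options))
```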
pii_options (object)
| Key | Type / values | Description |
|---|---|---|
| engine | gliner \| spacy | Default gliner. |
| strategy | mask \| replace \| remove | How to apply detections to the text. |
| confidence_threshold | float 0–1 | Default 0.5. |
| entity_types | string[] | Presidio entity types to keep; empty = all supported. |
| language | fr \| en | For spaCy engine; default fr. |
| regex_fallback | boolean | Default true; extra regex for emails/phones, etc. |
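An illustrative pii_options object using the defaults where noted above (the entity types shown are examples of Presidio identifiers):

```python
pii_options = {
    "engine": "gliner",            # default engine
    "strategy": "mask",
    "confidence_threshold": 0.6,   # default is 0.5
    "entity_types": ["EMAIL_ADDRESS", "PHONE_NUMBER"],  # empty = all supported
    "language": "fr",              # spaCy engine default
    "regex_fallback": True,        # default; extra regex for emails/phones, etc.
}
print(pii_options)
```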
context_layers (object)
Prefer OpenAI system + messages for the common case; use this when you need explicit layer control from integrations.
| Key | Description |
|---|---|
| instruction_context | Object with workspace_instruction, user_profile_instruction, chat_instruction (strings). |
| knowledge_context | rag_documents (string[]), tool_outputs (string[]) — usually filled by the pipeline; optional input for advanced flows. |
| interaction_context | chat_history: array of messages with role user \| assistant \| tool, content, optional tool_calls / tool_call_id / name. |
| token_budget | Optional caps: instructions, rag, history (ints ≥ 0). |
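A minimal context_layers object exercising the keys above (contents are illustrative; omit any layer you do not need):

```python
context_layers = {
    "instruction_context": {
        "workspace_instruction": "Answer in French.",
        "chat_instruction": "Be concise.",
    },
    "interaction_context": {
        "chat_history": [
            {"role": "user", "content": "Bonjour"},
            {"role": "assistant", "content": "Bonjour ! Comment puis-je aider ?"},
        ],
    },
    "token_budget": {"history": 1024},  # ints >= 0
}
print(context_layers)
```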
OpenAI JSON body (outside tokensaver)
Standard fields on POST /openai/v1/chat/completions that the adapter maps before merges:
| Field | Role |
|---|---|
| model | Resolved to provider + catalogue model (prefer provider/model_id). |
| messages | Last user text → prompt; system → instructions; prior turns → history / context layers. |
| temperature | Mapped to pipeline temperature (0–2). |
| stream | If true → SSE chat.completion.chunk stream. |
| tools | OpenAI function definitions → openai_tools (OpenAI provider only). |
| tool_choice | Mapped to openai_tool_choice. |
| parallel_tool_calls | Mapped to openai_parallel_tool_calls. |
| user | Optional end-user id for logging (OpenAI field). |
Snippets: headers vs body
httpx (Python)
```python
import httpx, json

url = "https://api.tokensaver.fr/openai/v1/chat/completions"
headers = {
    "Authorization": "Bearer ts_...",
    "Content-Type": "application/json",
    "X-Tokensaver-Use-Cache": "true",
    "X-Tokensaver-Options": json.dumps({"cache_similarity_threshold": 0.9}),
}
body = {"model": "openai/gpt-4o", "messages": [{"role": "user", "content": "Hi"}]}

r = httpx.post(url, headers=headers, json=body, timeout=120.0)
r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])
```
OpenAI SDK + default_headers
```python
import json
from openai import OpenAI

client = OpenAI(
    api_key="ts_...",
    base_url="https://api.tokensaver.fr/openai/v1",
    default_headers={
        "X-Tokensaver-Use-Rag": "true",
        "X-Tokensaver-Options": json.dumps({
            "rag_similarity_threshold": 0.55,
            "rag_options": {"document_ids": ["<uuid>"], "top_k": 8},
        }),
    },
)
resp = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "What does the doc say?"}],
)
print(resp.choices[0].message.content)
```
Node (fetch)
```javascript
const opts = JSON.stringify({
  use_compression: true,
  compression_level: 4,
});

const r = await fetch("https://api.tokensaver.fr/openai/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: "Bearer " + process.env.TS_KEY,
    "Content-Type": "application/json",
    "X-Tokensaver-Options": opts,
  },
  body: JSON.stringify({
    model: "openai/gpt-4o",
    messages: [{ role: "user", content: "Short summary." }],
  }),
});
console.log(await r.json());
```
LangChain (default_headers)
```python
import json
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="openai/gpt-4o",
    api_key="ts_...",
    base_url="https://api.tokensaver.fr/openai/v1",
    default_headers={
        "X-Tokensaver-Use-Cache": "true",
        "X-Tokensaver-Use-Rag": "true",
        "X-Tokensaver-Options": json.dumps({
            "cache_similarity_threshold": 0.88,
            "rag_similarity_threshold": 0.55,
            "rag_options": {"document_ids": ["<uuid>"], "top_k": 6},
        }),
    },
)
```
Response headers
Typical HTTP API responses (including /openai/v1/* JSON) include correlation and version headers. Some proxy or chat paths also forward pipeline diagnostics for UIs.
| Header | Typical use |
|---|---|
| X-Tokensaver-Api-Version | Backend API version string. |
| X-Request-Id | Correlation id for support and logs (also on errors). |
| X-Tokensaver-Cache-Hit | true / false when the response path exposes cache outcome (e.g. some console proxies). |
| X-Tokensaver-Token-Metrics | Structured token / cost metrics (optional encoding via X-Tokensaver-Token-Metrics-Encoding). |
| X-Tokensaver-RAG-Sources | JSON list of RAG source snippets when exposed by the integration path. |
JSON chat.completion may include a tokensaver object (e.g. model_resolved, metadata) when the pipeline returns metadata.
POST /openai/v1/embeddings (body + TokenSaver options)
No chat pipeline. Merge order for TokenSaver-specific options: defaults from the API key's pipeline_settings (console), then X-Tokensaver-Extensions, then X-Tokensaver-Options, then body tokensaver (later wins). Header X-Tokensaver-Use-Embedding-Cache, when present, overrides the resolved use_embedding_cache boolean for this request.
| tokensaver / JSON key | Type | Description |
|---|---|---|
| use_embedding_cache | boolean | Enable Redis exact cache of embedding vectors. Same effect as X-Tokensaver-Use-Embedding-Cache when the header is set (header wins if both are sent). |
| embedding_compute | local \| other | local → internal embedding service (EMBEDDING_SERVICE_URL); otherwise the default OpenAI (or org) path. Aliases such as embedding_compute_backend / use_internal_embedding_service are also recognised by the server. |
| (from key defaults) | — | Per-key cache_embedding_compute and use_embedding_cache from the console apply when the request does not override them. |
OpenAI-shaped body (required fields):
| Field | Description |
|---|---|
| model | Embedding catalogue id (provider/model_id). |
| input | String or array of strings to embed. |
| tokensaver | Optional object; merged last with the rules above. |
| encoding_format | Optional; default float (extra fields ignored by schema). |
| dimensions | Optional; ignored if not applicable to the internal embedding service. |
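A complete embeddings request with the vector cache enabled might look like this sketch (key and inputs are placeholders; the header, when present, overrides the resolved boolean):

```python
import json

body = {
    "model": "openai/text-embedding-3-small",
    "input": ["first sentence", "second sentence"],
    "tokensaver": {"use_embedding_cache": True},  # merged last, per the order above
}
headers = {
    "Authorization": "Bearer ts_...",
    "Content-Type": "application/json",
    # Optional: overrides the resolved use_embedding_cache for this request.
    "X-Tokensaver-Use-Embedding-Cache": "true",
}
print(json.dumps(body))
```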
CORS
Preflight allows the X-Tokensaver-* headers listed above, including X-Tokensaver-Apply-Key-Pipeline-Defaults (plus X-TokenSaver-Key) so browser apps can send extensions from another origin when the API CORS policy permits.
