API Reference

Headers & body extensions

Authentication, TokenSaver pipeline toggles, and JSON merges. Chat and responses (POST /openai/v1/chat/completions and POST /openai/v1/responses): key pipeline_settings from the console apply only after X-Tokensaver-Apply-Key-Pipeline-Defaults / apply_key_pipeline_defaults (same merge as native pipelines/run); otherwise the four LLM modules default to off. Embeddings (POST /openai/v1/embeddings): uses X-Tokensaver-Provider, an optional tokensaver object / X-Tokensaver-Options, and X-Tokensaver-Use-Embedding-Cache; key defaults follow the embeddings merge order below. Per-request headers and body always win over merged defaults.

POST /openai/v1/chat/completions
POST /openai/v1/responses
POST /openai/v1/embeddings

Authentication

| Header | Role |
| --- | --- |
| Authorization: Bearer <ts_…> | Primary: TokenSaver API key (not the LLM vendor secret alone). |
| X-TokenSaver-Key: <ts_…> | Alternative if you cannot use Bearer (same key). |
bash
curl -H "Authorization: Bearer ts_..." "https://api.tokensaver.fr/openai/v1/models"
# or
curl -H "X-TokenSaver-Key: ts_..." "https://api.tokensaver.fr/openai/v1/models"

Request headers — module toggles (boolean)

Truthy values: true, 1, yes, on. Falsy: false, 0, no, off (case-insensitive). Any other value → 400.
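As an illustrative sketch (not the server's actual parser), the grammar behaves like this:

```python
TRUTHY = {"true", "1", "yes", "on"}
FALSY = {"false", "0", "no", "off"}

def parse_bool_header(value: str) -> bool:
    """Case-insensitive TokenSaver boolean header grammar; anything else is a 400."""
    v = value.strip().lower()
    if v in TRUTHY:
        return True
    if v in FALSY:
        return False
    raise ValueError("unrecognised boolean header value (HTTP 400)")
```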

On chat completions and responses, the first four headers map to PipelineRunRequest flags. When set, they override the matching boolean produced by the tokensaver and X-Tokensaver-Extensions / X-Tokensaver-Options merges.

Two different “caches”

X-Tokensaver-Use-Cache / use_cache control the LLM response cache (reuse stored assistant answers — exact match, then semantic similarity). They do not turn on Redis caching for POST /openai/v1/embeddings.

Embedding vector cache uses use_embedding_cache in the embeddings tokensaver merge, or X-Tokensaver-Use-Embedding-Cache, or per-key defaults — see the embeddings section below.
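To contrast the two toggles, here are two illustrative header/body pairs (plain dicts; send them with any HTTP client to the endpoints documented above; the model ids are placeholders):

```python
# LLM response cache: chat/responses only.
chat_headers = {"Authorization": "Bearer ts_...",
                "X-Tokensaver-Use-Cache": "true"}
chat_body = {"model": "openai/gpt-4o",
             "messages": [{"role": "user", "content": "Hi"}]}

# Embedding vector cache: /openai/v1/embeddings only.
emb_headers = {"Authorization": "Bearer ts_...",
               "X-Tokensaver-Use-Embedding-Cache": "true"}
emb_body = {"model": "openai/text-embedding-3-small", "input": "hello"}
```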

| HTTP header | Request field | Routes | Module | When true |
| --- | --- | --- | --- | --- |
| X-Tokensaver-Use-Cache | use_cache | /chat/completions, /responses | LLM response cache | Skip the LLM when a prior answer matches (exact prompt first, then similar prompts). Tune similarity with cache_similarity_threshold in JSON. Question embeddings for "near match" use cache_embedding_compute / cache_embedding_model when set. |
| X-Tokensaver-Use-Rag | use_rag | /chat/completions, /responses | RAG | Retrieve workspace chunks and inject them into context. Pass rag_options (document_ids, top_k, …) in tokensaver or JSON headers; tune with rag_similarity_threshold. |
| X-Tokensaver-Use-Compression | use_compression | /chat/completions, /responses | Compression | Semantic compression before the LLM. Set compression_level (1–5) in JSON. |
| X-Tokensaver-Use-Pii-Filter | use_pii_filter | /chat/completions, /responses | PII filter | Run PII detection/masking before the model. Configure with pii_options in JSON. |
| X-Tokensaver-Use-Pii | use_pii_filter | /chat/completions, /responses | PII (alias) | Same as X-Tokensaver-Use-Pii-Filter. |
| X-Tokensaver-Use-Embedding-Cache | use_embedding_cache | /embeddings only | Embeddings vector cache | Enable the Redis exact cache of embedding vectors for this call. Evaluated after merging key defaults, JSON headers, and body tokensaver; this header wins if present. Independent of use_cache (LLM responses). |

Request headers — apply key pipeline defaults (chat / responses)

Secondary opt-in for minimal OpenAI clients. When the client sends no explicit module control (same definition as the opt-in rule below), you may set X-Tokensaver-Apply-Key-Pipeline-Defaults: true (same truthy grammar as the other TokenSaver boolean headers) so the server merges api_keys.pipeline_settings from the TokenSaver key, with merge behaviour identical to POST /api/v1/pipelines/run. If the key has no stored settings, the four module flags still end up false. The reserved JSON key apply_key_pipeline_defaults in tokensaver, X-Tokensaver-Options, or X-Tokensaver-Extensions is equivalent; it is not a PipelineRunRequest field and is stripped after evaluation.

| HTTP header | Routes | Role |
| --- | --- | --- |
| X-Tokensaver-Apply-Key-Pipeline-Defaults | /chat/completions, /responses | When true and no explicit module control: merge key pipeline_settings. Ignored if the client already fixed module config via body or headers. |
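A minimal sketch of such a client call (the model id and key are placeholders; POST the payload with any HTTP client). No module keys appear anywhere, so this header alone decides between "merge key pipeline_settings" and "force the four module flags off":

```python
import json

headers = {
    "Authorization": "Bearer ts_...",
    "Content-Type": "application/json",
    "X-Tokensaver-Apply-Key-Pipeline-Defaults": "true",
}
payload = json.dumps({
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Hi"}],
})  # POST to https://api.tokensaver.fr/openai/v1/chat/completions
```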

tokensaver & JSON headers — module options (chat / responses)

Put these keys in the JSON body tokensaver object, or in X-Tokensaver-Extensions then X-Tokensaver-Options (same keys; Options overwrites Extensions on duplicates). Only keys that exist on PipelineRunRequest (other than prompt, provider, model) are merged. Invalid JSON in the headers → 400.

| Key | Type | Module | Description |
| --- | --- | --- | --- |
| use_cache | boolean | LLM response cache | Same meaning as X-Tokensaver-Use-Cache. Counts toward OpenAI-compat opt-in when set. |
| cache_similarity_threshold | float 0–1 | LLM response cache | Minimum similarity for "near" questions. Identical prompts hit without this threshold. |
| cache_embedding_compute | provider \| local | Embeddings for cache | Where to compute question embeddings for the semantic response cache: OpenAI catalogue vs internal EMBEDDING_SERVICE_URL. Pair with cache_embedding_model. |
| cache_embedding_model | string | Embeddings for cache | Catalogue embedding id (e.g. openai/text-embedding-3-small) when compute is provider; fixed/local model id when local. |
| use_rag | boolean | RAG | Same as X-Tokensaver-Use-Rag. |
| rag_similarity_threshold | float 0–1 | RAG | Minimum chunk similarity to the query. |
| rag_options | object | RAG | document_ids, top_k (1–100), optional query_image_url. See the dedicated table below. |
| use_compression | boolean | Compression | Same as X-Tokensaver-Use-Compression. |
| compression_level | int 1–5 | Compression | Higher = stronger compression (and optional model phase when a budget is configured server-side). |
| use_pii_filter | boolean | PII | Same as X-Tokensaver-Use-Pii-Filter. |
| pii_options | object | PII | engine, strategy, confidence_threshold, entity_types, language, regex_fallback; see the table below. |
| chat_id | string | Session | Bind the run to a server-side chat (history + instructions loaded by the backend). |
| context_layers | object | Context | Structured instruction / knowledge / interaction layers (overrides ad-hoc layers when provided). |
| temperature | float 0–2 | LLM | Also sent as the top-level OpenAI temperature; the JSON merge can override it. |
| provider_api_key | string | Vendor | Per-request LLM provider secret (not persisted); overrides org keys for that call only. |
| pipeline_id | string | Pipeline | Select a named pipeline when your workspace defines several. |
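As an illustration, the same option can travel either in the body tokensaver object or in the JSON headers; this sketch shows the duplicate-key rule (Options overwrites Extensions), with illustrative values:

```python
import json

# Body route: a "tokensaver" object next to the standard OpenAI fields.
body = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Hi"}],
    "tokensaver": {"use_compression": True, "compression_level": 2},
}

# Header route: same allowed keys; Options overwrites Extensions on
# duplicates, so compression_level resolves to 4 here.
headers = {
    "X-Tokensaver-Extensions": json.dumps({"compression_level": 2}),
    "X-Tokensaver-Options": json.dumps({"compression_level": 4}),
}
merged = {**json.loads(headers["X-Tokensaver-Extensions"]),
          **json.loads(headers["X-Tokensaver-Options"])}
```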

Request headers — JSON & provider

X-Tokensaver-Extensions and X-Tokensaver-Options must be a single-line JSON object. Invalid JSON → 400.

| Header | Format | Role |
| --- | --- | --- |
| X-Tokensaver-Provider | Plain string | Provider code (openai, anthropic, …) when model is ambiguous. Chat, responses, embeddings. |
| X-Tokensaver-Extensions | JSON object | Merges into the pipeline request: any PipelineRunRequest key except prompt, provider, model. |
| X-Tokensaver-Options | JSON object | Same allowed keys as Extensions. Merged after Extensions; duplicate keys are overwritten by Options. |
| X-Tokensaver-Rag-Options | JSON (CORS) | Listed for browser preflight. On /openai/v1/*, put RAG parameters in rag_options inside Options or tokensaver so they merge automatically. |
| X-Tokensaver-Context-Layers | JSON (CORS) | Listed for preflight. Prefer context_layers inside Options or tokensaver. |

OpenAI-compat: keys that count as “pipeline control” (opt-in)

If none of these appear in tokensaver, X-Tokensaver-Extensions, or X-Tokensaver-Options, and no X-Tokensaver-Use-* boolean header is set, the API forces use_cache, use_rag, use_compression, use_pii_filter to false.

| Key | Meaning |
| --- | --- |
| use_cache | Opt-in LLM response cache (not the embeddings vector cache). |
| use_rag | Opt-in RAG module. |
| use_compression | Opt-in compression module. |
| use_pii_filter | Opt-in PII module. |
| pii_options | Counts as explicit PII configuration. |
| rag_options | Counts as explicit RAG configuration. |
| cache_similarity_threshold | Counts as explicit cache tuning. |
| rag_similarity_threshold | Counts as explicit RAG tuning. |
| compression_level | Counts as explicit compression tuning. |
| context_layers | Counts as explicit pipeline context control. |
| pipeline_id | Counts as explicit pipeline selection. |
| apply_key_pipeline_defaults | Reserved control flag only: does not count as explicit module control; when true (and no explicit control), triggers the merge of api_keys.pipeline_settings. |

Merge order (end state on PipelineRunRequest)

  1. Body built from OpenAI fields (messages → prompt, temperature, max tokens, tools, …).
  2. tokensaver object in the JSON body (if present) merged into the pipeline request.
  3. X-Tokensaver-Extensions JSON, then X-Tokensaver-Options JSON (Options wins on duplicate keys). Invalid JSON → 400. Only keys present on PipelineRunRequest are applied from these payloads; apply_key_pipeline_defaults is read separately for the step below.
  4. X-Tokensaver-Use-* booleans — if set, they override use_cache, use_rag, use_compression, use_pii_filter from the steps above.
  5. OpenAI-compat module defaults (chat / responses only): if the client still has no explicit module control (same definition as the table above — including no parseable Use-* header), then either merge api_keys.pipeline_settings when X-Tokensaver-Apply-Key-Pipeline-Defaults / apply_key_pipeline_defaults requests it and the key has non-empty settings, or set all four module flags to false.
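The five steps above can be sketched as a merge function (illustrative only; server-side field filtering and validation are simplified):

```python
MODULE_FLAGS = ("use_cache", "use_rag", "use_compression", "use_pii_filter")
EXPLICIT_KEYS = MODULE_FLAGS + (
    "pii_options", "rag_options", "cache_similarity_threshold",
    "rag_similarity_threshold", "compression_level",
    "context_layers", "pipeline_id",
)

def merge_pipeline_request(base, tokensaver, extensions, options,
                           use_headers, apply_defaults, key_settings):
    """Sketch of the documented merge order; real validation is omitted."""
    req = dict(base)                                   # step 1: OpenAI body
    for payload in (tokensaver, extensions, options):  # steps 2-3, later wins
        req.update(payload)
    explicit = bool(use_headers) or any(
        k in payload
        for payload in (tokensaver, extensions, options)
        for k in EXPLICIT_KEYS)
    req.update(use_headers)                            # step 4: Use-* headers win
    req.pop("apply_key_pipeline_defaults", None)       # reserved flag, stripped
    if not explicit:                                   # step 5: compat defaults
        if apply_defaults and key_settings:
            req.update(key_settings)                   # merge key pipeline_settings
        else:
            req.update(dict.fromkeys(MODULE_FLAGS, False))
    return req
```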

Temperature-only Options does not enable modules

An X-Tokensaver-Options payload that only contains e.g. temperature does not count as opting in to cache/RAG/compression/PII; those stay off unless you also set explicit module keys or Use-* headers.

chat_id and provider_api_key alone do not opt in modules

On /openai/v1/*, the server only treats certain keys as “explicit pipeline control” for the opt-in rule (use_*, pii_options, rag_options, thresholds, compression_level, context_layers, pipeline_id). Sending only chat_id or provider_api_key without any of those still leaves all four module flags forced to false unless you add Use-* headers or the keys above. The reserved flag apply_key_pipeline_defaults (or the homonymous header) does not count as explicit module control; when true it requests key pipeline_settings instead of the force-off branch.

rag_options (object)

| Key | Type | Description |
| --- | --- | --- |
| document_ids | string[] \| null | Restrict retrieval to these workspace document UUIDs. |
| top_k | int \| null | Max chunks (1–100); overrides the server default when set. |
| query_image_url | string \| null | Optional image URL for multimodal RAG queries when supported. |

pii_options (object)

| Key | Type / values | Description |
| --- | --- | --- |
| engine | gliner \| spacy | Default gliner. |
| strategy | mask \| replace \| remove | How to apply detections to the text. |
| confidence_threshold | float 0–1 | Default 0.5. |
| entity_types | string[] | Presidio entity types to keep; empty = all supported. |
| language | fr \| en | For the spaCy engine; default fr. |
| regex_fallback | boolean | Default true; extra regexes for emails, phones, etc. |
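For instance, enabling the filter with explicit options might look like this (the entity type names are illustrative; use the Presidio types your workspace supports):

```python
import json

headers = {
    "X-Tokensaver-Use-Pii-Filter": "true",
    "X-Tokensaver-Options": json.dumps({
        "pii_options": {
            "engine": "gliner",
            "strategy": "mask",
            "confidence_threshold": 0.6,
            "entity_types": ["EMAIL_ADDRESS", "PHONE_NUMBER"],  # illustrative
            "regex_fallback": True,
        }
    }),
}
```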

context_layers (object)

Prefer OpenAI system + messages for the common case; use this when you need explicit layer control from integrations.

| Key | Description |
| --- | --- |
| instruction_context | Object with workspace_instruction, user_profile_instruction, chat_instruction (strings). |
| knowledge_context | rag_documents (string[]), tool_outputs (string[]); usually filled by the pipeline, optional input for advanced flows. |
| interaction_context | chat_history: array of messages with role user \| assistant \| tool, content, optional tool_calls / tool_call_id / name. |
| token_budget | Optional caps: instructions, rag, history (ints ≥ 0). |
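A sketch of the shape, with illustrative values (pass it as context_layers in Options or tokensaver):

```python
context_layers = {
    "instruction_context": {
        "workspace_instruction": "Answer concisely.",
        "chat_instruction": "Stay on the current document.",
    },
    "interaction_context": {
        "chat_history": [
            {"role": "user", "content": "Hello"},
            {"role": "assistant", "content": "Hi, how can I help?"},
        ]
    },
    "token_budget": {"instructions": 512, "history": 1024},
}
```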

OpenAI JSON body (outside tokensaver)

Standard fields on POST /openai/v1/chat/completions that the adapter maps before merges:

| Field | Role |
| --- | --- |
| model | Resolved to provider + catalogue model (prefer provider/model_id). |
| messages | Last user text → prompt; system → instructions; prior turns → history / context layers. |
| temperature | Mapped to pipeline temperature (0–2). |
| stream | If true → SSE chat.completion.chunk stream. |
| tools | OpenAI function definitions → openai_tools (OpenAI provider only). |
| tool_choice | Mapped to openai_tool_choice. |
| parallel_tool_calls | Mapped to openai_parallel_tool_calls. |
| user | Optional end-user id for logging (OpenAI field). |

Snippets: headers vs body

httpx (Python)

python
import httpx, json
 
url = "https://api.tokensaver.fr/openai/v1/chat/completions"
headers = {
    "Authorization": "Bearer ts_...",
    "Content-Type": "application/json",
    "X-Tokensaver-Use-Cache": "true",
    "X-Tokensaver-Options": json.dumps({"cache_similarity_threshold": 0.9}),
}
body = {"model": "openai/gpt-4o", "messages": [{"role": "user", "content": "Hi"}]}
r = httpx.post(url, headers=headers, json=body, timeout=120.0)
r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])

OpenAI SDK + default_headers

python
import json
from openai import OpenAI
 
client = OpenAI(
    api_key="ts_...",
    base_url="https://api.tokensaver.fr/openai/v1",
    default_headers={
        "X-Tokensaver-Use-Rag": "true",
        "X-Tokensaver-Options": json.dumps({
            "rag_similarity_threshold": 0.55,
            "rag_options": {"document_ids": ["<uuid>"], "top_k": 8},
        }),
    },
)
print(client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "What does the doc say?"}],
).choices[0].message.content)

Node (fetch)

javascript
const opts = JSON.stringify({
  use_compression: true,
  compression_level: 4,
});
const r = await fetch("https://api.tokensaver.fr/openai/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: "Bearer " + process.env.TS_KEY,
    "Content-Type": "application/json",
    "X-Tokensaver-Options": opts,
  },
  body: JSON.stringify({
    model: "openai/gpt-4o",
    messages: [{ role: "user", content: "Short summary." }],
  }),
});
console.log(await r.json());

LangChain (default_headers)

python
import json
from langchain_openai import ChatOpenAI
 
llm = ChatOpenAI(
    model="openai/gpt-4o",
    api_key="ts_...",
    base_url="https://api.tokensaver.fr/openai/v1",
    default_headers={
        "X-Tokensaver-Use-Cache": "true",
        "X-Tokensaver-Use-Rag": "true",
        "X-Tokensaver-Options": json.dumps({
            "cache_similarity_threshold": 0.88,
            "rag_similarity_threshold": 0.55,
            "rag_options": {"document_ids": ["<uuid>"], "top_k": 6},
        }),
    },
)

Response headers

Typical HTTP API responses (including /openai/v1/* JSON) include correlation and version headers. Some proxy or chat paths also forward pipeline diagnostics for UIs.

| Header | Typical use |
| --- | --- |
| X-Tokensaver-Api-Version | Backend API version string. |
| X-Request-Id | Correlation id for support and logs (also on errors). |
| X-Tokensaver-Cache-Hit | true / false when the response path exposes the cache outcome (e.g. some console proxies). |
| X-Tokensaver-Token-Metrics | Structured token / cost metrics (optional encoding via X-Tokensaver-Token-Metrics-Encoding). |
| X-Tokensaver-RAG-Sources | JSON list of RAG source snippets when exposed by the integration path. |

JSON chat.completion may include a tokensaver object (e.g. model_resolved, metadata) when the pipeline returns metadata.
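A small helper for reading these diagnostics; headers are shown as a plain dict here (with httpx or requests, pass response.headers and response.json()):

```python
def extract_diagnostics(headers, body):
    """Pull TokenSaver correlation / cache / metadata fields from a response."""
    return {
        "request_id": headers.get("X-Request-Id"),
        "api_version": headers.get("X-Tokensaver-Api-Version"),
        "cache_hit": headers.get("X-Tokensaver-Cache-Hit") == "true",
        "tokensaver": body.get("tokensaver", {}),  # e.g. model_resolved, metadata
    }
```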

POST /openai/v1/embeddings (body + TokenSaver options)

No chat pipeline. Merge order for TokenSaver-specific options: defaults from the API key's pipeline_settings (console), then X-Tokensaver-Extensions, then X-Tokensaver-Options, then body tokensaver (later wins). Header X-Tokensaver-Use-Embedding-Cache, when present, overrides the resolved use_embedding_cache boolean for this request.

| tokensaver / JSON key | Type | Description |
| --- | --- | --- |
| use_embedding_cache | boolean | Enable the Redis exact cache of embedding vectors. Same effect as X-Tokensaver-Use-Embedding-Cache when the header is set (the header wins if both are sent). |
| embedding_compute | local \| other | local → internal embedding service (EMBEDDING_SERVICE_URL); otherwise the default OpenAI (or org) path. Aliases such as embedding_compute_backend / use_internal_embedding_service are also recognised by the server. |
| (from key defaults) |  | Per-key cache_embedding_compute and use_embedding_cache from the console apply when the request does not override them. |

OpenAI-shaped body (required fields):

| Field | Description |
| --- | --- |
| model | Embedding catalogue id (provider/model_id). |
| input | String or array of strings to embed. |
| tokensaver | Optional object; merged last with the rules above. |
| encoding_format | Optional; default float (extra fields are ignored by the schema). |
| dimensions | Optional; ignored if not applicable to the internal embedding service. |
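Putting it together, an embeddings payload with TokenSaver options might look like this (model id and key are placeholders; POST with any HTTP client):

```python
import json

headers = {
    "Authorization": "Bearer ts_...",
    "Content-Type": "application/json",
    # Header override for the vector cache, if you prefer it over the body key:
    # "X-Tokensaver-Use-Embedding-Cache": "true",
}
payload = json.dumps({
    "model": "openai/text-embedding-3-small",
    "input": ["first text", "second text"],
    "tokensaver": {"use_embedding_cache": True, "embedding_compute": "local"},
})  # POST to https://api.tokensaver.fr/openai/v1/embeddings
```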

CORS

Preflight allows the X-Tokensaver-* headers listed above, including X-Tokensaver-Apply-Key-Pipeline-Defaults (plus X-TokenSaver-Key), so browser apps can send these extensions from another origin when the API CORS policy permits.