API Reference
POST /openai/v1/chat/completions

Tools & streaming

Send OpenAI-style tools and an optional tool_choice. JSON responses include message.tool_calls and finish_reason: tool_calls when the model selects a function. With stream: true, the server responds as text/event-stream, emitting deltas with delta.content and/or delta.tool_calls, a final chunk with finish_reason and usage, then data: [DONE]. Tool execution stays in your app: you reply with role: tool messages on the next request. Requests may return 403 with quota_dimension: tool_assisted_responses_per_month when plan limits apply. Tools require an OpenAI-routed model (e.g. openai/gpt-4o); other providers return 400 if tools are present.

Provider support

Function calling is validated for OpenAI chat models. Requests that include tools with a non-OpenAI provider/model pair are rejected with 400.

Non-stream JSON (first turn)

When the model calls a tool, choices[0].message.content may be empty and choices[0].message.tool_calls carries id, type, and function.name / function.arguments (JSON string). finish_reason is tool_calls.

bash
curl -sS "https://api.tokensaver.fr/openai/v1/chat/completions" \
  -H "Authorization: Bearer $TS_KEY" \
  -H "Content-Type: application/json" \
  -d @- <<'EOF'
{
  "model": "openai/gpt-4o",
  "messages": [{"role": "user", "content": "What is the weather in Paris? Use get_weather."}],
  "stream": false,
  "tool_choice": "auto",
  "parallel_tool_calls": true,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Return weather for a city.",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }
  ]
}
EOF
A tool-call response then looks like this (illustrative values; field shapes follow the description above):

json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "openai/gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
          {"id": "call_abc", "type": "function", "function": {"name": "get_weather", "arguments": "{\"city\":\"Paris\"}"}}
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {"prompt_tokens": 42, "completion_tokens": 7, "total_tokens": 49}
}

OpenAI Python SDK — JSON

python
from openai import OpenAI
 
client = OpenAI(api_key="ts_...", base_url="https://api.tokensaver.fr/openai/v1")
 
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
 
r = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Weather in Paris? Use get_weather."}],
    tools=tools,
    tool_choice="auto",
    stream=False,
)
msg = r.choices[0].message
if msg.tool_calls:
    print(msg.tool_calls[0].function.name, msg.tool_calls[0].function.arguments)
else:
    print(msg.content)

Streaming SSE (curl)

Use curl -N so chunks are not buffered. Each event is a line data: {...} (JSON) or data: [DONE]. Tool name and arguments may arrive split across several delta.tool_calls chunks (index, id, type, function.name, function.arguments).

bash
curl -N -sS "https://api.tokensaver.fr/openai/v1/chat/completions" \
  -H "Authorization: Bearer $TS_KEY" -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "stream": true,
    "tool_choice": "auto",
    "tools": [{"type":"function","function":{"name":"get_weather","parameters":{"type":"object","properties":{"city":{"type":"string"}}}}}],
    "messages": [{"role":"user","content":"Weather in Paris?"}]
  }'
text
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
 
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"id":"call_abc","type":"function","function":{"name":"get_weather","arguments":""}}]},"finish_reason":null}]}
 
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}],"usage":{"prompt_tokens":42,"completion_tokens":7,"total_tokens":49}}
 
data: [DONE]
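If you consume the stream without an SDK, each JSON payload has to be split out of its data: line and the [DONE] sentinel handled. A minimal stdlib-only sketch of that parsing step (the function name is ours):

```python
import json

def iter_sse_events(lines):
    """Yield parsed JSON chunks from "data: ..." lines; stop at [DONE].

    Skips blank keep-alive lines. Assumes one event per line, as in the
    stream shown above.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)
```

Feed it the response body line by line; iteration ends cleanly at data: [DONE].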

OpenAI Python SDK — stream=True

python
from openai import OpenAI
 
client = OpenAI(api_key="ts_...", base_url="https://api.tokensaver.fr/openai/v1")
tools = [{"type": "function", "function": {"name": "get_weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}}}]
 
stream = client.chat.completions.create(
    model="openai/gpt-4o",
    stream=True,
    tools=tools,
    messages=[{"role": "user", "content": "Weather in Paris?"}],
)
for chunk in stream:
    ch = chunk.choices[0]
    if ch.delta.content:
        print(ch.delta.content, end="", flush=True)
    if ch.delta.tool_calls:
        for tc in ch.delta.tool_calls:
            if tc.function and tc.function.name:
                print(f"\n[tool] {tc.function.name}", flush=True)
    if ch.finish_reason:
        print(f"\nfinish_reason={ch.finish_reason}", flush=True)
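Because function.arguments can arrive split across several chunks, most apps accumulate fragments keyed by index before parsing. A sketch of that accumulation step, written over dict-shaped fragments for clarity (SDK delta objects expose the same fields as attributes):

```python
import json

def merge_tool_call_fragments(fragments):
    """Merge streamed tool_call deltas into complete calls.

    Each fragment carries an "index"; id/type/name typically appear once,
    while "arguments" is concatenated across chunks and parsed at the end.
    """
    calls = {}
    for frag in fragments:
        call = calls.setdefault(
            frag["index"], {"id": None, "type": None, "name": None, "arguments": ""}
        )
        fn = frag.get("function") or {}
        call["id"] = frag.get("id") or call["id"]
        call["type"] = frag.get("type") or call["type"]
        call["name"] = fn.get("name") or call["name"]
        call["arguments"] += fn.get("arguments") or ""
    # Parse each completed argument string once the stream has finished.
    return [{**c, "arguments": json.loads(c["arguments"] or "{}")}
            for _, c in sorted(calls.items())]
```

Collect ch.delta.tool_calls fragments during the loop, then call this after finish_reason arrives.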

LangChain — stream + bind_tools

python
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
 
@tool
def get_weather(city: str) -> str:
    """Return fake weather for demo."""
    return "sunny"
 
llm = ChatOpenAI(
    model="openai/gpt-4o",
    api_key="ts_...",
    base_url="https://api.tokensaver.fr/openai/v1",
)
llm_tools = llm.bind_tools([get_weather])
for chunk in llm_tools.stream("What's the weather in Paris?"):
    # Text tokens
    if chunk.content:
        print(chunk.content, end="", flush=True)
    # Tool call fragments (LangChain aggregates tool_calls on the message)
    if getattr(chunk, "tool_calls", None):
        print(chunk.tool_calls, flush=True)

Second request (tool results)

Run your function locally, then append an assistant message (with the same tool_calls the API returned) and one tool message per call with a matching tool_call_id. The function name is optional: TokenSaver infers it from the preceding assistant tool_calls when omitted (OpenAI-compatible clients often send only tool_call_id and content). You must send tools again on the follow-up if the history contains role: tool (otherwise the API returns 400).

json
{
  "model": "openai/gpt-4o",
  "tools": [ /* same schema as first turn */ ],
  "messages": [
    {"role": "user", "content": "Weather in Paris?"},
    {
      "role": "assistant",
      "content": "",
      "tool_calls": [
        {"id": "call_abc", "type": "function", "function": {"name": "get_weather", "arguments": "{\"city\":\"Paris\"}"}}
      ]
    },
    {"role": "tool", "tool_call_id": "call_abc", "name": "get_weather", "content": "{\"conditions\":\"sunny\"}"}
  ]
}
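Assembling this follow-up by hand is mechanical: echo the assistant message with its tool_calls, then append one role: tool message per result with the matching tool_call_id. A sketch of that step (helper name and the shape of `results` are ours):

```python
import json

def build_followup_messages(history, assistant_msg, results):
    """Build the messages list for the second request.

    `results` maps tool_call_id -> Python object; each value is serialized
    to a JSON string, matching the example above.
    """
    messages = list(history) + [assistant_msg]
    for call in assistant_msg["tool_calls"]:
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "name": call["function"]["name"],  # optional: inferred when omitted
            "content": json.dumps(results[call["id"]]),
        })
    return messages
```

Send the result with the same tools array included again, as required above.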

Quotas & limits

Tool-assisted completions may count against tool_assisted_responses_per_month. When exceeded, the API returns 403 with a structured body including quota_dimension. Technical limits (tool count, argument size, number of tool_calls) apply to all tenants.
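To distinguish the quota 403 from other authorization failures, inspect the structured body for quota_dimension. A sketch over an already-parsed error payload; the exact nesting beyond the quota_dimension field is an assumption, so check it against your actual responses:

```python
def quota_dimension_of(error_body: dict):
    """Return the exceeded quota dimension from a 403 body, if present.

    Checks the top level and an "error" sub-object; this nesting is an
    assumption beyond what the docs state.
    """
    if "quota_dimension" in error_body:
        return error_body["quota_dimension"]
    return (error_body.get("error") or {}).get("quota_dimension")
```

A return of "tool_assisted_responses_per_month" signals the plan limit described above; None means the 403 came from something else.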

Pipeline modules (cache, RAG, …) use the same tokensaver object and X-Tokensaver-* headers as non-tool calls.