Tools & streaming
Send OpenAI-style tools and an optional tool_choice. JSON responses include message.tool_calls and a finish_reason of tool_calls when the model selects a function. With stream: true, the server emits a text/event-stream of deltas carrying delta.content and/or delta.tool_calls, then a final chunk with finish_reason and usage, then data: [DONE]. Tool execution stays in your app: you reply with role tool messages on the next request. Plans may return 403 with quota_dimension tool_assisted_responses_per_month when limits apply. Tools require an OpenAI-routed model (e.g. openai/gpt-4o); other providers return 400 when tools are present.
Provider support
Function calling is validated for OpenAI chat models. Requests that include tools with a non-OpenAI provider/model pair are rejected with 400.
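Because the server rejects tools on non-OpenAI routes, it can be useful to guard client-side before sending a request. A minimal sketch, assuming the `provider/model` naming shown throughout these docs (`tools_allowed` is a hypothetical helper, not part of the API; the server remains authoritative):

```python
def tools_allowed(model: str) -> bool:
    """Client-side mirror of the server rule: tools are accepted only
    for OpenAI-routed models (prefix "openai/"); anything else would
    get a 400 if tools are present."""
    return model.startswith("openai/")

# Drop the tools array (or switch model) before sending when this is False.
print(tools_allowed("openai/gpt-4o"))       # OpenAI route: tools OK
print(tools_allowed("mistral/large-latest")) # non-OpenAI route: would 400
```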
Non-stream JSON (first turn)
When the model calls a tool, choices[0].message.content may be empty and choices[0].message.tool_calls carries id, type, and function.name / function.arguments (JSON string). finish_reason is tool_calls.
curl -sS "https://api.tokensaver.fr/openai/v1/chat/completions" \
  -H "Authorization: Bearer $TS_KEY" \
  -H "Content-Type: application/json" \
  -d @- <<'EOF'
{
  "model": "openai/gpt-4o",
  "messages": [{"role": "user", "content": "What is the weather in Paris? Use get_weather."}],
  "stream": false,
  "tool_choice": "auto",
  "parallel_tool_calls": true,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Return weather for a city.",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }
  ]
}
EOF
OpenAI Python SDK — JSON
from openai import OpenAI

client = OpenAI(api_key="ts_...", base_url="https://api.tokensaver.fr/openai/v1")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

r = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Weather in Paris? Use get_weather."}],
    tools=tools,
    tool_choice="auto",
    stream=False,
)

msg = r.choices[0].message
if msg.tool_calls:
    print(msg.tool_calls[0].function.name, msg.tool_calls[0].function.arguments)
else:
    print(msg.content)
Streaming SSE (curl)
Use curl -N so chunks are not buffered. Each event is a data: {...} line (JSON) or data: [DONE]. The tool name and arguments may arrive split across several delta.tool_calls chunks (keyed by index; fields include id, type, function.name, function.arguments).
curl -N -sS "https://api.tokensaver.fr/openai/v1/chat/completions" \
  -H "Authorization: Bearer $TS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "stream": true,
    "tool_choice": "auto",
    "tools": [{"type": "function", "function": {"name": "get_weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}}}],
    "messages": [{"role": "user", "content": "Weather in Paris?"}]
  }'
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"id":"call_abc","type":"function","function":{"name":"get_weather","arguments":""}}]},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}],"usage":{"prompt_tokens":42,"completion_tokens":7,"total_tokens":49}}

data: [DONE]
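When parsing the raw SSE feed yourself, the fragmented delta.tool_calls events have to be reassembled client-side. A minimal sketch, assuming each data: payload has already been parsed into a dict (`accumulate_tool_calls` is a hypothetical helper, not part of the API):

```python
import json

def accumulate_tool_calls(chunks):
    """Merge streamed delta.tool_calls fragments into complete calls.

    Fragments are keyed by their `index`; `function.arguments` strings
    are concatenated in arrival order, then parsed as JSON at the end.
    """
    calls = {}
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta", {})
        for frag in delta.get("tool_calls", []):
            slot = calls.setdefault(
                frag["index"], {"id": None, "name": "", "arguments": ""}
            )
            if frag.get("id"):
                slot["id"] = frag["id"]
            fn = frag.get("function", {})
            if fn.get("name"):
                slot["name"] += fn["name"]
            if fn.get("arguments"):
                slot["arguments"] += fn["arguments"]
    return [
        {"id": c["id"], "name": c["name"], "args": json.loads(c["arguments"] or "{}")}
        for c in calls.values()
    ]
```

Feed it every parsed chunk until finish_reason is tool_calls, then execute each resulting call locally.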
OpenAI Python SDK — stream=True
from openai import OpenAI

client = OpenAI(api_key="ts_...", base_url="https://api.tokensaver.fr/openai/v1")

tools = [{"type": "function", "function": {"name": "get_weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}}}]

stream = client.chat.completions.create(
    model="openai/gpt-4o",
    stream=True,
    tools=tools,
    messages=[{"role": "user", "content": "Weather in Paris?"}],
)

for chunk in stream:
    ch = chunk.choices[0]
    if ch.delta.content:
        print(ch.delta.content, end="", flush=True)
    if ch.delta.tool_calls:
        for tc in ch.delta.tool_calls:
            if tc.function and tc.function.name:
                print(f"\n[tool] {tc.function.name}", flush=True)
    if ch.finish_reason:
        print(f"\nfinish_reason={ch.finish_reason}", flush=True)
LangChain — stream + bind_tools
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool

@tool
def get_weather(city: str) -> str:
    """Return fake weather for demo."""
    return "sunny"

llm = ChatOpenAI(
    model="openai/gpt-4o",
    api_key="ts_...",
    base_url="https://api.tokensaver.fr/openai/v1",
)
llm_tools = llm.bind_tools([get_weather])

for chunk in llm_tools.stream("What's the weather in Paris?"):
    # Text tokens
    if chunk.content:
        print(chunk.content, end="", flush=True)
    # Tool call fragments (LangChain aggregates tool_calls on the message)
    if getattr(chunk, "tool_calls", None):
        print(chunk.tool_calls, flush=True)
Second request (tool results)
Run your function locally, then append an assistant message (with the same tool_calls the API returned) and one tool message per call with a matching tool_call_id. The function name is optional: TokenSaver infers it from the preceding assistant tool_calls when omitted (OpenAI-compatible clients often send only tool_call_id and content). You must send tools again on the follow-up if the history contains role: tool (otherwise the API returns 400).
{
  "model": "openai/gpt-4o",
  "tools": [ /* same schema as first turn */ ],
  "messages": [
    {"role": "user", "content": "Weather in Paris?"},
    {
      "role": "assistant",
      "content": "",
      "tool_calls": [
        {"id": "call_abc", "type": "function", "function": {"name": "get_weather", "arguments": "{\"city\":\"Paris\"}"}}
      ]
    },
    {"role": "tool", "tool_call_id": "call_abc", "name": "get_weather", "content": "{\"conditions\":\"sunny\"}"}
  ]
}
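The follow-up body can also be assembled programmatically. A minimal sketch of a hypothetical `tool_result_messages` helper (not part of any SDK) that appends the assistant turn and the matching role tool messages:

```python
def tool_result_messages(history, tool_calls, results):
    """Build the follow-up message list for a tool round-trip.

    Appends an assistant message carrying the same tool_calls the API
    returned, then one role=tool message per call whose tool_call_id
    matches. `results` maps tool-call id -> the string your function
    produced.
    """
    msgs = list(history)
    msgs.append({"role": "assistant", "content": "", "tool_calls": tool_calls})
    for tc in tool_calls:
        msgs.append({
            "role": "tool",
            "tool_call_id": tc["id"],
            # "name" is optional: TokenSaver infers it from the
            # preceding assistant tool_calls when omitted.
            "name": tc["function"]["name"],
            "content": results[tc["id"]],
        })
    return msgs
```

Pass the returned list as messages, together with the same tools array, to the next chat.completions.create call.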
Quotas & limits
Tool-assisted completions may count against tool_assisted_responses_per_month. When exceeded, the API returns 403 with a structured body including quota_dimension. Technical limits (tool count, argument size, number of tool_calls) apply to all tenants.
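A quota-exhausted response can be detected explicitly before deciding whether to retry. A minimal sketch, assuming quota_dimension appears at the top level of the 403 error body (adjust the lookup if the actual error envelope nests it differently):

```python
def quota_exhausted(status_code: int, body: dict) -> bool:
    """True when a 403 signals the monthly tool-assisted quota is spent.

    Assumes `quota_dimension` sits at the top level of the parsed error
    body; nest the lookup if your plan's error shape differs. Retrying
    such a request within the same billing period will not succeed.
    """
    return (
        status_code == 403
        and body.get("quota_dimension") == "tool_assisted_responses_per_month"
    )
```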
Pipeline modules (cache, RAG, …) use the same tokensaver object and X-Tokensaver-* headers as non-tool calls.
