Tools & streaming
Send OpenAI-style tools and an optional tool_choice. JSON responses include message.tool_calls and a finish_reason of tool_calls when the model selects a function. With stream: true, the server emits a text/event-stream of deltas carrying delta.content and/or delta.tool_calls, then a final chunk with finish_reason and usage, then data: [DONE]. Tool execution stays in your app: you reply with role tool messages on the next request. Plans may return 403 with quota_dimension tool_assisted_responses_per_month when limits apply. Tools require an OpenAI-routed model (e.g. openai/gpt-4o); other providers return 400 when tools are present.
Provider support
Function calling is validated for OpenAI chat models. Requests that include tools with a non-OpenAI provider/model pair are rejected with 400.
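Because the server rejects tools on non-OpenAI routes, it can be useful to guard client-side before sending a request. A minimal sketch, assuming the `provider/model` naming shown throughout these docs (`tools_allowed` is a hypothetical helper, not part of the API; the server remains authoritative):

```python
def tools_allowed(model: str) -> bool:
    """Client-side mirror of the server rule: tools are accepted only
    for OpenAI-routed models (prefix "openai/"); anything else would
    get a 400 if tools are present."""
    return model.startswith("openai/")

# Drop the tools array (or switch model) before sending when this is False.
print(tools_allowed("openai/gpt-4o"))       # OpenAI route: tools OK
print(tools_allowed("mistral/large-latest")) # non-OpenAI route: would 400
```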
Non-stream JSON (first turn)
When the model calls a tool, choices[0].message.content may be empty and choices[0].message.tool_calls carries id, type, and function.name / function.arguments (JSON string). finish_reason is tool_calls.
curl -sS "https://api.tokensaver.fr/openai/v1/chat/completions" \
  -H "Authorization: Bearer $TS_KEY" \
  -H "Content-Type: application/json" \
  -d @- <<'EOF'
{
  "model": "openai/gpt-4o",
  "messages": [{"role": "user", "content": "What is the weather in Paris? Use get_weather."}],
  "stream": false,
  "tool_choice": "auto",
  "parallel_tool_calls": true,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Return weather for a city.",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }
  ]
}
EOF
OpenAI Python SDK — JSON
from openai import OpenAI

client = OpenAI(api_key="ts_...", base_url="https://api.tokensaver.fr/openai/v1")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

r = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Weather in Paris? Use get_weather."}],
    tools=tools,
    tool_choice="auto",
    stream=False,
)

msg = r.choices[0].message
if msg.tool_calls:
    print(msg.tool_calls[0].function.name, msg.tool_calls[0].function.arguments)
else:
    print(msg.content)
Streaming SSE (curl)
Use curl -N so chunks are not buffered. Each event is a data: {...} line (JSON) or data: [DONE]. The tool name and arguments may arrive split across several delta.tool_calls chunks (keyed by index; fields include id, type, function.name, function.arguments).
curl -N -sS "https://api.tokensaver.fr/openai/v1/chat/completions" \
  -H "Authorization: Bearer $TS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "stream": true,
    "tool_choice": "auto",
    "tools": [{"type": "function", "function": {"name": "get_weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}}}],
    "messages": [{"role": "user", "content": "Weather in Paris?"}]
  }'
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"id":"call_abc","type":"function","function":{"name":"get_weather","arguments":""}}]},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}],"usage":{"prompt_tokens":42,"completion_tokens":7,"total_tokens":49}}

data: [DONE]
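When parsing the raw SSE feed yourself, the fragmented delta.tool_calls events have to be reassembled client-side. A minimal sketch, assuming each data: payload has already been parsed into a dict (`accumulate_tool_calls` is a hypothetical helper, not part of the API):

```python
import json

def accumulate_tool_calls(chunks):
    """Merge streamed delta.tool_calls fragments into complete calls.

    Fragments are keyed by their `index`; `function.arguments` strings
    are concatenated in arrival order, then parsed as JSON at the end.
    """
    calls = {}
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta", {})
        for frag in delta.get("tool_calls", []):
            slot = calls.setdefault(
                frag["index"], {"id": None, "name": "", "arguments": ""}
            )
            if frag.get("id"):
                slot["id"] = frag["id"]
            fn = frag.get("function", {})
            if fn.get("name"):
                slot["name"] += fn["name"]
            if fn.get("arguments"):
                slot["arguments"] += fn["arguments"]
    return [
        {"id": c["id"], "name": c["name"], "args": json.loads(c["arguments"] or "{}")}
        for c in calls.values()
    ]
```

Feed it every parsed chunk until finish_reason is tool_calls, then execute each resulting call locally.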
OpenAI Python SDK — stream=True
from openai import OpenAI

client = OpenAI(api_key="ts_...", base_url="https://api.tokensaver.fr/openai/v1")

tools = [{"type": "function", "function": {"name": "get_weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}}}]

stream = client.chat.completions.create(
    model="openai/gpt-4o",
    stream=True,
    tools=tools,
    messages=[{"role": "user", "content": "Weather in Paris?"}],
)

for chunk in stream:
    ch = chunk.choices[0]
    if ch.delta.content:
        print(ch.delta.content, end="", flush=True)
    if ch.delta.tool_calls:
        for tc in ch.delta.tool_calls:
            if tc.function and tc.function.name:
                print(f"\n[tool] {tc.function.name}", flush=True)
    if ch.finish_reason:
        print(f"\nfinish_reason={ch.finish_reason}", flush=True)
LangChain — stream + bind_tools
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool

@tool
def get_weather(city: str) -> str:
    """Return fake weather for demo."""
    return "sunny"

llm = ChatOpenAI(
    model="openai/gpt-4o",
    api_key="ts_...",
    base_url="https://api.tokensaver.fr/openai/v1",
)
llm_tools = llm.bind_tools([get_weather])

for chunk in llm_tools.stream("What's the weather in Paris?"):
    # Text tokens
    if chunk.content:
        print(chunk.content, end="", flush=True)
    # Tool call fragments (LangChain aggregates tool_calls on the message)
    if getattr(chunk, "tool_calls", None):
        print(chunk.tool_calls, flush=True)
Second request (tool results)
Run your function locally, then append an assistant message (with the same tool_calls the API returned) and one tool message per call with a matching tool_call_id. The function name is optional: TokenSaver infers it from the preceding assistant tool_calls when omitted (OpenAI-compatible clients often send only tool_call_id and content). You must send tools again on the follow-up if the history contains role: tool (otherwise the API returns 400).
{
  "model": "openai/gpt-4o",
  "tools": [ /* same schema as first turn */ ],
  "messages": [
    {"role": "user", "content": "Weather in Paris?"},
    {
      "role": "assistant",
      "content": "",
      "tool_calls": [
        {"id": "call_abc", "type": "function", "function": {"name": "get_weather", "arguments": "{\"city\":\"Paris\"}"}}
      ]
    },
    {"role": "tool", "tool_call_id": "call_abc", "name": "get_weather", "content": "{\"conditions\":\"sunny\"}"}
  ]
}
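The follow-up body can also be assembled programmatically. A minimal sketch of a hypothetical `tool_result_messages` helper (not part of any SDK) that appends the assistant turn and the matching role tool messages:

```python
def tool_result_messages(history, tool_calls, results):
    """Build the follow-up message list for a tool round-trip.

    Appends an assistant message carrying the same tool_calls the API
    returned, then one role=tool message per call whose tool_call_id
    matches. `results` maps tool-call id -> the string your function
    produced.
    """
    msgs = list(history)
    msgs.append({"role": "assistant", "content": "", "tool_calls": tool_calls})
    for tc in tool_calls:
        msgs.append({
            "role": "tool",
            "tool_call_id": tc["id"],
            # "name" is optional: TokenSaver infers it from the
            # preceding assistant tool_calls when omitted.
            "name": tc["function"]["name"],
            "content": results[tc["id"]],
        })
    return msgs
```

Pass the returned list as messages, together with the same tools array, to the next chat.completions.create call.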
Quotas & limits
Tool-assisted completions may count against tool_assisted_responses_per_month. When exceeded, the API returns 403 with a structured body including quota_dimension. Technical limits (tool count, argument size, number of tool_calls) apply to all tenants.
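A quota-exhausted response can be detected explicitly before deciding whether to retry. A minimal sketch, assuming quota_dimension appears at the top level of the 403 error body (adjust the lookup if the actual error envelope nests it differently):

```python
def quota_exhausted(status_code: int, body: dict) -> bool:
    """True when a 403 signals the monthly tool-assisted quota is spent.

    Assumes `quota_dimension` sits at the top level of the parsed error
    body; nest the lookup if your plan's error shape differs. Retrying
    such a request within the same billing period will not succeed.
    """
    return (
        status_code == 403
        and body.get("quota_dimension") == "tool_assisted_responses_per_month"
    )
```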
Pipeline modules (cache, RAG, …) use the same tokensaver object and X-Tokensaver-* headers as non-tool calls.
