Pricing

Platform plans from €0 to Enterprise: optimize tokens (local RAG, semantic cache, RAG context compression) and secure every prompt (PII detection, anonymization, sensitive-data governance). Free is a 30-day hosted trial; Starter and Growth use BYOK at provider rates.

Get started free Talk to sales

Growth

Most popular

Most popular — scale without compromise

€99 / month

PII, compression, and 4× quotas: the plan teams choose when LLM powers customer-facing apps in production.

196 models included — GPT-4.1, Claude Opus, Gemini Pro, Mistral Large…
PII + compression + every TokenSaver module
200k req · 100M tokens · 20 GB RAG — best power-to-price ratio

CacheRAGPIICompression

Subscribe in app

Feature comparison — Growth

Monthly platform fee

€99

Requests / month

200,000

Requests / day50/day + 20 RPM fair use · monthly cap for platform trial

—

Rate limit (req / min)

—

Tokens / month

100M

Tool-assisted responses / monthRuns with OpenAI tools enabled

80,000

TokenSaver API keys

RAG documents (quota)Upload limit when RAG module is enabled on your plan

200

Knowledge storage

20 GB

Semantic compression

✓

PII detection & anonymization

✓

RAG retrieval moduleDocument upload & retrieval in pipeline

✓

Upstream LLM (chat)Free: hosted entry models in quotas. Paid plans: BYOK — you pay providers at list price

196 models BYOK

RAG / cache embeddingsLocal open-source models (bge-large, etc.) — no OpenAI embedding API charge

✓

Indicative model pricing (console)Pass-through estimate from llm_models catalogue; actual bill from your provider

Provider rates

Semantic cache (token savings)Hit rate and avoided tokens visible in console

✓

OpenAI-compatible API

✓

Console & usage monitoring

✓

Self-service billing (Stripe)

✓

Contractual SLA & dedicated support

—

Feature	Free€030-day trial · no credit card · hosted LLM Hosted LLMs, smart cache, and local RAG: test on real traffic and prove token savings before you subscribe. CacheRAG	Starter€29/ month BYOK, cache, and RAG in production: your provider keys and rates, with a lower LLM bill from week one. CacheRAG	Growth€99/ month PII, compression, and 4× quotas: the plan teams choose when LLM powers customer-facing apps in production. CacheRAGPIICompression	EnterpriseCustomAnnual & volume options Unlimited quotas, SLA, and a dedicated partner — for organizations where every token, model, and datum matters. CacheRAGPIICompression
Monthly platform fee	€0	€29	€99	Custom
Requests / month	500	50,000	200,000	Unlimited
Requests / day50/day + 20 RPM fair use · monthly cap for platform trial	50	—	—	—
Rate limit (req / min)	20	—	—	—
Tokens / month	150k	10M	100M	Unlimited
Tool-assisted responses / monthRuns with OpenAI tools enabled	200	10,000	80,000	Unlimited
TokenSaver API keys	1	5	10	Unlimited
RAG documents (quota)Upload limit when RAG module is enabled on your plan	3	10	200	Unlimited
Knowledge storage	100 MB	5 GB	20 GB	Unlimited
Semantic compression	—	—	✓	✓
PII detection & anonymization	—	—	✓	✓
RAG retrieval moduleDocument upload & retrieval in pipeline	✓	✓	✓	✓
Upstream LLM (chat)Free: hosted entry models in quotas. Paid plans: BYOK — you pay providers at list price	5 hosted entry models	108 models BYOK	196 models BYOK	261+ full catalogue
RAG / cache embeddingsLocal open-source models (bge-large, etc.) — no OpenAI embedding API charge	✓	✓	✓	✓
Indicative model pricing (console)Pass-through estimate from llm_models catalogue; actual bill from your provider	Included (hosted)	Provider rates	Provider rates	Provider rates
Semantic cache (token savings)Hit rate and avoided tokens visible in console	✓	✓	✓	✓
OpenAI-compatible API	✓	✓	✓	✓
Console & usage monitoring	✓	✓	✓	✓
Self-service billing (Stripe)	—	✓	✓	—
Contractual SLA & dedicated support	—	—	—	✓
	Sign up for free	Subscribe in app	Subscribe in app	Contact sales

Quotas and modules match the TokenSaver console Plan & Usage view. Free is a 30-day trial with hosted entry-model inference within quotas. Starter and Growth are platform fees (cache, RAG, governance) — upstream chat LLM is BYOK at provider rates (see model pricing). RAG and semantic cache use local embeddings (included).

What every plan unlocks

One governed pipeline — token savings and data compliance, visible in the console.

Token optimization
Local RAG on your documents · semantic cache (exact + similarity) · RAG context compression before the LLM · tokens saved per run in metrics
Security & sensitive data
PII detection and anonymization (Growth+) · fewer leaks in outbound prompts · org governance, quotas, and traceability on every call

Measured in the console

Lower billed tokens: local RAG, semantic cache, and context compression

Your knowledge base stays in your perimeter. The pipeline runs cache → RAG retrieval → compression of retrieved context → PII anonymization → LLM. Early adopter pilots see measurable savings on repeated prompts and document-heavy flows — tokens saved per run in the dashboard.

The 3 levers tracked in the console

These are not marketing projections: they are TokenSaver modules measured in your dashboard.

1 · Semantic cachePilots: 25–45% fewer tokens
When a similar request was already handled, the answer is served again without calling the LLM — you pay few or no tokens on that run.
2 · Local RAG + compressionLeaner context in the prompt
Your documents stay in your perimeter; only useful excerpts go to the model, often compressed (Growth) to cut input tokens.
3 · Per-run trackingFull cache hit ≈ 0 LLM tokens billed
Every pipeline run shows billed vs saved tokens — use this to measure real ROI on your workflows.

Ballpark figures from early adopter pilots — your real numbers are in the console (tokens saved per run).

Billing FAQ

How does billing work for Starter and Growth?

Subscribe in the TokenSaver console with Stripe Checkout (€29 or €99 per month). Your organisation plan and quotas update after payment. Change plan later via the Stripe Customer Portal.

What is the Early Adopter (Free) plan?

A no-cost 30-day platform trial: hosted entry LLMs (no provider API key), semantic cache, and local RAG (up to 3 documents, 100 MB) — 500 requests and 150k tokens per calendar month during the trial, 50 requests per day, 20 req/min. Validate token savings and agent workflows, then upgrade to Starter or Growth (BYOK) to continue. Request access via the Early Adopter form — no credit card required.

How do I optimize tokens with TokenSaver?

Enable semantic cache, index documents in local RAG, and use RAG context compression (Growth) so the LLM only receives what matters. Each run shows billed vs saved tokens — ideal for support bots, copilots, and agents with recurring prompts.

How is sensitive data protected?

The PII module (Growth and Enterprise) detects and anonymizes names, emails, and other entities before the provider call. Combine it with RAG and compression to limit business data exposure in prompts. Org-level quotas and traceability in the console.

When is Starter or Growth worth it?

When cache, RAG, and compression save more than the platform fee (€29 or €99/mo) — typical for repeated-prompt agents or internal-document flows. Growth adds PII and compression for compliance use cases. Real savings show up in the dashboard (tokens saved per run).

Which modules are included on each plan?

Free and Starter include cache and RAG. Growth and Enterprise add PII detection and semantic compression. Enterprise adds unlimited quotas, SLA, and dedicated support.

Are LLM provider costs included?

Free is a 30-day trial with hosted entry-model inference within plan quotas (no provider key to start). After the trial, upgrade to Starter or Growth. Starter and Growth charge a platform fee for cache, RAG, compression, PII, and governance — chat LLM is BYOK: you add provider keys and pay OpenAI, Anthropic, etc. at their official rates. TokenSaver monitoring shows indicative $/M from the model catalogue. RAG and semantic cache embeddings are computed locally (no embedding API bill). Enterprise is also BYOK with unlimited quotas and SLA.

Can I change plans later?

Yes. Paid plans can be changed in-app via Stripe. Downgrading to Free may require cancelling your subscription in the portal first.

Growth

What every plan unlocks

Lower billed tokens: local RAG, semantic cache, and context compression

The 3 levers tracked in the console

Billing FAQ

Join teams building with governed LLM traffic