Pricing

Platform plans from €0 to Enterprise: optimize tokens (local RAG, semantic cache, RAG context compression) and secure every prompt (PII detection, anonymization, sensitive-data governance). Free is a 30-day hosted trial; Starter and Growth use BYOK at provider rates.

Feature comparison

Free · €0 · 30-day trial · no credit card · hosted LLM
Hosted LLMs, smart cache, and local RAG: test on real traffic and prove token savings before you subscribe.
Modules: Cache · RAG

Starter · €29 / month
BYOK, cache, and RAG in production: your provider keys and rates, with a lower LLM bill from week one.
Modules: Cache · RAG

Growth · €99 / month
PII, compression, and 4× quotas: the plan teams choose when LLMs power customer-facing apps in production.
Modules: Cache · RAG · PII · Compression

Enterprise · Custom · Annual & volume options
Unlimited quotas, SLA, and a dedicated partner, for organizations where every token, model, and datum matters.
Modules: Cache · RAG · PII · Compression

| Feature | Free | Starter | Growth | Enterprise |
|---|---|---|---|---|
| Monthly platform fee | €0 | €29 | €99 | Custom |
| Requests / month | 500 | 50,000 | 200,000 | Unlimited |
| Requests / day | 50 | - | - | - |
| Rate limit (req / min) | 20 | - | - | - |
| Tokens / month | 150k | 10M | 100M | Unlimited |
| Tool-assisted responses / month (runs with OpenAI tools enabled) | 200 | 10,000 | 80,000 | Unlimited |
| TokenSaver API keys | 1 | 5 | 10 | Unlimited |
| RAG documents (quota; upload limit when RAG module is enabled on your plan) | 3 | 10 | 200 | Unlimited |
| Knowledge storage | 100 MB | 5 GB | 20 GB | Unlimited |
| Semantic compression | No | No | Yes | Yes |
| PII detection & anonymization | No | No | Yes | Yes |
| RAG retrieval module (document upload & retrieval in pipeline) | Yes | Yes | Yes | Yes |
| Upstream LLM (chat) | 5 hosted entry models | 108 models BYOK | 196 models BYOK | 261+ full catalogue |
| RAG / cache embeddings (local open-source models such as bge-large; no OpenAI embedding API charge) | Included | Included | Included | Included |
| Indicative model pricing (console) | Included (hosted) | Provider rates | Provider rates | Provider rates |
| Semantic cache (token savings; hit rate and avoided tokens visible in console) | Yes | Yes | Yes | Yes |
| Contractual SLA & dedicated support | No | No | No | Yes |
| Get started | Sign up for free | Subscribe in app | Subscribe in app | Contact sales |

Also in the comparison: OpenAI-compatible API · Console & usage monitoring · Self-service billing (Stripe)

  • Requests / day: 50/day + 20 RPM fair use · monthly cap for platform trial

  • Upstream LLM (chat): Free uses hosted entry models within quotas. Paid plans are BYOK: you pay providers at list price

  • Indicative model pricing (console): pass-through estimate from the llm_models catalogue; the actual bill comes from your provider

Quotas and modules match the TokenSaver console Plan & Usage view. Free is a 30-day trial with hosted entry-model inference within quotas. Starter and Growth are platform fees (cache, RAG, governance) — upstream chat LLM is BYOK at provider rates (see model pricing). RAG and semantic cache use local embeddings (included).

What every plan unlocks

One governed pipeline — token savings and data compliance, visible in the console.

  • Token optimization

    Local RAG on your documents · semantic cache (exact + similarity) · RAG context compression before the LLM · tokens saved per run in metrics

  • Security & sensitive data

    PII detection and anonymization (Growth+) · fewer leaks in outbound prompts · org governance, quotas, and traceability on every call

Measured in the console

Lower billed tokens: local RAG, semantic cache, and context compression

Your knowledge base stays in your perimeter. The pipeline runs cache → RAG retrieval → compression of retrieved context → PII anonymization → LLM. Early adopter pilots see measurable savings on repeated prompts and document-heavy flows — tokens saved per run in the dashboard.
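The stage order described above can be sketched as a small function. This is only an illustration of the ordering (cache, then RAG retrieval, then compression, then PII anonymization, then the LLM call); the function name, the `RunResult` type, and the callable parameters are hypothetical and are not TokenSaver's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class RunResult:
    answer: str
    cache_hit: bool
    stages: list[str] = field(default_factory=list)  # stages that actually ran


def run_pipeline(
    prompt: str,
    cache_lookup: Callable[[str], Optional[str]],
    retrieve: Callable[[str], list[str]],
    compress: Callable[[list[str]], list[str]],
    anonymize: Callable[[str], str],
    call_llm: Callable[[str], str],
) -> RunResult:
    """Hypothetical sketch of the stage order; every callable is supplied by the caller."""
    stages = ["cache"]
    cached = cache_lookup(prompt)
    if cached is not None:
        # Cache hit: the remaining stages, including the LLM call, are skipped.
        return RunResult(cached, True, stages)

    stages.append("rag")
    excerpts = retrieve(prompt)

    stages.append("compression")
    excerpts = compress(excerpts)

    stages.append("pii")
    # Anonymize the full outbound text (retrieved context plus the prompt).
    outbound = anonymize("\n".join(excerpts + [prompt]))

    stages.append("llm")
    return RunResult(call_llm(outbound), False, stages)
```

On a cache hit only the first stage runs, which is why such a run bills close to zero LLM tokens.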

The 3 levers tracked in the console

These are not marketing projections: they are TokenSaver modules measured in your dashboard.

  • 1 · Semantic cache · Pilots: 25–45% fewer tokens

    When a similar request was already handled, the answer is served again without calling the LLM — you pay few or no tokens on that run.

  • 2 · Local RAG + compression · Leaner context in the prompt

    Your documents stay in your perimeter; only useful excerpts go to the model, often compressed (Growth) to cut input tokens.

  • 3 · Per-run tracking · Full cache hit ≈ 0 LLM tokens billed

    Every pipeline run shows billed vs saved tokens — use this to measure real ROI on your workflows.

Ballpark figures from early adopter pilots — your real numbers are in the console (tokens saved per run).
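To make the first lever concrete, here is a minimal sketch of a cache that tries an exact match first and then a similarity match. It is an assumption-laden toy: the bag-of-words `embed` function stands in for a real local embedding model, and the `SemanticCache` class and its 0.8 threshold are hypothetical, not TokenSaver internals.

```python
import math
from collections import Counter
from typing import Optional


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real setup would use an embedding model instead."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # hypothetical similarity cut-off
        self.exact: dict[str, str] = {}
        self.entries: list[tuple[Counter, str]] = []

    def put(self, prompt: str, answer: str) -> None:
        self.exact[prompt] = answer
        self.entries.append((embed(prompt), answer))

    def get(self, prompt: str) -> Optional[str]:
        # 1) Exact match: identical prompt text.
        if prompt in self.exact:
            return self.exact[prompt]
        # 2) Similarity match: best stored prompt at or above the threshold.
        query = embed(prompt)
        best_score, best_answer = 0.0, None
        for vector, answer in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None
```

A returned answer means no LLM call is needed for that run; `None` means the request continues down the pipeline. The threshold trades hit rate against the risk of serving an answer to a question that only looks similar.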

Billing FAQ

How does billing work for Starter and Growth?

Subscribe in the TokenSaver console with Stripe Checkout (€29 or €99 per month). Your organization's plan and quotas update after payment. Change plan later via the Stripe Customer Portal.

What is the Early Adopter (Free) plan?

A no-cost 30-day platform trial: hosted entry LLMs (no provider API key), semantic cache, and local RAG (up to 3 documents, 100 MB) — 500 requests and 150k tokens per calendar month during the trial, 50 requests per day, 20 req/min. Validate token savings and agent workflows, then upgrade to Starter or Growth (BYOK) to continue. Request access via the Early Adopter form — no credit card required.
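The four Free-plan limits quoted above can be expressed as a simple check. This sketch only encodes the published numbers; the `Quota` and `Usage` types and the function name are hypothetical, and how the platform actually counts and enforces usage is not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Quota:
    requests_per_month: int
    tokens_per_month: int
    requests_per_day: int
    requests_per_minute: int


# Free-plan figures as stated in the answer above.
FREE = Quota(
    requests_per_month=500,
    tokens_per_month=150_000,
    requests_per_day=50,
    requests_per_minute=20,
)


@dataclass
class Usage:
    requests_this_month: int = 0
    tokens_this_month: int = 0
    requests_today: int = 0
    requests_this_minute: int = 0


def first_exhausted_limit(usage: Usage, quota: Quota) -> Optional[str]:
    """Return the name of the first limit already used up, or None if a request fits."""
    checks = [
        ("requests_per_minute", usage.requests_this_minute, quota.requests_per_minute),
        ("requests_per_day", usage.requests_today, quota.requests_per_day),
        ("requests_per_month", usage.requests_this_month, quota.requests_per_month),
        ("tokens_per_month", usage.tokens_this_month, quota.tokens_per_month),
    ]
    for name, used, limit in checks:
        if used >= limit:
            return name
    return None
```

For example, a trial account that has already sent 50 requests today is blocked by the daily limit even if most of its monthly quota is still unused.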

How do I optimize tokens with TokenSaver?

Enable semantic cache, index documents in local RAG, and use RAG context compression (Growth) so the LLM only receives what matters. Each run shows billed vs saved tokens — ideal for support bots, copilots, and agents with recurring prompts.

How is sensitive data protected?

The PII module (Growth and Enterprise) detects and anonymizes names, emails, and other entities before the provider call. Combine it with RAG and compression to limit business data exposure in prompts. Org-level quotas and traceability are available in the console.
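As a rough illustration of anonymizing before the provider call, the sketch below replaces email addresses with stable placeholders. It is a deliberately narrow assumption: it covers emails only, uses a simple regular expression, and the function name and placeholder format are hypothetical. It does not describe how the PII module itself detects names or other entities.

```python
import re

# Simple email pattern; good enough for a sketch, not a full address validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def anonymize_emails(text: str) -> tuple[str, dict[str, str]]:
    """Replace each distinct email with a placeholder; return text and the mapping."""
    placeholders: dict[str, str] = {}  # original value -> placeholder

    def _swap(match: re.Match) -> str:
        value = match.group(0)
        if value not in placeholders:
            placeholders[value] = f"<EMAIL_{len(placeholders) + 1}>"
        return placeholders[value]

    return EMAIL_RE.sub(_swap, text), placeholders


masked, mapping = anonymize_emails(
    "Contact anna@example.com or bob@example.org; anna@example.com prefers email."
)
print(masked)  # Contact <EMAIL_1> or <EMAIL_2>; <EMAIL_1> prefers email.
```

Reusing the same placeholder for a repeated address keeps the prompt coherent for the model, and the mapping allows the original values to be restored in the response on your side.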

When is Starter or Growth worth it?

When cache, RAG, and compression save more than the platform fee (€29 or €99/mo) — typical for repeated-prompt agents or internal-document flows. Growth adds PII and compression for compliance use cases. Real savings show up in the dashboard (tokens saved per run).
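The break-even point can be estimated with simple arithmetic. The sketch below assumes a single blended provider price per million tokens and, for simplicity, that the fee and the provider price are in the same currency; the function names and the example price of 5 per million tokens are hypothetical, not figures from the catalogue.

```python
def break_even_tokens(platform_fee: float, price_per_million: float) -> float:
    """Tokens that must be saved per month for savings to equal the platform fee."""
    return platform_fee * 1_000_000 / price_per_million


def monthly_net_saving(
    tokens_saved: int, price_per_million: float, platform_fee: float
) -> float:
    """Provider cost avoided minus the platform fee (positive means the plan pays off)."""
    return tokens_saved / 1_000_000 * price_per_million - platform_fee


# Hypothetical blended provider price: 5 per million tokens.
PRICE = 5.0
print(break_even_tokens(29, PRICE))   # Starter: tokens to save per month
print(break_even_tokens(99, PRICE))   # Growth: tokens to save per month
print(monthly_net_saving(10_000_000, PRICE, 29))
```

Under these assumptions, Starter pays for itself once about 5.8M tokens per month are avoided and Growth at about 19.8M; plug in the tokens saved per run from the dashboard and your own provider rates to get real numbers.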

Which modules are included on each plan?

Free and Starter include cache and RAG. Growth and Enterprise add PII detection and semantic compression. Enterprise adds unlimited quotas, SLA, and dedicated support.

Are LLM provider costs included?

Free is a 30-day trial with hosted entry-model inference within plan quotas (no provider key to start). After the trial, upgrade to Starter or Growth. Starter and Growth charge a platform fee for cache, RAG, compression, PII, and governance — chat LLM is BYOK: you add provider keys and pay OpenAI, Anthropic, etc. at their official rates. TokenSaver monitoring shows indicative $/M from the model catalogue. RAG and semantic cache embeddings are computed locally (no embedding API bill). Enterprise is also BYOK with unlimited quotas and SLA.

Can I change plans later?

Yes. Paid plans can be changed in-app via Stripe. Downgrading to Free may require cancelling your subscription in the portal first.

Join teams building with governed LLM traffic