Platform plans from €0 to Enterprise: optimize tokens (local RAG, semantic cache, RAG context compression) and secure every prompt (PII detection, anonymization, sensitive-data governance). Free is a 30-day hosted trial; Starter and Growth use BYOK at provider rates.
Quotas and modules match the Plan & Usage view in the TokenSaver console. Free is a 30-day trial with hosted entry-model inference within quotas. Starter and Growth are platform fees covering cache, RAG, and governance; the upstream chat LLM is BYOK (bring your own key), billed at provider rates (see model pricing). RAG and semantic cache use local embeddings, which are included.
What every plan unlocks
One governed pipeline — token savings and data compliance, visible in the console.
Token optimization
Local RAG on your documents · semantic cache (exact + similarity) · RAG context compression before the LLM · tokens saved per run in metrics
Security & sensitive data
PII detection and anonymization (Growth+) · fewer leaks in outbound prompts · org governance, quotas, and traceability on every call
Measured in the console
Lower billed tokens: local RAG, semantic cache, and context compression
Your knowledge base stays in your perimeter. The pipeline runs cache → RAG retrieval → compression of retrieved context → PII anonymization → LLM. Early adopter pilots see measurable savings on repeated prompts and document-heavy flows — tokens saved per run in the dashboard.
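The stage order described above can be sketched as follows. This is a minimal illustration, not TokenSaver's implementation: `run_pipeline` and every stage callable passed to it are hypothetical names, and the cache is modelled as a plain dictionary with exact-match lookup.

```python
from typing import Callable, List, MutableMapping, Tuple


def run_pipeline(
    prompt: str,
    cache: MutableMapping[str, str],
    retrieve: Callable[[str], List[str]],
    compress: Callable[[List[str]], str],
    anonymize: Callable[[str], str],
    call_llm: Callable[[str], str],
) -> Tuple[str, List[str]]:
    """Run the stages in the documented order and return (answer, stages run).

    All stage functions are placeholders supplied by the caller (assumption).
    """
    trace = ["cache"]
    cached = cache.get(prompt)
    if cached is not None:
        # Cache hit: the answer is served without calling the LLM.
        return cached, trace

    excerpts = retrieve(prompt)            # local RAG retrieval
    trace.append("rag")
    context = compress(excerpts)           # compress the retrieved context
    trace.append("compression")
    safe_prompt = anonymize(f"{context}\n\n{prompt}")  # PII step before the provider call
    trace.append("pii")
    answer = call_llm(safe_prompt)         # upstream LLM call
    trace.append("llm")

    cache[prompt] = answer                 # store for later identical prompts
    return answer, trace
```

On a second call with the same prompt, only the cache stage runs, which is where the token savings on repeated prompts come from.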
The 3 levers tracked in the console
These are not marketing projections: they are TokenSaver modules measured in your dashboard.
1 · Semantic cache · Pilots: 25–45% fewer tokens
When a similar request was already handled, the answer is served again without calling the LLM — you pay few or no tokens on that run.
2 · Local RAG + compression · Leaner context in the prompt
Your documents stay in your perimeter; only useful excerpts go to the model, and on Growth and above they can be compressed first to cut input tokens.
Every pipeline run shows billed vs saved tokens — use this to measure real ROI on your workflows.
Ballpark figures from early adopter pilots — your real numbers are in the console (tokens saved per run).
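To make the "exact + similarity" cache lookup concrete, here is a toy sketch. It assumes a bag-of-words vector and cosine similarity as a stand-in for the local embeddings; the class name `SemanticCache`, the `embed` helper, and the 0.8 threshold are illustrative assumptions, not TokenSaver's actual model or settings.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for a local embedding: lowercase word counts (assumption)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # hypothetical similarity cut-off
        self.entries: List[Tuple[str, Counter, str]] = []

    def put(self, prompt: str, answer: str) -> None:
        self.entries.append((prompt, embed(prompt), answer))

    def get(self, prompt: str) -> Optional[str]:
        vector = embed(prompt)
        best_score, best_answer = 0.0, None
        for stored_prompt, stored_vector, answer in self.entries:
            if stored_prompt == prompt:
                return answer  # exact hit
            score = cosine(vector, stored_vector)
            if score > best_score:
                best_score, best_answer = score, answer
        # Similarity hit only if the closest entry clears the threshold.
        return best_answer if best_score >= self.threshold else None
```

A hit returns the stored answer with no LLM call; a miss returns `None` and the request continues down the pipeline. A real deployment would swap the word-count vector for a proper embedding model and tune the threshold against false hits.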
Billing FAQ
How does billing work for Starter and Growth?
Subscribe in the TokenSaver console with Stripe Checkout (€29 or €99 per month). Your organisation plan and quotas update after payment. Change plan later via the Stripe Customer Portal.
What is the Early Adopter (Free) plan?
A no-cost 30-day platform trial: hosted entry LLMs (no provider API key), semantic cache, and local RAG (up to 3 documents, 100 MB) — 500 requests and 150k tokens per calendar month during the trial, 50 requests per day, 20 req/min. Validate token savings and agent workflows, then upgrade to Starter or Growth (BYOK) to continue. Request access via the Early Adopter form — no credit card required.
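The trial limits listed above can be expressed as a simple pre-flight check. The numbers come from the plan description; the names `FreeTrialQuota`, `Usage`, and `first_exceeded_limit` are hypothetical, and how the console actually counts and enforces usage is not specified here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FreeTrialQuota:
    requests_per_month: int = 500
    tokens_per_month: int = 150_000
    requests_per_day: int = 50
    requests_per_minute: int = 20


@dataclass
class Usage:
    requests_this_month: int = 0
    tokens_this_month: int = 0
    requests_today: int = 0
    requests_this_minute: int = 0


def first_exceeded_limit(
    usage: Usage,
    tokens_needed: int = 0,
    quota: FreeTrialQuota = FreeTrialQuota(),
) -> Optional[str]:
    """Return the first limit one more request would break, or None if it fits."""
    if usage.requests_this_minute + 1 > quota.requests_per_minute:
        return "requests_per_minute"
    if usage.requests_today + 1 > quota.requests_per_day:
        return "requests_per_day"
    if usage.requests_this_month + 1 > quota.requests_per_month:
        return "requests_per_month"
    if usage.tokens_this_month + tokens_needed > quota.tokens_per_month:
        return "tokens_per_month"
    return None
```

Note that the daily cap is the tighter constraint in practice: 50 requests per day reaches the 500-request monthly limit after ten full days.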
How do I optimize tokens with TokenSaver?
Enable semantic cache, index documents in local RAG, and use RAG context compression (Growth) so the LLM only receives what matters. Each run shows billed vs saved tokens — ideal for support bots, copilots, and agents with recurring prompts.
How is sensitive data protected?
The PII module (Growth and Enterprise) detects and anonymizes names, emails, and other entities before the provider call. Combine it with RAG and compression to limit business data exposure in prompts. Org-level quotas and traceability are available in the console.
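As a rough illustration of anonymizing before the provider call, the sketch below replaces email addresses with stable placeholders. It is a deliberately narrow example: the regex, the `<EMAIL_n>` placeholder format, and the function name are assumptions, and detecting names or other entities needs far more than a regular expression.

```python
import re
from typing import Dict, Tuple

# Simplified email pattern for illustration; not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def anonymize_emails(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace each distinct email with a placeholder; return text and mapping."""
    mapping: Dict[str, str] = {}

    def replace(match: "re.Match[str]") -> str:
        email = match.group(0)
        if email not in mapping:
            mapping[email] = f"<EMAIL_{len(mapping) + 1}>"
        return mapping[email]

    return EMAIL_RE.sub(replace, text), mapping
```

Keeping the mapping on your side lets the original values be restored in the model's answer without ever sending them to the provider.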
When is Starter or Growth worth it?
When cache, RAG, and compression save more than the platform fee (€29 or €99/mo) — typical for repeated-prompt agents or internal-document flows. Growth adds PII and compression for compliance use cases. Real savings show up in the dashboard (tokens saved per run).
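A quick way to test this for your own workload is a break-even calculation. The platform fees (€29 and €99) are from the plans; the provider price of €5 per million tokens used in the example is purely hypothetical, so substitute the rate you actually pay, converted to the same currency as the fee.

```python
import math


def break_even_tokens(platform_fee_eur: float, price_per_million_eur: float) -> int:
    """Tokens that must be saved per month for savings to cover the fee."""
    if price_per_million_eur <= 0:
        raise ValueError("price_per_million_eur must be positive")
    return math.ceil(platform_fee_eur * 1_000_000 / price_per_million_eur)


# Hypothetical provider rate of 5 EUR per million tokens (assumption).
starter = break_even_tokens(29, 5.0)   # 5_800_000 tokens per month
growth = break_even_tokens(99, 5.0)    # 19_800_000 tokens per month
```

Compare the result with the tokens saved per run reported in the dashboard, multiplied by your monthly run count.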
Which modules are included on each plan?
Free and Starter include semantic cache and local RAG. Growth and Enterprise add PII detection and RAG context compression. Enterprise adds unlimited quotas, SLA, and dedicated support.
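The module matrix can be written down as a small lookup, which is handy for gating features in client code. The plan-to-module mapping follows the answer above; the identifiers (`PLAN_MODULES`, `has_module`, and the module keys) are illustrative and not part of any TokenSaver API.

```python
BASE_MODULES = frozenset({"semantic_cache", "local_rag"})
ADVANCED_MODULES = BASE_MODULES | {"pii_anonymization", "context_compression"}

# Hypothetical identifiers; the mapping itself mirrors the plan description.
PLAN_MODULES = {
    "free": BASE_MODULES,
    "starter": BASE_MODULES,
    "growth": ADVANCED_MODULES,
    "enterprise": ADVANCED_MODULES,
}


def has_module(plan: str, module: str) -> bool:
    """True if the given plan includes the module; raises KeyError on unknown plans."""
    return module in PLAN_MODULES[plan.lower()]
```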
Are LLM provider costs included?
Free is a 30-day trial with hosted entry-model inference within plan quotas (no provider key to start). After the trial, upgrade to Starter or Growth. Both charge a platform fee for cache, RAG, and governance, with compression and PII added on Growth. The chat LLM is BYOK: you add provider keys and pay OpenAI, Anthropic, etc. at their official rates. TokenSaver monitoring shows an indicative price per million tokens ($/M) from the model catalogue. RAG and semantic cache embeddings are computed locally (no embedding API bill). Enterprise is also BYOK, with unlimited quotas and SLA.
Can I change plans later?
Yes. Paid plans can be changed in-app via Stripe. Downgrading to Free may require cancelling your subscription in the portal first.