Data stays in your cloud
Resilient SLOs under traffic bursts
Run any vLLM model
With other providers
No guaranteed SLOs in a shared API
No control over prompts, weights, or logs leaving your environment
Limited model selection, fixed pre-configs
Unpredictable costs at scale
With Rivvr
Custom SLOs, held automatically, even under traffic bursts
Runs inside your own AWS account
Run any vLLM model, any quantization
Always predictable $/token economy
Fleet management
Centralized model storage, deployment and rollout for your model fleet in one place.
How is this different from managed platforms?
Rivvr gives you the managed service experience — no infra to run, no scaling rules to tune — but inside your own AWS account. You keep full control over security, SLOs, and cost, without the operational overhead.
How is this different from AIBrix or NVIDIA Dynamo?
Two things. First: you don't operate anything — Rivvr runs as a managed platform inside your AWS account, unlike the software your team deploys and maintains. Second: a different operational model. AIBrix and Dynamo ask you to configure autoscaling, profiling, and dozens of other parameters. Rivvr asks for two — your SLO and your cost guardrail. Set those, and you're running.
Where does Rivvr run?
Inside your AWS account, behind your VPC. Inference traffic, data, and model weights never leave your environment.
How is Rivvr priced?
Rivvr is priced as a management layer based on GPUs under orchestration. You pay AWS directly for infrastructure; Rivvr charges a separate management fee.
Do I need to change my integration code?
No. Rivvr uses an OpenAI-compatible API. Most teams point their existing client at a new endpoint, and nothing else changes.
Does Rivvr modify or compress my models?
No. Rivvr doesn't touch model weights. Optimization happens at the orchestration, placement, and infrastructure layers.
What models does Rivvr support?
Any model served by vLLM, up to 400B parameters. If you're running something larger or unusual, let's talk.
What inference engines are supported?
Rivvr uses a vLLM-compatible runtime.
What GPUs are supported?
NVIDIA GPUs with CUDA compute capability 7.0 or higher — including L4, L40S, A10G, T4, A100, H100, H200, B200, B300, V100.
