
Inference Efficiency Layer for LLMs


Rivvr powers LLM inference engines with an enterprise-ready infrastructure layer, streamlining scaling and orchestration while reducing GPU compute costs by up to 50%.


Deploys in VPC & On-Prem. Works with industry-standard inference engines.


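Industry-standard engines such as vLLM expose an OpenAI-compatible API, so existing clients typically need only an endpoint change. A minimal Python sketch, assuming a hypothetical in-VPC endpoint (the base_url below is an illustrative placeholder, not Rivvr's actual API):

```python
# Minimal sketch: pointing a standard OpenAI-compatible client at an
# in-VPC inference endpoint. The base_url is a hypothetical placeholder;
# "EMPTY" works as a dummy key for engines like vLLM by default.
from openai import OpenAI

client = OpenAI(
    base_url="http://inference.example.internal/v1",  # illustrative endpoint
    api_key="EMPTY",
)

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-14B-Instruct",  # the model used in the benchmarks below
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```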

Up to 1.7x Higher Throughput for Single-Pod & Multi-Pod Configs

Intelligent load balancing and GPU memory management raise GPU utilization to as high as 95%, versus a typical 40-60% baseline, while preserving your Service Level Objectives (SLOs).
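To make the idea concrete (a toy sketch, not Rivvr's implementation), an SLO-aware balancer can route each request to the replica with the most free KV-cache memory, restricted to replicas whose queue keeps estimated TTFT within the SLO. All names and numbers below are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    free_kv_gb: float    # free GPU memory available for KV cache
    queue_depth: int     # requests already waiting on this replica
    avg_ttft_ms: float   # observed per-request time-to-first-token

def pick_replica(replicas: list[Replica], ttft_slo_ms: float = 300.0) -> Replica:
    """Toy SLO-aware balancer: among replicas whose estimated TTFT
    (queue depth x observed TTFT) stays within the SLO, choose the
    one with the most free KV-cache memory."""
    eligible = [r for r in replicas
                if (r.queue_depth + 1) * r.avg_ttft_ms <= ttft_slo_ms]
    pool = eligible or replicas  # if every replica would miss the SLO, degrade gracefully
    return max(pool, key=lambda r: r.free_kv_gb)

replicas = [
    Replica("pod-a", free_kv_gb=3.2, queue_depth=2, avg_ttft_ms=90.0),
    Replica("pod-b", free_kv_gb=7.5, queue_depth=5, avg_ttft_ms=90.0),
]
print(pick_replica(replicas).name)  # pod-a: pod-b's deeper queue would miss the SLO
```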



NVIDIA L40S with the Qwen 2.5 14B model. SLO: TTFT p50 < 300 ms and TTFT p90 < 3 s


NVIDIA L40S with the Qwen 2.5 14B model, target GPU utilization 95%
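For reference, the TTFT SLO from the first benchmark can be checked against per-request latency samples. A minimal sketch using the nearest-rank percentile method (the sample values are illustrative):

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile (0 < p <= 100) of a non-empty sample list."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

def meets_slo(ttft_ms: list[float]) -> bool:
    """SLO from the benchmark caption: TTFT p50 < 300 ms and p90 < 3 s."""
    return percentile(ttft_ms, 50) < 300 and percentile(ttft_ms, 90) < 3000

samples = [120, 180, 250, 260, 280, 420, 900, 1500, 2400, 2800]
print(meets_slo(samples))  # True: p50 = 280 ms, p90 = 2400 ms
```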

Meet Our Team

Dmitrii Shubin

Dmitrii spent 10+ years building high-load machine learning infrastructure for Xiaomi (Amazfit/Mio brands), Newmount Global (gold mining business unit), and Index Exchange (RTB platform optimization). MASc from the University of Toronto.

Eugene Radchenko

Eugene specializes in building large-scale enterprise systems. He led the development of claims-processing and medical case-management systems at OneShield and SCC Soft Computer. BCS from the State University of Trade and Economics (Kyiv).