
LLM Efficiency Suite

70.7% energy savings • 82.7% routine queries • Active learning

Real‑world deployment with 50‑user pilot
Cutting LLM inference energy by two‑thirds

Challenge: A research institute running a large‑scale AI assistant faced rising cloud costs: a large share of queries were repetitive, yet every one of them invoked the full LLM, wasting energy and money.

Solution

We deployed the LLM Efficiency Suite as a lightweight proxy between the application and the LLM API. The system learned keyword patterns in real time, building hierarchical scopes (C1–C4). After a short learning phase, it began answering routine queries locally.

Setup

  • Virtual machine (8 vCPU, 16 GB RAM) running the efficiency suite.
  • Pilot group: 50 users (researchers, admin staff).
  • Observation period: 3 months of normal usage.

Key results:
✅ Average energy per query dropped from 0.5 Wh to 0.147 Wh.
✅ 82.7% of all queries were answered by the routine layer – the base LLM was invoked only for complex or novel questions.
✅ The system autonomously moved 11 medium‑frequency keywords into the C2 scope, proving it learns and adapts.
✅ Latency for routine queries improved by 30%. User satisfaction remained unchanged (4.8/5).
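The promotion of medium‑frequency keywords into C2 can be sketched as a simple frequency counter with a threshold. This is a hedged illustration of the idea, assuming a hypothetical per‑window hit threshold; the actual promotion criteria of the suite are not specified here.

```python
from collections import Counter

# Assumption: a keyword is promoted to C2 once it is seen this many
# times within an observation window (threshold is illustrative).
PROMOTE_THRESHOLD = 25

counts = Counter()
c2_scope: set[str] = set()

def observe(keyword: str) -> None:
    """Count one keyword hit; promote it to C2 when it crosses the threshold."""
    counts[keyword] += 1
    if counts[keyword] >= PROMOTE_THRESHOLD and keyword not in c2_scope:
        c2_scope.add(keyword)

# Simulate 30 hits on a medium-frequency keyword during the window.
for _ in range(30):
    observe("room booking")

print("room booking" in c2_scope)  # True
```

Under a scheme like this, promotion requires no retraining: the scope membership is just updated data, which matches the claim that the system adapts autonomously during normal operation.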

Impact: The institute reduced its cloud inference bill by over 60% in the first month. The solution required no model retraining, no hardware changes, and scaled effortlessly to all 200 employees after the pilot.

Read the full whitepaper →