
Adaptive Inference Suppression

Hardware‑aware software that cuts LLM inference energy by 70% without retraining.

Architecture overview

The system intercepts routine queries before they reach the base LLM. A hierarchy of keyword scopes (C1–C4) dynamically learns which queries can be answered locally, while a Predictive Reconfiguration Layer (PRL) continuously monitors query patterns and adjusts thresholds in real time. Only queries that cannot be answered from the routine layer are escalated to the full model.
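The escalation flow can be sketched as follows. This is a minimal illustration, not the patented implementation: the scope contents and keyword weights are hypothetical, and only the C2 threshold value (0.22) comes from the validation results below.

```python
# Hypothetical scope data: keyword -> dynamic weight, per scope.
# Real scopes are learned and reconfigured by the PRL at runtime.
SCOPES = {
    "C1": {"hours": 0.9, "price": 0.8},   # core rules
    "C2": {"refund": 0.4},                # actively learned keywords
    "C3": {"shipping": 0.3},
}
C2_THRESHOLD = 0.22  # optimal value reported in the TRL5 simulation

def routine_score(query: str) -> float:
    """Sum the weights of all scope keywords present in the query."""
    words = set(query.lower().split())
    return sum(weight
               for scope in SCOPES.values()
               for keyword, weight in scope.items()
               if keyword in words)

def route(query: str) -> str:
    """Answer from the routine layer when the score clears the
    threshold; otherwise escalate to the base LLM."""
    if routine_score(query) >= C2_THRESHOLD:
        return "routine"    # suppression gate keeps GPU/TPU idle
    return "escalate"       # forward to the full model
```

For example, a query containing a learned keyword such as "refund" stays in the routine layer, while an unmatched query escalates.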

Key components:

  • C1–C4 scopes – nested keyword sets; words float between scopes based on dynamic weight.
  • PRL (Predictive Reconfiguration Layer) – computes anomaly scores (A‑score) and triggers alarms (L1, L2, L3).
  • Multi‑level alarms – L3 for gradual trends, L2 for accelerated reconfiguration, L1 for immediate system‑wide update.
  • Hardware suppression gate – physically blocks GPU/TPU activation when routine layer suffices.

TRL5 validation results

The routine layer was validated in a simulation with 100,000 realistic queries, including bursts, noise, and evolving topics. The optimal C2 threshold was found to be 0.22.

  • 70.7% – energy saved
  • 82.7% – queries handled in the routine layer (C1–C3)
  • 17.3% – queries escalated to the base LLM
  • 49,712 – queries processed in C2
  • 11 – keywords in C2 (active learning)
  • 111 – keywords in C1 (core rules)
  • 167 – Level‑1 alarms (extreme bursts)
  • 296 – Level‑2 alarms
  • 2,114 – Level‑3 alarms (trend monitoring)

The 11 keywords that migrated into C2 show that the system learns rather than merely filtering statically. The alarm system responded appropriately to the simulated bursts, and the measured energy savings exceed the original 60–70% target.

Patent coverage

Provisional patent US 64/006,312 filed March 15, 2026, covering the hierarchical scope engine, PRL, multi‑level alarms, and hardware suppression gate. European phase in preparation (Unitary Patent option).
