Adaptive Inference Suppression
Hardware‑aware software that cuts LLM inference energy by 70% without retraining.
Architecture overview
The system intercepts routine queries before they reach the base LLM. A hierarchical keyword scope (C1–C4) dynamically learns which queries can be answered locally, while a Predictive Reconfiguration Layer (PRL) continuously monitors query patterns and adjusts thresholds in real time. Only queries that cannot be answered from the routine layer are escalated to the full model.
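The interception-and-escalation flow described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the names `keyword_score`, `route`, and the scope set are assumptions, and only the C2 threshold of 0.22 comes from the document.

```python
C2_THRESHOLD = 0.22  # threshold reported in the TRL5 validation below


def keyword_score(query: str, scope: set[str]) -> float:
    """Fraction of query tokens that fall inside a keyword scope."""
    tokens = query.lower().split()
    if not tokens:
        return 0.0
    return sum(t in scope for t in tokens) / len(tokens)


def route(query: str, c2_scope: set[str]) -> str:
    """Answer locally when the routine layer suffices, else escalate."""
    if keyword_score(query, c2_scope) >= C2_THRESHOLD:
        return "routine"   # suppression gate keeps the accelerator idle
    return "escalate"      # forwarded to the full base LLM


# Hypothetical C2 scope for a customer-service deployment.
scope = {"hours", "price", "shipping", "refund"}
print(route("refund shipping cost", scope))           # → routine
print(route("summarise this contract clause", scope))  # → escalate
```

In a real deployment the scope sets would be learned and reweighted continuously rather than fixed, which is the role the C1–C4 hierarchy and the PRL play below.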
Key components:
- C1–C4 scopes – nested keyword sets; keywords migrate between scopes according to dynamically updated weights.
- PRL (Predictive Reconfiguration Layer) – computes anomaly scores (A‑score) and triggers alarms (L1, L2, L3).
- Multi‑level alarms – L3 for gradual trends, L2 for accelerated reconfiguration, L1 for immediate system‑wide update.
- Hardware suppression gate – physically blocks GPU/TPU activation when routine layer suffices.
TRL5 validation results
Simulated with 100,000 realistic queries, including bursts, noise, and evolving topics; the optimal C2 threshold was 0.22.
The growth of C2 to 11 keywords indicates that the system learns rather than merely filtering statically. The alarm system responded appropriately to simulated bursts, and measured energy savings exceeded the original 60–70% target.
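A threshold like the reported 0.22 could be found by sweeping candidate values and maximising the fraction of queries kept off the accelerator subject to a misrouting budget. The sketch below uses synthetic data and an assumed quality constraint; neither the function name nor the 1% budget is from the document.

```python
def pick_threshold(queries, max_misroutes=0.01):
    """queries: list of (scope_score, answerable_locally) pairs.
    Returns (threshold, savings) maximising the fraction of queries
    answered locally while misroutes stay within budget."""
    best = None
    for t in (x / 100 for x in range(1, 100)):
        local = [(s, ok) for s, ok in queries if s >= t]
        if not local:
            continue
        misroutes = sum(not ok for _, ok in local) / len(queries)
        savings = len(local) / len(queries)  # fraction kept off the GPU
        if misroutes <= max_misroutes and (best is None or savings > best[1]):
            best = (t, savings)
    return best


# Synthetic query mix: high- and mid-score answerable queries plus
# low-score queries that the routine layer would get wrong.
data = [(0.9, True)] * 50 + [(0.3, True)] * 30 + [(0.15, False)] * 20
print(pick_threshold(data))  # → (0.16, 0.8)
```

On this synthetic mix the sweep settles just above the score of the unanswerable queries, keeping 80% of traffic local with zero misroutes.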
Patent coverage
Provisional patent US 64/006,312 filed March 15, 2026, covering the hierarchical scope engine, PRL, multi‑level alarms, and hardware suppression gate. European phase in preparation (Unitary Patent option).