Your LLM Bill Is 80% Waste. Here Are 4 Fixes.

You’re sending every question to your most expensive model. Simple lookups, complex architecture reviews, one-word classifications — all hitting the same endpoint at the same price. That’s like routing every hospital patient to the head surgeon, whether they need stitches or a heart transplant.

The fix is four levers, applied in order. Each compounds on the last. Together: 80–88% cost reduction, and quality goes up on the hard queries.

1. Route by difficulty — send easy questions to cheap models, hard ones to strong models
2. Manage context like memory — give the model the right history at the right time, not all history all the time
3. Cache the instructions — stop re-reading the same playbook on every call
4. Control the output — stop the model from overthinking simple tasks

Lever 1: Route by difficulty

65% of your queries don’t need your best model. “What’s our refund policy?” and “Design a distributed caching layer with consistency guarantees” both hit the same model at the same price. One needs a $0.80/million-token model. The other genuinely needs a $15/million-token model. ...
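The routing idea above can be sketched with a cheap heuristic classifier in front of two model tiers. This is a minimal illustration, not a production router: the model names, prices, and the keyword/length heuristic are all assumptions for the sake of the example.

```python
# Sketch of difficulty-based routing. Model names and per-token prices
# are illustrative assumptions, not any real provider's pricing.

CHEAP_MODEL = "small-model"    # assumed ~$0.80 per million tokens
STRONG_MODEL = "large-model"   # assumed ~$15 per million tokens

# Crude heuristic: long prompts or design/architecture-style requests
# go to the strong model; short lookups go to the cheap one.
HARD_SIGNALS = ("design", "architecture", "debug", "prove",
                "optimize", "refactor", "consistency")

def route(query: str) -> str:
    """Pick a model tier for a query using a simple heuristic."""
    q = query.lower()
    if len(q.split()) > 40 or any(s in q for s in HARD_SIGNALS):
        return STRONG_MODEL
    return CHEAP_MODEL

print(route("What's our refund policy?"))
print(route("Design a distributed caching layer with consistency guarantees"))
```

In practice the classifier itself can be a tiny model or a logistic regression over prompt features; the point is that the routing decision must cost far less than the price gap between the two tiers.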

March 23, 2026 · 10 min · Minh-Nhut Nguyen