Build a Token Router with Embeddings and Prompt Templates

No training pipeline. No GPU. Just embeddings, cosine similarity, and structured prompts that cut your LLM bill by 80%.

The idea: every query has a shape — topic, complexity, expected output format. You can detect that shape in <5ms using embeddings, then:

- Pick a prompt template — a pre-built system prompt with format constraints, cached by the provider
- Pick a model — cheap for easy queries, strong for hard ones
- Cap output tokens — templates define the expected length

No model training. No preference data. Just geometry in embedding space. ...
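The routing step described above can be sketched as a nearest-centroid lookup in embedding space. A minimal sketch, assuming pre-computed route centroids; the route names, vectors, model labels, and token caps below are illustrative placeholders, not the article's actual configuration:

```python
import numpy as np

# Hypothetical routes: each has a centroid (in practice, the mean embedding
# of example queries for that route), a target model, and an output-token cap.
# The 3-dimensional vectors are toy values; real embeddings have hundreds of dims.
ROUTES = {
    "simple_qa": {"vec": np.array([0.9, 0.1, 0.0]), "model": "small-model", "max_tokens": 256},
    "code_gen":  {"vec": np.array([0.1, 0.9, 0.2]), "model": "large-model", "max_tokens": 2048},
}

def cosine(a, b):
    # Cosine similarity: dot product over the product of norms.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def route(query_embedding):
    # Pick the route whose centroid is nearest in cosine similarity,
    # returning its name, model choice, and output-token cap.
    name, cfg = max(ROUTES.items(),
                    key=lambda kv: cosine(query_embedding, kv[1]["vec"]))
    return name, cfg["model"], cfg["max_tokens"]

# A query embedding close to the simple_qa centroid routes cheap:
name, model, max_tok = route(np.array([0.85, 0.2, 0.05]))
print(name, model, max_tok)  # → simple_qa small-model 256
```

The lookup itself is a handful of dot products, which is where the sub-5ms routing budget comes from; the embedding call dominates the latency.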

March 23, 2026 · 7 min · Minh-Nhut Nguyen

Stop Wasting Tokens

Most LLM cost is waste — context that didn’t need to be there, models too big for the task, reasoning that ran longer than it should. Here’s how to fix it, grounded in 2025 research, with a concrete open-source stack at the end.

The problem: token cost and context bloat are the same problem — no mechanism deciding what information is worth keeping. It shows up three ways: ...

March 22, 2026 · 8 min · Minh-Nhut Nguyen