Build a Token Router with Embeddings and Prompt Templates

No training pipeline. No GPU. Just embeddings, cosine similarity, and structured prompts that cut your LLM bill by 80%.

The idea

Every query has a shape — topic, complexity, expected output format. You can detect that shape in under 5 ms using embeddings, then:

- Pick a prompt template — a pre-built system prompt with format constraints, cached by the provider
- Pick a model — cheap for easy queries, strong for hard ones
- Cap output tokens — templates define the expected length

No model training. No preference data. Just geometry in embedding space. ...
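The route step above can be sketched as: embed the query, find the nearest "anchor" embedding by cosine similarity, and read off that route's template, model, and token cap. This is a minimal, self-contained sketch — the toy bag-of-words `embed` stands in for a real sentence-embedding model, and the route names, model IDs, and token limits are illustrative assumptions, not values from the article.

```python
import math

# Toy embedding: bag-of-words counts over a tiny vocabulary.
# A real router would call a sentence-embedding model here; this
# stand-in just keeps the example self-contained and runnable.
VOCAB = ["summarize", "code", "debug", "explain", "list", "prove"]

def embed(text: str) -> list[float]:
    words = text.lower().split()
    return [float(sum(w.startswith(v) for w in words)) for v in VOCAB]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Each route bundles a prompt template, a model choice, and an
# output-token cap. All names and numbers here are hypothetical.
ROUTES = {
    "summarize": {
        "anchor": embed("summarize this text into a short list"),
        "model": "cheap-small",
        "max_tokens": 150,
        "system": "Summarize the input in at most 3 bullet points.",
    },
    "debug": {
        "anchor": embed("debug and explain this code error"),
        "model": "strong-large",
        "max_tokens": 600,
        "system": "Diagnose the bug, then propose a minimal fix.",
    },
}

def route(query: str) -> tuple[str, dict]:
    """Pick the route whose anchor is nearest in embedding space."""
    q = embed(query)
    name, cfg = max(ROUTES.items(), key=lambda kv: cosine(q, kv[1]["anchor"]))
    return name, cfg

name, cfg = route("please summarize these meeting notes")
print(name, cfg["model"], cfg["max_tokens"])
```

With a real embedding model the anchors would be computed once at startup, so routing at request time is a single embedding call plus a handful of dot products — which is where the low-millisecond latency comes from.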

March 23, 2026 · 7 min · Minh-Nhut Nguyen