How it works

Smart context. No token bloat.

Every token you send to an AI API costs money. Sending your entire project on every message is the naive approach — and it is expensive. StackLatte uses keyword-based smart context to send only what the AI actually needs for each specific question.


The Naive Approach and Its Cost

The obvious way to give an AI context about your project is to paste the whole thing into the system prompt. Every message starts with your full project description, all your knowledge base entries, every step across every track. The AI always has complete information.

The problem is the cost. AI APIs charge per token, for both input and output. A modest project with a few tracks, a dozen phases, and a knowledge base can easily reach 5,000–15,000 tokens of context. Because that context is resent with every message, the cost grows linearly with the length of the session and adds up fast in long working sessions.
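To make that scaling concrete, here is a rough back-of-the-envelope sketch. The session length and the price per million tokens are illustrative assumptions, not StackLatte measurements or any provider's actual rates.

```typescript
// Rough estimate of input tokens spent on repeated context in one session.
// Every number used below is an illustrative assumption, not real pricing.
function sessionContextTokens(contextTokens: number, messages: number): number {
  // The same context is resent with every message, so usage grows linearly.
  return contextTokens * messages;
}

function inputCostUsd(tokens: number, usdPerMillionTokens: number): number {
  return (tokens / 1e6) * usdPerMillionTokens;
}

// A 10,000-token context resent across a hypothetical 50-message session.
const resentTokens = sessionContextTokens(10000, 50);

// Priced at a hypothetical 3 USD per million input tokens.
const contextCost = inputCostUsd(resentTokens, 3);

console.log(resentTokens, contextCost);
```

Swap in your own context size, session length, and your provider's published rate to get a figure that applies to you.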

There is also a quality problem. Models perform better on focused, relevant context than on a wall of text where the signal is buried in noise. Sending everything is not just expensive — it is often counterproductive.

How Smart Context Works

StackLatte splits your project into typed context blocks — one per project, track, phase, step, substep, and knowledge entry. Each block has a unique ID and its own text content.
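As a rough illustration, a typed context block could be modelled like this. The field names (kind, parentId, text) and the sample data are assumptions made for this sketch, not StackLatte's actual schema.

```typescript
// Hypothetical shape for a typed context block. Field names are assumptions.
type BlockKind = "project" | "track" | "phase" | "step" | "substep" | "knowledge";

interface ContextBlock {
  id: string; // unique ID
  kind: BlockKind; // which level of the hierarchy the block represents
  parentId?: string; // enclosing block, absent for the root
  text: string; // the block's own text content
}

// Invented sample project, one block per element.
const sampleBlocks: ContextBlock[] = [
  { id: "p1", kind: "project", text: "Launch the marketing site" },
  { id: "t1", kind: "track", parentId: "p1", text: "Frontend" },
  { id: "ph1", kind: "phase", parentId: "t1", text: "Build landing page" },
  { id: "s1", kind: "step", parentId: "ph1", text: "Implement the hero section" },
  { id: "k1", kind: "knowledge", parentId: "p1", text: "Brand colour and typography rules" },
];

// Group block IDs by kind, to show that each element becomes its own block.
function idsByKind(all: ContextBlock[]): Map<BlockKind, string[]> {
  const grouped = new Map<BlockKind, string[]>();
  for (const block of all) {
    const list = grouped.get(block.kind) ?? [];
    list.push(block.id);
    grouped.set(block.kind, list);
  }
  return grouped;
}
```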

At load time, those blocks are fed into an inverted index — a data structure that maps every keyword to the set of blocks containing it. This is a fast, local operation with no AI call involved. The index lives in memory and is rebuilt whenever your project changes.
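A minimal sketch of such an index, assuming a naive tokeniser (lowercase, split on non-alphanumerics, drop words of two characters or fewer). The function names here are hypothetical and are not the package's own API.

```typescript
interface ContextBlock {
  id: string;
  text: string;
}

// Naive tokeniser, assumed for illustration: no stemming, no stop words.
function tokenise(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 2);
}

// Map every keyword to the set of block IDs whose text contains it.
function buildInvertedIndex(blocks: ContextBlock[]): Map<string, Set<string>> {
  const index = new Map<string, Set<string>>();
  for (const block of blocks) {
    for (const word of tokenise(block.text)) {
      let ids = index.get(word);
      if (!ids) {
        ids = new Set<string>();
        index.set(word, ids);
      }
      ids.add(block.id);
    }
  }
  return index;
}

const demoIndex = buildInvertedIndex([
  { id: "s1", text: "Configure the payment webhook" },
  { id: "s2", text: "Write tests for the webhook handler" },
]);
```

Here "webhook" maps to both s1 and s2, while "payment" maps only to s1. Rebuilding on project change is then just another call to buildInvertedIndex with the new blocks.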

When you send a message, the system tokenises your input and looks up each keyword in the index. Blocks that score above the relevance threshold are collected. Those blocks are then expanded upward through the parent chain — if a step matches, its parent phase and track are included automatically, so the AI always receives coherent, structurally complete context.
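The lookup, threshold, and parent expansion described above can be sketched as follows. The scoring rule (the fraction of message keywords found in a block), the 0.3 threshold, and all names are assumptions for illustration; the source does not specify how StackLatte actually scores.

```typescript
interface ContextBlock {
  id: string;
  parentId?: string;
  text: string;
}

function tokenise(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 2);
}

function buildInvertedIndex(blocks: ContextBlock[]): Map<string, Set<string>> {
  const index = new Map<string, Set<string>>();
  for (const block of blocks) {
    for (const word of tokenise(block.text)) {
      const ids = index.get(word) ?? new Set<string>();
      ids.add(block.id);
      index.set(word, ids);
    }
  }
  return index;
}

function selectBlockIds(message: string, blocks: ContextBlock[], threshold: number): string[] {
  const index = buildInvertedIndex(blocks);
  const byId = new Map<string, ContextBlock>();
  for (const block of blocks) byId.set(block.id, block);

  // Count, per block, how many distinct message keywords it contains.
  const keywords = Array.from(new Set(tokenise(message)));
  const hits = new Map<string, number>();
  for (const keyword of keywords) {
    const ids = index.get(keyword);
    if (!ids) continue;
    for (const id of ids) hits.set(id, (hits.get(id) ?? 0) + 1);
  }

  const selected = new Set<string>();
  for (const [id, count] of hits) {
    // Assumed score: share of message keywords present in the block.
    if (count / keywords.length <= threshold) continue;
    // Walk up the parent chain so a match arrives with its phase and track.
    let current: ContextBlock | undefined = byId.get(id);
    while (current && !selected.has(current.id)) {
      selected.add(current.id);
      current = current.parentId ? byId.get(current.parentId) : undefined;
    }
  }
  // Return IDs in the project's original order.
  return blocks.filter((b) => selected.has(b.id)).map((b) => b.id);
}

const demoProject: ContextBlock[] = [
  { id: "t1", text: "Backend" },
  { id: "ph1", parentId: "t1", text: "Payments integration" },
  { id: "s1", parentId: "ph1", text: "Configure the payment webhook endpoint" },
  { id: "s2", parentId: "ph1", text: "Write refund logic" },
  { id: "t2", text: "Frontend" },
];

const picked = selectBlockIds("Why does the webhook endpoint fail?", demoProject, 0.3);
console.log(picked);
```

Only step s1 matches the message directly; its phase ph1 and track t1 come along through the parent chain, while s2 and t2 are left out.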

On top of the matched blocks, a compact structure index is always sent: a lightweight text representation of the full project hierarchy with IDs, status indicators, and current position. This gives the AI the project shape without the full detail cost — it can see everything that exists and where things stand, and only the matched blocks come with their full content.
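One way to picture the compact structure index is as an indented outline that carries IDs, titles, and status markers but no block content. The marker symbols and the "<- current" pointer below are invented for this sketch; the real format may look different.

```typescript
type StepStatus = "todo" | "active" | "done";

interface OutlineNode {
  id: string;
  title: string;
  status: StepStatus;
  children: OutlineNode[];
}

// Hypothetical status markers, chosen only for this example.
const MARKERS: Record<StepStatus, string> = { todo: "[ ]", active: "[~]", done: "[x]" };

// Render one line per node: indentation shows depth, no full content is included.
function renderStructureIndex(node: OutlineNode, currentId: string, depth = 0): string[] {
  const pointer = node.id === currentId ? " <- current" : "";
  const line = `${"  ".repeat(depth)}${MARKERS[node.status]} ${node.id} ${node.title}${pointer}`;
  const lines = [line];
  for (const child of node.children) {
    lines.push(...renderStructureIndex(child, currentId, depth + 1));
  }
  return lines;
}

const outline: OutlineNode = {
  id: "p1",
  title: "Site launch",
  status: "active",
  children: [
    {
      id: "t1",
      title: "Frontend",
      status: "active",
      children: [
        { id: "s1", title: "Hero section", status: "done", children: [] },
        { id: "s2", title: "Pricing table", status: "active", children: [] },
      ],
    },
  ],
};

const structureLines = renderStructureIndex(outline, "s2");
console.log(structureLines.join("\n"));
```

Because each node contributes a single short line, the outline stays small even when the blocks themselves are long.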

The Guaranteed Floor

Keyword matching works well for specific questions. But what about vague messages — "how do I solve this?", "what should I do next?", "can you help?" — where there are no project-specific keywords to match?

StackLatte handles this with a guaranteed floor: the currently focused step is always included in the context, regardless of keyword match. If you are looking at a specific step and ask a general question, the AI already knows which step you mean. The keyword scoring layer adds anything else that is relevant; the active step is always there as the minimum.
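In code, the floor amounts to a union between whatever the keyword layer matched and the focused step. This is a minimal sketch with a hypothetical function name, not StackLatte's implementation.

```typescript
// Union the keyword matches with the focused step, so the result is never empty.
function withGuaranteedFloor(matchedIds: string[], focusedStepId: string): string[] {
  const ids = new Set(matchedIds);
  ids.add(focusedStepId); // no effect if the step was already matched
  return Array.from(ids);
}

// A vague message matches nothing, yet the focused step is still sent.
const vagueResult = withGuaranteedFloor([], "s2");

// A specific message keeps its matches and gains the focused step.
const specificResult = withGuaranteedFloor(["t1", "s1"], "s2");

console.log(vagueResult, specificResult);
```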

For complex questions that genuinely require the full project — detailed refactoring, cross-track planning, comprehensive reviews — the AI can signal that it needs more context and StackLatte will send a richer payload on a second pass. This two-pass approach means the common case stays cheap while complex cases still work correctly.
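The two-pass flow can be sketched as below. How the AI actually signals that it needs more context is not described here, so the needsFullContext flag, the AskModel type, and the stub model are all assumptions; a real model call would also be asynchronous.

```typescript
interface ModelReply {
  text: string;
  needsFullContext: boolean; // hypothetical signal from the model
}

type AskModel = (message: string, context: string) => ModelReply;

// Try the cheap context first; escalate only when the model asks for more.
function answerWithTwoPasses(
  message: string,
  compactContext: string,
  fullContext: string,
  ask: AskModel,
): { reply: ModelReply; passes: number } {
  const first = ask(message, compactContext);
  if (!first.needsFullContext) return { reply: first, passes: 1 };
  return { reply: ask(message, fullContext), passes: 2 };
}

// Stub model for illustration: it answers once it has the full context,
// and otherwise asks for more only when the message mentions a refactor.
const stubModel: AskModel = (message, context) =>
  context.indexOf("FULL") === 0
    ? { text: "Detailed plan", needsFullContext: false }
    : { text: "Short answer", needsFullContext: message.indexOf("refactor") !== -1 };

const simple = answerWithTwoPasses("what next?", "compact", "FULL project", stubModel);
const complex = answerWithTwoPasses("refactor everything", "compact", "FULL project", stubModel);
console.log(simple.passes, complex.passes);
```

The common case costs one cheap call; only the escalated case pays for the richer payload.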

Three-Tier Strategy in Practice

| Mode | What gets sent | Token cost |
| --- | --- | --- |
| Smart (default) | Compact index + keyword-matched blocks + active step | Low, scales with relevance |
| Auto | Compact index only; AI requests full context if needed | Very low first pass, medium on demand |
| Full | Complete project context every message | High, consistent regardless of question |

Smart mode is on by default. You can switch to Auto or Full in the AI panel if a session requires it.
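Read as code, the three modes differ only in what goes into the first-pass payload. The type and field names below are hypothetical and exist only to mirror the three modes.

```typescript
type ContextMode = "smart" | "auto" | "full";

// Hypothetical pre-computed pieces of context.
interface ContextParts {
  compactIndex: string;
  matchedBlocks: string[];
  activeStep: string;
  fullProject: string;
}

// Choose what the first pass sends for each mode.
function firstPassPayload(mode: ContextMode, parts: ContextParts): string[] {
  switch (mode) {
    case "smart":
      return [parts.compactIndex, ...parts.matchedBlocks, parts.activeStep];
    case "auto":
      return [parts.compactIndex];
    case "full":
      return [parts.fullProject];
  }
}

const demoParts: ContextParts = {
  compactIndex: "outline",
  matchedBlocks: ["block s1", "block ph1"],
  activeStep: "block s2",
  fullProject: "everything",
};

console.log(firstPassPayload("smart", demoParts).length);
```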

Built on an Open Package

The context block system, inverted index construction, and keyword relevance scoring are all handled by @stacklatte/context-manager, an npm package built as the retrieval engine behind StackLatte. The package handles the heavy lifting — buildIndex, keywordRelevance, classifyIntent — so StackLatte composes it rather than reimplementing retrieval logic from scratch.

This separation also means the retrieval layer can be used independently in other projects that need structured context injection without building the whole StackLatte stack.

For more on the broader architecture, see "What Is a Personal AI OS?" and "What Is AI Memory?"


Frequently Asked Questions

Why do AI context windows drive up API costs?
AI APIs charge per token, for both input and output. Sending a large project context on every message means paying for thousands of tokens even when the question concerns only a small part of the project.

How does StackLatte reduce token usage?
StackLatte scores your context blocks against each message by keyword relevance and sends only the matching ones, along with their parent blocks, the active step, and a compact structural index. In Smart mode, blocks that do not match are left out.

What is an inverted index in this context?
A data structure that maps keywords to the blocks containing them. When you send a message, StackLatte looks up the keywords in the index locally, with no AI call needed for retrieval, and returns the relevant blocks immediately.

Does smart context ever miss something important?
The currently focused step is always included as a guaranteed minimum. Matched blocks are expanded to their parent phase and track for coherence. For complex questions, the AI can request a full context pass.

Is the context-manager package open source?
Yes. @stacklatte/context-manager is published on npm and is the retrieval engine behind StackLatte's smart context system.

“The context layer is the missing infrastructure between you and every AI model you use. Smart retrieval is how we keep it from becoming a cost problem — the AI gets what it needs, not everything that exists.”

Rasmus Lagoni

Founder, StackLatte · 10 years software consultancy

Context that fits. Costs that don't surprise you.

Free. No account. Smart context on by default.
