☕️🚀 Beyond the Prompt: The Evolution of Context Engineering in 2026

Lately, I’ve been having a lot of conversations with customers about how to shape their GEMINI.md files to get the absolute most out of the Gemini CLI. It’s a topic I’ve been giving a proper amount of thought to, and I figured it was about time I put some of those thoughts down on paper.

Picture this: It’s 2024. You’ve just spent twenty minutes meticulously crafting a “perfect” prompt. You’ve given it a persona, five examples, and a stern warning not to hallucinate. You hit enter, and the AI… well, it bloody misses the mark anyway.

We’ve all been there, haven’t we? We thought “Prompt Engineering” was the final boss of AI efficiency. We were wrong. In 2025 we crammed all of the context in our GEMINI.md files just in case the info was needed. By March 2026, the industry has moved on. The smart money isn’t on how you talk to the AI—it’s on how you brew the environment it lives in and provide just the right level of information at the right time. No more and no less.

Welcome to the age of Context Engineering.

🎛️ The Three Levers: How We Influence the AI

Before we get into the weeds, let’s take a step back. When you’re using a tool like Gemini CLI, there are really only three ways you can influence the model’s response. Think of these as the dials on your espresso machine:

[Figure: The Three Levers of AI Influence]

The Input (Prompt Engineering & Context): This is the coffee grounds. It includes the explicit question you ask, and the surrounding context you provide (like code snippets or README files) to ground the answer.
The Dials (Inference Parameters): These are settings like Temperature, Top-P, and Top-K, which control the “creativity” or randomness of the output. While powerful, they are often pre-tuned by the toolmakers or exposed only in advanced features. For most users they are the least accessible lever, which is why mastering Input and System-Level context is so critical. One place you may be able to tweak them is the experimental subagents feature.
The System-Level (System Prompts & RAG): This is the underlying plumbing. System prompts give the model its baseline persona and rules. Retrieval-Augmented Generation (RAG) acts as an external brain. It should be noted that the use of RAG is evolving in this space and becoming less prominent (watch out for a post in the future on this).
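
To make the “dials” a bit more concrete, here is a minimal Python sketch of how Temperature, Top-K and Top-P are commonly applied to a next-token distribution before sampling. The toy vocabulary and logit values are made up for illustration, and this is the generic textbook approach, not Gemini’s actual decoding code.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Pick one token from a {token: logit} dict using common sampling dials."""
    # Temperature (must be > 0): lower values sharpen the distribution.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    max_logit = max(scaled.values())
    weights = {tok: math.exp(v - max_logit) for tok, v in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(((tok, w / total) for tok, w in weights.items()),
                    key=lambda pair: pair[1], reverse=True)

    # Top-K: keep only the K most likely tokens.
    if top_k is not None:
        ranked = ranked[:top_k]

    # Top-P (nucleus): keep the smallest prefix whose cumulative probability reaches P.
    if top_p is not None:
        kept, cumulative = [], 0.0
        for tok, prob in ranked:
            kept.append((tok, prob))
            cumulative += prob
            if cumulative >= top_p:
                break
        ranked = kept

    tokens = [tok for tok, _ in ranked]
    probs = [prob for _, prob in ranked]
    # random.choices treats the surviving probabilities as relative weights.
    return rng.choices(tokens, weights=probs, k=1)[0]


toy_logits = {"espresso": 2.0, "latte": 1.0, "tea": 0.1}  # hypothetical values
print(sample_token(toy_logits, temperature=0.7, top_k=2, top_p=0.9))
```

With top_k=1 the function always returns the single most likely token, which is why very tight settings make output feel deterministic.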

For the last few years, we obsessed over lever number one—specifically, the “question asked” part. But as our tasks got bigger, we realised the real power lies in mastering the Context.

☕️ The Fallacy of Infinite Context: Don’t Over-fill the Filter

In 2025, we got greedy. Models like Gemini offered context windows of a million tokens or more. We thought, “Grand! I’ll just shove the whole repo in there and let the AI sort it out.”

Turns out, that’s a bit like trying to brew a proper espresso by dumping a kilo of beans into a single-shot portafilter. It’s a mess, and it tastes like rubbish.

Research by Gloaguen et al. (arXiv:2602.11988) calls this “Context Pollution.” They found that dumping exhaustive guidelines into context files can actually reduce task success rates by 3% while inflating costs by 20%. Because LLMs are so obedient, they end up over-exploring the codebase and losing focus on the actual task.

Filtering out the Grit: What We Are Mitigating

When you saturate the context window, the AI’s “attention budget” gets exhausted. We are mitigating proper technical nightmares:

Poisoning & Hallucinations: Shoving unverified data in leads to variables being fabricated or the model being led astray by “poisoned” snippets.
Distraction & The ‘Pink Elephant’: Ever told an AI not to do something, only for it to do exactly that? That’s the “don’t think about a pink elephant” problem. Too much noise makes negative constraints fail.
Confusion & Fluff: Exhaustive contexts result in low-quality, wordy responses that don’t deliver the outcome you asked for.
Clashes: Without hierarchy, the AI faces conflicting information.

The NoLiMa benchmark showed that even GPT-4o, one of its stronger performers, drops from 99.3% accuracy in short contexts to 69.7% once you hit 32,000 tokens. If you want a reliable brew, you’ve got to prune the grounds.
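
As a rough illustration of “pruning the grounds”, the sketch below keeps only the most relevant snippets that fit inside a token budget. The relevance scores are assumed to come from the caller (how you score them is up to you), and the four-characters-per-token estimate is a crude heuristic rather than a real tokenizer.

```python
def estimate_tokens(text):
    """Crude estimate: roughly four characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def prune_context(snippets, budget_tokens):
    """Greedily keep the most relevant snippets that fit within the token budget.

    snippets: list of (relevance_score, text) pairs; scores are supplied by the caller.
    """
    kept, used = [], 0
    for score, text in sorted(snippets, key=lambda pair: pair[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost <= budget_tokens:
            kept.append(text)
            used += cost
    return kept, used


candidates = [  # hypothetical snippets and scores
    (0.9, "Use pnpm, never npm or yarn."),
    (0.2, "History of the project since 2019. " * 50),
    (0.7, "Throw custom error classes instead of swallowing exceptions."),
]
kept, used = prune_context(candidates, budget_tokens=50)
print(kept, used)
```

The long, low-relevance history blob is the one that gets dropped, which is the point: the budget forces you to choose what actually earns its place.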

🏗️ Building the Context Stack: The GEMINI.md Hierarchy

To stop the AI from getting “lost in the middle,” we need a structured way to manage what it sees. This isn’t just a passive dump; it’s a Context Stack.

[Figure: The Gemini CLI Context Precedence Stack]

A study by Lulla et al. (arXiv:2601.20404) confirms that shifting agent guidance from ephemeral prompts to these version-controlled artifacts is a massive win for efficiency. They recorded a 28.6% reduction in runtime and a 16.6% reduction in token usage when proper context files were present.

Gemini CLI employs a six-tiered context hierarchy. The list below runs from the most fundamental safety mechanisms to the most general baseline, but it’s crucial to remember that highly specific rules override generic ones: context closer to the active task (like Sub-directory Rules) takes effective precedence over broader layers (like System Defaults).

Detailed Precedence Order

Core Mandates: Hardcoded safety rules (renderCoreMandates).
Sub-directory Rules (project/src/api/GEMINI.md): Localised logic.
Workspace Root (project/GEMINI.md): Global project constraints.
Extensions/Skills: Context from active CLI tools.
Global User Policy (~/.gemini/GEMINI.md): Your personal preferences.
System Defaults: The baseline personality of the AI.

By using this hierarchy, highly specific rules override generic ones, resolving information clashes before they even reach the model.
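
The “specific beats generic” idea can be sketched in a few lines of Python. This is only a conceptual model: the real CLI works with free-form Markdown rather than key/value rules, the layer names below are my own labels for the tiers listed above, and Core Mandates are left out on the assumption that they are enforced separately.

```python
# Layers ordered from most general to most specific, mirroring the list above.
# Core Mandates are omitted: this sketch assumes they are enforced separately.
LAYER_ORDER = [
    "system_defaults",
    "global_user_policy",
    "extensions_skills",
    "workspace_root",
    "sub_directory",
]


def resolve_rules(layers):
    """Merge rule dicts so that more specific layers override more general ones."""
    resolved, source = {}, {}
    for name in LAYER_ORDER:
        for key, value in layers.get(name, {}).items():
            resolved[key] = value
            source[key] = name  # remember which layer won, handy when debugging clashes
    return resolved, source


layers = {  # hypothetical rule values for illustration
    "global_user_policy": {"package_manager": "npm", "tone": "concise"},
    "workspace_root": {"package_manager": "pnpm"},
    "sub_directory": {"error_style": "custom error classes"},
}
rules, winners = resolve_rules(layers)
print(rules["package_manager"], "from", winners["package_manager"])
```

Here the workspace rule wins over the personal preference, while the untouched “tone” preference still comes through from the global layer.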

☕️ Brewing the Perfect GEMINI.md: Best Practices

So, what does a “proper” context file look like? It’s not an encyclopaedia. It’s a set of rigid, operational guardrails. Here is the distilled wisdom from my customer conversations and recent research:

[Figure: GEMINI.md Best Practices (Early 2026)]

Embrace Minimalism: Aim for ~60 lines. Research shows that human-written, ruthlessly concise files are the only ones that actually help. Avoid LLM-generated boilerplate “slop”—it just creates context pollution.
Focus on the Un-guessable: Include unique team etiquette, PR conventions, and specific bash commands. The AI can already read your package.json; don’t repeat what it can already see.
Tables for Rules: Structurally, Markdown tables are significantly more token-efficient for rule matrices than plain prose.
Use CLI tools: Crack on with a Gemini CLI extension (e.g. Conductor) or plan mode to manage your spec.md and plan.md artifacts.
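
If you want to keep yourself honest on the minimalism point, a tiny check like the one below could run in CI. It is a sketch of my own rather than anything built into the Gemini CLI; the 60-line target and 600-line ceiling are simply the numbers used in this post.

```python
def lint_context_file(text, target_lines=60, max_lines=600):
    """Return warnings for a GEMINI.md-style file based on its non-blank line count."""
    count = sum(1 for line in text.splitlines() if line.strip())
    if count > max_lines:
        return [f"{count} non-blank lines: over the {max_lines}-line ceiling, split into nested files"]
    if count > target_lines:
        return [f"{count} non-blank lines: above the ~{target_lines}-line target, consider trimming"]
    return []


sample = "\n".join(f"- rule {i}" for i in range(75))  # stand-in for a real file's contents
print(lint_context_file(sample))
```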

Here is a concrete example of a well-engineered GEMINI.md. The HTML comments within are for the human developer’s benefit, explaining the why behind each choice.

NOTE: This is my current good practice as of the time this blog was written. Things evolve rapidly in this space, so you should experiment yourself and iterate over time based on the latest findings.


<!-- NOTE: This file is always loaded into context, no matter what you are trying to do! You should ensure that its contents are as universally applicable as possible.-->
<!-- Top Tips
An LLM will perform better on a task when its context window is full of focused, relevant context (examples, related files, tool calls, and tool results) than when it has a lot of irrelevant context.

* Aim for ~60 lines, and no more than 600 (channel your inner minimalist; less is more)
* If the info is somewhere in the codebase you probably don't need it here: the model is really good at figuring out which files/folders matter for a task, what commands to run, and which dependencies you have
* Use it to steer the model away from things it's consistently doing wrong or quirks that keep happening
* Include bash commands that can't be guessed by the model
* Exclude information that changes frequently
* Include unique instructions or team etiquette (branch naming, PR conventions)
* Consider what the model can and can't work out by reviewing the codebase; reduce the risk of confusing it -->

# Project: eCommerce
<!-- agents operate best on rigid, operational guardrails and specific constraints rather than polite requests or general guidelines. Stick to concrete "Do X, Never do Y" statements. -->
This file describes common mistakes and confusion points that agents might encounter as they work in this project. If you ever encounter something in the project that surprises you, please alert the developer working with you and record it in this GEMINI.md file to help prevent future agents from hitting the same issue.


## Setup & Developer Environment
<!-- Gemini will work out dependencies from the codebase (e.g. package.json). Hardcoding in here is like having stale docs -->
- **Install:** `pnpm install` (Do NOT use npm or yarn)
- **Start dev:** `pnpm turbo run dev --filter ulta-ecomm`
- **Keys:** Requires `LOCAL_KMS_KEY` in `.env`.

## Deep Context (Progressive Disclosure)
<!-- The Gemini CLI executes a downward Breadth-First-Search (BFS) scan through your project, grabbing context files from subdirectories (up to a limit of 200 folders) and layering them over the root file. Ensure this root file remains strictly for global mandates, and rely heavily on nested GEMINI.md files in your sub-folders for component-specific instructions, as those will be appended closer to the active user prompt  -->
- **Database Schema:** ` @./docs/database_schema.md`
- **Authentication Flows:** ` @./src/auth/AGENTS.md`
- **API Specs:** ` @./docs/api_design.md`

## Rules, Gotchas, & Anti-Patterns
<!-- For comparative data or strict rule matrices, structural analysis shows that formatting these rules into a Markdown table or using YAML/XML structures significantly improves the model's comprehension and token ingestion efficiency compared to plain prose or basic lists -->
<!-- Specify the preferred process and any specific instructions to be followed -->


| Category | Mandate | Anti-Pattern to Avoid |
| :--- | :--- | :--- |
| **Error Handling** | Throw custom `UltaAPIError` classes. | Generic `try/catch` that swallows errors. |
| **Database Access** | Use the Prisma client from `src/db/client.ts`. | Raw SQL strings. |
| **API Responses** | Fail loudly on missing data. | Fallback code with hardcoded fake data. |
| **Commits** | Use format: `[ulta-ecomm] <description>` | Committing without running pnpm lint/test. |

📋 Spec-Driven Development: Context Precedes Code

We’ve moved past “vibe coding.” Today, we use Spec-Driven Development (SDD), a workflow that treats planning artifacts as a primary form of context. The rule is simple: Context Precedes Code. Before the AI writes a single line of logic, it first helps formalise a blueprint.

This is another form of just-in-time context. Rather than giving the AI a vague goal, you provide a highly structured, pre-agreed scope. The Gemini CLI’s Conductor extension enforces this by creating two key files:

spec.md: Defines what we are building and why. This is the mission-level context.
plan.md: An actionable, phased to-do list for the agent. This is the task-level context.

By engineering this context first, you anchor the AI to a concrete plan, drastically reducing the chances of it going off on a tangent or misinterpreting the final goal.
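
To show how a plan file can act as task-level context, here is a small sketch that finds the next open task. It assumes plan.md uses Markdown headings for phases and `- [ ]` / `- [x]` checkboxes for tasks; that layout is my assumption for illustration, not a documented Conductor format, and the sample tasks are invented.

```python
import re

# Matches "- [ ] task" or "- [x] task" (also "*" bullets); the layout is assumed.
TASK_PATTERN = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*)$")


def next_open_task(plan_text):
    """Return (phase, task) for the first unchecked item, or None when the plan is done."""
    phase = None
    for line in plan_text.splitlines():
        if line.startswith("#"):
            phase = line.lstrip("#").strip()
            continue
        match = TASK_PATTERN.match(line)
        if match and match.group(1) == " ":
            return phase, match.group(2).strip()
    return None


sample_plan = """\
## Phase 1: Schema
- [x] Add orders table
- [ ] Add migration for order_items

## Phase 2: API
- [ ] Expose GET /orders
"""
print(next_open_task(sample_plan))
```

Feeding the agent only the current phase and task, rather than the whole backlog, is the same just-in-time principle applied to planning.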

🚰 Progressive Disclosure: The Right Context at the Right Time

One of the biggest faffs in 2025 was the “Skill Gap.” Vercel found that agents given a monolithic set of instructions at the start of a session would often “forget” or ignore crucial rules later on. This happens when the initial context isn’t relevant to the immediate task.

The solution is Progressive Disclosure: delivering context on a just-in-time basis. Instead of a single massive upfront data dump, we provide specific, localised instructions that become active only when the agent enters a certain part of the codebase.

This is precisely how the Gemini CLI’s context hierarchy works. When the agent is operating inside src/api/, the rules in src/api/GEMINI.md are loaded. Because these rules are loaded “last,” they take precedence and are “closer” to the prompt, making them far more effective. This ensures the AI has the most relevant guardrails for its current task, preventing instruction loss and keeping it focused.
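
The “closest file loads last” behaviour can be pictured with the sketch below, which walks from the workspace root down to the directory the agent is working in and collects any context files along the way. It is a simplified model of my own, not the CLI’s actual loader, and the function name and demo layout are hypothetical.

```python
import tempfile
from pathlib import Path


def context_files_for(workspace_root, working_dir, filename="GEMINI.md"):
    """List context files from the workspace root down to working_dir, general first."""
    root = Path(workspace_root).resolve()
    relative = Path(working_dir).resolve().relative_to(root)  # ValueError if outside root
    # Build the chain root, root/a, root/a/b, ... down to the working directory.
    chain = [root.joinpath(*relative.parts[:depth]) for depth in range(len(relative.parts) + 1)]
    return [d / filename for d in chain if (d / filename).is_file()]


# Demo on a throwaway project layout (paths are illustrative).
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "src" / "api").mkdir(parents=True)
    (project / "src" / "web").mkdir(parents=True)
    for folder in (project, project / "src" / "api", project / "src" / "web"):
        (folder / "GEMINI.md").write_text("# rules\n")
    loaded = context_files_for(project, project / "src" / "api")
    loaded_names = [p.relative_to(project.resolve()).as_posix() for p in loaded]
    print(loaded_names)
```

Note that the rules for src/web never appear when the agent is working in src/api, which is exactly the noise reduction progressive disclosure is after.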

🛠️ The Execution Arena: Why CLI is the New MCP

While the Model Context Protocol (MCP)—a system for letting models discover tools via JSON schemas—was a massive leap for retrieving data, it’s often overkill for simple execution tasks. Forcing a model to parse a verbose JSON schema for every available tool is a great way to trigger context rot before you’ve even started.

For example, even a parameterless git status command needs a schema entry like the one below, and tools with real arguments grow far longer:


// A verbose, token-heavy way to define a tool
{
  "name": "git_status",
  "description": "Get the status of the git repository",
  "parameters": { "type": "object", "properties": {} }
}

We’ve realised that for many tasks, CLI is the new MCP. AI models are already native bash experts. They don’t need a heavy translation layer. Instead of the verbose schema, the AI can simply be empowered to run git --help to discover its own options.

This approach trusts the model to do what it does best: understand and use text-based interfaces. By moving simple tasks to direct CLI execution, we’ve seen a 40% reduction in token overhead. It’s faster, it’s stateless, and it lives exactly where your code is built and tested.
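
For a feel of the size difference, the sketch below compares the schema from the example above with the plain command. It uses a rough four-characters-per-token heuristic (an assumption, not a real tokenizer), so treat the numbers as illustrative only; it does not reproduce the 40% figure.

```python
import json


def rough_tokens(text):
    """Very rough token estimate (about four characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)


tool_schema = {
    "name": "git_status",
    "description": "Get the status of the git repository",
    "parameters": {"type": "object", "properties": {}},
}
schema_cost = rough_tokens(json.dumps(tool_schema))
cli_cost = rough_tokens("git status")

print(f"schema: ~{schema_cost} tokens, CLI command: ~{cli_cost} tokens")
```

Bear in mind that tool schemas are typically sent for every registered tool, whereas a shell command only costs tokens when the agent actually runs it.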

☕️ The Takeaway: Brewing Your Own Context

[Figure: Best Practices Summary for Context Engineering]

It takes a bit of a faff to set up, but once your context is engineered, the AI stops being a temperamental chatbot and starts being a proper execution engine.

What’s your context stack looking like lately? Are you still vibe coding, or have you built a proper hierarchy? Let me know over your next brew!

Grab your favourite roast and let’s customise those context files! ☕️🚀