You’ve heard of prompt engineering. But the engineers shipping production AI agents in 2025 are talking about something deeper: context engineering.
The Agent-Skills-for-Context-Engineering repository by Muratcan Koylan is a comprehensive, open collection of structured skills — already cited in academic research from Peking University — that teaches the art and science of curating what goes into a model’s context window to maximize agent effectiveness.
Part 1: Foundations — The Mental Model
Prompt Engineering vs. Context Engineering
Most developers stop at prompt engineering: writing clever instructions to steer model behavior. That’s necessary, but it’s only one piece of the puzzle.
Context engineering is the discipline of managing everything the language model can attend to at inference time:
- System prompts
- Tool definitions
- Retrieved documents
- Message history
- Tool outputs
The fundamental constraint is not raw token capacity — it’s attention mechanics. As context length grows, models exhibit predictable degradation: the “lost-in-the-middle” phenomenon, U-shaped attention curves, and attention scarcity. The goal is finding the smallest high-signal set of tokens that maximizes the probability of the desired outcome.
Think of the model’s context window like a detective’s investigation board. A great detective (model) doesn’t pin every newspaper clipping they’ve ever read on the board — they curate only the most relevant evidence. Context engineering is the art of being that detective’s assistant.
What is an “Agent Skill”?
This repository implements the Agent Skills specification — a structured way to package guidance for AI agents. Each skill follows a standard format:
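Per the Agent Skills specification, each skill lives in a `SKILL.md` file: YAML frontmatter carrying the name and description, followed by a markdown body. A sketch of the shape (the body headings here are illustrative, not copied from the repo):

```markdown
---
name: context-compression
description: Strategies for summarizing and pruning agent context before attention degrades.
---

# Context Compression

## When to use this skill
Activate when conversation history or tool outputs approach the model's degradation threshold.

## Core techniques
- Summarize completed sub-tasks into compact notes
- Offload large artifacts to the filesystem, retrieve just-in-time
```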
The design follows progressive disclosure: at startup, an agent loads only skill names and descriptions. Full content loads only when a skill is activated for the relevant task. This keeps agents fast while giving them access to deep expertise on demand.
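The progressive-disclosure idea can be sketched in a few lines of Python: at startup only the frontmatter is parsed into an index, and the full skill body is read only on activation. This is an illustrative loader, not the repo's actual implementation.

```python
from pathlib import Path

FRONTMATTER_DELIM = "---"

def load_skill_index(skills_dir):
    """At startup, read only each skill's frontmatter (name + description)."""
    index = {}
    for path in Path(skills_dir).glob("*/SKILL.md"):
        meta = {}
        lines = path.read_text().splitlines()
        if lines and lines[0].strip() == FRONTMATTER_DELIM:
            for line in lines[1:]:
                if line.strip() == FRONTMATTER_DELIM:
                    break
                key, _, value = line.partition(":")
                meta[key.strip()] = value.strip()
        if "name" in meta:
            index[meta["name"]] = {
                "description": meta.get("description", ""),
                "path": path,
            }
    return index

def activate_skill(index, name):
    """Only when a skill is triggered does its full body enter context."""
    return index[name]["path"].read_text()
```

The index of names and descriptions is all the agent pays for up front; deep expertise costs tokens only when it is actually needed.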
Part 2: The Investigation — Skill Architecture
13 Skills Across 5 Categories
The collection organizes 13 skills into a coherent learning path:
| Category | Skills |
|---|---|
| Foundational | context-fundamentals, context-degradation, context-compression |
| Architectural | multi-agent-patterns, memory-systems, tool-design, filesystem-context, hosted-agents |
| Operational | context-optimization, evaluation, advanced-evaluation |
| Development | project-development |
| Cognitive | bdi-mental-states |
Claude Code Plugin Integration
What makes this collection unique is its first-class integration with Claude Code’s plugin marketplace:
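Claude Code discovers plugins through a marketplace manifest (`.claude-plugin/marketplace.json`). The entry below is a hedged sketch of what such a manifest might look like; the field values are illustrative, not copied from the repository:

```json
{
  "name": "context-engineering",
  "owner": { "name": "Muratcan Koylan" },
  "plugins": [
    {
      "name": "context-engineering-skills",
      "source": "./",
      "description": "13 skills for context engineering, activated by task context."
    }
  ]
}
```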
Skills auto-activate based on task context — no manual configuration required.
Part 3: The Diagnosis — What This Actually Does for Developers
3.1 The Anatomy of Context (from context-fundamentals)
The context-fundamentals skill breaks down what actually lives in a model’s context:
| Component | Characteristics |
|---|---|
| System prompts | Loaded once, persist throughout the session |
| Tool definitions | Serialized near the front of context; descriptions steer behavior |
| Retrieved documents | Just-in-time loaded via RAG |
| Message history | Grows linearly; dominates long-running tasks |
| Tool outputs | Can reach 83.9% of total context in agent trajectories |
The key insight: context must be treated as a finite resource with diminishing marginal returns. Every new token depletes the attention budget.
Practical example — organizing system prompts with clear section boundaries:
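One common pattern, shown here as an illustrative sketch rather than the skill's exact template, is to delimit each concern with explicit tags so the model can attend to sections rather than an undifferentiated wall of text:

```text
<role>
You are a research assistant for a small analytics team.
</role>

<constraints>
- Cite a source for every factual claim
- Never modify files outside the project workspace
</constraints>

<tools>
Prefer web_search for anything newer than your training data.
</tools>

<output_format>
Return a markdown report with "Findings" and "Open Questions" sections.
</output_format>
```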
3.2 Context Degradation Patterns (from context-degradation)
This is where things get empirically fascinating. The skill documents 5 distinct failure modes:
1. Lost-in-the-Middle — Information in the center of context receives 10-40% lower recall accuracy compared to information at the start or end. This is not a bug; it’s a consequence of attention mechanics.
2. Context Poisoning — A hallucination or incorrect tool output enters context and compounds through repeated reference. Recovery requires truncating to before the poisoning point.
3. Context Distraction — Even a single irrelevant document reduces performance. The effect is not proportional; it’s a step function.
4. Context Confusion — When context contains multiple task types, the model may apply constraints from the wrong task.
5. Context Clash — Multiple correct but conflicting pieces of information create contradictory guidance.
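The poisoning recovery rule above (truncate to before the poisoning point) is mechanical enough to sketch. This is a minimal illustration; appending a corrective note afterward is my own assumption, not something the skill prescribes:

```python
def recover_from_poisoning(messages, poisoned_index):
    """Roll the conversation back to just before a hallucination or bad
    tool output entered context, so it cannot compound through reference."""
    assert 0 <= poisoned_index < len(messages)
    clean = messages[:poisoned_index]
    # Assumption: replace the poisoned content with a brief corrective note.
    clean.append({
        "role": "user",
        "content": "Note: the previous tool result was invalid; disregard it.",
    })
    return clean
```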
Model-specific degradation thresholds (from the skill’s reference data):
| Model | Degradation Onset | Severe Degradation |
|---|---|---|
| GPT-5.2 | ~64K tokens | ~200K tokens |
| Claude Opus 4.5 | ~100K tokens | ~180K tokens |
| Claude Sonnet 4.5 | ~80K tokens | ~150K tokens |
| Gemini 3 Pro | ~500K tokens | ~800K tokens |
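A simple guard built on the table above can flag when a trajectory crosses a threshold. The function is a hypothetical helper, with the thresholds copied directly from the reference data:

```python
# Thresholds (onset, severe) in tokens, taken from the table above.
DEGRADATION_THRESHOLDS = {
    "gpt-5.2": (64_000, 200_000),
    "claude-opus-4.5": (100_000, 180_000),
    "claude-sonnet-4.5": (80_000, 150_000),
    "gemini-3-pro": (500_000, 800_000),
}

def degradation_status(model, token_count):
    """Classify a context size against the model's degradation thresholds."""
    onset, severe = DEGRADATION_THRESHOLDS[model]
    if token_count >= severe:
        return "severe"
    if token_count >= onset:
        return "degrading"
    return "healthy"
```

An agent loop might call this after every turn and trigger compression once the status leaves "healthy".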
Practical mitigation — the Four-Bucket approach:
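The skill's exact bucket definitions are not reproduced in this article, so the triage below is an illustration under assumed bucket names (keep, summarize, offload, discard), not the skill's actual taxonomy:

```python
def bucket_context_item(item):
    """Illustrative four-bucket triage. Bucket names are assumptions."""
    if item["relevance"] == "high" and item["tokens"] < 2_000:
        return "keep"        # small and critical: leave verbatim in context
    if item["relevance"] == "high":
        return "summarize"   # critical but large: compress before retaining
    if item.get("may_need_later"):
        return "offload"     # park on the filesystem, retrieve just-in-time
    return "discard"         # irrelevant: every token costs attention
```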
3.3 Multi-Agent Patterns (from multi-agent-patterns)
The multi-agent skill reveals a critical insight: sub-agents exist primarily to isolate context, not to simulate organizational roles.
Token economics reality:
| Architecture | Token Multiplier |
|---|---|
| Single agent chat | 1× |
| Single agent with tools | ~4× |
| Multi-agent system | ~15× |
Despite the cost, multi-agent approaches unlock parallelization. Research on the BrowseComp evaluation found token usage explains 80% of performance variance — validating that distributing work across agents with separate context windows is worth the overhead.
The Telephone Game Problem — a critical pitfall in supervisor architectures:
LangGraph benchmarks found that supervisor architectures initially performed 50% worse than optimized versions because the supervisor paraphrases sub-agent responses, introducing errors at each hand-off. The fix is a forward_message tool:
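A minimal sketch of the idea, with an assumed tool schema (the skill's actual definition may differ): the supervisor calls the tool instead of restating the sub-agent's answer, and the handler returns that answer verbatim.

```python
# Assumed tool schema, in the Anthropic tool-definition style.
FORWARD_MESSAGE_TOOL = {
    "name": "forward_message",
    "description": (
        "Relay a sub-agent's final response to the user verbatim. "
        "Use this instead of paraphrasing the sub-agent yourself."
    ),
    "input_schema": {
        "type": "object",
        "properties": {"agent_name": {"type": "string"}},
        "required": ["agent_name"],
    },
}

def handle_forward_message(agent_name, sub_agent_outputs):
    """Return the named sub-agent's message untouched: no supervisor paraphrase."""
    return sub_agent_outputs[agent_name]
```

Because the supervisor's own words never replace the sub-agent's, the telephone-game loss disappears.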
3.4 Real-World Examples Included
The repo ships 5 production-quality examples that demonstrate how skills combine in practice:
| Example | What It Demonstrates |
|---|---|
| digital-brain-skill | Personal OS for founders — 6 modules, 4 automation scripts, JSONL append-only memory |
| x-to-book-system | Multi-agent pipeline monitoring X accounts → generating daily synthesized books |
| llm-as-judge-skills | TypeScript LLM evaluation tools — 19 passing tests, pairwise comparison, bias mitigation |
| book-sft-pipeline | Fine-tune 8B model on any author’s style for $2 total cost |
| interleaved_thinking | Cognitive architecture demonstration |
Part 4: The Resolution — How to Use It
For Claude Code Users
The fastest path:
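Using Claude Code's plugin commands, installation would look roughly like the following. The marketplace path and plugin name are placeholders, not copied from the repository:

```text
# Inside a Claude Code session
/plugin marketplace add <github-user>/Agent-Skills-for-Context-Engineering
/plugin install <plugin-name>@<marketplace-name>
```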
Skills activate automatically when you use trigger phrases:
| Trigger phrase | Skill activated |
|---|---|
| “compress context” | context-compression |
| “implement LLM-as-judge” | advanced-evaluation |
| “design multi-agent system” | multi-agent-patterns |
| “build background agent” | hosted-agents |
For Cursor / Codex / Any IDE
Copy the relevant SKILL.md content into your .rules file or project-specific instructions folder. The skills are deliberately platform-agnostic.
For Custom Implementations
The skills are designed as extractable patterns. Pick a skill that addresses your current challenge, extract the design principles, and implement them in your agent framework.
Learning Path Recommendation
- Start with context-fundamentals — builds the mental model
- Study context-degradation — understand how things go wrong
- Apply context-compression and context-optimization — prevent problems proactively
- Expand to architectural skills based on your system needs
Final Mental Model
Context is a finite attention budget. The job is always the same: find the smallest high-signal set of tokens that maximizes the probability of the desired outcome.
Whether you’re building your first AI agent or optimizing a multi-agent system for production, this collection gives you the vocabulary, mental models, and concrete patterns to do context engineering right.
Repository: Agent-Skills-for-Context-Engineering
Author: Muratcan Koylan