You’ve heard of prompt engineering. But the engineers shipping production AI agents in 2025 are talking about something deeper: context engineering.
The Agent-Skills-for-Context-Engineering repository by Muratcan Koylan is a comprehensive, open collection of structured skills — already cited in academic research from Peking University — that teaches the art and science of curating what goes into a model’s context window to maximize agent effectiveness.
Part 1: Foundations — The Mental Model
Prompt Engineering vs. Context Engineering
Most developers stop at prompt engineering: writing clever instructions to steer model behavior. That’s necessary, but it’s only one piece of the puzzle.
Context engineering is the discipline of managing everything the language model can attend to at inference time:
- System prompts
- Tool definitions
- Retrieved documents
- Message history
- Tool outputs
The fundamental constraint is not raw token capacity — it’s attention mechanics. As context length grows, models exhibit predictable degradation: the “lost-in-the-middle” phenomenon, U-shaped attention curves, and attention scarcity. The goal is finding the smallest high-signal set of tokens that maximizes the probability of the desired outcome.
Think of the model’s context window like a detective’s investigation board. A great detective (model) doesn’t pin every newspaper clipping they’ve ever read on the board — they curate only the most relevant evidence. Context engineering is the art of being that detective’s assistant.
What is an “Agent Skill”?
This repository implements the Agent Skills specification — a structured way to package guidance for AI agents. Each skill follows a standard format:
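Per the Agent Skills specification, each skill lives in a `SKILL.md` file: YAML frontmatter carrying the name and description, followed by a markdown body. A sketch of the shape (the body headings here are illustrative, not copied from the repo):

```markdown
---
name: context-compression
description: Strategies for summarizing and pruning agent context before attention degrades.
---

# Context Compression

## When to use this skill
Activate when conversation history or tool outputs approach the model's degradation threshold.

## Core techniques
- Summarize completed sub-tasks into compact notes
- Offload large artifacts to the filesystem, retrieve just-in-time
```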
The design follows progressive disclosure: at startup, an agent loads only skill names and descriptions. Full content loads only when a skill is activated for the relevant task. This keeps agents fast while giving them access to deep expertise on demand.
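The progressive-disclosure idea can be sketched in a few lines of Python: at startup only the frontmatter is parsed into an index, and the full skill body is read only on activation. This is an illustrative loader, not the repo's actual implementation.

```python
from pathlib import Path

FRONTMATTER_DELIM = "---"

def load_skill_index(skills_dir):
    """At startup, read only each skill's frontmatter (name + description)."""
    index = {}
    for path in Path(skills_dir).glob("*/SKILL.md"):
        meta = {}
        lines = path.read_text().splitlines()
        if lines and lines[0].strip() == FRONTMATTER_DELIM:
            for line in lines[1:]:
                if line.strip() == FRONTMATTER_DELIM:
                    break
                key, _, value = line.partition(":")
                meta[key.strip()] = value.strip()
        if "name" in meta:
            index[meta["name"]] = {
                "description": meta.get("description", ""),
                "path": path,
            }
    return index

def activate_skill(index, name):
    """Only when a skill is triggered does its full body enter context."""
    return index[name]["path"].read_text()
```

The index of names and descriptions is all the agent pays for up front; deep expertise costs tokens only when it is actually needed.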
Part 2: The Investigation — Skill Architecture
13 Skills Across 5 Categories
The collection organizes 13 skills into a coherent learning path:
| Category | Skills |
|---|---|
| Foundational | context-fundamentals, context-degradation, context-compression |
| Architectural | multi-agent-patterns, memory-systems, tool-design, filesystem-context, hosted-agents |
| Operational | context-optimization, evaluation, advanced-evaluation |
| Development | project-development |
| Cognitive | bdi-mental-states |
Claude Code Plugin Integration
What makes this collection unique is its first-class integration with Claude Code’s plugin marketplace:
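Claude Code discovers plugins through a marketplace manifest (`.claude-plugin/marketplace.json`). The entry below is a hedged sketch of what such a manifest might look like; the field values are illustrative, not copied from the repository:

```json
{
  "name": "context-engineering",
  "owner": { "name": "Muratcan Koylan" },
  "plugins": [
    {
      "name": "context-engineering-skills",
      "source": "./",
      "description": "13 skills for context engineering, activated by task context."
    }
  ]
}
```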
Skills auto-activate based on task context — no manual configuration required.
Part 3: The Diagnosis — What This Actually Does for Developers
3.1 The Anatomy of Context (from context-fundamentals)
The context-fundamentals skill breaks down what actually lives in a model’s context:
| Component | Characteristics |
|---|---|
| System prompts | Loaded once, persist throughout the session |
| Tool definitions | Serialized near the front of context; descriptions steer behavior |
| Retrieved documents | Just-in-time loaded via RAG |
| Message history | Grows linearly; dominates long-running tasks |
| Tool outputs | Can reach 83.9% of total context in agent trajectories |
The key insight: context must be treated as a finite resource with diminishing marginal returns. Every new token depletes the attention budget.
Practical example — organizing system prompts with clear section boundaries:
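One common pattern, shown here as an illustrative sketch rather than the skill's exact template, is to delimit each concern with explicit tags so the model can attend to sections rather than an undifferentiated wall of text:

```text
<role>
You are a research assistant for a small analytics team.
</role>

<constraints>
- Cite a source for every factual claim
- Never modify files outside the project workspace
</constraints>

<tools>
Prefer web_search for anything newer than your training data.
</tools>

<output_format>
Return a markdown report with "Findings" and "Open Questions" sections.
</output_format>
```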
3.2 Context Degradation Patterns (from context-degradation)
This is where things get empirically fascinating. The skill documents 5 distinct failure modes:
1. Lost-in-the-Middle — Information in the center of context receives 10-40% lower recall accuracy compared to information at the start or end. This is not a bug; it’s a consequence of attention mechanics.
2. Context Poisoning — A hallucination or incorrect tool output enters context and compounds through repeated reference. Recovery requires truncating to before the poisoning point.
3. Context Distraction — Even a single irrelevant document reduces performance. The effect is not proportional; it’s a step function.
4. Context Confusion — When context contains multiple task types, the model may apply constraints from the wrong task.
5. Context Clash — Multiple correct but conflicting pieces of information create contradictory guidance.
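The poisoning recovery rule above (truncate to before the poisoning point) is mechanical enough to sketch. This is a minimal illustration; appending a corrective note afterward is my own assumption, not something the skill prescribes:

```python
def recover_from_poisoning(messages, poisoned_index):
    """Roll the conversation back to just before a hallucination or bad
    tool output entered context, so it cannot compound through reference."""
    assert 0 <= poisoned_index < len(messages)
    clean = messages[:poisoned_index]
    # Assumption: replace the poisoned content with a brief corrective note.
    clean.append({
        "role": "user",
        "content": "Note: the previous tool result was invalid; disregard it.",
    })
    return clean
```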
Model-specific degradation thresholds (from the skill’s reference data):
| Model | Degradation Onset | Severe Degradation |
|---|---|---|
| GPT-5.2 | ~64K tokens | ~200K tokens |
| Claude Opus 4.5 | ~100K tokens | ~180K tokens |
| Claude Sonnet 4.5 | ~80K tokens | ~150K tokens |
| Gemini 3 Pro | ~500K tokens | ~800K tokens |
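A simple guard built on the table above can flag when a trajectory crosses a threshold. The function is a hypothetical helper, with the thresholds copied directly from the reference data:

```python
# Thresholds (onset, severe) in tokens, taken from the table above.
DEGRADATION_THRESHOLDS = {
    "gpt-5.2": (64_000, 200_000),
    "claude-opus-4.5": (100_000, 180_000),
    "claude-sonnet-4.5": (80_000, 150_000),
    "gemini-3-pro": (500_000, 800_000),
}

def degradation_status(model, token_count):
    """Classify a context size against the model's degradation thresholds."""
    onset, severe = DEGRADATION_THRESHOLDS[model]
    if token_count >= severe:
        return "severe"
    if token_count >= onset:
        return "degrading"
    return "healthy"
```

An agent loop might call this after every turn and trigger compression once the status leaves "healthy".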
Practical mitigation — the Four-Bucket approach:
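The skill's exact bucket definitions are not reproduced in this article, so the triage below is an illustration under assumed bucket names (keep, summarize, offload, discard), not the skill's actual taxonomy:

```python
def bucket_context_item(item):
    """Illustrative four-bucket triage. Bucket names are assumptions."""
    if item["relevance"] == "high" and item["tokens"] < 2_000:
        return "keep"        # small and critical: leave verbatim in context
    if item["relevance"] == "high":
        return "summarize"   # critical but large: compress before retaining
    if item.get("may_need_later"):
        return "offload"     # park on the filesystem, retrieve just-in-time
    return "discard"         # irrelevant: every token costs attention
```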
3.3 Multi-Agent Patterns (from multi-agent-patterns)
The multi-agent skill reveals a critical insight: sub-agents exist primarily to isolate context, not to simulate organizational roles.
Token economics reality:
| Architecture | Token Multiplier |
|---|---|
| Single agent chat | 1× |
| Single agent with tools | ~4× |
| Multi-agent system | ~15× |
Despite the cost, multi-agent approaches unlock parallelization. Research on the BrowseComp evaluation found token usage explains 80% of performance variance — validating that distributing work across agents with separate context windows is worth the overhead.
The Telephone Game Problem — a critical pitfall in supervisor architectures:
LangGraph benchmarks found that supervisor architectures initially performed 50% worse than optimized versions because the supervisor paraphrases sub-agent responses, introducing errors at each hand-off. The fix is a forward_message tool:
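A minimal sketch of the idea, with an assumed tool schema (the skill's actual definition may differ): the supervisor calls the tool instead of restating the sub-agent's answer, and the handler returns that answer verbatim.

```python
# Assumed tool schema, in the Anthropic tool-definition style.
FORWARD_MESSAGE_TOOL = {
    "name": "forward_message",
    "description": (
        "Relay a sub-agent's final response to the user verbatim. "
        "Use this instead of paraphrasing the sub-agent yourself."
    ),
    "input_schema": {
        "type": "object",
        "properties": {"agent_name": {"type": "string"}},
        "required": ["agent_name"],
    },
}

def handle_forward_message(agent_name, sub_agent_outputs):
    """Return the named sub-agent's message untouched: no supervisor paraphrase."""
    return sub_agent_outputs[agent_name]
```

Because the supervisor's own words never replace the sub-agent's, the telephone-game loss disappears.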
3.4 Real-World Examples Included
The repo ships 5 production-quality examples that demonstrate how skills combine in practice:
| Example | What It Demonstrates |
|---|---|
| digital-brain-skill | Personal OS for founders — 6 modules, 4 automation scripts, JSONL append-only memory |
| x-to-book-system | Multi-agent pipeline monitoring X accounts → generating daily synthesized books |
| llm-as-judge-skills | TypeScript LLM evaluation tools — 19 passing tests, pairwise comparison, bias mitigation |
| book-sft-pipeline | Fine-tune 8B model on any author’s style for $2 total cost |
| interleaved_thinking | Cognitive architecture demonstration |
Part 4: The Resolution — How to Use It
For Claude Code Users
The fastest path:
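Using Claude Code's plugin commands, installation would look roughly like the following. The marketplace path and plugin name are placeholders, not copied from the repository:

```text
# Inside a Claude Code session
/plugin marketplace add <github-user>/Agent-Skills-for-Context-Engineering
/plugin install <plugin-name>@<marketplace-name>
```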
Skills activate automatically when you use trigger phrases:
| Trigger phrase | Skill activated |
|---|---|
| “compress context” | context-compression |
| “implement LLM-as-judge” | advanced-evaluation |
| “design multi-agent system” | multi-agent-patterns |
| “build background agent” | hosted-agents |
For Cursor / Codex / Any IDE
Copy the relevant SKILL.md content into your .rules file or project-specific instructions folder. The skills are deliberately platform-agnostic.
For Custom Implementations
The skills are designed as extractable patterns. Pick a skill that addresses your current challenge, extract the design principles, and implement them in your agent framework.
Learning Path Recommendation
- Start with context-fundamentals — builds the mental model
- Study context-degradation — understand how things go wrong
- Apply context-compression and context-optimization — prevent problems proactively
- Expand to architectural skills based on your system needs
Final Mental Model
Context is a finite attention budget. The job is always the same: find the smallest high-signal set of tokens that maximizes the probability of the desired outcome.
Whether you’re building your first AI agent or optimizing a multi-agent system for production, this collection gives you the vocabulary, mental models, and concrete patterns to do context engineering right.
Repository: Agent-Skills-for-Context-Engineering
Author: Muratcan Koylan