Chapter 6: Lessons Learned — Build Memory Into Your System
Something goes wrong. Maybe Claude took an action you didn’t approve. Maybe a deployment broke because of a step that wasn’t in the runbook. Maybe a decision that seemed obvious at the time turned out badly.
The default response is to fix it and move on. The lessons-learned skill is what you run instead. It guides Claude through a structured retrospective — not to assign blame, but to identify what system change would prevent this from happening again — and then implements that change before the session ends.
The Skill File
```yaml
---
name: lessons-learned
description: Conduct structured retrospective analysis and encode fixes into the system. Use after incidents, mistakes, rollbacks, post-mortems, or when asking "what went wrong."
user-invocable: true
---
```
There’s no tools specification here because what the skill uses depends entirely on what the fix requires. Encoding a lesson might mean creating a new skill file, updating a CLAUDE.md, adding a checklist to an existing workflow, or writing a new guard. The skill guides the process; the output varies.
The Seven Phases
Phase 1: Incident definition. Before analysis, capture the facts. The skill provides a template:
```markdown
## Incident Summary
**What happened:** [Factual description]
**When:** [Date/time]
**Impact:** [What was affected, scope of damage]
**Resolution:** [How it was fixed]
**Time to resolution:** [How long to fix]
```
The instruction is explicit: facts first, analysis later. Get the facts wrong at this stage and every later phase is an analysis of the wrong problem.
Phase 2: Timeline reconstruction. Build a chronological sequence of what happened, who did what, and what resulted. Three key questions: what was the trigger? where did the sequence diverge from expected? what was the point of no return?
Phase 3: Root cause analysis. The skill uses the 5 Whys technique — asking why five times to get from symptom to root cause. Not “the deployment broke” but “the deployment broke because the runbook was missing a step because we never documented that dependency because…”
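The mechanics of the technique are simple enough to sketch. Here is a hypothetical helper, not part of the skill itself, that walks a chain of why-answers and treats the deepest link as the candidate root cause:

```python
def five_whys(symptom, ask_why):
    """Walk from symptom toward root cause by asking 'why' up to five times.

    ask_why: callable that returns the cause of a given statement,
    or None when no deeper cause is known.
    """
    chain = [symptom]
    for _ in range(5):
        cause = ask_why(chain[-1])
        if cause is None:
            break
        chain.append(cause)
    return chain  # chain[-1] is the candidate root cause

# Worked example using the deployment incident above:
causes = {
    "the deployment broke": "the runbook was missing a step",
    "the runbook was missing a step": "the dependency was never documented",
}
chain = five_whys("the deployment broke", causes.get)
```

The point of the structure is that analysis stops when no deeper cause exists, not after a fixed count: five is a ceiling, not a quota.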
Phase 4: Contributing factors. Root cause isn’t usually enough. The skill categorises contributing factors: process gaps, communication failures, missing technical guards, context assumptions that turned out to be wrong.
Phase 5: Fix classification. This is where the skill distinguishes itself from a standard post-mortem. Every finding gets classified by fix type:
| Fix type | When to use | How to encode |
|---|---|---|
| Skill | Recurring workflow needs structure | Create SKILL.md in ~/.claude/skills/ |
| Guard | Action requires mandatory checkpoint | Add approval gate to relevant skill |
| Documentation | Knowledge gap caused the issue | Update CLAUDE.md or relevant docs |
| Automation | Manual step was forgotten | Create hook or script |
| Checklist | Multiple steps need verification | Add to existing skill or create new one |
Phase 6: Fix implementation. The skill’s most important instruction: “Don’t just recommend fixes — implement them.” A retrospective that ends with a list of recommendations is a document that will be forgotten. The skill drives Claude to make the actual change — create the file, update the workflow, add the guard — before the session ends.
Phase 7: Verification. Define how you’ll know the fix worked. Test scenario, success criteria, review date.
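The Phase 7 output is small enough to capture as structured data. A minimal sketch, with field names that are mine rather than the skill's:

```python
from datetime import date, timedelta

def verification_record(fix, test_scenario, success_criteria, review_in_days=30):
    """Bundle the Phase 7 outputs: the fix, how we'll know it worked, and when to check."""
    return {
        "fix": fix,
        "test_scenario": test_scenario,
        "success_criteria": success_criteria,
        "review_date": (date.today() + timedelta(days=review_in_days)).isoformat(),
    }

record = verification_record(
    fix="approval gate added to deploy skill",
    test_scenario="run a dry deploy and confirm the gate fires",
    success_criteria="no deploy proceeds without an explicit 'yes'",
)
```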
The Common Patterns
The skill includes a library of recurring incident patterns with their standard fixes:
Premature action — Claude took action before explicit approval. Fix: add an approval gate to the relevant skill. The template:
```markdown
## Approval Gate Template
Before [ACTION]:
1. Show user exactly what will happen
2. Ask: "Ready to [action]? (yes/no)"
3. Wait for explicit "yes" or "proceed"
4. Only then execute
```
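In code form, the same gate is a function that refuses to proceed on anything but an explicit affirmative. A sketch under my own assumptions (the accepted tokens mirror the template's "yes" or "proceed"; the skill encodes this as instructions, not code):

```python
def approval_gate(action, preview, ask=input):
    """Show the user exactly what will happen, then wait for explicit approval."""
    print(preview)                    # 1. show exactly what will happen
    answer = ask(f"Ready to {action}? (yes/no) ").strip().lower()
    return answer in ("yes", "proceed")  # anything else means: do not execute
```

Note what the gate does not accept: "y", "sure", silence. The looseness you allow in the answer is looseness in the checkpoint.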
Sequence error — steps were executed in wrong order. Fix: encode the sequence in the skill with numbered steps and explicit dependency chain.
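Encoding the dependency chain makes order violations checkable rather than merely documented. A sketch with illustrative step names (not taken from any real runbook):

```python
def check_order(executed, prerequisites):
    """Return the first step run before one of its prerequisites, or None if the order is valid."""
    done = set()
    for step in executed:
        missing = prerequisites.get(step, set()) - done
        if missing:
            return (step, missing)  # this step ran too early
        done.add(step)
    return None

# deploy depends on migrate and backup; migrate depends on backup
deps = {"deploy": {"migrate", "backup"}, "migrate": {"backup"}}
violation = check_order(["backup", "deploy", "migrate"], deps)
```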
Missing validation — bad data passed through without a check. Fix: add validation step to skill or create a pre-flight check.
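A pre-flight check is just a list of named predicates run before the action, where any failure aborts. A minimal sketch; the check names are illustrative placeholders:

```python
def preflight(checks):
    """Run each (name, predicate) pair; return the names of the checks that failed."""
    return [name for name, check in checks if not check()]

failures = preflight([
    ("config file exists", lambda: True),   # stand-ins for real checks
    ("schema matches", lambda: False),
])
# a non-empty result means: stop and fix before proceeding
```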
Context carryover — assumptions from a prior session caused the issue. Fix: add explicit context verification at task start.
Scope creep — Claude did more than requested and caused side effects. Fix: add clarifying questions before expanding scope.
The Anti-Patterns Section
The skill explicitly calls out what not to do:
| Anti-pattern | Problem | Instead |
|---|---|---|
| Blame assignment | Creates defensiveness, misses systemic issues | Focus on process, not people |
| Single-cause thinking | Oversimplifies, misses contributing factors | Use 5 Whys, identify multiple factors |
| Recommendation without action | Lessons forgotten, issue recurs | Implement fixes during retrospective |
| Vague fixes | “Be more careful” doesn’t prevent recurrence | Encode specific, verifiable changes |
| Skip verification | No way to know if fix worked | Define success criteria and review date |
“Be more careful” is not a fix. A guard in the skill file is a fix. The distinction between the two is what separates a retrospective that produces change from one that produces a document.
The Feedback Loop
After the retrospective, the skill instructs Claude to log findings to the daily note via /log-to-daily. If a skill was created or modified, the skill itself is the durable artifact. For high-severity incidents, the fix gets added to an incident log.
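For the high-severity path, the incident log can be as simple as an append-only file of one-line records. A sketch in which the path and line format are my assumptions, not the skill's:

```python
from datetime import date
from pathlib import Path

def log_incident(path, summary, fix, severity="high"):
    """Append a one-line incident record: date, severity, what happened, what was encoded."""
    line = f"{date.today().isoformat()} [{severity}] {summary} | fix: {fix}\n"
    with Path(path).open("a") as f:
        f.write(line)
```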
The pattern is: incident → retrospective → encoded fix → future incident doesn’t happen. The skill encodes institutional memory directly into the system rather than into a document that no one reads.
How to Customise It
The fix classification table — my workflow has specific places for each fix type. Update the locations to match your own skill directory, CLAUDE.md paths, and hook locations.
Common incident patterns — add patterns specific to your workflow. If you have a recurring issue with a particular tool or process, encode its standard fix in the skill’s pattern library.
The verification criteria — if you want to be more or less rigorous about what counts as a verified fix, adjust the Phase 7 requirements. For critical incidents, you might want a mandatory review date and a test scenario. For minor ones, the fix itself might be sufficient.
Integration with your notes system — the skill currently integrates with Obsidian for the daily note log and the captures file. If you use a different note-taking system, update the log step.
Installing It
```shell
mkdir -p ~/.claude/skills/lessons-learned
# Copy the SKILL.md from github.com/aplaceforallmystuff
```
Invoke with /lessons-learned after any incident, mistake, or near-miss worth analysing.
That’s all five skills. They’re all published at github.com/aplaceforallmystuff — free to copy, modify, and make your own. The whole point is that your version should diverge from mine as you tune each one to your actual workflow.
If you want to go deeper on building skills from scratch — rather than adapting existing ones — the Building AI Agents course covers the full process: from first skill to multi-agent pipelines.
Check Your Understanding
Answer all questions correctly to complete this module.
1. What makes lessons-learned different from a standard post-mortem?
2. Why is 'Be more careful' not a fix?
3. What is the purpose of the 5 Whys technique?