Hermes Agent Automated Learning & Growth — Deep Dive
Nous Research's Hermes Agent is the fastest-growing open-source AI Agent of 2026 (148k GitHub Stars). Its core differentiator is a built-in Learning Loop — it automatically creates skills from experience, self-improves during use, and proactively persists knowledge, enabling cross-session capability accumulation. This article dissects the mechanism from the source code level.
1. Architecture Overview: The Four Stages of the Learning Loop
Hermes Agent's learning loop follows Observe → Distill → Reuse → Refine, running atop the main agent loop:
User Message
     │
     ▼
┌─────────────────────────────────┐
│  Agent Loop                     │
│  (run_conversation)             │
│                                 │
│  while budget_remaining:        │
│    response = LLM.call(...)     │
│    if tool_calls:               │
│      execute tools              │
│      append results             │
│    else:                        │
│      return response            │
│                                 │
│  ┌─────────────────────────┐    │
│  │  Self-Evaluation        │    │
│  │  Checkpoint             │    │
│  │  (every 15 tool calls)  │    │
│  └─────────┬───────────────┘    │
│            ▼                    │
│  ┌─────────────────────────┐    │
│  │ Skill Creation / Update │    │
│  │ Memory Nudge            │    │
│  └─────────────────────────┘    │
└─────────────────────────────────┘
Core insight: This is not model weight-level training, but a structured experience recording and retrieval system. LLM weights are never changed; what changes is the "knowledge layer" around the LLM: prompts, skill documents, and memory files.
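To make the "knowledge layer" idea concrete, here is a minimal sketch (not Hermes source code) of a prompt assembled from static instructions plus whatever files exist on disk. The function name `build_system_prompt` and the section headings it emits are assumptions made for illustration; only the file names MEMORY.md, USER.md, and SKILL.md come from the article.

```python
import tempfile
from pathlib import Path


def build_system_prompt(base_prompt: str, home: Path) -> str:
    """Assemble a prompt from static instructions plus persisted files."""
    sections = [base_prompt]
    # Declarative memory: plain Markdown files that grow across sessions.
    for filename in ("MEMORY.md", "USER.md"):
        path = home / filename
        if path.exists():
            text = path.read_text(encoding="utf-8").strip()
            sections.append(f"## {filename}\n{text}")
    # Procedural memory: one index line per directory that holds a SKILL.md.
    skills_dir = home / "skills"
    if skills_dir.is_dir():
        names = sorted(p.parent.name for p in skills_dir.glob("*/SKILL.md"))
        if names:
            sections.append("## Skills\n" + "\n".join(f"- {n}" for n in names))
    return "\n\n".join(sections)


with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp)
    before = build_system_prompt("You are a helpful agent.", home)
    # "Learning" here is nothing more than new files appearing on disk.
    (home / "MEMORY.md").write_text("- CI runs on GitHub Actions\n", encoding="utf-8")
    skill_dir = home / "skills" / "code-review"
    skill_dir.mkdir(parents=True)
    (skill_dir / "SKILL.md").write_text("# Code Review\n", encoding="utf-8")
    after = build_system_prompt("You are a helpful agent.", home)

print(after)
```

The same model sees a different prompt before and after the files are written, which is the whole mechanism: behavior changes without any weight update.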
2. Main Agent Loop: `run_conversation()`
Hermes' core driver is the AIAgent class in run_agent.py (~13,700 lines). Its main loop logic is as follows:
# run_agent.py - AIAgent.run_conversation() core loop (simplified)
def run_conversation(self):
    while (api_call_count < self.max_iterations
           and self.iteration_budget.remaining > 0) \
            or self._budget_grace_call:
        if self._interrupt_requested:
            break
        # 1. Build system prompt (including skills index, memory, context files)
        system_prompt = self._build_system_prompt()
        # 2. Call LLM (supports multiple providers and API modes)
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            tools=tool_schemas
        )
        # 3. Parse response
        if response.tool_calls:
            for tool_call in response.tool_calls:
                # Execute tool call (supports ThreadPoolExecutor concurrency)
                result = handle_function_call(
                    tool_call.name,
                    tool_call.args,
                    task_id
                )
                messages.append(tool_result_message(result))
                # ★ Self-evaluation checkpoint: triggers every 15 tool calls
                if self._should_evaluate():
                    self._learning_checkpoint(messages)
            api_call_count += 1
        else:
            # Plain text response → persist memory and return
            self._flush_memory()
            return response.content
Key design points:
- self.iteration_budget: Tracks budget consumption of parent and child agents to prevent infinite loops
- _should_evaluate(): Determines whether to trigger a learning checkpoint based on tool call count
- _flush_memory(): Writes critical information to persistent files before context is lost
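The first two design points can be sketched in a few lines. This is an illustration under stated assumptions, not the Hermes implementation: the class names `IterationBudget` and `CheckpointCounter` and their methods are hypothetical, and only the 15-call interval comes from the article.

```python
from dataclasses import dataclass


class IterationBudget:
    """A shared pool of iterations; parent and child agents hold the same object."""

    def __init__(self, total: int) -> None:
        self.remaining = total

    def consume(self) -> bool:
        """Spend one iteration; return False once the pool is exhausted."""
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True


@dataclass
class CheckpointCounter:
    """Signals a learning checkpoint every `interval` tool calls."""

    interval: int = 15
    calls_since_checkpoint: int = 0

    def record_tool_call(self) -> bool:
        """Count one tool call; return True when a checkpoint is due."""
        self.calls_since_checkpoint += 1
        if self.calls_since_checkpoint >= self.interval:
            self.calls_since_checkpoint = 0  # start counting toward the next one
            return True
        return False


# A parent and a child drawing from one budget cannot exceed it together.
budget = IterationBudget(total=3)
spent = [budget.consume() for _ in range(4)]  # the fourth request is refused

counter = CheckpointCounter(interval=15)
due_at = [n for n in range(1, 41) if counter.record_tool_call()]
print(spent, due_at)
```

Sharing one budget object is what bounds the total work: a runaway sub-agent drains the same pool its parent reads from.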
3. Skill System: The Core Vehicle of the Learning Loop
3.1 SKILL.md Format (agentskills.io Standard)
Skills are stored as Markdown files under ~/.hermes/skills/, with YAML frontmatter declaring metadata:
---
name: code-review
description: Execute standard code review workflow, check CI status, generate diff summary, flag style violations
version: 1.2.0
platforms: [macos, linux]
metadata:
  hermes:
    tags: [code-review, github, ci]
    related_skills: [github-pr-workflow, github-issues]
    config:
      - key: review.strictness
        description: Review strictness level (low/medium/high)
        default: "medium"
        prompt: Code review strictness level
---
# Code Review Skill
## Trigger Conditions
Automatically load when the user says "review this PR" or "review PR #xxx".
## Steps
1. Read the PR's list of changed files
2. Run linter checks on each file
3. Generate a change summary report
4. Post review comments to the PR
## Known Pitfalls
- For PRs with more than 500 changed lines, ask the author to split first
- Only review changed lines, not the entire file
3.2 Progressive Disclosure
prompt_builder.py implements a three-tier cached loading strategy, ensuring context overhead stays manageable even with hundreds of skills:
# agent/prompt_builder.py
def build_skills_system_prompt(
    available_tools=None,
    available_toolsets=None,
) -> str:
    """Build a compact skills index to inject into the system prompt.

    Three-tier cache:
    1. In-process LRU dict (keyed by skills_dir + tools + toolsets)
    2. Disk snapshot .skills_prompt_snapshot.json (cross-process restart)
    3. Full filesystem scan (fallback when cache misses)
    """
    # Index lines: only skill name + short description
    # Each line ~60-80 chars, 100 skills ≈ only 3K tokens
    index_lines = []
    for category in sorted(skills_by_category):
        for name, desc in sorted(...):
            if desc:
                index_lines.append(f" - {name}: {desc}")
            else:
                index_lines.append(f" - {name}")
    result = (
        "## Skills (mandatory)\n"
        "Before replying, scan the skills below. "
        "If a skill matches your task, you MUST load it with skill_view(name) "
        "and follow its instructions.\n"
        "<available_skills>\n"
        + "\n".join(index_lines) + "\n"
        "</available_skills>\n"
    )
    return result
Three-tier loading model:
| Tier | Content | Token Cost | Trigger |
|---|---|---|---|
| L0 | Skill name + description index | ~3K (total) | Session start (auto-injected into system prompt) |
| L1 | Full SKILL.md content | On-demand | Agent determines task relevance, calls skill_view(name) |
| L2 | Supporting files (references/templates/scripts) | On-demand | Agent calls skill_view(name, file_path) |
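The three-tier cache described in the docstring above (memory, then disk snapshot, then filesystem scan) can be sketched as follows. This is a simplified illustration, not the Hermes code: `load_skill_index` is a hypothetical name, the cache is keyed by directory only, and there is no invalidation when skills change. The snapshot file name is the one the article mentions.

```python
import json
import tempfile
from pathlib import Path

_memory_cache: dict[str, list[str]] = {}  # tier 1: lives only in this process


def load_skill_index(skills_dir: Path) -> tuple[list[str], str]:
    """Return (index_lines, tier_that_served_the_request)."""
    key = str(skills_dir)
    if key in _memory_cache:
        return _memory_cache[key], "memory"
    snapshot = skills_dir / ".skills_prompt_snapshot.json"
    if snapshot.exists():
        # Tier 2: survives a process restart.
        lines = json.loads(snapshot.read_text(encoding="utf-8"))
        source = "disk"
    else:
        # Tier 3: full scan, then write the snapshot for the next process.
        lines = [f"- {p.parent.name}" for p in sorted(skills_dir.glob("*/SKILL.md"))]
        snapshot.write_text(json.dumps(lines), encoding="utf-8")
        source = "scan"
    _memory_cache[key] = lines
    return lines, source


with tempfile.TemporaryDirectory() as tmp:
    skills_dir = Path(tmp)
    (skills_dir / "code-review").mkdir()
    (skills_dir / "code-review" / "SKILL.md").write_text("# Code Review\n", encoding="utf-8")

    sources = []
    sources.append(load_skill_index(skills_dir)[1])  # cold start: full scan
    sources.append(load_skill_index(skills_dir)[1])  # same process: memory hit
    _memory_cache.clear()                            # simulate a restart
    lines, source = load_skill_index(skills_dir)     # new process: disk snapshot
    sources.append(source)

print(sources, lines)
```

A real implementation would also need to invalidate both caches when a skill is created, patched, or archived; that part is omitted here.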
3.3 Agent Self-Managed Skills: The `skill_manage` Tool
This is the key tool of the learning loop — the Agent can create, update, and delete its own skill files at runtime via skill_manage.
# tools/skill_manager_tool.py
def skill_manage(action, name, content=None, category=None,
                 old_string=None, new_string=None, replace_all=False,
                 file_path=None, file_content=None):
    """
    Agent manages its own skills (procedural memory).

    Actions:
        create      - Create new skill (full SKILL.md)
        patch       - Targeted patch (old_string → new_string, preferred method)
        edit        - Full rewrite (replace entire SKILL.md)
        delete      - Delete a skill
        write_file  - Add/update a supporting file
        remove_file - Delete a supporting file
    """
    if action == "create":
        return _create_skill(name, content, category)
    elif action == "patch":
        return _patch_skill(name, old_string, new_string,
                            replace_all, file_path)
    elif action == "edit":
        return _edit_skill(name, content)
    elif action == "delete":
        return _delete_skill(name)
    elif action == "write_file":
        return _write_file(name, file_path, file_content)
    elif action == "remove_file":
        return _remove_file(name, file_path)
`_patch_skill` Fuzzy Matching Engine
The patch operation uses the same 8-strategy fuzzy matching engine as the file editing tool:
# tools/skill_manager_tool.py
def _patch_skill(name, old_string, new_string,
                 replace_all=False, file_path=None):
    # ... locate skill file ...
    content = target.read_text(encoding="utf-8")
    # Use 8-strategy fuzzy matching engine
    # Handles: whitespace normalization, indentation differences,
    # escape sequences, block anchor matching
    from tools.fuzzy_match import fuzzy_find_and_replace
    new_content, match_count, _strategy, match_error = \
        fuzzy_find_and_replace(content, old_string, new_string, replace_all)
    if match_error:
        # Return file preview for model self-correction
        preview = content[:500] + ("..." if len(content) > 500 else "")
        return {
            "success": False,
            "error": match_error,
            "file_preview": preview,
        }
    # Safety checks: injection detection + size limit
    err = _validate_content_size(new_content)
    if err:
        return {"success": False, "error": err}
    # Atomic write (rollback on failure)
    _atomic_write_text(target, new_content)
    return {
        "success": True,
        "message": f"Patched skill '{name}' ({match_count} replacements).",
    }
Full _create_skill Flow:
def _create_skill(name, content, category=None):
    # 1. Parse YAML frontmatter from content
    frontmatter, body = _parse_frontmatter(content)
    skill_name = frontmatter.get("name", name)
    # 2. Safety checks
    #    - Injection pattern detection
    #    - Content size limit (default 100K chars / 1 MiB)
    #    - Name conflict detection
    if _detect_injection(content):
        return {"success": False, "error": "Potential prompt injection detected"}
    err = _validate_content_size(content)
    if err:
        return {"success": False, "error": err}
    # 3. Determine target path
    if category:
        skill_dir = SKILLS_DIR / category / skill_name
    else:
        skill_dir = SKILLS_DIR / skill_name
    skill_dir.mkdir(parents=True, exist_ok=True)
    # 4. Write SKILL.md
    target = skill_dir / "SKILL.md"
    _atomic_write_text(target, content)
    return {
        "success": True,
        "message": f"Skill '{skill_name}' created.",
        "path": str(target),
    }
Content Size Limit Logic (from PR #4414):
_CONTENT_SIZE_LIMIT = 100_000    # 100K chars
_CONTENT_BYTE_LIMIT = 1_048_576  # 1 MiB

def _validate_content_size(content: str, label="SKILL.md"):
    if len(content) > _CONTENT_SIZE_LIMIT:
        return (
            f"{label} content is {len(content):,} characters "
            f"(limit: {_CONTENT_SIZE_LIMIT:,}). Consider splitting "
            f"into a smaller SKILL.md with supporting files "
            f"in references/ or templates/."
        )
    if len(content.encode("utf-8")) > _CONTENT_BYTE_LIMIT:
        return f"{label} exceeds 1 MiB."
    return None  # OK
3.4 Skill Guidance in the System Prompt
The SKILLS_GUIDANCE constant in prompt_builder.py tells the Agent when and how to create skills:
# agent/prompt_builder.py
SKILLS_GUIDANCE = (
"After completing a complex task (5+ tool calls), fixing a tricky error, "
"or discovering a non-trivial workflow, save the approach as a "
"skill with skill_manage so you can reuse it next time.\n"
"When using a skill and finding it outdated, incomplete, or wrong, "
"patch it immediately with skill_manage(action='patch') — "
"don't wait to be asked. "
"Skills that aren't maintained become liabilities."
)
This is not hardcoded if-else logic; the system prompt guides the LLM to make these decisions autonomously:
- 5+ tool calls → deemed a "complex task"
- Successfully completed → create a skill
- Better approach discovered → patch the skill
- User correction → update the skill
This means learning ability comes from the LLM's own reasoning capability, not from preset rules.
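Because the LLM writes its own patches, the `old_string` it supplies to the patch action in section 3.3 rarely matches the file byte for byte, which is why tolerant matching matters. The sketch below shows the general idea with only two strategies: an exact match, then a line-by-line match that ignores surrounding whitespace. It is not the 8-strategy `fuzzy_find_and_replace` engine from Hermes; the function name `tolerant_replace` and its return shape are assumptions for illustration.

```python
def tolerant_replace(content: str, old: str, new: str) -> tuple[str, str]:
    """Replace `old` with `new`; return (new_content, strategy_used).

    Raises ValueError when neither strategy finds a match.
    """
    if not old.strip():
        raise ValueError("old_string is empty")
    # Strategy 1: exact substring match.
    if old in content:
        return content.replace(old, new, 1), "exact"
    # Strategy 2: compare whole lines, ignoring leading/trailing whitespace.
    content_lines = content.splitlines()
    old_lines = [line.strip() for line in old.splitlines()]
    window = len(old_lines)
    for start in range(len(content_lines) - window + 1):
        candidate = [line.strip() for line in content_lines[start:start + window]]
        if candidate == old_lines:
            patched = (content_lines[:start]
                       + new.splitlines()
                       + content_lines[start + window:])
            trailing = "\n" if content.endswith("\n") else ""
            return "\n".join(patched) + trailing, "stripped-lines"
    raise ValueError("old_string not found")


# The file indents its steps; the model's old_string does not.
skill = "## Steps\n  1. Read the changed files\n  2. Run the linter\n"
patched, strategy = tolerant_replace(
    skill,
    "1. Read the changed files\n2. Run the linter",
    "1. Read the changed files\n2. Skip binary files\n3. Run the linter",
)
print(strategy)
print(patched)
```

Returning the strategy name alongside the result mirrors the source's approach of reporting how a match was found, which helps when debugging a patch that landed in the wrong place.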
4. Skill Lifecycle Management: The Curator
The Curator is Hermes' "skill steward," responsible for automated skill lifecycle management:
# agent/curator.py (core logic)
class SkillCurator:
    def run(self):
        """Curator run loop"""
        # 1. Load usage statistics
        usage = self._load_usage_stats()  # ~/.hermes/skills/.usage.json
        for skill in self._get_agent_skills():
            state = usage.get(skill.name, {})
            # 2. Determine state transition based on usage frequency
            if self._is_stale(skill, state):
                # Not used for over 30 days → archive
                self._archive_skill(skill)
            elif self._is_frequently_used(skill, state):
                # High-frequency usage → pin
                self._pin_skill(skill)
            # 3. LLM quality review for active skills
            if self._needs_review(skill, state):
                review = self._llm_review_skill(skill)
                if review.suggested_improvements:
                    self._apply_review(skill, review)

    def _archive_skill(self, skill):
        """Move skill to .archive/ directory, never delete"""
        archive_dir = SKILLS_DIR / ".archive"
        # Make sure the archive directory exists before moving into it
        archive_dir.mkdir(exist_ok=True)
        shutil.move(str(skill.dir), str(archive_dir / skill.name))
Curator Invariants:
- Only operates on skills with created_by: "agent"; built-in and Hub-installed skills are unaffected
- Never deletes, at most archives to .archive/
- Pinned skills are exempt from all automatic operations
- skill_manage(action="delete") refuses to delete pinned skills
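The invariants above amount to a small decision function, sketched here as a pure function so the rules are easy to test. The 30-day staleness window comes from the article; `PIN_AT_USES`, the `SkillRecord` fields, and the function name `curator_action` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

STALE_AFTER = timedelta(days=30)  # from the article: unused for over 30 days
PIN_AT_USES = 20                  # hypothetical threshold, not stated in the article


@dataclass
class SkillRecord:
    name: str
    created_by: str
    last_used: datetime
    use_count: int = 0
    pinned: bool = False


def curator_action(skill: SkillRecord, now: datetime) -> str:
    """Decide what the curator may do with one skill."""
    if skill.created_by != "agent":
        return "skip"     # built-in and Hub-installed skills are never touched
    if skill.pinned:
        return "skip"     # pinned skills are exempt from automatic operations
    if now - skill.last_used > STALE_AFTER:
        return "archive"  # moved aside, never deleted
    if skill.use_count >= PIN_AT_USES:
        return "pin"
    return "keep"


now = datetime(2026, 5, 7)
skills = [
    SkillRecord("code-review", "agent", now - timedelta(days=2), use_count=25),
    SkillRecord("old-deploy", "agent", now - timedelta(days=45), use_count=3),
    SkillRecord("github-issues", "hub", now - timedelta(days=90)),
    SkillRecord("release-notes", "agent", now - timedelta(days=60), pinned=True),
]
actions = {s.name: curator_action(s, now) for s in skills}
print(actions)
```

Note that "delete" is not a possible return value: the worst outcome for any skill is a move into the archive, which keeps every automatic decision reversible.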
5. Memory System: Cross-Session Knowledge Persistence
Hermes has three layers of memory, each with different storage and retrieval strategies:
5.1 Prompt Memory: MEMORY.md + USER.md
# agent/memory_manager.py
class MemoryManager:
    def flush(self, conversation_history):
        """Persist critical information from the current session to memory files"""
        memory_path = HERMES_HOME / "MEMORY.md"
        user_path = HERMES_HOME / "USER.md"
        # Extract key facts, user preferences, working conventions
        insights = self._extract_insights(conversation_history)
        # MEMORY.md: work-related memories (API endpoints, project structure, passwords, etc.)
        with open(memory_path, "a") as f:
            for insight in insights.work_memories:
                f.write(f"- {insight}\n")
        # USER.md: user profile (preferences, style, constraints)
        with open(user_path, "a") as f:
            for insight in insights.user_insights:
                f.write(f"- {insight}\n")
5.2 Episodic Memory: SQLite FTS5 Full-Text Search
Session trajectories are indexed into a SQLite database, supporting cross-session context retrieval:
-- Session history index table (simplified)
CREATE VIRTUAL TABLE session_history USING fts5(
    content,            -- Session content
    metadata,           -- JSON metadata (time, platform, model, etc.)
    tokenize='porter'   -- Stemming analyzer
);
-- Query: retrieves across 10,000+ documents in under 10ms
-- snippet() takes a zero-based column index; 0 selects the content column
SELECT snippet(session_history, 0, '<b>', '</b>', '...', 32)
FROM session_history
WHERE session_history MATCH ?
ORDER BY rank
LIMIT 5;
5.3 Honcho User Modeling
Honcho is Hermes' dialectical user modeling engine. Unlike static user profiles, it maintains an evolving user model through ongoing conversation:
# Honcho user modeling (concept simplified)
class HonchoModel:
    def evolve(self, interaction):
        """Update user model based on new interaction"""
        # 1. Update user profile
        self.profile.update({
            "preferences": self._extract_preferences(interaction),
            "style": self._detect_communication_style(interaction),
            "constraints": self._extract_constraints(interaction),
        })
        # 2. Pattern detection and insight synthesis
        if self._detect_pattern(self.history[-3:]):
            # Recurring pattern detected → create pattern insight
            self.insights.append(
                self._synthesize_pattern(self.history[-3:])
            )
        # 3. Record the interaction for future pattern detection
        self.history.append(interaction)
5.4 Memory Nudge: Proactive Memory
Every 10 interactions or at session end, Hermes asks itself: "What information from this conversation is worth remembering?"
| Nudge Type | Trigger Condition | Purpose |
|---|---|---|
| Session end | Conversation closes | Summarize key takeaways |
| Pattern detection | 3+ similar requests | Persist preferences |
| User declaration | User says "remember this" | Immediate storage |
| Periodic check | Every 10 interactions | Check for valuable information |
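The four triggers in the table can be expressed as one small predicate. This is a sketch under stated assumptions, not Hermes code: the function name `nudge_reasons`, its parameters, and the literal phrase check for "remember this" are all illustrative. The thresholds (3 similar requests, every 10 interactions) are the ones from the table.

```python
def nudge_reasons(interaction_count: int, similar_requests: int,
                  user_message: str, session_ending: bool) -> list[str]:
    """Return every nudge type from the table that applies right now."""
    reasons = []
    if session_ending:
        reasons.append("session_end")        # summarize key takeaways
    if similar_requests >= 3:
        reasons.append("pattern_detection")  # persist a recurring preference
    if "remember this" in user_message.lower():
        reasons.append("user_declaration")   # store immediately
    if interaction_count > 0 and interaction_count % 10 == 0:
        reasons.append("periodic_check")     # look for anything worth keeping
    return reasons


periodic = nudge_reasons(10, 0, "Looks good", session_ending=False)
combined = nudge_reasons(4, 3, "Please remember this: use tabs", session_ending=False)
closing = nudge_reasons(7, 1, "Thanks, bye", session_ending=True)
print(periodic, combined, closing)
```

In the real system the decision about what to store is left to the LLM; a predicate like this would only decide when to ask the question.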
6. The Complete Skill Self-Improvement Cycle
Now let's trace a complete example — see how Hermes learns and grows from a single code review:
First Request: "Review this PR"
1. Agent receives the request
2. Scans skills_list (reads skill index from system prompt)
3. No matching code-review skill → reasons from scratch
4. Executes 7 tool calls:
   - gh pr view #42
   - gh pr diff
   - Run lint on each file
   - Summarize changes
   - Post comments
5. Completed successfully ✓
6. ★ The task used more than 5 tool calls (7 > 5), so the skill guidance for complex tasks applies
7. Agent determines this is a reusable workflow
8. Calls skill_manage(action='create', name='code-review', content=...)
9. File written to ~/.hermes/skills/code-review/SKILL.md
Second Request: "Review PR #58 again"
1. Agent receives the request
2. Scans skills_list → finds matching code-review skill
3. ★ Trigger condition matched! Calls skill_view("code-review")
4. Loads full SKILL.md → executes according to steps
5. Encounters a new issue: PR contains binary files
6. Agent discovers the skill has no steps for handling binary files
7. After completing the review:
   → skill_manage(action='patch', name='code-review',
       old_string='## Steps\n1. Read the PR list of changed files',
       new_string='## Steps\n1. Read the PR list of changed files\n'
                  '2. Filter binary files (.png, .ico, etc.), only review text files')
8. The skill has grown from 5 steps to 6, adding edge case handling
Nth Time: Skill Maturation
After 20-30 uses, the skill document has evolved from a simple instruction set into a battle-tested operations manual:
- Initial: 5 lines of steps
- One month later: 30+ lines, including known pitfalls, verification steps, edge cases
- Previously unhandled edge cases progressively patched
- Obsolete steps removed via edit
- User preferences and organizational norms solidified into the skill
Performance Comparison:
| Metric | Week 1 | Week 6 |
|---|---|---|
| Tool calls per review | 25 | 8-10 |
| Error rate | High (frequently misses steps) | Low (edge cases covered) |
| Human intervention needed | Frequent | Almost never |
7. Reinforcement Learning (RL) Pipeline: Atropos Integration
Beyond skill-level procedural learning, Hermes integrates the Atropos RL pipeline for deeper behavioral optimization:
# rl_cli.py - Atropos RL training entry point
class AtroposRLPipeline:
    def train_from_trajectories(self, trajectories_dir):
        """Reinforcement learning training from interaction trajectories"""
        # 1. Batch trajectory generation
        trajectories = self._load_trajectories(trajectories_dir)
        # 2. Trajectory compression (reduce token overhead)
        compressed = trajectory_compressor.compress(trajectories)
        # 3. Supports RLHF / DPO training
        #    User ratings, correction flags, automated evaluation
        for trajectory in compressed:
            reward = self._compute_reward(trajectory)
            # RLHF: user feedback as reward signal
            # DPO: preference contrast training
            self._training_step(trajectory, reward)
        # 4. Export to ShareGPT format
        self._export_for_finetuning(compressed)
Note that the RL pipeline is optional and runs offline. The daily learning loop (skill creation → patching → memory) requires no weight updates and happens in real time as the user works.
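Step 4 of the pipeline above exports trajectories to ShareGPT format, a JSON layout in which each record holds a `conversations` list of `{"from": ..., "value": ...}` turns, with `human` and `gpt` as the speaker labels. The sketch below converts a chat-style message list into that shape. It is an illustration only: the function name `to_sharegpt` is hypothetical, and the label used for tool results varies between toolkits, so the `"tool"` mapping here is an assumption.

```python
import json

# Speaker labels: "human" and "gpt" are the conventional ShareGPT names.
# The "tool" label is an assumption; toolkits differ on how they tag tool output.
ROLE_TO_SHAREGPT = {
    "system": "system",
    "user": "human",
    "assistant": "gpt",
    "tool": "tool",
}


def to_sharegpt(messages: list[dict]) -> dict:
    """Convert chat-style messages into one ShareGPT conversation record."""
    return {
        "conversations": [
            {"from": ROLE_TO_SHAREGPT[m["role"]], "value": m["content"]}
            for m in messages
        ]
    }


trajectory = [
    {"role": "user", "content": "Review this PR"},
    {"role": "assistant", "content": "Fetching the diff."},
    {"role": "tool", "content": "3 files changed"},
    {"role": "assistant", "content": "Review posted."},
]
record = to_sharegpt(trajectory)
print(json.dumps(record, indent=2))
```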
8. What "Self-Improvement" Really Means
To understand Hermes' learning mechanism, a key distinction must be made:
| Dimension | Hermes Learning | Traditional ML Training |
|---|---|---|
| Target | Prompts, skill documents, memory files | Model weights |
| Scope | Specific user workflow | Global capability |
| Frequency | Real-time (after every task) | Periodic (training phases) |
| Storage | Filesystem (plaintext Markdown) | Model parameters (binary) |
| Explainability | Fully transparent (readable and editable) | Black box |
| Rollback | Simple (delete file or git revert) | Requires retraining |
Conclusion: Hermes' "self-improvement" is not the model getting smarter — it is the auxiliary layer around the model — procedural memory (Skills) and declarative memory (MEMORY.md/USER.md) — continuously accumulating experience. But this is precisely the most important improvement from a practical standpoint: an Agent that understands your workflow is more valuable than a general-purpose model with more parameters.
9. Learning from Code: A Complete Minimal Learning Loop Implementation
To aid understanding, here is a minimal learning loop implementation (not Hermes source code, but a principle demonstration):
"""Minimal learning loop demo"""
# The helper hooks (_build_prompt, _call_llm, _execute_tool, _extract_steps,
# _classify_task, _find_skill, _find_new_steps) are left for the reader to implement.
from pathlib import Path

SKILLS_DIR = Path.home() / ".demo-skills"
TOOL_CALL_THRESHOLD = 3  # Trigger learning after 3+ tool calls


class LearningAgent:
    def __init__(self):
        self.tool_call_count = 0
        self.conversation_history = []
        SKILLS_DIR.mkdir(exist_ok=True)

    def run(self, user_input):
        self.conversation_history.append({"role": "user", "content": user_input})
        # Loop instead of recursing so the user message is recorded only once
        while True:
            # 1. Build prompt (including skill index)
            prompt = self._build_prompt()
            # 2. Call LLM
            response = self._call_llm(prompt)
            # 3. A plain text response ends the task
            if not response.get("tool_calls"):
                return response["content"]
            # 4. Execute tool calls
            for tc in response["tool_calls"]:
                result = self._execute_tool(tc)
                self.conversation_history.append({
                    "role": "tool",
                    "content": result
                })
                self.tool_call_count += 1
            # ★ Self-evaluation checkpoint
            if self.tool_call_count >= TOOL_CALL_THRESHOLD:
                self._learning_checkpoint()
                # Reset so the checkpoint does not fire on every later iteration
                self.tool_call_count = 0

    def _learning_checkpoint(self):
        """Self-evaluation: determine if a skill needs to be created/updated"""
        # Extract steps from this task
        steps = self._extract_steps()
        if not steps:
            return
        # Detect task type
        task_type = self._classify_task(steps)
        # _find_skill is expected to return the skill name, or None if absent
        existing_skill = self._find_skill(task_type)
        if existing_skill:
            # Compare differences between existing skill and actual execution steps
            new_steps = self._find_new_steps(existing_skill, steps)
            if new_steps:
                # Patch skill: add newly discovered steps
                self._patch_skill(existing_skill, new_steps)
                print(f" → Skill '{task_type}' updated (+{len(new_steps)} steps)")
        else:
            # Create new skill
            self._create_skill(task_type, steps)
            print(f" → New skill '{task_type}' created ({len(steps)} steps)")

    def _create_skill(self, name, steps):
        """Create SKILL.md file"""
        content = f"""---
name: {name}
description: Auto-created skill
created_by: agent
---
# {name}
## Steps
"""
        for i, step in enumerate(steps, 1):
            content += f"{i}. {step}\n"
        skill_dir = SKILLS_DIR / name
        skill_dir.mkdir(exist_ok=True)
        (skill_dir / "SKILL.md").write_text(content)

    def _patch_skill(self, name, new_steps):
        """Patch skill: append new steps to the end of the numbered list"""
        skill_file = SKILLS_DIR / name / "SKILL.md"
        content = skill_file.read_text()
        # Continue numbering after the existing steps. This assumes "## Steps"
        # is the last section, which holds for files written by _create_skill.
        existing = sum(1 for line in content.splitlines() if line[:1].isdigit())
        for i, step in enumerate(new_steps, existing + 1):
            content += f"{i}. {step}\n"
        skill_file.write_text(content)
Running the demo:
# Illustrative trace; it assumes the helper hooks of LearningAgent are implemented
agent = LearningAgent()
agent.run("Set up CI/CD pipeline for me")
# Executed 5 tool calls
# → Self-evaluation triggered
# → Created skill 'ci-cd-pipeline' (8 steps)
agent.run("Set up CI/CD for a frontend project")
# Skill matched! Loaded ci-cd-pipeline skill
# Discovered missing npm install step during execution
# → Auto-patched skill
# → Skill updated to 9 steps
10. Summary
| Mechanism | Technical Implementation | Learning Effect |
|---|---|---|
| Skill creation | Abstract execution traces of complex tasks into SKILL.md | From "don't know how" to "have a standard method" |
| Skill patching | Patch skill files using fuzzy matching engine | From "have a standard method" to "method keeps getting better" |
| Progressive skill loading | L0 index + L1 content + L2 supporting files | Hundreds of skills at the cost of only a compact index in the prompt |
| Memory persistence | MEMORY.md / USER.md + SQLite FTS5 | Knowledge persists and stays searchable across sessions |
| Curator lifecycle | Auto-archival + LLM review | Skill library stays healthy, no stale skills |
| Honcho user modeling | Dialectical evolving user profiling | Agent knows you better over time |
| Atropos RL | Trajectory compression + DPO/RLHF training | Optional deep optimization of model behavior |
Hermes' learning loop fundamentally combines the LLM's reasoning capability with the filesystem's persistence: the LLM judges "what's worth learning" and "how to improve," while the filesystem handles "remembering" and "retrieval." This architecture allows the Agent to continuously accumulate domain knowledge through use, evolving from a general-purpose assistant on Day 1 into a dedicated work partner by Day 30.
As Nous Research puts it: "This is not a smarter model, this is a smarter wrapper." (The LLM is a replaceable component; the real engineering work happens in the layers around it.)
This article is based on the source code analysis of Hermes Agent v2026.5.7 (v0.13.0), GitHub: https://github.com/NousResearch/hermes-agent