Hermes Agent Automated Learning & Growth — Deep Dive

Nous Research's Hermes Agent is the fastest-growing open-source AI Agent of 2026 (148k GitHub Stars). Its core differentiator is a built-in Learning Loop: it automatically creates skills from experience, improves them during use, and proactively persists knowledge, so capabilities accumulate across sessions. This article dissects the mechanism at the source-code level.

1. Architecture Overview: The Four Stages of the Learning Loop

Hermes Agent's learning loop follows Observe → Distill → Reuse → Refine, running atop the main agent loop:

code
User Message
    │
    ▼
┌─────────────────────────────────┐
│   Agent Loop                    │
│   (run_conversation)            │
│                                 │
│   while budget_remaining:       │
│     response = LLM.call(...)    │
│     if tool_calls:              │
│       execute tools             │
│       append results            │
│     else:                       │
│       return response           │
│                                 │
│   ┌─────────────────────────┐   │
│   │ Self-Evaluation         │   │
│   │ Checkpoint              │   │
│   │ (every 15 tool calls)   │   │
│   └─────────┬───────────────┘   │
│             ▼                   │
│   ┌─────────────────────────┐   │
│   │ Skill Creation / Update │   │
│   │ Memory Nudge            │   │
│   └─────────────────────────┘   │
└─────────────────────────────────┘

Core insight: This is not model weight-level training, but a structured experience recording and retrieval system. LLM weights are never changed; what changes is the "knowledge layer" around the LLM — prompts, skill documents, and memory files.

2. Main Agent Loop: `run_conversation()`

Hermes' core driver is the AIAgent class in run_agent.py (~13,700 lines). Its main loop logic is as follows:

python
# run_agent.py - AIAgent.run_conversation() core loop (simplified)
def run_conversation(self):
    api_call_count = 0  # LLM calls made so far in this conversation
    while (api_call_count < self.max_iterations
           and self.iteration_budget.remaining > 0) \
           or self._budget_grace_call:

        if self._interrupt_requested:
            break

        # 1. Build system prompt (including skills index, memory, context files)
        system_prompt = self._build_system_prompt()

        # 2. Call LLM (supports multiple providers and API modes)
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            tools=tool_schemas
        )

        # 3. Parse response
        if response.tool_calls:
            for tool_call in response.tool_calls:
                # Execute tool call (supports ThreadPoolExecutor concurrency)
                result = handle_function_call(
                    tool_call.name,
                    tool_call.args,
                    task_id
                )
                messages.append(tool_result_message(result))

                # ★ Self-evaluation checkpoint: triggers every 15 tool calls
                if self._should_evaluate():
                    self._learning_checkpoint(messages)

            api_call_count += 1
        else:
            # Plain text response → persist memory and return
            self._flush_memory()
            return response.content

Key design points:

self.iteration_budget: Tracks budget consumption of parent and child agents to prevent infinite loops
_should_evaluate(): Determines whether to trigger a learning checkpoint based on the tool call count
_flush_memory(): Writes critical information to persistent files before context is lost
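
The two counters described above can be sketched in isolation. This is a minimal illustration, not Hermes source: `IterationBudget` and `CheckpointCounter` are hypothetical names, and the interval of 15 is taken from the diagram in section 1.

```python
from dataclasses import dataclass


@dataclass
class IterationBudget:
    """Shared call budget; hypothetical stand-in for self.iteration_budget."""
    remaining: int

    def consume(self, n: int = 1) -> bool:
        """Spend n units if available; return False once the budget is exhausted."""
        if self.remaining < n:
            return False
        self.remaining -= n
        return True


@dataclass
class CheckpointCounter:
    """Signals a learning checkpoint once every `interval` tool calls."""
    interval: int = 15
    calls: int = 0

    def record_tool_call(self) -> bool:
        """Count one tool call; return True when a checkpoint is due."""
        self.calls += 1
        return self.calls % self.interval == 0


budget = IterationBudget(remaining=40)
counter = CheckpointCounter(interval=15)
fired_at = []
while budget.consume():          # the loop stops when the budget runs out
    if counter.record_tool_call():
        fired_at.append(counter.calls)

print(fired_at)  # [15, 30]
```

Keeping the budget and the checkpoint counter separate means the loop is always bounded, even if the checkpoint logic itself misbehaves.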

3. Skill System: The Core Vehicle of the Learning Loop

3.1 SKILL.md Format (agentskills.io Standard)

Skills are stored as Markdown files under ~/.hermes/skills/, with YAML frontmatter declaring metadata:

markdown
---
name: code-review
description: Execute standard code review workflow, check CI status, generate diff summary, flag style violations
version: 1.2.0
platforms: [macos, linux]
metadata:
  hermes:
    tags: [code-review, github, ci]
    related_skills: [github-pr-workflow, github-issues]
    config:
      - key: review.strictness
        description: Review strictness level (low/medium/high)
        default: "medium"
        prompt: Code review strictness level
---

# Code Review Skill

## Trigger Conditions
Automatically load when the user says "review this PR" or "review PR #xxx".

## Steps
1. Read the PR's list of changed files
2. Run linter checks on each file
3. Generate a change summary report
4. Post review comments to the PR

## Known Pitfalls
- For PRs with more than 500 changed lines, ask the author to split first
- Only review changed lines, not the entire file
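
To make the file layout concrete, here is a minimal sketch of splitting such a file into frontmatter and body using only the standard library. The function name `split_frontmatter` is hypothetical, and the parser deliberately reads only unindented `key: value` scalars; a real implementation would presumably use a YAML parser to handle the nested `metadata` block.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md string into (top-level scalar fields, Markdown body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = lines[1:].index("---") + 1  # index of the closing delimiter
    except ValueError:
        return {}, text  # no closing delimiter: treat everything as body
    fields = {}
    for line in lines[1:end]:
        # Only unindented "key: value" pairs; nested YAML is skipped.
        if line and not line[0].isspace() and ":" in line:
            key, _, value = line.partition(":")
            if value.strip():
                fields[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return fields, body


sample = (
    "---\n"
    "name: code-review\n"
    "version: 1.2.0\n"
    "metadata:\n"
    "  hermes:\n"
    "    tags: [code-review, github, ci]\n"
    "---\n"
    "\n"
    "# Code Review Skill\n"
)
fields, body = split_frontmatter(sample)
print(fields)  # {'name': 'code-review', 'version': '1.2.0'}
print(body)    # # Code Review Skill
```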

3.2 Progressive Disclosure

prompt_builder.py builds the skills index behind a three-tier cache and injects only that compact index into the system prompt, so context overhead stays manageable even with hundreds of skills:

python
# agent/prompt_builder.py
def build_skills_system_prompt(
    available_tools=None,
    available_toolsets=None,
) -> str:
    """Build a compact skills index to inject into the system prompt.

    Three-tier cache:
      1. In-process LRU dict (keyed by skills_dir + tools + toolsets)
      2. Disk snapshot .skills_prompt_snapshot.json (cross-process restart)
      3. Full filesystem scan (fallback when cache misses)
    """

    # Index lines: only skill name + short description
    # Each line ~60-80 chars, 100 skills ≈ only 3K tokens
    index_lines = []
    for category in sorted(skills_by_category):
        for name, desc in sorted(...):
            if desc:
                index_lines.append(f"    - {name}: {desc}")
            else:
                index_lines.append(f"    - {name}")

    result = (
        "## Skills (mandatory)\n"
        "Before replying, scan the skills below. "
        "If a skill matches your task, you MUST load it with skill_view(name) "
        "and follow its instructions.\n"
        "<available_skills>\n"
        + "\n".join(index_lines) + "\n"
        "</available_skills>\n"
    )
    return result

Three-tier loading model:

Tier	Content	Token Cost	Trigger
L0	Skill name + description index	~3K (total)	Session start (auto-injected into system prompt)
L1	Full SKILL.md content	On-demand	Agent determines task relevance, calls `skill_view(name)`
L2	Supporting files (references/templates/scripts)	On-demand	Agent calls `skill_view(name, file_path)`
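
The three tiers in the table can be sketched with two small functions. This is an illustration under assumptions, not the Hermes implementation: `build_index` and `view_skill` are hypothetical names, only uncategorized skills (`<skills_dir>/<name>/SKILL.md`) are scanned, and the description is read with a naive line match rather than a YAML parser.

```python
import tempfile
from pathlib import Path
from typing import Optional


def build_index(skills_dir: Path) -> str:
    """L0: one compact line per skill, read from each SKILL.md frontmatter."""
    lines = []
    for skill_md in sorted(skills_dir.glob("*/SKILL.md")):
        desc = ""
        for line in skill_md.read_text(encoding="utf-8").splitlines():
            if line.startswith("description:"):
                desc = line.partition(":")[2].strip()
                break
        name = skill_md.parent.name
        lines.append(f"- {name}: {desc}" if desc else f"- {name}")
    return "\n".join(lines)


def view_skill(skills_dir: Path, name: str, file_path: Optional[str] = None) -> str:
    """L1 (full SKILL.md) when file_path is None, L2 (supporting file) otherwise."""
    root = (skills_dir / name).resolve()
    target = (root / (file_path or "SKILL.md")).resolve()
    if root not in target.parents:
        raise ValueError("path escapes the skill directory")
    return target.read_text(encoding="utf-8")


# Demo on a throwaway directory with one skill and one supporting file.
with tempfile.TemporaryDirectory() as tmp:
    skills = Path(tmp)
    (skills / "code-review" / "references").mkdir(parents=True)
    (skills / "code-review" / "SKILL.md").write_text(
        "---\nname: code-review\ndescription: Review a PR\n---\n\n# Code Review\n",
        encoding="utf-8",
    )
    (skills / "code-review" / "references" / "style.md").write_text(
        "Use 4 spaces.\n", encoding="utf-8"
    )
    index = build_index(skills)                                       # L0
    full = view_skill(skills, "code-review")                          # L1
    extra = view_skill(skills, "code-review", "references/style.md")  # L2

print(index)  # - code-review: Review a PR
```

Only `index` would be paid for on every turn; `full` and `extra` cost tokens only when the agent asks for them.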

3.3 Agent Self-Managed Skills: The `skill_manage` Tool

This is the key tool of the learning loop — the Agent can create, update, and delete its own skill files at runtime via skill_manage.

python
# tools/skill_manager_tool.py
def skill_manage(action, name, content=None, category=None,
                 old_string=None, new_string=None, replace_all=False,
                 file_path=None, file_content=None):
    """
    Agent manages its own skills (procedural memory).

    Actions:
      create     - Create new skill (full SKILL.md)
      patch      - Targeted patch (old_string → new_string, preferred method)
      edit       - Full rewrite (replace entire SKILL.md)
      delete     - Delete a skill
      write_file - Add/update a supporting file
      remove_file - Delete a supporting file
    """
    if action == "create":
        return _create_skill(name, content, category)
    elif action == "patch":
        return _patch_skill(name, old_string, new_string,
                            replace_all, file_path)
    elif action == "edit":
        return _edit_skill(name, content)
    elif action == "delete":
        return _delete_skill(name)
    elif action == "write_file":
        return _write_file(name, file_path, file_content)
    elif action == "remove_file":
        return _remove_file(name, file_path)

`_patch_skill` Fuzzy Matching Engine

The patch operation uses the same 8-strategy fuzzy matching engine as the file editing tool:

python
# tools/skill_manager_tool.py
def _patch_skill(name, old_string, new_string,
                 replace_all=False, file_path=None):
    # ... locate skill file ...

    content = target.read_text(encoding="utf-8")

    # Use 8-strategy fuzzy matching engine
    # Handles: whitespace normalization, indentation differences,
    #          escape sequences, block anchor matching
    from tools.fuzzy_match import fuzzy_find_and_replace

    new_content, match_count, _strategy, match_error = \
        fuzzy_find_and_replace(content, old_string, new_string, replace_all)

    if match_error:
        # Return file preview for model self-correction
        preview = content[:500] + ("..." if len(content) > 500 else "")
        return {
            "success": False,
            "error": match_error,
            "file_preview": preview,
        }

    # Safety check: content size limit
    err = _validate_content_size(new_content)
    if err:
        return {"success": False, "error": err}

    # Atomic write (rollback on failure)
    _atomic_write_text(target, new_content)

    return {
        "success": True,
        "message": f"Patched skill '{name}' ({match_count} replacements).",
    }
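
The idea of falling back through progressively looser matching strategies can be shown with just two of them. This is a simplified sketch, not `tools.fuzzy_match`: the function name, return shape, and the choice of strategies (exact, then whitespace-insensitive) are assumptions for illustration.

```python
import re


def find_and_replace(content: str, old: str, new: str):
    """Return (new_content, strategy), or (None, None) when nothing matches."""
    # Strategy 1: exact substring match.
    if old in content:
        return content.replace(old, new, 1), "exact"

    # Strategy 2: whitespace-insensitive match. Every run of whitespace in
    # `old` may match any run of whitespace in `content`.
    tokens = old.split()
    if tokens:
        pattern = r"\s+".join(re.escape(token) for token in tokens)
        match = re.search(pattern, content)
        if match:
            # Slice instead of re.sub so backslashes in `new` stay literal.
            patched = content[:match.start()] + new + content[match.end():]
            return patched, "whitespace"

    return None, None


skill = "## Steps\n1.  Read the   PR diff\n2. Run the linter\n"
patched, strategy = find_and_replace(
    skill, "1. Read the PR diff", "1. Read the PR diff (skip binary files)"
)
print(strategy)  # whitespace
print(patched)
```

Reporting which strategy matched, as the `_strategy` return value above suggests, makes it possible to log or reject matches that were too loose.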

Full _create_skill Flow:

python
def _create_skill(name, content, category=None):
    # 1. Parse YAML frontmatter from content
    frontmatter, body = _parse_frontmatter(content)
    skill_name = frontmatter.get("name", name)

    # 2. Safety checks
    #    - Injection pattern detection
    #    - Content size limit (default 100K chars / 1 MiB)
    #    - Name conflict detection (elided in this excerpt)
    if _detect_injection(content):
        return {"success": False, "error": "Potential prompt injection detected"}

    err = _validate_content_size(content)
    if err:
        return {"success": False, "error": err}

    # 3. Determine target path
    if category:
        skill_dir = SKILLS_DIR / category / skill_name
    else:
        skill_dir = SKILLS_DIR / skill_name
    skill_dir.mkdir(parents=True, exist_ok=True)

    # 4. Write SKILL.md
    target = skill_dir / "SKILL.md"
    _atomic_write_text(target, content)

    return {
        "success": True,
        "message": f"Skill '{skill_name}' created.",
        "path": str(target),
    }
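
Both the create and patch paths end in `_atomic_write_text`. Its body is not shown in this article, so the following is a sketch of the common write-to-temp-then-rename technique that such a helper typically uses; the name `atomic_write_text` and every detail below are assumptions rather than the Hermes implementation.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(target: Path, text: str) -> None:
    """Write text so readers see either the old file or the new one, never a mix."""
    # The temp file lives in the same directory so the final rename stays
    # on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=str(target.parent), prefix=target.name, suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, target)  # atomic rename over the old file
    except BaseException:
        # On any failure the original file is untouched; drop the temp file.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


with tempfile.TemporaryDirectory() as tmp:
    skill_file = Path(tmp) / "SKILL.md"
    atomic_write_text(skill_file, "v1\n")
    atomic_write_text(skill_file, "v2\n")
    final_text = skill_file.read_text(encoding="utf-8")
    leftovers = [p.name for p in Path(tmp).iterdir() if p.name != "SKILL.md"]

print(repr(final_text), leftovers)  # 'v2\n' []
```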

Content Size Limit Logic (from PR #4414):

python
_CONTENT_SIZE_LIMIT = 100_000       # 100K chars
_CONTENT_BYTE_LIMIT = 1_048_576     # 1 MiB

def _validate_content_size(content: str, label="SKILL.md"):
    if len(content) > _CONTENT_SIZE_LIMIT:
        return (
            f"{label} content is {len(content):,} characters "
            f"(limit: {_CONTENT_SIZE_LIMIT:,}). Consider splitting "
            f"into a smaller SKILL.md with supporting files "
            f"in references/ or templates/."
        )
    if len(content.encode("utf-8")) > _CONTENT_BYTE_LIMIT:
        return f"{label} exceeds 1 MiB."
    return None  # OK
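
A quick arithmetic check shows how the two limits interact when both are applied to the same string, as in the excerpt above. UTF-8 uses at most 4 bytes per character, so text that passes the character limit is at most 400,000 bytes. The constants below mirror the excerpt; the conclusion only holds under the assumption that no other caller passes different limits.

```python
CHAR_LIMIT = 100_000      # mirrors _CONTENT_SIZE_LIMIT
BYTE_LIMIT = 1_048_576    # mirrors _CONTENT_BYTE_LIMIT

# Worst case: every character needs 4 bytes in UTF-8 (for example, an emoji).
worst_case = "\U0001F600" * CHAR_LIMIT
worst_bytes = len(worst_case.encode("utf-8"))

print(worst_bytes)               # 400000
print(worst_bytes < BYTE_LIMIT)  # True
```

In other words, for content that already satisfies the character limit, the byte check acts as a second line of defense rather than an independent constraint.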

3.4 Skill Guidance in the System Prompt

The SKILLS_GUIDANCE constant in prompt_builder.py tells the Agent when and how to create skills:

python
# agent/prompt_builder.py
SKILLS_GUIDANCE = (
    "After completing a complex task (5+ tool calls), fixing a tricky error, "
    "or discovering a non-trivial workflow, save the approach as a "
    "skill with skill_manage so you can reuse it next time.\n"
    "When using a skill and finding it outdated, incomplete, or wrong, "
    "patch it immediately with skill_manage(action='patch') — "
    "don't wait to be asked. "
    "Skills that aren't maintained become liabilities."
)

This is not hardcoded if-else logic. The system prompt guides the LLM to make these decisions autonomously:

5+ tool calls → deemed a "complex task"
Successfully completed → create a skill
Better approach discovered → patch the skill
User correction → update the skill

This means learning ability comes from the LLM's own reasoning capability, not from preset rules.

4. Skill Lifecycle Management: The Curator

The Curator is Hermes' "skill steward," responsible for automated skill lifecycle management:

python
# agent/curator.py (core logic)
import shutil

class SkillCurator:
    def run(self):
        """Curator run loop"""
        # 1. Load usage statistics
        usage = self._load_usage_stats()  # ~/.hermes/skills/.usage.json

        for skill in self._get_agent_skills():
            state = usage.get(skill.name, {})

            # 2. Determine state transition based on usage frequency
            if self._is_stale(skill, state):
                # Not used for over 30 days → archive
                self._archive_skill(skill)
                continue  # archived skills are no longer active, so skip the review
            elif self._is_frequently_used(skill, state):
                # High-frequency usage → pin
                self._pin_skill(skill)

            # 3. LLM quality review for active skills
            if self._needs_review(skill, state):
                review = self._llm_review_skill(skill)
                if review.suggested_improvements:
                    self._apply_review(skill, review)

    def _archive_skill(self, skill):
        """Move skill to .archive/ directory, never delete"""
        archive_dir = SKILLS_DIR / ".archive"
        archive_dir.mkdir(parents=True, exist_ok=True)  # ensure the archive exists
        shutil.move(str(skill.dir), str(archive_dir / skill.name))

Curator Invariants:

Only operates on skills with created_by: "agent"; built-in and Hub-installed skills are unaffected
Never deletes, at most archives to .archive/
Pinned skills are exempt from all automatic operations
skill_manage(action="delete") refuses to delete pinned skills
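
The invariants above can be expressed as a single pure decision function, which makes them easy to test. This is a sketch under assumptions: `curator_action`, its parameters, and the pin threshold of 20 uses are hypothetical; only the 30-day staleness window and the exemption rules come from the text.

```python
from datetime import datetime, timedelta
from typing import Optional

STALE_AFTER = timedelta(days=30)  # "not used for over 30 days"
PIN_AFTER_USES = 20               # assumed threshold, not taken from Hermes


def curator_action(created_by: str, pinned: bool, last_used: Optional[datetime],
                   use_count: int, now: datetime) -> str:
    """Return 'skip', 'archive', 'pin', or 'keep' for one skill."""
    if created_by != "agent":
        return "skip"      # built-in and Hub-installed skills are never touched
    if pinned:
        return "skip"      # pinned skills are exempt from automatic operations
    if last_used is None or now - last_used > STALE_AFTER:
        return "archive"   # archived, never deleted
    if use_count >= PIN_AFTER_USES:
        return "pin"
    return "keep"


now = datetime(2026, 5, 7)
print(curator_action("agent", False, now - timedelta(days=45), 3, now))  # archive
print(curator_action("agent", True, now - timedelta(days=45), 3, now))   # skip
print(curator_action("agent", False, now - timedelta(days=2), 25, now))  # pin
```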

5. Memory System: Cross-Session Knowledge Persistence

Hermes has three layers of memory, each with different storage and retrieval strategies:

5.1 Prompt Memory: MEMORY.md + USER.md

python
# agent/memory_manager.py
class MemoryManager:
    def flush(self, conversation_history):
        """Persist critical information from the current session to memory files"""
        memory_path = HERMES_HOME / "MEMORY.md"
        user_path = HERMES_HOME / "USER.md"

        # Extract key facts, user preferences, working conventions
        insights = self._extract_insights(conversation_history)

        # MEMORY.md: work-related memories (API endpoints, project structure, working conventions, etc.)
        with open(memory_path, "a") as f:
            for insight in insights.work_memories:
                f.write(f"- {insight}\n")

        # USER.md: user profile (preferences, style, constraints)
        with open(user_path, "a") as f:
            for insight in insights.user_insights:
                f.write(f"- {insight}\n")

5.2 Episodic Memory: SQLite FTS5 Full-Text Search

Session trajectories are indexed into a SQLite database, supporting cross-session context retrieval:

sql
-- Session history index table (simplified)
CREATE VIRTUAL TABLE session_history USING fts5(
    content,          -- Session content
    metadata,         -- JSON metadata (time, platform, model, etc.)
    tokenize='porter' -- Stemming analyzer
);

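-- Query: top 5 matches by relevance, with a highlighted snippet taken from the
-- content column (column 0); retrieves across 10,000+ documents in under 10ms
SELECT snippet(session_history, 0, '<b>', '</b>', '...', 32)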
FROM session_history
WHERE session_history MATCH ?
ORDER BY rank
LIMIT 5;
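
The same schema and query can be exercised from Python's built-in `sqlite3` module. This sketch assumes the local SQLite build has FTS5 enabled, which is common but not guaranteed; the sample rows and the `search` helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE session_history "
    "USING fts5(content, metadata, tokenize='porter')"
)
conn.executemany(
    "INSERT INTO session_history (content, metadata) VALUES (?, ?)",
    [
        ("Reviewed PR 42 and posted lint comments", '{"platform": "cli"}'),
        ("Set up the CI pipeline for the frontend", '{"platform": "cli"}'),
        ("Reviewing binary files is skipped", '{"platform": "chat"}'),
    ],
)


def search(query: str, limit: int = 5) -> list:
    """Return (snippet, metadata) rows for the best matches, most relevant first."""
    return conn.execute(
        "SELECT snippet(session_history, 0, '[', ']', '...', 8), metadata "
        "FROM session_history WHERE session_history MATCH ? "
        "ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


# Porter stemming lets "review" match both "Reviewed" and "Reviewing".
hits = search("review")
for snippet_text, metadata in hits:
    print(snippet_text, metadata)
```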

5.3 Honcho User Modeling

Honcho is Hermes' dialectical user modeling engine. Unlike static user profiles, it maintains an evolving user model through ongoing conversation:

python
# Honcho user modeling (concept simplified)
class HonchoModel:
    def evolve(self, interaction):
        """Update user model based on new interaction"""

        # 1. Update user profile
        self.profile.update({
            "preferences": self._extract_preferences(interaction),
            "style": self._detect_communication_style(interaction),
            "constraints": self._extract_constraints(interaction),
        })

        # 2. Record the interaction so it is included in pattern detection
        self.history.append(interaction)

        # 3. Pattern detection and insight synthesis over the last 3 interactions
        recent = self.history[-3:]
        if self._detect_pattern(recent):
            # Recurring pattern detected → create pattern insight
            self.insights.append(self._synthesize_pattern(recent))

5.4 Memory Nudge: Proactive Memory

Every 10 interactions or at session end, Hermes asks itself: "What information from this conversation is worth remembering?"

Nudge Type	Trigger Condition	Purpose
Session end	Conversation closes	Summarize key takeaways
Pattern detection	3+ similar requests	Persist preferences
User declaration	User says "remember this"	Immediate storage
Periodic check	Every 10 interactions	Check for valuable information
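
The trigger table above maps naturally onto a small decision function. This is a sketch, not Hermes code: `nudge_reason` and its parameters are hypothetical, and the priority order between triggers (explicit user request first, periodic check last) is an assumption.

```python
from typing import Optional


def nudge_reason(interaction_count: int, session_ended: bool,
                 user_text: str, similar_request_count: int) -> Optional[str]:
    """Return which nudge fires for this turn, or None if none applies."""
    if "remember this" in user_text.lower():
        return "user_declaration"    # immediate storage
    if session_ended:
        return "session_end"         # summarize key takeaways
    if similar_request_count >= 3:
        return "pattern_detection"   # persist the recurring preference
    if interaction_count > 0 and interaction_count % 10 == 0:
        return "periodic_check"      # every 10 interactions
    return None


print(nudge_reason(3, False, "Please remember this: use tabs", 0))  # user_declaration
print(nudge_reason(10, False, "hi", 0))                             # periodic_check
print(nudge_reason(4, False, "hi", 1))                              # None
```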

6. The Complete Skill Self-Improvement Cycle

Now let's trace a complete example — see how Hermes learns and grows from a single code review:

First Request: "Review this PR"

code
1. Agent receives the request
2. Scans skills_list (reads skill index from system prompt)
3. No matching code-review skill → reasons from scratch
4. Executes 7 tool calls:
   - gh pr view #42
   - gh pr diff
   - Run lint on each file
   - Summarize changes
   - Post comments
5. Completed successfully ✓
6. ★ Self-evaluation: the task's 7 tool calls exceed the 5+ threshold in SKILLS_GUIDANCE
7. Agent determines this is a reusable workflow
8. Calls skill_manage(action='create', name='code-review', content=...)
9. File written to ~/.hermes/skills/code-review/SKILL.md

Second Request: "Review PR #58 again"

code
1. Agent receives the request
2. Scans skills_list → finds matching code-review skill
3. ★ Trigger condition matched! Calls skill_view("code-review")
4. Loads full SKILL.md → executes according to steps
5. Encounters a new issue: PR contains binary files
6. Agent discovers the skill has no steps for handling binary files
7. After completing the review:
   → skill_manage(action='patch', name='code-review',
                  old_string="## Steps\n1. Read the PR's list of changed files",
                  new_string="## Steps\n1. Read the PR's list of changed files\n"
                             "2. Filter binary files (.png, .ico, etc.), only review text files")
8. The skill has grown from 5 steps to 6, adding edge case handling

Nth Time: Skill Maturation

After 20-30 uses, the skill document has evolved from a simple instruction set into a battle-tested operations manual:

Initial: 5 lines of steps
One month later: 30+ lines, including known pitfalls, verification steps, edge cases
Previously unhandled edge cases progressively patched
Obsolete steps removed via edit
User preferences and organizational norms solidified into the skill

Performance Comparison:

Metric	Week 1	Week 6
Tool calls per review	25	8-10
Error rate	High (frequently misses steps)	Low (edge cases covered)
Human intervention needed	Frequent	Almost never

7. Reinforcement Learning (RL) Pipeline: Atropos Integration

Beyond skill-level procedural learning, Hermes integrates the Atropos RL pipeline for deeper behavioral optimization:

python
# rl_cli.py - Atropos RL training entry point
class AtroposRLPipeline:
    def train_from_trajectories(self, trajectories_dir):
        """Reinforcement learning training from interaction trajectories"""

        # 1. Load the batch-generated trajectories
        trajectories = self._load_trajectories(trajectories_dir)

        # 2. Trajectory compression (reduce token overhead)
        compressed = trajectory_compressor.compress(trajectories)

        # 3. Training (RLHF or DPO)
        #    Reward signals: user ratings, correction flags, automated evaluation
        for trajectory in compressed:
            reward = self._compute_reward(trajectory)
            # RLHF: user feedback as reward signal
            # DPO: preference contrast training
            self._training_step(trajectory, reward)

        # 4. Export to ShareGPT format
        self._export_for_finetuning(compressed)

However, it must be emphasized: the RL pipeline is optional and offline. The daily learning loop (skill creation → patching → memory) does not require weight updates and happens in real-time as the user works.
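
For the export step, the ShareGPT format stores each dialogue as a `conversations` list of `{"from", "value"}` turns, with `human` and `gpt` as the speaker labels. The converter below is a minimal sketch: `to_sharegpt` is a hypothetical helper, the input message shape is assumed, and labeling tool results as `tool` is an assumption since conventions for tool turns vary between datasets.

```python
import json

ROLE_TO_SHAREGPT = {
    "system": "system",
    "user": "human",
    "assistant": "gpt",
    "tool": "tool",  # assumption: tool results keep their own speaker label
}


def to_sharegpt(messages: list) -> dict:
    """Convert chat-style messages into one ShareGPT conversation record."""
    turns = [
        {"from": ROLE_TO_SHAREGPT[m["role"]], "value": m["content"]}
        for m in messages
        if m.get("content")  # skip empty turns such as bare tool-call requests
    ]
    return {"conversations": turns}


trajectory = [
    {"role": "user", "content": "Review this PR"},
    {"role": "assistant", "content": ""},  # tool-call request with no text
    {"role": "tool", "content": "3 files changed"},
    {"role": "assistant", "content": "Review posted."},
]
record = to_sharegpt(trajectory)
print(json.dumps(record, ensure_ascii=False))
```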

8. What "Self-Improvement" Really Means

To understand Hermes' learning mechanism, a key distinction must be made:

Dimension	Hermes Learning	Traditional ML Training
Target	Prompts, skill documents, memory files	Model weights
Scope	Specific user workflow	Global capability
Frequency	Real-time (after every task)	Periodic (training phases)
Storage	Filesystem (plaintext Markdown)	Model parameters (binary)
Explainability	Fully transparent (readable and editable)	Black box
Rollback	Simple (delete file or git revert)	Requires retraining

Conclusion: Hermes' "self-improvement" does not mean the model itself gets smarter. What improves is the auxiliary layer around the model, namely procedural memory (Skills) and declarative memory (MEMORY.md/USER.md), which continuously accumulates experience. From a practical standpoint this is arguably the improvement that matters most: an Agent that understands your workflow is more valuable than a general-purpose model with more parameters.

9. Learning from Code: A Complete Minimal Learning Loop Implementation

To aid understanding, here is a minimal learning loop implementation (not Hermes source code, but a principle demonstration):

python
"""Minimal learning loop demo"""
from pathlib import Path

SKILLS_DIR = Path.home() / ".demo-skills"
TOOL_CALL_THRESHOLD = 3  # Trigger learning after 3+ tool calls

class LearningAgent:
    def __init__(self):
        self.tool_call_count = 0
        self.conversation_history = []
        SKILLS_DIR.mkdir(exist_ok=True)

    def run(self, user_input):
        self.conversation_history.append({"role": "user", "content": user_input})

        # Loop instead of recursing, so the user message is recorded only once
        while True:
            # 1. Build prompt (including skill index)
            prompt = self._build_prompt()

            # 2. Call LLM
            response = self._call_llm(prompt)

            # 3. No tool calls → final answer
            if not response.get("tool_calls"):
                return response["content"]

            # 4. Execute tool calls and record the results
            for tc in response["tool_calls"]:
                result = self._execute_tool(tc)
                self.conversation_history.append({
                    "role": "tool",
                    "content": result
                })
                self.tool_call_count += 1

            # ★ Self-evaluation checkpoint
            if self.tool_call_count >= TOOL_CALL_THRESHOLD:
                self._learning_checkpoint()
                self.tool_call_count = 0  # start counting toward the next checkpoint

    def _learning_checkpoint(self):
        """Self-evaluation: determine if a skill needs to be created/updated"""
        # Extract steps from this task
        steps = self._extract_steps()

        if not steps:
            return

        # Detect task type
        task_type = self._classify_task(steps)

        existing_skill = self._find_skill(task_type)

        if existing_skill:
            # Compare differences between existing skill and actual execution steps
            new_steps = self._find_new_steps(existing_skill, steps)
            if new_steps:
                # Patch skill: add newly discovered steps
                self._patch_skill(task_type, new_steps)
                print(f"  → Skill '{task_type}' updated (+{len(new_steps)} steps)")
        else:
            # Create new skill
            self._create_skill(task_type, steps)
            print(f"  → New skill '{task_type}' created ({len(steps)} steps)")

    def _create_skill(self, name, steps):
        """Create SKILL.md file"""
        content = f"""---
name: {name}
description: Auto-created skill
created_by: agent
---

# {name}

## Steps
"""
        for i, step in enumerate(steps, 1):
            content += f"{i}. {step}\n"

        skill_dir = SKILLS_DIR / name
        skill_dir.mkdir(exist_ok=True)
        (skill_dir / "SKILL.md").write_text(content)

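    def _patch_skill(self, name, new_steps):
        """Patch skill: append new steps, continuing the existing numbering"""
        skill_file = SKILLS_DIR / name / "SKILL.md"
        content = skill_file.read_text(encoding="utf-8")

        # Count existing numbered steps so the new ones continue the sequence
        # (assumes "## Steps" is the last section, as written by _create_skill)
        existing = sum(
            1 for line in content.splitlines()
            if line.split(". ", 1)[0].isdigit()
        )
        if not content.endswith("\n"):
            content += "\n"
        for i, step in enumerate(new_steps, existing + 1):
            content += f"{i}. {step}\n"

        skill_file.write_text(content, encoding="utf-8")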

Running the demo:

python
agent = LearningAgent()
agent.run("Set up CI/CD pipeline for me")
# Executed 5 tool calls
# → Self-evaluation triggered
# → Created skill 'ci-cd-pipeline' (8 steps)

agent.run("Set up CI/CD for a frontend project")
# Skill matched! Loaded ci-cd-pipeline skill
# Discovered missing npm install step during execution
# → Auto-patched skill
# → Skill updated to 9 steps

10. Summary

Mechanism	Technical Implementation	Learning Effect
Skill creation	Abstract execution traces of complex tasks into SKILL.md	From "don't know how" to "have a standard method"
Skill patching	Patch skill files using fuzzy matching engine	From "have a standard method" to "method keeps getting better"
Progressive skill loading	L0 index + L1 content + L2 supporting files	Hundreds of skills at only a compact index's token cost
Memory persistence	MEMORY.md / USER.md + SQLite FTS5	Knowledge persists across sessions
Curator lifecycle	Auto-archival + LLM review	Skill library stays healthy, no stale skills
Honcho user modeling	Dialectical evolving user profiling	Agent knows you better over time
Atropos RL	Trajectory compression + DPO/RLHF training	Optional deep optimization of model behavior

Hermes' learning loop fundamentally combines the LLM's reasoning capability with the filesystem's persistence: the LLM judges "what's worth learning" and "how to improve," while the filesystem handles "remembering" and "retrieval." This architecture allows the Agent to continuously accumulate domain knowledge through use, evolving from a general-purpose assistant on Day 1 into a dedicated work partner by Day 30.

As Nous Research puts it: "This is not a smarter model, this is a smarter wrapper." (The LLM is a replaceable component; the real engineering work happens in the layers around it.)

This article is based on the source code analysis of Hermes Agent v2026.5.7 (v0.13.0), GitHub: https://github.com/NousResearch/hermes-agent