Hermes Agent Skills Advanced Guide: From SKILL.md to GEPA Self-Evolution
In early 2026, Nous Research's Hermes Agent crossed 160K GitHub stars in two months. Its core idea is "the agent that grows with you"—an agent that gets smarter the more you use it. The foundation is the Skills system: standardized, evolvable, cross-session procedural memory—not a one-off prompt.
For developers already running Hermes, this guide covers the full advanced picture: ① how Skills differ from Memory and Prompt, and how Progressive Disclosure controls token cost; ② SKILL.md format, Skill Bundles, conditional activation, and Tap publishing; ③ GEPA + DSPy five-stage self-evolution and the community ecosystem. After reading, you can write, bundle, publish, and evolve your own skill assets independently.
01 Why Hermes Agent's Skills system deserves dedicated study
Getting-started tutorials answer "how to install." Advanced work answers "how to make the agent stronger over time." Hermes Skills stand out on four axes:
- On-demand loading: zero token cost before activation; Progressive Disclosure keeps spend predictable.
- Open standard: follows agentskills.io—skills reuse across Hermes, Claude Code, and Cursor.
- Composable: Skill Bundles load a full workflow with one slash command.
- Evolvable: GEPA analyzes execution traces and improves SKILL.md text without touching model weights.
Four pain points advanced users hit most often:
- Token bloat: stuffing every SOP into the system prompt burns thousands of tokens every session.
- Wrong skill activation: vague descriptions cause the LLM to load the wrong skill in unrelated contexts.
- Fragmented workflows: PR review, TDD, and deploy each need a separate /skill-name, which is slow and tedious.
- No team sharing: skills live in personal folders; onboarding on a new machine is painful.
02 Skills, Memory, and Prompt: what is the difference?
| Dimension | Plain Prompt | Memory | Skills |
|---|---|---|---|
| Persistence | Current conversation | Cross-session, permanent | Cross-session, permanent |
| Load timing | Always in context | Auto-injected each session | On demand |
| Token cost | Every turn | Small and stable | Zero before activation |
| Content type | Any intent description | User preferences / facts | Procedural steps |
| Maintained by | User manually | Agent automatically | User and agent |
| Shareability | Awkward | Private | Publishable as community Tap |
Memory aid: Prompt = sticky note (valid this turn); Memory = notebook (permanent notes, always nearby); Skill = SOP manual (step-by-step process, opened when needed).
Skills complement MCP: MCP provides tool interfaces (e.g., database access); Skills teach the agent how to use those tools correctly for tasks like migrations.
03 SKILL.md format and Progressive Disclosure
All Hermes Skills follow the agentskills.io open standard. Basic frontmatter structure:
```markdown
---
name: my-skill
description: |
  Use when the user needs to [...].
version: 1.0.0
license: MIT
compatibility: Requires git, docker
allowed-tools: Bash(git:*) Read
metadata:
  hermes:
    tags: [devops, automation]
    category: software-development
    related_skills: [github-pr-workflow]
    requires_toolsets: [terminal]
    fallback_for_toolsets: [web]
---
# My Skill Title

## Overview
## When to Use
## Procedure
## Common Pitfalls
## Verification Checklist
```
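A basic frontmatter pre-check is easy to write. The sketch below assumes the frontmatter has already been parsed into a dict (for example with a YAML library). It encodes the naming limits as we understand them from the agentskills.io spec: a lowercase, hyphenated name of at most 64 characters that matches its directory, and a non-empty description of at most 1024 characters. Treat those limits as assumptions. validate_frontmatter is a hypothetical helper, and skills-ref validate remains the authoritative check.

```python
import re

# Limits as we understand the agentskills.io spec; verify against the current version.
NAME_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")  # no leading, trailing, or double hyphens
MAX_NAME_LEN = 64
MAX_DESCRIPTION_LEN = 1024


def validate_frontmatter(frontmatter: dict, dir_name: str) -> list:
    """Hypothetical pre-check; returns a list of problems (empty means it passed)."""
    problems = []
    name = frontmatter.get("name")
    if not isinstance(name, str) or not name:
        problems.append("name is required")
    else:
        if len(name) > MAX_NAME_LEN:
            problems.append(f"name exceeds {MAX_NAME_LEN} characters")
        if not NAME_PATTERN.match(name):
            problems.append("name must use lowercase letters, digits, and single hyphens")
        if name != dir_name:
            problems.append(f"name '{name}' does not match directory '{dir_name}'")
    description = frontmatter.get("description")
    if not isinstance(description, str) or not description.strip():
        problems.append("description is required")
    elif len(description) > MAX_DESCRIPTION_LEN:
        problems.append(f"description exceeds {MAX_DESCRIPTION_LEN} characters")
    return problems


# A well-formed skill, then a malformed one (bad name, wrong directory, empty description).
print(validate_frontmatter({"name": "my-skill", "description": "Use when the user needs to deploy with docker."}, "my-skill"))  # []
print(validate_frontmatter({"name": "My_Skill", "description": ""}, "my-skill"))
```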
Recommended directory layout:
```text
my-category/my-skill/
├── SKILL.md      # core steps; aim for ≤500 lines
├── references/   # API refs; loaded on demand
├── templates/    # reusable templates
└── scripts/      # scripts the agent can run directly
```
| Level | Content | Trigger | Token cost |
|---|---|---|---|
| Level 0 | name + description | Every session start, all skills | ~3K total across all skills |
| Level 1 | Full SKILL.md body | /skill-name, or the LLM decides it is needed | Depends on file length |
| Level 2 | references/ scripts/ files | LLM decides at execution time | On demand, per file |
Writing tips: the description is all that Level 0 sees, so describe when to use the skill rather than what it is. The SKILL.md body should include Overview, When to Use, Procedure, Common Pitfalls, and Verification Checklist sections. Validate with skills-ref validate ./my-skill.
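Here is a minimal in-memory model that makes the three levels concrete. Skill, SkillLoader, and estimate_tokens are hypothetical names. The 4-characters-per-token estimate is a rough heuristic for English text rather than a real tokenizer, and Hermes's actual loader may work differently. The model shows the overall shape: only name and description enter the context up front, a body is added when its skill activates, and reference files are added one at a time.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    description: str                           # Level 0: always listed
    body: str                                  # Level 1: full SKILL.md body
    files: dict = field(default_factory=dict)  # Level 2: path -> content


def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token for English); not a real tokenizer.
    return len(text) // 4


class SkillLoader:
    def __init__(self, skills):
        self.skills = {s.name: s for s in skills}
        self.context = []  # what has actually been placed in the model's context

    def level0_catalog(self) -> str:
        # Session start: only name + description for every installed skill.
        return "\n".join(f"- {s.name}: {s.description}" for s in self.skills.values())

    def activate(self, name: str) -> str:
        # Level 1: the full body, only after /name or when the LLM decides it is needed.
        body = self.skills[name].body
        self.context.append(body)
        return body

    def open_file(self, name: str, path: str) -> str:
        # Level 2: one references/ or scripts/ file at a time, during execution.
        content = self.skills[name].files[path]
        self.context.append(content)
        return content

    def context_tokens(self) -> int:
        return estimate_tokens(self.level0_catalog()) + sum(estimate_tokens(c) for c in self.context)


loader = SkillLoader([
    Skill("github-code-review", "Use when reviewing a GitHub pull request.", "step\n" * 400,
          {"references/api.md": "GET /repos/{owner}/{repo}/pulls\n" * 50}),
    Skill("arxiv", "Use when searching or summarizing arXiv papers.", "step\n" * 200),
])
print(loader.context_tokens())  # catalog only
loader.activate("github-code-review")
print(loader.context_tokens())  # catalog plus one skill body
```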
04 Skill Bundles: one command for a full workflow
Skill Bundles are a 2026 Hermes feature: a lightweight YAML file packs multiple skills into one slash command. Running /bundle-name loads every listed skill at once.
File location: ~/.hermes/skill-bundles/<slug>.yaml
```yaml
name: backend-dev
description: Full backend feature workflow (code review, TDD, and PR management).
skills:
  - github-code-review
  - test-driven-development
  - github-pr-workflow
instruction: |
  Always write failing tests first before implementation.
  Never push directly to main.
```
Advanced scenarios: research workflows can bundle arxiv, deep-research, plan, and excalidraw; MLOps deploy can bundle vllm, llama-cpp, github-pr-workflow, and systematic-debugging.
Priority rules: when a Bundle and single Skill share a name, the Bundle wins; missing skills are skipped with a warning, not an error; Bundles do not alter the system prompt, so Prompt Cache stays valid.
```bash
hermes bundles create backend-dev \
  --skills github-code-review,test-driven-development,github-pr-workflow \
  --instruction "Always write failing tests first"
```
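The priority rules can be modelled in a few lines. This is a hypothetical sketch, not Hermes's resolver. Bundle and resolve_command are illustrative names, and skills are represented only by their body text. The bundle instruction is returned separately instead of being written into a system prompt, which matches the statement that Bundles leave the system prompt (and therefore the Prompt Cache) untouched.

```python
import warnings
from dataclasses import dataclass, field


@dataclass
class Bundle:
    name: str
    skills: list = field(default_factory=list)
    instruction: str = ""


def resolve_command(command: str, bundles: dict, skills: dict):
    """Resolve '/name' into (skill bodies to load, extra instruction).

    Hypothetical model of the documented rules:
    - a Bundle wins over a single Skill with the same name;
    - a missing skill inside a Bundle is skipped with a warning, not an error.
    """
    name = command.lstrip("/")
    if name in bundles:
        bundle = bundles[name]
        loaded = []
        for skill_name in bundle.skills:
            if skill_name in skills:
                loaded.append(skills[skill_name])
            else:
                warnings.warn(f"bundle '{name}': skill '{skill_name}' is not installed; skipping")
        return loaded, bundle.instruction
    if name in skills:
        return [skills[name]], ""
    raise KeyError(f"no bundle or skill named '{name}'")


installed = {
    "github-code-review": "<review procedure>",
    "github-pr-workflow": "<PR procedure>",
    "backend-dev": "<a single skill that shares the bundle's name>",
}
bundles = {
    "backend-dev": Bundle(
        "backend-dev",
        ["github-code-review", "test-driven-development", "github-pr-workflow"],
        "Always write failing tests first before implementation.",
    )
}
loaded, instruction = resolve_command("/backend-dev", bundles, installed)
print(loaded)  # ['<review procedure>', '<PR procedure>']; test-driven-development was skipped
```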
05 Conditional activation: skills that sense the environment
Under metadata.hermes, four activation rules let skills show or hide based on tool availability:
| Field | Behavior |
|---|---|
| requires_toolsets | Hide skill when listed toolsets are missing |
| requires_tools | Hide skill when listed tools are missing |
| fallback_for_toolsets | Hide skill when listed toolsets exist (fallback path) |
| fallback_for_tools | Hide skill when listed tools exist |
Classic scenario: a DuckDuckGo search skill sets fallback_for_tools: [web_search]. When the user configures FIRECRAWL_KEY or BRAVE_SEARCH_KEY, the paid web_search tool activates and the DuckDuckGo skill hides to save tokens; when that API is unavailable, the fallback surfaces automatically.
Platform awareness: telegram-notify can set requires_toolsets: [messaging] and platforms: [telegram, discord]; via the hermes skills TUI you can toggle skills independently for CLI, Telegram, and Discord.
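The four fields, plus the platforms list, can be read as a chain of hide conditions. The sketch below is a model built on assumptions, not Hermes's implementation. It treats requires_* as "hide if any listed item is missing" and fallback_for_* as "hide if any listed item is present", and it reads platforms from the same metadata dict. is_visible is a hypothetical name.

```python
def is_visible(meta: dict, toolsets: set, tools: set, platform: str) -> bool:
    """Hypothetical model of metadata.hermes activation fields plus a platforms list.

    Assumes requires_* hides the skill if ANY listed item is missing and
    fallback_for_* hides it if ANY listed item is available.
    """
    if any(ts not in toolsets for ts in meta.get("requires_toolsets", [])):
        return False
    if any(t not in tools for t in meta.get("requires_tools", [])):
        return False
    if any(ts in toolsets for ts in meta.get("fallback_for_toolsets", [])):
        return False
    if any(t in tools for t in meta.get("fallback_for_tools", [])):
        return False
    platforms = meta.get("platforms")
    return not platforms or platform in platforms


ddg = {"fallback_for_tools": ["web_search"]}
# No paid search key configured: web_search is absent, so the DuckDuckGo fallback shows.
print(is_visible(ddg, toolsets={"web"}, tools=set(), platform="cli"))           # True
# FIRECRAWL_KEY or BRAVE_SEARCH_KEY configured: web_search is present, so the fallback hides.
print(is_visible(ddg, toolsets={"web"}, tools={"web_search"}, platform="cli"))  # False

notify = {"requires_toolsets": ["messaging"], "platforms": ["telegram", "discord"]}
print(is_visible(notify, toolsets={"messaging"}, tools=set(), platform="cli"))  # False: wrong platform
```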
06 Skills Hub and the open-source ecosystem
Official install channels:
```bash
hermes skills install official/research/arxiv
hermes skills install https://example.com/SKILL.md --name my-skill
hermes skills install github:openai/skills/k8s
hermes skills tap add github:my-org/my-skills
```
| Repository | Highlights |
|---|---|
| awesome-hermes-skills | Curated production skills: Deep Research, MLOps, Apple integration; 23 skills wired for GitHub Copilot |
| hermeshub | Community registry with security scanning and certification; API and marketplace support |
| ai-agent-skills | 191 skills across 28 categories; one-click install for Hermes, Claude Code, and Cursor |
| hermes-agent | Official source of truth: all built-in skills and authoring conventions |
Because skills follow the agentskills.io standard, they work across Hermes, Claude Code, Cursor, and OpenCode, so your assets are not locked to one platform.
07 Publish your Skill Tap: six steps for team and community sharing
A GitHub repo as a Tap lets teams or communities subscribe to your skill set. Recommended repo layout:
```text
my-skills-tap/
├── skills.sh.json                       # category config (optional)
├── mlops/vllm-deploy/SKILL.md
├── research/paper-summarizer/SKILL.md
└── README.md
```
- Plan categories: organize by domain (MLOps, Research, etc.); write skills.sh.json to control Hub display groups.
- Write SKILL.md files: one directory per skill; validate with skills-ref validate.
- Push to GitHub: public or private (private repos need a token).
- Team subscribes: hermes skills tap add github:your-org/your-skills-tap.
- Update regularly: hermes skills tap update pulls the latest skills.
- Version control: put ~/.hermes/skills/ in Git; sync across devices with git pull && hermes skills reset.
```bash
hermes skills tap add github:your-org/private-skills --token $GH_TOKEN
hermes skills tap list
hermes skills tap update
```
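Before pushing a tap, a quick local check that every skill sits at the expected depth can catch layout mistakes. This hypothetical helper assumes the <category>/<skill>/SKILL.md layout from the recommended repo structure and ignores skills.sh.json. The Hub's actual discovery rules may differ.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def discover_tap(root: str) -> dict:
    """Group skills by category, assuming <category>/<skill>/SKILL.md (hypothetical helper)."""
    base = Path(root)
    groups = defaultdict(list)
    for skill_md in sorted(base.rglob("SKILL.md")):
        parts = skill_md.relative_to(base).parts
        if len(parts) != 3:
            print(f"warning: unexpected location {'/'.join(parts)}")
            continue
        category, skill, _ = parts
        groups[category].append(skill)
    return dict(groups)


# Build the example layout in a temporary directory and scan it.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ["mlops/vllm-deploy/SKILL.md", "research/paper-summarizer/SKILL.md"]:
        path = Path(tmp, rel)
        path.parent.mkdir(parents=True)
        path.write_text(f"---\nname: {path.parent.name}\n---\n")
    print(discover_tap(tmp))  # {'mlops': ['vllm-deploy'], 'research': ['paper-summarizer']}
```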
08 Self-evolving Skills: GEPA + DSPy automatic improvement
GEPA (Genetic-Pareto Prompt Evolution) is an ICLR 2026 Oral result and is integrated into hermes-agent-self-evolution. The core idea: rather than fine-tuning the model, analyze execution traces, generate variants, and apply multi-objective Pareto optimization to improve the skill text. Each optimization run costs roughly $2–10 (API calls only, no GPU).
Five-stage evolution flow: ① execution trace collection (SQLite); ② reflective failure analysis (actionable side information); ③ targeted mutation (10–20 SKILL.md variants); ④ multi-objective Pareto evaluation (success rate × token efficiency × speed); ⑤ human PR review before merge.
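Stage ④ is where the "Pareto" in GEPA comes from. The sketch below illustrates only the idea of Pareto dominance over the three objectives named in the flow. It is not GEPA's selection procedure, and the variant names and numbers are made up for illustration. A variant stays on the front unless another variant is at least as good on every objective and strictly better on at least one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    success_rate: float  # higher is better
    avg_tokens: float    # lower is better (token efficiency)
    avg_seconds: float   # lower is better (speed)


def dominates(a: Variant, b: Variant) -> bool:
    """True if a is no worse than b on every objective and strictly better on at least one."""
    no_worse = (a.success_rate >= b.success_rate
                and a.avg_tokens <= b.avg_tokens
                and a.avg_seconds <= b.avg_seconds)
    strictly_better = (a.success_rate > b.success_rate
                       or a.avg_tokens < b.avg_tokens
                       or a.avg_seconds < b.avg_seconds)
    return no_worse and strictly_better


def pareto_front(variants: list) -> list:
    """Keep every variant that no other variant dominates."""
    return [v for v in variants
            if not any(dominates(other, v) for other in variants if other is not v)]


# Illustrative numbers only, not measurements.
candidates = [
    Variant("baseline", 0.70, 4200, 38.0),
    Variant("v1-tighter-steps", 0.78, 3900, 35.0),
    Variant("v2-more-pitfalls", 0.85, 5100, 41.0),
    Variant("v3-verbose", 0.72, 6000, 45.0),
]
print([v.name for v in pareto_front(candidates)])  # ['v1-tighter-steps', 'v2-more-pitfalls']
```

Neither survivor dominates the other: v1 is cheaper and faster, while v2 succeeds more often. Choosing between them is a judgment call, and in the flow above a human reviews the PR before merge.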
```bash
export HERMES_AGENT_PATH=~/.hermes
python -m evolution.skills.evolve_skill \
  --skill github-code-review \
  --iterations 10 \
  --eval-source sessiondb
```
Four safety guardrails: the full test suite must pass 100%; size limits apply (Skills ≤ 15KB, tool descriptions ≤ 500 characters); changes must stay Prompt Cache compatible; and a semantic preservation check ensures a skill's purpose does not drift.
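The size limits and the test-suite gate are mechanical enough to check before opening a PR. Prompt Cache compatibility and semantic preservation are not, so this hypothetical check leaves them to the optimizer and to human review. It assumes "15KB" means 15 × 1024 bytes of UTF-8. That distinction matters for Chinese text, which takes about three bytes per character.

```python
MAX_SKILL_BYTES = 15 * 1024   # assumption: "15KB" means 15 KiB of UTF-8 bytes
MAX_TOOL_DESC_CHARS = 500


def guardrail_violations(skill_text: str, tool_descriptions: dict,
                         tests_passed: int, tests_total: int) -> list:
    """Hypothetical pre-PR check for the mechanical guardrails; returns problems found."""
    problems = []
    size = len(skill_text.encode("utf-8"))
    if size > MAX_SKILL_BYTES:
        problems.append(f"SKILL.md is {size} bytes (limit {MAX_SKILL_BYTES})")
    for tool, desc in tool_descriptions.items():
        if len(desc) > MAX_TOOL_DESC_CHARS:
            problems.append(f"tool '{tool}' description is {len(desc)} chars (limit {MAX_TOOL_DESC_CHARS})")
    # An empty suite is treated as a failure rather than a vacuous 100%.
    if tests_total == 0 or tests_passed < tests_total:
        problems.append(f"test suite: {tests_passed}/{tests_total} passed (need 100%)")
    return problems


print(guardrail_violations("## Procedure\n" * 100, {"gh_pr_list": "List open pull requests."}, 42, 42))  # []
# 6000 Chinese characters are 18,000 UTF-8 bytes, so this reports the size and the failing test.
print(guardrail_violations("中" * 6000, {}, 41, 42))
```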
| Phase | Optimization target | Status |
|---|---|---|
| Phase 1 | Skill files (SKILL.md) | Shipped |
| Phase 2 | Tool descriptions | Planned |
| Phase 3 | System prompt fragments | Planned |
| Phase 4 | Tool implementation code | Planned |
| Phase 5 | Continuous improvement loop (fully automated) | Planned |
Because Skills follow agentskills.io, you can feed Claude Code or Gemini CLI traces to the optimizer: --eval-source mixed --trace-dirs ~/.claude/traces,~/.hermes/sessions.
09 Plugin skills and advanced authoring tips
Plugins package skills under a plugin:skill namespace. Plugin skills do not appear in the default skills_list and activate only when the user calls them explicitly, and skills within the same plugin can cross-reference one another. Loading skill_view("superpowers:writing-plans") also surfaces sibling skills from the same plugin.
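A plugin:skill identifier splits cleanly on the first colon. The helpers below are hypothetical and model only the listing and sibling behaviour described for plugins: plugin skills are left out of the default list, and viewing one surfaces the other skills from the same plugin. The skill IDs are illustrative. view_skill is deliberately named differently from the real skill_view tool.

```python
from typing import Optional, Tuple


def split_skill_id(skill_id: str) -> Tuple[Optional[str], str]:
    """'superpowers:writing-plans' -> ('superpowers', 'writing-plans'); 'arxiv' -> (None, 'arxiv')."""
    plugin, sep, skill = skill_id.partition(":")
    return (plugin, skill) if sep else (None, skill_id)


def default_skills_list(all_ids: list) -> list:
    # Namespaced plugin skills are hidden from the default listing.
    return [s for s in all_ids if split_skill_id(s)[0] is None]


def view_skill(skill_id: str, all_ids: list) -> dict:
    # Viewing a plugin skill also reports its siblings from the same plugin.
    plugin, skill = split_skill_id(skill_id)
    siblings = [] if plugin is None else [
        s for s in all_ids if s != skill_id and split_skill_id(s)[0] == plugin
    ]
    return {"plugin": plugin, "skill": skill, "siblings": siblings}


ids = ["arxiv", "superpowers:writing-plans", "superpowers:executing-plans", "github-pr-workflow"]
print(default_skills_list(ids))                                  # ['arxiv', 'github-pr-workflow']
print(view_skill("superpowers:writing-plans", ids)["siblings"])  # ['superpowers:executing-plans']
```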
Description drives activation accuracy: avoid vague lines like "Helps with code"; state trigger conditions and exclusion cases clearly.
Pitfalls separate good from great: list concrete failure modes, root causes, and fixes (e.g., fragile CSS selectors, GitHub API rate limits, large diff token overflow).
Scripting: in the Procedure section, reference executable scripts under scripts/; if a script fails, fall back to the manual steps in references/manual-extract.md.
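The script-then-fallback pattern can be expressed as a small wrapper. This is a hypothetical sketch. It assumes the scripts are Python files run with the current interpreter and that the fallback document is plain text handed back to the agent. The 60-second timeout is an arbitrary choice.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_fallback(skill_dir: str, script: str, args: list) -> str:
    """Run scripts/<script>; on any failure, return references/manual-extract.md instead.

    Hypothetical helper: assumes Python scripts and a plain-text fallback document.
    """
    base = Path(skill_dir)
    try:
        result = subprocess.run(
            [sys.executable, str(base / "scripts" / script), *args],
            capture_output=True, text=True, timeout=60, check=True,
        )
        return result.stdout
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError):
        # Non-zero exit, a hang past the timeout, or an interpreter that failed to start.
        return (base / "references" / "manual-extract.md").read_text()


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "scripts").mkdir()
    Path(tmp, "references").mkdir()
    Path(tmp, "scripts", "extract.py").write_text("raise SystemExit(1)  # simulate a broken selector\n")
    Path(tmp, "references", "manual-extract.md").write_text("1. Open the page\n2. Copy the table by hand\n")
    print(run_with_fallback(tmp, "extract.py", []))  # prints the manual steps
```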
| Size | Recommendation |
|---|---|
| < 500 lines | Keep everything in SKILL.md |
| 500–1000 lines | Move detail to references/ |
| > 1000 lines | Split strongly; consider two skills |
| > 15KB | Exceeds GEPA limit; must split |
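The size table translates directly into a check you can run over a skill. In this hypothetical helper the 15KB hard limit is tested first (again assuming 15 × 1024 UTF-8 bytes). A file with few but very long lines can exceed that limit while still being under 500 lines.

```python
GEPA_LIMIT_BYTES = 15 * 1024  # assumption: "15KB" = 15 KiB of UTF-8


def size_recommendation(skill_text: str) -> str:
    """Return the size-table recommendation for a SKILL.md (hypothetical helper)."""
    if len(skill_text.encode("utf-8")) > GEPA_LIMIT_BYTES:
        return "Exceeds GEPA limit; must split"
    lines = len(skill_text.splitlines())
    if lines < 500:
        return "Keep everything in SKILL.md"
    if lines <= 1000:
        return "Move detail to references/"
    return "Split strongly; consider two skills"


# 120 short lines, 700 lines, 1500 lines, and 300 long lines (24,300 bytes).
for text in ["step\n" * 120, "step\n" * 700, "s\n" * 1500, ("x" * 80 + "\n") * 300]:
    print(size_recommendation(text))
```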
The agent can dynamically patch or create skills via skill_manage; set skills.agent_writes_require_approval: true in config.yaml for a human approval gate.
10 Case study: tech blog workflow Skills design
Build a blog-workflow Bundle that loads SEO research, outline generation, code validation, bilingual check, and publish skills in one shot:
```yaml
name: blog-workflow
description: Full tech blog writing workflow.
skills:
  - seo-keyword-research
  - outline-generator
  - code-example-validator
  - bilingual-checker
  - publish-to-platform
instruction: |
  Always research SEO keywords before writing.
  Ensure all code examples are tested and runnable.
  Generate both Chinese and English title options.
```
A custom seo-keyword-research skill should set requires_toolsets: [web]. The flow: identify the topic → Chinese long-tail keywords ("how to use X", "X tutorial") → English long-tail keywords ("X tutorial", "X vs Y") → cross-reference trending topics on Juejin, Dev.to, and HN → output 3–5 primary keywords plus a matrix of 10–15 long-tail keywords. Chinese and English audiences search differently, so validate technical-term translations for each target platform.
11 Hermes Agent Skills FAQ
- How do Skills differ from MCP? Skills are procedural knowledge documents; MCP is a tool interface. They complement each other.
- Why does my edited Skill still run the old version? Changes do not apply to the current session; start a new session with /reset, or install with --now (this invalidates the Prompt Cache).
- Is GEPA evolution safe? It has four guardrails plus human PR review, but you should still review every diff.
- How do I reuse a skill in Claude Code? Copy the skill directory (with its SKILL.md) into ~/.claude/skills/, or use ai-agent-skills for one-click multi-platform install.
- Does Chinese content affect tokens? Expect roughly 1–1.5 tokens per Chinese character; keep descriptions in English or bilingual for sharper LLM matching.
Further reading: official docs, Chinese docs, GEPA algorithm, DSPy framework.
12 Hard data and JEXCLOUD wrap-up
- GitHub stars: Hermes Agent launched early 2026; crossed 160K stars within two months.
- Level 0 tokens: all skill name+description fields total ~3K tokens per session.
- GEPA per-run cost: roughly $2–10, API-only, no GPU required.
- GEPA size limits: Skills ≤ 15KB; tool descriptions ≤ 500 characters.
- Community scale: kevinnft/ai-agent-skills has 191 skills in 28 categories; hermeshub has 166 stars with security scanning.
Running Hermes Agent and GEPA evolution pipelines needs a 24/7, low-latency macOS host. A Raspberry Pi runs out of RAM, an oversubscribed shared VPS drops long-lived connections, and home broadband jitters; each of these degrades Skills trace collection and Gateway uptime.
For production environments that need a stable Hermes Gateway, continuous sessiondb trace collection, and GEPA iteration, JEXCLOUD multi-region bare-metal Macs are the stronger choice: dedicated Apple Silicon, 24/7 uptime, flexible monthly scaling, 120-second node delivery. Configs and pricing: JEXCLOUD pricing.