The difference between an OpenClaw agent that saves you two hours a day and one that constantly needs babysitting comes down to how you write its instructions. The pattern is consistent: teams that invest thirty minutes writing structured agents.md files get dramatically better results than teams that dump a paragraph of vague instructions and hope for the best.
This guide covers the mechanics of writing effective OpenClaw prompts, from the prompt hierarchy that determines what your agent reads, to structured templates you can copy into your own projects today. If you have not set up OpenClaw yet, start with our setup guide and return here once your agent is running.
## How OpenClaw Prompts Work
OpenClaw does not use a single prompt. It assembles a custom system prompt for every agent run from multiple files, each serving a different purpose. Understanding this hierarchy is the first step to writing instructions that stick.
### The Prompt Hierarchy
| File | Purpose | Scope | When It Loads |
|---|---|---|---|
| System prompt | Core behavior, safety, tooling | Every session | Always (built by OpenClaw) |
| agents.md | Project-specific agent behavior | Per workspace | Every session in that workspace |
| SKILL.md | Task-specific workflow instructions | Per skill activation | Only when the skill triggers |
| SOUL.md | Personality and communication style | Per workspace | Every session (injected into system prompt) |
The system prompt is assembled by OpenClaw itself. You do not edit it directly. It includes safety guardrails, available tools, and workspace context. OpenClaw injects up to 8 bootstrap files into this prompt, with a per-file cap of 20,000 characters and a total cap of 150,000 characters across all injected content.
Your primary authoring surfaces are agents.md and SKILL.md. The first defines how your agent behaves in a specific project. The second defines how it executes a specific task. Most guides skip agents.md entirely, but it is the file you will edit most often.
### Where agents.md Fits
Think of agents.md as the briefing document you hand a new contractor on day one. It covers who they are working for, what tools they have access to, what the project looks like, and what they should never do.
Your agent reads agents.md at the start of every session in that workspace. Everything in this file shapes every response. This makes it the highest-leverage piece of text you can write, but also the most dangerous if you fill it with noise. The official OpenClaw docs recommend keeping your system-level instructions under 500 words to avoid diluting the signal.
## Writing Effective agents.md Files
A weak agents.md reads like a product description. A strong one reads like an operations runbook. Here is a proven structure:
```
# Project Agent Configuration

## Role
You are a [specific role] working on [specific project].
Your expertise is [domains]. You communicate in [style].

## Project Context
- Repository: [what this codebase does]
- Stack: [languages, frameworks, key tools]
- Team conventions: [naming, branching, testing standards]

## Task Priorities
1. [Most important behavior]
2. [Second priority]
3. [Third priority]

## Output Standards
- [Format preference: markdown, JSON, code blocks]
- [Length preference: concise vs detailed]
- [Tone: technical, conversational, formal]

## Guardrails
- Never [dangerous action 1]
- Never [dangerous action 2]
- Always [safety requirement]
```
Each section does specific work. Drop any section and your agent’s behavior degrades in a predictable way.
### Role Definition
The role section is not decorative. It constrains the agent’s behavior in measurable ways. An agent told it is a “senior backend engineer” will default to different decisions than one told it is a “technical writer.”
Effective role definitions include three elements:
- Domain expertise: What the agent knows deeply. “You specialize in Python data pipelines and PostgreSQL optimization” is better than “you are a developer.”
- Expertise boundaries: What the agent should not attempt. “You do not provide legal or compliance advice. When asked about GDPR, defer to the legal team.” This prevents confident-sounding nonsense in areas outside the agent’s assigned scope.
- Communication style: How the agent delivers answers. “Explain decisions briefly. Use bullet points for action items. Skip preambles.”
Here is a concrete example from a content production workflow:
```
## Role
You are a senior content strategist for a B2B SaaS blog.
You specialize in SEO-driven editorial content for technical audiences.

You do not provide legal advice, medical information, or financial guidance.
When asked about topics outside your expertise, say so and suggest
the user consult a specialist.

Communicate directly. Skip throat-clearing phrases like "great question"
or "let me explain." Start with the answer, then provide reasoning.
```
### Task Decomposition
Complex requests fail when agents try to do everything at once. Task decomposition in your agents.md means defining how the agent should break work into steps.
Four decomposition patterns work well depending on the workflow:
**Sequential:** Steps run in order, each depending on the previous output. Use for pipelines where step 2 needs the output of step 1.

```
## Research Workflow
1. Search for the target keyword using WebSearch.
2. Fetch the top 5 results using WebFetch.
3. Extract key topics from each result.
4. Identify gaps between existing content and our angle.
5. Write a competitive brief to /content/research/[slug]/brief.md.
```
**Conditional:** Different steps based on the input. Use when the same skill handles multiple scenarios.

```
## Code Review Workflow
- If the PR has fewer than 50 changed lines: review inline, suggest fixes directly.
- If the PR has 50-200 changed lines: summarize changes first, then review each file.
- If the PR exceeds 200 changed lines: flag it as too large, recommend splitting.
```
**Parallel:** Independent subtasks that can run simultaneously. Use when steps do not depend on each other.

```
## Daily Report
Run these checks independently:
- Check server uptime via monitoring API
- Pull yesterday's sales figures from the dashboard
- Scan Slack #incidents for unresolved issues
Then combine all results into the daily report template.
```
**Iterative:** Repeat a step until a condition is met. Use for quality gates.

```
## Content Audit Loop
1. Run the content audit checklist on the article.
2. If the score is below 80/100, apply the suggested fixes.
3. Re-run the audit.
4. Repeat up to 2 more times. If the score remains below 80, stop and report.
```
### Output Formatting
Telling an agent “format it nicely” produces inconsistent results. Specifying the exact format produces the same output every time.
The formats that work most reliably with LLM-based agents:
**Markdown tables** work well for structured comparisons. Agents rarely break table syntax if you provide a template row.

**Numbered steps** work well for procedures. Agents maintain the sequence and rarely skip steps when the template is numbered.

**JSON** works but requires an explicit schema. Without a schema example, agents improvise field names between runs.

**Free-form prose** is the least reliable for consistency. If the content matters, constrain it.
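For JSON in particular, put the schema directly in your instructions. Here is a minimal sketch of what that can look like in agents.md; the field names are invented for illustration:

```
## JSON Output Schema
Return a single JSON object matching this shape exactly:

{
  "status": "pass" | "fail",
  "checked_at": "YYYY-MM-DD",
  "issues": [
    {"severity": "critical" | "warning" | "info", "detail": "one sentence"}
  ]
}

Do not add, rename, or omit fields.
```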
Here is how to specify output format in your agents.md:
```
## Output Standards
All reports must use this structure:

### Report: [Title]
**Date:** [YYYY-MM-DD]
**Status:** [PASS/FAIL/NEEDS REVIEW]

#### Findings
| Item | Status | Detail |
|------|--------|--------|
| [check name] | PASS/FAIL | [one sentence] |

#### Recommendations
1. [Highest priority action]
2. [Second priority]

#### Raw Data
Wrap raw output in code blocks with the appropriate language tag.
```
### Few-Shot Examples
Few-shot examples are the single most underused technique in OpenClaw prompt engineering. Showing the agent what you want is faster and more reliable than describing it.
Include examples in your agents.md or SKILL.md using this pattern:
```
## Examples

### Example 1: Bug Report Triage
**User says:** "The checkout page throws a 500 error when I add more than 10 items"
**Agent does:**
1. Searches codebase for checkout cart logic
2. Identifies the array bounds check in cart.js line 47
3. Suggests fix: change `if (items.length > 10)` to handle the edge case
4. Creates a draft PR description

### Example 2: Feature Question
**User says:** "Can our API handle batch uploads?"
**Agent does:**
1. Searches API routes for batch endpoints
2. Finds POST /api/v2/batch exists but is rate-limited to 100 items
3. Reports the endpoint, its limits, and links to the relevant code
```
Two to three examples cover most use cases. More than five starts consuming too many tokens for diminishing returns. Place examples at the end of your instructions so the agent processes the rules first and the examples serve as calibration.
## Persona Definitions with SOUL.md
SOUL.md controls your agent’s personality. It sits in your workspace root, and OpenClaw injects its contents into the system prompt with an instruction to embody the described persona.
Most teams skip this file or write something generic like “be helpful and professional.” That is a missed opportunity. A well-crafted persona changes how the agent handles ambiguity, asks clarifying questions, and structures responses.
Here is an example persona for a DevOps monitoring agent:
```
# Personality
You are a pragmatic ops engineer with 10 years of production experience.
You have seen every type of outage and your instinct is to stabilize
first, diagnose second, and document third.

## Communication Style
- Terse during incidents. Full sentences during retrospectives.
- Default to bullet points. Use prose only for explanations.
- When uncertain, say "I am not sure" rather than guessing.
- Never start messages with greetings or pleasantries.

## Decision Making
- Prefer reversible actions over irreversible ones.
- When two options exist and neither is clearly better, pick the
  one that is easier to undo.
- Always suggest a rollback plan before recommending a change.
```
The persona does not replace technical instructions in agents.md. It supplements them. Use agents.md for what the agent does. Use SOUL.md for how the agent communicates while doing it.
## Common Prompt Engineering Mistakes
These are the prompt engineering mistakes that come up most often across OpenClaw deployments:
**Vague instructions with no success criteria.** “Help me with code reviews” gives the agent no framework for what a good review looks like. “Review PRs for security vulnerabilities, performance regressions, and style guide violations. Flag each issue with severity: critical, warning, or info” gives it a concrete checklist.
**Overloaded prompts that try to cover everything.** An agents.md that is 3,000 words long dilutes every instruction. The agent cannot prioritize when everything is marked as important. Keep it under 500 words and move specialized workflows into separate skills. See our skills development guide for how to extract repeating workflows into their own SKILL.md files.
**No failure handling.** If your instructions only describe the happy path, the agent will improvise when things go wrong. And improvisation from an LLM during an error state is how you get hallucinated error messages, phantom file modifications, and confidently wrong status reports. Every external call (API, database, file system) needs a “what if this fails” instruction.
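A sketch of what failure-path instructions can look like in an agents.md Guardrails section; the services named here are placeholders, not a real integration:

```
## Failure Handling
- If the monitoring API call fails or times out, retry once. If it
  fails again, report "monitoring data unavailable" instead of
  estimating values.
- If a database query returns an error, include the exact error text
  in your report. Never substitute remembered or typical values.
- If a file write fails, stop and say so. Never report a file as
  written without confirming it exists.
```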
**Ignoring the description field in SKILL.md.** The description field in SKILL.md frontmatter is not documentation. It is the trigger mechanism. OpenClaw reads descriptions to decide which skill matches a user’s request. A description that says “handles data tasks” will rarely activate. One that says “Generate weekly ops reports from server logs and GitHub PRs, use when asked for standup updates or weekly summaries” fires reliably.
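As a sketch, here is that strong description in SKILL.md frontmatter; the name field and YAML layout are assumptions based on common frontmatter conventions, so check the OpenClaw docs for the exact schema:

```
---
name: weekly-ops-report
description: >
  Generate weekly ops reports from server logs and GitHub PRs.
  Use when asked for standup updates or weekly summaries.
---
```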
**Treating prompts as write-once artifacts.** The best agents.md files are living documents. Teams iterate on them weekly, adding guardrails when the agent misbehaves, sharpening instructions when output quality drifts, and pruning sections that no longer apply. If you wrote your agents.md three months ago and have not touched it since, it is almost certainly stale.
## Frequently Asked Questions
### What is agents.md and how is it different from SKILL.md?
agents.md defines project-wide agent behavior and loads at the start of every session. SKILL.md defines a specific task workflow and only loads when that skill activates. Use agents.md for general rules, project context, and output standards. Use SKILL.md for repeatable multi-step workflows with specific guardrails.
### How long should my agents.md be?
Under 500 words for the core instructions. Every word in agents.md consumes tokens on every interaction, so verbosity has a direct cost. If a section is only relevant to one workflow, move it to a SKILL.md file where it only loads when needed. Our memory configuration guide covers how to manage context efficiently.
### How do I add few-shot examples without using too many tokens?
Include 2-3 examples covering your most common use cases. Place them at the end of the file so core instructions load first. Each example should be 3-5 lines maximum. If you need more extensive examples, put them in a references/ folder inside your skill directory and instruct the agent to read them on demand.
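A minimal sketch of that on-demand pattern inside a SKILL.md; the folder and file names are hypothetical:

```
## Extended Examples
Full input/output examples live in references/examples.md within
this skill directory. Read that file before drafting your first
response, then follow its formatting exactly.
```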
### Can I define different personas for different tasks?
Yes. Use SOUL.md for a baseline persona that applies to all interactions. For task-specific personas, define the role and communication style in the relevant SKILL.md file. The SKILL.md persona overrides SOUL.md for that specific task.
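A sketch of a task-specific persona block in a SKILL.md, assuming the override behavior described above; the scenario is illustrative:

```
## Role (this skill only)
For incident postmortems, write as a neutral facilitator.
Use full prose rather than bullet points, avoid blame language,
and set aside the terse style defined in SOUL.md.
```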
### How do I prevent my OpenClaw agent from hallucinating?
Add explicit guardrails: “Never fabricate data. If you cannot find the answer, say so.” Define failure paths for every external dependency. Use the AGENTS.md context pattern, which a Vercel AI SDK study found produced 47% fewer hallucinations than unstructured prompts. Constrain the agent’s scope so it does not attempt tasks outside its defined expertise.
### What is the CRAFT framework for OpenClaw prompts?
CRAFT stands for Context, Role, Action, Format, Tone. It is a prompting structure where you provide the background situation, assign the agent a specific role, state the exact action required, specify the output format, and set the communication tone. It works well for complex one-off prompts but for recurring workflows, encode the same elements directly into your SKILL.md sections.
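For illustration, a one-off prompt written in the CRAFT structure (the scenario is invented):

```
Context: Our checkout service has thrown intermittent 500 errors since Tuesday's deploy.
Role: You are a senior site reliability engineer.
Action: List the three most likely causes, ranked by probability.
Format: A markdown table with columns for cause, evidence to check, and proposed fix.
Tone: Terse and technical. No preamble.
```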
### What are the most common prompt engineering mistakes in OpenClaw?
Vague instructions without success criteria, overloaded prompts that try to cover every scenario, missing failure handling for external calls, weak SKILL.md descriptions that prevent skill activation, and treating prompts as write-once documents instead of iterating on them. The biggest single mistake is writing instructions that describe goals instead of steps.
### Does prompt length affect my API costs?
Yes. OpenClaw injects your agents.md into every interaction, and each character counts toward your token usage. A 500-word agents.md adds roughly 650-750 tokens per message. At current API rates for Claude Opus 4.6, that is approximately $0.01-0.02 per interaction in additional input cost. Multiply by hundreds of daily interactions and it adds up. Keep instructions tight.
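The arithmetic behind that estimate, assuming roughly 1.4 tokens per English word and an illustrative input rate of $15 per million tokens (verify against current pricing before relying on this):

```
500 words × ~1.4 tokens/word        ≈ 700 tokens per message
700 tokens × $15 per 1M tokens      ≈ $0.0105 per interaction
500 interactions/day × $0.0105     ≈ $5.25/day (~$160/month)
```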
## Key Takeaways
- OpenClaw assembles prompts from multiple files: system prompt, agents.md, SKILL.md, and SOUL.md. Each serves a different purpose in the hierarchy.
- Write agents.md like an operations runbook, not a product description. Include role definitions, task decomposition patterns, explicit output formats, and failure handling.
- Use few-shot examples. Two to three concrete input/output examples calibrate agent behavior more effectively than paragraphs of abstract instructions.
- Keep agents.md under 500 words. Move specialized workflows into SKILL.md files that only load when needed.
- Iterate on your prompts. The best agent configurations are living documents that teams refine weekly based on observed behavior.
For deeper coverage of building reusable workflows, see our OpenClaw skills development guide. To understand the cost implications of your prompt choices, check our OpenClaw API costs breakdown.