CLI-First Skill Design
Design agent skills as CLI tools so the same interface serves both humans debugging locally and agents automating through shell tool calls.
When a skill is implemented as a shell script, a human can run it directly from a terminal and an agent can invoke it through a Bash or run() tool call — no separate interfaces required. The awesome-agentic-patterns catalogue documents this design, and Claude Code best practices identify CLI tools as "the most context-efficient way to interact with external services" (source).
Core Principles
One executable per skill. Each capability lives in a single script at ~/.claude/skills/<name>/scripts/<name>.sh. Composition happens via Unix pipes, not by building a monolithic skill.
Subcommands for CRUD. Structure operations as positional arguments:
trello.sh boards # list
trello.sh cards <BOARD_ID> # read
trello.sh create <LIST_ID> "Title" # write
This mirrors how gh, aws, and other agent-friendly CLIs work — tools the agent already knows from pretraining.
Adaptive output. Return JSON for programmatic use; human-readable text when attached to a TTY. Detect with [ -t 1 ] (POSIX) or sys.stdout.isatty() in Python. The agent always gets structured output; a human running the script manually gets formatted text.
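The TTY check described above can be sketched as follows. `render_card` and its field names are hypothetical, chosen only for illustration, and the JSON formatting is deliberately naive (a real skill would escape values, for example with jq).

```shell
#!/usr/bin/env bash
# render_card is a hypothetical helper; the id/title fields are illustrative.
render_card() {
  local id="$1" title="$2"
  if [ -t 1 ]; then
    # stdout is a terminal: a person is reading, so print a compact line.
    printf '#%s  %s\n' "$id" "$title"
  else
    # stdout is a pipe or file: emit JSON for the calling program.
    # Naive formatting: values containing quotes would need JSON escaping.
    printf '{"id":"%s","title":"%s"}\n' "$id" "$title"
  fi
}

render_card 42 "Fix auth bug"
```

Note that the check has to run in the function that writes the output: wrapping it in a command substitution would always see a pipe. Piping a skill through `cat` is a quick way to preview what an agent would receive.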
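Standard exit codes. Follow Unix conventions: 0 for success, 1 for a general error, 2 for a usage problem. POSIX (IEEE Std 1003.1) defines 0 as success and nonzero as failure, and has the shell report 127 when a command is not found; 1 and 2 are widely used conventions rather than part of the standard. Agents branch on exit codes rather than parsing error messages.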
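A minimal sketch of how a caller might branch on those codes. Both `demo_skill` and `classify` are hypothetical names; the mapping from status to label is an assumption made for illustration, not a fixed protocol.

```shell
#!/usr/bin/env bash
# demo_skill is a stand-in for a real skill CLI (hypothetical).
demo_skill() {
  case "${1:-}" in
    ok)   return 0 ;;
    fail) echo "upstream API error" >&2; return 1 ;;
    *)    echo "usage: demo_skill ok|fail" >&2; return 2 ;;
  esac
}

# classify runs a command and maps its numeric status to a label.
# It never inspects the error text, only the exit code.
classify() {
  local status=0
  "$@" >/dev/null 2>&1 || status=$?
  case "$status" in
    0)   echo "success" ;;
    2)   echo "usage-error" ;;
    127) echo "not-installed" ;;
    *)   echo "runtime-error" ;;
  esac
}

classify demo_skill ok     # prints: success
classify demo_skill fail   # prints: runtime-error
classify demo_skill        # prints: usage-error
```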
Credentials via environment variables. Follow the 12-Factor App config principle: never hardcode tokens or API keys. Read from $TRELLO_API_KEY, $GITHUB_TOKEN, etc. The agent sets these before calling the script; humans export them in their shell profile.
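One way to enforce this is to fail fast when a required variable is missing. The sketch below assumes a hypothetical helper named `require_trello_credentials`; only the variable name `TRELLO_API_KEY` comes from the text above.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed only if the credential is present in the environment.
require_trello_credentials() {
  if [ -z "${TRELLO_API_KEY:-}" ]; then
    # Report on stderr so stdout stays clean for structured output.
    echo "error: TRELLO_API_KEY is not set" >&2
    return 1
  fi
}

if require_trello_credentials; then
  echo "credentials present"
else
  echo "credentials missing" >&2
fi
```

A real skill would typically call the guard before any network request and exit with its status.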
Non-interactive by default. Skills must not block on prompts. Expose --yes or --force flags for destructive operations, because an agent usually has no interactive stdin through which to answer a confirmation question.
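A sketch of a destructive subcommand that refuses to run without confirmation instead of prompting. `delete_card` is hypothetical, the `echo` stands in for a real API call, and returning 2 for a missing `--yes` is one possible choice (treating it as a usage problem).

```shell
#!/usr/bin/env bash
# Hypothetical destructive operation: never reads stdin, never prompts.
delete_card() {
  local card_id="" confirmed=0 arg
  for arg in "$@"; do
    case "$arg" in
      --yes|--force) confirmed=1 ;;
      *)             card_id="$arg" ;;
    esac
  done
  if [ -z "$card_id" ]; then
    echo "usage: delete_card <CARD_ID> --yes" >&2
    return 2
  fi
  if [ "$confirmed" -ne 1 ]; then
    # Refuse rather than ask: the caller must opt in explicitly.
    echo "refusing to delete $card_id without --yes" >&2
    return 2
  fi
  echo "deleted $card_id"   # placeholder for the real API call
}

delete_card card-123 --yes
```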
Why CLI-First Beats API-First for Dual-Use Skills
| Property | CLI-first | In-process function | Structured API |
|---|---|---|---|
| Debuggable without agent | Yes — run from terminal | No — requires agent context | Partial — needs HTTP client |
| Unix composability | Yes — pipes, &&, \|\| | No | No |
| Agent transcript visibility | Yes — commands appear in transcript | No | Partial |
| Testability | Straightforward — call the script | Requires agent harness | Requires mock server |
| Cross-tool portability | Yes — any agent that can shell out | No | No |
| Complex data structures | Limited — shell arrays are awkward | Full | Full |
| Process spawn overhead | Per call | None | Per call |
| Persistent state | Not native | Easy | Session-based |
CLI-first wins when skills run infrequently (seconds between calls), operate on text or JSON, and need to be debuggable by a human. It loses when a skill is called hundreds of times per task, needs rich object graphs, or streams data in real time.
Composition via Pipes
The payoff of one-script-per-skill is Unix composability. A priority report that draws from three services:
#!/usr/bin/env bash
# priority-report.sh — compose three skill CLIs
{
  trello.sh cards "$TRELLO_BOARD" --json
  asana.sh tasks --project "$ASANA_PROJECT" --json
  github.sh issues --repo "$GITHUB_REPO" --json
} | jq -s '
  [ .[][] | select(.priority == "high") ]
  | sort_by(.due_date)
  | .[:10]
'
Each skill is independently testable; the composition script is a thin orchestrator. The agent calls priority-report.sh and receives a bounded JSON array — not three separate tool calls with three separate outputs to reconcile.
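The claim that each skill is independently testable can be illustrated by shadowing a dependency with a stub on `PATH`. This is a sketch under assumptions: `list_issues` is a hypothetical stand-in for a skill's `issues` subcommand, `acme/demo` is a made-up repository, and the fake `gh` simply prints canned JSON.

```shell
#!/usr/bin/env bash
stub_dir=$(mktemp -d)

# Fake gh: ignores its arguments and prints canned JSON.
cat > "$stub_dir/gh" <<'EOF'
#!/bin/sh
echo '[{"number":42,"title":"Fix auth bug"}]'
EOF
chmod +x "$stub_dir/gh"

# Stand-in for the skill under test (hypothetical).
list_issues() {
  gh issue list --repo "$GITHUB_REPO" --json number,title
}

# The command substitution runs in a subshell, so the PATH change
# does not leak into the calling shell.
output=$(
  PATH="$stub_dir:$PATH"
  GITHUB_REPO="acme/demo"
  list_issues
)
status=$?
rm -rf "$stub_dir"

expected='[{"number":42,"title":"Fix auth bug"}]'
if [ "$status" -eq 0 ] && [ "$output" = "$expected" ]; then
  echo "PASS"
else
  echo "FAIL" >&2
fi
```

No agent harness or network access is involved; the test exercises only the script and its exit status.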
When to Choose Something Else
- High call frequency — process spawn overhead accumulates; use an in-process function or consolidate into a single tool
- Complex object graphs — shell arrays and associative maps are fragile; use a Python/Node script with proper data structures
- Real-time streaming — shell scripts cannot hold open WebSocket or SSE connections gracefully
- Windows without WSL — POSIX scripts require a compatibility layer; evaluate whether your audience is exclusively Unix-based
Example
A GitHub skill CLI that follows all six principles:
#!/usr/bin/env bash
# github.sh — skill CLI for GitHub issues
set -euo pipefail
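REPO="${GITHUB_REPO:-}"
# gh reads GITHUB_TOKEN (or GH_TOKEN) from the environment itself,
# so the token never appears on the command line.
usage() { echo "Usage: github.sh issues|create|close <args>" >&2; exit 2; }
[[ $# -lt 1 ]] && usage
# Fail early with a clear message if the target repository is not configured.
[[ -n "$REPO" ]] || { echo "github.sh: GITHUB_REPO is not set" >&2; exit 1; }
case "$1" in
  issues)
    result=$(gh issue list --repo "$REPO" --json number,title,labels --limit 20)
    if [ -t 1 ]; then
      # human-readable
      echo "$result" | jq -r '.[] | "#\(.number) \(.title)"'
    else
      # machine-readable JSON for agents and pipes
      echo "$result"
    fi
    ;;
  create)
    [[ $# -lt 2 ]] && usage
    # Always pass --body (possibly empty) so gh does not fall back to prompting for it.
    gh issue create --repo "$REPO" --title "$2" --body "${3:-}"
    ;;
  close)
    [[ $# -lt 2 ]] && usage
    gh issue close --repo "$REPO" "$2"
    ;;
  *)
    usage
    ;;
esac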
An agent calling github.sh issues receives a JSON array it can filter with jq. A developer running the same command from a terminal sees #42 Fix auth bug — no flags needed, no separate interface.
Key Takeaways
- One script per skill, subcommands for operations, JSON output for agents, human-readable for TTY
- Conventional exit codes (0/1/2/127) let agents branch on failure without parsing error text
- Credentials always via environment variables — never hardcoded
- Composition via pipes replaces complex multi-service skills with thin orchestration scripts
- Choose a different approach for high-frequency calls, complex data structures, or real-time streaming