Over the past two years, the most impressive ability of large models has been generation: writing, answering, and summarizing. But once you put them into a real workflow (writing requires verification, analysis requires running numbers, operations require sending emails, customer service requires checking orders), you quickly discover that "being able to generate" does not mean "being able to complete tasks".
The key to truly turning AI into productivity is not swapping in a larger model, but turning it into a manageable, reusable, always-on system: an Agent. And the core component that lets agents work stably is Agent Skills.
This article offers a breakdown that is friendly to beginners and directly actionable for practitioners, combining accessible explanation, methodology, and hands-on practice: using the three-layer architecture of Metadata, Instruction, and Resources, it walks through the concept, technical principles, engineering points, and multi-scenario practice of Agent Skills in one pass, and provides a writing template you can reuse directly.
1. What exactly are Agent Skills: not "installing tools" but "encapsulating task capabilities"
Many people, on first hearing about Skills, understand them as "adding a few more tools (APIs) to the model". That is only half right.
To be more precise:
An Agent Skill is a reusable unit of task-completion capability: it encapsulates goals, processes, tools, knowledge, constraints, and acceptance criteria together, so that an agent can deliver reliably.
If the large model is the "brain" and tools are the "hands", then a skill is more like "muscle memory + operating procedures + a seat belt":
It knows when to use its hands, which hand to use, and how to check the result afterwards;
It knows which actions need confirmation and which must never be performed;
It can retry, switch paths, and degrade its output on failure;
It can be reused, versioned, monitored, and evaluated.
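The retry, path-switching, and degraded-output behavior described above can be sketched as a small wrapper. The function and scenario names here (`run_with_fallback`, `flaky_search`) are illustrative, not from any particular framework:

```python
def run_with_fallback(primary, fallback, degrade, retries=2):
    """Try the primary path with retries, then a fallback path,
    and finally a degraded output instead of failing outright."""
    for _ in range(retries):
        try:
            return primary()
        except Exception:
            continue  # retry the primary path
    try:
        return fallback()  # switch paths
    except Exception:
        return degrade()   # downgrade the output

# Hypothetical usage: search fails twice, the cached-knowledge path succeeds.
calls = {"n": 0}

def flaky_search():
    calls["n"] += 1
    raise TimeoutError("search unavailable")

result = run_with_fallback(
    primary=flaky_search,
    fallback=lambda: "answer from cached knowledge",
    degrade=lambda: "insufficient information",
)
```

The point is that "failure handling" lives in the skill, not in the user's patience.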
That is also why platforms today emphasize tool/function calling: the model can issue a "call request" during generation, an external system executes it and backfills the result, and the model then completes the final answer-and-action loop based on that result (as described in OpenAI's function-calling documentation).
Similarly, Google's Gemini documentation draws the distinction clearly: tools are specific capabilities (search, code execution, maps, etc.), while agents are systems that can plan, execute, and synthesize across multiple steps; tools extend capability, and the agent orchestrates it.
2. Overview of the three-layer architecture: What do Metadata, Instruction, and Resources each manage?
You can think of a skill as a "small product". The three-layer architecture is its product structure:
(1) Metadata: Who are you, what can you do, where are the boundaries
Metadata is the "manual + contract + configuration entry point" for a skill.
It answers:
What is this skill called? What problem does it solve? Who is it for?
What is the structure of input/output? What are the criteria for success?
What are the risks and boundaries? Which actions require manual confirmation?
What are the cost budget, permission level, and version information?
Agents without metadata often become "well written but uncontrollable": they call tools at random, drift in output style, or act beyond their authority.
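As a sketch of what such metadata might look like in code (all field names here are hypothetical, not a standard schema):

```python
from dataclasses import dataclass, field

@dataclass
class SkillMetadata:
    """Hypothetical metadata record for one skill: identity,
    boundaries, and governance rules in one declarable unit."""
    name: str
    goal: str
    input_schema: dict
    output_schema: dict
    success_criteria: list
    requires_confirmation: list = field(default_factory=list)  # risky actions
    permission_level: str = "read_only"
    cost_budget_usd: float = 0.5
    version: str = "0.1.0"

meta = SkillMetadata(
    name="tech_article_writer",
    goal="Turn a topic brief into a publishable, sourced article",
    input_schema={"topic": "str", "audience": "str"},
    output_schema={"draft": "str", "sources": "list[str]"},
    success_criteria=["every key fact has a source link"],
    requires_confirmation=["publish"],
)
```

Because it is declarative, the same record can drive permission checks, cost alerts, and version tracking.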
(2) Instruction: How do you do it, what process do you follow, and how do you self-check
Instruction is the "operating system" of a skill.
It is not "please be more professional" but an executable SOP:
First identify what information is missing, and ask for it;
When to search, and when not to;
How to choose tools, in what order to call them, and how to handle failures;
How to verify results and handle conflicting evidence;
Output format, citation standards, and tone requirements;
How to self-check and accept the final result.
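The final self-check step can be made mechanical. Below is a toy acceptance pass with a deliberately simple rule (a section that contains numbers but no source link gets flagged); the rule and names are illustrative only:

```python
import re

def self_check(sections):
    """Hypothetical acceptance pass: flag sections whose body
    makes numeric claims but carries no source link."""
    issues = []
    for title, body in sections.items():
        has_number = bool(re.search(r"\d", body))
        has_source = "http" in body
        if has_number and not has_source:
            issues.append(f"{title}: numeric claim without a source")
    return issues

draft = {
    "Background": "Agents grew out of tool-calling research.",
    "Benchmark": "Throughput rose 40% after caching.",  # no link -> flagged
}
issues = self_check(draft)
print(issues)
```

A real skill would layer more checks (terminology explained, conclusions not exceeding evidence), but the shape is the same: acceptance as code, not vibes.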
Paradigms such as ReAct matter in both research and engineering practice because they emphasize interleaving reasoning and action: thinking, retrieving, and correcting while drawing on external information, which suppresses hallucination and error propagation.
(3) Resources: What external abilities and information can you use
Resources are the "hands and feet + database + observation system" of a skill.
They include:
Tools/functions (APIs, databases, enterprise systems, automation scripts)
Retrieval and Knowledge (RAG: Vector Library, Document Library, Web Page, Intranet)
Execution environment (code execution, browser automation, workflow engine)
Observability and evaluation (tracing, logs, evaluation sets, alerts)
After an agent goes live, you need to be able to answer: Why is it slow? Why is it wrong? Did the failure happen in retrieval or in a tool? Did it exceed its authority? That requires an observability system. OpenTelemetry emphasizes that for non-deterministic AI agents, telemetry data is not only for monitoring but also serves as the feedback loop for continuous improvement.
3. Technical principle: the underlying mechanism from "dialogue model" to "action agent"
Turning Agent Skills into an engineering-usable form typically rests on three underlying mechanisms:
3.1 Tool/Function Call: Enabling the Model to Do Things
A typical loop looks like this:
You send the tool definitions (function schemas) along with the user's question to the model
The model decides which tool to call and emits structured parameters
Your system executes the tool and backfills the result to the model
The model generates the final response based on the result (or continues to call more tools)
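A minimal simulation of the middle of that loop, with a hypothetical `get_order_status` tool standing in for a real system; the model's structured call request is hard-coded here rather than produced by an actual model:

```python
import json

# Hypothetical tool registry: name -> implementation.
def get_order_status(order_id: str) -> dict:
    """Toy tool; a real one would query an order system."""
    return {"order_id": order_id, "status": "shipped"}

TOOLS = {"get_order_status": get_order_status}

def run_tool_call(model_request: str) -> dict:
    """Execute the structured call the model emitted and return
    the result to be backfilled into the conversation."""
    call = json.loads(model_request)  # {"name": ..., "arguments": {...}}
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# The model emits a structured call request during generation...
request = '{"name": "get_order_status", "arguments": {"order_id": "A-1001"}}'
# ...your system executes it and backfills the result.
result = run_tool_call(request)
```

The registry-plus-dispatch shape is what makes calls auditable: every tool the model can reach is listed explicitly, and every call passes through one choke point.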
This flow is a basic capability shared across mainstream platforms, including OpenAI's.
The key engineering point here is controllability: function names, parameters, strict mode, parallel calls, and the rules for "when to use / when not to use" all need to be written down explicitly; otherwise the model will try hard, and the harder it tries, the more likely it is to go wrong.
3.2 RAG (Retrieval-Augmented Generation): Making the Model "Verifiable"
When you tell the model not to fabricate but give it no reliable sources, it can only guess. The point of RAG is to decouple facts from model parameters and hand them to refreshable data sources: document libraries, web pages, product manuals, internal knowledge, and so on.
The further step, Agentic RAG, is this: the agent itself decides when to retrieve, what to retrieve, how to verify retrieval results, and what to do when retrieval fails.
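A minimal sketch of the refuse-to-guess behavior, with a toy dict standing in for a vector library; names and the threshold are illustrative:

```python
def answer_with_rag(question, retrieve, generate, min_hits=1):
    """Agentic-RAG sketch: retrieve first, verify the evidence
    is non-empty, and refuse to guess when nothing comes back."""
    evidence = retrieve(question)
    if len(evidence) < min_hits:
        return "Insufficient information: no supporting documents found."
    return generate(question, evidence)

# Toy document store standing in for a vector library.
DOCS = {"refund window": ["Refunds are accepted within 30 days."]}

def retrieve(q):
    return DOCS.get(q, [])

def generate(q, evidence):
    return f"Answer based on {len(evidence)} source(s): {evidence[0]}"

ok = answer_with_rag("refund window", retrieve, generate)
missing = answer_with_rag("warranty terms", retrieve, generate)
```

The branch that returns "insufficient information" is the whole point: the failure mode is explicit instead of being a fluent hallucination.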
3.3 Observability and Evaluation: Letting the System "Iterate"
Once an agent enters production, the most important ability is not "occasionally getting it right" but "continuously improving". So you need to:
Trace every tool invocation and the evidence chain of every retrieval
Statistics on success rate, time consumption, cost, and reasons for failure
Run regression tests against a review set, to avoid "fixing one bug and introducing three new ones"
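Regression testing against a review set can be as simple as the sketch below; the toy agent and cases are placeholders for a real evaluation harness:

```python
def run_regression(agent, eval_set):
    """Score the agent against a fixed review set so each change
    can be checked for regressions before release."""
    passed = sum(
        1 for case in eval_set if agent(case["input"]) == case["expected"]
    )
    return {"pass_rate": passed / len(eval_set), "total": len(eval_set)}

# Hypothetical agent and review set; real cases would be past
# production failures plus known-good deliveries.
def toy_agent(text):
    return text.strip().lower()

EVAL_SET = [
    {"input": " Hello ", "expected": "hello"},
    {"input": "WORLD", "expected": "world"},
    {"input": "Mixed", "expected": "mixed"},
]

report = run_regression(toy_agent, EVAL_SET)
```

Run it on every change to the Instruction layer; a drop in `pass_rate` is the "three new bugs" showing up before users see them.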
LlamaIndex states explicitly in its observability guide that a key requirement for building RAG and agent systems is being "observable, debuggable, and evaluable", and it supports exporting traces to systems such as OpenTelemetry.
4. Multi-scenario practice: How does the same three-layer architecture adapt to different tasks?
Here are six high-frequency scenarios, spanning media and enterprise work, showing what each layer needs to specify to be considered adequate.
Scenario A: Technology Media Writing (Topic Selection → Verification → Draft)
Metadata: media positioning, reader hierarchy, taboos (no fabricated references/data)
Instruction: Outline first and list information gaps → fill in evidence item by item → write with sources → self-check
Resources: Web page search/specified URL reading, historical manuscript library, terminology list
You will find that the core of a writing skill is not literary flair but the evidence chain.
Scenario B: Fact check
Instruction must define: a strategy for conflicting evidence (present sources side by side, explain the differences, annotate uncertainty)
Resources: Multi source search, authoritative database, public financial reports/papers/official announcement link pool
Scenario C: Enterprise Knowledge Base Q&A (RAG)
Metadata: knowledge scope, confidentiality level, and whether answers may be shared externally
Instruction: Retrieve before answering; if retrieval fails, ask a clarifying question or state "insufficient information"
Resources: Vector library, document permission system, audit logs
Scenario D: Operations automation (data lookup → content generation → action execution)
Metadata: Which actions require manual confirmation (mass sending, editing, payment)
Instruction: First do a dry run of the output, then execute the tool calls, and finally review the results
Resources: CRM/work order/email system API+tracing
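The manual-confirmation requirement from the Metadata layer can be enforced as a simple gate in front of tool execution; the action names here are hypothetical:

```python
# Risky actions, as declared in the skill's Metadata.
RISKY_ACTIONS = {"bulk_email", "bulk_edit", "payment"}

def execute_action(action, payload, confirmed=False):
    """Gate risky actions behind explicit human confirmation;
    safe actions pass straight through to execution."""
    if action in RISKY_ACTIONS and not confirmed:
        return {"status": "pending_confirmation", "action": action}
    return {"status": "executed", "action": action, "payload": payload}

# A bulk send is held until a human confirms it.
blocked = execute_action("bulk_email", {"to": "all_customers"})
allowed = execute_action("bulk_email", {"to": "all_customers"}, confirmed=True)
```

Keeping the gate in code (rather than in the prompt) means the model cannot talk its way past it.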
Scenario E: Data Analysis Report (Run → Chart → Conclusion)
Instruction: Calculations must be done with code/tools; guessing results by mental arithmetic is not allowed
Resources: Data sources, code execution, chart templates
Scenario F: Multi agent collaboration (Researcher/Writer/Editor/Checker)
Metadata: Role Assignment and Deliverables Contract
Instruction: The orchestrator is responsible for task decomposition and acceptance; sub-agents handle specialized tasks and submit verifiable materials
Resources: Shared memory, task queue, unified observation and evaluation
5. Directly usable: A general-purpose write-up of the "Agent Skills three-layer architecture" (copy-and-paste ready)
The following is the "directly usable text" you came for. You can drop it into a media article as a methodology paragraph, or use it as a template for internal team skill specifications.
5.1 General Definition (can be directly included in the main text of the manuscript)
Agent Skills can be understood using a three-tier architecture: Metadata, Instruction, and Resources.
Metadata defines a skill's identity, boundaries, and governance rules; Instruction defines its execution process and quality standards; Resources defines the tools it may call, the knowledge it may retrieve, and the observability feedback system around it. The point of the three-layer separation is to turn "whether the model can" into "whether the system can deliver reliably", and to turn "reliance on prompt mysticism" into "reusable, iterable engineering capability".
5.2 Directly implementable template: Technology Media Writing Skill (can be directly used as an appendix/toolbox for your manuscript)
Skill Name: [Agent Skills Technology Manuscript Generation]
Metadata
Skill goal: Organize user themes or materials into publishable technology news/science popularization articles
Reader level: Serve both novice and advanced readers (terms must be explained on first use; key conclusions must carry evidence)
Output Structure: Title | Introduction | Section Text | Key Points Box | Reference Source List
Fact and compliance constraints: Fabricating people, institutions, data, or references is not allowed; information that cannot be verified must be marked "uncertain" and the gap explained
Success criteria: Clear structure, arguments consistent with their evidence, traceable source links for key facts, good readability for the target audience
Risk classification: Conclusions touching finance, healthcare, or law must carry a "not professional advice" notice and point readers to official documents
Instruction (SOP)
Clarify requirements: Confirm the theme, perspective (science popularization/industry/commentary), length, reader level, and whether there are designated or prohibited sources
Generate outline: Provide 5-8 subheadings and key questions to be answered in each section; List the information gaps
Complete evidence: Provide clickable sources for each key fact; If there are conflicts from multiple sources, they must be presented side by side and the differences explained
Writing output: Introduce the conclusion and key facts first; The main text follows the narrative of "concept → principle → application → controversy/limitation → trend"
Self-check and acceptance: Check each section for unsourced assertions; check for concept substitution; check whether terms are explained; check whether conclusions go beyond the evidence
Delivery specification: Each factual paragraph carries a source link at its end; list "Sources" at the end of the article (only links actually cited)
Resources
Search resources: web search/specified URL reading (for fact checking and background supplementation)
Knowledge resources: User provided materials, historical reports, public papers/announcements (if available)
Tool resources: Structured outline generation, data calculation (prioritize tools over guesswork when needed)
Observation resources: Record citation hit rates, reasons fact checks failed, and revision feedback, as the basis for optimizing the next version of the Instruction
6. The bottom line: the most critical criterion for judgment
To judge whether an Agent Skill is "engineering-usable", you only need to ask one question:
Can it still deliver stable, interpretable, acceptable results when information is incomplete, sources conflict, tools fail, and the budget is constrained?
If the answer is "yes", it is a skill; if the answer is "luck", it is just a conversation.