AI Agents Need Scaffolding: Prompts to Plugins Guide

AI News & Strategy Daily | Nate B Jones | 2026-05-09

The Gist

Most people waste 40% of their AI time writing prompts for repeatable tasks. Build agent 'mech suits' instead: skills for house style, plugins for full workflows, MCPs for data access, and hooks and scripts for reliability, all reusable across teams and LLMs.

Agents Aren't Just LLMs—They Need a 'Mech Suit' Scaffold

On their own, LLMs handle simple tasks but fail on messy, multi-part workflows, even though planning, tool use, and navigating ambiguity are core strengths of models like GPT-5.5 or Claude. The fix isn't smarter models; it's scaffolding: prompts, skills, plugins, MCPs, hooks, and scripts that wrap the LLM like Darth Vader's suit or a Transformer's armor. This lets non-engineers customize agents for real work and recover the hours lost to repetitive prompting.

Nate Jones argues people get stuck because they overload prompts, missing how to layer these components. Without clear boundaries, agents produce generic output; with scaffolding, they apply your standards, pull live data, and run deterministic checks. Tradeoff: Prompts are instant but non-reusable; full scaffolds require upfront setup but yield 10x leverage for teams.

"It's like Darth Vader has a mech suit right and that's how Darth Vader works or Transformers have these huge metal suits and that's how they get the job done—this is how LLMs work they have these suits around them that help them get work done." (Nate Jones uses this metaphor to explain why raw LLMs need external structure for production workflows, shifting focus from model intelligence to customizable harnesses.)

Prompts for One-Offs, Skills for Reusable House Style

Prompts suit temporary, specific tasks, such as a unique client note with its own backstory. They're plain text: they carry no permissions or tools and are hard to share across a team. That's fine for one-offs but wasteful for repeats, where hours go to copy-paste tweaks.

Skills add reusability: they are Markdown docs that encode your "house style" for processes like PR reviews, marketing docs, or cold emails. Write a skill once (AI can help), then invoke it across tools like Codex or Claude. Example: a skill for outbound emails specifies the structure (paragraph layout, data pulled from docs, a strong close), so a batch stays consistent without per-use prompting.
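As a rough illustration of a skill being "just a Markdown doc," the Python sketch below renders one for the outbound email example. The `Skill` class, its fields, and the three rules are hypothetical stand-ins; real tools define their own skill file layout, which this does not try to reproduce.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """A reusable house-style document, stored as plain Markdown (hypothetical shape)."""
    name: str
    purpose: str
    rules: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        # Render a heading, a one-line purpose, and one bullet per rule.
        lines = [f"# {self.name}", "", self.purpose, ""]
        lines += [f"- {rule}" for rule in self.rules]
        return "\n".join(lines) + "\n"


# Illustrative rules only; a real skill would carry your own standards.
outbound_email = Skill(
    name="Outbound cold email",
    purpose="House style for first-touch emails to prospects.",
    rules=[
        "Keep it to three short paragraphs.",
        "Pull one concrete data point from the linked account doc.",
        "Close with a single, specific ask.",
    ],
)

skill_text = outbound_email.to_markdown()
print(skill_text)
```

Because the result is plain text, the same file can be handed to any agent that reads Markdown, which is the portability the section describes.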

Decision chain: If one-off → prompt. If repeatable or high-sensitivity → skill. Skills follow a power law: 20% deliver 80% of the value, so prioritize frequent, error-prone tasks (e.g., customer service, engineering reviews). Track your skills to avoid sprawl, and divide and conquer by workflow chunk.
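The decision chain and the 20% rule can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Task` fields, the "needs live data means plugin" shortcut, the doubling weight for error-prone tasks, and the sample numbers are not from the talk.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    runs_per_month: int    # how often the task recurs
    error_prone: bool      # do mistakes here cost real time or trust?
    needs_live_data: bool  # does it depend on systems like a CRM?


def choose_packaging(task: Task) -> str:
    """Map a task to 'prompt', 'skill', or 'plugin' (illustrative rules)."""
    if task.runs_per_month <= 1 and not task.error_prone:
        return "prompt"   # one-off, low stakes
    if task.needs_live_data:
        return "plugin"   # assumption: live data implies a fuller package
    return "skill"        # repeatable or sensitive, text-only


def prioritize(tasks: list[Task], top_fraction: float = 0.2) -> list[Task]:
    """Keep the top slice of tasks, ranked by frequency with error-prone ones weighted up."""
    ranked = sorted(
        tasks,
        key=lambda t: t.runs_per_month * (2 if t.error_prone else 1),
        reverse=True,
    )
    keep = max(1, round(len(ranked) * top_fraction))
    return ranked[:keep]


tasks = [
    Task("client note", 1, False, False),
    Task("PR review", 40, True, False),
    Task("outbound email", 20, True, True),
    Task("weekly status", 4, False, False),
    Task("refund handling", 30, False, True),
]

for task in tasks:
    print(f"{task.name}: {choose_packaging(task)}")
print("build first:", [t.name for t in prioritize(tasks)])
```

The scoring rule is deliberately crude; the point is only that a short, explicit ranking keeps the skill list from sprawling.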

Tradeoffs: Skills are universal/portable but text-only—no live integrations. Over-prompting mimics skills poorly, lacking enforcement.

"A prompt is what you do when you want to do something once... a prompt is not a great home for anything that you repeat constantly or consistently." (Jones contrasts to push viewers from prompt dependency, estimating massive time waste from non-packaged repeats.)

Plugins Package Full Workflows for Team Reuse

Plugins scale beyond skills: they are bundles that can include skills, MCPs, hooks, scripts, assets, commands, and metadata. Install one once, then run an entire workflow, such as outbound emails built on Salesforce data or architecture reviews. They are not just app add-ons; they are workflow wrappers that take over the job of the "human plugin," the person who copies and pastes between apps.
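To make the bundle idea concrete, here is a minimal Python sketch of a plugin manifest. The `Plugin` class, its field names, and the example file paths are all hypothetical, chosen for illustration; actual plugin formats define their own layout.

```python
from dataclasses import dataclass, field


@dataclass
class Plugin:
    """A bundle of workflow components (hypothetical shape, not a real format)."""
    name: str
    description: str
    skills: list[str] = field(default_factory=list)      # Markdown house-style docs
    connectors: list[str] = field(default_factory=list)  # MCPs / app connectors
    hooks: list[str] = field(default_factory=list)       # deterministic checks
    scripts: list[str] = field(default_factory=list)     # helper scripts
    commands: list[str] = field(default_factory=list)    # entry points for users

    def inventory(self) -> dict[str, int]:
        # Count what a teammate receives when they install the bundle.
        return {
            "skills": len(self.skills),
            "connectors": len(self.connectors),
            "hooks": len(self.hooks),
            "scripts": len(self.scripts),
            "commands": len(self.commands),
        }


# Example paths and names are made up for illustration.
outbound = Plugin(
    name="outbound-email",
    description="Draft outbound emails from CRM data in house style.",
    skills=["skills/outbound-email.md"],
    connectors=["salesforce"],
    hooks=["hooks/validate_email_json.py"],
    commands=["draft-batch"],
)

print(outbound.inventory())
```

One plugin here covers one job, which matches the advice to keep packages narrow rather than bundling a whole department's work.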

Why plugins now? Agents can handle rich work, and tools like Codex make plugins buildable by non-coders. Example: an editorial plugin for first-pass reviews reads the text several ways and flags rough spots, incoherence, and facts to check. On routine passes it is faster than a human, saving time without full automation.

Build logic: Identify repeatable structures with clear boundaries (e.g., refunds vs. upgrades in customer success—not one mega-plugin). Plugins aren't app-store shopping; they're active packaging: "What part of my work has enough repeatable structure that the agent should be able to inherit it?"

Tradeoffs: Setup cost is high (though lower in 2026, with no engineering needed). The key skill is bounding workflows correctly, since bloated or overly wide packages are prone to failure.

"You are literally the human plugin cuz you copy from one app you paste into the chat... if you don't want to be the human plugin consider making an actual plugin." (Jones demystifies plugins, showing serious AI users already do this manually—formalize for leverage.)

MCPs/Connectors for Live Data, Hooks/Scripts for Determinism

MCPs (Model Context Protocol servers) and app connectors plug agents into live systems: Slack, Figma, GitHub, Salesforce. Like old-school internet plugs, they fetch real data rather than letting the model imagine it. Plugins often contain MCPs but add workflow layers on top (e.g., data → process → output).
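The split between data access and the workflow layer can be sketched as follows. This does not use any real MCP SDK: `Connector`, `FakeCRM`, `draft_email_brief`, and the sample record are hypothetical, with an in-memory dictionary standing in for a live system.

```python
from typing import Protocol


class Connector(Protocol):
    """Anything that can fetch an account record from a live system."""
    def fetch_account(self, account_id: str) -> dict: ...


class FakeCRM:
    """In-memory stand-in for a live system such as Salesforce."""
    def __init__(self, records: dict[str, dict]) -> None:
        self._records = records

    def fetch_account(self, account_id: str) -> dict:
        # A real connector would call the external system here.
        return self._records[account_id]


def draft_email_brief(crm: Connector, account_id: str) -> str:
    """Workflow layer a plugin adds on top of the connector: data -> process -> output."""
    account = crm.fetch_account(account_id)                       # data
    hook = f"{account['name']} renewed {account['seats']} seats"  # process
    return f"Brief for {account['name']}: open with '{hook}'."    # output


crm = FakeCRM({"acme": {"name": "Acme Corp", "seats": 120}})
brief = draft_email_brief(crm, "acme")
print(brief)
```

The connector only answers "what is true right now"; the function around it is where the plugin's process lives, so either side can be swapped without touching the other.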

Hooks and scripts handle the parts you shouldn't trust to an LLM: formatting code (run the formatter), validating schemas or JSON, executing tests, and enforcing a review before the agent stops. These steps are deterministic, not left to the model's "best judgment." Embed them in plugins to close the loop.
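A deterministic check can be very small. The sketch below validates an agent's JSON output with Python's standard `json` module; the function name, the required keys, and the idea of returning a list of problems are assumptions for illustration, not a specific tool's hook interface.

```python
import json


def validate_json_output(raw: str, required_keys: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return ["top-level value must be an object"]
    # Report every missing key, in a stable order.
    missing = sorted(required_keys - set(payload))
    return [f"missing key: {key}" for key in missing]


required = {"subject", "body"}  # hypothetical schema for an email draft

good = validate_json_output('{"subject": "Hi", "body": "Hello there"}', required)
incomplete = validate_json_output('{"subject": "Hi"}', required)
broken = validate_json_output("not json at all", required)

print(good, incomplete, broken)
```

The same pattern applies to tests and formatters: run the real tool, read its result, and block or retry on failure instead of asking the model whether it thinks the output is valid.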

Confusion fix: MCPs ≠ plugins (data access vs. full package); hooks/scripts ≠ MCPs (internal checks vs. external plugs). All Lego bricks; plugins pre-assemble into structures.

"Some things ought to be deterministic by which I mean some things should not be left to the model... if the tests need to pass run the test don't ask the model to imagine running the test." (Emphasizes reliability—agents shine on ambiguity, falter on facts; scripts ensure quality.)

Marketplaces Evolve Beyond App Stores

Plugins are distributed through marketplaces, but calling them "extensions" undersells them. Think of it as workflow sharing: a team installs a PR review plugin once and inherits your standards, tools, and checks. That enables semantic workflow design, which starts by asking what the right unit is. (E.g., separate refund and upgrade plugins rather than one monolithic customer success plugin.)

Result: Scaffolded agents deliver 10x what generic prompting does. The LLM is the same; the output is better because humans supply the structure. Non-engineers can build this now: what was impossible in 2025 is routine in 2026.

"If you think of plugins as workflow packaging you're going to ask a much sharper question... 'What part of my work has enough repeatable structure?'" (Challenges passive consumption, urging proactive bounding for team-scale reuse.)

Key Takeaways

Audit workflows: One-off → prompt; repeatable style → skill (Markdown); full package → plugin.
Prioritize 20% high-value skills/plugins (frequent, sensitive tasks like PRs/marketing).
Bound plugins tightly: One job per plugin (e.g., refunds ≠ full customer success) to avoid failure.
Embed determinism: Hooks/scripts for formatting/tests/validation inside plugins.
Use MCPs/connectors for live data (Salesforce/GitHub); plugins wrap them with process.
Build now: Non-coders can; formalize your "human plugin" habits for 10x team leverage.
Test trust: Run a checklist before installing. Does the plugin match your standards? Does it pull the right data?
Distribute via marketplaces: Sharable team assets, not solo hacks.