AI Agents Need Scaffolding: Prompts to Plugins

Nate B Jones (go watch the original →)

Most wasted AI time comes from over-relying on prompts for repeatable work. Use skills for house styles, plugins for workflows, MCPs for live data, and hooks/scripts for determinism to build reusable agent 'mech suits'.

The Scaffolding Problem in AI Agents

AI agents aren't just LLMs—they require a 'mech suit' of components like prompts, skills, plugins, MCPs, hooks, and scripts to handle real, repeatable work. The core issue: people waste ~40% of AI time stuffing everything into prompts, which fail for workflows due to lack of reusability, permissions, tools, or determinism. With advancing models like GPT 5.5 excelling at 'messy multi-part work' (planning, tool use, ambiguity), the bottleneck shifts to scaffolding. Without it, users act as 'human plugins'—manually copying data, checking outputs—losing hours. The opportunity: non-engineers can now build custom suits for team workflows, turning generic models into specific, 10x effective agents via proper layering.

"It's like Darth Vader has a mech suit right and that's how Darth Vader works or Transformers have these huge metal suits and that's how they get the job done—this is how LLMs work they have these suits around them that help them get work done." (Nate Jones uses this metaphor to demystify why raw LLMs need external structure for productivity, emphasizing agents as composite systems.)

Layering Components: From One-Offs to Full Workflows

Start with prompts for temporary, specific tasks: a complex one-off client note with custom backstory. Prompts are simple text, but they don't package processes, carry no tools or permissions, and waste time when repeated: hours of re-prompting across teams. Transition to skills for reusable 'house styles': markdown docs encoding team processes like PR reviews, marketing docs, or outbound emails (e.g., structure with paragraphs, data pulls, strong closes). Skills are LLM-agnostic (they work in Codex, Claude), AI-generatable (even 'skills to write skills'), and follow a power law: focus on the 20% of high-value skills that cover 80% of repeated work. Example: a skill for cold outbound vs. a prompt for a single note.
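A skill like the cold-outbound example is ultimately just a markdown file the agent reads. A minimal sketch, loosely following the SKILL.md convention used by Claude's skills (the frontmatter fields and all content here are illustrative, not an official template):

```markdown
---
name: cold-outbound-email
description: House style for first-touch outbound emails. Use when drafting
  cold outreach to prospects.
---

# Cold Outbound Email

1. Open with one sentence tied to the prospect's company, not ours.
2. Body: two short paragraphs max; pull one concrete data point if available.
3. Close with a single, low-friction ask (a 15-minute call, not "thoughts?").
4. No buzzwords, no exclamation points, under 120 words total.
```

Because it is plain markdown, the same file can travel across tools, and the model itself can draft it for the team to edit.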

Scale to plugins for installable workflow bundles: wrap skills + MCPs (data connectors), hooks, scripts, assets, metadata. Unlike skills (process-only), plugins handle full flows like Salesforce-integrated outbound emails. They're team-sharable, avoiding manual reconstruction. Plugins aren't mere app add-ons; they're Lego assemblies of components for bounded units (e.g., separate plugins for refunds, activations, upgrades in customer success—not one mega-plugin). Building them identifies workflow edges: "Your job is to understand the semantic meaning of the workflow and to say this is a good unit of work that has a neat edge and boundaries around it."
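The 'Lego assembly' framing maps to a concrete directory layout. A sketch loosely modeled on Claude Code's plugin format (exact filenames and manifest fields are assumptions here and vary by platform; check your tool's docs):

```
refund-workflow-plugin/
├── .claude-plugin/
│   └── plugin.json      # metadata: name, description, version
├── skills/
│   └── refund-email/
│       └── SKILL.md     # house style for refund communications
├── hooks/
│   └── hooks.json       # deterministic checks (validate refund JSON, run tests)
└── .mcp.json            # MCP connectors (e.g., Salesforce, billing system)
```

Note the tight boundary: this plugin does one job (refunds), with activations and upgrades as separate plugins.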

"A plugin is something that your team can use without everyone manually reconstructing the setup... here's the workflow package that you can install and all of it will just get magically done for you." (Jones contrasts plugins' power for real work—live data, revisions, skills—with prompt limitations, highlighting sharability as key leverage.)

Data Access and Determinism: MCPs, Hooks, Scripts

MCPs and app connectors provide live connections to work systems (Salesforce, Slack, Figma, GitHub): the agent fetches real data instead of imagining it. Plugins often contain MCPs but add workflow around the data (process, review). SaaS tools increasingly ship pre-built MCPs, reducing the need to build your own.
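In spirit, an MCP tool is a named function the agent can call to get real data back. A minimal plain-Python stand-in (the `FAKE_CRM` data and `fetch_account` name are hypothetical; a real connector would use the MCP SDK and talk to the live system):

```python
# Stand-in for an MCP-style data connector: the agent calls a named tool
# and receives live system data, instead of the model imagining values.
# FAKE_CRM and fetch_account are illustrative, not a real API.
FAKE_CRM = {
    "acct-001": {"name": "Acme Corp", "arr": 120_000, "stage": "renewal"},
}

def fetch_account(account_id: str) -> dict:
    """Return the real CRM record, or an explicit error the agent can see."""
    record = FAKE_CRM.get(account_id)
    if record is None:
        return {"error": f"no such account: {account_id}"}
    return record

print(fetch_account("acct-001")["stage"])  # renewal
```

The key design point is the explicit error branch: a connector returns "no such account" rather than letting the model invent a record.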

Hooks and scripts handle the parts of a workflow you shouldn't trust an LLM with: deterministic validation (formatting code, validating JSON against a schema, running tests, pre-stop reviews). Don't rely on model 'judgment'; script it. These fit inside plugins and ensure reliability. A common confusion is mistaking them for MCPs: hooks and scripts enforce workflow steps, they don't fetch data.
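For example, a pre-send hook for the outbound-email flow might validate the model's draft deterministically rather than asking the model to double-check itself. A minimal sketch (the field names and length limit are assumptions for illustration):

```python
import json

REQUIRED_FIELDS = {"to", "subject", "body"}  # assumed schema for the draft

def validate_email_draft(payload: str) -> list[str]:
    """Deterministic check: parse the JSON draft and verify required fields.
    Returns a list of problems; an empty list means the draft passes."""
    try:
        draft = json.loads(payload)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(draft, dict):
        return ["draft must be a JSON object"]
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - draft.keys())]
    if len(draft.get("subject", "")) > 120:
        problems.append("subject exceeds 120 characters")
    return problems

print(validate_email_draft('{"to": "a@example.com"}'))
# ['missing field: body', 'missing field: subject']
```

Wired in as a hook, the workflow only proceeds when the returned list is empty; the same check runs the same way every time, which is the point.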

"Hooks and scripts are for the parts of your workflow where you should not rely on the model remembering to be careful... some things ought to be deterministic by which I mean some things should not be left to the model." (This quote stresses designing agents with human-enforced reliability, preventing failures in production workflows.)

Decision Framework and Tradeoffs

Choose by scale/repeatability:

  1. Prompt: one-off, momentary.
  2. Skill: reusable style/process.
  3. Plugin: full, installable workflow (pros: 10x reuse, team-scale; cons: upfront build time, boundary definition skill needed).
  4. Embed MCPs/hooks/scripts as needed.
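The four steps above can be sketched as a small decision helper (my encoding of the framework, not an official tool; the boolean flags are simplifications):

```python
def choose_scaffolding(repeatable: bool, full_workflow: bool,
                       needs_live_data: bool, needs_determinism: bool) -> list[str]:
    """Map a task's properties to scaffolding components per the framework above."""
    if not repeatable:
        return ["prompt"]              # one-off, momentary work
    parts = ["plugin" if full_workflow else "skill"]
    if needs_live_data:
        parts.append("MCP")            # live system data, not imagined
    if needs_determinism:
        parts.append("hooks/scripts")  # checks the model shouldn't "judge"
    return parts

print(choose_scaffolding(True, True, True, True))
# ['plugin', 'MCP', 'hooks/scripts']
```

A one-off client note returns `['prompt']`; the Salesforce-integrated outbound flow returns the full plugin bundle.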

Tradeoffs: prompts are fast but don't scale; skills are universal but sprawl if unmanaged (mitigate with power-law focus); plugins are powerful but require workflow auditing (too big is fragile, too small is overhead). Non-engineers can build them now (2026 tools simplify the process), e.g., an editorial plugin for multi-source first-pass reviews that flags rough text, incoherence, and factual issues faster and better than a human working alone. Scaffolding makes generic LLMs 'smarter' through human structure, not model upgrades.

"You are literally the human plugin cuz you copy from one app you paste into the chat... if you don't want to be the human plugin consider making an actual plugin." (Jones reveals users already do plugin work manually, empowering non-coders to automate.)

Key Takeaways

  • Audit workflows: Identify repeatable structures (20% skills/plugins = 80% value) to cut prompt waste.
  • Bound plugins tightly: One job per plugin (e.g., split customer success into refunds/activations).
  • Determinism first: Use hooks/scripts for validation/tests—never model guesswork.
  • Start small: Write markdown skills for house styles; bundle into plugins for teams.
  • Leverage marketplaces: Plugins as sharable 'Lego structures', not passive app store shopping.
  • Non-engineers: Build via no-code 2026 tools; test with checklists/trust questions.
  • Why now: GPT 5.5+ handles messiness; focus shifts to scaffolding for 10x agent gains.
  • Replicate: Use decision tree (prompt/skill/plugin/MCP) + workflow edges skill for teams.

summary by x-ai/grok-4.1-fast. probably wrong about something. check the source.