feed
today

YC Founder Firesides interview with Vori CEO Brandon Hill on digitizing supermarkets from clipboards/faxes (starting with inventory reordering app), family roots in grocery, early customer wins, and $22M Series B. No robots or autonomy.
Vori's AI OS Digitizes $1.5T US Grocery Retail

Explains the "agent judge layer": a separate LLM validator at the action boundary that classifies risks into four buckets and gates agent tools with a four-way decision rather than a yes/no check; Lindy added such a layer after unauthorized emails. Prompts & playbook here.
Agent Judge Layer Guards Production Actions
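A minimal sketch of what a four-way gate at the action boundary could look like. The summary does not name the four outcomes, so the `Verdict` values, the `ToolCall` type, and the rules inside `judge` are all hypothetical; in the described design the judge is a separate LLM call, stubbed here with plain rules so the example runs on its own.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    # Hypothetical four-way outcome; the source does not name the options.
    ALLOW = "allow"
    ALLOW_WITH_EDITS = "allow_with_edits"
    ESCALATE = "escalate_to_human"
    BLOCK = "block"


@dataclass
class ToolCall:
    tool: str
    args: dict


def judge(call: ToolCall) -> Verdict:
    """Stand-in for the separate LLM validator; these rules are illustrative only."""
    if call.tool == "send_email":
        recipient = call.args.get("to", "")
        if not recipient:
            return Verdict.BLOCK
        if not recipient.endswith("@example.com"):
            return Verdict.ESCALATE  # external recipient: ask a human first
        if "password" in call.args.get("body", "").lower():
            return Verdict.ALLOW_WITH_EDITS  # needs redaction before sending
    return Verdict.ALLOW


def gated_execute(call: ToolCall, run_tool):
    """Run the tool only on ALLOW; otherwise hand the verdict back to the caller."""
    verdict = judge(call)
    if verdict is Verdict.ALLOW:
        return verdict, run_tool(call)
    return verdict, None
```

The point of the extra outcomes is that the caller can route "escalate" and "allow with edits" differently from a flat refusal.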

Walkthrough of Gemini's File Search API now embedding images alongside text for cross-modal retrieval, metadata filtering (e.g., department=legal), and page-level citations — Colab demo covers upload/query flow.
Gemini File Search Adds Multimodal RAG

Outlines a five-level bash security framework for AI coding agents—user prompts (L1), system prompts (L2), blacklist hooks like Damage Control (L3), whitelists (L4), no bash at all via MCP servers or Pi extensions (L5)—demoed with destructive prompts on Claude Code and Pi Agent, including a GPT exploit via fake package.json.
5 Levels to Eliminate Bash Risk in AI Agents
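A rough sketch of how the blacklist (L3) and whitelist (L4) levels might be combined in a pre-execution check. This is not the Damage Control hook or any tool's real API; the policy entries and the `is_allowed` helper are hypothetical, and a production hook would need far more careful parsing.

```python
import shlex

# Hypothetical policy; entries are illustrative, not taken from the video.
ALLOWED_COMMANDS = {"ls", "cat", "git", "pytest"}
BLOCKED_SUBSTRINGS = ("rm -rf", "--force")
# Characters that enable chaining, redirection, or command substitution.
FORBIDDEN_CHARS = set(";|&<>`$\n")


def is_allowed(command: str) -> bool:
    """Return True only if the command passes both the blacklist and the whitelist."""
    # Blacklist: reject known-destructive patterns outright.
    if any(bad in command for bad in BLOCKED_SUBSTRINGS):
        return False
    # Conservative: refuse anything that could chain or substitute commands.
    if any(ch in FORBIDDEN_CHARS for ch in command):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    # Whitelist: the executable itself must be explicitly approved.
    return bool(tokens) and tokens[0] in ALLOWED_COMMANDS
```

For example, `is_allowed("git status")` passes, while `is_allowed("ls -la ;rm x")` and `is_allowed("curl http://example.com")` are both rejected.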

NLW argues AI expands the demand frontier via six elasticities—price, access, complexity, continuity, personalization, relational—plus a lasting "human premium," using healthcare (e.g., continuous monitoring) as a case study for new work categories. Companion read.
AI Expands Economy's Demand Frontier, Creating Human-Premium Jobs

Walkthrough of Okara AI CMO, a $99/mo SaaS that analyzes your site and runs background agents for SEO audits, GEO optimization, blog writing, and Reddit/HN/X posts.
Okara AI CMO deploys site-analyzing marketing agents

Breakdown of Rippling's SEO playbook: segment-specific site pages (e.g., HCM for HR leaders), audience-topic cluster blogs grown 40x via Semrush analysis, state-by-state labor law pages, plus original research PR and "Rippling Plus" for AI resilience. Semrush trial.
Rippling's 3-Pillar Playbook Scales Traffic 75x

Step-by-step on cross-compiling llama.cpp for ARMv6 via dockcross (no Neon/OpenMP), loading Q4/Q8 Falcon-H1-Tiny-90M on Pi OS Lite with --no-mmap and 128-token context; 4-bit coherent but slow (~3s/token), 2-bit nonsense.
90M Falcon Runs on 2014 Raspberry Pi

Reaction to Lars Faye's article on "agentic coding" causing cognitive atrophy, non-determinism complexity, skill loss, vendor lock-in, and token costs—with creator pushing back on costs while agreeing on atrophy risks. Sponsored by Browserbase.
Agentic Coding Trap: Cognitive Debt Hits Hard
yesterday

Commentary on the McKinsey Lilli platform exploit (a $20 AI agent using SQL injection on 22 unauthenticated endpoints) as a procurement/strategy failure rather than a hygiene lapse, plus vendor responses from Anthropic, OpenAI et al. and a 6-question checklist. Full playbook here.
Lilli Hack: AI Procurement Ignores Agent Realities

Overview of Codex's new Chrome extension for signed-in browser access (Gmail, Salesforce, etc.) with allow/block lists, plus v0.129 CLI upgrades like Vim editing, plugin sharing, and hooks; v0.128 adds persisted goals and keymaps.
Codex Chrome Extension Enables Signed-In Browser Tasks

Narrator walks through Codex's new Chrome extension for signed-in browser automation (e.g., Gmail, Salesforce) plus CLI upgrades in v0.128/0.129 like Vim editing, better permissions, plugin sharing, hooks, and persisted goals—mostly explaining release notes.
Codex Chrome Extension Bridges Code to Real Browser Workflows

Quick demo and setup walkthrough for the Hermes Desktop App, a native UI wrapper for Nous Research's Hermes Agent that simplifies local multi-agent management, tool integration, and persistent memory on Windows/Mac/Linux. Brief OpenClaw comparison and basic usage examples included.
Hermes Desktop App Enables Easy Self-Evolving AI Agents

Walkthrough of enabling Codex's experimental /goal feature (add features.goals = true to config.toml) for autonomous long-running tasks, likened to a ReAct loop with budget handling, then a demo of it building a 2D survival game from a plan.
Codex /goal: Simple Harness for Hour-Long AI Coding Agents

Tutorial on enabling Codex's experimental /goal slash command (via config tweak) for long-running autonomous coding, contrasting it with ReAct loops, followed by a hands-off demo building a 2D arcade game "Rift Salvage" from a detailed plan—includes plugs for the creator's Claude Code & Codex course, free community, and consults.
Codex /goal Beats Claude Code for Autonomous Coding

A hands-on 1-hour screen-share tutorial deploying open-source Hermes Agent to a Hostinger VPS, connecting it to Telegram, adding your first skill and cron job, GitHub backup, plus comparisons to Claude Code/OpenClaw and scaling tips.
Build Hermes AI Agent: VPS Setup to Scaled Automations

Hour-long walkthrough of installing open-source Hermes Agent on a Hostinger VPS, wiring it to Telegram, adding a first skill/cron job, and setting up GitHub backup, plus its five pillars and a comparison with Claude Code/OpenClaw.
Build Self-Improving Hermes AI Agent on VPS

Walkthrough of Google Labs' Pomelli (free tool): enter a site URL to auto-extract brand, colors, and tone; the new Catalog pulls in the full product lineup; then generate per-product photo shoots (e.g., model try-on templates) and full social campaigns, and download them ready to post.
Pomelli Catalog Imports Products for Scaled Campaigns

Hands-on demo of Google's Pomelli (Google Labs experiment), walking through brand setup from a website URL, auto-pulling products into a new Catalog feature, generating AI product photos, and building/downloading social campaigns for a jewelry shop example.
Pomelli Catalog Scales On-Brand Ads from Product Sites
this week

Reaction to Thariq Shihipar's thesis on using self-contained HTML files (with SVG, tables, JS) over Markdown for Claude Code specs, plans, and reports—covers 2-4x token cost (offset by 1M context), five use cases, and why Claude Code's filesystem/MCP access shines, with examples.
HTML Beats Markdown for AI Specs at 2-4x Token Cost

Outlines a four-step AI workflow to catch subtle errors in high-stakes outputs like contracts or due diligence: finish the draft, extract claims into a table, validate against sources (supported/conflicts/no proof/needs human judgment), rewrite. Copy-paste prompts are in the presentation.
4-Step Audit Catches AI's 'Almost Right' Errors

Four-step audit for subtle AI hallucinations in high-stakes docs: 1) finish AI draft, 2) extract claims to table (w/ sources), 3) validate vs source via 4 labels (supported/conflicts/no-proof/needs-human), 4) rewrite—all in fresh chats. Prompts in presentation.
4-Step AI Audit Catches 'Almost Right' Errors
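One way to picture the claims table from steps 2 and 3, as a small sketch. The four labels come from the summary; the `Claim` record, the helper names, and the table layout are assumptions for illustration, and the actual labeling would be done by the model or a reviewer, not by this code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Label(Enum):
    SUPPORTED = "supported"
    CONFLICTS = "conflicts"
    NO_PROOF = "no proof"
    NEEDS_HUMAN = "needs human judgment"


@dataclass
class Claim:
    text: str
    source: str
    label: Optional[Label] = None  # filled in during the validation step


def claims_to_rewrite(claims):
    """Input for step 4: every claim that was not labeled 'supported'."""
    return [c for c in claims if c.label is not Label.SUPPORTED]


def render_table(claims):
    """Plain-text table with one row per extracted claim."""
    rows = ["claim | source | label"]
    for c in claims:
        label = c.label.value if c.label else "unreviewed"
        rows.append(f"{c.text} | {c.source} | {label}")
    return "\n".join(rows)
```

Keeping unlabeled claims in the rewrite list is a deliberately cautious choice: anything not yet checked is treated as unverified.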

Live demo of Archon (repo), a local framework for AI coding agents that uses YAML DAG workflows, git worktrees for parallel runs, and auto-loading skills to produce consistent PRs without repo conflicts. Covers setup on M4 Pro, transparency features, and tradeoffs like upfront workflow design.
Archon Fixes AI Agent Randomness with Harness Engineering

Live demo of Archon using "harness engineering": YAML DAG workflows, git worktrees for isolated parallel agents, and auto-loading skills with Claude Code for more consistent PRs. Model quality still matters, and workflows need upfront design.
Archon Makes AI Coding Agents Deterministic via Harness Engineering
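To illustrate the DAG idea behind such workflows, here is a small sketch that orders steps by dependency and groups independent steps into batches. It does not use Archon's YAML schema, which the summary does not show; the `workflow` mapping and step names are hypothetical, and only Python's standard `graphlib` is used.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
# This is an illustration, not Archon's actual workflow format.
workflow = {
    "plan": set(),
    "implement": {"plan"},
    "test": {"implement"},
    "lint": {"implement"},
    "open_pr": {"test", "lint"},
}


def run_order(dag):
    """Return batches of steps; steps within a batch do not depend on each other."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

Steps that land in the same batch (here `lint` and `test`) are the ones that could run in parallel, for instance in separate git worktrees.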

Breakdown of "agentic scaffolding" around LLMs in Codex: prompts for one-offs, skills for reusable "house styles," plugins for installable workflows, plus MCPs/connectors and hooks/scripts. Links to a Substack guide with decision trees and examples.
AI Agents Need Scaffolding: Prompts to Plugins Guide