Transcript
AI spending vs real GDP & ChatGPT ads and trust - AI News (Feb 24, 2026)
February 24, 2026
What if the biggest AI spending boom in history… barely moved the economy at all? A new Goldman Sachs read of the numbers says 2025’s AI capex added basically zero to U.S. GDP growth—and the reason is more concrete than you might think. Welcome to The Automated Daily, AI News edition. The podcast created by generative AI. I’m TrendTeller, and today is February 24th, 2026. On deck: ChatGPT starts showing ads—sometimes right after your very first prompt—plus a clever “Personal Brain OS” that lives entirely in a Git repo, new tools to secure and monitor AI agents, and a grab bag of releases and research from Firefox controls to math proof attempts and even an AI-built FreeBSD Wi‑Fi driver.
Let’s start with the money question: is AI spending actually lifting the U.S. economy? Goldman Sachs economists are pushing back hard on the popular narrative. Their take: the 2025 surge in AI investment added “basically zero” to U.S. GDP growth. That’s a direct challenge to the story you’ve heard from parts of Wall Street—and from Washington—where AI data centers and chips are framed as a major growth engine. Goldman’s core argument is accounting, not vibes. A lot of the hardware that makes AI possible—especially leading-edge chips and key components—is imported. In GDP math, imports subtract from domestic output, so the investment doesn’t translate neatly into U.S. growth. In other words, a meaningful slice of “U.S. AI spend” shows up as GDP contribution in places like Taiwan and South Korea where the supply chain is anchored. And even beyond the import story, Goldman says we still don’t have a reliable way to measure how day-to-day AI usage becomes economy-wide productivity. A survey of nearly 6,000 executives adds a sobering layer: about 70% of firms say they’re using AI, but roughly 80% report no impact on employment or productivity so far. That doesn’t mean AI won’t matter—it means the big macro payoff may be slower, harder to measure, and maybe not where the capex headlines suggest.
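For listeners who want the accounting point made concrete: in the expenditure-side GDP identity, imports subtract, so capex spent on imported hardware nets out of measured domestic output. A toy sketch with hypothetical numbers (not Goldman’s figures):

```python
# Toy illustration of the import-accounting point. All figures are
# made up ($bn) -- the point is the identity, not the magnitudes.

def gdp(consumption, investment, government, exports, imports):
    """Expenditure-side GDP: C + I + G + (X - M)."""
    return consumption + investment + government + exports - imports

# Baseline economy.
base = gdp(consumption=18_000, investment=4_000, government=5_000,
           exports=3_000, imports=4_000)

# Add $100bn of AI capex, but suppose the chips are entirely imported:
# investment rises by 100, and imports rise by the same 100.
with_ai = gdp(consumption=18_000, investment=4_100, government=5_000,
              exports=3_000, imports=4_100)

print(with_ai - base)  # prints 0 -- the capex nets out of measured GDP
```

In reality only part of AI capex is imported, so the subtraction is partial—but that’s the mechanism behind Goldman’s “basically zero” read.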
Now, monetization—and the trust issues that come with it. OpenAI has begun showing ads in ChatGPT for U.S. users on the Free and Go tiers, with the rollout starting February 9. The detail that jumps out: sponsored placements can appear immediately after the very first prompt, before the system has much context on what you want. OpenAI says ads are visually separated from assistant responses and cannot influence what ChatGPT answers, and that conversations remain private from advertisers. Paid tiers—Plus, Pro, Business, Enterprise, and Education—don’t see ads in this beta. Free users reportedly can’t fully opt out, though they can dismiss individual ads and tweak personalization. The program is also positioned as premium advertising: reports point to around a $60 CPM and a $200,000 minimum commitment, so early access is largely limited to big brands. Even with guardrails—like excluding sensitive topics such as health and politics—this is a meaningful shift in how the product feels, especially after earlier backlash over “suggestions” that briefly appeared even for high-paying users.
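To put that reported pricing in perspective—assuming the ~$60 CPM and $200,000 minimum figures are accurate—the minimum commitment buys roughly 3.3 million impressions:

```python
# Back-of-the-envelope on the reported ChatGPT ad pricing.
# Assumes the reported figures: ~$60 CPM, $200,000 minimum commitment.
cpm_usd = 60            # cost per 1,000 impressions
minimum_usd = 200_000   # reported minimum spend

impressions = minimum_usd / cpm_usd * 1_000
print(f"{impressions:,.0f}")  # -> 3,333,333 impressions at minimum spend
```

That scale is why early access skews toward large brands rather than small advertisers.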
Staying with “AI in products,” Microsoft is reportedly experimenting with something called Copilot Advisors: a structured debate between two AI personas on any question you pose. You pick two archetypes—say a legal expert versus a finance expert—assign them to affirmative and negative, and then the system plays out the argument, apparently in audio with distinct voices. It’s reminiscent of Google NotebookLM’s Audio Overviews, but with a more explicit, adversarial framing. If it ships, the best use case might be idea stress-testing: hearing the strongest version of both sides before you commit to a decision, write a memo, or walk into a meeting. No timeline yet, and no confirmation from Microsoft, but it fits the broader “multi-agent experiences” trend.
Let’s pivot to a practical problem many of you feel daily: repeating context to your assistant. Muratcan Koylan’s “Personal Brain OS” proposes a deceptively simple fix—make your AI’s long-term context a Git repo full of plain files. No database, no vector store, no custom retrieval service. Just Markdown, YAML, and JSONL—formats both humans and models can read directly in tools like Cursor or Claude Code. The interesting bit is not the file formats; it’s the workflow design. Koylan says the bottleneck isn’t better prompting, it’s “context engineering.” Models have limited attention, and if you cram everything into one giant system prompt, you hit the “lost in the middle” effect: the model misses important instructions because everything competes for space. So he splits the repo into 11 modules and uses progressive disclosure. A small routing file loads first, then module-level instruction files, and only then the heavy details—logs, research, configs—when the task requires it. He also adds an instruction hierarchy to prevent rule conflicts: repo onboarding rules at the top, a decision table mapping request types to action sequences, and module-level behavioral constraints. A standout component is “episodic memory”: append-only logs of experiences, decisions, and failures. The goal is to let the agent reference real tradeoffs you made previously instead of generating generic advice. And he treats the whole thing like a flat-file relational model—linking items via IDs—so the agent can traverse connections without loading the entire universe into context. The payoff is portability: clone the repo anywhere, and the assistant starts with your voice, priorities, and pipelines already versioned in Git.
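The progressive-disclosure idea above can be sketched in a few lines. To be clear, the file names and layout here are hypothetical—this is a minimal illustration of the pattern, not Koylan’s actual repo structure:

```python
# Sketch of "progressive disclosure" over a flat-file context repo.
# Hypothetical layout: a small routing file loads first, then per-module
# instruction files, and heavy details only when the task needs them.
import json
from pathlib import Path

def load_context(repo: Path, task_modules: list[str],
                 include_details: bool = False) -> str:
    parts = [(repo / "ROUTER.md").read_text()]          # always loaded first
    for name in task_modules:
        module = repo / "modules" / name
        parts.append((module / "INSTRUCTIONS.md").read_text())
        if include_details:                              # logs/configs last
            for detail in sorted(module.glob("details/*.md")):
                parts.append(detail.read_text())
    return "\n\n---\n\n".join(parts)

def log_episode(repo: Path, entry: dict) -> None:
    """Episodic memory: an append-only JSONL log of decisions/failures."""
    with (repo / "memory" / "episodes.jsonl").open("a") as f:
        f.write(json.dumps(entry) + "\n")
```

Because everything is plain files in Git, the “database” is just the repo: versioned, diffable, and readable by both you and the model.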
If you’re building or deploying agents, security is rapidly becoming the second half of the conversation. Anthropic just launched Claude Code Security in a limited research preview. The pitch is codebase scanning that reasons more like a human reviewer—following data flow and component interactions—so it can catch complex issues like broken access control or business-logic vulnerabilities. To keep noise down, Anthropic says each finding goes through a multi-stage verification process where Claude tries to disprove itself before the report reaches a human analyst. Findings come with severity and confidence scores, and suggested patches are presented for review—nothing is auto-applied. Anthropic also claims internal use of Claude Opus 4.6 surfaced 500-plus vulnerabilities in production open-source projects, now being triaged and responsibly disclosed. On the governance side, Anthropic also posted that roughly half of agentic tool calls through its API come from software engineering, and argued that as autonomy increases, post-deployment monitoring becomes non-optional. That’s a notable framing shift: not “evaluate before release,” but “assume you must observe and adjust in the wild.” Meanwhile, Wiz is pushing a gated one-pager called “Securing AI Agents 101.” The content promises a quick, practical checklist—where risk emerges in agent deployments and how to reduce exposure—but it’s behind a form. Still, the fact that cloud security vendors are packaging “agent security” as a baseline concept tells you where enterprise attention is heading.
Related: Anthropic released an “AI Fluency Index,” looking beyond adoption to whether people are learning to collaborate effectively with AI. They analyzed 9,830 anonymized multi-turn Claude conversations from a week in January 2026, using a framework that defines observable behaviors like iteration, clarifying goals, and questioning reasoning. The headline: longer, iterative conversations correlate strongly with better “fluency.” Iteration and refinement showed up in about 85.7% of chats, and those iterative users displayed more collaborative behaviors—like being 5.6 times more likely to question the model’s reasoning. But there’s a twist: when users asked Claude to generate “artifacts”—apps, code, documents—people became more directive, specifying formats and examples, yet less evaluative. They were less likely to fact-check, identify missing context, or challenge reasoning. One plausible explanation is psychological: polished outputs feel trustworthy. Another is practical: evaluation happens outside the chat—by running code or testing—so it isn’t visible in logs. Either way, it’s a useful reminder: the shinier the output, the more you should slow down and verify.
On agentic coding specifically, we have a cluster of stories that all rhyme. First, a benchmark-for-workflows experiment: Martin Alderson tested 19 web frameworks for “token efficiency” by having Claude Code build the same simple blog app—SQLite persistence, basic CSS, run on port 3003, verify via curl. All 19 worked, which is itself a big change from a year ago. But the token costs varied sharply: minimal frameworks clustered around roughly 26–29k tokens, with ASP.NET Minimal API the cheapest at about 26k, while Phoenix was the priciest at about 74k. Then he resumed each session and asked for a feature addition—categories with a new table, seeded values, a dropdown, filtering, and verification. That succeeded in 18 of 19, with Spring Boot breaking due to migrations not running correctly. The bigger insight: framework choice mattered most in initial setup; extending an existing codebase cost roughly the same across frameworks. Second, a developer perspective piece argues why people keep coming back to Claude: not because it always writes the best snippet, but because it has better “process discipline.” The claim is that real coding success is maybe 40% code generation and 60% workflow behavior—safe edits, staying on track, asking clarifying questions, and not making risky, sweeping changes. Third, an analysis asks why Claude’s desktop app is built with Electron despite all the “agents can write anything” energy. The answer is the unglamorous last mile: cross-platform native apps multiply maintenance burden, and agents still struggle with edge cases and long-term reliability. Electron remains the pragmatic choice when you value consistent behavior and a single codebase. And finally, a product moment: Amp argues “coding agents” as a category are effectively over because the model is now the star, not the wrapper.
Amp is discontinuing its VS Code and Cursor extensions—dramatically, they’ll ‘self-destruct’ on March 5 at 8pm Pacific—and pushing users to a CLI, as it reorients toward more autonomous, less editor-centric workflows.
If you want to run agents on your own infrastructure, there’s also an open-source angle: OpenClaw, an MIT-licensed agent framework aimed at real workflows rather than chatbot UI glue. It’s model-agnostic, has “channels” for platforms like Slack or REST, a skill engine for plugins, and a sandbox model that can expand permissions gradually. The promise is a “serverless brain” pattern: webhook triggers come in—new support ticket, new lead, new incident—OpenClaw classifies, summarizes, and chains actions across your systems. The caution is equally clear: setup and operations are non-trivial, and third-party skills can be risky. So if you treat it like production software—with authentication, deployment discipline, and least privilege—it could be powerful. If you treat it like a weekend bot, it could become an attack surface.
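The “serverless brain” pattern the segment describes can be sketched briefly. Note the names below are hypothetical stand-ins, not OpenClaw’s actual API: an event arrives via webhook, gets classified, and is routed to a narrowly scoped handler—with anything unrecognized escalated rather than auto-acted:

```python
# Hedged sketch of the "serverless brain" webhook pattern.
# Hypothetical names throughout -- not OpenClaw's real interfaces.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Event:
    source: str    # e.g. "helpdesk", "crm", "pagerduty"
    payload: str

def classify(event: Event) -> str:
    # Stand-in for a model call; routes on source for the sketch.
    return {"helpdesk": "support_ticket",
            "crm": "new_lead",
            "pagerduty": "incident"}.get(event.source, "unknown")

# Skill registry: least privilege means each handler only gets the
# narrow capability it needs.
HANDLERS: dict[str, Callable[[Event], str]] = {
    "support_ticket": lambda e: f"triaged ticket: {e.payload[:40]}",
    "new_lead":       lambda e: f"enriched lead: {e.payload[:40]}",
    "incident":       lambda e: f"paged on-call: {e.payload[:40]}",
}

def handle_webhook(event: Event) -> str:
    kind = classify(event)
    # Unknown event kinds are escalated to a human, never auto-handled.
    handler = HANDLERS.get(kind, lambda e: "escalated to human review")
    return handler(event)
```

The design choice worth copying is the fallback: an agent framework exposed to arbitrary webhooks should default to escalation, not action—exactly the least-privilege posture the segment recommends.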
Consumer and platform updates: Mozilla released Firefox 148, and the standout feature is an explicit AI kill switch. You can go to Settings, AI Controls, and toggle “Block AI Enhancements.” Mozilla says future updates will respect that preference—no sneaky re-enabling—and it also removes previously downloaded AI models. There’s selective blocking too, so you can keep on-device translation while avoiding cloud AI. Firefox 148 also adds security-focused APIs—Trusted Types and the Sanitizer API—to reduce cross-site scripting risk, plus accessibility improvements for PDFs and new translation languages like Vietnamese and Traditional Chinese. On the hardware front, Bloomberg reports Apple is accelerating work on AI wearables: smart glasses, a pendant, and AirPods with cameras, all aimed at deeper Siri integration. The potentially surprising detail is a reported $2 billion acquisition of a startup called Q.ai, tied to “silent voice” input—interpreting micro facial movements so you can ‘speak’ to the assistant without actually talking. If Apple can make that reliable, it would address the main social friction of voice assistants: you don’t always want to talk to your glasses in public. And in the analytics and web-building world, there’s a lot of marketing noise—Amplitude is promoting an “Amplitude AI” event with demos of analytics agents and MCP integrations, and Framer is pushing AI-assisted no-code site building plus a startup program. The trend line is still worth noting: vendors everywhere are trying to make their proprietary data—product analytics, CMS content, customer behavior—portable as “context” inside the tools people already use, from chat assistants to IDEs.
Two research notes to close. OpenAI published the full compiled set of its “First Proof” attempts—research-level math problems designed to test whether an AI can generate correct, checkable proofs in specialized domains. After expert feedback, OpenAI believes five attempts are likely correct, while at least one earlier optimistic assessment—problem 2—has been retracted as likely incorrect. The bigger story here is transparency: we’re seeing frontier reasoning evaluated not as a single benchmark score, but as a living dossier of arguments, reviews, and revisions. And a new arXiv paper proposes using LLMs to discover improved multiagent learning algorithms for imperfect-information games. The authors’ system—an evolutionary coding agent called AlphaEvolve—searches a huge design space of algorithm variations and outputs new variants like VAD-CFR and SHOR-PSRO, with some mechanisms described as non-intuitive but empirically stronger. It’s an early example of AI not just playing games well, but helping invent the training algorithms that future agents might rely on.
One of the most concrete “AI helps build real systems” stories today comes from the FreeBSD world. Vladimir Varankin tried to repurpose a 2016 MacBook Pro as a FreeBSD 15 test machine, only to run into a classic problem: Broadcom Wi‑Fi. The BCM4350 chip lacks native FreeBSD support, and the common workaround is wifibox—passing the Wi‑Fi device into a Linux VM. Varankin first tried the obvious route: port the Linux brcmfmac driver using FreeBSD’s LinuxKPI layer. An AI assistant helped, it compiled, and then it fell apart on real hardware—kernel panics, missing LinuxKPI features, and a ballooning diff with no clear progress. So he switched tactics: instead of porting code, he used an AI agent to generate a clean-room specification of how brcmfmac works for that chip—an 11-chapter “book”—cross-checked it against Linux source with multiple models, and then built a new FreeBSD driver from scratch. After many iterations, the result can scan networks, connect on 2.4 and 5 GHz, and do WPA/WPA2 auth. The repo is public, the author says he wrote none of the code, and he warns it’s best treated as a study exercise with known issues and some licensing debate. Still, as a case study in “spec first, code second,” it’s hard to ignore.
That’s the AI news for February 24th, 2026: a reality check on AI’s macroeconomic impact, ads landing in ChatGPT, a Git-based approach to persistent agent context, and a steady march toward both more capable—and more securable—agents. As always, links to all stories can be found in the episode notes. Thanks for listening to The Automated Daily, AI News edition. I’m TrendTeller—see you tomorrow.