Transcript

AI agents move into workplaces & Google’s agent platform shift - AI News (Apr 24, 2026)

April 24, 2026


An AI-generated photo of an escaped wolf reportedly rerouted a real police search—and now someone’s been arrested over it. That’s the kind of downstream impact synthetic media is starting to have. Welcome to The Automated Daily, AI News edition. The podcast created by generative AI. I’m TrendTeller, and today is April 24th, 2026. In the next few minutes: OpenAI and Google both push AI agents deeper into workplace systems, Google says most new code is now AI-generated, we get fresh clues about OpenAI’s next image model, and there’s a big shift coming in how coding assistants may be billed. Let’s get into it.

Let’s start with the biggest theme of the week: AI agents becoming actual coworkers inside enterprise workflows, not just chatbots. OpenAI has introduced “workspace agents” in ChatGPT. Think of these as shared agents for teams that can run long, multi-step processes in the cloud, keep memory, use connected tools, and keep working in the background or on a schedule. The key point isn’t that they can write code—it’s that they’re designed to operate under an organization’s existing permissions, with approvals required for sensitive actions like sending emails or editing spreadsheets. OpenAI is positioning this as the next step after GPTs: less single-prompt Q&A, more business process automation with governance, analytics, and monitoring baked in.
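For listeners who want to picture the approval model, here's a minimal sketch of the pattern being described: agents act freely under existing permissions, but sensitive actions block on an explicit human decision. The action names and approval callback are illustrative, not OpenAI's actual API.

```python
# Sketch of approval-gated agent actions: sensitive operations (like sending
# email or editing spreadsheets) require an explicit approval before running.
# All names here are hypothetical, for illustration only.

SENSITIVE_ACTIONS = {"send_email", "edit_spreadsheet"}

def run_action(action, params, approve):
    # Non-sensitive actions run under the org's existing permissions.
    if action not in SENSITIVE_ACTIONS:
        return f"executed {action}"
    # Sensitive actions require an explicit approval decision first.
    if approve(action, params):
        return f"executed {action} (approved)"
    return f"blocked {action} (approval denied)"

# Simulated reviewer policy that denies outbound email but allows edits:
deny_email = lambda action, params: action != "send_email"

print(run_action("summarize_doc", {"doc": "q2-plan"}, deny_email))
print(run_action("send_email", {"to": "team@example.com"}, deny_email))
```

The point of the pattern is that governance lives in one choke point rather than being re-implemented per integration.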

Google is pushing in the same direction, but with a platform message aimed squarely at IT and engineering orgs. Google Cloud launched the Gemini Enterprise Agent Platform, pitched as a unified place to build, deploy, govern, and optimize agents—effectively a new layer that absorbs where Vertex AI was headed. Google is emphasizing the enterprise checklist: agent identity, a registry of approved tools and agents, and a gateway that enforces policies meant to reduce prompt injection and data leakage. It also leans hard into evaluation and observability, including simulation and tools that group failures and suggest instruction refinements. The takeaway: agent pilots are no longer the hard part—operating them safely, repeatedly, and auditably is the real product.

Google also unveiled “Workspace Intelligence,” which is a different but related bet: making Google Workspace itself the shared context engine for agents. Instead of each app—Gmail, Drive, Docs, Sheets—being its own island, Google wants a semantic layer that links files, conversations, collaborators, and projects into something Gemini can reason over. “Ask Gemini” in Chat is being framed as the command center, with features like briefings, context-based retrieval, and cross-app actions. This matters because the next competitive frontier against Microsoft 365 isn’t who has the best model—it’s who has the best, safest access to your organization’s living knowledge base.

As agents spread, the plumbing to connect them to real systems is becoming its own battleground. Anthropic’s Claude team is arguing that many teams will move from one-off API hookups to the Model Context Protocol, or MCP. Their claim is basically about scaling maintenance: direct integrations multiply quickly, and command-line shortcuts don’t translate well to hosted agents. MCP is positioned as a standardized way for systems to expose capabilities—plus discovery and authentication—to many different agent clients. Whether MCP becomes “the standard” is still open, but the direction is clear: agent ecosystems are converging on shared protocols the way the web converged on HTTP.
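To make the scaling argument concrete, here's a small sketch of the pattern a protocol like MCP standardizes: a server publishes a discoverable catalog of capabilities with machine-readable schemas, and any agent client can list and invoke them through one generic interface. The method names and schemas below are illustrative, not the actual MCP wire format.

```python
# Sketch of standardized capability exposure: one generic server interface
# replaces N one-off integrations. Method names are illustrative only.

class ToolServer:
    def __init__(self):
        self._tools = {}

    def register(self, name, description, schema, fn):
        # Each tool carries an input schema so clients can discover how to
        # call it without bespoke glue code.
        self._tools[name] = {"description": description,
                             "input_schema": schema, "fn": fn}

    def handle(self, request):
        # One entry point serves discovery and invocation for every client.
        if request["method"] == "tools/list":
            return [{"name": n, "description": t["description"],
                     "input_schema": t["input_schema"]}
                    for n, t in self._tools.items()]
        if request["method"] == "tools/call":
            tool = self._tools[request["params"]["name"]]
            return tool["fn"](**request["params"]["arguments"])
        raise ValueError("unknown method")

server = ToolServer()
server.register(
    "get_ticket_status",
    "Look up a support ticket by id",
    {"type": "object", "properties": {"ticket_id": {"type": "string"}}},
    lambda ticket_id: {"ticket_id": ticket_id, "status": "open"},
)

# Any client, hosted or local, speaks the same two calls:
catalog = server.handle({"method": "tools/list"})
result = server.handle({"method": "tools/call",
                        "params": {"name": "get_ticket_status",
                                   "arguments": {"ticket_id": "T-42"}}})
print(catalog[0]["name"], result["status"])
```

Adding a second agent client here costs nothing; with direct integrations, it would mean another round of custom hookups.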

Now, a striking data point on how fast this is reshaping software work: Google says about 75% of newly created code is now generated by AI and then reviewed by human engineers. That’s a steep jump from roughly 25% in late 2024, and it reinforces a broader shift from “autocomplete” toward agentic workflows where AI can take on bigger chunks of engineering tasks. Google even cited an internal migration that was completed several times faster than a comparable effort a year ago. But there’s a human side here too: reports suggest some employees have AI-usage goals tied to performance reviews, and there’s internal tension around tool choices—like allowing some staff to use Anthropic’s Claude Code. The bigger signal is that AI usage is moving from optional productivity booster to measured expectation.

All of that brings up an uncomfortable question: are we even measuring the right things when we talk about model and inference performance? Applied Compute argues that classic LLM inference benchmarks—simple prompt and completion pairs—don’t resemble agent behavior anymore. Real agents are multi-turn, tool-using sessions with long-lived caches, bursts of short generations, and messy latency, including long-tail delays while waiting on tools. They released recorded workload profiles and an open-source harness that replays full traces against OpenAI-compatible endpoints, accounting for tool wait time and cache behavior. The practical implication is that “fast tokens per second” can be misleading; for agent deployments, tail latency and cache capacity can become the real bottlenecks that decide whether an experience feels reliable.
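Here's a tiny illustration of why averages mislead for agent workloads: replaying a recorded multi-step trace where per-step latency includes tool wait time, the mean looks healthy while the tail does not. The trace below is made-up data for illustration, not Applied Compute's released profiles.

```python
import statistics

# Each tuple is (generation_seconds, tool_wait_seconds) for one agent step.
# One slow tool call dominates the user's perceived reliability.
trace = [
    (0.3, 0.0), (0.2, 0.1), (0.4, 0.0), (0.3, 6.5),  # one slow tool call
    (0.2, 0.0), (0.3, 0.2), (0.5, 0.0), (0.2, 0.1),
]

step_latencies = sorted(g + w for g, w in trace)

def percentile(values, p):
    # Nearest-rank percentile over an already-sorted list.
    idx = min(len(values) - 1, int(round(p / 100 * len(values) + 0.5)) - 1)
    return values[idx]

mean = statistics.mean(step_latencies)
p99 = percentile(step_latencies, 99)
print(f"mean step latency: {mean:.2f}s, p99: {p99:.2f}s")
# The mean looks fine; the p99 step is what the user actually feels.
```

A benchmark built from single prompt/completion pairs would never surface that 6.8-second step at all, because the slowness lives in tool wait, not token generation.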

Staying on reliability, two separate pieces landed on the same lesson: good agents are as much about process and documentation as they are about models. Augment studied AGENTS.md files—those agent-facing guidance docs—and found they can either meaningfully boost performance or actively make it worse. The best ones were short and structured for progressive disclosure: enough to guide common workflows, while pushing deep details into well-scoped reference docs. Meanwhile, investor and operator Garry Tan proposed “skillify”: turning every real agent failure into a durable, test-backed skill so the broken behavior becomes structurally hard to repeat. The shared message is simple: if you want dependable agents, you need the software engineering discipline—clear entry-point docs, deterministic checks, and regression tests—not just better prompts.
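To show what "skillify" might look like mechanically, here's a sketch of turning one logged agent failure into a durable, test-backed check. The failure (ambiguous date formats) and the regression list are hypothetical examples, not Garry Tan's actual tooling.

```python
import re

def normalize_date(text):
    """A 'skill' distilled from a past failure: the agent kept emitting
    ambiguous dates like 04/05/26, so this helper enforces ISO 8601."""
    m = re.fullmatch(r"(\d{2})/(\d{2})/(\d{2})", text)
    if m:
        month, day, yy = m.groups()
        return f"20{yy}-{month}-{day}"
    return text

# Each entry pairs an input that once broke the agent with the output we
# now require. New failures get appended here and never deleted, so the
# broken behavior becomes structurally hard to repeat.
REGRESSIONS = [
    ("04/05/26", "2026-04-05"),
    ("2026-04-24", "2026-04-24"),  # already-correct input must pass through
]

for bad_input, expected in REGRESSIONS:
    assert normalize_date(bad_input) == expected, bad_input
print("all regression checks pass")
```

The discipline is the point: the fix lives in code plus a deterministic test, not in an ever-growing prompt.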

On the training side, Perplexity published a detailed look at how it post-trains search-augmented models without sacrificing safety or response quality. Their approach uses supervised fine-tuning to lock in “must not break” behaviors—like instruction following, consistency, and abstention—then applies on-policy reinforcement learning to improve search accuracy and reduce unnecessary tool calls. The notable idea is a gated reward design: preference-style rewards only count if the model first clears correctness and compliance checks, which helps avoid “optimizing into” unsafe or sloppy behavior. This matters because search agents are judged in production on multiple axes at once: accuracy, cost, latency, and trustworthiness.
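The gated reward idea can be sketched in a few lines: preference-style reward only counts once hard correctness and compliance gates pass, so optimization can't trade safety for score. The gate inputs, penalty weight, and threshold below are illustrative placeholders, not Perplexity's actual criteria.

```python
# Sketch of a gated reward: hard gates first, shaped reward second.
# All weights and gate definitions here are invented for illustration.

def gated_reward(response, preference_score,
                 is_correct, is_compliant, tool_calls, max_tool_calls=3):
    # Hard gates: any failure zeroes the reward outright, so the policy
    # can't "optimize into" unsafe or sloppy behavior for preference points.
    if not is_correct or not is_compliant:
        return 0.0
    # Past the gates, shape the reward: preference quality minus a small
    # penalty for unnecessary tool calls beyond a budget.
    tool_penalty = 0.1 * max(0, tool_calls - max_tool_calls)
    return round(max(0.0, preference_score - tool_penalty), 6)

# A high-preference but non-compliant response earns nothing...
print(gated_reward("r1", preference_score=0.9, is_correct=True,
                   is_compliant=False, tool_calls=1))
# ...while a compliant one is rewarded, lightly docked for extra tool calls.
print(gated_reward("r2", preference_score=0.9, is_correct=True,
                   is_compliant=True, tool_calls=5))
```

The multiplicative-style gating is what keeps the axes separate: accuracy and compliance are prerequisites, while cost and quality are trade-offs.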

Now to images—because there’s a fascinating breadcrumb trail around OpenAI’s next image generator. OpenAI briefly uploaded three anonymous image models to LM Arena earlier this month, then removed them within two days after the community connected the dots. Developers now widely refer to the likely contender as “GPT Image 2.” Leaks and community tests suggest improvements in the areas people actually notice: more reliable text rendering inside images, more natural color and realism, better depiction of real-world products and interfaces, and faster generation. The timing is important because OpenAI plans to shut down DALL‑E 2 and DALL‑E 3 on May 12th, so a successor needs to be ready—or users will feel the gap.

Here’s the story we teased at the top, and it’s a real-world warning shot. South Korean police arrested a man accused of disrupting the search for an escaped wolf by circulating an AI-generated image claiming to show the animal near a road intersection. The image spread, officials redirected resources, and residents received an emergency alert—before authorities determined the photo was fake. The suspect reportedly said he made it “for fun,” and he’s being investigated for obstructing government work. This is the growing problem in one snapshot: synthetic media doesn’t need to be perfect to cause harm; it just needs to be plausible enough, fast enough, at the exact wrong moment.

In open models, there’s a notable shift toward smaller systems that still feel close to “flagship” for coding. Simon Willison highlighted Qwen’s new open-weights model, Qwen3.6-27B, which Qwen claims beats a much larger prior open flagship on major coding benchmarks—while being dramatically smaller and more practical to run locally. Willison’s hands-on testing emphasized something that’s easy to miss in benchmark talk: accessibility. When strong performance fits into a footprint people can actually download and run, it changes who can build agentic tools on-prem, offline, or with tighter data control.

Two final business signals show where the economics and governance of AI tooling are heading. First, Microsoft is reportedly planning to move GitHub Copilot customers from request-based limits to token-based billing starting in June. If that happens, the big change is cost predictability: token pools may help organizations govern usage, but heavy users could see bills shift from fixed to variable. Second, infrastructure company Vast Data says it raised a billion dollars at a $30 billion valuation, with Nvidia joining the investor group. That reinforces where capital is flowing: not just into models, but into the data and storage layer required to feed large-scale AI—because the “picks and shovels” are where costs and lock-in often live.
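A back-of-envelope comparison shows why the billing shift matters. Under request-based limits every request costs the same; under token billing, one heavy agent session can dominate the bill. All prices and token counts below are invented for illustration, not Microsoft's actual Copilot pricing.

```python
# Hypothetical rates, purely for illustrating the shape of the change.
PRICE_PER_1K_TOKENS = 0.01     # token-based rate (invented)
FLAT_PRICE_PER_REQUEST = 0.04  # request-based equivalent (invented)

requests = [
    ("quick completion", 800),
    ("quick completion", 1200),
    ("agentic refactor", 45000),  # one heavy agent session
]

flat_cost = FLAT_PRICE_PER_REQUEST * len(requests)
token_cost = sum(tokens for _, tokens in requests) / 1000 * PRICE_PER_1K_TOKENS

print(f"request-based: ${flat_cost:.2f}, token-based: ${token_cost:.2f}")
# Light users pay less per interaction; the heavy agent session dominates
# the token-based bill, which is exactly the variability orgs will feel.
```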

And one more quick note on governance and provenance, because communities are starting to fight about what “trusted code” even means. The MeshCore project’s core team says it split after a dispute involving governance, branding, and allegations that major components were rebuilt with AI-generated code without disclosure. Regardless of who’s right in that specific conflict, the broader point is timely: as AI-generated code becomes normal, expectations around transparency—what was generated, reviewed, and by whom—are becoming a social and security issue, not just a technical one.

That’s our update for April 24th, 2026. The throughline today is that AI is moving from tools you ask, to systems that act—inside workplaces, inside codebases, and sometimes, unfortunately, inside public incidents. As always, links to all the stories we covered can be found in the episode notes. Thanks for listening to The Automated Daily, AI News edition. I’m TrendTeller—see you tomorrow.