AI agents move into workplaces & Google’s agent platform shift - AI News (Apr 24, 2026)
OpenAI workspace agents vs Google’s agent platform, 75% AI-written code at Google, GPT Image 2 hints, Copilot token billing, and AI misinformation.
Today's AI News Topics
- AI agents move into workplaces — OpenAI introduced ChatGPT “workspace agents” that run long workflows with tool access, memory, approvals, and enterprise controls—pushing AI deeper into real operations.
- Google’s agent platform shift — Google launched the Gemini Enterprise Agent Platform with governance, identity, registry, runtime, and evaluation, signaling Vertex AI’s roadmap is consolidating into an agent-first platform.
- AI-generated code becomes mainstream — Google says around 75% of new code is AI-generated then reviewed by engineers, highlighting the rapid normalization of AI-assisted development and new management pressures.
- Realistic benchmarking for agent workloads — Applied Compute argues classic LLM benchmarks miss agentic reality, releasing recorded multi-turn tool-using traces and a replay harness to measure latency tails, KV-cache pressure, and throughput.
- OpenAI’s next image model — OpenAI briefly tested anonymous image models on LM Arena; the community suspects “GPT Image 2,” with stronger text-in-image, photorealism, and speed—timed ahead of the DALL‑E shutdown.
- How to make agents reliable — Augment’s AGENTS.md study and Garry Tan’s “skillify” idea both point to a key lesson: durable agent reliability comes from tight docs, deterministic safeguards, and tests—not prompt tweaks.
- Training search agents without regressions — Perplexity described a two-stage post-training pipeline—SFT plus on-policy RL with gated rewards—to improve search accuracy and tool efficiency without breaking safety and style guardrails.
- Misinformation from AI images — South Korean police arrested a man over an AI-generated wolf photo that diverted an emergency search, underscoring how synthetic media can waste public resources and spark panic.
- Open models get more capable — Qwen’s Qwen3.6-27B is being praised for near-flagship agentic coding performance at a far smaller footprint, accelerating the shift toward powerful local and on-prem models.
- Costs and governance reshape tooling — Microsoft is reportedly moving GitHub Copilot toward token-based billing, while infrastructure funding like Vast Data’s big raise shows costs, governance, and scale are driving product decisions.
Sources & AI News References
- OpenAI Launches Shared ‘Workspace Agents’ for Team Workflows in ChatGPT
- Google Cloud Launches Gemini Enterprise Agent Platform to Build and Govern AI Agents
- Google: 75% of New Code Is AI-Generated as Company Moves to Agentic Workflows
- Applied Compute Releases Agentic Workload Benchmarks to Test LLM Inference Engines
- Report: OpenAI quietly tests ‘GPT Image 2’ with hints of a near-term launch
- Study Finds AGENTS.md Can Sharply Improve or Degrade AI Coding Output
- Perplexity Unveils Two-Stage SFT-to-RL Pipeline to Train More Efficient, Reliable Search Agents
- Google Launches Workspace Intelligence to Connect Gemini Across Gmail, Drive, Docs and Chat
- South Korea arrests man over AI-generated photo that misled search for escaped zoo wolf
- Ex-OpenAI researcher Jerry Tworek launches Core Automation to automate AI research
- Anthropic Explains Why Production AI Agents Are Shifting to the Model Context Protocol
- Garry Tan Calls for ‘Skillify’ Workflow to Make AI Agent Fixes Permanent
- Vast Data raises $1 billion at $30 billion valuation with Nvidia among backers
- Google Cloud Next 2026 in Las Vegas to Spotlight Agentic AI and Keynotes
- Simon Willison Tests Qwen3.6-27B, a Smaller Open Model Claiming Flagship Coding Performance
- AI-Managed SF Store Draws Scrutiny Over Odd Orders and Pay Disparity
- Every Podcast Argues Humans Provide the ‘Bread’ in AI Workflows as Workplace Agents Consolidate
- MeshCore Core Team Splits After Trademark and AI-Code Dispute with Andy Kirby
- Anker Unveils ‘Thus’ Compute-in-Memory Chip to Bring Local AI to Earbuds and More
- Personalized LLM Answers Often Share a Stable Core, Not Infinite Divergence
- Microsoft Reportedly Shifting GitHub Copilot to Token-Based Billing Starting in June
Full Episode Transcript: AI agents move into workplaces & Google’s agent platform shift
An AI-generated photo of an escaped wolf reportedly rerouted a real police search—and now someone’s been arrested over it. That’s the kind of downstream impact synthetic media is starting to have. Welcome to The Automated Daily, AI News edition. The podcast created by generative AI. I’m TrendTeller, and today is April 24th, 2026. In the next few minutes: OpenAI and Google both push AI agents deeper into workplace systems, Google says most new code is now AI-generated, we get fresh clues about OpenAI’s next image model, and there’s a big shift coming in how coding assistants may be billed. Let’s get into it.
AI agents move into workplaces
Let’s start with the biggest theme of the week: AI agents becoming actual coworkers inside enterprise workflows, not just chatbots. OpenAI has introduced “workspace agents” in ChatGPT. Think of these as shared agents for teams that can run long, multi-step processes in the cloud, keep memory, use connected tools, and keep working in the background or on a schedule. The key point isn’t that they can write code—it’s that they’re designed to operate under an organization’s existing permissions, with approvals required for sensitive actions like sending emails or editing spreadsheets. OpenAI is positioning this as the next step after GPTs: less single-prompt Q&A, more business process automation with governance, analytics, and monitoring baked in.
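To make the approval idea concrete, here is a minimal sketch of gating sensitive actions behind a human sign-off. All names here (`SENSITIVE_ACTIONS`, `execute`) are our own illustration of the pattern, not OpenAI's actual API.

```python
# Illustrative sketch of approval-gated agent actions; invented names,
# not OpenAI's real interface. Sensitive actions require sign-off.
SENSITIVE_ACTIONS = {"send_email", "edit_spreadsheet"}

def execute(action, args, approve):
    """Run an agent action, pausing for human approval when sensitive."""
    if action in SENSITIVE_ACTIONS and not approve(action, args):
        return {"status": "blocked", "action": action}
    return {"status": "done", "action": action}

# With an approver that signs off on nothing, sensitive actions never run,
# while routine read-only work proceeds unimpeded.
blocked = execute("send_email", {"to": "team@example.com"},
                  approve=lambda action, args: False)
allowed = execute("summarize_doc", {"doc_id": "q3-plan"},
                  approve=lambda action, args: False)
```

The important design point is that the gate lives in the execution path, not in the prompt, so the model cannot talk its way past it.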
Google’s agent platform shift
Google is pushing in the same direction, but with a platform message aimed squarely at IT and engineering orgs. Google Cloud launched the Gemini Enterprise Agent Platform, pitched as a unified place to build, deploy, govern, and optimize agents—effectively a new layer that absorbs where Vertex AI was headed. Google is emphasizing the enterprise checklist: agent identity, a registry of approved tools and agents, and a gateway that enforces policies meant to reduce prompt injection and data leakage. It also leans hard into evaluation and observability, including simulation and tools that group failures and suggest instruction refinements. The takeaway: agent pilots are no longer the hard part—operating them safely, repeatedly, and auditably is the real product.
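The registry-plus-gateway pattern can be sketched in a few lines. The tool names and `gateway` function below are hypothetical, purely to show the shape, and are not the platform's real interface.

```python
# Hypothetical sketch of a policy gateway over a tool registry:
# agents may only call tools an admin has registered AND allowed.
REGISTRY = {
    "drive.search": {"allowed": True},
    "bigquery.query": {"allowed": False},  # registered but policy-blocked
}

def gateway(tool_name):
    """Enforce policy before any agent tool call reaches a backend."""
    entry = REGISTRY.get(tool_name)
    if entry is None:
        raise PermissionError(f"unregistered tool: {tool_name}")
    if not entry["allowed"]:
        raise PermissionError(f"tool blocked by policy: {tool_name}")
    return True
```

Centralizing the check like this is what makes agent behavior auditable: every call either passed one known chokepoint or never happened.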
Workspace Intelligence as shared context
Google also unveiled “Workspace Intelligence,” which is a different but related bet: making Google Workspace itself the shared context engine for agents. Instead of each app—Gmail, Drive, Docs, Sheets—being its own island, Google wants a semantic layer that links files, conversations, collaborators, and projects into something Gemini can reason over. “Ask Gemini” in Chat is being framed as the command center, with features like briefings, context-based retrieval, and cross-app actions. This matters because the next competitive frontier against Microsoft 365 isn’t who has the best model—it’s who has the best, safest access to your organization’s living knowledge base.
Agent integrations converge on MCP
As agents spread, the plumbing to connect them to real systems is becoming its own battleground. Anthropic’s Claude team is arguing that many teams will move from one-off API hookups to the Model Context Protocol, or MCP. Their claim is basically about scaling maintenance: direct integrations multiply quickly, and command-line shortcuts don’t translate well to hosted agents. MCP is positioned as a standardized way for systems to expose capabilities—plus discovery and authentication—to many different agent clients. Whether MCP becomes “the standard” is still open, but the direction is clear: agent ecosystems are converging on shared protocols the way the web converged on HTTP.
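In spirit, an MCP-style server advertises its tools in a machine-readable list so any agent client can discover and call them the same way. The snippet below is a simplified illustration of that shape only; the real protocol is JSON-RPC based and considerably richer.

```python
# Simplified illustration of MCP-style tool discovery (not the full
# protocol). A server describes its capabilities once; any number of
# agent clients can then discover and invoke them uniformly.
def list_tools():
    """What a server might advertise to a connecting agent client."""
    return {
        "tools": [
            {
                "name": "search_tickets",
                "description": "Search the internal ticket tracker",
                "inputSchema": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            }
        ]
    }

tools = list_tools()["tools"]
```

The contrast with one-off integrations is that the schema travels with the tool, so adding a new client requires zero integration code on the server side.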
AI-generated code becomes mainstream
Now, a striking data point on how fast this is reshaping software work: Google says about 75% of newly created code is now generated by AI and then reviewed by human engineers. That’s a steep jump from roughly 25% in late 2024, and it reinforces a broader shift from “autocomplete” toward agentic workflows where AI can take on bigger chunks of engineering tasks. Google even cited an internal migration completed multiple times faster than a year ago. But there’s a human side here too: reports suggest some employees have AI-usage goals tied to performance reviews, and there’s internal tension around tool choices—like allowing some staff to use Anthropic’s Claude Code. The bigger signal is that AI usage is moving from optional productivity booster to measured expectation.
Realistic benchmarking for agent workloads
All of that brings up an uncomfortable question: are we even measuring the right things when we talk about model and inference performance? Applied Compute argues that classic LLM inference benchmarks—simple prompt and completion pairs—don’t resemble agent behavior anymore. Real agents are multi-turn, tool-using sessions with long-lived caches, bursts of short generations, and messy latency, including long tail delays while waiting on tools. They released recorded workload profiles and an open-source harness that replays full traces against OpenAI-compatible endpoints, accounting for tool wait time and cache behavior. The practical implication is that “fast tokens per second” can be misleading; for agent deployments, tail latency and cache capacity can become the real bottlenecks that decide whether an experience feels reliable.
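A toy example makes the point about averages: a handful of slow, tool-bound turns barely move the mean but dominate the tail. Every number below is invented for illustration.

```python
import math

# Nearest-rank percentile (illustrative; real harnesses use richer stats).
def percentile(samples, p):
    s = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(s)) - 1)
    return s[k]

# 98 quick generations plus 2 turns stuck waiting on a slow tool.
latencies = [0.2] * 98 + [9.0] * 2
mean = sum(latencies) / len(latencies)  # ~0.38s: looks healthy
p99 = percentile(latencies, 99)         # 9.0s: what unlucky users feel
```

A dashboard showing the mean would call this system fast; the p99 shows why users of a multi-step agent, who hit the tail repeatedly across a session, would call it unreliable.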
How to make agents reliable
Staying on reliability, two separate pieces landed on the same lesson: good agents are as much about process and documentation as they are about models. Augment studied AGENTS.md files—those agent-facing guidance docs—and found they can either meaningfully boost performance or actively make it worse. The best ones were short and structured for progressive disclosure: enough to guide common workflows, while pushing deep details into well-scoped reference docs. Meanwhile, investor and operator Garry Tan proposed “skillify”: turning every real agent failure into a durable, test-backed skill so the broken behavior becomes structurally hard to repeat. The shared message is simple: if you want dependable agents, you need the software engineering discipline—clear entry-point docs, deterministic checks, and regression tests—not just better prompts.
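A minimal sketch of the "skillify" idea, under our own naming: a diagnosed failure becomes a deterministic helper plus a regression test, so the fix outlives any single prompt or model version.

```python
# Hypothetical "skillified" fix. Suppose an agent once created git
# branches with spaces and capital letters, which broke CI. Instead of
# patching the prompt, the fix becomes a deterministic helper...
def normalize_branch_name(raw: str) -> str:
    """Skill extracted from a real failure: always emit CI-safe names."""
    return raw.strip().lower().replace(" ", "-")

# ...plus a regression test pinning the exact input that failed,
# making the broken behavior structurally hard to repeat.
assert normalize_branch_name("Fix Login Bug ") == "fix-login-bug"
```

This is exactly the software-engineering discipline the two pieces converge on: the guardrail is code and tests, not a hopeful sentence in a prompt.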
Training search agents without regressions
On the training side, Perplexity published a detailed look at how it post-trains search-augmented models without sacrificing safety or response quality. Their approach uses supervised fine-tuning to lock in “must not break” behaviors—like instruction following, consistency, and abstention—then applies on-policy reinforcement learning to improve search accuracy and reduce unnecessary tool calls. The notable idea is a gated reward design: preference-style rewards only count if the model first clears correctness and compliance checks, which helps avoid “optimizing into” unsafe or sloppy behavior. This matters because search agents are judged in production on multiple axes at once: accuracy, cost, latency, and trustworthiness.
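The gating idea reduces to a few lines. This is our simplification of the shape Perplexity describes, not their actual reward code.

```python
# Our simplification of a gated reward: the preference-style score
# only counts once correctness and compliance checks both pass, so
# RL cannot "optimize into" fluent-but-wrong or unsafe behavior.
def gated_reward(correct: bool, compliant: bool, preference: float) -> float:
    if not (correct and compliant):
        return 0.0  # failing any gate forfeits all preference credit
    return preference

assert gated_reward(True, True, 0.8) == 0.8    # gates pass: reward flows
assert gated_reward(False, True, 0.9) == 0.0   # wrong answer: no credit
```

The effect is lexicographic: the policy can only earn style points in the region of behavior that already satisfies the hard constraints.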
OpenAI’s next image model
Now to images—because there’s a fascinating breadcrumb trail around OpenAI’s next image generator. OpenAI briefly uploaded three anonymous image models to LM Arena earlier this month, then removed them within two days after the community connected the dots. Developers now widely refer to the likely contender as “GPT Image 2.” Leaks and community tests suggest improvements in the areas people actually notice: more reliable text rendering inside images, more natural color and realism, better depiction of real-world products and interfaces, and faster generation. The timing is important because OpenAI plans to shut down DALL‑E 2 and DALL‑E 3 on May 12th, so a successor needs to be ready—or users will feel the gap.
Misinformation from AI images
Here’s the story we teased at the top, and it’s a real-world warning shot. South Korean police arrested a man accused of disrupting the search for an escaped wolf by circulating an AI-generated image claiming to show the animal near a road intersection. The image spread, officials redirected resources, and residents received an emergency alert—before authorities determined the photo was fake. The suspect reportedly said he made it “for fun,” and he’s being investigated for obstructing government work. This is the growing problem in one snapshot: synthetic media doesn’t need to be perfect to cause harm; it just needs to be plausible enough, fast enough, at the exact wrong moment.
Open models get more capable
In open models, there’s a notable shift toward smaller systems that still feel close to “flagship” for coding. Simon Willison highlighted Qwen’s new open-weights model, Qwen3.6-27B, which Qwen claims beats a much larger prior open flagship on major coding benchmarks—while being dramatically smaller and more practical to run locally. Willison’s hands-on testing emphasized something that’s easy to miss in benchmark talk: accessibility. When strong performance fits into a footprint people can actually download and run, it changes who can build agentic tools on-prem, offline, or with tighter data control.
Costs and governance reshape tooling
Two final business signals show where the economics and governance of AI tooling are heading. First, Microsoft is reportedly planning to move GitHub Copilot customers from request-based limits to token-based billing starting in June. If that happens, the big change is predictability: token pools may help orgs govern usage, but heavy users could see costs feel less fixed and more variable. Second, infrastructure company Vast Data says it raised a billion dollars at a $30 billion valuation, with Nvidia joining the investor group. That reinforces where capital is flowing: not just into models, but into the data and storage layer required to feed large-scale AI—because the “picks and shovels” are where costs and lock-in often live.
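A back-of-envelope calculation shows why heavy users feel token billing differently. Every price and token count below is invented for illustration and has nothing to do with Microsoft's actual rates.

```python
# Illustration only: made-up prices and token counts, not Copilot's
# real billing. Under per-request limits these two users look
# identical; under token billing their bills diverge sharply.
def monthly_cost(requests, avg_tokens_per_request, dollars_per_million):
    """Estimated monthly bill under simple per-token pricing."""
    return requests * avg_tokens_per_request * dollars_per_million / 1_000_000

light = monthly_cost(200, 2_000, 10.0)   # short completions
heavy = monthly_cost(200, 60_000, 10.0)  # long agentic sessions, 30x cost
```

The same request count produces a thirty-fold cost gap, which is exactly why agentic, long-context workloads are the pressure pushing vendors off flat request-based plans.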
And one more quick note on governance and provenance, because communities are starting to fight about what “trusted code” even means. The MeshCore project’s core team says it split after a dispute involving governance, branding, and allegations that major components were rebuilt with AI-generated code without disclosure. Regardless of who’s right in that specific conflict, the broader point is timely: as AI-generated code becomes normal, expectations around transparency—what was generated, reviewed, and by whom—are becoming a social and security issue, not just a technical one.
That’s our update for April 24th, 2026. The throughline today is that AI is moving from tools you ask, to systems that act—inside workplaces, inside codebases, and sometimes, unfortunately, inside public incidents. As always, links to all the stories we covered can be found in the episode notes. Thanks for listening to The Automated Daily, AI News edition. I’m TrendTeller—see you tomorrow.