AI News · March 11, 2026 · 7:49

Agentic AI hacks McKinsey chatbot & Pentagon rolls out Gemini agents - AI News (Mar 11, 2026)

McKinsey’s AI bot breach, Gemini agents at the Pentagon, Anthropic vs DoD, OpenAI’s data-center pivot, and AI usage surging toward search levels.



Today's AI News Topics

  1. Agentic AI hacks McKinsey chatbot

    — A red-team claims an autonomous AI agent breached McKinsey’s Lilli platform via exposed API docs and unauthenticated endpoints, highlighting agentic offensive security and prompt-poisoning risk.
  2. Pentagon rolls out Gemini agents

    — Google will deploy Gemini AI agents across the Defense Department’s unclassified networks, signaling scaled adoption of task-running agents for routine Pentagon productivity work.
  3. Anthropic fights Pentagon blacklisting

    — Anthropic is suing the US government over being labeled a national-security supply-chain risk, testing how procurement power collides with AI safety limits and defense demand.
  4. OpenAI rethinks Stargate data centers

    — Reports say OpenAI is pulling back from expanding an Oracle-backed Texas site as GPU generations move fast, exposing the timing mismatch between chip cycles and data-center buildouts.
  5. AI usage rivals search volumes

    — New estimates suggest AI assistants generate tens of billions of monthly sessions, shifting discovery away from classic SEO toward LLM visibility across mobile and app-based usage.
  6. Bayesian teaching improves LLM adaptation

    — Google Research shows ‘Bayesian teaching’ can train LLMs to update beliefs more reliably over repeated interactions, improving agent adaptability beyond single-turn chat.
  7. Debian stalls rules for AI code

    — Debian debated disclosure and responsibility for LLM-generated contributions but chose case-by-case handling, reflecting ongoing uncertainty around quality, licensing, and policy.
  8. OpenAI to acquire Promptfoo

    — Promptfoo says it agreed to be acquired by OpenAI while staying open source, underscoring how evals, red-teaming, and safety testing are becoming core AI infrastructure.


Full Episode Transcript: Agentic AI hacks McKinsey chatbot & Pentagon rolls out Gemini agents

An autonomous AI agent reportedly broke into a major consulting firm’s internal chatbot platform in about two hours—no credentials, just a fast-moving chain of ordinary web mistakes. Welcome to The Automated Daily, AI News edition. The podcast created by generative AI. I’m TrendTeller, and today is March 11th, 2026. Here’s what’s happening in AI—what changed, and why it matters.

Agentic AI hacks McKinsey chatbot

Let’s start with security, because it’s getting more automated. A red-team startup called CodeWall says its autonomous agent breached McKinsey’s internal gen-AI platform, Lilli, and reached read-write access to the production database. The claim is that the agent found exposed API documentation, then followed unauthenticated paths to a classic vulnerability—ending with access that could have revealed massive volumes of chats and internal files. McKinsey says it fixed the issues quickly and found no evidence of unauthorized access beyond the researchers. The bigger takeaway is speed: when agentic systems can run reconnaissance, exploit attempts, and iteration loops end-to-end, basic misconfigurations become far more dangerous—and prompt manipulation becomes a realistic, quiet way to sabotage an AI system at scale.
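The misconfiguration chain described here, exposed docs plus endpoints that answer without credentials, is mechanical enough to script. A minimal sketch of the classification step only, with hypothetical endpoint paths, operating on recon results that have already been collected:

```python
# Minimal sketch of the classification step in an automated
# misconfiguration sweep. The endpoint names below are hypothetical;
# the input is a list of (endpoint, HTTP status) pairs observed when
# probing WITHOUT any credentials attached.

def flag_unauthenticated(observations):
    """Endpoints that returned 200 OK to an anonymous request --
    the basic misconfiguration class described in the story."""
    return [endpoint for endpoint, status in observations if status == 200]

# Hypothetical recon output:
seen = [
    ("/api/docs", 200),      # exposed API documentation
    ("/api/v1/chats", 200),  # should have required a token
    ("/api/v1/admin", 401),  # correctly rejects anonymous access
]
print(flag_unauthenticated(seen))  # -> ['/api/docs', '/api/v1/chats']
```

An agentic system runs this loop continuously, probe, classify, pick the next target, which is why response speed, not just patch quality, becomes the defensive metric.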

Pentagon rolls out Gemini agents

On the defense side, Google is preparing to introduce Gemini AI “agents” across the Pentagon’s roughly three-million-person workforce, starting on unclassified networks. These are positioned as task-running agents—things that take an assignment and execute it with minimal babysitting. Why it matters is less about shiny demos and more about institutional adoption: the Defense Department is signaling it wants agents in everyday administrative workflows, at real scale. And the decision to begin on unclassified systems suggests a phased approach—prove controls, auditing, and usefulness in lower-risk environments before anyone even considers sensitive networks.

Anthropic fights Pentagon blacklisting

That rollout lands in a tense moment for defense procurement. Anthropic has filed lawsuits challenging the Pentagon’s decision to label the company a national-security “supply-chain risk.” Anthropic argues the designation is unlawful and retaliatory, tied to the company’s refusal to allow what it describes as unrestricted military use—especially around mass surveillance or fully autonomous weapons. The Pentagon’s position, publicly, is that lawful use shouldn’t be constrained by a vendor’s policy preferences. This case matters because it could set a precedent for how much leverage governments have to effectively blacklist AI providers, and how much room vendors have to enforce safety boundaries when public-sector demand is growing fast.

OpenAI rethinks Stargate data centers

Now to the infrastructure story that keeps getting more complicated. Reports indicate OpenAI has pulled back from plans to expand its Stargate data-center partnership with Oracle at a site in Abilene, Texas—apparently looking for newer Nvidia GPU generations and larger clusters elsewhere. Oracle disputes the characterization, but the underlying tension is real: data centers take a year or two to plan and energize, while AI chips are improving on something close to an annual rhythm. That mismatch turns big buildouts into a gamble—by the time the power is on, the hardware can feel like last year’s model. It’s also putting pressure on the financing side of AI infrastructure, where debt-funded expansion is far less forgiving than the cash-heavy strategies of hyperscalers.
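The timing mismatch is easy to put in numbers. A back-of-envelope sketch where both constants are assumptions for illustration, not figures from the story:

```python
# Back-of-envelope: how many GPU generations ship while a large data
# center is still being planned and energized. Both constants are
# illustrative assumptions, not reported values.

BUILD_MONTHS = 24          # assumed plan-to-energize time for a big site
CHIP_CADENCE_MONTHS = 12   # assumed (roughly annual) accelerator refresh

generations_elapsed = BUILD_MONTHS // CHIP_CADENCE_MONTHS
print(f"~{generations_elapsed} GPU generation(s) ship before power-on")
```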


That same dynamic rippled into markets, with SoftBank shares taking a hit amid chatter that financing and demand assumptions for giant AI data-center plans are softer than the headlines suggested. Whether or not any single site expansion is ‘on’ or ‘off,’ investors are increasingly asking a harder question: how much of the announced capacity will actually be built, on time, with power secured, and with customers ready to pay? As models get more efficient and hardware cycles compress, the risk isn’t just overbuilding. It’s locking into a timeline that makes the economics worse by default.


A separate debate this week is about who’s really paying for AI usage. Viral claims around premium coding plans suggested an individual subscriber could ‘consume’ thousands of dollars in compute a month. A counter-argument making the rounds is that those big numbers often reflect retail, token-based API pricing—not the provider’s actual inference cost. In other words, the price you see on an API menu is not the same thing as the marginal cost of running the model. The practical implication: some intermediaries may be the ones squeezed—especially if they pay close to retail API rates while selling flat-fee plans. And for the model labs, the bigger financial gravity well may still be training and research, not serving everyday inference.
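The reseller squeeze is simple arithmetic. A toy model where every price is an illustrative assumption, not any provider's real rate:

```python
# Toy model of the intermediary squeeze: paying close to retail,
# token-based API rates while charging users a flat monthly fee.
# Every number here is an illustrative assumption.

RETAIL_PER_M_TOKENS = 10.00   # assumed retail API price per million tokens
FLAT_FEE = 20.00              # assumed flat monthly subscription price

def monthly_margin(tokens_used):
    """Intermediary's margin on one subscriber for the month."""
    api_cost = tokens_used / 1_000_000 * RETAIL_PER_M_TOKENS
    return FLAT_FEE - api_cost

print(monthly_margin(1_000_000))    # light user: positive margin
print(monthly_margin(10_000_000))   # heavy user: the plan loses money
```

The same usage looks very different from the lab's side if its marginal inference cost sits well below the retail token price, which is exactly the distinction the counter-argument draws.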

AI usage rivals search volumes

On adoption and discovery: one analysis from Graphite’s CEO estimates AI assistants now generate around 45 billion monthly sessions worldwide—arguing that’s already a sizable fraction of global search volume. The point isn’t that search is dead; it’s that discovery is fragmenting. A lot of AI usage happens inside mobile apps and assistant interfaces, which means classic web analytics can miss the real shift. If you’re building products or marketing, the implication is straightforward: you still care about SEO, but you also need an LLM visibility strategy—how your brand, docs, and data show up when users ask an assistant instead of typing keywords.

Bayesian teaching improves LLM adaptation

In research, Google has published work aimed at making LLMs update beliefs more like Bayesian inference over repeated interactions. In a multi-round recommendation game, off-the-shelf models often improved only slightly after the first turn. But after fine-tuning with ‘Bayesian teaching’—training the model to make uncertainty-aware updates—it performed more consistently and generalized to other recommendation domains. Why this matters is agents, not chatbots: in the real world, assistants have to revise assumptions about what a user wants, what’s changed, and what’s likely true. Teaching models to adjust beliefs reliably could reduce the whiplash we see today—where an AI seems smart in one turn and strangely stubborn in the next.
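For a concrete reference point on what uncertainty-aware updating means, here is the textbook conjugate case, a Beta-Bernoulli update over repeated feedback. This is standard Bayesian inference with an illustrative recommendation framing, not the paper's training method:

```python
# Textbook Beta-Bernoulli updating: the belief about a user's preference
# shifts by exactly the amount the evidence warrants on each interaction.
# Illustrative framing only; this is not Google's training procedure.

def update(alpha, beta, liked):
    """One conjugate update of a Beta(alpha, beta) belief after observing
    whether the user liked the recommendation."""
    return (alpha + 1, beta) if liked else (alpha, beta + 1)

alpha, beta = 1, 1  # uniform prior: no information about the user yet
for liked in (True, True, False, True):
    alpha, beta = update(alpha, beta, liked)
    print(f"P(user likes the genre) ~ {alpha / (alpha + beta):.2f}")
```

The whiplash failure mode is a model that departs from this curve, jumping to certainty after one observation or refusing to move after several; Bayesian teaching fine-tunes toward the calibrated middle.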

Debian stalls rules for AI code
Finally, on the open-source governance front: Debian developers debated whether to formalize rules for LLM-generated contributions—covering disclosure, responsibility for security and licensing, and concerns about low-effort drive-by patches. The discussion ultimately ended without a formal resolution, keeping decisions case by case. That may sound anticlimactic, but it’s actually a telling signal: major projects aren’t just arguing about code quality—they’re weighing mentorship, fairness if tools become paywalled, and legal uncertainty around training data and outputs. For now, Debian is choosing flexibility over a rule that could age badly in a fast-changing landscape.

OpenAI to acquire Promptfoo
And one quick M&A note: Promptfoo says it has agreed to be acquired by OpenAI while remaining open source and continuing to support multiple model providers. Promptfoo focuses on testing and adversarial evaluation of AI apps—exactly the kind of work enterprises need when they’re worried about jailbreaks, data leakage, or unexpected behavior. The significance here is directional: model labs are treating evals and security testing less like optional tooling, and more like foundational infrastructure that needs to be built in—not bolted on.

That’s the AI rundown for March 11th, 2026. If there’s one theme today, it’s that AI is shifting from “answers” to “actions”—and that raises the stakes for security, procurement, and infrastructure planning. Links to all stories can be found in the episode notes. Thanks for listening—this is TrendTeller, and I’ll be back tomorrow with more from The Automated Daily, AI News edition.