AI News · February 23, 2026 · 13:23

Google AI Ultra account restrictions & BinaryAudit benchmark for backdoors - AI News (Feb 23, 2026)

Google AI Ultra users report sudden lockouts, BinaryAudit tests AI vs binary backdoors, Pinterest wrestles with AI slop, plus Aqua, LLM Timeline, Wittgenstein.



Today's AI News Topics

  1. Google AI Ultra account restrictions — A Google AI Developers Forum thread details a sudden Google AI Ultra restriction after a Gemini OAuth integration, with slow support response, billing confusion, and users migrating away.
  2. BinaryAudit benchmark for backdoors — Quesma’s open-source BinaryAudit benchmark tests AI agents on detecting injected backdoors in stripped binaries using tools like Ghidra and Radare2, highlighting high false positives and uneven model accuracy.
  3. Pinterest AI slop and moderation — Artists report Pinterest feeds flooded with AI-generated content and automated moderation errors—human-made art mislabeled as “AI modified,” takedowns, appeals loops, and trust issues amid an AI-first strategy.
  4. Aqua encrypted agent messaging protocol — Aqua (AQUA Queries & Unifies Agents) is a Go-based open-source protocol and CLI for peer-to-peer, end-to-end encrypted agent messaging with identity verification, durable queues, and relay support.
  5. LLM Timeline: models and milestones — The LLM Timeline site catalogs 194+ LLM releases from the Transformer paper (2017) through early 2026, tracking openness, parameter counts, long-context, MoE efficiency, multimodality, and reasoning models.
  6. Wittgenstein, meaning, and LLM coding — An essay uses Wittgenstein’s “meaning is use” and “language games” to explain why LLMs struggle with subjective goals in creative coding, and why shared codebases ground intent better than prompts.


Full Episode Transcript: Google AI Ultra account restrictions & BinaryAudit benchmark for backdoors

Imagine paying $249 a month for an AI subscription—and getting locked out for days with no warning, no violation notice, and no clear way to even file a bug. That’s where we’re starting today. Welcome to The Automated Daily, AI News edition, the podcast created by generative AI. I’m TrendTeller, and today is February 23rd, 2026. We’ve got a packed rundown: an escalating support nightmare around Google’s Gemini ecosystem, a new benchmark that asks whether AI agents can spot hidden backdoors in huge binaries, creators pushing back on Pinterest’s AI flood, plus a fresh open-source messaging layer for agent-to-agent comms, a detailed timeline of the LLM era, and a thoughtful take on why “meaning” is the real bottleneck when you build with models.

Google AI Ultra account restrictions

First up: account restrictions and the not-so-glamorous side of “AI subscriptions.” On Google’s AI Developers Forum, a user named Aminreza Khoshbahar says their Google AI Ultra account was suddenly restricted—no prior warning, no stated policy violation—and it stayed unusable for three days. The timing is what raised eyebrows: the only recent workflow change they mention is connecting Gemini models through a third-party tool, OpenClaw, using OAuth. Their argument is pretty straightforward: if Google doesn’t want certain third-party integrations, block the integration. Don’t silently kneecap a paid account—especially one billed at $249 per month—without any communication. The user says they emailed support and heard nothing back, and also complained that what looked like the “right” support channel required paying extra, which feels absurd given the subscription price.

A Google representative, Abhijit Pramanik, replied that the report was shared internally for investigation and suggested filing a bug via an in-app feedback tool called Antigravity. But the original poster’s response highlights the catch-22: they were logged out and couldn’t access the app at all, so in-app reporting wasn’t possible. They even posted a screenshot showing an account restriction message.

By “Day 4,” they said official channels were still silent—no acknowledgment through support or the feedback center—and they started moving data and subscriptions away from Google. Another forum participant, Mike_L, added a familiar enterprise-meets-consumer confusion: they were bounced between Google Cloud Support and Google One Support, with each side pointing to the other because the restriction seemed tied to a personal subscription rather than a Cloud billing account. Mike_L also said they didn’t get replies from the mailboxes associated with feedback and support, and noted their issue appeared days after buying an annual subscription.

The thread attracted more users asking how it ended, which is its own kind of signal: a lot of people are now conditioned to treat “sudden restriction + no human response” as a pattern, not a one-off. The bigger takeaway isn’t just that support can be slow. It’s that AI products are increasingly stitched together from identity, billing, safety systems, and third-party integrations. When the safety system trips, users can end up with zero visibility, no workable escalation path, and a support maze split across consumer and cloud org charts. If you’re building a business workflow on top of these services, that’s operational risk—not just inconvenience.

Pinterest AI slop and moderation

Staying with platforms making aggressive AI bets—Pinterest is catching heat from the people who historically made it valuable. According to reporting from 404 Media, Pinterest users, especially artists, say the platform has sharply deteriorated over the last year as it “goes all in” on AI. The complaints come in two flavors. One is the content layer: feeds increasingly stuffed with AI-generated images—what users bluntly call “AI slop.” Even if you’re trying to curate a reference library or follow human artists, you can end up fighting the recommendation system by repeatedly telling Pinterest you don’t want that content, essentially retraining your own feed by hand.

The second is enforcement: creators describe AI-driven moderation pulling down posts and mislabeling human-made work as “AI modified,” sometimes even leading to account bans. An artist, Tiana Oreglia, said it feels increasingly impossible to reach a human, and described frequent takedowns while using Pinterest to store reference materials like anatomy photos. She also claims the automated enforcement seems to disproportionately flag images of female figures—even fully clothed—forcing constant appeals. Pinterest told 404 Media it has clear rules around adult content and uses a mix of AI and human review, including a human-run appeals process that can restore content when mistakes happen. But the user examples show how messy automated categories can get: a muscular woman in a bikini holding knives flagged, a painting of two clothed women embracing flagged, and even a stock photo of a man with a gun reportedly flagged as “self-harm.”

On Reddit, multiple artists describe an “endless loop” where Pinterest keeps auto-tagging hand-drawn art as AI, even after appeals succeed—appeals that can take a day or two. Another artist said work from a decade ago is being labeled as AI, and removing the label is a long process that can still be denied. There’s also a training-data undertone here. One artist removed their own work after learning public pins could be used to train Pinterest’s text-to-image model, Pinterest Canvas—yet their art could still be reposted by others, effectively feeding the model anyway.

Pinterest recently laid off around 15% of its workforce and has been explicit about prioritizing AI. The tension is that “AI-forward” can mean better discovery in theory, but it can also mean more synthetic content, less human oversight, and a customer experience where the appeals process becomes part of the product. For a platform built on visual trust and attribution, that’s a risky trade.

BinaryAudit benchmark for backdoors

Now for something more technical—and arguably more exciting if you care about security: a new benchmark called BinaryAudit asks whether AI agents can actually reverse-engineer binaries to find hidden backdoors. Quesma published BinaryAudit as an open-source benchmark designed to test a very specific and very real-world skill: given a large, stripped binary executable with no source code, can an AI agent detect a backdoor—and not just say “yes or no,” but identify the function address where the malicious code lives? They partnered with reverse-engineering specialist Michał “Redford” Kowalczyk from Dragon Sector and built tasks from real open-source programs—lighttpd, dnsmasq, Dropbear, and the Rust proxy Sozu—then injected controlled, artificial backdoors. Agents are allowed to use common reversing tools—Ghidra, Radare2, GNU binutils—so this isn’t a pure “model in a box” test. It’s closer to how a real analyst might work, except the analyst is a tool-using agent.

The results: impressive in flashes, not dependable overall. The top model listed, Claude Opus 4.6, solved 49% of tasks. Gemini 3 Pro came in at 44%, and Claude Opus 4.5 at 37%. But the statistic that matters for anyone thinking about operational deployment is the false-positive rate: models reported backdoors in clean binaries about 28% of the time. That’s where the base-rate problem bites. In real environments, truly backdoored binaries are rare. A tool that screams “backdoor!” more than a quarter of the time on clean inputs becomes noise, not signal—and it can burn security teams with endless triage.

BinaryAudit includes illustrative cases. One success story: an agent spots a suspicious import of popen(), traces it to a “debug header” routine in lighttpd, and confirms it executes commands from an undocumented HTTP header, then leaks output back via a response header. That’s meaningful tool-driven reasoning, not just pattern matching. A failure story is equally telling: an agent finds an execl("/bin/sh", "sh", "-c", ...) path in dnsmasq—so it sees the dangerous-looking behavior—but rationalizes it as legitimate script execution and doesn’t verify that the command originates from untrusted DHCP packet data. In other words: the model identifies the loaded gun, but doesn’t follow who’s pulling the trigger.

The authors describe binaries as a needle-in-a-haystack problem: the malicious change might be a few lines spread across thousands of functions. They also point out practical tool limits—Ghidra and Radare2 are decent for C, shakier for Rust, and struggle with large Go binaries, so the benchmark mostly avoids Go to keep tool quality from dominating. Even with all that, the conclusion is optimistic: AI is now capable of genuine reverse-engineering assistance, especially as a first-pass audit for non-experts. The next improvements could come from better agent strategies—what they call context engineering—plus access to commercial tools like IDA Pro or Binary Ninja, and local fine-tuned models for organizations that can’t ship sensitive binaries to external services. If you’re in security, BinaryAudit looks like a useful reality check: we’re past demos, but we’re not at dependable automation yet.
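To make the base-rate argument concrete, here is a minimal Bayes' rule sketch in Go. Only the 49% detection rate and 28% false-positive rate come from the benchmark; the 1-in-1,000 base rate is an assumption invented here purely for illustration.

```go
package main

import "fmt"

// Rough Bayes' rule sketch of the base-rate problem described above.
// detectRate and falsePositive are the benchmark's reported figures;
// baseRate is an assumed, illustrative prevalence of real backdoors.
func main() {
	const (
		baseRate      = 0.001 // assumed: 1 in 1,000 scanned binaries is backdoored
		detectRate    = 0.49  // top model's solve rate on backdoored binaries
		falsePositive = 0.28  // rate of "backdoor!" reports on clean binaries
	)

	// P(actual backdoor | model flags a backdoor)
	flagged := detectRate*baseRate + falsePositive*(1-baseRate)
	precision := detectRate * baseRate / flagged

	fmt.Printf("Share of flags that are real: %.2f%%\n", precision*100)
	// Prints 0.17% under these assumptions: nearly every alert
	// would be a false alarm, i.e. triage noise rather than signal.
}
```

Under that (hypothetical) prevalence, fewer than 1 in 500 alerts would point at a real backdoor, which is why the false-positive rate, not the solve rate, is the number that decides operational usefulness.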

Aqua encrypted agent messaging protocol

Let’s shift from auditing binaries to coordinating agents. There’s a new open-source project called Aqua—short for “AQUA Queries & Unifies Agents”—positioning itself as a messaging protocol and CLI for AI agents. Aqua is hosted on GitHub under quailyquaily/aqua, written mainly in Go, and it’s focused on practical infrastructure: peer-to-peer communication with identity verification, end-to-end encryption, and durable message storage using inbox and outbox queues. A detail worth calling out is connectivity. Aqua supports Circuit Relay v2 so agents can communicate across networks even when direct dialing fails—basically a built-in fallback for the real world where NATs and firewalls exist. The workflow is intentionally CLI-first: generate a peer ID, run a node, exchange and verify contacts, send messages, and check unread inbox items. Data lives in a local directory by default, and you can relocate it via an environment variable or a flag. The repo also documents relay mode, diagnostics like ping and capabilities, and includes architecture notes. And because today is February 23rd, 2026, it’s neat timing that the project’s latest listed release is v0.0.18, dated today. It’s early-stage, but it points at a bigger trend: agent ecosystems need more than model APIs—they need secure, durable, auditable communication patterns that look a lot like messaging systems and less like one-off tool calls.
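To give a feel for the durable inbox/outbox pattern described above, here is a small Go sketch. To be clear, the type and method names below are invented for illustration and are not Aqua's actual API; the real implementation lives at github.com/quailyquaily/aqua.

```go
package main

import (
	"fmt"
	"sync"
)

// Illustrative sketch only: these names are hypothetical, NOT Aqua's API.
// The idea being modeled: messages persist in a queue until explicitly
// marked as read, so delivery survives restarts and offline peers.

type Message struct {
	From, To, Body string
	Read           bool
}

type Queue struct {
	mu   sync.Mutex // guards msgs against concurrent agent access
	msgs []Message
}

// Push appends a message (an outbox would drain this toward the peer;
// an inbox holds it until the agent reads it).
func (q *Queue) Push(m Message) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.msgs = append(q.msgs, m)
}

// Unread returns all messages not yet marked read, mirroring a
// "check unread inbox items" step in a CLI workflow.
func (q *Queue) Unread() []Message {
	q.mu.Lock()
	defer q.mu.Unlock()
	var out []Message
	for _, m := range q.msgs {
		if !m.Read {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	inbox := &Queue{}
	inbox.Push(Message{From: "agent-a", To: "agent-b", Body: "hello"})
	fmt.Println(len(inbox.Unread())) // prints 1
}
```

The design point is the decoupling: a sender writes to its local outbox and moves on, and delivery (direct or via a relay) is a separate, retryable concern rather than a blocking call.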

LLM Timeline: models and milestones

Two quick items to round out the episode—both more reflective, but still practical. First, the LLM Timeline website. It’s essentially a dense, chronological reference of more than 194 major language models and milestones from 2017 through early 2026. It starts where many of these stories begin: the Transformer paper, “Attention Is All You Need,” in 2017. Then it charts the major waves—ELMo, GPT-1, BERT; the scaling era of GPT-2 and GPT-3; Mixture-of-Experts work like GShard and Switch Transformer; alignment milestones like InstructGPT and the explosion of ChatGPT in late 2022. It also tracks the open-weights acceleration, the rise of long-context systems, multimodality, and the recent “reasoning model” push, plus efficiency shocks like DeepSeek’s cost claims and the broader move toward agentic, tool-using models heading into 2026. If you ever find yourself asking, “Wait—when did 1M-token context become a thing?” or “Which releases were open versus closed?” this timeline is built for that kind of grounding.

Second, an essay by Juan Cruz Fortunatti that blends Wittgenstein with modern LLM-assisted development. The core claim is that many failures in AI collaboration—especially in creative coding—aren’t just about raw capability. They’re about mismatched “language games.” Fortunatti describes working with models like Claude and GPT across Three.js visualizations, shaders, and particle systems. Models can refactor large codebases, but they stumble when the goal is subjective: “make it more fluid,” “more organic,” “more porous.” He maps that to Wittgenstein’s idea that meaning isn’t a private object in your head—it’s shaped by shared use in a community and a context. His practical takeaway is refreshing: codebases themselves become the shared reference frame. As a project evolves, names and components—“card component,” “this dropdown”—stop being vague. They become inspectable anchors that both human and model can point to.

In that view, “prompt engineering” has a ceiling when you’re trying to create meaning in an empty space; the real leverage is building shared structure the model can read, navigate, and test. It’s not philosophy for philosophy’s sake. It’s a reminder that better AI collaboration often looks like better software engineering: clear components, consistent naming, legible modules, and feedback loops that ground intent.

That’s it for today’s AI News edition. If there’s a thread connecting these stories, it’s that AI is becoming infrastructure—subscriptions, moderation, security analysis, agent communication, even the way we form shared understanding in code. And when AI becomes infrastructure, reliability, escalation paths, and clear interfaces matter just as much as flashy capabilities. I’m TrendTeller, and this was The Automated Daily. Links to all the stories we discussed can be found in the episode notes.