AI News · May 28, 2026 · 7:30

CEO “AI psychosis” and layoffs & Legal and coding benchmarks reality - AI News (May 28, 2026)

CEOs chasing “AI psychosis,” LLMs claiming new math proofs, tougher legal/coding benchmarks, YouTube AI labels, privacy debates, and fresh GPU tuning.

Today's AI News Topics

  1. CEO “AI psychosis” and layoffs

    — TechCrunch spotlights “AI psychosis,” where executives over-believe agent automation after glossy demos, fueling layoffs despite mixed productivity evidence.
  2. Legal and coding benchmarks reality

    — Two new yardsticks—Legal Agent Benchmark and DeepSWE—show frontier models still struggle with long-horizon, real-world work, emphasizing reliability over hype.
  3. AI claims major math proofs

    — Anthropic staff say Claude Mythos can tackle the Erdős unit-distance conjecture, echoing OpenAI and DeepMind math wins and reigniting debate over tool-assisted vs “pure” LLM results.
  4. Containing the blast radius of agents

    — Anthropic details agent security lessons: sandboxes, VMs, and egress controls matter because human approvals are inconsistent and attackers exploit weak boundaries.
  5. AI transparency and anti-AI search

    — YouTube is making AI-content labels more prominent and adding automatic detection signals, while DuckDuckGo’s AI-free search page sees a surge amid backlash to AI-heavy results.
  6. Customer data used for training

    — PostHog plans to train in-house models on customer usage data with opt-outs and regional defaults, highlighting the privacy tradeoffs behind “smarter” product features.
  7. GPU tuning, compute, and geopolitics

    — NVIDIA’s CompileIQ aims to squeeze extra GPU performance via compiler auto-tuning, while SpaceX’s S-1 raises questions about terrestrial vs orbital AI compute—and China tightens travel rules for top AI staff.
  8. Better image generation and AI fluency

    — Microsoft’s MAI-Image-2.5 climbs leaderboards with better text-in-image control, and Anthropic is reportedly building an AI Fluency scorecard to evaluate how humans use AI, not just how AI performs.

Full Episode Transcript: CEO “AI psychosis” and layoffs & Legal and coding benchmarks reality

An AI model may have produced a fresh proof for a geometry problem that’s been open since 1946—and now the big question is what even counts as an “AI breakthrough” anymore. Welcome to The Automated Daily, AI News edition. The podcast created by generative AI. I’m TrendTeller, and today is May 28th, 2026. In the next few minutes: why some CEOs are overestimating AI agents, what new benchmarks say about real-world reliability, and how platforms and vendors are scrambling to earn trust as AI spreads into everything.

CEO “AI psychosis” and layoffs

First up: the C-suite reality check. TechCrunch highlights what Box CEO Aaron Levie called “AI psychosis”—the tendency for executives to see dazzling agent demos and assume entire workflows are basically solved. The problem, Levie argues, is distance: leaders get the happy-path prototype, while teams on the ground deal with the last mile—hallucinations, edge cases, debugging, and the painful work of fitting AI into company-specific processes. The story ties this mindset to ongoing layoffs, with firms increasingly framing cuts as “AI-driven productivity,” even as research from places like UC Berkeley, NBER, and others suggests gains are inconsistent and sometimes just shift the bottleneck upward to managers who must review a flood of AI output.

Legal and coding benchmarks reality

That theme—measuring reality instead of vibes—shows up in two new benchmarks. The Legal Agent Benchmark, or LAB, released early baseline results on long-horizon legal tasks graded with an unforgiving “all-pass” standard. End-to-end success stayed in the single digits across frontier models, which is a blunt reminder that “pretty good drafting” is not the same as dependable legal work product. Meanwhile, Datacurve’s DeepSWE benchmark targets real software engineering changes across active open-source repos, designed to reduce contamination and catch verifier errors that can inflate leaderboard scores. Put together, these projects are nudging the industry away from bragging rights and toward a harder question: can agents reliably finish the job when the task is long, messy, and judged like it would be in production?
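To make the "all-pass" idea concrete, here is a minimal sketch of how such a score differs from an average over individual checks. The criterion names and the sample data are hypothetical and are not taken from LAB itself; the only assumption carried over from the episode is that a task counts as a success only when every check on it passes.

```python
from typing import Dict, List

# Each task maps a (hypothetical) criterion name to whether it passed.
TaskResult = Dict[str, bool]


def all_pass_rate(results: List[TaskResult]) -> float:
    """Fraction of tasks on which every single criterion passed."""
    if not results:
        return 0.0
    return sum(all(task.values()) for task in results) / len(results)


def mean_criterion_rate(results: List[TaskResult]) -> float:
    """Fraction of individual criteria that passed, pooled over all tasks."""
    checks = [ok for task in results for ok in task.values()]
    return sum(checks) / len(checks) if checks else 0.0


# Illustrative data only: three of four tasks miss exactly one criterion.
tasks = [
    {"cites_correct_statute": True, "deadline_correct": True, "format_ok": True},
    {"cites_correct_statute": True, "deadline_correct": False, "format_ok": True},
    {"cites_correct_statute": True, "deadline_correct": True, "format_ok": False},
    {"cites_correct_statute": False, "deadline_correct": True, "format_ok": True},
]

print(all_pass_rate(tasks))        # strict, end-to-end view
print(mean_criterion_rate(tasks))  # lenient, per-check view
```

On this toy data the per-check view looks healthy while the strict view does not, which is the gap between "pretty good drafting" and dependable work product.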

AI claims major math proofs

Now to the most eyebrow-raising claim of the day: more AI-assisted math breakthroughs. Anthropic employees say a system they call Claude Mythos produced a simple proof for the Erdős unit-distance conjecture, a problem open since 1946. That lands right after OpenAI publicized its own claimed disproof, and alongside DeepMind’s recent announcements around solving multiple Erdős problems using formalization workflows. The interesting part isn’t just who’s first—it’s how these results are achieved. Reports suggest Anthropic used a multi-agent setup where separate Claude instances explored different paths and then shared discoveries. It raises the bar for what we call an “LLM achievement,” because the line between a single model’s insight and an orchestrated tool-and-agent system is getting blurrier by the week.

Containing the blast radius of agents

As agents get more capable, Anthropic is also sharing more about how to keep them from doing damage. In a new write-up, the company argues that human-in-the-loop supervision is not enough, because people approve prompts too easily and get fatigued. So the focus shifts to containment: sandboxes, sealed VMs, and strict network controls that limit what an agent can touch even when it makes a bad decision—or an attacker tricks it. Anthropic also describes incidents that shaped this approach, including cases where credential exfiltration was possible despite guardrails. The broader takeaway is one many security teams will relate to: the safest policy is one that assumes mistakes will happen and designs the environment so mistakes can’t travel far.
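As a rough illustration of the "strict network controls" idea, the sketch below shows a default-deny egress check. It is not Anthropic's implementation; the host names and the function name are hypothetical, and in a real deployment this kind of rule would normally be enforced outside the agent's process, for example at a proxy or firewall, so the agent cannot bypass it.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: the only hosts the sandboxed agent may reach.
ALLOWED_HOSTS = frozenset({"api.internal.example", "pypi.org"})


def egress_allowed(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Default-deny check: permit only HTTPS requests to exact allowlisted hosts."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    host = parts.hostname  # lowercased by urlsplit; None if the URL has no host
    return host is not None and host in allowed_hosts


for candidate in (
    "https://pypi.org/simple/requests/",
    "http://pypi.org/simple/requests/",
    "https://pypi.org.evil.example/collect",
    "https://pypi.org@evil.example/collect",
):
    print(candidate, egress_allowed(candidate))
```

Matching on the exact parsed host name, rather than on a substring of the URL, is what keeps look-alike addresses from slipping through.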

AI transparency and anti-AI search

On trust and disclosure, YouTube is tightening how it labels AI-altered video. The platform says its AI disclosure labels will become more visible—prominent below long-form videos and overlaid on Shorts when content is photorealistic or meaningfully modified. It’s also adding automatic detection signals starting this May, so labels may appear even if creators don’t proactively disclose. YouTube says this is about transparency, not punishment—labels shouldn’t affect recommendations or monetization. At the same time, TechCrunch reports a surge of interest in DuckDuckGo’s AI-free search page, with traffic and installs rising after renewed frustration with AI-heavy search experiences. The common thread: users want control and clarity, not surprises.

Customer data used for training

That tension shows up in product analytics too. PostHog says it plans to start training its own AI models using customer data stored in PostHog, aiming to make features more proactive, starting with scaling session replay analysis and eventually things like synthetic user testing and behavior prediction. The company frames this as opt-out for some customers and opt-in for others, with defaults varying by region, and with anonymization and in-house training meant to reduce risk. Even if the intentions are practical, the significance is bigger: more vendors are deciding that generic foundation models aren’t enough, and the only way to get “useful” is to train on real customer behavior, forcing a new round of hard conversations about consent, defaults, and trust.

GPU tuning, compute, and geopolitics

In AI infrastructure, NVIDIA introduced CompileIQ in CUDA 13.3, an auto-tuning framework that searches for workload-specific compiler settings to squeeze extra performance out of already-optimized GPU kernels. This matters because modern AI stacks often hinge on a relatively small set of hot kernels, and incremental improvements can translate into real throughput and cost wins at scale. On the business side of compute, SpaceX’s latest S-1 filing is stirring debate by presenting two futures at once: big near-term revenue from terrestrial data centers—plus a far more speculative argument that AI inference eventually belongs in orbit for power and cooling advantages. The filing doesn’t fully reconcile how those stories fit together, which is exactly the kind of ambiguity investors will scrutinize when capex is massive and timelines are long.
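The auto-tuning idea can be sketched generically as a search over candidate settings that keeps whichever one measures fastest. This does not use CompileIQ's actual interface, which the episode does not describe; the setting names, the search space, and the toy cost function are all hypothetical, and exhaustive grid search stands in for whatever search strategy a real tuner uses.

```python
import itertools
from typing import Any, Callable, Dict, Optional, Sequence, Tuple

Config = Dict[str, Any]


def autotune(
    space: Dict[str, Sequence[Any]],
    measure: Callable[[Config], float],
) -> Tuple[Optional[Config], float]:
    """Try every combination in `space` and return the lowest-cost one.

    `measure` is supplied by the caller. A real tuner would compile the
    kernel with the given settings and time it on the target workload.
    """
    names = list(space)
    best_cfg: Optional[Config] = None
    best_cost = float("inf")
    for values in itertools.product(*(space[name] for name in names)):
        cfg = dict(zip(names, values))
        cost = measure(cfg)
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return best_cfg, best_cost


# Hypothetical settings and a toy cost model standing in for a real benchmark.
search_space = {"unroll": [1, 2, 4, 8], "max_registers": [32, 64, 128]}


def toy_cost(cfg: Config) -> float:
    return abs(cfg["unroll"] - 4) + abs(cfg["max_registers"] - 64) / 32


best, cost = autotune(search_space, toy_cost)
print(best, cost)
```

Because the best settings depend on the workload being measured, the result is only as good as the benchmark passed in as `measure`.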

Two more signals that AI is now treated like strategic infrastructure: China is reportedly expanding overseas travel restrictions to include top AI staff at major private companies, requiring approvals before some employees can travel. That could chill conferences, partnerships, and routine business trips, and it underlines how much AI talent is being viewed through a national-security lens. And in the multi-model world, OpenRouter raised a major new round, reflecting how enterprises increasingly want the ability to route workloads across different LLMs to manage cost, quality, and risk: less loyalty to any single model, more emphasis on flexibility and leverage.

Better image generation and AI fluency

Finally, on creative tools and the human side of AI work: Microsoft’s MAI-Image-2.5 debuted near the top of the Arena leaderboard, with the company emphasizing better instruction-following and, notably, improved text rendering inside images—one of the big blockers for professional design use. And Anthropic appears to be experimenting with a personal “AI Fluency” scorecard inside Claude, aimed at evaluating how people collaborate with AI across sessions and nudging users toward better habits like iterative refinement and careful checking. The subtext is important: the next wave of progress may come not only from smarter models, but from teaching humans to use them in ways that are safer and more dependable.

That’s the state of play: hype colliding with benchmarks, breakthroughs colliding with definitions, and trust becoming the real product feature.

That’s it for today’s AI News edition of The Automated Daily. If you’re tracking where AI is actually delivering—and where it’s still mostly a demo—today was a pretty clear snapshot. Links to all the stories we covered can be found in the episode notes. I’m TrendTeller—thanks for listening, and I’ll see you tomorrow.
