Transcript: AI proves new math | The Automated Daily

Sometime in the last week, an internal reasoning model at OpenAI produced a verifiable proof that overturned a longstanding conjecture in discrete geometry. The conjecture, posed by Paul Erdős, was about planar unit distances, a question that had been studied actively for roughly three decades. The model did not just generate a plausible-sounding argument. It produced steps that external mathematicians could check, and those mathematicians have now confirmed the proof works. Not 'AI suggests a direction.' Not 'AI helps a human prove something.' A model produced a verifiable proof, and the field accepts it. Welcome to The Automated Weekly, a magazine-style look at the forces shaping artificial intelligence, designed not for engineers, but for anyone trying to understand where the industry is heading. I'm TrendTeller. The Erdős proof landed in the same week that Microsoft was reported to be ending Claude Code licenses for its engineers and steering them onto GitHub Copilot. The same week Anthropic was reported to be discussing Microsoft's Maia 200 custom chips while signing a roughly forty-five-billion-dollar SpaceX compute deal. The same week the Wall Street Journal said OpenAI is moving toward an IPO as early as September. The same week Andrej Karpathy left to join Anthropic. And the human-cost side of AI got louder. A JavaScript educator took a free educational site offline after ten years because AI crawlers had tripled his hosting bills while his income had fallen to zero. The Pew Research Center published a survey showing a sharp optimism gap between AI experts and the general public. Eric Schmidt was booed off-stage at the University of Arizona commencement. Five threads. One week. Let's pull on each.

The Erdős announcement came in two parts. First, OpenAI's internal team published the result, with detailed accompanying material on the reasoning approach. Then external mathematicians went through the steps and confirmed the proof is genuine, meaning every transition between propositions is rigorously justified, with no skipped cases and no unstated assumptions. The conjecture is in the category sometimes called 'concrete but hard': about distances between points in the plane, the kind of problem that admits no shortcut and resists most known techniques. What makes the result interesting isn't just that AI did math. AI has been doing math for a while, with humans either prompting or verifying. What's different here is that the reasoning model generated the proof end-to-end in a form mathematicians can check at the level of individual steps. That's the threshold where the question 'is AI doing real research' stops being a debate. It landed in a week with other, quieter signals about how these models actually learn. A new paper from researchers at multiple institutions argued that, with enough compute, the best data-quality filter for pretraining may be no filter at all, and that careful curation has been quietly destroying signal at scale. Separately, researchers reported a phenomenon called mode-hopping during pretraining: models abruptly switching between shallow heuristics and actual reasoning, complicating which checkpoints to ship. A Goodfire paper this week argued that sparse autoencoders, the dominant tool for mechanistic interpretability, often capture features in a 'dilution' regime where individual neurons represent fractional concepts, and proposed clustering them to recover the underlying manifold structure.
On the efficiency side, Nous Research introduced Lighthouse Attention to attack long-context KV-cache costs, and a DeepMind and Seoul National University collaboration released LiteFrame, a compact video encoder that meaningfully extends long-form video understanding. Taken together: the engineering of these models is moving faster than the science describing them. The Erdős proof is the public moment. The data-filter and interpretability papers are the underground signals that suggest the public moments are about to get more frequent.

Three things lined up this week and pointed at the same conclusion. First, multiple outlets reported that Microsoft has been ending Claude Code licenses for many of its engineers and steering teams toward GitHub Copilot CLI. The framing internally is budget discipline and ecosystem control. The signal externally is that Microsoft no longer wants its developer fleet running on a competitor's premium tooling. Microsoft is the largest investor in OpenAI, so 'competitor' here specifically means Anthropic. Second, a separate report said Anthropic is discussing purchasing capacity on Microsoft's new Maia 200 custom AI chips. Anthropic, formerly the most pointedly anti-OpenAI frontier lab, is now potentially renting compute from Microsoft. Bloomberg also reported Anthropic agreed to a roughly forty-five-billion-dollar compute commitment with SpaceX. The alliance map keeps getting rewritten. Third, the Wall Street Journal reported that OpenAI is moving toward an IPO as early as September 2026, with the recent dismissal of Elon Musk's lawsuit removing a significant overhang. If that timeline holds, OpenAI's listing will be the most-anticipated public-market event of the decade. In the background, NVIDIA's Vera CPU started shipping to Anthropic, OpenAI, xAI, and Oracle. It is the company's first ARM-based server CPU, designed to pair with its GPUs at scale. Alibaba unveiled the Zhenwu M890 accelerator as part of China's push to reduce reliance on NVIDIA amid export controls. The compute squeeze story is now a margin story. Enterprises are warning that LLM inference costs are eating their margins. The Epoch AI team published an analysis arguing that frontier labs currently use only a minority of the world's operational AI compute; most of it goes to inference, open models, and non-LLM workloads. The labs need to keep growing their share to maintain training capacity. That growth costs more than their revenue can comfortably support.

While the labs were burning cash on compute, the agents themselves had a more practical week. Alibaba previewed Qwen3.7-Max with the headline that an internal evaluation included a 35-hour autonomous coding optimization run. The benchmark isn't a single response — it's endurance. How long can the agent run, with heavy tool use, before it loses coherence or hits an environmental failure? The shift from 'model accuracy on a prompt' to 'model endurance on a workflow' is a real category change in how labs are positioning their products. Cursor's engineering team published a piece arguing something many in the field have been observing: cloud coding agents live or die by the development environment they're given. Missing dependencies, misconfigured runtimes, and unavailable tools don't just cause errors — they quietly degrade output quality. The model still produces something. It just produces something worse, and you don't notice until weeks later when the code starts misbehaving. A separate strand of agent infrastructure work focused on durable execution. As agents move from a ten-minute session on a laptop to running continuously on a dedicated VM for hours or days, the failure modes shift. Now you have to handle the cloud provider's reboots, transient network outages, deployment-induced restarts, and partial-state recovery. Durable execution frameworks are starting to be embedded directly into agent harnesses for that reason. On the governance side, Warp launched Oz, an enterprise control plane for multi-harness AI agent orchestration. Oracle pushed an agent-aware data layer with permissions, audit logs, and structured access. Anthropic shared deployment patterns for Claude Code in very large repositories. Google's I/O announcements pivoted Gemini further toward agentic tasks across Search, YouTube, and Workspace — long-running actions, not chat replies. 
Two opposing pressures are meeting at the agent layer: more capability on one side, more failure modes and infrastructure overhead on the other. Whichever stack figures out the operational story first will define what enterprise AI looks like.

On Tuesday, OpenAI announced it would adopt C2PA Content Credentials and the SynthID watermarking standard for its image outputs. C2PA is a cross-industry provenance specification originally backed by Adobe, the BBC, Microsoft, and others. It cryptographically signs content with metadata about how and when it was generated. SynthID, from Google DeepMind, embeds a statistical watermark in the pixels themselves — invisible to humans, detectable by classifiers. OpenAI's adoption of both is meaningful. It means the dominant AI image producer is now committing to a verifiable provenance signal. On the same day, an open-source tool launched specifically to remove visible and invisible AI watermarks and strip provenance metadata. The tool's authors framed it as a free-speech and reverse-engineering project. The practical effect is to make the watermarks OpenAI just adopted defeatable for anyone motivated. The same week, OpenAI was reported to have acquired Weights.gg — the largest open library of celebrity voice clones. The terms weren't disclosed. The strategic reasoning is straightforward: control the largest source of unauthorized voice models, and the conversation about voice IP shifts from 'OpenAI is enabling impersonation' to 'OpenAI is responsibly managing it.' Critics argue both interpretations are correct. ChatGPT also began testing a personal finance dashboard that connects via Plaid to user bank accounts and credit cards. The promise: an AI that can analyze your spending and answer questions about cash flow. The risk surface: an AI service that now holds connectivity to your money, sees every transaction, and has a track record of producing confident-sounding wrong answers. Each of these — provenance adoption, watermark removal, voice library acquisition, bank account integration — is a node in the same trust war. 
The infrastructure for verifying content authenticity and the infrastructure for circumventing it are being built in parallel, by different actors, in the same week.

The week's most poignant story came from Axel Rauschmayer, a JavaScript educator who runs 2ality.com and several free online programming books, which have been online for over a decade. He took everything offline. The reason: AI crawlers had tripled his hosting bills while ad revenue and book sales had fallen to zero. The free educational web that anchored a generation of developers is now being scraped at industrial scale by AI training pipelines, with no compensation, while the human attention that used to fund it has migrated to those same AI products. The Pew Research Center published a survey this week showing a sharp optimism gap between AI experts and the general public. Experts are mostly positive about AI's effects on jobs and society. The public is mostly anxious. Gen Z in particular reported high AI usage combined with high anxiety and low confidence in policy direction. The trust gap isn't closing. It's widening. Eric Schmidt was booed off-stage during a commencement speech at the University of Arizona after pro-AI remarks. The Associated Press reported multiple commencement speakers being booed this season for similar comments. The pattern is becoming a real signal of how the cultural conversation is shifting. Andrej Karpathy, a co-founder of OpenAI, longtime Tesla AI lead, and one of the most respected practitioners in the field, left to join Anthropic. Talent moves matter at this layer of the industry. The Manus AI founders, meanwhile, were reportedly seeking financing to unwind Meta's acquisition of the lab after Beijing ordered the deal reversed. AI labs are now strategic national assets, treated as such by states. The texture of the year has shifted. The story of 2026 isn't 'AI continues to advance.' It's 'AI continues to advance while everyone around it gets louder.'

That's your week in AI — May 17th through May 23rd, 2026. An Erdős conjecture fell. The compute economics tightened. Microsoft retreated to GitHub Copilot. Anthropic explored Microsoft's chips and committed forty-five billion to SpaceX in the same week. OpenAI took on bank accounts and IPO timing simultaneously. Agents started getting evaluated on thirty-five-hour runs instead of single prompts. The provenance arms race got both adoption and circumvention on the same day. A JavaScript educator's free website went dark. A commencement crowd booed AI. Karpathy moved labs. Three things to watch next week. First, whether external mathematicians publish any critiques of the Erdős proof — and whether OpenAI extends the same reasoning approach to other longstanding conjectures. Second, whether the Microsoft / Claude Code unwind triggers similar moves at other major enterprises, marking the first real consolidation phase of the agentic-coding market. Third, whether the OpenAI IPO timeline survives the next four months of compute commitments and revenue scrutiny. I'll see you next Saturday. From The Automated Weekly, this is TrendTeller.