GPT helps crack Erdős conjecture & Talent and compute arms race - AI News (Apr 28, 2026)
GPT sparks an Erdős proof, AI labs trade talent for chips, Anthropic gets mega-funding, Gemini may go credit-based, and cloud threats stay stubbornly basic.
Today's AI News Topics
- GPT helps crack Erdős conjecture — An amateur used GPT-5.4 Pro to spark a novel proof idea for an Erdős conjecture on primitive sets; experts say humans still had to verify and rewrite the argument. Keywords: Erdős, primitive sets, GPT-5.4, proof verification, Terence Tao.
- Talent and compute arms race — Thinking Machines Lab and Meta are trading researchers while big cloud deals unlock scarce Nvidia compute; the pattern repeats with Anthropic funding and Meta’s AWS CPU expansion. Keywords: talent mobility, GB300, cloud commitments, infrastructure access, valuation.
- Cloud security risks in 2026 — Wiz’s 2026 retrospective says most breaches still come from misconfigurations, exposed secrets, and known vulns—AI mainly expands the surface area and speeds attacker workflows. Keywords: cloud misconfig, supply chain, identities, integrations, AI reconnaissance.
- AI agents: memory and repo crawling — Anthropic is pushing persistent Memory for managed agents, while Claude Code experiments like “Bugcrawl” hint at full-repo scanning—raising stakes for governance and token budgets. Keywords: agent memory, audit logs, Claude API, repo analysis, enterprise controls.
- Evaluating and measuring AI coding — Teams are building evaluation stacks because LLM testing isn’t deterministic, while debates grow over inflated “AI wrote X% of code” dashboards that can mislead leadership. Keywords: LLM eval, CI regression, LLM-as-judge, attribution metrics, ROI.
- Distributed training across data centers — DeepMind’s Decoupled DiLoCo trains across regions with looser synchronization, aiming to keep runs going through outages and reduce networking bottlenecks. Keywords: distributed training, resiliency, WAN bandwidth, self-healing, scaling.
- Generative vision models do perception — The “Vision Banana” paper claims an image generator can be tuned into a strong general vision system by expressing tasks as image outputs, blurring the line between understanding and generation. Keywords: generative pretraining, segmentation, depth, unified vision, benchmarks.
- Sovereign AI: hype versus reality — A critique argues most enterprises don’t need nationally branded frontier models; the real requirement is sovereign deployment—data residency, auditability, and control of data flows. Keywords: sovereign AI, data residency, open models, vendor lock-in, compliance.
- AI product trust, pricing, and UX — Google may move Gemini toward credits, Canva fixed a politically sensitive text-alteration bug, and OpenAI published new principles—together highlighting trust, pricing, and governance pressures. Keywords: usage credits, safety, transparency, content integrity, policy.
- Real-world AI: the agent-run store — A San Francisco shop run by an AI agent made bizarre inventory choices and lost money, illustrating how fragile autonomy looks outside demos and APIs. Keywords: AI agent, retail ops, automation limits, human-in-the-loop, hype gap.
Sources & AI News References
- → Thinking Machines Lab counters Meta poaching with major hires and a Google compute deal
- → San Francisco Boutique Run by an A.I. Agent Struggles With Inventory and Staffing
- → Post Argues Sovereign AI Labs Are Unnecessary for Most Enterprise Needs
- → Google Eyes Up to $40B Investment in Anthropic as Compute Demand Surges
- → Wiz: Familiar Cloud Weaknesses Drove 2025 Attacks as AI and Ecosystem Trust Amplified Impact
- → Sean Boots Makes the Case for ‘Generative AI Vegetarianism’
- → DeepMind unveils Decoupled DiLoCo for fault-tolerant global AI training
- → Google Signals Shift to Credit-Based Gemini Usage and Adds New Images Section
- → SpaceX Secures $60B Option to Buy Cursor as AI Compute Costs Squeeze Margins
- → Canva fixes Magic Layers bug that replaced 'Palestine' in user designs
- → Anthropic Adds Auditable Memory to Claude Managed Agents in Public Beta
- → David Silver’s new AI lab Ineffable raises $1.1B to build reinforcement-learning ‘superlearner’
- → Meta Expands AWS Deal to Run Agentic AI Workloads on Graviton CPUs
- → OpenAI Issues New Five-Principle AGI Framework Amid Rising Regulatory Scrutiny
- → Vision Banana Paper Claims Image Generators Can Become Generalist Vision Models
- → Coding Agents Fuel AI Demand Surge, Exposing Compute and Chip Supply Bottlenecks
- → Anthropic tests ‘Bugcrawl’ repo-wide bug scanning for Claude Code
- → Stash launches as a self-hosted persistent memory layer for AI agents via MCP and Postgres
- → VentureBeat outlines a layered evaluation stack to monitor LLM drift, retries, and refusals
- → Paper Proposes Trajectory Summaries to Scale Test-Time Compute for Coding Agents
- → Efficient Video Intelligence in 2026: Compression, On-Device Tracking, and Deployment Challenges
- → Amateur’s ChatGPT Prompt Leads to New Proof of 60-Year-Old Erdős Conjecture
- → Cohere and Aleph Alpha Form Sovereign AI Partnership Backed by Schwarz Group
- → Tests Suggest AI IDE Dashboards Can Overstate How Much Code AI Writes
Full Episode Transcript: GPT helps crack Erdős conjecture & Talent and compute arms race
A 23-year-old amateur may have cracked a decades-old Erdős conjecture—after a prompt to GPT-5.4 Pro—yet the real story is what humans still had to do to make the math believable. Welcome to The Automated Daily, AI News edition, the podcast created by generative AI. I’m TrendTeller, and today is April 28th, 2026. Let’s get into what happened in AI, and why it matters.
GPT helps crack Erdős conjecture
Starting with that math surprise. A young amateur, Liam Price, posted what looks like a genuine solution to a long-standing Erdős conjecture about “primitive sets” and a particular sum Erdős studied. What’s striking is that the key move reportedly came from GPT-5.4 Pro making an unusual connection—pulling in a known formula from a neighboring area that researchers hadn’t applied in this exact way. Experts, Terence Tao among them, say the AI’s raw proof was messy, but the central idea appears to hold up after human reconstruction. The takeaway isn’t “AI replaces mathematicians.” It’s that models can now propose unfamiliar pathways, while humans still carry the burden of rigor, explanation, and trust.
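For listeners who want the statement itself, here it is, assuming the episode is referring to Erdős’s classic conjecture on primitive sets (sets of integers greater than one in which no element divides another): Erdős proved in 1935 that a certain sum converges for every primitive set, and conjectured that the primes maximize it.

```latex
% A set A of integers > 1 is "primitive" if no element of A divides another.
% Erdős (1935): f(A) converges for every primitive set A.
% The conjecture: the primes achieve the largest possible value.
f(A) = \sum_{a \in A} \frac{1}{a \log a},
\qquad
f(A) \le f(\mathbb{P}) = \sum_{p \text{ prime}} \frac{1}{p \log p} \approx 1.6366.
```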
Talent and compute arms race
Now zooming out to the AI industry’s other big theme: the race is increasingly about who gets the talent and the compute—often at the same time. Thinking Machines Lab, or TML, is reportedly scaling fast by hiring notable researchers from Meta, even as Meta has picked up several TML founders. On paper it looks like a tug-of-war; on LinkedIn, the net flow currently seems to favor TML. And it’s not just hiring—TML also landed a major cloud deal with Google that reportedly includes early access to Nvidia’s newest GB300 chips. For a lab with roughly 140 people and limited public product output, that combination—elite researchers plus scarce infrastructure—signals how “access” can outrank track record in today’s AI market.
Anthropic is a second example of the same dynamic, but at hyperscaler scale. Bloomberg reports that Google plans to invest at least ten billion dollars in Anthropic, potentially much more if targets are hit, just days after Amazon announced another large commitment of its own. The practical reason is capacity: Anthropic’s growth—especially around Claude’s developer and agentic tooling—has pushed infrastructure hard enough to cause outages and usage limits. These mega-investments are also a flywheel: cloud providers fund top labs, and those labs then spend heavily on those same clouds to train and serve models, even as the cloud providers build their own competing AI offerings.
And the compute story isn’t only GPUs anymore. Meta and AWS expanded an agreement to run large-scale AI workloads on Graviton CPUs, framing it around “agentic AI” workloads that can be surprisingly CPU-hungry in production—think orchestration, retrieval, and lots of small, fast tasks around the model. The broader message: the AI stack is diversifying, and infrastructure advantages now include CPUs, networking, power delivery, and operations—not just the latest accelerator.
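To make the “CPU-hungry” point concrete, here is a minimal sketch of a single agent step; every function name is invented for illustration, and none of this reflects Meta’s or AWS’s actual stack. The point is simply that most of the steps around the model call are ordinary CPU work.

```python
import time

def embed(text: str) -> list[float]:
    # Stand-in for a small CPU-side embedding model.
    return [float(ord(c) % 7) for c in text[:16]]

def retrieve(query: str, docs: list[str]) -> list[str]:
    # Retrieval and ranking: typically CPU work, not accelerator work.
    q = embed(query)
    return sorted(docs, key=lambda d: -sum(a * b for a, b in zip(q, embed(d))))[:2]

def call_model(prompt: str) -> str:
    # The only accelerator-bound step in the loop; stubbed here.
    time.sleep(0.01)
    return "answer: " + prompt[:40]

def agent_step(query: str, docs: list[str]) -> str:
    context = retrieve(query, docs)              # CPU: retrieval + ranking
    prompt = query + "\n" + "\n".join(context)   # CPU: prompt assembly
    return call_model(prompt).strip()            # GPU/remote: inference

print(agent_step("restock policy?", ["returns doc", "restock doc", "pricing doc"]))
```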
A separate analysis made that bottleneck picture even clearer, arguing that AI coding agents may be the first AI product that users pay for repeatedly and at scale—and that demand is colliding with industrial capacity that expands slowly. The claim is that shortages move upstream: it’s not only GPU supply, but packaging, memory, power, and eventually the limited ability of advanced manufacturing to ramp quickly. Why it matters: even if model quality keeps improving, users may still feel friction through rationing—stricter limits, higher prices, or more aggressive tiering—simply because atoms and megawatts don’t scale like software.
That pressure is showing up in deals that look more like corporate strategy than simple product growth. One report says SpaceX has an option arrangement tied to the AI coding startup Cursor—either a massive acquisition option, or a large payout linked to joint work. Cursor reportedly needed a backstop as model-usage costs squeezed margins, and SpaceX gains leverage: access to strong coding automation while steering compute and model dependence. It’s another sign that application-layer AI companies are being pulled into infrastructure politics—because inference bills can become existential.
Cloud security risks in 2026
Before leaving infrastructure, a quick security check-in. Wiz’s 2026 retrospective finds that most cloud breaches still come from familiar weaknesses: misconfigurations, exposed secrets, and known, unpatched vulnerabilities. AI’s role so far is mostly as an amplifier. It widens the attack surface through new integrations and identities, and it speeds up attacker reconnaissance, but the fundamentals still decide most outcomes.
AI agents: memory and repo crawling
Turning to agents and developer tooling: Anthropic released a public beta “Memory” feature for Claude Managed Agents. The key point here is governance. Anthropic is positioning memory as something you can audit, scope, and roll back—more like a controlled knowledge base than a mysterious blob of context. Persistent memory is what makes agents feel less like short-lived chat sessions and more like ongoing coworkers, but it also raises obvious questions about privacy, data retention, and who’s allowed to write to that memory in the first place.
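Anthropic hasn’t published internals, so treat this as a purely illustrative sketch of what memory you can audit, scope, and roll back could look like; it is not the Claude API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class MemoryEntry:
    key: str
    value: str
    writer: str          # which principal wrote this (user, agent, admin)
    scope: str           # e.g. "project:billing", bounding where it applies
    written_at: datetime

@dataclass
class AgentMemory:
    log: list[MemoryEntry] = field(default_factory=list)  # append-only audit log

    def write(self, key: str, value: str, writer: str, scope: str) -> None:
        self.log.append(MemoryEntry(key, value, writer, scope,
                                    datetime.now(timezone.utc)))

    def read(self, key: str, scope: str) -> str | None:
        # Latest write wins; scoping keeps memories from leaking across projects.
        for entry in reversed(self.log):
            if entry.key == key and entry.scope == scope:
                return entry.value
        return None

    def rollback(self, before: datetime) -> None:
        # Rolling back truncates the log, so history up to the cut point
        # stays auditable rather than being silently rewritten.
        self.log = [e for e in self.log if e.written_at < before]
```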
In the same neighborhood, Anthropic is also testing an unreleased Claude Code feature called “Bugcrawl,” which appears designed to scan larger portions of a repository—more like broad codebase analysis than file-by-file help. If this ships, it pushes coding assistants further into “wide context” work that teams actually pay for: finding patterns, risky areas, and likely defects across a whole project. The catch, as the interface itself warns, is cost—these scans can be token-intensive, and that cost will shape who uses it and how often.
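The cost concern is easy to sanity-check with a back-of-envelope estimate. The sketch below uses the common rule of thumb of roughly four characters per token; it is a heuristic, not a Claude-specific constant.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough rule of thumb, not a model-specific constant

def estimate_repo_tokens(root: str, exts=(".py", ".ts", ".go")) -> int:
    total_chars = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in exts:
            total_chars += len(path.read_text(errors="ignore"))
    return total_chars // CHARS_PER_TOKEN

# A repo with ~2 MB of source is already ~500k input tokens per full pass,
# before any model output, so repeated repo-wide scans add up quickly.
print(estimate_repo_tokens("."))
```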
Evaluating and measuring AI coding
If agents are getting more capable, teams also need better ways to decide whether a new model or prompt change made things better or worse. One essay argues traditional testing breaks for stochastic systems, so enterprises are building an “AI evaluation stack”: quick structural checks to catch obvious failures, plus model-based judging to score usefulness and policy compliance, backed by curated regression sets that evolve from real production incidents. The point is simple: without continuous evaluation, AI quality drifts quietly—until it fails loudly in front of customers.
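As a concrete sketch of that layered idea, here is a toy harness with invented names and the model-based judge stubbed out; a real stack would call an actual judge model and track scores over time.

```python
def structural_checks(output: str) -> bool:
    # Layer 1: cheap deterministic gates that catch obvious failures.
    return bool(output.strip()) and len(output) < 4000

def judge_score(prompt: str, output: str) -> float:
    # Layer 2: an LLM-as-judge call would go here; stubbed for the sketch.
    return 1.0 if prompt.split("?")[0].split()[-1] in output.lower() else 0.0

REGRESSION_SET = [
    # Layer 3: curated cases, ideally grown from real production incidents.
    {"prompt": "what is our refund policy?", "must_contain": "refund"},
]

def passes_eval(candidate, threshold: float = 0.7) -> bool:
    for case in REGRESSION_SET:
        out = candidate(case["prompt"])
        if not structural_checks(out):
            return False
        if case["must_contain"] not in out.lower():
            return False
        if judge_score(case["prompt"], out) < threshold:
            return False
    return True

# Gate deploys on passes_eval(new_version), and add a regression case every
# time production surfaces a new failure mode, so quality can't drift quietly.
print(passes_eval(lambda p: "Our refund policy allows returns within 30 days."))
```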
And on the topic of measuring AI in software work, a developer reverse-engineered analytics from an AI-enhanced IDE and argues the “percent of code written by AI” can be wildly inflated depending on how the metric is computed. Another tool that ties attribution to commits looked more reasonable, but still overcounted in edge cases. Why it matters: leaders love tidy ROI dashboards, but simplistic byte-or-line counting can distort staffing plans, performance expectations, and even legal assumptions about authorship.
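A toy example shows how much the definition matters: counting any AI-touched line as “AI-written” can triple the headline number relative to counting only lines that survive unchanged. The per-line provenance data here is invented for illustration.

```python
# Hypothetical provenance for one file after a human edits an AI draft.
# Real tools have to reconstruct this from edit history or commits.
lines = [
    {"ai_suggested": True,  "human_edited": True},
    {"ai_suggested": True,  "human_edited": False},
    {"ai_suggested": False, "human_edited": True},
    {"ai_suggested": True,  "human_edited": True},
]

naive = sum(l["ai_suggested"] for l in lines) / len(lines)
strict = sum(l["ai_suggested"] and not l["human_edited"] for l in lines) / len(lines)

print(f"'AI wrote' {naive:.0%} of lines (any AI involvement)")    # 75%
print(f"'AI wrote' {strict:.0%} of lines (unchanged AI output)")  # 25%
```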
Distributed training across data centers
On the research side, Google DeepMind introduced Decoupled DiLoCo, a distributed training approach meant to keep large runs moving even when parts of the system fail or when compute is spread across regions. Instead of tightly locking every accelerator into the same step, it allows looser synchronization, so an outage doesn’t freeze the entire job. The significance is operational: frontier training is increasingly a reliability problem as much as an algorithmic one.
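The episode doesn’t cover the paper’s details, but the DiLoCo family it extends is easy to sketch: each worker takes many local optimizer steps, and only the resulting parameter deltas are synchronized occasionally, so a failed worker can be dropped from one round instead of stalling every step. A toy version (plain averaging, where real systems use fancier outer optimizers):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, workers, inner_steps, outer_rounds, lr = 8, 4, 50, 10, 0.05
target = rng.normal(size=dim)   # toy objective: recover this vector
params = np.zeros(dim)          # globally shared parameters

for _ in range(outer_rounds):
    deltas = []
    for _ in range(workers):
        local = params.copy()
        for _ in range(inner_steps):                # many cheap local steps,
            grad = 2 * (local - target) + rng.normal(scale=0.1, size=dim)
            local -= lr * grad                      # no cross-worker sync here
        deltas.append(local - params)
    # One sync per round: average the deltas. A failed or lagging worker can
    # simply be dropped from this average instead of freezing the whole run.
    params += np.mean(deltas, axis=0)

print(np.linalg.norm(params - target))  # shrinks toward 0 across rounds
```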
Generative vision models do perception
Another paper—nicknamed “Vision Banana”—argues something provocative: image generators can be tuned into strong general visual understanding systems by turning perception tasks into image outputs, like producing a segmentation mask or depth map as an image. If the results hold up broadly, it suggests generative pretraining may become an even more central route to general-purpose vision, reducing the need for separate specialized architectures for every task.
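The core trick, as described, is representational: the perception target becomes just another image. A minimal sketch of such an encoding, with a made-up palette; the paper’s actual scheme may differ.

```python
import numpy as np

# Hypothetical palette: one RGB color per semantic class (bg, person, car).
PALETTE = {0: (0, 0, 0), 1: (255, 0, 0), 2: (0, 255, 0)}

def mask_to_image(mask: np.ndarray) -> np.ndarray:
    # Encode an (H, W) class-index mask as an (H, W, 3) RGB image, the
    # kind of target an image generator can be trained to produce.
    img = np.zeros((*mask.shape, 3), dtype=np.uint8)
    for cls, color in PALETTE.items():
        img[mask == cls] = color
    return img

def image_to_mask(img: np.ndarray) -> np.ndarray:
    # Decode by nearest palette color, recovering per-pixel class labels.
    colors = np.array(list(PALETTE.values()))               # (C, 3)
    dists = np.linalg.norm(img[..., None, :] - colors, axis=-1)
    return dists.argmin(axis=-1)

mask = np.array([[0, 1], [2, 1]])
assert (image_to_mask(mask_to_image(mask)) == mask).all()
```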
Meta research also surveyed “efficient video intelligence” as of April 2026, emphasizing a practical trend: compressing and distilling video understanding so it works on real devices and long clips, not just short benchmarks. The through-line is efficiency—less redundant processing, smarter temporal handling, and on-device models that are finally credible for tracking and segmentation. It’s a reminder that progress isn’t only bigger models; it’s making them usable where latency, battery, and cost actually matter.
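“Less redundant processing” often starts with something as simple as not re-running a model on near-identical frames. A toy sketch of that keyframe-selection idea:

```python
import numpy as np

def select_keyframes(frames: list[np.ndarray], threshold: float = 10.0):
    # Run the expensive model only on frames that changed enough since the
    # last processed frame; reuse the prior result for everything else.
    keep, last = [], None
    for i, frame in enumerate(frames):
        if last is None or np.abs(frame.astype(float) - last).mean() > threshold:
            keep.append(i)
            last = frame.astype(float)
    return keep

static = np.zeros((4, 4), dtype=np.uint8)
moving = np.full((4, 4), 200, dtype=np.uint8)
print(select_keyframes([static, static, moving, moving]))  # [0, 2]
```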
Sovereign AI: hype versus reality
Now to sovereignty—because it’s everywhere in policy decks right now. One critique argues “sovereign labs,” meaning nationally branded frontier-model builders, are mostly unnecessary for typical enterprise needs. The author draws a line between sovereign pre-training and sovereign deployment, and says most companies really want data residency, auditability, and protection against their data being absorbed into someone else’s training loop. That’s less about model nationality and more about controlling data flows and deployments—often using open models locally, with strict isolation for sensitive inputs.
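In practice, “sovereign deployment” often reduces to a routing decision at one control point. A deliberately simplified sketch with invented labels:

```python
LOCAL_MODEL = "open-weights model in our own region/VPC"
CLOUD_MODEL = "frontier API where data leaves our environment"

def route(request: dict) -> str:
    # The sovereignty question is answered here, not by model nationality:
    # sensitive or residency-bound data never leaves controlled infrastructure.
    if request["contains_pii"] or request["residency"] == "strict":
        return LOCAL_MODEL
    return CLOUD_MODEL

print(route({"contains_pii": True, "residency": "strict"}))   # local
print(route({"contains_pii": False, "residency": "none"}))    # cloud
```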
Still, sovereign AI is attracting major alliances. Cohere and Germany’s Aleph Alpha announced a partnership positioned as an independent, enterprise-grade alternative for regulated sectors, with sovereign cloud hosting in the mix. Whether this becomes a real technical advantage or mainly a procurement story will depend on performance, integration, and long-term support—but the demand signal is clear: governments and regulated industries want leverage and options.
AI product trust, pricing, and UX
A few product and trust stories round out the day. Google appears to be preparing a shift of the Gemini app toward a credit-based usage model. If that lands, it’s a more flexible way to price heavy features—especially long multimodal sessions and agentic tools—while making costs feel more “metered” than “tiered.” Expect this to influence user behavior, because credits change how people experiment.
Canva also dealt with a trust-and-safety mess: users reported its Magic Layers feature was replacing the word “Palestine” in designs. Canva says it fixed the bug and added safeguards. Even if it was unintended, it’s a sharp example of why creators get nervous when AI tools touch existing content: a small, opaque change can become politically loaded instantly, and trust is hard to win back once people fear silent edits.
In governance and public posture, OpenAI published a new “Our Principles” statement, framing commitments around democratization, empowerment, prosperity, resilience, and adaptability—while acknowledging that in some cases it may prioritize safety over maximum user control. These documents don’t settle debates on their own, but they signal how labs are positioning themselves as scrutiny rises from regulators and the public.
Real-world AI: the agent-run store
Finally, two reality checks—one societal, one operational. A writer argued for “generative AI vegetarianism”: a personal stance of opting out of generative AI tools in daily life to preserve autonomy, craft, and critical thinking, while still allowing older, narrower automation like spam filtering. Whether you agree or not, it’s a useful label for a growing counter-movement against default AI adoption.
And in San Francisco, a boutique called Andon Market is being billed as the first retail store “run by an AI agent,” Luna. The experiment gave Luna money, a lease, and control over decisions—yet early outcomes include bizarre over-ordering, missing price tags, scheduling shutdowns, and a reported operating loss. It’s an unusually honest demo of the gap between persuasive AI interfaces and the messy, physical, exception-filled world. Agents can plan and talk; running a store still demands dependable execution—and humans are quietly doing much of that work.
That’s the update for April 28th, 2026: AI inspiring real mathematical progress, labs and clouds locking arms around compute, agents gaining memory and broader codebase reach, and repeated reminders that trust and reliability are the hard parts. Links to all stories can be found in the episode notes. Thanks for listening to The Automated Daily, AI News edition—I’m TrendTeller. Talk to you tomorrow.