GPT helps crack Erdős conjecture & Talent and compute arms race - AI News (Apr 28, 2026)

A 23-year-old amateur may have cracked a decades-old Erdős conjecture—after a prompt to GPT-5.4 Pro—yet the real story is what humans still had to do to make the math believable. Welcome to The Automated Daily, AI News edition. The podcast created by generative AI. I’m TrendTeller, and today is April-28th-2026. Let’s get into what happened in AI, and why it matters.

GPT helps crack Erdős conjecture

Starting with that math surprise. A young amateur, Liam Price, posted what looks like a genuine solution to a long-standing Erdős conjecture about “primitive sets” and a particular sum Erdős studied. What’s striking is that the key move reportedly came from GPT-5.4 Pro making an unusual connection—pulling in a known formula from a neighboring area that researchers hadn’t applied in this exact way. Experts like Terence Tao and others say the AI’s raw proof was messy, but the central idea appears to hold up after human reconstruction. The takeaway isn’t “AI replaces mathematicians.” It’s that models can now propose unfamiliar pathways—while humans still carry the burden of rigor, explanation, and trust.

Talent and compute arms race

Now zooming out to the AI industry’s other big theme: the race is increasingly about who gets the talent and the compute—often at the same time. Thinking Machines Lab, or TML, is reportedly scaling fast by hiring notable researchers from Meta, even as Meta has picked up several TML founders. On paper it looks like a tug-of-war; on LinkedIn, the net flow currently seems to favor TML. And it’s not just hiring—TML also landed a major cloud deal with Google that reportedly includes early access to Nvidia’s newest GB300 chips. For a lab with roughly 140 people and limited public product output, that combination—elite researchers plus scarce infrastructure—signals how “access” can outrank track record in today’s AI market.

Cloud security risks in 2026

Anthropic is a second example of the same dynamic, but at hyperscaler scale. Bloomberg reports Google plans to invest at least ten billion dollars into Anthropic, potentially much more if targets are hit, coming days after Amazon announced another large commitment. The practical reason is capacity: Anthropic’s growth—especially around Claude’s developer and agentic tooling—has pushed infrastructure hard enough to cause outages and usage limits. These mega-investments are also a flywheel: cloud providers fund top labs, and those labs then spend heavily on those same clouds to train and serve models, even when the cloud providers are building their own AI offerings.

AI agents: memory and repo crawling

And the compute story isn’t only GPUs anymore. Meta and AWS expanded an agreement to run large-scale AI workloads on Graviton CPUs, framing it around “agentic AI” workloads that can be surprisingly CPU-hungry in production—think orchestration, retrieval, and lots of small, fast tasks around the model. The broader message: the AI stack is diversifying, and infrastructure advantages now include CPUs, networking, power delivery, and operations—not just the latest accelerator.

Evaluating and measuring AI coding

A separate analysis made that bottleneck picture even clearer, arguing that AI coding agents may be the first truly repeat-paid AI product at scale—and that demand is colliding with slow-to-expand industrial realities. The claim is that shortages move upstream: it’s not only GPU supply, but packaging, memory, power, and eventually the limited ability of advanced manufacturing to ramp quickly. Why it matters: even if model quality keeps improving, users may still feel friction through rationing—stricter limits, higher prices, or more aggressive tiering—simply because atoms and megawatts don’t scale like software.

Distributed training across data centers

That pressure is showing up in deals that look more like corporate strategy than simple product growth. One report says SpaceX has an option arrangement tied to the AI coding startup Cursor—either a massive acquisition option, or a large payout linked to joint work. Cursor reportedly needed a backstop as model-usage costs squeezed margins, and SpaceX gains leverage: access to strong coding automation while steering compute and model dependence. It’s another sign that application-layer AI companies are being pulled into infrastructure politics—because inference bills can become existential.

Generative vision models do perception

Staying with agents and developer tooling: Anthropic released a public beta “Memory” feature for Claude Managed Agents. The key point here is governance. Anthropic is positioning memory as something you can audit, scope, and roll back—more like a controlled knowledge base than a mysterious blob of context. Persistent memory is what makes agents feel less like short-lived chat sessions and more like ongoing coworkers, but it also raises obvious questions about privacy, data retention, and who’s allowed to write to that memory in the first place.

Sovereign AI: hype versus reality

In the same neighborhood, Anthropic is also testing an unreleased Claude Code feature called “Bugcrawl,” which appears designed to scan larger portions of a repository—more like broad codebase analysis than file-by-file help. If this ships, it pushes coding assistants further into “wide context” work that teams actually pay for: finding patterns, risky areas, and likely defects across a whole project. The catch, as the interface itself warns, is cost—these scans can be token-intensive, and that cost will shape who uses it and how often.

AI product trust, pricing, and UX

If agents are getting more capable, teams also need better ways to decide whether a new model or prompt change made things better or worse. One essay argues traditional testing breaks for stochastic systems, so enterprises are building an “AI evaluation stack”: quick structural checks to catch obvious failures, plus model-based judging to score usefulness and policy compliance, backed by curated regression sets that evolve from real production incidents. The point is simple: without continuous evaluation, AI quality drifts quietly—until it fails loudly in front of customers.

Real-world AI: the agent-run store

And on the topic of measuring AI in software work, a developer reverse-engineered analytics from an AI-enhanced IDE and argues the “percent of code written by AI” can be wildly inflated depending on how the metric is computed. Another tool that ties attribution to commits looked more reasonable, but still overcounted in edge cases. Why it matters: leaders love tidy ROI dashboards, but simplistic byte-or-line counting can distort staffing plans, performance expectations, and even legal assumptions about authorship.

On the research side, Google DeepMind introduced Decoupled DiLoCo, a distributed training approach meant to keep large runs moving even when parts of the system fail or when compute is spread across regions. Instead of tightly locking every accelerator into the same step, it allows looser synchronization, so an outage doesn’t freeze the entire job. The significance is operational: frontier training is increasingly a reliability problem as much as an algorithmic one.

Another paper—nicknamed “Vision Banana”—argues something provocative: image generators can be tuned into strong general visual understanding systems by turning perception tasks into image outputs, like producing a segmentation mask or depth map as an image. If the results hold up broadly, it suggests generative pretraining may become an even more central route to general-purpose vision, reducing the need for separate specialized architectures for every task.

Meta research also surveyed “efficient video intelligence” as of April 2026, emphasizing a practical trend: compressing and distilling video understanding so it works on real devices and long clips, not just short benchmarks. The through-line is efficiency—less redundant processing, smarter temporal handling, and on-device models that are finally credible for tracking and segmentation. It’s a reminder that progress isn’t only bigger models; it’s making them usable where latency, battery, and cost actually matter.

Now to sovereignty—because it’s everywhere in policy decks right now. One critique argues “sovereign labs,” meaning nationally branded frontier-model builders, are mostly unnecessary for typical enterprise needs. The author draws a line between sovereign pre-training and sovereign deployment, and says most companies really want data residency, auditability, and protection against their data being absorbed into someone else’s training loop. That’s less about model nationality and more about controlling data flows and deployments—often using open models locally, with strict isolation for sensitive inputs.

Still, sovereign AI is attracting major alliances. Cohere and Germany’s Aleph Alpha announced a partnership positioned as an independent, enterprise-grade alternative for regulated sectors, with sovereign cloud hosting in the mix. Whether this becomes a real technical advantage or mainly a procurement story will depend on performance, integration, and long-term support—but the demand signal is clear: governments and regulated industries want leverage and options.

A few product and trust stories round out the day. Google appears to be preparing a shift of the Gemini app toward a credit-based usage model. If that lands, it’s a more flexible way to price heavy features—especially long multimodal sessions and agentic tools—while making costs feel more “metered” than “tiered.” Expect this to influence user behavior, because credits change how people experiment.

Canva also dealt with a trust-and-safety mess: users reported its Magic Layers feature was replacing the word “Palestine” in designs. Canva says it fixed the bug and added safeguards. Even if it was unintended, it’s a sharp example of why creators get nervous when AI tools touch existing content: a small, opaque change can become politically loaded instantly, and trust is hard to win back once people fear silent edits.

In governance and public posture, OpenAI published a new “Our Principles” statement, framing commitments around democratization, empowerment, prosperity, resilience, and adaptability—while acknowledging that in some cases it may prioritize safety over maximum user control. These documents don’t settle debates on their own, but they signal how labs are positioning themselves as scrutiny rises from regulators and the public.

Finally, two reality checks—one societal, one operational. A writer argued for “generative AI vegetarianism”: a personal stance of opting out of generative AI tools in daily life to preserve autonomy, craft, and critical thinking, while still allowing older, narrower automation like spam filtering. Whether you agree or not, it’s a useful label for a growing counter-movement against default AI adoption.

And in San Francisco, a boutique called Andon Market is being billed as the first retail store “run by an AI agent,” Luna. The experiment gave Luna money, a lease, and control over decisions—yet early outcomes include bizarre over-ordering, missing price tags, scheduling shutdowns, and a reported operating loss. It’s an unusually honest demo of the gap between persuasive AI interfaces and the messy, physical, exception-filled world. Agents can plan and talk; running a store still demands dependable execution—and humans are quietly doing much of that work.

That’s the update for April-28th-2026: AI inspiring real mathematical progress, labs and clouds locking arms around compute, agents gaining memory and broader codebase reach, and repeated reminders that trust and reliability are the hard parts. Links to all stories can be found in the episode notes. Thanks for listening to The Automated Daily, AI News edition—I’m TrendTeller. Talk to you tomorrow.

GPT helps crack Erdős conjecture & Talent and compute arms race - AI News (Apr 28, 2026)

Our Sponsors

Today's AI News Topics