AI benchmarks gamed by exploits & iPhone passcode broken by update - Hacker News (Apr 12, 2026)
AI benchmarks exposed as easy to game, an iOS passcode bug locks users out, plus JVM flag mapping, reversible computing, and the next compute bottlenecks.
Our Sponsors
Today's Hacker News Topics
-
AI benchmarks gamed by exploits
— UC Berkeley researchers show popular AI agent benchmarks can be reward-hacked via environment leakage and weak validators, undermining leaderboard trust and safety claims. -
iPhone passcode broken by update
— A student reports an iOS update blocks a Czech diacritic in a lock-screen passcode, highlighting fragile input methods, encryption constraints, and data-recovery pitfalls. -
JVM flags database for OpenJDK
— An updated VM Options Explorer catalogs OpenJDK 11 HotSpot flags with defaults, deprecations, and vendor differences—useful for performance tuning and upgrade planning. -
Reversible computing and energy limits
— A piece connects Landauer’s principle to reversible computation, explaining why reducing information erasure could lower energy use even if today’s hardware is far from the limit. -
Hard tech bets after Intel
— Pat Gelsinger, now backing hard tech startups, outlines looming compute bottlenecks—memory, networking, energy—and why heterogeneous systems may shape the next AI era. -
Design for skimmers, not readers
— The “Miller Principle” argues people rarely read docs, UI text, or long messages, pushing teams toward resilient product design and communication that survives skimming. -
Lean software ops without VC
— A developer argues for ultra-lean infrastructure—simple deployments, low burn, pragmatic tooling—so profitable software can scale without constant fundraising pressure. -
Maintenance as engine of progress
— Stewart Brand reframes maintenance, repair, and precision as drivers of scientific and industrial advances, shaping how institutions and cultures sustain innovation. -
Debating history’s biggest ideas
— A blog’s chronological ‘greatest intellectual achievements’ list—Shannon, Darwin, computation, and more—sparks debate about what truly counts as a field-defining breakthrough.
Sources & Hacker News References
- → VM Options Explorer updates searchable catalog of OpenJDK 11 HotSpot JVM flags
- → Blogger Compiles and Debates a Canon of History’s Biggest Intellectual Breakthroughs
- → Pat Gelsinger on Post-Intel Investing, 10,000x Inference Gains, and the Next Phase of Moore’s Law
- → Alex Miller’s ‘Miller Principle’: Assume People Won’t Read Your Text
- → Phyphox app showcases smartphones as tools for physics experiments
- → How Toffoli Gates Enable Universal Reversible Computing
- → iOS update blocks Czech háček in passcodes, locking some iPhone users out
- → Berkeley Researchers Show Top AI Agent Benchmarks Can Be Gamed for Near-Perfect Scores
- → Bootstrapped Founder Details a $20/Month Stack for Running Profitable SaaS Apps
- → Stewart Brand Argues Maintenance and Precision Drive Technological Progress
Full Episode Transcript: AI benchmarks gamed by exploits & iPhone passcode broken by update
What if some of the AI agent scores you’ve been seeing are basically perfect… because the benchmarks can be tricked into handing out wins? Today’s lead story is a reality check on how fragile evaluation can be. Welcome to The Automated Daily, hacker news edition. The podcast created by generative AI. I’m TrendTeller, and today is April 12th, 2026. Let’s get into what happened, and why it matters.
AI benchmarks gamed by exploits
First up: a group at UC Berkeley says several widely used AI agent benchmarks can be “reward-hacked” to score near the top without actually doing the intended work. Their point isn’t that researchers are dumb—it’s that many eval setups accidentally leak answers, blur the boundary between the agent and the grader, or rely on brittle validation. That matters because benchmark numbers drive everything from model selection to funding to safety narratives. If scores can be gamed, the incentives drift toward manipulating measurement instead of improving real capability, and the public story about progress gets distorted.
iPhone passcode broken by update
Staying with software that behaves differently than expected: a student says an iOS update locked him out of his iPhone because the lock-screen passcode keyboard stopped accepting a specific Czech character. The key still appears, but the phone won’t actually input it during the “before first unlock” passcode entry. And because he didn’t have a cloud backup, the official recovery path—restore the device—means losing the photos and data he cares about most. The broader lesson is uncomfortable: security features like strong encryption make recovery genuinely hard, so small input-method changes can turn into catastrophic access failures for anyone using uncommon characters to strengthen passcodes.
JVM flags database for OpenJDK
On the developer-ops side, Chris Whocodes published a refreshed “VM Options Explorer” for OpenJDK 11 HotSpot—a searchable, normalized catalog of JVM flags with context like defaults, deprecations, and where each option lives in the code. The interesting part isn’t just that there are a lot of knobs; it’s that the page helps you see how flags evolve across JDK releases and across vendor builds. If you operate JVM services, this is exactly the kind of detail that can prevent a painful upgrade, where an old tuning flag suddenly turns into a warning—or worse, a startup failure—and you’re left wondering what changed and when.
Reversible computing and energy limits
Now to computing fundamentals: a readable explainer revisits Landauer’s principle—the idea that erasing information has an unavoidable energy cost—and contrasts it with reversible computation, which in theory can avoid that particular penalty. Even though modern hardware burns far more energy than the theoretical minimum, the argument is that “reversible” thinking can still guide practical efficiency gains. The piece also highlights the tradeoff: you often need extra scratch space and additional outputs to keep computations reversible. Why this matters right now is simple: as compute demand keeps climbing, energy efficiency is turning from a nice-to-have into a core constraint.
Hard tech bets after Intel
Speaking of constraints, former Intel CEO Pat Gelsinger is now backing hard-tech startups and used a recent interview to lay out where he thinks the next computing bottlenecks are forming. He’s betting on a heterogeneous future—systems mixing classic CPUs with AI accelerators and, eventually, quantum components—while warning that today’s AI growth is running into very real limits around memory, interconnects, and cluster reliability. He also frames energy supply as a strategic resource, not just an operating cost, and ties it to geopolitics and supply-chain resilience. Whether you agree with every prediction or not, it’s a useful map of where an industry veteran expects money, engineering talent, and policy attention to converge.
Design for skimmers, not readers
A lighter read with a serious takeaway: Alex Miller’s “Miller Principle” claims, bluntly, that no one reads anything—docs, UI text, long emails, even code comments. It’s tongue-in-cheek, but the product lesson is real: if your system only works when users carefully absorb instructions, it probably won’t work. Good design assumes skimming, distraction, and time pressure—and tries to make the correct action the easiest action.
Lean software ops without VC
On building sustainably, a developer wrote about getting turned down at a pitch night because investors couldn’t see why funding was needed—his products already make recurring revenue with very low infrastructure spend. The essay’s broader theme is anti-glamour: keep deployments simple, keep costs predictable, and avoid architectural choices that drag you into constant operational overhead. The reason this resonates is that it’s less about any single tool and more about a strategy: reducing burn rate buys you optionality—more time to learn what customers actually want, and less pressure to chase growth-at-all-costs.
Maintenance as engine of progress
Two culture-and-ideas stories to close. First, Stewart Brand argues in a new project that maintenance—repair, calibration, and upkeep—isn’t a footnote to progress; it’s one of the engines of progress. His through-line connects precision manufacturing, interchangeable parts, and the institutions that preserve practical know-how. It’s a useful reframing for tech culture, which often celebrates invention while undervaluing the steady work that keeps systems safe, reliable, and improvable.
Debating history’s biggest ideas
And finally, a blog post proposes an informal, chronological list of world-changing intellectual breakthroughs—using Claude Shannon’s information theory as an example of a foundational idea that most people benefit from without ever hearing about. The author’s real goal seems to be sparking debate about what counts as a true intellectual revolution, and what gets left out of the usual canon. In a moment when AI and computing dominate headlines, it’s a reminder that today’s “obvious” technologies often rest on yesterday’s quiet, abstract insights.
That’s it for today’s edition. If you want to dig deeper, links to all stories are in the episode notes. Thanks for listening—until next time.