Git commits with AI session notes & AI productivity: Scheme to WebAssembly - AI News (Mar 2, 2026)
Git commits with AI transcripts, a weekend-built Scheme→WASM compiler, eBPF auditing for agents, and why military AI needs real interpretability—listen now.
Our Sponsors
Topics
- 01
Git commits with AI session notes
— A new Git extension, git-memento, stores cleaned AI coding transcripts as Markdown inside git notes, preserving normal commit workflows while improving provenance and review. - 02
AI productivity: Scheme to WebAssembly
— Puppy Scheme is a fast-built, alpha Scheme-to-WebAssembly compiler accelerated by Claude, featuring WASI 2, the Component Model, WASM GC, and big compile-time speedups. - 03
Auditing AI agents with eBPF
— Logira uses eBPF, cgroup v2, JSONL timelines, and SQLite queries to audit what AI agents actually do on Linux—processes, files, and network—plus risky-behavior detections. - 04
Near-term AI security truce
— Matthew Honnibal calls for focusing on practical AI risks like prompt injection, autonomous attack loops, and unsafe agent marketplaces—urging basic security hardening over hype. - 05
Accountable agents via cryptographic covenants
— Nobulex proposes verifiable agent behavior using DIDs, Ed25519 keys, a Cedar-like policy DSL, hash-chained action logs with Merkle proofs, and staking/slashing enforcement. - 06
Military AI, interpretability, and governance
— Two essays argue that lethal or medical AI must be interpretable and that the Pentagon–Anthropic debate is too narrowly framed around “human in the loop,” missing oversight and accountability. - 07
When not to share transcripts
— Cory Doctorow warns that dumping chatbot transcripts into public threads is rude and unreliable, and that sending unverified AI critiques to authors shifts unpaid verification work onto them.
Sources
- → https://github.com/mandel-macaque/memento
- → https://matthewphillips.info/programming/posts/i-built-a-scheme-compiler-with-ai/
- → https://github.com/melonattacker/logira
- → https://pluralistic.net/2026/03/02/nonconsensual-slopping/#robowanking
- → https://honnibal.dev/blog/clownpocalypse
- → https://manidoraisamy.com/ai-interpretable.html
- → https://github.com/nobulexdev/nobulex
- → https://weaponizedspaces.substack.com/p/the-information-space-around-military
Full Transcript
A developer asked an AI to “grind on performance” overnight—and woke up to a compiler going from three and a half minutes to eleven seconds. That’s either the future of tooling… or a new kind of dependency we don’t fully understand yet. Welcome to The Automated Daily, AI News edition. The podcast created by generative AI. I’m TrendTeller, and today is March 2nd, 2026. We’ve got a packed lineup: Git commits that carry their AI session history, new ways to audit what agents actually do on your machine, and a growing argument that the “human in the loop” framing for military AI is missing the bigger governance question.
Git commits with AI session notes
Let’s start with developer workflow—because today’s most concrete shift is happening right inside Git. A new open-source project called git-memento, from the mandel-macaque/memento repository, is essentially a Git extension for provenance. The idea is simple: if an AI coding session contributed to a commit, you should be able to attach a cleaned, human-readable trace of that session to the commit—without breaking how developers already work. Here’s the clever part: it stores that transcript as Markdown in git notes, not in the commit message and not in your codebase. That means your usual flow stays intact—you can still commit with -m or open an editor—while the “how we got here” context lives alongside the commit for anyone who wants it. You initialize per repo with something like “git memento init”, optionally choosing a provider like codex or claude. Configuration lives in your local .git/config under memento.* keys, so it’s repo-scoped and doesn’t demand a new centralized service. Then the daily usage looks like: “git memento commit <session-id> -m ‘message’” or “git memento amend” when you’re rewriting history. It supports both a legacy single-session format and a versioned multi-session envelope, using explicit HTML comment markers—so you can attach multiple sessions, even from different providers, to one commit. That’s important because real work rarely fits into a single AI interaction. It also leans into collaboration. Commands like share-notes, push, and notes-sync deal with refs/notes/* properly—pushing and merging notes, configuring remote fetch refspecs, and even creating timestamped backups under refs/notes/memento-backups/<timestamp> before merges. If you’ve ever had git notes drift across a team, you’ll recognize why that backup step matters. For teams that rebase and rewrite history a lot, there are features to carry notes forward automatically—notes-rewrite-setup—or to aggregate notes from a rewritten range into a new commit via notes-carry, with a provenance block so reviewers can see what got rolled up. And there’s quality tooling: “git memento audit” can check coverage, validate metadata markers like provider and session ID, and even output JSON. “git memento doctor” helps debug configuration and whether your remotes are set up to sync notes sanely. From an engineering standpoint, it’s shipped as a single native executable per platform using .NET SDK 10 and NativeAOT. There’s a curl-based installer that pulls from GitHub releases/latest, plus CI smoke tests across Linux, macOS, and Windows. There’s also a GitHub Marketplace Action: one mode posts commit comments by rendering memento notes, and another mode gates CI by failing builds when audit coverage checks fail. In other words: not just capture, but enforcement. The repo is MIT-licensed, roughly 200 stars at snapshot time, and today—March 2, 2026—v1.1.0 is listed as the first public release of the CLI and Actions. Stepping back, git-memento is part of a broader theme: if AI is contributing to code, we need better receipts. Not for performative transparency—just enough traceability for code review, incident response, and institutional memory.
AI productivity: Scheme to WebAssembly
Now let’s talk about the upside of AI-assisted building—where the speed is real, but the maturity isn’t. Matthew Phillips wrote about building “Puppy Scheme,” a Scheme-to-WebAssembly compiler, largely motivated by watching people ship near-production tools at a surprising pace with AI in the loop. His headline claim is time: most of a weekend plus a couple weekday evenings—work that traditionally could stretch into months or even years. Claude played a major role, and the most striking example is performance. Phillips describes an overnight request to “grind on performance” that took compilation time from about three and a half minutes down to roughly eleven seconds. That is a jaw-dropping improvement, and it’s exactly the kind of story that makes developers both excited and a little uneasy: what changed, and do we really understand it? Technically, the project is ambitious for its age. Puppy Scheme reportedly supports about 73% of R5RS and R7RS. It targets modern WebAssembly features: WASI 2, the WebAssembly Component Model, and WASM GC. It includes dead-code elimination for smaller binaries, and it’s self-hosting—meaning it can compile its own source into a puppyc.wasm artifact. There’s also a wasmtime-based wrapper that turns the generated WASM into native binaries, plus a website demo running the compiler output in Cloudflare Workers. Phillips even hints at a component-model style UI approach with a counter example written in Scheme. But he’s clear: it’s alpha quality and buggy, not ready for general users. That honesty matters. We’re entering an era where “built fast” is common; “trusted and maintained” still takes time.
Auditing AI agents with eBPF
Next: if agents are acting on your machine, how do you verify what they actually did? A project called Logira takes a very pragmatic stance: don’t trust the agent’s narrative—instrument the operating system. Logira is an observe-only Linux CLI plus a root daemon, logirad, that uses eBPF to record runtime activity: process execution, file access, and network behavior. The key design detail is attribution. Logira tracks events per run using cgroup v2, so actions can be tied back to a single audited command invocation. The typical workflow is “logira run -- <command>” and then you review what happened using commands like runs, view, query, and explain. Under the hood, each run is stored locally in both JSONL—for timeline-style playback—and SQLite for fast searching, plus run metadata. That’s a sensible combo: one format optimized for auditing chronologically, one for asking pointed questions. Logira also ships with an opinionated detection ruleset aimed at risky behavior during AI or automation runs, and lets you add custom per-run rules via YAML. Defaults cover things security teams actually care about: reads or writes of credential stores like SSH keys, AWS and kube configs, .netrc, and .git-credentials; persistence and system changes like /etc edits, systemd units, cron, and shell startup files; and classic “temp dropper” patterns like executables created under /tmp or /dev/shm. It flags suspicious command patterns too—curl piped to sh, wget piped to sh, tunneling or reverse-shell tooling, base64 decode-to-shell hints—and destructive operations like rm -rf, git clean -fdx, mkfs, or terraform destroy. Network rules highlight odd egress ports and cloud metadata endpoint access. Practical constraints: Linux kernel 5.8 or newer, systemd, and cgroup v2. Licensing is Apache-2.0, with the eBPF programs dual-licensed Apache-2.0 or GPL-2.0-only for kernel compatibility. If you’re deploying agents in real environments, Logira is an important reminder: the fastest way to build trust is often to measure the world around the agent, not the agent itself.
Near-term AI security truce
That brings us neatly to a broader security argument: can we call a truce in the AI safety debate and focus on what’s already breaking? Matthew Honnibal is arguing exactly that—a “truce” that sets aside battles over superintelligence and focuses on near-term, severe, non-existential risks from today’s deployments. His central fear is not a brilliant adversary model. It’s cheap, automated, self-replicating attack loops—systems that don’t need to be very smart to cause enormous damage once exploit creation becomes cheaper than the average payoff. That becomes especially dangerous in a “race” mentality where attack surfaces expand quickly and AI coding agents get broad permissions, then run unsupervised. One concrete example he gives is the emerging ecosystem of “skills” files for coding agents—Markdown appended to prompts—shared in marketplaces like Skills.sh. If those skill files allow hidden HTML comments, they can smuggle unrendered instructions that hijack an agent. And the fix, in his view, is almost embarrassingly simple: forbid HTML comments. He points to a high-profile demo skill—Jamieson O’Reilly’s “What Would Elon Do”—that was boosted in a marketplace and used to show victims they’d been compromised. The troubling part is the lag: the issue allegedly lingered for weeks without action. He also calls out agent ecosystems like OpenClaw as “incident-prone,” with internet-exposed misconfigurations, yet still rocketing in popularity—an example of what he labels normalization of deviance. Prompt injection, he argues, is effectively unresolved, but increasingly treated as an “oopsie” risk that can chain across systems. Another case study: Google Gemini’s one-click API key workflow. Honnibal says it broke a longstanding assumption that many Google API keys were safe to embed publicly, because some keys also authorized Gemini usage and could be abused for spend. He claims Google initially denied the issue and didn’t fully fix it before disclosure. His conclusion isn’t apocalyptic; it’s procedural: preventing what he calls an “AI clownpocalypse” is feasible with basic security hardening—but it requires accepting friction. Less magic, more guardrails.
Accountable agents via cryptographic covenants
If you want a more formal answer to “how do we hold agents accountable,” there’s a new protocol attempt worth watching. Nobulex is an MIT-licensed open protocol—implemented as a TypeScript monorepo—built around a blunt observation: you can’t truly audit a neural net’s internal reasoning, but you can audit an agent’s actions. The core concept is “cryptographic behavioral commitments with trustless verification.” An agent declares a covenant—written in a Cedar-inspired DSL with permit, forbid, and require rules—then produces an action log. A verifier can run something like verify(covenant, actionLog) and get a compliance result plus violation proofs. Importantly: “forbid wins,” and anything unmatched is denied by default. Identity is handled with W3C-style decentralized identifiers—did:nobulex—backed by Ed25519 keys. Actions are recorded in a tamper-evident SHA-256 hash-chained log that can generate Merkle proofs, so you can prove specific events occurred without dumping everything. Enforcement comes in two flavors. There’s runtime middleware that can block forbidden actions before they happen, and economic enforcement via staking and slashing—making violations financially irrational. Nobulex describes a two-tier model: Tier 1 uses TEE-based middleware like Intel SGX or AMD SEV to physically prevent forbidden actions in high-stakes contexts; Tier 2 relies on penalties for more general use. The repo claims contracts are deployed on Sepolia—CovenantRegistry, StakeManager, and SlashingJudge—and includes a demo script showing covenant creation, blocking a forbidden transfer, and then verifying compliance. Whether this becomes a standard or stays a niche experiment, it’s part of the same movement as OS-level auditing and Git-based provenance: verifiability over vibes.
Military AI, interpretability, and governance
Now for the heavier segment: military AI, accountability, and what “safety” even means when decisions are life-or-death. One essay today argues that for truly high-stakes domains—fully autonomous weapons, or medical diagnosis—accuracy is not enough. The system must be interpretable in a way that supports real accountability. It opens with an example from deterministic engineering: a Boeing 787 crash into a medical college in India that killed 260 people. Investigators blame pilots, families blame Boeing, and the final report is pending. The point isn’t the specifics—it’s that when catastrophe happens, society demands traceable explanation, even from systems we built ourselves. The author then points to reporting that the U.S. Department of War wants to use Anthropic’s model in fully autonomous weapons without human approval—something Anthropic reportedly rejected as too unreliable, echoing Dario Amodei’s stance that current AI remains fundamentally unpredictable. The essay’s technical argument is that unreliability isn’t just “hallucinations.” It’s also structural: models are lossy—tokenization and reconstruction can degrade fidelity—and they’re black boxes: internal vectors are high-dimensional and not human-meaningful. Using a skin-cancer-style prompt about a changing mole, the author explains how a model might implicitly mirror clinical heuristics like the ABCDE rule, yet we can’t see the pathway from phrases like “bluish spot” to a risk assessment. They discuss efforts like Anthropic’s “Scaling Monosemanticity,” which extracts interpretable features via sparse autoencoders—features like “brown”—but still built atop unnamed dimensions. The proposed direction is bolder: define canonical, orthogonal, named dimensions first—think RGB for color or valence/arousal/dominance for emotion—then let higher-level features emerge on top. They even suggest replacing dense vectors with sparse graph embeddings processed by graph transformers, making intermediate representations traceable. A related article argues the public debate around the Pentagon and Anthropic is being artificially narrowed—almost like narrative warfare. Instead of asking broad governance questions—should advanced AI be embedded into military decision-making at all, who controls it, what oversight and constitutional process applies—the conversation collapses into a single proxy: “human in the loop.” The piece warns that human-in-the-loop can become theater. Automation bias means human overseers often defer to machine output. It cites a war-game described by New Scientist where large language models selected nuclear strikes in about 95% of runs when objectives were loosely constrained—an uncomfortable reminder that language models can generate extreme “decisive” recommendations under ambiguous goals. The deeper risk, the author argues, is that AI can accelerate decision cycles and pre-shape the menu of options before any review occurs—quietly shifting where power and accountability live. They also highlight democratic governance concerns: Congress holds war powers, yet AI integration proceeds largely through executive contracting and internal policy. And there’s a surveillance angle too: AI-driven inference can expand monitoring beyond what frameworks like FISA, focused on data collection rather than inference, were designed to constrain. Taken together, these pieces push the same message: if we can’t interpret it and govern it, we probably shouldn’t hand it the steering wheel—especially when the stakes are irreversible.
When not to share transcripts
Finally, a cultural note that loops back to our first story about saving AI session traces. Cory Doctorow’s Pluralistic post today argues that, generally speaking, other people don’t want to see your AI chatbot transcripts. He compares pasting chatbot “conversations” into social threads—or tagging a bot into conversations with strangers—to an intrusive, nonconsensual act. Even if it’s fascinating to you, it’s often unwanted verbosity for everyone else. He’s especially critical of a pattern many writers have experienced: someone reads an essay, asks an AI for a rebuttal or “commentary,” and then emails that unverified output to the original author. Doctorow’s point is that AI companies themselves admit these systems are error-prone, so the recipient is being forced into the role of unpaid verifier—becoming the “human in the loop” for someone else’s chatbot. He also disputes the claim that chatbots boost productivity by summarizing subjects you don’t understand. If you don’t understand the domain, you can’t judge whether the summary is accurate. What’s interesting here is the tension with tools like git-memento. Doctorow isn’t saying “never keep transcripts.” He’s saying: be considerate about where you paste them and who you burden with them. In software, attaching an AI trace to a commit—opt-in, reviewable, and in the right place—might be the responsible version of “show your work.” Dumping it into a public thread and demanding others sort truth from noise is something else entirely.
That’s our AI news for March 2nd, 2026. If there’s a single thread running through today’s stories, it’s accountability: keep a usable trail of AI assistance in code; measure agent behavior at the OS level; verify actions against declared policies; and resist governance debates that shrink down to easy proxies like “human in the loop.” Links to all stories can be found in the episode notes. I’m TrendTeller—see you next time on The Automated Daily, AI News edition.