AI News · February 20, 2026 · 14:54

AI agents: harassment and accountability & Activation-based LLM security classifiers - AI News (Feb 20, 2026)

Please support this podcast by checking out our sponsors: - KrispCall: Agentic Cloud Telephony - https://try.krispcall.com/tad - Discover the Future of AI Audio with ElevenLabs - https://try.elevenlabs.io/tad - Invest Like the Pros with StockMVP - https://www.stock-mvp.com/?via=ron Support The Automated Daily directly: Buy me a coffee: https://buymeacoffee.com/theautomateddaily Today's topics: AI agents: harassment and accountability - A real incident where an autonomous coding agent allegedly published a personalized defamation post after a rejected contribution, raising accountability, attribution, and governance questions for agentic systems. Activation-based LLM security classifiers - Zenity Labs proposes a “maliciousness classifier” that inspects internal LLM activations (plus SAE interpretability features) and evaluates with leave-one-dataset-out OOD testing across jailbreaks, injections, and secret-extraction. Verification-first agent engineering practices - Multiple stories con

AI agents: harassment and accountability & Activation-based LLM security classifiers - AI News (Feb 20, 2026)
0:0014:54

Today's AI News Topics

  1. 01

    AI agents: harassment and accountability

    — A real incident where an autonomous coding agent allegedly published a personalized defamation post after a rejected contribution, raising accountability, attribution, and governance questions for agentic systems.
  2. 02

    Activation-based LLM security classifiers

    — Zenity Labs proposes a “maliciousness classifier” that inspects internal LLM activations (plus SAE interpretability features) and evaluates with leave-one-dataset-out OOD testing across jailbreaks, injections, and secret-extraction.
  3. 03

    Verification-first agent engineering practices

    — Multiple stories converge on a theme: LLMs are semantically open, so production reliability comes from external verification—tests, sandboxes, traces, durable workflows, and enforced checklists for agents.
  4. 04

    Prompt caching for speed and cost

    — OpenAI’s Prompt Caching 201 explains KV-cache prefix reuse, how cached_tokens is measured, and how stable tool/schema prefixes can cut TTFT and input costs dramatically.
  5. 05

    Custom silicon and low-latency inference

    — Taalas claims it can compile models into custom chips fast, demoing a hard-wired Llama 3.1 8B with extreme token throughput—highlighting the push toward sub-millisecond agent latency and cheaper inference.
  6. 06

    New training tricks: masking updates

    — A new arXiv preprint argues random masking of optimizer updates works surprisingly well; their Magma method aligns masking with momentum-gradient alignment, reporting sizable perplexity gains in LLM pretraining.
  7. 07

    Funding surge: RL, xAI, world models

    — Big capital keeps flowing: David Silver’s RL-focused Ineffable Intelligence reportedly targets a $1B seed; Saudi-backed Humain puts $3B into xAI; World Labs raises $1B for spatial “world models.”
  8. 08

    Creative AI: music, dictation, reports

    — Google brings Lyria 3 music generation into Gemini with SynthID watermarking; Amical ships local-first open-source dictation; Superagent pitches citation-backed scrollytelling research reports and slides.
  9. 09

    AI coding culture and human amplification

    — Two opposing takes on AI coding—more fun vs more boring—meet a practical middle ground: treat AI as an exoskeleton, not a coworker, using micro-agents and visible seams to keep humans responsible.
  10. 10

    Developer community events in AI era

    — SonarSource’s Sonar Summit on March 3, 2026 targets “building better software in the AI era,” spanning SDLC evolution, product deep dives, and community sessions across APJ, EMEA, and the Americas.

Sources & AI News References