Human-in-the-loop execution for LLM agents
Guardrails for LLMs: detect and block hallucinated tool calls to improve safety and reliability.
🛡️ Safe AI Agents through Action Classifier
The missing safety layer for AI Agents. Adaptive High-Friction Guardrails (Time-locks, Biometrics) for critical operations to prevent catastrophic errors.
Runtime detector for reward hacking and misalignment in LLM agents (89.7% F1 on 5,391 trajectories).
A runtime authorization layer for LLM tool calls: policy, approval, and audit logs.
Safety-first agentic toolkit: 10 packages for collapse detection, governance, and reproducible runs.
A2A version of Agent Action Guard: Safe AI Agents through Action Classifier
An open-source engineering blueprint for defining and designing the core capabilities, boundaries, and ethics of any AI agent.
Canonical texts and implementation primitives for the Safe Superintelligence Framework (v1.2.1): Constitution, Minimum Rescue Protocol, system prompt, decision matrix.
A hierarchical AI safety architecture with asymmetric supervisory control.
Energy-based legality-gating SDK for AI reasoning. Predicts, repairs, and audits collapse before it happens; reduces hallucinations and provides numeric audit logs.
A security-first control plane for autonomous AI code agents: sandboxed execution, hash grounding, diff validation, verification, and full auditability.
Production-ready safety framework preventing identity fusion, synthetic intimacy, and unbounded behavior in AI agent systems. Machine-readable contracts and verse-lang primitives for immediate deployment.
🌌 Unify and enhance simulations with Negentropy Constellation, a monorepo of ten robust packages designed for reproducibility and real-world insight.
PULSE • Deterministic, fail‑closed release gates for Safe & Useful AI — CI‑enforced, audit‑ready (status.json + Quality Ledger).
🛡️ Safeguard AI agents from harmful actions with A2A-Agent-Action-Guard, ensuring safe tool usage through effective action classification.
External kill switch for autonomous runtimes. Validate at enforcement boundaries. Revoke to halt execution.
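Several of the projects above converge on the same runtime pattern: intercept each tool call, check it against a policy, escalate anything outside the policy to a human, write an audit record, and respect an external kill switch. The sketch below is a minimal illustration of that pattern, not the API of any listed project; every name in it (ToolCall, Gate, console_approval, the allowlist/denylist fields) is hypothetical.

# Minimal human-in-the-loop gate for LLM tool calls.
# All names are illustrative, not the API of any project listed above.

import json
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolCall:
    tool: str
    args: dict


@dataclass
class Gate:
    # Tools that may run without human review.
    allowlist: set = field(default_factory=lambda: {"search", "read_file"})
    # Tools that are always refused.
    denylist: set = field(default_factory=lambda: {"delete_database"})
    # External kill switch: flip to True to halt all execution.
    revoked: bool = False
    audit_log: list = field(default_factory=list)

    def authorize(self, call: ToolCall, ask_human: Callable[[ToolCall], bool]) -> bool:
        """Decide whether a tool call may execute, logging every decision."""
        if self.revoked:
            decision = "halted"          # kill switch takes precedence over policy
        elif call.tool in self.denylist:
            decision = "denied"
        elif call.tool in self.allowlist:
            decision = "allowed"
        else:
            # Anything outside the static policy requires explicit human approval.
            decision = "allowed" if ask_human(call) else "denied"
        self.audit_log.append({
            "time": time.time(),
            "tool": call.tool,
            "args": call.args,
            "decision": decision,
        })
        return decision == "allowed"


def console_approval(call: ToolCall) -> bool:
    """Simplest possible reviewer: prompt a human on the terminal."""
    answer = input(f"Approve {call.tool}({json.dumps(call.args)})? [y/N] ")
    return answer.strip().lower() == "y"


if __name__ == "__main__":
    gate = Gate()
    for call in [ToolCall("read_file", {"path": "notes.txt"}),
                 ToolCall("send_email", {"to": "ops@example.com"})]:
        if gate.authorize(call, console_approval):
            print(f"executing {call.tool}")   # the agent would invoke the tool here
        else:
            print(f"blocked {call.tool}")
    print(json.dumps(gate.audit_log, indent=2))

In practice the approval callback would be a chat or ticketing integration rather than a console prompt, and the revoked flag would be backed by an external store so an operator can halt the agent from outside its process.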