Open-source framework for continuously improving AI agents.
Agentune helps teams analyze, improve, and evaluate customer-facing AI agents through measurable, data-driven iterations — not guesswork.
Instead of tweaking prompts and hoping for the best, Agentune connects real conversations, context data, and simulations into a repeatable optimization loop that drives actual KPI improvements such as conversion, CSAT, and retention.
Most agents are launched and left to stagnate — tuned by intuition, not evidence.
Agentune enables continuous agent improvement by combining analytics, optimization, and simulation in a single open framework:
- Analyze – uncover what drives your agent’s KPIs up or down
- Improve – generate actionable recommendations to lift performance
- Simulate – safely test and benchmark improvements before deployment
The result: agents that don’t just respond — they learn what works.
Agentune Simulate is a separately installable library that enables you to create customer simulations to test and benchmark your agent's behavior before production.
Together with agentune, it forms the Analyze → Improve → Simulate loop — a disciplined framework for building smarter, higher-performing AI agents.
In a future release, agentune-simulate will be merged into the main agentune package.
Agentune is built for teams who want to move beyond trial-and-error:
- AI platform / infra teams managing production-grade agents across multiple domains or use cases
- ML / data teams accountable for KPI impact, not just model accuracy
- Product / ops teams who need to measure and harden conversational behavior before it reaches users
Common scenarios:
- Diagnose why conversion or CSAT is dropping
- Quantify which behaviors, intents, or flows impact KPIs
- Test new prompt or policy versions safely
- Continuously improve deployed agents over time
Turn real conversations into insights that measurably improve your AI agents.
Agentune Analyze & Improve helps teams discover what drives an agent’s KPIs up or down — and generate concrete recommendations to enhance performance.
It transforms messy operational data into interpretable, data-driven actions that actually move business metrics.
Most AI agents are optimized by intuition: a few sample chats, some prompt edits, and best guesses.
Agentune replaces guesswork with evidence.
Using structured and unstructured data from real conversations, it:
- Identifies patterns that correlate with KPI outcomes
- Surfaces interpretable insights (not opaque scores)
- Recommends targeted changes to prompts, policies, and logic
No more trial-and-error tuning — just measurable improvement grounded in data.
For example, suppose you built a sales agent and now have a dataset of conversations, each labeled with an outcome: win, undecided, or lost. With Agentune Analyze & Improve, you can discover which patterns or intents correlate with those outcomes and receive concrete recommendations for refining the agent’s playbook, for instance how it handles discounts, competitor mentions, or shipping questions.
Agentune Analyze & Improve follows a transparent, two-step process:
Step 1: Analyze
- Ingests conversations, outcomes, and optional context data (e.g., product, policy, CRM).
- Generates semantic and structural features that capture patterns in language, behavior, or flow.
- Selects statistically significant features correlated with KPI changes; these become your performance drivers.
Example insights:
- “Mentions of competitors early in the chat increase conversion probability.”
- “Discount discussion combined with shipping-time questions lowers CSAT.”
Step 2: Improve
- Maps the discovered drivers to actionable recommendations: changes to prompts, tool usage, escalation logic, or playbooks.
- Outputs a ranked list of improvement opportunities, each linked to its supporting data.
These recommendations can then be validated using Agentune Simulate before deployment.
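The idea behind the feature-selection step (keeping only features that are statistically associated with the KPI) can be illustrated with a generic statistical test. The sketch below is not Agentune's implementation or API: it assumes each conversation has already been reduced to hypothetical boolean features (`mentions_competitor`, `asks_shipping`) and a binary win outcome, and it uses a two-proportion z-test as a stand-in for whatever test Agentune actually applies.

```python
import math
from dataclasses import dataclass


@dataclass
class Conversation:
    # Hypothetical schema: boolean features already extracted from the chat,
    # plus a binary outcome (win vs. not win).
    features: dict[str, bool]
    won: bool


def two_proportion_p_value(wins_a: int, n_a: int, wins_b: int, n_b: int) -> float:
    """Two-sided p-value of a two-proportion z-test (normal approximation)."""
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # every conversation has the same outcome; nothing to compare
    z = (wins_a / n_a - wins_b / n_b) / se
    # erfc(|z| / sqrt(2)) equals P(|Z| >= |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))


def select_drivers(convs: list[Conversation], alpha: float = 0.05) -> dict[str, tuple[float, float, float]]:
    """Keep features whose presence significantly changes the win rate.

    Returns {feature: (win_rate_with, win_rate_without, p_value)}.
    """
    names = sorted({name for c in convs for name in c.features})
    drivers = {}
    for name in names:
        with_f = [c.won for c in convs if c.features.get(name, False)]
        without_f = [c.won for c in convs if not c.features.get(name, False)]
        if not with_f or not without_f:
            continue  # feature never varies, so there is no comparison group
        p = two_proportion_p_value(sum(with_f), len(with_f), sum(without_f), len(without_f))
        if p < alpha:
            drivers[name] = (sum(with_f) / len(with_f), sum(without_f) / len(without_f), p)
    return drivers


def make_toy_data() -> list[Conversation]:
    """Synthetic data: competitor mentions track wins, shipping questions barely do."""
    convs = []
    for i in range(100):
        mentions_competitor = i < 50
        asks_shipping = (i // 10) % 2 == 0
        # 70% win rate with a competitor mention, 30% without
        won = (i % 10) < (7 if mentions_competitor else 3)
        convs.append(Conversation(
            features={"mentions_competitor": mentions_competitor, "asks_shipping": asks_shipping},
            won=won,
        ))
    return convs


if __name__ == "__main__":
    for name, (rate_with, rate_without, p) in select_drivers(make_toy_data()).items():
        print(f"{name}: win rate {rate_with:.0%} with vs {rate_without:.0%} without (p={p:.1e})")
```

On the synthetic data, only `mentions_competitor` passes the 0.05 threshold (70% vs. 30% win rate); `asks_shipping` (54% vs. 46%) does not. With many generated features, a real pipeline would also need to control for multiple comparisons, for example with a Bonferroni or Benjamini–Hochberg correction.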
Examples:
- Getting Started: 01_getting_started.ipynb, an introductory walkthrough of library fundamentals
- End-to-End Script Example: e2e_script_example.md, a runnable example that executes the entire analysis workflow
- Advanced Examples: advanced_examples.md, covering component customization, LLM request caching, and advanced workflows
We've tested Agentune Analyze with a combination of OpenAI o3 and gpt-4o-mini. In our tests, the cost was approximately 5-10 cents per conversation.
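For budgeting, the reported range can be turned into a rough estimate for a dataset of a given size. This is plain arithmetic on the 5-10 cents per conversation figure above; it assumes the figures are US cents and that cost scales linearly with the number of conversations, which may not hold for other models or much longer conversations.

```python
# Assumed figures: the 5-10 cents per conversation reported above (o3 + gpt-4o-mini),
# treated as US cents and as scaling linearly with dataset size.
LOW_CENTS_PER_CONVERSATION = 5
HIGH_CENTS_PER_CONVERSATION = 10


def estimate_cost_usd(num_conversations: int) -> tuple[float, float]:
    """Return a rough (low, high) cost range in US dollars for analyzing a dataset."""
    low = num_conversations * LOW_CENTS_PER_CONVERSATION / 100
    high = num_conversations * HIGH_CENTS_PER_CONVERSATION / 100
    return low, high


print(estimate_cost_usd(2_000))  # (100.0, 200.0)
```

For 2,000 conversations, that works out to roughly $100 to $200.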
Install with `pip install agentune`.
Requirements:
- Python ≥ 3.12
- Note for macOS users: if you encounter errors related to lightgbm, you may need to install OpenMP first with `brew install libomp`. See the LightGBM macOS installation guide for details.
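A quick way to confirm the requirements above before running an analysis is a small environment check. This is a hypothetical helper, not part of agentune: it only checks the Python version and whether lightgbm can be imported, and it assumes that a missing OpenMP runtime on macOS surfaces as an `OSError` when lightgbm loads its compiled library.

```python
import sys


def check_environment() -> list[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty if none found)."""
    problems = []
    if sys.version_info < (3, 12):
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python >= 3.12 is required, found {found}")
    try:
        import lightgbm  # noqa: F401  (imported only to confirm it loads)
    except ImportError:
        problems.append("lightgbm could not be imported")
    except OSError as exc:
        # On macOS, a missing OpenMP runtime usually shows up as an OSError
        # raised while lightgbm loads its compiled library.
        problems.append(f"lightgbm failed to load ({exc}); try `brew install libomp`")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment looks OK")
```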
- 🧩 Feature Generation – semantic, structural, and behavioral signals derived from real interactions
- 📈 Feature Selection – statistical and semantic correlation with target KPIs
- 💡 Actionable Insights – interpretable drivers with examples and metrics
- 🧠 Context Awareness (upcoming) – integrates CRM, product, and policy metadata for deeper understanding
Current focus: advancing Analyze & Improve with structured, context-aware optimization.
Planned milestones:
- Context-aware feature generation and insight discovery
- Integration of context features into the recommendation layer for targeted improvement actions
- Expanded evaluation and visualization tooling for Analyze & Improve results
- Visualization tools for insight exploration
- Seamless flow into agentune-simulate for validating improvements
Longer-term:
- Multi-KPI analytics: understand how improving one KPI affects other KPIs, and account for those trade-offs in the recommended improvements.
- Optional multi-agent analytics and cross-agent benchmarking
We welcome contributions from engineers who care about robust, measurable agents.
- Open issues for bugs, integrations, or feature proposals
- Early adopters: reach us at agentune-dev@sparkbeyond.com
- 💬 Join our community on Discord to connect with maintainers, share ideas, and get support