HelixAgent is a production-ready, AI-powered ensemble LLM service that intelligently combines responses from multiple language models to provide the most accurate and reliable outputs.
- Docker & Docker Compose
- Go 1.24+ (for local development)
- Git
# Clone the repository
git clone https://dev.helix.agent.git
cd helixagent
# Copy environment configuration
cp .env.example .env
# Start all services
make docker-full
# Or start specific profiles
make docker-ai # AI services only
make docker-monitoring # Monitoring stack only# Install dependencies
make install-deps
# Setup development environment
make setup-dev
# Run locally
make run-devComprehensive Reference: See docs/FEATURES.md for complete documentation of all 21 LLM providers, 13 embedding providers, 35 MCP implementations, 10 LSP servers, and 24+ power features.
- Multi-Provider Support: 21 LLM providers including Claude, DeepSeek, Gemini, Mistral, OpenRouter, Qwen, xAI/Grok, Cohere, Perplexity, Groq, and more
- Dynamic Provider Selection: Real-time verification scores via LLMsVerifier integration
* Note: Ollama is deprecated for production use (verification score: 5.0) and only serves as a fallback for local development/testing. Recommended production providers: Claude, DeepSeek, Gemini.
- AI Debate System: Multi-round debate between providers for consensus (5 positions x 3 LLMs = 15 total)
- Intelligent Routing: Confidence-weighted, majority vote, custom strategies
- Graceful Fallbacks: Automatic fallback to best performing provider based on verification scores
- Streaming Support: Real-time streaming responses
- High Availability: PostgreSQL + Redis clustering
- Monitoring: Prometheus metrics + Grafana dashboards
- Security: JWT authentication, rate limiting, CORS
- Scalability: Horizontal scaling, load balancing
- Caching: Redis-based response caching
- Comprehensive Testing: Unit, integration, benchmark tests
- Hot Reloading: Automatic plugin system updates
- Health Checks: Comprehensive service health monitoring
- API Documentation: Auto-generated OpenAPI specs
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β HelixAgent β
β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββββββββββββ β
β β Web API β β AI Debate β β LLMsVerifier β β
β β (Gin) β β Orchestrator β β (Dynamic Scoring) β β
β ββββββββ¬ββββββββ βββββββββ¬βββββββ ββββββββββββ¬ββββββββββββββ β
β β β β β
β ββββββββββββββββββββΌββββββββββββββββββββββ β
βββββββββββββββββββββββββββββ¬β¬ββββββββββββββββββββββββββββββββββββββ
ββ
ββββββββββββββββββββΌβΌβββββββββββββββββββ
βΌ βΌβΌ βΌ
βββββββββββββββββββ βββββββββββββββββββ βββββββββββββββββββ
β PostgreSQL β β Redis β β 10 LLM Providersβ
β β β β β βββββββββββββββ β
β - Sessions β β - Caching β β βClaudeβDeepSeekβ
β - Analytics β β - Queues β β βGeminiβMistral β
β β β β β βQwen βZAI β
βββββββββββββββββββ βββββββββββββββββββ β βZen βCerebrasβ
β βOpenRouter β
β βOllama(local)β
β βββββββββββββββ β
βββββββββββββββββββ
- URL: http://localhost:3000
- Credentials: admin/admin123
- Features:
- Response time metrics
- Error rate monitoring
- Provider performance comparison
- Request throughput tracking
- URL: http://localhost:9090
- Metrics Available:
helixagent_requests_totalhelixagent_response_time_secondshelixagent_errors_totalhelixagent_provider_health
HelixAgent uses comprehensive environment-based configuration. Key variables:
# Server Configuration
PORT=8080
HELIXAGENT_API_KEY=your-api-key
GIN_MODE=release
# Database
DB_HOST=localhost
DB_PORT=5432
DB_USER=helixagent
DB_PASSWORD=your-password
DB_NAME=helixagent_db
# LLM Providers (Ollama is deprecated - use as fallback only)
# See docs/providers/ollama.md for deprecation notice
OLLAMA_ENABLED=true
OLLAMA_BASE_URL=http://ollama:11434
OLLAMA_MODEL=llama2
# Recommended Production Providers
CLAUDE_API_KEY=sk-your-claude-key
DEEPSEEK_API_KEY=sk-your-deepseek-key
GEMINI_API_KEY=your-gemini-key
β οΈ Ollama is deprecated for production - use it only for local development and testing. For production, use API key-based providers like Claude, DeepSeek, or Gemini.
# Ollama requires no API keys and works locally
docker run -p 11434:11434 ollama/ollama
# Pull a model (first time only)
docker exec -it ollama ollama pull llama2
# Test the model
curl -X POST http://localhost:11434/api/generate \
-H "Content-Type: application/json" \
-d '{"model": "llama2", "prompt": "Hello!"}'# Standard build
make build
# Multi-architecture build
make build-all
# Production build
make docker-build-prod# Run all tests
make test
# Run with coverage
make test-coverage
# Run specific test suites
make test-unit
make test-integration
# Run benchmarks
make test-bench# Format code
make fmt
# Run static analysis
make vet
make lint
# Security scanning
make security-scan# Full stack (recommended)
make docker-full
# AI services only
make docker-ai
# Monitoring stack only
make docker-monitoring
# Custom configuration
docker-compose --profile custom up -dAll containers include comprehensive health checks:
- Application:
/healthendpoint monitoring - Database: PostgreSQL connection validation
- Cache: Redis ping verification
- LLM Providers: API endpoint health monitoring
Full API Reference: docs/api/API_REFERENCE.md - Complete REST API documentation with examples
Capability Detection: LLMsVerifier/docs/CAPABILITY_DETECTION.md - Dynamic capability detection for 18+ CLI agents
GET /health- Service health statusGET /v1/health- Detailed health with provider statusGET /v1/models- Available LLM modelsGET /v1/providers- Configured providersGET /metrics- Prometheus metrics
POST /v1/completions- Single completion requestPOST /v1/chat/completions- Chat-style completionsPOST /v1/completions/stream- Streaming completionsPOST /v1/ensemble/completions- Ensemble completions
curl -X POST http://localhost:7061/v1/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"prompt": "Explain quantum computing in simple terms",
"model": "llama2",
"max_tokens": 500,
"temperature": 0.7
}'curl -X POST http://localhost:7061/v1/ensemble/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"prompt": "What is the meaning of life?",
"ensemble_config": {
"strategy": "confidence_weighted",
"min_providers": 2,
"confidence_threshold": 0.8,
"fallback_to_best": true
}
}'curl -X POST http://localhost:7061/v1/completions/stream \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"prompt": "Tell me a story",
"stream": true,
"model": "llama2"
}'- JWT Tokens: Secure session management
- API Key Authentication: Request validation
- Rate Limiting: Configurable per-user limits
- CORS Support: Cross-origin request handling
- Input Sanitization: Request validation and sanitization
- Error Handling: Secure error responses without information leakage
- Logging: Structured logging with security events
- Environment Variables: No hardcoded secrets
- Hot Reloading: Automatic plugin detection and loading
- Interface-Based: Standardized plugin interfaces
- Configuration: Plugin-specific config management
- Health Monitoring: Plugin health status tracking
package main
import (
"dev.helix.agent/internal/plugins"
)
type MyPlugin struct {
name string
}
func (p *MyPlugin) Name() string { return p.name }
func (p *MyPlugin) Version() string { return "1.0.0" }
func (p *MyPlugin) Init(config map[string]any) error { /* init logic */ }
func (p *MyPlugin) HealthCheck(ctx context.Context) error { /* health check */ }
func (p *MyPlugin) Shutdown(ctx context.Context) error { /* cleanup */ }- Request Throughput: 1000+ requests/second
- Response Time: <500ms for cached responses
- Memory Usage: <512MB for typical workloads
- CPU Usage: <50% on 4-core instances
- Connection Pooling: Database connection reuse
- Response Caching: Redis-based intelligent caching
- Async Processing: Non-blocking I/O operations
- Resource Limits: Configurable timeouts and pool sizes
HelixAgent includes a comprehensive LLM optimization framework for improving performance:
- Semantic Cache: Vector similarity-based caching (GPTCache-inspired)
- Structured Output: JSON schema validation and generation (Outlines-inspired)
- Enhanced Streaming: Word/sentence buffering, progress tracking, rate limiting
- SGLang: RadixAttention prefix caching for multi-turn conversations
- LlamaIndex: Advanced document retrieval with Cognee sync
- LangChain: Task decomposition and ReAct agents
- Guidance: CFG/regex constrained generation
- LMQL: Query language for LLM constraints
# Start optimization services
docker-compose --profile optimization up -d
# Services available:
# - langchain-server (port 8011)
# - llamaindex-server (port 8012)
# - guidance-server (port 8013)
# - lmql-server (port 8014)
# - sglang (port 30000, requires GPU)import "dev.helix.agent/internal/optimization"
// Create and use optimization service
config := optimization.DefaultConfig()
svc, _ := optimization.NewService(config)
// Check cache, retrieve context, decompose complex tasks
optimized, _ := svc.OptimizeRequest(ctx, prompt, embedding)- Unit Tests: 95%+ coverage for core logic
- Integration Tests: End-to-end API testing
- Security Tests: LLM penetration testing (prompt injection, jailbreaking, data exfiltration)
- Challenge Tests: AI debate maximal challenge validation
- Benchmark Tests: Performance regression detection
- Race Tests: Concurrency safety validation
- Chaos Tests: Resilience and fault tolerance testing
make test # Run all tests (auto-detects infrastructure)
make test-unit # Unit tests only (./internal/... -short)
make test-integration # Integration tests (./tests/integration)
make test-e2e # End-to-end tests (./tests/e2e)
make test-security # Security tests (./tests/security)
make test-stress # Stress tests (./tests/stress)
make test-chaos # Chaos/challenge tests (./tests/challenge)
make test-bench # Benchmark tests
make test-race # Race condition detection- Mock Providers: Isolated unit testing with HTTP mock servers
- Docker Compose: Full integration testing
- Free LLM Testing: Ollama/Zen-based testing without API keys
- CI/CD Pipeline: Automated testing on every push
The security test suite validates LLM security including:
- Prompt Injection: System prompt extraction, role manipulation
- Jailbreaking: Multi-language attacks, hypothetical scenarios
- Data Exfiltration: PII extraction, credential probing
- Indirect Injection: Markdown/HTML injection, encoded payloads
// Request metrics
requestCounter := prometheus.NewCounterVec(
prometheus.CounterOpts{
Name: "helixagent_requests_total",
Help: "Total number of requests processed",
},
[]string{"method", "endpoint", "provider"},
)
// Response time metrics
responseTime := prometheus.NewHistogramVec(
prometheus.HistogramOpts{
Name: "helixagent_response_time_seconds",
Help: "Request response time in seconds",
},
[]string{"method", "endpoint"},
){
"status": "healthy",
"timestamp": "2024-01-01T00:00:00Z",
"version": "1.0.0",
"uptime": "72h30m15s",
"providers": {
"total": 6,
"healthy": 4,
"unhealthy": 2,
"details": {
"ollama": {"status": "healthy", "response_time": 150},
"claude": {"status": "unhealthy", "error": "authentication_failed"},
"deepseek": {"status": "healthy", "response_time": 300}
}
},
"database": {"status": "healthy", "connections": 15/20},
"cache": {"status": "healthy", "hit_rate": 0.85}
}- Multi-Architecture Builds: Linux (amd64/arm64), macOS, Windows
- Docker Image Building: Automated image creation and publishing
- Security Scanning: CodeQL and dependency scanning
- Test Execution: Unit, integration, and end-to-end tests
- Release Automation: Semantic versioning and release notes
- Docker Hub: Production image repository
- Kubernetes: Production K8s manifests
- Cloud Providers: AWS, GCP, Azure deployment guides
- Self-Hosted: On-premise deployment documentation
# Check provider configuration
curl http://localhost:7061/v1/providers
# Test provider health
curl http://localhost:7061/v1/providers/ollama/health
# View logs
docker-compose logs helixagent# Check database connectivity
docker-compose exec postgres pg_isready -U helixagent -d helixagent_db
# Test from application container
docker-compose exec helixagent ./helixagent check-db# Monitor response times
curl -w "@{time_total}\n" -o /dev/null -s http://localhost:7061/health
# Check resource usage
docker stats helixagent
# View metrics
curl http://localhost:9090/metrics# Enable debug logging
export LOG_LEVEL=debug
export GIN_MODE=debug
make run-dev
# Enable detailed error responses
export DEBUG_ENABLED=true
export REQUEST_LOGGING=true- Full Documentation: Complete documentation index
- Features Reference: Comprehensive list of all providers, protocols, and features
- API Reference: http://localhost:7061/docs
- Architecture Guide: System architecture
- Deployment Guide: Production deployment
- Quick Start: Getting started guide
- GitHub Discussions: Community Support
- Issues: Bug Reports & Feature Requests
- Contributing: Contribution Guidelines
- Documentation: HelixAgent Docs
- Website: HelixAgent.ai
- Email: support@helixagent.ai
# Show all available commands
make help
# Setup development environment
make setup-dev
# Start with monitoring
make docker-full
# View logs
make docker-logs
# Stop all services
make docker-stopThis project is licensed under the MIT License - see the LICENSE file for details.
HelixAgent - Intelligent ensemble LLM service for production workloads. π