AI-powered communication training platform that generates personalized guidelines and provides detailed feedback on user communication skills.
- Scenario Management: Create communication scenarios with AI-generated guidelines
- Audio Processing: Analyze audio submissions from Cloudflare R2 URLs
- Speech Analytics: Extract speech rate, clarity, pronunciation accuracy, and more
- AI Feedback: Generate detailed feedback using OpenAI GPT models
- Real-time Processing: Asynchronous audio processing with status tracking
- Python 3.10+
- PostgreSQL database
- OpenAI API key
- AssemblyAI API key
- Cloudflare R2 for audio storage
- Clone and set up environment:
git clone <repository>
cd monolog-core
- Install dependencies:
poetry install
- Activate virtual environment:
eval $(poetry env activate)
- Environment variables:
cp .env.example .env
# Edit .env with your API keys and database URL
- Database setup:
# Generate migration
alembic revision --autogenerate -m "add audio features"
# Apply migrations
alembic upgrade head
- Run the application:
poetry run uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

- POST /api/v1/scenario/: Create a new scenario with AI-generated guidelines
- GET /api/v1/scenario/{user_id}: Get all scenarios for a user
- POST /api/v1/submission/: Submit audio for analysis and feedback
- GET /api/v1/submission/{scenario_id}: Get all submissions for a scenario
- GET /api/v1/submission/detail/{submission_id}: Get detailed submission with feedback
- GET /api/v1/status/submission/{submission_id}/status: Check processing status
- GET /api/v1/status/submissions/stats/{scenario_id}: Get scenario statistics
- GET /api/v1/status/health/detailed: Detailed health check
- GET /api/v1/feedback/{submission_id}: Get feedback for a submission
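
The endpoints above can be chained into a single client workflow: create a scenario, submit audio, poll the status endpoint, then fetch the detailed result. The following is a minimal sketch using only the Python standard library. The response field names (`id`, `status`) and the status values (`completed`, `failed`) are assumptions, not confirmed by this README, and the sketch omits any authentication headers the API may require.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000"  # host and port from the uvicorn command


def call_api(method, path, payload=None):
    """Send one JSON request and return the decoded JSON response."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def run_workflow(user_id, context, audio_url, call=call_api,
                 poll_seconds=2.0, max_polls=30):
    """Create a scenario, submit audio, wait for processing, return the detail payload."""
    scenario = call("POST", "/api/v1/scenario/",
                    {"user_id": user_id, "context": context, "additional_info": {}})
    # Assumption: both create endpoints return the new record's identifier as "id".
    submission = call("POST", "/api/v1/submission/",
                      {"scenario_id": scenario["id"], "audio_url": audio_url})
    submission_id = submission["id"]

    for _ in range(max_polls):
        status = call("GET", f"/api/v1/status/submission/{submission_id}/status")
        # Assumption: the status payload has a "status" field with these terminal values.
        if status.get("status") in ("completed", "failed"):
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"submission {submission_id} did not finish in time")

    return call("GET", f"/api/v1/submission/detail/{submission_id}")
```

Passing `call` as a parameter keeps the workflow testable without a running server; in normal use the default `call_api` is enough.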
- Create a Scenario:
POST /api/v1/scenario/
{
  "user_id": "uuid",
  "context": "You are a sales representative calling a potential client to introduce our new software solution...",
  "additional_info": {}
}
- Submit Audio:
POST /api/v1/submission/
{
  "scenario_id": "uuid",
  "audio_url": "https://your-r2-bucket.com/audio/recording.wav"
}
- Check Status:
GET /api/v1/status/submission/{submission_id}/status
- Get Results:
GET /api/v1/submission/detail/{submission_id}

The system analyzes:
- Transcription: Full speech-to-text conversion
- Speech Rate: Words per minute (optimal: 140-180 WPM)
- Pronunciation Accuracy: Based on ASR confidence scores
- Speech Flow: Pause analysis and rhythm
- Volume Consistency: Variation in speaking volume
- Spectral Clarity: Audio quality metrics
- Overall Score: Weighted composite score
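
To make the metrics above concrete, here is a rough sketch of how speech rate, pronunciation accuracy, and a weighted composite could be computed. It is illustrative only and not taken from this project's code: the 140-180 WPM range comes from the list above, while the linear penalty outside that range, the component names, and the weights are all hypothetical.

```python
from statistics import mean

OPTIMAL_WPM = (140, 180)  # optimal range stated in this README

# Hypothetical weights; the project's actual weighting is not documented here.
WEIGHTS = {"rate": 0.3, "pronunciation": 0.5, "flow": 0.2}


def words_per_minute(transcript: str, duration_seconds: float) -> float:
    """Count whitespace-separated words and scale to one minute."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return len(transcript.split()) * 60.0 / duration_seconds


def rate_score(wpm: float) -> float:
    """Return 1.0 inside the optimal range, decreasing linearly outside it.

    The penalty shape is an assumption made for illustration.
    """
    low, high = OPTIMAL_WPM
    if low <= wpm <= high:
        return 1.0
    distance = (low - wpm) if wpm < low else (wpm - high)
    return max(0.0, 1.0 - distance / low)


def pronunciation_accuracy(word_confidences) -> float:
    """Average the per-word ASR confidence values (each between 0 and 1)."""
    return mean(word_confidences) if word_confidences else 0.0


def overall_score(components: dict) -> float:
    """Weighted average over whichever components are present."""
    total_weight = sum(WEIGHTS[name] for name in components)
    weighted_sum = sum(WEIGHTS[name] * value for name, value in components.items())
    return weighted_sum / total_weight
```

Normalizing by the weights actually present lets the composite degrade gracefully when one metric is unavailable for a given recording.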
Feedback includes:
- Content Analysis: Alignment with scenario guidelines
- Delivery Assessment: Speech quality and presentation
- Specific Recommendations: Actionable improvement suggestions
- Numerical Scoring: Detailed breakdown across multiple dimensions
# Create new migration
alembic revision --autogenerate -m "description"
# Apply migrations
alembic upgrade head
# Rollback
alembic downgrade -1

# Run tests (when implemented)
poetry run pytest
# Type checking
poetry run mypy app/

# Build and run
docker-compose up --build
# Production deployment
docker build -t monolog-core .
docker run -p 8000:8000 monolog-core

Key environment variables:
- DATABASE_URL: PostgreSQL connection string
- OPENAI_API_KEY: For AI feedback generation
- ASSEMBLYAI_API_KEY: For audio transcription
- CLERK_JWKS_URL & CLERK_ISSUER: For authentication
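
A startup check along these lines can catch a missing variable before the first request fails. This is a minimal sketch, not the project's actual configuration code: the `load_settings` helper is hypothetical, and treating all five variables as required is an assumption.

```python
import os

# Variable names come from the list above; treating all as required is an assumption.
REQUIRED_VARS = (
    "DATABASE_URL",
    "OPENAI_API_KEY",
    "ASSEMBLYAI_API_KEY",
    "CLERK_JWKS_URL",
    "CLERK_ISSUER",
)


def load_settings(env=None):
    """Return the required settings as a dict, or raise listing every missing name."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Reporting every missing name at once avoids fixing the `.env` file one variable at a time.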
- FastAPI: Modern async web framework
- SQLAlchemy: Database ORM with async support
- Alembic: Database migrations
- AssemblyAI: Speech-to-text processing
- OpenAI: AI-powered feedback generation
- PostgreSQL: Primary database with JSONB support
- Background Tasks: Async audio processing
- Health checks at /health and /api/v1/status/health/detailed
- Processing status tracking for all submissions
- Audio analysis metrics storage
- Error handling with detailed logging
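
The combination of asynchronous processing, status tracking, and logged error handling described above can be sketched with plain `asyncio`. This is only an illustration under stated assumptions: the status names, the in-memory `statuses` dict (standing in for the PostgreSQL table), and the `analyze` callable are all hypothetical and do not reflect the project's actual implementation.

```python
import asyncio
import logging
from enum import Enum

logger = logging.getLogger("submission_processing")


class Status(str, Enum):
    # Hypothetical status values; the real API may use different names.
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


# In-memory stand-in for persistent status storage.
statuses: dict[str, Status] = {}


async def process_submission(submission_id: str, analyze):
    """Run the analysis coroutine and record each status transition."""
    statuses[submission_id] = Status.PROCESSING
    try:
        result = await analyze(submission_id)
    except Exception:
        # Log the full traceback, then mark the submission as failed.
        logger.exception("Processing failed for submission %s", submission_id)
        statuses[submission_id] = Status.FAILED
        return None
    statuses[submission_id] = Status.COMPLETED
    return result
```

A status endpoint would then only need to read the stored value for a submission, which keeps polling cheap while the analysis runs in the background.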