This project is a real-time American Sign Language (ASL) gesture translation system that converts hand gestures captured through a webcam into text and spoken language. The system uses computer vision and deep learning techniques to recognize both static signs (individual letters and numbers) and dynamic gestures (words or phrases that involve movement over time).
The system employs a multi-model approach to achieve high accuracy in ASL translation (illustrative sketches of each model follow the list below):
- Landmark-based Model: A neural network that processes hand landmark coordinates extracted by MediaPipe. It consists of:
  - Dense layers with batch normalization and dropout for regularization
  - Input features representing 21 landmarks × 3 coordinates (x, y, z) × 2 hands
  - Optimized for static gesture recognition
- Sequence Model: An LSTM-based neural network for dynamic gesture recognition:
  - Processes sequences of hand landmarks (30 frames, approximately 1 second)
  - Captures temporal patterns in gestures that involve movement
  - Two LSTM layers followed by dense layers
- Image Model (Optional): A MobileNetV2-based CNN for processing hand images:
  - Transfer learning from pre-trained ImageNet weights
  - Fine-tuned for ASL gesture recognition
  - Useful in challenging lighting conditions or when landmark detection is difficult
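The models might be defined roughly as follows. This is a minimal sketch assuming a TensorFlow/Keras implementation; the layer widths, image input size, and class count are chosen for illustration rather than taken from the repository.

```python
# Minimal sketch of the three models described above (illustrative sizes).
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_FEATURES = 21 * 3 * 2   # 21 landmarks x (x, y, z) x 2 hands = 126 inputs
SEQ_LEN = 30                # ~1 second of frames for dynamic gestures
NUM_CLASSES = 36            # illustrative: letters + digits

def build_landmark_model():
    """Dense network with batch normalization and dropout for static signs."""
    return models.Sequential([
        layers.Input(shape=(NUM_FEATURES,)),
        layers.Dense(256, activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.Dense(128, activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

def build_sequence_model():
    """Two LSTM layers followed by dense layers for dynamic gestures."""
    return models.Sequential([
        layers.Input(shape=(SEQ_LEN, NUM_FEATURES)),
        layers.LSTM(64, return_sequences=True),
        layers.LSTM(128),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

def build_image_model(input_shape=(96, 96, 3)):
    """MobileNetV2 backbone with ImageNet weights for image-based recognition."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights="imagenet")
    base.trainable = False  # unfreeze the top layers later for fine-tuning
    return models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dropout(0.3),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
```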
The system uses a voting mechanism to combine predictions from these models, improving overall accuracy and robustness.
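One simple way to implement such a voting mechanism is a weighted average of the per-model probability vectors. The sketch below assumes each model outputs a softmax distribution over the same label set; the weights shown are illustrative, not the values used by the application.

```python
# Minimal sketch of a voting step over per-model softmax outputs.
import numpy as np

def combine_predictions(prob_vectors, weights=None):
    """Weighted average of class probabilities; returns (label index, confidence)."""
    probs = np.asarray(prob_vectors, dtype=np.float32)
    if weights is None:
        weights = np.ones(len(probs), dtype=np.float32)  # equal votes by default
    combined = np.average(probs, axis=0, weights=weights)
    return int(np.argmax(combined)), float(np.max(combined))

# Example: trust the landmark model most for static signs (illustrative weights).
# label_idx, confidence = combine_predictions(
#     [landmark_probs, sequence_probs, image_probs], weights=[0.5, 0.3, 0.2])
```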
- Real-time translation: Processes webcam feed at interactive framerates
- Multi-hand support: Can detect and process two hands simultaneously
- Interactive training mode: Add new signs and customize the model for personal use
- Text-to-speech output: Spoken output of recognized signs with configurable cooldown
- Dynamic gesture recognition: Captures signs that involve movement over time
- Modern GUI: User-friendly interface with visualization of hand landmarks
- Cross-platform compatibility: Works on Windows, macOS, and Linux
- Hand tracking: Uses MediaPipe Hands for accurate hand landmark detection
- Data preprocessing: Normalizes hand landmarks and extracts regions of interest (see the sketch after this list)
- Model training: Supports both loading pre-trained models and on-the-fly training
- Inference optimization: Uses efficient prediction pipelines for real-time performance
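As an illustration of the hand-tracking and preprocessing steps, the sketch below extracts landmarks with MediaPipe Hands and applies a simple wrist-relative normalization. The normalization scheme and zero-padding for a missing hand are assumptions, not necessarily what asl_cnn_translator.py does.

```python
# Minimal sketch of landmark extraction and normalization with MediaPipe Hands.
import cv2
import numpy as np
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=2, min_detection_confidence=0.5)

def extract_landmarks(frame_bgr):
    """Return a flat feature vector of 2 hands x 21 landmarks x 3 coordinates."""
    results = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    features = np.zeros(2 * 21 * 3, dtype=np.float32)  # zeros where no hand is detected
    if results.multi_hand_landmarks:
        for h, hand in enumerate(results.multi_hand_landmarks[:2]):
            coords = np.array([[lm.x, lm.y, lm.z] for lm in hand.landmark],
                              dtype=np.float32)
            coords -= coords[0]                      # wrist-relative coordinates
            scale = np.max(np.abs(coords)) or 1.0    # guard against zero scale
            features[h * 63:(h + 1) * 63] = (coords / scale).ravel()
    return features
```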
- Clone this repository:
  git clone https://github.com/yourusername/my-augmented-voice.git
  cd my-augmented-voice
- Install dependencies:
  pip install -r requirements.txt
  Note: The requirements file automatically handles platform-specific dependencies, including TensorFlow for Apple Silicon.
- Create the necessary directories (the first run will do this automatically):
  mkdir -p asl_translator/model
Run the application in standard recognition mode:
python asl_cnn_translator.py

This will launch the GUI application, ready to recognize ASL signs.
To add new signs or improve existing ones:
python asl_cnn_translator.py --interactive

In this mode, you can:
- Create new sign labels
- Collect training examples by demonstrating the sign
- Train the models with your collected data
- Test recognition immediately
To create fresh models instead of loading existing ones:
python asl_cnn_translator.py --retrain

This is useful if you want to start with a clean slate or if the existing models have become corrupted.
- Toggle Camera: Start/stop the webcam feed
- Train Model: In interactive mode, train the model with collected examples
- Collect Examples: Capture frames for a specific sign in interactive mode
- Delete Sign: Remove a sign from the dataset
- Toggle TTS: Enable/disable text-to-speech output
- Toggle Dynamic Recognition: Switch between static and dynamic gesture recognition
- asl_cnn_translator.py: Main application file containing model definitions and the GUI
- requirements.txt: Dependencies required to run the application
- asl_translator/model/: Directory for storing trained models and training data
  - dual_hand_model.keras: Landmark-based model for static gesture recognition
  - dynamic_gesture_model.keras: LSTM model for dynamic gesture recognition
  - image_model.keras: CNN model for image-based recognition
  - labels.pkl: Mapping between class indices and sign labels
  - training_data/: Contains collected training examples
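For reference, loading the saved artifacts listed above might look like the sketch below; the paths follow the project structure, while the surrounding code is illustrative rather than taken from the application.

```python
# Illustrative loading of the saved models and label map listed above.
import pickle
import tensorflow as tf

MODEL_DIR = "asl_translator/model"

static_model = tf.keras.models.load_model(f"{MODEL_DIR}/dual_hand_model.keras")
dynamic_model = tf.keras.models.load_model(f"{MODEL_DIR}/dynamic_gesture_model.keras")

with open(f"{MODEL_DIR}/labels.pkl", "rb") as f:
    labels = pickle.load(f)  # mapping between class indices and sign labels

print(static_model.input_shape, dynamic_model.input_shape, len(labels))
```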

