This project is a real-time American Sign Language (ASL) gesture translation system that converts hand gestures captured through a webcam into text and spoken language. The system uses computer vision and deep learning techniques to recognize both static signs (individual letters and numbers) and dynamic gestures (words or phrases that involve movement over time).
The system employs a multi-model approach to achieve high accuracy in ASL translation (illustrative sketches of each model follow the list below):
- Landmark-based Model: A neural network that processes hand landmark coordinates extracted by MediaPipe. It consists of:
  - Dense layers with batch normalization and dropout for regularization
  - Input features representing 21 landmarks × 3 coordinates (x, y, z) × 2 hands
  - Optimized for static gesture recognition
- Sequence Model: An LSTM-based neural network for dynamic gesture recognition:
  - Processes sequences of hand landmarks (30 frames, approximately 1 second)
  - Captures temporal patterns in gestures that involve movement
  - Two LSTM layers followed by dense layers
- Image Model (Optional): A MobileNetV2-based CNN for processing hand images:
  - Transfer learning from pre-trained ImageNet weights
  - Fine-tuned for ASL gesture recognition
  - Useful in challenging lighting conditions or when landmark detection is difficult
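The models might be defined roughly as follows. This is a minimal sketch assuming a TensorFlow/Keras implementation; the layer widths, image input size, and class count are chosen for illustration rather than taken from the repository.

```python
# Minimal sketch of the three models described above (illustrative sizes).
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_FEATURES = 21 * 3 * 2   # 21 landmarks x (x, y, z) x 2 hands = 126 inputs
SEQ_LEN = 30                # ~1 second of frames for dynamic gestures
NUM_CLASSES = 36            # illustrative: letters + digits

def build_landmark_model():
    """Dense network with batch normalization and dropout for static signs."""
    return models.Sequential([
        layers.Input(shape=(NUM_FEATURES,)),
        layers.Dense(256, activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.Dense(128, activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

def build_sequence_model():
    """Two LSTM layers followed by dense layers for dynamic gestures."""
    return models.Sequential([
        layers.Input(shape=(SEQ_LEN, NUM_FEATURES)),
        layers.LSTM(64, return_sequences=True),
        layers.LSTM(128),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

def build_image_model(input_shape=(96, 96, 3)):
    """MobileNetV2 backbone with ImageNet weights for image-based recognition."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights="imagenet")
    base.trainable = False  # unfreeze the top layers later for fine-tuning
    return models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dropout(0.3),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
```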
The system uses a voting mechanism to combine predictions from these models, improving overall accuracy and robustness.
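One simple way to implement such a voting mechanism is a weighted average of the per-model probability vectors. The sketch below assumes each model outputs a softmax distribution over the same label set; the weights shown are illustrative, not the values used by the application.

```python
# Minimal sketch of a voting step over per-model softmax outputs.
import numpy as np

def combine_predictions(prob_vectors, weights=None):
    """Weighted average of class probabilities; returns (label index, confidence)."""
    probs = np.asarray(prob_vectors, dtype=np.float32)
    if weights is None:
        weights = np.ones(len(probs), dtype=np.float32)  # equal votes by default
    combined = np.average(probs, axis=0, weights=weights)
    return int(np.argmax(combined)), float(np.max(combined))

# Example: trust the landmark model most for static signs (illustrative weights).
# label_idx, confidence = combine_predictions(
#     [landmark_probs, sequence_probs, image_probs], weights=[0.5, 0.3, 0.2])
```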
- Real-time translation: Processes webcam feed at interactive framerates
- Multi-hand support: Can detect and process two hands simultaneously
- Interactive training mode: Add new signs and customize the model for personal use
- Text-to-speech output: Spoken output of recognized signs with configurable cooldown
- Dynamic gesture recognition: Captures signs that involve movement over time
- Modern GUI: User-friendly interface with visualization of hand landmarks
- Cross-platform compatibility: Works on Windows, macOS, and Linux
- Hand tracking: Uses MediaPipe Hands for accurate hand landmark detection
- Data preprocessing: Normalizes hand landmarks and extracts regions of interest (see the sketch after this list)
- Model training: Supports both loading pre-trained models and on-the-fly training
- Inference optimization: Uses efficient prediction pipelines for real-time performance
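As an illustration of the hand-tracking and preprocessing steps, the sketch below extracts landmarks with MediaPipe Hands and applies a simple wrist-relative normalization. The normalization scheme and zero-padding for a missing hand are assumptions, not necessarily what asl_cnn_translator.py does.

```python
# Minimal sketch of landmark extraction and normalization with MediaPipe Hands.
import cv2
import numpy as np
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=2, min_detection_confidence=0.5)

def extract_landmarks(frame_bgr):
    """Return a flat feature vector of 2 hands x 21 landmarks x 3 coordinates."""
    results = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    features = np.zeros(2 * 21 * 3, dtype=np.float32)  # zeros where no hand is detected
    if results.multi_hand_landmarks:
        for h, hand in enumerate(results.multi_hand_landmarks[:2]):
            coords = np.array([[lm.x, lm.y, lm.z] for lm in hand.landmark],
                              dtype=np.float32)
            coords -= coords[0]                      # wrist-relative coordinates
            scale = np.max(np.abs(coords)) or 1.0    # guard against zero scale
            features[h * 63:(h + 1) * 63] = (coords / scale).ravel()
    return features
```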
- Clone this repository:
  git clone https://github.com/yourusername/my-augmented-voice.git
  cd my-augmented-voice
- Install dependencies:
  pip install -r requirements.txt
  Note: The requirements file automatically handles platform-specific dependencies, including TensorFlow for Apple Silicon.
- Create the necessary directories (the first run will do this automatically):
  mkdir -p asl_translator/model
Run the application in standard recognition mode:
python asl_cnn_translator.py

This will launch the GUI application, ready to recognize ASL signs.
To add new signs or improve existing ones:
python asl_cnn_translator.py --interactive

In this mode, you can:
- Create new sign labels
- Collect training examples by demonstrating the sign
- Train the models with your collected data
- Test recognition immediately
To create fresh models instead of loading existing ones:
python asl_cnn_translator.py --retrain

This is useful if you want to start with a clean slate or if the existing models have become corrupted.
- Toggle Camera: Start/stop the webcam feed
- Train Model: In interactive mode, train the model with collected examples
- Collect Examples: Capture frames for a specific sign in interactive mode
- Delete Sign: Remove a sign from the dataset
- Toggle TTS: Enable/disable text-to-speech output
- Toggle Dynamic Recognition: Switch between static and dynamic gesture recognition
- asl_cnn_translator.py: Main application file containing model definitions and the GUI
- requirements.txt: Dependencies required to run the application
- asl_translator/model/: Directory for storing trained models and training data
  - dual_hand_model.keras: Landmark-based model for static gesture recognition
  - dynamic_gesture_model.keras: LSTM model for dynamic gesture recognition
  - image_model.keras: CNN model for image-based recognition
  - labels.pkl: Mapping between class indices and sign labels
  - training_data/: Contains collected training examples
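For reference, loading the saved artifacts listed above might look like the sketch below; the paths follow the project structure, while the surrounding code is illustrative rather than taken from the application.

```python
# Illustrative loading of the saved models and label map listed above.
import pickle
import tensorflow as tf

MODEL_DIR = "asl_translator/model"

static_model = tf.keras.models.load_model(f"{MODEL_DIR}/dual_hand_model.keras")
dynamic_model = tf.keras.models.load_model(f"{MODEL_DIR}/dynamic_gesture_model.keras")

with open(f"{MODEL_DIR}/labels.pkl", "rb") as f:
    labels = pickle.load(f)  # mapping between class indices and sign labels

print(static_model.input_shape, dynamic_model.input_shape, len(labels))
```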

