A functional, real-time voice-activated Smart AI Assistant that uses Automatic Speech Recognition (ASR) to process voice commands and Text-to-Speech (TTS) to provide spoken feedback. This project showcases the practical application of voice recognition technology in an interactive desktop system.
- About the Project
- Key Features
- How It Works (Methodology)
- System Design
- Getting Started
- Limitations
- Acknowledgments
This project implements a Smart AI Assistant that operates in real time, built using Python. It integrates the speech_recognition library for capturing and transcribing user audio and pyttsx3 for synthesized voice output. The project was completed in partial fulfillment of the requirements for the degree of Bachelor of Technology at Rai Technology University.
The assistant was developed by the team of Jeevan KL, Chandan H, Towhid Aalam, and Dhamodhara for their 3rd Semester B.Tech CSE AIML program during the 2025-2026 session.
- Voice-Controlled Execution: The assistant processes voice commands to execute various actions.
- ASR and TTS Implementation: It uses the `speech_recognition` library with Google's API to convert speech to text (ASR) and `pyttsx3` to convert text responses to natural-sounding speech (TTS).
- Diverse Task Handling: The system can fetch the time and date, search the web, get summaries from Wikipedia, and play media on YouTube.
- Real-time Status Feedback: A minimal tkinter GUI displays the assistant's current state, such as Listening or Processing, to the user.
- Robust Multi-threading: A multi-threaded architecture keeps the GUI responsive while the system is waiting for voice input or processing a command.
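A minimal sketch of the two speech helpers behind the ASR and TTS feature. The function names `talk` and `take_command` are the ones used in this README; the `engine` and `noise_seconds` parameters, and the choice to return an empty string when recognition fails, are assumptions made for illustration and may differ from the project's actual code. Note that `sr.Microphone()` also needs PyAudio installed.

```python
def talk(text, engine=None):
    """Speak `text` aloud (TTS). `engine` is a hypothetical hook for reuse or testing."""
    if engine is None:
        import pyttsx3  # imported lazily so this module loads without audio drivers
        engine = pyttsx3.init()
    engine.say(text)       # queue the utterance
    engine.runAndWait()    # block until speech has finished


def take_command(noise_seconds=0.5):
    """Record one utterance from the microphone and return it as lowercase text.

    Returns an empty string when nothing could be recognized (an assumption).
    """
    import speech_recognition as sr  # lazy import, requires PyAudio for Microphone

    listener = sr.Recognizer()
    with sr.Microphone() as source:
        # Calibrate the energy threshold against the current background noise.
        listener.adjust_for_ambient_noise(source, duration=noise_seconds)
        voice = listener.listen(source)
    try:
        return listener.recognize_google(voice).lower()
    except sr.UnknownValueError:
        return ""  # speech was unintelligible
    except sr.RequestError:
        return ""  # the recognition service could not be reached
```

Both calls block, so in a GUI program they belong on a worker thread rather than the thread that runs the window.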
The system executes each command through a sequential pipeline of four main phases:
- Audio Acquisition: The `sr.Microphone()` object captures live audio, and `listener.adjust_for_ambient_noise()` calibrates the recognizer to the ambient noise level so that background sound is less likely to be treated as speech.
- Speech Processing (ASR): The recorded audio is sent to the Google Speech Recognition API via `listener.recognize_google(voice)` to convert the speech into a text string.
- Command Logic: The recognized text is analyzed using Python's `if...elif...` conditional statements to identify keywords (e.g., "play," "time," "wikipedia") and execute the corresponding task using external libraries like `pywhatkit` or `wikipedia`.
- Text-to-Speech (TTS) Output: The `talk(text)` function uses the `pyttsx3` engine to synthesize the response text and speak it back to the user.
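The Command Logic phase can be sketched as a single keyword-matching function. This is an illustrative version, not the project's actual code: the name `handle_command`, the injectable `now`, `play`, `summarize`, and `search` parameters, the reply wording, the time and date formats, and the Google search URL are all assumptions. The default helpers rely on `pywhatkit.playonyt` and `wikipedia.summary`, which are imported lazily.

```python
import datetime
import webbrowser
from urllib.parse import quote_plus


def _play_on_youtube(topic):
    import pywhatkit  # lazy import: only needed for "play" commands
    pywhatkit.playonyt(topic)


def _wikipedia_summary(topic):
    import wikipedia  # lazy import: only needed for "wikipedia" commands
    return wikipedia.summary(topic, sentences=2)


def _web_search(query):
    webbrowser.open("https://www.google.com/search?q=" + quote_plus(query))


def handle_command(command, now=None, play=None, summarize=None, search=None):
    """Map recognized text to an action and return the reply to be spoken."""
    command = command.lower().strip()
    now = now or datetime.datetime.now()
    # Branch order matters: the first keyword found in the text wins.
    if "play" in command:
        topic = command.replace("play", "", 1).strip()
        (play or _play_on_youtube)(topic)
        return f"Playing {topic}"
    elif "time" in command:
        return "The time is " + now.strftime("%H:%M")
    elif "date" in command:
        return "Today is " + now.strftime("%Y-%m-%d")
    elif "wikipedia" in command:
        topic = command.replace("wikipedia", "", 1).strip()
        return (summarize or _wikipedia_summary)(topic)
    elif "search" in command:
        query = command.replace("search", "", 1).strip()
        (search or _web_search)(query)
        return f"Searching the web for {query}"
    else:
        return "Sorry, I did not understand that."
```

Returning the reply as a string, instead of speaking it inside each branch, keeps this phase separate from the TTS phase, so the result can be passed straight to `talk(text)`. Plain substring matching is simple but coarse: a word such as "display" would also trigger the "play" branch.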
The assistant is implemented with a multi-threaded design to maintain responsiveness.
GUI Thread (Main): Handles the tkinter window (root.mainloop()) and updates the status label (e.g., "Listening...").
Assistant Thread (Daemon): A separate thread runs the core run_assistant() function, which contains the blocking calls for microphone input (take_command()) and speech synthesis (talk()). This separation prevents the user interface from freezing.
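A minimal sketch of this two-thread layout, under stated assumptions. `run_assistant`, `take_command`, and `talk` are names from this README, but the signatures here are hypothetical: the helpers are passed in as parameters, the status text travels through a `queue.Queue` that the GUI thread polls every 100 ms, and a `None` command is used as a made-up sentinel to end the loop. The actual project may update the label differently.

```python
import queue
import threading


def run_assistant(status_queue, take_command, talk, respond):
    """Worker loop: listen, process, speak. Runs on the assistant thread."""
    while True:
        status_queue.put("Listening...")
        command = take_command()      # blocking microphone call
        if command is None:           # hypothetical sentinel: stop the loop
            break
        if not command:               # nothing recognized, listen again
            continue
        status_queue.put("Processing...")
        talk(respond(command))        # blocking speech synthesis
    status_queue.put("Stopped")


def main(take_command, talk, respond):
    """GUI thread: owns the tkinter window and is the only thread touching widgets."""
    import tkinter as tk  # lazy import so the worker can be used without a display

    root = tk.Tk()
    root.title("Smart AI Assistant")
    label = tk.Label(root, text="Starting...")
    label.pack(padx=40, pady=20)
    status_queue = queue.Queue()

    def poll_status():
        # Drain pending status messages, then schedule the next poll.
        try:
            while True:
                label.config(text=status_queue.get_nowait())
        except queue.Empty:
            pass
        root.after(100, poll_status)

    # Daemon thread: it will not keep the process alive once the window closes.
    threading.Thread(
        target=run_assistant,
        args=(status_queue, take_command, talk, respond),
        daemon=True,
    ).start()
    poll_status()
    root.mainloop()
```

Routing status text through a queue means the worker never calls tkinter directly, which avoids updating widgets from a thread other than the one running `root.mainloop()`.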
- Python 3.x
- The following required Python libraries:
You can install the required libraries using pip: