This project demonstrates how to perform text-to-audio and audio-to-text conversions using Hugging Face transformers. It includes examples of generating audio from text, converting audio to text, and downloading audio for further processing.
- Text-to-Audio Conversion: Use the Hugging Face `pipeline` API with the `suno/bark-small` model to convert text into audio.
- Audio-to-Text Conversion: Use the `openai/whisper-medium` model to transcribe audio files into text.
- Download and Play Audio: Download an audio file, play it directly, and process it for transcription.
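The download step in the last item can be sketched with the standard library alone. This is a minimal sketch, not the project's own code: the helper name `download_audio`, the destination path, and the example URL in the comments are all hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path


def download_audio(url: str, dest: str) -> Path:
    """Download the file at `url` to `dest` and return the local path.

    Hypothetical helper for illustration; not part of transformers.
    """
    dest_path = Path(dest)
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response to disk so large files are not held in memory.
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path


# Example usage (hypothetical URL; replace with a real audio file):
# path = download_audio("https://example.com/sample.mp3", "audio.mp3")
```

In a notebook, the saved file can then be played with `IPython.display.Audio(filename=str(path))`, and the same path can be passed to the transcription pipeline.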
- Python 3.7+
- Hugging Face `transformers` library
- Jupyter Notebook (or Google Colab)
- `IPython` for displaying audio in notebooks
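Before running the examples, it can help to confirm that the environment meets these prerequisites. The following is a rough sketch using only the standard library; the helper names `missing_modules` and `python_ok` are made up for illustration, and the minimum version simply mirrors the list above.

```python
import importlib.util
import sys

# Import names of the packages listed in the prerequisites.
REQUIRED_MODULES = ("transformers", "IPython")
MIN_PYTHON = (3, 7)


def missing_modules(names=REQUIRED_MODULES):
    """Return the names in `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(minimum=MIN_PYTHON):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum


print("Python version OK:", python_ok())
print("Missing packages:", missing_modules() or "none")
```

Any package reported as missing can be installed with `pip`.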
- Clone the repository:

      git clone https://github.com/<your-username>/<your-repo-name>.git
      cd <your-repo-name>
Run the following command to install the required libraries:

    pip install transformers ipython

Convert text into audio using the `suno/bark-small` model:

    from transformers import pipeline
    from IPython.display import Audio
    # Input text
    text = "Subscribe to my channel: Content on Demand"
    # Load the text-to-speech pipeline
    pipe = pipeline("text-to-speech", model='suno/bark-small')
    # Generate audio
    output = pipe(text)
    # Play the audio
    Audio(output["audio"], rate=output["sampling_rate"])

Transcribe an audio file into text using the `openai/whisper-medium` model:

    from transformers import pipeline
    # Load the Whisper model
    whisper = pipeline('automatic-speech-recognition', model='openai/whisper-medium')
    # Transcribe the audio file
    transcription = whisper('audio.mp3')
    # Print the transcription
    print(transcription)

Models used:
- `suno/bark-small`: Used for converting text into audio.
- `openai/whisper-medium`: Used for transcribing audio into text.
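The two models can be chained by writing the generated audio to a file and passing that file to the transcription pipeline. The sketch below covers only the saving step, using the standard library. The helper `save_wav` is hypothetical, and it assumes the samples are mono floats in roughly the range -1.0 to 1.0 (values outside that range are clipped).

```python
import struct
import wave


def save_wav(samples, sampling_rate, path):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Hypothetical helper for illustration; `samples` is any flat iterable
    of numbers.
    """
    frames = bytearray()
    for sample in samples:
        # Clip to the valid range, then scale to a signed 16-bit integer.
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(str(path), "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(int(sampling_rate))
        wav_file.writeframes(bytes(frames))
    return path


# Example usage with the text-to-speech output (assumes `output["audio"]`
# is a NumPy array, flattened here with ravel()):
# save_wav(output["audio"].ravel(), output["sampling_rate"], "speech.wav")
# print(whisper("speech.wav")["text"])
```

Saving to WAV keeps the round trip free of extra dependencies; other formats would need an additional audio library.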
Acknowledgments:
- Hugging Face, for providing powerful transformer-based models.
- Google Colab, for offering an easy-to-use platform for running Python code.
