
hemanth-sunkireddy/MultiModal-VideoQA


Automatic multimodal question and answering for video lectures

Presentation: Canva

Demo: Youtube

Project description

This work synthesizes, from a set of video lectures, a video that answers the question raised by a student. It involves the following objectives:

  1. Select a set of video lectures that contains SRT subtitle files.
  2. Study and implement a voice activity detection (VAD) algorithm.
  3. Extract the speech segments from the VAD output.
  4. Identify the spoken content of each segment in text form using ASR.
  5. Obtain sentence-specific time stamps.
  6. Create an answer summary.
  7. Identify the video parts corresponding to the answer summary.
  8. Stitch the summary video segments together to obtain a natural-looking video (see the stitching sketch after this list).
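The final stitching step can be pictured with moviepy (already listed in the prerequisites below). The segment list and file names here are purely illustrative placeholders, not the repo's actual code:

    # Sketch of objective 8: stitch selected lecture segments into one answer video.
    # The (video, start, end) triples stand in for the output of objectives 5-7.
    from moviepy.editor import VideoFileClip, concatenate_videoclips

    segments = [
        ("Data/lecture1.mp4", 125.0, 151.5),
        ("Data/lecture3.mp4", 842.2, 870.0),
    ]

    clips = [VideoFileClip(path).subclip(start, end) for path, start, end in segments]
    answer = concatenate_videoclips(clips, method="compose")  # "compose" tolerates differing resolutions
    answer.write_videofile("answer.mp4")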

Guide

Chiranjeevi Yarra (Spoken Language Forensics & Informatics (SLFI) group - LTRC)

Running the Frontend

  • The frontend is a Flask server that renders HTML pages. To run it:
  1. Navigate to the frontend directory:
    cd frontend
  2. Start the server:
    python3 main.py
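For orientation only, a minimal Flask app of the shape frontend/main.py takes could look like the sketch below; the actual routes and templates in the repo may differ:

    # Hypothetical minimal frontend - the real frontend/main.py may differ.
    from flask import Flask, render_template, request

    app = Flask(__name__)

    @app.route("/", methods=["GET", "POST"])
    def index():
        question = ""
        if request.method == "POST":
            question = request.form.get("question", "")
            # ...forward the question to the backend and fetch the answer video...
        return render_template("index.html", question=question)

    if __name__ == "__main__":
        app.run(debug=True)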

Video to Audio Conversion and Dividing Audio into Audio Chunks

Prerequisites

  1. FFMPEG: pip3 install ffmpeg-python
  2. PyTorch: pip3 install torch torchvision
  3. Transformers: pip3 install transformers
  4. Sentence Transformers: pip3 install -U sentence-transformers
  5. Faiss: pip3 install faiss-cpu
  6. Silero VAD: pip3 install silero-vad
  7. SoundFile: pip3 install soundfile
  8. Sox: pip3 install sox
  9. Streamlit: pip3 install streamlit
  10. pysrt: pip3 install pysrt
  11. moviepy: pip3 install moviepy==1.0.3
  • Note: ffmpeg must also be installed on the system. Install it with apt install ffmpeg (Linux) or brew install ffmpeg (macOS).

Steps

  • Videos should be placed in the Data/ folder.
  1. Run the following notebook to complete the processing up to audio chunk generation:

pipeline-qwen.ipynb

This notebook will:

  • Convert Video → Audio
  • Perform Voice Activity Detection (VAD)
  • Generate Audio Chunks
  • This generates audio chunk (.wav) files for each lecture; a rough sketch of these steps is shown below.
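The sketch below approximates the notebook's conversion and chunking flow using ffmpeg-python, Silero VAD, and SoundFile. File names and the output directory are placeholders, and the notebook's actual parameters may differ:

    # Rough sketch of pipeline-qwen.ipynb up to chunk generation (paths are placeholders).
    import ffmpeg
    import soundfile as sf
    from silero_vad import load_silero_vad, read_audio, get_speech_timestamps

    SR = 16000

    # 1. Video -> mono 16 kHz audio
    ffmpeg.input("Data/lecture1.mp4").output("lecture1.wav", ac=1, ar=SR).run(overwrite_output=True)

    # 2. Voice activity detection with Silero VAD
    model = load_silero_vad()
    wav = read_audio("lecture1.wav", sampling_rate=SR)
    speech = get_speech_timestamps(wav, model, sampling_rate=SR)

    # 3. One .wav chunk per detected speech segment (start/end are sample indices)
    for i, seg in enumerate(speech):
        sf.write(f"chunks/lecture1_{i:04d}.wav", wav[seg["start"]:seg["end"]].numpy(), SR)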
  2. Convert the audio chunks into SRT files using the Whisper model and MFA (Montreal Forced Aligner).
  3. Encode these SRT files with the Qwen model by running the Encoding.py file; a sketch of this step follows the list.
  4. Use the generated .index files with the backend repo.
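As a rough picture of how the .index files can be produced from SRTs with sentence-transformers and FAISS, see the sketch below. The embedding model name is a placeholder; the actual Qwen model and settings should be taken from Encoding.py:

    # Illustrative encoding step - Encoding.py may use a different (Qwen-based) embedding model.
    import faiss
    import pysrt
    from sentence_transformers import SentenceTransformer

    subs = pysrt.open("lecture1.srt")
    sentences = [sub.text.replace("\n", " ") for sub in subs]

    model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model name
    embeddings = model.encode(sentences, normalize_embeddings=True)

    index = faiss.IndexFlatIP(embeddings.shape[1])  # inner product == cosine on normalized vectors
    index.add(embeddings)
    faiss.write_index(index, "lecture1.index")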

Question Classifier Output

[Image: question classifier output]

Finding Related Sentences for Question Output

[Image: related sentences output]

Voice Activity Detector Output

[Image: voice activity detector output]
