This repository contains our implementation of the approach presented in our paper: "See the silence: improving visual-only voice activity detection by optical flow and RGB fusion".
The official published paper is available here: https://link.springer.com/chapter/10.1007/978-3-030-87156-7_4
and an earlier version is also available here: https://github.com/ducspe/VVADpaper
The system has the following structure:

The program needs to be run twice: once with RGB inputs and once with optical flow inputs.
To create the mean and standard deviation statistics, learn_trainingsubset_statistics.py must be called separately for RGB and for optical flow, and the resulting .npy files must be named accordingly. The file name strings are used as parameters in vvad_train.py, vvad_test.py, and vvad_fusion_test.py, where the two separate .npy files are required.
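For orientation, here is a minimal sketch of the kind of computation learn_trainingsubset_statistics.py performs; the glob pattern, folder layout, and output file name are placeholders for illustration, not the script's actual parameters:

```python
import glob

import numpy as np
from PIL import Image

# Hypothetical layout: all training frames stored as PNGs under data_dda/train.
frame_paths = glob.glob("data_dda/train/**/*.png", recursive=True)
pixels = np.stack([np.asarray(Image.open(p), dtype=np.float32) / 255.0
                   for p in frame_paths])  # shape: (N, H, W, C)

# Per-channel mean and standard deviation over the whole training subset.
stats = np.array([pixels.mean(axis=(0, 1, 2)), pixels.std(axis=(0, 1, 2))])
np.save("rgb_statistics.npy", stats)  # repeat analogously for the optical flow inputs
```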
data_dda is a folder containing a very small example subset of the preprocessed TCD-TIMIT data. The full preprocessing code is available in a separate repository: https://github.com/ducspe/TCD-TIMIT-Preprocessing. After preprocessing, the full dataset must follow the structure of the data_dda folder in this repository.
Training is started by running vvad_train.py.
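The saved statistics are presumably used to standardize the network inputs; a minimal sketch, assuming the .npy file stores the mean and standard deviation as in the sketch above (the file name is again a placeholder):

```python
import numpy as np

mean, std = np.load("rgb_statistics.npy")  # placeholder file name

def normalize(frames: np.ndarray) -> np.ndarray:
    """Standardize a batch of frames with the training-subset statistics."""
    return (frames - mean) / (std + 1e-8)  # epsilon guards against zero std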
You can then test without any fusion by running vvad_test.py.
The RGB and optical flow models both need to be saved and are then fused with the help of vvad_fusion_test.py.
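As a rough illustration of what such a late-fusion step can look like (the actual strategy is implemented in vvad_fusion_test.py; the function name and the weighted-average scheme below are assumptions):

```python
import numpy as np

def fuse_predictions(rgb_probs: np.ndarray, flow_probs: np.ndarray,
                     rgb_weight: float = 0.5) -> np.ndarray:
    """Fuse per-frame speech probabilities from the two saved models."""
    fused = rgb_weight * rgb_probs + (1.0 - rgb_weight) * flow_probs
    return (fused > 0.5).astype(np.int64)  # 1 = speech, 0 = silence
```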
Code related to audio label inference is available in the processing folder/module.
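For context, one common way to infer voice-activity labels from an audio track is frame-level energy thresholding; the sketch below shows that generic approach and is not necessarily the method used in the processing module:

```python
import numpy as np

def energy_vad_labels(signal: np.ndarray, sample_rate: int,
                      frame_ms: float = 25.0,
                      threshold_db: float = -35.0) -> np.ndarray:
    """Label each audio frame as speech (1) or silence (0) by its RMS energy.

    Assumes a mono signal normalized to [-1, 1]; the frame length and
    threshold are illustrative defaults, not values from the repository.
    """
    frame_len = int(sample_rate * frame_ms / 1000.0)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames.astype(np.float64) ** 2, axis=1))
    db = 20.0 * np.log10(rms + 1e-12)  # small epsilon avoids log(0)
    return (db > threshold_db).astype(np.int64)
```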
All other helper functions are available in utils.py.