AV Speech Separation

30 December 2024


Overview

Audio-Visual Speech Separation and Personalized Keyphrase Detection in Noisy Environments is a project inspired by the human brain's ability to focus on a single voice amid overlapping conversations, a phenomenon known as the "cocktail party effect". The system targets settings such as conferences, public events, and crowded spaces, where conventional audio-only processing struggles to isolate individual speakers. It combines visual cues, such as lip movements detected on the speakers' faces, with audio filtering techniques that suppress background noise, which improves transcription accuracy. By mapping each audio stream to the corresponding face, the system keeps audio and video synchronized and intelligible, making it applicable to domains such as security, media production, and assistive technologies.
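The overview does not describe how audio is matched to a face, so the sketch below is only one possible approach and an assumption, not the project's documented method. It uses audio-visual synchrony: a speaker's mouth opening tends to rise and fall with the loudness of their own voice. The code assumes a per-frame lip-aperture signal has already been extracted from facial landmarks at the same frame rate as the audio frames. It computes an RMS loudness envelope for each candidate separated stream, searches a few frames of lag to absorb small sync offsets, and assigns the face to the stream with the highest Pearson correlation. All function names (`rms_envelope`, `pearson`, `best_lagged_corr`, `assign_stream`) and the synthetic data are hypothetical.

```python
import math
from typing import Sequence


def rms_envelope(samples: Sequence[float], frame_len: int) -> list[float]:
    """RMS energy of each non-overlapping frame of `frame_len` samples (hypothetical helper)."""
    env = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        env.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return env


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def best_lagged_corr(lip: Sequence[float], env: Sequence[float],
                     max_lag: int) -> tuple[float, int]:
    """Best correlation over lags in [-max_lag, max_lag].

    A positive lag pairs lip[i] with env[i + lag], i.e. the audio trails the video.
    """
    best = (-1.0, 0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = lip[:len(lip) - lag], env[lag:]
        else:
            a, b = lip[-lag:], env[:len(env) + lag]
        n = min(len(a), len(b))
        if n < 2:
            continue  # too few overlapping frames to correlate
        r = pearson(a[:n], b[:n])
        if r > best[0]:
            best = (r, lag)
    return best


def assign_stream(lip: Sequence[float], streams: Sequence[Sequence[float]],
                  frame_len: int, max_lag: int = 2) -> tuple[int, float, int]:
    """Return (stream index, correlation, lag) of the stream that best matches the face."""
    scores = [best_lagged_corr(lip, rms_envelope(s, frame_len), max_lag) for s in streams]
    idx = max(range(len(streams)), key=lambda i: scores[i][0])
    return idx, scores[idx][0], scores[idx][1]


if __name__ == "__main__":
    frame_len = 4
    # Hypothetical mouth-opening values, one per video frame.
    lip = [0.1, 0.9, 0.2, 0.8, 0.7, 0.1, 0.1, 0.9]

    def synth(levels: Sequence[float]) -> list[float]:
        # Alternating-sign signal whose per-frame amplitude follows `levels`.
        return [lvl * (1 if k % 2 == 0 else -1) for lvl in levels for k in range(frame_len)]

    speaker = synth(lip)  # voice that moves with this face
    other = synth([0.8, 0.2, 0.9, 0.1, 0.2, 0.9, 0.8, 0.1])  # a different talker
    idx, r, lag = assign_stream(lip, [other, speaker], frame_len)
    print(idx, round(r, 3), lag)
```

In a real pipeline the streams would come from a separation model and the lip signal from a face-landmark detector; a correlation heuristic like this is cheap but can be fooled by silent or occluded faces, which is one reason learned audio-visual fusion models are often used instead.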

Key Features

Output

My teammates, Rishab R Budale, Tejas Nayak B, and Hithaish, and I had the opportunity to present our paper at the 3rd Congress on Control, Robotics, and Mechatronics (CRM2025), organized by SR University, Warangal, India, on February 2, 2025. This work was carried out under the guidance of Dr. Priya R Kamath.

Srajan Kumar, 2025