Departamento de Ingeniería de Telecomunicación

Permanent URI for this community: https://hdl.handle.net/10953/39

This community collects the documents produced by the Departamento de Ingeniería de Telecomunicación that meet the copyright requirements for open-access dissemination.

Showing 1 - 2 of 2

Multimodal speaker diarization for meetings using volume-evaluated SRP-PHAT and video analysis
(Springer, 2018-04-11) Cabañas-Molero, Pablo Antonio; Lucena, Manuel; Fuertes, José Manuel; Vera-Candeas, Pedro; Ruiz-Reyes, Nicolás
Speaker diarization is traditionally defined as the problem of determining “who speaks when” given an audio or video stream. This is an important task in many applications for meeting rooms, including automatic transcription of conversations, camera steering or content summarization. When the room is equipped with microphone arrays and cameras, speakers can be distinguished according to their location and the problem can be addressed through localization techniques. This article proposes a multimodal speaker diarization system for meeting environments based on a modified SRP-PHAT function evaluated on space volumes rather than discrete points. In our system, this function is used in combination with a circular array, enabling audio-based localization based on the selection of local maxima. Voicing detection is used to detect speech frames, whereas video analysis is introduced to aid in the decision when users move or simultaneously speak. The approach is evaluated on the well-known AMI dataset with approximately 100 hours of realistic meeting recordings and shows an average diarization error rate of 21% – 25%.
The music demixing machine: toward real-time remixing of classical music
(Springer, 2023-04-06) Cabañas-Molero, Pablo Antonio; Muñoz-Montoro, Antonio Jesús; Vera-Candeas, Pedro; Ranilla, José
Classical music, unlike popular music, is usually recorded live with close microphone techniques. For this reason, isolated tracks are not available to create the final mixture/stream, and so the mixing process requires greater effort. Source separation methods are a potential solution to this problem. However, current algorithms are not fast enough to yield real-time separation in professional setups with dozens of microphones and sources. In this paper, we propose a fast approach consisting of a panning-based multichannel non-negative matrix factorization model to separate classical music. We tested the system on real professional recordings, where we were able to reach real-time with very low latency and promising quality.

RUJA: Repositorio Institucional de Producción Científica

Browsing Departamento de Ingeniería de Telecomunicación by Author "Cabañas-Molero, Pablo Antonio"