RUJA: Institutional Repository of Scientific Production

 

Multimodal speaker diarization for meetings using volume-evaluated SRP-PHAT and video analysis

dc.contributor.author: Cabañas-Molero, Pablo Antonio
dc.contributor.author: Lucena, Manuel
dc.contributor.author: Fuertes, José Manuel
dc.contributor.author: Vera-Candeas, Pedro
dc.contributor.author: Ruiz-Reyes, Nicolás
dc.date.accessioned: 2024-02-07T00:37:51Z
dc.date.available: 2024-02-07T00:37:51Z
dc.date.issued: 2018-04-11
dc.description.abstract: Speaker diarization is traditionally defined as the problem of determining “who speaks when” given an audio or video stream. This is an important task in many applications for meeting rooms, including automatic transcription of conversations, camera steering or content summarization. When the room is equipped with microphone arrays and cameras, speakers can be distinguished according to their location and the problem can be addressed through localization techniques. This article proposes a multimodal speaker diarization system for meeting environments based on a modified SRP-PHAT function evaluated on space volumes rather than discrete points. In our system, this function is used in combination with a circular array, enabling audio-based localization based on the selection of local maxima. Voicing detection is used to detect speech frames, whereas video analysis is introduced to aid in the decision when users move or simultaneously speak. The approach is evaluated on the well-known AMI dataset with approximately 100 hours of realistic meeting recordings and shows an average diarization error rate of 21% – 25%.
dc.description.sponsorship: This work was supported by the Andalusian Economy and Knowledge Council under project 2010-TIC6762, and the Spanish Ministry of Economy and Competitiveness under project TEC2015-67387-C4-2-R.
dc.identifier.citation: Cabañas-Molero, P., Lucena, M., Fuertes, J.M. et al. Multimodal speaker diarization for meetings using volume-evaluated SRP-PHAT and video analysis. Multimed Tools Appl 77, 27685–27707 (2018). https://doi.org/10.1007/s11042-018-5944-2
dc.identifier.issn: 1380-7501
dc.identifier.other: 10.1007/s11042-018-5944-2
dc.identifier.uri: https://hdl.handle.net/10953/2188
dc.language.iso: eng
dc.publisher: Springer
dc.relation.ispartof: Multimedia Tools and Applications 2018; 77, 27685–27707
dc.rights: CC0 1.0 Universal
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.rights.uri: http://creativecommons.org/publicdomain/zero/1.0/
dc.subject: Speaker diarization
dc.subject: Meeting rooms
dc.subject: SRP-PHAT
dc.subject: Multimodal processing
dc.subject.udc: 621.39
dc.title: Multimodal speaker diarization for meetings using volume-evaluated SRP-PHAT and video analysis
dc.type: info:eu-repo/semantics/article
dc.type.version: info:eu-repo/semantics/publishedVersion

Files

Original bundle

Name: Multimedia_tools_2018.pdf
Size: 1.43 MB
Format: Adobe Portable Document Format
Description: PDF of the article

License bundle

Name: license.txt
Size: 1.98 KB
Format: Item-specific license agreed upon to submission