Department of Telecommunication Engineering (Departamento de Ingeniería de Telecomunicación)

Permanent URI for this community: https://hdl.handle.net/10953/39

This community collects the documents produced by the Department of Telecommunication Engineering that meet the copyright requirements for open-access dissemination.

Showing 1 - 7 of 7

A constrained tonal semi-supervised non-negative matrix factorization to classify presence/absence of wheezing in respiratory sounds
(Elsevier, 2020-04-01) Torre-Cruz, Juan; Cañadas-Quesada, Francisco Jesús; García-Galán, Sebastián; Ruiz-Reyes, Nicolás; Vera-Candeas, Pedro; Carabias-Orti, Julio José
From a clinical point of view, detecting the presence of wheezing in respiratory sounds is a challenging task for the early identification of pulmonary diseases, since wheezing is the main manifestation associated with airway obstruction. In this article, we propose a novel method to detect the presence or absence of wheeze sounds in breath recordings in order to increase the reliability of the subjective diagnosis provided by the physician during auscultation. Specifically, a subject is assumed to be unhealthy when wheeze sounds can be detected during breathing. The proposed method consists of three stages. The first stage estimates the spectral interval, or band of interest (BOI), in which wheeze sounds are most likely to be found. In the second stage, a constrained tonal semi-supervised non-negative matrix factorization (NMF) approach is applied to obtain spectral patterns that model the periodic or tonal nature typically shown by wheeze sounds. The third stage analyzes the estimated wheezing spectrogram based on the smoothness of the spectral trajectories of the most significant energy previously factorized in the BOI. Our system has been evaluated and compared to other state-of-the-art methods, yielding competitive results in detecting the presence of wheezing in respiratory sounds.
A novel wheezing detection approach based on constrained non-negative matrix factorization
(Elsevier, 2019-05-01) Torre-Cruz, Juan; Cañadas-Quesada, Francisco Jesús; Carabias-Orti, Julio José; Vera-Candeas, Pedro; Ruiz-Reyes, Nicolás
Early wheezing detection is still a challenging task in biomedical signal processing because the presence of wheeze sounds often indicates respiratory diseases caused by airway obstructions. Currently, most initial clinical examinations to detect airway obstructions are carried out using auscultation. However, a high percentage of diagnoses are incorrect since they depend heavily on the physician’s training in wheezing detection, especially in noisy environments in which weak wheeze sounds can be masked by louder respiratory sounds. In this work, we propose a novel wheezing detection approach, based on Constrained Non-negative Matrix Factorization, that uses a two-stage cascade: separation and detection. The novelty of the separation stage is that it models wheeze and respiratory sounds as closely as possible to how they are observed in nature by incorporating constraints (sparseness and smoothness) into the NMF factorization. Once the estimated wheezing and respiratory signals are obtained from the separation stage, the detection stage uses the Kullback-Leibler divergence to discriminate between wheezing and respiratory areas. The experiments have been conducted using three different datasets composed of healthy or unhealthy patients. First, an optimization process is applied to obtain the optimal parameters of the separation stage. Then, the wheezing detection performance of the proposed method is evaluated against other state-of-the-art methods. Experimental results show that i) the proposed method outperforms recent state-of-the-art wheezing detection approaches, showing robust wheezing detection performance even in noisy environments, and ii) the proposal reliably detects healthy patients.
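As a rough illustration of the kind of factorization this abstract builds on, the sketch below implements plain, unconstrained NMF with the standard multiplicative updates for the generalized Kullback-Leibler divergence. It is not the authors' method: the sparseness and smoothness constraints and the detection stage are omitted, and the function names (`nmf_kl`, `kl_divergence`) and parameters are assumptions chosen for this example.

```python
import numpy as np

def nmf_kl(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Factorize a non-negative matrix V (freq x frames) as W @ H.

    Uses the classic multiplicative updates that minimize the
    generalized Kullback-Leibler divergence. No sparseness or
    smoothness constraints are applied (illustrative sketch only).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps    # spectral bases
    H = rng.random((rank, n_frames)) + eps  # temporal activations
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized KL divergence D(V || V_hat) summed over all entries."""
    V = V + eps
    V_hat = V_hat + eps
    return float(np.sum(V * np.log(V / V_hat) - V + V_hat))
```

In this sketch, `V` would typically be a magnitude spectrogram; running more iterations should lower `kl_divergence(V, W @ H)`.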
Combining a recursive approach via non-negative matrix factorization and Gini index sparsity to improve reliable detection of wheezing sounds
(Elsevier, 2020-06-01) De La Torre Cruz, Juan; Cañadas Quesada, Francisco Jesús; Carabias Orti, Julio José; Vera Candeas, Pedro; Ruiz Reyes, Nicolás
Auscultation is a fast, non-invasive and low-cost tool widely used to diagnose respiratory diseases in most health centres. However, the acoustic training and expertise acquired by the physician are still crucial to provide a reliable diagnosis of the status of the lung. Each wrong diagnosis increases the risk to the patient’s health and the costs associated with treating the disease detected. A wheezing detection system can help the physician minimize the subjectivity in interpreting breathing sounds, avoid misdiagnoses due to stress, and elucidate complex acoustic scenes (such as those with louder background noises). Notably, the presence of wheeze sounds is one of the main indicators of respiratory disorders caused by airway obstructions. This work presents an expert and intelligent system to detect wheeze sounds based on a recursive algorithm that combines orthogonal non-negative matrix factorization (ONMF) and the sparsity descriptor Gini index. The recursive algorithm is composed of four stages. The first stage uses ONMF modelling to factorize spectral bases that are as dissimilar as possible. The second stage clusters the ONMF bases into two categories: wheezing and normal breath. The third stage proposes a novel stopping criterion that controls the loss of wheezing spectral content at the expense of removing normal breath content in the recursive algorithm. Finally, the fourth stage determines the patient’s condition and, for unhealthy patients, locates the temporal intervals in which wheeze sounds are active. Experimental results report that the proposed method: (i) provides the best detection performance compared to recent state-of-the-art wheezing detection approaches, achieving the highest robustness in noisy environments; and (ii) reliably distinguishes the patient’s condition (healthy/unhealthy).
The strengths of the proposed method are the following: (i) its unsupervised nature, since it does not depend on any training stage to learn in advance the sounds of interest (wheezing), which could make the method attractive for clinical settings because wheezing sound databases are often unavailable; and (ii) the modelling of the spectral behaviour by means of a common feature, sparsity, that represents the typical energy distributions shown by most wheeze and normal breath sounds.
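The Gini index named in this abstract is a general sparsity measure. A minimal sketch of one common formulation (the sorted-coefficient form usually attributed to Hurley and Rickard) follows. How the authors apply it to ONMF bases is not stated here, so the function name `gini_index` and its use on a single spectral vector are assumptions for illustration.

```python
import numpy as np

def gini_index(x):
    """Gini sparsity index of a vector (assumed formulation).

    Returns 0 for a perfectly flat vector and approaches 1 as the
    energy concentrates in a single coefficient (1 - 1/N for one-hot).
    """
    c = np.sort(np.abs(np.asarray(x, dtype=float)).ravel())  # ascending
    n = c.size
    total = c.sum()
    if n == 0 or total == 0.0:
        return 0.0  # convention chosen for this sketch
    k = np.arange(1, n + 1)
    return float(1.0 - 2.0 * np.sum((c / total) * (n - k + 0.5) / n))
```

Under this definition, a narrow-band (tonal) spectral basis scores higher than a broadband one, which is the property a sparsity-based grouping of bases would rely on.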
Monitoring the internal quality of ornamental stone using impact-echo testing
(Elsevier, 2019-12-01) Montiel-Zafra, María Violeta; Cañadas-Quesada, Francisco Jesús; Campos-Suñol, María José; Vera-Candeas, Pedro; Ruiz-Reyes, Nicolás
The decay and durability of stone materials are a natural response to progressive adjustment to different, harsh environmental conditions. Stone building elements with no apparent sign of decay are often affected by loss of cohesion. Non-invasive, early and low-cost identification of the internal damage of stone materials would be a great step forward. This paper presents an impact-echo (IE) method to analyse the internal quality of ornamental stone. The proposed method attempts to estimate the P-wave velocity in the material by applying a frequency estimator that best explains the energy distribution of the possible modes of vibration in the captured IE signals. The velocity estimate is analysed over a set of freeze-thaw cycles in order to establish a correlation with the internal damage caused in the material, as confirmed by its porosity. Porosity was measured after several freeze-thaw cycles for each stone specimen. Experimental results show that the proposed method can be considered a valid and effective tool for determining the internal damage of ornamental stone materials. Moreover, the proposed method could be easily adapted to analyse specimens of different sizes, shapes and types of rock.
Multichannel Blind Sound Source Separation Using Spatial Covariance Model With Level and Time Differences and Nonnegative Matrix Factorization
(IEEE, 2018-04-27) Carabias-Orti, Julio José; Nikunen, Joonas; Virtanen, Tuomas; Vera-Candeas, Pedro
This paper presents an algorithm for multichannel sound source separation using explicit modeling of level and time differences in source spatial covariance matrices (SCM). We propose a novel SCM model in which the spatial properties are modeled by the weighted sum of direction of arrival (DOA) kernels. DOA kernels are obtained as the combination of phase and level difference covariance matrices representing both time and level differences between microphones for a grid of predefined source directions. The proposed SCM model is combined with the NMF model for the magnitude spectrograms. Unlike other SCM models in the literature, in this work source localization is implicitly defined in the model and estimated during the signal factorization. Therefore, no localization preprocessing is required. Parameters are estimated using complex-valued nonnegative matrix factorization with both Euclidean distance and Itakura-Saito divergence. Separation performance of the proposed system is evaluated using the two-channel SiSEC development dataset and four-channel signals recorded in a regular room with moderate reverberation. Finally, a comparison to other state-of-the-art methods is performed, showing better separation performance in terms of SIR and perceptual measures.
Multimodal speaker diarization for meetings using volume-evaluated SRP-PHAT and video analysis
(Springer, 2018-04-11) Cabañas-Molero, Pablo Antonio; Lucena, Manuel; Fuertes, José Manuel; Vera-Candeas, Pedro; Ruiz-Reyes, Nicolás
Speaker diarization is traditionally defined as the problem of determining “who speaks when” given an audio or video stream. This is an important task in many applications for meeting rooms, including automatic transcription of conversations, camera steering or content summarization. When the room is equipped with microphone arrays and cameras, speakers can be distinguished according to their location and the problem can be addressed through localization techniques. This article proposes a multimodal speaker diarization system for meeting environments based on a modified SRP-PHAT function evaluated on space volumes rather than discrete points. In our system, this function is used in combination with a circular array, enabling audio-based localization based on the selection of local maxima. Voicing detection is used to detect speech frames, whereas video analysis is introduced to aid in the decision when users move or simultaneously speak. The approach is evaluated on the well-known AMI dataset with approximately 100 hours of realistic meeting recordings and shows an average diarization error rate of 21% – 25%.
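SRP-PHAT localization is built from pairwise generalized cross-correlations with phase transform (GCC-PHAT). The sketch below estimates the delay between two microphone signals with GCC-PHAT only. It does not implement the volume-evaluated SRP-PHAT function or the video analysis described in the abstract, and the function name `gcc_phat_delay` and its parameters are assumptions made for this example.

```python
import numpy as np

def gcc_phat_delay(x, y, fs, max_delay=None):
    """Estimate the delay (in seconds) of y relative to x via GCC-PHAT.

    A positive result means y lags x. `max_delay` optionally limits the
    search range (in seconds), e.g. to the microphone spacing over the
    speed of sound.
    """
    n = len(x) + len(y)                 # zero-pad to avoid circular wrap
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12      # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_delay is not None:
        max_shift = max(1, min(int(fs * max_delay), max_shift))
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift
    return shift / fs
```

An SRP-style localizer would sum such PHAT-weighted correlations over all microphone pairs for each candidate source position and pick the maxima.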
The music demixing machine: toward real-time remixing of classical music
(Springer, 2023-04-06) Cabañas-Molero, Pablo Antonio; Muñoz-Montoro, Antonio Jesús; Vera-Candeas, Pedro; Ranilla, José
Classical music, unlike popular music, is usually recorded live with close microphone techniques. For this reason, isolated tracks are not available to create the final mixture/stream, and so the mixing process requires greater effort. Source separation methods are a potential solution to this problem. However, current algorithms are not fast enough to yield real-time separation in professional setups with dozens of microphones and sources. In this paper, we propose a fast approach consisting of a panning-based multichannel non-negative matrix factorization model to separate classical music. We tested the system on real professional recordings, where we were able to achieve real-time operation with very low latency and promising quality.

RUJA: Repositorio Institucional de Producción Científica

Browsing Department of Telecommunication Engineering by Subject "621.39"