A cellular-based evolutionary approach for the extraction of emerging patterns in massive data streams
Date
2021
Publisher
Springer
Abstract
Today, the sheer number of existing devices continuously generates immense amounts of data that must be processed by new distributed data stream mining approaches. In this paper we present a new approach for extracting descriptive emerging patterns from massive data streams that arrive from different sources through Apache Kafka and Apache Spark Streaming. Its objective is to monitor the state of the system with respect to a variable of interest. The proposed algorithm is a cellular-based multi-objective evolutionary fuzzy system that uses an informed strategy for efficient data processing, together with a re-initialisation and filtering mechanism that eliminates redundant and low-reliability patterns. The experimental study shows a 25% improvement in interpretability when extracting high-interest knowledge with the proposed algorithm, which should make it easier for experts to analyse the problem. Finally, the proposed algorithm is up to five times faster than another proposal when processing the same amount of data. In this experimental study, up to 750,000 instances were processed in approximately four seconds.
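The abstract does not say how the interest of an emerging pattern is measured. As a rough illustration only, the sketch below uses the growth rate, a standard measure from the emerging-pattern literature: the ratio between a pattern's support in the class of interest and its support in the remaining data. It works on crisp attribute-value pairs over a single in-memory batch, so it is not the paper's fuzzy, evolutionary or distributed method. The names `support` and `growth_rate` and the dictionary-based instance format are assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical representation: an instance maps attribute names to crisp values.
Instance = Dict[str, str]


def support(pattern: Instance, instances: List[Instance]) -> float:
    """Fraction of instances that match every attribute-value pair in the pattern."""
    if not instances:
        return 0.0
    covered = sum(
        1
        for inst in instances
        if all(inst.get(attr) == value for attr, value in pattern.items())
    )
    return covered / len(instances)


def growth_rate(pattern: Instance, target: List[Instance], rest: List[Instance]) -> float:
    """Support in the target class divided by support in the remaining data.

    By convention the growth rate is infinite when the pattern covers target
    instances but none of the rest, and 0 when it covers nothing at all.
    """
    s_target = support(pattern, target)
    s_rest = support(pattern, rest)
    if s_rest == 0.0:
        return float("inf") if s_target > 0.0 else 0.0
    return s_target / s_rest
```

A pattern whose growth rate is well above 1 discriminates the class of interest from the rest. In a streaming setting such a measure would be recomputed on each incoming batch, but how the paper does so is not stated in this record.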
Keywords
Big data, Data stream mining, Evolutionary algorithms, Fuzzy logic, Emerging pattern mining