A data mining framework based on boundary-points for gene selection from DNA-microarrays: Pancreatic Ductal Adenocarcinoma as a case study

Ramos González, Juan; Castellanos Garzón, José Antonio; De Paz, Juan F.; Corchado Rodríguez, Juan Manuel

doi:10.1016/j.engappai.2018.01.007

Title

A data mining framework based on boundary-points for gene selection from DNA-microarrays: Pancreatic Ductal Adenocarcinoma as a case study

Author(s)

Ramos González, Juan

Castellanos Garzón, José Antonio

De Paz, Juan F.

Corchado Rodríguez, Juan Manuel

Keywords

Feature selection

Gene selection

Data mining

Cluster analysis

Evolutionary computation

Boundary point

Visual analytics

Filter method

Boundary gene

UNESCO classification

1203.17 Informatics

Publication date

2018-01-20

Citation

Ramos, J., Castellanos Garzón, J., de Paz, J. and Corchado, J., 2018. A data mining framework based on boundary-points for gene selection from DNA-microarrays: Pancreatic Ductal Adenocarcinoma as a case study. Engineering Applications of Artificial Intelligence, 70, pp.92-108. https://doi.org/10.1016/j.engappai.2018.01.007

Abstract

[EN] Gene selection (or feature selection) from DNA-microarray data can be approached with different techniques, which generally involve statistical tests, data mining and machine learning. In recent years there has been increasing interest in combining several techniques into hybrid approaches to tackle the problem of meaningful gene selection; nevertheless, this issue remains a challenge. To address it, this paper proposes a novel hybrid framework based on data mining techniques and tuned to select gene subsets that are meaningfully related to the target disease studied in DNA-microarray experiments. For this purpose, the framework combines approaches such as statistical significance tests, cluster analysis, evolutionary computation, visual analytics and boundary points. The latter is the core technique of our proposal, allowing the framework to define two methods of gene selection. Another novelty of this work is the inclusion of the age of patients as an additional factor in our analysis, which can lead to more insight into the disease. The results reached in this research have been very promising and have shown their biological validity. Hence, our proposal has resulted in a methodology that can be followed in the gene selection process from DNA-microarray data.
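As a rough illustration of the "statistical significance test" and "filter method" ingredients named in the abstract, the sketch below ranks genes by the absolute value of Welch's t-statistic between two sample classes and keeps the top-ranked ones. It is a generic filter, not the boundary-point method of the paper, which this record does not describe in detail. The function names (`welch_t`, `rank_genes`), the 0/1 label encoding and the toy expression values are all assumptions made up for the example.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (unequal variances)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_genes(expression, labels, top_k):
    """Return the top_k gene names ranked by |t| between the two classes.

    expression: dict mapping gene name -> list of expression values, one per sample
    labels:     list of 0/1 class labels, one per sample (hypothetical encoding)
    """
    scores = {}
    for gene, values in expression.items():
        group0 = [v for v, y in zip(values, labels) if y == 0]
        group1 = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = abs(welch_t(group0, group1))
    # Highest |t| first; ties keep insertion order because sorted() is stable.
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy data, invented for illustration only (not taken from the paper).
toy_expression = {
    "geneA": [1.0, 1.2, 0.9, 5.1, 4.8, 5.3],  # strongly separated classes
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # no class difference
    "geneC": [3.0, 3.5, 2.5, 4.0, 4.5, 3.5],  # moderate class difference
}
toy_labels = [0, 0, 0, 1, 1, 1]

selected = rank_genes(toy_expression, toy_labels, top_k=2)
print(selected)
```

A filter of this kind scores each gene independently of any classifier, which is why it is usually applied as a first reduction step before more expensive techniques such as clustering or evolutionary search.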

URI

https://hdl.handle.net/10366/145834

ISSN

0952-1976

DOI

10.1016/j.engappai.2018.01.007

Appears in collections