Title
Técnicas de mineração incrementais em recuperação de informação [Incremental mining techniques in information retrieval]
Author(s)
Advisor(s)
Keywords
Data mining
Academic dissertations
Universidad de Salamanca (España)
Internet
Information retrieval
Bayesian statistical decision theory
UNESCO classification
1203.17 Computer science
Publication date
2010-02-24
Abstract
[EN] A desirable property of learning algorithms is the ability to incorporate new data incrementally. Incremental algorithms have received attention in the last few years, particularly for Bayesian networks, where the task is hard: a single example can change the whole structure of the network. In this thesis we focus on incremental induction of the Tree Augmented Naive Bayes (TAN) algorithm. An incremental version of TAN saves computing time and is better suited to data mining and concept drift. However, as usual in Bayesian learning, TAN is restricted to discrete attributes. To complement the incremental TAN, we propose an incremental discretization algorithm, which is needed to evaluate TAN in domains with continuous attributes. Discretization is a fundamental pre-processing step for some well-known algorithms, yet the topic of incremental discretization has received little attention from the community.
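For reference, batch TAN as introduced by Friedman, Geiger and Goldszmidt scores every pair of attributes by their conditional mutual information given the class, then keeps a maximum weighted spanning tree over those scores. The notation below (attributes $X_i$, $X_j$, class $C$) is an assumption made for illustration and is not taken from the thesis; how the incremental variant weights and combines these scores is not specified in this record.

```latex
I(X_i; X_j \mid C) \;=\; \sum_{x_i,\, x_j,\, c} P(x_i, x_j, c)\,
  \log \frac{P(x_i, x_j \mid c)}{P(x_i \mid c)\, P(x_j \mid c)}
```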
This thesis has two major contributions, both of which provide incremental learning: one for TAN and the other for discretization. We present and test an algorithm that rebuilds the network structure of Tree Augmented Naive Bayes (TAN) based on the weighted sum of vectors containing the mutual information. We also present a new discretization method that works in two layers. This two-stage architecture is very flexible: it can be used in supervised or unsupervised mode. For the second layer, any base discretization method can be used: equal width, equal frequency, recursive entropy discretization, chi-merge, etc. The most relevant aspect is that the boundaries of the second-layer intervals can change when new data becomes available. We tested the incremental approach to discretization experimentally with batch and incremental learners.
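The two-layer idea can be illustrated with a minimal Python sketch. It is not the thesis's algorithm. It assumes a fixed, known value range, an equal-width first layer that only keeps counts, and an unsupervised equal-frequency second layer. The class name `TwoLayerDiscretizer` and all its parameters are hypothetical.

```python
from bisect import bisect_right


class TwoLayerDiscretizer:
    """Hypothetical two-layer incremental discretizer (illustrative only)."""

    def __init__(self, low, high, n_fine=100, n_final=4):
        self.low = low
        self.n_fine = n_fine      # number of fine-grained layer-1 bins
        self.n_final = n_final    # number of final layer-2 intervals
        self.width = (high - low) / n_fine
        self.counts = [0] * n_fine  # layer 1: one counter per fine bin

    def update(self, x):
        """Layer 1: count one observation; out-of-range values are clamped."""
        i = int((x - self.low) / self.width)
        i = min(max(i, 0), self.n_fine - 1)
        self.counts[i] += 1

    def cut_points(self):
        """Layer 2: equal-frequency cut points built from the layer-1 counts."""
        total = sum(self.counts)
        if total == 0:
            return []
        cuts, cumulative, k = [], 0, 1
        for i, c in enumerate(self.counts):
            cumulative += c
            # Place a cut at the right edge of the fine bin in which the
            # cumulative count reaches the next quantile.
            while k < self.n_final and cumulative >= k * total / self.n_final:
                cut = self.low + (i + 1) * self.width
                if not cuts or cut > cuts[-1]:
                    cuts.append(cut)
                k += 1
        return cuts

    def transform(self, x):
        """Map a value to the index of its current layer-2 interval."""
        return bisect_right(self.cut_points(), x)
```

Because the second layer is recomputed from the counts, its boundaries move as data arrives: with `n_fine=10` and `n_final=2` on the range 0 to 10, feeding four values below 4 and then four values above 6 shifts the single cut point upward. Swapping `cut_points` for another base method (for example a supervised one using per-class counts) would leave the first layer unchanged.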
The experimental evaluation of incremental TAN shows performance similar to the batch version. Similar remarks apply to incremental discretization. This is relevant because few works in machine learning address incremental discretization.
We believe that with incremental discretization, the evaluation of incremental algorithms can become more realistic and accurate.
We evaluated two versions of incremental discretization: supervised and unsupervised. We have seen that this feature can improve accuracy for the incremental learners and that the prediction of future algorithm performance can be more precise. This discretization method has other advantages: it can be used with large data sets and in dynamic environments with concept drift, areas where batch discretization can be difficult or inadequate.
[ES] This thesis set out to study an incremental Bayesian network (TAN). In the course of the work, a gap was identified in the area: the lack of an incremental discretization for evaluating an incremental algorithm. The thesis therefore sought to contribute not only an incremental Bayesian classifier but also a correct way of evaluating that classifier.
Information Retrieval Systems aim to perform the tasks of indexing, searching, and classifying documents (expressed in textual form) in order to satisfy an individual's information need, generally expressed through queries. An information need can be understood as the search for answers to particular questions that have to be resolved, the retrieval of documents that deal with a given subject, or even the relationship between subjects.
URI
DOI
10.14201/gredos.76525
Appears in collections