A Comparative Performance Study of Feature Selection Methods for the Anti-spam Filtering Domain
Publication date
Springer Science + Business Media
Advances in Data Mining. Applications in Medicine, Web Mining, Marketing, Image and Signal Mining. Lecture Notes in Computer Science. Volume 4065, pp. 106-120.
In this paper we analyse the strengths and weaknesses of the most widely used feature selection methods in text categorization when they are applied to the spam problem domain. Several experiments with different feature selection methods and content-based filtering techniques are carried out and discussed. The Information Gain, χ²-test, Mutual Information and Document Frequency feature selection methods have been analysed in conjunction with Naïve Bayes, boosting trees, Support Vector Machines and ECUE models in different scenarios. From the experiments carried out, the underlying ideas behind feature selection methods are identified and applied to improve the feature selection process of SpamHunting, a novel anti-spam filtering software able to accurately classify suspicious e-mails.
978-3-540-36036-0 (Print) / 978-3-540-36037-7 (Online)
0302-9743 (Print) / 1611-3349 (Online)
- BISITE. Conferences