Title
Tokenising, Stemming and Stopword Removal on Anti-spam Filtering Domain
Author(s)
Keywords
Computer Science
Publication date
2006
Publisher
Springer Science + Business Media
Citation
Current Topics in Artificial Intelligence. 11th Conference of the Spanish Association for Artificial Intelligence, CAEPIA 2005, Santiago de Compostela, Spain, November 16-18, 2005, Revised Selected Papers. Lecture Notes in Computer Science, Volume 4177, pp. 449-458.
Abstract
Junk e-mail detection and filtering can be considered a cost-sensitive classification problem. However, the preprocessing methods and noise reduction strategies used to improve computational efficiency in text classification are not necessarily as effective in e-mail filtering. This paper demonstrates this through a comparative study of stopword removal, stemming and different tokenising schemes. The goal is to preprocess the training e-mail corpora of several content-based spam filtering techniques (machine learning approaches and case-based systems). Sound conclusions are drawn from experiments that consider several different scenarios.
URI
ISBN
978-3-540-45914-9 (Print) / 978-3-540-45915-6 (Online)
ISSN
0302-9743 (Print) / 1611-3349 (Online)
Appears in collections
- BISITE. Conferences