Urdu News Clustering Using K-Mean Algorithm On The Basis Of Jaccard Coefficient And Dice Coefficient Similarity

Rahman, Zahid; Hussain, Altaf; Shah, Hussain; Arshad, Muhammad

dc.contributor.author	Rahman, Zahid
dc.contributor.author	Hussain, Altaf
dc.contributor.author	Shah, Hussain
dc.contributor.author	Arshad, Muhammad
dc.date.accessioned	2022-02-24T11:19:39Z
dc.date.available	2022-02-24T11:19:39Z
dc.date.issued	2022-02-08
dc.identifier.citation	ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal, 10 (2021)
dc.identifier.issn	2255-2863
dc.identifier.uri	http://hdl.handle.net/10366/148641
dc.description.abstract	Clustering is an unsupervised machine learning process that groups data objects into clusters such that objects within the same cluster are highly similar to one another. The quantity of Urdu text on the internet is increasing rapidly every day. Grouping Urdu news manually is almost impossible, so there is a pressing need to devise a mechanism that clusters Urdu news documents based on their similarity. Clustering Urdu news documents accurately is an open research issue, which can be addressed by using similarity measures, namely the Jaccard and Dice coefficients, together with the k-mean clustering algorithm. In this research, the Jaccard and Dice coefficients were used to compute the similarity scores of Urdu news documents in the Python programming language. For clustering, the similarity results were loaded into the Waikato Environment for Knowledge Analysis (WEKA), where the k-mean algorithm grouped the Urdu news documents into five clusters. The resulting clusters were evaluated in terms of Accuracy and Mean Square Error (MSE). The Accuracy and MSE of the Jaccard coefficient were 85% and 44.4%, while those of the Dice coefficient were 87% and 35.76%. The experimental results show that the Dice coefficient outperforms Jaccard similarity on both Accuracy and MSE.
dc.format.mimetype	application/pdf
dc.language.iso	eng
dc.publisher	Ediciones Universidad de Salamanca (España)
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	Urdu News
dc.subject	Clustering Mechanism
dc.subject	Jaccard Coefficient
dc.subject	Dice coefficient
dc.subject	Python
dc.subject	WEKA
dc.subject	K-mean
dc.subject	MSE
dc.title	Urdu News Clustering Using K-Mean Algorithm On The Basis Of Jaccard Coefficient And Dice Coefficient Similarity
dc.type	info:eu-repo/semantics/article
dc.rights.accessRights	info:eu-repo/semantics/openAccess
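
The abstract names the Jaccard and Dice coefficients as the similarity measures computed in Python. As a rough illustration only, the sketch below applies the standard set-based definitions of both coefficients to two token sets. The whitespace tokenizer, the function names, and the placeholder tokens are assumptions made for this example; the record does not describe the authors' actual preprocessing or implementation.

```python
def tokenize(text):
    """Split on whitespace and return the distinct tokens (an assumed, simplistic tokenizer)."""
    return set(text.split())


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B|; taken as 0.0 here when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """Dice coefficient 2|A & B| / (|A| + |B|); taken as 0.0 here when both sets are empty."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


# Placeholder documents (hypothetical tokens, not taken from the paper's corpus).
doc_a = tokenize("w1 w2 w3 w4")
doc_b = tokenize("w3 w4 w5")

print(jaccard(doc_a, doc_b))  # 2 shared tokens / 5 distinct tokens
print(dice(doc_a, doc_b))     # 2 * 2 shared tokens / (4 + 3) tokens
```

For the same pair of sets the two scores are related by Dice = 2J / (1 + J), where J is the Jaccard score, so Dice is never lower than Jaccard.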

Files in this item

Name: Urdu_News_Clustering_Using_K-M ...
Size: 1.868Mb
Format: PDF

This item appears in:

ADCAIJ, Vol.10, n.4 [9]
