W-kmeans: Clustering News Articles using WordNet

Title: W-kmeans: Clustering News Articles using WordNet
Publication Type: Conference Paper
Year of Publication: 2010
Authors: Bouras, C.; Tsogkas, V.
Conference Name: Advanced Knowledge-based Systems, Invited Session of the 14th International Conference on Knowledge-based and Intelligent Information & Engineering Systems, Cardiff, Wales, UK
Date Published: September 8-10
Abstract

Document clustering is a powerful technique that has been widely
used for organizing data into smaller, more manageable information kernels.
Several approaches have been proposed; however, they suffer from problems such
as synonymy, ambiguity, and the lack of descriptive labels for the generated
clusters. We propose enhancing the standard kmeans algorithm with external
knowledge from WordNet hypernyms in a twofold manner: enriching the “bag of
words” used prior to the clustering process and assisting the label generation
procedure that follows it. Our experimentation
revealed a significant improvement over standard kmeans for a corpus of news
articles derived from major news portals. Moreover, the cluster labeling process
generates useful, high-quality cluster tags.
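
The twofold use of hypernyms described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the hypernym table `TOY_HYPERNYMS` is a hypothetical stand-in for real WordNet lookups, the helper names (`enrich`, `kmeans`, `label_cluster`) are invented for illustration, and the choices of adding one hypernym per word with weight 1 and labeling a cluster by its most frequent hypernym are assumptions, since the abstract does not specify them.

```python
from collections import Counter

# Hypothetical stand-in for WordNet: maps a word to a single hypernym.
# A real system would query WordNet instead of this toy table.
TOY_HYPERNYMS = {
    "striker": "athlete",
    "goalkeeper": "athlete",
    "senator": "politician",
    "minister": "politician",
}

def enrich(tokens):
    """Build a bag of words and add one hypernym count per known token."""
    bag = Counter(tokens)
    for tok in tokens:
        hyper = TOY_HYPERNYMS.get(tok)
        if hyper is not None:
            bag[hyper] += 1
    return bag

def to_vector(bag, vocab):
    """Turn a bag of words into a term-count vector over a fixed vocabulary."""
    return [float(bag.get(term, 0)) for term in vocab]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, init_centroids, iterations=10):
    """Plain kmeans with caller-supplied initial centroids (deterministic)."""
    centroids = [list(c) for c in init_centroids]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda k: sq_dist(v, centroids[k]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

def label_cluster(bags):
    """Label a cluster with the most frequent hypernym among its documents."""
    hypernyms = set(TOY_HYPERNYMS.values())
    counts = Counter()
    for bag in bags:
        for term, n in bag.items():
            if term in hypernyms:
                counts[term] += n
    return counts.most_common(1)[0][0] if counts else None

docs = [
    ["striker", "scores", "goal"],
    ["goalkeeper", "saves", "penalty"],
    ["senator", "proposes", "bill"],
    ["minister", "signs", "law"],
]
bags = [enrich(d) for d in docs]                # step 1: enrich before clustering
vocab = sorted(set().union(*bags))
vectors = [to_vector(b, vocab) for b in bags]
assignment = kmeans(vectors, init_centroids=[vectors[0], vectors[2]])
labels = {                                      # step 2: label after clustering
    k: label_cluster([b for b, a in zip(bags, assignment) if a == k])
    for k in set(assignment)
}
print(assignment, labels)
```

In this toy corpus the two sports documents share no surface words, so only the added hypernym "athlete" brings them closer to each other than to the politics documents; the same hypernyms then serve as the cluster labels.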