Towards Subjectifying Text Clustering

Sajib Dasgupta and Vincent Ng.
Proceedings of the 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 483-490, 2010.

Abstract

Although it is common practice to produce only a single clustering of a dataset, in many cases text documents can be clustered along different dimensions. Unfortunately, not only do traditional text clustering algorithms fail to produce multiple clusterings of a dataset, the only clustering they produce may not be the one that the user desires. In this paper, we propose a simple active clustering algorithm that is capable of producing multiple clusterings of the same data according to user interest. In comparison to previous work on feedback-oriented clustering, the amount of user feedback required by our algorithm is minimal. In fact, the feedback turns out to be as simple as a cursory look at a list of words. Experimental results are very promising: our system is able to generate clusterings along the user-specified dimensions with reasonable accuracies on several challenging text classification tasks, thus providing suggestive evidence that our approach is viable.

BibTeX entry

@InProceedings{Dasgupta+Ng:10a,
  author = {Sajib Dasgupta and Vincent Ng},
  title = {Towards Subjectifying Text Clustering},
  booktitle = {Proceedings of the 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval},
  pages = {483--490},
  year = 2010
}