Towards Subjectifying Text Clustering
Sajib Dasgupta and Vincent Ng.
Proceedings of the 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 483-490, 2010.
Abstract
Although it is common practice to produce only a single clustering of a dataset,
in many cases text documents can be clustered along different dimensions.
Unfortunately, not only do traditional text clustering algorithms fail to produce multiple clusterings of a dataset,
but the single clustering they do produce may not be the one that the user desires.
In this paper, we propose a simple active clustering algorithm that is capable of producing multiple clusterings of the same data
according to user interest. In comparison to previous work on feedback-oriented clustering,
the amount of user feedback required by our algorithm is minimal.
In fact, the feedback turns out to be as simple as a cursory look at a list of words.
Experimental results are very promising: our system is able to generate
clusterings along the user-specified dimensions with reasonable accuracies
on several challenging text classification tasks,
thus providing suggestive evidence that our approach is viable.
BibTeX entry
@InProceedings{Dasgupta+Ng:10a,
author = {Sajib Dasgupta and Vincent Ng},
title = {Towards Subjectifying Text Clustering},
booktitle = {Proceedings of the 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval},
pages = {483--490},
year = 2010
}