Topic-wise, Sentiment-wise, or Otherwise? Identifying the Hidden Dimension for Unsupervised Text Classification

Sajib Dasgupta and Vincent Ng.
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pp. 580-589, 2009.

Abstract

While traditional work on text clustering has largely focused on grouping documents by topic, it is conceivable that a user may want to cluster documents along other dimensions, such as the author's mood, gender, age, or sentiment. Without knowing the user's intention, a clustering algorithm will only group documents along the most prominent dimension, which may not be the one the user desires. To address this problem, we propose a novel way of incorporating user feedback into a clustering algorithm, which allows a user to easily specify the dimension along which she wants the data points to be clustered via inspecting only a small number of words. This distinguishes our method from existing ones, which typically require a large amount of effort on the part of humans in the form of document annotation or interactive construction of the feature space. We demonstrate the viability of our method on several challenging sentiment datasets.

BibTeX entry

@InProceedings{Dasgupta+Ng:09b,
  author = {Sajib Dasgupta and Vincent Ng},
  title = {Topic-wise, Sentiment-wise, or Otherwise? Identifying the Hidden Dimension for Unsupervised Text Classification},
  booktitle = {Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing},
  pages = {580--589},
  year = 2009
}