Topic-wise, Sentiment-wise, or Otherwise? Identifying the Hidden Dimension for Unsupervised Text Classification
Sajib Dasgupta and Vincent Ng.
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pp. 580-589, 2009.
Abstract
While traditional work on text clustering
has largely focused on grouping documents by topic, it
is conceivable that a user may want to cluster documents along other
dimensions, such as the author's mood, gender, age, or sentiment.
Without knowing the user's intention, a clustering algorithm will only
group documents along the most prominent dimension, which may not be
the one the user desires.
To address this problem, we propose
a novel way of incorporating user feedback into a clustering algorithm,
which allows a user to easily specify the dimension along which she
wants the data points to be clustered by inspecting only a small number of
words.
This distinguishes our method from existing ones, which typically require
a large amount of effort on the part of humans in the form of document
annotation or interactive construction of the feature space.
We demonstrate the viability of our method
on several challenging sentiment datasets.
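The abstract does not say how candidate dimensions are produced or how words are chosen for the user to inspect. The sketch below is only a hedged illustration of the interaction it describes: each candidate two-way partition of the documents is summarized by a few words, and the user picks the partition whose words match the dimension she cares about. The candidate partitions are supplied by hand, and the names `top_words` and `simulated_user_choice`, together with the document-frequency scoring, are hypothetical choices for illustration. They are not the authors' algorithm.

```python
from collections import Counter


def top_words(docs, labels, k=3):
    """Hypothetical helper: summarize a two-way partition (labels 0/1) by the
    k words whose document frequency is most over-represented on each side."""
    counts = {0: Counter(), 1: Counter()}
    sizes = {0: 0, 1: 0}
    for doc, label in zip(docs, labels):
        # Count each word at most once per document (document frequency).
        counts[label].update(set(doc.lower().split()))
        sizes[label] += 1
    if sizes[0] == 0 or sizes[1] == 0:
        raise ValueError("both sides of the partition must be non-empty")
    summary = {}
    for side in (0, 1):
        other = 1 - side

        def score(word):
            # Fraction of this side's documents containing the word minus
            # the same fraction on the other side.
            return counts[side][word] / sizes[side] - counts[other][word] / sizes[other]

        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(counts[side], key=lambda w: (-score(w), w))
        summary[side] = ranked[:k]
    return summary


def simulated_user_choice(summaries, cue_words):
    """Hypothetical stand-in for the human: return the index of the candidate
    partition whose displayed words overlap most with the user's cue words."""
    def overlap(summary):
        shown = set(summary[0]) | set(summary[1])
        return len(shown & cue_words)
    return max(range(len(summaries)), key=lambda i: overlap(summaries[i]))


if __name__ == "__main__":
    docs = [
        "great plot loved the acting",
        "terrible plot hated the acting",
        "great battery loved the screen",
        "terrible battery hated the screen",
    ]
    # Two hand-made candidate partitions: one by topic, one by sentiment.
    candidates = {"topic": [0, 0, 1, 1], "sentiment": [0, 1, 0, 1]}
    names = list(candidates)
    summaries = [top_words(docs, candidates[n], k=2) for n in names]
    for name, summary in zip(names, summaries):
        print(name, summary)
    choice = simulated_user_choice(summaries, {"great", "terrible", "loved", "hated"})
    print("user picks:", names[choice])
```

On this toy data the topic partition is summarized by "acting, plot" versus "battery, screen", while the sentiment partition is summarized by "great, loved" versus "hated, terrible", so a user who wants sentiment clusters can tell the two apart from four words each without labeling any document.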
BibTeX entry
@InProceedings{Dasgupta+Ng:09b,
author = {Sajib Dasgupta and Vincent Ng},
title = {Topic-wise, Sentiment-wise, or Otherwise? Identifying the Hidden Dimension for Unsupervised Text Classification},
booktitle = {Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing},
pages = {580--589},
year = 2009
}