Unsupervised Morphological Segmentation

This page is the distribution site for "Morpheme++", a language-independent morphological word segmentation system. Given a list of words in a particular language, our system can morphologically segment each word in the list without requiring any prior segmentation samples, language-specific segmentation rules, or morpheme dictionaries (e.g., prefix and suffix dictionaries).

As input, it requires:
(1) a list of words in a particular language, i.e., the vocabulary; and
(2) optionally, the frequency of each word in a corpus.
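The sketch below shows one way to prepare the two inputs: a vocabulary and per-word corpus frequencies built from raw text. It rests on assumptions. The tokenization rule (a word is a lowercased run of letters) and the tab-separated `word<TAB>frequency` layout are hypothetical choices made here, not the input format Morpheme++ documents, so check the guidelines for the format the software actually reads. The function names `build_vocabulary` and `format_word_list` are also illustrative.

```python
import re
from collections import Counter


def build_vocabulary(text: str) -> Counter:
    """Count lowercased word tokens in a raw corpus string."""
    # Assumption: a "word" is a maximal run of Unicode letters
    # ([^\W\d_] excludes digits and underscores). Adjust for your script.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(tokens)


def format_word_list(counts: Counter) -> list[str]:
    """Render one 'word<TAB>frequency' line per word, most frequent first."""
    # Hypothetical layout: not necessarily what Morpheme++ expects.
    # Ties are broken alphabetically so the output is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{word}\t{freq}" for word, freq in ordered]


if __name__ == "__main__":
    corpus = "The friend was friendly. Friendliness is rare; unfriendliness is not."
    for line in format_word_list(build_vocabulary(corpus)):
        print(line)
```

Because frequencies are optional, a bare vocabulary is just the keys of the counter, one word per line.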

As output, it produces the following:
(1) a morphological segmentation of each word in the vocabulary, e.g., unfriendliness=un+friend+ly+ness;
(2) automatically learned morpheme dictionaries for the language, i.e., a prefix list, a suffix list, and a root list;
(3) [a feature *not* available in existing software such as Morfessor and Linguistica] automatically learned allomorphic character-change rules, i.e., a single-character replacement, addition, or deletion at a segmentation boundary.
Example of a learned rule: "y:i<=> _ +:0 able", which denotes that the character y changes to i at the boundary when the right context is "able".
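To make the output format and the boundary rules concrete, the following sketch parses a segmentation line in the `word=m1+m2+...` form shown above and rejoins the morphemes, applying single-character replacement rules at each boundary. Assumptions: the rule representation (a dictionary from a (boundary character, right-context morpheme) pair to a replacement string) and the helper names `parse_segmentation` and `join_with_rules` are hypothetical, not Morpheme++'s internal format. Only the y-to-i rule before "able" comes from this page. The y-to-i rule before "ness" is invented here so that the unfriendliness example round-trips. An empty replacement string models deletion; additions are not covered.

```python
def parse_segmentation(line: str) -> tuple[str, list[str]]:
    """Split an output line of the form 'word=m1+m2+...' into (word, morphemes)."""
    word, _, analysis = line.strip().partition("=")
    return word, analysis.split("+")


def join_with_rules(morphemes: list[str], rules: dict[tuple[str, str], str]) -> str:
    """Concatenate morphemes, rewriting the last character before a boundary.

    Hypothetical encoding: rules[(char, right_morpheme)] = replacement, so
    ('y', 'able') -> 'i' mirrors the rule "y:i<=> _ +:0 able".
    A replacement of '' deletes the boundary character.
    """
    surface = morphemes[0]
    for nxt in morphemes[1:]:
        last = surface[-1:]
        if (last, nxt) in rules:
            # Apply the boundary change to the left-hand side.
            surface = surface[:-1] + rules[(last, nxt)]
        surface += nxt
    return surface


if __name__ == "__main__":
    # First rule is the documented example; the second is assumed for illustration.
    RULES = {("y", "able"): "i", ("y", "ness"): "i"}
    word, morphemes = parse_segmentation("unfriendliness=un+friend+ly+ness")
    print(word, morphemes, join_with_rules(morphemes, RULES) == word)
    print(join_with_rules(["rely", "able"], RULES))  # reliable
```

Under these rules, rejoining ["rely", "able"] yields "reliable", and rejoining the parsed morphemes of unfriendliness reproduces the surface word. This shows why the segmentation lists the underlying morpheme "ly" rather than the surface string "li".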

The software is free to use and distribute for non-commercial purposes. When referring to the software, please cite the following paper:

High-Performance, Language-Independent Morphological Segmentation.
Sajib Dasgupta and Vincent Ng.
In Proceedings of the NAACL-HLT Conference, New York, 2007.


Download Morpheme++ (last updated August 27, 2014)

Guidelines for running the software:


[Don't forget to add the list of vowels for your language; see the guidelines.]