Conference: International Conference on Asian Language Processing (IALP 2017)
Date: December 05, 2017
Venue: Singapore, National University of Technology
Organizer: COLIPS
Presented paper: Analyzing word embeddings and improving POS tagger of Tigrinya

Abstract—In this paper, we analyze word embeddings for
a morphologically rich language, Tigrinya. Tigrinya is a
Semitic language spoken natively in Eritrea and Ethiopia
by over seven million people. The unique and complex
morphology of Semitic languages, which include Arabic,
Amharic, and Hebrew, is commonly known as ‘root and
template pattern’ morphology. This morphology generates
a large number of inflected forms that often cause
out-of-vocabulary (OOV) challenges in language processing.
This problem is more acute for low-resource languages,
such as Tigrinya, that have very few annotated
resources. Word embedding methods, given a large raw text
corpus, learn vector representations that capture semantic and
syntactic properties of words. Therefore, we construct a new text corpus and
investigate the optimal settings for generating word vectors
for Tigrinya. We also utilize word embeddings to improve
the performance of a Tigrinya part-of-speech tagger created
from a small tagged corpus.

Keywords-Tigrinya language; word embeddings;
part-of-speech tagging; skip-gram; continuous bag of words
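
The two embedding architectures named in the keywords differ mainly in how training examples are drawn from raw text. The following is a minimal sketch of that difference, not the authors' implementation: the function names, the window size, and the placeholder tokens (`w1` to `w4`) are all assumptions made for illustration, and no real Tigrinya data is used.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Build (center, context) pairs.

    Skip-gram trains a model to predict each context word from the center word.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Build (context_words, center) examples.

    CBOW trains a model to predict the center word from its surrounding words.
    """
    examples = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:  # skip single-token sentences, which have no context
            examples.append((context, center))
    return examples


# Placeholder tokens standing in for one tokenized sentence (hypothetical data).
sentence = ["w1", "w2", "w3", "w4"]
sg = skipgram_pairs(sentence, window=1)
cb = cbow_examples(sentence, window=1)
```

With a window of 1, skip-gram yields one training pair per neighbouring word, while CBOW yields one example per position with all neighbours grouped together. In a morphologically rich language, each inflected form is a separate token in either scheme, which is why a large raw corpus matters.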

presentation slides