Conference: International Conference on Asian Language Processing (IALP 2017)

Date: December 05, 2017

Venue: Singapore, National University of Technology

Organizer: COLIPS

Presented paper: Analyzing word embeddings and improving POS tagger of Tigrinya

Abstract: In this paper, we analyze word embeddings for a morphologically rich language, Tigrinya. Tigrinya is a Semitic language spoken natively in Eritrea and Ethiopia by over seven million people. The unique and complex morphology of Semitic languages, which include Arabic, Amharic, and Hebrew, is commonly known as ‘root and template pattern’ morphology. This morphology generates a large number of inflected forms that often cause out-of-vocabulary (OOV) problems in language processing. The problem is more severe for low-resource languages such as Tigrinya, which have very few annotated resources. Word embedding methods, given a large raw text corpus, form semantic and syntactic vector representations of words. Therefore, we construct a new text corpus and investigate the optimal settings for generating word vectors for Tigrinya. We also use word embeddings to improve the performance of a Tigrinya part-of-speech tagger created from a small tagged corpus.

Keywords: Tigrinya language; word embeddings; part-of-speech tagging; skip-gram; continuous bag of words; presentation slides