Data & Model Releases

As part of GoURMET’s efforts to increase the resources and tools available for low-resource machine translation, we have released many of the corpora and software tools created during the project.


English–Swahili parallel corpus
English–Turkish parallel corpus
English–Amharic parallel corpus and Amharic monolingual corpus
English–Kyrgyz parallel corpus and Kyrgyz monolingual corpus
Kyrgyz–Russian parallel corpus
PMIndia – Parallel corpus of languages of India


Morphological segmentation using Apertium resources
LASER train (language-agnostic sentence embeddings)
Top-level-domain crawler

Evaluation Tools

Direct Assessment Sentence Pairs Evaluation Tool

The goal of Direct Assessment is to evaluate a translation model by asking a human to rate the quality of a machine-translated sentence against a human translation of the same sentence, which is treated as the gold standard.
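Once judgements have been collected, they have to be aggregated into a score per translation system. The snippet below is a minimal sketch of one way to do that, not the code of the GoURMET tool itself. It assumes each judgement is stored as an (annotator, system, score) triple with the score on a 0–100 scale, and it applies per-annotator z-score normalization, a common practice in Direct Assessment campaigns for offsetting annotators who consistently rate high or low. The function name `system_scores` is hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def system_scores(ratings):
    """Aggregate Direct Assessment judgements per system.

    ratings: iterable of (annotator_id, system_id, score) tuples, with score
    assumed to lie on a 0-100 scale (an illustrative assumption).
    Returns {system_id: (mean_raw_score, mean_z_score)}.
    """
    ratings = list(ratings)

    # Collect every score each annotator gave, across all systems.
    by_annotator = defaultdict(list)
    for annotator, _system, score in ratings:
        by_annotator[annotator].append(score)

    # Per-annotator mean and standard deviation; fall back to 1.0 when an
    # annotator gave identical scores, to avoid dividing by zero.
    stats = {}
    for annotator, scores in by_annotator.items():
        sd = pstdev(scores)
        stats[annotator] = (mean(scores), sd if sd > 0 else 1.0)

    # Group raw and standardized scores by system.
    raw = defaultdict(list)
    standardized = defaultdict(list)
    for annotator, system, score in ratings:
        mu, sd = stats[annotator]
        raw[system].append(score)
        standardized[system].append((score - mu) / sd)

    return {s: (mean(raw[s]), mean(standardized[s])) for s in raw}


ratings = [
    ("a1", "sysA", 80), ("a1", "sysB", 60),
    ("a2", "sysA", 50), ("a2", "sysB", 30),
]
print(system_scores(ratings))
```

Here annotator `a2` is harsher overall than `a1`, so the raw means (65 vs 45) mix system quality with annotator strictness, while the z-scores show both annotators placing `sysA` one standard deviation above their own average.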

Gap Fill Evaluation Tool

The goal of Gap Fill is to evaluate a translation model by asking a human to fill in the gaps in a human-translated sentence, using the machine translation of the same sentence as a guide to which words belong in each gap.
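A gap-fill item can be built and scored roughly as follows. This is a minimal sketch under stated assumptions, not the tool's actual logic: gaps are chosen by word position in the human reference, and a fill counts as correct only if it matches the removed word after lower-casing and stripping surrounding punctuation. The real tool may select gaps and score answers differently; `make_gapped`, `gap_fill_accuracy`, and the `____` gap marker are hypothetical names used for illustration.

```python
import re

GAP = "____"  # hypothetical placeholder shown to the annotator


def make_gapped(reference, gap_indices):
    """Blank out the whitespace-separated tokens at gap_indices.

    Returns the gapped sentence and the list of removed words (answer key),
    ordered by position in the sentence.
    """
    tokens = reference.split()
    answers = []
    for i in sorted(gap_indices):
        answers.append(tokens[i])
        tokens[i] = GAP
    return " ".join(tokens), answers


def _normalize(word):
    # Lower-case and strip leading/trailing punctuation so "Monday." matches "monday".
    return re.sub(r"^\W+|\W+$", "", word.lower())


def gap_fill_accuracy(answers, fills):
    """Fraction of gaps whose fill matches the removed word."""
    if not answers or len(answers) != len(fills):
        raise ValueError("expected at least one gap and exactly one fill per gap")
    correct = sum(_normalize(a) == _normalize(f) for a, f in zip(answers, fills))
    return correct / len(answers)


gapped, answers = make_gapped("The minister visited Nairobi on Monday.", {1, 3})
print(gapped)  # The ____ visited ____ on Monday.
print(gap_fill_accuracy(answers, ["Minister", "Mombasa"]))  # 0.5
```

If the same gapped sentences are shown with machine translations from different systems as the guide, a higher average accuracy would suggest that the guiding translation conveys more of the sentence's meaning.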
