Data & Model Releases
As part of GoURMET’s efforts to increase resources and tools available for low-resource machine translation, we have released many of the corpora and software created during the project.
English–Swahili parallel corpus
English–Turkish parallel corpus
English–Amharic parallel corpus and Amharic monolingual corpus
English–Kyrgyz parallel corpus and Kyrgyz monolingual corpus
Kyrgyz–Russian parallel corpus
PMIndia – Parallel corpus of languages of India
Morphological segmentation using Apertium resources
LASER train (language-agnostic sentence embeddings)
Direct Assessment – Sentence Pairs Evaluation Tool
The goal of Direct Assessment is to evaluate a translation model by asking a human to compare the quality of a machine-translated sentence with that of a human-translated sentence, where the human translation is assumed to be the gold standard.
Gap Fill Evaluation Tool
The goal of Gap Fill is to evaluate a translation model by asking a human to fill in the gaps in a human-translated sentence, using the machine translation of the same sentence as a guide to which words belong in those gaps.
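The gap-fill idea can be illustrated with a minimal sketch. This is not the released tool's implementation: the function names `make_gaps` and `score_fills`, the random choice of gap positions, and the case-insensitive exact-match scoring are all assumptions made for illustration.

```python
import random


def make_gaps(reference, n_gaps, seed=0, gap_token="____"):
    """Blank out up to n_gaps randomly chosen words of a human translation.

    Returns the gapped sentence and the removed words in sentence order.
    Random gap selection is an assumption; a real tool may pick gaps differently.
    """
    tokens = reference.split()
    rng = random.Random(seed)
    n = min(n_gaps, len(tokens))
    positions = sorted(rng.sample(range(len(tokens)), n))
    answers = [tokens[i] for i in positions]
    gapped = list(tokens)
    for i in positions:
        gapped[i] = gap_token
    return " ".join(gapped), answers


def score_fills(answers, fills):
    """Fraction of gaps the annotator filled with the removed word.

    Matching is case-insensitive exact match (an assumption); missing
    fills count as wrong because the denominator is the number of gaps.
    """
    if not answers:
        return 0.0
    correct = sum(a.lower() == f.strip().lower() for a, f in zip(answers, fills))
    return correct / len(answers)
```

In use, the annotator would see the gapped human translation alongside the machine translation of the same sentence, and the fraction of correctly filled gaps would serve as a rough signal of how much useful information the machine translation conveyed.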