Data & Model Releases

As part of GoURMET’s efforts to increase the resources and tools available for low-resource machine translation, we have released many of the corpora and software created during the project.


English–Swahili parallel corpus
English–Turkish parallel corpus
English–Amharic parallel corpus and Amharic monolingual corpus
English–Kyrgyz parallel corpus and Kyrgyz monolingual corpus
Kyrgyz–Russian parallel corpus
PMIndia – Parallel corpus of languages of India


Morphological segmentation using Apertium resources
LASER train (language-agnostic sentence embeddings)
Top-level-domain crawler