Building the GoURMET Translation Tool for low-resource languages

By Susie Coleman

Susie is a senior software engineer for BBC News Labs. Her focus within the team is the GoURMET Project, which aims to use and improve neural machine translation for low-resource language pairs and domains. Prior to that she was a Software Developer at the Guardian on the Editorial Tools team.

Why are media organisations interested in Machine Translation?

Bringing technologies such as Machine Translation into the newsroom supports a news organisation’s ability to reach a worldwide audience: news and other information can be shared more effectively between languages, and journalists are freed from translation and re-versioning tasks. Combined with the increasing number of individuals accessing content online, this gives news organisations a much larger international reach. Translation technologies also support newsgathering by making it possible to gather news in languages which might otherwise be difficult or impossible to monitor.

The aim of the GoURMET project is to use and improve neural machine translation for low-resource language pairs and domains. A low-resource language is one for which only limited data exists for training and testing machine translation models. The two media partners on the GoURMET project, the BBC in the UK and Deutsche Welle (DW) in Germany, publish news content in 40 and 30 different languages, respectively, and gather news in over 100 languages. Many of these languages are low-resource.

Why build a Translation Tool?

In order to showcase the quality of the translation models produced, a team at the BBC has built a simple User Interface (UI) that incorporates the translation models developed by the university partners.

One of the BBC’s responsibilities is to build an API to provide a standard interface that exposes the translation models built by the partner universities. This API allows translation between the language pairs supported by the project.

While an API is useful when integrating the translation technology into software, it is not a very user-friendly way to demonstrate the work done by the GoURMET consortium members. As a result, the BBC has worked closely with a UX Designer to build a user-facing graphical interface, which uses the API to communicate with the translation models.

Translate for example between English and Gujarati – Screenshot of GoURMET Translate

Building this interface gives all members of the consortium an easy way to interact with all the available translation models, as well as to demo their work to others. Because the interface is web-based, no setup is required to perform a demo; all that is needed is a web browser. One of the goals when building this UI was to make it possible to demo the translation technology anywhere, so the team also ensured that the UI has a mobile-friendly view.

How does it work?

The UI was built using React for the front end and Node.js with the Express framework for the back end. Both the front end and the back end are written in TypeScript. TypeScript is a typed superset of JavaScript, so it allows the use of static types, which means bugs related to type errors can be caught before run-time. However, it does make it more challenging for JavaScript developers to start working with the code, as it requires an understanding of both JavaScript and TypeScript.
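To illustrate what catching type errors before run-time can look like, here is a minimal sketch. The TranslationRequest shape and the describeRequest function are hypothetical names invented for this example; they are not taken from the GoURMET code base or its API.

```typescript
// Hypothetical shape of a translation request. The field names are
// illustrative assumptions, not the actual GoURMET API schema.
interface TranslationRequest {
  source: string; // source language code, e.g. "en"
  target: string; // target language code, e.g. "gu"
  text: string;   // the text to translate
}

// Builds a short human-readable summary of a request.
function describeRequest(req: TranslationRequest): string {
  return `${req.source} -> ${req.target}: ${req.text.length} characters`;
}

const request: TranslationRequest = { source: "en", target: "gu", text: "Hello" };
console.log(describeRequest(request)); // prints "en -> gu: 5 characters"

// Uncommenting the call below makes compilation fail, because `text` is
// missing and `target` is a number rather than a string. In plain
// JavaScript the same mistake would only surface when the code runs.
// describeRequest({ source: "en", target: 42 });
```

The trade-off mentioned above is visible even in this small example: the interface declaration is extra syntax that a JavaScript developer has to learn before contributing.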

The code itself runs on AWS Infrastructure.

AWS Infrastructure for GoURMET Translate

The application runs on EC2 instances which sit within an autoscaling group. The autoscaling group allows the number of instances to be scaled up and down as required to deal with the amount of traffic received by the service. A load balancer distributes traffic across the available EC2 instances. The autoscaling group is also responsible for monitoring the EC2 instances and checking at regular intervals that they all pass a health check. If an instance fails its health check, the autoscaling group replaces that instance. All instances communicate with the translation API.
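The behaviour described above can be modelled in a few lines. This is only a toy simulation to make the idea concrete: in the real deployment AWS provides this behaviour, not application code. The Instance type, both function names, and the choice of round-robin distribution are assumptions made for the example.

```typescript
// Toy model of an instance as the autoscaling group might see it.
interface Instance {
  id: string;
  healthy: boolean;
}

// Replace every instance that fails its health check with a fresh one.
// `nextId` is a hypothetical helper that supplies ids for new instances.
function replaceUnhealthy(instances: Instance[], nextId: () => string): Instance[] {
  return instances.map((inst) =>
    inst.healthy ? inst : { id: nextId(), healthy: true }
  );
}

// Distribute requests across instances in turn (round-robin is assumed
// here; the article does not say which algorithm the load balancer uses).
function makeRoundRobin(instances: Instance[]): () => Instance {
  if (instances.length === 0) throw new Error("no instances available");
  let next = 0;
  return () => {
    const chosen = instances[next];
    next = (next + 1) % instances.length;
    return chosen;
  };
}

// Example: instance "i-1" fails its check and is replaced by "i-2".
let counter = 2;
const fleet = replaceUnhealthy(
  [{ id: "i-0", healthy: true }, { id: "i-1", healthy: false }],
  () => `i-${counter++}`
);
const pick = makeRoundRobin(fleet);
console.log(pick().id, pick().id, pick().id); // prints "i-0 i-2 i-0"
```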

What have we learned?

This UI has proved an incredibly valuable way to start conversations about the work the consortium is doing. It has also highlighted why building an API to access the translation models is so important. By providing a single platform, the API makes it easy to build multiple clients that use the underlying technology through a standard REST API, rather than having to integrate the translation models directly into each application.
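As a rough sketch of why a single API makes additional clients cheap to build, the example below wraps a translation call behind one small function that any client could reuse. Everything specific here is an assumption: the /translate path, the request and response fields, and the example.invalid base URL are placeholders, since the actual GoURMET API schema is not described in this article. A fake transport stands in for the HTTP call so the sketch runs offline.

```typescript
// Hypothetical request and response shapes (not the real GoURMET schema).
interface TranslateRequest {
  source: string;
  target: string;
  text: string;
}
interface TranslateResponse {
  translation: string;
}

// The transport hides how the HTTP request is made, so a web UI, a
// command-line tool, or a test can each supply their own implementation.
type Transport = (url: string, body: string) => Promise<string>;

// Creates a client bound to a base URL and a transport.
function makeClient(baseUrl: string, send: Transport) {
  return {
    async translate(req: TranslateRequest): Promise<string> {
      const raw = await send(`${baseUrl}/translate`, JSON.stringify(req));
      const parsed = JSON.parse(raw) as TranslateResponse;
      return parsed.translation;
    },
  };
}

// Stand-in transport that fakes the server by tagging the input text
// with the target language code. No real translation happens here.
const fakeTransport: Transport = async (_url, body) => {
  const req = JSON.parse(body) as TranslateRequest;
  return JSON.stringify({ translation: `[${req.target}] ${req.text}` });
};

const client = makeClient("https://api.example.invalid", fakeTransport);
client
  .translate({ source: "en", target: "gu", text: "Hello" })
  .then((result) => console.log(result)); // prints "[gu] Hello"
```

A client built this way never touches the translation models themselves; swapping the fake transport for a real HTTP call is the only change needed to talk to a live service.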

###

The idea behind the GoURMET Project

How can machine translation be built for languages without a wealth of training data? This is the challenge the GoURMET project is tackling. The BBC World Service and Deutsche Welle report in many languages for which gathering these training datasets is difficult, for example, Kyrgyz (BBC) and Bulgarian (DW).

GoURMET is an EU Horizon 2020 funded project with multiple academic and media partners around Europe.

Cover photo by mohamed hassan from PxHere