The following projects cover technologies closely related to those of GoURMET.

SUMMA – The SUMMA project is an EU H2020 project which ran from 2016 to early 2019. It was carried out by a large consortium of nine partners, three of whom are also partners in GoURMET: the University of Edinburgh, the BBC and Deutsche Welle. The aim of SUMMA was to significantly improve media monitoring by creating a platform that automates the analysis of media streams across many languages, aggregates and distils their content, automatically builds rich knowledge bases, and provides visualisations to help users cope with this deluge of data.

ParaCrawl – The ParaCrawl project (2017–2019) is a European Connecting Europe Facility (CEF) project which creates and releases large parallel corpora to and from English for all official EU languages through a broad web-crawling effort. Two GoURMET members, the University of Edinburgh and Universitat d’Alacant, are partners in ParaCrawl. The project applies state-of-the-art methods across the entire processing chain, from identifying websites with translated text to collecting, cleaning and delivering parallel corpora that are ready to serve as training data for the Automated Translation component of the CEF and as translation memories for the Directorate-General for Translation (DG Translation) of the European Commission.

Alto – The Automated Language Tool (Alto) is prototype software developed by BBC News Labs to support content reversioning within the BBC World Service. Alto reversions video content originally created in English into multiple languages. The tool can either display the translated output as subtitles overlaid on the original video (which is especially popular for videos published on social media and likely to be watched on a mobile device in a public or noisy environment) or re-voice the audio track using text-to-speech synthesis.

SCRIPT – The Speech Synthesis for Spoken Content Production (SCRIPT) project is a three-year, EPSRC-funded research project that specifically seeks to develop synthetic voices for low-resourced languages. Its primary research step is to combine the broadcast-quality, ‘natural’-sounding voice recordings produced through unit selection with deep neural network technology that enables parametric changes to tone, pitch and speed.

MeMaT – Medical Machine Translation (MeMaT) is a UK EPSRC-funded project (2017–2018), coordinated by the University of Edinburgh as part of the Global Challenges Research Fund. MeMaT’s goal was to improve machine translation quality for the low-resource languages isiXhosa and isiZulu. The research was carried out in collaboration with partners at the University of Cape Town and the Guguletu Health Clinic. The project served as an initial pilot study of translation into low-resource languages and of transfer learning across related languages.

news.bridge – news.bridge is a 2018–2019 project funded by Google’s Digital News Initiative (DNI) on automated subtitling, in particular for public broadcasters. It makes audiovisual content available in virtually any language through automated transcription and translation using off-the-shelf tools, enhanced with post-editing, and also provides voice-over for some target languages. The project, coordinated by Deutsche Welle, focuses on combining efficient media production and editorial workflows with AI. The tool is currently in beta testing at Deutsche Welle.

Related H2020 ICT-29 Projects

ELG – European Language Grid

Pret-a-LLOD – Ready-to-use Multilingual Linked Language Data for Knowledge Services across Sectors

Bergamot – Browser-based Multilingual Translation

EMBEDDIA – Cross-Lingual Embeddings for Less-Represented Languages in European News Media

ELITR – European Live Translator

AI4EU – A European AI On Demand Platform and Ecosystem

COMPRISE – Cost-effective, Multilingual, Privacy-driven voice-enabled Services