Unbabel Releases Award-Winning Open Source Quality Estimation System

Tue, 02/26/2019 - 06:34

The AI- and human-powered scaleup delivers ‘OpenKiwi’ to the Global Machine Translation Community

Lisbon, Portugal; San Francisco, USA: 26th February 2019. Unbabel, the leading AI + human translation scaleup, today announces the release of the open source edition of its award-winning quality estimation and automatic post-editing suite to the global technology community.

Since 2016, Unbabel’s AI team has been focused on advancing the state of the art in Quality Estimation (QE). An active part of the QE research community, Unbabel’s teams have participated in, won, and co-organized various QE shared tasks at the Conference on Machine Translation (WMT), and last year organized the first workshop on QE and Automatic Post-Editing at AMTA to discuss the future of the field.

Strong believers in making AI research reproducible, Unbabel decided to make its QE system available to external researchers in the form of OpenKiwi, a PyTorch-based open-source framework that implements the best Quality Estimation systems from the WMT 2015-18 shared tasks, with added improvements.

The AI-human framework will be used as a baseline system, enabling businesses to provide fast, accurate translations at the scale of machine translation.

“Quality estimation has already proven itself in terms of reducing the time and costs associated with post-editing, and we want to share this toolset with the rest of the world so that teams can contribute to the global development of QE,” says Unbabel co-founder and CTO João Graça. “We’re excited to be examining the new issues presented to QE and automatic post-editing by neural machine translation, and we look forward to feedback from the global QE community.”

OpenKiwi is implemented in Python using PyTorch as its deep learning framework, and has a user-friendly API that can be imported as a package in other projects or run from the command line. With this release, teams taking part in the shared tasks of WMT19, the Fourth Conference on Machine Translation, can use OpenKiwi to examine automatic methods for estimating the quality of machine translation output at run time. The tasks cover estimation at various levels and study the performance of quality estimation approaches on the output of neural machine translation systems.

“Over the last decade, artificial intelligence applications such as machine translation have helped break down language barriers, both for consumers but also for enterprises,” says Christian Federmann, Senior Data Scientist, Microsoft Translator and Research Director, Association for Machine Translation in the Americas (AMTA). “Faced with an increasing amount of machine translated content, there is a growing need for quality estimation to identify which content may be ready for publication, and which may still need human refinements. This process is at the core of Unbabel’s business and has resulted in the creation of OpenKiwi which will now be released publicly, under an open-source license. This release will benefit both machine translation researchers and translation business alike, enabling them to integrate more machine translation into their workflows, at a higher quality, further expanding personal and professional communication capabilities.”
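The triage described above, deciding which machine-translated content is ready to publish and which still needs human refinement, can be illustrated with a minimal Python sketch. It does not use OpenKiwi's API. The `ScoredTranslation` class, the `route_by_quality` function, the 0.2 threshold, and the scores themselves are all hypothetical, and the score is assumed to be an HTER-style estimate (the predicted fraction of words a post-editor would need to change, so lower is better).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ScoredTranslation:
    """A machine translation paired with a hypothetical QE score."""
    source: str
    translation: str
    predicted_hter: float  # assumed: 0.0 = no edits needed, 1.0 = rewrite everything


def route_by_quality(
    items: List[ScoredTranslation], max_hter: float = 0.2
) -> Tuple[List[ScoredTranslation], List[ScoredTranslation]]:
    """Split translations into (ready to publish, needs human review).

    The 0.2 default is an illustrative threshold, not a recommended value.
    """
    ready: List[ScoredTranslation] = []
    needs_review: List[ScoredTranslation] = []
    for item in items:
        if item.predicted_hter <= max_hter:
            ready.append(item)
        else:
            needs_review.append(item)
    return ready, needs_review


# Made-up scores standing in for the output of a QE model.
batch = [
    ScoredTranslation("Olá, mundo!", "Hello, world!", 0.05),
    ScoredTranslation("Obrigado pela sua paciência.", "Thanks for your patient.", 0.40),
    ScoredTranslation("Até logo.", "See you later.", 0.10),
]

ready, needs_review = route_by_quality(batch)
print("Publish:", [t.translation for t in ready])
print("Send to editors:", [t.translation for t in needs_review])
```

In a real workflow the scores would come from a trained QE model rather than being written by hand, and the threshold would be tuned per language pair and content type.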

OpenKiwi is an open source project hosted on GitHub: https://github.com/Unbabel/OpenKiwi.

OpenKiwi features:

· Implementations of state-of-the-art QE systems that won WMT shared tasks in 2016-2018.

· Implemented in Python using PyTorch as the deep learning framework.

· Easy-to-use API: can be imported as a package in other projects or run from the command line.

· Ability to train new QE models on new data.

· Ability to use pre-trained QE models on data from the WMT 2018 campaign.

· Easy to track and reproduce experiments via YAML configuration files.

· Open-source license.