Facebook says the new multilingual machine translation model was created to help its more than two billion users worldwide. The company is still testing the translation system, which it calls M2M-100, and hopes to add it to different products in the future.
The social media service says it has made the system open source, meaning its computer code is freely available for others to copy or change.
Angela Fan, a research assistant at Facebook, explained the new machine translation model this week on one of the company's websites. She said the model marked a "milestone" after years of "foundational work in machine translation."
Fan said the model produces better results than other machine learning systems that depend on English to help in the translation process. Those systems use English as an intermediate step, like a bridge, to translate between two non-English languages.
One example would be a translation from Chinese to French. Fan noted that many machine translation models begin by translating from Chinese to English first, and then from English to French. This is done "because English training data is the most widely available," she said. But such a method can lead to mistakes in translation.
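The toy example below is a minimal sketch of why an English "bridge" can cause mistakes. It assumes simple word-for-word dictionaries (hypothetical, and far simpler than the neural models Facebook describes): Chinese has separate words for a financial bank (银行) and a riverbank (河岸), but both become "bank" in English, so the second step cannot tell them apart. A direct Chinese-to-French mapping keeps the distinction.

```python
# Hypothetical word-level "translators", used only to illustrate the idea.
# Real systems translate whole sentences with trained neural models.
ZH_TO_EN = {"银行": "bank", "河岸": "bank"}    # two Chinese words collapse to one English word
EN_TO_FR = {"bank": "banque"}                  # English "bank" maps to the financial sense
ZH_TO_FR = {"银行": "banque", "河岸": "rive"}  # direct pairs keep the two meanings apart


def pivot_translate(word: str) -> str:
    """Chinese -> English -> French: two steps through an English bridge."""
    return EN_TO_FR[ZH_TO_EN[word]]


def direct_translate(word: str) -> str:
    """Chinese -> French in one step, as a directly trained model would."""
    return ZH_TO_FR[word]


for zh in ["银行", "河岸"]:
    print(zh, "pivot:", pivot_translate(zh), "direct:", direct_translate(zh))
```

For the riverbank word, the pivot route produces "banque" while the direct route produces "rive". Information lost in the middle language cannot be recovered in the final step.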
"Our model directly trains on Chinese to French data to better preserve meaning," Fan said. Facebook said the system outperformed English-centered systems on BLEU, a widely used metric for measuring the quality of machine translations.
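Machine translation quality is most commonly measured with BLEU, the metric Facebook cited for M2M-100. BLEU compares a system's output with a human reference translation by counting shared word sequences (n-grams). The sketch below is a simplified, single-reference, sentence-level version written for illustration, assuming whitespace tokenization and no smoothing. It is not the exact scoring setup Facebook used; published results usually rely on corpus-level BLEU from tools such as sacreBLEU.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Simplified BLEU: one reference, whitespace tokens, no smoothing."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped matches: a candidate n-gram counts at most as often as it
        # appears in the reference, so repeating a word earns no extra credit.
        matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if matches == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precisions.append(math.log(matches / total))
    # Brevity penalty: candidates shorter than the reference are penalized.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


print(sentence_bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
print(sentence_bleu("the cat sat on a mat", "the cat sat on the mat"))    # ≈ 0.537
print(sentence_bleu("le chat", "the cat sat on the mat"))                 # 0.0
```

Scores run from 0 to 1 (often reported multiplied by 100). Because BLEU only rewards word overlap with a reference, it is a rough guide to quality rather than a full judgment of meaning.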
Facebook says about two-thirds of its users communicate in a language other than English. The company already performs an average of 20 billion translations every day on Facebook's News Feed. But it faces a huge challenge: users publish massive amounts of content in more than 160 languages.
The development team trained the new model on a data set of 7.5 billion sentence pairs covering 100 languages. In total, the system was trained on 2,200 language directions, which Facebook said is 10 times as many as the best earlier multilingual models.
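A "language direction" is an ordered pair of languages, so Chinese-to-French and French-to-Chinese count separately. The short sketch below, using hypothetical placeholder language codes, counts how many directions exist among 100 languages and how many of them a system centered on English could cover directly. It shows only the arithmetic; it does not describe which 2,200 directions Facebook actually chose.

```python
from itertools import permutations


def all_directions(languages: list[str]) -> list[tuple[str, str]]:
    """Every ordered (source, target) pair: n * (n - 1) directions."""
    return list(permutations(languages, 2))


def english_centric_directions(languages: list[str], hub: str = "en") -> list[tuple[str, str]]:
    """Only the directions that go into or out of the hub language: 2 * (n - 1)."""
    return [(src, tgt) for src, tgt in all_directions(languages) if hub in (src, tgt)]


# 100 languages: English plus 99 hypothetical placeholder codes.
langs = ["en"] + [f"lang{i}" for i in range(1, 100)]

print(len(all_directions(langs)))              # 9900
print(len(english_centric_directions(langs)))  # 198
```

With 100 languages there are 9,900 possible directions, so 2,200 trained directions is a subset of them, while a system that always passes through English has only 198 direct directions to learn.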
One difficulty the team faced was trying to develop an effective machine translation system for language combinations that are not widely used. Facebook calls these "low-resource languages." The data used to create the new model was collected from content available on the internet. But there is limited internet data on low-resource languages.
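To deal with this problem, Facebook said it used a method called back-translation. The method starts with text written only in the target language and uses an existing model to translate it back into the source language. Each machine-made sentence is then paired with the original text, creating "synthetic translations" that increase the amount of data used to train on low-resource languages.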
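Back-translation can be sketched in a few lines. This is a minimal illustration, not Facebook's pipeline: it assumes a reverse model that translates from the target language back into the source language, here replaced by a hypothetical toy dictionary (`toy_reverse_model`). Chinese and French are used only for readability. The synthetic pairs are added to the real ones, and the target side of every pair is still genuine human-written text.

```python
from typing import Callable


def back_translate(
    target_sentences: list[str],
    reverse_model: Callable[[str], str],
) -> list[tuple[str, str]]:
    """Build synthetic (source, target) training pairs from target-only text.

    reverse_model translates target -> source. Its output may be imperfect,
    but the target side of each pair is real, human-written text.
    """
    return [(reverse_model(sentence), sentence) for sentence in target_sentences]


# Hypothetical stand-in for a trained French -> Chinese model.
TOY_FR_TO_ZH = {"bonjour": "你好", "merci": "谢谢"}


def toy_reverse_model(sentence: str) -> str:
    # Unknown input gets a placeholder instead of a translation.
    return TOY_FR_TO_ZH.get(sentence, "<unk>")


real_pairs = [("再见", "au revoir")]  # the small amount of genuine parallel data
french_only = ["bonjour", "merci"]    # monolingual text in the target language

training_data = real_pairs + back_translate(french_only, toy_reverse_model)
print(training_data)
```

The combined list can then be used to train the forward (Chinese to French) model, which now sees more examples than the real parallel data alone would provide.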
For now, the company says, it plans to continue exploring new language research methods while working to improve the new model. No date has been set for launching the translation system on Facebook.
But Angela Fan said the new system marks an important step for Facebook, especially during the COVID-19 pandemic. "Breaking language barriers through machine language translation is one of the most important ways to bring people together, provide authoritative information on COVID-19, and keep them safe from harmful content," she said.