Google Document Translation Now Generally Available

Google Cloud recently announced the general availability of Document Translation, a new feature of Translation API Advanced that preserves a document's formatting throughout the translation process.

Until now, translating a document required separating the text from its layout attributes, so the document's structure was either lost or had to be recreated after translation. Sarah Weldon, product manager at Google, explains:

One of the biggest differentiators for Translation API Advanced’s document translation capabilities is the ability to do real-time, synchronous processing for a single file. For example, if you are translating a business document such as HR documentation, online translation provides flexibility for smaller files and provides faster results (...) Meanwhile, batch translation allows customers to translate multiple files into multiple languages in a single request.

The new service lets customers translate documents in more than 100 languages and supports formats such as DOCX, PPTX, XLSX, and PDF while preserving document formatting. The GA release adds right-to-left language support for PDFs; preserves font size, font color, font style, and hyperlinks in native PDFs; and introduces configurable endpoints so that machine translation processing can be kept within the European Union.
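To make the synchronous, single-file mode more concrete, here is a minimal Python sketch that only assembles the URL and JSON body for such a request; it does not send anything or handle authentication. The helper name `build_translate_document_request` is hypothetical, and the default host, the `v3` path, and the camelCase field names reflect our reading of the Cloud Translation REST API and should be checked against the official reference. A regional host could be passed through the `host` parameter.

```python
import base64

# MIME types for the document formats named in the article.
MIME_TYPES = {
    "docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "pdf": "application/pdf",
}


def build_translate_document_request(project_id, location, filename, content,
                                     target_language, source_language=None,
                                     host="translation.googleapis.com"):
    """Return (url, body) for a synchronous single-file translation call.

    Hypothetical helper: the endpoint path and field names are assumptions
    based on the v3 REST API, not taken from the article.
    """
    extension = filename.rsplit(".", 1)[-1].lower()
    if extension not in MIME_TYPES:
        raise ValueError(f"unsupported document format: {extension}")

    url = (f"https://{host}/v3/projects/{project_id}"
           f"/locations/{location}:translateDocument")
    body = {
        "targetLanguageCode": target_language,
        "documentInputConfig": {
            "mimeType": MIME_TYPES[extension],
            # Inline document bytes are sent base64-encoded in JSON.
            "content": base64.b64encode(content).decode("ascii"),
        },
    }
    # The source language is optional; include it only when the caller knows it.
    if source_language:
        body["sourceLanguageCode"] = source_language
    return url, body
```

For example, `build_translate_document_request("my-project", "global", "handbook.docx", data, "ja")` returns a URL ending in `:translateDocument` and a body whose `documentInputConfig.mimeType` is the DOCX MIME type. The project ID and file name here are placeholders.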

To improve the accuracy of the results, Document Translation now supports four different translation approaches: customers can rely on Google's state-of-the-art (SOTA) translation models, import glossaries that define preferred translations for specific terms and phrases, choose a pre-trained model, or build custom translation models with AutoML.
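The four approaches above can be thought of as optional fields on a translation request. The following Python sketch shows one way to express that choice; the function `translation_options` is hypothetical, and the `model` and `glossaryConfig` field names and resource path layouts are assumptions based on our understanding of the v3 API rather than details given in the article.

```python
def translation_options(project_id, location, glossary_id=None, model_id=None):
    """Build the optional request fields that select a model and/or glossary.

    Hypothetical helper. With no arguments beyond project and location it
    returns an empty dict, which leaves the service on its default model.
    """
    parent = f"projects/{project_id}/locations/{location}"
    options = {}
    if model_id is not None:
        # model_id may name a pre-trained model or a custom AutoML model.
        options["model"] = f"{parent}/models/{model_id}"
    if glossary_id is not None:
        # A glossary pins preferred translations for specific terms.
        options["glossaryConfig"] = {
            "glossary": f"{parent}/glossaries/{glossary_id}"
        }
    return options
```

Under these assumptions, `translation_options("my-project", "us-central1", glossary_id="hr-terms")` yields a `glossaryConfig` entry only, while passing both arguments combines a custom model with a glossary. The identifiers `my-project` and `hr-terms` are placeholders.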

In a separate article, Tristan Li, customer engineer at Google, and Wayne Davis, customer engineering manager at Google, highlight best practices for translating websites with the Translation API. Google is not the only cloud provider offering an API for document translation. As recently reported on InfoQ, Microsoft Translator now supports over 100 languages and dialects, covering languages natively spoken by 72% of the world population. AWS offers Amazon Translate to localize websites and applications or to translate large volumes of text for analysis.

Rafael Quevedo questions the accuracy of the new API:

The cloud projects are at mercy of the diversity team that designed them. Google Translator can claim that it can translate languages from all types using the existing literature, but can it deal with old style TV phrases? Or slang?

Cloud Translation charges customers by the amount of text processed, starting at 20 USD per million characters. Additional charges apply to the Advanced API calls detectLanguage, translateText, batchTranslateText, translateDocument, and batchTranslateDocument. For example, translateDocument costs 0.08 USD per page processed.
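As a rough illustration of these list prices, the Python sketch below estimates costs from the two figures quoted above. The function names are hypothetical, and the calculation deliberately ignores any free tier, volume discounts, or other pricing details not mentioned in the article.

```python
from decimal import Decimal

# Rates quoted in the article; treat them as a snapshot, not current pricing.
TEXT_USD_PER_MILLION_CHARS = Decimal("20")
DOCUMENT_USD_PER_PAGE = Decimal("0.08")


def text_cost(characters):
    """Estimated cost in USD for translating plain text."""
    if characters < 0:
        raise ValueError("characters must be non-negative")
    return TEXT_USD_PER_MILLION_CHARS * characters / Decimal(1_000_000)


def document_cost(pages):
    """Estimated cost in USD for translating a document page by page."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return DOCUMENT_USD_PER_PAGE * pages
```

At these rates, a 25-page document would cost about 2 USD to translate, and 500,000 characters of plain text about 10 USD.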