Google Translate


Google Translate is a multilingual machine translation service, developed and provided by Google, that translates text, speech, images, or video in real time from one language to another. It offers a web interface, mobile apps for iOS and Android, and an API that developers can use to build browser extensions, apps, and other software. Google Translate supports 133 languages at varying levels of quality. The service is free and is used daily by more than 200 million people.

In November 2016, Google incorporated its neural machine translation system into Google Translate. According to the company, the system improves translations by analyzing the composition of entire sentences and taking a range of factors into account. The system learns over time from user queries, which improves the quality of its translations. At the time, Google had applied this system to English, French, German, Portuguese, Spanish, Chinese, Japanese, and Turkish.

Since December 2016, Google has limited free text translation to 5,000 characters, while web page translation has no length limit.
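Because of the 5,000-character limit on free text translation, a longer document has to be pasted in pieces. A minimal Python sketch of one way to prepare those pieces, assuming that whitespace is an acceptable place to break the text (the function name `split_for_translation` is hypothetical and not part of any Google tool):

```python
def split_for_translation(text: str, limit: int = 5000) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Breaks at whitespace where possible. Runs of whitespace, including
    line breaks, are collapsed to single spaces. A single word longer
    than `limit` is hard-split so that no chunk exceeds the limit.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks: list[str] = []
    current = ""
    for word in text.split():
        # Hard-split words that cannot fit in any chunk on their own.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            # The current chunk is full; start a new one with this word.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


pieces = split_for_translation("hello world " * 1000)
print(len(pieces), max(len(p) for p in pieces))
```

Because the sketch collapses line breaks, a version that needs to preserve paragraphs would first split on blank lines and then apply the same procedure to each paragraph.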


Characteristics of Google Translate

Web interface

For some languages, Google Translate can pronounce the translated text, highlight corresponding words and phrases in the source and target text, and act as a simple dictionary when a single word is entered. If “Detect language” or “Text in an unknown language” is selected as the source, the system can automatically identify the language of the input.

In the web interface, users can suggest alternative translations, for example for technical terms, or correct errors; these suggestions can be included in future updates to the translation process. If a user enters a URL in the source text, Google Translate produces a hyperlink to a machine translation of the website. For some languages, text can be entered using an on-screen keyboard, a handwriting tablet with handwriting recognition, or a microphone with speech recognition.

Browser integration

Google Translate is available in some browsers as an extension that translates the text of the web pages the user visits.

Several Firefox extensions provide access to Google Translate, letting users invoke the translation service directly from the browser. There were also several Google Gadgets that used Google Translate to integrate with iGoogle and other websites.

An extension is also available for Google’s Chrome browser, and in February 2010 Google Translate was integrated into the standard version of Google Chrome, which can automatically translate the web page being viewed.

Mobile device interface of Google Translate

The app supports more than 100 languages and can translate text from photos in 50 languages, speech in conversation mode in 43 languages, and live camera video in augmented reality mode in 27 languages.

Conversation Mode is a Google Translate interface that allows users to communicate fluently with a person in another language. The interface is available for some languages only.

The ‘input from camera’ feature lets users take a picture of a document or sign; Google Translate recognizes the text in the image using optical character recognition (OCR) and translates it into the selected target language. Camera input is not available for all languages.

The app can also translate text in real time through the mobile device’s camera, using the “Snapshot” option. The speed and quality of this real-time (augmented reality) video translation feature were further improved by using convolutional neural networks.

Android version

Google Translate is available as a free app for the Android operating system. It works much like the browser version and contains two main options: “SMS Translation” and “History”.

The app supports more than 130 languages, and voice input can process 15 languages. It is available for devices running Android 2.1 and above and can be downloaded by searching for “Google Translate” on Google Play. The app can translate text in supported languages simply by pointing the device’s camera at it, and it also offers a conversation mode that uses Google’s voice commands and cloud storage to translate a dialogue between two people who speak different languages.

iOS version

There is an HTML5 Google Translate web app for iPhone, iPad, and iPod Touch users. The current Google Translate app runs on iPhone, iPad, and iPod Touch with iOS 15.0 or later. It accepts voice input in 15 languages and can translate a word or phrase into any of the available languages. It can also read the translated text aloud in 100 different languages.

API

In May 2011, Google announced that the Google Translate API for software developers had been deprecated and would stop working on December 1, 2011, citing the high operating costs caused by extensive abuse of the API. Because the API was used on numerous third-party websites, the decision led some developers to criticize Google and question the viability of relying on Google’s APIs in their products. In response to public pressure, Google announced in June 2011 that the API would remain available as a paid service.

Translation Methodology

Google Translate does not apply grammatical rules: its algorithms are based on statistical analysis rather than on traditional rule-based analysis. The system’s original creator, Franz Josef Och, has criticized the effectiveness of rule-based algorithms and highlighted the better performance of statistical approaches.

Google Translate was originally based on a method called statistical machine translation, specifically on research by Och with which he won the DARPA contest for speed machine translation in 2003. Och led Google’s machine translation group until he left the company to join Human Longevity in July 2014.
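Statistical machine translation is commonly formulated as a noisy-channel model: for a source sentence f, choose the target sentence e that maximizes P(f | e) · P(e), where P(f | e) is a translation model and P(e) is a language model of the target language. The Python sketch below applies that textbook rule to three candidate English translations of the French phrase “la maison bleue”; the probabilities are invented for illustration, and the sketch does not describe Google’s production system.

```python
import math

# Hypothetical toy probabilities, invented for illustration only.
# Translation model P(f | e): how well each English candidate e explains
# the French source f = "la maison bleue".
TRANSLATION_MODEL = {
    "the blue house": 0.30,
    "the house blue": 0.35,
    "blue the house": 0.20,
}
# Language model P(e): how fluent each candidate is as English.
LANGUAGE_MODEL = {
    "the blue house": 0.010,
    "the house blue": 0.001,
    "blue the house": 0.0005,
}


def noisy_channel_best(candidates: list[str]) -> str:
    """Return the candidate e that maximizes log P(f|e) + log P(e)."""
    def score(e: str) -> float:
        # Adding logs is equivalent to multiplying the probabilities.
        return math.log(TRANSLATION_MODEL[e]) + math.log(LANGUAGE_MODEL[e])
    return max(candidates, key=score)


print(noisy_channel_best(list(TRANSLATION_MODEL)))  # the blue house
```

The word-for-word order “the house blue” scores slightly higher under the translation model alone, but the language model’s preference for fluent English decides the result. Working with log probabilities is a standard precaution, since products of many small probabilities underflow.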

In its inner workings, Google Translate does not necessarily translate directly from one language to another (L1 → L2). Instead, it often translates first from the source language into English and then from English into the target language (L1 → EN → L2). However, because English, like all human languages, is ambiguous and context-dependent, this method can cause translation errors. For example, translating vous from French to Russian gives vous → you → ты or Вы/вы.[31] If Google used an unambiguous artificial language as the intermediary, it would be vous → you → Вы/вы or tu → thou → ты.
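The vous example can be reproduced with toy dictionaries. In this Python sketch, the dictionaries and the helper `translate_via_pivot` are hypothetical and hand-built for illustration: routing through English merges tu and vous into “you”, while an unambiguous intermediary that keeps “thou” and “you” apart preserves the distinction. For simplicity, the formal and plural Вы/вы are both written as lowercase вы.

```python
def translate_via_pivot(word, source_to_pivot, pivot_to_target):
    """Return every target word reachable from `word` through the pivot."""
    results = set()
    for pivot_word in source_to_pivot.get(word, set()):
        results |= pivot_to_target.get(pivot_word, set())
    return results


# Pivoting through English: both French pronouns become "you".
FR_TO_EN = {"tu": {"you"}, "vous": {"you"}}
EN_TO_RU = {"you": {"ты", "вы"}}

# A hypothetical unambiguous intermediary that keeps the two forms apart.
FR_TO_PIVOT = {"tu": {"thou"}, "vous": {"you"}}
PIVOT_TO_RU = {"thou": {"ты"}, "you": {"вы"}}

print(sorted(translate_via_pivot("vous", FR_TO_EN, EN_TO_RU)))        # ['вы', 'ты']
print(sorted(translate_via_pivot("vous", FR_TO_PIVOT, PIVOT_TO_RU)))  # ['вы']
print(sorted(translate_via_pivot("tu", FR_TO_PIVOT, PIVOT_TO_RU)))    # ['ты']
```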

When Google Translate generates a translation, it looks for patterns in hundreds of millions of documents to help decide on the best translation. By detecting patterns in documents that have already been translated by humans, the system makes informed guesses about which translation is most appropriate.
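Learning from human translations can be illustrated in its simplest form with relative-frequency estimation: count how often each source phrase is paired with each target phrase in a parallel corpus, and prefer the most frequent pairing. In the Python sketch below, the phrase pairs are invented and assumed to be already aligned, and the function names are hypothetical; production systems extract and weight phrase pairs in more elaborate ways.

```python
from collections import Counter, defaultdict

# Hypothetical English-Spanish phrase pairs taken from human-translated text.
ALIGNED_PAIRS = [
    ("bank", "banco"), ("bank", "banco"), ("bank", "orilla"),
    ("river bank", "orilla del río"), ("bank", "banco"),
]


def estimate_phrase_table(pairs: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Estimate P(target | source) as count(source, target) / count(source)."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    return {
        source: {target: n / sum(c.values()) for target, n in c.items()}
        for source, c in counts.items()
    }


def best_translation(table: dict[str, dict[str, float]], source: str) -> str:
    """Return the most probable target phrase for `source`."""
    options = table[source]
    return max(options, key=options.get)


table = estimate_phrase_table(ALIGNED_PAIRS)
print(table["bank"])                    # {'banco': 0.75, 'orilla': 0.25}
print(best_translation(table, "bank"))  # banco
```

Note that the longer phrase “river bank” keeps its own entry, which is how phrase-based statistics can capture context that single-word counts miss.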

The following languages do not have a direct Google translation to or from English. They are first translated through the indicated intermediate language (in each case closely related to the desired language but more widely spoken) and then through English, in a process comprising three successive translations:

  • Belarusian (be ↔ ru ↔ en ↔ other);
  • Catalan (ca ↔ es ↔ en ↔ other);
  • Haitian Creole (ht ↔ fr ↔ en ↔ other);
  • Galician (gl ↔ pt ↔ en ↔ other);
  • Slovak (sk ↔ cs ↔ en ↔ other);
  • Ukrainian (uk ↔ ru ↔ en ↔ other);
  • Urdu (ur ↔ hi ↔ en ↔ other).
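The routes in the list above can be modelled as a small lookup table. The Python sketch below is a hypothetical illustration, not Google’s routing code: `PIVOTS` copies the intermediate languages from the list, languages not in the list are assumed to translate directly to and from English, and the rule that drops a detour back to an already visited language (so that Belarusian to Ukrainian goes through Russian only) is an assumption.

```python
# Intermediate languages from the list above (ISO 639-1 codes).
PIVOTS = {"be": "ru", "ca": "es", "ht": "fr", "gl": "pt",
          "sk": "cs", "uk": "ru", "ur": "hi"}


def path_to_english(lang: str) -> list[str]:
    """Hops from `lang` to English, via its listed intermediate if any."""
    if lang == "en":
        return ["en"]
    if lang in PIVOTS:
        return [lang, PIVOTS[lang], "en"]
    return [lang, "en"]


def translation_route(source: str, target: str) -> list[str]:
    """Chain of languages used to translate source -> target via English."""
    outbound = path_to_english(source)
    inbound = list(reversed(path_to_english(target)))
    route: list[str] = []
    for lang in outbound + inbound[1:]:  # inbound[0] is "en", already present
        if lang in route:
            # Assumption: cut out a detour that returns to a visited language.
            route = route[: route.index(lang) + 1]
        else:
            route.append(lang)
    return route


print(translation_route("ca", "de"))  # ['ca', 'es', 'en', 'de']
print(translation_route("be", "uk"))  # ['be', 'ru', 'uk']
```

Each extra hop in a route is another opportunity for the ambiguity problem described for vous, which is why the listed languages tend to suffer more from pivot errors.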

According to Och, a solid foundation for developing a usable statistical machine translation system for a new language pair requires a bilingual text corpus (or parallel collection) of more than 150 to 200 million words and two monolingual corpora of more than one billion words each. Statistical models built from this data can then be used to translate between the two languages.

To acquire this enormous amount of linguistic data, Google uses, for example, United Nations documents and reports. The UN normally publishes its documents and records in all six of its official languages, which has produced a large six-language corpus.

Google representatives have participated in national conferences in Japan, where Google has requested bilingual data from researchers.


Before October 2007, for languages other than Arabic, Chinese, and Russian, Google Translate used SYSTRAN, a translation software engine that was also used by several other online translation services such as Babel Fish (now discontinued). From October 2007, Google Translate used its own proprietary technology based on statistical machine translation.

Limitations

Google Translate has its limitations. The free service limits the number of paragraphs and the range of technical terms that can be translated, and while it can help the reader understand the general content of a text in another language, it does not always deliver accurate translations; sometimes it simply returns the word to be translated unchanged.

Google Translate has difficulty differentiating between the imperfect and perfect aspects in Romance languages, so habitual and continuous actions in the past often become single historical events. Although this may seem pedantic, it can produce errors that a human translator would have avoided. Knowledge of the subjunctive mood is practically non-existent. In addition, the informal second person (you) is often chosen regardless of context or accepted usage. Since its English reference material contains only “you” forms, it has difficulty translating into languages that have more forms of address.

Some languages produce better results than others. Google Translate performs best when English is the target language and the source language is one of the languages of the European Union, because the system has access to the large number of documents translated by the European Parliament. A 2010 analysis concluded that translation from French to English is relatively accurate, and analyses conducted in 2011 and 2012 found that translation from Italian to English is also accurate.

However, if the source text is short, rule-based machine translation often performs better; this effect is particularly evident in translations from Chinese to English. Although users can generally submit edits to translations, in Chinese they cannot edit whole sentences. Instead, they must edit sometimes arbitrary groups of characters, which leads to incorrect edits.

Texts written in Greek, Devanagari, Cyrillic, and Arabic scripts can be automatically transliterated from phonetic equivalents written in the Latin alphabet. The browser version of Google Translate offers a phonetic reading option when translating from Japanese to English; this option is not available in the paid API. When an English word is translated into another language, Google Translate also shows a diacritical pronunciation transcription in the style of the New Oxford American Dictionary (NOAD).
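Transliteration of this kind can be approximated with a greedy longest-match substitution table, in which multi-letter Latin sequences such as “shch” are tried before single letters. The Python sketch below uses a small, hypothetical Latin-to-Cyrillic table for Russian; it illustrates the general technique only and is not how Google Translate performs transliteration, which must also resolve spellings that can map to more than one letter.

```python
# Hypothetical, deliberately small Latin -> Cyrillic table for Russian.
TABLE = {
    "shch": "щ", "sh": "ш", "ch": "ч", "zh": "ж", "kh": "х",
    "ts": "ц", "ya": "я", "yu": "ю",
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "e": "е",
    "z": "з", "i": "и", "y": "ы", "k": "к", "l": "л", "m": "м",
    "n": "н", "o": "о", "p": "п", "r": "р", "s": "с", "t": "т",
    "u": "у", "f": "ф",
}
MAX_KEY = max(len(key) for key in TABLE)


def transliterate(text: str) -> str:
    """Greedy longest-match transliteration into lowercase Cyrillic.

    Characters with no entry in TABLE (spaces, punctuation) pass through.
    """
    lower = text.lower()
    out = []
    i = 0
    while i < len(lower):
        # Try the longest possible sequence first, then shorter ones.
        for size in range(min(MAX_KEY, len(lower) - i), 0, -1):
            chunk = lower[i : i + size]
            if chunk in TABLE:
                out.append(TABLE[chunk])
                i += size
                break
        else:
            out.append(lower[i])
            i += 1
    return "".join(out)


print(transliterate("privet"))   # привет
print(transliterate("borshch"))  # борщ
```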

For many of the most popular languages, the system has a “text-to-speech” audio function that can read aloud a text of about a dozen words in that language. For pluricentric languages, the accent used depends on the region:

  • English: in the Americas and most of the Asia-Pacific and West Asia regions, the audio uses a general American accent with a female voice; in Europe, Hong Kong, Malaysia, Singapore, Guyana, and all other parts of the world, it uses a British English accent with a female voice; a special accent is used in Australia, New Zealand, and Norfolk Island;
  • Spanish: a Latin American accent is used in the Americas, and a Castilian Spanish accent everywhere else;
  • Portuguese: a São Paulo accent is generally used, except in Portugal, where the native European Portuguese accent is used.

For some less widely used languages, the open-source speech synthesizer eSpeak is used; however, its robotic voice can be difficult to understand.
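The regional accent rules described above amount to a lookup from language and location to an accent, in which specific countries override their broader region (Guyana is in the Americas but gets a British accent). The Python sketch below encodes a simplified version; the region labels, the accent names, and the `tts_accent` function are illustrative assumptions, and “most of the Asia-Pacific” is approximated by listing the named exceptions.

```python
# Simplified encoding of the regional accent rules described above.
# Labels are illustrative; country exceptions take precedence over regions.
ACCENT_RULES = {
    "en": {
        "countries": {
            "Hong Kong": "British", "Malaysia": "British",
            "Singapore": "British", "Guyana": "British",
            "Australia": "Australia/New Zealand",
            "New Zealand": "Australia/New Zealand",
            "Norfolk Island": "Australia/New Zealand",
        },
        "regions": {
            "Americas": "General American",
            "Asia-Pacific": "General American",
            "West Asia": "General American",
        },
        "default": "British",
    },
    "es": {"countries": {}, "regions": {"Americas": "Latin American"},
           "default": "Castilian"},
    "pt": {"countries": {"Portugal": "European Portuguese"}, "regions": {},
           "default": "São Paulo"},
}


def tts_accent(language: str, country: str, region: str) -> str:
    """Pick an accent: country exception first, then region rule, then default."""
    rules = ACCENT_RULES[language]
    if country in rules["countries"]:
        return rules["countries"][country]
    return rules["regions"].get(region, rules["default"])


print(tts_accent("en", "Guyana", "Americas"))  # British
print(tts_accent("en", "Mexico", "Americas"))  # General American
```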

Supported languages on Google Translate

As of June 2022, Google Translate supports the following 133 languages.

  1. Afrikaans
  2. Aymara
  3. Albanian
  4. German
  5. Amharic
  6. Arabic
  7. Armenian
  8. Assamese
  9. Azeri
  10. Bambara
  11. Bengali
  12. Bhojpuri
  13. Belarusian
  14. Burmese
  15. Bosnian
  16. Bulgarian
  17. Khmer
  18. Kannada
  19. Catalan
  20. Cebuano
  21. Czech
  22. Chichewa
  23. Chinese (Simplified)
  24. Chinese (Traditional)
  25. Sinhalese
  26. Korean
  27. Corsican
  28. Haitian Creole
  29. Croatian
  30. Danish
  31. Dogri
  32. Slovak
  33. Slovenian
  34. Spanish
  35. Esperanto
  36. Estonian
  37. Ewe
  38. Basque
  39. Finnish
  40. French
  41. Frisian
  42. Scottish Gaelic
  43. Welsh
  44. Galician
  45. Georgian
  46. Greek
  47. Guarani
  48. Gujarati
  49. Hausa
  50. Hawaiian
  51. Hebrew
  52. Hindi
  53. Hmong
  54. Hungarian
  55. Igbo
  56. Ilocano
  57. Indonesian
  58. English
  59. Irish
  60. Icelandic
  61. Italian
  62. Japanese
  63. Javanese
  64. Kazakh
  65. Kinyarwanda
  66. Kyrgyz
  67. Konkani
  68. Krio
  69. Kurdish (Kurmanji)
  70. Kurdish (Sorani)
  71. Lao
  72. Latin
  73. Latvian
  74. Lingala
  75. Lithuanian
  76. Luganda
  77. Luxembourgish
  78. Macedonian
  79. Maithili
  80. Malayalam
  81. Malay
  82. Maldivian
  83. Malagasy
  84. Maltese
  85. Maori
  86. Marathi
  87. Meiteilon (Manipuri)
  88. Mizo
  89. Mongolian
  90. Dutch
  91. Nepali
  92. Norwegian
  93. Oriya
  94. Oromo
  95. Punjabi
  96. Pashto
  97. Persian
  98. Polish
  99. Portuguese
  100. Quechua
  101. Romanian
  102. Russian
  103. Samoan
  104. Sanskrit
  105. Sepedi
  106. Serbian
  107. Southern Sotho
  108. Shona
  109. Sindhi
  110. Somali
  111. Swahili
  112. Swedish
  113. Sundanese
  114. Tagalog
  115. Thai
  116. Tamil
  117. Tatar
  118. Tajik
  119. Telugu
  120. Tigrinya
  121. Tsonga
  122. Turkish
  123. Turkmen
  124. Twi
  125. Ukrainian
  126. Uyghur
  127. Urdu
  128. Uzbek
  129. Vietnamese
  130. Xhosa
  131. Yiddish
  132. Yoruba
  133. Zulu

Languages in development and beta version of Google Translate

The following languages are not yet supported by Google Translate; however, users can contribute to them through the website to help Google support them in the future. As of June 2022, 103 languages were in development, 9 of them in beta.

Beta languages are closer to public release and offer an additional way to contribute: users can evaluate up to four translations produced by the beta version when translating an English text of up to 50 characters.

  1. Acehnese
  2. Adyghe
  3. Afar BETA
  4. Ahirani
  5. Southern Altai
  6. Aragonese
  7. Avar
  8. Bagheli
  9. Balochi
  10. Bangala
  11. Baoulé
  12. Bashkir
  13. Batak Toba
  14. Betawi
  15. Bodo BETA
  16. Breton
  17. Kashmiri
  18. Cantonese
  19. Chhattisgarhi
  20. Chechen
  21. Cherokee
  22. Chiluba
  23. Chitonga
  24. Chittagonian
  25. Chuvash
  26. Kumyk
  27. Deccani
  28. Dholuo
  29. Dyula
  30. Dzongkha
  31. Edo
  32. Efik
  33. Esan
  34. Fon
  35. Fulfulde BETA
  36. Gagauz
  37. Garhwali
  38. Kalaallisut
  39. Haryanvi
  40. Hiligaynon
  41. Inuktitut
  42. Isoko
  43. Khakas
  44. Kamba
  45. Kanuri
  46. Karachay-Balkar
  47. Karakalpak
  48. Qashqai
  49. Kikuyu
  50. Kokborok
  51. Lakota
  52. Luba
  53. Madurese
  54. Magahi
  55. Kedah Malay
  56. Kelantan Malay
  57. Marwari
  58. Mazanderani
  59. Minangkabau
  60. Montenegrin
  61. Mossi
  62. Navajo
  63. South Ndebele
  64. Nepal bhasa BETA
  65. Occitan
  66. Pampanga
  67. Nigerian Pidgin
  68. K’iche
  69. Rangpuri
  70. Rajasthani
  71. Rohingya
  72. Romansh
  73. Sadri
  74. Salar
  75. Northern Sami
  76. Samogitian
  77. Sango
  78. Santali BETA
  79. Saraiki BETA
  80. Serrano
  81. Tswana
  82. Shor
  83. Sicilian
  84. Congo Swahili
  85. Suryapuri
  86. Sylheti
  87. Tamazight BETA
  88. Siberian Tatar
  89. Tibetan BETA
  90. Tiv
  91. Tok Pisin
  92. Tswa
  93. Khorasani Turkish
  94. Tuvinian
  95. Urhobo
  96. Urrumano
  97. Varhadi-Nagpuri
  98. Venda
  99. Wolof
  100. Yakut
  101. Yucatec Maya BETA
  102. Zazaki
  103. Zhuang

Open source licenses and components

Language | WordNet | License
Albanian | Albanet | CC-BY 3.0/GPL 3
Arabic | Arabic Wordnet |
Catalan | Multilingual Central Repository | CC-BY 3.0
Chinese | Chinese Wordnet | Wordnet
Danish | DanNet | No Wordnet
Spanish | Multilingual Central Repository | CC-BY 3.0
Finnish | FinnWordnet | Wordnet
French | WOLF (WOrdnet Libre du Français) | CeCILL-C
Galician | Multilingual Central Repository | CC-BY 3.0
Hebrew | Hebrew Wordnet | Wordnet
Hindi | IIT Bombay Wordnet | Indo Wordnet
Indonesian | Wordnet Bahasa | MIT
English | Princeton Wordnet | Wordnet
Italian | MultiWordnet | CC-BY 3.0
Japanese | Japanese Wordnet | Wordnet
Javanese | Javanese Wordnet | Wordnet
Malay | Wordnet Bahasa | MIT
Norwegian | Norwegian Wordnet | Wordnet
Persian | Persian Wordnet | Free to Use
Polish | plWordnet | Wordnet
Portuguese | OpenWN-PT | CC-BY-SA 3.0
Thai | Thai Wordnet | Wordnet

Reviews

Shortly after launching the translation service, Google won an international competition for English-Arabic and English-Chinese machine translation.

Translation errors and oddities

Because Google Translate uses statistical matching, translated text can sometimes contain glaring errors and apparently meaningless phrases. The system may substitute common terms for similar but non-equivalent terms in the target language (as happens with Latin translations), reversing or otherwise altering the meaning of the original sentence.

On April 23, 2020, Google announced that Google Translate had adopted a new model to reduce gender bias in translations between two languages when one of them distinguishes masculine and feminine forms for terms that are gender-neutral in the other.

References (sources)