Google Translate adds over 100 more languages

Google Translate adds over 100 more languages

Google Translate adds over 100 more languages PlatoBlockchain Data Intelligence. Vertical Search. Ai.

Google is adding more languages to Google Translate – lots more. This time around, 110 of them, including Manx.

This is the largest single expansion ever to Google’s translation tool. It now handles 243 different tongues, coming close to doubling the number of languages it handles.

The expansion is powered by PaLM 2, the latest release of Google’s Pathways Language Model, which it introduced in 2022 and then improved with version 2 in May 2023.

Google Translate has been gradually growing its repertoire for years, as The Register covered back in 2008 when, among others, it added Czech. That was a lifesaver for this vulture, when he moved here a decade ago. As we have described previously, Čeština is a brutally complex and difficult language. Last year, your correspondent relocated to the Isle of Man, which also has its own unique indigenous language, Manx.

This expansion, like the previous more modest one of 24 languages back in 2022, uses a method Google calls Zero Shot machine translation. Google Translate has been using neural network models for translation since 2016, and zero-resource training means that it’s possible to train its models to translate languages even though the training database doesn’t include one-to-one matching texts in both source and target languages.

For once, it seems to us that this is an excellent use for the painfully trendy large language models that the current generation of hucksters are shilling as AI. LLMs are a form of neural network, and the “AI accelerator chips” that some silicon vendors would like you to believe let computers think are mainly dedicated co-processors for doing tensor mathematics calculations faster.

Machine translation can be very valuable in the efforts to save minority languages. Manx has come back from extinction in the last couple of decades. The last native Manx speaker, Edward “Ned” Maddrell, died in 1974, but as the remaining native speakers grew old, multiple recordings and videos of the language being spoken were made, and today there’s a new generation of native Manx speakers being raised by adults who learned it as a second language, and even a Manx language primary school, Bunscoill Ghaelgagh.

Gura mie ayd, Google, as slaynt mie. ®

Bootnote

The whole list of languages now supported is as follows:

Abkhaz, Acehnese, Acholi, Afar, Afrikaans, Albanian, Alur, Amharic, Arabic, Armenian, Assamese, Avar, Awadhi, Aymara, Azerbaijani, Balinese, Baluchi, Bambara, Baoulé, Bashkir, Basque, Batak Karo, Batak Simalungun, Batak Toba, Belarusian, Bemba, Bengali, Betawi, Bhojpuri, Bikol, Bosnian, Breton, Bulgarian, Buryat, Cantonese, Catalan, Cebuano, Chamorro, Chechen, Chichewa, Chinese (Simplified), Chinese (Traditional), Chuukese, Chuvash, Corsican, Crimean Tatar, Croatian, Czech, Danish, Dari, Dhivehi, Dinka, Dogri, Dombe, Dutch, Dyula, Dzongkha, check, English, Esperanto, Estonian, Ewe, Faroese, Fijian, Filipino, Finnish, Fon, French, Frisian, Friulian, Fulani, Ga, Galician, Georgian, German, Greek, Guarani, Gujarati, Haitian Creole, Hakha Chin, Hausa, Hawaiian, Hebrew, Hiligaynon, Hindi, Hmong, Hungarian, Hunsrik, Iban, Icelandic, Igbo, Ilocano, Indonesian, Irish, Italian, Jamaican Patois, Japanese, Javanese, Jingpo, Kalaallisut, Kannada, Kanuri, Kapampangan, Kazakh, Khasi, Khmer, Kiga, Kikongo, Kinyarwanda, Kituba, Kokborok, Komi, Konkani, Korean, Krio, Kurdish (Kurmanji), Kurdish (Sorani), Kyrgyz, Lao, Latgalian, Latin, Latvian, Ligurian, Limburgish, Lingala, Lithuanian, Lombard, Luganda, Luo, Luxembourgish, Macedonian, Madurese, Maithili, Makassar, Malagasy, Malay, Malay (Jawi), Malayalam, Maltese, Mam, Manx, Maori, Marathi, Marshallese, Marwadi, Mauritian Creole, Meadow Mari, Meiteilon (Manipuri), Minang, Mizo, Mongolian, Myanmar (Burmese), Nahuatl (Eastern Huasteca), Ndau, Ndebele (South), Nepalbhasa (Newari), Nepali, NKo, Norwegian, Nuer, Occitan, Odia (Oriya), Oromo, Ossetian, Pangasinan, Papiamento, Pashto, Persian, Polish, Portuguese (Brazil), Portuguese (Portugal), Punjabi (Gurmukhi), Punjabi (Shahmukhi), Quechua, Q’eqchi’, Romani, Romanian, Rundi, Russian, Sami (North), Samoan, Sango, Sanskrit, Santali, Scots Gaelic, Sepedi, Serbian, Sesotho, Seychellois Creole, Shan, Shona, Sicilian, Silesian, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Susu, Swahili, Swati, Swedish, Tahitian, Tajik, Tamazight, Tamazight (Tifinagh), Tamil, Tatar, Telugu, Tetum, Thai, Tibetan, Tigrinya, Tiv, Tok Pisin, Tongan, Tsonga, Tswana, Tulu, Tumbuka, Turkish, Turkmen, Tuvan, Twi, Udmurt, Ukrainian, Urdu, Uyghur, Uzbek, Venda, Venetian, Vietnamese, Waray, Welsh, Wolof, Xhosa, Yakut, Yiddish, Yoruba, Yucatec Maya, Zapotec, and Zulu.

Time Stamp:

More from The Register