Google Gets More Multilingual, but Will It Get the Nuance?

By FRANKLIN BRICEÑO and MATT O’BRIEN, Associated Press
LIMA, Peru (AP) – About 10 million people speak Quechua, but automatically translating emails and text messages into the most widely spoken Indigenous language family in the Americas has long been all but impossible.
That changed on Wednesday, when Google added Quechua and a variety of other languages to its digital translation service.
The internet giant says new artificial intelligence technology is enabling it to greatly expand Google Translate’s repertoire of the world’s languages. It added 24 of them this week, including Quechua and other Indigenous South American languages such as Guarani and Aymara. It is also adding a number of widely spoken African and South Asian languages that have long been missing from popular tech products.
“We’re looking at languages with very large, underserved populations,” Google researcher Isaac Caswell told reporters.
The news from the California company’s annual I/O technology showcase is likely to be celebrated in many corners of the globe. But it is also likely to draw criticism from those who have been frustrated by past technology products that failed to understand the nuances of their language or culture.
Quechua was the lingua franca of the Inca Empire, which stretched from what is now southern Colombia to central Chile. Its status began to decline after the Spanish conquest of Peru more than 400 years ago.
Adding it to the languages recognized by Google is a huge win for Quechua language activists like Luis Illaccanqui, a Peruvian who created the Qichwa 2.0 website, which includes dictionaries and resources for learning the language.
“It will help put Quechua and Spanish on the same footing,” said Illaccanqui, who was not involved in the Google project.
Illaccanqui, whose surname in Quechua means “you are the lightning,” said the translator will also help keep the language alive among a new generation of young people and teenagers “who speak Quechua and Spanish at the same time and are attracted to social networks.”
Caswell called the news a “very big technological step forward” because until now, it was not possible to add languages if researchers couldn’t find a large enough trove of online text – such as digital books, newspapers or social media posts – for their AI systems to learn from.
U.S. tech giants don’t have a good track record of making their language technology work well outside the wealthiest markets, a problem that also makes it harder for them to detect dangerous misinformation on their platforms. Until this week, Google Translate was offered in European languages such as Frisian, Maltese, Icelandic and Corsican – each with fewer than 1 million speakers – but not in East African languages such as Oromo and Tigrinya, which have millions of speakers.
The new languages will be released on Google’s Android system this week and on Apple devices later this month. They are not yet understood by Google’s voice assistant, which limits them to text-to-text translations for now. Google says it is working on adding speech recognition and other capabilities, such as translating a sign by pointing a camera at it.
That’s important for largely spoken languages like Quechua, especially in the health sector, because many Peruvian doctors and nurses who speak only Spanish work in rural areas and “don’t understand patients who speak mostly Quechua,” said Illaccanqui.
“The next frontier, or challenge, is to work on speech,” said Arturo Oncevay, a Peruvian machine translation researcher at the University of Edinburgh who has set up a research group to improve Indigenous language technology across the Americas. “The native languages of the Americas are traditionally oral.”
In its announcement, Google warned that the quality of translations in the newly added languages is “far behind” that of the other languages it supports, such as English, Spanish and German, and noted that the models “make mistakes and show their own biases.” But the company only adds languages if its AI systems meet a certain standard of proficiency, according to Caswell.
“If there’s a significant number of cases where it’s very wrong, then we don’t include it,” he said. “Even if 90% of the translations are perfect, but 10% are useless, that’s too little for us.”
Google says its products now support 133 languages. The latest batch of 24 is the largest to be added since Google incorporated 16 new languages in 2010. What makes the expansion possible is what Google calls a “zero-shot machine translation” model – an AI model that learns to translate into another language without ever seeing an example of it.
Facebook and Instagram parent company Meta introduced a similar concept, which it calls the Universal Speech Translator, earlier this year.
“At a high level, the way you can imagine it working is that you have a giant neural model and it’s trained on 100 different languages,” Caswell said of the Google model.
He said the new batch ranges from smaller languages like Mizo, spoken in northeastern India by about 800,000 people, to more widely spoken languages like Lingala, spoken by about 45 million people across Central Africa.
More than 15 years ago – in 2006 – Microsoft gained positive attention in South America with a piece of software that translated familiar Microsoft menus and commands into Quechua. But that was before the current wave of AI advances in real-time translation.
Harvard University language scholar Américo Mendoza, who speaks Quechua, said Google’s attention brings some much-needed visibility to the language in places like Peru, where Quechua speakers still lack many public services. The survival of many of these languages “will depend on their use in the digital context,” he said.
The newly added languages are: Assamese, Aymara, Bambara, Bhojpuri, Dhivehi, Dogri, Ewe, Guarani, Ilocano, Konkani, Krio, Lingala, Luganda, Maithili, Meiteilon (Manipuri), Mizo, Oromo, Quechua, Sanskrit, Sepedi, Sorani Kurdish, Tigrinya, Tsonga and Twi.
Copyright 2022 The Associated Press. All rights reserved. This material may not be published, broadcast, rewritten or distributed.