Meet mmT5: A Modular Multilingual Sequence-To-Sequence Model That Outperforms mT5


Multilingual pre-trained models have performed excellently on natural language understanding tasks. These models are typically trained on large volumes of unlabeled data in hundreds of languages. Recent large language models also show remarkable multilingual abilities despite being pre-trained mostly on English data. All of these models, however, have one thing in common: a fixed model capacity that must be shared across every language they represent. As a result, models perform poorly on languages with less pretraining data, and the problem grows as more pretraining languages are added. This is known as the “curse of multilingualism.”

Natural language generation tasks pose additional problems for existing multilingual models: they may overfit to the training languages and partially forget how to generate in the target language, producing text that has the right meaning but is not written in the correct language. The researchers call this the “source language hallucination problem.” To overcome these two drawbacks, researchers from Google DeepMind propose modular multilingual T5 (mmT5), the first modular multilingual generative model. To increase capacity for multilingual modeling, mmT5 allocates a small number of language-specific parameters to each language during pretraining.
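To make the idea of language-specific parameters concrete, here is a minimal pure-Python sketch of a layer that combines one shared transform with a small per-language bottleneck module. It is an illustration under stated assumptions, not the mmT5 implementation: the class name `ModularLayer`, the bottleneck design, the ReLU activations, the residual connection, and the toy dimensions are all hypothetical choices made for this example.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relu(vector):
    """Element-wise ReLU."""
    return [max(0.0, x) for x in vector]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a stand-in for pretrained parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class ModularLayer:
    """Hypothetical layer: shared weights plus one small module per language."""

    def __init__(self, d_model, d_bottleneck, languages, seed=0):
        rng = random.Random(seed)
        self.d_model = d_model
        self.d_bottleneck = d_bottleneck
        # Shared parameters, used for every language.
        self.shared = random_matrix(d_model, d_model, rng)
        # Language-specific parameters: a down- and up-projection per language.
        self.language_modules = {
            lang: {
                "down": random_matrix(d_bottleneck, d_model, rng),
                "up": random_matrix(d_model, d_bottleneck, rng),
            }
            for lang in languages
        }

    def forward(self, hidden, lang):
        """Apply the shared transform, then route through the module for `lang`."""
        shared_out = relu(matvec(self.shared, hidden))
        module = self.language_modules[lang]
        bottleneck = relu(matvec(module["down"], shared_out))
        adapted = matvec(module["up"], bottleneck)
        # Residual connection (an assumption of this sketch).
        return [s + a for s, a in zip(shared_out, adapted)]

    def parameter_counts(self):
        """Count shared versus per-language weights."""
        shared = self.d_model * self.d_model
        per_language = 2 * self.d_model * self.d_bottleneck
        total = shared + per_language * len(self.language_modules)
        return {"shared": shared, "per_language": per_language, "total": total}


layer = ModularLayer(d_model=16, d_bottleneck=2, languages=["en", "de", "sw"])
hidden = [0.5] * 16
out_en = layer.forward(hidden, "en")  # same input, English module
out_de = layer.forward(hidden, "de")  # same input, German module
counts = layer.parameter_counts()
```

In this toy setup each language adds 64 weights on top of 256 shared ones, so supporting a new language grows the model only slightly while giving that language capacity of its own.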

During fine-tuning, they freeze the language-specific modules and update only the shared parameters, which allows direct adaptation to a target language simply by swapping in the appropriate language-specific module. They also identify a remaining weakness in mmT5: the fine-tuned shared representations can diverge from the decoder’s frozen modular representations. As a result, the modular approach, much like its non-modular counterparts, is prone to generating text in the wrong language. To address this, they propose freezing some of the shared decoder parameters, which makes a significant difference in zero-shot cross-lingual generation for modular generative models.
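The freezing strategy described above can be sketched as a simple filter over parameter names. This is only an illustration: the naming scheme (`encoder.`, `decoder.`, `.lang_module.`, `.ffn.`, `.attention.`) is hypothetical, and freezing the decoder feed-forward weights is an assumed example of "a portion of the shared decoder parameters", since the article does not say which portion is frozen.

```python
def trainable_parameters(param_names, frozen_decoder_parts=("ffn",)):
    """Return the parameter names that would be updated during fine-tuning.

    Assumptions of this sketch (not taken from the article):
    - language-specific modules contain ".lang_module." in their name;
    - decoder parameters start with "decoder.";
    - `frozen_decoder_parts` names the shared decoder sub-layers to freeze.
    """
    trainable = []
    for name in param_names:
        if ".lang_module." in name:
            continue  # language-specific modules stay frozen
        in_decoder = name.startswith("decoder.")
        if in_decoder and any(f".{part}." in name for part in frozen_decoder_parts):
            continue  # selected shared decoder parameters stay frozen too
        trainable.append(name)
    return trainable


# Hypothetical parameter names for a one-layer encoder-decoder model.
all_params = [
    "encoder.layer0.attention.weight",
    "encoder.layer0.ffn.weight",
    "encoder.layer0.lang_module.en.weight",
    "encoder.layer0.lang_module.de.weight",
    "decoder.layer0.attention.weight",
    "decoder.layer0.ffn.weight",
    "decoder.layer0.lang_module.en.weight",
    "decoder.layer0.lang_module.de.weight",
]

to_update = trainable_parameters(all_params)
frozen = [name for name in all_params if name not in to_update]
```

With these names, only the shared encoder weights and the decoder attention weights are updated. At inference time, switching from the `en` modules to the `de` modules would then require no further training, because the modules themselves were never changed.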

They find that mmT5 effectively addresses both drawbacks of multilingual sequence-to-sequence models. First, by allowing more model capacity to be added for individual languages during pretraining, mmT5 alleviates the curse of multilingualism: on a standard set of multilingual NLU and NLG tasks, it outperforms conventional baselines and mT5 at the same parameter sizes. Second, mmT5 largely resolves the source language hallucination problem in zero-shot cross-lingual text generation. According to their analysis, on a zero-shot multilingual summarization task, mT5 produces text in the target language only 7% of the time, whereas mmT5 generates text in the correct language in 99% of cases.
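A figure such as "text in the target language 7% of the time" is a simple ratio over generated outputs. The sketch below shows one way such a rate could be computed, assuming each output has already been labeled with a detected language code by some language identifier; the function name and the toy labels are hypothetical and are not data from the paper.

```python
def target_language_rate(predicted_languages, target_language):
    """Fraction of generated outputs whose detected language matches the target."""
    if not predicted_languages:
        raise ValueError("need at least one prediction")
    hits = sum(1 for lang in predicted_languages if lang == target_language)
    return hits / len(predicted_languages)


# Made-up labels for four generated summaries where the target language is German.
detected = ["de", "en", "de", "de"]
rate = target_language_rate(detected, "de")  # 3 of 4 outputs are in German
```

A source language hallucination shows up here as an `en` label on an output that should have been German.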

In summary, mmT5 is a modular multilingual encoder-decoder model. During multilingual pretraining, the bulk of mmT5’s parameters are shared across languages, but each language is also given a small number of parameters exclusive to that language. The researchers show that adding modularity as an architectural inductive bias greatly increases training efficiency, reaching the same perplexity as a comparable fully dense model in a quarter of the update steps. On a wide range of tasks, including question answering, semantic parsing, summarization, and classification, in both zero-shot and multilingual settings, mmT5 significantly outperforms comparable models.

Finally, they demonstrate that, when mmT5 is fine-tuned on a target task in a source language with parts of the decoder frozen, the model reliably generates text in the target language. Modularity therefore eliminates source language hallucinations in cross-lingual transfer scenarios.

The post Meet mmT5: A Modular Multilingual Sequence-To-Sequence Model That Outperforms mT5 appeared first on MarkTechPost.
