Maha Elbayad

I am a senior research scientist at Meta AI, based in Menlo Park, CA. I specialize in massively multilingual and multimodal machine translation models for speech and text, working on projects such as No Language Left Behind and SeamlessM4T. I completed my PhD in 2020 at Université Grenoble Alpes, where I worked at the Laboratoire d’informatique de Grenoble and Inria under the supervision of Jakob Verbeek and Laurent Besacier. My doctoral thesis explored novel designs of sequence-to-sequence models for efficient offline and streaming machine translation.

Prior to my PhD, I graduated from Centrale Paris (Applied Mathematics) and ENS Paris-Saclay (M2 MVA: Mathématiques, Vision, Apprentissage).

News

Jan 15, 2025	SeamlessM4T published in Nature.
Dec 10, 2024	Released Large Concept Models (LCMs), a new direction in language modeling that moves beyond traditional token-level LLMs.
Nov 07, 2024	Merging Text Transformer Models from Different Initializations accepted by TMLR.
Jun 04, 2024	No Language Left Behind published in Nature.
Nov 30, 2023	Released the Seamless Communication models, a family of AI research models that enable more natural and authentic communication across languages (SeamlessExpressive, SeamlessStreaming and SeamlessM4T v2). The models can be downloaded from GitHub or 🤗 Hugging Face.
Oct 24, 2023	TIME magazine selected SeamlessM4T as one of the best inventions of 2023!

Selected publications

Nature

Joint speech and text machine translation for up to 100 languages

Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, and 64 more authors

Nature, 2025


Creating the Babel Fish, a tool that helps individuals translate speech between any two languages, requires advanced technological innovation and linguistic expertise. Although conventional speech-to-speech translation systems composed of multiple subsystems performing translation in a cascaded fashion exist [1–3], scalable and high-performing unified systems [4,5] remain underexplored. To address this gap, here we introduce SeamlessM4T (Massively Multilingual and Multimodal Machine Translation), a single model that supports speech-to-speech translation (from 101 to 36 languages), speech-to-text translation (from 101 to 96 languages), text-to-speech translation (from 96 to 36 languages), text-to-text translation (96 languages) and automatic speech recognition (96 languages). Built using a new multimodal corpus of automatically aligned speech translations and other publicly available data, SeamlessM4T is one of the first multilingual systems that can translate from and into English for both speech and text. Moreover, it outperforms the existing state-of-the-art cascaded systems, achieving up to 8% and 23% higher BLEU (Bilingual Evaluation Understudy) scores in speech-to-text and speech-to-speech tasks, respectively. Beyond quality, when tested for robustness, our system is, on average, approximately 50% more resilient against background noise and speaker variations in speech-to-text tasks than the previous state-of-the-art systems. We evaluated SeamlessM4T on added toxicity and gender bias to assess translation safety. For the former, we included two strategies for added toxicity mitigation working at either training or inference time. Finally, all contributions in this work are publicly available for non-commercial use to propel further research on inclusive speech translation technologies.
preprint

Large Concept Models: Language Modeling in a Sentence Representation Space

LCM Team, Loïc Barrault, Paul-Ambroise Duquenne, Maha Elbayad, Artyom Kozhevnikov, and 7 more authors

arXiv e-prints, Dec 2024


LLMs have revolutionized the field of artificial intelligence and have emerged as the de facto tool for many tasks. The current established technology of LLMs is to process input and generate output at the token level. This is in sharp contrast to humans, who operate at multiple levels of abstraction, well beyond single words, to analyze information and to generate creative content. In this paper, we present an attempt at an architecture which operates on an explicit higher-level semantic representation, which we name a concept. Concepts are language- and modality-agnostic and represent a higher-level idea or action in a flow. Hence, we build a "Large Concept Model". In this study, as proof of feasibility, we assume that a concept corresponds to a sentence, and use an existing sentence embedding space, SONAR, which supports up to 200 languages in both text and speech modalities. The Large Concept Model is trained to perform autoregressive sentence prediction in an embedding space. We explore multiple approaches, namely MSE regression, variants of diffusion-based generation, and models operating in a quantized SONAR space. These explorations are performed using 1.6B parameter models and training data on the order of 1.3T tokens. We then scale one architecture to a model size of 7B parameters and training data of about 2.7T tokens. We perform an experimental evaluation on several generative tasks, namely summarization and a new task of summary expansion. Finally, we show that our model exhibits impressive zero-shot generalization performance to many languages, outperforming existing LLMs of the same size. The training code of our models is freely available.
TMLR

Merging Text Transformer Models from Different Initializations

Neha Verma and Maha Elbayad

Transactions on Machine Learning Research, Nov 2024

Nature

Scaling neural machine translation to 200 languages

Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, and 34 more authors

Nature, Jun 2024


The development of neural techniques has opened up new avenues for research in machine translation. Today, neural machine translation (NMT) systems can leverage highly multilingual capacities and even perform zero-shot translation, delivering promising results in terms of language coverage and quality. However, scaling quality NMT requires large volumes of parallel bilingual data, which are not equally available for the 7,000+ languages in the world [1]. Focusing on improving the translation quality of a relatively small group of high-resource languages comes at the expense of directing research attention to low-resource languages, exacerbating digital inequities in the long run. To break this pattern, here we introduce No Language Left Behind, a single massively multilingual model that leverages transfer learning across languages. We developed a conditional computational model based on the Sparsely Gated Mixture of Experts architecture [2–7], which we trained on data obtained with new mining techniques tailored for low-resource languages. Furthermore, we devised multiple architectural and training improvements to counteract overfitting while training on thousands of tasks. We evaluated the performance of our model over 40,000 translation directions using tools created specifically for this purpose: an automatic benchmark (FLORES-200), a human evaluation metric (XSTS) and a toxicity detector that covers every language in our model. Compared with the previous state-of-the-art models, our model achieves an average of 44% improvement in translation quality as measured by BLEU. By demonstrating how to scale NMT to 200 languages and making all contributions in this effort freely available for non-commercial use, our work lays important groundwork for the development of a universal translation system.