Marzieh Fadaee - Keynote: The Art of Language: Mastering Multilingual Challenges in LLMs
Explore the complexities of building multilingual LLMs with Marzieh Fadaee. Learn about data challenges, cultural nuances, and successful strategies from the AYA project.
-
Building multilingual language models raises distinct challenges around cultural context, translation, and the quality and availability of data in each language
-
The AYA project created one of the largest open multilingual instruction datasets, covering 65 languages through community-driven data collection involving over 3,000 people from 119 countries
-
Language models often perform worse on low-resource languages. They can also suffer catastrophic forgetting when new languages are added, losing ability in languages they had already learned. Language coverage therefore has to be carefully balanced against overall model quality
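The talk does not say how that balance is struck. One widely used technique is temperature-based sampling over languages. It is shown here as an illustration only, not as the method used in the AYA project. Each language is sampled with probability proportional to its share of the data raised to the power 1/T. At T = 1 the mix follows the natural data distribution. Larger T flattens it toward uniform, which upsamples low-resource languages. The function name and the language counts below are hypothetical.

```python
from __future__ import annotations


def temperature_sampling_weights(
    counts: dict[str, int], temperature: float
) -> dict[str, float]:
    """Per-language sampling probabilities with p_i proportional to (n_i / N) ** (1 / T).

    Illustrative sketch: T = 1 samples in proportion to the data, while larger
    T flattens the distribution toward uniform (upsampling small languages).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must not all be zero")
    # Raise each language's natural share of the data to the power 1/T.
    scaled = {lang: (n / total) ** (1.0 / temperature) for lang, n in counts.items()}
    # Renormalize so the probabilities sum to 1.
    norm = sum(scaled.values())
    return {lang: w / norm for lang, w in scaled.items()}


# Hypothetical example: documents available per language.
counts = {"en": 900, "sw": 90, "yo": 10}
for t in (1.0, 3.0, 10.0):
    probs = temperature_sampling_weights(counts, t)
    print(t, {lang: round(p, 3) for lang, p in probs.items()})
```

Pushing T too high can cause the model to see a small corpus many times over, which risks overfitting on it. This is one concrete form of the coverage-versus-quality trade-off described above.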
-
Evaluation of multilingual models is particularly challenging due to:
- Need for culturally-appropriate benchmarks for each language
- Difficulty in comparing performance across languages
- Limited availability of human evaluators for many languages
- Translation artifacts affecting results
-
Cross-lingual transfer can be positive (languages helping each other improve) or negative (performance in one language degrading when another is added)
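One simple way to make positive and negative transfer concrete is to run the same per-language evaluation before and after adding a language, then inspect the signed change for each language. The sketch below is hypothetical throughout:

- the function name is invented for illustration;
- the scores and the 0.5-point noise tolerance are made up;
- it assumes a single higher-is-better score per language.

A real comparison would also need to account for evaluation noise and for the translation and benchmark issues listed above.

```python
from __future__ import annotations


def transfer_report(
    before: dict[str, float],
    after: dict[str, float],
    tolerance: float = 0.5,
) -> dict[str, tuple[str, float]]:
    """Classify the per-language score change between two checkpoints.

    `before` and `after` map a language code to a benchmark score where higher
    is better. Only languages evaluated in both runs are compared.
    """
    report: dict[str, tuple[str, float]] = {}
    for lang in sorted(before.keys() & after.keys()):
        delta = after[lang] - before[lang]
        if delta > tolerance:
            label = "positive"  # score rose: suggests positive transfer
        elif delta < -tolerance:
            label = "negative"  # score fell: suggests negative transfer
        else:
            label = "neutral"   # change within the assumed noise tolerance
        report[lang] = (label, delta)
    return report


# Hypothetical scores before and after adding a new language to training.
before = {"en": 70.0, "sw": 40.0, "hi": 55.0}
after = {"en": 68.0, "sw": 46.0, "hi": 55.2}
for lang, (label, delta) in transfer_report(before, after).items():
    print(f"{lang}: {label} ({delta:+.1f})")
```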
-
Critical challenges in multilingual LLM development:
- Data collection and quality for low-resource languages
- Handling cultural nuances and biases
- Balancing general vs. language-specific knowledge
- Privacy and ethical considerations around data usage
- Model transparency and accountability
-
Community involvement of native speakers is essential for:
- Creating high-quality training data
- Designing appropriate evaluation benchmarks
- Understanding cultural context
- Ensuring proper representation of languages
-
Open science and transparency in multilingual model development help advance the field by allowing others to identify issues and contribute improvements