AI & ML
impact 16
Toward Cross-Lingual Quality Classifiers for Multilingual Pretraining Data Selection
arXiv:2604.20549v1 (announce type: cross). Abstract: As Large Language Models (LLMs) scale, data curation has shifted from maximizing vol…
Why it matters
The timing matters: data curation practice is shifting as LLMs scale, and cross-lingual quality classifiers for multilingual pretraining data selection could amplify the downstream impact of that shift.