Best Resource Paper Award at EMNLP 2024
14 November 2024
Mir Tafseer Nayeem and Davood Rafiei won the Best Resource Paper Award at EMNLP 2024 for their paper “KidLM: Advancing Language Models for Children – Early Insights and Future Directions”.
The paper, one of two recipients of the Best Resource Paper Award, lays the groundwork for child-specific language models and highlights the critical role of high-quality pre-training data. The authors present a novel user-centric data collection pipeline for gathering and validating a corpus tailored to children, including content written for them and sometimes by them. They also introduce Stratified Masking, a new training objective that dynamically adjusts masking probabilities based on domain-specific child language data, so that models prioritize vocabulary and concepts better suited to children’s linguistic needs.
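For readers curious how a stratified masking objective might look in practice, here is a minimal Python sketch. It is not the authors' implementation: the function name `stratified_mask`, the two word sets, and the masking rates are all hypothetical and chosen only for illustration. The idea shown is simply that each token is masked with a probability that depends on the group (stratum) of words it belongs to, instead of a single uniform rate.

```python
import random

MASK = "[MASK]"

def stratified_mask(tokens, strata, default_rate, rng):
    """Mask each token with a probability set by the stratum it falls in.

    strata: list of (word_set, rate) pairs, checked in order (hypothetical layout).
    default_rate: rate used for tokens that match no stratum.
    rng: a random.Random instance, passed in so runs can be reproduced.
    """
    masked = []
    for tok in tokens:
        rate = default_rate
        for words, stratum_rate in strata:
            if tok.lower() in words:
                rate = stratum_rate
                break
        # rng.random() is in [0, 1), so rate 0.0 never masks and 1.0 always masks.
        masked.append(MASK if rng.random() < rate else tok)
    return masked

# Illustrative strata and rates only; these are assumptions, not values from the paper.
stopwords = {"the", "a", "on"}
easy_words = {"cat", "sat", "mat"}
strata = [(stopwords, 0.15), (easy_words, 0.20)]

tokens = "The cat sat on a trampoline".split()
print(stratified_mask(tokens, strata, default_rate=0.25, rng=random.Random(0)))
```

In this sketch, words outside both sets get the highest rate, so the model would be asked to predict them more often during training. How the strata are actually defined and weighted is described in the paper itself.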
Read the full paper for more details here!