DESCRIPTION :
The PhD position will be hosted within the MAGNET team at Inria Lille [1], in partnership with the SCALAB group at University of Lille [2], in an effort to strengthen collaborations between these two research teams and, specifically, to foster cross-fertilization between Natural Language Processing (NLP) and psycholinguistics. The MAGNET team is currently evolving into a new interdisciplinary research group focusing on cognitively grounded, neural computational models of language and reasoning.
Assigned mission
This PhD project investigates semantic memory through complementary contrastive and integrative approaches, at the intersection of
cognitive psychology and natural language processing. The overarching goal is to better understand the semantic capacities of large language models (LLMs) by comparing them to human cognition, and to improve these models using cognitively inspired learning biases.
Main activities
The first research axis focuses on contrastive evaluation: we will design robust probing and prompting techniques to analyze how
different families of LLMs (e.g., auto-regressive vs. masked models) encode and organize semantic knowledge. Models will be evaluated on datasets from experimental psychology, such as typicality norms (e.g., Rosch) and semantic feature norms (e.g., McRae, Buchanan), possibly including new data collection. The goal is to assess whether and how these models exhibit well-known features of human semantic memory, such as taxonomic and prototypical organization, semantic feature sharing and inheritance, and polysemy, building upon preliminary work carried out in the team [3, 4, 5]. In addition, we intend to explore the structure of representations in vision-language models to investigate how multi-modal grounding shapes semantic memory, in light of findings from blind populations and developmental theories that challenge the necessity of visual input for acquiring rich word meanings.
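To illustrate the contrastive-evaluation idea, the sketch below correlates model-based similarity scores with human typicality ratings. All data here are hypothetical toy values: the embeddings stand in for hidden states that would, in practice, be extracted from an LLM, and the ratings mimic the shape of Rosch-style typicality norms; `spearman`, `rank`, and `cosine` are helper functions written for this sketch, not library calls.

```python
from math import sqrt

def rank(xs):
    """Assign ranks (1-based, averaging ties) to a list of values."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1  # extend over a run of tied values
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(a, b):
    """Spearman rank correlation between two equal-length lists."""
    ra, rb = rank(a), rank(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = sqrt(sum((x - ma) ** 2 for x in ra))
    sb = sqrt(sum((y - mb) ** 2 for y in rb))
    return cov / (sa * sb)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v)))

# Hypothetical embeddings (in real experiments: LLM hidden states).
emb = {
    "bird":    [0.9, 0.8, 0.1],
    "robin":   [0.85, 0.75, 0.15],
    "sparrow": [0.8, 0.7, 0.2],
    "penguin": [0.3, 0.4, 0.9],
}
# Hypothetical human typicality ratings for the category "bird" (1-7 scale).
typicality = {"robin": 6.9, "sparrow": 6.4, "penguin": 2.8}

exemplars = list(typicality)
model_scores = [cosine(emb["bird"], emb[e]) for e in exemplars]
human_scores = [typicality[e] for e in exemplars]
print(round(spearman(model_scores, human_scores), 2))  # → 1.0 on this toy data
```

A rank correlation near 1 would indicate that the model's geometry mirrors the graded, prototype-like structure of the human category; real probing work would use many categories, statistical controls, and representations from several model layers.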
The second axis focuses on integrative modeling, aiming to develop LLMs with inductive biases inspired by human cognitive development. Drawing from developmental psycholinguistics and findings in semantic memory acquisition, we will explore how representations evolve in humans and model this process in artificial learners. We will experiment with training regimes that control input volume, syntactic complexity, and curriculum structure. Longitudinal corpora and multimodal input (e.g., visual and symbolic data) will be used to simulate developmental conditions. This approach is directly inspired by recent initiatives such as the BabyLM benchmark campaigns, which promote the design of smaller, more data-efficient language models grounded in child language learning. Our goal is to integrate such developmental constraints into the architecture and training of LLMs in order to foster interpretability, efficiency, and cognitive plausibility. In both axes, English and
French data will be considered.
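The curriculum-structured training regimes mentioned above can be sketched minimally as ordering a corpus by a complexity proxy before batching. This is a toy illustration under strong assumptions: sentence length stands in for syntactic complexity, and `curriculum_batches` is a hypothetical helper, not part of any existing training framework.

```python
def complexity(sentence):
    # Crude proxy: word count. Real work would use syntactic depth,
    # vocabulary rarity, or age-of-acquisition estimates instead.
    return len(sentence.split())

def curriculum_batches(corpus, batch_size):
    """Yield batches of sentences ordered from simple to complex."""
    ordered = sorted(corpus, key=complexity)
    for i in range(0, len(ordered), batch_size):
        yield ordered[i:i + batch_size]

corpus = [
    "the cat sat on the warm mat near the door",
    "dog barks",
    "birds fly south in winter",
]
for batch in curriculum_batches(corpus, 2):
    print(batch)
```

In an actual BabyLM-style setup, batches produced this way would feed a standard language-model training loop, so that the learner sees developmentally plausible input early on.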
Job code: PhD student (m/f)
Education level: Master's degree (Bac+5)
Part-time / Full-time: Full-time
Contract type: Fixed-term contract (CDD)
Skills: Artificial Intelligence, Cognitive Psychology, Cognitive Science, Computational Linguistics, Computer Simulation, Computer Programming, Python (Programming Language), Machine Learning, Natural Language Processing, Large Language Models, English, French, Research, Architecture, Cognitive Development, Data Collection, Empirical Research, Experimental Psychology, Semantics, Simulations
Email:
Pascal.Denis@inria.fr
firstname.lastname@inria.fr
Phone:
0139635511
Advertiser type: Direct employer