DESCRIPTION:
Beyond RL, CURL generalizes several frameworks in machine learning, including:
* Pure exploration [1],
* Imitation learning [2],
* Certain instances of mean-field control [3],
* Mean-field games [4],
* Risk-averse reinforcement learning [5].
The non-linearity of CURL breaks the linear structure inherent in standard RL, rendering the classical Bellman equations invalid. The theoretical performance analysis of algorithms in this general framework remains largely unexplored [6-8], and existing solutions rely on strong assumptions and require finite state and action spaces, leading to poor scalability as these spaces grow.
In this postdoctoral project, we aim to lift these restrictive assumptions and extend this line of work to parametrized state and action spaces. The main challenge will be to develop an efficient solution that adapts to the effective dimension of these spaces. We also anticipate that new research directions may emerge during the visit.
[1] E. Hazan, S. Kakade, K. Singh and A. Van Soest. "Provably Efficient Maximum Entropy Exploration". In: International Conference on Machine Learning. Vol. 97. Sept. 2019, pp. 2681-2691.
[2] J. W. Lavington, S. Vaswani and M. Schmidt. "Improved Policy Optimization for Online Imitation Learning". In: Proceedings of The 1st Conference on Lifelong Learning Agents. Ed. by S. Chandar, R. Pascanu and D. Precup. Vol. 199. Proceedings of Machine Learning Research. PMLR, 22-24 Aug 2022, pp. 1146-1173.
[3] A. Bensoussan, P. Yam and J. Frehse. Mean Field Games and Mean Field Type Control Theory. SpringerBriefs in Mathematics. Springer, 2013.
[4] P. Lavigne and L. Pfeiffer. Generalized conditional gradient and learning in potential mean field games. 2023.
[5] J. Garcia and F. Fernandez. "A Comprehensive Survey on Safe Reinforcement Learning". In: Journal of Machine Learning Research 16.42 (2015), pp. 1437-1480.
[6] B. M. Moreno, M. Bregere, P. Gaillard and N. Oudjane. "Efficient model-based concave utility reinforcement learning through greedy mirror descent". In: International Conference on Artificial Intelligence and Statistics. PMLR. 2024, pp. 2206-2214.
The research mission includes producing both theoretical and practical contributions, to be disseminated through:
- publications and presentations in machine learning or optimization conferences or journals,
- creation of Python packages
Benefits
* Subsidized meals
* Partial reimbursement of public transport costs
* Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
* Possibility of teleworking (90 days / year) and flexible organization of working hours
* Professional equipment available (videoconferencing, loan of computer equipment, etc.)
* Social, cultural and sports events and activities
* Access to vocational training
* Supplementary health insurance, subject to conditions
Job code: Information and Communication Technologies (other)
Current professional field: Information and Communication Technologies (other)
Education level: Bac+8 (doctorate)
Part-time / Full-time: Full-time
Contract type: Permanent contract (CDI)
Skills: Artificial Intelligence, Game Theory, Python (Programming Language), Libcurl, Machine Learning, Reinforcement Learning, Computer Hardware, Information Technology, English, Algorithms, Continuing Education, Control Theory, Performance Management, Event Organization, Scalability, Mathematics, Passenger Services, Postdoctoral Research, Risk Analysis, Studies and Statistics, Demonstration Skills, Videoconferencing, Imitation, Sporting Events
Email:
pierre.gaillard@inria.fr
Phone:
0139635511
Advertiser type: Direct employer