Description:
System Development & Optimization: Design, implement, and maintain the infrastructure and core components for large-scale generative models (e.g., text-to-video, virtual try-on architectures). Focus on optimizing model training and inference pipelines for efficiency and scalability (e.g., using frameworks such as PyTorch or TensorFlow, and distributed training).
Data & Pipeline Management: Develop robust data ingestion and processing pipelines for multi-modal data (text prompts, garment images, 3D body parameters). Ensure the integrity and accessibility of research datasets.
MLOps and Deployment: Facilitate the transition of research prototypes into stable, reusable code modules. Implement version control, automated testing, and documentation to support rapid iteration and collaboration within the research team.
Tooling and Infrastructure: Provide development support for research experiments, including setting up and managing cloud/GPU compute resources, developing visualization tools, and maintaining a high-quality, reproducible code environment.
Evaluation System Implementation: Engineer the software infrastructure for the comprehensive evaluation framework, automating quantitative metrics (e.g., FID, LPIPS, temporal coherence measures) and integrating tools for collecting and analyzing perceptual user study data.
Research & Development Objectives
The candidate will actively contribute to the following research objectives:
Architecture Development: Design and implement novel generative architectures that unify text-based scene control with fine-grained garment preservation. This involves combining the flexibility of text-to-video models with precision garment conditioning.
Fidelity and Coherence: Develop and integrate components (e.g., specialized attention mechanisms, diffusion model conditioning) to ensure that virtual garments retain their texture, shape, and material properties while maintaining coherence in lighting, shading, and motion across the generated scene.
Realistic Interaction Modeling: Research and implement techniques to model dynamic garment response to human-environment interactions (e.g., sitting, object handling, movement). Explore methods for integrating 3D body parameters and physical constraints into the generation process for enhanced realism.
Multi-Modal Conditioning: Lead the implementation of advanced multi-modal conditioning strategies, effectively leveraging text prompts, garment images, and body parameters simultaneously to steer the generation process.
Personal Research: The role includes dedicated time for the candidate to propose and lead self-directed research threads that align with or extend the project's core goals, resulting in publications and/or patentable technology.
Main activities
The Research Engineer will drive the technical development and implementation of advanced generative models, providing essential engineering support for a leading research project while pursuing independent research contributions.
Technical Scope & Implementation
The role involves the engineering execution of a multi-stage technical roadmap, requiring expertise in deep learning frameworks and system architecture design:
System Architecting: Design and optimize scalable architectures that unify text-to-video (T2V) models with fine-grained visual conditioning (e.g., virtual try-on, object control).
Conditioning Integration: Develop and implement multi-modal conditioning techniques, utilizing text, image, and body parameters simultaneously to control generation fidelity.
Dynamic Modeling: Engineer modules to incorporate physics-aware or learned priors to model realistic garment and object deformation in response to human pose changes and environmental interactions.
Coherence Mechanisms: Implement cross-attention and control networks, alongside advanced loss functions (e.g., temporal regularization), to ensure lighting, texture, and motion consistency across video sequences.
Evaluation Framework Development: Build and maintain an automated evaluation suite to benchmark the proposed framework using both standard quantitative metrics (e.g., FID) and tools for conducting perceptual user studies.
Infrastructure Support: Provide robust MLOps support, including pipeline development, optimization for distributed training, and maintenance of a high-quality, reproducible research codebase.
Research Contribution
The candidate is expected to contribute original research ideas and technical solutions to key challenges, leading to publications and technical innovations within the domain of generative human-centric video.
Job code: Video Editor (m/f)
Current professional field: Image and Sound Technicians
Education level: Bac+5 (Master's degree or equivalent)
Part-time / Full-time: Full-time
Contract type: Fixed-term contract (CDD)
Skills: Artificial Intelligence, Computer Vision, Test Automation, Data Visualization, Machine Learning, Systems Development Life Cycle, TensorFlow, Software Systems, PyTorch, Large Language Models, Deep Learning, Software Version Control, English, Research, Architecture, Architectural Design, Editing, Experimentation, Infrastructure Management, Physical Sciences, Deformation, Robotics Design and Implementation
Email:
stephane.lathuiliere@inria.fr
Phone:
0139635511
Advertiser type: Direct employer