Denis Barthou

Full professor, Bordeaux INP/ENSEIRB-MATMECA
PhD, HdR

I am a full professor of Computer Science at Bordeaux INP/ENSEIRB-MATMECA. I graduated from ENS Lyon and earned a PhD from the University of Versailles–Saint-Quentin in 1998, where I worked on dependence analysis in the polyhedral model. I then became an assistant professor at the same university. In 2009, I joined Bordeaux INP as a full professor and became a member of Inria’s Runtime team. In 2017, I created and led Inria’s Storm team, while also heading the Computer Science program of ENSEIRB-MATMECA, the Bordeaux INP engineering school. From 2023 to mid-2025, I was on leave to direct the Distributed and Parallel Research Lab of Huawei Paris. After returning, I joined Inria’s Topal team.
Over the past two years, I’ve increasingly focused on scaling AI systems, especially open-source large language models. I’ve explored parallelization techniques for both training and inference, as well as how these models behave when deployed on large cloud infrastructures built around modern AI accelerators. Along the way, I’ve gained substantial hands-on experience with a variety of heterogeneous hardware platforms, from the familiar NVIDIA GPU ecosystem to architectures such as Huawei’s Ascend NPUs, as well as multicore ARM and Intel processors. More broadly, my research has always revolved around making complex applications run faster and more efficiently. I’m interested in the full stack: optimization, parallel and distributed methods, and the design of compilation and runtime systems for AI and high-performance computing. These themes have run through my whole career, even as the computing landscape keeps evolving.



Software

MAQAO (Modular Assembly Quality Analyzer and Optimizer) is a parallel performance analysis and optimization framework that I initiated in 2004. Designed for developers and performance experts, it provides advanced capabilities for analyzing and optimizing low-level code. MAQAO has stood the test of time: it is actively maintained and extended at the University of Versailles, supported by a long-standing community of users and contributors. It has become part of the VI-HPS tool set, is listed among the EU Innovation Radar technologies, and has been featured in an AWS blog post highlighting its potential. MAQAO is distributed under the GPL license and is now used in engineer training programs as well as in large industrial environments.
AFF3CT (A Fast Forward Error Correction Toolbox) is a high-performance simulation framework dedicated to Forward Error Correction (FEC, or channel coding). Initiated during A. Cassagne’s PhD in 2017, it supports a wide range of codes, from well-established Turbo codes to modern Polar and LDPC codes. AFF3CT has grown into a vibrant, active project, maintained and developed at Sorbonne University, the IMS Lab (University of Bordeaux) and Inria. Users and developers meet every year at the AFF3CT Day, a sign of the project’s maturity and of its community’s engagement. Thanks to its performance and robustness, AFF3CT is now used in both academic research and industrial settings. It is distributed under the MIT license.
MIPP is a portable, open-source (MIT license) C++11 wrapper for SIMD intrinsic functions. It supports the SSE, AVX, AVX-512 and ARM NEON (32-bit and 64-bit) instruction sets, covering single- and double-precision floating-point operations as well as signed integer arithmetic (64, 32, 16 and 8 bits). By abstracting architecture-specific intrinsics behind a unified interface, MIPP spares developers from hand-writing SIMD code for each target: the code is written once, and the appropriate intrinsics are selected automatically at compile time. Initially developed during A. Cassagne’s PhD, MIPP continues to be actively maintained and is widely used in performance-critical applications such as AFF3CT.
PARCOACH (PARallel COntrol flow Anomaly CHecker) helps debug modern scientific applications that rely on MPI and hybrid MPI+X models (where X is typically a thread-based runtime such as OpenMP). Initiated during Emmanuelle Saillard’s PhD, PARCOACH is now actively maintained and developed by her. It combines static and dynamic analyses to detect misuse of collective operations in parallel applications, helping developers diagnose subtle communication errors that arise in large-scale systems. PARCOACH is used in the EuroHPC DEEP-SEA software stack and is cited among the state-of-the-art MPI correctness tools at recent venues such as Correctness@SC’23 and EuroMPI/Australia 2024.

Research activities
Current PhD students
► L. Sauleau: PhD directed by Pr. C. Ancourt on Enhancing Data Transfer and Performance on Heterogeneous Architectures.
► B. Priour: PhD directed by Pr. A. Tchana on Topology-adaptive optimization for distributed computing in LLMs using xOS's principles.
► V. Alba: PhD on Resource dimensioning for heterogeneous architectures.
PhD Alumni
► D. Orhan. Modeling and dynamic optimization of software radio chains on heterogeneous architectures, 2025, U. Bordeaux PhD thesis, co-directed with Pr. C. Jego.
► B. Coye (with Ubisoft). Dynamic Task Graph Scheduling by Composition, 2023, U. Bordeaux PhD thesis, co-directed with Pr. R. Namyst. Now research engineer at Ubisoft.
► V.-M. Nguyen. Compile-time Validation and Optimization of MPI Nonblocking Communications, 2022, U. Bordeaux PhD thesis, co-directed with P. Carribault (CEA). Now research engineer at Eviden.
► C. T. Ait Kaci. Static and dynamic analysis for memory access concurrency error detection in MPI-RMA applications, 2022, U. Bordeaux PhD thesis. Now scientific project manager at Capgemini.
► A. Cassagne. Optimization and parallelization methods for software-defined radio, 2020, U. Bordeaux PhD thesis, co-directed with Pr. C. Jego. Now Associate Professor at Sorbonne University, Paris.
► P. Huchant. Static Analysis and Dynamic Adaptation for Parallelism, 2019, U. Bordeaux PhD thesis. Now Senior Software Engineer at Synopsys, Bordeaux.
► H. Brunie. Optimization of data allocation for HPC applications on heterogeneous memory architectures, 2019, U. Bordeaux PhD thesis, co-directed with P. Carribault (CEA). Now Postdoc at Inria Grenoble.
► C. Haine. Kernel Optimization by Layout Restructuring, 2017, U. Bordeaux PhD thesis. Now research engineer at HPE, Switzerland.
► G. Vaumourin. Hybrid Memory Hierarchy and Dynamic Data Handling in Embedded Parallel Architectures, 2016, U. Bordeaux PhD thesis. Now research engineer at ATOS, Grenoble.
► E. Saillard. Static/dynamic/iterative analyses for validation and improvement of multi-models HPC applications, 2015, U. Bordeaux PhD thesis. Now Inria Researcher.
► B. Putigny. Benchmark-driven Approaches to Performance Modeling of Multi-core architectures, 2014, U. Bordeaux PhD thesis. Now HPC engineer at Eviden, Bordeaux.
► S. Henry. Programming Models and Runtime Systems for Heterogeneous Architectures, 2013, U. Bordeaux PhD thesis. Now engineer at IOHK.
► L. Duchateau. Automatic Algorithm Derivation and Exploration in Linear Algebra for Parallelism and Locality, 2013, UIUC PhD thesis, co-directed with Pr. D. Padua. Now senior software engineer at Pure Storage, Bellevue, USA.
► A. Mazouz. Une Étude Empirique des Performances des Applications OpenMP sur les Plateformes Multi-cœurs (An Empirical Study of the Performance of OpenMP Applications on Multicore Platforms), 2012, UVSQ PhD thesis, co-directed with Pr. S.-A. Touati. Now senior software engineer at Intel, Paris.
► A. Charif-Rubial. On code performance analysis and optimization for multicore architectures, 2012, UVSQ PhD thesis, co-directed with Pr. W. Jalby. In memoriam.
► J. Jaeger. Source-to-source transformations for irregular and multithreaded code optimization, 2012, UVSQ PhD thesis. Now research engineer at CEA.
► P. De Oliveira Castro Herrero. Expression and optimization of data reorganizations on data flow parallelism, 2010, UVSQ PhD thesis. Now Professor, HDR, at Paris-Saclay University, Versailles Saint-Quentin-en-Yvelines.
► S. Donadio. Iterative optimization of performance libraries by hierarchical division of codes, 2007, UVSQ PhD thesis, directed by and co-advised with Pr. W. Jalby. Now Architect/Product manager at Bloomberg, New York, USA and adjunct professor at Columbia Engineering.
► C. Alias. Program Optimization by Template Recognition and Replacement, 2005, UVSQ PhD thesis, directed by and co-advised with P. Feautrier. Now Inria Researcher, HDR, and chief scientific advisor of XtremLogic.
Postdoc
► Lilia Ziane Khodja (2014), Modeling of parallel HPC applications running on platforms composed of modern multicore nodes interconnected with high performance networks, with B. Goglin. Now consultant at ANEO.
Engineers
► P. Virouleau, working with ATOS on the PARCOACH project (2022-2024). Now permanent research engineer at Inria.
► M. Makni, working on the H2020 Microcard project (2022-2023). Now research engineer at Lytid.
► C. Sakka, working on the ANR Exacard project (2021-2022). Now engineer at ANEO.
► K. He, working on AFF3CT (2018-2019), with A. Cassagne and O. Aumage. Now engineer at IHU Liryc.
► A. Cassagne, working on optimizing Error Correcting Codes (2015-2016), with B. Le Gal (IMS), C. Leroux (IMS) and O. Aumage. Now Associate Professor at Sorbonne University, Paris.
► J. Tombi A Mba, working on MAQAO for Arm (2014-2015), with O. Aumage. Now senior software engineer at BePatient.
► T. Meunier, working on performance analysis for vectorization and data restructuring (2013), with O. Aumage.

Publications
Patent
Devices and Methods for Optimizing Parallel Execution of a Machine Learning Model. Lamprou Ioannis, Zhang Zhen, Filhol Etienne, and Barthou Denis. 106-539-523-079-778, Dec 2024. Pending.