Denis Barthou

Industrial & Community Impact

I have worked with industry through several kinds of projects (European projects, ANR grants, direct industrial projects). For example, the 2022 projects are listed in the Storm Inria scientific annual report.

Patents and innovation:

I am co-inventor on two international patent applications with Huawei, both addressing core challenges in the efficient execution of parallel and AI workloads. They are currently active and pending under the PCT framework.

WO2024/255997 — Optimizing Parallel Execution of Machine Learning Models. This patent covers methods and systems for improving the parallel execution of machine learning models through optimized scheduling and execution strategies on heterogeneous hardware platforms.
WO2025/194432 — Processing Computational Graphs for Task Scheduling. This patent introduces techniques for analyzing and transforming computational graphs to improve runtime scheduling and execution efficiency for complex task-based and parallel applications.

Publications cited in patents

My research has been cited in over 30 active international patents, including filings from major technology companies such as Google, IBM, Intel, Qualcomm, Microsoft, Oracle, Reservoir Labs and Xilinx. The number of citations has grown significantly in recent years, reflecting industrial relevance in AI systems, compiler technology, heterogeneous computing, and performance engineering.

Software cited in patents

Beyond publications, the open-source software tools I contributed to have also been cited directly in patent filings:

MAQAO — cited in patents from IBM and Xilinx, in the context of performance analysis, compilation, and hardware-aware optimization.
AFF3CT — cited in a patent from MIT, U. of Columbia, and U. Ireland Maynooth, related to forward error-correction and simulation frameworks.

Logos are shown for identification purposes only.

Large-scale AI systems (selected experience)

I have contributed to AI system projects spanning the full lifecycle, from research and prototyping to deployment on large-scale production infrastructures, including customer environments involving tens of thousands of accelerators. This work covered performance modeling, parallelization strategies, and system-level optimization for real-world constraints.

Software & Tools

I have directly contributed to the MAQAO and AFF3CT software tools, which now have active user communities in both industry and academia.

MAQAO

Modular Assembly Quality Analyzer and Optimizer is a parallel performance analysis and optimization framework that I initiated in 2004. Designed for developers and performance experts, it provides advanced capabilities for analyzing and optimizing low-level code. MAQAO has stood the test of time: it is actively maintained and extended at the University of Versailles, supported by a long-lasting community of users and contributors. It has become part of the VI-HPS Institute, is listed among the EU Innovation Radar technologies, and has been featured in an AWS blog highlighting its potential.

GitLab

Web page

AFF3CT

A Fast Forward Error Correction Toolbox is a high-performance simulation framework dedicated to Forward Error Correction (FEC, or channel coding). It was initiated during A. Cassagne’s PhD in 2017 and supports a wide range of codes, from well-established Turbo codes to modern Polar and LDPC codes. AFF3CT has grown into a vibrant, active community spanning academia and industry, and is maintained and developed at Sorbonne University, IMS Lab/Bordeaux University and Inria. Users and developers gather annually during the AFF3CT Day, illustrating the maturity and engagement around the project. Thanks to its performance and robustness, AFF3CT is now used in both academic research and industrial settings.

GitHub

Web page

I have also contributed to other software tools through PhD supervision and research projects (e.g., MIPP, PARCOACH), focusing on SIMD abstraction and MPI correctness checking. These projects are actively maintained by their respective communities.

People, Mentoring & Leadership

I regularly supervise PhD students in collaboration with academic and industrial partners. Please get in touch if you are interested.

Current PhD students

► L. Sauleau: PhD directed by Pr. C. Ancourt on Enhancing Data Transfer and Performance on Heterogeneous Architectures.

► B. Priour: PhD directed by Pr. A. Tchana on Topology-adaptive optimization for distributing computing in LLM using xOS's principles.

PhD Alumni

► V. Alba. Resource dimensioning for heterogeneous architectures, 2025, U. Bordeaux PhD thesis.

► D. Orhan. Modeling and dynamic optimization of software radio chains on heterogeneous architectures, 2025, U. Bordeaux PhD thesis, co-directed with C. Jego.

► B. Coye (with Ubisoft). Dynamic Task Graph Scheduling by Composition, 2023, U.Bordeaux PhD thesis, co-directed with Pr. R.Namyst. Now research engineer at Ubisoft.

► V.-M. Nguyen. Compile-time Validation and Optimization of MPI Nonblocking Communications, 2022, U.Bordeaux PhD thesis, co-directed with P. Carribault (CEA). Now research engineer at Eviden.

► C. T. Ait Kaci. Static and dynamic analysis for memory access concurrency error detection in MPI-RMA applications, 2022, U. Bordeaux PhD thesis. Now scientific project manager at Capgemini.

► A. Cassagne. Optimization and parallelization methods for software-defined radio, 2020, U. Bordeaux PhD thesis, co-directed with Pr. C.Jego. Now Ass. Professor at Paris Sorbonne University

► P. Huchant. Static Analysis and Dynamic Adaptation for Parallelism, 2019, U. Bordeaux PhD thesis. Now Senior Software Engineer at Synopsys, Bordeaux.

► H. Brunie. Optimization of data allocation for HPC applications on heterogeneous memory architectures, 2019, U. Bordeaux PhD thesis, co-directed with P. Carribault (CEA). Now Postdoc at Inria Grenoble.

► C. Haine. Kernel Optimization by Layout Restructuring, 2017, U. Bordeaux PhD thesis. Now research engineer at HPE, Switzerland.

► G. Vaumourin. Hybrid Memory Hierarchy and Dynamic Data Handling in Embedded Parallel Architectures, 2016, U. Bordeaux PhD thesis. Now research engineer at ATOS, Grenoble.

► E. Saillard. Static/dynamic/iterative analyses for validation and improvement of multi-models HPC applications, 2015, U. Bordeaux PhD thesis. Now Inria Researcher.

► B. Putigny. Benchmark-driven Approaches to Performance Modeling of Multi-core architectures, 2014, U. Bordeaux PhD thesis. Now HPC engineer at Eviden, Bordeaux.

► S. Henry. Programming Models and Runtime Systems for Heterogeneous Architectures, 2013, U. Bordeaux PhD thesis. Now engineer at IOHK.

► L. Duchateau. Automatic Algorithm Derivation and Exploration in Linear Algebra for Parallelism and Locality, 2013, UIUC PhD thesis, co-directed with Pr. D. Padua. Now senior software engineer at Pure Storage, Bellevue, USA.

► A. Mazouz. Une Etude Empirique des Performances des Applications OpenMP sur les Plateformes Multi-coeurs, 2012, UVSQ PhD thesis, co-directed with Pr. S.-A. Touati. Now senior software engineer at Intel, Paris.

► A. Charif-Rubial. On code performance analysis and optimization for multicore architectures, 2012, UVSQ PhD thesis, co-directed with Pr. W. Jalby. In memoriam.

► J. Jaeger. Source-to-source transformations for irregular and multithreaded code optimization, 2012, UVSQ PhD thesis. Now Research engineer at CEA.

► P. De Oliveira Castro Herrero. Expression and optimization of data reorganizations on data flow parallelism, 2010, UVSQ PhD thesis. Now Professor, HDR, at Paris-Saclay University, Versailles St Quentin en Yvelines.

► S. Donadio. Iterative optimization of performance libraries by hierarchical division of codes, 2007, UVSQ PhD thesis, directed by and co-advised with Pr. W. Jalby. Now Architect/Product manager at Bloomberg, New York, USA and adjunct professor at Columbia Engineering.

► C. Alias. Program Optimization by Template Recognition and Replacement, 2005, UVSQ PhD thesis, directed by and co-advised with P. Feautrier. Now Inria Researcher, HDR and chief scientific advisor of XtremLogic

Postdoc

► Lilia Ziane Khodja (2014), Modeling of parallel HPC applications running on platforms composed by modern multicore nodes interconnected with high performance networks, with B.Goglin. Now Consultant at ANEO;

Engineers

► P. Virouleau, working with ATOS on Parcoach project (2022-2024). Now permanent research engineer at Inria.

► M. Makni, working on H2020 Microcard project (2022-2023). Now research engineer at Lytid.

► C. Sakka, working on ANR Exacard project (2021-2022). Now engineer at ANEO.

► K. He, working on AFF3CT (2018-2019), with A. Cassagne and O. Aumage. Now engineer at IHU Liryc.

► A. Cassagne, working on optimizing Error Correcting Codes (2015-2016), with B. Le Gal (IMS), C. Leroux (IMS) and O. Aumage. Now Ass. Professor at Paris Sorbonne University;

► J. Tombi A Mba, working on MAQAO for Arm (2014-2015), with O. Aumage. Now senior software engineer at BePatient;

► T. Meunier, working on performance analysis for vectorization and data restructuring (2013), with O. Aumage;

Leadership through Coaching

In addition to technical mentoring, I have experience training HR, engineering and research managers through leadership development programs, specifically leadership through coaching. This includes training sessions focused on technical leadership and team dynamics in high-technology environments.

Selected Publications

My complete list of publications can be found on ORCID.

Optimal scheduling algorithms for software-defined radio pipelined and replicated task chains on multicore architectures, Diane Orhan, Laércio Lima Pilla, Denis Barthou, Adrien Cassagne, Olivier Aumage, Romain Tajan, Christophe Jégo, Camille Leroux. In Journal of Parallel and Distributed Computing, vol 204, 2025 [ DOI | http ]
This paper presents an optimal and greedy scheduling method for streams processed by chains of tasks, combining replication (for parallelism) and pipelining under a limited number of cores. It maximizes the throughput of the stream.
PolyTOPS: Reconfigurable and Flexible Polyhedral Scheduler. Gianpietro Consolaro, Zhen Zhang, Harenome Razanajato, Nelson Lossing, Nassim Tchoulak, Adilla Susungi, Artur Cesar Araujo Alves, Renwei Zhang, Denis Barthou, Corinne Ancourt, and Cédric Bastoul. IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 2024. [ DOI | http ]
PolyTOPS is a configurable, tunable polyhedral scheduler, applied to both compute intensive and AI codes. The configurability allows auto-tuning.
Exploring scheduling algorithms for parallel task graphs: a modern game engine case study. Mustapha Regragui, Baptiste Coye, Laércio Lima Pilla, Raymond Namyst, and Denis Barthou. In International European Conference on Parallel and Distributed Computing (Euro-Par), 2022. [DOI | .pdf ]
This paper describes how task scheduling methods can impact game engine performance. This was evaluated on Ubisoft game engines.
AFF3CT: A Fast Forward Error Correction Toolbox! Adrien Cassagne, Olivier Hartmann, Mathieu Leonardon, Kun He, Camille Leroux, Romain Tajan, Olivier Aumage, Denis Barthou, Thibaud Tonnellier, Vincent Pignoly, Bertrand Le Gal, and Christophe Jego. SoftwareX, 2019. [ DOI | .pdf ]
This is the reference paper for AFF3CT, proposing an efficient framework for implementing error-correcting codes in software-defined radios.
Adaptive Partitioning for Iterated Sequences of Irregular OpenCL Kernels. Pierre Huchant, Denis Barthou, and Marie-Christine Counilh. In SBAC-PAD - 30th International Symposium on Computer Architecture and High Performance Computing, 2018. [ DOI | .pdf ]
This paper proposes a method that automatically splits a kernel and distributes it across multiple GPUs. Provided the kernel is executed repeatedly, dynamic and efficient load balancing is performed, leading to optimal performance, for instance on physics simulations such as n-body simulation.
Rewriting System for Profile-Guided Data Layout Transformations on Binaries. Olivier Aumage, Christopher Haine, and Denis Barthou. In Int. European Conference on Parallel and Distributed Computing, 2017. [ bib | DOI | .pdf ]
This paper proposes a method based on MAQAO that identifies the access patterns to the data (SoA, AoS, ...) and makes it possible to evaluate the performance impact of changing this layout. The transformation is performed directly on the assembly code, giving the programmer a quick estimate before an expensive rewrite of the data layout.
PARCOACH: Combining static and dynamic validation of MPI collective communications. Emmanuelle Saillard, Patrick Carribault, and Denis Barthou. International Journal of High Performance Computing Applications, 2014. [ DOI | .pdf ]
PARCOACH statically analyzes MPI collectives and adds code to prevent deadlocks wherever they may arise, identifying the root cause of the deadlock.
Hydra: Automatic algorithm exploration from linear algebra equations. Alexandre Duchâteau, David A. Padua, and Denis Barthou. In IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 2013. [ DOI ]
Hydra takes a mathematical equation on matrices and decomposes it recursively in order to create parallel tasks. It enables the automatic production of efficient solvers requiring very little or no coding at all and delivering performance approximating that of highly tuned library routines such as Intel’s MKL.
Performance Tuning of x86 OpenMP Codes with MAQAO. Denis Barthou, Andres Charif Rubial, William Jalby, Souad Koliai, and Cédric Valensi. In Tools for High Performance Computing, 2009. [ DOI | .pdf ]
This article describes how, by analyzing the binary code, MAQAO can help with performance engineering and debugging on multicore architectures.
Fuzzy Array Dataflow Analysis. Jean-Francois Collard, Denis Barthou, and Paul Feautrier. In ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), 1995. [ DOI ]
This paper presents a new method that extends the polyhedral model to non-affine constraints, in particular for dependence analysis. It is the core of my PhD.

Full professor | AI Systems · Parallel Computing · Performance Engineering