Lecturers

Each Lecturer will hold up to four lectures on one or more research topics.

Lucas Beyer

Topics

Foundation Models, Transformers, Representation Learning, Reinforcement Learning

Biography

Together with Xiaohua Zhai and Alexander Kolesnikov, I co-founded the Zürich OpenAI office, which made some news.

Before that, I was a Staff Research Scientist at Google DeepMind (formerly Brain) in Zürich, where I co-led our multimodal research effort and codebase.

I have a growing list of publications at top-tier conferences such as CVPR, NeurIPS, ICCV, … See my Google Scholar or Semantic Scholar pages for the full list of over 50.

https://lucasb.eyer.be/

https://scholar.google.com/citations?user=p2gwhK4AAAAJ&hl=fr

Lectures

Lecture 1/3 “Transformers & Vision Transformers”

Lecture 2/3 “Transformers & Vision Transformers”

Lecture 3/3 “Transformers & Vision Transformers”

Giuseppe Di Fatta

Free University of Bozen-Bolzano, Italy

Biography

Giuseppe Di Fatta has been a Full Professor at the Free University of Bozen-Bolzano (Italy) since 2022. From 2006 to 2021, he was with the University of Reading (UK), where he also served as Head of the Department of Computer Science from 2016 to 2021. Between 2004 and 2006, he was at the University of Konstanz (Germany), where he was part of the initial development team of KNIME, a widely used data science and machine learning platform. From 2000 to 2004, he worked with the High-Performance Computing and Networking Institute of the National Research Council of Italy, and in 1999 he was a research fellow at the International Computer Science Institute (ICSI) in Berkeley, California.
His research interests include artificial intelligence, machine learning algorithms, data science, and data-driven applications in both scientific and industrial domains. He has authored more than 140 peer-reviewed publications and has been a member of the IEEE since 2002 and a Fellow of the Higher Education Academy (UK) since 2009. He is also a member of the Technical Committee on Machine Learning (TC-ML) of the IEEE Systems, Man, and Cybernetics Society.

Lectures

Lecture “Multitask Deep Learning”

This lecture introduces the foundations of Multi-Task Learning (MTL), a machine learning paradigm in which multiple related tasks are learned jointly using shared representations. By allowing tasks to benefit from common information, MTL can improve generalisation, increase data efficiency, and reduce overfitting compared to learning tasks independently. The lecture will discuss the relationship between MTL and transfer learning, present examples from computer vision, natural language processing, and healthcare, and examine the role of multi-task learning in modern deep learning systems, including large language models.

Slides: ACDL 2026-Lecture

Paper: ACDL-2026-Paper-ECML-2025-SAM-GS

Arthur Gretton

UCL, UK

Topics

Generative Models, Causality, Hypothesis Testing, Machine Learning

Biography

Arthur Gretton is a Professor with the Gatsby Computational Neuroscience Unit, CSML, UCL, which he joined in 2010. He received degrees in physics and systems engineering from the Australian National University, and a PhD with Microsoft Research and the Signal Processing and Communications Laboratory at the University of Cambridge. He worked from 2002 to 2012 at the MPI for Biological Cybernetics, and from 2009 to 2010 at the Machine Learning Department, Carnegie Mellon University. Arthur’s research interests include machine learning, kernel methods, statistical learning theory, nonparametric hypothesis testing, blind source separation, Gaussian processes, and nonparametric techniques for neural data analysis. He was an Associate Editor at IEEE Transactions on Pattern Analysis and Machine Intelligence from 2009 to 2013 and has been an Action Editor for JMLR since April 2013. He was a member of the NIPS Program Committee in 2008 and 2009, a Senior Area Chair for NIPS in 2018, an Area Chair for ICML in 2011 and 2012, and a member of the COLT Program Committee in 2013. Arthur was co-chair of AISTATS in 2016 (with Christian Robert), and co-tutorials chair of ICML in 2018 (with Ruslan Salakhutdinov).

Lectures

Lecture 1/2 “Causal Effect Estimation with Context and Confounders (Part 1)”

A fundamental causal modelling task is to predict the effect of an intervention (or treatment) on an outcome, given context/covariates. Examples include predicting the effect of a medical treatment on patient health given patient symptoms and demographic information, or predicting the effect of ticket pricing on airline sales given seasonal fluctuations in demand. The problem becomes especially challenging when the treatment and context are complex (for instance, “treatment” might be a web ad design or a radiotherapy plan), and when only observational data is available (i.e., we have access to historical data, but cannot intervene or conduct trials ourselves). The challenge is greater still when the covariates are not observed, and constitute a hidden source of confounding.

I will give an overview of some practical tools and methods for estimating causal effects of complex, high dimensional treatments from observational data. The approach is based on conditional feature means, which represent conditional expectations of relevant model features. These features can be deep neural nets (adaptive, finite dimensional, learned from data), or kernel features (fixed, infinite dimensional, enforcing smoothness). The methods will be applied to modelling employment outcomes for the US Job Corps program for Disadvantaged Youth, and in policy evaluation for reinforcement learning.

Part 1 addresses the setting where all relevant information is observed (no hidden confounding), and where the aim is to predict (conditional) average causal effects from observations of data, without resorting to intervention or randomized trials.

Lecture 2/2 “Causal Effect Estimation with Context and Confounders (Part 2)”

Part 2 addresses the setting where hidden confounding is present, and can be accounted for using techniques such as instrumental variables and proxy variables.
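As a rough illustration of why hidden confounding matters and how an instrumental variable can help, the sketch below simulates a linear toy problem. It is not the lecture's method: the lectures work with conditional feature means using kernel or neural-network features, whereas this example swaps in the classical single-instrument (Wald) estimator. The data-generating process, coefficients, and variable names are all assumptions made for illustration.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)


rng = random.Random(0)
n = 20000
true_effect = 2.0  # assumed causal effect of treatment x on outcome y

# z: instrument (affects y only through x); u: hidden confounder.
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + ui + 0.5 * rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [true_effect * xi + 3.0 * ui + 0.5 * rng.gauss(0, 1) for xi, ui in zip(x, u)]

# Naive regression of y on x is biased because u drives both x and y.
ols_estimate = cov(x, y) / cov(x, x)

# Instrumental-variable (Wald) estimate: uses only the variation in x induced by z.
iv_estimate = cov(z, y) / cov(z, x)

print("naive regression estimate:", round(ols_estimate, 3))
print("instrumental-variable estimate:", round(iv_estimate, 3))
```

In this toy setting the naive regression overstates the effect, while the instrumental-variable estimate lands close to the assumed value of 2. The estimator relies on the instrument being independent of the hidden confounder and influencing the outcome only through the treatment.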

Mario R. Guarracino

University of Cassino and Southern Lazio, Italy

Lectures

Lecture 1/3 “A Short Journey through Graph Embedding Techniques Part 1”

Networks and ensembles of networks are able to capture interactions and dependencies among variables or observations, providing simple and powerful modeling of phenomena in different fields. Graph embedding involves the projection of graphs into a vector space, while retaining their structural properties. We will review several of the embedding techniques developed in recent years.

Lecture 2/3 “A Short Journey through Graph Embedding Techniques Part 2”

Graph Neural Networks (GNNs) have been developed to learn low-dimensional representations of nodes, subgraphs and graphs with complex node and edge features. These embeddings can then be used in several applications, ranging from feature extraction and graph clustering to classification models. In this lecture, we survey GNNs, with attention to their interpretability and explainability.

Lecture 3/3 “Graph Representation Learning for Agentic AI”

This talk introduces the use of Large Language Models together with Graph Neural Networks (GNNs). We will see how modern architectures transform static data into autonomous reasoning graphs, capable of deterministic, traceable, multi-hop reasoning.

Katja Hofmann

Microsoft Research Cambridge, UK

Topics

Machine Learning, Generative Models, Reinforcement Learning, Video Games

Biography

I am a Partner Research Manager at Microsoft Research Cambridge, where I co-lead the People-Centric AI research area. My work focuses on generative AI, interactive media, and game intelligence, combining advances in machine learning with human-computer interaction, design, and social science. With my team we aim to create AI systems that empower people through collaboration, creativity, and play – unlocking new forms of interaction and addressing complex real-world challenges. I am passionate about driving interdisciplinary research that shapes the future of AI experiences across productivity, entertainment, and beyond.

Previously, I led the Game Intelligence team, which focused on machine learning research for video games and now forms part of the broader People-Centric AI area.

I am proud to serve the academic research community in my current roles of Board Member (since 2022) and Secretary of the Board (since 2024) of the International Conference on Learning Representations (ICLR), and have previously served as Senior Program Chair (ICLR 2021) and General Chair (ICLR 2022).

As part of the Microsoft Research PhD Scholarship program, I have deeply enjoyed co-supervising, and successfully graduating, the following PhD students:

David Lindner (ETH Zurich, Switzerland and Microsoft Joint Research Center) – co-supervision with Andreas Krause
Rémy Portelas (Inria, Bordeaux, France) – co-supervision with Pierre-Yves Oudeyer
Steindor Saemundsson (Imperial College London, UK) – co-supervision with Marc Deisenroth
Laetitia Teodorescu (Inria, Bordeaux, France) – co-supervision with Pierre-Yves Oudeyer
Luisa Zintgraf (University of Oxford, UK) – co-supervision with Shimon Whiteson

Before joining Microsoft Research, I completed my PhD in Computer Science as part of the former ILPS group at the University of Amsterdam. I worked with Maarten de Rijke and Shimon Whiteson on smart search engines that learn directly from their users. For a list of my publications before joining MSR, please see the ILPS (Information and Language Processing Systems) list of publications, MSR Academic, or dblp.

Lectures

Lecture 1/3 “Foundations of World Models”

This lecture introduces world models as learned representations of environment dynamics, conceptualized as simulators that capture the temporal evolution of states. We will distinguish between world models as internal components of agents (e.g., supporting planning and decision-making) and world simulators as standalone generative systems (e.g., for creative uses). We will trace the historical development of the field and provide a structured overview of the architectural landscape, establishing a foundation for subsequent lectures.

Lecture 2/3 “Building a World Simulator”

This lecture covers the full pipeline for training a world simulator, from visual tokenization to large-scale generation, using games as a testbed. It examines the practical ecosystem around inference, evaluation, and controllability, and contrasts leading architectural paradigms and their trade-offs.

Lecture 3/3 “Open Challenges and Future Directions”

While recent world simulators can generate visually and temporally coherent sequences, they often lack a deeper, causally grounded understanding of the environments they model. This lecture critically examines the central open challenges in the field, including maintaining long-horizon consistency, capturing causal structure and physical plausibility, and achieving robust generalization beyond the training distribution, while outlining potential future applications that can be unlocked as the field progresses to address these challenges.

Arnulf Jentzen

University of Münster, Germany

Topics

Deep Learning, Gradient Descent Optimization Methods, Mathematical Analysis of the Gradients in Deep Learning, Adam Algorithm, Scientific Machine Learning

Biography

Prof. Arnulf Jentzen has been a presidential chair professor at the Chinese University of Hong Kong, Shenzhen since 2021 and a full professor at the University of Münster since 2019. He started his undergraduate studies in mathematics at Goethe University Frankfurt in Germany in 2004, received his diploma degree there in 2007, and completed his PhD in mathematics there in 2009. The core research topics of his research group are machine learning approximation algorithms, computational stochastics, numerical analysis for high-dimensional partial differential equations (PDEs), stochastic analysis, and computational finance. He currently serves on the editorial boards of several scientific journals such as the Annals of Applied Probability, Communications in Mathematical Sciences, the Journal of Machine Learning, the SIAM Journal on Scientific Computing, and the SIAM Journal on Numerical Analysis. In 2020 he received the Felix Klein Prize of the European Mathematical Society (EMS), and in 2022 he was awarded an ERC Consolidator Grant from the European Research Council (ERC) and the Joseph F. Traub Prize for Achievement in Information-Based Complexity. Further details on the activities of his research group can be found at the webpage http://www.ajentzen.de.

Lectures

Lecture 1/3 “Mathematical Introduction to Stochastic Gradient Descent Optimization”

In these lectures we present several selected basic results regarding the theoretical understanding of artificial intelligence (AI) methods and structures. Specifically, we first review popular stochastic optimization methods used for training AI models such as the standard stochastic gradient descent (SGD) method, the momentum method, the adaptive root mean square propagation (RMSprop) method, and the famous adaptive moment estimation (Adam) optimizer. In particular, we discuss the Adam symmetry theorem, the Adam vector field, and the Adam limit theorem, as well as convergence speeds and stability regions for different gradient based optimization methods. Thereafter, we also review the capabilities of deep neural networks (DNNs) to approximate certain high-dimensional functions such as solution functions of high-dimensional PDEs.
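For readers who have not seen the Adam update written out, the following is a minimal sketch of the commonly stated Adam rule (exponential moving averages of the gradient and its square, with bias correction) applied to a one-dimensional quadratic. The objective, starting point, step count, and hyperparameter values are assumptions chosen for illustration and are not taken from the lectures.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Run the Adam update on a scalar parameter and return the final iterate."""
    x = x0
    m = 0.0  # running average of gradients (first moment)
    v = 0.0  # running average of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        # Bias correction compensates for m and v being initialised at zero.
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Illustrative objective f(x) = (x - 3)^2 with gradient 2 * (x - 3).
x_star = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print("final iterate:", x_star)
```

Setting `beta1` to zero and dropping the bias correction recovers an RMSprop-style update, and replacing the division by `sqrt(v_hat)` with a constant gives a momentum-style method, which is one way to see how the optimizers named above relate to one another.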

Lecture 2/3 “Error Analyses for Adam and further Accelerated and Adaptive Optimizers”

Lecture 3/3 “Deep Learning for High-Dimensional Partial Differential Equations”

Panos Pardalos

University of Florida, USA

Topics

Data Science, Global Optimization, Mathematical Modeling, Financial Applications, AI

Biography

Distinguished Emeritus Professor Panos Pardalos

University of Florida

Panos Pardalos was born in Drosato (Mezilo) Argitheas, Greece, in 1954 and graduated from Athens University (Department of Mathematics). He received his PhD in Computer and Information Sciences from the University of Minnesota. He is an Emeritus Distinguished Professor in the Department of Industrial and Systems Engineering at the University of Florida, and an affiliated faculty member in the Biomedical Engineering and Computer Science & Information Engineering departments. Since 2011, he has served as the academic advisor at LATNA, HSE.

Panos Pardalos is a world-renowned leader in Global Optimization, Mathematical Modeling, Energy Systems, Financial Applications, and Data Sciences. He is a Fellow of AAAS, AAIA, AIMBE, EUROPT, and INFORMS, and was awarded the 2013 Constantin Carathéodory Prize by the International Society of Global Optimization. In addition, he was awarded the 2013 EURO Gold Medal by the Association of European Operational Research Societies. This medal is the preeminent European award given to Operations Research (OR) professionals for “scientific contributions that stand the test of time.”

Professor Pardalos was also honored with the prestigious Humboldt Research Award (2018–2019). This award is granted in recognition of a researcher’s entire body of work—fundamental discoveries, new theories, and insights that have had a significant impact on their discipline.

Furthermore, he is a member of several Academies of Sciences and holds numerous honorary PhD degrees and affiliations. He is the Founding Editor of Optimization Letters and Energy Systems, and Co-Founder of the International Journal of Global Optimization, Computational Management Science, and Springer Nature Operations Research Forum. He has published over 600 journal papers and edited or authored over 200 books. He is one of the most highly cited authors in his field and has graduated 71 PhD students to date.

Further details can be found at: https://faculty.eng.ufl.edu/pardalos/publications/

Lectures

“A New Frontier: From a Single Network to a Network of Networks”

Panos Pardalos UF & LATNA

This lecture examines the fundamental shift from isolated, monolithic systems to the expansive “Network of Networks” architecture that underpins modern global infrastructure. We move beyond traditional single-layer analysis to explore the intricate interdependencies among critical domains—for example, the Energy–Financial nexus, where real-time market signals influence grid stability, and the Transportation–Digital nexus, where autonomous logistics depend on ubiquitous communication.

Problems in networks of networks are far more complex than those in single networks. For example, in a single network, the propagation of failures can often be predicted and contained. In contrast, within a “Network of Networks,” such failures become exponentially more difficult to anticipate due to hidden interdependencies—connections that remain invisible until they trigger cascading and often unpredictable effects.

Raniero Romagnoli

Almawave Spa, Italy

Topics

LLMs, Foundation Models, AI, NLP

Biography

Raniero Romagnoli is CTO of Almawave and CEO of OBDA Systems. He is an expert in Artificial Intelligence and Natural Language Processing in both the enterprise and academic worlds. He leads the company’s technology strategy by managing research and development teams. He actively participates in numerous national and international initiatives in the field of AI by collaborating with research centers and academic institutions. He teaches advanced courses in Data Science, Machine Learning and AI and is a co-author of numerous scientific articles and international patents.

Lectures

Lecture “Building Enterprise AI Applications in Sovereign AI Ecosystem”

Abstract TBA

Michal Valko

Stealth Startup & Inria & ENS MVA, France

Topics

Large Language Models, Reasoning, Foundation Models, Fine-tuning Large Language Models, Reinforcement Learning with Human Feedback, Test-Time Computation

Biography

Michal is the Founding Researcher at a stealth startup, a tenured researcher at Inria, and a lecturer at MVA at ENS Paris-Saclay. Michal is primarily interested in designing algorithms that require as little human supervision as possible. He works on methods and settings that are able to deal with minimal feedback, such as deep reinforcement learning, bandit algorithms, self-supervised learning, or self-play. Michal has recently worked on representation learning, world models and deep (reinforcement) learning algorithms that have some theoretical underpinning. In the past he has also worked on sequential algorithms with structured decisions where exploiting the structure leads to provably faster learning. Michal is now working on a new generation of large language models (LLMs), in addition to providing algorithmic solutions for their scalable test-time inference, fine-tuning and alignment. He received his PhD in 2011 from the University of Pittsburgh, before obtaining tenure at Inria in 2012 and co-creating Google DeepMind Paris with R. Munos. In 2024, he became a Principal Llama Scientist at Meta, building the online reinforcement learning stack and research for Llama 3.

Lectures

Lecture 1/3 “Lightspeed RL Fine Tuning for LLMs”

Abstract TBA

Lecture 2/3 “Online and Offline RL Considerations for LLMs”

Abstract TBA

Lecture 3/3 “New Advances on the Theory of Language Generation and Hallucination”

Abstract TBA

Sagar Vaze

Mistral AI, France

Topics

Multimodal Models, Vision Language Models

Biography

I’m a Research Scientist at Mistral AI, working on multi-modal language models. Previously, I completed a PhD in the VGG at Oxford University, working on representation learning in computer vision, where I was fortunate to be supervised by Andrew Zisserman and Andrea Vedaldi. During my PhD, I also spent time at Meta AI (FAIR): first with Ishan Misra in New York, and then in the Segment Anything team with Ross Girshick.

https://scholar.google.com/citations?user=lvuOknUAAAAJ&hl=en

Lectures

Lecture 1/3 “Large Vision Language Models: Foundations I”

Abstract TBA

Lecture 2/3 “Large Vision Language Models: Foundations II”

Abstract TBA

Lecture 3/3 “Large Vision Language Models: Open Questions”

Abstract TBA

Jason Weston

Meta, New York, USA

New York University, USA

Topics

Artificial Intelligence, Machine Learning, Natural Language Processing, Vision

Biography

Jason is a Research Scientist at Facebook, NY and a Visiting Research Professor at NYU. He earned his Ph.D. in machine learning at Royal Holloway, University of London and at AT&T Research in Red Bank, NJ. Previously, he was a researcher at Biowulf Technologies, a research scientist at the Max Planck Institute for Biological Cybernetics, Tuebingen, Germany, a research staff member at NEC Labs America, Princeton, and a research scientist at Google, NY. His interests lie in statistical machine learning, with a focus on reasoning, memory, perception, interaction, and communication. Jason has published over 100 papers, has won Best Paper awards at ICML and ECML, and received a Test of Time Award for his work “A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning” (with Ronan Collobert). He was part of the YouTube team that won a National Academy of Television Arts & Sciences Emmy Award for Technology and Engineering for Personalized Recommendation Engines for Video Discovery. Jason was also listed as the 16th most influential machine learning scholar by AMiner and as one of the top 50 authors in Computer Science in Science.