Lectures

Lecture 1/3 “Transformers & Vision Transformers”

Lecture 2/3 “Transformers & Vision Transformers”

Lecture 3/3 “Transformers & Vision Transformers”

Lecture “Multitask Deep Learning”

This lecture introduces the foundations of Multi-Task Learning (MTL), a machine learning paradigm in which multiple related tasks are learned jointly using shared representations. By allowing tasks to benefit from common information, MTL can improve generalisation, increase data efficiency, and reduce overfitting compared to learning tasks independently. The lecture will discuss the relationship between MTL and transfer learning, present examples from computer vision, natural language processing, and healthcare, and examine the role of multi-task learning in modern deep learning systems, including large language models.

Slides: ACDL 2026-Lecture

Paper: ACDL-2026-Paper-ECML-2025-SAM-GS

Giuseppe Di Fatta

Lecture 1/2 “Causal Effect Estimation with Context and Confounders (Part 1)”

A fundamental causal modelling task is to predict the effect of an intervention (or treatment) on an outcome, given context/covariates. Examples include predicting the effect of a medical treatment on patient health given patient symptoms and demographic information, or predicting the effect of ticket pricing on airline sales given seasonal fluctuations in demand. The problem becomes especially challenging when the treatment and context are complex (for instance, “treatment” might be a web ad design or a radiotherapy plan), and when only observational data is available (i.e., we have access to historical data, but cannot intervene or conduct trials ourselves). The challenge is greater still when the covariates are not observed, and constitute a hidden source of confounding.

I will give an overview of some practical tools and methods for estimating causal effects of complex, high-dimensional treatments from observational data. The approach is based on conditional feature means, which represent conditional expectations of relevant model features. These features can be deep neural nets (adaptive, finite-dimensional, learned from data), or kernel features (fixed, infinite-dimensional, enforcing smoothness). The methods will be applied to modelling employment outcomes for the US Job Corps program for Disadvantaged Youth, and to policy evaluation for reinforcement learning.

Part 1 addresses the setting where all relevant information is observed (no hidden confounding), and where the aim is to predict (conditional) average causal effects from observations of data, without resorting to intervention or randomized trials.
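
As a rough illustration of the fully observed setting, the sketch below contrasts a naive treated-versus-untreated comparison with an estimate that adjusts for the observed context. It uses plain stratification over a discrete context, which is a much simpler stand-in for the conditional feature mean methods covered in the lecture, not those methods themselves. The records, the function names `naive_effect` and `adjusted_effect`, and the numbers are hypothetical and chosen only to make the confounding visible.

```python
from collections import defaultdict

# Hypothetical observational records: (context x, treatment t, outcome y).
# Units with x = 1 are treated more often and have lower outcomes overall,
# so x confounds the relationship between t and y.
data = (
    [(0, 0, 1.0)] * 8 + [(0, 1, 2.0)] * 2 +
    [(1, 0, -1.0)] * 2 + [(1, 1, 0.0)] * 8
)


def mean(values):
    return sum(values) / len(values)


def naive_effect(records):
    """Difference in mean outcome between treated and untreated, ignoring x."""
    treated = [y for _, t, y in records if t == 1]
    control = [y for _, t, y in records if t == 0]
    return mean(treated) - mean(control)


def adjusted_effect(records):
    """Average over contexts of the within-context treated/control difference.

    Assumes x is fully observed and that every context contains both treated
    and untreated units (overlap); otherwise a stratum mean is undefined.
    """
    groups = defaultdict(lambda: {0: [], 1: []})
    for x, t, y in records:
        groups[x][t].append(y)
    n = len(records)
    effect = 0.0
    for by_t in groups.values():
        weight = (len(by_t[0]) + len(by_t[1])) / n  # empirical P(x)
        effect += weight * (mean(by_t[1]) - mean(by_t[0]))
    return effect


print("naive:", naive_effect(data))
print("adjusted:", adjusted_effect(data))
```

In this toy data the treatment raises the outcome by one unit within each context, yet the naive comparison comes out slightly negative because treated units are concentrated in the low-outcome context.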

Lecture 2/2 “Causal Effect Estimation with Context and Confounders (Part 2)”

Part 2 addresses the setting where hidden confounding is present, and can be accounted for using techniques such as instrumental variables and proxy variables.

Arthur Gretton

Lecture 1/3 “A Short Journey through Graph Embedding Techniques Part 1”

Networks and ensembles of networks are able to capture interactions and dependencies among variables or observations, providing simple and powerful models of phenomena in different fields. Graph embedding involves the projection of graphs into a vector space while retaining their structural properties. We will review several of the embedding techniques developed in recent years.

Lecture 2/3 “A Short Journey through Graph Embedding Techniques Part 2”

Graph Neural Networks (GNNs) have been developed to learn low-dimensional representations of nodes, subgraphs and graphs with complex node and edge features. These embeddings can then be used in several applications, ranging from feature extraction and graph clustering to classification models. In this lecture, we survey GNNs, with attention to their interpretability and explainability.

Lecture 3/3 “Graph Representation Learning for Agentic AI”

This talk introduces the use of Large Language Models together with Graph Neural Networks (GNNs). We will see how modern architectures transform static data into autonomous reasoning graphs, capable of deterministic, traceable, multi-hop reasoning.

Mario R. Guarracino

Lecture 1/3 “Foundations of World Models”

This lecture introduces world models as learned representations of environment dynamics, conceptualized as simulators that capture the temporal evolution of states. We will distinguish between world models as internal components of agents (e.g., supporting planning and decision-making) and world simulators as standalone generative systems (e.g., for creative uses). We will trace the historical development of the field and provide a structured overview of the architectural landscape, establishing a foundation for subsequent lectures.

Lecture 2/3 “Building a World Simulator”

This lecture covers the full pipeline for training a world simulator, from visual tokenization to large-scale generation, using games as a testbed. It examines the practical ecosystem around inference, evaluation, and controllability, and contrasts leading architectural paradigms and their trade-offs.

Lecture 3/3 “Open Challenges and Future Directions”

While recent world simulators can generate visually and temporally coherent sequences, they often lack a deeper, causally grounded understanding of the environments they model. This lecture critically examines the central open challenges in the field, including maintaining long-horizon consistency, capturing causal structure and physical plausibility, and achieving robust generalization beyond the training distribution, while outlining potential future applications that can be unlocked as the field progresses to address these challenges.

Katja Hofmann

Lecture 1/3 “Mathematical Introduction to Stochastic Gradient Descent Optimization”

In these lectures we present selected basic results regarding the theoretical understanding of artificial intelligence (AI) methods and structures. Specifically, we first review popular stochastic optimization methods used for training AI models such as the standard stochastic gradient descent (SGD) method, the momentum method, the adaptive root mean square propagation (RMSprop) method, and the famous adaptive moment estimation (Adam) optimizer. In particular, we discuss the Adam symmetry theorem, the Adam vector field, and the Adam limit theorem, as well as convergence speeds and stability regions for different gradient-based optimization methods. Thereafter, we also review the capabilities of deep neural networks (DNNs) to approximate certain high-dimensional functions such as solution functions of high-dimensional PDEs.
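
For orientation, the four optimizers named above can be written as one-line update rules on a scalar parameter. The sketch below is only an illustration under stated assumptions: the toy least-squares objective, the data, the hyperparameters, the helper names `stochastic_grad` and `run`, and the exponential-moving-average form of momentum are all choices made here, not taken from the lectures, and it does not touch the theoretical results (symmetry, vector field, limit theorem) that the lectures address.

```python
import math
import random

# Toy data (an assumption for illustration). The loss is the average of
# 0.5 * (theta - x)^2 over the data, so its minimiser is the mean, 3.0.
DATA = [1.0, 2.0, 3.0, 4.0, 5.0]


def stochastic_grad(theta, rng):
    """Gradient of 0.5 * (theta - x)^2 at one randomly drawn data point x."""
    return theta - rng.choice(DATA)


def run(step, state, theta=0.0, n_steps=3000, seed=0):
    """Apply step(theta, grad, state, t) n_steps times; return the final theta."""
    rng = random.Random(seed)
    for t in range(1, n_steps + 1):
        theta = step(theta, stochastic_grad(theta, rng), state, t)
    return theta


def sgd(theta, g, state, t, lr=0.01):
    return theta - lr * g


def momentum(theta, g, state, t, lr=0.01, beta=0.9):
    # Exponential-moving-average form of momentum; other conventions exist.
    state["m"] = beta * state.get("m", 0.0) + (1 - beta) * g
    return theta - lr * state["m"]


def rmsprop(theta, g, state, t, lr=0.01, rho=0.9, eps=1e-8):
    # Scale the step by a running root-mean-square of past gradients.
    state["v"] = rho * state.get("v", 0.0) + (1 - rho) * g * g
    return theta - lr * g / (math.sqrt(state["v"]) + eps)


def adam(theta, g, state, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    # First and second moment estimates with bias correction.
    state["m"] = beta1 * state.get("m", 0.0) + (1 - beta1) * g
    state["v"] = beta2 * state.get("v", 0.0) + (1 - beta2) * g * g
    m_hat = state["m"] / (1 - beta1 ** t)
    v_hat = state["v"] / (1 - beta2 ** t)
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps)


results = {f.__name__: run(f, {}) for f in (sgd, momentum, rmsprop, adam)}
for name, theta in results.items():
    print(name, round(theta, 3))
```

With a constant learning rate, each method ends up fluctuating around the minimiser rather than converging to it exactly; the size of those fluctuations depends on the step size and on how each rule rescales the gradient.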

Lecture 2/3 “Error Analyses for Adam and further Accelerated and Adaptive Optimizers”

Lecture 3/3 “Deep Learning for High-Dimensional Partial Differential Equations”

Arnulf Jentzen

“A New Frontier: From a Single Network to a Network of Networks”

Panos Pardalos UF & LATNA

https://faculty.eng.ufl.edu/pardalos/publications/

This lecture examines the fundamental shift from isolated, monolithic systems to the expansive “Network of Networks” architecture that underpins modern global infrastructure. We move beyond traditional single-layer analysis to explore the intricate interdependencies among critical domains—for example, the Energy–Financial nexus, where real-time market signals influence grid stability, and the Transportation–Digital nexus, where autonomous logistics depend on ubiquitous communication.

Problems in networks of networks are far more complex than those in single networks. For example, in a single network, the propagation of failures can often be predicted and contained. In contrast, within a “Network of Networks,” such failures become exponentially more difficult to anticipate due to hidden interdependencies—connections that remain invisible until they trigger cascading and often unpredictable effects.