Lectures



Lecture 1/3 “Transformers & Vision Transformers”
Lecture 2/3 “Transformers & Vision Transformers”
Lecture 3/3 “Transformers & Vision Transformers”


Lecture “Multitask Deep Learning”

Abstract TBA



Lecture 1/2 “Causal Effect Estimation with Context and Confounders (Part 1)”
A fundamental causal modelling task is to predict the effect of an intervention (or treatment) on an outcome, given context/covariates. Examples include predicting the effect of a medical treatment on patient health given patient symptoms and demographic information, or predicting the effect of ticket pricing on airline sales given seasonal fluctuations in demand. The problem becomes especially challenging when the treatment and context are complex (for instance, “treatment” might be a web ad design or a radiotherapy plan), and when only observational data is available (i.e., we have access to historical data, but cannot intervene or conduct trials ourselves). The challenge is greater still when the covariates are not observed, and constitute a hidden source of confounding.
I will give an overview of some practical tools and methods for estimating causal effects of complex, high-dimensional treatments from observational data. The approach is based on conditional feature means, which represent conditional expectations of relevant model features. These features can be deep neural nets (adaptive, finite-dimensional, learned from data) or kernel features (fixed, infinite-dimensional, enforcing smoothness). The methods will be applied to modelling employment outcomes for the US Job Corps program for Disadvantaged Youth, and to policy evaluation for reinforcement learning.
Part 1 addresses the setting where all relevant information is observed (no hidden confounding), and where the aim is to predict (conditional) average causal effects from observational data, without resorting to intervention or randomized trials.
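
As a rough illustration of the Part 1 setting (all confounders observed), the sketch below estimates an average treatment effect by regression adjustment on synthetic data. This is a deliberately simple linear stand-in, not the conditional-feature-mean method of the lecture; the data-generating process, the variable names, and the true effect of 2.0 are all assumptions made for the example.

```python
import numpy as np

def ate_regression_adjustment(x, t, y):
    """Coefficient on t in the least-squares fit y ~ 1 + t + x.

    Interpretable as an average treatment effect only under the
    assumptions of this sketch: linear outcome, no hidden confounding.
    """
    design = np.column_stack([np.ones_like(t), t, x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

def naive_difference(t, y):
    """Difference of mean outcomes between treated and untreated units."""
    return y[t == 1].mean() - y[t == 0].mean()

# Synthetic observational data (hypothetical): the covariate x drives
# both the treatment assignment and the outcome, so it is a confounder.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
t = (x + rng.normal(size=n) > 0).astype(float)
y = 2.0 * t + 3.0 * x + rng.normal(size=n)   # true effect of t is 2.0

ate_adjusted = ate_regression_adjustment(x, t, y)
ate_naive = naive_difference(t, y)
print(f"adjusted: {ate_adjusted:.2f}, naive: {ate_naive:.2f}")
```

The naive difference of means is biased upward here because treated units tend to have larger x; adjusting for the observed covariate removes that bias in this linear toy model.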
Lecture 2/2 “Causal Effect Estimation with Context and Confounders (Part 2)”
A fundamental causal modelling task is to predict the effect of an intervention (or treatment) on an outcome, given context/covariates. Examples include predicting the effect of a medical treatment on patient health given patient symptoms and demographic information, or predicting the effect of ticket pricing on airline sales given seasonal fluctuations in demand. The problem becomes especially challenging when the treatment and context are complex (for instance, “treatment” might be a web ad design or a radiotherapy plan), and when only observational data is available (i.e., we have access to historical data, but cannot intervene or conduct trials ourselves). The challenge is greater still when the covariates are not observed, and constitute a hidden source of confounding.
I will give an overview of some practical tools and methods for estimating causal effects of complex, high-dimensional treatments from observational data. The approach is based on conditional feature means, which represent conditional expectations of relevant model features. These features can be deep neural nets (adaptive, finite-dimensional, learned from data) or kernel features (fixed, infinite-dimensional, enforcing smoothness). The methods will be applied to modelling employment outcomes for the US Job Corps program for Disadvantaged Youth, and to policy evaluation for reinforcement learning.
Part 2 addresses the setting where hidden confounding is present and can be accounted for using techniques such as instrumental variables and proxy variables.
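
To make the instrumental-variable idea concrete, here is a minimal two-stage least squares sketch on synthetic data with a hidden confounder. It is a classical linear textbook estimator, not the feature-based method of the lecture; the variables `u` (hidden confounder), `z` (instrument), and the true effect of 2.0 are assumptions of the example.

```python
import numpy as np

def ols_slope(regressor, target):
    """Slope of the least-squares fit target ~ 1 + regressor."""
    design = np.column_stack([np.ones_like(regressor), regressor])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef[1]

def two_stage_least_squares(z, t, y):
    """Linear IV estimate of the effect of t on y using instrument z."""
    # Stage 1: predict the treatment from the instrument only.
    design_z = np.column_stack([np.ones_like(z), z])
    stage1, *_ = np.linalg.lstsq(design_z, t, rcond=None)
    t_hat = design_z @ stage1
    # Stage 2: regress the outcome on the predicted treatment.
    return ols_slope(t_hat, y)

# Synthetic data (hypothetical): u affects both t and y but is never
# given to the estimators; z affects y only through t.
rng = np.random.default_rng(0)
n = 50_000
u = rng.normal(size=n)
z = rng.normal(size=n)
t = z + u + rng.normal(size=n)
y = 2.0 * t + 3.0 * u + rng.normal(size=n)   # true effect of t is 2.0

effect_ols = ols_slope(t, y)
effect_iv = two_stage_least_squares(z, t, y)
print(f"OLS: {effect_ols:.2f}, 2SLS: {effect_iv:.2f}")
```

Ordinary least squares absorbs the influence of the hidden confounder into the treatment coefficient, while the instrument isolates variation in the treatment that is independent of it.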


Lecture 1/3 “A Short Journey through Graph Embedding Techniques Part 1”

Networks and ensembles of networks are able to capture interactions and dependencies among variables or observations, providing simple and powerful modeling of phenomena in different fields. Graph embedding involves the projection of graphs into a vector space, while retaining their structural properties. We will review several of the embedding techniques developed in recent years.
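
One classical way to embed the nodes of a graph in a vector space is a spectral embedding based on the graph Laplacian. The sketch below is only an illustration of that single technique, not a summary of the lecture; the toy graph (two triangles joined by one edge) and the function name `spectral_embedding` are assumptions of the example.

```python
import numpy as np

def spectral_embedding(adjacency, dim):
    """Embed nodes using eigenvectors of the unnormalized graph Laplacian.

    Returns an (n_nodes, dim) array built from the eigenvectors with the
    smallest non-zero eigenvalues (assumes a connected, undirected graph).
    """
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigvecs = np.linalg.eigh(laplacian)
    # Skip the first eigenvector, which is constant on a connected graph.
    return eigvecs[:, 1:dim + 1]

# Toy graph: triangle {0, 1, 2} and triangle {3, 4, 5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = np.zeros((6, 6))
for i, j in edges:
    adjacency[i, j] = adjacency[j, i] = 1.0

embedding = spectral_embedding(adjacency, dim=1)
print(embedding[:, 0])
```

In this one-dimensional embedding the two triangles land on opposite sides of zero, so the coordinate retains the community structure of the graph.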

Lecture 2/3 “A Short Journey through Graph Embedding Techniques Part 2”

Graph Neural Networks (GNNs) have been developed to learn low-dimensional representations of nodes, subgraphs and graphs with complex node and edge features. These embeddings can then be used in several applications, ranging from feature extraction and graph clustering to classification models. In this lecture, we survey GNNs, also in the light of their interpretability and explainability.
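
As a small illustration of how a GNN turns node features into low-dimensional embeddings, the sketch below implements one graph convolution layer in the style of the well-known GCN propagation rule (symmetric normalization of the adjacency matrix with self-loops). The toy path graph, the random untrained weights, and the mean pooling into a graph-level vector are assumptions of the example, not material from the lecture.

```python
import numpy as np

def gcn_layer(adjacency, features, weights):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adjacency + np.eye(adjacency.shape[0])      # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))       # degrees are >= 1
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weights, 0.0)

# Toy path graph 0-1-2-3 with 3 input features per node (hypothetical).
rng = np.random.default_rng(0)
adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=float)
features = rng.normal(size=(4, 3))
weights = rng.normal(size=(3, 2))       # untrained, for illustration only

node_embeddings = gcn_layer(adjacency, features, weights)   # shape (4, 2)
graph_embedding = node_embeddings.mean(axis=0)              # shape (2,)
```

Each node's embedding mixes its own features with those of its neighbours; stacking such layers widens the neighbourhood each node can see.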

Lecture 3/3 “Graph Representation Learning for Agentic AI”

This talk introduces the use of Large Language Models with Graph Neural Networks (GNNs). We will see how modern architectures transform static data into autonomous reasoning graphs, capable of deterministic, traceable, multi-hop reasoning.



Lecture 1/3 “Foundations of World Models”

This lecture introduces world models as learned representations of environment dynamics, conceptualized as simulators that capture the temporal evolution of states. We will distinguish between world models as internal components of agents (e.g., supporting planning and decision-making) and world simulators as standalone generative systems (e.g., for creative uses). We will trace the historical development of the field and provide a structured overview of the architectural landscape, establishing a foundation for subsequent lectures.

Lecture 2/3 “Building a World Simulator”

This lecture covers the full pipeline for training a world simulator, from visual tokenization to large-scale generation, using games as a testbed. It examines the practical ecosystem around inference, evaluation, and controllability, and contrasts leading architectural paradigms and their trade-offs.

Lecture 3/3 “Open Challenges and Future Directions”

While recent world simulators can generate visually and temporally coherent sequences, they often lack a deeper, causally grounded understanding of the environments they model. This lecture critically examines the central open challenges in the field, including maintaining long-horizon consistency, capturing causal structure and physical plausibility, and achieving robust generalization beyond the training distribution, while outlining potential future applications that can be unlocked as the field progresses to address these challenges.



Lecture 1/3 “Mathematical Introduction to Stochastic Gradient Descent Optimization”

In these lectures we present several selected basic results regarding the theoretical understanding of artificial intelligence (AI) methods and structures. Specifically, we first review popular stochastic optimization methods used for training AI models such as the standard stochastic gradient descent (SGD) method, the momentum method, the adaptive root mean square propagation (RMSprop) method, and the famous adaptive moment estimation (Adam) optimizer. In particular, we discuss the Adam symmetry theorem, the Adam vector field, and the Adam limit theorem, as well as convergence speeds and stability regions for different gradient-based optimization methods. Thereafter, we also review the capabilities of deep neural networks (DNNs) to approximate certain high-dimensional functions such as solution functions of high-dimensional PDEs.
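
For orientation, the update rules of the four optimizers named above can be written in one common textbook form. The notation is an assumption of this sketch and may differ from the lecturer's: $\theta_t$ are the parameters, $g_t$ is the stochastic gradient at step $t$, $\gamma$ is the learning rate, $\alpha, \beta, \beta_1, \beta_2 \in [0,1)$ are decay rates, $\varepsilon > 0$ is a small constant, $m_0 = v_0 = 0$, and squares, square roots and divisions act componentwise.

```latex
\begin{align*}
\text{SGD:}      \quad & \theta_t = \theta_{t-1} - \gamma\, g_t, \\[4pt]
\text{Momentum:} \quad & m_t = \alpha\, m_{t-1} + (1-\alpha)\, g_t, \qquad
                         \theta_t = \theta_{t-1} - \gamma\, m_t, \\[4pt]
\text{RMSprop:}  \quad & v_t = \beta\, v_{t-1} + (1-\beta)\, g_t^{2}, \qquad
                         \theta_t = \theta_{t-1} - \gamma\, \frac{g_t}{\sqrt{v_t} + \varepsilon}, \\[4pt]
\text{Adam:}     \quad & m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
                         v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^{2}, \\
                       & \hat m_t = \frac{m_t}{1-\beta_1^{\,t}}, \qquad
                         \hat v_t = \frac{v_t}{1-\beta_2^{\,t}}, \qquad
                         \theta_t = \theta_{t-1} - \gamma\, \frac{\hat m_t}{\sqrt{\hat v_t} + \varepsilon}.
\end{align*}
```

Read this way, Adam combines the momentum average $m_t$ with the RMSprop scaling $v_t$ and adds a bias correction for the zero initialization of both averages.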

Lecture 2/3 “Error Analyses for Adam and further Accelerated and Adaptive Optimizers”

In these lectures we present several selected basic results regarding the theoretical understanding of artificial intelligence (AI) methods and structures. Specifically, we first review popular stochastic optimization methods used for training AI models such as the standard stochastic gradient descent (SGD) method, the momentum method, the adaptive root mean square propagation (RMSprop) method, and the famous adaptive moment estimation (Adam) optimizer. In particular, we discuss the Adam symmetry theorem, the Adam vector field, and the Adam limit theorem, as well as convergence speeds and stability regions for different gradient-based optimization methods. Thereafter, we also review the capabilities of deep neural networks (DNNs) to approximate certain high-dimensional functions such as solution functions of high-dimensional PDEs.

Lecture 3/3 “Deep Learning for High-Dimensional Partial Differential Equations”

In these lectures we present several selected basic results regarding the theoretical understanding of artificial intelligence (AI) methods and structures. Specifically, we first review popular stochastic optimization methods used for training AI models such as the standard stochastic gradient descent (SGD) method, the momentum method, the adaptive root mean square propagation (RMSprop) method, and the famous adaptive moment estimation (Adam) optimizer. In particular, we discuss the Adam symmetry theorem, the Adam vector field, and the Adam limit theorem, as well as convergence speeds and stability regions for different gradient-based optimization methods. Thereafter, we also review the capabilities of deep neural networks (DNNs) to approximate certain high-dimensional functions such as solution functions of high-dimensional PDEs.



“A New Frontier: From a Single Network to a Network of Networks”

Panos Pardalos UF & LATNA

https://faculty.eng.ufl.edu/pardalos/publications/

This lecture examines the fundamental shift from isolated, monolithic systems to the expansive “Network of Networks” architecture that underpins modern global infrastructure. We move beyond traditional single-layer analysis to explore the intricate interdependencies among critical domains—for example, the Energy–Financial nexus, where real-time market signals influence grid stability, and the Transportation–Digital nexus, where autonomous logistics depend on ubiquitous communication.

Problems in networks of networks are far more complex than those in single networks. For example, in a single network, the propagation of failures can often be predicted and contained. In contrast, within a “Network of Networks,” such failures become exponentially more difficult to anticipate due to hidden interdependencies—connections that remain invisible until they trigger cascading and often unpredictable effects.



Lecture “Building Enterprise AI Applications in Sovereign AI Ecosystem”

Abstract TBA



Lecture 1/3 “Lightspeed RL Fine Tuning for LLMs”

Abstract TBA

Lecture 2/3 “Online and Offline RL Considerations for LLMs”

Abstract TBA

Lecture 3/3 “New Advances on the Theory of Language Generation and Hallucination”

Abstract TBA



Lecture 1/3 “Large Vision Language Models: Foundations I”

Abstract TBA

Lecture 2/3 “Large Vision Language Models: Foundations II”

Abstract TBA

Lecture 3/3 “Large Vision Language Models: Open Questions”

Abstract TBA



Lecture 1/4 “Self-Improving Language Models Part 1”

Abstract TBA

Lecture 2/4 “Self-Improving Language Models Part 2”

Abstract TBA

Lecture 3/4 “Self-Improving Agents”

Abstract TBA

Lecture 4/4 “The Future of Self-Improvement & the Promise of Co-Improving AI”

Abstract TBA





Tutorial “When Classical Meets Generative: Mixed Methods for Practical LLM Evaluation – Part 1”

Abstract TBA

Tutorial “When Classical Meets Generative: Mixed Methods for Practical LLM Evaluation – Part 2”

Abstract TBA


