Hi! PARIS Summer School

Keynotes

Keynote 3

Speaker(s):

Joan
BRUNA
Abstract:

The pace of progress of large-scale machine learning keeps increasing, towards ever bigger models and datasets, producing astonishing results along the way in data-heavy domains such as text or images. Such rapid progress also leaves our mathematical understanding further behind, to the extent that one wonders whether it will ever catch up.

In this talk, we will raise salient questions about this trend while zooming-in on technical snippets, covering approximation properties of transformers, mathematical aspects of score-based diffusion generative models, and optimization aspects of learning semi-parametric models.

Requirements:

Keynote 1

Speaker(s):

Aristides
GIONIS
Abstract:

Online social networks are widely used nowadays by people to engage in conversations about a variety of topics. Over time, these discussions can have a significant impact on people's opinions. In this talk we present an overview of models that have been proposed in the literature to capture how information spreads and how opinions form in online social media. One of our objectives is to obtain a better understanding of adverse social phenomena, such as increased polarization and the creation of filter bubbles. We then present some of the computational challenges that have arisen recently in this domain. In particular, we discuss mediation strategies for maximizing the diversity of the content users see, via recommendations and feed prioritization, in order to reduce polarization. Finally, we study the question of whether an adversary can sow disagreement in a social network by influencing the opinions of a small set of users.

Requirements:

Keynote 2

Speaker(s):

Balaji
PADMANABHAN
Abstract:

This talk will first present an overview of Artificial Intelligence over the years leading to where we are today, and use that historical context to discuss some of the successes and failures we have seen along the way. With this backdrop, we will set the stage for thinking about designing for “augmented intelligence”, not just artificial intelligence, where we tackle more complex business and societal problems (than image recognition, say) using a combination of data, algorithms and people. In addition to providing an overview of some important recent work in this context, we will present a complex-systems perspective on this issue, and show how such a perspective can be useful to design, develop, evaluate and refine newer augmented intelligence methods going forward.

Requirements:

Tutorials

All tutorials are 3 hours long and organized in two parallel tracks, held at the same time of day.
Tutorial Track A – Data Science for Business and Society
Tutorial Track B – Theory and Methods of AI
Tutorial 1A

Speaker(s):

Johan
HOMBERT
Abstract:

This tutorial includes a short lecture followed by an interactive game in which participants play the role of a FinTech lender. Context: Banks and insurers increasingly use alternative data and machine learning to screen consumers and price products. For example, a FinTech using digital footprints to predict default will have a competitive edge over traditional banks. However, there are important pitfalls to avoid when using alternative data and machine learning to score consumers, such as the winner’s curse, the risk of discrimination and the Lucas critique. This tutorial and its interactive game provide an introduction to these issues.
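
The winner's curse mentioned above can be illustrated with a short simulation (a hypothetical toy setup, not the tutorial's actual game): when two lenders score the same applicants with noisy default estimates, the lender with the more optimistic estimate wins the loan, so realized default risk systematically exceeds the winning estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # loan applicants

# True default probabilities, unobserved by the lenders.
true_pd = rng.uniform(0.01, 0.20, size=n)

# Two competing lenders each observe a noisy estimate of the default risk.
est_a = np.clip(true_pd + rng.normal(0, 0.03, n), 0, 1)
est_b = np.clip(true_pd + rng.normal(0, 0.03, n), 0, 1)

# The lender with the lower estimate quotes the lower rate and wins the loan.
winner_est = np.minimum(est_a, est_b)

# Winner's curse: conditional on winning, the estimate is optimistic,
# so realized default risk exceeds the winning estimate on average.
print(f"mean winning estimate:    {winner_est.mean():.4f}")
print(f"mean true default prob.:  {true_pd.mean():.4f}")
```

Even though each lender's estimate is unbiased on its own, selecting the minimum of the two makes the portfolio of won loans look safer than it really is.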

Requirements:

Multivariate statistical analysis, in particular OLS / logit regressions, and/or machine learning methods

Tutorial 2A

Speaker(s):

Winston
MAXWELL
Abstract:

How will Europe’s future AI regulation impact the design, testing and use of AI applications such as credit scoring, recruitment algorithms, anti-fraud algorithms and facial recognition? We will explore how AI concepts such as explainability, fairness, accuracy, robustness and human oversight will be implemented into the future regulation, and how the regulation compares to other international standards on trustworthy AI. The course will focus on two concrete use cases, facial recognition and credit scoring, to see how the European regulatory framework would apply throughout the lifecycle of the project, walking students through the process of creating a risk management system, including an impact assessment on potential risks for safety and fundamental rights, developing a list of requirements, testing, performance parameters, documentation, and human oversight mechanisms. We’ll explore the potential friction between the European AI Act and other regulatory frameworks such as the European General Data Protection Regulation (GDPR), and lead a debate on how the future regulation will impact AI innovation and research in Europe.

Requirements:

TBD

Tutorial 3A

Speaker(s):

Mitali
BANERJEE
Abstract:

This 3-hour module will offer a hands-on introduction to deep-learning-based image recognition tools. Participants will gain familiarity with preparing and importing images into software (Python) and applying one of the foundational deep learning architectures to classify the images and create vector representations. We will discuss different applications of the output of deep learning tools for extracting managerial and scientific insights. In particular, the course will discuss applications of these tools to creating large-scale measures of quantities that have otherwise proven elusive to measure or susceptible to measurement bias.

Requirements:
  1. Basic knowledge of linear algebra is helpful but not required
  2. Basic knowledge of python (e.g. libraries such as pandas and numpy) is helpful but not required.
  3. Basic familiarity with standard regression OLS models. You should be familiar with what it means to estimate relationships between variables using OLS models.
  4. A Gmail account is required to open the Google Colab notebooks, which will be shared before the class.

Tutorial 4A

Speaker(s):

Klaus
MILLER
Abstract:

We will discuss the impact of privacy regulation on the online advertising market, focusing specifically on the case of the European Union’s General Data Protection Regulation (GDPR). Participants of this tutorial will learn:

  1. Why and how the GDPR impacts the online advertising market, particularly advertisers, publishers and users.
  2. How advertisers and publishers leverage users’ personal data to pursue their goals.
  3. Which aspects of the GDPR are most relevant for advertisers, publishers and users.
  4. How complex it is to go through the process of obtaining user permission for personal data processing, and how IAB’s Transparency and Consent Framework (TCF) intends to help.
  5. How many firms a publisher provides with access to its users’ data, and how long it takes a user to respond to all permission requests.
  6. Which developments are taking place with regard to personal data processing, among players in the online advertising industry as well as among regulators and consumer protection agencies.

Anyone interested in learning how and why the online advertising industry benefits from using personal data, and how the GDPR impacts this practice, should attend this tutorial. The tutorial is based on the book “The Impact of the General Data Protection Regulation (GDPR) on the Online Advertising Market”, available completely for free at www.gdpr-impact.com.

Requirements:
  • Reading Chapter 1 and Chapter 2 of the referenced book available at gdpr-impact.com
  • Installed Version of Base R and R Studio for the Empirical Analysis of Cookie Data

Tutorial 5A

Speaker(s):

Julien
GRAND-CLEMENT
Abstract:

The goal of this tutorial is to understand how uncertainty impacts classical decision-making models and the operational and business consequences. Any decision model that is data-driven may face uncertainty due to errors in the data, in the modeling assumptions, or due to the inherent randomness of the decision process. Overlooking this uncertainty may lead to decisions that are suboptimal, unreliable, or, in some crucial applications, practically infeasible and dangerous for the users. In this tutorial, we will learn to (1) estimate the uncertainty given a decision problem and a dataset, and (2) mitigate the impact of uncertainty with a robust approach. As an application, a robust portfolio management problem will be investigated in detail, though we will see that the problem of uncertainty arises in many (if not most) real decision settings.

This tutorial is structured as follows:

  1. How to estimate the uncertainty in a decision model?
    • Motivating examples: what is the practical impact of uncertainty?
      • Wrong image classifications, variability in demand for supply chains, artificial intelligence in healthcare, Tesla self-driving, robotics, maintenance, inventory optimization, facility location, project management, etc.
      • Introduction of the running example: portfolio management.
    • Understanding the origin of the uncertainty: poor data, little data, uncertainty inherent to the application. When do we need to take it into account?
    • Risk-sensitive decisions vs. parameter uncertainty.
    • How to estimate the uncertainty? Examples with simulations in Colab on synthetic data for the portfolio management problem.
  2. How to mitigate the impact of uncertainty in practice? Robust portfolio management.
    • Deterministic approach: pessimism in parameter estimates.
    • Robust and distributionally robust approaches: how to obtain decisions with good performance guarantees.
    • Evidence from simulations in Colab: trade-offs between nominal and worst-case performance for the portfolio management problem. How to deal with variability?
    • (Time permitting) Two-stage decision-making: how to act when uncertainty is revealed over time?
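
As a minimal illustration of the robust approach (a sketch with made-up numbers, not the tutorial's Colab material), consider long-only portfolio selection under box uncertainty on the estimated mean returns: the robust decision simply optimizes against the pessimistic end of each estimate.

```python
import numpy as np

# Estimated mean returns for 4 assets, and the estimation error bound
# (half-width of a box uncertainty set around each estimate).
mu_hat = np.array([0.08, 0.12, 0.10, 0.06])
delta  = np.array([0.01, 0.07, 0.03, 0.01])

def best_long_only(returns):
    """Maximize expected return over long-only, fully invested portfolios:
    put all weight on the asset with the highest (given) return."""
    w = np.zeros_like(returns)
    w[np.argmax(returns)] = 1.0
    return w

# Nominal decision: trust the point estimates.
w_nominal = best_long_only(mu_hat)

# Robust decision: optimize against the worst-case mu in the box
# [mu_hat - delta, mu_hat + delta]; for long-only weights the worst
# case is attained at mu_hat - delta.
w_robust = best_long_only(mu_hat - delta)

def worst_case(w):
    return float(w @ (mu_hat - delta))

print("nominal worst-case return:", worst_case(w_nominal))
print("robust  worst-case return:", worst_case(w_robust))
```

The nominal portfolio chases the highest point estimate, which here is also the most uncertain asset; the robust portfolio sacrifices some nominal return in exchange for a better guaranteed outcome.
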
Requirements:

Basic knowledge of statistics (means, confidence intervals, quantiles). Knowing linear programming is a plus. For the simulations, all code will be in Python, and a Colab notebook will be available for the participants, with some pre-coded examples.

Tutorial 6A

Speaker(s):

Aluna
WANG
Abstract:

Risk management encompasses the identification, analysis, and response to risk factors arising over the life of a business. Recognizing patterns and detecting anomalies in big data can be critical to effective risk management. While numerous technologies for spotting anomalies in collections of multi-dimensional data points have been developed in recent years, anomaly detection techniques for structured graph data have lately become a focus. Why do we need to use graph-based approaches to anomaly detection? What are some of the high-impact applications of graph-based anomaly detection in risk management? How can we develop and deploy graph-based anomaly detection techniques for financial transaction data? This short course answers the above questions by introducing two general, scalable, and explainable anomaly detection models, with a focus on the use of graphs and the minimum description length (MDL) principle. The course also discusses how to deploy these techniques and use them for risk management.
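
As a much simpler illustration than the MDL-based models covered in the course, a first graph-based anomaly score can be built from node degrees alone (a toy sketch; real transaction graphs call for the richer structural models discussed above):

```python
import numpy as np

# Toy undirected graph as an adjacency matrix: a ring of 8 nodes plus
# one node (index 8) connected to everyone (a "star" anomaly, e.g. an
# account transacting with unusually many counterparties).
n = 9
A = np.zeros((n, n), dtype=int)
for i in range(8):
    A[i, (i + 1) % 8] = A[(i + 1) % 8, i] = 1  # ring edges
A[8, :8] = A[:8, 8] = 1                        # hub node

deg = A.sum(axis=1)
# Simple structural anomaly score: z-score of node degree.
z = (deg - deg.mean()) / deg.std()
anomalies = np.where(z > 2)[0]
print("degrees:      ", deg)
print("flagged nodes:", anomalies)
```

The hub node stands out clearly against the near-regular ring; MDL-based methods generalize this intuition by flagging subgraphs that are expensive to encode under a model of "normal" structure.
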

Requirements:

Basic knowledge of Python and Jupyter Notebook

Tutorial 1B

Speaker(s):

Rémi
FLAMARY
Abstract:

This tutorial aims at presenting the mathematical theory of optimal transport (OT) and providing a global view of the potential applications of this theory in machine learning, signal and image processing and biomedical data processing. The first part of the tutorial will present the theory of optimal transport and the optimization problems through the original formulation of Monge and the Kantorovitch formulation in the primal and dual. The algorithms used to solve these problems will be discussed and the problem will be illustrated on simple examples. We will also introduce the OT-based Wasserstein distance and the Wasserstein barycenters that are fundamental tools in data processing of histograms. Finally we will present recent developments in regularized OT that bring efficient solvers and more robust solutions.

The second part of the tutorial will present numerous recent applications of OT in the field of machine learning and signal processing and biomedical imaging. We will see how the mapping inherent to optimal transport can be used to perform domain adaptation and transfer learning. Finally we will discuss the use of OT on empirical datasets with applications in generative adversarial networks, unsupervised learning and processing of structured data such as graphs.

Requirements:

TBD

Tutorial 2B

Speaker(s):

Alexandre
GRAMFORT
Abstract:

Understanding how the brain works in healthy and pathological conditions is considered one of the major challenges of the 21st century. After the first electroencephalography (EEG) measurements in 1929, the 1990s saw the birth of modern functional brain imaging, with the first functional MRI (fMRI) and full-head magnetoencephalography (MEG) systems. Today, new tech companies are developing consumer-grade devices for at-home recording of neural activity. By noninvasively offering unique insights into the living brain, these technologies have started to revolutionize both clinical and cognitive neuroscience.

The availability of such new devices, made possible by pioneering breakthroughs in physics and engineering, now poses major computational and statistical challenges, for which machine learning plays a major role. In this course you will discover hands-on the types of data one can collect to record the living brain. You will then learn about state-of-the-art supervised machine learning approaches for EEG signals in the clinical context of sleep stage classification, as well as brain-computer interfaces. The ML techniques explored are based on deep learning as well as Riemannian geometry, which has proven very powerful for classifying EEG data. You will work with MNE-Python (https://mne.tools), which has become a reference tool for processing MEG/EEG/sEEG/ECoG data in Python, as well as the scikit-learn library (https://scikit-learn.org). For the deep learning aspect you will use the Braindecode package (https://braindecode.org), based on PyTorch. The teaching will be hands-on, using Jupyter notebooks and public datasets, which you will be able to work with using Google Colab.
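
To give a flavor of the Riemannian-geometry idea on synthetic data (a NumPy-only sketch; the course itself uses MNE-Python, scikit-learn and Braindecode on real recordings), one can map each trial's spatial covariance matrix into the log-Euclidean tangent space and classify by distance to the class means:

```python
import numpy as np

rng = np.random.default_rng(0)

def logm_spd(C):
    """Matrix logarithm of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return (V * np.log(w)) @ V.T

def make_trials(scale, n_trials=40, n_ch=4, n_samp=200):
    """Synthetic 'EEG' trials: channel 0 variance differs across classes."""
    X = rng.normal(size=(n_trials, n_ch, n_samp))
    X[:, 0, :] *= scale
    return X

def cov_features(X):
    """Per-trial spatial covariance mapped to the log-Euclidean tangent space."""
    return np.array([logm_spd(x @ x.T / x.shape[1]).ravel() for x in X])

# Fit: class means in the tangent space.
train0 = cov_features(make_trials(1.0))
train1 = cov_features(make_trials(3.0))
mean0, mean1 = train0.mean(axis=0), train1.mean(axis=0)

def predict(X):
    F = cov_features(X)
    d0 = np.linalg.norm(F - mean0, axis=1)
    d1 = np.linalg.norm(F - mean1, axis=1)
    return (d1 < d0).astype(int)

test_X = np.concatenate([make_trials(1.0, 20), make_trials(3.0, 20)])
labels = np.array([0] * 20 + [1] * 20)
acc = (predict(test_X) == labels).mean()
print("accuracy:", acc)
```

Working in the tangent space of the SPD manifold, rather than on raw covariance entries, is the core trick that makes Riemannian pipelines so competitive on EEG.
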

Finally this tutorial will be a unique opportunity to see what ML can offer beyond standard applications like computer vision, speech or NLP.

Requirements:

TBD

Tutorial 3B

Speaker(s):

Isabelle
BLOCH
Abstract:

The tutorial will review a few methods for symbolic AI, for knowledge representation and reasoning, and show how they can be combined with learning approaches for image understanding. Examples in medical image understanding will illustrate the talk.

Requirements:

TBD

Tutorial 4B

Speaker(s):

Corentin
TALLEC
Abstract:

Be it on Atari games, Go, Chess, StarCraft II or Dota, Deep Reinforcement Learning (DRL) has opened up reinforcement learning to a variety of large-scale applications. While it could formally appear as a straightforward extension of reinforcement learning to deep-learning-based function approximation, DRL often involves more than simply plugging the newest deep learning architecture into the best theoretical reinforcement learning method. In this tutorial, we will journey through the recent history of DRL, from the now seminal Neural Fitted Q-iteration to the most popular Deep Q-Network (DQN). Alongside the lecture, the practical session will revolve around implementing and testing DRL algorithms in JAX and Haiku on simple environments.
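
The practical session will use JAX and Haiku; as a warm-up, the core Q-learning update behind fitted-Q and DQN can be sketched in tabular form on a toy chain environment (an illustration, not the tutorial's code):

```python
import numpy as np

rng = np.random.default_rng(0)

# A 5-state chain: actions 0 (left) and 1 (right); reward 1 only when
# reaching the rightmost state, which is terminal.
N_STATES, GOAL = 5, 4

def step(s, a):
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == GOAL else 0.0
    return s2, r, s2 == GOAL

Q = np.zeros((N_STATES, 2))
gamma, alpha, eps = 0.9, 0.5, 0.2

for _ in range(500):                      # episodes
    s, done = 0, False
    while not done:
        # Epsilon-greedy behavior policy.
        a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        # Q-learning target: r + gamma * max_a' Q(s', a').
        target = r + (0.0 if done else gamma * Q[s2].max())
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2

greedy = Q.argmax(axis=1)
print("greedy policy:", greedy)   # 1 = move right
```

DQN keeps exactly this target but replaces the table with a neural network, adding a replay buffer and a target network to stabilize training.
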

Requirements:

Tutorial 5B

Speaker(s):

Geoffroy
PEETERS
Abstract:

As in many fields, deep neural networks have allowed important advances in the processing of audio signals.
In this tutorial, we review the specificities of these signals, elements of audio signal processing (as used in the traditional machine-learning approach), and how deep neural networks (in particular convolutional ones) can be used to perform feature learning, either without prior knowledge (1D convolutions, TCNs) or using prior knowledge (source/filter models, auto-regressive models, HCQT, SincNet, DDSP).
We then review the dominant DL architectures, meta-architectures and training paradigms (classification, metric learning, supervised, unsupervised, self-supervised, semi-supervised) used in audio.
We illustrate the use of these for some key applications in music and environmental sound processing: sound event detection, localization, auto-tagging, source separation and generation.
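
As a minimal example of the traditional signal-processing front end mentioned above (a NumPy sketch, not course material), a magnitude spectrogram is just windowed frames passed through an FFT:

```python
import numpy as np

def stft_mag(x, n_fft=512, hop=128):
    """Magnitude spectrogram: Hann-windowed frames -> |rFFT|."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))   # (n_frames, n_fft//2 + 1)

# A 440 Hz tone sampled at 16 kHz: energy should concentrate near
# bin 440 / (16000 / 512) ≈ 14.
sr = 16000
t = np.arange(sr) / sr                 # 1 second of signal
x = np.sin(2 * np.pi * 440 * t)
S = stft_mag(x)
print("spectrogram shape:", S.shape)
print("peak frequency bin:", int(S.mean(axis=0).argmax()))
```

Such time-frequency representations are the typical input to convolutional audio models; the learned front ends cited above (SincNet, DDSP, etc.) replace or constrain this fixed transform.
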

Requirements:

TBD

Tutorial 6B

Speaker(s):

Krikamol
MUANDET
Abstract:

Data-driven decision-making tools have become increasingly prevalent in society today, with applications in critical areas like health care, economics, education, and the justice system. To ensure reliable decisions, it is essential that the models learn from data the genuine causal relationships, rather than mere correlations, between the outcomes and the decision variables. In this tutorial, I will first give an introduction to the causal inference problem from a machine learning perspective, including causal discovery, treatment effect estimation, instrumental variables (IV), and proxy variables. Then, I will review recent developments in how we can leverage machine learning (ML) methods, especially modern kernel methods, to tackle some of these problems.
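
To make the instrumental-variable idea concrete (a NumPy toy example, not the kernel-based methods covered in the talk): when a regressor is confounded, ordinary least squares is biased, but an instrument that moves the regressor while being independent of the confounder recovers the causal effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Structural model: y = 2*x + u, but x is correlated with the unobserved
# confounder u, so OLS of y on x is biased. z is a valid instrument:
# it moves x and is independent of u.
u = rng.normal(size=n)
z = rng.normal(size=n)
x = 0.8 * z + 0.6 * u + rng.normal(size=n)   # endogenous regressor
y = 2.0 * x + u

# OLS (biased upward here because cov(x, u) > 0).
beta_ols = (x @ y) / (x @ x)

# IV estimator (ratio of covariances; equivalent to two-stage least
# squares with a single instrument and mean-zero variables).
beta_iv = (z @ y) / (z @ x)

print("OLS estimate:", round(beta_ols, 3))
print("IV  estimate:", round(beta_iv, 3))
```

The OLS estimate absorbs part of the confounder's effect, while the IV estimate concentrates around the true causal coefficient of 2; kernel IV methods extend this logic to nonlinear structural functions.
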

Requirements:

TBD

Social Time

Social time is an opportunity for summer school participants to interact with each other. Participants attending the event physically can chat with others over a real cup of coffee. Virtual attendees can interact with other participants and professors in the “Hi! PARIS Summer School Space” in Gather Town, which offers room for attendees to move around our virtual premises, chat at private tables or view posters presented at the summer school. Attendees can even arrange a meeting in advance and meet up in Gather Town to discuss research ideas!

Get in Touch

Pr. Gaël RICHARD

Executive Director

contact@hi-paris.fr

Phone

+33 (0)1 75 31 96 60

Copyright © 2022 • Hi! Paris • All rights reserved