Session descriptions

Keynotes

Keynote 1

Speaker(s): 

Aristides
GIONIS

Abstract: 

Online social networks are widely used nowadays by people to engage in conversations about a variety of topics. Over time, these discussions can have a significant impact on people's opinions. In this talk we present an overview of models that have been proposed in the literature to capture how information spreads and how opinions form in online social media. One of our objectives is to obtain a better understanding of adverse social phenomena, such as the increase in polarization and the creation of filter bubbles. We then present some of the computational challenges that have arisen recently in this domain. In particular, we discuss mediation strategies for maximizing the diversity of content that users see via recommendations and feed prioritization, in order to reduce polarization. Finally, we study the question of whether an adversary can sow disagreement in a social network by influencing the opinions of a small set of users.
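As a concrete, minimal illustration of the kind of opinion-formation models this literature builds on, here is a sketch (ours, not taken from the talk) of the classical DeGroot averaging model, in which each user repeatedly replaces their opinion with the average of their own and their neighbors' opinions:

```python
# Minimal sketch of the classical DeGroot opinion-dynamics model:
# at every step, each user replaces their opinion with the average
# of their own opinion and their neighbors' opinions.

def degroot_step(opinions, neighbors):
    return [
        sum(opinions[j] for j in [i] + neighbors[i]) / (1 + len(neighbors[i]))
        for i in range(len(opinions))
    ]

def degroot(opinions, neighbors, steps=50):
    for _ in range(steps):
        opinions = degroot_step(opinions, neighbors)
    return opinions

# A 3-user path network with polarized endpoints: repeated averaging
# drives everyone toward an interior consensus.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
print(degroot([0.0, 0.5, 1.0], neighbors))
```

Richer models in this literature (e.g. Friedkin–Johnsen) add a stubbornness term that keeps users anchored to their innate opinions, which is one way polarization is formalized.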

Requirements: 

Documents: 

Keynote 2

Speaker(s): 

Balaji
PADMANABHAN

Abstract: 

This talk will first present an overview of Artificial Intelligence over the years leading to where we are today, and use that historical context to discuss some of the successes and failures we have seen. With this backdrop, we will set the stage for thinking about designing for “augmented intelligence”, not just artificial intelligence, where the goal is to tackle more complex business and societal problems (than image recognition, say) using a combination of data, algorithms and people. In addition to providing an overview of some important recent work in this context, we will present a complex-systems perspective on this issue and show how such a perspective can be useful for designing, developing, evaluating and refining newer augmented intelligence methods going forward.

Requirements: 

Documents: 

Keynote 3

Speaker(s): 

Joan
BRUNA

Abstract: 

The pace of progress of large-scale machine learning keeps increasing, towards ever bigger models and datasets, producing astonishing results along the way in data-heavy domains such as text or images. Such rapid progress also leaves our mathematical understanding further behind, to the extent that one wonders whether it will ever catch up.

In this talk, we will raise salient questions about this trend while zooming-in on technical snippets, covering approximation properties of transformers, mathematical aspects of score-based diffusion generative models, and optimization aspects of learning semi-parametric models.

Requirements: 

Documents: 

Keynote 4

Speaker(s): 

Helen
MARGETTS

Abstract: 

Most Artificial Intelligence is developed by and for the private sector. This talk will focus on what can happen when we think about AI from a public sector perspective. How can AI be used to improve policymaking, public services and governance? What are the 'wicked' public policy problems that AI might help to solve? Drawing on research underway at the Public policy programme at The Alan Turing Institute for Data Science and AI in the UK, the talk will explain the tasks for which data science and AI are particularly suited. It will show how the use of these data-driven technologies can foster government innovation, optimise resource allocation and highlight longstanding injustices in public decision-making. Developing and using AI in the public sector might help to make governments more efficient, effective, fair and resilient than ever before.  

Requirements: 

Documents: 

Tutorials

All tutorials are 3 hours long and organized in two parallel tracks, held at the same time of day.

Tutorial Track A – Data Science for Business and Society
Tutorial Track B – Theory and methods of AI
Tutorial 1A

Speaker(s): 

Johan
HOMBERT

Abstract: 

This tutorial includes a short lecture followed by an interactive game in which participants play the role of a FinTech lender. Context: Banks and insurers increasingly use alternative data and machine learning to screen consumers and price products. For example, a FinTech using digital footprints to predict default will have a competitive edge over traditional banks. However, there are important pitfalls to avoid when using alternative data and machine learning to score consumers, such as the winner’s curse, the risk of discrimination and the Lucas critique. This tutorial and its interactive game provide an introduction to these issues.

Requirements: 

Multivariate statistical analysis, in particular OLS / logit regressions, and/or machine learning methods

Documents: 

Tutorial 2A

Speaker(s): 

David
RESTREPO AMARILES

Abstract: 

How will Europe’s future AI regulation impact the design, testing and use of AI applications such as credit scoring, recruitment algorithms, anti-fraud algorithms and facial recognition? We will explore how AI concepts such as explainability, fairness, accuracy, robustness and human oversight will be implemented in the future regulation, and how the regulation compares to other international standards on trustworthy AI. The course will focus on two concrete use cases, facial recognition and credit scoring, to see how the European regulatory framework would apply throughout a project's lifecycle. Students will walk through the process of creating a risk management system, including an impact assessment of potential risks to safety and fundamental rights, developing a list of requirements, testing, performance parameters, documentation, and human oversight mechanisms. We will explore the potential friction between the European AI Act and other regulatory frameworks such as the European General Data Protection Regulation (GDPR), and lead a debate on how the future regulation will impact AI innovation and research in Europe.

Requirements: 

TBD

Documents: 

Tutorial 3A

Speaker(s): 

Mitali
BANERJEE

Abstract: 

This 3-hour module will offer a hands-on introduction to deep-learning-based image recognition tools. Participants will gain familiarity with preparing and importing images into software (Python) and applying one of the foundational deep learning architectures to classify the images and create vector representations. We will discuss different applications of the output of deep learning tools for extracting managerial and scientific insights. In particular, the course will discuss applications of these tools to creating large-scale measures of constructs that have otherwise proven elusive to measure or susceptible to measurement bias.

Requirements: 

  1. Basic knowledge of linear algebra is helpful but not required.
  2. Basic knowledge of Python (e.g. libraries such as pandas and numpy) is helpful but not required.
  3. Basic familiarity with standard OLS regression models. You should be familiar with what it means to estimate relationships between variables using OLS models.
  4. A Gmail account is required to open the Google Colab notebooks, which will be shared before the class.

Documents: 

Tutorial 4A

Speaker(s): 

Klaus
MILLER

Abstract: 

We will discuss the impact of privacy regulation on the online advertising market, focusing on the case of the European Union’s General Data Protection Regulation (GDPR). Participants in this tutorial will learn: (1) why and how the GDPR impacts the online advertising market, particularly advertisers, publishers and users; (2) how advertisers and publishers leverage users’ personal data to pursue their goals; (3) which aspects of the GDPR are most relevant for advertisers, publishers and users; (4) how complex it is to go through the process of obtaining user permission for personal data processing, and how IAB’s Transparency and Consent Framework (TCF) intends to help; (5) how many firms a publisher provides with access to its users’ data, and how long it takes a user to respond to all permission requests; and (6) which developments are taking place with regard to personal data processing among players in the online advertising industry, as well as among regulators and consumer protection agencies. Anyone interested in learning how and why the online advertising industry benefits from using personal data, and how the GDPR impacts this practice, should attend this tutorial. The tutorial is based on the book “The Impact of the General Data Protection Regulation (GDPR) on the Online Advertising Market”, available free of charge at www.gdpr-impact.com.

Requirements: 

  • Reading Chapter 1 and Chapter 2 of the referenced book available at gdpr-impact.com
  • An installed version of base R and RStudio for the empirical analysis of cookie data

Documents: 

Tutorial 5A

Speaker(s): 

Julien
GRAND-CLEMENT

Abstract: 

The goal of this tutorial is to understand how uncertainty impacts classical decision-making models, along with the operational and business consequences. Any data-driven decision model may face uncertainty due to errors in the data, errors in the modeling assumptions, or the inherent randomness of the decision process. Overlooking this uncertainty may lead to decisions that are suboptimal, unreliable, or, in some critical applications, practically infeasible and dangerous for users. In this tutorial, we will learn to (1) estimate the uncertainty given a decision problem and a dataset, and (2) mitigate the impact of uncertainty with a robust approach. As an application, a robust portfolio management problem will be investigated in detail, though we will see that the problem of uncertainty arises in many (if not most) real decision settings.

This tutorial is structured as follows:

  1. How to estimate the uncertainty in a decision model?
    • Motivating examples: what is the practical impact of uncertainty?
      • Misclassified images, variability in demand for supply chains, artificial intelligence in healthcare, Tesla self-driving, robotics, maintenance, inventory optimization, facility location, project management, etc.
      • Introduction of the running example: portfolio management.
    • Understanding the origin of the uncertainty: poor data, little data, uncertainty inherent to the application. When do we need to take it into account?
    • Risk-sensitive decisions vs. parameter uncertainty.
    • How to estimate the uncertainty? Examples with Colab simulations and synthetic data for the portfolio management problem.
  2. How to mitigate the impact of uncertainty in practice? Robust portfolio management.
    • Deterministic approach: pessimism in parameter estimates.
    • Robust and distributionally robust approaches: how to obtain decisions with performance guarantees.
    • Evidence from Colab simulations: trade-offs between nominal and worst-case performance for the portfolio management problem. How to deal with variability?
    • (Time permitting) Two-stage decision-making: how to act when uncertainty is revealed over time?
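As a toy illustration of the robust approach outlined above (our sketch, not the tutorial's code), one can compare candidate portfolios by their worst-case return over a set of plausible return scenarios, instead of their return under a single estimated scenario:

```python
# Robust portfolio choice over a small set of candidate portfolios:
# instead of maximizing the return under one estimated scenario,
# maximize the worst-case return over all plausible scenarios.

def expected_return(weights, returns):
    return sum(w * r for w, r in zip(weights, returns))

def robust_choice(candidates, scenarios):
    """Pick the candidate portfolio with the best worst-case return."""
    return max(
        candidates,
        key=lambda w: min(expected_return(w, s) for s in scenarios),
    )

# Two assets; three plausible return scenarios (e.g. from a bootstrap).
scenarios = [(0.10, 0.02), (0.04, 0.03), (-0.08, 0.03)]
candidates = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]

best = robust_choice(candidates, scenarios)
print(best)  # the worst-case criterion selects the defensive portfolio
```

Note that the robust choice trades nominal performance (the first asset does best in the first scenario) for a guarantee against the worst scenario, which is exactly the trade-off discussed in the outline.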

Requirements: 

Basic knowledge of statistics (means, confidence intervals, quantiles). Knowing linear programming is a plus. For the simulations, all code will be in Python, and a Colab notebook will be available for the participants, with some pre-coded examples.

Documents: 

Tutorial 6A

Speaker(s): 

Aluna
WANG

Abstract: 

Risk management encompasses the identification, analysis, and response to risk factors arising over the life of a business. Recognizing patterns and detecting anomalies in big data can be critical to effective risk management. While numerous technologies for spotting anomalies in collections of multi-dimensional data points have been developed in recent years, anomaly detection techniques for structured graph data have lately become a focus. Why do we need graph-based approaches to anomaly detection? What are some of the high-impact applications of graph-based anomaly detection in risk management? How can we develop and deploy graph-based anomaly detection techniques for financial transaction data? This short course answers the above questions by introducing two general, scalable, and explainable anomaly detection models, with a focus on the use of graphs and the minimum description length (MDL) principle. The course also discusses how to deploy these techniques and use them for risk management.
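To convey the MDL intuition behind such detectors, here is a toy, stdlib-only sketch (ours, not the course's CODEtect algorithm): fit a simple frequency model to the data, measure each item's description length as its negative log-probability under that model, and flag items that are expensive to encode:

```python
import math
from collections import Counter

def description_lengths(items):
    """Code length (in bits) of each item under a simple frequency
    model of the data: -log2 of its empirical probability."""
    counts = Counter(items)
    n = len(items)
    return {item: -math.log2(c / n) for item, c in counts.items()}

def anomalies(items, threshold_bits=4.0):
    """Flag items whose description length exceeds the threshold."""
    dl = description_lengths(items)
    return sorted({x for x in items if dl[x] > threshold_bits})

# A toy transaction log: the rare transaction type costs many bits
# to encode and is therefore flagged.
data = ["transfer"] * 40 + ["deposit"] * 23 + ["wire-to-offshore"]
print(anomalies(data))
```

Graph-based MDL detectors apply the same principle with a far richer model class (e.g. encoding each transaction graph with a dictionary of recurring motifs), so that graphs which compress poorly under the learned dictionary are the anomalies.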

Requirements: 

Basic knowledge of Python and Jupyter Notebook

Documents: 

https://www.summerschool.hi-paris.fr/wp-content/uploads/2022/04/Aluna_CODEtect-Algorithm.zip

Tutorial 1B

Speaker(s): 

Rémi
FLAMARY

Abstract: 

This tutorial aims to present the mathematical theory of optimal transport (OT) and to provide a global view of its potential applications in machine learning, signal and image processing, and biomedical data processing. The first part of the tutorial will present the theory of optimal transport and its optimization problems through the original Monge formulation and the Kantorovich formulation in both primal and dual form. The algorithms used to solve these problems will be discussed and illustrated on simple examples. We will also introduce the OT-based Wasserstein distance and Wasserstein barycenters, which are fundamental tools in the data processing of histograms. Finally, we will present recent developments in regularized OT that bring efficient solvers and more robust solutions.
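As a minimal illustration of the regularized OT solvers mentioned above (a sketch of ours, not the tutorial's material), Sinkhorn's algorithm alternately rescales the rows and columns of the Gibbs kernel until the transport plan's marginals match the two histograms:

```python
import math

def sinkhorn(a, b, cost, reg=0.5, iters=500):
    """Entropically regularized OT between histograms a and b:
    alternately rescale rows and columns of the Gibbs kernel
    K = exp(-cost / reg) until the plan's marginals match a and b."""
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Two 3-bin histograms and a squared-distance cost between bin indices.
a, b = [0.5, 0.5, 0.0], [0.0, 0.5, 0.5]
cost = [[(i - j) ** 2 for j in range(3)] for i in range(3)]
plan = sinkhorn(a, b, cost)
# At convergence the row sums of the plan recover a and the column sums b.
print([round(sum(row), 3) for row in plan])
```

Production code would use a dedicated library such as POT (Python Optimal Transport) with log-domain stabilization; this sketch only shows the core fixed-point iteration.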

The second part of the tutorial will present numerous recent applications of OT in machine learning, signal processing and biomedical imaging. We will see how the mapping inherent in optimal transport can be used to perform domain adaptation and transfer learning. Finally, we will discuss the use of OT on empirical datasets, with applications in generative adversarial networks, unsupervised learning and the processing of structured data such as graphs.

Requirements: 

TBD

Documents: 

Tutorial 2B

Speaker(s): 

Alexandre
GRAMFORT

Abstract: 

Understanding how the brain works in healthy and pathological conditions is considered one of the major challenges of the 21st century. After the first electroencephalography (EEG) measurements in 1929, the 1990s saw the birth of modern functional brain imaging, with the first functional MRI (fMRI) and full-head magnetoencephalography (MEG) systems. Today, new tech companies are developing consumer-grade devices for at-home recordings of neural activity. By noninvasively offering unique insights into the living brain, these technologies have started to revolutionize both clinical and cognitive neuroscience.

The availability of such new devices, made possible by pioneering breakthroughs in physics and engineering, now poses major computational and statistical challenges, for which machine learning currently plays a major role. In this course you will discover hands-on the types of data one can collect to record the living brain. You will then learn about state-of-the-art supervised machine learning approaches for EEG signals in the clinical context of sleep stage classification, as well as brain-computer interfaces. The ML techniques explored are based on deep learning as well as Riemannian geometry, which has proven very powerful for classifying EEG data. You will work with MNE-Python (https://mne.tools), which has become a reference tool for processing MEG/EEG/sEEG/ECoG data in Python, as well as the scikit-learn library (https://scikit-learn.org). For the deep learning part you will use the Braindecode package (https://braindecode.org), based on PyTorch. The teaching will be hands-on, using Jupyter notebooks and public datasets that you will be able to work through using Google Colab.

Finally this tutorial will be a unique opportunity to see what ML can offer beyond standard applications like computer vision, speech or NLP.

Requirements: 

Materials for the class: https://github.com/agramfort/hiparis_ml_eeg

Documents: 

Tutorial 3B

Speaker(s): 

Isabelle
BLOCH

Abstract: 

The tutorial will review a few methods for symbolic AI, for knowledge representation and reasoning, and show how they can be combined with learning approaches for image understanding. Examples in medical image understanding will illustrate the talk.

More info on the AIDA lecture: https://www.i-aida.org/events/hybrid-ai-for-knowledge-representation-and-model-based-medical-image-understanding/

Link to the video: https://www.youtube.com/watch?v=3QcHB3MNcVI

Requirements: 

TBD

Documents: 

Tutorial 4B

Speaker(s): 

Corentin
TALLEC

Abstract: 

Be it on Atari games, Go, Chess, Starcraft II or Dota, Deep Reinforcement Learning (DRL) has opened up Reinforcement Learning to a variety of large-scale applications. While it could formally appear as a straightforward extension of reinforcement learning to deep-learning-based function approximation, DRL often involves more than simply plugging the newest deep learning architecture into the best theoretical reinforcement learning method. In this tutorial, we will journey through the recent history of DRL, from the now-seminal Neural Fitted Q-iteration to the most popular Deep Q-Network (DQN). Alongside the lecture, the practical session will revolve around implementing and testing DRL algorithms in JAX and Haiku on simple environments.
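For orientation, the update that both fitted-Q methods and DQN build on is the classical Q-learning rule. Here is a stdlib-only tabular sketch on a toy chain environment (ours — the practical session itself uses JAX and Haiku with neural function approximation):

```python
import random

def train_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.3):
    """Tabular Q-learning on a toy chain: states 0..n-1, action 0 moves
    left and action 1 moves right; reward 1 only on reaching the end."""
    random.seed(0)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            if random.random() < eps:          # epsilon-greedy exploration
                a = random.randrange(2)
            else:
                a = max((0, 1), key=lambda x: Q[s][x])
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Q-learning update: bootstrap from the best next-state value.
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = train_chain()
print([max((0, 1), key=lambda a: Q[s][a]) for s in range(4)])  # greedy policy
```

DQN replaces the table `Q` with a neural network trained on the same bootstrapped target, plus stabilizers such as experience replay and a target network.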

Requirements: 

Tutorial 5B

Speaker(s): 

Geoffroy
PEETERS

Abstract: 

As in many fields, deep neural networks have allowed important advances in the processing of audio signals.
In this tutorial, we review the specificities of these signals, elements of audio signal processing (as used in the traditional machine-learning approach) and how deep neural networks (in particular convolutional ones) can be used to perform feature learning (without prior knowledge --1Dconv, TCN--, or using prior knowledge --source/filter, auto-regressive, HCQT, SincNet, DDSP--).
We then review the dominant DL architectures, meta-architectures and training paradigms (classification, metric learning, supervised, unsupervised, self-supervised, semi-supervised) used in audio.
We illustrate the use of these for key applications in music and environmental sound processing: sound event detection, localization, auto-tagging, source separation and generation.

Requirements: 

TBD

Documents: 

Tutorial 6B

Speaker(s): 

Krikamol
MUANDET

Abstract: 

Data-driven decision-making tools have become increasingly prevalent in society today, with applications in critical areas like health care, economics, education, and the justice system. To ensure reliable decisions, it is essential that the models learn from data the genuine causal relationships between the outcomes and the decision variables, rather than mere correlations. In this tutorial, I will first give an introduction to the causal inference problem from a machine learning perspective, including causal discovery, treatment effect estimation, instrumental variables (IV), and proxy variables. Then, I will review recent developments in how we can leverage machine learning (ML) based methods, especially modern kernel methods, to tackle some of these problems.
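To make the instrumental-variable idea concrete, here is a minimal sketch (ours, not the tutorial's code) of the classical ratio estimator on synthetic data: when an unobserved confounder biases OLS, an instrument z that affects y only through x recovers the causal coefficient as cov(z, y) / cov(z, x):

```python
import random

def cov(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

random.seed(1)
n = 50_000
z = [random.gauss(0, 1) for _ in range(n)]            # instrument
u = [random.gauss(0, 1) for _ in range(n)]            # unobserved confounder
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [2.0 * xi + 3.0 * ui for xi, ui in zip(x, u)]     # true effect of x is 2

beta_ols = cov(x, y) / cov(x, x)  # biased upward by the confounder u
beta_iv = cov(z, y) / cov(z, x)   # instrument recovers the causal effect
print(round(beta_ols, 2), round(beta_iv, 2))
```

The kernel-based methods covered in the tutorial generalize this idea to nonlinear relationships, where a simple covariance ratio no longer suffices.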

Requirements: 

TBD

Documents: 

Round tables

Academic round table

The panel is composed of our keynote speakers and will be moderated by scientific committee member Prof. Anna Korba (IP PARIS – ENSAE Paris).

After an opening introduction, each panel member will give a short presentation of their research (about 5 minutes).

The moderator will then lead a discussion with the speakers about their personal views on priorities in AI research, the evolution of the field, how their own thinking about and use of AI has evolved, and the academic and industrial job markets. Questions from the audience will follow.

Industry round table

The Industry Panel, composed of Hi! PARIS Corporate Donors, is an opportunity for them to present and disseminate their activities in AI/Data Science and to start an exchange with the academic community. The round table will be moderated by scientific committee member Prof. Jean-Edouard Colliard (HEC Paris).

After an opening introduction by the moderator, each of the panel members will be asked to present their latest AI initiatives, opportunities and challenges.

The moderator will lead a discussion with the speakers, followed by questions from the audience.

We hope the industry panel will be a very interactive event with an opportunity to open communication channels for further research opportunities between the industry and the academic community.

Student Program

Who better to give professional advice than students who have gone through the same situations as you?

The student program is a conference round table organized by our students, Willie Hernandez and Aurore Troussel (HEC Paris).
This activity aims to introduce Brayam, Ahmed and Dilia, former students of the Institut Polytechnique de Paris who are currently working in data science in industry. We will ask them about their experiences at university and at work. They will share their tips on how to get a job, the do's and don'ts of job interviews, and how their experience at IP helped them become the successful professionals they are today.

Invitees:

  • Ahmed Lachtar – Ernst & Young, graduate of ENSTA Paris
  • Brayam Velandia – Technopolis Group, graduate of Telecom Paris
  • Dilia Olivo – La Banque Postale, PhD at Telecom Paris

Social Time

Conference Networking and Coffee

An opportunity for participants in the summer school to interact with others and chat over a drink or a cup of coffee.

Social Event

An evening event at AllTogether Saclay, a student bar close to Télécom Paris. Summer School attendees will be able to enjoy games, drinks and finger food!