[AIP Progress Report Event] Workshop on Trustable Data-Driven Science
[Description]
The Workshop on Trustable Data-Driven Science, organized by the Data-Driven Experimental Design Team at RIKEN AIP, explores cutting-edge approaches in machine learning and statistical methods to strengthen the reliability of data-driven science. The workshop will highlight recent advances in ensuring the robustness and trustworthiness of machine learning models, with a focus on, but not limited to, Selective Inference. Leading researchers from Japan and around the world, together with our team, will share insights and engage in in-depth discussions on building a trustworthy foundation for the future of data-driven science. We warmly invite AI researchers and practitioners who are interested in trustworthy AI, robust machine learning, and their applications in science to join this exciting event.
[Venue]
Online and RIKEN AIP Open Space
[How to Participate On-Site]
For those who wish to attend on-site, please apply via the Google Form below, in addition to registering on Doorkeeper (deadline: September 14, 17:00; maximum 30 participants):
https://forms.gle/rCmBgTrkV2oRqBou9
An entry pass will be sent to on-site participants by Friday, September 18 (only to those whose affiliation can be confirmed). If you do not receive it, it may be due to unverified affiliation or because the capacity has been exceeded. Please note that we will not send notifications in such cases, nor will we be able to respond to inquiries regarding the matter.
Access: https://www.riken.jp/access/tokyo-map/
[Program]
September 19 (Fri)
13:00-13:05 Opening
13:05-13:45
Ichiro Takeuchi (RIKEN AIP)
Selective inference on complex machine learning models
13:45-14:25
Ronan Perry (University of Washington)
Post-selection inference with penalized M-estimators via score thinning
14:25-15:05
Masaaki Imaizumi (University of Tokyo)
Advances in the theory of high-dimensional model neural nets with application to statistical inference
15:05-15:30 Coffee Break
15:30-16:10
Vo Nguyen Le Duy (University of Information Technology, Vietnam National University/RIKEN AIP)
Post-Transfer Learning Statistical Inference
16:10-16:50
Yoshiyuki Ninomiya (Institute of Statistical Mathematics)
Selective inference in propensity score analysis and its development
16:50-16:55 Closing
[Talk Abstract]
Speaker: Ichiro Takeuchi (RIKEN AIP)
Title: Selective inference on complex machine learning models
Abstract: With the rapid advancement of AI, scientific discoveries that were once difficult to achieve using conventional methods are becoming increasingly feasible. On the other hand, ensuring reproducibility remains essential for the sound development of science, highlighting the need for frameworks that can quantitatively assess the reliability of AI-derived knowledge. To address this challenge, our team has been developing statistical testing methods to advance data-driven science in a rigorous manner. Central to this effort is the theory and algorithm of Selective Inference, which provides valid inference while accounting for the influence of analytic choices during data analysis. In this talk, I will present our recent studies on selective inference, focusing on applications to complex machine learning models such as deep learning models.
Speaker: Ronan Perry (University of Washington)
Title: Post-selection inference with penalized M-estimators via score thinning
Abstract: We consider the problem of providing valid inference with M-estimators in a model selected via a sparse penalty, e.g., the lasso. In this setting, a number of solutions exist but face certain limitations, including strong distributional assumptions, known variances, affine selection events, and bespoke methods for inference. As a solution, we prove that a simple approach which adds noise to the selection and inference estimators yields asymptotically valid post-selection inference across a wide range of loss functions and penalties. This result hinges on the asymptotic normality of an underlying "core statistic" which we can split into asymptotically independent pieces. Our method is valid without distributional assumptions so long as the variance can be estimated consistently, and can be applied using only standard software (i.e. the glm function in R).
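The core idea behind this style of noise-based splitting can be sketched in the simplest Gaussian case: adding independent noise w to a response y produces two pieces, y + w and y - w, that are uncorrelated (and, being jointly Gaussian, independent), so one piece can drive selection while the other supports classical inference. The sketch below is a toy illustration under a known-variance Gaussian assumption, with marginal screening standing in for a penalized M-estimator; it is not the speaker's actual score-thinning method, and all variable names are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 200, 10, 1.0

# Only the first two features carry signal.
beta = np.zeros(p)
beta[:2] = 1.5
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(scale=sigma, size=n)

# Gaussian splitting: Cov(y + w, y - w) = Var(y|X) - Var(w) = 0,
# so the two pieces are (conditionally) independent.
w = rng.normal(scale=sigma, size=n)
y_select, y_infer = y + w, y - w

# Select features on the first piece (marginal screening as a stand-in
# for lasso-type selection).
scores = np.abs(X.T @ y_select) / n
selected = np.sort(np.argsort(scores)[-2:])

# Ordinary least squares on the second, independent piece is now free of
# selection bias (its noise variance is inflated to 2 * sigma^2).
Xs = X[:, selected]
beta_hat = np.linalg.lstsq(Xs, y_infer, rcond=None)[0]
print(selected.tolist(), np.round(beta_hat, 2))
```

With this seed the screening step recovers the two signal features, and the second-piece estimates land near the true coefficients; the price paid is the inflated noise variance on the inference piece.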
Speaker: Masaaki Imaizumi (University of Tokyo)
Title: Advances in the theory of high-dimensional model neural nets with application to statistical inference
Abstract: In this talk, we will discuss recent advances in high-dimensional statistics and their potential for statistical inference. With the rapid development of deep learning and artificial intelligence technologies, statistical models with an enormous number of parameters have become increasingly important. To understand such large-scale models, statistical theory has progressed in the high-dimensional limit, where the number of parameters diverges to infinity. We explore the properties of complex and deep statistical models through their dynamical behavior in high dimensions, and develop frameworks for conducting statistical inference in such high-dimensional regimes. We will also touch on the potential of applying these frameworks to realize statistical inference in deep models.
Speaker: Vo Nguyen Le Duy (University of Information Technology, Vietnam National University/RIKEN AIP)
Title: Post-Transfer Learning Statistical Inference
Abstract: Transfer learning (TL) is a fundamental paradigm in machine learning (ML), as it enables models to leverage knowledge obtained from one domain or task and apply it to another related problem. This strategy reduces the need for extensive labeled data in the target domain while improving training efficiency and predictive performance. Yet, despite its success, an open challenge remains: how can we assess the reliability of ML results after the TL process? In particular, we currently lack a statistical framework to ensure proper control of false discoveries. This is crucial, because failing to do so can lead to misleading conclusions and even harmful outcomes. In this talk, I will present my recent work on developing statistical inference methods for feature selection and anomaly detection in post-TL settings.
Speaker: Yoshiyuki Ninomiya (Institute of Statistical Mathematics)
Title: Selective inference in propensity score analysis and its development
Abstract: Selective inference (post-selection inference) is a methodology that has attracted much attention in recent years in the fields of statistics and machine learning. Naive inference based on data that are also used for model selection tends to overestimate effects, and so selective inference conditions on the event that the model was selected. In this presentation, we develop selective inference in propensity score analysis with a semiparametric approach, which has become a standard tool in causal inference. Specifically, for the most basic causal inference model in which the causal effect can be written as a linear sum of confounding variables, we conduct Lasso-type variable selection by adding an L1 penalty term to the loss function that gives a semiparametric estimator. Confidence intervals are then given for the coefficients of the selected confounding variables, conditional on the event of variable selection, with asymptotic guarantees. We also incorporate recent developments in selective inference.
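The overestimation that motivates conditioning on the selection event is easy to see in a toy simulation: when all true effects are zero and we report the coordinate with the largest z-score, its naive estimate is strongly biased away from zero (the "winner's curse"). The sketch below illustrates only this bias, not the propensity-score methodology of the talk; all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_rep, p = 5000, 20

# All p true effects are zero; each row is one replicated analysis.
z = rng.normal(size=(n_rep, p))

# "Select" the coordinate with the largest absolute z-score and record
# its naive (unconditional) estimate.
winners = z[np.arange(n_rep), np.abs(z).argmax(axis=1)]

# Naive inference treats the winner as an ordinary N(0, 1) draw, but its
# mean absolute value is far above E|N(0,1)| ~= 0.80, so naive intervals
# and tests centered at the winner are badly miscalibrated.
mean_abs_winner = np.abs(winners).mean()
print(round(float(mean_abs_winner), 2))
```

Selective inference corrects this by replacing the unconditional N(0, 1) reference distribution with the distribution of the estimate conditional on it having won the selection step.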