
[Deep Learning Theory Team Seminar] Talk by Denny Wu (NYU) on Understanding the Mechanisms of Fast Hyperparameter Transfer

2026/04/10 (Fri)
05:00-06:00
Participants: 87/

Organizer: RIKEN AIP Public

Venue: Hybrid (Online and the Open Space at the RIKEN AIP Nihonbashi office)

Abstract:
The growing scale of deep learning models has rendered standard hyperparameter (HP) optimization prohibitively expensive. A promising solution is the use of scale-aware hyperparameters, which can enable direct transfer of optimal HPs from small-scale grid searches to large models with minimal performance loss. To understand the principles governing such transfer strategies, we develop a conceptual framework for reasoning about HP transfer across scale. In synthetic settings, we present quantitative examples where transfer either offers a provable computational advantage or fails even under μP. To explain the fast transfer observed in practice, we conjecture that decomposing the optimization trajectory reveals two contributions to loss reduction: (1) a width-stable component that determines the optimal HPs and (2) a width-sensitive component that improves with width but weakly perturbs the HP optimum. We present empirical evidence for this hypothesis in large language model pretraining.
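The toy below is a minimal sketch of the transfer idea described in the abstract. It is not the speaker's framework. It assumes a hypothetical loss that splits into (1) a width-stable term, which depends only on a scale-aware learning rate `eta_tilde = lr * width / BASE_WIDTH`, and (2) a width-sensitive term `C / width` that shrinks as width grows but leaves the optimum in place. The 1/width rescaling in `mup_style_lr` follows the spirit of the learning-rate scaling μP prescribes for Adam on hidden weights. The names `toy_loss`, `mup_style_lr`, `ETA_STAR`, `C`, and `BASE_WIDTH` are illustrative assumptions.

```python
import math

# Hypothetical toy (not from the talk): loss = width-stable term + width-sensitive term.
BASE_WIDTH = 128  # width at which the cheap grid search is run (assumed)
ETA_STAR = 1e-3   # hypothetical optimum of the scale-aware learning rate
C = 5.0           # hypothetical strength of the width-sensitive term


def toy_loss(lr: float, width: int) -> float:
    """Toy loss: only the scale-aware rate eta_tilde determines the optimum."""
    eta_tilde = lr * width / BASE_WIDTH
    width_stable = (math.log(eta_tilde) - math.log(ETA_STAR)) ** 2
    width_sensitive = C / width  # improves with width, does not move the optimum
    return width_stable + width_sensitive


def mup_style_lr(base_lr: float, width: int) -> float:
    """Rescale a learning rate tuned at BASE_WIDTH to a new width (1/width rule)."""
    return base_lr * BASE_WIDTH / width


def grid_search(width: int, grid: list[float]) -> float:
    """Return the grid learning rate with the lowest toy loss at this width."""
    return min(grid, key=lambda lr: toy_loss(lr, width))


grid = [10 ** (-k / 4) for k in range(8, 21)]  # log-spaced, from 1e-2 down to 1e-5
best_small = grid_search(BASE_WIDTH, grid)      # cheap search at the small width
large_width = 8192
transferred = mup_style_lr(best_small, large_width)  # no search at the large width

print(f"best lr at width {BASE_WIDTH}: {best_small:.2e}")
print(f"transferred lr at width {large_width}: {transferred:.2e}")
print(f"loss with transferred lr: {toy_loss(transferred, large_width):.5f}")
print(f"loss with untransferred lr: {toy_loss(best_small, large_width):.5f}")
```

In this toy, transfer is exact by construction because the width-sensitive term never shifts the optimum. The abstract's point is that in real training this holds only approximately, and that it can fail even under μP in some synthetic settings.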

Speaker Bio:
Dr. Denny Wu is a Faculty Fellow at the Center for Data Science, New York University, and the Flatiron Institute. His research focuses on developing a mathematical foundation for modern machine learning systems, particularly neural networks. He obtained his Ph.D. in Computer Science from the University of Toronto and the Vector Institute, under the supervision of Prof. Jimmy Ba and Prof. Murat A. Erdoğdu, and completed his undergraduate studies at Carnegie Mellon University under the supervision of Prof. Ruslan Salakhutdinov.
