[Abstract]
- compare performance predictors across families:
- learning curve extrapolation
- weight sharing
- supervised learning
- zero-cost proxies
- evaluation: correlation- and rank-based performance measures (e.g., Spearman, Kendall Tau); see the sketch below
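A minimal sketch of this kind of rank-based evaluation, assuming we already have predicted and true validation errors for the same set of test architectures (all data below is toy):

```python
# Minimal sketch: rank-based evaluation of a performance predictor.
# `true_errors` / `pred_errors` are toy stand-ins for ground-truth and
# predicted validation errors on a held-out set of architectures.
import numpy as np
from scipy.stats import kendalltau, spearmanr

true_errors = np.array([8.2, 6.1, 7.4, 5.9, 9.0])  # ground-truth val. errors
pred_errors = np.array([7.8, 6.5, 7.0, 5.5, 9.3])  # predictor outputs

tau, _ = kendalltau(pred_errors, true_errors)      # rank agreement
rho, _ = spearmanr(pred_errors, true_errors)       # rank correlation
print(f"Kendall Tau: {tau:.3f}, Spearman: {rho:.3f}")
```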
[Introduction]
- contributions
- the first large-scale study of performance predictors
- release a comprehensive library of 31 performance predictors
- combining different families of performance predictors → better predictive power (see the sketch after this list)
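One simple way to realize this combination, sketched below under the assumption that zero-cost proxy scores can be appended as extra input features to a model-based predictor (data, dimensions, and the random-forest choice are illustrative):

```python
# Sketch: combine predictor families by feeding a zero-cost proxy score
# as an extra input feature to a model-based predictor. All data is toy.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
encodings = rng.random((100, 32))        # toy architecture encodings
zc_scores = rng.random((100, 1))         # toy zero-cost proxy scores
val_errors = rng.random(100)             # toy fully-trained val. errors

X = np.hstack([encodings, zc_scores])    # combined feature vectors
model = RandomForestRegressor().fit(X, val_errors)

x_new = np.hstack([rng.random((1, 32)), rng.random((1, 1))])
print(model.predict(x_new))              # predicted val. error
```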
[2. Related work]
- NAS
- early approaches: RL, evolutionary algorithms, one-shot, predictor-based
- recent: tree-based methods
[3. Performance prediction methods for NAS]
- goal: find the architecture with the smallest validation error f
- fully training every candidate to evaluate f is too expensive, so introduce a performance predictor f' that is well aligned (correlated) with the true validation error f
- performance predictor f': two routines (interface sketch after this list)
- initialization routine: run once, upfront
- query routine: run once per candidate architecture, i.e. many times
- model-based (trainable) methods
- most common
- initialization routine: fully training many architectures
- query time: less than a second
- used within frameworks such as BO or evolutionary search; models range from GPs to tree-based methods
- learning curve-based methods (extrapolation sketch after this list)
- partially train the network, then extrapolate its learning curve
- no initialization time required
- query time: minutes (the partial training itself)
- hybrid methods
- combine learning curve extrapolation with model-based methods
- zero-cost methods (proxy sketch after this list)
- no initialization time; query time: seconds (statistics from a single minibatch)
- weight sharing methods
- all architectures in the search space are combined into a single over-parameterized supernetwork
- shared weights are often not effective at ranking architectures
- tradeoff between initialization time and query time
- different methods require different initialization / query times
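A minimal sketch of the two-routine interface described above; method names are illustrative rather than any library's exact API, and the toy subclass shows the model-based case (expensive initialization, sub-second queries):

```python
# Sketch of the two-routine predictor interface (names illustrative).
import random
from sklearn.ensemble import RandomForestRegressor

class PerformancePredictor:
    def initialize(self, train_archs, train_errors):
        """Run once, upfront (may be expensive)."""
        raise NotImplementedError

    def query(self, arch):
        """Run per candidate; should be cheap."""
        raise NotImplementedError

class ModelBasedPredictor(PerformancePredictor):
    """Model-based: init needs fully trained archs, queries are fast."""
    def __init__(self, encode):
        self.encode = encode                  # arch -> feature vector
        self.model = RandomForestRegressor()

    def initialize(self, train_archs, train_errors):
        self.model.fit([self.encode(a) for a in train_archs], train_errors)

    def query(self, arch):
        return self.model.predict([self.encode(arch)])[0]

# Toy usage: architectures are already feature vectors here.
predictor = ModelBasedPredictor(encode=lambda a: a)
archs = [[random.random() for _ in range(8)] for _ in range(50)]
errors = [sum(a) for a in archs]              # toy "val. errors"
predictor.initialize(archs, errors)
print(predictor.query(archs[0]))
```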
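A sketch of learning curve extrapolation, assuming a simple power-law fit to the first epochs of a toy validation-error curve (the paper's LC methods use richer parametric and ensemble models; this functional form is just illustrative):

```python
# Sketch: extrapolate a partial learning curve with a power-law fit.
import numpy as np
from scipy.optimize import curve_fit

def power_law(t, a, b, c):
    return a * t ** (-b) + c

epochs = np.arange(1, 11)                   # 10 observed epochs
errors = 30 * epochs ** (-0.5) + 5          # toy partial val.-error curve
errors += np.random.default_rng(0).normal(0, 0.2, errors.shape)

params, _ = curve_fit(power_law, epochs, errors, p0=(30.0, 0.5, 5.0))
print("predicted error at epoch 100:", power_law(100, *params))
```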
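A sketch of a zero-cost proxy in the grad-norm style, computed from a single minibatch on a toy model (the proxy choice, model, and data are illustrative stand-ins):

```python
# Sketch: grad-norm zero-cost proxy from one minibatch at initialization.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
x, y = torch.randn(8, 16), torch.randint(0, 10, (8,))  # toy minibatch

loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
score = sum(p.grad.norm().item() for p in model.parameters())
print("grad-norm proxy score:", score)  # heuristic: used to rank archs
```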
[4. Experiments]
- NAS benchmark datasets
- NAS-Bench-101
- NAS-Bench-201
- NAS-Bench-301: surrogate benchmark on the DARTS search space (~1e18 architectures)
- NAS-Bench-NLP: ~1e53 architectures