[abstract]
- Is pre-training also effective for GNNs?
- develops a new self-supervised method for pre-training GNNs
- learns at the level of individual nodes
- learns at the level of the entire graph
- pre-trains on multiple graphs for classification problems
- a naive approach does not show performance improvements
- our approach shows performance improvements
- Introduction
- pre-training is useful
- task-specific labeled data can be extremely scarce (chemistry, biology)
- real-world graph data sometimes contains out-of-distribution samples → applying pre-training to graphs is hard
- requires domain expertise to select examples and target labels (correlated with the downstream task of interest)
- otherwise, it can harm generalization → negative transfer
- contributions
- conduct investigation of strategies
- develop effective pre-training strategy for GNNs
- capture domain-specific knowledge about nodes → graph-level knowledge
- a.i: node-level: similar nodes cluster together
- a.ii: graph-level: similar graphs cluster together
- preliminaries of GNNs
- Strategies for pre-training GNNs
3.1 Node-level pre-training
- use unlabeled data to capture domain-specific knowledge
- context prediction
- structure
(1) k-hop neighbors
(2) context graph: subgraph between r1 hops and r2 hops (r1 < K)
⇒ via the shared (context anchor) nodes, we know how the neighborhood and context graphs are connected with each other (sketched below)
- encoding method (context → fixed vector)
- context graphs → fixed-length vectors
- use auxiliary GNN (context GNN)
- learning via negative sampling
- goal: predict whether a given neighborhood and a given context graph come from the same node
- v’=v, G’=G : positive neighborhood-context pair
- randomly sample v’ from random G’: negative neighborhood-context pair
- sampling ratio: one negative : one positive (see the sketches below)
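A minimal sketch (not the authors' code) of extracting the K-hop neighborhood, the context graph, and their shared anchor nodes, assuming an undirected networkx graph `G` and a center node `v`; the function name and default `K`, `r1`, `r2` values are illustrative.

```python
import networkx as nx

def neighborhood_and_context(G, v, K=2, r1=1, r2=4):
    # shortest-path distances from the center node v, up to r2 hops
    dist = nx.single_source_shortest_path_length(G, v, cutoff=r2)
    neighborhood = {u for u, d in dist.items() if d <= K}     # (1) K-hop neighborhood
    context = {u for u, d in dist.items() if r1 <= d <= r2}   # (2) context graph: ring between r1 and r2 hops
    anchors = neighborhood & context                          # shared anchor nodes (requires r1 < K)
    return G.subgraph(neighborhood), G.subgraph(context), anchors
```

And a hedged sketch of the negative-sampling objective, assuming `h_center` is the center-node embedding from the main GNN and `c_pos` / `c_neg` are context embeddings (e.g. averaged anchor-node embeddings) from the auxiliary context GNN; names and the (batch, dim) shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def context_prediction_loss(h_center, c_pos, c_neg):
    # dot products score whether a neighborhood and a context belong to the same node
    pos_logit = (h_center * c_pos).sum(dim=-1)   # v' = v, G' = G  -> label 1
    neg_logit = (h_center * c_neg).sum(dim=-1)   # random v' from a random G' -> label 0
    loss = F.binary_cross_entropy_with_logits(pos_logit, torch.ones_like(pos_logit)) \
         + F.binary_cross_entropy_with_logits(neg_logit, torch.zeros_like(neg_logit))
    return loss  # one negative sampled per positive
```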
- attribute masking
- input node/edge attributes are randomly masked ⇒ the GNN predicts the masked attributes (see the sketch after this block)
- example
- molecular graph: node attributes (atom type)
- PPI: edge attributes (interactions)
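A minimal sketch of attribute masking, assuming integer node attributes `x` (e.g. atom types), a `gnn(x, edge_index)` callable that returns node embeddings, and a linear prediction `head`; all names and the 15% mask rate are assumptions.

```python
import torch
import torch.nn.functional as F

def attribute_masking_loss(gnn, head, x, edge_index, mask_rate=0.15, mask_token=0):
    mask = torch.rand(x.size(0)) < mask_rate   # randomly choose nodes to mask
    x_masked = x.clone()
    x_masked[mask] = mask_token                # hide the true attribute (e.g. atom type)
    h = gnn(x_masked, edge_index)              # node embeddings computed from the masked input
    logits = head(h[mask])                     # predict the original attribute of the masked nodes
    return F.cross_entropy(logits, x[mask])
```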
3.2 Graph-Level pre-training
- method: making predictions about
- domain-specific attributes
- properties of molecules (supervised tasks)
- naively doing this may fail → the supervised tasks may be unrelated to the downstream task
- regularize GNN at the level of individual nodes via node-level pre-training methods
- graph structure
- task: graph edit distance, structure similarity
- computing graph distance values is a difficult problem ⇒ left as future work
- ⇒ adopted option: domain-specific (supervised) attributes (see the sketch after this block)
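A rough sketch of graph-level multi-task supervised pre-training, assuming node embeddings `h`, a `batch` vector mapping each node to its graph, and binary property labels `y` of shape (num_graphs, num_tasks) with NaN marking missing labels; the class and function names are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphMultiTaskHead(nn.Module):
    def __init__(self, dim, num_tasks):
        super().__init__()
        self.linear = nn.Linear(dim, num_tasks)   # one binary output per supervised property

    def forward(self, h, batch, num_graphs):
        # mean-pool node embeddings into one embedding per graph
        graph_emb = torch.zeros(num_graphs, h.size(1)).index_add_(0, batch, h)
        counts = torch.bincount(batch, minlength=num_graphs).clamp(min=1).unsqueeze(1)
        return self.linear(graph_emb / counts)

def multitask_loss(logits, y):
    valid = ~torch.isnan(y)   # many graphs lack labels for some tasks
    return F.binary_cross_entropy_with_logits(logits[valid], y[valid])
```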
3.3 Overview
[pre-training]
- step1: node-level self-supervised pre-training
- step2: graph-level multi-task supervised pre-training
[fine-tuning]
- add linear classifiers on top of graph-level representations to predict downstream graph labels (see the sketch below)
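A minimal fine-tuning sketch, assuming `pretrained_gnn` comes out of steps 1 and 2 above and maps `(x, edge_index)` to node embeddings; class and argument names are placeholders.

```python
import torch
import torch.nn as nn

class FineTuneModel(nn.Module):
    def __init__(self, pretrained_gnn, dim, num_classes):
        super().__init__()
        self.gnn = pretrained_gnn                       # initialized from pre-trained weights
        self.classifier = nn.Linear(dim, num_classes)   # fresh linear head for the downstream labels

    def forward(self, x, edge_index, batch, num_graphs):
        h = self.gnn(x, edge_index)                     # node embeddings
        graph_emb = torch.zeros(num_graphs, h.size(1)).index_add_(0, batch, h)
        counts = torch.bincount(batch, minlength=num_graphs).clamp(min=1).unsqueeze(1)
        return self.classifier(graph_emb / counts)      # graph-level predictions
```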
- experiments
5.1 datasets
- data splitting
- molecules: scaffold split → evaluates out-of-distribution generalization (see the sketch below)
- biology: species split
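A rough sketch of a scaffold split (not the paper's exact procedure), assuming RDKit is available and `smiles_list` holds the dataset's SMILES strings; the split fractions and names are illustrative.

```python
from collections import defaultdict
from rdkit.Chem.Scaffolds import MurckoScaffold

def scaffold_split(smiles_list, frac_train=0.8, frac_valid=0.1):
    # group molecule indices by their Bemis-Murcko scaffold
    groups = defaultdict(list)
    for i, s in enumerate(smiles_list):
        scaffold = MurckoScaffold.MurckoScaffoldSmiles(smiles=s, includeChirality=False)
        groups[scaffold].append(i)

    train, valid, test = [], [], []
    n = len(smiles_list)
    # keep whole scaffold groups together so test molecules differ structurally from training ones
    for group in sorted(groups.values(), key=len, reverse=True):
        if len(train) + len(group) <= frac_train * n:
            train += group
        elif len(valid) + len(group) <= frac_valid * n:
            valid += group
        else:
            test += group
    return train, valid, test
```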
[result]
[apply to graph NAS]
- node level: attribute masking
- graph level: graph edit distance
- fine tuning