Sangdoo Yun

I am a research scientist at Naver AI Lab, working on computer vision and machine learning towards real-world applications.

At Naver, I've worked on network architectures (ReXNet, PiT), training techniques (CutMix, ReLabel, AdamP, KD), and robustness (ReBias). I've also contributed to Naver's OCR (e.g., CRAFT, STR, Donut) and Face products.

I received my MS and PhD in computer vision from Seoul National University in 2013 and 2017, respectively, under the supervision of Prof. Jin Young Choi. I received my BS from Seoul National University in 2010.

I have also been an adjunct professor in the Dept. of Computer Science and Engineering at Seoul National University since September 2021.

Email  /  CV  /  Google Scholar  /  Github

News
Research

I am interested in training robust, generalizable, and transferable ML models for real-world applications.

Weakly Supervised Semantic Segmentation using Out-of-Distribution Data
Jungbeom Lee, Seong Joon Oh, Sangdoo Yun, Junsuk Choe, Eunji Kim, Sungroh Yoon.
CVPR, 2022
Bibtex / Code

Weakly supervised semantic segmentation (WSSS) suffers from spurious correlations between foreground (e.g., train) and background (e.g., rail). Our idea is to collect background images without any foreground pixels (e.g., railroad images without trains). We then teach the model not to rely on background pixels when classifying foreground classes. Adding a small number of background images brings a large performance gain in WSSS.

The Majority Can Help The Minority: Context-rich Minority Oversampling for Long-tailed Classification
Seulki Park, Youngkyu Hong, Byeongho Heo, Sangdoo Yun, Jin Young Choi.
CVPR, 2022
Bibtex

Data oversampling is a simple solution for long-tailed classification, but it may exacerbate overfitting because minority samples carry limited context information. Motivated by CutMix, we introduce a simple context-rich oversampling method. Interestingly, majority classes play a key role in boosting the classification accuracy of minority classes!

Which Shortcut Cues Will DNNs Choose? A Study from the Parameter-Space Perspective
Luca Scimeca*, Seong Joon Oh*, Sanghyuk Chun, Michael Poli, Sangdoo Yun.
*Equal contribution
ICLR, 2022
Bibtex / OpenReview

What causes the shortcut learning problem? We observe a model's behavior when multiple cues (e.g., color and shape) are equally valid for fitting the training data. Interestingly, even in this balanced setting the model prefers to fit a certain cue (e.g., color rather than shape). This paper explains why from a parameter-space perspective.

Donut 🍩: Document Understanding Transformer without OCR
Geewook Kim, Teakgyu Hong, Moonbin Yim, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.
arXiv, 2021
Bibtex

Current visual document understanding (VDU) models rely heavily on external OCR frameworks (e.g., text detection, text recognition). OCR is expensive and sometimes not available. We boldly remove the dependency on OCR by using a simple transformer architecture. Try our highly efficient and powerful VDU model, Donut 🍩!

Detecting and Removing Text in the Wild
Junho Cho, Sangdoo Yun, Dongyoon Han, Byeongho Heo, Jin Young Choi.
IEEE Access, 2021
Bibtex

A unified text detection and text removal framework for scene text removal in the wild.

Rethinking Spatial Dimensions of Vision Transformers
Byeongho Heo, Sangdoo Yun, Dongyoon Han, Sanghyuk Chun, Junsuk Choe, Seong Joon Oh.
ICCV, 2021
Bibtex / Code

The Vision Transformer (ViT) has become a strong design principle for vision modeling. Because ViT originates from NLP's Transformer, it has no intermediate pooling layers, which are common in CNNs. We simply bring the pooling concept into ViT and introduce a new architecture, PiT.

Normalization Matters in Weakly Supervised Object Localization
Jeesoo Kim, Junsuk Choe, Sangdoo Yun, Nojun Kwak.
ICCV, 2021
Bibtex / Code

We investigate the effect of CAM (CVPR'16) normalization on WSOL and suggest a new normalization method.

Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels
Sangdoo Yun, Seong Joon Oh, Byeongho Heo, Dongyoon Han, Junsuk Choe, Sanghyuk Chun.
CVPR, 2021
Bibtex / Code / Video / Poster

ImageNet has a lot of label noise, and there have been efforts to fix it on the evaluation set (e.g., Shankar et al., Beyer et al.). We turn our attention to the training set, whose label noise has been overlooked, and release the re-labeled ImageNet and codebase (published at this repo). The re-labeled data improves accuracy on ImageNet and on downstream tasks.

Rethinking Channel Dimensions for Efficient Model Design
Dongyoon Han, Sangdoo Yun, Byeongho Heo, Youngjoon Yoo.
CVPR, 2021
Bibtex / Code

CNN architectures (e.g., ResNet, MobileNet, etc.) usually follow the same feature-map down-sampling policy. We conjecture that such a design policy harms the representation ability of intermediate layers. We analyze the feature map's rank (inspired by the softmax bottleneck) and suggest a new network architecture, namely, Rank eXpanded Network (ReXNet).

AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights
Byeongho Heo*, Sanghyuk Chun*, Seong Joon Oh, Dongyoon Han, Youngjung Uh, Sangdoo Yun, Jungwoo Ha.
*Equal contribution
ICLR, 2021
Bibtex / Code / Project

We add a projection operation to the Adam and SGD optimizers to mitigate the convergence slowdown caused by rapidly increasing weight norms. It leads to performance improvements across the board and is easy to install (pip install adamp).

VideoMix: Rethinking Data Augmentation for Video Classification
Sangdoo Yun, Seong Joon Oh, Byeongho Heo, Dongyoon Han, Jinhyung Kim.
arXiv, 2020
Bibtex

An extension of CutMix to video recognition. We search for the best mixing strategy for video tasks.

Learning De-biased Representations with Biased Representations
Hyojin Bahng, Sanghyuk Chun, Sangdoo Yun, Jaegul Choo, Seong Joon Oh.
ICML, 2020
Bibtex / Code / ICML Virtual / Youtube

Models tend to learn biased representations. To "de-bias" a model's representation, we "subtract" the biased representation from the target model.

EXTD: Extremely Tiny Face Detector via Iterative Filter Reuse
Youngjoon Yoo, Dongyoon Han, Sangdoo Yun.
arXiv, 2019
Bibtex / Code

Face detectors usually use multiple stages to handle multiple resolutions, but they do not actually require such complex feature encoding. We introduce an extremely tiny face detector via iterative filter reuse.

CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, Youngjoon Yoo.
ICCV, 2019 (Oral Presentation)
Bibtex / Code / Talk / Poster / Blog

A simple cut-and-paste strategy brings significant performance boosts across tasks and datasets.

What Is Wrong with Scene Text Recognition Model Comparisons? Dataset and Model Analysis
Jeonghun Baek, Geewook Kim, Junyeop Lee, Sungrae Park, Dongyoon Han, Sangdoo Yun, Seong Joon Oh, Hwalsuk Lee.
ICCV, 2019 (Oral Presentation)
Bibtex / Code

Scene text recognition evaluations have been unreliable because models and datasets were not controlled across comparisons. We provide a unified benchmark protocol and fairly reproduced results. We also found a new architecture through these unified experiments.

A Comprehensive Overhaul of Feature Distillation
Byeongho Heo, Jeesoo Kim, Sangdoo Yun, Hyojin Park, Nojun Kwak, Jin Young Choi.
ICCV, 2019
Bibtex / Code

There are many design options for feature distillation: loss function, distillation position, and teacher/student transforms. We study these options and provide a comprehensive overhaul of feature distillation. Through this, we found the best feature distillation method, which even beats the teacher's accuracy.

An Empirical Evaluation on Robustness and Uncertainty of Regularization Methods
Sanghyuk Chun, Seong Joon Oh, Sangdoo Yun, Dongyoon Han, Junsuk Choe, Youngjoon Yoo.
ICML Workshop, 2019
Bibtex

We provide structured experimental results for the effectiveness of regularization methods on robustness and uncertainty benchmarks.

Character Region Awareness for Text Detection
Youngmin Baek, Bado Lee, Dongyoon Han, Sangdoo Yun, Hwalsuk Lee.
CVPR, 2019
Bibtex / Code

Text detectors often fail to detect real-world scene text, e.g., curved or long text. We propose a two-stage approach: first detect individual characters, then connect them. We also introduce a semi-weakly-supervised training trick to boost our detector's performance.

Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons
Byeongho Heo, Minsik Lee, Sangdoo Yun, Jin Young Choi.
AAAI, 2019 (Oral Presentation)
Bibtex / Code

Previous feature distillation approaches (e.g., FitNet) focus on mimicking the teacher's feature values. Instead, our goal is to transfer the actual "activation boundary" by assigning binary labels (i.e., activated or not) to all the neurons. Our loss minimizes the discrepancy between the teacher's and the student's binary labels. It outperforms state-of-the-art KD methods.

Knowledge Distillation with Adversarial Samples Supporting Decision Boundary
Byeongho Heo, Minsik Lee, Sangdoo Yun, Jin Young Choi.
AAAI, 2019
Bibtex / Code

To locate the teacher network's decision boundary more precisely, we adopt an adversarial attack technique. We show that the attacked samples improve distillation performance.

Unsupervised Holistic Image Generation from Key Local Patches
Donghoon Lee, Sangdoo Yun, Sungjoon Choi, Hwiyeon Yoo, Ming-Hsuan Yang, Songhwai Oh.
ECCV, 2018
Bibtex / Code

We train a GAN model that generates a holistic image from its small parts.

Context-aware Deep Feature Compression for High-speed Visual Tracking
Jongwon Choi, Hyung Jin Chang, Tobias Fischer, Sangdoo Yun, Kyuewang Lee, Jiyeoup Jeong, Yiannis Demiris, Jin Young Choi.
CVPR, 2018
Bibtex / Code

Correlation-based trackers have shown promising performance using hand-crafted features (e.g., HOG). When adopting deep features for correlation-based trackers, the bottleneck is the computational cost of CNN feature extraction. We propose a deep feature compression method for a high-speed, high-accuracy visual tracker.

Action-Driven Visual Object Tracking with Deep Reinforcement Learning
Sangdoo Yun, Jongwon Choi, Youngjoon Yoo, Kimin Yun, Jin Young Choi.
TNNLS, 2018
Bibtex / Code

A journal extension of ADNet (CVPR'17).

Action-Decision Networks for Visual Tracking with Deep Reinforcement Learning
Sangdoo Yun, Jongwon Choi, Youngjoon Yoo, Kimin Yun, Jin Young Choi.
CVPR, 2017 (Spotlight Presentation)
Bibtex / Code

We formulate visual tracking as a decision-making process and propose a reinforcement learning method to train visual trackers. Our RL-based tracker achieves state-of-the-art-level performance and is especially efficient in the semi-supervised scenario.

Variational Autoencoded Regression: High Dimensional Regression of Visual Data on Complex Manifold
Youngjoon Yoo, Sangdoo Yun, Hyung Jin Chang, Yiannis Demiris, Jin Young Choi.
CVPR, 2017
Bibtex / Code

Generating visual data from a given condition (e.g., frame index, pose skeleton, etc.) is difficult because of the visual data's high dimensionality. Our idea is to regress the visual data in a latent space encoded by a VAE. Our method can generate high-quality visual data from frame indices or pose skeletons.

Attentional Correlation Filter Network for Adaptive Visual Tracking
Jongwon Choi, Hyung Jin Chang, Sangdoo Yun, Tobias Fischer, Yiannis Demiris, Jin Young Choi.
CVPR, 2017
Bibtex / Code

Correlation-filter-based trackers usually use pre-defined feature extractors (e.g., color, edge, etc.). Using more correlation filters with diverse feature extractors at the same time brings higher accuracy, but it induces a speed-accuracy trade-off. This work extends the number of correlation filters to more than one hundred to maximize accuracy. To deal with the heavy computation, we introduce an LSTM-based attentional filter selection approach. Our method achieves state-of-the-art performance among real-time trackers.

PaletteNet: Image Recolorization with Given Color Palette
Junho Cho, Sangdoo Yun, Kyoung Mu Lee, Jin Young Choi.
CVPR Workshop, 2017
Bibtex

We propose an image recolorization method that uses a given color palette.

Butterfly Effect: Bidirectional Control of Classification Performance by Small Additive Perturbation
Youngjoon Yoo, Seonguk Park, Junyoung Choi, Sangdoo Yun, Nojun Kwak.
arXiv, 2017
Bibtex

This paper proposes a new algorithm for controlling classification results by generating a small perturbation without changing the classifier network. We show that the perturbation can either degrade performance, like an adversarial attack, or improve classification accuracy.

Visual Path Prediction in Complex Scenes with Crowded Moving Objects
Youngjoon Yoo, Kimin Yun, Sangdoo Yun, JongHee Hong, Hawook Jeong, Jin Young Choi.
CVPR, 2016
Bibtex

We learn a latent Dirichlet allocation model from people's trajectories and use it to predict their future paths.

Density-aware Pedestrian Proposal Networks for Robust People Detection in Crowded Scenes
Sangdoo Yun, Kimin Yun, Jongwon Choi, Jin Young Choi.
ECCV Workshop, 2016
Bibtex

We detect people in crowded scenes by considering crowd density information. Our intuition is that more people should be detected in crowded regions.

Voting-based 3D Object Cuboid Detection Robust to Partial Occlusion from RGB-D Images
Sangdoo Yun, Hawook Jeong, Soo Wan Kim, Jin Young Choi.
WACV, 2016
Bibtex

We predict the holistic 3D structure from partially occluded RGB-D images. The key idea is a voting mechanism: each part of an object votes for the center of the 3D structure.

Visual Surveillance Briefing System: Event-based Video Retrieval and Summarization
Sangdoo Yun, Kimin Yun, Soo Wan Kim, Youngjoon Yoo, Jiyeoup Jeong.
AVSS, 2014 (Oral Presentation)
Bibtex

We propose a Visual Surveillance Briefing (VSB) system that generates a summarized video of important events.

Self-organizing Cascaded Structure of Deformable Part Models for Fast Object Detection
Sangdoo Yun, Hawook Jeong, Woo-Sung Kang, Byeongho Heo, Jin Young Choi.
ICPR, 2014
Bibtex

We improve the computational efficiency of the deformable part model (DPM) by re-organizing the order of part filters. With a cascaded structure, we place the more important part filters first for early rejection.

Multiple Ground Plane Estimation for 3D Scene Understanding Using a Monocular Camera
Sangdoo Yun, Soo Wan Kim, Kwang Moo Yi, Haan-ju Yoo, Jin Young Choi.
IVCNZ, 2012 (Oral Presentation)
Bibtex

Ground plane estimation is important for 3D scene understanding. Models usually assume the scene has a single ground plane, but some scenes have multiple ground planes. We introduce multiple ground plane estimation for more robust scene understanding.

Academic service
Workshop on ImageNet: Past, Present, and Future.
Zeynep Akata, Lucas Beyer, Sanghyuk Chun, Almut Sophia Koepke, Diane Larlus, Seong Joon Oh, Rafael Sampaio de Rezende, Sangdoo Yun, Xiaohua Zhai.
NeurIPS, 2021
Website / Virtual Page / Preview in CV News

ImageNet has played an important role in CV and ML over the last decade. It was originally created to train image classifiers, but it has become a go-to benchmark for model architectures and training techniques. We believe now is a good time to discuss ImageNet and its future. The workshop asks questions such as: Did we solve ImageNet? What have we learned from ImageNet? What should the next-generation ImageNet-like dataset be?


Reviewing activities

  • Reviewer for CVPR, ICCV, ECCV, ICML, NeurIPS, ICLR, AAAI, etc.
  • Outstanding reviewer awards at CVPR'21 and ICCV'21.
  • Meta-reviewer for AAAI'22.
Talks


Template borrowed from Jon Barron and Seong Joon Oh.