Sangdoo Yun
I'm a research director at Naver AI Lab,
working on various machine learning models towards real-world applications.
At Naver, I've worked on network architectures (ReXNet, PiT),
training techniques (CutMix, ReLabel, AdamP, KD),
and robustness (ReBias, shortcut learning, Model Stock).
I've also contributed to Naver's OCR
(e.g., CRAFT, STR, Donut),
face recognition, and LLM (e.g., Cream) products.
I received my MS and PhD in computer vision from Seoul National University in 2013 and 2017, respectively, under the supervision of Prof. Jin Young Choi.
I received my BS from Seoul National University in 2010.
I've also been an adjunct professor at the SNU AI Institute since Sep 2022, continuing my previous position at the SNU CSE Dept. (Sep 2021 - Aug 2022).
Email  / 
CV  / 
Google Scholar  / 
Github
|
|
Research
I am interested in training robust, generalizable, and transferable
ML models (including vision, language, and vision-language models) for real-world applications.
|
|
Model Stock: All we need is just a few fine-tuned models
Dong-Hwan Jang,
Sangdoo Yun*,
Dongyoon Han*.
ECCV, 2024 (Oral Presentation)
arXiv /
Bibtex /
Code
Averaging many models (e.g., 70 models for Model Soups) improves in-distribution and out-of-distribution performance.
Our method, Model Stock (think of it as a highly efficient Model Soup), achieves comparable performance with just a few fine-tuned models.
Our secret lies in the weight space of fine-tuned models: we find that the models have special geometric characteristics,
and we derive an efficient way to achieve the effect of averaging many models.
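For intuition, here is a minimal sketch of the baseline operation: uniform weight averaging of fine-tuned checkpoints (the Model Soup recipe whose effect Model Stock approximates with only a few models plus the pretrained weights). The function name and usage are illustrative, not from the official code.
```python
import torch

def average_weights(state_dicts):
    # Uniformly average fine-tuned checkpoints (the Model Soup baseline
    # whose effect Model Stock approximates with only a few models).
    avg = {}
    for key in state_dicts[0]:
        avg[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return avg
```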
|
|
HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts
Wonjae Kim,
Sanghyuk Chun,
Taekyung Kim,
Dongyoon Han,
Sangdoo Yun.
ECCV, 2024 (Oral Presentation)
arXiv /
Bibtex /
Code
Data quality, especially specificity and clarity, is crucial for self-supervised model training.
We introduce Hyperbolic Entailment Filtering (HYPE) for images and texts, which achieves state-of-the-art performance on
the DataComp benchmark.
|
|
Rotary Position Embedding for Vision Transformer
Byeongho Heo,
Song Park,
Dongyoon Han,
Sangdoo Yun.
ECCV, 2024
arXiv /
Bibtex /
Code
We study the impact of Rotary Position Embedding (RoPE) on vision models, especially Vision Transformers (ViTs).
Starting from the original 1D RoPE for text data, we explore practical implementations for 2D vision data.
Our analysis shows that RoPE delivers remarkable extrapolation performance (e.g., when increasing image resolution).
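As a rough illustration, one way to extend 1D RoPE to 2D patches is the axial variant: rotate half of each head's channels by the patch's x-coordinate and the other half by its y-coordinate (the paper also studies a mixed-frequency variant). Shapes, function names, and the theta base below are illustrative assumptions, not the official implementation.
```python
import torch

def rope_rotate(x, pos, theta=100.0):
    # Standard 1D RoPE rotation. x: (..., n, d) with d even; pos: (n,) positions.
    d = x.shape[-1]
    freqs = 1.0 / theta ** (torch.arange(0, d, 2, dtype=torch.float32) / d)
    angles = pos.float()[:, None] * freqs[None, :]        # (n, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def axial_rope_2d(x, xs, ys):
    # Axial 2D RoPE: rotate the first half of the channels by the patch's
    # x-coordinate and the second half by its y-coordinate.
    d = x.shape[-1]
    return torch.cat([rope_rotate(x[..., : d // 2], xs),
                      rope_rotate(x[..., d // 2:], ys)], dim=-1)
```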
|
|
TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification
Martin Gubri,
Dennis Thomas Ulmer,
Hwaran Lee,
Sangdoo Yun,
Seong Joon Oh.
ACL Findings, 2024
arXiv /
Bibtex /
Code
LLM weights are valuable assets, and their usage should be strictly controlled under the providers' rules, for both open- and closed-source models.
For instance, when a new model appears, the provider wants to know whether it is based on their own LLM.
To this end, we propose TRAP, a black-box LLM identification method.
TRAP asks the model a very specific question (e.g., "give a random 4-digit number") that only a particular model will answer in a characteristic way.
|
|
Calibrating Large Language Models Using Their Generations Only
Dennis Thomas Ulmer,
Martin Gubri,
Hwaran Lee,
Sangdoo Yun,
Seong Joon Oh.
ACL, 2024
arXiv /
Bibtex /
Code
Our goal is to calibrate the confidence of LLMs to increase the reliability of their outputs.
We propose APRICOT, a method to calibrate the confidence of LLMs using an external lightweight model.
|
(generated by Dall-E)
|
TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models
Jaewoo Ahn,
Taehyun Lee,
Junyoung Lim,
Jin-Hwa Kim,
Sangdoo Yun,
Hwaran Lee,
Gunhee Kim.
ACL Findings, 2024
arXiv /
Bibtex /
Code
When interacting with an LLM-based chatbot, especially in role-playing scenarios, users expect the model to act as a specific character at a specific point in time.
We propose TimeChara, a method to evaluate the point-in-time character hallucination of role-playing large language models.
TimeChara can evaluate a model's consistency and coherence in role-playing tasks.
|
(generated by Dall-E)
|
Who Wrote this Code? Watermarking for Code Generation
Taehyun Lee*,
Seokhee Hong*,
Jaewoo Ahn,
Ilgee Hong,
Hwaran Lee,
Sangdoo Yun,
Jamin Shin,
Gunhee Kim.
ACL, 2024
arXiv /
Bibtex /
Code
We propose SWEET, a watermarking method for code-generation LLMs.
Previous watermarking methods fail in code generation scenarios due to the syntactic and semantic characteristics of code.
SWEET leverages the entropy of the token distribution and shows strong watermarking performance in code generation.
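A condensed sketch of the entropy-gating idea: as in green-list watermarking (Kirchenbauer et al.), bias "green" tokens at generation time, but only at high-entropy positions, so that syntax-critical, low-entropy code tokens remain untouched. The threshold, delta, and how the green mask is derived are placeholder assumptions here.
```python
import torch

def sweet_step(logits, green_mask, delta=2.0, entropy_threshold=0.7):
    # logits: (vocab,) next-token logits; green_mask: (vocab,) 1.0 for green tokens.
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-9)).sum()
    if entropy > entropy_threshold:      # only watermark uncertain positions
        logits = logits + delta * green_mask
    return logits
```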
|
|
Toward Interactive Regional Understanding in Vision-Large Language Models
Jungbeom Lee,
Sanghyuk Chun*,
Sangdoo Yun*.
NAACL, 2024
arXiv /
Bibtex /
Code
We propose RegionVLM, a vision-large language model that can interactively attend to regions of interest in images.
RegionVLM simply handles regional information (e.g., coordinates) as additional text inputs.
Trained with the Localized Narratives dataset, RegionVLM shows improved performance on regional understanding tasks.
|
|
Language-only Efficient Training of Zero-shot Composed Image Retrieval
Geonmo Gu*,
Sanghyuk Chun*,
Wonjae Kim,
Yoohoon Kang,
Sangdoo Yun.
CVPR, 2024
arXiv /
Bibtex /
Code /
Demo
Do we really need images for training composed image retrieval (CIR) models? Our answer is NO.
We propose LinCIR, a language-only training method for CIR.
LinCIR shows strong performance with an efficient training cost (e.g., LinCIR with a CLIP ViT-G backbone trains in 1 hour).
|
|
Compressed Context Memory for Online Language Model Interaction
Jang-Hyun Kim,
Junyoung Yeom,
Sangdoo Yun*,
Hyun Oh Song*.
ICLR, 2024
arXiv /
Bibtex /
Code
How can we use infinitely long contexts for LLMs? Inspired by gist tokens, we propose an online context compression method.
Our method compresses the accumulated attention KVs into a few [comp] tokens, using 5x less context memory.
|
|
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Seungone Kim*,
Jamin Shin*,
Yejin Cho*,
Joel Jang,
Shayne Longpre,
Hwaran Lee,
Sangdoo Yun,
Seongjin Shin,
Sungdong Kim,
James Thorne,
Minjoon Seo.
ICLR, 2024
arXiv /
Bibtex /
Code
We introduce Prometheus, a fully open-sourced LLM with evaluation performance comparable to GPT-4.
We also built and open-sourced the Feedback Collection dataset, which includes more than 20K instructions and 100K responses.
|
|
Cream: Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models
Geewook Kim,
Hodong Lee,
Daehee Kim,
Haeji Jung,
Sanghee Park,
Yoonsik Kim,
Sangdoo Yun,
Taeho Kil,
Bado Lee,
Seunghyun Park.
EMNLP, 2023
arXiv /
Bibtex /
Code /
Demo
Following Donut, we build Cream🍦, which leverages large language models (LLMs).
To mitigate the gap between vision encoders and LMs, we propose auxiliary encoders and a contrastive learning scheme.
Cream demonstrates robust and impressive document understanding performance.
|
|
ProPILE: Probing Privacy Leakage in Large Language Models
Siwon Kim,
Sangdoo Yun,
Hwaran Lee,
Martin Gubri,
Sungroh Yoon,
Seong Joon Oh.
NeurIPS, 2023 (Spotlight)
arXiv /
Bibtex /
Tweet /
Code /
Demo
Large language models (LLMs) can answer just about anything, thanks to their hyper-scale parameters and data.
However, they may also reveal private information (i.e., personally identifiable information (PII)), which could be problematic.
With our probing tool, ProPILE, you can investigate whether a model reveals your personal information.
|
|
Neural Relation Graph for Identifying Problematic Data
Jang-Hyun Kim,
Sangdoo Yun,
Hyun Oh Song.
NeurIPS, 2023
arXiv /
Bibtex /
Code
Problematic data (e.g., outlier data or incorrect labels) harm model performance and robustness.
However, identifying such problematic data in large-scale datasets is quite challenging.
Our solution focuses on the relationship among data, particularly in the feature space.
By utilizing our relation graph, we can easily determine whether a data point is an outlier, has a misassigned label, or is perfectly fine.
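To make the relational idea concrete, here is a deliberately simplified nearest-neighbor variant: score each sample by how often its label disagrees with its neighbors in feature space. The paper's actual relation graph is more refined; this sketch, with illustrative names and a placeholder k, only conveys the intuition.
```python
import torch

def label_disagreement_scores(feats, labels, k=10):
    # feats: (N, D) features; labels: (N,) class labels.
    feats = torch.nn.functional.normalize(feats, dim=1)
    sim = feats @ feats.T
    sim.fill_diagonal_(-float("inf"))          # exclude self-similarity
    nn_idx = sim.topk(k, dim=1).indices        # (N, k) nearest neighbors
    disagree = labels[nn_idx] != labels[:, None]
    return disagree.float().mean(dim=1)        # high score = suspicious sample
```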
|
|
CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion
Geonmo Gu*,
Sanghyuk Chun*,
Wonjae Kim,
HeeJae Jun,
Yoohoon Kang,
Sangdoo Yun.
*Equal contribution
TMLR, 2024
arXiv /
Bibtex /
Code /
Demo
We propose a diffusion-based model, CompoDiff, for Composed Image Retrieval (CIR) task.
To train the model, we created a new dataset comprising 18 million triplets of images and associated conditions.
CompoDiff shows state-of-the-art zero-shot CIR performance.
|
|
Neglected Free Lunch -- Learning Image Classifiers Using Annotation Byproducts
Dongyoon Han*,
Junsuk Choe*,
Dante Chun,
John Joon Young Chung,
Minsuk Chang,
Sangdoo Yun,
Jean Y. Song,
Seong Joon Oh.
*Equal contribution
ICCV, 2023
arXiv /
cvf /
Bibtex /
Code & Dataset /
Video /
slide (@KCCV2024)
When annotating data, annotators unintentionally generate auxiliary information during the annotation task,
such as mouse traces, mouse clicks, and time durations.
We call these annotation byproducts (AB).
We propose a new paradigm, learning using annotation byproducts (LUAB), which can enhance the robustness of image classifiers by aligning them with human recognition mechanisms.
|
|
SeiT: Storage-Efficient Vision Training with Tokens Using 1% of Pixel Storage
Song Park*,
Sanghyuk Chun*,
Byeongho Heo,
Wonjae Kim,
Sangdoo Yun.
*Equal contribution
ICCV, 2023
arXiv /
cvf /
Bibtex /
Code
Deep vision models are hungry for image data, but image storage has become a bottleneck (e.g., LAION-5B images require 240 TB).
We propose a storage-efficient training method, SeiT, that uses only 1% of standard pixel storage without sacrificing accuracy.
|
|
MPChat: Towards Multimodal Persona-Grounded Conversation
Jaewoo Ahn,
Yeda Song,
Sangdoo Yun,
Gunhee Kim.
ACL, 2023
arXiv /
Bibtex /
Code
Building personas is crucial for personalized dialog systems.
We explore the vision modality beyond text-based personas.
To this end, we collect a multimodal persona dialog dataset (MPChat) and demonstrate how the vision modality helps the conversation.
|
|
What Do Self-Supervised Vision Transformers Learn?
Namuk Park,
Wonjae Kim,
Byeongho Heo,
Taekyung Kim,
Sangdoo Yun.
ICLR, 2023
OpenReview /
Poster /
Slide /
arXiv /
Bibtex /
Code
What are the differences between contrastive learning (CL) and masked image modeling (MIM)?
Our findings indicate that: (1) CL captures global patterns more effectively than MIM,
(2) CL learns shape-oriented features while MIM focuses on texture-oriented features,
and (3) CL plays a key role in later layers, whereas MIM is more concentrated on early layers.
|
|
Exploring Temporally Dynamic Data Augmentation for Video Recognition
Taeoh Kim,
Jinhyung Kim,
Minho Shim,
Sangdoo Yun,
Myunggu Kang,
Dongyoon Wee,
Sangyoun Lee.
ICLR, 2023 (Notable Top 25%)
OpenReview /
arXiv /
Bibtex
We introduce DynaAugment, a new video data augmentation method that captures the temporal dynamics in videos.
DynaAugment changes the magnitude of each augmentation operation over time to emulate the temporal dynamics found in real-world videos.
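A toy sketch of the temporal idea, using a random linear ramp of the magnitude across frames; the paper samples smoother, richer temporal schedules, so the function below is only an assumption-laden illustration.
```python
import numpy as np

def dynamic_magnitudes(base_mag, num_frames, delta=3.0):
    # Vary the augmentation magnitude across frames with a random linear
    # ramp, instead of one static magnitude for the whole clip.
    start = base_mag + np.random.uniform(-delta, delta)
    end = base_mag + np.random.uniform(-delta, delta)
    return np.linspace(start, end, num_frames)
```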
|
|
A Unified Analysis of Mixed Sample Data Augmentation: A Loss Function Perspective
Chanwoo Park*,
Sangdoo Yun*,
Sanghyuk Chun.
*Equal contribution
NeurIPS, 2022
OpenReview /
arXiv /
Poster /
Bibtex /
Code
Mixed sample data augmentation (MSDA), such as mixup and CutMix, has become a de facto strategy,
but it is not yet deeply understood.
We introduce the first unified theoretical analysis of MSDAs and clarify the differences between mixup and CutMix.
Building on this analysis, we construct a simple hybrid of mixup and CutMix that leverages the advantages of both.
|
|
Donut 🍩: Document Understanding Transformer without OCR
Geewook Kim,
Teakgyu Hong,
Moonbin Yim,
Jinyoung Park,
Jinyeong Yim,
Wonseok Hwang,
Sangdoo Yun,
Dongyoon Han,
Seunghyun Park.
ECCV, 2022
arXiv /
Bibtex /
Code
Current visual document understanding (VDU) models heavily rely on external OCR frameworks (e.g., text detection and
text recognition).
OCR is expensive and sometimes unavailable.
We boldly remove the OCR dependency with a simple transformer architecture.
Try our highly efficient and powerful VDU model, Donut 🍩!
|
|
Dataset Condensation via Efficient Synthetic-Data Parameterization
Jang-Hyun Kim,
Jinuk Kim,
Seong Joon Oh,
Sangdoo Yun,
Hwanjun Song,
Joonhyun Jeong,
Jung-Woo Ha,
Hyun Oh Song.
ICML, 2022
Bibtex /
Code
Dataset condensation compresses training data by synthesizing it into a small number of images.
The goal is to obtain higher performance with lower data storage consumption.
We propose practical techniques that bring dataset condensation to real-world settings (e.g., 224x224 ImageNet)
beyond previous toy settings (e.g., 32x32 CIFAR).
|
|
Dataset Condensation with Contrastive Signals
Saehyung Lee,
Sanghyuk Chun,
Sangwon Jung,
Sangdoo Yun,
Sungroh Yoon.
ICML, 2022
Bibtex
Existing dataset condensation methods deal with class-wise gradients and ignore inter-class information.
We show that this degrades performance in practical scenarios such as fine-grained classification.
Our simple remedy is to modify the loss function to integrate contrastive signals, which proves effective in several practical scenarios.
|
|
Weakly Supervised Semantic Segmentation using Out-of-Distribution Data
Jungbeom Lee,
Seong Joon Oh,
Sangdoo Yun,
Junsuk Choe,
Eunji Kim,
Sungroh Yoon.
CVPR, 2022
Bibtex /
Code
Weakly supervised semantic segmentation (WSSS) suffers from spurious correlations between foreground (e.g., train) and background (e.g., rail).
Our idea is to collect background images without any foreground pixels (e.g., railroad images without trains).
Then we teach the model not to rely on background pixels when classifying the foreground class.
Adding a small amount of background images brings a large performance gain in WSSS.
|
|
The Majority Can Help The Minority: Context-rich Minority Oversampling for Long-tailed Classification
Seulki Park,
Youngkyu Hong,
Byeongho Heo,
Sangdoo Yun,
Jin Young Choi.
CVPR, 2022
Bibtex /
Code
Data oversampling is a simple solution for long-tailed classification, but it may exacerbate overfitting due to limited context information.
Motivated by CutMix, we introduce a simple context-rich oversampling method.
Interestingly, majority classes play a key role in boosting the classification accuracy of minority classes!
|
|
Hypergraph-Induced Semantic Tuplet Loss for Deep Metric Learning
Jongin Lim,
Sangdoo Yun,
Seulki Park,
Jin Young Choi.
CVPR, 2022
Bibtex /
Code
We formulate deep metric learning as a hypergraph node classification problem to capture multilateral relationships via semantic tuplets, going beyond previous pairwise relationship-based methods.
|
|
Which Shortcut Cues Will DNNs Choose? A Study from the Parameter-Space Perspective
Luca Scimeca*,
Seong Joon Oh*,
Sanghyuk Chun,
Michael Poli,
Sangdoo Yun.
*Equal contribution
ICLR, 2022
Bibtex /
OpenReview
What causes the shortcut learning problem?
We observe models' behavior when multiple cues (e.g., color and shape) are given an equal chance of being fit.
Interestingly, a model prefers to fit a particular cue (e.g., color over shape) even in such a balanced situation.
This paper explains why from a parameter-space perspective.
|
|
Detecting and Removing Text in the Wild
Junho Cho,
Sangdoo Yun,
Dongyoon Han,
Byeongho Heo,
Jin Young Choi.
IEEE Access, 2021
Bibtex
A unified text detection and text removal framework for scene text removal in the wild.
|
|
Rethinking Spatial Dimensions of Vision Transformers
Byeongho Heo,
Sangdoo Yun,
Dongyoon Han,
Sanghyuk Chun,
Junsuk Choe,
Seong Joon Oh.
ICCV, 2021
Bibtex /
Code
The Vision Transformer (ViT) has become a strong design principle for vision modeling.
Because ViT originated from NLP's Transformer,
it has no intermediate pooling layers, which are common in CNNs.
We inject the pooling concept into ViT and introduce a new architecture, PiT, as sketched below.
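Schematically, the pooling step works as in this minimal sketch: spatial tokens are reshaped back to a 2D grid, downsampled with a strided depthwise convolution, and flattened again. The class token is handled by a separate linear layer in the actual model, and the hyperparameters below are illustrative.
```python
import torch.nn as nn

class TokenPooling(nn.Module):
    # out_dim must be a multiple of dim (PiT widens channels while pooling).
    def __init__(self, dim, out_dim, stride=2):
        super().__init__()
        self.conv = nn.Conv2d(dim, out_dim, kernel_size=3, stride=stride,
                              padding=1, groups=dim)  # depthwise downsampling

    def forward(self, tokens, h, w):
        b, n, c = tokens.shape                 # n == h * w spatial tokens
        grid = tokens.transpose(1, 2).reshape(b, c, h, w)
        grid = self.conv(grid)                 # (b, out_dim, h//2, w//2)
        return grid.flatten(2).transpose(1, 2) # back to token sequence
```
For example, `TokenPooling(64, 128)` halves a 14x14 token grid to 7x7 while doubling the channel width between stages.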
|
|
Normalization Matters in Weakly Supervised Object Localization
Jeesoo Kim,
Junsuk Choe,
Sangdoo Yun,
Nojun Kwak.
ICCV, 2021
Bibtex /
Code
We investigate the effect of CAM (CVPR'16) normalization on WSOL
and suggest a new normalization method.
|
|
Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels
Sangdoo Yun,
Seong Joon Oh,
Byeongho Heo,
Dongyoon Han,
Junsuk Choe,
Sanghyuk Chun.
CVPR, 2021
Bibtex /
Code /
Video /
Poster
ImageNet has lots of label noise, and there have been efforts to fix it on the evaluation set
(e.g., Shankar et al., Beyer et al.).
We turned our attention to the training set, whose label noise has been overlooked, and release the re-labeled ImageNet and codebase
(published at this repo).
The re-labeled data improves ImageNet and downstream task accuracies.
|
|
Rethinking Channel Dimensions for Efficient Model Design
Dongyoon Han,
Sangdoo Yun,
Byeongho Heo,
Youngjoon Yoo.
CVPR, 2021
Bibtex /
Code
CNN architectures (e.g., ResNet, MobileNet) usually follow the same feature-map down-sampling policy.
We conjecture that this design policy harms the representational ability of intermediate layers.
We analyze the feature maps' rank (inspired by the softmax bottleneck) and suggest a new network architecture,
namely, Rank eXpanded Network (ReXNet).
|
|
AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights
Byeongho Heo*,
Sanghyuk Chun*,
Seong Joon Oh,
Dongyoon Han,
Youngjung Uh,
Sangdoo Yun,
Jung-Woo Ha.
*Equal contribution
ICLR, 2021
Bibtex /
Code /
Project
We add a projection operation to the Adam and SGD optimizers to mitigate the slowdown of convergence caused by the rapidly increasing norm of scale-invariant weights.
It leads to performance improvements across the board with easy installation (pip install adamp).
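The key step, sketched below: for a scale-invariant weight, remove the component of the update parallel to the weight vector, since it only inflates the norm without changing the effective direction. The real optimizer also detects scale invariance automatically via cosine similarity; this illustrative function covers only the projection itself.
```python
import torch

def project_out_radial(p, update, eps=1e-8):
    # Drop the radial (norm-growing) component of the update for a
    # scale-invariant weight p, keeping only the tangential direction.
    p_flat, u_flat = p.reshape(-1), update.reshape(-1)
    p_hat = p_flat / (p_flat.norm() + eps)
    return (u_flat - torch.dot(p_hat, u_flat) * p_hat).reshape(p.shape)
```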
|
|
VideoMix: Rethinking Data Augmentation for Video Classification
Sangdoo Yun,
Seong Joon Oh,
Byeongho Heo,
Dongyoon Han,
Jinhyung Kim.
arXiv, 2020
Bibtex
An extension of CutMix to video recognition.
We search for the best mixing strategy for video tasks.
|
|
Learning De-biased Representations with Biased Representations
Hyojin Bahng,
Sanghyuk Chun,
Sangdoo Yun,
Jaegul Choo,
Seong Joon Oh.
ICML, 2020
Bibtex /
Code /
ICML Virtual /
Youtube
Models tend to learn biased representations.
To "de-bias" the model's representation, we "minus" the biased representation from the target model.
|
|
EXTD: Extremely tiny face detector via iterative filter reuse
Youngjoon Yoo,
Dongyoon Han,
Sangdoo Yun.
arXiv, 2019
Bibtex /
Code
Face detectors use multiple stages for multi-resolution detection, but they do not actually require such complex feature encoding.
We introduce an extremely tiny face detector via iterative filter reuse.
|
|
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
Sangdoo Yun,
Dongyoon Han,
Seong Joon Oh,
Sanghyuk Chun,
Junsuk Choe,
Youngjoon Yoo.
ICCV, 2019 (Oral Presentation)
Bibtex /
Code /
Talk /
Poster /
Blog
A simple cut-and-paste strategy brings significant performance boosts across tasks and datasets.
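In code, the idea is tiny; the sketch below follows the paper's formulation (Beta-sampled area, random box, label mixed by the actual pasted area), with function and variable names chosen for illustration.
```python
import numpy as np
import torch

def cutmix(x, y, alpha=1.0):
    # Paste a random box from a shuffled batch and mix the labels
    # in proportion to the pasted area (lambda ~ Beta(alpha, alpha)).
    lam = np.random.beta(alpha, alpha)
    index = torch.randperm(x.size(0))
    H, W = x.size(2), x.size(3)
    cut_h, cut_w = int(H * np.sqrt(1 - lam)), int(W * np.sqrt(1 - lam))
    cy, cx = np.random.randint(H), np.random.randint(W)
    y1, y2 = np.clip(cy - cut_h // 2, 0, H), np.clip(cy + cut_h // 2, 0, H)
    x1, x2 = np.clip(cx - cut_w // 2, 0, W), np.clip(cx + cut_w // 2, 0, W)
    x[:, :, y1:y2, x1:x2] = x[index, :, y1:y2, x1:x2]
    lam = 1 - (y2 - y1) * (x2 - x1) / (H * W)  # correct for box clipping
    return x, y, y[index], lam
```
The training loss is then `lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)`.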
|
|
What Is Wrong with Scene Text Recognition Model Comparisons? Dataset and Model Analysis
Jeonghun Baek,
Geewook Kim,
Junyeop Lee,
Sungrae Park,
Dongyoon Han,
Sangdoo Yun,
Seong Joon Oh,
Hwalsuk Lee.
ICCV, 2019 (Oral Presentation)
Bibtex /
Code
Scene text recognition evaluation has been somewhat unreliable because models and datasets were not controlled.
We provide a unified benchmark protocol and fairly reproduced results. We also discovered a new architecture through these unified experiments.
|
|
A Comprehensive Overhaul of Feature Distillation
Byeongho Heo,
Jeesoo Kim,
Sangdoo Yun,
Hyojin Park,
Nojun Kwak,
Jin Young Choi.
ICCV, 2019
Bibtex /
Code
There are many options for feature distillation: the loss function, the distillation position, and teacher/student transforms.
We study all the possible combinations and provide a comprehensive overhaul of feature distillation.
Through this, we found the best feature distillation method, which even beats the teacher's accuracy.
|
|
An Empirical Evaluation on Robustness and Uncertainty of Regularization Methods
Sanghyuk Chun,
Seong Joon Oh,
Sangdoo Yun,
Dongyoon Han,
Junsuk Choe,
Youngjoon Yoo.
ICML Workshop, 2019
Bibtex
We provide structured experimental results for the effectiveness of regularization methods on robustness and uncertainty benchmarks.
|
|
Character Region Awareness for Text Detection
Youngmin Baek,
Bado Lee,
Dongyoon Han,
Sangdoo Yun,
Hwalsuk Lee.
CVPR, 2019
Bibtex /
Code
Text detectors often fail to detect real-world scene text, e.g., curved or long text.
We propose a two-stage approach: first detect individual characters, then connect them.
We also introduce a semi-weakly-supervised training trick to boost our detector's performance.
|
|
Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons
Byeongho Heo,
Minsik Lee,
Sangdoo Yun,
Jin Young Choi.
AAAI, 2019 (Oral Presentation)
Bibtex /
Code
Previous feature distillation approaches (e.g., FitNet) focus on mimicking the teacher's feature values.
Instead, our goal is to transfer the actual "activation boundary" by assigning binary labels (i.e., activated or not) to all the neurons.
Our loss minimizes the disagreement between the teacher's and student's binary labels.
It outperforms state-of-the-art KD methods.
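A simplified sketch of such a loss: treat each pre-ReLU teacher neuron's on/off state as a binary label and penalize the student with a squared hinge whenever its pre-activation falls on the wrong side of zero. This is a piecewise-differentiable surrogate close in spirit to the paper's alternative loss; the margin and names are illustrative.
```python
import torch

def activation_boundary_loss(t_pre, s_pre, margin=1.0):
    # t_pre/s_pre: pre-ReLU features of teacher and student.
    active = (t_pre > 0).float()           # teacher's binary activation labels
    loss = (active * torch.relu(margin - s_pre) ** 2 +
            (1 - active) * torch.relu(margin + s_pre) ** 2)
    return loss.mean()
```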
|
|
Knowledge Distillation with Adversarial Samples Supporting Decision Boundary
Byeongho Heo,
Minsik Lee,
Sangdoo Yun,
Jin Young Choi.
AAAI, 2019
Bibtex /
Code
To locate the teacher network's decision boundary more precisely, we adopt an adversarial attack technique.
We show that the attacked samples improve distillation performance.
|
|
Unsupervised Holistic Image Generation from Key Local Patches
Donghoon Lee,
Sangdoo Yun,
Sungjoon Choi,
Hwiyeon Yoo,
Ming-Hsuan Yang,
Songhwai Oh.
ECCV, 2018
Bibtex /
Code
We train a GAN model that generates a holistic image from its small parts.
|
|
Context-aware Deep Feature Compression for High-speed Visual Tracking
Jongwon Choi,
Hyung Jin Chang,
Tobias Fischer,
Sangdoo Yun,
Kyuewang Lee,
Jiyeoup Jeong,
Yiannis Demiris,
Jin Young Choi.
CVPR, 2018
Bibtex /
Code
Correlation-based trackers have shown promising performance using hand-crafted features (e.g., HOG).
When adopting deep features for correlation-based trackers, the bottleneck is the computational cost of CNN feature extraction.
We propose a deep feature compression method for a high-speed, high-accuracy visual tracker.
|
|
Action-Driven Visual Object Tracking with Deep Reinforcement Learning
Sangdoo Yun,
Jongwon Choi,
Youngjoon Yoo,
Kimin Yun,
Jin Young Choi.
TNNLS, 2018
Bibtex /
Code
A journal extension of ADNet (CVPR'17).
|
|
Action-Decision Networks for Visual Tracking with Deep Reinforcement Learning
Sangdoo Yun,
Jongwon Choi,
Youngjoon Yoo,
Kimin Yun,
Jin Young Choi.
CVPR, 2017 (Spotlight Presentation)
Bibtex /
Code
We formulate visual tracking as a decision-making process and propose a reinforcement learning method to train visual trackers.
Our RL-based tracker shows state-of-the-art performance and, in particular, high efficiency in the semi-supervised scenario.
|
|
Variational Autoencoded Regression: High Dimensional Regression of Visual Data on Complex Manifold
Youngjoon Yoo,
Sangdoo Yun,
Hyung Jin Chang,
Yiannis Demiris,
Jin Young Choi.
CVPR, 2017
Bibtex /
Code
Generating visual data from a given condition (e.g., frame index or pose skeleton) is difficult due to the high dimensionality of visual data.
Our idea is to regress the visual data in a latent space encoded by a VAE.
Our method can generate high-quality visual data from frame indices or pose skeletons.
|
|
Attentional Correlation Filter Network for Adaptive Visual Tracking
Jongwon Choi,
Hyung Jin Chang,
Sangdoo Yun,
Tobias Fischer,
Yiannis Demiris,
Jin Young Choi.
CVPR, 2017
Bibtex /
Code
Correlation-filter-based trackers usually use a pre-defined feature extractor (e.g., color or edge).
Using many correlation filters with diverse feature extractors at the same time brings higher accuracy, but it induces a speed-accuracy trade-off.
This work scales the number of correlation filters to more than one hundred to maximize accuracy.
To deal with the heavy computation, we introduce an LSTM-based attentional filter selection approach.
Our method achieves state-of-the-art performance among real-time trackers.
|
|
PaletteNet: Image Recolorization with Given Color Palette
Junho Cho,
Sangdoo Yun,
Kyoung Mu Lee,
Jin Young Choi.
CVPR Workshop, 2017
Bibtex
We propose an image recolorization method based on a given color palette.
|
|
Butterfly Effect: Bidirectional Control of Classification Performance by Small Additive Perturbation
Youngjoon Yoo,
Seonguk Park,
Junyoung Choi,
Sangdoo Yun,
Nojun Kwak.
arXiv, 2017
Bibtex
This paper proposes a new algorithm for controlling classification results by generating a small perturbation without changing the classifier network.
We show that the perturbation can degrade performance like an adversarial attack, or improve classification accuracy as well.
|
|
Visual Path Prediction in Complex Scenes with Crowded Moving Objects
Youngjoon Yoo,
Kimin Yun,
Sangdoo Yun,
JongHee Hong,
Hawook Jeong,
Jin Young Choi.
CVPR, 2016
Bibtex
We learn a latent Dirichlet allocation model from people's trajectories and predict their future paths.
|
|
Density-aware Pedestrian Proposal Networks for Robust People Detection in Crowded Scenes
Sangdoo Yun,
Kimin Yun,
Jongwon Choi,
Jin Young Choi.
ECCV Workshop, 2016
Bibtex
Detecting people in crowded scenes by considering crowd density information.
Our intuition is that more people should be detected in crowded regions.
|
|
Voting-based 3D Object Cuboid Detection Robust to Partial Occlusion from RGB-D Images
Sangdoo Yun,
Hawook Jeong,
Soo Wan Kim,
Jin Young Choi.
WACV, 2016
Bibtex
Predicting holistic 3D structure from partially occluded RGB-D images.
The key idea is a voting mechanism: each part of an object votes for the center of the 3D structure.
|
|
Visual Surveillance Briefing System: Event-based Video Retrieval and Summarization
Sangdoo Yun,
Kimin Yun,
Soo Wan Kim,
Youngjoon Yoo,
Jiyeoup Jeong.
AVSS, 2014 (Oral Presentation)
Bibtex
We propose a Visual Surveillance Briefing (VSB) system that generates summarized videos of important events.
|
|
Self-organizing Cascaded Structure of Deformable Part Models for Fast Object Detection
Sangdoo Yun,
Hawook Jeong,
Woo-Sung Kang,
Byeongho Heo,
Jin Young Choi.
ICPR, 2014
Bibtex
We improve the computational efficiency of the deformable part model (DPM) by re-organizing the order of part filters.
With a cascaded structure, we place the more important part filters first for early rejection.
|
|
Multiple ground plane estimation for 3D scene understanding using a monocular camera
Sangdoo Yun,
Soo Wan Kim,
Kwang Moo Yi,
Haan-ju Yoo,
Jin Young Choi.
IVCNZ, 2012 (Oral Presentation)
Bibtex
Ground plane estimation is important for 3D scene understanding.
Models usually assume the scene has a single ground plane, but sometimes there are multiple ground planes.
We introduce multiple ground plane estimation for more robust scene understanding.
|
|
Workshop on ImageNet: Past, Present, and Future.
Zeynep Akata,
Lucas Beyer,
Sanghyuk Chun,
Almut Sophia Koepke,
Diane Larlus,
Seong Joon Oh,
Rafael Sampaio de Rezende,
Sangdoo Yun,
Xiaohua Zhai.
NeurIPS, 2021
Website /
Virtual Page /
Preview in CV News
ImageNet has played an important role in CV and ML over the last decade.
It was originally created to train image classifiers, but it has become a go-to benchmark for model architectures and training techniques.
We believe now is a good time to discuss ImageNet and its future.
The workshop asks questions such as:
Did we solve ImageNet?
What have we learned from ImageNet?
What should the next-generation ImageNet-like dataset be?
|
Reviewing activities
- Serve as an area chair at NeurIPS'23 D&B, NeurIPS'24 D&B, ECCV 2024.
- Serve as a meta-reviewer at AAAI'22, AAAI'23, AAAI'24.
- Serve as a reviewer at CVPR, ICCV, ECCV, ICML, NeurIPS, ICLR, AAAI, etc.
- Outstanding reviewer awards at
CVPR'21,
ICCV'21,
CVPR'22.
|
Talks
- Towards Strong and Robust Deep Models – Insights from Data and Supervision, RCV Workshop @ ICCV 2023
- Towards Strong and Robust Deep Models – Insights from Data and Supervision, POSTECH, Sep 2023
- Towards Strong and Robust Deep Models – Insights from Data and Supervision, UNIST, Sep 2023
- How to Make Deep Models Strong and Robust, Sogang Univ, Sep 2022
- How to Make Deep Models Strong and Robust, Korea Univ, Apr 2022
- How to Make Deep Models Strong and Robust, UNIST, Jun 2021
|
|