[Paper Reading] EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation

AIst 2024. 6. 29. 14:50

Article References

Kim, C., Han, W., Ju, D., & Hwang, S. J. (2024). EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2403.01482

EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation

Global Issue:
- Pixel-level annotation is time-consuming, so the paper focuses on unsupervised semantic segmentation (USS).
Problems with traditional USS methods that rely on patch-level features:
- Patch-level features lack explicit object-level semantic encoding.
- This leads to inadequate segmentation of complex objects with diverse structures.

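Patch-level features may also fail to handle object boundaries properly. When an object straddles a patch boundary, it has to be split across two or more patches, so accuracy drops near the boundary, and the features do not capture semantic information that covers the object as a whole.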

To address this, the paper:
1. Introduces EiCue, a spectral technique that provides more accurate and consistent semantic and structural information.
  - It uses an eigenbasis derived from the semantic similarity matrix of deep image features and the color affinity of the image.
2. Incorporates an object-centric contrastive loss with EiCue.
  - Combining EiCue with the object-centric contrastive loss lets the model learn intra- and inter-image object-feature consistency, which improves semantic accuracy.
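
As a rough illustration of what an object-centric contrastive objective can look like, here is a minimal NumPy sketch of a generic prototype-based InfoNCE-style loss. It is not the paper's ObjNCELoss: the function names, the `temperature` value, and the toy data are all assumptions made for this example. Each feature is pulled toward the mean feature (prototype) of the object it is assigned to and pushed away from the other prototypes.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def prototype_nce_loss(features, assignments, num_objects, temperature=0.1):
    """Generic prototype contrastive loss (hypothetical helper, not from the paper).

    features:    (N, D) array of per-patch features
    assignments: (N,) integer object index per patch (e.g. from some clustering cue)
    """
    features = l2_normalize(features)
    one_hot = np.eye(num_objects)[assignments]            # (N, C)
    counts = one_hot.sum(axis=0)[:, None]                 # (C, 1)
    prototypes = (one_hot.T @ features) / np.maximum(counts, 1.0)
    prototypes = l2_normalize(prototypes)                 # (C, D)
    logits = features @ prototypes.T / temperature        # (N, C) cosine / tau
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Negative log-probability of each patch's own object prototype.
    return -log_prob[np.arange(len(assignments)), assignments].mean()

# Toy data: two well-separated groups of 2-D features.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[1.0, 0.0], scale=0.05, size=(8, 2))
group_b = rng.normal(loc=[0.0, 1.0], scale=0.05, size=(8, 2))
features = np.concatenate([group_a, group_b])

good = np.array([0] * 8 + [1] * 8)   # assignments that match the groups
bad = np.array([0, 1] * 8)           # assignments that mix the groups

loss_good = prototype_nce_loss(features, good, num_objects=2)
loss_bad = prototype_nce_loss(features, bad, num_objects=2)
print(loss_good, loss_bad)
```

The loss is small when the assignments agree with the feature geometry and larger when they mix objects, which is the kind of consistency signal a contrastive objective of this family provides.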

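Why are eigenvectors important in computer vision?
1. They play a central role in Principal Component Analysis (PCA), a dimensionality reduction technique. PCA maps high-dimensional data to a lower-dimensional space, preserving the important patterns in the data while reducing computational cost.
2. The eigenvectors of image data represent important characteristics of the images. (For example, the Eigenfaces algorithm, used in face recognition, extracts the principal modes of variation of face images through eigenvectors.)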
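
To make the PCA point concrete, the following is a small sketch of PCA computed directly from the eigenvectors of the covariance matrix. The helper name `pca_eig` and the synthetic data are assumptions for this example; the data is 3-D but varies almost entirely along one direction.

```python
import numpy as np

def pca_eig(X, n_components):
    """Project the rows of X onto the top eigenvectors of the covariance matrix."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)           # (D, D) covariance
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]                   # (D, n_components)
    return X_centered @ components, eigvals[order]

# Synthetic 3-D data that mostly varies along a single direction.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.stack(
    [t, 2.0 * t + 0.05 * rng.normal(size=200), 0.05 * rng.normal(size=200)],
    axis=1,
)

Z, variances = pca_eig(X, n_components=1)
explained = variances[0] / np.trace(np.cov(X, rowvar=False))
print(Z.shape, explained)
```

Here a single eigenvector captures nearly all of the variance, so the 3-D data can be reduced to one dimension with little loss.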

Semantic segmentation?
The process of classifying each pixel of an image or video into a specific class.
The word 'semantic' is there because the task goes beyond simply partitioning an image: rather than just dividing the image into parts (segmentation), it assigns a meaningful label to each part (pixel or region).
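
A tiny sketch of the "label per pixel" idea: given per-pixel class scores, the segmentation map is the highest-scoring class at each pixel. The class names and the score values below are made up for illustration.

```python
import numpy as np

CLASS_NAMES = ["sky", "road", "car"]  # hypothetical classes

# Hypothetical per-pixel class scores with shape (num_classes, H, W).
scores = np.zeros((3, 2, 3))
scores[0, 0, :] = 1.0    # top row scores highest for "sky"
scores[1, 1, :2] = 1.0   # bottom-left pixels score highest for "road"
scores[2, 1, 2] = 1.0    # bottom-right pixel scores highest for "car"

# Semantic segmentation output: one class index per pixel.
label_map = scores.argmax(axis=0)  # shape (H, W)
named = [[CLASS_NAMES[i] for i in row] for row in label_map]
print(named)
```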

EAGLE Mechanism
1. Unlabeled Images Preparation
2. Feature K extraction
  - use a self-supervised pre-trained vision transformer as an image encoder
  - A tilde above a symbol indicates that the input image is an augmented image.
  - K: structural information about the objects based on the attention mechanism
3. Semantic Feature S extraction
  - Extract the semantic features S from feature K
  - S = S_θ(K)
4. EiCue via the Eigen Aggregation Module
  - derive a strong and simple semantic structural cue (EiCue)
  - use Spectral Clustering
  - EiCue Construction: following spectral clustering, build an adjacency matrix A, construct the graph Laplacian L, and then perform an eigendecomposition to derive the eigenbasis V
    1. Production of an adjacency matrix A
    2. Construction of the graph Laplacian L
    3. Performing the eigendecomposition on L to derive the eigenbasis V
  - EiCue-based ObjNCELoss (object-centric contrastive learning strategy)
    - Aggregating object representations and creating a segmentation map that reflects object semantics are both important.
    - refine the discriminative capabilities of feature S for distinctions among various object semantics
5. Inferences
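  - At inference time, given a new image, the final semantic segmentation output is obtained from the semantic feature S using conventional evaluation setups such as K-means clustering and linear probing
    - Key point: it is important to train S_θ so that it produces strong semantic features S.
    - K-means clustering: an unsupervised learning algorithm that partitions data into clusters so that data points with similar characteristics fall into the same group.
    - Linear Probing: an evaluation method that extracts features from a (typically pre-trained) model and evaluates them with a linear model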
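
The EiCue construction steps in step 4 (adjacency matrix A, graph Laplacian L, eigenbasis V) and the K-means clustering mentioned in step 5 can be sketched in NumPy as below. This is a generic spectral clustering sketch under several assumptions, not the paper's implementation: the way feature similarity and color affinity are combined (`color_weight`, `sigma`), the use of the symmetric normalized Laplacian, and the simple K-means with farthest-point initialization are all choices made for this example.

```python
import numpy as np

def build_adjacency(features, colors, color_weight=0.5, sigma=0.2):
    """Adjacency A from feature cosine similarity plus a Gaussian color affinity."""
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    feat_sim = np.clip(f @ f.T, 0.0, None)                 # keep non-negative weights
    sq_dist = ((colors[:, None, :] - colors[None, :, :]) ** 2).sum(axis=-1)
    color_aff = np.exp(-sq_dist / (2.0 * sigma ** 2))
    return feat_sim + color_weight * color_aff

def normalized_laplacian(A):
    """Symmetric normalized graph Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-8))
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def eigenbasis(L, k):
    """Eigenvectors of L for the k smallest eigenvalues (eigh sorts ascending)."""
    _, eigvecs = np.linalg.eigh(L)
    return eigvecs[:, :k]

def kmeans(X, k, n_iter=20):
    """Plain K-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.stack(centers)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Toy "patches": two objects, each with its own feature direction and color.
rng = np.random.default_rng(0)
features = np.concatenate([
    rng.normal([1.0, 0.0, 0.0], 0.05, size=(5, 3)),
    rng.normal([0.0, 1.0, 0.0], 0.05, size=(5, 3)),
])
colors = np.concatenate([
    rng.normal([0.9, 0.1, 0.1], 0.02, size=(5, 3)),
    rng.normal([0.1, 0.1, 0.9], 0.02, size=(5, 3)),
])

A = build_adjacency(features, colors)   # 1. adjacency matrix A
L = normalized_laplacian(A)             # 2. graph Laplacian L
V = eigenbasis(L, k=2)                  # 3. eigenbasis V
labels = kmeans(V, k=2)                 # cluster the rows of V into objects
print(labels)
```

In this toy setup the first five patches end up in one cluster and the last five in the other, since the adjacency matrix is nearly block-diagonal. Clustering the rows of V rather than the raw features is what the usual spectral clustering recipe does.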

Overview of EAGLE (Eigen AGgregation LEarning)

Computer Vision Trend in this Article:
- Research on unsupervised learning is being pursued to address the time-consuming process of pixel-level annotation.
