Pixel-Level Noise Mining for Weakly Supervised Salient Object Detection
Abstract
Training a deep model for visual saliency detection requires the collection and labor-intensive annotation of overwhelmingly large data. We propose to learn saliency detection in a weakly supervised manner from single noisy label, which is easy to obtain from unsupervised handcrafted feature-based methods. However, deep networks tend to overfit such noises leading to a dramatic drop in accuracy. Given our goal, we address a natural question: can we identify outliers during network prediction and rectify the label noises? To this end, we propose a pixel-level noise mining framework for robust salient object detection (SOD) by exploiting its own knowledge, and without the need for external models. Specifically, during the early training stage, we progressively identify the outliers from a novel perspective during saliency detection, before the network overfits to the noisy labels, and generate a selection matrix in each iteration. Next, we adaptively rectify the label noises under the guidance of the selection matrix for better supervision in the later training stage. Extensive experiments on multiple benchmark datasets demonstrate the superiority of our method showing its ability to learn saliency detection comparable to state-of-the-art fully supervised methods. Furthermore, our approach outperforms existing weakly supervised methods utilizing single noisy label and surpasses the half of existing weakly supervised methods employing multiple noisy labels. Our approach, which trains with multiple noisy labels, outperforms all other methods employing multiple noisy labels across four major datasets. Furthermore, we also evaluate the generalization ability of our method on the multiclass semantic segmentation (SS) task. Our code is available at https://github.com/kendongdong/NoiseMining.