🎉

PR] One paper has been accepted

Tags

Academic

Time

2023/06/27

2 more properties

One paper has been accepted to Pattern Recognition (IF 8.518).

Title: ENInst: Enhancing Weakly-supervised Low-shot Instance Segmentation [Paper][arXiv]

Authors: Moon Ye-Bin (POSTECH), Dongmin Choi (KAIST), Yongjin Kwon (ETRI), Junsik Kim (Harvard University), Tae-Hyun Oh (POSTECH)

[Abstract]

We address a weakly-supervised low-shot instance segmentation, an annotation-efficient training method to deal with novel classes effectively. Since it is an under-explored problem, we first investigate the difficulty of the problem and identify the performance bottleneck by conducting systematic analyses of model components and individual sub-tasks with a simple baseline model. Based on the analyses, we propose ENInst with sub-task enhancement methods: instance-wise mask refinement for enhancing pixel localization quality and novel classifier composition for improving classification accuracy. Our proposed method lifts the overall performance by enhancing the performance of each sub-task. We demonstrate that our ENInst is 7.5 times more efficient in achieving comparable performance to the existing fully-supervised few-shot models and even outperforms them at times.

[EnInst]

Illustration of our ENInst (left) and the proposed enhancement methods developed by our analysis (right). 1) We train the whole network in the base training phase (gray region), 2) fine-tune the prediction heads (blue region), where the classification head for novel classes is parameterized by a linear combination of base classifiers and random vectors, named as the novel classifier composition (NCC; pink region), and its coefficients are fine-tuned with Manifold Mixup (Verma et al., 2019), 3) and then conduct inference with instance-wise mask refinement (IMR; yellow region) in a test-time optimization manner.

[Results]

Label efficiency of ENInst on MS-COCO (Lin et al., 2014). Our ENInst needs much fewer clicks to achieve similar performance to fully-supervised MTFA (Ganea et al., 2021), where F denotes fully-supervised setting with mask, and W denotes weak one with bounding box for novel classes adaption.