Two papers have been accepted to IEEE TPAMI. TPAMI has the highest impact factor across all computer science categories (the second-highest impact factor of all IEEE publications). As of 2019, the impact factor of TPAMI is 17.861.
Title: Dense Relational Image Captioning via Multi-task Triple-Stream Networks
Authors: Dong-Jin Kim (KAIST), Tae-Hyun Oh (POSTECH), Jinsoo Choi (Apple), In So Kweon (KAIST)
We introduce dense relational captioning, a novel image captioning task which aims to generate multiple captions with respect to relational information between objects in a visual scene. Relational captioning provides explicit descriptions of each relationship between object combinations. This framework is advantageous in both diversity and amount of information, leading to a comprehensive image understanding based on relationships, e.g., relational proposal generation. For relational understanding between objects, the part-of-speech (POS, i.e., subject-object-predicate categories) can be a valuable prior information to guide the causal sequence of words in a caption. We enforce our framework to not only learn to generate captions but also predict the POS of each word. To this end, we propose the multi-task triple-stream network (MTTSNet) which consists of three recurrent units responsible for each POS which is trained by jointly predicting the correct captions and POS for each word. In addition, we found that the performance of MTTSNet can be improved by modulating the object embeddings with an explicit relational module. We demonstrate that our proposed model can generate more diverse and richer captions, via extensive experimental analysis on large scale datasets and several metrics. We additionally extend analysis to an ablation study, applications on holistic image captioning, scene graph generation, and retrieval tasks.
Title: Robust and Efficient Estimation of Relative Pose for Cameras on Selfie Sticks
Authors: Kyungdon Joo (UNIST), Hongdong Li (ANU), Tae-Hyun Oh (POSTECH), In So Kweon (KAIST) (Corresponding author: Tae-Hyun Oh)
Taking selfies has become one of the major photographic trends of our time. In this study, we focus on the selfie stick, on which a camera is mounted to take selfies. We observe that a camera on a selfie stick typically travels through a particular type of trajectory around a sphere. Based on this finding, we propose a robust, efficient, and optimal estimation method for relative camera pose between two images captured by a camera mounted on a selfie stick. We exploit the special geometric structure of camera motion constrained by a selfie stick and define this motion as spherical joint motion. Utilizing a novel parametrization and calibration scheme, we demonstrate that the pose estimation problem can be reduced to a 3-degrees of freedom (DoF) search problem, instead of a generic 6-DoF problem. This facilitates the derivation of an efficient branch-and-bound optimization method that guarantees a global optimal solution, even in the presence of outliers. Furthermore, as a simplified case of spherical joint motion, we introduce selfie motion, which has a fewer number of DoF than spherical joint motion. We validate the performance and guaranteed optimality of our method on both synthetic and real-world data. Additionally, we demonstrate the applicability of the proposed method for two applications: refocusing and stylization.