Learning Bottom-Up Text Attention Maps For Text Detection Using Stroke Width Transform
Humans have a remarkable ability to quickly discern regions containing text from other noisy regions in images. The primary contribution of this paper is to learn a model that mimics this behavior and aids text detection algorithms. The proposed approach combines multiple low-level visual features that signal visually salient regions and learns a model that produces a text attention map indicating potential text regions in an image. In the next stage, a text detector based on the stroke width transform focuses only on these selected regions, achieving the dual benefits of reduced computation time and improved detection performance. Experimental results on the ICDAR 2003 text detection dataset demonstrate that the proposed method outperforms the baseline implementation of the stroke width transform, and the generated text attention maps compare favorably with human fixation maps on text images.
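The two-stage idea in the abstract — gate the image with a learned attention map, then run the stroke width transform only inside the retained regions — can be sketched as follows. This is a minimal illustration under assumed names (`restrict_to_attention`, the `threshold` parameter, and the toy arrays are all hypothetical), not the paper's implementation.

```python
import numpy as np

def restrict_to_attention(image, attention_map, threshold=0.5):
    """Zero out pixels with low text-attention before running a detector.

    attention_map: float array in [0, 1] with the same spatial size as
    image. Pixels whose attention falls below the threshold are masked,
    so a subsequent detector (e.g. stroke width transform) only needs to
    process the surviving candidate text regions.
    """
    mask = attention_map >= threshold
    return image * mask

# Toy example: a 4x4 grayscale image whose attention map is high only
# in the top-left 2x2 block (a stand-in for a detected text region).
image = np.full((4, 4), 200, dtype=np.uint8)
attention = np.zeros((4, 4))
attention[:2, :2] = 0.9

masked = restrict_to_attention(image, attention)
# Only the 4 attended pixels survive; the detector would skip the rest.
```

The computational saving comes directly from the mask: the detector's per-pixel work (ray casting in SWT) is confined to the attended fraction of the image.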