Feature fusion and redundancy pruning for rush video summarization

Jim Kleban, Anindya Sarkar, Emily Moxley, Stephen Mangiat, Swapna Joshi, Thomas Kuo and B.S. Manjunath
Vision Research Laboratory, University of California, Santa Barbara, USA
{kleban, anindya, emoxley, smangiat, sjoshi, thekuo, manj} [at] ece.ucsb.edu


This paper presents a video summarization technique for rushes that employs high-level feature fusion to identify segments for inclusion. It aims to capture distinct video events using a variety of features: k-means based weighting, speech, camera motion, significant differences in HSV colorspace, and a dynamic time warping (DTW) based feature that suppresses repeated scenes. The feature functions drive a weighted k-means clustering that identifies the visually distinct, important segments constituting the final summary. The optimal weights for the individual features are obtained with a gradient descent algorithm that maximizes the recall of ground-truth events on representative training videos. Analysis reveals a lengthy computation time but high-quality results (60% average recall over 42 test videos), based on manually judged inclusion of distinct shots. The summaries were judged relatively easy to view and to contain an average amount of redundancy.
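The fusion-and-clustering step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each segment has a visual descriptor (for example, a mean HSV histogram) and a row of per-feature scores (for example, speech, camera motion, colour change). It also assumes that fused importance is a linear weighted sum of those scores. The farthest-first initialization and the rule of keeping the most important segment per cluster are illustrative choices. The helper names `fuse_importance`, `weighted_kmeans` and `select_summary` are hypothetical. Learning the weights by gradient descent on recall is not shown; the weights are simply passed in.

```python
import numpy as np


def fuse_importance(feature_scores: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Assumed linear fusion: (n_segments x n_features) scores times per-feature weights."""
    return feature_scores @ weights


def _assign(X: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Index of the nearest center (squared Euclidean distance) for every segment."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def weighted_kmeans(X: np.ndarray, w: np.ndarray, k: int, n_iter: int = 50):
    """k-means where each segment pulls its centroid in proportion to its importance w."""
    # Hypothetical deterministic init: start from the most important segment,
    # then repeatedly add the segment farthest from all chosen centers.
    idx = [int(np.argmax(w))]
    for _ in range(1, k):
        d = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(np.argmax(d)))
    centers = X[idx].astype(float)

    for _ in range(n_iter):
        labels = _assign(X, centers)
        new_centers = centers.copy()
        for j in range(k):
            mask = labels == j
            total = w[mask].sum()
            if total > 0:
                # Importance-weighted mean of the cluster members.
                new_centers[j] = (w[mask][:, None] * X[mask]).sum(axis=0) / total
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return _assign(X, centers), centers


def select_summary(X, feature_scores, weights, k):
    """Keep the highest-importance segment from each of k visually distinct clusters."""
    importance = np.clip(fuse_importance(feature_scores, weights), 1e-9, None)
    labels, _ = weighted_kmeans(X, importance, k)
    chosen = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size:
            chosen.append(int(members[np.argmax(importance[members])]))
    return sorted(chosen)
```

With two visually separated groups of three segments each and `k=2`, `select_summary` keeps one segment per group: the one whose fused score is highest. Weighting the centroid update by importance biases each cluster center toward its important members. Taking the top-scoring member then ensures the summary contains one representative per distinct event rather than near-duplicate shots.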
J. Kleban, A. Sarkar, E. Moxley, S. Mangiat, S. Joshi, T. Kuo and B.S. Manjunath,
International Workshop on TRECVID Video Summarization, pp. 84-88, Augsburg, Bavaria, Germany, Sep. 2007.