The Vision Research Lab of UCSB at TRECVID 2007

Elisa Drelie Gelasca, Swapna Joshi, James Kleban, Stephen Mangiat,
B.S. Manjunath, Emily Moxley, Anindya Sarkar, Jiejun Xu
Vision Research Lab, University of California at Santa Barbara
Santa Barbara, California 93106 U.S.A.
drelie, sjoshi, kleban, smangiat, emoxley, manj, anindya, jiejun [at] ece.ucsb.edu

Abstract

The Vision Research Lab at the University of California at Santa Barbara participated in three TRECVID 2007 tasks: rushes summarization, high-level feature extraction, and search. This paper describes our contributions to the high-level feature and search tasks. The high-level feature submissions relied on visual features for three runs, audio features exclusively for one, and a fusion of audio and visual features for the remaining two; Table 1 provides a summary. Four MPEG-7 descriptors (DCD, CLD, EHD, and HTD) comprised the global visual features, and a SIFT signature derived from a vocabulary tree provided the local-feature representation. The local features performed quite well on their own. We combined audio and visual methods in a weighted fusion of SVM scores from the visual features, kNN-derived scores from the visual features, and SVM scores from the audio features. Linear fusion with a grid search for the weights on the visual features, without audio, performed best. Additionally, we submitted a fused run based on a weighted Borda count over the ranked lists from audio, global visual, local visual, and face features. This run performed similarly to the weighted fusion that included audio. All of our runs were type A, using only commonly annotated data for training.
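As an illustration of the weighted linear fusion described above, the Python sketch below searches a coarse weight grid for the convex combination of per-modality scores that maximizes average precision on held-out data. This is a minimal sketch, not the authors' code; the score arrays, labels, and step size are hypothetical stand-ins.

    import numpy as np
    from itertools import product

    def average_precision(labels, scores):
        # Non-interpolated average precision of a ranked list.
        order = np.argsort(-scores)
        ranked = np.asarray(labels)[order]
        if ranked.sum() == 0:
            return 0.0
        hits = np.cumsum(ranked)
        ranks = np.flatnonzero(ranked == 1) + 1
        return float(np.mean(hits[ranked == 1] / ranks))

    def grid_search_fusion(score_lists, labels, step=0.1):
        # Exhaustively try convex weight combinations on a coarse grid
        # and keep the one with the best validation AP.
        grid = np.arange(0.0, 1.0 + 1e-9, step)
        best_ap, best_w = -1.0, None
        for w in product(grid, repeat=len(score_lists)):
            if abs(sum(w) - 1.0) > 1e-9:
                continue  # weights must sum to one
            fused = sum(wi * s for wi, s in zip(w, score_lists))
            ap = average_precision(labels, fused)
            if ap > best_ap:
                best_ap, best_w = ap, w
        return best_w, best_ap

For the visual-only fusion this would be called with the SVM and kNN score arrays as the two lists; adding the audio SVM scores as a third list gives the three-way audio-visual variant.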
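Likewise, a minimal sketch of weighted Borda count fusion over ranked shot lists, under the usual convention that a rank-r item in a list of n items earns n - r points. The shot IDs and weights below are illustrative, not values from the paper.

    def borda_fusion(ranked_lists, weights):
        # Each item earns (list_length - rank) points from every list
        # that ranks it, scaled by that list's weight; items absent
        # from a list earn nothing from it.
        scores = {}
        for ranking, w in zip(ranked_lists, weights):
            n = len(ranking)
            for rank, item in enumerate(ranking):
                scores[item] = scores.get(item, 0.0) + w * (n - rank)
        return sorted(scores, key=scores.get, reverse=True)

    # Illustrative fusion of two modality rankings.
    ranked_audio  = ["shot3", "shot1", "shot7"]
    ranked_visual = ["shot1", "shot7", "shot3"]
    print(borda_fusion([ranked_audio, ranked_visual], weights=[0.3, 0.7]))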
E. Drelie Gelasca, S. Joshi, J. Kleban, S. Mangiat, B.S. Manjunath, E. Moxley, A. Sarkar, and J. Xu, "The Vision Research Lab of UCSB at TRECVID 2007," Proceedings of the TRECVID 2007 Workshop, Nov. 2007.