Jelena Tešić | Research Directions in 2003

Capturing and organizing vast volumes of multimedia data requires new information processing techniques in the context of pattern recognition and data mining. This problem has received much attention in the last decade, especially following the explosion in media data storage and usage. There is a strong need for more effective systems for processing and understanding large volumes of media data. The challenges in dealing with these data are many, including such inter-related issues as the high dimensionality of image feature descriptors, similarity metrics, and indexing. The unique nature of media data makes the problem significantly more difficult and more interesting, with many commercial and scientific applications. My research interests include content-based retrieval, pattern recognition, similarity search and indexing in multimedia databases, image processing and analysis, semantic classification and labelling, relevance feedback, multimedia mining and knowledge discovery, dimensionality reduction, learning, clustering, and statistical modelling of multimedia feature datasets. The main contribution of my doctoral dissertation is to enable efficient, effective, and interactive data access in large scientific (aerial and biomedical) multimedia systems. Current research issues, my contributions, and my research plan follow.

Feature Extraction

The texture descriptor has proved to be effective for image and video database search and retrieval. My research contribution was a novel approach to reducing the dimensionality of the feature vector, based on the origin of the MPEG-7 descriptor, that is cost-effective and removes data redundancies. The result is a modified texture descriptor that has comparable performance, but with nearly half the dimensionality and less computational expense. Furthermore, the new feature is easy to compute from the old one, without repeating the computationally expensive filtering step. This work offers orders-of-magnitude improvement compared to existing methods. We also propose new normalization and indexing methods that improve similarity retrieval and indexing efficiency, based on the behavior of the filter statistics along feature dimensions. Future work on this subject includes efficient approximate search using bit patterns. Instead of a fixed-length descriptor, we will consider a variable-length texture descriptor of smaller dimension, and a similarity measure that sets a lower bound on the distance comparison along the dimensions if filter outputs follow different distributions. This method will result in a more effective perceptual similarity computation, since it does not assign the actual weight to components that are not comparable (i.e., it will not compare apples and oranges). A similar approach can be adopted for effective search and retrieval in the color and shape descriptor spaces.

Nearest Neighbor Search

Relevance feedback was introduced to facilitate interactive learning for refining retrieval results. However, nearest neighbor computation over a large number of dataset items is expensive on its own. The cost is further aggravated by the very high dimensionality of image descriptors and by the need to repeat the search in every relevance feedback iteration. My contribution was a nearest neighbor search method that considerably accelerates interactive searches in the general context of relevance feedback mechanisms. The proposed scheme exploits correlations between two consecutive nearest neighbor sets and significantly reduces the overall search complexity for general distance metric updates and compression-based data indexing. Future work will focus on vector-quantization-based indexing and efficient approximate search for non-linear relevance feedback scenarios. Approximate search in those scenarios can be sped up by (1) estimating kernels in the mapped space using previously suggested estimated kernels of interest in the original space, and (2) using cluster representatives in the filtering phase. The focus of this project will be the integration of the proposed work into a prototype system that facilitates fast, iterative, approximate similarity searches in biomedical data.
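
The idea of reusing the previous nearest neighbor set can be illustrated with a minimal sketch. It is not the scheme from the dissertation, which works with compression-based indexing. It assumes a weighted Euclidean metric whose weights change between feedback rounds, and it uses a partial distance over the first few dimensions as a stand-in for a cheap index-based lower bound. The function names and the `n_partial` parameter are hypothetical.

```python
import numpy as np


def weighted_distances(points, query, weights):
    """Weighted Euclidean distance from `query` to every row of `points`."""
    diff = points - query
    return np.sqrt((weights * diff * diff).sum(axis=1))


def knn_after_metric_update(data, query, new_weights, prev_ids, k, n_partial):
    """Exact k-NN under `new_weights`, pruned with the previous neighbor set.

    Assumes `prev_ids` holds at least k distinct row indices and that all
    weights are non-negative.
    """
    # The previous neighbors are valid candidates, so their largest distance
    # under the new metric bounds the new k-th neighbor distance from above.
    bound = weighted_distances(data[prev_ids], query, new_weights).max()

    # A distance over the first `n_partial` dimensions never exceeds the full
    # distance (weights are non-negative), so it is a cheap lower bound.
    partial = weighted_distances(
        data[:, :n_partial], query[:n_partial], new_weights[:n_partial]
    )
    # Small tolerance so that rounding cannot discard a true neighbor.
    candidates = np.flatnonzero(partial <= bound * (1.0 + 1e-9))

    # Full distances are computed only for the surviving candidates.
    full = weighted_distances(data[candidates], query, new_weights)
    order = np.argsort(full)[:k]
    return candidates[order], len(candidates)
```

The second return value is the number of items whose full distance had to be computed. The closer the new metric is to the old one, the tighter the bound from the previous neighbors and the fewer candidates survive the filter.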

Supporting Complex Queries

Commercial and scientific multimedia data repositories (such as consumers' multimedia files and biomedical datasets) are sensitive to their multi-modal characteristics. There is a need to develop new data models and content classifiers that utilize both label and descriptor information. My contribution was an effective multimedia classification system called the Modular Intelligent Multimedia Analysis System (see the PATENT section in my resume). The system handles a large variety of media attributes, and therefore a modular structure is needed to increase efficiency. Media data are hierarchically categorized based on their features and descriptive class labels. In this structure, each media item is described by the labels assigned by the system and by the extracted features. Future projects will include developing a hierarchical classification system that utilizes user feedback, devising algorithms for query response in this multi-descriptor feature space, and designing a set of tools that make this hybrid search more effective and efficient in interactive scenarios with complex descriptors.

High-dimensional Learning

Meaningful semantic analysis and knowledge extraction require data representations that are understandable at a conceptual level. The framework must efficiently summarize information contained in the image data; it must provide scalability with respect to the nature, size, and dimension of a dataset; and it must offer simple representations of the results. My contribution was a visual thesaurus that provides summarized data information derived from the low-level features of an aerial image dataset and Amazon video key frames. Visually similar tiles are assigned the same class label by partitioning the high-dimensional feature space. Data clustering is then used to minimize the impact of the sparsity of the high-dimensional feature space on the formed clusters. A representative codeword is selected for each cluster and forms the visual thesaurus entry. Future work includes several learning mechanisms for more efficient high-dimensional feature clustering, in order to develop generative statistical models for other types of multimedia data. The other direction is further exploration of joint clustering and compression schemes that provide effective filtering of both irrelevant data and irrelevant information about the remaining data, by imposing certain structural constraints on building the thesaurus. The idea is to use an algorithm that iteratively optimizes the compression of data entries in each cluster (for a fixed clustering map), and then optimizes the clustering map (for a fixed compression design).
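
The alternating optimization described above can be sketched with plain k-means (Lloyd's algorithm), used here as a stand-in and not as the thesaurus construction from the dissertation. Each round fixes the clustering map and updates one center per cluster, then fixes the centers and updates the map. The codeword of an entry is assumed to be the member tile closest to the cluster center. `build_thesaurus` and its parameters are hypothetical names.

```python
import numpy as np


def squared_distances(features, centers):
    """Squared Euclidean distance from every feature to every center."""
    diff = features[:, None, :] - centers[None, :, :]
    return (diff * diff).sum(axis=2)


def build_thesaurus(features, n_entries, n_iter=20, seed=0):
    """Return a cluster label per tile and one codeword tile index per entry."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(features), size=n_entries, replace=False)
    centers = features[start].astype(float)

    for _ in range(n_iter):
        # Fixed centers: update the clustering map.
        labels = squared_distances(features, centers).argmin(axis=1)
        # Fixed clustering map: update each center to the mean of its members.
        for c in range(n_entries):
            members = features[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)

    dist = squared_distances(features, centers)
    labels = dist.argmin(axis=1)

    # Codeword: the member tile closest to its cluster center (-1 if empty).
    codeword_ids = np.full(n_entries, -1)
    for c in range(n_entries):
        member_ids = np.flatnonzero(labels == c)
        if len(member_ids) > 0:
            codeword_ids[c] = member_ids[dist[member_ids, c].argmin()]
    return labels, codeword_ids
```

Choosing an actual tile as the codeword, instead of the cluster mean, keeps every thesaurus entry viewable as an image patch. That is a design assumption of this sketch.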

Multimedia Mining

In media databases, feature descriptors fail to capture image content. Humans can instantly answer the question "Is this highway going through a desert?" just by looking at an aerial photograph of a region. This query, essentially formulated as a visual concept, raises many research issues, and the semantic analysis of multimedia content is imperative. My contribution was to introduce a framework for summarizing basic semantic concepts in order to detect coarse spatial patterns and visual concepts in aerial image and video data and in biological imagery. Association rules were introduced as a way of discovering interesting patterns in transactional databases. Visual thesaurus entries and their spatial relationships define a non-traditional space for data mining applications. This space can be used to discover coarse spatial relationships in the dataset. We use Spatial Event Cubes to distill the frequent visual patterns in image and video datasets. A primary contribution is the derivation of image equivalents for the traditional association rule components, namely the items, the itemsets, and the rules. Future work focuses on extending the Spatial Event Cubes to include associations within or between other dimensions, i.e., spectral or, in the case of video, temporal. The project will also focus on proposing new image and video mining methods based on the learned concepts.
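
To make the link between thesaurus labels and association rules concrete, here is a small sketch under simplifying assumptions. Each image is a grid of tiles already labelled with thesaurus entries, the only spatial relationship is a fixed non-negative tile offset, and a rule "u implies v" is scored with the usual support and confidence. The functions `cooccurrence_counts` and `rule_stats` are hypothetical names, and the count matrix is only a simplified analogue of one face of a Spatial Event Cube.

```python
import numpy as np


def cooccurrence_counts(label_grid, n_labels, offset=(0, 1)):
    """Count label pairs (u, v) where v lies at `offset` (rows, cols) from u.

    `offset` must be non-negative; (0, 1) means "immediate right neighbor".
    """
    dr, dc = offset
    rows, cols = label_grid.shape
    first = label_grid[: rows - dr, : cols - dc]
    second = label_grid[dr:, dc:]
    counts = np.zeros((n_labels, n_labels), dtype=np.int64)
    # np.add.at accumulates correctly when the same (u, v) pair repeats.
    np.add.at(counts, (first.ravel(), second.ravel()), 1)
    return counts


def rule_stats(counts, u, v):
    """Support and confidence of the rule 'label u implies label v at offset'."""
    total = counts.sum()
    row_total = counts[u].sum()
    support = counts[u, v] / total if total else 0.0
    confidence = counts[u, v] / row_total if row_total else 0.0
    return support, confidence


# Usage: a 2 x 3 grid of tile labels drawn from a 3-entry thesaurus.
grid = np.array([[0, 1, 1],
                 [0, 1, 2]])
counts = cooccurrence_counts(grid, n_labels=3)
support, confidence = rule_stats(counts, 0, 1)
```

In this toy grid, every tile labelled 0 has a tile labelled 1 to its right, so the rule has confidence 1.0 and covers half of the four adjacent pairs.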

Conclusion

There are enormous potential benefits that could result from the implementation of a successful multimedia system. Current research focuses on the design and characteristics of such an intelligent multimedia system. There are many issues to be addressed: constructing complex data models from limited prior knowledge; scalable structures and algorithms for handling high-dimensional data; meaningful classification in huge high-dimensional databases; data mining of structural patterns; and supervised and unsupervised learning and refinement for knowledge extraction.

The combination of the proposed projects and tools will enable the development of an organized, easily searchable database of digital images and videos that supports complex search queries and user interaction. I strongly believe this will enable us to further investigate patterns that are not perceived by human inspection, and allow us to search databases efficiently for relevant phenomena. In conclusion, I strive to continue contributing to a variety of research topics, ranging from theoretical advances in image processing, machine learning, and pattern recognition to practical adaptive indexing, compression, and prediction schemes for a system response that exceeds users' expectations. In addition to making individual and independent efforts in different areas of research, I believe I have collaborated successfully with people from a variety of disciplines and backgrounds, which has helped me accomplish my goals, and I hope to continue contributing to science.
