APLA: Indexing Arbitrary Probability Distributions

Vebjorn Ljosa and Ambuj K. Singh
University of California, Santa Barbara
{ljosa,ambuj} [at] cs.ucsb.edu

Abstract

The ability to store and query uncertain information is of great benefit to databases that infer values from a set of observations, including databases of moving objects, sensor readings, historical business transactions, and biomedical images. These observations are often inexact to begin with, and even if they are exact, a set of observations of an attribute of an object is better represented by a probability distribution than by a single number, such as a mean. In this paper, we present adaptive, piecewise-linear approximations (APLAs), which represent arbitrary probability distributions compactly with guaranteed quality. We also present the APLA-tree, an index structure for APLAs. Because APLA is more precise than existing approximation techniques, the APLAtree can answer probabilistic range queries twice as fast. APLA generalizes to multiple dimensions, and the APLA-tree can index multivariate distributions using either one-dimensional or multidimensional APLAs. Finally, we propose a new definition of k-NN queries on uncertain data. The new definition allows APLA and the APLA-tree to answer k-NN queries quickly, even on arbitrary probability distributions. No efficient k-NN search was previously possible on such distributions.
[PDF] [BibTex]
Vebjorn Ljosa and Ambuj K. Singh,
23rd International Conference on Data Engineering (ICDE), pp. 946-955, Apr. 2007.
Node ID: 453 , DB ID: 257 , Lab: BIO , Target: Proceedings