Representing and Querying Correlated Tuples in Probabilistic Databases (Paper)

2007 · Citations: 239
Bayesian Modeling and Causal Inference; Data Management and Algorithms; Advanced Database Systems and Queries

Abstract

Probabilistic databases have received considerable attention recently due to the need for storing uncertain data produced by many real-world applications. The widespread use of probabilistic databases is hampered by two limitations: (1) current probabilistic databases make simplistic assumptions about the data (e.g., complete independence among tuples) that make it difficult to use them in applications that naturally produce correlated data, and (2) most probabilistic databases can only answer a restricted subset of the queries that can be expressed using traditional query languages. We address both of these limitations by proposing a framework that can represent not only probabilistic tuples, but also correlations that may be present among them. Our proposed framework naturally lends itself to the possible world semantics, thus preserving the precise query semantics extant in current probabilistic databases. We develop an efficient strategy for query evaluation over such probabilistic databases by casting the query processing problem as an inference problem in an appropriately constructed probabilistic graphical model. We present several optimizations specific to probabilistic databases that enable efficient query evaluation. We validate our approach by presenting an experimental evaluation that illustrates the effectiveness of our techniques at answering various queries using real and synthetic datasets.
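To make the possible world semantics concrete, the following is a minimal Python sketch, not the paper's method. It treats each uncertain tuple as a 0/1 variable, defines the joint distribution as a normalized product of factors, and answers a query by brute-force enumeration of all worlds rather than by graphical-model inference. The tuple names `t1` and `t2`, the probabilities, and the helper functions `possible_worlds` and `query_probability` are all hypothetical and chosen only for illustration.

```python
from itertools import product


def possible_worlds(variables, factors):
    """Return every world (a 0/1 assignment to the tuple variables) with its probability.

    The weight of a world is the product of all factor values; weights are
    normalized so that the probabilities sum to 1.
    """
    weighted = []
    for values in product([0, 1], repeat=len(variables)):
        world = dict(zip(variables, values))
        weight = 1.0
        for factor in factors:
            weight *= factor(world)
        weighted.append((world, weight))
    total = sum(weight for _, weight in weighted)
    return [(world, weight / total) for world, weight in weighted]


def query_probability(worlds, predicate):
    """Sum the probabilities of the worlds in which the query predicate holds."""
    return sum(prob for world, prob in worlds if predicate(world))


# Hypothetical example: two uncertain tuples, each present (1) or absent (0).
variables = ["t1", "t2"]

# Correlated model (assumed): one factor over both tuples that makes them
# mutually exclusive, since the world containing both has weight 0.
mutex_table = {(0, 0): 0.1, (1, 0): 0.6, (0, 1): 0.3, (1, 1): 0.0}
correlated_factors = [lambda w: mutex_table[(w["t1"], w["t2"])]]

# Independent model with the same marginals: one factor per tuple.
independent_factors = [
    lambda w: 0.6 if w["t1"] else 0.4,
    lambda w: 0.3 if w["t2"] else 0.7,
]


# Query: "does at least one of the two tuples exist?"
def either_exists(world):
    return bool(world["t1"] or world["t2"])


p_corr = query_probability(possible_worlds(variables, correlated_factors), either_exists)
p_ind = query_probability(possible_worlds(variables, independent_factors), either_exists)
print(round(p_corr, 2), round(p_ind, 2))
```

Both models assign the same marginal probabilities to `t1` (0.6) and `t2` (0.3), yet the query probability is 0.9 under mutual exclusion and 0.72 under independence, which is why a tuple-independence assumption can give wrong answers on correlated data. Enumeration is exponential in the number of tuples, so it is only useful as a reference for checking small cases.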