Discovering correlations in annotated databases

Xuebin He, Stephen Donohue, Mohamed Y. Eltabakh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Most emerging applications, especially in science domains, maintain databases that are rich in metadata and annotation information, e.g., auxiliary exchanged comments, related articles and images, provenance information, corrections and versioning information, and even scientists' thoughts and observations. To manage these annotated databases, numerous techniques have been proposed to extend DBMSs and efficiently integrate the annotations into the data processing cycle, e.g., storage, indexing, extended query languages and semantics, and query optimization. In this paper, we address a new facet of annotation management: the discovery and exploitation of the hidden correlations that may exist in annotated databases. Such correlations can be either between the data and the annotations (data-to-annotation) or between the annotations themselves (annotation-to-annotation). We make the case that the discovery of these annotation-related correlations can be exploited in various ways to enhance the quality of the annotated database, e.g., discovering missing attachments and recommending annotations for newly inserted data. We leverage the state of the art in association rule mining in innovative ways to discover the annotation-related correlations. We propose several extensions to the state of the art in association rule mining to address new challenges and cases specific to annotated databases, i.e., incremental addition of annotations and hierarchy-based annotations. The proposed algorithms are evaluated using real-world applications from the biological domain, and an end-to-end system including an Excel-based GUI is developed for seamless manipulation of the annotations and their correlations.
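As a rough illustration of what a data-to-annotation correlation looks like, the sketch below mines single-antecedent association rules of the form "data value → annotation" using the standard support and confidence measures, then uses the rules to suggest annotations for a new tuple. It is a minimal sketch and not the algorithm proposed in the paper: the function names (`mine_rules`, `recommend`), the thresholds, and the toy rows are all assumptions made for illustration, and the paper's incremental and hierarchy-based extensions are not covered.

```python
from collections import Counter

def mine_rules(rows, min_support=0.4, min_confidence=0.8):
    """Mine rules 'data value -> annotation' from (values, annotations) rows.

    Hypothetical helper: support = P(value and annotation),
    confidence = P(annotation | value), both estimated over the rows.
    """
    n = len(rows)
    value_count = Counter()
    pair_count = Counter()
    for values, annotations in rows:
        for v in set(values):
            value_count[v] += 1
            for a in set(annotations):
                pair_count[(v, a)] += 1

    rules = []
    for (v, a), c in pair_count.items():
        support = c / n
        confidence = c / value_count[v]
        if support >= min_support and confidence >= min_confidence:
            rules.append((v, a, support, confidence))
    return sorted(rules)

def recommend(rules, values, existing=frozenset()):
    """Suggest annotations whose rule antecedent appears in the new tuple."""
    return sorted({a for v, a, _, _ in rules
                   if v in values and a not in existing})

# Toy annotated table (made-up data): each row is (data values, attached annotations).
rows = [
    ({"organism=yeast", "type=kinase"}, {"A1"}),
    ({"organism=yeast", "type=kinase"}, {"A1", "A2"}),
    ({"organism=yeast", "type=ligase"}, {"A2"}),
    ({"organism=human", "type=kinase"}, {"A1"}),
    ({"organism=human", "type=ligase"}, set()),
]

rules = mine_rules(rows)
for v, a, support, confidence in rules:
    print(f"{v} -> {a} (support={support:.2f}, confidence={confidence:.2f})")

# Candidate annotations for a newly inserted tuple.
print(recommend(rules, {"organism=mouse", "type=kinase"}))
```

The same rules can flag potentially missing attachments: an existing tuple that matches a rule's antecedent but lacks the rule's annotation is a candidate for review.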

Original language: English
Title of host publication: Advances in Database Technology - EDBT 2016
Subtitle of host publication: 19th International Conference on Extending Database Technology, Proceedings
Editors: Ioana Manolescu, Evaggelia Pitoura, Amelie Marian, Sofian Maabout, Letizia Tanca, Georgia Koutrika, Kostas Stefanidis
Publisher: OpenProceedings.org
Pages: 503-514
Number of pages: 12
ISBN (Electronic): 9783893180707
Publication status: Published - 2016
Externally published: Yes
Event: 19th International Conference on Extending Database Technology, EDBT 2016 - Bordeaux, France
Duration: 15 Mar 2016 to 18 Mar 2016

Publication series

Name: Advances in Database Technology - EDBT
Volume: 2016-March
ISSN (Electronic): 2367-2005

Conference

Conference: 19th International Conference on Extending Database Technology, EDBT 2016
Country/Territory: France
City: Bordeaux
Period: 15/03/16 to 18/03/16
