DATA CLEANING METHODS AND SYSTEMS

Nan Tang (Inventor), Mourad Ouzzani (Inventor), Paolo Papotti (Inventor), Ihab Francis Ilyas Kaldas (Inventor), Xu Chu (Inventor)

Research output: Patent

Abstract

An end-to-end system to annotate data instances of unknown type using a knowledge base and crowdsourcing. A computer-implemented method for cleaning a database instance using a plurality of holistic patterns, the database instance comprising a plurality of dirty tuples with unknown attribute data types, the method comprising: generating a plurality of candidate holistic patterns using the database instance and a knowledge base, the knowledge base comprising data types and data-type relationships; determining a valid holistic pattern from the plurality of candidate holistic patterns using at least one of: the knowledge base; and a crowd of users who validate the data types and the data-type relationships; annotating tuples in the database instance using the valid holistic pattern, wherein the method annotates the tuples with annotations indicating at least one of: knowledge base validated; jointly validated, wherein the crowd of users at least partially validates the holistic pattern; or erroneous; and repairing the erroneous annotated tuples to generate a clean database instance.
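The four steps named in the abstract (generate candidate patterns, validate one, annotate tuples, repair errors) can be illustrated with a toy sketch. This is not the patented implementation: the two-column table, the in-memory knowledge base (KB_TYPES, KB_RELS), the crowd stand-ins (ask_crowd, crowd_facts) and all function names are assumptions made up for illustration, and a "holistic pattern" is reduced here to a single (subject type, relationship, object type) triple.

```python
from collections import Counter

# Hypothetical toy knowledge base: entity types and typed relationships.
KB_TYPES = {"Italy": "country", "Spain": "country", "France": "country",
            "Rome": "capital", "Madrid": "capital", "Paris": "capital"}
KB_RELS = {("Italy", "hasCapital", "Rome"),
           ("Spain", "hasCapital", "Madrid"),
           ("France", "hasCapital", "Paris")}


def candidate_patterns(rows):
    """Rank (type_a, relationship, type_b) patterns by how many rows the KB supports."""
    support = Counter()
    for a, b in rows:
        for s, rel, o in KB_RELS:
            if (s, o) == (a, b):
                support[(KB_TYPES.get(a), rel, KB_TYPES.get(b))] += 1
    return [p for p, _ in support.most_common()]


def validate_pattern(candidates, ask_crowd):
    """Return the best-supported candidate that the crowd accepts, else None."""
    for p in candidates:
        if ask_crowd(p):
            return p
    return None


def annotate(rows, pattern, crowd_facts):
    """Label each tuple as kb_validated, jointly_validated, or erroneous."""
    t_a, rel, t_b = pattern
    annotated = []
    for a, b in rows:
        in_kb = ((a, rel, b) in KB_RELS
                 and KB_TYPES.get(a) == t_a and KB_TYPES.get(b) == t_b)
        if in_kb:
            label = "kb_validated"
        elif (a, rel, b) in crowd_facts:
            label = "jointly_validated"  # KB lacks the fact; the crowd confirms it
        else:
            label = "erroneous"
        annotated.append(((a, b), label))
    return annotated


def repair(annotated, pattern):
    """Replace the second value of erroneous tuples with the KB's value, if known."""
    _, rel, _ = pattern
    lookup = {s: o for s, r, o in KB_RELS if r == rel}
    clean = []
    for (a, b), label in annotated:
        if label == "erroneous" and a in lookup:
            b = lookup[a]
        clean.append((a, b))
    return clean


# Dirty instance: the third tuple is wrong; the fourth is correct but absent from the KB.
rows = [("Italy", "Rome"), ("Spain", "Madrid"),
        ("France", "Rome"), ("Portugal", "Lisbon")]
crowd_facts = {("Portugal", "hasCapital", "Lisbon")}  # stand-in for crowd answers

pattern = validate_pattern(candidate_patterns(rows), ask_crowd=lambda p: True)
annotated = annotate(rows, pattern, crowd_facts)
clean = repair(annotated, pattern)
```

In this sketch the crowd is consulted twice: once to accept the pattern itself and once to confirm individual facts the knowledge base lacks. That is one possible reading of "jointly validated"; the abstract does not say how the actual system divides this work.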

Original language: English
Patent number: WO2015181511
IPC: G06F 17/30 A I
Priority date: 30/05/14
Publication status: Published - 3 Dec 2015
