Data curation with deep learning

Saravanan Thirumuruganathan, Nan Tang, Mourad Ouzzani, An Hai Doan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

39 Citations (Scopus)

Abstract

Data curation (the process of discovering, integrating, and cleaning data) is one of the oldest, hardest, yet unavoidable data management problems. Despite decades of effort from both researchers and practitioners, it remains one of the most time-consuming and least enjoyable tasks for data scientists. In most organizations, data curation plays an important role in fully unlocking the value of big data. Unfortunately, current solutions are not keeping up with the ever-changing data ecosystem, because they often incur substantial human cost. Meanwhile, deep learning has achieved remarkable success in multiple areas, such as image recognition, natural language processing, and speech recognition. In this vision paper, we explore how some of the fundamental innovations in deep learning could be leveraged to improve existing data curation solutions and to help build new ones. We identify interesting research opportunities and dispel common myths. We hope that the synthesis of these important domains will unleash a series of research activities that will lead to significantly improved solutions for many data curation tasks.

Original language: English
Title of host publication: Advances in Database Technology - EDBT 2020
Subtitle of host publication: 23rd International Conference on Extending Database Technology, Proceedings
Editors: Angela Bonifati, Yongluan Zhou, Marcos Antonio Vaz Salles, Alexander Bohm, Dan Olteanu, George Fletcher, Arijit Khan, Bin Yang
Publisher: OpenProceedings.org
Pages: 277-286
Number of pages: 10
ISBN (Electronic): 9783893180837
Publication status: Published - 2020
Event: 23rd International Conference on Extending Database Technology, EDBT 2020 - Copenhagen, Denmark
Duration: 30 Mar 2020 to 2 Apr 2020

Publication series

Name: Advances in Database Technology - EDBT
Volume: 2020-March
ISSN (Electronic): 2367-2005

Conference

Conference: 23rd International Conference on Extending Database Technology, EDBT 2020
Country/Territory: Denmark
City: Copenhagen
Period: 30/03/20 to 2/04/20
