TY - GEN
T1 - Data curation with deep learning
AU - Thirumuruganathan, Saravanan
AU - Tang, Nan
AU - Ouzzani, Mourad
AU - Doan, An Hai
N1 - Publisher Copyright:
© 2020 Copyright held by the owner/author(s).
PY - 2020
Y1 - 2020
N2 - Data curation – the process of discovering, integrating, and cleaning data – is one of the oldest, hardest, yet inevitable data management problems. Despite decades of efforts from both researchers and practitioners, it is still one of the most time consuming and least enjoyable work of data scientists. In most organizations, data curation plays an important role so as to fully unlock the value of big data. Unfortunately, the current solutions are not keeping up with the ever-changing data ecosystem, because they often require substantially high human cost. Meanwhile, deep learning is making strides in achieving remarkable successes in multiple areas, such as image recognition, natural language processing, and speech recognition. In this vision paper, we explore how some of the fundamental innovations in deep learning could be leveraged to improve existing data curation solutions and to help build new ones. We identify interesting research opportunities and dispel common myths. We hope that the synthesis of these important domains will unleash a series of research activities that will lead to significantly improved solutions for many data curation tasks.
AB - Data curation – the process of discovering, integrating, and cleaning data – is one of the oldest, hardest, yet inevitable data management problems. Despite decades of efforts from both researchers and practitioners, it is still one of the most time consuming and least enjoyable work of data scientists. In most organizations, data curation plays an important role so as to fully unlock the value of big data. Unfortunately, the current solutions are not keeping up with the ever-changing data ecosystem, because they often require substantially high human cost. Meanwhile, deep learning is making strides in achieving remarkable successes in multiple areas, such as image recognition, natural language processing, and speech recognition. In this vision paper, we explore how some of the fundamental innovations in deep learning could be leveraged to improve existing data curation solutions and to help build new ones. We identify interesting research opportunities and dispel common myths. We hope that the synthesis of these important domains will unleash a series of research activities that will lead to significantly improved solutions for many data curation tasks.
UR - https://www.scopus.com/pages/publications/85084190895
M3 - Conference contribution
AN - SCOPUS:85084190895
T3 - Advances in Database Technology - EDBT
SP - 277
EP - 286
BT - Advances in Database Technology - EDBT 2020
A2 - Bonifati, Angela
A2 - Zhou, Yongluan
A2 - Vaz Salles, Marcos Antonio
A2 - Bohm, Alexander
A2 - Olteanu, Dan
A2 - Fletcher, George
A2 - Khan, Arijit
A2 - Yang, Bin
PB - OpenProceedings.org
T2 - 23rd International Conference on Extending Database Technology, EDBT 2020
Y2 - 30 March 2020 through 2 April 2020
ER -