Abstract
Data pipelines are the new code. Consequently, data scientists need new tools to support the often time-consuming process of debugging their pipelines. We introduce Dagger, an end-to-end system to debug and mitigate data-centric errors in data pipelines, such as a data transformation gone wrong or a classifier underperforming due to noisy training data. Dagger supports inter-module debugging, where the pipeline blocks are treated as black boxes, as well as intra-module debugging, where users can debug data objects in Python scripts (e.g., DataFrames). In this demo, we will walk the audience through a rich, real-world business intelligence use case from our industrial collaborators at Intel, to highlight how Dagger enables data scientists to productively identify and mitigate data-centric problems at different stages of pipeline development.
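The inter-module mode described above treats each pipeline block as a black box and inspects only what flows between blocks. The sketch below shows that idea under stated assumptions and is not Dagger's actual interface. The helper names `run_with_snapshots` and `first_failing_block`, the row-of-dicts data model (standing in for a DataFrame), and the example blocks are all hypothetical. The sketch records each block's output and reports the first block whose output violates a user-supplied data check. That is how a "data transformation gone wrong" could be localized without opening the blocks themselves.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical data model: a "table" is a list of rows, each row a dict.
Row = Dict[str, Any]
Block = Tuple[str, Callable[[List[Row]], List[Row]]]
Snapshot = Tuple[str, List[Row]]


def run_with_snapshots(blocks: List[Block], data: List[Row]) -> List[Snapshot]:
    """Run each black-box block in order and keep a copy of every block's output."""
    snapshots: List[Snapshot] = []
    for name, fn in blocks:
        data = fn(data)
        # Copy rows so that later blocks cannot mutate a recorded snapshot.
        snapshots.append((name, [dict(r) for r in data]))
    return snapshots


def first_failing_block(
    snapshots: List[Snapshot], check: Callable[[List[Row]], bool]
) -> Optional[str]:
    """Return the name of the first block whose output violates `check`, or None."""
    for name, rows in snapshots:
        if not check(rows):
            return name
    return None


# Hypothetical example pipeline with one faulty transformation.
def drop_missing(rows: List[Row]) -> List[Row]:
    return [r for r in rows if r["price"] is not None]


def apply_discount(rows: List[Row]) -> List[Row]:
    # Bug: subtracts a flat 10 instead of applying a 10% discount.
    return [{**r, "price": r["price"] - 10} for r in rows]


def round_prices(rows: List[Row]) -> List[Row]:
    return [{**r, "price": round(r["price"], 2)} for r in rows]


raw = [{"id": 1, "price": 5.0}, {"id": 2, "price": None}, {"id": 3, "price": 40.0}]
pipeline = [
    ("drop_missing", drop_missing),
    ("apply_discount", apply_discount),
    ("round_prices", round_prices),
]
snaps = run_with_snapshots(pipeline, raw)
culprit = first_failing_block(snaps, lambda rows: all(r["price"] >= 0 for r in rows))
print(culprit)  # apply_discount
```

Because the check runs on recorded outputs rather than inside the blocks, the same approach works regardless of how each block is implemented. Intra-module debugging would instead require inspecting objects such as DataFrames inside a block's own Python code.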
| Original language | English |
|---|---|
| Pages (from-to) | 2993-2996 |
| Number of pages | 4 |
| Journal | Proceedings of the VLDB Endowment |
| Volume | 13 |
| Issue number | 12 |
| DOIs | |
| Publication status | Published - 2020 |
| Title | Debugging Large-Scale Data Science Pipelines using Dagger |