TY - GEN
T1 - FormaT5: Abstention and Examples for Conditional Table Formatting with Natural Language
AU - Singh, Mukul
AU - Cambronero, José Cambronero
AU - Gulwani, Sumit
AU - Le, Vu
AU - Negreanu, Carina
AU - Nouri, Elnaz
AU - Raza, Mohammad
AU - Verbruggen, Gust
N1 - Publisher Copyright:
© 2023, VLDB Endowment. All rights reserved.
PY - 2023/11/1
Y1 - 2023/11/1
N2 - Formatting is an important property in tables for visualization, presentation, and analysis. Spreadsheet software allows users to automatically format their tables by writing data-dependent conditional formatting (CF) rules. Writing such rules is often challenging for users as it requires understanding and implementing the underlying logic. We present FormaT5, a transformer-based model that can generate a CF rule given the target table and a natural language description of the desired formatting logic. We find that user descriptions for these tasks are often under-specified or ambiguous, making it harder for code generation systems to accurately learn the desired rule in a single step. To tackle this problem of under-specification and minimise argument errors, FormaT5 learns to predict placeholders though an abstention objective. These placeholders can then be filled by a second model or, when examples of rows that should be formatted are available, by a programming-by-example system. To evaluate FormaT5 on diverse and real scenarios, we create an extensive benchmark of 1053 CF tasks, containing real-world descriptions collected from four different sources. We release our benchmarks to encourage research in this area. Abstention and filling allow FormaT5 to outperform 8 different neural approaches on our benchmarks, both with and without examples. Our results illustrate the value of building domain-specific learning systems.
AB - Formatting is an important property in tables for visualization, presentation, and analysis. Spreadsheet software allows users to automatically format their tables by writing data-dependent conditional formatting (CF) rules. Writing such rules is often challenging for users as it requires understanding and implementing the underlying logic. We present FormaT5, a transformer-based model that can generate a CF rule given the target table and a natural language description of the desired formatting logic. We find that user descriptions for these tasks are often under-specified or ambiguous, making it harder for code generation systems to accurately learn the desired rule in a single step. To tackle this problem of under-specification and minimise argument errors, FormaT5 learns to predict placeholders though an abstention objective. These placeholders can then be filled by a second model or, when examples of rows that should be formatted are available, by a programming-by-example system. To evaluate FormaT5 on diverse and real scenarios, we create an extensive benchmark of 1053 CF tasks, containing real-world descriptions collected from four different sources. We release our benchmarks to encourage research in this area. Abstention and filling allow FormaT5 to outperform 8 different neural approaches on our benchmarks, both with and without examples. Our results illustrate the value of building domain-specific learning systems.
UR - https://www.scopus.com/pages/publications/85183596508
U2 - 10.14778/3632093.3632111
DO - 10.14778/3632093.3632111
M3 - Conference contribution
VL - 17
T3 - Proceedings of the Vldb Endowment
SP - 497
EP - 510
BT - Proceedings of the VLDB Endowment
ER -