Learning Program Representations for Food Images and Cooking Recipes

  • Dim P. Papadopoulos
  • , Enrique Mora
  • , Nadiia Chepurko
  • , Kuan Wei Huang
  • , Ferda Ofli
  • , Antonio Torralba

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

32 Citations (Scopus)

Abstract

In this paper, we are interested in modeling a how-to instructional procedure, such as a cooking recipe, with a meaningful and rich high-level representation. Specifically, we propose to represent cooking recipes and food images as cooking programs. Programs provide a structured repre-sentation of the task, capturing cooking semantics and se-quential relationships of actions in the form of a graph. This allows them to be easily manipulated by users and executed by agents. To this end, we build a model that is trained to learn a joint embedding between recipes and food images via self-supervision and jointly generate a program from this embedding as a sequence. To validate our idea, we crowdsource programs for cooking recipes and show that: (a) projecting the image-recipe embeddings into programs leads to better cross-modal retrieval results; (b) generating programs from images leads to better recognition re-sults compared to predicting raw cooking instructions; and (c) we can generate food images by manipulating programs via optimizing the latent code of a GAN. Code, data, and models are available online11http://cookingprograms.csail.mit.edu.

Original languageEnglish
Title of host publicationProceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
PublisherIEEE Computer Society
Pages16538-16548
Number of pages11
ISBN (Electronic)9781665469463
DOIs
Publication statusPublished - 2022
Event2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 - New Orleans, United States
Duration: 19 Jun 202224 Jun 2022

Publication series

NameProceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume2022-June
ISSN (Print)1063-6919

Conference

Conference2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
Country/TerritoryUnited States
CityNew Orleans
Period19/06/2224/06/22

Keywords

  • Datasets and evaluation
  • Recognition: detection
  • Vision + language
  • categorization
  • retrieval

Fingerprint

Dive into the research topics of 'Learning Program Representations for Food Images and Cooking Recipes'. Together they form a unique fingerprint.

Cite this