Every picture tells a story: Generating sentences from images

  • Ali Farhadi*
  • Mohsen Hejrati
  • Mohammad Amin Sadeghi
  • Peter Young
  • Cyrus Rashtchian
  • Julia Hockenmaier
  • David Forsyth

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

846 Citations (Scopus)

Abstract

Humans can prepare concise descriptions of pictures, focusing on what they find important. We demonstrate that automatic methods can do so too. We describe a system that can compute a score linking an image to a sentence. This score can be used to attach a descriptive sentence to a given image, or to obtain images that illustrate a given sentence. The score is obtained by comparing an estimate of meaning obtained from the image to one obtained from the sentence. Each estimate of meaning comes from a discriminative procedure that is learned using data. We evaluate on a novel dataset consisting of human-annotated images. While our underlying estimate of meaning is impoverished, it is sufficient to produce very good quantitative results, evaluated with a novel score that can account for synecdoche.
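As a rough illustration of the scoring idea described in the abstract, the sketch below maps an image and each candidate sentence to a toy meaning estimate and scores a pair by comparing the two estimates. Everything in it is an assumption made for illustration: the abstract does not specify the meaning representation or the comparison, so the (object, action, scene) slots, the `Meaning` type, the `score` and `best_sentence` helpers, and the hand-written estimates are hypothetical stand-ins for the learned discriminative procedures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meaning:
    """Hypothetical meaning estimate: one label per slot (an assumption)."""
    obj: str
    action: str
    scene: str


def score(image_meaning: Meaning, sentence_meaning: Meaning) -> float:
    """Fraction of slots on which the two meaning estimates agree."""
    pairs = [
        (image_meaning.obj, sentence_meaning.obj),
        (image_meaning.action, sentence_meaning.action),
        (image_meaning.scene, sentence_meaning.scene),
    ]
    return sum(a == b for a, b in pairs) / len(pairs)


def best_sentence(image_meaning: Meaning, candidates: dict) -> str:
    """Return the candidate sentence whose meaning estimate scores highest."""
    return max(candidates, key=lambda s: score(image_meaning, candidates[s]))


# Hand-written stand-ins for estimates that a learned model would produce.
image = Meaning("horse", "ride", "field")
candidates = {
    "A person rides a horse in a field.": Meaning("horse", "ride", "field"),
    "A plane sits on the runway.": Meaning("plane", "sit", "airport"),
    "A horse stands in a barn.": Meaning("horse", "stand", "barn"),
}
chosen = best_sentence(image, candidates)
```

The same score could in principle rank images for a given sentence by swapping the roles of the two arguments, which mirrors the two uses the abstract mentions. Exact slot matching is the simplest possible comparison and is only a placeholder here.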

Original language: English
Title of host publication: Computer Vision, ECCV 2010 - 11th European Conference on Computer Vision, Proceedings
Publisher: Springer Verlag
Pages: 15-29
Number of pages: 15
Edition: PART 4
ISBN (Print): 364215560X, 9783642155604
Publication status: Published - 2010
Externally published: Yes
Event: 11th European Conference on Computer Vision, ECCV 2010 - Heraklion, Crete, Greece
Duration: 10 Sept 2010 to 11 Sept 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 4
Volume: 6314 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 11th European Conference on Computer Vision, ECCV 2010
Country/Territory: Greece
City: Heraklion, Crete
Period: 10/09/10 to 11/09/10
