Linking Entities Across Images and Text

Rebecka Weegar, Karl Åström, Pierre Nugues

Research output: Chapter in Book/Report/Conference proceeding; Paper in conference proceeding; peer-reviewed


Abstract

This paper describes a set of methods to link entities across images and text. As a corpus, we used a data set of images in which each image is accompanied by a short caption and its regions are manually segmented and labeled with a category. We extracted the entity mentions from the captions and computed a semantic similarity between the mentions and the region labels. We also measured the statistical associations between these mentions and the labels, and we combined them with the semantic similarity to produce mappings in the form of pairs, each consisting of a region label and a caption entity. In a second step, we used the syntactic relationships between the mentions and the spatial relationships between the regions to rerank the lists of candidate mappings. To evaluate our methods, we annotated a test set of 200 images, in which we manually linked the image regions to their corresponding mentions in the captions. Overall, we could match objects in the pictures to their correct mentions for nearly 89 percent of the segments, when such a matching exists.
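The abstract does not spell out the exact association and similarity measures. The Python sketch below is a hypothetical illustration of the first step only, assuming that the statistical association is pointwise mutual information (PMI) computed from image-level co-occurrences of region labels and caption mentions, that semantic similarity is the cosine between word vectors, and that the two scores are combined linearly with a weight `alpha`. The functions `pmi_table`, `cosine`, and `rank_mentions`, the toy corpus, and the toy vectors are all illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter
from itertools import product


def pmi_table(corpus):
    """PMI between region labels and caption mentions (hypothetical measure).

    corpus: list of (labels, mentions) tuples, one per image. Co-occurrence is
    counted once per image, which is an assumption of this sketch.
    """
    n_images = len(corpus)
    label_count, mention_count, joint_count = Counter(), Counter(), Counter()
    for labels, mentions in corpus:
        labels, mentions = set(labels), set(mentions)
        label_count.update(labels)
        mention_count.update(mentions)
        joint_count.update(product(labels, mentions))
    pmi = {}
    for (label, mention), joint in joint_count.items():
        p_joint = joint / n_images
        p_label = label_count[label] / n_images
        p_mention = mention_count[mention] / n_images
        pmi[(label, mention)] = math.log2(p_joint / (p_label * p_mention))
    return pmi


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_mentions(label, mentions, pmi, vectors, alpha=0.5):
    """Score each caption mention for one region label, best first.

    score = alpha * similarity + (1 - alpha) * PMI; pairs never seen together
    get PMI 0.0, and words without a vector get similarity 0.0.
    """
    scored = []
    for m in mentions:
        sim = cosine(vectors[label], vectors[m]) if label in vectors and m in vectors else 0.0
        assoc = pmi.get((label, m), 0.0)
        scored.append((alpha * sim + (1 - alpha) * assoc, m))
    return sorted(scored, reverse=True)


# Toy data (invented for illustration): region labels vs. caption mentions.
corpus = [
    (["sky", "man"], ["sky", "boy"]),
    (["sky", "tree"], ["sky", "forest"]),
    (["man", "grass"], ["boy", "field"]),
    (["tree", "grass"], ["forest", "field"]),
]
vectors = {"man": [0.0, 1.0], "boy": [0.2, 0.9], "sky": [1.0, 0.0]}

pmi = pmi_table(corpus)
ranked = rank_mentions("man", ["sky", "boy"], pmi, vectors)
print(ranked[0][1])  # boy
```

A linear interpolation is only one way to combine the two signals, and the second step, reranking with syntactic and spatial relationships, is not modeled here.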
Original language: English
Title of host publication: Proceedings of the Nineteenth Conference on Computational Natural Language Learning (CoNLL 2015)
Publisher: Association for Computational Linguistics
Pages: 185-193
ISBN (Electronic): 978-1-941643-77-8
Publication status: Published - 2015
Event: Nineteenth Conference on Computational Natural Language Learning (CoNLL 2015), Beijing, China
Duration: 2015 Jul 30 - 2015 Jul 31

Conference

Conference: Nineteenth Conference on Computational Natural Language Learning (CoNLL 2015)
Country/Territory: China
City: Beijing
Period: 2015/07/30 - 2015/07/31

Subject classification (UKÄ)

  • Computer and Information Science
