Webis-Wikipedia-IPC-23

Synopsis

When an image is reused on the Web, it is often assigned a new, original caption. We hypothesize that different captions for the same image naturally form a set of mutual paraphrases. To demonstrate the suitability of this idea, we analyzed captions in the English Wikipedia, where editors frequently relabel the same image for different articles. The result is the Wikipedia-IPC (Image caption Paraphrase Corpus) dataset, which consists of pairs of captions for the same image that represent paraphrases. It contains 30,237 gold-quality, 229,877 silver-quality, and 656,560 bronze-quality paraphrase pairs.
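The core idea, that distinct captions of one image can be paired as paraphrase candidates, can be illustrated with a minimal sketch. This is not the authors' extraction pipeline: the record layout, the function name `caption_pairs`, and the example captions are all hypothetical, and the gold, silver, and bronze quality tiers are not modeled here.

```python
from collections import defaultdict
from itertools import combinations


def caption_pairs(records):
    """Group captions by image and return every unordered pair of distinct captions.

    `records` is an iterable of (image_id, caption) tuples. This layout is an
    assumption made for illustration, not the format of the released dataset.
    """
    by_image = defaultdict(set)
    for image_id, caption in records:
        # A set drops exact duplicate captions of the same image.
        by_image[image_id].add(caption.strip())

    pairs = []
    for image_id in sorted(by_image):
        captions = sorted(by_image[image_id])
        # Every unordered pair of distinct captions is a paraphrase candidate.
        for first, second in combinations(captions, 2):
            pairs.append((image_id, first, second))
    return pairs


# Hypothetical records: the same image captioned in different articles.
records = [
    ("Eiffel_Tower.jpg", "The Eiffel Tower seen from the Champ de Mars"),
    ("Eiffel_Tower.jpg", "View of the Eiffel Tower from the Champ de Mars"),
    ("Eiffel_Tower.jpg", "The Eiffel Tower seen from the Champ de Mars"),
    ("Louvre.jpg", "The Louvre pyramid at night"),
]

pairs = caption_pairs(records)
for image_id, first, second in pairs:
    print(f"{image_id}: {first!r} <-> {second!r}")
```

An image with n distinct captions contributes n(n-1)/2 candidate pairs, so an image with a single caption contributes none. Deciding which candidates are actual paraphrases, and of what quality, is a separate step that this sketch leaves out.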

Access

To cite the dataset, please refer to this publication. To link to the dataset, please use the dataset permalink [doi].

Download the dataset from Zenodo.
Find the related metadata at Google.

People

Marcel Gohsen
Matthias Hagen
Martin Potthast
Benno Stein
