Mauricio Villegas, Roberto Paredes. Image-Text Dataset Generation for Image Annotation and Retrieval. II Congreso Español de Recuperación de Información, CERI 2012, 2012. pp. 115-120.

This paper presents a new dataset of images gathered from the Web, each paired with text extracted from near where the image appeared on its webpage. Pre-extracted features are provided to make the dataset easier for other researchers to use. An initial release of 250,000 images is targeted at automatic image annotation with unsupervised data. This dataset is used in the ImageCLEF 2012 Web image annotation benchmark.