Pattern Recognition and Human Language Technology Research Center

Contests

The IAM-PRHLT bi-modal Handwritten Text corpus II benchmark

This is a new benchmark for testing and developing word-graph-based multimodal protocols. The word graphs are obtained for each word instance (on-line and off-line) of the biMod-IAM-PRHLT-2 corpus using the Viterbi algorithm with a lexical restriction (prefix tree). The contest consists of classifying each pair of word-graph-represented on/off-line test samples into one of 1300 different classes (words). A more detailed description and instructions to download can be found here.
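
As an illustration of what a decision over such word graphs might look like, the following minimal Python sketch fuses the per-class scores of the two modalities log-linearly and returns the best word class. The function name, the score dictionaries and the weight alpha are assumptions for illustration, not part of the benchmark protocol.

    import math

    # Hypothetical sketch of bi-modal word classification: each test sample
    # yields per-class log-scores from its on-line and off-line word graphs
    # (e.g. best-path Viterbi scores); the modalities are fused log-linearly.
    def classify(online_scores, offline_scores, alpha=0.5):
        """Both arguments: dict mapping word class -> log-score."""
        best_word, best_score = None, -math.inf
        for word in online_scores:
            fused = alpha * online_scores[word] + (1 - alpha) * offline_scores[word]
            if fused > best_score:
                best_word, best_score = word, fused
        return best_word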

The Karyotype benchmark

This is a new benchmark for testing and developing interactive protocols. It contains karyotypes, each composed of 22 chromosome images. The goal is to associate each chromosome image with a label from a set of 22 labels. A more detailed description and instructions to download can be found here.
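
Since each karyotype pairs 22 chromosome images with 22 labels one-to-one, the labeling can be viewed as an assignment problem. Below is a minimal sketch, assuming a classifier that yields a matrix of per-image, per-label log-probabilities; the matrix and the use of the Hungarian algorithm are illustrative, not prescribed by the benchmark.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    # Hypothetical sketch: log_probs[i, j] is an (assumed) classifier
    # log-probability that chromosome image i carries label j; the Hungarian
    # algorithm picks the one-to-one assignment maximizing the total score.
    def label_karyotype(log_probs):
        """log_probs: (22, 22) array-like of log-probabilities."""
        cost = -np.asarray(log_probs)            # minimize negated scores
        rows, cols = linear_sum_assignment(cost)
        return {int(i): int(j) for i, j in zip(rows, cols)}  # image -> label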

The Interactive Sequence Labeling benchmark

The aim of this benchmark is to find new search strategies for passive and active interactive sequence labeling. The corpus is a compilation of handwritten Spanish national identity document (DNI) numbers taken from real forms. A more detailed description and instructions to download can be found here.
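
A minimal sketch of one possible active interaction loop is given below, assuming a decode function that returns the best hypothesis consistent with the positions already fixed by the user, together with per-position confidences. Both the function and the simulated-user protocol are assumptions for illustration.

    # Hypothetical sketch of interactive sequence labeling with a simulated
    # user: the system asks about the least confident unfixed position
    # (active strategy), the user supplies the true label, and the sequence
    # is re-decoded under the accumulated constraints.
    def interactive_label(decode, true_labels):
        """decode(constraints) -> (hypothesis, per-position confidences)."""
        constraints = {}                       # position -> validated label
        hypothesis, conf = decode(constraints)
        while hypothesis != true_labels:
            free = [p for p in range(len(hypothesis)) if p not in constraints]
            if not free:
                break
            pos = min(free, key=lambda p: conf[p])   # least confident first
            constraints[pos] = true_labels[pos]      # user corrects it
            hypothesis, conf = decode(constraints)   # constrained re-decoding
        return hypothesis, len(constraints)          # labels and human effort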

Photo-web: Large-scale annotation using general Web data

Concept detection relies on training data that have been manually, and thus reliably, annotated, an expensive and laborious endeavor that does not scale easily. To address this issue, a new annotation subtask is introduced this year: large-scale image annotation using as training data a collection of automatically obtained Web images. Very large numbers of images can be gathered cheaply from the Web, and the webpages that contain them also provide associated text. However, the degree of relationship between the surrounding text and the image varies greatly, so the data can be considered very noisy. Moreover, the webpages can be in any language, or even a mixture of languages, and they tend to contain many writing mistakes. The goal of this task is to evaluate strategies for dealing with this noisy data so that it can be used reliably to annotate images on practically any topic. A more detailed description and instructions to download can be found here.
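
As a concrete example of the kind of strategy this task invites, the sketch below filters a noisy web training set by keeping, for each concept, only the images whose surrounding text mentions the concept or a synonym often enough. The sample format, tokenizer and threshold are assumptions; real systems would use stronger textual relevance models.

    import re

    # Hypothetical sketch of a crude textual relevance filter over
    # (image, surrounding text) pairs gathered from the Web.
    def filter_web_training_set(samples, concept, synonyms=(), min_hits=2):
        """samples: list of (image_path, surrounding_text) pairs."""
        terms = {concept.lower(), *(s.lower() for s in synonyms)}
        kept = []
        for image_path, text in samples:
            tokens = re.findall(r"\w+", text.lower())
            hits = sum(tok in terms for tok in tokens)
            if hits >= min_hits:               # keep plausibly relevant images
                kept.append((image_path, text))
        return kept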

Interactive Image Annotation Benchmark

The objective of this benchmark is to compare the performance of different strategies for interactive image annotation. The task is to assign words/tags to a given new image that describe, or are related to, the content of that image. A more detailed description and instructions to download can be found here.
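
A minimal sketch of one possible interaction round follows, assuming a generic image-tag relevance model score and a simulated user who accepts or rejects proposals; neither these names nor the accept/reject protocol are prescribed by the benchmark.

    # Hypothetical sketch: the system repeatedly proposes its highest-scoring
    # remaining tag; rejected tags are excluded from later proposals.
    def annotate(image, vocabulary, score, user_accepts, n_tags=5):
        accepted, rejected = [], set()
        while len(accepted) < n_tags:
            candidates = [t for t in vocabulary
                          if t not in rejected and t not in accepted]
            if not candidates:
                break
            tag = max(candidates, key=lambda t: score(image, t))
            if user_accepts(image, tag):
                accepted.append(tag)
            else:
                rejected.add(tag)
        return accepted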