Please visit the Carabela project web site.
The goal of the project is to apply techniques that enable large-scale text searches in manuscripts from the 15th to 19th centuries that contain key information for identifying thousands of shipwrecks from that period.
The project will focus on 150,000 images from collections of interest to underwater archaeology held by the Archivo General de Indias and the Archivo Histórico Provincial de Cádiz. These manuscripts relate to Spanish expeditions and naval commerce between the 15th and 19th centuries. OCR techniques, which are designed for printed text, do not work on them, and specific techniques for handwritten material produce imprecise results when applied to historical texts such as these.
The team has developed machine learning methods that enable the probabilistic indexing of images of handwritten text, supporting approximate but effective contextual searches across large collections of historical documents.
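To make the idea of a probabilistic index more concrete, here is a minimal Python sketch. It assumes a deliberately simplified model that is not the project's actual data format: for each page image, the index stores candidate words together with the probability that each word appears somewhere on that page. A query returns the images whose probability meets a threshold, ranked from most to least probable. The image identifiers, sample words and probabilities, the 0.5 default threshold, and the use of the minimum probability to score multi-word (AND) queries are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy index: image id -> {word: P(word appears in that image)}.
ProbIndex = Dict[str, Dict[str, float]]


def search(index: ProbIndex, word: str, threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return (image_id, probability) pairs with probability >= threshold,
    most probable first."""
    word = word.lower()
    hits = [
        (image_id, entries[word])
        for image_id, entries in index.items()
        if entries.get(word, 0.0) >= threshold
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def search_all(index: ProbIndex, words: List[str], threshold: float = 0.5) -> List[Tuple[str, float]]:
    """AND query: score each image by the lowest probability among the query
    words (an illustrative heuristic, not the project's method)."""
    if not words:
        return []
    hits = []
    for image_id, entries in index.items():
        score = min(entries.get(w.lower(), 0.0) for w in words)
        if score >= threshold:
            hits.append((image_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def expected_count(index: ProbIndex, word: str) -> float:
    """Expected number of images containing `word`: by linearity of
    expectation, the sum of its per-image probabilities."""
    word = word.lower()
    return sum(entries.get(word, 0.0) for entries in index.values())


# Hypothetical sample index with invented ids and probabilities.
index: ProbIndex = {
    "AGI_0001": {"naufragio": 0.92, "galeón": 0.40},
    "AGI_0002": {"naufragio": 0.15, "cádiz": 0.88},
    "AGI_0003": {"naufragio": 0.67, "galeón": 0.81},
}

print(search(index, "naufragio"))                 # two pages pass the 0.5 threshold
print(search_all(index, ["naufragio", "galeón"]))  # only AGI_0003 passes both
print(expected_count(index, "naufragio"))          # roughly 1.74
```

Lowering the threshold trades precision for recall, which is what makes such searches approximate: because the sketch never commits to a single transcription, a word the recognizer was unsure about can still be retrieved. Since each entry is a probability, summing a word's entries over all images also gives an estimate of how many images contain it.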
This will enable the effective extraction of valuable information on shipwrecks that constitute archaeological heritage of the highest order, given the great historical and cultural value of their contents. This information will be classified according to its ‘risk level’ to help prevent the looting of underwater heritage.