Aitor Álvarez, Carlos D. Martínez-Hinarejos, Haritz Arzelus. Comparing rule-based and statistical methods in automatic subtitle segmentation for Basque and Spanish. Proceedings of IberSpeech 2016, 2016. pp. 251-260.

The correct segmentation of subtitles is crucial to obtain quality subtitles. For this reason, one of the main tasks of human subtitlers is to segment subtitles properly in order to help the audience read them with as little effort as possible. Manual segmentation can be done faster if subtitlers are provided with a draft segmentation, so that they can focus on post-editing the potential errors. In this work, we explore the use of different automatic techniques to obtain those draft segmentations of subtitles. Two rule-based techniques (Counting Characters and Chink-Chunk) and one statistical method (Conditional Random Field) are tested and compared through several evaluation metrics at line and subtitle levels. The results show that Conditional Random Fields outperform the other techniques, and that it would therefore be feasible to provide reasonably good draft segmentations to post-editors.