Pattern Recognition and Human Language Technology Research Center

Natural Language Processing

This area includes basic research on:

Cross-lingual information retrieval and script retrieval

For many languages written in non-Roman indigenous scripts (e.g., Arabic, Greek and Indic languages), a large amount of user-generated transliterated content can be found on the Web in the Roman script. Information retrieval (IR) in this setting is challenging because queries written in either the native or the Roman script must be matched against documents written in both scripts. Moreover, transliterated content exhibits extensive spelling variation. We propose a principled solution to cross-script term matching and spelling variation in which terms across the scripts are modelled jointly in a deep-learning architecture and can be compared in a low-dimensional abstract space.
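To illustrate what comparing terms in a shared low-dimensional space can look like at query time, here is a minimal Python sketch. It is not the deep-learning model described above: it assumes such a model has already mapped each term, in either script, to a vector. The Hindi example terms, the 3-dimensional vectors in `TOY_EMBEDDINGS`, the function name `expand_query_term` and the 0.95 threshold are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical 3-dimensional embeddings. The values are invented for
# illustration and do not come from any trained model.
TOY_EMBEDDINGS = {
    "pyar": [0.90, 0.10, 0.05],    # Roman-script transliteration
    "pyaar": [0.88, 0.12, 0.07],   # Roman-script spelling variant
    "प्यार": [0.91, 0.09, 0.04],     # native (Devanagari) form
    "dil": [0.10, 0.92, 0.08],     # Roman-script transliteration
    "दिल": [0.12, 0.90, 0.10],      # native (Devanagari) form
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_query_term(term, embeddings, threshold=0.95):
    """Return (term, score) pairs whose vectors lie close to the query term.

    Terms from both scripts live in the same space, so spelling variants
    and cross-script equivalents are retrieved by the same comparison.
    """
    query_vec = embeddings[term]
    scored = [
        (other, cosine(query_vec, vec))
        for other, vec in embeddings.items()
        if other != term
    ]
    matches = [(other, score) for other, score in scored if score >= threshold]
    # Highest similarity first.
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for match, score in expand_query_term("pyar", TOY_EMBEDDINGS):
        print(match, round(score, 4))
```

With these toy vectors, a query for "pyar" is expanded to both the spelling variant "pyaar" and the native-script form, while the unrelated terms fall below the threshold. In a real system the expanded terms would then be passed to the retrieval engine, and the threshold would need tuning on held-out data.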