For many languages that use non-Roman indigenous scripts (e.g., Arabic, Greek, and Indic languages), one can often find a large amount of user-generated transliterated content on the Web in the Roman script. Information retrieval (IR) in such a space is challenging because queries written in either the native or the Roman script need to be matched to documents written in both scripts. Moreover, transliterated content features extensive spelling variation. We propose a principled solution to cross-script term matching and spelling variation in which terms across the scripts are modelled jointly in a deep-learning architecture and can be compared in a low-dimensional abstract space.

  • Author profiling in social media
  • Irony detection and opinion mining
  • Opinion spam detection
  • Plagiarism detection
  • Statistical parsing
Related Demos