Welcome to the PRHLT Research Center


The Pattern Recognition and Human Language Technology (PRHLT) research center is composed of researchers from the Universitat Politècnica de València (UPV) working in the areas of Multimodal Interaction, Pattern Recognition, Image Processing (Image Analysis, Computer Vision, Handwritten Text Recognition, Document Analysis) and Language Processing (Speech Recognition and Understanding, Machine Translation, Information Retrieval).

The PRHLT center is an active research entity with important ongoing research projects, technology transfer activities, and research publications.


Big data and deep learning

“Machine Learning is the new electricity.” Deep learning is a technique within the machine learning field. Machine learning techniques learn from data, and the amount of available data grows exponentially year after year, so these techniques have great potential to solve very complex problems. Big data is the perfect partner, and deep learning techniques are becoming standard thanks to advances in hardware and software. In PRHLT we have [...]



Speech processing and dialogue systems

Speech-to-speech and text-to-text translation for limited domains fall within this kind of project. Finite-state and statistical transducers are used as the basis of the machine translation systems. These models can be learnt automatically from real translation examples. Applications include (but are not limited to) the translation of technical reports and hotel services. Further topics: speech interaction with mobile devices, speaker and domain adaptation, statistical dialogue annotation models, and multimodal speech recognition.



Handwritten Text Recognition

Both off-line HTR (document images) and on-line HTR (tablet or e-pen signals) are considered. No prior character or word segmentation is needed. The technology, borrowed from speech recognition, relies on character hidden Markov models, finite-state word models, and syntactic N-grams. After model training, for each given text line image, a holistic (“Viterbi”) search provides both an optimal transcription and the corresponding word and character segmentations. Applications: transcription of ancient and legacy [...]
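The way a Viterbi search yields a labelling and a segmentation at the same time can be illustrated with a toy example. This is only a minimal sketch under stated assumptions, not the PRHLT system: the two character states ("a", "b"), the quantized frame symbols ("x", "y") and every probability below are invented for illustration, and a real recognizer would use trained character HMMs combined with word and N-gram models.

```python
import math

def to_log(table):
    """Convert a (possibly nested) table of positive probabilities to logs."""
    return {k: (to_log(v) if isinstance(v, dict) else math.log(v))
            for k, v in table.items()}

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for an observation sequence."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path

def segments(path):
    """Collapse a frame-level state path into (label, first_frame, last_frame) runs."""
    runs = []
    start = 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            runs.append((path[start], start, i - 1))
            start = i
    return runs

# Hypothetical toy model: all numbers are assumptions made for this example.
STATES = ["a", "b"]
LOG_START = to_log({"a": 0.9, "b": 0.1})
LOG_TRANS = to_log({"a": {"a": 0.7, "b": 0.3}, "b": {"a": 0.1, "b": 0.9}})
LOG_EMIT = to_log({"a": {"x": 0.9, "y": 0.1}, "b": {"x": 0.2, "y": 0.8}})

demo_frames = ["x", "x", "x", "y", "y"]  # stand-in for features along a text line
log_prob, path = viterbi(demo_frames, STATES, LOG_START, LOG_TRANS, LOG_EMIT)
print(path)            # frame-by-frame labelling
print(segments(path))  # boundaries recovered as a by-product of decoding
```

The segmentation is never computed as a separate step: it falls out of the backtrace, which is why no prior character or word segmentation is required.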



Computer vision

General Statistical and Syntactic Pattern Recognition techniques for image analysis and recognition. Some applications: OCR and document analysis, medical diagnosis, biometric identification, image and video retrieval. Subtopics: relevance-based image retrieval; biometrics.



Language translation

The activities of the Machine Translation group began some years ago with the use of finite-state models for speech-to-speech translation and for text-to-text translation in limited domains. The group has developed a number of translation models with the corresponding learning algorithms, as well as a number of prototypes for speech translation and computer-assisted translation. Currently, the Machine Translation group is devoted to the development of new interactive-predictive techniques for computer-assisted translation, techniques for [...]



Natural Language Processing

Social media data analysis: author profiling, stance detection, deceptive opinion detection, irony detection and sentiment analysis, mixed-script text analysis, plagiarism and social copying detection. Author profiling: given a text, what are the author’s traits? The focus is on inferring traits such as gender, age, native language, language variety, and personality on the basis of the stylistic analysis of the author’s texts. This is of interest for areas such [...]
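To give a rough idea of what "stylistic analysis" can mean in practice, the sketch below computes a few simple style features from a text. It is an illustration only, not the group's method: the feature names, the tiny `FUNCTION_WORDS` set and the choice of features are all assumptions, and a real profiling system would feed many more features into a trained classifier.

```python
import re

# Hypothetical, deliberately tiny function-word list (an assumption for this example).
FUNCTION_WORDS = {"i", "the", "a", "and", "it", "of", "to", "in"}

def stylistic_features(text):
    """Return a few illustrative style features for one text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {
            "avg_word_length": 0.0,
            "type_token_ratio": 0.0,
            "function_word_rate": 0.0,
            "exclamations_per_word": 0.0,
        }
    n = len(words)
    return {
        # Mean number of characters per word.
        "avg_word_length": sum(len(w) for w in words) / n,
        # Vocabulary richness: distinct words divided by total words.
        "type_token_ratio": len(set(words)) / n,
        # Share of words that belong to the function-word list.
        "function_word_rate": sum(w in FUNCTION_WORDS for w in words) / n,
        # Exclamation marks relative to text length in words.
        "exclamations_per_word": text.count("!") / n,
    }

features = stylistic_features("I love it! I really, really love it.")
print(features)
```

Features of this kind describe how something is written rather than what it is about, which is what makes them candidates for inferring author traits.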


Current Projects

READ: Recognition and Enrichment of Archival Documents

The overall objective of READ is to implement a Virtual Research Environment where archivists, humanities scholars, computer scientists and volunteers collaborate with the ultimate goal of boosting research, innovation, development and usage of cutting-edge technology for the automated recognition, transcription, indexing and enrichment of handwritten archival documents. This Virtual Research Environment will not be built from the ground up, but will benefit from research, tools, data and [...]

Duration: 1 February 2017 to 30 June 2019

SomEMBED: SOcial Media language understanding-EMBEDing contexts

SomEMBED (SOcial Media language understanding – EMBEDing contexts) is a coordinated project whose goal is to advance Computational Linguistics (CL) and Natural Language Processing (NLP) in order to address the challenges posed by the use of language in social media: (i) from CL, our goal is to develop techniques and methods for modeling non-standard language from representative corpus of the social [...]

Duration: 1 January 2016 to 31 December 2018

CoMUN-HaT: Context, multimodality and user collaboration in handwritten text processing

Processing handwritten documents is a task of wide interest for many purposes, such as those related to preserving cultural heritage. Handwritten text recognition techniques have been successfully applied during the last decade to obtain transcriptions of handwritten documents, and keyword spotting techniques have been applied for searching specific terms in image collections of handwritten documents. However, results on transcription and indexing are far from perfect, although models [...]

Duration: 1 January 2016 to 31 December 2018

Arabic Author Profiling for Cyber-Security

Cyber-security has become a key priority for Qatar and for nations all over the world. Malicious actors from anywhere misuse cyberspace to perpetrate various crimes such as phishing, cyber-blackmailing, cyber-bullying, and communicating or planning terrorist attacks using social media. These cybercriminals tend to use similar writing styles in their messages, which makes it possible for security experts to detect and stop these threats [...]

Duration: 4 February 2017 to 4 February 2020
Members: P. Rosso

Carabela: probabilistic indexing of manuscript collections for the protection of underwater historic heritage

The goal of the project is to apply techniques that allow large-scale textual searches in manuscripts from the 15th and 16th centuries containing key information for locating thousands of shipwrecks from that period. The project will focus on 150,000 images from collections of interest to underwater archaeology belonging to the Archivo General de Indias and the Archivo Histórico Provincial de Cádiz. These are manuscripts related to Spanish expeditions and naval [...]

Duration: 30 November 2017 to 30 November 2019

Latest News


Advances in the development of a hybrid neural machine translation platform

The development of a hybrid neural machine translation platform has reached its first milestone. Its goal is the design and development of advanced machine translation software using hybridization techniques over [...]

Collaboration with the company PANGEANIC on the development of a machine translation platform

The Machine Translation group of the PRHLT center is involved in the development of a machine translation platform based on neural networks (Neural Machine Translation, NMT). This development is being [...]

Best doctoral thesis award

The Sociedad Española de Recuperación de Información presented the award for the best doctoral thesis in the field of Information Retrieval (period 2016-2018) to Marc Franco Salvador [...]


PRHLT Research Center
Universitat Politècnica de València
Ciudad Politécnica de la Innovación
Edif. 8B Acceso N Planta 0
Camí de Vera, s/n
46022 Valencia (VLC), Spain
(+34) 96 387 81 70