Duration: 1 January 2013 to 31 December 2015
Funded by: reference TIN2012-38603-C02-01

DIscourse ANAlysis for knowledge understanding (DIANA) is a coordinated project that aims to advance Computational Linguistics (CL) and Natural Language Processing (NLP) in order to overcome the limitations of the state of the art. More specifically, (i) from CL, our objective is to set out and empirically evaluate new theoretical foundations for the conception of human language structure; (ii) from NLP, we aim to develop new techniques and methods that take the state of the art as their starting point. These new approaches will be applied, on the one hand, to tasks involving language processing in the discourse framework, especially entity and event coreference resolution, the treatment of implicit arguments, and more recent problems such as paraphrase identification in plagiarism; on the other hand, they will be applied to the analysis of subjective texts. Among the project objectives, we highlight:
(a) The development of an NLP framework for the analysis of texts at the discourse level, which will be used to detect general constructions and constructions that are specific to particular semantic environments. The core of this framework will be a semantic parser for the partial analysis of corpora with arguments and thematic roles for the languages involved in the project.
(b) The development of hybrid NLP systems combining linguistic knowledge (corpora annotated at various levels) with machine learning techniques that use semantic similarity measures to extend the available knowledge to unseen cases.
(c) The application of these technologies to overcome the limitations of current systems for coreference resolution, paraphrase extraction and argument structure.
(d) The application of these technologies to identify linguistic constructions in the analysis and comprehension of subjective texts: identification of states of mind (stress, frustration, depression, neurosis and aggressiveness), detection of paedophilia and harassment in social media, and detection of deceitful behaviour in social media.
(e) The inference of linguistic constructions that are characteristic of figurative texts in social media, with the aim of understanding their actual sense; humour and irony in opinions are particularly relevant in this respect.