Duration: 1 September 2025 to 31 August 2028
Grant PID2024-156022OB-C32 funded by MICIU/AEI/10.13039/501100011033 and by ERDF/EU
PI: Paolo Rosso, Carlos D. Martínez
Members: Alberto Albiol, Jon Ander Gómez, Jorge González
ANNOTATE-MULTI2 will focus on the development of multimodal human-centric AI models for the detection of sexism, disinformation, and mental health disorders such as depression (problems addressed at UPV mainly at the textual level) in social media texts, memes, and videos, both in Spanish and English. Special emphasis will be placed on the integration of sensor data such as electroencephalogram signals, heart rate variability, and eye-tracking information, which will be collected during the annotation phase of the multimodal data and used to develop next-generation AI models that leverage people's emotional reactions.
Sensor data from annotators will be integrated in a disaggregated manner to move beyond the conventional approach of providing a single, aggregated label that represents the majority view. The recent Learning With Disagreement paradigm will ensure that the diverse perspectives and emotional reactions of the annotators are incorporated, and will help develop more robust human-centric AI models that better leverage the different emotional reactions users have when looking at the same multimodal piece of information, especially when dealing with very sensitive topics such as sexism and disinformation.
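To illustrate what training on disaggregated labels can look like in practice, the following is a minimal sketch (hypothetical Python/PyTorch code, not the project's actual pipeline) in which the judgements of several annotators are kept as a soft label distribution and a classifier is trained against that distribution instead of the majority vote:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def soft_labels(annotations, num_classes):
    """Turn per-annotator labels for one item into a probability distribution
    instead of collapsing them to the majority vote."""
    counts = torch.bincount(torch.tensor(annotations), minlength=num_classes).float()
    return counts / counts.sum()

# Hypothetical example: 5 annotators disagree on whether a meme is sexist (1) or not (0).
targets = soft_labels([1, 1, 0, 1, 0], num_classes=2)     # -> tensor([0.4, 0.6])

# Any classifier producing logits over the classes; here a toy linear layer
# over a 768-dimensional multimodal embedding (an assumed feature size).
model = nn.Linear(768, 2)
features = torch.randn(1, 768)                            # placeholder embedding
logits = model(features)

# Soft-target loss: the model learns the full annotator distribution,
# preserving the disagreement signal rather than a single hard label.
loss = F.kl_div(F.log_softmax(logits, dim=-1), targets.unsqueeze(0), reduction="batchmean")
loss.backward()
```

Keeping the full distribution preserves the disagreement signal, so items on which annotators' views or emotional reactions diverge are not flattened into a single hard label.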
The annotation process of social media texts, memes, and videos will capture, for each annotator, both their conscious viewpoint (e.g. on sexist or disinformation content) and their unconscious perception of the topic as given by their emotional reactions (measured by sensor data). Emotional reactions will be measured with a 32-channel Brain Vision device able to capture electroencephalogram (EEG) and electrocardiogram (ECG) signals, as well as respiration.
Moreover, eye-tracking information will be collected from wearable glasses. This way we will be able to measure: (i) brain wave activity (EEG signals have been associated with specific emotional valence, with negative emotions activating right-hemisphere areas and positive emotions activating left-hemisphere areas); (ii) heart rate variability (which may be used to identify emotional states such as stress, relaxation, or excitement); (iii) respiration (breathing patterns may be used to detect emotional and physical states, e.g. rapid breathing may signal anxiety, while deep breaths suggest calmness); and (iv) eye-tracking (tracking gaze patterns, focus duration, and shifts in attention may help to study cognitive arousal and to detect emotional states such as curiosity, excitement, or anxiety).
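As a concrete, purely illustrative example of the kind of physiological markers listed above, the sketch below (Python with NumPy/SciPy; the sampling rate, band limits, and signal values are assumptions, not the project's processing pipeline) computes RMSSD, a standard short-term heart rate variability index, and a frontal alpha asymmetry score from two EEG channels:

```python
import numpy as np
from scipy.signal import welch

def rmssd(rr_intervals_ms):
    """Root mean square of successive differences of R-R intervals (in ms),
    a standard short-term heart rate variability index."""
    diffs = np.diff(rr_intervals_ms)
    return np.sqrt(np.mean(diffs ** 2))

def alpha_band_power(eeg, fs, band=(8.0, 13.0)):
    """Mean power spectral density of one EEG channel in the alpha band."""
    freqs, psd = welch(eeg, fs=fs, nperseg=fs * 2)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return psd[mask].mean()

def frontal_alpha_asymmetry(eeg_left, eeg_right, fs):
    """ln(right alpha power) - ln(left alpha power); since alpha power is
    inversely related to cortical activity, a higher value is commonly read
    as relatively greater left-hemisphere activation."""
    return np.log(alpha_band_power(eeg_right, fs)) - np.log(alpha_band_power(eeg_left, fs))

# Toy usage with synthetic signals (placeholders for real sensor recordings).
fs = 250                                              # assumed sampling rate in Hz
rr = np.array([812, 790, 805, 830, 798], dtype=float)  # example R-R intervals in ms
left, right = np.random.randn(2, fs * 10)              # 10 s of fake EEG per channel
print(rmssd(rr), frontal_alpha_asymmetry(left, right, fs))
```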
The exploration of physiological markers will be valuable for mental well-being analysis and will make it possible to detect anxiety and distress, which are often early signals of depression.
The incorporation of sensor data will help develop multimodal human-centric AI models for the early detection of mental health disorders. Modern AI models predominantly focus on training through the ingestion of massive amounts of multimodal data. We believe that, despite significant advances in AI, a fundamental paradigm shift is required to integrate a deeper level of human involvement, particularly in the creation of datasets, which should incorporate physiological markers of annotators' responses to multimodal stimuli. This will make it possible to place humans at the center of AI model design and development, yielding sensor-data-based, next-generation human-centric AI models that help recognize and understand cognitive and affective physiological responses.
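As a purely illustrative sketch of how such sensor data could be combined with a text or meme representation in a multimodal model (the architecture, feature dimensions, and names below are assumptions, not the project's design), a pretrained content embedding can be fused with a vector of physiological features before classification:

```python
import torch
import torch.nn as nn

class MultimodalClassifier(nn.Module):
    """Hypothetical late-fusion model: a text/meme embedding (e.g. from a
    pretrained encoder) is concatenated with per-annotator physiological
    features (EEG band powers, HRV, respiration rate, gaze statistics)."""
    def __init__(self, text_dim=768, sensor_dim=16, hidden=128, num_classes=2):
        super().__init__()
        self.sensor_proj = nn.Sequential(nn.Linear(sensor_dim, hidden), nn.ReLU())
        self.classifier = nn.Sequential(
            nn.Linear(text_dim + hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, text_emb, sensor_feats):
        # Project the sensor features and fuse them with the content embedding.
        fused = torch.cat([text_emb, self.sensor_proj(sensor_feats)], dim=-1)
        return self.classifier(fused)

# Toy forward pass with placeholder tensors (batch of 4 annotated items).
model = MultimodalClassifier()
logits = model(torch.randn(4, 768), torch.randn(4, 16))
print(logits.shape)  # torch.Size([4, 2])
```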