Duration: 1 January 2019 to 30 September 2022
Grant RTI2018-096883-R-C43 funded by MCIN/AEI/10.13039/501100011033 and by "ERDF A way of making Europe"
PI: Roberto Paredes
Members: Jon Ander Gómez

The objective of this subproject is to develop the tools needed for the automatic understanding and categorization of food labeling by purchasing users. To this end, starting from images acquired with a mobile device, the aim is to produce a literal transcription of the labeling of food products. This literal transcription may then be used by other project members to assess the adequacy of the product for a given user profile. Achieving this requires solving a series of intermediate tasks: on the one hand, the correct localization of the labeling in the image and, on the other, its literal transcription. We propose to use deep learning (neural networks), since it currently represents the state of the art in both intermediate tasks.

Both tasks, however, require the prior acquisition of a database with thousands of examples of the different types of labels present in product images. This acquisition entails a substantial effort, since it involves not only capturing the images but also their supervised labeling. This labeling must be carried out at two levels: first, determining the minimum inclusion box of the nutritional information (detection) and, second, producing the literal transcription of that information. Such supervised labeling implies a significant cost in human supervision and corresponds to a classic machine learning formulation of the problem. Therefore, to avoid this kind of supervised labeling, we also propose a more direct approach: a tool, likewise based on deep learning techniques, following an end-to-end formulation.
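Before turning to that end-to-end alternative, the following sketch illustrates what the classical two-stage pipeline could look like in PyTorch. It is a minimal illustration under assumed choices (a torchvision Faster R-CNN fine-tuned for panel detection, a CRNN-style recognizer trained with CTC for transcription); none of these model choices or names come from the project itself.

```python
# Hypothetical sketch of the two-stage supervised pipeline:
# (1) detect the minimum inclusion box of the nutritional panel,
# (2) transcribe the text inside that box.
# The detector choice, NUM_CLASSES and the recognizer are illustrative.

import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_CLASSES = 2  # background + "nutritional panel"

def build_detector():
    # Off-the-shelf Faster R-CNN with its box head replaced, so it can be
    # fine-tuned on the project's annotated product-label images.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)
    return model

class CRNNRecognizer(torch.nn.Module):
    # Minimal CRNN-style recognizer: CNN features -> BiLSTM -> per-column
    # character logits, trained with CTC loss on the literal transcriptions.
    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        self.cnn = torch.nn.Sequential(
            torch.nn.Conv2d(3, 64, 3, padding=1), torch.nn.ReLU(),
            torch.nn.MaxPool2d(2),
            torch.nn.Conv2d(64, 128, 3, padding=1), torch.nn.ReLU(),
            torch.nn.AdaptiveAvgPool2d((1, None)),  # collapse the height axis
        )
        self.rnn = torch.nn.LSTM(128, hidden, bidirectional=True, batch_first=True)
        self.head = torch.nn.Linear(2 * hidden, vocab_size + 1)  # +1 for CTC blank

    def forward(self, crops):             # crops: (B, 3, H, W) detected panels
        f = self.cnn(crops).squeeze(2)    # (B, 128, W')
        f = f.permute(0, 2, 1)            # (B, W', 128)
        out, _ = self.rnn(f)
        # (B, W', vocab+1); CTC training would apply log_softmax and
        # transpose to time-major before torch.nn.CTCLoss.
        return self.head(out)
```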

This end-to-end approach works as follows: given an image and a specific user profile, the system emits an alarm signal (a traffic light) indicating the adequacy of that product for that particular user. This approach requires much weaker labeling than the previous one, since it is not necessary to detect or transcribe the nutritional information, but simply to indicate (in a weak way) the adequacy of the product for the user in question. Both proposals will be made available to the rest of the project participants in the form of a library (API) that, given an image, detects and transcribes the nutritional information (first proposal) or, given an image-user pair, returns the level of adequacy of the product appearing in the image for that user (end-to-end proposal).
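A minimal sketch of what the end-to-end model and the two library entry points could look like follows. All identifiers here (TrafficLight, EndToEndAdequacy, transcribe, adequacy, the profile encoding size) are illustrative assumptions about the interface, not the delivered API.

```python
# Hypothetical sketch of the end-to-end model and the two API entry points.
# Every name below is an illustrative assumption, not the project's library.

from enum import Enum
import torch
import torchvision

class TrafficLight(Enum):
    GREEN = 0    # product adequate for this user profile
    YELLOW = 1   # adequate with caution
    RED = 2      # not adequate

class EndToEndAdequacy(torch.nn.Module):
    # Joint model: image features and an encoded user profile are fused and
    # mapped directly to the three traffic-light classes. Only the weak
    # (image, profile) -> light labels are needed to train it.
    def __init__(self, profile_dim=16):
        super().__init__()
        backbone = torchvision.models.resnet18(weights="DEFAULT")
        backbone.fc = torch.nn.Identity()          # keep the 512-d image features
        self.backbone = backbone
        self.profile_mlp = torch.nn.Sequential(
            torch.nn.Linear(profile_dim, 64), torch.nn.ReLU())
        self.classifier = torch.nn.Linear(512 + 64, len(TrafficLight))

    def forward(self, image, profile):
        feats = torch.cat([self.backbone(image), self.profile_mlp(profile)], dim=1)
        return self.classifier(feats)              # logits over the 3 lights

# The two entry points exposed to the other subprojects (signatures only):
def transcribe(image: torch.Tensor) -> tuple[list[float], str]:
    """Return (bounding box, literal transcription) -- first proposal."""
    raise NotImplementedError

def adequacy(image: torch.Tensor, profile: torch.Tensor) -> TrafficLight:
    """Return the traffic-light adequacy level -- end-to-end proposal."""
    raise NotImplementedError
```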

Finally, we propose to evaluate the feasibility of transferring the technologies underlying both proposals to mobile devices. This feasibility study will involve analyzing which of the neural network types and techniques used are suitable for devices with limited storage and computation resources, and which of them should be modified before being deployed there. One example is the use of MobileNets for the computer vision tasks, among others.
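As an illustration of the kind of analysis involved, the following sketch, assuming a PyTorch workflow, compares the parameter footprint of a MobileNet backbone against a server-grade one and exports the lightweight model for an on-device runtime. The export path shown (PyTorch mobile's lite interpreter) is one possible target, assumed here for illustration.

```python
# Hypothetical feasibility check: compare model footprints, then export the
# lightweight backbone for devices with limited storage and computation.

import torch
import torchvision
from torch.utils.mobile_optimizer import optimize_for_mobile

def n_params(model):
    return sum(p.numel() for p in model.parameters())

heavy = torchvision.models.resnet50(weights=None)
light = torchvision.models.mobilenet_v2(weights=None)
print(f"ResNet-50:   {n_params(heavy) / 1e6:.1f}M parameters")
print(f"MobileNetV2: {n_params(light) / 1e6:.1f}M parameters")

# Trace the MobileNet, apply mobile-specific optimizations, and save it
# in a form the PyTorch lite interpreter can load on the device.
light.eval()
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(light, example)
mobile_model = optimize_for_mobile(traced)
mobile_model._save_for_lite_interpreter("label_reader_mobile.ptl")
```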