The project, led by Eureka in collaboration with 3Cat, once again sets out to build new journalistic assistant tools, in this case for photo description, exploring the unprecedented capabilities of current multimodal language models.
Describing photos in the media is often a routine task, far removed from the intellectual work involved in writing a news story. Captions usually follow editorial criteria, whereas photo descriptions must describe the image in purely visual terms. These descriptions, also called “alternative texts”, are stored in non-visible fields of digital formats such as web pages, and are often missing because of various factors in current workflows. As a result, people with accessibility needs do not receive the same information.
New tools based on multimodal generative artificial intelligence models, optimized for describing photographs for informational purposes, can serve as a journalistic assistant that improves the quality of information, both in writing photo captions and in producing visual descriptions for accessibility purposes.
In this session, CIDAI, in collaboration with 3Cat, will present the results and knowledge gained during one of the High Impact Projects in which multimodal generative AI tools were used.

