Intelligent assistance in contextualized description of photos for news

The project, led by Eurecat in collaboration with 3Cat, once again sets out to build new journalistic assistant tools, this time for describing photos, by exploring the unprecedented capabilities of current multimodal language models.

Describing photos in the media is often a routine task, far removed from the intellectual work involved in writing a news story. Captions usually follow editorial criteria, whereas photo descriptions must convey the image in purely visual terms. These descriptions, known as “alternative texts”, are stored in non-visible fields of digital formats such as web pages, and they are often missing because of various factors in current workflows. As a result, people with accessibility needs cannot access the same information as other readers.
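To make the idea of a missing alternative text concrete, the following minimal sketch scans a fragment of HTML and lists the images whose `alt` attribute is absent or empty. It is an illustration only and not part of the project's tooling: the class name `MissingAltFinder`, the function `images_missing_alt`, and the sample markup are hypothetical, and it uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Hypothetical helper: collects the src of <img> tags lacking alt text."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag != "img":
            return
        attributes = dict(attrs)
        alt = attributes.get("alt")
        # Assumption: for news photos, an empty alt counts as missing,
        # even though empty alt is valid for purely decorative images.
        if alt is None or not alt.strip():
            self.missing.append(attributes.get("src", ""))


def images_missing_alt(html):
    """Return the src values of images without a usable alt attribute."""
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


# Illustrative markup: the first photo has a visual description, the second does not.
sample = (
    '<img src="fire.jpg" alt="Two firefighters spray water on a burning warehouse">'
    '<img src="council.jpg">'
)
print(images_missing_alt(sample))
```

A check of this kind only detects that a description is missing; writing the description itself is the task the project assigns to multimodal models.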

New tools based on multimodal generative artificial intelligence models, optimized for describing photographs for informational purposes, can serve as a journalistic assistant that improves the quality of information, both in writing photo captions and in producing visual descriptions for accessibility purposes.

In this session, CIDAI, in collaboration with 3Cat, presented the results and knowledge gained during one of the High Impact Projects in which multimodal generative AI tools were used.

Calendar

Presentation of the Impact Project “Intelligent assistance in contextualized description of photos for news”