
Over the past ten years, generative models have gone from a promising line of research in machine learning (ML) to one of the core technologies of artificial intelligence (AI). In just a decade they have moved from experimental results to producing synthetic text, images, audio, video, and data with a level of realism and utility that is already transforming entire sectors.

Initially, architectures such as variational autoencoders (VAEs) and generative adversarial networks (GANs) stood out for their ability to generate realistic synthetic images and data. Later, Transformer-based models gained prominence and now form the core architecture of large language models (LLMs). More recently, diffusion models have significantly improved the quality, variety, and controllability of results in domains such as audio and video. Today, these architectures are combined in innovative ways in many AI models, in a vibrant race toward highly realistic synthetic data generation. This is turning science fiction into reality across diverse scientific and technological applications, from automation and industrial simulation to healthcare and robotics. New challenges related to privacy, bias, transparency, and trust have emerged, but they have not slowed the expansion of this technology, whose economic and social impact keeps growing.

Beyond the now commonplace LLM-based tools such as ChatGPT, synthetic data generation is expanding research and development in areas of social value, for example by facilitating data exchange for science, health, and public policy. It helps when the privacy of sensitive data must be preserved, and when data from infrequent situations, emergencies, or under-observed populations must be better represented in order to train systems that have to operate beyond the average scenario, as is being done in autonomous driving systems for accident prevention.
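
As a rough illustration of the balancing idea, the sketch below tops up an under-represented class with synthetic samples drawn from a simple per-feature Gaussian fitted to the real rare cases. The Gaussian is only a stand-in for a trained generative model, and the function names (`fit_gaussian`, `synthesize`, `rebalance`) and the toy data are hypothetical. It addresses balance only: a naive model like this offers no privacy guarantee.

```python
import random
import statistics


def fit_gaussian(samples):
    """Estimate (mean, std) per feature from the real rare-class samples.

    Assumption: features are independent and roughly Gaussian. A real
    system would use a trained generative model instead.
    """
    columns = list(zip(*samples))
    return [(statistics.mean(col), statistics.stdev(col)) for col in columns]


def synthesize(params, n, seed=0):
    """Draw n synthetic samples from the fitted per-feature Gaussians."""
    rng = random.Random(seed)
    return [tuple(rng.gauss(mu, sigma) for mu, sigma in params) for _ in range(n)]


def rebalance(common, rare, seed=0):
    """Return the rare class topped up with synthetic samples so that it
    matches the size of the common class. Real samples come first."""
    missing = max(0, len(common) - len(rare))
    return list(rare) + synthesize(fit_gaussian(rare), missing, seed)


if __name__ == "__main__":
    # Toy data (hypothetical): 10 ordinary cases, 3 rare ones.
    common = [(0.1 * i, 1.0) for i in range(10)]
    rare = [(5.0, 1.0), (6.0, 2.0), (7.0, 3.0)]
    balanced = rebalance(common, rare)
    print(len(common), len(balanced))
```

The real rare samples are kept and only supplemented, which matches the point above that synthetic data complements rather than replaces observed data.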

However, a lesser-known facet of generative models is their enormous predictive potential. They can learn extremely complex data distributions, which are difficult to model explicitly by hand, in an unsupervised way and without annotation. This ability lets them predict future situations from given initial conditions, a key capability in many innovative applications. Let’s look at some examples:

  • Training robots in real-world environments is often expensive, time-consuming, and not without physical risk. Multimodal vision-language models (VLMs), video prediction using the currently popular world models, and automatic “inpainting” editing are enabling robots to be trained to perform highly dexterous manipulation tasks without the need for complex, explicit three-dimensional models. For example, an industrial or logistics robot can learn to identify parts, estimate trajectories, or anticipate collisions before operating in the physical world, using only a conventional camera. Here, synthetic data doesn’t completely replace real-world data, but it does accelerate training and improve coverage of unusual situations.
  • The atmosphere and climate function as an enormously complex and difficult-to-predict system, in which many variables interact simultaneously and small changes can produce very different results. In this context, generative models, which often treat climate data as structured time series, are becoming especially useful: they not only learn to implicitly model how the system behaves from large amounts of data, but can also generate different possible scenarios from the same initial situation. This makes it possible to better estimate the probability of a given climate event, which is invaluable for preparing for extreme phenomena such as floods, droughts, or abrupt temperature changes.
  • In healthcare, generative models are opening new possibilities for anticipating how a disease evolves over time. Beyond creating synthetic clinical histories or images, their value lies in learning a patient’s trajectory and generating plausible future scenarios based on their current state, with the aim of developing the first systems capable of predicting progression, relapses, response to treatments, or the onset of complications. This line of work is also being explored in initiatives such as PHASE IV AI, which develops privacy-compliant synthetic health data services to support trustworthy AI development and validation in healthcare. Recent work on AI models for predicting the evolution of pulmonary nodules in CT scans could soon be used to anticipate cancer relapses, the progression of neurodegenerative diseases, or the appearance of cardiovascular complications, helping to identify high-risk cases earlier and supporting clinical decision-making.
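
The idea shared by the examples above, generating many plausible futures from the same initial state and counting how often an event of interest occurs, can be sketched in a few lines. This is a minimal illustration, not a real forecasting system: a toy autoregressive process with random noise stands in for a trained generative model, and the names `simulate_scenario` and `event_probability`, as well as every parameter value, are hypothetical.

```python
import random


def simulate_scenario(initial, steps, rng, persistence=0.9, noise_std=0.5):
    """Generate one plausible future trajectory from an initial value.

    Assumption: a toy AR(1) process replaces the generative model;
    `persistence` and `noise_std` are made-up illustrative values.
    """
    value = initial
    path = []
    for _ in range(steps):
        value = persistence * value + rng.gauss(0.0, noise_std)
        path.append(value)
    return path


def event_probability(initial, threshold, steps=30, n_scenarios=2000, seed=0):
    """Estimate the probability that the variable exceeds `threshold`
    at least once, as the fraction of sampled scenarios where it does."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_scenarios):
        if max(simulate_scenario(initial, steps, rng)) > threshold:
            hits += 1
    return hits / n_scenarios


if __name__ == "__main__":
    # Same model, two different initial conditions (e.g. an anomaly index).
    print(event_probability(initial=0.0, threshold=2.5))
    print(event_probability(initial=2.0, threshold=2.5))
```

Because each scenario is sampled rather than computed as a single best guess, the output is a probability instead of a point forecast, which is what makes this kind of model useful for preparing for rare or extreme outcomes.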

In the coming years, the true potential of generative models will lie not only in their ability to create increasingly realistic data and content, thereby accelerating many production processes, but also in their role as tools for exploring what has yet to happen. Their value will grow wherever it is necessary to anticipate complex scenarios. In this context, synthetic generation will cease to be seen merely as a training support technique and will become a key infrastructure for simulation, prediction, and decision-making. The challenge will be to ensure that this progress is accompanied by guarantees of quality, transparency, and control, so that its impact is not only technically impressive but also socially valuable.

 

Rafael Redondo
Director of the Multimedia Technologies Unit, Eurecat
