Synthetic Data Generation

The study shows AI-generated datasets can outperform traditional ones—faster, cheaper, more diverse, and easier to create

Date: 15/04/2024Reading time: 2 min

Synthetic Data Generation

AI-generated datasets are changing the game.
Recent research reveals that synthetic semantic segmentation datasets—created through controlled AI generation—can outperform traditional ones. These datasets are faster to produce, more cost-effective, diverse, and easier to scale, unlocking new potential in computer vision pipelines.

A new approach with real impact

In collaboration with CITIC – Centro de Investigación TIC, Universidade da Coruña, our team published a peer-reviewed study that fuels the foundations of our platform for conscious design.

"Should I Render or Should AI Generate? Crafting Synthetic Semantic Segmentation Datasets with Controlled Generation"
Read the full paper

Using a combination of Diffusion Models and ControlNet, the approach enables the creation of annotated synthetic datasets that rival or exceed those built using traditional 3D rendering methods.

Why this matters

Traditional dataset generation can be slow, expensive, and limited in diversity.
This method offers a scalable, accessible way to produce training data — especially in domains where real images are hard to obtain or annotate.

When crafted intentionally, AI-generated datasets can outperform those rendered via classical pipelines.

Authors and collaborators

This work wouldn't be possible without the brilliant researchers and partners behind it:

Luis Omar Álvarez Mures
Manuel Silva Díaz
Manuel Lijó Sánchez
Emilio José Padrón González
José A. Iglesias

Special thanks to the institutions and teams supporting this effort.

Synthetic Data Generation