The ridiculed Stable Diffusion 3 release excels at AI-generated body horror

Image of a girl lying in the grass created with Stable Diffusion 3.

On Wednesday, Stability AI released the weights for Stable Diffusion 3 Medium, an AI image synthesis model that turns text prompts into AI-generated images. However, its arrival has been derided online because it generates images of people in a way that seems like a step backward from other state-of-the-art image synthesis models like Midjourney or DALL-E 3. As a result, it can churn out wildly anatomically incorrect visual abominations with ease.

A Reddit thread titled “Is this release supposed to be a joke? [SD3-2B]” details SD3 Medium’s spectacular failures at rendering people, especially human limbs like hands and feet. Another thread, titled “Why is SD3 so bad at generating girls lying on the grass?”, shows similar problems, but with full human bodies.

Hands have traditionally been a challenge for AI image generators due to a lack of good examples in early training datasets, but more recently, several image synthesis models seemed to have overcome that problem. In that sense, SD3 appears to be a huge step backward for the image synthesis enthusiasts flocking to Reddit, especially compared to recent Stability releases like SDXL Turbo in November.

“It wasn’t long ago that StableDiffusion competed with Midjourney, now it looks like a joke in comparison. At least our datasets are safe and ethical!” one Reddit user wrote.

Image AI fans have so far blamed the failure of Stable Diffusion 3’s anatomy on Stability’s insistence on filtering out adult content (often called “NSFW” content) from the SD3 training data that teaches the model how to generate images. “Believe it or not, heavily censoring the model also removes human anatomy, so… that’s what happened,” one Reddit user wrote in the thread.

Basically, any time a user prompts for a concept that is not well represented in the AI model’s training dataset, the image synthesis model will conjure its best interpretation of what the user is asking for. And sometimes that can be downright terrifying.

The 2022 release of Stable Diffusion 2.0 suffered from similar problems in depicting humans well, and AI researchers soon discovered that censoring adult content containing nudity could severely hamper an AI model’s ability to generate accurate human anatomy. At the time, Stability AI reversed course with SD 2.1 and SDXL, regaining some of the abilities lost through strong NSFW content filtering.

Another problem that can occur during model pre-training is that the NSFW filter researchers use to remove adult images from the dataset can be too picky, accidentally removing images that are not offensive and depriving the model of depictions of humans in certain situations. “[SD3] works fine as long as there are no humans in the picture, I think their improved nsfw filter for filtering training data decided anything humanoid is nsfw,” one Redditor wrote on the topic.

Using the free SD3 online demo on Hugging Face, we ran some test prompts and saw results similar to those reported by others. For example, the prompt “a man showing his hands” returned an image of a man holding up two giant, misshapen hands, although each hand did have at least five fingers.

Stability announced Stable Diffusion 3 in February, and the company plans to make it available in a variety of model sizes. Today’s release is the “Medium” version, a model with 2 billion parameters. In addition to the weights being available on Hugging Face, the model is also available for experimentation through the company’s Stability Platform. The weights are free to download and use only under a non-commercial license.

Soon after its February announcement, the delay in releasing the SD3 weights inspired rumors that the holdup was due to technical issues or mismanagement. Stability AI as a company took a nosedive recently when its founder and CEO, Emad Mostaque, resigned in March, followed by a series of layoffs. Just before that, three key engineers, Robin Rombach, Andreas Blattmann, and Dominik Lorenz, left the company. And its troubles go back even further, with reports of the company’s dire financial situation persisting since 2023.

To some Stable Diffusion fans, the failures of Stable Diffusion 3 Medium are a visual manifestation of the company’s mismanagement and a clear sign that things are falling apart. While the company has not filed for bankruptcy, some users made dark jokes about the possibility after seeing SD3 Medium:

“I think they can now go bankrupt safely and ethically [sic] after all.”
