Stable Diffusion XL, the highly anticipated next version of Stable Diffusion, is set to be released to the public soon. With a significantly larger parameter count, this new iteration of the popular AI model is currently in its testing phase. Very little is known about this image generation model: it could well be the Stable Diffusion 3 we have been waiting for, but it could also turn out to be a Stable Diffusion 2.2. Since so little official information is available about upcoming versions of Stable Diffusion, everything in this article is speculation based on what I've read on Twitter and the official Stable Diffusion Discord server.
Only a few images have been shared online by Stability AI's founder Emad Mostaque, and a select few users have gained access to the beta. As of now, Stable Diffusion XL appears to be available only on DreamStudio, Stability AI's official image generator. In this article, we'll dive into everything we know so far about the upcoming version of Stable Diffusion.
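Since DreamStudio is a front end for Stability AI's public REST API, it is reasonable to expect SDXL to eventually show up there as a new engine. Below is a minimal sketch of a text-to-image request against that API; the engine ID is my assumption, so check the /v1/engines/list endpoint for the real SDXL identifier once access opens up.

```python
# Minimal sketch of a DreamStudio-backend (Stability REST API) call.
# ENGINE_ID is an assumption -- verify against /v1/engines/list.
import base64
import os

import requests

API_HOST = "https://api.stability.ai"
ENGINE_ID = "stable-diffusion-xl-beta-v2-2-2"  # hypothetical SDXL beta engine ID

response = requests.post(
    f"{API_HOST}/v1/generation/{ENGINE_ID}/text-to-image",
    headers={
        "Authorization": f"Bearer {os.environ['STABILITY_API_KEY']}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    },
    json={
        "text_prompts": [{"text": "a photograph of an astronaut riding a horse"}],
        "cfg_scale": 7,
        "height": 512,
        "width": 512,
        "samples": 1,
        "steps": 30,
    },
)
response.raise_for_status()

# Each returned artifact carries a base64-encoded PNG.
for i, artifact in enumerate(response.json()["artifacts"]):
    with open(f"sdxl_{i}.png", "wb") as f:
        f.write(base64.b64decode(artifact["base64"]))
```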
More Parameters
Stable Diffusion XL is expected to be much bigger than previous versions. Stable Diffusion 1.5 had roughly 900 million parameters, yet it still performed very well. So far there have been no reports on how large this version of Stable Diffusion is. We also don't know when or how the model was trained, or whether it was trained on Nvidia A100 or H100 GPUs.
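For context on where that ~900 million figure comes from, here is a quick sketch that loads the public 1.5 checkpoint with Hugging Face's diffusers library and counts parameters per component:

```python
# Count parameters in the public Stable Diffusion 1.5 checkpoint.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)

for name in ("unet", "text_encoder", "vae"):
    module = getattr(pipe, name)
    count = sum(p.numel() for p in module.parameters())
    print(f"{name}: {count / 1e6:.0f}M parameters")
```

The U-Net alone accounts for roughly 860 million parameters, which is the figure usually quoted; adding the CLIP text encoder and the VAE pushes the full pipeline past a billion. Whatever SDXL's exact size turns out to be, the "XL" presumably refers to growth in these components.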
High-Quality Images
It is clear from the images shared so far by Emad and others with beta access that Stable Diffusion XL's outputs are much better than those of the standard 1.5, 2.0, and 2.1 Stable Diffusion models. See some of the images shared by people who got access to the beta on DreamStudio. Keep in mind that this is just the base version; once it is released to the public and further fine-tuned, it will perform even better, just as Stable Diffusion 1.5 did.
Better Composition Control
Judging from the outputs generated, Stable Diffusion XL also allows better composition control. See the image in the tweet from Emad below: he was able to merge two faces without losing detail, which is almost impossible in previous versions of Stable Diffusion without tools like ControlNet. It is unclear whether Emad achieved this output with plain text prompts alone or whether he also relied on some new feature added to DreamStudio, similar to ControlNet.
Text In Images
Like all other image generation models currently available to the public, Stable Diffusion XL still struggles to render comprehensible text within images. It appears to be better at it than older models, but it is nowhere close to Google Imagen, a very large image generation model that can produce legible text in images. Still, some images shared on the official Stable Diffusion Discord showed promising results.
Easier To Prompt
One beta user on Twitter also claimed he was able to achieve photorealistic output with a single-word prompt. There is no way to verify this for now, though it is plausible. He also claimed he didn't use any negative prompts. It is not clear, however, whether he was testing Stable Diffusion XL or another model currently in development.
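For readers unfamiliar with negative prompts, this is how they are passed in today's Stable Diffusion pipelines (diffusers shown here); the beta user's claim amounts to SDXL producing clean results without the second argument:

```python
# Typical SD 1.5 prompting today: a positive prompt plus a negative prompt
# listing artifacts to steer away from. SDXL reportedly needs less of this.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="portrait photo of an elderly fisherman, natural light",
    negative_prompt="blurry, deformed, cartoon, low quality",
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("portrait.png")
```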
ControlNet Support
The existing ControlNet extension and its models are confirmed to be supported out of the box. This suggests that many other extensions may also work with the new model. Instead of rebuilding every tool for this model, developers can focus on building additional tools that improve the quality of Stable Diffusion XL's outputs.
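For reference, this is how ControlNet conditioning is wired up today with diffusers; the claim is that SDXL will slot into this kind of pipeline without the ControlNet models needing to be retrained. The model IDs below are the current SD 1.5-era ones, not SDXL's.

```python
# Sketch of today's ControlNet setup: an edge-map ControlNet steers
# composition while the base model fills in appearance.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

canny_image = load_image("canny_edges.png")  # precomputed edge map of the target composition
image = pipe(
    "a portrait in the style of an oil painting",
    image=canny_image,
    num_inference_steps=30,
).images[0]
image.save("controlnet_out.png")
```

If SDXL really is a drop-in replacement for the base model in pipelines like this, the existing ecosystem of conditioning models carries over for free.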