AI image generators are everywhere now, and more and more people are joining in the fun of using them to generate creative images. Some of the popular sites that provide text-to-image generation on the web are:
- DALL·E 2 by OpenAI
- dalle-mini (note that it's an open-source model, not a mini version of OpenAI's DALL·E)
- stablediffusionweb (Stable Diffusion)
- DreamStudio (Stable Diffusion)
These AIGC (Artificial Intelligence Generated Content) image generators use deep learning to generate images from text. A well-known family of generative models is the Generative Adversarial Network (GAN), but the services listed above are not simply GANs: DALL·E 2 and Stable Diffusion are diffusion models, which start from random noise and remove it step by step under the guidance of the text prompt, and dalle-mini uses a transformer to predict image tokens that a VQGAN decoder turns into pixels.
The GAN idea is still a useful way to picture how a model learns to generate images. A text-to-image GAN is trained on a large dataset of images and their text descriptions, and consists of two neural networks: a generator and a discriminator. The generator takes a text description (plus random noise) as input and produces an image meant to match that description. The discriminator is shown both real images from the dataset and generated ones, and estimates whether each image is real. Its judgement is the feedback signal: when it easily spots a generated image as fake, the generator's weights are adjusted so that the next attempt is harder to tell apart from real images.
This cycle of feedback and adjustment ideally continues until the discriminator can no longer reliably distinguish generated images from real ones. Once trained, the generator can produce new images from text descriptions it has never seen before.
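As a rough illustration of the generator/discriminator feedback loop described above, here is a minimal sketch that uses plain numbers instead of images and has no text conditioning. Everything in it is an assumption made for illustration: the function name `train_toy_gan`, the "dataset" of numbers near 4.0, the one-parameter generator, and the learning rate. It is not how any of the listed services is implemented.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def train_toy_gan(steps: int = 300, batch: int = 64, lr: float = 0.05, seed: int = 0):
    """Adversarial training on 1-D numbers (a toy stand-in for images)."""
    rng = random.Random(seed)
    real_mean = 4.0   # hypothetical "dataset": real samples cluster near 4.0
    shift = 0.0       # generator parameter: fake sample = noise + shift
    w, c = 0.0, 0.0   # discriminator: D(x) = sigmoid(w * x + c)

    for _ in range(steps):
        real = [rng.gauss(real_mean, 0.5) for _ in range(batch)]
        fake = [rng.gauss(0.0, 0.5) + shift for _ in range(batch)]

        # Discriminator step: gradient ascent on log D(real) + log(1 - D(fake)).
        grad_w = grad_c = 0.0
        for x in real:
            err = 1.0 - sigmoid(w * x + c)
            grad_w += err * x
            grad_c += err
        for x in fake:
            err = sigmoid(w * x + c)
            grad_w -= err * x
            grad_c -= err
        w += lr * grad_w / batch
        c += lr * grad_c / batch

        # Generator step: gradient ascent on log D(fake), i.e. nudge the
        # fakes in the direction the discriminator currently rates as "real".
        grad_shift = sum((1.0 - sigmoid(w * x + c)) * w for x in fake) / batch
        shift += lr * grad_shift

    return shift, w, c


if __name__ == "__main__":
    shift, w, c = train_toy_gan()
    print(f"generator shift after training: {shift:.2f} (real data is centred on 4.0)")
```

The generator never sees the real data directly; it only follows the discriminator's feedback. Toy setups like this one tend to overshoot and oscillate around the target rather than settle exactly on it, which hints at why real GANs are tricky to train.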
I tried generating images on these websites using the same prompts and compared the results, which are great. It's also fun to see how the images differ between models, as well as the imperfect and weird results 🙂
Prompt 1: 蒙古人在草原上骑马 (Chinese, meaning "a Mongolian riding a horse on grassland")
You can see that none of them generated a picture of a Mongolian riding a horse except OpenAI's DALL·E 2 (the last one). But if we change the prompt to English, the results are much better.
Prompt 2: a Mongolian riding a horse on grassland
You can see they are able to generate the intended images this time.
Conclusion 1: DALL·E 2 is the only service that works as intended with the Chinese prompt. All the other services are weak at natural language processing for Chinese: they don't understand it as well as DALL·E 2 does. DALL·E 2's text understanding probably benefits from OpenAI's work on large language models such as GPT, which also powers ChatGPT.
As a workaround for prompts in Chinese or any other language, you can have Google Translate turn the prompt into English and then use the translated prompt with the other image generation services. Or even better, ask ChatGPT to translate it and also come up with a more precise prompt.
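If you want to automate this workaround, the idea can be sketched in a few lines of Python. This is only a sketch under stated assumptions: `contains_cjk` and `prepare_prompt` are hypothetical helper names, the check only looks for common Chinese characters (other languages would need their own detection), and `translate` is a placeholder you would wire up to whatever translation service you use. No real translation API is called here.

```python
from typing import Callable


def contains_cjk(text: str) -> bool:
    """Return True if the text has any character in the CJK Unified
    Ideographs block (U+4E00 to U+9FFF), which covers common Chinese."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def prepare_prompt(prompt: str, translate: Callable[[str], str]) -> str:
    """Translate the prompt to English only when it contains Chinese characters.

    `translate` is a hypothetical callable supplied by the caller, e.g. a
    wrapper around a translation service or a chatbot.
    """
    return translate(prompt) if contains_cjk(prompt) else prompt


if __name__ == "__main__":
    # Stand-in translator for demonstration: a lookup table, not a real service.
    lookup = {"蒙古人在草原上骑马": "a Mongolian riding a horse on grassland"}
    fake_translate = lambda text: lookup.get(text, text)

    print(prepare_prompt("蒙古人在草原上骑马", fake_translate))
    print(prepare_prompt("a Mongolian riding a horse on grassland", fake_translate))
```

Both calls end up sending the same English prompt to the image service, which is what Prompt 2 above does by hand.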
The other important aspect is the quality of the images. Here are some other tests:
Prompt 3: An Asian girl, who dressed a black jacket, is playing electric guitar in a rock band concert
Conclusion 2: Common problem areas in AI-generated photo-realistic images of people include faces, eyes, and fingers. Hands and fingers seem to be the most difficult: even when the AI gets the number of fingers right, the hands and fingers are often twisted at weird angles or have odd proportions.
In the test above, Midjourney's result is the best.
I also played around with some other random prompts on DALL·E 2 and Stable Diffusion. Here are some samples, both good and "bad" ones.
And the following are from tests on Midjourney (some are other people's results on Midjourney).
Conclusion 3: In my tests, the service that generates the best-quality images is Midjourney. It consistently generated great images, whether photo-realistic, cartoon, or other visual art styles.
Overall, the best goes to Midjourney.