Generating images from text using AttnGAN
Hello, Habr! I present to your attention a translation of the article "AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks" by Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, and Xiaodong He.
In this publication I want to talk about my experiments with the AttnGAN architecture for generating images from a text description. This architecture was already mentioned on Habr after the original article was published in early 2018, and I wondered: how difficult would it be to train such a model on my own?
Description of architecture
For those who are not familiar with AttnGAN and classical GANs, I will briefly describe the idea. A classical GAN consists of at least two neural networks: a generator and a discriminator. The generator's task is to produce data (images, text, audio, video, etc.) that is "similar" to the real data from the dataset. The discriminator's task is to evaluate generated samples, comparing them with real ones and rejecting fakes. Rejected outputs push the generator to produce better results that "fool" the discriminator, which, in turn, learns to recognize forgeries more reliably.
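This adversarial objective can be sketched numerically. Below is a minimal NumPy illustration (not the authors' code) of the two standard losses: the discriminator is rewarded for scoring real samples near 1 and fakes near 0, while the generator is rewarded when the discriminator scores its fakes near 1.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-8):
    """Binary cross-entropy GAN loss: push scores on real data
    toward 1 and scores on generated data toward 0."""
    d_real, d_fake = np.asarray(d_real), np.asarray(d_fake)
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))

def generator_loss(d_fake, eps=1e-8):
    """Non-saturating generator loss: the generator 'wins'
    when the discriminator scores its fakes near 1."""
    d_fake = np.asarray(d_fake)
    return -np.mean(np.log(d_fake + eps))
```

Early in training the discriminator easily spots fakes (low `d_fake` scores), so the generator loss is large; as the generator improves and its fakes score higher, its loss falls, while a discriminator that confidently separates real from fake keeps its own loss small.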
There are many GAN modifications, and the authors of AttnGAN approached the architecture quite ingeniously. The model consists of 9 neural networks, finely tuned to interact with one another. It looks like this:
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks, 2018. Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, Xiaodong He.
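The key ingredient in this architecture is word-level attention: for each image sub-region, the generator builds a context vector as a weighted sum of word embeddings, so different words drive different parts of the picture. Here is a minimal NumPy sketch of that step, with the learned projection matrices from the paper omitted for simplicity, so it is an illustration of the mechanism rather than the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def word_attention(region_feats, word_feats):
    """region_feats: (N, D) hidden features for N image sub-regions.
    word_feats: (T, D) word embeddings, assumed already projected
    into the same feature space. Returns an (N, D) word-context
    vector for each sub-region."""
    scores = region_feats @ word_feats.T   # (N, T): region-word similarity
    attn = softmax(scores, axis=1)         # attention weights over the T words
    return attn @ word_feats               # weighted sum of word vectors
```

With a single word the attention weight is 1 and every region's context vector is just that word's embedding; with a full sentence, each region attends most to the words relevant to it (e.g. "red" for regions of a bird's chest).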