Department of CSE (Artificial Intelligence and Machine Learning), ACE Engineering College, India.
International Journal of Science and Research Archive, 2025, 14(01), 720-725
Article DOI: 10.30574/ijsra.2025.14.1.0137
Received on 04 December 2024; revised on 13 January 2025; accepted on 15 January 2025
Generating images from text is a difficult task that combines natural language processing and computer vision. Currently available generative adversarial network (GAN)-based models usually employ text encoders pre-trained on image-text pairs. However, these encoders frequently fall short in capturing the semantic complexity of text unseen during pre-training, which makes it challenging to produce images that accurately correspond with the supplied descriptions. To address this problem, we present a novel text-to-image generation model based on BERT, a highly successful pre-trained language model in natural language processing. Fine-tuning BERT on a large text corpus enables it to encode rich textual information, improving its suitability for image generation tasks. Experimental results on the CUB-200-2011 dataset show that our approach outperforms baseline models in both qualitative and quantitative measures.
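The conditioning mechanism the abstract describes, a BERT sentence embedding steering a GAN generator, can be sketched at toy scale. The dimensions (768 as in BERT-base, a 100-dimensional noise vector) and the single linear layer here are illustrative assumptions, not the paper's architecture; a real system would use the fine-tuned BERT encoder and a deep convolutional generator.

```python
import numpy as np

# Toy sketch of a text-conditioned generator (assumptions: 768-dim
# embedding as from BERT-base, 100-dim noise, an 8x8 single-channel
# "image"; the paper's model is a deep GAN, not one linear layer).
rng = np.random.default_rng(0)
TEXT_DIM, NOISE_DIM, IMG_SIDE = 768, 100, 8

# Stand-in for a fine-tuned BERT embedding of the input caption.
text_emb = rng.standard_normal(TEXT_DIM)

# One linear projection from [noise; text] to pixel space, tanh-squashed
# so outputs lie in [-1, 1] as GAN generators conventionally produce.
W = rng.standard_normal((IMG_SIDE * IMG_SIDE, NOISE_DIM + TEXT_DIM)) * 0.01

def generate(noise, emb):
    """Concatenate noise and text conditioning, project to an image."""
    x = np.concatenate([noise, emb])
    return np.tanh(W @ x).reshape(IMG_SIDE, IMG_SIDE)

img = generate(rng.standard_normal(NOISE_DIM), text_emb)
print(img.shape)  # (8, 8)
```

The key design point is that the caption embedding is concatenated with the noise vector before generation, so the same noise produces different images under different captions.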
Keywords: Text-to-image generation; Multimodal data; BERT; GAN; High quality
Kavitha Soppari, Bhanu Vangapally, Syed Sameer Sohail and Harish Dubba. Text to image generation using BERT and GAN. International Journal of Science and Research Archive, 2025, 14(01), 720-725. Article DOI: https://doi.org/10.30574/ijsra.2025.14.1.0137.
Copyright © 2025. The author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0.