TY - JOUR
T1 - Neural Architecture Search With a Lightweight Transformer for Text-to-Image Synthesis
AU - Li, Wei
AU - Wen, Shiping
AU - Shi, Kaibo
AU - Yang, Yin
AU - Huang, Tingwen
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2022
Y1 - 2022
N2 - Although the cross-modal text-to-image synthesis task has achieved great success, most of the latest works in this field are based on network architectures proposed by predecessors, such as StackGAN, AttnGAN, etc. As the quality demanded of text-to-image synthesis keeps rising, these older, tandem architectures built on simple convolution operations are no longer adequate. Therefore, a novel text-to-image synthesis network that incorporates the latest technologies urgently needs to be explored. To tackle this challenge, we propose a unique architecture for text-to-image synthesis, dubbed T2IGAN, which is discovered automatically via neural architecture search (NAS). In addition, given the impressive capabilities of the transformer in natural language processing and computer vision, a lightweight transformer is included in our search space to efficiently integrate text features and image features. Experiments on typical text-to-image synthesis datasets demonstrate the effectiveness of the searched T2IGAN. Specifically, we achieve an IS of 5.12 and FID of 10.48 on CUB-200 Birds, an IS of 4.89 and FID of 13.55 on Oxford-102 Flowers, and an IS of 31.93 and FID of 26.45 on COCO. Compared with state-of-the-art works, our method achieves better performance on CUB-200 Birds and Oxford-102 Flowers.
AB - Although the cross-modal text-to-image synthesis task has achieved great success, most of the latest works in this field are based on network architectures proposed by predecessors, such as StackGAN, AttnGAN, etc. As the quality demanded of text-to-image synthesis keeps rising, these older, tandem architectures built on simple convolution operations are no longer adequate. Therefore, a novel text-to-image synthesis network that incorporates the latest technologies urgently needs to be explored. To tackle this challenge, we propose a unique architecture for text-to-image synthesis, dubbed T2IGAN, which is discovered automatically via neural architecture search (NAS). In addition, given the impressive capabilities of the transformer in natural language processing and computer vision, a lightweight transformer is included in our search space to efficiently integrate text features and image features. Experiments on typical text-to-image synthesis datasets demonstrate the effectiveness of the searched T2IGAN. Specifically, we achieve an IS of 5.12 and FID of 10.48 on CUB-200 Birds, an IS of 4.89 and FID of 13.55 on Oxford-102 Flowers, and an IS of 31.93 and FID of 26.45 on COCO. Compared with state-of-the-art works, our method achieves better performance on CUB-200 Birds and Oxford-102 Flowers.
KW - Generative adversarial network
KW - neural architecture search
KW - text-to-image synthesis
KW - transformer
UR - https://www.scopus.com/pages/publications/85124238949
U2 - 10.1109/TNSE.2022.3147787
DO - 10.1109/TNSE.2022.3147787
M3 - Article
AN - SCOPUS:85124238949
SN - 2327-4697
VL - 9
SP - 1567
EP - 1576
JO - IEEE Transactions on Network Science and Engineering
JF - IEEE Transactions on Network Science and Engineering
IS - 3
ER -