Visual Question Answering using Transformer Architectures: Applying Transformer Models to Improve Performance in VQA Tasks

Vedant Singh

doi:10.47363/JAICC/2022(1)E228

Authors

Vedant Singh, USA

DOI:

https://doi.org/10.47363/JAICC/2022(1)E228

Keywords:

Visual Question Answering, Transformer Models, Vision Transformers, Multimodal AI, Natural Language Processing, Deep Learning, Assistive Technologies, Ethical AI, Human-AI Collaboration, Lightweight Transformers

Abstract

Visual Question Answering (VQA) is an area of computer science that develops systems able to answer a user's natural-language questions about an image. BERT, Vision Transformers (ViTs), and multimodal transformers have significantly improved the ability of VQA systems to model the relationships between visual and textual data. This paper examines these architectures in terms of scalability, dynamic attention mechanisms, and multimodal pre-training, and considers how they address the weaknesses of earlier CNN-RNN hybrid models. Use cases in assistive technologies, healthcare, retail, self-driving vehicles, and the creative industries illustrate how VQA can be deployed in practice, along with its likely positive and negative societal impacts, including concerns about bias, privacy, and inclusion. VQA systems are becoming increasingly important for accessibility, for instance when a visually impaired person asks the system to describe what a picture shows. However, current transformer-based VQA systems still face limitations, notably in computational complexity and reasoning capability. The paper covers the current state of research, open issues, and directions for further development, arguing that more attention should be paid to lightweight models, multi-domain datasets, and the integration of human-generated data with AI. The findings indicate that VQA systems can become a component of context-aware, inquiry-based solutions for advanced applications in various fields.

Author Biography

Vedant Singh, USA


Journal of Artificial Intelligence & Cloud Computing

