Multi Block Transformer for Malayalam Language Modeling

Authors

  • Rohit TP, Department of Computer Science, SOE, Cochin University of Science and Technology, Kochi, India
  • Sasi Gopalan, Department of Mathematics, Cochin University of Science and Technology, Kochi, India
  • Varsha Shaheen, Department of Computer Science, SOE, Cochin University of Science and Technology, Kochi, India

DOI:

https://doi.org/10.47363/JAICC/2024(3)228

Keywords:

Language Modeling, Transformer Architecture, Attention Mechanism, Sequence Modeling and Transduction

Abstract

In this research, we present a novel neural network architecture for natural language generation, specifically designed for Malayalam text. We adapted the Transformer architecture, which is commonly used in language modeling, and extended it to work with non-Latin languages. To evaluate the effectiveness of our model, we trained it on a large corpus of Malayalam text and fine-tuned the hyper-parameters using a grid search. Our model achieved a significant improvement in generating coherent and grammatically correct Malayalam text compared to state-of-the-art models. The model was able to generate text after just 4,000 iterations and effectively generalized the relation between the symbols and alphabet of the language within 8,000 training iterations. The Transformer architecture proved highly efficient for language modeling. Our work highlights the importance of developing new model architectures for text generation in complex and rich languages and opens up new avenues for future research in this area.
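The paper itself does not include code; as a hedged illustration of the attention mechanism listed in the keywords, the following is a minimal NumPy sketch of causal scaled dot-product self-attention, the core operation of a decoder-style Transformer block. All names, shapes, and values here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def causal_self_attention(Q, K, V):
    """Scaled dot-product attention with a causal (left-to-right) mask.

    Q, K, V: arrays of shape (seq_len, d_k). Returns (seq_len, d_k).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise similarity, scaled
    # Causal mask: each position may attend only to itself and earlier tokens.
    mask = np.triu(np.ones_like(scores), k=1).astype(bool)
    scores[mask] = -np.inf
    # Row-wise softmax over the unmasked scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))          # 5 token embeddings, d_model = 8
out = causal_self_attention(x, x, x) # self-attention: Q = K = V = x
print(out.shape)                     # (5, 8)
```

In a full Transformer, Q, K, and V would be separate learned projections of the token embeddings, and this operation would be repeated across multiple heads and blocks; the mask is what makes the model usable for left-to-right text generation.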


Published

2024-03-18

How to Cite

Multi Block Transformer for Malayalam Language Modeling. (2024). Journal of Artificial Intelligence & Cloud Computing, 3(2), 1-4. https://doi.org/10.47363/JAICC/2024(3)228
