Multi Block Transformer for Malayalam Language Modeling

Authors

  • Rohit TP, Department of Computer Science, SOE, Cochin University of Science and Technology, Kochi, India
  • Sasi Gopalan, Department of Mathematics, Cochin University of Science and Technology, Kochi, India
  • Varsha Shaheen, Department of Computer Science, SOE, Cochin University of Science and Technology, Kochi, India

DOI:

https://doi.org/10.47363/JAICC/2024(3)228

Keywords:

Language Modeling, Transformer Architecture, Attention Mechanism, Sequence Modeling and Transduction

Abstract

In this research, we present a novel neural network architecture for natural language generation, specifically designed for Malayalam text. We adapted the Transformer architecture, which is commonly used in language modeling, and extended it to work with non-Latin languages. To evaluate the effectiveness of our model, we trained it on a large corpus of Malayalam text and fine-tuned the hyper-parameters using a grid search. Our model achieved a significant improvement in generating coherent and grammatically correct Malayalam text compared to state-of-the-art models. The model was able to generate text after just 4,000 iterations and effectively generalized the relation between the symbols and alphabet of the language within 8,000 training iterations. The Transformer architecture proved highly efficient for language modeling. Our work highlights the importance of developing new model architectures for text generation in complex and rich languages and opens up new avenues for future research in this area.
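The paper itself does not include code; as a hedged illustration of the attention mechanism listed in the keywords, the following is a minimal NumPy sketch of causal scaled dot-product self-attention, the core operation of a decoder-style Transformer block. All names, shapes, and values here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def causal_self_attention(Q, K, V):
    """Scaled dot-product attention with a causal (left-to-right) mask.

    Q, K, V: arrays of shape (seq_len, d_k). Returns (seq_len, d_k).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise similarity, scaled
    # Causal mask: each position may attend only to itself and earlier tokens.
    mask = np.triu(np.ones_like(scores), k=1).astype(bool)
    scores[mask] = -np.inf
    # Row-wise softmax over the unmasked scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))          # 5 token embeddings, d_model = 8
out = causal_self_attention(x, x, x) # self-attention: Q = K = V = x
print(out.shape)                     # (5, 8)
```

In a full Transformer, Q, K, and V would be separate learned projections of the token embeddings, and this operation would be repeated across multiple heads and blocks; the mask is what makes the model usable for left-to-right text generation.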


Published

2024-03-18

How to Cite

Multi Block Transformer for Malayalam Language Modeling. (2024). Journal of Artificial Intelligence & Cloud Computing, 3(2), 1-4. https://doi.org/10.47363/JAICC/2024(3)228
