Developing Resilient Cloud Systems through AI-Augmented Site Reliability Engineering

Authors

  • Ayisha Tabasumm, Senior IEEE, IEEE member, USA
  • Shaik Abdul Kareem, Independent Researcher, USA

DOI:

https://doi.org/10.47363/JEAST/2020(2)E124

Keywords:

AI-Augmented SRE, Cloud Resilience, Predictive Maintenance, Automated Incident Response, Machine Learning, Cloud Computing, Site Reliability Engineering, AIOps, Cloud Infrastructure, Continuous Learning

Abstract

As cloud infrastructures become more complex and critical to business operations, ensuring their resilience and reliability is paramount. Traditional Site Reliability Engineering (SRE) practices, while effective, struggle to cope with the scale and complexity of modern cloud environments. This paper explores the integration of Artificial Intelligence (AI) into SRE practices to develop more resilient cloud systems. By leveraging AI to augment decision-making, automate responses, and predict potential issues, organizations can enhance the reliability of their cloud services. This research presents novel frameworks and methodologies, provides real-world case studies, and offers empirical evidence of the improvements achieved through AI-augmented SRE.

Author Biographies

  • Ayisha Tabasumm, Senior IEEE, IEEE member, USA

  • Shaik Abdul Kareem, Independent Researcher, USA

Published

2020-10-18