Ensuring High Data Quality and Error Resilience in Autonomous Self-Schedulable Libraries for Heterogeneous Data Sources in Near Real-Time Ingestion Pipelines

Authors

  • Venkata Tadi, Senior Data Analyst, Frisco, Texas, USA

DOI:

https://doi.org/10.47363/JEAST/2023(5)259

Keywords:

Data Quality, Error Resilience, Autonomous Libraries, Heterogeneous Data Sources, Real-Time Data Ingestion

Abstract

In the era of big data, enterprises increasingly rely on near-real-time data ingestion pipelines to drive advanced analytics and machine learning models. The complexity and diversity of heterogeneous data sources pose significant challenges to maintaining high data quality and error resilience in these pipelines. This paper investigates strategies to ensure robust data quality and error management within autonomous self-schedulable libraries designed for handling diverse data formats. We explore architectural designs, best practices, and innovative techniques that enable seamless integration and real-time processing of disparate data sources. Key areas of focus include error detection and correction mechanisms, data validation frameworks, and resilient pipeline orchestration. Through comprehensive case studies and experimental evaluations, we demonstrate the efficacy of these strategies in enhancing the reliability and accuracy of data ingestion processes. Our findings provide a roadmap for enterprises seeking to optimize their data pipelines, ensuring they are equipped to handle the complexities of heterogeneous data environments with minimal human intervention.

Author Biography

  • Venkata Tadi, Senior Data Analyst, Frisco, Texas, USA

Published

2023-03-28