Leveraging Hadoop for High Volume ETL Workflows: A Performance Analysis
DOI: https://doi.org/10.47363/axcda419
Keywords: Hadoop, ETL, MapReduce, Oozie, Spark
Abstract
With the rapid growth of data in today's organizations, there is a need for sustainable and effective ETL solutions. This paper presents a detailed performance evaluation of Hadoop-based tools, namely MapReduce, Oozie, and Spark, on large-volume ETL operations. We measure the performance of these tools in terms of speed, efficiency, and resource usage and availability. Our results indicate that Hadoop-based solutions considerably improve scalability compared with conventional ETL techniques for big data projects. The study therefore offers guidance to organizations undertaking big data analysis on how best to design their data pipelines.
License
Copyright (c) 2023 Journal of Artificial Intelligence & Cloud Computing

This work is licensed under a Creative Commons Attribution 4.0 International License.