Leveraging Hadoop for High Volume ETL Workflows: A Performance Analysis

Authors

  • Santosh Kumar Singu, Senior Solution Specialist, Deloitte Consulting LLP, 338 Autumn Sage Dr, Indian Trail, NC, USA

DOI:

https://doi.org/10.47363/axcda419

Keywords:

Hadoop, ETL, MapReduce, Oozie, Spark

Abstract

With the rapid growth of data in today's organizations, there is a need for sustainable and effective ETL solutions. This paper presents a detailed performance evaluation of Hadoop-based tools, namely MapReduce, Oozie, and Spark, on large-volume ETL operations. We measure the performance of these tools in terms of speed, efficiency, and resource usage and availability. Our results indicate that Hadoop-based solutions considerably improve scalability over conventional ETL techniques for big data projects. The study offers organizations that analyze big data insights into how best to design their data pipelines.

Author Biography

  • Santosh Kumar Singu, Senior Solution Specialist, Deloitte Consulting LLP, 338 Autumn Sage Dr, Indian Trail, NC, USA

Published

2023-10-23

How to Cite

Leveraging Hadoop for High Volume ETL Workflows: A Performance Analysis. (2023). Journal of Artificial Intelligence & Cloud Computing, 2(4), 1-4. https://doi.org/10.47363/axcda419
