Architecting AI Workflows with Apache Spark
DOI: https://doi.org/10.47363/JAICC/TechFusion2025/2025(4)5

Keywords: Architecting AI, Apache Spark

Abstract
This session traces the evolution of big data systems, from Hadoop's batch-driven model to modern distributed architectures, and explores how AI-driven approaches can enhance and optimize Apache Spark. We'll break down Spark's internal design, including the roles of the Driver, DAG Scheduler, Task Scheduler, and Executors, to show how large-scale workloads are processed efficiently across clusters. Using real-world examples such as cost aggregation pipelines, the talk highlights how Spark overcomes Hadoop's limitations while still facing challenges around configuration complexity, data skew, and resource management. Finally, we'll discuss how reinforcement learning can be applied to Spark to enable dynamic scheduling, smarter partitioning, and adaptive resource allocation, transforming Spark into a self-optimizing data processing engine.
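To make the cost-aggregation example concrete, here is a minimal sketch of the pattern in plain Python. The team names and cost values are hypothetical; the structure mirrors how Spark splits a `groupBy("team").sum("cost")` into a map-side combine per partition followed by a reduce after the shuffle.

```python
from collections import defaultdict

# Hypothetical cost records as (team, cost) pairs, split across
# "partitions" to mimic how Spark distributes data over executors.
partitions = [
    [("ads", 10.0), ("search", 4.0)],
    [("ads", 2.5), ("infra", 7.0)],
    [("search", 1.5), ("infra", 3.0)],
]

def map_side_combine(partition):
    """Per-partition partial sums, analogous to Spark's map-side
    combine before the shuffle stage."""
    acc = defaultdict(float)
    for team, cost in partition:
        acc[team] += cost
    return dict(acc)

def reduce_partials(partials):
    """Merge the partial sums, analogous to the reduce stage that
    runs after shuffled data is grouped by key."""
    totals = defaultdict(float)
    for partial in partials:
        for team, subtotal in partial.items():
            totals[team] += subtotal
    return dict(totals)

totals = reduce_partials(map_side_combine(p) for p in partitions)
print(totals)  # {'ads': 12.5, 'search': 5.5, 'infra': 10.0}
```

The map-side combine is also where data skew bites: if one key (say, one team) dominates the records, its partition's partial sums and the subsequent shuffle become the bottleneck, which is one of the Spark pain points the session discusses.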
License
Copyright (c) 2025 Journal of Artificial Intelligence & Cloud Computing

This work is licensed under a Creative Commons Attribution 4.0 International License.