Analyzing Job Aware Scheduling Algorithm in Hadoop for Heterogeneous Cluster
Author(s) : Mayuri A Mehta, Supriya Pati
Volume & Issue : VOLUME 2 / 2015 , ISSUE 2
Page(s) : 51-57
ISSN (Online): 2394-3858
ISSN (Print) : 2394-3866
A scheduling algorithm is required to manage cluster resources efficiently in a Hadoop cluster, thereby increasing resource utilization and reducing response time. The job aware scheduling algorithm schedules non-local map tasks of jobs based on job execution time, earliest deadline first, or the workload of the job. In this paper, we present a performance evaluation of the job aware scheduling algorithm using the MapReduce WordCount benchmark. The experimental results are compared with those of the matchmaking scheduling algorithm. The results show that the job aware scheduling algorithm considerably reduces average waiting time and memory wastage compared with the matchmaking algorithm.
Scheduling algorithm, heterogeneous cluster, Hadoop, MapReduce
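The abstract names three criteria for choosing which job receives a non-local map task: job execution time, earliest deadline first, and job workload. The following is a minimal illustrative sketch of that selection step, not the authors' implementation. The `Job` fields, the `POLICIES` table, and the function `pick_job_for_non_local_task` are hypothetical names introduced here, and the choice to prefer the smallest execution time and the smallest workload is an assumption, since the abstract does not state the ordering direction for those two criteria.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Job:
    """Hypothetical job record; field names are illustrative assumptions."""
    name: str
    est_exec_time: float  # estimated job execution time, in seconds
    deadline: float       # absolute deadline, in seconds
    workload: int         # number of pending map tasks


# One sort key per criterion named in the abstract.
# Assumption: the smallest key wins for every policy.
POLICIES: Dict[str, Callable[[Job], float]] = {
    "execution_time": lambda job: job.est_exec_time,
    "deadline": lambda job: job.deadline,  # earliest deadline first
    "workload": lambda job: job.workload,
}


def pick_job_for_non_local_task(jobs: List[Job], policy: str) -> Optional[Job]:
    """Return the job that should receive the next non-local map task."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    # Only jobs that still have pending map tasks are candidates.
    candidates = [job for job in jobs if job.workload > 0]
    if not candidates:
        return None
    return min(candidates, key=POLICIES[policy])


if __name__ == "__main__":
    queue = [
        Job("A", est_exec_time=120.0, deadline=900.0, workload=2),
        Job("B", est_exec_time=45.0, deadline=1200.0, workload=5),
        Job("C", est_exec_time=300.0, deadline=600.0, workload=9),
    ]
    for policy in POLICIES:
        chosen = pick_job_for_non_local_task(queue, policy)
        print(policy, "->", chosen.name if chosen else None)
```

Keeping the criteria in a lookup table makes the three policies interchangeable at the point where a free slot has no local data, which is the only decision this sketch models.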