Special Issue on the Future of Middleware (FOME'11)
On Cloud computational models and the heterogeneity challenge
Journal of Internet Services and Applications volume 3, pages 77–86 (2012)
Abstract
Cloud computing is by far the most cost-effective technology for hosting Internet-scale services and applications. The MapReduce model, in particular, is now widely used in Cloud infrastructures to meet the demands of large-scale data- and computation-intensive applications. Despite its success, the implications of MapReduce for the management of Cloud workloads and cluster resources remain largely unstudied. In this article, we show that dealing with the heterogeneity of workloads and machine capabilities is a key challenge. In today’s Cloud environments, workloads vary in size, length, resource requirements, and arrival rate. Machines likewise vary in CPU, memory, I/O speed, and network bandwidth capacity. Together, these variations pose difficult challenges in areas such as job scheduling, task and data placement, resource sharing, and resource allocation. We analyze the heterogeneity challenge in these specific problem domains and survey representative state-of-the-art works that try to address them. We find that although advances have been made that partially address some of the outlined challenges, many more open challenges remain to be explored, and the topic at large is ripe for scientific contributions.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Boutaba, R., Cheng, L. & Zhang, Q. On Cloud computational models and the heterogeneity challenge. J Internet Serv Appl 3, 77–86 (2012). https://doi.org/10.1007/s13174-011-0054-7
DOI: https://doi.org/10.1007/s13174-011-0054-7
Keywords
- Cloud computing
- MapReduce
- Heterogeneity
- Scheduling
- Resource allocation