TY - STD
TI - Kamburugamuve S, et al. Twister2: Design of a big data toolkit. Concurr Comput: Pract Experience. 2019; 31(14). https://doi.org/10.1002/cpe.5189.
ID - ref1
ER -

TY - STD
TI - Fox G, et al. Big data, simulations and HPC convergence. In: Big Data Benchmarking: 6th International Workshop, WBDB 2015, Toronto, ON, Canada, June 16-17, 2015 and 7th International Workshop, WBDB 2015, New Delhi, India, December 14-15, 2015, Revised Selected Papers. Cham, Switzerland: Springer: 2016. p. 3–17. https://doi.org/10.1007/978-3-319-49748-8_1.
ID - ref2
ER -

TY - STD
TI - Tejedor E, et al. PyCOMPSs: Parallel computational workflows in Python. Int J High Perform Comput Appl. 2017; 31(1):66–82. https://doi.org/10.1177/1094342015594678.
ID - ref3
ER -

TY - STD
TI - Asch M, et al. Big data and extreme-scale computing: Pathways to convergence-toward a shaping strategy for a future software and data ecosystem for scientific inquiry. Int J High Perform Comput Appl. 2018; 32(4):435–79. https://doi.org/10.1177/1094342018778123.
ID - ref4
ER -

TY - STD
TI - Lezzi D, et al. Enabling e-Science applications on the cloud with COMPSs. In: Parallel Processing Workshops at European Conference on Parallel Processing (Euro-Par 2011). Berlin: Springer: 2011. p. 25–34. https://doi.org/10.1007/978-3-642-29737-3_4.
ID - ref5
ER -

TY - CHAP
AU - Lordan, F.
AU - Ejarque, J.
AU - Sirvent, R.
AU - Badia, R. M.
PY - 2016
DA - 2016//
TI - Energy-aware programming model for distributed infrastructures
BT - 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP 2016)
PB - IEEE Computer Society
CY - Washington
ID - Lordan2016
ER -

TY - STD
TI - Zaharia M, et al. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’12). Berkeley: USENIX Association: 2012. p. 15–28. https://dl.acm.org/citation.cfm?id=2228301.
UR - https://dl.acm.org/citation.cfm?id=2228301
ID - ref7
ER -

TY - STD
TI - Santos W, et al. Lemonade: A scalable and efficient Spark-based platform for Data Analytics. In: 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID). Piscataway: IEEE Press: 2017. p. 745–8. https://doi.org/10.1109/CCGRID.2017.142.
ID - ref8
ER -

TY - STD
TI - Marozzo F, et al. Enabling cloud interoperability with COMPSs. In: Parallel Processing Workshops at European Conference on Parallel Processing (Euro-Par 2012). Berlin: Springer: 2012. p. 16–27. https://doi.org/10.1007/978-3-642-32820-6_4.
ID - ref9
ER -

TY - STD
TI - Ramon-Cortes C, et al. Transparent orchestration of task-based parallel applications in containers platforms. J Grid Comput. 2018; 16(1):137–60. https://doi.org/10.1007/s10723-017-9425-z.
ID - ref10
ER -

TY - STD
TI - Apache Cassandra. http://cassandra.apache.org/. Accessed 4 July 2019.
UR - http://cassandra.apache.org/
ID - ref11
ER -

TY - JOUR
AU - Shepler, S.
AU - Eisler, M.
AU - Noveck, D.
PY - 2010
DA - 2010//
TI - Network file system (NFS) version 4 minor version 1 protocol
JO - RFC
VL - 5661
ID - Shepler2010
ER -

TY - STD
TI - Li H. Alluxio: A virtual distributed file system. EECS Department, University of California, Berkeley, USA. 2018. http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-29.html.
UR - http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-29.html
ID - ref13
ER -

TY - STD
TI - Amazon Simple Storage Service (S3). https://aws.amazon.com/s3/. Accessed 4 July 2019.
UR - https://aws.amazon.com/s3/
ID - ref14
ER -

TY - STD
TI - Microsoft Azure Storage. https://azure.microsoft.com/services/storage/. Accessed 4 July 2019.
UR - https://azure.microsoft.com/services/storage/
ID - ref15
ER -

TY - CHAP
AU - Schwan, P.
PY - 2003
DA - 2003//
TI - Lustre: Building a file system for 1000-node clusters
BT - Proceedings of the Linux Symposium
PB - Linux symposium
CY - Ottawa
ID - Schwan2003
ER -

TY - STD
TI - OpenStack Storage (Swift).
https://docs.openstack.org/swift/. Accessed 4 July 2019.
UR - https://docs.openstack.org/swift/
ID - ref17
ER -

TY - STD
TI - Weil SA, et al. Ceph: A scalable, high-performance distributed file system. In: Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI ’06). Berkeley: USENIX Association: 2006. p. 307–20. http://dl.acm.org/citation.cfm?id=1298455.1298485.
UR - http://dl.acm.org/citation.cfm?id=1298455.1298485
ID - ref18
ER -

TY - STD
TI - Andersen DG, et al. FAWN: A fast array of wimpy nodes. In: Proceedings of the 22nd ACM SIGOPS Symposium on Operating Systems Principles (SOSP ’09). New York: ACM: 2009. p. 1–14. https://doi.org/10.1145/1629575.1629577.
ID - ref19
ER -

TY - STD
TI - DeCandia G, et al. Dynamo: Amazon’s highly available key-value store. In: Proceedings of the 21st ACM SIGOPS Symposium on Operating Systems Principles (SOSP ’07). New York: ACM: 2007. p. 205–20. https://doi.org/10.1145/1294261.1294281.
ID - ref20
ER -

TY - STD
TI - Memcached: A distributed memory object caching system. http://memcached.org/. Accessed 4 July 2019.
UR - http://memcached.org/
ID - ref21
ER -

TY - STD
TI - Apache HBase. http://hbase.apache.org/. Accessed 4 July 2019.
UR - http://hbase.apache.org/
ID - ref22
ER -

TY - STD
TI - Palankar MR, et al. Amazon S3 for science grids: A viable solution? In: Proceedings of the 2008 International Workshop on Data-aware Distributed Computing (DADC ’08). New York: ACM: 2008. p. 55–64. https://doi.org/10.1145/1383519.1383526.
ID - ref23
ER -

TY - STD
TI - Wickramasinghe P, et al. Twister2: TSet high-performance iterative dataflow. In: International Conference on High Performance Big Data and Intelligent Systems (HPBD&IS 2019). Piscataway: IEEE Press: 2019. p. 55–60. https://doi.org/10.1109/HPBDIS.2019.8735495.
ID - ref24
ER -

TY - JOUR
AU - Goodstadt, L.
PY - 2010
DA - 2010//
TI - Ruffus: a lightweight Python library for computational pipelines
JO - Bioinformatics
VL - 26
UR - https://doi.org/10.1093/bioinformatics/btq524
DO - 10.1093/bioinformatics/btq524
ID - Goodstadt2010
ER -

TY - STD
TI - Gafni E, et al. COSMOS: Python library for massively parallel workflows. Bioinformatics. 2014; 30(20):2956–8. https://doi.org/10.1093/bioinformatics/btu385.
ID - ref26
ER -

TY - STD
TI - Mierswa I, et al. YALE: Rapid prototyping for complex data mining tasks. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM: 2006. p. 935–40. https://doi.org/10.1145/1150402.1150531.
ID - ref27
ER -

TY - STD
TI - Demšar J, et al. Orange: Data mining toolbox in Python. J Mach Learn Res. 2013; 14(1):2349–53.
ID - ref28
ER -

TY - STD
TI - Berthold MR, et al. KNIME - the Konstanz information miner: version 2.0 and beyond. ACM SIGKDD Explor Newsl. 2009; 11(1):26–31. https://doi.org/10.1145/1656274.1656280.
ID - ref29
ER -

TY - CHAP
AU - Dean, J.
AU - Ghemawat, S.
PY - 2004
DA - 2004//
TI - MapReduce: Simplified data processing on large clusters
BT - OSDI’04: Sixth Symposium on Operating System Design and Implementation
PB - USENIX Association
CY - San Francisco
ID - Dean2004
ER -

TY - STD
TI - Kranjc J, et al. ClowdFlows: A cloud based scientific workflow platform. In: Machine Learning and Knowledge Discovery in Databases: European Conference (ECML PKDD 2012). Berlin: Springer: 2012. p. 816–9. https://doi.org/10.1007/978-3-642-33486-3_5.
ID - ref31
ER -

TY - JOUR
AU - Podpečan, V.
AU - Zemenova, M.
AU - Lavrač, N.
PY - 2012
DA - 2012//
TI - Orange4WS environment for service-oriented data mining
JO - Comput J
VL - 55
UR - https://doi.org/10.1093/comjnl/bxr077
DO - 10.1093/comjnl/bxr077
ID - Podpečan2012
ER -

TY - STD
TI - Microsoft Azure Machine Learning. https://azure.microsoft.com/services/machine-learning-studio/. Accessed 4 July 2019.
UR - https://azure.microsoft.com/services/machine-learning-studio/
ID - ref33
ER -

TY - STD
TI - Conejero J, et al. Task-based programming in COMPSs to converge from HPC to big data. Int J High Perform Comput Appl. 2018; 32(1):45–60. https://doi.org/10.1177/1094342017701278.
ID - ref34
ER -

TY - BOOK
AU - White, T.
PY - 2015
DA - 2015//
TI - Hadoop: The Definitive Guide
PB - O’Reilly Media, Inc.
CY - Sebastopol
ID - White2015
ER -

TY - STD
TI - Gonzales SD. PyWebHDFS: a Python wrapper for the Hadoop WebHDFS REST API. 2016. https://pypi.python.org/pypi/pywebhdfs/. Accessed 4 July 2019.
UR - https://pypi.python.org/pypi/pywebhdfs/
ID - ref36
ER -

TY - STD
TI - Luckow A. WebHDFS: HDFS Python client based on WebHDFS REST API. 2014. https://pypi.org/project/WebHDFS/. Accessed 4 July 2019.
UR - https://pypi.org/project/WebHDFS/
ID - ref37
ER -

TY - STD
TI - Kalika M. Python WebHDFS. 2019. https://github.com/mk23/webhdfs. Accessed 4 July 2019.
UR - https://github.com/mk23/webhdfs
ID - ref38
ER -

TY - STD
TI - Rosen J. PySpark Internals. 2016. https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals/. Accessed 4 July 2019.
UR - https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals/
ID - ref39
ER -

TY - CHAP
AU - Leo, S.
AU - Zanetti, G.
PY - 2010
DA - 2010//
TI - Pydoop: a Python MapReduce and HDFS API for Hadoop
BT - Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
PB - ACM
CY - New York
ID - Leo2010
ER -

TY - STD
TI - Apache Arrow Developers. Pyarrow: Python library for Apache Arrow. 2016. https://pypi.org/project/pyarrow/. Accessed 4 July 2019.
UR - https://pypi.org/project/pyarrow/
ID - ref41
ER -

TY - STD
TI - Chang L, et al. HAWQ: A massively parallel processing SQL engine in Hadoop. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD ’14). New York: ACM: 2014. p. 1223–34. https://doi.org/10.1145/2588555.2595636.
ID - ref42
ER -

TY - CHAP
AU - McKinney, W.
PY - 2011
DA - 2011//
TI - Pandas: a foundational Python library for data analysis and statistics
BT - Workshop on Python for High Performance and Scientific Computing Collocated with the 24th International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’11)
PB - ACM
CY - New York
ID - McKinney2011
ER -

TY - BOOK
AU - Jain, R.
PY - 1991
DA - 1991//
TI - The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling. Wiley Computer Publishing
PB - Wiley
CY - New York
ID - Jain1991
ER -