Table 1 Operations and Algorithms implemented in COMPSs available in Lemonade

From: Upgrading a high performance computing environment for massive data processing

| Category | Operations and algorithms |
| --- | --- |
| Data | Read and write files, attributes changer, data balancer |
| ETL | Add columns, aggregation, clean missing data, difference, distinct (remove duplicate rows), drop columns, filter, intersection, joins (inner, left and right join), replace values, sample, select columns (projection), sort, split, transformation, union |
| Geographic | Read shapefile, Geo within (check if a point is within a region), ST-DBSCAN |
| Graph | PageRank |
| Metrics | Classification (accuracy, precision/recall and F-measure), regression (MSE, RMSE, MAE, R²) |
| ML | Feature assembler, scalers (min-max, max-abs and standard), String Indexer, PCA, K-Means, DBSCAN, KNN, Naive Bayes, SVM, logistic regression, linear regression, Apriori, load/save model |
| Text | Vectorization by Bag-of-Words and TF-IDF, tokenizer, stop-words remover |
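
As an illustration of the regression metrics named in the Metrics row (MSE, RMSE, MAE and R²), the following is a minimal single-machine sketch in plain Python using their standard textbook definitions. It is not the COMPSs or Lemonade implementation, which the table does not describe; the function name `regression_metrics` and its dictionary return format are assumptions made here for illustration only.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE and R^2 from two equal-length sequences.

    Hypothetical helper for illustration; not part of COMPSs or Lemonade.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean squared error and its square root.
    ss_res = sum(e * e for e in errors)
    mse = ss_res / n
    rmse = math.sqrt(mse)

    # Mean absolute error.
    mae = sum(abs(e) for e in errors) / n

    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 is undefined when the true values are all identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot else float("nan")

    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2}


if __name__ == "__main__":
    metrics = regression_metrics([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

In a distributed setting, the two sums of squares, the sum of absolute errors and the count could each be accumulated per data partition and then combined, since all four metrics depend only on those totals (plus the mean of the true values for R²).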