Table 1 Operations and Algorithms implemented in COMPSs available in Lemonade

From: Upgrading a high performance computing environment for massive data processing

| Categories | Operations and algorithms |
| --- | --- |
| Data | Read and write files, attributes changer, data balancer |
| ETL | Add columns, aggregation, clean missing data, difference, distinct (remove duplicate rows), drop columns, filter, intersection, joins (inner, left and right join), replace values, sample, select columns (projection), sort, split, transformation, union |
| Geographic | Read shapefile, Geo within (check if a point is within a region), ST-DBSCAN |
| Graph | PageRank |
| Metrics | Classification (accuracy, precision/recall and F-measure), regression (MSE, RMSE, MAE, R²) |
| ML | Feature assembler, scalers (min-max, max-abs and standard), string indexer, PCA, K-Means, DBSCAN, KNN, Naive Bayes, SVM, logistic regression, linear regression, Apriori, load/save model |
| Text | Vectorization by Bag-of-Words and TF-IDF, tokenizer, stop-words remover |
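
As a point of reference for the regression metrics named in the Metrics row (MSE, RMSE, MAE, R2), the following is a minimal single-machine sketch in plain Python using the standard textbook definitions. It is not the COMPSs or Lemonade implementation, which the table does not describe; the function name `regression_metrics` and its dictionary return format are assumptions made for illustration only.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R2 for two equal-length sequences.

    Hypothetical helper for illustration; not part of COMPSs or Lemonade.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean squared error and mean absolute error over all samples.
    ss_res = sum(e * e for e in errors)
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n

    # R2 compares the residual sum of squares with the total sum of
    # squares around the mean of the true values.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


if __name__ == "__main__":
    # Small usage example with made-up values.
    print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0]))
```

In a distributed setting, the sums above would presumably be computed per data partition and then combined, but how the listed operations do this is not stated in the table.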