Class | Methods | Data sets | Benchmark model | Performance/Remarks |
---|---|---|---|---|
Before publication | SVM, Naive Bayes, Bagging, Decision Trees, Regression [72] | Feedzilla |  | Shows 84% accuracy in predicting the popularity range of a news article. |
Before publication | Random Forests [71] | AD, De Pers, FD, NUjij, Spits, Telegraaf, Trouw, WMR |  | Performs well at identifying which articles will receive at least one comment. |
Cumulative growth | Constant growth [29] | Slashdot |  | Performs well at predicting the number of comments one day after an article's publication (MSE = 36%). |
Cumulative growth | Constant scaling [30] | Digg, YouTube | Constant growth, Log-linear | Outperforms the constant growth and the log-linear models in terms of MRSE. |
Cumulative growth | Log-linear [30] | Digg, YouTube | Constant growth, Constant scaling | Outperforms the constant growth and the constant scaling models in terms of MSE. |
Cumulative growth | Survival analysis [65] | DPreview, MySpace |  | Using information from the first day after publication, it detects with 80% accuracy which threads will receive more than 100 comments. |
Cumulative growth | Logistic regression [61] |  |  | Identifies which messages will not be retweeted (99% accuracy) and which will be retweeted more than 10,000 times (98% accuracy). |
Temporal analysis | Multivariate linear regression [33] | YouTube | Constant scaling | An average improvement of 15% in terms of MRSE compared to the constant scaling model. |
Temporal analysis | Reservoir computing [77] | YouTube | Constant scaling | Minor improvement compared to the constant scaling model. |
Temporal analysis | Time series prediction [32] | YouTube |  | Designed for frequently accessed videos. Performs well at predicting the daily number of views. |
Temporal analysis | kSAIT [63] |  | Regression-based methods | Predicts the number of tweets using information from the first hour after content publication, improving on regression-based methods by up to 10%. |
Popularity evolution patterns | Hierarchical clustering [32] | YouTube |  | Designed for rarely accessed videos. Performs well for short-term predictions, but its errors are significantly larger for long-term predictions. |
Popularity evolution patterns | MRBF [33] | YouTube | Constant scaling, Multivariate linear regression | An average improvement in MRSE of 5% over multivariate linear regression and 21% over the constant scaling model. |
Popularity evolution patterns | Temporal-evolution prediction [34] | YouTube, Vimeo, Digg | Log-linear | Significant improvement compared to the log-linear method. The model can be used to predict the temporal evolution of popularity. |
Individual behavior | Social dynamics [81] | Digg | Log-linear | Incorporates information about the website's design. Shows 95% accuracy in identifying which articles will reach Digg’s front page. |
Individual behavior | Conformer Maverick [67] | JokeBox | Collaborative filtering solutions | Suited to platforms that rank content based on user votes. Outperforms collaborative filtering solutions. |
Individual behavior | Bayesian networks [64] |  |  | MRE of 40% when predicting the total number of tweets from the information received in the first five minutes after publication. |
Cross-domain | Linear regression [36] | IMDb, Twitter, YouTube |  | Designed to predict movie ratings from social media signals. The best performance was achieved using textual features from Twitter and the fraction of likes over dislikes from YouTube. |
Cross-domain | Linear regression [14] | Al Jazeera |  | A model based on social media reactions in the first ten minutes performs as well as one based on the number of views received in the first three hours. |
Cross-domain | Social transfer [35] | YouTube, Twitter | SVM basic | Shows a 70% accuracy in identifying which videos will receive sudden bursts of popularity (60% improvement over a model that uses only the information available on YouTube). |
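Several rows compare the constant scaling and log-linear models [30] in terms of MRSE. The Python sketch below is a minimal illustration, assuming each item is described only by its popularity at an early indicator time and at a later reference time; the function names are hypothetical, and the log-linear prediction deliberately omits any log-normal bias correction. Constant scaling picks the factor that minimises the relative squared error directly, whereas the log-linear model fits an additive offset between the logarithms of the two counts.

```python
import math
from typing import List, Sequence


def mrse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean relative squared error: mean of ((pred - actual) / actual) ** 2."""
    return sum(((p - a) / a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def fit_constant_scaling(early: Sequence[float], late: Sequence[float]) -> float:
    """Fit alpha in late ~ alpha * early by minimising the relative squared error.

    With x = early / late, setting the derivative of sum((alpha * x - 1) ** 2)
    to zero gives alpha = sum(x) / sum(x ** 2).
    """
    ratios = [e / l for e, l in zip(early, late)]
    return sum(ratios) / sum(r * r for r in ratios)


def fit_log_linear(early: Sequence[float], late: Sequence[float]) -> float:
    """Fit beta in ln(late) = ln(early) + beta + noise by least squares on logs."""
    return sum(math.log(l) - math.log(e) for e, l in zip(early, late)) / len(early)


def predict_constant_scaling(alpha: float, early: Sequence[float]) -> List[float]:
    return [alpha * e for e in early]


def predict_log_linear(beta: float, early: Sequence[float]) -> List[float]:
    # Assumption: plain point estimate exp(beta) * early, no bias correction.
    return [math.exp(beta) * e for e in early]
```

Because the log-linear prediction is itself a scaling of the early count (by `exp(beta)`), the constant scaling fit can never have a higher MRSE on the data it was fitted to; this says nothing about the MSE comparison, which depends on the data.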
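The multivariate linear regression model [33] can be read as a generalisation of constant scaling in which the views received in each early time interval get their own weight. A minimal pure-Python sketch, assuming the model predicts the later count as a dot product between a weight vector and the vector of per-interval view counts, with the weights learned by minimising the relative squared error; `fit_mlr`, `predict_mlr` and `solve` are hypothetical names, and a real implementation would more likely rely on a numerical library.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_mlr(windows: Sequence[Sequence[float]], targets: Sequence[float]) -> List[float]:
    """Fit theta so that theta . x_c approximates N_c, minimising relative squared error.

    Dividing each feature vector by its target turns sum((theta . x_c / N_c - 1) ** 2)
    into ordinary least squares with features z_c = x_c / N_c and a target of 1,
    which is solved here through the normal equations (Z^T Z) theta = Z^T 1.
    """
    z = [[v / n for v in x] for x, n in zip(windows, targets)]
    d = len(z[0])
    ztz = [[sum(row[i] * row[j] for row in z) for j in range(d)] for i in range(d)]
    zt1 = [sum(row[i] for row in z) for i in range(d)]
    return solve(ztz, zt1)


def predict_mlr(theta: Sequence[float], windows: Sequence[Sequence[float]]) -> List[float]:
    """Predict each later count as the dot product theta . x."""
    return [sum(t * v for t, v in zip(theta, x)) for x in windows]
```

With a single feature holding the cumulative early count, `fit_mlr` returns exactly the constant scaling factor sum(x) / sum(x ** 2), which is why constant scaling serves as the natural baseline for this model.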