Table 1 Summary of the popularity prediction methods presented in the survey

From: A survey on predicting the popularity of web content

| Class | Methods | Data sets | Benchmark model | Performance/Remarks |
| --- | --- | --- | --- | --- |
| Before publication | SVM, Naive Bayes, Bagging, Decision Trees, Regression [72] | Feedzilla | | Shows an accuracy of 84% in predicting the popularity range of a news article. |
| Before publication | Random Forests [71] | AD, De Pers, FD, NUjij, Spits, Telegraaf, Trouw, WMR | | Good performance in identifying which articles will receive at least one comment. |
| Cumulative growth | Constant growth [29] | Slashdot | | Good performance in predicting the number of comments one day after the publication of an article (MSE = 36%). |
| Cumulative growth | Constant scaling [30] | Digg, YouTube | Constant growth, Log-linear | Outperforms the constant growth and the log-linear models in terms of MRSE. |
| Cumulative growth | Log-linear [30] | Digg, YouTube | Constant growth, Constant scaling | Outperforms the constant growth and the constant scaling models in terms of MSE. |
| Cumulative growth | Survival analysis [65] | DPreview, MySpace | | Using the information received in the first day after publication, it can detect with 80% accuracy which threads will receive more than 100 comments. |
| Cumulative growth | Logistic regression [61] | Twitter | | The model can successfully identify which messages will not be retweeted (99% accuracy) and which will be retweeted more than 10,000 times (98% accuracy). |
| Temporal analysis | Multivariate linear regression [33] | YouTube | Constant scaling | An average improvement of 15% in terms of MRSE compared to the constant scaling model. |
| Temporal analysis | Reservoir computing [77] | YouTube | Constant scaling | Minor improvement compared to the constant scaling model. |
| Temporal analysis | Time series prediction [32] | YouTube | | Designed for frequently-accessed videos. Good performance in predicting the daily number of views. |
| Temporal analysis | kSAIT [63] | Twitter | Regression-based methods | Predicts the number of tweets using information from the first hour after content publication. An improvement of up to 10% compared to regression-based methods. |
| Popularity evolution patterns | Hierarchical clustering [32] | YouTube | | Designed for rarely-accessed videos. The model performs well for short-term predictions but yields significantly larger errors for long-term predictions. |
| Popularity evolution patterns | MRBF [33] | YouTube | Constant scaling, Multivariate linear regression | An average improvement of 5% in terms of MRSE compared to multivariate linear regression and 21% compared to the constant scaling model. |
| Popularity evolution patterns | Temporal-evolution prediction [34] | YouTube, Vimeo, Digg | Log-linear | Significant improvement compared to the log-linear method. The model can be used to predict the temporal evolution of popularity. |
| Individual behavior | Social dynamics [81] | Digg | Log-linear | Incorporates information about the design of the web site. Shows an accuracy of 95% in identifying which articles will get on Digg's front page. |
| Individual behavior | Conformer Maverick [67] | JokeBox | Collaborative filtering solutions | Adequate for platforms that rank content based on user votes. Better performance than collaborative filtering solutions. |
| Individual behavior | Bayesian networks [64] | Twitter | | MRE of 40% when predicting the total number of tweets using the information received in the first five minutes after publication. |
| Cross-domain | Linear regression [36] | IMDb, Twitter, YouTube | | Designed to predict movie ratings using social media signals. The best performance was achieved when using textual features from Twitter and the fraction of likes over dislikes from YouTube. |
| Cross-domain | Linear regression [14] | Al Jazeera | | Results show that a model based on social media reactions in the first ten minutes has the same performance as one based on the number of views received in the first three hours. |
| Cross-domain | Social transfer [35] | YouTube, Twitter | SVM basic | Shows a 70% accuracy in identifying which videos will receive sudden bursts of popularity (60% improvement over a model that uses only the information available on YouTube). |
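
Several rows above use the constant scaling and log-linear models as benchmarks and compare them in terms of MRSE. The table does not define them, so the following is only a minimal sketch under stated assumptions: constant scaling is taken to predict final popularity as a fitted multiple of early popularity, log-linear is taken to fit an intercept on the log scale with the slope fixed at 1, and MRSE is taken to be the mean relative squared error. The function names and the toy counts are hypothetical and do not come from any data set listed in the table.

```python
import math


def fit_constant_scaling(early, final):
    """Return alpha minimising sum(((alpha * e - f) / f) ** 2) over the training pairs."""
    ratios = [e / f for e, f in zip(early, final)]
    return sum(ratios) / sum(r * r for r in ratios)


def fit_log_linear(early, final):
    """Return intercept b of ln(final) = ln(early) + b (slope fixed at 1, an assumption)."""
    diffs = [math.log(f) - math.log(e) for e, f in zip(early, final)]
    return sum(diffs) / len(diffs)


def mrse(predicted, actual):
    """Mean relative squared error (assumed definition of MRSE)."""
    return sum(((p - a) / a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    # Made-up early and final popularity counts, for illustration only.
    early = [10, 20, 40, 80]
    final = [30, 50, 130, 200]

    alpha = fit_constant_scaling(early, final)
    b = fit_log_linear(early, final)

    cs_pred = [alpha * e for e in early]
    ll_pred = [math.exp(b) * e for e in early]

    print("constant scaling MRSE:", mrse(cs_pred, final))
    print("log-linear MRSE:", mrse(ll_pred, final))
```

Under these assumptions both models multiply the early count by a single factor, and constant scaling picks the factor that minimises MRSE on the training pairs, so it cannot do worse than the log-linear fit on that metric for the same training data. Held-out results can differ.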