A survey on predicting the popularity of web content

Tatar, Alexandru; de Amorim, Marcelo Dias; Fdida, Serge; Antoniadis, Panayotis

doi:10.1186/s13174-014-0008-y

Table 1 Summary of the popularity prediction methods presented in the survey

From: A survey on predicting the popularity of web content

Class	Methods	Data sets	Benchmark model	Performance/Remarks
Before publication	SVM, Naive Bayes, Bagging, Decision Trees, Regression [72]	Feedzilla		Shows an accuracy of 84% in predicting the popularity range of a news article.
Before publication	Random Forests [71]	AD, De Pers, FD, NUjij, Spits, Telegraaf, Trouw, WMR		Good performance in identifying which articles will receive at least one comment.
Cumulative growth	Constant growth [29]	Slashdot		Good performance in predicting the number of comments one day after the publication of an article (MSE = 36%).
Cumulative growth	Constant scaling [30]	Digg, YouTube	Constant growth, Log-linear	Outperforms the constant growth and the log-linear models in terms of MRSE.
Cumulative growth	Log-linear [30]	Digg, YouTube	Constant growth, Constant scaling	Outperforms the constant growth and the constant scaling models in terms of MSE.
Cumulative growth	Survival analysis [65]	DPreview, MySpace		Using information from the first day after publication, it detects with 80% accuracy which threads will receive more than 100 comments.
Cumulative growth	Logistic regression [61]	Twitter		The model can successfully identify which messages will not be retweeted (99% accuracy) and those that will be retweeted more than 10,000 times (98% accuracy).
Temporal analysis	Multivariate linear regression [33]	YouTube	Constant scaling	An average improvement of 15% in terms of MRSE compared to the constant scaling model.
Temporal analysis	Reservoir computing [77]	YouTube	Constant scaling	Minor improvement compared to the constant scaling model.
Temporal analysis	Time series prediction [32]	YouTube		Designed for frequently-accessed videos. Good performance in predicting the daily number of views.
Temporal analysis	kSAIT [63]	Twitter	Regression-based methods	Predicts the number of tweets using information from the first hour after content publication. An improvement of up to 10% compared to regression-based methods.
Popularity evolution patterns	Hierarchical clustering [32]	YouTube		Designed for rarely-accessed videos. The model shows good performance for short-term predictions but significantly larger errors for long-term predictions.
Popularity evolution patterns	MRBF [33]	YouTube	Constant scaling, Multivariate linear regression	An average improvement of 5% in terms of MRSE compared to multivariate linear regression and 21% compared to the constant scaling model.
Popularity evolution patterns	Temporal-evolution prediction [34]	YouTube, Vimeo, Digg	Log-linear	Significant improvement compared to the log-linear method. The model can be used to predict the temporal evolution of popularity.
Individual behavior	Social dynamics [81]	Digg	Log-linear	It incorporates information about the design of the web site. Shows an accuracy of 95% in identifying which articles will get on Digg’s front page.
Individual behavior	Conformer Maverick [67]	JokeBox	Collaborative filtering solutions	Adequate for platforms that rank content based on user votes. Better performance than collaborative filtering solutions.
Individual behavior	Bayesian networks [64]	Twitter		MRE of 40% when predicting the total number of tweets using the information received in the first five minutes after publication.
Cross-domain	Linear regression [36]	IMDb, Twitter, YouTube		Designed to predict movie ratings using social media signals. The best performance was achieved when using textual features from Twitter and the fraction of likes over dislikes from YouTube.
Cross-domain	Linear regression [14]	Al Jazeera		Results show that a model based on social media reactions in the first ten minutes has the same performance as one based on the number of views received in the first three hours.
Cross-domain	Social transfer [35]	YouTube, Twitter	SVM basic	Shows a 70% accuracy in identifying which videos will receive sudden bursts of popularity (60% improvement over a model that uses only the information available on YouTube).
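
The constant scaling and log-linear models [30] serve as benchmarks for many rows of the table, and they are compared on two error measures, MSE and MRSE. The sketch below shows one way these baselines are commonly formulated; it is an illustration under stated assumptions, not the exact procedure of the cited works. It assumes that constant scaling predicts final popularity as a single factor times early popularity, with the factor chosen to minimize relative squared error, and that the log-linear model fits only an intercept in log space (slope fixed to 1, variance correction omitted for brevity). MRSE is taken here to be the mean of (prediction / actual - 1)^2. All function names and the toy numbers are hypothetical.

```python
import math


def fit_constant_scaling(early, final):
    """Return alpha minimizing sum((alpha * early / final - 1)^2).

    Assumed formulation: final popularity ~= alpha * early popularity.
    """
    ratios = [e / f for e, f in zip(early, final)]
    return sum(ratios) / sum(r * r for r in ratios)


def fit_log_linear(early, final):
    """Return beta for ln(final) = ln(early) + beta (slope fixed to 1).

    Simplification: the least-squares intercept is the mean log ratio;
    no variance correction is applied when transforming back.
    """
    diffs = [math.log(f) - math.log(e) for e, f in zip(early, final)]
    return sum(diffs) / len(diffs)


def mse(pred, actual):
    """Mean squared error on the raw popularity scale."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)


def mrse(pred, actual):
    """Mean relative squared error: mean of (pred / actual - 1)^2."""
    return sum((p / a - 1) ** 2 for p, a in zip(pred, actual)) / len(actual)


if __name__ == "__main__":
    # Hypothetical toy data: views after 1 day (early) and after 30 days (final).
    early = [12, 40, 7, 150, 65]
    final = [30, 95, 25, 310, 180]

    alpha = fit_constant_scaling(early, final)
    beta = fit_log_linear(early, final)

    pred_cs = [alpha * e for e in early]
    pred_ll = [math.exp(math.log(e) + beta) for e in early]

    print("constant scaling: MSE", mse(pred_cs, final), "MRSE", mrse(pred_cs, final))
    print("log-linear:       MSE", mse(pred_ll, final), "MRSE", mrse(pred_ll, final))
```

Because each model is fitted to a different objective (relative error for constant scaling, squared error in log space for log-linear), it is natural that their ranking can differ depending on which error measure is reported, as the table's MSE and MRSE remarks suggest.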
