
Table 2 Tasks assigned to the first group

From: An open source library to parse and analyze online collaborative knowledge-building portals

Data Extraction

Task 1

Extracting 5 Wikipedia articles from each quality category, namely FA, GA, B, C, Start and Stub. a

Task 2

Extracting 10,000 random questions, along with their answers and comments, from a Stack Exchange site, say, anime.stackexchange.com

Data Parsing

Task 3

Finding the number of words, sentences and Wikilinks added/deleted in each revision of an article (United States).

Task 4

Extracting all the questions which had an accepted answer from anime.stackexchange.com

Analysis Methods

Task 5

Find the correlation between monthly pageviews and the number of revisions of an article (United States).

Task 6

Find the correlation between the Gini coefficient (a measure of inequality of contribution) and the answer-to-question ratio for various Stack Exchange portals.

  1. a Wikipedia has defined seven quality grades, ranging from Stub class to Featured Article (FA) class; the least developed articles are in Stub class and fully developed articles are in FA class.
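
The two quantities behind Tasks 5 and 6 can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's own API: the function names `gini` and `pearson` are hypothetical, the Gini coefficient uses the standard rank-weighted formula over per-user contribution counts, and the correlation is assumed to be Pearson's r, which the table does not specify.

```python
from math import sqrt


def gini(contributions):
    """Gini coefficient of non-negative contribution counts.

    0 means every contributor contributed equally; values close to 1
    mean a few contributors account for almost all contributions.
    (Hypothetical helper, not part of any library.)
    """
    values = sorted(contributions)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # where x_i are sorted ascending and i runs from 1 to n.
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    (Assumed correlation measure; the table does not name one.)
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)
```

For Task 5, `xs` would hold the monthly pageviews and `ys` the monthly revision counts of the article. For Task 6, each portal would contribute one pair: its `gini(...)` value over per-user contribution counts and its answer-to-question ratio.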