2.1 Network prestige
In this paper, we are interested in measuring the prestige of GitHub users located in Brazil. In social network analysis, prestige can be measured based on directional relations among actors. We measured prestige using a graph of follow-relationships, in which there is an arc from user A to user B if A follows B. On GitHub, this means that A receives notifications about B’s development activities, which indicates that A has an interest in B’s contributions.
There are different network measures that can be computed to quantify the prestige of an actor in a social network. The simplest actor-level measure of prestige is the in-degree of a vertex i [43] in a graph, which is often referred to as i’s popularity. However, popularity is a very restricted measure of prestige because it takes only direct choices into account. With popularity it does not matter whether choices are received from popular people. The overall structure of the network is disregarded [6].
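As a minimal sketch of the popularity measure described above, the following Python snippet counts the in-degree of each vertex in a list of follow arcs. The function name `popularity` and the toy user labels are illustrative assumptions and are not taken from the study.

```python
from collections import Counter


def popularity(arcs, vertices):
    """Return the in-degree of each vertex.

    arcs: iterable of (follower, followed) pairs (hypothetical input format).
    vertices: iterable of all vertices, so that users without followers get 0.
    """
    counts = Counter(followed for _, followed in arcs)
    return {v: counts.get(v, 0) for v in vertices}


# Toy network: b is followed by a (who has no followers),
# d is followed by c (who has two followers, x and y).
arcs = [("a", "b"), ("c", "d"), ("x", "c"), ("y", "c")]
scores = popularity(arcs, ["a", "b", "c", "d", "x", "y"])
```

In this toy network, b and d receive the same score although d is chosen by a more popular user than b, which illustrates why popularity alone disregards the wider structure of the network.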
Another prestige measure is proximity. It defines the influence domain of actor i as the set of actors from whom i is reachable and takes into account the distances of these actors from i. Because it ignores actors who cannot reach i, it is defined even if the network is not connected, i.e., when some actors are not reachable from other actors [43].
We used Pajek [17] to calculate the proximity prestige for each vertex in the graph of follow-relationships described above. In Pajek, the proximity prestige of a vertex is the proportion of all vertices (except itself) in its input domain divided by the mean distance from all vertices in its input domain.
Maximum proximity prestige is achieved if a vertex is directly chosen by all other vertices. This is the case, for example, in a star-network in which all choices are directed to the central vertex. Then, the proportion of vertices in the input domain is 1 and the mean distance from these vertices is 1, so proximity prestige is 1 divided by 1. Vertices without input domain get minimum proximity prestige by definition, which is zero [6].
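The definition above can be sketched in a few lines of Python. This is an illustration of the textual definition only, not Pajek's implementation; the function name `proximity_prestige` and the input format (a list of follower/followed pairs) are assumptions made for the example.

```python
from collections import deque


def proximity_prestige(arcs, vertices):
    """Proximity prestige per vertex, following the textual definition.

    arcs: iterable of (follower, followed) pairs (hypothetical input format).
    vertices: iterable of all vertices in the network.
    """
    vertices = list(vertices)
    n = len(vertices)
    # Reverse adjacency: for each vertex, the vertices that choose it directly.
    followers = {v: [] for v in vertices}
    for follower, followed in arcs:
        followers[followed].append(follower)

    prestige = {}
    for v in vertices:
        # Breadth-first search along reversed arcs yields, for every vertex
        # that can reach v, its shortest-path distance to v.
        dist = {v: 0}
        queue = deque([v])
        while queue:
            u = queue.popleft()
            for w in followers[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        domain = [w for w in dist if w != v]
        if not domain:
            # Vertices without an input domain get zero by definition.
            prestige[v] = 0.0
            continue
        proportion = len(domain) / (n - 1)
        mean_distance = sum(dist[w] for w in domain) / len(domain)
        prestige[v] = proportion / mean_distance
    return prestige


# Star network: all choices are directed to the central vertex c.
star = proximity_prestige([("a", "c"), ("b", "c"), ("d", "c")], ["a", "b", "c", "d"])
# Chain: a follows b, b follows c.
chain = proximity_prestige([("a", "b"), ("b", "c")], ["a", "b", "c"])
```

In the star network the central vertex reaches the maximum value of 1 and the other vertices get 0. In the chain, c has both other vertices in its input domain at a mean distance of 1.5, giving 1 / 1.5, while b has half of the other vertices in its input domain at distance 1, giving 0.5.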
2.2 GitHub
GitHub is a web-based hosting service that allows developers to host their software project repositories using the Git revision control system. Since its launch in April 2008, GitHub has become one of the most popular source code hosting services with over twenty million projects maintained by over eight million registered developers [18]. It is now the largest code host in the world [9].
In addition to revision control, GitHub acts as a social network site that enables developers to connect and collaborate with each other. Developers can search for software projects that they are interested in, easily fork those projects to make their own contributions, and follow the work of others. We are particularly interested in follow-relationships, as they represent a deliberate interest from one developer in another’s work and denote the prestige of a developer in GitHub’s social network.
The site organizes software repositories by software developer or organization, rather than by project, showing a list of each developer’s repositories and their activity on GitHub in a news feed. For a developer, this effectively turns their GitHub profile into an easily accessible public portfolio of their open source development activities [36].
A GitHub user profile includes information on their repositories (i.e., projects) and their recent public activities, such as committing code to a repository or opening an issue report, which are usually not visible in other development environments. The profile page also shows several statistics that are often used on social networking sites, such as the number of other developers following a user or the number of projects they are watching. Such transparency is an interesting feature of GitHub and other social coding sites [42].
GitHub is particularly attractive for researchers because it provides access to its internal data stores through an extensive REST API [19], which researchers can use to access a rich collection of unified and versioned process and product data [9].
Before we detail how we accessed data on Brazilian developers using the GitHub API, we introduce basic demographics about Brazil in the next section to frame our research.
2.3 Brazil’s demographics
Brazil is the fifth-biggest country in the world in terms of area and population. With more than 200 million inhabitants, it is also the biggest country in South America and covers almost half (47.3 %) of the entire continent. Except for Chile and Ecuador, Brazil shares a border with every other country in South America.
Roughly 90 % of Brazil’s inhabitants live in states on the eastern and southern coasts of Brazil, where the population density varies from 20 to 300 residents per square kilometre. The rest of Brazil, i.e., the Amazon and the mountain regions, offers a lot more space with a population density of less than 5 residents per square kilometre in some cases. In contrast, the Federal District of the capital Brasília and the state of Rio de Janeiro have population densities of more than 300 inhabitants per square kilometre.
Brazil is divided into 26 states and a Federal District, which can be divided into five major regions:
North. The North accounts for almost half of the area of Brazil (45 %), but it is the region with the fewest inhabitants. In particular the Northwest is not industrially developed. Instead, the region is home to the Amazon basin, the largest ecosystem on earth. The following states are in the North: Acre (AC), Amapá (AP), Amazonas (AM), Pará (PA), Rondônia (RO), Roraima (RR), and Tocantins (TO).
Northeast. Almost a third of Brazilians live in the Northeast, a region that is culturally very diverse. It is characterized by Portuguese, African, and indigenous influences. The following states are in the Northeast: Alagoas (AL), Bahia (BA), Ceará (CE), Maranhão (MA), Paraíba (PB), Pernambuco (PE), Piauí (PI), Rio Grande do Norte (RN), and Sergipe (SE).
Center-West. The Center-West of Brazil owes its importance to its wealth in raw materials. Nevertheless, the region is not particularly well developed. However, intensive efforts, such as the move of the capital to Brasília, are being made to strengthen the region. The following states are in the Center-West: Distrito Federal (DF), Goiás (GO), Mato Grosso (MT), and Mato Grosso do Sul (MS). The capital, Brasília, is located in the DF.
Southeast. The Southeast of Brazil is home to more people than any other South American country. With the metropolitan areas of São Paulo and Rio de Janeiro, this region is the economic engine of the country. The following states are in the Southeast: Espírito Santo (ES), Minas Gerais (MG), Rio de Janeiro (RJ), and São Paulo (SP).
South. The South is the smallest region of Brazil with climatic conditions similar to those of southern Europe. The region shows significant cultural influences from German, Polish, and Italian immigrants. The following states are in the South: Paraná (PR), Santa Catarina (SC), and Rio Grande do Sul (RS).
Brazil’s most populous metropolitan areas are São Paulo with about 20 million inhabitants, Rio de Janeiro with about 12.5 million inhabitants, and Belo Horizonte with about 5 million inhabitants, making São Paulo the largest city in the southern hemisphere.
Brazil’s economy is currently the seventh largest in the world in terms of nominal gross domestic product (GDP), and the seventh largest in terms of purchasing power parity. A member of the BRIC countries, Brazil had one of the world’s fastest growing major economies until about 2010, with economic reforms that gave the country a new international reputation and influence. However, the economy has slowed down to modest growth over the last four years.
2.4 Brazil’s IT industry
According to a recent study [1], Brazil ranked 7th in IT investments worldwide and 1st in Latin America, with an investment of 61.6 billion US dollars in 2013. Of this, 10.7 billion came from the software market and 14.4 billion from the services market.
The domestic market is served by approximately 11,230 companies dedicated to the development, production, and distribution of software and services. Of those companies, about 93 % can be categorized as micro and small enterprises. Finance, Services, and Telecom accounted for almost 51 % of the user market, followed by Industry, Government, and Commerce.
The study also pointed out the regional concentration of investments in the IT market. The Southeast region of Brazil received the largest share of funds allocated to the sector in 2013, with 64.6 %. The North received the smallest share, with 2.2 %; the Northeast recorded 8.6 %; the South and Center-West accounted for 13.4 % and 11.0 %, respectively.
2.5 Hypotheses
There are many challenges associated with the Brazilian software industry and its growth. The vast majority of software companies are located in the Southeast and South regions of Brazil. In 2008, these two regions accounted for 84.3 % of all software companies in the country with more than 20 employees [37]. This result emphasizes the well-known inequality across regions in Brazil.
Assuming that the uneven distribution of software companies and their employees across Brazilian states is related to the socio-economic situation in each of these states, our first hypothesis tests whether developers’ prestige is associated with the development level of the state they are located in:
H1. Developers’ prestige is associated with the development level of the state they are located in.
Our following hypotheses explore the follow-relationships between Brazilian developers on GitHub in more detail. Previous work on follow-relationships found that developers tended to connect with people with similar levels of performance and experience [35]. In fact, the presence of homophily, i.e. the tendency of individuals to associate and bond with similar others, has been discovered in many other network studies in sociology (see McPherson et al. [31] for a review).
For our paper, we examined the homophily of follow-relationships by focusing on two different attributes: geographic location and programming language choice. In particular, our second hypothesis investigates whether developers tend to follow other developers located in the same state:
H2. Developers tend to follow developers located in the same state.
As an alternative explanation of why developers might follow each other, we investigate whether they might use common programming languages as a decision factor:
H3. Developers tend to follow developers who use the same programming languages.
Prestige itself might be a factor for a developer when deciding whom to follow. In a study on Twitter’s follow-relationships, Hopcroft et al. [13] found that the likelihood of two prestigious users creating a reciprocal relationship is nearly 8 times higher than the likelihood of two ordinary users doing so. Our fourth hypothesis tests whether, in Brazil, a prestigious developer tends to follow other prestigious developers:
H4. In Brazil, the prestige level of a developer who is following is associated with the prestige level of a developer who is being followed.
In a study of open-source software communities, Shen and Monge found that project leaders tend to follow more people, showing that project leaders are better connected than developers in other roles [35]. Our fifth hypothesis performs a similar test by focusing on the association between network prestige and the number of people developers follow:
H5. In Brazil, developers’ prestige is associated with the number of developers they follow.