Concurrency bugs in open source software: a case study
© The Author(s) 2017
Received: 24 August 2016
Accepted: 13 March 2017
Published: 4 April 2017
Concurrent programming puts demands on software debugging and testing, as concurrent software may exhibit problems not present in sequential software, e.g., deadlocks and race conditions. Aiming to increase the efficiency and effectiveness of debugging and bug-fixing for concurrent software, a deep understanding of concurrency bugs, their frequency and fixing times would be helpful. Similarly, to design effective tools and techniques for testing and debugging concurrent software, understanding the differences between non-concurrency and concurrency bugs in real-world software would be useful.
This paper presents an empirical study focusing on understanding the differences and similarities between concurrency bugs and other bugs, as well as the differences among various concurrency bug types in terms of their severity, fixing time and reproducibility. Our basis is a comprehensive analysis of bug reports covering several generations of five open source software projects. The analysis involves a total of 11860 bug reports from the last decade, including 351 reports related to concurrency bugs. We found that concurrency bugs differ from other bugs in terms of their fixing time and severity, while they are similar in terms of reproducibility. Our findings shed light on concurrency bugs and could thereby influence future design and development of concurrent software, their debugging and testing, as well as related tools.
Keywords: Concurrency bugs · Bug severity · Fixing time · Open source software · Apache Hadoop · Apache ZooKeeper · Apache Oozie · Apache Accumulo · Apache Spark · Case study
With the introduction of multicore and other parallel architectures, there is an increased need for efficient and effective handling of software executing on such architectures. An important aspect in this context is to understand the bugs that occur due to parallel and concurrent execution of software. In this paper, we look into how the increase of such executions has impacted a number of issues, including the occurrence of related bugs, the difficulty of fixing these bugs compared to fixing non-concurrency ones, and the distribution of unreproducible concurrency and non-concurrency bugs.
Testing and debugging concurrent software face a variety of challenges. These challenges concern different aspects of software testing and debugging, such as parallel programming, performance testing, error detection and more. Since concurrent software exhibits more non-deterministic behavior, and non-deterministic bugs are generally viewed as more challenging than other types of bugs [4–6], testing and debugging concurrent software are also considered more challenging than testing and debugging sequential software.
Developing concurrent software requires developers to keep track of all the possible communication patterns that arise from the large number of possible interleavings, or concurrently overlapping executions, that can occur between different execution threads through shared memory.
Handling the many execution scenarios that this results in is a notoriously difficult task in debugging, and it makes it equally hard to create test cases.
In the presented study, we are particularly interested in isolating concurrency bugs from other types of bugs (non-concurrency bugs) and analyzing the distinguishing features in their respective fixing processes. Hence, the main emphasis of this research is on concurrency bugs, and on exploring the nature and extent of concurrency bugs in real-world software. This exploration can help us understand how to address concurrency bugs, estimate which ones are the most time-consuming and the most difficult to reproduce, and prioritize them to speed up the debugging and bug-fixing processes. It could also help designers avoid the errors that are most likely to occur during the early phases of the software life-cycle.
RQ1: How common are different types of concurrency bugs compared to non-concurrency bugs?
RQ2: What is the fraction of unreproducible bugs that are also concurrency bugs?
RQ3: How much time is required to fix concurrency bugs compared to non-concurrency bugs?
RQ4: Are concurrency bugs more severe than non-concurrency bugs?
RQ5: How much time is required to fix the most severe concurrency bugs compared to the most severe non-concurrency bugs?
In this study, we investigate bug reports from five open source software projects: Apache Hadoop, Apache ZooKeeper, Apache Oozie, Apache Accumulo and Apache Spark. We classify the reported bugs into three distinct categories: fixed and closed concurrency bugs, fixed and closed non-concurrency bugs, and unreproducible bugs. We further classify the concurrency bugs by bug type. For concurrency and non-concurrency bugs we additionally consider severity and fixing time. We compare the non-concurrency, concurrency and unreproducible bugs in terms of their reporting frequency. Our results indicate that a relatively small share of bugs is related to concurrency issues, while the vast majority are non-concurrency bugs. Fixing times for concurrency and non-concurrency bugs differ, but the difference is not big. However, the fixing times for unreproducible concurrency and unreproducible non-concurrency bugs are similar. In addition, concurrency bugs are considered to be slightly more severe than non-concurrency bugs.
The remainder of this paper is organized as follows. We describe our research methodology in Section 2. In Section 3, we present the classification schemes of the study. We provide a comprehensive set of results, quantitative analysis of bug reports and answer the research questions in Section 4. The discussion on obtained results and the threats to validity are presented in Section 5. We survey related work in Section 6. Finally, we conclude the study and highlight the direction of future work in Section 7.
2.1 Bug-source software selection
We considered five open source applications: the Apache Hadoop, Apache ZooKeeper, Apache Oozie, Apache Accumulo and Apache Spark projects. All five coordinate distributed processes, have a significant number of releases, and use an issue management platform. Each project provides a web interface for managing, configuring and testing its services and components, and detailed information on bugs and bug fixes is openly available.
Apache Hadoop project1 is a Java-based distributed computing framework built for applications implemented using the MapReduce2 programming model. Hadoop has changed constantly and considerably in 59 releases over six years of development. The Hadoop framework has been widely adopted by both industry and research communities. Due to Hadoop's key concept of parallel and distributed abstraction, it has recently been adopted by several big companies such as Facebook, eBay, Yahoo, Amazon and Adobe.
Apache ZooKeeper project3 is a fault-tolerant, distributed coordination service for other distributed applications such as cloud computing applications. It provides fundamental services by encapsulating distributed coordination algorithms and maintaining a simple database. An application can use the ZooKeeper client to build higher-level coordination functions such as leader election, barriers, queues, and read/write revocable locks.
Apache Oozie project4 is a server-based workflow scheduling and coordination system that manages data processing jobs for Hadoop. It is a mature solution for defining and executing workflows (which can be triggered by a user, a time event or data arrival). It supports useful functionality such as the persistence of an execution history.
Apache Accumulo project5 is a distributed, column-oriented, multidimensional data store built on top of Apache Hadoop, ZooKeeper and Thrift. Accumulo's design is based on Google's Bigtable implementation and is very similar to HBase (the fundamental data storage capabilities of Accumulo and HBase are clearly similar).
Apache Spark project6 is a cluster computing system from Apache with incubator status. It is fast at running programs and writing data. Spark supports in-memory computing, which enables it to query data much faster than disk-based engines such as Hadoop, and it offers a rich set of additional tools such as Spark SQL (for SQL and structured data processing), MLlib (for machine learning) and Spark Streaming. It also provides an optimized engine that supports general computation graphs (GraphX). The aim of the project is to build a faster distributed parallel processing framework that can perform better than Hadoop for certain tasks.
These five projects track both enhancement requests and bugs using JIRA7. JIRA is an issue management platform, which allows users to manage their issues throughout their entire life cycle. It is mainly used in software development and allows users to track any kind of unit of work, such as project task, issue, story and bug to manage and track development efforts.
2.2 Bug reports selection
In this stage, we selected the concurrency bugs from the bug report databases, covering reports from the period 2006-2015, i.e., the last decade. In total, the projects' bug report databases contain 11860 issues from this period that are tagged as "Bug".
In the filtering step based on "Bug" as report type, we searched the projects' report databases for reports with issue type "Bug"8, in line with our main objective and bug definition.
In the filtering step based on the status of the report, we searched for bugs with "Closed" (i.e., the report is considered finished and the resolution is correct) and "Fixed" resolution status (i.e., a fix for the issue has been implemented). We only selected "Fixed" and "Closed" reports, since unfixed and open bug reports might be invalid and the root causes described in them could be incorrect; it would then be impossible for us to completely understand the details of these bugs and determine their types. It would have been interesting to also consider bugs with other statuses or resolutions (such as duplicate bugs, or bugs coded as "won't fix"), but such bug reports are not likely to be as complete and reliable as bugs labeled closed and fixed. Without reasonably complete bug reports it would not be possible to recognize the reported bugs.
In the filtering step based on keywords relevant to concurrency, we used keywords that could help us include the bug reports compatible with the scope of this study. In identifying such keywords, we reviewed the keywords used in similar previous studies [1, 14]. The keywords included in the search were:
thread, blocked, locked, race, dead-lock, deadlock, concurrent, concurrency, atomic, synchronize, synchronous, synchronization, starvation, suspension, “order violation”, “atomicity violation”, “single variable atomicity violation”, “multi variable atomicity violation”, livelock, live-lock, multi-threaded, multithreading, and multi-thread.
The selected bugs from each project after the concurrency-keyword filtering step
We automatically filtered reports that are not likely to be relevant by performing a search query on the bug report database. Our search query filtered bugs based on (1) “Bug” as report type, (2) the status of the report, and (3) keywords relevant to concurrency.
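The three-step query above can be sketched as a simple predicate over exported report records. This is a minimal illustration only: the record fields ("type", "resolution", "summary") and the sample reports are hypothetical, and the keyword set is a subset of the full list given above; the actual study ran search queries against the projects' JIRA databases.

```python
# Hypothetical sketch of the three-step report filter described above.
CONCURRENCY_KEYWORDS = {
    "thread", "blocked", "locked", "race", "deadlock", "concurrent",
    "concurrency", "atomic", "synchronize", "starvation", "suspension",
    "livelock",  # subset of the full keyword list in the paper
}

def matches_filter(report):
    """Keep reports typed 'Bug', resolved 'Fixed'/'Closed', mentioning a keyword."""
    if report["type"] != "Bug":                      # step (1): report type
        return False
    if report["resolution"] not in ("Fixed", "Closed"):  # step (2): status
        return False
    text = report["summary"].lower()                 # step (3): keywords
    return any(kw in text for kw in CONCURRENCY_KEYWORDS)

# Illustrative records, not real bug reports from the studied projects.
reports = [
    {"type": "Bug", "resolution": "Fixed",
     "summary": "Deadlock between DataNode heartbeat threads"},
    {"type": "Bug", "resolution": "Open",
     "summary": "Race condition in block report"},
    {"type": "Improvement", "resolution": "Fixed",
     "summary": "Speed up thread pool startup"},
]
selected = [r for r in reports if matches_filter(r)]
print(len(selected))  # 1: only the first report passes all three filters
```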
In the filtering step based on "Bug" as report type, we filtered the projects' reports with the same issue type as we applied when selecting concurrency reports. We included the bugs from the period 2006-2015 (the last decade). As before, after this step we had 11860 reported bugs in our database.
In filtering based on the status of the report step, we searched for bugs with “Cannot Reproduce”9 resolution status. After accomplishing this filter, we have obtained 513 reported bugs.
Finally, in the filtering step based on keywords relevant to concurrency, we used the same keywords as for identifying concurrency bugs. We obtained a final set of 203 reports by applying these steps.
2.3 Manual exclusion of bug reports and sampling of non-concurrency bugs
In this stage, we manually analyzed the 647 bug reports obtained in the previous step. The manual inspection revealed that some of the bugs that matched the search query were not concurrency bugs, so we excluded them. More specifically, we determined the relevance of the bugs by checking (1) whether they describe a concurrency bug, and if so, (2) what type of concurrency bug it is. The latter is done by comparing their descriptions (or explanations) with our concurrency bug classification (given in Section 3.1). If we could not map a report to any class, we excluded it from our set. We also excluded reports with very little information, since we could not analyze them properly. After filtering, we obtained a final set of 351 concurrency bugs.
As explained in Section 1, our main objective is understanding the differences between non-concurrency and concurrency bugs. For comparison purposes, we randomly sampled an equally sized subset of non-concurrency bugs (351) that were reported during 2006-2015 and were “Fixed” and “Closed”. These bugs were used for comparative analysis between the concurrency and non-concurrency bug sets.
Report counts from the different steps of the bug report selection process

  Filter criteria                                # of reports
  2006-2015 & Bug & Fixed & Closed
    Total bug reports (from five projects)       11860
    Keyword-matched bug reports                  647
    Concurrency bug reports analyzed             351
  2006-2015 & Bug & Fixed & Closed
    Sample of non-concurrency bug reports        351
  2006-2015 & Bug & Cannot Reproduce
    Total bug reports (from five projects)       513
    Keyword-matched bug reports                  203
2.4 Bug reports classification
We analyzed the issues and information contained in the reports and used them to manually map the bugs to the concurrency bug classification. Each bug report contains several types of information that were valuable in recognizing the concurrency bugs and distinguishing them from other types of bugs, helping us understand the bugs' characteristics. The bug reports contained, for example, a description of the bug with some discussion among the developers on how to detect the bug, where to detect it (bug localization) and how to fix it. Most of the reports include a description of the correction of the bug, a link to the version of the software where the bug was corrected, and sometimes even a scenario for reproducing the reported bug. The reports also contain additional fields such as perceived priority, created date, resolved date, affected version and more.
We used different types of fields in order to explore the concurrency bug issues in the five selected projects. We used the priority field to estimate the severity of the bug. The interval between the Created date and Resolved date fields was used to calculate the amount of (calendar) time required to fix the bug (fixing time).
3 Study classification schemes
In order to perform the bug classification process, we defined three main classifiers and grouped the reports based on them: type of concurrency bug, fixing time and severity. For concurrency bug classification, we used the classification scheme for concurrency bugs described in Section 3.1. For fixing time, we calculated the fixing time for each bug report (Section 3.2), and for severity, we used the scheme proposed by JIRA (Section 3.3). The details of these three classification schemes are described as follows:
3.1 Concurrency bug classification
Data race occurs when at least two threads access the same data and at least one of them writes the data. It occurs when concurrent threads perform conflicting accesses by trying to update the same memory location or shared variable [18, 19].
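As a generic illustration (not code from the studied projects), the classic lost-update data race arises when two threads perform an unsynchronized read-modify-write on a shared counter; guarding the update with a lock removes the race. A minimal Python sketch:

```python
import threading

ITERATIONS = 100_000
counter = 0                    # shared variable
lock = threading.Lock()

def unsafe_increment():
    # RACE: `counter += 1` is a read-modify-write; two threads can read the
    # same value and each write back value+1, losing one update. Whether
    # updates are actually lost depends on the interpreter's thread switching.
    global counter
    for _ in range(ITERATIONS):
        counter += 1

def safe_increment():
    # The lock makes the read-modify-write atomic, so no update is lost.
    global counter
    for _ in range(ITERATIONS):
        with lock:
            counter += 1

threads = [threading.Thread(target=safe_increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # with the lock, always 200000
```

Running the `unsafe_increment` variant instead may print a smaller value, which is exactly the conflicting-access behavior described above.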
Deadlock is "a condition in a system where a process cannot proceed because it needs to obtain a resource held by another process but it itself is holding a resource that the other process needs". More generally, it occurs when two or more threads attempt to access shared resources held by other threads, and none of the threads can give them up [18, 21]. During deadlock, all involved threads are in a waiting state.
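A common way to avoid the circular wait described above is to impose a single global lock-acquisition order. The following generic sketch (an assumed example, not project code) shows two threads that both acquire two locks in the same order and therefore cannot deadlock:

```python
import threading

lock_a = threading.Lock()   # e.g., resource A
lock_b = threading.Lock()   # e.g., resource B
completed = []

def worker(name):
    # Every thread acquires the locks in the same global order (A, then B).
    # If one thread took (A, B) and another (B, A), each could end up holding
    # one lock while waiting forever for the other: a classic deadlock.
    with lock_a:
        with lock_b:
            completed.append(name)  # critical section using both resources

threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(completed))  # [0, 1]: both threads finished, no circular wait
```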
Livelock happens when a thread waits for a resource that will never become available, while the CPU is busy releasing and acquiring the shared resource. It is similar to deadlock, except that the states of the processes involved in the livelock constantly change and the processes frequently execute without making progress.
Starvation is “a condition in which a process indefinitely delayed because other processes are always given preference” . Starvation typically occurs when high priority threads are monopolising the CPU resources. During starvation, at least one of the involved threads remains in the ready queue.
Suspension-based locking or Blocking suspension occurs when a calling thread waits for an unacceptably long time in a queue to acquire a lock for accessing a shared resource.
Order violation is defined as the violation of the desired order between at least two memory accesses. It occurs when the expected order of interleavings does not appear. If a program fails to enforce the programmer's intended order of execution, an order violation bug can happen.
Atomicity violation refers to the situation where the execution of two code blocks (sequences of statements protected by, e.g., a lock or a transaction) in one thread is concurrently overlapped by the execution of one or more code blocks of other threads in such a way that the result is inconsistent with any execution where the blocks of the first thread execute without overlapping with any other code block.
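A typical atomicity violation is a check-then-act sequence that is not protected as one unit: each individual access may be fine, but an interleaving between the check and the act produces an inconsistent result. A generic Python sketch (an assumed example, not from the studied projects), using lazy initialization of a shared cache entry:

```python
import threading

cache = {}
lock = threading.Lock()

def get_or_create_unsafe(key):
    # ATOMICITY VIOLATION: another thread may insert `key` between the
    # membership check and the assignment, so the value can be created twice
    # and one creation silently overwrites the other.
    if key not in cache:
        cache[key] = object()
    return cache[key]

def get_or_create_safe(key):
    # The lock makes the check and the insert a single atomic block.
    with lock:
        if key not in cache:
            cache[key] = object()
        return cache[key]

results = []
def worker():
    results.append(get_or_create_safe("conn"))

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(set(map(id, results))))  # 1: all threads observed the same object
```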
3.2 Fixing time calculation
Fixing time is the duration (in days) that developers/debuggers spend to fix a reported bug. It is calculated by subtracting the Created date field of the bug from its Resolved date field, i.e., it refers to calendar time rather than the actual time spent fixing the bug.
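Concretely, once the two JIRA date fields are parsed into date objects, the calculation is a plain date difference. The dates below are hypothetical, for illustration only:

```python
from datetime import datetime

def fixing_time_days(created, resolved):
    """Calendar fixing time in days: Resolved date minus Created date."""
    return (resolved - created).days

# Hypothetical report dates (not from an actual bug report).
created = datetime(2014, 3, 2)
resolved = datetime(2014, 3, 30)
print(fixing_time_days(created, resolved))  # 28
```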
3.3 Bug report severity classification
Blocker shows the highest priority. It indicates that this issue takes precedence over all others.
Critical indicates that this issue is causing a problem and requires urgent attention.
Major shows that this issue has a significant impact.
Minor indicates that this issue has a relatively minor impact.
Trivial is the lowest priority.
4 Results and quantitative analysis
This section provides the analysis of the data collected for bugs obtained from the projects' bug databases. We used 702 bugs (351 concurrency bugs and 351 non-concurrency bugs sampled for our analysis) reported between 2006 and 2015. The bug selection process is described in Section 2. We provide the raw data of this study at https://goo.gl/wcdD16.
4.1 RQ1: How common are different types of concurrency bugs compared to non-concurrency bugs?
The 351 concurrency bugs were further categorized according to the concurrency bug classification. As mentioned in Section 3.1, this taxonomy defines seven types of concurrency bugs: (1) Data race, (2) Deadlock, (3) Livelock, (4) Starvation, (5) Suspension, (6) Order violation and (7) Atomicity violation. For the purposes of this study, we added one more category to the taxonomy: (8) Not clear. The Not clear category includes reports that cover bugs related to concurrency and parallelism, but that cannot be classified according to the concurrency bug taxonomy. For these bugs, the summary and description of the report indicate that it is a concurrency bug, but further classification of the bug type is impossible due to a very project- and implementation-specific explanation of the bug details and solution.
In addition, we investigated the frequency with which each type of concurrency bug appears, in order to get insights of bug prioritization.
Answer RQ1: Only 4% of the total set of bugs are related to concurrency issues, while the majority of bugs (i.e., 96%) are of non-concurrency type.
4.2 RQ2: What is the fraction of unreproducible bugs that are also concurrency bugs?
Answer RQ2: The fraction of unreproducible bugs from the total set of bugs is only 4% while 2% of the total set are unreproducible and related to concurrency issues.
4.3 RQ3: How much time is required to fix concurrency bugs compared to non-concurrency bugs?
In order to gain a better understanding of how difficult it is to fix concurrency bugs in comparison with non-concurrency bugs, we conducted a quantitative analysis of the effort required to fix both concurrency and non-concurrency bugs. This effort was measured as the calendar time between the Created date and Resolved date fields in the projects' databases. We used this time as an indicator of the complexity involved in fixing these bugs.
Descriptive statistics results for concurrency and non-concurrency bug sets in terms of fixing time. We report here: average, minimum, maximum, median and standard deviation values
To evaluate whether there is any statistical difference between the fixing times of concurrency and non-concurrency bugs, we use a Wilcoxon Signed Rank test, a non-parametric hypothesis test for determining whether there is a statistical difference between two data sets, under the assumption that the data is drawn from an unknown distribution. We use 0.05 as the significance level.
In addition, we calculate the Vargha-Delaney A-statistic as a measure of effect size  for analyzing significance. This statistic is independent of the sample size and has a range between 0 and 1. The choice of what constitutes a significant effect size can depend on context. Vargha and Delaney  suggest that A-statistic of greater than 0.64 (or less than 0.36) is indicative of “medium” effect size, and of greater than 0.71 (or less than 0.29) can be indicative of a “large” effect size.
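The A-statistic has a simple direct form: it is the probability that a value drawn from the first sample exceeds one drawn from the second, with ties counted as one half. A minimal implementation, shown on illustrative data rather than the study's fixing-time sets:

```python
def vargha_delaney_a(xs, ys):
    """A = P(X > Y) + 0.5 * P(X = Y); 0.5 means no effect."""
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in xs for y in ys)
    return wins / (len(xs) * len(ys))

# Identical samples -> no effect.
print(vargha_delaney_a([1, 2, 3], [1, 2, 3]))  # 0.5
# Every value in the first sample larger -> maximal effect.
print(vargha_delaney_a([4, 5, 6], [1, 2, 3]))  # 1.0
```

Note that, as the paper says, the statistic is independent of sample size and always falls in [0, 1].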
We are interested in determining whether the fixing time for concurrency bugs is similar to that for non-concurrency bugs. We begin by formulating the statistical hypotheses as follows: the null hypothesis (H0) is that the fixing times of the concurrency and non-concurrency bug sets have identical distributions, and the alternative hypothesis (H1) is that the distributions are different. Based on the p-value of 0.006, which is less than 0.05, we reject the null hypothesis. That is, the fixing times of concurrency and non-concurrency bugs are statistically different. When calculating the Vargha-Delaney A-statistic, we obtained a value of 0.559, which indicates a "small" standardized effect size. From our results, we can see that the fixing time for concurrency bugs differs from the fixing time for non-concurrency bugs, but that this difference corresponds to a "small" standardized effect size.
Fixing time comparison for concurrency bug types
Wilcoxon test for concurrency bugs fixing time comparison
H0 \ HA
Examining Table 5, we can conclude that the null hypothesis is accepted, with p-values above the traditional statistical significance limit of 0.05, for the majority of bug type pairs, except for the "Deadlock-Data race", "Deadlock-Order violation" and "Deadlock-Suspension" pairs, where the null hypothesis is rejected. This shows that bug fixing time does not differ except between "Deadlock-Data race", "Deadlock-Order violation" and "Deadlock-Suspension". For example, Table 5 shows a p-value of 0.003 for the pair "Deadlock-Data race", which is less than 0.05, and therefore we can reject the null hypothesis: the fixing times for the Deadlock and Data race bug types are different. In addition, the A-statistic for the same pair of bug types is about 0.824 (or 0.175 in the second row), which exceeds the 0.71 threshold for a "large" effect size. We can conclude that the fixing times for the Deadlock and Data race bug types differ with a "large" effect size.
Answer RQ3: Concurrency bugs require longer fixing time than non-concurrency bugs, but the difference is not large.
4.4 RQ4: Are concurrency bugs more severe than non-concurrency bugs?
In order to statistically compare the severity of concurrency and non-concurrency bugs, we apply a two-sample Kolmogorov-Smirnov test (also known as the two-sample K-S test) to determine whether the severity distributions of these two types of bugs differ significantly. Our null hypothesis is that the severity levels of concurrency bugs and non-concurrency bugs are drawn from the same distribution. In this test, if the D-value is larger than the critical D-value, the observed frequencies are distributed differently.
Kolmogorov-Smirnov test for concurrency and non-concurrency bugs severity comparison
Critical D-value = D351,0.05 = 1.36 / √351 = 0.072
D-value = Sup |f(x)−g(x)| = 0.122507123
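The two-sample D statistic is the largest vertical distance between the two empirical cumulative distribution functions. A minimal sketch over severity levels encoded as integers; the data below is illustrative, not the study's actual severity counts:

```python
def ks_d_statistic(xs, ys):
    """Two-sample K-S D: the supremum over all points of |ECDF_x - ECDF_y|."""
    points = sorted(set(xs) | set(ys))
    nx, ny = len(xs), len(ys)
    d = 0.0
    for p in points:
        fx = sum(1 for v in xs if v <= p) / nx   # ECDF of xs at p
        fy = sum(1 for v in ys if v <= p) / ny   # ECDF of ys at p
        d = max(d, abs(fx - fy))
    return d

# Identical severity distributions -> D = 0.
print(ks_d_statistic([1, 2, 3, 4], [1, 2, 3, 4]))  # 0.0
# Completely separated distributions -> D = 1.
print(ks_d_statistic([1, 1, 2], [4, 5, 5]))        # 1.0
```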
Answer RQ4: Concurrency bugs are considered to be more severe than non-concurrency bugs, but the difference is relatively small.
Sorted concurrency bugs severity
Most frequent concurrency bug types: Data race (59%), Order violation (18%), Atomicity violation (5%)
4.5 RQ5: How much time is required to fix the most severe concurrency bugs compared to the most severe non-concurrency bugs?
We analyze the fixing times of the most severe concurrency bugs and the most severe non-concurrency bugs to investigate how fast the severest bugs get fixed. In the severity classification given in Section 3.3, the "Blocker" group is defined for categorizing the bugs with the highest priority; we therefore extracted the bugs tagged as "Blocker" from our concurrency and non-concurrency bug data sets. We also calculated the fixing time for each bug as explained in Section 3.2, and analyzed the fixing times of the bugs with "Blocker" severity in order to compare the most severe concurrency and non-concurrency bugs.
Fixing time for the bugs with Blocker severity
In addition, to evaluate whether there is any statistical difference between the two data sets (fixing times of blocker concurrency and blocker non-concurrency bugs), we use the Wilcoxon Signed Rank test. We also calculate the Vargha-Delaney A-statistic as a measure of effect size. The tested hypotheses are formulated as follows: the null hypothesis (H0) is that the fixing times of the blocker concurrency and blocker non-concurrency bug sets have identical distributions; the alternative hypothesis (H1) is that the distributions are different. Based on the p-value of 0.308, which is larger than 0.05, we cannot reject H0. That is, the fixing times of blocker concurrency and blocker non-concurrency bugs have statistically identical distributions. We calculated the Vargha-Delaney A-statistic and obtained a value of 0.52, which indicates a "small" standardized effect size. From our results, we can see that the fixing time for blocker concurrency bugs does not differ from the fixing time for blocker non-concurrency bugs, and this similarity corresponds to a "small" standardized effect size.
Answer RQ5: Fixing time for the severest concurrency bugs is statistically similar to fixing time for the severest non-concurrency bugs.
In our study, we find a much smaller share of concurrency bugs than reported by other similar studies. This could be due to differences in the bug-source software and case studies between our study and the others. An interesting observation is that 52% of the 351 concurrency bugs were reported in the five-year interval 2006-2010, and the remaining 48% in the five-year interval 2011-2015.
Similarly, prior studies found the fixing time for concurrency bugs to be much larger than for non-concurrency bugs. We find a difference, but it is relatively small. The large fixing times in other studies could be due to reproducing the bugs using the bug scenarios in the bug descriptions. In our study, however, as shown in Fig. 6, we find few reports stating difficulties in reproducing the bug (4% compared to 77%).
A concurrency bug, by definition, involves more than one thread. For this reason, we expected to find that concurrency bugs were more severe than non-concurrency bugs. We also expected most of the "Blocker" bugs to belong to the Deadlock type, due to its characteristics and properties, but this was not the case: in our study, Data race made up the largest portion of the "Blocker" bugs. Our interpretation is that data races are the most problematic bugs to fix in these projects.
Moreover, our investigation shows that about half of the concurrency bugs are of the Data race type. The reason could be that there are not many approaches for avoiding data race bugs in programming, or that the approaches are not well developed and explained for developers. About 13% of the bugs belong to the Suspension type. By investigating the bug reports' descriptions and comments, we noticed that most of the Suspension bugs occurred when the developer put the code in waiting mode for an unnecessarily long time (by using the sleep() or wait() method), thereby causing a Suspension bug. We also noticed that developers (or testers) were confused when confronted with Order violation bugs. They were typically unable to easily find the correct order of thread execution (the bug reports contain long discussions between the developers to figure out the real cause of the bug). In addition, as we expected, Livelock has the lowest frequency among the concurrency bug types. From the reports' descriptions and comments, we understood that most developers were aware of livelock and took care to avoid this type of bug during implementation.
Our analysis of unreproducible concurrency bugs shows that only 513 of the reported bugs (∼4%) were unreproducible. Out of these 513 bugs, 203 (∼40%) have at least one concurrency keyword in their descriptions, while overall 647 bugs out of 11860 (∼5%) contain concurrency keywords. Our interpretation is that the share of unreproducible concurrency bugs is similar to the share of unreproducible non-concurrency bugs.
Furthermore, by comparing the two sets of keyword-matched bugs, i.e., the closed and fixed keyword-matched bugs and the unreproducible keyword-matched bugs, we observe that the former set is about twice the size of the latter.
5.1 Validity threats
We chose five projects for our study. Based on their characteristics (given in Section 2.1), we consider the projects likely to be representative of browsers, desktop GUI, and server programs that use concurrency. However, other projects (e.g., projects implemented in other programming languages) might show different concurrency bug characteristics. For instance, prior studies [1, 7] show that deadlocks in MySQL represent 40% of the total number of concurrency bugs, while in our projects deadlocks account for a mere 13% of the concurrency bugs. Nevertheless, for atomicity violations and order violations, our results are similar to prior findings.
Reports with other statuses, such as "In Progress" (i.e., the issue is being actively worked on by the assignee) or "Open" (i.e., the issue is in the initial "Open" state, ready for the assignee to start work on it), were not considered in this study. There is hence a chance that we missed relevant reports from these groups.
It is possible, or even likely, that the search query failed to identify some actual concurrency bugs. It should however be noted that we used more keywords than previous similar studies (see e.g., [1, 7]), and we argue that a concurrency bug report that does not contain any of the keywords is likely to be incomplete.
Some concurrency bugs might go unfixed because they occur infrequently or only on certain platforms or software configurations. It would be interesting to consider these kinds of bugs, but they are unlikely to have detailed discussions. Consequently, these bugs are not treated as being as important as the reported and fixed concurrency bugs used in our study.
We used seven categories (types) and manual categorization for concurrency bugs. We excluded bugs that did not have sufficiently detailed information to be categorized. This procedure may have led to erroneously discarding some concurrency bugs.
It should be noted that the fixing-time measure used is elapsed calendar time and hence does not necessarily reflect the actual effort spent. It could, for instance, be the case that less severe bugs are put on hold when efforts need to be devoted to more severe bugs or other tasks. However, it seems likely that, at least at an aggregated level, there is a reasonable positive correlation between calendar time and effort.
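The fixing-time measure can be made concrete as the difference between a report's creation and resolution timestamps. The sketch below illustrates this; the date format and function name are assumptions, not taken from the study's tooling.

```python
from datetime import datetime

# Sketch: fixing time as elapsed calendar time between a bug report's
# creation and resolution dates (format and field names are illustrative).
def fixing_time_days(created: str, resolved: str) -> int:
    fmt = "%Y-%m-%d"
    delta = datetime.strptime(resolved, fmt) - datetime.strptime(created, fmt)
    return delta.days

# A bug opened on 1 March and resolved on 15 March has a fixing time of
# 14 calendar days, regardless of how much effort was actually spent on it.
days = fixing_time_days("2015-03-01", "2015-03-15")
```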
Since the unreproducible bug reports lacked sufficiently detailed discussions, we could not go further and map their descriptions (or explanations) to our concurrency bug classification. However, we believe that the number of extracted bugs provides an overview of unreproducible bugs in general.
Even though the obtained results (for all research questions RQ1–RQ5) are based on data samples from the five projects, they might apply to other software as well. Further analysis is required to confirm whether this is in fact the case.
6 Related work
A series of related studies on debugging, predicting, and fixing concurrent software have been conducted. In particular, there is a large body of work on prediction [29–32] and propagation [33, 34] of bugs in source code. Most of these studies strive to identify the components or source code files that are most prone to contain bugs. Fault prediction partially focuses on understanding the behavior of programmers and its effects on software reliability. That body of work is complementary to our study, which concentrates on a specific type of bug (concurrency bugs) and on understanding their consequences.
Vandiver et al. analyzed the consequences of bugs for three database systems. Their work focuses on presenting a replication architecture rather than on studying bugs. The authors did not distinguish between concurrency and non-concurrency bugs, and only evaluated whether the bugs caused crashes or Byzantine faults.
Chandra and Chen investigated the bug databases of three open-source applications (the Apache web server, the GNOME desktop environment, and the MySQL database), again in a study with a slightly different focus than ours. The authors analyzed all types of bugs (only 12 of them were concurrency bugs) to determine the effectiveness of generic recovery techniques in tolerating the bugs. Concurrency bugs are only one possible type of bug affecting their results. In contrast, following our main objective, we narrow our scope to concurrency bugs but provide a broader analysis (comparing concurrency and non-concurrency bugs) that takes several types of these bugs into consideration.
Farchi et al. analyzed concurrency bugs by creating such bugs artificially: they asked programmers to write code containing concurrency bugs. We believe that artificially created bugs may not be representative of real-world software bugs. We, on the other hand, analyze the bug databases of well-maintained and widely used open-source software.
Lu et al. examined the bug patterns, manifestation, and fix strategies of 105 randomly selected real-world concurrency bugs from the bug databases of four open-source applications (MySQL, Apache, Mozilla, and OpenOffice). Their study focused on several aspects of the causes of concurrency bugs, and its study of their effects was limited to determining whether they caused deadlocks or not. We use a similar methodology to find relevant bug reports, but provide a complementary angle by studying the effects of recent concurrency bugs with a more fine-grained classification than mapping bugs into deadlock and non-deadlock classes.
7 Conclusion and future work
This paper provides a comprehensive study of 11860 fixed bug reports from five widely used open source projects designed for big-data applications. The study covers fixed bug reports from the last ten years, with the purpose of understanding the differences between concurrency and non-concurrency bugs. Two aspects of these reports are examined: fixing time and severity. Based on a structured selection process, we ended up with 351 concurrency bugs and 351 (sampled) non-concurrency bugs. By analyzing these reports, we have identified the frequencies of concurrency, non-concurrency, and unreproducible bugs. The study also helped us recognize the most common types of concurrency bugs in terms of severity and fixing time. The main results of this study are: (1) Only a small share of bugs is related to concurrency, while the vast majority are non-concurrency bugs. (2) Similarly, only a small share of bugs are unreproducible and related to concurrency issues (2%). (3) Fixing time for concurrency and non-concurrency bugs differs, but the difference is relatively small. (4) Concurrency and non-concurrency bugs differ in terms of severity, with concurrency bugs being more severe than non-concurrency bugs. (5) Fixing time for the most severe concurrency bugs and the most severe non-concurrency bugs is similar. These findings could help software designers and developers understand how to address concurrency bugs, estimate the most time-consuming ones, and prioritize them to speed up the debugging and bug-fixing processes.
2 MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks.
8 Bug is “a problem which impairs or prevents the functions of the product”.
9 This issue (bug) could not be reproduced at this time, or not enough information was available to reproduce the issue.
10 Bug reproducibility indicates the reproduction of software failure(s) caused by bug(s).
This research is supported by Swedish Research Council (VR) through the EXACT project, and the Swedish Knowledge Foundation (KKS) through grant 20130258. Additionally, we would like to thank Eduard Paul Enoiu for interesting discussions and valuable feedback.
SAA is the main driver and author of all parts. DS, SE and HH have contributed with ideas, discussion, review, and revision of all sections of the paper. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
- Lu S, Park S, Seo E, Zhou Y. Learning from mistakes: a comprehensive study on real world concurrency bug characteristics. In: ACM SIGPLAN Notices. Washington: ACM: 2008. p. 329–39.
- Peri R. Software development tools for multi-core/parallel programming. In: 6th Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging. Seattle: ACM: 2008. p. 9.
- Zhang W, Sun C, Lim J, Lu S, Reps T. ConMem: Detecting crash-triggering concurrency bugs through an effect-oriented approach. ACM Trans Softw Eng Methodol (TOSEM). 2013; 22(2):10.
- DeSouza J, Kuhn B, De Supinski BR, Samofalov V, Zheltov S, Bratanov S. Automated, scalable debugging of MPI programs with Intel® Message Checker. In: Proceedings of the Second International Workshop on Software Engineering for High Performance Computing System Applications. St. Louis: ACM: 2005. p. 78–82.
- Godefroid P, Nagappan N. Concurrency at Microsoft: An exploratory survey. In: CAV Workshop on Exploiting Concurrency Efficiently and Correctly. Princeton: 2008.
- Süß M, Leopold C. Common mistakes in OpenMP and how to avoid them. In: OpenMP Shared Memory Parallel Programming. Academic Press Ltd.: 2008. p. 312–23.
- Fonseca P, Li C, Singhal V, Rodrigues R. A study of the internal and external effects of concurrency bugs. In: Dependable Systems and Networks (DSN), 2010 IEEE/IFIP International Conference on. Chicago: IEEE: 2010. p. 221–30.
- Joshi SB. Apache Hadoop performance-tuning methodologies and best practices. In: Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering. ICPE ’12. New York: ACM: 2012. p. 241–2.
- Polato I, Ré R, Goldman A, Kon F. A comprehensive view of Hadoop research: A systematic literature review. J Netw Comput Appl. 2014; 46:1–25.
- Medeiros A. ZooKeeper’s atomic broadcast protocol: Theory and practice. Technical report. 2012.
- Kamburugamuve S, Fox G, Leake D, Qiu J. Survey of Apache big data stack. Indiana University, Tech. Rep. 2013.
- Dendek PJ, Czeczko A, Fedoryszak M, Kawa A, Wendykier P, Bolikowski Ł. Content analysis of scientific articles in Apache Hadoop ecosystem. In: Intelligent Tools for Building a Scientific Information Platform: From Research to Implementation. Springer: 2014. p. 157–72.
- Shoro AG, Soomro TR. Big data analysis: Apache Spark perspective. Glob J Comput Sci Technol. 2015; 15(1).
- Abbaspour Asadollah S, Hansson H, Sundmark D, Eldh S. Towards classification of concurrency bugs based on observable properties. In: 1st International Workshop on Complex Faults and Failures in Large Software Systems. Italy: IEEE Press: 2015.
- What is an Issue - Atlassian Documentation. 2015. https://confluence.atlassian.com/jira063/what-is-an-issue-683542485.html. Accessed 12 June 2016.
- Abbaspour Asadollah S, Sundmark D, Eldh S, Hansson H, Afzal W. 10 years of research on debugging concurrent and multicore software: a systematic mapping study. Softw Qual J. 2016; 25:1–34.
- Yoshiura N, Wei W. Static data race detection for Java programs with dynamic class loading. In: Internet and Distributed Computing Systems. Springer: 2014. p. 161–73.
- Henningsson K, Wohlin C. Assuring fault classification agreement - an empirical evaluation. In: International Symposium on Empirical Software Engineering, ISESE’04. Redondo Beach: IEEE Computer Society: 2004. p. 95–104.
- Akhter S, Roberts J, Vol. 33. Multi-core Programming: Richard Bowles; 2006.
- Bhatia Y, Verma S. Deadlocks in distributed systems. Int J Res. 2014; 1(9):1249–52.
- Gove D. Multicore Application Programming: For Windows, Linux, and Oracle Solaris: Addison-Wesley Professional; 2010.
- Chapman B, Jost G, Van Der Pas R, Vol. 10. Using OpenMP: Portable Shared Memory Parallel Programming: MIT Press; 2008.
- Stallings W. Operating Systems: Internals and Design Principles. 7th ed. Prentice Hall; 2012.
- Lin S, Wellings A, Burns A. Supporting lock-based multiprocessor resource sharing protocols in real-time programming languages. Concurr Comput Pract Experience. 2013; 25(16):2227–51.
- Jayasinghe D, Xiong P. CORE: Visualization tool for fault localization in concurrent programs. Citeseer; 2010.
- Park S, Vuduc RW, Harrold MJ. Falcon: Fault localization in concurrent programs. In: Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1. New York: ACM: 2010. p. 245–54.
- Qi S, Otsuki N, Nogueira LO, Muzahid A, Torrellas J. Pacman: Tolerating asymmetric data races with unintrusive hardware. In: High Performance Computer Architecture, 2012 IEEE 18th International Symposium on. USA: IEEE: 2012. p. 1–12.
- Vargha A, Delaney HD. A critique and improvement of the CL common language effect size statistics of McGraw and Wong. J Educ Behav Stat. 2000; 25(2):101–32.
- Neuhaus S, Zimmermann T, Holler C, Zeller A. Predicting vulnerable software components. In: Proceedings of the 14th ACM Conference on Computer and Communications Security. Alexandria: ACM: 2007. p. 529–40.
- Nagappan N, Ball T. Static analysis tools as early indicators of pre-release defect density. In: 27th International Conference on Software Engineering. Saint Louis: ACM: 2005. p. 580–6.
- Rahman F, Khatri S, Barr ET, Devanbu P. Comparing static bug finders and statistical prediction. In: Proceedings of the 36th International Conference on Software Engineering. Hyderabad: ACM: 2014. p. 424–34.
- Lewis C, Lin Z, Sadowski C, Zhu X, Ou R, Whitehead EJ. Does bug prediction support human developers? Findings from a Google case study. In: Software Engineering (ICSE), 2013 35th International Conference on. San Francisco: IEEE: 2013. p. 372–81.
- Voinea L, Telea A. How do changes in buggy Mozilla files propagate? In: Proceedings of the 2006 ACM Symposium on Software Visualization. Brighton: ACM: 2006. p. 147–8.
- Pan WF, Li B, Ma YT, Qin YY, Zhou XY. Measuring structural quality of object-oriented softwares via bug propagation analysis on weighted software networks. J Comput Sci Technol. 2010; 25(6):1202–13.
- Vandiver B, Balakrishnan H, Liskov B, Madden S. Tolerating byzantine faults in transaction processing systems using commit barrier scheduling. ACM SIGOPS Oper Syst Rev. 2007; 41(6):59–72.
- Chandra S, Chen PM. Whither generic recovery from application faults? A fault study using open-source software. In: Dependable Systems and Networks, 2000. DSN 2000. Proceedings International Conference on. New York: IEEE: 2000. p. 97–106.
- Farchi E, Nir Y, Ur S. Concurrent bug patterns and how to test them. In: Parallel and Distributed Processing Symposium. Washington: IEEE: 2003. p. 7.
- Dean J, Ghemawat S. MapReduce: Simplified data processing on large clusters. Commun ACM. 2008; 51(1):107–13.