Statistical Science

Opportunities and Challenges Applying Functional Data Analysis to the Study of Open Source Software Evolution

Katherine J. Stewart, David P. Darcy, and Sherae L. Daniel
Source: Statist. Sci. Volume 21, Number 2 (2006), 167-178.

Abstract

This paper explores the application of functional data analysis (FDA) as a means to study the dynamics of software evolution in the open source context. Several challenges in analyzing the data from software projects are discussed, an approach to overcoming those challenges is described, and preliminary results from the analysis of a sample of open source software (OSS) projects are provided. The results demonstrate the utility of FDA for uncovering and categorizing multiple distinct patterns of evolution in the complexity of OSS projects. These results are promising in that they demonstrate some patterns in which the complexity of software decreased as the software grew in size, a particularly novel result. The paper reports preliminary explorations of factors that may be associated with decreasing complexity patterns in these projects. The paper concludes by describing several next steps for this research project as well as some questions for which more sophisticated analytical techniques may be needed.

Full-text: Open access
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.ss/1154979819
Digital Object Identifier: doi:10.1214/088342306000000141
Mathematical Reviews number (MathSciNet): MR2324076
Zentralblatt MATH identifier: 05191858


2012 © Institute of Mathematical Statistics
