The Annals of Applied Statistics

Analysis of dependence among size, rate and duration in internet flows

Cheolwoo Park, Felix Hernández-Campos, J. S. Marron, Kevin Jeffay, and F. Donelson Smith


Abstract

In this paper we rigorously examine the evidence for dependence among data size, transfer rate and duration in Internet flows. We emphasize two statistical approaches for studying dependence: Pearson’s correlation coefficient and the extremal dependence analysis method. We apply these methods to large data sets of packet traces from three networks. Our major results show that Pearson’s correlation coefficients between size and duration are much smaller than one might expect. We also find that correlation coefficients between size and rate are generally small and can be strongly affected by applying thresholds to size or duration. Based on Transmission Control Protocol connection startup mechanisms, we argue that thresholds on size should be more useful than thresholds on duration in the analysis of correlations. Using extremal dependence analysis, we reach a similar conclusion, finding remarkable independence for extremal values of size and rate.
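
As a rough illustration of how thresholding can change a size-rate correlation, the following sketch computes Pearson's correlation coefficient on synthetic flows. Everything in it is an assumption made for illustration and is not taken from the paper: the lognormal distributions, the log scale, the threshold values, and the helper names `pearson` and `size_rate_correlation`. Size and rate are generated independently, so any correlation that appears after thresholding is a selection effect in this toy model. It is not the TCP startup argument made by the authors.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def size_rate_correlation(flows, min_size=0.0, min_duration=0.0):
    """Correlation of log(size) and log(rate) over flows that pass both thresholds.

    `flows` is a list of (size, duration) pairs; rate is derived as size / duration.
    """
    kept = [(s, d) for s, d in flows if s >= min_size and d >= min_duration]
    log_size = [math.log(s) for s, _ in kept]
    log_rate = [math.log(s / d) for s, d in kept]
    return pearson(log_size, log_rate)


# Synthetic flows (assumption): size and rate are independent lognormal draws,
# and duration is derived from them as size / rate.
rng = random.Random(0)
flows = []
for _ in range(20000):
    size = rng.lognormvariate(9.0, 2.0)    # bytes
    rate = rng.lognormvariate(10.0, 1.5)   # bytes per second
    flows.append((size, size / rate))      # (size, duration in seconds)

r_all = size_rate_correlation(flows)                    # no threshold
r_size = size_rate_correlation(flows, min_size=1e5)     # keep large flows only
r_dur = size_rate_correlation(flows, min_duration=1.0)  # keep long flows only

print(f"no threshold:       {r_all:+.3f}")
print(f"size >= 100 kB:     {r_size:+.3f}")
print(f"duration >= 1 sec:  {r_dur:+.3f}")
```

In this toy setting the unthresholded and size-thresholded correlations stay close to zero, because selecting on size does not disturb the independence of size and rate. The duration threshold keeps only flows whose size exceeds their rate, which induces a clearly positive correlation. Real traces need not behave this way.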

Article information

Source
Ann. Appl. Stat., Volume 4, Number 1 (2010), 26-52.

Dates
First available in Project Euclid: 11 May 2010

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1273584446

Digital Object Identifier
doi:10.1214/09-AOAS268

Mathematical Reviews number (MathSciNet)
MR2758083

Zentralblatt MATH identifier
1189.62101

Keywords
Correlation analysis; extremal dependence analysis; internet flows; network performance; thresholding

Citation

Park, Cheolwoo; Hernández-Campos, Felix; Marron, J. S.; Jeffay, Kevin; Smith, F. Donelson. Analysis of dependence among size, rate and duration in internet flows. Ann. Appl. Stat. 4 (2010), no. 1, 26--52. doi:10.1214/09-AOAS268. https://projecteuclid.org/euclid.aoas/1273584446


