The Annals of Probability

An Information-Geometric Approach to a Theory of Pragmatic Structuring

Nihat Ay

Full-text: Open access


Within the framework of information geometry, the interaction among units of a stochastic system is quantified in terms of the Kullback–Leibler divergence of the underlying joint probability distribution from an appropriate exponential family. In the present paper, the main example for such a family is given by the set of all factorizable random fields. Motivated by this example, the locally farthest points from an arbitrary exponential family $\mathcal{E}$ are studied. In the corresponding dynamical setting, such points can be generated by the structuring process with respect to $\mathcal{E}$ as a repelling set. The main results concern the low complexity of such distributions, which can be controlled by the dimension of $\mathcal{E}$.
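For the paper's main example, the divergence of a joint distribution from the family of factorizable distributions is the multi-information: the KL divergence from the product of the marginals, which is the closest factorizable distribution. A minimal sketch of this quantity for a discrete joint distribution, using NumPy (the function name `multi_information` is our own illustrative choice, not notation from the paper):

```python
import numpy as np

def multi_information(p):
    """KL divergence of a joint pmf from the product of its marginals (nats).

    p: array of shape (n1, ..., nk), nonnegative entries summing to 1,
       one axis per unit of the stochastic system.
    """
    p = np.asarray(p, dtype=float)
    # Build the product of marginals; it is the closest point to p in the
    # family of factorizable distributions.
    q = np.ones_like(p)
    for axis in range(p.ndim):
        other_axes = tuple(a for a in range(p.ndim) if a != axis)
        marginal = p.sum(axis=other_axes)
        shape = [1] * p.ndim
        shape[axis] = p.shape[axis]
        q = q * marginal.reshape(shape)
    mask = p > 0  # 0 * log 0 is taken as 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Two perfectly correlated bits: the divergence equals log 2,
# i.e. these are among the farthest points from the factorizable family.
p_corr = np.array([[0.5, 0.0], [0.0, 0.5]])
print(multi_information(p_corr))  # ≈ 0.6931

# Independent bits lie in the family itself: divergence 0.
p_ind = np.outer([0.5, 0.5], [0.5, 0.5])
print(multi_information(p_ind))  # ≈ 0.0
```

The correlated example illustrates the paper's theme: distributions locally farthest from the factorizable family have low complexity, here supported on only two of the four joint states.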

Article information

Ann. Probab., Volume 30, Number 1 (2002), 416-436.

First available in Project Euclid: 29 April 2002


Primary: 62H20: Measures of association (correlation, canonical correlation, etc.); 92B20: Neural networks, artificial life and related topics [See also 68T05, 82C32, 94Cxx]; 62B05: Sufficient statistics and fields; 53B05: Linear and affine connections

Keywords: information geometry; Kullback–Leibler divergence; mutual information; infomax principle; stochastic interaction; exponential family


Ay, Nihat. An Information-Geometric Approach to a Theory of Pragmatic Structuring. Ann. Probab. 30 (2002), no. 1, 416--436. doi:10.1214/aop/1020107773.



  • [1] AMARI, S.-I. (1985). Differential-Geometric Methods in Statistics. Lecture Notes in Statist. 28. Springer, Berlin.
  • [2] AMARI, S.-I. (1997). Information geometry. Contemp. Math. 203 81-95.
  • [3] AMARI, S.-I. (2001). Information geometry on hierarchy of probability distributions. IEEE Trans. Inform. Theory 47 1701-1711.
  • [4] AMARI, S.-I. and NAGAOKA, H. (2000). Methods of Information Geometry. Math. Monogr. 191. Oxford Univ. Press.
  • [5] AMARI, S.-I., BARNDORFF-NIELSEN, O. E., KASS, R. E., LAURITZEN, S. L. and RAO, C. R. (1987). Differential Geometry in Statistical Inference. IMS, Hayward, CA.
  • [6] AY, N. (2000). Aspekte einer Theorie pragmatischer Informationsstrukturierung. Ph.D. dissertation, Univ. Leipzig.
  • [7] BOOTHBY, W. M. (1975). An Introduction to Differentiable Manifolds and Riemannian Geometry. Pure Appl. Math. 63. Academic Press, New York.
  • [8] BRØNDSTED, A. (1983). An Introduction to Convex Polytopes. Springer, New York.
  • [9] COVER, T. M. and THOMAS, J. A. (1991). Elements of Information Theory. Wiley-Interscience, New York.
  • [10] CSISZÁR, I. (1967). On topological properties of f-divergences. Studia Sci. Math. Hungar. 2 329-339.
  • [11] CSISZÁR, I. (1975). I-divergence geometry of probability distributions and minimization problems. Ann. Probab. 3 146-158.
  • [12] DECO, G. and OBRADOVIC, D. (1996). An Information-Theoretic Approach to Neural Computing. Perspectives in Neural Computing. Springer, New York.
  • [13] FUJIWARA, A. and AMARI, S.-I. (1995). Gradient systems in view of information geometry. Phys. D 80 317-327.
  • [14] GZYL, H. (1995). The Method of Maximum Entropy. Ser. Adv. Math. Appl. Sci. 29. World Scientific, Singapore.
  • [15] HIRSCH, M. and SMALE, S. (1974). Differential Equations, Dynamical Systems, and Linear Algebra. Academic Press, New York.
  • [16] INGARDEN, R. S., KOSSAKOWSKI, A. and OHYA, M. (1997). Information Dynamics and Open Systems, Classical and Quantum Approach. Kluwer, Dordrecht.
  • [17] JAYNES, E. T. (1957). Information theory and statistical mechanics. Phys. Rev. 106 620-630.
  • [18] KULLBACK, S. (1968). Information Theory and Statistics. Dover, Mineola, NY.
  • [19] KULLBACK, S. and LEIBLER, R. A. (1951). On information and sufficiency. Ann. Math. Statist. 22 79-86.
  • [20] LINSKER, R. (1988). Self-organization in a perceptual network. Computer 21 105-117.
  • [21] MARTIGNON, L., VON HASSELN, H., GRÜN, S., AERTSEN, A. and PALM, G. (1995). Detecting higher-order interactions among the spiking events in a group of neurons. Biol. Cybernet. 73 69-81.
  • [22] MURRAY, M. K. and RICE, J. W. (1994). Differential Geometry and Statistics. Chapman and Hall, London.
  • [23] NAGAOKA, H. and AMARI, S.-I. (1982). Differential geometry of smooth families of probability distributions. METR 82-7, Univ. Tokyo.
  • [24] NAKAMURA, Y. (1993). Completely integrable gradient systems on the manifolds of Gaussian and multinomial distributions. Japan J. Indust. Appl. Math. 10 179-189.
  • [25] RAO, C. R. (1945). Information and the accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc. 37 81-91.
  • [26] ROCKAFELLAR, R. T. and WETS, R. J.-B. (1998). Variational Analysis. Springer, New York.
  • [27] ROMAN, S. (1992). Coding and Information Theory. Springer, New York.
  • [28] SHANNON, C. E. (1948). A mathematical theory of communication. Bell System Tech. J. 27 379-423, 623-656.
  • [29] VAPNIK, V. (1998). Statistical Learning Theory. Adaptive and Learning Systems for Signal Processing, Communications, and Control. Wiley, New York.
  • [30] VAPNIK, V. and CHERVONENKIS, A. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl. 16 264-280.
  • [31] WEBSTER, R. (1994). Convexity. Oxford Univ. Press.