The Annals of Applied Statistics

Dating medieval English charters

Gelila Tilahun, Andrey Feuerverger, and Michael Gervers



Deeds, or charters, dealing with property rights provide continuous documentation that historians can use to study social, economic and political change. This study concerns charters (written in Latin) dating from the tenth through early fourteenth centuries in England. Of these, at least one million were left undated, largely due to administrative changes introduced by William the Conqueror in 1066. Correctly dating such charters is of vital importance in the study of English medieval history. The paper considers computer-automated statistical methods for dating such document collections, with the goal of reducing the considerable effort required to date them manually and of improving the accuracy of the assigned dates. The proposed methods are based on such data as the variation over time of word and phrase usage, and on measures of distance between documents. The extensive (and dated) Documents of Early England Data Set (DEEDS), maintained at the University of Toronto, was used for this purpose.
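
To make the idea of dating by "measures of distance between documents" concrete, the following is a minimal sketch and not the authors' method. It assumes a Jaccard distance on two-word shingles and a Gaussian kernel-weighted average of the known dates. The function names, the bandwidth value and the shingle length are all hypothetical choices made for illustration.

```python
import math


def shingles(text, k=2):
    """Return the set of k-word shingles (runs of consecutive words)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|; taken to be 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def estimate_date(undated_text, dated_corpus, bandwidth=0.5, k=2):
    """Kernel-weighted mean of known dates (illustrative assumption).

    dated_corpus is a list of (text, year) pairs. Documents that are
    closer to the undated text in shingle distance get larger weights.
    """
    if not dated_corpus:
        raise ValueError("dated_corpus must contain at least one document")
    target = shingles(undated_text, k)
    weighted_sum = 0.0
    weight_total = 0.0
    for text, year in dated_corpus:
        d = jaccard_distance(target, shingles(text, k))
        w = math.exp(-((d / bandwidth) ** 2))  # Gaussian kernel weight
        weighted_sum += w * year
        weight_total += w
    return weighted_sum / weight_total
```

With a dated corpus of `(text, year)` pairs, `estimate_date(text, corpus)` returns a value between the earliest and latest known years, pulled toward the dates of the documents whose wording most resembles the undated text. In practice the bandwidth would need to be tuned, for example by cross-validation on the dated documents.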

Article information

Ann. Appl. Stat. Volume 6, Number 4 (2012), 1615-1640.

First available in Project Euclid: 27 December 2012


Tilahun, Gelila; Feuerverger, Andrey; Gervers, Michael. Dating medieval English charters. Ann. Appl. Stat. 6 (2012), no. 4, 1615-1640. doi:10.1214/12-AOAS566.

