Document status: draft
Last modified: 2009-01-30
This document is addressed to Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) data providers whose repositories contain mathematical literature. It describes a set of recommended best practices for producing unqualified (simple) Dublin Core records to be shared via OAI-PMH harvesting. For more information about OAI-PMH, see http://www.openarchives.org.
The intent of these recommendations is to make shared simple Dublin Core metadata more immediately useful by defining common practices and conventions.
The recommendations here represent a profile of simple Dublin Core. Unless otherwise noted, all DCMI policies and recommendations regarding the use of simple DC apply.
These recommendations were originally drafted by:
OAI-PMH sets provide a mechanism to group records and thereby support selective harvesting by service providers. Recommendations made here about establishing OAI-PMH sets apply to harvesting only. No recommendations are made for how harvested records are subsequently grouped or made accessible to users.
In general, it is recommended that OAI-PMH sets be used and that they be based on publication entities and/or types, as opposed to subjects or other criteria. For serial literature, it is recommended that data providers establish sets that correspond to title-level bibliographic entities (i.e., journals). For conference proceedings, a set should represent a conference series as a whole. For monographs, sets should be made for any monographic series. Arbitrary general sets (e.g., "Cornell Historical Mathematics Monographs") can be created to hold individual monographs that otherwise belong to no bibliographically defined collection. Avoid sets made up of a single monograph.
The OAI-PMH setName should contain the name of the publication entity included in the set. For example:
<setName>Journal of Applied Mathematics</setName>
<setName>Institute of Mathematical Statistics Lecture Notes - Monograph Series</setName>
<setName>Mémoires de la Société mathématique de France</setName>
<setName>Astérisque</setName>
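
As a minimal sketch (Python, standard library only), the snippet below builds an OAI-PMH set entry whose setName carries the publication name, and the ListRecords request a service provider would send to harvest only that set. The helper names, the setSpec value "journals:jam", and the base URL are hypothetical. The default metadataPrefix "oai_dc" is an assumption; it is the prefix every OAI-PMH repository must support, but a provider may expose this profile under another prefix.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def set_element(set_spec: str, set_name: str) -> ET.Element:
    """Build an OAI-PMH <set> entry, as it would appear in a ListSets response."""
    entry = ET.Element(f"{{{OAI_NS}}}set")
    # setSpec is the machine-readable key used in harvesting requests (hypothetical value).
    ET.SubElement(entry, f"{{{OAI_NS}}}setSpec").text = set_spec
    # setName holds the human-readable name of the publication entity.
    ET.SubElement(entry, f"{{{OAI_NS}}}setName").text = set_name
    return entry


def selective_harvest_url(base_url: str, set_spec: str, prefix: str = "oai_dc") -> str:
    """Return a ListRecords request restricted to one set (selective harvesting)."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": prefix, "set": set_spec})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    entry = set_element("journals:jam", "Journal of Applied Mathematics")
    print(ET.tostring(entry, encoding="unicode"))
    print(selective_harvest_url("http://example.org/oai", "journals:jam"))
```

Because urlencode percent-encodes the colon in the setSpec, the request stays valid even for hierarchical set keys.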
| Term | Title |
|---|---|
| Recommendations | #1: Provide at least one title in every dml_dc record. #2: Include only one title per title element. Repeat the title element for each additional title. Treat translations and transcriptions as additional titles. #3: If multiple titles are included in the record, the data provider should determine which title is most appropriate in a shared context and place that title first. #4: Keep the title and subtitle within a single title element. #5: Encode mathematical expressions in TeX, following recommended TeX usage (note 1). #6: Encode any non-math special characters in UTF-8 (i.e., not in TeX). |
| Examples | <title>Super Riemann surfaces: uniformization and Teichmüller theory</title> <title>Généralités sur les groupes algébriques affines. Groupes algébriques affines commutatifs</title> <title>Geometry of $\mathrm {SU}(2)$ gauge fields</title> <title>Sur quelques définitions possibles de l'intégrale de Stieltjes</title> <title>à propos de la série $\sum_{n = 1}^{+ \infty} \frac{x^n}{q^n - 1}$</title> <title>Diamond representations of $\mathfrak{sl}(n)$</title> <title>Détermination finie de singularités dicritiques dans $(\mathbb{C}^2,0)$</title> |
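
Recommendations #5 and #6 (which recur for subject and description) can be checked mechanically. A minimal sketch, assuming math is always delimited by $...$ or $$...$$ and that literal dollar signs do not occur in the prose: remove the math segments and report any TeX control sequence left over, such as an accent command that should have been a UTF-8 character. The function name tex_outside_math is hypothetical.

```python
import re

# Math segments delimited by $$...$$ or $...$ (assumes no literal "$" in prose).
_MATH = re.compile(r"\$\$.*?\$\$|\$.*?\$", re.DOTALL)
# A TeX control sequence: backslash plus a word (\mathrm) or one symbol (\' or \").
_CONTROL_SEQ = re.compile(r"\\(?:[A-Za-z]+|[^A-Za-z\s])")


def tex_outside_math(text: str) -> list[str]:
    """Return TeX control sequences that appear outside math segments."""
    prose = _MATH.sub(" ", text)
    return _CONTROL_SEQ.findall(prose)


if __name__ == "__main__":
    print(tex_outside_math("Geometry of $\\mathrm {SU}(2)$ gauge fields"))  # math only
    print(tex_outside_math("G\\'en\\'eralit\\'es"))  # accents should be UTF-8 instead
```

An empty list means the value follows #5 and #6; anything returned points at markup that belongs in UTF-8 rather than TeX.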
| Term | Creator |
|---|---|
| Recommendations | #1: Include only one name per creator element. Repeat the creator element for each additional name. #2: Use a form of the name that will sort appropriately in an alphabetized listing (e.g., for Western personal names use the inverted form "surname, forename"). #3: Include only one version of an author's name. The data provider should determine which version of a name is most appropriate in a shared context. #4: Encode any special characters (i.e., those beyond 7-bit ASCII) in UTF-8, not in TeX. |
| Examples | <creator>Tamm de Araujo Moreira, Carlos Gustavo</creator> <creator>Hàn Thế, Thành</creator> <creator>Siu, Yum-Tong</creator> <creator>Van der Put, Marius</creator> <creator>De La Vallée Poussin, Charles</creator> <creator>Fermat, Pierre de</creator> <creator>Colin de Verdière, Yves</creator> <creator>Levi-Civita, Tullio</creator> <creator>Abel, Niels Henrik</creator> <creator>Agnesi, Maria Gaetana</creator> <creator>Sanz Serna, Jesús María</creator> |
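
The inverted form is easiest to produce when surname and forenames are stored separately, since splitting a full name string cannot tell where particles such as "de" or "Van der" belong. The sketch below uses hypothetical helper names; its sort key (accent- and case-insensitive) is an assumption for illustration, not a full collation standard such as the Unicode Collation Algorithm.

```python
import unicodedata


def creator_value(surname: str, forenames: str = "") -> str:
    """Format a Western personal name in inverted "surname, forename" order."""
    surname, forenames = surname.strip(), forenames.strip()
    return f"{surname}, {forenames}" if forenames else surname


def sort_key(name: str) -> str:
    """Accent- and case-insensitive key for an alphabetized listing."""
    # NFKD splits "à" into "a" + combining grave; the combining marks are dropped.
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


if __name__ == "__main__":
    names = [
        creator_value("Van der Put", "Marius"),
        creator_value("Fermat", "Pierre de"),
        creator_value("Hàn Thế", "Thành"),
        creator_value("Abel", "Niels Henrik"),
    ]
    for name in sorted(names, key=sort_key):
        print(f"<creator>{name}</creator>")
```

The same helpers apply unchanged to contributor names.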
| Term | Contributor |
|---|---|
| Recommendations | #1: Use the contributor element for an entity responsible for making a contribution to the resource, such as a redactor, translator, or editor. #2: Include only one name per contributor element. Repeat the contributor element for each additional name. #3: Use a form of the name that will sort appropriately in an alphabetized listing (e.g., for Western personal names use the inverted form "surname, forename"). #4: Include only one version of a contributor's name. The data provider should determine which version of a name is most appropriate in a shared context. #5: Encode any special characters (i.e., those beyond 7-bit ASCII) in UTF-8, not in TeX. |
| Examples | <contributor>Shenitzer, A. (translator)</contributor> <contributor>Dieudonné, Jean (redactor)</contributor> |
| Term | Publisher |
|---|---|
| Recommendations | No recommendation is made for how to use this element. |
| Examples | |
| Term | Subject |
|---|---|
| Recommendations | #1: Include only one subject term (which may be a phrase) per subject element. Repeat the subject element for each additional subject. #2: Take subject terms from relevant controlled vocabularies whenever possible, such as the Mathematics Subject Classification (MSC). #3: When using a controlled vocabulary, prefix the subject term with an abbreviation of the vocabulary name (e.g., "msc:14H60"). #4: When subject terms from controlled vocabularies are ranked as primary and secondary, place primary subject terms first. #5: Data providers are encouraged to include natural-language versions of any controlled vocabulary codes (in addition to the codes, not as a substitute for them). #6: Encode mathematical expressions in TeX, following recommended TeX usage (note 1). #7: Encode any non-math special characters in UTF-8 (i.e., not in TeX). |
| Examples | <subject>msc:32S35</subject> <subject>msc:Mixed Hodge theory of singular varieties</subject> <subject>msc:36.0X</subject> <subject>Differential equations</subject> <subject>dewey:516.2</subject> <subject>Géométrie euclidienne</subject> <subject>unesco:1204.02</subject> <subject>Complex manifolds</subject> <subject>nonparametric experiments</subject> <subject>ensembles inévitables</subject> |
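
A minimal sketch of recommendations #3 to #5, with a hypothetical helper name: codes are prefixed with the vocabulary abbreviation, primary codes come before secondary ones, duplicates are dropped, and an optional natural-language label follows its code. Following the "msc:Mixed Hodge theory of singular varieties" example, labels carry the same prefix; that is an assumption, since the recommendations do not say whether labels should be prefixed.

```python
from typing import Iterable, Mapping, Optional


def subject_values(
    primary: Iterable[str],
    secondary: Iterable[str] = (),
    labels: Optional[Mapping[str, str]] = None,
    scheme: str = "msc",
) -> list[str]:
    """Return subject element values, primary codes first."""
    labels = labels or {}
    values: list[str] = []
    for code in [*primary, *secondary]:
        value = f"{scheme}:{code}"
        if value in values:  # skip a code listed twice
            continue
        values.append(value)
        if code in labels:  # label in addition to, not instead of, the code
            values.append(f"{scheme}:{labels[code]}")
    return values


if __name__ == "__main__":
    labels = {"32S35": "Mixed Hodge theory of singular varieties"}
    for v in subject_values(["32S35"], ["14D07"], labels):
        print(f"<subject>{v}</subject>")
```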
| Term | Description |
|---|---|
| Recommendations | #1: Provide an abstract of the resource if available. #2: Include one complete abstract per description element. If the abstract contains multiple paragraphs, insert ¶ (pilcrow sign, U+00B6) between paragraphs. #3: Treat translations and transcriptions as additional abstracts. Repeat the description element for each additional abstract. #4: If multiple descriptions are included in the record, the data provider should determine which one is most appropriate in a shared context and place that description first. #5: Encode mathematical expressions in TeX, following recommended TeX usage (note 1). #6: Encode any non-math special characters in UTF-8 (i.e., not in TeX). |
| Examples | <description>We propose a general theorem providing upper bounds for the risk of an empirical risk minimizer (ERM). We essentially focus on the binary classification framework. We extend Tsybakov's analysis of the risk of an ERM under margin type conditions by using concentration inequalities for conveniently weighted empirical processes.</description> <description>For a prime $p$ and positive integers $\ell <k<h<p$ with $d=(h,k,\ell ,p-1)$, we show that $M$, the number of simultaneous solutions $x, y, z, w$ in $\mathbb{Z}_p^*$ to $x^h+y^h=z^h+w^h$, $x^k+y^k=z^k+w^k$, $x^{\ell }+y^{\ell }=z^{\ell }+w^{\ell }$, satisfies $$M\le 3d^2(p-1)^2+25hk\ell (p-1).$$ ¶ When $hk\ell =o(pd^2)$ we obtain a precise asymptotic count on $M$. This leads to the new twisted exponential sum bound $$\left|\sum _{x=1}^{p-1}\chi (x) e^{2\pi i f(x)/p}\right| \le 3^{\frac{1}{4}}d^{\frac{1}{2}}p^{\frac{7}{8}} + \sqrt{5} \left(hk\ell \right)^{\frac{1}{4}}p^{\frac{5}{8}},$$ for trinomials $f=ax^h+bx^k+cx^\ell $, and to results on the average size of such sums.</description> <description>Soit $X$ un espace analytique complexe normal, soit $S$ un sous-ensemble analytique fermé de $X$, de codimension $\ge 2$, et soit $\mathbf{F}$ un faisceau analytique cohérent sans torsion sur $X-S$. On démontre l'équivalence des trois propriétés suivantes : ¶ (i) L'image directe de $\mathbf{F}$ par l'injection $X-S\rightarrow X$ est un faisceau cohérent sur $X$. ¶ (ii) Il existe un faisceau analytique cohérent sur $X$ qui prolonge $\mathbf{F}$. ¶ (iii) Pour tout $s\in S$, il existe un voisinage ouvert $U$ de $s$ tel que la restriction de $\mathbf{F}$ à $U-S\cap U$ soit engendrée par ses sections (sur $U-S\cap U$). ¶ Les implications (i) $\rightarrow $ (ii) $\rightarrow $ (iii) sont triviales. L'implication (iii) $\rightarrow $ (i) utilise le théorème de Remmert-Stein sur le prolongement des sous-variétés. ¶ Lorsque $X$ est une variété projective, les conditions (i), (ii) et (iii) équivalent à dire que le faisceau $\mathbf{F}$ est ``algébrique''.</description> |
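
Recommendation #2 can be applied when packing an abstract and undone by a harvester. A minimal sketch with hypothetical helper names, assuming the abstract text contains no literal ¶ of its own and no TeX "%" comments (collapsing line breaks would otherwise comment out the rest of the line). The " ¶ " spacing mirrors the examples above.

```python
PILCROW = "\u00b6"  # the pilcrow sign used as paragraph separator


def join_paragraphs(paragraphs: list[str]) -> str:
    """Pack a multi-paragraph abstract into one description value."""
    # Collapse internal whitespace (including line breaks) and drop empty paragraphs.
    parts = [" ".join(p.split()) for p in paragraphs if p.strip()]
    return f" {PILCROW} ".join(parts)


def split_paragraphs(description: str) -> list[str]:
    """Recover the paragraphs of a harvested description value."""
    return [p.strip() for p in description.split(PILCROW) if p.strip()]


if __name__ == "__main__":
    value = join_paragraphs(["First paragraph.", "Second\nparagraph."])
    print(f"<description>{value}</description>")
    print(split_paragraphs(value))
```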
| Term | Date |
|---|---|
| Recommendations | #1: Provide only one date element per record: the formal date of publication of the original resource. #2: Format the date as "YYYY-MM-DD" (year-month-day) or "YYYYMMDD", where MM and DD are optional. That is, use W3C-DTF (with hyphens) or the ISO 8601 basic format (without hyphens). #3: At a minimum, provide the year of publication. #4: Provide the month, or the month and day, if appropriate and available, but do not include either if meaningless (i.e., do not let software extend dates automatically merely to achieve a 6- or 8-digit string, such as "1999-01-01"). #5: Do not put timestamps, such as "2003-04-24T13:15:52Z", in the date element. |
| Examples | <date>2005</date> <date>2005-06</date> <date>20050615</date> |
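
A validator for these rules, sketched with Python's re and datetime modules (the function name is hypothetical). It accepts exactly the shapes listed in #2, rejects timestamps (#5), and checks that month and day are real calendar values; it never pads a date, in line with #4.

```python
import re
from datetime import date

# "YYYY", "YYYY-MM", "YYYY-MM-DD" (W3C-DTF without a time part)
_HYPHENATED = re.compile(r"([0-9]{4})(?:-([0-9]{2})(?:-([0-9]{2}))?)?")
# "YYYY", "YYYYMM", "YYYYMMDD" (unhyphenated; MM and DD optional as stated above)
_COMPACT = re.compile(r"([0-9]{4})(?:([0-9]{2})([0-9]{2})?)?")


def is_valid_dc_date(value: str) -> bool:
    """Check a date element value against recommendations #2 to #5."""
    match = _HYPHENATED.fullmatch(value) or _COMPACT.fullmatch(value)
    if match is None:  # also rejects timestamps such as "2003-04-24T13:15:52Z"
        return False
    year, month, day = match.groups()
    try:
        # Missing parts are filled in only for this range check, never in the output.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:  # e.g. month 13 or 30 February
        return False
    return True


if __name__ == "__main__":
    for candidate in ["2005", "2005-06", "20050615", "2003-04-24T13:15:52Z"]:
        print(candidate, is_valid_dc_date(candidate))
```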
| Term | Type |
|---|---|
| Recommendations | #1: Use the recommended values of the DCMI Type Vocabulary. For textual material, use "Text". #2: In addition, include a one-word description of the kind of text described by the record. Use the appropriate BibTeX entry type (note 2) for the resource if possible. |
| Examples | <type>Text</type> <type>article</type> <type>Text</type> <type>inproceedings</type> |
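
A minimal sketch (hypothetical function name) that emits the two type values for a textual resource. The set below lists the standard entry types documented for BibTeX plus the "eprint" and "postprint" values mentioned in note 2. Because the kind-of-text value is not a fixed vocabulary, the sketch warns about an unusual value instead of rejecting it; that choice is an assumption.

```python
import warnings

# Standard BibTeX entry types, plus the two extra values from note 2.
BIBTEX_TYPES = frozenset({
    "article", "book", "booklet", "conference", "inbook", "incollection",
    "inproceedings", "manual", "mastersthesis", "misc", "phdthesis",
    "proceedings", "techreport", "unpublished",
})
EXTRA_TYPES = frozenset({"eprint", "postprint"})


def type_values(entry_type: str) -> list[str]:
    """Return the DCMI type "Text" followed by the BibTeX-style kind of text."""
    kind = entry_type.strip().lower()  # BibTeX entry types are case-insensitive
    if kind not in BIBTEX_TYPES | EXTRA_TYPES:
        warnings.warn(f"unusual entry type {kind!r}; prefer an established BibTeX type")
    return ["Text", kind]


if __name__ == "__main__":
    for value in type_values("InProceedings"):
        print(f"<type>{value}</type>")
```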
| Term | Format |
|---|---|
| Recommendations | #1: Include only one format description per format element. Repeat the format element for each additional format description. #2: For format description values, use MIME media types. #3: Provide all formats available for the resource. |
| Examples | <format>application/pdf</format> <format>application/x-djvu</format> |
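
Recommendations #1 to #3 amount to listing the distinct MIME types of every available file. A minimal sketch, assuming formats are derived from file extensions through a hand-maintained table; the table and function name are hypothetical, and "application/x-djvu" follows this document's example. A fixed table is used instead of Python's mimetypes module because the latter also reads system files, so its answers can differ between machines.

```python
from pathlib import PurePosixPath

# Hypothetical extension table; extend it for the formats a provider offers.
EXTENSION_TO_MIME = {
    ".pdf": "application/pdf",
    ".djvu": "application/x-djvu",
    ".ps": "application/postscript",
}


def format_values(filenames: list[str]) -> list[str]:
    """Return one MIME type per available format, without duplicates."""
    values: list[str] = []
    for name in filenames:
        mime = EXTENSION_TO_MIME.get(PurePosixPath(name).suffix.lower())
        if mime is None:
            raise ValueError(f"no MIME type known for {name!r}")
        if mime not in values:
            values.append(mime)
    return values


if __name__ == "__main__":
    for value in format_values(["article.pdf", "article.djvu"]):
        print(f"<format>{value}</format>")
```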
| Term | Identifier |
|---|---|
| Recommendations | #1: Include only one resource identifier per identifier element. #2: Provide a persistent URL to a record page for the resource (as opposed to a link to a particular format of the resource). #3: Provide only one http identifier. #4: If the resource has a DOI, provide it using the "doi:" prefix. #5: Provide a human-readable bibliographic citation sufficient to identify the resource. Use the prefix "bibliographicCitation:" with this data. For example, if the resource is a journal article, the bibliographic citation should include the journal title (if abbreviated, follow MR or Zbl conventions), volume number, issue/number (if available), year of publication, and page range. Do not include the article author or title in this bibliographic citation. #6: Each of these identifiers should be a UTF-8 encoded character string (i.e., no formatting instructions, no TeX code). #7: If the resource described by the record is an entire book, include the ISBN using the "isbn:" prefix. #8: If possible, include an OpenURL containing bibliographic citation data. |
| Examples | <identifier>http://www.numdam.org/item?id=AIF_1994__44_1_213_0</identifier> <identifier>doi:10.1215/S0012-7094-79-04608-8</identifier> <identifier>bibliographicCitation:Ann. Inst. Fourier 44, no.1, 213-248 (1994)</identifier> <identifier>bibliographicCitation:Ann. Statist. 29 (2001), no. 5, 1281-1296</identifier> <identifier>isbn:0268034869</identifier> <identifier>bibliographicCitation:Princeton Math. Series, n° 19, Princeton, 1956</identifier> <identifier>bibliographicCitation:Sém. Bourbaki, 1959-1960, n° 195</identifier> <identifier>bibliographicCitation:Act. Sci. Ind., n° 1264, Paris, Hermann, 1959</identifier> <identifier>bibliographicCitation:Berlin, Springer, 1956</identifier> <identifier>bibliographicCitation:Andreatta, Marco (ed.) et al., Higher dimensional complex varieties. Proceedings of the international conference, Trento, Italy, June 15--24, 1994. Berlin: Walter de Gruyter. 67-81 (1996)</identifier> <identifier>bibliographicCitation:International Journal of Mathematics and Mathematical Sciences, vol. 2007, Article ID 50875, 15 pages, 2007.</identifier> <identifier>isbn:3-11-014503-0</identifier> |
| questions/notes | We need to refine #8 so that it covers only the ContextObject portion of the OpenURL; otherwise it conflicts with #3. I think we need to identify this as a "KEV OpenURL ContextObject." And of course, we need an example. |
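
A minimal sketch of recommendations #1 to #7 with hypothetical helper names: exactly one http(s) record-page URL, followed by optional "doi:", "bibliographicCitation:", and "isbn:" values. The ISBN check-digit rules are the standard ones (ISBN-10: weights 10 down to 1, sum divisible by 11; ISBN-13: alternating weights 1 and 3, sum divisible by 10); rejecting a bad check digit is a design choice of this sketch. OpenURL (#8) is left out because that recommendation is still open.

```python
import re
from typing import Optional


def isbn_is_valid(isbn: str) -> bool:
    """Verify the check digit of an ISBN-10 or ISBN-13 (hyphens and spaces ignored)."""
    chars = re.sub(r"[\s-]", "", isbn).upper()
    if re.fullmatch(r"[0-9]{9}[0-9X]", chars):
        # ISBN-10: weights 10..1; "X" stands for 10 in the last position.
        total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(chars))
        return total % 11 == 0
    if re.fullmatch(r"[0-9]{13}", chars):
        # ISBN-13: alternating weights 1 and 3.
        total = sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(chars))
        return total % 10 == 0
    return False


def identifier_values(
    record_url: str,
    doi: Optional[str] = None,
    citation: Optional[str] = None,
    isbn: Optional[str] = None,
) -> list[str]:
    """Build identifier values: one record-page URL, then prefixed identifiers."""
    if not record_url.startswith(("http://", "https://")):
        raise ValueError("the single http identifier must be a record-page URL")
    values = [record_url]
    if doi:
        values.append(f"doi:{doi}")
    if citation:
        values.append(f"bibliographicCitation:{citation}")
    if isbn:  # only when the record describes an entire book
        if not isbn_is_valid(isbn):
            raise ValueError(f"bad ISBN check digit: {isbn}")
        values.append(f"isbn:{isbn}")
    return values
```

For example, identifier_values("http://www.numdam.org/item?id=AIF_1994__44_1_213_0", citation="Ann. Inst. Fourier 44, no.1, 213-248 (1994)") yields the first and third identifiers shown in the examples.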
| Term | Language |
|---|---|
| Recommendations | #1: Use the language element to indicate the language or languages of the full-text resource, not the language of the metadata. #2: Include only one language per language element. If the entire full text of a resource is presented in multiple languages, indicate this by repeating the language element for each additional language. #3: Use language code values from ISO 639 (e.g., "en") rather than text names (e.g., "English"). #4: ISO 639-1 alpha-2 (two-letter) language codes are preferred. #5: If available, values constructed according to RFC 3066 are acceptable. |
| Examples | <language>de</language> <language>en</language> <language>fr</language> |
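
A syntax-only check of recommendations #3 to #5, sketched with a hypothetical function name. It does not consult the ISO 639 code lists or the IANA language-tag registry, so it only tells a code-shaped value from a text name such as "English". For the RFC 3066 case it assumes a primary subtag of two or three letters (ISO 639) or the reserved "i"/"x", followed by subtags of one to eight letters or digits.

```python
import re
from typing import Optional

_PREFERRED = re.compile(r"[a-z]{2}")  # ISO 639-1 alpha-2, e.g. "en"
# RFC 3066 shape: 2-3 letter primary subtag (or i/x), then optional subtags.
_RFC_3066 = re.compile(r"(?:[A-Za-z]{2,3}|[iIxX])(?:-[A-Za-z0-9]{1,8})*")


def classify_language(value: str) -> Optional[str]:
    """Return "preferred", "acceptable", or None for a language element value."""
    if _PREFERRED.fullmatch(value):
        return "preferred"
    if _RFC_3066.fullmatch(value):
        return "acceptable"
    return None


if __name__ == "__main__":
    for candidate in ["en", "en-GB", "fre", "English"]:
        print(candidate, classify_language(candidate))
```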
| Term | Relation |
|---|---|
| Recommendations | #1: Provide all available references to Mathematical Reviews (MR), Zentralblatt MATH (Zbl), and Jahrbuch über die Fortschritte der Mathematik (JFM) using prefixes (e.g., mr:3209574, zbl:0792.43002, jfm:60.0158.01). #2: If the resource described by the record is a journal article, provide the ISSN of the journal, using the prefix "issn:" (e.g., "issn:1234-2344"). #3: If the resource described by the record is part of a book, provide the ISBN of the book, using the prefix "isbn:". [If the resource described by the record is an entire book, provide the ISBN of the book in the Identifier term, not here.] |
| Examples | <relation>mr:0223268</relation> <relation>mr:MR0223268</relation> <relation>zbl:0176.22301</relation> <relation>jfm:56.0296.03</relation> <relation>isbn:5904350894</relation> |
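
A minimal sketch of recommendations #1 to #3 with hypothetical helper names. Review-journal identifiers are passed through unchanged with their prefixes; the ISSN is checked with the standard ISSN check digit (weights 8 down to 2, modulus 11, "X" standing for 10), and rejecting a bad check digit is a choice of this sketch, not part of the recommendations.

```python
import re
from typing import Iterable, Optional


def issn_is_valid(issn: str) -> bool:
    """Verify an ISSN check digit, e.g. "0003-486X"."""
    match = re.fullmatch(r"([0-9]{4})-?([0-9]{3})([0-9X])", issn.strip().upper())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    total = sum((8 - i) * int(d) for i, d in enumerate(digits))
    check = (11 - total % 11) % 11
    return match.group(3) == ("X" if check == 10 else str(check))


def relation_values(
    mr: Iterable[str] = (),
    zbl: Iterable[str] = (),
    jfm: Iterable[str] = (),
    issn: Optional[str] = None,
    isbn: Optional[str] = None,
) -> list[str]:
    """Build prefixed relation values for review IDs and the host publication."""
    values = [f"mr:{i}" for i in mr] + [f"zbl:{i}" for i in zbl] + [f"jfm:{i}" for i in jfm]
    if issn is not None:  # journal article: ISSN of the journal
        if not issn_is_valid(issn):
            raise ValueError(f"bad ISSN check digit: {issn}")
        values.append(f"issn:{issn}")
    if isbn is not None:  # part of a book: ISBN of the host book
        values.append(f"isbn:{isbn}")
    return values


if __name__ == "__main__":
    for v in relation_values(mr=["0223268"], zbl=["0176.22301"]):
        print(f"<relation>{v}</relation>")
```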
| Term | Rights |
|---|---|
| Recommendations | #1: If access to the full-text resource is currently unrestricted, indicate this with a rights element containing "access:Unrestricted". #2: If access to the full-text resource is currently restricted by subscription policies, indicate this with a rights element containing "access:SubscribersOnly". #3: Other statements or links to information about the rights, restrictions, or conditions of use of a resource are left to the discretion of the provider. |
| Examples | <rights>access:Unrestricted</rights> <rights>access:SubscribersOnly</rights> |
| questions/notes | In Minn., we began to question this more. One problem is that items may be restricted for reasons other than lack of a subscription. More importantly, access restrictions on any resource are likely to change over time (open access windows, etc.), and it is unclear how to express that (UnrestrictedAfter, UnrestrictedUntil, etc.). I think this gets too complicated for simple DC records. If providers keep their records current and harvesters re-harvest, the proposal above (#1-2) may not be a problem. Another tactic would be to simplify all of this by declaring only whether a resource is unrestricted, assuming that an unrestricted resource probably won't become restricted later (though this clearly won't always be true). If there is no "unrestricted" flag, nothing is expressly claimed, but the assumption is that some sort of access restriction is in place. This still doesn't solve the problem of changing access restrictions, so the question remains unresolved. |
| Term | Source |
|---|---|
| Recommendations | No recommendation is made for how to use this element. |
| Examples | |
| Term | Coverage |
|---|---|
| Recommendations | No recommendation is made for how to use this element. |
| Examples | |
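
Putting the element recommendations together, the following sketch (Python standard library only; build_record is a hypothetical name) serializes one record, assuming the standard oai_dc container that OAI-PMH defines for simple Dublin Core. It enforces only three of the rules above (at least one title, a single date, no elements outside simple DC) and keeps values in their given order, so the primary title or description stays first.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)
ET.register_namespace("xsi", XSI)

# The fifteen simple Dublin Core elements, in the order this profile lists them.
DC_ELEMENTS = (
    "title", "creator", "contributor", "publisher", "subject", "description",
    "date", "type", "format", "identifier", "language", "relation",
    "rights", "source", "coverage",
)


def build_record(fields: dict[str, list[str]]) -> str:
    """Serialize one simple DC record; values keep their order within each element."""
    unknown = set(fields) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not simple DC elements: {sorted(unknown)}")
    if not fields.get("title"):
        raise ValueError("every record needs at least one title")
    if len(fields.get("date", [])) > 1:
        raise ValueError("provide only one date element")
    root = ET.Element(f"{{{OAI_DC}}}dc")
    root.set(f"{{{XSI}}}schemaLocation",
             f"{OAI_DC} http://www.openarchives.org/OAI/2.0/oai_dc.xsd")
    for name in DC_ELEMENTS:
        for value in fields.get(name, []):
            ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical record values, for illustration only.
    print(build_record({
        "title": ["An example title with $x^2$"],
        "date": ["2005-06"],
        "type": ["Text", "article"],
        "language": ["en"],
        "rights": ["access:Unrestricted"],
    }))
```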
Note 1: … \mbox{\rm diag}\;\left(x,\,y,\,z\right); \left[\phantom{\displaystyle\iint}\!\!\!{}_{\scriptscriptstyle\Omega}\right.; etc.

Note 2: As this is not a fixed vocabulary, the recommendation is to use the most established word for the kind of item: if it's a book, use "book"; if it's an article published in a proceedings volume, use "inproceedings". Use of "eprint" or "postprint", though not documented in the official BibTeX literature, might be appropriate under certain circumstances.