EuDML metadata schema specification (v2.0 - final)

The EuDML metadata schema version 2.0, as defined by deliverable D3.6, is implemented in two XML schemas that provide the root elements for the two major types of items: journal articles and books.

This choice has the following consequences:

  • There is no separate schema for book parts (typically individual articles in a proceedings volume); these are described and exchanged within the whole book record they belong to.
  • There is no separate schema for multi-volume works. Instead, a book record may carry the description of the multi-volume work it belongs to, if any.


Journal articles are described with the unmodified XML schema of the Journal Archiving and Interchange Tag Set, NISO version 1.0, whose root element is <article>. However, because the NISO schema does not declare a namespace, the formal specification of the journal article schema, located at http://eudml.org/schema/2.0/eudml-article-2.0.xsd, adds the target namespace “http://jats.nlm.nih.gov” to the standard XML schema definition. There are no other modifications to the standard.

Structured documentation for this schema is available at http://jats.nlm.nih.gov/archiving/tag-library/1.0/.
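Because the article schema adds a target namespace, software that reads EuDML article records has to match elements by their namespace-qualified names rather than by bare tag names. The following minimal sketch uses Python's standard xml.etree.ElementTree. It assumes the record declares “http://jats.nlm.nih.gov” as its default namespace. The sample record and the article_title helper are hypothetical illustrations and are not part of EuDML.

```python
import xml.etree.ElementTree as ET

# Target namespace that the EuDML article schema adds to the NISO JATS 1.0 XSD.
JATS_NS = "http://jats.nlm.nih.gov"
NS = {"a": JATS_NS}

# Hypothetical minimal record, for illustration only.
SAMPLE_ARTICLE = """<article xmlns="http://jats.nlm.nih.gov">
  <front>
    <article-meta>
      <title-group>
        <article-title>A hypothetical title</article-title>
      </title-group>
    </article-meta>
  </front>
</article>"""


def article_title(xml_text: str) -> str:
    """Return the article title, rejecting roots outside the article namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced tags as "{namespace}local-name".
    if root.tag != f"{{{JATS_NS}}}article":
        raise ValueError(f"not a namespaced EuDML article: {root.tag}")
    title = root.find("a:front/a:article-meta/a:title-group/a:article-title", NS)
    if title is None:
        raise ValueError("no <article-title> found")
    return "".join(title.itertext()).strip()


print(article_title(SAMPLE_ARTICLE))  # A hypothetical title
```

This check rejects a record whose root is a bare <article> with no namespace, which is what a file written against the original NISO XSD would contain.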

The Best practices document helps maintain consistent encoding practices among EuDML content providers. See D3.6-appendix.pdf.



Books are described with a new schema whose root element is <book>. This schema is very similar to our previous EuDML book v1.0 structure, except for the introduction of a <front> element that brings it closer to the article structure. However, instead of being based on the NCBI book tag library version 3.0, it only introduces the book-specific superstructure, and most of its elements are reused unchanged from the article schema.

The following gives a high-level view of the schema. Readers who need a detailed description are referred to the structured documentation. The overall tree structure is similar to that of an article, with the following main parts:



<b:book>
 <b:front> ...</b:front>
 <b:body>  ...</b:body>
 <a:back>  ...</a:back>
</b:book>

Note that the <a:back> element lives in the journal article namespace. In the context of EuDML, it is used to hold reference lists (i.e. the list of works cited by the given item), and its content is the same for both document types.
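Because <a:back> is in the article namespace in both document types, a single query can collect references from either an article or a book. The sketch below assumes the “a” prefix is bound to the article namespace “http://jats.nlm.nih.gov”. The book namespace is not given in this section, so urn:example:eudml-book is a hypothetical placeholder. The pared-down records are illustrations only and would not validate against the schemas.

```python
import xml.etree.ElementTree as ET

A_NS = "http://jats.nlm.nih.gov"   # journal article namespace ("a:")
B_NS = "urn:example:eudml-book"    # HYPOTHETICAL stand-in for the book namespace ("b:")
NS = {"a": A_NS, "b": B_NS}

ARTICLE = f"""<a:article xmlns:a="{A_NS}">
  <a:front/>
  <a:back><a:ref-list><a:ref id="r1"/><a:ref id="r2"/></a:ref-list></a:back>
</a:article>"""

BOOK = f"""<b:book xmlns:b="{B_NS}" xmlns:a="{A_NS}">
  <b:front/>
  <b:body/>
  <a:back><a:ref-list><a:ref id="r1"/></a:ref-list></a:back>
</b:book>"""


def reference_ids(xml_text: str) -> list:
    """Collect reference ids from an article or a book with one query."""
    root = ET.fromstring(xml_text)
    # <back> is in the article namespace for both document types,
    # so the same relative path works whatever the root element is.
    return [ref.get("id") for ref in root.findall("a:back/a:ref-list/a:ref", NS)]


print(reference_ids(ARTICLE))  # ['r1', 'r2']
print(reference_ids(BOOK))     # ['r1']
```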

<b:front> : book level metadata

Book level metadata is wrapped in the <b:front> container, which may contain the following book specific elements:

  • <b:book-meta>: a container for book-specific metadata, including the following elements:
      • <b:book-id>: book identifier (one or more);
      • <b:book-title-group>: container for all title-related metadata;
      • all other metadata, which can be described by elements already defined in the “a” (article) namespace. These include in particular:
          • descriptions of contributors (authors, editors, translators);
          • publishing information: publisher, publication date, volume, edition, series;
          • abstracts, keywords, and conference information in the case of a proceedings volume;
          • links to the book’s text.
  • <b:mbook-meta>: when a given book is part of a multi-volume work, this container element must be used to describe that work. Since a multi-volume work is a book that happens to have been published in several volumes, the content model for <b:mbook-meta> is the same as for <b:book-meta>. If present, this element must be the first child of the <b:front> element.
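A few lines of code are enough to check the ordering rule for <b:mbook-meta>. The following is a hypothetical pre-check in Python. It is not the EuDML validation service. The book namespace urn:example:eudml-book is a placeholder for the target namespace defined in eudml-book-2.0.xsd.

```python
import xml.etree.ElementTree as ET

B_NS = "urn:example:eudml-book"  # HYPOTHETICAL placeholder for the book namespace
MBOOK_META = f"{{{B_NS}}}mbook-meta"


def mbook_meta_first(front: ET.Element) -> bool:
    """Check that <b:mbook-meta>, if present, is the first child of <b:front>."""
    for position, child in enumerate(front):
        # Any <b:mbook-meta> found after position 0 breaks the "first child" rule.
        if child.tag == MBOOK_META and position != 0:
            return False
    return True


GOOD = f'<b:front xmlns:b="{B_NS}"><b:mbook-meta/><b:book-meta/></b:front>'
BAD = f'<b:front xmlns:b="{B_NS}"><b:book-meta/><b:mbook-meta/></b:front>'
NO_MBOOK = f'<b:front xmlns:b="{B_NS}"><b:book-meta/></b:front>'

print([mbook_meta_first(ET.fromstring(x)) for x in (GOOD, BAD, NO_MBOOK)])  # [True, False, True]
```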

<b:body> : book contents

The body of a book contains the metadata for its constituent parts, when available, according to the following (recursive) representation.



<b:book-part>
 <b:book-part-meta> ... </b:book-part-meta>
 <b:body> ... </b:body>
 <a:back> ... </a:back>
</b:book-part>
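Because a <b:book-part> may contain its own <b:body> with further book parts, consumers have to walk this structure recursively. The following minimal Python sketch prints a table of contents with one indentation level per nesting depth. It assumes a placeholder book namespace (urn:example:eudml-book) and uses hypothetical part identifiers.

```python
import xml.etree.ElementTree as ET

B_NS = "urn:example:eudml-book"  # HYPOTHETICAL placeholder for the book namespace
NS = {"b": B_NS}

BOOK_BODY = f"""<b:body xmlns:b="{B_NS}">
  <b:book-part>
    <b:book-part-meta><b:book-part-id>part-1</b:book-part-id></b:book-part-meta>
    <b:body>
      <b:book-part>
        <b:book-part-meta><b:book-part-id>part-1.1</b:book-part-id></b:book-part-meta>
      </b:book-part>
      <b:book-part>
        <b:book-part-meta><b:book-part-id>part-1.2</b:book-part-id></b:book-part-meta>
      </b:book-part>
    </b:body>
  </b:book-part>
  <b:book-part>
    <b:book-part-meta><b:book-part-id>part-2</b:book-part-id></b:book-part-meta>
  </b:book-part>
</b:body>"""


def walk_parts(body, depth=0):
    """Yield (depth, book-part-id) for each <b:book-part>, recursing into nested <b:body>."""
    for part in body.findall("b:book-part", NS):  # direct children only
        part_id = part.findtext("b:book-part-meta/b:book-part-id", default="", namespaces=NS)
        yield depth, part_id.strip()
        inner = part.find("b:body", NS)
        if inner is not None:
            yield from walk_parts(inner, depth + 1)


for depth, part_id in walk_parts(ET.fromstring(BOOK_BODY)):
    print("  " * depth + part_id)
# part-1
#   part-1.1
#   part-1.2
# part-2
```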

The metadata for book parts is almost the same as for a journal article. A few elements have been renamed for consistency while serving exactly the same purpose: <book-part-id> should be used instead of <article-id>. The various titles of a book part are grouped in a <title-group> element whose content model differs slightly from that of an article’s <title-group>: the <title> element should be used instead of <article-title>. A book part may have its own reference list (as is the case for a proceedings article or a chapter in an edited book), so the <a:back> element is allowed in a book part description.

The following is a simplified example of tagging a book part.



<b:book-part>
 <b:book-part-meta>
  <b:book-part-id pub-id-type="dmlcz-id">400227</b:book-part-id>
  <b:title-group>
   <b:title xml:lang="de">Bernard Bolzano’s Schriften</b:title>
   <a:trans-title-group xml:lang="cs">
    <a:trans-title>Spisy Bernarda Bolzana</a:trans-title>
   </a:trans-title-group>
  </b:title-group>
  <a:contrib-group>
   <a:contrib contrib-type="author">
    <a:string-name>Bolzano, Bernard</a:string-name>
   </a:contrib>
  </a:contrib-group>
  <a:fpage>[1a]</a:fpage>
  <a:lpage>[1e]</a:lpage>
 </b:book-part-meta>
</b:book-part>
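The book part example above can be read with the same tools as an article, provided the book-specific names are used: <b:book-part-id> and <b:title> in place of <article-id> and <article-title>. The translated title, the contributors, and the pages come from the article namespace. In this sketch, namespace declarations have been added to the example. The “a” prefix is assumed to be bound to “http://jats.nlm.nih.gov”. The book namespace urn:example:eudml-book and the describe_part helper are hypothetical.

```python
import xml.etree.ElementTree as ET

A_NS = "http://jats.nlm.nih.gov"   # article namespace ("a:")
B_NS = "urn:example:eudml-book"    # HYPOTHETICAL book namespace ("b:")
NS = {"a": A_NS, "b": B_NS}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The book part example, with namespace declarations added.
PART = f"""<b:book-part xmlns:b="{B_NS}" xmlns:a="{A_NS}">
 <b:book-part-meta>
  <b:book-part-id pub-id-type="dmlcz-id">400227</b:book-part-id>
  <b:title-group>
   <b:title xml:lang="de">Bernard Bolzano’s Schriften</b:title>
   <a:trans-title-group xml:lang="cs">
    <a:trans-title>Spisy Bernarda Bolzana</a:trans-title>
   </a:trans-title-group>
  </b:title-group>
  <a:contrib-group>
   <a:contrib contrib-type="author">
    <a:string-name>Bolzano, Bernard</a:string-name>
   </a:contrib>
  </a:contrib-group>
  <a:fpage>[1a]</a:fpage>
  <a:lpage>[1e]</a:lpage>
 </b:book-part-meta>
</b:book-part>"""


def describe_part(part: ET.Element) -> dict:
    """Pull the main fields out of a <b:book-part>, using the book-specific names."""
    meta = part.find("b:book-part-meta", NS)
    part_id = meta.find("b:book-part-id", NS)        # used instead of <article-id>
    title = meta.find("b:title-group/b:title", NS)   # used instead of <article-title>
    return {
        "id": (part_id.get("pub-id-type"), part_id.text),
        "title": (title.get(XML_LANG), title.text),
        "trans_titles": [
            (g.get(XML_LANG), g.findtext("a:trans-title", namespaces=NS))
            for g in meta.findall("b:title-group/a:trans-title-group", NS)
        ],
        "authors": [
            c.findtext("a:string-name", namespaces=NS)
            for c in meta.findall("a:contrib-group/a:contrib[@contrib-type='author']", NS)
        ],
        "pages": (meta.findtext("a:fpage", namespaces=NS),
                  meta.findtext("a:lpage", namespaces=NS)),
    }


print(describe_part(ET.fromstring(PART)))
```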

The formal definition of this schema can be found at the following location: http://eudml.org/schema/2.0/eudml-book-2.0.xsd.

Structured documentation is available at http://eudml.mathdoc.fr/schemas/doc/book.html.

 

 

Tagging best practices

The Best practices document is part of these specifications. It explains how to use a number of JATS elements and attributes in order to contribute unambiguous, high-quality metadata to EuDML.

A validation service, which checks conformance to the schemas and to the best practices using a dedicated Schematron engine, is run on each incoming XML record harvested through OAI-PMH. A demo that external partners can use to test the quality of their files and their compliance with these specifications and best practices will soon be available at http://eudml.mathdoc.fr/eudml-validation-demo/.

Conformant EuDML v2.0 XML examples are available from the EuDML OAI-PMH server; see https://project.eudml.org/oai-pmh-server. Further examples of EuDML XML structures and coding practices, in an undocumented variant, can be obtained through the "NLM metadata via REST" services; see https://project.eudml.org/api-tester/restNlm.
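Records from an OAI-PMH server are harvested with standard OAI-PMH 2.0 ListRecords requests. The harvester follows resumptionToken values until an empty or absent token marks the last page. This sketch only builds request URLs and reads tokens from a response, without any network access. The base URL and metadata prefix in the usage line are hypothetical placeholders. Replace them with the values published on the EuDML OAI-PMH server page. A ListMetadataFormats request lists the prefixes a server supports.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH 2.0 ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it is sent without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def next_resumption_token(response_xml: str) -> Optional[str]:
    """Return the token for the next page, or None when the list is complete."""
    root = ET.fromstring(response_xml)
    token = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
    # An absent or empty <resumptionToken> marks the last page.
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


# HYPOTHETICAL base URL and metadata prefix, for illustration only.
print(list_records_url("https://oai.example.org/oai", metadata_prefix="eudml-example"))
# https://oai.example.org/oai?verb=ListRecords&metadataPrefix=eudml-example
```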

 

 

Background information

Since we released the initial version of the EuDML schema, the NLM Journal Archiving and Interchange Tag Suite has been handed over to the NISO standards body, which has published it as NISO Z39.96-2012. As a consequence, a new Journal Archiving and Interchange Tag Set schema, NISO version 1.0, has been published.

The differences between NLM 3.0 and NISO 1.0 are listed on the web page http://jats.nlm.nih.gov/archiving/tag-library/1.0/n-zad2.html.

Straightforward consequences of these changes are:

  • The out-of-the-box article XSD now suits our needs perfectly, as all the extensions we had adopted, or planned to adopt in the next revision of our schema, are now standard.
  • A number of useful features have been introduced that could help improve some parts of EuDML operation. For instance, the <citation-alternatives> element could be used to store different representations of the same citation, such as the result of an internal non-destructive matching or enhancing process (for example, adding reference database links) or a structured (<element-citation>) version of an initially unstructured citation.
  • Moreover, NISO 1.0 is "fully backward compatible with NLM version 3.0" (http://jats.nlm.nih.gov/about.html), and thus with EuDML article 1.0.
  • The NCBI Book Tag Set is not part of the NISO standard and was thus not upgraded to that standard.
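To illustrate the <citation-alternatives> idea, the following sketch wraps an existing unstructured <mixed-citation> together with a newly built <element-citation>. The original citation is preserved rather than overwritten. This is one possible approach, not an existing EuDML process. The add_structured_alternative helper and the citation data are hypothetical, and the elements are placed in the article namespace “http://jats.nlm.nih.gov”.

```python
import xml.etree.ElementTree as ET

A = "{http://jats.nlm.nih.gov}"  # article namespace, in ElementTree's "{uri}" form


def add_structured_alternative(ref: ET.Element, structured: ET.Element) -> ET.Element:
    """Wrap a ref's <mixed-citation> and a new <element-citation> in
    <citation-alternatives>, keeping the original citation unchanged."""
    mixed = ref.find(A + "mixed-citation")
    if mixed is None:
        raise ValueError("ref has no <mixed-citation> to preserve")
    position = list(ref).index(mixed)
    ref.remove(mixed)
    alternatives = ET.Element(A + "citation-alternatives")
    alternatives.append(mixed)       # original, unstructured form is kept as-is
    alternatives.append(structured)  # enriched, structured form
    ref.insert(position, alternatives)
    return alternatives


# Hypothetical citation, for illustration only.
ref = ET.Element(A + "ref", id="r1")
mixed = ET.SubElement(ref, A + "mixed-citation")
mixed.text = "Doe, J.: A hypothetical paper. Hypothetical J. 1 (2000), 1-10."

structured = ET.Element(A + "element-citation", {"publication-type": "journal"})
group = ET.SubElement(structured, A + "person-group", {"person-group-type": "author"})
name = ET.SubElement(group, A + "name")
ET.SubElement(name, A + "surname").text = "Doe"
ET.SubElement(name, A + "given-names").text = "J."
ET.SubElement(structured, A + "article-title").text = "A hypothetical paper"
ET.SubElement(structured, A + "source").text = "Hypothetical J."
ET.SubElement(structured, A + "year").text = "2000"

add_structured_alternative(ref, structured)
print([child.tag.split("}")[1] for child in ref[0]])  # ['mixed-citation', 'element-citation']
```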


The main differences between the initial and final versions of the EuDML metadata schema are:

  • All new structures introduced in JATS NISO 1.0 are supported.
  • The article document type is described with the standard article XSD provided by JATS NISO 1.0.
  • Namespaces have changed.
  • The book document type is described using a new schema defined from scratch and based explicitly on JATS NISO 1.0 elements. This schema is very similar to the EuDML 1.0 book structure, but the way it is defined has completely changed, and backward compatibility has not been sought in this case.
  • Multi-volume works are no longer supported by a third XML record type (mbook), but by a special metadata element (<b:mbook-meta>) in the book schema.

 

 

 

Archive

EuDML metadata schema specification (v1.0) (previous version)