Objectives: To provide tools and workflows to extract textual metadata (coded in a proper XML schema) and mathematical metadata (MathML), such as titles, keywords, authors and references, from items in the content repositories, namely various types of mathematical documents, including scanned images, TeX/LaTeX sources and PDF documents.
To validate and merge the discovered metadata with that already registered for the target items.
To provide tools to identify, from the Metadata Repository, items that may benefit from metadata enhancement, and to apply such enhancement processes to them automatically.
- Deliverable 7.1 – State of the art of augmenting metadata techniques and technology: Identification of the main issues and challenges in metadata augmentation techniques and technologies suitable for use on corpora of mathematical scientific documents. For most subtasks, tools were identified that cover the basic functionality expected to be needed by a digital library of the EuDML type, as in other projects such as PubMed Central or Portico. Generic standard techniques for metadata enhancement and normalization are applicable there.
The deliverable also reviews and identifies expertise and tools from some project partners (MU, CMD, ICM, FIZ, IU, and IMI-BAS). The main unresolved challenges are OCR of mathematics and reliable, robust conversion between math formats (TeX and MathML), so that content can be normalized into one primary metadata format (NLM Archiving DTD Suite) and support services such as math indexing and search.
In the follow-up deliverable D7.2, tools and techniques will be chosen for use in the EuDML core engine (combining YADDA and REPOX), or as a loosely coupled set of enhancement tools in a linked data fashion.
- Deliverable 7.2 – Toolset for image and text processing and metadata editing – initial release: Tools produced by EuDML partners and made available demonstrate the building blocks of enhancer tools, whose function is to check, correct and enhance metadata collected both from partners, including Zentralblatt MATH, and from the analysis of full-text or PDF versions of items in the EuDML collection. Demonstration web pages allow testing and evaluation of thirteen tool prototypes.
- Deliverable 7.3 – Toolset for image and text processing and metadata enhancement – Value release: This demonstration description presents tools and partial workflow results produced by EuDML partners and made available for demonstration. The enhancement tools are meant to find, check, merge, correct and enhance metadata and full texts collected both from partners, including Zentralblatt MATH, and from the analysis of full-text or PDF versions of items in the EuDML collection.
- Deliverable 7.4 – Toolset for Image and Text Processing and Metadata Enhancements – Final release: This demonstration description presents tools and partial workflow results produced by EuDML partners and integrated and used in core EuDML processing and/or made available as standalone tools or demonstrations. It describes the enhancement workflow and the tools meant to find, check, merge, correct and enhance the metadata and the full-text or PDF versions of items in the EuDML collection. Demonstration web pages allow testing and evaluation of these tools, in addition to the project site itself, where the enhanced data are displayed.
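The objective of validating and merging discovered metadata with metadata already registered for an item can be illustrated with a minimal Python sketch. It is not the EuDML implementation: the function name `merge_metadata`, the field names and the merge policy (registered values win, empty fields are filled, list fields are unioned, disagreements are reported for review) are all assumptions made for illustration only.

```python
from typing import Any

# Hypothetical multi-valued fields; real EuDML records use an NLM-based XML schema.
LIST_FIELDS = {"keywords", "authors", "references"}
EMPTY = (None, "", [])


def merge_metadata(
    registered: dict[str, Any], discovered: dict[str, Any]
) -> tuple[dict[str, Any], dict[str, tuple[Any, Any]]]:
    """Merge discovered metadata into a copy of the registered record.

    Assumed policy: registered values take precedence, empty fields are
    filled from the discovered data, list fields are unioned in order,
    and disagreeing single values are returned as conflicts.
    """
    merged = dict(registered)
    conflicts: dict[str, tuple[Any, Any]] = {}
    for field, value in discovered.items():
        if value in EMPTY:
            continue  # nothing useful was discovered for this field
        current = merged.get(field)
        if current in EMPTY:
            merged[field] = value  # fill a missing or empty field
        elif field in LIST_FIELDS:
            # keep registered entries first, append only new discovered ones
            merged[field] = list(current) + [v for v in value if v not in current]
        elif current != value:
            conflicts[field] = (current, value)  # leave for manual validation
    return merged, conflicts


registered = {"title": "On primes", "keywords": ["primes"], "authors": []}
discovered = {
    "title": "On Primes",
    "keywords": ["primes", "sieve"],
    "authors": ["A. Author"],
    "msc": "11A41",
}
merged, conflicts = merge_metadata(registered, discovered)
print(merged)
print(conflicts)
```

Reporting conflicts rather than overwriting keeps the registered record authoritative, which matches the "validate" part of the objective; a production workflow would likely need per-field normalization before comparison.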