Sharing via OAI-PMH: Tips for Member Repositories

Introduction

OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting) is a protocol for harvesting the metadata descriptions of objects from an archive. It is how MWDL members share their metadata with us.

In this guide you will find tips for making the OAI-PMH sharing process between your institution and MWDL run more smoothly. While we have included links to OAI-PMH resources for some common digital repositories in our network, we are not experts in how to configure your specific OAI-PMH settings. Each system is different, so you will need to consult your user manual, vendor, or systems team for information about making configuration changes.

How we use OAI to harvest

Our harvesting system in Ex Libris Primo sends a standard “Identify” request first to verify that the OAI repository is functioning.

Then it sends a “ListRecords” request with “from” and “until” parameters to obtain the first batch of metadata records from the repository.

Additional “ListRecords” requests with the appropriate “resumptionToken” parameters are sent as needed to get the full listing of records.

We run a number of normalization routines on the harvested records to transform the Dublin Core metadata into Primo normalized XML. 
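The request sequence described above can be sketched in a few lines of Python. This is only an illustration of the protocol flow, not the code that runs inside Primo. The function names (fetch_xml, harvest), the base URL you would pass in, and the choice of the “oai_dc” metadata prefix are assumptions made for the example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Namespace used by all OAI-PMH 2.0 response elements.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def fetch_xml(url):
    """Download one OAI-PMH response and parse it (needs network access)."""
    with urlopen(url) as response:
        return ET.fromstring(response.read())


def harvest(base_url, from_date, until_date, fetch=fetch_xml):
    """Yield every <record> element, following resumption tokens.

    Hypothetical helper: `fetch` can be swapped out for testing.
    """
    # Step 1: Identify confirms that the repository answers at all.
    identify = fetch(base_url + "?" + urlencode({"verb": "Identify"}))
    if identify.find(OAI_NS + "Identify") is None:
        raise RuntimeError("repository did not return an Identify response")

    # Step 2: the first ListRecords request carries the date window.
    params = {
        "verb": "ListRecords",
        "metadataPrefix": "oai_dc",  # assumed prefix for this sketch
        "from": from_date,
        "until": until_date,
    }
    while True:
        root = fetch(base_url + "?" + urlencode(params))
        list_records = root.find(OAI_NS + "ListRecords")
        if list_records is None:
            break  # for example, an <error code="noRecordsMatch"> response
        yield from list_records.findall(OAI_NS + "record")

        # Step 3: a non-empty resumptionToken means more batches remain.
        token = list_records.find(OAI_NS + "resumptionToken")
        if token is None or not (token.text or "").strip():
            break
        # resumptionToken is an exclusive argument: it is sent with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

A repository that returns an empty or missing resumptionToken on its last batch ends the loop, so it is worth checking that your provider does this correctly.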

Metadata schemes

Most institutions create metadata that complies with one of several possible metadata schemes. MODS, Dublin Core (DC), and Qualified Dublin Core (QDC) are the most common schemes that our partners use, though your institution might use a different scheme like METS or MARC, or even a locally created scheme.

At this time, MWDL harvests only OAI-PMH feeds in Dublin Core (DC) or Qualified Dublin Core (QDC). Be sure to double-check your OAI-PMH feed to make sure that it is outputting your metadata in one of these Dublin Core formats.
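One way to double-check which formats your feed offers is the protocol’s “ListMetadataFormats” request. The sketch below builds that request URL and pulls the metadata prefixes out of a response. The helper names are hypothetical, and the prefix your system uses for Qualified Dublin Core is system-specific (the protocol itself only guarantees “oai_dc”).

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by all OAI-PMH 2.0 response elements.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_formats_url(base_url):
    """Build a ListMetadataFormats request for a repository base URL."""
    return base_url + "?" + urlencode({"verb": "ListMetadataFormats"})


def metadata_prefixes(response_xml):
    """Return the metadataPrefix values advertised in a response."""
    root = ET.fromstring(response_xml)
    return [
        element.text.strip()
        for element in root.iter(OAI_NS + "metadataPrefix")
        if element.text
    ]
```

If “oai_dc” is missing from the result, the feed is not offering simple Dublin Core and will need a configuration change before harvest.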

Querying your collection’s OAI stream in a Web browser is a great way to check the mapping of your fields, and to see if any text is getting truncated or any non-Unicode characters are being included. Also, if the OAI provision is not working, you will get an error message in the results, which may give useful information.
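For example, a browser query for one collection might look like https://archives.myuniversity.edu/oai?verb=ListRecords&metadataPrefix=oai_dc&set=photos (the host, path, and set name here are hypothetical). The same check can be scripted. The sketch below assumes you have already saved or fetched the response text, and it simply lists any protocol-level error elements it contains. The function name is an assumption for the example.

```python
import xml.etree.ElementTree as ET

# Namespace used by all OAI-PMH 2.0 response elements.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def oai_errors(response_xml):
    """Return (code, message) pairs for each <error> in an OAI-PMH response."""
    root = ET.fromstring(response_xml)
    # <error> elements are direct children of the <OAI-PMH> root element.
    return [
        (element.get("code"), (element.text or "").strip())
        for element in root.findall(OAI_NS + "error")
    ]
```

An empty list means the repository reported no protocol errors. It does not confirm that the field mapping is correct, so reading the records themselves is still worthwhile.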

You can use MWDL’s Metadata Audit tool to check mapping, general metadata completeness, and conformity with the MWDL Metadata Application Profile (V3).

Collections

MWDL harvests metadata at the collection level. We do not have the ability to harvest individual items or to omit specific items from a collection. In other words, we can’t cherry-pick items from your collections.

MWDL includes digital collections in our search portal at https://mwdl.org only with the explicit permission of the repository managers. To request that collections be added to or removed from the MWDL database, please use this form.

Notes

  • We recommend that you use a digital asset management system with built-in OAI metadata provision that is easy to configure.
  • Please ensure your metadata format validates against a schema from the Dublin Core Metadata Initiative, not a proprietary vendor schema.
  • If you have a repository that does not have built-in OAI metadata provision, please implement one of the many open-source or low-cost OAI provider tools. We strongly advise against creating your own OAI provider module.
  • Take advantage of the OAI sets implementation to separate the different collections. MWDL can harvest separate sets, using the setSpec assigned to each item to separate the collections. The setSpec should be reflected in the OAI identifier, so that we can retrieve the setSpec from there. Typically, we harvest all records and tag only certain sets for display, i.e., the sets that you submit to us. If you do not implement sets, we have no option other than to harvest your entire repository and present it as one collection.
  • We recommend that you implement OAI deleted record status. This is not required of OAI repositories but, without it, we have no way to remove from our harvester the records that you delete locally, except by a full delete-and-reload of your entire repository, which we prefer not to do often.
  • While any system for assigning unique identifiers is acceptable under the OAI protocol, we recommend that you generate a meaningful OAI identifier that is related to the setSpec and item number in your digital asset management repository. An example of such an identifier is “oai:archives.myuniversity.edu:photos/6”, where “archives.myuniversity.edu” is the domain, “photos” is the setSpec, and “6” is the item number. This makes it easy for our harvester to identify the collection each item belongs to and to create links to the items in your repository without having to resort to the <dc:identifier> field.
  • Please test your OAI provider to ensure it conforms to the OAI protocol before offering it for harvest. Fix any issues identified by these tools: Validator at the Open Archives Initiative site or OAI-PMH Validator.
  • There is a Google Group for OAI-PMH (formerly called oai-implementers) that provides a great place to ask questions.
  • If your systems administrators keep whitelists for access to your OAI provision, they will need to add MWDL’s IP: 66.151.7.130.
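Two of the notes above, the recommended identifier pattern and deleted record status, can be illustrated with a short sketch. It assumes identifiers shaped like the example “oai:archives.myuniversity.edu:photos/6” and records parsed with Python’s standard XML library. The function names are hypothetical, and this is not MWDL’s actual harvester code.

```python
import xml.etree.ElementTree as ET

# Namespace used by all OAI-PMH 2.0 response elements.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def parse_oai_identifier(identifier):
    """Split 'oai:<domain>:<setSpec>/<item>' into its three parts."""
    # Raises ValueError if there are fewer than three colon-separated parts.
    scheme, domain, local = identifier.split(":", 2)
    if scheme != "oai":
        raise ValueError("identifier does not start with 'oai:'")
    # The item number is whatever follows the last slash.
    set_spec, _, item = local.rpartition("/")
    if not set_spec or not item:
        raise ValueError("expected '<setSpec>/<item>' after the domain")
    return {"domain": domain, "set_spec": set_spec, "item": item}


def is_deleted(record):
    """True if a <record> element's header carries status="deleted"."""
    header = record.find(OAI_NS + "header")
    return header is not None and header.get("status") == "deleted"
```

With identifiers in this shape, a harvester can recover the collection for every item without an extra lookup, and a deleted header tells it which records to drop during an incremental harvest.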