CONTENTdm Production Pilot

November 5, 2015


CONTENTdm Production Pilot Project



Goals:

1) To move forward with a small linked data project using BIBFRAME in "production" mode in order to learn what is currently feasible and where obstacles lie.

2) To evaluate BIBFRAME as a common format for data derived from a variety of schemas: local schema, MARC and EAD.


Corpus of Materials:

The Libraries has many digitized historical photographs in CONTENTdm.  Five collections on the theme of Alaska, Western Canada and Washington State, with a strong representation of the Klondike and the Gold Rush, were selected:  Alaska, Western Canada and United States Collection; Asahel Curtis Photo Company Photographs; Frank La Roche Photographs; William E. Meed Photographs of the Yukon Territory; and Eric A. Hegg Photographs of Alaska and the Klondike.

From the library catalog, approximately 1,000 MARC records related to the Klondike Gold Rush were selected, including collection-level records for the five CONTENTdm collections.

Two of the five CONTENTdm collections also have EAD finding aids.


Conversion to BIBFRAME:

The five CONTENTdm collections share an identical local metadata schema, and a mapping was developed from that schema to BIBFRAME.  The mapping table is available (.xlsx).  Metadata for the collections was exported from CONTENTdm as XML and converted to BIBFRAME in an RDF/XML serialization using an XSLT transformation based on the mapping table.
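
The mapping-driven conversion described above can be illustrated with a minimal sketch.  It is written in Python with the standard library rather than XSLT, and it is not the project's transformation: the export structure (<record>, <id>, <title>, <photographer>, <subject>), the FIELD_MAP entries, and the use of the BIBFRAME 1.0 namespace are all assumptions made for illustration.  The real mapping table is richer, and some BIBFRAME properties would point to resources rather than hold plain literals.

```python
import xml.etree.ElementTree as ET

# Assumption: BIBFRAME 1.0 vocabulary namespace.
BF = "http://bibframe.org/vocab/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical mapping: local field name -> BIBFRAME property local name.
# Placeholder values only; this is not the project's mapping table.
FIELD_MAP = {
    "title": "title",
    "photographer": "creator",
    "subject": "subject",
}


def convert(export_xml: str, base_uri: str) -> str:
    """Convert a (hypothetical) CONTENTdm XML export to RDF/XML."""
    ET.register_namespace("bf", BF)
    ET.register_namespace("rdf", RDF)
    rdf_root = ET.Element(f"{{{RDF}}}RDF")
    for record in ET.fromstring(export_xml).iter("record"):
        rec_id = record.findtext("id", default="").strip()
        # One bf:Work per exported record, identified by base URI + record id.
        work = ET.SubElement(
            rdf_root, f"{{{BF}}}Work", {f"{{{RDF}}}about": base_uri + rec_id}
        )
        for field, prop in FIELD_MAP.items():
            for el in record.findall(field):
                if el.text and el.text.strip():
                    # Simplification: every mapped value is emitted as a literal.
                    ET.SubElement(work, f"{{{BF}}}{prop}").text = el.text.strip()
    return ET.tostring(rdf_root, encoding="unicode")
```

Keeping the mapping in a single table-like structure mirrors the role of the spreadsheet: a change to the mapping means editing one entry rather than the conversion logic.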

MARC records were exported from the local library system, converted to MARCXML with MarcEdit, and then converted to BIBFRAME in an RDF/XML serialization using the Library of Congress transformation available on GitHub.
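
Before MARCXML is passed to a transformation, it can be useful to spot-check what the records contain.  The following is a minimal sketch in Python, not part of the project's workflow; it assumes the standard MARCXML namespace, and the choice of fields (245 $a for title, 650 $a for topical subjects) and the function names are illustrative.

```python
import xml.etree.ElementTree as ET

MARC_URI = "http://www.loc.gov/MARC21/slim"
MARC_NS = {"marc": MARC_URI}


def subfield_values(record, tag, code):
    """Return the text of every subfield `code` in datafields with `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [sf.text.strip() for sf in record.findall(path, MARC_NS) if sf.text]


def summarize(marcxml: str):
    """Yield titles and topical subjects for each record in a MARCXML string."""
    root = ET.fromstring(marcxml)
    # iter() also matches the root element, so this handles both a
    # <collection> wrapper and a single bare <record>.
    for record in root.iter(f"{{{MARC_URI}}}record"):
        yield {
            "titles": subfield_values(record, "245", "a"),
            "subjects": subfield_values(record, "650", "a"),
        }
```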

EAD finding aids were evaluated but not converted.  It became apparent that the information in them duplicates what is in the metadata from CONTENTdm and the collection-level MARC records.  In addition, conversion of EAD to BIBFRAME is more complex than conversion of Dublin Core or MARC, and further work on it was not justified for this project.

The resulting BIBFRAME files from CONTENTdm are available here:

Alaska and Western Canada



La Roche



Evaluation of BIBFRAME linked data:

The goal is to search the combined corpus of converted CONTENTdm and MARC records in order to evaluate how well they work together.


Current Status:

We lack a triple store and SPARQL endpoint, which are the tools needed for the next step.  In the meantime, we have stored the BIBFRAME data in eXist-db and used XQuery to develop queries by title keyword, subject keyword, photographer, date, format and ID number.  The queries are available through simple web forms.  We plan to use them to see what they can reveal about BIBFRAME as a common format.
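
As an illustration of what a title keyword query over the combined corpus involves, here is a minimal sketch.  It is written in Python rather than XQuery and does not reproduce the project's queries; it assumes that both conversions emit top-level resources carrying an rdf:about identifier and a literal bf:title in the BIBFRAME 1.0 namespace, which may not match the actual output of either transformation.

```python
import xml.etree.ElementTree as ET

# Assumption: both sources use the BIBFRAME 1.0 namespace.
BF = "http://bibframe.org/vocab/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def title_keyword_search(documents, keyword):
    """Search several RDF/XML documents for a keyword in bf:title.

    documents: dict mapping a source label (e.g. "contentdm", "marc")
               to an RDF/XML string.
    Returns a list of (source label, resource URI, title) tuples.
    """
    needle = keyword.lower()
    hits = []
    for source, rdf_xml in documents.items():
        root = ET.fromstring(rdf_xml)
        for resource in root:  # top-level resources under rdf:RDF
            uri = resource.get(f"{{{RDF}}}about")
            for title in resource.findall(f"{{{BF}}}title"):
                if title.text and needle in title.text.lower():
                    hits.append((source, uri, title.text.strip()))
    return hits
```

Labeling each hit with its source makes it easy to see whether a given keyword retrieves records from both the CONTENTdm and the MARC conversions, which is the kind of comparison the evaluation is after.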