HTR Transcription: a project-based approach to automated transcription

Friday, 12 November 2021

How McGill University Library built upon its pilot digital collection in Quartex by exploring the potential of HTR Transcription to remove barriers to accessing and understanding primary sources in its Fur Trade Collection.

As told in a Quartex/Library Journal webinar by Jacquelyn Sundberg, Outreach and Special Projects at McGill University Library and McGill ROAAr (Rare Books, Osler Library of the History of Medicine, Visual Arts Collection and McGill University Archives), and Carolyn Pecoskie, Metadata and Electronic Resources Librarian at McGill Library.

Breaking down barriers to understanding manuscript materials 

“At McGill Library and ROAAr, our archival collections, which include treasures such as historical atlases, travel narratives and natural history books, number around 250,000 rare materials. This includes a significant amount of manuscript or handwritten material, which presents unique challenges for access.”

“As much as possible, digitization makes these items more accessible but it is only the first step; they also require cataloging, metadata and a framework to be discoverable, as well as time-consuming manual or crowdsourced transcription.”

“In addition, even history students who receive the most training in primary source literacy can be put off by the time and effort required to use handwritten sources and interpret them for more than just the literal meaning of the text. This creates two significant barriers: primary source literacy skills and handwriting skills.”

 

Quartex Pilot Project

“As a first stage of piloting the use of Handwritten Text Recognition, or HTR, technology available in Quartex, we chose one collection, our entirely manuscript Doncaster Recipes Collection of papers, recipe books and medical receipts. The collection was ingested and published, alongside an accompanying digital exhibit, in April 2020. It made an interesting test case for HTR, which at that point couldn’t generate transcripts, but provided enhanced search functionality in the form of full-text search.”

“We then began working on a second collection, the Fur Trade Collection, with the enhancement of HTR Transcription. Thanks to a grant from the National Heritage Digitization Strategy, we were able to make digitally available a new swathe of materials to complement our existing fur trade collection, this time documenting the Colonial-era Fur Trade through the lens of the North West Company through which the university founder, James McGill, made his fortune.”

“Primarily authored by the Company’s bourgeois and predominantly European senior management, the holdings in this collection nevertheless reveal, albeit indirectly, the presence of Indigenous peoples and the extent to which their knowledge was critical to the success of the Company and of the Fur Trade itself.”

“One of the reasons that HTR was actually quite an exciting prospect for the Fur Trade Collection is that it opens up new pathways to uncover the hidden and indirect content: the legacy of Indigenous knowledge.”

To continue the story, register to watch the full webinar.

Discover:

  • The team’s approach to metadata configuration as a way of making the published Fur Trade Collection as accessible and discoverable as possible
  • The methodology and mechanics behind the collection’s pathways to discovery
  • The published collection through a guided tour of search functionality and display features
  • The effectiveness of automated HTR Transcription on a variety of materials in the collection
  • The conclusions drawn on the effectiveness of HTR as a tool for breaking down barriers to using handwritten materials
  • The Library’s plans to build upon this pilot project and extend its use of HTR into further collections

Get in touch to find out more

If you have any queries regarding the Quartex platform, please contact us.