Using AI to decipher historical manuscripts

Jean-Olivier Dicaire-Leduc lets us rediscover historical documents from the Louis-François-Georges Baby Collection.

Credit: Amélie Philibert, Université de Montréal

In 5 seconds

Jean-Olivier Dicaire-Leduc throws light on hundreds of previously illegible pages of handwritten documents – some more than 350 years old – using text-recognition software powered by AI.

How many Indigenous men were selected to fight in John Ferdinand Dalziel Smith's 1784 tour of North America?

How much land near Sillery did the Augustinian Sisters of Hôtel-Dieu de Québec grant as a concession to the Indigenous people of Tadoussac in 1642?

What penalty did the Sovereign Council hand down to bootleggers in Trois-Rivières in 1667?

The answers to these and many other questions are found in the Louis-François-Georges Baby Collection of historical documents preserved by Université de Montréal's Archives and Information Management Division (DAGI).

And now, the collection is unlocking new secrets, rediscovered by Jean-Olivier Dicaire-Leduc, who is completing a master's degree in history at UdeM and recently did an internship at the DAGI.

Dicaire-Leduc focused on a portion of the Baby Collection—the N Series, which includes a variety of handwritten documents dealing with aboriginal affairs—with a view to deciphering the handwriting, which in many cases is illegible.

Dicaire-Leduc's master's studies were supervised by history professor Mathieu Arsenault, while his internship was overseen by a DAGI archivist. The budding historian also received support from the team behind the Donner le goût de l'archive à l'ère numérique project, which is dedicated to promoting archiving in the digital age and which is led by professor Dominique Deslandres.

Open-source software developed in Austria

Jean-Olivier Dicaire-Leduc

Credit: Amélie Philibert, Université de Montréal

To decipher the historical manuscripts, Dicaire-Leduc used Transkribus, an open-source software program developed by a team from the University of Innsbruck, in Austria. It scans photos of manuscripts and then automatically transcribes the handwritten text appearing on them. It also allows users to share documents and run exhaustive searches across several hundred thousand pages, which are subsequently archived.

"After you take a high-resolution digital image of an archival document and upload it to Transkribus, the software generates language models using learning algorithms that scan the text, identify recurring words and turns of phrase, and ultimately decipher the message," explained Dicaire-Leduc.

The challenge with the Baby Collection's N Series was that the documents covered a wide range of subjects and were written by different authors during different time periods.

"This series runs the gamut from court decisions to land concessions, demographic observations and merchandise purchase orders," the researcher said. "The automated transcriptions generated by Transkribus contained some errors, so I had to make corrections so that the documents would be easier to read and index."
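One common way to quantify how much correction an automated transcription needs is the character error rate (CER): the edit distance between the machine output and the corrected text, divided by the length of the corrected text. The article does not say how errors were measured in this project, so the following is only a minimal sketch of that general idea. The function names and the sample strings are invented for illustration and are not taken from the Baby Collection or from Transkribus.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def character_error_rate(automated: str, corrected: str) -> float:
    """Edit distance divided by the length of the corrected text."""
    if not corrected:
        raise ValueError("corrected transcription must not be empty")
    return edit_distance(automated, corrected) / len(corrected)


# Hypothetical sample line (invented, not from the collection).
automated = "conccssion de terrc"
corrected = "concession de terre"
print(f"CER: {character_error_rate(automated, corrected):.3f}")
```

A lower value means the automated transcription needed fewer manual corrections; comparing values across document types could show which kinds of handwriting give the software the most trouble.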

Three goals in mind

Dicaire-Leduc had three goals for his internship, which was a pilot project for the DAGI.

The first was to transcribe documents so that the content would be more accessible and could be indexed using archival research tools like UdeM's AtoM archival description app.
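To illustrate why transcription makes documents searchable, here is a minimal sketch of an inverted index, a basic structure behind full-text search that maps each word to the documents containing it. It is not how AtoM or Transkribus works internally, which the article does not describe. The document identifiers, the sample texts and the function names are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the ids of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical transcriptions (invented ids and texts).
documents = {
    "doc-1": "Concession of land near Sillery",
    "doc-2": "Court decision concerning trade",
    "doc-3": "Purchase order for merchandise near Sillery",
}
index = build_index(documents)
print(sorted(search(index, "Sillery")))
print(sorted(search(index, "land sillery")))
```

Until a handwritten page has been transcribed, it contributes no words to an index like this one, so it cannot be found by keyword at all.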

He also hoped to contribute to the university's initiative to promote archiving in the digital age by producing and sharing historical and archaeological data on the peoples living in the Montreal area during the 17th century.

"Finally, I wanted to analyze the content of the documents and critically review the archival descriptions produced when the Baby Collection was initially archived 70 years ago," added Dicaire-Leduc. "We don't want to eliminate terms that have aged poorly, but we want to contextualize them."

Above all else, Dicaire-Leduc believes the internship gave him the chance to "facilitate access to a part of the history of New France and possibly create archival research aids and dissemination tools that are accessible to everyone."

Who was Louis-François-Georges Baby?

Born in 1832, Louis-François-Georges Baby studied law before becoming mayor of Joliette, then the federal MP for Joliette in 1872, and finally Canada's inland revenue minister under Prime Minister John A. Macdonald from 1878 to 1880. He resigned from this position to be appointed to the Quebec Superior Court, and the following year was promoted to the Court of Appeal, where he served until his retirement in 1896.

For much of his adult life, Baby was an avid collector of historical documents and antiques. He amassed some 20,000 archival documents spanning three centuries (1601–1905), as well as a library of 3,400 rare and antiquarian books, which are now kept at UdeM's Bibliothèque des livres rares et collections spéciales.

His collection covers a wide range of subjects, including agriculture, education, military affairs, literature and politics. Some notable items include documents signed by important historical figures such as France's King Louis XIV and the Cardinal de Richelieu, as well as the correspondence of Patriotes leader and politician Louis-Joseph Papineau.

A more detailed description of the Louis-François-Georges Baby Collection is available in the UdeM Archives online catalogue.

The entire Baby Collection can also be found in Quebec's cultural heritage repertoire (Répertoire du patrimoine culturel du Québec).

  • One of the documents that Jean-Olivier Dicaire-Leduc managed to decipher using the Transkribus tool, which uses artificial intelligence.

    Credit: Amélie Philibert, Université de Montréal