

Medieval Studies Mobilising Digital Humanities

29 October 2025

On October 17th, Cardiff hosted the second workshop on Medieval Studies Mobilising Digital Humanities, a project funded by a GW4 Generator Grant between Cardiff University and Bristol University. The workshop, organised by Sara Pons-Sanz (Cardiff University) and Amy Jones (Bristol University), focussed on corpora and digital editions. Participants were introduced to a range of methods and tools, with the opportunity to gain hands-on experience in navigating the treacherous path from a physical manuscript all the way through transcription, corpus compilation and searching.

The first talk, by Alexander Roberts (Swansea University), introduced Transkribus, a software platform for automatic transcription of handwritten documents. The platform has been updated recently, with many new models to handle different languages and writing styles, from handwritten letters to illustrated manuscripts. We got to practise on a fragment of Dylan Thomas’s notebooks, with the AI models taking mere moments to produce a starting transcription. Roberts showed that choosing the right model and using high-resolution images are key to getting the best results. No AI transcription will be perfect – Thomas’s idiosyncratic “I” was consistently transcribed as a 9 – but it provides a rapid starting point for editing. Transkribus is not free, but it includes a powerful way of organising data, mapping text to images and hosting your data for others to access.

Next, Seán Roberts (Cardiff University) talked about what to do with transcribed text once you have it. We went through the basics of using Sketch Engine to create our own corpus that could be searched using metadata. Sketch Engine has recently added some limited support for medieval languages, but lemmatisation is still a challenge, especially with a lack of spelling standards. Roberts (the younger) argued that a little programming could help researchers externalise their processes into a ‘pipeline’ that provided reproducibility and reduced stress.
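To give a flavour of what such a 'pipeline' might look like, here is a minimal Python sketch. It is not the code shown at the workshop: the function names, the sample sentence and the tiny table of spelling variants are all invented for illustration. The idea is only that each step (normalising, tokenising, mapping variants to a headword, counting) is written down once and can be re-run on any text.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical spelling variants mapped to a chosen headword (illustrative only).
VARIANTS = {"kyng": "king", "kynge": "king", "quene": "queen"}


def normalise(text):
    """Lowercase and apply Unicode NFC so identical-looking forms compare equal."""
    return unicodedata.normalize("NFC", text).lower()


def tokenise(text):
    """Split into runs of word characters; letters such as thorn count as word characters."""
    return re.findall(r"\w+", text)


def map_variants(tokens):
    """Replace known spelling variants with their headword; leave other tokens alone."""
    return [VARIANTS.get(token, token) for token in tokens]


def run_pipeline(text):
    """Run every step in a fixed order and return a frequency count."""
    return Counter(map_variants(tokenise(normalise(text))))


# Invented sample sentence, not taken from any real manuscript.
counts = run_pipeline("The Kyng and the quene; þe kynge rode.")
print(counts.most_common(3))
```

A lookup table like this is no substitute for proper lemmatisation, but because every decision lives in the script, the same steps can be repeated exactly when a transcription is corrected.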

Anna Havinga (University of Bristol) and Tom Hinton (University of Exeter) discussed how to take text collections and annotate them further with XML metadata. They covered TEI headers to encode important information about a text, including the ability to mark different sections of a text as being in different languages. They also covered the XQuery language for searching XML. It is similar to SQL, but can handle finding nested items. They showed this in practice on the web interface for the manuscripts of Walter de Bibbesworth’s ‘Tretiz’, a thirteenth-century guide to the French language composed in rhyming couplets. The web tool allows one to search and compare different editions of the text side-by-side.
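As a rough illustration of searching nested, language-tagged XML, the sketch below uses Python's standard library rather than XQuery itself (its `ElementTree` module supports only a small subset of XPath). The TEI fragment is invented placeholder text, not taken from the 'Tretiz' edition, and the language codes (`fro`, `la`, `enm`) are chosen for the example.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# ElementTree stores xml:lang under the fixed XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented TEI fragment: a minimal header plus two sections in different languages.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Invented sample text</title></titleStmt>
      <publicationStmt><p>Unpublished illustration</p></publicationStmt>
      <sourceDesc><p>No source; made up for this sketch</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div xml:lang="fro">
        <l>first line with <foreign xml:lang="enm">gloss one</foreign></l>
        <l>second line</l>
      </div>
      <div xml:lang="la">
        <l>third line with <foreign xml:lang="enm">gloss two</foreign></l>
      </div>
    </body>
  </text>
</TEI>"""

root = ET.fromstring(SAMPLE)
ns = {"tei": TEI_NS}

# Read a piece of metadata from the TEI header.
title = root.find("tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", ns).text

# Find every Middle English gloss, however deeply nested, and record
# the language of the section that contains it.
glosses = [
    (div.get(XML_LANG), el.text)
    for div in root.iterfind(".//tei:div", ns)
    for el in div.iterfind(".//tei:foreign", ns)
    if el.get(XML_LANG) == "enm"
]

print(title)
print(glosses)
```

The nested loop is the part that a flat, table-style query struggles with: each gloss is found inside a line, inside a section, and the section's own language comes along with it.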

The last talk was by Jennifer Hurd, part of the new words team at the Oxford English Dictionary. The OED has been a critical resource for a long time, but the recently updated advanced search options provide powerful new ways of finding data. For example, words can be filtered to find only those first appearing at a certain date, or only those borrowed into the language. There are now deeper links with the Oxford Historical Thesaurus, including being able to search within a specific semantic field. This allowed us to do some fun searches, such as finding historical slang for writers or horse-riding. Hurd also discussed upcoming plans to expand the dictionary to include more historical evidence and a greater variety of world Englishes.

Overall, it’s a very exciting time to be studying medieval texts with digital methods. There are lots of new digital tools available, but importantly they are all aimed at facilitating human investigation of the past.

The next workshop in the series will be held in Bristol on Friday 28th of November and will focus on tools for geographic mapping of language data.