What possibilities do open-source tools offer for automated handwriting recognition? How long does the transcription of a pathological dissection protocol in Kurrent script from 1862 take, and what factors need to be considered when training the software? These questions will be addressed by a pilot project being conducted by the Berlin Museum of Medical History at the Charité, supported by Digitales Netzwerk Sammlungen and in collaboration with the DACHS research centre at the University of Würzburg’s Zentrum für Philologie und Digitalität „Kallimachos“ (ZPD).
The Berlin Museum of Medical History at the Charité preserves 46 voluminous folio volumes containing the handwritten, bound dissection protocols of the Charité’s Institute of Pathology from 1856 to 1902. This collection comprises 35,156 protocols in total, spanning 41,111 folios. Further volumes of these records, covering the period 1849 to 1856, are held by the University Archives in Würzburg. The records are an important primary source in the history of science and date from the time of Rudolf Virchow (1821–1902), who is considered a founder of modern science-based medicine. He made a significant contribution to the development of pathology as a discipline by coining disease terms and concepts that are still in use today.
Due to their poor state of preservation, the protocols at the Berlin Museum of Medical History at the Charité have rarely been made available to researchers. However, funding from the Coordination Office for the Preservation of Written Cultural Heritage (KEK) and the “Friends of the Museum” (Förderverein) is currently enabling all 46 volumes to be restored and digitised by the end of 2025.
Project goal
This project has been designed as a pilot study for a larger research project. The segmentations and transcriptions produced will be used as ‘ground truth’ to train a model for the subsequent machine processing of the handwritten material. This will provide valuable information on how well the open-source OCR4all software and the LAREX editor perform for this material, as well as a solid basis for further cost estimation. The long-term aim is to create a digital critical edition of all the dissection protocols from Rudolf Virchow’s working years in Würzburg and Berlin (1849–1902) and make it available to researchers.
Photos (1, 2): © 2022 KBE/Kuhn, FREIZEIT Gestaltung
Screenshot (3): Dissection protocol from 1862 in the LAREX editor (2024, Digitales Netzwerk Sammlungen)