What possibilities do open-source tools offer for automated handwriting recognition? How long does the transcription of a pathological dissection protocol in Kurrent script from 1862 take, and what factors need to be considered when training the software? These questions will be addressed by a pilot project being conducted by the Berlin Museum of Medical History at Charité, supported by the Digital Network for Collections and in collaboration with the DACHS research unit at the University of Würzburg’s Kallimachos Center for Philology and Digitality (ZPD).
The Berlin Museum of Medical History at Charité preserves 46 voluminous folios containing the handwritten, bound dissection protocols of Charité’s Institute of Pathology from 1856 to 1902. This collection comprises 35,156 protocols spanning a total of 41,111 folio pages. Further volumes of these records, covering the period 1849 to 1856, are held by the University Archives in Würzburg. The records are an important primary source in the history of science. They date from the time of Rudolf Virchow (1821–1902), who is regarded as one of the founders of modern, science-based medicine. Virchow made a significant contribution to the development of pathology as a discipline by coining terms for disease and developing concepts of disease that are still in use today.
Due to their poor state of preservation, the protocols held by the Berlin Museum of Medical History at Charité have rarely been made available to researchers. However, funding from the Coordination Office for the Preservation of Written Cultural Heritage (KEK) and the Friends of the Museum (Förderverein) is currently enabling all 46 volumes to be restored and digitized by the end of 2025.
Project goal
This project has been designed as a pilot study for a larger research project. The segmentations and transcriptions produced will serve as “ground truth” for training a model for the subsequent machine processing of the handwritten material. This will yield valuable information on how well the open-source OCR4all software and the LAREX editor perform for the project, and it will provide a solid basis for further cost estimates. The long-term aim is to create a historical-critical edition of all the protocols from the period Rudolf Virchow spent working in Berlin and Würzburg (1849–1902) and to make it available online to researchers.
Photos (1, 2): © 2022 KBE/Kuhn, FREIZEIT Gestaltung
Screenshot (3): Dissection protocol from 1862 in the LAREX editor (2024, Digital Network for Collections)