Marine Lives guide to creating a Transkribus Ground Truth
This is the main page of a guide that Marine Lives is creating to cover the practical aspects of creating a Transkribus Ground Truth.
Objective
Our overall objective is to make 80,000 images of English High Court of Admiralty depositions covering 1570 to 1690 publicly available and searchable.
To do this, our immediate objective is to create a C17th English secretarial hand HTR (Handwritten Text Recognition) model, which we will use on our collection of 80,000 images of English High Court of Admiralty depositions.
We are aiming to create two models. The first will be based on a Ground Truth of 500,000 words (roughly 1,000 manuscript images). The second will be based on one million words (roughly 2,000 manuscript images).
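The figures above imply an average of about 500 words per manuscript image. As a rough planning aid, the arithmetic can be sketched as below. The function names are hypothetical, and the words-per-image figure is only the average implied by the numbers in this guide, not a measured value; real pages will vary.

```python
import math

# Average implied by the guide's figures: 500,000 words over roughly 1,000 images.
# This is an assumption for planning purposes, not a measured density.
WORDS_PER_IMAGE = 500_000 / 1_000


def images_needed(target_words, words_per_image=WORDS_PER_IMAGE):
    """Estimate how many transcribed images a Ground Truth of target_words requires."""
    return math.ceil(target_words / words_per_image)


def estimated_words(image_count, words_per_image=WORDS_PER_IMAGE):
    """Estimate the total word count of image_count manuscript images."""
    return round(image_count * words_per_image)


if __name__ == "__main__":
    print("Images for first model:", images_needed(500_000))
    print("Images for second model:", images_needed(1_000_000))
    print("Estimated words in full collection:", estimated_words(80_000))
```

Under the same assumption, the full collection of 80,000 images would hold on the order of 40 million words, which gives a sense of how small the Ground Truth is relative to the material the model will eventually be run on.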
For our first model, we are using existing semi-diplomatic transcriptions of the HCA 13/72 volume [late 1650s], made between 2013 and 2015 by Marine Lives volunteers. To create a highly reliable Ground Truth, we have to convert these semi-diplomatic transcriptions back to full diplomatic transcriptions and mark up contractions. We also have to treat interlineation differently: each interlineation is shown as a separate line, and its insertion point or points are marked in the line on which the interlineation depends.
Tools
We are working with several related Transkribus tools and with our own semantic media wiki:
2. Transkribus Lite version 2.0
3. Marine Lives semantic media wiki
Work Process
1. Automatic layout recognition of all 1,518 images in HCA 13/72
We are experimenting
We are experimenting with a range of Transkribus tools related to layout analysis and HTR.
Questions
We are developing a running list of questions.
Some of these questions we will probably be able to answer ourselves, as we get more experience of building our Ground Truth.
In the meantime, we would appreciate suggestions from fellow Transkribus users.