<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-gb">
		<id>http://www.marinelives.org/index.php?action=history&amp;feed=atom&amp;title=Ground_Truth_Work_Process</id>
		<title>Ground Truth Work Process - Revision history</title>
		<link rel="self" type="application/atom+xml" href="http://www.marinelives.org/index.php?action=history&amp;feed=atom&amp;title=Ground_Truth_Work_Process"/>
		<link rel="alternate" type="text/html" href="http://www.marinelives.org/index.php?title=Ground_Truth_Work_Process&amp;action=history"/>
		<updated>2026-04-05T19:46:34Z</updated>
		<subtitle>Revision history for this page on the wiki</subtitle>
		<generator>MediaWiki 1.25alpha</generator>

	<entry>
		<id>http://www.marinelives.org/index.php?title=Ground_Truth_Work_Process&amp;diff=130939&amp;oldid=prev</id>
		<title>ColinGreenstreet at 22:02, March 3, 2022</title>
		<link rel="alternate" type="text/html" href="http://www.marinelives.org/index.php?title=Ground_Truth_Work_Process&amp;diff=130939&amp;oldid=prev"/>
				<updated>2022-03-03T22:02:55Z</updated>
		
		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;tr style='vertical-align: top;'&gt;
				&lt;td colspan='2' style=&quot;background-color: white; color:black; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan='2' style=&quot;background-color: white; color:black; text-align: center;&quot;&gt;Revision as of 22:02, March 3, 2022&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;'''We have set up a simple work process'''&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;'''We have set up a simple work process'''&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;[http://www.marinelives.org/wiki/Marine_Lives_guide_to_creating_a_Transkribus_Ground_Truth BACK TO GROUND TRUTH MAIN PAGE]&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;----&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;__TOC__&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;__TOC__&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;----&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==Automatic layout recognition of all 1518 images in HCA 13/72==&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==Automatic layout recognition of all 1518 images in HCA 13/72==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key marinelives:diff:version:1.11a:oldid:130927:newid:130939 --&gt;
&lt;/table&gt;</summary>
		<author><name>ColinGreenstreet</name></author>	</entry>

	<entry>
		<id>http://www.marinelives.org/index.php?title=Ground_Truth_Work_Process&amp;diff=130927&amp;oldid=prev</id>
		<title>ColinGreenstreet: Created page with &quot;'''We have set up a simple work process'''  __TOC__  ==Automatic layout recognition of all 1518 images in HCA 13/72==  - Used the CITlab Advanced Tool  File:CITlab Advanced...&quot;</title>
		<link rel="alternate" type="text/html" href="http://www.marinelives.org/index.php?title=Ground_Truth_Work_Process&amp;diff=130927&amp;oldid=prev"/>
				<updated>2022-03-03T21:31:40Z</updated>
		
		<summary type="html">&lt;p&gt;Created page with &amp;quot;&amp;#039;&amp;#039;&amp;#039;We have set up a simple work process&amp;#039;&amp;#039;&amp;#039;  __TOC__  ==Automatic layout recognition of all 1518 images in HCA 13/72==  - Used the CITlab Advanced Tool  File:CITlab Advanced...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;'''We have set up a simple work process'''&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
==Automatic layout recognition of all 1518 images in HCA 13/72==&lt;br /&gt;
&lt;br /&gt;
- Used the CITlab Advanced Tool&lt;br /&gt;
&lt;br /&gt;
[[File:CITlab Advanced Tool ML 03032022.png|500px|thumb|left|Layout Analysis controls in Tools section of Transkribus Expert Client controls panel]]&lt;br /&gt;
&lt;br /&gt;
- Modified the layout page by page after manual inspection of automatically generated layouts&lt;br /&gt;
&lt;br /&gt;
     We are only just beginning to think through what makes sense in terms of use of Text Regions when creating our Ground Truth&lt;br /&gt;
     We are finding that the automatic tool is typically producing between one and three Text Regions per manuscript image&lt;br /&gt;
     Typically the tool is NOT identifying text blocks on the left hand side of an image as separate from the structurally distinct main body of text&lt;br /&gt;
     Ideally, we would train the automatic layout recognition tool to be sensitive to the typical structures of HCA legal depositions, and we are looking into this&lt;br /&gt;
     In the short term, we are manually adding Text Regions, and changing the shape and size of Text Regions&lt;br /&gt;
     However, base lines of text have already been recognised and allocated to specific Text Regions.&lt;br /&gt;
     We have found an easy way using Transkribus layout tools to reallocate the base lines [see below]&lt;br /&gt;
&lt;br /&gt;
- The three key modifications we are making are:&lt;br /&gt;
&lt;br /&gt;
(a) Adjusting the number, size and shape of Text Regions&lt;br /&gt;
(b) Checking all automatically generated base lines (which themselves are &amp;quot;children&amp;quot; of a parent Text Region)&lt;br /&gt;
     Look for breaks in base lines&lt;br /&gt;
     Look for incomplete base lines&lt;br /&gt;
     Connect broken base lines&lt;br /&gt;
     Extend incomplete base lines&lt;br /&gt;
(c) Reallocating base lines to our newly created and/or adjusted Text Regions&lt;br /&gt;
&lt;br /&gt;
[[File:Transkribus Expert Client Layout HCA 1372 f.14v.png|750px|thumb|left|Layout of HCA 13/72 f.14v once we have manually adjusted the Text Regions, creating six Text Regions and reallocating lines to those regions]]&lt;br /&gt;
&lt;br /&gt;
[[File:Reallocating Base Lines To New Text Regions One 03032022.png|750px|thumb|left|Reallocating base lines to new Text Regions: Part One]]&lt;br /&gt;
&lt;br /&gt;
[[File:Reallocating Base Lines To New Text Regions Two 03032022.png|750px|thumb|left|Reallocating base lines to new Text Regions: Part Two]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
==Input of existing semi-diplomatic transcriptions of HCA 13/72 manuscript pages into Transkribus Expert Client==&lt;br /&gt;
&lt;br /&gt;
Once the automatically generated Text Regions have been adjusted for a specific image page:&lt;br /&gt;
&lt;br /&gt;
* Input the semi-diplomatic Marine Lives transcription for the relevant page, matching each line of transcribed text to the correct automatically generated line within the correct Text Region&lt;br /&gt;
&lt;br /&gt;
* The chart below shows our workflow for manuscript page HCA 13/72 f.11v.  &lt;br /&gt;
     We have the Marine Lives wiki open at the correct page on the left hand side of our screen. &lt;br /&gt;
     In the middle and on the right hand side of our screen we have the Transkribus Expert Client open with the Layout Tab open in Transcription View. &lt;br /&gt;
     This enables us to see the relevant part of the image, with the relevant Text Region.&lt;br /&gt;
     We are pasting transcribed text against the correct lines. &lt;br /&gt;
     To ensure a good human overview of the document, we have pasted two or three lines of transcribed text into each Text Region&lt;br /&gt;
     Then we work methodically through all the text&lt;br /&gt;
&lt;br /&gt;
[[File:Workflow Page HCA1372f.11v.png|750px|thumb|left|Our workflow showing Marine Lives wiki page and Transkribus Expert Client with Layout Tab open in Transcription View: Part Two]]&lt;br /&gt;
&lt;br /&gt;
----&lt;/div&gt;</summary>
		<author><name>ColinGreenstreet</name></author>	</entry>

	</feed>