User talk:GavinRobinson/Named entity subobjects

From MarineLives
Some things don't look quite right in the query results but we'll see if it sorts itself out later.--GavinRobinson (talk) 14:14, July 17, 2016 (UTC)

Hi Gavin, I am just catching up with the work you have been doing.

General Principles


  • Fully agree as to the importance of Templates, and would add Forms to the mix.


  • The crucial thing about Forms is that they keep less experienced editors away from the semantic markup of Properties


  • Forms can be used to restrict the choice of values for Properties to a controlled list


- For example, enforcing one consistent spelling of a London parish when entered as a birth or residential location
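
To make the controlled-list point concrete, here is a minimal sketch in Python of the kind of check a Form performs when it restricts a Property to a fixed set of values. It is not how SMW or Forms implement this internally; the parish list, the `canonical_parish` name and the matching rules are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical controlled list; the entries are illustrative only.
CONTROLLED_PARISHES = [
    "St Olave Hart Street",
    "St Dunstan in the East",
    "Stepney",
]


def _key(name: str) -> str:
    """Build a comparison key that ignores case, full stops and extra spaces."""
    return " ".join(name.replace(".", " ").lower().split())


# Map each comparison key to the single approved spelling.
_LOOKUP = {_key(parish): parish for parish in CONTROLLED_PARISHES}


def canonical_parish(entered: str) -> Optional[str]:
    """Return the approved spelling for an entered parish name.

    Returns None when the entry does not match the controlled list,
    so the caller can reject it rather than store a new variant.
    """
    return _LOOKUP.get(_key(entered))
```

An entry such as "st. olave  hart street" resolves to the one approved spelling, while a misspelling such as "Stepny" is rejected instead of silently creating a second value for the same parish.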



In-line mark-up of people vs. biographical pages of people


  • This is a big topic!


  • Firstly, in the short term we have a problem implementing in-line markup in the transcription field of our Form-based pages. This stems from a design decision made when porting data from eight separate volume-based Wikispot.org wikis into one integrated MarineLives wiki, hosted on our own server and managed by our own volunteer systems admin, Rowan Beentje. In the Wikispot.org environment we had used square brackets ([ and ]) and curly brackets ({ and }) as symbols to denote specific things.


Curly brackets were used to denote the name of a case as in:

Oliver Cromwell vs King}
Charles}

Square brackets were used to denote editorial comments related to layout, as in

[CENTRE HEADING]
[LH MARGIN]
[SIGNATURE, RH SIDE]

This existing markup got in the way of the parser function in SMW when we ported the data over, resulting in some pages with this markup only partially displaying. Consequently, Rowan wrote a special SMW module to turn off the function of the square and curly brackets in the Transcription field of our Form-based pages, and also in the portions of the pages conceived for notes on People, Places, Ships, Miscellaneous and Sources. (In passing, please note that these fields have been used only to a limited degree in annotating transcriptions, and inconsistently.) They were not conceived of semantically, purely as text annotations.

It would be possible, however, to edit all the transcribed text (ca. 10,000 text pages) to eliminate the square and curly brackets, or to put nowiki protection on them (I think this would be possible). In that case, we could talk seriously about in-line markup.
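
As a rough idea of what the nowiki option could look like in bulk, here is a minimal Python sketch that wraps every run of square or curly brackets in nowiki tags. It is only a sketch under stated assumptions: the function name `protect_brackets` is hypothetical, the text is assumed to contain no nowiki tags already, and it does not touch the wiki itself (fetching and saving the ca. 10,000 pages would need a separate step).

```python
import re

# Matches one or more consecutive bracket characters: [ ] { }
_BRACKET_RUN = re.compile(r"[\[\]{}]+")


def protect_brackets(text: str) -> str:
    """Wrap each run of [, ], { or } in <nowiki> tags.

    Assumes the text has not been protected before; running this twice
    would wrap the brackets a second time.
    """
    return _BRACKET_RUN.sub(
        lambda match: "<nowiki>" + match.group(0) + "</nowiki>", text
    )
```

Applied to "[LH MARGIN]", this protects the opening and closing bracket separately and leaves the words between them untouched. The trade-off is that it protects every bracket in the field, so any genuine wiki links or in-line markup would have to be added after the conversion, not before.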

  • The second big issue is the way that SMW works in associating properties with the underlying data structure(s). Our data structure currently follows the physical manuscript volumes, which are based on physical pages. Our page naming convention follows the naming of the physical pages. Where there is no foliation in the physical volumes, we have introduced artificial foliation, following the recto, verso, recto, verso pattern. This has the big advantage of enabling a transcriber, a reader or a researcher to move easily from image to text transcription. Each physical page contains 350-450 words, which is a digestible amount. It also gives confidence to academics using the material, because they can quickly get a sense of the accuracy of the transcription for themselves.


BUT, there is of course a second data structure in the volumes, built around depositions (and cases). A case may have anything between one and fifteen depositions. Of course, many cases have no depositions, because they were stopped before the deposition stage. You can follow the legal process in the Act Books. See, for example, HCA 3/46, which is the Act Book for the years 1654-1656.

Thierry Daunois is working with me to explore what our choices are for semanticisation, and is arguing that we should restructure around depositions. This would be a very big change, with pros and cons.

The crucial point about in-line markup is that it is associated with the page on which it appears. So if the page is the electronic representation of a physical page, that is one type of association; and if the page is the electronic representation of the deposition (anything between one third of a physical page and ten to twelve pages), that is another type of association.

If the in-line semantic markup is being used simply to enhance search, then the links to the "page" are not so important, e.g. show me the physical pages or the depositions in which there are Events of a certain type.

But if the searches seek to infer relationships between the in-line markup and the metadata around a deponent, then the choice of physical page or deposition page as a data structure is very important.
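
The difference can be sketched with a toy model in Python. Everything in it is hypothetical (the folio labels, deposition identifiers, occupations and function names); it only illustrates why a simple search works with either structure, while linking markup to deponent metadata needs the deposition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    kind: str            # type of in-line markup, e.g. "Event"
    value: str           # the marked-up value
    physical_page: str   # folio on which the markup appears
    deposition: str      # deposition to which the text belongs


# Hypothetical sample data: one folio (f.12v) carries text from two depositions.
ANNOTATIONS = [
    Annotation("Event", "shipwreck", "f.12r", "dep-A"),
    Annotation("Event", "shipwreck", "f.12v", "dep-A"),
    Annotation("Event", "capture", "f.12v", "dep-B"),
]

# Hypothetical deponent metadata, held per deposition.
DEPONENT_OCCUPATION = {"dep-A": "mariner", "dep-B": "merchant"}


def pages_with(kind: str, value: str) -> list:
    """Simple search: which physical pages contain this markup?"""
    return sorted({a.physical_page for a in ANNOTATIONS
                   if a.kind == kind and a.value == value})


def depositions_on_page(page: str) -> list:
    """Which depositions have text on a given physical page?"""
    return sorted({a.deposition for a in ANNOTATIONS if a.physical_page == page})


def occupations_for(kind: str, value: str) -> list:
    """Inference: occupations of deponents whose depositions contain this markup."""
    return sorted({DEPONENT_OCCUPATION[a.deposition] for a in ANNOTATIONS
                   if a.kind == kind and a.value == value})
```

In this toy data the folio f.12v belongs to two depositions, so markup attached only to the physical page cannot say which deponent it relates to. Once each annotation also carries its deposition, the link to the deponent's metadata is unambiguous.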

I need to break off now, but will come back to the topic with some specifics.--Colin Greenstreet (talk) 19:06, July 26, 2016 (UTC)