Difference between revisions of "User:GavinRobinson/Named entity subobjects"

From MarineLives

Latest revision as of 16:25, July 18, 2016

This is a demo of possible ways of using semantic tags for named entities mentioned in transcribed documents. The current version concentrates on people, but if it works it could easily be extended to other types of named entity, such as ships or places. You can discuss it on this page's talk page or on the talk pages of the more specific pages linked below. Everyone should feel free to edit the test pages.

General Principles


  • Template everything. Hiding the implementation makes things easier for editors because they only have to learn templates. It's also easy to change the implementation in the future without editing every page.
  • Getting the best out of Semantic MediaWiki means not having to search as much. Most users should be able to find most of what they want by clicking through a trail of wikilinks, although the trail will usually have to start with a simple search, and there will always be a need for some users to construct their own queries.
  • Historical data is often uncertain. Semantic markup needs to represent uncertainty.
  • TEI can help us to think more clearly about the structure and semantics of documents even if we're not using it for markup.


Subobjects and Properties


This approach uses subobjects to group together the following properties:

  • Property:Name transcribed as: text property containing the name string as transcribed in the text. This allows searching on strings marked up as names even if the person they refer to hasn't been identified.
  • Property:Identified as person: page property linking to a page for the person if they can be positively identified.
  • Property:Could be person: page property similar to the above, but for people whose identity isn't as certain. Allows multiple values if it could be more than one person.
  • Property:Performs role in document: the person's role in the document, e.g. deponent. Currently a text property, but could be changed to a page property in future, as that would allow documenting what each role means.
  • Property:Mentioned in page: always defaults to the page name of the page that contains the subobject. Used to hide subobjects in query results and avoid the need for an extra level of query to find the parent page.
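
As a rough illustration of how these properties might be grouped, a subobject can be declared with Semantic MediaWiki's #subobject parser function. This is only a sketch, not the actual template code used in the demo. The values are taken from the example query results on this page, the bare page names for the two candidate people are assumptions (the real pages may sit under a user namespace), and using +sep to split multiple values is one possible way of filling in Could be person.

```wikitext
<!-- Sketch only: an anonymous subobject for one name mentioned on the current page. -->
<!-- Leaving the first parameter empty lets Semantic MediaWiki generate the subobject identifier. -->
<!-- "+sep=;" splits the preceding value into two separate Could be person values. -->
{{#subobject:
 |Name transcribed as=William Pease
 |Could be person=William Pease senior;William Pease junior|+sep=;
 |Performs role in document=Mentioned
 |Mentioned in page={{FULLPAGENAME}}
}}
```

Because the name could refer to more than one person, nothing is stored in Identified as person here. The uncertainty is carried entirely by the choice of property.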


Templates, Forms and Transcripts


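The subobjects can be used in two different ways:

  • inline markup: Template:persname can be embedded directly in transcribed text. If the person is positively identified, it creates a wikilink to their page. For example, see User:GavinRobinson/Test page inline.
  • semantic forms: Template:persname in form is a repeatable template that can be added to a form. It allows entering metadata about people separately from the transcribed text. For example, see User:GavinRobinson/Test page with form, which uses Form:persname (which also transcludes Template:Get person subobjects).
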
There's plenty of room to explore and debate which approach is better. Inline is likely to be easier for readers but harder for editors, and could be inconvenient for text miners. The semantic forms approach is likely to be easier for editors but harder for readers, and fits better with current practice at Marine Lives.
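
For the inline approach, a template along the following lines could both store the subobject and display the name in the running text. This is a hypothetical sketch rather than the real content of Template:persname. The parameter names name, id and role are invented for illustration, the default role of "Mentioned" is an assumption based on the example query results, and it assumes that Semantic MediaWiki skips properties whose value is empty. The #if function requires the ParserFunctions extension.

```wikitext
<!-- Hypothetical template body; parameter names (name, id, role) are assumptions. -->
<!-- First store the subobject, then show the name, linked only when an id is given. -->
<includeonly>{{#subobject:
 |Name transcribed as={{{name|}}}
 |Identified as person={{{id|}}}
 |Performs role in document={{{role|Mentioned}}}
 |Mentioned in page={{FULLPAGENAME}}
}}{{#if: {{{id|}}} | [[{{{id}}}|{{{name|}}}]] | {{{name|}}} }}</includeonly>
```

An editor might then write something like {{persname|name=David Avys|id=User:GavinRobinson/David Avys}} in the middle of a transcription. This shows the trade-off mentioned above: the reader sees a linked name in context, while the editor has to work around template calls inside the text.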

Person Pages


These are examples of pages for people that can contain a biography, queries for other pages that mention them, and links to external sources:


In future, pages like this could contain structured semantic data about people. They could contain semantic links to each other, which might be something else to think about when deciding on semantic markup of names.
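
A query of the kind a person page might use to list the documents that mention that person could look roughly like this. It is a sketch under the assumption that the page running the query is the person's own page. It is not necessarily how Template:Get person subobjects is implemented.

```wikitext
<!-- Sketch only: find subobjects that point at the current person page, -->
<!-- either as a positive identification or as a possible one. -->
<!-- mainlabel=- hides the subobject itself, so Mentioned in page acts as the first column. -->
{{#ask: [[Identified as person::{{FULLPAGENAME}}]] OR [[Could be person::{{FULLPAGENAME}}]]
 |mainlabel=-
 |?Mentioned in page
 |?Name transcribed as
 |?Performs role in document
 |format=table
}}
```

Dropping the second condition would restrict the list to positive identifications only.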


Example Query


This inline query should return every existing person subobject that has at least a transcribed name:

Mentioned in page | Name transcribed as | Identified as person | Could be person | Performs role in document
GavinRobinson | Daniell Sowton | | | Mentioned
GavinRobinson | David Avys | GavinRobinson/David Avys | | Mentioned
GavinRobinson | George Willingham | GavinRobinson/George Willingham (d. 1651), merchant | | Mentioned
GavinRobinson | William Pease | | William Pease senior; William Pease junior | Mentioned
GavinRobinson/Test page inline | Lydea | | | Mentioned
GavinRobinson/Test page inline | Earle of denbye | | William Fielding, 1st Earl of Denbigh; Basil Fielding, 2nd Earl of Denbigh | Mentioned
GavinRobinson/Test page with form | William Pease | | William Pease senior; William Pease junior | Mentioned
GavinRobinson/Test page with form | Daniell Sowton | | | Mentioned
GavinRobinson/Test page with form | David Avys | GavinRobinson/David Avys | | Mentioned
GavinRobinson/Test page with form | George Willingham | GavinRobinson/George Willingham (d. 1651), merchant | | Mentioned
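
For reference, an inline query that would produce results of this shape could be written roughly as follows. The property names are those listed under Subobjects and Properties. The exact query used on the page is not shown here, so treat this as an assumed reconstruction.

```wikitext
<!-- Sketch only: every subobject that has any value for Name transcribed as. -->
<!-- "::+" matches any value; mainlabel=- hides the subobject column itself. -->
{{#ask: [[Name transcribed as::+]]
 |mainlabel=-
 |?Mentioned in page
 |?Name transcribed as
 |?Identified as person
 |?Could be person
 |?Performs role in document
 |format=table
}}
```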