OpenText.org Word Group Annotation Proposal November 2000

Word Group Annotation

Version 0.1 (10/11/2000)

OpenText.org Proposal November 2000

Editors:
Matthew Brook O'Donnell
Stanley E. Porter
Jeffrey T. Reed

Abstract

The word group is the basic unit of analysis in the OpenText.org model. This document outlines the linguistic analysis of the word group and its components and describes the XML elements and attributes used for its annotation.

Status of this document

This document is the initial proposal of the word group annotation scheme. It is currently under review and comments are requested. Please post comments to OpenText.org forum.

1. Introduction

2. Definitions

3. Features analyzed at the word group level
3.1. Semantic domains and relations (Field)
3.2. Participant reference type (Tenor)
3.3. Boundaries, punctuation, lexical form, and morphological features (Mode)

4. Elements and attributes available for word group annotation
4.1. Namespace
4.2. Main elements
4.3. <wg:group> element
4.4. <w> element
4.5. Part-of-speech elements
4.6. <wf> element
4.7. <punc> element
4.8. <wg:part> element

5. Examples

6. Use of word group annotation scheme and its components

7. Document Type Definition

1. Introduction

a. The concept of the 'word' as the basic unit of linguistic analysis is central to many grammatical and lexicographical resources. Though most language users possess an intuitive notion of the word, its definition and analysis have been an enduring problem for linguists and general linguistics. Recent studies making use of the concept of collocation have demonstrated that distinct meanings are often communicated by fixed phrases or combinations of words, and that these meanings are quite distinct from the semantic analysis of the individual words in the phrase. Morphological analysis of individual words can also be ambiguous, as in the case of the article tw'n, which can be identified as genitive plural but whose gender cannot be decided at the word level. This ambiguity is removed when the word is analyzed as part of a word group, e.g. tw'n lovgwn aujtou'.

b. A further justification for considering the word group as the basic unit of analysis is that relations between the members of the group (e.g. a head-term with a genitive modifier) take place before the clause level. This allows the clause level analysis to focus on the ordering and interaction of its components (e.g. subject, predicate, complement and adjunct [see Clause Level Annotation]). Thus many features that have traditionally been treated under the label of syntax can be dealt with at the word group level.

2. Definitions

[d1] This document assumes that the tokenization (the orthographic marking of boundaries between words) of the material has already been completed (see Orthographic Annotation). The term word refers to a distinct orthographic unit. This includes cases of crasis, e.g. kajkei' and kajgwv.

[d2] A word group will usually consist of a head-term and any and all of its modifiers, though it will frequently consist of just a single word.

[d3] A phrase such as oJ lovgo" Pauvlou is a single word group. The head term is lovgo" to which the other words are in a subordinate relationship. The terms oJ and Pauvlou are referred to as modifiers.

[d4] Two words joined by a conjunctive particle, such as Pau'lo" kai; Timovqeo", are said to be in a co-ordinating relationship and thus constitute two separate word groups.

[d5] Appositional phrases, e.g. Timovqeo" oJ ajdelfov", are usually classified as modifiers of the term to which they stand in apposition. Thus Timovqeo" oJ ajdelfov" is considered as a single word group.

[d7] There are five relationships associated with word groups. There are four types of modification of a term within a word group by its modifiers: (1) specification, (2) definition, (3) qualification, and (4) preposition. An additional relationship between two terms in a word group is connection, where the two terms are joined by a conjunction.

[d8] Specification occurs when a modifier classifies or identifies the word it modifies. Common examples of specifiers are articles, e.g. hJ ajdelfhv, and prepositions, e.g. ejn dovxh/. In a preposition phrase such as eij" to;n lovgon, both eij" and tovn are specifiers of lovgon.

[d9] Definition occurs when a modifier attributes features to, or further defines, the word it modifies. Common examples of definers are adjectives (both attributive and predicate structure) and appositional words or phrases.

[d10] Qualification occurs when a modifier in some way limits or constrains the scope of the word it modifies. Common examples of qualifiers are words in the genitive and dative case, and also negative particles functioning at the word group level.

[d11] Preposition occurs when a word specified by a preposition (i.e. the object of a preposition) modifies another element within the word group. For example, in the word group to; kat« ejme; provqumon, the term ejmev is in a prepositional relationship with the head term provqumon. This relationship only applies to prepositional phrases within word groups and not when the prepositional phrase functions as a clause component.

[d12] Connection is a relationship between two modifiers in a single word group. The conjunctive particles involved in this relationship are functioning at the word group level (e.g. Filhvmoni tw'/ ajgaphtw'/ kai; sunergw'/ hJmw'n) and differ from those that join two word groups (e.g. Pau'lo" kai; Timovqeo").

[d13] A discourse Participant is a person or thing that might represent a significant or relevant token within the discourse. It is not necessarily a token involved as an actor or patient of processes. For instance, the only references to qeov" in a discourse may be as word group qualifiers (e.g. oJ lovgo" qeou'). However, an annotator might choose to include it as one of the participants to be marked. The decision as to which words and phrases to count as participants is a subjective one. The important factor is consistency throughout the discourse.

[d14] Grammaticalized participant reference involves a full, substantive reference to a participant (e.g. oJ Timovqeo" or oJ levgwn).

[d15] Reduced participant reference involves the use of a pronoun or other referring expression to point to a participant (e.g. aujtov").

[d16] Implied participant reference includes the morphological features of person and number with a finite verb form (e.g. the third singular reference in the form levgei).

3. Features analyzed at the word group level

a. Though not formally recognized in the current elements and attributes, it is helpful to divide the features analyzed at the word group level according to whether they belong to the field, tenor or mode of discourse.

3.1. Semantic domains and relations (Field)

a. Each word within the group is marked with its major domain number(s) from the Louw-Nida dictionary. If a word is classified in more than one major domain each domain is noted in a comma separated list in the order they appear in the index of Louw-Nida, e.g. 88, 65, 57.

b. The semantic domain of a word group is taken to be that of its head term. If there is more than one possible domain for this term, the annotator must select just one from this list. The entry for a word in the Louw-Nida lexicon often includes citations associated with a specific domain classification. These should be consulted when selecting the domain classification of a head term. In addition, the annotator should pay attention to the domains of surrounding words, as by collocation they may help the disambiguation process. [[Do we need to specify criteria for this selection? What are they?] (-)]

c. Each word and word group within the discourse is assigned a unique identifier. For each word group the identifier of its head term is marked. For the other words within the group, each one is marked with an indication of which other word within the group it modifies and an indication of the kind of modification relationship (see Definitions).

d. A modifier can only modify one other word within its group. Thus only the closest semantic relation is indicated. This explains why the preposition katav in the group to; kat« ejme; provqumon is said to modify ejmev and not provqumon. In cases where a modifier appears to modify more than one word, for instance the article in Phlm. 1 tw'/ ajgaphtw'/ kai; sunergw'/ hJmw'n, it should be marked as modifying the word closest to it. So, tw'/ is annotated as a specifier of ajgaphtw'/ (higher level annotation can mark the fact that it serves as a modifier for two or more words).

e. These semantic relationships between members of a word group can be visualized using a simple series of boxes. Each word in a group is placed in the top row of a box, as shown in Fig. 1. If the word has modifiers within the group, four modifier slots are drawn, one for each of the types of modification: specification (sp), definition (df), qualification (ql), and preposition (pr).

WORD
sp	df	ql	pr
WORD(S)	WORD(S)	WORD(S)	WORD(S)

Fig. 1. A Word and its Possible Modifier Slots

f. Each of these modification slots can contain one or more words, visualized in the same way. In a complex word group a recursive nesting of word boxes results. For example, Fig. 2 visualizes the semantic relations within the word group Pau'lo" devsmio" Cristou' «Ihsou'. The head term is Pau'lo", which is modified through definition by devsmio", which is in turn qualified by Cristou', which itself is defined by «Ihsou'. This analysis conforms with the principle that each modifier in a word group modifies at most one other word in the group.

Fig. 2. Nesting within a Complex Word Group
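The nesting shown in Fig. 2 can be recovered mechanically from the `modify` and `rel` attributes described in section 4.4. The following is a minimal sketch in Python, assuming the standard-library `xml.etree.ElementTree` parser; the identifiers (`wg1`, `w1` to `w4`), the bare `<NON/>` placeholders and the helper names `build_tree` and `render` are illustrative assumptions, not part of this proposal.

```python
import xml.etree.ElementTree as ET

NS = "http://www.OpenText.org/ns/word-group"

# Illustrative annotation of the word group from Fig. 2.
# The <NON/> elements are placeholders; their attributes are omitted for brevity.
SAMPLE = '''<wg:group xmlns:wg="http://www.OpenText.org/ns/word-group"
          xmlns="http://www.OpenText.org/ns/word-group" id="wg1" head="w1">
  <w id="w1"><NON/><wf>Pau'lo"</wf></w>
  <w id="w2" modify="w1" rel="define"><NON/><wf>devsmio"</wf></w>
  <w id="w3" modify="w2" rel="qualify"><NON/><wf>Cristou'</wf></w>
  <w id="w4" modify="w3" rel="define"><NON/><wf>«Ihsou'</wf></w>
</wg:group>'''


def build_tree(group):
    """Map each word id to the (relation, modifier id) pairs that modify it."""
    words = {w.get("id"): w for w in group.findall(f"{{{NS}}}w")}
    children = {wid: [] for wid in words}
    for wid, w in words.items():
        target = w.get("modify")
        if target is not None:
            children[target].append((w.get("rel"), wid))
    return words, children


def render(wid, words, children, depth=0, rel=None):
    """Return indented lines, one per word, nested under the word each modifies."""
    form = words[wid].find(f"{{{NS}}}wf").text
    label = f"[{rel}] " if rel else ""
    lines = ["  " * depth + label + form]
    for child_rel, child_id in children[wid]:
        lines.extend(render(child_id, words, children, depth + 1, child_rel))
    return lines


group = ET.fromstring(SAMPLE)
words, children = build_tree(group)
tree_lines = render(group.get("head"), words, children)
print("\n".join(tree_lines))
```

Because each modifier names at most one target, the result is always a tree rooted at the head term, which is what makes the box diagram possible.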

3.2. Participant reference type (Tenor)

a. References to a discourse participant, either in a single word or extended phrase, are marked according to type of reference (see definitions).

b. This annotation can be added in-line within the same document or out-of-line in the document or in an external document through links referencing word identifiers. The latter option is recommended as it allows a word group node to have only words as direct children. Examples of both in-line and out-of-line participant annotation are provided in section 5.

c. The extent of the boundaries of a participant reference need not coincide with those of the word group. For example, the phrase Timovqeo" oJ ajdelfov" constitutes a single word group, but the annotator may choose to mark two grammaticalized references to the participant Timothy.

d. This proposal makes no attempt to provide criteria for making decisions on whether such phrases should be treated as a single reference or two distinct references.

e. Frequently a phrase contains references to two or more participants in the discourse. For example, in the phrase tw'/ qew'/ mou, there is grammaticalized reference to the participant qeov", and mou is a reduced reference to the 'I' participant.

3.3. Boundaries, punctuation, lexical form, and morphological features (Mode)

a. Each word and word group in a discourse is given a unique identifier allowing reference from other elements and documents. [[Should we specify these identifiers as globally unique to the OpenText.org document universe? Or is the document URL+local uniqueness enough?] (-)]

b. Punctuation and any structural and orthographic markings should be marked in-line at the point at which they occur. If the mark occurs between word groups or at the end of a group it should be marked after the end of the group marking (i.e. it should not be included in the word group). For instance: [Ma'rko"] {,} [«Arivstarco"] {,} [Dhma'"] {,} [Louka'"] {,} [oiJ sunergoiv mou] {.} (word group boundaries = [], punctuation boundaries = {}).

c. The boundaries of each word group are ascertained on the basis of the criteria outlined in the definition of a word group (see Definitions). Each word group consists of one head term and any and all its modifiers.

d. Conjunctive particles that do not function within the group (see 3.1) but at a higher level of discourse should not be included in the scope of a word group if this is possible. Thus [Pau'lo"] kai; [Timovqeo"] and not [Pau'lo"] [kai; Timovqeo"]. The annotation of these elements is discussed in the clause level specification (see Clause Level Annotation).

e. This separation is not always possible, as in the case of post-positive conjunctive particles, e.g. [cara;n ga;r pollhvn]. However, it might be justified to break the word group, allowing for the intervening element, e.g. [cara;n]^wg1a ga;r [pollhvn]^wg1b. This is not two separate word groups, but one (wg1) in two parts (a & b) with an intervening word.

f. Variation in word order within a clause may also present cases where a word group is split by another word group, e.g. [me]^wg11a [e[cei"]^wg12 [koinwnovn]^wg11b. Here word group wg12 separates me (the head term of group wg11) from its definer koinwnovn. N.B. Future versions of this specification will specify how these instances should be handled more clearly.

g. The lexical form of each word element is marked as an attribute of the element containing the actual (inflected) form of the text. Verbs should be given in the first person singular present active indicative form (-w form); if no active form is found in Hellenistic Greek, the middle form (-omai form) should be used. [[Should we specify a standard format for lexical forms? e.g. -w forms of verbs always (even for so-called deponents) etc.] (-)]

h. Each word within the group is categorized as belonging to one of ten possible part-of-speech categories. These categories are formal morphological categories, thus a word like kaiv is simply marked as a particle (though it may be functioning adverbially at a higher level of discourse). Table 1 shows the ten part-of-speech categories and the morphological attributes associated with each one.

Code	Part-of-speech	Attributes
ADJ	Adjective	gender (mas, fem, neu)	case (nom, voc^d, gen, dat, acc)	number (sin, plu)	[type (pos, com, sup)]^a
ADV	Adverb	[type (pos, com, sup)]^a
ART	Article	gender (mas, fem, neu)	case (nom, gen, dat, acc)	number (sin, plu)
NON	Noun	gender (mas, fem, neu)	case (nom, voc^d, gen, dat, acc)	number (sin, plu)
PAR	Particle	no attributes
PRO	Pronoun	type (int, per, rel, rog, ind, dem, pos, cor, ref, rec, neg)^b	[gender (mas, fem, neu)]	case (nom, gen, dat, acc)	number (sin, plu)	[person (1, 2, 3)]^c
PRP	Preposition	no attributes
VBF	Finite Verb	tense-form (aor, pre, imp, per, plu, fut)	voice (act, mid, pas, mop)	mood (ind, imp, sub, opt)	person (1, 2, 3)	number (sin, plu)
VBP	Participle	tense-form (aor, pre, per, fut)	voice (act, mid, pas, mop)	mood (par)	gender (mas, fem, neu)	case (nom, voc^d, gen, dat, acc)	number (sin, plu)
VBN	Infinitive	tense-form (aor, pre, per, fut)	voice (act, mid, pas, mop)	mood (inf)

Table 1. Parts-of-speech Codes and Grammatical (Morphological) Attributes

NOTES

a. The type attribute for an adjective or adverb indicates whether it is positive, comparative or superlative.
b. The possible pronoun types are: int = intensive [e.g. aujtov"], per = personal [e.g. ejgwv], rel = relative [e.g. o{"], rog = interrogative [e.g. tiv"], ind = indefinite [e.g. ti], dem = demonstrative [e.g. ou|to"], pos = possessive [e.g. ejmov"], cor = correlative [e.g. toiou'to"], ref = reflexive [e.g. ejmautou'], rec = reciprocal [e.g. ajllhvlwn] and neg = negative [e.g. oujdeiv"].
c. Attribute features in square brackets may not occur in some varieties of a part-of-speech, e.g. only personal pronouns have person. All other attributes (not in square brackets) are mandatory for their associated POS.
d. The vocative case should be marked only when it is morphologically distinct from the nominative. All other instances are marked as nominative.

4. Elements and attributes available for word group annotation

4.1. Namespace

a. The namespace for elements at the word group level is http://www.OpenText.org/ns/word-group.

4.2. Main elements

a. There are five main elements for marking at the word group level: <group>, <w>, part-of-speech elements (e.g. <VBF>, <NON>, <ADJ>, etc.), <wf> and <part>. These are nested in the following manner (the <POS element/> refers to the relevant part-of-speech for each word):


`<wg:group xmlns="http://www.OpenText.org/ns/word-group">`
	`<w>`
		`<POS element/>`
		`<wf>`oJ`</wf>`
	`</w>`
	`<w>`
		`<POS element/>`
		`<wf>`lovgo"`</wf>`
	`</w>`
	...
`</wg:group>`

b. The word group namespace is declared as default on the group element and is therefore applied to all child elements without a prefix (this is a FIXED attribute in the group declaration). The group element has a prefix referring to the same namespace, declared on the document element (e.g. xmlns:wg="http://www.OpenText.org/ns/word-group"). This arrangement is selected to overcome the current lack of support for namespaces in XML DTDs (the group element is actually defined as wg:group in the DTD; see section 7).
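The effect of this arrangement on a namespace-aware parser can be illustrated with a short sketch. It assumes Python's standard-library `xml.etree.ElementTree`; the `<text>` document element and the identifiers are hypothetical and are used here only to carry the `xmlns:wg` declaration.

```python
import xml.etree.ElementTree as ET

NS = "http://www.OpenText.org/ns/word-group"

# <text> is a hypothetical document element that declares the wg prefix.
SAMPLE = '''<text xmlns:wg="http://www.OpenText.org/ns/word-group">
  <wg:group xmlns="http://www.OpenText.org/ns/word-group" id="wg1" head="w2">
    <w id="w1"><ART gen="mas" cas="nom" num="sin"/><wf>oJ</wf></w>
    <w id="w2"><NON gen="mas" cas="nom" num="sin"/><wf>lovgo"</wf></w>
  </wg:group>
</text>'''

root = ET.fromstring(SAMPLE)

# ElementTree reports each tag as "{namespace}localname".
tags = [element.tag for element in root.iter()]
for tag in tags:
    print(tag)

# The prefixed group element and its unprefixed children share one namespace;
# only the hypothetical <text> wrapper stays outside it.
in_word_group_ns = [tag for tag in tags if tag.startswith("{" + NS + "}")]
```

The prefixed `<wg:group>` and the unprefixed `<w>`, part-of-speech and `<wf>` elements all resolve to the same namespace, so downstream tools can treat them uniformly.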

c. The element to mark a participant reference (<wg:part>) also carries the word group namespace prefix. Used in-line, it surrounds <w> elements within a <wg:group> element (e.g. <wg:part> <w>oJ</w> <w>pisteuvwn</w> <w>ejn</w> <w>qew'/</w> </wg:part>). Used out-of-line, either outside of the co-textual data or in an external document, the element is an extended-link with <start> and <end> locators (see 4.8).

4.3. `<wg:group>` element

syntax:	`<wg:group>`...`</wg:group>`
function:	marks the boundaries of a word group
use:	in-line
contains:	any number and combination of `<w>`, `<punc>`, `<conj>` elements
attributes:	attribute	description and values	status
	`id`	unique identifier e.g. wg1	REQUIRED
	`head`	reference to the identifier of head term word element	REQUIRED
	`dom`	semantic domain number (main domain from Louw-Nida lexicon) for head term	OPTIONAL


Example:
`<wg:group id="wg54" head="w172" dom="54">`
	`<w id="w170">`...`</w>`
	`<w id="w171">`...`</w>`
	`<w id="w172">`...`</w>`
	`<w id="w173">`...`</w>`
	...
`</wg:group>`

4.4. `<w>` element

syntax:	`<w>` ... `</w>`
function:	marks the boundaries of a word element
use:	in-line
contains:	one of the part-of-speech elements followed by a single `<wf>` element
attributes:	attribute	description and values	status
	`id`	unique identifier e.g. w1	REQUIRED
	`modify`	reference to the identifier of another word element in the group which is modified by current element	OPTIONAL
	`rel`	indicates the type of modification values: specify, define, qualify, preposition and connect	OPTIONAL
	`from`	[only used with connect relationship] reference to the identifier of the word joined in connective relationship by current element to another element	OPTIONAL
	`to`	[only used with connect relationship] reference to the identifier of the word joined in connective relationship by current element from another element	OPTIONAL


Example:
`<w id="w169">`
	`<NON/>`
	`<wf>`Filhvmoni`</wf>`
`</w>`
`<w id="w170" modify="w171" rel="specify">`
	`<ART/>`
	`<wf>`tw'/`</wf>`
`</w>`
`<w id="w171" modify="w169" rel="define">`
	`<ADJ/>`
	`<wf>`ajgaphtw'/`</wf>`
`</w>`
`<w id="w172" rel="connect" from="w171" to="w173">`
	`<PAR/>`
	`<wf>`kai;`</wf>`
`</w>`
`<w id="w173" modify="w169" rel="define">`
	`<NON/>`
	`<wf>`sunergw'/`</wf>`
`</w>`

4.5. Part-of-speech elements

a. The following tables specify the syntax for each of the ten possible part-of-speech elements.

syntax:	`<ADJ/>`
function:	marks enclosing `<w>` element as an adjective
use:	in-line
contains:	EMPTY
attributes:	attribute	description and values	status
	`gen`	gender of adjective: mas, fem, neu	REQUIRED
	`cas`	case of adjective: nom, voc, gen, dat, acc	REQUIRED
	`num`	number of adjective: sin, plu	REQUIRED
	`type`	adjective type: pos, com, sup	OPTIONAL

Example:

<ADJ gen="neu" cas="nom" num="plu"/>

<ADJ gen="fem" cas="gen" num="sin" type="pos"/>

<ADJ gen="mas" cas="nom" num="sin" type="com"/>

syntax:	`<ADV/>`
function:	marks enclosing `<w>` element as an adverb
use:	in-line
contains:	EMPTY
attributes:	attribute	description and values	status
	`type`	adverb type: pos, com, sup	OPTIONAL

Example:

<ADV/>

<ADV type="pos"/>

<ADV type="sup"/>

syntax:	`<ART/>`
function:	marks enclosing `<w>` element as an article
use:	in-line
contains:	EMPTY
attributes:	attribute	description and values	status
	`gen`	gender of article: mas, fem, neu	REQUIRED
	`cas`	case of article: nom, gen, dat, acc	REQUIRED
	`num`	number of article: sin, plu	REQUIRED

Example:

<ART gen="mas" cas="nom" num="plu"/>

<ART gen="fem" cas="gen" num="sin"/>

<ART gen="neu" cas="dat" num="plu"/>

syntax:	`<NON/>`
function:	marks enclosing `<w>` element as a noun
use:	in-line
contains:	EMPTY
attributes:	attribute	description and values	status
	`gen`	gender of noun: mas, fem, neu	REQUIRED
	`cas`	case of noun: nom, voc, gen, dat, acc	REQUIRED
	`num`	number of noun: sin, plu	REQUIRED

Example:

<NON gen="mas" cas="dat" num="sin"/>

<NON gen="fem" cas="gen" num="plu"/>

syntax:	`<PAR/>`
function:	marks enclosing `<w>` element as a particle
use:	in-line
contains:	EMPTY
attributes:	NONE

Example:

<PAR/>

syntax:	`<PRO/>`
function:	marks enclosing `<w>` element as a pronoun
use:	in-line
contains:	EMPTY
attributes:	attribute	description and values	status
	`type`	type of pronoun: int, per, rel, rog, ind, dem, pos, cor, ref, rec, neg	REQUIRED
	`gen`	gender of pronoun: mas, fem, neu [specified only for pronouns that have gender]	OPTIONAL
	`cas`	case of pronoun: nom, gen, dat, acc	REQUIRED
	`num`	number of pronoun: sin, plu	REQUIRED
	`per`	person of pronoun: 1, 2, 3 [specified only for pronouns that have person]	OPTIONAL

Example:

<PRO type="per" cas="gen" num="sin" per="1"/>

<PRO type="int" gen="mas" cas="gen" num="plu"/>

<PRO type="rel" gen="neu" cas="acc" num="sin"/>

syntax:	`<PRP/>`
function:	marks enclosing `<w>` element as a preposition
use:	in-line
contains:	EMPTY
attributes:	NONE

Example:

<PRP/>

syntax:	`<VBF/>`
function:	marks enclosing `<w>` element as a finite verb
use:	in-line
contains:	EMPTY
attributes:	attribute	description and values	status
	`tf`	tense-form of finite verb: aor, pre, imp, per, plu, fut	REQUIRED
	`voc`	voice of finite verb: act, mid, pas, mop	REQUIRED
	`mod`	mood of finite verb: ind, imp, sub, opt	REQUIRED
	`per`	person of finite verb: 1, 2, 3	REQUIRED
	`num`	number of finite verb: sin, plu	REQUIRED

Example:

<VBF tf="aor" voc="mid" mod="ind" per="1" num="plu"/>

<VBF tf="imp" voc="mop" mod="ind" per="3" num="sin"/>

<VBF tf="per" voc="act" mod="imp" per="2" num="plu"/>

syntax:	`<VBP/>`
function:	marks enclosing `<w>` element as a participle
use:	in-line
contains:	EMPTY
attributes:	attribute	description and values	status
	`tf`	tense-form of participle: aor, pre, per, fut	REQUIRED
	`voc`	voice of participle: act, mid, pas, mop	REQUIRED
	`mod`	mood of participle: par	FIXED
	`gen`	gender of participle: mas, fem, neu	REQUIRED
	`cas`	case of participle: nom, voc, gen, dat, acc	REQUIRED
	`num`	number of participle: sin, plu	REQUIRED

Example:

<VBP tf="aor" voc="mid" mod="par" gen="fem" cas="nom" num="plu"/>

<VBP tf="per" voc="act" mod="par" gen="neu" cas="gen" num="sin"/>

<VBP tf="pre" voc="mop" mod="par" gen="mas" cas="nom" num="sin"/>

syntax:	`<VBN/>`
function:	marks enclosing `<w>` element as an infinitive
use:	in-line
contains:	EMPTY
attributes:	attribute	description and values	status
	`tf`	tense-form of infinitive: aor, pre, per, fut	REQUIRED
	`voc`	voice of infinitive: act, mid, pas, mop	REQUIRED
	`mod`	mood of infinitive: inf	FIXED

Example:

<VBN tf="pre" voc="mop" mod="inf"/>

<VBN tf="per" voc="act" mod="inf"/>

<VBN tf="fut" voc="mid" mod="inf"/>
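The attribute tables in this section lend themselves to a simple mechanical check. The sketch below is one possible way to do this in Python with the standard-library `xml.etree.ElementTree`; it takes the permitted values from Table 1 and treats the FIXED `mod` attribute of participles and infinitives as optional. The function name `check_pos` and the wording of the messages are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

GENDER = {"mas", "fem", "neu"}
CASE_WITH_VOC = {"nom", "voc", "gen", "dat", "acc"}
CASE_NO_VOC = {"nom", "gen", "dat", "acc"}
NUMBER = {"sin", "plu"}
DEGREE = {"pos", "com", "sup"}
PERSON = {"1", "2", "3"}
VOICE = {"act", "mid", "pas", "mop"}
FINITE_TF = {"aor", "pre", "imp", "per", "plu", "fut"}
NONFINITE_TF = {"aor", "pre", "per", "fut"}
FINITE_MOOD = {"ind", "imp", "sub", "opt"}
PRONOUN = {"int", "per", "rel", "rog", "ind", "dem",
           "pos", "cor", "ref", "rec", "neg"}

# POS code -> (required attributes, optional attributes), following Table 1.
SPEC = {
    "ADJ": ({"gen": GENDER, "cas": CASE_WITH_VOC, "num": NUMBER}, {"type": DEGREE}),
    "ADV": ({}, {"type": DEGREE}),
    "ART": ({"gen": GENDER, "cas": CASE_NO_VOC, "num": NUMBER}, {}),
    "NON": ({"gen": GENDER, "cas": CASE_WITH_VOC, "num": NUMBER}, {}),
    "PAR": ({}, {}),
    "PRO": ({"type": PRONOUN, "cas": CASE_NO_VOC, "num": NUMBER},
            {"gen": GENDER, "per": PERSON}),
    "PRP": ({}, {}),
    "VBF": ({"tf": FINITE_TF, "voc": VOICE, "mod": FINITE_MOOD,
             "per": PERSON, "num": NUMBER}, {}),
    "VBP": ({"tf": NONFINITE_TF, "voc": VOICE, "gen": GENDER,
             "cas": CASE_WITH_VOC, "num": NUMBER}, {"mod": {"par"}}),
    "VBN": ({"tf": NONFINITE_TF, "voc": VOICE}, {"mod": {"inf"}}),
}


def check_pos(xml_fragment):
    """Return a list of problems found in one part-of-speech element."""
    elem = ET.fromstring(xml_fragment)
    if elem.tag not in SPEC:
        return [f"unknown part-of-speech element {elem.tag}"]
    required, optional = SPEC[elem.tag]
    problems = []
    for name, allowed in required.items():
        value = elem.get(name)
        if value is None:
            problems.append(f"missing required attribute {name}")
        elif value not in allowed:
            problems.append(f'bad value {name}="{value}"')
    for name, value in elem.attrib.items():
        if name in required:
            continue
        if name not in optional:
            problems.append(f"unexpected attribute {name}")
        elif value not in optional[name]:
            problems.append(f'bad value {name}="{value}"')
    return problems


print(check_pos('<VBF tf="aor" voc="mid" mod="ind" per="1" num="plu"/>'))
print(check_pos('<PRO type="per" case="gen" num="sin" per="1"/>'))
```

A validating XML parser working from the DTD in section 7 would perform the same checks; the sketch is only meant to make the rules of Table 1 concrete.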

4.6. `<wf>` element

syntax:	`<wf>` ... `</wf>`
function:	marks the inflected (text) form of a word
use:	in-line
contains:	only character data
attributes:	attribute	description and values	status
	`lex`	lexical form of word element	OPTIONAL
	`dom`	list of major domains from Louw-Nida	OPTIONAL

Example:

<wf lex="ajgavph" dom="25,23">ajgavphn</wf>

<wf lex="oJ" dom="92">to;</wf>

<wf lex="eijmiv" dom="13,85,71">w]n</wf>
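Since the `dom` attribute of `<wf>` holds a comma-separated list, software reading the annotation has to split and check it. The following is a minimal sketch in Python; the function name `parse_domains` is an assumption, and the range 1 to 93 is taken from the `domains` entity in section 7.

```python
def parse_domains(value):
    """Split a dom attribute value into a list of Louw-Nida major domain numbers."""
    domains = []
    for token in value.split(","):
        number = int(token.strip())  # raises ValueError for non-numeric tokens
        if not 1 <= number <= 93:
            raise ValueError(f"domain {number} is outside the range 1-93")
        domains.append(number)
    return domains


print(parse_domains("13,85,71"))
```

The order of the list is preserved, since section 3.1 specifies that domains are listed in the order in which they appear in the Louw-Nida index.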

4.7. `<punc>` element

syntax:	`<punc>` ... `</punc>`
function:	marks punctuation and other textual markings
use:	in-line
contains:	only character data
attributes:	NONE

Example:

<punc>,</punc>

<punc>.</punc>

<punc>:</punc>

4.8. `<wg:part>` element

syntax:	`<wg:part>` ... `</wg:part>`
function:	to mark extent and type of reference to a discourse participant
use:	in-line or out-of-line
contains:	in-line: one or more `<w>` elements; out-of-line: one `<start>` and one `<end>` element
attributes:	attribute	description and values	status
	`type`	kind of reference to participant: gram, redu, impl	REQUIRED
	`start`	[used when `<wg:part>` is used out-of-line] reference to first `<w>` of reference to participant, by identifier	OPTIONAL
	`end`	[used when `<wg:part>` is used out-of-line] reference to last `<w>` of reference to participant, by identifier	OPTIONAL


Example: in-line:
`<wg:part type="gram">`
	`<w>`Timovqeo"`</w>`
`</wg:part>`

`<wg:part type="gram">`
	`<w>`oJ`</w>` `<w>`ajdelfov"`</w>`
`</wg:part>`

`<wg:part type="redu"><w>`aujtou'`</w></wg:part>`
`<wg:part type="impl"><w>`levgei`</w></wg:part>`


out-of-line:
`<wg:part type="gram">`
	`<start href="#w13"/>`
	`<end href="#w15"/>`
`</wg:part>`

`<wg:part type="impl">`
	`<start href="#w45"/>`
	`<end href="#w45"/>`
`</wg:part>`

5. Examples


Mark 8.11 kai; ejxh'lqon oiJ Farisai'oi kai; h[rxanto suzhtei'n aujtw'/, zhtou'nte" para; aujtou' shmei'on ajpo; tou' oujranou', peiravzonte" aujtovn.
`<w id="w1">`
	`<PAR/>`
	`<wf lex="kaiv" dom="89,91">`kai;`</wf>`
`</w>`
`<wg:group id="wg1" head="w2" dom="15">`
	`<w id="w2">`
		`<VBF tf="aor" voc="act" mod="ind" per="3" num="plu"/>`
		`<wf lex="ejxevrcomai" dom="15,13">`ejxh'lqon`</wf>`
	`</w>`
`</wg:group>`
`<wg:group id="wg2" head="w4" dom="11">`
	`<w id="w3" modify="w4" rel="specify">`
		`<ART gen="mas" cas="nom" num="plu"/>`
		`<wf lex="oJ" dom="92">`oiJ`</wf>`
	`</w>`
	`<w id="w4">`
		`<NON gen="mas" cas="nom" num="plu"/>`
		`<wf lex="Farisai'o"" dom="11">`Farisai'oi`</wf>`
	`</w>`
`</wg:group>`
`<w id="w5">`
	`<PAR/>`
	`<wf lex="kaiv" dom="89,91">`kai;`</wf>`
`</w>`
`<wg:group id="wg3" head="w6" dom="68">`
	`<w id="w6">`
		`<VBF tf="aor" voc="mid" mod="ind" per="3" num="plu"/>`
		`<wf lex="a[rcw" dom="68,67,37">`h[rxanto`</wf>`
	`</w>`
`</wg:group>`
`<wg:group id="wg4" head="w7" dom="33">`
	`<w id="w7">`
		`<VBN tf="pre" voc="act" mod="inf"/>`
		`<wf lex="suzhtevw" dom="33">`suzhtei'n`</wf>`
	`</w>`
`</wg:group>`
`<wg:group id="wg5" head="w8" dom="92">`
	`<w id="w8">`
		`<PRO type="int" gen="mas" cas="dat" num="sin" per="3"/>`
		`<wf lex="aujtov"" dom="92">`aujtw'/`</wf>`
	`</w>`
`</wg:group>`
`<punc>`,`</punc>`
`<wg:group id="wg6" head="w9" dom="27">`
	`<w id="w9">`
		`<VBP tf="pre" voc="act" mod="par" gen="mas" cas="nom" num="plu"/>`
		`<wf lex="zhtevw" dom="27,25,33,68,57,13">`zhtou'nte"`</wf>`
	`</w>`
`</wg:group>`
`<wg:group id="wg7" head="w11" dom="92">`
	`<w id="w10" modify="w11" rel="specify">`
		`<PRP/>`
		`<wf lex="parav" dom="83,84,89,90">`para;`</wf>`
	`</w>`
	`<w id="w11">`
		`<PRO type="int" gen="mas" cas="gen" num="sin" per="3"/>`
		`<wf lex="aujtov"" dom="92">`aujtou'`</wf>`
	`</w>`
`</wg:group>`
`<wg:group id="wg8" head="w12" dom="33">`
	`<w id="w12">`
		`<NON gen="neu" cas="acc" num="sin"/>`
		`<wf lex="shmei'on" dom="33">`shmei'on`</wf>`
	`</w>`
`</wg:group>`
`<wg:group id="wg9" head="w15" dom="1">`
	`<w id="w13" modify="w15" rel="specify">`
		`<PRP/>`
		`<wf lex="ajpov" dom="89,90,84,63,67">`ajpo;`</wf>`
	`</w>`
	`<w id="w14" modify="w15" rel="specify">`
		`<ART gen="mas" cas="gen" num="sin"/>`
		`<wf lex="oJ" dom="92">`tou'`</wf>`
	`</w>`
	`<w id="w15">`
		`<NON gen="mas" cas="gen" num="sin"/>`
		`<wf lex="oujranov"" dom="1">`oujranou'`</wf>`
	`</w>`
`</wg:group>`
`<punc>`,`</punc>`
`<wg:group id="wg10" head="w16" dom="88">`
	`<w id="w16">`
		`<VBP tf="pre" voc="act" mod="par" gen="mas" cas="nom" num="plu"/>`
		`<wf lex="peiravzw" dom="27,88,68">`peiravzonte"`</wf>`
	`</w>`
`</wg:group>`
`<wg:group id="wg11" head="w17" dom="92">`
	`<w id="w17">`
		`<PRO type="int" gen="mas" cas="acc" num="sin" per="3"/>`
		`<wf lex="aujtov"" dom="92">`aujtovn`</wf>`
	`</w>`
`</wg:group>`
`<punc>`.`</punc>`
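An annotation such as the one above can be checked for internal consistency against the rules of section 3.1: the `head` must identify a word in the group, the group's `dom` must be one of the domains listed for the head term, and every `modify` reference must stay within the group. The sketch below assumes Python's standard-library `xml.etree.ElementTree`; the `<text>` wrapper, the function name `check_group` and the message wording are hypothetical. The `lex` attributes are left out because the `"` character used in the transliteration would have to be escaped as `&quot;` inside a double-quoted attribute value. The second group carries a deliberately wrong `dom` value to show a report.

```python
import xml.etree.ElementTree as ET

NS = "http://www.OpenText.org/ns/word-group"

# <text> is a hypothetical wrapper; wg10 has a deliberately wrong dom value.
SAMPLE = """<text xmlns:wg="http://www.OpenText.org/ns/word-group">
  <wg:group xmlns="http://www.OpenText.org/ns/word-group" id="wg9" head="w15" dom="1">
    <w id="w13" modify="w15" rel="specify"><PRP/><wf dom="89,90,84,63,67">ajpo;</wf></w>
    <w id="w14" modify="w15" rel="specify"><ART gen="mas" cas="gen" num="sin"/><wf dom="92">tou'</wf></w>
    <w id="w15"><NON gen="mas" cas="gen" num="sin"/><wf dom="1">oujranou'</wf></w>
  </wg:group>
  <wg:group xmlns="http://www.OpenText.org/ns/word-group" id="wg10" head="w16" dom="33">
    <w id="w16"><VBP tf="pre" voc="act" mod="par" gen="mas" cas="nom" num="plu"/><wf dom="27,88,68">peiravzonte"</wf></w>
  </wg:group>
</text>"""


def q(name):
    """Qualify a local element name with the word group namespace."""
    return f"{{{NS}}}{name}"


def check_group(group):
    """Return a list of consistency problems for one word group."""
    problems = []
    gid = group.get("id")
    words = {w.get("id"): w for w in group.findall(q("w"))}
    head = group.get("head")
    if head not in words:
        problems.append(f"{gid}: head {head} is not a word in this group")
    else:
        raw = words[head].find(q("wf")).get("dom") or ""
        head_domains = [d.strip() for d in raw.split(",")]
        dom = group.get("dom")
        if dom is not None and dom not in head_domains:
            problems.append(f"{gid}: dom {dom} not in head term domains {head_domains}")
    for wid, w in words.items():
        target = w.get("modify")
        if target is not None and target not in words:
            problems.append(f"{gid}: {wid} modifies {target}, which is outside the group")
        if wid != head and target is None and w.get("rel") != "connect":
            problems.append(f"{gid}: {wid} is neither the head nor a modifier")
    return problems


root = ET.fromstring(SAMPLE)
report = [p for g in root.iter(q("group")) for p in check_group(g)]
for line in report:
    print(line)
```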


Out-of-line marking of participant reference for Mark 8.11
`<participant title="Pharisees">`
	`<wg:part type="impl">`
		`<start href="#w2"/>`
		`<end href="#w2"/>`
	`</wg:part>`

	`<wg:part type="gram">`
		`<start href="#w3"/>`
		`<end href="#w4"/>`
	`</wg:part>`

	`<wg:part type="impl">`
		`<start href="#w6"/>`
		`<end href="#w6"/>`
	`</wg:part>`
`</participant>`

`<participant title="Jesus">`
	`<wg:part type="redu">`
		`<start href="#w8"/>`
		`<end href="#w8"/>`
	`</wg:part>`

	`<wg:part type="redu">`
		`<start href="#w11"/>`
		`<end href="#w11"/>`
	`</wg:part>`

	`<wg:part type="redu">`
		`<start href="#w17"/>`
		`<end href="#w17"/>`
	`</wg:part>`
`</participant>`
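Out-of-line references of this kind are resolved by looking up the `<start>` and `<end>` identifiers in the sequence of word elements. The following is a minimal sketch in Python with the standard-library `xml.etree.ElementTree`, using a cut-down version of the Mark 8.11 data; the `<doc>` and `<text>` wrapper elements and the function name `resolve` are hypothetical, and part-of-speech elements are omitted for brevity.

```python
import xml.etree.ElementTree as ET

NS = "http://www.OpenText.org/ns/word-group"

# <doc> and <text> are hypothetical wrappers; only a few words are included.
SAMPLE = """<doc xmlns:wg="http://www.OpenText.org/ns/word-group">
  <text>
    <w id="w2"><wf>ejxh'lqon</wf></w>
    <w id="w3"><wf>oiJ</wf></w>
    <w id="w4"><wf>Farisai'oi</wf></w>
    <w id="w8"><wf>aujtw'/</wf></w>
  </text>
  <participant title="Pharisees">
    <wg:part type="impl"><start href="#w2"/><end href="#w2"/></wg:part>
    <wg:part type="gram"><start href="#w3"/><end href="#w4"/></wg:part>
  </participant>
  <participant title="Jesus">
    <wg:part type="redu"><start href="#w8"/><end href="#w8"/></wg:part>
  </participant>
</doc>"""

root = ET.fromstring(SAMPLE)

# Word identifiers in document order, and the text form of each word.
order = [w.get("id") for w in root.iter("w")]
forms = {w.get("id"): w.findtext("wf") for w in root.iter("w")}


def resolve(part):
    """Return (reference type, word forms) for one out-of-line <wg:part>."""
    start = part.find("start").get("href").lstrip("#")
    end = part.find("end").get("href").lstrip("#")
    first, last = order.index(start), order.index(end)
    return part.get("type"), [forms[wid] for wid in order[first:last + 1]]


references = {
    participant.get("title"): [resolve(part)
                               for part in participant.findall(f"{{{NS}}}part")]
    for participant in root.iter("participant")
}
for title, parts in references.items():
    print(title, parts)
```

Because the word elements themselves are untouched, several participant analyses can be layered over the same text without altering the word group markup.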

6. Use of word group annotation scheme and its components

a. The word group annotation scheme provides the foundational level of the OpenText.org discourse model. It is recommended that annotators utilize all of the elements and relationships specified in this document.

b. However, a minimal application could be made using only the <w> elements and their child <POS> and <wf> elements, thus omitting the <wg:group> element and the attributes that specify head terms and the relationships between words. It is imperative that the unique identifier attribute be specified for each word element. This kind of partial application of the scheme allows full annotation according to this specification to be added at a later point.

7. Document Type Definition



<!--

Word Group Annotation

Version 0.1

http://www.OpenText.org 2000(c)

-->

<!-- ENTITY DEFINITIONS-->

<!ENTITY % domains "1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 
15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 
31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 
47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 
63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 
79 | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 87 | 88 | 89 | 90 | 91 | 92 | 93">

<!ENTITY % POS "ADJ | ADV | ART | NON | PAR | PRO | PRP | VBF | VBN | VBP">

<!ENTITY % gender "mas | fem | neu">
<!ENTITY % case1 "nom | voc | gen | dat | acc">
<!ENTITY % case2 "nom | gen | dat | acc">
<!ENTITY % number "sin | plu">

<!ENTITY % tense-form1 "aor | pre | imp | per | plu | fut">
<!ENTITY % tense-form2 "aor | pre | per | fut">
<!ENTITY % voice "act | mid | pas | mop">
<!ENTITY % mood "ind | sub | imp | opt">
<!ENTITY % person "1 | 2 | 3">

<!ENTITY % pronouns " int | per | rel | rog | ind | dem | 
 			    pos | cor | ref | rec | neg ">


<!ENTITY % relations "specify | define | qualify | preposition | connect">

<!ENTITY % reference-type "gram | redu | impl">

<!-- End of Entity definitions -->




<!-- WORD GROUP ELEMENT -->
<!ELEMENT wg:group ((wg:part|w|conj)*)>
	<!ATTLIST wg:group id ID #REQUIRED>
	<!ATTLIST wg:group head IDREF #IMPLIED>
	<!ATTLIST wg:group dom (%domains;) #IMPLIED>
	<!ATTLIST wg:group xmlns CDATA #FIXED 'http://www.OpenText.org/ns/word-group'>

<!-- WORD ELEMENT -->
<!ELEMENT w ((%POS;), wf)>
	<!ATTLIST w id ID #REQUIRED>
	<!ATTLIST w modify IDREF #IMPLIED>
	<!ATTLIST w rel (%relations;) #IMPLIED>
	<!ATTLIST w from IDREF #IMPLIED>
	<!ATTLIST w to IDREF #IMPLIED>


<!-- POS ELEMENTS -->
<!ELEMENT ADJ EMPTY>
        <!ATTLIST ADJ gen (%gender;) #REQUIRED>
        <!ATTLIST ADJ cas (%case1;) #REQUIRED>
        <!ATTLIST ADJ num (%number;) #REQUIRED>
        <!ATTLIST ADJ type (pos|com|sup) #IMPLIED>

<!ELEMENT ADV EMPTY>
        <!ATTLIST ADV type (pos|com|sup) #IMPLIED>

<!ELEMENT ART EMPTY>
        <!ATTLIST ART gen (%gender;) #REQUIRED>
        <!ATTLIST ART cas (%case2;) #REQUIRED>
        <!ATTLIST ART num (%number;) #REQUIRED>

<!ELEMENT NON EMPTY>
        <!ATTLIST NON gen (%gender;) #REQUIRED>
        <!ATTLIST NON cas (%case1;) #REQUIRED>
        <!ATTLIST NON num (%number;) #REQUIRED>

<!ELEMENT PAR EMPTY>

<!ELEMENT PRO EMPTY>
        <!ATTLIST PRO type (%pronouns;) #REQUIRED>
        <!ATTLIST PRO cas  (%case2;) #REQUIRED>
        <!ATTLIST PRO num  (%number;) #REQUIRED>
        <!ATTLIST PRO gen  (%gender;) #IMPLIED>
        <!ATTLIST PRO per  (%person;) #IMPLIED>

<!ELEMENT PRP EMPTY>

<!ELEMENT VBF EMPTY>
        <!ATTLIST VBF tf  (%tense-form1;) #REQUIRED>
        <!ATTLIST VBF voc  (%voice;) #REQUIRED>
        <!ATTLIST VBF mod  (%mood;) #REQUIRED>
        <!ATTLIST VBF per (%person;) #REQUIRED>
        <!ATTLIST VBF num (%number;) #REQUIRED>

<!ELEMENT VBN EMPTY>
        <!ATTLIST VBN tf  (%tense-form2;) #REQUIRED>
        <!ATTLIST VBN voc  (%voice;) #REQUIRED>
        <!ATTLIST VBN mod  (inf) #FIXED "inf">

<!ELEMENT VBP EMPTY>
        <!ATTLIST VBP tf  (%tense-form2;) #REQUIRED>
        <!ATTLIST VBP voc  (%voice;) #REQUIRED>
        <!ATTLIST VBP mod  (par) #FIXED "par">
        <!ATTLIST VBP gen (%gender;) #REQUIRED>
        <!ATTLIST VBP cas (%case1;) #REQUIRED>
        <!ATTLIST VBP num (%number;) #REQUIRED>

<!-- WORD FORM ELEMENT (holds inflected form) -->
<!ELEMENT wf (#PCDATA)>
	<!ATTLIST wf lex CDATA #IMPLIED>
	<!ATTLIST wf dom CDATA #IMPLIED>
