Editors:
Matthew Brook O'Donnell
Stanley E. Porter
Jeffrey T. Reed
Catherine J. Smith
Randall K. Tan
1. Introduction
2. Definitions
3. Features analyzed at the word group level
3.1. Semantic domains
3.2. (Tenor)
3.3. Structure, boundaries, punctuation, lexical form, and morphological features (Mode)
4. Elements and attributes available for word group annotation
4.1. Namespace
4.2. Main elements
4.3. <w> element
4.4. Part-of-speech elements
4.5. <wf> element
4.6. <punc> element
5. Examples
6. Use of base level annotation scheme and its components
a. The concept of the 'word' as the basic unit of linguistic analysis is central to many grammatical and lexicographical resources. Though most language users possess an intuitive notion of the word, its definition and analysis have been an enduring problem for linguists and general linguistics. Recent studies making use of the concept of collocation have demonstrated that distinct meanings are often communicated by fixed phrases or combinations of words, and that these meanings are quite distinct from the semantic analysis of the individual words in the phrase. Morphological analysis of individual words can also be ambiguous, as in the case of the article tw'n, which can be identified as genitive plural but whose gender cannot be decided at the word level. This ambiguity is removed when the word is analyzed as part of a word group, e.g. tw'n lovgwn aujtou'.
b. A further justification for considering the word group as the basic unit of analysis is that relations between the members of the group (e.g. a head term with a genitive modifier) take place before the clause level. This allows the clause level analysis to focus on the ordering and interaction of its components (e.g. subject, predicate, complement and adjunct [see Clause Level Annotation]). Thus many features that have traditionally been treated under the label of syntax can be dealt with at the word group level.
a. Though not formally recognized in the current elements and attributes, it is helpful to divide the features analyzed at the word group level according to whether they belong to the field, tenor or mode of discourse.
a. Each word within the group is marked with its major domain number(s) from the Louw-Nida dictionary. If a word is classified in more than one major domain, each domain is noted in a comma-separated list, in the order in which the domains appear in the index of Louw-Nida, e.g. 88, 65, 57.
b. The semantic domain of a word group is taken to be that of its head term. If there is more than one possible domain for this term, the annotator must select just one from this list. The entry for a word in the Louw-Nida lexicon often includes citations associated with a specific domain classification. These should be consulted when selecting the domain classification of a head term. In addition, the annotator should pay attention to the domains of surrounding words, as by collocation they may help the disambiguation process. [[Do we need to specify criteria for this selection? What are they?] (-)]
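The two rules above (a comma-separated list of major domains per word, and a single selected domain for the word group) can be sketched as follows. This is only an illustration: the function names are hypothetical, and the range 1 to 93 is assumed to cover the Louw-Nida major domains.

```python
from typing import List

# Assumption: Louw-Nida major domains are numbered 1 to 93.
MAJOR_DOMAINS = range(1, 94)


def parse_domains(value: str) -> List[int]:
    """Split a comma-separated domain list such as "88, 65, 57" into integers."""
    domains = []
    for part in value.split(","):
        number = int(part.strip())
        if number not in MAJOR_DOMAINS:
            raise ValueError(f"not a major domain number: {number}")
        domains.append(number)
    return domains


def select_group_domain(head_domains: List[int], chosen: int) -> int:
    """Record the annotator's single choice, checking that it is one of the
    domains listed for the head term (hypothetical helper)."""
    if chosen not in head_domains:
        raise ValueError(f"{chosen} is not among the head term's domains {head_domains}")
    return chosen


word_domains = parse_domains("88, 65, 57")
group_domain = select_group_domain(word_domains, 65)
```

The choice of 65 here is arbitrary; the selection itself remains a judgment made by the annotator.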
a. No features under tenor at base level?
a. Each word and word group in a discourse is given a unique identifier allowing reference from other elements and documents. [[Should we specify these identifiers as globally unique to the OpenText.org document universe? Or is the document URL+local uniqueness enough?] (-)]
b. Punctuation and any structural and orthographic markings should be marked in-line at the point at which they occur. If the mark occurs between word groups or at the end of a group it should be marked after the end of the group marking (i.e. it should not be included in the word group). For instance: [Ma'rko"] {,} [«Arivstarco"] {,} [Dhma'"] {,} [Louka'"] {,} [oiJ sunergoiv mou] {.} (word group boundaries = [], punctuation boundaries = {}).
c. The boundaries of each word group are ascertained on the basis of the criteria outlined in the definition of a word group (see Definitions). Each word group consists of one head term and any and all its modifiers.
d. Conjunctive particles that do not function within the group (see 3.1) but at a higher level of discourse should not be included in the scope of a word group if this is possible. Thus [Pau'lo"] kai; [Timovqeo"] and not [Pau'lo"] [kai; Timovqeo"]. The annotation of these elements is discussed in the clause level specification (see Clause Level Annotation).
e. This separation is not always possible, as in the case of post-positive conjunctive particles, e.g. [cara;n ga;r pollhvn]. However, it might be justified to break the word group, allowing for the intervening element, e.g. [cara;n]wg1a ga;r [pollhvn]wg1b. This is not two separate word groups, but one (wg1) in two parts (a & b) with an intervening word.
f. Variation in word order within a clause may also present cases where a word group is split by another word group, e.g. [me]wg11a [e[cei"]wg12 [koinwnovn]wg11b. Here word group wg12 separates me (the head term of group wg11) from its definer koinwnovn. N.B. Future versions of this specification will specify more clearly how these instances should be handled.
g. The lexical form of each word element is marked as an attribute of the element containing the actual (inflected) form of the text. Verbs should be given in the first person singular present active indicative form (-w form). If an active form is not found in Hellenistic Greek, the middle form should be used instead (-omai form). [[Should we specify a standard format for lexical forms? e.g. -w forms of verbs always (even for so-called deponents) etc.] (-)]
h. Each word within the group is categorized as belonging to one of ten possible part-of-speech categories. These categories are formal morphological categories, thus a word like kaiv is simply marked as a particle (though it may be functioning adverbially at a higher level of discourse). Table 1 shows the ten part-of-speech categories and the morphological attributes associated with each one.
Table 1. Parts-of-speech Codes and Grammatical (Morphological) Attributes
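The split word groups described above (one group in two parts, such as wg1a and wg1b, or wg11a and wg11b) could be reassembled by stripping the part letter from the label. The sketch below assumes that labels always have the shape "wg", a number, and an optional lower-case part letter; the specification does not fix this syntax, and `merge_parts` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Assumed label shape: group id plus an optional part letter (wg11a, wg11b, wg12).
PART_LABEL = re.compile(r"^(wg\d+)([a-z]?)$")


def merge_parts(labelled_words):
    """Collect (label, word) pairs into whole word groups keyed by group id."""
    groups = defaultdict(list)
    for label, word in labelled_words:
        match = PART_LABEL.match(label)
        if match is None:
            raise ValueError(f"unrecognised word group label: {label}")
        # Parts a and b of the same group share the id without the letter.
        groups[match.group(1)].append(word)
    return dict(groups)


# [me]wg11a [e[cei"]wg12 [koinwnovn]wg11b
clause = [("wg11a", "me"), ("wg12", 'e[cei"'), ("wg11b", "koinwnovn")]
merged = merge_parts(clause)
```

Here `merged` holds two groups: wg11 with both of its parts, and wg12 with the intervening word.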
a. No namespace for base level in current version.
a. The main elements for marking at the word group level are: <book>, <chapter>, <verse>, <w>, part-of-speech elements (e.g. <VBF>, <NON>, <ADJ>, etc.), <wf> and <sem>.
These are nested in the following manner (the <POS element/> refers to the relevant part-of-speech for each word):
<book>
  <chapter num="1">
    <verse num="1">
      <w id="">
        <POS element/>
        <wf betaLex="o(" betaForm="o(" lex="">oJ</wf>
        <sem>
          <dom majorNum="92".../>
          ...
        </sem>
      </w>
      <w>
        <POS element/>
        <wf betaLex="lo/gos" betaForm="lo/gos" lex="">lovgo"</wf>
      </w>
      ...
    </verse>
    ...
  </chapter>
  ...
</book>
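As an illustration of the nesting shown above, the following sketch builds the same book, chapter, verse, word structure with Python's standard `xml.etree.ElementTree` module. The helper `add_word`, the word ids, and the use of the Betacode form as character data are assumptions made for the example and are not part of the specification.

```python
import xml.etree.ElementTree as ET


def add_word(verse, word_id, pos_tag, pos_attrs, beta_lex, beta_form, domains=()):
    """Append a <w> holding a part-of-speech element, a <wf> and an optional <sem>."""
    w = ET.SubElement(verse, "w", {"id": word_id})
    ET.SubElement(w, pos_tag, pos_attrs)
    wf = ET.SubElement(w, "wf", {"betaLex": beta_lex, "betaForm": beta_form})
    wf.text = beta_form  # placeholder: the real text would be the inflected Greek form
    if domains:
        sem = ET.SubElement(w, "sem")
        for number in domains:
            ET.SubElement(sem, "dom", {"majorNum": str(number)})
    return w


book = ET.Element("book")
chapter = ET.SubElement(book, "chapter", {"num": "1"})
verse = ET.SubElement(chapter, "verse", {"num": "1"})
add_word(verse, "w1", "ART", {"gen": "mas", "cas": "nom", "num": "sin"},
         "o(", "o(", domains=[92])
add_word(verse, "w2", "NON", {"gen": "mas", "cas": "nom", "num": "sin"},
         "lo/gos", "lo/gos")

xml_text = ET.tostring(book, encoding="unicode")
```

Each `<w>` receives its part-of-speech element first and its `<wf>` second, matching the order given for the contents of `<w>`.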
<w> element
syntax: <w> ... </w>
function: marks the boundaries of a word element
use: in-line
contains: one of the part-of-speech elements followed by a single <wf> element
attributes:
| attribute | description and values | status |
| id | unique identifier e.g. w1 | REQUIRED |
Example:
<w id="w169">
  <NON/>
  <wf>Filhvmoni</wf>
</w>
<w id="w170" modify="w171" rel="specify">
  <ART/>
  <wf>tw'/</wf>
</w>
<w id="w171" modify="w169" rel="define">
  <ADJ/>
  <wf>ajgaphtw'/</wf>
</w>
<w id="w172" rel="connect" from="w171" to="w173">
  <PAR/>
  <wf>kai;</wf>
</w>
<w id="w173" modify="w169" rel="define">
  <NON/>
  <wf>sunergw'/</wf>
</w>
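The `modify` and `rel` attributes used in the example can be read back to find which words depend on which. The sketch below is one possible reading under stated assumptions: the `<group>` wrapper is invented so that the fragment has a single root, and treating every word that modifies nothing and is not a connector as a head term is an assumption rather than a rule stated in this specification.

```python
import xml.etree.ElementTree as ET

# <group> is a hypothetical wrapper; the <w> elements follow the example above.
SAMPLE = """
<group>
  <w id="w169"><NON/><wf>Filhvmoni</wf></w>
  <w id="w170" modify="w171" rel="specify"><ART/><wf>tw'/</wf></w>
  <w id="w171" modify="w169" rel="define"><ADJ/><wf>ajgaphtw'/</wf></w>
  <w id="w172" rel="connect" from="w171" to="w173"><PAR/><wf>kai;</wf></w>
  <w id="w173" modify="w169" rel="define"><NON/><wf>sunergw'/</wf></w>
</group>
"""


def modifiers_by_target(root):
    """Map each modified word id to the (modifier id, relation) pairs pointing at it."""
    table = {}
    for w in root.iter("w"):
        target = w.get("modify")
        if target is not None:
            table.setdefault(target, []).append((w.get("id"), w.get("rel")))
    return table


def find_heads(root):
    """Assumption: a word that modifies nothing and is not a connector is a head term."""
    return [w.get("id") for w in root.iter("w")
            if w.get("modify") is None and w.get("rel") != "connect"]


root = ET.fromstring(SAMPLE)
relations = modifiers_by_target(root)
heads = find_heads(root)
```

For this fragment the only head is w169, with w171 and w173 as its definers and w170 specifying w171.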
a. The following tables specify the syntax for each of the ten possible part-of-speech elements.
syntax: <ADJ/>
function: marks enclosing <w> element as an adjective
use: in-line
contains: EMPTY
attributes:
| attribute | description and values | status |
| gen | gender of adjective: mas, fem, neu | REQUIRED |
| cas | case of adjective: nom, voc, gen, dat, acc | REQUIRED |
| num | number of adjective: sin, plu | REQUIRED |
| type | adjective type: pos, com, sup | OPTIONAL |
Example:
<ADJ gen="mas" cas="nom" num="plu"/>
<ADJ gen="fem" cas="gen" num="sin" type="pos"/>
<ADJ gen="mas" cas="nom" num="sin" type="com"/>
syntax: <ADV/>
function: marks enclosing <w> element as an adverb
use: in-line
contains: EMPTY
attributes:
| attribute | description and values | status |
| type | adverb type: pos, com, sup | OPTIONAL |
Example:
<ADV/>
<ADV type="pos"/>
<ADV type="sup"/>
syntax: <ART/>
function: marks enclosing <w> element as an article
use: in-line
contains: EMPTY
attributes:
| attribute | description and values | status |
| gen | gender of article: mas, fem, neu | REQUIRED |
| cas | case of article: nom, gen, dat, acc | REQUIRED |
| num | number of article: sin, plu | REQUIRED |
Example:
<ART gen="mas" cas="nom" num="plu"/>
<ART gen="fem" cas="gen" num="sin"/>
<ART gen="neu" cas="dat" num="plu"/>
syntax: <NON/>
function: marks enclosing <w> element as a noun
use: in-line
contains: EMPTY
attributes:
| attribute | description and values | status |
| gen | gender of noun: mas, fem, neu | REQUIRED |
| cas | case of noun: nom, voc, gen, dat, acc | REQUIRED |
| num | number of noun: sin, plu | REQUIRED |
Example:
<NON gen="mas" cas="dat" num="sin"/>
<NON gen="fem" cas="gen" num="plu"/>
syntax: <PAR/>
function: marks enclosing <w> element as a particle
use: in-line
contains: EMPTY
attributes: NONE
Example:
<PAR/>
syntax: <PRO/>
function: marks enclosing <w> element as a pronoun
use: in-line
contains: EMPTY
attributes:
| attribute | description and values | status |
| type | type of pronoun: int, per, rel, rog, ind, dem, pos, cor, ref, rec, neg | REQUIRED |
| gen | gender of pronoun: mas, fem, neu [specified only for pronouns that have gender] | OPTIONAL |
| cas | case of pronoun: nom, gen, dat, acc | REQUIRED |
| num | number of pronoun: sin, plu | REQUIRED |
| per | person of pronoun: 1, 2, 3 [specified only for pronouns that have person] | OPTIONAL |
Example:
<PRO type="per" cas="gen" num="sin" per="1"/>
<PRO type="int" gen="mas" cas="gen" num="plu"/>
<PRO type="rel" gen="neu" cas="acc" num="sin"/>
syntax: <PRP/>
function: marks enclosing <w> element as a preposition
use: in-line
contains: EMPTY
attributes: NONE
Example:
<PRP/>
syntax: <VBF/>
function: marks enclosing <w> element as a finite verb
use: in-line
contains: EMPTY
attributes:
| attribute | description and values | status |
| tf | tense-form of finite verb: aor, pre, imp, per, plu, fut | REQUIRED |
| voc | voice of finite verb: act, mid, pas, mop | REQUIRED |
| mod | mood of finite verb: ind, imp, sub, opt | REQUIRED |
| per | person of finite verb: 1, 2, 3 | REQUIRED |
| num | number of finite verb: sin, plu | REQUIRED |
Example:
<VBF tf="aor" voc="mid" mod="ind" per="1" num="plu"/>
<VBF tf="imp" voc="mop" mod="ind" per="3" num="sin"/>
<VBF tf="per" voc="act" mod="imp" per="2" num="plu"/>
syntax: <VBP/>
function: marks enclosing <w> element as a participle
use: in-line
contains: EMPTY
attributes:
| attribute | description and values | status |
| tf | tense-form of participle: aor, pre, per, fut | REQUIRED |
| voc | voice of participle: act, mid, pas, mop | REQUIRED |
| mod | mood of participle: par | FIXED |
| gen | gender of participle: mas, fem, neu | REQUIRED |
| cas | case of participle: nom, voc, gen, dat, acc | REQUIRED |
| num | number of participle: sin, plu | REQUIRED |
Example:
<VBP tf="aor" voc="mid" mod="par" gen="fem" cas="nom" num="plu"/>
<VBP tf="per" voc="act" mod="par" gen="neu" cas="gen" num="sin"/>
<VBP tf="pre" voc="mop" mod="par" gen="mas" cas="nom" num="sin"/>
syntax: <VBN/>
function: marks enclosing <w> element as an infinitive
use: in-line
contains: EMPTY
attributes:
| attribute | description and values | status |
| tf | tense-form of infinitive: aor, pre, per, fut | REQUIRED |
| voc | voice of infinitive: act, mid, pas, mop | REQUIRED |
| mod | mood of infinitive: inf | FIXED |
Example:
<VBN tf="pre" voc="mop" mod="inf"/>
<VBN tf="per" voc="act" mod="inf"/>
<VBN tf="fut" voc="mid" mod="inf"/>
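The attribute tables for the part-of-speech elements lend themselves to a simple mechanical check. The sketch below covers only three of the ten elements and is not an official validator: the `RULES` dictionary is transcribed by hand from the tables for `<ADJ/>`, `<NON/>` and `<VBF/>`, and `check_pos` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

GENDER = {"mas", "fem", "neu"}
CASE = {"nom", "voc", "gen", "dat", "acc"}
NUMBER = {"sin", "plu"}
VOICE = {"act", "mid", "pas", "mop"}

# attribute name -> (allowed values, required?), transcribed for three elements only
RULES = {
    "ADJ": {"gen": (GENDER, True), "cas": (CASE, True), "num": (NUMBER, True),
            "type": ({"pos", "com", "sup"}, False)},
    "NON": {"gen": (GENDER, True), "cas": (CASE, True), "num": (NUMBER, True)},
    "VBF": {"tf": ({"aor", "pre", "imp", "per", "plu", "fut"}, True),
            "voc": (VOICE, True),
            "mod": ({"ind", "imp", "sub", "opt"}, True),
            "per": ({"1", "2", "3"}, True),
            "num": (NUMBER, True)},
}


def check_pos(xml_fragment):
    """Return a list of problems found in one part-of-speech element."""
    element = ET.fromstring(xml_fragment)
    rules = RULES[element.tag]
    problems = []
    for name, (allowed, required) in rules.items():
        value = element.get(name)
        if value is None:
            if required:
                problems.append(f"missing {name}")
        elif value not in allowed:
            problems.append(f"bad {name}={value}")
    for name in element.attrib:
        if name not in rules:
            problems.append(f"unknown attribute {name}")
    return problems


ok = check_pos('<ADJ gen="fem" cas="gen" num="sin" type="pos"/>')
bad_gender = check_pos('<ADJ gen="dat" cas="nom" num="plu"/>')
missing_number = check_pos('<VBF tf="aor" voc="mid" mod="ind" per="1"/>')
```

A check of this kind catches a case value placed in the `gen` attribute or a misspelled attribute name such as `case` for `cas`.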
<wf> element
syntax: <wf> ... </wf>
function: marks the inflected (text) form of a word
use: in-line
contains: only character data
attributes:
| attribute | description and values | status |
| lex | lexical form of word element in UTF-8 (Unicode) form | OPTIONAL |
| betaLex | lexical form of word element in Betacode Greek encoding | OPTIONAL |
| betaForm | form of word element in Betacode Greek encoding | OPTIONAL |
Example:
<wf lex="ajgavph" dom="25,23">ajgavphn</wf>
<wf lex="oJ" dom="92">to;</wf>
<wf lex="eijmiv" dom="13,85,71">w]n</wf>
<punc> element
syntax: <punc> ... </punc>
function: marks punctuation and other textual markings
use: in-line
contains: only character data
attributes: NONE
Example:
<punc>,</punc>
<punc>.</punc>
<punc>:</punc>
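Because `<punc>` elements sit in-line between word elements, the running text of a verse can be rebuilt by walking the children in order. The following sketch assumes a flat `<verse>` containing only `<w>` and `<punc>` children; the word ids and the verse number are placeholders, and `surface_text` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Placeholder ids and verse number; the words follow an example given earlier.
SAMPLE = """
<verse num="1">
  <w id="w1"><NON/><wf>Louka'"</wf></w>
  <punc>,</punc>
  <w id="w2"><ART/><wf>oiJ</wf></w>
  <w id="w3"><NON/><wf>sunergoiv</wf></w>
  <w id="w4"><PRO/><wf>mou</wf></w>
  <punc>.</punc>
</verse>
"""


def surface_text(verse):
    """Join word forms with spaces, attaching each punctuation mark to the
    word before it. A mark with no preceding word is ignored in this sketch."""
    pieces = []
    for child in verse:
        if child.tag == "w":
            pieces.append(child.findtext("wf").strip())
        elif child.tag == "punc" and pieces:
            pieces[-1] += (child.text or "").strip()
    return " ".join(pieces)


text = surface_text(ET.fromstring(SAMPLE))
```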
Mark 8.11 kai; ejxh'lqon oiJ Farisai'oi kai; h[rxanto suzhtei'n aujtw'/, zhtou'nte" para; aujtou' shmei'on ajpo; tou' oujranou', peiravzonte" aujtovn.
<w id="w1">
  <PAR/>
  <wf lex="kaiv" dom="89,91">kai;</wf>
</w>
<wg:group id="wg1" head="w2" dom="15">
  <w id="w2">
    <VBF tf="aor" voc="act" mod="ind" per="3" num="plu"/>
    <wf lex="ejxevrcomai" dom="15,13">ejxh'lqon</wf>
  </w>
</wg:group>
<wg:group id="wg2" head="w4" dom="11">
  <w id="w3" modify="w4" rel="specify">
    <ART gen="mas" cas="nom" num="plu"/>
    <wf lex="oJ" dom="92">oiJ</wf>
  </w>
  <w id="w4">
    <NON gen="mas" cas="nom" num="plu"/>
    <wf lex="Farisai'o"" dom="11">Farisai'oi</wf>
  </w>
</wg:group>
<w id="w5">
  <PAR/>
  <wf lex="kaiv" dom="89,91">kai;</wf>
</w>
<wg:group id="wg3" head="w6" dom="68">
  <w id="w6">
    <VBF tf="aor" voc="mid" mod="ind" per="3" num="plu"/>
    <wf lex="a[rcw" dom="68,67,37">h[rxanto</wf>
  </w>
</wg:group>
<wg:group id="wg4" head="w7" dom="33">
  <w id="w7">
    <VBN tf="pre" voc="act" mod="inf"/>
    <wf lex="suzhtevw" dom="33">suzhtei'n</wf>
  </w>
</wg:group>
<wg:group id="wg5" head="w8" dom="92">
  <w id="w8">
    <PRO type="int" gen="mas" cas="dat" num="sin" per="3"/>
    <wf lex="aujtov"" dom="92">aujtw'/</wf>
  </w>
</wg:group>
<punc>,</punc>
<wg:group id="wg6" head="w9" dom="27">
  <w id="w9">
    <VBP tf="pre" voc="act" mod="par" gen="mas" cas="nom" num="plu"/>
    <wf lex="zhtevw" dom="27,25,33,68,57,13">zhtou'nte"</wf>
  </w>
</wg:group>
<wg:group id="wg7" head="w11" dom="92">
  <w id="w10" modify="w11" rel="specify">
    <PRP/>
    <wf lex="parav" dom="83,84,89,90">para;</wf>
  </w>
  <w id="w11">
    <PRO type="int" gen="mas" cas="gen" num="sin" per="3"/>
    <wf lex="aujtov"" dom="92">aujtou'</wf>
  </w>
</wg:group>
<wg:group id="wg8" head="w12" dom="33">
  <w id="w12">
    <NON gen="neu" cas="acc" num="sin"/>
    <wf lex="shmei'on" dom="33">shmei'on</wf>
  </w>
</wg:group>
<wg:group id="wg9" head="w15" dom="1">
  <w id="w13" modify="w15" rel="specify">
    <PRP/>
    <wf lex="ajpov" dom="89,90,84,63,67">ajpo;</wf>
  </w>
  <w id="w14" modify="w15" rel="specify">
    <ART gen="mas" cas="gen" num="sin"/>
    <wf lex="oJ" dom="92">tou'</wf>
  </w>
  <w id="w15">
    <NON gen="mas" cas="gen" num="sin"/>
    <wf lex="oujranov"" dom="1">oujranou'</wf>
  </w>
</wg:group>
<punc>,</punc>
<wg:group id="wg10" head="w16" dom="88">
  <w id="w16">
    <VBP tf="pre" voc="act" mod="par" gen="mas" cas="nom" num="plu"/>
    <wf lex="peiravzw" dom="27,88,68">peiravzonte"</wf>
  </w>
</wg:group>
<wg:group id="wg11" head="w17" dom="92">
  <w id="w17">
    <PRO type="int" gen="mas" cas="acc" num="sin" per="3"/>
    <wf lex="aujtov"" dom="92">aujtovn</wf>
  </w>
</wg:group>
<punc>.</punc>
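An annotated verse like the one above can be summarised by reading each word group's `head` and `dom` attributes. The sketch below works on a shortened fragment. Since the `wg` prefix appears in the example without a namespace declaration, the URI used here is a placeholder invented so that the fragment parses, and the `lex` attributes are left out. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI: the specification does not declare one for "wg".
WG_NS = "http://example.org/wordgroup"

SAMPLE = f"""
<verse num="11" xmlns:wg="{WG_NS}">
  <w id="w1"><PAR/><wf dom="89,91">kai;</wf></w>
  <wg:group id="wg2" head="w4" dom="11">
    <w id="w3" modify="w4" rel="specify">
      <ART gen="mas" cas="nom" num="plu"/><wf dom="92">oiJ</wf>
    </w>
    <w id="w4">
      <NON gen="mas" cas="nom" num="plu"/><wf dom="11">Farisai'oi</wf>
    </w>
  </wg:group>
  <punc>,</punc>
</verse>
"""


def summarise_groups(verse):
    """List each word group with its head word, selected domain and member words."""
    summary = []
    for group in verse.iter(f"{{{WG_NS}}}group"):
        words = {w.get("id"): w.findtext("wf").strip() for w in group.iter("w")}
        summary.append({
            "id": group.get("id"),
            "head": words[group.get("head")],
            "dom": int(group.get("dom")),
            "words": list(words.values()),
        })
    return summary


def ungrouped_words(verse):
    """Ids of words that are direct children of the verse, i.e. outside any group."""
    return [w.get("id") for w in verse.findall("w")]


verse = ET.fromstring(SAMPLE)
groups = summarise_groups(verse)
loose = ungrouped_words(verse)
```

In this fragment the conjunction w1 is reported as ungrouped, which mirrors the rule that conjunctive particles functioning at a higher level are kept outside word groups.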
a. The base level annotation schema provides the foundational level of the OpenText.org discourse model. It is recommended that annotators utilize all of the elements and relationships specified in this document.
<!-- Base Level Annotation Version 0.2 http://www.OpenText.org 2000-2004 (c) -->

<!-- ENTITY DEFINITIONS -->
<!ENTITY % domains "1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
  11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
  21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 |
  31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 |
  41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 |
  51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 |
  61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 |
  71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 | 80 |
  81 | 82 | 83 | 84 | 85 | 86 | 87 | 88 | 89 | 90 |
  91 | 92 | 93">
<!ENTITY % POS "ADJ | ADV | ART | NON | PAR | PRO | PRP | VBF | VBN | VBP">
<!ENTITY % gender "mas | fem | neu">
<!ENTITY % case1 "nom | voc | gen | dat | acc">
<!ENTITY % case2 "nom | gen | dat | acc">
<!ENTITY % number "sing | plur">
<!ENTITY % tense-form1 "aor | pre | imp | per | plu | fut">
<!ENTITY % tense-form2 "aor | pre | per | fut">
<!ENTITY % voice "act | mid | pas | mop">
<!ENTITY % mood "ind | sub | imp | opt">
<!ENTITY % person "1st | 2nd | 3rd">
<!ENTITY % pronouns "int | per | rel | rog | ind | dem | pos | cor | ref | neg">
<!-- End of Entity definitions -->

<!-- STRUCTURAL ELEMENTS -->
<!ELEMENT book (header, chapter+)>
<!ELEMENT header EMPTY>
<!ELEMENT chapter (verse+)>
<!ATTLIST chapter num CDATA #REQUIRED>
<!ELEMENT verse ((w | punc)+)>
<!ATTLIST verse num CDATA #REQUIRED>

<!-- WORD ELEMENT -->
<!ELEMENT w ((%POS;), wf, sem)>
<!ATTLIST w id ID #REQUIRED>

<!-- POS ELEMENTS -->
<!ELEMENT ADJ EMPTY>
<!ATTLIST ADJ gen (%gender;) #REQUIRED>
<!ATTLIST ADJ cas (%case1;) #REQUIRED>
<!ATTLIST ADJ num (%number;) #REQUIRED>
<!ATTLIST ADJ type (pos|com|sup) #IMPLIED>
<!ELEMENT ADV EMPTY>
<!ELEMENT ART EMPTY>
<!ATTLIST ART gen (%gender;) #REQUIRED>
<!ATTLIST ART cas (%case2;) #REQUIRED>
<!ATTLIST ART num (%number;) #REQUIRED>
<!ELEMENT NON EMPTY>
<!ATTLIST NON gen (%gender;) #REQUIRED>
<!ATTLIST NON cas (%case1;) #REQUIRED>
<!ATTLIST NON num (%number;) #REQUIRED>
<!ELEMENT PAR EMPTY>
<!ELEMENT PRO EMPTY>
<!ATTLIST PRO type (%pronouns;) #REQUIRED>
<!ATTLIST PRO cas (%case1;) #REQUIRED>
<!ATTLIST PRO num (%number;) #REQUIRED>
<!ATTLIST PRO gen (%gender;) #IMPLIED>
<!ATTLIST PRO num (%number;) #IMPLIED>
<!ATTLIST PRO per (%person;) #IMPLIED>
<!ELEMENT PRP EMPTY>
<!ELEMENT VBF EMPTY>
<!ATTLIST VBF tf (%tense-form1;) #REQUIRED>
<!ATTLIST VBF voc (%voice;) #REQUIRED>
<!ATTLIST VBF mod (%mood;) #REQUIRED>
<!ATTLIST VBF per (%person;) #REQUIRED>
<!ATTLIST VBF num (%number;) #REQUIRED>
<!ELEMENT VBN EMPTY>
<!ATTLIST VBN tf (%tense-form2;) #REQUIRED>
<!ATTLIST VBN vc (%voice;) #REQUIRED>
<!ATTLIST VBN md (inf) #FIXED "inf">
<!ELEMENT VBP EMPTY>
<!ATTLIST VBP tf (%tense-form2;) #REQUIRED>
<!ATTLIST VBP voc (%voice;) #REQUIRED>
<!ATTLIST VBP mod (par) #FIXED "par">
<!ATTLIST VBP gen (%gender;) #REQUIRED>
<!ATTLIST VBP cas (%case2;) #REQUIRED>
<!ATTLIST VBP num (%number;) #REQUIRED>

<!-- WORD FORM ELEMENT (holds inflected form) -->
<!ELEMENT wf (#PCDATA)>
<!ATTLIST wf lex CDATA #IMPLIED>
<!ATTLIST wf betaLex CDATA #IMPLIED>
<!ATTLIST wf betaForm CDATA #IMPLIED>

<!-- SEMANTIC DOMAINS -->
<!ELEMENT sem (dom+)>
<!ELEMENT dom EMPTY>
<!ATTLIST dom majorNum (%domains;) #REQUIRED>
<!ATTLIST dom subNum CDATA #IMPLIED>
<!ATTLIST dom primarydomain CDATA #IMPLIED>